Is Open Data ‘what the doctor ordered’?

Demos’ panel at Labour’s conference today, with the medical information company Dr Foster, the Royal College of Nursing and Andy Slaughter MP, understandably focused on the role of information in the NHS. This turned out to be quite a controversial topic, but highlighted some of the important issues ahead for Open Data advocates.

Firstly, some members of the panel had a tendency to conflate ‘data sharing’ IT projects with ‘open data’. Thus we had a discussion about making health records accessible, and the recently abandoned, multi-billion pound Connecting for Health project. Its failure was outlined by Peter Carter of the Royal College of Nursing.

Yet ‘data sharing’, particularly of personal records, has little to do with open data. Nor, per se, has ‘big data’: that is, the huge data businesses like Experian, whose services are not dependent on a thriving open data culture. While these businesses may create value, they also have an interest in keeping the data closed, and charging for it. For many types of government data, this would be the wrong result.

Andy Slaughter MP made a strong case for improved transparency, and highlighted the ease with which government departments can evade their freedom of information duties.

Roger Taylor of Dr Foster made a convincing case for the interaction between data analysis and service improvements.

Some members of the audience made observations about the quality of data collection and the need for analysis. Others gave examples of both incorrect data recording (including one patient – a head teacher – being wrongly described as a paedophile on his non-amendable summary care record) and good clinical results from inspection of data. Another described how, in one dataset, he had appeared to have only two patients, due to mis-assigned data. Several people described how difficult analysis can be when geography and social class so heavily influence factors such as diet, exercise and smoking.

Thus analysis, and investment in the collection and interpretation of data, were identified as fundamental. Privacy, too, was a clear concern, although the last word on where the boundary lies for acceptable sizes of data sets has not yet been said. Frequently, open data may have no impact on privacy. Sometimes, it may. While the expectation is to err on the side of caution, there is pressure to release data, particularly health data, in ways that may not meet our expectations.

A representative of Which? in the audience asked in what ways we could expect the public to make genuine choices in health and education. In health, for instance, the factors patients judge by are often far removed from the quality of clinical experience: they are more likely to be impressed by aspects of the personal experience, like the politeness of staff, timeliness of appointments, or even the entertainment provided by tropical fish.

My reply was that we should be concentrating on the transparency aspect, rather than always viewing open data as a market driver. Having said that, I added that transport data is a clear example of a type of data that can drive consumer choice (and that bus and train companies should make sure they view their business as getting people on transport, not selling access to timetables).

There is, however, a question about whether the private sector might cherry-pick on the basis of open data. The public sector should therefore be equipped to analyse and benefit from the data it generates. This question has yet to be fully addressed.

Lastly, I made some observations about the Public Data Corporation. Javier Ruiz is making a thorough analysis of the PDC and the government’s other open data consultations, on which many of my comments here are based.

An initial concern is how much data really will be released as a result of the PDC, since it is meant to bring together all the bodies that generate revenue from their data, including the Met Office and Ordnance Survey. These bodies hold core, infrastructural datasets that may be vital for interrogating other datasets (for instance, post code data or core mapping data).

While some of this core data has now been released, the PDC may create pressure to keep more of it as closed, commercial datasets. There are also issues around the quality of data collection, and its openness, if collection is privatized as many people expect.

All in all, this was a very interesting debate, which showed how far we have to go, both in equipping citizens, society and the public sector to really benefit from open data, and in overcoming the barriers to getting it released.