Open Data, Privacy and Anonymisation Briefing

Background

The UK government has been following a policy of opening up public data, carrying on from the previous administration and the recommendations of the highly influential Power of Information Review.

Open Data policy until now has been structured along three lines:

–       Transparency and efficiency: opening the COINS database to “an army of armchair auditors”, Right to Data reforms to FOIA.

–       Delivery and outcomes: publishing health and education data to support the Open Public Services choice agenda.

–       Infrastructure and economic development: reforming trading funds, Ordnance Survey OpenData, postcodes, Public Data Corporation.

Current policy proposals

There were increasingly clear signs that personal data was moving centre-stage, including a consultation in the summer. The autumn budget and related announcements have clarified the exact proposals [1].

There are many data areas covered in the budget statement (transport, weather, etc.), but for the purpose of anonymisation and data protection the main issue took many people by surprise:

“The new Clinical Practice Research Datalink will set up a secure system to deliver data extracts, using linked data from primary and secondary care and other sources, on a routine basis at an unidentifiable, individual level.”

(Cabinet Office)

Another area of concern is welfare, where some proposals are less mature:

–       Sharing or selling anonymised Fit Note data

–       Universal Credit aggregate open data

–       Linking welfare data to commercial datasets “to increase value to industry”. This may be starting soon in a deal with Experian [2].

All along, the government has loudly claimed that privacy was a red line that would mark the limits of the Open Data agenda.

Issues and Risks

We share many of the key concerns about releasing anonymised data that Kieron O’Hara presented in his independent review:

Existing context of inadequate protection

A poor record of handling personal data in the public sector has created a lack of trust, and the way these policies are being rammed through will further undermine the prospects for evidence-based debate and informed consent.

The UK is already a world leader in citizen surveillance and linking of data among public bodies [3]. These plans sanction the status quo and blur the lines between state and private data spheres.

The weak implementation of EU data protection law in UK legislation may not provide adequate protection for anonymised data, which can be used and shared without consent. Public benefit can legally override consent for sharing patient information, but this is defined ad hoc, e.g. for cancer research in a 2002 parliamentary Statutory Instrument.

Lack of technical expertise

Hospitals, GPs and schools are not prepared to deal with routine generation and processing of anonymised data. Thus it is more likely that data will be sent in an identifiable form to “anonymisation havens” provided by specialist units or commercial providers [4].

There are also concerns that a general lack of high-level anonymisation skills in the UK, including at the ICO, will not allow for proper scrutiny of commercial technologies.

Risk of re-identification of anonymous data

The current consensus outside government appears to be that anonymised data cannot guarantee complete confidentiality once it can be combined with other sources of information.

There is an increasing body of academic research pointing at the potential pitfalls of anonymisation [5]. Re-identifying anonymised data is a complex, computationally intensive process, so it may not be carried out routinely enough to render the technology completely ineffective. However, it is feasible enough that privacy can no longer be guaranteed to the level that applied before the changes were introduced.
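To make the risk concrete, here is a minimal sketch of a linkage attack, the kind of re-identification that becomes possible when an anonymised extract is combined with a second, named source. All records, field names and the choice of quasi-identifiers (postcode, birth year, sex) are invented for illustration; they are assumptions and do not describe any real dataset or the proposed Clinical Practice Research Datalink.

```python
from collections import Counter

# Hypothetical "anonymised" extract: names removed, quasi-identifiers kept.
anonymised = [
    {"postcode": "AB1", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1", "birth_year": 1970, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "CD2", "birth_year": 1985, "sex": "F", "diagnosis": "depression"},
    {"postcode": "CD2", "birth_year": 1985, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical second source that carries names (e.g. a commercial database).
named = [
    {"name": "Alice Example", "postcode": "AB1", "birth_year": 1970, "sex": "F"},
    {"name": "Carol Example", "postcode": "CD2", "birth_year": 1985, "sex": "F"},
]

# Assumed quasi-identifiers shared by both sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Return the combination of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised_rows, named_rows):
    """Link names to diagnoses where the quasi-identifiers are unique in both sources."""
    anon_counts = Counter(key(r) for r in anonymised_rows)
    named_counts = Counter(key(r) for r in named_rows)
    unique_anon = {key(r): r for r in anonymised_rows if anon_counts[key(r)] == 1}
    return {
        person["name"]: unique_anon[key(person)]["diagnosis"]
        for person in named_rows
        if named_counts[key(person)] == 1 and key(person) in unique_anon
    }


matches = reidentify(anonymised, named)
print(matches)
```

In this toy data one person is re-identified because her combination of postcode, birth year and sex is unique in the extract, while the other is shielded only because a second record happens to share the same combination. The more fields an extract carries, and the more named sources it can be linked against, the more often combinations become unique.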

The government continues to gloss over this in policy documents and public announcements, and has more or less ignored the independent review by Kieron O’Hara that laid out these issues in detail [6].

This is a key underlying problem because the only way the government can claim to have privacy as a red line is by trusting that the technology puts anonymised data outside the realm of data protection.

Data linking and sharing of data

The government is keen to promote data sharing and linking. There are separate plans in the Cabinet Office autumn announcement to give NHS users online access to their records with the idea that they will download and share that information with businesses:

“These measures will help to position UK companies in the development of a personal information market, which is likely to be the next stage of development on from the growth of social networks.”

The implications of this kind of market for anonymised data are not well understood, but it would appear that routine data linking increases the risks of re-identification.

Uncontrolled linking of anonymised personal data with social media and private commercial databases such as Experian or Rapleaf [7] increases the risks of re-identification.

There are also emerging issues around power imbalances from information asymmetries and democratic control of big data that do not fall under a narrower remit of privacy.

Lack of proper debate and consultation

There has been a lack of proper consultation and debate, as piecemeal consultations involving different departments mean the right stakeholders may not have been made fully aware.

The original announcement of NHS data sharing can be found in the Plan for Growth (March 2011) [8]:

“The Government will build a consensus on using e-health record data to create a unique position for the UK in health research.

(…) including more powerful uses of anonymised data sets and aggregated prescription data linked down to GP practice level. That can happen only if there is robust protection for individual patients’ confidentiality and privacy.”

The Cabinet Office consulted on the open data plans for the NHS over the summer in the context of Open Public Services, not of fully sharing individual anonymised NHS records with industry for research.

Apparently there were some parallel consultations in the DoH, from which Patient Concern pulled out, claiming they were only being asked to applaud. Only now, after the announcement, do we learn of plans for a consultation to change the constitution of the NHS in order to make sharing of personal data for research the default.

The political dynamic

The different policy announcements have made it difficult to analyse the implications, as they present a confusing mix of:

–       open data (uncontrolled, free online access in a technically accessible form, with an open licence for reuse)

–       data sharing (synergies and efficiencies of increased semi-controlled access)

–       big data processing (generating unique new insights by processing, combining and mining huge datasets)

This has allowed government to conflate very different issues and muddy the waters, but there are several worrying aspects that are coming into sharp focus.

The coalition government promised to break with Labour’s authoritarian streak and had an initial honeymoon with civil liberties. The autumn statement announced a bonfire of liberal causes in the name of economic growth, and patient confidentiality went out along with national parks.

The primacy of commercial exploitation of data and disregard for privacy seems to have taken over the policy seat, quite literally. Since the spring, Tim Kelsey has been the Transparency Tsar in charge of opening data, seconded from business consultancy McKinsey. Kelsey is on record saying, “no-one who uses a public service should be allowed to opt out of sharing their records” [9].

Another indication of how policy is shaped can be seen in how the autumn statement proudly claims:

“These measures have been developed in collaboration with nearly 120 existing commercial enterprises – from GlaxoSmithKline, Experian and SAS UK, to Action 4 Employment and high-tech digital start-ups.”

This is far from the promised “consensus” and it appears that civil society groups are being sidelined as powerful interests hijack the open data agenda.

Some concrete proposals

Complete clarity and separation of issues in debates on public data

Open Data should not be associated with proposals to share personal data with businesses, as each policy has its own implications. The open data community needs a louder voice.

Wider participation in policy making

The preference for business interests is producing a skewed policy. Some spaces, such as the Transparency Board, involve open data civic hackers and web technology experts, but this is clearly not enough. Input should be widened to include civil groups looking at other policy aspects, users, and experts in anonymisation. It is disturbing that Patient Concern resigned from the consultation board of the NHS reforms.

Informed public debate on the limits of anonymisation

The government must stop pretending that anonymisation removes data from privacy concerns and that data protection does not apply. Privacy groups need to make an effort to mainstream this issue and push the media to start challenging these claims.

Give proper consideration to the independent privacy review

The independent review by Kieron O’Hara should not be treated as just another contribution to the consultations. Many groups and individuals spent time and energy informing the review and it does contain sensible and fairly mild recommendations.

Open peer review and responsible disclosure of anonymisation technologies

Since it is impossible to predict the vulnerabilities that arise when anonymisation mechanisms are applied to specific data, scrutiny should be made possible by engaging the technical community through disclosure and open review, similar to those found in other areas of computer security.