Public Administration Committee: Open Data strategy

For the last 18 months, ORG has been watching the coalition’s Open Data policy. It has morphed from being an extremely ambitious project, with huge potential to revitalize democracy and drive innovation, to something rather confused.

It has failed to address the big questions around data sets that are vital as infrastructure, like maps and post codes, but are presently sold rather than freely released.

Worse, the Open Data agenda has been used to promote access to pseudonymised data, like health or benefits records, on commercial terms. This carries considerable risks and has nothing to do with (freely released) Open Data.

Today, we are presenting evidence to the Public Administration Committee about the state of government Open Data policy. They are examining the National Audit Office’s report, which is critical of the government’s Open Data Strategy.

As a report by an auditor, it does not make much comment about the big questions. It does not comment on the value of data for democratic engagement and accountability. Accountability is often the main reason for citizens to want government data, but that may not be high on the agenda of civil servants.

Nor does it talk about the need for citizens to gain access to more data as a balance against the knowledge gained and stored by commercial entities. Without these balances, “big data” offers a future where power and information are increasingly held by a small group of powerful commercial concerns who gain advantages from this informational imbalance.

Taking the NAO’s points in slightly more detail:

1 Governance

The NAO criticizes the Cabinet Office for failing to focus on achieving value for money, noting that it has not “systematically assessed the costs and benefits of the Government’s specific transparency initiatives.”

They note the creation of the Open Data Institute and note that it has a role to “develop a fuller evidence base of the economic and public service benefits of open data”. (p6)

ORG would add that the Department for Business, Innovation and Skills (BIS) has had a significant and distorting impact on governance. BIS acts as the “owner” of Ordnance Survey and other data publishers through the Shareholder Executive.

Data like detailed maps has commercial value. If it is to be published freely, this may have innovation benefits, but it has a direct cost.

The government has no means of assessing the impact of opening data to ascertain whether the social, economic and tax revenue benefits outweigh the revenue cost. ORG believes that many of these data sets (like company information, detailed maps and post codes) form “data infrastructure” which needs to be freely available in order to interrogate and use other government data.

When ORG and Open Data campaigners met Ed Davey last year, his officials rejected the models put forward by Rufus Pollock, but could offer no alternative. Instead, they asked us to provide some evidence of the impacts.

In truth, modeling is the only answer anyone has. Impacts are unpredictable. The Government can only make informed guesses.

The result has been a compromise structure designed to defer the argument. The revenues generated by the sale of data rights by the Public Data Group will be given to the Data Strategy Board to buy the data back.

That presumably creates a dynamic where the most lucrative sales are protected in order to pay to release data that has low commercial value. This means that decisions will not relate to public and economic impacts, and may in fact do the opposite.

This is made more complicated by the fact that often the revenues generated are actually payments from government departments back to the government-owned companies. This further distorts the picture.

  • The PAC should ask BIS whether it really should be defending the interests of the companies it owns, or asking what will benefit the economy as a whole.
  • The PAC should ask the Cabinet Office, BIS and the NAO how they will stop the Public Data Group’s finance structure from restricting the release of high value, high impact datasets
  • The PAC should ask what economic modeling BIS has used and whether it can be publicly released
  • The PAC should ask the Cabinet Office and BIS whether they will be using modeling to make choices about data releases, especially around data that is currently sold on restrictive, commercial terms
  • The PAC should ask if the modeling used to assess Open Data impacts will itself be openly published so its assumptions can be challenged and refined in public
  • The PAC should ask if the Open Data Institute will be helping develop open modeling of the impacts of releasing Open Data, including previously commercial data sets

2 Implementation

We had one criticism to add around implementation. Data release is meant to be ‘demand-led’ but we do not think it is in practice. Nor is there, to our knowledge, a definition of what constitutes ‘demand’.

For instance, Tim Davies and other campaigners asked for the release of the Strategic Export Controls database in a re-usable form in November 2010. It was eventually extracted and republished by the Campaign Against the Arms Trade. It has never been published as Open Data, yet it has a clear case for transparency and publication.

3 Supporting choice and accountability

The NAO make some legitimate comments about differing levels of publication of data and the need for an overall assessment of what is important to publish for these objectives.

However, as a policy concern, Open Data is currently regarded as essentially a support for the ‘Open Public Services’ agenda. The Open Public Services agenda has the aim of allowing different providers to run government services; private, public, charitable and community.

The Open Data agenda is much wider than this. Democracy, transparency and accountability are much more important than enabling different provider models, however laudable that may be.

Open Data should also serve service improvement from within government. Government should think about the data that users, managers and staff need.

When we think about how the data is used and interpreted, it becomes obvious that accountability must apply at all levels. Data at the delivery level may show which service or unit is failing, but the decisions that led to that failure must also be transparent. Otherwise it becomes difficult to actually address the real problem and there may be a tendency to blame the lowest level units rather than decision makers.

The PAC should ask

  • Why Open Data is placed within the Choice agenda, and if that is appropriate
  • How the Government will ensure that accountability is applied at all levels

4 Stimulating economic growth

We agree with the NAO that there appears to be a missed opportunity relating to traded data. The NAO highlights that estimates of the value of data releases have ranged from £1.6bn to £6bn a year, yet only £49m over 20 years is claimed for the current releases.

BIS are the owners of the Public Data Group (Ordnance Survey, Met Office, HM Land Registry and Companies House).

They therefore hold the key to a great deal of the economic benefits. They have rejected the models put forward by Rufus Pollock without explaining why, and have not released the research created for the Ordnance Survey by Consultingwhere.

Nor is the Ordnance Survey’s commercial data available in enough detail to assess the relationship between demand, price and usage for their data products. ORG asked for this data under FOI but it was redacted under commercial confidentiality. No other organization redacted the equivalent data when responding to a later FOI request from Harry Metcalfe concerning the business case.

BIS appear to have rejected the modeling created for the EU’s Public Sector Information review, known as the Vickery Review, which claimed €16bn of benefits for the UK (2008 figures). No study we are aware of claims that publication would have negative economic impacts.

Yet BIS also have no model to assess the economic impacts. Nor do the Cabinet Office.

In effect, BIS

  • Reject modeling done by others;
  • Ask us for evidence that data releases would create economic growth;
  • Refuse to give us the data which would allow us to create a case that this may be true

The decisions around this data are highly strategic, yet they have effectively been placed in the hands of BIS acting as Shop Stewards for the Trading Funds. Trading Funds need economic certainty, but that does not have to rely on the current trading models, especially if it can be shown that other means would better stimulate the economy.

5 Risks

Privacy

The NAO notes the privacy risks of publishing some sorts of Open Data. It highlights the Kieron O’Hara report, commissioned by the Cabinet Office.

This report was not taken seriously. In the Open Data policy consultation, the report was diminished to a mere consultation response, listed as submission 119 just before Kirklees District Council.

That consultation also widened the Open Data agenda to include personal datasets, such as health or benefits data, that would be interrogated on commercial terms, where identities would be protected.

ORG agrees with FIPR that this is an enormous technical challenge that is extremely difficult to achieve without creating risks to individual privacy. Re-identification of data is often a lot easier than supposed. Furthermore, the civil service lacks experience in assessing the problem.

FIPR propose that any technique should be subject to ‘peer review’. They argue that there is a need for any and all experts to be able to examine the techniques used for ‘anonymisation’ so that flaws in techniques can be spotted before they are used; and also to report risks in a responsible way when they are identified, so access to the data can be closed, or risks removed, beyond the public eye.

The PAC should ask the ‘Open Data Tsar’ Tim Kelsey:

  • What the official response of the Cabinet Office to O’Hara’s report is and which recommendations are accepted or rejected?
  • What commercial access to personal datasets has to do with Open Data?
  • Why this was included in the Open Data consultation?
  • Whether this question deserves a wider and separate public debate, given the concerns about consent and privacy risks identified in the O’Hara report?
  • Whether the idea of ‘peer review’ for anonymisation techniques, and responsible disclosure when risks are identified, could help alleviate privacy risks?

Other risks

Privatisation of data sets: moving government services to a procurement model can introduce risks that data is privatized. This has been recognized but needs constant and careful attention.

Undermining government capacity: if information such as datasets relating to delivery is released, but information about the process of decision making is not available, misleading conclusions could be drawn, as we note above.

For instance, to give a crude example, if a school performs badly, but the decisions about teacher training, government funding or rules about service procurement are not transparent, then the school might be blamed rather than the department.

6 Conclusion

The major omission made by the NAO is the role of the relationship between BIS and the Public Data Group in driving Open Data policy. Furthermore, the financial arrangement whereby profits from the sale of public data by the PDG are used to open public data through the Data Strategy Board seems counterproductive and likely to lead to distorted outcomes.

BIS currently seems most concerned to protect the revenues and business models of the companies it controls, rather than assessing the benefits of opening these commercial datasets for the whole economy.

The NAO is correct to identify the lack of a model to assess the benefits of Open Data, by the Cabinet Office or BIS, and the PAC should question BIS further about its ‘pass the buck’ strategy around evidence, and its rejection of modeling, while failing to provide alternative models or the necessary evidence to create models.

There are also serious questions to be raised in relation to the privacy debate around reuse of personal data, which has been dragged into the Open Data debate for reasons which are not clear.