Our view on the Public Data Corporation

PUBLIC DATA CORPORATION ENGAGEMENT

Open Rights Group

1 Introduction

The stated objectives of the PDC are:

  • Making data freely available and setting consistency on charges 
  • Centre of excellence, efficiency and lower costs 
  • Attracting private investment

As a starting point ORG believes that government should enable the development of data services and not compete with the private sector in the provision of these. Its primary role should be to provide the basis for social and economic development.

At the same time, there are some core functions that the state has to perform that are not the result of market failures, and that it would be difficult to transfer to the private sector without impairing the safety or security of the nation and its citizens.

The balancing of these considerations provides the basis for our analysis of the current proposals for a PDC.

This document is based on various conversations and online consultations we have carried out with our stakeholders. It is also based on information provided at the Cabinet Office engagement workshop.

Principle of free access

During our discussions with multiple stakeholders on the PDC, it has become clear that the established commercial sector in PSI is not opposed to charges as long as they are predictable and fair.

In contrast, civil and citizen groups that make use of public data to increase engagement or improve the use of public services have told us that charging, even at low prices, is a potential obstacle.

New small businesses and independent developers have more mixed views on charges, but generally tell us that charging raises the barrier to entry and makes recursive reuse more difficult, as costs are passed on downstream.

It is difficult to set a single principle for charging because public data encompasses very different areas, with different kinds of value: economic development, civic participation, etc. It is impossible to predict or decide a priori the complete value of a dataset. All data has some value in each area; for example, planning data has clear economic value but also civic value. In addition, the value of information increases with reuse.

ORG’s view is that, because these kinds of value cannot be separated, all public data (and generally digital public information with negligible marginal cost) should in principle be freely available, as part of the democratic process. Data and information are the basis for transparency and accountability of public officials. Evidence-based policy is also increasingly built on data, and citizens need access to that data in order to engage.

The economic case for Open Data

A full economic analysis of the advantages of providing open data for free reuse is beyond the scope of this engagement. However, we would like to stress that there is evidence that cannot be ignored in the drafting of this policy.

There are many well respected organisations providing a plethora of research papers and articles supporting our view, and we would hope that the decisions on data policy and the PDC will be based on evidence. i ii

2 Data provision

2.1 Charging for data

Transparency in cost recovery

If exceptions must be made because of the high cost of data or other determinants, there should be a clear process for how the costs are determined and what alternatives there are to make them cheaper while ensuring quality and availability. Commercial confidentiality should not override this basic transparency.

Data produced as part of a core activity that is a public task should not be subjected to cost recovery.

First purchaser

Strict cost recovery means that primary data that has been paid for by a first purchaser that is a public institution should not then be converted into a revenue stream in secondary access.

This includes a lot of data that is absolutely necessary for public bodies, such as postcodes, meteorological data paid for by Defence, etc.

Multiple pricing systems should include non-profits

In some models, commercial uses of data are charged at different prices from non-commercial uses. This is problematic as there are many borderline uses, and it may hinder economic development. In many cases, two-tier pricing translates into data that is free for individuals but expensive for incorporated organisations. If this model is applied to some data, there should be a third tier for non-profit organisations.

Issues with “Freemium” models

“Freemium” pricing models, based on free access to a limited amount of data or basic service level, are increasingly popular on the Internet.

In the context of PSI regulations this could easily fit along the lines of downstream and upstream data, but if applied to the same dataset it can lead to problems of fairness.

Freemium pricing in the commercial world is designed primarily to drive customers towards fully paid services. Multiple pricing models in public sector information should reflect diverse needs, social sectors or principles, not be used to deliver substandard services to citizens.

Charging for data creation instead of consumption

It has been proposed elsewhere iii that registries that depend on external input for updating (companies, land, etc.) could adopt the principle of financing through extra charges for adding / creating data rather than sale / reading.

While we agree with the principle, this would depend on effective external regulation of charges to ensure profit imperatives do not trump cost recovery. In practice this would apply to a limited set of public data providers.

2.2 Scope

The Cabinet Office engagement website mentions a limited set of data areas (registrations, environmental science, built environment and critical infrastructure), but we are told this is just a temporary focus. Eventually the PDC should cover a lot more.

Data demanded

In our own consultations, and in some cases also at workshops we attended organised by government, there have been repeated calls for the early release of data with demonstrable high demand.

1. Number one among these is transport data. This is the most popular data in the London Datastore. The difficulties with private sector involvement are understood, but many people feel this would provide a head start in the development of applications and utilities.

2. Postcodes and addressing. We are told that the Address Management Unit responsible for postcodes cannot be touched while the future of Royal Mail is under consideration, but addressing is seen as key by many people we spoke to.

3. Local authorities are currently out of scope, and it seems difficult to foresee how they could be integrated in the centralised framework currently being discussed. However, much of the data that matters to citizens is held at local level.

Private data

Besides transport, in critical infrastructure there is a lot of data produced on behalf of the public sector, or held by utilities and other private companies. Much of this data should be within scope, and it has been suggested that data sharing should be routinely included as a precondition in public contracting processes. Given the increased provision of public services by both commercial and non-profit organisations, this should be established as a priority.

2.3 Delivery

The discussions of how exactly the PDC would deliver its mission have generated very heated debate. The following are some of the issues raised:

Information and availability of data

The current system of information asset registers and publishing schedules does not work well, although the amendments to the FOIA may improve some areas. Overall, we believe an agile system for requesting information about available data may be better than relying on supply-side information only.

Data portals

Data portals are useful to give an overview of what is available and search across departments. However, they could potentially increase separation with the producers of the data. It is important to know who is responsible for a dataset and be able to contact them for corrections or explanations. The PDC should also provide means for direct contact.

Versioning and public archiving of data are also seen as problematic with existing portals. Apparently, TNA archives full web pages with the data embedded, but in many cases there is no direct access to the older datasets alongside the newer version.

Single “common customer interface”

In principle we welcome the idea being circulated of a central point of contact, as there are many complaints from people around ORG about complex negotiations and even conflicting licensing. However, many of the issues around negotiations seem to relate to the aggressive approach taken by employees of some public bodies. Unless these attitudes – and the commercial imperatives driving these behaviours – are tackled, the PDC will simply transfer the problem to a new organisational framework.

It has also been suggested that the PDC should have appropriate interfaces for citizens requesting data, not just businesses.

Relationship with existing bodies

It appears that the PDC could take over some of the functions of trading funds, but it remains unclear how this would work. This is a crucial aspect, and our view is that it should not repeat past mistakes on a larger scale. We believe the PDC should be an independent body with the required powers to drive the delivery via existing public organisations, which should be reformed in parallel.

Issues with direct sale of data through the PDC

If the PDC deals with direct sales of data, as in a centralised continuation of the trading funds model, it will fall under the same imperatives as these. The PDC will probably see itself as a commercial organisation mandated to extract value and income from customers in an adversarial relationship. The PDC provides an opportunity to focus on making data available as a priority. We believe the PDC itself should only provide free data.

For paid data, the PDC should provide brokerage across organisations and the financial structures for a single point of payment.

3 Working across government

The PDC will need to develop relationships with a very broad range of public bodies. It will also require a strong capacity and sufficient authority to deliver its mission, which once defined should be clear and simple.

The current potentially conflicting demands – delivering free open data and attracting private investment – could lead to confusion over mission.

Advocacy

The Cabinet Office has been delivering the open data and transparency agenda in its 2011-12 business plan. However, in order to make this a lasting improvement, this role should be independent of the priorities of the government of the day. The PDC should have a clear advocacy role.

Decision on data

The PDC should have a clear, unambiguous mandate to deliver free open data from across government as far as possible. The PDC should decide if exceptions to this principle are merited after examination of evidence submitted by data holders and requesters.

Regulation

In several discussions the need for some regulation was raised. Market regulators, such as OFCOM, deal with the private sector, but this is a different case as it involves public bodies.

The prevailing position seems to be that there is a need for the function, but not necessarily in the same body as the PDC, although this may make sense in terms of minimising bureaucracy.

The issue then is that there should be at least a two-step decision process for appeals. The appeals could be heard by a small external board. In any case, the final decision on data release should be removed from the data holder.

Governance: A Data Ombudsman?

Our main concern with traditional sector regulation is that it is generally a cumbersome, slow process that would deter all but those with strong financial interests and capacity.

Another concern is that regulation has to be seen as independent and neutral, and this could conflict with the mission.

There is a need for agile conflict resolution over data demanded with fast appeals. We see this as a critical success factor for public data policy and the PDC.

A Data Ombudsman function in the governance process, with a less neutral position tilted towards citizens and open data, and fast processes, would be necessary.

Technical Capacity

Whether the main role is delivery or regulation, the PDC should be well resourced. At ORG we have been dealing with problems in other areas, where the bodies responsible lack the technical expertise and capacity to intervene.

As in other domains of cutting-edge technology, it is impossible for government to attract all the required know-how, so in addition there should be good, flexible processes for consultations with independent experts.

Standard Licensing

ORG’s position is that public data should be released freely under open licenses. Where data is sold, the PDC should deliver data under a limited set of standard licenses rather than enter complex negotiations for each dataset. The standard licensing scheme could be quite granular to cover different cases, but it would remove uncertainty and lower costs.

Quality of data

This is a commonplace problem, and it requires standards, support for data publishers and feedback mechanisms. Issues range from relatively minor glitches, such as tables released without headers (e.g. National Statistics Postcode Database), to correcting dangerous errors for GPS navigation.

Customer channels, value and free data

Pricing is not just about exchanging money for goods; it is also about feedback and information about customers’ needs. In zero-price models, there is a need for alternative channels between end users and producers to improve products. The PDC will have to engage data users, both within government and outside, and fulfil this function.

Privacy

Current frameworks for privacy risk assessment for statistical release control in closed environments are inappropriate for open data. ORG is currently co-organising a series of workshops with European partners towards the publication of privacy guidelines for governments and advocates of open data.

At present, ORG’s preliminary recommendations are transparency in the decision making for choices of any anonymisation technologies for aggregated data, and good feedback mechanisms to communicate potential breaches.

Government procuring private data

As the PDC develops capacity, it could also improve availability of private data to public bodies by getting better deals and applying economies of scale. Privately produced but publicly available data (market data for instance) could also be channelled through the PDC.

Dynamic data and APIs

Data that is changed in real time or quite frequently benefits greatly from being delivered via some online services, such as APIs, that allow “plugging in” applications and other services. This incurs considerably higher costs than simply making the data available for bulk download, but we believe that in the current technological climate it should be considered a core element of the delivery of this sort of dynamic data.

 

4 Sustainability and attracting private investment

There is a fundamental distinction to be made here between sustainability of the delivery of quality data by the state, and generating income for the state in the context of economic crisis.

Privatisation of Trading Funds

One possibility would be to simply sell off the basic data infrastructure provision of the trading funds and regulate the sector. Indeed, the current crisis has been perceived as an opportunity by many in the business community:

“Infrastructure privatization poses significant opportunities, according to Mr. Herhalt. He points to the inclusion of private investment in the transport sector of several European countries – something which has become commonplace after many years. ‘Involving the private sector more in infrastructure also meets a growing demand for access to the stable revenue streams infrastructure assets can generate.’” iv

However, although it is clear how stable revenue streams are important in business portfolio management, it remains unclear how beneficial privatisation would be for the public purse in the longer term, as the biggest consumer of public data in most cases is the state itself. Putting basic data provision completely outside state control would potentially negate any opportunities for more efficient use within public bodies.

Private investment in raw data production

We are told that the aim of the PDC is to attract private investment at every level of the data process, including basic data collection and production, not just refined value-added services.

Although less dangerous than an irreversible privatisation, this is potentially very problematic, and would require extremely careful governance:

• Private companies would need very strong incentives to encourage data reuse beyond first sale, and it seems implausible that they would provide free open data unless forced to do so.

• The provision of quality data would be very difficult to guarantee when cost drivers to cut prices are freed from other imperatives.

• Intellectual property in a mixed environment would become complex to manage and could hamper reuse and further economic development.

• If complete competition were to be encouraged at this basic level, there could be issues with lack of comprehensiveness and fragmentation of data. In other cases we could find duplication of efforts and wasted resources. Some data collection can also be extremely costly in environmental terms (e.g. flights and launching satellites for mapping), and so the overall cost to society could increase.

• In practical terms private investment for raw data production could mean that PSI regulations would not apply, and the whole sector would be regulated by contract law. Although the initial contracts would probably reflect a continuity of the spirit of PSI reuse, there is no guarantee in the long term.

• Commercial confidentiality of contracts would need to be removed to allow scrutiny.

• Know-how and expertise could disappear from public bodies. This is already acknowledged as a serious endemic problem in government ICT management that hampers even the tendering and contracting process for external services.

Pledge funding model and data as infrastructure

Internet infrastructure costs are often shared by key users but without demanding exclusive access. Web server software, large data pipelines, etc., are paid for by businesses that require these goods as an enabling component, but not as a unique competitive advantage to be denied to others. We could see some basic public data in a similar light.

Datasets could be put up for pledge auction, and businesses that want the data to exist would pledge to pay for it until enough funds are raised. The critical aspect here is to ensure the data remains open for everyone once its funding has become sustainable.

This system would work for datasets that have high costs and a small number of customers. For example, court cases currently passed on by the Registry Trust to credit agencies could be a good candidate, as the latter would undoubtedly pay for their availability, but their business model does not rely on exclusive access.
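As a rough illustration only, the pledge mechanism described above could be sketched as follows. The class, its method names and the figures used with it are our own assumptions for the purpose of the example, not a proposed design. The sketch also assumes that pledges are only collected once the funding target has been met, which is a common feature of pledge schemes but is not specified above.

```python
from dataclasses import dataclass, field


@dataclass
class PledgeRound:
    """Hypothetical pledge round for one dataset (all names are illustrative)."""

    dataset: str
    target: int  # assumed cost of producing the data, in pounds
    pledges: dict[str, int] = field(default_factory=dict)

    def pledge(self, backer: str, amount: int) -> None:
        """Record a pledge; repeated pledges from one backer are added up."""
        if amount <= 0:
            raise ValueError("pledge must be positive")
        self.pledges[backer] = self.pledges.get(backer, 0) + amount

    @property
    def raised(self) -> int:
        return sum(self.pledges.values())

    @property
    def funded(self) -> bool:
        return self.raised >= self.target

    def charges(self) -> dict[str, int]:
        """Amounts to collect: nothing is charged until the target is met."""
        return dict(self.pledges) if self.funded else {}

    def is_open_to(self, user: str) -> bool:
        """Once funded, the data is open to everyone, backer or not."""
        return self.funded
```

The point of the sketch is the last method: access depends only on whether the round is funded, never on whether a given user contributed, so backers pay for the data to exist rather than for exclusive access.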

Public funding and private innovation

For data that is absolutely required for core government tasks the model advocated elsewhere of public funding and taxation seems the best option, as seen in the evidence contained in the references.

 

i http://wiki.linkedgov.org/index.php/The_economic_impact_of_open_data

ii http://www.lapsi-project.eu/biblio

iii http://www.rufuspollock.org/economics/papers/economics_of_psi.pdf

iv http://www.kpmg.com/Global/en/IssuesAndInsights/ArticlesPublications/Press-releases/Pages/Unprecedented-public-sector-deficits.aspx

Contact: Javier Ruiz javier@openrightsgroup.org