24 Aug 2011 Javier Ruiz Open Data

Access to the Agreement between Google Books and the British Library

The Google Books project has been the subject of protracted legal battles, generating a huge debate over whether it will help authors distribute their work or turn them into low-paid employees of the corporation. Most of this debate has focused on books still under intellectual property restrictions, with less attention paid to the inclusion of out-of-copyright works.
The British Library (BL) recently announced, to much fanfare, a deal with Google to make a quarter of a million books available online. These books are no longer restricted by copyright and are therefore in the public domain.

The deal is presented as a win-win situation: Google pays the costs of scanning the books, which will be available on both Google's and the BL's websites. This sounds very philanthropic of Google; however, the catch is in the detail:

“Once digitised, these unique items will be available for full text search, download and reading through Google Books, as well as being searchable through the Library’s website and stored in perpetuity within the Library’s digital archive.”

To find out what this really means, we asked the British Library for a copy of its agreement with Google. The agreement had not been published on the Library's transparency website alongside other similar contracts because it did not involve any exchange of money, a loophole that transparency activists may want to look at. After some toing and froing under the Freedom of Information Act, we obtained a copy, which can be downloaded here:

British Library Google Books Agreement

Notice: Google has kindly agreed to the publication of the agreement, while asserting its copyright over it and asking that further redistribution be restricted.

The document seems to follow similar agreements with US libraries, but please let us know in the comments or by email what you think. Our preliminary views are below.

Restrictions

In a nutshell, the agreement's clauses mean that only Google will be free to do whatever it wants with the scanned books, while the BL will face restrictions on what it can do with its own digital copy of the scans. The BL will be able to display the books on its website, but must prevent commercial use (e.g. print on demand), redistribution of the copies and automated downloads. Google will primarily index the books, but will also be able to license or sell copies and make them available for printing.

This is understandable: despite its laudable motto "Don't be evil", Google is not a charity but a very successful business that is investing hard cash in scanning books in order to make a profit elsewhere. It therefore needs to restrict competitors' access to the books. But however natural this may be, is it a satisfactory state of affairs for the public interest and the protection of the public domain?

Free as in beer

Google has already digitised and made freely available over 15 million books in the public domain. This is a good thing in principle, but is it wise to base national policy for the digitisation of literary works on the goodwill of a corporation? Clause 4.3.1 of the Agreement would lift the restrictions on the Library if Google fails to provide free online access to the public domain works for a certain period of time.

While this provides some safeguards, public institutions should look for a mixed model that avoids excessive reliance on a single partner. Other initiatives promoting open access, such as the Internet Archive, should also be given consideration.

Copyright Year Zero

One issue with public-private partnerships for digitisation is the creation of new intellectual property. This is not generally a problem in the USA, but in the UK digital copies may attract a new copyright, although the law here is unclear. When combined with restrictions on access to the original works, this could create a de facto "copyright reset" on materials that entered the public domain long ago. This would place restrictions on the redistribution and reuse of the digitised books, making derivative works very difficult and expensive.

In clause 4.2 the Agreement clearly claims all rights in Google's digital copy. Fortunately, in Google's case this seems less of an issue in practice, as its business model is not based on selling access to the public domain. Nevertheless, there is an intractable conflict between open access and the use of digitisation contracts, rather than copyright, to restrict what can be done with public domain works (mass downloading, text mining, redistribution, etc.). This remains an issue here and elsewhere.

Monopoly concerns

Concerns have been raised about this concentration of digital works in the hands of one company, although in theory anyone else could step forward and scan the same books again, since the Agreement is not exclusive.

This sounds reasonable until you look at the wider picture of the digitisation of culture in Europe. Nick Poole of the Collections Trust estimates the cost of digitising the contents of Europe's museums, archives and libraries, including the audiovisual material they hold, at €100 billion over ten years, with a further €10-25 billion over the following ten years to maintain it and make it available.

With such a need for investment, it would be reasonable to expect works to be digitised only once, under a common strategy that ensures all works are eventually incorporated into an unrestricted public domain digital network, run in the public interest in the same way as physical national libraries.

Length of restrictions

If Google and other companies are to invest in digitisation, they will expect a profit, which will have to come from some form of restriction. The Agreement establishes such restrictions for a period of 15 years.

A recent EU report on the digitisation of culture, The New Renaissance, defines optimal arrangements for public-private partnerships and sets a maximum of seven years for preferential terms. This is seen as striking a balance between the interests of businesses and those of public institutions.

Most digitisation agreements of this kind entered into by The National Archives and the British Library are set to last ten years. We believe that transparent cost recovery should determine the length of restrictions, with a cap of seven years as recommended at European level.

Open Data and the Strategy for Growth

Google does not make any money from selling public domain books, but it uses them for text mining and for its search engine and translation software. This is seen as the main business objective of the whole operation, including the digitisation of in-copyright books.

The Agreement contains provisions for non-commercial access to the material by non-profit institutions for academic and research purposes, although these institutions will have to sign a separate contract with Google. There is also a welcome clause (4.9) explicitly allowing the metadata to be included in the Europeana database.

The Hargreaves Review of copyright in the UK proposed a "wide non-commercial research exception covering text and data mining" because this area is perceived as critically important. Separately, the recent consultation paper on Open Data envisages public data as one of the engines of innovation that will help overcome the economic crisis, deserving a section in the forthcoming Strategy for Growth.

Seen from this perspective, the Agreement's allowance for non-commercial research is laudable, but many opportunities for innovation will require commercial input, and it will be up to Google to determine what counts as commercial in the research access contracts. Opportunities for economic growth will be lost if incumbent businesses such as Google deny start-up companies the chance to innovate.

Public funding

If the government wants to stimulate growth through open data, it needs to put its money where its mouth is and provide adequate funding. Moreover, the digitisation of cultural and archival materials into datasets, media, texts and metadata should be a natural extension of the mission of public institutions. However, in our conversations with the British Library the response is always the same: there is simply no money being provided for digital activities.

In addition, this lack of funding distorts the values of cultural institutions, which increasingly see digital activities as a source of revenue, much like the ubiquitous gift shop. As a result, libraries and museums themselves attempt to claim and enforce copyright over digital copies of public domain works.

The British Library already has a digital collection of public domain works that are not open and freely accessible, partly because of the perceived loss of potential revenue. We would like to see a commitment from the British Library to make public domain books fully available once they are free from contractual restrictions. However, we understand that this requires funding, which is not generally available.

A recent report on Funding of the arts and heritage from the House of Commons Culture, Media and Sport Committee contains only a single passing reference to supporting the "challenging transition to the digital age", along with some praise for the efforts of the Arts Council Collection to digitise its works and put them online. There is no vision for the internet age.

Without a national strategy for the digitisation of culture, supported by an adequate mix of government and private funding, public institutions will be at the mercy of a handful of businesses, which is not in the public interest. Such funding should be seen as money well invested in the future.