May 13, 2013 | Jim Killock

EE and sale of user data: does anonymisation work?

This afternoon, EE called ORG to ask us about our blog. They did not dispute the article, but confirmed their belief that Ipsos MORI employees misrepresented what the data they are offering can do.

They said in response that “most” of the data consists of large, aggregated datasets covering groups of around 50 users. However, their customers currently don’t know how or when their data might be aggregated or made available in an anonymised form.
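EE has not published how its aggregation works, but the basic idea of a minimum group size can be sketched as follows. This is purely a toy illustration with hypothetical data and a hypothetical threshold, not EE's actual method:

```python
from collections import Counter

K = 50  # hypothetical minimum group size before a count may be released

def aggregate(records, key):
    """Count records per group, suppressing any group smaller than K.

    `records` is a list of dicts; `key` names the attribute to group by.
    Illustrative sketch only: real anonymisation needs far more than this.
    """
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n >= K}

# Toy data: 60 users in one area, 10 in another.
records = [{"area": "A"} for _ in range(60)] + [{"area": "B"} for _ in range(10)]
print(aggregate(records, "area"))  # area B falls below K and is suppressed: {'A': 60}
```

Note that simply suppressing small groups does not, by itself, prevent re-identification: combining several such releases, or joining them with outside data, can still narrow a group down to an individual.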

Anonymising datasets rarely prevents re-identification. For instance, Nature highlights research showing “in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.”
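The effect the Nature paper describes is easy to reproduce on synthetic data: when location traces are sparse, a handful of (cell, hour) points is almost always a unique fingerprint. The sketch below uses made-up numbers (1,000 users, 200 cells, 24 hours, 20 points per trace) purely to illustrate the principle, not to reproduce the study:

```python
import random

random.seed(0)

# Synthetic traces: each user visits a handful of (cell, hour) points.
# All parameters are invented; the real study used mobile-carrier data.
N_USERS, N_CELLS, N_HOURS, TRACE_LEN = 1000, 200, 24, 20
traces = {
    u: {(random.randrange(N_CELLS), random.randrange(N_HOURS)) for _ in range(TRACE_LEN)}
    for u in range(N_USERS)
}

def unique_fraction(n_points):
    """Fraction of users pinned down uniquely by n_points from their own trace."""
    unique = 0
    for u, trace in traces.items():
        # Pick a few known points about the target user...
        sample = set(random.sample(sorted(trace), min(n_points, len(trace))))
        # ...and see how many users' traces contain all of them.
        matches = [v for v, t in traces.items() if sample <= t]
        if matches == [u]:
            unique += 1
    return unique / N_USERS

print(f"4 points identify {unique_fraction(4):.0%} of users uniquely")
```

On data this sparse the fraction comes out at or near 100%, which is the paper's point: removing names from a location dataset does little when the trace itself is the identifier.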

Cambridge research on network identification shows similar kinds of results.

In response to these publicly-aired concerns, the CEO of Ipsos MORI offered data to researchers:

Ben Page, Ipsos MORI ‏@benatipsosmori
@PlanetJamie39 @PaulbernalUK @patrick_kane_ I don't see why not. Should publish peer reviewed paper on this data

But there are better answers to the problem than waiting for a public outcry. These are:

  1. Ask for users’ permission before offering their anonymised data. Make this a legal requirement in data protection law, which is helpfully being debated right now.
  2. Open anonymisation techniques to peer review, so the best brains can help spot mistakes. Such approaches are taken in security software, e-voting software, and of course in Open Source software more widely.
  3. Offer “responsible disclosure” mechanisms for people to report mistakes they find, so data providers can fix the problem.

Mobile companies are not the only people playing with fire in this way. There are also government data initiatives, which are even more worrying, covering personal health, education and benefits data.

If you want to do something today, why not ask your MEP for strong data protection, as a first step?

Comments (2)

  1. Bert Eaton:
    May 14, 2013 at 10:39 AM

    Several thoughts

    1) 50 is too small to call aggregated data. More info is needed on how the subsets of 50 people’s data have been generated, but the assumption should be that they are not anonymised
    2) EE could be helpful by stating the nature of any conditions on subsequent use of the data by third parties. Have they a clear and enforceable contract with the third party that is in line with the basis on which they originally collected the data?
    3) Taking into account point 2 – when selecting the subsets of 50 customers, did they, for example, have sufficient data to exclude any customers who explicitly stated they did not want their data shared with a third party?

    There are many more issues here, not least that once the data is passed to a third party, control passes from A to B unless strictly and enforceably constrained by contract – and it is not at all clear how well controlled this is.

  2. Jim Killock:
    May 14, 2013 at 09:33 PM

    Agreed, 50 is a low number, so it depends on what is aggregated and what’s revealed. EE need to open up their systems for analysis, as they’ve said they will.