Dr Richard Clayton is a software developer by trade, but since 2000 he has been an academic at the University of Cambridge. He conducts research into email spam, fake bank “phishing” websites, and other Internet wickedness. He has assisted the APIG and APComms all-party groups of Parliamentarians in several inquiries into Internet issues, and has acted as a specialist adviser for Select Committees of both the Lords and Commons in various inquiries into Internet security topics. He is treasurer of FIPR (Foundation for Information Policy Research) and a member of the Open Rights Group (ORG) Advisory Council.
Surveillance technology is of two main types – equipment that keeps tabs on you in the physical world, and processes that track your activity “online” where computers keep a record of your communications and your financial activity.
The physical world is reasonably straightforward to understand. As is well known, very large numbers of CCTV cameras are installed in public and private spaces in the UK and a recording will be kept of what they see. The cameras may be fixed, or a remote operator may be able to choose where they point and how much they zoom in. The quality of the images captured varies very considerably – with older systems merely producing a blurry impression of what has occurred with little hope of identifying the people concerned. Newer systems can produce high quality material that will enable precise identification of individuals and may also capture audio to accompany the pictures.
Experimental systems process the recordings in real time, trying to match the images to mugshot galleries – with mixed results. The early trials, for example in Newham, were significantly overhyped – but face recognition technology continues to improve. Other systems attempt to identify individuals by their gait as they move from camera to camera, and others detect when people don’t follow the usual routes through an underground station or a car park – perhaps because they are ill, or suicidal, or are contemplating a theft.
The police ANPR (automatic number plate recognition) system records the index plates of vehicles passing an extensive system of cameras on major (and minor) roads. These systems permit immediate dispatch of interceptor vehicles when cars are stolen, or after-the-fact analysis to determine which vehicles were at a crime scene, where they came from and where they went to. More targeted deployments of cameras, linked to online databases, are used for random checks to determine if passing cars are insured and have a current vehicle excise licence.
The examples so far have been mass surveillance systems – more directed surveillance traditionally involved police officers staking out a house and noting all the comings and goings. Nowadays, less labour-intensive surveillance may involve cameras and microphones being surreptitiously installed into light switches, or into a car, the location of which is continuously recorded by an attached GPS tracking device. The police are also getting interested in using drones for surveillance – mainly as a cheaper (and quieter) alternative to the police helicopter.
“Online” tracking can be equally revealing of people’s actions and movements. The records created by an Oyster card user will show where they have travelled over the preceding couple of years; their bank and credit card records will show where they have been spending money and give an indication of what they have been buying. Companies like Tesco will, for those customers with loyalty cards, have complete records of their purchases and will be able to make educated guesses about their lifestyle, living arrangements, and some aspects of their medical history.
Mobile phones continuously interact with nearby cell towers so that incoming calls can be delivered. The phone companies are obliged to retain data about the location of a phone whenever a call is made or received, but if your phone is powered up then they have access to your location at all times and can provide this to law enforcement in real time if this is required. The location information can be extremely precise within cities, but in rural areas it may only give the most general of positions. However, many new phones are capable of learning their exact position using GPS and this information can be remotely accessed.
The records that telephone companies (both fixed line and mobile) keep can be rapidly interrogated to provide lists of calls made from any particular phone, or to any particular phone. These lists will also include the duration of the call and the physical location of the endpoints. Mobile handsets have a unique identifier called an IMEI; this is independent of the phone number which is set by the SIM card inserted into the phone. Call records can be identified either by the phone number or the IMEI device identifier – permitting the tracing of phone activity even when the SIM has been changed.
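The difference between searching by phone number and searching by IMEI can be illustrated with a small sketch. It is only an illustration: the record layout, the function names and the placeholder identifiers (numbers from the range reserved for drama use, made-up IMEI labels) are assumptions, not a description of any phone company's actual systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    caller_number: str  # phone number, set by the SIM card
    caller_imei: str    # handset identifier, independent of the SIM
    callee_number: str
    duration_s: int


# Hypothetical records: the handset "IMEI-A" is used with two different SIMs.
RECORDS = [
    CallRecord("07700900001", "IMEI-A", "07700900123", 62),
    CallRecord("07700900002", "IMEI-A", "07700900123", 15),  # SIM changed
    CallRecord("07700900003", "IMEI-B", "07700900456", 240),
]


def calls_by_number(records, number):
    """Calls made from a given phone number (that is, a given SIM)."""
    return [r for r in records if r.caller_number == number]


def calls_by_imei(records, imei):
    """Calls made from a given handset, whichever SIM was inserted."""
    return [r for r in records if r.caller_imei == imei]


by_number = calls_by_number(RECORDS, "07700900001")
by_imei = calls_by_imei(RECORDS, "IMEI-A")
```

In this sketch the search by number finds only the call made before the SIM was swapped, whereas the search by IMEI finds both calls from the same handset.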
When interaction is by email instead of by phone then the authorities can still get lists of who is communicating with whom. The email provider is obliged (if they are within the European Union) to keep records of who email was sent to or from, along with timestamp information and exactly how large each email was. Once again, law enforcement regularly requests lists of this email metadata, which can be indexed by sender or receiver.
However, when the email provider is not within the jurisdiction then they may not be obliged to keep records of activity or they may not be prepared to hand over substantial amounts of detailed information to foreign law enforcement. One way of tackling this problem is for UK ISPs to wiretap all the Internet traffic going back and forth to the foreign website and then to reconstruct the email metadata for that site from the wiretap results.
Proponents of this approach claim that by using DPI (deep packet inspection) equipment they can straightforwardly extract a summary listing of what is in someone’s mailbox – these are fixed format strings that are easy to locate on the page – whenever the user is looking at the relevant page. The content of the email would not be captured by this method – that would be “interception” and require special warrants – but the metadata created would be equivalent to what a UK-based provider is already required to keep.
It remains unclear how cost effective this DPI approach will be – as traffic levels increase, ever more DPI equipment would be needed. As a technique it is completely stymied by the use of web page encryption (which turns the traffic on the wire into incomprehensible blobs of data) and would doubtless be very fragile in that minor changes to the email website would require the data extraction code to be recast. The system could never be capable of handling more than a handful of foreign websites at any one time and operational security considerations make it unlikely that details about the currently targeted sites could be shared with the junior police officers who might use the data for their investigations – i.e. it’s hard to see how this sort of data could ever be used for anything outside the national security ambit.
There is, of course, all sorts of other surveillance-relevant information freely available on the Internet that anyone can access. The police call this “open source” data and it includes Facebook pages, Twitter feeds, postings on public web forums and so forth. Should any of this information be of interest, then the police will generally wish to ascertain who has posted it. The owner of the relevant website will, upon receipt of a properly formed request, be able to supply the “IP address” of the poster. Police will also be told the IP addresses of computers used in hacking attacks and other bad events. The ability to determine which user’s account was associated with a particular IP address at a particular time is called “traceability”.
Computers connected to the Internet are given a unique “IP address” and data packets are routed towards this IP address, so where there is direct two-way communication the address cannot be forged. The IP addresses are allocated to ISPs in contiguous blocks, so if it is necessary to determine “who did that?” then public records can be interrogated to determine which ISP was providing Internet service, and they can then consult their records to determine which customer was allocated the particular IP address at the relevant time. That information does not of course indicate whose fingers were on the keyboard, but it will clearly indicate where to look next, or whose door to break down.
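The two-step lookup just described (public allocation records first, then the ISP's own logs) might be sketched roughly as follows. Everything here is hypothetical: the ISP names, account labels and log layout are invented for illustration, and the addresses come from ranges reserved for documentation.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

# Step 1 data: public records of which ISP holds which contiguous block.
ALLOCATIONS = {
    ip_network("192.0.2.0/24"): "ExampleNet",
    ip_network("198.51.100.0/24"): "SampleISP",
}

# Step 2 data: each ISP's private log of (address, start, end, account).
ASSIGNMENTS = {
    "ExampleNet": [
        (ip_address("192.0.2.17"), datetime(2012, 6, 1, 8, 0),
         datetime(2012, 6, 1, 20, 0), "account-1001"),
        (ip_address("192.0.2.17"), datetime(2012, 6, 1, 20, 0),
         datetime(2012, 6, 2, 8, 0), "account-2002"),
    ],
}


def find_isp(ip):
    """Return the ISP whose allocated block contains the address, if any."""
    addr = ip_address(ip)
    for block, isp in ALLOCATIONS.items():
        if addr in block:
            return isp
    return None


def find_account(isp, ip, when):
    """Return the account that held the address at the given time, if logged."""
    addr = ip_address(ip)
    for assigned, start, end, account in ASSIGNMENTS.get(isp, []):
        if assigned == addr and start <= when < end:
            return account
    return None


isp = find_isp("192.0.2.17")
account = find_account(isp, "192.0.2.17", datetime(2012, 6, 1, 21, 30))
```

The timestamp matters as much as the address: in this sketch the same IP address belongs to a different account before and after 20:00, so a request without an accurate time could point at the wrong customer.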
This precise mapping of IP address to customer account is problematic for Internet access from smartphones – there are not enough spare IP addresses for every phone to get a unique allocation. Instead, phones share IP addresses in a very dynamic manner and although the mobile phone provider could record the detail of these allocations, at present they often fail to do so. Addressing this “traceability hole” is one of the key aims of current Home Office policies, although they’ve been rather coy about saying so, in the hope that criminals will not exploit the weakness.
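A rough sketch shows why shared addresses defeat a simple lookup. It assumes, as is common with address sharing but is not stated above, that simultaneous users of one public address are told apart by the source port of each connection. The session log layout, subscriber labels and function name are all invented for illustration.

```python
from datetime import datetime

# Hypothetical session log: (public_ip, source_port, start, end, subscriber).
# Several phones use the same public address within the same minute.
SESSIONS = [
    ("203.0.113.9", 40001, datetime(2012, 6, 1, 12, 0, 0),
     datetime(2012, 6, 1, 12, 0, 30), "phone-A"),
    ("203.0.113.9", 40002, datetime(2012, 6, 1, 12, 0, 5),
     datetime(2012, 6, 1, 12, 0, 40), "phone-B"),
    ("203.0.113.9", 40001, datetime(2012, 6, 1, 12, 1, 0),
     datetime(2012, 6, 1, 12, 1, 20), "phone-C"),
]


def candidates(ip, when, port=None):
    """Subscribers who could have been using the address at that moment.

    If a source port is supplied (an assumption: the website would have
    had to log it), only sessions on that port are considered.
    """
    return sorted(
        subscriber
        for session_ip, session_port, start, end, subscriber in SESSIONS
        if session_ip == ip
        and start <= when < end
        and (port is None or session_port == port)
    )


when = datetime(2012, 6, 1, 12, 0, 10)
without_port = candidates("203.0.113.9", when)
with_port = candidates("203.0.113.9", when, port=40002)
```

With only the address and time the sketch returns two possible phones; a unique answer needs both the extra detail in the request and a provider that actually kept the session records.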
So far, all of the surveillance and tracking systems have been considered in isolation, but this is about to change in a major way. One of the provisions of the draft Communications Data Bill is the creation of a data correlation system dubbed a “Filter”. This system will combine enormous amounts of data from different systems, hoping to identify activity that would not have been apparent within a single system.
It is fundamental to this proposal that Filter data should be collected on everyone’s activity and that this data should be made available en masse from the private companies, the ISPs and telephone companies that provide services, to government systems for the correlation processing. The data won’t necessarily be physically combined on a single system (in fact it would be poor engineering to do this) but it will be logically combined. The original collectors of the data will not have any knowledge of what it is being used for, or possibly even how much data is being processed, so there will be no opportunity for whistle-blowing should excesses occur.
This integrated processing promises to make it much harder for criminals to communicate over a diversity of systems and thereby avoid being tracked – records of phone calls, emails and tweets could be easily combined. But the system’s capabilities go much further than that and the type of “big data” system envisaged will be capable of complex data mining tasks. To take a fictional example from Charlie Brooker’s “National Anthem”, the source of a YouTube upload could be identified by the uniqueness of its size and timing; or, closer to real life, the source of an embarrassing leak could be identified by cross-correlating records to pick out exactly who in Whitehall sent out an email whose reception by a journalist triggered an immediate call to the relevant newspaper editor.
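The leak example amounts to joining two unrelated sets of records on time. A minimal sketch, with entirely invented addresses, phone labels and a five-minute window chosen arbitrarily, might look like this; nothing is known about how the proposed “Filter” would really be built.

```python
from datetime import datetime, timedelta

# Hypothetical email metadata: (sender, recipient, received_at).
EMAILS = [
    ("official-1@example.gov", "journalist@example.com", datetime(2012, 6, 1, 9, 0)),
    ("official-2@example.gov", "journalist@example.com", datetime(2012, 6, 1, 14, 2)),
    ("official-3@example.gov", "someone@example.com", datetime(2012, 6, 1, 14, 3)),
]

# Hypothetical call records: (caller, callee, started_at).
CALLS = [
    ("journalist-phone", "dentist-phone", datetime(2012, 6, 1, 9, 1)),
    ("journalist-phone", "editor-phone", datetime(2012, 6, 1, 14, 5)),
]


def likely_sources(emails, calls, journalist_email, journalist_phone,
                   editor_phone, window=timedelta(minutes=5)):
    """Senders whose email to the journalist was followed, within the
    window, by a call from the journalist to the editor."""
    hits = []
    for sender, recipient, received_at in emails:
        if recipient != journalist_email:
            continue
        for caller, callee, started_at in calls:
            delay = started_at - received_at
            if (caller == journalist_phone and callee == editor_phone
                    and timedelta(0) <= delay <= window):
                hits.append(sender)
    return hits


sources = likely_sources(EMAILS, CALLS, "journalist@example.com",
                         "journalist-phone", "editor-phone")
```

Neither data set identifies the source on its own; it is the combination, together with prior knowledge of which phone and mailbox belong to the journalist, that singles out one sender.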
The trade-off for these new insights into criminal activity is that more information must be automatically collected about everyone (“just in case”), it must be stored for long periods, measured in years, and it must be handed over to the government-operated Filter for processing with the inherent assumption that the processing will be necessary, proportionate and authorised. There is tremendous scope for misusing such a system; a police state would relish the opportunity of correlating data on everyone out on the streets for a demonstration, everyone gathering in groups behind closed doors – or just collating a list of everyone who passed on an email containing a subversive joke. The complexity and secrecy of the proposed “Filter” system will make it extremely challenging to ensure that misuse, or just simple “mission creep”, does not occur.