The Liberation of Mobile Location Data and its Implications for Privacy Research ∗

Gennady Andrienko a, Aris Gkoulalas-Divanis b, Marco Gruteser c, Christine Kopp a, Thomas Liebig a, Klaus Rechert d

a Fraunhofer IAIS, St. Augustin, Germany
b IBM Research, Dublin, Ireland
c Rutgers University, New Brunswick, U.S.A.
d University of Freiburg, Freiburg i. B., Germany

∗ Revised and Enhanced Dagstuhl Report – Working Group on Cellular Data. Authors are listed in alphabetical order.

With the emergence of the mobile app ecosystem, user location data has escaped the grip of the tightly regulated telecommunication industry and is now being collected at unprecedented scale and accuracy by mobile advertising, platform, and app providers. This paper is based on discussions of the authors at the Dagstuhl seminar on Mobility Data Mining and Privacy. It seeks to highlight this shift by providing a tutorial on location data flows and associated privacy risks in this mobile app ecosystem. It then also reflects on the implications for the mobile privacy research community.

I. Introduction

The phones we carry around as we go about our daily lives do not only provide a convenient way to communicate and access information, but also pose privacy risks by collecting data about our movements and habits. For example, they can record when we get up in the morning, when we leave our homes, whether we violate speed limits, how much time we spend at work, how much we exercise, whom we meet, and where we spend the night. The places we visit allow inferences about not just one, but many potentially sensitive subjects: health, sexual orientation, finances or creditworthiness, religion, and political opinions. For many, such inferences can be embarrassing, even if they are untrue and simply misinterpretations of the data. For some, this movement data can even pose a danger of physical harm, such as in stalking cases.

These risks have been amplified by the emergence of smartphones and the app economy over the last few years. We have witnessed a fundamental shift in mobility data collection and processing from a selected group of tightly regulated cellular operators to a complex web of app providers and Internet companies. This new ecosystem of mobility data collectors relies on a more sophisticated mix of positioning technologies to acquire increasingly precise mobility data. In addition, smartphones also carry a much richer set of sensors and input devices, which allow collection of a diverse set of other data types in combination with the mobility data. Many of these types of data were previously unavailable. While individual aspects of these changes have been highlighted in a number of articles as well as in a string of well-publicized privacy scandals, the overall structure of current mobility data streams remains confusing.

This position paper intends to survey this new mobility data ecosystem and to discuss the implications of this broader shift. The survey includes the types of data collected, the positioning technologies involved, and the purpose of the collection as well as privacy threats resulting from such data. We begin in Section II by reviewing how cellular networks have to monitor the location of subscriber phones to be able to route incoming calls and to provide the ability to locate emergency callers (known as E911 in the US). We survey the technologies used by operators to determine the position of a phone, describe how this location information at operators is stored, and how it is accessed by law enforcement entities [27, 26] and selected application service providers. We then describe how smartphone apps can directly acquire position and movement information from the handset, without assistance from cellular operators. This can involve different positioning technologies based on crowdsourced maps of WiFi access points and cell sectors, which poses additional privacy risks. We further discuss several classes of apps, such as location-based applications that collect position information to deliver location-targeted information and advertising-supported apps that collect position and mobility information for targeted ads. In addition to the commercial use of mobile network and application data, we present the scientific point of view on the data.

In Section III we discuss privacy threats and risks that arise from the data collection. In particular, we distinguish risks for the three types of collected data. First, we assume that location information is available along with a personal identifier. Second, we relax this notion and assume (a series of) anonymous location data. Finally, we consider how location information about a user may be derived even though no georeference is included in the collected data.

We conclude our work with a section on the manifold implications of this rapidly evolving mobility data ecosystem. We find that it is difficult to understand the data flows, apparently even for the service providers and operators themselves [7, 17, 24]. There appears to be a much greater need for transparency, perhaps supported by technical solutions that monitor and raise awareness of such data collection. We find that location data is increasingly flowing across national borders, which raises questions about the effectiveness of current regulatory protections. We also find that applications are accessing a richer set of sensors, which allows cross-referencing and linking of data in ways that are not yet fully understood.

II. Location Data Collection

In order to assess privacy risks posed by location and mobility data, we first give an overview of current data collection practices and the general characteristics of such data. We distinguish three groups of data collectors (observers): mobile network operators (MNO), mobile platform service providers (MPSP), and application service providers (ASP). While for the first group of observers (MNOs) location data is generated and collected primarily for technical reasons, i.e. efficient signaling, in the case of MPSPs and ASPs location information is usually generated and collected to support positioning, mapping, and advertising services and to provide various kinds of location-based services. Figure 1 provides a schematic overview of the location data generated by mobile phones and also highlights the specific components and building blocks of mobile phones that are controlled by the different entities. Furthermore, available location data originating from the aforementioned layers may be re-used to support various new (third-party) businesses. Typically the data is then anonymized or aggregated in some way before being transferred to third parties.

Figure 1: A schematic overview of today’s smartphone, its essential building blocks and their controllers, illustrating the generation and information flow of location data. (Labels in the figure: Application CPU, Baseband CPU, Mobile Platform, APP, MNO, MPSP, Law Enforcement, E911, ASPs, 3rd Party Re-Use of Location Data.)

The primary, and usually most accurate, source for location information that we associate with a mobile phone is a Global Navigation Satellite System (GNSS) such as GPS. However, any mobile radio-based communication system can either be used to acquire location data (e.g. through triangulation) or implicitly generates location data (e.g. by cell / network association). Section II.A discusses details on location information in mobile telephony networks. WiFi-based positioning and various combinations thereof are briefly discussed in Section III.A.

In today’s smartphones the GNSS unit is usually bundled with a so-called baseband processor, which is an autonomous CPU running the mobile network stack, e.g., handling calls, network attachment etc. An MNO is required by regulation to provide a device’s location within 50 to 300 meters in the case of an emergency (E-911; see footnote 1). Due to the bundling of GNSS and baseband CPU, accurate positioning of an individual device becomes possible even in situations where cellular network positioning is difficult or too inaccurate (e.g. in rural areas).

Baseband and application CPU are both conceptually and physically separated. On today’s smartphones the user interacts solely with the mobile platform, i.e. a sophisticated mobile operating system, which runs on the application (generic) CPU. The mobile platform concentrates various location information sources into a single location API and further implements and enforces user-defined privacy policies. The application layer (so-called apps) connects to such an API in order to retrieve and use location data.

Footnote 1: FCC Enhanced 911 Wireless Service, http://www.fcc.gov/pshs/services/911-services/enhanced911.

II.A. Collection and Usage of Mobile Telephony Network Data

As an example for mobile telephony networks we discuss the widely deployed GSM infrastructure, as its successors UMTS (3G) and LTE (4G) have a significantly smaller coverage and share most of its principal characteristics. A typical GSM network is structured into cells, each served by a single base transceiver station (BTS). Larger cell-compounds are called location areas. To establish a connection to the mobile station (MS), e.g. in the case of an incoming connection request, the network has to know if the MS is still available and in which location area it is currently located. To cope with subscriber mobility the location update procedure was introduced. Either periodically or when changing the location area, a location update is triggered. The time lapse between periodic location updates is defined by the network and varies between infrastructure providers.

Additionally, the infrastructure’s radio subsystem measures the distance of phones to the serving cell to compensate for the signal propagation delay between the MS and BTS. The timing advance (TA) value (8-bit value) is used to split the cell radius into virtual rings. In the case of GSM each of these rings is roughly 550 m wide. The TA is regularly updated and is sent by the serving infrastructure to each mobile phone. In the following we will provide details on the different positioning methods available to MNOs.
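As a rough illustration of how coarse a TA-based distance estimate is, the following sketch maps a TA value to the distance ring around the BTS in which the phone is located. It assumes the GSM TA range of 0 to 63 and a ring width of 550 m; with these assumptions the outermost ring ends at about 35 km, which is consistent with the maximum cell size mentioned in Section II.A.2. The function and constant names are illustrative only.

```python
# Assumptions (not taken from the paper): TA is an integer in 0..63 and
# each TA step corresponds to one ring of about 550 m around the BTS.
RING_WIDTH_M = 550.0
MAX_TA = 63


def ta_to_distance_ring(ta: int) -> tuple[float, float]:
    """Return the (inner, outer) distance in metres from the BTS for a TA value."""
    if not 0 <= ta <= MAX_TA:
        raise ValueError(f"TA must be between 0 and {MAX_TA}, got {ta}")
    inner = ta * RING_WIDTH_M
    return inner, inner + RING_WIDTH_M


if __name__ == "__main__":
    for ta in (0, 1, 10, MAX_TA):
        inner, outer = ta_to_distance_ring(ta)
        print(f"TA={ta:2d}: between {inner:.0f} m and {outer:.0f} m from the BTS")
```

Note that the TA alone only constrains the distance to the serving BTS, not the direction; combined with the sector ID it narrows the position to a ring segment.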

II.A.1. Active Positioning

To obtain a position of an idle phone, active communication, e.g., voice/text/data transmission but also protocol-related communication like location updates, IMSI attach, etc., is required. Thus, the network either has to wait for the next active period of the MS (e.g. phone call, location update) or has to trigger MS activity. This can be achieved by transmitting a so-called silent text message in order to force an active communication without raising the user’s awareness.

Active positioning methods yield immediate and more accurate results. These methods work without special requirements on the mobile station and achieve a positioning accuracy of up to 50 m in urban areas (TDOA) [38]. However, there are additional costs involved (e.g. network utilization, suitable infrastructure such as an SMLC) and thus an incentive and a dedicated target are required. In general, active GSM positioning methods are not suitable for location tracking of massive amounts of people, but are a valuable and quite accurate tool to track individuals.

II.A.2. Call Data Records

For billing purposes so-called call data records (CDR) are generated. Such a record usually consists of the cell ID where a call has been started (either incoming or outgoing), the cell where a call has been terminated, start time, duration, ID of the caller and the phone number called. A typical GSM cell size ranges from a few hundred meters in diameter to a maximum size of 35 km. In a typical network setup a cell is further divided into three sectors. In that case the sector ID is also available and is part of the call record. CDRs are usually not available in real time. However, MNOs store CDRs for a certain time span, either because of legal requirements (e.g. EU data retention directive [13]) or for accounting purposes.
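To make the location content of a CDR concrete, the following sketch models a record with the fields listed above and derives the two (user, cell, time) observations that a single call implies. The class, field, and function names are hypothetical; actual operator formats differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class CallDataRecord:
    """Illustrative CDR layout; field names are assumptions, not an operator format."""
    caller_id: str
    called_number: str
    start_cell_id: str
    end_cell_id: str
    start_time: datetime
    duration: timedelta
    sector_id: Optional[int] = None  # only present if the cell is sectorized


def location_observations(cdr: CallDataRecord) -> list[tuple[str, str, datetime]]:
    """Derive the (user, cell, time) observations implied by one CDR."""
    end_time = cdr.start_time + cdr.duration
    return [
        (cdr.caller_id, cdr.start_cell_id, cdr.start_time),
        (cdr.caller_id, cdr.end_cell_id, end_time),
    ]
```

Even without any dedicated positioning, each billed call therefore yields two time-stamped, cell-level positions of an identified subscriber.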

II.A.3. Alternative, Non-Standard Positioning Methods

Another (non-standard) method to determine the location of an MS is to make use of received signal strength measurement results. Usually based on databases derived from signal propagation models used during the planning phase of the infrastructure, this data can be exploited to create a look-up table for signal measurements to determine the MS’s location. Based on the cell, TA and received signal strength of the serving cell as well as the six neighboring cells, [42] achieved a positioning accuracy of below 80 m in 67% of cases and below 200 m in 95% of cases in an urban scenario. In a recent study using RSSI in combination with map information and movement prediction, [6] achieved an accuracy of less than 19 m in 50% of cases and less than 64 m in 95% of cases. While these methods seem cost effective (no changes to infrastructure or protocols required) and technically feasible, they are usually not yet available in current deployments.
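The look-up idea can be sketched as a nearest-neighbor search in signal space. This is a deliberately minimal illustration and not the method of [42] or [6]: the fingerprint table, the cell names, and the value used for cells that are not heard are all invented for the example.

```python
import math

# Hypothetical look-up table: position (x, y) in metres -> expected RSS in dBm per cell.
FINGERPRINTS = {
    (0.0, 0.0): {"cell_a": -60.0, "cell_b": -85.0, "cell_c": -90.0},
    (100.0, 0.0): {"cell_a": -70.0, "cell_b": -75.0, "cell_c": -88.0},
    (100.0, 100.0): {"cell_a": -82.0, "cell_b": -68.0, "cell_c": -74.0},
}
MISSING_DBM = -110.0  # assumed placeholder for a cell that is not received


def signal_distance(measured: dict, expected: dict) -> float:
    """Euclidean distance between two RSS vectors over the union of their cells."""
    cells = set(measured) | set(expected)
    return math.sqrt(sum(
        (measured.get(c, MISSING_DBM) - expected.get(c, MISSING_DBM)) ** 2
        for c in cells
    ))


def estimate_position(measured: dict, table: dict = FINGERPRINTS) -> tuple:
    """Return the table position whose expected RSS vector is closest to the measurement."""
    return min(table, key=lambda pos: signal_distance(measured, table[pos]))
```

For example, `estimate_position({"cell_a": -69.0, "cell_b": -76.0, "cell_c": -87.0})` selects the entry at (100.0, 0.0), because its expected signal levels differ from the measurement by only 1 dB per cell.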

II.A.4. Re-Use of Cellular Data

Mobile telephony networks and their physical characteristics are able to help locate mobile phone users in the case of an emergency and may be a valuable tool for search and rescue (SAR) [30]. For instance, [8] analyzed post-disaster population displacement using SIM-card movements in order to improve the allocation of relief supplies.

Furthermore, location information gathered through mobile telephony networks is now a standard tool for crime prosecution and is covered by the EC Data Retention Directive, which aims at reducing the risk of terror and organized crime [13]. As an example, law enforcement officials seized CDRs over a 48 hour timespan, resulting in 896,072 individual records containing 257,858 call numbers, after a demonstration in Dresden, Germany, turned violent [26]. Further, the police of North Rhine-Westphalia issued 225,784 active location determinations on 2,644 different subjects in 778 preliminary proceedings in 2010 [34]. While in principle law enforcement could also collect location and movement data from MPSPs and ASPs, difficulties arise if such data is stored outside of the respective jurisdiction.

Additionally, commercial services are based on the availability of live mobility patterns of larger groups (e.g. for traffic monitoring or location-aware advertising [25]). Thus, location information of network subscribers might be passed on to third parties. Usually, subscribers are neither aware of the extent of their information disclosure (just by carrying a switched-on mobile phone), nor of how the collected data is used and by whom. Even sporadic disclosure of location data, e.g. through periodic location updates, is able to disclose a user’s frequently visited places (i.e. preferences) with an accuracy similar to continuous location data after 10-14 days [36].

II.B. Collection and Usage of Data Through MPSPs and ASPs

Mobile advertising is one of the fastest growing advertising media, with its yearly revenue predicted to double over the next years [14]. The availability of new powerful mobile devices (e.g. smartphones) in combination with comprehensive and affordable mobile broadband communication has given rise to this new generation of advertising media, which allows up-to-date information to be delivered in a context-aware and personalized manner. However, personal information such as a user’s current location and personal preferences is a prerequisite for tailored advertisement delivery. Consequently, MPSPs and ASPs are interested in profiling users, and personal data is disclosed in an unprecedented manner to various (unknown) commercial entities, which poses serious privacy risks.

In general, mobile advertising is similar to online advertising, involving the four main entities: advertiser, publisher, client and broker [19]. The advertiser is interested in promoting his goods or services and provides the ad content. Publishers (e.g. websites, apps) provide the space to place an advertisement (e.g. in the form of a banner) and clients refer to the devices that receive the advertisement. Brokers play the major role in the advertising ecosystem as they connect advertisers, publishers and clients. In a mobile advertising setting they are also known as ad networks. Their task is the optimal placement of advertisements given the available pool of advertising space, advertisements and users that can be addressed. Smartphones are ideal clients for targeted advertising because they are typically used by a single user. In consequence, brokers are highly interested in personal information stored or available on smartphones that allows a user’s interests to be inferred. As MPSPs and ASPs are potentially able to access such information, they have become major players in the advertising ecosystem. Often, they even subsume the roles of publisher and broker. For instance, Google offers advertising space in its search engine and owns the mobile advertising company called AdMob. Similarly, Apple acquired the mobile advertising platform iAd.

Over the past years, researchers and journalists have started to analyze apps and mobile operating systems w.r.t. the collection of personal data [16, 22, 11, 1, 37, 39, 7]. The analyses show that sensitive information is accessed and transmitted. The following sections provide an overview of how data is collected and examples of personal information that is or was collected by MPSPs and ASPs.

II.B.1. MPSP – Positioning & Profiling Services

There are three reasons for MPSPs to collect location information: positioning, mapping, and advertising. These services can also be provided by third parties but in practice they have become so important for the mobile ecosystem that they are usually linked into the mobile platform itself and offered by the mobile platform service provider.

For instance, mobile platform providers utilize their installed base to create or support new (commercial) services based on crowdsourced data. Even though a mobile phone may not be equipped with GPS, a position may be obtained by approximate location determination based on mobile telephony infrastructure or WiFi. The sensor data is sent to external services and external information sources are used to improve (i.e. speed up) the determination of the user’s current location. Thus, the modeling of the user’s context is no longer conducted solely on the user’s device. Like the user’s location, the data necessary for calculating and displaying requested information is usually no longer stored on the user’s device. It is downloaded to the user’s device only on demand. Therefore, the user’s whereabouts (as well as the user’s preferences) have to be transmitted to the service provider frequently.

By aggregating location information of many users, such information could improve or enable new kinds of services. For instance, Google Mobile Maps makes use of user-contributed data (with the user’s consent) to determine and visualize the current traffic situation.

In Spring 2011 it was found that Apple’s iPhone generates and stores a user’s location history, more specifically, data records correlating visible WiFi access-points or mobile telephony cell-ids with the device’s GPS location on the user’s phone. Moreover, the recorded data-sets are frequently synchronized with the platform provider. Presumably, this data is used by MPSPs to improve database-based, alternative location determination techniques for situations where GNSS or similar techniques are not available or not operational. Thus, re-visited locations are stored on the phone irregularly. Therefore they are not suitable for identifying frequently visited places and providing semantic interpretations to routine trips and activities [2]. Still, the stored locational information is sufficient for inferring which places have been visited by a phone owner or, in contrast, which places were not attended. Such information can be harmful to personal privacy, too, if a person was expected to visit some places due to his obligations.

II.B.2. ASP – Personalization of Mobile Context

In order to perform dedicated tasks, apps also access other data such as the user’s contacts, calendar and bookmarks as well as sensor readings (e.g. camera, microphone). If these apps have access to the Internet, they are potentially able to disclose this information and are a serious threat to user privacy [22]. Most often, advertisement libraries (e.g., as part of an app) require access to the phone information and location API [16] in order to obtain the phone’s IMEI number and geographic position.

For instance, Apple Siri records, stores and transmits any spoken request to Apple’s cloud-based services where it is processed through speech recognition software, is analyzed to be understood, and is subsequently serviced. The computed result of each request is communicated back to the user. Additionally, to fully support inferencing from context, Siri is “expected to have knowledge of users’ contact lists, relationships, messaging accounts, media (songs, playlists, etc) and more” (see footnote 2), including location data to provide the context of the request, which are communicated to Apple’s data center. As an example of Siri’s use of location data, users are able to geo-tag familiar locations (such as their home or work) and set a reminder when they visit these locations. Moreover, user location data is used to enable Siri to support requests for finding the nearest place of interest (e.g., restaurant) or to report the local weather.

II.C. Specifics of Episodical Movement Data

Most of the data collected by MNOs, MPSPs and ASPs can be referred to as ’Episodical Movement Data’: data about spatial positions of moving objects where the time intervals between the measurements may be quite large and therefore the intermediate positions cannot be reliably reconstructed by means of interpolation, map matching, or other methods. Such data can also be called ’temporally sparse’; however, this term is not very accurate since the temporal resolution of the data may greatly vary and occasionally be quite fine.
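One simple way to respect this property in an analysis is to split a sequence of positions wherever the time gap is too large for interpolation to be trusted, and to interpolate only within the resulting segments. The sketch below shows this under stated assumptions: the record layout (timestamp, x, y) and the gap threshold are choices of the analyst and are not prescribed by the text.

```python
from datetime import datetime, timedelta


def split_at_gaps(records, max_gap):
    """Split time-sorted (timestamp, x, y) records into segments.

    A new segment starts whenever two consecutive records are more than
    max_gap apart, i.e. where intermediate positions should not be
    reconstructed by interpolation.
    """
    segments = []
    current = []
    for record in records:
        if current and record[0] - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append(record)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    day = datetime(2012, 9, 6)
    records = [
        (day + timedelta(hours=8, minutes=0), 0.0, 0.0),
        (day + timedelta(hours=8, minutes=1), 50.0, 10.0),
        (day + timedelta(hours=11, minutes=30), 4000.0, 2500.0),
    ]
    for segment in split_at_gaps(records, max_gap=timedelta(minutes=10)):
        print(len(segment), "record(s) from", segment[0][0], "to", segment[-1][0])
```

Because the temporal resolution may vary greatly within one data set, a single threshold is only a heuristic; segments consisting of a single record carry no movement information at all.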

Three types of uncertainty in episodic movement data are identified in [5]. First, the common type of uncertainty is the lack of information about the spatial positions of the objects between the recorded positions (continuity), which is caused by large time intervals between the recordings and by missed recordings. Second, a frequently occurring type of uncertainty is imprecision of the recorded positions (accuracy). Due to these two types of uncertainty, episodic movement data cannot be treated as continuous trajectories, i.e., unbroken lines in the spatio-temporal continuum such that some point on the line exists for each time moment. Third, the number of recorded objects (coverage) may also be uncertain due to the usage of a service or due to the utilized sensor technology. For example, one individual may carry two or more devices, which will be registered as independent objects. On the other hand, some techniques only capture devices which are turned on. The activation status may change while a device carrier moves.

Footnote 2: http://privacycast.com/siri-privacy-and-data-collection-retention/, Online, Version of 9/6/2012

Figures 2 and 3 emphasize the differences between continuous and smooth GPS-based trajectories and discrete and abrupt phone-based trajectories. The two images on the left show a map (Figure 2) and a so-called space-time cube representation (Figure 3) of a one-day car trajectory in Milan, Italy. Similarly, the two images on the right show a map and a space-time cube of a single-day phone-based trajectory.

It should be noted that many of the existing data analysis and privacy preservation methods designed for dealing with movement data are explicitly or implicitly based on the assumption of continuous movement of objects between the measured positions and are therefore not suitable for episodic data. Interpolation is obviously involved in the visual representation of trajectories by continuous lines but it is also implicitly involved in the computation of movement speeds, directions, and other attributes characterizing the movement (these computations also assume that the positions are precise). The same holds for the summarization of movement data in the form of density or vector fields. Mining methods for finding patterns of relative or collective movement of two or more objects (e.g. meeting or flocking) also require fine-resolution data. Since many of the existing methods are not applicable to episodic movement data, there is a need to find suitable approaches for analyzing this kind of data and for retrieving data for the unobserved locations [29, 28].

III. Privacy Threats and Risks

From a business perspective, mobility data with sufficiently precise location estimation are often valuable data enabling various location-based services; from the perspective of privacy advocates, such insights are often deemed a privacy threat or a privacy risk. Location privacy risks can arise if a third party acquires a data tuple (user ID, location), which proves that an identifiable user has visited a certain location. In most cases, the datum will be a triple that also includes a time field describing when the user was present at this location. Note that in theory there are no location privacy risks if the user cannot be identified or if the location cannot be inferred from the data. In practice, however, it is difficult to determine when identification and such inferences are possible.

Recently, several location privacy incidents were reported in the media. A famous incident regards the case of Apple [9], where 3G Apple iOS devices were reported to store the location of their users in unencrypted form for a period of over one year. This precise location information was stored without the knowledge of the users and was transmitted to the iTunes application during the synchronization of the device. According to Apple, the stored location information was not used to track the users but was attributed to a programming error which was later fixed with a software update.

Google was also reported to be using precise location data, collected from users’ mobile devices, to improve the accuracy of its navigation services [20], while Microsoft [32] recently admitted that their camera application in Windows Phone 7 ignored the users’ privacy settings to disable transmitting their location information to Microsoft. In response to this incident, the company issued a software update.

Although the above-mentioned privacy incidents did not lead to actual harm to the individuals concerned, the continual flurry of such breaches is worrying as it becomes evident that sensitive location information may easily fall into the wrong hands [10]. In the following subsections, we elaborate on different types of privacy risks leading to user identification or to sensitive location inferences.

III.A. Collection of location information with assigned user ID

This is the most trivial case, as long as the location of the user is estimated with sufficient accuracy for providing the intended LBS. In case the location is not yet precise enough, various techniques (e.g. fusion of several raw location data from various sensors) allow for improving the accuracy.

• Example 1.1: The MNO routinely stores tuples of the form (cell ID and sector ID, user ID), e.g. within the CDR data.

• Example 1.2: The ASP gets the GPS location of a user who has already been identified, e.g. by his/her log-in to the ASP or by a payment transaction.

• Example 1.3: From a smartphone, the ASP receives the IDs and signal strengths of several nearby transmitters (base stations, WiFi devices, ...). Based on previously established maps of these transmitters, the ASP is able to estimate a more precise location.
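The estimation step in Example 1.3 can be illustrated with a weighted centroid, one of the simplest positioning heuristics. This is only a sketch under stated assumptions: the transmitter map, its IDs and coordinates, and the function name are hypothetical, and the techniques surveyed in the literature are considerably more elaborate.

```python
from typing import Dict, Tuple

# Hypothetical, previously established transmitter map: ID -> (latitude, longitude).
TRANSMITTER_MAP: Dict[str, Tuple[float, float]] = {
    "wifi:aa": (50.7500, 7.2000),
    "wifi:bb": (50.7510, 7.2000),
    "cell:17": (50.7500, 7.2020),
}


def estimate_position(
    observations: Dict[str, float],
    transmitter_map: Dict[str, Tuple[float, float]],
) -> Tuple[float, float]:
    """Weighted centroid of the known transmitters that were observed.

    observations maps a transmitter ID to its received signal strength in dBm.
    Stronger signals get larger weights (assumption: a stronger signal
    means the transmitter is closer).
    """
    total = lat_sum = lon_sum = 0.0
    for tx_id, rssi_dbm in observations.items():
        if tx_id not in transmitter_map:
            continue  # transmitter not in the map, so it cannot contribute
        weight = 10 ** (rssi_dbm / 10.0)  # convert dBm to linear power (mW)
        lat, lon = transmitter_map[tx_id]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total += weight
    if total == 0.0:
        raise ValueError("no observed transmitter is in the map")
    return lat_sum / total, lon_sum / total
```

With two equally strong transmitters the estimate is their midpoint; a much stronger signal from one of them pulls the estimate toward that transmitter.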

Figure 2: Comparison of continuous GPS (left) vs. episodic phone-based trajectories (right)

Figure 3: Time-space cube comparison of continuous GPS (left) vs. episodic phone-based trajectories (right)

Additionally, the ASP may have direct access to a variety of publicly available spatial and temporal data such as

• geographical space and inherent properties of different locations and parts of the space (e.g. street vs. park)

• various objects existing or occurring in space and time: static spatial objects (having particular constant positions in space), events (having particular positions in time), and moving objects (changing their spatial positions over time)

Such information either exists in explicit form in public databases like OSM and WikiMapia or in the ASP’s data centers, or can be extracted from publicly available data by means of event detection or situation similarity assessment [3][4]. Combining such information with the positions and identities of users allows a deep semantic understanding of their habits, contacts, and lifestyle.

III.B. Collection of anonymous location information

When location data is collected without any obvious user identifiers, privacy risks are reduced and such seemingly anonymous data is usually exempted from privacy regulations. It is, however, often possible to re-identify the user based on quasi-identifying data that has been collected. Therefore, the aforementioned risks can apply even to such anonymous data.

The degree of difficulty in re-identifying anonymous data depends on the exact details of the data collection and anonymization scheme and on the adversary’s access to background information. Consider the following examples:

Re-identifying individual samples. Individual location records can be re-identified through observation re-identification [31]. Suppose the adversary knows that user Alice was the only user in location (area) l at time t, perhaps because the adversary has seen her at this location or because records from another source prove it. If the adversary now finds an anonymous datum (l, t) in the collected mobility data, the adversary can infer that this datum could only have been collected from Alice and has thus re-identified it. In this trivial example, there is actually no privacy risk from this re-identification because the adversary knew a priori that Alice was at location l at time t, so the adversary has not learned anything new.

There are, however, three important variants of this trivial case that can pose privacy risks. First, the anonymous datum may contain a more precise location l′ or a more precise time t′ than the adversary knew about a priori. In this case, the adversary learns this more precise information. Second, the adversary may not know that Alice was at l but simply know that Alice is the only user who has access to location l. In this case, also referred to as restricted space identification, the adversary would learn when Alice was actually present at this location. Third, the anonymous datum may contain additional fields with potentially sensitive information that the adversary did not know before. Note, however, that such additional information can also make the re-identification task easier.
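The matching step behind observation re-identification and restricted space identification can be sketched in a few lines. The record layout, field names, and sample values below are hypothetical and serve only to illustrate the reasoning; they are not taken from [31].

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical anonymous datum: no user ID, but location, time, and extras."""
    area: str      # location l, e.g. a cell or grid ID
    time: int      # time slot t
    payload: str   # additional field that may be sensitive


def matching_records(
    records: List[Record], area: str, time: Optional[int] = None
) -> List[Record]:
    """Return the anonymous records consistent with the adversary's knowledge.

    With both area and time given, this models observation re-identification:
    Alice is known to be the only user at (area, time).
    With time=None, it models restricted space identification:
    Alice is known to be the only user with access to the area.
    """
    return [
        r for r in records
        if r.area == area and (time is None or r.time == time)
    ]


records = [
    Record("cell-7", 10, "app=maps"),
    Record("cell-7", 11, "app=chat"),
    Record("cell-9", 10, "app=news"),
]

# Observation re-identification: the adversary learns the extra payload field.
alice_observed = matching_records(records, "cell-7", 10)

# Restricted space identification: the adversary learns when Alice was present.
alice_times = [r.time for r in matching_records(records, "cell-7")]
```

In both cases the adversary gains information only from fields it did not know beforehand: the payload in the first call and the time slots in the second.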

Re-identifying time-series location data. Re-identification can also become substantially easier when location data is repeatedly collected and time-series location traces are available. We refer to time-series location traces, rather than individual location samples, when it is clear which set of location samples was collected from the same user (even though the identity of the user is not known). For example, the location data may be stored in separate files for each user or a pseudonym may be used to link multiple records to the same user.

Example 2.1: A partner of the MNO has obtained anonymized traces of a user, e.g. as a sequence of CDRs from which all user IDs have been removed. While this looks like anonymous location data, various approaches exist to re-identify the user associated with these mobility traces. One approach is to identify the top 2 locations where the user has spent most of his/her time. In many cases, these correspond to the home and work locations of the user.

Empirical research [15] has further observed that the pair (home location, work location) often already identifies a unique user. A recent empirical study [41] explains various approaches for re-identification of a user. Another paper has analyzed the consequences of increasingly strong re-identification methods for privacy law and its interpretation [35].
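The top-2-locations approach from Example 2.1 can be sketched as follows. The sketch assumes that a trace is a list of cell IDs (one per CDR), uses the number of records per cell as a rough proxy for time spent, and matches against a hypothetical directory of known (home, work) pairs; none of these details come from [15] or [41].

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def top_two_locations(trace: List[str]) -> FrozenSet[str]:
    """Return the two most frequent locations in an anonymous trace."""
    counts = Counter(trace)
    return frozenset(loc for loc, _ in counts.most_common(2))


def candidate_users(
    trace: List[str], directory: Dict[str, Tuple[str, str]]
) -> List[str]:
    """Return users whose known (home, work) pair equals the trace's top-2 set.

    directory is hypothetical background knowledge: user name -> (home, work).
    """
    pair = top_two_locations(trace)
    return [
        name for name, (home, work) in directory.items()
        if frozenset((home, work)) == pair
    ]


# Anonymous trace: most records fall into cells c1 and c2.
trace = ["c1"] * 5 + ["c2"] * 4 + ["c3"]
directory = {"alice": ("c1", "c2"), "bob": ("c1", "c3")}
suspects = candidate_users(trace, directory)
```

If the returned list contains exactly one name, the trace has been re-identified under these assumptions; a longer list corresponds to a larger anonymity set.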

Further re-identification methods for location data rely on various inference and data mining techniques.

III.C. Collection of data without location

Even in the absence of actual location readings provided by positioning devices, location disclosures may occur by means of other modern technologies. Recent work by Han et al. [23] demonstrated that the complete trajectory of a user can be revealed with 200 m accuracy by using accelerometer readings, even when no initial location information is known. What is even more alarming is that accelerometers, typically installed in modern smartphones, are usually not secured against third-party applications, which can easily obtain such readings without requiring any special privileges. Acceleration information can thus be transmitted to external servers and be used to disclose user location even if all localization mechanisms of the mobile device are disabled.

Another example of privacy disclosures in mobile devices regards the monitoring of user screen taps through the use of accelerometer and gyroscope readings. Recent work by Miluzzo et al. [33] demonstrated that the locations of user taps on the display and the letters entered on a mobile device can be silently identified with high precision through the use of motion sensors and machine learning analysis. Their prototype implementation achieved tap location identification accuracy as high as 90%, demonstrating in practice that malevolent applications installed on mobile devices may severely compromise the privacy of their users.

Last but not least, several privacy vulnerabilities may be exposed through the various resource types that are typically supported and communicated by modern mobile phone applications. Hornyack et al. [22] examined several popular Android applications which require both internet access and access to sensitive data, such as location, contacts, camera, microphone, etc., for their operation. Their examination showed that almost 34% of the top 1100 popular Android applications required access to location data, while almost 10% of the applications required access to the user contacts. As can be anticipated, access of third-party applications to such sensitive data sources may lead to both user re-identification and sensitive information disclosure attacks, unless privacy-enabling technology is in place.

• Example 3.1: During a vacation, a user has shot many photos, all of which are tagged with a timestamp but not geotagged. There are techniques to assign a geolocation to most of these photos, as long as they contain some unique features. Similarly, there are techniques to assign real names to most persons in these photos, e.g. by using tools or crowd-sourcing as provided by a social network or other photo storage platforms. Given the times and places of a photo stream, one might reconstruct precise trajectories.

• Example 3.2: An app is able to continuously read the accelerometer of a handset. It can then reconstruct a 3D trace of the user’s movements.
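To make Example 3.2 concrete, the following sketch reconstructs a relative 3D trace by integrating acceleration samples twice. It is a naive dead-reckoning illustration, not the method of [23]: it assumes gravity has already been removed, the device orientation is fixed, the sampling interval is constant, and the user starts at rest. Under real sensor noise, such plain integration drifts quickly.

```python
from typing import List, Tuple

Vector = Tuple[float, float, float]


def integrate_trace(samples: List[Vector], dt: float) -> List[Vector]:
    """Integrate acceleration samples (m/s^2) into relative positions (m).

    Starts at the origin with zero velocity. For each sample, the velocity
    is updated first and the position is then advanced with the new
    velocity (semi-implicit Euler integration).
    """
    vx = vy = vz = 0.0
    x = y = z = 0.0
    positions: List[Vector] = [(x, y, z)]
    for ax, ay, az in samples:
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
        x += vx * dt
        y += vy * dt
        z += vz * dt
        positions.append((x, y, z))
    return positions


# Two samples of constant 1 m/s^2 acceleration along x, taken 1 s apart.
trace_3d = integrate_trace([(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], dt=1.0)
```

The result is a trace relative to an unknown starting point; relating it to actual places requires additional background information.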

IV. Implications

Potentially sensitive location data from the use of smartphones is now flowing to a largely inscrutable ecosystem of international app and mobile platform providers, often without the knowledge of the data subject. This represents a fundamental shift from the traditional mobile phone system, where location data was primarily stored at more tightly regulated cellular carriers that operated within national borders.

A large number of apps customize the presented information or their functionality based on user location. Examples of such apps include local weather information, location-based reminders, maps and navigation, restaurant rating, and friend finders. Such apps often transmit the user location to a server, where it may be stored for a longer duration.

It is particularly noteworthy, however, that mobile advertisers and platform providers have emerged as additional entities that aggregate massive sets of location records obtained from user interactions with a variety of apps. When apps request location information, the user location can also be disclosed to the mobile platform service provider as part of the wireless positioning service function. Even apps that do not need any location information to function often reveal the user location to mobile advertisers. The information collected by these advertising and mobile providers is arguably more precise than the call data records stored by cellular carriers, since it is often obtained via WiFi positioning or GPS. In addition, privacy notices by app providers often neglect to disclose such background data flows [1]. While the diversity of location-based apps has been foreseen by mobile privacy research to some extent (for example, research on spatial cloaking [18] has sought to provide privacy-preserving mechanisms for sharing location data with a large number of apps), this aggregation of data at mobile platform providers was less expected. In essence, this development has economic reasons. Personal location information has become a tradable good: users provide personal information for targeted advertising in exchange for free services (quite similar to web-based advertising models). The advertising revenue generated from such data finances the operation of the service provider. Because of this implicit bargain between users and service providers, there is little incentive to curb data flows or adopt stronger technical privacy protections as long as this is not demanded by users or regulators.

We suspect, however, that many users are not fully aware of this implicit bargain. Therefore, we believe that it is most important from a privacy perspective to create awareness of these data flows among users, which is not incidentally the very first core principle of the fair information practice principles [40]. It is well understood that lengthy privacy disclosures, if they exist for smartphone apps, are not very effective at reaching the majority of users, and even the recent media attention regarding smartphone privacy³ does not appear to have found a sufficiently wide audience, as our workshop discussions suggest. Raising awareness and empowering users to make informed decisions about their privacy will require novel approaches, user interfaces, and tools.

When using smartphones, users should not only be aware of what data they are revealing to third parties and how frequently it is revealed, but should also be able to understand the potential risks of sharing such data. For instance, users/subscribers in the EU are currently entitled to get a full copy of their personal data stored by a commercial entity⁴, but such voluminous datasets can currently only be analyzed by experts. Even then, it will be difficult to judge what sensitive information can be learned from this dataset when it is linked with other data about the same person or when it is analyzed by a human expert with powerful visual analysis tools [2]. Was the precision of a location record sufficient to determine the building that a user has entered? Is it possible to reconstruct the path a user has taken between two location records? How easily can one infer the habits or health of a person based on the location records collected from smartphones?

³ For instance, http://blogs.wsj.com/wtk-mobile/ Retrieved 2012/10/18.

⁴ For example, an Austrian student requested all personal data from Facebook and received a CD [21].

As another example, some service providers claim to collect location data only in anonymous form. The methods for re-identification, however, have evolved quickly. When can “anonymized” time-series location data really qualify as data that is not personally identifiable information and remain outside most current privacy regulations? Finally, even non-georeferenced data provided by the sensors embedded in a smartphone (camera, accelerometer, microphone, etc.), as well as the files stored in the internal memory (photos, music, playlists), allow extracting knowledge about a person’s location and mobility. Overall, it appears necessary to investigate what associations can be established and what inferences can be made by a human when the data is considered in context, and how such information can be conveyed to users of services.

Users should also be able to learn in which countries their data is stored or processed, since this can have important implications for the applicable legal privacy framework. While the European Union has achieved some degree of harmonization of privacy standards for exported data from its citizens through the safe harbor provisions [12], differences still exist, for example, with respect to law enforcement access to user data. We believe that providing transparency of cross-border data flows would lead to a more meaningful public discussion of data protection policies. For example, when data is handled by multi-national corporations, should data subjects be given a choice where their data is processed and stored?

We hope that the research community will help address these questions and will interface with data protection authorities and policy experts to actively define privacy for this mobility data ecosystem.

References

[1] Mobile apps for kids: Current privacy disclosures are disappointing. Technical report, Federal Trade Commission, 2012. http://www.ftc.gov/os/2012/02/120216mobile_apps_kids.pdf.

[2] G. Andrienko and N. Andrienko. Privacy issues in geospatial visual analytics. In Georg Gartner, Felix Ortag, William Cartwright, Georg Gartner, Liqiu Meng, and Michael P. Peterson, editors, Advances in Location-Based Services, Lecture Notes in Geoinformation and Cartography, pages 239–246. Springer Berlin Heidelberg, 2012.

[3] G. L. Andrienko, N. V. Andrienko, C. Hurter, S. Rinzivillo, and S. Wrobel. From movement tracks through events to places: Extracting and characterizing significant places from mobility data. In IEEE VAST, pages 161–170. IEEE, 2011.

[4] G. L. Andrienko, N. V. Andrienko, M. Mladenov, M. Mock, and C. Pölitz. Identifying place histories from activity traces with an eye to parameter impact. IEEE Trans. Vis. Comput. Graph., 18(5):675–688, 2012.

[5] N. Andrienko, G. Andrienko, H. Stange, T. Liebig, and D. Hecker. Visual analytics for understanding spatial situations from episodic movement data. KI - Künstliche Intelligenz, pages 241–251, 2012.

[6] M. Anisetti, C. A. Ardagna, V. Bellandi, E. Damiani, and S. Reale. Map-based location and tracking in multipath outdoor mobile networks. Wireless Communications, IEEE Transactions on, 10(3):814–824, March 2011.

[7] C. Arthur. iPhone keeps record of everywhere you go. The Guardian, 2011. http://www.guardian.co.uk/technology/2011/apr/20/iphone-tracking-prompts-privacy-fears.

[8] L. Bengtsson, X. Lu, A. Thorson, R. Garfield, and J. von Schreeb. Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: A post-earthquake geospatial study in Haiti. PLoS Med, 8(8):e1001083, 08 2011.

[9] N. Bilton. 3G Apple iOS devices are storing users location data. The New York Times, April 20, 2011.

[10] N. Bilton. Holding companies accountable for privacy breaches. The New York Times, April 27, 2011.

[11] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In Proceedings of the 9th USENIX conference on Operating systems design and implementation, OSDI’10, pages 1–6, Berkeley, CA, USA, 2010. USENIX Association.

[12] European Parliament and European Council. Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data. Official Journal of the European Communities, (L281), 1995.

[13] European Parliament and Council. Directive 2006/24/EC of the European Parliament and of the Council of 15 March 2006 on the retention of data generated or processed in connection with the provision of publicly available electronic communications services or of public communications networks and amending Directive 2002/58/EC. Official Journal of the European Union, L 105:54–63, 2006.

[14] Gartner, Inc. Gartner says worldwide mobile advertising revenue forecast to reach $3.3 billion in 2011, 2011. http://www.gartner.com/it/page.jsp?id=1726614.

[15] Philippe Golle and Kurt Partridge. On the anonymity of home/work location pairs. In Hideyuki Tokuda, Michael Beigl, Adrian Friday, A. Brush, and Yoshito Tobe, editors, Pervasive Computing, volume 5538 of Lecture Notes in Computer Science, pages 390–397. Springer Berlin / Heidelberg, 2009.

[16] M. C. Grace, W. Zhou, X. Jiang, and A.-R. Sadeghi. Unsafe exposure analysis of mobile in-app advertisements. In Proceedings of the fifth ACM conference on Security and Privacy in Wireless and Mobile Networks, WISEC ’12, pages 101–112, New York, NY, USA, 2012. ACM.

[17] A. Greenberg. Phone ’rootkit’ maker Carrier IQ may have violated wiretap law in millions of cases. Forbes, 2011. http://www.forbes.com/sites/andygreenberg/2011/11/30/phone-rootkit-carrier-iq-may-have-violated-wiretap-law-in-millions-of-cases/. Retrieved 2012/10/18.

[18] M. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st international conference on Mobile systems, applications and services, MobiSys ’03, pages 31–42, New York, NY, USA, 2003. ACM.

[19] H. Haddadi, P. Hui, T. Henderson, and I. Brown. Targeted advertising on the handset: Privacy and security challenges. In Hans Jörg Müller, Florian Alt, and Daniel Michelis, editors, Pervasive Advertising, Human-Computer Interaction Series, pages 119–137. Springer, 2011.

[20] M. Helft. Apple and Google use phone data to map the world. The New York Times, April 25, 2011.

[21] K. Hill. Forbes, 2012. http://www.forbes.com/sites/kashmirhill/2012/02/07/the-austrian-thorn-in-facebooks-side/. Retrieved 2012/10/18.

[22] P. Hornyack, S. Han, J. Jung, S. Schechter, and D. Wetherall. These aren’t the droids you’re looking for: retrofitting Android to protect data from imperious applications. In Proceedings of the 18th ACM conference on Computer and communications security, CCS ’11, pages 639–652, New York, NY, USA, 2011. ACM.

[23] J. Han, E. Owusu, L. T. Nguyen, A. Perrig, and J. Zhang. ACComplice: Location inference using accelerometers on smartphones. In Communication Systems and Networks (COMSNETS), 2012 Fourth International Conference on, pages 1–9, 2012.

[24] D. Kravets. An intentional mistake: The anatomy of Google’s Wi-Fi sniffing debacle. Wired, 2011. http://www.wired.com/threatlevel/2012/05/google-wifi-fcc-investigation/. Retrieved 2012/10/18.

[25] J. Krumm. Ubiquitous advertising: The killer application for the 21st century. Pervasive Computing, IEEE, 10(1):66–73, Jan.-March 2011.

[26] Sächsischer Landtag. Drucksache 5/6787. Sächsischer Landtag, 5. Wahlperiode, 2011.

[27] E. Lichtblau. Police are using phone tracking as a routine tool. The New York Times, March 31, 2012.

[28] T. Liebig and Z. Xu. Pedestrian monitoring system for indoor billboard evaluation. Journal of Applied Operational Research, 4:28–36, 2012.

[29] T. Liebig, Z. Xu, M. May, and S. Wrobel. Pedestrian quantity estimation with trajectory patterns. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ECML PKDD 2012, Part II, LNCS 7524, pages 629–643. Springer, 2012.

[30] C. Ling, M. Loschonsky, and L. M. Reindl. Characterization of delay spread for mobile radio communications under collapsed buildings. In IEEE 21st International Symposium on Personal Indoor and Mobile Radio Communications (PIMRC), pages 329–334, Sept. 2010.

[31] C. Y. T. Ma, D. K. Y. Yau, N. K. Yip, and N. S. V. Rao. Privacy vulnerability of published anonymous mobility traces. In Proceedings of the sixteenth annual international conference on Mobile computing and networking, MobiCom ’10, pages 185–196, New York, NY, USA, 2010. ACM.

[32] D. McCullagh. Microsoft collects locations of Windows phone users. CNet News, April 25, 2011.

[33] E. Miluzzo, A. Varshavsky, S. Balakrishnan, and R. R. Choudhury. Tapprints: your finger taps have fingerprints. In Proceedings of the 10th international conference on Mobile systems, applications, and services, MobiSys ’12, pages 323–336, New York, NY, USA, 2012. ACM.

[34] Ministerium für Inneres und Kommunales NRW. Funkzellenauswertung (FZA) und Versenden ”Stiller SMS” zur Kriminalitätsbekämpfung. MMD 15/3300, 2011.

[35] P. Ohm. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, Vol. 57, p. 1701, 2010.

[36] K. Rechert, K. Meier, B. Greschbach, D. Wehrle, and D. von Suchodoletz. Assessing location privacy in mobile communication networks. In X. Lai, J. Zhou, and H. Li, editors, ISC 11, LNCS 7001, pages 309–324. Springer, Heidelberg, 2011.

[37] E. Smith. iPhone applications & privacy issues: An analysis of application transmission of iPhone unique device identifiers (UDIDs). Technical report, PSKL, 2010. http://www.pskl.us/wp/wp-content/uploads/2010/09/iPhone-Applications-Privacy-Issues.pdf.

[38] G. Sun, J. Chen, W. Guo, and K. J. R. Liu. Signal processing techniques in network-aided positioning: a survey of state-of-the-art positioning designs. Signal Processing Magazine, IEEE, 22(4):12–23, July 2005.

[39] S. Thurm and I. Yukari Kane. Your apps are watching you. The Wall Street Journal, 2010. http://online.wsj.com/article/SB10001424052748704694004576020083703574602.html.

[40] W. H. Ware. Records, computers, and the rights of citizens. Secretary’s Advisory Committee on Automated Personal Data Systems, Department of Health, Education and Welfare, Washington, D.C., 1973.

[41] H. Zang and J. Bolot. Anonymization of location data does not work: a large-scale measurement study. In Proceedings of the 17th annual international conference on Mobile computing and networking, MobiCom ’11, pages 145–156, New York, NY, USA, 2011. ACM.

[42] D. Zimmermann, J. Baumann, A. Layh, F. Landstorfer, R. Hoppe, and G. Wölfle. Database correlation for positioning of mobile terminals in cellular networks using wave propagation models. In Vehicular Technology Conference, 2004. VTC2004-Fall. 2004 IEEE 60th, volume 7, pages 4682–4686, 2004.