Privacy, Ethics, and Big (Smartphone) Data, at MobiSys 2014
DESCRIPTION
Keynote talk I gave at the Mobile and Cloud Workshop at MobiSys 2014. I talk about my experiences and reflections on privacy, focusing on (1) Urban Analytics, (2) Google Glass, and (3) PrivacyGrade.
TRANSCRIPT
©2009 Carnegie Mellon University : 1
Privacy, Ethics, and Big (Smartphone) Data
Mobile Cloud Computing and Services
June 16, 2014
Jason Hong
Computer Human Interaction: Mobility Privacy Security
Smartphones are Pervasive
• 58% penetration in the US as of early 2014
• About 1.2M Android and iOS apps
• Over 75 billion apps downloaded on each of Android and iOS
Smartphones are Intimate
• Mobile phones and millennials (Cisco 2012):
– 75% use in bed before sleep
– 83% sleep with their phones
– 90% check first thing in the morning
– A third use in bathroom (!!)
– A fifth check every ten minutes
Smartphone Data is Intimate
Who we know (contact list)
Who we call (call log)
Who we text (SMS log)
Smartphone Data is Intimate
Where we go (GPS, foursquare)
Photos (some geotagged)
Sensors (accel, sound, light)
The Opportunity
• We are creating a worldwide sensor network with these smartphones
• We can now capture and analyze human behavior at unprecedented fidelity and scale
Understanding Individuals
Min et al, Toss ‘n’ Turn: Smartphone as Sleep and Sleep Quality Detector, CHI 2014.
Understanding Small Groups
Better communication
Computational Social Science
Understanding Urban Areas
• AT&T Work on Human Mobility
Median distance traveled per day
Laborshed for Morristown NJ
Eric Fischer’s Maps of Tourists vs Locals
The Challenge
• Mobile and Cloud computing a clear part of the future
• Lots of fun technical challenges here
– Computer scientists are great here!
– We’re awesome at optimizing things
• But biggest challenges will likely relate to privacy and ethics
Three Stories
• Reflections and personal experiences
• Livehoods
– How much can we infer about people?
– Who wins, who doesn’t with tech?
• Google Glass
– Why so much negative feedback?
– Are there lessons we can learn?
• PrivacyGrade
– What are ways of protecting people?
– What is the role of developers?
Story 1: The Challenge of Getting Data About Our Cities
• Today’s methods for getting city data slow, costly, limited
– Ex. Travel Behavioral Inventory
– US Census 2010 cost $13b
– Quality of life surveys
• Some approaches today:
– Call Data Records
– Deploy a custom app
The Vision: Urban Analytics
• How can we use smartphones + social media + machine learning to offer new and useful insights about cities in a manner that is cheap, fast, and scalable?
Livehoods, Our First Urban Analytics Tool
• The character of an urban area is defined not just by the types of places found there, but also by the people that make it part of their daily life
Cranshaw et al, The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City, ICWSM 2012.
Two Perspectives
“Politically constructed”: Neighborhoods have fixed borders defined by the city government.
“Socially constructed”: Neighborhoods are organic, cultural artifacts. Borders are blurry, imprecise, and may be different to different people.
Two Perspectives of Cities
“Socially constructed”
Neighborhoods are organic, cultural artifacts. Borders are blurry, imprecise, and may be different to different people.
Can we discover automated ways of identifying the “organic” boundaries of the city?
Can we extract local cultural knowledge from social media?
Can we build a collective cognitive map from data?
Livehoods Data Source
• Crawled 18m check-ins from foursquare
– Claims 20m users
– People who linked their foursquare accts to Twitter
• Spectral clustering based on geographic and social proximity
If you watch check-ins over time, you’ll notice that groups of like-minded people tend to stay in the same areas.
We can aggregate these patterns to compute relationships between check-in venues.
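One way to make "relationships between check-in venues" concrete is to compare venues by the people who check in at them. The sketch below is an illustration under stated assumptions, not the Livehoods implementation: it treats each venue as a vector of per-user check-in counts and uses cosine similarity as the social-proximity measure. The function names and the sample check-ins are hypothetical.

```python
from collections import Counter
from math import sqrt

def venue_vectors(checkins):
    """Map each venue to a Counter of {user: number of check-ins}."""
    vectors = {}
    for user, venue in checkins:
        vectors.setdefault(venue, Counter())[user] += 1
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity between two sparse per-user count vectors."""
    dot = sum(count * b[user] for user, count in a.items() if user in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical (user, venue) check-ins, invented for illustration only.
checkins = [
    ("ann", "cafe"), ("ann", "bookstore"), ("bob", "cafe"),
    ("bob", "bookstore"), ("cat", "stadium"), ("cat", "cafe"),
]
vectors = venue_vectors(checkins)
sim_cafe_bookstore = cosine_similarity(vectors["cafe"], vectors["bookstore"])
sim_bookstore_stadium = cosine_similarity(vectors["bookstore"], vectors["stadium"])
```

Venues that share visitors score close to 1 and venues with no visitors in common score 0. A clustering step (the slides mention spectral clustering) would then operate on similarities like these together with geographic distance.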
These relationships can then be used to identify natural borders in the urban landscape.
Livehood 2
Livehood 1
We call the discovered clusters “Livehoods” reflecting their dynamic character.
South Side Pittsburgh
Safety
[Map regions: LH6, LH7, LH8, LH9]
LH7 vs LH8: “Whenever I was living down on 15th Street [LH7] I had to worry about drunk people following me home, but on 23rd [LH8] I need to worry about people trying to mug you... so it’s different. It’s not something I had anticipated, but there is a distinct difference between the two areas of the South Side.”
South Side Pittsburgh
Demographic Differences
[Map regions: LH6, LH7, LH8, LH9]
LH6: “There is this interesting mix of people there I don’t see walking around the neighborhood. I think they are coming to the Giant Eagle [grocery store] from lower income neighborhoods... I always assumed they came from up the hill.”
South Side Pittsburgh
“I always assumed they came from up the hill.”
South Side Pittsburgh
Other Potential Urban Analytics
Topic Modeling (LDA)
Reflections on Urban Analytics
Few Privacy Issues
• Publicly visible data with no expectations of privacy
– No IRB issues
• Remove venues labeled as “home”
– We only received one request to remove a venue (wasn’t labeled as a home)
• We only show data about geographic areas vs individuals
• So far, so good
Reflections on Urban Analytics
But Lots of Questions About Ethics
• Discussions with lots of different folks on how data like this might be used in the future
– Urban planners, Yahoo, Facebook, Google Maps, Startups, and more
• Lots of great questions
– Not so many great answers
– Share some of these with you
How Much Can Be Inferred?
How Much Can Be Inferred?
• Very likely much more can be inferred using rich data like this
– Demographics, socioeconomic, friends
– Physical and mental health (depression)
– How “risky” you are (bars, clinics, etc)
• Unclear how far inferencing can go
– Also, not much can stop advertisers, NSA, startups, etc
– And, not just what you do, but what others like you are doing
How Much Can Be Inferred?
Built a logistic regression to predict sexuality based on what your friends on Facebook disclosed
“[An analyst at Target] was able to identify about 25 products that… allowed him to assign each shopper a ‘pregnancy prediction’ score. [H]e could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.” (NYTimes)
How Much Can Be Inferred?
A New Kind of Redlining?
• “denying, or charging more for, services such as banking, insurance, access to health care, … supermarkets, or denying jobs … against a particular group of people” (Wikipedia)
Huge Risk of New Kind of Redlining
Map of Philadelphia showing redlining of lower income neighborhoods. Households and businesses in the red zones could not get mortgages or business loans. (Wikipedia)
Huge Risk of New Kind of Redlining
Johnson says his jaw dropped when he read one of the reasons American Express gave for lowering his credit limit:
"Other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express."
Potential Biases in the Data?
Pittsburgh’s Hill District
Was ground zero for Jazz musicians in 20th century
8,000 residents and 400 businesses, decimating the economic center of African-American Pittsburgh
Median Income (2009): $17,939
Potential Biases in the Data?
• Socioeconomic bias
– Little data from lower socioeconomic areas
• Urban bias
– Twitter, Flickr, Foursquare all more active per capita in cities
• Age and gender bias
– Most young, male, technology-savvy
• Is this a problem that will solve itself?
– Or, can we address this in our models?
Who Gains From this Data?
• Today, most data only flows one way
– Mainly to advertisers (and NSA)
– Also banks, insurance, credit cards
Who Gains From this Data?
• Can we design systems that share the value across more people?
– People co-create data and gain value
– Participatory design philosophy
• Can we also make people feel more invested in the cities they live in?
Tiramisu Bus Tracker App
• People can see incoming bus data
• People can also share info
– Got on bus
– #seats available
• New feature: people can discuss transit
Story 2: Google Glass
• Why has there been so much negative backlash about Google Glass?
• Are there lessons we can learn here about privacy and adoption of tech?
Origins of Ubicomp
Popular Press Reaction
Why Such Negative Press?
• PARC knew privacy a big problem
– But didn’t know what to do
– So didn’t build any privacy protections
• Unclear value proposition
– Focused on technical aspects
– What benefits to end-users?
• Google Glass is replaying the past
Why Such Negative Press?
• Grudin’s law
– Why does groupware fail?
– “When those who benefit are not those who do the work, then the technology is likely to fail, or at least be subverted”
• Privacy corollary
– When those who bear the privacy risks do not benefit in proportion to the perceived risks, the tech is likely to fail
But Expectations Can Change
• Initial perceptions of mobile phone users
– Rude, annoying
– Casual chat, driving
• Six weeks later…
– Had same behaviors
• People with more exposure to mobile phones better
But Expectations Can Change
• Famous 1890 article defining privacy as “the right to be let alone” was about photography
But Expectations Can Change
• People objected to having phones in their homes because it “permitted intrusion… by solicitors, purveyors of inferior music, eavesdropping operators, and even wire-transmitted germs”
But Expectations Can Change
One resort felt the trend so heavily that it posted a notice: “PEOPLE ARE FORBIDDEN TO USE THEIR KODAKS ON THE BEACH.” Other locations were no safer. For a time, Kodak cameras were banned from the Washington Monument. The “Hartford Courant” sounded the alarm as well, declaring that “the sedate citizen can’t indulge in any hilariousness without the risk of being caught in the act and having his photograph passed around among his Sunday School children.”
And Framing Really Matters
• Ubicomp -> Invisible Computing
– Talked less about the tech
– More about how it could help people
– More positive press
Privacy Hump
• A lot of our concerns about tech fall under umbrella term “privacy”
– Value, fears, expectations, what others around us think
Many legitimate concerns
Many alarmist rants
“Right” way to deploy?
Value proposition?
Rules on proper use?
Things have settled down
Few fears materialized
People feel in control
People get tangible value
But How to Get Over Hump?
• Still a big gap in knowledge on best ways of mitigating privacy issues
• Prime example: Facebook news feed
“We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.”
• Privacy Placebos: Things that make people feel better about privacy, but don’t offer much.
• Other examples: Privacy policies, access logs
• How ethical is it to use these approaches?
Story 3: PrivacyGrade
• – every SMS, search, and phone#
•
• How can we help improve privacy?
– Developers
– End-users
What Do Developers Know about Privacy?
• Interviews with 13 app developers
• Surveys with 228 app developers
• What tools? Knowledge? Incentives?
• Points of leverage?
Balebako et al, The Privacy and Security Behaviors of Smartphone App Developers. USEC 2014.
Summary of Findings
• App developers not evil
• But need to monetize
– Ads and analytics heavily used
– Turns out libraries big source of problems
• Also don’t know what to do
– “I haven’t even read [our privacy policy]. I mean, it’s just legal stuff that’s required, so I just put in there.”
– Often just ask other people around them
– Low awareness of guidelines
Developers Need a Lot of Help!
• Good set of best practices for security
– SSL, hashing of passwords, randomization
– Common attacks: SQL injection, XSS, CSRF
• What are best practices for privacy???
• Developers also have many problems
– App functionality, bandwidth, power, making money… privacy far down the list
What are your apps really doing?
Shares your location, gender, unique phone ID, phone# with advertisers
Uploads your entire contact list to their server (including phone #s)
Lin et al, Expectation and Purpose: Understanding Users’ Mental Models of Mobile App Privacy through Crowdsourcing. Ubicomp 2012.
Many Smartphone Apps Have “Unusual” Permissions
Location Data
Unique device ID
Location Data
Network Access
Unique device ID
Location Data
Unique device ID
Expectations vs Reality
Privacy as Expectations
• Apply this same idea of mental models for privacy
– Compare what people expect an app to do vs what an app actually does
– Emphasize the biggest gaps, misconceptions that many people had
App Behavior (What an app actually does)
User Expectations (What people think the app does)
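The comparison between expectations and behavior can be sketched as a simple "surprise rate" per app behavior. This is a minimal illustration, assuming each crowd worker's answer is reduced to a yes/no "did you expect this?" flag. The function names, the 0.5 threshold, and the sample responses are hypothetical and are not taken from the study.

```python
def surprise_rate(expected_flags):
    """Fraction of respondents who did NOT expect the behavior.

    `expected_flags` is a list of booleans; True means the respondent
    expected the app to do this.
    """
    if not expected_flags:
        raise ValueError("need at least one response")
    surprised = sum(1 for expected in expected_flags if not expected)
    return surprised / len(expected_flags)

def biggest_gaps(responses_by_behavior, threshold=0.5):
    """Return behaviors whose surprise rate meets the threshold, largest gap first."""
    rates = {b: surprise_rate(flags) for b, flags in responses_by_behavior.items()}
    flagged = [(rate, b) for b, rate in rates.items() if rate >= threshold]
    return [b for rate, b in sorted(flagged, reverse=True)]

# Hypothetical responses from 20 people per behavior, invented for illustration.
responses = {
    "sends phone ID to ad providers": [False] * 17 + [True] * 3,
    "controls audio settings": [True] * 20,
    "sends precise location to ad providers": [False] * 18 + [True] * 2,
}
gaps = biggest_gaps(responses)
```

Behaviors that nobody finds surprising drop out, and the rest are ordered so the largest misconceptions come first.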
10% users were surprised this app wrote contents to their SD card.
25% users were surprised this app sent their approximate location to dictionary.com for searching nearby words.
85% users were surprised this app sent their phone’s unique ID to mobile ads providers.
0% users were surprised this app could control their audio settings.
See all
90% users were surprised this app sent their precise location to mobile ads providers.
95% users were surprised this app sent their approximate location to mobile ads providers.
95% users were surprised this app sent their phone’s unique ID to mobile ads providers.
See all
0% users were surprised this app can control camera flashlight.
Results for Location Data (N=20 per app, Expectations Condition)
App                         Comfort Level (-2 to +2)
Maps                         1.52
GasBuddy                     1.47
Weather Channel              1.45
Foursquare                   0.95
TuneIn Radio                 0.60
Evernote                     0.15
Angry Birds                 -0.70
Brightest Flashlight Free   -1.15
Toss It                     -1.2
How to Scale It Up?
How PrivacyGrade Works
• Libraries can offer us insight into the purpose of a permission request
– Location used by Google Maps library vs Location used by advertising library
• Create a model of people’s concerns of permission by purpose
Lin et al, Modeling Users’ Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings. SOUPS 2014.
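The idea of inferring purpose from libraries and then modeling concern per (permission, purpose) pair can be sketched as two lookup tables. Everything named here is an assumption for illustration: the library identifiers, the purpose labels, and the comfort values are hypothetical, and this is not PrivacyGrade's actual model or data.

```python
# Hypothetical mapping from a third-party library to the purpose it serves.
LIBRARY_PURPOSE = {
    "maps_sdk": "navigation",
    "ads_sdk": "advertising",
}

# Hypothetical average comfort per (permission, purpose) on the -2..2 scale.
COMFORT = {
    ("location", "navigation"): 1.5,
    ("location", "advertising"): -1.0,
}

def permission_purposes(app_usage):
    """Turn observed (permission, library) pairs into (permission, purpose) pairs.

    Libraries missing from LIBRARY_PURPOSE map to the purpose "unknown".
    """
    return {(perm, LIBRARY_PURPOSE.get(lib, "unknown")) for perm, lib in app_usage}

def least_comfortable(app_usage):
    """Return the (permission, purpose) pair with the lowest known comfort, or None."""
    known = [(COMFORT[p], p) for p in permission_purposes(app_usage) if p in COMFORT]
    return min(known)[1] if known else None

# Hypothetical app that requests location through both a maps and an ads library.
usage = [("location", "maps_sdk"), ("location", "ads_sdk")]
worst = least_comfortable(usage)
```

The same permission ends up with different comfort levels depending on which library requests it, which is the distinction the slide draws between a maps library and an advertising library.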
How PrivacyGrade Works
• We crowdsourced people’s expectations of a core set of 837 apps
– Ex. “How comfortable are you with Fruit Ninja using your location for ads?”
– 20 people per question (as above)
How PrivacyGrade Works
Early Discussion of PrivacyGrade
• Can we use big data for privacy?
• Many points of leverage in ecosystem
– Policy makers / third party advocates
– Developers
– OS / Hardware / app store
– End-users
• Legal issues
– Still under discussion with CMU lawyers
– DMCA, liability issues
How far should we go in inferring behaviors?
How can we minimize biases in our data and models?
How can we help ensure data doesn’t flow one way?
What are better ways of going over the privacy hump?
How acceptable is it to use privacy placebos?
How can we help developers with privacy?
Can we use big data approaches for privacy?
What are ways of getting our work out there?
How can we work with public policy makers to create better guidelines around privacy?
How can we create a connected world we would all want to live in?
Computer Human Interaction: Mobility Privacy Security
Thanks!
More info at cmuchimps.org or email [email protected]
Special thanks to:
• Army Research Office
• NSF
• Alfred P. Sloan
• NQ Mobile
• DARPA
• Google
• CMU Cylab
• Shah Amini
• Justin Cranshaw
• Afsaneh Doryab
• Jialiu Lin
• Song Luan
• Jun-Ki Min
• Dan Tasse
• Jason Wiese
Summary
• Smartphones and cloud computing offer big opportunity to understand human behavior
• Also pose many large challenges, in privacy and ethics
• But I’m optimistic
[Figure labels: Intensity features; Number of co-locations; Without intensity; Full model]
Using Location Data to Infer Friendships
• 2.8m location sightings of 489 users of Locaccino friend finder in Pittsburgh
• Place entropy for inferring social quality of a place
– #unique people seen in a place
– 0.0002 x 0.0002 lat/lon grid, ~30m x 30m
Cranshaw et al, Bridging the Gap Between Physical Location and Online Social Networks, Ubicomp 2010
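A minimal sketch of the grid and entropy computation follows. It assumes place entropy is the Shannon entropy of each user's share of the sightings in a 0.0002-degree grid cell; the slide itself only mentions the number of unique people seen in a place. The helper names and sample sightings are hypothetical.

```python
from collections import Counter
from math import floor, log2

CELL_SIZE = 0.0002  # degrees of latitude/longitude per grid cell, as on the slide

def grid_cell(lat, lon, cell_size=CELL_SIZE):
    """Snap a coordinate to the integer index of its grid cell."""
    return (floor(lat / cell_size), floor(lon / cell_size))

def place_entropy(user_counts):
    """Shannon entropy (bits) of the distribution of sightings across users."""
    total = sum(user_counts.values())
    return sum(-(n / total) * log2(n / total) for n in user_counts.values())

def entropy_by_cell(sightings):
    """Group (user, lat, lon) sightings by grid cell and compute entropy per cell."""
    per_cell = {}
    for user, lat, lon in sightings:
        per_cell.setdefault(grid_cell(lat, lon), Counter())[user] += 1
    return {cell: place_entropy(counts) for cell, counts in per_cell.items()}

# Hypothetical sightings: four people share one cell, one person has a cell alone.
sightings = [
    ("u1", 40.44331, -79.94291), ("u2", 40.44335, -79.94295),
    ("u3", 40.44333, -79.94293), ("u4", 40.44337, -79.94297),
    ("u5", 40.45011, -79.95011), ("u5", 40.45011, -79.95011),
    ("u5", 40.45011, -79.95011),
]
entropies = entropy_by_cell(sightings)
```

Under this definition a cell visited evenly by many people has high entropy (a public place), while a cell seen by only one person has entropy 0 (a private place), so co-location in a low-entropy cell is stronger evidence of a relationship.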
Inferring Friendships
• 67 different machine learning features
– Location diversity (and entropy)
– Intensity and Duration
– Specificity (TF-IDF)
– Graph structure (overlap in friends)
• 92% accuracy in predicting friend/not
– Location entropy improves performance over shallow features like #co-locations
• Insert graph here
• Describe entropy
Co-location data to infer friendship
Using place entropy, accuracy of 92%
Can also infer number of friends
• Insert graph here
• Describe entropy
Could infer friend / not-friend based on co-location patterns and place entropy at 92%
Brooklyn Queens Expressway
Bezerkeley, CA
Expectations Condition
Why do you think Angry Birds uses your location data?
How comfortable are you with Angry Birds using your location data?
Purpose Condition
Angry Birds uses your location data for advertising.
How comfortable are you with Angry Birds using your location data?
Few people read privacy policies
• We want to install the app
• Reading policies not part of main task
• Complexity (the pain!!!)
• Clear cost (time) for unclear benefit
How PrivacyGrade Works (1)
• We operationalize privacy as people’s expectations of data use
– Ex. Most people don’t expect Fruit Ninja to use location data, but it actually does
– Ex. Most people do expect Google Maps to use location data