Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar, Oct 2011

DESCRIPTION

A summary of my group's work on using crowdsourcing techniques and the wisdom of crowds to improve privacy and security. I talked about some techniques to improve crowdsourcing for anti-phishing, some ways of using large amounts of location data to infer location privacy preferences, and some of our early work on using crowdsourcing to understand privacy preferences regarding smartphone apps.

TRANSCRIPT

Page 1: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Applying the Wisdom of Crowds to Usable Privacy and Security

Jason I. Hong, Carnegie Mellon University

Page 2: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Usable Privacy and Security

• Cyber security is a national priority
  – Increasing levels of malware and phishing
  – Accidental disclosures of sensitive info
  – Reliability of critical infrastructure

• Privacy concerns growing as well
  – Breaches and theft of customer data
  – Ease of gathering, storing, searching

• Increasing number of issues deal with the human element

Page 3: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Fake Interfaces to Trick People

Fake Anti-Virus (installs malware)

Page 4: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Misconfigurations

Facebook controls for managing sharing preferences

Page 5: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Too Many Passwords

Page 6: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Other Examples

• Web browser certificates
• Do not track / behavioral advertising
• Location privacy
• Online social network privacy
• Intrusion detection and visualizations
• Effective warnings
• Effective security training
• …

Page 7: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Usable Privacy and Security

“Give end-users security controls they can understand and privacy they can control for the dynamic, pervasive computing environments of the future.”

CRA “Grand Challenges in Information Security & Assurance” 2003

Page 8: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Today’s Talk

• Applying crowdsourcing to speed up detection of phishing web sites

• Using location data to understand people, places, and relationships

• Using crowdsourcing to understand privacy of mobile apps

Page 9: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Today’s Talk

• Applying crowdsourcing to speed up detection of phishing web sites

• Using location data to understand people, places, and relationships

• Using crowdsourcing to understand privacy of mobile apps

Page 10: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Smartening the Crowds: Computational Techniques for Improving Human Verification to Fight Phishing Scams

Symposium on Usable Privacy and Security 2011

Gang Liu, Wenyin Liu
Department of Computer Science, City University of Hong Kong

Guang Xiang, Bryan A. Pendleton, Jason I. Hong
Carnegie Mellon University

Page 11: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

• RSA SecurID
• Lockheed Martin
• Gmail
• Epsilon mailing list
• Australian government
• Canadian government
• Oak Ridge National Labs
• Operation Aurora

Page 12: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Detecting Phishing Websites

• Method 1: Use heuristics
  – Unusual patterns in URL, HTML, topology (see the sketch after this list)
  – Approach favored by researchers
  – High true positives, some false positives

• Method 2: Manually verify
  – Approach used by industry blacklists today (Microsoft, Google, PhishTank)
  – Very few false positives, low risk of liability
  – Slow, easy to overwhelm
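To make the heuristic approach concrete, here is a minimal sketch of the kind of URL-based checks such detectors rely on. The specific features, keywords, and threshold are illustrative assumptions, not the exact heuristics of any particular system mentioned in this talk.

```python
import re
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """A few URL heuristics commonly cited for phishing detection (illustrative only)."""
    host = urlparse(url).hostname or ""
    return {
        # A raw IP address instead of a domain name is a classic warning sign.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Phishers often stack subdomains to bury the real registered domain.
        "many_subdomains": host.count(".") > 3,
        # An '@' in a URL can disguise the true destination.
        "has_at_symbol": "@" in url,
        # Brand or action keywords appearing on an unrelated host.
        "has_bait_keyword": any(k in url.lower() for k in ("login", "verify", "account", "paypal")),
    }

def looks_suspicious(url: str) -> bool:
    # Flag the URL if two or more heuristics fire (assumed threshold).
    return sum(url_features(url).values()) >= 2

print(looks_suspicious("http://192.0.2.7/paypal.com/login/verify.html"))  # True
print(looks_suspicious("https://www.example.com/"))                       # False
```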

Page 13: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Page 14: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Page 15: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Page 16: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Wisdom of Crowds Approach

• Mechanics of PhishTank
  – Submissions require at least 4 votes and 70% agreement (a sketch of this rule follows below)
  – Some votes weighted more

• Total stats (Oct 2006 – Feb 2011)
  – 1.1M URL submissions from volunteers
  – 4.3M votes
  – Resulting in about 646k identified phish

• Why so many votes for only 646k phish?
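As a minimal sketch of the labeling rule described above (at least 4 votes and at least 70% agreement): the function below treats all voters equally, whereas real PhishTank also weights some voters more heavily, which this sketch deliberately omits.

```python
from typing import List, Optional

def label_url(votes: List[bool], min_votes: int = 4, threshold: float = 0.7) -> Optional[bool]:
    """Aggregate votes (True = 'this is a phish') into a label.

    Returns True or False once at least `min_votes` votes exist and one side
    reaches the agreement threshold; returns None while the URL is undecided.
    """
    if len(votes) < min_votes:
        return None
    phish_frac = sum(votes) / len(votes)
    if phish_frac >= threshold:
        return True
    if 1 - phish_frac >= threshold:
        return False
    return None  # no consensus yet; more votes needed

print(label_url([True, True, True, False]))   # 75% agree -> True (phish)
print(label_url([True, False, True, False]))  # 50/50     -> None (undecided)
```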

Page 17: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

PhishTank Statistics, Jan 2011

Submissions:    16,019
Total votes:    69,648
Valid phish:    12,789
Invalid phish:  549
Median time:    2 hrs 23 min

• 69,648 votes could yield at most 17,412 labels (at the 4-vote minimum per URL)
  – But only 12,789 phish and 549 legitimate sites were identified
  – 2,681 URLs were not labeled at all

• Median delay of 2+ hours still has room for improvement (it used to be 12 hours)

Page 18: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Why Care?

• Can improve performance of human-verified blacklists
  – Dramatically reduce time to blacklist
  – Improve breadth of coverage
  – Offer same or better level of accuracy

• More broadly, new way of improving performance of crowd for a task

Page 19: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Ways of Smartening the Crowd

• Change the order URLs are shown
  – Ex. most recent vs closest to completion

• Change how submissions are shown
  – Ex. show one at a time or in groups

• Adjust threshold for labels
  – PhishTank is 4 votes and 70%
  – Ex. vote weights, algorithm also votes

• Motivating people / allocating work
  – Filtering by brand, competitions, teams of voters, leaderboards

Page 20: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Ways of Smartening the Crowd

• Change the order URLs are shown
  – Ex. most recent vs closest to completion

• Change how submissions are shown
  – Ex. show one at a time or in groups

• Adjust threshold for labels
  – PhishTank is 4 votes and 70%
  – Ex. vote weights, algorithm also votes

• Motivating people / allocating work
  – Filtering by brand, competitions, teams of voters, leaderboards

Page 21: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Ways of Smartening the Crowd

• Change the order URLs are shown
  – Ex. most recent vs closest to completion

• Change how submissions are shown
  – Ex. show one at a time or in groups

• Adjust threshold for labels
  – PhishTank is 4 votes and 70%
  – Ex. vote weights, algorithm also votes

• Motivating people / allocating work
  – Filtering by brand, competitions, teams of voters, leaderboards

Page 22: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Ways of Smartening the Crowd

• Change the order URLs are shown
  – Ex. most recent vs closest to completion

• Change how submissions are shown
  – Ex. show one at a time or in groups

• Adjust threshold for labels
  – PhishTank is 4 votes and 70%
  – Ex. vote weights, algorithm also votes

• Motivating people / allocating work
  – Filtering by brand, competitions, teams of voters, leaderboards

Page 23: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Ways of Smartening the Crowd

• Change the order URLs are shown
  – Ex. most recent vs closest to completion

• Change how submissions are shown
  – Ex. show one at a time or in groups

• Adjust threshold for labels
  – PhishTank is 4 votes and 70%
  – Ex. vote weights, algorithm also votes

• Motivating people / allocating work
  – Filtering by brand, competitions, teams of voters, leaderboards

Page 24: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Overview of Our Work

• Crawled unverified submissions from PhishTank over a 2-week period

• Replayed URLs on MTurk over 2 weeks
  – Required participants to play 2 rounds of Anti-Phishing Phil
  – Clustered phish by HTML similarity
  – Two cases: phish shown one at a time, or in a cluster (not strictly separate conditions)
  – Evaluated effectiveness of the vote weight algorithm after the fact

Page 25: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Anti-Phishing Phil

• We had MTurkers play two rounds of Phil [Sheng 2007] to qualify (avg. 5.2 min)

• Goal was to reduce lazy MTurkers and ensure a base level of knowledge

Page 26: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Page 27: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Page 28: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Clustering Phish

• Observations
  – Most phish are generated by toolkits and thus are similar in content and appearance
  – Can potentially reduce labor by labeling suspicious sites in bulk
  – Labeling single sites as phish can be hard if unfamiliar, easier if multiple examples

Page 29: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Clustering Phish

• Motivations
  – Most phish are generated by toolkits and thus similar
  – Labeling single sites as phish can be hard, easier if multiple examples
  – Reduce labor by labeling suspicious sites in bulk

Page 30: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Clustering Phish

• Motivations
  – Most phish are generated by toolkits and thus similar
  – Labeling single sites as phish can be hard, easier if multiple examples
  – Reduce labor by labeling suspicious sites in bulk

Page 31: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Most Phish Can be Clustered

• With all data over two weeks, 3,180 of 3,973 web pages could be grouped (80%)
  – Used shingling and DBSCAN (see paper; a sketch of the idea follows)
  – 392 clusters, ranging in size from 2 to 153 URLs
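The paper groups near-identical pages with shingling and DBSCAN; the sketch below illustrates that general recipe using character shingles, Jaccard distance, and scikit-learn's DBSCAN. All parameter values here (shingle length, eps, min_samples) are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def shingles(html: str, k: int = 8) -> set:
    """k-character shingles of a page, a simple fingerprint for near-duplicate detection."""
    text = " ".join(html.split()).lower()
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def jaccard_distance(a: set, b: set) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_pages(pages, eps: float = 0.3, min_samples: int = 2):
    """Group near-identical pages; a label of -1 means the page joined no cluster."""
    sigs = [shingles(p) for p in pages]
    n = len(sigs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = jaccard_distance(sigs[i], sigs[j])
    return DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed").fit(dist).labels_

template = ("<html><body><form action='steal.php'>Fake Bank sign-in. "
            "Enter your password to continue.</form>{}</body></html>")
pages = [template.format("<!-- campaign 17 -->"),
         template.format("<!-- campaign 42 -->"),
         "<html><body>An unrelated page about gardening tips and tools.</body></html>"]
print(cluster_pages(pages))  # first two pages share a cluster, the third is noise (-1)
```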

Page 32: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Page 33: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

MTurk Tasks

• Two kinds of tasks, control and cluster
  – Listed these two as separate HITs
  – MTurkers paid $0.01 per label
  – Cannot enforce between-subjects conditions on MTurk
  – An MTurker saw a given URL at most once

• Four votes minimum, 70% threshold
  – Stopped at 4 votes; cannot dynamically request more votes on MTurk
  – 153 URLs (3.9%) in control and 127 (3.2%) in cluster were not labeled

Page 34: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

MTurk Tasks

• URLs were replayed in order
  – Ex. if a URL was crawled from PhishTank at 2:51am on day 1, we replayed it at 2:51am on day 1 of the experiment (see the scheduling sketch below)
  – Listed new HITs each day rather than one HIT lasting two weeks (to avoid delays and a last-minute rush)
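A minimal sketch of the replay rule above: each URL is posted at the same offset into the experiment as its original offset into the crawl window. The data shapes and names are assumptions for illustration.

```python
from datetime import datetime

def replay_schedule(crawled, crawl_start, experiment_start):
    """Map each (url, crawl_time) pair to the time its HIT should be posted,
    preserving the original day and time-of-day offsets."""
    return [(url, experiment_start + (crawl_time - crawl_start))
            for url, crawl_time in crawled]

crawled = [("http://example.test/phish1", datetime(2011, 1, 3, 2, 51)),
           ("http://example.test/phish2", datetime(2011, 1, 4, 14, 5))]
for url, post_at in replay_schedule(crawled,
                                    crawl_start=datetime(2011, 1, 3),
                                    experiment_start=datetime(2011, 2, 7)):
    print(post_at, url)  # the URL crawled at 2:51am on day 1 is posted at 2:51am on day 1
```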

Page 35: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Summary of Experiment

• 3,973 suspicious URLs
  – Ground truth from Google, MSIE, and PhishTank, checked every 10 min
  – 3,877 were phish, 96 were not

• 239 MTurkers participated
  – 174 did HITs for both control and cluster
  – 26 in control only, 39 in cluster only

• Total of 33,781 votes placed
  – 16,308 in control
  – 11,463 in cluster (equivalent to 17,473 single-URL votes)

• Cost (participants + Amazon fees): $476.67 USD

Page 36: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Results of Aquarium

• All votes are the individual votes
• Labeled URLs are after aggregation

Page 37: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Comparing Coverage and Time

Page 38: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Voteweight

• Use time and accuracy to weight votes
  – Those who vote early and accurately are weighted more
  – Older votes are discounted
  – Incorporates a penalty for wrong votes

• Done after the data was collected
  – Harder to do in real time since we don't know the true label until later

• See paper for parameter tuning
  – Of the threshold and penalty function (an illustrative sketch follows)
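The actual voteweight formula and its tuning are in the paper; the sketch below only illustrates the general idea stated above: weight each vote by the voter's track record, discount older votes, and penalize wrong votes. Every functional form and constant here is an assumption, not the paper's.

```python
def reputation(correct: int, wrong: int, penalty: float = 2.0) -> float:
    """Illustrative voter weight: reward correct past votes, penalize wrong ones."""
    return max(correct - penalty * wrong, 0.0) + 1.0  # +1 so new voters still count a little

def time_discount(age_hours: float, half_life_hours: float = 12.0) -> float:
    """Illustrative decay: a vote loses half its weight every 12 hours."""
    return 0.5 ** (age_hours / half_life_hours)

def weighted_label(votes, threshold: float = 0.7):
    """votes: list of (is_phish, past_correct, past_wrong, age_hours) tuples."""
    phish_weight = legit_weight = 0.0
    for is_phish, correct, wrong, age in votes:
        w = reputation(correct, wrong) * time_discount(age)
        if is_phish:
            phish_weight += w
        else:
            legit_weight += w
    total = phish_weight + legit_weight
    if total == 0:
        return None
    if phish_weight / total >= threshold:
        return True
    if legit_weight / total >= threshold:
        return False
    return None  # no weighted consensus yet

# Two reliable, early voters saying "phish" outweigh one unreliable voter saying "legit".
print(weighted_label([(True, 50, 1, 1.0), (True, 10, 0, 2.0), (False, 0, 5, 0.5)]))  # True
```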

Page 39: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Voteweight Results

• Control condition, best scenario (before vs. after voteweight)
  – 94.8% accuracy, avg 11.8 hrs, median 3.8 hrs
  – 95.6% accuracy, avg 11.0 hrs, median 2.3 hrs

• Cluster condition, best scenario (before vs. after voteweight)
  – 95.4% accuracy, avg 1.8 hrs, median 0.7 hrs
  – 97.2% accuracy, avg 0.8 hrs, median 0.5 hrs

• Overall: small gains, though potentially more fragile and more complex

Page 40: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Limitations of Our Study

• Two limitations of MTurk
  – No separation between the control and cluster conditions
  – ~3% of tie votes left unresolved (would have needed more votes)

• Possible learning effects?
  – Hard to tease out with our data
  – Aquarium doesn't offer feedback
  – Everyone played Phil
  – Neither condition was prioritized over the other

• Optimistic case: no active subversion

Page 41: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Conclusion

• Investigated two techniques for smartening the crowd for anti-phishing
  – Clustering and voteweight

• Clustering offers significant advantages wrt time and coverage

• Voteweight offers smaller improvements in effectiveness

Page 42: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Today’s Talk

• Applying crowdsourcing to speed up detection of phishing web sites

• Using location data to understand people, places, and relationships

• Using crowdsourcing to understand privacy of mobile apps

Page 43: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Bridging the Gap Between Physical Location and Online Social Networks

12th International Conference on Ubiquitous Computing (Ubicomp 2010)

Justin Cranshaw, Eran Toch, Jason Hong, Aniket Kittur, Norman Sadeh
Carnegie Mellon University

Page 44: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Understanding Human Behavior at Large Scales

• Capabilities of today's mobile devices
  – Location, sound, proximity, motion
  – Call logs, SMS logs, pictures

• We can now analyze real-world social networks and human behaviors at unprecedented fidelity and scale

• 2.8m location sightings of 489 participants in Pittsburgh

Page 45: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

• Insert graph here
• Describe entropy

Page 46: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Early Results

• Can predict Facebook friendships based on co-location patterns
  – 67 different features:
    • Intensity and duration
    • Location diversity (entropy; see the sketch after this list)
    • Mobility
    • Specificity (TF-IDF)
    • Graph structure (mutual neighbors, overlap)
  – 92% accuracy in predicting friend / not friend
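Location entropy, one of the feature families listed above, measures how evenly a place's visits are spread across distinct people: a private home scores near zero, a busy cafe scores high. A minimal sketch of the standard Shannon-entropy version, with assumed data shapes:

```python
import math
from collections import Counter, defaultdict

def location_entropy(checkins):
    """checkins: iterable of (user_id, place_id) observations.

    For each place, compute Shannon entropy over the distribution of its visits
    across users: 0 when one person accounts for all visits, higher when many
    different people visit it roughly equally."""
    visits = defaultdict(Counter)  # place -> visit counts per user
    for user, place in checkins:
        visits[place][user] += 1
    entropy = {}
    for place, counts in visits.items():
        total = sum(counts.values())
        entropy[place] = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy

checkins = [("alice", "home"), ("alice", "home"), ("alice", "home"),
            ("alice", "cafe"), ("bob", "cafe"), ("carol", "cafe")]
print(location_entropy(checkins))  # home: 0.0, cafe: ln(3) ≈ 1.10
```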

Page 47: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Using features like location entropy significantly improves performance over shallow features such as #co-locations

Page 48: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

[Chart: friendship-prediction performance comparing the full model, the model without intensity features, intensity features alone, and the number of co-locations alone]

Page 49: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Early Results

• Can predict number of friends based on mobility patterns
  – People who go out often, on weekends, and to high-entropy places tend to have more friends
  – (Didn't check age, though)

Page 50: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Entropy Related to Location Privacy

Page 51: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Collective Real-World Intelligence

• Location data alone can tell us a lot about people, the places they go, the relationships they have

• Characterizing individuals
  – Personal frequency
  – Personal mobility pattern

• Characterizing the social quality of places
  – Entropy – number of unique people
  – Churn – same people or different
  – Transience – amount of time spent
  – Burst – regularity of people seen

Page 52: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Collective Real-World Intelligence

• Apps for Usable Privacy and Security
  – Using places for authentication
  – Protecting geotagged data
    • 4.3% of Flickr photos, 3% of YouTube videos, 1% of Craigslist photos are geotagged

Page 53: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Collective Real-World Intelligence

• Other potential apps and analyses:
  – Architecture and urban design
  – Use of public resources (e.g. buses)
  – Traffic Behavioral Inventory (TBI)
  – Characterizing neighborhoods
  – What do Pittsburghers do?

Page 54: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Crowdsourcing Location Data

• How to incentivize thousands of people in multiple cities to run our app?
  – Pay?
  – Altruism?
  – Enjoyment?
  – Side effect?

• Key difference is highly sensitive personal data (vs microtasks)

Page 55: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Today’s Talk

• Applying crowdsourcing to speed up detection of phishing web sites

• Using location data to understand people, places, and relationships

• Using crowdsourcing to understand privacy of mobile apps

Page 56: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

What are your apps really doing?

[App screenshot callouts: "Shares your location, gender, unique phone ID, and phone # with advertisers"; "Uploads your entire contact list to their server (including phone #s)"]

• WSJ analysis of 101 apps found that half share the phone's unique ID and location

Page 57: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Android

• What do these permissions mean?

• Why does app need this permission?

• When does it use these permissions?

Page 58: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Research on Scanning Apps

• TaintDroid intercepts certain calls and asks the user if it's OK

• Others scan binaries
  – Ex. what web sites the app connects to

• Others scan what goes over the network
  – Ex. "looks like an SSN"

Page 59: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Our Position

• No automated technique will ever be able to differentiate between acceptable and unacceptable behavior

• Many false positives, because scanners also flag things an app does by design
  – Ex. flagging Evernote for connecting to its own servers

Page 60: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Crowdsourcing and Privacy

• Re-frame privacy as expectations
  – Capture what people expect an app to do
  – See how well the app matches those expectations
  – Use the top mismatches as a privacy summary for non-experts (and for devs); a sketch of this aggregation follows below

• Use crowdsourcing to accomplish this
  – Ideally we would use experts, but experts don't scale
  – 300k Android apps, 500k iPhone apps
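A minimal sketch of turning crowd expectations into the proposed privacy summaries: for each (app, screen, data type) flow the scanner actually observed, compute the fraction of respondents who did not expect it, and surface the biggest mismatches first. The names, data shapes, and numbers below are assumptions for illustration (the Yelp "Nearby" screen is the example used on later slides).

```python
from collections import defaultdict

def mismatch_summary(observed, responses):
    """observed:  set of (app, screen, data_type) flows the scanner actually saw.
    responses: list of (app, screen, data_type, expected) answers from crowd workers.

    Returns [(flow, fraction_who_did_not_expect_it)], biggest mismatches first."""
    tally = defaultdict(lambda: [0, 0])  # flow -> [did_not_expect, total_answers]
    for app, screen, dtype, expected in responses:
        flow = (app, screen, dtype)
        if flow in observed:
            tally[flow][1] += 1
            if not expected:
                tally[flow][0] += 1
    rates = {flow: surprised / total for flow, (surprised, total) in tally.items() if total}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

observed = {("yelp", "Nearby", "location"), ("yelp", "Nearby", "contacts")}
responses = ([("yelp", "Nearby", "location", True)] * 9 +
             [("yelp", "Nearby", "location", False)] * 1 +
             [("yelp", "Nearby", "contacts", False)] * 8 +
             [("yelp", "Nearby", "contacts", True)] * 2)
for (app, screen, dtype), rate in mismatch_summary(observed, responses):
    print(f"{rate:.0%} did not expect {app} to send {dtype} from the {screen} screen")
```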

Page 61: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Screen-by-Screen Probing

• Generate tree of UI screens

Page 62: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Screen-by-Screen Probing

• Scan the app to capture what happens when a person transitions from one screen to another

[Diagram annotations: "Gets location, sends to yelp.com"; "Gets contacts, sends to yelp.com"]

Page 63: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Screen-by-Screen Probing

What data do you think is sent to Yelp if you click the "Nearby" icon?
• Current location
• Contact list
• Phone call log
• SMS log
• Unique phone ID
• …

Page 64: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Screen-by-Screen Probing

How comfortable would you be if the Yelp app sent your current location to the Yelp servers when you click on the “Nearby” icon?

Page 65: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Screen-by-Screen Probing

• Use top mismatches to generate new privacy summaries
  – Ex. "93% of people didn't expect the Facebook app to send their contact list to its servers"

• Current work:
  – Building a remote evaluation tool
  – Creating screen mockups to compare expert vs. MTurker results
    • Can MTurkers understand the data types?
    • Can MTurkers offer mostly accurate results?

Page 66: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

What's New and Different for Crowdsourcing?

• New crowdsourcing issues with security
  – Active and adaptive adversaries
  – Timeliness has new urgency

• New ways of understanding human behavior at large scale through location
  – Incentivizing people to share data

• New ways of gauging end-user privacy
  – Possibly a new way of understanding privacy
  – Structuring tasks so that novices can give useful feedback

Page 67: Applying the Wisdom of Crowds to Usable Privacy and Security, CMU Crowdsourcing Seminar Oct 2011

Acknowledgments

• CyLab and the Army Research Office
• Research Grants Council of the Hong Kong Special Administrative Region
• Alfred P. Sloan Foundation
• Google
• DARPA