privacy usc csci499

Post on 24-Feb-2016



PRIVACY
USC CSCI499
Dr. Genevieve Bartlett
USC/ISI

Privacy
The state or condition of being free from observation.
Not really possible today… at least not on the internet.

Privacy
The right of people to choose freely under what circumstances and to what extent they will reveal themselves, their attitude, and their behavior to others.

Privacy is not black and white
Lots of grey areas and points for discussion
What seems private to you may not seem private to me
Three examples to start us off:
  HTTP Cookies
  Google Street View
  Facebook

HTTP cookies: What are they?
Cookies = small text file
  Received from a server, stored on your machine
  Usually web
Purpose: HTTP is stateless, so cookies maintain state for the HTTP protocol
  E.g., keeping the contents of your “shopping cart” while you browse a site
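To make the "stateless" point concrete, here is a minimal sketch using Python's standard `http.cookies` module. The cookie name `cart` and its value are assumptions made for illustration; real sites usually store only an opaque session ID in the cookie.

```python
from http.cookies import SimpleCookie

# Server side, first response: remember the cart by handing the browser a cookie.
response_cookie = SimpleCookie()
response_cookie["cart"] = "unicorn-poster"      # hypothetical cart contents
response_cookie["cart"]["path"] = "/"
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# Browser side: it stores the name/value pair and sends it back on later requests.
cookie_header = "cart=unicorn-poster"

# Server side, next request: HTTP remembers nothing, the cookie carries the state.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
cart_contents = request_cookie["cart"].value
```

The server never has to remember the browser between requests; everything it needs comes back in the `Cookie` header.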

HTTP cookies: 3rd party cookies

You visited your favorite site unicornsareawesome.com

unicornsareawesome.com pulls ads from lameads.com

You get a cookie from lameads.com, even though you never visited lameads.com

lameads.com can track your browsing habits every time you visit any page with ads from lameads.com… those might be a lot of pages
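The tracking described above can be sketched as a toy simulation. `AdServer`, `serve_ad`, and the example page names are hypothetical; the sketch assumes the ad server learns the embedding page from the request (for instance via the Referer header) and keys its log on its own cookie.

```python
import uuid
from collections import defaultdict

class AdServer:
    """Toy stand-in for lameads.com: one cookie per browser, one log per cookie."""

    def __init__(self):
        self.history = defaultdict(list)    # cookie id -> pages where the ad was loaded

    def serve_ad(self, cookie_id, embedding_page):
        if cookie_id is None:                # first time this browser is seen
            cookie_id = uuid.uuid4().hex     # value stored in the lameads.com cookie
        self.history[cookie_id].append(embedding_page)
        return cookie_id

lameads = AdServer()
browser_cookie = None                        # the browser has no lameads.com cookie yet
for page in ["unicornsareawesome.com/", "news.example/today", "shoes.example/heels"]:
    browser_cookie = lameads.serve_ad(browser_cookie, page)

# One cookie ID now ties together visits to three unrelated sites.
profile = lameads.history[browser_cookie]
```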

HTTP cookies: Grey Area?
3rd party cookies allow ad servers to personalize your ads = more useful to you. Good!
But
  You choose to go to unicornsareawesome.com = ok with unicornsareawesome.com knowing about how you use their site
  Nowhere did you choose to let lameads.com monitor your browsing habits

Short Discussion:
Collusion: tool to track these 3rd party cookies
TED talk on “Tracking the Trackers”
http://www.ted.com/talks/gary_kovacs_tracking_the_trackers.html

Google Street View: What is it?
Google cars drive around and take 360° panoramic pictures.
Images are stitched together and can be browsed through on the Internet.

Google Street View: Me

Google Street View: Lots to See

Google Street View: Grey Area
Expectation of privacy?
  I’m in public, I can expect people will see me
Expectations?
  Picture linked to location
  Searchable
  Widely available
  Available for a long time to come

Facebook: What is it?
Social networking site
  Connect with friends
  Share pictures, interests (“likes”)

Facebook: Grey Area
Who uses Facebook data and how is data used?
  4.7 million liked a page about health conditions or treatments. Insurance agents?
  4.8 million shared information about dates of vacations. Burglars?
  2.6 million discussed recreational use of alcohol. Employers?

Facebook: More Grey
Security issues with Facebook
Confusion over privacy settings
Sudden changes in default privacy settings
Facebook tracks browsing habits, even if a user isn’t logged in (third-party cookies)
Facebook sells user information to ad agencies and behavioral trackers

Why start with these examples?
3 examples: HTTP cookies, Google Street View, Facebook
Lots more “every day” examples
Users gain benefits by sharing data
Tons of data generated, widely shared and accessible and stored (for how long?)
Are users really aware of how their data is used and by whom?

Today’s Agenda
Privacy and Privacy & Security
How do we “safely” share private data?
Privacy and Inferred Information
Privacy and Social Networks
How do we design a system with privacy in mind?

Examples of private information
Tons of information can be gained from Internet use:
  Behavior: e.g., Person X reads reddit.com at work.
  Preferences: e.g., Person Y likes high heel shoes and uses Apple products.
  Associations: e.g., Person X and Person Y are friends.
  PPI (private, personal/protected information): credit card #s, SSN, nicknames, addresses
  PII (personally identifying information): e.g., your age + your address = I know who you are, even if I’m not given your name.
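The "age + address" point can be checked on a toy data set. This is a sketch only: the records, field names, and the helper `unique_fraction` are invented for illustration.

```python
from collections import Counter

# Hypothetical "anonymous" records: names removed, age and street kept.
records = [
    {"age": 34, "street": "Elm St", "reads": "reddit.com"},
    {"age": 34, "street": "Oak Ave", "reads": "news"},
    {"age": 51, "street": "Elm St", "reads": "sports"},
    {"age": 51, "street": "Elm St", "reads": "recipes"},
]

def unique_fraction(rows, fields):
    """Fraction of rows whose combination of `fields` appears exactly once."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    singled_out = sum(1 for r in rows if counts[tuple(r[f] for f in fields)] == 1)
    return singled_out / len(rows)

by_age = unique_fraction(records, ["age"])                       # nobody is unique by age alone
by_age_and_street = unique_fraction(records, ["age", "street"])  # half the rows are now unique
```

Neither field identifies anyone alone in this sample, yet the pair singles out two of the four people.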

How do we achieve privacy?
policy + security mechanisms + law + ethics + trust
Anonymity & anonymization mechanisms
  Make each user indistinguishable from the next
  Remove PPI & PII
  Aggregate information

Who wants private info?
Governments – surveillance
Businesses – targeted advertising, following trends
Attackers – monetize information or cause havoc
Researchers – medical, behavioral, social, computer

Who has private info?
You and me
  End-users
  Customers
  Patients
Businesses
  Protect mergers, product plans, investigations
Government & law enforcement
  National security
  Criminal investigations

Privacy and Security
Security enables privacy
  Data is only as safe as the system it’s on
Sometimes security is at odds with privacy
  E.g., security requires authentication, but privacy is achieved through anonymity
  E.g., TSA pat-down at the airport


Why do we want to share?
Share existing data sets:
  Research
  Companies
    Buy data from each other
    Check out each other’s assets before mergers/buyouts
Start a new dataset:
  Mutually beneficial relationships
  Share data with me and you can use this service

Sharing everything?
Easy, but what are the ramifications?
Legal/policy may limit what can be shared/collected
  IRBs: Institutional Review Boards
  HIPAA: Health Insurance Portability and Accountability Act
  HITECH: Health Information Technology for Economic and Clinical Health Act
Future use and protection of data?

Mechanisms for limited sharing
Remove really sensitive stuff (sanitization)
  PPI & PII (private/personal and personally identifying information)
  Without a crystal ball, this is hard
Anonymization
  Replace information to limit ability to tie entities to meaningful identities
Aggregation
  Remove PII by only collecting/releasing statistics
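The three mechanisms can be sketched side by side on a toy data set. Everything here is an assumption made for illustration: the records, the field names, and the helper functions `sanitize`, `anonymize`, and `aggregate`.

```python
from statistics import mean

# Hypothetical records; every field name here is invented.
records = [
    {"name": "Alice", "ssn": "000-00-0001", "zip": "90007", "visits": 3},
    {"name": "Bob",   "ssn": "000-00-0002", "zip": "90007", "visits": 5},
    {"name": "Alice", "ssn": "000-00-0001", "zip": "90089", "visits": 1},
]

def sanitize(rows, sensitive=("ssn",)):
    """Sanitization: drop the fields judged too sensitive to share."""
    return [{k: v for k, v in r.items() if k not in sensitive} for r in rows]

def anonymize(rows, field="name"):
    """Anonymization: replace identities with consistent, meaningless labels."""
    labels = {}
    out = []
    for r in rows:
        label = labels.setdefault(r[field], f"person-{len(labels) + 1}")
        out.append({**r, field: label})
    return out

def aggregate(rows, field="visits"):
    """Aggregation: release only a statistic, never the rows."""
    return {"count": len(rows), f"mean_{field}": mean(r[field] for r in rows)}

shared = anonymize(sanitize(records))
summary = aggregate(records)
```

Note that `shared` still carries the `zip` field, which is exactly the kind of leftover that can become identifying once it is combined with other data.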

Anonymization Example
Network trace: [Figure: packet with routing information and PAYLOAD]
All sorts of PII and PPI in there!
Routing information: IP addresses, TCP flags/options, OS fingerprinting
Remove IPs? Anonymize IPs?
Removing IPs severely limits what you can do with the data.
Replace with something identifying, but not the same data.
  IP1 = A
  IP2 = B
  Etc.
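The "IP1 = A, IP2 = B" replacement can be sketched as a consistent pseudonym table. The helper `make_ip_anonymizer` and the trace are hypothetical; the addresses come from the ranges reserved for documentation.

```python
import string

def make_ip_anonymizer():
    """Return a function mapping each distinct IP to a stable label: A, B, C, ..."""
    mapping = {}

    def anonymize(ip):
        if ip not in mapping:
            n = len(mapping)
            # A..Z, then A1..Z1, A2..Z2, ... so labels never collide
            letter = string.ascii_uppercase[n % 26]
            mapping[ip] = letter if n < 26 else f"{letter}{n // 26}"
        return mapping[ip]

    return anonymize

# Hypothetical trace of (source, destination) pairs.
trace = [("192.0.2.10", "198.51.100.7"),
         ("192.0.2.11", "198.51.100.7"),
         ("192.0.2.10", "203.0.113.5")]

anon = make_ip_anonymizer()
anonymized_trace = [(anon(src), anon(dst)) for src, dst in trace]
```

The same address always gets the same label, so "who talks to whom" analysis still works, but the labels follow first-appearance order and the mapping table itself must be protected or discarded.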

Aggregation Example
“Fewer U.S. Households Have Debt, But Those Who Do Have More, Census Bureau Reports”

Methods can be bad or good
Just because someone uses aggregation or anonymization doesn’t mean the data is safe
Example: Release aggregate stats of people’s favorite color?
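One way aggregation can go wrong, offered as an assumption about what the favorite-color example is getting at: when a group is small enough, its "statistic" is just one person's answer. The survey data, the grouping by department, and the `min_group_size` threshold below are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical survey rows: (department, favorite color).
survey = [("sales", "blue"), ("sales", "red"), ("sales", "blue"),
          ("legal", "green")]                  # legal has a single employee

def color_stats(rows, min_group_size=1):
    """Per-group color counts, withholding groups smaller than min_group_size."""
    groups = defaultdict(Counter)
    for group, color in rows:
        groups[group][color] += 1
    return {g: dict(c) for g, c in groups.items()
            if sum(c.values()) >= min_group_size}

naive = color_stats(survey)                    # the legal entry is one person's answer
safer = color_stats(survey, min_group_size=3)  # small groups are suppressed
```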


What is Inferred?
Take 2 sources of information, correlate data
X + Y = …
Example: Google Street View + what my car looks like + where I live = you know where I was back in November

Another example
Paula Broadwell, who had an affair with CIA director David Petraeus, similarly took extensive precautions to hide her identity. She never logged in to her anonymous e-mail service from her home network. Instead, she used hotel and other public networks when she e-mailed him. The FBI correlated hotel registration data from several different hotels -- and hers was the common name.
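The correlation step amounts to a set intersection. A minimal sketch with invented hotel names and guest names:

```python
# Hypothetical guest lists for the hotels whose networks were used.
guests_by_hotel = {
    "hotel_a": {"P. Jones", "R. Smith", "T. Lee"},
    "hotel_b": {"P. Jones", "M. Garcia"},
    "hotel_c": {"K. Patel", "P. Jones", "R. Smith"},
}

# Each list alone is unremarkable; the intersection is what identifies someone.
common_guests = set.intersection(*guests_by_hotel.values())
```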

Another example: Netflix & IMDB
Netflix prize: released an anonymized dataset
Correlated with IMDB: undid anonymization (University of Texas)
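A toy version of this kind of linkage, assuming exact matches on (title, rating) pairs. The data, the `link` helper, and the `min_overlap` threshold are invented; this sketch illustrates the idea and is not the method used in the actual study.

```python
# "Anonymized" ratings: user names replaced by numbers. All data is invented.
anonymized = {
    101: {("Movie A", 5), ("Movie B", 1), ("Movie C", 4)},
    102: {("Movie A", 3), ("Movie D", 2)},
}
# Public reviews posted under real names on another site.
public = {
    "alice": {("Movie A", 5), ("Movie C", 4)},
    "bob":   {("Movie D", 2)},
}

def link(anon_rows, public_rows, min_overlap=2):
    """Match each public profile to the anonymized user sharing the most ratings."""
    matches = {}
    for name, reviews in public_rows.items():
        best_id, best = max(((uid, len(reviews & ratings))
                             for uid, ratings in anon_rows.items()),
                            key=lambda pair: pair[1])
        if best >= min_overlap:          # require enough overlap to trust the match
            matches[name] = best_id
    return matches

reidentified = link(anonymized, public)
```

Once "alice" is tied to user 101, her rating of Movie B, which she never posted publicly, is exposed as well.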


What is social networking data?
Associations
Not what you say, but who you talk to

OMG NEW BOYFRIEND

Why is social data interesting?
From a privacy point of view:
  Guilt by association
  E.g., governments are very interested
    Phone records (US)
    Facebook activity (Iran)

Computer Communication
Computer communication = social network
What sites/servers you visit/use = information on your relationship with those sites/servers
Never mind the content… How often you visit and who you visit may reveal a lot!

You Unicornsareawesome.com
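What an observer learns without ever reading content can be sketched by counting (source, destination) pairs. The observations and host names below are invented.

```python
from collections import Counter

# Hypothetical observations on the wire: (source, destination, payload).
observed = [
    ("you", "unicornsareawesome.com", b"<encrypted>"),
    ("you", "clinic.example", b"<encrypted>"),
    ("you", "unicornsareawesome.com", b"<encrypted>"),
    ("you", "clinic.example", b"<encrypted>"),
    ("you", "clinic.example", b"<encrypted>"),
]

# The observer never reads a payload, yet still learns who talks to whom and how often.
contact_counts = Counter((src, dst) for src, dst, _payload in observed)
most_contacted = contact_counts.most_common(1)[0]
```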

How do we provide privacy?
Of course encrypt content (payload)!
But: Network/transport layer = no encryption (for now)
Anyone along the path can see source and destination… so now what?

Onion Routing
General idea: bounce connection through a bunch of machines

Don’t we bounce around already?
[Figure] Not actually what happens…
[Figure] Closer to what actually happens.
Yes, we route packets through a series of routers
BUT this doesn’t protect the privacy of who’s talking to whom…
Why?
[Figure: packet, routing information plus PAYLOAD] The packet contains routing information. The payload can be ENCRYPTED.

Yes, we bounce… but:
  Everyone along the way can see src & dst
  Routes are easy to figure out
Contains routing information = can’t encrypt. Everyone along the path (routers and observers) can see who is talking to whom.

Onion routing saves us
Each router only knows about the last/next hop
Routes are hard to figure out
  Change frequently
  Chosen by the source
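The "last/next hop" property can be sketched by listing what each relay on a path can see. The circuit and the helper `relay_views` are hypothetical.

```python
def relay_views(circuit):
    """For each relay in the circuit, list the only two parties it can see."""
    return {circuit[i]: (circuit[i - 1], circuit[i + 1])
            for i in range(1, len(circuit) - 1)}

# Hypothetical three-relay circuit chosen by the source.
circuit = ["you", "relay1", "relay2", "relay3", "unicornsareawesome.com"]
views = relay_views(circuit)

# No single relay sees both ends of the conversation.
knows_both_ends = [relay for relay, seen in views.items()
                   if "you" in seen and "unicornsareawesome.com" in seen]
```

With only one relay on the path, that relay would see both ends, which is why the path needs several hops.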

The Onion part of Onion Routing

Layers of encryption

PAYLOAD

Last hop’s key

Second hop’s key

First hop’s key
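The layering can be sketched with a toy cipher built from the standard library. This is an illustration only: `keystream_xor` is NOT secure, the keys are made up, and a real system would use a vetted cipher. With this XOR toy the layers happen to commute, which a real layered scheme should not be assumed to do.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream. Illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

hop_keys = [b"first-hop-key", b"second-hop-key", b"last-hop-key"]  # hypothetical keys

def wrap(payload: bytes, keys) -> bytes:
    """Source: apply the last hop's layer first so the first hop's layer is outermost."""
    for key in reversed(keys):
        payload = keystream_xor(key, payload)
    return payload

def peel(onion: bytes, key: bytes) -> bytes:
    """Relay: remove exactly one layer with its own key."""
    return keystream_xor(key, onion)

onion = wrap(b"GET / HTTP/1.1", hop_keys)
after_first = peel(onion, hop_keys[0])          # still wrapped in two layers
after_second = peel(after_first, hop_keys[1])   # still wrapped in one layer
after_last = peel(after_second, hop_keys[2])    # original payload
```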

Onion Routing Example: Tor
[Figure: You, three Tor routers (1st, 2nd, 3rd), and Unicornsareawesome.com]
Get a list of Tor routers from the publicly known Tor directory
  Tor router IPs + public key for each router
Choose a set of Tor routers to use (1st, 2nd, 3rd)
Packets are now encrypted with 3 keys
Source: YOU, Dest: 1st Tor router
  1st Tor router decrypts 1st layer
Source: 1st Tor router, Dest: 2nd Tor router
  2nd Tor router decrypts 2nd layer
Source: 2nd Tor router, Dest: 3rd Tor router
  3rd Tor router decrypts last layer
Original (unencrypted) packet sent to server.
Source: 3rd Tor router, Dest: Unicornsareawesome.com

What does our attacker see?
Encrypted traffic from You, to 1st Tor router
Other view points? Not easily traceable to you.
Global view points? Very unlikely... But if so… trouble!
Views of both ends? Also unlikely… but an attacker who has them can perform end-to-end correlation.
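End-to-end correlation can be sketched as matching packet timing at the two ends of the path. The timestamps, flow names, `tolerance`, and the helpers below are invented; real traffic analysis is statistical and far noisier than this toy.

```python
def interarrival(times):
    """Gaps between consecutive packet timestamps, in seconds."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]

def best_match(entry_flow, exit_flows, tolerance=0.02):
    """Return the exit flow whose timing pattern matches the entry flow, if any."""
    target = interarrival(entry_flow)
    for name, times in exit_flows.items():
        gaps = interarrival(times)
        if len(gaps) == len(target) and all(
                abs(g - t) <= tolerance for g, t in zip(gaps, target)):
            return name
    return None

# Hypothetical view: the attacker watches You -> 1st router and 3rd router -> servers.
entry = [0.00, 0.10, 0.45, 0.50]
exits = {
    "to unicornsareawesome.com": [0.30, 0.41, 0.75, 0.80],  # same rhythm, shifted by latency
    "to othersite.example":      [0.20, 0.60, 0.70, 1.30],
}
linked = best_match(entry, exits)
```

Encryption hides what is said at both ends, but not the rhythm of the packets, which is all this match relies on.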

Reliance on multiple users

What would happen here if You were the only one using Tor?

You

Side note: Tor is an overlay

Tor routers are often just someone’s regular machine. Traffic is still routed over regular routers too.

Onion Routing: Things to Note
Not perfect, but pretty nifty
End host (unicornsareawesome.com) does not need to know about the Tor protocol (good for wide usage and acceptance)
Data is encrypted all the way to the last Tor router
  If end-to-end application (like HTTPS) is using encryption, the payload is doubly encrypted along the Tor route.


Designing privacy preserving systems
Aim for the minimum amount of information needed to achieve goals
Think through how info can be gained and inferred
  Inferred is often a gotcha! x + y = something private, but x and y by themselves don’t seem all that special
Think through where information can be gained
  On the wire? Stored in logs? At a router? At an ISP?
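"Minimum amount of information" can be sketched as a logging filter. The field names, the `NEEDED_FIELDS` choice, and the /24 truncation are assumptions for illustration (IPv4 only); what counts as "needed" depends on the system's goals.

```python
import ipaddress

NEEDED_FIELDS = ("timestamp", "status")   # assumption: all this service needs for its goal

def minimize(event: dict) -> dict:
    """Keep only the needed fields, plus a coarsened client network instead of the full IP."""
    kept = {k: event[k] for k in NEEDED_FIELDS}
    network = ipaddress.ip_network(event["client_ip"] + "/24", strict=False)
    kept["client_net"] = str(network)
    return kept

raw_event = {"timestamp": "2016-02-24T10:00:00Z", "status": 200,
             "client_ip": "192.0.2.77", "url": "/search?q=private+thing",
             "user_agent": "ExampleBrowser/1.0"}
log_entry = minimize(raw_event)
```

Fields that are never written to the log cannot later be stolen, subpoenaed, or correlated.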

Privacy and Stored Information
Data is only as safe as the system
How long the data is stored affects privacy
Longer term = bigger privacy risk (in general)
  Longer time frame, more data to correlate & infer
  Longer opportunity for data theft
  Increased chances of mistakes, lapsed security, etc.
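One way to act on the retention point is to purge stored records after a fixed window. A minimal sketch: the 30-day `RETENTION` value, the record layout, and `purge_expired` are assumptions, not a recommendation.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)            # assumption: a policy choice for this example

def purge_expired(rows, now, retention=RETENTION):
    """Keep only rows younger than the retention window."""
    return [r for r in rows if now - r["stored_at"] <= retention]

now = datetime(2016, 2, 24, tzinfo=timezone.utc)
rows = [
    {"id": 1, "stored_at": now - timedelta(days=5)},
    {"id": 2, "stored_at": now - timedelta(days=400)},
]
kept = purge_expired(rows, now)
```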

An example of keeping privacy in mind

My work: P2P file sharing detection
