PRIVACY
USC CSCI499
Dr. Genevieve Bartlett
USC/ISI
Privacy
The state or condition of being free from observation.
Not really possible today… at least not on the internet.
Privacy
The right of people to choose freely under what circumstances and to what extent they will reveal themselves, their attitude, and their behavior to others.
Privacy is not black and white
Lots of grey areas and points for discussion
What seems private to you may not seem private to me
Three examples to start us off:
  HTTP cookies
  Google Street View
  Facebook
HTTP cookies: What are they?
Cookies = small text files
Received from a server, stored on your machine
Usually web
Purpose: HTTP is stateless, so cookies maintain state across HTTP requests
  E.g. keeping the contents of your “shopping cart” while you browse a site
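A minimal sketch of the shopping-cart idea, using Python's standard `http.cookies` module. The cookie name `cart_id` and the value `abc123` are made up for illustration; a real site would pick its own names and attributes.

```python
from http.cookies import SimpleCookie

# Server side: attach a cookie to the HTTP response.
# "cart_id" and "abc123" are hypothetical values.
response_cookie = SimpleCookie()
response_cookie["cart_id"] = "abc123"
response_cookie["cart_id"]["path"] = "/"
set_cookie_header = response_cookie.output()  # a "Set-Cookie: ..." header line

# Browser side: the stored cookie is sent back with each later request.
request_header_value = "cart_id=abc123"

# Server side, next request: parse the Cookie header to recover the state.
request_cookie = SimpleCookie()
request_cookie.load(request_header_value)
cart_id = request_cookie["cart_id"].value

print(set_cookie_header)
print("server recognizes cart:", cart_id)
```

The protocol itself stays stateless; the state lives in the identifier the browser keeps returning.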
HTTP cookies: 3rd party cookies
You visit your favorite site, unicornsareawesome.com
unicornsareawesome.com pulls ads from lameads.com
You get a cookie from lameads.com, even though you never visited lameads.com
lameads.com can track your browsing habits every time you visit any page with ads from lameads.com… and that might be a lot of pages
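The tracking described here can be sketched as a toy simulation. The `AdServer` class, the page names, and `secondsite.example` are all hypothetical; the only assumption about real behavior is that the ad server learns which page embedded the ad (for example through the Referer header).

```python
from collections import defaultdict
import uuid

class AdServer:
    """Toy stand-in for lameads.com (hypothetical behavior)."""

    def __init__(self):
        # cookie id -> pages on which this browser was shown an ad
        self.history = defaultdict(list)

    def serve_ad(self, cookie_id, embedding_page):
        # First contact: hand out a fresh identifier as a cookie.
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        # Record which page embedded the ad for this cookie id.
        self.history[cookie_id].append(embedding_page)
        return cookie_id

lameads = AdServer()
browser_cookie_for_lameads = None  # the browser has never talked to lameads.com

# secondsite.example is a made-up second site that also embeds lameads.com ads.
for page in ["unicornsareawesome.com/home",
             "unicornsareawesome.com/gallery",
             "secondsite.example/news"]:
    browser_cookie_for_lameads = lameads.serve_ad(browser_cookie_for_lameads, page)

profile = lameads.history[browser_cookie_for_lameads]
print(profile)
```

One cookie id ends up tied to pages on two unrelated sites, even though the user never typed lameads.com into the browser.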
HTTP cookies: Grey Area?
3rd party cookies allow ad servers to personalize your ads = more useful to you. Good!
But:
  You chose to go to unicornsareawesome.com = you are OK with unicornsareawesome.com knowing how you use their site
  Nowhere did you choose to let lameads.com monitor your browsing habits
Short Discussion:
Collusion: a tool to track these 3rd party cookies
TED talk on “Tracking the Trackers”: http://www.ted.com/talks/gary_kovacs_tracking_the_trackers.html
Google Street View: What is it?
Google cars drive around and take 360° panoramic pictures.
Images are stitched together and can be browsed on the Internet.
Google Street View: Me
Google Street View: Lots to See
Google Street View: Grey Area
Expectation of privacy?
  I’m in public, I can expect people will see me
Expectations?
  Picture linked to location
  Searchable
  Widely available
  Available for a long time to come
Facebook: What is it?
Social networking site
Connect with friends
Share pictures, interests (“likes”)
Facebook: Grey Area
Who uses Facebook data and how is the data used?
  4.7 million liked a page about health conditions or treatments. Insurance agents?
  4.8 million shared information about dates of vacations. Burglars?
  2.6 million discussed recreational use of alcohol. Employers?
Facebook: More Grey
Security issues with Facebook
Confusion over privacy settings
Sudden changes in default privacy settings
Facebook tracks browsing habits, even if a user isn’t logged in (third-party cookies)
Facebook sells user information to ad agencies and behavioral trackers
Why start with these examples?
3 examples: HTTP cookies, Google Street View, Facebook
  Lots more “everyday” examples
Users gain benefits by sharing data
Tons of data generated, widely shared, accessible, and stored (for how long?)
Are users really aware of how their data is used and by whom?
Today’s Agenda
Privacy and Privacy & Security
How do we “safely” share private data?
Privacy and Inferred Information
Privacy and Social Networks
How do we design a system with privacy in mind?
Examples of private information
Tons of information can be gained from Internet use:
  Behavior: e.g. Person X reads reddit.com at work.
  Preferences: e.g. Person Y likes high heel shoes and uses Apple products.
  Associations: e.g. Person X and Person Y are friends.
  PPI (private, personal/protected information): credit card #s, SSN, nicknames, addresses
  PII (personally identifying information): e.g. your age + your address = I know who you are, even if I’m not given your name.
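To illustrate the age + address point, here is a small sketch over made-up records with the names already removed. It counts how many records share each attribute alone versus the combination; all values are hypothetical.

```python
from collections import Counter

# Hypothetical records with names already removed: (age, address).
records = [
    (34, "12 Elm St"),
    (34, "98 Oak Ave"),
    (51, "12 Elm St"),
    (27, "7 Pine Rd"),
]

ages = Counter(age for age, _ in records)
addresses = Counter(addr for _, addr in records)
pairs = Counter(records)

for age, addr in records:
    print(age, addr,
          "| same age:", ages[age],
          "| same address:", addresses[addr],
          "| same age+address:", pairs[(age, addr)])

# Number of records that the (age, address) combination singles out.
unique_by_pair = sum(1 for r in records if pairs[r] == 1)
print("records unique by age+address:", unique_by_pair)
```

Neither attribute is unique for every record on its own, but in this toy data the combination singles out each record.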
How do we achieve privacy?
policy + security mechanisms + law + ethics + trust
Anonymity & anonymization mechanisms
  Make each user indistinguishable from the next
  Remove PPI & PII
  Aggregate information
Who wants private info?
Governments: surveillance
Businesses: targeted advertising, following trends
Attackers: monetize information or cause havoc
Researchers: medical, behavioral, social, computer
Who has private info?
You and me
  End-users
  Customers
  Patients
Businesses
  Protect mergers, product plans, investigations
Government & law enforcement
  National security
  Criminal investigations
Privacy and Security
Security enables privacy
  Data is only as safe as the system it’s on
Sometimes security is at odds with privacy
  E.g. security requires authentication, but privacy is achieved through anonymity
  E.g. TSA pat down at the airport
Why do we want to share?
Share existing data sets:
  Research
  Companies
    Buy data from each other
    Check out each other’s assets before mergers/buyouts
Start a new dataset:
  Mutually beneficial relationships
    Share data with me and you can use this service
Sharing everything?
Easy, but what are the ramifications?
Legal/policy may limit what can be shared/collected
  IRBs: Institutional Review Boards
  HITECH (Health Information Technology for Economic and Clinical Health Act) & HIPAA (Health Insurance Portability and Accountability Act)
Future use and protection of data?
Mechanisms for limited sharing
Remove really sensitive stuff (sanitization)
  PPI & PII (private, personal & personally identifying)
  Without a crystal ball, this is hard
Anonymization
  Replace information to limit the ability to tie entities to meaningful identities
Aggregation
  Remove PII by only collecting/releasing statistics
Anonymization Example
Network trace: PAYLOAD
All sorts of PII and PPI in there!
Routing information: IP addresses, TCP flags/options, OS fingerprinting
Remove IPs? Anonymize IPs?
Removing IPs severely limits what you can do with the data. Replace with something identifying, but not the same data.
  IP1 = A
  IP2 = B
  Etc.
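The IP1 = A, IP2 = B idea can be sketched as a consistent pseudonym mapping. This is only one simple way to do it; the helper name `make_pseudonymizer` and the trace contents (documentation-range addresses) are assumptions for illustration.

```python
import string
from itertools import count

def make_pseudonymizer():
    """Return a function mapping each distinct IP to a stable label.

    Labels are A, B, ..., Z for the first 26 addresses, then N26, N27, ...
    """
    mapping = {}
    counter = count()

    def pseudonym(ip):
        if ip not in mapping:
            n = next(counter)
            mapping[ip] = string.ascii_uppercase[n] if n < 26 else f"N{n}"
        return mapping[ip]

    return pseudonym

# Hypothetical trace of (source IP, destination IP) pairs.
trace = [
    ("192.0.2.10", "198.51.100.5"),
    ("192.0.2.10", "203.0.113.7"),
    ("198.51.100.5", "192.0.2.10"),
]

anon = make_pseudonymizer()
anonymized = [(anon(src), anon(dst)) for src, dst in trace]
print(anonymized)  # [('A', 'B'), ('A', 'C'), ('B', 'A')]
```

The same address always gets the same label, so who-talks-to-whom patterns survive for analysis. That preserved structure is also what an attacker with side information could try to exploit.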
Aggregation Example
“Fewer U.S. Households Have Debt, But Those Who Do Have More, Census Bureau Reports”
Methods can be bad or good
Just because someone uses aggregation or anonymization doesn’t mean the data is safe
Example: release aggregate stats of people’s favorite color?
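One possible reading of the favorite-color question is that aggregates stop protecting anyone when a group is very small. The sketch below assumes that reading; the neighborhoods and answers are invented.

```python
from collections import Counter, defaultdict

# Hypothetical survey rows: (neighborhood, favorite color). Names are never released.
rows = [
    ("Northside", "blue"), ("Northside", "green"), ("Northside", "blue"),
    ("Hilltop", "purple"),  # only one respondent lives here
]

# Aggregate: count colors per neighborhood.
per_group = defaultdict(Counter)
for neighborhood, color in rows:
    per_group[neighborhood][color] += 1

for neighborhood, counts in per_group.items():
    size = sum(counts.values())
    note = "reveals one person's answer" if size == 1 else "gives some cover from the group"
    print(neighborhood, dict(counts), "->", note)

# Groups of size one are effectively individual records.
singleton_groups = [g for g, c in per_group.items() if sum(c.values()) == 1]
print(singleton_groups)
```

Anyone who knows who lives in the single-respondent group learns that person's answer from the "aggregate" alone.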
What is Inferred?
Take 2 sources of information, correlate data
X + Y = ….
Example: Google Street View + what my car looks like + where I live = you know where I was back in November
Another example
Paula Broadwell, who had an affair with CIA director David Petraeus, similarly took extensive precautions to hide her identity. She never logged in to her anonymous e-mail service from her home network. Instead, she used hotel and other public networks when she e-mailed him. The FBI correlated hotel registration data from several different hotels -- and hers was the common name.
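The correlation step in this story amounts to a set intersection. A tiny sketch with invented hotels and guest names:

```python
# Hypothetical guest lists for the hotels whose networks were used.
hotel_guests = {
    "Hotel A": {"Alice", "Bob", "Carol", "Dana"},
    "Hotel B": {"Carol", "Erin", "Frank"},
    "Hotel C": {"Carol", "Bob", "Grace"},
}

# Names that appear on every list.
common = set.intersection(*hotel_guests.values())
print(common)  # {'Carol'}
```

Each list on its own says little; each extra list shrinks the candidate set.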
Another example: Netflix & IMDB
Netflix prize: released an anonymized dataset
Correlated with IMDB: undid anonymization (University of Texas)
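A heavily simplified sketch of this kind of linkage, matching public reviews to anonymized records by overlapping (movie, rating) pairs. This is not the researchers' actual algorithm; the data, names, and the `best_match` helper are hypothetical.

```python
# Hypothetical "anonymized" release: user names replaced by numbers, ratings kept.
anonymized = {
    101: {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)},
    102: {("Movie A", 3), ("Movie D", 5)},
}

# Hypothetical public reviews posted under real names.
public = {
    "pat": {("Movie A", 5), ("Movie C", 4)},
    "sam": {("Movie D", 5), ("Movie E", 1)},
}

def best_match(public_ratings, release):
    """Return the anonymized ID whose ratings overlap most with the public ones."""
    return max(release, key=lambda uid: len(release[uid] & public_ratings))

links = {name: best_match(ratings, anonymized) for name, ratings in public.items()}
print(links)  # {'pat': 101, 'sam': 102}
```

Once a name is linked to an anonymized ID, every other rating in that record is exposed too.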
What is social networking data?
Associations
Not what you say, but who you talk to
OMG NEW BOYFRIEND
Why is social data interesting?
From a privacy point of view:
  Guilt by association
  E.g. governments are very interested
    Phone records (US)
    Facebook activity (Iran)
Computer Communication
Computer communication = social network
What sites/servers you visit/use = information on your relationship with those sites/servers
Never mind the content… How often you visit and who you visit may reveal a lot!
[Figure: You and Unicornsareawesome.com]
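A small sketch of what an observer could learn from addressing information alone. The flow records and `clinic.example` are invented; payloads are assumed encrypted and are never looked at.

```python
from collections import Counter

# Hypothetical flow records seen by an on-path observer: (source, destination).
# Payloads are assumed encrypted and are not recorded at all.
flows = [
    ("you", "unicornsareawesome.com"),
    ("you", "unicornsareawesome.com"),
    ("you", "clinic.example"),
    ("you", "unicornsareawesome.com"),
    ("someone_else", "clinic.example"),
]

# Count how often "you" contacted each destination.
visits = Counter(dst for src, dst in flows if src == "you")
print(visits.most_common())
```

Frequency and choice of destinations already form a profile, without a single byte of content.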
How do we provide privacy?
Of course encrypt content (payload)!
But: network/transport layer = no encryption (for now)
Anyone along the path can see source and destination… so now what?
Onion Routing
General idea: bounce connection through a bunch of machines
Don’t we bounce around already?
[Figure: not actually what happens…]
[Figure: closer to what actually happens.]
Yes, we route packets through a series of routers
BUT this doesn’t protect the privacy of who’s talking to whom…
Why?
[Packet diagram: header contains routing information; PAYLOAD is ENCRYPTED]
Yes, we bounce… but:
  Everyone along the way can see src & dst
  Routes are easy to figure out
Contains routing information = can’t encrypt
Everyone along the path (routers and observers) can see who is talking to whom
Onion routing saves us
Each router only knows about the last/next hop
Routes are hard to figure out
  Change frequently
  Chosen by the source
The Onion part of Onion Routing
Layers of encryption
[Figure: PAYLOAD wrapped in the last hop’s key, then the second hop’s key, then the first hop’s key (outermost)]
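The layering can be sketched with a toy cipher. Everything here is an assumption for illustration: the XOR keystream built from SHA-256 is NOT secure and is not what Tor uses, and the keys and payload are made up. It only shows the wrap and peel order.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a repeatable byte stream from a key (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(data: bytes, key: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Hypothetical per-hop keys, in path order: first, second, last hop.
hop_keys = [b"first-hop-key", b"second-hop-key", b"last-hop-key"]

payload = b"GET / HTTP/1.1"

# The sender wraps the payload with the last hop's key first,
# so the first hop's layer ends up outermost.
onion = payload
for key in reversed(hop_keys):
    onion = xor_layer(onion, key)

# Each hop peels exactly one layer, in path order.
in_transit = onion
for key in hop_keys:
    in_transit = xor_layer(in_transit, key)

print(in_transit)  # b'GET / HTTP/1.1'
```

Because XOR layers commute, this toy does not enforce the peeling order the way a real cipher with per-hop headers would; it is meant only to make the nesting concrete.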
Onion Routing Example: Tor
[Figure sequence: You, three Tor routers (1st, 2nd, 3rd), and Unicornsareawesome.com]
Get a list of Tor routers from the publicly known Tor directory
  Tor router IPs + public key for each router
Choose a set of Tor routers to use (1st, 2nd, 3rd)
Packets are now encrypted with 3 keys
Source: You, Dest: 1st Tor router
  1st Tor router decrypts 1st layer
Source: 1st Tor router, Dest: 2nd Tor router
  2nd Tor router decrypts 2nd layer
Source: 2nd Tor router, Dest: 3rd Tor router
  3rd Tor router decrypts last layer
Original (unencrypted) packet sent to server.
Source: 3rd Tor router, Dest: Unicornsareawesome.com
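The source/destination pairs in this walkthrough can be generated from the chosen path. A short sketch, with the path names taken from the slides and the check at the end added for illustration:

```python
# Path chosen by the source; names mirror the slides.
path = ["You", "1st Tor router", "2nd Tor router", "3rd Tor router",
        "Unicornsareawesome.com"]

# An observer on one link sees only that link's source and destination.
links = list(zip(path, path[1:]))
for src, dst in links:
    print(f"Source: {src}, Dest: {dst}")

# No single link carries both endpoints in its addressing.
exposes_both = [l for l in links if "You" in l and "Unicornsareawesome.com" in l]
print("links exposing both ends:", exposes_both)  # []
```

An observer would have to watch, and correlate, more than one link to connect You to the server.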
What does our attacker see?
Encrypted traffic from You to the 1st Tor router
Other viewpoints? Not easily traceable to you.
Global viewpoints? Very unlikely... But if so… trouble!
Also unlikely… an attacker who sees both ends can correlate traffic end-to-end.
Reliance on multiple users
What would happen here if You were the only one using Tor?
Side note: Tor is an overlay
Tor routers are often just someone’s regular machine. Traffic is still routed over regular routers too.
Onion Routing: Things to Note
Not perfect, but pretty nifty
End host (unicornsareawesome.com) does not need to know about the Tor protocol (good for wide usage and acceptance)
Data is encrypted all the way to the last Tor router
  If the end-to-end application (like HTTPS) is using encryption, the payload is doubly encrypted along the Tor route.
Designing privacy preserving systems
Aim for the minimum amount of information needed to achieve goals
Think through how info can be gained and inferred
  Inferred is often a gotcha! x + y = something private, but x and y by themselves don’t seem all that special
Think through where information can be gained
  On the wire? Stored in logs? At a router? At an ISP?
Privacy and Stored Information
Data is only as safe as the system
How long the data is stored affects privacy
Longer term = bigger privacy risk (in general)
  Longer time frame, more data to correlate & infer
  Longer opportunity for data theft
  Increased chances of mistakes, lapsed security, etc.
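One way to act on the storage-time point is a retention window that drops old records. A minimal sketch; the 30-day `RETENTION` value, the record layout, and the `purge_old` helper are assumptions, not a recommendation from the slides.

```python
from datetime import datetime, timedelta

# Hypothetical policy: keep records for 30 days at most.
RETENTION = timedelta(days=30)

def purge_old(records, now):
    """Keep only records no older than the retention window."""
    return [r for r in records if now - r["seen_at"] <= RETENTION]

now = datetime(2013, 3, 1)
log = [
    {"event": "login", "seen_at": datetime(2013, 2, 20)},
    {"event": "login", "seen_at": datetime(2012, 11, 5)},
]

log = purge_old(log, now)
print(log)
```

Data that has been purged cannot be stolen, leaked by mistake, or correlated later.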
An example of keeping privacy in mind
My work: P2P file sharing detection