privacy data storage humboldt

Upload: jeremie

Post on 30-May-2018


  • 8/9/2019 Privacy Data Storage Humboldt


    DATA STORAGE
    VIEWPOINT OF PRIVACY
    Slides from Prof. Johann-Christoph Freytag (Humboldt University, Berlin)


    Outline

    Privacy
    Privacy and context
    Privacy and mobility
    Privacy and context combined with privacy and mobility

    PRECIOSA kick off meeting, Paris, 11.04.2008
    HU Berlin: Prof. Johann-Christoph Freytag, Dipl.-Inf. Martin Kost


    Privacy: Privacy of movement


    [Figure; labels: B-O-333, RFID]


    Privacy: Is it always obvious?


    Is it always obvious that privacy is violated or breached?

    Latanya Sweeney's Finding

    In Massachusetts, USA, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees.

    GIC has to publish the data:

    GIC(zip, dob, sex, diagnosis, procedure, ...)    (dob = date of birth)

    http://lab.privacy.cs.cmu.edu/people/sweeney/

    [Sween01]


    Privacy: Latanya Sweeney's Finding


    Sweeney paid $20 and bought the voter registration list for Cambridge, MA:

    VOTER(name, party, ..., zip, dob, sex)

    GIC(zip, dob, sex, diagnosis, procedure, ...)

    William Weld (former governor) lives in Cambridge, hence is in VOTER.

    6 people in VOTER share his dob
    only 3 of them were men (same sex)
    Weld was the only one in that zip
    Sweeney learned Weld's medical records!
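The linkage described on this slide amounts to a join of VOTER and GIC on the shared attributes (zip, dob, sex). The following is a minimal sketch of that idea, not Sweeney's actual procedure. All rows, names, and values are made up for illustration, and the helper `link` is hypothetical.

```python
from collections import namedtuple

Voter = namedtuple("Voter", "name party zip dob sex")
Medical = namedtuple("Medical", "zip dob sex diagnosis")

# Entirely made-up toy rows; none of these values come from the real data sets.
VOTER = [
    Voter("Person A", "X", "Z1", "1950-01-01", "M"),
    Voter("Person B", "Y", "Z1", "1950-01-01", "F"),
    Voter("Person C", "X", "Z2", "1950-01-01", "M"),
    Voter("Person D", "Y", "Z2", "1962-03-04", "M"),
]
GIC = [
    Medical("Z1", "1950-01-01", "M", "diagnosis-1"),
    Medical("Z2", "1962-03-04", "M", "diagnosis-2"),
    Medical("Z2", "1950-01-01", "F", "diagnosis-3"),
]


def link(voters, records):
    """Join both relations on (zip, dob, sex) and keep only unique matches."""
    by_key = {}
    for v in voters:
        by_key.setdefault((v.zip, v.dob, v.sex), []).append(v.name)
    matches = []
    for r in records:
        names = by_key.get((r.zip, r.dob, r.sex), [])
        if len(names) == 1:  # exactly one voter fits: the record is re-identified
            matches.append((names[0], r.diagnosis))
    return matches


reidentified = link(VOTER, GIC)
print(reidentified)
```

Neither relation is sensitive in isolation: VOTER holds no diagnoses and GIC holds no names. The leak appears only when both are combined.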


    Privacy: Latanya Sweeney's Finding

    Observation: All systems worked as specified, yet important data has leaked.

    Information leakage occurred despite the observation that all systems worked as specified.
    Beyond correctness!
    What's missing?

    How do we protect against that kind of lack (leakage) of privacy?



    Privacy: Data Security

    Dorothy Denning, 1982:
    "Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification"

    Data Security = Confidentiality + Integrity (+ Availability)

    Distinct from system and network security



    Privacy: What is Privacy?


    Definition 1:
    Privacy reflects the ability of a person, organization, government, or entity to control its own space, where the concept of space (or privacy space) takes on different contexts.

    Physical space, against invasion
    Bodily space, medical consent
    Computer space, spam
    Web browsing space, Internet privacy

    [Sween02]

    [Agrawal03] Definition 2:
    Privacy is the right of individuals to determine for themselves when, how, and to what extent information about them is communicated to others.

    (We shall call this data/information privacy.)


    Privacy: Anonymity and unobservability


    [Figure: anonymity group; events: message, access]

    Everybody could be the originator of an event with an equal likelihood.


    Privacy: Approaches for non-observable communication


    [Figure: anonymity group; events: message, access]

    Whom to protect?
      sender
      (content of message)

    Basic approach:
      Dummy traffic
      Proxies
      MIX-Networks
      DC-Networks
      and more


    Privacy: Maintaining data privacy for accessing databases


    k-anonymity & its properties, introduced by Sweeney [Sween02]


    Privacy: An example: Medical Records


    Identifying: SoSecN, Name
    Sensitive information: Disease

    SoSecN  Name     Age  Ethnic B  Zipcode  Disease
    007     Chris    07   Caucas    12344    Arthritis
    009     Jane     77   Caucas    53211    Cold
    011     Adam     28   Caucas    70234    Heart problem
    023     Charlie  27   Afr-Amer  95505    Flu
    034     Eve      27   Afr-Amer  54327    Arthritis
    054     Yvonne   44   Hispanic  12007    Diabetes
    099     John     65   Hispanic  12007    Flu

    [Aggarwal03]


    Privacy: Medical Records: De-identify & Release


    Sensitive: Disease

    Age  Ethnic B  Zipcode  Disease
    07   Caucas    12344    Arthritis
    77   Caucas    53211    Cold
    28   Caucas    70234    Heart problem
    27   Afr-Amer  95505    Flu
    27   Afr-Amer  54327    Arthritis
    44   Hispanic  12007    Diabetes
    65   Hispanic  12007    Flu


    Privacy: Not sufficient!


    Combined with a public database, the released attributes can uniquely identify you!

    Sensitive: Disease

    Age  Ethnic B  Zipcode  Disease
    07   Caucas    12344    Arthritis
    77   Caucas    53211    Cold
    28   Caucas    70234    Heart problem
    27   Afr-Amer  95505    Flu
    27   Afr-Amer  54327    Arthritis
    44   Hispanic  12007    Diabetes
    65   Hispanic  12007    Flu

    Quasi-identifiers (Age, Ethnic B, Zipcode): reveal less information

    k-anonymity model


    Privacy: k-anonymity Problem Definition

    Input: Database consisting of n rows, each with m attributes.
    The set of domain values for the attributes is finite.

    Goal: Suppress some entries in the table such that each modified row becomes identical to at least k-1 other rows.

    Objective: Minimize the number of suppressed entries.



    Privacy: Medical Records: 2-anonymized table


    Age  Ethnic B  Zipcode  Disease
    *    Caucas    *        Arthritis
    *    Caucas    *        Cold
    *    Caucas    *        Heart problem
    27   Afr-Amer  *        Flu
    27   Afr-Amer  *        Arthritis
    *    Hispanic  12007    Diabetes
    *    Hispanic  12007    Flu
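The k-anonymity condition can be checked mechanically. Below is a minimal sketch that starts from the de-identified medical records of the earlier slide and suppresses entries per group the way the 2-anonymized table does. It assumes Age, Ethnic B, and Zipcode are the quasi-identifiers. The helper names (`anonymize`, `is_k_anonymous`, `suppressed_entries`) and the `SUPPRESS` mapping are hypothetical. The sketch only verifies a given suppression and does not search for one with a minimal number of suppressed entries.

```python
from collections import Counter

# De-identified rows from the slides: (age, ethnic background, zipcode, disease).
ROWS = [
    ("07", "Caucas", "12344", "Arthritis"),
    ("77", "Caucas", "53211", "Cold"),
    ("28", "Caucas", "70234", "Heart problem"),
    ("27", "Afr-Amer", "95505", "Flu"),
    ("27", "Afr-Amer", "54327", "Arthritis"),
    ("44", "Hispanic", "12007", "Diabetes"),
    ("65", "Hispanic", "12007", "Flu"),
]
QI = (0, 1, 2)  # assumed quasi-identifier columns: age, ethnic background, zipcode

# Column indexes to suppress per group, mirroring the 2-anonymized slide.
SUPPRESS = {"Caucas": {0, 2}, "Afr-Amer": {2}, "Hispanic": {0}}


def anonymize(rows):
    """Replace the selected quasi-identifier entries with '*'."""
    return [
        tuple("*" if i in SUPPRESS[row[1]] else value for i, value in enumerate(row))
        for row in rows
    ]


def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[i] for i in QI) for row in rows)
    return all(count >= k for count in groups.values())


def suppressed_entries(rows):
    """Count the '*' entries, i.e. the cost the objective wants to minimize."""
    return sum(value == "*" for row in rows for value in row)


released = anonymize(ROWS)
print(is_k_anonymous(ROWS, 2))       # every row is unique before suppression
print(is_k_anonymous(released, 2))
print(suppressed_entries(released))
```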


    Privacy: Accessing databases privately (Access privacy)


    [Figure: Patent-DB]


    Privacy: First (naïve) approach


    Problem to solve:
    User/Client: no one should know the contents of the query nor the result (not even the server)

    Observation:
    Encrypting the communication between client and server might not be sufficient (an adversary might access the decrypted query if he can get inside the database system and if he can observe disk access)

    Naïve solution:
    Client downloads the entire DB and executes queries locally: an unrealistic solution (size & ownership of data)

    [Figure: USER/CLIENT sends a query to the DB SERVER (DB) and receives the result]


    Privacy: Accessing databases privately (Access privacy)


    Simple Solution:
    Use a Secure Coprocessor (SC)

    Proven hardware properties:
      Cannot observe computation from outside
      If tampered with, self-destruction occurs

    Read entire database per query: O(N)

    [Figure: an encrypted request (Return record x) reaches the IBM 4758 Secure Coprocessor (SC) at the Database Server; the SC reads the entire database]

    [Asonov01]
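The O(N) cost follows from the access pattern: if the coprocessor touches every record for every query, the sequence of reads seen by the server does not depend on which record was requested. The following is a minimal sketch of that property only. Encryption and the IBM 4758 hardware are not modeled, and `LoggingDatabase` and `private_read` are hypothetical names. The comparison `i == x` is assumed to happen inside the coprocessor, where it cannot be observed.

```python
class LoggingDatabase:
    """Toy server-side store that logs which record indexes are read."""

    def __init__(self, records):
        self._records = list(records)
        self.access_log = []

    def __len__(self):
        return len(self._records)

    def read(self, index):
        self.access_log.append(index)
        return self._records[index]


def private_read(db, x):
    """Return record x while reading all N records, so every query costs O(N)."""
    result = None
    for i in range(len(db)):
        record = db.read(i)
        if i == x:  # assumed to run inside the SC, hidden from the server
            result = record
    return result


db = LoggingDatabase(["r0", "r1", "r2", "r3"])

first = private_read(db, 1)
log_first = list(db.access_log)
db.access_log.clear()

second = private_read(db, 3)
log_second = list(db.access_log)

# The server-visible read sequence is the same for both queries.
print(first, second, log_first == log_second)
```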



    Privacy: Metric (Probabilistic Privacy)

    Using (Shannon's) entropy definition to measure privacy:
      P_i ... probability of the query to access record i
      E ... uncertainty of the adversary's observation

    E is maximal if all P_i have the same value, i.e. the adversary cannot give some values stored in the DB a higher probability of being accessed than others.

    Perfect privacy: E does not change by observations
    Probabilistic privacy: the adversary learns by observation (i.e. the probability P_i increases for some records)
    Goal: minimize learning (i.e. minimize the increase of the probabilities P_i)


    $$E = -\sum_{i=1}^{N} P_i \cdot \mathrm{ld}\, P_i$$
    (ld: logarithm to base 2)
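As an illustration of this metric, the sketch below computes Shannon's entropy in bits for two access distributions. The function name `entropy` and the example probabilities are made up for illustration. The uniform case shows the maximum mentioned on the slide, and the skewed case shows what an adversary gains once some records become more likely than others.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: E = -sum(P_i * log2(P_i)), skipping P_i = 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = [0.25] * 4           # adversary cannot prefer any record
skewed = [0.7, 0.1, 0.1, 0.1]  # adversary has learned that record 1 is likely

e_uniform = entropy(uniform)
e_skewed = entropy(skewed)
print(e_uniform, e_skewed)
```

For N equally likely records the value equals log2(N), which is the largest entropy a distribution over N records can reach.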


    PDA: Probabilistic privacy

    Security parameters:
      a ... # of (sequential) requests to the shuffled & encrypted database
      b ... # of random requests to the original database (includes the requested record)
      Reshuffling after N/b queries necessary

    [Figure: a query reaches the SC at the Database Server; the SC accesses the shuffled and encrypted database and the original database (records 1 to 10)]

    [Chart: access probability (axis from 0 to 0,25) for records 1 to 10]

    Probability distribution:
      Each record read from the original database: P = (1 - a/N)/b
      Others: P = (a/N)/(N - b)

    Therefore, no record can be completely excluded from the query.
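A quick numeric check of these two formulas is sketched below. It assumes that the first formula applies to b of the N records and the second to the remaining N - b, which is how the denominators read. The function name `access_distribution` and the values N = 10, a = 2, b = 4 are chosen only for illustration and are not taken from the slides.

```python
def access_distribution(n, a, b):
    """Per-record probabilities from the slide's formulas.

    The first b entries stand for the records read from the original
    database, the remaining n - b entries for all other records.
    """
    if not (0 < a < n and 0 < b < n):
        raise ValueError("expected 0 < a < n and 0 < b < n")
    p_read = (1 - a / n) / b
    p_other = (a / n) / (n - b)
    return [p_read] * b + [p_other] * (n - b)


dist = access_distribution(n=10, a=2, b=4)
print(dist)
print(sum(dist))
```

The b records share a total mass of 1 - a/N and the others share a/N, so the probabilities sum to 1 and every record keeps a non-zero probability.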


    Privacy and context

    Combinatorics
    Machine learning
    Use of background knowledge: linkage attacks

    Cancer
      Breast cancer
      Lung cancer
    Male vs. female



    Privacy and context: Challenges

    Modeling the domain of ITS
      Ontologies can be used to specify relevant contexts
      Combine contexts with probabilities

    Preventing
      contexts from being identified
      contexts from being combined with individuals

    Apply methods of anonymization and probabilistic privacy (e.g. shuffle contexts)

    Shannon's entropy definition applicable (normalized)

    Contexts may change over time (e.g. traffic density)
      Pseudonyms (temporary identifiers)
