9 - papadimitriou
TRANSCRIPT
-
7/28/2019 9 - Papadimitriou
1/20
Detecting Data Leakage
Panagiotis Papadimitriou
Hector Garcia-Molina
-
Leakage Problem
Stanford Infolab 2
[Figure: applications App. U1 and App. U2; users Kathryn, Jeremy, Sarah and Mark; profile records (Name: Mark, Sex: Male, ...; Name: Sarah, Sex: Female, ...); other sources, e.g. Sarah's network]
-
Outline
Problem Description
Guilt Models
Pr{U1 leaked data} = 0.7
Pr{U2 leaked data} = 0.2
Distribution Strategies
-
Problem Description
Guilt Models
Distribution Strategies
-
Problem Entities
Entity: Distributor (Facebook)
Dataset: T, the set of all Facebook profiles
Entity: Agents (Facebook Apps U1, ..., Un)
Dataset: R1, ..., Rn, where Ri is the set of profiles of the people who have added the application Ui
Entity: Leaker
Dataset: S, the set of leaked profiles
-
Agents' Data Requests
Sample
100 profiles of Stanford people
Explicit
All people who added the application
(the example we have used so far)
All Stanford profiles
-
Problem Description
Guilt Models
Distribution Strategies
-
Guilt Models (1/3)
[Figure: a leaked profile may also come from other sources, e.g. Sarah's network, with probability p]
p: posterior probability that a leaked profile
comes from other sources
Guilty Agent: Agent who leaks at least one profile
Pr{Gi|S}: probability that agent Ui is guilty, given
the leaked set of profiles S
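The transcript defines Pr{Gi|S} but does not preserve a formula for it. The Python sketch below shows one way such a probability could be computed, under assumptions that go beyond the slide: every leaked profile independently came from other sources with probability p, and otherwise is equally likely to have been leaked by any agent that holds it. The name `guilt_probability` and the example sets are hypothetical.

```python
def guilt_probability(i, agent_sets, leaked, p):
    """Estimate Pr{Gi|S} under the independence assumptions stated above."""
    prob_innocent = 1.0
    for t in leaked & agent_sets[i]:
        # Number of agents that were given profile t
        holders = sum(1 for r in agent_sets if t in r)
        # Agent i is NOT the source of t with probability 1 - (1 - p) / holders
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent


# Hypothetical example: U1 holds both leaked profiles, U2 holds only one
R = [{"Jeremy", "Sarah"}, {"Sarah", "Mark"}]
S = {"Jeremy", "Sarah"}
g1 = guilt_probability(0, R, S, p=0.2)
g2 = guilt_probability(1, R, S, p=0.2)
```

In this example U1 comes out as the more likely leaker, because one of the leaked profiles was given to U1 alone.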
-
Guilt Models (2/3)
Agents leak each of their
data items independently
or
Agents leak all their data
items OR nothing
[Figure: four alternative leak scenarios, with probabilities (1-p)^2, (1-p)p, p(1-p) and p^2]
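The four probabilities on this slide can be read as the possible origins of two leaked profiles, each of which comes from other sources with probability p and from an agent otherwise. That reading is an assumption, since the slide's figure is not preserved. Under it, a short Python sketch can enumerate the scenarios; the function name `scenario_probabilities` is hypothetical.

```python
from itertools import product


def scenario_probabilities(p, n_items=2):
    """Probability of every agent/other-source assignment for n leaked items."""
    scenarios = {}
    for sources in product(("agent", "other"), repeat=n_items):
        prob = 1.0
        for source in sources:
            # Each item independently comes from other sources with probability p
            prob *= p if source == "other" else (1.0 - p)
        scenarios[sources] = prob
    return scenarios


probs = scenario_probabilities(0.3)
```

The four values cover every possible case, so they sum to 1 for any p between 0 and 1.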
-
Guilt Models (3/3)
[Figure: two plots with axes Pr{G1} and Pr{G2}, one for agents that leak items independently and one for agents that do NOT leak independently]
-
Problem Description
Guilt Models
Distribution Strategies
-
The Distributor's Objective (1/2)
[Figure: agents U1, U2, U3 and U4 receive sets R1, R2, R3 and R4; S is the leaked set]
Pr{G1|S} >> Pr{G2|S}
Pr{G1|S} >> Pr{G4|S}
-
The Distributor's Objective (2/2)
To achieve his objective the distributor has to
distribute sets R1, ..., Rn that
minimize
\sum_{i=1}^{n} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|, \quad i, j = 1, \ldots, n
Intuition: Minimized data sharing among agents makes leaked data reveal the guilty
agents
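The objective on this slide sums, over all agents, the overlap of an agent's set with every other agent's set, divided by the size of the agent's own set. A minimal Python sketch of that quantity follows; the function name `overlap_objective` and the example sets are hypothetical, and every Ri is assumed to be non-empty.

```python
def overlap_objective(agent_sets):
    """Sum over i of (1 / |Ri|) * sum over j != i of |Ri & Rj|."""
    total = 0.0
    for i, r_i in enumerate(agent_sets):
        # Overlap of agent i's set with every other agent's set
        shared = sum(len(r_i & r_j) for j, r_j in enumerate(agent_sets) if j != i)
        total += shared / len(r_i)
    return total


disjoint = [{"Kathryn", "Jeremy"}, {"Sarah", "Mark"}]
identical = [{"Kathryn", "Jeremy"}, {"Kathryn", "Jeremy"}]
score_disjoint = overlap_objective(disjoint)
score_identical = overlap_objective(identical)
```

Disjoint sets score 0, the best possible value, while two identical sets score 2, which matches the intuition that shared data makes the guilty agent harder to single out.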
-
Distribution Strategies Sample (1/4)
Set T has four profiles:
Kathryn, Jeremy, Sarah and Mark
There are 4 agents:
U1, U2, U3 and U4
Each agent requests a sample of any 2 profiles
of T for a market survey
-
Distribution Strategies Sample (2/4)
Poor
Minimize \sum_{i \neq j} |R_i \cap R_j|
[Figure: two example assignments of profiles to agents U1, U2, U3 and U4]
-
Distribution Strategies Sample (3/4)
Optimal Distribution
Avoid full overlaps and minimize
\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|
[Figure: assignment of profiles to agents U1, U2, U3 and U4]
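For the four-profile, four-agent sample above, the search space is small enough to check exhaustively. The Python sketch below enumerates every way of giving each agent 2 of the 4 profiles, finds the minimum of the overlap objective, and separates minimizers that contain a full overlap (two agents with identical sets) from those that do not. It is a brute-force illustration with hypothetical helper names, not the distribution algorithm of the talk.

```python
from itertools import combinations, product

profiles = ["Kathryn", "Jeremy", "Sarah", "Mark"]
# Every possible 2-profile sample an agent could receive
candidates = [frozenset(c) for c in combinations(profiles, 2)]


def objective(sets):
    """Sum over i of (1 / |Ri|) * sum over j != i of |Ri & Rj|."""
    return sum(
        sum(len(r_i & r_j) for j, r_j in enumerate(sets) if j != i) / len(r_i)
        for i, r_i in enumerate(sets)
    )


def has_full_overlap(sets):
    """True if at least two agents received exactly the same set."""
    return len(set(sets)) < len(sets)


allocations = list(product(candidates, repeat=4))  # 6**4 = 1296 allocations
best = min(objective(a) for a in allocations)
minimizers = [a for a in allocations if objective(a) == best]
without_full_overlap = [a for a in minimizers if not has_full_overlap(a)]
```

Some allocations reach the minimum while still giving two agents identical sets, which is why the slide asks to avoid full overlaps in addition to minimizing the sum.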
-
Distribution Strategies Sample (4/4)
-
Distribution Strategies
Sample Data Requests
The distributor has the freedom to select the data items to provide the agents with
General Idea: Provide agents with sets of data that are as disjoint as possible
Problem: There are cases where the distributed data must overlap, e.g.,
|R1| + ... + |Rn| > |T|
Explicit Data Requests
The distributor must provide agents with the data they request
General Idea: Add fake data to the distributed ones to minimize overlap of distributed data
Problem: Agents can collude and identify fake data
NOT COVERED in this talk
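The general idea for sample data requests can be illustrated with a simple greedy heuristic: serve each request from the items that have been handed out least often so far. The Python sketch below is only one possible illustration and is not presented as the method of the talk. The name `allocate_samples` is hypothetical, each request size is assumed to be at most |T|, and no attempt is made to avoid full overlaps.

```python
def allocate_samples(items, request_sizes):
    """Greedily give each agent the items that are currently least shared."""
    counts = {t: 0 for t in items}
    allocation = []
    for m in request_sizes:
        # sorted() is stable, so ties keep the input order of items
        chosen = sorted(items, key=lambda t: counts[t])[:m]
        for t in chosen:
            counts[t] += 1
        allocation.append(set(chosen))
    return allocation


T = ["Kathryn", "Jeremy", "Sarah", "Mark"]
two_agents = allocate_samples(T, [2, 2])       # 2 + 2 <= |T|
three_agents = allocate_samples(T, [2, 2, 2])  # 2 + 2 + 2 > |T|
```

With two agents the requests fit into T and the sets come out disjoint. With three agents the total request size exceeds |T|, so some overlap is unavoidable.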
-
Conclusions
Data Leakage
Modeled as maximum likelihood problem
Data distribution strategies that help identify
the guilty agents
-
Thank You!