Bootstrapping Privacy Compliance in Big Data System
Shayak Sen, Saikat Guha et al., Carnegie Mellon University and Microsoft Research. Presenter: Cheng Li.
![Page 1: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/1.jpg)
Bootstrapping Privacy Compliance in Big Data System
Shayak Sen, Saikat Guha et al.
Carnegie Mellon University, Microsoft Research
Presenter: Cheng Li
![Page 2: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/2.jpg)
We have your everything
Your bank account
Your mobile
Your social network
Your shopping account
![Page 3: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/3.jpg)
We will keep it as a secret
![Page 4: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/4.jpg)
This is how we work
Legal team crafts privacy policy
Privacy Champion interprets policy
Developer writes code
Audit Team verifies compliance
![Page 5: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/5.jpg)
Life could be much easier
encode
refine
code analysis
![Page 6: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/6.jpg)
Outline
Introduction
LEGALEASE
◦Goal
◦Syntax
◦Domain-Specific Attribute
◦Formal Semantics
◦Properties
GROK
Validation
Discussion
Conclusion
![Page 7: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/7.jpg)
LEGALEASE
Goal
◦Usability: Policy clauses are structured very similarly to clauses in English-language policies.
◦Expressivity: Clauses are built around an attribute abstraction that allows the language to evolve as policy evolves.
◦Compositional Reasoning: LEGALEASE imposes meaningful syntactic restrictions to allow compositional reasoning.
![Page 8: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/8.jpg)
![Page 9: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/9.jpg)
LEGALEASE
Syntax
Domain-specific attributes are defined in a concept lattice.
LEGALEASE policies are checked at each node in the data dependency graph.
Each node is labeled with an attribute name and a set of values.
ALLOW: permits nodes labeled with a subset of the attribute values.
DENY: forbids nodes labeled with sets that overlap the attribute values.
![Page 10: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/10.jpg)
LEGALEASE
Example
◦ Full IP address will not be used for advertising. IP address may be used for detecting abuse. In such cases it will not be combined with account information.
◦ DENY DataType IPAddress UseForPurpose Advertising
  EXCEPT
    ALLOW DataType IPAddress:Truncated
  ALLOW DataType IPAddress UseForPurpose AbuseDetect
  EXCEPT
    DENY DataType IPAddress, AccountInfo
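The nested ALLOW/DENY/EXCEPT structure of the example can be sketched as a small tree of clauses. The `Clause` class and `render` function are illustrative names, not part of LEGALEASE or the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Clause:
    kind: str                  # "ALLOW" or "DENY"
    attrs: dict                # attribute name -> list of lattice values
    exceptions: list = field(default_factory=list)  # nested clauses

def render(clause, indent=0):
    """Return the clause as LEGALEASE-like text lines (hypothetical layout)."""
    pad = " " * indent
    parts = [f"{name} {', '.join(values)}" for name, values in clause.attrs.items()]
    lines = [f"{pad}{clause.kind} " + " ".join(parts)]
    if clause.exceptions:
        lines.append(f"{pad}EXCEPT")
        for exc in clause.exceptions:
            lines.extend(render(exc, indent + 2))
    return lines

# The two clauses from the slide, encoded as data.
policy = [
    Clause("DENY", {"DataType": ["IPAddress"], "UseForPurpose": ["Advertising"]},
           [Clause("ALLOW", {"DataType": ["IPAddress:Truncated"]})]),
    Clause("ALLOW", {"DataType": ["IPAddress"], "UseForPurpose": ["AbuseDetect"]},
           [Clause("DENY", {"DataType": ["IPAddress", "AccountInfo"]})]),
]

text_lines = [line for clause in policy for line in render(clause)]
print("\n".join(text_lines))
```

Keeping exceptions as nested clauses mirrors how each English sentence carries its own carve-outs.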
![Page 11: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/11.jpg)
![Page 12: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/12.jpg)
LEGALEASE
Domain-specific Attribute
◦Attribute values are organized as a concept lattice.
◦Advantages of concept lattice:
 Abstracts away semantics.
 The lattice structure allows users to concisely define sets of elements through their least upper bound.
 The lattice structure allows us to statically check the policy for certain classes of errors.
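A minimal sketch of both advantages, assuming a tree-shaped fragment of a DataType lattice. The element names other than IPAddress and AccountInfo, and all function names, are hypothetical.

```python
# Hypothetical fragment of a DataType concept lattice: child -> parent.
PARENT = {
    "IPAddress": "UniqueIdentifier",
    "AccountInfo": "UniqueIdentifier",
    "UniqueIdentifier": "Top",
    "SearchQuery": "Top",
}
ELEMENTS = set(PARENT) | set(PARENT.values())

def leq(a, b):
    """True if element a is at or below element b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

def below(bound):
    """The set of elements denoted by naming only their least upper bound."""
    return {e for e in ELEMENTS if leq(e, bound)}

def unknown_values(values):
    """Static check: values in a clause that are not lattice elements."""
    return sorted(set(values) - ELEMENTS)

print(sorted(below("UniqueIdentifier")))
print(unknown_values(["IPAddress", "IPAdress"]))  # second value is a deliberate typo
```

Naming `UniqueIdentifier` in a clause covers every element beneath it, and a misspelled value is caught before any graph node is checked.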
![Page 13: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/13.jpg)
LEGALEASE
Attributes defined in the implementation
◦InStore attribute: encodes certain policies around collection and storage of data.
![Page 14: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/14.jpg)
LEGALEASE
Attributes defined in the implementation
◦UseForPurpose attribute: encodes the data usage.
![Page 15: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/15.jpg)
LEGALEASE
Attributes defined in the implementation
◦AccessByRole attribute: encodes internal access-control-based policies.
![Page 16: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/16.jpg)
LEGALEASE
Attributes defined in the implementation
◦DataType attribute:
 Policy datatypes: types of data.
![Page 17: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/17.jpg)
LEGALEASE
Attributes defined in the implementation
◦DataType attribute:
 Policy datatypes: categories of data types.
 Limited typestate: a limited way of tracking history.
![Page 18: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/18.jpg)
LEGALEASE
Attributes defined in the implementation
◦DataType attribute:
 Combining policy datatypes and typestates: t:s, where t is a policy datatype and s is a typestate.
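A sketch of how a combined `t:s` label might be parsed and compared. It assumes that a label without a typestate stands for "any typestate", so that `IPAddress:Truncated` sits below `IPAddress`. That ordering, the `Hashed` typestate, and the function names are assumptions made for illustration.

```python
# Hypothetical policy-datatype lattice fragment: child -> parent.
PARENT = {"IPAddress": "UniqueIdentifier", "UniqueIdentifier": "Top"}

def parse_label(text):
    """Split 't:s' into (datatype, typestate); typestate is None if absent."""
    datatype, _, typestate = text.partition(":")
    return datatype, (typestate or None)

def datatype_leq(a, b):
    """True if datatype a is at or below datatype b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

def label_leq(a, b):
    """Assumed product order: datatypes ordered, typestate equal or unconstrained."""
    (t1, s1), (t2, s2) = parse_label(a), parse_label(b)
    return datatype_leq(t1, t2) and (s2 is None or s1 == s2)

print(parse_label("IPAddress:Truncated"))
print(label_leq("IPAddress:Truncated", "IPAddress"))
```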
![Page 19: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/19.jpg)
![Page 20: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/20.jpg)
LEGALEASE
Formal Semantics
◦Notation:
 T – a vector of sets of lattice elements.
 Tx – the value of attribute x in T.
 TG – graph node label vector.
 TC – policy clause vector.
![Page 21: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/21.jpg)
LEGALEASE
Formal Semantics
◦ T ⊑ T′ holds when, for each x, Tx ⊑ T′x. ALLOW TC applies to a graph node TG if TG ⊑ TC.
◦ T ⊓ T′ is taken for each x as Tx ⊓ T′x, and is ⊥ when any component is ⊥. DENY TC applies to TG if TG ⊓ TC ≠ ⊥.
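The two "applies" checks can be sketched as follows. This is a simplified reading that assumes tree-shaped lattices (so two elements overlap exactly when one lies below the other) and that every node carries a label for each attribute a clause mentions. The single `PARENT` table merging both attributes and all function names are illustrative.

```python
# Hypothetical lattice fragments for DataType and UseForPurpose: child -> parent.
PARENT = {
    "IPAddress:Truncated": "IPAddress",
    "IPAddress": "Top",
    "Advertising": "Top",
    "AbuseDetect": "Top",
}

def leq(a, b):
    """True if element a is at or below element b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

def set_leq(v, w):
    """v is below w: every element of v is below some element of w."""
    return all(any(leq(e, f) for f in w) for e in v)

def set_overlaps(v, w):
    """v and w share something other than bottom (comparable elements in a tree)."""
    return any(leq(e, f) or leq(f, e) for e in v for f in w)

def allow_applies(node, clause):
    return all(set_leq(node[x], values) for x, values in clause.items())

def deny_applies(node, clause):
    return all(set_overlaps(node[x], values) for x, values in clause.items())

truncated = {"DataType": {"IPAddress:Truncated"}, "UseForPurpose": {"Advertising"}}
full = {"DataType": {"IPAddress"}, "UseForPurpose": {"Advertising"}}
deny_clause = {"DataType": {"IPAddress"}, "UseForPurpose": {"Advertising"}}
allow_clause = {"DataType": {"IPAddress:Truncated"}}

print(deny_applies(truncated, deny_clause), allow_applies(truncated, allow_clause))
print(deny_applies(full, deny_clause), allow_applies(full, allow_clause))
```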
![Page 22: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/22.jpg)
LEGALEASE
Formal Semantics
◦A graph node is allowed by an ALLOW clause if and only if the clause applies and the node is allowed by each exception.
![Page 23: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/23.jpg)
LEGALEASE
Formal Semantics
◦A graph node is denied by a DENY clause if and only if the clause applies and the node is denied by each exception.
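The allow and deny rules for a single clause with exceptions can be sketched as a pair of mutually recursive functions. The sketch assumes tree-shaped lattices and that a clause which does not allow a node denies it, and vice versa; the class and function names are hypothetical, and evaluation of a whole multi-clause policy is deliberately left out.

```python
from dataclasses import dataclass, field

# Hypothetical lattice fragment: child -> parent.
PARENT = {"IPAddress:Truncated": "IPAddress", "IPAddress": "Top", "Advertising": "Top"}

def leq(a, b):
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

@dataclass
class Clause:
    kind: str                  # "ALLOW" or "DENY"
    attrs: dict                # attribute name -> set of lattice values
    exceptions: list = field(default_factory=list)

def applies(clause, node):
    """ALLOW: node labels are below the clause; DENY: node labels overlap it."""
    for x, values in clause.attrs.items():
        if clause.kind == "ALLOW":
            ok = all(any(leq(e, c) for c in values) for e in node[x])
        else:
            ok = any(leq(e, c) or leq(c, e) for e in node[x] for c in values)
        if not ok:
            return False
    return True

def allows(clause, node):
    if clause.kind == "ALLOW":
        return applies(clause, node) and all(allows(e, node) for e in clause.exceptions)
    return not denies(clause, node)   # a DENY clause allows what it does not deny

def denies(clause, node):
    if clause.kind == "DENY":
        return applies(clause, node) and all(denies(e, node) for e in clause.exceptions)
    return not allows(clause, node)   # an ALLOW clause denies what it does not allow

clause = Clause("DENY", {"DataType": {"IPAddress"}, "UseForPurpose": {"Advertising"}},
                [Clause("ALLOW", {"DataType": {"IPAddress:Truncated"}})])
full = {"DataType": {"IPAddress"}, "UseForPurpose": {"Advertising"}}
truncated = {"DataType": {"IPAddress:Truncated"}, "UseForPurpose": {"Advertising"}}

print(denies(clause, full), allows(clause, truncated))
```

Because each function is defined as the negation of the other for the opposite clause kind, every node is either allowed or denied by a clause and never both.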
![Page 24: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/24.jpg)
![Page 25: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/25.jpg)
LEGALEASE
Properties
◦Totality: C should either allow T or deny it.
◦Unicity: C cannot allow T and deny T at the same time.
◦Monotonicity: If C1 ⊑ C2, then for any TG, C1 allows TG implies that C2 allows TG, and C2 denies TG implies that C1 denies TG.
![Page 26: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/26.jpg)
![Page 27: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/27.jpg)
GROK
GROK System
Nodes are labeled with attributes
Confidence value
Different granularity
![Page 28: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/28.jpg)
GROK
Data Flow Edges and Labeling Nodes
◦Log Analysis: Use logs to bootstrap the coarse-grained data flow graph.
 Label file nodes with the InStore attribute and entity nodes with the AccessByRole attribute. (high confidence)
 Label each job with the UseForPurpose attribute. (low confidence)
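Bootstrapping a coarse-grained graph from job logs can be sketched as below. The log record fields, file paths, and names are invented for illustration; the real cluster log format is not shown in the slides.

```python
# Hypothetical job log records; field names are assumptions.
log = [
    {"job": "job1", "entity": "alice",
     "reads": ["/data/raw.log"], "writes": ["/data/clean.tsv"]},
    {"job": "job2", "entity": "bob",
     "reads": ["/data/clean.tsv"], "writes": ["/ads/model.bin"]},
]

def build_graph(records):
    """Return coarse data flow edges: file -> job, job -> file, job -> entity."""
    edges = set()
    for record in records:
        job = record["job"]
        for path in record["reads"]:
            edges.add((path, job))           # the job consumes this file
        for path in record["writes"]:
            edges.add((job, path))           # the job produces this file
        edges.add((job, record["entity"]))   # the submitting entity sees the job's data
    return edges

edges = build_graph(log)
for edge in sorted(edges):
    print(edge)
```

At this granularity a file is a single node, which is why later analyses refine files into columns and jobs into sub-graphs.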
![Page 29: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/29.jpg)
Log Analysis
![Page 30: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/30.jpg)
GROK
Data Flow Edges and Labeling Nodes
◦Syntactic Analysis: Label the DataType attribute by syntactically analyzing the source code of the job that read or wrote the data. (low confidence)
![Page 31: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/31.jpg)
Syntactic Analysis
![Page 32: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/32.jpg)
GROK
Data Flow Edges and Labeling Nodes
◦Semantic Analysis: Refine file nodes to a collection of column nodes. Refine job nodes to a sub-graph of nodes.
![Page 33: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/33.jpg)
Semantic Analysis
![Page 34: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/34.jpg)
GROK
Data Flow Analysis
◦Copy the DataType attribute of one node to all nodes that its data flows to.
◦Join two attributes that have the same confidence value.
◦If data flows through a UDF (user-defined function), check whether the typestate has been modified. If it has, assign a low confidence value.
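Label propagation along data flow edges can be sketched with a worklist. This version simplifies the rules in two labeled ways: any label that passes through a UDF node is downgraded to low confidence (the typestate check is not modeled), and when the same datatype reaches a node with different confidences the higher one is kept. Node names and the function name are hypothetical.

```python
from collections import deque

RANK = {"low": 0, "high": 1}

def propagate(edges, labels, udf_nodes):
    """Copy DataType labels forward along edges until nothing changes."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    result = {node: dict(found) for node, found in labels.items()}
    work = deque(result)
    while work:
        node = work.popleft()
        for dst in successors.get(node, []):
            target = result.setdefault(dst, {})
            changed = False
            for datatype, conf in list(result[node].items()):
                if dst in udf_nodes:
                    conf = "low"   # assumption: UDFs are treated conservatively
                if datatype not in target or RANK[conf] > RANK[target[datatype]]:
                    target[datatype] = conf
                    changed = True
            if changed:
                work.append(dst)
    return result

edges = [("raw", "udf"), ("udf", "out"), ("raw", "copy")]
labels = {"raw": {"IPAddress": "high"}}
result = propagate(edges, labels, udf_nodes={"udf"})
print(result)
```

Confidence values only ever move upward at a node, so the loop terminates even if the graph contains cycles.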
![Page 35: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/35.jpg)
GROK
Verifying Labels
◦Attributes verified by developers are assigned a high confidence value.
[Figure: a low-confidence attribute (e.g. low = IPAddress) is mapped to its related source files; a reverse mapping from each source file gives its related low-confidence attributes (low = IPAddress, low = UserAgent, …); the developer of the highest-ranking source file is contacted.]
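One way to read the ranking step is to count, for each source file, how many low-confidence labels it is related to and to contact the owner of the top file first. The mapping, file names, and owners below are invented, and the counting rule is an assumption about what "highest-ranking" means.

```python
from collections import Counter

# Hypothetical mapping: low-confidence label -> source files that produced it.
related_files = {
    "col1: low = IPAddress": ["scrub.script", "join.script"],
    "col2: low = UserAgent": ["scrub.script"],
    "col3: low = IPAddress": ["report.script"],
}
# Hypothetical ownership table.
owner = {"scrub.script": "dev-a", "join.script": "dev-b", "report.script": "dev-c"}

def rank_files(mapping):
    """Reverse the mapping and count low-confidence labels per source file."""
    counts = Counter()
    for files in mapping.values():
        counts.update(files)
    return counts

counts = rank_files(related_files)
best_file, resolved = counts.most_common(1)[0]
print(best_file, resolved, owner[best_file])
```

Asking one developer about the file touching the most uncertain labels keeps the number of manual verification requests small.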
![Page 36: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/36.jpg)
GROK
Implementation
GROK consists of a static semantic analyzer and a data flow analyzer:
◦Static semantic analyzer: processes individual jobs from the cluster log into the nodes and edges of the data dependency graph, without attributes.
◦Data flow analyzer: collates all the graph nodes and applies syntactic analysis and conservative data flow analysis, producing a graph augmented with attributes.
![Page 37: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/37.jpg)
![Page 38: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/38.jpg)
Validation
Scale
◦ Over a 100-day period: 77 thousand jobs each day, submitted by over 7 thousand entities in over 300 functional units.
◦ 1.1 million unique lines of code, 21% of which change on a day-to-day basis.
![Page 39: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/39.jpg)
Validation
Coverage
simulate syntactic analysis on the real-world DDG
add data flow analysis
add manual verification
![Page 40: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/40.jpg)
Validation
Usability
◦Online survey
◦12 participants drawn from Microsoft privacy champions.
◦The majority of participants were able to use LEGALEASE to code policy clauses.
![Page 41: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/41.jpg)
Validation
Expressiveness
![Page 42: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/42.jpg)
![Page 43: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/43.jpg)
Discussion
Expressiveness: LEGALEASE cannot express policies based on first-order temporal logic. However, LEGALEASE is expressive enough to capture real-world privacy policies.
Inferring sensitive data: Unless it is explicitly labeled, GROK cannot detect the inference of sensitive data from non-sensitive data.
Precision: The major source of imprecision is the overly conservative treatment of UDFs.
![Page 44: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/44.jpg)
Discussion
False Negatives: The authors are unable to characterize the exact nature of false negatives in the system due to the lack of ground truth.
Assurance: The system cannot guarantee its results in the face of adversarial developer behavior.
![Page 45: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/45.jpg)
![Page 46: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/46.jpg)
Conclusion
Automated privacy compliance checking
◦LEGALEASE: states privacy policies as a form of restrictions on information flows.
◦GROK: a data inventory that maps low-level data types in code to high-level policy concepts.
Evaluation results show that
◦LEGALEASE is expressive enough to capture real-world privacy policies.
◦GROK can bootstrap labeling the graph with LEGALEASE attributes at massive scale.
![Page 47: Bootstrapping Privacy Compliance in Big Data System](https://reader035.vdocuments.site/reader035/viewer/2022062717/56812b07550346895d8ee8a6/html5/thumbnails/47.jpg)