quality control in sage privacy accounting and
TRANSCRIPT
![Page 1: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/1.jpg)
Privacy Accounting and Quality Control in Sage
![Page 2: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/2.jpg)
Whys is DP needed with ML?
● ML datasets could leak specifics about individual entries in their training sets.
● Prevent featurization of dataset ○ Membership inference ○ Reconstruction attacks
2
![Page 3: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/3.jpg)
Q: Why can’t you just train a ML model using PINQ?
3
![Page 4: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/4.jpg)
4
Sage Access Control & privacy adaptive training
Leverages the idea that the growing database is not static but growing, keeps training models endlessly on sensitive data stream
![Page 5: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/5.jpg)
Challenges
Privacy Utility trade-off:
● Less accurate results that fail to meet the quality targets more often then w/o DP.
● low -quality models whose validations succeed by chance.
5
![Page 6: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/6.jpg)
Splitting the data
● User-Level: based on user ID ○ Use incrementing userID’s, max stored ○ New blocks are only created when new users join
● Event-level: splitting on time ○ days , months, etc.
6
![Page 7: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/7.jpg)
Taxi Example
7
● Preprocessing_fn: makes aggregate features i.e distance of ride, hour of day
○ Dp_group_by_mean: ■ Number of times key
appears ■ Sum of values associated
w/ key ○ Each data point has one key
![Page 8: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/8.jpg)
Sage Access Control : requirements for composition theory
8
● R1: Multiple training pipelines w/ differing amounts of data needed for performance
● R2: Adaptivity in choice of queries, DP parameters and data subsets
● R3: Some models are ran periodically w/ new data and others are retired
![Page 9: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/9.jpg)
Failed Methods: which rules do these violate? 1. Query across the entire stream:
○ ϵд = ϵ1 + ϵ2 + ϵ32. Queries split in to subqueries and each run DP
on individual blocks, results aggregated 3. A new data point is allocated to one of the
waiting queries, which consumes entire privacy budget.
9
![Page 10: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/10.jpg)
Block Composition Theory cont. ● Splits data into disjoint blocks adaptively chosen(R1, R2) ● Privacy loss of three queries will be max of ϵ1 + ϵ2 , and ϵ2
+ ϵ3● New blocks D5 arrive w/ privacy loss of zero(R3)
System can run endlessly by training new models on new data!
10
![Page 11: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/11.jpg)
Q: What does it mean for DP parameters to be chosen Adaptively?
11
![Page 12: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/12.jpg)
Adaptive Parameters
12
![Page 13: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/13.jpg)
Privacy-Adaptive Training
● To improve DP quality: ○ Increase privacy budget(ϵ, δ) or increase
dataset size ● Accept: prediction target reached ● Retry: more data needed for assessment ● Reject: model will never reach target w/
sample size/privacy requirements
13
![Page 14: Quality Control in Sage Privacy Accounting and](https://reader034.vdocuments.site/reader034/viewer/2022042101/6256393f30411b5d9155a80c/html5/thumbnails/14.jpg)
Q: What assumptions are made about the data? In what cases could Sage potentially
not perform well?
14
Discuss: