Privacy-aware Regression Modeling of Participatory Sensing Data
DESCRIPTION
Privacy-aware Regression Modeling of Participatory Sensing Data. Hossein Ahmadi, Nam Pham, Raghu Ganti, Tarek Abdelzaher, Suman Nath, Jiawei Han. Pallavi Arora. Outline: Introduction, Problem Formulation, Linear Regression, Privacy Filter, Application Server, Model Construction.
TRANSCRIPT
![Page 1: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/1.jpg)
Hossein Ahmadi, Nam Pham, Raghu Ganti, Tarek Abdelzaher, Suman Nath, Jiawei Han
Pallavi Arora
Privacy-aware Regression Modeling of Participatory Sensing Data
![Page 2: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/2.jpg)
Introduction
Problem Formulation
Linear Regression
Privacy Filter
Application Server
Model Construction
Privacy Analysis
Case Study
Discussion
Related Work
Conclusion
Outline
![Page 3: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/3.jpg)
Crowdsourcing, also known as participatory sensing.
Predict statistics, or extrapolate from the collected data (the approach in the paper).
Private data → public model.
Private data samples: population density + eco-friendly behavior → pollution.
Model (public): predict pollution elsewhere.
Introduction
![Page 4: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/4.jpg)
Analyzes the relationship between two variables, X and y.
Regression model: y = Xβ + ε, where y is the output, X the input, β the regression coefficients, and ε the error (zero mean, constant variance).
Given X and y, estimate β: data (combination of X and y) → model (β).
Given X and β, predict y.
Linear Regression
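The two steps on this slide (estimate β from X and y, then predict y from X and β) can be sketched for the simplest case of a single input plus an intercept. This is a minimal illustration, not code from the paper; the function names and the sample numbers are made up for the example.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(beta, x):
    """Given the coefficients and a new input, predict the output."""
    b0, b1 = beta
    return b0 + b1 * x


# Hypothetical samples (the data): inputs xs and noisy outputs ys.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
beta = fit_simple_regression(xs, ys)   # the model
y_new = predict(beta, 5.0)             # prediction for an unseen input
```

Only `beta` needs to be published for others to make predictions; the samples `xs` and `ys` are the part the rest of the deck tries to keep private.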
![Page 5: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/5.jpg)
Private → public: usage of electricity + time of year → energy consumption (model).
Given a usage pattern, predict energy consumption.
Help users save on energy costs.
How much gas will a vehicle spend on a given route?
How much energy will a household save if it installs motion-activated light controls?
How much weight might a 300 lb person lose if engaged in a particular diet and exercise routine?
Example
![Page 6: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/6.jpg)
Ensure anonymity.
Security mechanism: users modify their data.
Perturbation.
Irrecoverably alter the data (the approach in the paper).
Sharing private data
![Page 7: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/7.jpg)
Problem Formulation
![Page 8: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/8.jpg)
Data (time series): output variables (e.g., household energy consumption) + input variables (good predictors of the output).
Data → neutral features.
Reconstruction: compute the private data from the features.
Higher reconstruction error → higher privacy.
Problem Formulation
![Page 9: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/9.jpg)
The model relating user inputs to the outputs is public.
Each data sample collected by an individual is private and may not be revealed.
The models used in the service are linear in coefficients.
The time-series data can be packed into uncorrelated data samples by aggregation (over time for example).
Assumptions
![Page 10: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/10.jpg)
Minimize the modeling error.
Perfect modeling: accuracy = accuracy with no alteration of the data.
Maximize the reconstruction (breach) error.
Perfect neutrality: information with shared data = information without shared data.
Design Goals
![Page 11: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/11.jpg)
Data segmentation: aggregation over time to remove correlation (sum/average).
Length of the time interval: a day? a month?
Large enough to remove correlation.
Results in accurate prediction.
Usable by the participatory sensing application.
Depends on the application.
Privacy Filter
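The segmentation step can be sketched as follows, assuming fixed-length segments measured in number of raw samples. The function name `segment`, the `how` parameter, and the hourly readings are illustrative choices, not taken from the paper.

```python
def segment(series, length, how="mean"):
    """Aggregate a time series into consecutive segments of `length` samples.

    Each segment is reduced to a single value by sum or average.
    A trailing partial segment is dropped.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    segments = []
    for start in range(0, len(series) - length + 1, length):
        chunk = series[start:start + length]
        total = sum(chunk)
        segments.append(total if how == "sum" else total / length)
    return segments


# Hypothetical energy readings, aggregated four samples at a time.
readings_kwh = [0.4, 0.5, 0.3, 0.6, 1.2, 1.0, 0.9, 1.1]
per_segment_total = segment(readings_kwh, 4, how="sum")
per_segment_mean = segment(readings_kwh, 4, how="mean")
```

The segment length is the tuning knob the slide asks about: longer segments remove more of the correlation between neighboring samples but leave fewer data points for the regression.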
![Page 12: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/12.jpg)
Segmentation → n data points with d input values.
Time-independent data.
yi denotes the value of the output attribute in the ith segment.
xij denotes the value of the jth input of segment i.
Estimate yi using xi1, ..., xid.
Segmentation alone does not protect privacy: monthly appliance usage + temperature inside a house show whether a residence is occupied in a particular month.
Segmentation
![Page 13: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/13.jpg)
Input variable X.
Output variable y.
Predictor variable W (the predictor function maps X to W).
Model of system.
Neutral Features
![Page 14: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/14.jpg)
Neutral features → correlations of the data.
Size of the shared data is independent of the number of samples n.
Larger n → larger privacy.
Neutral Features
Constant: O(n^2)
Vector of length k: O(kn^2)
Matrix of size k × k: O(k^2 n^2)
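A sketch of how a client might compute its three neutral features. The formulas on the slide were not preserved in the transcript, so this assumes, in line with the sizes listed above (a constant, a vector of length k, a k × k matrix) and with the symbols ρu, νu, Θu used later in the deck, that they are the sums ρ = yᵀy, ν = Wᵀy and Θ = WᵀW over the user's n segments. The function name and the sample values are hypothetical.

```python
def neutral_features(W, y):
    """Compute (rho, nu, theta) from a user's private segments.

    W: list of n rows, each holding k predictor values.
    y: list of n output values.
    Assumed definitions: rho = y'y, nu = W'y, theta = W'W.
    """
    k = len(W[0])
    rho = sum(v * v for v in y)                                   # constant
    nu = [sum(row[j] * v for row, v in zip(W, y))                 # length k
          for j in range(k)]
    theta = [[sum(row[a] * row[b] for row in W) for b in range(k)]
             for a in range(k)]                                   # k x k
    return rho, nu, theta


# Hypothetical private data: n = 3 segments, k = 2 predictors.
W = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
y = [4.0, 6.0, 10.0]
rho, nu, theta = neutral_features(W, y)
```

Whatever n is, the client shares exactly 1 + k + k·k numbers, which matches the point that the size of the shared data does not depend on the number of samples.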
![Page 15: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/15.jpg)
Construct the regression model: Least Squares Estimator (LSE).
Let u1, ..., um be the m users of the participatory sensing application, each providing its neutral features.
Let
The Application Server
![Page 16: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/16.jpg)
Define
The Application Server
![Page 17: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/17.jpg)
Model coefficients
Only uses the neutral features... YEAH
Exact model construction.
Regression Error
Error using neutral features
The Application Server
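The server-side step can be sketched as below. The slide equations are missing from the transcript, so this assumes each user u sends ρu = yᵀy, νu = Wᵀy and Θu = WᵀW, and that the server sums them and solves the normal equations Θβ = ν. Under that assumption the result equals the least-squares fit on the pooled raw data, because WᵀW and Wᵀy of the stacked data are just the sums of the per-user terms. The helper names and the two sample users are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def aggregate(features):
    """Sum per-user (rho, nu, theta) triples element-wise."""
    k = len(features[0][1])
    rho = sum(f[0] for f in features)
    nu = [sum(f[1][j] for f in features) for j in range(k)]
    theta = [[sum(f[2][a][b] for f in features) for b in range(k)]
             for a in range(k)]
    return rho, nu, theta


def fit_from_features(features):
    """Return (beta, sse) using only the aggregated neutral features."""
    rho, nu, theta = aggregate(features)
    beta = solve(theta, nu)
    k = len(beta)
    quad = sum(beta[a] * theta[a][b] * beta[b]
               for a in range(k) for b in range(k))
    # Sum of squared residuals: y'y - 2 beta'W'y + beta'W'W beta.
    sse = rho - 2 * sum(bj * nj for bj, nj in zip(beta, nu)) + quad
    return beta, sse


# Features of two hypothetical users whose raw data follow y = 1 + 2 * w.
user1 = (34.0, [8.0, 13.0], [[2.0, 3.0], [3.0, 5.0]])
user2 = (130.0, [16.0, 57.0], [[2.0, 7.0], [7.0, 25.0]])
beta, sse = fit_from_features([user1, user2])
```

The server never sees a raw segment here: both the coefficients and the regression error are functions of the three aggregated features only.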
![Page 18: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/18.jpg)
Reconstruction Error
Reconstruction Error of mean values
Effective reconstruction: if reconstruction error < 1.
Privacy-enabling transformation: if reconstruction error > 1.
Privacy Analysis
Segmented data
Reconstructed data
Variance of reconstructed data
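The exact error formula on this slide did not survive in the transcript; only its labels (segmented data, reconstructed data, a variance term) and the threshold of 1 remain. The sketch below therefore assumes a mean squared difference between segmented and reconstructed values divided by a variance that the caller supplies. The function names and sample numbers are hypothetical.

```python
def normalized_reconstruction_error(segmented, reconstructed, variance):
    """Mean squared difference between the segmented (true) values and an
    attacker's reconstruction, divided by a normalizing variance.

    Which variance the paper uses is not recoverable from the transcript,
    so it is passed in explicitly (an assumption of this sketch).
    """
    n = len(segmented)
    mse = sum((s - r) ** 2 for s, r in zip(segmented, reconstructed)) / n
    return mse / variance


def is_effective_reconstruction(error):
    """Threshold from the slide: an error below 1 counts as a breach."""
    return error < 1


# Hypothetical values for illustration only.
segmented = [1.0, 2.0, 3.0, 4.0]
good_guess = [1.1, 1.9, 3.2, 3.9]
bad_guess = [4.0, 3.0, 2.0, 1.0]
err_good = normalized_reconstruction_error(segmented, good_guess, 1.25)
err_bad = normalized_reconstruction_error(segmented, bad_guess, 1.25)
```

With these numbers the first guess falls well below the threshold and the second well above it, which is the distinction the slide draws between an effective reconstruction and a privacy-enabling transformation.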
![Page 19: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/19.jpg)
Optimal reconstruction: find the values Yu and Wu that produce the given transformed matrices ρu, νu, Θu while maximizing the joint probability of observing such values.
Probability of observing the values (known to the attacker).
Privacy Enabling Properties
![Page 20: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/20.jpg)
Constraints and data points:
If data points < constraints → 100% reconstruction, 0% privacy.
If n → infinity, even the optimal solution makes it difficult to reconstruct the private data.
Constraints are not affine → non-convex optimization → NP-hard → exponential time in the number of variables.
Inaccuracy and Inefficiency of Reconstruction
![Page 21: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/21.jpg)
Assumption: the maximum likelihood is obtained if the solution is close to the expected value; n is also known.
KNITRO non-linear solver.
Conditions to Protect Privacy
![Page 22: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/22.jpg)
Best value of n? Number of constraints = number of variables.
Simulation:
n > k → high reconstruction error.
n < k → single feasible solution.
![Page 23: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/23.jpg)
Vertical correlation: correlation among different attributes.
Horizontal correlation: correlation within a single attribute.
Correlation
![Page 24: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/24.jpg)
Conjecture: If n > 2k error 1.
![Page 25: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/25.jpg)
Predict the fuel-efficient route.
Compare:
White noise perturbation technique
Proposed method
Case Study
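The contrast between the two compared approaches can be illustrated on a toy problem. This sketch assumes a one-predictor model without intercept (y = b·x) and Gaussian white noise with an arbitrarily chosen standard deviation of 10; none of these settings come from the paper, and the variable names are made up.

```python
import random


def slope_through_origin(xs, ys):
    """Least-squares slope for y = b * x.

    Depends on the data only through sum(x*y) and sum(x*x), i.e. the
    kind of aggregate sums a client could share instead of raw samples.
    """
    nu = sum(x * y for x, y in zip(xs, ys))
    theta = sum(x * x for x in xs)
    return nu / theta


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical private data that follow y = 3 * x exactly.
xs = [float(i) for i in range(1, 51)]
ys = [3.0 * x for x in xs]

# White-noise perturbation: every sample is altered before sharing.
noisy_xs = [x + rng.gauss(0.0, 10.0) for x in xs]
noisy_ys = [y + rng.gauss(0.0, 10.0) for y in ys]

b_features = slope_through_origin(xs, ys)          # from aggregate sums
b_noisy = slope_through_origin(noisy_xs, noisy_ys)  # from perturbed data
```

`b_features` recovers the true slope because the sums are computed on unaltered data, while `b_noisy` is pulled away from it by the noise added to the inputs.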
![Page 26: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/26.jpg)
Client
C++
Data trace file: location trace from GPS
Configuration file:
Unique application ID
Segmentation interval
Segmentation attributes (e.g. time)
Euclidean distance between values
Predictor function: maps X → W
Feature matrices: transferred as XML to the server
Case Study
• Server
• C++
• List of models with unique application ID
• Create aggregation matrices
![Page 27: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/27.jpg)
Data:
16 users (different cars), 3 months
Geo-tagged engine sensor measurements
650 segments, each ~2 miles
Input:
w1 = m(ST + v TL), where m and v are the mass and velocity of the vehicle, ST is the number of stop signs, and TL is the number of traffic lights
w2 = m v^2
w3 = m
w4 = A v^2, where A is the frontal area of the car
Output: fuel consumption
Case Study
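The four predictors listed on this slide can be written as a predictor function that maps the raw inputs of one road segment to the vector W. The formulas are the ones from the slide; the function name, argument names, and the sample values (with units left unspecified, as on the slide) are assumptions of this sketch.

```python
def predictors(mass, velocity, stop_signs, traffic_lights, frontal_area):
    """Map the raw inputs of one segment to the predictors w1..w4."""
    w1 = mass * (stop_signs + velocity * traffic_lights)  # m(ST + v TL)
    w2 = mass * velocity ** 2                              # m v^2
    w3 = mass                                              # m
    w4 = frontal_area * velocity ** 2                      # A v^2
    return [w1, w2, w3, w4]


# Hypothetical segment: one car, two stop signs, three traffic lights.
w = predictors(mass=1000.0, velocity=10.0, stop_signs=2,
               traffic_lights=3, frontal_area=2.5)
```

The model stays linear in its coefficients even though the predictors are nonlinear in the raw inputs, which is what the assumptions slide requires.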
![Page 28: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/28.jpg)
Reconstruction error
Case Study
![Page 29: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/29.jpg)
Dependence on number of samples
High error for n > 2k
Case Study
![Page 30: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/30.jpg)
Case Study
![Page 31: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/31.jpg)
Randomization (perturbation, differential privacy): error in modeling.
k-anonymity: loss of useful information.
Distributed privacy preservation: horizontal or vertical partition, aggregate features; fine-grained control for users to protect their privacy.
Cryptographic techniques (homomorphic encryption): computationally expensive, limited scope.
Related work
![Page 32: Privacy-aware Regression Modeling of Participatory Sensing Data](https://reader036.vdocuments.site/reader036/viewer/2022062411/56816683550346895dda2b12/html5/thumbnails/32.jpg)
Regression model is the same as the one built from the private data.
Derive a safe number of samples.
Study privacy.
Neutral features → high reconstruction error.
Quantification of privacy does not capture all privacy breaches:
Distribution of the original data is narrow.
Higher correlation → easy reconstruction.
Cannot guarantee privacy in theory.
Conclusion