The privacy-by-design AI conundrum: Privacy-preserving ML
Nicolas Kourtellis, Ph.D.
Telefonica Research, Barcelona
MWC 2021
We are better, connected
Who am I?
Now: @Telefonica Research, Barcelona
Past: @Yahoo Labs, Barcelona
@USF, Florida, USA
Interests:
• Privacy-preserving Machine Learning
• Personal Data Privacy & Leak Analysis (GDPR, etc.)
• Inappropriate behavior models (cyberbullying, fake news, ...)
• (Distributed)(Stream)(Graph) Data Mining
EU projects: PROTASIS
Telefonica Research (since 2006):
• Located in Barcelona
• 12+ PhDs, visiting professors & students
• Publish academic studies
• Patents
• EU / Spanish projects
• Internal innovation projects
Research areas:
• Networks & Systems
• Machine Learning
• UX, HCI
• Security & Privacy
Big Data
10s of billions of devices
• Smartphones, IoT, …
5V’s
• Volume, Velocity, Variety, Veracity, …
Stress on infrastructure
• Networking, storing, mining
Processing
• Towards the edge
Companies
Constant user tracking
• Online or offline
Constant data collection
• Storing @ data centers
User behavior modeling
• ML / DL @ data centers, or
• MLaaS
Users
Intense web tracking
• Data & anonymity leakage
• Fingerprinting
Regulations
• GDPR, CCPA, e-Privacy
Tools
• Private browsers
• Ad-blockers
Isolate activities
• Tools / devices
An ecosystem with conflicting goals
Users
• Entitled to privacy of actions & data, protected by law
Companies
• Need data to remain relevant in business
Big Data
• 5V’s, infrastructure
What can companies do in the future?
How can they extract value from data while preserving user privacy?
⇒ Use privacy-preserving ML (PPML):
• Differential Privacy (DP)
• Secure Multi-Party Computation (MPC)
• Fully Homomorphic Encryption (FHE)
• Federated Learning (FL):
  • Google’s Gboard (mobile keyboard), Android Messages
  • Apple’s QuickType keyboard + vocal classifier for Siri
• …
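To make the MPC item concrete, here is a minimal sketch of additive secret sharing, one common building block of MPC, used to compute the sum of several parties' private values without any party seeing another's input. The modulus, the three-party setup and the example values are illustrative assumptions, not details from the talk.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative prime modulus; all arithmetic is done mod this value

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; any n-1 of them alone are uniformly random."""
    return sum(shares) % MODULUS

def secure_sum(private_values):
    """Each party splits its input into shares; party j only sees the j-th share of every input."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j locally adds up the j-th share of every input.
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    # Publishing only the partial sums reveals the total, not the individual inputs.
    return reconstruct(partial_sums)

if __name__ == "__main__":
    print(secure_sum([41_000, 52_500, 38_200]))  # 131700
```

The same additive structure is what lets a server aggregate model updates it cannot read individually.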
Recent advances in edge computing:
⇒ PPML becomes a competitive alternative to cloud-based MLaaS
What is Telefonica Research doing in this space?
Proposed the 1st Federated Learning as a Service (FLaaS)
Proposed the 1st Privacy-Preserving Federated Learning solution (PPFL)
Studied the utility vs. privacy trade-off of PPML methods
Studied adversarial attacks on PPML methods
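The utility vs. privacy trade-off can be illustrated with the Laplace mechanism from differential privacy: releasing the mean of n values bounded in [0, 1] with Laplace noise of scale 1/(n·ε). A smaller ε gives stronger privacy but a larger error. This is a minimal sketch; the data, the bounds and the ε values are illustrative assumptions, not the experiments behind the talk.

```python
import random

def dp_mean(values, epsilon, rng):
    """Release the mean of values in [0, 1] with epsilon-DP via the Laplace mechanism.
    Assumes n is public and neighbouring datasets differ by replacing one record."""
    n = len(values)
    sensitivity = 1.0 / n          # replacing one record moves the mean by at most 1/n
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(values) / n + noise

def mean_abs_error(values, epsilon, trials=2000, seed=0):
    """Average absolute error over repeated releases (a utility measurement only;
    each real release would consume its own privacy budget)."""
    rng = random.Random(seed)
    true_mean = sum(values) / len(values)
    errors = [abs(dp_mean(values, epsilon, rng) - true_mean) for _ in range(trials)]
    return sum(errors) / trials

if __name__ == "__main__":
    data_rng = random.Random(42)
    values = [data_rng.random() for _ in range(100)]
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:>5}: mean |error| = {mean_abs_error(values, eps):.4f}")
```

Since the expected absolute value of Laplace(0, b) noise is b, the printed error should track 1/(100·ε): roughly 0.1, 0.01 and 0.001.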
FL: How does it work?
Four clients, each with its local objective: F1(w), F2(w), F3(w), F4(w)
1. Send starting global model (server → clients)
2. Personalize model (on each client)
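The two steps above (broadcast, then local personalization) are usually followed by clients reporting their updated weights and the server averaging them, as in FedAvg. Below is a minimal sketch of one such round on a toy linear-regression problem with NumPy; the averaging step, learning rate, epoch count and synthetic client data are assumptions for illustration, not details from the talk.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Step 2: a client refines the global weights on its private data with
    full-batch gradient descent on F_k(w) = mean((Xw - y)^2) / 2."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One round: broadcast global_w, train locally, then average the returned
    weights in proportion to each client's number of samples (FedAvg-style)."""
    updates, sizes = [], []
    for X, y in clients:  # in a real deployment only the weights leave each client
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(np.stack(updates), axis=0, weights=np.array(sizes, dtype=float))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for n in (20, 50, 30, 40):  # four clients, as on the slide
        X = rng.normal(size=(n, 2))
        clients.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))
    w = np.zeros(2)  # Step 1: the starting global model
    for _ in range(30):
        w = fedavg_round(w, clients)
    print(np.round(w, 2))  # should end up close to true_w = [2, -1]
```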
FLaaS: Federated Learning as a Service*
*Paper: Kourtellis, Katevas, Perino. FLaaS: Federated Learning as a Service. ACM DistributedML, 2020. arxiv.org/abs/2011.09359
*Patent: Kourtellis, Katevas, Perino. Federated Machine Learning as a Service. EU Patent Application EP20383001, 2020
What can I do with FLaaS?
[Figure: FLaaS deployment across devices, edge computing nodes (E) and 3rd-party services, connected through FLaaS (F) and the FLaaS API. Legend: AI/ML modules; edge computing node; flows of learned ML parameters, the ML global model, ML results & inferences, and personal data + command flows.]
1st use case: individual apps, each for its own ML problem
• Spotify
2nd use case: similar apps collaborating on the same ML problem
• Instagram + Facebook
• YouTube + Netflix
3rd use case: orthogonal apps collaborating on a new ML problem
• Uber + Spotify
• GMaps + Spotify
FLaaS Features
1. Enables collaborative ML training across customers on the same device, via APIs
• In a federated, secure, privacy-preserving (PP) fashion
2. Provides extensible, high-level APIs & SDK for service usage and privacy/permissions management
3. Enables hierarchical construction & exchange of ML models across the network
4. Can be instantiated in different types of devices and operational environments
• Phones, home devices, edge nodes
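Feature 3 (hierarchical construction of models across the network) can be illustrated with two-level averaging: devices report to an edge node, and edge nodes report to the server. This is a minimal sketch assuming sample-count-weighted averaging at every level, which is our assumption since the talk does not specify FLaaS's aggregation rule. With these weights the two-level result equals a flat average over all devices.

```python
import numpy as np

def weighted_average(models, counts):
    """Average model vectors, weighting each by its number of training samples."""
    return np.average(np.stack(models), axis=0, weights=np.asarray(counts, dtype=float))

def hierarchical_aggregate(edge_groups):
    """edge_groups: one list per edge node, each holding (device_model, n_samples) pairs.
    Each edge node aggregates its devices; the server then aggregates the edge models,
    weighting each by the total number of samples it represents."""
    edge_models, edge_counts = [], []
    for devices in edge_groups:
        models, counts = zip(*devices)
        edge_models.append(weighted_average(models, counts))
        edge_counts.append(sum(counts))
    return weighted_average(edge_models, edge_counts)

if __name__ == "__main__":
    edge_a = [(np.array([1.0, 0.0]), 10), (np.array([3.0, 2.0]), 30)]
    edge_b = [(np.array([0.0, 4.0]), 60)]
    print(hierarchical_aggregate([edge_a, edge_b]))  # [1. 3.], same as a flat average
```

Weighting edge models by their sample totals is what keeps the hierarchy from biasing the global model toward edge nodes that serve few devices.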
FL problem
Models (gradients) memorize seen data
Vulnerable to model attacks:
• Membership/Attribute Inference
• Data reconstruction
⇒ How to protect the private information in models?
(Such attacks require access to the model training process & updates)
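Why memorization enables membership inference can be shown with a loss-threshold attack: guess "member" whenever the model's loss on a sample is suspiciously low. This minimal sketch uses a deliberately extreme memorizer (a 1-nearest-neighbour regressor on synthetic data) rather than a neural network, and the threshold is an arbitrary assumption; it is not the attack setup evaluated in the talk.

```python
import random

def train_memorizing_model(xs, ys):
    """A deliberately over-fitted 1-nearest-neighbour regressor: it memorizes
    every training pair, so its loss on training points is exactly zero."""
    data = list(zip(xs, ys))
    def predict(x):
        _, nearest_y = min(data, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict

def loss_threshold_attack(model, x, y, threshold=1e-3):
    """Guess 'member' when the model's squared loss on (x, y) is below the threshold."""
    return (model(x) - y) ** 2 < threshold

def make_data(rng, n):
    """Synthetic samples from a noisy linear relation y = 2x + noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

if __name__ == "__main__":
    rng = random.Random(0)
    train_xs, train_ys = make_data(rng, 50)
    out_xs, out_ys = make_data(rng, 50)
    model = train_memorizing_model(train_xs, train_ys)
    hits = sum(loss_threshold_attack(model, x, y) for x, y in zip(train_xs, train_ys))
    false_alarms = sum(loss_threshold_attack(model, x, y) for x, y in zip(out_xs, out_ys))
    print(f"members flagged: {hits}/50, non-members flagged: {false_alarms}/50")
```

Every training sample is flagged while most unseen samples are not, so the attacker learns who was in the training set purely from the model's behavior.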
Use Trusted Execution Environments (TEEs)
• Limited memory of TEE
⇒ How to train full FL models inside limited TEEs?
PPFL: Privacy-Preserving FL*
*Paper: Mo, Haddadi, Katevas, Marin, Perino, Kourtellis. Privacy-preserving Federated Learning with TEEs. MobiSys 2021. arxiv.org/pdf/2104.14380
*Patent: Kourtellis, Marin, Katevas, Perino. Federated Learning for Preserving Privacy. EU Patent Application EP21382368, 2021
PPFL: Architectural Design
[Figure: a server (TEE, public knowledge) and clients (private dataset, TEE) exchanging blocks of layers plus a classifier (CL), with configuration (device selection & secure channel) and reporting phases; training moves to the next block of layers after convergence.]
① Transferring knowledge (if any)
② Model initialization
③ Model broadcasting
④ Layer-wise local training
⑤ Model reporting
⑥ Secure aggregation
CL: Classifier; TEE: Trusted Execution Environment
PPFL Features
1. Does not leak private information on either the server or the client side under FL:
• Membership/Attribute Inference attacks
• Data reconstruction attacks
2. Overcomes limited TEE memory constraints
3. Provides ML accuracy comparable to regular FL
• Even without training all layers
4. Imposes only a small overhead for on-device training in the TEE
• Tolerable delay
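Feature 2 relies on layer-wise training: only the block of layers currently being trained, together with its classifier (CL), needs to fit in TEE memory at once. This minimal sketch covers just that memory-budget aspect, greedily grouping consecutive layers into blocks that fit a budget. The per-layer footprints, classifier size and TEE budget are made-up numbers, and greedy grouping is our simplification, not necessarily PPFL's exact policy.

```python
def partition_layers(layer_mb, classifier_mb, tee_budget_mb):
    """Greedily group consecutive layers into blocks so that each block plus a
    classifier head fits in the TEE memory budget. Returns lists of layer indices."""
    blocks, current, used = [], [], classifier_mb
    for i, size in enumerate(layer_mb):
        if classifier_mb + size > tee_budget_mb:
            raise ValueError(f"layer {i} ({size} MB) cannot fit in the TEE on its own")
        if used + size > tee_budget_mb:  # current block is full: start a new one
            blocks.append(current)
            current, used = [], classifier_mb
        current.append(i)
        used += size
    if current:
        blocks.append(current)
    return blocks

if __name__ == "__main__":
    # Hypothetical per-layer footprints (weights + activations + gradients), in MB.
    layers = [12, 20, 30, 30, 45, 8]
    print(partition_layers(layers, classifier_mb=5, tee_budget_mb=60))
    # [[0, 1], [2], [3], [4, 5]]
```

Each block would then be trained to convergence in the TEE (step ④), aggregated (step ⑥) and frozen before the next block starts.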
Research Directions
• Hierarchical FL
• FL Scheduling
• On-device training optimization
• Heterogeneity in FL Resources
• FL Fault Tolerance
• User study, 100s of real devices