
The privacy-by-design AI conundrum: Privacy-preserving ML

Nicolas Kourtellis, Ph.D.

Telefonica Research, Barcelona

MWC 2021

We are better, connected


Who am I?

Now: @Telefonica Research, Barcelona

Past: @Yahoo Labs, Barcelona; @USF, Florida, USA

Interests:

• Privacy-preserving Machine Learning

• Personal Data Privacy & Leak Analysis (GDPR, etc.)

• Inappropriate behavior models (cyberbullying, fake news, ...)

• (Distributed)(Stream)(Graph) Data Mining

EU projects: PROTASIS

Telefonica Research (since 2006):

• Located in Barcelona

• 12+ PhDs, visiting professors & students

• Publish academic studies

• Patents

• EU / Spanish projects

• Internal innovation projects

Research areas:

• Networks & Systems

• Machine Learning

• UX, HCI

• Security & Privacy


What’s the problem?


Big Data

10s of billions of devices

• Smartphones, IoT, …

5V’s

• Volume, Velocity, Variety, Veracity, …

Stress on infrastructure

• Networking, storing, mining

Processing

• Towards the edge


Companies

Constant user tracking

• Online or offline

Constant data collection

• Storing @ data centers

User behavior modeling

• ML / DL @ data centers, or

• MLaaS


Users

Intense web tracking

• Data & anonymity leakage

• Fingerprinting

Regulations

• GDPR, CCPA, e-Privacy

Tools

• Private browsers

• Ad-blockers

Isolate activities

• Tools / devices


An ecosystem with conflicting goals

Users

• Entitled to privacy of actions & data, protected by law

Companies

• Need data to remain relevant in business

Big Data

• 5V’s, infrastructure

What can companies do in the future?

How to extract value from data but preserve user privacy?

⇒ Use privacy-preserving ML (PPML):

• Differential Privacy (DP)

• Secure Multi-Party Computation (MPC)

• Fully Homomorphic Encryption (FHE)

• Federated Learning (FL):

  • Google’s Gboard mobile keyboard, Android Messages

  • Apple’s QuickType keyboard + vocal classifier for Siri

• …
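To make the MPC idea concrete, the following is a minimal sketch of additive secret sharing, a common building block of MPC protocols, used to compute a sum over several parties' private values. It assumes honest-but-curious parties and non-negative inputs whose total stays below the modulus, and it simulates all parties in one process; `PRIME`, `share` and `secure_sum` are illustrative names, not part of any product mentioned above.

```python
from __future__ import annotations

import secrets

# Illustrative modulus (an assumption): all arithmetic is done mod a large prime,
# so any n-1 shares of a value are uniformly random and reveal nothing about it.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into `n_parties` additive shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Sum the parties' private values; only the total is ever reconstructed.

    All parties are simulated in one process here; in a real deployment each
    share would be sent to a different party over a secure channel.
    """
    n = len(private_values)
    # Party i splits its value; share j goes to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received, learning nothing about any single input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the published partial sums yields only the total.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    print(secure_sum([12, 7, 30]))  # prints 49
```

Each party only ever sees uniformly random shares; only the combined partial sums reveal the total.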

With recent advances in edge computing:

⇒ PPML is a competitive alternative to cloud-based MLaaS

What is Telefonica Research doing in this space?


Proposed the 1st Federated Learning as a Service (FLaaS)

Proposed the 1st Privacy-Preserving Federated Learning solution (PPFL)

Studied the utility vs. privacy trade-off of PPML methods

Studied adversarial attacks on PPML methods
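The utility vs. privacy trade-off can be illustrated with the textbook Laplace mechanism from differential privacy: a smaller privacy budget ε means stronger privacy but noisier, less useful answers. This is a sketch for a single counting query under assumed parameters (the count of 100, the ε values, the function names), not a reproduction of the studies listed above.

```python
from __future__ import annotations

import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two independent Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-DP via the Laplace mechanism (a count has sensitivity 1)."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 20_000, seed: int = 0) -> float:
    """Average utility loss (absolute error) of the private count for a given epsilon."""
    rng = random.Random(seed)
    true_count = 100  # hypothetical number of users with some attribute
    errors = [abs(private_count(true_count, epsilon, rng) - true_count) for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    # Smaller epsilon = stronger privacy = larger expected error (about 1/epsilon).
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:<5} mean |error| = {mean_abs_error(eps):.3f}")
```

Because a counting query has sensitivity 1, the noise scale is 1/ε, so the mean absolute error grows roughly as 1/ε.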


What is Federated Learning?


FL: How does it work?

Clients with local objectives F1(w), F2(w), F3(w), F4(w)

1. Send starting global model

2. Personalize model (on each of the four clients)
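The two steps above, completed with a server-side aggregation step, can be sketched as rounds of Federated Averaging (FedAvg). The aggregation step is an assumption here, since it is not shown on this slide; step 2 is read as local training on each client's own data, and the one-parameter linear model, learning rate and synthetic client data are purely illustrative.

```python
from __future__ import annotations

import random


def make_client_data(n: int, rng: random.Random, true_w: float = 3.0) -> list[tuple[float, float]]:
    """Hypothetical private dataset y = true_w * x + noise; it never leaves the client."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, 0.1)))
    return data


def local_update(w: float, data: list[tuple[float, float]],
                 lr: float = 0.05, epochs: int = 5) -> float:
    """Step 2: refine the received global weight with gradient descent on the
    client's local objective F_k(w) = mean((w * x - y) ** 2)."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(w_global: float, clients: list[list[tuple[float, float]]]) -> float:
    """Broadcast w_global (step 1), train locally (step 2), then average the
    returned weights on the server, weighted by each client's dataset size."""
    local_ws = [local_update(w_global, data) for data in clients]
    sizes = [len(data) for data in clients]
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)


def train(rounds: int = 50, seed: int = 42) -> float:
    """Run several FedAvg rounds over four simulated clients (cf. F1(w)..F4(w))."""
    rng = random.Random(seed)
    clients = [make_client_data(n, rng) for n in (20, 50, 10, 40)]
    w = 0.0  # starting global model
    for _ in range(rounds):
        w = fedavg_round(w, clients)
    return w


if __name__ == "__main__":
    print(f"global weight after 50 rounds: {train():.2f}")  # expected to be close to 3.0
```

Only model weights travel between clients and server; the raw (x, y) pairs stay inside `make_client_data`'s output on each simulated client.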

How can companies build collaborative ML models?

FLaaS: Federated Learning as a Service*

*Paper: Kourtellis, Katevas, Perino. FLaaS: Federated Learning as a Service. ACM DistributedML, 2020. arxiv.org/abs/2011.09359

*Patent: Kourtellis, Katevas, Perino. Federated Machine Learning as a Service. EU Patent Application EP20383001, 2020

What can I do with FLaaS?

[Figure: FLaaS architecture. 3rd-party services access FLaaS through the FLaaS API. Legend: AI/ML = FLaaS AI/ML modules; E = edge computing node. Flows shown: learned ML parameters, ML global model, ML results & inferences, personal data + command flows.]

1st use case: Individual apps for their own ML problem

• Spotify

• Instagram

2nd use case: Collaborating similar apps for the same ML problem

• Instagram + Facebook

• YouTube + Netflix

3rd use case: Collaborating but orthogonal apps for a new ML problem

• Uber + Spotify

• GMaps + Spotify


FLaaS Features

1. Enables collaborative ML training across customers on the same device via APIs

• In a federated, secure, privacy-preserving (PP) fashion

2. Provides extensible, high-level APIs & SDK for service usage and privacy/permissions management

3. Enables hierarchical construction & exchange of ML models across the network

4. Can be instantiated on different types of devices and operational environments

• Phones, home devices, edge nodes
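Feature 3 can be illustrated with two-level, sample-weighted model averaging, where devices report to an edge node and edge nodes report to a central server. This is a hypothetical sketch: the `Update` type, the device/edge/cloud layering and the numbers are assumptions for illustration, not the FLaaS API or SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Update:
    """A model update: a parameter vector and the number of samples behind it."""
    weights: list[float]
    num_samples: int


def weighted_average(updates: list[Update]) -> Update:
    """Sample-weighted average; the total sample count is carried up to the next level."""
    total = sum(u.num_samples for u in updates)
    dim = len(updates[0].weights)
    avg = [sum(u.weights[i] * u.num_samples for u in updates) / total for i in range(dim)]
    return Update(avg, total)


def hierarchical_aggregate(edge_groups: list[list[Update]]) -> Update:
    """Level 1: each edge node averages its own devices' updates.
    Level 2: the central server averages the edge nodes' models."""
    edge_models = [weighted_average(group) for group in edge_groups]
    return weighted_average(edge_models)


if __name__ == "__main__":
    edge_a = [Update([1.0, 2.0], 10), Update([3.0, 0.0], 30)]  # two devices behind edge node A
    edge_b = [Update([0.0, 4.0], 20)]                          # one device behind edge node B
    print(hierarchical_aggregate([edge_a, edge_b]))
    print(weighted_average(edge_a + edge_b))  # flat average over all devices: same values
```

With sample-weighted averaging, the two-level result equals flat averaging over all devices, so an intermediate edge layer can reduce upstream traffic without changing the aggregate.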


Can I trust my FLaaS-trained model?


FL problem

Models (gradients) memorize seen data

Vulnerable to model attacks:

• Membership/Attribute Inference

• Data reconstruction

⇒ How to protect private information of models?

These attacks require access to the model training process & updates

Use Trusted Execution Environments (TEEs)

• Limited memory of TEE

⇒ How to train full FL models inside limited TEEs?
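The memorization risk above can be illustrated with a loss-threshold membership inference attack in the style of Yeom et al. (2018): an attacker who can query a model guesses that a record was in the training set when the model's loss on it is unusually low. This sketch uses a deliberately overfitting 1-nearest-neighbour model on synthetic data; the model, data and threshold are all assumptions, and attacks on FL can also target the gradients or model updates exchanged during training.

```python
from __future__ import annotations

import random
from typing import Callable

Model = Callable[[float], float]


def fit_1nn(training_set: list[tuple[float, float]]) -> Model:
    """A deliberately overfitting model: return the label of the closest training point.
    It memorizes its training set, so its loss on training records is exactly 0."""
    def predict(x: float) -> float:
        return min(training_set, key=lambda point: abs(point[0] - x))[1]
    return predict


def is_member_guess(model: Model, x: float, y: float, threshold: float) -> bool:
    """Loss-threshold attack: guess 'member' if the model fits (x, y) suspiciously well."""
    return (model(x) - y) ** 2 < threshold


def synthetic_records(n: int, rng: random.Random) -> list[tuple[float, float]]:
    """Hypothetical records: y = x + Gaussian noise."""
    records = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        records.append((x, x + rng.gauss(0.0, 0.5)))
    return records


def attack_accuracy(n: int = 200, threshold: float = 1e-9, seed: int = 0) -> float:
    """Fraction of members and non-members the attacker classifies correctly."""
    rng = random.Random(seed)
    members = synthetic_records(n, rng)      # used for training
    non_members = synthetic_records(n, rng)  # same distribution, never seen by the model
    model = fit_1nn(members)
    hits = sum(is_member_guess(model, x, y, threshold) for x, y in members)
    hits += sum(not is_member_guess(model, x, y, threshold) for x, y in non_members)
    return hits / (2 * n)


if __name__ == "__main__":
    # 0.5 would mean the attacker learns nothing; values near 1.0 mean membership leaks.
    print(f"attack accuracy: {attack_accuracy():.3f}")
```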

PPFL: Privacy-Preserving FL*

*Paper: Mo, Haddadi, Katevas, Marin, Perino, Kourtellis. Privacy-preserving Federated Learning with TEEs. MobiSys 2021. arxiv.org/pdf/2104.14380

*Patent: Kourtellis, Marin, Katevas, Perino. Federated Learning for Preserving Privacy. EU Patent Application EP21382368, 2021


PPFL: Architectural Design

Clients: private dataset, TEE (move to the next block of layers after convergence)

Server: TEE, public knowledge (CL, if transferring)

Configuration reporting; device selection & secure channel

① Transferring knowledge, if any

② Model initialization

③ Model broadcasting

④ Layer-wise local training

⑤ Model reporting

⑥ Secure aggregation

CL: Classifier; TEE: Trusted Execution Environment
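The layer-wise schedule in steps ③ to ⑥ (train one block of layers, then move to the next block after convergence) can be sketched as follows. This is a toy illustration under stated assumptions: each block is reduced to a single number, "local training" just moves it toward a hypothetical per-client optimum, and the memory figures and TEE budget are made up. It shows the scheduling logic and a memory check, not the PPFL implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    name: str
    memory_mb: float    # hypothetical memory needed to train this block inside a TEE
    param: float = 0.0  # toy stand-in for the block's weights


def fl_round(block: Block, client_optima: list[float], lr: float = 0.5) -> float:
    """One round for a single block: broadcast (3), local training (4), reporting (5),
    aggregation (6). Toy training: each client moves the block parameter toward its
    own hypothetical optimum. Returns the magnitude of the aggregated update."""
    local = [block.param + lr * (opt - block.param) for opt in client_optima]
    new_param = sum(local) / len(local)
    delta = abs(new_param - block.param)
    block.param = new_param
    return delta


def layerwise_train(blocks: list[Block], optima_per_block: list[list[float]],
                    tee_budget_mb: float, tol: float = 1e-4, max_rounds: int = 100) -> list[int]:
    """Train one block of layers at a time; earlier blocks stay frozen, later ones wait.
    Moves on to the next block once the update size falls below `tol`."""
    rounds_per_block = []
    for block, optima in zip(blocks, optima_per_block):
        if block.memory_mb > tee_budget_mb:
            raise ValueError(f"{block.name} needs {block.memory_mb} MB; TEE budget is {tee_budget_mb} MB")
        rounds = 0
        while rounds < max_rounds:
            rounds += 1
            if fl_round(block, optima) < tol:
                break  # converged: move to the next block of layers
        rounds_per_block.append(rounds)
    return rounds_per_block


if __name__ == "__main__":
    model = [Block("conv1", 8.0), Block("conv2", 12.0), Block("classifier", 4.0)]
    optima = [[1.0, 2.0, 3.0], [0.5, 0.5, 1.1], [-1.0, 1.0, 0.0]]
    print(layerwise_train(model, optima, tee_budget_mb=16.0))
    print([round(b.param, 3) for b in model])  # each block settles near its clients' mean optimum
```

In this sketch only the block currently being trained is checked against the TEE budget, mirroring the idea of working within limited TEE memory by not training the whole model at once.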


PPFL Features

1. Does not leak private information on either the server or the client side under FL, even under:

• Membership/Attribute Inference attacks

• Data reconstruction attacks

2. Overcomes limited TEE memory constraints

3. Provides ML accuracy comparable to regular FL

• Even without training all layers

4. Imposes small overhead for on-device training in the TEE

• Tolerable delay


Next steps?


Research Directions

• Hierarchical FL

• FL Scheduling

• On-device training optimization

• Heterogeneity in FL Resources

• FL Fault Tolerance

• User study, 100s of real devices


Want to know more?

Nicolas Kourtellis, Ph.D.

Telefonica Research, Barcelona

@kourtellis

Questions?