Data Science and Pending EU Privacy Laws: A Storm on the Horizon

David Stephenson, Ph.D., dsiAnalytics.com

Post on 29-Jan-2018

Agenda

PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE

• Intro & Case Studies

• Data & Data Science: Growth and Usage

• Privacy: Storm on the Horizon

• Concluding Thoughts

Intro & Case Studies

My Background

• Head of Global Business Analytics

• Professor (Advanced Analytics)

• Ph.D., Analytics & Computer Science

• Financial Analytics, Credit Risk and Insurance

• Independent Consultant

Intro & Case Studies

Target Corporation Makes an Embarrassing Revelation

Legal

Intro & Case Studies

Netflix Plays with Fire and Gets Burned

Agenda

PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE

• Intro & Case Studies

• Data & Data Science: Growth and Usage

– Data Science

– Modern Technology

• Privacy: Storm on the Horizon

The Power of Data Science

Data Usage

• Propensity Classification/Profiling

• Personalization

• Marketing

More Data Means More Insights

• Traditional Data

• Big Data

• Smart Devices

• IoT

Data Science

Healthcare Case Study: Linking Data Destroys Anonymization

William Weld
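The William Weld case is usually told as a linkage attack: hospital records stripped of names were joined to a public voter list on ZIP code, birth date, and sex. A minimal sketch of that kind of join follows. The records are invented toy data, and the field names and the `link` helper are assumptions made for illustration, not material from the talk.

```python
# Toy linkage attack. Every record below is invented for illustration.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "Anonymized" medical data: names removed, quasi-identifiers kept.
medical_records = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02138", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition C"},
]

# Public register: names plus the same quasi-identifiers.
public_register = [
    {"name": "Person One", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Person Two", "zip": "02138", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Person Three", "zip": "02140", "birth_date": "1971-03-22", "sex": "F"},
]


def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymized, identified):
    """Return (name, diagnosis) pairs whose key matches exactly one named person."""
    names_by_key = {}
    for person in identified:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymized:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(medical_records, public_register)
for name, diagnosis in reidentified:
    print(f"{name} -> {diagnosis}")
```

In this toy data, two of the three "anonymized" records match exactly one named person, so removing the name column alone did not protect them.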

The Power of Data Science

Data Sources

Brainstorm: Today’s Sources of Personal Data?

Data Growth: Modern Technology

• Sources: Browsing

• Sources: Off-line behavior

• Sources: Biometrics

• Sources: The Internet of Things

• Data Storage

Privacy: A Brief Background

Source and Use of Customer Data

• Sources: Observed, Volunteered, Data Science

• Data can be: Known, Used, Stored, Shared with 3rd parties

Privacy Legislation

EU Data Protection Directives

Privacy: Storm on the Horizon

Preparing for compliance

What are my data assets? (Usage; Storage; Flow to/from 3rd parties; Observed; Volunteered; Data Science)

• Right to be forgotten

• De-anonymization

• Cloud computing

• Explicit and up-front consent

• Restricted profiling

• Privacy by Design

• Potential liabilities from buying, selling and sharing

Privacy: Storm on the Horizon

Moving Forward

• Become aware of your entire data ecosystem and how it may expose you to privacy violations

• Audit current data storage and governance for compliance

• Ensure that all product roadmaps comply with the principles of Privacy by Design

• Ensure that proper user consent is in place from the moment of first user registration

• Initiate dialogue with corporate privacy officer or external expert

Contact

@Stephenson_data

Appendix

Privacy by Design

1. Proactive not Reactive; Preventative not Remedial

2. Privacy as the Default Setting

3. Privacy Embedded into Design

4. Full Functionality – Positive-Sum, not Zero-Sum

5. End-to-End Security – Full Lifecycle Protection

6. Visibility and Transparency – Keep it Open

7. Respect for User Privacy – Keep it User-Centric

Privacy by Design for Big Data (Jeff Jonas, IBM)

1. FULL ATTRIBUTION: Every observation (record) needs to know from where it came and when. There cannot be merge/purge data survivorship processing whereby some observations or fields are discarded.

2. DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for, in real time, in sub-seconds.

3. ANALYTICS ON ANONYMIZED DATA: The ability to perform advanced analytics (including some fuzzy matching) over cryptographically altered data means organizations can anonymize more data before information sharing.

4. TAMPER-RESISTANT AUDIT LOGS: Every user search should be logged in a tamper-resistant manner; even the database administrator should not be able to alter the evidence contained in this audit log.

5. FALSE NEGATIVE FAVORING METHODS: The capability to more strongly favor false negatives is of critical importance in systems that could be used to affect someone’s civil liberties.

6. SELF-CORRECTING FALSE POSITIVES: With every new data point presented, prior assertions are re-evaluated to ensure they are still correct, and if no longer correct, these earlier assertions can often be repaired in real time.

7. INFORMATION TRANSFER ACCOUNTING: Every secondary transfer of data, whether to human eyeball or a tertiary system, can be recorded to allow stakeholders (e.g., data custodians or the consumers themselves) to understand how their data is flowing.
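One common way to approximate principle 4 is a hash chain, where each log entry's hash also covers the previous entry's hash, so editing an earlier entry makes verification fail. The sketch below is an illustration under assumptions, not Jonas's or IBM's implementation: the `AuditLog` class, the `GENESIS_HASH` constant, and the entry fields are hypothetical names chosen for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical starting value for the chain


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's content."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which every entry is chained to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, user, query):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        payload = {"user": user, "query": query}
        self.entries.append({"payload": payload, "hash": entry_hash(prev_hash, payload)})

    def verify(self):
        """Recompute the chain from the start; any edited entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("analyst_1", "search customer 1001")
log.append("dba", "search customer 1002")
print(log.verify())  # True: the chain is intact

# Simulate an administrator quietly rewriting an earlier search.
log.entries[0]["payload"]["query"] = "search customer 9999"
print(log.verify())  # False: the stored hash no longer matches the content
```

A chain like this only makes tampering detectable. Someone able to rewrite the whole log could recompute every hash, so in practice the latest hash would also need to be kept somewhere outside that person's control.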