Zurich Development Center Workshop: Mastering the Challenges of our Digital Society

How can privacy, security, and data integrity requirements be respected with Big Data and Algorithmic Decision Making?

Courtney Bowman
Palantir Technologies

pmonsch@ethz.ch




Historical Analogy – The “driverless” elevator

The first automated elevators were introduced around the turn of the 20th century.

This was one innovation in a steady progression of technological developments to improve the efficiency, safety, and utility of vertical movement, which enabled ever taller building construction in space-constrained urban settings.


And passengers despised them!

People were terrified to step into an elevator without a human operator.

Indeed, it wasn’t until the 1945 elevator operator strike in NYC, which prevented some 1.5M office workers from getting to work, that building owners demanded change.


And even that wasn’t sufficient.

To ease adoption of the automated elevator, other features were introduced, including the emergency stop button, the emergency telephone, and calming voice aids.

Additionally, a public advertising campaign featuring elderly women and children operating the push-button elevators sought to demonstrate the ease and safety of these new lifts.


What do we learn from the elevator analogy?

For riders to feel comfortable using these machines, it took the confluence of:

• Safety features to enable human intervention
• A mode of outreach in the event of a failure
• Voice aids to explain how the elevator was moving (“going up”; “third floor”)
• A broad community appeal extolling ease of use
• An exigent crisis (a massive strike) threatening access to office space for the workforce of a major urban center


What else do we learn from the elevator analogy?

The initial unveiling of the automated elevator, in the pursuit of efficiency, safety, and reliability, failed to factor in a number of anxieties and risks that impeded its early adoption.

Presented with this novel technology, society had to countenance a number of unmitigated, jarring alterations to an important feature of urban life:

• Action seemingly dislocated from the realm of human decision-making capability and culpability
• Opaqueness of process
• Uncertainty of outcomes (unclear causality of underlying machinery)
• Absence of a professional, guiding presence
• (Not to mention job displacement)


How does this relate to Big Data / Algorithmic Decision Making?

Both the analogy and the new era of Big Data present a common ledger of benefits and risks.

Benefits

• Efficiency gains
• Minimization / mitigation of human error (or bias)
• Freeing workers from the tyranny of the mundane

Risks

• Loss of human agency
• Diminished / unclear culpability
• Opaqueness, obscurity of increasingly complex processes
• Reification of autonomous machine
• Technology is only as good / reliable as its design


But the elevator is an imperfect analogy…

And we have as much to learn from examining where the analogy breaks down as we do from considering the common ties to Big Data / Algorithmic Decision Making.

What’s different?

• The very ambiguity of the expressions “Big Data” and “Algorithmic Decision Making” triggers a cluster of unnamed or poorly named anxieties of the information age.
• Big Data presents an entirely distinct risk terrain:
  • Security risks
  • Privacy risks
  • Data integrity / fidelity risks
• Algorithms and Big Data are materially distinct from elevators. The latter are physical objects with readily imaginable, tangible harms. It’s not always so clear when an algorithm has run amok.


For most of the world, “Big Data” looks like this [three slide images, not reproduced in the transcript]


For the anointed few, “Big Data” may look a little more like this

DATA

Feature set: ZIP code, Language, Name, City, Employer, Education level, Marital status, Age, Birthplace

ALGORITHM

Model:

\[
\begin{aligned}
\varphi_1 &\sim a_1 \log x_1 + b_1 \log y_1 + c_1 \log z_1 + \cdots + d_1 \log q_1 + D \\
\varphi_2 &\sim a_2 \log x_2 + b_2 \log y_2 + c_2 \log z_2 + \cdots + d_2 \log q_2 + E \\
&\;\;\vdots \\
\varphi_n &\sim a_n \log x_n + b_n \log y_n + c_n \log z_n + \cdots + d_n \log q_n + Y
\end{aligned}
\]

Optimization Parameters:
• Min(Computation Time)

SCORING / OUTPUT

Candidate A – 90.1%
Candidate B – 83.5%
Candidate C – 34.2%
…
Candidate X – 12.4%
…


Data Privacy, Security, and Integrity concerns are numerous

Big Data Analytics and Algorithmic Decision Making are fraught with risks:

• Lack of clear governing policy for use of data and derived results
• Misapplication / re-purposing / over-extension
• Algorithmic secrecy / black-box modeling / lack of algorithmic introspection and auditing
• Lack of public engagement (esp. end-user or data subject notice / consent)
• Lack of documentation (including proper accounting of systems’ strengths and deficiencies)
• Poor data curation, tracking, and representation
• Garbage in / garbage out – data as such is neither all-revealing nor infallible; data curation, understanding, and cleansing are often necessary precursors to a defensible Big Data program
• Encoding bias in algorithmic design and application
• Insufficient permissioning, access restrictions, and security in transit and at rest
• Etc.


Data Privacy, Security, and Integrity failings are insidious

Perhaps most critically, these risks potentially impact our lives in subtle, compounding, and intangible (or not immediately tangible) ways:

• Subtle – E.g., proxy measures
• Compounding – E.g., cumulative bias
• Virtual – E.g., cyber-infrastructure threats


So what is the remedy?

Whether through regulatory pressure or good corporate citizenship, the design and architecture of Big Data Analytics and Algorithmic Decision Making systems must extend their scope beyond the standard objectives of performance, scalability, or efficiency (as we learned with the elevator analogy) to include a set of critical additional optimization parameters at the outset:

• Ethical / Socially Responsible Design & Implementation
• Data Protection / Privacy
• Information Security
• Accountability / Oversight
• Efficacy Evaluations
• Explainability / Interpretability / Transparency


What does this look like in practice?

In order to realize the value of their data in a way that respects data privacy, security, and integrity concerns and requirements, organizations need to implement a data infrastructure that will enable them to process personal data in a compliant way. Without this infrastructure, personal data subject to regulation transforms from an asset into a liability. Key capabilities of such a technical infrastructure include:

• Data discovery and classification
• Consent management and data provenance
• Data minimization
• Data integration, de-duplication, retention, and deletion
• Robust and granular access controls
• Auditing for effective oversight
• Algorithmic transparency
• Responsible Algorithmic Development & Use


DATA DISCOVERY AND CLASSIFICATION

• The first step is understanding data exposure. How much of the data is personally identifying, identifiable, or sensitive? Where is it hosted? Who has access to it?
• Organizations need to be able to discover, understand, and catalogue data holdings in order to understand the full scope of risk exposure.

CONSENT MANAGEMENT AND DATA PROVENANCE

• Where consent conditions apply, personal data must be collected and processed on the basis of affirmative consent from the data subject, the terms of which may limit the scope of authorized processing.
• The data infrastructure needs to maintain a link from any piece of personal data in the enterprise to an affirmation of consent.
• It must also maintain the lineage of all data, including any changes that have been made to it, when they were made, and by whom.
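The consent link and lineage requirements listed under consent management and data provenance could be sketched roughly as follows. This is a minimal illustration, not a description of any particular product: the `Consent` and `PersonalDatum` classes, the `may_process` helper, and all identifiers and values are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Consent:
    """Hypothetical record of a data subject's affirmative consent."""
    consent_id: str
    subject_id: str
    purposes: frozenset  # processing purposes the subject agreed to


@dataclass
class PersonalDatum:
    """A piece of personal data linked to a consent record, with lineage."""
    subject_id: str
    name: str
    value: str
    consent_id: str
    lineage: list = field(default_factory=list)

    def update(self, new_value, changed_by):
        # Record what changed, when, and by whom before applying the change.
        self.lineage.append({
            "old": self.value,
            "new": new_value,
            "at": datetime.now(timezone.utc).isoformat(),
            "by": changed_by,
        })
        self.value = new_value


def may_process(datum, consents, purpose):
    """Allow processing only if the linked consent covers this purpose."""
    consent = consents.get(datum.consent_id)
    return (
        consent is not None
        and consent.subject_id == datum.subject_id
        and purpose in consent.purposes
    )


# Illustrative data only.
consents = {"c-1": Consent("c-1", "subject-42", frozenset({"billing"}))}
email = PersonalDatum("subject-42", "email", "old@example.org", "c-1")
email.update("new@example.org", changed_by="analyst-7")

print(may_process(email, consents, "billing"))    # True
print(may_process(email, consents, "marketing"))  # False
print(email.lineage[0]["by"])                     # analyst-7
```

Keeping the consent identifier on the datum itself means a purpose check never has to guess which consent applies, and the lineage list answers the "what, when, and by whom" questions directly.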


DATA MINIMIZATION

• Data minimization techniques such as anonymization or pseudonymization of personal data should be used as appropriate for any secondary applications such as archiving, scientific or historical research, or statistical analysis.
• A data infrastructure that supports pseudonymization and other dynamic data minimization / masking techniques can help realize the value of data without the unnecessary disclosure of personal data.

DATA INTEGRATION, DE-DUPLICATION, RETENTION, AND DELETION

• Certain regulatory regimes (e.g., GDPR) include extensive rights for the data subject, including rights to data access and redress, to object to processing, to erasure, and to data portability.
• Responding to data subject rights requests requires the ability to integrate all personal data about a data subject into a single view, de-duplicating records that may be derived from disparate and disconnected source systems.
• The data infrastructure must manage the entire data lifecycle, from integration to retention and deletion.
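The single view of a data subject described under data integration and de-duplication might look something like the sketch below. It assumes, purely for illustration, that a normalized email address is a sufficient matching key; real record linkage across source systems usually needs more careful entity resolution. The function names, field names, and sample records are all hypothetical.

```python
from collections import defaultdict


def normalize_email(email):
    """Assumed matching key: case-insensitive, whitespace-trimmed email."""
    return email.strip().lower()


def single_view(records):
    """Group records from several source systems by data subject.

    Exact duplicates (same subject, field, and value) are collapsed, while
    the set of source systems holding each value is kept so that a later
    erasure request knows where to delete.
    """
    view = defaultdict(lambda: defaultdict(set))
    for rec in records:
        key = normalize_email(rec["email"])
        for name, value in rec.items():
            if name in ("email", "source"):
                continue
            view[key][(name, value)].add(rec["source"])
    return view


# Illustrative records from three hypothetical source systems.
records = [
    {"source": "crm", "email": "Ana@Example.org", "city": "Zurich"},
    {"source": "billing", "email": "ana@example.org ", "city": "Zurich"},
    {"source": "support", "email": "ana@example.org", "phone": "555-0100"},
]

view = single_view(records)
for (name, value), sources in sorted(view["ana@example.org"].items()):
    print(name, value, sorted(sources))
```

Tracking the source systems alongside each de-duplicated value is a design choice that serves both access requests (one consolidated answer) and erasure requests (a list of places to delete from).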


ROBUST AND GRANULAR ACCESS CONTROLS

• Processing of personal data must ensure the data’s security and confidentiality and protect it from unauthorized access.
• The data infrastructure needs to ensure that only users with the appropriate permissions can interact with data subject to the corresponding access restrictions.

AUDITING FOR EFFECTIVE OVERSIGHT

• Data controllers may be responsible for affirmatively demonstrating compliance with regulatory requirements or for answering data subject requests.
• The data infrastructure must therefore support meaningful interrogation of all data processing activities.
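One way to picture how granular access controls and auditing fit together is a read path that checks a per-field policy and logs every attempt, whether it is allowed or not. The sketch below is an assumption-laden illustration: the `POLICY` table, the role names, the `read_field` function, and the in-memory `AUDIT_LOG` are hypothetical, and a real system would use a durable, tamper-evident audit store.

```python
from datetime import datetime, timezone

# Hypothetical policy: which roles may read which classification of data.
POLICY = {
    "public": {"analyst", "auditor", "admin"},
    "personal": {"admin"},
}

AUDIT_LOG = []  # stand-in for an append-only audit store


def read_field(user, role, record, field_name):
    """Return a field value if the role is permitted; log every attempt."""
    # Fields without an explicit classification are treated as personal.
    classification = record["classification"].get(field_name, "personal")
    allowed = role in POLICY.get(classification, set())
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "field": field_name,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not read {field_name}")
    return record["values"][field_name]


# Illustrative record with field-level classification.
record = {
    "values": {"city": "Zurich", "birthplace": "Basel"},
    "classification": {"city": "public", "birthplace": "personal"},
}

print(read_field("u1", "analyst", record, "city"))  # Zurich
try:
    read_field("u1", "analyst", record, "birthplace")
except PermissionError as exc:
    print("denied:", exc)

# Oversight query: which access attempts were refused?
print([e["field"] for e in AUDIT_LOG if not e["allowed"]])  # ['birthplace']
```

Logging denied attempts as well as successful ones is what makes the log useful for oversight: it lets a reviewer interrogate what was tried, not only what succeeded.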

ALGORITHMIC TRANSPARENCY

• Regulatory regimes (e.g., the GDPR) place restrictions on the use of “automated processing” and call for transparency into any decisions made about data subjects solely as the result of such processing. The data infrastructure must allow oversight authorities to inspect how such decisions are made and also to explain (in lay terms) the underlying “logic” of the algorithm.
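For a simple additive model, explaining the "logic" in lay terms can be as direct as listing how much each input moved the score. The sketch below assumes a log-linear score of the general shape shown on the earlier model slide (a weighted sum of log-transformed features plus a constant); the weights, feature names, and input values are invented for illustration, and more complex models would need other explanation techniques.

```python
import math


def score(features, weights, intercept):
    """Log-linear score: intercept + sum of weight * log(feature value)."""
    return intercept + sum(w * math.log(features[k]) for k, w in weights.items())


def explain(features, weights, intercept):
    """Break a score into per-feature contributions, largest effect first."""
    parts = {k: w * math.log(features[k]) for k, w in weights.items()}
    ranked = sorted(parts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"baseline: {intercept:+.2f}"]
    for name, contribution in ranked:
        direction = "raised" if contribution >= 0 else "lowered"
        lines.append(f"{name} {direction} the score by {abs(contribution):.2f}")
    return parts, lines


# Hypothetical weights and inputs, for illustration only.
weights = {"years_of_education": 0.8, "age": -0.3}
intercept = 0.5
candidate = {"years_of_education": 16, "age": 40}

parts, lines = explain(candidate, weights, intercept)
print("\n".join(lines))
```

Because the contributions sum exactly to the score minus the baseline, the explanation is faithful to the model rather than an approximation of it, which is one argument for preferring interpretable model families where decisions affect data subjects.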


RESPONSIBLE ALGORITHMIC DEVELOPMENT & USE

• Understanding and accounting for bias in data used to develop profiling models
• Excluding sensitive categories from algorithmic evaluations
• Exploring and mitigating risks of sensitive category proxy features
• Selecting an algorithmic approach that:
  • is methodologically appropriate
  • enables necessary controls
  • minimizes algorithmic bias / disparate impact
  • enables the necessary / appropriate degree of introspection, tuning, and oversight
  • lends itself to meaningful assessments of efficacy, accuracy, and regulatory compliance over time
• Considering human-in-the-loop applications to confirm results and provide for direct accountability, esp. where decisions / outcomes may impact data subjects in adverse ways
• Providing for feedback, refinement, and redress when errors are made
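One common way to put a number on "disparate impact" is to compare selection rates across groups. The sketch below is a simplified illustration under stated assumptions: the group labels and outcomes are made up, the group attribute is used only for after-the-fact auditing (not as a model input), and the 0.8 review threshold is an assumed rule-of-thumb value rather than a requirement taken from the slides.

```python
def selection_rates(outcomes):
    """Share of favorable decisions per group.

    `outcomes` is a list of (group, selected) pairs, where `selected` is a bool.
    """
    totals, selected = {}, {}
    for group, was_selected in outcomes:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}


def disparate_impact_ratio(outcomes):
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    rates = selection_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group
    return min(rates.values()) / highest


# Hypothetical audit data: group label and whether the model selected the person.
outcomes = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4
    + [("group_b", True)] * 3 + [("group_b", False)] * 7
)

ratio = disparate_impact_ratio(outcomes)
REVIEW_THRESHOLD = 0.8  # assumed cut-off; choose per policy and jurisdiction
print(f"ratio={ratio:.2f}, needs_review={ratio < REVIEW_THRESHOLD}")
```

A low ratio does not by itself prove unfair treatment, but it is a cheap signal for routing a model to the human review and redress steps listed above.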


QUESTIONS?
