ann cavoukian, ph.d. information and privacy commissioner … · 2013. 5. 15. · big data •each...

15
Ann Cavoukian, Ph.D. Information and Privacy Commissioner Ontario, Canada A Q&A with the Commissioner: Big Data and Privacy Health Research: Big Data, Health Research Yes! Personal Data No!

Upload: others

Post on 04-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

Ann Cavoukian, Ph.D.

Information and Privacy Commissioner

Ontario, Canada

A Q&A with the Commissioner: Big Data and

Privacy Health Research: Big Data, Health

Research –Yes! Personal Data – No!

Page 2: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

THE AGE OF BIG DATA,

OPEN DATA AND PRIVACY

Big Data – Yes

Open Data – Yes

Personal Data - No

Page 3: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

BIG DATA

• Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created in the past 2 years;

• Big data analysis and data analytics promises new opportunities to gain valuable insights and benefits, (e.g., improved load management, better assets management, new programs and services etc.);

• However, it can also enable expanded surveillance, on a scale previously unimaginable;

• This situation cries out for a positive-sum solution, win-win strategy: what is needed is Big Data and Big Privacy!

Page 4: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

PRIVACY BY DESIGN

IN THE AGE OF BIG DATA

www.privacybydesign.ca

• The Big Difference with Big Data;

• “Sensemaking” Systems;

• Privacy by Design in the Age of Big Data;

• The Creation of a Big Data Sensemaking System through PbD.

Page 5: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

DATA MINIMIZATION AND

DE-IDENTIFICATION

Page 6: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

DATA MINIMIZATION

• Data minimization is the most important safeguard in protecting personal health information, including for health research and data analysis;

• Ontario’s PHIPA prohibits health care providers from collecting, using or disclosing personal health information if other information (such as de-identified or anonymized information) will serve the purpose;

• It also prohibits health care providers from collecting, using or disclosing more personal health information than is reasonably necessary to meet the purpose.

Page 7: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

DISPELLING THE MYTHS ABOUT DE-IDENTIFICATION

www.ipc.on.ca/English/Resources/Discussion-Papers/Discussion-Papers-Summary/?id=1084

• The claim that de-identification has no value in protecting privacy due to the ease of re-identification, is a myth;

• If proper de-identification techniques and re-identification risk management procedures are used, re-identification becomes a very difficult task;

• While there may be a residual risk of re-identification, in the vast majority of cases, de-identification will strongly protect the privacy of individuals when additional safeguards are in place.

Page 8: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

DATA DE-IDENTIFICATION TOOL

www.ipc.on.ca/images/Resources/positive-sum-khalid.pdf

• Developed by Dr. Khaled El Emam, Canada Research Chair in Electronic Health Information;

• De-identification tool that minimizes the risk of re-identification based on:

- The low probability of re-identification;

- Whether mitigation controls are in place;

- Motives and capacity of the recipient;

- The extent a breach invades privacy;

• Simultaneously maximizes privacy and data quality while minimizing distortion to the original database.

Page 9: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

EVIDENCE THE TOOL WORKS

• Dr. El Emam was approached to create a longitudinal public use

dataset using his de-identification tool for the purposes of a

global data mining competition – the Heritage Health Prize;

• Participants in the Heritage Health Prize competition were asked

to predict, using de-identified claims data, the number of days

patients would be hospitalized in a subsequent year;

• Before releasing the dataset created using Dr. El Emam’s tool,

the de-identified dataset was subjected to a strong re-

identification attack by a highly skilled expert;

• The expert concluded the dataset could not be re-identified –

Dr. El Emam's de-identification tool was highly successful!

Page 10: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

EVIDENCE THAT RE-IDENTIFICATION IS EXTREMELY DIFFICULT

•A literature search by Dr. El Emam et al. identified 14 published accounts of re-identification attacks on de-identified data;

•A review of these attacks revealed that one quarter of all records and roughly one-third of health records were re-identified;

•However, Dr. El Emam found that only 2 out of the 14 attacks were made on records that had been properly de-identified using existing standards;

•Further, only 1 of the 2 attacks had been made on health data, resulting in a very low re-identification rate of 0.013%.

Page 11: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

DATA MINIMIZATION FOR RECORD LINKAGES

• Dr. El Emam has also developed a protocol for securely linking

databases without sharing any identifying information;

• The protocol uses an encryption system to identify and locate

records relating to an individual, existing in multiple datasets;

• This involves encrypting personal identifiers in each dataset and

comparing only the encrypted identifiers, using mathematical

operations, resulting in a list of matched records, without

revealing any personal identifiers;

• The protocol promotes compliance with existing prohibition in

PHIPA by allowing linkages of datasets without the disclosure of

any identifying information – a win/win solution – positive-sum!

Page 12: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

HOMOMORPHIC ENCRYPTION

• A form of encryption that allows computations to be carried out on encrypted data to obtain an encrypted result;

• Homomorphic describes the transformation of one dataset into another while preserving relationships between data elements in both sets;

• Homomorphic encryption allows you to make computations or engage in data analytics on encrypted values – data you cannot “read” because it is not in plain text, therefore inaccessible;

• May also be used to link two or more databases without the disclosure of any unique identifiers – positive-sum – win/win.

Page 13: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

SECURE DATA ANALYTICS

ON THE CLOUD

• The Value of De-identification;

• Challenges in Re-identifying De-identified Information;

• De-identification in the Context of Privacy Laws;

• Re-identification Risk Assessment.

Page 14: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

CONCLUDING THOUGHTS

• Make privacy a priority – ensure that privacy

is embedded into your systems and operational

processes – into your business practices;

• It is easier and far more cost-effective to build

in privacy up-front, rather than after-the-fact;

• Privacy risks are best managed by proactively

embedding the principles of Privacy by Design;

• Get smart – lead with Privacy – by Design, not

privacy by chance or, worse, Privacy by Disaster!

Page 15: Ann Cavoukian, Ph.D. Information and Privacy Commissioner … · 2013. 5. 15. · BIG DATA •Each day we create 2.5 quintillion bytes of data – 90% of the data today has been created

HOW TO CONTACT US

Ann Cavoukian, Ph.D. Information & Privacy Commissioner of Ontario

2 Bloor Street East, Suite 1400

Toronto, Ontario, Canada

M4W 1A8

Phone: (416) 326-3948 / 1-800-387-0073

Web: www.ipc.on.ca

E-mail: [email protected]

For more information on Privacy by Design, please visit: www.privacybydesign.ca