Open Research Cambridge, December 16 2013: presentation by Fiona Nielsen, DNAdigest

Post on 27-Jan-2015





On December 16th 2013, Fiona Nielsen from DNAdigest gave a presentation at the Open Research meetup in Cambridge. The meetup organiser had invited DNAdigest to participate in a discussion on genomics and data sharing. Keren introduced the evening with a video explaining what a genome is and what it means to have your genome sequenced. Fiona then gave a general presentation on genetic data sharing, covering data sharing for research, patient consent and direct-to-consumer genetic testing. The audience was split roughly 50/50 between researchers and other professions, which led to an insightful discussion of the advantages of genetic research, the potential risks of data sharing, and the high hopes for the impact of genetics on the future of medicine. The Open Research meetup is a meetup group initiated by members of the Open Knowledge Foundation. You can read about their upcoming events on their OKFN web page. Read more about DNAdigest online, and follow the Twitter feed @DNAdigest.


Secure the data – share the knowledge

Open Research Cambridge

Fiona Nielsen, December 16, 2013

Take home messages

• DNA sequencing = exciting new opportunities + new challenges

• Your genome is your data (like your EHR)

• Options for sharing today are limited mostly to all-or-nothing

• Take the opportunity to voice your opinion

My aim with this talk: to give you a little view of how you can share genetic data today, and give you an idea of the challenges involved.

I will end the presentation with a brief introduction to the DNAdigest project, and then open the floor for questions and discussion.

Data is donated to research

Individuals are offered the option to consent (opt in) to their data being used for research that aids the development of diagnostics and treatments for genetic diseases

Genetic data is needed for research into inborn illnesses, hereditary diseases, rare diseases and cancer.

Genome research today

the patient

the researcher

the sample

the data

Direct-to-consumer genetic testing

You can order your own personal genotyping kit online for only $99 from

You can assess your own carrier status for known disease genes before you get pregnant, for example

You can obtain non-invasive pre-natal testing by detecting foetal DNA in the mother's blood, example

You can have your whole genome sequenced for about $7,000

Example: 23andme

Manuel Corpas used direct-to-consumer testing for himself and his family

The “Corpasome”

Family genome and analysis published open access online

• Deceased

• 1M 23andMe v3 SNP chip

• Age: 75

• 1M 23andMe v3 SNP chip

• 15,823,554 HiSeq Exome PE Reads

• Age: 79

• 1M 23andMe v3 SNP chip

• 15,190,489 HiSeq Exome PE Reads

• Age: 51

• 1M 23andMe v3 SNP chip

• 14,123,580 HiSeq Exome PE Reads

• Age: 36

• 0.5M 23andMe v2 SNP chip

• 32,116,828 HiSeq Exome SE Reads

• Metagenomics

The “Corpasome”

Family genome and analysis published open access online as a free resource for research

But what about privacy?

There is large variation between individuals, and we are all unique

This means that your genome sequence can identify you - or your heritage

Similarly, your medical record may contain information that is unique to you

3,000,000,000 bp ~ 3 billion base pairs in the human genome

Consent for research

The head of the research project will create a custom consent form:

- Purpose

- How, when and who

Consent is obtained in the interest of the patient


Contact with patient may be lost after data collection

The institution acts as the custodian of the collected data, and data is locked up

Institutions do not freely share data because revealing entire datasets breaches confidentiality

Data access is restricted, if available at all. If available, access requires a specific application per dataset per project

Results are published

When sufficient data is collected for a project, an analysis is made and a paper may be published to report the results.

Results from confidential data are reported in anonymized form

Anonymization = removal of identifiers

• Name, birthdate, NHS number, town of birth
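As a minimal sketch of what removal of identifiers can look like in practice, the Python snippet below drops the four identifier fields listed above from a record. The field names (`name`, `birthdate`, `nhs_number`, `town_of_birth`) and the sample record are hypothetical, chosen only for illustration. It removes direct identifiers only; it does nothing about the point made earlier that the genetic data itself can identify a person.

```python
# Hypothetical field names for the identifiers listed on the slide.
DIRECT_IDENTIFIERS = {"name", "birthdate", "nhs_number", "town_of_birth"}


def remove_identifiers(record):
    """Return a copy of the record without the direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


# Made-up example record, not real patient data.
record = {
    "name": "Jane Doe",
    "birthdate": "1970-01-01",
    "nhs_number": "000 000 0000",
    "town_of_birth": "Cambridge",
    "diagnosis": "example condition",
    "genotype": {"rs0000001": "AG"},
}

anonymized = remove_identifiers(record)
print(sorted(anonymized))  # ['diagnosis', 'genotype']
```

The original record is left unchanged, so the custodian can keep the identified copy locked up while only the stripped copy is reported.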

Published information

Level of data detail varies per project and per publication.

Rare disease research usually includes family pedigrees and detailed description of the disease symptoms per individual

GWAS usually include only aggregated statistics

Problem for data sharing

Trade-off: details are necessary for data re-use!

Restricted access


• Advantage: access to complete datasets of genetics and medical data

• Disadvantage: time-consuming and slow processing of application for access

• Disadvantage: difficult to discover the data you need

Completely public data

• Advantage: Easy access

• Disadvantage: if medical data is removed = no value for research

• Disadvantage: no guarantee of privacy

• Example: The Personal Genome Project

Limitations of current mechanisms

• Not easy to discover data

• Not easy to apply for access to data

• Not easy to deal with bulk datasets

As a consequence:

• Researchers do not cross-check their results

• Data is not re-used for analysis

• Researchers duplicate existing work

• Results are published based on small sample sizes

What if?

• Every individual would be custodian of their own data?

• What if there would be different ways of sharing data?

• What if you could share just part of your data?

• What if the consent form included options for the level of sharing of the data?

• What if you gave patients the option to share their data with no restrictions?

• What if you could share data in aggregate statistics?

• What if you could share your data today and change your mind tomorrow?
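To make the "what if" questions above more concrete, here is a minimal sketch of a consent record that offers several sharing levels and can be changed at any time. Everything in it (the `SharingLevel` options, the `Consent` class and its method names) is a hypothetical illustration, not an existing consent standard or any project's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class SharingLevel(Enum):
    """Hypothetical sharing options a consent form could offer."""
    NONE = "no sharing"
    AGGREGATE_ONLY = "aggregate statistics only"
    PARTIAL = "selected parts of the data"
    UNRESTRICTED = "no restrictions"


@dataclass
class Consent:
    """One individual's current sharing choice; it can be changed at any time."""
    participant_id: str
    level: SharingLevel = SharingLevel.NONE
    shared_fields: set = field(default_factory=set)  # used only for PARTIAL

    def update(self, level, shared_fields=()):
        """Change the sharing choice, e.g. share today and withdraw tomorrow."""
        self.level = level
        if level is SharingLevel.PARTIAL:
            self.shared_fields = set(shared_fields)
        else:
            self.shared_fields = set()

    def allows_field(self, field_name):
        """Would record-level access to this field be permitted right now?"""
        if self.level is SharingLevel.UNRESTRICTED:
            return True
        if self.level is SharingLevel.PARTIAL:
            return field_name in self.shared_fields
        return False


consent = Consent("participant-1")
consent.update(SharingLevel.PARTIAL, {"genotype"})
print(consent.allows_field("genotype"))  # True
consent.update(SharingLevel.NONE)        # changed their mind
print(consent.allows_field("genotype"))  # False
```

In this sketch the individual, not the institution, holds the switch: any access check consults the current consent record, so a change of mind takes effect for all later requests.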

New approaches

• Crowdsourcing of genetic testing results: #freethedata for breast cancer genes BRCA1 and BRCA2

• Share your 23andme data with OpenSNP

• Control your EHR with PatientsKnowBest

• DNAdigest: Allow sharing of aggregated data to enable discovery and faster access for research

Our mission

To create a self-sustainable platform that supports the widest possible sharing of and access to genomic data in accordance with patient consent.

DNAdigest is designing an ethical data sharing platform

Allowing hypothesis-centered queries, returning anonymised aggregated data by a patented mechanism


Results are delivered as anonymised aggregated statistics
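The general idea of answering a hypothesis-centered query with aggregated statistics can be sketched as follows. This is an illustration only and is not DNAdigest's mechanism, which the slides do not describe. The function name, the record layout and the `min_group_size` threshold are all assumptions made for the example.

```python
def aggregate_query(records, variant, phenotype, min_group_size=5):
    """Count carriers of `variant` among individuals with `phenotype`.

    Returns only aggregate counts, never individual records. Returns None
    when the matching group is smaller than `min_group_size`, a hypothetical
    safeguard against answers that describe very few people.
    """
    group = [r for r in records if phenotype in r["phenotypes"]]
    if len(group) < min_group_size:
        return None
    carriers = sum(1 for r in group if variant in r["variants"])
    return {
        "group_size": len(group),
        "carriers": carriers,
        "carrier_fraction": carriers / len(group),
    }


# Made-up records with placeholder labels, not real data.
records = (
    [{"phenotypes": {"P1"}, "variants": {"V1"}} for _ in range(3)]
    + [{"phenotypes": {"P1"}, "variants": set()} for _ in range(3)]
    + [{"phenotypes": {"P2"}, "variants": {"V1"}} for _ in range(2)]
)

print(aggregate_query(records, "V1", "P1"))
# {'group_size': 6, 'carriers': 3, 'carrier_fraction': 0.5}
print(aggregate_query(records, "V1", "P2"))  # None, group too small
```

A researcher asking "is variant V1 more common among people with phenotype P1?" gets a usable answer without ever seeing a single individual's record.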

Further reading

• “What to consider before undergoing a DNA test”, article in the Wall Street Journal

• Manuel Corpas blog:

• Interview about the DNAdigest project on

• Genetic Privacy Network (launched Dec 2013) resources about the risks and legal issues for US residents

• Anonymization and re-identification: Routes for breaching and protecting genetic privacy by Erlich and Narayanan

Take home messages

• DNA sequencing = exciting new opportunities + new challenges

• Your genome is your data (like your EHR)

• Options for sharing today are limited mostly to all-or-nothing

• Take the opportunity to voice your opinion

• It is question time!

Thanks for listening

And thanks to OpenResCam and Panton Arms for hosting!

DNAdigest is a not-for-profit organisation, founded for the purpose of enabling faster and easier access and sharing of genomic data for research.

Please visit us at

and on Twitter @dnadigest

Thank you!
