Genomic Big Data
Round table: From Big Data to precision medicine
«ERNEST LLUCH» MEETING
Big Data and Real World Data in the analysis of the use, safety and effectiveness of medicines, technologies and health interventions
CUIMPB Barcelona, June 9, 2019
Joaquín Dopazo
Bioinformatics Area, Fundación Progreso y Salud; Functional Genomics Node (INB-ELIXIR-es); Bioinformatics for Rare Diseases (BiER-CIBERER); CDCA, Hospital Virgen del Rocío, Sevilla
http://www.clinbioinfosspa.es
http://www.babelomics.org
@xdopazo, @ClinicalBioinfo
The clinical bioinformatics area
http://www.clinbioinfosspa.es/
The Bioinformatics Area, created in June 2016 at the Fundación Progreso y Salud, has as its main goal to support the Personalized Medicine Program of the Andalusian Community by facilitating the use of genomic data for precision diagnosis and treatment recommendation, and by implementing opportunities for prospective innovation in health care within the public health system.
Data analysis ... and ... exploitation
12 million, since 2001
Mission: facilitating the use of genomic big data in the clinic
The two tiers of personalized medicine: use and re-use of genomic data to generate new knowledge
First tier: use of patient genomic data for precision diagnosis (typically RDs) and treatment recommendation (cancer). Extensively implemented in hospitals. Requires information on gene-to-phenotype association.
Second tier: use of clinical data (eHR) along with genomic data for preventive medicine, biomarker discovery and clinical research. Based on the Andalusian Population Health Database, with over 12M people since 2001. Aim: converting the whole Public Health System (SAS) into a huge prospective clinical study (GDPR compliance within SAS).
Use of genomic data in the public health system requires sustainability
• Use: increasingly DSS, decreasingly research
• User: geneticist (DSS), bioinformatician (research)
• Routine genomic analysis (diagnosis, treatment recommendation) with tools for end users, which involves hiding the complexity of the analysis.
• A solution for the management of genomic data must be integrated in the same way as other analyses of the health system are.
• GDPR compliance and data reusability: genomic data must be stored within the health system, linked to clinical data in the same way as other data are, for further potential prospective clinical studies.
Second tier: exploitation of the clinical Big Data
The population health database: possibly the largest database ever created with detailed clinical data, storing information on 12,083,681 patients since 2001.
Genomic data collected by the SAS are being uploaded onto BPS, and soon digital pathology and liquid biopsy data will be as well. More complex queries, including the genetic basis of traits, will be possible.
[Figure labels: Genomic biomarkers, PADIGA, Liquid biopsy]
Why do doctors wash their hands?
Small-scale big data analysis in the 1840s
Ignaz Philipp Semmelweis (1818-1865)
Observation: doctors' wards had three times the mortality of midwives' wards (30% vs <5%). Hand washing reduced mortality to <1%.
Beyond keeping track of patient histories, health records can collectively tell stories about diseases, processes, etc.
Semmelweis, by looking at the health records, made an interesting observation: giving birth assisted by a doctor was dangerous.
However, exploring clinical data for research can yield new information that goes beyond the purpose for which the data were originally stored.
Clinical big data exploitation requires a digital transformation of health systems. Data digitalization is necessary but not sufficient for big data exploitation. Digitalization has mainly been designed to control processes and to facilitate access to patients' data.
[Figure: Cost of human genome sequencing, in thousands of dollars, on a logarithmic scale from 1E-08 to 1,000,000; time axis from 2001 to 2029]
The future of data generation
$1 genome?
• DNA sequencing prices will soon be comparable to any other conventional test.
• Actually, they already are (if the whole treatment is considered)
Health systems will become the main generators of genomic data in the near future
Radiography
CT
PET-CT
The future of data generation
Panel, WES, WGS
Obstacles: lack of interpretability of most of the findings
WES yields 50-80K variants
WGS yields 1-2M variants
Many variants of unknown significance (VUS)
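To make the interpretability obstacle concrete, here is a minimal sketch of the kind of filtering that reduces a long variant list to a few candidates, some of which remain VUS. The record fields (`gene`, `pop_freq`, `consequence`, `clinical_label`), the 1% frequency cutoff and the toy data are assumptions for illustration only; they are not taken from the talk or from any specific pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str                      # hypothetical gene name
    pop_freq: float                # population allele frequency (assumed annotation)
    consequence: str               # e.g. "missense", "synonymous", "stop_gained"
    clinical_label: Optional[str]  # "pathogenic", "benign", or None when unknown


# Consequence types treated as potentially damaging in this sketch (an assumption).
DAMAGING = {"missense", "stop_gained", "frameshift"}


def candidate_variants(variants: List[Variant], max_freq: float = 0.01) -> List[Variant]:
    """Keep rare, potentially damaging variants that are not known to be benign."""
    return [
        v for v in variants
        if v.pop_freq <= max_freq
        and v.consequence in DAMAGING
        and v.clinical_label != "benign"
    ]


def count_vus(variants: List[Variant]) -> int:
    """Count variants with no clinical label (variants of unknown significance)."""
    return sum(1 for v in variants if v.clinical_label is None)


# Toy data: five hypothetical variants.
toy = [
    Variant("GENE_A", 0.20, "missense", "benign"),
    Variant("GENE_B", 0.0001, "stop_gained", "pathogenic"),
    Variant("GENE_C", 0.002, "missense", None),
    Variant("GENE_D", 0.0005, "synonymous", None),
    Variant("GENE_E", 0.004, "frameshift", "benign"),
]

candidates = candidate_variants(toy)
print([v.gene for v in candidates], "VUS among candidates:", count_vus(candidates))
```

Even after filtering, the remaining VUS cannot be interpreted without additional knowledge, which is the obstacle the slide points to.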
The future of generation of information: diagnostic (RD) or treatment (cancer) biomarkers
About 6,000 rare diseases, over 80% with a genetic cause
In less than 5 years most of the rare variation will be known
Today 1-3 new therapies enter hospitals every quarter
In less than 5 years WES and WGS will increase therapeutic options for patients
RD & cancer are genetic diseases with strong penetrance
AI can help with the generation of information for the interpretation of the findings
Using the information currently available on pathologic variation, the potential pathologic character of a new mutation can be predicted (to some extent) with AI methods, which clearly outperform conventional statistical methods.
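The idea of learning from known pathologic variation to score a new mutation can be sketched with a deliberately simple stand-in for an AI method: a nearest-centroid classifier. The two features (a conservation score and the negative log10 of population frequency), the labelled examples and the function names are all hypothetical; real predictors use far richer features and models.

```python
from math import dist
from statistics import mean


def fit_centroids(examples):
    """examples: list of (features, label) pairs; returns {label: centroid tuple}."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # Average each feature column within every label.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: (conservation score, -log10 population frequency).
known_variants = [
    ((0.90, 4.0), "pathogenic"),
    ((0.80, 5.0), "pathogenic"),
    ((0.95, 4.5), "pathogenic"),
    ((0.20, 1.0), "benign"),
    ((0.30, 2.0), "benign"),
    ((0.10, 1.5), "benign"),
]

centroids = fit_centroids(known_variants)
print(predict(centroids, (0.85, 4.2)))
print(predict(centroids, (0.25, 1.2)))
```

The prediction is only as good as the labelled variation it is trained on, which is why the slide qualifies it with "to some extent".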
• Modular nature of genetic diseases: causative genes for the same or phenotypically similar diseases generally reside in the same biological module.
• Mechanistic models use pathways as maps of protein functional interplay to convert gene measurements into cell functional activity profiles.
• Mechanistic models predict the effect that interventions on the pathway components have on the pathway's functional activity.
• However, the generation of biological knowledge (e.g. pathways) is a slow and artisanal procedure carried out by individual laboratories and requiring a lot of validation.
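The two bullets on mechanistic models can be illustrated with a minimal signal-propagation sketch. The propagation rule used here (node signal = node value × combined activator signal × combined inhibitor damping) is one simple choice assumed for illustration, not necessarily the speaker's model; the four-node pathway, its node names and the expression values are hypothetical.

```python
from math import prod

# Hypothetical pathway: node -> (activators, inhibitors), listed in topological order.
PATHWAY = {
    "R": ([], []),        # receptor, no upstream nodes
    "K": (["R"], []),     # kinase activated by R
    "P": ([], []),        # phosphatase, no upstream nodes
    "T": (["K"], ["P"]),  # effector: activated by K, inhibited by P
}


def propagate(pathway, expression):
    """Convert normalized gene measurements (0..1) into per-node signal values."""
    signal = {}
    for node, (activators, inhibitors) in pathway.items():
        # Nodes without activators receive a full incoming signal of 1.
        if activators:
            activation = 1.0 - prod(1.0 - signal[a] for a in activators)
        else:
            activation = 1.0
        # Each inhibitor damps the signal in proportion to its own signal.
        inhibition = prod(1.0 - signal[i] for i in inhibitors)
        signal[node] = expression[node] * activation * inhibition
    return signal


expression = {"R": 0.8, "K": 0.5, "P": 0.5, "T": 1.0}
baseline = propagate(PATHWAY, expression)

# Simulated intervention: knock out the inhibitor P and recompute the effector signal.
knockout = propagate(PATHWAY, {**expression, "P": 0.0})
print("effector T:", baseline["T"], "->", knockout["T"])
```

In this toy pathway, removing the inhibitor raises the effector signal, which is the kind of intervention effect a mechanistic model is meant to predict.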
Generation of Biological knowledge:
The last frontier: knowledge generation from data using AI
Topol, 2019, Nat. Med.
[Figure: two samples × variables data matrices, labelled "Optimal ML scenario" and "Curse of dimensionality"]
Curse of dimensionality: learning biological knowledge from the data is currently quite complex. New methods for feature selection, dimensionality reduction, multi-view learning and network learning need to be developed.
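A small simulation illustrates why many variables and few samples are a problem: with pure noise, the best correlation found between any variable and the outcome grows as more variables are screened. This is a sketch under assumed settings (10 samples, Gaussian noise, a fixed seed); the function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_spurious_correlation(n_samples, checkpoints, seed=0):
    """Screen random variables against a random outcome.

    Returns {k: largest |correlation| seen among the first k variables}
    for every k in checkpoints. All data are noise, so any strong
    correlation is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best, results = 0.0, {}
    for k in range(1, max(checkpoints) + 1):
        variable = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(variable, outcome)))
        if k in checkpoints:
            results[k] = best
    return results


results = best_spurious_correlation(n_samples=10, checkpoints=[5, 50, 2000])
for k, r in sorted(results.items()):
    print(f"{k:5d} variables screened: best |r| = {r:.2f}")
```

Because the variables are screened cumulatively, the best correlation can only stay the same or increase as more are added, so with thousands of variables and ten samples some noise variable will look like a strong predictor.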
Transition to models that integrate omic and clinical data
[Diagram labels: Genomic, Clinic, Clinical study]
• Processing of genomic data for research purposes (GDPR)
• Principle of minimal use of personal data
• Data pseudonymization
• Each study requires a specific collection of genomic and clinical data into an external database
• Serious security concerns (genomic + clinical data outside the hospital)
• Static clinical data (e.g. if a control becomes a case, the external DB will not be updated)
• Limited reuse of genomic data for purposes different from the original study.
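The data minimization and pseudonymization steps listed above can be sketched as follows. This is a minimal example assuming a keyed hash (HMAC-SHA256) as the pseudonymization technique; the record fields, the key and the function names are hypothetical, and a real deployment would need proper key management and legal review.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient identifier using a keyed hash."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimal_record(record: dict, fields_needed: list, secret_key: bytes) -> dict:
    """Export only the fields a study needs, replacing the identifier by a pseudonym."""
    exported = {field: record[field] for field in fields_needed}
    exported["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return exported


# Hypothetical clinical record and study-specific key (kept inside the health system).
record = {
    "patient_id": "ID-0001",
    "name": "Jane Doe",
    "birth_year": 1970,
    "diagnosis": "example diagnosis",
}
study_key = b"hypothetical-study-key"

exported = minimal_record(record, ["birth_year", "diagnosis"], study_key)
print(sorted(exported))
```

The exported copy is a static snapshot: if the patient's clinical status changes later, the external database does not see it, which is the limitation noted in the list above.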
[Diagram labels: Genome, Clinic, Query engine, Study 1 ... Study n]
• Clinical data dynamically associated with genomic data
• Possibility of many clinical studies by reanalyzing genomic data from diverse perspectives (with no extra investment)
• Growing genomic DB with increasing study possibilities
• The whole health system becomes an enormous potential prospective clinical study
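The difference between a static export and a dynamic association can be sketched with an in-memory SQLite database in which clinical and genomic data are joined at query time. The table layout, patient identifiers, variant names and the helper function are hypothetical and only illustrate the idea of a query engine over linked data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clinical (patient_id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE genomic  (patient_id TEXT, variant TEXT);
    """
)
conn.executemany(
    "INSERT INTO clinical VALUES (?, ?)",
    [("p1", "case"), ("p2", "control"), ("p3", "control")],
)
conn.executemany(
    "INSERT INTO genomic VALUES (?, ?)",
    [("p1", "VAR_X"), ("p2", "VAR_X"), ("p3", "VAR_Y")],
)


def carriers_by_status(connection, variant):
    """Count carriers of a variant per clinical status, joining at query time."""
    rows = connection.execute(
        """
        SELECT c.status, COUNT(*)
        FROM genomic AS g
        JOIN clinical AS c ON c.patient_id = g.patient_id
        WHERE g.variant = ?
        GROUP BY c.status
        """,
        (variant,),
    )
    return dict(rows.fetchall())


before = carriers_by_status(conn, "VAR_X")

# A control later becomes a case: only the clinical table changes.
conn.execute("UPDATE clinical SET status = 'case' WHERE patient_id = 'p2'")
after = carriers_by_status(conn, "VAR_X")

print(before, after)
```

The same genomic rows answer the query differently once the clinical record is updated, without any new data collection.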
Today's information generation possibilities in systems with universal eHR
GDPR (minimal data use)
Possible future models for large-scale data sharing
[Diagram: data-sharing models arranged by risk. Labels: Study 1 (low risk, aggregated data); Genomic; Clinic; Study 1 ... Study n; Federated; External repository]
Examples shown: Nordic countries; MEGA 1 Million genomes; Genomics England Ltd; CIBERER
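The "federated" and "aggregated data" labels can be illustrated with a minimal sketch in which each site answers a query with counts only, so individual-level records never leave the site. The site names, record layout and function names are hypothetical; real federated systems add authentication, auditing and disclosure controls that are not shown here.

```python
def local_count(records, variant):
    """Run at a site: return only aggregate counts, never individual records."""
    carriers = sum(1 for r in records if variant in r["variants"])
    return {"carriers": carriers, "total": len(records)}


def federated_frequency(sites, variant):
    """Run at the coordinator: combine per-site aggregates into a carrier frequency."""
    carriers = total = 0
    for records in sites.values():
        answer = local_count(records, variant)  # stands in for a remote call
        carriers += answer["carriers"]
        total += answer["total"]
    return carriers / total if total else 0.0


# Hypothetical sites, each holding its own individual-level data.
sites = {
    "site_a": [
        {"variants": {"VAR_X"}},
        {"variants": {"VAR_X", "VAR_Y"}},
        {"variants": set()},
        {"variants": {"VAR_Y"}},
    ],
    "site_b": [
        {"variants": {"VAR_X"}},
        {"variants": set()},
        {"variants": set()},
        {"variants": {"VAR_Y"}},
        {"variants": set()},
        {"variants": set()},
    ],
}

print(federated_frequency(sites, "VAR_X"))
```

Only the two numbers per site cross the boundary, which is why aggregated queries sit at the low-risk end of the sharing models.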
Clinical Bioinformatics Area, Fundación Progreso y Salud, Sevilla, Spain, and… the INB-ELIXIR-ES, National Institute of Bioinformatics, and the BiER (CIBERER Network of Centers for Research in Rare Diseases)
Follow us on Twitter: @xdopazo, @ClinicalBioinfo
https://www.slideshare.net/xdopazo/
Genomic data in the clinic: transition from research and discovery to DSS
[Diagram: workflow with steps 1 to 8 and a YES/NO decision. Labels: Sequencing Unit, Bioinfo Unit, Genetics Unit, Corporate eHR, Clinical research, Diagnosis / therapy. Legend: I: patient's information; G: patient's genome; D: high-precision diagnosis; K: knowledge]