www.elixir-europe.org
ELIXIR
Update on ELIXIR - TraIT pilot
Sanne Abeln (VU / ELIXIR Netherlands)
Dylan Spalding (EMBL-EBI)
ELIXIR Human Data Implementation Study 2016
Project / Study | Partners | Comment
ELIXIR / GA4GH Beacons (Leads: Jordi Rambla & Ilkka Lappalainen) | CRG (ES), EMBL-EBI, FR, BE, SE, FI, NL | Implement Beacons in ELIXIR Nodes (12M)
ELIXIR / GA4GH Metadata API (Leads: Michael Baudis & Helen Parkinson) | SIB (CH), EMBL-EBI, FR | Reference implementation of metadata API (12M)
ELIXIR / IMI Oncotrack (Leads: Susanna Repo, David Henderson & Dylan Spalding) | Hub, EMBL-EBI, CRG (ES) | Scoping study on long-term data management (6M)
ELIXIR / TraIT: EGA backend for tranSMART (Leads: Sanne Abeln & Dylan Spalding) | NL, EMBL-EBI, CRG (ES) | Provide EGA connector for data submission/retrieval (12PM) [June 2015-December 2016]
Types of (Big) Data in Health and Disease
Clinical data; imaging; high-throughput molecular profiling; non-high-throughput diagnostics
Size of the (raw) data
DNA sequence data has been doubling every 6-8 months over the last 3 years, and this trend looks set to continue for the rest of this decade.
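To make the doubling-time claim concrete: with a doubling time of T months, data volume grows by a factor of 2^(t/T) after t months. The sketch below simply evaluates that formula for the 6- and 8-month doubling times quoted above over the 3-year window; it is plain arithmetic, not a model of any actual archive's volumes.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Multiplicative growth after `months` if volume doubles every `doubling_time_months`."""
    return 2 ** (months / doubling_time_months)

# 3 years = 36 months, evaluated for both ends of the 6-8 month range on the slide
for doubling in (6, 8):
    print(f"doubling every {doubling} months -> x{growth_factor(36, doubling):.1f} over 3 years")
```

Over 36 months this gives a factor of 64 for a 6-month doubling time and roughly 22.6 for an 8-month doubling time, which is why raw-data storage dominates the infrastructure discussion.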
Privacy & Ethical Issues: Snyderome use case
• triangle indicates average expected risk (given age, sex, etc.)
• yellow line indicates risk given SNV profiles
Conditions shown: high cholesterol, diabetes, skin cancer
Human data: experimental molecular profiling
Molecular Profiling Data (TraIT): raw data, processed data, computational workflows
Aim of pilot: ELIXIR-TraIT
Diagram: raw data, processed data, computational workflows
TraIT
Aim: “The CTMM Translational Research IT (TraIT) project is
developing and implementing a long-lasting IT infrastructure for
translational research projects in the Netherlands that will
facilitate the collection, storage, analysis, and archiving of data
generated in the biomedical research projects.”
• no research
• end-user driven
CTMM-TraIT
Five data generation work packages
Integration and analysis across the platforms
TraIT - Structure
Diagram roles: wet-lab person, tech operator, (bio)informatician, PI / (end) user
Diagram components: participant/patient enters; clinical/biobank procedures (imaging, samples, experiments, questionnaire, EHR); participant interaction; image database & analysis; biobank database; phenotype database; experimental data; external data; data integration; downstream analysis
Outputs: scientific output, intellectual property, improved healthcare
TraIT - Infrastructure
TraIT workflow
Mostly open-source IT tools, supported by international public and private communities
TraIT Experimental data flow
Molecular profiling workflow: from raw data to processed data
TraIT Infrastructure
TraIT WP4: experimental data management and analysis
• Molecular profiling computation pipelines (bioinformatics)
• Galaxy: workflow tool, end-user friendly
• tranSMART: data exploration of clinical data + processed data; study centric; cohort / sample selection
tranSMART: data exploration
TraIT: ongoing effort to make tranSMART ready for processed molecular profiling data
Allows an OncoPrint-like view in tranSMART
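An OncoPrint, the view popularised by cBioPortal, is a genes × samples grid in which each cell marks the alteration (if any) of that gene in that sample. The text-only sketch below uses made-up alteration calls; the symbol scheme and the simple lexicographic sample ordering are illustrative choices, not tranSMART's implementation.

```python
# Hypothetical alteration calls: (gene, sample) -> alteration type
calls = {
    ("TP53", "S1"): "mut", ("TP53", "S3"): "mut",
    ("KRAS", "S1"): "mut", ("MYC", "S2"): "amp",
    ("MYC", "S3"): "amp", ("CDKN2A", "S4"): "del",
}
genes = ["TP53", "KRAS", "MYC", "CDKN2A"]
samples = ["S1", "S2", "S3", "S4"]
SYMBOL = {"mut": "M", "amp": "A", "del": "D", None: "."}  # "." = no alteration

def sort_key(sample):
    # 0 = altered, 1 = not altered; sorting these tuples puts samples altered
    # in the first gene first, then breaks ties on the next gene, and so on
    return tuple(0 if (gene, sample) in calls else 1 for gene in genes)

ordered = sorted(samples, key=sort_key)
print(f"{'':8}" + "".join(f"{s:>3}" for s in ordered))
for gene in genes:
    cells = "".join(f"{SYMBOL[calls.get((gene, s))]:>3}" for s in ordered)
    print(f"{gene:8}{cells}")
```

Sorting samples by their alteration pattern is what makes co-occurring or mutually exclusive alterations visible as blocks in the grid.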
EGA
Open & public archives: European Nucleotide Archive; EVA (European Variation Archive); BioSamples; Database of Genomic Variants archive (structural variants)
Controlled access archive: European Genome-phenome Archive (EGA)
About the EGA: What is controlled access data?
• Human, personally identifiable data types (‘raw’, processed, phenotypic)
• Affiliated to bio-medical research or consortium projects
• Informed consents specify controlled release requirements
• Access by formal application procedure to Data Access Committee (DAC)
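The access rule above can be sketched as a small state machine: a user may download a dataset only after the Data Access Committee responsible for it has approved that user's formal application. This is a minimal sketch assuming a hypothetical in-memory registry; the class, method and dataset names are illustrative and are not the EGA's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class DataAccessCommittee:
    """Hypothetical DAC: tracks applications and approvals per (user, dataset)."""
    name: str
    pending: set = field(default_factory=set)   # applications awaiting review
    approved: set = field(default_factory=set)  # applications granted access

    def apply(self, user: str, dataset: str) -> None:
        # A formal application is the only route to access
        self.pending.add((user, dataset))

    def approve(self, user: str, dataset: str) -> None:
        # Only an application that was actually submitted can be approved
        if (user, dataset) not in self.pending:
            raise ValueError(f"no pending application from {user} for {dataset}")
        self.pending.remove((user, dataset))
        self.approved.add((user, dataset))

    def can_download(self, user: str, dataset: str) -> bool:
        return (user, dataset) in self.approved


dac = DataAccessCommittee("example-dac")
dac.apply("alice", "dataset-1")
print(dac.can_download("alice", "dataset-1"))  # False: application still pending
dac.approve("alice", "dataset-1")
print(dac.can_download("alice", "dataset-1"))  # True
```

Keying approvals on the (user, dataset) pair reflects that consent conditions are attached to datasets, so access to one dataset implies nothing about another.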
• Launched 14th July 2008
• Role: Secure archive for controlled distribution of consented genetic & phenotypic data
• 7500+ users with access; 360+ submission accounts
• Over 3.3PB archived; ~1800 datasets; 1.6PB distributed over the last year
• 200+ contacts to the Helpdesk per month
About the EGA: Context and metrics
About the EGA: Archived data growth
About the EGA: Distributed data volumes
About the EGA: Studies at the EGA
About the EGA: consortia data archived at the EGA
https://icgc.org/
PUBMED: 25529582
https://www.sanger.ac.uk/
http://www.hipsci.org/
http://www.uk10k.org/
http://rd-connect.eu/
http://www.icr.ac.uk/
http://www.wtccc.org.uk/
About the EGA: Accessing data
About the EGA: Distribution model for controlled access
~1800 datasets
EGA Download streamer
The EGA data access life cycle
Source: https://wiki.galaxyproject.org/Events/GCC2015?action=AttachFile&do=get&target=S1_T2_Hoogstrate_TraIT.pdf
Aim of pilot: ELIXIR-TraIT
Diagram components: human data (clinical data; experimental profiling data); processing workflows (sequencing, arrays and proteomics workflows); clinical and processed data in tranSMART; workflows & analysis in Galaxy; raw data storage and archival storage in EGA; links for retrieving raw data, workflow provenance, references to raw data and reference workflows.
Demo: TraIT - Cell Line Use Case
• Non-privacy-sensitive data (cell lines)
• Test and develop workflows
• Demo and training of workflows
1. The user creates cohort subsets in tranSMART.
2. The user explores the data by viewing summary statistics or generating Geneprints.
3. The user can perform tranSMART's built-in analyses.
4. As the built-in analyses allow very little flexibility, the user can export the data to their workstation or to Galaxy. Here there will be options to export raw data to the TraIT Galaxy.
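Steps 1 and 2 amount to filtering samples into cohorts and summarising a variable per cohort. The sketch below shows that logic with the standard library; the records, field names and values are made up for illustration, and in the demo tranSMART does this through its web interface rather than through code like this.

```python
from statistics import mean, median

# Hypothetical per-sample records: cell line id, tissue of origin, expression of one gene
samples = [
    {"id": "CL1", "tissue": "colon", "expr": 5.2},
    {"id": "CL2", "tissue": "colon", "expr": 6.8},
    {"id": "CL3", "tissue": "breast", "expr": 2.1},
    {"id": "CL4", "tissue": "breast", "expr": 3.5},
    {"id": "CL5", "tissue": "colon", "expr": 4.0},
]

def cohort(records, **criteria):
    """Step 1: select the samples matching every field=value criterion."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]

def summary(records, field):
    """Step 2: simple summary statistics for one numeric field."""
    values = [r[field] for r in records]
    return {"n": len(values), "mean": mean(values), "median": median(values)}

for tissue in ("colon", "breast"):
    print(tissue, summary(cohort(samples, tissue=tissue), "expr"))
```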
Link to EGA in tranSMART
F1000Research 2015 - DRAFT ARTICLE (PRE-SUBMISSION)
ced:RNA_seq_CACO2 pred:file_ega_loc ?file .
}
We have the same response:
(rdflib.term.Literal(u'EGAR00001352058', datatype=rdflib.term.URIRef(u'http://www.example.org/ega#file')),)
We replace pred:file_ega_loc with pred:file_surfsara_loc to get the PID of the files in Beehub, and the response is:
(rdflib.term.URIRef(u'http://epic3.storage.surfsara.nl:8001/846/CACO2_file'),)
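The query pattern above fixes a subject and a predicate and asks for the object ?file; swapping the predicate switches from the EGA accession to the Beehub PID. The sketch below mimics that lookup with a plain list of triples instead of rdflib and SPARQL, so it runs without dependencies. The two triples reuse the values from the responses above; the prefixed names are kept as plain strings and are not expanded to full URIs, which a real RDF store would do.

```python
# Triples as (subject, predicate, object); values taken from the query responses above
TRIPLES = [
    ("ced:RNA_seq_CACO2", "pred:file_ega_loc", "EGAR00001352058"),
    ("ced:RNA_seq_CACO2", "pred:file_surfsara_loc",
     "http://epic3.storage.surfsara.nl:8001/846/CACO2_file"),
]

def objects(subject: str, predicate: str) -> list[str]:
    """Return every binding of ?file for the pattern (subject, predicate, ?file)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

print(objects("ced:RNA_seq_CACO2", "pred:file_ega_loc"))       # EGA accession
print(objects("ced:RNA_seq_CACO2", "pred:file_surfsara_loc"))  # Beehub PID
```

Because both locations hang off the same subject, adding a further storage backend would only need one more predicate, not a change to the data model.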
Conclusion
The proposed ontology structure focuses only on the basic properties of the data: the CED and the data files. These can be mapped onto all kinds of data structures and should remain robust to future changes in experimental techniques in biology. We consider the proposed ontology structure flexible and stable enough to be applied to other modules in TraIT. ISA-Tab could then be adopted to make documenting the mapping more user-friendly.
We conclude this paper with a possible user interface design that we have come up with (Figure 11).
Figure 11. A user interface design in tranSMART: processed data can be downloaded directly, because it is easier to handle; raw data files are too large and complex to download, so they are transferred to Galaxy instead, where users can run the analysis in a web interface, which is easier and more intuitive.
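The design in Figure 11 reduces to a simple routing rule: processed data is offered as a direct download, raw data is sent to Galaxy. A minimal sketch, assuming a hypothetical `FileRecord` type whose `kind` field already says whether a file is processed or raw; real tooling would need to derive that from the metadata.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    DOWNLOAD = "download"          # processed data: small and easy to handle locally
    TRANSFER_TO_GALAXY = "galaxy"  # raw data: too large, analysed in Galaxy's web interface

@dataclass
class FileRecord:
    """Hypothetical description of a file listed in tranSMART."""
    name: str
    kind: str  # "processed" or "raw"

def route(record: FileRecord) -> Action:
    if record.kind == "processed":
        return Action.DOWNLOAD
    if record.kind == "raw":
        return Action.TRANSFER_TO_GALAXY
    raise ValueError(f"unknown data kind: {record.kind!r}")

for rec in [FileRecord("expression_matrix.tsv", "processed"),
            FileRecord("sample_R1.fastq.gz", "raw")]:
    print(rec.name, "->", route(rec).value)
```

Raising on an unknown kind, rather than defaulting to a download, avoids accidentally pulling a large raw file to the user's machine.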
Background Information
This paper was written by Chao Zhang under the supervision of Sanne Abeln, Christine Staiger and Jochem Bijlard, as the thesis for Chao Zhang's major internship in the Bioinformatics Master's programme at VU University Amsterdam.
Software Availability
A Python command-line client that wraps the EGA upload APIs has been developed, together with a demo that connects EGA and Beehub with tranSMART and visualizes the RDF graph. The source code of the client (including usage instructions) and of the demo is available at https://github.com/cicocn/TraIT-Demo.
Author Contributions
Competing Interests
No competing interests were disclosed.
Grant Information
Acknowledgements
Besides the authors, many people have contributed valuable ideas and suggestions to this project. They are listed below in no particular order.
The Hyve: Kees van Bochove, Ward Weistra, Ruslan Forostianov, Peter Kok
VU: Bas Stringer, Luiz Bonino (DTL)
TraIT: Remond Fijneman (NKI), Stef van Lieshout (VUmc), Mariska Bierkens (NKI), Freek de Bruijn (NKI), Youri Hoogstrate (EMC), Andrew Stubbs (EMC), Christian Rausch (VUmc), David van Enckevort (UMCG), Gerrit Meijer (NKI), Jan-Willem Boiten (CTMM)
EGA: Justin Paschall, Ilkka Lappalainen, Jordi Rambla
SURFsara: Irene Nooren, Jan Bot
6. The user will be able to import the data into Galaxy to start a custom workflow on the raw data. Through a new connection, the data will be visible in Galaxy.
7. The user will then be able to run an existing workflow or create one from the existing tools.
Workflow input: 2 FASTQ files from paired-end RNA-seq
Workflow output: QC, feature counts, SNPs, indels
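Because the workflow takes two FASTQ files from one paired-end run, a common sanity check before running it is that both files hold the same reads in the same order. A minimal sketch, assuming plain four-line FASTQ records and `/1`, `/2` mate suffixes on read names (an assumption: newer Illumina headers mark the mate differently), using in-memory strings in place of files.

```python
import io

def read_ids(handle):
    """Yield read identifiers from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq.strip()) != len(qual.strip())):
            raise ValueError(f"malformed FASTQ record: {header.strip()!r}")
        # Strip the mate suffix so R1 and R2 yield the same identifier
        name = header[1:].split()[0]
        yield name[:-2] if name.endswith(("/1", "/2")) else name

def check_pair(r1, r2) -> int:
    """Return the number of read pairs, or raise if R1 and R2 do not match up."""
    ids1, ids2 = list(read_ids(r1)), list(read_ids(r2))
    if ids1 != ids2:
        raise ValueError("R1 and R2 reads differ in count or order")
    return len(ids1)

r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCA\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTGC\n+\nIIII\n@read2/2\nAACC\n+\nIIII\n")
print(check_pair(r1, r2))  # 2
```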
Diagram (extended): the same components as the pilot architecture, plus a Sample Catalogue and local storage alongside EGA archival storage.
Try to make a general framework
Today we use: tranSMART, EGA, Galaxy
Tomorrow?
Timelines of ELIXIR TraIT pilot
• June – Dec 2015
• Enabling CLUC upload and download (VU)
• First draft of meta-data scheme (VU)
• Jan 2016 – June 2016
• Galaxy import from EGA (EMC)
• Continued upload of CLUC data (The Hyve)
• June 2016 – Dec 2016
• Finish meta-data scheme (EGA + VU + The Hyve)
• tranSMART connection (VU + The Hyve)
• Continuing EGA support (EBI & Barcelona)
FAIR principles - interoperable
Paradigm shift needed:
• share data
• interoperable data referencing
• less study centric
Beyond the pilot?
• BBMRI (Sample catalogue)
• Rare disease use case
• Galaxy – tranSMART
• integration (?)
• Workflow referencing
• Metadata / FAIR data push, PIDs - EGA
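Interoperable data referencing relies on stable identifiers such as the EGA accession EGAR00001352058 returned in the RDF example. A minimal sketch that validates an EGA accession and reports its object type from the type letter; the prefix table and the "EGA + letter + 11 digits" pattern are assumptions based on public EGA accession conventions and should be checked against current EGA documentation.

```python
import re

# Assumed EGA accession type letters (verify against current EGA documentation)
EGA_TYPES = {
    "S": "study",
    "D": "dataset",
    "F": "file",
    "R": "run",
    "X": "experiment",
    "N": "sample",
    "C": "data access committee",
}

# Assumed shape: "EGA" + one type letter + 11 digits, e.g. EGAR00001352058
ACCESSION = re.compile(r"EGA([A-Z])(\d{11})")

def classify(accession: str) -> str:
    match = ACCESSION.fullmatch(accession)
    if not match:
        raise ValueError(f"not an EGA accession: {accession!r}")
    return EGA_TYPES.get(match.group(1), "unknown type")

print(classify("EGAR00001352058"))  # run
```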
EMBL-EBI
• ‘European Node’
• EMBL-ELIXIR Work Programme agreed, i.e. Ensembl, EGA, ENA, PDBe, Europe PMC
• Lead EXCELERATE Use Cases on Rare Diseases, Human Data, Plant Sciences, Marine, ...
• Involved in several ELIXIR Pilot Actions
ELIXIR Netherlands
• Led by DTL
• Exists as a PPP
• Officially launched in 2014
• Run Bring Your Own Data (BYOD) programme
• Large role in interoperability backbone, i.e. FAIR
ELIXIR Spain
• Long history of national collaboration - INB
• ELIXIR Pilot on EGA as a Joint Venture
• Involvement in ENCODE, ICGC, IRDiRC, RD-Connect, tomato genome consortium, various IMI projects
EGA Team
Paul Flicek, Arcadi Navarro
Helen Parkinson, Dylan Spalding, Jordi Rambla
Angel Carreno, Jeff Almeida-King, Sabela de la Torre, Saif Ur Rehman, Oscar Martinez
Pablo Garcia, Mario Alberich, Alexander Senf, Alfred Gil, Giselle Kerry
Mauricio Moldes, Jag Kandasamy, Audald Lloret
TraIT (WP4)
Remond Fijneman
Sanne Abeln
Stef van Lieshout
Mariska Bierkens
Freek de Bruijn
Thang Pham
Youri Hoogstrate
Saskia Hiltemann
TraIT
Gerrit Meijer
Jan-Willem Boiten
Wim van der Linden
Rita Azevedo
Jeroen Beliën
Pieter Neerincx
Jaap Heringa
The Hyve
Kees van Bochove
Ward Weistra
Ruslan Forostianov
Peter Kok
contact: j.bijlard@vu.nl
TraIT –WP4 (NL)
Remond Fijneman (NKI)
Stef van Lieshout (VUmc)
Mariska Bierkens (NKI)
Freek de Bruijn (NKI)
Youri Hoogstrate (EMC)
Saskia Hiltemann (EMC)
Thang Pham (VUmc)
Andrew Stubbs (EMC)
Bauke Ylstra (VUmc)
Christian Rausch (NKI)
Connie Jimenez (VUmc)
Guido Jenster (EMC)
Pieter Neerincx (UMCG)
The Hyve (NL)
Kees van Bochove
Ward Weistra
Ruslan Forostianov
Peter Kok
Jochem Bijlard
Wibo Pipping
TraIT (NL)
Gerrit Meijer (NKI)
Jan-Willem Boiten (CTMM)
Wim van der Linden (Philips)
Rita Azevedo (eScience / CTMM)
Jeroen Beliën (VUmc)
Jaap Heringa (VU)
ELIXIR – Human Data
Dylan Spalding (EGA, EBI, UK)
Jordi Rambla (EGA, CRG, ES)
Justin Paschall (EGA, EBI, UK)
Alexander Senf (EGA, EBI, UK)
David van Enckevort (BBMRI, UMCG, NL)
Morris Swertz (BBMRI, UMCG, NL)
Luiz Bonino (ELIXIR, DTL, NL)
Susanna Repo (ELIXIR, EBI, UK)
Christine Staiger (SurfSara, NL)
Irene Nooren (SurfSara, NL)
Ilkka Lappalainen (ELIXIR)
Chao Zhang [Cico] (ADS, VU, NL)
@ELIXIREurope /company/elixir-europe
THANK YOU!