tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives
Sherry Cao and Keith Elliston
TranSMART Core
From tool to ecosystem
Kees van Bochove
tranSMART Workshop Amsterdam
June 17, 2013
Today, we have a chance to write history.
• Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data
$$$$$$$$$$$$
• Microarray data analysis support
• Load public microarray data from GEO
• Store and retrieve saved analyses
• Search on gene name, disease name etc.
• Genomic variants and VCF support
• Load TCGA studies we have access to
• Load 1000 Genomes data
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
There has to be a better way.
costs $0!
No-brainer!
Ehm.. wait a minute…
Let’s have a look at how these scientists in academia are doing.
They love to collaborate, right?!
In 2003… (Ancient history; before Facebook)
Yet Another ‘New’ Web-based Solution for the Management of Microarray Data?!
Not Invented Here Syndrome
Image from Rob Hooft, CTO Netherlands Bioinformatics Centre
http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html
What about all these great FP6, FP7, IMI, … projects?
Source code of major projects is readily available on GitHub
But… I’m afraid it’s still up to you and me to put the pieces together.
Phenotype Database
Written in Grails, supports several types of omics data, provides data integration and visualization, has R, Groovy and PHP APIs. Sounds familiar?
http://phenotypefoundation.org
share
reuse
specialize
Writing good software is hard.
So far…
• TranSMART has a huge business potential. It’s no silver bullet though.
• Scientists sometimes have trouble re-using each other’s work, especially when it comes to open source software.
Do they?
Time to look at some success stories.
R and Bioconductor
Who doesn’t love R?
Website looks as if it dates from the Stone Age.
Must be those LaTeX-loving physicists.
Very active community, and… lots of packages.
Governance of the R community
Brian Ripley: “The R Project is governed by a self-perpetuating oligarchy, a group with a lot of power. R was principally developed for the benefit of the core team.”
As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-on-the-r-development-process.html
Galaxy
Galaxy is the most widely used open source bioinformatics web interface AFAIK.
Probably in no small part thanks to their continuous dedication to improving the UI.
But there’s something else.
Galaxy Toolshed
Plone
• An open source CMS (Content Management System) written in Python, nowadays backing thousands of production-grade websites
• Started by 2 developers in 2000, now an active open source project with hundreds of active developers
• In 2004, the Plone Foundation was formed to formalize IP and secure the future of Plone
• Plone Collective has hundreds of plugins
What do all these success stories have in common?
Bioconductor Packages
Galaxy Toolshed
Plone Collective
Drupal Modules
Lessons for tranSMART
TranSMART needs a marketplace and a thriving community to survive.
To get to a functioning marketplace, we need a well-designed core.
There is also another reason.
TranSMART Contributions - Pharma
• Janssen
  – Initial version of tranSMART
  – Genomics viewer using IGV and GenePattern
  – Faceted Search interface (results browsing)
• Millennium
  – Loading TCGA and many GEO studies
  – R interface for interacting with data directly in R
  – Several R analyses available directly in GUI
TranSMART Contributions - Pharma
• Sanofi
  – Cleaner user interface
  – Added metadata layer for all concepts
  – Study/Program categorization & file management
• Pfizer
  – GWAS upload (VCF), data storage and analysis
  – Enhanced data export capabilities
This is a mess.
Another reason why we need that core.
Start the Core: I2B2 Refactoring
1. I2B2 was integrated with tranSMART, but the I2B2 API abstractions were leaked all over the place in the tranSMART application.
2. We agreed in the London meeting that all parties would set aside some time for working on the core.
3. Combined, it made sense to start working on the clinical data API, properly using the I2B2 API where possible, and re-implement all I2B2 functionality in a new ‘core-db’ plugin.
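The idea behind these three points can be sketched in a few lines. This is only an illustration in Python (tranSMART itself is a Grails application), and `ClinicalDataApi`, `CoreDbPlugin`, `concept_children`, `patients_for_concept` and `count_patients` are hypothetical names invented for the sketch, not the actual core API. The assumption shown is that application code talks only to an abstract clinical data API, so storage details stay inside the plugin instead of leaking all over the application.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ClinicalDataApi(ABC):
    """Hypothetical clinical data API: the only thing application code may use."""

    @abstractmethod
    def concept_children(self, concept_path: str) -> list[str]:
        """Return the direct child concepts of a concept path."""

    @abstractmethod
    def patients_for_concept(self, concept_path: str) -> set[int]:
        """Return the ids of patients with an observation for the concept."""


class CoreDbPlugin(ClinicalDataApi):
    """Hypothetical 'core-db' style implementation, backed by in-memory data.

    A real plugin would query the database; that detail is hidden from callers.
    """

    def __init__(self, observations: dict[str, set[int]]):
        # Assumed layout: concept path -> ids of patients observed for it.
        self._observations = observations

    def concept_children(self, concept_path: str) -> list[str]:
        prefix = concept_path.rstrip("/") + "/"
        # A direct child starts with the prefix and has no further separator.
        return sorted(
            path for path in self._observations
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        )

    def patients_for_concept(self, concept_path: str) -> set[int]:
        # Return a copy so callers cannot mutate the backing store.
        return set(self._observations.get(concept_path, set()))


def count_patients(api: ClinicalDataApi, concept_path: str) -> int:
    """Application code: written against the interface, unaware of the backend."""
    return len(api.patients_for_concept(concept_path))
```

Because `count_patients` only sees `ClinicalDataApi`, the in-memory backend could be swapped for a database-backed one without touching application code, and the interface itself is what a test suite would exercise.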
The first version of core-integration was completed in mid-April.
By then, all web service calls that formerly went to an outdated version of the I2B2 Ontology and CRC cells were handled by the newly implemented core-db plugin.
A set of tests was also written in the process, and API documentation was generated.
In the long run, I believe forming a good distributed working group on the core API is a more important deliverable of this workshop than cranking out a stable 1.1 version.
That’s how we write that history.
Kees van Bochove - The Hyve
Current tranSMART Architecture
TranSMART’s Strong Points
• Powerful, ready to go user interface for common analyses (survival analysis, gene expression heatmaps etc.)
• Leverages i2b2 data model for clinical data and offers unified view over different studies
• Uses a lot of good open source technology under the hood (Grails, R, SOLR, Pentaho) leveraging existing community developments
TranSMART Building Blocks
• R: open source statistics package with CRAN, an active repository in which many algorithms and statistical packages are published
• Grails: a rapid application development framework in Groovy leveraging Java technology such as Hibernate, Spring, Quartz
• I2b2: domain specific open source package for storing and querying clinical data
• GenePattern, maybe soon: Galaxy, KNIME?
TranSMART’s Weaknesses
• Large monolithic codebase with little modularization beyond the standard Grails MVC setup
• Code quality is problematic, especially JavaScript
• Test coverage is low: no functional / web tests and few unit and integration tests
• No clear internal APIs, only a service level that does the plumbing
• I2b2 integration violates i2b2 abstractions
tranSMART Plans
• Use a clearly modularized architecture with separation of clinical, high dimensional, search and metadata storage; workflow execution engines and knowledge repository
• Define clear API and rewrite current implementations with good test coverage
• Use i2b2 data model, re-harmonize with latest i2b2 APIs, and don’t use i2b2 binaries directly
• Separate analysis definitions and abstract from workflow execution engine
http://prezi.com/t6twshyctdsk/transmart-core-refactoring
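The last plan item, separating analysis definitions from the workflow execution engine, could look roughly like this. It is a minimal Python sketch under stated assumptions: `AnalysisDefinition`, `ExecutionEngine`, `LocalEngine` and `MEAN_DIFFERENCE` are hypothetical names made up for illustration, and the in-process engine merely stands in for whatever engine would actually execute the work.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Callable, Protocol


@dataclass(frozen=True)
class AnalysisDefinition:
    """Declarative description of an analysis: what to run, not how or where."""
    name: str
    inputs: tuple[str, ...]        # names of the data columns the analysis needs
    compute: Callable[..., float]  # function applied to those columns, in order


class ExecutionEngine(Protocol):
    """Anything that can execute an analysis definition on some data."""

    def run(self, analysis: AnalysisDefinition,
            data: dict[str, list[float]]) -> float: ...


class LocalEngine:
    """Hypothetical engine that runs the analysis in-process."""

    def run(self, analysis: AnalysisDefinition,
            data: dict[str, list[float]]) -> float:
        # Fail early if the definition asks for columns the data lacks.
        missing = [name for name in analysis.inputs if name not in data]
        if missing:
            raise KeyError(f"missing input columns: {missing}")
        return analysis.compute(*(data[name] for name in analysis.inputs))


# The definition knows nothing about the engine that will execute it.
MEAN_DIFFERENCE = AnalysisDefinition(
    name="mean difference",
    inputs=("treated", "control"),
    compute=lambda treated, control: mean(treated) - mean(control),
)
```

Calling `LocalEngine().run(MEAN_DIFFERENCE, {"treated": [3.0, 5.0], "control": [1.0, 2.0]})` evaluates the definition locally; a different engine satisfying the same `run` signature could execute the identical definition elsewhere.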
Kees van Bochove - The Hyve
Target tranSMART Architecture
Further reading
• Description of core API efforts: http://thehyve.nl/rewiring-transmart
• In depth description of i2b2 refactoring: http://thehyve.nl/inital-work-on-transmarts-core
• Overview of tranSMART Core API so far: http://thehyve.github.io/transmart-core-api/
• Example of continuous integration test suite (of core-db): https://ci.ctmmtrait.nl/browse/TM-COREDB-JOB1-51/test