Interoperation between InterMines
Legume Federation, June 22, 2015
Vivek Krishnakumar, Chris Town
J. Craig Venter Institute
InterMine in a nutshell
• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API
Open-source Project
• Source code available online
• Distributed under the GNU LGPL license
• GitHub Repo: https://github.com/intermine/intermine
• GitHub Organization: https://github.com/intermine
Repository contents (intermine / intermine): bio, biotestmine, config, flymine, humanmine, imbuild, intermine, testmodel, .gitignore, .travis.yml, LICENSE, LICENSE.LIBS, README.md, RELEASE_NOTES
Richard N. Smith et al. Bioinformatics 2012;28:3163-3165
InterMine system architecture
Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and IM web-services
Web Server
• Tomcat 7.0.x, serves the Web application ARchive (WAR) file
• ant-based build system using Java SDK
Database Server
• PostgreSQL 9.2 or above
• range query, btree, gist enabled (refer to the docs: http://intermine.readthedocs.org/en/latest/system-requirements/)
Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472
InterMine web services
http://iodocs.labs.intermine.org
JBrowse
Federated Authentication
• Apart from the standard login scheme (username/password), InterMine supports industry standard OAuth2 based login flows, implemented by Google, GitHub, Agave, etc.
• ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure
• Documentation available here: http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
Interoperability?
• Ability of InterMine instances to communicate ‘automatically’ with each other
• By way of leveraging web services
• Questions to be answered:
  • What do they say to each other?
  • How do they say it?
  • What mechanisms are used?
  • Enabling these mechanisms…
Data Model
• Data Model === Schema of InterMine instance
• Defined in XML format
• Core data model (based on the Sequence Ontology, SO) can be extended to suit requirements
• Access a mine's data model in JSON format:
http://MINE_URL/service/model/?format=json
• Compatibility of data models across mines ensures interoperability
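A minimal sketch of how model compatibility between two mines could be checked, using the model endpoint shown above. It assumes the JSON response wraps the model in a `model` key whose `classes` entries carry `attributes`, `references` and `collections` dictionaries; that layout, and the helper names `fetch_model`, `class_fields` and `shared_paths`, are assumptions made for illustration and are not taken from the slides.

```python
import json
from urllib.request import urlopen


def fetch_model(mine_url):
    """Download a mine's data model as a dict (performs a network request)."""
    # Assumption: the response body looks like {"model": {"classes": {...}}}.
    with urlopen(mine_url.rstrip("/") + "/service/model/?format=json") as resp:
        return json.load(resp)["model"]


def class_fields(model):
    """Map each class name to the set of its attribute, reference and collection names."""
    fields = {}
    for name, cls in model.get("classes", {}).items():
        names = set()
        for kind in ("attributes", "references", "collections"):
            names.update(cls.get(kind, {}))
        fields[name] = names
    return fields


def shared_paths(model_a, model_b):
    """Return the sorted 'Class.field' paths that exist in both models."""
    a, b = class_fields(model_a), class_fields(model_b)
    return sorted(
        f"{cls}.{field}"
        for cls in a.keys() & b.keys()
        for field in a[cls] & b[cls]
    )
```

Calling `shared_paths(fetch_model(url_a), fetch_model(url_b))` for two mines would list the paths a query can rely on in both; anything outside that list is a candidate incompatibility.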
Advantages of common data model
• Data mining scripts developed for one mine are immediately compatible with others
• Promotes crowdsourcing: tools/widgets/parsers written by one or more groups can be easily reused by others
• Enables cross species analysis
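To illustrate the first point, here is a hedged sketch of one query definition being pointed at several mines. It assumes each mine exposes the query results web service at `/service/query/results` and that `Gene.primaryIdentifier`, `Gene.symbol` and `Gene.organism.name` exist in every target model; the mine URLs are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_path_query(organism):
    """Build a PathQuery XML string selecting genes for one organism."""
    views = "Gene.primaryIdentifier Gene.symbol"
    # quoteattr() escapes the value and adds the surrounding quotes.
    return (
        f'<query model="genomic" view="{views}">'
        f'<constraint path="Gene.organism.name" op="=" value={quoteattr(organism)}/>'
        "</query>"
    )


def query_url(mine_url, query_xml, size=10):
    """Return the request URL that would run the query against one mine."""
    params = urlencode({"query": query_xml, "format": "json", "size": size})
    return f"{mine_url.rstrip('/')}/service/query/results?{params}"


# Placeholder mine URLs (assumptions); only the organism name changes per mine.
MINES = {
    "http://MINE_A_URL": "Arabidopsis thaliana",
    "http://MINE_B_URL": "Medicago truncatula",
}

if __name__ == "__main__":
    for mine_url, organism in MINES.items():
        print(query_url(mine_url, build_path_query(organism)))
```

The script only prints the URLs; fetching them is left out so the sketch runs without network access. The point is that nothing in the query itself is mine-specific once the models agree.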
Available tools
• Multi-mine search tool: https://github.com/alexkalderimis/multimine-search-tool
  • Based on InterMine Lucene-based search index
  • Allows for interoperation when data models are different
• Integration based on:
  • Homologs
  • Ontology integration using `dagify`: https://github.com/intermine/dagify
  • Pathway integration by way of collating shared pathways
• InterMine Staircase: http://staircase.herokuapp.com
  • Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services
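The idea behind a multi-mine keyword search can be sketched as follows. This is not the code of the multimine-search-tool; it is an illustration that assumes each mine offers a keyword search web service at `/service/search` accepting a `q` parameter and returning a JSON `results` list whose items have `type`, `relevance` and `fields` keys. Those details and the function names are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_mine(mine_url, term):
    """Run a keyword search against one mine (performs a network request)."""
    params = urlencode({"q": term, "format": "json"})
    with urlopen(f"{mine_url.rstrip('/')}/service/search?{params}") as resp:
        return json.load(resp).get("results", [])


def merge_hits(hits_by_mine):
    """Flatten per-mine hit lists into one list, highest relevance first."""
    merged = []
    for mine, hits in hits_by_mine.items():
        for hit in hits:
            merged.append({
                "mine": mine,
                "type": hit.get("type"),
                "relevance": hit.get("relevance", 0.0),
                "fields": hit.get("fields", {}),
            })
    merged.sort(key=lambda h: h["relevance"], reverse=True)
    return merged
```

Because the search runs against each mine's own index and only the hit lists are combined, this approach does not require the data models to match. Relevance scores from separate indexes are not strictly comparable, so the merged ordering is only a rough ranking.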
InterMine Staircase
• Configure access to multiple mines
• Cross-mine search
• Filter results by facets
• Prepare and enrich lists
• Perform mine-to-mine list conversions
• App/tool compatibility
• Application model: MedicMine, SoyMine, ....
Available Reference Mines
• ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
  • Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
  • Leverages both data warehousing and federation methods
  • Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
• MedicMine: https://github.com/jcvi-plant-genomics/intermine/
  • Warehouse for Medicago truncatula A17 genomic data
  • Houses a variety of data: genes, proteins, function, expression
• PhytoMine: https://github.com/JoeCarlson/intermine/
  • Warehouse for 47 different Angiosperm genomes
  • Developed on a Chado to InterMine migration path
  • Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
• FlyMine: https://github.com/intermine/intermine/
Recommendations and Challenges
• Recommendations:
  • Develop core plant InterMine model
  • Follow InterMine guidelines
  • Learn from prior initiatives: InterMOD
• Challenges:
  • Users/developers are used to the current way of doing things
  • Time taken to adapt to a common data model and/or software stack
  • Difficult to arrive at consensus with a diverse group
Acknowledgments
• InterMine Team: Gos Micklem, Julie Sullivan, Alex Kalderimis, Richard Smith, Sergio Contrino, Josh Heimbach et al.
• Araport Team: Chris Town, Jason Miller, Matt Vaughn, Maria Kim, Svetlana Karamycheva, Erik Ferlanti, Chia-Yi Cheng, Benjamin Rosen, Irina Belyaeva