The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences
iPlant presentation given at NESCent in June 2012 for the participants of Phylotastic
Naim Matasci
BIO5 / The iPlant Collaborative
What is iPlant?
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
Problem 1: Data Volume
• Cost of analysis follows Moore's Law:
  – 1 student with 1 computer to analyze 1 Mb of data produced in 2001
  – 200 students and 200 computers to analyze all data produced for the same cost today (10 Gb)
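The arithmetic behind this comparison can be spelled out in a few lines. This is only a sketch of one reading of the slide's figures: the decimal conversion (1 Gb = 1000 Mb) and the names `MB_PER_GB` and `workload_comparison` are assumptions made for illustration, not part of the presentation.

```python
# Figures quoted on the slide. The decimal unit conversion is an assumption.
MB_PER_GB = 1000

def workload_comparison(data_then_mb, workers_then, data_now_mb, workers_now):
    """Compare total data growth with the data each analyst must handle."""
    data_growth = data_now_mb / data_then_mb          # growth in total data
    load_then = data_then_mb / workers_then           # Mb per analyst, then
    load_now = data_now_mb / workers_now              # Mb per analyst, now
    return {
        "data_growth": data_growth,
        "load_now_mb": load_now,
        "load_increase": load_now / load_then,
    }

# 2001: 1 student, 1 Mb. Today: 200 students, 10 Gb.
result = workload_comparison(1, 1, 10 * MB_PER_GB, 200)

if __name__ == "__main__":
    for key, value in result.items():
        print(f"{key}: {value:g}")
```

Under these assumptions the data grew 10,000-fold while the affordable workforce grew 200-fold, so each analyst still faces about 50 times more data than in 2001.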
Problem 2: Fragmented Analytical Landscape
1. Tools separated by compute platform, data format, integration issues, and programming model
2. Mixture of desktop, command line, database, and web-based tools
3. Labor-intensive, fragile solutions devised to reach scientific objectives
4. Little ability to share results and analytical methods
5. Lack of reproducibility
Scalability
ABI 3730 DNA Analyzer, Illumina Genome Analyzer, Joe Felsenstein ca. 1980, Ranger Cluster at TACC
Major Ways to Access iPlant
• Storing and sharing data large and small: iPlant Data Storage
• Integrated web-based analysis: The Discovery Environment
• Cloud computing: Atmosphere
• Applications: TNRS, TreeViewer, PhytoBisque, etc.
• Scientific networking, knowledgebase and information exchange: My-Plant.org
• Educational tools: DNASubway
• Embedding iPlant CI capabilities into software: The Foundation API
• High Performance Computing for experts: TeraGrid/XSEDE
Why is the tree of life important?
“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”
Nothing in biology makes sense except in the light of evolution.
T. G. Dobzhansky
C3 to C4 Photosynthesis
Xin-Guang et al. 2008
"We combined geospatial and molecular sequence data from two public archives to produce a 1,230-taxon phylogeny of the grasses with accompanying climate data for all species, extracted from more than 1.1 million herbarium specimens."
Edwards and Smith, 2010
"Here we show that grasses are ancestrally a warm-adapted clade and that C4 evolution was not correlated with shifts between temperate and tropical biomes. Instead, 18 of 20 inferred C4 origins were correlated with marked reductions in mean annual precipitation."
New Possibilities
Illumina Genome Analyzer, Ranger Cluster at TACC, Acer phylogeny (Ackerly 2009), Green Plant ToL
Just Ask
Atmosphere
iPlant's APIs – The Foundation API
Service Endpoint   Role
IO                 File storage, retrieval and management. Database interoperability
DATA               File format conversion
APPS               Registration and discovery of HPC applications
JOB                Submission and management of compute jobs
SYSTEMS            Availability and info about XSEDE hosts
PROFILE            User profile discovery
AUTH               Token-based secure authentication
POSTIT             URL shortener
Consumer Applications
iPlant Data Store
Dramatization: Not the actual iPlant Data Store
Overview of the iPlant Data Store
Some important items we won’t see in the demo
Source            Destination   Copy Method   Time (seconds)
CD                My Computer   cp            320
Berkeley Server   My Computer   scp           150
External Drive    My Computer   cp            36
USB2.0 Flash      My Computer   cp            30
iDS               My Computer   iget          18
My Computer       My Computer   cp            15
Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley
100 GB: 29 min 15 s
1 GB per 17.5 seconds
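The two transfer figures agree with each other, as a quick calculation shows. The sketch below converts the 100 GB timing into seconds per GB and an approximate throughput, and expresses the copy times from the table relative to iget. Decimal units (1 GB = 1000 MB) and the helper names `transfer_rate` and `slowdown_vs` are assumptions made for illustration.

```python
# Figures quoted on the slide. Decimal units (1 GB = 1000 MB) are an assumption.
SIZE_GB = 100
ELAPSED_S = 29 * 60 + 15  # 29 min 15 s expressed in seconds

def transfer_rate(size_gb, elapsed_s):
    """Return (seconds per GB, MB per second, Mbit per second)."""
    seconds_per_gb = elapsed_s / size_gb
    mb_per_s = size_gb * 1000 / elapsed_s
    return seconds_per_gb, mb_per_s, mb_per_s * 8

# Copy times in seconds from the table, keyed by source.
COPY_TIMES_S = {
    "CD": 320,
    "Berkeley Server": 150,
    "External Drive": 36,
    "USB2.0 Flash": 30,
    "iDS": 18,
    "My Computer": 15,
}

def slowdown_vs(baseline, times=COPY_TIMES_S):
    """Express every copy time as a multiple of the baseline source's time."""
    base = times[baseline]
    return {source: t / base for source, t in times.items()}

if __name__ == "__main__":
    per_gb, mb_s, mbit_s = transfer_rate(SIZE_GB, ELAPSED_S)
    print(f"{per_gb:.2f} s/GB, {mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")
    for source, factor in slowdown_vs("iDS").items():
        print(f"{source}: {factor:.2f}x the iget time")
```

The 100 GB run works out to 17.55 seconds per GB, which matches the rounded 17.5 seconds quoted on the slide.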
Tree Visualization
• > 500K taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata-driven display of information
iPlant Tree Viewer
http://portnoy.iplantcollaborative.org/
LIVE TREE VIEW DEMO
Obstacles
Number of taxa
Taxa names
Taxonomic uncertainty
1. Non-existent names
   • Misspellings
   • Contamination
   • Annotations
   • Morphospecies
   • Digitization issues (frame shifts, character encoding)
   • Lexical variants (digitization conventions)
2. Synonymy
   • Nomenclatural synonyms
   • Taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications
a) Centaurium curvistamineum (Wittr.) Abrams (1951)
b) Centaurium minimum (Howell) Piper (1915)
c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)
e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)
f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
g) Erythraea curvistaminea Wittr. (1886)
h) Erythraea minima Howell (1901)
i) Erythraea muhlenbergii Griseb. (1839)
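To illustrate the misspelling problem in particular, here is a minimal sketch of fuzzy name matching with Python's standard `difflib` module. It is not how TNRS or any iPlant service works. The reference list simply reuses the binomials from the names above with authorship stripped, and the `normalize` and `resolve` helpers and the 0.85 cutoff are assumptions chosen for illustration.

```python
import difflib

# Illustrative reference list: binomials from the names above, authors removed.
REFERENCE_NAMES = [
    "Centaurium curvistamineum",
    "Centaurium minimum",
    "Centaurium muhlenbergii",
    "Centaurodes muhlenbergii",
    "Erythraea curvistaminea",
    "Erythraea minima",
    "Erythraea muhlenbergii",
]

def normalize(name):
    """Collapse whitespace, capitalize the genus, lowercase the rest."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])

def resolve(name, cutoff=0.85):
    """Return the closest reference name, or None if nothing is similar enough."""
    cleaned = normalize(name)
    if cleaned in REFERENCE_NAMES:
        return cleaned
    matches = difflib.get_close_matches(cleaned, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

if __name__ == "__main__":
    for query in ["centaurium  muhlenbergi", "Erythrea minima", "Quercus alba"]:
        print(f"{query!r} -> {resolve(query)}")
```

String similarity only addresses lexical variants and misspellings. Deciding which of several correctly spelled synonyms is the accepted name requires a curated taxonomic source, which this sketch does not attempt.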
Image: Gordon Leppig & Andrea J. Pickart
Request Tool Installation
Apps -> Create -> New App
Create New -> Request Tool Installation
Fill out forms and submit.
Receive response in 2-5 days.