The iPlant Collaborative Community Cyberinfrastructure for Life Science
Arthropod Genomics Research in ARS Workshop
Jason Williams / @JasonWilliamsNY
Cold Spring Harbor Laboratory, iPlant
Goals for today’s talk
• Begin the process of adopting/adapting iPlant to build your own community capacity
• Learn about what you hope iPlant may be able to offer
• Highlight existing capabilities of the platform
• Explain some of the context and rationale behind iPlant
The iPlant Collaborative
Vision
Enable life science researchers and educators to use and extend iPlant's foundational cyberinfrastructure to understand and ultimately predict the complexity of biological systems and their dynamic nature under various environmental conditions.
What is Cyberinfrastructure?
• Platforms, tools, datasets
• Storage and compute
• Training and support
What problems can iPlant solve?
• Crops and model plant systems
• Animal and livestock
• Agronomic microbes, insects…
iPlant is built for Data
How was iPlant built?
Landscape of community-identified priorities
Genomic data and analysis:
• Reference-guided assembly
• De novo assembly
• RNA-Seq (expression; gene/isoform discovery)
• Variant calling
• Genome/transcriptome annotation
• ChIP-Seq/integration of epigenetic information
• Multiple sequencing platforms
• New and evolving technologies
Figure: community-identified priorities grouped as Genotypic, Environmental, and Phenotypic. Topics: Comparative Genomics; Sequencing & Assembly; Annotation; Environmental datasets; Climate model products; Image-based Phenotyping; Molecular Phenotyping; Trait Data; Evolutionary Models; Ecological Models; Association Studies; Pathway Analysis. Legend: In planning, In progress, Foundation in place.
iPlant is a collaborative virtual organization
Who makes up iPlant?
How is iPlant funded?
Funded by NSF
• First funding ($50 million) in 2008
• Renewal funding ($50.3 million) in 2013
o Scientific Advisory Board
o Focus on genotype-phenotype science
o NSF recommended expansion of scope beyond plants
Figure: technology transitions over ~20 years (ultracentrifuge – electrophoresis; cycle sequencing – HTS), labeled Technology… Transition… Enablement…
What a unified platform gets you
• Ability to access and manage data
• Software to analyze data
• Computing resources
• Skills and help to use software and interpret results
Get Science Done
• Metadata management
• Ability to share data and workflows
• Open-source, sustainable tools
Reproducibility
• High-performance and scalable computing
• Ability to automate and collaborate
• Funding spent on science, not software or hardware
Productivity
Support for a diverse user base
Bioinformatics Users:
• Easy-to-use tools/interfaces (little or no command-line)
• Generous data storage, end-to-end workflows
• Access to training and support
Bioinformaticians:
• (More) access to HPC
• Make tools and algorithms more accessible to users
• Better ways to manage large-project metadata
Bioinformatics Engineers (community/core support):
• Ways to scale support for community or institutional users
• Optimization of software
• Shared data storage and user portals
Products
What do you get with your account?
• We strive to be the CI Lego blocks
• Danish 'leg godt': 'play well'
• Also translates as 'I put together' in Latin
• If a solution is not available, you can craft your own using iPlant CI components
iPlant Data Store
Initial 100 GB allocation – TB allocations available
Automatic data backup
Easy upload/download and sharing
The resources you need to share and manage data with your lab, colleagues and community
Discovery Environment
Hundreds of bioinformatics Apps in an easy-to-use interface
A platform that can run almost any bioinformatics application
Seamlessly integrated with data and high performance computing
User extensible – add your own applications
Atmosphere
Cloud computing for the life sciences
Simple: One-click access to more than 200 virtual machine images
Flexible: Fully customize your software setup
Powerful: Integrated with iPlant computing and data resources
Science APIs
Fully customize iPlant resources
Science-as-a-service platform
Define your own compute and storage resources (local and iPlant)
Build your own app store of scientific codes and workflows
DNA Subway
Educational workflows for Genomes, DNA Barcoding, RNA-Seq
Commonly used bioinformatics tools in streamlined workflows
Teach important concepts in biology and bioinformatics
Inquiry-based experiments for novel discovery and publication of data
Bisque
Image analysis, management, and metadata
Secure image storage, analysis, and data management
Integrate existing applications or create new ones
Custom visualization and image handling routines and APIs
Genome Assembly and Annotation
Annotation of the loblolly pine megagenome (Jill Wegrzyn)
• 20.15 Gb assembly split into 40 jobs
• 216 CPUs/job (8,640 CPUs total)
• 17 hours
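The job split above is simple arithmetic. The sketch below reproduces the total CPU count and derives two figures the slide does not state: allocated CPU-hours and the average amount of sequence handled per job. It assumes all 40 jobs ran concurrently for the full 17 hours and that the assembly was divided evenly; neither is confirmed on the slide, and `run_summary` is a hypothetical helper.

```python
# Back-of-envelope arithmetic for the loblolly pine annotation run.
# Inputs are the figures quoted on the slide.

ASSEMBLY_GB = 20.15   # assembly size in gigabases
JOBS = 40             # number of annotation jobs
CPUS_PER_JOB = 216    # CPUs allocated to each job
WALL_HOURS = 17       # reported wall-clock time


def run_summary(assembly_gb, jobs, cpus_per_job, wall_hours):
    """Return (total CPUs, allocated CPU-hours, average Mb per job).

    Assumptions (not stated on the slide): all jobs ran concurrently for the
    whole wall-clock time, and the assembly was split into equal parts.
    """
    total_cpus = jobs * cpus_per_job
    cpu_hours = total_cpus * wall_hours          # allocated, not necessarily busy
    mb_per_job = assembly_gb * 1000 / jobs       # even split assumed
    return total_cpus, cpu_hours, mb_per_job


if __name__ == "__main__":
    total, cpu_h, per_job = run_summary(ASSEMBLY_GB, JOBS, CPUS_PER_JOB, WALL_HOURS)
    print(f"total CPUs: {total}")
    print(f"allocated CPU-hours: {cpu_h}")
    print(f"about {per_job:.0f} Mb of sequence per job")
```

The total matches the slide's 8,640 CPUs; the CPU-hour figure is an upper bound on what was actually consumed.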
22,656 CPU cores on 1,888 nodes
Genome assembly               Size (Mb)   CPUs   Run time
Arabidopsis thaliana TAIR10   120         600    2:44
Arabidopsis thaliana TAIR10   120         1500   1:27
Zea mays RefGen_v2            2067        2172   2:53
TACC Lonestar Supercomputer
Campbell et al. Plant Physiology. December 4, 2013, DOI:10.1104/pp.113.230144
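The two Arabidopsis rows in the table differ only in CPU count, so together they give a rough strong-scaling data point. The sketch below computes speedup and parallel efficiency from them. It assumes the run times are written as H:MM; because only the ratio of the two times is used, the result is the same if the format is actually M:SS. `to_minutes` and `strong_scaling` are hypothetical helpers, not part of any iPlant tool.

```python
def to_minutes(run_time: str) -> int:
    """Convert an 'H:MM' string to total minutes (H:MM format is an assumption)."""
    hours, minutes = run_time.split(":")
    return int(hours) * 60 + int(minutes)


def strong_scaling(cpus_a: int, time_a: str, cpus_b: int, time_b: str):
    """Speedup and parallel efficiency going from cpus_a to cpus_b on the same input.

    speedup    = time_a / time_b
    efficiency = speedup / (cpus_b / cpus_a)   (1.0 would be perfect scaling)
    """
    speedup = to_minutes(time_a) / to_minutes(time_b)
    ideal_speedup = cpus_b / cpus_a
    return speedup, speedup / ideal_speedup


if __name__ == "__main__":
    # Arabidopsis thaliana TAIR10 rows from the table: 600 vs 1500 CPUs.
    s, e = strong_scaling(600, "2:44", 1500, "1:27")
    print(f"speedup {s:.2f}x, parallel efficiency {e:.0%}")
```

With these inputs the speedup is about 1.89 against an ideal 2.5, or roughly 75% parallel efficiency.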
An Evolving Data Commons
Figure: cycle of specimen collection, analysis, project creation, publication, and data discovery and re-use
Challenge: Transform existing datasets to support custom queries
Leveraging the iPlant Data Store and iRODS
Collaborating with us
• “Powered by iPlant” covers a variety of ways of using the iPlant infrastructure underneath another application that communicates with users, usually one outside the iPlant project.
• Other major projects have adopted the iPlant CI as their underlying infrastructure (some completely, some in limited ways – more on this later).
Example “Powered by iPlant” Impact
CoGE usage and user count after federation and interoperability with iPlant
Extended Support
Make bioinformatics tools better
• We find example after example of codes that get well below 0.01% of peak on a single core
• By the end of the year, it will be difficult to get a server below 20 cores.
• There is little sympathy for data/computing challenges when the software is willing to ignore at least 95-99.99% of available performance
D. Stanzione, Director, TACC
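One plausible reading of the 95 to 99.99% range in the quote (an interpretation, not stated on the slide): a serial code running at full single-core speed on a 20-core server leaves 19 of 20 cores idle, i.e. 95% of the machine unused, while a code reaching only 0.01% of peak on its core leaves 99.99% of that core's performance unused. The hypothetical `fraction_ignored` helper below makes that arithmetic explicit, assuming identical cores.

```python
def fraction_ignored(fraction_of_core_peak: float, cores_used: int, cores_available: int) -> float:
    """Return the fraction of a server's peak performance left unused.

    Assumes the code reaches `fraction_of_core_peak` (between 0 and 1) of peak
    on each of `cores_used` cores, out of `cores_available` identical cores.
    """
    if not 0 <= fraction_of_core_peak <= 1:
        raise ValueError("fraction_of_core_peak must be between 0 and 1")
    if not 0 < cores_used <= cores_available:
        raise ValueError("need 0 < cores_used <= cores_available")
    used = fraction_of_core_peak * cores_used / cores_available
    return 1.0 - used


if __name__ == "__main__":
    # Serial code at full single-core speed on a 20-core server.
    print(f"{fraction_ignored(1.0, 1, 20):.2%} ignored")
    # Code at 0.01% of peak, measured against its single core.
    print(f"{fraction_ignored(0.0001, 1, 1):.2%} ignored")
    # Both effects together on a 20-core server.
    print(f"{fraction_ignored(0.0001, 1, 20):.4%} ignored")
```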
Getting tools out there
GenSel: installed by its developers and made available through the DE (Discovery Environment)
For whole-genome predictions; widely used in breeding
Dorian Garrick, Iowa State University
Solving problems faster
iAnimal genotyping pipeline developed for 1000 Bulls processes two terabytes (TB) of raw sequence data to DNA variants in less than 8 hours
James Koltes, Iowa State University
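For a sense of scale, processing 2 TB in under 8 hours implies an average end-to-end throughput of at least roughly 69 MB/s. The sketch below shows that calculation; it assumes decimal units (1 TB = 10^12 bytes) and a constant rate, and `min_throughput_mb_per_s` is a hypothetical helper.

```python
def min_throughput_mb_per_s(terabytes: float, hours: float) -> float:
    """Average throughput in decimal MB/s needed to process `terabytes` within `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_bytes = terabytes * 1e12       # decimal units assumed: 1 TB = 10**12 bytes
    seconds = hours * 3600
    return total_bytes / seconds / 1e6   # bytes/s converted to MB/s


if __name__ == "__main__":
    rate = min_throughput_mb_per_s(2, 8)
    print(f"at least {rate:.1f} MB/s on average")
```

Because the slide says "less than 8 hours", this is a lower bound on the average rate the pipeline sustained.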
Where to go from here:
iPlant Learning Center
• Get Started Guide
• Tutorials and Videos
• Documentation
Upcoming Events
• Workshops
• Webinars
iPlant can come to you…
Tools & Services Workshops
• Targeted to researchers
• Hands-on learning modules
• Individual consultations
Genomics in Education Workshops
• Targeted to educators
• Pair bioinformatics with classroom labs
• Help for generating lesson plans
Webinars
• Pairs with asynchronous learning
• Reach broader audiences
• Follow up with workshop learners
Where to go from here:
• If iPlant can, we’ll help show you how…
• If iPlant can’t, we’ll find the path that gets you what you need
Don’t hesitate to ask “Can iPlant do this?”
Keep asking at ask.iplantcollaborative.org
Executive Team:
Steve Goff – UA, Matthew Vaughn – TACC, Nirav Merchant – UA, Eric Lyons – UA, Doreen Ware – CSHL
Current and Former:
Staff: Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Matthew Hanlon, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, EunSook Jeong, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Zhenyuan Lu, Aaron Marcuse-Kubitz, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans Vasquez-Gross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu
Faculty Advisors & Collaborators: Ali Akoglu, Kobus Barnard, Timothy Clausner, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Steve Welch
Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang
Postdocs: Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel
Download these slides…
www.iplantc.org/arswiki1
@JasonWilliamsNY
@iPlantCollab
Jason Williams