DNA Subway Green Line Onramp to HPC in Biology Education Dave Micklos and Uwe Hilgert iPlant Collaborative DNA Learning Center, Cold Spring Harbor Laboratory; Bio5 Institute, University of Arizona

DNA Subway Green Line Onramp to HPC in Biology Education

Dave Micklos and Uwe Hilgert

iPlant Collaborative

DNA Learning Center, Cold Spring Harbor Laboratory

Bio5 Institute, University of Arizona

…ride an educational Discovery Environment

Green Line: RNA Sequence (RNA-Seq) Analysis

• First fully graphical interface for RNA-Seq analysis: no command line or data conversions

• Accesses XSEDE system through the iPlant Agave API

• Co-localizes up to 100 GB of data in iPlant Data Store

• Look for differential gene expression in different tissues, life stages, or treatments

• Generate lists of expressed genes and fold-changes

• Annotate sequenced genomes; add results to Red Line projects

150 feet

RNA code represents “active” DNA in genome

Homo sapiens bitter taste receptor (TAS2R38) DNA code > RNA code

CCTTTCTGCACTGGGTGGCAACCAGGTCTTTAGATTAGCCAACTAGAGAAGAGAAGTAGAATAGCCAATTAGAGAAGTGACATCATGTTGACTCTAACTCGCATCCGCACTGTGTCCTATGAAGTCAGGAGTACATTTCTGTTCATTTCAGTCCTGGAGTTTGCAGTGGGGTTTCTGACCAATGCCTTCGTTTTCTTGGTGAATTTTTGGGATGTAGTGAAGAGGCAGGCACTGAGCAACAGTGATTGTGTGCTGCTGTGTCTCAGCATCAGCCGGCTTTTCCTGCATGGACTGCTGTTCCTGAGTGCTATCCAGCTTACCCACTTCCAGAAGTTGAGTGAACCACTGAACCACAGCTACCAAGCCATCATCATGCTATGGATGATTGCAAACCAAGCCAACCTCTGGCTTGCTGCCTGCCTCAGCCTGCTTTACTGCTCCAAGCTCATCCGTTTCTCTCACACCTTCCTGATCTGCTTGGCAAGCTGGGTCTCCAGGAAGATCTCCCAGATGCTCCTGGGTATTATTCTTTGCTCCTGCATCTGCACTGTCCTCTGTGTTTGGTGCTTTTTTAGCAGACCTCACTTCACAGTCACAACTGTGCTATTCATGAATAACAATACAAGGCTCAACTGGCAGATTAAAGATCTCAATTTATTTTATTCCTTTCTCTTCTGCTATCTGTGGTCTGTGCCTCCTTTCCTATTGTTTCTGGTTTCTTCTGGGATGCTGACTGTCTCCCTGGGAAGGCACATGAGGACAATGAAGGTCTATACCAGAAACTCTCGTGACCCCAGCCTGGAGGCCCACATTAAAGCCCTCAAGTCTCTTGTCTCCTTTTTCTGCTTCTTTGTGATATCATCCTGTGCTGCCTTCATCTCTGTGCCCCTACTGATTCTGTGGCGCGACAAAATAGGGGTGATGGTTTGTGTTGGGATAATGGCAGCTTGTCCCTCTGGGCATGCAGCCATCCTGATCTCAGGCAATGCCAAGTTGAGGAGAGCTGTGATGACCATTCTGCTCTGGGCTCAGAGCAGCCTGAAGGTAAGAGCCGACCACAAGGCAGATTCCCGGACACTGTGCTGAGAATGGACATGAAATGAGCTCTTCATTAATACGCCTGTGAGTCTTCATAAATATGCC
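
The "DNA code > RNA code" step on this slide can be illustrated with a minimal Python sketch. It assumes the sequence shown is the coding (sense) strand, so the RNA code is obtained simply by replacing T with U. The function name `transcribe` and the choice to use only the first 30 bases are illustrative, not part of the original slide.

```python
def transcribe(dna: str) -> str:
    """Return the RNA code for a coding-strand DNA sequence (T becomes U)."""
    dna = dna.upper()
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return dna.replace("T", "U")


# First 30 bases of the TAS2R38 sequence shown above.
tas2r38_start = "CCTTTCTGCACTGGGTGGCAACCAGGTCTT"
rna_start = transcribe(tas2r38_start)
print(rna_start)
```

The same function can be applied to the full sequence; only the alphabet changes, not the length or the order of the bases.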

Differential Gene Expression

RNA Sequence (RNA-Seq) gives a “snapshot” of genes active in different cells at different times

Differential Gene Expression

RNA Sequence (RNA-Seq) gives a “snapshot” of genes active in different cells

RNA Sequence (RNA-Seq) Analysis

Isolate total RNA; convert to DNA library

Design RNA-Seq experiment, e.g., differential expression

Sequence experiment and control libraries

Analyze sequence data on DNA Subway Green Line

Follow-up experimental validation

Image source: http://www.bgisequence.com

1) Manage Data: Quality assessment with FastQC; ~100 million 75/150-nucleotide reads in <1 hr
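
As a rough illustration of what read-level quality assessment measures, the sketch below computes the mean Phred score of each read in a small FASTQ text. It is not FastQC itself. It assumes Phred+33 encoding and four-line FASTQ records, and the two-read input is made up for the example.

```python
from statistics import mean


def mean_read_qualities(fastq_text: str, offset: int = 33) -> dict[str, float]:
    """Map each read ID to the mean Phred score of its bases."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    result = {}
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        result[header[1:]] = mean(ord(ch) - offset for ch in qual)
    return result


# Hypothetical two-read example (not real data).
example_fastq = """\
@read1
ACGT
+
IIII
@read2
ACGT
+
I5+!
"""
qualities = mean_read_qualities(example_fastq)
```

A read such as `read2`, whose scores fall off toward the 3' end, ends up with a much lower mean than `read1`, which is the kind of pattern a quality report is meant to expose.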

2) FastX Toolkit: Quality control with FastX Toolkit; ~100 million 75/150-nucleotide reads in <1 hr (some took up to 19 hr…)
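
Quality control usually means trimming or discarding low-quality bases before alignment. The sketch below shows one simple form of that idea: removing bases from the 3' end until a base meets a quality threshold. It is an illustration only and does not reproduce the FastX Toolkit's own algorithm. The threshold of 20, the Phred+33 encoding, and the example read are assumptions.

```python
def trim_3prime(seq: str, qual: str, min_quality: int = 20, offset: int = 33) -> tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_quality."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    # Walk back from the 3' end while the last kept base is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: the last three bases have low quality scores.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII+#!")
```

Here the three low-scoring bases at the end are removed and the five high-quality bases are kept.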

3) TopHat: Aligns ~100 million 75/150-nucleotide (paired-end) reads to a reference genome of 100M–5B bases in 6–19 hr

TopHat Alignment

JBrowse

TopHat Alignment

JBrowse

4) Cufflinks: Assembles transcripts and calculates abundance from BAM files of 1–12 GB in 6–19 hr
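
Cufflinks reports transcript abundance as FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows only the basic normalization behind that unit. Cufflinks itself uses a statistical model to assign fragments to transcripts, which this example does not attempt. The function name and all input numbers are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript,
# out of 50 million mapped fragments in the sample.
example_fpkm = fpkm(500, 2_000, 50_000_000)
```

Normalizing by both transcript length and sequencing depth is what makes abundances comparable between genes and between libraries of different sizes.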

5) Cuffdiff: Merges assemblies from Cufflinks and performs differential expression analysis on 4–9 samples in 6–19 hr
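
The fold-changes produced by a differential expression analysis are commonly reported on a log2 scale. The sketch below computes a log2 fold-change between a control and an experimental value. It leaves out the statistical testing that Cuffdiff performs, and the pseudocount of 1, the gene names, and the expression values are all assumptions made for the example.

```python
import math


def log2_fold_change(control: float, experiment: float, pseudocount: float = 1.0) -> float:
    """log2 of (experiment / control), with a pseudocount to avoid division by zero."""
    return math.log2((experiment + pseudocount) / (control + pseudocount))


# Hypothetical expression values (e.g., FPKM) for three made-up genes.
expression = {
    "geneA": (7.0, 31.0),   # (control, experiment)
    "geneB": (15.0, 3.0),
    "geneC": (9.0, 9.0),
}
fold_changes = {gene: log2_fold_change(c, e) for gene, (c, e) in expression.items()}
```

Positive values indicate higher expression in the experimental sample, negative values indicate lower expression, and zero indicates no change.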

Green Line

Queue time vs. run time

• Asking for a long run time leads to longer queue times

• Asking for a short run time may lead to the job being terminated

• Users don't like to wait too long

• Users want the results right away

• Finding the right balance is not easy
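
One possible way to approach the trade-off described above is to base the requested run time on how long similar jobs took in the past. The sketch below is a hypothetical heuristic, not the method DNA Subway uses: it takes a high percentile of earlier run times and adds a safety margin. The function name, the 90th percentile, the factor of 1.2, and the run-time history are all assumptions.

```python
import math


def suggest_walltime_hours(past_runtimes: list[float], percentile: float = 0.9,
                           safety_factor: float = 1.2) -> int:
    """Suggest a wall-time request (whole hours) from earlier run times."""
    if not past_runtimes:
        raise ValueError("need at least one past run time")
    ordered = sorted(past_runtimes)
    # Nearest-rank percentile: the smallest value covering `percentile` of runs.
    rank = max(1, math.ceil(percentile * len(ordered)))
    return math.ceil(ordered[rank - 1] * safety_factor)


# Hypothetical run times in hours for ten earlier jobs of the same kind.
history = [6, 7, 7, 8, 9, 10, 11, 12, 14, 19]
requested = suggest_walltime_hours(history)
```

Requesting less than the longest observed run keeps queue times down, at the cost of occasionally having an unusually long job terminated.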

Green Line

Dealing with the unexpected

• Systems taken offline

• Maintenance

• Network outages, data transfer issues

• Science API glitches

• Authentication
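
Transient problems such as those listed above are often handled by retrying a failed call after a growing delay. The sketch below shows that general pattern. It is not taken from DNA Subway or the Agave API: the function names, the use of `ConnectionError`, and the "FINISHED" status string are hypothetical stand-ins.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(action: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action`, retrying on ConnectionError with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")


# Hypothetical flaky call that fails twice before succeeding.
calls = {"count": 0}


def flaky_status_check() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary outage")
    return "FINISHED"


# Record the delays instead of actually sleeping, to keep the example fast.
delays: list[float] = []
status = retry_with_backoff(flaky_status_check, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the example quick to run and makes the delay schedule easy to inspect.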

Green Line

“Monitoring XSEDE”

DNA Subway

“Power Desktop”

• Intuitive interface to support seamless genome “round trip” for eukaryote of choice

• Access high performance computing to analyze whole genome data (RNA-seq, initially)

• Scaffold data to sequenced genomes available in iPlant Data Store

• Directly upload RNA-seq reads as biological evidence for genome annotation using Red Line

NSF CCLI Project Retreat

June 8–20, 2014, CSHL

• 11 faculty from PUIs

• Program included lectures/practical sessions

  • Wet lab: RNA library prep

  • Green Line analysis & bioinformatics

  • Pedagogy/teaching resources

  • Virtual training materials

NSF CCLI Project Retreat

Faculty Participants

Agnes Ayme-Southgate, College of Charleston, SC
Flight muscle development during life-stage transitions in Apis mellifera (honeybee)

Judy Brusslan, California State University, Long Beach, CA
Leaf development and senescence in Arabidopsis thaliana

Raymond Enke, James Madison University, VA
Retina development in Gallus gallus

Shaye Lewis, Prairie View A&M University, TX
Testes development from juvenile to puberty in caprine (goat)

Irina Makarevitch, Hamline University, MN
Response to cold stress in maize

Judith Ogilvie, Saint Louis University, MO
Retinal changes of mice with retinitis pigmentosa

Jeremy Seto, New York City College of Technology, CUNY, NY
Differentiation of rat pheochromocytoma line cells (PC12) to a neuronal-like phenotype

Carrie Thurber, Abraham Baldwin Agricultural College, IL
Seed abscission in Sorghum bicolor

George Ude, Bowie State University, MD
Floral inflorescence genes in banana/plantains

Deirdre Vaden, Prairie View A&M University, TX
Peripheral blood mononuclear cells from hypertensive rats treated with captopril

Scott Woody, University of Wisconsin, WI
Gibberellic acid exposure in Brassica rapa (Fast Plants) gibberellic acid (gad) mutants

NSF CCLI Project Retreat

Flight muscle development during life-stage transitions in Apis mellifera (honeybee)

Agnes Ayme-Southgate, College of Charleston, SC

All honeybees begin as worker bees, flying short distances. Some honeybees transition into foragers, flying long distances. This transition necessitates major changes in flight muscles. The goal is to identify the gene expression changes in flight muscles during this transition.

Courses

• Biol 322: Developmental Biology, 30–38 students

• Genetics, 100 students

• Undergraduate research in lab, 2–3 students

NSF CCLI Project Retreat

Differential gene expression in Capra hircus (goat) testes during juvenile development

Shaye Lewis, Prairie View A&M University, TX

Fertility phenotypes show low heritability, and semen analysis parameters cannot determine fertility status. Molecular biomarkers can increase the efficiency of artificial insemination and embryo transfer in goats. The goal is to identify genes important for normal testes development and function.

Courses

• 4533: Animal Breeding & Genetics, 20 students

• Undergraduate research in lab, 4 students

NSF CCLI Project Retreat

Understanding transcriptional response to cold stress in maize

Irina Makarevitch, Hamline University, MN

Maize is grown worldwide and is a staple for >1 billion people. Maize is thermophilic and sensitive to low temperatures, and understanding how plants respond to cold can improve yields. The goal is to identify genes that are differentially expressed when maize is grown under cold stress.

Courses

• Biol 201: Principles of Genetics, 80 students

• Biol 301: Genomics & Bioinformatics, 20 students

• Undergraduate research in lab, 4 students

NSF CCLI Project Retreat

RNA-Seq Datasets Generated and Analyzed Using the Green Line of DNA Subway

• 8 eukaryotic organisms

• 21 controls paired with 26 experimental conditions

• 402 Gbases sequenced

• 837 jobs submitted to TACC

• 87% of jobs completed

• 695 hours total CPU time

• 16 threads/processors running concurrently
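
The figures above allow a quick back-of-the-envelope calculation, sketched below in Python. The derived values are rough estimates only: the 87% completion rate is a rounded figure, and the slide does not say whether the 695 CPU hours cover all submitted jobs or only completed ones, so dividing by submitted jobs is an assumption.

```python
jobs_submitted = 837
completion_rate = 0.87   # rounded percentage as given on the slide
total_cpu_hours = 695

# Approximate number of completed jobs implied by the rounded rate.
jobs_completed = round(jobs_submitted * completion_rate)

# Assumption: total CPU time is spread over all submitted jobs.
cpu_hours_per_submitted_job = total_cpu_hours / jobs_submitted

print(f"about {jobs_completed} jobs completed")
print(f"about {cpu_hours_per_submitted_job:.2f} CPU hours per submitted job")
```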

Intended Implementation 2014-15

Course levels: 100 level, 200 level, 300 level, 400 level, 500 level, Undergrad Research

Courses and enrollments:

• Intro Biology

• Genetics, 270

• Molecular & Cell Biology, 50

• Genetics, 220

• Molecular Biology, 100

• Genomics & Bioinformatics, 70

• Developmental Biology, 35

• Cell Structure & Function, 30

• Synthetic Biology, 30

• Anatomy/Physiology, 50

• Advanced Genetic Techniques, 15

• Cell & Molecular Biology, 75

• Genomics, 40

• Animal Breeding & Genetics, 20

• Independent Research, 5

• Molecular Applications in Crop Improvement, 15

Totals by level: 100s, 320, 550, 140, 20, 15

DNA Subway is…

Producers: Uwe Hilgert, David Micklos, Jason Williams

Designers: Eun-Sook Jeong, Susan Lauter

Programmers: Cornel Ghiban, Mohammed Khalfan, Sheldon McKay

Contributors: Matt Vaughn, Rion Dooley, Anthony Biondo, Jim Burnette, Scott Cain, Ed Lee, Zhenyuan Lu

Advisors: Matt Conte, Carson Holt, Bruce Nash, Oscar Pineda-Catalan

HPC in Undergraduate Biology Education

Banbury Center, CSHL, September 3-5, 2014

Contact: Dave Micklos

A Great Gatsby era estate on Long Island’s “Gold Coast”

Funded by NSF and the Alfred P. Sloan Foundation