An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

An introduction to Web Apollo. A webinar for the i5K Pilot Species Projects - Hemiptera Monica Munoz-Torres, PhD Biocurator & Bioinformatics Analyst | @monimunozto Genomics Division, Lawrence Berkeley National Laboratory 12+1 May, 2014 UNIVERSITY OF CALIFORNIA

Upload: monica-munoz-torres

Post on 10-May-2015


DESCRIPTION

Introduction to Web Apollo for the i5K Pilot Species Projects. Web Apollo is a genome annotation editor; it provides a web-based environment that allows multiple distributed users to review, edit, and share manual annotations. This presentation includes information specific to the projects of i5K, the global initiative to sequence the genomes of 5,000 species of arthropods. Let's get started!

TRANSCRIPT

Page 1: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

An introduction to Web Apollo. A webinar for the i5K Pilot Species Projects - Hemiptera

Monica Munoz-Torres, PhD
Biocurator & Bioinformatics Analyst | @monimunozto
Genomics Division, Lawrence Berkeley National Laboratory
12+1 May, 2014

UNIVERSITY OF CALIFORNIA

Page 2: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Outline

1. What is Web Apollo?

• Definition & working concept.

2. Community based curation from our experience. Lessons Learned.

3. Manual Annotation at i5K: how do we get there?

4. Becoming acquainted with Web Apollo.


Page 3: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

What is Web Apollo?

• Web Apollo is a web-based, collaborative genomic annotation editing platform.

We need annotation editing tools to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically.


Find out more about Web Apollo at http://GenomeArchitect.org and in Genome Biol 14:R93 (2013).

Page 4: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Brief history of Apollo*:

a. Desktop: one person at a time editing a specific region, annotations saved in local files; slowed down collaboration.

b. Java Web Start: users saved annotations directly to a centralized database; potential issues with stale annotation data remained.


Biologists could finally visualize computational analyses and experimental evidence from genomic features and build manually curated consensus gene structures. Apollo became a very popular open-source tool (insects, fish, mammals, birds, etc.).

Page 5: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Web Apollo

• Browser-based; plugin for JBrowse.

• Allows for intuitive annotation creation and editing, with gestures and pull-down menus to create transcripts, add/delete/resize exons, merge/split exons or transcripts, insert comments (CV, freeform text), etc.

• Customizable rules and appearance.

• Edits in one client are instantly pushed to all other clients: Collaborative!


Page 6: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Working Concept

In the context of manual gene annotation, curation tries to find the best examples and/or eliminate (most) errors.

To conduct manual annotation efforts:

Gather and evaluate all available evidence using quality-control metrics to corroborate or modify automated annotation predictions.

Perform sequence similarity searches (phylogenetic framework) and use literature and public databases to:

• Predict functional assignments from experimental data.

• Distinguish orthologs from paralogs, and classify gene membership in families and networks.


Automated gene models

Evidence: cDNAs, HMM domain searches, alignments with assemblies or genes from other species.

Manual annotation & curation

Page 7: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Dispersed, community-based gene manual annotation efforts.

Using Web Apollo, we* have trained geographically dispersed scientific communities to perform biologically supported manual annotations, and monitored their findings: ~80 institutions, 14 countries, hundreds of scientists, and gate keepers.

– Training workshops and geneborees.
– Tutorials with detailed instructions.
– Personalized user support.


*Collaboration with Elsik Lab, Hymenoptera Genome Database.

Page 8: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

What have we learned?

Harvesting expertise from dispersed researchers who assigned functions to predicted and curated peptides, we have developed more interactive and responsive tools, as well as better visualization, editing, and analysis capabilities.


Page 9: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

It is helpful to work together.

Scientific community efforts bring together domain-specific and natural history expertise that would otherwise have remained disconnected.


Page 10: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Improved Automated Annotations*

In many cases, automated annotations have been improved (e.g., Apis mellifera; Elsik et al., BMC Genomics 2014, 15:86).

We have also learned of the challenges of newer sequencing technologies, e.g.:

– Frameshifts and indel errors
– Split genes across scaffolds
– Highly repetitive sequences

To face these challenges, we train annotators in recovering coding sequences in agreement with all available biological evidence.


Page 11: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Understanding the evolution of sociality.

Comparison of the genomes of 7 species of ants contributed to a better understanding of the evolution and organization of insect societies at the molecular level.

Insights drawn mainly from six core aspects of ant biology:

1. Alternative morphological castes

2. Division of labor

3. Chemical Communication

4. Alternative social organization

5. Social immunity

6. Mutualism


… groups of communities have taught us a lot!

Libbrecht et al. Genome Biology 2013, 14:212


Page 12: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

A little training goes a long way!

With the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.


Page 13: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Manual annotation at i5K

How do we get there?


In a genome sequencing project…

Assembly, Automated Annotation, Manual annotation, Experimental validation

Page 14: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Gene Prediction


Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc.

Ab initio or homology-based, e.g., fgenesh, Augustus, geneid, SGP2.

Nucleic Acids Res. 2003, vol. 31, no. 13, 3738-3741

Page 15: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Gene Annotation


Integration of data from prediction tools to generate a consensus set of predictions (gene models).

• Models may be organized by:
  - automatic integration of predicted sets, e.g., GLEAN
  - packaging the necessary tools into a pipeline, e.g., MAKER

• Transcriptomes are used to further inform the annotation process.


Page 16: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

The Collaborative Curation Process at i5K

1) A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. CLEC_v0.5.3-Models.

2) i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):

» If it’s not on either track, it won’t make the OGS!
» If it’s there and it shouldn’t be, it will still make the OGS!


Page 17: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Consensus set: reference and start point

• In some cases, the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of a gene’s representation; when that happens, start from another model (e.g., the Augustus model) to create a new annotation.

• Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area.

• If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on Information Editor.

• Overlapping interests? Collaborate to reach agreement.

• Follow guidelines for i5K Pilot Species Projects as shown at http://goo.gl/LRu1VY
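The integration rules on the last two slides can be illustrated with a small Python sketch. It is not the i5K merge pipeline: the `Model` record, its `status` field, and the rule that a manual annotation supersedes any consensus model it overlaps on the same strand are all assumptions made for illustration. Only the 'Delete' label and the two-track principle come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical minimal gene-model record (0-based, end-exclusive)."""
    name: str
    scaffold: str
    strand: str
    start: int
    end: int
    status: str = ""  # assumed field; "Delete" mirrors the Information Editor label


def overlaps(a, b):
    """True when two models share a scaffold, a strand and at least one base."""
    return (a.scaffold == b.scaffold and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def build_ogs(consensus, manual):
    """Merge the consensus set with the 'User-created Annotations' track.

    Assumed policy: a consensus model is dropped when any manual annotation
    overlaps it (either a replacement or a 'Delete' marker); everything else
    on either track is kept, so unreviewed errors still reach the OGS.
    """
    untouched = [c for c in consensus
                 if not any(overlaps(c, m) for m in manual)]
    kept_manual = [m for m in manual if m.status != "Delete"]
    return sorted(untouched + kept_manual,
                  key=lambda m: (m.scaffold, m.start, m.name))
```

Under these assumptions, a consensus model that nobody reviews passes straight through to the OGS, which is why removal has to be requested explicitly.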


Page 18: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Navigation tools: pan and zoom.

Search box: go to a scaffold or a gene model.

Grey bar of coordinates indicates location. You can also select here in order to zoom to a sub-region.

‘View’: change color by CDS, toggle strands, set highlight.

‘File’: Upload your own evidence: GFF3, BAM, BigWig, VCF*. Add combination and sequence search tracks.

‘Tools’: Use BLAT to query the genome with a protein or DNA sequence.
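For readers preparing their own evidence for the ‘File’ menu, here is a minimal Python sketch of what a GFF3 file for one single-transcript gene looks like. The function name, the `-RA` transcript suffix, and the example scaffold and source names are hypothetical; the nine-column layout and 1-based inclusive coordinates follow the GFF3 format itself. Whether a given Web Apollo instance needs extra attributes is not covered here.

```python
def gff3_gene(seqid, source, gene_id, strand, exons):
    """Format one single-transcript gene as GFF3 text.

    `exons` holds (start, end) pairs in GFF3 coordinates: 1-based, inclusive.
    """
    exons = sorted(exons)
    start = exons[0][0]
    end = max(e for _, e in exons)
    mrna_id = f"{gene_id}-RA"  # hypothetical naming scheme

    def row(ftype, s, e, attrs):
        # Nine tab-separated columns: seqid, source, type, start, end,
        # score, strand, phase, attributes ("." means not applicable).
        return "\t".join([seqid, source, ftype, str(s), str(e),
                          ".", strand, ".", attrs])

    lines = [
        "##gff-version 3",
        row("gene", start, end, f"ID={gene_id}"),
        row("mRNA", start, end, f"ID={mrna_id};Parent={gene_id}"),
    ]
    for n, (s, e) in enumerate(exons, 1):
        lines.append(row("exon", s, e,
                         f"ID={mrna_id}:exon{n};Parent={mrna_id}"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(gff3_gene("Scaffold1", "my_lab", "gene001", "+",
                    [(100, 250), (500, 650)]), end="")
```

The `Parent` attributes are what tie exons to their transcript and the transcript to its gene, so a browser can draw them as a single model.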

Available Tracks

Evidence Tracks Area

‘User-created Annotations’ Track

Login

Web Apollo


Graphical User Interface (GUI) for editing annotations


Page 19: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Flags non-canonical splice sites.

Selection of features and sub-features

Edge-matching

Evidence Tracks Area

‘User-created Annotations’ Track

The editing logic (server):
• selects longest ORF as CDS
• flags non-canonical splice sites
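The two server-side behaviours named on this slide can be sketched in Python. This is an illustration only, not Web Apollo's implementation: it assumes a forward-strand transcript, the standard stop codons, an ATG start, and the common GT..AG intron boundary rule, and the function names are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna):
    """Return (start, end) of the longest ATG..stop ORF in a spliced transcript.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Returns None when no complete ORF exists.
    """
    mrna = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def non_canonical_splice_sites(genome, exons):
    """List introns that do not start with GT and end with AG.

    `exons` is a sorted list of (start, end) pairs on the forward strand,
    0-based and end-exclusive. Each flagged intron is reported as
    (intron_start, intron_end, donor, acceptor).
    """
    genome = genome.upper()
    flagged = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        donor = genome[left_end:left_end + 2]
        acceptor = genome[right_start - 2:right_start]
        if donor != "GT" or acceptor != "AG":
            flagged.append((left_end, right_start, donor, acceptor))
    return flagged
```

A flagged site is a prompt for the curator to check the evidence, not proof of an error: some real introns use other boundaries.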


Web Apollo


Page 20: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

DNA Track

‘User-created Annotations’ Track

Two new kinds of tracks:
• annotation editing
• sequence alteration editing

Web Apollo


Page 21: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Web Apollo


Annotations, annotation edits, and History: stored in a centralized database.


Page 22: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Web Apollo


Annotation Information Editor


Page 23: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Web Apollo


Annotation Information Editor


Page 24: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

[Some of the] Functionality: Protein-coding gene annotation (that you know and love)

Sequence alterations (less coverage = more fragmentation)

Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments


Page 25: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Example: ORCO

Live Demonstration using the Cimex lectularius genome


Page 26: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Arthropodcentric Thanks!

AgriPest Base

FlyBase

Hymenoptera Genome Database

VectorBase

Apis mellifera

Tribolium castaneum

Pogonomyrmex barbatus

Manduca sexta

Bombus terrestris

Helicoverpa armigera

Nasonia vitripennis

Acyrthosiphon pisum

Mayetiola destructor

Atta cephalotes

Linepithema humile

Camponotus floridanus

Solenopsis invicta

Acromyrmex echinatior


Page 27: An introduction to Web Apollo for i5K Pilot Species Projects - Hemiptera

Thanks!

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).

• Elsik Lab. § University of Missouri. Christine G. Elsik (PI).

• Ian Holmes (PI). * University of California Berkeley.

• Arthropod genomics community, i5K http://www.arthropodgenomes.org/wiki/i5K Steering Committee, USDA/NAL, HGSC-BCM, BGI, and 1KITE http://www.1kite.org/.

• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

• Insect images used with permission: http://AlexanderWild.com

• For your attention, thank you!


Web Apollo

Ed Lee

Gregg Helt

Colin Diesh §

Deepak Unni §

Rob Buels *

Gene Ontology

Chris Mungall

Seth Carbon

Heiko Dietze

BBOP

Web Apollo: http://GenomeArchitect.org

GO: http://GeneOntology.org

i5K: http://arthropodgenomes.org/wiki/i5K