ApolloCollaborative genome annotation editing
A workshop for the Stowers Institute Research Community
Monica Munoz-Torres, PhD | @monimunozto
Berkeley Bioinformatics Open-Source Projects (BBOP)Environmental Genomics & Systems Biology DivisionLawrence Berkeley National Laboratory
Kansas City, MO | 12 December, 2016
http://GenomeArchitect.org
Outline
• Today we will discusseffective ways to extract valuable information about a genome through curation efforts.
After this talk you will...• Better understand curation in the context of genome annotation:
assembled genome à automated annotation à manual annotation
• Become familiar with Apollo’s environment and functionality.
• Learn to identify homologs of known genes of interest in your newly sequenced genome.
• Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.
Experimental design, sampling.
Comparative analyses
Merged Gene Set
Manual Annotation
Automated Annotation
SequencingAssembly
Synthesis & dissemination.
Genome sequencing projects
Unlocking genomes
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
First,arefresher
A few things to remember during curation of gene models
7BIO-REFRESHER
• KEEPAGLOSSARY HANDYfromcontig tosplicesite
• WHATISAGENE?definingyourgoal
• TRANSCRIPTIONmRNAindetail
• TRANSLATIONreadingframes,etc.
• GENOMECURATIONstepsinvolved
The gene: a moving target
“The gene is a union of genomic
sequences encoding a coherent set of
potentially overlapping
functional products.”
Gerstein et al., 2007. Genome Res
9
"Gene structure" by Daycd- Wikimedia Commons
BIO-REFRESHER
mRNA
• Although of brief existence, understanding mRNAs is crucial,as they will become the center of your work.
10BIO-REFRESHER
Reading frames
v In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF)• ORF = Start signal + coding sequence (divisible by 3) + Stop signal
11BIO-REFRESHER
Splice sites
v The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
v Splicing signals (from the point of view of an intron): • One splice signal (site) on the 5’ end: usually GT (less common: GC)• And a 3’ end splice site: usually AG• Canonical splice sites look like this: …]5’-GT/AG-3’[…
12BIO-REFRESHER
Exons and Introns
v Introns can interrupt the reading frame of a gene by inserting a sequence between two consecutive codons
v Between the first and second nucleotide of a codon
v Or between the second and third nucleotide of a codon
"Exon and Intron classes”. Licensed under Fair use via Wikipedia
13BIO-REFRESHER
Obstaclesto transcription and translation
v The presence of premature Stop codons in the message is possible. A process called non-sense mediated decay checks for them and corrects them to avoid: incomplete splicing, DNA mutations, transcription errors, and leaky scanning of ribosome – causing changes in the reading frame (frame shifts).
v Insertions and deletions (indels) can cause frame shifts when indel is not divisible by three. As a result, the peptide can be abnormally long, or abnormally short – depending when the first in-frame Stop signal is located.
Prediction&Annotation
15GENE PREDICTION & ANNOTATION
PREDICTION & ANNOTATION
v Identificationandannotationofgenomefeatures:
• primarilyfocusesonprotein-codinggenes.• alsoidentifiesRNAs(tRNA,rRNA,longandsmallnon-coding
RNAs(ncRNA)),regulatorymotifs,repetitiveelements,etc.
• happensin2phases:1. Computationphase2. Annotationphase
16GENE PREDICTION & ANNOTATION
COMPUTATION PHASE
a. Experimentaldataarealignedtothegenome:expressedsequencetags,RNA-sequencingreads,proteins(alsofromotherspecies).
b. Genepredictionsaregenerated:- ab initio:basedonnucleotidesequenceandcompositione.g.Augustus,GENSCAN,geneid,fgenesh,etc.
- evidence-driven:identifyingalsodomainsandmotifse.g.SGP2,JAMg,fgenesh++,etc.
Result:thesinglemostlikelycodingsequence,noUTRs,noisoforms.Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
17GENE PREDICTION & ANNOTATION
ANNOTATION PHASE
Experimentaldata(evidence)and predictionsaresynthetizedintogeneannotations.
Result: genemodelsthatgenerallyincludeUTRs,isoforms,evidencetrails.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
5’UTR 3’UTR
18
Insomecasesalgorithmsandmetricsusedtogenerateconsensussetsmayactuallyreducetheaccuracyofthegene’srepresentation.
CONSENSUS GENE SETS
Genemodelsmaybeorganizedintosetsusing:v combinersforautomaticintegrationofpredictedsets
e.g:GLEAN,EvidenceModeler
orv toolspackagedintopipelines
e.g:MAKER,PASA,Gnomon,Ensembl,etc.
GENE PREDICTION & ANNOTATION
19BIO-REFRESHER
Good genes are required!
1. Generate gene modelsv A few rounds of gene prediction.
2. Annotate gene modelsv Function, expression patterns,
metabolic network memberships.
3. Manually review themv Structure & Function.
Best representation of biology & removal of elements reflecting errors in automated analyses.
Functional assignments through comparative analysis using literature, databases, and experimental data.
Apollo
Gene Ontology
Curation improves quality
21BIO-REFRESHER
Curation is inherently collaborative
• It is impossible for a single individual to curate an entire genome with precise biological fidelity.
• Curators need second opinions and insights from colleagues with domain and gene family expertise.
CollaborativecurationwithApollo
Apollo: genome annotation editing• Collaborative, instantaneous, web-based, built on top of JBrowse.
• Supports real time collaboration & generates analysis-ready data
USER-CREATED ANNOTATIONS
EVIDENCE TRACKS
ANNOTATOR PANEL
GenomeArchitect.org
BECOMING ACQUAINTED WITH APOLLO
General process of curation
1. Select or find a region of interest (e.g. scaffold).
2. Select appropriate evidence tracks to review the genome element to annotate (e.g. gene model).
3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
4. If necessary, adjust the gene model.
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6. Comment and finish.
ColorbyCDSframe,togglestrands,setcolorschemeandhighlights.
Uploadevidencefiles(GFF3,BAM,BigWig),addcombinationandsequencesearchtracks.
QuerythegenomeusingBLAT.
Navigationandzoom.Searchforagenemodelorascaffold.
User-createdannotations. Annotator
panel.
EvidenceTracks.
Stageandcell-typespecifictranscriptiondata.
GenomeArchitect.org
Apollo Genome Annotation Editor
Protein coding, pseudogenes, ncRNAs, regulatory elements, variants, etc.
Admin.
Apollo Architecture
GenomeArchitect.org
Web-based client + annotation-editing engine + server-side data service
Generates ready-made computable data
GenomeArchitect.org
Ref Sequence
Supports real time collaboration
GenomeArchitect.org
Let’sPlay!
Access Apollo
Annotations Organism Users Groups AdminTracks Reference Sequence
Removable side dock
1
Annotations
gene
mRNA
Annotation details & exon boundaries
1
2
2
CuratingwithApollo
34 | BECOMING ACQUAINTED WITH APOLLO
USER NAVIGATION
Annotatorpanel.
• Chooseappropriateevidencefromlistof“Tracks”onannotatorpanel.
• Select&dragelementsfromevidencetrackintothe‘User-createdAnnotations’area.
• Hoveringoverannotationinprogressbringsupaninformationpop-up.
• Creatinganewannotation
Adding a gene model
Adding a gene model
Adding a gene model
Editing functionality
Editing functionalityExample: Adding an exon supported by experimental data
• RNAseq reads show evidence in support of a transcribed product that was not predicted.• Add exon by dragging up one of the RNAseq reads.
Editing functionalityExample: Adjusting exon boundaries supported by experimental data
41 |
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
• ‘Zoomtobaselevel’ revealstheDNATrack.
42 |
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
• ColorexonsbyCDSfromthe‘View’menu.
43 |
Zoomin/outwithkeyboard:shift+arrowkeysup/down
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
• TogglereferenceDNAsequenceand translationframesinforwardstrand.Togglemodelsineitherdirection.
annotatingsimplecases
“Simplecase”:- thepredictedgenemodeliscorrectornearlycorrect,and- thismodelissupportedbyevidencethatcompletely ormostlyagreeswiththeprediction.- evidencethatextendsbeyondthepredictedmodelisassumedtobenon-codingsequence.
Thefollowingaresimplemodifications.
ANNOTATING SIMPLE CASES
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
• A confirmation box will warn you if the receiving transcript is not on thesame strand as the feature where the new exon originated.
• Check ‘Start’ and ‘Stop’ signals after each edit.
ADDING EXONS
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
Iftranscriptalignmentdataareavailable&extendbeyondyouroriginalannotation,youmayextendoraddUTRs.
1. Rightclickattheexonedgeand‘Zoomtobaselevel’.
2. PlacethecursorovertheedgeoftheexonuntilitbecomesablackarrowthenclickanddragtheedgeoftheexontothenewcoordinatepositionthatincludestheUTR.
ADDING UTRs
ToaddanewsplicedUTRtoanexistingannotationalsofollowtheprocedureforaddinganexon.
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
To modify an exon boundary and matchdata in the evidence tracks: selectboth the [offending] exon and thefeature with the expected boundary,then right click on the annotation toselect ‘Set 3’ end’ or ‘Set 5’ end’ asappropriate.
Insomecasesallthedatamaydisagreewiththeannotation,inothercasessomedatasupporttheannotationandsomeofthe
datasupportoneormorealternativetranscripts.Trytoannotateasmanyalternativetranscriptsasarewellsupportedbythedata.
MATCHING EXON BOUNDARY TO EVIDENCE
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
1. Twoexonsfromdifferenttrackssharingthesamestart/endcoordinatesdisplayaredbartoindicatematchingedges.
2. Selectingthewholeannotationoroneexonatatime,usethis edge-matching functionandscrollalongthelengthoftheannotation,verifyingexonboundariesagainstavailabledata.Usesquare[]bracketstoscrollfromexontoexon.Usercurly{}bracketstoscrollfromannotationtoannotation.
3. CheckifcDNA/RNAseqreadslackoneormoreoftheannotatedexonsorincludeadditionalexons.
CHECKING EXON INTEGRITY
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
Non-canonicalsplicesitesflags. Doubleclick:selectionoffeatureandsub-features
EvidenceTracksArea
‘User-createdAnnotations’Track
Edge-matching
Apollo’seditinglogic(brain):§ selectslongestORFasCDS§ flagsnon-canonicalsplicesites
ORFs AND SPLICE SITES
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
Non-canonicalsplices areindicatedbyanorangecirclewithawhiteexclamationpointinside,placedovertheedgeoftheoffendingexon.
Canonicalsplicesites:
3’-…exon]GA/TG[exon…-5’
5’-…exon]GT/AG[exon…-3’reversestrand,notreverse-complemented:
forwardstrand
SPLICE SITES
Zoom toreviewnon-canonicalsplicesitewarnings.Althoughthesemaynotalwayshavetobecorrected(e.g GCdonor),theyshouldbeflaggedwithacomment.
Exon/intronsplicesiteerrorwarning
Curatedmodel
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
Apollocalculatesthelongestpossibleopenreadingframe(ORF)thatincludescanonical‘Start’and‘Stop’signalswithinthepredictedexons.
If‘Start’appearstobeincorrect,modifyitbyselectinganin-frame‘Start’codonfurtherupordownstream,dependingonevidence(proteins,RNAseq).
Itmaybepresentoutsidethepredictedgenemodel,withinaregionsupportedbyanotherevidencetrack.
Inveryrarecases,theactual‘Start’ codonmaybenon-canonical(non-ATG).
‘Start’ AND ‘Stop’ SITES
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
annotatingcomplexcases
Evidencemaysupportjoiningtwoormoredifferentgenemodels.Warning: proteinalignmentsmayhaveincorrectsplicesitesandlacknon-conservedregions!
1. In‘User-createdAnnotations’area shift-clicktoselectanintronfromeachgenemodelandrightclicktoselectthe‘Merge’ optionfromthemenu.
2. Dragsupportingevidencetracksoverthecandidatemodelstocorroborateoverlap,orreviewedgematchingandcoverageacrossmodels.
3. Checktheresultingtranslationbyqueryingaproteindatabase e.g.UniProt,NCBInr.Addcommentstorecordthatthisannotationistheresultofamerge.
Redlinesaroundexons:‘edge-matching’allowsannotatorstoconfirmwhethertheevidenceisinagreementwithoutexaminingeachexonatthebaselevel.
COMPLEX CASESmerge two gene predictions on the same scaffold
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
Oneormoresplitsmayberecommendedwhen:- differentsegmentsofthepredictedproteinaligntotwoormoredifferentgenefamilies- predictedproteindoesn’taligntoknownproteinsoveritsentirelength- Transcriptdatamaysupportasplit,butfirstverifywhethertheyarealternativetranscripts.
COMPLEX CASESsplit a gene prediction
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
DNATrack
‘User-createdAnnotations’Track
COMPLEX CASESannotate frameshifts and correct single-base errors
Alwaysremember:whenannotatinggenemodelsusingApollo,youarelookingata‘frozen’versionofthegenomeassemblyandyouwillnotbeabletomodifytheassemblyitself.
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
COMPLEX CASEScorrecting selenocysteine containing proteins
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
COMPLEX CASEScorrecting selenocysteine containing proteins
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
1. Apolloallowsannotatorstomakesinglebasemodificationsorframeshiftsthatarereflectedinthesequenceandstructureofanytranscriptsoverlappingthemodification.ThesemanipulationsdoNOTchangetheunderlyinggenomicsequence.
2. Ifyoudeterminethatyouneedtomakeoneofthesechanges,zoomintothenucleotidelevelandrightclickoverasinglenucleotideonthegenomicsequencetoaccessamenuthatprovidesoptionsforcreatinginsertions,deletionsorsubstitutions.
3. The‘CreateGenomicInsertion’featurewillrequireyoutoenterthenecessarystringofnucleotideresiduesthatwillbeinsertedtotherightofthecursor’scurrentlocation.The‘CreateGenomicDeletion’ optionwillrequireyoutoenterthelengthofthedeletion,startingwiththenucleotidewherethecursorispositioned.The‘CreateGenomicSubstitution’featureasksforthestringofnucleotideresiduesthatwillreplacetheonesontheDNAtrack.
4. Onceyouhaveenteredthemodifications,Apollowillrecalculatethecorrectedtranscriptandproteinsequences,whichwillappearwhenyouusetheright-clickmenu‘GetSequence’option.Sincetheunderlyinggenomicsequenceisreflectedinallannotationsthatincludethemodifiedregionyoushouldalertthecuratorsofyourorganismsdatabaseusingthe‘Comments’sectiontoreporttheCDSedits.
5. Inspecialcasessuchasselenocysteinecontainingproteins(read-throughs),right-clickovertheoffending/premature‘Stop’signalandchoosethe‘Setreadthroughstopcodon’optionfromthemenu.
COMPLEX CASESannotating frameshifts and correcting single-base errors & selenocysteines
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
60 |
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
• Information Editor
TheAnnotationInformationEditorUSER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
TheAnnotationInformationEditor
• AddPubMedIDs• IncludeGO termsasappropriate
fromanyofthethreeontologies• Writecomments statinghowyou
havevalidatedeachmodel.
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
63 |
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
• Keeping track of each edit
Annotations,annotationedits,andHistory: storedinacentralizeddatabase.
USER NAVIGATION
BECOMING ACQUAINTED WITH APOLLO
Followthechecklistuntilyouarehappywiththeannotation!
Andrememberto…– commenttovalidateyourannotation,evenifyoumadenochangestoanexistingmodel.Thinkofcommentsasyourvoteofconfidence.
– oraddacommenttoinformthecommunityofunresolvedissuesyouthinkthismodelmayhave.
65 |
AlwaysRemember:Apollocurationisacommunityeffortsopleaseusecommentstocommunicatethereasonsforyour
annotation.Yourcommentswillbevisibletoeveryone.
COMPLETING THE ANNOTATION
BECOMING ACQUAINTED WITH APOLLO
Checklist
• Check‘Start’ and‘Stop’sites.
• Checksplicesites:mostsplicesitesdisplaytheseresidues…]5’-GT/AG-3’[…
• CheckifyoucanannotateUTRs,forexampleusingRNA-Seq data:– alignitagainstrelevantgenes/genefamily– blastp againstNCBI’sRefSeq ornr
• Checkforgaps inthegenome.
• Additionalfunctionalitymaybenecessary:–merging 2genepredictions- samescaffold– ‘merging’ 2genepredictions- differentscaffolds
– splitting ageneprediction– annotating frameshifts– annotatingselenocysteines,correctingsingle-baseandotherassemblyerrors,etc.
67 |
• Add:– Importantprojectinformationintheformof
comments– IDsfrompublicdatabasese.g.GenBank (via
DBXRef),genesymbol(s),commonname(s),synonyms,topBLASThits,orthologswithspeciesnames,andeverythingelseyoucanthinkof,becauseyouaretheexpert.
– Commentsaboutthekindsofchangesyoumadetothegenemodelofinterest,ifany.
– Anyappropriatefunctionalassignments,e.g.viaBLAST,RNA-Seq data,literaturesearches,etc.
CHECKLISTfor accuracy and integrity
MANUAL ANNOTATION CHECKLIST
Example
What’s new?... finding inspiration in the literature.
Example 69
“Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.”
Example
Example 70
CurationexampleusingtheHyalella aztecagenome(amphipodcrustacean).
What we know about this genome
• DataatNCBI:• >37,000 nucleotideseqsà scaffolds,mitochondrialgenes• 344 aminoacidseqsà mitochondrion• 47 ESTs• 0 conserveddomainsidentified• 0 “gene”entriessubmitted
• Dataati5KWorkspace@NAL(annotationhostedatUSDA)- 10,832scaffolds:23,288transcripts:12,906proteins
Example 71
PubMed Search: what’s new?
Example 72
PubMed Search: what’s new?
Example 73
“Tenpopulationsdifferedbyatleast550-foldinsensitivity topyrethroids.”
“Sequencingtheprimarypyrethroid targetsite,thevoltage-gatedsodiumchannel(vgsc),showsthatpointmutationsandtheirspreadinnaturalpopulationswereresponsiblefordifferencesinpyrethroid sensitivity.”
“Thefindingthatanon-targetaquaticspecieshasacquiredresistancetopesticidesusedonlyonterrestrialpestsistroublingevidenceoftheimpactofchronicpesticidetransportfromland-basedapplicationsintoaquaticsystems.”
How many sequences are there, publicly available, for our gene of interest?
Example 74
• Para,(voltage-gatedsodiumchannelalphasubunit;Nasonia vitripennis).
• NaCP60E (Sodiumchannelprotein60E;D.melanogaster).– MF:voltage-gatedcation channelactivity(IDA,GO:0022843).
– BP:olfactorybehavior(IMP,GO:0042048),sodiumiontransmembrane transport(ISS,GO:0035725).
– CC:voltage-gatedsodiumchannelcomplex(IEA,GO:0001518).
Andwhatdoweknowaboutthem?
Retrieving sequences for a sequence similarity search.
Example 75
>vgsc-Segment3-DomainIIRVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT searchinput
Example 76
>vgsc-Segment3-DomainIIRVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT searchresults
Example 77
• High-scoringsegmentpairs(hsp)arelistedintabulatedformat.
• Clickingononelineofresultssendsyoutothosecoordinates.
Creating a new gene model: drag and drop
Example 78
• ApolloautomaticallycalculateslongestORF.
• Inthiscase,ORFincludesthehigh-scoringsegmentpairs(hsp),markedhereinblue.
• Notethatgeneistranscribedfromreversestrand.
Available evidence tracks
Example 79
Get Sequence
Example 80
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Also, flanking sequences (other gene models) vs. NCBI nr
Example 81
Inthiscase,twogenemodelsupstream,at5’end.
BLASThsps
Review alignments
Example 82
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
Hypothesis for vgsc gene model
Example 83
Editing: merge the three models
Example 84
Mergebydroppinganexonorgenemodelontoanother.
Mergebyselectingtwoexons(holdingdown“Shift”)andusingtherightclickmenu.
or…
Result of merging the gene models:
Example 85
Editing: correct offending splice site
Example 86
Modifyexon/intronboundary:- Dragtheendofthe
exontothenearestcanonicalsplicesite.
or
- Useright-clickmenu.
Editing: set translation start
Example 87
Editing: delete exon not supported by evidence
Example 88
DeletefirstexonfromHaztTmpM006233
Editing: add an exon supported by RNAseq
Example 89
• RNAseqreadsshowevidenceinsupportoftranscribedproduct,whichwasnotpredicted.• Addexonatcoordinates97946-98012bydragginguponeoftheRNAseqreads.
Editing: adjust offending splice site using evidence
Example 90
Editing: adjust other boundaries supported by evidence
Example 91
Finished model
Example 92
Corroborateintegrityandaccuracyofthemodel:- Start andStop- Exonstructureandsplicesites…]5’-GT/AG-3’[…- Checkthepredictedproteinproductvs.NCBInr,UniProt,etc.
Information Editor
• DBXRefs:e.g.NP_001128389.1,N.vitripennis,RefSeq
• PubMedidentifier:PMID:24065824
• GeneOntologyIDs:GO:0022843,GO:0042048,GO:0035725,GO:0001518.
• Comments
• Name,Symbol
• Approve/Deleteradiobutton
Example 93
Comments(ifapplicable)
Exercises
Example dataset: Apis mellifera
GenomeArchitect.org
Practice using the Apis mellifera genome
GenomeArchitect.org
1. Evidence in support of protein coding gene models.
1.1 Consensus Gene Sets:Official Gene Set v3.2Official Gene Set v1.0
1.2 Consensus Gene Sets comparison:OGSv3.2 genes that merge OGSv1.0 andRefSeq genesOGSv3.2 genes that split OGSv1.0 and RefSeq genes
1.3 Protein Coding Gene Predictions Supported by Biological Evidence:NCBI GnomonFgenesh++ with RNASeq training dataFgenesh++ without RNASeq training dataNCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes
1.4 Ab initio protein coding gene predictions:Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2
1.5 Transcript Sequence Alignment:NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot
Practice using the Apis mellifera genome
GenomeArchitect.org
1. Evidence in support of protein coding gene models (Continued).
1.6 Protein homolog alignment:Acep_OGSv1.2Aech_OGSv3.8Cflo_OGSv3.3Dmel_r5.42Hsal_OGSv3.3Lhum_OGSv1.2Nvit_OGSv1.2Nvit_OGSv2.0Pbar_OGSv1.2Sinv_OGSv2.2.3Znev_OGSv2.1Metazoa_Swissprot
2. Evidence in support of non protein coding gene models
2.1 Non-protein coding gene predictions:NCBI RefSeq Noncoding RNANCBI RefSeq miRNA
2.2 Pseudogene predictions:NCBI RefSeq Pseudogene
Apollo Development
Nathan DunnTechnical Lead Eric Yao
Christine Elsik’s Lab, University of Missouri
Suzi LewisPrincipal Investigator
BBOP
Moni Munoz-TorresProject Manager
Deepak Unni
JBrowse. Ian Holmes’ Lab University of California, Berkeley
Thank you!Berkeley Bioinformatics Open-Source Projects, Environmental Genomics & Systems Biology, Lawrence Berkeley National Laboratory
Funding• Apollo is supported by NIH grants 5R01GM080203 from NIGMS,
and 5R01HG004483 from NHGRI.
• BBOP is also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
Collaborators• Ian Holmes, Eric Yao, UC Berkeley (JBrowse)• Chris Elsik, Deepak Unni, U of Missouri (Apollo)• Monica Poelchau, USDA/NAL (Apollo)• i5k Community
berkeleybop.org
UNIVERSITY OF CALIFORNIA
Suzanna Lewis & Chris Mungall
Seth Carbon (Noctua / AmiGO)Nathan Dunn (Apollo)Monica Munoz-Torres (Apollo / GO)Jeremy Nguyen Xuan (Monarch Init.)
PUBLIC DEMO100 |
APOLLO ON THE WEBinstructions
Public Honey bee demo available at:
genomearchitect.org/demo/
Username:[email protected]
Password:demo
APOLLOdemonstration
PUBLIC DEMO 101
Demonstrationvideoavailableathttps://youtu.be/VgPtAP_fvxY
Access Apollo