Importing Community annotations into VectorBase

Upload: mercy-ellis

Post on 05-Jan-2016


Page 1: Importing Community annotations into VectorBase. Aims Provide the VectorBase community with tools for improving genome annotation. Must have low entry

Importing Community annotations into VectorBase


Aims

• Provide the VectorBase community with tools for improving genome annotation.

• Must have low entry requirements, be scalable and (relatively) simple to use


Genome annotation

• First-pass genome annotation is almost always based on “automatic” computational approaches

• ab initio

• Similarity based

• Transcript (ESTs, RNAseq)

• Protein (nr protein database)


Genome annotation - building a pipeline

[Flow diagram; box labels: Genome assembly, Map Repeats, Genefinding, Protein-coding genes, Map Transcripts, Map Peptides, nc-RNAs, Functional annotation, Submission to archival databases (Release)]


Current VectorBase annotation pipeline

• MAKER based automatic annotation

• includes SNAP training and ab initio

• RNAseq based transcript similarity prediction

• Taxonomically constrained peptide similarity prediction

• Two rounds of prediction refinement; the final round includes all peptide similarity

• Community annotation phase

• Capture gene structure changes

• Metadata associated with locus (symbol, description, citation)

• Submission to INSDC, propagation to UniProt

• Presentation through VectorBase

Start

1.0 set (automatic)

1.1 set (published)


Processing submissions

• 4 phases

• Capture

• Moderation

• Storage

• Integration


Capture: Community annotation decision tree


Community annotation decision tree


Tool of choice: WebApollo

• Web-based

• Eliminates main drawback of deprecated CAP system - GFF3 format validation
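As an illustration of the kind of GFF3 format check that submitters had to pass under the CAP system, here is a minimal sketch in Python. It only tests the nine tab-separated columns defined by the GFF3 specification. The function name `check_gff3_line` and the example feature are hypothetical, and this is not the validator used by CAP or VectorBase.

```python
def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]

    problems = []
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Coordinates are 1-based integers with start <= end.
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start and end must be integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")

    if strand not in {"+", "-", ".", "?"}:
        problems.append(f"invalid strand: {strand!r}")
    if phase not in {"0", "1", "2", "."}:
        problems.append(f"invalid phase: {phase!r}")
    if ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")
    return problems


# Hypothetical feature line, used only to exercise the check.
example = "2L\tVectorBase\tgene\t100\t200\t.\t+\t.\tID=gene1"
print(check_gff3_line(example))
```

A web-based editor such as WebApollo avoids this step for the annotator because the gene model is built interactively rather than uploaded as a hand-edited file.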


WebApollo example


Community annotation decision tree


Community annotation decision tree


Tool of choice: Web forms


Moderation & Storage

• Gene metadata captured through forms to spreadsheets

• Batch submissions use similar spreadsheet format
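A spreadsheet of gene metadata could be screened before moderation along the following lines. This is only a sketch: the column names (`gene_id`, `symbol`, `description`, `citation`) are assumptions based on the metadata types listed earlier in the talk, and the function `read_submissions` is hypothetical rather than part of any VectorBase tool.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet layout is not given in the slides.
REQUIRED = ("gene_id", "symbol", "description", "citation")


def read_submissions(text):
    """Split rows of a CSV export into (accepted, rejected) lists for moderation."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        cleaned = {k: (row.get(k) or "").strip() for k in REQUIRED}
        # A usable row needs a gene identifier and at least one piece of metadata.
        has_metadata = any(cleaned[k] for k in REQUIRED[1:])
        if cleaned["gene_id"] and has_metadata:
            accepted.append(cleaned)
        else:
            rejected.append(cleaned)
    return accepted, rejected


# Made-up rows, used only to exercise the function.
sample = (
    "gene_id,symbol,description,citation\n"
    "GENE0001,abc1,example protein,\n"
    ",xyz,,\n"
    "GENE0002,,,\n"
)
accepted, rejected = read_submissions(sample)
print(len(accepted), len(rejected))
```

Because form submissions and batch submissions share a similar spreadsheet format, one screening routine of this kind could serve both routes.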


Integration: Dataflow for ‘patch’ build

[Dataflow diagram; labels: CAP GFF3, WebApollo, Reference core, Updated geneset, TXT, Patch, Users, Stable IDs, Reports, Updated core, IDs, Reference core, CAP, Release core, Google Fusion Table, Xrefs, Release, Xrefs, Google Form, Metadata, Users, Commit]


Presentation of community annotation


Usage (as of 2015-03-30)

• 31 WebApollo instances (Organisms)

• 3,407 gene models

• Gene metadata (protein-coding loci)

• 4,987 gene symbols

• 512 gene synonyms

• 57,878 gene descriptions

• 910 loci citations from 208 publications


Supplementing annotations

• Community jamborees

• ‘Standard’ improvement (e.g. Sandfly, snail communities)

• Glossina community (e.g. March 2015, Kenya)

• VectorBase

• Default Xref run includes symbol/description assignment via UniProt

• Projection of gene description via orthology from key marker species (e.g. An. gambiae). Due to be deployed for June (VB-2015-06) release.

• Supplemental data from genome papers (e.g. 16 Anopheles spp, Musca)
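The idea of projecting gene descriptions via orthology can be sketched as follows. This is a simplified illustration, not the VectorBase implementation: it assumes one-to-one ortholog pairs held in plain dictionaries, never overwrites an existing description, and every name and identifier in it is hypothetical.

```python
def project_descriptions(reference_desc, orthologs, target_desc):
    """Copy descriptions from reference genes to undescribed target orthologs.

    reference_desc: {reference_gene_id: description}
    orthologs:      {target_gene_id: reference_gene_id}, one-to-one pairs only
    target_desc:    {target_gene_id: description}, existing target annotations
    """
    projected = dict(target_desc)
    for target_id, ref_id in orthologs.items():
        if projected.get(target_id):
            continue  # keep any description the target gene already has
        description = reference_desc.get(ref_id)
        if description:
            projected[target_id] = description
    return projected


# Made-up identifiers, used only to exercise the function.
reference = {"REF1": "example kinase", "REF2": "example transporter"}
pairs = {"TGT1": "REF1", "TGT2": "REF2", "TGT3": "REF9"}
existing = {"TGT2": "curated description"}
print(project_descriptions(reference, pairs, existing))
```

Skipping genes that already carry a description keeps community-curated text ahead of anything projected automatically.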


Deprecated CAP system example