Model Organism Databases and Community Annotation: Gene Structure Annotation at TAIR. Philippe Lamesch, curator@arabidopsis.org

Upload: stephanie-pope

Posted on 26-Mar-2015


Page 1

Model Organism Databases and Community Annotation

Gene Structure Annotation at TAIR

Philippe Lamesch

curator@arabidopsis.org

Page 2

Curator-User collaborations in various databases

Karen Yook

Issak Yosief Tecle

Donghui Li

Philippe Lamesch

Page 3

[Figure: TAIR annotation data flow. Labels: TAIR; ESTs, cDNAs; user submissions; curators; new release; TAIR web curators; gene annotation pipeline]

Page 4

Statistics on various data submissions

30, affecting >1,500 genes

• Novel sequence
• Exon-intron structure
• UTRs
• Splice variants
• Gene type (protein coding, RNA gene, pseudogene)

Page 5

Gene structure & sequence info at TAIR

• GFF file: exon/intron data
• Gene model page: FASTA sequence
• Genome browsers: Seqview & Gbrowse
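The GFF file stores exons explicitly; introns are implied by the gaps between consecutive exons of the same transcript. A minimal sketch, assuming GFF3 input with one Parent per exon row (the transcript ID and coordinates below are hypothetical toy values, not TAIR records):

```python
from collections import defaultdict

def introns_from_gff3(lines):
    """Return {transcript_id: [(intron_start, intron_end), ...]} from GFF3 exon rows.

    GFF3 coordinates are 1-based and inclusive. For simplicity this assumes
    each exon has a single Parent (GFF3 also allows comma-separated parents).
    """
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ##directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon rows define the exon/intron structure
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "Parent" in attrs:
            exons[attrs["Parent"]].append((int(cols[3]), int(cols[4])))

    introns = {}
    for tx_id, spans in exons.items():
        spans.sort()
        # an intron runs from one base after an exon ends to one base before the next starts
        introns[tx_id] = [(prev_end + 1, next_start - 1)
                          for (_, prev_end), (next_start, _) in zip(spans, spans[1:])]
    return introns

# Toy records with hypothetical IDs and coordinates
gff = [
    "##gff-version 3",
    "Chr1\texample\texon\t100\t200\t.\t+\t.\tParent=GENE1.1",
    "Chr1\texample\texon\t301\t450\t.\t+\t.\tParent=GENE1.1",
    "Chr1\texample\texon\t600\t700\t.\t+\t.\tParent=GENE1.1",
]
print(introns_from_gff3(gff))  # {'GENE1.1': [(201, 300), (451, 599)]}
```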

Page 6

2 types of data submission

• Small sets: mostly gene structure updates

• Genome-wide lists

Page 7

Submitting Gene structure data to TAIR

• Download submission form
Gene reannotation submission form fields: Chromosome, Gene Name, Gene Description, cDNA Sequence, Protein Sequence, GenBank entry, Contact Information, Method Description, Publication

http://www.arabidopsis.org/submit/gene_annotation.submission.jsp
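The form fields listed above can be pictured as one record per gene. The sketch below is a hypothetical Python dataclass: the class name, the snake_case field names, treating Publication as optional and the tab-separated layout are all assumptions for illustration, not TAIR's actual form or file specification.

```python
from dataclasses import dataclass, fields

@dataclass
class GeneReannotationSubmission:
    """Hypothetical record whose fields mirror those listed on the submission form."""
    chromosome: str
    gene_name: str
    gene_description: str
    cdna_sequence: str
    protein_sequence: str
    genbank_entry: str
    contact_information: str
    method_description: str
    publication: str = ""  # assumed optional: empty if the result is not (yet) published

    @classmethod
    def tsv_header(cls):
        """Column names in field order, tab-separated."""
        return "\t".join(f.name for f in fields(cls))

    def to_tsv_row(self):
        """Serialize this submission as one tab-delimited row."""
        values = [getattr(self, f.name) for f in fields(self)]
        # a stray tab or newline would shift every following column
        if any("\t" in v or "\n" in v for v in values):
            raise ValueError("field values must not contain tabs or newlines")
        return "\t".join(values)

sub = GeneReannotationSubmission(
    chromosome="1", gene_name="GENE1", gene_description="example protein",
    cdna_sequence="ATGGCTTAA", protein_sequence="MA",
    genbank_entry="(accession)", contact_information="name@example.org",
    method_description="RT-PCR and sequencing",
)
print(sub.tsv_header())
print(sub.to_tsv_row())
```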

Page 8

Submitting Gene structure data to TAIR

• Submit a tab-delimited or GFF file (especially for large data sets)

http://www.arabidopsis.org/submit/gene_annotation.submission.jsp
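GFF3 describes a gene model as nine tab-separated columns per feature, linked through ID and Parent attributes. A minimal sketch of how a submitter might write such a file, assuming a single transcript per gene, a hypothetical ".1" transcript suffix, a hypothetical "cds-" ID prefix, and a CDS that begins at the start codon:

```python
def gene_model_to_gff3(seqid, gene_id, strand, exons, cds, source="submitter"):
    """Return GFF3 lines for one gene with a single transcript.

    exons and cds are lists of (start, end) pairs, 1-based and inclusive.
    Assumes the first CDS segment (in transcription order) starts at the start codon.
    """
    tx_id = f"{gene_id}.1"  # hypothetical transcript ID
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)

    def row(ftype, s, e, phase, attrs):
        # the 9 GFF3 columns; score is left as "."
        return "\t".join([seqid, source, ftype, str(s), str(e), ".", strand, phase, attrs])

    lines = ["##gff-version 3",
             row("gene", start, end, ".", f"ID={gene_id}"),
             row("mRNA", start, end, ".", f"ID={tx_id};Parent={gene_id}")]
    lines += [row("exon", s, e, ".", f"Parent={tx_id}") for s, e in sorted(exons)]

    # Phase = bases to skip at the start of a segment to reach the next codon;
    # segments are walked in transcription order (descending on the minus strand).
    coded = 0
    for s, e in sorted(cds, reverse=(strand == "-")):
        lines.append(row("CDS", s, e, str((3 - coded % 3) % 3), f"ID=cds-{tx_id};Parent={tx_id}"))
        coded += e - s + 1
    return lines

for line in gene_model_to_gff3("Chr1", "GENE1", "+",
                               exons=[(100, 200), (301, 450)],
                               cds=[(151, 200), (301, 400)]):
    print(line)
```

In the example the first CDS segment is 50 bp long, so the second segment gets phase 1: one base completes the codon that the first segment left unfinished.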

Page 9

2 types of data submission

• Small sets: mostly gene structure updates

• Genome-wide lists

Page 10

Gene Annotation Submission: Example (1) of small dataset

Randall Shultz: Reannotation of 4 genes coding for core DNA replication proteins

AT1G19080

Page 11

• Have a look at the current structure of that gene
• Identify the suggested structure difference
• Analyze evidence supporting the structure update
• Update the gene structure

Gene Annotation Submission: Complex gene structure

Page 12

• Have a look at the current structure of that gene

Gene Annotation Submission: Example of small dataset

[Figure: Apollo software interface. Labels: intronless gene; multi-exon gene; ESTs; protein similarity]

Page 13

• Have a look at the current structure of that gene
• Identify the suggested structure difference

Gene Annotation Submission: Example of small dataset

[Figure: Blast2Seq alignment of Seq 1 and Seq 2; the TAIR7 gene extends at position 115]
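Alongside a sequence alignment such as Blast2Seq, the suggested difference can also be stated in terms of exon coordinates. A minimal sketch, assuming both the current and the proposed model are available as lists of (start, end) exon pairs in genomic coordinates; this set comparison is a simplification for illustration, not the procedure shown on the slide:

```python
def compare_models(current, proposed):
    """Summarize exon-level differences between two models given as (start, end) lists."""
    cur, new = set(current), set(proposed)
    return {
        "unchanged": sorted(cur & new),
        "only_current": sorted(cur - new),
        "only_proposed": sorted(new - cur),
        # shift of the model's outer boundaries in genomic coordinates (proposed minus current)
        "start_shift": min(s for s, _ in proposed) - min(s for s, _ in current),
        "end_shift": max(e for _, e in proposed) - max(e for _, e in current),
    }

# Hypothetical example: the proposed model lengthens the last exon by 70 bp
current = [(100, 200), (301, 450)]
proposed = [(100, 200), (301, 520)]
print(compare_models(current, proposed))
```

An exon that appears in both "only" lists with the same start, as (301, 450) and (301, 520) do here, points to a moved boundary rather than an added or deleted exon.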

Page 14

• Have a look at the current structure of that gene
• Identify the suggested structure difference
• Analyze evidence supporting the structure update

Gene Annotation Submission: Example of small dataset

ESTs and cDNAs confirm R.S.’s gene structure reannotation
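One simple way to check whether ESTs and cDNAs confirm a proposed structure is to ask, for every intron of the model, how many transcript alignments contain exactly the same intron. A minimal sketch, assuming the ESTs/cDNAs have already been aligned to the genome and are given as (start, end) blocks; requiring an exact donor and acceptor match is an assumption of this sketch:

```python
def introns(exons):
    """Intron intervals implied by a list of (start, end) exon blocks."""
    spans = sorted(exons)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(spans, spans[1:])]

def intron_support(model_exons, evidence_alignments):
    """For each intron of the model, count evidence alignments sharing exactly that intron."""
    evidence_introns = [set(introns(aln)) for aln in evidence_alignments]
    return {i: sum(i in ev for ev in evidence_introns) for i in introns(model_exons)}

# Hypothetical model and EST alignments
model = [(100, 200), (301, 450), (600, 700)]
ests = [
    [(150, 200), (301, 400)],  # spans intron 1 with the same donor and acceptor
    [(420, 450), (600, 650)],  # spans intron 2
    [(180, 200), (320, 450)],  # different acceptor site, so it does not count
]
support = intron_support(model, ests)
unsupported = [i for i, n in support.items() if n == 0]
print(support, unsupported)
```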

Page 15

AT1G08260

Gene Annotation Submission: Example of small dataset

Page 16

Gene Annotation Submission: Example of small dataset

• Have a look at the current structure of that gene
• Identify the suggested structure difference
• Analyze evidence supporting the structure update
• Update the gene structure

Page 17

Complex gene structure

Page 18

Gene Annotation Submission: Complex gene structure

• Have a look at the current structure of that gene
• Identify the suggested structure difference
• Analyze evidence supporting the structure update
• Update the gene structure

Page 19

Gene Annotation Submission: Complex gene structure

• Have a look at the current structure of that gene
• Identify the suggested structure difference
• Analyze evidence supporting the structure update
• Update the gene structure

Page 20

With a little help from the submitter…

[Figure: sequence alignment]

Page 21

Page 22

Page 23

Gene Annotation Submission: Large datasets

Dataset name   Dataset type              # of genes
Hanada         Specific gene type        687
Brendel        Large set                 25
Ceres          Large set                 26
Eugene         Genome-wide predictions   34
Gnomon         Genome-wide predictions   326
uORFs          Specific gene type        64
Rhoades        Specific gene type        23
miRNA          Specific gene type        58

Page 24

Integrating large gene structure datasets into the TAIR annotation

An active process

• Gather evidence supporting the gene update
• Read publication(s), if any
• Categorize genes based on strength of evidence
• Load gene structures into Apollo
• Decide which genes will be integrated into the TAIR annotation and which will be shown as a track in Gbrowse
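Categorizing by strength of evidence and then deciding between "integrate" and "Gbrowse track" can be expressed as two small functions. This is a sketch only: the two evidence types, the tier names and the rule that only "strong" genes are integrated are hypothetical choices for illustration, not TAIR's actual criteria.

```python
def evidence_tier(est_cdna_hits, protein_similarity):
    """Hypothetical tiers: transcript evidence plus protein similarity counts as strong."""
    if est_cdna_hits > 0 and protein_similarity:
        return "strong"
    if est_cdna_hits > 0 or protein_similarity:
        return "moderate"
    return "weak"

def triage(candidates):
    """Map gene ID -> 'integrate' or 'gbrowse_track' (hypothetical policy)."""
    decisions = {}
    for gene_id, (hits, similar) in candidates.items():
        tier = evidence_tier(hits, similar)
        # only strongly supported models go into the annotation in this sketch
        decisions[gene_id] = "integrate" if tier == "strong" else "gbrowse_track"
    return decisions

# gene ID -> (number of supporting ESTs/cDNAs, has protein similarity); toy values
candidates = {"cand1": (3, True), "cand2": (0, True), "cand3": (0, False)}
print(triage(candidates))
```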

Page 25

Example: Hanada et al 2007

Page 26

Hanada et al 2007

– Constrained or expressed: 3633
– Constrained and expressed: 934
– Overlap TAIR7: 844
– Overlap TE coordinates: 768
– Cluster within 350 bp: 662
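The list above reads like a sequence of successive filters with the number of candidates left after each step. A generic way to record such a cascade is sketched below with toy data; the predicates are placeholders and are not the criteria used by Hanada et al.

```python
def filter_cascade(candidates, steps):
    """Apply (name, predicate) filters in order; record how many candidates remain after each."""
    remaining, counts = list(candidates), []
    for name, keep in steps:
        remaining = [c for c in remaining if keep(c)]
        counts.append((name, len(remaining)))
    return remaining, counts

# Toy candidates with hypothetical flags
toy = [
    {"id": "c1", "constrained": True, "expressed": True, "overlaps_te": False},
    {"id": "c2", "constrained": True, "expressed": False, "overlaps_te": False},
    {"id": "c3", "constrained": True, "expressed": True, "overlaps_te": True},
    {"id": "c4", "constrained": False, "expressed": False, "overlaps_te": False},
]
steps = [
    ("constrained or expressed", lambda c: c["constrained"] or c["expressed"]),
    ("constrained and expressed", lambda c: c["constrained"] and c["expressed"]),
    ("no TE overlap", lambda c: not c["overlaps_te"]),  # placeholder criterion
]
kept, counts = filter_cascade(toy, steps)
for name, n in counts:
    print(f"{name}: {n}")
```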

Page 27

Of the 7159 genes

- 687 have been integrated into TAIR8

- 2946 are not integrated but are shown in a special Gbrowse track

Hanada et al 2007: Conclusion
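The two counts on this slide add up to the "constrained or expressed" figure on the previous slide, which suggests (as an inference from the numbers alone) that every constrained-or-expressed candidate was either integrated or shown as a track, leaving the remainder of the 7159 genes in neither category:

```latex
687 + 2946 = 3633, \qquad 7159 - 3633 = 3526
```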

Page 28

How to improve the user submission process

• Encourage users to use submission forms
• Improved gene structure submission form with additional columns for information regarding the structure update
• Encourage users to use GFF3 format, especially for large datasets
• Encourage users to provide as much supporting evidence as possible along with their structural dataset
• One-on-one sessions for scientists and curators at science conferences
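If users submit GFF3, a few structural checks can catch common problems before a curator opens the file. A minimal sketch covering only a subset of the GFF3 rules (nine tab-separated columns, 1-based start not exceeding end, a valid strand symbol, a phase on CDS rows); it is not a full validator, and the error messages are illustrative:

```python
VALID_STRANDS = {"+", "-", ".", "?"}  # GFF3: plus, minus, unstranded, unknown

def gff3_line_errors(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    if line.startswith("#") or not line.strip():
        return []  # directives, comments and blank lines are not feature rows
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    errors = []
    if not (start.isdigit() and end.isdigit()) or int(start) < 1:
        errors.append("start and end must be integers >= 1")
    elif int(start) > int(end):
        errors.append("start must not exceed end")
    if strand not in VALID_STRANDS:
        errors.append(f"invalid strand {strand!r}")
    if ftype == "CDS" and phase not in {"0", "1", "2"}:
        errors.append("CDS features need a phase of 0, 1 or 2")
    return errors

print(gff3_line_errors("Chr1\tsrc\texon\t200\t100\t.\t+\t.\tParent=GENE1.1"))
```

Space-separated columns, a frequent problem when files are exported from spreadsheets or word processors, show up as a single-column row.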

Page 29

Non-formatted submissions