The philosophy of biocuration and its use to analyse the fission yeast genome content Valerie Wood

Page 1: The philosophy of biocuration and its use to analyse the fission yeast genome content

The philosophy of biocuration and its use to analyse the fission yeast genome content

Valerie Wood

Page 2: The philosophy of biocuration and its use to analyse the fission yeast genome content

What is Biocuration?

Two main aspects to fission yeast curation

1. Literature curation: involves reading the full text of publications and associating novel biological information with the appropriate genes or features

2. Sequence analysis: to infer biological information for unpublished genes

Page 3: The philosophy of biocuration and its use to analyse the fission yeast genome content

The Challenges

• We need to make annotations as specific (complete depth) and as comprehensive (complete breadth) as possible.

• We need to group similar annotations consistently so that users can:

i) Access required information on a gene-by-gene basis

ii) Analyse their own datasets, e.g. enrichment

iii) Search for candidate genes of interest

iv) Access similar features in other organisms

Page 4: The philosophy of biocuration and its use to analyse the fission yeast genome content

Data gathering for genes of interest

• traditionally small number of genes
• requires detailed literature searching
• time-consuming

Gene 1: RNA recognition motif, mRNA export, protein phosphorylation, nuclear, mitotic cell cycle, phosphorylated, …

Gene 2: SAP domain, mRNA export, nucleolar, RNA elongation (pol II), …

Gene 3: mRNA export, transcription (pol II), …

Gene 4: mRNA export, transcription polyadenylation, …

Gene 5: mRNA export, RNA elongation, …

Gene 6: mRNA export, rRNA transcription, DNA topological change, …

Gene 5000: cell cycle, chromosome segregation, kinetochore assembly, protein localization, …

Not Scalable!

Page 5: The philosophy of biocuration and its use to analyse the fission yeast genome content

Grouping by “feature”

By establishing links between similar features we can begin to identify trends (enrichments and depletions) in the thousands of genes typically obtained in functional genomics datasets

mRNA export: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5

nucleolar: Gene 10, Gene 15, Gene 18, …

phosphorylated: Gene 1, Gene 7, Gene 10, …

transcription: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, …

Cell cycle: Gene 1, Gene 7, Gene 8, …

RNA recognition motif: Gene 1, Gene 7, Gene 8, …
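The switch from a gene-by-gene view to a feature-by-feature view can be sketched in Python as the inversion of a mapping. The gene names and features below are the placeholder examples from the slides, not real annotations, and `group_by_feature` is a hypothetical helper name.

```python
from collections import defaultdict

# Gene-centred view: each gene lists its features (placeholder data from the slides).
gene_features = {
    "Gene 1": ["RNA recognition motif", "mRNA export", "phosphorylated"],
    "Gene 2": ["SAP domain", "mRNA export", "nucleolar"],
    "Gene 3": ["mRNA export", "transcription"],
}


def group_by_feature(gene_to_features):
    """Invert a gene -> features mapping into feature -> sorted list of genes."""
    grouped = defaultdict(set)
    for gene, features in gene_to_features.items():
        for feature in features:
            grouped[feature].add(gene)
    return {feature: sorted(genes) for feature, genes in grouped.items()}


feature_genes = group_by_feature(gene_features)
print(feature_genes["mRNA export"])
```

Once the data are keyed by feature, the size of each group can be compared between a gene list of interest and the whole genome, which is the basis of enrichment analysis.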

Page 6: The philosophy of biocuration and its use to analyse the fission yeast genome content

The literature corpus

What is the size of the ‘annotation problem’?

Fission yeast OR pombe gives 9264
Adding “cell cycle” gives 2871

Solutions
• More curators
• Community curation

Problems
• Funders don’t want to fund curation
• Can we make the community curate?

Page 7: The philosophy of biocuration and its use to analyse the fission yeast genome content

Grant

• Additional curators (2) to ensure comprehensive and deep curation of the literature

• Software to support curation activities (including community curation)

• A computational infrastructure to integrate and display the curated data with the HTP data within Ensembl

Page 8: The philosophy of biocuration and its use to analyse the fission yeast genome content

http://www.sanger.ac.uk/Projects/S_pombe/

Need to make an intuitive web-based user interface where the community can add “consistent” and comprehensive curation

Watch this space!

Page 9: The philosophy of biocuration and its use to analyse the fission yeast genome content

Ontologies

• Ontologies provide a “controlled vocabulary” for biological knowledge

• Consistent, unambiguous descriptions

• Species independent, interpreted identically both within and between genomes, therefore enabling cross-species comparisons

• Provide a way to capture and represent biological knowledge in a computable form

• Ability to annotate to different levels of granularity depending on what is known or what can be inferred

Ontologies Include:

1. A vocabulary of terms (names for concepts)
2. Definitions
3. Defined logical relationships to each other

Page 10: The philosophy of biocuration and its use to analyse the fission yeast genome content

Disambiguation and Grouping

The same name can refer to different concepts: bud initiation? Tooth bud initiation, cell bud initiation, plant bud initiation

Conversely, different names are used for the same concept: MVB sorting, multivesicular body sorting, late endosome to vacuole transport. Alternative names are exact synonyms.

This principle applies to any type of curation; for example, when describing phenotypes, similar cells can be described as “skittle”, “bottle” or “dumbbell”
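A minimal Python sketch of this disambiguation, assuming a hand-built lookup table. The choice of "late endosome to vacuole transport" as the preferred name is an assumption made for illustration, and `resolve` is a hypothetical helper.

```python
# Assumed synonym table: every exact synonym points at one preferred term name.
# Which name is "preferred" is an assumption for this sketch.
EXACT_SYNONYMS = {
    "MVB sorting": "late endosome to vacuole transport",
    "multivesicular body sorting": "late endosome to vacuole transport",
    "late endosome to vacuole transport": "late endosome to vacuole transport",
}

# Ambiguous labels map to several candidate terms; a curator must choose one.
AMBIGUOUS = {
    "bud initiation": [
        "tooth bud initiation",
        "cell bud initiation",
        "plant bud initiation",
    ],
}


def resolve(label):
    """Return the candidate term names for a free-text label (empty if unknown)."""
    if label in EXACT_SYNONYMS:
        return [EXACT_SYNONYMS[label]]
    return AMBIGUOUS.get(label, [])


print(resolve("MVB sorting"))
print(resolve("bud initiation"))
```

A label that resolves to exactly one term can be grouped automatically; a label with several candidates needs a curator's decision.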

Page 11: The philosophy of biocuration and its use to analyse the fission yeast genome content

Demonstrating ontology principles with GO

GO is 3 ontologies:

F molecular function (activity, GTPase, transporter, receptor)
P biological process (cell division, transcription, gluconeogenesis)
C cellular component (location or complex)

Page 12: The philosophy of biocuration and its use to analyse the fission yeast genome content

DAG Structure

DAG: Directed Acyclic Graph

Many-to-many parental relationship

Each child may have one or more parents

Hierarchy

One-to-many parental relationship

Each child has only one parent
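The difference between the two structures can be sketched in Python with a child-to-parents mapping. The terms and edges below are illustrative only and are not taken from a GO release. The function names are hypothetical, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Child -> set of direct parents (illustrative edges, not a GO release).
parents = {
    "cell": set(),
    "membrane": {"cell"},
    "chloroplast": {"cell"},
    "chloroplast membrane": {"membrane", "chloroplast"},  # two parents
}


def is_acyclic(child_to_parents):
    """True if following parent links can never lead back to the starting term."""
    try:
        list(TopologicalSorter(child_to_parents).static_order())
        return True
    except CycleError:
        return False


def is_hierarchy(child_to_parents):
    """True if the graph is acyclic and every child has at most one parent."""
    single_parent = all(len(p) <= 1 for p in child_to_parents.values())
    return single_parent and is_acyclic(child_to_parents)


print(is_acyclic(parents), is_hierarchy(parents))
```

In this example the graph is a valid DAG but not a simple hierarchy, because "chloroplast membrane" has two parents.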

Page 13: The philosophy of biocuration and its use to analyse the fission yeast genome content

Relationships between terms

cell

membrane    chloroplast

mitochondrial membrane    chloroplast membrane

Relationship types: is-a, part-of

Page 14: The philosophy of biocuration and its use to analyse the fission yeast genome content

Inheritance

An important feature of GO is that broader parents give rise to more specific children. When a gene is directly annotated to a term (e.g. DNA replication), it is automatically indirectly annotated to all of its parent terms.

This allows curators to assign terms at different levels of granularity, depending on what is known or can be inferred

gene A
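The inheritance rule can be sketched in Python as a walk up the parent links. The term names and edges below are assumptions chosen for illustration, not the actual GO graph, and `ancestors` and `propagate` are hypothetical helper names.

```python
# Child -> direct parents (assumed edges for illustration, not a GO release).
PARENTS = {
    "DNA replication": {"DNA metabolic process", "biosynthetic process"},
    "DNA metabolic process": {"metabolic process"},
    "biosynthetic process": {"metabolic process"},
    "metabolic process": {"biological process"},
    "biological process": set(),
}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Extend each gene's direct annotations with all indirect (ancestor) terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct.items()
    }


all_annotations = propagate({"gene A": {"DNA replication"}}, PARENTS)
print(sorted(all_annotations["gene A"]))
```

A query for any ancestor term, such as "metabolic process", then also returns gene A, even though the curator only annotated it to the most specific term.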

Page 15: The philosophy of biocuration and its use to analyse the fission yeast genome content

Ontologies.....

• Provide a standard for annotation

• Have 2 components: the ontology and the annotations

• Allow experimental work to be evaluated in the context of other experimental data which may be annotated at different levels of granularity

• Allow biologists to search and analyse data (particularly for identifying groups of overrepresented genes in large-scale experiments)

• Become increasingly powerful as the ontologies and annotations are refined

Page 16: The philosophy of biocuration and its use to analyse the fission yeast genome content

Other annotation types

• products (special case, unique descriptors)
• annotation status
• species distribution
• orthology
• phenotype data, will use PATO
• protein modifications, will use MOD
• metabolites, will use ChEBI (Chemical Entities of Biological Interest)
• sequence features, will use SO
• protein-protein interactions, will use MI and BioGRID

Increasingly, features will be described using “cross products” derived from multiple ontologies:

e.g. “response to a specific drug” will be made with the GO biological process term “response to drug” and a drug from ChEBI

e.g. phenotypes are typically annotated using a PATO “quality” term combined with a wild-type GO process (e.g. conjugation, defective; crossover formation, abolished)
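One possible way to represent such a cross product in Python is as a small record that pairs a term from each ontology. This is only a sketch: the class name, the field names and the gene names are hypothetical, and term names are used instead of identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """A phenotype built from two ontologies (hypothetical record layout)."""

    gene: str
    process: str  # wild-type GO biological process term name
    quality: str  # PATO-style quality term name

    def label(self):
        """Human-readable form, e.g. 'conjugation, defective'."""
        return f"{self.process}, {self.quality}"


examples = [
    PhenotypeAnnotation("gene A", "conjugation", "defective"),
    PhenotypeAnnotation("gene B", "crossover formation", "abolished"),
]

for annotation in examples:
    print(annotation.gene, "->", annotation.label())
```

Because each part stays a separate ontology term, annotations can still be grouped by process alone or by quality alone.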

Page 17: The philosophy of biocuration and its use to analyse the fission yeast genome content


Page 18: The philosophy of biocuration and its use to analyse the fission yeast genome content

The curation process and annotation status

Page 19: The philosophy of biocuration and its use to analyse the fission yeast genome content

GO Curation Strategy

Manual Curation
• Emphasis on primary literature: 1829 publications, 17655 annotations
• Manual inspection of sequence similarity: 9708 annotations

Computational Mappings
• Inferred electronically: 4127 annotations

No data for F, P or C: 2542

Total 34032

Page 20: The philosophy of biocuration and its use to analyse the fission yeast genome content

Evidence Codes Used

Oct 07   Dec 08   June 09   Code   Meaning
8618     8889     9076      IDA    inferred from direct assay
776      991      1083      IPI    inferred from physical interaction
901      1129     1164      IGI    inferred from genetic interaction
1089     1091     1106      TAS    traceable author statement
1073     1164     1264      IC     inferred by curator
9045     9706     9708      ISS    inferred from sequence similarity
1912     2328     2455      IMP    inferred from mutant phenotype
522      595      617       NAS    non-traceable author statement
6397     4620     4127      IEA    inferred from electronic annotation

2542 ND no data, root node annotations
185 IEP
702 RCA

Totals: 30333 (Oct 07), 31676 (Dec 08), 34032 (June 09)

Page 21: The philosophy of biocuration and its use to analyse the fission yeast genome content

GO annotation progress

Molecular Function: 9049
Biological Process: 10985
Cellular Component: 13998
Total 34032

30,616 annotations to 3080 terms 06/06/07

31,676 annotations to 3263 terms 13/12/08

34,035 annotations to 3361 terms 16/06/09

Page 22: The philosophy of biocuration and its use to analyse the fission yeast genome content

Analysing the curated data

Page 23: The philosophy of biocuration and its use to analyse the fission yeast genome content

GO aspect coverage

Total 5025

All 3 aspects unknown 118

Page 24: The philosophy of biocuration and its use to analyse the fission yeast genome content

Protein Annotation Status

Categories: experimentally characterised (known); inferred from orthology (known); conserved unknown; sequence orphan; pombe specific family

Counts and shares of the total:

2133 (43.0 %)
1817 (36.7 %)
639 (12.9 %)
312 (6.3 %)
56 (1.1 %)

Total 4957
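The percentages on this slide follow directly from the counts. The arithmetic can be checked with a few lines of Python (a sketch; the variable names are arbitrary):

```python
# Counts from the slide, largest first.
counts = [2133, 1817, 639, 312, 56]

total = sum(counts)
# Share of the total for each count, rounded to one decimal place as on the slide.
shares = [round(100 * count / total, 1) for count in counts]

for count, share in zip(counts, shares):
    print(f"{count:>5}  {share:>5.1f} %")
print(f"Total {total}")
```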

Page 25: The philosophy of biocuration and its use to analyse the fission yeast genome content

The conserved “unknown” unknowns

639

98 Bacteria, Fungi, Plant

196 Fungi only

346 to Metazoa; of these 235 1:1; of these 131 nuclear

over 100 Nature papers?

Page 26: The philosophy of biocuration and its use to analyse the fission yeast genome content

This is the 53 at the top of the list

Splicing?

Page 27: The philosophy of biocuration and its use to analyse the fission yeast genome content

Kim D-U, Hayles J, Kim D et al (manuscript submitted)

Page 28: The philosophy of biocuration and its use to analyse the fission yeast genome content

“Slimming”

• High-level view of GO (genes annotated to granular terms are mapped to higher-level terms)

• Allows users to group genes into broader categories to assess their distribution; useful for large-scale, genome-wide analyses or smaller gene sets

• Different annotation groups have created specific GO slims, which are available at GO’s FTP site (pombe now has an “official GO slim” which gives good coverage of high-level processes)

• You can create and use your own GO slim with high-level terms of interest

• CARE: not a gene product count, as gene products have multiple annotations (will explain this in the workshop)
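A minimal Python sketch of slimming, assuming a toy parent map and a two-term slim. The term names, edges and gene names are invented for illustration, and the helper names are hypothetical. It also shows why slim counts are not gene product counts.

```python
from collections import Counter

# Child -> direct parents (toy edges for illustration, not a GO release).
PARENTS = {
    "mitotic cell cycle": {"cell cycle"},
    "kinetochore assembly": {"cell cycle"},
    "transcription (pol II)": {"transcription"},
}
SLIM = {"cell cycle", "transcription"}  # assumed high-level terms of interest


def slim_terms_for(term, slim, parents):
    """Return the slim terms that `term` is, or descends from."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
        stack.extend(parents.get(current, ()))
    return found


def slim_counts(annotations, slim, parents):
    """Count genes per slim term and list genes that reach no slim term."""
    counts, unslimmed = Counter(), []
    for gene, terms in sorted(annotations.items()):
        hits = set().union(*(slim_terms_for(t, slim, parents) for t in terms))
        counts.update(hits)  # each gene counts once per slim term it reaches
        if not hits:
            unslimmed.append(gene)
    return counts, unslimmed


annotations = {
    "gene A": {"mitotic cell cycle", "transcription (pol II)"},
    "gene B": {"kinetochore assembly"},
    "gene C": {"mRNA export"},
}
counts, unslimmed = slim_counts(annotations, SLIM, PARENTS)
print(dict(counts), unslimmed)
```

Gene A is counted under both slim terms, so the slim counts add up to 3 although only two genes are covered, and gene C is annotated but falls outside the slim.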

Page 29: The philosophy of biocuration and its use to analyse the fission yeast genome content

Process Super Slim

The slim counts add up to 8454, i.e. more than the number of genes. The categories are not mutually exclusive, so it doesn’t make sense to put them in a pie chart and show them as percentages.

It is also important to show which genes are not annotated (root node annotations), and which genes are not in the slim set but are annotated to other terms.

Page 30: The philosophy of biocuration and its use to analyse the fission yeast genome content

Term Enrichment

• Finding significantly enriched terms shared among a list of genes

• Discover what these genes may have in common

• Statistical measure of how likely it is that your differentially regulated genes fall into that category by chance
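The slide does not name a particular statistical test. One common choice for term enrichment is the one-sided hypergeometric test, sketched below in Python under that assumption. The function name and the numbers in the example are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the genome, K: genes annotated to the term,
    n: genes in the list of interest, k: list genes annotated to the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical example: 10 of 50 listed genes carry a term that
# annotates 100 of 5000 genes genome-wide.
p = enrichment_p_value(k=10, n=50, K=100, N=5000)
print(f"p = {p:.3g}")
```

A small p-value means the overlap is unlikely to have arisen by chance. In practice the test is repeated for every term, so the p-values need correcting for multiple testing.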

Page 31: The philosophy of biocuration and its use to analyse the fission yeast genome content

This is a comparative enrichment analysis (fission yeast vs. budding yeast).

It shows processes enriched in the essential gene set and in the non-essential gene set.

The enrichment also identified many enriched child terms, but the results were presented as a “slim” of the high-level terms; the complete term lists are presented in supplementary data.

Kim D-U, Hayles J, Kim D et al (manuscript submitted)

Page 32: The philosophy of biocuration and its use to analyse the fission yeast genome content

Acknowledgements

• Martin Aslett (WT Sanger UK)

• Midori Harris and the GO editorial team (EBI UK)

• Jacky Hayles (CRUK) and the deletion project consortium (Kwang-Lae Hoe)

Page 33: The philosophy of biocuration and its use to analyse the fission yeast genome content

Data mining, complex

                                 A     B     C     D     E     F     G     H     I     J
A cell division               1018   356   224    31    49     2   271   132     -     -
B transcription>translat.           1367    53    66   172     0   111    47     -     -
C cytoskeletal/morph/vmt                   842   152    32    30    78   160     -     -
D metabolic pathways                             800   196    61    36    52     -     -
E mitochondrial translation                            732    98    47    14     -     -
F membrane transport                                         299     6     2     -     -
G stress                                                           422    65     -     -
H signal transduction                                                    369     -     -
I other                                                                        323     -
J none                                                                               988

What: You can data mine the entire genome to find overlaps and intersections between terms of interest to target genes for further study

UPDATE
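An overlap table of this kind can be computed from sets of genes per category. The sketch below is in Python and assumes that the diagonal holds the size of each category and the off-diagonal cells hold the number of shared genes. The gene identifiers are placeholders and `overlap_matrix` is a hypothetical helper.

```python
from itertools import combinations_with_replacement

# Category -> set of genes (placeholder identifiers, not real data).
category_genes = {
    "cell division": {"g1", "g2", "g3"},
    "stress": {"g2", "g4"},
    "signal transduction": {"g3", "g4", "g5"},
}


def overlap_matrix(sets):
    """Return {(a, b): number of shared genes} for the upper triangle, diagonal included."""
    names = list(sets)
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations_with_replacement(names, 2)
    }


matrix = overlap_matrix(category_genes)
for (a, b), shared in matrix.items():
    print(f"{a} / {b}: {shared}")
```

The genes behind any interesting cell can then be listed with a plain set intersection, e.g. `category_genes["cell division"] & category_genes["stress"]`.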

Page 34: The philosophy of biocuration and its use to analyse the fission yeast genome content

Additional points

• A gene product can have several functions, cellular locations and be involved in many processes

• Annotation of a gene product to one ontology is independent from its annotation to other ontologies

• Annotations are only to terms reflecting a normal activity or location

• Usage of ‘unknown’ GO terms

Page 35: The philosophy of biocuration and its use to analyse the fission yeast genome content

Modifying the interpretation of an annotation: the Qualifier column

1. NOT
• a gene product is NOT associated with the GO term
• used to document conflicting claims in the literature

2. Contributes to
• distinguishes between individual subunit functions and whole complex functions
• used with GO Function Ontology

3. Colocalizes with
• transiently or peripherally associated with an organelle or complex
• used with GO Component Ontology
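The pairing of qualifiers with ontologies can be expressed as a simple validity check. This Python sketch assumes the underscore spellings `contributes_to` and `colocalizes_with` and the single-letter aspect codes F, P and C. The function name is hypothetical.

```python
# Qualifier -> aspects it may be used with (F = function, P = process, C = component).
ALLOWED_ASPECTS = {
    "NOT": {"F", "P", "C"},
    "contributes_to": {"F"},
    "colocalizes_with": {"C"},
}


def qualifier_is_valid(qualifier, aspect):
    """True if the qualifier is known and allowed for the given GO aspect."""
    return aspect in ALLOWED_ASPECTS.get(qualifier, set())


print(qualifier_is_valid("contributes_to", "F"))
print(qualifier_is_valid("colocalizes_with", "P"))
```

A check like this can be run over an annotation file before loading it, so that a qualifier used with the wrong ontology is flagged early.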

Page 36: The philosophy of biocuration and its use to analyse the fission yeast genome content

Electronic Annotations

Fatty acid biosynthesis (Swiss-Prot Keyword) → GO:Fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)
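Electronic annotation of this kind amounts to a lookup from external identifiers to GO terms. The Python sketch below uses the three mappings shown on the slide. The key prefixes `SP_KW:` and `InterPro:`, the gene name and the function name are assumptions made for illustration.

```python
# External reference -> GO identifier (the three mappings shown on the slide).
XREF_TO_GO = {
    "SP_KW:Fatty acid biosynthesis": "GO:0006633",
    "EC:6.4.1.2": "GO:0003989",
    "InterPro:IPR000438": "GO:0003989",
}


def electronic_annotations(gene_xrefs):
    """Map each gene's cross-references to a sorted list of (GO id, 'IEA') pairs."""
    result = {}
    for gene, xrefs in gene_xrefs.items():
        go_ids = {XREF_TO_GO[x] for x in xrefs if x in XREF_TO_GO}
        result[gene] = [(go_id, "IEA") for go_id in sorted(go_ids)]
    return result


annotated = electronic_annotations(
    {"gene X": ["EC:6.4.1.2", "InterPro:IPR000438", "SP_KW:Fatty acid biosynthesis"]}
)
print(annotated["gene X"])
```

Two different cross-references that map to the same GO term yield a single annotation for the gene, tagged with the IEA evidence code.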

Page 37: The philosophy of biocuration and its use to analyse the fission yeast genome content

Unknown vs. Unannotated

• Direct root node annotations are used when the curator has determined that there is no existing literature to support an annotation.
– Biological process GO:0000004
– Molecular function GO:0005554
– Cellular component GO:0008372

• NOT the same as having no annotation at all
– No annotation means that no one has looked yet
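The distinction can be made explicit in code. This Python sketch uses the three identifiers listed on the slide. The function name, the status labels and the example term are hypothetical.

```python
# "Unknown" term for each aspect, as listed on the slide.
UNKNOWN_TERM = {"P": "GO:0000004", "F": "GO:0005554", "C": "GO:0008372"}


def aspect_status(term_ids, aspect):
    """Classify one gene's annotations within a single GO aspect."""
    terms = set(term_ids)
    if not terms:
        return "unannotated"  # no one has looked yet
    if terms == {UNKNOWN_TERM[aspect]}:
        return "unknown"  # a curator looked and found nothing
    return "annotated"


print(aspect_status([], "P"))
print(aspect_status(["GO:0000004"], "P"))
print(aspect_status(["GO:0006633"], "P"))
```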

Page 38: The philosophy of biocuration and its use to analyse the fission yeast genome content

All three aspects unknown 105 (564 S. cerevisiae)

Function 3542 (includes protein binding)

Biological Process 4019

Cellular Component 4821

14672679

3279(3455)

191 54

18

Total 5004 (5780 S. cerevisiae)

993

GO aspect coverage (old)