Crowd Sourcing Methods to Annotate Biological Processes Andra Waagmeester Micelio

Upload: hillary-nicholson

Post on 18-Jan-2016


Page 1: Crowd Sourcing Methods to Annotate Biological Processes Andra Waagmeester Micelio

Crowd Sourcing Methods to Annotate Biological Processes

Andra Waagmeester

Micelio

Page 2

Brothers Grimm: Stone soup

James Taylor http://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf

Page 3

“We try to analyze a 3D cell on a 2D level.” - Mike Washburn

Page 4

Subsequently, we represent the multi-dimensional data space of this 2D view of the cell, again in a 2D space

Page 5

Relational databases

Gene name   ID      Identifier
ZNF635m     18801   23126
…           …       …

Gene name   ID      Identifier
ZNF280E     POGZ    ENSG00000143442
…           …       …

Page 6

Relational databases

Gene name   ID      Identifier
ZNF635m     18801   23126
…           …       …

Gene name   ID      Identifier
ZNF280E     POGZ    ENSG00000143442
…           …       …

HGNC ID   HGNC Symbol   Name
18801     POGZ          Pogo transposable element with ZNF domain
…         …             …
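The bridging role of the HGNC table can be sketched with an in-memory SQLite database. This is only an illustration: the table names (source_a, source_b, hgnc) and column names are hypothetical, and reading the "ID" column of each source table as an HGNC ID or HGNC symbol is an assumption based on the values shown on the slide.

```python
import sqlite3

# In-memory database holding the three tables from the slide.
# Table and column names are hypothetical; only the values come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_a (gene_name TEXT, id TEXT, identifier TEXT);
CREATE TABLE source_b (gene_name TEXT, id TEXT, identifier TEXT);
CREATE TABLE hgnc (hgnc_id TEXT, hgnc_symbol TEXT, name TEXT);
INSERT INTO source_a VALUES ('ZNF635m', '18801', '23126');
INSERT INTO source_b VALUES ('ZNF280E', 'POGZ', 'ENSG00000143442');
INSERT INTO hgnc VALUES ('18801', 'POGZ', 'Pogo transposable element with ZNF domain');
""")

# The two source tables share no column values directly.
# The HGNC table links them: one side by HGNC ID, the other by HGNC symbol.
rows = conn.execute("""
SELECT a.gene_name, b.gene_name, h.hgnc_symbol, a.identifier, b.identifier
FROM hgnc AS h
JOIN source_a AS a ON a.id = h.hgnc_id
JOIN source_b AS b ON b.id = h.hgnc_symbol
""").fetchall()

for row in rows:
    print(row)
```

The join only works because someone knew that "ID" means something different in each source table, which is the kind of implicit knowledge the following slides try to make explicit.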

Page 7

Graph databases

• ZNF635m is_a gene
• ZNF635m has_Entrez_ID “23126”
• ZNF635m ID “18801”
• “18801” has_symbol “POGZ”
• ZNF280E has_Ensembl_ID “ENSG00000143442”
• ZNF280E HGNC_symbol “POGZ”
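The statements above can be held as plain subject, predicate, object tuples. The following is a minimal sketch, not a real graph database: the match() helper is hypothetical and simply treats None as a wildcard.

```python
# Triples copied from the slide; match() is a hypothetical helper for illustration.
TRIPLES = [
    ("ZNF635m", "is_a", "gene"),
    ("ZNF635m", "has_Entrez_ID", "23126"),
    ("ZNF635m", "ID", "18801"),
    ("18801", "has_symbol", "POGZ"),
    ("ZNF280E", "has_Ensembl_ID", "ENSG00000143442"),
    ("ZNF280E", "HGNC_symbol", "POGZ"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Everything that points at the symbol "POGZ", whatever the predicate is called.
pogz_links = match(TRIPLES, obj="POGZ")
print(pogz_links)
```

Note that the two matches use different predicate names (has_symbol and HGNC_symbol) for the same relation, so moving to a graph model does not by itself reconcile the sources.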

Page 8

Something more profound is needed than relabeling old wine in new bottles

Page 9

Unique Resource Identifier

• HGNCID:18801
• ENSEMBL:ENSG00000143442
• ENTREZ:23126
• PMID:20196795
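Prefixed identifiers like these become globally unambiguous once each prefix is bound to a URI namespace. The sketch below expands them under stated assumptions: the ncbigene namespace follows the pattern used in the WikiPathways Turtle excerpt later in the deck, while the other three namespaces and the expand() helper are assumptions made for illustration.

```python
# Prefix-to-namespace map. Only the ncbigene pattern appears in this deck;
# the hgnc, ensembl and pubmed namespaces are assumed for illustration.
PREFIXES = {
    "ENTREZ": "http://identifiers.org/ncbigene/",
    "HGNCID": "http://identifiers.org/hgnc/",      # assumed
    "ENSEMBL": "http://identifiers.org/ensembl/",  # assumed
    "PMID": "http://identifiers.org/pubmed/",      # assumed
}


def expand(compact_id):
    """Turn 'PREFIX:local' into a full URI; raise ValueError for unknown prefixes."""
    prefix, _, local_id = compact_id.partition(":")
    if prefix not in PREFIXES or not local_id:
        raise ValueError(f"cannot expand {compact_id!r}")
    return PREFIXES[prefix] + local_id


for compact_id in ["HGNCID:18801", "ENSEMBL:ENSG00000143442", "ENTREZ:23126", "PMID:20196795"]:
    print(compact_id, "->", expand(compact_id))
```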

Page 10

• ENTREZ:23126 rdf:type dbpedia:Gene
• ENTREZ:23126 rdfs:label “ZNF635m”
• ENTREZ:23126 rdfs:seeAlso HGNCID:18801
• HGNCID:18801 rdfs:label “POGZ”
• ENSEMBL:ENSG00000143442 rdf:type dbpedia:Gene
• ENSEMBL:ENSG00000143442 rdfs:label “ZNF280E”
• ENSEMBL:ENSG00000143442 rdfs:seeAlso “HGNCID:POGZ”

Page 11

Gerhard Michal 1974

Page 12
Page 13

Pathway external references

http://www.wikipathways.org/index.php/Pathway:WP430

Page 14

Allows visualization of differences in expression

http://www.wikipathways.org/index.php/Pathway:WP430

Page 15

Human and machine readable

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix cas: <http://identifiers.org/cas/> .
@prefix wprdf: <http://rdf.wikipathways.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
...
<http://www.ncbi.nlm.nih.gov/gene/1394>
    a gpml:DataNode , skos:Concept , wp:GeneProduct ;
    rdfs:isDefinedBy gpml:DataNode ;
    rdfs:label "CRHR1"@en ;
    dc:identifier <http://identifiers.org/ncbigene/1394> , "1394"^^xsd:string ;
    dc:source "Entrez Gene"^^xsd:string ;
    dcterms:isPartOf <http://rdf.wikipathways.org/WP4_r39380.ttl> ;
    gpml:centerx "340.0"^^xsd:float ;
    ...

Page 16
Page 17

311,696 articles (1.5% of PubMed) have been cited by GO annotations

Page 18
Page 19

Wikipedia is reasonably accurate

Page 20

Wikipedia has breadth and depth

[Chart: articles and words (millions), Wikipedia vs. Britannica Online. Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]

Page 21

Centralizing key data storage

Source: http://commons.wikimedia.org/wiki/File:Wikidata_slides_Magnus_Manske,_Cambridge,_2014-02-27.pdf

Page 22

Centralizing key data storage

Source: http://commons.wikimedia.org/wiki/File:Wikidata_slides_Magnus_Manske,_Cambridge,_2014-02-27.pdf

Page 23

Wikidata

“Provide a database of the world’s knowledge that anyone can edit” - Denny Vrandečić

Page 24

Centralizing key data storage

Page 25

Centralizing key data storage

Page 26

Centralizing key data storage

287 language editions of Wikipedia

Biocurators/Bioinformatics community

Page 27

Wikidata for biology

[Diagram: Wikidata statements about Reelin, http://www.wikidata.org/wiki/Q414043]

Properties: is a (Property:P31), regulates (Property:P128), interacts with (Property:P129)

Items: Protein (Q8054), Glycoprotein (Q187126), Neural development (Q1345738), VLDL receptor (Q1979313), Amyloid precursor protein (Q423510), Reelin (Q414043)

Page 28

Wikidata for biology

[Diagram: the same graph labeled with identifiers only]

Properties: Property:P31, Property:P128, Property:P129

Items: Q8054, Q187126, Q1345738, Q1979313, Q423510, Q414043

http://wikidata.org/w/api.php?action=wbgetentities&ids=Q414043&languages=en
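A request like the one above can be assembled and its JSON response read with the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, format=json is added to the slide's parameters so the API returns JSON, and SAMPLE_RESPONSE is a hand-written, heavily trimmed stand-in for a response (not captured API output) with one statement mirroring the slide's diagram. No network call is made.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(entity_id, language="en"):
    """Build a wbgetentities request URL for one item."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "languages": language,
        "format": "json",  # added here; not part of the URL on the slide
    }
    return API_ENDPOINT + "?" + urlencode(params)


def summarize(response, entity_id, language="en"):
    """Return the item's label and, per property, the item IDs its statements point to."""
    entity = response["entities"][entity_id]
    label = entity["labels"][language]["value"]
    targets = {}
    for prop, statements in entity.get("claims", {}).items():
        item_ids = []
        for statement in statements:
            value = statement["mainsnak"].get("datavalue", {}).get("value")
            if isinstance(value, dict) and "id" in value:
                item_ids.append(value["id"])
        targets[prop] = item_ids
    return label, targets


# Hand-written illustrative response, trimmed to the fields read above.
SAMPLE_RESPONSE = {
    "entities": {
        "Q414043": {
            "id": "Q414043",
            "labels": {"en": {"language": "en", "value": "Reelin"}},
            "claims": {
                "P31": [
                    {"mainsnak": {"property": "P31",
                                  "datavalue": {"value": {"entity-type": "item", "id": "Q8054"}}}}
                ]
            },
        }
    }
}

print(build_entity_url("Q414043"))
print(summarize(SAMPLE_RESPONSE, "Q414043"))
```

To run it against the live service, the URL from build_entity_url() could be fetched with any HTTP client and the decoded JSON passed to summarize().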

Page 29

Current progress

● All human and mouse genes and proteins loaded

● All diseases (Human Disease Ontology) loaded

● Dataset of all drugs in preparation

● Model for interlinking relations ready and proposed

Page 30

Our current workflow

Page 31

Stone soup of data

James Taylor http://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf

Page 32

Andrew Su, Scripps

Benjamin Good, Scripps

Sebastian Burgstaller, Scripps

Lynn Schriml, U Maryland

Elvira Mitraka, U Maryland

Gang Fu, NCBI

Evan Bolton, NCBI

Paul Pavlidis, U British Columbia

Peter Robinson, Charite

Many Wikipedia and Wikidata editors

Contact:

[email protected]@micelio.be

[email protected]

Page 33

Crowdsourcing in action