Open Semantic Annotation an experiment with BioMoby Web Services
Benjamin Good, Paul Lu, Edward Kawas, Mark Wilkinson
University of British Columbia
Heart + Lung Research Institute
St. Paul’s Hospital
The Web contains lots of things
But the Web doesn’t know what they ARE
text/html video/mpeg image/jpg audio/aiff
The Semantic Web
It’s A Duck
Semantic Web Reasoning
Logically… It’s A Duck
Walks Like a Duck
Quacks Like a Duck
Looks Like a Duck
Defining the world by its properties helps me find the KINDS of things I am looking for
Add properties to the things we are describing
Asserted vs. Reasoned Semantic Web
[Figure: the ontology spectrum, from lightweight to expressive semantics: Catalog/ID → Terms/glossary → Thesauri (“narrower term” relation) → Informal is-a → Formal is-a → Formal instance → Frames (properties) → Value restrictions → Selected logical constraints (disjointness, inverse, …) → General logical constraints]
Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, and Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
Who assigns these properties?
Works ~well
…but doesn’t scale
When we say “Web” we mean “Scale”
Natural Language Processing
Scales Well…
Works!!
…Sometimes…
…Sort of….
Natural Language Processing
Problem #1
Requires text to get the process started
Problem #2
Low accuracy means it can only support, not replace, manual annotation
Web 2.0 Approach
OPEN to all Web users (Scale!)
Parallel, Distributed,
“Human Computation”
Human Computation
Getting lots of people to solve problems that are difficult for computers.
(term introduced by Luis von Ahn, Carnegie Mellon University)
Example: Image Annotation
ESP Game results
• >4 million images labeled
• >23,000 players
• Given 5,000 players online simultaneously, could label all of the images accessible to Google in a month – See the “Google image labeling game”…
Luis von Ahn and Laura Dabbish (2004), “Labeling images with a computer game”, ACM Conference on Human Factors in Computing Systems (CHI)
Social Tagging
• Accepted
• Widely applied
• Passive volunteer annotation.
• Del.icio.us: surpassed 1 million users in 2006
• Connotea, CiteULike, etc.
• See also our ED2Connotea extension
This is a picture of a traditional Japanese wagashi sweet called “seioubo”, which is modeled after a peach
BUSTED!
I just pulled a bunch of Semantics out of my Seioubo!
BUSTED!
This is a picture of a traditional Japanese wagashi sweet called “seioubo”, which is modeled after a peach
This is a totally sweet picture of peaches grown in the city of Seioubo, in the Wagashi region of Japan
So tagging isn’t enough…
We need properties, but the properties need to be semantically grounded in order to enable reasoning
(and this ain’t gonna happen through NLP because there is even less context in tags!)
Social Semantic Tagging
Q1: Can we design interfaces that assist “the masses” to derive their tags from controlled vocabularies (ontologies)?
Q2: How well do “the masses” do when faced with such an interface? Can this data be used “rigorously” for e.g. logical reasoning?
Q3: “The masses” seem to be good at tagging things like pictures… no brainer! How do they do at tagging more complex things like bioinformatics Web Services?
Context: BioMoby Web Services
BioMoby is a Semantic Web Services framework in which the data objects consumed and produced by BioMoby service providers are explicitly grounded (semantically and syntactically) in an ontology
A second ontology describes the analytical functions that a Web Service can perform
Context: BioMoby Web Services
BioMoby ontologies suffer from being semantically VERY shallow…
thus it is VERY difficult to discover the Web Service that you REALLY want at any given moment…
Can we improve discovery by improving the semantic annotation of the services?
Experiment
1. Implemented the BioMoby Annotator
• Web interface for annotation
• myGrid ontology + Freebase as the grounding
2. Recruited volunteers
3. Volunteers annotated BioMoby Web Services
4. Measured
• Inter-annotator agreement
• Agreement with a manually constructed standard
• Individuals, aggregates
BioMoby Annotator
Information extracted from the Moby Central Web Service registry
Tagging areas
Tagging
Type-ahead tag suggestions drawn from myGrid Web Service Ontology & from Freebase
Tagging
New simple tags can also be created, as per normal tagging
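A type-ahead suggester over a fixed vocabulary can be sketched as a sorted prefix scan. This is purely illustrative: the Annotator’s real lookup queried the myGrid ontology and Freebase live, and the function and parameter names here are invented.

```python
import bisect

def suggest(terms, prefix, limit=10):
    """Return up to `limit` vocabulary terms starting with `prefix`.

    `terms` must be a sorted, lower-cased list, e.g. labels harvested
    from the myGrid ontology and Freebase (hypothetical setup).
    """
    prefix = prefix.lower()
    start = bisect.bisect_left(terms, prefix)  # first possible match
    out = []
    for term in terms[start:start + limit]:
        if not term.startswith(prefix):        # matches are contiguous
            break
        out.append(term)
    return out

# Example: suggest(["alignment", "annotation", "dna sequence"], "a")
```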
“Gold-Standard” Dataset
• 27 BioMoby services were hand-annotated by us
• Typical bioinformatics functions
– Retrieve database record
– Perform sequence alignment
– Identifier-to-identifier mapping
Volunteers
• Recruited friends and posted on mailing lists
• Offered a small reward for completing the experiment ($20 Amazon)
• 19 participants
– Mix of BioMoby developers, bioinformaticians, statisticians, and students
– Majority had some experience with Web Services
– 13 completed annotating all of the selected services
Measurements
• Inter-annotator agreement
– Standard approach for estimating annotation quality
– Usually measured for small groups of professional annotators (typically 2–4)
• Agreement with the “gold standard”
– Measured in the same way, but one “annotator” is considered the standard
Inter-annotator Agreement Metric
• Positive Specific Agreement
– Amount of overlap between all annotations elicited for a particular item, comparing annotators pairwise

PSA(A, B) = 2I / (2I + a + b)

where I = the intersection of sets A and B, a = A without I, and b = B without I
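As a minimal sketch, the PSA formula translates directly into set arithmetic over two annotators’ tag sets (treating tags as plain strings; the experiment compared semantically-grounded tags, and the names here are illustrative):

```python
def psa(a, b):
    """Positive specific agreement between two tag sets.

    PSA(A, B) = 2I / (2I + a + b), where I is the size of the
    intersection and a, b count tags unique to each annotator.
    """
    a, b = set(a), set(b)
    i = len(a & b)            # tags both annotators agreed on
    only_a = len(a) - i       # tags only annotator A used
    only_b = len(b) - i       # tags only annotator B used
    denom = 2 * i + only_a + only_b
    return 2 * i / denom if denom else 0.0  # convention: 0 if both empty
```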
Gold-standard Agreement Metrics
• Precision, Recall, F-measure

Precision(T) = true tags by T / all tags by T
Recall(T) = true tags by T / all true tags
F = harmonic mean of P and R = 2PR / (P + R)

(F = PSA if one set is considered “true”)
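The same set arithmetic gives precision, recall, and F against the gold standard; a sketch under the same assumptions as above, with names of my own choosing:

```python
def prf(tagger, gold):
    """Precision, recall, and F-measure of one annotator's tag set
    against the gold-standard tag set for the same item."""
    tagger, gold = set(tagger), set(gold)
    true_tags = len(tagger & gold)  # tags that match the standard
    precision = true_tags / len(tagger) if tagger else 0.0
    recall = true_tags / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```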
Metrics
• Average pairwise agreements reported
– Across all pairs of annotators
– By service Operations (e.g. retrieval) and Objects (e.g. DNA sequence)
– By semantically-grounded tags
– By free-text tags
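Averaging agreement over all annotator pairs for one item could look like the following, reusing the psa sketch above (the actual analysis pipeline may have differed in its bookkeeping):

```python
from itertools import combinations

def mean_pairwise_psa(annotations):
    """Average PSA over all pairs of annotators for one item.

    annotations: list of tag sets, one per annotator.
    """
    pairs = list(combinations(annotations, 2))
    if not pairs:
        return 0.0
    return sum(psa(a, b) for a, b in pairs) / len(pairs)
```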
Inter-Annotator Agreement
Type                  N pairs   mean   median   min    max    std. dev.   coeff. of variation
Free, Object             1658   0.09     0.00   0.00   1.00        0.25                  2.79
Semantic, Object         3482   0.44     0.40   0.00   1.00        0.43                  0.98
Free, Operation           210   0.13     0.00   0.00   1.00        0.33                  2.49
Semantic, Operation      2599   0.54     0.67   0.00   1.00        0.32                  0.58
Agreement to “Gold” Standard
Subject type                  Measure     mean   median   min    max    std. dev.   coeff. of variation
Data-types (input & output)   PSA         0.52     0.51   0.32   0.71        0.11                  0.22
                              Precision   0.54     0.53   0.33   0.74        0.13                  0.24
                              Recall      0.54     0.54   0.30   0.71        0.12                  0.21
Web Service Operations        PSA         0.59     0.60   0.36   0.75        0.10                  0.18
                              Precision   0.81     0.79   0.52   1.00        0.13                  0.16
                              Recall      0.53     0.50   0.26   0.77        0.15                  0.28
Consensus & Correctness: Datatypes
[Chart: PSA, precision, recall, and coverage (0.00–1.00) versus the minimum number of votes per tag (1–9)]
Consensus and Correctness: Operations
[Chart: PSA, precision, recall, and coverage (0.00–1.00) versus the minimum number of votes per tag (1–9)]
Open Annotations are Different
Trust must be earned
• Can be decided at runtime
– By consensus agreement (as described here)
– By annotator reputation
– By recency
– By your favorite algorithm
– By you!
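The consensus-agreement option amounts to keeping only tags that reach a minimum vote count across annotators, as in this hypothetical filter (raising min_votes trades coverage for precision, as in the consensus-and-correctness charts above):

```python
from collections import Counter

def consensus_tags(annotations, min_votes=2):
    """Keep tags applied independently by at least `min_votes` annotators.

    annotations: list of tag sets, one per annotator.
    """
    votes = Counter(tag for tags in annotations for tag in set(tags))
    return {tag for tag, n in votes.items() if n >= min_votes}
```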
IT’S ALL ABOUT CONTEXT!!
We can get REALLY good semantic annotations IF we provide context!!
Open Semantic Annotation Works
• IF we provide CONTEXT
• IF enough volunteers contribute
• BUT we do not understand why people do or do not contribute without $$$ incentive
• SO further research is needed to understand Social Psychology on the Web
Watch for
• Forthcoming issue of the International Journal of Knowledge Engineering and Data Mining on “Incentives for Semantic Content Creation”
Ack’s
• Benjamin Good
• Edward Kawas
• Paul Lu
• MSFHR/CIHR Bioinformatics Training Programme @ UBC
• iCAPTURE Centre @ St. Paul’s Hospital
• NSERC
• Genome Canada/Genome Alberta