Combining Human and Computational Intelligence
Ilya Zaihrayeu, Pierre Andrews, Juan Pane
Semantic annotation lifecycle

Users produce free-text annotations. What if they could use semantic annotations instead, to leverage semantic-technology services (reasoning, semantic search, …)? A semantic annotation adds structure and/or meaning.

• Problem 1: help the user find and understand the meaning of semantic annotations
• Problem 2: extract (semantic) annotations from the context of the user's resource at publishing time
• Problem 3: quality of service (QoS) of semantics-enabled services
• Problem 4: semi-automatic semantification of existing annotations
4/14/2011
Index: meaning summarization
• Problem 1: help the user find and understand the meaning of semantic annotations
Meaning summarization: why?

• The right meaning of the words used for an annotation is in the mind of the people using them
• E.g., "Java":
  – an island in Indonesia south of Borneo; one of the world's most densely populated regions → island
  – a beverage consisting of an infusion of ground coffee beans; "he ordered a cup of coffee" → beverage
  – a simple platform-independent object-oriented programming language used for writing applets that are downloaded from the World Wide Web by a client and run on the client's machine → programming language
• Such descriptions are too long for the user to grasp the meaning immediately – too high a barrier to start generating semantic annotations
Meaning summarization: an example

One-word summaries are generated from the relations in the knowledge base, the sense definitions, synonyms, and hypernym terms.
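The idea can be sketched as follows: for each sense of a term, prefer its hypernym as the one-word label. This is a minimal illustration only, with a hand-coded toy sense inventory; the authors' algorithm draws on a real knowledge base (relations, definitions, synonyms and hypernyms), not this hypothetical `SENSES` dictionary.

```python
# Sketch of one-word meaning summarization: pick a short distinguishing
# label for each sense of a term, preferring the sense's direct hypernym.
# The sense inventory below is a toy stand-in for a real knowledge base.

SENSES = {
    "java": [
        {"gloss": "an island in Indonesia south of Borneo", "hypernym": "island"},
        {"gloss": "a beverage of ground coffee beans", "hypernym": "beverage"},
        {"gloss": "an object-oriented programming language",
         "hypernym": "programming language"},
    ],
}

def summarize(term):
    """Return one short summary label per sense of `term`."""
    summaries = []
    for sense in SENSES.get(term, []):
        # Prefer the hypernym as the summary; fall back to the second
        # word of the gloss (skipping the article) if none is known.
        label = sense.get("hypernym") or sense["gloss"].split()[1]
        summaries.append(label)
    return summaries

print(summarize("java"))  # ['island', 'beverage', 'programming language']
```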
Meaning summarization: evaluation results

• Best precision: 63%
• Discriminating power: 76.4% (test question: if we talk about "java", does the word "coffee" mean the same as "island"?)
Index: gold standard dataset

• Problem 3: QoS of semantics-enabled services
• Problem 4: semi-automatic semantification of existing annotations

In order to evaluate the performance of the algorithms, a gold standard dataset is needed.
Proposed approach: create a gold standard of a folksonomy with senses

Pipeline: Tag → (preprocessing, 80% accuracy) → Tokens → (disambiguation, 59% accuracy) → Senses

Example: the tag "javaisland" is tokenized into "Java island" (competing with "Java is land", …) and disambiguated into: Java – an island in Indonesia to the south of Borneo; island – a land mass that is surrounded by water.

Dataset statistics:
• # of annotations: 4,296
• Unique tags: 857
• Unique URLs: 644
• Unique users: 1,194
• Annotator agreement: 81%
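The preprocessing step above can be sketched as dictionary-based segmentation of a concatenated tag. This is an illustrative sketch under a toy vocabulary, not the authors' actual tokenizer; a real system would then score the candidate segmentations (e.g. by word frequency) to pick one.

```python
# Sketch of tag preprocessing: split a concatenated tag such as
# "javaisland" into known dictionary words. The small vocabulary here
# is a stand-in for a real lexicon.

VOCAB = {"java", "island", "is", "land", "web", "design"}

def splits(tag, vocab=VOCAB):
    """Return all ways to segment `tag` into vocabulary words."""
    if not tag:
        return [[]]
    results = []
    for i in range(1, len(tag) + 1):
        head = tag[:i]
        if head in vocab:
            for rest in splits(tag[i:], vocab):
                results.append([head] + rest)
    return results

# Candidate segmentations include ['java', 'island'] and
# ['java', 'is', 'land']; a scoring model would choose between them.
print(splits("javaisland"))
```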
A Platform for Gold Standards of Semantic Annotation Systems

• Manual validation
• RDF export
• Evaluation of:
  – preprocessing
  – WSD
  – BoW search
  – convergence
• Open source: http://sourceforge.net/projects/tags2con/
• 7 modules, 25K lines of code, 26% comments
Delicious RDF Dataset @ LOD cloud

• Dereferenceable at: http://disi.unitn.it/~knowdive/dataset/delicious/
• # of triples: 85,908
• Outlinks to the LOD cloud (WordNet synsets): 651
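To make the "outlinks to the LOD cloud" idea concrete, a tagging can be modeled as triples that link a user, a URL and a tag, with the tag pointed at a WordNet synset. All URIs and property names below are hypothetical placeholders, not the dataset's actual vocabulary.

```python
# Sketch of the *kind* of triples such a dataset could contain.
# Every namespace here is an invented placeholder (example.org),
# not the real Tags2Con/Delicious schema.

def tagging_triples(user, url, tag, wn_synset):
    base = "http://example.org/tags2con/"   # placeholder namespace
    ann = f"{base}annotation/1"
    return [
        (ann, f"{base}taggedBy", f"{base}user/{user}"),
        (ann, f"{base}resource", url),
        (ann, f"{base}tag", f"{base}tag/{tag}"),
        # the outlink into the LOD cloud: tag -> WordNet synset
        (f"{base}tag/{tag}", f"{base}hasSense", wn_synset),
    ]

for s, p, o in tagging_triples("u42", "http://java.com", "java",
                               "http://example.org/wn/synset-java-island"):
    print(f"<{s}> <{p}> <{o}> .")
```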
Index: QoS for semantic search
• Problem 3: QoS of semantics-enabled services
Semantic search: why?

• With free-text search, the following problems may reduce precision and recall:
  – synonymy: searching for "images" should also return resources annotated with "picture"
  – polysemy: searching for "java" (the island) should not return resources annotated with "java" (the coffee beverage)
  – specificity gap: searching for "animals" should also return resources annotated with "dogs"
• Semantic, meaning-based search can address the problems listed above
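How meaning-based search sidesteps synonymy and polysemy can be sketched by indexing resources under sense identifiers rather than raw strings: synonyms share an identifier, homonyms get distinct ones. The sense inventory and the `word#sense` notation below are illustrative assumptions (a real system would disambiguate automatically rather than take the sense in the query string).

```python
# Sketch of sense-based indexing. Synonyms ("images"/"picture") map to
# one sense id; the two meanings of "java" map to different ids.

SENSE_OF = {                      # word -> sense id (toy inventory)
    "images": "S:picture", "picture": "S:picture",
    "java#island": "S:java-island", "java#coffee": "S:java-coffee",
}

INDEX = {}                        # sense id -> set of resource ids

def annotate(resource, word):
    INDEX.setdefault(SENSE_OF[word], set()).add(resource)

def search(word):
    return INDEX.get(SENSE_OF.get(word), set())

annotate("r1", "picture")
annotate("r2", "java#coffee")
print(search("images"))        # synonymy handled: finds r1
print(search("java#island"))   # polysemy handled: the coffee page is not returned
```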
Semantics vs Folksonomy

(Figure: a user submits a query over annotated resources; the is-a chain vehicle > car > taxi illustrates specificity-gap links SG=1 and SG=2.)

• Semantic search returns complete and correct results (the baseline)
• With free-text search, recall goes down as the specificity gap (SG) increases
• From a gold-standard annotation such as "javaisland":
  – the raw tag "javaisland" is used to build "raw" queries
  – the tokens "java island" are used to build bag-of-words (BoW) queries
  – the senses Java (island), island (land) are used to build semantic queries, which are correct and complete
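Closing the specificity gap can be sketched as expanding a semantic query over the is-a hierarchy, so a query for a general concept also matches resources annotated with more specific ones. The hierarchy and resources below are toy data mirroring the slide's vehicle > car > taxi example, not the evaluated system.

```python
# Sketch of specificity-gap expansion: a query for "vehicle" should
# recall resources tagged "car" (SG=1) and "taxi" (SG=2).

IS_A = {"taxi": "car", "car": "vehicle"}        # child -> parent

def descendants(concept):
    """`concept` plus every concept whose is-a chain reaches it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

ANNOTATIONS = {"r1": "taxi", "r2": "car", "r3": "boat"}

def semantic_search(concept):
    wanted = descendants(concept)
    return {r for r, tag in ANNOTATIONS.items() if tag in wanted}

print(semantic_search("vehicle"))  # r1 and r2: both SG=1 and SG=2 recalled
```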
Index: semantic convergence
• Problem 4: semi-automatic semantification of existing annotations
Semantic convergence: why?

(Figure: two pie charts of tag sense coverage.)

"General" domains (cooking, travel, education):
• with a WN sense: 71%
• missing sense: 15%
• cannot decide: 5%
• I don't know: 4%
• other: 3%
• abbreviation: 2%

Random sample (programming and web domain):
• with a WN sense: 49%
• missing sense: 35% (e.g. Ajax, Mac, Apple, CSS, …)
• cannot decide: 6%
• abbreviation: 5%
• I don't know: 3%
• other: 1%
Semantic convergence: proposed solution

• Find new senses of terms:
  – find different senses of the same term (word senses)
  – find synonyms of a term (synonym sets – synsets)
• Place the new synset in the vocabulary's is-a hierarchy
• What we improve over the state of the art:
  – better use of machine learning techniques
  – the polysemy issue is not considered in the state of the art
  – missing or "subjective" evaluations in the state of the art
• Evaluation using the Delicious dataset
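One way to find the different senses of a term from a folksonomy is to cluster the bookmarks that carry it by the overlap of their other tags (a simple form of the tag-collocation idea evaluated on the next slide). The data, overlap threshold, and clustering rule below are illustrative assumptions, not the authors' actual algorithm.

```python
# Sketch of sense induction from tag collocation: bookmarks tagged
# "java" whose *other* tags overlap are assumed to use the same sense.

BOOKMARKS = {                       # bookmark id -> tag set (toy data)
    "b1": {"java", "indonesia", "travel"},
    "b2": {"java", "island", "travel"},
    "b3": {"java", "programming", "code"},
    "b4": {"java", "code", "jvm"},
}

def induce_senses(term, min_overlap=1):
    """Group bookmarks carrying `term` into candidate senses."""
    ids = [b for b, tags in BOOKMARKS.items() if term in tags]
    clusters = []
    for b in ids:
        ctx = BOOKMARKS[b] - {term}
        # merge with every cluster sharing enough context tags
        merged = [c for c in clusters
                  if any(len(ctx & (BOOKMARKS[o] - {term})) >= min_overlap
                         for o in c)]
        new = {b}.union(*merged) if merged else {b}
        clusters = [c for c in clusters if c not in merged] + [new]
    return clusters

# Two candidate senses emerge: {b1, b2} (island) and {b3, b4} (language).
print(induce_senses("java"))
```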
Convergence evaluation: finding senses

(Figure: a tag-collocation graph over bookmarks B1–B4 and tags t1–t5, and a user-collocation graph adding users U1 and U2.)

• Tag collocation: precision 56%, recall 73%
• User collocation: precision 57%, recall 68%
• Random baseline: precision 42%, recall 29%
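Precision and recall for induced senses are commonly computed pairwise: pairs of items the algorithm groups together are compared against the pairs grouped together in the gold standard. The sketch below illustrates that metric on toy clusters; the slide's 56–73% figures come from the authors' Delicious gold standard, not from this data, and the authors may have used a different scoring variant.

```python
from itertools import combinations

def pairs(clusters):
    """All unordered within-cluster pairs, as frozensets."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

def precision_recall(induced, gold):
    pi, pg = pairs(induced), pairs(gold)
    tp = len(pi & pg)                       # pairs both groupings agree on
    prec = tp / len(pi) if pi else 1.0
    rec = tp / len(pg) if pg else 1.0
    return prec, rec

induced = [{"b1", "b2", "b3"}, {"b4"}]
gold = [{"b1", "b2"}, {"b3", "b4"}]
print(precision_recall(induced, gold))  # precision 1/3, recall 1/2
```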
Semantic annotation lifecycle (recap of Problems 1–4): combining human and computational intelligence – conclusions.
Conclusions

• We developed and evaluated a meaning summarization algorithm
• We developed a "semantic folksonomy" evaluation platform
• We studied the effect of semantics on social tagging systems:
  – how much can semantics help?
  – how much does the user need to be involved?
  – how can human and computer intelligence be combined in the generation and consumption of semantic annotations?
• We developed and evaluated a knowledge base enrichment algorithm
• We built and used a gold standard dataset for evaluating:
  – word sense disambiguation
  – tag preprocessing
  – semantic search
  – semantic convergence
Integration with the use cases
Publications

• Semantic Disambiguation in Folksonomy: a Case Study. Pierre Andrews, Juan Pane, and Ilya Zaihrayeu; Advanced Language Technologies for Digital Libraries, Springer LNCS.
• Semantic Annotation of Images on Flickr. Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu; ESWC 2011.
• A Classification of Semantic Annotation Systems. Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu; Semantic Web Journal (second review phase).
• Sense Induction in Folksonomies. Pierre Andrews, Juan Pane, and Ilya Zaihrayeu; IJCAI-LHD 2011 (under review).
• Evaluating the Quality of Service in Semantic Annotation Systems. Ilya Zaihrayeu, Pierre Andrews, and Juan Pane; in preparation.
WP2 timeline and deliverables

(Figure: timeline over months 6–36; partners ONTO, UNITN, UIBK, UTC.)

Tasks:
• Task 2.1: Designing models
• Task 2.2: Designing methods
• Task 2.3: Research on Information Retrieval (IR) methods for semantic content
• Task 2.4: Models and methods for automatic visual annotation

Deliverables:
• D2.1.1: State of the art and requirements from the use case partners
• D2.1.2: Specification of the model
• D2.3.1: Requirements for semantics-aware IR methods
• D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics
• D2.2.2 + D2.2.3: Report on linking semantic annotations to external sources and on keeping them up to date when the underlying semantic model changes
• D2.3.2: Specification for semantics-aware IR methods
• D2.4: Report on the refinement of the proposed models, methods and semantic search
• D2.5: Report on the state of the art, proposed suitable models and methods for automatic visual annotation