Combining Human and Computational Intelligence
Ilya Zaihrayeu, Pierre Andrews, Juan Pane
Semantic annotation lifecycle

Users produce free-text annotations. What if they could use semantic annotations instead, to leverage semantic-technology services (reasoning, semantic search, …)? A semantic annotation adds structure and/or meaning.

• Problem 1: help the user find and understand the meaning of semantic annotations
• Problem 2: extract (semantic) annotations from the context of the user's resource at publishing time
• Problem 3: quality of service (QoS) of semantics-enabled services
• Problem 4: semi-automatic semantification of existing annotations
4/14/2011
Index: meaning summarization
• Problem 1: help the user find and understand the meaning of semantic annotations
Meaning summarization: why?

• The right meaning of the words used for an annotation is in the mind of the people using them
• E.g., "Java":
  – an island in Indonesia south of Borneo; one of the world's most densely populated regions → island
  – a beverage consisting of an infusion of ground coffee beans; "he ordered a cup of coffee" → beverage
  – a simple platform-independent object-oriented programming language used for writing applets that are downloaded from the World Wide Web by a client and run on the client's machine → programming language
• Such descriptions are too long for the user to grasp the meaning immediately – too high a barrier to start generating semantic annotations
Meaning summarization: an example

One-word summaries are generated from the relations in the knowledge base, the sense definitions, synonyms, and hypernym terms.
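The idea can be sketched as follows: for each sense of a term, prefer its hypernym as the one-word label. This is a minimal illustration only, with a hand-coded toy sense inventory; the authors' algorithm draws on a real knowledge base (relations, definitions, synonyms and hypernyms), not this hypothetical `SENSES` dictionary.

```python
# Sketch of one-word meaning summarization: pick a short distinguishing
# label for each sense of a term, preferring the sense's direct hypernym.
# The sense inventory below is a toy stand-in for a real knowledge base.

SENSES = {
    "java": [
        {"gloss": "an island in Indonesia south of Borneo", "hypernym": "island"},
        {"gloss": "a beverage of ground coffee beans", "hypernym": "beverage"},
        {"gloss": "an object-oriented programming language",
         "hypernym": "programming language"},
    ],
}

def summarize(term):
    """Return one short summary label per sense of `term`."""
    summaries = []
    for sense in SENSES.get(term, []):
        # Prefer the hypernym as the summary; fall back to the second
        # word of the gloss (skipping the article) if none is known.
        label = sense.get("hypernym") or sense["gloss"].split()[1]
        summaries.append(label)
    return summaries

print(summarize("java"))  # ['island', 'beverage', 'programming language']
```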
Meaning summarization: evaluation results

• Best precision: 63%
• Discriminating power: 76.4% (test question: if we talk about "java", does the word "coffee" mean the same as "island"?)
Index: gold standard dataset

• Problem 3: QoS of semantics-enabled services
• Problem 4: semi-automatic semantification of existing annotations

In order to evaluate the performance of the algorithms, a gold standard dataset is needed.
Proposed approach: create a gold standard of a folksonomy with senses

Pipeline: Tag → (preprocessing, 80% accuracy) → Tokens → (disambiguation, 59% accuracy) → Senses

Example: the tag "javaisland" is tokenized into "Java island" (competing with "Java is land", …) and disambiguated into: Java – an island in Indonesia to the south of Borneo; island – a land mass that is surrounded by water.

Dataset statistics:
• # of annotations: 4,296
• Unique tags: 857
• Unique URLs: 644
• Unique users: 1,194
• Annotator agreement: 81%
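The preprocessing step above can be sketched as dictionary-based segmentation of a concatenated tag. This is an illustrative sketch under a toy vocabulary, not the authors' actual tokenizer; a real system would then score the candidate segmentations (e.g. by word frequency) to pick one.

```python
# Sketch of tag preprocessing: split a concatenated tag such as
# "javaisland" into known dictionary words. The small vocabulary here
# is a stand-in for a real lexicon.

VOCAB = {"java", "island", "is", "land", "web", "design"}

def splits(tag, vocab=VOCAB):
    """Return all ways to segment `tag` into vocabulary words."""
    if not tag:
        return [[]]
    results = []
    for i in range(1, len(tag) + 1):
        head = tag[:i]
        if head in vocab:
            for rest in splits(tag[i:], vocab):
                results.append([head] + rest)
    return results

# Candidate segmentations include ['java', 'island'] and
# ['java', 'is', 'land']; a scoring model would choose between them.
print(splits("javaisland"))
```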
A Platform for Gold Standards of Semantic Annotation Systems

• Manual validation
• RDF export
• Evaluation of:
  – preprocessing
  – WSD
  – BoW search
  – convergence
• Open source: http://sourceforge.net/projects/tags2con/
• 7 modules, 25K lines of code, 26% comments
Delicious RDF Dataset @ LOD cloud

• Dereferenceable at: http://disi.unitn.it/~knowdive/dataset/delicious/
• # of triples: 85,908
• Outlinks to the LOD cloud (WordNet synsets): 651
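To make the "outlinks to the LOD cloud" idea concrete, a tagging can be modeled as triples that link a user, a URL and a tag, with the tag pointed at a WordNet synset. All URIs and property names below are hypothetical placeholders, not the dataset's actual vocabulary.

```python
# Sketch of the *kind* of triples such a dataset could contain.
# Every namespace here is an invented placeholder (example.org),
# not the real Tags2Con/Delicious schema.

def tagging_triples(user, url, tag, wn_synset):
    base = "http://example.org/tags2con/"   # placeholder namespace
    ann = f"{base}annotation/1"
    return [
        (ann, f"{base}taggedBy", f"{base}user/{user}"),
        (ann, f"{base}resource", url),
        (ann, f"{base}tag", f"{base}tag/{tag}"),
        # the outlink into the LOD cloud: tag -> WordNet synset
        (f"{base}tag/{tag}", f"{base}hasSense", wn_synset),
    ]

for s, p, o in tagging_triples("u42", "http://java.com", "java",
                               "http://example.org/wn/synset-java-island"):
    print(f"<{s}> <{p}> <{o}> .")
```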
Index: QoS for semantic search
• Problem 3: QoS of semantics-enabled services
Semantic search: why?

• With free-text search, the following problems may reduce precision and recall:
  – synonymy: searching for "images" should also return resources annotated with "picture"
  – polysemy: searching for "java" (the island) should not return resources annotated with "java" (the coffee beverage)
  – specificity gap: searching for "animals" should also return resources annotated with "dogs"
• Semantic, meaning-based search can address the problems listed above
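How meaning-based search sidesteps synonymy and polysemy can be sketched by indexing resources under sense identifiers rather than raw strings: synonyms share an identifier, homonyms get distinct ones. The sense inventory and the `word#sense` notation below are illustrative assumptions (a real system would disambiguate automatically rather than take the sense in the query string).

```python
# Sketch of sense-based indexing. Synonyms ("images"/"picture") map to
# one sense id; the two meanings of "java" map to different ids.

SENSE_OF = {                      # word -> sense id (toy inventory)
    "images": "S:picture", "picture": "S:picture",
    "java#island": "S:java-island", "java#coffee": "S:java-coffee",
}

INDEX = {}                        # sense id -> set of resource ids

def annotate(resource, word):
    INDEX.setdefault(SENSE_OF[word], set()).add(resource)

def search(word):
    return INDEX.get(SENSE_OF.get(word), set())

annotate("r1", "picture")
annotate("r2", "java#coffee")
print(search("images"))        # synonymy handled: finds r1
print(search("java#island"))   # polysemy handled: the coffee page is not returned
```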
Semantics vs Folksonomy

(Figure: a user submits a query over annotated resources; the is-a chain vehicle > car > taxi illustrates specificity-gap links SG=1 and SG=2.)

• Semantic search returns complete and correct results (the baseline)
• With free-text search, recall goes down as the specificity gap (SG) increases
• From a gold-standard annotation such as "javaisland":
  – the raw tag "javaisland" is used to build "raw" queries
  – the tokens "java island" are used to build bag-of-words (BoW) queries
  – the senses Java (island), island (land) are used to build semantic queries, which are correct and complete
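Closing the specificity gap can be sketched as expanding a semantic query over the is-a hierarchy, so a query for a general concept also matches resources annotated with more specific ones. The hierarchy and resources below are toy data mirroring the slide's vehicle > car > taxi example, not the evaluated system.

```python
# Sketch of specificity-gap expansion: a query for "vehicle" should
# recall resources tagged "car" (SG=1) and "taxi" (SG=2).

IS_A = {"taxi": "car", "car": "vehicle"}        # child -> parent

def descendants(concept):
    """`concept` plus every concept whose is-a chain reaches it."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

ANNOTATIONS = {"r1": "taxi", "r2": "car", "r3": "boat"}

def semantic_search(concept):
    wanted = descendants(concept)
    return {r for r, tag in ANNOTATIONS.items() if tag in wanted}

print(semantic_search("vehicle"))  # r1 and r2: both SG=1 and SG=2 recalled
```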
Index: semantic convergence
• Problem 4: semi-automatic semantification of existing annotations
Semantic convergence: why?

(Figure: two pie charts of tag sense coverage.)

"General" domains (cooking, travel, education):
• with a WN sense: 71%
• missing sense: 15%
• cannot decide: 5%
• I don't know: 4%
• other: 3%
• abbreviation: 2%

Random sample (programming and web domain):
• with a WN sense: 49%
• missing sense: 35% (e.g. Ajax, Mac, Apple, CSS, …)
• cannot decide: 6%
• abbreviation: 5%
• I don't know: 3%
• other: 1%
Semantic convergence: proposed solution

• Find new senses of terms:
  – find different senses of the same term (word senses)
  – find synonyms of a term (synonym sets – synsets)
• Place the new synset in the vocabulary's is-a hierarchy
• What we improve over the state of the art:
  – better use of machine learning techniques
  – the polysemy issue is not considered in the state of the art
  – missing or "subjective" evaluations in the state of the art
• Evaluation using the Delicious dataset
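One way to find the different senses of a term from a folksonomy is to cluster the bookmarks that carry it by the overlap of their other tags (a simple form of the tag-collocation idea evaluated on the next slide). The data, overlap threshold, and clustering rule below are illustrative assumptions, not the authors' actual algorithm.

```python
# Sketch of sense induction from tag collocation: bookmarks tagged
# "java" whose *other* tags overlap are assumed to use the same sense.

BOOKMARKS = {                       # bookmark id -> tag set (toy data)
    "b1": {"java", "indonesia", "travel"},
    "b2": {"java", "island", "travel"},
    "b3": {"java", "programming", "code"},
    "b4": {"java", "code", "jvm"},
}

def induce_senses(term, min_overlap=1):
    """Group bookmarks carrying `term` into candidate senses."""
    ids = [b for b, tags in BOOKMARKS.items() if term in tags]
    clusters = []
    for b in ids:
        ctx = BOOKMARKS[b] - {term}
        # merge with every cluster sharing enough context tags
        merged = [c for c in clusters
                  if any(len(ctx & (BOOKMARKS[o] - {term})) >= min_overlap
                         for o in c)]
        new = {b}.union(*merged) if merged else {b}
        clusters = [c for c in clusters if c not in merged] + [new]
    return clusters

# Two candidate senses emerge: {b1, b2} (island) and {b3, b4} (language).
print(induce_senses("java"))
```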
Convergence evaluation: finding senses

(Figure: a tag-collocation graph over bookmarks B1–B4 and tags t1–t5, and a user-collocation graph adding users U1 and U2.)

• Tag collocation: precision 56%, recall 73%
• User collocation: precision 57%, recall 68%
• Random baseline: precision 42%, recall 29%
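Precision and recall for induced senses are commonly computed pairwise: pairs of items the algorithm groups together are compared against the pairs grouped together in the gold standard. The sketch below illustrates that metric on toy clusters; the slide's 56–73% figures come from the authors' Delicious gold standard, not from this data, and the authors may have used a different scoring variant.

```python
from itertools import combinations

def pairs(clusters):
    """All unordered within-cluster pairs, as frozensets."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

def precision_recall(induced, gold):
    pi, pg = pairs(induced), pairs(gold)
    tp = len(pi & pg)                       # pairs both groupings agree on
    prec = tp / len(pi) if pi else 1.0
    rec = tp / len(pg) if pg else 1.0
    return prec, rec

induced = [{"b1", "b2", "b3"}, {"b4"}]
gold = [{"b1", "b2"}, {"b3", "b4"}]
print(precision_recall(induced, gold))  # precision 1/3, recall 1/2
```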
Semantic annotation lifecycle (recap of Problems 1–4): combining human and computational intelligence – conclusions.
Conclusions

• We developed and evaluated a meaning summarization algorithm
• We developed a "semantic folksonomy" evaluation platform
• We studied the effect of semantics on social tagging systems:
  – how much can semantics help?
  – how much does the user need to be involved?
  – how can human and computer intelligence be combined in the generation and consumption of semantic annotations?
• We developed and evaluated a knowledge base enrichment algorithm
• We built and used a gold standard dataset for evaluating:
  – word sense disambiguation
  – tag preprocessing
  – semantic search
  – semantic convergence
Integration with the use cases
Publications

• Semantic Disambiguation in Folksonomy: a Case Study. Pierre Andrews, Juan Pane, and Ilya Zaihrayeu; Advanced Language Technologies for Digital Libraries, Springer LNCS.
• Semantic Annotation of Images on Flickr. Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu; ESWC 2011.
• A Classification of Semantic Annotation Systems. Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu; Semantic Web Journal (second review phase).
• Sense Induction in Folksonomies. Pierre Andrews, Juan Pane, and Ilya Zaihrayeu; IJCAI-LHD 2011 (under review).
• Evaluating the Quality of Service in Semantic Annotation Systems. Ilya Zaihrayeu, Pierre Andrews, and Juan Pane; in preparation.
WP2 timeline and deliverables

(Figure: timeline over months 6–36; partners ONTO, UNITN, UIBK, UTC.)

Tasks:
• Task 2.1: Designing models
• Task 2.2: Designing methods
• Task 2.3: Research on Information Retrieval (IR) methods for semantic content
• Task 2.4: Models and methods for automatic visual annotation

Deliverables:
• D2.1.1: State of the art and requirements from the use case partners
• D2.1.2: Specification of the model
• D2.3.1: Requirements for semantics-aware IR methods
• D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics
• D2.2.2 + D2.2.3: Report on linking semantic annotations to external sources and on keeping them up to date when the underlying semantic model changes
• D2.3.2: Specification for semantics-aware IR methods
• D2.4: Report on the refinement of the proposed models, methods and semantic search
• D2.5: Report on the state of the art, proposed suitable models and methods for automatic visual annotation