Can we create better links by playing games?
ICSC 2013 (http://ieee-icsc.org/icsc2013/) presentation on game-based link verification
Can we Create Better Links by Playing Games?
Jens Lehmann Tri Quan Nguyen Timofey Ermilov
September 18, 2013
Lehmann,Nguyen,Ermilov (AKSW) Better Links by Playing Games September 18, 2013 1 / 35
Outline
1 Motivation
2 VeriLinks
  Platform
  Game Implementation
  Reward and Balancing
3 Evaluation and Experiments
Motivation
Links are the backbone of the traditional World Wide Web and the Data Web
Automated tools (LIMES, SILK) are able to create a high number of links between RDF resources by using heuristics
Why is it difficult?
Definition (Link Discovery)
Given sets S and T of resources and a relation R (often owl:sameAs)
Common approach: Find M = {(s, t) ∈ S × T : δ(s, t) ≤ θ}
S: DBpedia
rdfs:label: "African Elephant"
T: BBC Wildlife
dc:title: "African Bush Elephant"
dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 ?
δ = levenshtein(S.rdfs:label, T.dc:title)
δ(dbpedia:AfricanElephant, bbc:hfzw82929) = 5
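The threshold approach on this slide can be sketched in a few lines of Python. This is only an illustration, not the LIMES or SILK implementation: the function names `levenshtein` and `discover_links` and the two threshold values are assumptions, and the data is just the single pair of labels from the slide.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def discover_links(source: dict, target: dict, theta: int) -> set:
    """Return M = {(s, t) : levenshtein(label(s), title(t)) <= theta}."""
    return {
        (s, t)
        for s, label in source.items()
        for t, title in target.items()
        if levenshtein(label, title) <= theta
    }


# Toy data: the two resources from the slide.
source = {"dbpedia:AfricanElephant": "African Elephant"}
target = {"bbc:hfzw82929": "African Bush Elephant"}

distance = levenshtein("African Elephant", "African Bush Elephant")
links_strict = discover_links(source, target, theta=3)  # hypothetical threshold
links_loose = discover_links(source, target, theta=5)   # hypothetical threshold
```

With θ = 3 the pair is not linked, with θ = 5 it is. Whether the link should exist at all cannot be decided from the string distance alone, which is the difficulty the slide points at.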
Motivation
Without manual verification of the created links, it is difficult to ensure high precision and recall
Is it possible to use game-based approaches to improve this manual verification stage?
→ VeriLinks Platform
VeriLinks
Lightweight framework in which users validate links while playing a game
[Figure: VeriLinks platform architecture. Labelled components: collection of HTML games (Game 1, Game 2, ..., add new game), Server API, reward and balancing system, user management, statistics management, links management, link verification]
Only a few methods are needed to create an interlinking game (independent of programming language)
Game Task
VeriLinks input:
  A set of links
  SPARQL endpoints containing information about the resources which should be interlinked
  Template for visual presentation of validation task
[Figure: creation of a game task from a link specification (LIMES/SILK), a set of links, a template for SPARQL retrieval and a visual template. Step labels: create manually, (semi-)automatic, automatic]
Visual Template
Visual presentation of resources by using JsRender templates and CSS stylesheets
Generate HTML by taking JSON input and combining it with a JsRender template
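The idea of combining JSON input with a template can be illustrated with a small Python analogue. This is not JsRender itself (JsRender is a JavaScript library); the sketch only mimics its two basic tags, `{{:field}}` for raw output and `{{>field}}` for HTML-encoded output. The `render` function, the field names `label` and `image`, and the file name are all hypothetical.

```python
import html
import json
import re

# Matches {{:field}} (raw) and {{>field}} (HTML-encoded).
TAG = re.compile(r"\{\{([:>])\s*(\w+)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Fill a template with values from a dict; missing fields become ''."""
    def substitute(match):
        kind, field = match.groups()
        value = str(data.get(field, ""))
        return html.escape(value) if kind == ">" else value
    return TAG.sub(substitute, template)


# Hypothetical JSON description of one resource.
resource = json.loads('{"label": "African Elephant", "image": "elephant.jpg"}')
template = '<div class="resource"><img src="{{:image}}"><p>{{>label}}</p></div>'
page = render(template, resource)
```

Keeping the presentation in a template means the same resource data can be shown differently per linkset without touching the game code.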
Game Implementation
A VeriLinks game is divided into two components:
Game panel
Validation panel
Activities in the validation panel influence the game panel
e.g. if a player answers questions correctly, the player earns coins/benefits
The aim of every game is to correctly validate as many links as possible
While validating, the player has the options to:
  confirm a link
  reject a link
  or make no decision ("not sure" option)
Game Implementation
Validation panel/questions examples:
Game Implementation
So far, two different games have been implemented in VeriLinks:
  Pea Invasion
  Space Ships
Pea Invasion
Pea Invasion is a single-player web game in which players indirectly compete against other players
Their scores depend on the evaluation of others
If a player answers questions correctly, they earn coins (coins can be used in the game to defend better against enemies)
Pea Invasion
Some players found Pea Invasion to be cognitively difficult, since they had to evaluate links and play the main game simultaneously
Space Ships
Space Ships is a turn-based game in which link verification and game playing phases alternate
The target of the game is to destroy the opponent's ship before the opponent destroys yours
Space Ships
Damage is calculated at the end of each turn, based on the number of verified links and the precision of the verification
The turn-based game allows implementation of an asynchronous multiplayer mode: the user can compete against an AI of a particular strength or against the ship of another user of the system
Reward and Balancing
Three-valued approach (confirm, reject link, no decision) as basis for the internal rewarding and balancing functions
VeriLinks is mainly designed for single-player games
In-game questions and rewards depend on the history of all games played so far
Common simultaneous two-player games often have a simple reward scheme: agreement = reward
The VeriLinks reward system has to take history into account
Compute Difficulty
A simple way to compute difficulty:
  divide the number of times the unsure option was selected by the number of times the question was played
We use an enhanced version of this formula:
  Take uncertainty into account in cases when the question was played only a few times
  Use the center of the 80% confidence interval of the Wilson method, which works well for small samples, to estimate the difficulty $d$:
$$d = \frac{\hat{p} + \frac{1}{2n} z_{1-\alpha/2}^2}{1 + \frac{1}{n} z_{1-\alpha/2}^2}$$
  $\hat{p}$: proportion of "unsure" replies
  $n$: number of times the question was played
  $\alpha$: error percentile
  $z_{1-\alpha/2}$: percentile of the standard normal distribution
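A minimal sketch of this difficulty estimate, assuming that n is the number of times the question was played and that the 80% interval corresponds to α = 0.2. The function name `difficulty` and its parameters are illustrative, not taken from the VeriLinks code base.

```python
from statistics import NormalDist


def difficulty(unsure: int, played: int, confidence: float = 0.80) -> float:
    """Center of the Wilson score interval for the share of 'unsure' replies."""
    if played <= 0:
        raise ValueError("the question must have been played at least once")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1-alpha/2}
    p_hat = unsure / played
    return (p_hat + z * z / (2 * played)) / (1 + z * z / played)


# Same observed share of unsure replies (0%), different sample sizes.
after_one_play = difficulty(unsure=0, played=1)
after_hundred_plays = difficulty(unsure=0, played=100)
```

Compared with the plain ratio, the Wilson center pulls estimates based on few plays towards 0.5, so a question that was played once without an "unsure" reply is not immediately treated as trivially easy.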
Reward and Balancing
Player strength: s = max(0, (c/w − 0.5)/(1 − 0.5)) (w: window size, c: correct answers)
  9 out of 20 correct: s = 0 (random); 17 out of 20: s = 0.7
Adjust game difficulty to player:
  Draw random number between 0 and player strength
  Select link with closest difficulty value
Agreement:
  Link confidence is estimated with the Wilson method (a high percentage of players saying that the link is correct translates to a high confidence value)
  Lower and upper thresholds t_l (= 0.3) and t_u (= 0.7)
  Penalties below t_l, small rewards between t_l and t_u, and standard reward above t_u
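The three mechanisms on this slide can be sketched as follows. The strength formula and the thresholds 0.3 and 0.7 come from the slide; everything else is an assumption for illustration: the function names, the concrete point values, the reading that penalties apply below the lower threshold, and the interpretation of `confidence` as the estimated confidence that the player's answer agrees with earlier votes.

```python
import random


def player_strength(correct: int, window: int) -> float:
    """s = max(0, (c/w - 0.5) / (1 - 0.5)); 0 means no better than guessing."""
    return max(0.0, (correct / window - 0.5) / (1 - 0.5))


def pick_link(difficulty_by_link: dict, strength: float, rng=random) -> str:
    """Draw a target difficulty in [0, strength], return the closest link."""
    target = rng.uniform(0.0, strength)
    return min(difficulty_by_link,
               key=lambda link: abs(difficulty_by_link[link] - target))


def reward(confidence: float, t_l: float = 0.3, t_u: float = 0.7) -> int:
    """Three-tier reward; the point values -1, 1 and 2 are hypothetical."""
    if confidence < t_l:
        return -1  # penalty
    if confidence <= t_u:
        return 1   # small reward
    return 2       # standard reward


weak = player_strength(correct=9, window=20)     # below chance level, so 0
strong = player_strength(correct=17, window=20)  # about 0.7

# Hypothetical difficulty values for three links.
links = {"link-a": 0.1, "link-b": 0.4, "link-c": 0.8}
for_weak_player = pick_link(links, weak)
```

Because the target difficulty is drawn from [0, s], a player with s = 0 always receives the easiest available link, while stronger players see a mix of easy and harder ones.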
Evaluation and Experiments
In our experiments and the survey, we addressed the following research questions:
1 How accurate are game-based interlinking methods?
2 Can they be combined with manually crafted link specifications to boost F-measure?
3 Are players willing to invest time in linking games?
4 Did the balancing and rewarding as well as the overall game play work well?
Experimental Setup
Announced VeriLinks survey and platform on Semantic Web lists
Recorded in-game statistics and survey results over 3 days in which 118 players used VeriLinks and 42 of those completed the survey
Selected linksets understandable by a general audience and small enough to manually create a gold standard
name                   number of links  source class          target class
DBpedia-LinkedGeoData  204              dbpedia-owl:Country   lgdo:Country
DBpedia-Factbook       142              dbpedia-owl:Language  factbook:Country
DBpedia-BBCwildlife    108              dbpedia-owl:Species   bbc-wildlife:Species
Table: linksets selected from the LATC link specification repository
Experimental Setup II
User tasks:
linkset                task
DBpedia-LinkedGeoData  map a flag of a country to a country displayed on a map
DBpedia-Factbook       map languages to countries
DBpedia-BBCwildlife    match pictures of animals
Confidence thresholds in those specifications were lowered
  Results in linksets with high recall and lower precision
  Reviewed all resulting links to create gold standard
Templates were kept as simple as possible
  Previous internal evaluation phase revealed: players are overwhelmed when shown too much information
Results
Two confidence measures for each link, derived from the link specification and the game
Computed F-Measures for all thresholds, for links which were evaluated in the game at least 5 times:
[Figure: F-Measure (y-axis, 0.90 to 0.95) as a function of the confidence threshold (x-axis, 0.1 to 1), one curve each for Game and Spec]
The game and the link specifications achieve similar performance on the given tasks
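Computing an F-Measure per threshold can be sketched as below. The confidence values and gold labels are made up for illustration and are not the experiment's data; the function name `f_measure` is an assumption, and the filter on links evaluated at least 5 times is left out.

```python
def f_measure(confidences, gold, threshold):
    """F1 of the prediction 'link is correct iff confidence >= threshold'."""
    predicted = [c >= threshold for c in confidences]
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up confidence values and gold-standard labels, for illustration only.
confidences = [0.95, 0.80, 0.65, 0.40, 0.20]
gold = [True, True, False, True, False]

thresholds = [t / 10 for t in range(1, 11)]  # 0.1, 0.2, ..., 1.0
curve = {t: f_measure(confidences, gold, t) for t in thresholds}
best_threshold = max(curve, key=curve.get)
```

Running the same function once on the specification confidences and once on the game confidences gives the two curves that the slide compares.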
Results II
We expected that the game would not outperform the specs, because
  Specs were of good quality
  We intentionally minimised the information shown to the gamer
Goal: do not maximise the accuracy of the game predictions themselves, but maximise the accuracy of link spec predictions combined with game predictions
Results III
Confidence scatter plot:
Green squares = correct, red diamonds = incorrect
Combination of both confidence values allows very precise predictions
There is a simple linear separation, i.e. a classifier, which makes only a single error while still achieving remarkably high recall
Results IV
Confirmation by running a support vector machine classifier using the two confidence values as input
measurement                                           value
spec F-Measure (original, no threshold optimisation)  71%
spec F-Measure (SVM, 10 fold CV)                      91%
game F-Measure (SVM, 10 fold CV)                      89%
combined F-Measure (SVM, 10 fold CV)                  94%
Results V
Usage statistics for 3-day evaluation phase
criterion                                         value
number of players                                 118
number of distinct evaluated links                454
number of agreements                              4738
number of disagreements                           1053
number of times unsure option has been selected   507
total hours played                                17.24 h
average playing time per session                  6.73 min
Results VI
Limitations of image comparisons:
  Similar looking flags often incorrectly assigned to countries
  Similar looking animals confused
  Example: Nile Crocodile vs. Saltwater Crocodile
  Since most users make this mistake systematically, it will persist even in the long run
Sometimes problems in original sources:
  If incorrect images are displayed because of errors in the source data, this also leads to misjudgements in the game
Survey
The survey was divided into two parts:
  personal questions
  game experience with the Pea Invasion game
51 people filled in the survey, with 42 of them answering all questions
Matching animals (62%) was the most popular game task, followed by flags (40%) and languages (28%)
  This suggests that games with a purpose involving images tend to attract interest
About 62% of the players said the game was "quite good" or "a pleasure to play", 38% did "not really" enjoy it and no one said that it is among the worst games
Survey
[Figure: survey ratings of question difficulty and game difficulty]
60% of the players said that they would probably or certainly play the game again
22 players (21%) had already returned and played again within the 3-day evaluation phase
Conclusion
Designed the VeriLinks platform with balancing and reward mechanisms and two game prototypes, and executed a survey and an in-game study
The experiment shows that game-based interlinking can be useful to increase precision and recall
Games need to be designed in a way that either exploits the strengths of the human brain or draws on existing expert knowledge
Presenting players with content they are knowledgeable about and intentionally hiding other relevant information led to more promising results when combining both sources of evidence
> 5700 decisions in 17+ hours → effort to configure the game is lower than the effort of manual link creation
End
Jens Lehmann, AKSW/Uni Leipzig
Thank you!
http://verilinks.aksw.org