the semantic web: there and back again
DESCRIPTION
The Semantic Web: there and back again. Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad , Anupam Joshi. http:// ebiq.org /r/353. LOD 123 : Making the semantic w eb e asier to u se. Tim Finin University of Maryland, Baltimore County - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/1.jpg)
The Semantic Web:there and back again
Tim FininUniversity of Maryland, Baltimore County
Joint work with Lushan Han, Varish Mulwad, Anupam Joshi
http://ebiq.org/r/353
![Page 2: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/2.jpg)
LOD 123: Making the semantic web easier to use
Tim FininUniversity of Maryland, Baltimore County
Joint work with Lushan Han, Varish Mulwad, Anupam Joshi
![Page 3: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/3.jpg)
Semantic Web: then and now
• Ten years ago the we developed complex ontologies used to encode and reason over small datasets of 1000s of facts
• Recently the focus has shifted to using simple ontologies and minimal reasoning over very large datasets of 100s of millions of facts
• Major companies are moving: Google Know-ledge Graph, Facebook Open Graph, Microsoft Satori, Apple Siri KB, IMB Watson KB Linked open data or “Things, not strings”
![Page 4: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/4.jpg)
Linked Open Data (LOD)• Linked data is just RDF data, lots of it,
with a small schema• RDF data is a graph of triples (subject, predicate
– URI URI String: dbr:Barack_Obama dbo:spouse “Michelle Obama”
– URI URI URI: dbr:Barack_Obama dbo:spouse dbpedia:Michelle_Obama
• Best linked data practice prefers 2nd pattern, using nodes rather than strings for “entities”– Things, not strings!
• Linked open data is just linked data freely acces-sible on the Web along with their ontologies
![Page 5: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/5.jpg)
Semantic web technologies allow machines to share data and knowledge using common web language and protocols.
~ 1997
Semantic Web
Semantic Web beginning
Use Semantic Web Technology to publish shared data & knowledge
![Page 6: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/6.jpg)
2007
Semantic Web => Linked Open DataUse Semantic Web Technology to publish shared data & knowledge
Data is inter-linked to support inte-gration and fusion of knowledge
LOD beginning
![Page 7: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/7.jpg)
2008
Semantic Web => Linked Open DataUse Semantic Web Technology to publish shared data & knowledge
Data is inter-linked to support inte-gration and fusion of knowledge
LOD growing
![Page 8: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/8.jpg)
2009
Semantic Web => Linked Open DataUse Semantic Web Technology to publish shared data & knowledge
Data is inter-linked to support inte-gration and fusion of knowledge
… and growing
![Page 9: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/9.jpg)
Linked Open Data
2010
LOD is the new Cyc: a common source of background
knowledge
Use Semantic Web Technology to publish shared data & knowledge
Data is inter-linked to support inte-gration and fusion of knowledge
…growing faster
![Page 10: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/10.jpg)
Linked Open Data
2011: 31B facts in 295 datasets interlinked by 504M assertions on ckan.net
LOD is the new Cyc: a common source of background
knowledge
Use Semantic Web Technology to publish shared data & knowledge
Data is inter-linked to support inte-gration and fusion of knowledge
![Page 11: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/11.jpg)
Exploiting LOD not (yet) Easy
• Publishing or using LOD data hasinherent difficulties for the potential user
– It’s difficult to explore LOD data and to query it for answers
– It’s challenging to publish data using appropriate LOD vocabularies & link it to existing data
• Problem: O(104) schema terms, O(1011) instances
• I’ll describe two ongoing research projects that are addressing these problems
![Page 12: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/12.jpg)
GoRelations:Intuitive Query System
for Linked Data
Research with Lushan Han
http://ebiq.org/j/93
![Page 13: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/13.jpg)
Dbpedia is the Stereotypical LOD•DBpedia is an important example of Linked Open Data–Extracts structured data from Infoboxes in Wikipedia –Stores in RDF using custom ontologies Yago terms
•The major integration point for the entire LOD cloud•Explorable as HTML, but harder to query in SPARQL
DBpedia
![Page 14: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/14.jpg)
Browsing DBpedia’s
Mark Twain
![Page 15: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/15.jpg)
Why it’s hard to query LOD• Querying DBpedia requires a lot of a user– Understand the RDF model– Master SPARQL, a formal query language– Understand ontology terms: 320 classes & 1600 properties !– Know instance URIs (>2M entities !)– Term heterogeneity (Place vs. PopulatedPlace)
• Querying large LODsets overwhelming
• Natural languagequery systems stilla research goal
![Page 16: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/16.jpg)
Goal
• Let users with a basic understanding of RDF query DBpedia and other LOD collections– Explore what data is in the system– Get answers to question– Create SPARQL queries for reuse or adaptation
• Desiderata– Easy to learn and to use– Good accuracy (e.g., precision and recall)– Fast
![Page 17: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/17.jpg)
Key Idea
Structured keyword queries reduce problem complexity:
– User enters a simple graph, and– Annotates the nodes and arcs with
words and phrases
![Page 18: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/18.jpg)
Structured Keyword Queries
• Nodes are entities and links binary relations• Entities described by two unrestricted terms:
name or value and type or concept• Outputs marked with ? • Compromise between a natural language Q&A
system and formal query–Users provide compositional structure of the question–Free to use their own terms to annotate structure
![Page 19: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/19.jpg)
Translation – Step Onefinding semantically similar ontology terms
For each graph concept/relation, generate k most semantically similar ontology classes/properties
Lexical similarity metric based on distributional similarity, LSA, and WordNet
![Page 21: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/21.jpg)
Semantic Textual Similarity task• 2013 lexical and computational semantics conference• Do two sentences have same meaning (0…5)
1: “The woman is playing the violin” vs. “The young lady enjoys listening to the guitar”4: "In May 2010, the troops attempted to invade Kabul” vs. "The US army invaded Kabul on May 7th last year, 2010"
• 2012: 35 teams, 88 runs, 2013: 36 teams, 89 runs• 2250 sentence pairs
from four domains• Our three runs
#1, #2 and #4
Dataset PairingWords Galactus Saiyan
Headlines (750 pairs) 0.7642 (3) 0.7428 (7) 0.7838 (1)
OnWN (561 pairs) 0.7529 (5) 0.7053 (12) 0.5593 (36)
FNWN (189 pairs) 0.5818 (1) 0.5444 (3) 0.5815 (2)
SMT (750 pairs) 0.3804 (8) 0.3705 (11) 0.3563 (16)
Weighted mean 0.6181 (1) 0.5927 (2) 0.5683 (4)
![Page 22: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/22.jpg)
• Assemble best interpretation using statistics of the data
• Use pointwise mutual informa-tion (PMI) between RDF terms in the LOD collection
Measures degree to which two RDF terms co-occur in knowledge base
• In a good interpretation, ontology terms associate like their corresponding user terms connect in the query
Translation – Step Twodisambiguation algorithm
![Page 23: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/23.jpg)
Three aspects are combined to derive an overall goodness measure for each candidate interpretation
Translation – Step Twodisambiguation algorithm
Joint disam-biguation
Resolvingdirection
Link reason-ableness
![Page 24: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/24.jpg)
Translation resultConcepts: Place => Place, Author => Writer, Book => BookProperties: born in => birthPlace, wrote => author (inverse direction)
![Page 25: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/25.jpg)
The translation of a semantic graph query to SPARQL is straightforward given the mappings
SPARQL Generation
Concepts•Place => Place•Author => Writer•Book => Book
Relations•born in => birthPlace•wrote => author
![Page 26: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/26.jpg)
Evaluation• 33 test questions from 2011 Workshop on Question
Answering over Linked Data answerable using DBpedia• Three human subjects unfamiliar with DBpedia translated
the test questions into semantic graph queries• Compared with two top natural language QA systems:
PowerAqua and True Knowledge
![Page 27: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/27.jpg)
Current work
• Baseline system works well for Dbpedia; we’re testing a second use case now
• Current work– Better entity matching– Relaxing the need for type information– A better Web interface with user feedback & advice
• See http://ebiq.org/93 for more information & try our alpha version at http://ebiq.org/GOR
![Page 28: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/28.jpg)
Generating Linked Databy Inferring the
Semantics of Tables
Research with Varish Mulwad
http://ebiq.org/j/96
![Page 29: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/29.jpg)
Goal: Table => LOD*
Name Team Position Height
Michael Jordan Chicago Shooting guard 1.98
Allen Iverson Philadelphia Point guard 1.83
Yao Ming Houston Center 2.29
Tim Duncan San Antonio Power forward 2.11
http://dbpedia.org/class/yago/NationalBasketballAssociationTeams
http://dbpedia.org/resource/Allen_Iverson Player height in meters
dbprop:team
* DBpedia
![Page 30: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/30.jpg)
Goal: Table => LOD*
Name Team Position HeightMichael Jordan Chicago Shooting guard 1.98
Allen Iverson Philadelphia Point guard 1.83
Yao Ming Houston Center 2.29
Tim Duncan San Antonio Power forward 2.11
@prefix dbpedia: <http://dbpedia.org/resource/> .@prefix dbo: <http://dbpedia.org/ontology/> .@prefix yago: <http://dbpedia.org/class/yago/> .
"Name"@en is rdfs:label of dbo:BasketballPlayer ."Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
"Michael Jordan"@en is rdfs:label of dbpedia:Michael Jordan .dbpedia:Michael Jordan a dbo:BasketballPlayer .
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago Bulls .dbpedia:Chicago Bulls a yago:NationalBasketballAssociationTeams .
RDF Linked Data
All this in a completely automated way* DBpedia
![Page 31: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/31.jpg)
Tables are everywhere !! … yet …
The web – 154 million high quality relational tables
Fewer than 1% of the 400K tables at data.gov have rich semantic schemas
![Page 32: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/32.jpg)
Sampling Acronym detection
Pre-processing modules
Query and generate initial mappings
2 1
Generate Linked RDF Verify (optional) Store in a knowledge base & publish as LOD
Joint Inference/Assignment
A Domain Independent Framework
![Page 33: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/33.jpg)
Query and Rank
Rank(String Similarity,
Popularity)Chicago
Boston
Allen Iverson 1. Chicago_Bulls2. Chicago3. Judy_Chicago
possible entitiesfor Chicago
Can be replaced by Domain Specific / other LOD knowledge bases
![Page 34: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/34.jpg)
Generating candidate ‘types’ for Columns
Team
Chicago
Philadelphia
Houston
San Antonio
Class
Instance
1. Chicago Bulls2. Chicago3. Judy Chicago
{dbpedia-owl:Place,dbpedia-owl:City,yago:WomenArtist,yago:LivingPeople,yago:NationalBasketballAssociationTeams }
{dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film,yago:NationalBasketballAssociationTeams …. ….. ….. }
{……………………………………………………………. }
dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film ….
![Page 35: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/35.jpg)
Joint Inference / Assignment
![Page 36: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/36.jpg)
A graphical model for tablesJoint inference over evidence in a table
C1 C2 C3
R11
R12
R13
R21
R22
R23
R31
R32
R33
Team
Chicago
Philadelphia
Houston
San Antonio
Class
Instance
![Page 37: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/37.jpg)
Parameterized Graphical Model
C1 C2C3
𝝍𝟓
R11 R12 R13 R21 R22 R23 R31 R32 R33
Function capturing affinity between column headers and row values
Row value
Variable Node: Column header
Captures interaction between column headers
Factor Node
Captures interaction between row values
![Page 38: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/38.jpg)
Standard message passingP(C1, C2, C3, R11, R12 ,R13, R21, R22, R23, R31, R32, R33)Joint Assignment :
P(C1, R11, R12 ,R13)
P(C3, R31, R32, R33)
P(C2,R21, R22, R23)
P(R31, R32, R33)
Graphical Models : Exploit Conditional
Independences
Still …
C1 R11 R12 R13 Val
4 Variables; Each having
25 options -- 390,625
entries !
![Page 39: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/39.jpg)
Semantic message passing
Michael_I_Jordan Chicago_BullsShooting_guard
“Change”Related to Chicago_Bulls
“No Change”“No Change”
R11:[Michael_I_Jordan]
R12:[Yao_Ming]
R13:[Allen_Iverson]
R21:[Chicago_Bulls]
R31:[Shooting_Guard]
……
C1:[BasketballPlayer] C2:[NBATeam] C3:[BasketBallPositions]
Yao_Ming Allen_Iverson
BasketballPlayer
NBATeam BasketBallPositions
“Change”BasketBall
Player
“No Change”
“No Change”
“No Change”“No Change”
“No Change”
“No Change”
![Page 40: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/40.jpg)
Inference – ExampleR11:
[Michael_I_Jordan]
R12:[Allen_Iverson] R13:[Yao_Ming]
C1:[Name]
(Michael_I_Jordan, Yao_Ming, Allen_Iverson)
“Change”“No Change”“No Change”
“BasketBallPlayer”
R11
Michael Jordan
1. Michael_I_Jordan (Professor)2. ….. 3. Michael_Jordan (BasketballPlayer)
….
Michael_I_Jordan Allen_Iverson Yao_Ming
“BasketBallPlayer”
![Page 41: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/41.jpg)
Column header – row value agreement[Michael_I_Jordan, Allen_Iverson, Yao_Ming]
WomenArtistCityPopulatedPlaceAthleteFilm
LivingPeopleGeoPopulatedPlaceBasketBallPlayerArtWork
Name
Michael_I_Jordan
Allen_Iverson
Yao_Ming BasketballPlayerAtheleteLivingPeople
LivingPeopleAI_Researchers +1
+1
+1
+1
+1 1. Athlete2. City 3. …
1.LivingPeople2. BasketBallPlayer3.GeoPopulatedPlace….
Yago Tie-breaker/Re-order : Choose more ‘descriptive’ class. E.g. BasketBallPlayer better than LivingPeople
ClassGranularityScore = 1-[]
1: Majority Voting
2: Choose the top Class
Top Yago : BasketBallPlayer TopDBpedia : Athelete
![Page 42: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/42.jpg)
Column header – row value agreementtopClassScore = (numberOfVotes)/(numberofRows)
Compute topScoreYago & topScoreDBpedia
(both)Score < Threshold(topScoreYago || topScoreDBpedia) >= Threshold
Check for AlignmentIs Athelete sub/superClass of BasketBallPlayer ?
Columnn Header Annotation = BasketBallPlayer, Athlete
Name
Michael_I_Jordan
Allen_Iverson
Yao_Ming BasketballPlayerAtheleteLivingPeople
LivingPeopleAI_Researchers Change
No - Change
Update Column Header Annotation = “No-Annotation”
![Page 43: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/43.jpg)
Update row value entity annotations
R11
Michael Jordan
1. Michael_I_Jordan 2. ….. 3. Michael_Jordan ….
LivingPeopleAI_Researchers
BasketBallPlayerAthlete
𝝍𝟑 “CHANGE”
Entity Class : BasketBallPlayer or
Athlete
R11 Michael Jordan Michael_Jordan
Candidate Entities for R11
![Page 44: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/44.jpg)
Evaluation
• Dataset of 80 tables (Wikipedia tables; part of larger dataset released by IIT-Bombay)
• Evaluated Column Header Annotation Accuracy– How good was the mapping Team to
NationalBasketballAssociationTeams• Evaluated Entity Linking Accuracy
– Mapping Michael Jordan to Michael_Jordan
![Page 45: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/45.jpg)
Column Header Annotation Accuracy
• System produced a ranked a list of Yago & DBpedia classes
• Human judges evaluated each class• For precision, judges scored each class
• 1 if the class was accurate• 0.5 if the class ok, but not best (e.g., Place vs. City)• 0 if it was incorrect
• For Recall, score 1 if accurate/correct, 0 for incorrect
522
259
422
Accurate
Okay
Incorrect
![Page 46: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/46.jpg)
Top K classes, F-measure
Yago rank 1
Yago rank 2
Yago rank 3
dbp rank 1
dbp rank 2
dbp rank 3
GOOG IIT-B
F-Measure 0.688360450563
204
0.487568135310
842
0.438263244010
532
0.677154394707
449
0.626750158305
777
0.612644415917
844
0.67 0.56
0.05
0.15
0.25
0.35
0.45
0.55
0.65
0.75
Column Header Annotations
F-Measure
![Page 47: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/47.jpg)
Entity linking accuracy
http://dbpedia.org/resource/Allen_IversonAllen Iverson
Correctly Linked Entities
3022
Incorrectly Linked Entities
959
Total Entities 3981
Accuracy : 75.91 %
![Page 48: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/48.jpg)
Other Challenges• Using table captions and other text is
associated documents to provide context• Size of some data.gov tables (> 400K rows!)
makes using full graphical model impractical– Sample table and run model on the subset
• Achieving acceptable accuracy may require human input– 100% accuracy unattainable automatically– How best to let humans offer advice and/or
correct interpretations?
![Page 49: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/49.jpg)
Final Conclusions
• Linked data great for sharing structured and semi-structured data– Backed by machine-understandable semantics– Uses successful Web languages and protocols
• Generating and exploring linked data resources is challenging– Schemas are too large, too many URIs
• New tools mapping tables to linked data and translating structured natural language queries reduce the barriers
![Page 50: The Semantic Web: there and back again](https://reader035.vdocuments.site/reader035/viewer/2022062301/568159e7550346895dc7329f/html5/thumbnails/50.jpg)
http://ebiq.org/