What to Send First? A Study of Utility in the Semantic Web
Mike Dean¹, Prithwish Basu¹, Ben Carterette², Craig Partridge¹, and James Hendler³
¹Raytheon BBN Technologies
²University of Delaware
³Rensselaer Polytechnic Institute
Joint Large and Heterogeneous Data and Quantified Formalization Workshop (LHD+SemQuant 2012)
Boston, Massachusetts, 12 November 2012
Copyright 2012 Raytheon BBN Technologies
Outline
• Problem
• Our Solution
• Future Work
Problem
• Transfer a knowledge base in a constrained or intermittent communication environment
  – Tactical military
  – Large football game or conference
• Send the most important information first
  – Prioritize statements based on their utility
• Account for inference
  – No need to transfer inferred statements
Utility
• The utility of a statement can be calculated by a preference function $U(S, s)$, where $S$ is the set of statements in a knowledge base and $s \in S$
• Somewhat arbitrarily, we chose:
  – Utility ranges from 0 to 1
  – The total utility of all statements in $S$ should equal 1
Preference Functions
• Ideally, users would provide a preference function suitable for a given context
  – Difficult to extract or derive
• Need a default preference function when nothing more specific is available
• We selected inverse frequency as the default
• Motivations
  – Surprise in previous research on Semantic Information Theory
  – Term frequency-inverse document frequency (TF-IDF) in Information Retrieval systems
RDF Utility
• We consider each URI and literal to be a symbol
• We compute the utility of a statement by averaging the inverse frequencies of its subject, predicate, and object components and then normalizing the results so that they sum to 1
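As an illustration only, the following Python sketch computes this default utility. It assumes statements are (subject, predicate, object) string tuples and that a symbol's frequency is its number of occurrences in any position across the KB; the function name inverse_frequency_utility is hypothetical and not part of the authors' Jena-based code.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def inverse_frequency_utility(kb: Set[Triple]) -> Dict[Triple, float]:
    """Hypothetical helper: normalized inverse-frequency utility per statement."""
    # Each URI or literal is a symbol; count its occurrences in any position.
    freq = Counter(term for triple in kb for term in triple)
    # Raw score: mean of the inverse frequencies of subject, predicate, object.
    raw = {t: sum(1.0 / freq[term] for term in t) / 3 for t in kb}
    # Normalize so the utilities sum to 1 (each then lies in [0, 1]).
    total = sum(raw.values())
    return {t: score / total for t, score in raw.items()}


if __name__ == "__main__":
    cascade = {
        (":A", "rdfs:subClassOf", ":B"),
        (":B", "rdfs:subClassOf", ":C"),
        (":C", "rdfs:subClassOf", ":D"),
        (":D", "rdfs:subClassOf", ":E"),
        (":a", "rdf:type", ":A"),
    }
    for triple, u in sorted(inverse_frequency_utility(cascade).items()):
        print(triple, round(u, 5))
```

On the five cascade statements, :a rdf:type :A gets the highest utility (0.3125) because :a and rdf:type each occur only once, and :D rdfs:subClassOf :E (0.21875) benefits from the single occurrence of :E.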
Inference
• Statements can be used to infer other statements
  – We want to quantify this by computing the inference contribution of each of these statements
• Statements can have different utilities in different KBs
  – We’re particularly interested in the initial (ground) KB and its deductive closure
• The total inference contribution is 1 minus the sum of the utilities, in the deductive closure, of the ground statements
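To make the last bullet concrete, here is a Python sketch under stated assumptions. It does not use Jena: as a stand-in for the RDFS reasoner it applies only two RDFS entailment rules (rdfs11, subClassOf transitivity, and rdfs9, type propagation along subClassOf), so it omits axiomatic triples such as those involving rdfs:Resource. All function names are hypothetical. The total inference contribution is computed as 1 minus the summed closure utility of the ground statements, which equals the summed utility of the inferred statements.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def utility(kb: Set[Triple]) -> Dict[Triple, float]:
    """Normalized inverse-frequency utility of each statement in kb."""
    freq = Counter(term for t in kb for term in t)
    raw = {t: sum(1.0 / freq[term] for term in t) / 3 for t in kb}
    total = sum(raw.values())
    return {t: r / total for t, r in raw.items()}


def closure(kb: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS rules (rdfs11, rdfs9) to a fixed point."""
    result = set(kb)
    while True:
        sub = [(s, o) for s, p, o in result if p == SUBCLASS]
        # rdfs11: A subClassOf B, B subClassOf C => A subClassOf C
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: x type A, A subClassOf B => x type B
        new |= {(x, TYPE, b) for x, p, a in result if p == TYPE
                for a2, b in sub if a == a2}
        if new <= result:
            return result
        result |= new


def inference_contribution(ground: Set[Triple]) -> float:
    """1 minus the summed closure utility of the ground statements."""
    u = utility(closure(ground))
    return 1.0 - sum(u[t] for t in ground)


if __name__ == "__main__":
    cascade = {
        (":A", SUBCLASS, ":B"), (":B", SUBCLASS, ":C"),
        (":C", SUBCLASS, ":D"), (":D", SUBCLASS, ":E"),
        (":a", TYPE, ":A"),
    }
    print(len(closure(cascade)), round(inference_contribution(cascade), 4))
```

For the five cascade statements this sketch produces a 15-statement closure and a total inference contribution of 0.675; that figure reflects the simplified rule set, not the authors' results.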
Framework
• An experiment consists of:
  – A set of statements (KB)
  – An inference procedure (we used RDF Schema)
  – A preference function (we used inverse frequency)
  – A statement ranking function, which uses various computed values
• Implemented using Jena, its rule-based reasoner, and its Derivation interface
• We accumulate utility in the deductive closure as statements are transmitted and inferred
  – Generate a transcript and a cumulative utility graph
  – An experiment can be summarized by its average cumulative utility
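The accumulation step can be sketched in Python as follows. The definitions here are our assumptions, not necessarily the authors': after each transmitted statement the receiver recomputes the closure of everything received so far, the cumulative utility is the summed utility (taken from the closure of the full KB) of all statements it now knows, and the average cumulative utility is the mean over the n transmission steps. The closure applies only RDFS rules rdfs9 and rdfs11, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def utility(kb: Set[Triple]) -> Dict[Triple, float]:
    """Normalized inverse-frequency utility of each statement in kb."""
    freq = Counter(term for t in kb for term in t)
    raw = {t: sum(1.0 / freq[term] for term in t) / 3 for t in kb}
    total = sum(raw.values())
    return {t: r / total for t, r in raw.items()}


def closure(kb: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS rules (rdfs11, rdfs9) to a fixed point."""
    result = set(kb)
    while True:
        sub = [(s, o) for s, p, o in result if p == SUBCLASS]
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in result if p == TYPE
                for a2, b in sub if a == a2}
        if new <= result:
            return result
        result |= new


def cumulative_utilities(order: List[Triple]) -> List[float]:
    """Utility known to the receiver after each transmitted statement."""
    # Utility of every statement in the deductive closure of the full KB.
    target = utility(closure(set(order)))
    received: Set[Triple] = set()
    curve: List[float] = []
    for stmt in order:
        received.add(stmt)
        # Known = what was sent plus what can be inferred from it.
        known = closure(received)
        curve.append(sum(target[t] for t in known))
    return curve


def average_cumulative_utility(order: List[Triple]) -> float:
    # Assumption: average the curve over the n transmission steps.
    curve = cumulative_utilities(order)
    return sum(curve) / len(curve)


if __name__ == "__main__":
    order = [
        (":D", SUBCLASS, ":E"), (":B", SUBCLASS, ":C"), (":A", SUBCLASS, ":B"),
        (":C", SUBCLASS, ":D"), (":a", TYPE, ":A"),
    ]
    print([round(v, 4) for v in cumulative_utilities(order)])
    print(round(average_cumulative_utility(order), 4))
```

For the cascade order 0, 1, 4, 2, 3 (using the statement numbering from the Cascade slides), this sketch yields the curve 0.0625, 0.125, 0.25, 0.625, 1.0 and an average of 0.4125. These values follow from the simplified rules and averaging, and are not expected to match the authors' Jena-based numbers.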
Data Sets
• POTUS – Wikipedia information about Presidents of the United States
• FOAF – My FOAF profile + FOAF vocabulary
• Cascade – Discussed later
• Data sets and code are available at http://asio.bbn.com/2012/04/utility/
Ranking
• Gold standard: ranking by utility in the deductive closure plus inference contribution
• Inference contribution is rather difficult and expensive to compute
  – Most reasoners provide one justification for an inferred statement, not all of them
• We also tried several heuristics
  – Utility in the initial KB
  – Utility in the deductive closure
  – Random T-box, then random A-box
• Base case: random
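The heuristic rankings can be sketched in Python as below. This is an illustrative sketch under our own assumptions, not the authors' implementation: statements are string triples, utilities use the inverse-frequency scheme from the RDF Utility slide, the closure applies only RDFS rules rdfs9 and rdfs11, and the T-box is taken to be the statements whose predicate is one of a few RDFS schema properties (the SCHEMA_PREDICATES set and all function names are hypothetical). The gold-standard ranking is omitted because it needs every justification of each inferred statement.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"
# Assumption: statements with these predicates form the T-box.
SCHEMA_PREDICATES = {SUBCLASS, "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range"}


def utility(kb: Set[Triple]) -> Dict[Triple, float]:
    """Normalized inverse-frequency utility of each statement in kb."""
    freq = Counter(term for t in kb for term in t)
    raw = {t: sum(1.0 / freq[term] for term in t) / 3 for t in kb}
    total = sum(raw.values())
    return {t: r / total for t, r in raw.items()}


def closure(kb: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS rules (rdfs11, rdfs9) to a fixed point."""
    result = set(kb)
    while True:
        sub = [(s, o) for s, p, o in result if p == SUBCLASS]
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in result if p == TYPE
                for a2, b in sub if a == a2}
        if new <= result:
            return result
        result |= new


def by_initial_utility(kb: Set[Triple]) -> List[Triple]:
    # Highest utility in the initial KB first; ties broken by statement text.
    u = utility(kb)
    return sorted(kb, key=lambda t: (-u[t], t))


def by_closure_utility(kb: Set[Triple]) -> List[Triple]:
    # Rank ground statements by their utility in the deductive closure.
    u = utility(closure(kb))
    return sorted(kb, key=lambda t: (-u[t], t))


def random_tbox_then_abox(kb: Set[Triple], rng: random.Random) -> List[Triple]:
    ordered = sorted(kb)  # fixed starting order so a seeded rng is reproducible
    tbox = [t for t in ordered if t[1] in SCHEMA_PREDICATES]
    abox = [t for t in ordered if t[1] not in SCHEMA_PREDICATES]
    rng.shuffle(tbox)
    rng.shuffle(abox)
    return tbox + abox


def random_order(kb: Set[Triple], rng: random.Random) -> List[Triple]:
    # Base case: a uniformly random transmission order.
    order = sorted(kb)
    rng.shuffle(order)
    return order
```

On the cascade KB, both utility heuristics in this sketch send :a rdf:type :A first; ranking by initial utility then sends :D rdfs:subClassOf :E, while ranking by closure utility sends the remaining subClassOf statements in tie-break order.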
Results of Different Ranking Functions
• Cumulative utility for 262 statements in the POTUS data set
Observations
• We can effectively order statements to increase or maximize average cumulative utility
• Using inverse frequency:
  – Inferred RDFS statements are of lower utility
  – This matches intuitions and practice regarding rdf:Resource, etc.
• Ranking based on simpler heuristics appears promising
  – More research is needed
Cascade Data Set
• @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix : <http://example.org/cascade#> .
  :A rdfs:subClassOf :B .
  :B rdfs:subClassOf :C .
  :C rdfs:subClassOf :D .
  :D rdfs:subClassOf :E .
  :a rdf:type :A .
• It is possible to analyze all 5! = 120 permutations
• What order do you think is best?
Cascade Data Set (2)
• Average Cumulative Utility for all 120 permutations of cascade statements
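The exhaustive analysis can be sketched in Python by scoring every permutation of the five statements. The sketch rests on simplifying assumptions of our own: inverse-frequency utility averaged over subject, predicate, and object; a closure limited to RDFS rules rdfs9 and rdfs11 (no axiomatic triples such as those involving rdfs:Resource); and an average taken over the five transmission steps. All names are hypothetical, and statements are numbered 0 to 4 as on the following slide.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

# Cascade statements, numbered 0-4 as in the deck.
STATEMENTS: List[Triple] = [
    (":D", SUBCLASS, ":E"), (":B", SUBCLASS, ":C"), (":C", SUBCLASS, ":D"),
    (":a", TYPE, ":A"), (":A", SUBCLASS, ":B"),
]


def utility(kb: Set[Triple]) -> Dict[Triple, float]:
    """Normalized inverse-frequency utility of each statement in kb."""
    freq = Counter(term for t in kb for term in t)
    raw = {t: sum(1.0 / freq[term] for term in t) / 3 for t in kb}
    total = sum(raw.values())
    return {t: r / total for t, r in raw.items()}


def closure(kb: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS rules (rdfs11, rdfs9) to a fixed point."""
    result = set(kb)
    while True:
        sub = [(s, o) for s, p, o in result if p == SUBCLASS]
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in result if p == TYPE
                for a2, b in sub if a == a2}
        if new <= result:
            return result
        result |= new


def average_cumulative_utility(order: List[Triple]) -> float:
    target = utility(closure(set(order)))  # utilities in the full closure
    received: Set[Triple] = set()
    total = 0.0
    for stmt in order:
        received.add(stmt)
        total += sum(target[t] for t in closure(received))
    return total / len(order)


def score_all_orders() -> Dict[str, float]:
    # Key each permutation by its digit string, e.g. "01423".
    scores = {}
    for perm in permutations(range(len(STATEMENTS))):
        order = [STATEMENTS[i] for i in perm]
        scores["".join(map(str, perm))] = average_cumulative_utility(order)
    return scores


if __name__ == "__main__":
    scores = score_all_orders()
    best = max(scores.values())
    print(len(scores), "orders; best average", round(best, 4))
    print([k for k, v in scores.items() if abs(v - best) < 1e-9])
```

Under these assumptions the unique best order is 34120 (average 0.475), which sends :a rdf:type :A first. The best orders reported in this deck send that statement last, so the sketch illustrates the enumeration technique rather than reproducing the reported results.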
Cascade Data Set (3)
• Statements
  0. :D rdfs:subClassOf :E .
  1. :B rdfs:subClassOf :C .
  2. :C rdfs:subClassOf :D .
  3. :a rdf:type :A .
  4. :A rdfs:subClassOf :B .
• Best results (average cumulative utility .639), as orders of statement numbers:
  – 01423
  – 04123
  – 04213
  – 10423
  – 24013
  – 40123
  – 40213
  – 42013
Contributions
• Introducing utility into the Semantic Web
• Quantifying inference
• A new problem
• An evaluation framework
Future Directions
• Incorporating user-defined preferences
• Employing more sophisticated inference (e.g., OWL RL)
• Working with (much) larger data sets
• Generalizing our framework into a toolkit
• Considering the bits required to encode messages
• Addressing multi-party situations with different preference functions
• Modeling information fusion