
What to Send First? A Study of Utility in the Semantic Web

Mike Dean (1), Prithwish Basu (1), Ben Carterette (2), Craig Partridge (1), and James Hendler (3)

(1) Raytheon BBN Technologies
(2) University of Delaware
(3) Rensselaer Polytechnic Institute

Joint Large and Heterogeneous Data and Quantified Formalization Workshop (LHD+SemQuant 2012)

Boston, Massachusetts, 12 November 2012

Copyright 2012 Raytheon BBN Technologies

Outline

• Problem
• Our Solution
• Future Work


Problem

• Transfer a knowledge base in a constrained or intermittent communication environment
  – Tactical military
  – Large football game or conference

• Send the most important information first
  – Prioritize statements based on their utility

• Account for inference
  – No need to transfer inferred statements


Utility

• The utility of a statement can be calculated by a preference function U(S, s), where S is the set of statements in a knowledge base and s ∈ S

• Somewhat arbitrarily
  – Utility ranges from 0 to 1
  – The total utility of all statements in S should equal 1


Preference Functions

• Ideally, users would provide a preference function suitable for a given context
  – Difficult to extract or derive

• Need a default preference function when nothing more specific is available

• We selected inverse frequency as the default

• Motivations
  – Surprise in previous research on Semantic Information Theory
  – Term frequency-inverse document frequency in Information Retrieval systems


RDF Utility

• We consider each URI and literal to be a symbol
• We compute the utility of a statement by averaging the inverse frequencies of its subject, predicate, and object components and then normalizing the results
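The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that a symbol's frequency is its number of occurrences across all subject, predicate, and object positions in the KB, and that "normalizing" means dividing by the sum so the utilities total 1. The function name `inverse_frequency_utilities` and the sample triples are hypothetical.

```python
from collections import Counter

def inverse_frequency_utilities(triples):
    """Return {triple: utility}, with the utilities summing to 1."""
    # Assumption: a symbol's frequency counts every occurrence in any position.
    counts = Counter(symbol for triple in triples for symbol in triple)
    # Raw score: mean inverse frequency of subject, predicate, and object.
    raw = {t: sum(1.0 / counts[s] for s in t) / 3.0 for t in triples}
    # Normalize so that the total utility of the KB is 1.
    total = sum(raw.values())
    return {t: score / total for t, score in raw.items()}

# Hypothetical three-statement KB, written as (subject, predicate, object).
kb = [
    (":A", "rdfs:subClassOf", ":B"),
    (":B", "rdfs:subClassOf", ":C"),
    (":a", "rdf:type", ":A"),
]
utilities = inverse_frequency_utilities(kb)
for triple in kb:
    print(triple, round(utilities[triple], 4))
```

Under these assumptions the statement built from the rarest symbols (`:a rdf:type :A`) receives the highest utility, which is the "surprise" intuition behind inverse frequency.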


Inference

• Statements can be used to infer other statements
  – We want to quantify this by computing the inference contribution of each of these statements

• Statements can have different utilities in different KBs
  – We're particularly interested in the initial (ground) KB and its deductive closure

• The total inference contribution is 1 minus the summed utility of the ground statements, with utility measured in the deductive closure
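One way to write the total inference contribution down, under the assumption that the utilities of the ground statements are summed, is shown here. The symbols G (the ground KB), G* (its deductive closure), and C_total are introduced only for illustration. Because the utilities over G* sum to 1, the total inference contribution equals the utility mass carried by the inferred statements.

```latex
C_{\text{total}}
  \;=\; 1 - \sum_{g \in G} U(G^{*}, g)
  \;=\; \sum_{s \in G^{*} \setminus G} U(G^{*}, s),
\qquad\text{since}\quad \sum_{s \in G^{*}} U(G^{*}, s) = 1 .
```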


Framework

• An experiment consists of
  – A set of statements (KB)
  – An inference procedure: we used RDF Schema
  – A preference function: we used inverse frequency
  – A statement ranking function, which uses various computed values

• Implemented using Jena, its rule-based reasoner, and its Derivation interface

• We accumulate utility in the deductive closure as statements are transmitted and inferred
  – Generate a transcript and a cumulative utility graph
  – An experiment can be summarized by its average cumulative utility
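The summary statistic can be illustrated with a short Python sketch. It takes one plausible reading of the slide: after each transmitted statement, the receiver computes the closure of what it has received, sums the utility of everything it now knows, and the experiment's score is the mean of those running totals. The function name, the `closure_fn` parameter, and the toy utilities are assumptions for illustration; the original framework is built on Jena and may differ in detail.

```python
def average_cumulative_utility(order, closure_fn, utility):
    """Mean, over transmission steps, of the utility known to the receiver.

    order      -- statements in the order they are transmitted
    closure_fn -- maps a set of received statements to its deductive closure
    utility    -- {statement: utility}, measured in the full deductive closure
    """
    received = set()
    cumulative = []
    for statement in order:
        received.add(statement)
        known = closure_fn(received)  # transmitted plus inferred statements
        cumulative.append(sum(utility[s] for s in known))
    return sum(cumulative) / len(cumulative)

# Toy example with no inference: the closure is just the received set.
toy_utility = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
no_inference = lambda statements: statements

best = average_cumulative_utility(["s1", "s2", "s3"], no_inference, toy_utility)
worst = average_cumulative_utility(["s3", "s2", "s1"], no_inference, toy_utility)
print(best, worst)
```

Sending high-utility statements first raises the average because their utility is counted at every subsequent step, which is why the ordering matters even though every order ends at a total of 1.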


Data Sets

• POTUS – Wikipedia information about Presidents of the United States

• FOAF – My FOAF profile + FOAF vocabulary

• Cascade – Discussed later

• Data sets and code are available at http://asio.bbn.com/2012/04/utility/


Ranking

• Gold standard: Ranking by utility in the deductive closure + inference contribution

• Inference contribution is rather difficult and expensive to compute
  – Most reasoners provide 1 justification, not all

• Also tried several heuristics
  – Utility in the initial KB
  – Utility in the deductive closure
  – Random T box, then random A box

• Base case: random
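Two of the heuristics listed above are simple enough to sketch directly. The code is illustrative only: the set `TBOX_PREDICATES` is an assumption about what counts as a T box (schema) statement, and the function names and sample data are hypothetical rather than taken from the authors' code.

```python
import random

# Assumption: a statement belongs to the T box if its predicate is one of
# these RDFS schema terms; everything else is treated as A box.
TBOX_PREDICATES = {"rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range"}

def rank_by_utility(statements, utility):
    """Highest-utility statements first; ties keep their input order."""
    return sorted(statements, key=lambda s: utility[s], reverse=True)

def rank_tbox_then_abox(statements, seed=0):
    """Randomly ordered T box statements, then randomly ordered A box statements."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    tbox = [s for s in statements if s[1] in TBOX_PREDICATES]
    abox = [s for s in statements if s[1] not in TBOX_PREDICATES]
    rng.shuffle(tbox)
    rng.shuffle(abox)
    return tbox + abox

# Hypothetical statements and utilities.
statements = [
    (":a", "rdf:type", ":A"),
    (":A", "rdfs:subClassOf", ":B"),
    (":B", "rdfs:subClassOf", ":C"),
]
utility = {statements[0]: 0.5, statements[1]: 0.2, statements[2]: 0.3}

by_utility = rank_by_utility(statements, utility)
schema_first = rank_tbox_then_abox(statements)
print(by_utility)
print(schema_first)
```

Neither heuristic needs the inference contribution, which is what makes them cheaper than the gold-standard ranking.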

Results of Different Ranking Functions

• Cumulative utility for 262 statements in the POTUS data set


Observations

• We can effectively order statements to increase or maximize average cumulative utility

• Using inverse frequency
  – Inferred RDFS statements are of lower utility
  – Matches intuitions and practice regarding rdf:Resource, etc.

• Ranking based on simpler heuristics appears promising
  – More research is needed


Cascade Data Set

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://example.org/cascade#> .

:A rdfs:subClassOf :B .
:B rdfs:subClassOf :C .
:C rdfs:subClassOf :D .
:D rdfs:subClassOf :E .

:a rdf:type :A .

• Possible to analyze all 5! = 120 possible permutations
• What order do you think is best?
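The cascade statements and their consequences can be explored with a small Python sketch. It is a deliberate simplification: it applies only the two RDFS entailment rules relevant here (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, type propagation along rdfs:subClassOf), whereas a full RDFS reasoner such as Jena's also derives statements about rdfs:Resource, rdfs:Class, and so on. Its counts therefore should not be expected to reproduce the reported results. The function name `rdfs_closure` is hypothetical.

```python
from itertools import permutations

SUB, TYPE = "rdfs:subClassOf", "rdf:type"

CASCADE = [
    (":A", SUB, ":B"),
    (":B", SUB, ":C"),
    (":C", SUB, ":D"),
    (":D", SUB, ":E"),
    (":a", TYPE, ":A"),
]

def rdfs_closure(statements):
    """Close a set of triples under rdfs9 and rdfs11 only (a simplification)."""
    closure = set(statements)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s1, p1, o1) in closure:
            if p1 != SUB:
                continue
            for (s2, p2, o2) in closure:
                # rdfs11: s2 subClassOf s1 and s1 subClassOf o1 => s2 subClassOf o1
                if p2 == SUB and o2 == s1:
                    new.add((s2, SUB, o1))
                # rdfs9: s2 type s1 and s1 subClassOf o1 => s2 type o1
                if p2 == TYPE and o2 == s1:
                    new.add((s2, TYPE, o1))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

full_closure = rdfs_closure(CASCADE)
orders = list(permutations(CASCADE))

# Size of the receiver's closure after each step, for the order as listed.
sizes = [len(rdfs_closure(CASCADE[:i + 1])) for i in range(len(CASCADE))]
print(len(full_closure), len(orders), sizes)
```

Each transmitted statement unlocks progressively more inferred statements, which is the "cascade" that makes the transmission order interesting.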


Cascade Data Set (2)

• Average Cumulative Utility for all 120 permutations of cascade statements


Cascade Data Set (3)

• Statements
  0. :D rdfs:subClassOf :E .
  1. :B rdfs:subClassOf :C .
  2. :C rdfs:subClassOf :D .
  3. :a rdf:type :A .
  4. :A rdfs:subClassOf :B .

• Best results: average cumulative utility 0.639
  – 01423
  – 04123
  – 04213
  – 10423
  – 24013
  – 40123
  – 40213
  – 42013


Contributions

• Introducing utility into the Semantic Web
• Quantifying inference
• A new problem
• An evaluation framework


Future Directions

• Incorporating user-defined preferences
• Employing more sophisticated inference (e.g. OWL RL)
• Working with (much) larger data sets
• Generalizing our framework into a toolkit
• Considering bits required to encode messages
• Addressing multi-party situations with different preference functions
• Modeling information fusion

