Inferring Web Citations using SPARQL Rules and Social Data – LUPAS2010
Inferring Web Citations using Social Data and SPARQL Rules
Matthew Rowe
Organisations, Information and Knowledge Group
University of Sheffield
Outline
• Problem Setting
  – Personal Information Dissemination
• SPARQL Rules: Identifying Web Citations
  – Generating Seed Data
  – Gathering Possible Web Citations
  – Inferring Web Citations
• Evaluation
• Conclusions
• Future Work
Personal Information on the Web
• Personal information on the Web is disseminated:
  – Voluntarily
  – Involuntarily
• Increase in personal information:
  – Identity Theft
  – Lateral Surveillance
• Web users must discover their identity web references
  – 2 stage process
    • Find possible references
    • Identify definite references
Ambiguity!
Matthew Rowe: Composer
Matthew Rowe: Cyclist
Matthew Rowe: Gardener
Matthew Rowe: Song Writer
Matthew Rowe: PhD Student
Problem Setting
• Performing identification manually:
  – Time consuming
  – Laborious
    • Handle masses of information
  – Repeated often
    • The Web keeps changing
• Solution = automated techniques
  – Alleviate the need for humans
  – Need background knowledge
    • Who am I searching for?
    • What makes them unique?
SPARQL Rules: Identifying Web Citations
Generating Seed Data
• Profiles on the Social Web are leveraged as seed data
• To generate seed data:
  1. Export Social Graphs
    • Interface with the platform's API
    • Convert proprietary response into RDF
      – Biographical Information
      – Social Network Information
  2. Enrich Graphs with URIs
  3. Interlink graphs
    • Detect equivalent foaf:Person instances
    • Build a single social graph
http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html
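The export step (platform response converted into RDF) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the JSON field names (`name`, `homepage`, `friends`), the `#friendN` URI scheme and the example.org URIs are hypothetical, and no real platform API is called. Triples are kept as plain `(subject, predicate, object)` tuples rather than using an RDF library.

```python
import json

FOAF = "http://xmlns.com/foaf/0.1/"


def profile_to_triples(base_uri, response_text):
    """Map a hypothetical JSON profile response to FOAF triples (s, p, o)."""
    profile = json.loads(response_text)
    me = base_uri + "#me"
    triples = [(me, FOAF + "name", profile["name"])]
    # Biographical information.
    if profile.get("homepage"):
        triples.append((me, FOAF + "homepage", profile["homepage"]))
    # Social network information: one person resource per contact.
    for index, friend in enumerate(profile.get("friends", [])):
        friend_uri = f"{base_uri}#friend{index}"
        triples.append((me, FOAF + "knows", friend_uri))
        triples.append((friend_uri, FOAF + "name", friend["name"]))
        if friend.get("homepage"):
            triples.append((friend_uri, FOAF + "homepage", friend["homepage"]))
    return triples


# Stand-in for a proprietary API response (invented sample data).
sample = json.dumps({
    "name": "Alice Example",
    "homepage": "http://example.org/alice",
    "friends": [{"name": "Bob Example", "homepage": "http://example.org/bob"}],
})
seed = profile_to_triples("http://example.org/alice/foaf.rdf", sample)
for triple in seed:
    print(triple)
```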
Detecting equivalent foaf:Person instances:
  1. Blocking Step
  2. Compare values of Inverse Functional Properties
  3. Compare Geo URIs
  4. Compare Geo data
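The first two interlinking steps (blocking, then comparing inverse functional property values) might look roughly like the sketch below. It is not the author's implementation: blocking on the lower-cased name is an assumption (the slides do not say which blocking key is used), the set of properties treated as inverse functional is chosen for illustration, and the geo comparisons are left out.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
# Assumption: properties treated as inverse functional in this sketch.
INVERSE_FUNCTIONAL = {FOAF + "homepage", FOAF + "mbox"}


def index_graph(graph):
    """Collect each person's normalised name and inverse functional values."""
    names, ifp_values = {}, defaultdict(set)
    for s, p, o in graph:
        if p == FOAF + "name":
            names[s] = o.strip().lower()
        elif p in INVERSE_FUNCTIONAL:
            ifp_values[s].add((p, o))
    return names, ifp_values


def find_equivalent_people(graph_a, graph_b):
    """Return (uri_a, uri_b) pairs in the same block that share an IFP value."""
    names_a, ifps_a = index_graph(graph_a)
    names_b, ifps_b = index_graph(graph_b)

    # Blocking step: only people with the same normalised name are compared.
    blocks = defaultdict(list)
    for uri, name in names_b.items():
        blocks[name].append(uri)

    matches = []
    for uri_a, name in names_a.items():
        for uri_b in blocks.get(name, []):
            # A shared inverse functional value implies the same person.
            if ifps_a[uri_a] & ifps_b[uri_b]:
                matches.append((uri_a, uri_b))
    return matches


# Invented sample graphs standing in for two exported social graphs.
graph_one = [
    ("fb:1", FOAF + "name", "Bob Example"),
    ("fb:1", FOAF + "homepage", "http://example.org/bob"),
    ("fb:2", FOAF + "name", "Carol Example"),
]
graph_two = [
    ("tw:9", FOAF + "name", "bob example"),
    ("tw:9", FOAF + "homepage", "http://example.org/bob"),
    ("tw:8", FOAF + "name", "Carol Example"),
]
matches = find_equivalent_people(graph_one, graph_two)
print(matches)
```

In this sketch a shared name alone is not enough to merge two instances; the pair must also agree on at least one inverse functional value.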
Gathering Possible Web Citations
• Search WWW and Semantic Web for possible citations
• Web resources come in many flavours:
  – Data Models, HTML documents, XHTML documents
• Convert into RDF
  – XHTML Documents:
    • Use GRDDL
    • Automated RDF model lifting
  – HTML Documents:
    • Apply person name gazetteer: identify person information
    • Apply Hidden Markov Model to extract information
    • Build RDF model from information
M. Rowe. Data.dcs: Converting Legacy Data into Linked Data. In Proceedings of the Linked Data on the Web Workshop, WWW 2010, Raleigh, USA (2010).
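The gazetteer step for HTML documents could be approximated as below. This is only a rough sketch under stated assumptions: the gazetteer contents are invented, a "person name" is taken to be a gazetteer first name followed by a capitalised token, and the Hidden Markov Model extraction step is not shown.

```python
import re

# Hypothetical, tiny gazetteer of first names; a real one would be far larger.
FIRST_NAMES = {"matthew", "fabio", "alice"}


def find_person_names(text):
    """Return 'First Last' strings whose first token is in the gazetteer."""
    tokens = re.findall(r"[A-Za-z]+", text)
    names = []
    # Look at each pair of adjacent tokens.
    for first, last in zip(tokens, tokens[1:]):
        if (first.lower() in FIRST_NAMES
                and first[0].isupper() and last[0].isupper()):
            names.append(f"{first} {last}")
    return names


page_text = "Matthew Rowe works at the University of Sheffield with Fabio Ciravegna."
found = find_person_names(page_text)
print(found)
```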
Inferring Web Citations using SPARQL Rules
• Seed data = solitary example to build rules
  – State-of-the-art rule induction strategies are limited
    • E.g. FOIL and C4.5
  – Build rules from RDF instances!
1. Extract instances from Seed Data
2. For each instance, build a rule:
  – Build a skeleton rule
  – Add triples to the rule
  – Create a new rule if a triple's predicate is Inverse Functional
3. Apply the rules to the web resources
# Skeleton rule: the page has a topic with the same name as the seed person.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n
}
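To make the effect of the skeleton rule concrete, the sketch below mimics it in plain Python over `(subject, predicate, object)` tuples. It is an illustration only: it assumes the seed data and the web resource's RDF model have been merged into one list of triples, and the URIs in the sample data are invented. A SPARQL engine would evaluate the rule directly.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def apply_skeleton_rule(seed_uri, triples):
    """Infer (seed_uri, foaf:page, url) for pages about a same-named person."""
    # <seed> foaf:name ?n
    seed_names = {o for s, p, o in triples
                  if s == seed_uri and p == FOAF + "name"}
    # ?p foaf:name ?n, indexed by subject
    names_of = {}
    for s, p, o in triples:
        if p == FOAF + "name":
            names_of.setdefault(s, set()).add(o)
    inferred = set()
    # ?url foaf:topic ?p, keeping pages whose topic shares a name with the seed
    for url, p, topic in triples:
        if p == FOAF + "topic" and names_of.get(topic, set()) & seed_names:
            inferred.add((seed_uri, FOAF + "page", url))
    return inferred


# Invented sample data: one seed person and two candidate pages.
ME = "http://example.org/foaf.rdf#me"
merged = [
    (ME, FOAF + "name", "Alice Example"),
    ("http://example.org/page1", FOAF + "topic", "_:a"),
    ("_:a", FOAF + "name", "Alice Example"),
    ("http://example.org/page2", FOAF + "topic", "_:b"),
    ("_:b", FOAF + "name", "Someone Else"),
]
citations = apply_skeleton_rule(ME, merged)
print(citations)
```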
# Skeleton rule plus social network triples: the page also has a topic
# with the same name as someone the seed person knows.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:name ?m .
  ?url foaf:topic ?r .
  ?r foaf:name ?m
}
# New rule for an inverse functional predicate: the page also has a topic
# with the same foaf:homepage as someone the seed person knows.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:homepage ?h .
  ?url foaf:topic ?r .
  ?r foaf:homepage ?h
}
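One possible reading of the rule-building steps is sketched below: start from the skeleton patterns, add the triples of a known person to the rule, and start a separate rule whenever the predicate is inverse functional. This is an interpretation of the slides, not the author's code. The function names, the variable naming scheme (`?v0`, `?v1`) and the choice of which properties count as inverse functional are assumptions.

```python
FOAF_PREFIX = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>"
# Assumption: properties treated as inverse functional in this sketch.
INVERSE_FUNCTIONAL = {"foaf:homepage", "foaf:mbox"}


def skeleton_patterns(seed_uri):
    """Triple patterns of the skeleton rule (name match only)."""
    return [
        f"<{seed_uri}> foaf:name ?n",
        "?url foaf:topic ?p",
        "?p foaf:name ?n",
    ]


def instance_patterns(seed_uri, predicate, var):
    """Patterns tying a known person to a person mentioned on the page."""
    return [
        f"<{seed_uri}> foaf:knows ?q",
        f"?q {predicate} ?{var}",
        "?url foaf:topic ?r",
        f"?r {predicate} ?{var}",
    ]


def to_sparql(seed_uri, patterns):
    """Serialise triple patterns as a CONSTRUCT rule."""
    body = " .\n  ".join(patterns)
    return (f"{FOAF_PREFIX}\n"
            f"CONSTRUCT {{ <{seed_uri}> foaf:page ?url }}\n"
            f"WHERE {{\n  {body}\n}}")


def build_rules(seed_uri, instance_predicates):
    """Build one rule from ordinary predicates plus one rule per IFP."""
    ordinary = skeleton_patterns(seed_uri)
    ifp_rules = []
    for i, predicate in enumerate(instance_predicates):
        extra = instance_patterns(seed_uri, predicate, f"v{i}")
        if predicate in INVERSE_FUNCTIONAL:
            # Inverse functional predicate: create a new rule.
            ifp_rules.append(
                to_sparql(seed_uri, skeleton_patterns(seed_uri) + extra))
        else:
            # Ordinary predicate: add its triples to the current rule.
            ordinary += [pat for pat in extra if pat not in ordinary]
    return [to_sparql(seed_uri, ordinary)] + ifp_rules


rules = build_rules("http://example.org/foaf.rdf#me",
                    ["foaf:name", "foaf:homepage"])
print("\n\n".join(rules))
```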
Evaluation
• Measures:
  – Precision, Recall, F-Measure
• Dataset
  – 50 participants from the Semantic Web and Web 2.0 communities
  – Seed data collected from Facebook and Twitter
  – ~17300 web resources: 346 web resources for each participant
• Baselines
  – Baseline 1: Person name as positive classification
    • Skeleton SPARQL Rule
  – Baseline 2: Human Processing
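The three measures can be computed per participant as in the sketch below. It assumes the balanced F-measure (F1), since the slides do not state which variant was used, and the sample sets are invented.

```python
def precision_recall_f_measure(predicted, actual):
    """Compare predicted citations against the true citations of one person."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented example: 2 pages predicted, 4 pages truly cite the person.
scores = precision_recall_f_measure({"page1", "page2"},
                                    {"page1", "page3", "page4", "page5"})
print(scores)
```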
Results
                   Precision   Recall   F-Measure
Inference Rules    0.955       0.436    0.553
Baseline 1         0.191       0.998    0.294
Baseline 2         0.765       0.725    0.719
• High precision
  – Better than humans
  – Triple Patterns
• Low recall
  – Rules are strict
    • No room for variability
  – Hard to generalise
    • No learning from disambiguation decisions
Conclusions
• SPARQL Rules are precise
  – Poor generalisation however
  – Outperform humans at low web presence levels
    • "Needle in a haystack problem"
• User profiles provide seed data
  – Inexpensively
  – Capturing:
    • Biographical information
    • Social networking information
• Inability to learn from identifications
  – Plan for future work
  – Overcome poor seed data feature coverage
Questions?
Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]
For more information:
M. Rowe and F. Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In press for the special issue on "Web 2.0" in the Journal of Web Semantics (2010).