Inferring Web Citations using SPARQL Rules and Social Data – LUPAS2010

Inferring Web Citations using Social Data and SPARQL Rules

Matthew Rowe
Organisations, Information and Knowledge Group

University of Sheffield

Outline

• Problem Setting
  – Personal Information Dissemination
• SPARQL Rules: Identifying Web Citations
  – Generating Seed Data
  – Gathering Possible Web Citations
  – Inferring Web Citations
• Evaluation
• Conclusions
• Future Work

Personal Information on the Web

• Personal information on the Web is disseminated:
  – Voluntarily
  – Involuntarily
• More personal information online increases the risk of:
  – Identity theft
  – Lateral surveillance
• Web users must discover the web references to their identity
  – A 2-stage process:
    • Find possible references
    • Identify definite references

Ambiguity!

Matthew Rowe: Composer

Matthew Rowe: Cyclist

Matthew Rowe: Gardener

Matthew Rowe: Song Writer

Matthew Rowe: PhD Student

Problem Setting

• Performing identification manually is:
  – Time consuming
  – Laborious
• It means handling masses of information
  – Repeated often
• The Web keeps changing
• Solution = automated techniques
  – Alleviate the need for humans
  – Need background knowledge:
    • Who am I searching for?
    • What makes them unique?

SPARQL Rules: Identifying Web Citations

Generating Seed Data

• Profiles on the Social Web are leveraged as seed data
• To generate seed data:
  1. Export Social Graphs
     • Interface with the platform's API
     • Convert the proprietary response into RDF
       – Biographical Information
       – Social Network Information
  2. Enrich Graphs with URIs
  3. Interlink graphs
     • Detect equivalent foaf:Person instances
     • Build a single social graph

http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html
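Steps 1 and 2 of seed data generation can be sketched in Python as follows. This is a minimal illustration, not the FOAF generator's actual code: the profile dictionary stands in for a platform's proprietary API response, triples are plain (subject, predicate, object) string tuples, the use of foaf:based_near for location is an assumption, and the `example.org` URIs and the `LOCATION_URIS` table are hypothetical placeholders for a real Linked Data lookup.

```python
# Minimal sketch: profile fields, example.org URIs and LOCATION_URIS are
# hypothetical; a real exporter would call the platform's API.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical lookup standing in for a real Linked Data location service.
LOCATION_URIS = {"Sheffield": "http://example.org/geo/Sheffield"}


def export_profile(person_uri, profile):
    """Step 1: turn a proprietary profile (here a dict) into RDF triples."""
    triples = {(person_uri, RDF_TYPE, FOAF + "Person")}
    # Biographical information
    triples.add((person_uri, FOAF + "name", profile["name"]))
    if "location" in profile:
        triples.add((person_uri, FOAF + "based_near", profile["location"]))
    # Social network information: one foaf:Person per contact
    for i, friend in enumerate(profile.get("friends", [])):
        friend_uri = f"{person_uri}/friend/{i}"
        triples.add((person_uri, FOAF + "knows", friend_uri))
        triples.add((friend_uri, RDF_TYPE, FOAF + "Person"))
        triples.add((friend_uri, FOAF + "name", friend["name"]))
        if "homepage" in friend:
            triples.add((friend_uri, FOAF + "homepage", friend["homepage"]))
    return triples


def enrich_with_uris(triples):
    """Step 2: swap location literals for URIs where the lookup knows them."""
    enriched = set()
    for s, p, o in triples:
        if p == FOAF + "based_near" and o in LOCATION_URIS:
            o = LOCATION_URIS[o]
        enriched.add((s, p, o))
    return enriched


me = "http://example.org/people/alice"
profile = {
    "name": "Alice Example",
    "location": "Sheffield",
    "friends": [{"name": "Bob Example", "homepage": "http://example.org/~bob"}],
}
graph = enrich_with_uris(export_profile(me, profile))
for triple in sorted(graph):
    print(triple)
```

Keeping export and enrichment as separate functions mirrors the slide's separate steps, so a different lookup service could be swapped into step 2 without touching the exporter.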

Interlinking graphs (detecting equivalent foaf:Person instances):
  1. Blocking step
  2. Compare values of Inverse Functional Properties
  3. Compare Geo URIs
  4. Compare Geo data
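The four interlinking steps can be sketched in Python as follows, under stated assumptions: each exported person is a flat dict, blocking uses a normalised name, the IFPs compared are `homepage` and `mbox_sha1sum` (both declared inverse functional in the FOAF vocabulary), and the coordinate tolerance `GEO_TOLERANCE` is a hypothetical threshold. The order of checks follows the slide; it is not the published implementation.

```python
from itertools import combinations

# Hypothetical settings: properties treated as inverse functional, and how close
# two coordinates (decimal degrees) must be to count as the same place.
IFPS = ("mbox_sha1sum", "homepage")
GEO_TOLERANCE = 0.05


def block_key(person):
    """Step 1 (blocking): only compare people whose normalised names match."""
    return " ".join(person["name"].lower().split())


def has_coords(person):
    return "lat" in person and "long" in person


def equivalent(a, b):
    # Step 2: a shared inverse functional property value identifies one individual
    for prop in IFPS:
        if a.get(prop) is not None and a.get(prop) == b.get(prop):
            return True
    # Step 3: identical geo URIs
    if a.get("geo_uri") is not None and a.get("geo_uri") == b.get("geo_uri"):
        return True
    # Step 4: nearby geo coordinates
    if has_coords(a) and has_coords(b):
        return (abs(a["lat"] - b["lat"]) <= GEO_TOLERANCE
                and abs(a["long"] - b["long"]) <= GEO_TOLERANCE)
    return False


def interlink(people):
    """Return (uri, uri) pairs judged to be the same foaf:Person."""
    blocks = {}
    for person in people:
        blocks.setdefault(block_key(person), []).append(person)
    same_as = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            if equivalent(a, b):
                same_as.append((a["uri"], b["uri"]))
    return same_as


people = [
    {"uri": "http://example.org/fb/1", "name": "Alice Example",
     "homepage": "http://example.org/~alice"},
    {"uri": "http://example.org/tw/1", "name": "alice  example",
     "homepage": "http://example.org/~alice"},
    {"uri": "http://example.org/tw/2", "name": "Alice Example",
     "lat": 53.38, "long": -1.47},
    {"uri": "http://example.org/fb/3", "name": "Alice Example",
     "lat": 53.40, "long": -1.46},
    {"uri": "http://example.org/fb/2", "name": "Bob Example",
     "lat": 53.38, "long": -1.47},
]
print(interlink(people))
```

Blocking keeps the pairwise comparison small: "Bob Example" shares coordinates with one of the Alice records but is never compared with it, because the names fall into different blocks.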

Gathering Possible Web Citations

• Search the WWW and the Semantic Web for possible citations
• Web resources come in many flavours:
  – Data models, HTML documents, XHTML documents
• Convert each resource into RDF:
  – XHTML documents:
    • Use GRDDL
    • Automated RDF model lifting
  – HTML documents:
    • Apply a person name gazetteer to identify person information
    • Apply a Hidden Markov Model to extract information
    • Build an RDF model from the information

M Rowe. Data.dcs: Converting Legacy Data into Linked Data. In proceedings of Linked Data on the Web Workshop, WWW 2010. Raleigh, USA. (2010)
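The HTML branch can be partly sketched with Python's standard `html.parser`. This sketch covers only the gazetteer lookup and the RDF model building; the Hidden Markov Model extraction step is omitted. The gazetteer contents and the `#person-` node naming are hypothetical. The output uses the `?url foaf:topic ?p . ?p foaf:name ?n` shape that the SPARQL rules match.

```python
from html.parser import HTMLParser

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical, tiny gazetteer; a real one would hold many person names.
GAZETTEER = {"Matthew Rowe", "Fabio Ciravegna"}


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_triples(page_url, html):
    """Gazetteer lookup plus RDF model building (no HMM extraction)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces and collapse whitespace so a name spread over
    # several tags still matches
    text = " ".join(" ".join(parser.chunks).split())
    triples = set()
    for name in sorted(GAZETTEER):
        if name in text:
            person = f"{page_url}#person-{name.replace(' ', '_')}"
            triples.add((page_url, FOAF + "topic", person))
            triples.add((person, FOAF + "name", name))
    return triples


page = "http://example.org/seminar.html"
html = ("<html><body><h1>Seminar</h1>"
        "<p>Talk by <b>Matthew</b> Rowe, University of Sheffield.</p></body></html>")
for triple in sorted(html_to_triples(page, html)):
    print(triple)
```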

Inferring Web Citations using SPARQL Rules

• Seed data = a solitary example from which to build rules
  – State-of-the-art rule induction strategies, which learn from sets of labelled training examples, are limited here
    • E.g. FOIL and C4.5
  – Build rules from RDF instances instead!
1. Extract instances from the seed data
2. For each instance, build a rule:
  – Build a skeleton rule
  – Add triples to the rule
  – Create a new rule if a triple's predicate is Inverse Functional
3. Apply the rules to the web resources
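The rule-building steps can be sketched in Python under one reading of the slides. Each contact reached through foaf:knows is treated as a seed instance. Its ordinary triples are added to one rule, while each inverse functional triple spawns a rule of its own. This reproduces the shape of the foaf:name and foaf:homepage rules shown on the slides. The `FEATURES` and `IFPS` lists, the variable naming and the contact are hypothetical. This is not the paper's implementation.

```python
# Hypothetical settings: seed predicates used as rule features, and the subset
# treated as inverse functional (as the FOAF vocabulary declares them).
FEATURES = ("foaf:name", "foaf:homepage", "foaf:mbox_sha1sum")
IFPS = {"foaf:homepage", "foaf:mbox_sha1sum"}


def skeleton(me):
    """Skeleton rule: a resource is about someone sharing the seed person's name."""
    return [(me, "foaf:name", "?n"), ("?url", "foaf:topic", "?p"), ("?p", "foaf:name", "?n")]


def build_rules(seed, me):
    """Build rules from the contacts (foaf:knows instances) of the seed person."""
    rules = []
    contacts = sorted(o for s, p, o in seed if s == me and p == "foaf:knows")
    for contact in contacts:
        base = skeleton(me) + [(me, "foaf:knows", "?q"), ("?url", "foaf:topic", "?r")]
        plain = list(base)
        for _, pred, _ in sorted(t for t in seed if t[0] == contact and t[1] in FEATURES):
            var = "?" + pred.split(":")[1]
            pattern = [("?q", pred, var), ("?r", pred, var)]
            if pred in IFPS:
                candidate = base + pattern  # an IFP triple gets a rule of its own
                if candidate not in rules:
                    rules.append(candidate)
            elif pattern[0] not in plain:
                plain += pattern  # other triples are added to the shared rule
        if len(plain) > len(base) and plain not in rules:
            rules.append(plain)
    return rules


def to_sparql(rule, me):
    """Render a rule as a SPARQL CONSTRUCT query in the style of the slides."""
    where = " .\n  ".join(" ".join(triple) for triple in rule)
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        f"CONSTRUCT {{ {me} foaf:page ?url }}\n"
        f"WHERE {{\n  {where}\n}}"
    )


ME = "<http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me>"
FRIEND = "<http://example.org/people/jane>"  # hypothetical contact
seed = {
    (ME, "foaf:name", '"Matthew Rowe"'),
    (ME, "foaf:knows", FRIEND),
    (FRIEND, "foaf:name", '"Jane Example"'),
    (FRIEND, "foaf:homepage", "<http://example.org/~jane>"),
}
rules = build_rules(seed, ME)
for rule in rules:
    print(to_sparql(rule, ME), end="\n\n")
```

Variables are named after the predicate (`?name`, `?homepage`), so two contacts with the same kinds of triples generate the same rule, and it is kept only once.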

Skeleton rule:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n
}

Rule extended with triples describing a social network contact (foaf:knows) by name:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:name ?m .
  ?url foaf:topic ?r .
  ?r foaf:name ?m
}

New rule created for an Inverse Functional Property (the contact's foaf:homepage):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
WHERE {
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
  ?url foaf:topic ?p .
  ?p foaf:name ?n .
  <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
  ?q foaf:homepage ?h .
  ?url foaf:topic ?r .
  ?r foaf:homepage ?h
}
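Step 3, applying a rule, amounts to evaluating its WHERE clause as a basic graph pattern over the seed data merged with the RDF models of the web resources. The following Python sketch is a naive backtracking matcher, not a SPARQL engine; a real system would hand the CONSTRUCT query to one. The contact, the two pages and the blank-node labels are hypothetical. The sketch shows why the rules are precise but strict: the skeleton rule accepts every page that mentions the name, while the foaf:homepage rule keeps only the page that also mentions the contact's homepage.

```python
ME = "<http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me>"

# The foaf:homepage rule from the slides, as a list of triple patterns.
HOMEPAGE_RULE = [
    (ME, "foaf:name", "?n"),
    ("?url", "foaf:topic", "?p"),
    ("?p", "foaf:name", "?n"),
    (ME, "foaf:knows", "?q"),
    ("?q", "foaf:homepage", "?h"),
    ("?url", "foaf:topic", "?r"),
    ("?r", "foaf:homepage", "?h"),
]
SKELETON_RULE = HOMEPAGE_RULE[:3]


def match(patterns, triples, binding=None):
    """Yield each variable binding that satisfies every triple pattern."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


def apply_rule(rule, me, triples):
    """CONSTRUCT { me foaf:page ?url } for every solution of the rule."""
    return {(me, "foaf:page", b["?url"]) for b in match(rule, triples)}


JANE = "<http://example.org/people/jane>"  # hypothetical contact
PAGE_A = "<http://example.org/a.html>"     # hypothetical web resources
PAGE_B = "<http://example.org/b.html>"
triples = [
    # Seed data
    (ME, "foaf:name", '"Matthew Rowe"'),
    (ME, "foaf:knows", JANE),
    (JANE, "foaf:homepage", "<http://example.org/~jane>"),
    # Page A mentions the name and the contact's homepage
    (PAGE_A, "foaf:topic", "_:a1"),
    ("_:a1", "foaf:name", '"Matthew Rowe"'),
    (PAGE_A, "foaf:topic", "_:a2"),
    ("_:a2", "foaf:homepage", "<http://example.org/~jane>"),
    # Page B only mentions the name
    (PAGE_B, "foaf:topic", "_:b1"),
    ("_:b1", "foaf:name", '"Matthew Rowe"'),
]
print(sorted(apply_rule(SKELETON_RULE, ME, triples)))  # page A and page B
print(sorted(apply_rule(HOMEPAGE_RULE, ME, triples)))  # page A only
```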

Evaluation

• Measures:
  – Precision, Recall, F-Measure
• Dataset:
  – 50 participants from the Semantic Web and Web 2.0 communities
  – Seed data collected from Facebook and Twitter
  – ~17300 web resources: 346 web resources for each participant
• Baselines:
  – Baseline 1: person name as positive classification
    • Skeleton SPARQL rule
  – Baseline 2: human processing
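The three measures can be computed per participant as in the following sketch. The slides do not say how scores are aggregated over the 50 participants. The sketch assumes per-participant averaging (macro-averaging). The URL sets are hypothetical.

```python
def precision_recall_f(predicted, relevant):
    """Scores for one participant: predicted and relevant are sets of resource URLs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def macro_average(scores):
    """Average (precision, recall, F) tuples over participants."""
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical judgements for two participants
scores = [
    precision_recall_f({"a", "b"}, {"a", "b", "c", "d"}),  # P=1.0, R=0.5
    precision_recall_f({"x"}, {"x"}),                       # P=1.0, R=1.0
]
print(macro_average(scores))
```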

Results

Method            Precision   Recall   F-Measure
Inference Rules   0.955       0.436    0.553
Baseline 1        0.191       0.998    0.294
Baseline 2        0.765       0.725    0.719
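A quick check on the table: applying the harmonic-mean definition of F-measure to the averaged precision and recall does not reproduce the reported F-measures. The slides do not confirm it, but this is consistent with F-measure being computed per participant and then averaged.

```latex
\begin{aligned}
F &= \frac{2PR}{P + R} \\
F_{\text{Inference Rules}} &= \frac{2 \times 0.955 \times 0.436}{0.955 + 0.436} = \frac{0.83276}{1.391} \approx 0.599 \quad (\text{reported } 0.553) \\
F_{\text{Baseline 1}} &= \frac{2 \times 0.191 \times 0.998}{0.191 + 0.998} = \frac{0.381236}{1.189} \approx 0.321 \quad (\text{reported } 0.294) \\
F_{\text{Baseline 2}} &= \frac{2 \times 0.765 \times 0.725}{0.765 + 0.725} = \frac{1.10925}{1.490} \approx 0.744 \quad (\text{reported } 0.719)
\end{aligned}
```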

• High precision
  – Better than humans
  – Triple patterns
• Low recall
  – Rules are strict
    • No room for variability
  – Hard to generalise
    • No learning from disambiguation decisions

Conclusions

• SPARQL rules are precise
  – Poor generalisation, however
  – Outperform humans at low levels of web presence
    • "Needle in a haystack" problem
• User profiles provide seed data
  – Inexpensively
  – Capturing:
    • Biographical information
    • Social networking information
• Inability to learn from identifications
  – Plan for future work
  – Overcome poor seed data feature coverage

Questions?

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]

For more information:

M Rowe and F Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In press, special issue on "Web 2.0", Journal of Web Semantics. (2010)