dbpedia linker

London. September 4, 2008 Christian Becker Chris Bizer Georgi Kobilarov Freie Universität Berlin DBpedia Linker Interlinking CIS concepts with DBpedia

Upload: christianhbecker

Post on 28-Nov-2014

DESCRIPTION

Interlinking BBC CIS concepts with DBpedia. Learn more about our work at http://mes-semantics.com

TRANSCRIPT

Page 1: DBpedia Linker

Christian Becker: DBpedia Linker. London, September 4, 2008

Christian Becker

Chris Bizer

Georgi Kobilarov

Freie Universität Berlin

DBpedia Linker

Interlinking CIS concepts with DBpedia

Page 2: DBpedia Linker

Hello

Name: Christian Becker

Job: Partner, MES (Consulting on media-centric solutions)

PhD Student at Freie Universität Berlin

Semantic Web Projects
- DBpedia’s Geo and Homepage Extractors
- DBpedia Mobile and Marbles Browser
- flickr™ wrappr

Page 3: DBpedia Linker

DBpedia/Wikipedia as a Common Vocabulary

Better link between BBC properties

Better link externally

Better find and integrate BBC content elsewhere; leverage BBC metadata

Page 4: DBpedia Linker

Better link between BBC properties

Page 5: DBpedia Linker

Better link externally

BBC properties can be enriched with information from Wikipedia articles as well as content connected to them

Page 6: DBpedia Linker

Better find and integrate BBC content elsewhere

Page 7: DBpedia Linker

DBpedia

Page 8: DBpedia Linker

Programmes

Music

Topics

Users

Events

News

Food

Gardening

Page 9: DBpedia Linker

BBC Programmes

BBC Topics

BBC Music ✔

DBpedia

TODAY!

MusicBrainz

BBC News etc.

FUTURE

FUTURE

Page 10: DBpedia Linker

BBC Topics: CIS Taxonomy

Core datasets
- 6,630 brands
- 55,943 locations
- 73,442 names
- 11,231 subjects

Preferred and alternative labels

Tree hierarchy expressed in SKOS

Implicit hierarchy in parenthetical qualifiers, e.g. Jane Seymour (actor)
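As a rough illustration of how a trailing parenthetical could be separated from a CIS label, here is a minimal Python sketch. The function name `split_qualifier` and the regular expression are assumptions made for this example; the slides do not show how the tool parses labels.

```python
import re

# Matches a label that ends in a single parenthetical, e.g. "Jane Seymour (actor)".
# Assumption: only a trailing, non-nested parenthetical is treated as a qualifier.
_QUALIFIER = re.compile(r"^(?P<label>.*?)\s*\((?P<qualifier>[^()]*)\)\s*$")


def split_qualifier(label):
    """Return (bare_label, qualifier); qualifier is None if there is no trailing parenthetical."""
    match = _QUALIFIER.match(label)
    if match is None:
        return label.strip(), None
    return match.group("label").strip(), match.group("qualifier").strip()


print(split_qualifier("Jane Seymour (actor)"))
print(split_qualifier("Agricultural Statistics"))
```

The qualifier can then be treated as an extra class hint for the concept, alongside its position in the SKOS tree.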

Page 11: DBpedia Linker

Results

Dataset     Total     Linked          Precision*   Recall*
Brand       6,630     1,267 (19%)     86%          41%
Location    55,943    11,316 (20%)    99%          77%
Name        73,442    22,341 (30%)    92%          67%
Subject     11,231    6,822 (61%)     92%          75%

* Against test set of 600 resources. Updated to reflect only cases where links are possible.
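The arithmetic behind these figures can be sketched in Python as follows. The totals and linked counts are taken from the slide; the `precision_recall` helper and its toy inputs are hypothetical, and its handling of the test set (the gold mapping only contains concepts for which a link is possible) is one reading of the footnote, not a description of the actual benchmark mode.

```python
def linked_share(total, linked):
    """Percentage of concepts that received a link, rounded to a whole percent."""
    return round(100 * linked / total)


def precision_recall(predicted, gold):
    """Compare predicted links with a gold mapping (concept -> DBpedia resource).

    Assumption: `gold` holds only concepts for which a correct link exists.
    """
    correct = sum(1 for concept, target in predicted.items() if gold.get(concept) == target)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Totals and linked counts as given on the slide.
results = {
    "Brand": (6630, 1267),
    "Location": (55943, 11316),
    "Name": (73442, 22341),
    "Subject": (11231, 6822),
}
for dataset, (total, linked) in results.items():
    print(dataset, linked_share(total, linked))

# Hypothetical toy data: one of three predictions is correct, four links are possible.
predicted = {"a": "X", "b": "Y", "c": "Z"}
gold = {"a": "X", "b": "Q", "d": "W", "e": "V"}
print(precision_recall(predicted, gold))
```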

Page 12: DBpedia Linker

Why so few links...?

Many concepts simply don’t have their own Wikipedia articles

Brands
- “Mind the baby, Mr Bean” is in Wikipedia’s “List of Mr. Bean episodes”
- “Face to face (BBC Radio Gloucestershire community programme)” (not the BBC TV Series!)

Locations
- “West Woods (Wiltshire)”
- “Hobhole Drain” (notable mention in “List of rivers of England”)
- “Hinchingbrooke Country Park”

Names
- “The Jolly Anker (pub, Northampton)”
- “Moulton Players (drama group)”
- “Halliwell, Jo (BBC Leeds volunteer for Fat Nation)”

Subjects
- “Agricultural Statistics”

We think that important concepts are largely linked!

Page 13: DBpedia Linker

Linking Approach

Automated linking: tradeoff between quality and quantity

We wanted high-quality links

Limited input: only labels and hierarchy

Problems

No correspondences

Differing labels
- Word stemming
- Determining term nearness using Lucene’s scorer
- Integrating Wikipedia redirects to find alias labels

Ambiguities
- Sorting by number of inter-wiki references
- DBpedia class restrictions
- Class equivalence
- Require exact matches

Page 14: DBpedia Linker

Poor man’s PageRank

[Diagram: “Bill Clinton” (30000) connected to “Democratic Party”, “Hillary Clinton”, “United States”, “List of United States Presidents”, ...]

Lucene boost factor = number of articles from which an article is referenced
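A minimal Python sketch of the counting step, assuming the page links are available as (source, target) pairs. The function name and the toy link list are hypothetical; in the actual tool the resulting count is applied as a Lucene boost, which is not reproduced here.

```python
from collections import Counter


def inbound_counts(links):
    """Count, per target article, the number of distinct articles that reference it.

    `links` is an iterable of (source, target) pairs; duplicate pairs count once.
    """
    return Counter(target for _source, target in set(links))


# Hypothetical toy link list, using article names from the slide.
links = [
    ("Hillary Clinton", "Bill Clinton"),
    ("Democratic Party", "Bill Clinton"),
    ("United States", "Bill Clinton"),
    ("List of United States Presidents", "Bill Clinton"),
    ("Hillary Clinton", "Bill Clinton"),  # repeated reference from the same article
    ("Bill Clinton", "United States"),
]
boost = inbound_counts(links)
print(boost["Bill Clinton"], boost["United States"])
```

Among several articles whose labels match equally well, the one with the highest count is ranked first.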

Page 15: DBpedia Linker

Integrating Redirects

Bill Clinton: 30000

William Blythe III: 200

Buddy (Clinton's dog): 5

Putting People First: 100

Redirects serve as alias labels. Their references count towards the redirection target.
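The folding of redirect counts into their target might look like the following Python sketch. It assumes single-level redirects and assumes that the 30000 shown for “Bill Clinton” is the article's own count before the redirects are added; neither is stated on the slide, and the function name is hypothetical.

```python
def fold_redirects(inbound, redirects):
    """Add each redirect's reference count to its target and collect alias labels.

    `inbound` maps title -> reference count; `redirects` maps redirect title -> target.
    Assumption: redirects are single-level (a redirect never points at another redirect).
    """
    aliases = {}
    for alias, target in redirects.items():
        aliases.setdefault(target, []).append(alias)

    totals = {}
    for title, count in inbound.items():
        target = redirects.get(title, title)
        totals[target] = totals.get(target, 0) + count
    return totals, aliases


# Counts as shown on the slide.
inbound = {
    "Bill Clinton": 30000,
    "William Blythe III": 200,
    "Buddy (Clinton's dog)": 5,
    "Putting People First": 100,
}
redirects = {
    "William Blythe III": "Bill Clinton",
    "Buddy (Clinton's dog)": "Bill Clinton",
    "Putting People First": "Bill Clinton",
}
totals, aliases = fold_redirects(inbound, redirects)
print(totals["Bill Clinton"], sorted(aliases["Bill Clinton"]))
```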

Page 16: DBpedia Linker

“Brand” category set

Class Restrictions

imdb_title

“Mary (1985 sitcom)” = ?

Mary (Holy Mother): 50000

Something about Mary: 5000

The Mary Tyler Moore Show: 1000

Mary (1985 series): 500

Infobox album

Infobox television

Black and white films

...
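To make the class restriction concrete, here is a small Python sketch that drops candidates outside an allowed class set and ranks the rest by reference count. The candidate titles, counts, and class set come from the slide; which classes each candidate carries is an assumption made for this example (including the hypothetical “Infobox saint”), as is the function name.

```python
def restrict_by_class(candidates, allowed_classes):
    """Keep candidates that share at least one class with the allowed set, highest count first."""
    allowed = set(allowed_classes)
    kept = [c for c in candidates if allowed & set(c["classes"])]
    return sorted(kept, key=lambda c: c["references"], reverse=True)


# Class set for the "Brand" dataset, as listed on the slide (the slide's list continues).
BRAND_CLASSES = {"imdb_title", "Infobox album", "Infobox television", "Black and white films"}

# Counts from the slide; the class assignments are assumptions for illustration.
candidates = [
    {"title": "Mary (Holy Mother)", "references": 50000, "classes": {"Infobox saint"}},
    {"title": "Something about Mary", "references": 5000, "classes": {"imdb_title"}},
    {"title": "The Mary Tyler Moore Show", "references": 1000, "classes": {"Infobox television"}},
    {"title": "Mary (1985 series)", "references": 500, "classes": {"Infobox television"}},
]
ranked = restrict_by_class(candidates, BRAND_CLASSES)
print([c["title"] for c in ranked])
```

Under these assumed classes the restriction removes the most-referenced but off-topic candidate, yet the top result is still not the 1985 series, so a restriction at the level of the whole dataset does not resolve every ambiguity.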

Page 17: DBpedia Linker

Class Equivalence

[Diagram labels, BBC CIS vs. DBpedia: “Mary (1985 sitcom)”, “1985”, “tv brand”, “sitcom”, “Infobox television”, “1980s American television series”, “(15 more)”; candidates “Something about Mary”, “The Mary Tyler Moore Show”, “Mary (1985 series)”]

Lucene query: ((+mary 1985 sitcom )) AND ((categories:Category\:1985_television_series_debuts))
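A query string of this shape could be assembled as in the Python sketch below. It assumes that the first label token is required (`+`) and the remaining tokens are optional, which is only a reading of the slide's example; the `categories` field name is taken from the slide, and the helper names are hypothetical. The escaping covers the special characters of Lucene's classic query syntax.

```python
import re

# Special characters of Lucene's classic query syntax that need a backslash escape.
_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')


def escape(text):
    """Backslash-escape Lucene query syntax characters in a term."""
    return _SPECIAL.sub(r"\\\1", text)


def build_query(label, category):
    """Build a label query restricted to one category.

    Assumption: the first token is mandatory, the rest merely influence scoring.
    """
    tokens = re.findall(r"\w+", label.lower())
    if not tokens:
        raise ValueError("label contains no searchable tokens")
    text = " ".join(["+" + tokens[0]] + tokens[1:])
    return "((%s)) AND ((categories:%s))" % (text, escape(category))


print(build_query("Mary (1985 sitcom)", "Category:1985_television_series_debuts"))
```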

Page 18: DBpedia Linker

Class Equivalence

About 5% boost in precision and recall (after class restrictions and exact matching)

Algorithm
- Enrich class hierarchy using parenthetical qualifiers
- Perform label-based lookup on all items in the dataset and memorize result candidates
- Rank CIS classes against DBpedia classes
- Perform label lookup restricting results to the top 5, 10, 15% class equivalences, excluding the overall top 20% classes
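One possible reading of the ranking step, sketched in Python: count how often each DBpedia class appears among the lookup candidates of items with a given CIS class, drop the DBpedia classes that are most frequent overall as too generic, and keep the top share of what remains. The co-occurrence scoring, the function name, the rounding rules, and the toy observations (including the hypothetical class “Thing”) are all assumptions; the slides do not specify how the ranking is computed.

```python
import math
from collections import Counter, defaultdict


def rank_class_equivalences(observations, top_share=0.10, generic_share=0.20):
    """Map each CIS class to its most strongly co-occurring DBpedia classes.

    `observations` is an iterable of (cis_class, dbpedia_classes_of_one_candidate).
    """
    pair_counts = defaultdict(Counter)
    overall = Counter()
    for cis_class, dbpedia_classes in observations:
        for dbpedia_class in set(dbpedia_classes):
            pair_counts[cis_class][dbpedia_class] += 1
            overall[dbpedia_class] += 1

    # The overall most frequent DBpedia classes are treated as too generic to help.
    n_generic = math.floor(len(overall) * generic_share)
    generic = {name for name, _count in overall.most_common(n_generic)}

    result = {}
    for cis_class, counts in pair_counts.items():
        ranked = [name for name, _count in counts.most_common() if name not in generic]
        keep = max(1, math.ceil(len(ranked) * top_share))
        result[cis_class] = ranked[:keep]
    return result


# Hypothetical memorized candidates from a first, unrestricted label lookup.
observations = [
    ("sitcom", ["Thing", "Infobox television", "1985 television series debuts"]),
    ("sitcom", ["Thing", "Infobox television"]),
    ("sitcom", ["Thing", "Infobox film"]),
    ("pub", ["Thing", "Pubs in England"]),
]
equivalences = rank_class_equivalences(observations)
print(equivalences)
```

A second lookup would then accept only candidates that carry one of the DBpedia classes returned for the item's CIS class.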

Page 20: DBpedia Linker

The Linkage tool

Written in Java, uses Lucene indexes prepared in C#

Command line interface with link and benchmark modes

Components
- Apache Lucene search
- OpenRDF Sesame (native storage)

Dataset-specific algorithm choice and parameters

Next step: General Linking Interface

Page 21: DBpedia Linker

Future Directions

Improve quality / quantity
- Text-level comparison of content relating to the CIS concepts with Wikipedia articles
- Manual review based on confidence score

General Interlinking Framework
- Describe input data
- Select algorithms
- Link!

Add non-existent resources to DBpedia
- Wikipedia requires quality content according to Wikipedia Guidelines
- Idea: A “Minipedia” that serves as an additional source to DBpedia

Page 22: DBpedia Linker

Thanks!

Questions?