David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th

Entity Linking (at SEA)
David Graus, University of Amsterdam
Photo by TRPultz (Creative Commons Attribution 3.0 Unported License)

Entity Linking at SEA, Search Engines Amsterdam, 27 June ’14

Today’s talk

• What?
• Why?
• How?
• Etc.

Entity Linking?

• Link mentions of entities (in text) to their referent entities (in a KB)
• Example: “During Tank Johnson’s tumultuous tenure with the Bears, incidents with guns got him arrested, jailed and suspended, and his close friend was shot and killed in front of him after an altercation at a Chicago bar.”

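Entity Mention: Tank

[Figure: the mention “Tank” (query q) in document r, with question marks on the links to two candidate entities in the Knowledge Base (KB): TANK (VEHICLE) and TANK JOHNSON.]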

Entity Search Outline

• What?
• Why?
• How?
• Etc.

Social Media Monitoring

Entity Search Outline

• What?
• Why?
• How?
• Etc.

The Semanticizer

• Open source framework (https://github.com/semanticize/semanticizer/)
• Links to Wikipedia
  • Entity = Wikipedia page
• “Lexical matching” approach
  • no NER, information extraction

http://semanticize.uva.nl/

Lexical matching

• Construct “entity dictionaries”
• By taking entity:
  • Titles
  • Anchors
  • Redirect pages

n-gram -> entity

• Kendrick Lamar
• K-Dot
• Kendrick
• K. Dot
• Kendrick Duckworth
• Kendrick Lamar'
• Kendrick Lamar's
• K Dot
• Kendrick Lama
• Kendrick Lamarr
• Kendrick Llama
• The Jig Is Up (Dump'n)
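A minimal sketch of how such an entity dictionary could be assembled from titles, anchors and redirect pages. The function name, the lowercasing step, and the assignment of surface forms to "anchors" versus "redirects" in the toy data are assumptions for illustration, not the Semanticizer's actual code.

```python
from collections import defaultdict


def build_entity_dictionary(titles, anchors, redirects):
    """Map each lowercased surface form (n-gram) to the entities it may refer to."""
    dictionary = defaultdict(set)
    for entity in titles:
        # The page title itself, with underscores read as spaces.
        dictionary[entity.replace("_", " ").lower()].add(entity)
    for anchor_text, entity in anchors:
        # Anchor text of a hyperlink that points to the entity's page.
        dictionary[anchor_text.lower()].add(entity)
    for redirect_title, entity in redirects:
        # Title of a redirect page that resolves to the entity's page.
        dictionary[redirect_title.lower()].add(entity)
    return dictionary


# Toy data (hypothetical split of the surface forms over the three sources).
titles = ["Kendrick_Lamar", "Kendrick,_Idaho"]
anchors = [
    ("Kendrick", "Kendrick_Lamar"),
    ("Kendrick", "Kendrick,_Idaho"),
    ("K-Dot", "Kendrick_Lamar"),
]
redirects = [
    ("K. Dot", "Kendrick_Lamar"),
    ("Kendrick Duckworth", "Kendrick_Lamar"),
]

entity_dictionary = build_entity_dictionary(titles, anchors, redirects)
print(sorted(entity_dictionary["kendrick"]))  # ['Kendrick,_Idaho', 'Kendrick_Lamar']
```

An ambiguous n-gram such as "kendrick" maps to several entities, which is why candidates need to be ranked afterwards.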

Start linking!

• For an input sentence s:
  “Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”
• Retrieve all possible entity candidates:
  http://en.wikipedia.org/wiki/Eminem
  http://en.wikipedia.org/wiki/Good_(economics)
  http://en.wikipedia.org/wiki/Lamar_County,_Alabama
  http://en.wikipedia.org/wiki/Lamar_County,_Mississippi
  http://en.wikipedia.org/wiki/Lamar_Advertising_Company
  http://en.wikipedia.org/wiki/Kendrick,_Idaho
  http://en.wikipedia.org/wiki/Good_Kid_Maad_City
  http://en.wikipedia.org/wiki/Kendrick_Lamar
  http://en.wikipedia.org/wiki/Kendrick_School
  http://en.wikipedia.org/wiki/Lamar_Cardinals_basketball
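Candidate retrieval by lexical matching can be sketched as follows: generate all n-grams of the sentence and look each one up in the entity dictionary. The tokenizer, the maximum n-gram length, and the toy dictionary are assumptions made for this example.

```python
import re


def ngrams(tokens, max_n):
    """Yield every contiguous n-gram of 1..max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def retrieve_candidates(sentence, entity_dictionary, max_n=3):
    """Return {n-gram: sorted candidate entities} for every n-gram found in the dictionary."""
    tokens = re.findall(r"\w+", sentence.lower())
    candidates = {}
    for ngram in ngrams(tokens, max_n):
        if ngram in entity_dictionary:
            candidates[ngram] = sorted(entity_dictionary[ngram])
    return candidates


# Toy dictionary (hypothetical subset of what a Wikipedia-derived dictionary would hold).
toy_dictionary = {
    "eminem": {"Eminem"},
    "kendrick": {"Kendrick,_Idaho", "Kendrick_School", "Kendrick_Lamar"},
    "lamar": {"Lamar_County,_Alabama", "Lamar_Advertising_Company"},
    "kendrick lamar": {"Kendrick_Lamar"},
    "good": {"Good_(economics)"},
}

sentence = "Eminem Thinks Kendrick Lamar's good kid, m.A.A.d. city Was 'Genius'"
candidates = retrieve_candidates(sentence, toy_dictionary)
for ngram, entities in candidates.items():
    print(ngram, "->", entities)
```

No NER step is involved: every n-gram that occurs in the dictionary produces candidates, including unlikely ones such as "good" -> Good_(economics).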

Ranking entity candidates

• “Prior probabilities”
  • link probability
  • commonness
  • sense probability

1. Link Probability

• “Kendrick Lamar” occurs 698x on Wikipedia
  • as hyperlink: 501x
  • no hyperlink: 197x
  • link probability: 501 / 698 = 0.718
• “Kendrick” occurs 5,037x on Wikipedia
  • as hyperlink: 24x
  • no hyperlink: 5,014x
  • link probability: 24 / 5,037 = 0.005
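The arithmetic above as a small sketch, using the counts from the slide. The function name is an assumption.

```python
def link_probability(times_as_link, total_occurrences):
    """Fraction of an n-gram's occurrences on Wikipedia that are hyperlinks."""
    if total_occurrences == 0:
        return 0.0
    return times_as_link / total_occurrences


print(round(link_probability(501, 698), 3))   # 0.718  ("Kendrick Lamar")
print(round(link_probability(24, 5037), 3))   # 0.005  ("Kendrick")
```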

2. “Commonness”

• “Kendrick” is used to refer to:

  Kendrick,_Idaho                        8 / 24 = 0.333
  Kendrick,_Oklahoma                     3 / 24 = 0.125
  T._D._Kendrick                         3 / 24 = 0.125
  Kendrick_School                        2 / 24 = 0.083
  John_Kendrick_(American_sea_captain)   2 / 24 = 0.083
  Kendrick_Lamar                         2 / 24 = 0.083
  Francis_Kenrick                        2 / 24 = 0.083
  Kendrick                               1 / 24 = 0.042
  Howie_Kendrick                         1 / 24 = 0.042
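A sketch of the commonness computation with the link counts from the slide: the number of times the n-gram links to a given entity, divided by the number of times it is used as a link at all. The names are assumptions.

```python
# How often the anchor "Kendrick" links to each entity (counts from the slide).
kendrick_link_counts = {
    "Kendrick,_Idaho": 8,
    "Kendrick,_Oklahoma": 3,
    "T._D._Kendrick": 3,
    "Kendrick_School": 2,
    "John_Kendrick_(American_sea_captain)": 2,
    "Kendrick_Lamar": 2,
    "Francis_Kenrick": 2,
    "Kendrick": 1,
    "Howie_Kendrick": 1,
}


def commonness(entity, link_counts):
    """Share of the n-gram's links that point to `entity`."""
    total_links = sum(link_counts.values())
    if total_links == 0:
        return 0.0
    return link_counts.get(entity, 0) / total_links


print(round(commonness("Kendrick,_Idaho", kendrick_link_counts), 3))  # 0.333
print(round(commonness("Kendrick_Lamar", kendrick_link_counts), 3))   # 0.083
```

Ranking by commonness alone would link "Kendrick" to Kendrick,_Idaho, whatever the surrounding sentence says.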

3. Sense Probability

• Number of times n-gram links to entity, over all occurrences of n-gram

  Kendrick -> Kendrick_Lamar = 2 / 5,037 = 0.0004
  Kendrick Lamar -> Kendrick_Lamar = 500 / 698 = 0.716
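The same two numbers as a sketch, with counts taken from the slides. The function name is an assumption.

```python
import math


def sense_probability(times_linked_to_entity, total_occurrences):
    """Times the n-gram links to this entity, over all occurrences of the n-gram."""
    if total_occurrences == 0:
        return 0.0
    return times_linked_to_entity / total_occurrences


print(round(sense_probability(2, 5037), 4))   # 0.0004  (Kendrick -> Kendrick_Lamar)
print(round(sense_probability(500, 698), 3))  # 0.716   (Kendrick Lamar -> Kendrick_Lamar)

# With these counts, sense probability equals link probability times commonness:
# (24 / 5037) * (2 / 24) == 2 / 5037
print(math.isclose((24 / 5037) * (2 / 24), sense_probability(2, 5037)))  # True
```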

Ranking by prior probability

Works quite well most of the time!

High accuracy is reported for naive linking using only “popularity ranking” [1].

[1] Heng Ji and Ralph Grishman, “Knowledge Base Population: Successful Approaches and Challenges”, ACL 2011.

Beyond ranking: supervised linking

• Entity linking as binary classification
• Input:
  • sentence s + set of target entities E

Beyond ranking: supervised linking

“Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”

http://en.wikipedia.org/wiki/Eminem

http://en.wikipedia.org/wiki/Good_(economics)

http://en.wikipedia.org/wiki/Lamar_County,_Alabama

http://en.wikipedia.org/wiki/Lamar_County,_Mississippi

http://en.wikipedia.org/wiki/Lamar_Advertising_Company

http://en.wikipedia.org/wiki/Kendrick,_Idaho

http://en.wikipedia.org/wiki/Good_Kid_Maad_City

http://en.wikipedia.org/wiki/Kendrick_Lamar

Beyond ranking: supervised linking

• Given a new sentence, for each candidate entity e output probability of belonging to class:
  • positive (= target), or
  • negative (= no target)

Features

• Local:
  • link each entity mention separately
• Global:
  • link all mentions in a document simultaneously, to arrive at a coherent set of entities

Global features

“[Eminem] Thinks [Kendrick Lamar]’s [good kid, m.A.A.d. city] Was ‘Genius’”

http://en.wikipedia.org/wiki/Eminem

http://en.wikipedia.org/wiki/Good_(economics)

http://en.wikipedia.org/wiki/Lamar_County,_Alabama

http://en.wikipedia.org/wiki/Lamar_County,_Mississippi

http://en.wikipedia.org/wiki/Lamar_Advertising_Company

http://en.wikipedia.org/wiki/Kendrick,_Idaho

http://en.wikipedia.org/wiki/Good_Kid_Maad_City

http://en.wikipedia.org/wiki/Kendrick_Lamar

“Relatedness”

Source: Milne, D. and Witten, I.H. (2008) An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In WIKIAI'08.
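The slide only names the measure. As a sketch, the link-based relatedness is often implemented as one minus a normalized distance over the sets of articles that link to each entity. The formula below is written down under that assumption and should be checked against the cited paper; the toy inlink sets and the article count are made up.

```python
import math


def relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness in [0, 1] from shared inlinks (assumed form, see lead-in)."""
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared inlinks: treat as unrelated
    larger, smaller = max(len(a), len(b)), min(len(a), len(b))
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator <= 0:
        return 1.0  # degenerate case: the smaller set already covers the whole collection
    distance = (math.log(larger) - math.log(len(common))) / denominator
    return max(0.0, 1.0 - distance)


# Toy inlink sets (hypothetical article ids) in a collection of 1000 articles.
score = relatedness({1, 2, 3, 4}, {3, 4, 5}, total_articles=1000)
print(round(score, 3))
```

Entities whose pages are linked from many of the same articles score close to 1; entities with no shared inlinks score 0.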

Features: Semanticizer

• Local:
  • n-gram
  • KB
  • n-gram+KB
  • Text similarity
• Global:
  • Finding “related entities”

Local features: n-gram/KB

• n-gram features:
  • link probability
  • length of n-gram
  • number of entity titles that contain n-gram
• entity features:
  • entity’s number of inlinks
  • entity’s number of outlinks
  • number of redirect pages referring to entity
• n-gram+entity features:
  • commonness
  • sense probability
  • edit distance between n-gram and entity title
  • does n-gram contain entity title?
  • does entity title contain n-gram?
  • does title equal n-gram?
  • TF of n-gram in entity document
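A few of the string-based features above can be sketched as below. Measuring n-gram length in tokens, lowercasing, and reading underscores in titles as spaces are assumptions; the feature names are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, row by row)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def ngram_entity_features(ngram, entity_title):
    """String features for one (n-gram, entity) pair."""
    n = ngram.lower()
    title = entity_title.replace("_", " ").lower()
    return {
        "ngram_length": len(ngram.split()),  # assumption: length in tokens
        "edit_distance": edit_distance(n, title),
        "ngram_contains_title": title in n,
        "title_contains_ngram": n in title,
        "title_equals_ngram": n == title,
    }


print(ngram_entity_features("Kendrick", "Kendrick_Lamar"))
print(ngram_entity_features("Kendrick Lamar", "Kendrick_Lamar"))
```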

Local features: Text similarity

• Similarity between input sentence s and entity candidate document (Wikipedia page)

“Eminem Thinks Kendrick Lamar’s good kid, m.A.A.d. city Was ‘Genius’”

• Kendrick_Lamar 0.4215
• Kendrick,_Idaho 0.1599
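The slide does not say which similarity measure is used. As one possible sketch, the example below uses cosine similarity over raw term counts; the two "pages" are short stand-in texts, so the scores will not match the numbers on the slide.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words term counts over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


sentence = "Eminem Thinks Kendrick Lamar's good kid, m.A.A.d. city Was 'Genius'"
# Stand-in texts (assumption): real input would be the full Wikipedia pages.
lamar_page = ("Kendrick Lamar is an American rapper from Compton whose album "
              "good kid, m.A.A.d city was released in 2012")
idaho_page = "Kendrick is a city in Latah County, Idaho, United States"

lamar_score = cosine_similarity(sentence, lamar_page)
idaho_score = cosine_similarity(sentence, idaho_page)
print(lamar_score > idaho_score)  # True
```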

Global features

[Figure, built up over several steps: query q in document r; candidate entities e1 and e2; their related entities via outlinks (e3, e6) and inlinks (e4, e5, e7); the anchors of each entity (Anchor 1A-1B, 2A-2C, 3A-3B, 4A-4B, 5A-5C); and the anchors marked “Support”: 1A, 4B and 5C.]

But

• Too slow in real life
• Solution: set of linked entities (inlinks / outlinks) as “virtual document”

Related entity document

["Entertainment Weekly", "Compton, California", "California", "Rapping", "songwriter", "Hip hop music", "Top Dawg Entertainment", "Aftermath Entertainment", "Interscope Records", "Black Hippy", "Dr. Dre", "The Game (rapper)", "Jay Rock", "J. Cole", "Hip hop music", "recording artist", "Compton, California", "Carson, California", "Top Dawg Entertainment", "Aftermath Entertainment", "Interscope Records", "West Coast hip hop", "Supergroup (music)", "Black Hippy", "rapper", "Schoolboy Q", "Jay Rock", "Ab-Soul", "Overly Dedicated", "independent album", "Section.80", "iTunes Store", "Major record label", "Dr. Dre", "Game (rapper)", "Drake (entertainer)", "Young Jeezy", "Talib Kweli", "Busta Rhymes", "E-40", "Warren G", …]

• Similarity between sentence s and virtual document as related entity approximation
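A sketch of the virtual-document idea: concatenate the titles of a candidate's inlinked and outlinked entities into one text, then compare the sentence against it. The link graph is toy data, and the token-overlap score is a simple stand-in for whichever similarity is actually used.

```python
import re


def virtual_document(entity, link_graph):
    """Join the titles of all entities linked to or from `entity` into one text."""
    links = link_graph.get(entity, {})
    related = links.get("inlinks", []) + links.get("outlinks", [])
    return " ".join(title.replace("_", " ") for title in related)


def overlap_score(sentence, document):
    """Fraction of distinct sentence tokens that also occur in the document."""
    sentence_tokens = set(re.findall(r"\w+", sentence.lower()))
    document_tokens = set(re.findall(r"\w+", document.lower()))
    if not sentence_tokens:
        return 0.0
    return len(sentence_tokens & document_tokens) / len(sentence_tokens)


# Toy link graph (hypothetical links, for illustration only).
toy_link_graph = {
    "Kendrick_Lamar": {
        "inlinks": ["Eminem", "Black_Hippy"],
        "outlinks": ["Dr._Dre", "Good_Kid_Maad_City", "Compton,_California"],
    },
    "Kendrick,_Idaho": {
        "inlinks": ["Latah_County,_Idaho"],
        "outlinks": ["Idaho", "United_States"],
    },
}

sentence = "Eminem Thinks Kendrick Lamar's good kid, m.A.A.d. city Was 'Genius'"
scores = {
    entity: overlap_score(sentence, virtual_document(entity, toy_link_graph))
    for entity in toy_link_graph
}
print(max(scores, key=scores.get))  # Kendrick_Lamar
```

Because the virtual document can be built ahead of time per entity, the global signal reduces to one text comparison per candidate at linking time.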

Supervised Linking

• Feature vector for each sentence-entity pair
• Train a Random Forest classifier
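A minimal sketch of the classification step with scikit-learn's RandomForestClassifier. The four-feature layout and every number in the training data are made up for illustration; a real setup would use the full feature set and annotated sentence-entity pairs.

```python
from sklearn.ensemble import RandomForestClassifier

# One row per sentence-entity pair (made-up values, assumed feature order):
# [link_probability, commonness, sense_probability, text_similarity]
X_train = [
    [0.70, 0.95, 0.65, 0.40],
    [0.60, 0.90, 0.55, 0.35],
    [0.30, 0.80, 0.25, 0.45],
    [0.01, 0.30, 0.003, 0.15],
    [0.01, 0.10, 0.001, 0.05],
    [0.02, 0.10, 0.002, 0.08],
]
# 1 = positive (entity is a target for the sentence), 0 = negative.
y_train = [1, 1, 1, 0, 0, 0]

classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(X_train, y_train)

# Two unseen sentence-entity pairs (also made up).
X_new = [
    [0.65, 0.92, 0.60, 0.38],
    [0.01, 0.12, 0.001, 0.06],
]
# Column 1 of predict_proba is the probability of the positive class.
probabilities = classifier.predict_proba(X_new)[:, 1]
print(probabilities)
```

The per-candidate probabilities can then be thresholded or used to rank the candidates for a sentence.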

Local vs. global

• Hybrid > Local | Global
• Local & Global > Hybrid
• Approaches are complementary
• Global preferred for highly ambiguous entity mentions (i.e., short ones)

Etc…

• Open challenges:
  • out of KB entities
  • Knowledge Base Creation

Thanks!

David Graus [email protected]