Distant Supervision and MultiR (IIT Delhi, 2017)

  • Distant Supervision and MultiR

    Happy Mittal

  • We will discuss

    • Distant Supervision [Mintz et al, 2009]

    • MultiR [Hoffmann et al, 2011]

  • Relation Instance Extraction

    • Fully Supervised Learning
      • Labeled corpora of sentences.
      • Suffers from small datasets and domain bias.

    • Unsupervised Learning
      • Cluster patterns to identify relations.
      • Large corpora available.
      • Cannot assign names to the relations identified.

    • Bootstrap Learning
      • Start from initial seed patterns and facts.
      • Generate more facts and patterns.
      • Suffers from semantic drift.

    • Distant Supervision
      • Combines the advantages of the above approaches.

    Hrithik Roshan’s movie Kaabil features a love affair between two blind people.

    Actor(Hrithik Roshan, Kaabil)

  • Distant Supervision [Mintz et al 2009]

    Sentences (e.g., Wikipedia articles)

    Knowledge base (e.g., Freebase):

    Person         Birth Place
    Edwin Hubble   Marshfield
    ….             ….

    How to generate training data from these?

    Assumption: a fact r(e1,e2) in the KB implies that every sentence containing entities e1 and e2 expresses relation r.
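    Under this assumption, training-data generation is essentially a join between KB facts and entity-tagged sentences. A minimal sketch with toy data (hypothetical stand-ins for Freebase facts and NER-tagged Wikipedia sentences):

```python
# Toy KB: relation -> set of (e1, e2) pairs (stand-in for Freebase).
kb = {"birthplace": {("Edwin Hubble", "Marshfield")}}

# Sentences paired with their detected entity pair (assumed NER-tagged).
sentences = [
    ("Astronomer Edwin Hubble was born in Marshfield, Missouri.",
     ("Edwin Hubble", "Marshfield")),
    ("Edwin Hubble returned to Marshfield in 1937.",
     ("Edwin Hubble", "Marshfield")),
]

def label_sentences(kb, sentences):
    """Strong assumption: every sentence containing the entity pair of a
    KB fact r(e1, e2) is labeled as expressing relation r."""
    training = []
    for text, pair in sentences:
        for relation, pairs in kb.items():
            if pair in pairs:
                training.append((text, pair, relation))
    return training

data = label_sentences(kb, sentences)
# The second sentence does not actually express birthplace, yet it is
# still labeled -- exactly the noise the relaxed assumptions address.
```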

  • Distant Supervision (Generating training data)

    • Example sentence: "Astronomer Edwin Hubble was born in Marshfield, Missouri."

    • Lexical Features:

      o Entity types of both entities.

        NE1   NE2   Label
        PER   LOC   Birthplace

      o Words between the entities and their POS tags.

        NE1   Middle                           NE2   Label
        PER   [was/VERB born/VERB in/CLOSED]   LOC   Birthplace

      o Window of k words to the left and right, k ∈ {0,1,2}.

        Left Window       NE1   Middle                           NE2   Right Window   Label
        []                PER   [was/VERB born/VERB in/CLOSED]   LOC   []             Birthplace
        [Astronomer]      PER   [was/VERB born/VERB in/CLOSED]   LOC   [,]            Birthplace
        [#, Astronomer]   PER   [was/VERB born/VERB in/CLOSED]   LOC   [, Missouri]   Birthplace

    • Syntactic Features:

      o Dependency path between the entities.

      o Window nodes adjacent to the dependency path.
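    The lexical features above can be sketched as follows. The tokens and POS tags are assumed to come from an upstream tagger, and the '#' boundary padding from the slide is omitted:

```python
# Assumed output of an upstream tokenizer/POS tagger (hypothetical).
tokens = ["Astronomer", "Edwin", "Hubble", "was", "born", "in",
          "Marshfield", ",", "Missouri", "."]
pos =    ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "CLOSED",
          "NOUN", "CLOSED", "NOUN", "CLOSED"]
e1_span, e2_span = (1, 3), (6, 7)   # token spans of Edwin Hubble / Marshfield
ne1, ne2 = "PER", "LOC"             # entity types

def lexical_features(k):
    """Middle word/POS sequence plus windows of k neighbouring tokens
    (the slide pads with '#' at the sentence boundary; omitted here)."""
    middle = tuple("%s/%s" % (tokens[i], pos[i])
                   for i in range(e1_span[1], e2_span[0]))
    left = tuple(tokens[max(0, e1_span[0] - k):e1_span[0]])
    right = tuple(tokens[e2_span[1]:e2_span[1] + k])
    return (left, ne1, middle, ne2, right)

features = [lexical_features(k) for k in (0, 1, 2)]
# k=0 gives ((), 'PER', ('was/VERB', 'born/VERB', 'in/CLOSED'), 'LOC', ())
```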

  • Distant supervision

    • Strong assumption: if a fact r(e1,e2) appears in the KB, then every sentence containing e1 and e2 expresses relation r.

    • Relaxed assumption: at least one sentence containing e1 and e2 expresses relation r [Riedel et al, 2010].

  • Relaxing the assumption [Riedel et al 2010]

    [Figure: a relation variable Y ∈ R (here Y = Founded) connected to binary relation mention variables Z1, Z2 ∈ {0,1}, one per sentence:
      X1: "Steve Jobs founded Apple"         (Z1 = 1)
      X2: "Steve Jobs is the CEO of Apple"   (Z2 = 0)]

    • Model the joint distribution P(Y = y, Z = z | x).

    • Problem: this does not allow overlapping relations. MultiR solves that problem.

  • MultiR [Hoffmann et al 2011]

    [Figure: relation variables Y ∈ {0,1}^|R| capture the aggregate-level prediction; relation mention variables Zi ∈ R capture the sentence-level prediction, one per sentence:
      X1: "Steve Jobs founded Apple"         (Z1 = Founded)
      X2: "Steve Jobs is the CEO of Apple"   (Z2 = CEO-of)
      X3: "Steve Jobs left Apple"            (Z3 = None)]

  • MultiR [Hoffmann et al 2011]

    • Probability Distribution

      P(Y = y, Z = z | x) = (1/Z_x) · Π_r Φ_join(y_r, z) · Π_i Φ_extract(z_i, x_i)

      o Φ_join(y_r, z) = 1 if at least one z_i mentions relation y_r.

      o Φ_extract(z_i, x_i) is defined over [Mintz et al] features.

  • MultiR [Hoffmann et al 2011]

    • Parameter Learning

      P(Y = y, Z = z | x; θ) = (1/Z_x) · Π_r Φ_join(y_r, z) · Π_i Φ_extract(z_i, x_i)
                             = (1/Z_x) · Π_r Φ_join(y_r, z) · Π_i exp( Σ_j θ_j Φ_j(z_i, x_i) )

      o Φ_join(y_r, z) = 1 if at least one z_i mentions relation y_r; the Φ_j are [Mintz et al] features.

    • Treat the Z variables as latent.

    • Interested in maximizing the likelihood of the observed facts:

      L(θ) = Π_i P(y_i | x_i; θ) = Π_i Σ_z P(y_i, z | x_i; θ)

      l(θ) = Σ_i log Σ_z P(y_i, z | x_i; θ)

  • MultiR [Hoffmann et al 2011]

    • Parameter learning

      o Trained online: parameters are updated one entity pair at a time.

      o The sum over latent assignments z is difficult to compute, so the argmax assignment is used instead.

  • MultiR [Hoffmann et al 2011]

    • Learning Algorithm

    Two inference procedures are needed.
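    The two inference procedures and the online update can be sketched perceptron-style. The bag-of-words scoring below is a hypothetical simplification of the [Mintz et al] feature set, the None relation is omitted, and the edge-cover step is approximated greedily:

```python
from collections import defaultdict

def score(theta, relation, sentence):
    # Toy scoring: sum of (relation, word) weights; stands in for
    # theta . phi(z_i, x_i) over Mintz et al.-style features.
    return sum(theta[(relation, w)] for w in sentence.split())

def inference1(theta, sentences, relations):
    """argmax_{y,z} P(y,z|x): separable, so each sentence takes its
    best relation, and y_r = 1 iff some z_i = r."""
    z = [max(relations, key=lambda r: score(theta, r, s)) for s in sentences]
    return set(z), z

def inference2(theta, sentences, gold_y):
    """argmax_z P(z|x,y) with y fixed to the gold facts: a greedy
    approximation of the weighted edge cover."""
    gold = sorted(gold_y)
    z = [max(gold, key=lambda r: score(theta, r, s)) for s in sentences]
    for r in gold:  # each active y needs at least one edge
        if r not in z:
            i = max(range(len(sentences)),
                    key=lambda i: score(theta, r, sentences[i])
                                  - score(theta, z[i], sentences[i]))
            z[i] = r
    return z

def update(theta, sentences, relations, gold_y):
    """One online step: if the joint argmax disagrees with the KB
    facts, move weights toward the conditioned assignment."""
    y_hat, z_hat = inference1(theta, sentences, relations)
    if y_hat != gold_y:
        z_star = inference2(theta, sentences, gold_y)
        for s, r_star, r_hat in zip(sentences, z_star, z_hat):
            for w in s.split():
                theta[(r_star, w)] += 1.0
                theta[(r_hat, w)] -= 1.0

theta = defaultdict(float)
sents = ["Steve Jobs founded Apple", "Steve Jobs is the CEO of Apple"]
for _ in range(5):
    update(theta, sents, ["Founded", "CEO-of", "Capital"], {"Founded", "CEO-of"})
```

    After a few passes the joint argmax recovers both overlapping facts for this entity pair, which is exactly what the single-relation model of Riedel et al cannot express.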

  • MultiR Inference 1: argmax_{y,z} P(y, z | x; θ)

    [Figure: relation variables Y ∈ {0,1}^|R| (aggregate-level prediction) for Founder, CEO-of, and Capital; relation mention variables Zi ∈ R (sentence-level prediction) for
      X1: "Steve Jobs founded Apple"
      X2: "Apple was founded by Steve Jobs"
      X3: "Steve Jobs is the CEO of Apple"]

    Extraction scores:

              X1     X2     X3
    Founder   10.5   12.5   4.5
    CEO-of    8.9    8.7    8.5
    Capital   6.3    4.5    0.5

    • The unconstrained argmax decomposes over sentences: each Zi takes its highest-scoring relation, and Y_r = 1 iff some Zi = r.

    • Solution: Z1 = Founder, Z2 = Founder, Z3 = CEO-of; so Y_Founder = 1, Y_CEO-of = 1, Y_Capital = 0.

    • Cost: O(|R||S|).
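    On the score table above, Inference 1 reduces to a per-sentence (per-column) argmax:

```python
# Scores from the slide's table: relation -> per-sentence potentials.
scores = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9,  8.7,  8.5],
    "Capital": [6.3,  4.5,  0.5],
}

# Each sentence independently takes its best relation (O(|R||S|)),
# and a relation variable turns on iff some mention asserts it.
z = [max(scores, key=lambda r: scores[r][i]) for i in range(3)]
y = set(z)
# z == ["Founder", "Founder", "CEO-of"]; y == {"Founder", "CEO-of"}
```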

  • MultiR Inference 2: argmax_z P(z | x, y; θ)

    [Figure: the same graph, but Y is now fixed (Y_Founder = 1, Y_CEO-of = 1, Y_Capital = 0) and the mention variables Zi are to be inferred for
      X1: "Steve Jobs founded Apple"
      X2: "Apple was founded by Steve Jobs"
      X3: "Steve Jobs is the CEO of Apple"]

    Extraction scores, used as edge weights (edges to relations with y_r = 0 are ignored):

              X1     X2     X3
    Founder   10.5   12.5   4.5
    CEO-of    8.9    8.7    8.5
    Capital   6.3    4.5    0.5

    • A variant of the weighted edge-cover problem: each active y gets at least one edge, and each z gets exactly one edge.

    • Solution: Z1 = Founder, Z2 = Founder, Z3 = CEO-of.

    • Exact solution: O(V(E + V log V)). Approximate solution: O(|R||S|).
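    The O(|R||S|) approximation can be sketched greedily: give each sentence its best active relation, then patch any active relation left uncovered. On the table above no patching is needed:

```python
# Scores from the slide's table: relation -> per-sentence edge weights.
scores = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9,  8.7,  8.5],
    "Capital": [6.3,  4.5,  0.5],
}
active = ["Founder", "CEO-of"]   # relations with y_r = 1; Capital's edges ignored

# Each z gets exactly one edge: best active relation per sentence.
z = [max(active, key=lambda r: scores[r][i]) for i in range(3)]

# Each active y needs at least one edge: reassign the cheapest sentence
# to any relation left uncovered.
for r in active:
    if r not in z:
        j = max(range(3), key=lambda j: scores[r][j] - scores[z[j]][j])
        z[j] = r
# z == ["Founder", "Founder", "CEO-of"]
```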

  • Experiments

    • Data

    • NY Times sentences, NER tagged.

    • Used Freebase as KB.

    • Evaluation Metric

    • Challenging

    • Only 3% of sentences match facts in KB.

    • Number of matches across relations highly unbalanced.

    • Aggregate Extraction

    • Matched extracted relations with Freebase relations.

    • Underestimates accuracy because many true relations are not in Freebase.

    • Sentential Extraction

    • Sampled sentences from the union of two sets:

    • Sentences from which some relation is extracted.

    • Sentences whose arguments match entities in Freebase.

    • Manually labelled them correct or incorrect.

    • Overestimates the recall.

  • Experiments

    • Systems compared:

    • Original implementation of Riedel et al [2010]

    • SoloR : Reimplementation of Riedel et al [2010]

    • MultiR

    • Metrics:

    • Aggregate and sentential extraction results (PR curves)

    • Relation specific results

    • Running time

  • Experiments

    • Results: Aggregate extraction

    • MultiR: high precision across the full recall range.

    • MultiR: improves recall from 20% to 25%.

    • Low precision in the 0-1% recall range.

    • To investigate, examined the top 10 extractions marked wrong: they were correct but not present in Freebase.

  • Experiments

    • Results: Sentential extraction

    • Riedel et al did not report sentential results.

    • MultiR : High precision and recall

    • MultiR : F1 score : 60.5%

  • Experiments

    • Results: Relation-specific results

    • Take the 10 most frequent relations.

    • S_r^M: sentences from which MultiR extracted relation r.

    • S_r^F: sentences whose arguments match Freebase arguments for relation r.

    • Sample 100 sentences from both.

    • Compute accuracy, precision, and recall.

  • Experiments

    Effect of modeling overlapping relations

  • Discussion

    • Relies only on Freebase for experimental evaluation [Nupur et al]

    • Assumes that if a fact is present in text, then it must be present in the KB [Dinesh Raghu]

    • Allows only one relation per sentence [Barun]

    • Assumes entities occur only as noun phrases [Gagan]

    • Should use sampling instead of argmax, as done in Riedel et al [Happy, Barun]

    • Evaluation problem: only 3% of sentences match in Freebase [Gagan]

    • For sentential extraction evaluation, sampled only 1000 sentences.

    • Separate graph for every entity pair : Scaling issue [Prachi]

  • Possible Extensions

    • Evaluate on some other datasets as well, like Google knowledge graph [Anshul, Rishabh]

    • Bootstrapping like NELL [Gagan et al]

    • Iteratively correct the facts during learning for 0-1% recall range [Surag]

    • Extract entity mentions spanning multiple sentences [Anshul]

    • Relation to MLNs : Apply Lifting [Ankit]