
Automatic Set Expansion for List Question Answering
Richard C. Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg

Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15213 USA


Task
  Automatically improve the answers generated by Question Answering systems for list questions, by using a Set Expansion system.

  For example: Name cities that have Starbucks.

  QA Answers: Boston, Seattle, Carnegie-Mellon, Aquafina, Google, Logitech
  Expanded Answers: Seattle, Boston, Chicago, Pittsburgh, Carnegie-Mellon, Google

  Better!



Outline
  Introduction
    Question Answering
    Set Expansion
  Proposed Approach
    Aggressive Fetcher
    Lenient Extractor
    Hinted Expander
  Experimental Results
    QA System: Ephyra
    Other QA Systems
  Conclusion



Question Answering (QA)
  Question Answering task:
    Retrieve answers to natural language questions
  Different question types:
    Factoid questions
    List questions
    Definitional questions
    Opinion questions
  Major QA evaluations:
    Text REtrieval Conference (TREC): English
    NTCIR: Japanese, Chinese
    CLEF: European languages



Typical QA Pipeline
  Stages (each stage may consult the Knowledge Sources):
    Question String -> Question Analysis -> Analyzed Question
    Analyzed Question -> Query Generation & Search -> Search Results
    Search Results -> Candidate Generation -> Candidate Answers
    Candidate Answers -> Answer Scoring -> Scored Answers

  Example:
    Question String: “Who invented the smiley?”
    Analyzed Question: Answer type: Person; Keywords: invented, smiley...
    Search Results: The two original text smileys were invented on September 19, 1982 by Scott E. Fahlman ...
    Candidate Answers:
      • smileys
      • September 19, 1982
      • Scott E. Fahlman
    Scored Answers:
      Candidate             Score
      Scott E. Fahlman      0.853
      smileys               0.418
      September 19, 1982    0.239



QA System: Ephyra (Schlaefer et al., TREC 2007)
  History:
    Developed at University of Karlsruhe, Germany and Carnegie Mellon University, USA
    TREC participations in 2006 (13th out of 27 teams) and 2007 (7th out of 21 teams)
    Released into open source in 2008
  Different candidate generators:
    Answer type classification
    Regular expression matching
    Semantic parsing
  Available for download at: http://www.ephyra.info/










Set Expansion (SE)
  For example,
    Given a query: {“survivor”, “amazing race”}
    Answer is: {“american idol”, “big brother”, ....}

  More formally,
    Given a small number of seeds: x_1, x_2, …, x_k where each x_i ∈ S_t
    Answer is a listing of other probable elements: e_1, e_2, …, e_n where each e_i ∈ S_t

  A well-known example of a web-based set expansion system is Google Sets™
    http://labs.google.com/sets



SE System: SEAL (Wang & Cohen, ICDM 2007)
  Features
    Independent of human/markup language
      Supports seeds in English, Chinese, Japanese, Korean, ...
      Accepts documents in HTML, XML, SGML, TeX, WikiML, …
    Does not require pre-annotated training data
      Utilizes a readily available corpus: the World Wide Web
  Based on two research contributions
    Automatically construct wrappers for extracting candidate items
    Rank extracted items using random graph walk
  Try it out for yourself: http://rcwang.com/seal



SEAL’s SE Pipeline
  Fetcher: downloads web pages from the Web
  Extractor: learns wrappers from web pages
  Ranker: ranks entities extracted by wrappers

  Seeds: Canon, Nikon, Olympus
  Expanded: Pentax, Sony, Kodak, Minolta, Panasonic, Casio, Leica, Fuji, Samsung, …



Challenge
  SE systems require relevant (non-noisy) seeds, but answers produced by QA systems are often noisy.
  How can we integrate those two systems together?
  We propose three extensions to SEAL:
    Aggressive Fetcher
    Lenient Extractor
    Hinted Expander










Original Fetcher
  Procedure:
    1. Compose a search query by concatenating all seeds
    2. Use Google to request top 100 web pages
    3. Fetch web pages and send to the Extractor

  Seeds: Boston, Seattle, Carnegie-Mellon
  Query: Boston Seattle Carnegie-Mellon



Proposed Fetcher
  Aggressive Fetcher (AF)
    Sends a two-seed query for every possible pair of seeds to the search engines
    More likely to compose queries containing only relevant seeds

  Seeds: Boston, Seattle, Carnegie-Mellon
  Queries:
    Boston Seattle
    Boston Carnegie-Mellon
    Seattle Carnegie-Mellon
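The difference between the two fetchers can be sketched in a few lines of Python. This is only an illustration of how the queries are composed; the function names are hypothetical, and the search-engine request and page download steps are left out.

```python
from itertools import combinations


def original_query(seeds):
    """Original Fetcher: a single query that concatenates all seeds."""
    return " ".join(seeds)


def aggressive_queries(seeds):
    """Aggressive Fetcher: one two-seed query for every unordered pair of seeds."""
    return [" ".join(pair) for pair in combinations(seeds, 2)]


seeds = ["Boston", "Seattle", "Carnegie-Mellon"]
print(original_query(seeds))
for query in aggressive_queries(seeds):
    print(query)
```

With n seeds this composes n(n-1)/2 queries, so expanding the top four answers would issue six queries instead of one. A single noisy seed such as Carnegie-Mellon then contaminates only the pairs it appears in.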










Original Extractor
  A wrapper is a pair of L and R context strings
    Maximally long contextual strings that bracket at least one instance of every seed
    Extracts strings between L and R
  Learns wrappers from web pages and seeds on the fly
    Utilizes semi-structured documents
    Wrappers are defined at the character level
      No tokenization required (language-independent)
      However, very page specific (page-dependent)

<li class="honda"><a href="http://www.curryauto.com/"> <img src="/common/logos/honda/logo-horiz-rgb-lg-dkbg.gif" alt="4"></a> <ul><li><a href="http://www.curryhonda-ga.com/"> Curry Honda Atlanta...</li> <li><a href="http://www.curryhondamass.com/"> Curry Honda...</li> <li class="last"><a href="http://www.curryhondany.com/"> Curry Honda Yorktown...</li></ul> </li>

<li class="acura"><a href="http://www.curryauto.com/"> <img src="/curryautogroup/images/logo-horiz-rgb-lg-dkbg.gif" alt="5"></a> <ul><li class="last"><a href="http://www.curryacura.com/"> Curry Acura...</li></ul> </li>

<li class="toyota"><a href="http://www.curryauto.com/"> <img src="/common/logos/toyota/logo-horiz-rgb-lg-dkbg.gif" alt="7"></a> <ul><li class="last"><a href="http://www.geisauto.com/toyota/"> Curry Toyota...</li></ul> </li>

<li class="nissan"><a href="http://www.curryauto.com/"> <img src="/common/logos/nissan/logo-horiz-rgb-lg-dkbg.gif" alt="6"></a> <ul><li class="last"><a href="http://www.geisauto.com/"> Curry Nissan...</li></ul> </li>

<li class="ford"><a href="http://www.curryauto.com/"> <img src="/common/logos/ford/logo-horiz-rgb-lg-dkbg.gif" alt="3"></a> <ul><li class="last"><a href="http://www.curryauto.com/"> Curry Ford...</li></ul> </li>

Proposed Extractor
  Lenient Extractor (LE)
    Maximally long contextual strings that bracket at least one instance of a minimum of two seeds
    More likely to find useful contexts that bracket only relevant seeds

  Text:
    ... in Boston City Hall ...
    ... in Seattle City Hall ...
    ... at Boston University ...
    ... at Seattle University ...
    ... at Carnegie-Mellon University ...

  Learned Wrapper (w/o LE):
    at <blah> University

  Learned Wrappers (w/ LE):
    at <blah> University
    in <blah> City Hall
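The contrast between the original and the lenient extractor can be reproduced with a simplified character-level sketch. It is not SEAL's actual algorithm: candidate wrappers are built only from pairs of seed occurrences (so it may miss wrappers for larger seed sets), each text snippet is treated as its own short document, and all function names are hypothetical.

```python
import re
from itertools import combinations


def common_prefix(a, b):
    """Longest string that both a and b start with."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return a[:n]


def common_suffix(a, b):
    """Longest string that both a and b end with."""
    return common_prefix(a[::-1], b[::-1])[::-1]


def contexts(seed, docs):
    """Yield the (left, right) context of every occurrence of seed."""
    for doc in docs:
        start = doc.find(seed)
        while start != -1:
            yield doc[:start], doc[start + len(seed):]
            start = doc.find(seed, start + 1)


def learn_wrappers(docs, seeds, min_seeds):
    """Return maximal (L, R) pairs that bracket at least min_seeds distinct seeds."""
    occ = {s: list(contexts(s, docs)) for s in seeds}
    candidates = set()
    for s1, s2 in combinations(seeds, 2):
        for l1, r1 in occ[s1]:
            for l2, r2 in occ[s2]:
                left, right = common_suffix(l1, l2), common_prefix(r1, r2)
                if left and right:
                    candidates.add((left, right))

    def coverage(wrapper):
        left, right = wrapper
        return sum(
            any(l.endswith(left) and r.startswith(right) for l, r in occ[s])
            for s in seeds
        )

    kept = {w for w in candidates if coverage(w) >= min_seeds}
    # Drop wrappers that a longer kept wrapper extends (not maximally long).
    return {
        w for w in kept
        if not any(o != w and o[0].endswith(w[0]) and o[1].startswith(w[1]) for o in kept)
    }


def extract(wrapper, docs):
    """Extract the strings found between L and R."""
    left, right = wrapper
    pattern = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
    return {m for doc in docs for m in pattern.findall(doc)}


docs = [
    "in Boston City Hall",
    "in Seattle City Hall",
    "at Boston University",
    "at Seattle University",
    "at Carnegie-Mellon University",
]
seeds = ["Boston", "Seattle", "Carnegie-Mellon"]
strict = learn_wrappers(docs, seeds, min_seeds=len(seeds))  # original extractor
lenient = learn_wrappers(docs, seeds, min_seeds=2)          # lenient extractor
print(strict)
print(lenient)
```

On this toy input the strict setting keeps only the "at ... University" wrapper, while the lenient setting also keeps "in ... City Hall", which brackets just the two relevant seeds.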










Hinted Expander (HE)
  Utilizes contexts in the question to constrain SEAL’s search space on the Web
    Extract up to three keywords from the question using Ephyra’s keyword extractor
    Append the keywords to the search query

  Example: Name cities that have Starbucks.

  More likely to find documents containing desired set of answers
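A minimal sketch of the query composition follows. The keyword list is an assumption made for illustration: the slides do not show what Ephyra's keyword extractor returns for this question, and `hinted_query` is a hypothetical name.

```python
def hinted_query(seeds, question_keywords, max_keywords=3):
    """Append up to max_keywords question keywords to the seed query."""
    hints = list(question_keywords)[:max_keywords]
    return " ".join(list(seeds) + hints)


# Assumed keywords for "Name cities that have Starbucks." (not Ephyra output).
keywords = ["cities", "Starbucks"]
print(hinted_query(["Boston", "Seattle"], keywords))
```

The extra terms steer the search engine toward pages about the question's topic, so a page listing both seeds is more likely to also list other answers.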










Experiment #1: Ephyra
  Evaluate on TREC 13, 14, and 15 datasets
    55, 93, and 89 list questions respectively
  Use SEAL to expand top four answers from Ephyra
    Outputs a list of answers ranked by confidence scores
  For each dataset, we report:
    Mean Average Precision (MAP)
      Mean of average precision for each ranked list
    Average F1 with Optimal Per-Question Threshold
      For each question, cut off the list at a threshold which maximizes the F1 score for that particular question
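Both measures can be sketched for a single question as follows. The slides do not spell out the formulas, so this assumes the common definition of average precision (normalized by the number of correct answers) and treats the per-question threshold as a cutoff position in the ranked list; the answer key in the example is made up for illustration.

```python
def average_precision(ranked, gold):
    """Average of precision@k over the ranks k where a correct answer appears."""
    hits, total = 0, 0.0
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0


def f1(predicted, gold):
    """Harmonic mean of precision and recall for a set of predicted answers."""
    predicted = set(predicted)
    tp = len(predicted & set(gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def optimal_f1(ranked, gold):
    """Best F1 over all cutoff positions in the ranked list."""
    return max((f1(ranked[:k], gold) for k in range(1, len(ranked) + 1)), default=0.0)


# Hypothetical ranked output and answer key for one list question.
ranked = ["Seattle", "Boston", "Carnegie-Mellon", "Chicago"]
gold = {"Seattle", "Boston", "Chicago", "Pittsburgh"}
print(average_precision(ranked, gold))
print(optimal_f1(ranked, gold))
```

MAP is then the mean of `average_precision` over all questions in a dataset, and the second measure is the mean of `optimal_f1`.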



Experiment #1: Ephyra
  [Figure: Mean Average Precision. Y-axis: Mean Avg. Precision (%); X-axis: TREC Dataset (Trec 13, Trec 14, Trec 15); Series: Ephyra, Ephyra's Top 4, SEAL, SEAL+LE, SEAL+LE+AF, SEAL+LE+AF+HE]
  [Figure: F1 with Optimal Per-Question Threshold. Y-axis: Avg. Optimal F1 (%); X-axis: TREC Dataset (Trec 13, Trec 14, Trec 15); Series: Ephyra, Ephyra's Top 4, SEAL, SEAL+LE, SEAL+LE+AF, SEAL+LE+AF+HE]



Experiment #2: Ephyra
  In practice, thresholds are unknown
  For each dataset, do 5-fold cross validation:
    Train: Find one optimal threshold for four folds
    Test: Use the threshold to evaluate the fifth fold
  Introduce a fourth dataset: All
    Union of TREC 13, 14, and 15
  Introduce another system: Hybrid
    Intersection of original answers from Ephyra and expanded answers from SEAL
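The evaluation protocol above can be sketched as follows. This is a toy illustration under several assumptions: each question is a dict with a `scored` list of (answer, confidence) pairs and a `gold` answer set, folds are formed by simple striding, candidate thresholds are the confidence scores seen in training, and all names are hypothetical.

```python
def f1(predicted, gold):
    """Harmonic mean of precision and recall for a set of predicted answers."""
    predicted = set(predicted)
    tp = len(predicted & set(gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(questions, threshold):
    """Average F1 when every answer list is cut at the same score threshold."""
    scores = [
        f1([a for a, s in q["scored"] if s >= threshold], q["gold"])
        for q in questions
    ]
    return sum(scores) / len(scores)


def train_threshold(questions):
    """Pick the single threshold that maximizes mean F1 on the training questions."""
    candidates = sorted({s for q in questions for _, s in q["scored"]})
    return max(candidates, key=lambda t: mean_f1(questions, t))


def cross_validate(questions, k=5):
    """Train a threshold on k-1 folds, evaluate on the held-out fold, and average."""
    folds = [questions[i::k] for i in range(k)]
    results = []
    for i, test in enumerate(folds):
        train = [q for j, fold in enumerate(folds) if j != i for q in fold]
        results.append(mean_f1(test, train_threshold(train)))
    return sum(results) / len(results)


def hybrid(qa_answers, expanded_answers):
    """Hybrid system: keep QA answers that also appear among the expanded answers."""
    expanded = set(expanded_answers)
    return [a for a in qa_answers if a in expanded]


toy = [
    {"scored": [("a", 0.9), ("b", 0.2)], "gold": {"a"}},
    {"scored": [("c", 0.8), ("d", 0.1)], "gold": {"c"}},
]
print(train_threshold(toy))
print(hybrid(
    ["Boston", "Seattle", "Carnegie-Mellon", "Aquafina", "Google", "Logitech"],
    ["Seattle", "Boston", "Chicago", "Pittsburgh", "Carnegie-Mellon", "Google"],
))
```

Taking the intersection favors precision: an answer survives only if both the QA system and the set expander proposed it.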



Experiment #2: Ephyra
  [Figure: F1 with Trained Threshold. Y-axis: Avg. F1 (%); X-axis: TREC Dataset (Trec 13, Trec 14, Trec 15, All); Series: Ephyra, SEAL+LE+AF+HE, Hybrid]










Experiment: Other QA Systems
  Top five QA systems that perform the best on list questions in TREC 15 evaluation
    1. Language Computer Corporation (lccPA06)
    2. The Chinese University of Hong Kong (cuhkqaepisto)
    3. National University of Singapore (NUSCHUAQA1)
    4. Fudan University (FDUQAT15A)
    5. National Security Agency (QACTIS06C)
  For each QA system, train thresholds for SEAL and Hybrid on the union of TREC 13 and 14
  Expand top four answers from the QA systems on TREC 15, and apply the trained threshold



Experiment: Top QA Systems
  [Figure: F1 with Trained Threshold. Y-axis: Average F1 (%); one panel for lccPA06 and one panel for cuhkqaepisto, NUSCHUAQA1, FDUQAT15A, QACTIS06C; Series: Baseline, Top 4 Ans., Google Sets, SEAL+LE+AF+HE, Hybrid]



Conclusion

  A feasible method for integrating an SE approach into any QA system
  Proposed SE approach is effective
    Improves QA systems on list questions by using only a few top answers as seeds
  Proposed hybrid system is effective
    Improves Ephyra and (most) top five QA systems



Thank You!
