The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?
Illustrations of different approaches
Peter Clark and John Thompson, Boeing Research, 2004


Page 1: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

The Knowledge Acquisition Bottleneck Revisited:
How can we build large KBs?

Illustrations of different approaches
Peter Clark and John Thompson
Boeing Research, 2004

Page 2: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

Premise

• Intelligent machines need lots of knowledge, for
  – question-answering
  – intelligent search
  – information integration
  – natural language understanding
  – decision support
  – modeling
  – etc., etc.

• Much of this knowledge can be drawn from some general repository of reusable knowledge
  – e.g., WordNet

• How does one build such a repository?
  “No-one considers hand-building a large KB to be a realistic proposition these days” [paraphrase of Daphne Koller, 2004]

Page 3: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

1. Build it by Hand

• “Let’s roll up our sleeves and get on with it!”
• But: it’s a daunting task
  – Our own work
• Cyc
  + Lots in it, (relatively) well-designed ontology
  - 650 person-years of effort so far
  - Still patchy coverage (why?)
  - Difficult to use outside Cycorp

Page 4: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

1. Build it by Hand (cont.)

• WordNet
  + Easy to use
  + Comprehensive
  - Little inference-supporting knowledge in it
  - Ad hoc ontology
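To make the WordNet trade-off concrete, here is a minimal sketch of the kind of lookup such a repository supports, using NLTK's WordNet interface (NLTK is an assumption; the slides do not name a particular toolkit). It shows why WordNet is "easy to use" yet offers mostly taxonomic and part-whole links rather than inference-supporting knowledge.

```python
# Minimal WordNet lookup sketch via NLTK (assumed toolkit, not named in the
# slides). Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

# Word senses ("synsets") and their glosses
for synset in wn.synsets("knife"):
    print(synset.name(), "-", synset.definition())

# Taxonomic knowledge: hypernyms of the first noun sense
knife = wn.synset("knife.n.01")
print([h.name() for h in knife.hypernyms()])      # e.g. an edge-tool synset

# Part-whole knowledge, where WordNet happens to have it
print([p.name() for p in knife.part_meronyms()])  # e.g. a blade synset
```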

Page 5: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

1. Build it by Hand (cont.)

• The Component Library
  Claim: can bound the required knowledge by working at a coarse-grained level
  + Large, more doable
  - Hard to use, still very incomplete
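A toy sketch of the coarse-grained idea, with hypothetical class and slot names (not the actual Component Library): a small set of generic event components, each with reusable slots, is composed to model many specific domain concepts, which is what bounds the knowledge to be built.

```python
# Hypothetical sketch of coarse-grained, reusable knowledge components.
# Names ("Component", "Move", the slots) are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str                        # generic event type, e.g. "Move"
    slots: dict = field(default_factory=dict)

# One generic, hand-built component ...
move = Component("Move", {"object": None, "origin": None, "destination": None})

def instantiate(component, **bindings):
    """Reuse a generic component for a specific concept by filling its slots."""
    inst = Component(component.name, dict(component.slots))
    inst.slots.update(bindings)
    return inst

# ... reused, rather than writing new axioms, for a domain concept
transport = instantiate(move, object="cargo", origin="Seattle",
                        destination="Tokyo")
print(transport)
```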

Page 6: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

2. Extract from Dictionaries

• MindNet
  + Automatically built
  - Unusable?

• Extended WordNet
  + Won TREC competition
  - Still somewhat incoherent
  - Lots of manual labor
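A toy sketch of what "extract from dictionaries" means in the MindNet / Extended WordNet spirit (hypothetical glosses and a crude pattern, not either actual system): pull the genus term out of "a/an X that ..." style definitions to get isa links automatically.

```python
# Crude genus extraction from dictionary glosses (illustrative only).
import re

GLOSSES = {
    "knife": "a cutting instrument that has a sharp blade",
    "sparrow": "a small brownish bird that is common in towns",
}

GENUS = re.compile(r"^(?:a|an|the)\s+(?:\w+\s+)*?(\w+)\s+(?:that|which|of|with)\b")

for word, gloss in GLOSSES.items():
    match = GENUS.match(gloss)
    if match:
        print(f"isa({word}, {match.group(1)})")
# -> isa(knife, instrument), isa(sparrow, bird): automatic, but the output
#    is only as coherent as the informal definitions it came from.
```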

Page 7: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

3. Corpus-based Text/Web Mining

• Schubert’s system
  + Automatic
  + Lots of knowledge
  - Noisy
  - No word senses
  - Only grabs certain kinds of knowledge

30M entries…
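A minimal sketch in the spirit of this kind of corpus mining (not Schubert's actual system; the triples and threshold are hypothetical): abstract subject-verb-object triples pulled from parsed sentences and count them to surface candidate general propositions.

```python
# Toy corpus-mining sketch: generalize and count parser-derived triples.
from collections import Counter

# Hypothetical triples as they might come out of a parser
triples = [
    ("dog", "chase", "cat"),
    ("dog", "chase", "ball"),
    ("child", "eat", "apple"),
    ("dog", "chase", "cat"),
]

counts = Counter(triples)
for (subj, verb, obj), n in counts.most_common():
    if n >= 2:  # crude evidence threshold; real systems use better statistics
        print(f"A {subj} may {verb} a {obj}.   (seen {n} times)")
# -> "A dog may chase a cat.": lots of knowledge, automatically, but noisy,
#    with no word senses, and only certain kinds of knowledge.
```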

Page 8: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

3. Corpus-based Text/Web Mining (cont.)

• KnowItAll (Etzioni)
  + Automatic
  - Only factoids
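A toy sketch of the pattern-based extraction behind KnowItAll-style web mining (the pattern and sentences are hypothetical, not the actual system): "C such as X, Y and Z" patterns yield instance-of factoids at scale.

```python
# "such as" pattern extraction over example sentences (illustrative only).
import re

PATTERN = re.compile(r"(\w+) such as ((?:\w+)(?:, \w+)*(?: and \w+)?)")

sentences = [
    "He visited cities such as Paris, Rome and Berlin.",
    "Composers such as Mozart wrote symphonies.",
]

for sentence in sentences:
    for category, items in PATTERN.findall(sentence):
        for instance in re.split(r", | and ", items):
            print(f"isa({instance}, {category})")
# -> isa(Paris, cities), isa(Mozart, Composers), ... : automatic,
#    but only factoids.
```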

Page 9: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

4. Community-Based Acquisition

• Knowledge entry by the masses
• OpenMind
  + Large
  - Full of junk, unusable (?)
  - Would this work with better acquisition tools?

(see next slide for illustration)
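A toy sketch of template-based entry in the OpenMind spirit (the templates and checks are hypothetical, not the actual OpenMind tool): constraining volunteers to fill-in-the-blank templates is one way better acquisition tools could keep "junk" out and make entries reusable.

```python
# Hypothetical template-based knowledge entry with crude validation.
TEMPLATES = {
    "isa":      "A {x} is a kind of {y}.",
    "used_for": "A {x} is used for {y}.",
}

def accept(relation, x, y):
    """Very crude validation: known template, non-empty, short fillers."""
    if relation not in TEMPLATES:
        return None
    if not x.strip() or not y.strip() or len(x.split()) > 3:
        return None
    return (relation, x.strip().lower(), y.strip().lower())

kb = []
for entry in [("isa", "hammer", "tool"),
              ("used_for", "hammer", "driving nails"),
              ("isa", "asdf qwer zxcv uiop", "junk")]:   # rejected as junk
    fact = accept(*entry)
    if fact:
        kb.append(fact)
        print(TEMPLATES[fact[0]].format(x=fact[1], y=fact[2]))
```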

Page 10: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

[illustration slide; no transcript text]

Page 11: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

5. Use Existing Resources

• e.g.,
  – databases
  – CIA World Fact Book
  – Web data/services

• e.g., SRI/ISI’s ARDA QA system
  + Syntactically simple
  + Available
  - Largely limited to factoids
  - Information integration is a major challenge
    – different ontologies, contradictory data
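A toy sketch of why information integration is the hard part (the records, field names, and figures are hypothetical, not drawn from any actual resource): two sources describe the same country under different schemas, in different units, with values that disagree.

```python
# Illustrative schema-mismatch / contradictory-data example.
factbook = {"France": {"population": 60_400_000, "area_sq_km": 643_801}}
trade_db = [{"country_name": "France", "pop_millions": 59.8,
             "area": 248_573, "area_unit": "sq_mi"}]

SQ_MI_TO_SQ_KM = 2.58999

def merge(name):
    a = factbook.get(name, {})
    b = next((r for r in trade_db if r["country_name"] == name), {})
    return {
        # syntactically simple to pull both values; deciding which to
        # believe (they disagree) is the integration problem
        "population": (a.get("population"),
                       int(b.get("pop_millions", 0) * 1e6)),
        "area_sq_km": (a.get("area_sq_km"),
                       round(b.get("area", 0) * SQ_MI_TO_SQ_KM)),
    }

print(merge("France"))
```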

Page 12: The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

Where to?

• Can we bound the knowledge needed
  – for a particular application
  – for a useful, sharable, general resource?

• Which of these approaches seems most realistic?
  – build by hand
  – extract from dictionaries
  – mine text corpora
  – community knowledge entry
  – use existing resources