The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?
The Knowledge Acquisition Bottleneck Revisited:
How can we build large KBs?
Illustrations of different approaches
Peter Clark and John Thompson
Boeing Research
2004
Premise
• Intelligent machines need lots of knowledge, for
– question-answering
– intelligent search
– information integration
– natural language understanding
– decision support
– modeling
– etc.
• Much of this knowledge can be drawn from some general repository of reusable knowledge
– e.g., WordNet
• How does one build such a repository?
“No-one considers hand-building a large KB to be a realistic proposition these days” [paraphrase of Daphne Koller, 2004]
1. Build it by Hand
• “Let’s roll up our sleeves and get on with it!”
• But: It’s a daunting task
– Our own work
• Cyc
+ Lots in it
+ (Relatively) well-designed ontology
- 650 person-years of effort so far
- Still patchy coverage (why?)
- Difficult to use outside Cycorp
1. Build it by Hand (cont)
• WordNet
+ Easy to use
+ Comprehensive
- Little inference-supporting knowledge in it
- Ad hoc ontology
1. Build it by Hand (cont)
• The Component Library
Claim: can bound the required knowledge by working at a coarse-grained level
+ Large, more doable
- Hard to use, still very incomplete
2. Extract from Dictionaries
• MindNet
+ Automatically built
- Unusable?
• Extended WordNet
+ Won TREC competition
- Still somewhat incoherent
- Lots of manual labor
3. Corpus-based Text/Web Mining
• Schubert’s system
+ Automatic
+ Lots of knowledge
- Noisy
- No word senses
- Only grabs certain kinds of knowledge
30M entries…
3. Corpus-based Text/Web Mining (cont)
• KnowIt (Etzioni)
+ Automatic
- Only factoids
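As a rough illustration of the corpus-mining idea (not Schubert's system or KnowIt, whose internals these slides do not describe), a single hand-written "such as" pattern can pull is-a factoids out of raw text. The pattern, the function name `extract_isa`, and the sample sentences are assumptions made up for this sketch.

```python
import re

# Hypothetical lexical pattern: "<class> such as <item>, <item> and <item>".
# Real mining systems use many patterns plus parsing; this is one toy rule.
PATTERN = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")

def extract_isa(text):
    """Return (item, 'is-a', class) triples found by the single pattern above."""
    facts = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        for item in re.split(r", | and | or ", match.group(2)):
            if item:  # skip empty strings left by a trailing separator
                facts.append((item, "is-a", hypernym))
    return facts

# A clean sentence yields sensible factoids.
print(extract_isa("Boeing builds aircraft such as jets, freighters and tankers."))

# A sloppy sentence yields junk: the pattern has no notion of meaning.
print(extract_isa("Delays arise from causes such as this."))
```

The second call shows the drawbacks listed on the slides in miniature: the output is noisy ("this" is-a "causes"), the words carry no sense information, and only the one kind of knowledge the pattern targets is ever captured.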
4. Community-Based Acquisition
• Knowledge entry by the masses
• OpenMind
+ Large
- Full of junk, unusable (?)
- Would this work with better acquisition tools?
(see next slide for illustration)
5. Use Existing Resources
• e.g.,
– databases
– CIA World Factbook
– Web data/services
• e.g., SRI/ISI’s ARDA QA system
+ Syntactically simple
+ Available
- Largely limited to factoids
- Information integration is a major challenge
– different ontologies, contradictory data
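The integration challenge can be made concrete with a small sketch. Two hypothetical sources describe the same entity under different attribute names (different ontologies) and disagree on one value (contradictory data). The source contents, the mapping table, and the function names are all invented for illustration and do not come from the ARDA QA system.

```python
# All entities, attribute names, and values below are made up for illustration.
SOURCE_A = {
    ("Ruritania", "capital"): "Strelsau",
    ("Ruritania", "population_millions"): 3,
}
SOURCE_B = {
    ("Ruritania", "capital_city"): "Strelsau",
    ("Ruritania", "pop_millions"): 4,
}

# Hand-written mapping from source-specific attribute names to a shared vocabulary.
ATTRIBUTE_MAP = {"capital_city": "capital", "pop_millions": "population_millions"}

def normalize(source):
    """Rename attributes to the shared vocabulary; unknown names pass through."""
    return {
        (entity, ATTRIBUTE_MAP.get(attr, attr)): value
        for (entity, attr), value in source.items()
    }

def merge(a, b):
    """Merge two sources, separating agreed facts from contradictions."""
    a, b = normalize(a), normalize(b)
    merged, conflicts = {}, []
    for key in sorted(set(a) | set(b)):
        if key in a and key in b and a[key] != b[key]:
            conflicts.append((key, a[key], b[key]))
        else:
            merged[key] = a.get(key, b.get(key))
    return merged, conflicts

merged, conflicts = merge(SOURCE_A, SOURCE_B)
print(merged)     # facts both sources agree on, or that only one source states
print(conflicts)  # facts where the sources disagree
```

Even in this toy case the mapping table had to be written by hand, and the sketch only detects the contradiction; deciding which source to trust is left open.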
Where to?
• Can we bound the knowledge needed
– for a particular application
– for a useful, sharable, general resource?
• Which of these approaches seems most realistic?
– build by hand
– extract from dictionaries
– mine text corpora
– community knowledge entry
– use existing resources