Deep Processing QA & Information Retrieval
Ling573: NLP Systems and Applications
April 11, 2013
Roadmap
- PowerAnswer-2: deep-processing Q/A
- Problem: matching topics and documents
- Methods: vector space model
- Retrieval evaluation
PowerAnswer-2
- Language Computer Corp., with many UT Dallas affiliates
- Task: factoid questions
- Major novel components: web boosting of results, COGEX logic prover, temporal event processing, extended semantic chains
- Results: "above median"; 53.4% on the main task
Challenges: Co-reference
- Single, basic referent
- Multiple possible antecedents: depends on previous correct answers
Challenges: Events
Event answers are not just nominal concepts.
- Nominal events: "Preakness 1998"
- Complex events: "Plane clips cable wires in Italian resort"
- Must establish question context and constraints
Handling Question Series
Given a target and a question series, how do we handle reference?
- Shallowest approach (concatenation): add the 'target' to the question
- Shallow approach (replacement): replace all pronouns with the target
- Least shallow approach: heuristic reference resolution
Question Series Results
- No clear winning strategy
- Questions are all largely about the target, so anaphora resolution offers no big win; with bag-of-words features in search, any strategy works fine
- The 'replacement' strategy can be problematic. E.g., target = Nirvana: "What is their biggest hit?" "When was the band formed?" Replacement wouldn't touch 'the band'.
- Most teams concatenate
PowerAnswer-2
Factoid QA system with standard main components:
- Question analysis, passage retrieval, answer processing
- Web-based answer boosting
Complex components:
- COGEX abductive prover
- World knowledge and semantics: Extended WordNet, etc.
- Temporal processing
Web-Based Boosting
- Create search engine queries from the question
- Extract the most redundant answers from the search results (cf. Dumais et al., AskMSR; Lin, ARANEA)
- Increase the weight on TREC candidates that match, with higher weight for higher frequency
- Intuition: common terms in web search results are likely to be the answer, while QA answer search is too focused on the query terms; reweighting improves results
- Web boosting improves performance significantly: about 20%
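The reweighting step can be sketched as follows; the candidate answers, base weights, and the linear frequency-scaled boost are illustrative assumptions, not PowerAnswer-2's actual scoring.

```python
from collections import Counter

def boost_candidates(trec_candidates, web_answers, boost=0.5):
    """Raise the weight of TREC answer candidates that also appear
    redundantly in web search results (cf. AskMSR / ARANEA).

    trec_candidates: dict mapping answer string -> base weight
    web_answers: answer strings extracted from web snippets
    """
    web_freq = Counter(a.lower() for a in web_answers)
    boosted = {}
    for answer, weight in trec_candidates.items():
        freq = web_freq.get(answer.lower(), 0)
        # Higher web frequency -> larger boost (frequency-scaled).
        boosted[answer] = weight * (1 + boost * freq)
    return boosted

candidates = {"1867": 0.4, "1876": 0.35}
web = ["1867", "1867", "1876"]
print(boost_candidates(candidates, web))
```

A candidate seen twice on the web here ends up outranking one with a higher base weight seen only once, which is the intended effect of the boosting.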
Deep Processing: Query/Answer Formulation
- Preliminary shallow processing: tokenization, POS tagging, NE recognition, preprocessing
- Parsing creates a syntactic representation, focused on nouns, verbs, and particles, and on attachment
- Coreference resolution links entity references
- Translation to a full logical form, staying as close as possible to the syntax
Syntax to Logical Form
[Figures: worked examples mapping parse trees to logical forms; not preserved in the transcript]
Deep Processing: Answer Selection
Lexical chains bridge the gap in lexical choice between question and answer, improving both retrieval and answer selection. Chains create connections between synsets through topicality:
  Q: When was the internal combustion engine invented?
  A: The first internal-combustion engine was built in 1867.
  invent → create_mentally → create → build
Abductive reasoning between the question logical form (QLF) and the answer logical form (ALF) tries to justify the answer given the question. This yields a 12% improvement in accuracy.
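Chain-finding can be sketched as breadth-first search over lexical relations; the relation graph below is hand-coded to reproduce the slide's example, not taken from the real Extended WordNet resource.

```python
from collections import deque

# Toy WordNet-style relation graph (hand-coded for illustration).
RELATIONS = {
    "invent": ["create_mentally"],
    "create_mentally": ["create"],
    "create": ["build", "make"],
    "build": [],
    "make": [],
}

def lexical_chain(source, target, graph=RELATIONS):
    """Breadth-first search for a chain of lexical relations
    linking a question term to an answer term."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain: terms are lexically unrelated here

print(lexical_chain("invent", "build"))
# ['invent', 'create_mentally', 'create', 'build']
```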
Temporal Processing
- 16% of factoid questions include a time reference
- Index documents by date: absolute and relative
- Identify temporal relations between events; store them as triples (S, E1, E2), where S is the temporal relation signal, e.g. 'during', 'after'
- Answer selection: prefer passages matching the question's temporal constraint; discover events related by temporal signals in questions and answers; perform temporal unification and boost good answers
- Improves accuracy by only 2%; most cases are already captured by surface forms
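One way the (S, E1, E2) triples might be queried during answer selection; the triples and the lookup are illustrative assumptions, not the system's actual temporal unification.

```python
def related_events(triples, signal, anchor_event):
    """Return events E1 such that (signal, E1, anchor_event) was
    extracted, i.e. E1 stands in that temporal relation to the anchor."""
    return [e1 for (s, e1, e2) in triples
            if s == signal and e2 == anchor_event]

# Hypothetical extracted triples; the event strings are invented.
triples = [
    ("after", "ceasefire signed", "war ended"),
    ("during", "aid delivered", "ceasefire"),
    ("after", "elections held", "war ended"),
]
# Q: "What happened after the war ended?"
print(related_events(triples, "after", "war ended"))
# ['ceasefire signed', 'elections held']
```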
Results
[Figure: results table; not preserved in the transcript]
Matching Topics and Documents
Two main perspectives:
- Pre-defined, fixed, finite topics: "text classification"
- Arbitrary topics, typically defined by a statement of information need (a query): "information retrieval", i.e. ad hoc retrieval
Information Retrieval Components
- Document collection: used to satisfy user requests; a collection of documents
- Document: the basic unit available for retrieval; typically a newspaper story or encyclopedia entry; alternatively paragraphs or sentences, or a web page or site
- Query: a specification of the information need
- Terms: the minimal units of queries and documents; words or phrases
Information Retrieval Architecture
[Figure: system diagram; not preserved in the transcript]
Vector Space Model
Basic representation:
- Document and query semantics are defined by their terms
- Typically ignore any syntax: bag of words (or bag of terms); "Dog bites man" == "Man bites dog"
- Represent documents and queries as vectors of term-based features, e.g. of dimension N, the number of terms in the vocabulary of the collection. Problem?
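A bag-of-words vectorizer makes the representation concrete. The three-term vocabulary is illustrative; in practice N is the full collection vocabulary, which is the dimensionality problem the slide hints at.

```python
def bow_vector(text, vocab):
    """Map a text to a vector of term counts over a fixed vocabulary.
    Word order is discarded, so 'dog bites man' == 'man bites dog'."""
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocab]

vocab = ["bites", "dog", "man"]
print(bow_vector("Dog bites man", vocab))  # [1, 1, 1]
print(bow_vector("Dog bites man", vocab) ==
      bow_vector("Man bites dog", vocab))  # True
```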
Representation
Solution 1: binary features
- w = 1 if the term is present, 0 otherwise
- Similarity: number of terms in common, i.e. the dot product
- Issues?
VSM Weights
What should the weights be? "Aboutness":
- To what degree is this term what the document is about?
- A within-document measure: term frequency (tf), the number of occurrences of t in document j
- Example, with terms (chicken, fried, oil, pepper):
  D1, a fried chicken recipe: (8, 2, 7, 4)
  D2, a poached chicken recipe: (6, 0, 0, 0)
  Q, "fried chicken": (1, 1, 0, 0)
Vector Space Model (II)
Documents and queries:
- The document collection forms a term-by-document matrix
- View each document as a vector in a multidimensional space; nearby vectors are related
- Normalize for vector length
Vector Similarity Computation
Normalization improves over the raw dot product:
- Captures term weights
- Compensates for document length
Cosine similarity:
- Identical vectors: 1
- No overlap: 0
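Cosine similarity normalizes the dot product by the vector lengths, sim(d, q) = (d · q) / (|d| |q|). A minimal sketch:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product normalized by vector lengths,
    which compensates for document length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u)) *
            math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

print(cosine([3, 4], [3, 4]))  # identical vectors -> 1.0
print(cosine([1, 0], [0, 1]))  # no term overlap -> 0.0
```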
Term Weighting Redux
"Aboutness":
- Term frequency (tf): number of occurrences of t in document j
- Chicken: 6, Fried: 1 vs. Chicken: 1, Fried: 6
- Question: what about 'Representative' vs. 'Giffords'?
"Specificity":
- How surprised are you to see this term? Collection frequency
- Inverse document frequency: idf_i = log(N / n_i)
Tf-idf Similarity
Variants of tf-idf are prevalent in most vector space models.
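A sketch of idf_i = log(N / n_i) on a toy four-document collection (the documents are invented for illustration): the frequent term gets a low idf, the rare one a high idf, answering the 'Representative' vs. 'Giffords' question above.

```python
import math

def idf(term, docs):
    """idf_i = log(N / n_i): N total documents, n_i containing the term."""
    n_i = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / n_i) if n_i else 0.0

def tf_idf(term, doc, docs):
    """tf-idf weight: within-document frequency scaled by specificity."""
    return doc.count(term) * idf(term, docs)

docs = [
    ["representative", "giffords", "recovering"],
    ["representative", "budget", "vote"],
    ["representative", "town", "hall"],
    ["senate", "budget", "debate"],
]
# 'representative' is common (low idf); 'giffords' is rare (high idf).
print(round(idf("representative", docs), 3))  # 0.288
print(round(idf("giffords", docs), 3))        # 1.386
```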
Term Selection
Selection: some terms are truly useless
- Too frequent: appear in most documents
- Little or no semantic content: function words, e.g. the, a, and, ...
- Indexing inefficiency: an inverted index stores, for each term, the documents in which it appears; for 'the', every document is a candidate match
Remove 'stop words' based on a list, usually built from document frequency.
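A minimal inverted index with list-based stop-word removal; the stop-word list and documents are illustrative.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "and", "of"}  # illustrative list

def build_index(docs):
    """Inverted index: for each non-stop term, the set of document
    ids in which it appears. Stop words are never indexed, so they
    cannot make every document a candidate match."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in STOP_WORDS:
                index[term].add(doc_id)
    return index

docs = {1: "the dog bites the man", 2: "the man walks a dog"}
index = build_index(docs)
print(sorted(index["dog"]))  # [1, 2]
print("the" in index)        # False
```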
Term Creation
Too many surface forms for the same concept:
- E.g. inflections of words: verb conjugations, plurals; process, processing, processed are the same concept, separated by inflection
Stem terms: treat all forms as the same underlying term
- E.g. 'processing' -> 'process'; 'Beijing' -> 'Beije'
Issues: stemming can be too aggressive
- AIDS, aids -> aid; stock, stocks, stockings -> stock
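The over-stemming problem can be shown with a deliberately crude suffix-stripper (not the Porter stemmer or any real system's rules):

```python
def naive_stem(word):
    """Crude suffix-stripping stemmer for illustration: strips common
    endings, which conflates distinct words like the slide's examples."""
    for suffix in ("ings", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Desired conflation: inflections of one concept collapse together...
print(naive_stem("processing"), naive_stem("processed"))  # process process
# ...but so do unrelated words (too aggressive):
print(naive_stem("stocks"), naive_stem("stockings"))      # stock stock
print(naive_stem("aids"))                                 # aid
```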
Evaluating IRBasic measures: Precision and Recall
![Page 88: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/88.jpg)
Evaluating IRBasic measures: Precision and RecallRelevance judgments:
For a query, returned document is relevant or non-relevantTypically binary relevance: 0/1
![Page 89: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/89.jpg)
Evaluating IRBasic measures: Precision and RecallRelevance judgments:
For a query, returned document is relevant or non-relevantTypically binary relevance: 0/1
T: returned documents; U: true relevant documentsR: returned relevant documentsN: returned non-relevant documents
![Page 90: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/90.jpg)
Evaluating IR
Basic measures: Precision and Recall
Relevance judgments: for a query, a returned document is relevant or non-relevant (typically binary relevance: 0/1)
T: returned documents; U: true relevant documents; R: returned relevant documents; N: returned non-relevant documents
Precision = |R| / |T|; Recall = |R| / |U|
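With the sets defined above (T returned, U true relevant, R their intersection), precision and recall reduce to a few lines. A minimal sketch with made-up document ids:

```python
def precision_recall(returned, relevant):
    """Set-based precision and recall.
    returned = T, the set of returned document ids
    relevant = U, the set of true relevant document ids
    R = T & U, the returned relevant documents."""
    R = returned & relevant
    precision = len(R) / len(returned) if returned else 0.0
    recall = len(R) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 docs returned, 2 of the 5 relevant docs found
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7", "d8", "d9"})
print(p, r)  # 0.5 0.4
```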
![Page 97: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/97.jpg)
Evaluating IR
Issue: ranked retrieval
Return top 1K documents: 'best' first
10 relevant documents returned:
In first 10 positions? In last 10 positions?
Score by precision and recall – which is better?
Identical! Does this correspond to intuition? No!
Need rank-sensitive measures
![Page 102: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/102.jpg)
Rank-specific P & R
Precision at rank k: fraction of relevant docs in the top k
Recall at rank k: similarly, fraction of all relevant docs found in the top k
Note: recall is non-decreasing; precision varies
Issue: too many numbers; no holistic view
Typically, compute precision at 11 fixed levels of recall
Interpolated precision: can smooth variations in precision
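The rank-specific measures can be sketched directly. The relevance judgments here are a hypothetical 0/1 list in rank order; note how recall never decreases while precision fluctuates:

```python
def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant."""
    return sum(rels[:k]) / k

def recall_at_k(rels, k, total_relevant):
    """Fraction of all relevant documents found in the top k."""
    return sum(rels[:k]) / total_relevant

# Hypothetical ranking: relevant docs at ranks 1, 3, 6; 10 relevant overall
rels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
for k in (1, 3, 6, 10):
    print(k, round(precision_at_k(rels, k), 2), recall_at_k(rels, k, 10))
```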
![Page 103: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/103.jpg)
Interpolated Precision
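One common formulation of interpolated precision, sketched below, takes P_interp(r) as the maximum precision observed at any recall level >= r, evaluated at the 11 standard recall points 0.0, 0.1, ..., 1.0:

```python
def eleven_point_interpolated_precision(rels, total_relevant):
    """Interpolated precision at the 11 standard recall levels.
    P_interp(r) = max precision at any rank with recall >= r;
    this smooths the sawtooth shape of the raw precision curve."""
    points, hits = [], 0
    for rank, rel in enumerate(rels, start=1):
        hits += rel
        points.append((hits / rank, hits / total_relevant))
    levels = [i / 10 for i in range(11)]
    return [max((p for p, r in points if r >= level), default=0.0)
            for level in levels]

# Hypothetical ranking with 2 relevant docs, at ranks 1 and 3
print(eleven_point_interpolated_precision([1, 0, 1], 2))
```

The resulting 11 values are what gets averaged over queries to draw the precision-recall graphs on the next slides.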
![Page 104: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/104.jpg)
Comparing Systems
Create graph of precision vs. recall
Averaged over queries
Compare graphs
![Page 109: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/109.jpg)
Mean Average Precision (MAP)
Traverse ranked document list: compute precision each time a relevant doc is found
Average precision up to some fixed cutoff
Rr: set of relevant documents at or above rank r
Precision(d): precision at the rank where doc d is found
Example: Mean Average Precision = 0.6
Compute the average over all queries of these per-query averages
Precision-oriented measure
Single crisp measure: common for TREC Ad-hoc
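A common formulation of the measure described above (per-query average precision normalized by the number of relevant documents, then averaged over queries) can be sketched as follows; the two queries and their judgments are invented for illustration:

```python
def average_precision(rels, total_relevant):
    """Average the precision values at the ranks where relevant docs
    appear, normalized by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / total_relevant if total_relevant else 0.0

def mean_average_precision(runs):
    """runs: one (rels, total_relevant) pair per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)

# Hypothetical two-query run
print(mean_average_precision([
    ([1, 1, 0, 0], 2),   # AP = (1/1 + 2/2) / 2 = 1.0
    ([0, 1, 0, 1], 2),   # AP = (1/2 + 2/4) / 2 = 0.5
]))  # 0.75
```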
![Page 110: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/110.jpg)
Roadmap
Retrieval systems
Improving document retrieval: compression & expansion techniques
Passage retrieval: contrasting techniques; interactions with document retrieval
![Page 111: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/111.jpg)
Retrieval Systems
Three available systems:
Lucene (Apache): Boolean system with Vector Space ranking; provides basic CLI/API (Java, Python)
Indri/Lemur (UMass/CMU): Language Modeling system (best ad-hoc); 'structured' query language with weighting; provides both CLI and API (C++, Java)
Managing Gigabytes (MG): straightforward VSM
![Page 113: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/113.jpg)
Retrieval System Basics
Main components:
Document indexing:
Reads document text
Performs basic analysis: minimally tokenization, stopping, case folding; potentially stemming, semantics, phrasing, etc.
Builds index representation
Query processing and retrieval:
Analyzes query (similar to document)
Incorporates any additional term weighting, etc.
Retrieves based on query content
Returns ranked document list
![Page 114: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/114.jpg)
Example (I/L)
indri-5.0/buildindex/IndriBuildIndex parameter_file
XML parameter file specifies:
Minimally: index (path to output); corpus (+): path to corpus, corpus type
Optionally: stemmer, field information
indri-5.0/runquery/IndriRunQuery query_parameter_file -count=1000 \
  -index=/path/to/index -trecFormat=true > result_file
Parameter file: formatted queries w/ query #
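A minimal IndriBuildIndex parameter file along the lines described above might look like this. The paths, corpus class, and stemmer choice are placeholders; check the Indri documentation for the corpus classes your data actually needs:

```xml
<parameters>
  <!-- Minimally: where to write the index, and where the corpus lives -->
  <index>/path/to/output/index</index>
  <corpus>
    <path>/path/to/corpus</path>
    <class>trectext</class>  <!-- corpus type -->
  </corpus>
  <!-- Optionally: stemmer, field information -->
  <stemmer><name>krovetz</name></stemmer>
</parameters>
```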
![Page 115: Deep Processing QA & Information Retrieval](https://reader036.vdocuments.site/reader036/viewer/2022062323/5681619f550346895dd159a9/html5/thumbnails/115.jpg)
Lucene
Collection of classes to support IR
Less directly linked to TREC; e.g., query, doc readers
IndexWriter class: builds, extends index; applies analyzers to content
SimpleAnalyzer: stops, case folds, tokenizes; also Stemmer classes, other languages, etc.
Classes to read, search, analyze index
QueryParser: parses query (fields, boosting, regexp)