![Page 1: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/1.jpg)
Challenges in Information Retrieval and Language Modeling
Michael Shepherd
Dalhousie University
Halifax, NS
Canada
![Page 2: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/2.jpg)
Report of a Workshop
The following presentation is based on:
James Allan, et al., “Challenges in Information Retrieval and Language Modeling”. Report of a Workshop held in the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, September 2002.
![Page 3: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/3.jpg)
Long-Term Challenges
• LT Challenge 1 – Global Information Access
  – Satisfy human information needs through natural, efficient interaction with an automated system that leverages world-wide structured and unstructured data in any language
• Need
  – Massively distributed, multi-lingual retrieval systems
  – Techniques from distributed retrieval, data fusion, cross-lingual IR
![Page 4: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/4.jpg)
Long-Term Challenges
• LT Challenge 2 – Contextual Retrieval
  – Combine search technologies and knowledge about query and user context into a single framework in order to provide the most “appropriate” answer for a user’s information needs
• Need
  – Context and query features to infer characteristics of the information need, such as query type, answer type, answer level, task, etc.
![Page 5: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/5.jpg)
User Information Need
Query
User Profile
Task
Activity
![Page 6: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/6.jpg)
Topics
1. Retrieval Models
2. Cross-Lingual Information Retrieval
3. Web Search
4. User Modeling
5. Filtering, Topic Detection & Tracking, and Classification
6. Summarization
7. Question Answering
8. Metasearch and Distributed Retrieval
9. Multimedia Retrieval
10. Information Extraction
11. Testbeds
![Page 8: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/8.jpg)
User Modeling
• Much research over the past number of years has abstracted the user out of the retrieval problem
• But, in recent years, the rate of improvement of IR systems has slowed
• One reason may be that generic IR systems are “good-enough” for everyone but “never great” for anyone
• It is suggested that greater focus on the user will enable major advances in IR
![Page 10: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/10.jpg)
How Do We Get Info About the User?
• a priori
  – Ask the user
• a posteriori
  – Explicit
    • Show the user a document and ask whether it was relevant
  – Implicit
    • Track what the user does
      – Web logs
      – Time spent reading a page
![Page 12: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/12.jpg)
How Do We Model the User?
• IR Technique
  – A vector of terms or features supplied by the user or drawn from documents deemed relevant to the user
  – May be static or adaptive
• Machine Learning Technique
  – An adaptive technique such as a neural net that “learns” the preferences of the user
  – Feature set selection is important
![Page 13: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/13.jpg)
User Model as Filter
[Diagram: Information need, Query representation, Document representation, Matching algorithm, results, with the User Model as Filter]
![Page 14: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/14.jpg)
User Model as Query
[Diagram: Information need, Document representation, Matching algorithm, results, with the User Model as Query]
![Page 15: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/15.jpg)
Integrating the User Model and the Query
Query User Profile
Modified Query
Moving the Query within the Document Space
![Page 16: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/16.jpg)
Integrating the User Model and the Query
Document Space
p
q
q'
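The slides do not give a formula for moving the query q toward the profile p to obtain q'. One common way to do it is a Rocchio-style linear combination, sketched below. The function name `move_query`, the weights `alpha` and `beta`, and the sparse-dictionary vector representation are all assumptions made for illustration, not something taken from the presentation.

```python
def move_query(q, p, alpha=1.0, beta=0.5):
    """Return q' = alpha*q + beta*p over the union of terms.

    q and p are sparse term-weight vectors stored as dicts.
    alpha and beta are illustrative weights (assumptions).
    """
    terms = set(q) | set(p)
    return {t: alpha * q.get(t, 0.0) + beta * p.get(t, 0.0) for t in terms}


# Hypothetical query and user profile
q = {"jaguar": 1.0}
p = {"cars": 0.8, "racing": 0.4}
q_prime = move_query(q, p)
```

With a larger `beta` the modified query sits closer to the profile in the document space; with `beta = 0` it stays at the original query.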
![Page 17: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/17.jpg)
Integrating the User Profile and the Query
Document Space
p q
![Page 19: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/19.jpg)
Short-term/Long-term Interests
• Users’ interests change over time
• Users may have short-term interests, but these should not skew the model away from their long-term interests
• Particular focus is electronic news
![Page 20: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/20.jpg)
Single task/Multiple tasks
• Most user models are built for a specific task, such as filtering news items looking for certain types of news
• Most people multi-task, so we currently run multiple user models, one per task, for the same user
• We would really like to have a single model for multiple tasks
![Page 21: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/21.jpg)
Filtering, Topic Detection & Tracking and Classification
• Some of these technologies have been adopted widely
• These topics are grouped together because they are similar technologies used in similar applications
![Page 22: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/22.jpg)
Routing of email and phone messages for Customer Relationship Management
Message
Message Routing System
Service Department
New Accounts
Customer Complaints
![Page 23: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/23.jpg)
Categorization of Trouble Tickets
Trouble Ticket
Ticket Routing System
Trouble Category 1
Trouble Category 2
Trouble Category 3
![Page 24: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/24.jpg)
Topic Detection
News Item
News Item Routing System
Topic 1
Topic 2
Topic 3
New Topic
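The routing step in this diagram could be sketched as follows: compare each incoming news item with the existing topics and open a new topic when nothing is similar enough. The slides do not name a method, so cosine similarity against topic centroids, the `threshold` value, and the names `cosine` and `route` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def route(item, topics, threshold=0.3):
    """Return the most similar topic name, or "New Topic" if none reaches the threshold."""
    best_name, best_sim = None, 0.0
    for name, centroid in topics.items():
        sim = cosine(item, centroid)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else "New Topic"


# Hypothetical topic centroids and news items
topics = {
    "Topic 1": {"election": 1.0, "vote": 1.0},
    "Topic 2": {"hockey": 1.0, "goal": 1.0},
}
known = route({"election": 1.0, "debate": 1.0}, topics)
unknown = route({"earthquake": 1.0}, topics)
```

Here the first item shares a term with "Topic 1" and is routed there, while the second item overlaps with no existing topic and is flagged as a new one.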
![Page 25: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/25.jpg)
Topic Tracking
Topic
Sub-Topic Sub-Topic
![Page 26: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/26.jpg)
Topic Tracking
[Timeline]
Nov ‘02: WMD in Iraq
Mar ‘03: Invasion of Iraq to locate WMD
Jan ‘04: Cannot find WMD
Sept ‘04: Bush and Kerry debate reasons for invading Iraq
Nov ‘04: Election Day in USA
![Page 27: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/27.jpg)
Summarization
• Text summarization is an active field of research in both IR and Natural Language Processing (NLP)
• NLP is required for high-quality summarization
• IR summarization can provide access to large repositories of data in an efficient way
• IR summarization shares some basic techniques with indexing as both are concerned with identifying what a document is “about”
![Page 28: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/28.jpg)
Summarization
• A summary can consist of:
  – A set of keywords or noun phrases
  – A set of sentences with “important” terms
• A summary can be about:
  – A single document (but not generally)
  – A set of documents
  – A web site
![Page 29: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/29.jpg)
Summarization
• Each document is represented as a vector and tf.idf is used to determine the best terms
• Cluster the documents, create the centroids, and determine the best terms
• Sentences are given weights based on occurrence of terms and the associated tf.idf weights
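The sentence-weighting idea above can be illustrated with a minimal sketch: compute tf.idf weights per document, then score each sentence by summing the weights of the terms it contains. The tokenizer, the idf formula log(N / df), the sentence splitter, and the function names are assumptions chosen to keep the example small. The clustering and centroid step from the second bullet is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(docs):
    """One {term: tf * idf} dict per document, with idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def best_sentence(doc, weights):
    """Return the sentence whose terms carry the largest total tf.idf weight."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return max(sentences, key=lambda s: sum(weights.get(t, 0.0) for t in tokenize(s)))


# Hypothetical two-document collection
docs = [
    "The cat sat. The cat chased a laser pointer.",
    "The dog sat. The dog barked.",
]
weights = tfidf_weights(docs)
summary = best_sentence(docs[0], weights[0])
```

Because scores are plain sums, longer sentences tend to win. A real system would likely normalize by sentence length and filter stop words.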
![Page 30: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/30.jpg)
Metasearch and Distributed Retrieval
• Retrieving and combining information from multiple sources:
  – Data fusion
    • the combination of information from multiple sources that index an effectively common data set
  – Collection fusion or distributed retrieval
    • the combination of information from multiple sources that index effectively disjoint data sets
![Page 31: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/31.jpg)
Issues for Metasearch and DR
• Resource description
• Resource ranking
• Resource selection
• Searching
• Merging of results
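For the last issue, merging of results, one well-known family of data-fusion methods sums normalized scores across sources (often called CombSUM). The slides do not say which merging method is intended, so the sketch below is only one possibility. The min-max normalization, the tie-breaking by document id, and the names `normalize` and `comb_sum` are assumptions, and each source is assumed to return a non-empty score dictionary.

```python
def normalize(scores):
    """Min-max normalize one source's scores to [0, 1] (assumes non-empty input)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def comb_sum(result_lists):
    """Sum normalized scores per document; return ids ranked best first."""
    merged = {}
    for scores in result_lists:
        for doc, s in normalize(scores).items():
            merged[doc] = merged.get(doc, 0.0) + s
    # Sort by descending fused score, breaking ties by document id
    return sorted(merged, key=lambda d: (-merged[d], d))


# Hypothetical results from two search engines with different score scales
engine_a = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
engine_b = {"d2": 0.9, "d4": 0.5, "d1": 0.1}
ranking = comb_sum([engine_a, engine_b])
```

Normalizing first matters because the sources score on different scales. Documents returned by both sources, such as d2 here, are rewarded, which suits data fusion over a common data set better than collection fusion over disjoint ones.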
![Page 32: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/32.jpg)
Major Issue
• Resource description
• Resource ranking
• Resource selection
• Searching
• Merging of results
Semantic Interoperability
![Page 33: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/33.jpg)
Summary
• IR is no longer the domain of the “specialist” – everyone gets to play
• Drowning in information
• Next Generation IR tools must be dramatically better than what we have
• IR field must rethink its basic assumptions and evaluation methodologies because the ones that brought us to the level of success we have today will not be sufficient to reach the next level
![Page 34: Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada](https://reader031.vdocuments.site/reader031/viewer/2022032203/56649e205503460f94b0b560/html5/thumbnails/34.jpg)
Long-Term Challenges
• Global Information Access
• Contextual Retrieval