Text Based Information Retrieval 2010-2011
Text Based Information Retrieval
H02C8A
H02C8B
Marie-Francine Moens
Karl Gyllstrom
Katholieke Universiteit Leuven
Study points: 4 or 6
Language: English
Periodicity: Taught in the second semester
e-mail: [email protected]@cs.kuleuven.be
Ranking: deciding which of the relevant documents are the best
Crawling: Discovering documents on the web
Indexing: storing documents so they can be quickly retrieved when users search
Clustering: finding similar documents so they can be retrieved together or stored on same servers
Retrieval: finding good documents to answer users’ queries
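As a rough illustration of how indexing, retrieval, and ranking fit together, here is a minimal sketch in Python. It is not the system taught in the course: the toy documents, the function names (`tokenize`, `build_index`, `search`), and the scoring rule (summing raw term frequencies) are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy collection; documents and IDs are made up for illustration.
DOCS = {
    1: "information retrieval finds documents",
    2: "text mining and text categorization",
    3: "retrieval of text documents on the web",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Indexing: map each term to the documents containing it, with term counts."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Retrieval and ranking: find documents sharing a term with the query,
    then order them by summed term frequency (ties broken by document ID)."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
print(search(INDEX, "text retrieval"))
```

Summing raw term frequencies is only a placeholder ranking function chosen for brevity; any other scoring rule could be swapped into `search` without changing the index structure.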
E.g., text categorization, information extraction, text clustering, summarization, cross-language and cross-media retrieval, ...
Aims of the course
• Acquire the fundamental techniques for text based information retrieval and text mining
• Learn to design, partially implement, and evaluate a text based information retrieval system
• Acquire insights into current research questions
• Illustrate with commercial applications
• 1 lesson: guest speaker from an international company (e.g., Microsoft, Yahoo)
Prerequisites
• Basic knowledge of:
  – Probability theory and statistics
  – Information theory
  – Linear algebra
  – (Machine learning)
Course material
• Course slides and exercise questions/solutions can be downloaded from the Toledo platform
  – http://toledo.kuleuven.be
  – Background literature
Evaluation
• An assignment (grading: 33.3%): At the start of the course (week 7) the student chooses an assignment (paper or programming exercise) that addresses a specific problem in information retrieval. The assignment is due during week 21. A score of 50% or more on this assignment is transferred to the second exam session. Students taking the course for 6 study points do a large programming assignment; the paper option is available only to students taking it for 4 study points.
• Theory exam (grading: 33.3%): Oral with written preparation, closed book.
• Exercise exam (grading: 33.3%): Written, open book.