Text Based Information Retrieval 2010-2011

Text Based Information Retrieval

H02C8A, H02C8B

Marie-Francine Moens

Karl Gyllstrom

Katholieke Universiteit Leuven

Study points: 4 or 6

Language: English

Periodicity: Taught in the second semester

e-mail: [email protected]@cs.kuleuven.be

2010-2011

Information retrieval

• Crawling: discovering documents on the web

• Indexing: storing documents so they can be quickly retrieved when users search

• Clustering: finding similar documents so they can be retrieved together or stored on the same servers

• Retrieval: finding good documents to answer users’ queries

• Ranking: deciding which of the relevant documents are the best
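
To make the indexing, retrieval and ranking components above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the course's method: the toy collection `DOCS`, the helper names `tokenize`, `build_index` and `search`, and the choice of a plain TF-IDF sum as the ranking score are all hypothetical and do not come from the slides.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection, invented for illustration only.
DOCS = {
    "d1": "information retrieval finds relevant documents",
    "d2": "web crawling discovers documents on the web",
    "d3": "text clustering groups similar documents",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Retrieval and ranking: collect every document that contains a
    query term and order the results by summed TF-IDF score (assumed
    scoring function), breaking ties by document id."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


INDEX = build_index(DOCS)
RESULTS = search(INDEX, len(DOCS), "web documents")
```

In this sketch a term that occurs in every document (here "documents") gets an inverse document frequency of zero, so it retrieves documents without helping to rank them; the rarer term "web" decides the order.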

E.g., text categorization, information extraction, text clustering, summarization, cross-language and cross-media retrieval, ...

Aims of the course

• Acquire the fundamental techniques for text based information retrieval and text mining

• Learn to design, partially implement, and evaluate a text based information retrieval system

• Acquire insights into current research questions

• Illustrate with commercial applications

• 1 lesson: speaker from an international company (e.g., Microsoft, Yahoo)

Prerequisites

• Basic knowledge of:

– Probability theory and statistics

– Information theory

– Linear algebra

– (Machine learning)

Course material

• Course slides and exercise questions/solutions can be downloaded from the Toledo platform

– http://toledo.kuleuven.be

• Background literature

Evaluation

• An assignment (grading: 33.3%): At the start of the course (week 7), each student chooses an assignment (a paper or a programming exercise) on a specific problem in information retrieval. The assignment is due in week 21. A score of 50% or more on the assignment carries over to the second exam session. Students taking the course for 6 study points do a large programming assignment; the paper option is available only for 4 study points.

• Theory exam (grading: 33.3%): Oral with written preparation, closed book.

• Exercise exam (grading: 33.3%): Written, open book.