IST 441 Example Projects

Upload: shawn-heath

Post on 12-Jan-2016



Undergrad Project

• Find a customer (for example, someone interested in an Xbox game forum).
• Build a search engine for Xbox game forums, etc.
• Compare two approaches: Google CSE and LucidWorks.
• Steps:
  • Crawl websites (at most 5).
  • Determine the crawl depth and how to include/exclude certain pages and file types.
  • Extract information and build the index.
  • Experiment with different rankings (see the "relevancy workbench" app in your LucidWorks installation).
    • http://ist441.ist.psu.edu:8988/relevancy/experiment
  • Perform searches and compare the precision@K values.
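The last step, comparing precision@K, can be sketched as follows. This is a minimal illustration rather than part of the course material: the function name `precision_at_k`, the result lists, and the relevance judgments are all hypothetical. It assumes the common convention of dividing by K even when an engine returns fewer than K results.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked results that were judged relevant.

    Divides by k (not by the number of results returned), so an engine
    that returns fewer than k results is penalized. This is one common
    convention, assumed here.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


# Hypothetical data: documents judged relevant for one query, and the
# ranked results returned by two engines for that same query.
relevant_docs = {"a", "b", "c"}
engine_a = ["a", "x", "b", "y", "c"]
engine_b = ["x", "a", "y", "z", "b"]

for k in (3, 5):
    p_a = precision_at_k(engine_a, relevant_docs, k)
    p_b = precision_at_k(engine_b, relevant_docs, k)
    print(f"P@{k}: engine A {p_a:.2f}, engine B {p_b:.2f}")
# P@3: engine A 0.67, engine B 0.33
# P@5: engine A 0.60, engine B 0.40
```

For a fair comparison, the same queries and the same relevance judgments should be used for both systems, and the values averaged over several queries.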


Graduate Project

• Crawling academic institution webpages in Qatar (it's a small domain).
• Integrating a more powerful crawler such as Nutch/Heritrix with the LucidWorks system.
• Focused crawling, i.e. crawling for specific types of pages such as researchers' home pages.
• Modifying the parser to extract specific information such as email addresses and phone numbers in a web page.
• Modifying the Solr schema and/or ranking functions.
• Comparing search results with Google CSE.
• Discuss with the instructor for more information.
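The extraction item above (email addresses and phone numbers) can be prototyped outside the parser before wiring it into the indexing pipeline. The sketch below uses only Python's standard `re` module. The patterns, the `extract_contacts` helper, and the sample text are illustrative assumptions, not the LucidWorks or Solr parser API. The patterns are deliberately loose, so the phone pattern in particular will also match other long digit runs.

```python
import re

# Loose, illustrative patterns; real pages need tuning and validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional leading "+", then digits mixed with spaces, dots, dashes, or
# parentheses; at least 8 digit/separator characters in total.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_contacts(text: str) -> dict:
    """Return the de-duplicated, sorted emails and phone-like strings in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted({m.group().strip() for m in PHONE_RE.finditer(text)})
    return {"emails": emails, "phones": phones}


# Made-up page text with a made-up address and number.
sample = "Email: jane.doe@example.edu.qa, phone: +974 4400 0000"
contacts = extract_contacts(sample)
print(contacts["emails"])
print(contacts["phones"])
```

The extracted values could then be stored in dedicated fields added to the index schema, so they can be searched or displayed separately from the page body.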