Vertical Search for Courses of UIUC by Jessica Bell, Alexander Loeb, Sharon Paradesi, Michael Paul, Jing Xia, Jie Zhang

Post on 19-Jan-2016

Page 1

Vertical Search for Courses of UIUC

by Jessica Bell, Alexander Loeb, Sharon Paradesi, Michael Paul, Jing Xia, Jie Zhang

Page 2

Demo

http://greedy.cs.uiuc.edu/dssi/course/search.php

Page 3

Goals of the project

- Construct a database of UIUC courses across all departments, ultimately creating a centralized knowledge base about each course.

- Augment the database by drawing relations between courses, both within and between departments, and further by finding similar courses outside the University of Illinois.

Page 4

Architecture

[Diagram; only its labels survive]

- Data sources: Course Catalog, Book Store, Webpages, Other Universities
- Tools: PHP script, JAVA script, AgentIDE, Heritrix, WEKA
- Database: Basic Course Info, Book Info, Course homepage, Keywords, Related Courses
- Query by: Course Name, Instructor, Description (PHP)

Page 5

Tools used

- Web Crawling: Wget, AgentIDE and Heritrix
- Parsers: Python and Java
- Learning Tools: WEKA
- Website Design: PHP and MySQL

Page 6

Tasks finished

Data Mining
- Basic course information
- Similar course recommendation
- Prerequisite course list
- Recommended book information

Learning
- Clustering
- Classification

Page 7

Keywords

- Pull from course descriptions
- Remove uninformative/common words
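The two keyword steps can be illustrated with a minimal Python sketch. This is not the project's parser: the stop-word list and the function name `extract_keywords` are hypothetical, since the slides do not say which words were removed or how.

```python
import re

# Hypothetical stop-word list; the slides do not say which words were removed.
STOP_WORDS = {
    "a", "an", "and", "are", "be", "in", "is", "of",
    "or", "that", "the", "their", "to",
}


def extract_keywords(description: str) -> list[str]:
    """Return the distinct non-stop-words of a description, in first-seen order."""
    words = re.findall(r"[a-z]+", description.lower())
    seen = set()
    keywords = []
    for word in words:
        if word in STOP_WORDS or word in seen:
            continue
        seen.add(word)
        keywords.append(word)
    return keywords


print(extract_keywords("Introduction to the analysis of algorithms"))
# ['introduction', 'analysis', 'algorithms']
```

A fixed stop-word list is only one option; the informativeness scores shown on the next slide suggest the project may have filtered by score instead.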

Page 8

Keywords (contd.)

Word         Score     Word            Score
topics       0.1328    fruits          0.6453
their        0.1352    horticultural   0.6453
problems     0.1370    agricultural    0.6454
basic        0.1373    russian         0.6478
techniques   0.1439    doctorate       0.6489
students     0.1457    speaker         0.6489
is           0.1494    meteorological  0.6492
are          0.1505    anthropology    0.6493
analysis     0.1531    institute       0.6498
special      0.1531    reflective      0.6498
areas        0.1556    later           0.6508
graduate     0.1563    weather         0.6513
research     0.1586    protein         0.6514
be           0.1586    mobilization    0.6514
various      0.1589    authentic       0.6514
methods      0.1600    romance         0.6514
selected     0.1618    libraries       0.6561
current      0.1625    became          0.6563
advanced     0.1651    novelists       0.6563
that         0.1651    colonization    0.6563
concepts     0.1668    initiatives     0.6563
both         0.1731    revisit         0.6563
development  0.1744    churches        0.6563

Page 9

Search

- Search by name, instructor, or content
- Clean up search string
  - “cs125” becomes “CS 125”
  - “real-time” becomes “real time realtime”
- Split search string into individual words and query database for word matches
- Score and rank results by match frequencies and keyword informativeness scores
- Look at distribution of scores and display the top results
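The search steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's PHP implementation: the regular expressions, the function names `clean_query` and `rank_courses`, the in-memory `courses` dictionary standing in for the database, the `default_weight` for unscored words, and the scoring rule (match count times informativeness, summed over query words) are all hypothetical. The final step, choosing a cutoff from the score distribution, is not covered.

```python
import re


def clean_query(query: str) -> str:
    """Normalize a search string as in the two slide examples."""
    # "cs125" -> "CS 125": split a short letter prefix from a 3-digit number.
    query = re.sub(
        r"\b([A-Za-z]{2,4})\s*(\d{3})\b",
        lambda m: f"{m.group(1).upper()} {m.group(2)}",
        query,
    )

    # "real-time" -> "real time realtime": keep the parts and the joined form.
    def expand(match: re.Match) -> str:
        parts = match.group(0).split("-")
        return " ".join(parts) + " " + "".join(parts)

    return re.sub(r"\b\w+(?:-\w+)+\b", expand, query)


def rank_courses(query, courses, informativeness, default_weight=0.5):
    """Rank courses by summed (match count * informativeness) over query words.

    courses: course id -> searchable text (stand-in for the database).
    informativeness: word -> score, as in the keyword table.
    default_weight: assumed weight for words without a score.
    """
    terms = re.findall(r"\w+", clean_query(query).lower())
    scored = []
    for course_id, text in courses.items():
        words = re.findall(r"\w+", text.lower())
        score = sum(
            words.count(term) * informativeness.get(term, default_weight)
            for term in terms
        )
        if score > 0:
            scored.append((course_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(clean_query("cs125"))      # CS 125
print(clean_query("real-time"))  # real time realtime
```

Weighting by informativeness means a match on a rare word such as "fruits" (0.6453) counts for more than a match on a common word such as "topics" (0.1328).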

Page 10

Classification

NBTree Classifier

- Training set: 34 instances
- Test set: 38 instances
- Attributes: 17

- Accuracy: 94.74%
- Precision: 0.947
- Recall: 0.947
- F-Measure: 0.947
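As a quick consistency check on these figures, the short Python sketch below recomputes them. The count of 36 correct test instances is inferred from the reported percentage, not stated on the slide, and the F-measure is assumed to be the usual harmonic mean of precision and recall; a tool that averages per-class values may arrive at its figure differently.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 formula)."""
    return 2 * precision * recall / (precision + recall)


# Assumption: 36 of the 38 test instances were classified correctly.
accuracy = 36 / 38
print(f"{accuracy:.2%}")                  # 94.74%
print(round(f_measure(0.947, 0.947), 3))  # 0.947
```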

Page 11

Clustering

Cobweb Clustering Algorithm

- Instances: 20
- Attributes: 112
- Number of clusters: 17
- Incorrectly clustered instances: 7 (i.e. 35%)