Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007


Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007. Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones, William Cameron, GuoFang Teng, and Lillian (“Boots”) Cassel.


Page 1: Automatic Syllabus Classification JCDL – Vancouver – 22 June 2007

Automatic Syllabus Classification

JCDL – Vancouver – 22 June 2007

Edward A. Fox (presenting co-author), Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Perez-Quinones, William Cameron, GuoFang Teng, and Lillian (“Boots”) Cassel

Page 2

Why Study the Syllabus Genre?

►Educational resource
►Importance to the educational community
    Educators
    Students
    Self-learners
►Thanks to NSF DUE grant 5328255 (personalization support for NSDL)

Page 3

Where to look for a specific syllabus?

►Non-standard publishing mechanisms:
    Instructor’s website
    CMSs (courseware management systems, e.g., Sakai)
    Catalogs
►Limited access outside the university
►Search on the Web
    Many non-relevant links in search results

Page 4

Syllabus Library

►Bootstrapping
    Identify true syllabi from search results
    Store in a repository
    Develop tools & applications
►Scaling up
    Encourage contributions from educational communities

Page 5

An Essential Step towards Syllabus Library: Classification

►Classification Objects:
    Potential syllabi in Computer Science: search on the Web, using syllabus keywords, only in the educational domains
►Class Definition
►Feature Selection
►Model Selection
►Training and Testing

Page 6

Four Classes

Class distribution on 1020 documents manually tagged:

    Full     49%
    Partial  20%
    Noise    18%
    Entry    13%

Page 7

Full Syllabus

Page 8

Partial Syllabus

Page 9

Entry Page

Page 10

Noise

Page 11

Syllabus Components

►course code
►title
►class time & location
►offering institution
►teaching staff
►course description
►objectives
►web site
►prerequisite
►textbook
►grading policy
►schedule
►assignment
►exam and resources

Page 12

Features

►84 Genre-specific Features
    the occurrences of keywords
    the positions of keywords, and
    the co-occurrences of keywords and links
►A series of keywords for each syllabus component
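
The three feature types listed above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the keyword lists, the feature names, the line-level notion of "co-occurrence with a link", and the function `extract_features` are hypothetical and are not the 84 features used in the study.

```python
import re

# Hypothetical keyword lists, one per syllabus component (the slides do not list the real ones).
COMPONENT_KEYWORDS = {
    "prerequisite": ["prerequisite"],
    "textbook": ["textbook", "required text"],
    "grading": ["grading", "grade"],
}

# Rough test for an HTML anchor tag; an assumption, not a full HTML parser.
LINK_PATTERN = re.compile(r"<a\s[^>]*href=", re.IGNORECASE)


def extract_features(html):
    """Return occurrence, position, and link co-occurrence features for one page."""
    text = html.lower()
    length = max(len(text), 1)
    lines = html.splitlines()
    features = {}
    for component, keywords in COMPONENT_KEYWORDS.items():
        # Character offsets of every keyword match for this component.
        hits = [m.start() for kw in keywords
                for m in re.finditer(re.escape(kw), text)]
        # Occurrence: how often the component's keywords appear.
        features[component + "_count"] = len(hits)
        # Position: relative offset of the first match (1.0 means "absent").
        features[component + "_first_pos"] = min(hits) / length if hits else 1.0
        # Co-occurrence: does a keyword share a line with a link?
        near_link = any(
            LINK_PATTERN.search(line) and any(kw in line.lower() for kw in keywords)
            for line in lines
        )
        features[component + "_near_link"] = 1 if near_link else 0
    return features


page = ("<h1>CS 101</h1>\n"
        "<p>Textbook: <a href='b.html'>Intro to CS</a></p>\n"
        "<p>Grading: 50% exams</p>")
print(extract_features(page))
```

A real feature set would presumably cover every component on the previous slide; the sketch only shows how one keyword list can yield one feature of each type.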

Page 13

Classification Models

►Discriminative Models
    Support Vector Machines (SVM)
    SMO-L: Sequential Minimal Optimization, accelerating the training process of SVM
    SMO-P: SMO with a polynomial kernel
►Generative Models
    Naïve Bayes (NB)
    NB-K: Applying kernel methods to estimate the distribution of numeric attributes in NB modeling
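
To make the NB versus NB-K distinction concrete, here is a small Python sketch of the two ways a class-conditional density for one numeric attribute might be estimated. It is an illustration under assumptions, not the implementation used in the study: the fixed `bandwidth` parameter and the function names are hypothetical, and real toolkits choose the kernel width differently.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def single_gaussian_density(x, samples):
    """Plain NB: fit one Gaussian to all training values of the attribute."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    std = max(math.sqrt(var), 1e-6)  # guard against zero variance
    return gaussian_pdf(x, mean, std)


def kernel_density(x, samples, bandwidth):
    """Kernel estimate: average of Gaussians centred on each training value."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)


# A bimodal attribute: values cluster near 0 and near 10.
values = [0.0, 1.0, 10.0, 11.0]
print(single_gaussian_density(5.5, values))  # one Gaussian peaks in the empty middle
print(kernel_density(5.5, values, 1.0))      # kernel estimate is close to zero there
```

The kernel estimate follows multi-modal data more closely, at the cost of keeping every training value; whether that helps depends on the features, as the results slide later notes for this task.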

Page 14

Evaluation

►Training corpus: 1020 out of the 8000+ potential syllabi
►All in HTML, PDF, PostScript, or Text
►Manual tagging on the training corpus
    Unanimous agreement by three co-authors
►Evaluation strategy: ten-fold cross validation
►Metrics: F1 (an overall measure of classification performance)
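
The evaluation strategy above can be sketched in a few lines of Python. This is a generic illustration of ten-fold cross validation and per-class F1 (the harmonic mean of precision and recall), not the authors' evaluation code; the function names, the shuffling seed, and the toy labels are assumptions.

```python
import random


def ten_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation over n items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def f1_score(gold, predicted, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# With 1020 tagged documents, each fold tests on 102 and trains on 918.
for train, test in ten_fold_indices(1020):
    assert len(train) == 918 and len(test) == 102

gold = ["full", "full", "partial", "noise"]
pred = ["full", "partial", "partial", "noise"]
print(f1_score(gold, pred, "full"))
```

Each document is used for testing exactly once, so the per-class F1 values can be averaged over the ten folds.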

Page 15

Results w. random set

Best items are in purple boxes.

Acctr: Classification accuracy on the training set.

Page 16

Results (Cont’d)

►SVM outperforms NB regarding our syllabus classification on average.
►All classifiers fail in identifying the partial syllabus class.
►The kernel settings for NB are not helpful in the syllabus classification task.
►Classification accuracy on training data is not that good.

Page 17

Future Work

►Feature selection
    Add general feature selection methods on text classification, e.g., Document Frequency, Information Gain, and Mutual Information
    Hybrid: combine our genre-specific features with the general features
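
Two of the general feature selection measures named above can be sketched as follows. This is a textbook-style illustration in Python, assuming each document is a (set of terms, class label) pair; the function names and the toy corpus are hypothetical and do not come from the study.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def document_frequency(term, docs):
    """Number of documents whose term set contains the term."""
    return sum(1 for terms, _ in docs if term in terms)


def information_gain(term, docs):
    """H(class) minus H(class | term present or absent)."""
    labels = Counter(label for _, label in docs)
    present = Counter(label for terms, label in docs if term in terms)
    absent = Counter(label for terms, label in docs if term not in terms)
    conditional = 0.0
    for part in (present, absent):
        size = sum(part.values())
        if size:
            conditional += (size / len(docs)) * entropy(part.values())
    return entropy(labels.values()) - conditional


# Toy corpus: "grading" separates the classes perfectly, "exam" not at all.
docs = [
    ({"grading", "exam"}, "full"),
    ({"grading"}, "full"),
    ({"news"}, "noise"),
    ({"exam", "news"}, "noise"),
]
print(document_frequency("exam", docs))
print(information_gain("grading", docs))
print(information_gain("exam", docs))
```

Terms would then be ranked by such a score and the top ones kept, possibly alongside the genre-specific features in the hybrid setting mentioned above.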

Page 18

Future Work (Cont’d)

►Syllabus Library
    Welcome to http://doc.cs.vt.edu
    Share your favorite course resources – not limited to the syllabus genre.
►Information Extraction
    Semantic search
►Personalization

Page 19

Summary

►Towards a syllabus library
    Starting from search results on the web
    Classification of the search results for true syllabi
►SVM is a better choice for our syllabus classification task.
►Towards an educational on-line community around the syllabus library

Page 20

Q & A