Project 1: Machine Learning Using Neural Networks
Ver 1.1
(C) 2006, SNU Biointelligence Laboratory
Outline
- Classification using ANN
  - Learn and classify text documents
  - Estimate several statistics on the dataset
Network Structure
[Figure: a network whose input units (…) feed three output units: Class 1, Class 2, Class 3]
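The slide shows only the layer layout. A minimal forward-pass sketch, assuming one hidden layer of sigmoid units, 100 inputs (one per selected term), and 3 output units (one per class); the hidden-layer size of 10 and the random initial weights are illustrative choices, not values from the project:

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    # One weight row per unit in this layer; the last entry of each row is a bias.
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_forward(weights, inputs):
    # zip stops at len(inputs), so row[-1] (the bias) is added separately.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in weights]

def forward(hidden_w, output_w, x):
    hidden = layer_forward(hidden_w, x)
    return layer_forward(output_w, hidden)

rng = random.Random(0)
n_inputs, n_hidden, n_classes = 100, 10, 3  # n_hidden is a hypothetical setting
hidden_w = init_layer(n_inputs, n_hidden, rng)
output_w = init_layer(n_hidden, n_classes, rng)
x = [rng.random() for _ in range(n_inputs)]  # stand-in for one document vector
outputs = forward(hidden_w, output_w, x)
print(outputs)  # three activations, one per class
```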
CLASSIC3 Dataset
CLASSIC3
Three categories, 3,891 documents in total:
- CISI: 1,460 document abstracts on information retrieval from the Institute of Scientific Information.
- CRAN: 1,398 document abstracts on aeronautics from the Cranfield Institute of Technology.
- MED: 1,033 biomedical abstracts from MEDLINE.
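With three output units, each document's class label can be turned into a 3-element target vector. A small sketch, assuming a one-hot target encoding and an output-unit order of CISI, CRAN, MED (both are illustrative choices); it also checks that the per-category counts add up to the stated total:

```python
CATEGORIES = ["CISI", "CRAN", "MED"]  # assumed order of the output units
COUNTS = {"CISI": 1460, "CRAN": 1398, "MED": 1033}

def one_hot(label):
    # Target vector: 1.0 for the document's own class, 0.0 elsewhere.
    if label not in CATEGORIES:
        raise ValueError(f"unknown category: {label}")
    return [1.0 if c == label else 0.0 for c in CATEGORIES]

total = sum(COUNTS.values())
print(total)            # 3891
print(one_hot("CRAN"))  # [0.0, 1.0, 0.0]
```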
Text Representation in Vector Space
[Figure: a document collection (d1, d2, d3, …, dn) is turned into term vectors by stemming, stop-word elimination, and feature selection. Each document becomes a vector of term counts over terms such as baseball, specs, graphics, hockey, unix, and space (Bag-of-Words / VSM representation); together the vectors form the term-document matrix, which is the dataset format.]
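The bag-of-words pipeline above can be sketched in a few lines. This minimal version assumes a tiny illustrative stop-word list and omits stemming (a real pipeline would typically add a stemmer such as Porter's); rows are documents and columns are terms, matching the examples × terms layout used for the dataset:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "in", "on", "is"}  # tiny illustrative list

def tokenize(text):
    # Lowercase, keep alphabetic tokens, drop stop words (no stemming here).
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(docs):
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # all distinct terms, alphabetically
    matrix = [[c[term] for term in vocab] for c in counts]  # one row per document
    return vocab, matrix

docs = [
    "The hockey game on the space station",
    "Unix graphics and graphics specs",
    "Baseball and hockey",
]
vocab, matrix = term_document_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```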
Dimensionality Reduction
[Figure: term (or feature) vectors → scoring measure (on individual features) → individual feature scores → sort by score → choose terms with higher values → ML algorithm]
Term Weighting
- TF or TF × IDF weights are applied to the documents in vector space.
- TF: term frequency; IDF: inverse document frequency.

$$\mathrm{IDF}(w_i) = \log\left(\frac{N}{n_i}\right)$$

where N is the number of documents and n_i is the number of documents that contain the i-th word.
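The IDF formula can be applied to a count matrix as follows. A minimal sketch, assuming a documents × terms count matrix, the natural logarithm (the slide does not fix the base), and an IDF of 0 for a term that appears in no document:

```python
import math

def idf(matrix):
    # IDF(w_i) = log(N / n_i), with n_i = number of documents containing term i.
    n_docs = len(matrix)
    weights = []
    for i in range(len(matrix[0])):
        n_i = sum(1 for row in matrix if row[i] > 0)
        weights.append(math.log(n_docs / n_i) if n_i else 0.0)
    return weights

def tf_idf(matrix):
    # Multiply each raw term frequency by that term's IDF.
    w = idf(matrix)
    return [[tf * w_i for tf, w_i in zip(row, w)] for row in matrix]

counts = [[2, 0, 1],
          [0, 1, 1],
          [1, 0, 1]]
weighted = tf_idf(counts)
print(weighted)  # the last term occurs in every document, so its weight is 0
```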
Construction of Document Vectors
- Controlled vocabulary
  - Stop words are removed.
  - Stemming is used.
  - Words whose document frequency is less than 5 are removed.
- Term size: 3,850
  - A document is represented as a 3,850-dimensional vector whose elements are word frequencies.
- Words are sorted by their information gain.
  - The top 100 terms are selected, giving a 3,830 (examples) × 100 (terms) matrix.
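The selection steps can be sketched as follows. This assumes information gain is computed on term presence versus absence (count > 0), which is one common definition; the slide does not say which variant was used. The helper `select_terms` (a hypothetical name) applies the document-frequency cut-off and then keeps the top-k terms by information gain:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (bits) of the class distribution.
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(matrix, labels, j):
    # Split documents on whether term j occurs (count > 0).
    present = [y for row, y in zip(matrix, labels) if row[j] > 0]
    absent = [y for row, y in zip(matrix, labels) if row[j] == 0]
    n = len(labels)
    cond = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - cond

def select_terms(matrix, labels, min_df=5, k=100):
    n_terms = len(matrix[0])
    # Keep only terms that occur in at least min_df documents.
    kept = [j for j in range(n_terms) if sum(1 for row in matrix if row[j] > 0) >= min_df]
    ranked = sorted(kept, key=lambda j: information_gain(matrix, labels, j), reverse=True)
    return ranked[:k]  # column indices of the selected terms

matrix = [[1, 1, 1], [2, 1, 0], [0, 3, 1], [0, 1, 0]]
labels = ["A", "A", "B", "B"]
print(select_terms(matrix, labels, min_df=2, k=1))  # term 0 separates the classes
```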
Experimental Results
Data Setting for the Experiments
- By default, a training set and a test set are provided:
  - Training: 2,683 examples
  - Test: 1,147 examples
- N-fold cross-validation (optional)
  - The dataset is divided into N subsets, and the holdout method is repeated N times.
  - Each time, one of the N subsets is used as the test set and the other N-1 subsets are put together to form a training set.
  - The average performance across all N trials is computed.
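The N-fold procedure can be sketched as an index splitter plus an averaging loop. `evaluate` is a hypothetical callback that trains on the training indices and returns a score on the test indices; shuffling with a fixed seed is an assumption, not part of the project description:

```python
import random

def k_fold_indices(n_examples, n_folds, seed=0):
    # Shuffle once, then deal indices round-robin into n_folds subsets.
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        # The other N-1 subsets are put together to form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_examples, n_folds, evaluate):
    # Average the score over all N holdout runs.
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_examples, n_folds)]
    return sum(scores) / len(scores)

splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(len(train), len(test))  # 8 2 for each fold
```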
Number of Epochs
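The slide gives only the heading. To make the parameter concrete: one epoch is one full pass over the training set. A minimal sketch, assuming online backpropagation with sigmoid units, squared error, and one hidden layer; it records the sum of squared errors accumulated during each epoch so that error can be plotted against the number of epochs. The toy data set and all default parameter values are hypothetical:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_hidden=4, epochs=200, lr=0.5, seed=0):
    """Online backprop for a 1-hidden-layer sigmoid net; returns SSE per epoch."""
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # Each weight row ends with a bias weight (matched by a constant 1.0 input).
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    history = []
    for _ in range(epochs):
        sse = 0.0
        for x, t in data:
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
            sse += sum((tk - yk) ** 2 for tk, yk in zip(t, y))
            # Output deltas for squared error with sigmoid units.
            d_o = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
            # Hidden deltas, computed before the output weights change.
            d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * w_o[k][j] for k in range(n_out))
                   for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w_o[k][j] += lr * d_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_h[j][i] += lr * d_h[j] * xb[i]
        history.append(sse)
    return history

# Hypothetical toy problem: the class is determined by the first input.
data = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [1.0, 0.0]),
        ([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [0.0, 1.0])]
history = train(data)
print(history[0], history[-1])  # error in the first and the last epoch
```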
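Number of Hidden Units
- Minimum 10 runs for each setting

| # Hidden Units | Train Average | Train SD | Train Best | Train Worst | Test Average | Test SD | Test Best | Test Worst |
|---|---|---|---|---|---|---|---|---|
| Setting 1 | | | | | | | | |
| Setting 2 | | | | | | | | |
| Setting 3 | | | | | | | | |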
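The per-setting statistics in the table can be computed from the accuracies of the repeated runs. A minimal sketch, assuming accuracy as the performance measure and the sample standard deviation for SD (the slide does not specify either):

```python
import statistics

def summarize(accuracies):
    """Average, SD, best and worst over repeated runs of one setting."""
    return {
        "average": statistics.mean(accuracies),
        "sd": statistics.stdev(accuracies),  # sample SD; needs at least 2 runs
        "best": max(accuracies),
        "worst": min(accuracies),
    }

runs = [0.80, 0.85, 0.90]  # dummy values for illustration, not results from this project
print(summarize(runs))
```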
Other Methods/Parameters
- Normalization method for input vectors
- Class decision policy
- Learning rates
- …
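Two of these choices can be sketched as follows, assuming L2 (unit-length) normalization of the input vectors and a winner-take-all class decision that picks the output unit with the largest activation; both are illustrative options, not the project's prescribed settings:

```python
import math

def l2_normalize(vec):
    # Scale the vector to unit Euclidean length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def decide_class(outputs, class_names):
    # Winner-take-all: choose the class whose output unit is most active.
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return class_names[best]

print(l2_normalize([3.0, 4.0]))                                  # [0.6, 0.8]
print(decide_class([0.1, 0.7, 0.2], ["CISI", "CRAN", "MED"]))    # CRAN
```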
ANN Sources
- Source code
  - Free software
  - Weka
  - NN libraries (C, C++, Java, …)
  - MATLAB toolbox
- Web sites
  - http://www.cs.waikato.ac.nz/~ml/weka/
  - http://www.faqs.org/faqs/ai-faq/neural-nets/part5/
Submission
- Due date: April 18 (Tue)
- Submit both a hardcopy and an email version, including:
  - Software used and running environment
  - Experimental results with various parameter settings
  - Analysis and explanation of the results in your own way
- FYI, achieving the best performance is not important.