Reporter: Jin-huei Dai

Weka: Practical Machine Learning Tools and Techniques with Java Implementations
Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, and Sally Jo Cunningham.
Proceedings of the ICONIP/ANZIIS/ANNES'99 Workshop on Emerging Knowledge Engineering and Connectionist-Based Information Systems, pages 192-196, 1999. Dunedin, New Zealand.


TRANSCRIPT

Page 1: Title slide (title, authors, and venue as above).

Page 2

OUTLINE

1. Introduction

2. The command-line interface

3. The Explorer

4. The Knowledge Flow interface

5. The Experimenter

6. Conclusions

7. References

Page 3

1. Introduction

Data mining is an experimental science. Machine learning provides the technical basis of data mining.

The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that users can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning.

Weka was developed at the University of Waikato in New Zealand, and the name stands for Waikato Environment for Knowledge Analysis.

Page 4

1. Introduction (cont.)

Weka is freely available on the World-Wide Web and accompanies a new text on data mining which documents and fully explains all the algorithms it contains. Applications written using the Weka class libraries can be run on any computer with a Web browsing capability; this allows users to apply machine learning techniques to their own data regardless of computer platform. The Weka software is written entirely in Java to facilitate the availability of data mining tools regardless of computer platform.

The primary learning methods in Weka are “classifiers”, and they induce a rule set or decision tree that models the data. Weka also includes algorithms for learning association rules and clustering data.

Pages 5-8

Page 9

2. The command-line interface

Page 10

Page 11

3. The Explorer (p. 375)

=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     weather
Instances:    14
Attributes:   5
              outlook temperature humidity windy play
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===
J48 pruned tree
------------------
outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves  : 5
Size of the tree  : 8
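Read as rules, the pruned tree above is just a small set of nested tests. A sketch in Python (the function name and value encoding are mine, not Weka output):

```python
def predict_play(outlook, humidity, windy):
    """Mirror the J48 pruned tree learned on the weather data."""
    if outlook == "sunny":
        # the tree splits the numeric humidity attribute at 75
        return "yes" if humidity <= 75 else "no"
    if outlook == "overcast":
        return "yes"
    # outlook == "rainy": the decision depends only on windy
    return "no" if windy else "yes"

print(predict_play("sunny", 70, False))   # -> yes (humidity <= 75 branch)
```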

Page 12

Time taken to build model: 0.08 seconds

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0.186
Mean absolute error                      0.2857
Root mean squared error                  0.4818
Relative absolute error                 60      %
Root relative squared error             97.6586 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
 0.778     0.6       0.7         0.778    0.737      yes
 0.4       0.222     0.5         0.4      0.444      no

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 3 2 | b = no

Pages 13-14

Page 15

Worked metrics for the confusion matrix on page 12:

TP Rate   = TP / (TP + FN):   yes: 7 / (7 + 2) = 0.778,   no: 2 / (2 + 3) = 0.4
FP Rate   = FP / (FP + TN):   yes: 3 / (3 + 2) = 0.6,     no: 2 / (2 + 7) = 0.222
Precision = TP / (TP + FP):   yes: 7 / (7 + 3) = 0.7,     no: 2 / (2 + 2) = 0.5
F-measure = 2PR / (P + R):    yes: 2(0.7)(0.778) / (0.7 + 0.778) = 0.737,
                              no:  2(0.5)(0.4) / (0.5 + 0.4) = 0.444
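These per-class figures follow directly from the 2x2 confusion matrix on page 12; a quick check in Python (the helper name is mine):

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class rates as Weka reports them in 'Detailed Accuracy By Class'."""
    tp_rate = tp / (tp + fn)            # also the recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f_measure

# class "yes": 7 hits, 2 misses; 3 false alarms, 2 correct rejections
print([round(x, 3) for x in class_metrics(7, 2, 3, 2)])   # [0.778, 0.6, 0.7, 0.737]
```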

Pages 16-18

Page 19

=== Run information ===
Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation:     weather.symbolic
Instances:    14
Attributes:   5

=== Associator model (full training set) ===
Size of set of large itemsets L(1): 12
Size of set of large itemsets L(2): 47
Size of set of large itemsets L(3): 39
Size of set of large itemsets L(4): 6

Best rules found:
 1. humidity=normal windy=FALSE 4 ==> play=yes 4    conf:(1)
 2. temperature=cool 4 ==> humidity=normal 4    conf:(1)
 3. outlook=overcast 4 ==> play=yes 4    conf:(1)
 4. temperature=cool play=yes 3 ==> humidity=normal 3    conf:(1)
 5. outlook=rainy windy=FALSE 3 ==> play=yes 3    conf:(1)
 6. outlook=rainy play=yes 3 ==> windy=FALSE 3    conf:(1)
 7. outlook=sunny humidity=high 3 ==> play=no 3    conf:(1)
 8. outlook=sunny play=no 3 ==> humidity=high 3    conf:(1)
 9. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2    conf:(1)
10. temperature=cool humidity=normal windy=FALSE 2 ==> play=yes 2    conf:(1)
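The support and confidence behind rules like these can be recomputed directly from the 14-instance weather data. A sketch in Python (the dataset listing is the standard weather.nominal data; the helper name is mine):

```python
# (outlook, temperature, humidity, windy, play) -- the standard 14-instance weather data
DATA = [
    ("sunny", "hot", "high", False, "no"),      ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),  ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),  ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),  ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),   ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),("rainy", "mild", "high", True, "no"),
]
FIELDS = ("outlook", "temperature", "humidity", "windy", "play")

def support(**conds):
    """Count instances matching every attribute=value condition given."""
    idx = [FIELDS.index(k) for k in conds]
    return sum(all(row[i] == conds[k] for i, k in zip(idx, conds)) for row in DATA)

# Rule 1: humidity=normal windy=FALSE ==> play=yes, confidence = 4/4 = 1
body = support(humidity="normal", windy=False)
both = support(humidity="normal", windy=False, play="yes")
print(body, both, both / body)   # 4 4 1.0
```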

Page 20

=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     soybean
Instances:    683
Attributes:   36
              date plant-stand precip temp hail crop-hist
              area-damaged severity seed-tmt germination plant-growth leaves
              leafspots-halo leafspots-marg leafspot-size leaf-shread leaf-malf leaf-mild
              stem lodging stem-cankers canker-lesion fruiting-bodies external-decay
              mycelium int-discolor sclerotia fruit-pods fruit-spots seed
              mold-growth seed-discolor seed-size shriveling roots class
Test mode:    evaluate on training data

J48 pruned tree
------------------
leafspot-size = lt-1/8
|   canker-lesion = dna
|   |   leafspots-marg = w-s-marg
|   |   |   seed-size = norm: bacterial-blight (21.0/1.0)
|   |   |   seed-size = lt-norm: bacterial-pustule (3.23/1.23)
|   |   leafspots-marg = no-w-s-marg: bacterial-pustule (17.91/0.91)
|   |   leafspots-marg = dna: bacterial-blight (0.0)
|   canker-lesion = brown: bacterial-blight (0.0)
|   canker-lesion = dk-brown-blk: phytophthora-rot (4.78/0.1)
|   canker-lesion = tan: purple-seed-stain (11.23/0.23)
leafspot-size = gt-1/8

Page 21

|   roots = norm
|   |   mold-growth = absent
|   |   |   fruit-spots = absent
|   |   |   |   leaf-malf = absent
|   |   |   |   |   fruiting-bodies = absent
|   |   |   |   |   |   date = april: brown-spot (5.0)
|   |   |   |   |   |   date = may: brown-spot (24.0/1.0)
|   |   |   |   |   |   date = june
|   |   |   |   |   |   |   precip = lt-norm: phyllosticta-leaf-spot (4.0)
|   |   |   |   |   |   |   precip = norm: brown-spot (5.0/2.0)
|   |   |   |   |   |   |   precip = gt-norm: brown-spot (21.0)
|   |   |   |   |   |   date = july
|   |   |   |   |   |   |   precip = lt-norm: phyllosticta-leaf-spot (1.0)
|   |   |   |   |   |   |   precip = norm: phyllosticta-leaf-spot (2.0)
|   |   |   |   |   |   |   precip = gt-norm: frog-eye-leaf-spot (11.0/5.0)
|   |   |   |   |   |   date = august
|   |   |   |   |   |   |   leaf-shread = absent
|   |   |   |   |   |   |   |   seed-tmt = none: alternarialeaf-spot (16.0/4.0)
|   |   |   |   |   |   |   |   seed-tmt = fungicide
|   |   |   |   |   |   |   |   |   plant-stand = normal: frog-eye-leaf-spot (6.0)
|   |   |   |   |   |   |   |   |   plant-stand = lt-normal: alternarialeaf-spot (5.0/1.0)
|   |   |   |   |   |   |   |   seed-tmt = other: frog-eye-leaf-spot (3.0)
|   |   |   |   |   |   |   leaf-shread = present: alternarialeaf-spot (2.0)
|   |   |   |   |   |   date = september
|   |   |   |   |   |   |   stem = norm: alternarialeaf-spot (44.0/4.0)
|   |   |   |   |   |   |   stem = abnorm: frog-eye-leaf-spot (2.0)
|   |   |   |   |   |   date = october: alternarialeaf-spot (31.0/1.0)
|   |   |   |   |   fruiting-bodies = present: brown-spot (34.0)

Page 22

|   |   |   |   leaf-malf = present: phyllosticta-leaf-spot (10.0)
|   |   |   fruit-spots = colored
|   |   |   |   fruit-pods = norm: brown-spot (2.0)
|   |   |   |   fruit-pods = diseased: frog-eye-leaf-spot (62.0)
|   |   |   |   fruit-pods = few-present: frog-eye-leaf-spot (0.0)
|   |   |   |   fruit-pods = dna: frog-eye-leaf-spot (0.0)
|   |   |   fruit-spots = brown-w/blk-specks
|   |   |   |   crop-hist = diff-lst-year: brown-spot (0.0)
|   |   |   |   crop-hist = same-lst-yr: brown-spot (2.0)
|   |   |   |   crop-hist = same-lst-two-yrs: brown-spot (0.0)
|   |   |   |   crop-hist = same-lst-sev-yrs: frog-eye-leaf-spot (2.0)
|   |   |   fruit-spots = distort: brown-spot (0.0)
|   |   |   fruit-spots = dna: brown-stem-rot (9.0)
|   |   mold-growth = present
|   |   |   leaves = norm: diaporthe-pod-&-stem-blight (7.25)
|   |   |   leaves = abnorm: downy-mildew (20.0)
|   roots = rotted
|   |   area-damaged = scattered: herbicide-injury (1.1/0.1)
|   |   area-damaged = low-areas: phytophthora-rot (30.03)
|   |   area-damaged = upper-areas: phytophthora-rot (0.0)
|   |   area-damaged = whole-field: herbicide-injury (3.66/0.66)
|   roots = galls-cysts: cyst-nematode (7.81/0.17)
leafspot-size = dna
|   int-discolor = none
|   |   leaves = norm
|   |   |   stem-cankers = absent
|   |   |   |   canker-lesion = dna: diaporthe-pod-&-stem-blight (5.53)

Page 23

|   |   |   |   canker-lesion = brown: purple-seed-stain (0.0)
|   |   |   |   canker-lesion = dk-brown-blk: purple-seed-stain (0.0)
|   |   |   |   canker-lesion = tan: purple-seed-stain (9.0)
|   |   |   stem-cankers = below-soil: rhizoctonia-root-rot (19.0)
|   |   |   stem-cankers = above-soil: anthracnose (0.0)
|   |   |   stem-cankers = above-sec-nde: anthracnose (24.0)
|   |   leaves = abnorm
|   |   |   stem = norm
|   |   |   |   plant-growth = norm: powdery-mildew (22.0/2.0)
|   |   |   |   plant-growth = abnorm: cyst-nematode (4.3/0.39)
|   |   |   stem = abnorm
|   |   |   |   plant-stand = normal
|   |   |   |   |   leaf-malf = absent
|   |   |   |   |   |   seed = norm: diaporthe-stem-canker (21.0/1.0)
|   |   |   |   |   |   seed = abnorm: anthracnose (9.0)
|   |   |   |   |   leaf-malf = present: 2-4-d-injury (3.0)
|   |   |   |   plant-stand = lt-normal
|   |   |   |   |   fruiting-bodies = absent: phytophthora-rot (50.16/7.61)
|   |   |   |   |   fruiting-bodies = present
|   |   |   |   |   |   roots = norm: anthracnose (11.0/1.0)
|   |   |   |   |   |   roots = rotted: phytophthora-rot (12.89/2.15)
|   |   |   |   |   |   roots = galls-cysts: phytophthora-rot (0.0)
|   int-discolor = brown
|   |   leaf-malf = absent: brown-stem-rot (35.73/0.73)
|   |   leaf-malf = present: 2-4-d-injury (3.15/0.68)
|   int-discolor = black: charcoal-rot (22.22/2.22)

Page 24

Number of Leaves  : 61
Size of the tree  : 93
Time taken to build model: 0.05 seconds

=== Evaluation on training set ===

=== Summary ===
Correctly Classified Instances         658               96.3397 %
Incorrectly Classified Instances        25                3.6603 %
Kappa statistic                          0.9598
Mean absolute error                      0.0104
Root mean squared error                  0.0625
Relative absolute error                 10.7981 %
Root relative squared error             28.5358 %
Total Number of Instances              683

Page 25

=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
 1         0.002     0.952       1        0.976      diaporthe-stem-canker
 1         0         1           1        1          charcoal-rot
 0.95      0         1           0.95     0.974      rhizoctonia-root-rot
 1         0.008     0.946       1        0.972      phytophthora-rot
 1         0         1           1        1          brown-stem-rot
 1         0         1           1        1          powdery-mildew
 1         0         1           1        1          downy-mildew
 0.978     0.005     0.968       0.978    0.973      brown-spot
 1         0.002     0.952       1        0.976      bacterial-blight
 0.95      0         1           0.95     0.974      bacterial-pustule
 1         0         1           1        1          purple-seed-stain
 0.977     0         1           0.977    0.989      anthracnose
 0.85      0         1           0.85     0.919      phyllosticta-leaf-spot
 0.967     0.017     0.898       0.967    0.931      alternarialeaf-spot
 0.89      0.008     0.942       0.89     0.915      frog-eye-leaf-spot
 1         0         1           1        1          diaporthe-pod-&-stem-blight
 1         0         1           1        1          cyst-nematode
 1         0         1           1        1          2-4-d-injury
 0.5       0         1           0.5      0.667      herbicide-injury

Page 26

=== Confusion Matrix ===
 a b c d e f g h i j k l m n o p q r s   <-- classified as
 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = diaporthe-stem-canker
 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = charcoal-rot
 1 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = rhizoctonia-root-rot
 0 0 0 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = phytophthora-rot
 0 0 0 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = brown-stem-rot
 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = powdery-mildew
 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 | g = downy-mildew
 0 0 0 0 0 0 0 90 0 0 0 0 0 0 2 0 0 0 0 | h = brown-spot
 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 0 | i = bacterial-blight
 0 0 0 0 0 0 0 0 1 19 0 0 0 0 0 0 0 0 0 | j = bacterial-pustule
 0 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 | k = purple-seed-stain
 0 0 0 1 0 0 0 0 0 0 0 43 0 0 0 0 0 0 0 | l = anthracnose
 0 0 0 0 0 0 0 3 0 0 0 0 17 0 0 0 0 0 0 | m = phyllosticta-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 88 3 0 0 0 0 | n = alternarialeaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 10 81 0 0 0 0 | o = frog-eye-leaf-spot
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 0 0 0 | p = diaporthe-pod-&-stem-blight
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 | q = cyst-nematode
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 0 | r = 2-4-d-injury
 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 | s = herbicide-injury

Page 27

Association rules

Weka contains an implementation of the Apriori learner for generating association rules, a commonly used technique in market basket analysis. This algorithm does not seek rules that predict a particular class attribute, but rather looks for any rules that capture strong associations between different attributes.

Clustering

Clustering methods also do not seek rules that predict a particular class, but rather try to divide the data into natural groups or "clusters." Weka includes an implementation of the EM algorithm, which can be used for unsupervised learning; it makes the assumption that all attributes are independent random variables.
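To illustrate the EM idea, here is a toy one-dimensional, two-component Gaussian mixture with a shared, fixed variance. This is only a sketch of the alternating E/M updates, not Weka's implementation; the data points and initial means below are invented:

```python
from math import exp, pi, sqrt

def em_1d(data, means, iters=25, var=1.0):
    """Toy EM for a 2-component 1-D Gaussian mixture (fixed shared variance)."""
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * exp(-(x - m) ** 2 / (2 * var)) / sqrt(2 * pi * var)
                 for w, m in zip(weights, means)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate means and mixing weights from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            weights[k] = nk / len(data)
    return means

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print([round(m, 2) for m in em_1d(data, [0.0, 6.0])])   # -> [1.0, 5.0]
```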

Page 28

PredictiveApriori

Best rules found:
 1. outlook=overcast 4 ==> play=yes 4    acc:(0.95323)
 2. temperature=cool 4 ==> humidity=normal 4    acc:(0.95323)
 3. humidity=normal windy=FALSE 4 ==> play=yes 4    acc:(0.95323)
 4. outlook=sunny humidity=high 3 ==> play=no 3    acc:(0.92093)
 5. outlook=sunny play=no 3 ==> humidity=high 3    acc:(0.92093)
 6. outlook=rainy windy=FALSE 3 ==> play=yes 3    acc:(0.92093)
 7. outlook=rainy play=yes 3 ==> windy=FALSE 3    acc:(0.92093)
 8. outlook=sunny temperature=hot 2 ==> humidity=high play=no 2    acc:(0.86233)
 9. outlook=sunny humidity=normal 2 ==> play=yes 2    acc:(0.86233)
10. outlook=sunny play=yes 2 ==> humidity=normal 2    acc:(0.86233)
11. outlook=overcast temperature=hot 2 ==> windy=FALSE play=yes 2    acc:(0.86233)
12. outlook=overcast windy=FALSE 2 ==> temperature=hot play=yes 2    acc:(0.86233)
13. outlook=rainy humidity=high 2 ==> temperature=mild 2    acc:(0.86233)
14. outlook=rainy windy=TRUE 2 ==> play=no 2    acc:(0.86233)
15. outlook=rainy play=no 2 ==> windy=TRUE 2    acc:(0.86233)
16. temperature=hot play=yes 2 ==> outlook=overcast windy=FALSE 2    acc:(0.86233)
17. temperature=hot play=no 2 ==> outlook=sunny humidity=high 2    acc:(0.86233)
18. temperature=mild humidity=normal 2 ==> play=yes 2    acc:(0.86233)
19. temperature=mild play=no 2 ==> humidity=high 2    acc:(0.86233)
20. temperature=cool windy=FALSE 2 ==> humidity=normal play=yes 2    acc:(0.86233)

Page 29

Page 30

Scheme: weka.clusterers.Cobweb -A 1.0 -C 0.0028209479177387815
Relation: weather
Number of merges: 1
Number of splits: 0
Number of clusters: 21

node 0 [14]
|   node 1 [5]
|   |   leaf 2 [1]
|   node 1 [5]
|   |   leaf 3 [1]
|   node 1 [5]
|   |   node 4 [2]
|   |   |   leaf 5 [1]
|   |   node 4 [2]
|   |   |   leaf 6 [1]
|   node 1 [5]
|   |   leaf 7 [1]
node 0 [14]

Page 31

node 0 [14]
|   node 8 [6]
|   |   node 9 [2]
|   |   |   leaf 10 [1]
|   |   node 9 [2]
|   |   |   leaf 11 [1]
|   node 8 [6]
|   |   leaf 12 [1]
|   node 8 [6]
|   |   node 13 [3]
|   |   |   leaf 14 [1]
|   |   node 13 [3]
|   |   |   leaf 15 [1]
|   |   node 13 [3]
|   |   |   leaf 16 [1]
node 0 [14]
|   node 17 [3]
|   |   leaf 18 [1]
|   node 17 [3]
|   |   leaf 19 [1]
|   node 17 [3]
|   |   leaf 20 [1]

Page 32

Page 33

Select Attributes

=== Run information ===
Evaluator:    weka.attributeSelection.PrincipalComponents -R 0.95 -A 5
Search:       weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Relation:     weather
Instances:    14
Attributes:   5
              outlook temperature humidity windy play
Evaluation mode: evaluate on all training data

Search Method: Attribute ranking.
Attribute Evaluator (unsupervised): Principal Components Attribute Transformer

Correlation matrix
  1     -0.47  -0.56   0.31   0.03   0.04
 -0.47   1     -0.47   0.14  -0.17  -0.09
 -0.56  -0.47   1     -0.44   0.13   0.04
  0.31   0.14  -0.44   1      0.32   0.33
  0.03  -0.17   0.13   0.32   1      0.2
  0.04  -0.09   0.04   0.33   0.2    1

eigenvalue   proportion   cumulative
1.94405      0.32401      0.32401     0.578temperature-0.571outlook=rainy+0.506outlook=sunny+0.227windy+0.164humidity...
1.58814      0.26469      0.5887     -0.68outlook=overcast+0.443humidity+0.424outlook=rainy+0.334windy+0.217outlook=sunny...
1.29207      0.21534      0.80404     0.567outlook=sunny-0.443windy-0.432outlook=overcast-0.414humidity-0.312temperature...
0.79269      0.13212      0.93616     0.738windy-0.667humidity-0.077temperature-0.052outlook=overcast+0.033outlook=rainy...
0.38305      0.06384      1           0.748temperature-0.4humidity+0.348outlook=rainy-0.308windy-0.191outlook=overcast...

Page 34

Eigenvectors
  V1       V2       V3       V4       V5
  0.5064   0.2166   0.5674   0.0167  -0.1683   outlook=sunny
  0.0684  -0.6798  -0.4317  -0.0522  -0.1906   outlook=overcast
 -0.5709   0.4244  -0.1603   0.0325   0.348    outlook=rainy
  0.5785   0.053   -0.3125  -0.0772   0.7476   temperature
  0.1639   0.4432  -0.4145  -0.6669  -0.4003   humidity
  0.227    0.3341  -0.4433   0.7384  -0.3083   windy

Ranked attributes:
 0.675990846445273472   1   0.578temperature-0.571outlook=rainy+0.506outlook=sunny+0.227windy+0.164humidity...
 0.411301353536642624   2  -0.68outlook=overcast+0.443humidity+0.424outlook=rainy+0.334windy+0.217outlook=sunny...
 0.195956514975330624   3   0.567outlook=sunny-0.443windy-0.432outlook=overcast-0.414humidity-0.312temperature...
 0.063841150150769192   4   0.738windy-0.667humidity-0.077temperature-0.052outlook=overcast+0.033outlook=rainy...
 0.000000000000000111   5   0.748temperature-0.4humidity+0.348outlook=rainy-0.308windy-0.191outlook=overcast...

Selected attributes: 1,2,3,4,5 : 5
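The eigen-decomposition behind this ranking can be illustrated with a small power-iteration sketch, which finds the dominant eigenvalue/eigenvector pair of a symmetric matrix. This is pure illustrative Python, not Weka's code, and the 2x2 example matrix is made up:

```python
def power_iteration(a, iters=200):
    """Dominant eigenvalue and eigenvector of a small symmetric matrix."""
    n = len(a)
    v = [1.0] * n
    for _ in range(iters):
        # multiply by the matrix, then normalise to unit length
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(vi * avi for vi, avi in zip(v, av))   # Rayleigh quotient, |v| = 1
    return lam, v

# toy correlation-like matrix; its eigenvalues are 1.3 and 0.7
lam, v = power_iteration([[1.0, 0.3], [0.3, 1.0]])
print(round(lam, 4))   # -> 1.3
```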

Page 35

Search Method: Attribute ranking.
Attribute Evaluator (supervised, Class (nominal): 5 play):
    Symmetrical Uncertainty Ranking Filter
Ranked attributes:
 0.196   1 outlook
 0.05    4 windy
 0       3 humidity
 0       2 temperature
Selected attributes: 1,4,3,2 : 4
========================
Search Method: Attribute ranking.
OneR feature evaluator.
Using 10 fold cross validation for evaluating attributes.
Minimum bucket size for OneR: 6
Ranked attributes:
57.143   3 humidity
50       1 outlook
50       2 temperature
42.857   4 windy
Selected attributes: 3,1,2,4 : 4
=========================
Search Method: Attribute ranking.
Information Gain Ranking Filter
Ranked attributes:
 0.2467  1 outlook
 0.0481  4 windy
 0       3 humidity
 0       2 temperature
Selected attributes: 1,4,3,2 : 4
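The 0.2467 information-gain figure for outlook can be reproduced from the weather data's class counts (9 yes / 5 no overall; the entropy helper is mine):

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# outlook splits play=9 yes/5 no into: sunny 2/3, overcast 4/0, rainy 3/2
h_all = entropy([9, 5])
remainder = (5 / 14) * entropy([2, 3]) + (4 / 14) * entropy([4, 0]) + (5 / 14) * entropy([3, 2])
print(round(h_all - remainder, 4))   # -> 0.2467
```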

Page 36

Search Method: Best first, Exhaustive Search.
Selected attributes: 1,4 : 2
              outlook windy
============================
Search Method: Genetic search.
Initial population
merit     scaled    subset
 0        0.03362   2
 0        0.03362   2
 0.04999  0.0548    4
 0.06572  0.06147   1 2 3 4
 ...
 0.17354  0.10716   1 4
 0        0.03362   3
 0        0.03362   2
 0        0.03362   2
Generation: 20
merit     scaled    subset
 0.19601  0.2076    1
 0.19601  0.2076    1
 0.19601  0.2076    1
 0.19601  0.2076    1
 0.17354  0.16236   1 4
 0.09292  0         1 3 4
 ...
 0.19601  0.2076    1

Attribute Subset Evaluator (supervised, Class (nominal): 5 play):
    CFS Subset Evaluator
    Including locally predictive attributes

Selected attributes: 1,4 : 2
              outlook windy

Page 37

4. The Knowledge Flow Interface

Pages 38-47

Page 48

5. The Experimenter

Pages 49-50

Page 51

Dataset        (1) r.ZeroR | (2) r.OneR   (3) trees.J48
-------------------------------------------------------
iris (100)        33.33    |   93.53 v      94.73 v
-------------------------------------------------------
         (v/ /*)           |   (1/0/0)      (1/0/0)

Dataset      (1) trees.J48 | (2) r.OneR   (3) r.ZeroR
-------------------------------------------------------
iris (100)        94.73    |   93.53        33.33 *
-------------------------------------------------------
         (v/ /*)           |   (0/1/0)      (0/0/1)

Page 52

Dataset    (1) r.DecisionTable | (2) r.ConjunctiveRule   (3) r.NNge
-------------------------------------------------------------------
iris (20)         93.33        |       66.67 *              95.67
-------------------------------------------------------------------
           (v/ /*)             |      (0/0/1)              (0/1/0)

Dataset       (1) rules.On | (2) rules  (3) rules  (4) rules  (5) rules  (6) rules  (7) rules
---------------------------------------------------------------------------------------------
Iris (100)       93.53     |  66.67 *    93.27      93.93      96.00      94.20      94.60
---------------------------------------------------------------------------------------------
        (v/ /*)            |  (0/0/1)   (0/1/0)    (0/1/0)    (0/1/0)    (0/1/0)    (0/1/0)
Skipped:

Key:
(1) rules.OneR '-B 6' 3010129309850089072
(2) rules.ConjunctiveRule '-N 3 -M 2.0 -P -1 -S 1' -5938309903225087198
(3) rules.DecisionTable '-X 1 -S 5' 2788557078165701326
(4) rules.JRip '-F 3 -N 2.0 -O 2 -S 1' -6589312996832147161
(5) rules.NNge '-G 5 -I 5' 4084742275553788972
(6) rules.PART '-M 2 -C 0.25 -Q 1' 8121455039782598361
(7) rules.Ridor '-F 3 -S 1 -N 2.0' -7261533075088314436

Page 53

Dataset         (1) rules.Co | (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
--------------------------------------------------------------------------------------------------
contact-lenses      63.17    | 82.50    80.67    72.17    73.17    83.50    77.00    76.17    75.67
--------------------------------------------------------------------------------------------------
         (v/ /*)             |(0/1/0)  (0/1/0)  (0/1/0)  (0/1/0)  (1/0/0)  (0/1/0)  (0/1/0)  (0/1/0)

Key:
(1) rules.ConjunctiveRule '-N 3 -M 2.0 -P -1 -S 1' -5938309903225087198
(2) rules.DecisionTable '-X 1 -S 5' 2788557078165701326
(3) rules.JRip '-F 3 -N 2.0 -O 2 -S 1' -6589312996832147161
(4) trees.DecisionStump '' -7265551604329079943
(5) trees.Id3 '' -2693678647096322561
(6) trees.J48 '-C 0.25 -M 2' -217733168393644444
(7) trees.LMT '-I -1 -M 15' -1113212459618104943
(8) trees.NBTree '' -4716005707058256086
(9) trees.RandomForest '-I 10 -K 0 -S 1' 4216839470751428698

Page 54

Dataset  (1) r.ConjunctiveRule | (2) r.DecisionTable  (3) r.JRip  (4) r.NNge  (5) r.OneR  (6) r.PART  (7) rules.Ridor  (8) rules.ZeroR
--------------------------------------------------------------------------------------------------------------------------------------
labor-neg-data (100)   77.60   |       83.80            83.70       86.23       72.77       77.73         82.70            64.67 *
              (v/ /*)          |      (0/1/0)          (0/1/0)     (0/1/0)     (0/1/0)     (0/1/0)       (0/1/0)          (0/0/1)

Dataset  (1) r.Ridor | (2) r.ConjunctiveRule  (3) r.NNge  (4) t.DecisionStump  (5) t.LMT  (6) t.RandomTree  (7) bayes.BayesNet
------------------------------------------------------------------------------------------------------------------------------
labor-neg-data (100)   82.70 |       77.60        86.23          78.77            91.37         83.90              90.60
              (v/ /*)        |      (0/1/0)      (0/1/0)        (0/1/0)          (0/1/0)       (0/1/0)            (0/1/0)

Page 55

LIBSVM -- A Library for Support Vector Machines

• LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.

• Since version 2.8, it implements an SMO-type algorithm proposed in: R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research 6, 1889-1918, 2005.

Page 56

SVM, the Support Vector Machine, has roots similar to those of neural networks, but recently it has been widely used for classification. That is: given some sets of things that have already been classified (without knowing how they were classified, or the rules used for the classification), when a new data point arrives, an SVM can predict which set it should belong to.
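Prediction with a trained SVM reduces to the sign of a kernel expansion over the support vectors. A minimal sketch of that decision function (the support vectors, coefficients, and bias below are invented for illustration, not output of LIBSVM):

```python
def svm_predict(x, support_vectors, coefs, bias):
    """Sign of f(x) = sum_i coef_i * K(sv_i, x) + b, with a linear kernel
    K(u, v) = u . v.  Here coef_i stands for alpha_i * y_i from training."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    score = sum(c * dot(sv, x) for c, sv in zip(coefs, support_vectors)) + bias
    return 1 if score >= 0 else -1

# made-up "trained" model: two support vectors with opposite-sign coefficients
svs = [[2.0, 0.0], [0.0, 2.0]]
coefs = [0.5, -0.5]
print(svm_predict([3.0, 0.0], svs, coefs, 0.0))   # -> 1
```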

Page 57

The syntax of svm-train is basically:

svm-train [options] training_set_file [model_file]

The syntax of svm-predict is:

svm-predict test_file model_file output_file

Page 58

Page 59

6. Conclusions

As the technology of machine learning continues to develop and mature, learning algorithms need to be brought to the desktops of people who work with data and understand the application domain from which it arises. It is necessary to get the algorithms out of the laboratory and into the work environment of those who can use them. Weka is a significant step in the transfer of machine learning technology into the workplace.

Page 60

6. Conclusions (cont.)

The primary one of the three separate interactive interfaces is the Explorer, which gives access to all of Weka's facilities using menu selection and form filling.

The Knowledge Flow interface allows users to design configurations for streamed data processing. The Experimenter lets users set up automated experiments that run selected machine learning algorithms with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests on the results.

Page 61

7. References

1. Ian H. Witten & Eibe Frank. (2005). Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, San Francisco.
2. Zdravko Markov & Ingrid Russell. (2006). An Introduction to the WEKA Data Mining System. ITiCSE '06: Proceedings of the 11th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education.
3. libsvm, by Prof. Chih-Jen Lin (cjlin), Department of Computer Science, National Taiwan University.