

HW Assignment 1

Due: June 19th 2009

Part I – Installing Weka

The purpose of this assignment is to install and run Weka, a widely used, free data mining toolbox written in Java. This homework will walk you through the basic steps of installing and running the software, building classifiers, and labeling test cases. For this assignment, you will need to download the TRAINING and TEST sets from the course website. Note: It is important that you properly install and learn how to run Weka because we will use it for future hands-on assignments as well as for the data mining competition and course project.

Step 1: Installing Weka

Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software. On the left-hand side, click on the link that says download. Select the link for the version of the software that matches your operating system and whether or not you already have a Java VM installed on your machine (if you don’t know what a Java VM is, then you probably don’t have one). The link will forward you to a mirror site where you can download the software. Save the self-extracting executable to disk and then double-click on it to install Weka. Answer yes or next to the questions during the installation. Click yes to accept the Java agreement if necessary. After you install the program, Weka should appear on your start menu under Programs (if you are using Windows).

Step 2: Running Weka

From the start menu select Programs, then Weka, then Weka 3*. You will see the Weka GUI Chooser. Select Explorer. The Weka Explorer will then launch.


Step 3: Load Training Set

You will find the training set, TRAIN.arff, on the course website. The training set includes the records you will use in your next homework assignment.

The TRAINING set contains the following data:

In the Weka Explorer, click the button that says Open file and open TRAIN.arff.
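
If you are curious what Weka reads when it opens an ARFF file, the sketch below parses a tiny, made-up ARFF file in Python. The attribute names and values are hypothetical and are not the contents of TRAIN.arff; the helper read_arff is also just an illustration and only handles simple, dense files without quoted names.

```python
import io

# Hypothetical miniature ARFF file; the real TRAIN.arff will have
# different attribute names and values.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation toy_example
@attribute age numeric
@attribute income {low,high}
@attribute class {yes,no}
@data
25,low,no
47,high,yes
"""


def read_arff(handle):
    """Return (attribute_names, rows) from a simple, dense ARFF file."""
    attributes, rows, in_data = [], [], False
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            # after @data, every line is one comma-separated record
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = read_arff(io.StringIO(SAMPLE_ARFF))
print(attributes)
print(len(rows))
```

To try it on the real file, replace the io.StringIO(...) argument with open("TRAIN.arff"), assuming the file is in your working directory.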

Step 5: Constructing the Initial Decision Tree

Select the tab that says Classify. In the box that says classifier, you can choose a classifier. Click on the Choose button and you will be presented with a hierarchy of methods. Pick weka, classifiers, trees, J48. Click on the text box in the classifier box (which now says J48 and some cryptic options instead of ZeroR, which is the default classifier). In the popup, change the following settings: set minNumObj to 1 and unpruned to True, and then click OK. (Note: The order in which the options appear might vary depending on which mirror site you choose. For example, we found minNumObj is closer to the top of the GUI in some versions.)


You will find the test set, TEST.arff, on the course website. The TEST set includes the records you will use in future homework assignments. The TEST set contains the data below. In the box that says test options, pick Supplied test set. Click on the Set button and select your TEST.arff file.

Now press Start!!!!!!!!!!!!! AND WATCH WEKA GO!
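
The same run can also be started without the GUI. The sketch below builds a command line that should correspond to the settings chosen above: -U and -M are J48's options for unpruned and minNumObj, and -t and -T are Weka's options for the training and test files. The location of weka.jar is an assumption and depends on where Weka was installed on your machine, and the helper name j48_command is hypothetical.

```python
import shlex


def j48_command(train_path, test_path, weka_jar="weka.jar"):
    """Build a command line that mirrors the GUI settings used above."""
    return [
        "java", "-cp", weka_jar,      # weka_jar path is an assumption
        "weka.classifiers.trees.J48",
        "-U",                          # unpruned = True
        "-M", "1",                     # minNumObj = 1
        "-t", train_path,              # training set
        "-T", test_path,               # supplied test set
    ]


command = j48_command("TRAIN.arff", "TEST.arff")
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run(command) if Java and weka.jar are available.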

Step 6: Results

You may have to scroll up and down in the classifier output box to see all the results.

Copy and paste the results in the classifier output window to a text editor and HAND IN (or email) with your assignment.

You will compare these results with a future homework assignment. Don’t worry that you don’t yet know how to interpret the output. In a short time, you will. This exercise is only to get you started with Weka.

In the results box on the bottom left, right-click on the item that says … trees.J48. Select Visualize classifier errors from the list. Click Save and save the results as RESULTS.arff. This file will include your original TEST set plus an extra column for the predicted classification.
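
As a rough illustration of what the extra column makes possible, the sketch below counts how often the predicted class matches the actual class. The column names "class" and "predictedclass" and all of the rows are hypothetical; check the @attribute lines in your own RESULTS.arff for the real names.

```python
def accuracy(attributes, rows, actual="class", predicted="predictedclass"):
    """Fraction of rows whose predicted label equals the actual label."""
    a = attributes.index(actual)      # position of the true class column
    p = attributes.index(predicted)   # position of the predicted column
    correct = sum(1 for row in rows if row[a] == row[p])
    return correct / len(rows)


# Hypothetical rows in the shape of a saved results file.
attributes = ["age", "income", "predictedclass", "class"]
rows = [
    ["25", "low", "no", "no"],
    ["47", "high", "yes", "yes"],
    ["33", "low", "yes", "no"],   # the one misclassified record
    ["51", "high", "no", "no"],
]

print(accuracy(attributes, rows))  # 3 of 4 rows match, so 0.75
```

This is not required for the assignment; the figure Weka reports in the classifier output window is the one to hand in.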

Copy and paste the text in the RESULTS.arff file to the end of your assignment and HAND IN.

So, for the first part of the assignment, you simply need to hand in (or email to me) a text document with the results output from Weka along with the prediction results found in your RESULTS.arff file.

Part II: Classification/prediction problem ideas

List three prediction problem ideas for your class project based on publicly available data, Wharton Research Data Services (WRDS) data (see shawndra.pbwiki.com), or data you have at your firm.

Append your three ideas to the text file with your answers to Part I.