

An assignment for the university-level Data Mining course (CSC 573), focusing on attribute relevance analysis using the WEKA machine learning tool. Students use three data sets (“contact-lenses.arff”, “iris.arff”, and “soybean.arff”) and perform attribute ranking with the InfoGainAttributeEval and GainRatioAttributeEval methods. The document also covers the installation of WEKA and the discretization of non-class attributes. The assignment includes evaluating the results and submitting the output and observations files.
CSC 573: Data Mining
WEKA Assignment #1: Attribute Relevance Analysis in WEKA
Instructor: Ratko Orlandic
For this assignment, your task is to familiarize yourself with the WEKA machine learning tool and its attribute ranking facilities (the “Select attributes” feature in WEKA Explorer). You will use the “contact-lenses”, “iris”, and “soybean” data sets, all of which are available in the required .arff format in the WEKA package. The “contact-lenses” data set has 24 instances with 5 nominal attributes, the last of which (“contact-lenses”) is the class dimension. The “iris” set has 150 instances with 4 continuous attributes and a nominal class, which is the last (5th) dimension. The “soybean” set has 683 instances with 36 nominal attributes, the last of which is the class dimension. Unlike the other two sets, “soybean” has missing values.
The WEKA machine learning tool is installed on the computers in the computer lab UHB 2030. You can also download a free copy of the software to your own computer as follows:
WEKA comes with certain data files and some documentation. Once you install the software, you can find these on your computer in the directory C:\Program Files\Weka-3-4. Whether you work in the lab or on your computer, you should spend some time familiarizing yourself with WEKA. For this assignment, you will be working with WEKA Explorer.
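If you want to double-check the data set properties listed above (instance counts, attribute counts, missing values) programmatically, the following is a minimal sketch using the WEKA Java API. It assumes Weka 3.x is on the classpath; the class name InspectData and the file paths are placeholders you may need to adjust for your installation.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class InspectData {
        public static void main(String[] args) throws Exception {
            // Adjust the paths to wherever the WEKA sample .arff files live on your machine.
            for (String f : new String[]{"contact-lenses.arff", "iris.arff", "soybean.arff"}) {
                Instances data = DataSource.read(f);           // load the ARFF file
                data.setClassIndex(data.numAttributes() - 1);  // the class is the last attribute
                System.out.println(f + ": " + data.numInstances() + " instances, "
                        + data.numAttributes() + " attributes, "
                        + data.attributeStats(0).missingCount + " missing values in the first attribute");
            }
        }
    }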
For each step, open the indicated file in the “Preprocess” window. Then, go to the “Attribute Selection” window and set the “Attribute selection mode” to “Use full training set”. For each case A-E below, perform attribute ranking using the following attribute selection methods with default parameters: a) InfoGainAttributeEval and b) GainRatioAttributeEval. These attribute selection methods should consider only non-class dimensions (for each set, the class attribute is indicated above the “Start” button). Record the output of each run in a text file called “output.txt”: copy the output of the run from the “Attribute selection output” window in the Explorer and paste it at the end of the “output.txt” file.
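Steps A-E below are carried out in the Explorer GUI, but for reference, the same ranking procedure can be reproduced with the WEKA Java API. The sketch below is only a rough equivalent, assuming Weka 3.x is on the classpath; the class name RankAttributes and the file name are placeholders.

    import weka.attributeSelection.ASEvaluation;
    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.GainRatioAttributeEval;
    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RankAttributes {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");   // substitute the data set required by each step
            data.setClassIndex(data.numAttributes() - 1);    // the class is the last attribute

            for (ASEvaluation eval : new ASEvaluation[]{
                    new InfoGainAttributeEval(), new GainRatioAttributeEval()}) {
                AttributeSelection selector = new AttributeSelection();
                selector.setEvaluator(eval);        // ranking criterion (default parameters)
                selector.setSearch(new Ranker());   // rank all non-class attributes
                selector.SelectAttributes(data);    // evaluated on the full training set
                System.out.println(selector.toResultsString());
            }
        }
    }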
A. Perform attribute ranking on the “contact-lenses.arff” data set using the two attribute ranking methods with default parameters.
B. Load the “iris.arff” data set. Perform attribute ranking on the “iris.arff” data set using the two attribute ranking methods with default parameters.
C. Go back to “Preprocess” and load the “iris.arff” data set. Perform discretization of all non-class attributes into 10 equal-width bins as follows: under “Filter” in the “Preprocess” window of the Explorer, select ‘filters’ -> ‘unsupervised’ -> ‘attribute’ -> ‘Discretize’ (use the default parameters of the ‘Discretize’ filter) and hit “Apply”. Verify that all attributes are nominal by clicking on individual attributes in the “Attributes” window in “Preprocess”. Then perform attribute ranking on the discretized set using the two attribute-ranking methods with default parameters. (A programmatic sketch of the filter settings used in steps C and D appears after step E.)
D. Go back to “Preprocess” and load the original “iris.arff” data set again. Perform discretization of all non-class attributes into 5 close-to-equal-height bins by selecting the ‘Discretize’ filter. Then set the appropriate parameters by clicking on the ‘Discretize’ filter in the “Filter” window and setting ‘bins’ to 5 and ‘useEqualFrequency’ to true. After you verify that all attributes are nominal, perform attribute ranking on the new set using the two attribute-ranking methods with default parameters.
E. Load the “soybean.arff” data set. Then perform attribute ranking on the “soybean.arff” data set using the two attribute ranking methods with default parameters.
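As referenced in steps C and D, the corresponding ‘Discretize’ filter settings can also be applied through the WEKA Java API. The following is only a sketch of the equal-width and equal-frequency configurations (the class name DiscretizeIris is a placeholder); the Explorer route described above is all the assignment requires.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Discretize;

    public class DiscretizeIris {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");

            // Step C: default Discretize settings -- 10 equal-width bins.
            // The nominal class attribute is not affected, since only numeric attributes are discretized.
            Discretize equalWidth = new Discretize();
            equalWidth.setInputFormat(data);
            Instances tenEqualWidthBins = Filter.useFilter(data, equalWidth);

            // Step D: 5 close-to-equal-height (equal-frequency) bins.
            Discretize equalFreq = new Discretize();
            equalFreq.setBins(5);
            equalFreq.setUseEqualFrequency(true);
            equalFreq.setInputFormat(data);
            Instances fiveEqualFrequencyBins = Filter.useFilter(data, equalFreq);

            // Either filtered set can then be ranked exactly as in the earlier sketch.
            System.out.println(tenEqualWidthBins.toSummaryString());
            System.out.println(fiveEqualFrequencyBins.toSummaryString());
        }
    }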
Once you have performed the experiments, you should spend some time evaluating your results. In particular, try to answer at least the following questions: Why would one need attribute relevance ranking? Do these attribute-ranking methods often agree or disagree? On which data set(s), if any, do these methods disagree? Do discretization and the choice of discretization method affect the results of attribute ranking? Do missing values affect the results of attribute ranking? Record these and any other observations in a Word file called “Observations.doc”.
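When thinking about why the two rankers might disagree, it may help to recall the standard measures they are based on (C is the class attribute, A a candidate attribute, and H(·) denotes entropy):

    InfoGain(C, A) = H(C) - H(C | A)
    GainRatio(C, A) = InfoGain(C, A) / H(A)

The H(A) denominator in the gain ratio penalizes attributes that take many distinct values, which is one reason the two rankings can differ, especially after discretization changes how many values each attribute has.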
On or before the due date, please submit the “output.txt” file with the results of your runs and the “Observations.doc” file as a single zipped file through the Blackboard system. Please adhere to the following submission procedure:
Grading will be done based on the correctness of the results in your output file as well as the extensiveness, clarity, and correctness of your observations.
Good luck!