CSC 573 Data Mining Assignment: Attribute Relevance Analysis using WEKA, Assignments of Computer Science

An assignment for the data mining course (CSC 573) at the university level, focusing on attribute relevance analysis using the WEKA machine learning tool. Students are required to use three different data sets ('contact-lenses.arff', 'iris.arff', and 'soybean.arff') and perform attribute ranking using the InfoGainAttributeEval and GainRatioAttributeEval methods. The document also covers the installation of WEKA and the discretization of non-class attributes. The assignment includes evaluating the results and submitting the output and observations files.

Typology: Assignments

Pre 2010

Uploaded on 08/18/2009

koofers-user-bce-1

CSC 573: Data Mining
Weka Assignment #1: Attribute Relevance Analysis in WEKA
Instructor: Ratko Orlandic

For this assignment, your task is to familiarize yourself with the WEKA machine learning tool and the attribute ranking facilities in WEKA (the “Select attributes” feature of WEKA Explorer). You will use the “contact-lenses”, “iris”, and “soybean” data sets, all of which are available in the required .arff format in the WEKA package. The “contact-lenses” data set has 24 instances with 5 nominal attributes, the last of which (“contact-lenses”) is the class dimension. The “iris” set has 150 instances with 4 continuous attributes and the nominal class, which is the last (5th) dimension. The “soybean” set has 683 instances with 36 nominal attributes, the last of which is the class dimension. Unlike the other two sets, “soybean” has missing values.
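All three files use WEKA's ARFF format, in which a header of `@attribute` declarations precedes the `@data` section. As a rough illustration of how the class dimension relates to that header, the sketch below lists the attributes of a small, hypothetical ARFF header and treats the last one as the class, as the assignment does for all three sets. The parser (`parse_arff_attributes`) and the toy header are our own assumptions, not part of WEKA: the parser expects one unquoted attribute name per line and ignores comments and quoting, which WEKA's own loader handles properly.

```python
def parse_arff_attributes(text):
    """Return (name, kind) pairs from the @attribute lines of an ARFF header.

    Simplified sketch: assumes unquoted names without spaces and one
    declaration per line; comments and quoted names are not handled.
    """
    attributes = []
    for raw in text.splitlines():
        line = raw.strip()
        lowered = line.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if lowered.startswith("@attribute"):
            # "@attribute <name> <type>": split into at most three fields
            _, name, spec = line.split(None, 2)
            kind = "nominal" if spec.startswith("{") else spec.lower()
            attributes.append((name, kind))
    return attributes


# Hypothetical toy header, not one of the WEKA sample files.
TOY_ARFF = """@relation toy
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
"""

attrs = parse_arff_attributes(TOY_ARFF)
class_name, class_kind = attrs[-1]          # last attribute = class dimension
non_class = [name for name, _ in attrs[:-1]]
print(class_name, class_kind, non_class)    # play nominal ['outlook', 'temperature']
```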

Installing WEKA on Your Computer

The WEKA machine learning tool is installed on the computers in computer lab UHB 2030. You can also download a free copy of the software to your own computer as follows:

  1. Go to: http://www.cs.waikato.ac.nz/~ml/.
  2. Click on the “software” tab.
  3. Under “Getting started”, click on “Download”.
  4. Under “Windows”, click on the link to download a self-extracting executable that includes Java VM 1.4 (weka-3-4-10jre.exe).
  5. Install the WEKA software on your computer, accepting the default directories.

WEKA comes with sample data files and documentation. Once you install the software, you can find these in the directory C:\Program Files\Weka-3-4 on your computer. Whether you work in the lab or on your own computer, you should spend some time familiarizing yourself with WEKA. For this assignment, you will be working with WEKA Explorer.

Attribute Relevance Ranking

For each case, open the indicated file in the “Preprocess” window. Then go to the “Select attributes” window and set the “Attribute selection mode” to “Use full training set”. For each case A-E below, perform attribute ranking using the following attribute selection methods with default parameters:

a) InfoGainAttributeEval; and
b) GainRatioAttributeEval.

These attribute selection methods should consider only the non-class dimensions (for each set, the class attribute is indicated above the “Start” button). Record the output of each run in a text file called “output.txt”: copy the output of the run from the “Attribute selection output” window in the Explorer and paste it at the end of “output.txt”.
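InfoGainAttributeEval and GainRatioAttributeEval score each non-class attribute A against the class C. To our understanding, WEKA defines them as InfoGain(C, A) = H(C) - H(C | A) and GainRatio(C, A) = InfoGain(C, A) / H(A), with entropies in bits. The Python sketch below is not WEKA's code, and all names in it are ours. It computes both scores for nominal attributes without missing values and sorts them best-first, as a ranking would. The hypothetical attribute `id_like` takes a different value on every instance, which shows one way the two methods can disagree: information gain scores it as highly as a genuinely predictive attribute, while gain ratio divides by its large H(A).

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def info_gain(column, labels):
    """InfoGain(C, A) = H(C) - H(C | A) for one nominal attribute column A."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(column, labels):
    """GainRatio(C, A) = InfoGain(C, A) / H(A); defined as 0 when H(A) is 0."""
    split_info = entropy(column)
    return info_gain(column, labels) / split_info if split_info > 0 else 0.0


def rank(columns, labels, score):
    """Sort (name, score) pairs for all non-class attributes, best first."""
    scored = [(name, score(col, labels)) for name, col in columns.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: three nominal attributes and the class column.
labels = ["yes", "yes", "no", "no"]
columns = {
    "predictive": ["x", "x", "y", "y"],   # determines the class exactly
    "irrelevant": ["p", "q", "p", "q"],   # independent of the class
    "id_like": ["i1", "i2", "i3", "i4"],  # a different value on every row
}
print(rank(columns, labels, info_gain))
# [('predictive', 1.0), ('id_like', 1.0), ('irrelevant', 0.0)]
print(rank(columns, labels, gain_ratio))
# [('predictive', 1.0), ('id_like', 0.5), ('irrelevant', 0.0)]
```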

A. Perform attribute ranking on the “contact-lenses.arff” data set using the two attribute ranking methods with default parameters.

B. Load the “iris.arff” data set and perform attribute ranking on it using the two attribute ranking methods with default parameters.
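Information gain is defined over discrete values, so the continuous iris attributes have to be discretized before they can be scored. To our understanding, WEKA's InfoGainAttributeEval and GainRatioAttributeEval do this internally with a supervised, entropy-based (MDL) discretization when they are given numeric attributes. This means the scores in case B rest on a different discretization from the ones you apply yourself in cases C and D. The hypothetical helper below shows only the core step of such methods: choosing the single cut point that maximizes information gain. Real MDL discretization applies this step recursively and uses a stopping criterion that this sketch omits. The data are made up and are not taken from iris.arff.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (cut_point, info_gain) for the threshold with the highest information gain.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (None, 0.0) if no cut improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_cut, best_gain = None, 0.0
    for i in range(1, n):
        below, above = pairs[i - 1][0], pairs[i][0]
        if below == above:
            continue  # equal values cannot be separated by a cut
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_cut, best_gain = (below + above) / 2, gain
    return best_cut, best_gain


# Hypothetical numeric attribute and class labels (not from iris.arff).
values = [4.0, 1.0, 5.0, 2.0, 1.5, 4.5]
labels = ["b", "a", "b", "a", "a", "b"]
print(best_binary_split(values, labels))  # (3.0, 1.0)
```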

C. Go back to “Preprocess” and load the “iris.arff” data set. Discretize all non-class attributes into 10 equal-width bins as follows: under “Filter” in the “Preprocess” window of the Explorer, select ‘filters’ -> ‘unsupervised’ -> ‘attribute’ -> ‘Discretize’ (use the default parameters of the ‘Discretize’ filter) and click ‘Apply’. Verify that all attributes are now nominal by clicking on individual attributes in the “Attributes” window in “Preprocess”. Then perform attribute ranking on the discretized set using the two attribute-ranking methods with default parameters.
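Equal-width discretization, which case C asks for, divides the range between an attribute's minimum and maximum into intervals of identical width. The function below is a minimal sketch of that idea and returns a bin index for each value. The name `equal_width_bins` is ours, and WEKA's Discretize filter may place values that fall exactly on an interval boundary differently.

```python
def equal_width_bins(values, bins=10):
    """Map each value to a bin index 0..bins-1 using equal-width intervals on [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:
        return [0] * len(values)  # constant attribute: a single bin
    # The maximum would otherwise get index `bins`, so clamp it into the last bin.
    return [min(int((v - low) / width), bins - 1) for v in values]


# Hypothetical values 0..10 split into 5 bins of width 2.
print(equal_width_bins(list(range(11)), bins=5))
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4]
```

Equal width says nothing about how many instances land in each bin. On skewed data, some bins can be crowded while others stay nearly empty.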

D. Go back to “Preprocess” and load the original “iris.arff” data set again. Discretize all non-class attributes into 5 bins of approximately equal height (equal frequency): select the ‘Discretize’ filter, click on it in the “Filter” window, set ‘bins’ to 5 and ‘useEqualFrequency’ to true, and then click ‘Apply’. After you verify that all attributes are nominal, perform attribute ranking on the new set using the two attribute-ranking methods with default parameters.
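In case D, setting `useEqualFrequency` switches to equal-frequency (equal-height) discretization. Here the cut points are chosen so that each bin holds roughly the same number of instances. The sketch below is not WEKA's implementation, and its function names are ours. It takes the ideal boundary positions in sorted order and places each cut midway between neighbouring values. A cut that would separate two equal values is dropped, so tied values stay together. This is why such bins are only close to equal height, and why a sample with many ties can end up with fewer bins than requested.

```python
import bisect


def equal_frequency_cuts(values, bins=5):
    """Cut points that put roughly len(values) / bins values into each bin."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        k = (i * n) // bins  # ideal boundary position in sorted order
        if 0 < k < n and ordered[k - 1] != ordered[k]:
            cut = (ordered[k - 1] + ordered[k]) / 2
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts


def assign_bins(values, cuts):
    """Bin index of each value = number of cut points below it."""
    return [bisect.bisect_left(cuts, v) for v in values]


# Hypothetical data: ten distinct values, then a sample with many ties.
even = list(range(1, 11))
print(assign_bins(even, equal_frequency_cuts(even, bins=5)))
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
tied = [1, 1, 1, 1, 2, 3]
print(assign_bins(tied, equal_frequency_cuts(tied, bins=3)))
# [0, 0, 0, 0, 1, 1]
```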

E. Load the “soybean.arff” data set and perform attribute ranking on it using the two attribute ranking methods with default parameters.
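Because “soybean” has missing values (written as ? in ARFF), an evaluator must decide how to count instances whose attribute value is unknown. To our understanding, WEKA's InfoGainAttributeEval by default spreads such counts over the observed values in proportion to their frequencies. Check the evaluator's parameters in the Explorer for the alternatives. The sketch below does not reproduce WEKA's behaviour. It contrasts two simpler strategies on made-up data: ignoring instances with a missing value, and treating ? as one more nominal value. The point is that this choice alone can change an attribute's score. All names in the code are hypothetical.

```python
from collections import Counter
from math import log2

MISSING = "?"  # ARFF notation for a missing value


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def info_gain(column, labels):
    """InfoGain(C, A) = H(C) - H(C | A) for a nominal attribute column."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ignoring_missing(column, labels):
    """Strategy 1: score the attribute only on instances where its value is known."""
    known = [(v, y) for v, y in zip(column, labels) if v != MISSING]
    if not known:
        return 0.0
    values, classes = zip(*known)
    return info_gain(list(values), list(classes))


def gain_missing_as_value(column, labels):
    """Strategy 2: treat '?' as just another nominal value."""
    return info_gain(column, labels)


# Hypothetical attribute whose missing values all occur with class "b".
column = ["x", "x", "y", "y", "?", "?"]
labels = ["a", "a", "b", "b", "b", "b"]
print(gain_ignoring_missing(column, labels))  # 1.0
print(gain_missing_as_value(column, labels))  # about 0.918
```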

Evaluation

Once you have performed the experiments, spend some time evaluating your results. In particular, try to answer at least the following questions: Why would one need attribute relevance ranking? Do these attribute-ranking methods often agree or disagree? On which data set(s), if any, do these methods disagree? Do discretization and the choice of discretization method affect the results of attribute ranking? Do missing values affect the results of attribute ranking? Record these and any other observations in a Word file called “Observations.doc”.
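One way to make "agree or disagree" concrete is to compare the two rankings for the same data set with a rank correlation. The sketch below computes Kendall's tau, a standard measure chosen here for illustration; WEKA does not report it in this assignment. A value of +1 means both methods order every pair of attributes the same way, and -1 means every pair is reversed. The attribute names are placeholders. In practice you would copy the orderings from the rankings in “output.txt”.

```python
from itertools import combinations


def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two best-to-worst orderings of the same attributes.

    Assumes no ties: each attribute appears exactly once in each list.
    """
    if sorted(order_a) != sorted(order_b) or len(set(order_a)) != len(order_a):
        raise ValueError("both rankings must list the same distinct attributes")
    pos_a = {name: i for i, name in enumerate(order_a)}
    pos_b = {name: i for i, name in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant if both rankings put x and y in the same relative order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 1.0


# Placeholder rankings of four attributes: the methods swap the middle two.
info_gain_order = ["attr1", "attr2", "attr3", "attr4"]
gain_ratio_order = ["attr1", "attr3", "attr2", "attr4"]
print(kendall_tau(info_gain_order, gain_ratio_order))  # 0.6666666666666666
```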

Assignment Submission and Grading

On or before the due date, submit the “output.txt” file with the results of your runs and the “Observations.doc” file as a single zipped file through the Blackboard system. Please follow this submission procedure:

  1. ZIP all files using WinZip;
  2. Name the zipped file as follows: LastnameFirstnameAssign1.zip;
  3. Submit the zipped file through the digital drop box in the Blackboard system.

Grading will be based on the correctness of the results in your output file and on the thoroughness, clarity, and correctness of your observations.

Good luck!