
Portfolio Assignment 2

This assignment has two independent parts: a Python part and a Weka part.

Python

The Python part of this portfolio assignment is brief, since you may want to put time into the more programming-intensive Portfolio Assignment 3. Simply finish chapter 2 of PCI (Programming Collective Intelligence), including the del.icio.us and MovieLens datasets. Also do one of exercises 1-4 on p. 28 (exercise 5 is Portfolio Assignment 3). Note: the easiest way to get pydelicious.py is to download the complete Python code for the book.
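Chapter 2 builds its recommendations on similarity scores between people's ratings. As a rough refresher, here is a minimal sketch of one such score, Pearson correlation computed over the items two people have both rated. It assumes ratings are stored as nested dicts (person to item to score); the function name `pearson_similarity` and the toy ratings are hypothetical and are not the book's own code.

```python
from math import sqrt


def pearson_similarity(prefs, p1, p2):
    """Pearson correlation between two people over the items both have rated.

    Returns 0.0 when there is no overlap or when either person's shared
    ratings have zero variance (correlation is undefined in that case).
    """
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0.0
    x = [prefs[p1][item] for item in shared]
    y = [prefs[p2][item] for item in shared]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, all unnormalized (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical toy ratings, for illustration only.
ratings = {
    "alice": {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 4.0},
    "bob": {"Movie A": 4.0, "Movie B": 2.0, "Movie C": 3.0},
    "carol": {"Movie A": 1.0, "Movie B": 5.0},
    "dave": {"Movie D": 3.0},
}

if __name__ == "__main__":
    print(pearson_similarity(ratings, "alice", "bob"))    # bob rates exactly 1 lower: 1.0
    print(pearson_similarity(ratings, "alice", "carol"))  # opposite tastes: -1.0
    print(pearson_similarity(ratings, "alice", "dave"))   # no shared items: 0.0
```

Pearson is used here rather than plain Euclidean distance because it ignores a constant offset between two raters: bob scores everything one point lower than alice but still gets a perfect 1.0.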

Weka

This part focuses on section 10.1 of the textbook, which introduces the Weka toolkit and uses the J48 decision tree algorithm as an example.
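J48 is Weka's Java implementation of C4.5, which ranks candidate splits by information gain normalized into a gain ratio. The sketch below shows only that scoring step, not tree growth, numeric thresholds, or pruning. The class counts are illustrative: a 14-instance, two-class node (9 vs. 5) split three ways, matching the outlook attribute of the classic nominal weather data that ships with Weka.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_gain(parent_counts, child_counts):
    """Entropy of the parent minus the size-weighted entropy of the branches."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder


def gain_ratio(parent_counts, child_counts):
    """Information gain divided by the split info (entropy of branch sizes)."""
    split_info = entropy([sum(child) for child in child_counts])
    if split_info == 0:
        return 0.0
    return info_gain(parent_counts, child_counts) / split_info


# Illustrative counts: [yes, no] at the node and in each of three branches.
parent = [9, 5]
children = [[2, 3], [4, 0], [3, 2]]

if __name__ == "__main__":
    print(round(info_gain(parent, children), 3))   # about 0.247 bits
    print(round(gain_ratio(parent, children), 3))  # about 0.156
```

The gain ratio penalizes attributes that split the data into many small branches, which plain information gain tends to favor.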

Part 1

First, go through section 10.1 and focus on loading the sample data and running J48 on it, as shown in figure 10.2 (this sample data is included when you download Weka). Spend some time understanding the format of the output shown in figure 10.5. There is a nice description of this output here.
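A key part of that output is the confusion matrix, where each row is an actual class and each column is the class it was "classified as". To check that you are reading it correctly, here is a minimal sketch that recomputes overall accuracy plus per-class TP rate and precision from such a matrix. The 2x2 matrix and the function name `summarize_confusion` are hypothetical, not taken from the figure.

```python
def summarize_confusion(matrix, labels):
    """Summarize a confusion matrix laid out as in Weka's output.

    matrix[i][j] = number of instances of actual class i classified as class j.
    Returns (accuracy, {label: (tp_rate, precision)}).
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    accuracy = correct / total
    per_class = {}
    for i, label in enumerate(labels):
        actual = sum(matrix[i])                     # row sum: true members of class i
        predicted = sum(row[i] for row in matrix)   # column sum: predicted as class i
        tp = matrix[i][i]
        tp_rate = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        per_class[label] = (tp_rate, precision)
    return accuracy, per_class


# Hypothetical matrix: rows = actual (yes, no), columns = classified as (yes, no).
m = [[7, 2],
     [3, 2]]

if __name__ == "__main__":
    acc, stats = summarize_confusion(m, ["yes", "no"])
    print(f"accuracy = {acc:.4%}")
    for label, (tp_rate, prec) in stats.items():
        print(label, round(tp_rate, 3), round(prec, 3))
```

The diagonal holds the correctly classified instances, so accuracy is simply the diagonal sum over the total; this is the same quantity Weka reports as the percentage of correctly classified instances.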

Part 2

The dataset section of our website has the Cleveland Heart Disease dataset. I would like you to use this dataset and run through the same process as you did in Part 1. Update: I added the ARFF version of this dataset to the dataset section, which should make this part substantially easier. Examine the performance of C4.5 (J48 in Weka) and report the accuracy of the method. Show and discuss your results.
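An accuracy figure is easier to discuss next to a baseline: the accuracy of always predicting the most common class, which is what Weka's ZeroR classifier does. The following is a minimal sketch, under stated assumptions, that reads a simple ARFF file (dense data, unquoted names and values, class as the last attribute, which is the Explorer's default) and computes that majority-class rate. The tiny ARFF text and its attribute names are hypothetical, not the Cleveland data, and this reader is no substitute for Weka's own loader.

```python
from collections import Counter


def read_arff(text):
    """Very small ARFF reader. Assumes dense data rows and no quoted or
    space-containing attribute names or values. Returns (names, rows)."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append([value.strip() for value in line.split(",")])
    return attributes, rows


def majority_baseline(rows, class_index=-1):
    """Most common class label (ignoring '?' missing values) and its share."""
    labels = Counter(row[class_index] for row in rows if row[class_index] != "?")
    label, count = labels.most_common(1)[0]
    return label, count / sum(labels.values())


# Hypothetical toy file, for illustration only.
TOY_ARFF = """% hypothetical example, not the Cleveland data
@relation toy_heart
@attribute age numeric
@attribute chol numeric
@attribute class {healthy,sick}
@data
63,233,sick
41,204,healthy
57,354,healthy
67,286,sick
45,210,healthy
"""

if __name__ == "__main__":
    names, data = read_arff(TOY_ARFF)
    label, share = majority_baseline(data)
    print(names, len(data))
    print(f"always predicting '{label}' is right {share:.0%} of the time")
```

If J48's accuracy is only slightly above this rate, the tree is adding little beyond the class distribution, which is worth pointing out in your discussion.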

Remember to credit the dataset's principal investigators in your blog entry.

About

A hands-on introductory course on data mining and information retrieval.
