Portfolio Assignment 2
This assignment has two independent parts: a Python part and a Weka part.
Python
The Python part of this portfolio assignment is brief, since you may want to put your time into the more programming-intensive Portfolio Assignment 3. Simply finish up Chapter 2 of PCI, including the del.icio.us and MovieLens datasets. Also do one of exercises 1-4 on p. 28 (exercise 5 is Portfolio Assignment 3). Note: The easiest way to get pydelicious.py is to download the complete Python code from the book.
Weka
This project focuses on Section 10.1 of the textbook, which introduces the Weka toolkit and uses the J48 decision tree algorithm as its example.
Part 1
First, go through 10.1 and focus on loading and running J48 on the sample data as shown in figure 10.2. (This sample data is included when you download Weka.) Spend some time understanding the format of the output as shown in figure 10.5. There is a nice description of this very output here.
Part 2
In the dataset section of our website, I have posted the Cleveland Heart Disease dataset. I would like you to use this dataset and run through the same process as you did in Part 1. Update: I added an ARFF-format version of this dataset to the dataset section, which should make this part substantially easier. I want you to examine the performance of C4.5 (J48 in Weka) and report on the accuracy of the method. Show and discuss your results.
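If it helps when writing up the accuracy, here is a minimal Python sketch of how an accuracy figure can be derived from a confusion matrix like the one Weka prints with its J48 output. It assumes rows are actual classes and columns are predicted classes. The function name `accuracy` and the counts are made up for illustration; they are not results from the Cleveland data.

```python
def accuracy(confusion):
    """Return the fraction of correctly classified instances.

    `confusion` is a square list of lists where confusion[i][j] is the
    number of instances of actual class i that were predicted as class j
    (an assumed layout; check it against your own output).
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no instances")
    # Correct predictions sit on the diagonal (actual == predicted).
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical two-class counts, for illustration only.
example = [
    [40, 10],  # actual class a: 40 predicted a, 10 predicted b
    [5, 45],   # actual class b: 5 predicted a, 45 predicted b
]
print(f"accuracy = {accuracy(example):.2%}")
```

Accuracy alone can hide how the errors are split between classes, so it is worth discussing the off-diagonal counts as well.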
Remember to credit the lead people in your blog entry.