
Portfolio Assignment 4

Clustering

You can do this assignment in any language.

Part 1

You are to work through the material in chapter 3 of PCI (Programming Collective Intelligence). This chapter covers clustering, spidering, and visualization. On your first pass through the chapter, you might want to skip the spidering and just use the author's dataset.

Part 2

Create a dataset suitable for clustering, then run both hierarchical and k-means clustering on it. For example, you can build the dataset using the del.icio.us API, the IMDb collection, or Last.fm. The only criterion is that the data be suitable for clustering. You can do this part as a team, or you can find a partner interested in the same data. In addition to blogging about it, you should be able to present what you did in class.
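To make the two techniques concrete, here is a minimal sketch in plain Python. It is not the code from the book: it assumes Euclidean distance and merges clusters by centroid, and the function names (euclidean, centroid, hierarchical, kmeans) and the six-row toy dataset are made up for illustration. Your own dataset would take the place of rows, with one numeric vector per item. The k-means starting centers are passed in explicitly so the run is reproducible; random initialization is the more usual choice.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def hierarchical(rows, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    centroids are closest, stopping when k clusters remain (k >= 1)."""
    clusters = [[i] for i in range(len(rows))]  # every row starts alone
    while len(clusters) > k:
        best = None  # (distance, i, j) for the closest pair found so far
        for i in range(len(clusters)):
            ci = centroid([rows[m] for m in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([rows[m] for m in clusters[j]])
                d = euclidean(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def kmeans(rows, centers, max_iter=100):
    """Lloyd's algorithm: assign each row to its nearest center, move each
    center to the mean of its rows, and repeat until nothing changes."""
    centers = [list(c) for c in centers]  # copy so the caller's data is untouched
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: euclidean(row, centers[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        for c in range(len(centers)):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # leave a center where it is if it attracted no rows
                centers[c] = centroid(members)
    return labels


# Made-up toy data: two obvious groups of 2-D points.
rows = [
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
]

groups = hierarchical(rows, 2)
labels = kmeans(rows, [rows[0], rows[3]])
print(groups)  # [[0, 1, 2], [3, 4, 5]]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Hierarchical clustering returns groups of row indices, while k-means returns one cluster label per row. On data this cleanly separated both agree. On a real dataset they often will not, and that difference is worth discussing in your blog post.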

Part 3

I'd like you to start exploring visualization. As a team, decide on either many-eyes.com or Processing, a programming language frequently used for data visualization. Spend an evening exploring it. I will ask teams to give a class presentation. The next portfolio assignment focuses on visualization.

About

A hands-on introductory course on data mining and information retrieval.
