Archive for November, 2008

Textbooks for data mining

Sunday, November 16th, 2008

I finally made a decision about which textbook to use for the data mining course I will be teaching in the spring. One challenge was that the course is cross-listed in several departments (computer science, business, and information technology), so the students taking the class will have diverse backgrounds: some strong in statistics, others in programming. My original plan was to skip programming entirely and have students just use Weka, a free data mining tool. I was considering two textbooks: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar; and Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten and Eibe Frank. (more…)

Delivered presentation at the Chicago Digital Humanities Conference

Monday, November 3rd, 2008

About an hour ago I presented a talk titled Linguistic Dumpster Diving: Geographical Classification of Arabic Text. I co-authored the paper with my colleagues at New Mexico State University, Ahmed, Jim, and Steve. I think the talk was well received, and I got a number of great comments and suggestions. Unfortunately, I don’t know the names of everyone who made suggestions, so I can’t credit them all by name. In the talk, I focused primarily on a support vector machine approach to classifying text geographically. (more…)