Hackintosh
December 31st, 2009
Over Christmas break I built a Hackintosh. I was partly inspired by my son, Adam, who a few months ago built an Intel i7 920-based Hackintosh using a solid-state drive, an ASUS P6T Deluxe V2 motherboard, 6GB of Corsair memory, and a Sapphire Radeon HD4870 1GB DDR5 Dual DVI / TVO PCI-Express Graphics Card. I was also inspired by a Hackintosh how-to article on Lifehacker. I could have gone the safe route and used the exact components of Adam’s or the Lifehacker build. Instead I decided to build a Hackintosh based on the Intel i7 860 Lynnfield. From reports on tech websites the 860 seems like a slightly better processor than the 920 (for example, this AnandTech review). Both are 4-core/8-thread designs, but the 860 has a slightly higher clock speed and a higher single-core turbo frequency. My build included the Gigabyte GA-P55-UD3R motherboard, 8GB of Patriot memory, and a GeForce 9800 video card (chosen mainly for its compatibility with Snow Leopard). The total cost of the build was around $700 (not including a case, power supply, and CD/DVD drive, which I salvaged from my previous computer, an Ubuntu box I built).
As for installing Snow Leopard, I could not get either the boot CD method or the USB method to work with this build (both methods are described on the tonymacx86 website). The method that did work was to install the OS onto the hard drive using another Mac. My Hackintosh runs the stock 10.6.2 kernel. I used a DSDT specific to my motherboard, available at DSDT Database for P55 Motherboards. To get networking working I used a kext specific to the network chipset. Most of the information on how to do this is available on the tonymacx86 website. Sound does not work, even after trying various kexts specific to the audio chipset. I hope to resolve this by using a USB audio interface. Other than sound, everything seems to be working great!
Arabic Localization
December 26th, 2009
Over the Christmas break I have been looking at words in Standard Arabic that are more common in one region than in another. This is a continuation of work I have been doing with Ahmed Abdelali and Steve Helmreich. Ahmed has collected a corpus of Standard Arabic texts from newspapers in Egypt, Sudan, Libya, Syria, and the UK. In previous work we looked at distinguishing texts from different regions using the frequency of common words (the equivalent of common English words such as at, on, and in). Over this Christmas break I was looking for differences in the frequency of content words (similar to Amazon’s ‘statistically improbable phrases’): words that occur in a region’s texts more frequently than you would expect by chance. I used two statistics, log likelihood and mutual information. Work by Ted Dunning suggests that log likelihood works better for statistically rare events than mutual information does. Currently I am not sure what to make of the results, but here are the top six ‘statistically improbable’ words from each region (using log likelihood):
Sudan المقاولون الساحل برانكو اعداده الموردة استعدادا
Egypt المقاولون الساحل برانكو الموردة التضامن اعداده
UK المقاولون الساحل برانكو اعداده التضامن الامل
Libya مدني الساحل استعدادا الامل الاولمبي حليم
Syria بشار البوكمال المادة النادي الفنان الشاعر
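For readers who want to try this kind of comparison, here is a minimal Python sketch of the two statistics. It is not the code used for the results above. It assumes the simplest setup: each word is scored from a 2x2 table of its count in one region's texts versus its count in all other texts. The function names (`log_likelihood`, `pmi`, `top_words`) and the toy counts are made up for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, n1, n2):
    """Log-likelihood ratio (G2) for a word seen a times in the n1 tokens
    of one corpus and b times in the n2 tokens of a comparison corpus."""
    observed = [[a, n1 - a], [b, n2 - b]]
    row_sums = [n1, n2]
    col_sums = [a + b, (n1 - a) + (n2 - b)]
    total = n1 + n2
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2

def pmi(a, b, n1, n2):
    """Pointwise mutual information (base 2) between the word and the first corpus."""
    if a == 0:
        return float("-inf")
    total = n1 + n2
    p_joint = a / total          # P(word, corpus 1)
    p_word = (a + b) / total     # P(word)
    p_corpus = n1 / total        # P(corpus 1)
    return math.log2(p_joint / (p_word * p_corpus))

def top_words(region_tokens, other_tokens, k=5):
    """Rank words that are relatively more frequent in the region by G2."""
    region, other = Counter(region_tokens), Counter(other_tokens)
    n1, n2 = sum(region.values()), sum(other.values())
    scored = [
        (log_likelihood(count, other[word], n1, n2), word)
        for word, count in region.items()
        if count / n1 > other[word] / n2  # keep over-represented words only
    ]
    return [word for _, word in sorted(scored, reverse=True)[:k]]

# Toy counts: a word seen once vs. a word seen 100 times, each only in corpus 1.
rare = (1, 0, 1000, 1000)
frequent = (100, 0, 1000, 1000)
print(pmi(*rare), pmi(*frequent))
print(log_likelihood(*rare), log_likelihood(*frequent))
```

In this toy case mutual information gives the word seen once essentially the same score as the word seen 100 times, while the log-likelihood score grows with the amount of evidence. That is the behaviour behind the preference for log likelihood on rare events.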
Investigations on Standard Arabic Geographical Classification
September 24th, 2009
I am finally getting around to posting the paper I presented at the Computational Approaches to Arabic Script-based Languages Workshop last month. This paper examines how to build a “give me an Arabic document and I will tell you what country it is from” system. It reports on work that I have done with Ahmed Abdelali and Stephen Helmreich at New Mexico State University. I would like to thank the people at the workshop for their gracious comments. I would particularly like to thank Ali Farghaly and Karine Megerdoomian for organizing the conference.
The formal abstract of the paper is:
This paper reports on a series of studies focused on the geographical classification of Standard Arabic. The aim of these studies was to automatically classify a document based on the author’s country of origin. The studies examined documents from newspapers in five countries: Egypt, Libya, Sudan, Syria, and the U.K. using the frequency of common words for classification. We evaluated ten classification algorithms on this task. The best performing algorithms were bagging C4.5, neural network with back propagation, NBTree, and SMO with a polynomial kernel. These methods were over 99% accurate in geographically classifying the documents.
Abdelali, Ahmed, Steve Helmreich, and Ron Zacharski. 2009. Investigations on Standard Arabic Geographical Classification. Proceedings of the Computational Approaches to Arabic Script-based Languages Workshop, Ottawa, 26 August 2009. (pdf)
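The “frequency of common words” features mentioned in the abstract can be sketched roughly as follows. This is only an illustration, not the feature extraction used in the paper: the function names are hypothetical, the choice of relative frequency over the `k` most frequent words is an assumption, and the toy tokens are English stand-ins rather than corpus data.

```python
from collections import Counter

def most_common_words(documents, k):
    """Pick the k most frequent words across all documents as the feature vocabulary."""
    counts = Counter(token for doc in documents for token in doc)
    return [word for word, _ in counts.most_common(k)]

def common_word_features(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one tokenized document."""
    counts = Counter(tokens)
    n = len(tokens)
    return [counts[word] / n if n else 0.0 for word in vocabulary]

# Toy, already-tokenized documents (hypothetical stand-ins).
docs = [["in", "the", "in"], ["the", "in", "on"]]
vocab = most_common_words(docs, 2)
vectors = [common_word_features(doc, vocab) for doc in docs]
print(vocab, vectors)
```

Each document becomes a fixed-length vector, which could then be handed to a classifier such as the ones the abstract lists.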