Submitted paper to topiCS

March 5, 2010

Jeanette Gundel (University of Minnesota), Nancy Hedberg (Simon Fraser University), and I just submitted a paper to the new journal topiCS (Topics in Cognitive Science). This consumed pretty much my entire spring break. The title of the paper is Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions. Here’s the abstract. >>

Arabic Localization

December 26, 2009

Over the Christmas break I have been looking at words in Standard Arabic that are more common in one region than in another. This is a continuation of work I have been doing with Ahmed Abdelali and Steve Helmreich. Ahmed has collected a corpus of Standard Arabic texts from newspapers in Egypt, Sudan, Libya, Syria, and the UK. In previous work we looked at distinguishing texts from different regions using the frequency of common words (the equivalent of common English words such as at, on, and in). In this work over Christmas break, I was looking for differences in the frequency of content words (similar to Amazon’s ‘statistically improbable phrases’), that is, words that occur in texts more frequently than you would expect by chance. >>
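To make the idea of "more frequent than you would expect by chance" concrete, here is a minimal sketch in Python. It is not necessarily the method used in this work: it assumes a log-likelihood score (the simplified two-term form common in corpus linguistics), and the function names and the toy English tokens are purely illustrative stand-ins for tokenized regional newspaper text.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Score how surprising it is to see a word a times in corpus A
    and b times in corpus B, given the sizes of the two corpora."""
    # Counts we would expect if the word were equally likely in both corpora.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2 * score


def distinctive_words(region_tokens, other_tokens, top_n=5):
    """Return up to top_n words that are overrepresented in region_tokens
    relative to other_tokens, ranked by log-likelihood score."""
    if not region_tokens or not other_tokens:
        return []
    region_counts = Counter(region_tokens)
    other_counts = Counter(other_tokens)
    total_r, total_o = len(region_tokens), len(other_tokens)
    scored = []
    for word, a in region_counts.items():
        b = other_counts[word]
        # Keep only words whose relative frequency is higher in the region.
        if a / total_r > b / total_o:
            scored.append((log_likelihood(a, b, total_r, total_o), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


# Toy data: placeholder tokens, not drawn from the actual corpus.
region = ["nile"] * 5 + ["the"] * 10 + ["city"] * 2
other = ["the"] * 20 + ["city"] * 4 + ["river"] * 3
print(distinctive_words(region, other))
```

A word that is merely common (like "the" in the toy data) gets filtered out because its relative frequency is no higher in the region than elsewhere. Only words that are both frequent and regionally skewed rise to the top.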

Delivered paper at the Computational Approaches to Arabic Script-based Languages workshop

August 28, 2009

I presented the paper Investigations on standard Arabic geographical classification at the Computational Approaches to Arabic Script-based Languages workshop. Immediately before my talk, I convinced myself that the paper was not related to the conference topic and that it was simplistic. However, it seems that it was well received. >>

Paper accepted to Arabic Script Languages Workshop

July 23, 2009

I am grateful that the paper Ahmed Abdelali, Steve Helmreich, and I worked on was accepted at the Computational Approaches to Arabic Script-based Languages workshop to be held August 26th in Ottawa (workshop program). I would also like to thank the three reviewers for their helpful comments. Here is the conclusion. >>

Paper submitted to the Arabic Script-based Languages Workshop

May 21, 2009

Ahmed Abdelali, Steve Helmreich, and I just submitted a paper to CAASL3: Computational Approaches to Arabic Script-based Languages, to be held in Ottawa on August 26th. It reports on work we have done on geographical classification of Arabic text. We presented a paper on this topic at the Chicago Colloquium on Digital Humanities and Computer Science back in November 2008 (Linguistic Dumpster Diving: Geographical Classification of Arabic Text – pdf). At that colloquium a number of people gave us good suggestions and criticisms. Our work since then has involved investigating those suggestions and addressing the criticisms. For example, one individual suggested we look at non-linear methods of classification. >>

Paper submitted to the Machine Translation Summit

April 30, 2009

As I mentioned in previous posts, I developed (with tremendous help from Adam Zacharski) a cross-language instant messaging system using Adobe Flex. This system provides concurrent real-time translation for instant messaging using multiple machine translation engines. During this last academic year, Bill Ogden, my colleague in New Mexico, and several people in his lab (Sieun An and Yuki Ishikawa) used this system to evaluate the performance of machine translation systems based on how effective they were in helping people accomplish shared tasks. They used paid participants who worked in pairs (one Japanese speaker paired with a native English speaker) to accomplish a photo identification task using this instant messaging system. We just submitted a paper describing the results of this work to the Machine Translation Summit in Ottawa in August.

Playing with the Stanford Log-linear Part-Of-Speech Tagger

March 4, 2009

I would like to create a part-of-speech tagger for Paraguayan Guarani. Initially I thought I would use the Brill part-of-speech tagger, but it seems to have vanished from the web. In my search, I ran across the Stanford Log-linear Part-Of-Speech Tagger. It was developed by Chris Manning’s group and I figured anything developed by Chris Manning is probably exceptional. I downloaded it and ran the included English part-of-speech tagger on a 250k text (a public domain Tom Swift book). >>

Delivered presentation at the Chicago Digital Humanities Conference

November 3, 2008

About an hour ago I presented the talk titled Linguistic Dumpster Diving: Geographical Classification of Arabic Text. I co-authored this paper with my colleagues at New Mexico State University, Ahmed, Jim, and Steve. I think the talk was well received, and I got a number of great comments and suggestions. Unfortunately, I don’t know the names of all the people who made suggestions so I can’t credit them all by name. In the talk, I primarily focused on a support vector machine approach to geographically classifying text. >>

Linguistic Dumpster Diving

September 21, 2008

I am grateful to have the paper I wrote with my colleagues accepted at the Chicago Colloquium on Digital Humanities and Computer Science to be held Nov 1-3. Here’s the abstract. >>

Defcon

August 12, 2008

I just returned from the Defcon conference in Las Vegas. The conference deals with hacking and computer security (the organizers call it “real time social networking for ninjas” and Wired calls it the “world’s largest computer security convention”). This year a federal judge prevented three MIT students from giving their talk on how to hack the smart cards used by the Boston subway system. Fortunately, their entire detailed presentation was included on the conference CD. I went to a number of talks dealing with penetration testing. Joe Cicero talked about hacking into the typical web applications used by universities. Nathan Hamiel and Shawn Moyer gave an excellent talk on attacking social networks. Most related to my work were a talk on breaking into SCADA systems and a talk on scanning for active ports on the internet. >>