Linguistic Dumpster Diving: Geographical Classification of Arabic Text

In many text analysis tasks, it is common to remove frequently occurring words during pre-processing. While removing frequent words is appropriate for many text analysis tasks, it is not appropriate for all of them: in some tasks, frequent words play a crucial role. In this paper we examine the use of frequent words to geographically classify Arabic news stories.

Zacharski, Ron; Ahmed Abdelali; Stephen Helmreich; and Jim Cowie. 2009. Linguistic Dumpster Diving: Geographical Classification of Arabic Text. Proceedings of the Chicago Colloquia on Digital Humanities and Computer Science.
