<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ron Zacharski</title>
	<atom:link href="http://www.zacharski.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.zacharski.org</link>
	<description></description>
	<lastBuildDate>Wed, 01 Feb 2012 23:15:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Cluster by Night</title>
		<link>http://www.zacharski.org/2012/01/05/cluster-by-night/</link>
		<comments>http://www.zacharski.org/2012/01/05/cluster-by-night/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 02:32:15 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=2250</guid>
		<description><![CDATA[If you are a student looking for a cool individual study project or are just interested in a project for its own sake, you might consider updating the existing resource, Cluster by Night. Cluster by Night (CbN) is a live CD approach to setting up an HPC (High Performance Computing) cluster for MPI work. (MPI [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.zacharski.org/wp-content/uploads/2012/01/photo2-e1325729842910.jpg"><img class="alignleft size-thumbnail wp-image-2253" style="border-image: initial; border-width: 10px; border-color: #211e1f; border-style: solid;" title="photo" src="http://www.zacharski.org/wp-content/uploads/2012/01/photo2-e1325729842910-150x150.jpg" alt="" width="150" height="150" /></a>If you are a student looking for a cool individual study project or are just interested in a project for its own sake, you might consider updating the existing resource, <a href="http://www.dirigibleflightcraft.com/CbN/">Cluster by Night</a>. Cluster by Night (CbN) is a live CD approach to setting up an HPC (High Performance Computing) cluster for MPI work. (MPI is a programming library that allows you to write programs for computing clusters.) We&#8217;ve used CbN for the last several years in our operating systems class. What distinguishes CbN from other approaches (for example, the popular <a href="http://idea.uab.es/mcreel/PelicanHPC/">PelicanHPC</a>) is that it can work with an existing network. With other approaches the master node on the cluster hands out IP addresses; with CbN the cluster nodes receive their IP addresses from the existing DHCP server. I think Cluster by Night is an awesome resource.</p>
<p><span style="color: #ffffff;">How can you help?<span id="more-2250"></span></span></p>
<p>Here&#8217;s the thing. The last time CbN was updated was several years ago. It doesn&#8217;t work on computers that have new network cards. So, for example, it doesn&#8217;t work on the desktops in our computer lab. Bummer. Getting this updated is not rocket science. Right now, CbN uses Tiny Core Linux v2.2. We would need to update that to the current version, 4.2. CbN also uses an outdated version of the Open MPI library. It would be nice if we could update that. There are several other enhancements we could make. The work is incremental, so there are nicely defined, independent tasks. And there is yet another reason you might consider this &#8230;</p>
<p><span style="color: #ff9900;">You will be doing something that will be used by a community of people.</span></p>
<p>Another cool thing about this project is that there is a high probability that it will be used by a fair number of people. Often undergrad projects, once completed, are put on the shelf and ignored. Here&#8217;s a chance to make a difference! If you are interested, contact me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2012/01/05/cluster-by-night/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>topiCS paper &#8211; Accepted!</title>
		<link>http://www.zacharski.org/2011/04/06/topics-paper-accepted/</link>
		<comments>http://www.zacharski.org/2011/04/06/topics-paper-accepted/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 13:02:46 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1677</guid>
		<description><![CDATA[Jeanette Gundel, Nancy Hedberg, and I finally got our paper, Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions accepted to the journal, Topics In Cognitive Science, a journal of the Cognitive Science Society. It will be appearing in the special issue on &#8220;Production of Referring Expressions: Bridging the Gap between Computational and Empirical [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://linguistics.umn.edu/people/profile.php?UID=gunde003">Jeanette Gundel</a>, <a href="http://www.sfu.ca/linguistics/people/faculty/hedberg.html">Nancy Hedberg</a>, and I finally got our paper, <a href="http://www.zacharski.org/papers/TopiCS.pdf">Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions</a> accepted to the journal, <a href="http://www.cognitivesciencesociety.org/journal_topics.html">Topics In Cognitive Science</a>, a journal of the Cognitive Science Society. It will be appearing in the special issue on &#8220;Production of Referring Expressions: Bridging the Gap between Computational and Empirical Approaches to Reference.&#8221; Within the Givenness Hierarchy framework we outlined in our 1993 paper, lexical items included in referring forms are assumed to conventionally encode two kinds of information: conceptual information about the speaker&#8217;s intended referent and procedural information about the assumed cognitive status of that referent in the mind of the addressee. In this current paper we explore the role of underspecification of cognitive status in reference processing. We show how this framework accounts for a number of experimental results in the literature.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2011/04/06/topics-paper-accepted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Presented Workshop in Kansas</title>
		<link>http://www.zacharski.org/2011/03/10/presented-workshop-in-kansas/</link>
		<comments>http://www.zacharski.org/2011/03/10/presented-workshop-in-kansas/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 01:40:19 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1693</guid>
		<description><![CDATA[I just presented a half-day session on data and text mining at the Digital Jumpstart Workshop at the University of Kansas. I am grateful to the co-directors of the Institute for Digital Research in the Humanities for inviting me to this event, which was open to KU faculty, staff, and graduate students. Links to the [...]]]></description>
			<content:encoded><![CDATA[<p>I just presented a half-day session on data and text mining at the Digital Jumpstart Workshop at the University of Kansas. I am grateful to the co-directors of the Institute for Digital Research in the Humanities for inviting me to this event, which was open to KU faculty, staff, and graduate students. Links to the resources I covered at the workshop are available at <a href="http://guidetodatamining.com/resource/digital-humanities/">Resources for the Digital Jumpstart Workshop</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2011/03/10/presented-workshop-in-kansas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kansas Corpus Linguistics Talk</title>
		<link>http://www.zacharski.org/2011/03/02/kansas-corpus-linguistics-talk/</link>
		<comments>http://www.zacharski.org/2011/03/02/kansas-corpus-linguistics-talk/#comments</comments>
		<pubDate>Wed, 02 Mar 2011 14:32:06 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1732</guid>
		<description><![CDATA[I just presented an invited paper &#8220;Don&#8217;t throw the analysis out with the bath water: Lessons learned from Modern Standard Arabic geographical classification&#8221; at the University of Kansas. The talk was sponsored by the departments of Linguistics and Slavic Languages. The abstract is as follows: In corpus linguistics we throw out information. For example, in [...]]]></description>
			<content:encoded><![CDATA[<p>I just presented an invited paper &#8220;Don&#8217;t throw the analysis out with the bath water: Lessons learned from Modern Standard Arabic geographical classification&#8221; at the University of Kansas. The talk was sponsored by the departments of Linguistics and Slavic Languages. The abstract is as follows:</p>
<p>In corpus linguistics we throw out information. For example, in collecting corpora we necessarily omit some information about the extralinguistic context and only record that which we consider relevant for the purpose of our current research. In the analysis stage, we often remove data without thinking. One clear example of this is the routine practice of removing frequent words (commonly referred to as ‘stop words’) in a pre-processing step before analysis. In this talk I describe my work in Modern Standard Arabic geographical classification to illustrate the importance of being more mindful when we make these decisions about what to keep and what to discard. For example, I will show that it is possible to geographically classify text solely using words that some researchers have described as being fluff, superfluous, and non-significant. I will also describe how the paucity of metadata in commonly available Arabic corpora hampers research such as this.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2011/03/02/kansas-corpus-linguistics-talk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>API Workshop</title>
		<link>http://www.zacharski.org/2011/02/27/api-workshop/</link>
		<comments>http://www.zacharski.org/2011/02/27/api-workshop/#comments</comments>
		<pubDate>Sun, 27 Feb 2011 14:23:39 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1730</guid>
		<description><![CDATA[I just attended, as an invited participant, the Maryland Institute for Technology in the Humanities&#8217; API Workshop, which was held on February 25th and 26th. The workshop alternated between presentations, lightning talks, and what the organizers called &#8216;unconferencing&#8217;. The highlights for me were the talks given by Mano Marks on Google&#8217;s Maps API, Google&#8217;s Fusion [...]]]></description>
			<content:encoded><![CDATA[<p>I just attended, as an invited participant, the Maryland Institute for Technology in the Humanities&#8217; API Workshop, which was held on February 25th and 26th. The workshop alternated between presentations, lightning talks, and what the organizers called &#8216;unconferencing&#8217;. The highlights for me were the talks given by Mano Marks on Google&#8217;s Maps API, Google&#8217;s <a href="http://www.google.com/fusiontables/public/tour/index.html">Fusion Tables</a>, and <a href="http://code.google.com/p/google-refine/">Google Refine</a>. I&#8217;ve spent more time than I care to remember cleaning up language data. For example, I spent weeks cleaning up a Guarani lexicon. Google Refine is a tool that helps automate that process. If I had used it for the Guarani lexicon I would have been done in a day! Google Fusion Tables is an amazingly easy way to create map mashups just using a spreadsheet. The maps you create can be embedded on your web page. Even though I am gushing about Mano Marks&#8217; talks, the other presentations were equally valuable.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2011/02/27/api-workshop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chicago Colloquium for Digital Humanities</title>
		<link>http://www.zacharski.org/2010/11/24/chicago-colloquium-for-digital-humanities/</link>
		<comments>http://www.zacharski.org/2010/11/24/chicago-colloquium-for-digital-humanities/#comments</comments>
		<pubDate>Wed, 24 Nov 2010 14:13:23 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1727</guid>
		<description><![CDATA[I just got back from attending the Chicago Colloquium for Digital Humanities and Computer Science (21-22 of November).  I presented the paper &#8220;Language Preservation: A case study in collecting and digitizing machine-tractable language data.&#8221; The paper was about work I have done with Jim Cowie and Steve Helmreich of New Mexico State University on our [...]]]></description>
			<content:encoded><![CDATA[<p>I just got back from attending the Chicago Colloquium for Digital Humanities and Computer Science (21-22 of November). I presented the paper &#8220;Language Preservation: A case study in collecting and digitizing machine-tractable language data.&#8221; The paper was about work I have done with Jim Cowie and Steve Helmreich of New Mexico State University on our efforts to collect resources for lesser-studied languages. It reported on work we have done on the Paraguayan indigenous language Guarani, and Uighur, an Altaic Turkic language spoken in the Xinjiang province of China.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2010/11/24/chicago-colloquium-for-digital-humanities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>THATCamp Chicago</title>
		<link>http://www.zacharski.org/2010/11/20/thatcamp-chicago/</link>
		<comments>http://www.zacharski.org/2010/11/20/thatcamp-chicago/#comments</comments>
		<pubDate>Sat, 20 Nov 2010 14:38:49 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1736</guid>
		<description><![CDATA[I was an invited participant at THATCamp Chicago (The Humanities and Technology Camp), &#8220;a user-generated unconference where humanists and technologists work together for the common good,&#8221; which was held on November 20th. I participated in a number of great sessions. Of particular interest to me was the GeoTools/GIS session. Jo Guldi, a historian at Harvard, [...]]]></description>
			<content:encoded><![CDATA[<p>I was an invited participant at THATCamp Chicago (The Humanities and Technology Camp), &#8220;a user-generated unconference where humanists and technologists work together for the common good,&#8221; which was held on November 20th. I participated in a number of great sessions. Of particular interest to me was the GeoTools/GIS session. <a href="http://www.joguldi.com/">Jo Guldi</a>, a historian at Harvard, was interested in what she calls &#8216;geo-parsing&#8217;&#8211; identifying place names in text. She is interested in detecting subaltern agency in Britain by analyzing books published between 1848 and 1919. It sounds like a fun named entity extraction task and I volunteered to help her. I also attended sessions on Git and XML/TEI.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2010/11/20/thatcamp-chicago/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>topiCS paper &#8211; revised</title>
		<link>http://www.zacharski.org/2010/11/05/topics-paper-finished/</link>
		<comments>http://www.zacharski.org/2010/11/05/topics-paper-finished/#comments</comments>
		<pubDate>Fri, 05 Nov 2010 18:28:01 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1395</guid>
		<description><![CDATA[Jeanette Gundel, Nancy Hedberg, and I just finished a revision of our paper: Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions and resubmitted it to the journal, Topics In Cognitive Science.]]></description>
			<content:encoded><![CDATA[<p>Jeanette Gundel, Nancy Hedberg, and I just finished a revision of our paper: <a href="http://www.zacharski.org/papers/TopiCS.pdf">Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions</a> and resubmitted it to the journal, Topics In Cognitive Science.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2010/11/05/topics-paper-finished/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Underspecification of Cognitive Status in Reference Production</title>
		<link>http://www.zacharski.org/2010/11/05/topics-paper-done/</link>
		<comments>http://www.zacharski.org/2010/11/05/topics-paper-done/#comments</comments>
		<pubDate>Fri, 05 Nov 2010 18:14:20 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[Publications]]></category>
		<category><![CDATA[Referring Expressions]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1386</guid>
		<description><![CDATA[Within the Givenness Hierarchy framework of Gundel, Hedberg, &#38; Zacharski (1993), lexical items included in referring forms are assumed to conventionally encode two kinds of information: conceptual information about the speaker’s intended referent and procedural information about the assumed cognitive status of that referent in the mind of the addressee, the latter encoded by various [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" src="http://www.zacharski.org/papers/topics.png" alt="" width="161" height="233" /></p>
<p><!-- p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px 'Times New Roman'} -->Within the Givenness Hierarchy framework of Gundel, Hedberg, &amp; Zacharski (1993), lexical items included in referring forms are assumed to conventionally encode two kinds of information: conceptual information about the speaker’s intended referent and procedural information about the assumed cognitive status of that referent in the mind of the addressee, the latter encoded by various determiners and pronouns. The current work focuses on effects of underspecification of cognitive status, establishing that, while salience and degree of accessibility play an important role in reference processing, the Givenness Hierarchy itself is not a hierarchy of degrees of salience/accessibility, contrary to what has often been assumed. We thus show that the framework is able to account for a number of experimental results in the literature without making additional assumptions about form-specific constraints associated with different referring forms.</p>
<p>Gundel, Jeanette K., Nancy Hedberg, and Ron Zacharski. Forthcoming. Underspecification of Cognitive Status in Reference Production: Some Empirical Predictions. Topics in Cognitive Science (<a href="http://www.zacharski.org/papers/TopiCS.pdf">pdf</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2010/11/05/topics-paper-done/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language Preservation</title>
		<link>http://www.zacharski.org/2010/09/01/abstract-submitted-language-preservation/</link>
		<comments>http://www.zacharski.org/2010/09/01/abstract-submitted-language-preservation/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 03:06:19 +0000</pubDate>
		<dc:creator>raz</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://www.zacharski.org/?p=1247</guid>
		<description><![CDATA[My colleagues (Jim Cowie and Steve Helmreich of New Mexico State University) and I just submitted a paper titled &#8220;Language Preservation: A case study in collecting and digitizing machine-tractable language data&#8221; to the Chicago Colloquium. The abstract is: In this paper we describe a process for collecting and digitizing machine-tractable resources for lesser-studied languages. We [...]]]></description>
			<content:encoded><![CDATA[<p>My colleagues (Jim Cowie and Steve Helmreich of New Mexico State University) and I just submitted a paper titled &#8220;Language Preservation: A case study in collecting and digitizing machine-tractable language data&#8221; to the Chicago Colloquium. The abstract is:</p>
<p>In this paper we describe a process for collecting and digitizing machine-tractable resources for lesser-studied languages. We illustrate this process by using examples from the Paraguayan indigenous language Guarani, and Uighur, an Altaic Turkic language spoken in the Xinjiang province of China. By &#8216;machine-tractable&#8217; we mean that in addition to being readable by people, the resource can also be processed by a computational tool. Our goal in acquiring these resources is to use them for quick ramp-up machine translation. These resources are also useful to scholars who are studying these particular languages.<span id="more-1247"></span></p>
<p>In previous work we developed a complex web-based acquisition system, Boas, that guided linguistically-naive language informants through the process of acquiring descriptive knowledge about the parameter inventory for a particular language. For example, through a set of guided examples, morphological parameters such as number, gender, and case would be elicited from the informant.  The system was designed to be used for any of the world&#8217;s languages and, as a result, the elicitation process was complex. Most of our language informants understandably grew tired of the process and the quality of the resources we collected suffered. These experiences led us to a different methodology for our current collection efforts. Instead of guiding acquirers through an elicitation process designed to gather knowledge about parameters, acquirers are used to construct basic resources for a language including:</p>
<p>a monolingual text corpus of at least 250,000 words</p>
<p>a parallel bilingual (with English) text corpus of at least 250,000 words</p>
<p>a bilingual lexicon of at least 10,000 headwords or lemmas</p>
<p>a small manually-annotated part of speech tagged corpus</p>
<p>a small manually-annotated named entity tagged corpus (a corpus where the proper names are tagged with their classification)</p>
<p>a morphological analyzer</p>
<p>Boas, as described above, was a web-based application and as a result had all the benefits of a modern web application. This had a number of direct advantages for language preservation projects. For example, acquirers could use the software from the nearest browser. It enabled the developers to fix defects promptly without having to distribute updates which users would need to install. Finally it facilitated the central storage of linguistic information. However, a significant downside was that it required acquirers to have adequate connection to the Internet and this could be a considerable obstacle for some acquirers living in remote areas.</p>
<p>In this paper we contrast our previous work which focused on developing a sophisticated tool with our current work which embodies a number of principles of language acquisition including:</p>
<p><span style="color: #3366ff;">1. Pragmatic use of language acquirers.</span> The creation of a team to carry out the acquisition task is often tricky. The languages we are considering are not normally supported by large translation agencies. In addition, language acquirers for many of the languages we have been working on are rare in the United States. There may be a population of refugees, but their linguistic expertise and their grasp of English may be poor. This, however, is not always true and we have carried out acquisition efforts for Uighur and Chechen by finding a few people willing to undertake what is a very heavy workload. There may also be an occasional academic who has the language and linguistic skills needed, but is usually too busy for the chore of actual acquisition. This type of expert, however, may be a good choice for validating some subset of the acquired corpora. At this point we need to start considering acquirers in the country of use. These acquirers are in their linguistic community, but pose unique problems in terms of training. In the case of Guarani, for example, we worked with Idelguap, the Instituto de la Lingüística Guaraní del Paraguay, where it was possible to visit and carry out acquisition training. Concepts such as corpora were completely unknown to our team of acquirers; they imagined this to be some sort of word list. Other issues, such as part-of-speech inventory, had to be agreed upon by the entire geographically dispersed team. It is necessary to identify one person who has a good grasp of the ideas after training and who can support the remainder of the team in the acquisition effort. Our development teams have usually emerged serendipitously and are a combination of computer-capable linguists and bilingual speakers.</p>
<p><span style="color: #3366ff;">2. Use a bridge language only if absolutely necessary.</span> For many lesser-studied languages, it is difficult to find native speakers who are bilingual in their language and English. In such cases we use a bridge language. Speakers of Guarani are primarily bilingual in Spanish, and none are bilingual in English and Guarani. In cases such as this, we use Spanish as a &#8220;bridge language&#8221; both for within-project communication, and also for acquisition. We expect that this will be an increasingly common way of acquiring resources as the languages for which resources are being acquired become less and less well-known and are spoken by fewer and fewer people. The main problem is addressing ambiguity issues when developing lexical resources. Some of this can be resolved using the corpora by a speaker of the bridge language and English, but often it involves long discussions between the primary developer and the secondary (to English) developer. In these cases Skype becomes an essential tool of the acquisition process.</p>
<p><span style="color: #3366ff;">3. PC-based application</span>. We opted to replace the &#8220;run from the nearest browser&#8221; approach used in the Boas system with a mobile solution that enabled the acquirer to be nearly untethered from the web. This aspect was particularly important for our Guarani acquirers as high bandwidth internet connections are not widespread in Paraguay. This also allowed the system to be taken into the field. At times convenient to the acquirers, they can connect to our web-server and upload the resources they have collected to a centralized store. The quality of the resources is automatically checked during the upload process.</p>
<p><span style="color: #3366ff;">4. Favor applications and interfaces known to the acquirers.</span> From the acquirer&#8217;s perspective, there is a strong preference for tools that run on the operating system they are most familiar with. In the case of our Guarani acquirers this was Windows 98 and for our Chechen acquirers this was Windows XP SP2. Our design criterion is that all our applications will run on the acquirers&#8217; native operating system.</p>
<p><span style="color: #3366ff;">5. Keep things simple</span>. The people in our lab, us included, have a passion for building computational tools. The Boas system mentioned above embodied a substantial amount of linguistic information in its acquisition tools and took over 12 person-years to develop. Echoing point 4, acquirers prefer familiar applications and interfaces. From the acquirer&#8217;s perspective, the best tool is invisible, allowing them to focus on the acquisition task. Our current acquisition effort makes use of common applications (when possible) that are familiar to the acquirers. For example, instead of a lexicon acquisition tool which has knowledge of inherent features and irregular inflectional paradigms, our current lexicon acquisition is done using a standard spreadsheet template with a handful of macros.</p>
<p><span style="color: #3366ff;">6. Preference for open source solutions.</span> In an effort to have the acquisition effort continue and thrive after our initial funding expires, with local language communities taking over the effort, we prefer to use and develop tools and language resources that are open source. All the tools and resources that we develop in-house are under a Creative Commons Attribution Non-Commercial license which allows others to tweak and build upon our work.</p>
<p>If you have any comments or suggestions please email me at the address in the footer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.zacharski.org/2010/09/01/abstract-submitted-language-preservation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

