The following is a nearly complete list of my publications. To view publications by topic, use the menu on the left.
The topic of mood and modality (MOD) is a difficult aspect of language description because, among other reasons, the inventory of modal meanings is not stable across languages, moods do not map neatly from one language to another, modality may be realised morphologically or by free-standing words, and modality interacts in complex ways with other modules of the grammar, like tense and aspect. Describing MOD is especially difficult if one attempts to develop a unified approach that not only provides cross-linguistic coverage, but is also useful in practical natural language processing systems. This article discusses an approach to MOD that was developed for and implemented in the Boas Knowledge-Elicitation (KE) system.
McShane, Marjorie, Nirenburg, Sergei, and Ron Zacharski. 2004. Mood and modality: Out of the theory and into the fray. Natural Language Engineering Journal 10.57-89. (pdf)
We examine demonstrative pronouns in a portion of the Santa Barbara Corpus of American English and propose a coding scheme that classifies pronouns with nominal as well as non-nominal antecedents into direct and indirect, depending on whether their referent is the same as the referent/denotation of the antecedent. In agreement with previous studies, we find that demonstratives more often have non-NP antecedents than NP antecedents, the opposite pattern from that of the personal pronouns. Since anaphoric relationships involving non-NP antecedents are more frequently indirect, our scheme allows for a principled explanation for the difference in distribution patterns of demonstratives compared with personal pronouns. We propose that the indirect anaphoric cases are more accessible to reference with demonstratives because demonstratives only require the referent to be activated, not necessarily in focus.
Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. 2004. Demonstrative Pronouns in Natural Discourse. Proceedings of DAARC-2004 (the Fifth Discourse Anaphora and Anaphora Resolution Colloquium), Sao Miguel, Portugal, Sept. 23-24, 2004 (pdf)
This paper describes a discourse system for conversational characters used for interactive stories. This system is part of an environment that allows learners to practice language skills by interacting with the characters, other learners, and native speakers using instant messaging and email. The dialogues are not purely task oriented and, as a result, are difficult to model using traditional AI planners. On the other hand, the dialogues must move the story forward and, thus, systems for the meandering dialogues of chatterbots (for example, AliceBot) are not appropriate. Our approach combines two methods. We use the notion of dialogue games, or speech act networks, to model the local coherence of dialogues. The story moves forward from one dialogue game to another by means of a situated activity planner.
Ron Zacharski. 2003. Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, ed. by Alexander Gelbukh. Heidelberg: Springer-Verlag. 492-495. (pdf)
This paper describes Expedition, an environment designed to facilitate the quick ramp-up of MT systems from practically any alphabetic language (L) into English. The central component of Expedition is a knowledge elicitation system that guides a linguistically naive bilingual speaker through the process of describing L in terms of its ecological, morphological, grammatical, lexical, and transfer information. Expedition also includes a module for converting the elicited information into the format expected by the underlying MT system and an MT engine that relies on both the elicited knowledge and resident knowledge about English. The Expedition environment is integrated using a configuration and control system. Expedition represents an innovative approach to answering the need for rapid-configuration MT by preparing an MT system in which the only missing link is information about L, which is elicited in a structured fashion such that it can be directly exploited by the system.
McShane, M., S. Nirenburg, J. Cowie and R. Zacharski. 2002. Embedding knowledge elicitation and MT systems within a single architecture. Machine Translation 17(4):271-305. (pdf)
A commonly held view of English definite articles is that they signal that the referent of an NP is familiar to the addressee. However, it is well known that not all definite article phrases meet this familiarity requirement. In this paper we argue that the Givenness Hierarchy framework provides an insightful account of all uses of definite article phrases without requiring an appeal to accommodation. Such an account provides a unified treatment of definite article phrases, including demonstrative phrases and personal pronouns, while at the same time distinguishing among them in a principled way. This proposal is supported by results of a corpus-based examination of the use of definite articles and by an examination of cleft presuppositions.
Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. 2001. “Cognitive Status and Definite Descriptions in English: Why Accommodation is Unnecessary.” English Language and Linguistics 5. 273-295. (pdf)
This paper discusses the role of modularity in the knowledge elicitation component of a natural language processing system. The system at hand, Expedition, is intended to develop the capability for fast deployment of a machine translation (MT) system between any so-called “low-density” language (one lacking significant machine-tractable resources) and English. The knowledge-elicitation component of Expedition, called Boas, guides non-expert human informants through questions about the morphology, syntax, lexical stock, and ecology. The linguistic challenges for the developers of Boas can be summarized as follows: how does one gather all the necessary information about all the phenomena that can occur in any natural language in a way that is both understandable to a non-expert informant and machine tractable without post-elicitation human intervention?
McShane, Marjorie and Ron Zacharski. 2001. Proceedings of the Third Annual High Desert Linguistics Conference. University of New Mexico, Albuquerque NM, April 7-9, 2000. 93-104. (pdf)
The prototypical anaphoric expression is one which is interpreted as coreferential with a previous expression in the discourse. However, a nominal phrase may also be ‘linked’ to the previous discourse without being coreferential with a previous expression, and we refer to such expressions as ‘indirect anaphors’. This paper reports on an investigation of the conditions under which pronouns and demonstrative phrases can occur as indirect anaphors.
Jeanette Gundel, Nancy Hedberg and Ron Zacharski. 2000. Verbum 22.79-102. (French PDF) (English PDF)
Our principal objective was to reduce the error rate of speech recognition systems used by professional translators. Our work concentrated on Spanish-to-English translation. In a baseline study we estimated the error rate of an off-the-shelf recognizer to be 9.98%. In this paper we describe two independent methods of improving speech recognizers: a machine translation (MT) method and a topic-based one. An evaluation of the MT method suggests that the vocabulary used for recognition cannot be completely restricted to the set of translations produced by the MT system and a more sophisticated constraint system must be used. An evaluation of the topic-based method showed significant error rate reduction, to 5.07%.
Ludovik, Yevgeny and Ron Zacharski. 2000. MT and topic-based techniques to enhance speech recognition systems for professional translators. Proceedings of CoLing 2000, 1061-1065. Saarbrücken, July 31-August 4, 2000. (pdf)
In this paper we describe language recognition algorithms for mono- and multi-lingual documents that are based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We compare the monolingual algorithm to those suggested by other researchers. This comparison suggests that this algorithm significantly outperforms commonly used language recognition algorithms. We then describe the multilingual algorithm, which allows for segmenting a multilingual document into single language chunks and identifying the languages of those chunks.
Cowie, Jim, Yevgeny Ludovik, and Ron Zacharski. 1999. Language recognition for mono- and multi-lingual documents. Proceedings of the Vextal Conference, 209-214. Venice, November 22-24, 1999. (pdf)
In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study that showed that the performance of this algorithm has practical value.
Ludovik, Yevgeny and Ron Zacharski. 1999. Multilingual document language recognition. Proceedings of the Machine Translation Summit VII, 317-323. Singapore, September 13-17, 1999. (pdf)