Publications

Demonstrative pronouns in natural discourse.

We examine demonstrative pronouns in a portion of the Santa Barbara Corpus of American English and propose a coding scheme that classifies pronouns with nominal as well as non-nominal antecedents into direct and indirect, depending on whether their referent is the same as the referent/denotation of the antecedent. In agreement with previous studies, we find that demonstratives more often have non-NP antecedents than NP antecedents, the opposite pattern from that of the personal pronouns. Since anaphoric relationships involving non-NP antecedents are more frequently indirect, our scheme allows for a principled explanation for the difference in distribution patterns of demonstratives compared with personal pronouns. We propose that the indirect anaphoric cases are more accessible to reference with demonstratives because demonstratives only require the referent to be activated, not necessarily in focus.

Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. 2004. Demonstrative Pronouns in Natural Discourse. Proceedings of DAARC-2004 (the Fifth Discourse Anaphora and Anaphora Resolution Colloquium), Sao Miguel, Portugal, Sept. 23-24, 2004 (pdf)

A Discourse System for Conversational Characters

This paper describes a discourse system for conversational characters used for interactive stories. This system is part of an environment that allows learners to practice language skills by interacting with the characters, other learners, and native speakers using instant messaging and email. The dialogues are not purely task oriented and, as a result, are difficult to model using traditional AI planners. On the other hand, the dialogues must move the story forward and, thus, systems for the meandering dialogues of chatterbots (for example, AliceBot) are not appropriate. Our approach combines two methods. We use the notion of dialogue games, or speech act networks, to model the local coherence of dialogues. The story moves forward from one dialogue game to another by means of a situated activity planner.

Ron Zacharski. 2003. Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, ed. by Alexander Gelbukh. Heidelberg: Springer-Verlag. 492-495. (pdf)

Embedding Knowledge Elicitation and MT Systems within a Single Architecture

This paper describes Expedition, an environment designed to facilitate the quick ramp-up of MT systems from practically any alphabetic language (L) into English. The central component of Expedition is a knowledge elicitation system that guides a linguistically naive bilingual speaker through the process of describing L in terms of its ecological, morphological, grammatical, lexical, and transfer information. Expedition also includes a module for converting the elicited information into the format expected by the underlying MT system and an MT engine that relies on both the elicited knowledge and resident knowledge about English. The Expedition environment is integrated using a configuration and control system. Expedition represents an innovative approach to answering the need for rapid-configuration MT by preparing an MT system in which the only missing link is information about L, which is elicited in a structured fashion such that it can be directly exploited by the system.

McShane, M., S. Nirenburg, J. Cowie and R. Zacharski. 2002. Embedding knowledge elicitation and MT systems within a single architecture. Machine Translation 17(4):271-305. (pdf)

Definite Descriptions and Cognitive Status in English: Why Accommodation is Unnecessary

A commonly held view of English definite articles is that they signal that the referent of an NP is familiar to the addressee. However, it is well known that not all definite article phrases meet this familiarity requirement. In this paper we argue that the Givenness Hierarchy framework provides an insightful account of all uses of definite article phrases without requiring an appeal to accommodation. Such an account provides a unified treatment of definite article phrases, including demonstrative phrases and personal pronouns, while at the same time distinguishing among them in a principled way. This proposal is supported by results of a corpus-based examination of the use of definite articles and by an examination of cleft presuppositions.

Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. 2001. “Definite Descriptions and Cognitive Status in English: Why Accommodation is Unnecessary.” English Language and Linguistics 5. 273-295. (pdf)

Modularity in knowledge elicitation and language processing

This paper discusses the role of modularity in the knowledge elicitation component of a natural language processing system. The system at hand, Expedition, is intended to develop the capability for fast deployment of a machine translation (MT) system between any so-called “low-density” language (one lacking significant machine-tractable resources) and English. The knowledge-elicitation component of Expedition, called Boas, guides non-expert human informants through questions about the morphology, syntax, lexical stock, and ecology. The linguistic challenges for the developers of Boas can be summarized as follows: how does one gather all the necessary information about all the phenomena that can occur in any natural language in a way that is both understandable to a non-expert informant and machine tractable without post-elicitation human intervention?

McShane, Marjorie and Ron Zacharski. 2001. Proceedings of the Third Annual High Desert Linguistics Conference. University of New Mexico, Albuquerque NM, April 7-9, 2000. 93-104. (pdf)

Statut cognitif et forme des anaphoriques indirects

The prototypical anaphoric expression is one which is interpreted as coreferential with a previous expression in the discourse. However, a nominal phrase may also be ‘linked’ to the previous discourse without being coreferential with a previous expression, and we refer to such expressions as ‘indirect anaphors’. This paper reports on an investigation of the conditions under which pronouns and demonstrative phrases can occur as indirect anaphors.

Jeanette Gundel, Nancy Hedberg and Ron Zacharski. 2000. Verbum 22.79-102. (French PDF) (English PDF)

MT and topic-based techniques to enhance speech recognition

Our principal objective was to reduce the error rate of speech recognition systems used by professional translators. Our work concentrated on Spanish-to-English translation. In a baseline study we estimated the error rate of an off-the-shelf recognizer to be 9.98%. In this paper we describe two independent methods of improving speech recognizers: a machine translation (MT) method and a topic-based one. An evaluation of the MT method suggests that the vocabulary used for recognition cannot be completely restricted to the set of translations produced by the MT system and a more sophisticated constraint system must be used. An evaluation of the topic-based method showed significant error rate reduction, to 5.07%.

Ludovik, Yevgeny and Ron Zacharski. 2000. MT and topic-based techniques to enhance speech recognition systems for professional translators. Proceedings of CoLing 2000, 1061-1065. Saarbrücken, July 31-August 4, 2000. (pdf)

Language recognition for mono- and multi-lingual documents

In this paper we describe language recognition algorithms for mono- and multi-lingual documents that are based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We compare the monolingual algorithm to those suggested by other researchers. This comparison suggests that this algorithm significantly outperforms commonly used language recognition algorithms. We then describe the multilingual algorithm, which allows for segmenting a multilingual document into single language chunks and identifying the languages of those chunks.

Cowie, Jim, Yevgeny Ludovik, and Ron Zacharski. 1999. Language recognition for mono- and multi-lingual documents. Proceedings of the Vextal Conference, 209-214. Venice, November 22-24, 1999. (pdf)
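
The ingredients named in the abstract above (character n-grams, Markov chains, maximum likelihood, dynamic programming) can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it assumes fixed-order character bigrams rather than mixed-order n-grams, add-one smoothing, a two-sentence toy training set, and a hypothetical switch_penalty parameter that discourages changing language between adjacent words. All function names are illustrative.

```python
import math
from collections import Counter

# Toy training text (an assumption for illustration, not data from the paper).
TRAIN = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps "
          "in the house while the rain falls on the street",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato "
          "duerme en la casa mientras la lluvia cae en la calle",
}

def train(text):
    """Count character bigrams: a first-order Markov chain over characters."""
    padded = f" {text} "
    pair_counts = Counter(zip(padded, padded[1:]))
    context_counts = Counter(padded[:-1])
    return pair_counts, context_counts

MODELS = {lang: train(text) for lang, text in TRAIN.items()}
ALPHABET_SIZE = len(set("".join(TRAIN.values())) | {" "})

def log_likelihood(text, lang):
    """Log-probability of text under one language's chain (add-one smoothing)."""
    pair_counts, context_counts = MODELS[lang]
    padded = f" {text} "
    return sum(
        math.log((pair_counts[(a, b)] + 1) / (context_counts[a] + ALPHABET_SIZE))
        for a, b in zip(padded, padded[1:])
    )

def identify(text):
    """Monolingual case: pick the maximum-likelihood language."""
    return max(MODELS, key=lambda lang: log_likelihood(text, lang))

def segment(text, switch_penalty=3.0):
    """Multilingual case: Viterbi-style dynamic programming over words.

    Each word gets a language label; changing label between adjacent words
    costs switch_penalty (a hypothetical tuning parameter).
    """
    words = text.split()
    if not words:
        return []
    langs = list(MODELS)
    # best[lang] = (score, labels) of the best labelling so far ending in lang
    best = {lang: (log_likelihood(words[0], lang), [lang]) for lang in langs}
    for word in words[1:]:
        new_best = {}
        for lang in langs:
            candidates = []
            for prev in langs:
                score, labels = best[prev]
                if prev != lang:
                    score -= switch_penalty
                candidates.append((score, labels))
            score, labels = max(candidates, key=lambda c: c[0])
            new_best[lang] = (score + log_likelihood(word, lang), labels + [lang])
        best = new_best
    labels = max(best.values(), key=lambda c: c[0])[1]
    # Merge adjacent words that share a label into single-language chunks.
    chunks = []
    for word, lang in zip(words, labels):
        if chunks and chunks[-1][0] == lang:
            chunks[-1] = (lang, chunks[-1][1] + " " + word)
        else:
            chunks.append((lang, word))
    return chunks
```

With this toy data, segment("the cat sleeps in the house el gato duerme en la casa") yields one English chunk followed by one Spanish chunk. The switch penalty matters: scored in isolation, a short word such as "cat" is nearly ambiguous between the two toy models, and the penalty keeps it with its neighbours.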

Multilingual Document Language Recognition for Creating Corpora

In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study that showed that the performance of this algorithm has practical value.

Ludovik, Yevgeny and Ron Zacharski. 1999. Multilingual document language recognition. Proceedings of the Machine Translation Summit VII, 317-323. Singapore, September 13-17, 1999. (pdf)

Pragmatic Determinants of Intonation Contours

This paper describes an implemented computational model that generates intonation contours for dialogue systems. It presents a general overview of my dissertation work at the University of Minnesota under Jeanette Gundel and Maria Gini and the continuation of that work at the University of Edinburgh under D. Robert Ladd. My co-author on this paper is Judy Delin, a colleague of mine when I was at the University of Edinburgh.

Delin, Judy, and Ron Zacharski. 1997. Pragmatic Determinants of Intonation Contours for Dialogue Systems. International Journal of Speech Technology 1:109-120. (pdf)