Publications on Machine Translation

User choice as an evaluation metric for cross language IM

A method for evaluating MT performance embedded in Cross-Language Instant Messaging (CLIM) systems is presented. A web interface that provided concurrent real-time translation for instant messaging from multiple MT services was developed and used by paid participants to collaborate on a photo identification task. The method showed a task performance benefit due to the availability of multiple translation alternatives.

Ogden, William, Ron Zacharski, Sieun An and Yuki Ishikawa. 2009. User choice as an evaluation metric for web translation services in cross language instant messaging applications. Proceedings of the Machine Translation Summit XII. Ottawa, Canada. (pdf)

Guarani: A case study in resource development for quick ramp-up MT

In this paper we describe a set of processes for the acquisition of resources for quick ramp-up machine translation (MT) from any language lacking significant machine-tractable resources into English, using the Paraguayan indigenous language Guarani, as well as Amharic and Chechen, as examples.

Abdelali, Ahmed; James Cowie; Steve Helmreich; Wanying Jin; Maria Pilar Milagros; Bill Ogden; Hamid Mansouri Rad; and Ron Zacharski. 2006. Guarani: A case study in resource development for quick ramp-up MT. Proceedings of the Seventh Biennial Conference of the Association for Machine Translation in the Americas. Boston, MA. (pdf)

The Role of Ontologies in a Linguistic Knowledge Acquisition Task

This paper discusses the role of ontologies in a knowledge elicitation component of a natural language processing system. The system is intended to assist in the rapid development and deployment of a machine translation system from any so-called ‘low-density’ language (one lacking significant machine-tractable resources) into English. The elicitation component, called Boas, is intended to guide non-expert informants to provide linguistic information in sufficient detail to automatically generate a machine translation system.

Helmreich, Steve, and Ron Zacharski. 2005. The role of ontologies in a linguistic knowledge acquisition task. Proceedings of The Electronic Metastructure for Endangered Languages Data Workshop on Linguistic Ontologies and Data Categories for Language Resources. Cambridge, MA. July 1-3, 2005. (pdf)

Mood and modality: Out of the theory and into the fray

The topic of mood and modality (MOD) is a difficult aspect of language description because, among other reasons, the inventory of modal meanings is not stable across languages, moods do not map neatly from one language to another, modality may be realised morphologically or by free-standing words, and modality interacts in complex ways with other modules of the grammar, such as tense and aspect. Describing MOD is especially difficult if one attempts to develop a unified approach that not only provides cross-linguistic coverage, but is also useful in practical natural language processing systems. This article discusses an approach to MOD that was developed for and implemented in the Boas Knowledge-Elicitation (KE) system.

McShane, Marjorie, Sergei Nirenburg, and Ron Zacharski. 2004. Mood and modality: Out of the theory and into the fray. Natural Language Engineering 10:57-89. (pdf)

Embedding Knowledge Elicitation and MT Systems within a Single Architecture

This paper describes Expedition, an environment designed to facilitate the quick ramp-up of MT systems from practically any alphabetic language (L) into English. The central component of Expedition is a knowledge elicitation system that guides a linguistically naive bilingual speaker through the process of describing L in terms of its ecological, morphological, grammatical, lexical, and transfer information. Expedition also includes a module for converting the elicited information into the format expected by the underlying MT system and an MT engine that relies on both the elicited knowledge and resident knowledge about English. The Expedition environment is integrated using a configuration and control system. Expedition represents an innovative approach to answering the need for rapid-configuration MT by preparing an MT system in which the only missing link is information about L, which is elicited in a structured fashion such that it can be directly exploited by the system.

McShane, M., S. Nirenburg, J. Cowie and R. Zacharski. 2002. Embedding knowledge elicitation and MT systems within a single architecture. Machine Translation 17(4):271-305. (pdf)

Modularity in knowledge elicitation and language processing

This paper discusses the role of modularity in the knowledge elicitation component of a natural language processing system. The system at hand, Expedition, is intended to enable fast deployment of a machine translation (MT) system between any so-called “low-density” language (one lacking significant machine-tractable resources) and English. The knowledge-elicitation component of Expedition, called Boas, guides non-expert human informants through questions about the language's morphology, syntax, lexical stock, and ecology. The linguistic challenge for the developers of Boas can be summarized as follows: how does one gather all the necessary information about all the phenomena that can occur in any natural language in a way that is both understandable to a non-expert informant and machine-tractable without post-elicitation human intervention?

McShane, Marjorie and Ron Zacharski. 2001. Modularity in knowledge elicitation and language processing. Proceedings of the Third Annual High Desert Linguistics Conference, 93-104. University of New Mexico, Albuquerque, NM, April 7-9, 2000. (pdf)

MT and topic-based techniques to enhance speech recognition

Our principal objective was to reduce the error rate of speech recognition systems used by professional translators. Our work concentrated on Spanish-to-English translation. In a baseline study we estimated the error rate of an off-the-shelf recognizer to be 9.98%. In this paper we describe two independent methods of improving speech recognizers: a machine translation (MT) method and a topic-based one. An evaluation of the MT method suggests that the vocabulary used for recognition cannot be completely restricted to the set of translations produced by the MT system and that a more sophisticated constraint system must be used. An evaluation of the topic-based method showed a significant error rate reduction, to 5.07%.
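The two reported figures can be compared directly. The calculation below uses only the baseline (9.98%) and topic-based (5.07%) error rates stated above; the split into absolute and relative reduction is our own framing, not a figure reported in the paper.

```latex
\text{absolute reduction} = 9.98\% - 5.07\% = 4.91 \text{ percentage points}

\text{relative reduction} = \frac{9.98 - 5.07}{9.98} = \frac{4.91}{9.98} \approx 0.492
```

In other words, the topic-based method removed roughly half (about 49%) of the baseline recognizer's errors.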

Ludovik, Yevgeny and Ron Zacharski. 2000. MT and topic-based techniques to enhance speech recognition systems for professional translators. Proceedings of CoLing 2000, 1061-1065. Saarbrücken, July 31-August 4, 2000. (pdf)

Language recognition for mono- and multi-lingual documents

In this paper we describe language recognition algorithms for mono- and multi-lingual documents that are based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We compare the monolingual algorithm to those suggested by other researchers. This comparison suggests that this algorithm significantly outperforms commonly used language recognition algorithms. We then describe the multilingual algorithm, which allows for segmenting a multilingual document into single language chunks and identifying the languages of those chunks.
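A minimal sketch of monolingual language recognition by maximum likelihood over character Markov models follows. It assumes a simple reading of "mixed-order n-grams": orders 1 to 3 are combined by linear interpolation with equal weights plus a uniform floor. That weighting scheme, and the names `CharNgramModel`, `identify`, and `alphabet_size`, are hypothetical and chosen for illustration; the paper's actual mixing method may differ.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character Markov model mixing orders 1..max_order by linear interpolation.

    Equal interpolation weights and a uniform floor are illustrative
    assumptions; the paper's own mixed-order scheme may differ.
    """

    def __init__(self, text: str, max_order: int = 3, alphabet_size: int = 256):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        self.total = len(text)
        # ngrams[k] counts every length-k substring of the training text
        self.ngrams = [Counter() for _ in range(max_order + 1)]
        for k in range(1, max_order + 1):
            for i in range(len(text) - k + 1):
                self.ngrams[k][text[i:i + k]] += 1

    def prob(self, history: str, char: str) -> float:
        """Interpolated P(char | history); only the last max_order - 1 characters matter."""
        weight = 1.0 / (self.max_order + 1)  # same weight for the floor and for each order
        p = weight / self.alphabet_size      # floor keeps unseen characters above zero
        for k in range(1, self.max_order + 1):
            if len(history) < k - 1:
                break  # not enough history for this order or any higher one
            ctx = history[len(history) - k + 1:]  # last k-1 characters ("" when k == 1)
            denom = self.total if k == 1 else self.ngrams[k - 1][ctx]
            if denom:
                p += weight * self.ngrams[k][ctx + char] / denom
        return p

    def log_likelihood(self, text: str) -> float:
        """Sum of log P(c_i | preceding characters) over the text."""
        n = self.max_order - 1
        return sum(
            math.log(self.prob(text[max(0, i - n):i], ch)) for i, ch in enumerate(text)
        )


def identify(text: str, models: dict) -> str:
    """Maximum-likelihood decision: the language whose model best explains the text."""
    return max(models, key=lambda lang: models[lang].log_likelihood(text))


# Toy usage with tiny training texts (real models need far more data).
models = {
    "en": CharNgramModel("the quick brown fox jumps over the lazy dog and the cat sat on the mat "),
    "es": CharNgramModel("el rápido zorro marrón salta sobre el perro perezoso y el gato se sentó "),
}
print(identify("the dog sat on the mat", models))  # en
print(identify("el perro se sentó", models))       # es
```

Interpolating several orders lets the model fall back on shorter contexts when a longer one was never seen in training, which matters most for short inputs.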

Cowie, Jim, Yevgeny Ludovik, and Ron Zacharski. 1999. Language recognition for mono- and multi-lingual documents. Proceedings of the Vextal Conference, 209-214. Venice, November 22-24, 1999. (pdf)

Multilingual Document Language Recognition for Creating Corpora

In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study that showed that the performance of this algorithm has practical value.
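The dynamic-programming step for multilingual documents can be sketched as a Viterbi search: each word is scored under every language model, and a fixed penalty is charged whenever the label changes between adjacent words, so the best path splits the document into single-language chunks. This is an assumed formulation for illustration only. Scoring by word, the add-one character bigram models, and the names `train_bigram_scorer`, `segment`, `alphabet_size`, and `switch_penalty` are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def train_bigram_scorer(text: str, alphabet_size: int = 64):
    """Return score(word) = log P(word) under an add-one character bigram model.

    The bigram model and the shared alphabet_size constant are assumptions
    made for this sketch, not the paper's mixed-order models.
    """
    pairs = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])

    def score(word: str) -> float:
        padded = " " + word  # condition the first letter on a preceding space
        return sum(
            math.log((pairs[(a, b)] + 1) / (contexts[a] + alphabet_size))
            for a, b in zip(padded, padded[1:])
        )

    return score


def segment(words, scorers, switch_penalty=5.0):
    """Viterbi labeling of words with languages, charging switch_penalty per language change."""
    if not words:
        return []
    langs = list(scorers)
    # best[i][lang]: best total log score of words[:i+1] with word i labeled lang
    best = [{lang: scorers[lang](words[0]) for lang in langs}]
    back = [{}]  # back[i][lang]: label of word i-1 on that best path
    for i in range(1, len(words)):
        row, ptr = {}, {}
        for lang in langs:
            def incoming(p):
                return best[i - 1][p] - (0.0 if p == lang else switch_penalty)
            prev = max(langs, key=incoming)
            row[lang] = incoming(prev) + scorers[lang](words[i])
            ptr[lang] = prev
        best.append(row)
        back.append(ptr)
    # Trace the best path backwards from the highest-scoring final label.
    labels = [max(langs, key=lambda lang: best[-1][lang])]
    for i in range(len(words) - 1, 0, -1):
        labels.append(back[i][labels[-1]])
    labels.reverse()
    # Merge consecutive words with the same label into single-language chunks.
    chunks = []
    for word, lang in zip(words, labels):
        if chunks and chunks[-1][0] == lang:
            chunks[-1][1].append(word)
        else:
            chunks.append((lang, [word]))
    return [(lang, " ".join(ws)) for lang, ws in chunks]


scorers = {
    "en": train_bigram_scorer("the quick brown fox jumps over the lazy dog and the cat sat on the mat "),
    "es": train_bigram_scorer("el rápido zorro marrón salta sobre el perro perezoso y el gato se sentó "),
}
print(segment("the quick brown fox el perro perezoso".split(), scorers))
```

The switch penalty trades off two failure modes. If it is too low, short ambiguous words flip to the wrong language. If it is too high, genuine short insertions in another language are absorbed into the surrounding chunk.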

Ludovik, Yevgeny and Ron Zacharski. 1999. Multilingual document language recognition. Proceedings of the Machine Translation Summit VII, 317-323. Singapore, September 13-17, 1999. (pdf)