Translation Examples by MT System (Q3)

Here I use machine translation systems to translate a short Spanish text into five languages:

“Es una verdad universalmente aceptada que un soltero con posibles ha de buscar esposa.

Por muy poco que se sepa de los gustos u opiniones de tal varón cuando se incorpora a una comunidad, esa verdad tiene tanto arraigo en en la mente de las familias circundantes que se le considera, por derecho, propiedad de una u otra de sus hijas.

-Mi querido señor Bennet -le dijo un día su esposa a este caballero-, ¿te has enterado de que por fin se ha alquilado Netherfield Park?”

Catalan: “És una veritat universalment acceptada que un solter amb possibles ha de buscar dona.

Per molt poc que se sàpiga dels gustos o opinions de tal home quan s’incorpora a una comunitat, aquesta veritat té tant arrelament en en la ment de les famílies circumdants que se li considera, per dret, propietat d’una o una altra de les seves filles.

-El meu estimat senyor Bennet -li va dir un dia la seva dona a aquest cavaller-, t’has assabentat que per fi s’ha llogat Netherfield Park?”

Galician: “É unha verdade universalmente aceptada que un solteiro con posibles ten que buscar esposa.

Por moi pouco que se saiba dos gustos ou opinións de tal home cando se incorpora a unha comunidade, esa verdade ten tanto arraigamento en na mente das familias circundantes que se lle considera, por dereito, propiedade dunha ou outra das súas fillas.

-O meu querido señor Bennet -díxolle un día a súa esposa a este cabaleiro-, decatáchesche de que por fin se alugou Netherfield Park?”

English: “It is a universally accepted truth that a bachelor with possible ones has to look for wife.

For very little that is known about the tastes or opinions of such a male when it|he|she is incorporated in a community, that truth has so much rooting in in the mind of the surrounding families that he is considered, for law|right, estate|property of one or another of its|his|her|their daughters.

-My dear Mr. Bennet -said it|him a day its|his|her|their wife to this gentleman-, you have found out about at last Netherfield Park having been rented?”

Portuguese: “É uma verdade universalmente aceitada que um solteiro com possíveis há de buscar esposa.

Por muito pouco que se saiba dos gostos ou opiniões de tal varão quando se incorpora a uma comunidade, essa verdade tem tanto arraigo em na mente das famílias circundantes que se lhe considera, por direito, propriedade de uma ou outra de suas filhas.

-Meu querido senhor Bennet -lhe disse um dia sua esposa a este cavalheiro-, te inteiraste que por último se alugou Netherfield Park?”

German: “Es ist eine universalmente akzeptierte Wahrheit, dass ein Junggeselle mit möglichen Gattin suchen muss.

Durch sehr wenig der sich von den Geschmäcken oder Meinungen so eines Mannes, wenn er in eine Gemeinschaft eingegliedert wird, weiß, hat diese Wahrheit so viel Einwurzelung in im Verstand der umgebenden Familien, die ihn durch|für Recht hält, Eigenschaft|Eigentum von einer oder einer anderen seiner|ihrer Töchter.

-Mein beliebter|lieber Herr Bennet -er sagte einen Tag seine|ihre Gattin zu diesem Herrn|Ritter-, du hast davon erfahren, dass man endlich Netherfield Park gemietet|vermietet hat?”


Text: Chapter I of “Orgullo y Prejuicio” (Pride and Prejudice) by Jane Austen

Machine translation systems used:

Comprendium Translator

Instituto Cervantes translation service (Servicio de Traducción)

Definition of some concepts of the Translation World (Q3)

Here I explain some of the most frequently used terms in order to make this subject easier to study.

  • Machine Translation, also referred to as MT, is, according to Wikipedia, a “sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.”
  • Machine-aided Translation, or computer-assisted translation (CAT), is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process. Some advanced computer-assisted translation solutions include controlled machine translation (MT). Integration of MT into computer-assisted translation has been implemented in various ways by various parties. Although this type of technology is neither widely known nor available to individual translators, carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT and, as a result, the efficiency of the translation process.
  • Multilingual content management: a multilingual website is usually a mixture of global and local content. Local content presents no particular content management issues; global content – which has to be translated across all language locales – does. Deciding where multiple language versions of content are going to be required and where content can be maintained separately for different locales is a critical decision that will affect how a site should be maintained and what it will cost.
  • Translation Technology: translation is the action of interpreting the meaning of a text and subsequently producing an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the “source text,” and the language it is to be translated into is called the “target language”; the final product is sometimes called the “target text.”
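
To make the "simple substitution of words" from the Machine Translation definition more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the tiny lexicon and the function name `substitute_words` are invented for this example and do not come from any of the MT systems used above.

```python
# Toy word-for-word substitution. The lexicon is a hypothetical,
# hand-made sample and not the dictionary of any real MT system.
LEXICON = {
    "es": "is",
    "una": "a",
    "verdad": "truth",
    "universalmente": "universally",
    "aceptada": "accepted",
}


def substitute_words(sentence: str, lexicon: dict[str, str]) -> str:
    """Replace each word with its lexicon entry; unknown words are kept as-is."""
    return " ".join(lexicon.get(word.lower(), word) for word in sentence.split())


print(substitute_words("Es una verdad universalmente aceptada", LEXICON))
# prints "is a truth universally accepted"
```

The output keeps the Spanish word order ("truth universally accepted"), which suggests why pure substitution is only the basic level and why phrase recognition and idiom handling are needed for better results.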


Machine Translation. Wikipedia. Retrieved May 5, 2008, 11:58.

Machine aided Translation. Wikipedia. Retrieved May 5, 2008, 12:03.

Multilingual content management. Kitsite. Retrieved May 5, 2008, 12:09.

Translation Technology. Wikipedia Translation. Retrieved May 5, 2008, 12:12.

Main Characteristics of a Translation Task by FEMTI (Q3)

To start with, let's explain what FEMTI is. The Framework for the Evaluation of Machine Translation in ISLE is a resource that helps MT evaluators define contextual evaluation plans. It consists of two interrelated classifications:

  • The first lists the possible characteristics of the contexts of use that are applicable to MT systems.
  • The second lists the possible characteristics of an MT system, along with the metrics that were proposed to measure them.

Once evaluators have specified the intended context of use by browsing the first classification, FEMTI proposes a set of quality characteristics that are relevant to that context, using its embedded knowledge base. Evaluators can modify this set of quality characteristics and select evaluation metrics for each of them by browsing the second classification. Evaluators can then print the evaluation plan and execute the evaluation.

According to FEMTI the main characteristics of a translation task are the following:

  • Assimilation: “The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.”
  • Dissemination: “The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.”
  • Communication: “The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.”
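
The FEMTI workflow described above (choose a context of use, receive a proposed set of quality characteristics, then select metrics for each) can be sketched roughly as follows. This is only a sketch under my own assumptions: the quality characteristics and metrics in the two dictionaries are hypothetical examples that I invented, not entries copied from FEMTI's knowledge base, and `EvaluationPlan` and `propose_plan` are illustrative names.

```python
from dataclasses import dataclass, field

# Hypothetical entries: these mappings are invented for illustration and
# are NOT taken from FEMTI's actual classifications.
CONTEXT_TO_QUALITIES = {
    "assimilation": ["comprehensibility", "coverage"],
    "dissemination": ["fidelity", "fluency"],
    "communication": ["robustness", "speed"],
}
QUALITY_TO_METRICS = {
    "comprehensibility": ["reader comprehension test"],
    "coverage": ["share of untranslated words"],
    "fidelity": ["human adequacy rating"],
    "fluency": ["human fluency rating"],
    "robustness": ["error rate on ill-formed input"],
    "speed": ["words translated per minute"],
}


@dataclass
class EvaluationPlan:
    context: str
    # Maps each quality characteristic to the metrics chosen for it.
    metrics: dict[str, list[str]] = field(default_factory=dict)


def propose_plan(context: str) -> EvaluationPlan:
    """Propose qualities for a context and attach candidate metrics to each."""
    qualities = CONTEXT_TO_QUALITIES[context]
    return EvaluationPlan(context, {q: list(QUALITY_TO_METRICS[q]) for q in qualities})


plan = propose_plan("dissemination")
# The evaluator may modify the proposed set before executing the evaluation.
plan.metrics.pop("fluency")
for quality, metrics in plan.metrics.items():
    print(f"{quality}: {', '.join(metrics)}")
```

The two dictionaries stand in for the two classifications: the first maps a translation task to quality characteristics, and the second maps each characteristic to metrics, so the plan is simply the result of following both mappings and then letting the evaluator edit it.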


Development of three research topics (Q2)

The first project I have chosen to explain is VSDS: Viennese Sociolect and Dialect Synthesis, which is being developed by Friedrich Neubarth, who belongs to the OFAI Language Technology Group. One important means of natural human-computer interaction is spoken language, so for a variety of applications it is essential to have high quality speech synthesis for different languages. The outcome of this project will be high quality synthetic voices, which allow a computer to “speak” in different Viennese dialects/sociolects. Since the sources of these voices are pieces taken from actual human speech, the synthetic voices will sound very natural, close to human speech. With this technology it is possible to build many applications, from the domains of education and tourism to art. A mobile sample application, a Viennese district guide capable of various dialects or variants, is also being developed within the project. The research part of the project investigates efficient methods for developing synthetic voices for languages that are variants of other languages. Furthermore, it is necessary to employ methods for switching or shifting between the standard language and dialectal variants, since this mixing of standards corresponds to the everyday language use of many speakers. User tests are conducted to evaluate the quality of the synthetic voices and of the relevant sample applications.

The second research project explained is from the Edinburgh Language Technology Group. Ewan Klein and Claire Grover, principal investigators from the University of Edinburgh, and Chris Manning from Stanford University have developed EASIE, which builds on existing techniques for information extraction (IE) in order to develop and implement improved methods for extracting semantic content from text. The results of the research are being used to significantly extend the functionality of Edinburgh’s existing XML-based LT-TTT software, in part by incorporating machine learning approaches developed at Stanford.

The last project I will focus on is K-Space, developed by Thierry Declerck, from the Language Technology Lab. It is a network of leading research teams from academia and industry conducting integrative research and dissemination activities in semantic inference for automatic and semi-automatic annotation and retrieval of multimedia content. The aim of K-Space research is to narrow the gap between low-level content descriptions that can be computed automatically by a machine and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media: the Semantic Gap. The Network of Excellence K-Space exploits the complementary expertise of project partners, enables resource optimization and fosters innovative research in the field. Specifically, K-Space integrative research focuses on three areas:

  • Content-based multimedia analysis:
    Tools and methodologies for low-level signal processing, object segmentation, audio/speech processing and text analysis, and audiovisual content structuring and description.
  • Knowledge extraction:
    Building of a multimedia ontology infrastructure, knowledge acquisition from multimedia content, knowledge-assisted multimedia analysis, context based multimedia mining and intelligent exploitation of user relevance feedback.
  • Semantic multimedia:
    Knowledge representation for multimedia, distributed semantic management of multimedia data, semantics-based interaction with multimedia and multimodal media analysis.


Recent Research Topics on Human Language Technology (Q2)

The following lines cover the most recent research topics listed on some important Human Language Technology sites from different research centres in Europe:

The German Research Center for Artificial Intelligence is currently working on the following research projects:

  • CoSy-Cognitive Systems for Cognitive Assistants
  • HyLaP-Hybrid Language Processing Technologies for a Personal Associative Information Access and Management Application
  • K-Space-Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content
  • MESH-Multimedia Semantic Syndication for Enhanced News Services
  • MUSING-MUlti-Industry, Semantic-based Next Generation Business INtelliGence
  • PAVOQUE-PArametrisation of prosody and VOice QUality for concatenative speech synthesis in view of Emotion expression
  • QALL-ME-Question Answering Learning technologies in a multilingual and Multimodal Environment
  • RASCALLI-Responsive Artificial Situated Cognitive Agents Living and Learning on the Internet

In Ireland, the National Centre for Language Technology is working on:

  • CALL Computer Assisted Language Learning-Integrating CL/NLP/HLT Technology into CALL, CALL for Endangered Languages, CALL for Primary School Environments, CALL for Remedial Learners
  • Corpus Linguistics- Statistical and Rule-Based MT (SMT, RBMT), Example-Based MT (EBMT), Translation Memories (TMs), Boosting Existing MT Systems, Machine-Aided Translation (MAT), Computer-Aided Translation (CAT), Controlled Languages
  • Treebank-Based Unification Grammar Acquisition-Automatic Feature-Structure Annotation Algorithms, Subcategorisation Frame Extraction, Wide-Coverage Robust Probabilistic Unification Grammar Acquisition, PCFG-Based LFG Approximation, HPSG Acquisition, Multilingual Treebank-Based Grammar Acquisition
  • Semantics-Discourse Representation Theory, Linear-Logic Based Semantics, Computation of Logical Forms from Treebanks, Open-Domain Question Answering Systems
  • Speech Technology- Speaker Characterisation, Audio Classification, Retrieval and Coding, Human Computer Interfaces (HCIs)
  • Multilingual Information Retrieval/Extraction
  • Language Evolution

The OFAI Language Technology Group is currently involved in four projects, some of them in cooperation with Austrian university departments and companies.

  • VSDS: Viennese Sociolect and Dialect Synthesis (2007 – 2009)
  • SEMPRE: Semantically Aware Profiling for Recommenders (2007 – 2008)
  • INSPIRATION (2006 – 2010)
  • RASCALLI: Responsive Artificial Situated Cognitive Agents Living and Learning on the Internet (2006 – 2008)

In Edinburgh, the Language Technology Group is carrying out research and development on the following topics:

  • EASIE-Combining Shallow Semantics and Domain Knowledge
  • TXM-Text Mining for Biomedical Content Curation
  • CROSSMARC-Cross-lingual Multi-agent Retail Comparison
  • SQUAD-Smart Qualitative Data: Methods and Community Tools for Data Mark-Up
  • SEER-Machine Learning for Named Entity Recognition
  • BOPCRIS-Named entity tagging of historical parliamentary proceedings
  • Synthesis-Integrated Models and Tools for Fine-Grained Prosody in Discourse
  • JAST-Joint Action Science and Technology
  • AMI and AMIDA-AMI consortium projects that are developing technologies for meeting browsing and to assist people participating in meetings from a remote location
  • Collaborating Using Diagrams-Study of how pairs collaborate when planning a route on a map


European research centres for Human Language Technologies (Q1)

The following are some of the European research centers for Human Language Technologies:

  • The Edinburgh Language Technology Group (LTG) is a research and development group that has been working in the area of natural language engineering since the early 1990s. The LTG was originally established as part of the Human Communication Research Centre, and is now based in the Institute for Communicating and Collaborative Systems of the Division of Informatics, University of Edinburgh, one of the largest communities of natural language processing specialists in Europe.
  • The National Centre for Language Technology (NCLT), directed by Professor Josef van Genabith, conducts research into the processing of human language by computers, such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers, and software localisation and globalisation. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substantial economic implications and potential. The centre carries out basic research and develops applications.
  • The Language Technology Lab, whose mission is the improvement of language technology through novel computational techniques for processing text, speech and knowledge, a deeper understanding of human language and thought, and the study of the true needs of the end user and the demands of the market. It develops novel and improved applications in three areas: Information and Knowledge Management, Document Production, and Natural Communication. One of its commercial activities is the indexing of German and English texts using the IDX software package.
  • Language Technology (LT) has formed a major research area at the Austrian Research Institute for Artificial Intelligence (OFAI) since its inception in 1984. The group conducts research in modelling and processing human languages, especially German. This includes constructing linguistic resources (such as lexicons, grammars, discourse models), processing algorithms (such as morphological components, parsers, generators, speech synthesizers, discourse processing components), and application prototypes (such as natural language interfaces, advisory systems and concept-to-speech systems).


Definition of Human Language Technology (Q1)

Definitions of this topic on the Net are numerous and varied. Here are two of them:

Wikipedia, under the name of Natural Language Processing, defines our object of study as

“a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”

According to the Language Technology Lab, in a text written by Hans Uszkoreit, Human Language Technology

“comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics”.

Searching for Hans Uszkoreit, we can find his curriculum vitae: Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin from 1973 to 1977 and at the University of Texas at Austin from 1977 to 1981. During this time he also worked as a research associate in a large machine translation project at the Linguistics Research Center. He received his Ph.D. (Doctor of Philosophy) in linguistics from the University of Texas in 1984. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM (International Business Machines Corporation) Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts). During this time, he also taught at the University of Stuttgart.

Among his relevant publications, we can cite the following:

  • Uszkoreit, H. (2007) Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.
  • Uszkoreit, H., F. Xu, J. Steffen and I. Aslan (2006) The pragmatic combination of different cross-lingual resources for multilingual information services. In: Proceedings of LREC 2006, Genoa, Italy, May 2006.
  • Uszkoreit, H. (2000): Sprache und Sprachtechnologie bei der Strukturierung digitalen Wissens. In: W. Kallmeyer (Ed.) Sprache in neuen Medien, Institut für Deutsche Sprache, Jahrbuch 1999, De Gruyter, Berlin.
  • Uszkoreit, H. (1999): Sprachtechnologie für die Wissensgesellschaft: Herausforderungen und Chancen für die Computerlinguistik und die theoretische Sprachwissenschaft. In: F. Meyer-Krahmer und S. Lange (Eds.), Geisteswissenschaften und Innovationen, Physica Verlag.
  • Uszkoreit, H. (1998): Cross-Lingual Information Retrieval: From Naive Concepts to Realistic Applications. In: Language Technology in Multimedia Information Retrieval, Proceedings of the 14th Twente Workshop on Language Technology.