Development of three topics (Q2)

The first project I have chosen to explain is VSDS: Viennese Sociolect and Dialect Synthesis, which is being developed by Friedrich Neubarth, who belongs to the OFAI Language Technology Group. Spoken language is an important means of natural human-computer interaction, so high-quality speech synthesis for different languages is essential for a variety of applications. The outcome of this project will be high-quality synthetic voices that allow a computer to “speak” in different Viennese dialects and sociolects. Since the voices are built from pieces of actual human speech, they will sound very natural, close to human speech. This technology makes possible many applications, from education and tourism to art. A mobile sample application, a Viennese district guide capable of speaking in various dialects or variants, is also being developed within the project. The research part of the project investigates efficient methods for developing synthetic voices for languages that are variants of other languages. Furthermore, it is necessary to employ methods for switching or shifting between the standard language and dialectal variants, since this mixing corresponds to the everyday language use of many speakers. User tests are conducted to evaluate the quality of the synthetic voices and of the sample applications.

The second research project is from the Edinburgh Language Technology Group. Ewan Klein and Claire Grover, as principal investigators from the University of Edinburgh, and Chris Manning from Stanford University have developed EASIE, which builds on existing techniques for information extraction (IE) in order to develop and implement improved methods for extracting semantic content from text. The results of the research are being used to significantly extend the functionality of Edinburgh’s existing XML-based LT-TTT software, in part by incorporating machine learning approaches developed at Stanford.

The last project I will focus on is K-Space, developed by Thierry Declerck from the Language Technology Lab. It is a network of leading research teams from academia and industry conducting integrative research and dissemination activities in semantic inference for automatic and semi-automatic annotation and retrieval of multimedia content. The aim of K-Space research is to narrow the gap between low-level content descriptions that can be computed automatically by a machine and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media, the so-called Semantic Gap. The Network of Excellence K-Space exploits the complementary expertise of project partners, enables resource optimization and fosters innovative research in the field. Specifically, K-Space integrative research focuses on three areas:

  • Content-based multimedia analysis:
    Tools and methodologies for low-level signal processing, object segmentation, audio/speech processing and text analysis, and audiovisual content structuring and description.
  • Knowledge extraction:
    Building of a multimedia ontology infrastructure, knowledge acquisition from multimedia content, knowledge-assisted multimedia analysis, context-based multimedia mining and intelligent exploitation of user relevance feedback.
  • Semantic multimedia:
    Knowledge representation for multimedia, distributed semantic management of multimedia data, semantics-based interaction with multimedia and multimodal media analysis.


Recent Research Topics on Human Language Technology (Q2)

The following lines deal with the most recent research topics listed on some important Human Language Technology sites from different research centers in Europe:

The German Research Center for Artificial Intelligence is currently working on the following research projects:

  • CoSy-Cognitive Systems for Cognitive Assistants
  • HyLaP-Hybrid Language Processing Technologies for a Personal Associative Information Access and Management Application
  • K-Space-Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content
  • MESH-Multimedia Semantic Syndication for Enhanced News Services
  • MUSING-MUlti-Industry, Semantic-based Next Generation Business INtelliGence
  • PAVOQUE-PArametrisation of prosody and VOice QUality for concatenative speech synthesis in view of Emotion expression
  • QALL-ME-Question Answering Learning technologies in a multilingual and Multimodal Environment
  • RASCALLI-Responsive Artificial Situated Cognitive Agents Living and Learning on the Internet

In Ireland, the National Centre for Language Technology is working on:

  • CALL Computer Assisted Language Learning-Integrating CL/NLP/HLT Technology into CALL, CALL for Endangered Languages, CALL for Primary School Environments, CALL for Remedial Learners
  • Corpus Linguistics-Statistical and Rule-Based MT (SMT, RBMT), Example-Based MT (EBMT), Translation Memories (TMs), Boosting Existing MT Systems, Machine-Aided Translation (MAT), Computer-Aided Translation (CAT), Controlled Languages
  • Treebank-Based Unification Grammar Acquisition-Automatic Feature-Structure Annotation Algorithms, Subcategorisation Frame Extraction, Wide-Coverage Robust Probabilistic Unification Grammar Acquisition, PCFG-Based LFG Approximation, HPSG Acquisition, Multilingual Treebank-Based Grammar Acquisition
  • Semantics-Discourse Representation Theory, Linear-Logic Based Semantics, Computation of Logical Forms from Treebanks, Open-Domain Question Answering Systems
  • Speech Technology-Speaker Characterisation, Audio Classification, Retrieval and Coding, Human Computer Interfaces (HCIs)
  • Multilingual Information Retrieval/Extraction
  • Language Evolution

The OFAI Language Technology Group is currently involved in four projects, some of them in cooperation with Austrian university departments and companies.

  • VSDS: Viennese Sociolect and Dialect Synthesis (2007 – 2009)
  • SEMPRE: Semantically Aware Profiling for Recommenders (2007 – 2008)
  • INSPIRATION (2006 – 2010)
  • RASCALLI: Responsive Artificial Situated Cognitive Agents Living and Learning on the Internet (2006 – 2008)

In Edinburgh, the Language Technology Group is carrying out research and development on the following topics:

  • EASIE-Combining Shallow Semantics and Domain Knowledge
  • TXM-Text Mining for Biomedical Content Curation
  • CROSSMARC-Cross-lingual Multi-agent Retail Comparison
  • SQUAD-Smart Qualitative Data: Methods and Community Tools for Data Mark-Up
  • SEER-Machine Learning for Named Entity Recognition
  • BOPCRIS-Named entity tagging of historical parliamentary proceedings
  • Synthesis-Integrated Models and Tools for Fine-Grained Prosody in Discourse
  • JAST-Joint Action Science and Technology
  • AMI and AMIDA-AMI consortium projects that are developing technologies for meeting browsing and for assisting people who participate in meetings from a remote location
  • Collaborating Using Diagrams-Study of how pairs collaborate when planning a route on a map


European research centres for Human Language Technologies (Q1)

The following are some of the European research centers for Human Language Technologies:

  • The Edinburgh Language Technology Group (LTG) is a research and development group that has been working in the area of natural language engineering since the early 1990s. The LTG was originally established as part of the Human Communication Research Centre, and is now based in the Institute for Communicating and Collaborative Systems of the Division of Informatics, University of Edinburgh, one of the largest communities of natural language processing specialists in Europe.
  • The National Centre for Language Technology (NCLT), directed by Professor Josef van Genabith, conducts research into the processing of human language by computers, such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers, and software localisation and globalisation. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substantial economic implications and potential. The centre carries out basic research and develops applications.
  • The mission of the Language Technology Lab is the improvement of language technology through novel computational techniques for processing text, speech and knowledge, a deeper understanding of human language and thought, and the study of the true needs of the end user and the demands of the market. They develop novel and improved applications in three areas: Information and Knowledge Management, Document Production, and Natural Communication. One of their commercial activities is the indexing of German and English texts using the IDX software package.
  • Language Technology (LT) has formed a major research area at the Austrian Research Institute for Artificial Intelligence (OFAI) since its inception in 1984. The group conducts research in modelling and processing human languages, especially German. This includes constructing linguistic resources (such as lexicons, grammars, discourse models), processing algorithms (such as morphological components, parsers, generators, speech synthesizers, discourse processing components), and application prototypes (such as natural language interfaces, advisory systems and concept-to-speech systems).


Definition of Human Language Technology (Q1)

Definitions of this topic on the Net are numerous and varied. These are two of them:

Wikipedia, under the name of Natural Language Processing, defines our object of study as

“a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.”
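
The two directions named in this definition can be illustrated with a toy example. The following is only a minimal sketch, not how real systems work: the weather record, its field names and the single sentence template are all assumptions invented for illustration. Generation fills a template from a database-like record, and understanding maps the sentence back to a formal record with one regular expression.

```python
import re

# Hypothetical database record; the field names are assumptions for illustration.
record = {"city": "Vienna", "temperature_c": 21, "condition": "sunny"}


def generate(rec):
    """Natural language generation: turn a structured record into a sentence."""
    return (
        f"The weather in {rec['city']} is {rec['condition']} "
        f"with a temperature of {rec['temperature_c']} degrees Celsius."
    )


# Pattern that mirrors the template above; real systems need far more than one regex.
PATTERN = re.compile(
    r"The weather in (?P<city>[\w ]+?) is (?P<condition>\w+) "
    r"with a temperature of (?P<temperature_c>-?\d+) degrees Celsius\."
)


def understand(sentence):
    """Natural language understanding: map a sentence back to a formal record."""
    match = PATTERN.fullmatch(sentence)
    if match is None:
        return None  # the sentence does not fit the only pattern we know
    fields = match.groupdict()
    fields["temperature_c"] = int(fields["temperature_c"])
    return fields


sentence = generate(record)
print(sentence)
print(understand(sentence))
```

The sketch only round-trips sentences that follow its own template; any other wording is rejected, which hints at why understanding free text is the harder of the two problems.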

According to the Language Technology Lab, in a text written by Hans Uszkoreit, Human Language Technology

“comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics”.

Searching for Hans Uszkoreit, we can find his curriculum vitae: Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin from 1973 to 1977 and at the University of Texas at Austin from 1977 to 1981. During this time he also worked as a research associate in a large machine translation project at the Linguistics Research Center. He received his Ph.D. (Doctor of Philosophy) in linguistics from the University of Texas in 1984. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University as a senior researcher and later as a project leader. In 1986 he spent six months in Stuttgart on an IBM (International Business Machines Corporation) Research Fellowship at the Science Division of IBM Germany. In December 1986 he returned to Stuttgart to work for IBM Germany as a project leader in the project LILOG (Linguistic and Logical Methods for the Understanding of German Texts). During this time, he also taught at the University of Stuttgart.
Among his relevant publications, we can cite the following:

  • Uszkoreit, H. (2007) Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.
  • Uszkoreit, H., F. Xu, J. Steffen and I. Aslan (2006) The pragmatic combination of different cross-lingual resources for multilingual information services. In: Proceedings of LREC 2006, Genova, Italy, May, 2006.
  • Uszkoreit, H. (2000): Sprache und Sprachtechnologie bei der Strukturierung digitalen Wissens. In: W. Kallmeyer (Ed.) Sprache in neuen Medien, Institut für Deutsche Sprache, Jahrbuch 1999, De Gruyter, Berlin.
  • Uszkoreit, H. (1999): Sprachtechnologie für die Wissensgesellschaft: Herausforderungen und Chancen für die Computerlinguistik und die theoretische Sprachwissenschaft. In: F. Meyer-Krahmer und S. Lange (Eds.), Geisteswissenschaften und Innovationen, Physica Verlag.
  • Uszkoreit, H. (1998): Cross-Lingual Information Retrieval: From Naive Concepts to Realistic Applications. In: Language Technology in Multimedia Information Retrieval, Proceedings of the 14th Twente Workshop on Language Technology.