National Centre for Language Technology

Dublin City University, Ireland

NCLT Seminar Series 2003/2004

The schedule of presenters for the 2003/2004 series was as follows:

Date | Presenter(s) | Title
October 22nd 2003 | Donal Fitzpatrick and Andy Way | LaTeX tutorial
October 29th 2003 | Gareth Jones | An Introduction to CLEF (Cross-Language Evaluation Forum)
November 5th 2003 | Gaby Saldanha | Overview of commercial corpus analysis tools
November 12th 2003 | Minako O'Hagan | Can Language Technology Respond to the Subtitler's Dilemma? - A Preliminary Study
November 19th 2003 | Joachim Wagner | Presentation of Joachim's previous work on CALL/CAT projects
November 26th 2003 | Bart Mellebeek | METAL in the dotcom bubble
December 3rd 2003 | Anna Khasin | A Goal-Oriented Conversational Agent
February 4th 2004 | Gareth Jones and Adenike Lam-Adesina | CLEF 2001-2003: Translation for Monolingual, Bilingual and Multilingual Information Retrieval
February 11th 2004 | Brian Murphy | Tuning Themes: Finding an Appropriate Model of Roles for Translation
February 18th 2004 | Declan Groves and Mary Hearne | Phrase Structure Tree Alignment for Data Oriented Translation
February 25th 2004 | Gareth Jones | CLEF meeting
March 10th 2004 | Hyowon Lee | User Interface Design: design principles
March 31st 2004 | Monica Ward | XML for Linguists
April 7th 2004 | Aphra Kerr | Games are a serious research opportunity
April 21st 2004 | Thomas Koller | Creating flexible web-based language learning environments via Flash, XML, Perl and PHP

An Introduction to CLEF (Cross-Language Evaluation Forum)

CLEF is a workshop programme organised within Europe to explore multilingual information retrieval applications. Each year CLEF provides standard data sets for a number of evaluation tasks. Registered participants are given access to the agreed document collections and search requests, and must submit their best attempts at the task by a fixed deadline. Participants then meet at a workshop later in the year to compare their methods and results. CLEF is modelled closely on the US NIST TREC (Text REtrieval Conference) programme, which has been running for more than 12 years. CLEF was established to focus on tasks primarily involving European languages.

CLEF offers a range of tasks each year. The standard tasks are non-English monolingual information retrieval, e.g. building a French or German search engine; cross-language information retrieval, e.g. using Italian search topics to retrieve documents from a Spanish collection; and multilingual information retrieval, i.e. submitting a search topic to a collection containing documents in more than one language. More recently, a number of additional tracks have been introduced: interactive retrieval (how to work with retrieved documents that you cannot read); cross-language image retrieval; cross-language spoken document retrieval; multilingual question answering (providing an answer to a question rather than a set of documents that might contain the answer); and a domain-specific search task.
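The following is a rough illustration of the cross-language task. It does not describe any CLEF participant's system. It shows one simple approach: translate an Italian query word by word with a bilingual dictionary, then rank Spanish documents by how many translated terms they contain. The dictionary entries, documents and function names are all hypothetical. Real systems use far larger lexicons or machine translation, together with weighted retrieval models.

```python
from collections import Counter

# Hypothetical Italian-to-Spanish dictionary; an entry may list several translations.
IT_ES: dict[str, list[str]] = {
    "elezioni": ["elecciones"],
    "europee": ["europeas"],
    "risultati": ["resultados"],
}


def translate_query(query: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Replace each query word by all of its dictionary translations.

    Words missing from the dictionary (e.g. proper names) are kept unchanged.
    """
    terms: list[str] = []
    for word in query.lower().split():
        terms.extend(dictionary.get(word, [word]))
    return terms


def rank(query_terms: list[str], docs: dict[str, str]) -> list[tuple[str, int]]:
    """Score each document by how often the translated terms occur in it."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())  # naive whitespace tokenisation
        scores[doc_id] = sum(counts[t] for t in query_terms)
    # Highest score first; ties broken by document id so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical Spanish "collection" of two documents.
DOCS = {
    "es1": "Resultados de las elecciones europeas",
    "es2": "El tiempo en Madrid",
}

ranking = rank(translate_query("risultati elezioni europee", IT_ES), DOCS)
print(ranking)  # [('es1', 3), ('es2', 0)]
```

The document about European election results ranks first because all three translated query terms occur in it. Untranslatable words pass through unchanged, which is a common fallback for names.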

In this presentation I will briefly overview the background to CLEF, review the tasks and results of the previous workshops, outline my involvement in the CLEF workshops to date and introduce the expected evaluation tasks for CLEF 2004.

An overview of commercial corpus analysis tools

In this session I will demonstrate three tools for corpus analysis: one for use with a monolingual corpus (WordSmith Tools), and two for use with a bilingual (parallel or comparable) corpus (MultiConcord and ParaConc). These are standard tools, used not only for research but also by translators, translator trainers and foreign language teachers. I will first give an overview of their capabilities, highlighting both their strengths and weaknesses. A hands-on session will then follow in which we will use two of the three tools, WordSmith and MultiConcord.
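The core function these tools share is the concordance: every occurrence of a search word shown in its context. For bilingual corpora, the aligned translation is shown alongside. Below is a minimal sketch. It assumes whitespace-tokenised text and, for the bilingual case, sentence pairs that are already aligned. The real tools add sorting, collocation statistics, column alignment and much more. The function names and sample data are hypothetical.

```python
def kwic(tokens: list[str], keyword: str, width: int = 3) -> list[str]:
    """Key-word-in-context: one line per occurrence of `keyword`,
    showing up to `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def parallel_concordance(pairs: list[tuple[str, str]], keyword: str) -> list[tuple[str, str]]:
    """Return the aligned (source, target) sentence pairs whose source side
    contains `keyword`, so its translations can be inspected side by side."""
    key = keyword.lower()
    return [(src, tgt) for src, tgt in pairs if key in src.lower().split()]


text = "the cat sat on the mat near the door".split()
for line in kwic(text, "the", width=2):
    print(line)

# Hypothetical English-French sentence pairs, assumed already aligned.
pairs = [("the cat sleeps", "le chat dort"), ("a dog barks", "un chien aboie")]
print(parallel_concordance(pairs, "cat"))
```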

Can Language Technology Respond to the Subtitler's Dilemma? - A Preliminary Study

The modern film industry has come to rely substantially on revenue earned outside the home market for which a film is originally made. The recent, widely publicised criticism of the Japanese subtitles for "The Lord of the Rings: The Fellowship of the Ring" revealed that the work had been done by a single translator in just one week. Subtitling is in increasing demand, not only for major feature films but also for anime and computer games. In view of this, a preliminary study was carried out to test the effectiveness of translation technology in producing Japanese subtitles for English-language audiovisual content. On the basis of the preliminary results, the author hopes to carry out a more comprehensive investigation into how technology could help subtitlers meet end users' quality requirements within the time and cost pressures imposed by the entertainment industry.

Presentation of Joachim's previous work on CALL/CAT projects

In this session I will talk about my background and my previous work in computer-assisted language learning (CALL) and computer-aided translation (CAT). For each of the projects I participated in, I will give a short overview of its aims and the CL/NLP techniques employed to achieve them. I will then focus on LogoTax, a postdoctoral project of Dr. Petra Ludewig, Osnabrueck, Germany. LogoTax is an explorative CALL environment for learning German verb-noun constructions. It uses a corpus and NLP techniques to provide detailed information on V-N constructions, together with real examples from texts.

METAL in the dotcom bubble

METAL is a transfer-based machine translation engine that was developed in the 1960s. It was passed on, in one form or another, to different companies and universities, where its development has continued up until today. In this talk, I will give a short overview of the history of METAL and explain the underlying technology used in the variant acquired by SAIL Labs, where I worked between 1998 and 2001. At its peak, SAIL Labs employed more than 100 linguists and software engineers dedicated to machine translation. I will also briefly sketch the reasons that led to the company's bankruptcy only a few years after its foundation, a fate shared by many other companies after the bursting of the dotcom bubble.

A Goal-Oriented Conversational Agent

My talk will focus on a "conversational agent," or a man-machine dialogue system, that I helped develop. Its purpose was to assist in product selection on Internet shopping sites, as well as fulfil associated functions such as providing help and information about the product base.

CLEF 2001-2003: Translation for Monolingual, Bilingual and Multilingual Information Retrieval

The Cross-Language Evaluation Forum (CLEF) organises an annual workshop focused on the comparative analysis of techniques for information retrieval from monolingual, bilingual and multilingual document collections. CLEF works principally with tasks involving European languages. The challenges of CLEF relate to the adaptation of information retrieval methods to new languages, and to the translation mechanisms required in multilingual retrieval environments.

For the last three years we have participated in CLEF, undertaking tasks of increasing complexity each year and often achieving excellent comparative results. In this presentation we review our participation in previous CLEF workshops and outline the tasks available for CLEF 2004 in which we are considering participating.

This presentation is intended as an opportunity to find out about the tasks and challenges of CLEF 2004. We hope that some other members of the group will be interested in working with us on the DCU participation in CLEF this year.

Tuning Themes: Finding an Appropriate Model of Roles for Translation

Many schemes have been proposed for describing the semantics of the relationship between entities and situations. The number of thematic or semantic roles posited ranges from few (2 for Dowty) to many (35 for Dixon). This empirical study examines the same short text in Irish, English, Chinese, German and Spanish to see what approximate level of specificity is sufficient to find cross-language agreement when assigning roles to translated sentences, and then what particular model of roles delivers the most faithful rendering of predicate-argument relationships.
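The trade-off this study investigates can be made concrete with a small calculation. Cross-language agreement is the share of aligned arguments that receive the same role in both versions, and it can only stay the same or rise as the role inventory gets coarser. The sketch below assumes the arguments have already been aligned by hand. The role labels, the fine-to-coarse mapping and the example data are invented for illustration. They are not the models or findings of the study.

```python
from typing import Optional

# Hypothetical collapse of a few fine-grained roles into two coarse roles,
# loosely in the spirit of Dowty's two proto-roles (an illustrative assumption).
FINE_TO_COARSE = {
    "Agent": "Proto-Agent",
    "Experiencer": "Proto-Agent",
    "Patient": "Proto-Patient",
    "Theme": "Proto-Patient",
}


def agreement(roles_a: list[str], roles_b: list[str],
              mapping: Optional[dict[str, str]] = None) -> float:
    """Fraction of aligned arguments that receive the same role in both
    translations, optionally after mapping labels to a coarser inventory."""
    if not roles_a or len(roles_a) != len(roles_b):
        raise ValueError("expected two non-empty label lists of equal length")
    if mapping is not None:
        roles_a = [mapping[r] for r in roles_a]
        roles_b = [mapping[r] for r in roles_b]
    matches = sum(a == b for a, b in zip(roles_a, roles_b))
    return matches / len(roles_a)


# Hypothetical role labels for the same three arguments in two translations.
english = ["Agent", "Theme", "Patient"]
german = ["Experiencer", "Theme", "Theme"]

print(agreement(english, german))                  # fine-grained: 1 of 3 agree
print(agreement(english, german, FINE_TO_COARSE))  # coarse: all 3 agree
```

Running the same comparison over several inventories and texts shows how much specificity can be kept before cross-language agreement breaks down.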

Phrase Structure Tree Alignment for Data Oriented Translation

Data Oriented Translation makes use of phrase-structure tree alignments, which in the past have been performed manually. This is a very time-consuming and labour-intensive task that requires knowledge of both the source and target languages.

A number of algorithms for the automatic alignment of structure trees have been put forward recently (Eisner et al. 2003, Ding et al. 2003, Grishman 1994). However, all of these methods were developed for use with dependency tree structures rather than phrase-structure trees, and none with a view towards using the alignments in translation. We have developed an algorithm that successfully aligns phrase-structure trees and can easily be applied to any language pair. In this talk we will give a brief overview of the DOT system, outline the algorithm itself and present the results we have achieved so far.
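The abstract does not spell out the alignment algorithm. The following is only a sketch of what a phrase-structure alignment looks like as a data structure: a set of links between nodes of a source tree and nodes of a target tree. It also checks two well-formedness constraints commonly assumed for such alignments: each node takes part in at most one link, and links preserve dominance. These constraints are an assumption of this sketch, not a description of the DOT algorithm. The class, the functions and the example trees are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hashing: two "NP" nodes stay distinct
class Node:
    label: str  # category such as "NP", or a word at a leaf
    children: list["Node"] = field(default_factory=list)


def dominates(a: Node, b: Node) -> bool:
    """True if b is a proper descendant of a."""
    return any(child is b or dominates(child, b) for child in a.children)


def well_formed(links: list[tuple[Node, Node]]) -> bool:
    """Check the two constraints assumed in this sketch:
    1. every node takes part in at most one link;
    2. links preserve dominance: s1 dominates s2 exactly when t1 dominates t2.
    """
    sources = [s for s, _ in links]
    targets = [t for _, t in links]
    if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
        return False
    return all(
        dominates(s1, s2) == dominates(t1, t2)
        for s1, t1 in links
        for s2, t2 in links
    )


# Hypothetical English and French parse trees.
john, mary = Node("NP", [Node("John")]), Node("NP", [Node("Mary")])
vp_en = Node("VP", [Node("V", [Node("sees")]), mary])
s_en = Node("S", [john, vp_en])

jean, marie = Node("NP", [Node("Jean")]), Node("NP", [Node("Marie")])
vp_fr = Node("VP", [Node("V", [Node("voit")]), marie])
s_fr = Node("S", [jean, vp_fr])

good = [(s_en, s_fr), (john, jean), (vp_en, vp_fr), (mary, marie)]
crossing = [(s_en, s_fr), (vp_en, jean), (mary, marie)]
print(well_formed(good), well_formed(crossing))  # True False
```

The second alignment is rejected because the English VP dominates "Mary", but its French partner (the subject NP) does not dominate "Marie".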

User Interface Design: design principles

Typically, the technically oriented developers who have been working on the inner workings of a system end up designing its front end too. They give little consideration to its human users: their needs, what they are good at and what they are not good at. Such developers are well known for their poor grasp of how to design a front end that supports its users. In this seminar we will look at the basics of usability: focusing on target users, and the dos and don'ts of interaction design (widely known as 'design principles'), with some example interfaces. The seminar is aimed at researchers who need to develop a front end but have no background in usability or interaction design.

XML for Linguists

XML has emerged as a useful and flexible way of tagging data. It can be used to manage web pages or to annotate data for linguistic purposes. This talk will provide an overview of XML and how it can be used. It will outline the philosophy behind XML and will provide examples that are of interest to linguists, be they traditional or computational linguists. The talk will cover representation techniques, description levels and tools. No prior programming experience will be assumed, although something will be included to keep those with prior knowledge awake.
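The following small illustration shows the kind of linguistic annotation the talk refers to. It marks up one sentence with part-of-speech and lemma attributes, then queries it with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`corpus`, `s`, `w`, `pos`, `lemma`) form a hypothetical scheme chosen for this example, not one prescribed by the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation scheme: <s> marks a sentence and <w> a word,
# with part of speech and lemma stored as attributes.
SAMPLE = """<corpus>
  <s id="s1">
    <w pos="DET" lemma="the">The</w>
    <w pos="NOUN" lemma="cat">cats</w>
    <w pos="VERB" lemma="sleep">slept</w>
  </s>
</corpus>"""


def words_with_pos(xml_text: str, pos: str) -> list[tuple[str, str]]:
    """Return (word form, lemma) for every <w> element with the given pos."""
    root = ET.fromstring(xml_text)
    return [(w.text, w.get("lemma")) for w in root.iter("w") if w.get("pos") == pos]


print(words_with_pos(SAMPLE, "NOUN"))  # [('cats', 'cat')]
```

Because the annotation sits in attributes rather than in the text itself, the same file can be read as plain text or queried by category.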

Games are a serious research opportunity

For the past three and a half years I've been involved in researching games as a new media industry, as an emerging cultural form and as a cultural practice. At first one had to justify such a research interest, but things have changed significantly. There is now an international Digital Games Research Association (DiGRA), a growing number of digital games courses at third level, and an expanding range of research projects and opportunities for interesting interdisciplinary work in this area. This talk will introduce what we mean when we talk about digital games, examine the global industry and Ireland's position within it, and examine some serious design issues facing the industry.


Creating flexible web-based language learning environments via Flash, XML, Perl and PHP

The development of modern language learning software requires the integration of graphical components, flexible database technologies and NLP (Natural Language Processing) tools. Apart from developing sophisticated language processing tools (parsers, morphologizers, corpus processing tools, etc.), one has to create intuitive and adaptable user interfaces. One also has to deploy flexible database technologies that lend themselves readily to diverse application programming interfaces (APIs). Integrating these components into a single system fosters the acceptance and applicability of such software in a real language learning lab.

In my presentation I will first give an overview of different approaches to presenting multimedia content. I will then present a software architecture that combines the graphical software Macromedia Flash, the scripting languages Perl and PHP, and the data description and exchange format XML (Extensible Markup Language) in order to exchange and process language data.
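In the talk's architecture, Perl and PHP run on the server side and Flash on the client side, with XML as the exchange format between them. To keep all examples here in one language, the sketch below uses Python to show the same idea. One component serialises exercise data as XML; another parses it and checks a learner's answer. The exercise format, element names and function names are hypothetical, since the abstract does not describe the system's actual XML schema.

```python
import xml.etree.ElementTree as ET


def exercise_to_xml(exercise_id: str, prompt: str, answers: list[str]) -> str:
    """Serialise a hypothetical gap-fill exercise as XML, the kind of payload
    a server-side script might send to a front end such as a Flash movie."""
    root = ET.Element("exercise", id=exercise_id)
    ET.SubElement(root, "prompt").text = prompt
    accepted = ET.SubElement(root, "answers")
    for answer in answers:
        ET.SubElement(accepted, "answer").text = answer
    return ET.tostring(root, encoding="unicode")


def check_answer(xml_text: str, response: str) -> bool:
    """Parse the exercise and compare a learner's response (case-insensitively)
    with the accepted answers."""
    root = ET.fromstring(xml_text)
    accepted = {a.text.strip().lower() for a in root.iter("answer") if a.text}
    return response.strip().lower() in accepted


payload = exercise_to_xml("ex1", "Ich ___ nach Hause.", ["gehe"])
print(payload)
print(check_answer(payload, " Gehe "), check_answer(payload, "geht"))  # True False
```

Keeping the exercise content in XML rather than inside the front end means the same data can be produced by any server-side language and displayed by any client that can parse XML.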

Dublin City University   Last update: 1st October 2010