National Centre for Language Technology
NCLT Seminar Series

The schedule of presenters for the 2003/2004 series was as follows:

An Introduction to CLEF (Cross-Language Evaluation Forum)
CLEF is a workshop programme organised within Europe to explore multilingual information retrieval applications. Each year CLEF provides standard data sets for a number of evaluation tasks. Registered participants are given access to the agreed document collections and search requests, and must submit their best attempts at the task by a fixed deadline. Participants then meet at a workshop later in the year to compare their methods and results. CLEF is closely modelled on the US NIST TREC (Text REtrieval Conference) programme, which has been running for more than 12 years. CLEF was established to focus primarily on tasks involving European languages.

An overview of commercial corpus analysis tools
In this session I will demonstrate three tools for corpus analysis: one (Wordsmith Tools) for use with a monolingual corpus, and two (Multiconcord and Paraconc) for use with a bilingual (parallel or comparable) corpus. These are standard tools used not only for research but also by translators, translator trainers and foreign language teachers. I will first give an overview of their capabilities, highlighting both their strengths and weaknesses; a hands-on session will follow in which we will use two of the three tools, Wordsmith and Multiconcord.

Can Language Technology Respond to the Subtitler's Dilemma? A Preliminary Study
The modern film industry has come to rely substantially on revenue earned outside the home market for which a film is originally made. The recent, widely publicised criticism of the Japanese subtitles for "The Lord of the Rings: The Fellowship of the Ring" revealed that the work had been done by a single translator in just one week.
In view of the increasing demand for subtitling, not only of major feature films but also of anime and computer games, a preliminary study was carried out to test the effectiveness of translation technology in producing Japanese subtitles for English-language audiovisual content. On the basis of the preliminary results, the author hopes to carry out a more comprehensive investigation into how technology could help subtitlers meet end users' quality requirements within the time and cost pressures imposed by the entertainment industry.

Presentation of Joachim's previous work on CALL/CAT projects
In this session I will talk about my background and my previous work in computer-assisted language learning (CALL) and computer-aided translation (CAT). For each project I participated in, I will give a short overview of its aims and the CL/NLP techniques employed to achieve them. I will then focus on LogoTax, a postdoctoral project of Dr. Petra Ludewig, Osnabrueck, Germany. LogoTax is an exploratory CALL environment for learning German verb-noun constructions. It uses a corpus and NLP techniques to provide detailed information on V-N constructions, together with real examples from texts.

METAL is a transfer-based machine translation engine that was developed in the 1960s and was passed on, in one form or another, to various companies and universities, where development has continued to the present day. In this talk, I will give a short overview of the history of METAL and explain the technology underlying the variant acquired by SAIL Labs, where I worked between 1998 and 2001. At its peak, SAIL Labs employed more than 100 linguists and software engineers dedicated to machine translation. I will also briefly sketch the reasons that led to the company's bankruptcy only a few years after its foundation, a fate shared by many other companies after the dotcom bubble burst.
A Goal-Oriented Conversational Agent
My talk will focus on a "conversational agent", or man-machine dialogue system, that I helped develop. Its purpose was to assist with product selection on Internet shopping sites, as well as to fulfil associated functions such as providing help and information about the product base.

CLEF 2001-2003: Translation for Monolingual, Bilingual and Multilingual Information Retrieval
The Cross-Language Evaluation Forum (CLEF) organises an annual workshop focused on the comparative analysis of techniques for information retrieval from monolingual, bilingual and multilingual document collections. CLEF works principally with tasks involving European languages. The challenges of CLEF relate to adapting information retrieval methods to new languages and to the mechanisms of natural language translation in multilingual retrieval environments. For the last three years we have participated in CLEF, undertaking tasks of increasing complexity each year and often achieving excellent comparative results. In this presentation we review our participation in previous CLEF workshops and outline the tasks available for CLEF 2004, in which we are considering participating. The presentation is intended as an opportunity to find out about the tasks and challenges of CLEF 2004, and we hope that other members of the group will be interested in working with us on DCU's participation in CLEF this year.

Tuning Themes: Finding an Appropriate Model of Roles for Translation
Many schemes have been proposed for describing the semantics of the relationship between entities and situations. The number of thematic or semantic roles posited ranges from few (2 for Dowty) to many (35 for Dixon).
This empirical study examines the same short text in Irish, English, Chinese, German and Spanish to see what approximate level of specificity is sufficient to find cross-language agreement when assigning roles to translated sentences, and then which particular model of roles delivers the most faithful rendering of predicate-argument relationships.

Phrase Structure Tree Alignment for Data Oriented Translation
Data Oriented Translation (DOT) makes use of phrase-structure tree alignments, which in the past have been performed manually: a very time-consuming and labour-intensive task that requires knowledge of both the source and target languages. A number of algorithms for the automatic alignment of structure trees have been put forward recently (Eisner et al. 2003, Ding et al. 2003, Grishman 1994), but all of these methods were developed for dependency tree structures rather than phrase-structure trees, and none with a view to using the alignments in translation. We have developed an algorithm that successfully aligns phrase-structure trees and that can easily be applied to any language pair. In this talk we will give a brief overview of the DOT system, outline the algorithm itself and present the results we have achieved so far.

User Interface Design: design principles
Typically, the technically oriented developers who have been working on a system's internals end up designing its front-end without much consideration for its human users: their needs, and what they are and are not good at. Yet technical developers are well known for their poor grasp of how to design a front-end that supports human users. In this seminar we will look at the basics of usability, focusing on target users and on the dos and don'ts of interaction design (widely known as 'design principles'), with some example interfaces.
This seminar will help researchers who need to develop a front-end but have no background in usability and interaction design.

XML has emerged as a useful and flexible way of tagging data. It can be used to manage web pages or to annotate data for linguistic purposes. This talk will provide an overview of XML and how it can be used. It will outline the philosophy behind XML and provide examples of interest to linguists, be they traditional or computational linguists. The talk will cover representation techniques, description levels and tools. No prior programming experience will be assumed, although something will be included to keep those with prior knowledge awake.

Games are a serious business: new research opportunities
For the past three and a half years I have been researching games as a new media industry, as an emerging cultural form and as a cultural practice. Since the early days, when one had to justify such a research interest, things have changed significantly: there is now an international Digital Games Research Association (DIGRA), a growing number of digital game courses at third level, and an expanding range of research projects and opportunities for interesting interdisciplinary work in this area. This talk will introduce what we mean when we talk about digital games, examine the global industry and Ireland's position within it, and discuss some serious design issues facing the industry.

Creating flexible web-based language learning environments via Flash, XML, Perl and PHP
The development of modern language learning software requires the integration of graphical components, flexible database technologies and NLP (Natural Language Processing) tools.
Apart from developing sophisticated language processing tools (parsers, morphologizers, corpus processing tools, etc.), one has to create intuitive and adaptable user interfaces and deploy flexible database technologies that lend themselves readily to diverse application programming interfaces (APIs). Integrating these components fosters the acceptance and applicability of such systems in real language learning labs. In my presentation I will first give an overview of different approaches to presenting multimedia content. I will then present a software architecture that combines the graphics software Macromedia Flash, the scripting languages Perl and PHP, and the data description and exchange format XML (Extensible Markup Language) in order to exchange and process language data.
Last Updated: 13th April 2004 by aclweb@computing.dcu.ie