National Centre for Language Technology

Dublin City University
School of Computing
School of Applied Language and Intercultural Studies
School of Electronic Engineering

Research Areas

CALL Computer Assisted Language Learning
    Computer Assisted Language Learning (CALL) studies the role and the use of Information and Communication Technologies (ICT) in second/foreign language learning and teaching. It includes a wide range of activities spanning materials and courseware development, pedagogical practice and research (see http://www.eurocall.org/research_policy.htm for an internationally approved definition of the field).

    Within NCLT, we have looked at the use of authoring tools in the foreign language classroom as well as for self-access purposes. We are conducting research into the development of tools and a suitable framework for the use and integration of Computer-Mediated Communication (CMC), mainly within a task- or project-based approach. Some of the areas we are focusing on are learner autonomy, tandem language learning, multidisciplinary and multilingual collaborative learning at a distance (e.g. the TECHNE Project), the effects of syntactic priming in bilingual asynchronous communication, and motivation, among others. We are working on the development of applications and tools which support communication over the web for language learning purposes as well as allowing for the collection of data for learner corpora. We are also interested in the development and use of tools that facilitate learning in multilingual virtual learning environments. Such tools involve collaboration with colleagues in Machine Translation and Speech Technology.

    The CALL group is involved in teacher training activities. In 2001, two workshops were organised for DCU staff and facilitated by world experts in CALL. We are contributing to the OILTE Project (a CALL training-for-trainers project for primary and secondary teachers in Ireland) in collaboration with the Linguistics Institute of Ireland (ITE), EUROCALL and the NCTE.
Corpus Linguistics
    Computational Models of Concept Combination
      Research on concept combination aims to understand how people think, to develop computational models of thinking, and to test those models experimentally.

      In particular, researchers in this area develop and test computational models of how people use and combine their mental representations of concepts and categories. For example, one of us (Costello) has developed a model of how people classify items in simple categories (how they recognise and identify members of categories like "animal", "pet", or "carnivore") and how they manipulate those categories to classify items in combinations of categories (how they identify "a carnivorous animal which is also a pet"). This model accounts for a number of empirically observed patterns in people's classification of items in simple and combined categories, and can make accurate predictions about people's classifications (see papers). Related research aims to apply this work to medical diagnosis and the identification of multiple disorders in patients, and to classification in subsective and privative adjective-noun combinations such as "skilful violinist" and "fake surgeon" (again see papers). Other related research involves developing an information retrieval (IR) system based on this model. IR systems aim to find documents matching a user's query, where the query consists of a combination of terms (e.g. a query for documents describing "carnivorous pet animals"). Models of classification in combined categories should provide a useful framework for such systems.
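
      The toy sketch below illustrates the general flavour of classification in simple and combined categories, using weighted features and a naive conjunctive combination rule. It is purely illustrative and is not the diagnosticity-based model described in the papers; the categories, features, weights and the min-style combination rule are all invented for the example.

        # Toy illustration only: categories as weighted feature sets, item
        # membership as weighted feature overlap, and membership in a
        # conjunctive combination ("a carnivorous pet") as the minimum of
        # the two simple-category scores. All names and weights are invented.
        CATEGORIES = {
            "pet":       {"lives_with_people": 1.0, "tame": 0.8, "small": 0.4},
            "carnivore": {"eats_meat": 1.0, "sharp_teeth": 0.6},
        }

        def membership(item_features, category):
            """Weighted proportion of the category's features the item has."""
            weights = CATEGORIES[category]
            return sum(w for f, w in weights.items() if f in item_features) / sum(weights.values())

        def combined_membership(item_features, categories):
            """Score for a conjunctive combination such as 'a carnivorous pet'."""
            return min(membership(item_features, c) for c in categories)

        cat = {"lives_with_people", "tame", "small", "eats_meat", "sharp_teeth"}
        goldfish = {"lives_with_people", "tame", "small"}
        print(combined_membership(cat, ["pet", "carnivore"]))       # high score
        print(combined_membership(goldfish, ["pet", "carnivore"]))  # low score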
    Endangered Languages
      It is estimated that 90% of the world's languages are in danger of disappearing within the next 100 years. Many of the world's languages have never been written, and some have fewer than 20 speakers left. Endangered Languages are as rich and complex as any of the world's major languages - they represent part of our world heritage and knowledge. Language loss is therefore both a cultural and a scientific loss. There is recognition within the linguistics community that something must be done for Endangered Languages. Current research in the NCLT focuses on the creation of CALL programs for Endangered Languages - not only as a way of teaching the language but also as a conduit to language documentation.

      Papers of interest:

        Hale, K. (1992). On endangered languages and the safeguarding of diversity. Language, Vol. 68, No. 1.

        Krauss, M. (1992). The world's languages in crisis. Language, Vol. 68, No. 1.

        UNESCO (1996). World Conference on Linguistic Rights: Barcelona Declaration. Barcelona, 1996.

        Ward, M. (2001). A template for a CALL program for Endangered Languages. Linguistic Perspectives on Endangered Languages, Helsinki, Finland, 2001 (to appear).
    Language Evolution
      Research in Language Evolution involves the artificial creation of systems which synthesise conditions facilitating language evolution. Language can evolve through reinforcement learning, where agents communicate with each other and receive rewards when communication is successful. The fundamental difference between the learning mechanisms humans use to communicate with one another and the way machines learn to communicate can be summarised as follows: the system used by humans presupposes that the adult already knows the meanings associated with human language, while machine learning usually requires the training of one of the participants. Two questions need to be addressed: (1) How does the human infant acquire modern language? (2) How did language itself evolve? By attempting to answer these questions, can we build a system that has the capacity to evolve a communication protocol?

      Languages have evolved historically towards optimal communication systems, and human language learning mechanisms have evolved to learn these systems efficiently. Machines learning natural language have to start at a point that humans mastered thousands of years ago: uttering previously unheard signals and collectively establishing their meaning. The question this research deals with is how a communication system based on evolutionary computation and reinforcement learning can evolve if initially none of the participants has mastered the system.
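
      The sketch below illustrates this idea in miniature: two agents play a Lewis-style signalling game and reinforce whichever signal/interpretation choices lead to successful communication, so that a shared convention can emerge from initially random behaviour. It is an illustrative toy rather than the NCLT system; the states, signals and reward scheme are invented.

        import random

        # Two agents start with uniform propensities -- neither has "mastered"
        # the system -- and reinforce choices that lead to successful
        # communication (a simple Lewis signalling game).
        STATES = ["food", "danger", "shelter"]
        SIGNALS = ["a", "b", "c"]
        sender = {s: {sig: 1.0 for sig in SIGNALS} for s in STATES}
        receiver = {sig: {s: 1.0 for s in STATES} for sig in SIGNALS}

        def choose(weights):
            """Pick a key with probability proportional to its propensity."""
            r = random.uniform(0, sum(weights.values()))
            for key, w in weights.items():
                r -= w
                if r <= 0:
                    break
            return key

        for _ in range(20000):
            state = random.choice(STATES)      # the world presents a state
            signal = choose(sender[state])     # the sender emits a signal
            guess = choose(receiver[signal])   # the receiver interprets it
            if guess == state:                 # reward only on success
                sender[state][signal] += 1.0
                receiver[signal][guess] += 1.0

        # After many rounds the propensities usually converge on a one-to-one
        # state-to-signal mapping, i.e. a simple evolved communication protocol.
        for state in STATES:
            print(state, "->", max(sender[state], key=sender[state].get))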
    Machine Translation
      Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available that produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains.

      The term MT is associated with standalone translation programs. Nowadays a number of computer-aided translation (CAT) tools also exist, such as translation memory systems, dictionary lookup programs and terminology management tools. These translation aids are of particular use when translating highly repetitive texts, such as technical documentation. The main growth in the use of MT is via the Internet, with hundreds of millions of pages of text available for training statistical systems. On-line translation systems such as Babelfish are being used to connect an increasingly global -- and linguistically diverse -- public.
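
      As a small illustration of the idea behind one of these aids, the sketch below performs a naive translation-memory lookup: a new source segment is compared against previously translated segments, and the closest match above a threshold is proposed for reuse. The example segments, the language pair and the 0.7 threshold are invented for illustration and do not reflect any particular commercial tool.

        import difflib

        # Naive translation-memory lookup: find the stored source segment most
        # similar to the new one and propose its stored translation.
        memory = [
            ("Click the Save button to store your changes.",
             "Cliquez sur le bouton Enregistrer pour sauvegarder vos modifications."),
            ("Click the Cancel button to discard your changes.",
             "Cliquez sur le bouton Annuler pour abandonner vos modifications."),
        ]

        def tm_lookup(source, tm, threshold=0.7):
            """Return (score, stored source, stored translation) or None."""
            best = None
            for src, tgt in tm:
                score = difflib.SequenceMatcher(None, source.lower(), src.lower()).ratio()
                if best is None or score > best[0]:
                    best = (score, src, tgt)
            return best if best and best[0] >= threshold else None

        match = tm_lookup("Click the Save button to keep your changes.", memory)
        if match:
            score, src, tgt = match
            print(f"{score:.0%} fuzzy match: {tgt}")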

      The field of MT is currently as healthy as it has been in years. More CAT (and even MT) tools are being used in the software localisation industry, conferences abound, and more and more students are being exposed to MT and translation tools in their programmes of study. This has come about primarily because of a widespread recognition -- long acknowledged by many MT researchers -- of the limitations of MT: people's expectations are much more realistic than before, with the result that their aims are much more likely to be met by MT than was previously the case.
    Translation Technology
      Formal Semantics
        In most communicative situations we use language to transmit information. Semantics models the information content or meaning conveyed by language. Formal semantics uses Logic to model meaning. A Logic is a formal language which is interpreted in models. Furthermore, a consequence relation (what follows from what; what you can infer given some premises) is defined between expressions in a logic. Logic captures two important aspects of (literal) meaning: "aboutness" and "inference". Logical representations and inference play a crucial role in research on, and NLP applications of, Machine Translation, Natural Language Interfaces to computer systems, Question-Answering systems, Knowledge Representation and Information Extraction.
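
        As a toy illustration of "interpretation in models", the sketch below evaluates simple quantified statements in a tiny hand-built model. The domain and predicate extensions are invented; note that genuine logical consequence requires truth to be preserved in every model, not just in the single model shown here.

          # Toy model: extensions for two predicates over a small domain of
          # individuals. All names are invented for the example.
          violinist = {"anna", "ben"}
          musician = {"anna", "ben", "carla"}

          def every(p, q):
              """[[every P is a Q]] is true iff P's extension is a subset of Q's."""
              return p <= q

          def some(p, q):
              """[[some P is a Q]] is true iff the two extensions overlap."""
              return bool(p & q)

          # Premises: "Every violinist is a musician" and "Anna is a violinist".
          premises = every(violinist, musician) and "anna" in violinist
          # Conclusion: "Some musician is a violinist" -- true in this model,
          # and indeed in every model in which the premises hold.
          conclusion = some(musician, violinist)
          print(premises, conclusion)   # True True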

        References


          Crouch, R. and van Genabith, J., (1999), "Context Change, Underspecification and the Structure of Glue Language Derivations", in (ed.) Mary Dalrymple, Semantics and Syntax in Lexical Functional Grammar: The Resource Logic Approach pp. 117-189, The MIT Press, Cambridge, Massachusetts, ISBN 0-262-04171-5.

          van Genabith, J. and Crouch, R., (1999), "Dynamic and Underspecified Semantics for LFG", in (ed.) Mary Dalrymple, Semantics and Syntax in Lexical Functional Grammar: The Resource Logic Approach pp. 209-260, The MIT Press, Cambridge, Massachusetts, ISBN 0-262-04171-5.

          van Genabith, J. and Crouch, R., (1999), "How to Glue a Donkey to an f-Structure or Porting a Dynamic Meaning Representation Language into LFG's Linear Logic Based Glue Language Semantics", in: Computing Meaning, volume 1, (eds.) Harry Bunt and Reinhard Muskens, Studies in Linguistics and Philosophy, volume 73, Kluwer Academic Press, Dordrecht, Boston and London, 1999, pp. 129-148, ISBN 0-7923-6108-3.

      Software Localisation and Globalization
        Localisation is the process of integrating the whole of a product cohesively into the language and culture of the target markets to meet their specific needs. It involves all components of a software product, including the adaptation of the software's functionality and the translation of manuals and on-screen text, as well as affecting technical specifications and marketing literature. It also includes ensuring that graphics, colours and sound effects are culturally appropriate.
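
        As a small illustration of how on-screen text is typically externalised for translation, the sketch below uses Python's gettext module: user-visible strings are wrapped in a translation call and looked up in a per-locale message catalogue. The "myapp" domain, the locale directory and the Irish ("ga") catalogue are assumptions made for the example; real projects generate the catalogues with tools such as xgettext and msgfmt.

          import gettext

          # Load the message catalogue for the target locale; fall back to the
          # original strings if no catalogue is installed, so the sketch runs
          # as-is. Domain, directory and locale are illustrative assumptions.
          translation = gettext.translation("myapp", localedir="locale",
                                            languages=["ga"], fallback=True)
          _ = translation.gettext

          # The wrapped string is replaced by its Irish translation whenever a
          # "ga" catalogue is present.
          print(_("Save your changes?"))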

        It is now widely accepted within the computer industry that Ireland is a world centre of excellence in software localisation with most major software firms having a significant presence in the field in this country. It is estimated that Ireland exports up to 60% of PC-based software sold in Europe, and is the world's second-largest exporter of software after the USA [LOCA97]. Those companies that have chosen Ireland for their product localisation centres include software publishers such as Microsoft World Product Group Ireland, Lotus Development Ireland, Corel Corporation, Symantec, Visio International, Novell, Oracle Corporation and Claris; hardware manufacturers such as Gateway 2000 and Sun Microsystems; Service Providers such as Berlitz International; and tools developers such as Trados.
      Speech Technology
        Once lying in the realm of science fiction, spoken language technology has benefitted from developments in computational linguistics, artificial intelligence, computer science, mathematical modelling and engineering principles to become an area where commercial products have been available for over 10 years, and where the expectation is that these ever-emerging applications will be ubiquitous in the not-too-distant future (see a recent Irish Times article).

        Speech technology encompasses many subdisciplines and incorporates much technology from other areas of NLP. The most immediate applications that spring to mind are: automatic speech recognition (ASR) - in simple terms, talking to machines; and speech generation (synthesis) - getting machines to talk.

        Related to speech recognition is speaker recognition, which has two main subdivisions: speaker identification and speaker verification, the latter having implications for security.

        Language identification plays an important role in multilingual spoken language systems, which are important in dealing with speakers from different language groups within a country, or visitors from abroad.

        Multimodality involves combinations of speech, text and image processing. Some examples are integrating recognition of facial movements with speech recognition and generating facial movements for talking heads.

        Some other areas of huge importance to these applications, and to speech communication in general, are speech analysis, speech coding and speech enhancement. Speech analysis is particularly important in helping us better understand the production of human speech, which in turn helps us improve speech technology in general.

        An excellent overview of speech technology (and human language technology in general) can be found at http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
      Terminology
        Treebank Annotation
          Treebanks are large parse-annotated text corpora. The Penn-II treebank (> 1M words) and the Susanne corpus are probably the most widely used examples. Treebank resources can be used to develop large-scale computational grammar resources and to train probabilistic grammars. We have developed a number of methodologies for semi-automatically annotating treebank resources with the feature structure information used in current unification grammars.
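
          The sketch below gives a toy flavour of the general idea: a Penn-style constituency tree is walked and simple hand-written annotation principles (e.g. the NP under S contributes the SUBJ, the NP under VP the OBJ) build up a small feature structure. The tree, the principles and the feature names are invented for illustration; this is not the NCLT annotation algorithm itself.

            # (label, children) for nonterminals, (label, word) for leaves.
            tree = ("S",
                    [("NP", [("NNP", "Mary")]),
                     ("VP", [("V", "sees"),
                             ("NP", [("NNP", "John")])])])

            def annotate(node):
                """Return a small f-structure (a nested dict) for a tree node."""
                label, children = node
                if isinstance(children, str):            # lexical leaf
                    return {"PRED": children}
                fs = {}
                for child in children:
                    child_fs = annotate(child)
                    if label == "S" and child[0] == "NP":
                        fs["SUBJ"] = child_fs            # subject principle
                    elif label == "VP" and child[0] == "NP":
                        fs["OBJ"] = child_fs             # object principle
                    else:
                        fs.update(child_fs)              # head: pass features up
                return fs

            print(annotate(tree))
            # {'SUBJ': {'PRED': 'Mary'}, 'PRED': 'sees', 'OBJ': {'PRED': 'John'}}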

          References

            Sadler, L., van Genabith, J. and Way, A. (2000). "Automatic F-Structure Annotation from the AP Treebank", in Proceedings of LFG-2000, the Fifth International Conference on Lexical-Functional Grammar, University of California at Berkeley, 19-20 July 2000, CSLI Publications, Stanford, CA, ISSN 1098-6782, http://cslipublications.stanford.edu/


        Last Updated: 12th July 2002 by aclweb@computing.dcu.ie