National Centre for Language Technology

Dublin City University, Ireland

Centre for Next Generation Localisation

School of Computing

School of Applied Languages and Intercultural Studies

School of Electronic Engineering

 
 
 

 

NCLT Seminar Series 2009/2010

The NCLT seminar series usually takes place on Wednesdays from 4-5 pm in Room L2.21 (School of Computing).

The schedule of presenters will be added below as they are confirmed. Please contact Deirdre Hogan if you have any queries about the NCLT 2009/2010 Seminar Series.

November 11th 2009; 16:00, L2.21. Hanna Bechara, NCLT, DCU. TBA
November 4th 2009; 16:00, L2.21. Johannes Leveling, NCLT, DCU. Semantic analysis for NLP-based applications
October 28th 2009; 16:00, L2.21. Sudip Naskar, CNGL-DCU. Template based EBMT
October 22nd 2009; 14:00, CNGL Board Room. Yusuke Miyao, Tsujii lab, University of Tokyo. Grammar engineering work at U-Tokyo. View slides
October 22nd 2009; 14:30, CNGL Board Room. Takuya Matsuzaki, Tsujii lab, University of Tokyo. HPSG parser development at U-Tokyo. View slides
October 2nd 2009; 14:00, L2.21. Anton Bryl, Joachim Wagner. IWPT dry-runs
October 1st 2009; 14:00, L2.21. Antonio Toral, Istituto di Linguistica Computazionale - CNR (Pisa, Italy) and Universitat d'Alacant (Spain). Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon
September 24th 2009; 11:00, CNGL Board Room. Yvette Graham, NCLT, DCU. Fstructure Transfer-Based SMT
September 24th 2009; 11:30, CNGL Board Room. Ozlem Cetinoglu, NCLT, DCU. DCU Grammar Projects
September 21st 2009; 12:00, L2.21. Pavel Pecina, Charles University, Prague. Lexical Association Measures and Collocation Extraction
September 9th 2009. Joel Tetreault, Educational Testing Service, Princeton. The Ups and Downs of Preposition Error Detection


Template based EBMT

Sudip Naskar, CNGL-DCU

Example-based machine translation (EBMT) is essentially translation by analogy. It takes a stance somewhere between RBMT and SMT. EBMT systems differ widely in the way they store the examples. In this talk I will give an overview of EBMT and the core tasks involved: matching, alignment and recombination. I will briefly discuss different approaches to EBMT: the run-time approach, template-based EBMT, and tree-based EBMT. Then I will talk about generalized templates and template-based EBMT in detail.
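
As a rough illustration of the matching task named above, the sketch below retrieves the stored example that is closest to an input sentence. The example pairs, the function name `best_match`, and the choice of the `difflib` similarity ratio are assumptions made for illustration only; they are not taken from the talk, and the alignment and recombination tasks are not covered.

```python
from difflib import SequenceMatcher

# Toy example base, invented for illustration: source sentence -> translation.
EXAMPLES = {
    "the cat sleeps": "le chat dort",
    "the dog eats": "le chien mange",
    "a bird sings": "un oiseau chante",
}


def best_match(sentence, examples):
    """Return (source, target, score) for the stored example most similar to `sentence`."""
    query = sentence.split()
    scored = []
    for source, target in examples.items():
        # ratio() is 2*M/T, with M matched tokens and T total tokens in both sequences.
        score = SequenceMatcher(None, query, source.split()).ratio()
        scored.append((score, source, target))
    score, source, target = max(scored, key=lambda item: item[0])
    return source, target, score


source, target, score = best_match("the cat sleeps quietly", EXAMPLES)
print(source, "->", target, round(score, 2))
```

A real system would then align the matched example with its translation and recombine fragments to cover the parts of the input that the example does not match (here, the word "quietly").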


Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon

Antonio Toral, Istituto di Linguistica Computazionale - CNR (Pisa, Italy) and Universitat d'Alacant (Spain)

In this talk I will tackle the knowledge acquisition bottleneck problem in the field of Computational Linguistics. This issue will be introduced and studied from a general point of view, with the aim of identifying key elements that could lead us a step forward. In this respect, I will argue in favour of highlighting Language Resources (LRs), Web 2.0 sources and representation standards. Subsequently, I will move to the specific by applying these guidelines to a case study: the acquisition of Named Entities (NEs). I will present an automatic procedure to build a multilingual lexicon of NEs and to connect it to other LRs and ontologies. The different phases of this methodology and the techniques involved (e.g. text similarity) will be evaluated and, furthermore, I will show the utility of the knowledge gathered by applying it to a real-world Question Answering scenario.

View slides


Fstructure Transfer-Based SMT

Yvette Graham, NCLT, DCU.

Transfer-Based SMT is composed of three parts: i) parsing to deep linguistic structure, ii) transfer from source language (SL) linguistic structure to target language (TL) linguistic structure, and iii) generation of the TL sentence. Each of the three steps uses a statistical model to select the best or n-best output. In this talk, I describe a Transfer-Based SMT system that uses the LFG Fstructure as the intermediate representation for transfer and is trained fully automatically on LFG-parsed bilingual corpora. For training, as in Phrase-Based SMT, we extract phrasal correspondences by first establishing a word alignment between pairs of sentences and then extracting all phrases consistent with this alignment. In our case, the word alignment is between nodes in dependency structures as opposed to surface-form sentences. In addition, the extracted phrases are pairs of dependency snippets, with variables allowed at leaf level to map missing arguments to the correct position in the TL. The system includes a statistical beam-search decoder that uses a log-linear model to combine feature scores for ranking hypothesis TL structures. In the talk, I will present preliminary experimental results for the system trained on Europarl and Newswire text, restricted to sentences of 5-15 words, and tested on held-out data.
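
The notion of phrases "consistent with" a word alignment can be sketched on plain word sequences as follows. This is a minimal illustration of the standard Phrase-Based SMT criterion, not the system described in the talk, which aligns nodes in dependency structures rather than surface words. The function name `extract_phrase_pairs`, the `max_len` parameter, and the toy sentence pair are assumptions, and the usual extension over unaligned target words is left out.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    `alignment` is a set of (source_index, target_index) links. A pair of spans
    is consistent if it contains at least one link and no link connects a word
    inside one span to a word outside the other.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if a target word in [j1, j2] links outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Toy sentence pair and alignment, invented for illustration.
src = "michael assumes that".split()
tgt = "michael geht davon aus dass".split()
alignment = {(0, 0), (1, 1), (1, 2), (1, 3), (2, 4)}

for pair in extract_phrase_pairs(src, tgt, alignment, max_len=4):
    print(pair)
```

In this toy case "assumes" is linked to three target words, so it can only be extracted together with all of "geht davon aus".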


DCU Grammar Projects

Ozlem Cetinoglu, NCLT, DCU.

DCU employs systems that can automatically annotate Penn-II style trees and generate deep syntactic analyses based on Lexical Functional Grammar. This talk starts with a very brief overview of the existing automatic annotation algorithms developed at DCU. The remainder of the talk focuses on the English Annotation Algorithm, in particular, enriching and restructuring its output so that the resulting syntactic analyses also contain deep semantic representations.


Lexical Association Measures and Collocation Extraction

Pavel Pecina, Charles University, Prague.

We present an extensive empirical evaluation of collocation extraction methods based on lexical association measures and their combination. The experiments are performed on a set of collocation candidates extracted from the Prague Dependency Treebank with manual morphosyntactic annotation. The collocation candidates were manually labeled as collocational or non-collocational. The evaluation is based on measuring the quality of ranking the candidates according to their chance to form collocations. Performance of the methods is compared by precision-recall curves and mean average precision scores. Further, we study the possibility of combining lexical association measures and present empirical results of several combination methods that significantly improved the state of the art in this task. We also propose a model reduction algorithm that significantly reduces the number of combined measures without a statistically significant difference in performance.
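
The evaluation setup described above (rank candidates by an association measure, then score the ranking against manual labels) can be sketched as follows. Pointwise mutual information is used here only as one well-known association measure; the abstract does not say which measures were evaluated. The counts, labels, and function names are invented for illustration, and the sketch computes average precision for a single ranked list, which would be averaged over several lists to obtain a mean average precision score.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information of a word pair, in bits."""
    return math.log2((pair_count * total) / (x_count * y_count))


def average_precision(ranked_labels):
    """Average of the precision values at the rank of each true collocation."""
    hits, precisions = 0, []
    for rank, is_collocation in enumerate(ranked_labels, start=1):
        if is_collocation:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Invented toy data: (pair, pair count, count of word 1, count of word 2, manual label).
TOTAL = 1000
candidates = [
    ("strong tea", 8, 10, 16, True),
    ("of the", 20, 200, 250, False),
    ("prime minister", 4, 8, 8, True),
    ("new car", 3, 5, 10, False),
]

# Rank candidates from highest to lowest association score.
ranked = sorted(candidates, key=lambda c: pmi(c[1], c[2], c[3], TOTAL), reverse=True)
labels = [c[4] for c in ranked]
print([c[0] for c in ranked])
print(round(average_precision(labels), 3))
```

In this toy ranking a non-collocation lands between the two true collocations, so the average precision falls below 1.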

View slides


The Ups and Downs of Preposition Error Detection

Joel Tetreault, Educational Testing Service, Princeton

The long-term goal of our work is to develop a system which detects errors in grammar and usage so that appropriate feedback can be given to non-native English writers, a large and growing segment of the world's population. Estimates are that in China alone as many as 300 million people are currently studying English as a second language (ESL). In particular, usage errors involving prepositions are among the most common types seen in the writing of non-native English speakers. For example, Izumi et al. (2003) reported error rates for English prepositions that were as high as 10% in a Japanese learner corpus. Since prepositions are such a nettlesome problem for ESL writers, developing an NLP application that can reliably detect these types of errors will provide an invaluable learning resource to ESL students. To address this problem, we use a maximum entropy classifier combined with rule-based filters to detect preposition errors in a corpus of student essays with a precision of 84%. In this talk, I will discuss the system as well as issues in developing and evaluating NLP grammatical error detection applications.

View slides






Dublin City University   Last update: 24th September 2009