Contact

Joachim Wagner
Postdoctoral Researcher
ADAPT Centre,
School of Computing,
Dublin City University
+353 (0)1 700 6915
jwagner@computing.DCU.IE

Data Normalisation for Tuning Base Text Analytics Technologies to User Generated Content

Word cloud created with Wordle.

My research within the CNGL programme is part of Creation and Curation Challenge 1, "Tuning Base Text Analytics Technologies to User Generated Content". It focuses on text normalisation (including sentence boundary detection, tokenisation, POS tagging and parsing) and on the interaction between text normalisation and domain adaptation.

Collaborators

PhD Students

This research is supported by Science Foundation Ireland.

Publications, talks etc.

See also my Google Scholar profile.

2015

Joachim Wagner and Jennifer Foster (2015): DCU-ADAPT: Learning Edit Operations for Microblog Normalisation with the Generalised Perceptron. Proceedings of the ACL 2015 Workshop on Noisy User-generated Text (W-NUT), pp. 93-98, Beijing, China. Association for Computational Linguistics. Download paper [PDF, 142 KiB]

2014

Utsab Barman, Joachim Wagner, Grzegorz Chrupała, and Jennifer Foster (2014): DCU-UVT: Word-level language classification with code-mixed data. In Proceedings of the First Workshop on Computational Approaches to Code-Switching. EMNLP 2014, Conference on Empirical Methods in Natural Language Processing, pp. 127-132, Doha, Qatar. Association for Computational Linguistics. Download paper [PDF, 170 KiB]

Utsab Barman, Amitava Das, Joachim Wagner, and Jennifer Foster (2014): Code-mixing: A challenge for language identification in the language of social media. In Proceedings of the First Workshop on Computational Approaches to Code-Switching. EMNLP 2014, Conference on Empirical Methods in Natural Language Processing, pp. 13-23, Doha, Qatar, October. Association for Computational Linguistics. Download paper [PDF, 201 KiB], Download slides [PDF, 473 KiB]

Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster, and Lamia Tounsi (2014): DCU: Aspect-based polarity classification for SemEval task 4. In Proceedings of the International Workshop on Semantic Evaluation (SemEval-2014), pages 392-397, Dublin, Ireland, August. Association for Computational Linguistics. Download paper [PDF, 130 KiB], Download poster [PDF, 1,364 KiB]

Chris Hokamp, Iacer Calixto, Joachim Wagner and Jian Zhang (2014): Target-Centric Features for Translation Quality Estimation. In Proceedings of the Ninth Workshop on Statistical Machine Translation (WMT 2014), 26-27 June 2014, Baltimore, USA. Download paper [PDF, 107 KiB]

2013

Raphael Rubino, Joachim Wagner, Jennifer Foster, Johann Roturier, Rasoul Samad Zadeh Kaljahi and Fred Hollowood (2013): DCU-Symantec at the WMT 2013 Quality Estimation Shared Task. In Proceedings of the Eighth Workshop on Statistical Machine Translation (WMT 2013), 8-9 August 2013, Sofia, Bulgaria. Download paper [PDF, 150 KiB]

2012

Joseph Le Roux, Jennifer Foster, Joachim Wagner, Rasul Samad Zadeh Kaljahi and Anton Bryl (2012): DCU-Paris13 Systems for the SANCL 2012 Shared Task. System description posted on the Shared-Task Website of The NAACL 2012 First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) (not part of the proceedings), June 8, 2012, Montreal, Quebec, Canada. Download system description [PDF, 162 KiB]

Raphael Rubino, Jennifer Foster, Joachim Wagner, Johann Roturier, Rasul Samad Zadeh Kaljahi and Fred Hollowood (2012): DCU-Symantec Submission for the WMT 2012 Quality Estimation Task. In Proceedings of the NAACL 2012 Seventh Workshop on Statistical Machine Translation (WMT'12), June 7 - 8, 2012, Montreal, Quebec, Canada. Download paper [PDF, 147 KiB]

Joachim Wagner (2012): Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers. PhD Thesis, Dublin City University, Dublin, Ireland. Download thesis [PDF, 5.5 MB]

2011

Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan and Josef van Genabith (2011): From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand.

Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner and Josef van Genabith (2011): Comparing the use of edited and unedited text in parser self-training. In Proceedings of the 12th International Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland

Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux and Stephen Hogan (2011): #hardtoparse: POS Tagging and Parsing the Twitterverse. In Proceedings of the Workshop on Analyzing Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), 8 August 2011, Hyatt Regency Hotel, San Francisco

2009

Joachim Wagner and Jennifer Foster (2009): The effect of correcting grammatical errors on parse probabilities. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), Paris, France, 7th-9th October, 2009

Joachim Wagner, Jennifer Foster and Josef van Genabith (2009): Judging Grammaticality: Experiments in Sentence Classification. In CALICO Journal, pages 474-490, volume 26, number 3

2008

Jennifer Foster, Joachim Wagner, and Josef van Genabith (2008): Adapting a WSJ-Trained Parser to Grammatically Noisy Text. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, pages 221-224, Columbus, OH, June 15-20, 2008

Deirdre Hogan, Jennifer Foster, Joachim Wagner and Josef van Genabith (2008): Parser-Based Retraining for Domain Adaptation of Probabilistic Generators (Title of early draft: Investigating the Effect of Domain Variation on Generation Performance). In Proceedings of the 5th International Natural Language Generation Conference (INLG08), Salt Fork Park, Ohio, June 12-14, 2008

Jennifer Foster, Joachim Wagner, and Josef van Genabith (2008): Using Decision Trees to Detect and Classify Grammatical Errors. Talk presented jointly by Jennifer and me at the Calico '08 Workshop on Automatic Analysis of Learner Language: Bridging Foreign Language Teaching Needs and NLP Possibilities, University of San Francisco, March 18 and 19, 2008, PDF

Joachim Wagner (2008): Nadja Nesselhauf, Collocations in a Learner Corpus. Book review in Machine Translation Vol 20, No 4, March 2006 [sic], pages 301-303, DOI: 10.1007/s10590-007-9028-8, Draft PDF

2007

Joachim Wagner, Djamé Seddah, Jennifer Foster and Josef van Genabith (2007): C-Structures and F-Structures for the British National Corpus. In Proceedings of the Twelfth International Lexical Functional Grammar Conference (LFG07), pages 418-438, CSLI Publications, Stanford University, July 28-30, 2007, PDF from publisher website, DORAS repository

Joachim Wagner, Jennifer Foster and Josef van Genabith (2007): A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, June 28-30, 2007 (Extended version presented at the Summer 2007 ParGram meeting in Palo Alto.)

Jennifer Foster, Joachim Wagner, Djamé Seddah and Josef van Genabith (2007): Adapting WSJ-Trained Parsers to the British National Corpus using In-Domain Self-Training. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT 2007), Prague, June 23-24, 2007

2006

Joachim Wagner, Jennifer Foster and Josef van Genabith (2006): Detecting Grammatical Errors Using Probabilistic Parsing. Talk presented by Jennifer at the Workshop on Interfaces of Intelligent Computer-Assisted Language Learning, Ohio State University, December 17, 2006

2005

Gareth J. F. Jones, Michael Burke, John Judge, Anna Khasin, Adenike Lam-Adesina and Joachim Wagner (2005): Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval. In Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum, Carol Peters, Paul Clough, Julio Gonzalo, G. Jones, M. Kluck and B. Magnini (Eds.), Volume 3491 of Lecture Notes in Computer Science, pages 207-220, Springer, Heidelberg, Germany

2004

Petra Ludewig and Joachim Wagner (2004): Collocations - mediating between lexical abstractions and textual concretions. In Proc. of the sixth TALC conference, pages 32-33, Granada, Spain - Handout

Cara Greene, Katrina Keogh, Thomas Koller, Joachim Wagner, Monica Ward and Josef van Genabith (2004): Using NLP Technology in CALL. In NLP and Speech Technologies in Advanced Language Learning Systems - Proc. of InSTIL/ICALL2004 Symposium on Computer Assisted Language Learning, ed. Rodolfo Delmonte, Philippe Delcloque and Sara Tonelli, pages 55-58, Venice, Italy - Handout, more

Joachim Wagner (2004): A false friend exercise with authentic material retrieved from a corpus. In NLP and Speech Technologies in Advanced Language Learning Systems - Proc. of InSTIL/ICALL2004 Symposium on Computer Assisted Language Learning, pages 115-118, Venice, Italy - Poster, more

2003

Monica Ward, Thomas Koller and Joachim Wagner (2003): Integrating Techniques from computational Linguistics into Computer-Assisted Language Learning. Poster presented at the Annual IRCSET Symposium 2003, Dublin, Ireland

Joachim Wagner (2003): Datengesteuerte maschinelle Übersetzung mit flachen Analysestrukturen, Master's thesis, Universität Osnabrück, Germany

2002

Jahn-Takeshi Saito, Joachim Wagner, Graham Katz, Philip Reuter, Michael Burke, and Sabine Reinhard (2002): Evaluation of GermaNet: Problems Using GermaNet for Automatic Word Sense Disambiguation. In Proc. of the LREC Workshop on WordNet Structure and Standardization and how These Affect WordNet Applications and Evaluation, pages 14-29, Las Palmas de Gran Canaria

Norman Kummer and Joachim Wagner (2002): Phrase processing for detecting collocations with KoKS. In online Proc. of Colloc02 Workshop on Computational Approaches to Collocations, http://www.ai.univie.ac.at/colloc02/, Vienna, Austria - more

Arno Erpenbeck, Britta Koch, Norman Kummer, Philip Reuter, Patrick Tschorn and Joachim Wagner (2002): KOKS - Korpusbasierte Kollokationssuche, technical report (Abschlussbericht), Universität Osnabrück, Germany

Enlarged NCLT Logo

This is an enlarged and edited version of the tiny NCLT logo that can be found on the NCLT website. I created this version to improve its appearance on posters.
Download: 1360 x 1152 image [96 KB]

Building Letter Codes

Excerpt from http://www.dcu.ie/buildings/downloads/brochure.pdf [846 KB]

Other Research Interests

Corpus Preprocessing

I worked on corpus preprocessing in several projects:

Project       Corpus                                         Year
LogoTax       Spiegel                                        1999
KoKS          De-News and EuroParl (project's own download)  2001
My M.A.       Harry Potter 1-4                               2003
PhD research  leser-service.de (book excerpts)               2004
DCU CLEF      Newspaper                                      2004
PhD research  Glasgow Herald (sample)                        2005
PhD research  EuroParl (OPUS)                                2005
PhD research  Jennifer Foster's error corpus                 2006
PhD research  BNC 1.0                                        2006
PhD research  JPU learner corpus (sample)                    2007
PhD research  PELCRA learner corpus (sample)                 2007
PhD research  Microsoft "ESL 123 Mass Noun Examples"         2007
PhD research  ICLE learner corpus                            2008
PhD research  WSJ raw sections 26-60 (PTB v0.75)             2008
PhD research  Gonzaga learner corpus (sample)                2008

Resources

Affiliation

CNGL HPC Cluster Administration

My Other Homepages



2015-10-28T11:58:49+0000
© 2004 - 2015 Joachim Wagner jwagner@computing.DCU.IE