|
| Title: | Example-based Sign Language Machine Translation | |
| Duration: | October 1st 2004 - September 30th 2007 |
| Funded by: | IBM-IRCSET Fellowship | |
| People: | Sara Morrissey, Andy Way |
| Description: | Sign languages (SLs) are the first and preferred languages of the Deaf
Community worldwide. As with other minority languages, they are often poorly
resourced and in many cases lack political and social recognition. As with
speakers of minority languages, Deaf people are often required to access
documentation or communicate in a language that is not natural to them. In
an attempt to alleviate this problem we are developing an example-based
machine translation (EBMT) system to allow Deaf people to access information
in the language of their choice. While some research exists on translating
between natural and sign languages, we believe ours is the first attempt to
tackle this problem using an EBMT approach. |
| | An EBMT approach requires a bilingual dataset that is aligned at both the
sentential and sub-sentential levels using a predefined method. The
lack of a formally adopted, or even recognised, writing system for SLs makes
finding a dataset suited to our method difficult. Of the few
transcription methods available, we have chosen to use annotated video data
to construct our bilingual corpus. An example of such data may be seen below,
where the video of SL utterances appears in the upper left corner and the
corresponding annotations are presented horizontally below it, aligned with
a timeline. |
| |  |
| | The annotations are composed of a gloss for the articulations of the right
and left hands, with the possibility of including non-manual feature (NMF)
details such as head nods and eyebrow movements that can alter the semantics
of a sentence. One of the main advantages of using annotated data is that
all features (i.e. glosses, NMFs and phonetic descriptions of the signs in
terms of handshape, orientation, etc.) can be included and temporally
aligned. This allows the annotations to be bound together according to
their time frames to form chunks that can correspond to the chunks formed on
the spoken language side of the text. The Marker Hypothesis is used to chunk
the spoken language side of the texts. Despite the different chunking
methods, manual examination of both chunk sets showed that a large number of
potentially alignable chunks are produced. |
| | We have developed an EBMT system using data in Dutch Sign
Language/Nederlandse Gebarentaal (NGT). The dataset is composed of only 561
sentences of poetry and children's fables, a domain poorly suited to machine
translation. For this reason we have created a dataset of
Irish Sign Language (ISL) videos with corresponding annotations that is three
times the size of the NGT corpus and covers the better-suited closed domain
of flight information queries. |
| | Currently, output is in the form of SL video annotations. In future work,
we intend to make use of the phonetic details added to the annotations, in
combination with the glosses and NMFs, to automatically produce sign language
using a signing avatar like the one below. |
| |  |
|
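The two chunking steps described above can be illustrated with a minimal Python sketch. It is not the project's implementation. The marker word list, the tier names, the glosses and the rule that overlapping time spans belong to the same chunk are all assumptions made for illustration; the description only states that annotations are bound together according to their time frames and that the Marker Hypothesis is used on the spoken language side.

```python
from dataclasses import dataclass

# Hypothetical, heavily abbreviated set of English marker (closed-class) words.
MARKERS = {"the", "a", "an", "to", "from", "on", "in", "at",
           "and", "or", "i", "you", "what", "which"}


def marker_chunks(sentence):
    """Split a sentence into chunks in the spirit of the Marker Hypothesis.

    A new chunk starts at a marker word, but only once the current chunk
    already holds at least one non-marker word.
    """
    chunks, current, has_content = [], [], False
    for word in sentence.lower().split():
        if word in MARKERS and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(word)
        if word not in MARKERS:
            has_content = True
    if current:
        if has_content or not chunks:
            chunks.append(current)
        else:
            # Trailing markers with no content word join the previous chunk.
            chunks[-1].extend(current)
    return [" ".join(chunk) for chunk in chunks]


@dataclass
class Annotation:
    tier: str    # assumed tier names, e.g. "gloss_rh", "gloss_lh", "nmf"
    value: str   # gloss or NMF label
    start: int   # start time in milliseconds
    end: int     # end time in milliseconds


def group_by_time(annotations):
    """Bind annotations whose time spans overlap into one chunk (assumed rule)."""
    groups, group_end = [], None
    for ann in sorted(annotations, key=lambda a: (a.start, a.end)):
        if groups and ann.start < group_end:
            groups[-1].append(ann)
            group_end = max(group_end, ann.end)
        else:
            groups.append([ann])
            group_end = ann.end
    return groups


if __name__ == "__main__":
    print(marker_chunks("What flights leave from Dublin to Paris on Monday"))
    demo = [
        Annotation("gloss_rh", "FLIGHT", 0, 400),
        Annotation("nmf", "brow-raise", 100, 900),
        Annotation("gloss_rh", "DUBLIN", 500, 900),
        Annotation("gloss_rh", "PARIS", 1000, 1400),
    ]
    for group in group_by_time(demo):
        print([a.value for a in group])
```

Here an NMF that spans two manual glosses pulls them into a single chunk, which is one plausible reading of binding annotations by time frame. Aligning the resulting SL chunks with the marker-based chunks is a separate step that the sketch does not attempt.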