[Elsnet-list] 1st CFP: Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task)

Marta Ruiz martaruizcostajussa at gmail.com
Tue Jul 31 14:50:53 CEST 2012

“Second Workshop on Applying Machine Learning Techniques to Optimise the
Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task)”

The “Second Workshop on Applying Machine Learning Techniques to Optimise
the Division of Labour in Hybrid MT: ML4HMT-12” is an effort to trigger a
systematic investigation into improving state-of-the-art hybrid machine
translation, making use of advanced machine-learning (ML) methodologies.

It follows the ML4HMT-11 workshop (http://www.dfki.de/ml4hmt/), which took
place last November in Barcelona. The first workshop also road-tested a
shared task (and associated data set) and laid the basis for a broader
reach in 2012.

ML4HMT-12 comprises a regular paper track on hybrid MT as well as a Shared Task.

Regular Papers ML4HMT-12

We are soliciting original papers on hybrid MT, including (but not limited to):

* use of machine learning methods in hybrid MT;
* system combination: parallel in multi-engine MT (MEMT) or sequential in
statistical post-editing (SPMT);
* combining phrases and translation units from different types of MT;
* syntactic pre-/re-ordering;
* using richer linguistic information in phrase-based or in hierarchical SMT;
* learning resources (e.g., transfer rules, transduction grammars) for
probabilistic rule-based MT.

Full papers should be anonymous and follow the COLING full paper format (

Shared Task ML4HMT-12

The main focus of the Shared Task is to address the question:

“Can Hybrid MT and System Combination techniques benefit from extra
information (linguistically motivated, decoding, runtime, confidence
scores, or other meta-data) from the systems involved?”

Participants are invited to build hybrid MT systems and/or system
combinations by using the output of several MT systems of different types,
as provided by the organisers.

While participants are encouraged to use machine learning techniques to
exploit the additional meta-data information sources, systems pursuing other
general improvements in hybrid and combination-based MT are also strongly
invited to participate in the challenge.

For systems that exploit additional meta-data information, the challenge is
that the additional meta-data is highly heterogeneous and (individual) system

The ML4HMT-12 Shared Task involves Spanish-English (ES-EN) and Chinese-English
(ZH-EN) data sets; in both cases, translation is into English.

* (ES-EN): Participants are given a bilingual development set aligned at the
sentence level. Each "bilingual sentence" contains: 1) the source sentence,
2) the target (reference) sentence and 3) the corresponding output
translations from five systems based on different MT approaches (Apertium:
Ramirez-Sanchez, 2006; Joshua: Li et al., 2009; Lucy: Alonso and
Thurmair, 2003; Moses: Koehn et al., 2007). The output has been annotated
with system-internal meta-data derived from the translation process of each
system.

* (ZH-EN): A corresponding ZH-EN data set with output translations from
three systems (Moses, Joshua and Huajian RBMT) will be provided.

Baselines are provided by two state-of-the-art open-source system
combination tools: MANY (Barrault, 2010) and CMU-MEMT (Heafield and Lavie, 2010).

Participants are challenged to build an MT mechanism that improves over the
baselines, where possible making effective use of the system-specific MT
meta-data output. They can provide solutions based on open-source systems
or develop their own mechanisms. The development set can be used for tuning
the systems during the development phase. Final submissions have to include
translation output on a test set, which will be made available one week
after the training data release. Data will be provided to build
language/reordering models, possibly re-using existing resources from MT

Participants can also make use of additional tools (e.g., for linguistic
analysis or confidence estimation) if their systems require them, but they
have to declare this explicitly upon submission so that their systems are
classified as "unconstrained". This will allow for a better comparison
between participating systems.

System output will be judged via peer-based human evaluation as well as
automatic evaluation. During the evaluation phase, participants will be
requested to rank system outputs of other participants through a web-based
interface (Appraise, Federmann 2010). Automatic metrics include BLEU
(Papineni et al., 2002), TER (Snover et al., 2006) and METEOR (Lavie,

Shared task participants will be invited to submit system description
papers (7 pages, non-anonymous, following the COLING format,

The ML4HMT workshop is supported by the META-NET T4ME project
(http://t4me.dfki.de/), funded by the DG INFSO of the European Commission
through the Seventh Framework Programme, grant agreement no. 249119
(META-NET: http://www.meta-net.eu/).

Important Dates 2012
15th August Shared task Training data release (updated ML4HMT corpus)
23rd August Shared task Test data release
15th September Shared task Translation results submission deadline
21st September Shared task Evaluation results release
30th September Workshop full paper and Shared task system description paper
submission deadline
31st October Workshop paper accept/reject notification
15th November Workshop and Shared task camera-ready papers due
8th and 9th December Pre-conference workshops

Organisers

- Prof. Josef van Genabith, Dublin City University (DCU) and Centre for Next
Generation Localisation (CNGL)
- Prof. Toni Badia, Universitat Pompeu Fabra and Barcelona Media (BM)
- Christian Federmann, German Research Center for Artificial Intelligence
(DFKI), contact person: cfedermann at dfki.de
- Dr. Maite Melero, Barcelona Media (BM)
- Dr. Marta R. Costa-jussà, Barcelona Media (BM)
- Dr. Tsuyoshi Okita, Dublin City University (DCU)

Program committee

- Eleftherios Avramidis (German Research Center for Artificial
Intelligence, Germany)
- Prof. Sivaji Bandyopadhyay (Jadavpur University, India)
- Dr. Rafael Banchs (Institute for Infocomm Research - I2R, Singapore)
- Prof. Loïc Barrault (LIUM - University of Le Mans, France)
- Prof. Antal van den Bosch (Centre for Language Studies, Radboud
University Nijmegen, Netherlands)
- Dr. Grzegorz Chrupala (Saarland University, Saarbrücken, Germany)
- Prof. Jinhua Du (Xi'an University of Technology (XAUT), China)
- Dr. Andreas Eisele (Directorate-General for Translation (DGT), European Commission)
- Dr. Cristina España-Bonet (Technical University of Catalonia, TALP, Spain)
- Dr. Declan Groves (Centre for Next Generation Localisation, Dublin City
University, Ireland)
- Dr. Yuqing Guo (Toshiba China, Research & Development Center)
- Prof. Jan Hajic (Institute of Formal and Applied Linguistics, Charles
University in Prague)
- Prof. Timo Honkela (Aalto University, Finland)
- Dr. Patrik Lambert (LIUM - University of Le Mans, France)
- Prof. Qun Liu (Institute of Computing Technology, Chinese Academy of
Sciences, China)
- Dr. Maite Melero (Barcelona Media Innovation Center, Spain)
- Dr. Tsuyoshi Okita (Dublin City University, Ireland)
- Prof. Pavel Pecina (Institute of Formal and Applied Linguistics, Charles
University in Prague)
- Dr. Marta R. Costa-jussà (Barcelona Media Innovation Center, Spain)
- Dr. Felipe Sanchez Martinez (Escuela Politecnica Superior, Universidad de
Alicante, Spain)
- Dr. Nicolas Stroppa (Google, Zurich, Switzerland)
- Prof. Hans Uszkoreit (German Research Center for Artificial Intelligence,
Germany)
- Dr. David Vilar (German Research Center for Artificial Intelligence,
Germany)