[Elsnet-list] Multilingual summary evaluation data available

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Thu Sep 30 22:37:45 CEST 2010


  Dear Colleagues,

we are happy to inform you that the first multilingual summary 
evaluation data in seven languages is available for research and 
evaluation purposes. You can download the set from 
http://langtech.jrc.ec.europa.eu/JRC_Resources.html .

This dataset consists of a manually annotated collection of document 
clusters of parallel texts in seven languages (Arabic, Czech, English, 
French, German, Russian and Spanish) that can be used to evaluate 
multi-document, or even single-document, summarisation software. The 
data is particularly useful for comparing the performance of software 
across languages.

The four document clusters each consist of five high-level commentaries 
selected from http://www.project-syndicate.org/, on topics that can 
roughly be described as malaria, the Israel-Palestine conflict, 
genetics, and science and society.

The resource and its use are described in:

     Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf Steinberger (2010).
     Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation.
     Springer Lecture Notes in Computer Science (LNCS), Volume 6360/2010, 52-63.

We look forward to receiving any comments you may have.

Best Regards

Marco Turchi, Josef Steinberger, Mijail Kabadjov and Ralf Steinberger


European Commission - Joint Research Centre (JRC)
URL - Applications: http://emm.jrc.it/overview.html
URL - The science behind them: http://langtech.jrc.ec.europa.eu/
T.P. 267, Via Fermi 2749
21027 Ispra (VA), Italy
