[Elsnet-list] [Mt-list] CFP: ACL 2005 Workshop "Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization"

Alon Lavie alavie at cs.cmu.edu
Sun Mar 13 08:39:14 CET 2005


                  CALL FOR PAPERS  
  
       Intrinsic and Extrinsic Evaluation Measures 
             for MT and/or Summarization 
  
            Workshop at the Annual Meeting of 
      the Association for Computational Linguistics (ACL 2005) 
  
               Ann Arbor, Michigan 
                   June 29, 2005   
  
            http://www.isi.edu/~cyl/MTSE2005/ 

This one-day workshop will focus on the challenges that the MT 
and summarization communities face in developing valid and useful 
evaluation measures. Our aim is to bring these two communities 
together to learn from each other's approaches. 
  
In the past few years, we have witnessed---in both MT and 
summarization evaluation---the innovation of n-gram-based intrinsic 
metrics that automatically score system-outputs against 
human-produced reference documents (e.g., IBM's BLEU and ISI/USC's 
counterpart ROUGE).  Similarly, there has been renewed interest in 
user applications and task-based extrinsic measures in both 
communities (e.g., DUC'05 and TIDES'04). Most recently, evaluation 
efforts have tested for correlations to cross-validate 
independently derived intrinsic and extrinsic assessments of 
system-outputs with each other and with human judgments on output, 
such as adequacy and fluency. 
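 
To make the flavor of such n-gram-based scoring concrete, here is a 
minimal illustrative sketch (in Python) of a clipped n-gram precision, 
the basic ingredient underlying overlap metrics of this kind. It is 
not the official BLEU or ROUGE definition; the function names and 
example sentences are invented for illustration. 
 
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_ngram_precision(system, references, n=1):
    # Fraction of system n-grams that also appear in some reference, with each
    # n-gram's count clipped to its maximum count across the references.
    sys_ngrams = ngrams(system, n)
    if not sys_ngrams:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    overlap = sum(min(count, max_ref[gram]) for gram, count in sys_ngrams.items())
    return overlap / sum(sys_ngrams.values())

# Toy usage (invented sentences): one system output, two human references.
system = "the cat sat on a red mat".split()
references = ["the cat was sitting on the mat".split(),
              "a cat sat on the mat".split()]
print(clipped_ngram_precision(system, references, n=1))  # unigram precision
print(clipped_ngram_precision(system, references, n=2))  # bigram precision
 
In practice, BLEU combines several n-gram orders with a brevity penalty 
and ROUGE emphasizes recall against the references; the toy score above 
shows only the core overlap computation. 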
  
The concrete questions that we hope to see addressed in this 
workshop include, but are not limited to: 
  
- How adequately do intrinsic measures capture the variation 
  between system-outputs and human-generated reference documents 
  (summaries or translations)? What methods exist for calibrating 
  and controlling the variation in linguistic complexity and content 
  differences in input test-sets and reference sets?  
  How much variation exists within these constructed sets? 
  How does that variation affect different intrinsic measures? 
  How many reference documents are needed for effective scoring? 
  
- How can intrinsic measures go beyond simple n-gram matching, to 
  quantify the similarity between system-output and human-references? 
  What other features and weighting alternatives lead to better 
  metrics for both MT and summarization?  How can intrinsic measures 
  capture fluency and adequacy? Which types of new intrinsic metrics 
  are needed to adequately evaluate non-extractive summaries and 
  paraphrasing (e.g., interlingual) translations? 
  
- How effectively do extrinsic (or proxy extrinsic) measures capture the 
  quality of system output, as needed for downstream use in human tasks, 
  such as triage (document relevance judgments), extraction (factual 
  question answering), and report writing; and in automated tasks, 
  such as filtering, information extraction, and question-answering? 
  For example, when is an MT system good enough that a summarization 
  system benefits from the additional information available in 
  the MT output? 
  
- How should metrics for MT and summarization be assessed and 
  compared? What characteristics should a good metric possess? 
  When is one evaluation method better than another? What are the 
  most effective ways of assessing correlation tests and 
  statistical models that seek to predict human task performance 
  or human notions of output quality (e.g., fluency and adequacy) 
  from "cheaper" automatic metrics? How reliable are human judgments? 
  (A small illustrative correlation sketch follows this list.) 
  
Anyone with an interest in MT or summarization evaluation research or 
in issues pertaining to the combination of MT and summarization is 
encouraged to participate in the workshop. We are looking for research 
papers on the aforementioned topics, as well as position papers that 
identify limitations in current approaches and describe promising 
future research directions. 
 
SHARED DATA SETS 
To facilitate the comparison of different measures during the 
workshop, we will be making available data sets in advance for 
workshop participants to test their approaches to evaluation. 
For details on accessing the data sets, please go to the workshop's 
website at http://www.isi.edu/~cyl/MTSE2005. 
  
WORKSHOP FORMAT 
The workshop will include presentations of research papers 
and short reports, an invited report on the TIDES 2005 Multi-lingual, 
multi-document summarization evaluation, and significant discussion 
time to compare results of different researchers. The workshop 
will conclude with a panel of invited discussants to address future 
research directions. 
  
TARGET AUDIENCE 
The topic of this workshop should be of significant interest to 
the entire MT and summarization research communities, and also to 
commercial developers of MT and summarization systems.  It should be 
of particular interest to program managers and participants in the 
MT and summarization programs funded by the US Government, where 
common evaluations are an integral part of the research program.  
 
SUBMISSION INFORMATION 
Submissions will consist of regular full papers, reports on evaluations 
using shared data sets, and position papers, formatted following the 
ACL 2005 guidelines. Details for submission will be posted on the 
workshop website. The submission and review processes will be 
handled electronically. 
 
IMPORTANT DATES   
All submissions due:             Mon, May  2, 2005
Notification:                    Sun, May 22, 2005   
Camera-ready papers due:         Wed, June 1, 2005 
 
ORGANIZERS 
  
Jade Goldstein, US Department of Defense, USA
Alon Lavie, Language Technologies Institute, CMU, USA
Chin-Yew Lin, Information Sciences Institute, USC, USA
Clare Voss, Army Research Laboratory, USA 
  
PROGRAM COMMITTEE 
  
Yasuhiro Akiba (ATR, Japan) 
Leslie Barrett (TransClick, USA) 
Bonnie Dorr (U Maryland, USA) 
Tony Hartley (U Leeds, UK)  
John Henderson (MITRE, USA)  
Chiori Hori (LTI CMU, USA) 
Eduard Hovy (ISI/USC, USA) 
Doug Jones (MIT Lincoln Laboratory, USA) 
Philipp Koehn (CSAIL MIT, USA) 
Marie-Francine Moens (Katholieke Universiteit Leuven, Belgium) 
Hermann Ney (RWTH Aachen, Germany) 
Franz Och (Google, USA) 
Becky Passonneau (Columbia U, NY USA) 
Andrei Popescu-Belis  (ISSCO/TIM/ETI, U Geneva, Switzerland) 
Dragomir Radev (U Michigan, USA) 
Karen Sparck Jones (Computer Laboratory, Cambridge U, UK) 
Simone Teufel (Computer Laboratory, Cambridge U, UK) 
Nicola Ueffing (RWTH Aachen, Germany) 
Hans van Halteren (U Nijmegen, The Netherlands) 
Michelle Vanni (ARL, USA) 
Dekai Wu (HKUST, Hong Kong) 

