[Elsnet-list] Recognising Textual Entailment Challenge--A Common Benchmark for Semantic Inference: First Announcement and Call for Participation

Oren Glickman oren at glickman.com
Mon Jun 14 13:04:28 CEST 2004


***************************************************************
           Recognising Textual Entailment Challenge
           A Common Benchmark for Semantic Inference

         First Announcement and Call for Participation

          http://www.pascal-network.org/Challenges/RTE/
***************************************************************

INTRODUCTION

Textual entailment recognition is the task of deciding,  given
two  text  fragments,  whether the  meaning  of  one  text  is
entailed (can be inferred) from the other text. This task
generically captures a broad range of inferences that are
relevant to multiple applications. For example, a Question
Answering  (QA) system has to identify texts that  entail  the
expected answer. Given the question "Who killed Kennedy?", the
text  "the  assassination of Kennedy by  Oswald"  entails  the
expected  answer form "Oswald killed Kennedy".  Similarly,  in
Information Retrieval (IR) the concept or relation denoted  by
a  query expression should be entailed from relevant retrieved
documents.    In  multi-document  summarization  a   redundant
sentence or expression, to be omitted from the summary, should
be   entailed  from  other  expressions  in  the  summary.  In
Information Extraction (IE) entailment holds between different
text  variants that express the same target relation.  And  in
Machine Translation evaluation a correct translation should be
semantically equivalent to the gold standard translation,  and
so both translations have to entail each other. Thus, in a
similar spirit to Word Sense Disambiguation and Named Entity
Recognition, which are recognized as generic tasks, modeling
textual  entailment may consolidate and promote broad research
on applied semantic inference. The PASCAL Challenge introduces
textual  entailment as a common task and evaluation  framework
for  Natural  Language Processing, Information  Retrieval  and
Machine Learning researchers.

TASK DEFINITION

Participants in the evaluation exercise will be provided  with
pairs  of  small  text  snippets (one  or  more  sentences  in
English), which we term Text-Hypothesis (T-H) pairs. The  data
set  will  include over 1000 English T-H pairs from  the  news
domain (political, economic, etc.). Examples will be
manually  tagged for entailment (i.e. whether T entails  H  or
not)   by  human  annotators  and  will  be  divided  into   a
Development  Set (one third of the data) and a Test  Set  (two
thirds of the data). Participating systems will have to decide
for  each  T-H  pair whether T indeed entails H  or  not,  and
results will be compared to the manual gold standard.
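For illustration only, scoring a system against the manual gold
standard could be as simple as computing accuracy over the T-H
pairs; the function and variable names below are hypothetical,
and the official guidelines will define the actual evaluation:

```python
def accuracy(predictions, gold):
    """Fraction of T-H pairs whose predicted entailment label
    matches the human-annotated gold standard."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Example: annotators judged three pairs TRUE, FALSE, TRUE;
# a system that always answers TRUE gets 2 of 3 correct.
gold = [True, False, True]
predictions = [True, True, True]
score = accuracy(predictions, gold)
```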
The  dataset  will be collected with respect to  several  text
processing   applications,   such   as   question   answering,
information  extraction, information retrieval, multi-document
summarization,    paraphrase    acquisition,    and    machine
translation. Each portion of the dataset will include  typical
T-H  examples that correspond to success and failure cases  of
actual  applications.  The examples will  represent  different
levels  of  entailment reasoning, such as lexical,  syntactic,
morphological and logical.
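As a purely illustrative sketch of the lexical level of
reasoning, a naive baseline might predict entailment whenever
most hypothesis words also occur in the text; the threshold
and function below are assumptions, not part of the challenge,
and such a baseline would fail on many of the harder examples:

```python
def lexical_overlap_entails(text, hypothesis, threshold=0.75):
    """Predict entailment if at least `threshold` of the
    hypothesis words also appear in the text (naive baseline)."""
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    overlap = len(h_words & t_words) / len(h_words)
    return overlap >= threshold

t = ("Since its formation in 1948, Israel fought many wars "
     "with neighboring Arab countries.")
decision = lexical_overlap_entails(t, "Israel fought many wars")
```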
The  goal  of  the challenge is to provide a first opportunity
for  presenting and comparing possible approaches for modeling
textual  entailment. In this spirit, we aim at an  explorative
rather  than a competitive setting. While participant  results
will  be  reported  there will not be an official  ranking  of
systems. A development set will be released first to  give  an
early impression of the different types of test examples.  The
test  set  will  be released two months prior  to  the  result
submission  date. Reported systems are expected to be  generic
in  nature,  but  given  the  short  time  frame  it  will  be
acceptable  to  run generic learning or knowledge  acquisition
procedures  specifically for the lexical/syntactic  constructs
in the test set.


EXAMPLES

-----------------------------------------------------------
                      TRUE

Eyeing  the  huge market potential, currently led  by  Google,
Yahoo  took  over  search company Overture Services  Inc  last
year.

Yahoo bought Overture.
-----------------------------------------------------------
                      FALSE

Microsoft's  rival  Sun Microsystems Inc. bought  Star  Office
last  month and plans to boost its development as a  Web-based
device running over the Net on personal computers and Internet
appliances.

Microsoft bought Star Office.
-----------------------------------------------------------
                      FALSE

The   National  Institute  for  Psychobiology  in  Israel  was
established in May 1971 as the Israel Center for Psychobiology
by Prof. Joel.

Israel was established in May 1971.
-----------------------------------------------------------
                      TRUE

Since  its  formation in 1948, Israel fought  many  wars  with
neighboring Arab countries.

Israel was established in 1948.
-----------------------------------------------------------
                      FALSE

The market value of U.S. overseas assets exceeds their book
value.

The market value of U.S. overseas assets equals their book
value.
-----------------------------------------------------------


IMPORTANT DATES (may be subject to change)

** June 1,  2004: Registration Opens and Guidelines Available
** June 30, 2004: Release of the Development Set
** July 31, 2004: Release of the Test Set
** September 30, 2004: Deadline for Participants' Submissions
** October 5,  2004: Release of individual results
** October 20, 2004: Submission   of   participants' reports
** November 8-10, 2004: PASCAL Challenges Workshop
                         (in Southampton, U.K.)


ORGANIZING COMMITTEE

Ido Dagan (Bar Ilan University, Israel)
Oren Glickman (Bar Ilan University, Israel)
Bernardo   Magnini   (ITC-irst, Italy)

-----------------------------------------------------------

For registration, further information and inquiries, visit the
challenge web site (http://www.pascal-network.org/Challenges/RTE/)
or contact Oren Glickman <glikmao at cs.biu.ac.il>.

-----------------------------------------------------------
