# [Elsnet-list] SIGIR 2004 Workshop CFP: Information Retrieval for Question Answering (IR4QA)

Mark Greenwood mark at dcs.shef.ac.uk
Wed Apr 14 13:08:32 CEST 2004

                         Call for Papers

SIGIR'04 Workshop

INFORMATION RETRIEVAL FOR QUESTION ANSWERING (IR4QA)

July 29, 2004, Sheffield, UK

Open domain question answering has become a very active research area
over the past few years, due in large measure to the stimulus of the
task of finding *answers* to natural language (NL) questions (e.g.
"How tall is the Eiffel Tower?", "Who is Aaron Copland?") from large
text collections. This task stands in contrast to the more conventional
IR task of retrieving *documents* relevant to a query, where the
query may be simply a collection of keywords (e.g. "Eiffel Tower",
"American composer, born Brooklyn NY 1900, ...").

Finding answers requires processing texts at a level of detail that
cannot be carried out at retrieval time for very large text
collections. This limitation has led many researchers to propose,
broadly, a two-stage approach to the QA task. In stage one a subset of
query-relevant texts is selected from the whole collection.  In stage
two this subset is subjected to detailed processing for answer
extraction. To date stage one has received limited explicit attention,
despite its obvious importance -- performance at stage two is bounded
by performance at stage one.  The goal of this workshop is to correct
this situation and, hopefully, to draw the attention of IR researchers
to the specific challenges raised by QA.
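
The two-stage approach described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function names (stage_one, stage_two, answer), the term-overlap scoring
and the regular-expression "extractor" are all hypothetical stand-ins,
not a description of any particular QA system.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stage_one(question, collection, k=2):
    """Stage one (assumed scoring): keep the k texts sharing the most
    distinct terms with the question, ignoring texts with no overlap."""
    q_terms = set(tokenize(question))
    scored = [(len(q_terms & set(tokenize(doc))), i)
              for i, doc in enumerate(collection)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [collection[i] for _, i in scored[:k]]

def stage_two(candidates):
    """Stage two (toy extractor): return the first 'N metres'
    expression found in the candidate texts, or None."""
    for doc in candidates:
        match = re.search(r"\d+ metres", doc)
        if match:
            return match.group(0)
    return None

def answer(question, collection, k=2):
    """Run both stages; stage two only ever sees stage one's output."""
    return stage_two(stage_one(question, collection, k))

COLLECTION = [
    "The Eiffel Tower is 324 metres tall.",
    "The Eiffel Tower is a landmark in Paris.",
    "Aaron Copland was an American composer.",
]

QUESTION = "How tall is the Eiffel Tower?"
print(answer(QUESTION, COLLECTION))
# If the answer-bearing text never reaches stage two, no extractor,
# however good, can recover the answer.
print(answer(QUESTION, COLLECTION[1:]))
```

The second call shows the bound in miniature: once the answer-bearing
text is missing from the retrieved subset, stage two returns nothing.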

A straightforward approach to stage one is to employ a conventional IR
engine, using the NL question as the query and with the collection
indexed in the standard manner, to retrieve the initial set of
candidate answer-bearing documents for stage two.  However, a number
of possibilities arise to optimise this set-up for QA, including:
o preprocessing the question in creating the IR query;
o preprocessing the collection to identify significant information that
can be included in the indexation for retrieval;
o adapting the similarity metric used in selecting documents;
o modifying the form of retrieval return, e.g. to deliver passages
rather than whole documents.
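
Two of the options listed above, preprocessing the question and
returning passages rather than whole documents, might look roughly like
the following sketch. The stopword list, the one-sentence passage size
and the overlap-based ranking are assumptions chosen for brevity, and
the function names are hypothetical.

```python
import re

# Assumed, deliberately tiny stopword list (includes question words).
STOPWORDS = {"how", "what", "who", "is", "the", "a", "an", "of", "in", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def question_to_query(question):
    """Preprocess the NL question into a keyword query."""
    return [t for t in tokenize(question) if t not in STOPWORDS]

def split_passages(document, size=1):
    """Split a document into passages of `size` consecutive sentences."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [" ".join(sentences[i:i + size])
            for i in range(0, len(sentences), size)]

def retrieve_passages(question, documents, k=1, size=1):
    """Return the k passages sharing the most query terms."""
    query = set(question_to_query(question))
    passages = [p for d in documents for p in split_passages(d, size)]
    ranked = sorted(passages,
                    key=lambda p: len(query & set(tokenize(p))),
                    reverse=True)
    return ranked[:k]

DOCS = ["Paris is the capital of France. "
        "The Eiffel Tower is 324 metres tall. It opened in 1889."]

print(question_to_query("How tall is the Eiffel Tower?"))
print(retrieve_passages("How tall is the Eiffel Tower?", DOCS))
```

Returning the single matching sentence instead of the whole document
leaves less text for the answer extraction stage to process.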

For this workshop, we solicit papers that address any aspect of how
this first, retrieval stage of QA can be adapted to improve overall
system performance. Possible topics include, but are not limited to:
o parametrizations/optimizations of specific IR systems for QA
o studies of query formation strategies suited to QA
o different uses of IR for factoid vs. non-factoid questions
o utility of term matching constraints, e.g. term proximity, for QA
o analyses of passage retrieval vs. full document retrieval for QA
o analyses of boolean vs. ranked retrieval for QA
o impact of IR performance on overall QA performance
o named entity preprocessing of questions or collections
o corpus preprocessing to create corpus-specific thesauri for question
expansion
o evaluation measures for assessing IR for QA
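
As one illustration of the last topic, a retrieval measure aimed at QA
could count the questions for which at least one answer-bearing document
appears in the top n results. The sketch below assumes that relevance
judgements are available as a mapping from question id to a set of
answer-bearing document ids; the function name and data layout are
hypothetical, and this is only one possible measure.

```python
def coverage_at_n(rankings, answer_bearing, n):
    """Fraction of questions with at least one answer-bearing
    document among the top n retrieved documents.

    rankings:       question id -> ranked list of document ids
    answer_bearing: question id -> set of answer-bearing document ids
    """
    if not rankings:
        return 0.0
    hits = sum(
        1 for qid, ranked in rankings.items()
        if any(doc in answer_bearing.get(qid, set()) for doc in ranked[:n])
    )
    return hits / len(rankings)

# Toy data: q1's answer is at rank 2, q2's answer is never retrieved.
RANKINGS = {"q1": ["d3", "d1"], "q2": ["d2", "d5"]}
ANSWER_BEARING = {"q1": {"d1"}, "q2": {"d9"}}

print(coverage_at_n(RANKINGS, ANSWER_BEARING, 1))
print(coverage_at_n(RANKINGS, ANSWER_BEARING, 2))
```

Under this measure, the value at a given n is also an upper limit on the
fraction of questions the answer extraction stage could get right when
it is handed the top n documents.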

The workshop will include paper presentations and discussion.  All
those wishing to make a presentation should submit a 5-8 page position
paper; other attendees may submit a short abstract on why this topic
is of interest to them. The papers should describe recent work and may
be preliminary in nature.  The programme committee will arrange the
presentations and discussion based on the quality of submissions and
expressed interests of the attendees, and may invite other
presentations as well. See http://www.sigir.org/sigir2004 for further
details.

Important Dates
===============

Position paper submission:    June 7
Final papers due:             July 6
Workshop:                     July 29

Submission Instructions
=======================

Position papers should be no more than 4000 words (5-8 pages). The
standard ACM conference style is recommended (see:
http://www.acm.org/sigs/pubs/proceed/template.html). Submissions must
be sent electronically in PDF or PostScript format to:

Rob Gaizauskas
R.Gaizauskas at sheffield.ac.uk

Workshop Organizers
===================

Rob Gaizauskas          (University of Sheffield)
Mark Hepple             (University of Sheffield)
Mark Greenwood          (University of Sheffield)

Programme Committee
===================

Charles Clarke          (University of Waterloo)
Sanda Harabagiu         (University of Texas at Dallas)
Eduard Hovy             (University of Southern California)
Jimmy Lin               (Massachusetts Institute of Technology)
Christof Monz           (University of Maryland)
John Prager             (IBM)