[Elsnet-list] INEX NLP Task 2006, Call for Participation

Xavier Tannier tannier at emse.fr
Fri Mar 3 14:20:17 CET 2006


INEX NLP Task, call for participation

  Natural Language Interfaces for XML Information Retrieval



XML Retrieval
Content-oriented XML retrieval has been receiving
increasing interest fuelled by the widespread use of the
eXtensible Markup Language (XML) as a standard
document format. The continuous growth in XML data
sources is matched by increasing efforts in the development
of XML retrieval systems, which aim at exploiting the
available structural information in documents to implement
a more focused retrieval strategy and return document
components, the so-called XML elements - instead of
complete documents - in response to a user query.
Implementing this more focused retrieval paradigm
means that an XML retrieval system needs not only to
find relevant information in the XML documents, but
also to determine the appropriate level of granularity to be
returned to the user. In addition, the relevance of a retrieved
component is dependent on meeting both content and
structural conditions.

NLP in XML Retrieval
For the third year, the INitiative for the Evaluation of XML
Retrieval (INEX) investigates the idea of using the specifics
of XML retrieval to allow users to address content and
structural needs intuitively via natural language queries.

* As in traditional information retrieval, the user need is loosely
specified, linguistic variations are frequent, and answers are a
ranked list of relevant elements.
* As in database querying, structure matters and a simple list of
keywords is not a sufficient query. Structured query languages
have been developed, but appear to be difficult to use.
* Furthermore, the size of the unit of information is variable
and elements overlap in the documents.

Therefore, developing natural language interfaces for XML-IR
is a separate research domain requiring its own innovative solutions.

The ultimate goal is to design and build software that will analyse,
understand, and generate results in response to queries that
humans express naturally. The primary objective of retrieval
would be to interpret both structural and content constraints of
an information need expressed in a natural language query (as
opposed to the rigid syntax of XPath). The IR system would not
only select and rank suitable documents, but also select the
XML elements within those documents that best satisfy the
information need (both accurately and concisely).

The 2006 INEX campaign uses the English Wikipedia collection.
Queries will concern any content or structural elements that
can be found in this set of documents, and will be written both
in English and in NEXI, a formal structured query language.

in English: "Find lists of air battles in articles dealing with World War II"
in NEXI:    //article[about(., "World War II")]//list[about(., air battle)]
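
To make the English-to-NEXI correspondence above concrete, here is a minimal
Python sketch of a pattern-based translation. It is only a toy under stated
assumptions, not the method of any participant or of the organisers: the
function name nlq_to_nexi, the single sentence pattern, and the TAGS mapping
from English nouns to element names are all hypothetical, and it assumes the
NEXI form about(., keywords). It does no stemming, so "air battles" stays
plural.

```python
import re
from typing import Optional

# Hypothetical mapping from English nouns to element names; the tag set of
# the actual collection may differ.
TAGS = {"lists": "list", "sections": "section", "paragraphs": "p"}

# One hand-written sentence pattern: "Find <unit> of <topic> in article(s)
# dealing with <context>". Real queries vary far more than this.
PATTERN = re.compile(
    r"^Find (?P<unit>\w+) of (?P<topic>.+?) "
    r"in articles? (?:dealing with|about) (?P<context>.+)$",
    re.IGNORECASE,
)


def nlq_to_nexi(query: str) -> Optional[str]:
    """Translate one narrow family of English queries into a NEXI string.

    Returns None when the query does not fit the pattern or names a unit
    that is not in TAGS.
    """
    match = PATTERN.match(query.strip().rstrip("."))
    if match is None:
        return None
    tag = TAGS.get(match.group("unit").lower())
    if tag is None:
        return None
    # Structural constraint on the article, content constraint on the element.
    return (
        f'//article[about(., "{match.group("context")}")]'
        f'//{tag}[about(., {match.group("topic")})]'
    )


print(nlq_to_nexi(
    "Find lists of air battles in article dealing with World War II"
))
```

A query outside the pattern simply returns None, which illustrates why
hand-written patterns alone are unlikely to cover the linguistic variation
mentioned above.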

NLP Tasks

There are two distinct tasks in the NLP track in 2006 - NLQ2NEXI and NLQ.

* NLQ2NEXI - a simplified task that does not require participants
to index the collection or to implement a search engine. Instead, NLQ2NEXI
requires the translation of a natural language query, provided
in the <description> element of a topic, into a formal INEX query. The submissions
of all participants will be evaluated by running the resulting NEXI titles on
search engine(s) that can operate on NEXI expressions. The
objective is to compare the results obtained with natural language
queries (translated into NEXI) with the results that are obtained
by the same search engine/s when using the original NEXI expressions.
This task is designed to allow new participants with NLP expertise to
join the INEX workshop without the need to develop a search engine.

* NLQ - this task has no restrictions on the use of any NLP technique
to interpret the queries as they appear in the <description> element
of a topic. Here participants are required to submit retrieval runs,
but enjoy the freedom to implement any NLP techniques in their search
engine. The objective is not only to compare different NLP-based
systems, but also to compare the results obtained with natural
language queries with the results obtained with NEXI queries by any
other system in the Ad-hoc track. We wish to test whether natural
language queries are effective alternatives to formal queries and to
quantify the trade-off in performance.
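
As an illustration of what quantifying that difference could look like, the
following Python sketch compares two ranked lists of elements against a set
of relevant elements using precision at rank k. This is an assumption made
for illustration only: the measures actually used by INEX are not stated in
this call, and the element identifiers and judgments below are invented toy
data.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved elements that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for element in ranked[:k] if element in relevant)
    return hits / k


# Invented toy data: element identifiers and relevance judgments.
relevant = {"a/list[1]", "a/list[2]", "b/sec[3]"}
nexi_run = ["a/list[1]", "a/list[2]", "c/p[1]", "b/sec[3]"]  # original NEXI query
nlq_run = ["a/list[1]", "c/p[1]", "a/list[2]", "d/p[4]"]     # natural language query

p_nexi = precision_at_k(nexi_run, relevant, 4)
p_nlq = precision_at_k(nlq_run, relevant, 4)

# Difference between the formal query and the natural language query.
print(p_nexi, p_nlq, p_nexi - p_nlq)
```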

Important Dates
  March 17: Deadline for declaration of intent to participate.
  May 5: Distribution of sets of topics.
  July 14: Submission deadline of search results.
  Dec 18-20: Workshop in Schloss Dagstuhl.

Shlomo Geva       s.geva at qut.edu.au
Xavier Tannier    tannier at emse.fr
