[Elsnet-list] Greek named-entity recognizer (version 2) freely available

Ion Androutsopoulos ion at aueb.gr
Thu Apr 19 17:47:26 CEST 2007


We are pleased to announce that the second version of our Greek 
named-entity recognizer is now freely available from 
<http://www.aueb.gr/users/ion/software/GREEK_NERC_v2.tar.gz>. Apart from 
temporal expressions and person names, the recognizer now also supports 
organization names.

The recognizer was developed as part of the final-year undergraduate project 
described in the following report (in Greek):

Xenofon Vasilakos, "Named entity recognition and categorization in Greek 
texts with Support Vector Machines", Department of Informatics, Athens 
University of Economics and Business, 2006. Available from: 
<http://www.aueb.gr/users/ion/docs/vassilakos_final_report.pdf>.

The system is also described in the following paper (in English):

G. Lucarelli, X. Vasilakos and I. Androutsopoulos, "Named entity 
recognition in Greek texts with an ensemble of SVMs and active learning". 
Accepted for publication by the International Journal of AI Tools, World 
Scientific. Pre-print available from: 
<http://www.aueb.gr/users/ion/publications.html>.

The system is an extension of the recognizer that was developed in the 
following MSc thesis (in Greek):

G. Lucarelli, "Named entity recognition and categorization in Greek 
texts", MSc thesis, Department of Informatics, Athens University of 
Economics and Business, 2005. Available from: 
<http://www.aueb.gr/users/ion/docs/lucarelli_msc_final_report.pdf>.

The following paper (in English) is a summary of the MSc thesis:

G. Lucarelli and I. Androutsopoulos, "A Greek named-entity recognizer 
that uses Support Vector Machines and active learning". Proceedings of 
the 4th Hellenic Conference on Artificial Intelligence (SETN 2006), 
Heraklion, Crete, Greece, 2006. Available from: 
<http://www.aueb.gr/users/ion/docs/setn2006_paper.pdf>.

The software identifies temporal expressions, person names and 
organization names in Greek texts. It uses semi-automatically produced 
regular-expression patterns for temporal expressions and an ensemble of 
Support Vector Machines (SVMs) for person and organization names. It 
also includes a sentence splitter, which likewise employs an SVM. The 
named-entity recognizer and the sentence splitter are released under 
the GNU General Public License. The software requires LIBSVM, which is 
available from: <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>. For 
convenience, a copy of LIBSVM is included with the recognizer; please 
read LIBSVM's copyright notice (included).
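
To give a rough idea of what pattern-based recognition of temporal 
expressions can look like, here is a minimal Java sketch. It is NOT taken 
from the released software: the class name TemporalExpressionSketch, the 
method findDates and the two hand-written patterns (numeric dates and 
"day month year" with the month in the genitive) are assumptions made 
purely for illustration. The recognizer's own patterns are produced 
semi-automatically and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of GREEK_NERC_v2.
class TemporalExpressionSketch {

    // Greek month names in the genitive case, as used in full dates.
    private static final String MONTH =
        "(?:Ιανουαρίου|Φεβρουαρίου|Μαρτίου|Απριλίου|Μαΐου|Ιουνίου"
        + "|Ιουλίου|Αυγούστου|Σεπτεμβρίου|Οκτωβρίου|Νοεμβρίου|Δεκεμβρίου)";

    // Two hand-written alternatives:
    //   1. numeric dates such as 19/4/2007
    //   2. day, month name, four-digit year, such as 19 Απριλίου 2007
    private static final Pattern DATE = Pattern.compile(
        "\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b"
        + "|\\b\\d{1,2}\\s+" + MONTH + "\\s+\\d{4}\\b");

    // Returns every substring of text matched by DATE, in order of appearance.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Η ανακοίνωση έγινε στις 19 Απριλίου 2007 (19/4/2007).";
        for (String date : findDates(text)) {
            System.out.println(date);
        }
    }
}
```

The source file contains Greek characters, so it should be saved and 
compiled as UTF-8 (for example with javac -encoding UTF-8). A sketch like 
this covers only fully specified dates; relative or partial expressions 
would need further patterns.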

The named-entity recognizer (including the sentence splitter) is a Java 
application. You need Java 1.6 or later to use it. The software is 
currently configured to be used as an MS Windows application. It should 
be possible to use the software on Unix/Linux machines, but this 
requires recompiling the C++ code of LIBSVM.

PLEASE NOTE THAT THE NAMED-ENTITY RECOGNIZER IS A RESEARCH PROTOTYPE. IT 
IS PROVIDED WITH ABSOLUTELY NO GUARANTEE AND ABSOLUTELY NO SUPPORT!   

Xenofon Vasilakos, Giorgio Lucarelli and Ion Androutsopoulos
Department of Informatics
Athens University of Economics and Business, Greece
<http://www.cs.aueb.gr/>


