[Elsnet-list] EACL workshop: Web as Corpus

Adam Kilgarriff adam at lexmasterclass.com
Thu Nov 17 16:31:14 CET 2005

EACL 2006 Workshop on the Web as Corpus




April 4 2006, Trento, Italy




The EACL 2006 Workshop on the Web as Corpus will be held in
conjunction with the 11th Conference of the European Chapter of the
Association for Computational Linguistics, which takes place April
3-7, 2006, in Trento, Italy.




Despite the fact that a growing body of work has shown that the World
Wide Web is a mine of language data of unprecedented richness and ease
of access (see, e.g., the papers collected in Kilgarriff and
Grefenstette, 2003), many fundamental issues about the viability and
exploitation of the Web as a linguistic corpus are just starting to be
tackled, ranging from Web frequency distributions and registers, to
efficient handling of massive data sets, to copyright.  Research on
the Web as corpus is currently at a very exciting stage: increasing
evidence points to the enormous potential of the Internet as a source
of linguistic data, but we are still far from a working, fully-fledged
linguists' search engine.


We invite submissions which:


* describe Web corpus collection projects, or modules for one part of
  the process (crawling, filtering, language-id, tokenising,
  lemmatising, POS-tagging, indexing, ...)


* explore characteristics of Web data, from a linguistics/NLP
  perspective



* use crawled Web data for NLP purposes.


Preference will be given to projects where Web data are downloaded and
processed directly, rather than via search engine interfaces.





Authors are invited to submit full papers on original, unpublished
work in the topic area of this workshop. Submissions should follow the
two-column format of ACL proceedings and should not exceed eight (8)
pages, including references. We strongly recommend the use of ACL
LaTeX or Microsoft Word style files tailored for this year's
conference available at




Papers must conform to the official EACL-06 style guidelines, and we
reserve the right to reject submissions that do not conform to these
styles, including font size restrictions.  Submissions should be in PDF
format and must include all fonts, so that the paper will print (not
just view) anywhere.


Please submit your paper no later than January 6, 2006. Details on the
submission procedure will be available soon on the workshop Website.


Each submission will be reviewed by at least two members of the
programme committee. Accepted papers will be published in the workshop
proceedings.



Dual submissions to the main EACL 2006 conference and this workshop
are allowed; if you submit to the main session, do indicate this when
you submit to the workshop, and specify your EACL submission reference
number, for administrative ease.  If your paper is accepted for the
main session, you should withdraw your paper from the workshop upon
notification by the main session.





Information on registration and registration fees will be provided on
the conference web page.





January 6, 2006   - Deadline for workshop papers
January 27, 2006  - Notification of acceptance
February 10, 2006 - Camera-ready papers due
April 4, 2006     - Workshop


As the schedule is extremely tight, deadline extensions are NOT
possible.






Marco Baroni (co-chair)
Silvia Bernardini
Massimiliano Ciaramita
Stefan Evert
William H. Fletcher
Gregory Grefenstette
Frank Keller
Adam Kilgarriff (co-chair)
Mirella Lapata
Anke Luedeling
Drago Radev
Philip Resnik
Serge Sharoff





Workshop web page 



Conference web page 



EACL 2006 Workshops site







Adam Kilgarriff
Lexical Computing Ltd
71 Freshfield Road, Brighton BN2 0BL, UK
adam -at-lexmasterclass . com

