SemEval 2010

Coreference Resolution in Multiple Languages

The task is concerned with intra-document coreference resolution for six different languages: Catalan, Dutch, English, German, Italian and Spanish. The core of the task is to identify which noun phrases (NPs) in a text refer to the same discourse entity.
Data is provided for both statistical training and evaluation. The coreference chains are extracted from manually annotated corpora: the AnCora corpora for Catalan and Spanish, the OntoNotes and ARRAU corpora for English, TüBa-D/Z for German, the KNACK corpus for Dutch, and the LiveMemories corpus for Italian. The data is additionally enriched with morphological, syntactic and semantic information (such as gender, number, constituents, dependencies and predicates).
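To illustrate what "identifying which NPs refer to the same discourse entity" produces, the sketch below groups mentions into coreference chains from pairwise "these two mentions corefer" decisions, using a union-find structure. This is a minimal sketch, not the task's official format or a reference system: the mention IDs, texts, the `links` list and the `build_chains` helper are all hypothetical.

```python
# Hypothetical mentions from one document: mention_id -> NP text.
mentions = {
    0: "Maria",
    1: "the teacher",
    2: "John",
    3: "she",
    4: "him",
}

# Hypothetical pairwise decisions: pairs of mentions judged coreferent.
links = [(0, 1), (1, 3), (2, 4)]


def build_chains(mention_ids, links):
    """Group mentions into coreference chains (hypothetical helper).

    Pairwise links are merged transitively with union-find, so if A~B and
    B~C then A, B and C end up in the same chain. Mentions with no links
    become one-element (singleton) chains.
    """
    parent = {m: m for m in mention_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = {}
    for m in mention_ids:
        chains.setdefault(find(m), []).append(m)
    # Sort mentions within each chain and the chains themselves for stable output.
    return sorted(sorted(chain) for chain in chains.values())


for chain in build_chains(list(mentions), links):
    print([mentions[m] for m in chain])
```

Running it prints two chains, one for each discourse entity: `['Maria', 'the teacher', 'she']` and `['John', 'him']`.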

Website: http://stel3.ub.edu/semeval2010-coref/

Dataset download: task01.train_.v1.0_0.zip