AnCora

AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. The corpora are annotated at different levels:

Lemma and Part of Speech
Syntactic constituents and functions
Argument structure and thematic roles
Semantic classes of the verb
Denotative type of deverbal nouns
Nouns related to WordNet synsets
Named Entities
Coreference relations

The AnCora corpus is mainly based on journalistic texts. For more information, see AnCora-corpus.

Two verbal lexicons (AnCora-Verb) and a lexicon of deverbal nominalizations (AnCora-Nom) are also available as a result of this annotation process. The Spanish verbal lexicon consists of 2,647 entries and the Catalan lexicon of 2,143. The Spanish nominal lexicon consists of 1,600 entries. These lexicons contain the following information:

AnCora-Verb

Semantic class
Subcategorization
Argument structure and thematic roles

AnCora-Nom

Denotative type
WordNet synset
Argument structure and thematic roles
Verb from which the noun is derived
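
As an illustration of how the information listed above might be held in memory, here is a minimal Python sketch. It is not the distribution format of AnCora-Verb or AnCora-Nom: the class names, field names, label strings and the helper `link_nominalizations` are all hypothetical, chosen only to mirror the kinds of information each lexicon is said to contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Argument:
    """One argument slot and its thematic role (labels are placeholders)."""
    position: str       # hypothetical label for the argument position
    thematic_role: str  # hypothetical label for the thematic role


@dataclass
class VerbEntry:
    """Hypothetical record mirroring the AnCora-Verb information."""
    lemma: str
    semantic_class: str
    subcategorization: str
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class NomEntry:
    """Hypothetical record mirroring the AnCora-Nom information."""
    lemma: str
    denotative_type: str
    wordnet_synset: Optional[str]  # None when no synset is linked
    source_verb: str               # verb from which the noun is derived
    arguments: List[Argument] = field(default_factory=list)


def link_nominalizations(
    verbs: List[VerbEntry], noms: List[NomEntry]
) -> Dict[str, List[str]]:
    """Group nominalization lemmas under the verb lemma they derive from."""
    by_verb: Dict[str, List[str]] = {v.lemma: [] for v in verbs}
    for nom in noms:
        # Nouns whose source verb is missing from `verbs` still get a key.
        by_verb.setdefault(nom.source_verb, []).append(nom.lemma)
    return by_verb
```

Because a nominal entry records the verb it derives from, a lookup such as `link_nominalizations` is enough to move from a verb to its nominalizations. All values passed to these classes in practice would have to come from the lexicons themselves.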

The annotators of AnCora are:

Joan Aparicio Mena, Oriol Borrega Cepa, Isabel Briz Hernández, Núria Bufí Cabrol, Montserrat Civit Torruella, María Jesús Díaz Cabrera, Silvia Garcia Casaseca, Raquel Hernández Bitinas, Marina Lloberes Salvatella, Raquel Marcos, Difda Monterde, Borja Navarro, Aina Peris Morant, Lourdes Puiggròs Casals, Marta Recasens Potau, Alba Rodríguez, Bàrbara Soriano Bautista, Rita Zaragoza Jové.

The annotators of AnCora-Verb and AnCora-Nom are:

Joan Aparicio Mera, Ester Arias Valor, Oriol Borrega Cepa, Patricia Fernández, Difda Monterde, Aina Peris Morant, Lourdes Puiggrós Casals, Marta Recasens Potau, Bàrbara Soriano Bautista, Rita Zaragoza Jové.