AnCora consists of a Catalan corpus (AnCora-CA) and a Spanish corpus (AnCora-ES), each containing 500,000 words. The corpora are annotated at different levels:

  • Lemma and Part of Speech
  • Syntactic constituents and functions
  • Argument structure and thematic roles
  • Semantic classes of the verb
  • Denotative type of deverbal nouns
  • Nouns related to WordNet synsets
  • Named Entities
  • Coreference relations

The AnCora corpus is mainly based on journalistic texts. For more information, see AnCora-corpus.

Two verbal lexicons (AnCora-Verb) and a lexicon of deverbal nominalizations (AnCora-Nom) are also available as a result of this annotation process. The Spanish verbal lexicon consists of 2,647 entries and the Catalan lexicon of 2,143. The Spanish nominal lexicon consists of 1,600 entries. These lexicons contain the following information:

AnCora-Verb:

  • Semantic class
  • Argument structure and thematic roles

AnCora-Nom:

  • Denotative type
  • WordNet synset
  • Argument structure and thematic roles
  • Verb from which it is derived
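
As a rough illustration of the information listed above, the two kinds of lexicon entry could be modeled as simple records. This is only a sketch and not the actual AnCora file format: the class names, field names, the helper shared_roles, and all example values (including the role and class labels) are hypothetical and chosen only to mirror the fields named in the lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    """One argument slot with its thematic role (labels are illustrative)."""
    position: str       # hypothetical label, e.g. "arg0"
    thematic_role: str  # hypothetical label, e.g. "agent"


@dataclass
class VerbEntry:
    """Sketch of an AnCora-Verb entry: semantic class plus argument structure."""
    lemma: str
    semantic_class: str
    arguments: List[Argument] = field(default_factory=list)


@dataclass
class NomEntry:
    """Sketch of an AnCora-Nom entry for a deverbal nominalization."""
    lemma: str
    denotative_type: str
    wordnet_synset: Optional[str]  # None when no synset is recorded
    origin_verb: str               # verb from which the noun is derived
    arguments: List[Argument] = field(default_factory=list)


def shared_roles(verb: VerbEntry, nom: NomEntry) -> List[str]:
    """Return the verb's thematic roles that also appear on the nominalization."""
    nom_roles = {a.thematic_role for a in nom.arguments}
    return [a.thematic_role for a in verb.arguments if a.thematic_role in nom_roles]


# Hypothetical example values; they are not taken from the real lexicons.
verb = VerbEntry(
    lemma="construir",
    semantic_class="example-class",
    arguments=[Argument("arg0", "agent"), Argument("arg1", "patient")],
)
nom = NomEntry(
    lemma="construcción",
    denotative_type="example-type",
    wordnet_synset=None,
    origin_verb="construir",
    arguments=[Argument("arg1", "patient")],
)
```

Linking the nominal entry to its verb through origin_verb makes it possible to compare the argument structure of a nominalization with that of its base verb, for example with shared_roles(verb, nom).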

The annotators of AnCora are:

Joan Aparicio Mena, Oriol Borrega Cepa, Isabel Briz Hernández, Núria Bufí Cabrol, Montserrat Civit Torruella, María Jesús Díaz Cabrera, Silvia Garcia Casaseca, Raquel Hernández Bitinas, Marina Lloberes Salvatella, Raquel Marcos, Difda Monterde, Borja Navarro, Aina Peris Morant, Lourdes Puiggròs Casals, Marta Recasens Potau, Alba Rodríguez, Bàrbara Soriano Bautista, Rita Zaragoza Jové.

The annotators of AnCora-Verb and AnCora-Nom are:

Joan Aparicio Mera, Ester Arias Valor, Oriol Borrega Cepa, Patricia Fernández, Difda Monterde, Aina Peris Morant, Lourdes Puiggrós Casals, Marta Recasens Potau, Bàrbara Soriano Bautista, Rita Zaragoza Jové.