Corpus https://clic.ub.edu/corpus/en en AnCora https://clic.ub.edu/corpus/ancora <span property="schema:name" class="field field--name-title field--type-string field--label-hidden">AnCora</span> <span rel="schema:author" class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="/corpus/en/user/1" typeof="schema:Person" property="schema:name" datatype="">admin</span></span> <span property="schema:dateCreated" content="2022-01-26T11:48:50+00:00" class="field field--name-created field--type-created field--label-hidden">Wed, 26/01/2022 - 12:48</span> <div property="schema:text" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p><strong>AnCora</strong> consists of a <strong>Catalan </strong>corpus <strong>(AnCora-CA)</strong> and a <strong>Spanish </strong>corpus <strong>(AnCora-ES)</strong>, each containing <strong>500,000 words</strong>. The corpora are annotated at the following levels:</p> <ul><li>Lemma and Part of Speech</li> <li>Syntactic constituents and functions</li> <li>Argument structure and thematic roles</li> <li>Semantic classes of the verb</li> <li>Denotative type of deverbal nouns</li> <li>Nouns related to WordNet synsets</li> <li>Named Entities</li> <li>Coreference relations</li> </ul><p>The AnCora corpus is mainly based on journalistic texts. For more information, see <a class="file file--mime-application-pdf file--application-pdf" data-entity-type="file" data-entity-uuid="bce03c7d-db2e-4858-8fab-3dfdf9dd2ad0" filename="ancora-corpus.pdf" href="/corpus/sites/default/files/inline-files/ancora-corpus.pdf" target="_blank">AnCora-corpus</a>.</p> <p style="text-align: justify;">Two verbal lexicons, <strong>AnCora-Verb</strong>, and a lexicon of deverbal nominalizations, <strong>AnCora-Nom</strong>, are also available as a result of this annotation process. The Spanish verbal lexicon consists of 2,647 entries and the Catalan lexicon of 2,143. 
The Spanish nominal lexicon consists of 1,600 entries. These lexicons contain the following information:</p> <div class="table-responsive"> <table align="center" border="1" cellpadding="0" cellspacing="0" style="width: 618px;"><tbody><tr><td style="background-color: rgb(255, 255, 255); border-color: rgb(0, 0, 0);"><strong>AnCora-Verb</strong></td> <td style="background-color: rgb(255, 255, 255); border-color: rgb(0, 0, 0);"><strong>AnCora-Nom</strong></td> </tr><tr><td style="background-color: rgb(255, 255, 255); border-color: rgb(0, 0, 0);"> <p style="margin-left: 40px;">Semantic class</p> <p style="margin-left: 40px;">Subcategorization</p> <p style="margin-left: 40px;">Argument Structure and Thematic Roles</p> <ul></ul></td> <td style="background-color: rgb(255, 255, 255); border-color: rgb(0, 0, 0);"> <p style="margin-left: 40px;">Denotative type</p> <p style="margin-left: 40px;">WordNet Synset</p> <p style="margin-left: 40px;">Argument Structure and Thematic Roles</p> <p style="margin-left: 40px;">Verb from which it is derived</p> <ul></ul></td> </tr></tbody></table></div> <p style="text-align: justify;">The annotators of AnCora are:</p> <p style="text-align: justify;"><span style="font-size: 10px;">Joan Aparicio Mena, Oriol Borrega Cepa, Isabel Briz Hernández, Núria Bufí Cabrol, Montserrat Civit Torruella, María Jesús Díaz Cabrera, Silvia Garcia Casaseca, Raquel Hernández Bitinas, Marina Lloberes Salvatella, Raquel Marcos, Difda Monterde, Borja Navarro, Aina Peris Morant, Lourdes Puiggròs Casals, Marta Recasens Potau, Alba Rodríguez, Bàrbara Soriano Bautista, Rita Zaragoza Jové.</span></p> <p style="text-align: justify;">The annotators of AnCora-Verb and AnCora-Nom are:</p> <p style="text-align: justify;"><span style="font-size: 10px;"><span style="font-size: 10px;">Joan Aparicio Mera, Ester Arias Valor, Oriol Borrega Cepa, Patricia Fernández, Difda Monterde, Aina Peris Morant, Lourdes Puiggrós Casals, Marta Recasens Potau, Bàrbara Soriano Bautista, Rita Zaragoza 
Jové.</span></span></p> </div> Wed, 19 Jan 2022 11:16:06 +0000 admin 2 at https://clic.ub.edu/corpus