CesCa: El Català Escolar Escrit a Catalunya

The aim of this project is to give the educational community a fundamental tool for understanding pupils' linguistic usage. It is a reference corpus of written school Catalan in Catalonia, and it also provides data derived from processing that corpus.

CesCa corpus

It contains 2,426 processed texts produced by children between the last year of early childhood education (P5) and the last year of compulsory education (4th ESO). The texts were collected from 31 schools in different Catalan regions.

The corpus contains vocabulary produced for five lexical fields:

  • Food names
  • Garments
  • Natural phenomena
  • Free-time activities
  • Personality features

You will find organized information about:

  • Word usage frequency: forms and lemmas
  • Relationships between forms and lemmas
  • Distribution of lemmas by school level, by how long the informants have been speaking Catalan, and by their mother tongue
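
As a rough illustration of what form and lemma frequencies and form-lemma relationships mean, the following minimal Python sketch counts both from a handful of tokens. The (form, lemma) pairs and all variable names are hypothetical and are not taken from the CesCa data or its tools; the corpus itself may be organized differently.

```python
from collections import Counter, defaultdict

# Hypothetical (form, lemma) pairs for illustration only; not actual CesCa data.
tokens = [
    ("pomes", "poma"),
    ("poma", "poma"),
    ("pa", "pa"),
    ("pomes", "poma"),
]

# Frequency of each written form as it appears in the texts.
form_freq = Counter(form for form, _ in tokens)

# Frequency of each lemma, summing over all of its forms.
lemma_freq = Counter(lemma for _, lemma in tokens)

# Form-lemma relationship: which forms were observed for each lemma.
forms_by_lemma = defaultdict(set)
for form, lemma in tokens:
    forms_by_lemma[lemma].add(form)

print(form_freq.most_common())
print(lemma_freq.most_common())
print({lemma: sorted(forms) for lemma, forms in forms_by_lemma.items()})
```

Grouping the same counts by an extra field, such as school level, would follow the same pattern with one counter per group.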