The LSC Corpus

In 2007, the Institut d’Estudis Catalans, the Federació de Persones Sordes de Catalunya, the Universitat Pompeu Fabra, the Fundació Barcelona Media and Linguamón launched a joint initiative to create a reference corpus of LSC. At that time, however, a lack of funding prevented the project from going ahead. In 2012, the Institut d’Estudis Catalans offered to start an initial project to build the corpus, consisting of a preparatory phase and a pilot test. This was made possible by the support of the Government of Catalonia’s Directorate for Language Policy (Departament de Política Lingüística) and funding from Obra Social “La Caixa”. One year into the pilot, it became clear that an LSC corpus was feasible, and the pilot became the LSC Corpus Project. Since then, thanks to the continued support of the Government of Catalonia’s Directorate for Language Policy (Departament de Política Lingüística de Catalunya) and funding from Obra Social “La Caixa”, we have been able to record signers from all over the LSC area.

The LSC Corpus Project has three main objectives. First, to document the current state of LSC through a broad and representative sample of different types of signed discourse. Second, to produce a basic, descriptive annotation and to put the corpus online as an accessible resource that users can draw on for research, education or reference. Third, to offer a useful tool for both theoretical and applied work. For theoretical research, it will provide access to a set of annotated data that supports descriptions and analyses aimed at a better understanding of the grammar and lexicon of LSC. For applied work, it will serve as a point of reference for creating dictionaries and databases and for developing machine translation programs. The corpus will also lay the groundwork for the standardization of LSC, both linguistically and methodologically, since its annotation conventions establish shared methodological criteria.

During the preparatory stage, the necessary elicitation materials were created, and the linguistic profiles of the signers to be recorded were specified so that the corpus would be representative, taking into account factors such as age, sex, geographical distribution and schooling. We also carried out fieldwork, through associations and personal contacts within the Deaf Community, to determine the data collection points. The annotation criteria and the technical requirements for building the corpus, such as the server, the coding programs and the web interface, were also defined. In addition, we drew up a plan for disseminating information about the corpus project while it was under way. Finally, we developed an ethical protocol for the collection, processing, storage and distribution of the recorded data.

The execution process was divided into three phases. In the first phase, data collection, the city or town in Catalonia where the recordings were to be made was chosen, and the native signers were selected, together with the deaf interviewer who used the elicitation materials and guided the topics of conversation. The second phase consisted of annotating the signed discourse in written Catalan using the ELAN program. The third phase consists of revising the annotations and publishing the recordings together with their annotations.