Sign Language Corpora

A corpus is a representative collection of language samples in machine-readable format, used to study the types and frequency of linguistic units. It offers a broad picture of the language, including its geographical, register and generational variants. Sign language corpora are collections of annotated videos in which written material is aligned with the primary data in the corresponding sign language. Like any corpus, they are intended to be a representative sample of the language.

The main benefit of this type of corpus is that it preserves the signed language as an important part of a society's social and linguistic heritage. The initiative presented here builds on the valuable precedent of similar corpus projects for other European sign languages that face comparable deficiencies. In the Netherlands, the United Kingdom, Australia, Germany, Ireland and Italy, corpus projects have already been established for the respective national sign languages and are at the construction, annotation or finalization stage, depending on the case. The experience accumulated in these projects, to which we have access through existing collaborations with some of their directors and coordinators, allows us to build the LSC Corpus more solidly and efficiently, on the basis of reliable criteria.

Corpora allow us to collect the lexical units of a language. From a corpus we can obtain the grammatical properties of the lexical units that appear in it, since it provides information on the syntactic context, phonology, morphology, semantics and pragmatics of these units. All this information can be gathered in a lexical database in order to classify and organize it. Some corpus projects are already advanced enough to have yielded a lexical database. Not all databases show the same type of information: they range from those that contain the definition, phonology, morphology and even pragmatic aspects of the signs to those that are at an earlier stage and show only how the sign is articulated and its corresponding gloss.

-British Sign Language (BSL) Corpus

-Sign Language of the Netherlands (NGT) Corpus

-Australian Sign Language (AUSLAN) Corpus

-German Sign Language (DGS) Corpus

-French Belgian Sign Language (LSFB) Corpus

-Polish Sign Language (PJM) Corpus

-Spanish Sign Language (LSE) databases

-American Sign Language (ASL) database
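
To make the idea of a lexical database entry more concrete, here is a minimal sketch of how a single sign might be recorded. It is not taken from any of the projects listed above. The class name, the field names and the sample values are all hypothetical and serve only to show that an entry can hold anything from just a gloss and an articulation video up to a full description.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class LexicalEntry:
    """One sign in a hypothetical lexical database (illustrative field names)."""

    gloss: str                                  # written label for the sign
    video: str                                  # clip showing how the sign is articulated
    definition: Optional[str] = None
    phonology: Optional[Dict[str, str]] = None  # e.g. handshape, location
    morphology: Optional[str] = None
    pragmatics: Optional[str] = None

    def documented_fields(self) -> List[str]:
        """Return the names of the fields that have been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# Early-stage entry: only the articulation video and the gloss.
minimal = LexicalEntry(gloss="HOUSE", video="house.mp4")

# More developed entry; every value below is invented for illustration.
detailed = LexicalEntry(
    gloss="HOUSE",
    video="house.mp4",
    definition="a building where people live",
    phonology={"handshape": "flat hand", "location": "neutral space"},
    morphology="noun",
    pragmatics="neutral register",
)

print(minimal.documented_fields())
print(detailed.documented_fields())
```

Keeping every field other than the gloss and the video optional lets one structure serve both an early-stage database and a more complete one, so entries can be enriched as annotation progresses.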