Coding of sound files

For individual words, a subset of phenomena was coded (see individual corpus descriptions for further details). The coding scheme is phonemic and specifies the following linguistic variables: target phoneme/grapheme; preceding and following phoneme/grapheme; stress; and position in the word. A complete list of the values for each of these variables is available in [to be added].

The following coding conventions were used (adapted from the Romance Phonetics Database):

  1. Coding of 'Target phoneme' / 'Preceding phoneme' / 'Following phoneme': coding is based on the phonemic (i.e. dictionary) transcription, not on the realization in the recording. For example, French <r> is coded /R/ even though realizations other than a uvular fricative may be encountered (NB: SAMPA symbols [SAMPA_CLAD.pdf] are used here and elsewhere). Similarly, Spanish <b> is coded phonemically as /b/ even though it may be realized as an approximant in some environments.

  2. Pauses between target phoneme and preceding/following phoneme: if the target sound is preceded or followed by a pause in a given sound file, 'Preceding phoneme' or 'Following phoneme', respectively, is coded as '#' (pause).

  3. Coding 'Target / Preceding / Following Grapheme':


  4. Stress: stress is coded based on (i) the syllable in which the sound occurs and (ii) the location of this syllable relative to the syllable bearing main (tonic) stress (in the examples below, the syllable for which the coding is valid is in bold and the stressed syllable is underlined).


  5. Position in word: this is based on the phonetic transcription and not the orthographic form.
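
As an illustration of conventions 1 and 2 only, the following is a minimal Python sketch of how a phoneme-context record might be derived from a phonemic transcription. It is not the database's own tooling: the names `CodedToken`, `code_context` and `PAUSE` are hypothetical, the transcription is assumed to be a list of SAMPA symbols with '#' inserted wherever a pause occurs, and treating the edges of a sound file as pauses is an additional assumption made here for isolated words.

```python
from dataclasses import dataclass
from typing import List

PAUSE = "#"  # convention 2: an adjacent pause is coded as '#'


@dataclass(frozen=True)
class CodedToken:
    """Hypothetical record holding the three phoneme-context variables."""
    target: str
    preceding: str
    following: str


def code_context(segments: List[str], index: int) -> CodedToken:
    """Code the target at `index` together with its neighbours.

    `segments` is assumed to be the phonemic (dictionary) transcription of
    one sound file in SAMPA, with PAUSE entries marking pauses. The edges
    of the file are treated as pauses (an assumption of this sketch).
    """
    if not 0 <= index < len(segments):
        raise IndexError("target index outside the transcription")
    if segments[index] == PAUSE:
        raise ValueError("the target must be a phoneme, not a pause")
    preceding = segments[index - 1] if index > 0 else PAUSE
    following = segments[index + 1] if index < len(segments) - 1 else PAUSE
    return CodedToken(segments[index], preceding, following)


# French "rouge", phonemically /RuZ/: the <r> is coded /R/ however it is realized.
rouge = code_context(["R", "u", "Z"], 0)

# Spanish "lobo", phonemically /lobo/: the <b> is coded /b/ even if it is
# realized as an approximant between vowels.
lobo = code_context(["l", "o", "b", "o"], 2)
```

Here `rouge` carries target 'R', preceding '#' and following 'u', while `lobo` carries target 'b' with 'o' on both sides. Stress and position in word are left out because their value lists are not given in this section.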


The Cross-Language Articulatory Database (CLAD) @ CHASS / University of Toronto Copyright © 2026 University of Toronto