Coding of sound files
For individual words, a subset of phenomena was coded (see individual corpus descriptions for further details). The coding scheme is phonemic and specifies the following linguistic variables: target phoneme/grapheme; preceding and following phoneme/grapheme; stress; and position in the word. A complete list of the values for each of these variables is available in [to be added].
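As a rough illustration of how one coded sound could be stored, the sketch below groups the variables listed above into a single record. The class name `CodedToken` and all field names are hypothetical and are not taken from the database. The example values are derived from the conventions described in this section, applied to the first /d/ of Spanish candado /kandaDo/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedToken:
    """One coded sound. All field names are hypothetical."""
    target_phoneme: str
    preceding_phoneme: str
    following_phoneme: str
    target_grapheme: str
    preceding_grapheme: str
    following_grapheme: str
    stress: str
    position: str


# First /d/ of Spanish candado /kandaDo/: it sits in the stressed
# syllable "da" and belongs to a word-medial cluster.
example = CodedToken(
    target_phoneme="d", preceding_phoneme="n", following_phoneme="a",
    target_grapheme="d", preceding_grapheme="n", following_grapheme="a",
    stress="Tonic", position="Medial",
)
```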
The following coding conventions were used (adapted from the Romance Phonetics Database):
-
Coding of 'Target phoneme' / 'Preceding phoneme' / 'Following phoneme': coding is based on phonemic (i.e. dictionary) transcription. For example, French <r> is coded /R/ even though non-uvular fricative realizations may be encountered (NB: SAMPA symbols [SAMPA_CLAD.pdf] are used here and elsewhere). Similarly, Spanish <b> is coded phonemically /b/ in spite of the possibility of approximant realizations in some environments.
-
Pauses between target phoneme and preceding/following phoneme: If the target sound is preceded or followed by a pause in the particular sound file, 'Preceding phoneme'/'Following phoneme' is coded as '#' (pause).
-
Coding of 'Target grapheme' / 'Preceding grapheme' / 'Following grapheme':
-
Complex graphemes: some sounds are represented by a sequence of two or more letters (e.g. French [J]=<gn>, [u]=<ou>; Spanish [rr]=<rr>). In the case of double consonants (e.g. <bb>, <tt>), with the exception of <rr>, coding is based on the first letter of the pair (i.e. in the coding, 'Following grapheme' will be the second of the two identical letters).
-
Initial/final graphemes: If the grapheme in question is word-initial or word-final, 'Preceding grapheme' / 'Following grapheme' is coded as '#'. The grapheme of the preceding/following word is not indicated.
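The grapheme conventions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the database's own tooling. The names `COMPLEX_GRAPHEMES`, `split_graphemes` and `grapheme_context` are hypothetical, the inventory holds only the three examples mentioned above, and treating each complex grapheme as a single unit is an assumption. A real inventory would be language-specific.

```python
# Hypothetical inventory: only the complex graphemes cited as examples.
# A real list would depend on the language being coded.
COMPLEX_GRAPHEMES = ("gn", "ou", "rr")


def split_graphemes(word):
    """Split a word into graphemes, keeping complex graphemes together."""
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in COMPLEX_GRAPHEMES:
            units.append(pair)
            i += 2
        else:
            # Ordinary letters, including each half of a double
            # consonant such as <bb>, are separate units.
            units.append(word[i])
            i += 1
    return units


def grapheme_context(word, index):
    """Return (preceding, target, following) for the grapheme at index.

    Word-initial and word-final graphemes get '#' as their neighbour.
    """
    units = split_graphemes(word.lower())
    preceding = units[index - 1] if index > 0 else "#"
    following = units[index + 1] if index + 1 < len(units) else "#"
    return preceding, units[index], following
```

With this sketch, the first <b> of a double <bb> has the second <b> as its 'Following grapheme', while <rr> is kept as one target.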
-
Stress: stress is coded based on (i) the syllable in which the sound occurs and (ii) the position of that syllable relative to the syllable bearing main (tonic) stress (in the examples below, the syllable for which the coding is valid is in bold and the stressed syllable is underlined).
-
Ante pre-tonic: two syllables before main stress (e.g. initial [a] in Spanish ahuecar; [{f] in English afternoon)
-
Pre-tonic: syllable preceding main stress (e.g. [e] in Spanish reloj; [s@] in English severe)
-
Tonic: syllable receiving main stress (e.g. [e] in Spanish pesca; [zI] in English zipper)
-
Post-tonic: syllable following main stress (e.g. Spanish pasado; [fi] in English coffee)
-
Post post-tonic: two syllables following main stress (e.g. [o] in Spanish número; [g@r/] in English hamburger)
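The five stress categories amount to the offset between the syllable containing the sound and the tonic syllable. The sketch below illustrates this, assuming that syllables are numbered from 0 and that the index of the tonic syllable is already known. The names `STRESS_LABELS` and `stress_code` are hypothetical. Syllables more than two positions away from the main stress are not covered by the categories above, so the function returns `None` for them.

```python
# Offset of the coded syllable from the tonic syllable -> stress label.
STRESS_LABELS = {
    -2: "Ante pre-tonic",
    -1: "Pre-tonic",
    0: "Tonic",
    1: "Post-tonic",
    2: "Post post-tonic",
}


def stress_code(syllable_index, tonic_index):
    """Return the stress label for a syllable, or None if out of range.

    Both arguments are 0-based syllable indices within the word.
    """
    offset = syllable_index - tonic_index
    return STRESS_LABELS.get(offset)
```

For Spanish ahuecar (three syllables, main stress on the last), `stress_code(0, 2)` gives the ante pre-tonic label for the initial [a]. For Spanish número (main stress on the first syllable), `stress_code(2, 0)` gives the post post-tonic label for the final [o].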
-
Position in word: this is based on the phonetic transcription and not the orthographic form.
-
Consonants
-
Initial: all consonants at the beginning of words (e.g. [l] in Spanish laca /laka/) including the second or third member of a cluster (e.g. [R] in French trois /tRwa/; [l] in Spanish plato /plato/)
-
Medial: all consonants between two pronounced vowels whether singleton (e.g. [T] in English nothing /nVTIN/) or in clusters (e.g. [vg] in French sauvegarde /sOvgaRd/; [nd] in Spanish candado /kandaDo/)
-
Final: all consonants at the end of words (e.g. [r] in Spanish mar /mar/) including the first member of a cluster (e.g. [R] in French plateforme /platfORm/)
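One way to read the three consonant categories is in terms of the vowels around the consonant: no vowel before it (Initial), a vowel on both sides (Medial), or no vowel after it (Final). The sketch below follows that reading. It assumes the transcription has already been split into a list of SAMPA segments. The `VOWELS` set is an illustrative subset rather than the full inventory, and the name `consonant_position` is hypothetical. Glides such as [w] are treated as consonants here, which is also an assumption.

```python
# Illustrative subset of SAMPA vowel symbols (an assumption, not the
# complete inventory used for coding).
VOWELS = {"a", "e", "i", "o", "u", "O", "I", "V", "@", "{", "a~"}


def consonant_position(segments, index, vowels=VOWELS):
    """Classify the consonant at index as Initial, Medial or Final."""
    if segments[index] in vowels:
        raise ValueError("segment at index is a vowel")
    vowel_before = any(s in vowels for s in segments[:index])
    vowel_after = any(s in vowels for s in segments[index + 1:])
    if vowel_before and vowel_after:
        return "Medial"
    if not vowel_before:
        # Also covers words with no vowel at all, a case the
        # conventions above do not address.
        return "Initial"
    return "Final"
```

Applied to French trois, segmented as `["t", "R", "w", "a"]`, the [R] at index 1 is classified as Initial. In French plateforme, segmented as `["p", "l", "a", "t", "f", "O", "R", "m"]`, the [R] at index 6 is classified as Final.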
-
Vowels: vowels are coded based on the syllable in which they occur.
-
Initial: first syllable, whether preceded by a consonant (e.g. [a] in French chapeau /Sapo/) or not (e.g. [a] in Spanish ámbar /ambar/)
-
Medial: in words of three syllables or more, any vowel in neither the first nor the last syllable (e.g. [e] in Spanish ahuecar /awekar/; [I] in English gorilla /g@r/Il@/)
-
Final: last syllable (e.g. [a~] in French demande /d@ma~d/; [i] in Spanish videoclip /biDeoklip/; [I] in English rabbit /r/{bIt/)
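Since vowel position depends only on the syllable containing the vowel, it can be sketched as a function of the syllable index and the syllable count. The name `vowel_position` is hypothetical and syllables are numbered from 0. The conventions above do not say how monosyllabic words are coded. This sketch returns "Initial" for them simply because that check comes first, which is an assumption.

```python
def vowel_position(syllable_index, syllable_count):
    """Classify a vowel by the 0-based index of its syllable."""
    if not 0 <= syllable_index < syllable_count:
        raise ValueError("syllable_index out of range")
    if syllable_index == 0:
        # Assumption: a monosyllable is coded as Initial.
        return "Initial"
    if syllable_index == syllable_count - 1:
        return "Final"
    # Only reachable in words of three syllables or more.
    return "Medial"
```

For Spanish ahuecar (three syllables), `vowel_position(1, 3)` classifies the [e] as Medial. For French demande (two syllables), `vowel_position(1, 2)` classifies the [a~] as Final.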
The Cross-Language Articulatory Database (CLAD) @ CHASS / University of Toronto Copyright © 2026 University of Toronto