Speech tags


Section 8 of the TEI Guidelines deals with the transcription of speech and opens with the disclaimer that 'the present proposals are not intended to support unmodified every variety of research undertaken upon spoken material now or in the future; some discourse analysts, some phonologists, and doubtless others may wish to extend the scheme presented here to express more precisely the set of distinctions they wish to draw in their transcriptions'.

DECTE uses a small selection of the tags provided in this section of the TEI Guidelines:

  • <u>, the TEI 'utterance' tag. The TEI Guidelines define <u> as the element that 'contains a stretch of speech usually preceded and followed by silence or by a change of speaker' (TEI Guidelines, 8.3.1).
      • In DECTE, <u> demarcates speaker utterances at every level of representation contained in the corpus texts.
      • In each <u> element, the @who attribute identifies the speaker to whom the utterance is attributed, e.g. <u who="#informantTLSG01">.
      • Each speaker in the corpus has a unique @who code, consisting of 'interviewer' or 'informant' (as appropriate) followed by the entity name of the interview in question, as recorded in the list defined by interviews.ent in the DTD.
      • Where an interview involves more than one informant (invariably the case in the PVC interviews of the 1990s and the NECTE2 interviews of the 2000s, and occasionally in the TLS interviews of the 1970s), a final letter a, b, c, etc. is appended as necessary, e.g. <u who="#informantPVC03a">, <u who="#informantPVC03b">, and so on.
  • <pause/> 'marks a pause either between or within utterances' (TEI Guidelines, 8.3.2).
  • <vocal>: the TEI Guidelines indicate that this element can be used to mark 'any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical back-channels, etc.', with an embedded <desc> element providing a description or representation of the vocalization in question (TEI Guidelines, 8.3.3).
      • In DECTE, this approach is used to denote laughter: <vocal><desc>laughter</desc></vocal>.
  • <incident> 'marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication' (TEI Guidelines, 8.3.3).
      • In DECTE, this element marks any point at which a speaker's utterance is interrupted or overlapped by another speaker's utterance (i.e. the next utterance recorded in the orthographic transcription). The full tag for such an interruption or overlap point is <incident><desc>interruption</desc></incident>.
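Because these conventions are regular, they lend themselves to simple automated checks. The Python sketch below is an illustration only and is not part of DECTE or its tools. `who_code` builds @who values following the speaker-naming scheme described above, and `summarize` reads utterances from a small invented fragment, counting pauses, laughter and interruption points and attributing each interruption to the speaker of the next utterance. All function names and the sample text are hypothetical. The sketch assumes un-namespaced element names and a leading '#' on @who values (as in the examples above); files that declare the TEI namespace or rely on entities from interviews.ent would need extra handling.

```python
# Hypothetical helpers, not DECTE tooling.
# Assumes un-namespaced TEI-style elements: <u>, <pause/>, <vocal>, <incident>, <desc>.
import xml.etree.ElementTree as ET
from string import ascii_lowercase
from typing import Optional


def who_code(role: str, interview: str, speaker_index: Optional[int] = None) -> str:
    """Build a @who value such as '#informantPVC03a'.

    role:          'interviewer' or 'informant'
    interview:     entity name of the interview, e.g. 'TLSG01' or 'PVC03'
    speaker_index: 0 -> 'a', 1 -> 'b', ... when several speakers share the role;
                   None when there is only one, so no letter is appended.
    """
    if role not in ("interviewer", "informant"):
        raise ValueError(f"unexpected role: {role!r}")
    suffix = "" if speaker_index is None else ascii_lowercase[speaker_index]
    # Assumption: the leading '#' mirrors the examples, e.g. who="#informantTLSG01".
    return f"#{role}{interview}{suffix}"


def count_desc(u: ET.Element, tag: str, label: str) -> int:
    """Count <tag><desc>label</desc></tag> elements inside one utterance."""
    return sum(1 for el in u.iter(tag) if (el.findtext("desc") or "").strip() == label)


def spoken_text(u: ET.Element) -> str:
    """Utterance text with <pause/>, <vocal> and <incident> markers dropped."""
    parts = [u.text or ""]
    for child in u:
        if child.tag not in ("pause", "vocal", "incident"):
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text following the marker still belongs to the utterance
    return " ".join(" ".join(parts).split())  # normalize whitespace


def summarize(xml_text: str) -> list:
    """One summary dict per <u>, in transcription order."""
    root = ET.fromstring(xml_text)
    summary = []
    for u in root.iter("u"):
        summary.append({
            "who": u.get("who"),
            "pauses": sum(1 for _ in u.iter("pause")),
            "laughs": count_desc(u, "vocal", "laughter"),
            "interruptions": count_desc(u, "incident", "interruption"),
            "interrupted_by": None,
            "text": spoken_text(u),
        })
    # An interruption/overlap point is attributed to the speaker of the
    # next utterance recorded in the transcription.
    for current, following in zip(summary, summary[1:]):
        if current["interruptions"]:
            current["interrupted_by"] = following["who"]
    return summary


# Invented fragment for illustration only; not taken from the corpus.
SAMPLE = """<body>
  <u who="#interviewerPVC03">what did you do at weekends <pause/> as a kid</u>
  <u who="#informantPVC03a">we went down the park <vocal><desc>laughter</desc></vocal>
     every Saturday <incident><desc>interruption</desc></incident></u>
  <u who="#informantPVC03b">aye every Saturday</u>
</body>"""

for row in summarize(SAMPLE):
    print(row["who"], row["pauses"], row["laughs"], row["interrupted_by"], row["text"])
```

Each printed line gives the speaker, the number of pauses, the number of laughs, the interrupting speaker (if any) and the utterance text with the inline markers stripped. Attributing interruptions by looking one utterance ahead follows the DECTE convention that the interrupting or overlapping speech is the next utterance in the orthographic transcription.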