Corpus Files




List of texts

The list of texts included in DECTE is a sequence of entity references, each defined in the interviews.ent file, which is included in the DTD.


&decten1tlsg01; &decten1tlsg02; &decten1tlsg03; &decten1tlsg04; &decten1tlsg05; &decten1tlsg06;
&decten1tlsg07; &decten1tlsg08; &decten1tlsg09; &decten1tlsg10; &decten1tlsg11; &decten1tlsg12;
&decten1tlsg13; &decten1tlsg14; &decten1tlsg15; &decten1tlsg16; &decten1tlsg17; &decten1tlsg18;
&decten1tlsg19; &decten1tlsg20; &decten1tlsg21; &decten1tlsg22; &decten1tlsg23; &decten1tlsg24;
&decten1tlsg25; &decten1tlsg26; &decten1tlsg27; &decten1tlsg28; &decten1tlsg29; &decten1tlsg30;
&decten1tlsg31; &decten1tlsg32; &decten1tlsg33; &decten1tlsg34; &decten1tlsg35; &decten1tlsg36;
&decten1tlsg37; &decten1tlsn01; &decten1tlsn02; &decten1tlsn03; &decten1tlsn04; &decten1tlsn05;
&decten1tlsn06; &decten1tlsn07; &decten1pvc01; &decten1pvc02; &decten1pvc03; &decten1pvc04;
&decten1pvc05; &decten1pvc06; &decten1pvc07; &decten1pvc08; &decten1pvc09; &decten1pvc10;
&decten1pvc11; &decten1pvc12; &decten1pvc13; &decten1pvc14; &decten1pvc15; &decten1pvc16;
&decten1pvc17; &decten1pvc18; &decten2y07i001; &decten2y07i002; &decten2y07i003; &decten2y07i004;
&decten2y07i005; &decten2y07i006; &decten2y07i007; &decten2y07i008; &decten2y07i009; &decten2y07i010;
&decten2y07i011; &decten2y07i012; &decten2y07i013; &decten2y07i014; &decten2y08i001; &decten2y08i002;
&decten2y08i003; &decten2y08i004; &decten2y10i001; &decten2y10i002; &decten2y10i003; &decten2y10i004;
&decten2y10i005; &decten2y10i006; &decten2y10i007; &decten2y10i008; &decten2y10i009; &decten2y10i010;
&decten2y10i011; &decten2y10i012; &decten2y10i013; &decten2y10i014; &decten2y10i015; &decten2y10i016;
&decten2y10i017; &decten2y10i018; &decten2y10i019; &decten2y10i020; &decten2y10i021; &decten2y10i022;
&decten2y10i023; &decten2y10i024; &decten2y10i025; &decten2y10i026;    

Each reference in the sequence denotes a single constituent text of the DECTE corpus, and each text is itself a TEI-conformant XML document.
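As an illustration of how the listing above can be handled programmatically, the following minimal Python sketch pulls the entity names out of a block of entity references. The function name `entity_names` and the sample string are illustrative assumptions, not part of the DECTE distribution; the pattern simply matches the `&name;` form used in the list.

```python
import re
from typing import List

# Matches an XML general entity reference such as "&decten1tlsg01;"
# and captures the name between "&" and ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")


def entity_names(listing: str) -> List[str]:
    """Return the entity names referenced in `listing`, in document order."""
    return ENTITY_REF.findall(listing)


# A short excerpt in the same form as the list of texts (not the full list).
sample = "&decten1tlsg01; &decten1tlsg02;\n&decten1pvc01; &decten2y07i001;"
names = entity_names(sample)
print(names)
```

Each returned name corresponds to one constituent text and, as described below, also serves as that text's identifier.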

All the text names begin with 'decte'. Thereafter the names reflect the documents' origins in TLS, PVC, and NECTE2:

TLS texts:
  • 'n1' indicates that the text comes from the NECTE corpus
  • 'tls' indicates its TLS origin
  • 'g' or 'n' indicates whether the speakers are from Gateshead (g) or Newcastle (n)
  • The final number is the text's sequence number within its group

PVC texts:
  • 'n1' indicates that the text comes from the NECTE corpus
  • 'pvc' indicates its PVC origin
  • The final number is the text's sequence number within the PVC texts

NECTE2 texts:
  • 'n2' indicates that the text comes from the NECTE2 corpus
  • 'y' followed by a two-digit number indicates the year of collection: 'y07' texts were collected in 2007, and so on
  • 'i' (= 'informant') followed by a number gives the text's sequence number within that year
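The naming scheme above is regular enough to decode mechanically. The following is a minimal Python sketch under stated assumptions: the function name `parse_text_name`, the dictionary keys, and the rule that a two-digit year 'yNN' maps to 20NN are all illustrative choices inferred from the description, not part of DECTE itself.

```python
import re
from typing import Optional

# One alternative per group of texts described above (TLS, PVC, NECTE2).
# The group names are illustrative.
TEXT_NAME = re.compile(
    r"decte"
    r"(?:"
    r"n1tls(?P<tls_place>[gn])(?P<tls_num>\d+)"
    r"|n1pvc(?P<pvc_num>\d+)"
    r"|n2y(?P<year>\d{2})i(?P<inf_num>\d+)"
    r")"
)

PLACES = {"g": "Gateshead", "n": "Newcastle"}


def parse_text_name(name: str) -> Optional[dict]:
    """Decode a DECTE text name; return None if it does not fit the scheme."""
    # Accept either the bare name or the entity reference form "&name;".
    m = TEXT_NAME.fullmatch(name.strip("&;"))
    if m is None:
        return None
    if m.group("tls_place"):
        return {
            "corpus": "NECTE",
            "origin": "TLS",
            "place": PLACES[m.group("tls_place")],
            "number": int(m.group("tls_num")),
        }
    if m.group("pvc_num"):
        return {"corpus": "NECTE", "origin": "PVC", "number": int(m.group("pvc_num"))}
    return {
        "corpus": "NECTE2",
        "origin": "NECTE2",
        # Assumption: two-digit years all fall in the 2000s ('y07' = 2007).
        "year": 2000 + int(m.group("year")),
        "number": int(m.group("inf_num")),
    }


print(parse_text_name("&decten1tlsg01;"))
print(parse_text_name("decten2y10i026"))
```

Returning None for names that do not fit the pattern, rather than raising an error, makes the function convenient for filtering mixed lists of identifiers.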

All texts have the same structure:

<TEI xml:id="decten1tlsg01">

  <teiHeader type="text">
    <!-- Header information -->
  </teiHeader>

  <text>
    <!-- Content -->
  </text>

</TEI>


  • Each <TEI> element contains a single TEI-conformant document comprising a header and a text. It is uniquely identified by an 'xml:id' attribute whose value is the name of one of the text entities listed above ('decten1tlsg01' in this example).

  • <teiHeader> contains information specific to the interview. This header has the same structure as the global one, but it is much simpler since most characteristics of the individual interviews are described globally. Specifically, it contains only an empty <fileDesc> element, which is mandatory, and a <profileDesc> element which contains social data relating to the interviewee. <profileDesc> has the following structure:




<!-- A sequence of one or more <person> elements on the following pattern -->

<person xml:id="informantTlsg01">
  <age> 31-40 </age>
  <sex> female </sex>
  <residence> Gateshead; parents UK Northern </residence>
  <occupation> Skilled manual and routine non-manual: tailor </occupation>
  <education> Legal minimum </education>
</person>






Information for the <person> subelements is not uniformly available for every participant in every interview in DECTE, and where it is unavailable the subelement is left empty.
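To show how the social data in a text header might be read, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It makes several assumptions: the function name `read_persons` is hypothetical, the sample document is a simplified stand-in that omits any wrapper elements between <profileDesc> and <person>, the <education> element is left empty purely to demonstrate the missing-data case, and the text is parsed on its own with no unresolved entity references. Element names are compared without their namespace so the sketch works whether or not the files declare the TEI namespace.

```python
import xml.etree.ElementTree as ET
from typing import List

# ElementTree expands the predeclared "xml:" prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_persons(tei_xml: str) -> List[dict]:
    """Collect one record per <person> element found anywhere in the document."""
    root = ET.fromstring(tei_xml)
    persons = []
    for el in root.iter():
        if local(el.tag) != "person":
            continue
        record = {"id": el.get(XML_ID)}
        for child in el:
            value = (child.text or "").strip()
            # An empty subelement means the information is unavailable.
            record[local(child.tag)] = value or None
        persons.append(record)
    return persons


# Simplified, hypothetical sample based on the pattern shown above.
SAMPLE = """<TEI xml:id="decten1tlsg01">
  <teiHeader type="text">
    <fileDesc/>
    <profileDesc>
      <person xml:id="informantTlsg01">
        <age> 31-40 </age>
        <sex> female </sex>
        <residence> Gateshead; parents UK Northern </residence>
        <occupation> Skilled manual and routine non-manual: tailor </occupation>
        <education></education>
      </person>
    </profileDesc>
  </teiHeader>
  <text/>
</TEI>"""

for person in read_persons(SAMPLE):
    print(person)
```

Mapping empty subelements to None keeps unavailable data distinct from genuine values when the records are later tabulated.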

  • The <text> element contains the text of the interview. This element is relatively complex, and is described under the following topics:

- The overall structure of the <text> element
- Alignment
- Speech tags
- Part-of-speech tags
- General editorial tags