THE NEWCASTLE ELECTRONIC CORPUS OF TYNESIDE ENGLISH

Documentation: document instance

Chapter 23 of the TEI Guidelines deals with language corpora, and is therefore the foundation on which the structure of the NECTE document instance is built. There, a corpus is regarded as a ‘composite’ text, as opposed to a ‘unitary’ text like the novel (Guidelines 7), and consists of a header followed by a sequence of TEI-conformant XML texts. The NECTE corpus document file necte.xml correspondingly contains both the header and the text sequence, that is, the sequence of interviews comprising the NECTE corpus.

<teiCorpus.2>

<teiHeader type='corpus'>
</teiHeader>

 &tlsg01; &tlsg22; &tlsn06;
 &tlsg02; &tlsg23; &tlsn07;
 &tlsg03; &tlsg24; &pvc01;
 &tlsg04; &tlsg25; &pvc02;
 &tlsg05; &tlsg26; &pvc03;
 &tlsg06; &tlsg27; &pvc04;
 &tlsg07; &tlsg28; &pvc05;
 &tlsg08; &tlsg29; &pvc06;
 &tlsg09; &tlsg30; &pvc07;
 &tlsg10; &tlsg31; &pvc08;
 &tlsg11; &tlsg32; &pvc09;
 &tlsg12; &tlsg33; &pvc10;
 &tlsg13; &tlsg34; &pvc11;
 &tlsg14; &tlsg35; &pvc12;
 &tlsg15; &tlsg36; &pvc13;
 &tlsg16; &tlsg37; &pvc14;
 &tlsg17; &tlsn01; &pvc15;
 &tlsg18; &tlsn02; &pvc16;
 &tlsg19; &tlsn03; &pvc17;
 &tlsg20; &tlsn04; &pvc18;
 &tlsg21; &tlsn05;

</teiCorpus.2>

The interview sequence is not lexically present in necte.xml. Rather, necte.xml contains a list of references to entities that are declared in the external file interviews.ent, which is brought into the DOCTYPE declaration by <!ENTITY % interviews SYSTEM 'interviews.ent'> %interviews;. Each such entity reference denotes an XML file in the file system that contains a single interview; an XML processor understands this denotation and processes the interview file as if it were lexically present in necte.xml. The motivation for using entity references in this way is to make the corpus modular and thus more manageable than a single large file. The list of interview entity references is shown in three columns only to make this page more compact.
  • <teiCorpus.2> denotes a TEI-conformant corpus.

  • <teiHeader> contains information that applies to all the constituent interviews of the corpus, and is in this sense global.

  • Each entity reference in the sequence &tlsg01; to &pvc18; denotes a single constituent interview of the corpus.
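
The entity mechanism described above can be illustrated with a minimal sketch. It is not a copy of the actual NECTE files: only the parameter entity declaration and the entity names are taken from this page, while the interview file names (tlsg01.xml and so on) are assumptions made for illustration, and the real DOCTYPE presumably also references the TEI DTD, which is omitted here.

```xml
<?xml version="1.0"?>
<!-- necte.xml (sketch): the internal DTD subset pulls in interviews.ent -->
<!DOCTYPE teiCorpus.2 [
  <!ENTITY % interviews SYSTEM 'interviews.ent'>
  %interviews;
]>
<teiCorpus.2>
  <teiHeader type='corpus'>
    <!-- corpus-level (global) metadata goes here -->
  </teiHeader>
  <!-- each reference is replaced by the content of one interview file -->
  &tlsg01;
  &tlsg02;
  <!-- ... remaining interview references ... -->
  &pvc18;
</teiCorpus.2>
```

```xml
<!-- interviews.ent (sketch): one external entity per interview.
     The file names on the right are hypothetical. -->
<!ENTITY tlsg01 SYSTEM 'tlsg01.xml'>
<!ENTITY tlsg02 SYSTEM 'tlsg02.xml'>
<!ENTITY pvc18  SYSTEM 'pvc18.xml'>
```

Under these assumptions, adding or removing an interview means editing one declaration in interviews.ent and one reference in necte.xml, without touching the other interview files. Note that a processor must be configured to resolve external entities for the references to be expanded; many parsers disable this by default.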