THE NEWCASTLE ELECTRONIC CORPUS OF TYNESIDE ENGLISH

Documentation: interview entities

In the sequence of entity references &tlsg01; to &pvc18; that immediately follows the global header in necte.xml, each reference denotes a single constituent interview of the NECTE corpus. Every interview is itself a TEI-conformant XML document, and all have a uniform structure:

<TEI.2 id="tlsg01">
 
<teiHeader type="text">
<!-- Header information -->
</teiHeader>
 
<text>
<!-- Content -->
</text>
 
</TEI.2>

 where:

  • The <TEI.2> element is the root of a single interview document. It is uniquely identified by an 'id' attribute whose value is one of the interview entity names declared in the DOCTYPE declaration.

  • <teiHeader> contains information specific to the interview. This header has the same structure as the global one, but it is much simpler since most characteristics of the individual interviews are described globally. More specifically, it contains only an empty <fileDesc> element, which is mandatory in every TEI header (Guidelines 5), and a <profileDesc> element which contains social data relating to the interviewee. <profileDesc> has the following structure:

<profileDesc>
 
<particDesc>
 
<!-- A sequence of one or more <person> elements on the following pattern-->
<person id="informantTlsn07" role="interviewee" age="31-40" sex="m">
<residence></residence>
<occupation></occupation>
<education></education>
<socecStatus></socecStatus>
</person>
 
</particDesc>
 
</profileDesc>

The <particDesc> element describes the participants in the interview, and each <person> element within it describes a single participant. The attributes of <person> give a formal identifier and some basic information about the person in question, and a succession of subelements with self-explanatory names provides further information. The information relevant to these subelements is not uniformly available for every participant in every interview in the corpus; where it is unavailable, the subelement is left empty. <profileDesc> is described in Guidelines 5.4, and <particDesc>, together with <person> and its subelements, in Guidelines 23.2.2.
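
As an illustration of how the participant data in an interview header might be read programmatically, the following is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not part of the NECTE distribution. The sample string is a hypothetical single interview modelled on the patterns shown above, the residence value is an invented placeholder rather than corpus data, and the function name read_participants is an assumption made for the example. The sketch parses one standalone interview document; it does not attempt to resolve the entity references in necte.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample interview modelled on the documented pattern.
# The <residence> value is an invented placeholder, not corpus data.
SAMPLE_INTERVIEW = """
<TEI.2 id="tlsg01">
  <teiHeader type="text">
    <fileDesc/>
    <profileDesc>
      <particDesc>
        <person id="informantTlsn07" role="interviewee" age="31-40" sex="m">
          <residence>placeholder residence</residence>
          <occupation></occupation>
          <education></education>
          <socecStatus></socecStatus>
        </person>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text></text>
</TEI.2>
"""

# Subelements of <person> named in the documentation.
SUBELEMENTS = ("residence", "occupation", "education", "socecStatus")


def read_participants(interview_xml):
    """Return the interview id and one dict per <person> element.

    Empty or missing subelements are reported as None, reflecting the
    convention that unavailable information leaves the element empty.
    """
    root = ET.fromstring(interview_xml.strip())
    participants = []
    for person in root.findall("./teiHeader/profileDesc/particDesc/person"):
        record = dict(person.attrib)  # id, role, age, sex
        for name in SUBELEMENTS:
            child = person.find(name)
            text = (child.text or "").strip() if child is not None else ""
            record[name] = text or None
        participants.append(record)
    return root.get("id"), participants


interview_id, participants = read_participants(SAMPLE_INTERVIEW)
print(interview_id)
for participant in participants:
    print(participant)
```

Mapping empty subelements to None keeps "information unavailable" distinct from an actual value, which matters when social data are later tabulated across interviews.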

  • The <text> element contains the text of the interview. This element is relatively complex:

-- The overall structure of the <text> element

-- Alignment

-- Speech tags

-- Part-of-speech tags

-- General editorial tags