THE NEWCASTLE ELECTRONIC CORPUS OF TYNESIDE ENGLISH

Documentation: document structure

The main NECTE corpus file is necte.xml. This file contains a Text Encoding Initiative (TEI)-conformant XML document, conformant in the sense of the TEI local processing format as specified in Guidelines 28. This section of the website motivates and describes the selection of TEI features used in structuring the corpus, with reference to TEI version P4. Familiarity with XML and TEI is assumed throughout. References to ‘Guidelines’ in what follows are to the online TEI P4 Guidelines, and, unless otherwise indicated, quotations are from the specified sections of these Guidelines.

To be TEI-conformant, an XML document has to be validated relative to the TEI Document Type Definition (DTD). NECTE's selection of a validator was based on information provided by Thijs van den Broek's technical report Benchmarking XML-editors (2004), a version of which is available on the Arts and Humanities Data Service (AHDS) website. We chose the oXygen XML editor, which provides facilities not only for the creation of XML documents but also for their validation in relation to user-defined DTDs. The NECTE corpus document in necte.xml has been validated relative to the TEI DTD by oXygen.
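As a rough illustration of what a validator has to work with, the following Python sketch checks that a document is well-formed and reports the DTD named in its document type declaration. It is not a substitute for the oXygen validation described above: the Python standard library parses XML but does not validate against a DTD, so this covers only the preliminary step. The sample document, the system identifier tei2.dtd, and the function name describe_document are assumptions made for the example and are not taken from necte.xml; only the root element name TEI.2 follows TEI P4.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical miniature stand-in for necte.xml. The system identifier
# "tei2.dtd" is an assumption for illustration, not the NECTE declaration.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd">
<TEI.2>
  <teiHeader/>
  <text><body><p>sample</p></body></text>
</TEI.2>
"""


def describe_document(xml_text):
    """Return (well_formed, doctype_name, system_id) for an XML string.

    This checks well-formedness only. It does NOT validate the document
    against the DTD; the external DTD file is never read.
    """
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        # The parser rejected the text, so it is not well-formed XML.
        return (False, None, None)
    doctype = doc.doctype
    if doctype is None:
        # Well-formed, but no document type declaration is present.
        return (True, None, None)
    return (True, doctype.name, doctype.systemId)


if __name__ == "__main__":
    print(describe_document(SAMPLE))
```

A document that passes this check may still be invalid with respect to the TEI DTD; confirming validity requires a validating tool such as the one NECTE used.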

Like every XML document (Guidelines 2.10), necte.xml begins with a prolog which is followed by a document instance: