THE NEWCASTLE ELECTRONIC CORPUS OF TYNESIDE ENGLISH

The NECTE corpus

NECTE is available free of charge for non-commercial use by individuals or groups that can demonstrate a bona fide interest. Potential users include those in the list below, though others are also welcome to apply.

  • Academic researchers in linguistics and related disciplines such as anthropology, ethnography, sociology, social history, and cultural studies.

  • Educationalists.

  • The media in non-commercial applications.

  • Organisations such as language societies, and individuals, that may not belong to the first three categories above but have a serious interest in historical dialect materials.


How to obtain the NECTE corpus

NECTE can be obtained in two ways:

1. From the Arts and Humanities Data Service (AHDS) at http://ahds.ac.uk/, where further instructions are given.

2. From the School of English Literary and Linguistic Studies at the University of Newcastle upon Tyne. The NECTE access request form should be downloaded and returned by post or email to:

  • Postal address: The Newcastle Electronic Corpus of Tyneside English, School of English Literary and Linguistic Studies, Percy Building, University of Newcastle, Newcastle upon Tyne NE1 7RU, United Kingdom.

  • Email: k.p.corrigan@ncl.ac.uk

Successful applicants will be sent a user ID and password for access to the download area of this site. It should, however, be noted that the NECTE audio files are large, and that download of the entire corpus may be impractical. We therefore offer applicants the option of receiving NECTE by post on DVD.

To enter the download area, click here.


Using the NECTE corpus

The NECTE project has aimed to create a corpus that conforms to emerging world standards for encoding of text, and to that end it has used Text Encoding Initiative (TEI)-conformant XML syntax. Adoption of these standards requires no justification in principle, but it can be an obstacle to users of the corpus in practice. The Documentation page of this website observes, rather economically, that 'familiarity with XML and TEI is assumed throughout'; users not familiar with these may find the pervasive markup tags in the NECTE files a distracting encumbrance and yearn for the good old days of plain text files.

This is not an unreasonable position. XML was never intended to be reader-friendly. It is a markup language that provides a standard for structuring documents and document collections, and, though XML-encoded documents are plain text files that can be read by humans, in general they should not be. For an XML document to be readily legible, software that can represent the structural markup in a visually accessible way is required; for example, XSLT (Extensible Stylesheet Language Transformations) can transform an XML-encoded document into an HTML-encoded one that can then be viewed using any standard Web browser.

Similarly, search and analysis of NECTE or any other XML-encoded corpus requires software to interpret the markup, and directories of XML-aware software list tools of this kind. The Oxford University Computing Service's Xaira system, for example, is 'a general purpose XML search engine, which will operate on any corpus of well-formed XML documents. It is however best used with TEI-conformant documents'; we have not yet tried using Xaira on NECTE, and are keen to hear from anyone who has.
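As a small illustration of what 'software to interpret the markup' can mean in practice, the following Python sketch strips the tags from a short XML fragment and prints only the text. The fragment, its element and attribute names, and the function name plain_text are invented for this example and are not taken from the NECTE files. The real NECTE files also rely on the '.ent' files and the TEI DTD, so they may need a parser that resolves those before this approach would work on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment; element names, attribute names, and
# wording are invented for illustration and do not come from NECTE.
SAMPLE = """<text>
  <body>
    <u who="speakerA">well <pause/>that was a long time ago</u>
    <u who="speakerB">aye <unclear>so it</unclear> was</u>
  </body>
</text>"""


def plain_text(xml_string):
    """Return one 'speaker: words' line per <u> element, markup removed."""
    root = ET.fromstring(xml_string)
    lines = []
    for utterance in root.iter("u"):
        # itertext() yields every text node inside the element, in order,
        # so nested tags such as <pause/> and <unclear> simply disappear.
        words = "".join(utterance.itertext()).split()
        lines.append(f'{utterance.get("who")}: {" ".join(words)}')
    return lines


if __name__ == "__main__":
    for line in plain_text(SAMPLE):
        print(line)
```

The sketch uses only the Python standard library; an XSLT processor would be the more conventional route to HTML output, as noted above.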

The fact remains, however, that the NECTE files can be read (and in the case of the audio files, listened to) directly. They can be accessed in the following ways:

1. By opening the required file at this NECTE website via the download area; this requires a user ID and password, as explained above. Simply double-click on the relevant link.

2. By using the copy of this website supplied on the NECTE distribution DVD. On MS Windows systems the website should open automatically once the disk is inserted into the DVD drive: select 'The Corpus', follow this link, and double-click the relevant filename. If the site does not open automatically, or if a Macintosh system is being used, open 'index.htm' and proceed as above.

3. By opening the required file directly, either from the NECTE distribution DVD or from the directory on the user's system to which the NECTE files have been copied, using appropriate application software:

  • The files with the '.xml' extension are corpus content text files. Because they are text files, they can be viewed using any of a wide range of applications: text editors and word processors; HTML editors such as Microsoft Frontpage and Macromedia Dreamweaver; any of the dedicated XML editors listed in the above XML directory sites; or recent versions of Internet Explorer (other Web browsers may also work, but we have not verified this). In Internet Explorer, for example, the hierarchical structure of an XML file is displayed clearly; other applications may not represent it as clearly.

  • The files with the '.wav' extension are audio files. These can be opened using standard multimedia applications such as Microsoft's Windows Media Player.

  • The files with the '.ent' extension have to do with the TEI / XML encoding. These are plain text files and can be viewed with any text editor or word processor.

  • The file 'tei_p4_dtd.zip' is an archive that contains a range of files having to do with the TEI / XML encoding. The '.xml' files can be viewed using the types of application mentioned above, and all the others with text editors and word processors. Obviously, this file needs to be unzipped before attempting to view any of its contents.
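The file types listed above can be told apart by extension alone. The following Python sketch, offered only as an illustration, labels each file in a directory accordingly. The function name describe_files and the wording of the labels are assumptions made for this example; the script is not part of the NECTE distribution.

```python
from pathlib import Path

# Labels paraphrase the list above; the '.zip' label assumes the only
# archive present is the TEI / XML one described there.
KINDS = {
    ".xml": "corpus content text file",
    ".wav": "audio file",
    ".ent": "TEI / XML encoding file (plain text)",
    ".zip": "archive (unzip before viewing)",
}


def describe_files(directory):
    """Map each file name in `directory` to a label based on its extension."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # Extensions are compared case-insensitively.
            result[path.name] = KINDS.get(path.suffix.lower(), "other")
    return result


if __name__ == "__main__":
    for name, kind in describe_files(".").items():
        print(f"{name}: {kind}")
```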

Note that, when double-clicking in access methods (1) and (2) above, the application software used to view a file is determined not by NECTE but by the user's own system setup, and more particularly by the associations between file extensions and applications specified on the user system. For MS Windows, instructions on how to create such definitions can be found via the Windows Help menu: select 'Index', enter keyword 'associating files with programs', and follow the instructions.

Notes:

1. As the Documentation section of this website explains, NECTE comprises a fairly large number of files that are designed to work in conjunction with one another. The file 'necte.xml' is the NECTE master file. When it is opened, it attempts to pull together the '.xml', '.wav', and '.ent' files along with the contents of 'tei_p4_dtd.zip', thereby creating the corpus in its entirety. Clearly, to be able to do this, all the files that constitute NECTE have to be available. If 'necte.xml' is opened online via the NECTE website or from a NECTE distribution DVD, this is the default situation and there should be no problem, though it may take some time to load on account of the large amount of text involved. If the corpus has been downloaded from the NECTE website, however, the following should be noted:

  • Because 'necte.xml' tries to create the entire corpus, ALL the NECTE-constituent files must be downloaded.

  • The file 'tei_p4_dtd.zip' must be unzipped before attempting to open 'necte.xml'.

  • ALL NECTE files must be in the same directory on the user's file system, including those unzipped from 'tei_p4_dtd.zip'.

Attempting to open 'necte.xml' when one or more of these conditions is unsatisfied results in a malfunction.

2. Because NECTE is freely available, there can be and therefore is no restriction on user emendation of content and/or TEI-conformant XML encoding. The '.ent' files and those in 'tei_p4_dtd.zip' should, however, only be edited by users fully conversant with XML and TEI. Changes that have not been carefully considered will cause the corpus to behave in unpredictable ways or to malfunction. The NECTE team would, moreover, be obliged if any such changes were explicitly stated in any public output based on an emended corpus.

3. We have successfully run NECTE as it is supplied both by download from the main website and by DVD on a variety of recent MS-Windows and Mac platforms. The operative word here is 'recent'. There is a wide range of hardware / operating system combinations in our potential user community, and it may well be that NECTE will not run on some of these, or will run so slowly as to be impractical: for example, on machines with limited memory and / or relatively slow CPUs, machines without DVD drives, or machines running obsolete operating system versions.

4. NECTE is hot off the press, and as such we would be more than pleased to be told by users about omissions, errors, and improvements so that these can be incorporated into future revisions. The relevant contacts are karen.corrigan@ncl.ac.uk and hermann.moisl@ncl.ac.uk.
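The conditions listed under note 1 can be partly checked before 'necte.xml' is opened. The Python sketch below is an illustration only: the function name check_necte_directory is hypothetical, and the check assumes that 'tei_p4_dtd.zip' is still present alongside the other files so that its contents can be compared with the directory. It cannot confirm that every '.xml', '.wav', and '.ent' file has been downloaded, because the full file list is not reproduced here.

```python
import zipfile
from pathlib import Path


def check_necte_directory(directory, master="necte.xml",
                          archive="tei_p4_dtd.zip"):
    """Return a list of problems found; an empty list means none were found."""
    root = Path(directory)
    problems = []

    # The master file must be present in the directory.
    if not (root / master).is_file():
        problems.append(f"{master} is missing")

    # Assumption: the archive has been kept, so its members can be listed.
    zip_path = root / archive
    if not zip_path.is_file():
        problems.append(f"{archive} is missing, so its contents cannot be checked")
        return problems

    # Every file in the archive should sit in the same directory as the master.
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if member.endswith("/"):
                continue  # skip directory entries inside the archive
            name = Path(member).name
            if not (root / name).is_file():
                problems.append(f"{name} has not been unzipped into {root.name}")
    return problems


if __name__ == "__main__":
    for problem in check_necte_directory("."):
        print(problem)
```

Run from the directory that holds the NECTE files, the script prints one line per problem and nothing at all if the checks pass.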