THE NEWCASTLE ELECTRONIC CORPUS OF TYNESIDE ENGLISH
The NECTE corpus
NECTE is available free of charge for non-commercial use by individuals or groups that can demonstrate a bona fide interest. Potential users include those in the list below, though others are also welcome to apply.
How to obtain the NECTE corpus
NECTE can be obtained in two ways:
Using the NECTE corpus
The NECTE project has aimed to create a corpus that conforms to emerging world standards for the encoding of text, and to that end it uses Text Encoding Initiative (TEI)-conformant XML syntax. Adoption of these standards requires no justification in principle, but in practice it can be an obstacle to users of the corpus. The Documentation page of this website observes, rather economically, that 'familiarity with XML and TEI is assumed throughout'; users not familiar with these may find the pervasive markup tags in the NECTE files a distracting encumbrance and yearn for the good old days of plain text files.
This is not an unreasonable position. XML was never intended to be reader-friendly. It is a markup language that provides a standard for structuring documents and document collections, and, though XML-encoded documents are plain text files that can be read by humans, in general they should not be. For an XML document to be readily legible, software that can represent the structural markup in a visually accessible way is required. For example, XSLT (Extensible Stylesheet Language Transformations) can transform an XML-encoded document into an HTML-encoded one that can then be viewed using any standard Web browser.
Similarly, search and analysis of NECTE or any other XML-encoded corpus requires software to interpret the markup. The Oxford University Computing Service's Xaira system, for example, is 'a general purpose XML search engine, which will operate on any corpus of well-formed XML documents. It is however best used with TEI-conformant documents'; we have not yet tried using Xaira on NECTE, and are keen to hear from anyone who has. Some directories of XML-aware software are:
It remains the case, however, that the NECTE files can be read (and, in the case of the audio files, listened to) directly. They can be accessed in the following ways:
Note that, when double-clicking in access methods (1) and (2) above, the application software used to view a file is determined not by NECTE but by the user's own system setup, and more particularly by the associations between file extensions and applications specified on the user's system. For MS Windows, instructions on how to create such associations can be found via the Windows Help menu: select 'Index', enter the keyword 'associating files with programs', and follow the instructions.
Notes:
1. As the Documentation section of this website explains, NECTE comprises a fairly large number of files that are designed to work in conjunction with one another. The file 'necte.xml' is the NECTE master file. When it is opened, it attempts to pull together the '.xml', '.wav', and '.ent' files along with the contents of 'tei_p4_dtd.zip', thereby creating the corpus in its entirety. Clearly, to be able to do this, all the files that constitute NECTE have to be available. If 'necte.xml' is opened online via the NECTE website or from a NECTE distribution DVD, this is the default situation and there should be no problem, though it may take some time to load on account of the large amount of text involved. If the corpus has been downloaded from the NECTE website, however, the following should be noted:
Attempting to open 'necte.xml' when one or more of these conditions is unsatisfied results in a malfunction.
2. Because NECTE is freely available, there can be and therefore is no restriction on user emendation of content and/or TEI-conformant XML encoding. The '.ent' files and those in 'tei_p4_dtd.zip' should, however, only be edited by users fully conversant with XML and TEI. Changes that have not been carefully considered will cause the corpus to behave in unpredictable ways or to malfunction. The NECTE team would, moreover, be obliged if any such changes were explicitly stated in any public output based on an emended corpus.
3. We have successfully run NECTE as it is supplied, both by download from the main website and by DVD, on a variety of recent MS Windows and Mac platforms. The operative word here is 'recent'. There is a wide range of hardware / operating system combinations in our potential user community, and it may well be that NECTE will not run on some of these, or will run so slowly as to be impractical: for example, on machines with limited memory and / or relatively slow CPUs, or without DVD drives, or running obsolete operating system versions.
4. NECTE is hot off the press, and as such we would be more than pleased to be told by users about omissions, errors, and improvements so that these can be incorporated into future revisions. The relevant contacts are karen.corrigan@ncl.ac.uk and hermann.moisl@ncl.ac.uk.
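For users who would rather work with plain text than with markup, as discussed under 'Using the NECTE corpus', the following is a minimal sketch in Python of how the tags might be stripped from an XML document. It is an illustration only and not part of the NECTE distribution: the sample fragment, its element names, and the function name strip_markup are all hypothetical and are not taken from the NECTE files. Note also that Python's standard-library parser does not read external DTD or entity files, so it may well reject the NECTE files themselves, which rely on the '.ent' files and the TEI DTD; a parser that resolves these would then be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment for illustration only; it is NOT taken from NECTE.
SAMPLE = """<text>
  <body>
    <u who="speakerA">well <pause/>I was born in <name>Gateshead</name></u>
    <u who="speakerB">aye</u>
  </body>
</text>"""


def strip_markup(xml_string):
    """Return the character data of an XML document with all tags removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text node in document order, skipping the tags.
    pieces = root.itertext()
    # Collapse the runs of whitespace left behind by indentation and line breaks.
    return " ".join("".join(pieces).split())


plain = strip_markup(SAMPLE)
print(plain)
```

Removing the tags in this way discards all of the structural information that the markup encodes, such as who is speaking, so it is suited only to casual reading and not to analysis of the corpus.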