Documentation:
document type definition
A valid XML document, as opposed to a merely well-formed one, must
include a document type definition (DTD) against which the document
can be validated. This is done by means of a document type (DOCTYPE)
declaration, in which the elements, attributes, and so on used in the
document are specified. The specification can be internal, in the
sense that its components appear lexically within the DOCTYPE
declaration; external, in that the names of one or more files
containing the specification are given in the DOCTYPE declaration;
or a combination of the two. For further information on DTD
definition see Guidelines 2.4; 2.10; Guidelines 3.
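The three arrangements can be sketched as follows. These are three alternative prologs, not one file, since a document has only one DOCTYPE declaration. The element name "note", the file name "note.dtd", and the entity "author" are hypothetical and serve only as illustration; they are not part of NECTE or TEI.

```xml
<!-- 1. Internal: all declarations appear inside the DOCTYPE declaration -->
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>

<!-- 2. External: the declarations live in a separate, named file -->
<!DOCTYPE note SYSTEM "note.dtd">

<!-- 3. Combined: an external file plus further internal declarations -->
<!DOCTYPE note SYSTEM "note.dtd" [
  <!ENTITY author "A. N. Other">
]>
```

The NECTE declaration discussed in this section is of the third, combined kind.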
The NECTE
document "necte.xml" has been validated using the Topologi
Schematron Validator, which was downloaded from
http://www.topologi.com/products/validator/index.html.
The NECTE DOCTYPE declaration looks like this:
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
<!ENTITY % TEI.XML 'INCLUDE'>
<!ENTITY % TEI.spoken 'INCLUDE'>
<!ENTITY % TEI.mixed 'INCLUDE'>
<!ENTITY % TEI.linking 'INCLUDE'>
<!ENTITY % TEI.analysis 'INCLUDE'>
<!ENTITY % TEI.figures 'INCLUDE'>
<!ENTITY % TEI.corpus 'INCLUDE'>

<!NOTATION wav SYSTEM "wmplayer.exe">
<!NOTATION jpg SYSTEM "iexplorer.exe">

<!ENTITY % interviews SYSTEM "interviews.ent"> %interviews;
<!ENTITY % audiofiles SYSTEM "audiofiles.ent"> %audiofiles;
<!ENTITY % graphics SYSTEM "graphics.ent"> %graphics;
<!ENTITY % urls SYSTEM "urls.ent"> %urls;
<!ENTITY % emails SYSTEM "emails.ent"> %emails;
]>
where:
-- TEI.2 names the root element of the
document or documents to which the DTD applies. As we shall see,
the NECTE corpus consists of a sequence of XML documents each of
which has a root element TEI.2; the DOCTYPE declaration
applies to them all.
-- SYSTEM "tei2.dtd" says that the required DTD definitions are
available locally in the file "tei2.dtd" (Guidelines 3.6).
This file contains the full TEI DTD, which is provided free by the
TEI Consortium. NECTE has downloaded this file and distributes it
with the rest of the NECTE materials, rather than referring to it at
a remote site, so that the corpus is usable on a standalone computer
and convenient for users with a slow internet connection.
-- The square brackets [ ] enclose
selections from the DTD definitions contained in "tei2.dtd" and a
sequence of NECTE-specific NOTATION and ENTITY declarations.
- The ENTITY declarations from <!ENTITY % TEI.XML 'INCLUDE'> to
<!ENTITY % TEI.corpus 'INCLUDE'> select the relevant parts of the
full DTD provided by TEI. The TEI standard is defined by a DTD that
is partitioned into fragments. These can be selected according to
the requirements of a particular application, which avoids
including the entire DTD where its full range is not required. For
more information on this selection mechanism see Guidelines 2.8.2
and Guidelines 3.
-- <!ENTITY % TEI.XML 'INCLUDE'> selects the XML version of the TEI
DTD and so gives access to the core tag set that every
TEI-conformant document must have; see Guidelines 6.
-- <!ENTITY % TEI.mixed 'INCLUDE'>: TEI
regards corpora as ‘mixed’ documents that transcend conceptually
unitary types like verse or prose, and that consequently require
tags from a selection of DTD fragments. This declaration selects
the mixed base fragment, which requires that all the DTD fragments
that are to constitute the mix be specified; these are given
below.
-- <!ENTITY % TEI.spoken 'INCLUDE'>
defines tags appropriate to spoken / audio corpora; see Guidelines
11.
-- <!ENTITY % TEI.linking 'INCLUDE'>
defines tags for linking and aligning documents; see Guidelines
14.
-- <!ENTITY % TEI.analysis 'INCLUDE'>
defines 'a tag set for associating simple analyses and
interpretations with text elements', including grammatical markup;
see Guidelines 15.
-- <!ENTITY % TEI.figures 'INCLUDE'>
defines a tag set for referring to graphics declared as external
entities; see Guidelines 22.
-- <!ENTITY % TEI.corpus 'INCLUDE'>
defines corpus-specific tags; see Guidelines 23.
- NOTATION declarations: Non-text entities such as graphics and
sound can be embedded in XML documents, but instructions on how
these are to be dealt with must be provided for XML processors.
This is done using NOTATION declarations (Guidelines 2.7.4;
Guidelines 22.3). In the NECTE document, audio and graphics are
required: <!NOTATION wav SYSTEM "wmplayer.exe"> says that any audio
files are in '.wav' format and are played using the Microsoft
Windows Media Player©, and <!NOTATION jpg SYSTEM "iexplorer.exe">
that any graphics files are in '.jpg' format and are viewed using
Microsoft Internet Explorer©. These applications were selected on
account of their ubiquity; users are, of course, at liberty to
alter this arrangement if desired.
- <!ENTITY % interviews SYSTEM "interviews.ent"> %interviews;
declares a file of entity declarations, "interviews.ent", and
inserts it into the DOCTYPE declaration by means of the
"%interviews;" reference (Guidelines 2.7; Guidelines 22.3).
The entity declarations in this file are used to insert the
interviews into the corpus, as described in the
discussion of the document instance. The audiofiles, graphics,
urls, and emails declarations have the same form.
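The way an 'INCLUDE' declaration selects a DTD fragment can be sketched with a simplified excerpt. This is a hypothetical illustration of the general XML mechanism, not the actual text of the TEI DTD, and the content model shown for the u element is a placeholder. Inside the external DTD, a parameter entity is given the default value 'IGNORE' and is then used as the keyword of a conditional section. The internal declarations in the DOCTYPE are read before the external file, and the first declaration of an entity is the one that counts, so declaring the entity as 'INCLUDE' in the DOCTYPE switches the section on.

```xml
<!-- In the DOCTYPE declaration (read first) -->
<!ENTITY % TEI.spoken 'INCLUDE'>

<!-- In the external DTD file (read second); simplified and hypothetical -->
<!ENTITY % TEI.spoken 'IGNORE'>   <!-- default; has no effect if already declared -->
<![%TEI.spoken;[
  <!-- declarations for the spoken-text tag set would appear here -->
  <!ELEMENT u (#PCDATA)>          <!-- placeholder content model -->
]]>
```

Without the declaration in the DOCTYPE, the default would apply and the processor would skip everything between <![IGNORE[ and ]]>.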
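As an illustration of how the external entity files and the NOTATION declarations might work together, the following sketch shows what such files could contain. The entity names and file names (interview01, interview01.xml, audio01, audio01.wav) are assumptions made for this example; the actual contents of the NECTE ".ent" files are not shown in this section.

```xml
<!-- Hypothetical contents of a file such as "interviews.ent":
     each general entity names an external file of XML text -->
<!ENTITY interview01 SYSTEM "interview01.xml">
<!ENTITY interview02 SYSTEM "interview02.xml">

<!-- Hypothetical declaration in a file such as "audiofiles.ent":
     NDATA marks the entity as unparsed data in the declared "wav" notation -->
<!ENTITY audio01 SYSTEM "audio01.wav" NDATA wav>
```

Under these assumptions, a reference such as &interview01; in the document would be replaced by the contents of the named file, while an unparsed entity such as audio01 would be referred to by name from an attribute rather than expanded in place.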