A valid XML document, as opposed to a merely well-formed one, must include a DTD against which the document can be validated.

This is done by means of a document type (DOCTYPE) declaration, which specifies the elements, attributes, and other markup constructs used in the document. This specification can be internal, in the sense that its components appear lexically within the DOCTYPE declaration; external, in that the names of one or more files containing the specification are given; or a combination of the two.
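
The three possibilities can be illustrated with a minimal sketch. The element name note, the file name note.dtd, and the entity author are hypothetical and are not part of DECTE; the three declarations are alternatives, not the contents of a single file.

```xml
<!-- (i) Internal: the declarations appear inside the square brackets -->
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>

<!-- (ii) External: the declarations are read from a named file -->
<!DOCTYPE note SYSTEM "note.dtd">

<!-- (iii) Combination: an external file plus local declarations -->
<!DOCTYPE note SYSTEM "note.dtd" [
  <!ENTITY author "A. N. Other">
]>
```

The DECTE declaration discussed in this section is of type (iii).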

In the present case, the specification takes the form of references to external files, some of which are a selection of TEI module files containing the tag sets used in the corpus, and some of which are files containing the names of the content files that constitute the corpus.

The DOCTYPE declaration in the decte.xml file is as follows:

<!-- DTD declaration -->

<!DOCTYPE teiCorpus SYSTEM "tei.dtd" [

<!-- Additions to core entities in tei.dtd -->

<!ENTITY % TEI.core 'INCLUDE'>

<!ENTITY % TEI.corpus 'INCLUDE'>

<!ENTITY % TEI.header 'INCLUDE'>

<!ENTITY % TEI.textstructure 'INCLUDE'>

<!ENTITY % TEI.linking 'INCLUDE'>

<!ENTITY % TEI.analysis 'INCLUDE'>

<!ENTITY % TEI.namesdates 'INCLUDE'>

<!ENTITY % TEI.spoken 'INCLUDE'>

<!-- Additional ENTITY declarations -->

<!-- Interview XML files -->

<!ENTITY % interviews SYSTEM "interviews.ent"> %interviews;

<!-- Interview audio files -->

<!ENTITY % audiofiles SYSTEM "audiofiles.ent"> %audiofiles;

]>

  • <!DOCTYPE teiCorpus SYSTEM "tei.dtd"  […] > is the DOCTYPE declaration in which:

(a) teiCorpus names the root element of the document to which the DTD applies. This is the name defined by the TEI as the root element name for language corpora (TEI Guidelines, section 15.1).

(b) SYSTEM "tei.dtd" indicates that the required DTD definitions are available locally via the tei.dtd file. To understand the role of this file, one has to realize that the TEI DTD is partitioned into modules that can be selected according to the requirements of particular applications, thus obviating the need to include the entire DTD in situations where its full range is not required; tei.dtd is a 'driver' file which refers to these TEI DTD modules.

(c) the square brackets [  ] enclose the DECTE-specific selections from the TEI DTD and some DECTE-specific <!ENTITY> declarations. These are described below.

  • The <!ENTITY> declarations from <!ENTITY % TEI.core 'INCLUDE'> to <!ENTITY % TEI.spoken 'INCLUDE'> select the parts of the full TEI DTD that are relevant to DECTE.

(1) <!ENTITY % TEI.core 'INCLUDE'>: the core tag set that must be available in every TEI-conformant document (TEI Guidelines, section 3).

(2) <!ENTITY % TEI.corpus 'INCLUDE'>: tags specific to language corpora (TEI Guidelines, section 15).

(3) <!ENTITY % TEI.header 'INCLUDE'>: tags that relate to the inclusion of metadata in the header of a TEI document (TEI Guidelines, section 2).

(4) <!ENTITY % TEI.textstructure 'INCLUDE'>: document structuring tags (TEI Guidelines, section 4).

(5) <!ENTITY % TEI.linking 'INCLUDE'>: linking, segmentation and alignment of document components (TEI Guidelines, section 16).

(6) <!ENTITY % TEI.analysis 'INCLUDE'>: interpretation of document elements (TEI Guidelines, section 17).

(7) <!ENTITY % TEI.namesdates 'INCLUDE'>: inclusion of names, dates, and related information (TEI Guidelines, section 13).

(8) <!ENTITY % TEI.spoken 'INCLUDE'>: tags that relate to the transcription of spoken language (TEI Guidelines, section 8).
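
The effect of these 'INCLUDE' declarations can be understood in terms of XML conditional sections. The following is a simplified sketch of how a driver file might switch a module on or off; it is not the actual text of tei.dtd, and the entity name file.spoken and the file name spoken.dtd are assumptions made for illustration.

```xml
<!-- Inside a driver file (simplified, hypothetical) -->

<!-- Default setting: the module is switched off. This declaration is
     ignored if TEI.spoken was already declared in the internal subset. -->
<!ENTITY % TEI.spoken 'IGNORE'>

<!-- Conditional section: its contents are processed only when
     %TEI.spoken; expands to INCLUDE. -->
<![%TEI.spoken;[
  <!ENTITY % file.spoken SYSTEM "spoken.dtd">
  %file.spoken;
]]>
```

Because the internal subset of a DOCTYPE declaration is read before the external file, and because the first declaration of an entity is the binding one, a value of 'INCLUDE' given in decte.xml takes precedence over a default of 'IGNORE' in the driver file.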

  • The additional <!ENTITY> declarations are DECTE-specific additions to the TEI DTD.

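(9) <!ENTITY % interviews SYSTEM "interviews.ent"> %interviews; declares the external file interviews.ent as a parameter entity and immediately references it. The file contains a list of the XML-formatted interview files included in DECTE.

(10) <!ENTITY % audiofiles SYSTEM "audiofiles.ent"> %audiofiles; does the same for audiofiles.ent, which contains a list of the interview audio files included in DECTE.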
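
The contents of the two .ent files are not reproduced here. As an illustration only, files of this kind typically consist of further entity declarations, one per content file. In the sketch below every entity name, file name, and the wav notation is hypothetical and should not be taken as the actual DECTE naming scheme.

```xml
<!-- interviews.ent (hypothetical contents): one parsed entity per interview -->
<!ENTITY interview01 SYSTEM "interview01.xml">
<!ENTITY interview02 SYSTEM "interview02.xml">

<!-- audiofiles.ent (hypothetical contents): audio is not XML, so each file
     is declared as an unparsed entity tied to a notation -->
<!NOTATION wav SYSTEM "audio/x-wav">
<!ENTITY audio01 SYSTEM "audio01.wav" NDATA wav>
<!ENTITY audio02 SYSTEM "audio02.wav" NDATA wav>
```

Under these assumptions, a reference such as &interview01; in the body of the teiCorpus element would pull in the text of the corresponding interview file, while an unparsed entity such as audio01 could only be named as the value of an attribute declared with type ENTITY.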