XML Bible

Page 1: XML Bible

Attribute Declarations in DTDs

Some XML elements have attributes. Attributes contain information intended for the application. Attributes are intended for extra information associated with an element (like an ID number) used only by programs that read and write the file, and not for the content of the element that’s read and written by humans. In this chapter, you will learn about the various attribute types and how to declare attributes in DTDs.

What Is an Attribute?

As first discussed in Chapter 3, start tags and empty tags may contain attributes: name-value pairs separated by an equals sign (=). For example,

<GREETING LANGUAGE="English">
  Hello XML!
  <MOVIE SOURCE="WavingHand.mov"/>
</GREETING>

In the preceding example, the GREETING element has a LANGUAGE attribute, which has the value English. The MOVIE element has a SOURCE attribute, which has the value WavingHand.mov. The GREETING element’s content is Hello XML!. The language in which the content is written is useful information about the content. The language, however, is not itself part of the content.

Similarly, the MOVIE element’s content is the binary data stored in the file WavingHand.mov. The name of the file is not the content, although the name tells you where the content can be found. Once again, the attribute contains information about the content of the element, rather than the content itself.

CHAPTER 10

✦ ✦ ✦ ✦

In This Chapter

What is an attribute?

How to declare attributes in DTDs

How to declare multiple attributes

How to specify default values for attributes

Attribute types

Predefined attributes

A DTD for attribute-based baseball statistics

✦ ✦ ✦ ✦

3236-7 ch10.F.qc 6/29/99 1:07 PM Page 283


284 Part II ✦ Document Type Definitions

Elements can possess more than one attribute. For example:

<RECTANGLE WIDTH="30" HEIGHT="45"/>
<SCRIPT LANGUAGE="javascript" ENCODING="8859_1">...</SCRIPT>

In this example, the LANGUAGE attribute of the SCRIPT element has the value javascript. The ENCODING attribute of the SCRIPT element has the value 8859_1. The WIDTH attribute of the RECTANGLE element has the value 30. The HEIGHT attribute of the RECTANGLE element has the value 45. These values are all strings, not numbers.

End tags cannot possess attributes. The following example is illegal:

<SCRIPT>...</SCRIPT LANGUAGE="javascript" ENCODING="8859_1">

Declaring Attributes in DTDs

Like elements and entities, the attributes used in a document must be declared in the DTD for the document to be valid. The <!ATTLIST> tag declares attributes. <!ATTLIST> has the following form:

<!ATTLIST Element_name Attribute_name Type Default_value>

Element_name is the name of the element possessing this attribute. Attribute_name is the name of the attribute. Type is the kind of attribute, one of the ten valid types listed in Table 10-1. The most general type is CDATA. Finally, Default_value is the value the attribute takes on if no value is specified for the attribute.

For example, consider the following element:

<GREETING LANGUAGE="Spanish">
  Hola!
</GREETING>

This element might be declared as follows in the DTD:

<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">

The <!ELEMENT> tag simply says that a GREETING element contains parsed character data. That’s nothing new. The <!ATTLIST> tag says that GREETING elements have an attribute with the name LANGUAGE whose value has the type CDATA, which is essentially the same as #PCDATA for element content. If you encounter a GREETING tag without a LANGUAGE attribute, the value English is used by default.
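To watch a default being applied, you might try a short Python sketch like the following. It feeds a small document to the Expat parser in Python’s standard library (xml.parsers.expat), which reads the internal DTD subset and, unless told otherwise, reports attribute values supplied by <!ATTLIST> defaults alongside those written in the tags. The GREETINGS wrapper element and the function name are assumptions made for this illustration. Expat does not validate, so this shows defaulting only, not validity checking.

```python
import xml.parsers.expat

# Hypothetical document: the GREETINGS wrapper is invented for this sketch.
DOC = """<!DOCTYPE GREETINGS [
  <!ELEMENT GREETINGS (GREETING*)>
  <!ELEMENT GREETING (#PCDATA)>
  <!ATTLIST GREETING LANGUAGE CDATA "English">
]>
<GREETINGS>
  <GREETING LANGUAGE="Spanish">Hola!</GREETING>
  <GREETING>Hello XML!</GREETING>
</GREETINGS>"""


def greeting_languages(xml_text):
    """Return the LANGUAGE value the parser reports for each GREETING."""
    languages = []

    def on_start(name, attrs):
        if name == "GREETING":
            languages.append(attrs.get("LANGUAGE"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return languages


print(greeting_languages(DOC))
```

The second GREETING carries no LANGUAGE attribute in the document, yet the application still receives a value for it, taken from the declaration.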


285 Chapter 10 ✦ Attribute Declarations in DTDs

Table 10-1: Attribute Types

Type         Meaning
CDATA        Character data: text that is not markup
Enumerated   A list of possible values from which exactly one will be chosen
ID           A unique name not shared by any other ID type attribute in the document
IDREF        The value of an ID type attribute of an element in the document
IDREFS       Multiple IDs of elements separated by whitespace
ENTITY       The name of an unparsed entity declared in the DTD
ENTITIES     The names of multiple unparsed entities declared in the DTD, separated by whitespace
NMTOKEN      An XML name token
NOTATION     The name of a notation declared in the DTD
NMTOKENS     Multiple XML name tokens separated by whitespace

The attribute list is declared separately from the tag itself. The name of the element to which the attribute belongs is included in the <!ATTLIST> tag. This attribute declaration applies only to that element, which is GREETING in the preceding example. If other elements also have LANGUAGE attributes, they require separate <!ATTLIST> declarations.

As with most declarations, the exact order in which attribute declarations appear is not important. They can come before or after the element declaration with which they’re associated. In fact, you can even declare an attribute more than once (though I don’t recommend this practice), in which case the first such declaration takes precedence.

You can even declare attributes for tags that don’t exist, although this is uncommon. Perhaps you could declare these nonexistent attributes as part of the initial editing of the DTD, with a plan to return later and declare the elements.

Declaring Multiple Attributes

Elements often have multiple attributes. HTML’s IMG element can have HEIGHT, WIDTH, ALT, BORDER, ALIGN, and several other attributes. In fact, most HTML tags can have multiple attributes. XML tags can also have multiple attributes. For instance, a RECTANGLE element naturally needs both a LENGTH and a WIDTH.

<RECTANGLE LENGTH="70px" WIDTH="85px"/>

You can declare these attributes in several attribute declarations, with one declaration for each attribute. For example:

<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">

The preceding example says that RECTANGLE elements possess LENGTH and WIDTH attributes, each of which has the default value 0px.

You can combine the two <!ATTLIST> tags into a single declaration like this:

<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH  CDATA "0px">

This single declaration declares both the LENGTH and WIDTH attributes, each with type CDATA and each with a default value of 0px. You can also use this syntax when the attributes have different types or defaults, as shown below:

<!ATTLIST RECTANGLE LENGTH CDATA "15px"
                    WIDTH  CDATA "34pt">

Personally, I’m not very fond of this style. It seems excessively confusing and relies too much on proper placement of extra whitespace for legibility (though the whitespace is unimportant to the actual meaning of the tag). You will certainly encounter this style in DTDs written by other people, however, so you need to understand it.
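If you want to convince yourself that the one-declaration-per-attribute style and the combined style mean the same thing, a sketch along these lines may help. It uses the AttlistDeclHandler callback of Python’s xml.parsers.expat module, which is called once per declared attribute. The function name and the trick of wrapping the declarations in a minimal document are assumptions of this example, not anything the DTD syntax requires.

```python
import xml.parsers.expat


def declared_attributes(dtd_body):
    """List (element, attribute, default) for each attribute the parser sees declared."""
    found = []

    def on_attlist(elname, attname, type_, default, required):
        found.append((elname, attname, default))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    # Wrap the declarations in a minimal document so the parser can read them.
    parser.Parse("<!DOCTYPE RECTANGLE [" + dtd_body + "]><RECTANGLE/>", True)
    return found


SEPARATE = """
<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH  CDATA "0px">
"""

COMBINED = """
<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH  CDATA "0px">
"""

# Both styles should yield the same list of declared attributes.
print(declared_attributes(SEPARATE) == declared_attributes(COMBINED))
```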

Specifying Default Values for Attributes

Instead of specifying an explicit default attribute value like 0px, an attribute declaration can require the author to provide a value, allow the value to be omitted completely, or even always use the default value. These requirements are specified with the three keywords #REQUIRED, #IMPLIED, and #FIXED, respectively.

#REQUIRED

You may not always have a good option for a default value. For example, when writing a DTD for use on your intranet, you may want to require that all documents have at least one empty <AUTHOR/> tag. This tag is not normally rendered, but it can identify the person who created the document. This tag can have NAME, EMAIL, and EXTENSION attributes so the author may be contacted. For example:

<AUTHOR NAME="Elliotte Rusty Harold" EMAIL="[email protected]" EXTENSION="3459"/>

Instead of providing default values for these attributes, suppose you want to force anyone posting a document on the intranet to identify themselves. While XML can’t prevent someone from attributing authorship to “Luke Skywalker,” it can at least require that authorship is attributed to someone by using #REQUIRED as the default value. For example:

<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #REQUIRED>

If a validating parser encounters an <AUTHOR/> tag that omits any of these attributes, it reports a validity error.

You might also want to use #REQUIRED to force authors to give their IMG elements WIDTH, HEIGHT, and ALT attributes. For example:

<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>

Any attempt to omit these attributes (as all too many Web pages do) produces an invalid document. The XML processor notices the error and informs the author of the missing attributes.

#IMPLIED

Sometimes you may not have a good option for a default value, but you do not want to require the author of the document to include a value, either. For example, suppose some of the people posting documents to your intranet are offsite freelancers who have email addresses but lack phone extensions. Therefore, you don’t want to require them to include an EXTENSION attribute in their <AUTHOR/> tags. For example:

<AUTHOR NAME="Elliotte Rusty Harold" EMAIL="[email protected]" />

You still don’t want to provide a default value for the extension, but you do want to enable authors to include such an attribute. In this case, use #IMPLIED as the default value like this:

<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>

If the XML parser encounters an <AUTHOR/> tag without an EXTENSION attribute, it informs the XML application that no value is available. The application can act on this notification as it chooses. For example, if the application is feeding elements into a SQL database where the attributes are mapped to fields, the application would probably insert a null into the corresponding database field.

#FIXED

Finally, you may want to provide a default value for the attribute without allowing the author to change it. For example, you may wish to specify an identical COMPANY attribute of the AUTHOR element for anyone posting documents to your intranet like this:

<AUTHOR NAME="Elliotte Rusty Harold" COMPANY="TIC" EMAIL="[email protected]" EXTENSION="3459"/>

You can require that everyone use this value of the company by specifying the default value as #FIXED, followed by the actual default. For example:

<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">

Document authors are not required to actually include the fixed attribute in their tags. If they don’t include the fixed attribute, the default value will be used. If they do include the fixed attribute, however, they must use an identical value. Otherwise, a validating parser will report an error.
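The three keywords can be told apart programmatically. The following Python sketch is one way to do it with the standard library’s Expat bindings: the AttlistDeclHandler callback receives the declared default (None when there is none) and a flag that is true for #REQUIRED and #FIXED. The trimmed AUTHOR declaration (EMAIL is left out) and the function name are assumptions made for the example, and because Expat does not validate, a missing #REQUIRED attribute would go unreported here unless you check for it yourself.

```python
import xml.parsers.expat

# A cut-down AUTHOR declaration, assumed for illustration.
DOC = """<!DOCTYPE AUTHOR [
  <!ELEMENT AUTHOR EMPTY>
  <!ATTLIST AUTHOR NAME      CDATA #REQUIRED>
  <!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
  <!ATTLIST AUTHOR COMPANY   CDATA #FIXED "TIC">
]>
<AUTHOR NAME="Elliotte Rusty Harold"/>"""


def inspect_author(xml_text):
    """Return (kind of each declaration, attributes reported for the tag)."""
    kinds = {}
    reported = {}

    def on_attlist(elname, attname, type_, default, required):
        if default is None:
            kind = "#REQUIRED" if required else "#IMPLIED"
        else:
            kind = "#FIXED" if required else "default"
        kinds.setdefault(attname, kind)  # the first declaration wins

    def on_start(name, attrs):
        reported.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return kinds, reported


kinds, reported = inspect_author(DOC)
print(kinds)
print(reported)
```

The #FIXED value TIC shows up among the reported attributes even though the tag omits it, while the #IMPLIED EXTENSION attribute is simply absent, leaving the application to decide what "no value" means.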

Attribute Types

All preceding examples have CDATA type attributes. This is the most general type, but there are nine other types permitted for attributes. Altogether the ten types are:

✦ CDATA

✦ Enumerated

✦ NMTOKEN

✦ NMTOKENS

✦ ID

✦ IDREF

✦ IDREFS

✦ ENTITY

✦ ENTITIES

✦ NOTATION

Nine of the preceding types are keywords used in the type field, while Enumerated is a special type that indicates the attribute must take its value from a list of possible values. Let’s investigate each type in depth.

The CDATA Attribute Type

CDATA, the most general attribute type, means the attribute value may be any string of text not containing a less-than sign (<) or the quotation mark that delimits the value ("). These characters may be inserted using the usual entity references (&lt; and &quot;) or by their Unicode values using character references. Furthermore, all raw ampersands (&), that is, ampersands that do not begin a character or entity reference, must also be escaped as &amp;.

In fact, even if the value itself contains double quotes, they do not have to be escaped. Instead, you may use single quotes to delimit the attribute value, as in the following example:

<RECTANGLE LENGTH='7"' WIDTH='8.5"'/>

If the attribute value contains both single and double quotes, the one used to delimit the value must be replaced with its entity reference, &apos; (apostrophe) or &quot; (double quote). For example:

<RECTANGLE LENGTH='8&apos;7"' WIDTH="10'6&quot;"/>
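When a program writes attribute values, it is usually easier to let a library choose the delimiter and the escapes. As a rough sketch, Python’s xml.sax.saxutils.quoteattr picks single or double quotes depending on the content and escapes &, <, and whichever quote it must; the helper names and sample values below are assumptions for the example. Parsing the generated tag back with Expat confirms the application sees the original string.

```python
import xml.parsers.expat
from xml.sax.saxutils import quoteattr


def rectangle_tag(length):
    """Build an empty RECTANGLE tag, letting quoteattr pick quotes and escapes."""
    return "<RECTANGLE LENGTH=" + quoteattr(length) + "/>"


def read_length(tag_text):
    """Parse the tag and return the LENGTH value the application would see."""
    values = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: values.append(attrs["LENGTH"])
    parser.Parse(tag_text, True)
    return values[0]


# Sample values: double quote only, both kinds of quote, and markup characters.
SAMPLES = ['7"', "8'7\"", "<5 & up"]

for raw in SAMPLES:
    tag = rectangle_tag(raw)
    print(tag, read_length(tag) == raw)
```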

The Enumerated Attribute Type

The enumerated type is not an XML keyword, but a list of possible values for the attribute, separated by vertical bars. Each value must be a valid XML name token. The document author can choose any one member of the list as the value of the attribute. The default value must be one of the values in the list.

For example, suppose you want an element to be visible or invisible. You may want the element to have a VISIBLE attribute, which can only have the values TRUE or FALSE. If that element is the simple P element, then the <!ATTLIST> declaration would look as follows:

<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">

The preceding declaration says that a P element may or may not have a VISIBLE attribute. If it does have a VISIBLE attribute, the value of that attribute must be either TRUE or FALSE. If it does not have such an attribute, the value TRUE is assumed. For example,

<P VISIBLE="FALSE">You can't see me! Nyah! Nyah!</P>
<P VISIBLE="TRUE">You can see me.</P>
<P>You can see me too.</P>

By itself, this declaration is not a magic incantation that enables you to hide text. It still relies on the application to understand that it shouldn’t display invisible elements. Whether the element is shown or hidden would probably be set through a style sheet rule applied to elements with VISIBLE attributes. For example,

<xsl:template match="P[@VISIBLE='FALSE']">
</xsl:template>

<xsl:template match="P[@VISIBLE='TRUE']">
  <xsl:apply-templates/>
</xsl:template>
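An application written in a general-purpose language has to make the same decision the style sheet makes. The following Python sketch, which assumes a DOCUMENT wrapper element and a function name invented for the example, keeps only the paragraphs whose VISIBLE attribute is TRUE. Because the Expat parser fills in the declared default, the third paragraph is treated as visible even though its tag has no VISIBLE attribute.

```python
import xml.parsers.expat

# Hypothetical document: the DOCUMENT wrapper is invented for this sketch.
DOC = """<!DOCTYPE DOCUMENT [
  <!ELEMENT DOCUMENT (P*)>
  <!ELEMENT P (#PCDATA)>
  <!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
]>
<DOCUMENT>
<P VISIBLE="FALSE">You can't see me! Nyah! Nyah!</P>
<P VISIBLE="TRUE">You can see me.</P>
<P>You can see me too.</P>
</DOCUMENT>"""


def visible_paragraphs(xml_text):
    """Return the text of every P element whose VISIBLE attribute is TRUE."""
    shown = []
    current = None  # text pieces of the visible P being read, otherwise None

    def on_start(name, attrs):
        nonlocal current
        if name == "P" and attrs.get("VISIBLE") == "TRUE":
            current = []

    def on_text(data):
        if current is not None:
            current.append(data)

    def on_end(name):
        nonlocal current
        if name == "P" and current is not None:
            shown.append("".join(current))
            current = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return shown


print(visible_paragraphs(DOC))
```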

The NMTOKEN Attribute Type

The NMTOKEN attribute type restricts the value of the attribute to an XML name token. A name token is built from the same characters as an XML name: letters, digits, underscores, hyphens, and periods. As discussed in Chapter 6, XML names must begin with a letter or an underscore (_); a name token is slightly looser and may begin with any of these characters. Name tokens may not include whitespace. (The underscore often substitutes for whitespace.) Technically, name tokens may contain colons, but you shouldn’t use this character because it’s reserved for use with namespaces.

The NMTOKEN attribute type proves useful when you’re using a programming language to manipulate the XML data. It’s not a coincidence that the preceding rules resemble the rules for identifiers in Java, JavaScript, and many other programming languages, although those languages do not allow hyphens, periods, or colons in identifiers. For example, you could use NMTOKEN to associate a particular Java class with an element. Then, you could use Java’s reflection API to pass the data to a particular method in a particular class.

The NMTOKEN attribute type also helps when you need to pick from any large group of names that aren’t specifically part of XML but meet XML’s name requirements. The most significant of these requirements is the prohibition of whitespace. For example, NMTOKEN could be used for an attribute whose value had to map to an 8.3 DOS file name. On the other hand, it wouldn’t work well for UNIX, Macintosh, or Windows NT file-name attributes because those names often contain whitespace.

For example, suppose you want to require a STATE attribute in an <ADDRESS/> tag to be a two-letter abbreviation. You cannot force this characteristic with a DTD, but you can prevent people from entering “New York” or “Puerto Rico” with the following <!ATTLIST> declaration:

<!ATTLIST ADDRESS STATE NMTOKEN #REQUIRED>

However, “California,” “Nevada,” and other single-word states are still legal values. Of course, you could simply use an enumerated list with several dozen two-letter codes, but that approach results in more work than most people want to expend. For that matter, do you even know the two-letter codes for all 50 U.S. states, all the territories and possessions, all foreign military postings, and all Canadian provinces? On the other hand, if you define this list once in a parameter entity in a DTD file, you can reuse the file many times over.

The NMTOKENS Attribute Type

The NMTOKENS attribute type is a rare plural form of NMTOKEN. It enables the value of the attribute to consist of multiple XML name tokens, separated from each other by whitespace. Generally, you can use NMTOKENS for the same reasons as NMTOKEN, but only when multiple names are required.

For example, if you want to require multiple two-letter state codes for a STATES attribute, you can use the following example:

<!ATTLIST ADDRESS STATES NMTOKENS #REQUIRED>

Then, you could have an address tag as follows:

<ADDRESS STATES="MI NY LA CA">

Unfortunately, if you apply this technique, you’re no longer ruling out states like New York because each individual part of the state name qualifies as an NMTOKEN, as shown below:

<ADDRESS STATES="MI New York LA CA">
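The difference between the two types is easy to see in a few lines of Python. This is only a sketch: it approximates XML 1.0 name characters with ASCII letters, digits, periods, hyphens, underscores, and colons, whereas the real definition also admits many non-ASCII characters. The function names are assumptions of the example.

```python
import re

# ASCII-only approximation of XML 1.0 name characters. Real XML also allows
# many non-ASCII letters and digits, which this sketch deliberately ignores.
_NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken(value):
    """True if the whole value is a single whitespace-free name token."""
    return _NMTOKEN.fullmatch(value) is not None


def is_nmtokens(value):
    """True if the value is one or more name tokens separated by whitespace."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(token) for token in tokens)


print(is_nmtoken("NY"), is_nmtoken("New York"), is_nmtoken("California"))
print(is_nmtokens("MI NY LA CA"), is_nmtokens("MI New York LA CA"))
```

As the last line suggests, "MI New York LA CA" passes the NMTOKENS check because it is read as five separate tokens.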


The ID Attribute Type

An ID type attribute uniquely identifies the element in the document. Authoring tools and other applications commonly use ID to help enumerate the elements of a document without concern for their exact meaning or relationship to one another.

An attribute value of type ID must be a valid XML name; that is, it begins with a letter or an underscore and contains no whitespace. A particular name may not be used as an ID attribute of more than one tag. Using the same ID twice in one document causes a validating parser to report an error. Furthermore, no element may have more than one attribute of type ID.

Typically, ID attributes exist solely for the convenience of programs that manipulate the data. In many cases, multiple elements can be effectively identical except for the value of an ID attribute. If you choose IDs in some predictable fashion, a program can enumerate all the different elements or all the different elements of one type in the document.

The ID type is incompatible with #FIXED. An attribute cannot be both fixed and have ID type because a #FIXED attribute can only have a single value, while each ID type attribute must have a different value. Most ID attributes use #REQUIRED, as Listing 10-1 demonstrates.

Listing 10-1: A required ID attribute type

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!ELEMENT DOCUMENT (P*)>
  <!ELEMENT P (#PCDATA)>
  <!ATTLIST P PNUMBER ID #REQUIRED>
]>

<DOCUMENT>
  <P PNUMBER="p1">The quick brown fox</P>
  <P PNUMBER="p2">The quick brown fox</P>
</DOCUMENT>

The IDREF Attribute Type

The value of an attribute with the IDREF type is the ID of another element in the document. For example, Listing 10-2 shows the IDREF and ID attributes used to connect children to their parents.

Listing 10-2: family.xml

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!ELEMENT DOCUMENT (PERSON*)>
  <!ELEMENT PERSON (#PCDATA)>
  <!ATTLIST PERSON PNUMBER ID #REQUIRED>
  <!ATTLIST PERSON FATHER IDREF #IMPLIED>
  <!ATTLIST PERSON MOTHER IDREF #IMPLIED>
]>

<DOCUMENT>
  <PERSON PNUMBER="a1">Susan</PERSON>
  <PERSON PNUMBER="a2">Jack</PERSON>
  <PERSON PNUMBER="a3" MOTHER="a1" FATHER="a2">Chelsea</PERSON>
  <PERSON PNUMBER="a4" MOTHER="a1" FATHER="a2">David</PERSON>
</DOCUMENT>

You generally use this uncommon but crucial type when you need to establish connections between elements that aren’t reflected in the tree structure of the document. In Listing 10-2, each child is given FATHER and MOTHER attributes containing the ID attributes of its father and mother.

You cannot easily and directly use an IDREF to link parents to their children in Listing 10-2 because each parent has an indefinite number of children. As a workaround, you can group all the children of the same parents into a FAMILY element and link to the FAMILY. Even this approach falters in the face of half-siblings who share only one parent. In short, IDREF works for many-to-one relationships, but not for one-to-many relationships.
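A validating parser enforces the ID and IDREF rules for you. With a non-validating parser such as Expat, a program can make the same checks by hand, roughly as in the Python sketch below. It learns from the <!ATTLIST> declarations which attributes have type ID or IDREF, reports duplicate IDs and IDREF values that match no ID, and also inverts the child-to-parent links to recover each parent’s children, which is the one-to-many direction the document itself cannot express directly. The function name and the shape of the return value are assumptions of this example.

```python
import xml.parsers.expat

FAMILY = """<!DOCTYPE DOCUMENT [
  <!ELEMENT DOCUMENT (PERSON*)>
  <!ELEMENT PERSON (#PCDATA)>
  <!ATTLIST PERSON PNUMBER ID #REQUIRED>
  <!ATTLIST PERSON FATHER IDREF #IMPLIED>
  <!ATTLIST PERSON MOTHER IDREF #IMPLIED>
]>
<DOCUMENT>
  <PERSON PNUMBER="a1">Susan</PERSON>
  <PERSON PNUMBER="a2">Jack</PERSON>
  <PERSON PNUMBER="a3" MOTHER="a1" FATHER="a2">Chelsea</PERSON>
  <PERSON PNUMBER="a4" MOTHER="a1" FATHER="a2">David</PERSON>
</DOCUMENT>"""


def check_ids(xml_text):
    """Return (duplicate IDs, IDREFs matching no ID, children listed by parent ID)."""
    attr_types = {}   # (element, attribute) -> declared type
    ids = set()
    duplicates = []
    refs = []         # (ID of the referring element, ID it refers to)

    def on_attlist(elname, attname, type_, default, required):
        attr_types.setdefault((elname, attname), type_)

    def on_start(name, attrs):
        own_id = None
        for attname, value in attrs.items():
            if attr_types.get((name, attname)) == "ID":
                own_id = value
                if value in ids:
                    duplicates.append(value)
                ids.add(value)
        for attname, value in attrs.items():
            if attr_types.get((name, attname)) == "IDREF":
                refs.append((own_id, value))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    # Resolve references only after the whole document is read, so that an
    # IDREF may point forward to an element that appears later.
    dangling = sorted({target for _, target in refs if target not in ids})
    children = {}
    for child, parent in refs:
        children.setdefault(parent, []).append(child)
    return duplicates, dangling, children


print(check_ids(FAMILY))
```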

The ENTITY Attribute Type

An ENTITY type attribute enables you to link external binary data (that is, an external unparsed general entity) into the document. The value of the ENTITY attribute is the name of an unparsed general entity declared in the DTD, which links to the external data.

The classic example of an ENTITY attribute is an image. The image consists of binary data available from another URL. Provided the XML browser can support it, you may include an image in an XML document with the following declarations in your DTD. The NDATA keyword is what makes the entity unparsed; it names a notation, here called GIF for illustration, that must itself be declared:

<!ELEMENT IMAGE EMPTY>
<!ATTLIST IMAGE SOURCE ENTITY #REQUIRED>
<!NOTATION GIF SYSTEM "image/gif">
<!ENTITY LOGO SYSTEM "logo.gif" NDATA GIF>

Then, at the desired image location in the document, insert the following IMAGE tag:

<IMAGE SOURCE="LOGO"/>

This approach is not a magic formula that all XML browsers automatically understand. It is simply one technique browsers and other applications may or may not adopt to embed non-XML data in documents.

Cross-Reference: This technique will be explored further in Chapter 11, Embedding Non-XML Data.

The ENTITIES Attribute Type

ENTITIES is a relatively rare plural form of ENTITY. An ENTITIES type attribute has a value that consists of multiple unparsed entity names separated by whitespace. Each entity name refers to an external non-XML data source. One use for this approach might be a slide show that rotates different pictures, as in the following example (each entity is marked as unparsed with NDATA and a notation, here called GIF for illustration):

<!ELEMENT SLIDESHOW EMPTY>
<!ATTLIST SLIDESHOW SOURCES ENTITIES #REQUIRED>
<!NOTATION GIF SYSTEM "image/gif">
<!ENTITY PIC1 SYSTEM "cat.gif" NDATA GIF>
<!ENTITY PIC2 SYSTEM "dog.gif" NDATA GIF>
<!ENTITY PIC3 SYSTEM "cow.gif" NDATA GIF>

Then, at the point in the document where you want the slide show to appear, insert the following tag:

<SLIDESHOW SOURCES="PIC1 PIC2 PIC3"/>

Once again, this is not a universal formula that all (or even any) XML browsers automatically understand, simply one method browsers and other applications may or may not adopt to embed non-XML data in documents.
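What an application does with an ENTITY or ENTITIES value is up to the application, but the first step is usually to turn entity names into file locations. The Python sketch below shows one way this might look with the standard library’s Expat bindings, whose EntityDeclHandler callback reports each declared entity along with its system identifier and, for unparsed entities, its notation name. It assumes the entities are declared with NDATA and a notation named GIF, both chosen for illustration; the function name is likewise hypothetical.

```python
import xml.parsers.expat

# The GIF notation and its "image/gif" identifier are assumptions of this sketch.
DOC = """<!DOCTYPE SLIDESHOW [
  <!ELEMENT SLIDESHOW EMPTY>
  <!ATTLIST SLIDESHOW SOURCES ENTITIES #REQUIRED>
  <!NOTATION GIF SYSTEM "image/gif">
  <!ENTITY PIC1 SYSTEM "cat.gif" NDATA GIF>
  <!ENTITY PIC2 SYSTEM "dog.gif" NDATA GIF>
  <!ENTITY PIC3 SYSTEM "cow.gif" NDATA GIF>
]>
<SLIDESHOW SOURCES="PIC1 PIC2 PIC3"/>"""


def slide_files(xml_text):
    """Map the entity names in SOURCES to the system identifiers they declare."""
    unparsed = {}   # entity name -> (system identifier, notation name)
    files = []

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:   # only unparsed entities carry a notation
            unparsed[name] = (system_id, notation)

    def on_start(name, attrs):
        if name == "SLIDESHOW":
            for entity_name in attrs["SOURCES"].split():
                files.append(unparsed[entity_name][0])

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return files


print(slide_files(DOC))
```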

The NOTATION Attribute Type

The NOTATION attribute type specifies that an attribute’s value is the name of a notation declared in the DTD. The default value of this attribute must also be the name of a notation declared in the DTD. Notations will be introduced in the next chapter. In brief, notations identify the format of non-XML data, for instance by specifying a helper application for an unparsed entity.

Cross-Reference: Chapter 11, Embedding Non-XML Data, covers notations.

For example, this PLAYER attribute of a SOUND element has type NOTATION, and its only permitted value is MP, the notation signifying a particular kind of sound file:

<!ATTLIST SOUND PLAYER NOTATION (MP) #REQUIRED>
<!NOTATION MP SYSTEM "mplay32.exe">

You can also offer a choice of different notations. One use for this is to specify different helper apps for different platforms. The browser can pick the one it has available. In this case, the NOTATION keyword is followed by a set of parentheses containing the list of allowed notation names separated by vertical bars. For example:

<!NOTATION MP SYSTEM "mplay32.exe">
<!NOTATION ST SYSTEM "soundtool">
<!NOTATION SM SYSTEM "Sound Machine">
<!ATTLIST SOUND PLAYER NOTATION (MP | SM | ST) #REQUIRED>

This says that the PLAYER attribute of the SOUND element may be set to MP, ST, or SM. We’ll explore this further in the next chapter.

Note: At first glance, this approach may appear inconsistent with the handling of other list attributes like ENTITIES and NMTOKENS, but these two approaches are actually quite different. ENTITIES and NMTOKENS have a list of values in the attribute of the actual element in the document but only one type keyword in the attribute declaration in the DTD. NOTATION only has a single value in the attribute of the actual element in the document, however. The list of possible values occurs in the attribute declaration in the DTD.

Predefined Attributes

In a way, two attributes are predefined in XML. You must declare these attributes in your DTD for each element to which they apply, but you should only use these declared attributes for their intended purposes. Such attributes are identified by a name that begins with xml:.

These two attributes are xml:space and xml:lang. The xml:space attribute describes how whitespace is treated in the element. The xml:lang attribute describes the language (and optionally, dialect and country) in which the element is written.

xml:space

In HTML, whitespace is relatively insignificant. Although the difference between one space and no space is significant, the difference between one space and two spaces, one space and a carriage return, or one space, three carriage returns, and 12 tabs is not important. For text in which whitespace is significant (computer source code, certain mainframe database reports, or the poetry of e. e. cummings, for example) you can use a PRE element to specify a monospaced font and preservation of whitespace.

XML, however, preserves whitespace by default. The XML processor passes all whitespace characters to the application unchanged. The application usually ignores the extra whitespace. However, the XML processor can tell the application that certain elements contain significant whitespace that should be preserved. The page author uses the xml:space attribute to indicate these elements to the application.

If an element contains significant whitespace, the DTD should have an <!ATTLIST> for the xml:space attribute. This attribute will have an enumerated type with the two values, default and preserve, as shown in Listing 10-3.

Listing 10-3: Java source code with significant whitespace encoded in XML

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE PROGRAM [
  <!ELEMENT PROGRAM (#PCDATA)>
  <!ATTLIST PROGRAM xml:space (default|preserve) 'preserve'>
]>
<PROGRAM xml:space="preserve">public class AsciiTable {

  public static void main (String[] args) {

    for (int i = 0; i &lt; 128; i++) {
      System.out.println(i + " " + (char) i);
    }

  }

}</PROGRAM>

All whitespace is passed to the application, regardless of whether xml:space’s value is default or preserve. With a value of default, however, the application does what it would normally do with extra whitespace. With a value of preserve, the application treats the extra whitespace as significant.

Significance depends somewhat on the eventual destination of the data. For instance, extra whitespace in Java source code is relevant to a source code editor but not to a compiler.

Children of an element for which xml:space is defined are assumed to behave similarly to their parent (either preserving or not preserving space), unless they possess an xml:space attribute with a conflicting value.
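Since the parser hands over all the whitespace either way, honoring xml:space is the application’s job. The Python sketch below illustrates one possible policy: it collapses runs of whitespace in elements where the value in effect is default and leaves the text untouched where it is preserve, tracking inherited values on a stack. The LISTING and CAPTION elements, the collapsing policy, and the function name are all assumptions made for this example.

```python
import xml.parsers.expat

# Hypothetical document: LISTING and CAPTION are invented for this sketch.
DOC = """<!DOCTYPE LISTING [
  <!ELEMENT LISTING (CAPTION, PROGRAM)>
  <!ELEMENT CAPTION (#PCDATA)>
  <!ELEMENT PROGRAM (#PCDATA)>
  <!ATTLIST PROGRAM xml:space (default|preserve) 'preserve'>
]>
<LISTING><CAPTION>An   ASCII
table</CAPTION><PROGRAM>for (;;) {
    i++;
}</PROGRAM></LISTING>"""


def render(xml_text):
    """Return (element, text) pairs, collapsing whitespace unless preserved."""
    modes = ["default"]   # stack of xml:space values currently in effect
    text = []             # character data of the element being read
    rendered = []

    def on_start(name, attrs):
        # A child inherits its parent's value unless it sets its own.
        modes.append(attrs.get("xml:space", modes[-1]))
        text.clear()

    def on_text(data):
        text.append(data)

    def on_end(name):
        content = "".join(text)
        text.clear()
        if modes.pop() != "preserve":
            content = " ".join(content.split())
        if content:
            rendered.append((name, content))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return rendered


for name, content in render(DOC):
    print(name, repr(content))
```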

xml:lang

The xml:lang attribute identifies the language in which the element’s content is written. The value of this attribute can have type CDATA, NMTOKEN, or an enumerated list. Ideally, each of these attribute values should be one of the two-letter language codes defined by the original ISO-639 standard. The complete list of codes can be found on the Web at http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt.

For instance, consider the two examples of the following sentence from Petronius’s Satiricon in both Latin and English. A SENTENCE tag encloses both sentences, but the first SENTENCE tag has an xml:lang attribute for Latin while the second has an xml:lang attribute for English.

Latin:

<SENTENCE xml:lang="la">
  Veniebamus in forum deficiente iam die, in quo notavimus frequentiam rerum venalium, non quidem pretiosarum sed tamen quarum fidem male ambulantem obscuritas temporis facillime tegeret.
</SENTENCE>

English:

<SENTENCE xml:lang="en">
  We have come to the marketplace now when the day is failing, where we have seen many things for sale, not for the valuable goods but rather that the darkness of the time may most easily conceal their shoddiness.
</SENTENCE>

While an English-speaking reader can easily tell which is the original text and which is the translation, a computer can use the hint provided by the xml:lang attribute. This distinction enables a spell checker to determine whether to check a particular element and designate which dictionary to use. Search engines can inspect these language attributes to determine whether to index a page and return matches based on the user’s preferences.

Too Many Languages, Not Enough Codes

XML remains a little behind the times in this area. The original ISO-639 standard language codes were formed from two case-insensitive ASCII alphabetic characters. This standard allows no more than 26 × 26, or 676, different codes. More than 676 different languages are spoken on Earth today (not even counting dead languages like Etruscan). In practice, the reasonable codes are somewhat fewer than 676 because the language abbreviations should have some relation to the name of the language.

ISO-639, part two, uses three-letter language codes, which should handle all languages spoken on Earth. The XML standard specifically requires two-letter codes, however.

The language applies to the element and all its children until one of its childrendeclares a different language. The declaration of the SENTENCE element can appearas follows:

<!ELEMENT SENTENCE (#PCDATA)>
<!ATTLIST SENTENCE xml:lang NMTOKEN "en">
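As a rough illustration of this inheritance rule, the following sketch shows an English paragraph that contains a Latin phrase. The PARAGRAPH and QUOTE element names and their declarations are made up for this example and are not part of any DTD in this chapter.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE PARAGRAPH [
  <!-- Hypothetical elements, declared only for this illustration -->
  <!ELEMENT PARAGRAPH (#PCDATA | QUOTE)*>
  <!ELEMENT QUOTE (#PCDATA)>
  <!ATTLIST PARAGRAPH xml:lang NMTOKEN "en">
  <!ATTLIST QUOTE xml:lang NMTOKEN #IMPLIED>
]>
<PARAGRAPH xml:lang="en">
  The motto
  <QUOTE xml:lang="la">carpe diem</QUOTE>
  is usually translated as seize the day.
</PARAGRAPH>
```

An application that honors xml:lang would treat the text directly inside PARAGRAPH as English and the content of QUOTE as Latin. Because the QUOTE attribute is declared #IMPLIED, a QUOTE element without its own xml:lang attribute would simply fall under the language of its parent.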

If no appropriate ISO code is available, you can use one of the codes registered with the IANA, though currently IANA only adds four additional codes (listed in Table 10-2). You can find the most current list at http://www.isi.edu/in-notes/iana/assignments/languages/tags.

Table 10-2: The IANA Language Codes

Code       Language
no-bok     Norwegian “Book language”
no-nyn     Norwegian “New Norwegian”
i-navajo   Navajo
i-mingo    Mingo

For example:

<P xml:lang="no-nyn">

If neither the ISO nor the IANA has a code for the language you need (Klingon perhaps?), you may define new language codes. These “x-codes” must begin with the string x- or X- to identify them as user-defined, private use codes. For example,

<P xml:lang="x-klingon">


The value of the xml:lang attribute may include additional subcode segments, separated from the primary language code by a hyphen. Most often, the first subcode segment is a two-letter country code specified by ISO 3166. You can retrieve the most current list of country codes from http://www.isi.edu/in-notes/iana/assignments/country-codes. For example:

<P xml:lang="en-US">Put the body in the trunk of the car.</P>
<P xml:lang="en-GB">Put the body in the boot of the car.</P>

If the first subcode segment does not represent a two-letter ISO country code, it should be a subcode for the language that is registered with the IANA, as with the bok and nyn subcodes of Norwegian listed in Table 10-2. For example:

<P xml:lang="no-bok">

The final possibility is that the first subcode is another x-code that begins with x- or X-. For example,

<P xml:lang="en-x-tic">

By convention, language codes are written in lowercase and country codes are written in uppercase. However, this is merely a convention. This is one of the few parts of XML that is case-insensitive, because of its heritage in the case-insensitive ISO standard.

Like all attributes used in DTDs for valid documents, the xml:lang attribute must be specifically declared for those elements to which it directly applies. (It indirectly applies to children of elements that have specified xml:lang attributes, but these children do not require separate declaration.)

You may not want to permit arbitrary values for xml:lang. The permissible values are also valid XML name tokens, so the attribute is commonly given the NMTOKEN type. This type restricts the value of the attribute to a valid XML name token. For example,

<!ELEMENT P (#PCDATA)>
<!ATTLIST P xml:lang NMTOKEN "en">

Alternately, if only a few languages or dialects are permitted, you can use an enumerated type. For example, the following DTD says that the P element may be either English or Latin.

<!ELEMENT P (#PCDATA)>
<!ATTLIST P xml:lang (en | la) "en">

You can use a CDATA type attribute, but there’s little reason to. Using NMTOKEN or an enumerated type helps catch some potential errors.
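To see what an enumerated declaration buys you, here is a small sketch of a document that uses one. The DOCUMENT root element is a hypothetical wrapper added only so that the example is a complete document.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!-- DOCUMENT is a hypothetical wrapper element for this sketch -->
  <!ELEMENT DOCUMENT (P+)>
  <!ELEMENT P (#PCDATA)>
  <!ATTLIST P xml:lang (en | la) "en">
]>
<DOCUMENT>
  <P>No attribute here, so the default value en applies.</P>
  <P xml:lang="la">Carpe diem.</P>
  <!-- A validating parser should reject the following element if it
       were uncommented, because fr is not in the enumerated list:
  <P xml:lang="fr">Bonjour</P>
  -->
</DOCUMENT>
```

Keep in mind that a validating parser compares enumerated values literally. With this declaration, EN or La should be rejected even though language codes themselves are case-insensitive.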


A DTD for Attribute-Based Baseball Statistics

Chapter 5 developed a well-formed XML document for the 1998 Major League Season that used attributes to store the YEAR of a SEASON, the NAME of leagues, divisions, and teams, the CITY where a team plays, and the detailed statistics of individual players. Listing 10-4, below, presents a shorter version of Listing 5-1. It is a complete XML document with two leagues, six divisions, six teams, and two players. It serves to refresh your memory of which elements belong where and with which attributes.

Listing 10-4: A complete XML document

<?xml version="1.0" standalone="yes"?>
<SEASON YEAR="1998">
  <LEAGUE NAME="National League">
    <DIVISION NAME="East">
      <TEAM CITY="Atlanta" NAME="Braves">
        <PLAYER GIVEN_NAME="Marty" SURNAME="Malloy"
         POSITION="Second Base" GAMES="11" GAMES_STARTED="8"
         AT_BATS="28" RUNS="3" HITS="5" DOUBLES="1"
         TRIPLES="0" HOME_RUNS="1" RBI="1" STEALS="0"
         CAUGHT_STEALING="0" SACRIFICE_HITS="0"
         SACRIFICE_FLIES="0" ERRORS="0" WALKS="2"
         STRUCK_OUT="2" HIT_BY_PITCH="0" />
        <PLAYER GIVEN_NAME="Tom" SURNAME="Glavine"
         POSITION="Starting Pitcher" GAMES="33"
         GAMES_STARTED="33" WINS="20" LOSSES="6" SAVES="0"
         COMPLETE_GAMES="4" SHUTOUTS="3" ERA="2.47"
         INNINGS="229.1" HOME_RUNS_AGAINST="13"
         RUNS_AGAINST="67" EARNED_RUNS="63" HIT_BATTER="2"
         WILD_PITCHES="3" BALK="0" WALKED_BATTER="74"
         STRUCK_OUT_BATTER="157" />
      </TEAM>
    </DIVISION>
    <DIVISION NAME="Central">
      <TEAM CITY="Chicago" NAME="Cubs"></TEAM>
    </DIVISION>
    <DIVISION NAME="West">
      <TEAM CITY="San Francisco" NAME="Giants"></TEAM>
    </DIVISION>
  </LEAGUE>
  <LEAGUE NAME="American League">
    <DIVISION NAME="East">
      <TEAM CITY="New York" NAME="Yankees"></TEAM>


    </DIVISION>
    <DIVISION NAME="Central">
      <TEAM CITY="Minnesota" NAME="Twins"></TEAM>
    </DIVISION>
    <DIVISION NAME="West">
      <TEAM CITY="Oakland" NAME="Athletics"></TEAM>
    </DIVISION>
  </LEAGUE>
</SEASON>

To make this document valid as well as well-formed, you need to provide a DTD. This DTD must declare both the elements and the attributes used in Listing 10-4. The element declarations resemble the previous ones, except that there are fewer of them because most of the information has been moved into attributes:

<!ELEMENT SEASON (LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (DIVISION, DIVISION, DIVISION)>
<!ELEMENT DIVISION (TEAM+)>
<!ELEMENT TEAM (PLAYER*)>
<!ELEMENT PLAYER EMPTY>

Declaring SEASON Attributes in the DTD

The SEASON element has a single attribute, YEAR. Although some semantic constraints determine what is and is not a year (1998 is a year; March 31 is not), the DTD doesn’t enforce these. Thus, the best approach declares that the YEAR attribute has the most general attribute type, CDATA. Furthermore, we want all seasons to have a year, so we’ll make the YEAR attribute required.

<!ATTLIST SEASON YEAR CDATA #REQUIRED>

Although you really can’t restrict the form of the text authors enter in YEAR attributes, you can at least provide a comment that shows what’s expected. For example, it may be a good idea to specify that four-digit years are required.

<!ATTLIST SEASON YEAR CDATA #REQUIRED> <!-- e.g. 1998 -->
<!-- DO NOT USE TWO DIGIT YEARS like 98, 99, 00!! -->

Declaring LEAGUE and DIVISION Attributes in the DTD

Next, consider LEAGUE and DIVISION. Each of these has a single NAME attribute. Again, the natural type is CDATA and the attribute will be required. Since these are two separate NAME attributes for two different elements, two separate <!ATTLIST> declarations are required.

<!ATTLIST LEAGUE NAME CDATA #REQUIRED>
<!ATTLIST DIVISION NAME CDATA #REQUIRED>

A comment may help here to show document authors the expected form; for instance, whether or not to include the words League and Division as part of the name.

<!ATTLIST LEAGUE NAME CDATA #REQUIRED>
<!-- e.g. "National League" -->
<!ATTLIST DIVISION NAME CDATA #REQUIRED>
<!-- e.g. "East" -->

Declaring TEAM Attributes in the DTD

A TEAM has both a NAME and a CITY. Each of these is CDATA and each is required:

<!ATTLIST TEAM NAME CDATA #REQUIRED>
<!ATTLIST TEAM CITY CDATA #REQUIRED>

A comment may help to establish what isn’t obvious to all; for instance, that the CITY attribute may actually be the name of a state in a few cases.

<!ATTLIST TEAM NAME CDATA #REQUIRED>
<!ATTLIST TEAM CITY CDATA #REQUIRED>
<!-- e.g. "San Diego" as in "San Diego Padres"
     or "Texas" as in "Texas Rangers" -->

Alternately, you can declare both attributes in a single <!ATTLIST> declaration:

<!ATTLIST TEAM NAME CDATA #REQUIRED
               CITY CDATA #REQUIRED>

Declaring PLAYER Attributes in the DTD

The PLAYER element boasts the most attributes. GIVEN_NAME and SURNAME, the first two, are simply CDATA and required:

<!ATTLIST PLAYER GIVEN_NAME CDATA #REQUIRED>
<!ATTLIST PLAYER SURNAME CDATA #REQUIRED>

The next PLAYER attribute is POSITION. Since baseball positions are fairly standard, you might use the enumerated attribute type here. However, “First Base,” “Second Base,” “Third Base,” “Starting Pitcher,” and “Relief Pitcher” all contain whitespace and are therefore not valid XML name tokens, which is what an enumerated type requires. Consequently, the only attribute type that works is CDATA. There is no reasonable default value for the position, so we make this attribute required as well.

<!ATTLIST PLAYER POSITION CDATA #REQUIRED>

Next come the various statistics: GAMES, GAMES_STARTED, AT_BATS, RUNS, HITS, WINS, LOSSES, SAVES, SHUTOUTS, and so forth. Each should be a number; but as XML has no data typing mechanism, we simply declare them as CDATA. Since not all players have valid values for each of these, let’s declare each one implied rather than required.

<!ATTLIST PLAYER GAMES CDATA #IMPLIED>
<!ATTLIST PLAYER GAMES_STARTED CDATA #IMPLIED>

<!-- Batting Statistics -->
<!ATTLIST PLAYER AT_BATS CDATA #IMPLIED>
<!ATTLIST PLAYER RUNS CDATA #IMPLIED>
<!ATTLIST PLAYER HITS CDATA #IMPLIED>
<!ATTLIST PLAYER DOUBLES CDATA #IMPLIED>
<!ATTLIST PLAYER TRIPLES CDATA #IMPLIED>
<!ATTLIST PLAYER HOME_RUNS CDATA #IMPLIED>
<!ATTLIST PLAYER RBI CDATA #IMPLIED>
<!ATTLIST PLAYER STEALS CDATA #IMPLIED>
<!ATTLIST PLAYER CAUGHT_STEALING CDATA #IMPLIED>
<!ATTLIST PLAYER SACRIFICE_HITS CDATA #IMPLIED>
<!ATTLIST PLAYER SACRIFICE_FLIES CDATA #IMPLIED>
<!ATTLIST PLAYER ERRORS CDATA #IMPLIED>
<!ATTLIST PLAYER WALKS CDATA #IMPLIED>
<!ATTLIST PLAYER STRUCK_OUT CDATA #IMPLIED>
<!ATTLIST PLAYER HIT_BY_PITCH CDATA #IMPLIED>

<!-- Pitching Statistics -->
<!ATTLIST PLAYER WINS CDATA #IMPLIED>
<!ATTLIST PLAYER LOSSES CDATA #IMPLIED>
<!ATTLIST PLAYER SAVES CDATA #IMPLIED>
<!ATTLIST PLAYER COMPLETE_GAMES CDATA #IMPLIED>
<!ATTLIST PLAYER SHUTOUTS CDATA #IMPLIED>
<!ATTLIST PLAYER ERA CDATA #IMPLIED>
<!ATTLIST PLAYER INNINGS CDATA #IMPLIED>
<!ATTLIST PLAYER HOME_RUNS_AGAINST CDATA #IMPLIED>
<!ATTLIST PLAYER RUNS_AGAINST CDATA #IMPLIED>
<!ATTLIST PLAYER EARNED_RUNS CDATA #IMPLIED>
<!ATTLIST PLAYER HIT_BATTER CDATA #IMPLIED>
<!ATTLIST PLAYER WILD_PITCHES CDATA #IMPLIED>
<!ATTLIST PLAYER BALK CDATA #IMPLIED>
<!ATTLIST PLAYER WALKED_BATTER CDATA #IMPLIED>
<!ATTLIST PLAYER STRUCK_OUT_BATTER CDATA #IMPLIED>
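The following fragment is a sketch of what these declarations do and do not check, assuming the GIVEN_NAME, SURNAME, POSITION, and statistics declarations given so far. The first element uses a subset of the values from Listing 10-4; the other two are invented purely for illustration. This is a fragment rather than a complete document.

```xml
<!-- Valid: the pitching statistics are simply omitted, which
     #IMPLIED permits -->
<PLAYER GIVEN_NAME="Marty" SURNAME="Malloy" POSITION="Second Base"
        GAMES="11" GAMES_STARTED="8" AT_BATS="28" HITS="5" />

<!-- Also valid, although GAMES is clearly not a number:
     CDATA accepts any text -->
<PLAYER GIVEN_NAME="Marty" SURNAME="Malloy" POSITION="Second Base"
        GAMES="eleven" />

<!-- Invalid: the required SURNAME and POSITION attributes
     are missing -->
<PLAYER GIVEN_NAME="Marty" GAMES="11" />
```

In other words, the DTD can guarantee that a player has a name and a position, but any check that a statistic is actually numeric has to happen in the application that reads the document.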


If you prefer, you can combine all the possible attributes of PLAYER into one monstrous <!ATTLIST> declaration:

<!ATTLIST PLAYER
  GIVEN_NAME        CDATA #REQUIRED
  SURNAME           CDATA #REQUIRED
  POSITION          CDATA #REQUIRED
  GAMES             CDATA #IMPLIED
  GAMES_STARTED     CDATA #IMPLIED
  AT_BATS           CDATA #IMPLIED
  RUNS              CDATA #IMPLIED
  HITS              CDATA #IMPLIED
  DOUBLES           CDATA #IMPLIED
  TRIPLES           CDATA #IMPLIED
  HOME_RUNS         CDATA #IMPLIED
  RBI               CDATA #IMPLIED
  STEALS            CDATA #IMPLIED
  CAUGHT_STEALING   CDATA #IMPLIED
  SACRIFICE_HITS    CDATA #IMPLIED
  SACRIFICE_FLIES   CDATA #IMPLIED
  ERRORS            CDATA #IMPLIED
  WALKS             CDATA #IMPLIED
  STRUCK_OUT        CDATA #IMPLIED
  HIT_BY_PITCH      CDATA #IMPLIED

  WINS              CDATA #IMPLIED
  LOSSES            CDATA #IMPLIED
  SAVES             CDATA #IMPLIED
  COMPLETE_GAMES    CDATA #IMPLIED
  SHUTOUTS          CDATA #IMPLIED
  ERA               CDATA #IMPLIED
  INNINGS           CDATA #IMPLIED
  HOME_RUNS_AGAINST CDATA #IMPLIED
  RUNS_AGAINST      CDATA #IMPLIED
  EARNED_RUNS       CDATA #IMPLIED
  HIT_BATTER        CDATA #IMPLIED
  WILD_PITCHES      CDATA #IMPLIED
  BALK              CDATA #IMPLIED
  WALKED_BATTER     CDATA #IMPLIED
  STRUCK_OUT_BATTER CDATA #IMPLIED>

One disadvantage of this approach is that it makes it impossible to include even simple comments next to the individual attributes.
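One possible middle ground, sketched here, is to split the attributes into a few smaller <!ATTLIST> declarations, one per group, with a comment before each. XML allows more than one <!ATTLIST> declaration for the same element. The grouping and the comment text are only suggestions, and the statistics are abbreviated to three per group.

```xml
<!-- Identification: always present -->
<!ATTLIST PLAYER
  GIVEN_NAME CDATA #REQUIRED
  SURNAME    CDATA #REQUIRED
  POSITION   CDATA #REQUIRED>

<!-- Batting statistics (abbreviated for this sketch) -->
<!ATTLIST PLAYER
  AT_BATS CDATA #IMPLIED
  RUNS    CDATA #IMPLIED
  HITS    CDATA #IMPLIED>

<!-- Pitching statistics (abbreviated for this sketch) -->
<!ATTLIST PLAYER
  WINS   CDATA #IMPLIED
  LOSSES CDATA #IMPLIED
  SAVES  CDATA #IMPLIED>
```

This keeps related attributes together without giving up comments entirely, at the cost of a few extra declarations.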

The Complete DTD for the Baseball Statistics Example

Listing 10-5 shows the complete attribute-based baseball DTD.


Listing 10-5: The complete DTD for baseball statistics that uses attributes for most of the information

<!ELEMENT SEASON (LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (DIVISION, DIVISION, DIVISION)>
<!ELEMENT DIVISION (TEAM+)>
<!ELEMENT TEAM (PLAYER*)>
<!ELEMENT PLAYER EMPTY>

<!ATTLIST SEASON YEAR CDATA #REQUIRED>
<!ATTLIST LEAGUE NAME CDATA #REQUIRED>
<!ATTLIST DIVISION NAME CDATA #REQUIRED>
<!ATTLIST TEAM NAME CDATA #REQUIRED
               CITY CDATA #REQUIRED>

<!ATTLIST PLAYER GIVEN_NAME CDATA #REQUIRED>
<!ATTLIST PLAYER SURNAME CDATA #REQUIRED>
<!ATTLIST PLAYER POSITION CDATA #REQUIRED>
<!ATTLIST PLAYER GAMES CDATA #IMPLIED>
<!ATTLIST PLAYER GAMES_STARTED CDATA #IMPLIED>

<!-- Batting Statistics -->
<!ATTLIST PLAYER AT_BATS CDATA #IMPLIED>
<!ATTLIST PLAYER RUNS CDATA #IMPLIED>
<!ATTLIST PLAYER HITS CDATA #IMPLIED>
<!ATTLIST PLAYER DOUBLES CDATA #IMPLIED>
<!ATTLIST PLAYER TRIPLES CDATA #IMPLIED>
<!ATTLIST PLAYER HOME_RUNS CDATA #IMPLIED>
<!ATTLIST PLAYER RBI CDATA #IMPLIED>
<!ATTLIST PLAYER STEALS CDATA #IMPLIED>
<!ATTLIST PLAYER CAUGHT_STEALING CDATA #IMPLIED>
<!ATTLIST PLAYER SACRIFICE_HITS CDATA #IMPLIED>
<!ATTLIST PLAYER SACRIFICE_FLIES CDATA #IMPLIED>
<!ATTLIST PLAYER ERRORS CDATA #IMPLIED>
<!ATTLIST PLAYER WALKS CDATA #IMPLIED>
<!ATTLIST PLAYER STRUCK_OUT CDATA #IMPLIED>
<!ATTLIST PLAYER HIT_BY_PITCH CDATA #IMPLIED>

<!-- Pitching Statistics -->
<!ATTLIST PLAYER WINS CDATA #IMPLIED>
<!ATTLIST PLAYER LOSSES CDATA #IMPLIED>
<!ATTLIST PLAYER SAVES CDATA #IMPLIED>
<!ATTLIST PLAYER COMPLETE_GAMES CDATA #IMPLIED>
<!ATTLIST PLAYER SHUTOUTS CDATA #IMPLIED>
<!ATTLIST PLAYER ERA CDATA #IMPLIED>
<!ATTLIST PLAYER INNINGS CDATA #IMPLIED>
<!ATTLIST PLAYER HOME_RUNS_AGAINST CDATA #IMPLIED>


<!ATTLIST PLAYER RUNS_AGAINST CDATA #IMPLIED>
<!ATTLIST PLAYER EARNED_RUNS CDATA #IMPLIED>
<!ATTLIST PLAYER HIT_BATTER CDATA #IMPLIED>
<!ATTLIST PLAYER WILD_PITCHES CDATA #IMPLIED>
<!ATTLIST PLAYER BALK CDATA #IMPLIED>
<!ATTLIST PLAYER WALKED_BATTER CDATA #IMPLIED>
<!ATTLIST PLAYER STRUCK_OUT_BATTER CDATA #IMPLIED>

To attach the above to Listing 10-4, use the following prolog, assuming of course that Listing 10-5 is stored in a file called baseballattributes.dtd:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE SEASON SYSTEM "baseballattributes.dtd">

Summary

In this chapter, you learned how to declare attributes for elements in DTDs. In particular, you learned the following concepts:

✦ Attributes are declared in an <!ATTLIST> declaration in the DTD.

✦ One <!ATTLIST> declaration can declare an indefinite number of attributes for a single element.

✦ Attribute declarations normally provide a default value, but you can change this behavior with the keywords #REQUIRED, #IMPLIED, or #FIXED.

✦ Ten attribute types can be declared in DTDs: CDATA, Enumerated, NMTOKEN,NMTOKENS, ID, IDREF, IDREFS, ENTITY, ENTITIES, and NOTATION.

✦ The predefined xml:space attribute determines whether whitespace in an element is significant.

✦ The predefined xml:lang attribute specifies the language in which an element’s content appears.

In the next chapter, you learn how notations, processing instructions, and unparsed external entities can be used to embed non-XML data in XML documents.

✦ ✦ ✦
