Copyright © William G. Cafiero, 2000
GE Global eXchange Services
Page 1
Bill Cafiero, 972-231-2180, [email protected]
A short on-line XML Tutorial may be found at www.gegxs.com
Where Does XML Fit?
The Internet creates a need for platform-independent technology.
[Diagram: XML, HTML, Java and the Internet, labeled with the roles data, presentation, processing and platform]
eXtensible Markup Language
• XML is designed to transfer structured text and data among systems in multiple organizations
• XML and HTML both evolved from SGML
  – XML focuses on document data content
  – HTML focuses on document display
• All markup languages use tags “< >” and “</>” to mark up the text and data, providing information about the information
HTML
HTML - HyperText Markup Language
• Non-proprietary document formatting standard
• Displayable on any web browser
HTML - When my <B>dog</B> jumped over the <U>lazy fox</U>, I didn't know <I>what</I> to do!
Result - When my dog jumped over the lazy fox, I didn't know what to do!
HTML - <FONT SIZE=20 COLOR=Red>My Red Text!</FONT>
Result - My Red Text!
History of XML
XML - eXtensible Markup Language
• Conceived in 1996 by a team chaired by Jon Bosak.
• Published as a W3C (World Wide Web Consortium) Recommendation in February 1998.
• Derived from its parent, SGML (Standard Generalized Markup Language).
• HTML (HyperText Markup Language) is an earlier cousin.
• This is what XML looks like:
<garage_sale>
<date>February 29, 1999</date>
<time>7:30 am</time>
<place>249 Cedar Elm Road</place>
<notes>Lots of high quality junk for sale</notes>
</garage_sale>
This is an XML “Instance” document
<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">
<Book isbn="1111">
  <title>The Catcher in the Rye</title>
  <author>J.D. Salinger</author>
  <year>1948</year>
  <price>11.95</price>
</Book>
This sample contains a “prolog” (the XML declaration and a reference to the accompanying “rules” for the document), element and attribute names, and “content”. These are all identified by “markup syntax”.
Tags, Elements and Attributes
• Tags are labels that tell an application or other agent to do something with whatever is enclosed in the tags
<title> is a “start” tag. </title> is the closing or “end” tag
• An Element refers to both the tags plus the content (the stuff between the tags), e.g.
<title>The Catcher in the Rye</title>
• The outermost element in the hierarchy is called the “Root Element” (Book in our example)
• Any start tag can carry Attributes, which take the form of name/value pairs: <tag attribute="value">
E.g. <Book isbn="1111">
Note: XML is case sensitive, so tags like <Book>, <book>, and <BOOK> are all different
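The terms on this slide can be seen in action by loading the Book sample with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are chosen for illustration, the XML declaration is written in lowercase, and the DOCTYPE line is left out because this parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# The Book instance from the slides (DOCTYPE omitted: ElementTree does not validate).
BOOK_XML = """<?xml version="1.0"?>
<Book isbn="1111">
  <title>The Catcher in the Rye</title>
  <author>J.D. Salinger</author>
  <year>1948</year>
  <price>11.95</price>
</Book>"""

root = ET.fromstring(BOOK_XML)

root_name = root.tag                          # name of the root element
isbn = root.attrib["isbn"]                    # attribute on the root's start tag
title_text = root.find("title").text          # content of the <title> element
child_names = [child.tag for child in root]   # child elements in document order

# Names are case sensitive: there is a <title> child but no <Title> child.
has_lowercase_title = root.find("title") is not None
has_capitalized_title = root.find("Title") is not None

print(root_name, isbn, title_text, child_names)
print(has_lowercase_title, has_capitalized_title)
```

Looking up "Title" finds nothing because the parser treats it as a different name from "title".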
The XML Character Set
• Unicode is the character set for XML
• Universal Multiple-Octet Coded Character Set (UCS)
• 16-bit encoding for the world's principal languages, including ancient languages
• ISO/IEC 10646-1:1993
• Description of the whole set is available from http://www.unicode.org
XML Characters
Defined by UNICODE (ISO 10646), supported by NT, Win95/98 and Java platforms.
[Chart: Unicode code space from 0000 to F000, divided into General scripts, Symbols, CJK Misc, CJK Ideographs, Hangul, Surrogates, Private Use and Compatibility]
Names
Names begin with a letter, an underscore (“_”) or a colon (“:”) and continue with letters, digits, hyphens, underscores, colons or periods.
Spaces are not allowed in names!
Names beginning with the string “xml” (in any mix of upper and lower case) are reserved.
<data_element>
<Order-Date>
<Shipping_Address>
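The naming rules can be approximated with a regular expression. This is only a sketch: it covers ASCII characters, whereas the real rule also admits many non-ASCII letters, and the function names is_xml_name and is_reserved are invented for this example.

```python
import re

# ASCII-only approximation of the XML name rule (assumption: non-ASCII
# letters, which real XML names may contain, are ignored here).
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_xml_name(candidate):
    """Return True if candidate fits the simplified ASCII name rule."""
    return NAME_PATTERN.fullmatch(candidate) is not None


def is_reserved(candidate):
    """Names starting with 'xml' in any letter case are reserved."""
    return candidate.lower().startswith("xml")


for name in ["data_element", "Order-Date", "Shipping Address", "-Order", "2ndLine"]:
    print(name, is_xml_name(name))
```

The last three candidates are rejected: one contains a space, one starts with a hyphen and one starts with a digit.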
Elements
There are two kinds of elements - those that have content and those that don’t (empty elements)
<title>This is the title</title>
<empty_element attr="attribute-value"/>
Attributes
Attributes are a way of attaching characteristics or properties to elements of a document. Attributes have names and values.
<person height="165cm">Bill Smith</person>
<person height="165cm" weight="165lb">John Doe</person>
XML Prolog
• Contains the encoding declaration
• Must always precede the XML content
• The XML declaration uses processing-instruction syntax (<? ... ?>), so there is no closing tag
• The Prolog is made up of an XML declaration and a document type declaration (both optional). We will look at the DOCTYPE declaration in more detail later.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE docbook SYSTEM "http://www.davenport.org/docbook">
Comments
Adding Comments to XML
<?xml version="1.0"?>
<!-- There is no other version yet -->
<!-- Now on to the Doctype -->
<!DOCTYPE sample...
Structure
• Structure in XML documents resembles storage containers
• Each storage container fits inside a larger one which fits inside another, and so on
• The storage containers make up the physical structure and the way they fit inside one another makes up the logical structure of the document
An Order in XML
<?xml version="1.0"?>
<!DOCTYPE Order SYSTEM "Orders.dtd">
<Order>
  <Order_Number>1001</Order_Number>
  <Order_Date>04/24/00</Order_Date>
  <Customer>Bill's Supply Company</Customer>
  <Detail>
    <Line_Number>1</Line_Number>
    <Item>A-123</Item>
    <Quantity>10</Quantity>
    <Price>1.50</Price>
  </Detail>
  <Detail>
    <Line_Number>2</Line_Number>
    <Item>B-987</Item>
    <Quantity>20</Quantity>
    <Price>2.00</Price>
  </Detail>
  <Shipment>
    <Shipment_Number>1</Shipment_Number>
    <Ship_Date>3/15/00</Ship_Date>
    <Shipment_Detail>
      <Line_Number>1</Line_Number>
      <Quantity>10</Quantity>
    </Shipment_Detail>
    <Shipment_Detail>
      <Line_Number>2</Line_Number>
      <Quantity>15</Quantity>
    </Shipment_Detail>
  </Shipment>
</Order>
Order hierarchy:
Order
  Order_Number
  Order_Date
  Customer
  Detail
    Line_Number
    Item
    Quantity
    Price
  Shipment
    Shipment_Number
    Ship_Date
    Shipment_Detail
      Line_Number
      Quantity
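Because the order is nested in a predictable way, a program can walk the hierarchy to answer simple questions about it. The sketch below, assuming Python's standard xml.etree.ElementTree module and illustrative variable names, computes the order total and the quantity still to be shipped on each line. The prolog is left out to keep the sample short.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The order from the slide, without the XML declaration and DOCTYPE.
ORDER_XML = """<Order>
  <Order_Number>1001</Order_Number>
  <Order_Date>04/24/00</Order_Date>
  <Customer>Bill's Supply Company</Customer>
  <Detail>
    <Line_Number>1</Line_Number><Item>A-123</Item>
    <Quantity>10</Quantity><Price>1.50</Price>
  </Detail>
  <Detail>
    <Line_Number>2</Line_Number><Item>B-987</Item>
    <Quantity>20</Quantity><Price>2.00</Price>
  </Detail>
  <Shipment>
    <Shipment_Number>1</Shipment_Number>
    <Ship_Date>3/15/00</Ship_Date>
    <Shipment_Detail><Line_Number>1</Line_Number><Quantity>10</Quantity></Shipment_Detail>
    <Shipment_Detail><Line_Number>2</Line_Number><Quantity>15</Quantity></Shipment_Detail>
  </Shipment>
</Order>"""

order = ET.fromstring(ORDER_XML)

# Order total: quantity times price, summed over every Detail element.
order_total = sum(
    Decimal(detail.findtext("Quantity")) * Decimal(detail.findtext("Price"))
    for detail in order.findall("Detail")
)

# Quantity ordered per line, taken from the Detail elements.
ordered = {
    detail.findtext("Line_Number"): int(detail.findtext("Quantity"))
    for detail in order.findall("Detail")
}

# Quantity shipped per line, taken from Shipment_Detail inside each Shipment.
shipped = {}
for shipment_detail in order.findall("Shipment/Shipment_Detail"):
    line = shipment_detail.findtext("Line_Number")
    shipped[line] = shipped.get(line, 0) + int(shipment_detail.findtext("Quantity"))

outstanding = {line: qty - shipped.get(line, 0) for line, qty in ordered.items()}

print(order_total, outstanding)
```

Line 2 shows a remainder because 20 units were ordered and 15 were shipped.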
Structure : Well Formed
• Well-formed documents are tightly constructed -- no “loose ends”
• Well-formed documents use complete storage containers
• No missing end tags in well-formed XML documents
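A parser rejects any document that is not well formed, which gives a simple way to test for it. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


complete = "<Book><title>The Catcher in the Rye</title></Book>"
missing_end_tag = "<Book><title>The Catcher in the Rye</Book>"
wrong_case_end_tag = "<Book><title>The Catcher in the Rye</Title></Book>"

print(is_well_formed(complete))
print(is_well_formed(missing_end_tag))
print(is_well_formed(wrong_case_end_tag))
```

Only the first document passes. The second leaves the title container open, and the third closes it with a tag whose name differs in case.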
Document Type Definitions (DTDs)
Document Type Definitions
Because we can create our own tags and document structures using XML, we need a mechanism for defining the tags and what the valid structure is.
• A DTD is where we declare our specific element tags.
• The DTD is where we declare the attributes of each tag.
• The DTD specifies the “occurrence indicators” for the child elements
?        zero or one
*        zero or more
+        one or more
(none)   exactly one
<!ELEMENT Book (title, author+, year?, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST Book isbn CDATA #REQUIRED>
DTD Syntax
Declarative markup consists of:
• Markup open delimiter: <!
• A keyword
• Declaration information
• Markup close delimiter: >
<!KEYWORD declaration_information>
Document Type Declaration
• Must precede the root element (all element markup and character data)
• Links document and declarations
• Can be an external reference
• Not required for a well-formed XML document that will not be validated
• Name must match the name of the root element
<!DOCTYPE example SYSTEM "greeting.dtd" [ …… ]>
Internal or External DTDs
<?xml version="1.0"?>
<!DOCTYPE label [
  <!ELEMENT label (name, street, city, state, country, code)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>
Here the DTD is part of the same data file as the XML data
DTD in an External File
<?xml version="1.0"?>
<!DOCTYPE label SYSTEM "label.dtd">
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>
Here the DTD is stored in a local file
DTD on the Web
<?xml version="1.0"?>
<!DOCTYPE label SYSTEM "http://www.myserver.com/label.dtd">
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>
Here the DTD is on a remote web server
XML Elements
• Elements are the "building blocks" within the logical structure of a document, such as paragraph, title, section...
• Element names are unique within a DTD, and their length is not restricted
• The first character of a name must be a letter, “_” or “:” (Note: a digit is not allowed as the first character).
• Declaration keywords (ELEMENT, ATTLIST, ...) must be all uppercase; names may be mixed case and are case sensitive
Element Declarations
<!ELEMENT sender (msgType, msgDetails)>
ELEMENT is the keyword.
“sender” is the element name. Element names specify the name of the declared element; element names are sometimes called “generic identifiers” (gi).
(msgType, msgDetails) is the content model.
XML Connectors
|    OR, choose one of the alternatives
,    THEN, in sequence
( )  GROUP connector
Connectors provide the rules for the sequence or order in which the items in the content model may appear
Sequence Connector
Elements separated by the sequence connector (,) must appear in the order they are listed.
<!ELEMENT chapter (title, paragraph)>
A chapter consists of a title followed by a paragraph.
OR Connector
The OR connector means only one of these elements may appear.
<!ELEMENT Item (Product | Service)>
An Item consists of a Product
OR
An Item consists of a Service
But not both!
XML Occurrence Indicators
? ZERO or ONE (optional)
* ZERO or MORE (optional repeatable)
+ ONE or MORE (required repeatable)
(null) ONE ONLY (required)
Occurrence indicators provide the rules that show how many times items in the content model may appear
Required and Repeatable
Element appears 1 or more times
<!ELEMENT chapter (title, para+)>
A chapter could consist of one of these:
• A title followed by a paragraph
• A title followed by two paragraphs
• A title followed by three paragraphs
• A title followed by thirty-seven paragraphs
• many other options
NOTE: You would not be allowed to have a chapter with only a title.
Optional
The optional occurrence indicator means the element may appear 0 or 1 times.
<!ELEMENT chapter (title?, para)>
A chapter could consist of one of these:
• A title followed by a paragraph
• A paragraph
NOTE: You could not have a chapter with 2 titles followed by a paragraph
Nested Model Groups
Model groups can be nested inside one another.
<!ELEMENT chapter (title?, (para+ | illus)) >
A chapter could consist of one of these:
• A title followed by at least one paragraph
• A title followed by an illustration
• An illustration
• A paragraph
• Many paragraphs
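A content model behaves much like a regular expression over the names of an element's children. As a rough sketch (this is not a DTD validator), the chapter model (title?, (para+ | illus)) can be translated by hand into a Python regular expression. The function name and the trick of writing each child name followed by a space are assumptions made for this illustration.

```python
import re

# Hand translation of the content model (title?, (para+ | illus)).
# Each child element name is written followed by one space, so
# ["title", "para"] becomes the string "title para ".
CHAPTER_MODEL = re.compile(r"(title )?((para )+|illus )")


def matches_chapter_model(children):
    """Return True if the list of child names satisfies the chapter model."""
    sequence = "".join(name + " " for name in children)
    return CHAPTER_MODEL.fullmatch(sequence) is not None


print(matches_chapter_model(["title", "para", "para"]))
print(matches_chapter_model(["illus"]))
print(matches_chapter_model(["title"]))
print(matches_chapter_model(["para", "illus"]))
```

The first two sequences are accepted. A title alone is rejected because the nested group is required, and a paragraph followed by an illustration is rejected because the OR connector allows only one branch.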
Attributes with a Choice of Values
Attributes describe special conditions associated with individual elements; they are often used as the “adjectives” of XML
<!ELEMENT person (#PCDATA)>
<!ATTLIST person email CDATA #REQUIRED>
ATTLIST is the keyword
“person” is the element name
“email” is the attribute name
CDATA is the attribute type
#REQUIRED is the attribute default/requirement
Attribute Defaults
An attribute may have a default value specified in the DTD.
<!ATTLIST shirt size (small|medium|large) "medium">
<!ATTLIST shoes size CDATA "13">
An XML Document and its DTD
Book.xml
<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">
<Book isbn="1111">
  <title>The Catcher in the Rye</title>
  <author>J.D. Salinger</author>
  <year>1948</year>
  <price>11.95</price>
</Book>
Book.dtd
<!ELEMENT Book (title, author+, year?, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST Book isbn CDATA #REQUIRED>
Good Style for DTD’s
• Nesting: elements may contain more than one other element: Buyer(Company,Contact)
• An element whose content model holds a single child element should be one where that child is repeatable: Street(Line+)
• Data modelling and logical naming of elements ensure an accurate representation of the relationships between components.
• Keep role of DTD simple – don’t overload
Why Namespaces?
The appeal of XML lies in the ability to invent tags that convey meaningful information. For example, XML allows you to represent information about a book as:
<BOOK>
  <TITLE>A Suitable Boy</TITLE>
  <PRICE currency="US Dollar">22.95</PRICE>
</BOOK>
Similarly, you can represent information about an author as:
<AUTHOR>
  <TITLE>Mr</TITLE>
  <NAME>Vikram Seth</NAME>
</AUTHOR>
This example illustrates a problem. While the human reader can distinguish between the different interpretations of the "TITLE" element, a computer program does not have the context to tell them apart.
Namespaces
Namespaces solve this problem by associating a vocabulary (or namespace) with a tag name. For example, the titles can be written as:
<BookInfo:TITLE>A Suitable Boy</BookInfo:TITLE>
<AuthorInfo:TITLE>Mr.</AuthorInfo:TITLE>
The name preceding the colon, the prefix, refers to a namespace, which is identified by a Uniform Resource Identifier (URI). The URI ensures global uniqueness when merging XML sources, while the associated prefix, a short name that substitutes for the namespace, need only be unique in the tightly scoped context of the document. With this scheme, there are no conflicts in tags and attributes, and two tags can be the same only if they are from the same namespace and have the same tag name. This allows a document to contain both book and author information without confusion about whether the "TITLE" element refers to the book or the author.
Namespaces - Examples
An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. This example shows both an element (publisher) and an attribute (category) qualified by the prefix “pubspace”:
<books xmlns:pubspace="http://www.foo.com/bar">
  <book pubspace:category="research">Numerical Analysis of Partial Differential Equations</book>
  <pubspace:publisher>Addison Wesley</pubspace:publisher>
</books>
The attribute "xmlns" is an XML keyword for a namespace declaration.
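A namespace-aware parser replaces each prefix with the URI it stands for, so a program matches on the URI rather than on the prefix. A minimal sketch, assuming Python's standard xml.etree.ElementTree module, which writes qualified names in the form {URI}localname; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<books xmlns:pubspace="http://www.foo.com/bar">
  <book pubspace:category="research">Numerical Analysis of Partial Differential Equations</book>
  <pubspace:publisher>Addison Wesley</pubspace:publisher>
</books>"""

NS = "http://www.foo.com/bar"
root = ET.fromstring(BOOKS_XML)

# The qualified element is found by URI plus local name, not by prefix.
publisher = root.find("{%s}publisher" % NS)

# The qualified attribute is stored under URI plus local name as well.
book = root.find("book")
category = book.attrib["{%s}category" % NS]

# An unqualified lookup does not match the qualified element.
unqualified_publisher = root.find("publisher")

print(publisher.tag, publisher.text, category, unqualified_publisher)
```

The books and book elements carry no prefix and the document declares no default namespace, so their names remain unqualified.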
What XML namespaces are Not
Two things that XML namespaces are not have caused a lot of confusion, so we'll mention them here:
• XML namespaces are not a technology for joining XML documents that use different DTDs. Although they might be used in such a technology, they don't provide it themselves.
• The URIs used as XML namespace names do not point to schemas, information about the namespace, or anything else -- they're just identifiers. URIs were used simply because they're a well-known system for creating unique identifiers. Don't even think about trying to resolve these URIs.
Why Schemas?
Although the DTD may have been powerful enough in many instances, it is inadequate to meet the needs of many applications that have been envisaged to use XML.
• The DTD does not support data types beyond character data, which is a severe limitation for describing standards and exposing database schemas.
• The DTD is not integrated with new XML technologies like Namespaces, so it is not possible to import constructs from external schemas to enable code reuse.
• Applications simply need a more flexible mechanism to specify constraints on document structure than a context-free grammar.
Additional Features of XML Schema
One of the main weaknesses of DTD was its lack of support for data types beyond character strings. For example:
<year>A few years ago</year>
is valid under the previous DTD, because year is declared only as character data (#PCDATA).
XML Schema supports the following additional data types:
string, boolean, real, decimal, integer, non-negative integer, positive integer, non-positive integer, negative integer, dateTime, date, time, timePeriod, binary, uri, language
User Defined Data Types
Further constraints can be placed on the range of possible data values by creating new data types that extend built-in data types.
For example, if our book list covered Twentieth Century literature, in XML Schema we can limit the values of the year element to be between 1900 and 1999:
<datatype name="YearType">
  <basetype name="positive-integer"/>
  <minInclusive>1900</minInclusive>
  <maxInclusive>1999</maxInclusive>
</datatype>
<element name="year" type="YearType"></element>
Note the Schema itself is written in XML!
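To make the constraint concrete, the check that a schema validator would apply to the year element can be mimicked in ordinary code. This is only a sketch of the rule expressed by YearType (a positive integer from 1900 to 1999 inclusive), not a schema processor, and the function name is hypothetical.

```python
import re


def is_valid_year_type(text):
    """Mimic YearType: a positive integer between 1900 and 1999 inclusive."""
    candidate = text.strip()
    # Accept only plain decimal digits, optionally preceded by a plus sign.
    if re.fullmatch(r"\+?[0-9]+", candidate) is None:
        return False
    return 1900 <= int(candidate) <= 1999


print(is_valid_year_type("1948"))
print(is_valid_year_type("2000"))
print(is_valid_year_type("A few years ago"))
```

Only the first value is accepted. The text "A few years ago", which the DTD allowed, is rejected because it is not an integer at all.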
Save effort by using XML Schemas
In a typical program, up to 60% of the code is spent checking the data!
[Diagram: a program divided into “Code to check the structure and content of the data” and “Code to actually do the work”]
Save effort using XML Schemas (cont.)
If your data is structured as XML, and there is a schema, then you can hand the data-checking task off to a schema validator.
Thus, your code is reduced by up to 60%!!!
Big $$ savings!
[Diagram: a program divided into “Code to check the structure and content of the data” and “Code to actually do the work”]
A Payment Transaction
Payor Data
Q-Air Airlines
15 Blue Street
North Sydney, NSW 2060
Australia
Banks with First Electronic Bank of Australia, Sydney, Australia
SWIFT ID is FEBAAU01IMT
Account number is 9-8-7

Payee Data
Bulbco Aircraft Engines Co.
One Neumann Way
Cincinnati, OH 45215
U.S.A.
Banks with Last National Bank, Fairfield, CT
Federal Reserve Routing Code is 554433221
Account number is 111-222-333

The Payment
Q-Air Airlines wishes to make a $30,473,600.00 USD payment to Bulbco Aircraft Engines Company, to be paid on 02/13/95. Q-Air will issue the payment transaction on 12/31/94.
This payment covers two invoices that Bulbco Aircraft Engines Company sent Q-Air Airlines. The first was invoice 13479, dated 12/09/94, for $30,200,000.00. The second was invoice 13521, dated 12/13/94, for $375,000.00. The first invoice is paid in full. On the second invoice, Q-Air Airlines took two deductions. The first, for $1,400.00, was for an incorrect discount calculation. The second, for $100,000.00, was a credit for returning used parts for rebuilding. Thus Q-Air paid $273,600.00 of the original $375,000.00 invoiced.
Payment Transaction Hierarchy
Payment
  Funds_Transfer
    Total_Payment_Amount
    Payment_Date
    Payor
      Name
      Address
      Bank
      Bank_Code
      Bank_Account
    Payee
      Name
      Address
      Bank
      Bank_Code
      Bank_Account
  Remittance_Data
    Adjustment
      Amount
      Reason
    Invoice
      Invoice_Number
      Date
      Amount_Invoiced
      Amount_Paid
      Adjustment
        Amount
        Reason
The Payment in XML
<?xml version="1.0"?>
<!DOCTYPE Payment SYSTEM "Q-Air to Bulbco Payment.dtd">
<Payment>
  <!-- The Payment contains two parts - Funds transfer info and remittance data -->
  <Funds_Transfer>
    <Total_Payment_Amount>30,473,600.00</Total_Payment_Amount>
    <Payment_Date>02/13/95</Payment_Date>
    <Payor>
      <Name>Q-Air Airlines</Name>
      <Address>15 Blue Street, North Sydney, NSW 2060, Australia</Address>
      <Bank>First Electronic Bank of Australia</Bank>
      <Bank_Code>FEBAAU01IMT</Bank_Code>
      <Bank_Account>9-8-7</Bank_Account>
    </Payor>
    <Payee>
      <Name>Bulbco Aircraft Engines Co.</Name>
      <Address>One Neumann Way, Cincinnati, OH 45215, USA</Address>
      <Bank>Last National Bank</Bank>
      <Bank_Code>554433221</Bank_Code>
      <Bank_Account>111-222-333</Bank_Account>
    </Payee>
  </Funds_Transfer>
  <Remittance_Data>
    <Invoice>
      <Invoice_Number>13479</Invoice_Number>
      <Date>12/09/94</Date>
      <Amount_Invoiced>30,200,000.00</Amount_Invoiced>
      <Amount_Paid>30,200,000.00</Amount_Paid>
    </Invoice>
    <Invoice>
      <Invoice_Number>13521</Invoice_Number>
      <Date>12/13/94</Date>
      <Amount_Invoiced>375,000.00</Amount_Invoiced>
      <Amount_Paid>273,600.00</Amount_Paid>
      <Adjustment>
        <Amount>1,400.00</Amount>
        <Reason>Incorrect discount</Reason>
      </Adjustment>
      <Adjustment>
        <Amount>100,000.00</Amount>
        <Reason>Returned part credit</Reason>
      </Adjustment>
    </Invoice>
  </Remittance_Data>
</Payment>
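The figures in the payment can be cross-checked by a program once they are in XML. The sketch below uses a trimmed copy of the document that keeps only the amounts, together with Python's standard xml.etree.ElementTree and decimal modules. The money helper and the variable names are assumptions made for this illustration. Note that the DTD treats the amounts as plain character data, so the commas have to be removed by the application.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the payment: only the elements needed for the arithmetic.
PAYMENT_XML = """<Payment>
  <Funds_Transfer>
    <Total_Payment_Amount>30,473,600.00</Total_Payment_Amount>
  </Funds_Transfer>
  <Remittance_Data>
    <Invoice>
      <Invoice_Number>13479</Invoice_Number>
      <Amount_Invoiced>30,200,000.00</Amount_Invoiced>
      <Amount_Paid>30,200,000.00</Amount_Paid>
    </Invoice>
    <Invoice>
      <Invoice_Number>13521</Invoice_Number>
      <Amount_Invoiced>375,000.00</Amount_Invoiced>
      <Amount_Paid>273,600.00</Amount_Paid>
      <Adjustment><Amount>1,400.00</Amount></Adjustment>
      <Adjustment><Amount>100,000.00</Amount></Adjustment>
    </Invoice>
  </Remittance_Data>
</Payment>"""


def money(text):
    """Convert an amount such as '1,400.00' to a Decimal."""
    return Decimal(text.strip().replace(",", ""))


payment = ET.fromstring(PAYMENT_XML)
total = money(payment.findtext("Funds_Transfer/Total_Payment_Amount"))
invoices = payment.findall("Remittance_Data/Invoice")

# The amounts paid on the invoices should add up to the total payment.
paid_sum = sum(money(inv.findtext("Amount_Paid")) for inv in invoices)

# For each invoice: amount invoiced minus adjustments should equal amount paid.
invoice_checks = {}
for inv in invoices:
    invoiced = money(inv.findtext("Amount_Invoiced"))
    adjustments = sum(
        (money(adj.findtext("Amount")) for adj in inv.findall("Adjustment")),
        Decimal("0"),
    )
    paid = money(inv.findtext("Amount_Paid"))
    invoice_checks[inv.findtext("Invoice_Number")] = (invoiced - adjustments == paid)

print(total, paid_sum, invoice_checks)
```

Both invoices reconcile: 375,000.00 less the two adjustments of 1,400.00 and 100,000.00 gives the 273,600.00 paid, and the two amounts paid add up to the 30,473,600.00 total.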
The DTD for the Payment
<!ELEMENT Payment (Funds_Transfer, Remittance_Data)>
<!ELEMENT Funds_Transfer (Total_Payment_Amount, Payment_Date, Payor, Payee)>
<!ELEMENT Total_Payment_Amount (#PCDATA)>
<!ELEMENT Payment_Date (#PCDATA)>
<!ELEMENT Payor (Name, Address, Bank, Bank_Code, Bank_Account)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Address (#PCDATA)>
<!ELEMENT Bank (#PCDATA)>
<!ELEMENT Bank_Code (#PCDATA)>
<!ELEMENT Bank_Account (#PCDATA)>
<!ELEMENT Payee (Name, Address, Bank, Bank_Code, Bank_Account)>
<!ELEMENT Remittance_Data (Adjustment*, Invoice*)>
<!-- Remittance data may be only adjustments, only invoices, or adjustments followed by invoices -->
<!ELEMENT Adjustment (Amount, Reason)>
<!ELEMENT Amount (#PCDATA)>
<!ELEMENT Reason (#PCDATA)>
<!ELEMENT Invoice (Invoice_Number, Date, Amount_Invoiced, Amount_Paid, Adjustment*)>
<!ELEMENT Invoice_Number (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Amount_Invoiced (#PCDATA)>
<!ELEMENT Amount_Paid (#PCDATA)>