
Upload: michael-cummings

Post on 24-Dec-2015


Topics

• The "bigger picture"
  – The "XML sales pitch"
  – XML/XHTML vs. SGML/HTML
  – XML in electronic publishing
  – XML and the future, web 2.0

• XML basics:
  – Building blocks: elements, attributes, …
  – Structural constraints: Well-formed XML
  – Character sets
  – Namespaces
  – Validity: DTDs and XML schemas

Week 0534 Introduction to XML 1

Why Use XML (1)

• Consider a line from a .dat file:

    2394287410|Verbatim|DataLife MF 2HD|10|3.5"|black

• Or the XML fragment:

    <product barcode="2394287410">
      <manufacturer>Verbatim</manufacturer>
      <name>DataLife MF 2HD</name>
      <quantity>10</quantity>
      <size>3.5"</size>
      <color>black</color>
    </product>

• Which one is easier to interpret, more robust, and easier to use for complex structures?
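
A minimal Python sketch of the comparison, using the standard-library ElementTree parser. The column order in DAT_FIELDS is an assumption: nothing in the .dat line itself says what each field means, which is exactly the point.

```python
import xml.etree.ElementTree as ET

DAT_LINE = '2394287410|Verbatim|DataLife MF 2HD|10|3.5"|black'
XML_FRAGMENT = """<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
  <size>3.5"</size>
  <color>black</color>
</product>"""

# Flat file: field meaning exists only in this assumed, external column order.
DAT_FIELDS = ["barcode", "manufacturer", "name", "quantity", "size", "color"]
from_dat = dict(zip(DAT_FIELDS, DAT_LINE.split("|")))

# XML: the field names travel with the data.
product = ET.fromstring(XML_FRAGMENT)
from_xml = {"barcode": product.get("barcode")}
from_xml.update({child.tag: child.text for child in product})

print(from_dat == from_xml)  # True
```

If a column is added or reordered in the .dat file, DAT_FIELDS silently goes wrong; the XML reader keeps working as long as the element names stay the same.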

Why Use XML (2)

• Simple syntax
• Self-documenting format
• Support for hierarchical structures
• Simple debugging (for both users and machines)
• Language and platform independent
• Many different tools
• Growing library of "standard" formats

Main Types of XML Documents

• Narrative-Centric Documents:
  – Largely irregular structure, for instance a novel

• Data-Centric Documents:
  – Regular structure, for instance a telephone directory

• Hybrid Documents:
  – Typically contain highly regular parts mixed with irregular content, e.g., a product catalog

XML/XHTML vs. SGML/HTML

• Problems with SGML/HTML:
  – SGML is a complex markup language
  – HTML is only suitable for narrative documents
  – HTML became a bad mix of structure and layout
  – HTML browsers are too tolerant of language errors

• The XML/XHTML promise:
  – XML has a simple and extensible structure
  – Suitable for both data and narrative documents
  – XHTML is for structure only; CSS is for layout
  – Enforces strict rules

XML in Electronic Publishing

• Some important XML applications:
  – Text transformation/printing: XSLT, XSL-FO, SVG, …
  – Content: GML, MathML, NewsML, DocBook, …
  – Data exchange: SOAP, AJAX, xCAL, …
  – Semantics: RDF, Dublin Core, …

Web 2.0

• The next generation of the web is about data, not documents!
  – Read the O'Reilly Web 2.0 article

XML Basics

• Core literature:
  – An Introduction to XML and Web Technologies:
    • Chapters 1-2
  – XML in a Nutshell:
    • Chapters 1-2, 4-7, 26

What XML consists of

• Elements
• Attributes
• Entities and entity references
• Text
• CDATA sections
• Processing instructions
• Comments

XML declaration

• XML documents should begin with an XML declaration that gives information about:
  – XML version
  – Encoding
  – Whether the document is standalone or relies on external DTD declarations

• Example:

    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
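
As a small illustration, Python's standard-library minidom exposes the three declaration fields on the parsed document. The `<navn>` element and its content are made up for this sketch; the input is given as bytes because the declaration tells the parser how to decode them.

```python
from xml.dom import minidom

# 0xE5 is the letter "å" in ISO-8859-1; <navn> is a hypothetical element.
raw = b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?><navn>\xe5</navn>'
doc = minidom.parseString(raw)

print(doc.version, doc.encoding, doc.standalone)
decoded_text = doc.documentElement.firstChild.data
print(decoded_text)
```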

XML elements

• The basic building block of XML
• Consists of a start-tag, content and an end-tag
• Simple content:

    <title>Web page for IMT4501</title>

• Mixed content:

    <p><strong>No</strong>, you can't do <em>that</em>!</p>

• Empty element:

    <br></br>
    <br />    <!-- Short version -->
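
A short Python sketch of how one parser, the standard-library ElementTree, represents these three cases. The `.text`/`.tail` split is specific to ElementTree and is not part of XML itself.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and child elements interleaved.
p = ET.fromstring("<p><strong>No</strong>, you can't do <em>that</em>!</p>")
strong, em = list(p)
# ElementTree keeps the text that follows a child in that child's .tail
print(repr(strong.text), repr(strong.tail), repr(em.text), repr(em.tail))
full_text = "".join(p.itertext())
print(full_text)

# Both spellings of an empty element parse to the same thing.
long_form = ET.fromstring("<br></br>")
short_form = ET.fromstring("<br />")
same = ET.tostring(long_form) == ET.tostring(short_form)
print(same)
```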

Attributes

• Extra information about an element
• Example:

    <img height="240" width="320" src="logo.gif" />

• Values are enclosed in matching pairs of double or single quotation marks:

    <a href="http://www.hig.no/">HiG</a>
    <a href='http://www.oa.no/'>Oppland Arbeiderblad</a>

• But not mismatched pairs:

    <a href="http://www.vg.no/'>VG</a>
    <a href='http://www.cnn.no/">CNN</a>

Well-formed XML

• One root element
• Correct nesting of elements
• Every start-tag has a matching end-tag
• Case-sensitive names
• Attribute values in quotes
• An attribute cannot appear more than once inside an element
• No comments inside tags
• No unescaped < or & inside text content
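
The rules above can be tried out with any conforming parser. A minimal sketch in Python, assuming that "parses without a ParseError" is a good enough stand-in for "well-formed"; the test fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

CASES = {
    "two root elements": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "case mismatch": "<Title>x</title>",
    "unquoted attribute": "<a id=1/>",
    "mismatched quotes": "<a id=\"1'/>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "comment inside tag": "<a <!-- note -->></a>",
    "unescaped ampersand": "<a>R&D</a>",
    "well-formed": '<a id="1"><b>R&amp;D</b></a>',
}

results = {label: is_well_formed(text) for label, text in CASES.items()}
for label, ok in results.items():
    print(f"{label}: {'ok' if ok else 'rejected'}")
```

Only the last case is accepted; each of the others breaks one rule from the list.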

Sample XML structure

    <?xml version="1.0" standalone="yes"?>
    <higDB>
      <department abbr="IMT" ... >
        <subject code="IMT4131" ... />
        <subject code="REA4001" ... />
        <employee empNo="eNo287" ... >
          <subject subjRef="IMT4131" />
        </employee>
        <employee empNo="eNo293" ... >
          <subject subjRef="REA4001" />
        </employee>
        <employee empNo="eNo307" ... >
          <subject subjRef="IMT4131" />
        </employee>
      </department>
      <!-- ... -->
    </higDB>
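
A hedged sketch of how an application might walk this structure in Python. It uses only the attributes visible on the slide (the elided "..." parts are left out), and the grouping of employees by referenced subject is an assumed use case, not something the slide specifies.

```python
import xml.etree.ElementTree as ET

# Only the attributes shown on the slide; everything elided there is omitted.
HIG_DB = """<higDB>
  <department abbr="IMT">
    <subject code="IMT4131" />
    <subject code="REA4001" />
    <employee empNo="eNo287"><subject subjRef="IMT4131" /></employee>
    <employee empNo="eNo293"><subject subjRef="REA4001" /></employee>
    <employee empNo="eNo307"><subject subjRef="IMT4131" /></employee>
  </department>
</higDB>"""

root = ET.fromstring(HIG_DB)
employees_by_subject = {}
for dept in root.findall("department"):
    # Direct <subject> children of the department declare the subject codes.
    for subject in dept.findall("subject"):
        employees_by_subject[subject.get("code")] = []
    # <subject subjRef="..."> inside an employee refers back to one of them.
    for emp in dept.findall("employee"):
        for ref in emp.findall("subject"):
            employees_by_subject[ref.get("subjRef")].append(emp.get("empNo"))

print(employees_by_subject)
```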

Tree for the example

XML names

• Have to start with '_' or a letter
• Followed by digits, letters, '_', '-' or '.'
• Names beginning with "xml" (regardless of capitalization) are reserved
• Acceptable names:

    <språk>
    <fugl-eller-fisk>
    <_level1.melding>

• Unacceptable names:

    <tittel$språk>
    <fugl eller fisk>
    <2erStilling>
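
The naming rules as stated on this slide can be sketched in a few lines of Python. This is a simplification: `str.isalpha()` and `str.isalnum()` only approximate the character classes in the XML specification, and the colon (used for namespace prefixes) is deliberately left out. Both function names are hypothetical.

```python
def is_simple_xml_name(name):
    """Check a name against the simplified rules from the slide."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first == "_" or first.isalpha()):
        return False
    return all(ch.isalnum() or ch in "_-." for ch in rest)

def is_reserved(name):
    """Names starting with 'xml' in any capitalization are reserved."""
    return name.lower().startswith("xml")

for candidate in ["språk", "fugl-eller-fisk", "_level1.melding",
                  "tittel$språk", "fugl eller fisk", "2erStilling"]:
    print(candidate, is_simple_xml_name(candidate))
```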

Entities and entity references

• Five predefined entity references in XML
• Other entity references can be defined in an external DTD

XHTML Entities

    Å  &Aring;   (unicode: &#197;)
    Æ  &AElig;   (unicode: &#198;)
    Ø  &Oslash;  (unicode: &#216;)
    å  &aring;   (unicode: &#229;)
    æ  &aelig;   (unicode: &#230;)
    ø  &oslash;  (unicode: &#248;)

Predefined XML Entities

    <  &lt;    (less than)
    >  &gt;    (greater than)
    &  &amp;   (ampersand)
    "  &quot;  (quotation mark)
    '  &apos;  (apostrophe)

Unicode Entities (numeric character references)

    ©  &#169;   (xhtml: &copy;)
    α  &#945;   (xhtml: &alpha;)
    €  &#8364;  (xhtml: &euro;)
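
A quick Python check of which references a plain XML parser resolves on its own. The `<t>` element is a throwaway wrapper for this sketch. Note that α is code point 945, and that XHTML names such as `&aring;` are not known to the parser unless a DTD defines them.

```python
import xml.etree.ElementTree as ET

# Predefined entities and numeric character references need no DTD.
resolved = ET.fromstring(
    "<t>&lt; &gt; &amp; &quot; &apos; &#169; &#945; &#8364;</t>"
).text
print(resolved)

# An XHTML entity name without a DTD that declares it is an error.
try:
    ET.fromstring("<t>&aring;</t>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False
print(named_entity_ok)
```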

Text and character parsing

• Text is basically PCDATA (Parsed Character Data):
  – The parser replaces entity references with their values

• A CDATA section can be used where we do not want the parser to interpret the character data as markup:

    <logiskUttrykk>
      <![CDATA[(len > 0) && (len < 256)]]>
    </logiskUttrykk>
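
To see that a CDATA section is only a different way of writing the same text, this Python sketch parses the expression once with CDATA and once with escaped characters and compares what the application receives.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    "<logiskUttrykk><![CDATA[(len > 0) && (len < 256)]]></logiskUttrykk>"
)
with_escapes = ET.fromstring(
    "<logiskUttrykk>(len &gt; 0) &amp;&amp; (len &lt; 256)</logiskUttrykk>"
)

print(with_cdata.text)
print(with_cdata.text == with_escapes.text)  # True
```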

Comments

• Enclosed by <!-- and -->
• Should not appear inside a tag
• A double hyphen -- cannot appear anywhere inside the comment
• Are meant for human readers, not for the application
• Correct use:

    <FotoDB>
      <!-- Example of image database dump -->
      <Image_series> ...

• Wrong use:

    <FotoDB <!-- Example of image database dump -->>
    <!-- Not finished -- look at it later -->
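
Two of these points can be observed with Python's ElementTree, used here only as one example of a parser: with default settings it does not pass comments on to the application, and it rejects a double hyphen inside a comment.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<FotoDB><!-- Example of image database dump --><Image_series /></FotoDB>"
)
# With default settings the comment does not show up in the parsed tree.
child_tags = [child.tag for child in root]
print(child_tags)

try:
    ET.fromstring("<FotoDB><!-- Not finished -- look at it later --></FotoDB>")
    double_hyphen_ok = True
except ET.ParseError:
    double_hyphen_ok = False
print(double_hyphen_ok)
```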

Processing instructions

• Enclosed by <? and ?>
• The target follows right after <?
• Can be used to send information to the application
• Comments were once used for this, but XML parsers can choose not to pass comments on to the application
• Example:

    <?php
      $logged_in = $_SESSION["logged_in"];
      if (!$logged_in) {
        echo "You have to <a href='login.php'>log in</a> first";
      }
    ?>
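
A small sketch of how a processing instruction reaches an application, using Python's standard-library minidom. The `render` target and its `mode="draft"` data are invented for this example; everything after the target is handed over as one uninterpreted string.

```python
from xml.dom import minidom

# <?render ...?> is a hypothetical processing instruction.
doc = minidom.parseString('<doc><?render mode="draft"?><p>text</p></doc>')
pi = doc.documentElement.firstChild

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
print(is_pi, pi.target, pi.data)
```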

Exercise

• Complete the ZVON XML tutorial:
  http://www.zvon.org/xxl/XMLTutorial/General/contents.html

Character Sets

• Historically, character encoding has been a challenge:
  – The same code has been used for different characters on different systems

• Now, there are standards:
  – ISO-8859-1 (ISO Latin-1), "default" on the web
  – Unicode: defines a larger character set, used by XML by default:
    • UTF-8, efficient for western languages
    • UTF-16
    • UTF-32
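
The size differences between the encodings are easy to measure in Python. The three sample characters are chosen for illustration; the "-le" codec variants are used so that no byte-order mark is counted.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

# Bytes needed per character in each Unicode encoding.
byte_lengths = {
    ch: {enc: len(ch.encode(enc)) for enc in ENCODINGS}
    for ch in ("a", "å", "€")
}
for ch, sizes in byte_lengths.items():
    print(ch, sizes)

# ISO-8859-1 covers å in a single byte but has no code for the euro sign.
latin1_a_ring = "å".encode("iso-8859-1")
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
print(latin1_a_ring, euro_in_latin1)
```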

Namespaces – why?

• Distinguish between elements and attributes from different XML vocabularies

• Namespaces allow two or more XML vocabularies to be used in the same document

• Group all related elements and attributes from a single XML application, so that software can recognize them more easily

Namespaces – how?

• A prefix is bound to a vocabulary (identified by a URI) with an xmlns:prefix attribute:

    <Description xmlns:dc="http://purl.org/dc/">

• The prefix is in scope within the subtree rooted at the element that declares it

• Elements in a vocabulary are identified by the prefix:

    <Description xmlns:dc="http://purl.org/dc/">
      <dc:title>XML in a Nutshell</dc:title>
      <dc:creator>Elliotte Rusty Harold</dc:creator>
      <dc:creator>W. Scott Means</dc:creator>
      <dc:date>2002</dc:date>
    </Description>
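
A sketch of what a namespace-aware parser does with this example, using Python's ElementTree. The `{uri}localname` notation is ElementTree's own way of writing qualified names, not part of XML.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/"
desc = ET.fromstring("""<Description xmlns:dc="http://purl.org/dc/">
  <dc:title>XML in a Nutshell</dc:title>
  <dc:creator>Elliotte Rusty Harold</dc:creator>
  <dc:creator>W. Scott Means</dc:creator>
  <dc:date>2002</dc:date>
</Description>""")

# The prefix is replaced by the URI it is bound to.
title_tag = desc[0].tag
print(title_tag)

# Searching also goes through the URI; the prefix map here is local to the call.
creators = [c.text for c in desc.findall("dc:creator", {"dc": DC})]
print(creators)
```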

More about the prefix

• You choose the name of the prefix, the URI identifies the vocabulary

• The prefix has to be a legal XML name
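
That the prefix is only a local shorthand can be checked in Python: two documents that bind different prefixes to the same URI yield identical element names after parsing. The `<d>` wrapper and the `meta` prefix are made up for this sketch.

```python
import xml.etree.ElementTree as ET

with_dc = ET.fromstring(
    '<d xmlns:dc="http://purl.org/dc/"><dc:title>t</dc:title></d>'
)
with_meta = ET.fromstring(
    '<d xmlns:meta="http://purl.org/dc/"><meta:title>t</meta:title></d>'
)

same_name = with_dc[0].tag == with_meta[0].tag
print(same_name)  # True
```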

Namespaces – what is it really?

• A vocabulary identified by a fixed Uniform Resource Identifier:
  – http://...
  – ftp://...
  – …

• The URI has to be unique to make the vocabulary unique

• The URI does not need to point to an actual document

Example scope

Default namespace

• A default namespace can be used where all non-prefixed elements belong to a fixed vocabulary

• Example:

    <RDF xmlns="http://www.w3.org/TC/REC-rdf-syntax#">
      <Description xmlns:dc="http://purl.org/dc/">
        <dc:title>XML in a Nutshell</dc:title>
        <dc:creator>Elliotte Rusty Harold</dc:creator>
        <dc:creator>W. Scott Means</dc:creator>
        <dc:date>2002</dc:date>
      </Description>
    </RDF>
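
A Python sketch showing which namespace each element of this example ends up in. The URIs are copied from the slide as-is; the point is that the unprefixed `Description` inherits the default namespace from `RDF`, while the prefixed children do not.

```python
import xml.etree.ElementTree as ET

rdf = ET.fromstring("""<RDF xmlns="http://www.w3.org/TC/REC-rdf-syntax#">
  <Description xmlns:dc="http://purl.org/dc/">
    <dc:title>XML in a Nutshell</dc:title>
    <dc:creator>Elliotte Rusty Harold</dc:creator>
    <dc:creator>W. Scott Means</dc:creator>
    <dc:date>2002</dc:date>
  </Description>
</RDF>""")

description = rdf[0]
print(rdf.tag)             # default namespace
print(description.tag)     # unprefixed child inherits the default namespace
print(description[0].tag)  # prefixed element uses the dc namespace
```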

Exercise

• Complete the ZVON XML tutorial:
  http://www.zvon.org/xxl/NamespaceTutorial/Output/contents.html