Basics of XML

Upload: indiangarg

Post on 08-Sep-2014

Category:

Technology

DESCRIPTION

XML is everywhere. Computers, Mobiles, Bank Systems, Internet, TVs, Microwaves, all use XML as an Information Wrapping and Information Xchange System. We will tell you all the basics in the simplest possible way.

TRANSCRIPT

Page 1: Basics of XML

A Programme Under the compumitra Series

Copyright 2010-14 © Sunmitra Education Technologies Limited, India

eXtensible Markup Language (XML)

A comment by Tim Bray of Sun Microsystems on the celebration of the 10th anniversary of XML in Feb 2008:

"There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes. This is a good thing, because it shows that information can be packaged and transmitted and used in a way that's independent of the kinds of computer and software that are involved. XML won't be the last neutral information wrapping system; but as the first, it's done very well."

Page 2: Basics of XML

Outline

XML Eye-opener.
What is XML?
HTML vs. XML.
Basic XML Syntax.
Constituents.
Some XML Rules.
Element Vs. Attribute.
Node Naming Principles.
Advanced Concepts related to XML.
Future of XML.

Page 3: Basics of XML

XML Eye Opener

SIMPLE: So simple that you will wonder why you had not tried to understand it until now.

SUCCESSFUL: The most successful data storage format to date; even big brands that strongly believed in proprietary formats for commercial reasons have started using it.

SOLID: A solid, ageless concept that this generation will pass on to future generations, who will keep the baton moving.

Page 4: Basics of XML

What is XML-1

XML is an abbreviation of eXtensible Markup Language.

XML evolved from the more general-purpose ISO standard SGML (Standard Generalized Markup Language).

All data needs a description to become useful information. XML provides a neat solution.

XML reads much like plain English, yet it is designed to be machine readable.

Page 5: Basics of XML

What is XML-2

XML can store data.

XML helps standardize the exchange of data.

Markup tags are user defined and name the data items.

Library functions to parse XML are available in most programming languages.

The syntax looks like this:

<addressbook>
  <adrrecord>
    <name>Name1</name>
    <address>Address1</address>
    <city>City1</city>
  </adrrecord>
</addressbook>
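The points above (storing data, user-defined tags, library support) can be sketched with Python's standard `xml.etree.ElementTree` module. This is only one possible illustration; the choice of Python and the variable names are assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Build the address book from the slide as a tree of user-defined elements.
addressbook = ET.Element("addressbook")
record = ET.SubElement(addressbook, "adrrecord")
ET.SubElement(record, "name").text = "Name1"
ET.SubElement(record, "address").text = "Address1"
ET.SubElement(record, "city").text = "City1"

# Serialize the tree to XML text (what would be stored or exchanged).
xml_text = ET.tostring(addressbook, encoding="unicode")

# Parse the text back and read each data item by its tag name.
parsed = ET.fromstring(xml_text)
records = [
    {child.tag: child.text for child in rec}
    for rec in parsed.findall("adrrecord")
]

print(xml_text)
print(records)
```

Because the tags name the data items, the receiving program needs no knowledge of column positions or binary layouts, only of the tag names.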

Page 6: Basics of XML

Understanding Basic XML Syntax

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME>
    <CODE>IN</CODE>
    <ISD>91</ISD>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
    <LCITY>Mumbai</LCITY>
    <CURRENCY>Indian Rupee</CURRENCY>
    <CURCODE>INR</CURCODE>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME>
    <CODE>JP</CODE>
    <ISD>81</ISD>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
    <LCITY>Tokyo</LCITY>
    <CURRENCY>Yen</CURRENCY>
    <CURCODE>JPY</CURCODE>
  </COUNTRY>
</COUNTRYLIST>

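XML Declaration:

Version: the XML version in use.

Encoding: the character set used. UTF-8 (a Unicode encoding built from 8-bit units) is common.

Standalone: "yes" indicates that no external type definitions are used.

Root Element Node: e.g. COUNTRYLIST

Element Node: e.g. COUNTRY, NAME

Attribute Node: e.g. group

Element Value: e.g. India

Attribute Value: e.g. "G20"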
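To make the node types concrete, here is a small sketch that walks a trimmed copy of the country list and picks out the root element, element values, and attribute values. The trimming and the variable names are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the country list from the slide.
COUNTRY_XML = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
  </COUNTRY>
</COUNTRYLIST>"""

# Parse from bytes so the encoding named in the declaration is honoured.
root = ET.fromstring(COUNTRY_XML.encode("utf-8"))

summary = []
for country in root.findall("COUNTRY"):      # element nodes
    capital = country.find("CAPITAL")
    summary.append((
        country.get("group"),                # attribute value
        country.findtext("NAME"),            # element value
        capital.text,                        # element value
        capital.get("largestcity"),          # attribute value
    ))

print(root.tag)   # name of the root element node
print(summary)
```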

Page 7: Basics of XML

XML Constituents

Elements: <address><name>somename</name></address>

Parsable character data (PCDATA) is the text between an element's start and end tags (here, somename).

Attributes: <Book Version="1.0"><name></name></Book>

An attribute has a name and a value in quotes.

Five predefined entities allow special characters in the PCDATA area: > as &gt;, < as &lt;, & as &amp;, ' as &apos;, " as &quot;

CDATA section (character data that is not to be parsed): meant for holding blocks of code or general-purpose data. Even HTML data can be put here. <![CDATA[ ... ]]>

Processing Instructions (PI) or directives are given between <? and ?>, for example <?xml-stylesheet type="text/css" href="mySheet.css"?>. The initial declaration uses the same syntax: <?xml version="1.0" encoding="UTF-8" standalone="no"?>
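The two ways of carrying special characters, entities and CDATA, can be compared in a short sketch. The sample string is made up for illustration; `escape` comes from Python's standard `xml.sax.saxutils` module, which handles &, < and > by default and takes the two quote characters as an extra mapping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing characters that are special in XML.
raw = 'Tom & Jerry say "2 < 3"'

# Option 1: replace the special characters with the predefined entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
entity_doc = "<note>" + escaped + "</note>"

# Option 2: wrap the unmodified text in a CDATA section.
cdata_doc = "<note><![CDATA[" + raw + "]]></note>"

# Either way, the parser hands back the original text.
from_entities = ET.fromstring(entity_doc).text
from_cdata = ET.fromstring(cdata_doc).text

print(escaped)
print(from_entities == raw, from_cdata == raw)
```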

Page 8: Basics of XML

Some XML Rules - 1

All elements must have closing tags.

<address>invalid syntax
<address>valid syntax</address>

Element names are case sensitive.

<Name>incorrect</name>
<Name>correct</Name>

Elements must be correctly nested.

<address><name>incorrect</address></name>
<address><name>correct</name></address>

Attribute values must be quoted.

<Book Version=1.0><name></name></Book> (incorrect)
<Book Version="1.0"><name></name></Book> (correct)
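A parser enforces these rules. The sketch below feeds the examples from this slide to Python's standard parser and records which ones it accepts; `is_well_formed` is a helper name invented here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<address>invalid syntax",                       # missing closing tag
    "<address>valid syntax</address>",
    "<Name>incorrect</name>",                        # case mismatch
    "<Name>correct</Name>",
    "<address><name>incorrect</address></name>",     # wrong nesting
    "<address><name>correct</name></address>",
    "<Book Version=1.0><name></name></Book>",        # unquoted attribute
    '<Book Version="1.0"><name></name></Book>',
]

results = {text: is_well_formed(text) for text in samples}
for text, ok in results.items():
    print("OK " if ok else "BAD", text)
```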

Page 9: Basics of XML

Some XML Rules - 2

An XML document must have a root element, and only one root element (it can have any name, though).

<root>
  <child>correct</child>
</root>

Special characters in data values must be written as entities: > as &gt;, < as &lt;, & as &amp;, ' as &apos;, " as &quot;

Comments have this syntax: <!-- This is a comment -->

Comments cannot contain -- (two consecutive hyphens) in their text.

Page 10: Basics of XML

Some XML Rules - 3

Whitespace is preserved, unlike in HTML. For example, "Hello     World" in HTML would be displayed as "Hello World".

In XML, the exact spaces specified are retained.

Empty elements can optionally be written in a short form:

<Name /> in place of <Name></Name>
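Both rules on this slide can be checked with a parser. The following sketch (variable names are illustrative) shows that the spaces inside element text survive parsing and that the two spellings of an empty element produce the same result.

```python
import xml.etree.ElementTree as ET

# Whitespace inside element text is kept exactly as written.
greeting = ET.fromstring("<msg>Hello     World</msg>")
print(repr(greeting.text))

# The short and long forms of an empty element parse to equivalent nodes.
short_form = ET.fromstring("<Name />")
long_form = ET.fromstring("<Name></Name>")
same_output = (
    ET.tostring(short_form, encoding="unicode")
    == ET.tostring(long_form, encoding="unicode")
)
print(same_output)
```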

Page 11: Basics of XML

XML Practice: Element Vs Attributes - 1

It is generally possible to define all data as ELEMENT tags in a tree format.

<Library>
  <Book>
    <ID>201</ID>
    <ISBN>8175257660</ISBN>
    <Author>Name1</Author>
    <Title>Book Title</Title>
  </Book>
</Library>

A neat alternative to the above could be using ATTRIBUTES as follows:

<Library>
  <Book ID="201" ISBN="8175257660">
    <Author>Name1</Author>
    <Title>Book Title</Title>
  </Book>
</Library>

Page 12: Basics of XML

XML Practice: Element Vs Attributes -2

Which method to use calls for a considered decision.

Information that is certainly singular (will not be repeated) and is not domain specific is best stored as an ATTRIBUTE.

Information that you cannot classify, or that can be repeated (e.g. the Author tag in the above example), should be stored as an ELEMENT.

An even better format for the previous example would be:

<Library>
  <Book ID="201">
    <ISBN>8175257660</ISBN>
    <Author>Name1</Author>
    <Title>Book Title</Title>
  </Book>
</Library>

This is because ISBN is a book-related property, while ID may relate to a storage place.
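The "repeatable" distinction can be seen directly in a parser: an element may occur several times under its parent, whereas repeating an attribute name on one tag is rejected. In this sketch the second author, Name2, is a made-up addition to the slide's example.

```python
import xml.etree.ElementTree as ET

# The slide's example, with a hypothetical second Author added.
LIBRARY_XML = """<Library>
  <Book ID="201">
    <ISBN>8175257660</ISBN>
    <Author>Name1</Author>
    <Author>Name2</Author>
    <Title>Book Title</Title>
  </Book>
</Library>"""

book = ET.fromstring(LIBRARY_XML).find("Book")
book_id = book.get("ID")                              # single-valued attribute
authors = [a.text for a in book.findall("Author")]    # repeated element

# An attribute name cannot appear twice on the same element.
try:
    ET.fromstring('<Book ID="201" ID="202" />')
    duplicate_attribute_rejected = False
except ET.ParseError:
    duplicate_attribute_rejected = True

print(book_id, authors, duplicate_attribute_rejected)
```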

Page 13: Basics of XML

XML Node Naming – Begins with

Node (element or attribute) names must begin with a letter or _ (underscore).

<1STLINE></1STLINE> invalid element naming
<LINE1></LINE1> valid naming
<BOOK 1Ver="1.00"></BOOK> invalid attribute naming
<BOOK _Ver="1.00"></BOOK> valid attribute naming

Page 14: Basics of XML

XML Node Naming – Consists of

A name can consist of:

Any English character, or any other-language character allowed by the encoding given in the declaration.

<Name>Sun</Name>
<नाम>सूरज</नाम>

A dot (.), hyphen (-), or _ (underscore).

<Address.Cityname>Delhi</Address.Cityname>
<Address-Cityname>Delhi</Address-Cityname>
<Address_Cityname>Delhi</Address_Cityname>

Tabs and spaces are not allowed in XML node names.
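One rough way to test a candidate name is to let a real parser decide: wrap the name in an empty element and see whether it parses back to the same tag. The helper `is_valid_name` is an invention for this sketch, and it only covers plain names (a name with a colon is treated as a namespace prefix and rejected here).

```python
import xml.etree.ElementTree as ET

def is_valid_name(name):
    """Rough check: does the parser accept <name/> as a single element?"""
    try:
        element = ET.fromstring("<" + name + "/>")
    except ET.ParseError:
        return False
    # Guard against input that parses but is not just a name.
    return element.tag == name

candidates = [
    "1STLINE", "LINE1", "_Ver",
    "Address.Cityname", "Address-Cityname", "Address_Cityname",
    "My Name", "My\tName", "नाम",
]
results = {name: is_valid_name(name) for name in candidates}
for name, ok in results.items():
    print(repr(name), ok)
```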

Page 15: Basics of XML

XML Node Naming – Based on Namespace

A name can belong to a namespace.

The name table may mean an HTML table or a piece of furniture. Namespace prefixes resolve this conflict as follows. (In a complete document, each prefix is bound to a namespace URI with an xmlns attribute.)

<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table>
  <f:name>Dining Table</f:name>
  <f:width>120</f:width>
  <f:length>230</f:length>
</f:table>
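A sketch of how a program tells the two kinds of table apart. The wrapping root element and both namespace URIs (`http://example.com/...`) are placeholders assumed for this illustration; the slides do not give them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique strings would do.
NS = {
    "h": "http://example.com/html",
    "f": "http://example.com/furniture",
}

DOC = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>Dining Table</f:name>
    <f:width>120</f:width>
    <f:length>230</f:length>
  </f:table>
</root>"""

root = ET.fromstring(DOC)

# The prefix map lets each query name the namespace it means.
html_cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.findtext("f:table/f:name", namespaces=NS)

# Internally the parser stores each tag as {namespace-uri}localname.
table_tags = [child.tag for child in root]

print(html_cells, furniture_name)
print(table_tags)
```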

Page 16: Basics of XML

HTML Vs XML - 1

Similarities.

Both use markup tags (elements and attributes), e.g. <H1>Heading1</H1> or <font face="Verdana"></font>.

Both use entities, e.g. &lt; &gt; etc.

Both are derived from SGML.

Page 17: Basics of XML

HTML Vs XML - 2

Differences.

HTML has predefined tags; XML tags are user defined.

HTML is meant for display to humans, and errors are ignored. XML is meant for computers, as a data store or for definitions, so errors cannot be ignored.

HTML is usually not updated by programs, while XML is meant to be written by programs.

HTML has a large number of entities. XML has just five predefined ones.

Page 18: Basics of XML

XSL (Extensible Stylesheet Language)

Unlike HTML styling with CSS (Cascading Style Sheets), XSL works with tags that are user defined.

It has three parts:

XSLT (XSL Transformations): for showing XML data as transformed XHTML on a webpage.

XPath: a way to reach a particular data item in an XML file. This is often useful when reading XML-based configuration files.

XSL-FO (XSL Formatting Objects): provides a display/print formatting mechanism for XML data.
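As an illustration of the XPath idea, Python's standard `xml.etree.ElementTree` accepts a limited subset of XPath in its `find`, `findall` and `findtext` methods. The sketch below reuses a trimmed version of the earlier country list; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME><CODE>IN</CODE>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
    <CURCODE>INR</CURCODE>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME><CODE>JP</CODE>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
    <CURCODE>JPY</CURCODE>
  </COUNTRY>
</COUNTRYLIST>
""")

# Select by attribute value, then step down to a child element.
g20_capital = root.findtext("./COUNTRY[@group='G20']/CAPITAL")

# Select by the text of a child element.
japan_currency = root.findtext("./COUNTRY[NAME='Japan']/CURCODE")

# Select every CODE element anywhere below the root.
all_codes = [code.text for code in root.findall(".//CODE")]

print(g20_capital, japan_currency, all_codes)
```

Full XPath offers more than this subset (functions, axes, arithmetic), so a dedicated XPath library would be needed for anything beyond simple paths and predicates.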

Page 19: Basics of XML

DTD (Document Type Definition)

A DTD is referenced within a DOCTYPE declaration in an XML file, such as:

<!DOCTYPE note SYSTEM "Note.dtd">

The DTD file contains the element declarations. The same declarations can instead be embedded in the XML file itself, wrapped in the DOCTYPE declaration as follows:

<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

The XML file has a root node named note with four sub-elements.

The sub-elements hold PCDATA.

Page 20: Basics of XML

Parsing XML

The process of reading an XML file and extracting valid data from it is called "PARSING".

Parsers are of two types:

Non-Validating Parser: does not check the document against a DTD.

Validating Parser: checks the document against its DTD.
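Python's standard `xml.etree.ElementTree` behaves as a non-validating parser, which makes the difference easy to see. In the sketch below, the note document (with made-up values Reader and Author) omits two elements its DTD requires, yet it still parses. The final comparison is a hand-written stand-in for what a validating parser would report, not a real DTD validator.

```python
import xml.etree.ElementTree as ET

# The DTD requires to, from, heading, body; this note has only two of them.
NOTE_WITH_DTD = """<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note><to>Reader</to><from>Author</from></note>"""

# A non-validating parser accepts it: the document is well formed.
note = ET.fromstring(NOTE_WITH_DTD)
children = [child.tag for child in note]

# Hand-written stand-in for the check a validating parser would perform.
required_by_dtd = ["to", "from", "heading", "body"]
matches_dtd = children == required_by_dtd

print(children)
print(matches_dtd)
```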

Page 21: Basics of XML

Some Advanced Concepts Related to XML

XML Schema: defines validation rules in the form of XSD (XML Schema Definition) files, which are themselves in XML format.

XQuery: a way to search within an XML file and get the nodes that match given criteria.

Page 22: Basics of XML

Where to View/Edit

Browsers: Most Browsers are good at viewing XML. Internet Explorer is particularly good at it.

Editors: Special editors are available that provide good XML viewing and editing facilities, such as Microsoft's XML Editor and Peter's XML Editor.

Office Tools: Tools like MS-Word and FrontPage provide good XML editing. Even MS-Excel supports opening XML files.

Visual Studio/WebDeveloper: They provide an excellent environment for XML editing and viewing, along with validation support.

Page 23: Basics of XML

Let's Quickly Revise

2 types of nodes: Elements and Attributes. Elements are repeatable. Attributes can always be expressed as elements; the reverse may not be true.

Special CDATA syntax for non-parsable data.

5 entities for special symbols (<, >, ', ", &).

HTML-style comments are allowed: <!-- comments -->

Case sensitive. Closing tags required.

Other Processing Instructions (PI) can be applied, enclosed within <? ?>. The first line is usually a version declaration, which uses the same syntax.

Always have a single root node.

Page 24: Basics of XML

Future of XML

All websites may one day be written in XML. HTML has already been re-standardised as XHTML which provides better syntax checking and browser compatibility.

XML promises to be the most open system for storing information for all IT gadgets, from desktops to mobile phones, iPods, iPads, DVD players, microwave ovens, etc. It is already in use and is expected to be used in more and more devices.

All office documents and e-books, offline and online, shall ultimately be in XML, as it is the sole non-proprietary format that is simple and able to meet these needs well.

Page 25: Basics of XML

Ask and guide me at [email protected]

Share this information with as many people as possible.

Keep visiting www.sunmitra.com for programme updates.