web engineering notes unit 4
Unit-IV/Web Engineering St.Aloysius Institute of Technology
INTRODUCTION
XML (Extensible Markup Language) is a flexible way to create common information formats
and share both the format and the data on the World Wide Web, intranets, and elsewhere. For
example, computer makers might agree on a standard or common way to describe the
information about a computer product (processor speed, memory size, and so forth) and then
describe the product information format with XML. Such a standard way of describing data
would enable a user to send an intelligent agent (a program) to each computer maker's Web site,
gather data, and then make a valid comparison. XML can be used by any individual, group, or
company that wants to share information in a consistent way.
XML, a formal recommendation from the World Wide Web Consortium (W3C), is similar to the
language of today's Web pages, the Hypertext Markup Language (HTML). Both XML and
HTML contain markup symbols to describe the contents of a page or file. HTML, however,
describes the content of a Web page (mainly text and graphic images) only in terms of how it is
to be displayed and interacted with. For example, the letter "p" placed within markup tags starts
a new paragraph. XML describes the content in terms of what data is being described. For
example, the word "phonenum" placed within markup tags could indicate that the data that
followed was a phone number. This means that an XML file can be processed purely as data by a
program, stored with similar data on another computer, or, like an HTML file, displayed. For
example, depending on how the application in the receiving computer wanted to handle the phone
number, it could be stored, displayed, or dialled.
XML is "extensible" because, unlike HTML, the markup symbols are unlimited and self-
defining. XML is actually a simpler and easier-to-use subset of the Standard Generalized
Markup Language (SGML), the standard for how to create a document structure. It is expected
that HTML and XML will be used together in many Web applications. XML markup, for
example, may appear within an HTML page.
Early applications of XML include Microsoft's Channel Definition Format (CDF), which
describes a channel, a portion of a Web site that has been downloaded to your hard disk and is
then updated periodically as information changes. A specific CDF file contains data that
specifies an initial Web page and how frequently it is updated. Another early application is
ChartWare, which uses XML as a way to describe medical charts so that they can be shared by
doctors. Applications related to banking, e-commerce ordering, personal preference profiles,
purchase orders, litigation documents, part lists, and many others are anticipated.
VALIDATING XML FILES
When you validate your XML file, the XML validator will check to see that your file is valid and
well-formed. The XML editor will process XML files that are invalid or not well-formed. The
editor uses heuristics to open a file using the best interpretation of the tagging that it can. For
example, an element with a missing end tag is simply assumed to end at the end of the
document. As you make updates to a file, the editor incrementally reinterprets your document,
changing the highlighting, tree view, and so on. Many formation errors are easy to spot in the
syntax highlighting, so you can correct obvious errors on the fly. However, in other cases it is
worth performing formal validation on your documents.
Prepared By: Mr.Aditya patel
You can validate your file by selecting it in the Navigator view, right-clicking it, and clicking
Validate. Any validation problems are indicated in the Problems view. You can double-click on
individual errors, and you will be taken to the invalid tag in the file, so that you can make
corrections.
Note: If you receive an error message indicating that the Problems view is full, you can increase
the number of error messages allowed by clicking Window > Preferences and selecting
General > Markers. Select the Use marker limits check box and change the number in
the Limit visible items per group field.
You can set up a project's properties so that different types of project resources are automatically
validated when you save them. From a project's pop-up menu, click Properties, then select
Validation. Any validators you can run against your project will be listed in the Validation page.
The purpose of a Document Type Definition or DTD is to define the structure of a document
encoded in XML (Extensible Markup Language).
It is possible to build and use files containing XML tags without ever defining which tags are
legal. However, if you want to ensure that files conform to a known structure, writing a DTD is
the preferred method.
A well-formed file is one that obeys the general XML rules for tags: tags must be
properly nested, opening and closing tags must be balanced, and empty tags must end
with '/>'.
A valid file is not only well-formed, but it must also conform to a DTD that specifies
which tags it uses, what attributes those tags can contain, and which tags can occur inside
which other tags, among other properties.
The advantage of a valid file is that its contents are more predictable for applications that want to
process or present that file. The DTD ensures that only certain tags can be used in certain places.
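The difference between the two terms can be tried out with Python's standard library. This is a minimal sketch assuming Python 3 and xml.etree.ElementTree; the helper name is_well_formed is hypothetical. ElementTree is a non-validating parser, so it can only demonstrate the well-formedness check: it rejects badly nested or unclosed tags, but it does not compare the document against a DTD (that needs a validating parser, such as the editor's Validate command described earlier).

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<memo><to>Ann</to><br/></memo>"))  # True: nested properly, empty tag ends with '/>'
print(is_well_formed("<memo><to>Ann</memo></to>"))       # False: tags overlap instead of nesting
print(is_well_formed("<memo><br></memo>"))               # False: <br> is never closed
```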
DEFINITIONS
We need to review some terminology before proceeding:
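A proper XML name must start with a letter or underscore (_), followed by letters,
underscores, digits, or hyphens (-). (This is a simplified form of the rule; the full XML rule
also allows periods, colons, and many non-ASCII letters.)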
A tag is one of the XML constructs used to mark up documents. All tags start with a less-
than symbol (<) and end with a greater-than symbol (>).
An element is a section of an XML document that acts as a unit. It may be either an empty
element or an element with content.
An empty element consists of a single tag of the form
<gi.../>
where gi is the tag type (or "generic identifier"), and the tag may include attributes. Note the
slash before the closing ">"; this signifies an empty tag.
An opening tag begins a section of an XML document that ends with the corresponding
closing tag. An opening tag has this form:
<gi...>
where gi is the tag type (or "generic identifier"), and the tag may include attributes. A closing
tag has the form:
</gi>
The content is everything between the opening tag and its corresponding closing tag. The
content may be other elements or just plain text.
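The naming rule and the two kinds of element can be illustrated in Python. In this sketch the regular expression encodes only the simplified ASCII naming rule stated above (not the full XML rule), and is_simple_xml_name is a hypothetical helper. ElementTree then shows that an empty element written as <gi/> and one written as <gi></gi> parse to the same thing, while an element with content keeps its children.

```python
import re
import xml.etree.ElementTree as ET

# Simplified ASCII form of the naming rule: a letter or underscore first,
# then letters, digits, underscores or hyphens.
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

def is_simple_xml_name(name: str) -> bool:
    """True if name follows the simplified naming rule."""
    return SIMPLE_NAME.fullmatch(name) is not None

print(is_simple_xml_name("start-time"))  # True
print(is_simple_xml_name("2nd-item"))    # False: starts with a digit

# <pie/> (empty-element tag) and <pie></pie> (opening + closing tag, no content)
# give identical elements; ElementTree writes both back as <pie />.
print(ET.tostring(ET.fromstring("<pie/>")))       # b'<pie />'
print(ET.tostring(ET.fromstring("<pie></pie>")))  # b'<pie />'

# An element with content: everything between its opening and closing tags.
oven = ET.fromstring("<oven><pie/></oven>")
print(oven.tag, [child.tag for child in oven])    # oven ['pie']
```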
The DTD can contain several different types of declarations:
Element declarations let you specify what kinds of tags can be used, and what (if
anything) can appear inside the contents of the element.
Attribute declarations define what attributes you can use inside a given element.
Entity declarations define chunks of fixed text that can be included elsewhere.
Notation declarations define file types (like JPG and WAV files) so you can refer to non-
XML files like image and sound files.
ELEMENTS WITH MIXED CONTENT
In general, an element can have any mixture of text and other elements as children. You can
specify exactly which elements can be children. If you like, you can even specify that the
children must occur in a given order. You can also specify that the child elements are optional.
So, in the general form of the declaration <!ELEMENT gi (content)>, the content is an
expression: it consists of operators and operands arranged in arbitrarily complex
ways. Let's start with some simple cases to show you the features of a content declaration, but
keep in mind that these features can be used in combination. The simplest case is when an
element a has a single child element b:
<!ELEMENT a (b)>
The above declaration in a DTD means that an element <a>...</a> must contain exactly
one <b> element.
To specify that a child element can occur one or more times, append a plus sign (+) after the
child element name. For example, to say that a <squid> element may contain one or more
<tentacle> elements:
<!ELEMENT squid (tentacle+)>
You can also specify that a child element can occur any number of times, or not at all. Append
an asterisk (*), meaning "zero or more of the previous," after the child element name:
<!ELEMENT lizard (leg*)> <!-- some <lizard>s have no <leg>s -->
The question-mark suffix (?) means the child element is optional: it can occur zero or one time in
the content of the element you're declaring. For example, suppose an <oven> element can either
be empty or contain a <pie> element:
<!ELEMENT oven (pie?)>
If you want a certain sequence of children, name the child elements in a comma-separated list.
For example, suppose a <memo> element must contain exactly one <from> element, then one
<to> element, one <subject>, and one <message> element:
<!ELEMENT memo (from,to,subject,message)>
You can also use the +, *, and ? operators in this declaration. For example, suppose that you want
to require that a <memo> must have <from> and <to> elements, but the <subject> element is
optional, and it can have zero or more <message> elements. You'd then declare it like this:
<!ELEMENT memo (from,to,subject?,message*)>
Sometimes you need to specify that there is a choice of children. The "or" operator (|) can be
used to separate the choices. For example, suppose that a <trophy> element can have either a
child named <bowling> or a child named <tennis>. Here's how you'd declare it:
<!ELEMENT trophy (bowling|tennis)>
You can also apply the usual suffix operators to groups of elements. For example, suppose you
have an element <timerecord> that starts with a required <purpose> element, followed by zero
or more pairs of <start-time> and <end-time> records:
<!ELEMENT timerecord (purpose,(start-time,end-time)*)>
Here's another more general example:
<!ELEMENT stock ((pig|chicken|cow)*)>
The above example says a <stock> element can contain any number of the three child elements,
in any order.
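Because a content model combines element names with ",", "|", "?", "*" and "+", it behaves much like a regular expression over the sequence of child element names. The Python sketch below makes that comparison concrete; it is an illustration, not how a real validating parser is built. content_model_to_regex and children_match are hypothetical helpers, and mixed content (#PCDATA) is not handled.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-content model such as '(from,to,subject?,message*)'
    into a regex over child names, each child name followed by one space."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model):
        if token == ",":
            continue                        # sequence: plain concatenation
        if token in {"(", ")", "|", "?", "*", "+"}:
            parts.append(token)             # same meaning as in a regex
        else:
            parts.append("(?:" + re.escape(token) + " )")  # one child element
    return re.compile("".join(parts))

def children_match(element: ET.Element, pattern: re.Pattern) -> bool:
    """Check the element's sequence of child tag names against the pattern."""
    sequence = "".join(child.tag + " " for child in element)
    return pattern.fullmatch(sequence) is not None

memo = content_model_to_regex("(from,to,subject?,message*)")
print(children_match(ET.fromstring("<memo><from/><to/><message/><message/></memo>"), memo))  # True
print(children_match(ET.fromstring("<memo><to/><from/></memo>"), memo))                     # False
```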
Moreover, you can allow regular, untagged text to be mixed in with your specified child tags by
placing #PCDATA at the start of a list of choices. For example, suppose a <speech> element can
contain any mixture of regular text, and text tagged with the elements <loud> and <soft>:
<!ELEMENT speech ((#PCDATA|loud|soft)*)>
<!ELEMENT loud (#PCDATA)>
<!ELEMENT soft (#PCDATA)>
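So, the content part of the element declaration can be arbitrarily complex. There are some
restrictions on #PCDATA: it must come first in its group, and a group that mixes #PCDATA with
element names must be a choice (|) followed by *, as in the <speech> declaration above. For
other uncommon features you may need, refer to the XML standard or a good book on the subject.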
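How a program sees mixed content depends on its parser. As one illustration, Python's ElementTree (assumed here) stores the text before the first child element in the parent's .text, and the untagged text that follows each child in that child's .tail; itertext() joins all the pieces back together. The <speech> content is a made-up example.

```python
import xml.etree.ElementTree as ET

speech = ET.fromstring(
    "<speech>Hello, <loud>everyone</loud>! Please <soft>sit down</soft>.</speech>")

print(repr(speech.text))  # 'Hello, '  (text before the first child element)
for child in speech:
    # child.text is inside the tag; child.tail is the untagged text after it
    print(child.tag, repr(child.text), repr(child.tail))
# loud 'everyone' '! Please '
# soft 'sit down' '.'

print("".join(speech.itertext()))  # Hello, everyone! Please sit down.
```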
ATTRIBUTE DECLARATIONS
If an element is to have attributes, the names and possible values of those attributes must be
declared in the DTD. Here is the general form:
<!ATTLIST ename {aname atype default} ...>
where ename is the name of the element for which you're defining attributes, aname is the name
of one of that element's possible attributes, atype describes what values it can have, and default
describes whether it has a default value. The last three items can be repeated inside an
<!ATTLIST...> declaration, one group per attribute.
The atype part describing the attribute's type can have three kinds of values:
The keyword CDATA means that the attribute can have any character string as a value.
For example, suppose you want every <play> element to have a title attribute that can contain
any text, and that attribute is required. Here is the complete attribute declaration:
<!ATTLIST play title CDATA #REQUIRED>
There are several tokenized attribute types, which are required to have a certain
structure. See tokenized attributes below.
You can provide a specific set of legal values for the attribute; see enumerated
attributes below.
The last part of the declaration, default, specifies whether the attribute can be omitted, and what
value it will have if omitted. This must be one of the following:
#REQUIRED - The attribute must always be supplied.
#IMPLIED - The attribute can be omitted, and the DTD does not provide a default value.
Anyone reading this file may assume a default value, but that is not the DTD's problem.
"value" - The attribute can be omitted, and the default value is the quoted string that you provide.
#FIXED "value" - The attribute always has the given "value": if it is omitted, that value is used,
and if it is supplied, it must match.
TOKENIZED ATTRIBUTES
You can restrict an attribute to have only values with a certain structure. Here are the possible
values of the atype part of the attribute declaration for such attributes:
ID
An ID attribute must be a unique identifier for that node. This allows other nodes to refer to it.
The attribute value must also be a valid XML name (see above).
IDREF
An IDREF attribute is a reference to an ID attribute in a different node.
For example, suppose that in your DTD, there is a <sailor> element with an ID-type nickname
attribute, and another element <duty> with an IDREF-type attribute called sailor-nick. Then if
you have an element like this:
<sailor nickname='Bluto'>...</sailor>
then this tag would refer to that element:
<duty sailor-nick='Bluto'>...</duty>
IDREFS
The value of an IDREFS attribute must contain one or more ID references separated by spaces.
Example:
<roster sailor-nicks='Bluto Popeye Olive_Oyl'/>
ENTITY
Use this attribute type to refer to external, non-parsed entities. See the section on notations,
below.
ENTITIES
Like ENTITY, but the attribute can be a list of one or more entity names separated by spaces.
NMTOKEN
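The attribute value must be a name token: it may contain only characters allowed in XML names,
but unlike a name it may begin with any of them, so a value such as 1999 is a valid name token
but not a valid XML name.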
NMTOKENS
Like NMTOKEN, but the attribute value can contain one or more name tokens separated by
spaces.
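A non-validating parser does not know which attributes a DTD declared as ID, IDREF or IDREFS, so an application sometimes checks references itself. The Python sketch below does this with ElementTree for the sailor example; the declaration tables (ID_ATTRS and the others), the <crew> wrapper element and the dangling_references helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the DTD's declarations: element -> attribute name.
ID_ATTRS = {"sailor": "nickname"}          # type ID
IDREF_ATTRS = {"duty": "sailor-nick"}      # type IDREF
IDREFS_ATTRS = {"roster": "sailor-nicks"}  # type IDREFS

def dangling_references(root: ET.Element) -> list:
    """Return IDREF/IDREFS values that match no ID in the document."""
    ids = set()
    for tag, attr in ID_ATTRS.items():
        for el in root.iter(tag):
            value = el.get(attr)
            if value in ids:
                raise ValueError(f"duplicate ID {value!r}")  # IDs must be unique
            ids.add(value)
    refs = []
    for tag, attr in IDREF_ATTRS.items():
        refs += [el.get(attr) for el in root.iter(tag)]
    for tag, attr in IDREFS_ATTRS.items():
        for el in root.iter(tag):
            refs += el.get(attr, "").split()  # IDREFS: space-separated list
    return [ref for ref in refs if ref not in ids]

crew = ET.fromstring("""<crew>
  <sailor nickname="Bluto"/>
  <sailor nickname="Popeye"/>
  <duty sailor-nick="Bluto"/>
  <roster sailor-nicks="Bluto Popeye Olive_Oyl"/>
</crew>""")
print(dangling_references(crew))  # ['Olive_Oyl']
```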
ENUMERATED ATTRIBUTES
You can specify that attributes must have one of a set of one or more values. Here is the general
form of the atype part of the <!ATTLIST...> declaration:
(value1|value2|...)
For example, suppose you want your <vehicle> element to have a kind attribute that must have a
value of either "car", "truck", or "boat":
<!ATTLIST vehicle
kind (car|truck|boat) #REQUIRED>
You can also supply a default value in quotes. For example:
<!ATTLIST vehicle
kind (car|truck|boat) "car">
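The default keywords and the enumerated type together describe what a validating parser does with each attribute. As a rough sketch (pure Python, no DTD parsing), the hypothetical apply_attlist function below takes a hand-written table standing in for an <!ATTLIST vehicle ...> declaration, rejects values outside the enumeration, enforces #REQUIRED and #FIXED, and fills in defaults. The owner, notes and version attributes are invented for the example, and undeclared attributes are not checked.

```python
# Hypothetical table standing in for this declaration:
#   <!ATTLIST vehicle kind    (car|truck|boat) "car"
#                     owner   CDATA            #REQUIRED
#                     notes   CDATA            #IMPLIED
#                     version CDATA            #FIXED "1.0">
# Each entry: name -> (allowed values or None for CDATA, keyword, default value)
ATTLIST_VEHICLE = {
    "kind": (("car", "truck", "boat"), None, "car"),
    "owner": (None, "#REQUIRED", None),
    "notes": (None, "#IMPLIED", None),
    "version": (None, "#FIXED", "1.0"),
}

def apply_attlist(attrs: dict, attlist: dict) -> dict:
    """Check attrs against the declarations and fill in default values."""
    result = dict(attrs)
    for name, (allowed, keyword, default) in attlist.items():
        value = result.get(name)
        if value is None:
            if keyword == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if default is not None:        # "value" or #FIXED "value"
                result[name] = default
            continue                       # #IMPLIED: simply left out
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
        if keyword == "#FIXED" and value != default:
            raise ValueError(f"{name} must be {default!r}")
    return result

print(apply_attlist({"owner": "Ann"}, ATTLIST_VEHICLE))
# {'owner': 'Ann', 'kind': 'car', 'version': '1.0'}
```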
DECLARING AND USING ENTITIES
In a DTD, entities come in four flavours:
A general entity is a chunk of text with a name attached, so you can use the entity as a
sort of shorthand to get the related text substituted in its place.
For example, suppose you are working on a new product called Project Giant-Slayer, but you
know that the marketing department will change the name when it's released to the market. You
could define the current product name as an entity named &product;, and use it everywhere in
your product literature. Then, when the marketing department decides on the final name, you can
change the declaration of the entity and the new name will magically appear in place of the old
one in all your web pages and brochures.
A character entity is one of the many standardized special characters that you can use
when you need a character unavailable in your local character set.
A parameter entity is like a general entity, but it can be used as shorthand for parts of a
content declaration in an element declaration.
A binary or non-parsed entity represents an external file that is not in XML format.
GENERAL ENTITIES
General entities have names of the form &name;, where the name follows the usual rules for
XML names (above).
To declare a general entity, use a declaration of this general form in your DTD:
<!ENTITY ename "text">
where ename is the name of the entity you are defining (without the initial & and final ;),
and text is the text you want substituted for that entity.
For example, to define an entity named &cr; with your copyright string, you might use a
declaration like this:
<!ENTITY cr "Copyright (C) 1763 Cotton Mather LLP">
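Entity substitution can be observed with Python's ElementTree, which (through the expat parser) expands general entities declared in a document's internal DTD subset. The sketch assumes a small, hypothetical <brochure> document that declares both the &cr; entity above and the &product; entity from the Giant-Slayer example.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE brochure [
  <!ENTITY cr "Copyright (C) 1763 Cotton Mather LLP">
  <!ENTITY product "Project Giant-Slayer">
]>
<brochure><title>&product;</title><footer>&cr;</footer></brochure>"""

root = ET.fromstring(doc)         # entity references are replaced while parsing
print(root.findtext("title"))     # Project Giant-Slayer
print(root.findtext("footer"))    # Copyright (C) 1763 Cotton Mather LLP
```

Changing the product name then means editing only the <!ENTITY product ...> line.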
CHARACTER ENTITIES
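To use special characters in your document, you can use the form &#n; where n is
the decimal number of the character you want (or &#xh; where h is the number in hexadecimal).
A table of these entities is online at http://www.w3.org/TR/html401/sgml/entities.html.
Note that the named entities in that table, such as &copy;, are defined by HTML; XML itself
predefines only &lt;, &gt;, &amp;, &quot;, and &apos;.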
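Character references can be checked the same way. In this Python sketch (ElementTree assumed), &#169; and its hexadecimal form &#xA9; both produce the copyright sign, while the HTML named entity &copy; is rejected because plain XML does not define it.

```python
import xml.etree.ElementTree as ET

# &#169; (decimal) and &#xA9; (hexadecimal) both denote U+00A9, the copyright sign.
p = ET.fromstring("<p>&#169; 1763, &#xA9; again</p>")
print(p.text)  # © 1763, © again

# &copy; is an HTML named entity; without a DTD declaring it,
# an XML parser reports an undefined entity.
try:
    ET.fromstring("<p>&copy;</p>")
except ET.ParseError as err:
    print("ParseError:", err)
```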
PARAMETER ENTITIES
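The purpose of a parameter entity is to serve as a shorthand for some or all of the content part of
an element declaration.
The general form is:
<!ENTITY % ename "text">
For example, suppose you have a lot of tags whose content model is "(#PCDATA|bold|ital)*".
You could define an entity like this:
<!ENTITY % bitext "(#PCDATA|bold|ital)*">
Then, to define an element <excuse> with that content:
<!ELEMENT excuse %bitext;>
A parameter entity is referenced with % instead of &. Inside a document's internal DTD subset
(between the [ and ] of a <!DOCTYPE>), parameter-entity references may appear only between
declarations, not inside them, so declarations like the two above belong in an external DTD file.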
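A parameter entity is text substitution inside the DTD itself. The hypothetical expand_parameter_entities function below is a minimal Python sketch of that substitution, assuming double-quoted entity values and no nested references; a real XML processor also pads each replacement with a space on both sides and follows many more rules. The <apology> element is invented to show one entity reused twice.

```python
import re

# <!ENTITY % name "value">  (double-quoted values only, in this sketch)
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([A-Za-z_][\w.-]*)\s+"([^"]*)"\s*>')
# %name;  a parameter-entity reference
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

def expand_parameter_entities(dtd: str) -> str:
    """Remove parameter-entity declarations and substitute their references."""
    entities = dict(PE_DECL.findall(dtd))
    body = PE_DECL.sub("", dtd)
    return PE_REF.sub(lambda m: entities[m.group(1)], body)

dtd = '''<!ENTITY % bitext "(#PCDATA|bold|ital)*">
<!ELEMENT excuse %bitext;>
<!ELEMENT apology %bitext;>'''
print(expand_parameter_entities(dtd).strip())
# <!ELEMENT excuse (#PCDATA|bold|ital)*>
# <!ELEMENT apology (#PCDATA|bold|ital)*>
```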
BINARY (NON-PARSED) ENTITIES
This last type of entity represents a file, like an image or sound file, that is not XML. To declare
such an entity:
<!ENTITY ename SYSTEM "url" NDATA nname>
where ename is the name of the entity you are defining, url is the URL where the file can be
found, and nname is the name of the notation that the file uses. See the section on notations
below for an example.
NOTATION DECLARATIONS
The purpose of a notation declaration is to define the format of some external non-XML file,
such as a sound or image file, so you can refer to such files in your document.
The general form of a notation declaration can be either of these:
<!NOTATION nname PUBLIC std>
<!NOTATION nname SYSTEM url>
where nname is the name you are giving to the notation; std is the published name of a public
notation, and url is a reference to a program that can render a file in the given notation.
There are four steps to connecting an attribute to a notation:
1. Declare the notation. For example:
<!NOTATION jpeg PUBLIC "JPG 1.0">
2. Declare the entity. For example:
<!ENTITY bogie-pic SYSTEM
"http://stars.com/bogart.jpg" NDATA jpeg>
3. Declare the attribute as type ENTITY. For example:
<!ATTLIST star-bio pin-shot ENTITY #REQUIRED>
4. Use the attribute:
<star-bio pin-shot="bogie-pic">...</star-bio>
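A parser reports notations and unparsed entities to the application, which then decides what to do with the external file. The Python sketch below uses the standard library's expat bindings (xml.parsers.expat) to collect the declarations from the four steps above and resolve the pin-shot attribute to its URL and notation. The handler functions and result containers are hypothetical, and nothing is downloaded or displayed.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE star-bio [
  <!NOTATION jpeg PUBLIC "JPG 1.0">
  <!ENTITY bogie-pic SYSTEM "http://stars.com/bogart.jpg" NDATA jpeg>
  <!ATTLIST star-bio pin-shot ENTITY #REQUIRED>
]>
<star-bio pin-shot="bogie-pic">Humphrey Bogart</star-bio>"""

notations, unparsed, found = {}, {}, []

def on_notation(name, base, system_id, public_id):
    notations[name] = public_id or system_id

def on_entity(name, is_parameter, value, base, system_id, public_id, notation_name):
    if notation_name:  # only unparsed (NDATA) entities carry a notation name
        unparsed[name] = (system_id, notation_name)

def on_start(tag, attrs):
    if "pin-shot" in attrs:  # attribute declared as type ENTITY
        url, notation = unparsed[attrs["pin-shot"]]
        found.append((tag, url, notation, notations[notation]))

parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.StartElementHandler = on_start
parser.Parse(DOC, True)
print(found)
# [('star-bio', 'http://stars.com/bogart.jpg', 'jpeg', 'JPG 1.0')]
```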
XML AND WEB PAGES
In a way, you could argue that the most widespread use of XML is XHTML. XHTML 1.0 is HTML 4
reformulated as XML, so many pages written as HTML 4.0 are close to XHTML but are not valid
XHTML.
But the benefit of XML is not that it already exists as XHTML, but that you can create web
documents from XML by using XSLT to transform your documents into HTML. You can run your
XML through an XSLT processor on the web server and serve the result to the web browser.
This makes your documentation available in whatever format you need it to be in.
XML AND CONTENT MANAGEMENT
Ironically, with most websites that use XML, the web designers and content developers might
not even know that XML is there. This is because there is generally a CMS or content
management system that sits in front of the XML to make it easier for the content writers to
write their web content without worrying about how to write HTML or design web pages.
XML AND DOCUMENTATION
Many companies are moving to XML to write their internal documentation. The most common
XML platform for this is DocBook. The advantage of XML for documentation is that it can be
used to define the common traits in books, magazines, stories, advertisements, and so forth. And
DocBook already has that type of information defined.
The best thing about XML for documentation is that it is easy for humans to understand: both the
documentation itself and the XML code surrounding it are readable. XML can be used for
any type of documentation, from a publishing house to marketing materials.
Here is an example of documentation written in XML:
<howto>
  <title>How to Write a Mail Link</title>
  <author>Jennifer Kyrnin, Web Design Guide</author>
  <description>
    <paragraph>
      Use an HTML tag to allow your readers to send email directly from your Web site.
    </paragraph>
  </description>
  <directions>
    <step>Write a link as usual <a href="">email me</a></step>
    <step>Where you would normally put a URL, put the code "mailto" <a href="mailto:">email me</a></step>
    <step>Then put your email address after the colon <a href="mailto:[email protected]">email me</a></step>
  </directions>
</howto>
As you can see, both the data and the XML are readable and understandable. The content is also
in an order that would be expected by a human reading the document.
XML AND DATABASE DEVELOPMENT
Databases are a natural use for XML, because XML is all about data. Unlike XML for
documentation, XML for databases does not need to be readable by humans. The data is simply
written in such a way to allow machines to read it and make it accessible to a database.
Here's XML that might be loaded into a database:
<item number="00001">
  <name>
    <first>Jane</first> <middle>Q</middle> <last>Public</last>
  </name>
  <phone type="voice">
    <areacode>407</areacode>
    <number>555-1212</number>
  </phone>
  <phone type="fax">
    <areacode>407</areacode>
    <number>555-1213</number>
  </phone>
  <email>[email protected]</email>
</item>
Unlike the document XML, it's not necessary that this be easily readable by humans. Since it is
meant to be input into a database, it is only important that it be processable by a computer.
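Loading such a record is mostly a matter of mapping elements to columns. The Python sketch below assumes the standard library's sqlite3 module with a throwaway in-memory database; the table and column names are hypothetical, and the e-mail address is a placeholder because the original one is not recoverable.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The <item> record above; the e-mail address is a hypothetical placeholder.
ITEM = """<item number="00001">
  <name><first>Jane</first><middle>Q</middle><last>Public</last></name>
  <phone type="voice"><areacode>407</areacode><number>555-1212</number></phone>
  <phone type="fax"><areacode>407</areacode><number>555-1213</number></phone>
  <email>jane@example.com</email>
</item>"""

item = ET.fromstring(ITEM)
db = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
db.execute("CREATE TABLE person (item_number TEXT PRIMARY KEY, first_name TEXT,"
           " middle_name TEXT, last_name TEXT, email TEXT)")
db.execute("CREATE TABLE phone (item_number TEXT, phone_type TEXT, areacode TEXT, local_number TEXT)")

number = item.get("number")
db.execute("INSERT INTO person VALUES (?, ?, ?, ?, ?)",
           (number, item.findtext("name/first"), item.findtext("name/middle"),
            item.findtext("name/last"), item.findtext("email")))
for phone in item.findall("phone"):  # one row per <phone> element
    db.execute("INSERT INTO phone VALUES (?, ?, ?, ?)",
               (number, phone.get("type"), phone.findtext("areacode"), phone.findtext("number")))

print(db.execute("SELECT phone_type, areacode || '-' || local_number FROM phone"
                 " ORDER BY phone_type").fetchall())
# [('fax', '407-555-1213'), ('voice', '407-555-1212')]
```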
HTML versus XML
The most salient difference between HTML and XML is that HTML describes presentation and
XML describes content. An HTML document rendered in a web browser is human readable.
XML is aimed toward being both human and machine readable.
Consider the following HTML.
<html>
<head><title>Books</title></head>
<body>
<h2>Books</h2>
<hr>
<em>Sense and Sensibility</em>, <b>Jane Austen</b>, 1811<br>
<em>Pride and Prejudice</em>, <b>Jane Austen</b>, 1813<br>
<em>Alice in Wonderland</em>, <b>Lewis Carroll</b>, 1866<br>
<em>Through the Looking Glass</em>, <b>Lewis Carroll</b>, 1872<br>
</body>
</html>
The previous HTML is rendered in a browser as follows.
The HTML above describes how bibliography information is to be presented and formatted for a
human to view in a web browser. Knowing that Sense and Sensibility is enclosed in italic tags
does not however help a program determine that it is the title of a book. XML attempts to
describe web data to address this void.
The following is XML describing the contents of the books HTML page above.
<books>
<book>
<title>Sense and Sensibility</title>
<author>Jane Austen</author>
<year>1811</year>
</book>
<book>
<title>Pride and Prejudice</title>
<author>Jane Austen</author>
<year>1813</year>
</book>
<book>
<title>Alice in Wonderland</title>
<author>Lewis Carroll</author>
<year>1866</year>
</book>
<book>
<title>Through the Looking Glass</title>
<author>Lewis Carroll</author>
<year>1872</year>
</book>
</books>
A program parsing this data can take advantage of the fact that all book titles are enclosed in
<title> tags. Where would such a program find such information? An XML document may
contain an optional description of its grammar. A grammar describes which tags are used in the
XML document and how such tags can be nested. A grammar is a schema or road map for the
XML document. Originally an XML grammar was specified in a DTD (Document Type
Definition). A newer W3C standard, XML Schema (often called XSD), has since been adopted; it
addresses some of the limitations of DTDs.
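The claim that a program can rely on the <title> tags is easy to demonstrate. This Python sketch assumes xml.etree.ElementTree and uses two of the <book> records above; it finds every title by tag name and then combines fields to select the titles by one author.

```python
import xml.etree.ElementTree as ET

# Two of the <book> records above, enough to show the idea.
BOOKS = """<books>
  <book><title>Sense and Sensibility</title><author>Jane Austen</author><year>1811</year></book>
  <book><title>Alice in Wonderland</title><author>Lewis Carroll</author><year>1866</year></book>
</books>"""

root = ET.fromstring(BOOKS)

# Every title, wherever it appears, because titles are marked up as <title>.
titles = [title.text for title in root.iter("title")]
print(titles)  # ['Sense and Sensibility', 'Alice in Wonderland']

# Fields can be combined: titles of books by a given author.
austen = [b.findtext("title") for b in root.findall("book") if b.findtext("author") == "Jane Austen"]
print(austen)  # ['Sense and Sensibility']
```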
As can be seen above, XML does not contain any information indicating how the document
should be rendered in a browser. Therefore, XML separates data from presentation. The beauty of
this feature is that the same data can be presented in a variety of ways without having to
replicate any data (e.g., consider making book titles bold and authors italic).
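The "same data, different presentation" idea can be sketched in a few lines. Here Python and ElementTree stand in for a transformation language such as XSLT; render is a hypothetical helper that produces the layout of the HTML page above, or the alternative with bold titles and italic authors, from one unchanged XML source.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOKS = """<books>
  <book><title>Sense and Sensibility</title><author>Jane Austen</author><year>1811</year></book>
  <book><title>Pride and Prejudice</title><author>Jane Austen</author><year>1813</year></book>
</books>"""

def render(root: ET.Element, title_tag: str, author_tag: str) -> str:
    """One HTML line per <book>, with title and author wrapped in the given tags."""
    lines = []
    for book in root.findall("book"):
        title = escape(book.findtext("title"))
        author = escape(book.findtext("author"))
        year = escape(book.findtext("year"))
        lines.append(f"<{title_tag}>{title}</{title_tag}>, "
                     f"<{author_tag}>{author}</{author_tag}>, {year}<br>")
    return "\n".join(lines)

root = ET.fromstring(BOOKS)
print(render(root, "em", "b"))  # same layout as the HTML page above
print(render(root, "b", "em"))  # titles bold, authors italic; the XML is unchanged
```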
XML SYNTAX DIFFERS FROM HTML
New tags may be defined at will
Tags may be nested to arbitrary depth
May contain an optional description of its grammar
XML can be used to store data inside HTML documents. XML data can be stored inside HTML
pages as "Data Islands". As HTML provides a way to format and display the data, XML stores
data inside the HTML documents. The data contained in an XML file is of little value unless it
can be displayed, and HTML files are used for that purpose.
The simplest way to insert XML code into an HTML file is to use the <xml> tag. The <xml> tag
tells the browser that the contents are to be parsed and interpreted using the XML parser.
Like most other HTML tags, the <xml> tag has attributes. The most important attribute is the ID,
which gives the island a unique name. The contents of the <xml> tag come from one
of two sources: inline XML code or an imported XML file. Note that data islands are a
Microsoft Internet Explorer feature; other browsers do not support the <xml> tag.
If the code appears in the current location, it is said to be inline.
Example
Embedding XML code inside an HTML file:
<html>
  <xml id="msg">
    <message>
      <to>Visitors</to>
      <from>Author</from>
      <Subject>XML Code Islands</Subject>
      <body>In this example, XML code is embedded inside HTML code</body>
    </message>
  </xml>
</html>
A more efficient way is to put the XML in a separate file and import it by using the SRC
attribute of the <xml> tag.
Syntax
<xml id="msg" src="example1.xml"></xml>
DATA BINDING
Data binding involves mapping, synchronizing, and moving data from a data source, usually on a
remote server, to an end user's local system where the user can manipulate the data. Using data
binding means that after a remote server transmits data, the user can perform some minor data
manipulations on their own local system. The remote server does not have to perform all the data
manipulations nor repeatedly transmit variations of the same data.
Data binding involves moving data from a data source to a local system, and then
manipulating the data on the local system, for example by searching, sorting, and filtering it.
When you bind data in this way, you do not have to request that the remote server
manipulate the data and then retransmit the results; you can perform some data
manipulation locally.
In data binding, the data source provides the data, and the appropriate applications
retrieve and synchronize the data and present it on the terminal screen.
If the data changes, the applications are written so they can alter their presentation to
reflect those changes.
Data binding is used to reduce traffic on the network and to reduce the work of the Web
server, especially for minor data manipulations.
Binding data also separates the task of maintaining data from the tasks of developing and
maintaining binding and presentation programs.
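The principle of fetching data once and then searching, sorting and filtering it locally can be sketched in Python. This is not the Internet Explorer data-binding mechanism itself, only an illustration of the idea; the payload is a simplified, hypothetical version of the employee records used later in these notes, and the "server response" is just a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload, received from the server once.
PAYLOAD = """<employees>
  <employee id="1"><name>Mohit Jain</name><city>Karnal</city></employee>
  <employee id="2"><name>Rahul Kapoor</name><city>Ambala</city></employee>
</employees>"""

# Bind the XML to a local list of records.
records = [{"id": e.get("id"), "name": e.findtext("name"), "city": e.findtext("city")}
           for e in ET.fromstring(PAYLOAD).iter("employee")]

# Filtering and sorting now happen locally; the server is not asked again.
print([r["name"] for r in records if r["city"] == "Ambala"])        # ['Rahul Kapoor']
print([r["id"] for r in sorted(records, key=lambda r: r["city"])])  # ['2', '1']
```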
CONVERTING XML TO HTML FOR DISPLAY
There exist several ways to convert XML to HTML for display on the Web.
Using HTML alone
If your XML file is of a simple tabular form only two levels deep then you can display XML
files using HTML alone.
Using HTML + CSS
This is a substantially more powerful way to transform XML to HTML than HTML alone, but
lacks the full power and flexibility of the methods listed below.
Using HTML with JavaScript
Fully general XML files of any type and complexity can be processed and displayed using a
combination of HTML and JavaScript. The advantages of this approach are that any possible
transformation and display can be carried out because JavaScript is a fully general purpose
programming language. The disadvantages are that it often requires large, complex, and very
detailed programs using recursive functions (functions that call themselves), which are
very difficult for most people to grasp.
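A recursive converter does not have to be long for simple output. As a sketch, the hypothetical to_html function below is written in Python rather than JavaScript, to keep the examples in one language; it walks any XML tree recursively and renders it as nested HTML lists, ignoring attributes and tail text.

```python
import xml.etree.ElementTree as ET
from html import escape

def to_html(element: ET.Element) -> str:
    """Render an element and, recursively, all its descendants as nested <li>/<ul> items."""
    text = escape((element.text or "").strip())
    label = escape(element.tag) + (f": {text}" if text else "")
    children = "".join(to_html(child) for child in element)  # the recursive call
    if children:
        return f"<li>{label}<ul>{children}</ul></li>"
    return f"<li>{label}</li>"

book = ET.fromstring(
    "<book><title>Sense and Sensibility</title><author>Jane Austen</author></book>")
print("<ul>" + to_html(book) + "</ul>")
# <ul><li>book<ul><li>title: Sense and Sensibility</li><li>author: Jane Austen</li></ul></li></ul>
```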
Using XSL and XPath
XSL (eXtensible Stylesheet Language) is often considered the best way to convert XML to HTML.
The advantages are that the language is very compact, very sophisticated HTML can be
displayed with relatively small programs, it is easy to re-purpose XML to serve a variety of
purposes, it is non-procedural in that you generally specify only what you wish to accomplish as
opposed to detailed instructions as to how to achieve it, and it greatly reduces or eliminates the
need for recursive functions. The disadvantages are that it requires a very different mindset to
use, and the language is still evolving, so many XSL processors in Web servers are out of
date and newer ones must sometimes be invoked from the DOS command line.
DISPLAYING XML DOCUMENT USING XSL
XSL is a language for expressing style sheets. It consists of two parts:
A language for transforming XML documents (XSLT)
An XML vocabulary for specifying formatting semantics (XSL-FO)
An XSL style sheet specifies the presentation of a class of XML documents by describing how an
instance of the class is transformed into an XML document that uses the formatting vocabulary.
Like a CSS style sheet, an XSL style sheet is linked to an XML document and tells the browser how
to display each of the document's elements. An XML document with an attached style sheet can be
opened directly in Internet Explorer. You don't need to use an HTML page to access and display the data.
The simplest way to try this out is with a CSS style sheet, which is linked to the XML document in
the same way. There are two basic steps for using CSS to display an XML document:
Create the CSS file.
Link the CSS file to the XML document.
CREATING THE CSS FILE
A CSS file is a plain text file with a .css extension that contains a set of rules telling the web browser
how to format and display the elements in a specific XML document. You can create a CSS file
using your favorite text editor, such as Notepad or WordPad, or any other text or HTML editor, as shown
below:
general.css
employees {
  background-color: #ffffff;
  width: 100%;
}
employee {
  display: block;
  margin-bottom: 30pt;
  margin-left: 0;
}
name {
  color: #FF0000;
  font-size: 20pt;
}
city, state, zipcode {
  color: #0000FF;
  font-size: 20pt;
}
LINKING
To link to a style sheet, you use an XML processing instruction to associate the style sheet with the
current document. This statement should occur before the root element of the document.
<?xml-stylesheet type="text/css" href="styles/general.css"?>
The two attributes of the instruction are as follows:
href
The URL for the style sheet.
type
The MIME type of the style sheet being linked, which in this case is text/css (an XSL style sheet
would use text/xsl).
MIME stands for Multipurpose Internet Mail Extensions. It is a standard which defines how to make
systems aware of the type of content being included in e-mail messages.
The CSS file is attached to the XML document as shown below:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!--This xml file represents the details of employees-->
<?xml-stylesheet type="text/css" href="styles/general.css"?>
<employees>
  <employee id="1">
    <name>
      <firstName>Mohit</firstName>
      <lastName>Jain</lastName>
    </name>
    <city>Karnal</city>
    <state>Haryana</state>
    <zipcode>98122</zipcode>
  </employee>
  <employee id="2">
    <name>
      <firstName>Rahul</firstName>
      <lastName>Kapoor</lastName>
    </name>
    <city>Ambala</city>
    <state>Haryana</state>
    <zipcode>98112</zipcode>
  </employee>
</employees>
REWRITING
Let's say you have a proxy running on www.myproxy.com and have proxied the site
www.remotesite.com to the directory /remote. The pages on www.remotesite.com don't know
they are being proxied, which can create some problems with their links. Let's start by looking
at the three different link types.
<a href="myfile.html"> - This link will work
<a href="/myfile.html"> - This link won't work
<a href="http://www.remotesite.com/myfile.html"> - This link won't work
The first link works because it is relative to the current page.
The second link is relative to the root, so the browser will request http://www.myproxy.com/myfile.html. That file isn't found, because only requests in the directory /remote are sent to www.remotesite.com. The link has to be rewritten to point to /remote/myfile.html.
The third link is absolute, so the browser follows it to http://www.remotesite.com/myfile.html. This works, but only if the remote site is visible to the client, and the proxied site is probably an internal server that is not accessible from the outside. The link has to be rewritten to http://www.myproxy.com/remote/myfile.html.
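The three cases can be summarized as a small Python function. This is only a sketch under the assumptions of the example (proxy at www.myproxy.com, remote site mounted at /remote); the function rewrite_link and the constants PROXY_HOST, REMOTE_HOST and MOUNT are hypothetical and not part of the proxy's code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed setup: the proxy at www.myproxy.com serves www.remotesite.com under /remote.
PROXY_HOST = "www.myproxy.com"
REMOTE_HOST = "www.remotesite.com"
MOUNT = "/remote"


def rewrite_link(href):
    """Return the href the browser should see on the proxied page."""
    parts = urlsplit(href)
    if parts.scheme in ("http", "https") and parts.netloc == REMOTE_HOST:
        # Absolute link to the remote site: send it back through the proxy,
        # keeping any query string and fragment.
        return urlunsplit(parts._replace(netloc=PROXY_HOST,
                                         path=MOUNT + (parts.path or "/")))
    if not parts.scheme and not parts.netloc and href.startswith("/"):
        # Root-relative link: prefix the directory the site is mounted on.
        return MOUNT + href
    # Relative links (and links to other hosts) already work and are left alone.
    return href


for link in ("myfile.html", "/myfile.html", "http://www.remotesite.com/myfile.html"):
    print(link, "->", rewrite_link(link))
```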
The rewrite filter
The proxy is built from a filter that proxies all incoming requests. Rewriting is handled by a second, separately supplied filter: the rewrite filter. The proxy filter works perfectly well without the rewrite filter and knows nothing about link rewriting, so the proxy is just as easy to run with rewriting as without it.
How it works
Currently, rewriting is done by scanning HTML, JavaScript and CSS files for links with regular expressions.
The proxy uses regular expressions so that the same kind of parsing can find links in both CSS and HTML. There is one more reason to prefer regular expressions over an XML parser: most pages are not written in XHTML. Because so many pages on the Web are not XML-compatible, a standard XML parser would fail on them. Other options exist, such as javax.swing.text.html, and moving away from regular expressions is being considered for future versions. Such a change would, however, have to bring a measurable performance benefit.
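A minimal Python sketch of regular-expression link rewriting, assuming two simplified patterns: one matches quoted href, src and action attributes in HTML, the other matches url(...) in CSS. The pattern names, rewrite_links and the demo_rewrite rule (prefix root-relative links with /remote) are hypothetical; they are not the proxy's actual expressions, which must cope with many more cases (unquoted attributes, JavaScript strings, and so on).

```python
import re

# Simplified patterns (assumptions): quoted HTML link attributes and CSS url(...).
HTML_LINK = re.compile(r"""\b(href|src|action)\s*=\s*(["'])(.*?)\2""", re.IGNORECASE)
CSS_LINK = re.compile(r"""url\(\s*(["']?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_links(text, rewrite):
    """Apply rewrite() to every link found in HTML attributes and CSS url()."""
    text = HTML_LINK.sub(
        lambda m: f"{m.group(1)}={m.group(2)}{rewrite(m.group(3))}{m.group(2)}", text)
    # The same text is scanned again for CSS links, e.g. inside <style> blocks.
    return CSS_LINK.sub(
        lambda m: f"url({m.group(1)}{rewrite(m.group(2))}{m.group(1)})", text)


def demo_rewrite(href):
    # Hypothetical rule: root-relative links move under /remote.
    if href.startswith("/") and not href.startswith("//"):
        return "/remote" + href
    return href


page = ('<a href="/myfile.html">x</a><img src="pic.png">'
        '<style>body{background:url(/bg.png)}</style>')
print(rewrite_links(page, demo_rewrite))
```

Because both patterns work on raw text, the page does not have to be well-formed XML, which is exactly the trade-off described above.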
Turn on rewrite
web.xml
By default the proxy does not rewrite any links, but you can easily turn rewriting on by adding the rewrite filter. An alternate web.xml with rewriting enabled is supplied with the proxy. The file is called web_rewriting.xml and can be found in TOMCAT_HOME/webapps/J2EP_INSTALL_DIR/WEB-INF/. To enable rewriting, rename web_rewriting.xml to web.xml, overwriting the existing file.
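The rename step can also be scripted. Below is a minimal Python sketch, assuming you pass in the WEB-INF directory; enable_rewriting is a hypothetical helper, and TOMCAT_HOME and J2EP_INSTALL_DIR stay placeholders for your own installation.

```python
import os
from pathlib import Path


def enable_rewriting(web_inf):
    """Rename web_rewriting.xml to web.xml inside the given WEB-INF directory."""
    web_inf = Path(web_inf)
    src = web_inf / "web_rewriting.xml"
    dst = web_inf / "web.xml"
    # os.replace() overwrites an existing web.xml, as the instructions require.
    os.replace(src, dst)
    return dst


# Example call (paths depend on your installation):
# enable_rewriting(Path(os.environ["TOMCAT_HOME"]) / "webapps" / "J2EP_INSTALL_DIR" / "WEB-INF")
```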
data.xml (config file)
Here is the good news: you (almost) don't have to do anything. If you have mapped a site for the proxy, all links except absolute ones will be rewritten. Absolute links are not rewritten by default because you might want to leave them as they are and let the user follow them.
You will probably want to turn absolute link rewriting on, however. To do this, add the attribute isRewriting="true" to the server. Every absolute link found on a page is then checked against the servers mapped in the config; if the link's server is mapped and has isRewriting set to "true", the link is rewritten.
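The decision described above can be sketched in Python. MAPPED_SERVERS is a hypothetical in-memory stand-in for the servers configured in data.xml (host, proxy directory, isRewriting flag); it is not the real file format, and servers that always rewrite, such as RoundRobinCluster, are not modelled.

```python
from urllib.parse import urlsplit, urlunsplit

PROXY_HOST = "www.myproxy.com"  # assumed proxy host

# Hypothetical view of the config: host -> proxy directory and isRewriting flag.
MAPPED_SERVERS = {
    "www.remotesite.com": {"directory": "/remote", "isRewriting": True},
    "intranet.example.com": {"directory": "/intra", "isRewriting": False},
}


def rewrite_absolute(href):
    """Rewrite an absolute link only if its server is mapped with isRewriting on."""
    parts = urlsplit(href)
    server = MAPPED_SERVERS.get(parts.netloc)
    if server is None or not server["isRewriting"]:
        # Unmapped host, or rewriting switched off: the user follows the link as is.
        return href
    return urlunsplit(parts._replace(netloc=PROXY_HOST,
                                     path=server["directory"] + (parts.path or "/")))
```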
Not all servers support isRewriting="true"; for instance, RoundRobinCluster always does rewriting. Consult the documentation of the servers for more information.
Other forms of rewriting
There are two more rewriting issues. First, when the server says that a page has moved and sends the location of the new page, that location has to be rewritten. Second, when the server sends a cookie, the cookie has to be changed so that it is set for the correct directory. The proxy handles both issues without any extra configuration.
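The proxy does this automatically; the Python sketch below only illustrates the idea. It assumes the proxy at www.myproxy.com with www.remotesite.com mounted at /remote. rewrite_location and rewrite_set_cookie are hypothetical helpers that touch only the redirect location and the cookie's Path attribute.

```python
from urllib.parse import urlsplit, urlunsplit

PROXY_HOST = "www.myproxy.com"      # assumed proxy host
REMOTE_HOST = "www.remotesite.com"  # assumed proxied host
MOUNT = "/remote"                   # assumed proxy directory


def rewrite_location(location):
    """Rewrite a redirect target so the browser stays on the proxy."""
    parts = urlsplit(location)
    if parts.netloc == REMOTE_HOST:
        return urlunsplit(parts._replace(netloc=PROXY_HOST,
                                         path=MOUNT + (parts.path or "/")))
    if not parts.netloc and location.startswith("/"):
        return MOUNT + location
    return location


def rewrite_set_cookie(header):
    """Move a Set-Cookie header's Path attribute under the proxy directory."""
    name_value, *attrs = [a.strip() for a in header.split(";")]
    fixed = ["Path=" + MOUNT + a[5:] if a.lower().startswith("path=") else a
             for a in attrs]
    return "; ".join([name_value] + fixed)
```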
HTML, SGML, and XML
First, you should know that SGML (Standard Generalized Markup Language) is the basis for both HTML and XML. SGML is an international standard (ISO 8879) that was published in 1986.
Second, you need to know that XHTML is XML: "XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML 4 with the power of XML."
Third, XML is NOT a language; it is a set of rules for creating XML-based languages. Thus, XHTML 1.0 uses the tags of HTML 4.01 but follows the rules of XML.
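Because XHTML follows the rules of XML, any XML parser can read it, while HTML 4.01 shortcuts such as an unclosed br tag are rejected. A small Python check using the standard-library xml.etree.ElementTree parser; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XHTML = "<p>First line<br />Second line</p>"  # XML rules: every tag is closed
HTML4 = "<p>First line<br>Second line</p>"    # legal HTML 4.01, not well-formed XML


def is_well_formed(markup):
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(XHTML))  # True
print(is_well_formed(HTML4))  # False
```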
The Document
A typical document is made up of three layers: structure, content and style.
Structure
Structure is the document's title, author, paragraphs, topics, chapters, head, body, etc.
Content
Content is the actual information that makes up the title, author, paragraphs, etc.
Style
Style is how the content within the structural elements is displayed, such as font color, type and size, text alignment, etc.
Markup
HTML, SGML, and XML all mark up content using tags. The difference is that SGML and XML mainly deal with the relationship between content and structure: the structural tags that mark up the content are not predefined (you can make up your own language), and style is kept TOTALLY separate. HTML, on the other hand, mixes content marked up with both structural and stylistic tags, and its tags are predefined by the HTML language.
By mixing structure, content and style, you limit yourself to one form of presentation, which in HTML's case means a limited group of browsers for the World Wide Web.
By separating structure and content from style, you can take one file and present it in multiple forms. XML can be transformed into HTML/XHTML and displayed on the Web, or the information can be transformed and published on paper, and the data can be read by any XML-aware browser or application.
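To illustrate one source presented in several forms, here is a minimal Python sketch that renders the same XML data once as HTML and once as plain text. In practice this is usually done with style sheets (CSS or XSL); the plain Python functions to_html and to_text are a hypothetical stand-in chosen only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from html import escape

# Structure and content only: nothing here says how the data should look.
SOURCE = """<employees>
  <employee id="1"><name>Mohit Jain</name><city>Karnal</city></employee>
  <employee id="2"><name>Rahul Kapoor</name><city>Ambala</city></employee>
</employees>"""


def to_html(root):
    """One presentation: an HTML list for a Web browser."""
    items = "".join(
        f"<li><b>{escape(e.findtext('name'))}</b>, {escape(e.findtext('city'))}</li>"
        for e in root.iter("employee"))
    return f"<ul>{items}</ul>"


def to_text(root):
    """Another presentation: plain text, e.g. for a printed report."""
    return "\n".join(f"{e.get('id')}. {e.findtext('name')} ({e.findtext('city')})"
                     for e in root.iter("employee"))


root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_text(root))
```

Neither output format required any change to SOURCE, which is the point of keeping style separate.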
SGML (Standard Generalized Markup Language)
Historically, electronic publishing applications such as Microsoft Word, Adobe PageMaker or QuarkXPress "marked up" documents in a proprietary format that was recognized only by that particular application. The document markup for both structure and style was mixed in with the content and was published to only one medium, the printed page.
These programs and their proprietary markup had no capability to define the appearance of the information for any medium other than paper, and did not describe the actual content of the document very well beyond paragraphs, headings and titles. The file format could not be read or exchanged by other programs; it was useful only within the application that created it.
Because SGML is a non-proprietary international standard, it allows you to create documents that are independent of any specific hardware or software. The document structure (which elements are used and how they relate to each other) is described in a file called the DTD (Document Type Definition). The DTD defines the relationships between a document's elements, creating a consistent, logical structure for each document.
SGML is good for handling large-scale, long-term information management needs and has been
around for more than a decade as the language of defense contractors and the electronic
publishing industry. Because SGML is very large, powerful, and complex, it is hard to learn and understand and is not well suited to the Web environment.
XML (Extensible Markup Language)
XML is a "restricted form of SGML" that removes some of SGML's complexity. Like SGML, XML retains the flexibility of describing customized markup languages with a user-defined document structure (DTD) in a non-proprietary file format for both storage and exchange of text and data, both on and off the Web.
As mentioned before, XML separates structure and content from style, and the structural markup tags can actually describe the content because they can be customized for each XML-based markup language. A good example of this is the Mathematical Markup Language (MathML), an XML application for describing mathematical notation and capturing both its structure and content.
Until MathML, communicating mathematical expressions on the Web was mainly limited to displaying images (JPG or GIF) of the notation or posting the document as a PDF file. MathML allows the information to be displayed on the Web and makes it available for searching, indexing, or reuse in other applications.
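Unlike a JPG or GIF image, a MathML formula is ordinary XML text, so a program can search it. This Python sketch, using the standard-library ElementTree parser, encodes x^2 + 1 in presentation MathML and lists the variable names (mi elements) it contains; the identifiers helper is hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# x^2 + 1 written in presentation MathML.
FORMULA = f"""<math xmlns="{MATHML_NS}">
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <mn>1</mn>
  </mrow>
</math>"""


def identifiers(mathml):
    """List the variable names used in a formula, e.g. for a search index."""
    root = ET.fromstring(mathml)
    # ElementTree names namespaced tags as "{namespace}localname".
    return [mi.text for mi in root.iter(f"{{{MATHML_NS}}}mi")]


print(identifiers(FORMULA))  # ['x']
```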
HTML (Hypertext Markup Language)
HTML is a single, predefined markup language that forces Web designers to use its limiting and lax syntax and structure. The HTML standard was not designed with other platforms in mind, such as Web TVs, mobile phones or PDAs. Its structural markup does little to describe the content beyond paragraph, list, title and heading.
XML breaks the restricting chains of HTML by allowing people to create their own markup
languages for exchanging information. The tags can be descriptive of the content and authors
decide how the document will be displayed using style sheets (CSS and XSL). Because of
XML's consistent syntax and structure, documents can be transformed and published to multiple forms of media, and content can be exchanged with other XML applications.
HTML has played a useful part in the success of the Web, but it has been outgrown, as the Web requires more robust, flexible languages to support its expanding forms of communication and data exchange.
XML will never completely replace SGML, because SGML is still considered better for long-term storage of complex documents. With XHTML 1.0, however, XML took HTML's place as the W3C's recommended markup language for the Web.
Even though XHTML did not make the HTML already on the Web obsolete, HTML 4.01 was at the time intended to be the last version of HTML, with XHTML (an XML application) as the foundation for a universally accessible, device-independent Web. (The W3C later returned to HTML with HTML5, published as a Recommendation in 2014.)
Semantic Web Services, like conventional web services, are the server end of a client-server system for machine-to-machine interaction via the World Wide Web. Semantic services
are a component of the semantic web because they use markup which makes data machine-
readable in a detailed and sophisticated way (as compared with human-readable HTML which is
usually not easily "understood" by computer programs).
WEB ONTOLOGY LANGUAGE
The Web Ontology Language (OWL) is a family of knowledge representation languages, or ontology languages, for authoring ontologies or knowledge bases. The languages are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL is endorsed by the World Wide Web Consortium (W3C) and has attracted academic, medical and commercial interest.
In October 2007, a new W3C working group was started to extend OWL with several new
features as proposed in the OWL 1.1 member submission. W3C announced the new version of
OWL on 27 October 2009. This new version, called OWL 2, soon found its way into semantic
editors such as Protégé and semantic reasoners such as Pellet. The OWL family contains many
species, serializations, syntaxes and specifications with similar names. OWL and OWL2 refer to the 2004 and 2009 specifications, respectively. Full species names include the specification version (for example, OWL2 EL), and "OWL Family" refers to the languages in general.
TYPES OF ONTOLOGIES
Domain ontology - A domain ontology (or domain-specific ontology) models a specific domain, which represents part of the world. It provides the particular meanings that terms have in that domain. For example, the word card has many different meanings. An ontology about the domain of poker would model the "playing card" meaning of the word, while an ontology about the domain of computer hardware would model the "punched card" and "video card" meanings.
Since domain ontologies represent concepts in very specific and often eclectic ways, they are
often incompatible. As systems that rely on domain ontologies expand, they often need to merge
domain ontologies into a more general representation. This presents a challenge to the ontology
designer. Different ontologies in the same domain arise due to different languages, different
intended usage of the ontologies, and different perceptions of the domain (based on cultural
background, education, ideology, etc.).
At present, merging ontologies that are not developed from a common foundation ontology is a
largely manual process and therefore time-consuming and expensive. Domain ontologies that
use the same foundation ontology to provide a set of basic elements with which to specify the
meanings of the domain ontology elements can be merged automatically. There are studies on
generalized techniques for merging ontologies, but this area of research is still largely
theoretical.
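A toy Python sketch of why a shared foundation ontology helps: each sense of "card" in two small domain ontologies is tied to a class in the same upper ontology, so merging reduces to collecting the senses under those shared classes. Everything here (the class names, the dictionaries and merge) is hypothetical and far simpler than real ontology languages such as OWL.

```python
# Hypothetical foundation (upper) ontology: a few very general classes.
UPPER = {"PhysicalObject", "Artifact", "InformationCarrier"}

# Two toy domain ontologies. Each sense of a term points to an upper-ontology
# class; that shared grounding is what allows an automatic merge.
POKER = {"card": [("PlayingCard", "Artifact")]}
HARDWARE = {"card": [("PunchedCard", "InformationCarrier"),
                     ("VideoCard", "Artifact")]}


def merge(*ontologies):
    """Merge domain ontologies that are all grounded in UPPER."""
    merged = {}
    for onto in ontologies:
        for term, senses in onto.items():
            for concept, upper_class in senses:
                if upper_class not in UPPER:
                    # Without a common foundation, this would need manual matching.
                    raise ValueError(f"{concept} is not grounded in the upper ontology")
                merged.setdefault(term, {})[concept] = upper_class
    return merged


print(merge(POKER, HARDWARE))
```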
Upper ontology - An upper ontology (or foundation ontology) is a model of the common objects
that are generally applicable across a wide range of domain ontologies. It employs a core
glossary that contains the terms and associated object descriptions as they are used in various relevant domain sets.
There are several standardized upper ontologies available for use, including Dublin Core, GFO,
OpenCyc/ResearchCyc, SUMO, and DOLCE. WordNet, while considered an upper ontology by
some, is not strictly an ontology. However, it has been employed as a linguistic tool for learning
domain ontologies.
Hybrid ontology - The Gellish ontology is an example of a combination of an upper and a domain ontology.