Post on 01-Apr-2015
www.monash.edu.au
IMS5401 Web-based Systems Development
Topic 2 (cont): Elements of the Web
Ambition versus Achievability
(e) Document formatting - display
(f) Document formatting - searching
Agenda
1. Document formatting
2. Topic 2 (e) Mark-up and page display
3. Topic 2 (f) Mark-up and document searching
4. Implications of document mark-up for web applications and information professionals
Elements of the Web
THE WEB
Connecting computers
Digital representation of documents
Display and organisation of documents
Linking documents
1. Document formatting
• For the writer: need standard capability for formatting and for describing document content
• For the reader: need software which can find documents and can display them as the author intended
• Different types of formatting:
  • Formatting for layout/structure
  • Formatting for appearance/style
  • Formatting for content description
• Note: interpret ‘document’ very broadly
The document formatting problem: Display
[Diagram: the creator of a document writes and formats it (document format: content, layout, style); the audience reads it through a document reader (document appearance: content, layout, style).]
The document formatting problem: Searching
[Diagram: creators of documents describe the content of their documents (Doc 1, Doc 2, Doc 3, ... Doc n); audiences use a document reader to select the document with contents best suited to their information needs, and then read it.]
Needs for formatting for display
• Handle documents with content of all types - text, graphics, photos, sound, film, etc
• Support all possible document display needs
• Device independence for writer and reader:
  • Writer should be sure that his formatted document will be able to be read by anyone;
  • Reader should be sure that he can read any document;
  • Both reader and writer should feel sure that the reader sees the document exactly as the writer intended it
Needs for formatting for searching
• Can readers find the relevant documents which they need?
• Can authors ensure their documents are described and classified appropriately?
• The biggest library in the world is useless if the content is not organised and classified so people can find what they want
2. Topic 2(e) Mark-up languages and document display: Brief history
• GML (late 1960s, IBM)
• SGML (late 1970s-mid 80s)
• HTML (1990)
• CSS (mid 1990s)
• XHTML (late 1990s)
• XML (late 1990s)
What is a Mark-up Language?
• A set of standard formatting symbols, incorporated into a document to direct the computer how to format and display it
• Can be used to describe:
  • document structure/layout (ie headings, paragraphs, titles, etc)
  • text appearance (fonts, typeface, etc)
  • style sheets (templates, etc)
• (Can also include metadata for contents; see next section)
Features of a hypertext mark-up language
• Tags - all document elements are marked by tags delimited by angle brackets; eg <p> means start paragraph; </p> means end paragraph
• Attributes - some tags can have attributes which specify extra detail; eg <p align="center"> means start a new paragraph and centre it on the page
• Links - the anchor tag allows you to link an element in the document to a location in another web-based document; eg <a href="http://www.sims.monash.edu.au">
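The three features above (tags, attributes, links) can be seen by letting a program read a small piece of mark-up. The following is a minimal sketch using Python's standard `html.parser` module; the class name `TagLister` and the snippet itself are made up for illustration and are not part of the lecture material.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects start tags, their attributes, and link targets."""

    def __init__(self):
        super().__init__()
        self.start_tags = []   # tag names in the order they open
        self.attributes = []   # (tag, attribute name, attribute value)
        self.links = []        # href values found on anchor tags

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        for name, value in attrs:
            self.attributes.append((tag, name, value))
            if tag == "a" and name == "href":
                self.links.append(value)


# Hypothetical snippet combining a tag, an attribute and a link.
snippet = (
    '<p align="center">Welcome</p>'
    '<p>See <a href="http://www.sims.monash.edu.au">the school site</a>.</p>'
)

lister = TagLister()
lister.feed(snippet)
lister.close()

print(lister.start_tags)
print(lister.attributes)
print(lister.links)
```

The parser reports each tag as it opens, so the program sees the same structure a browser would use to decide how to display the text.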
A sample of a ML: Some formatted text ...
Test sample of a Mark-up Language
This brief piece of text aims to show how a mark-up language works by including tags which show how the document should be displayed by the computer.
Note how this text has been formatted to include features such as:
Headers/titles
Paragraphs;
Italics and Bolding
multiple fonts
Spacing and text alignment;
... and the HTML tags to format it
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="Microsoft Word 97/98">
<TITLE>Test sample of a Mark-up Language</TITLE>
</HEAD>
<BODY>
<B><FONT FACE="Times" SIZE=5><P ALIGN="CENTER">Test sample of a Mark-up Language</P></B>
<P>This brief piece of text aims to show how a mark-up language works by including tags which show how the document should be displayed by the computer. </P>
<P>Note how this text has been formatted to include features such as:</P>
<P>Headers/titles</P>
<P>Paragraphs;</P>
<I><P>Italics and </I><B>Bolding</P></FONT>
<FONT FACE="Avant Garde" SIZE=5><P ALIGN="CENTER">multiple</FONT><FONT FACE="Times" SIZE=5> </FONT><FONT FACE="Chicago" SIZE=5>fonts</P></B></FONT>
<FONT FACE="Times" SIZE=5><P ALIGN="RIGHT">Spacing and text alignment;</P></FONT>
</BODY>
</HTML>
Evolution: From GML to HTML
• GML (Generalized Markup Language): The original mark-up language; developed at IBM to cope with problems of multiple document formats
• SGML (Standard Generalized Markup Language) - an international standard for mark-up based on GML
• HTML (Hypertext Mark-up Language) - a ‘quick and dirty’ mark-up language developed by Tim Berners-Lee for formatting documents for display on his web
HTML: Hypertext Mark-up Language
• The original web ML
• Grew from a very simple Version 1.0 (only 20 tags to format layout) to a large and complex ML
• Four main ‘standard’ versions:
  • 2.0 (1994);
  • 3.2 (1996);
  • 4.0 (1997);
  • 4.01 (1999)
• Initial simplicity lost as new formatting capabilities were added; many ‘non-standard’ tags
• No more up-grades after 4.01; replaced by XHTML
Dealing with non-text elements of documents
• HTML permits the inclusion of graphics, sound, video, animation, program scripts, etc as objects within the document
• Standard file formats have evolved - covered in lecture on digital representation
• Inclusion of program-like objects covered in lecture on interactivity
Improving on HTML: Cascading Style Sheets (CSS)
• Developed to make up for inadequacies of HTML for controlling document appearance/style
• Works like templates and style sheets in Word, Powerpoint, etc
• Enables definition of standard text style - fonts, typeface, etc
• Followed by other variants of HTML also developed to improve it, fix it, enhance it, etc
• DHTML, MathML, VRML, etc
Making HTML extensible
• What if I want to write a document which needs a special type of formatting?
  • Work around it within HTML?
  • Get new tags built into HTML?
  • Build a new specialised ML?
• The concept of the extensible language
• XHTML: A sort of HTML 5.0, but now with extensibility built in - allows you to create and define your own tags
• XML (see next section)
‘Normal’ technology for computer-based document formatting
• Document creation software (eg MS-Word) puts formatting instructions in the document
• Document is read with the same software that the writer used to create it (eg MS-Word)
• Formatting controls are unique to each software package (copyright)
• Software companies sometimes won’t support the formats used by other rival companies
• ‘Neutral’ formats (eg .rtf, .txt, etc) are more widely accessible but may lose formatting
Technology for formatting web documents: Writing HTML
• Initially, all HTML formatting done by hand in text editors
• Then, specialist HTML composers developed - Front Page, Dreamweaver, etc
• Now, even MS-Word can generate an HTML file
• ‘Quality’ of HTML from composer packages is very variable; compliance with W3C standard?
• Often need to ‘patch’ composer code
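One common quality problem in generated HTML is tags that close in the wrong order. The sketch below is a rough check for that, built on Python's standard `html.parser`; it is not a W3C validator, and the names `NestingChecker`, `check_nesting` and `VOID_TAGS` are invented for this example. The test fragment is modelled on the kind of output a word processor might produce.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, enough for this sketch).
VOID_TAGS = {"meta", "br", "img", "hr", "link", "input"}


class NestingChecker(HTMLParser):
    """Reports closing tags that do not match the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags opened but not yet closed
        self.problems = []    # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        self.problems.append(f"</{tag}> closes out of order")
        if tag in self.open_tags:
            # Drop the most recent matching entry so checking can continue.
            position = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[position]


def check_nesting(html):
    """Return a list of nesting problems found in the given mark-up."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


# Hypothetical fragment in the style of composer-generated HTML.
word_fragment = "<I><P>Italics and </I><B>Bolding</P>"

print(check_nesting("<p><i>ok</i></p>"))
print(check_nesting(word_fragment))
```

A check like this only finds one class of problem; real compliance checking against a W3C standard also involves allowed elements, attributes and document types.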
Technology for displaying formatted web documents: Browsers
• Berners-Lee’s browser for displaying HTML-formatted pages
• Mosaic, Netscape, IE, etc
• The inter-relationship between browsers and HTML
• Plug-ins
• Error-handling with HTML
• Proposed error-handling with XML
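The difference between forgiving HTML error-handling and strict XML error-handling can be sketched with two parsers from Python's standard library. This is only an analogy for browser behaviour, not a model of any particular browser; the function names `lenient_text` and `strict_text` are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Gathers the text content of a document, ignoring tag errors."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


def lenient_text(markup):
    """HTML-style handling: carry on and show whatever text can be recovered."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.text)


def strict_text(markup):
    """XML-style handling: refuse the whole document if it is not well-formed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        return f"rejected: {err}"
    return "".join(root.itertext())


# Tags closed in the wrong order.
broken = "<p><i>Italics and </p></i>"

print(lenient_text(broken))
print(strict_text(broken))
print(strict_text("<p><i>Italics and </i></p>"))
```

The lenient reader still shows the text, which is convenient for the reader but hides the author's mistake; the strict reader forces the author to fix the mark-up before anything is displayed.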
3. Topic 2(f): Mark-up for document searching
• Imagine a library without a catalogue; how would you find things in it?
• Imagine a librarian without a cataloguing system; how would you know where to put everything?
• The web can be thought of as a vast library of documents
• Where is its librarian and cataloguing system?!
• So how do we find things in it? The need for metadata
Metadata
• Metadata = data about data. Metadata for a document is data about the document and its contents
• Relate to:
  • Book indexes; library catalogues, etc
  • data dictionary entries, entity attributes, etc
• Usefulness for the reader:
  • To help me find a document; to help me find the ‘best’ document for my needs
• Usefulness for the author:
  • To help users find my document; to make it clear what my document is about
Indexing documents and metadata
• Ideally every document should include its metadata
• Metadata may need to be about:
  • the document itself: eg author, title, date created, version, etc
  • the document’s content: eg content description, topics, related documents, etc; document elements may also need metadata - eg source, creator, copyright, etc
• Different sorts of documents/document elements have different sorts of relevant metadata
• Can we develop universal metadata formats?
Mark-up Languages and metadata
• Metadata was always considered an important issue in ML development (SGML, etc) …
• … but it was not a big deal in the original development of an ML for the web (remember Berners-Lee’s original limited vision of the web)
• Therefore, HTML had very primitive metadata capabilities
Metadata and HTML
• HTML allowed metadata to be contained in two main tags:
  • <Head> (info about document)
  • <Meta> (information about content)
• Primitive capabilities
• Not a required element of the document
• No standards for how metadata should be managed within the tags
• Ignored by most HTML authors
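To see how little structure this metadata has, here is a minimal sketch that pulls the title and the meta entries out of a page head using Python's standard `html.parser`. The class name `HeadMetadata` is invented, and the page is hypothetical: the "Generator" entry mirrors the sample shown earlier, while the "keywords" entry is an assumption added for illustration.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collects the document title and any name/content pairs in meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Authors are free to use any name, or none at all.
            key = attrs.get("name") or attrs.get("http-equiv")
            if key:
                self.meta[key.lower()] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page; the keywords entry is an assumption for this example.
page = (
    "<HTML><HEAD>"
    '<META NAME="Generator" CONTENT="Microsoft Word 97/98">'
    '<META NAME="keywords" CONTENT="mark-up, HTML, metadata">'
    "<TITLE>Test sample of a Mark-up Language</TITLE>"
    "</HEAD><BODY><P>Text</P></BODY></HTML>"
)

parser = HeadMetadata()
parser.feed(page)
parser.close()

print(parser.title)
print(parser.meta)
```

Nothing in HTML requires these entries to exist or fixes what names they use, which is why a search tool cannot rely on them.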
Searching methods and metadata: finding documents on the web
• What do I want when I search for documents?
  • All documents relevant to my query (rigour, usually called recall)
  • No documents which are irrelevant to my query (precision)
  • A ranked list in order of relevance
  • A ranked list in order of ‘quality’ (validity, currency, size, …?)
• (Note the contradictions inherent in these needs)
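The tension between finding everything relevant and finding nothing irrelevant can be put into numbers. The sketch below uses the usual information-retrieval measures of precision and recall (what the slide calls rigour); the document names and the two result sets are invented for illustration.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one search result.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the query.
relevant_docs = {"doc1", "doc2", "doc3", "doc4"}

# A cautious search returns few documents, all of them relevant.
narrow_result = ["doc1", "doc2"]

# A broad search returns every relevant document plus four irrelevant ones.
broad_result = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"]

print(precision_and_recall(narrow_result, relevant_docs))
print(precision_and_recall(broad_result, relevant_docs))
```

In this made-up case the narrow search is precise but misses half of the relevant documents, while the broad search finds them all at the cost of as many irrelevant hits.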
Possible search/indexing methods
• “Brute force”
• Usage or linkage-based
• Librarian-based
• Author-based
• (See tutorial resources for detailed explanations)
“Brute force” search/indexing
• The basis of most early web search engines:
  • Index every word in the document
  • Determine relevance and ranking from word frequency and position
• No cataloguing required
• Can be completely automated and therefore very cheap
• Gives lots of ‘hits’ for almost any word, but very inaccurate and imprecise
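A brute-force index of this kind can be sketched in a few lines. The version below indexes every word and ranks by word frequency only; word position is ignored to keep it short, and it is not a description of how any real search engine worked. The three documents are invented to show why the results are imprecise.

```python
import re
from collections import Counter, defaultdict


def build_index(documents):
    """Map every word to the documents containing it, with a count per document."""
    index = defaultdict(Counter)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word][name] += 1
    return index


def search(index, word):
    """Return document names ranked by how often the word occurs in each."""
    counts = index.get(word.lower(), Counter())
    return [name for name, _ in counts.most_common()]


# Hypothetical documents: the same word with three unrelated meanings.
docs = {
    "doc1": "Java is an island in Indonesia",
    "doc2": "Java programming: learn Java, write Java",
    "doc3": "Coffee from Java",
}

index = build_index(docs)
results = search(index, "java")
print(results)
```

Every document containing the word is a ‘hit’, and the one that repeats it most is ranked first, whether or not it is what the searcher meant.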
Usage and linkage-based search/indexing methods
• Initially most famously used by Google; now very common:
  • Identify document content by its associated links
  • Measure document quality by its popularity or by the number of links to it
• No cataloguing required
• Can be completely automated and therefore very cheap
• Accuracy is variable; coverage is incomplete
• Gives lots of hits, but patchy results
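The idea of measuring quality by the number of links to a document can be sketched as a simple count of inbound links. This is a deliberately naive illustration, not Google's actual ranking method; the function name `rank_by_inbound_links` and the four-page web are invented.

```python
from collections import Counter


def rank_by_inbound_links(link_graph):
    """Rank pages by how many other pages link to them (most linked first).

    link_graph maps each page to the list of pages it links to.
    Self-links and repeated links from the same page are counted once or not at all.
    """
    inbound = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    # Ties are broken alphabetically so the order is predictable.
    return sorted(inbound, key=lambda page: (-inbound[page], page))


# Hypothetical web of four pages.
web = {
    "a.html": ["c.html"],
    "b.html": ["c.html", "a.html"],
    "c.html": [],
    "d.html": ["c.html"],
}

ranking = rank_by_inbound_links(web)
print(ranking)
```

No one has to catalogue anything, but a new or little-known page with excellent content starts at the bottom, which is one source of the patchy results.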
Directory-based search/indexing methods
• Initially used by Yahoo:
  • ‘Librarian’ views site and identifies its content (indexes) and quality (relevance and ranking)
  • ‘Librarian’ includes site in catalogue of sites
• Labour-intensive and therefore expensive
• Relies on librarian’s expertise and judgement of content
• Generally accurate and precise, but requires skilled librarians
Author-based search/indexing methods
• Basis for metadata systems:
  • Document author includes document and content information in the document
  • Indexes are based on author-supplied information
• Uses only the author’s time; therefore cheap for document distributor
• Relies on author’s understanding of indexing and metadata
• As accurate and precise as the document author makes it
XML: Extensible Mark-up Language
• Developed by W3C and others as the ‘big one’ - “the universal format for structured documents and data on the web”
• Key concern is metadata, but also aims to provide a framework within which display formats such as HTML, CSS, etc can sit as specialised languages
• Provides a framework within which users can create and define their own mark-up tags for specialist applications (hence “extensible”)
• Based on SGML, but guided by experience with HTML
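As a rough illustration of what “extensible” means, the sketch below reads a small XML record whose tags describe content instead of appearance. The tag names (`lecture`, `title`, `topic`) and the record are invented for this example and do not come from any published standard; the parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tag names are made up for this example.
record = """
<lecture code="IMS5401">
  <title>Elements of the Web</title>
  <topic>Document formatting</topic>
  <topic>Metadata</topic>
</lecture>
""".strip()

root = ET.fromstring(record)

# Because the tags say what the content is, a program can pick it out directly.
summary = {
    "code": root.get("code"),
    "title": root.findtext("title"),
    "topics": [topic.text for topic in root.findall("topic")],
}

print(summary)
```

The same record says nothing about fonts or alignment; display would be handled separately, which is the split between content description and appearance that the slide describes.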
The role of other MLs after XML
• HTML, CSS, etc are so widely-used that they must still be supported
• Future versions will be (should be!) sub-sets of XML; eg after HTML 4.01, next version was XHTML, etc
• New specialist standards and formats developed to live under XML and as a bridge back to old formats
• Move to ‘tighter’, less forgiving mark-up interpreters to ensure compliance
XML and metadata standards
• For XML to extend to cover all document types, it needs to have standards for these document types
• Hence, various standards are being developed within the XML framework to try to define the ‘standard’ document elements for different document types
• Leads us to the Semantic web and web services (see later lecture)
4. Implications of mark-up for web applications
• The growing complexity of mark-up languages and browsers
• The merging (collision?) of form (document appearance and display) with function (document description and searching)
• What matters most for your application?
• How easily can it be implemented in a way which readers/users can deal with?
Initial Vision
[Diagram]
Document end: developer creates with a standard mark-up language (HTML)
User end: user reads with a standard browser (Mosaic)
Actuality
[Diagram]
Document end: developer creates with HTML, DHTML, XHTML, XML + other media + scripts, using composer tools which generate the marked-up document (Dreamweaver, Front Page, CMS, etc) and scripting languages
User end: user accesses with Netscape, Internet Explorer, Opera, etc to display ‘standard’ media, + plug-ins to display other media (Acrobat, Real Player, etc)
Developing around the limits of HTML/XML and browsers
• Standards for all document formatting needs?
• Support for standards by the big industry players?
• Technical expertise needed in order to publish material which displays properly on the web?
• Technical expertise needed to find and read material on the web?
• Web access from non-computer-screen devices?
Ambition and Achievability (1)
• Can a single formatting standard satisfy every document display and content description requirement?
• Can the HTML web be turned into the XML web?
• Conceptual simplicity vs practical complexity
• See Jon Bosak’s remarks (next slide)
Ambition and Achievability (2): Jon Bosak on mark-up, linking and display
• Early visionaries went further than adopters would follow
• Breakthroughs came from newcomers who simplified earlier advanced techniques
• Original more complex work is now being re-introduced and seen as necessary
• Biggest roadblock to success with more advanced solutions is the success of the simple limited solutions
(Prolog to: Goldfarb, C (2004) ‘XML handbook’, 5th ed)
What does this mean for information professionals?
• Learning about HTML tags?
• Learning about browser capabilities and differences?
• Learning about search engines?
• Learning about metadata?
• Learning about XML?
• Web services and the semantic web