www.monash.edu.au IMS5401 Web-based Systems Development Topic 2 (cont): Elements of the Web Ambition versus Achievability (e) Document formatting - display (f) Document formatting - searching

Upload: jenifer-rawlings

Post on 01-Apr-2015


www.monash.edu.au

IMS5401 Web-based Systems Development

Topic 2 (cont): Elements of the Web

Ambition versus Achievability

(e) Document formatting - display

(f) Document formatting - searching

Agenda

1. Document formatting

2. Topic 2 (e) Mark-up and page display

3. Topic 2 (f) Mark-up and document searching

4. Implications of document mark-up for web applications and information professionals

Elements of the Web

THE WEB

Connecting computers

Digital representation of documents

Display and organisation of documents

Linking documents

1. Document formatting

• For the writer: need standard capability for formatting and for describing document content

• For the reader: need software which can find documents and can display them as the author intended

• Different types of formatting:
  • Formatting for layout/structure
  • Formatting for appearance/style
  • Formatting for content description

• Note: interpret ‘document’ very broadly

The document formatting problem: Display

[Diagram: the creator of a document writes and formats it (document format: content, layout, style); the audience reads it through a document reader (document appearance: content, layout, style).]

The document formatting problem: Searching

[Diagram: creators of documents describe the content of their documents (Doc 1, Doc 2, ..., Doc n); audiences use a document reader to select and read the document with contents best suited to their information needs (here, Doc 3).]

Needs for formatting for display

• Handle documents with content of all types - text, graphics, photos, sound, film, etc

• Support all possible document display needs

• Device independence for writer and reader:
  • The writer should be sure that the formatted document can be read by anyone;
  • The reader should be sure of being able to read any document;
  • Both reader and writer should feel sure that the reader sees the document exactly as the writer intended it

Needs for formatting for searching

• Can readers find the relevant documents which they need?

• Can authors ensure their documents are described and classified appropriately?

• The biggest library in the world is useless if the content is not organised and classified so people can find what they want

2. Topic 2(e) Mark-up languages and document display: Brief history

• GML (late 1960s, IBM)
• SGML (late 1970s-mid 80s)
• HTML (1990)
• CSS (mid 1990s)
• XHTML (late 1990s)
• XML (late 1990s)

What is a Mark-up Language?

• A set of standard formatting symbols, incorporated into a document to direct the computer how to format and display it

• Can be used to describe:
  • document structure/layout (ie headings, paragraphs, titles, etc)
  • text appearance (fonts, typeface, etc)
  • style sheets (templates, etc)

• (Can also include metadata for contents; see next section)

Features of a hypertext mark-up language

• Tags - all document elements are marked by tags delimited by angle brackets; eg <p> means start paragraph; </p> means end paragraph

• Attributes - some tags can have attributes which specify extra detail; eg <p align="center"> means start a new paragraph and centre it on the page

• Links - the anchor tag allows you to link an element in the document to a location in another web-based document; eg <a href="http://www.sims.monash.edu.au">
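
The three features above (tags, attributes, links) can be picked out of a page mechanically. The following is a minimal sketch in Python using the standard-library html.parser module; the class name TagLister and the SAMPLE page are assumptions made up for this illustration and are not part of the lecture material.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects the start tags, attributes and link targets found in a page."""

    def __init__(self):
        super().__init__()
        self.tags = []   # every start tag with its attributes, in document order
        self.links = []  # href values taken from anchor (<a>) tags

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [("align", "center")]
        attributes = dict(attrs)
        self.tags.append((tag, attributes))
        if tag == "a" and attributes.get("href") is not None:
            self.links.append(attributes["href"])


# Hypothetical sample page, written only for this illustration.
SAMPLE = (
    '<p align="center">Welcome</p>'
    '<p>Visit <a href="http://www.sims.monash.edu.au">SIMS</a></p>'
)

parser = TagLister()
parser.feed(SAMPLE)
parser.close()

print(parser.tags)   # tags with their attributes
print(parser.links)  # link targets found in anchor tags
```

Note that the parser reports tag and attribute names in lower case, whatever case the author typed them in.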

A sample of a ML: Some formatted text ...

Test sample of a Mark-up Language

This brief piece of text aims to show how a mark-up language works by including tags which show how the document should be displayed by the computer.

Note how this text has been formatted to include features such as:

Headers/titles

Paragraphs;

Italics and Bolding

multiple fonts

Spacing and text alignment;

... and the HTML tags to format it

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="Microsoft Word 97/98">
<TITLE>Test sample of a Mark-up Language</TITLE>
</HEAD>
<BODY>
<B><FONT FACE="Times" SIZE=5><P ALIGN="CENTER">Test sample of a Mark-up Language</P></B>
<P>This brief piece of text aims to show how a mark-up language works by including tags which show how the document should be displayed by the computer. </P>
<P>Note how this text has been formatted to include features such as:</P>
<P>Headers/titles</P>
<P>Paragraphs;</P>
<I><P>Italics and </I><B>Bolding</P></FONT>
<FONT FACE="Avant Garde" SIZE=5><P ALIGN="CENTER">multiple</FONT><FONT FACE="Times" SIZE=5> </FONT><FONT FACE="Chicago" SIZE=5>fonts</P></B></FONT>
<FONT FACE="Times" SIZE=5><P ALIGN="RIGHT">Spacing and text alignment;</P></FONT>
</BODY>
</HTML>

Evolution: From GML to HTML

• GML (Generalized Markup Language): The original mark-up language; developed in IBM to cope with problems of multiple document formats

• SGML (Standard Generalized Markup Language) - an international standard for mark-up based on GML

• HTML (Hypertext Mark-up Language) - a ‘quick and dirty’ mark-up language developed by Tim Berners-Lee for formatting documents for display on his web

HTML: Hypertext Mark-up Language

• The original web ML

• Grew from a very simple Version 1.0 (only 20 tags to format layout) to a large and complex ML

• Four main ‘standard’ versions:
  • 2.0 (1995)
  • 3.2 (1997)
  • 4.0 (1997)
  • 4.01 (1999)

• Initial simplicity lost as new formatting capabilities were added; many ‘non-standard’ tags

• No more up-grades after 4.01; replaced by XHTML

Dealing with non-text elements of documents

• HTML permits the inclusion of graphics, sound, video, animation, program scripts, etc as objects within the document

• Standard file formats have evolved - covered in lecture on digital representation

• Inclusion of program-like objects covered in lecture on interactivity

Improving on HTML: Cascading Style Sheets (CSS)

• Developed to make up for inadequacies of HTML for controlling document appearance/style

• Works like templates and style sheets in Word, Powerpoint, etc

• Enables definition of standard text style - fonts, typeface, etc

• Followed by other HTML-related languages developed to improve, fix or enhance it:
  • DHTML, MathML, VRML, etc

Making HTML extensible

• What if I want to write a document which needs a special type of formatting?

  • Work around it within HTML?
  • Get new tags built into HTML?
  • Build a new specialised ML?

• The concept of the extensible language

• XHTML: in effect the next version after HTML 4.01, but now with extensibility built in - allows you to create and define your own tags

• XML (see next section)

‘Normal’ technology for computer-based document formatting

• Document creation software (eg MS-Word) puts formatting instructions in the document

• Document is read with the same software that the writer used to create it (eg MS-Word)

• Formatting controls are unique (and proprietary) to each software package

• Software companies sometimes won’t support the formats used by other rival companies

• ‘Neutral’ formats (eg .rtf, .txt, etc) are more widely accessible but may lose formatting

Technology for formatting web documents: Writing HTML

• Initially, all HTML formatting done by hand in text editors

• Then, specialist HTML composers developed - Front Page, Dreamweaver, etc

• Now, even MS-Word can generate an HTML file

• ‘Quality’ of HTML from composer packages is very variable; compliance with W3C standard?

• Often need to ‘patch’ composer code

Technology for displaying formatted web documents: Browsers

• Berners-Lee’s browser for displaying HTML-formatted pages

• Mosaic, Netscape, IE, etc

• The inter-relationship between browsers and HTML

• Plug-ins

• Error-handling with HTML

• Proposed error-handling with XML

3. Topic 2(f): Mark-up for document searching

• Imagine a library without a catalogue; how would you find things in it?

• Imagine a librarian without a cataloguing system; how would you know where to put everything?

• The web can be thought of as a vast library of documents

• Where is its librarian and cataloguing system?!

• So how do we find things in it? The need for metadata

Metadata

• Metadata = data about data. Metadata for a document is data about the document and its contents

• Relate to:
  • Book indexes, library catalogues, etc
  • Data dictionary entries, entity attributes, etc

• Usefulness for the reader:
  • To help me find a document; to help me find the ‘best’ document for my needs

• Usefulness for the author:
  • To help users find my document; to make it clear what my document is about

Indexing documents and metadata

• Ideally every document should include its metadata

• Metadata may need to be about:
  • the document itself: eg author, title, date created, version, etc
  • the document’s content: eg content description, topics, related documents, etc; document elements may also need metadata - eg source, creator, copyright, etc

• Different sorts of documents/document elements have different sorts of relevant metadata

• Can we develop universal metadata formats?

Mark-up Languages and metadata

• Metadata was always considered an important issue in ML development (SGML, etc) …

• … but it was not a big deal in the original development of an ML for the web (remember Berners-Lee’s original limited vision of the web)

• Therefore, HTML had very primitive metadata capabilities

Metadata and HTML

• HTML allowed metadata to be contained in two main tags:
  • <head> (information about the document, eg its <title>)
  • <meta> (information about content, eg keywords and description; placed inside <head>)

• Primitive capabilities

• Not a required element of the document

• No standards for how metadata should be managed within the tags

• Ignored by most HTML authors
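
To show how little structure this gives an indexing program to work with, here is a minimal Python sketch that pulls the title and the name/content pairs of meta tags out of a page. The MetadataReader class, the read_metadata helper and the keywords line in the sample page are assumptions for illustration only; the Generator line is taken from the Word-generated sample earlier in this lecture.

```python
from html.parser import HTMLParser


class MetadataReader(HTMLParser):
    """Reads the <title> text and any <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            # Names are free text chosen by the author, so normalise their case.
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_metadata(html_text):
    """Return (title, meta dictionary) for one page."""
    reader = MetadataReader()
    reader.feed(html_text)
    reader.close()
    return reader.title.strip(), reader.meta


# Sample page; the keywords entry is hypothetical.
PAGE = """<html><head>
<title>Test sample of a Mark-up Language</title>
<meta name="Generator" content="Microsoft Word 97/98">
<meta name="keywords" content="mark-up, HTML, metadata">
</head><body><p>Body text.</p></body></html>"""

title, meta = read_metadata(PAGE)
print(title)
print(meta)
```

Nothing in the page forces the author to supply either meta line, and nothing constrains what is written in them, which is the weakness the slide points to.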

Searching methods and metadata: finding documents on the web

• What do I want when I search for documents?
  • All documents relevant to my query (recall)
  • No documents which are irrelevant to my query (precision)
  • A ranked list in order of relevance
  • A ranked list in order of ‘quality’ (validity, currency, size, …?)

• (Note the contradictions inherent in these needs)
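
The tension between finding everything relevant (usually called recall in information retrieval) and finding nothing irrelevant (precision) can be made concrete with a small calculation. This is a sketch only; the document names and the two result lists are invented for the example.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of the retrieved documents that are relevant
    recall    = share of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the query.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# A cautious search returns few documents; a broad search returns many.
narrow = ["doc1", "doc2"]
broad = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"]

print(precision_and_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_and_recall(broad, relevant))   # (0.5, 1.0)
```

The narrow search returns nothing irrelevant but misses half of what exists; the broad search finds everything but half of its results are noise.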

Possible search/indexing methods

• “Brute force”

• Usage or linkage-based

• Librarian-based

• Author-based

• (See tutorial resources for detailed explanations)

“Brute force” search/indexing

• The basis of most early web search engines:
  • Index every word in the document
  • Determine relevance and ranking from word frequency and position

• No cataloguing required

• Can be completely automated and therefore very cheap

• Gives lots of ‘hits’ for almost any word, but very inaccurate and imprecise
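
A minimal Python sketch of the brute-force idea follows. It indexes every word and ranks by word frequency only (word position is left out to keep it short), and it is not a description of how any particular search engine worked. The function names and the three sample documents are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-case words; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to {document id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, query):
    """Rank documents by how often they contain the query words."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


# Hypothetical documents.
DOCS = {
    "doc1": "Jaguar cars: the jaguar is a British car",
    "doc2": "The jaguar is a big cat; the jaguar hunts at night; a jaguar swims",
    "doc3": "HTML is a mark-up language",
}

index = build_index(DOCS)
print(search(index, "jaguar"))  # ['doc2', 'doc1']
```

A reader looking for the car gets the page about the cat first, simply because that page repeats the word more often: plenty of hits, little precision.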

Usage and linkage-based search/indexing methods

• Initially most famously used by Google; now very common:

  • Identify document content by its associated links
  • Measure document quality by its popularity or by the number of links to it

• No cataloguing required

• Can be completely automated and therefore very cheap

• Accuracy is variable; coverage is incomplete

• Gives lots of hits, but patchy results
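
The simplest version of "measure quality by the number of links to it" is to count inbound links. The Python sketch below does only that; it is a deliberately crude stand-in and not the algorithm used by Google or any other engine. The page names and the link graph are invented for the example.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count, for every page, how many other pages link to it.

    link_graph maps each page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):   # repeated links from one page count once
            if target != source:      # a page cannot vote for itself
                counts[target] += 1
    return counts


def rank_by_links(link_graph):
    """Order pages from most linked-to to least; ties broken by name."""
    counts = inbound_link_counts(link_graph)
    return sorted(counts, key=lambda page: (-counts[page], page))


# Hypothetical four-page web.
WEB = {
    "a.html": ["c.html"],
    "b.html": ["c.html", "a.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

print(inbound_link_counts(WEB)["c.html"])  # 3
print(rank_by_links(WEB))  # ['c.html', 'a.html', 'b.html', 'd.html']
```

Pages that nobody links to (here b.html and d.html) sink to the bottom regardless of their content, which is one reason the slide describes coverage as incomplete and results as patchy.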

Directory-based search/indexing methods

• Initially used by Yahoo:
  • ‘Librarian’ views site and identifies its content (indexes) and quality (relevance and ranking)
  • ‘Librarian’ includes site in catalogue of sites

• Labour-intensive and therefore expensive

• Relies on librarian’s expertise and judgement of content

• Generally accurate and precise, but requires skilled librarians

Author-based search/indexing methods

• Basis for metadata systems:
  • Document author includes document and content information in the document
  • Indexes are based on author-supplied information

• Uses only the author’s time; therefore cheap for document distributor

• Relies on author’s understanding of indexing and metadata

• As accurate and precise as the document author makes it

XML: Extensible Mark-up Language

• Developed by W3C and others as the ‘big one’ - “the universal format for structured documents and data on the web”

• Key concern is metadata, but also aims to provide a framework within which display formats such as HTML, CSS, etc can sit as specialised languages

• Provides a framework within which users can create and define their own mark-up tags for specialist applications (hence “extensible”)

• Based on SGML, but guided by experience with HTML
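
As a small illustration of "extensible", the Python sketch below reads a record marked up with tags invented for the purpose. The tag names lecture, title and topic and the code attribute are hypothetical and belong to no published standard; the point is only that a generic XML parser can read them without having been told about them in advance.

```python
import xml.etree.ElementTree as ET

# A record using made-up tags that describe content rather than appearance.
RECORD = """<lecture code="IMS5401">
  <title>Elements of the Web</title>
  <topic>Document formatting - display</topic>
  <topic>Document formatting - searching</topic>
</lecture>"""

root = ET.fromstring(RECORD)

code = root.get("code")                           # attribute on the root element
title = root.findtext("title")                    # text of the first <title> child
topics = [t.text for t in root.findall("topic")]  # text of every <topic> child

print(code, title, topics)
```

Because the tags say what each piece of content is, a search tool could index the topics directly instead of guessing them from word frequency.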

The role of other MLs after XML

• HTML, CSS, etc are so widely-used that they must still be supported

• Future versions will be (should be!) defined as applications of XML; eg after HTML 4.01, next version was XHTML, etc

• New specialist standards and formats developed to live under XML and as a bridge back to old formats

• Move to ‘tighter’, less forgiving mark-up interpreters to ensure compliance
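
The difference between a forgiving and a ‘tight’ interpreter can be sketched in Python with two standard-library parsers. Here html.parser stands in for a forgiving browser and xml.etree.ElementTree for a strict XML processor; real browsers behave in more elaborate ways, so this is an analogy rather than a model. The SLOPPY fragment copies the mis-nested italic and bold tags from the Word-generated sample earlier in this lecture, wrapped in a body element; TIDY and the helper names are assumptions.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Forgiving reader: keeps whatever text it finds and ignores nesting."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


def forgiving_parse(markup):
    """Return the text content, however badly the tags are nested."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.text)


def strict_parse(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Overlapping tags: <i> is closed inside a paragraph that was opened after it.
SLOPPY = "<body><i><p>Italics and </i><b>Bolding</p></b></body>"
# The same content with properly nested tags.
TIDY = "<body><p><i>Italics and </i><b>Bolding</b></p></body>"

print(forgiving_parse(SLOPPY))               # Italics and Bolding
print(strict_parse(SLOPPY), strict_parse(TIDY))  # False True
```

The forgiving reader shows the text either way, so the author never learns the mark-up is wrong; the strict reader rejects the sloppy version outright.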

XML and metadata standards

• For XML to extend to cover all document types, it needs to have standards for these document types

• Hence, various standards are being developed within the XML framework to try to define the ‘standard’ document elements for different document types

• Leads us to the Semantic web and web services (see later lecture)

4. Implications of mark-up for web applications

• The growing complexity of mark-up languages and browsers

• The merging (collision?) of form (document appearance and display) with function (document description and searching)

• What matters most for your application?

• How easily can it be implemented in a way which readers/users can deal with?

Initial Vision

Document End: developer creates with a standard mark-up language (HTML)

User End: user reads with a standard browser (Mosaic)

Actuality

Document End: developer creates with...

• HTML, DHTML, XHTML, XML + other media + scripts

• Composer tools which generate marked-up documents: Dreamweaver, Front Page, CMS, etc

• Scripting languages

User End: user accesses with...

• Netscape, Internet Explorer, Opera, etc to display ‘standard’ media

• + plug-ins to display other media (Acrobat, Real Player, etc)

Developing around the limits of HTML/XML and browsers

• Standards for all document formatting needs?

• Support for standards by the big industry players?

• Technical expertise needed in order to publish material which displays properly on the web?

• Technical expertise needed to find and read material on the web?

• Web access from non-computer-screen devices?

Ambition and Achievability (1)

• Can a single formatting standard satisfy every document display and content description requirement?

• Can the HTML web be turned into the XML web?

• Conceptual simplicity vs practical complexity

• See Jon Bosak’s remarks (next slide)

Ambition and Achievability (2): Jon Bosak on mark-up, linking and display

• Early visionaries went further than adopters would follow

• Breakthroughs came from newcomers who simplified earlier advanced techniques

• Original more complex work is now being re-introduced and seen as necessary

• Biggest roadblock to success with more advanced solutions is the success of the simple limited solutions

(Prolog to: Goldfarb, C (2004) ‘XML handbook’, 5th ed)

What does this mean for information professionals?

• Learning about HTML tags?

• Learning about browser capabilities and differences?

• Learning about search engines?

• Learning about metadata?

• Learning about XML?

• Web services and the semantic web