Clarke, R. J (2000) L909-12: 1
Office Automation & Intranets
BUSS 909
Lecture 12
Advanced Systems: Dynamic Generation of Web Pages, Embedded Servers
Agenda (1)
we consider some advanced web systems which are either available or possible and useful
all are based on some form of the Common Gateway Interface (CGI), a capability available in all web servers
some of these advanced web systems are currently under development in the Dept. Business Systems
Agenda (2)
the following topics are discussed:
  dynamic page generation (also known as dynamic documents), in order to introduce basic CGI concepts
  embedded web servers, which can be very useful for certain applications; the example provided is a genre prototyping tool
Agenda (3)
intelligent hypertext systems that can provide users with previews of documents prior to jumping to them
dynamic site structure: we have described the utility of generating pages on the fly using CGI (in this lecture) and Server Side Includes (discussed elsewhere), but it is possible and useful to have the site structure change dynamically as well
Dynamic Page Generation: Definition
another name for this is dynamic documents
documents can be generated ‘on the fly’ from information that is:
  constantly updated, or
  generated algorithmically, or
  the result of a search
Dynamic Page Generation: Utility
widely used on the web:
  as gateways to other information systems and applications
  to process the input from forms and image maps
within a web resource, links may be created to virtual documents which, when requested, are generated before being served
Dynamic Page Generation: Gateways and Forms
not all information services fit the Web authoring mold in which static files are placed in directories
sometimes the information must be generated dynamically from a database
gateways are used to solve these problems by providing an extension mechanism for the Web Server
Dynamic Page Generation: Common Gateway Interface Specification
the interface between the server and the programs that generate dynamic documents is defined by the Common Gateway Interface (CGI) specification
a related mechanism for generating ‘dynamic’ information (described previously) is the Server Side Include
Dynamic Page Generation: Gateways and Forms
gateways take an information source that doesn’t fit the web authoring mold and make it look to the browser like a file on the Web Server
in practice, the gateway is just a script or program invoked by your web server; it accepts user input data through the Web Server and can output HTML
Dynamic Page Generation: Gateways and Forms
forms are used on the WWW as a way of collecting input to a script or program on your server
form scripts are closely related to gateways; these scripts pass data to and from the Web server
the Common Gateway Interface (CGI) is the mechanism for communicating between a gateway and the web server
Dynamic Page Generation: Common Gateway Interface Specification
the CGI specification describes how HTTP servers interact with external gateway programs
these external gateway programs, called CGI scripts, can be written in almost any language, including Perl, AWK, C and Visual Basic
Dynamic Page Generation: CGI Operation
information is passed to CGI scripts via environment variables and in the input stream
CGI scripts are written to output an HTML document, with MIME headers
the server sends the output to the client’s browser, and the MIME headers tell the browser how to display the document
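The round trip described on this slide can be sketched in Python (a language chosen for illustration; it is not one of those named in the lecture). The inlined gateway source, the `run_gateway` helper and the `week=3` query are all hypothetical; only the `REQUEST_METHOD` and `QUERY_STRING` variable names follow the CGI convention. The helper plays the part of the web server: it starts the gateway with the input in environment variables, then separates the MIME header from the document.

```python
import os
import subprocess
import sys

# Hypothetical gateway script, inlined so the sketch is self-contained.
# It reads its input from an environment variable and writes a MIME
# header, a blank line, and then an HTML document to standard output.
GATEWAY_SOURCE = r'''
import html, os, sys
query = html.escape(os.environ.get("QUERY_STRING", ""))
sys.stdout.write("Content-type: text/html\n\n")
sys.stdout.write("<HTML><BODY>You asked for: " + query + "</BODY></HTML>\n")
'''


def run_gateway(query_string):
    """Play the web server's part: start the gateway and collect its output."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", GATEWAY_SOURCE],
        env=env, capture_output=True, text=True, check=True,
    )
    # The header block and the document are separated by the first blank line.
    header_block, _, body = result.stdout.partition("\n\n")
    headers = dict(line.split(": ", 1) for line in header_block.splitlines())
    return headers, body
```

Calling `run_gateway("week=3")` returns the parsed headers and the HTML body as a pair; a real server would forward both to the browser rather than inspect them.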
Dynamic Page Generation: CGI Security Concerns
CGI scripts can open up potential security loopholes in web applications
this is because they have the ability to access information from outside the usual web directory hierarchy
they must therefore be considered potentially untrustworthy
Dynamic Page Generation: HTML Tags: <FORM>…</FORM>
requests are generally gathered using a <FORM> tag, which includes the attributes:
  ACTION: URL specifying the location to which the contents of the form are to be sent, generally a CGI script
  METHOD: selects variations in the protocol, e.g. GET or POST
  ENCTYPE: specifies the format of the submitted data (if necessary)
Dynamic Page Generation: Verify Tutorial Preferences <FORM>
<CENTER><H2>FACULTY OF COMMERCE<BR>TUTORIAL PREFERENCE SYSTEM<P> Verify Tutorial Preference</H2></CENTER>
<FORM METHOD="POST" ACTION="/cgi-bin/tps-cgi">
<INPUT TYPE="hidden" NAME="state" VALUE="13">
<INPUT TYPE="hidden" NAME="chksum" VALUE="91wlyrn">
<P><BR><UL>
<B>PLEASE ENTER YOUR STUDENT NUMBER: </B> <INPUT TYPE="text" NAME="studnum" MAXLENGTH="7" SIZE="7">
<B>PLEASE ENTER YOUR BIRTHDATE (ddmmyy): </B> <INPUT TYPE="password" NAME="dob" MAXLENGTH="6" SIZE="6">
</UL>
<CENTER>
<B>EXIT your web browser after viewing your tutorial preferences, <BR> otherwise anyone using this computer after you will be able to access your information </B>
<P>
<INPUT TYPE="submit" VALUE="Click Here to Continue">
<INPUT TYPE="reset" VALUE="Click Here to Clear Entries"><P>
</CENTER>
</FORM>
Dynamic Page Generation: Information Flow
when the user enters text on a Form and hits the return key, the web browser sends the keystrokes entered by the user to the web server (for example, the NCSA web server, which is called the http daemon or httpd)
the web server accepts the input, starts up the gateway and hands the input to the gateway via CGI
Dynamic Page Generation: Information Flow
the user’s keystrokes are passed to the gateway either via:
  environment variables, called the GET method, or
  standard input, called the POST method
the gateway then parses the input and processes it (e.g. sends a retrieval command to a database)
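A minimal sketch of the parsing step, assuming a gateway written in Python. The `read_form_input` helper is hypothetical; `REQUEST_METHOD`, `QUERY_STRING` and `CONTENT_LENGTH` are the conventional CGI variable names, and the field names are borrowed from the tutorial preference form shown earlier.

```python
import io
from urllib.parse import parse_qs


def read_form_input(environ, stdin):
    """Return the submitted form fields as a dict of name -> list of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the encoded fields arrive on standard input, and
        # CONTENT_LENGTH says how much there is to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the encoded fields arrive in an environment variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# Usage with stand-ins for the real environment and input stream.
get_fields = read_form_input(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "studnum=1234567&state=13"},
    io.StringIO(""),
)
post_body = "studnum=1234567&dob=010180"
post_fields = read_form_input(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(post_body))},
    io.StringIO(post_body),
)
```

In a deployed script the two arguments would be `os.environ` and `sys.stdin`; passing them in explicitly simply makes the function easy to try out.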
Dynamic Page Generation: Information Flow
the gateway may generate HTML output (via a template)
the HTML output is returned to the web server, which may either:
  pass it on to the client, or
  save the data in a file or database, or
  send the information to someone via
Dynamic Page Generation: Gateway Scripts
may be scripts or programs written in C/C++, Perl, tcl, the C shell or the Bourne shell:
  Perl
  tcl stands for tool command language and is pronounced ‘tickle’
  the C shell and Bourne shell are interactive command interpreters and command programming languages for UNIX
Dynamic Page Generation: CGI Gateway Scripts, HTML Output
CGI gateways that generate HTML output are required to preface the HTML output to stdout with the following line:
  Content-type: text/html
this line must be followed by a blank line before the first <HTML> tag is sent
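As a small illustration of that rule, here is a Python sketch. The `write_html_response` helper and the sample body are hypothetical; the header line and the blank line after it are exactly as stated on the slide.

```python
import io
import sys


def write_html_response(body_html, out=sys.stdout):
    """Write the header line, the mandatory blank line, then the document."""
    out.write("Content-type: text/html\n")
    out.write("\n")  # blank line: end of headers, start of document
    out.write("<HTML><BODY>" + body_html + "</BODY></HTML>\n")


# Capture the output in memory instead of sending it to a server.
buffer = io.StringIO()
write_html_response("<P>Hello</P>", buffer)
```

Leaving out the blank line is a common cause of server errors, because the server then treats the start of the document as a malformed header.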
Dynamic Page Generation: CGI Gateway Scripts, Non-HTML Output
the gateway need not generate HTML
it could return the URL of another file, indicating to the browser that it should get that file; this is called URL redirection
CGI gateways using URL redirection write the following line to stdout:
  Location: URL
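A matching sketch for redirection, again in Python. The `write_redirect` helper and the target path are hypothetical. As with the Content-type line, the header block is closed by a blank line.

```python
import io
import sys


def write_redirect(url, out=sys.stdout):
    """Tell the server, and through it the browser, to fetch another URL."""
    out.write("Location: " + url + "\n")
    out.write("\n")  # blank line ends the header block; no document follows


# Capture the output in memory instead of sending it to a server.
buffer = io.StringIO()
write_redirect("/lectures/current.html", buffer)
```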
Dynamic Page Generation: Information Flow
[Figure: information flow between the Web Browser, the Web Server, the Gateway Script or Program and the Database: (1) Form, (2) CGI, (3) Gateway Script or Program, (4) HTML, (5) HTML]
Embedded Web Servers: Traditional Client-Server Model
so far our discussions in this course have centred on a traditional client-server model for web systems:
  the client sends a request to a remote server, and
  eventually a response is returned to the client, based on the operation of the remote server
Embedded Web Servers: Client-side Servers
so useful is this arrangement that we rarely question it, but it is not necessary to have a server running remotely
rather it may be useful to have one or many temporary web servers instantiated and executed client-side
Embedded Web Servers: Case Tool for Genre Analysis (GASP)
genre can be applied to analysing the structure of workpractices (Clarke 2000)
to speed up the description of workpractices, a case tool is being developed in the Dept. Business Systems
the system is called the Genre and Action Sequence Processor (GASP) and uses a client-side or embedded web server
Embedded Web Servers: Case Tool for Genre Analysis (GASP)
users and analysts jointly build up genre sequences consisting of a set of nodes and links
  the nodes are web pages which may contain textual descriptions, forms etc. describing a stage in a workpractice
  alternatively, nodes may contain video clips of action collected in the field
Embedded Web Servers: Operation of GASP
the nodes and links are dynamically created client-side using an embedded server
  users make requests to the embedded server for a page
  but the page is generated only when it is needed, from a directed graph of the genre which is stored in a name space
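The idea of generating a node's page only when it is requested can be sketched with Python's standard `http.server` module. This is not GASP's implementation, which is not described here: the `GENRE` digraph, its node names and the `render_node`, `GenreHandler` and `make_server` names are all invented for illustration.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical genre digraph: node name -> (description, names of next nodes).
GENRE = {
    "greeting": ("Staff member greets the customer", ["enquiry"]),
    "enquiry": ("Customer states what they need", ["service", "closing"]),
    "service": ("Staff member provides the service", ["closing"]),
    "closing": ("Both parties close the exchange", []),
}


def render_node(name):
    """Build the HTML page for one node at the moment it is requested."""
    description, successors = GENRE[name]
    links = "".join('<LI><A HREF="/%s">%s</A></LI>' % (n, n) for n in successors)
    return "<HTML><BODY><H2>%s</H2><P>%s</P><UL>%s</UL></BODY></HTML>" % (
        html.escape(name), html.escape(description), links)


class GenreHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path.strip("/") or "greeting"
        if name not in GENRE:
            self.send_error(404, "No such node")
            return
        page = render_node(name).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)


def make_server(port=0):
    """Create a server bound to this machine only; port 0 lets the OS choose."""
    return HTTPServer(("127.0.0.1", port), GenreHandler)
```

Running `make_server(8000).serve_forever()` and browsing to `http://127.0.0.1:8000/` would serve the first node; no page exists on disk at any point.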
Embedded Web Servers: Operation of GASP
state information about the users’ traversal of the genre digraph is stored in the URL, and
ultimately written to a database when the user reaches the end-of-sequence symbol for the digraph or when the system times out
stored traversals are the basis for usability analysis!
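One way of carrying traversal state in the URL is sketched below in Python. The `path` query parameter, the `end` symbol and the three helper names are assumptions made for the example, a plain list stands in for the database, and the time-out case is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

END_OF_SEQUENCE = "end"  # hypothetical end-of-sequence symbol


def link_to(next_node, visited):
    """Build a URL for next_node that carries the path taken so far."""
    return "/" + next_node + "?" + urlencode({"path": ",".join(visited)})


def traversal_from(url):
    """Recover the full traversal, including the node now being requested."""
    parts = urlsplit(url)
    earlier = parse_qs(parts.query).get("path", [""])[0]
    visited = [node for node in earlier.split(",") if node]
    return visited + [parts.path.strip("/")]


def record_if_finished(url, store):
    """Append the traversal to store once the end-of-sequence node is reached."""
    traversal = traversal_from(url)
    if traversal[-1] == END_OF_SEQUENCE:
        store.append(traversal)
        return True
    return False
```

Because every link embeds the route so far, the server itself needs to remember nothing between requests.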
Embedded Web Servers: GASP as a CASE Environment
GASP can form part of a CASE environment using distributed web technologies
by configuring GASP to echo its activity to a proxy server, a project manager would be able to see exactly what the users and analysts are doing in real time
Embedded Web Servers: GASP as a CASE Environment
[Figure: GASP as a CASE environment, showing Browser 1 and Browser 2, the HTTP connections (HTTP 1, HTTP 2, … HTTP n), the GASP DynamicServer (DynamicServer Procedure, DynamicServer Application) and the Proxy Server]
Intelligent Hypertext System: Need to Understand Textual Resources
in Lecture 10, we discussed issues relating to text resources:
  re-purposing texts to hypertexts can disrupt the communicative utility of the former (see also T909-10.DOC)
throughout this course we showed how understanding texts could help us create widgets enabling users to traverse large hyper-documents while simultaneously reducing the screen ‘real estate’ required
Intelligent Hypertext System: Inability to Preview prior to Jumping
one problem with the WWW is that users are not able to preview textual resources prior to jumping to them
  this promotes superficial and inefficient reading practices, ‘skim, scroll and peck’ (Clarke 1995)
  it increases the number of hits on the server, increases user frustration etc.
Intelligent Hypertext System: Preview prior to Jump Feature
the ability to preview prior to jumping to a resource would provide users with much greater control over what they retrieved
its absence is frustrating because this capability was available on the earliest microcomputer-based hypertext systems (e.g. HyperCard and SuperCard on the Apple Macintosh)
Intelligent Hypertext System: Thematic & Information Resources of Texts
the texture resource that readers need in order to predict what will occur next in a text is referred to as theme
usually associated with theme is the texture-forming resource called information in which subsequent ‘new’ meanings can be created in a text from previously accumulated ‘given’ meanings
Intelligent Hypertext System: Preview Features Emphasise Theme
each intranet text for which previews are required would need to have encoded:
  various Themes at the level of the clause
  various hyper-Themes at the level of the paragraph, and
  the so-called macro-Theme at the level of an overall abstract for a text
Intelligent Hypertext System: Determining Thematic Resources
it is unlikely that we will ever be able to completely automate the analysis of texts for thematic resources
  tools are available to support a linguist in conducting this kind of textual analysis (see Michael O’Toole’s SFL WWW Site), but
  it only needs to be done once for each hyper-document, and then repeated only when the document is amended
Intelligent Hypertext System: Encoding Thematic Information (1)
the results of thematic analysis need to be encoded in the hyper-document
the thematic analysis must:
  move with the document
  be copied whenever the document is duplicated within the originating web site
  not interfere with the rendering or processing of the document on other sites
Intelligent Hypertext System: Encoding Thematic Information (2)
the standard HTML way of adding user content to a document is by using META tags:
  there are conventional uses of this tag (e.g. Description, Keywords) but there is no explicit standardisation limiting what can be encoded into a hyper-document using this tag
  meta information is not displayed in the browser; users don’t know it exists unless they View Page Source
Intelligent Hypertext System: Encoding Thematic Information (3)
thematic resources are organised into chains which flow through a text
each text will have a pattern of themes called a thematic progression (examples of which include simple, multiple, and zig-zag)
how to efficiently encode these chains into META tags will form the basis of a Masters project (anyone interested?)
Intelligent Hypertext System: Encoding Thematic Information (4)
once the chain encoding is developed, it is likely to be applicable to other resources as well, including information
each text only needs to know about its own thematic structure
providing the thematic preview of a document referenced by, or reachable from, the current one is conducted as a server-side process
Intelligent Hypertext System: Encoding Thematic Information (4)
it is assumed that all documents in the intranet have encoded in them the required thematic and associated information resources
this involves some additional preparation work during re-purposing or document creation, but the resulting increase in functionality would be well worth it
Intelligent Hypertext System: Encoding Thematic Information (5)
<HTML>
<HEAD>
<TITLE>Current Document</TITLE>
<META NAME="macro-Theme" CONTENT="mt_text_description">
<META NAME="hyper-Theme" CONTENT="ht_paragraph_description">
<META NAME="Theme" CONTENT="t_clause_description">
<META NAME="New" CONTENT="n_text_description">
<META NAME="hyper-New" CONTENT="hn_text_description">
<META NAME="macro-New" CONTENT="mn_text_description">
</HEAD>
<BODY>
:
:
<A HREF="diffdoc.doc">Different Document</A>
:
:
<P></P>
</BODY>
</HTML>
Intelligent Hypertext System: Suggested Architecture (1)
if the user rolls their mouse over a link to an encoded document:
  clicking the left mouse button, as is normally the case, enables the user to immediately jump to that document, but
  clicking the right mouse button opens up the usual menu of choices (Edit Linked Item, View Linked Item etc.) and also displays at the top of this list an option called “About this Item”
Intelligent Hypertext System: Suggested Architecture (2)
the ‘About this Item’ option is not a standard option on this menu; it is included by the Intranet operators
  adding menu items is a relatively straightforward configuration detail when using the Netscape browser
  it is something that is easily set up by Intranet developers
Intelligent Hypertext System: Suggested Architecture (3)
clicking on the ‘About this Item’ option sends a GET URL request to run the Theme CGI program on the web server, along with the document pointed to by the link
as part of the operation of the Theme CGI program, a META tag parser is run on that document
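The META tag parser step might look like the following Python sketch, built on the standard `html.parser` module. The lecture does not specify how the Theme CGI program is written, so the `MetaTagParser` class, the `parse_meta` helper and the sample CONTENT values are illustrative assumptions; the META names are those proposed in the encoding example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect NAME/CONTENT pairs from the META tags of one document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content is not None:
            # A document may carry several tags with the same NAME,
            # for example one hyper-Theme per paragraph.
            self.meta.setdefault(name, []).append(content)


def parse_meta(document_text):
    """Return a dict mapping each META name to the list of its contents."""
    parser = MetaTagParser()
    parser.feed(document_text)
    parser.close()
    return parser.meta


sample = (
    '<HTML><HEAD><TITLE>Current Document</TITLE>'
    '<META NAME = "macro-Theme" CONTENT = "An overview of CGI">'
    '<META NAME="hyper-Theme" CONTENT="Gateways">'
    '<META NAME="hyper-Theme" CONTENT="Forms">'
    '</HEAD><BODY></BODY></HTML>'
)
themes = parse_meta(sample)
```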
Intelligent Hypertext System: Suggested Architecture (4)
the Theme CGI program sends back a response in the form of a dynamically generated hyper-document consisting of the output of the META tag parser
a dependent window is opened in the user’s browser to display the hyper-document; options can be selected for pulling up required information
Intelligent Hypertext System: Suggested Architecture (5)
the users could select from a range of available options depending on what text resources were encoded in the document’s META tags, including:
  Abstract: encoded macro-Theme
  Topics: encoded hyper-Themes
  Information: encoded hyper-New
  Summary: encoded macro-New
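The option list above can be turned into the dependent-window document with a few lines of Python. This is a sketch under assumptions: the `preview_page` helper and its HTML layout are invented, and the input is taken to be a dictionary from META name to a list of encoded values.

```python
import html

# Menu label -> META name, following the option list above.
OPTIONS = [
    ("Abstract", "macro-Theme"),
    ("Topics", "hyper-Theme"),
    ("Information", "hyper-New"),
    ("Summary", "macro-New"),
]


def preview_page(meta):
    """Build the preview document from the values found in the META tags."""
    sections = []
    for label, meta_name in OPTIONS:
        values = meta.get(meta_name)
        if not values:
            continue  # offer only what was actually encoded in the document
        items = "".join("<LI>%s</LI>" % html.escape(v) for v in values)
        sections.append("<H3>%s</H3><UL>%s</UL>" % (label, items))
    return "<HTML><BODY>%s</BODY></HTML>" % "".join(sections)


page = preview_page({"macro-Theme": ["An overview"], "hyper-New": ["A & B"]})
```

Options with nothing encoded are simply left off the page, which matches the slide's point that the choices depend on what the document carries.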
Scalable Site Structure: Scalable Web Sites
some web hosting service companies understand that corporate sites must be scalable, that is, the entire site can change its scale or size
  small companies may initially need only a small web presence...
  ...but over several years they may then need to add more extensive e-commerce and extranet capabilities
Scalable Site Structure: Scalable Web Sites
we will give an overview of the technology being developed by a new company called Loudcloud
  founded by some senior former employees of Netscape including its co-founder Marc Andreessen...
  ...and former Netscape/AOL executives Ben Horowitz, Tim Howes and In Sik Rhee
  www.loudcloud.com/company/index.html
Scalable Site Structure: Scalable Web Sites
Loudcloud uses a technology it has developed, called Opsware™ automation technology, to enable sites to be scaled in response to planned or unexpected massive increases in demand
Opsware supports the allocation of additional Capacity On Demand within minutes of a request!
Scalable Site Structure: Scalable Web Sites
a customer can dynamically add or delete services as required; Loudcloud refers to these as Smart Cloud™ technologies
  Smart Clouds are predefined components
Loudcloud can do all this because it controls the construction and hosting of each web site it supports
each web site is heavily ‘instrumented’ and centrally controlled
Scalable Site Structure: Loudcloud’s Scalable Web Sites
[Figure: Loudcloud’s layered architecture, with one annotation per layer]
  Internet Business Application: a company’s Internet applications are built on top of Loudcloud’s architecture
  Smart Cloud™ Services: each internet service, referred to as a Smart Cloud, is built on top of Opsware automation technology
  Opsware™ Technology: automates manual tasks including capacity scaling, system configuration & provisioning, and site versioning
  Operational Environment: a range of hardware and systems software can be used to support Loudcloud’s environment
Scalable Site Structure
[Figure: a site can scale to fit load requests (hits), and add or delete services]
Dynamic Site Structure: Rationale (1)
there is a related but perhaps even more radical possibility than having scalable web sites, one which has a great deal of promise commercially
we can extend the idea of dynamic web pages to that of having dynamic web sites, that is, sites that change their structure to accommodate use
Dynamic Site Structure: Rationale (2)
experience with developing web sites should suggest to you that content determines the structure of the web site
there is an assumption that web sites should have a static web structure, one which does not change over time, or alternatively
that it is either not useful, or too much bother, to change the site structure
Dynamic Site Structure
this is the case even when we use:
  Dynamic HTML (DHTML) and JavaScript to produce pages which appear to be changing
or even when we generate pages on demand, that is, dynamically generated pages produced as a consequence of searches, database queries etc.
Dynamic Site Structure
in Lecture 9, we discussed installation of a web server which focussed on the NCSA httpd Server, and was tested by installing an Apache Server for Windows/NT on zathros at the Department of Business Systems
recall that we will also need to Configure, Manage and Analyse the Server Log Files which grow as a consequence of accesses made by users of the web server
Dynamic Site Structure
if we study what users are accessing on the web server then we can re-organise the site structure to assist users in their requests for pages and resources
  this would result in requests for resources being served more quickly or more easily
  this may also permit more users to be served by a web site
Dynamic Site Structure
re-organising web sites, in order to promote functionality based on analysing the requests by users, is currently being done by some consultants in Australia
usually this leads to an improved but nonetheless static site structure which enables users to access resources more easily
Dynamic Site Structure: Necessity for Structural Change
changing the site structure can be useful:
  over the long term, site structures do and should evolve as their uses are further refined or redefined (form should follow function)
  in the short term, usage patterns may change diurnally; for example, intensive web-database queries by local workers during the daytime, and FTP requests by overseas workers during the evening
Dynamic Site Structure: Technical Feasibility
it is technically feasible to generate a redirect page which:
  informs users that a page has moved, with a reminder to bookmark the new location (you have probably encountered this already), or which
  automatically redirects a user to the new location, sometimes without the user being aware that this has occurred
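Both kinds of redirect page can be produced by one small generator. The Python sketch below is illustrative only: `moved_page` and the target path are invented, and automatic redirection is done here with the widely supported `<META HTTP-EQUIV="Refresh">` convention, which is one technique among several (a server-side `Location:` header is another).

```python
import html


def moved_page(new_url, delay_seconds=None):
    """Return a 'page has moved' document; refresh automatically if a delay is given."""
    safe_url = html.escape(new_url, quote=True)
    refresh = ""
    if delay_seconds is not None:
        # A delay of 0 sends the browser on immediately.
        refresh = '<META HTTP-EQUIV="Refresh" CONTENT="%d; URL=%s">' % (
            delay_seconds, safe_url)
    return (
        "<HTML><HEAD><TITLE>Page moved</TITLE>%s</HEAD><BODY>"
        '<P>This page has moved to <A HREF="%s">%s</A>. '
        "Please update your bookmark.</P></BODY></HTML>"
        % (refresh, safe_url, safe_url)
    )


notice_only = moved_page("/lectures/current.html")
automatic = moved_page("/lectures/current.html", delay_seconds=0)
```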
Dynamic Site Structure: Approach (1)
although there are no technical impediments to changing site structure, and some justification for doing it, the question becomes how to best change the site structure?
as mentioned earlier it requires access to the Server Log file, and a willingness to set up this file to record as much as possible about user requests
Dynamic Site Structure: Approach (2)
the page hits for each user session must be recorded using the Site Log
these records of user sessions can identify the parts of the site structure hierarchy being intensively used
the analysis would likely proceed by creating a weighted tree of usage across the site topology
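A weighted tree of usage can be approximated by counting, for each request in the log, a hit against the page and against every directory above it. The Python sketch below assumes the server writes Common Log Format lines; the host names, file names and the `requested_path` and `weighted_tree` helpers are invented, and the grouping of hits into user sessions is left out for brevity.

```python
from collections import Counter


def requested_path(log_line):
    """Pull the path out of a Common Log Format line, or return None."""
    try:
        request = log_line.split('"')[1]  # e.g. GET /lectures/l12.html HTTP/1.0
        return request.split()[1].split("?")[0]
    except IndexError:
        return None


def weighted_tree(log_lines):
    """Count hits for every page and for every directory above it."""
    weights = Counter()
    for line in log_lines:
        path = requested_path(line)
        if path is None:
            continue
        parts = [p for p in path.split("/") if p]
        # A hit on /a/b.html also counts towards / and /a.
        for depth in range(len(parts) + 1):
            weights["/" + "/".join(parts[:depth])] += 1
    return weights


# Hypothetical log entries.
LOG = [
    'pc1.example.edu - - [01/May/2000:10:00:00 +1000] "GET /lectures/l12.html HTTP/1.0" 200 5120',
    'pc2.example.edu - - [01/May/2000:10:01:00 +1000] "GET /lectures/l11.html HTTP/1.0" 200 4096',
    'pc1.example.edu - - [01/May/2000:10:02:00 +1000] "GET /lectures/l12.html HTTP/1.0" 200 5120',
    'pc3.example.edu - - [01/May/2000:10:03:00 +1000] "GET /tutorials/t10.html HTTP/1.0" 200 2048',
]
tree = weighted_tree(LOG)
```

Branches with high weights mark the parts of the hierarchy in intensive use.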
Dynamic Site Structure: Approach (3)
just as with dynamic web pages on primarily static sites, not all sections or weblets in a web site will need to have a dynamic site structure
the key to implementing efficient dynamic site structure is to isolate those parts of a site topology that will need to be changed
Dynamic Site Structure: Approach (4)
for those parts of the site structure that require routine change, server-side system programs may need to:
  rearrange web pages at the terminal nodes or leaves of the weighted tree
  provide additional intermediate nodes in the form of additional pages
  change, verify and manage internal links between pages to remove the possibility of bad links
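The link verification task in the last item can be sketched as follows, in Python. The site is modelled as a dictionary from page path to HTML text, which is an assumption made to keep the example self-contained; `LinkCollector`, `bad_links` and the sample pages are likewise invented. Only internal links are checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Gather the HREF of every anchor tag in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def bad_links(site):
    """site maps each page path to its HTML; return (page, target) pairs that are broken."""
    broken = []
    for page, text in site.items():
        collector = LinkCollector()
        collector.feed(text)
        for href in collector.hrefs:
            parts = urlsplit(urljoin(page, href))  # resolve relative links
            if parts.scheme or parts.netloc:
                continue  # external link: outside the scope of this check
            if parts.path not in site:
                broken.append((page, parts.path))
    return broken


# Hypothetical two-page site with one dangling link.
SITE = {
    "/index.html": '<A HREF="lectures/l12.html">L12</A> '
                   '<A HREF="lectures/l13.html">L13</A> '
                   '<A HREF="http://www.example.com/">elsewhere</A>',
    "/lectures/l12.html": '<A HREF="../index.html">Home</A>',
}
problems = bad_links(SITE)
```

Running a check like this after every automatic rearrangement would catch links left pointing at pages that have moved.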
Dynamic Site Structure: Pattern Recognition, Neural Nets
the required usability analysis could be implemented by applying pattern recognition techniques
  suggestive of neural network technologies: web site usage learnt by example; there are research opportunities in this work!
there is a great deal of money to be made, as this information on user behaviour is just a form of consumer profiling
Dynamic Site Structure: Analogous Approach found in HCI
this process is analogous to applying so-called usability analysis, developed in HCI for improving user interfaces
  this involves analysing session transcripts for evidence of repeating sequences of keystrokes during the operation of a system
  identify the most frequently occurring runs, then reduce or remove them while providing the same functionality
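Finding the most frequent runs in a session transcript amounts to counting fixed-length windows. A minimal Python sketch, with an invented `frequent_runs` helper and an invented transcript of user actions:

```python
from collections import Counter


def frequent_runs(events, length, top=3):
    """Count every run of `length` consecutive events and return the commonest."""
    runs = Counter(
        tuple(events[i:i + length]) for i in range(len(events) - length + 1)
    )
    return runs.most_common(top)


# Hypothetical session transcript.
session = ["open", "save", "close", "open", "save", "print"]
commonest = frequent_runs(session, length=2, top=1)
```

The same function applies unchanged to page requests: feeding it the sequence of pages visited in a session highlights routes that a restructured site could shorten.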
Dynamic Site Structure: Outsourcing to Portal Sites
to create dynamic site structure there must be tight integration between the Site Logs and the Web Site itself
if organisations want this feature, but don’t want to implement it themselves, they will need to outsource their entire web site operation and maintenance to a Portal Site
Dynamic Site Structure: Relevant Example ...
a dynamic site structure could in principle be useful for an educational web site like the one being developed to support BUSS909
at the moment, students must click on a Lectures link to get a list of all Lecture files, then select the week or topic in order to select one file from a maximum of 14 files
Dynamic Site Structure: … Relevant Example ...
students could click on a link called Current Lecture, which retrieves the file relevant to the current week:
  the previous lectures could be accessed via a section page called Previous Lectures
  the future lectures could be accessed via a section page called Future Lectures
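The weekly revision this structure needs is easy to compute. The Python sketch below assumes one lecture per week with no breaks; the session start date and the `lecture_links` helper are invented for the example, and the total of 14 is the maximum number of files mentioned earlier.

```python
from datetime import date

SESSION_START = date(2000, 2, 28)  # hypothetical first Monday of session
TOTAL_LECTURES = 14                # the maximum mentioned in the example


def lecture_links(today, start=SESSION_START, total=TOTAL_LECTURES):
    """Split lecture numbers into previous, current and future for a given day."""
    week = (today - start).days // 7 + 1
    current = min(max(week, 1), total)  # clamp to the teaching weeks
    return {
        "Previous Lectures": list(range(1, current)),
        "Current Lecture": current,
        "Future Lectures": list(range(current + 1, total + 1)),
    }


links = lecture_links(date(2000, 3, 15))
```

A server-side program could run this on each request, or once a week, to regenerate the three section pages.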
Dynamic Site Structure: … Relevant Example ...
we might implement this if:
  the Site Log files indicated that students were accessing the same file repeatedly, an indication that they did not know which was current
  under the assumption that all lecture files were to be made available prior to delivery (not the case at the moment!)
Dynamic Site Structure
[Figure: for Weeks 2, 3 and 4, a Static Site Structure (a Lectures page, L, above the numbered lecture files) compared with a Dynamic Site Structure (Lectures, Past Lectures and Future Lectures pages, labelled L, P and F); nodes are marked by user request activity: high, medium, low]
the dynamic site structure gives fewer incorrect hits, but the site structure would need to be revised weekly
Acknowledgements
the author gratefully acknowledges discussions with Joshua Fan, Department of Business Systems, who suggested the web architecture necessary for implementing the Intelligent Hypertext System and also alerted the author to the possibility of Portal Services providing organisations with web hosting services that would provide for the restructuring of web sites
the architecture of GASP is being jointly developed as part of an ongoing project between Rodney Clarke and Tony McGrath, Wandering Albatross Consulting
References
Clarke, R. J. (1995) “WWW Page Metaphor considered Harmful” Proceedings of OZCHI’95, University of Wollongong
Clarke, R. J. (2000) “An Information System in its Organisational Contexts: A Systemic Semiotic Longitudinal Case Study” Unpublished PhD Dissertation, Department of Information Systems, University of Wollongong
Fan, Joshua (1999) Personal Communication
Ford, A. (1995) Spinning the Web: How to Provide Information on the Internet London: International Thomson Computer Press
McGrath, T. (1999) Personal Communication
Schwartz, R. L. (1999) “Programming with Perl: Step-by-Step Link Verification” Web Techniques 4 (3) March 1999, 30-34