

Journal of Computing and Information Technology - CIT 14, 2006, 2, 161–174 doi:10.2498/cit.2006.02.07

Transclusions in an HTML-Based Environment*

Josef Kolbitsch¹ and Hermann Maurer²
¹ Graz University of Technology, Austria
² Institute for Information Systems and Computer Media, Graz University of Technology, Austria

Transclusions are an advanced technique for the inclusion of existing content into new documents without the need to duplicate it. Although originally described in the early 1960s, transclusions have still not been made available to users and authors on the world wide web.

This paper describes the prototype implementation of a system that allows users to write articles that may contain transclusions. The system offers a simple web-based interface where users can compose new articles. With a simple button the user has the ability to insert a transclusion from any HTML page available on the world wide web.

While other approaches introduce new markups for the HTML specification, make use of technologies such as XML and XLink or employ authoring systems that internally support transclusions and can generate web pages as output, this implementation solely relies on the techniques provided by an HTML-based environment. Therefore HTML, Javascript, the Document Object Model, CGI scripts and HTTP are the core technologies utilised in the prototype.

Keywords: hypertext, transclusions, xanalogical structure, authoring systems, publishing systems, web-based applications.

1. Introduction

In 1965, Ted Nelson presented “a file structure for the complex, the changing and the indeterminate”, in which he introduced the term hypertext (see [20]). One of the fundamental concepts in Nelson’s notion of hypertext is a technique called transclusions. Transclusions allow authors to include portions of existing documents into their own articles without duplicating them. Basically, a transclusion in document A is a reference to a portion of the content of a potentially remote document B that is virtually included into document A (see Figure 1).

Fig. 1. Exemplary transclusion. Part of document B (top right) is transcluded into document A (top left). Bottom: the source code of document A does not contain the actual text of document B, but only the data required to retrieve it from the original document.

The following sub-sections discuss the background of transclusions and their use in HTML. Section 2 addresses attempts to implement transclusion. Our design and implementation of transclusions in HTML-based environments are detailed in section 3. Issues that were encountered during the implementation are described in section 4. Finally, section 5 discusses several aspects of our implementation.

* This paper was supported by the Styria Professorship for Revolutionary Media Technologies.

1.1. Background of Transclusions

Transclusions are designed as a complete replacement for all cut-and-paste mechanisms in use. Nelson argues that cut-and-paste is not what people actually want to do but that it is a restriction imposed upon authors by the nature of paper. Writers actually do not want to make a copy of an existing document, cut out the piece they want to reuse and paste it in their document. They want to include the original content and let readers know what the source and the context of the quote is (e.g., [21]).

Reference lists at the end of a scientific publication, for instance, are usually not what is intended by writers and desired by readers. They are rather a pragmatic solution to the problem that both the source and the context of the quotation are lost by copying-and-pasting a portion of content printed on paper.

What used to be physical restrictions of paper was embraced by most computing systems in an attempt to resemble the work environments and common processes in offices (cf. [33]). Therefore most current graphical operating systems make use of metaphors such as a desktop, folders and documents; a document has to be put in exactly one folder; there is a clipboard, and content from a different document is included using copy-and-paste mechanisms (see [23]).

1.2. Implications of Transclusions

Transclusions are, however, not only a mere replacement for copy-and-paste. They assure that the original context of a quotation is preserved and can provide a visible link to the source of the transclusion. Ted Nelson’s approach to realising this functionality is based on transpointing windows (e.g., [22]).

Moreover, authors of documents can be notified when their articles are transcluded. Thus they can, for instance, find out about other researchers in the same area. Authors using transclusions, on the other hand, can be informed automatically about modifications in source documents (see [14]).

Apart from obvious improvements in authoring and publishing systems, transclusions can also offer a solution to copyright issues experienced today on the world wide web: authors include content into their documents by means of transclusions. Whenever a reader views a transclusion, a note about the rights associated with the transcluded content is added, and a micropayment is made to the corresponding owner ([25]). Nelson names this model transcopyright (see [24]).

1.3. Transclusions and HTML

The Hypertext Markup Language (HTML, [7]) is a relatively simple language for describing platform-independent hypertext pages. In the early stages of its development, the focus of HTML was on style and graphical presentation rather than on functionality and underlying paradigms. Therefore, many innovative ideas such as bidirectional hyperlinks and issues already known at that time including broken links were not considered in the implementation (e.g., [27]).

In principle, transclusions are used in HTML. Designated markups including <img>, <object> and <embed> incorporate content such as images, Java applets and animations into HTML pages by means of linking. Thus, these elements basically make use of the concept of transclusions.

Transclusions in HTML are very limited, though. Only certain media such as images can be virtually included, whereas textual content, in general, cannot be transcluded. Moreover, the transclusion mechanisms available in HTML can only be applied to entire documents. Fine-grained transclusions such as a small spatial selection of an image are not implemented.

2. Attempts to Implement Transclusions

Although the idea of transclusions was proposed some forty years ago, only a few attempts to implement this advanced technique have been made. The following sections give an overview of several notable approaches to the realisation of transclusions.

2.1. Xanadu

Transclusions are an integral part of Xanadu, Ted Nelson’s original hypertext system (e.g., [21]). Their implementation relies on a document model, though, that is radically different from what is widely used today. In Xanadu, documents (versions) do not contain content but references to the actual content. Content is both stored and referenced with the highest granularity possible – on the level of single characters. All content is retained in content repositories.

Any document is made up of a list of references to content stored in the system, e.g., a document consists of “characters 124 to 729 and 1276 to 1301 from the repository”. When content from document B is transcluded into document A, the corresponding references to the actual content in the repositories are added to the reference list of document A.

Thus, the creation and retrieval of transclusions in Xanadu are trivial list operations. Ted Nelson also details a number of functions related to transclusions and the handling of situations in which documents are modified or large portions of documents are deleted (e.g., [25]). Basically, these functions can be seen as more complex list operations.
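The reference-list model described above can be illustrated with a small Python sketch. It is not Xanadu code: the single-string repository, the inclusive `(first, last)` span convention and the function names `render` and `transclude` are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Assumption: a span is an inclusive (first, last) pair of character
# positions in one shared repository string.
Span = Tuple[int, int]


def render(repository: str, spans: List[Span]) -> str:
    """Resolve a document, i.e. a list of references, into its text."""
    return "".join(repository[first:last + 1] for first, last in spans)


def transclude(target: List[Span], source: List[Span], at: int) -> List[Span]:
    """Return a new reference list with the source spans inserted at list position `at`."""
    return target[:at] + source + target[at:]


repository = "The quick brown fox jumps over the lazy dog."
doc_b = [(4, 19)]             # "quick brown fox "
doc_a = [(0, 3), (35, 43)]    # "The " followed by "lazy dog."

# Transcluding B into A only adds references; no characters are copied.
doc_a = transclude(doc_a, doc_b, at=1)
print(render(repository, doc_a))
```

Because document A merely gains references, document B and the repository remain untouched, which is the property the list-based model is meant to provide.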

2.2. Proposal to Amend the HTML Specification

Since the idea of transclusions is already present in HTML for media such as images and multimedia animations, a sensible approach to text-based transclusions is to introduce a new tag that allows users to transclude text. Therefore, [28] suggests an amendment to the HTML specification. A new markup, <text>, is proposed with the intention to offer an element of the same significance as <img> or <embed>.

The main attributes of the markup are the URI of the source document and the start position and length of the text to be transcluded. The web browser analyses the tag, loads the source document of the transclusion, extracts the portion of text given by the attributes of the <text> tag, and inserts it into the document. Thus, a transclusion is handled in a similar way as an inline image.
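What a browser would have to do for such an element can be sketched in a few lines of Python. This is only an illustration: the proposal in [28] is not reproduced here, so the function name `resolve_text_element`, the zero-based character positions and the `load` callback that stands in for the network are all assumptions.

```python
from typing import Callable


def resolve_text_element(uri: str, start: int, length: int,
                         load: Callable[[str], str]) -> str:
    """Return `length` characters of the document at `uri`, beginning at `start`."""
    text = load(uri)
    if start < 0 or length < 0 or start + length > len(text):
        raise ValueError("requested range lies outside the source document")
    return text[start:start + length]


# Stand-in for the network: a dictionary of documents keyed by URI.
documents = {"http://example.org/b.txt": "Curabitur ultrices. Quisque ac ligula."}

quote = resolve_text_element("http://example.org/b.txt", 0, 19,
                             documents.__getitem__)
print(quote)
```

The range check hints at a weakness of purely positional addressing: once the source document changes, the same start and length may point at different text or at nothing at all.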

Although the proposal seems rational, it has not been accepted, and no web browser to date has the feature implemented.

2.3. Transclusions with IFrames and Embedded Objects

The recommendation for HTML 4 includes markups for inline frames and embedded objects (e.g., [7]). Both inline frames and embedded objects define areas within a given HTML document that can be used to display potentially remote resources. Inline frames can merely contain HTML pages and images, whereas objects may contain resources of arbitrary type.

Transcluding document A into document B can, for instance, be achieved by inserting an <object> tag with a reference to document A into document B ([16]). The capabilities of this technique are rather limited, though. Only entire documents can be referenced. Moreover, the context is lost because a link from the document containing the embedded object to the source of the transclusion is not provided by these markups. Therefore this approach is not well suited for realising transclusions.

2.4. XML-Based Transclusions

The Extensible Markup Language, XML, is a flexible language for describing documents that contain structured information (see [31]). In contrast to other markup languages such as HTML, where both syntax and semantics are determined, neither a set of tags nor the semantics are defined in XML. Therefore, XML per se does not contain a distinct markup for links; a separate linking language is used instead.

XLink, the XML Linking Language (e.g., [30]), provides a framework for describing the syntax and semantics of even complex linking structures between resources. An XLink link typically contains a number of attributes that describe, for instance, what resource is to be loaded and when it is to be displayed.

Three attributes are essential for the implementation of transclusions using XLink:

• href: the document to be loaded. Set to the source document of the transclusion;

• actuate: when the resource is to be loaded. When set to onLoad, the resource is loaded when the document containing the XLink link is loaded;

• show: in which manner the resource is to be displayed. When set to embed, the resource is displayed practically instead of the XLink tag.

The skeleton of the XLink link shown in listing 1 transcludes the entire document source.xml into the document containing the link at the position of the link. A similar approach to transclusions in XML is described by [29].

Listing 1. Fragmentary transclusion with XLink.

With the XML Pointer Language (XPointer, [32]) fragments of XML documents can be identified and addressed as well. Thus, a combination of XML, XLink and XPointer can be employed to make the use of fine-grained transclusions in XML-based environments possible (see [16; 15]).
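The three XLink attributes discussed above can be made concrete with a short Python sketch that builds such a link using the standard library. It is not a reconstruction of listing 1: the element name `transclusion` and the helper `make_transclusion_link` are hypothetical, and the `xlink:type="simple"` attribute is added because XLink requires a link type. Only the namespace URI and the values `onLoad` and `embed` are taken from the XLink vocabulary.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def make_transclusion_link(href: str) -> ET.Element:
    """Build a simple XLink link that asks for `href` to be embedded on load."""
    link = ET.Element("transclusion")               # hypothetical element name
    link.set(f"{{{XLINK_NS}}}type", "simple")
    link.set(f"{{{XLINK_NS}}}href", href)           # source of the transclusion
    link.set(f"{{{XLINK_NS}}}actuate", "onLoad")    # load together with the document
    link.set(f"{{{XLINK_NS}}}show", "embed")        # display in place of the link
    return link


link = make_transclusion_link("source.xml")
print(ET.tostring(link, encoding="unicode"))
```

Whether an application honours `actuate` and `show` depends on its XLink support; the sketch only produces the markup.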

2.5. Recent Projects Involving Transclusions

Currently, several mostly academic projects that experiment with transclusions exist. The following paragraphs introduce three selected systems.

The University of Nottingham, UK, has proposed a technology-based learning environment that adapts to its users (see [19]). The information retained in the system is organised in small “chunks” that are stored as XML files.

Since the system is adaptive, lessons are not static but assembled dynamically on the basis of a lesson plan. When a user requests a particular lesson, appropriate chunks of information are retrieved and included into a virtual document by means of transclusion. Thus, the system facilitates the reuse of small pieces of information for a number of lessons or for students with various differing standards of knowledge.

At the University of Bologna, Italy, researchers attempt to combine existing software products such as the Internet Explorer and Microsoft Word to offer a collaborative editing environment for the world wide web (see [3]). The implemented tool, XanaWord, allows users to edit any web page they view in their web browser – even if they do not have write permissions for the resource.

Any page displayed in Internet Explorer can be opened with a word processor such as Microsoft Word, where the user can make arbitrary changes to the document. When the user saves the page, only the changes to the original document are stored in the XanaWord repository. Whenever a document is retrieved from the repository, the modifications made by the user and the content from the original resource are included in a dynamically generated document by means of transclusion. Finally, the dynamic document is sent to the user’s web browser.

The Institute for Information Systems and Computer Media in Graz, Austria, proposed an environment capable of handling transclusions in various output document formats (see [14]). The system includes three components:

• the LaTeX typesetting system that allows users to create documents and save them in a number of document formats including Postscript, PDF and HTML;

• an extension to LaTeX that allows users to create transclusions; and

• a Hyperwave Information Server (see [9]) that handles issues such as linking and versioning.

In the proposed environment users can insert a special markup that designates a transclusion in LaTeX documents. Then, the user has to upload the file to a Hyperwave Information Server that extracts links and saves them in a link database, etc. When the document is requested by a user, the transclusions and links are inserted into the file saved on the server. The resulting intermediate file is processed by LaTeX in order to generate the requested document format. Ultimately, the document containing the transclusion is sent to the client.

3. Implementation

In contrast to several approaches to transclusions illustrated above, this project does not present a proposal but an actual implementation of a system that lets users take advantage of transclusions. It is designed as part of a larger system that offers communities instruments to work actively with content from digital libraries and electronic encyclopaedias (see [12]). Thus, a prototype is implemented offering a tool for authoring new articles that can contain transclusions. It is available online at [11].

3.1. Design Goals and Requirements

The environment for creating and retrieving transclusions aims at facilitating the reuse of information readily available on the Web – even by novice users. Therefore a number of design goals have to be taken into consideration:

• ease of use: the tool for making transclusions must be as easy to use as traditional copy-and-paste mechanisms;

• use of any document on the web: not only documents from a closed repository but basically any web page may be the source of a transclusion;

• level of granularity: any portion of text may be transcluded from a document, from a single character to the entire content of a page.

Browser plug-ins or special software tools should not be required. Therefore, this implementation of transclusions solely relies on technologies available and widely utilised on the world wide web:

• HTML: transclusions can be made from any HTML formatted document available on the web. Moreover, documents containing transclusions are presented to the reader as traditional HTML documents (see [7]);

• Javascript, DOM: internally, most current web browsers represent HTML pages as trees of objects. The underlying technology is the Document Object Model (DOM, [4; 5]). Javascript is used to access individual objects in the DOM tree of the HTML page to be transcluded (see [26; 6]) and enables fine-grained transclusions;

• HTTP: documents containing transclusions are transmitted to the readers using the Hypertext Transfer Protocol (see [8]).

3.2. System Overview

Since the system for creating and retrieving transclusions consists of a number of components, a brief overview is given. The following description is made in the order of actions taken by a user in authoring and reading a document including transclusions.

Two fundamental actions can be distinguished in the system: the creation of a transclusion when the article is authored, and its evaluation when the page containing the transclusion is to be displayed. When the user wants to create an article with a transclusion, the web browser presents a frameset with two frames. One frame contains a conventional area for authoring HTML content and an additional button for adding a transclusion. When the user presses this button, the URL of a page can be entered, and the corresponding page is loaded through an HTTP proxy application into the second frame. The user can either transclude content from this page or can use the second frame to browse to a different page – again through the HTTP proxy application. The document to be transcluded is complemented with a button that, when pressed, inserts the transclusion into the text area of the first frame. An illustration of the two frames is given in Figure 5.

The user selects the portion of text to be transcluded in the second frame and presses the button to have the transclusion actually inserted into the article. The button calls a Javascript function that determines the start and end positions of the selection made. Together with the URL of the page in the second frame, these values are used to generate an intermediate markup that is inserted into the article (see section 3.4).

Once a user has finished authoring an article and chooses to save the new document, the contents of the text area including the intermediate transclusion markup are sent to a CGI script on the server. The server stores the “static” text and the values provided through the transclusion tag to a database. In addition to this, metadata on the source of the transclusion is collected and stored in the database.


Whenever the article containing the transclusion is requested, a second CGI script is invoked. The script retrieves the contents of the article and the parameters of the transclusion tag from the database. The parameters defining the transclusion are used to load the original page from its original location. If it is unchanged, the transcluded portion of the text is extracted from the original page, combined with the static text and sent to the web browser of the client (see Figure 5).

The next section gives a brief introduction to the architecture of the implementation and its components.

3.3. System Architecture

Our implementation of transclusions follows a classic client-server paradigm. A conventional HTTP server, a relational database, several server-side CGI programs, a non-transparent HTTP proxy application and client-side Javascript code are the main components of the system.

Fig. 2. Overview of the server-side components.

The CGI script “Create and Store” in Figure 2 receives the data submitted by users. It analyses the content of the article, extracts transclusions and stores both content and transclusions in the internal database of the system (see section 3.4). The “Extract and Merge” script, on the other hand, reads the content of an article together with the information on the transclusion from the database, fetches the source document of the transclusion, assembles the complete article and sends it to the user (see section 3.5).

The third CGI script, “Fetch External”, is utilised during the authoring process for loading the page to be transcluded. This programme is basically necessary to insert a button and a small portion of Javascript code into the corresponding page (see section 4.1). It relies on a specialised, non-transparent proxy application developed for this project.

In the current prototype implementation the relational database consists of only two tables. While one table contains the static content of the article, the other one stores detailed information on the transclusion as well as a rich set of metadata and a fingerprint of the source document.
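A two-table layout of this kind could look as follows. The sketch uses Python's built-in `sqlite3` module; the table names, column names, the `{{transclusion:1}}` placeholder and the `store_article` helper are assumptions for illustration, since the actual schema of the prototype is not given in the paper. The transclusion columns mirror the seven parameters described in section 3.4.

```python
import sqlite3

# Hypothetical schema: one table for static text, one for transclusions
# together with the fingerprint of their source document.
SCHEMA = """
CREATE TABLE article (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE transclusion (
    id         INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article(id),
    src        TEXT NOT NULL,
    atag TEXT, aindex INTEGER, aoffset INTEGER,
    ftag TEXT, findex INTEGER, foffset INTEGER,
    created TEXT, modified TEXT, md5 TEXT
);
"""


def store_article(conn: sqlite3.Connection, body: str, transclusions: list) -> int:
    """Store the static text and its transclusions; return the article id."""
    cur = conn.execute("INSERT INTO article (body) VALUES (?)", (body,))
    article_id = cur.lastrowid
    for t in transclusions:
        conn.execute(
            "INSERT INTO transclusion (article_id, src, atag, aindex, aoffset,"
            " ftag, findex, foffset, created, modified, md5)"
            " VALUES (:article_id, :src, :atag, :aindex, :aoffset,"
            " :ftag, :findex, :foffset, :created, :modified, :md5)",
            {**t, "article_id": article_id},
        )
    conn.commit()
    return article_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
article_id = store_article(
    conn,
    "Lorem ipsum {{transclusion:1}} dolor sit amet.",  # hypothetical placeholder
    [{"src": "http://example.org/b.html",
      "atag": "H1", "aindex": 2, "aoffset": 1,
      "ftag": "P", "findex": 5, "foffset": 30,
      "created": "2006-01-15", "modified": "2006-02-01",
      "md5": "d41d8cd98f00b204e9800998ecf8427e"}],
)
```

Keeping the fingerprint columns next to the transclusion parameters means a single row is enough to decide later whether the source document still matches.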

3.4. Creating a Transclusion

As explained above, the interface for authoring new articles consists of a frameset with a frame for writing an article in an HTML form and a separate frame for displaying the content to be transcluded (Figure 5). When users wish to insert a transclusion they select the portion of text with the pointer device and click on the button provided in the window.

The button calls a Javascript function which is essentially the only operation carried out on the client computer. It accesses the document object model to determine the exact start and end positions of the selection made by the user and generates an intermediate tag that is inserted into the article. The Javascript interface to selections provided by most browsers is somewhat peculiar in that it determines these start and end positions in the way the user actually marked the text, i.e., the anchor of the selection is the position where the user clicked to indicate the beginning of the selection. Then, the user drags the mouse, for instance, to the end of the selection and releases the mouse button to denote the end of the selection. The end position is the focus. In the following paragraphs, anchor and focus are denoted by prefixes “a” and “f”.

Listing 2. Syntax of an intermediate transclusion tag (top) and an example (bottom).

The syntax of the newly introduced <transclusion> markup with its seven parameters is rather complex (see listing 2). This level of detail is required to be able to determine the exact start and end positions of transclusions, though. Values in curly braces describe the type of attribute values:

• src: the URL of the document to be transcluded;

• atag, ftag: the names of the tags in which the transclusion starts and ends, e.g., “P” for a paragraph;

• aindex, findex: the index of the tags in which the transclusion starts and ends, e.g., the seventh paragraph in the document;

• aoffset, foffset: the offset within the start and end tags, e.g., the transclusion starts at the second character of the seventh paragraph in the document.

The exemplary tag shown in listing 2, for instance, describes a transclusion that starts at the first character of the second H1 heading and ends at the 30th character of the fifth paragraph in the given document.
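How a server-side script might pick such tags out of a submitted article is sketched below in Python. The sketch assumes that the seven parameters are written as ordinary quoted attributes of a `<transclusion>` start tag; the exact syntax of listing 2 is not reproduced here, and the attribute values in the example are invented to match the description above. Whether indices and offsets count from zero or one is likewise left open. The names `Transclusion` and `find_transclusions` are hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


@dataclass
class Transclusion:
    src: str
    atag: str
    aindex: int
    aoffset: int
    ftag: str
    findex: int
    foffset: int


class _TagCollector(HTMLParser):
    """Collects the parameters of every <transclusion> tag in an article."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Transclusion] = []

    def handle_starttag(self, tag, attrs):
        if tag != "transclusion":      # HTMLParser lower-cases tag names
            return
        a = dict(attrs)
        self.found.append(Transclusion(
            src=a["src"],
            atag=a["atag"].upper(), aindex=int(a["aindex"]), aoffset=int(a["aoffset"]),
            ftag=a["ftag"].upper(), findex=int(a["findex"]), foffset=int(a["foffset"]),
        ))


def find_transclusions(article_html: str) -> List[Transclusion]:
    collector = _TagCollector()
    collector.feed(article_html)
    collector.close()
    return collector.found


# Assumed attribute syntax; values chosen to match the example in the text.
article = ('Lorem ipsum <transclusion src="http://example.org/b.html" '
           'atag="H1" aindex="2" aoffset="1" ftag="P" findex="5" foffset="30"> dolor.')
found = find_transclusions(article)
print(found[0])
```

A missing parameter raises a `KeyError` in this sketch; a production script would more likely reject the submission with a readable message.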

When the article containing the transclusion is saved by the user, the data is sent to the server, and the “Create and Store” CGI program is invoked (see Figure 3). It extracts the transclusion from the article, determines the attributes of the transclusion and writes the information to the database. The <transclusion> markup in the original article is replaced with a transclusion object that refers to the transclusion stored in the database.

Fig. 3. Simplified schematic illustration of the process of creating a new article containing a transclusion.

The source document of the transclusion is not stored in the internal repository. However, its URL, the creation and modification dates as well as an MD5 hash value of the entire page content are retained as fingerprint. These values are necessary to determine if the source document has changed when the transclusion is retrieved.
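A fingerprint of this kind is straightforward to compute. The following Python sketch uses `hashlib.md5` from the standard library; the `Fingerprint` record, the string representation of the dates and the helper names are assumptions, and the paper does not state how the creation and modification dates are obtained.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    url: str
    created: str    # assumed to be supplied by the caller, e.g. from page metadata
    modified: str
    md5: str


def make_fingerprint(url: str, created: str, modified: str, content: bytes) -> Fingerprint:
    """Fingerprint a source document from its metadata and its entire content."""
    return Fingerprint(url, created, modified, hashlib.md5(content).hexdigest())


def is_unchanged(stored: Fingerprint, current: Fingerprint) -> bool:
    """The source counts as unchanged only if every component still matches."""
    return stored == current


page_v1 = b"<html><body><p>Curabitur ultrices.</p></body></html>"
page_v2 = b"<html><body><p>Curabitur ultrices!</p></body></html>"

stored = make_fingerprint("http://example.org/b.html", "2006-01-15", "2006-02-01", page_v1)
same = make_fingerprint("http://example.org/b.html", "2006-01-15", "2006-02-01", page_v1)
edited = make_fingerprint("http://example.org/b.html", "2006-01-15", "2006-03-01", page_v2)
print(is_unchanged(stored, same), is_unchanged(stored, edited))
```

Hashing the entire page is deliberately strict: any edit, even one outside the transcluded passage, invalidates the fingerprint.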

It should be noted that, in contrast to [28], where an amendment to the HTML specification is suggested, the <transclusion> markup in this implementation is only used during the authoring process. It is inserted when the user makes a transclusion, is evaluated by the system and replaced by a transclusion object. When the page containing the transclusion is to be displayed, the transclusion object is replaced with the corresponding content from the original page (see below). Hence, the <transclusion> tag is only visible within the system but not externally to the user.

3.5. Retrieving a Transclusion

The “logic” of our implementation mainly lies within the component that retrieves transclusions. Whenever an article is requested, its body is analysed for the presence of transclusion objects. For each transclusion object the following steps have to be carried out (see also Figure 4):

• resolve the object and retrieve the information on both the transclusion and on the source document from the database;

• check if the given URL of the source document can be loaded;

• if it can be retrieved, check if the metadata, i.e., the creation and modification dates as well as the MD5 hash values, have changed;

• if the fingerprint of the source document is valid, retrieve the resource and extract the portion of text determined by the start and end positions of the transclusion;

• replace the transclusion object in the article with the transcluded content;

• if any of the operations above fails, insert an apologetic error message.

Fig. 4. Handling a request for a page containing a transclusion (simplification).

Fig. 5. Screenshots from a prototype of transclusions in an HTML-based authoring environment.
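The retrieval steps listed above can be sketched in Python as follows. This is an illustration under several assumptions rather than the prototype's code: the record is a plain dictionary, `fetch` is a caller-supplied function returning the page bytes or `None`, only the MD5 part of the fingerprint is checked, tag indices are zero-based, and offsets count characters of the element's text from its start (with the end exclusive). Text positions computed by a server-side parser may also differ from those a browser's DOM reports.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple

ERROR = "[Sorry, the transcluded content is currently not available.]"


class _TextIndex(HTMLParser):
    """Collects the page text and where each element's content starts in it."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self.length = 0
        self.starts: Dict[Tuple[str, int], int] = {}
        self._seen: Dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        n = self._seen.get(tag, 0)            # n-th occurrence of this tag name
        self._seen[tag] = n + 1
        self.starts[(tag, n)] = self.length

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def extract(page: str, t: dict) -> str:
    """Cut the text between the anchor and focus positions out of the page."""
    index = _TextIndex()
    index.feed(page)
    index.close()
    start = index.starts[(t["atag"].lower(), t["aindex"])] + t["aoffset"]
    end = index.starts[(t["ftag"].lower(), t["findex"])] + t["foffset"]
    return "".join(index.chunks)[start:end]


def resolve(t: dict, fetch: Callable[[str], Optional[bytes]]) -> str:
    """Return the transcluded text, or an error message if any step fails."""
    try:
        raw = fetch(t["src"])                              # can the source be loaded?
        if raw is None:
            return ERROR
        if hashlib.md5(raw).hexdigest() != t["md5"]:       # has the source changed?
            return ERROR
        return extract(raw.decode("utf-8"), t)             # extract the selection
    except (KeyError, UnicodeDecodeError):
        return ERROR


page = (b"<html><body><h1>Title</h1><p>First paragraph.</p>"
        b"<p>Second paragraph here.</p></body></html>")
record = {"src": "http://example.org/b.html", "md5": hashlib.md5(page).hexdigest(),
          "atag": "P", "aindex": 1, "aoffset": 7,
          "ftag": "P", "findex": 1, "foffset": 16}
print(resolve(record, lambda url: page))
```

Returning a message instead of raising keeps the surrounding article readable even when a single source has vanished, in line with the last step of the list.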

Every transclusion is formatted in a way that readers can distinguish between authentic and transcluded content. In Figure 5, transcluded text is highlighted using a light-gray background. Transclusions are complemented with a hyperlink to the original source of the content.

4. Issues Encountered

During the implementation and evaluation of the prototype a number of difficulties were experienced. A few substantial issues are addressed in the following sections.

4.1. Javascript Restrictions

As described in section 3.4, our implementation relies on Javascript code that detects which portions of a document are selected by the user; when the user presses a button, the start and end positions of the selection are determined.

Restrictions imposed by the security mechanisms of most modern web browsers (e.g., [18]) prevent Javascript functions from accessing selections in “foreign” frames and documents. This means that the button that reads the user’s selection has to be present in the same frame as the selection.

Since our premise was that transclusions can be made from any HTML document on the Web, we have to make sure that the Javascript code required is inserted in any page the user wants to transclude. The approach in the current implementation is to use a non-transparent proxy application. So when users enter the URL of the page they wish to transclude, the page is not loaded directly by the web browser, but by a CGI script on the server that acts as an HTTP proxy. The CGI script appends the demanded Javascript code and sends the document to the client.
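The rewriting step performed by such a proxy can be sketched in Python. The snippet below is an assumption about how the injection might be done, not the prototype's code: the helper name `inject_script`, the button markup, the `insertTransclusion()` handler and the script path `/transclusion.js` are all hypothetical.

```python
import re

# Hypothetical markup added to every proxied page: a button plus the script
# that reads the user's selection.
SNIPPET = ('<button onclick="insertTransclusion()">Transclude selection</button>'
           '<script src="/transclusion.js"></script>')


def inject_script(page_html: str, snippet: str = SNIPPET) -> str:
    """Insert `snippet` before the last </body> tag, or append it if there is none."""
    matches = list(re.finditer(r"</body\s*>", page_html, flags=re.IGNORECASE))
    if not matches:
        return page_html + snippet
    position = matches[-1].start()
    return page_html[:position] + snippet + page_html[position:]


original = "<html><body><p>Quisque ac ligula.</p></BODY></html>"
print(inject_script(original))
```

Because the page then reaches the browser from the proxy's own origin, the injected code runs in the same frame as the content and may read the selection. Relative links and images in the fetched page would still need rewriting, which this sketch does not attempt.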

The proxy application could be omitted if transclusions were only made in documents from an internal repository such as an online journal or a content management system. The system generating the documents could automatically insert the essential Javascript code when the resource is requested, for example, with a particular parameter.

4.2. Browser Specific Implementation

The function for accessing the user’s selection poses yet another problem. Different implementations of the corresponding function exist in the various web browsers available today. The Mozilla family accesses the selection through the document.getSelection() method, whereas Internet Explorer, for instance, uses a dedicated document.selection object (e.g., [10]).

Due to the use of the document.getSelection() method in our implementation, the prototype is only compatible with Mozilla-based browsers. With minor modifications in the client-side Javascript code, however, the prototype should work with a wide range of web browsers including Internet Explorer.

4.3. Modified Documents and UnavailableResources

Similar to broken links inweb pages,documentsthat are modified and resources that become un-available can pose a problem for transclusions.One reason for this deficiency is the use of Uni-form Resource Locators on the world wide web�URLs, �2��.

URLs identify an object and describe its phys-ical location. Defining the physical location ofa document determines that only one instanceof the document may exist at a time. Differ-ent resource identification and allocation mech-anisms allow for multiple locations of the samedocument, i. e. , several instances of the samedocument may exist in different physical loca-tions. When a resource with a certain objectidentifier is requested, it is retrieved from oneof the locations that retain a copy �e. g. �27��.This can, for instance, be the location with thefastest network connection, the one with thelowest load, or the one with the shortest dis-tance.

The design of Xanadu takes a similar approach, in which a resource may exist in several locations (e.g., [25]). Thus, when a transclusion is requested and one instance of the source data becomes unavailable, it is retrieved from another repository containing the same information.

We propose an analogous mechanism that makes use of the Wayback Machine (e.g., [13]), a very large archive of currently about forty billion web pages, and local caching. When a transclusion is requested whose source document has undergone major changes or has become unavailable, the Wayback Machine is queried for the resource. The query includes the URL and the creation date of the transclusion as access date.
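A query of this kind can be sketched as follows. The sketch assumes the present-day address scheme of the archive, in which a 14-digit timestamp and the original URL are appended to `https://web.archive.org/web/` and the archive answers with the snapshot closest to that date; the function name `wayback_url` is ours and is not part of the prototype.

```python
from datetime import datetime

# Address prefix of archived snapshots (assumed; not specified in the paper).
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(source_url: str, created: datetime) -> str:
    """Build the archive address for source_url as of the date `created`.

    `created` is the creation date of the transclusion, which serves as
    the access date of the query.
    """
    timestamp = created.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{source_url}"
```

For a transclusion created on May 11th, 2005 at 09:30 from `http://example.org/a.html`, the function yields `https://web.archive.org/web/20050511093000/http://example.org/a.html`.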

Alternatively, a local cache or Google Cache can be employed. Local caching means that a copy of a resource has to be made when it is transcluded; the local copy is retained in an internal repository of the system. In the case of local caches, however, legal issues may arise. [1], for instance, discusses whether services such as Google Cache are in conflict with German copyright laws.

Figure 6 illustrates the suggested retrieval strategy for transclusions: when the original source of a transclusion is available and has not been changed, it is retrieved from the original location. Otherwise an attempt is made to load the page from the Wayback Machine or from a similar cache. If this attempt fails as well, the user is notified that the transclusion cannot be made at this time.

Fig. 6. Flow chart of a document retrieval strategy where the source document of a transclusion can become unavailable or can be modified.
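The retrieval strategy can be expressed as a short fallback chain. The following sketch only illustrates the control flow; the fetch functions and the change check are passed in as parameters, and all names (`retrieve`, `TransclusionUnavailable`, `Fetcher`) are hypothetical rather than taken from the prototype.

```python
from typing import Callable, Iterable, Optional

# A fetcher returns the document text, or None if it cannot supply it.
Fetcher = Callable[[str], Optional[str]]


class TransclusionUnavailable(Exception):
    """Raised when neither the original source nor any cache can be used."""


def retrieve(
    url: str,
    is_unchanged: Callable[[str], bool],
    fetch_original: Fetcher,
    fallbacks: Iterable[Fetcher],
) -> str:
    """Return the source of a transclusion, preferring the original location."""
    content = fetch_original(url)
    if content is not None and is_unchanged(content):
        return content
    # Original missing or modified: try the archive, then other caches.
    for fetch in fallbacks:
        cached = fetch(url)
        if cached is not None:
            return cached
    raise TransclusionUnavailable(url)
```

The caller is expected to catch `TransclusionUnavailable` and to notify the user that the transclusion cannot be made at this time.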

5. Discussion

The implementation of transclusions in a purely HTML-based environment has shown interesting perspectives, and various aspects need to be investigated in detail and require further research. A few selected topics are pointed out in the following sections.

5.1. Robustness

The prototype we presented offers ease of use and relative overall stability. The robustness, however, can still be improved. Under certain conditions, for example, transclusions can be imprecise. Content transcluded from a document by the “Extract and Merge” component can be slightly different from what a user originally selected – a few characters too many or too few are extracted.

An issue that generally affects the robustness of our implementation and demands in-depth analysis is modified content. The shortcoming partly arises from an optimization that improves the system performance. When modifications in the source document of a transclusion are to be detected, only the creation and modification dates as well as the content length in the HTTP header of the resource are scanned. Some servers do not return these values at all, though, and a small percentage of hosts return invalid date values. So if creation and modification dates or the content length are not available, the entire resource is retrieved and an MD5 hash value is generated. When the content to be retrieved is very large, the system load is high or the network connection is slow, it might take too long to calculate the hash value. In this case, the process might terminate with a time-out signal, and the transclusion cannot be made.
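The optimization can be sketched as a two-stage check. The sketch below is an illustration under assumptions: it uses the HTTP header fields `Last-Modified` and `Content-Length` as the date and length values, expects the headers as a mapping with exactly these key spellings, and does not validate the date. The names `fingerprint` and `has_changed` are ours.

```python
import hashlib
from typing import Callable, Mapping


def fingerprint(headers: Mapping[str, str], fetch_body: Callable[[], bytes]) -> str:
    """Return a cheap fingerprint from the headers, or an MD5 hash as fallback.

    fetch_body is only called when the header values are missing, because
    retrieving and hashing the entire resource is the expensive path.
    """
    last_modified = headers.get("Last-Modified")
    length = headers.get("Content-Length")
    if last_modified and length:
        return f"meta:{last_modified}:{length}"
    return "md5:" + hashlib.md5(fetch_body()).hexdigest()


def has_changed(
    stored: str, headers: Mapping[str, str], fetch_body: Callable[[], bytes]
) -> bool:
    """Compare the current fingerprint with the one stored at transclusion time."""
    return fingerprint(headers, fetch_body) != stored
```

One consequence of this design is that a server which stops sending the header values after the transclusion was created is reported as changed, since a header-based fingerprint never equals a hash-based one.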

As pointed out above, modifications in transclusion sources are a general problem. Especially dynamic content such as pages from a content management system or from a digital library can be critical. In many cases, these documents contain advertisements or other frequently changing information such as references to the most recent articles. Although the actual content of the document is not altered, the system component that analyses the state of transclusion sources would detect a modification.

It is desirable to have modifications in documents and their importance – was only an advertisement changed or has the meaning of the article changed – detected automatically. However, this functionality is presently not computationally feasible. [14] suggests leaving the decision to the user: despite the modifications, the content from the altered source document is transcluded, and the user has to determine whether the transclusion and the context are still appropriate.

In any case, authors whose articles transclude content from modified sources should be notified that the content they virtually included into their articles might not be suitable anymore, and that it has to be reviewed.

5.2. Aspects of the Design

The design of the implementation, and the use of a transclusion object in particular, opens up exciting opportunities. Since the transclusion object is associated directly with the source document of a transclusion, it is possible to determine which other articles in our system include content from the same source. This information indicates that the corresponding articles might deal with a similar topic and that they could be of interest for both authors and readers. More importantly, this information indicates that the authors of these articles might work in a similar area. In a scientific setting, for instance, these authors can be researchers working on similar projects. Thus, information exchange can be enabled. From a more general perspective, collaboration can be fostered and organisational knowledge management can be facilitated (e.g., [17]).

Therefore we propose a simple function like “Which other articles transclude the same document?” or “Who else uses the same document?” that can help readers and writers discover new information.

This principle can be applied in the “opposite direction” as well. Authors can easily find out which other articles in the system transclude the articles they produced. This information can basically be used for the same purposes as pointed out above. Hence, we propose another function that complements every article in the system: “Which articles in the system transclude this article?” or, simply put, “Who transcludes ‘us’?”
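Both functions can be served by one index over the transclusion objects. The following sketch assumes that each transclusion is recorded as a pair of article identifier and source identifier; the class `TransclusionIndex` and its method names are hypothetical and do not describe the data structures of the prototype.

```python
from collections import defaultdict
from typing import Dict, Set


class TransclusionIndex:
    """Maps sources to the articles transcluding them, and vice versa."""

    def __init__(self) -> None:
        self._by_source: Dict[str, Set[str]] = defaultdict(set)
        self._by_article: Dict[str, Set[str]] = defaultdict(set)

    def add(self, article: str, source: str) -> None:
        """Record that `article` transcludes content from `source`."""
        self._by_source[source].add(article)
        self._by_article[article].add(source)

    def transcluded_by(self, source: str) -> Set[str]:
        """'Who transcludes us?': articles that include content from `source`."""
        return set(self._by_source.get(source, ()))

    def related_articles(self, article: str) -> Set[str]:
        """'Who else uses the same document?': articles sharing a source."""
        related: Set[str] = set()
        for source in self._by_article.get(article, ()):
            related |= self._by_source[source]
        related.discard(article)
        return related
```

Since an article of the system can itself be the source of a transclusion, calling `transcluded_by` with the identifier of an article answers the question in the opposite direction without a second data structure.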

In a more sophisticated approach, the system could pro-actively point out resources and authors that are related to the article being displayed.

5.3. Aspects of the Proxy Application

Our implementation relies on a non-transparent HTTP proxy application that makes it possible to insert a small portion of Javascript code into every page the user wishes to transclude. Although the application was initially intended for a very specific purpose, its design is so flexible that a whole range of other, largely unrelated applications become feasible.

Blacklisting of words and hyperlinks, highlighting of text and dynamic insertion of annotations are just a few simple examples. Advanced techniques may include dynamic adaptation of content, on-the-fly insertion of complementary information, etc.

These novel ideas need to be explored in detail. A comprehensive analysis and an evaluation of early results will be presented in a forthcoming publication.

6. Conclusion

This paper briefly outlined Ted Nelson’s notion of hypertext and one of its prime concepts – transclusions. Although HTML has been influenced by the notion of transclusions for the inclusion of external objects such as images, they have not been implemented consistently. Therefore a number of proposals have been made on how to implement transclusions with the technologies available today. A few of the most important approaches have been discussed.

Based merely on the technologies provided by a web-based environment, we have designed a system that allows users to author articles that may contain transclusions. A first prototype utilises plain HTML, Javascript and server-side components including CGI scripts and a specialised HTTP proxy application.

Although a number of issues were encountered during the implementation phase, we were able to collect valuable results that make us confident that we can enhance the current prototype and increase its robustness and stability.

The innovative design of the transclusion structures, as well as the architecture of system components, opens up new perspectives and can lead to more advanced functionality. Facilitating information discovery, pro-active dissemination of related content and the stimulation of community-building are only a few possibilities among others.

With the infrastructure provided by the proxy application, elaborate functionality such as automatic highlighting of information, dynamic content adaptation and on-the-fly insertion of related data can be introduced. This concept can prove to be beneficial in numerous environments including electronic encyclopaedias and digital libraries.

These ideas will be discussed thoroughly in a future paper.

7. Acknowledgements

The first author would like to thank Edmund Haselwanter for his valuable input on the Wayback Machine.

References

[1] M. BAHR, The Wayback Machine und Google Cache – eine Verletzung deutschen Urheberrechts? http://www.jurpc.de/aufsatz/��������.htm, (2002) Accessed May 11th, 2005.

[2] T. BERNERS-LEE, Universal Resource Identifiers in WWW. A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as Used in the World-Wide-Web, Request for Comments 1630 (1994). See also http://www.w3.org/Addressing/rfc1630.txt.

[3] A. DI IORIO AND F. VITALI, A Xanalogical Collaborative Editing Environment, Proceedings of the Second International Workshop of Web Document Analysis (WDA2003), (2003) Edinburgh, UK, August 2003. See also http://www.csc.liv.ac.uk/~wda2003/Papers/Section_III/Paper_�.pdf.


[4] L. WOOD ET AL., Document Object Model (DOM) Level 1, Specification Version 1.0, http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/, (1998) Accessed March 29th, 2005.

[5] P. LE HEGARET ET AL., W3C Document Object Model (DOM), http://www.w3.org/DOM/, (2005) Accessed March 29th, 2005.

[6] ECMA, ECMAScript Language Specification, 3rd Edition, Standard ECMA-262, (1999). See also http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf.

[7] D. RAGGETT ET AL., HTML 4.01 Specification, http://www.w3.org/TR/html �, (1999) Accessed April 28th, 2005.

[8] R. FIELDING ET AL., Hypertext Transfer Protocol – HTTP/1.1, Request for Comments 2616 (1999). See also ftp://ftp.isi.edu/in-notes/rfc2616.txt.

[9] Hyperwave, http://www.hyperwave.com/.

[10] P.P. KOCH, JavaScript – Get selection, http://www.quirksmode.org/js/selected.html, (2004) Accessed May 10th, 2005.

[11] Transclusions in HTML-Based Environments, http://www.kolbitsch.org/research/transclusions/.

[12] J. KOLBITSCH, H. MAURER, Community Building around Encyclopaedic Knowledge, (2005) to be published.

[13] R. KOMAN, How the Wayback Machine Works, http://webservices.xml.com/lpt/a/ws����������brewster.html, (2002) Accessed May 10th, 2005.

[14] H. KROTTMAIER, Transcluded Documents: Advantages of Reusing Document Fragments, Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing (ELPUB2002), (2002) Karlovy Vary, Czech Republic, pp. 359–367. See also http://hkrott.iicm.edu/docs/publications/elpub�����.pdf.

[15] H. KROTTMAIER, D. HELIC, Issues of Transclusions, Proceedings of the World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education (E-Learn 2002), (2002) Montreal, Canada, pp. 1730–1733. See also http://coronet.iicm.edu/denis/pubs/elearn����b.pdf.

[16] H. KROTTMAIER, H. MAURER, Transclusions in the 21st Century, Journal of Universal Computer Science, 12 (2001), pp. 1125–1136. See also http://www.jucs.org/jucs_7_12/transclusions_in_the_21st/.

[17] H. MAURER, K. TOCHTERMANN, On a New Powerful Model for Knowledge Management and its Applications, Journal of Universal Computer Science, 1 (2002), pp. 85–96. See also http://www.jucs.org/jucs_8_1/on_a_new_powerful/.

[18] MICROSOFT CORPORATION, About Cross-Frame Scripting and Security, http://msdn.microsoft.com/workshop/author/om/xframe_scripting_security.asp, (2005) Accessed May 10th, 2005.

[19] A. MOORE ET AL., Personally tailored teaching in WHURLE using conditional transclusion, Proceedings of the Twelfth ACM Conference on Hypertext and Hypermedia, (2001) Aarhus, Denmark, pp. 163–164.

[20] T.H. NELSON, A File Structure for the Complex, the Changing and the Indeterminate, Proceedings of the ACM 20th National Conference, (1965) Cleveland, OH, U.S.A., pp. 84–100.

[21] T.H. NELSON, Literary Machines, Mindful Press, 1981.

[22] T.H. NELSON, The Heart of Connection: Hypermedia Unified by Transclusion, Communications of the ACM, 8 (1995), pp. 31–33.

[23] T.H. NELSON, Generalized Links, Micropayment and Transcopyright, http://www.almaden.ibm.com/almaden/npuc������tnelson.htm, (1996) Accessed May 1st, 2003.

[24] T.H. NELSON, Transcopyright: Pre-Permission for Virtual Republishing, http://www.aus.xanadu.com/xanadu/transcopy.html, (1998) Accessed May 3rd, 2005.

[25] T.H. NELSON, Xanalogical Structure, Needed Now More than Ever: Parallel Documents, Deep Links to Content, Deep Versioning and Deep Re-Use, ACM Computing Surveys, 4es (1999). See also http://xanadu.com.au/ted/XUsurvey/xuDation.html.

[26] NETSCAPE COMMUNICATIONS, JavaScript Developer Central, http://developer.netscape.com/tech/javascript/, (2004) Accessed February 3rd, 2004.

[27] A. PAM, Where World Wide Web Went Wrong, Proceedings of the AUUG’95 & Asia-Pacific World Wide Web ’95 Conference & Exhibition, (1995) Sydney, Australia. See also http://www.csu.edu.au/special/conference/apwww95/papers95/apam/apam.html.

[28] A. PAM, Fine-Grained Transclusion in the Hypertext Markup Language, Internet Draft (1997). See also http://xanadu.com.au/archive/draft-pam-html-fine-trans-��.txt.

[29] E. WILDE, D. LOWE, XML Linking Language. In XPath, XLink, XPointer, and XML: A Practical Guide to Web Hyperlinking and Transclusion (E. Wilde and D. Lowe), (2002) pp. 169–198. Addison-Wesley Professional. See also http://search�webservices.techtarget.com/searchWebServices/downloads/wilde�lowe ��.pdf.

[30] S. DEROSE ET AL., XML Linking Language (XLink) Version 1.0, http://www.w3.org/TR/xlink/, (2001) Accessed April 28th, 2005.


[31] W3C Extensible Markup Language (XML), http://www.w3.org/XML/, (2003) Accessed April 28th, 2005.

[32] S. DEROSE ET AL., XML Pointer Language (XPointer), http://www.w3.org/TR/xptr/, (2002) Accessed May 2nd, 2005.

[33] M. YOCOM, Mac OS History, http://www.macos.utah.edu/Documentation/MacOSXClasses/macosxone/macintosh.html, (2004) Accessed May 3rd, 2005.

Received: May, 2005
Accepted: February, 2006

Contact address:

Josef Kolbitsch
Graz University of Technology
Steyrergasse 30
8010 Graz, Austria
e-mail: josef.kolbitsch@tugraz.at

Hermann Maurer
Institute for Information Systems and Computer Media
Graz University of Technology
Inffeldgasse 16c
8010 Graz, Austria
e-mail: hmaurer@iicm.edu

JOSEF KOLBITSCH is a PhD student at Graz University of Technology. His research interests include electronic encyclopaedias, digital libraries and hypermedia.

HERMANN MAURER is professor and dean of the faculty of computer science at Graz University of Technology. He is author of some twenty books and more than 600 contributions in various publications. Recently he has also published “XPERTS”, a series of science fiction novels.