
© Blackwell Publishing 2004

Literature Compass 1 (2004) ME 061, 1–5

XML and Early English Manuscripts: Extensible Medieval Literature

Kathryn Powell

University of Manchester

Abstract

This article is intended to introduce literary scholars to the use of markup languages in general and the Extensible Markup Language (XML) in particular in the creation of electronic texts based on early English manuscripts. It is not a how-to, but primarily a survey of recent work. It notes the limitations of some popular markup languages, such as Hypertext Markup Language (HTML) and Standard Generalized Markup Language (SGML), in terms of storing, reproducing and disseminating information about medieval manuscripts, and explains ways in which XML could be used to overcome some of these limitations. It also describes the use of XML in a research project based at the University of Manchester’s Centre for Anglo-Saxon Studies.

Recently, when I was explaining to another academic that much of my research combined computer applications with the study of early English manuscripts, my auditor actually laughed and said, skeptically, ‘Computers and medieval manuscripts? How does that work?’ The assumption behind this reaction – that early medieval people didn’t use computers, so why would a contemporary academic need computers to study their cultural productions – is a nonsensical one when scrutinized. In fact, so many scholars in my particular field – Anglo-Saxon studies – have made such extensive use of technology, especially in the study of texts and manuscripts, that when asked ‘how that worked’, my immediate response was, ‘Very well, actually.’

As the history of Anglo-Saxon studies begins in the typographic era and because not everyone wishing to read or study Anglo-Saxon manuscripts and the texts they contain has had access to these singular productions, Anglo-Saxonists have always had to use technology to reproduce manuscripts and make their content more widely available. Typographic reproductions, including diplomatic texts and editions that include information about the manuscript state of medieval texts, make Old English texts much more widely available even to beginning students, but are severely limited in their ability to represent the characteristics of a medieval manuscript. They can only comment on features such as the foliation of a manuscript, the characteristics of a hand, or any illustrations or illuminations accompanying a text in a manuscript, and scribal errors or marginal entries must usually be relegated to the textual notes or apparatus of an edition. Facsimiles – whether photographic or electronic – are able to reproduce many more of the features of a manuscript, but are limited in terms of their accessibility both because of the expense of producing them and because of the difficulty of providing any text or commentary to accompany the images in such a way as to make the facsimile useful to literature students who may have limited training in reading manuscripts. When electronic markup languages – metalanguages with their own vocabularies and syntaxes that tell a computer how to interpret the content of a text – first began to emerge, they presented a possible compromise. By ‘marking up’ an electronic edition of an Old English text, one could make the text available to a variety of readers while at the same time providing a computer with ‘metadata’ – information about the text and how to interpret it, including information about its manuscript context(s). Ideally, this metadata could then be used by the computer to store information about a text in a database, map two-dimensional coordinates of a manuscript page’s contents in a drawing program, or display a text in a web browser in a form that preserved some of the layout of a manuscript page or highlighted such features as changes of hand or scribal errors.

It is not surprising, then, that markup languages have been widely adopted and implemented in the creation of electronic editions of Old English texts. Probably the most widely used of these has been the Hypertext Markup Language (HTML), used to create editions accessible through a web page.¹ Because its set of tags – terms used to ‘mark up’ a text – is quite small and easy to learn, HTML has been an easy markup language for scholars to adopt and use. The same limitations that make it easy to use, however, also severely restrict its ability to represent the complexities of medieval manuscripts. Furthermore, like typographic editions and photographic facsimiles, HTML texts exist only to be read. Despite being electronic, they cannot be easily read by or passed between computer programs other than web browsers. Thus, scholars wishing to create texts that could not only be read but also electronically processed and analysed have turned to more complex markup languages, particularly the Standard Generalized Markup Language (SGML). SGML encoding has been used, for example, in The Electronic Beowulf – a digital facsimile of London, British Library, Cotton Vitellius A. xv – to encode both a new edition of the poem and a transcript of it from the manuscript with information about the poem’s manuscript state, among other things.² Nonetheless, the great virtue of SGML – its complexity – is also its greatest drawback. While the language is capable of representing Anglo-Saxon texts and manuscripts in ways useful to a variety of human and computerized readers, it is so complex that it is very difficult to learn or fully implement. A promising alternative is provided by the Extensible Markup Language (XML). Although XML is a very new markup language (it was first introduced in February of 1998), it has already been widely adopted in a variety of academic and non-academic disciplines.³ It is essentially a streamlined version of SGML that is relatively easy to learn and capable of being used in a wide variety of applications. While XML has a very strict grammar and syntax – i.e., a set of rules about what tags should look like, what order they should appear in, and what constitutes a grammatical line of XML – it has an almost entirely fluid vocabulary, so that tags and their meanings can be defined by users in a stylesheet, and computer programs can recognize different ‘dialects’ of XML as long as they have access to the appropriate stylesheets.⁴ This makes XML particularly adaptable to unusual applications, including encoding textual features found in medieval manuscripts that are not common to typographic texts.
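The contrast between XML's strict syntax and its fluid vocabulary can be illustrated with a minimal sketch in Python, using the standard library's xml.etree.ElementTree parser. The tag and attribute names below (line, word, hand) are invented for illustration and are assumptions, not part of any standard or of any project described here. The parser accepts them without complaint as long as the tags are properly nested, and rejects the same text when the closing tags appear in the wrong order.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names chosen for illustration; XML itself does not
# define <line>, <word> or the "hand" attribute.
WELL_FORMED = '<line n="1" hand="A"><word>we</word> <word>gardena</word></line>'

# The same kind of text with the closing tags in the wrong order.
MALFORMED = '<line n="1" hand="A"><word>we</line></word>'


def is_well_formed(xml_text):
    """Return True if xml_text obeys XML's syntax rules, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def tag_names(xml_text):
    """List the tag names in document order (the document's 'vocabulary')."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))   # the invented vocabulary is accepted
    print(is_well_formed(MALFORMED))     # the broken nesting is rejected
    print(tag_names(WELL_FORMED))
```

Note that this sketch only checks that a document is grammatical XML; checking that it uses an agreed set of tags would additionally require a DTD or similar definition, which the standard-library parser used here does not validate against.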

Currently, the research project of which I am a member, ‘An Inventory of Script Categories and Spellings in Eleventh-Century England’, based at the University of Manchester’s Centre for Anglo-Saxon Studies and led by Donald Scragg, is developing a web-accessible database of eleventh-century Old English spellings and using XML to mark up text files to be stored by the database. The twin aims of the project are to identify and precisely date categories of script and, where possible, individual scribes in the eleventh century and to catalogue Old English spelling variants found in eleventh-century manuscripts. It is hoped that, by using a database to correlate the data on spelling variants with information on the palaeographic dating of manuscripts, it will be possible to determine whether there was a standard of written Old English in the eleventh century and the degree to which such a standard may have been adhered to.⁵ In order to properly compile data about the use of variant spellings and store it in our database, we are creating text files which not only preserve the exact readings of the manuscripts, but which also carefully preserve information about the manuscripts they are based on, the various hands used in these manuscripts and precise locations where hands change. XML provided us with a relatively easy method of marking these features within our text files in a way that could be read by our database. After a few weeks of consultation between academic members of our project and our technical assistant from Manchester Computing, Dan Smith, we were able to create a set of tags that was largely compliant with the Text Encoding Initiative (TEI),⁶ and that could be used to create a clear data trail between words and manuscripts in our database. Furthermore, we have created tags that could be implemented in the future to store information about scribal errors or erasures within a manuscript or to mark up foreign words or proper names in our texts so that they could be automatically stored in a special directory of our database, but eliminated from our dataset of English spellings. Also, although it is not within the scope of this project to do so, it is possible to repurpose the XML-encoded text files we are creating – for example, to create a different stylesheet that will use the same set of tags to display these texts in a web browser, highlighting the various features of the manuscript that we have marked up. It has taken more than a few weeks of training and practice for members of the project to become proficient in marking up our text files, but the time we have spent implementing XML has saved us time entering data into the database and will eventually result in a set of text files which may prove useful to others studying eleventh-century texts and manuscripts.
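To give a rough idea of how such markup can yield a data trail between words and manuscripts, the following Python sketch reads a short encoded passage and produces one row per English spelling, recording the manuscript and the hand in which it occurs and leaving out anything marked as foreign. It is an illustration under stated assumptions, not the project's actual encoding or software: the element names (text, w, handShift, foreign) are loosely modelled on TEI, and the attribute names, the manuscript label and the sample words are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the manuscript label, element names and attribute
# names are illustrative assumptions, not the project's actual encoding.
SAMPLE = """
<text ms="Example MS">
  <handShift new="A"/>
  <w>cyning</w> <w>cining</w>
  <foreign lang="la"><w>rex</w></foreign>
  <handShift new="B"/>
  <w>kyning</w>
</text>
"""


def spellings_by_hand(xml_text):
    """Return (manuscript, hand, spelling) rows, skipping foreign words."""
    root = ET.fromstring(xml_text.strip())
    manuscript = root.get("ms")
    rows = []
    current_hand = None

    def walk(element, in_foreign):
        nonlocal current_hand
        if element.tag == "handShift":
            # A milestone element: every later word belongs to the new hand.
            current_hand = element.get("new")
        elif element.tag == "w" and not in_foreign:
            rows.append((manuscript, current_hand, element.text))
        for child in element:
            walk(child, in_foreign or element.tag == "foreign")

    walk(root, False)
    return rows


if __name__ == "__main__":
    for row in spellings_by_hand(SAMPLE):
        print(row)
```

Rows of this kind could be loaded into a database table, while the same file, read by a different program or stylesheet, could instead drive a display that highlights each change of hand.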

Our experiences with XML have not all been positive. It is a language and presents all of the standard challenges of language learning. In particular, the incredibly precise syntax and grammar that allow XML to pass information between so many different computer applications can be very frustrating for humans, who are not usually so exact in their use of language. It is not immediately easy to write an XML document that ‘parses’; nor is it always easy to see why a document fails to parse. The fluidity of XML’s vocabulary, however, makes it, like English, a language which can be adapted to meet a variety of needs. I believe we have only begun to see the applications of XML in the study of Anglo-Saxon texts and manuscripts. The possibility exists of using XML to create better web-accessible or CD-ROM-based editions of Old English texts than can be created in HTML – ones which contain more information about the manuscript state(s) of a text and are able to display that information in a variety of forms other than the hyperlink. If XML were widely adopted by scholars working on research projects in Old English, it would even be possible to standardize on a particular set of tags that formed a subset of the TEI guidelines and that would allow XML-encoded texts to be easily shared among researchers in Old English. In short, I believe XML is yet another technological advance that has the potential to bring together computers and manuscripts in a way that works very well.
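On the difficulty, mentioned above, of seeing why a document fails to parse: most parsers will at least report where they gave up. The short Python sketch below, again using the standard library's xml.etree.ElementTree, returns the line, column and message of the first syntax error it meets. The sample text and its w element are hypothetical. The reported position is where the parser noticed the problem, which may be some distance after the tag that actually caused it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in which the second <w> element is never closed.
BROKEN = "<text>\n<w>cyning</w>\n<w>cining\n</text>"
FIXED = "<text>\n<w>cyning</w>\n<w>cining</w>\n</text>"


def first_syntax_error(xml_text):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position
        return line, column, str(error)
    return None


if __name__ == "__main__":
    print(first_syntax_error(BROKEN))
    print(first_syntax_error(FIXED))
```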

Notes

¹ Most of the HTML editions of Old English texts available on the web are indexed on C. Ball’s Old English Pages (http://www.georgetown.edu/cball/oe/old_english.html). Regular updates on new electronic resources in Anglo-Saxon Studies, including new electronic editions, are also provided in M. K. Foys’s ‘Circolwyrde’, a regular column appearing in each fall issue of the Old English Newsletter, beginning with the Fall 2000 issue (34(1)).

² Electronic Beowulf (CD-ROM), ed. K. Kiernan (London and Ann Arbor, MI: The British Library and University of Michigan Press, 1999); v. 2.0 (London: British Library Publications, 2002). For more information on the Electronic Beowulf, see the website (http://www.uky.edu/~kiernan/eBeowulf/guide.htm/).

³ E. R. Harold and W. S. Means, XML in a Nutshell, 2nd ed. (Sebastopol, California: O’Reilly & Associates, 2002), p. 9. XML is also gaining a following among academics. Within Anglo-Saxon studies, the Toronto Dictionary of Old English Project has recently implemented some XML encoding alongside HTML and SGML in their publication of The Dictionary of Old English: A to F on CD-ROM. For more information, see the project website (http://www.doe.utoronto.ca/).

⁴ Technically, I am using the term ‘stylesheet’ to refer both to the variety of XML-compatible stylesheet formats and to Document Type Definitions (DTDs). While DTDs and stylesheets differ in form, they both serve essentially the same function: they tell a computer program what to do with the information contained within a set of XML tags.

⁵ For more background on the project, see D. Scragg, ‘Standard Old English and the Study of English in the Eleventh Century’. Old English Newsletter 35(1) (Fall 2001), pp. 24–26.

⁶ TEI is a consortium that aims to provide standards on how to encode a variety of texts for virtually any purpose. Their website (http://www.tei-c.org/) provides more information about them, including the latest set of XML-compatible TEI guidelines (http://www.tei-c.org/Guidelines2/index.html/) and ‘A Gentle Introduction to XML’ (http://www.tei-c.org/P4X/SG.html/).

Bibliography

The Dictionary of Old English: A to F on CD-ROM, http://www.doe.utoronto.ca/.

Electronic Beowulf (CD-ROM), ed. K. Kiernan (London and Ann Arbor, MI: The British Library and University of Michigan Press, 1999); v. 2.0 (London: British Library Publications, 2002).

Foys, M. K., ‘Circolwyrde’. Old English Newsletter, each Fall issue since 34(1) (Fall 2000).

Harold, E. R. and Means, W. S., XML in a Nutshell, 2nd ed. (Sebastopol, California: O’Reilly & Associates, 2002).

Old English Pages, http://www.georgetown.edu/cball/oe/old_english.html.

Scragg, D., ‘Standard Old English and the Study of English in the Eleventh Century’. Old English Newsletter 35(1) (Fall 2001), pp. 24–26.

TEI (Text Encoding Initiative), http://www.tei-c.org/.
