The Text in the Machine: Electronic Texts in the Humanities

274 BOOK REVIEWS

as well as for students interested in IR. It is easy to read and has an astonishingly wide horizon, discussing hundreds of interesting IR topics and pointing the interested reader in the right directions.

The book is suited for both undergraduate and graduate courses on Information Retrieval. The judicious choice of subjects and their thorough treatment, the use of detailed examples and the proposed exercises make this book an excellent course-book. For a graduate course the book would need to be complemented with a number of more in-depth articles, specifically on the more novel techniques discussed in chapters 5 to 7, but the detailed and well-balanced bibliography of the book makes up for this. This book was used in a graduate course by the authors, and they indicate a website where the overheads and speaker notes used in teaching it can be obtained.

For the IR expert or researcher, the interest of this book lies in the wide range of topics studied and the critical bibliography provided for these topics. Whilst one may find better books on each of the topics covered by this book, no one book in IR covers them all so clearly and thoroughly. One exception is the chapter on the integration of IR in database management systems (Chapter 5), which is very original and cannot be found elsewhere.

It must be noted that the book only deals with ad-hoc retrieval, and does not discuss other important information retrieval topics such as document classification, filtering or routing, passage retrieval, text segmentation, topic detection and tracking, etc.

Furthermore, little attempt is made to motivate the methods presented from a mathematical or statistical perspective, and in this respect it may prove insufficient for certain readers. The book discusses so many different topics that a conscious choice has been made to keep explanations simple and intuitive.

Hugo Zaragoza
Microsoft Research Ltd.
7 JJ Thomson Avenue
Cambridge CB3 0FB, UK
Email: [email protected]

The Text in the Machine: Electronic Texts in the Humanities. Toby Burrows. New York: Haworth Press, Inc.; 1999; 182 pp. with Index. Price: $49.95 hard (ISBN: 0-7890-0424-0).

This excellent introduction for the uninitiated (and solid reference for the more knowledgeable) reader is clearly written and well-organized. The author, Principal Librarian of the Scholar’s Centre at the University of Western Australia, has worked with electronic texts for several years, and is also co-director of a nationally funded project to establish web service for the Berndt Museum of Anthropology in Perth.

In the Preface, Dr. Burrows discusses the nature and significance of text and the continuing centrality of texts in the humanities. Further, he defines the electronic text, for the purposes of the work, as having two essential characteristics: it must be an electronically stored version of some previously existing print or manuscript document; and it must be published or publicly distributed in some way. Since Burrows’ focus is on the conversion of existing texts into electronic form for the scholarly community’s access and research, privately held electronic copies made by individuals and resident on their own machines, and works originally created in electronic form, are both ignored.

Each chapter contains illustrative examples drawn from the best-known baker’s dozen of Internet and Web digital projects, including: Library of Congress’ American Memory Project, the Australian Cooperative Digitisation Project, Cetedoc Library of Latin Texts, the Canterbury Tales Project, the Electronic Beowulf Project, Editions and Adaptations of Shakespeare, Goethes Werke, Project Gutenberg, Literature Online, the Online Book Initiative, the Perseus Project, the Oxford Text Archive, Labyrinth Library of Medieval Studies, as well as other commercial and academic projects. There are also screen shots as appropriate; end-of-chapter recommendations for Further Reading that point to sources listed in the Bibliography and websites discussed in the body of the chapter text; and a very good index.

Chapter 1, Markup Systems for Electronic Texts, introduces readers to the concepts and technology of markup languages, focusing on the Standard Generalized Markup Language (SGML) and its offspring of particular concern to the scholarly community: the Text Encoding Initiative (TEI and TEI Lite), HyperText Markup Language (HTML), and the eXtensible Markup Language (XML).

The discussion of markup languages begins with the “In principio” of the Lindisfarne Gospels. Rooted in the writings of Sperberg-McQueen, this chapter enlightens the reader without talking down to him and, equally laudable, without falling into the kind of geekspeak that alienates all but the truly hardcore technophile. Burrows includes cogent, concise discussions of the punctuational, presentational, procedural, and descriptive types of markup, and evaluates their differences.
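For readers who have not met descriptive markup before, a minimal sketch may help. It assumes a small TEI-like fragment invented for this illustration (the element names and the `type` attribute are hypothetical choices, not an encoding taken from the book or validated against the TEI scheme) and uses only Python's standard `xml.etree.ElementTree` module. The point it tries to show is that descriptive markup records what a piece of text is, so one encoded source can serve more than one purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment, invented for illustration only.
SAMPLE = """<text>
  <head>Incipit</head>
  <l n="1">In principio erat <name type="divine">Verbum</name></l>
</text>"""


def render_plain(root):
    """Reading view: discard the tags and keep the words of each line."""
    return ["".join(line.itertext()) for line in root.iter("l")]


def list_names(root):
    """Index view: collect everything the encoder marked up as a name."""
    return [(el.get("type"), el.text) for el in root.iter("name")]


root = ET.fromstring(SAMPLE)
plain_lines = render_plain(root)
names = list_names(root)
print(plain_lines)
print(names)
```

Presentational or procedural markup would instead record how the word should look (bold, red, indented), which would support the first view but not the second.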

The author deftly and clearly handles the discussions of SGML, TEI, HTML, and XML. His careful prose introduces the main points of SGML and covers the evolution of HTML and XML from it painlessly. The discussion of TEI and TEI Lite is one of the best I’ve seen. XML is “the new thing” and as such is the focus of the technology press on and offline. Few pieces aimed at the non-technical audience do as good a job of succinctly introducing this technology and its relations, XLL and XSL, and their potential utility as Burrows does in this chapter. I plan to add this chapter to my “recommended readings” list for an adult continuing education course I teach in XML.

Chapter 2, Creating an Electronic Text, addresses what many of us might view as the “tedious” bits, that is, how texts are actually shifted from the printed page to the electronic form. Obviously, there are limited options: the text can be typed (keyed in), an existing electronic form of the desired text can be reused, or the text can be scanned and then processed with OCR. Each of these options carries technical baggage ranging from the choice of file formats (text or image? ASCII? Rich Text? GIF, TIFF, JPEG, PDF?) to the determination of standards of accuracy, retention guidelines, and so forth. Burrows outlines the technological challenges without unduly alarming the reader. Those already working in digital library projects are familiar with these issues, but readers who have not yet embarked upon the quest will be well served by the discussion.

The first half of Chapter 3, Delivery Mechanisms for Electronic Texts, addresses more familiar territory for those already working with electronic texts: residency/storage, essentially “local” (diskettes, CD-ROMs, magnetic tape) and worldwide (Internet and World Wide Web distribution, including FTP). In the remainder of the chapter, Burrows discusses the various types of software involved, from word processing packages to browsers and SGML readers.

In Chapter 4, Organizing Access to Electronic Texts, the author introduces the human agents and agencies involved in producing and distributing electronic texts: individual scholars and organizations ranging from university-based groups to commercial concerns. The strengths and weaknesses associated with the locus of production and maintenance of web resources are well-known to those already involved. For readers new to the adventure, the considerations of peer review in relation to web-based documents and the pitfalls of “renting” access to content from commercial brokers will be enlightening and perhaps troubling.

Burrows ends the chapter with brief discussions of metadata and preservation issues. Electronic repositories as well as physical libraries grapple with metadata issues: the MARC format has not proven to be all things to all collections. Efforts to come up with a more functional replacement, including the Dublin Core work, Educom’s Instructional Management Systems (IMS) project, and frameworks utilizing XML (Resource Description Framework [RDF] and Meta Content Framework [MCF]) are discussed.
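To give a sense of what a lightweight alternative to a MARC record can look like, here is a minimal sketch of a Dublin Core description of the book under review, built with Python's standard `xml.etree.ElementTree` module. The `record` wrapper element, the `dublin_core_record` helper, and the choice of five fields are assumptions made for illustration; only the namespace URI is the published Dublin Core element namespace, and the field values come from the citation above.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat record with one dc:* child per (name, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Values taken from the bibliographic citation in this review.
fields = {
    "title": "The Text in the Machine: Electronic Texts in the Humanities",
    "creator": "Burrows, Toby",
    "publisher": "Haworth Press",
    "date": "1999",
    "identifier": "ISBN 0-7890-0424-0",
}

record = dublin_core_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The flat list of optional, repeatable elements is what makes such a record easy to produce; it also carries far less detail than a full library catalogue record.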

Issues of preservation of electronic content and access to it continue to challenge the library and archives communities. Electronic repositories are poised to replace some physical collections. Creating and maintaining electronic repositories is an expensive proposition, and as such they are likely to remain in the province of institutions, governments, and commercial concerns. Many university projects are in essence pilot projects and are funded for only a limited time, suggesting that these projects may by definition be ephemeral. Governmental bodies increasingly turn to the Web for dissemination of information to the public while reducing or eliminating the production of physical information products for the same public. This trend endangers access to information by citizens and, in the long term, by scholars of government.

Commercial vendors, an ever more consolidated small group of players, may be the most problematic partners for collection developers and librarians. Electronic subscriptions are in essence leases of content that reside outside an institution’s facility. Subscribing institutions may be limited to a set number of approved users accessing the content at any given time, or restricted by IP address (thereby limiting access to those physically located on the campus), and may (or may not) have access to archives for years during which the institution had a subscription if it no longer maintains the subscription. (With a print product, the institution retains the back issues it purchased regardless of whether it maintains a current subscription.) With the rising cost of academic journals, every limitation or diminution of usage relative to the print product adds hidden costs to the subscribing institution and its users. These concerns are in addition to the headaches of fitful network conditions, hardware and software failures at either end (provider and user), and malicious mischief aimed, likewise, at both ends.

In addition to preservation of access to electronic content, there is the matter of physical preservation of the data. No electronic medium has yet demonstrably improved upon the lifespan of archival paper. While government may well be obliged to maintain content and access by virtue of its responsibilities to the citizenry, universities might not be able to afford an ongoing effort to upgrade their technology infrastructures as quickly as information technologies evolve. Anyone who has a small collection of floppy disks holding inaccessible documents in old software formats can imagine the essence of this problem facing the Library of Congress and other repositories. Commercial vendors, for their part, are unlikely to maintain content that does not bring in sufficient return to justify the investment in upgrading its storage technology.

Chapter 5, Structure, Architectures, and Editions, addresses scholarly editing and its relationship to the text, including how that relationship is challenged and changed with electronic texts. The linear presentation of printed books is often reproduced online with HTML and image files. However, as proponents of hypertext have argued, the digital media free both creators and experiencers from the tyranny of linear presentation. “Hypertext is something that is designed and intended to be read in a nonlinear fashion ... the art of effective hypertextual design consists, therefore, in building as many interconnecting links and paths as possible ...” (p. 140). The debate over useful and effective design of hypertext resources continues across the various online communities of interest, from designers to distance educators. This chapter introduces it in the context of creating scholarly texts, pointing out some distinct advantages of electronic versions over the more traditional print versions.

The discussion of scholarly editions introduces types of editions with brief discussions of each. Electronic presentations done well offer users flexibility unmatchable in print resources, such as parallel texts with embedded annotations, definitions, commentaries, etc., in addition to being able to reproduce traditional forms. Burrows points out that “[the electronic text] seems likely to become the preferred medium for presenting the results of extensive scholarly study of a particular work.”

Burrows completes his work by arguing the role of electronic texts as agents in preserving the value and importance of our printed heritage: “Electronic texts are never likely to replace printed ones. But it is possible to do things in the electronic medium that are impossible, or uneconomic, to do in print: index every word in the text, transcribe every variant, and link the text to a videotaped performance, for instance. Above all, however, electronic versions can bring these texts to a new readership. We live in a visual age, where so much communication takes place through the computer and television screen, where the image is more powerful than the text. By developing electronic texts that exist in a visual medium and are embedded in a visual context, we can make use of the power of the image to convey the importance of the text. From this point of view, electronic texts are a vitally important tool for ensuring that our great heritage of printed books and manuscripts retains its place in an increasingly visual and image-centered world.” (pp. 164–165)

The regular announcements of new electronic text projects, such as former Rutgers English professor Jeffery Triggs’s Global Language Resources (a repository of 600 SGML full-text versions of literary classics) and the Medici Archive Project (http://www.medici.org/general/), suggest that widening access to the Web and its attractiveness to users will increase interest in and production of Web-based resources by individuals outside the Academy, commercial producers, and institutions and their associated scholars for the foreseeable future. The Text in the Machine is a thorough, well-reasoned discussion of the challenges and opportunities involved in creating, disseminating, and maintaining scholarly electronic texts in the humanities. Readers new to the subject and those involved in creating electronic repositories alike will find value in the work. It would make an excellent text for a digital libraries course at the university level.

M. Zoe Holbrooks
MLIS, P.O. Box 95214
Seattle, WA 98145-2214
Email: [email protected]