![Page 1: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/1.jpg)
Developing a Digital Library for the Humanities
• Gregory Crane ([email protected])
• Winnick Family Chair in Technology and Entrepreneurship
• Professor of Classics
• Director, Perseus Digital Library Project
• Http://www.perseus.tufts.edu/About/grc.html
![Page 2: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/2.jpg)
Perseus Digital Library
• On-going areas of Development
• 1987: DL on Classical Greek Culture
• 1993: History of Science
• 1996: Began work on Latin and Rome
• 1997: Early Modern English
• 1999: History and Topography of London
• 2000: Ancient Egyptian Giza
• 2000: Slavery and the US Civil War
![Page 3: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/3.jpg)
Partner Institutions
• Max Planck Institute for the History of Science (Berlin)
• Museum of Fine Arts, Boston
• Stoa Publishing Consortium
• New Variorum Shakespeare Series, Modern Language Association
• Special Collections at Tufts, Brandeis, the University of Pennsylvania
![Page 4: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/4.jpg)
On-Going Support
• National Endowment for the Humanities (DLI2, Preservation & Access, Education)
• National Science Foundation (DLI2)
• Fund for the Improvement of Postsecondary Education, Dept of Ed.
• Max Planck Society
![Page 5: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/5.jpg)
The Whole greater than the sum
• Tufts Health Sciences Database:
• An on-line Medical School Curriculum
– First iteration: 70% of the value
– Second iteration: 90%
– Third iteration: 130%
• “Data” and “system” interact in increasingly dynamic ways.
![Page 6: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/6.jpg)
Persistent value over time & space
• How many ages hence / Shall this our lofty scene be acted over, / In states unborn and accents yet unknown?
– Cassius in Julius Caesar
• How do we structure data for
– Contemporary users we can’t directly anticipate?
– Systems not yet designed?
![Page 7: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/7.jpg)
Radically New Documents
• Reconstructions of Historical Spaces, e.g.
– UVA’s Crystal Palace (London)
– UCLA’s Rome and VR Lab
• Integrating Virtual Spaces with Sources
– Museum of Fine Arts, Tombs at Giza
– Greek Sculpture
– The Streets of 19th Century London
![Page 8: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/8.jpg)
Traditional Docs Rethought
• Concordance: “Obsolete”
• Bibliographies — databases
• Encyclopedias — automatic linking
• Lexica and lexicography:
– Automatically discovered semantic relations
– THEN lexicographic work
![Page 9: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/9.jpg)
Development is two part
• Ultimate end: Radically new docs?
• Short term: Electronic Incunabula
– New Variorum Shakespeare
– Electronic Marlowe
– Tallis Street Maps
• FIRST we thoroughly analyze what we have
• THEN radical redesign emerges
![Page 10: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/10.jpg)
Technology outruns Practice
• The 3D Reconstruction/Virtual Space
– Cutting edge technology
– Still nascent scholarly practices
• Mature Document Structures
– Textual Notes: 1908 Richard III
– Traditional Text Citations: 1887 Commentary
![Page 11: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/11.jpg)
The More Things Stay the same...
• “Content” can remain unchanged
• “Presentation” is dynamic and flexible
– The Dictionary knows what you are reading
– Citations → Bidirectional links
– Automatic Linking by keyword
– Text and Atlas: Plot sites in a document
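The move from one-way citations to bidirectional links can be illustrated with a small index. This is only a sketch: the citation pattern (`Abbrev. book.section`), the document ids, and the function name `build_link_index` are hypothetical and are not taken from the Perseus system.

```python
import re
from collections import defaultdict

# Hypothetical citation format: an abbreviated author plus a dotted locus,
# e.g. "Thuc. 1.22". Real citation schemes are far more varied.
CITATION = re.compile(r"\b([A-Z][a-z]+\.)\s+(\d+(?:\.\d+)*)")

def build_link_index(documents):
    """Index citations in both directions: source -> targets and target -> sources."""
    cites = defaultdict(set)     # document id -> passages it cites
    cited_by = defaultdict(set)  # passage -> ids of the documents citing it
    for doc_id, text in documents.items():
        for author, locus in CITATION.findall(text):
            target = f"{author} {locus}"
            cites[doc_id].add(target)
            cited_by[target].add(doc_id)
    return cites, cited_by

docs = {
    "commentary-1887": "On the method see Thuc. 1.22 and Hdt. 7.152.",
    "lexicon-entry": "The word recurs at Thuc. 1.22.",
}
cites, cited_by = build_link_index(docs)
print(sorted(cited_by["Thuc. 1.22"]))
```

With the reverse map in hand, a reader looking at a passage can be shown every commentary note or lexicon entry that cites it, without any change to the stored content.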
![Page 12: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/12.jpg)
Current Paradigm: DL Diplomacy
• Monolithic Systems (e.g., Perseus!)
– One way to view each document
• Intercommunication via metadata
– DL as metadata for “opaque” objects
• Major Problems
– Renting access, rather than collecting content
– All publications become ephemera
![Page 13: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/13.jpg)
Three Strategies
• 1) The Editing Problem:
– How do real authors create structured docs?
• 2) Developing Radically New Docs:
– Archimedes DL on Mechanics
– MFA Excavations at Giza
• 3) Radical Repurposing of Print
– Bolles Collection on London
![Page 14: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/14.jpg)
Bolles Collection at Tufts
• Documenting the history and topography of London and its environs
– 35 “full-size” maps
– 320 more specialized maps
– 400 books (284 linear feet of shelf space)
– 1,000 pamphlets
– “Paper Hypertexts”
• 10,000+ “extra illustrations”
![Page 15: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/15.jpg)
Bolles Electronic Archive
• A Testbed for the Perseus Digital Library
• “Level 5” TEI Encoded Full Text
– Quotes, languages, proper names, dates, money
• High-end OCR and Double Keyboarding
– OCR ideal for some but not all
– Keyboarding much the best, money permitting
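Double keyboarding generally means that two typists key the same page independently and any disagreement is sent for review. A minimal sketch of that comparison step, using Python's standard `difflib`, follows; the sample sentence and the function name `keying_discrepancies` are illustrative assumptions, not part of the Bolles workflow.

```python
import difflib

def keying_discrepancies(first_pass, second_pass):
    """Return (first, second) word runs where two independent transcriptions disagree."""
    a, b = first_pass.split(), second_pass.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted runs
    ]

# Hypothetical sample: the second typist misreads a long s as "f".
typist_one = "The ward of Portsoken lieth without the wall"
typist_two = "The ward of Portfoken lieth without the wall"
print(keying_discrepancies(typist_one, typist_two))
```

Only the flagged runs need a human decision, which is why the method scales better than proofreading every word.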
![Page 16: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/16.jpg)
Bolles — Initial Texts
• Five Million Words now in L5 TEI
– Will exceed 10 million by year’s end
• Surveys of London History and Topography
– Stow, Maitland, Wilkinson, Allen, Thornbury
• Commentary on social conditions
– Mayhew, Archer, Hollingshead, Booth
• Literary works with London as backdrop
– Defoe, Dickens, “Sherlock Holmes”
![Page 17: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/17.jpg)
Images
• 10,000 Grayscale Images
– Mainly engravings of people and places
– “Opportunistic” metadata (= captions & context)
• 2,400 Contemporary Images
– Well catalogued and geo-referenced
• QTVR Panoramas
• 70 Tallis Map “Elevations”
![Page 18: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/18.jpg)
Geospatial Data
• Bartholomew 1:5000 Data set for London
– Modern data as reference and interchange
• Historical maps georeferenced to Barth. Data
– 10 so far (c. 2 hours each)
– Urban maps do not easily “line up”
– How to create an historical GIS?
• GPS Waypoints
– As of May 2000, good to within 10 m or better
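Georeferencing a scanned map to modern reference data usually starts from control points: places identified both on the scan and in the reference data set. The sketch below fits the simplest such mapping, an exact affine transform from three control points. It is an assumption-laden illustration: the coordinates are invented, the function names are hypothetical, and a single affine fit will not capture the local distortions that make historical urban maps hard to “line up”.

```python
def fit_affine(pixel_pts, map_pts):
    """Fit x' = a*x + b*y + c and y' = d*x + e*y + f from exactly three control points."""
    (x1, y1), (x2, y2), (x3, y3) = pixel_pts
    # Determinant of the matrix with rows [x, y, 1]; zero means collinear points.
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("control points are collinear")

    def solve(v1, v2, v3):
        # Cramer's rule: coefficients (p, q, r) with p*x + q*y + r = v at each point.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(*(pt[0] for pt in map_pts))
    d, e, f = solve(*(pt[1] for pt in map_pts))
    return a, b, c, d, e, f

def to_map(coeffs, x, y):
    """Apply the fitted transform to one pixel coordinate."""
    a, b, c, d, e, f = coeffs
    return a * x + b * y + c, d * x + e * y + f

# Invented control points: scan pixels -> reference grid coordinates in metres.
pixels = [(0, 0), (100, 0), (0, 100)]
grid = [(530000, 181000), (530500, 181000), (530000, 180500)]
coeffs = fit_affine(pixels, grid)
print(to_map(coeffs, 50, 50))
```

In practice more than three control points would be used with a least-squares or piecewise fit, so that the residuals expose where the historical map disagrees with the modern survey.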
![Page 19: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/19.jpg)
Feature Extraction
• Easy identification: Dates, Money
• Known Keywords and Classes
– The Getty TGN (1 million places with lon/lats)
– The Bartholomew Gazetteer (10,000)
– Indices to Maps (e.g. Cruchley 1826, 4200)
– The Index/Abstract of the DNB (30,000+)
• Clean-up with rule-based Proper Name classification: Mr NAME; NAME street
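The rules named above (dates, money, “Mr NAME”, “NAME street”) lend themselves to a short pattern-matching sketch. The regular expressions, the sample sentence, and the function name `extract_features` are assumptions made for illustration; they are far cruder than a production classifier backed by gazetteers.

```python
import re

YEAR = re.compile(r"\b1[0-9]{3}\b")  # four-digit years from 1000 to 1999
MONEY = re.compile(r"£\s?\d[\d,]*(?:\.\d+)?|\b\d+\s?(?:shillings|pence|pounds)\b")
PERSON = re.compile(r"\b(?:Mr|Mrs|Dr|Sir)\.?\s+([A-Z][a-z]+)")  # "Mr NAME"
STREET = re.compile(r"\b([A-Z][a-z]+)\s+[Ss]treet\b")           # "NAME street"

def extract_features(text):
    """Collect candidate dates, sums of money, personal names, and street names."""
    return {
        "dates": YEAR.findall(text),
        "money": MONEY.findall(text),
        "persons": PERSON.findall(text),
        "streets": STREET.findall(text),
    }

# Hypothetical sentence in which one word is both a surname and a street name.
sample = "In 1826 Mr Coleman paid £12 for a house in Coleman Street."
print(extract_features(sample))
```

The sample shows why the context rule matters: the same token is classed as a person after “Mr” and as a place before “Street”, which a bare keyword lookup could not decide.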
![Page 20: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/20.jpg)
“Runtime” Links
• Runtime links supplement in-file tagging
• 1) Where metadata is less precise
– Metadata from unedited headers and captions
• 2) Where the source does not contain data
– If no dates, then scan for them
• Use tagging for “high confidence” data
– Ideal situation: automated tags hand proofed
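The division of labour between in-file tags and runtime links can be sketched as a simple fallback: trust proofed tags when they exist, and scan the text only when they do not. The `<date value="...">` markup, the year pattern, and the function name `dates_for` are assumptions for this example rather than the project's actual encoding.

```python
import re
import xml.etree.ElementTree as ET

YEAR = re.compile(r"\b1[0-9]{3}\b")  # crude runtime scan for four-digit years

def dates_for(xml_fragment):
    """Return (source, dates): in-file <date> tags if present, else a runtime scan."""
    root = ET.fromstring(xml_fragment)
    tagged = [d.get("value") or (d.text or "") for d in root.iter("date")]
    if tagged:
        return "tagged", tagged  # high-confidence, hand-proofed data
    plain = "".join(root.itertext())
    return "scanned", YEAR.findall(plain)  # lower-confidence runtime result

proofed = '<p>Rebuilt in <date value="1667">the year after the fire</date>.</p>'
untagged = "<p>The church was rebuilt in 1667 and repaired in 1820.</p>"
print(dates_for(proofed))
print(dates_for(untagged))
```

Returning the source label alongside the dates lets the display layer mark scanned results as less certain than tagged ones.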
![Page 21: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/21.jpg)
Strategic Questions
• “Editions” a foundation for scholarship
• Where does the editor’s job start?
• How does editor’s job change?
• How do we define “Corpus Editors”?
– People with domain expertise in content
– Expertise in software and Library systems
• Need for scholarly automated processing
![Page 22: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/22.jpg)
Delivering Integrated Data
• “Good” and “rough” maps for Cicero’s Letters
• Coleman delivers quite useful results
• Map locates Coleman Street.
• Streets in description of “Portsoken Ward”.
• Historical Views of this section of London
• Timeline 1: A Linear History
• Timeline 2: “Encyclopedic Scatter”
![Page 23: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/23.jpg)
Further Work
• Disambig., auto-cataloguing, Time/Space
• VR Interface: Tallis 1, 2 and Headset
• New challenging document types
• Geospatial Data in Patterson's Journeys
• Urban data in Booth and City Directories.
– Tallis Map for Oxford Street with overall and more focused directories.
![Page 24: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/24.jpg)
Research Projects
• Robert Jacob and VR Interfaces– Figure: Tallis VR Conversion 1.
– Figure: Tallis VR Conversion 2.
– Figure: Head mounted VR navigation.
• Holly Taylor and Cognitive Analysis
– Spatial Cognition
– Text Comprehension
![Page 25: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/25.jpg)
Conclusions
• Baseline Knowledge Environment– Practical and useful
• “Corpus Editions”
• Midway between editions and library digitization
• Requires a new configuration of skills
• The “Diplomatic” Federated DL model is weak
– Need access to full data for visualizations
![Page 26: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/26.jpg)
Perseus Document Manager
• Works with XML
– Multiple granularities: sentence, section, chapter
– Deals with overlapping doc hierarchies
– Combines internal and external metadata
– Our metadata is in RDF and can be expressed as XML
• Since all data and metadata → XML
– Well suited to Federated DL Applications
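Serving one XML document at several granularities can be pictured with a small chunking function. This is a sketch under stated assumptions: the `<div type="..." n="...">` markup, the dotted addresses, the naive sentence splitter, and the function name `chunks` are all invented for illustration. It handles a single hierarchy only and says nothing about how the Perseus Document Manager treats overlapping ones.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical TEI-like sample with one chapter and two sections.
SAMPLE = """
<text>
  <div type="chapter" n="1">
    <div type="section" n="1"><p>London is old. It is large.</p></div>
    <div type="section" n="2"><p>The Thames divides it.</p></div>
  </div>
</text>
"""

def flatten(element):
    """Join all text under an element and normalize whitespace."""
    return " ".join("".join(element.itertext()).split())

def chunks(root, granularity):
    """Yield (address, text) at 'chapter' or 'section' level; anything else means sentences."""
    for chapter in root.iterfind("div[@type='chapter']"):
        c = chapter.get("n")
        if granularity == "chapter":
            yield c, flatten(chapter)
            continue
        for section in chapter.iterfind("div[@type='section']"):
            s = section.get("n")
            text = flatten(section)
            if granularity == "section":
                yield f"{c}.{s}", text
            else:
                # Naive split after sentence-final punctuation.
                for i, sentence in enumerate(re.split(r"(?<=[.?!])\s+", text), start=1):
                    yield f"{c}.{s}.{i}", sentence

root = ET.fromstring(SAMPLE)
for address, text in chunks(root, "sentence"):
    print(address, text)
```

Because every chunk carries an address derived from the markup, the same citation scheme can point at a chapter, a section, or a sentence.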
![Page 27: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/27.jpg)
Scalable DL
• SGML/XML need translation for display
– Can’t maintain stylesheets for millions of docs
• Intelligent display of various DTDs
– “Cheaply” acquires XML/SGML docs
– Individual Custom Style sheets allowed
• Integration of Geo-spatial Data
• Multilingual support, feature extraction
• Integrated multi-resolution image support
![Page 28: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/28.jpg)
Perseus Document Manager
• Short term development:
– Collecting new datasets to the Perseus DL
• (leveraging Internet 2 investment)
– Adding value: e.g.,
• Sources for the History of Mechanics (Max Planck)
• Duke Databank of Documentary Papyri
• Books, maps etc. on the City of London
• Shakespeare and Early modern English
![Page 29: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/29.jpg)
Perseus Document Manager
• Longer Term: Distribution of the System
• How best to maintain and expand the system?– Open source?– Commercial Licensing?– Wait for third party to match PDM features?
![Page 30: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/30.jpg)
Automatic Integration
• Content Analysis: Various Languages
• Time: extracting and visualizing dates
• Space: Integrating historical Geographic Data
• Names: establishing authority lists
– Getty Thesaurus of Geographic Names
• Names and Coordinates
– Encyclopedias: e.g., Harpers, DNB
• Names and Dates
![Page 31: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/31.jpg)
Our Research Agenda
• Developing self-sustaining models
– Publication of documents
– Maintenance of software
• Exploring Problem Sets in different domains
– E.g., sparse data (antiquity) vs. rich (London)
• Helping humanists rethink their position
– Reaching new audiences
– Changing habits
![Page 32: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/32.jpg)
Technology matters: e.g. 19th c. Printing in England
• 20th Century Radio/Film/TV: ambiguous
• 19th Century Print Technology
– 1810: c. 10,000 copies for a successful book
• Audience for literature mainly upper class
– 1850: hundreds of thousands
• Audience vastly expands
• Huge numbers read Dickens, etc.
• 21st Century Network Technology?
![Page 33: Developing a Digital Library for the Humanities](https://reader035.vdocuments.site/reader035/viewer/2022062408/56813ffd550346895dab2cdf/html5/thumbnails/33.jpg)
The Future?
• Two models:
– Reproduce current world in new form
• Narrow/expensive distribution
– Think about how that world may change
• Broader/inexpensive distribution
• What happens now sets the stage for …
– “talk show” cyber culture? or
– a new dispersal of intellectual life?