CENDARI Summer School, July 2015: Toby Burrows

Posted 18 August 2015


Medieval studies & the digital turn

Department of Digital Humanities

Toby Burrows

Scope and scale

• Medieval studies – a vast field; interdisciplinary and multi-disciplinary
  – The surviving evidence
  – Subsequent material, including medievalism

• Digital Humanities – also a very large field
  – Tools
  – Theoretical framework: computational modelling

• 18 DH sessions at Kalamazoo ICMS 2015
• 7 medieval papers at DH 2015

The digital landscape

• Digitized manuscripts and digital image collections: numerous libraries, museums, researchers

• Digitized editions and reference works: Patrologia Latina, Corpus Christianorum, HathiTrust, Google Books

• Catalogues and databases: libraries, museums, specialist research collections

• Tools for working with images: IIIF, DigiPal

• Tools for working with texts: named entity recognition (NER) and topic modeling, transcriptions and editions

• Visualization and analysis tools: social networks, geographical, chronological

Information overload?

“Their works are so limitless that they cannot be numbered … Indeed, we stand convicted of indolence by our inability to read all that they could manage to dictate”

Hugh of St Victor, Didascalicon, Book 4, chapter 2

 “Is there anywhere on earth exempt from these swarms of new books? Even if, taken out one at a time, they offered something worth knowing, the very mass of them would be a serious impediment to learning – from satiety, if nothing else”

Erasmus, Adages, II.1.1

The limits of digitization

“Around 60,000 medieval manuscripts are preserved in German collections today, of which roughly 7.5 percent have been digitized so far.”

C. Fabian, C. Schreiber, “Piloting a National Programme for the Digitization of Medieval Manuscripts in Germany”, Liber Quarterly 24 (1) (2014), 2-16

“We invite you to take part in an experiment in DIY digitization. Please upload your digital photographs of the Bodleian’s special collections to this [Flickr] group. These are your photographs, and you can license them accordingly, but please include at least a shelfmark for identification. We just want to see how people might use this online resource for sharing photographs of our collections.”

Bodleian Library, 26 June 2015

Joining the dots

• We have lots of resources and tools, but they mostly exist independently of each other
  – Duplication of effort
  – Time spent trying to identify and bring together different sources

• How do we bring them together?
  – Integrated infrastructure: standardized, centralized
  – Interoperability of services and data
  – IIIF: International Image Interoperability Framework
  – LOD: Linked Open Data

VICTORIA [State Library of Victoria] 223 Boethius DE MUSICA – Pseudo-Hucbaldus MUSICA ENCHIRIADIS – in Latin – 11c. Vellum, 305 x 210mm, A modern paper + B modern vellum + C contemporary vellum + 56 + D modern vellum + E modern paper. Collation: 1–7⁸. No catchwords, quire signatures in roman numerals placed in the centre of the lower margin of last folio verso of each gathering, foliation modern pencil in arabic numerals, no pagination. Av.-Bv., Cv., Dr.-Er. are blank. Some folios have been repaired. Most sheets have purple stains, however, they rarely efface the text. Worm-holes in fols 1-18 without loss of text.

Dark brown ink, ruling dry-point, one col. of 39-40 lines, Daseian musical notation. Prickings in outer margins. Script is first half 11c. Rhineland or northern Italian littera prae-gothica textualis. Explicit to the first text 49r. FINIT.

Decoration: orange rubrics and green, orange, or brown ink drawings. Edges cut and gilded, binding 19c. brown morocco over boards, gilt, by C. Lewis (see below), spine gilt with lettering BOETII / MUSICA / M.S. / SEC. XIII (sic). Incipits: 2r. –ulescentis; 9r. His igitur; 55v. Alleluia; 56r. asterisco ostendi. Ownership: Cr. in a 15c. littera hybrida currens is a schematic table of astrological texts mentioning such authorities as Ptolomaeus, Thebit, Iohannes Hispalensis, Alkabitus, Albumasaris, Alfagranus, and there follows immediately a much rubbed transcription of a commentary (here without ascription) on portion of Arzachel, Canones ad tabulas tholetanas and it begins: Quoniam cuiusque actionis [72] archazel (sic) arabus composuit tabulas ad ciuitatem toletti…; 56v. among scribbles now almost illegible is a 15c. musical diagram of the diatesseron; Ar. has a note ‘Boetii Musica, an Ancient MS. in fine Condition with diagrams. A MS. of a work of rare occurrence bound by C. Lewis. H. Drury [73] 1824’; spine carries the small printed number ‘3345’, being that of Sir Thomas Phillipps; Ev. in modern pencil ‘W. H. Robinson 5.9.1949. £LE-N/a/-’; Ar. has the stock no. ‘587673’; Ev. bears the library’s shelf-mark *091/B63.

[72] These opening words are underlined and are the incipit of Arzachel’s text, cf. Thorndike, 1268.
[73] Henry Drury (1778-1841); see above p.185 for another MS. he owned.

Proliferation of services

NAMES (persons, places)

Europa Sacra, International Medieval Bibliography, CERL Thesaurus

IDENTIFIERS

Used in manuscript databases and library catalogues, printed catalogues, International Medieval Bibliography, Scriptorium

MEASUREMENTS

Recorded in manuscript databases and library catalogues, printed catalogues

CONCEPTS

International Medieval Bibliography, Getty Research Institute vocabularies, Iconclass vocabulary

MANIFESTATIONS

Web sites and books; listed in International Medieval Bibliography, library catalogues, some manuscript databases, printed catalogues

DEPENDENTS

Listed in International Medieval Bibliography, Scriptorium, some manuscript databases

Classics: digital infrastructure

• Perseus Digital Library
  – Greek and Roman texts
  – Secondary sources: dictionaries
  – Linguistic tools: treebanks
  – Art and archaeology artefact browser

• Pleiades
  – Community-built gazetteer and graph of ancient places
  – Pleiades+ adds toponyms from GeoNames

• Pelagios
  – Annotating place references in texts and images with entries from Pleiades (tool = Recogito)

• Perseids
  – Collaborative editing platform for source documents

Classics – unique identifiers

• Perseus Digital Library – URIs:
  – Texts and citations: built on URNs from Canonical Text Services
  – Bibliographical catalogue records
  – Work and edition/translation level records
  – In progress: authors, editors, translators; places, Greek and Latin lexical entities, artefacts, images
  – Planned: variety of annotation types

• Pleiades
  – URIs for ancient places
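The Perseus text and citation identifiers follow the Canonical Text Services (CTS) URN pattern urn:cts:<namespace>:<textgroup>.<work>.<version>:<passage>. A minimal sketch of splitting such a URN into its parts, assuming that pattern; CtsUrn and parse_cts_urn are illustrative names of our own, not part of any CTS library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    exemplar: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:<namespace>:<work hierarchy>[:<passage>] into its components."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0].lower() != "urn" or parts[1].lower() != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    # The work hierarchy is dot-separated: textgroup[.work[.version[.exemplar]]]
    levels: list = parts[3].split(".")
    if len(levels) > 4:
        raise ValueError(f"too many work levels in {urn!r}")
    levels += [None] * (4 - len(levels))
    passage = parts[4] if len(parts) > 4 and parts[4] else None
    return CtsUrn(parts[2], levels[0], levels[1], levels[2], levels[3], passage)


urn = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.10")
print(urn.textgroup, urn.work, urn.version, urn.passage)
# tlg0012 tlg001 perseus-grc2 1.1-1.10
```

Because the passage reference stays a separate component, the same work-level URN can be reused for citations of any passage range.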

Integrating Digital Epigraphies

• Linked Data platform for digital epigraphy:
  – PHI Searchable Greek Inscriptions project
  – Supplementum Epigraphicum Graecum
  – CLAROS concordance of epigraphical publication data
  – JSTOR epigraphy articles

Identifiers from any of the projects may be used to retrieve related data from any of the others
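One simple way such cross-project retrieval could work is a concordance that groups every identifier known to refer to the same inscription. The sketch below uses a union-find structure; it illustrates the idea under that assumption and does not describe how the platform itself is built. The identifier strings are invented.

```python
class Concordance:
    """Groups identifiers from different projects that refer to the same object."""

    def __init__(self) -> None:
        self._parent: dict = {}

    def _root(self, ident: str) -> str:
        self._parent.setdefault(ident, ident)
        # Follow parent links to the group's representative, halving the path as we go.
        while self._parent[ident] != ident:
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def same_as(self, a: str, b: str) -> None:
        """Record that identifiers a and b denote the same inscription."""
        root_a, root_b = self._root(a), self._root(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident: str) -> set:
        """Every identifier known to denote the same object as ident (including ident)."""
        root = self._root(ident)
        return {other for other in list(self._parent) if self._root(other) == root}


# Invented identifiers standing in for PHI, SEG and CLAROS records.
c = Concordance()
c.same_as("phi:1001", "seg:40.123")
c.same_as("seg:40.123", "claros:77")
print(sorted(c.equivalents("claros:77")))
# ['claros:77', 'phi:1001', 'seg:40.123']
```

Equivalence is transitive here: the PHI and CLAROS identifiers were never linked directly, but both resolve to the same group through the SEG identifier.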

What’s missing in medieval studies?

• A significant proportion of the data is not digital
• Much of the digital data is not available for reuse
• There are many schemas for manuscript descriptions, and no mappings between them
• There are no machine-readable identifiers for most medieval people, places, and organizations
• There are no identifiers for medieval manuscripts – or even consistent ways to cite the shelf-marks

Linked Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Using a Linked Data approach

• Assigning globally unique identifiers to manuscripts and their component parts
• Tracking relationships between manuscripts and their different manifestations (including digital versions)
• Tracking relationships between manuscripts and their various dependents (including works about them)
• Deriving entity data from manuscript descriptions and from full-text editions and transcriptions
• Mapping between terminology in different languages
• Mapping between variant terminologies (e.g., 12th century and 21st century)
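The first two tasks can be sketched in a few lines. This assumes a hypothetical namespace (example.org), invented predicate names (hasPart, hasDigitalSurrogate, isAbout) and a deliberately lossy shelfmark normalization; a production system would more likely use RDF with established vocabularies. The shelfmark *091/B63 is the State Library of Victoria example quoted earlier.

```python
import re
from urllib.parse import quote

# Hypothetical namespace: there is no shared identifier service for manuscripts yet.
BASE = "https://example.org/manuscript/"


def mint_uri(repository: str, shelfmark: str) -> str:
    """Mint a URI from a repository code and a normalized shelfmark.

    Normalization (lower-case; runs of spaces and punctuation become '-') is lossy,
    so a real service would need a registry to detect collisions.
    """
    norm = re.sub(r"[^0-9a-z]+", "-", shelfmark.lower()).strip("-")
    return f"{BASE}{quote(repository.lower())}/{norm}"


def component_uri(manuscript: str, part: str) -> str:
    """Identify a component part (e.g. a codicological unit) beneath its manuscript."""
    return f"{manuscript}/part/{quote(part)}"


class TripleStore:
    """A minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> list:
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)

    def subjects(self, predicate: str, obj: str) -> list:
        return sorted(s for s, p, o in self.triples if p == predicate and o == obj)


store = TripleStore()
ms = mint_uri("SLV", "*091/B63")
store.add(ms, "hasPart", component_uri(ms, "unit-1"))
store.add(ms, "hasDigitalSurrogate", "https://example.org/images/slv-091-b63")
store.add("https://example.org/articles/1", "isAbout", ms)
print(ms)  # https://example.org/manuscript/slv/091-b63
print(store.subjects("isAbout", ms))
```

Storing "works about" a manuscript as triples pointing at its URI means the dependents can be found from the manuscript side without either record having to list the other.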

Linked Data functions

• Identification of individual manuscripts and their components
• Identification of names, places, works relating to manuscripts
• Terminology mapping: between different vocabularies and ontologies used to describe manuscripts
• Schema mapping: between different descriptive structures
• Entity extraction: examining digital objects, publications and manuscript descriptions to identify entities (e.g. through text mining)
• Browsing and searching across multiple sites and datasets: using IDs, terminology services and mapping services
• Linking scholarly activities (annotations, excerpts, representations, etc.) to the manuscript descriptions, objects and publications which they reference
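Date expressions are a small, concrete case of terminology mapping: manuscript descriptions in different languages give the same century as “12th century”, “12. Jh.”, “saec. XII” or “XIIe siècle”. A minimal sketch that maps these variants onto one shared value, assuming only these four patterns and the convention that the Nth century runs from year (N-1) x 100 + 1 to N x 100.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

# One pattern per (assumed) cataloguing convention; each captures the century number.
PATTERNS = [
    re.compile(r"^(\d{1,2})(?:st|nd|rd|th)\s+century$"),  # English: 12th century
    re.compile(r"^(\d{1,2})\.\s*(?:jh\.?|jahrhundert)$"),  # German: 12. Jh.
    re.compile(r"^(?:saec\.?|s\.)\s*([ivxlc]+)$"),         # Latin: saec. XII, s. xii
    re.compile(r"^([ivxlc]+)e\s+si[eè]cle$"),              # French: XIIe siècle
]


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral (letters i, v, x, l, c) to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller numeral before a larger one counts negatively.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def century_to_years(label: str) -> tuple:
    """Map a century label in one of several conventions to a (start, end) year span."""
    text = " ".join(label.lower().split())
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            token = match.group(1)
            n = int(token) if token.isdigit() else roman_to_int(token)
            return (n - 1) * 100 + 1, n * 100
    raise ValueError(f"unrecognized century label: {label!r}")


labels = ["12th century", "12. Jh.", "saec. XII", "XIIe siècle"]
print({century_to_years(label) for label in labels})  # {(1101, 1200)}
```

Mapping every variant onto a year span, rather than choosing one label as canonical, fits the point that no single vocabulary is authoritative.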

Implications and assumptions

• Multiple naming schemes are necessary for describing entities and relationships. There is no single authoritative vocabulary or ontology.

• This is a permanent work-in-progress – not just for adding new entities and relationships, but for re-thinking and enriching existing entities and relationships.

• Entities need to include individuals as well as categories. Hierarchical categorization is useful, but only as far as it helps with finding or browsing. There is no definitive classification structure.

• Relationships between entities are crucial. The network graph is the basic structure – not the database, text, Web site. The context is essential: <this> is related to <that>, and here is <the evidence> for the relationship.

• The focus should be on organizing knowledge and meaning (represented by semantic entities), not on organizing collections.

• Linking data is more of an organizational and cultural problem than a technical one
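The “<this> is related to <that>, and here is <the evidence>” principle, together with multiple naming schemes, can be sketched as a small evidence-carrying graph. The entity identifiers, scheme names and the predicate formerlyOwnedBy are hypothetical; the evidence strings paraphrase the State Library of Victoria description quoted earlier (the Phillipps number 3345 on the spine and the 1824 note by H. Drury).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One edge: <subject> is related to <obj>, and here is <evidence>."""
    subject: str
    predicate: str
    obj: str
    evidence: str


@dataclass
class EvidenceGraph:
    # entity -> {naming scheme -> label}; no scheme is treated as authoritative
    labels: defaultdict = field(default_factory=lambda: defaultdict(dict))
    statements: list = field(default_factory=list)

    def add_label(self, entity: str, scheme: str, label: str) -> None:
        self.labels[entity][scheme] = label

    def relate(self, subject: str, predicate: str, obj: str, evidence: str) -> None:
        self.statements.append(Statement(subject, predicate, obj, evidence))

    def edges(self, entity: str) -> list:
        """Every edge touching the entity, in either direction, with its evidence."""
        found = []
        for st in self.statements:
            if st.subject == entity:
                found.append((st.predicate, st.obj, st.evidence))
            elif st.obj == entity:
                # Incoming edges are reported as the inverse relationship.
                found.append((f"inverse of {st.predicate}", st.subject, st.evidence))
        return found


g = EvidenceGraph()
g.add_label("ms:slv-091-b63", "shelfmark", "*091/B63")
g.add_label("ms:slv-091-b63", "phillipps-number", "3345")
g.relate("ms:slv-091-b63", "formerlyOwnedBy", "person:phillipps",
         "Phillipps number 3345 printed on the spine")
g.relate("ms:slv-091-b63", "formerlyOwnedBy", "person:drury",
         "note on fol. Ar ending 'H. Drury 1824'")
for edge in g.edges("person:phillipps"):
    print(edge)
```

Keeping the evidence on the edge, not on either entity, lets the same manuscript carry competing or uncertain ownership claims side by side.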

Re-creation of Phillipps’ shelves, Grolier Club

The Phillipps manuscript collection

• Phillipps’ own printed catalogue (1837-1871) goes up to no. 23,837

• Thomas Fitzroy Fenwick (grandson, d. 1938) spent fifty years reorganizing and renumbering: up to no. 38,628

• Fenwick’s estimate of the total was close to 60,000 volumes and individual documents

• Phillipps also owned 50,000 books, as well as many prints, photographs, drawings and paintings

Sir Thomas Phillipps (1792-1872)

Dispersal of the collection

Fenwick family (1886-1945):

• Sales to interested libraries and governments (Germany, Belgium, Netherlands, France, Ireland, Wales) – more than 2,500 items

• Auctions at Sotheby’s, 1886 to 1938 – 22 auctions, more than 22,000 lots, raised £97,000 (over £30 million in present-day terms)

• Residue (12,000 items) sold to the Robinson brothers in 1945 for £100,000 (£11-12 million in present-day terms)

W.H. Robinson Ltd (1945-1958):

• Series of sale catalogues, 1945-1954

• Donation to the Bodleian Library of the remaining materials, 1958

Sotheby’s (1946-1950, 1965-1977):

• Series of sale catalogues

Research questions and use cases

• Show all the Irish manuscripts acquired by Phillipps, together with their previous and subsequent history of ownership, acquisition and sales

• Show all the events which link Phillipps to an earlier or later collector (e.g., Guglielmo Libri, Chester Beatty)

• How many Phillipps manuscripts are now in North American collections, and where are they?

• What can we learn about the sources of the Phillipps Collection, the nature of its contents, and the extent of its dispersal?
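Once provenance events are recorded as structured data, questions like these become filters over an event list. A minimal sketch, assuming a hypothetical Event record and a complete event history. Only the 1862 purchase of Phillipps MS 16402 at the Libri sale is a real event; MS A, MS B and their dealers and libraries are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional

NORTH_AMERICA = {"US", "CA"}  # ISO country codes treated as North American


@dataclass(frozen=True)
class Event:
    manuscript: str
    year: int
    kind: str                  # e.g. "sale", "donation"
    from_agent: Optional[str]  # seller or donor, if known
    to_agent: str              # buyer or recipient
    to_country: Optional[str]  # where the manuscript went


def involving(events, name):
    """Events in which the named collector is on either side of the transfer."""
    return [e for e in events if name in (e.from_agent, e.to_agent)]


def latest_holders(events):
    """Most recent recorded recipient per manuscript (assumes the history is complete)."""
    latest = {}
    for e in sorted(events, key=lambda ev: ev.year):
        latest[e.manuscript] = e
    return latest


def now_in_north_america(events, name="Phillipps"):
    """Manuscripts that passed through the named collection and are now in North America."""
    passed_through = {e.manuscript for e in involving(events, name)}
    holders = latest_holders(events)
    return sorted(ms for ms in passed_through
                  if holders[ms].to_country in NORTH_AMERICA)


events = [
    # From the presentation: the 1862 Sotheby's sale of Guglielmo Libri's collection.
    Event("Phillipps MS 16402", 1862, "sale", "Guglielmo Libri", "Phillipps", "GB"),
    # Invented placeholders for illustration only.
    Event("MS A", 1835, "sale", "Dealer X", "Phillipps", "GB"),
    Event("MS A", 1946, "sale", "Dealer W", "Library Y", "US"),
    Event("MS B", 1840, "sale", "Dealer X", "Phillipps", "GB"),
    Event("MS B", 1950, "sale", "Dealer W", "Library Z", "DE"),
]
print(len(involving(events, "Phillipps")), now_in_north_america(events))  # 3 ['MS A']
```

The answers are only as good as the event coverage: a manuscript whose later sales are unrecorded will appear to be still in Phillipps’ hands.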

Data sources

Source | Format | Comments

Schoenberg Database of Manuscripts | Relational database | Incorporates other sources, esp. sales catalogues; 6,000 Phillipps MSS; 20,000 Phillipps events

Library catalogues (BL, KB etc.) | Relational databases | Generally MARC records; provenance in notes; export can be awkward

Union catalogues; printed bibliographies | Relational databases | Formats vary; coverage varies; export can be awkward

Sale catalogues | Printed books (some digitized); online sources (PDFs, Web sites) | Many included in Schoenberg; MSS in ABE, eBay etc.

Phillipps catalogues and lists | Printed book; partly digitized; supplemented by handwritten notes | Partly included in Schoenberg; handwritten notes not digitized

Phillipps provenance indexes (BL, IRHT) | Handwritten; not digitized | Arranged by Phillipps number; no longer updated

Annotated sales catalogues & printed catalogues | Handwritten; not digitized | Researchers (Munby), owners (Phillipps), auctioneers (Sotheby’s); held in Cambridge UL, Bodleian, BL

In 1862, Sir Thomas Phillipps bought Phillipps MS 16402 in London as part of the Sotheby’s sale of the collection of Guglielmo Libri.

DATA MODEL – Nodegoat

Object | Sub-objects | Related to

PERSON | Nationality (country) | Manuscript; Text; Catalogue

ORGANIZATION | Location (city; country) | Manuscript; Text; Catalogue

MANUSCRIPT | Sold; Donated; Owned; Described In; Produced; Contents | Person/Organization (Agent, Owner, Buyer, Donor, Recipient, Scribe, Artist, Producer); Location (city; country); Catalogue; Text

TEXT | | Person (Author); Manuscript

CATALOGUE | | Organization (Publisher); Person (Compiler); Manuscript
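The data model above can be read as a set of typed records. The sketch below expresses it with Python dataclasses and encodes the 1862 Libri sale example. It does not use Nodegoat itself. Recording Libri under the Owner role and Sotheby’s under the Agent role is our assumption, since the model has no separate Seller role.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Location:
    city: str
    country: str


@dataclass
class Person:
    name: str
    nationality: Optional[str] = None  # country


@dataclass
class Organization:
    name: str
    location: Optional[Location] = None


Agent = Union[Person, Organization]

# Sub-object types and participant roles as listed in the model.
SUB_OBJECTS = {"Sold", "Donated", "Owned", "Described In", "Produced", "Contents"}
ROLES = {"Agent", "Owner", "Buyer", "Donor", "Recipient", "Scribe", "Artist", "Producer"}


@dataclass
class SubObject:
    kind: str
    year: Optional[int] = None
    participants: dict = field(default_factory=dict)  # role -> Agent
    location: Optional[Location] = None


@dataclass
class Manuscript:
    identifier: str
    sub_objects: list = field(default_factory=list)

    def add(self, sub: SubObject) -> None:
        """Attach a sub-object after checking its type and roles against the model."""
        if sub.kind not in SUB_OBJECTS:
            raise ValueError(f"unknown sub-object type: {sub.kind}")
        unknown = set(sub.participants) - ROLES
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        self.sub_objects.append(sub)


ms = Manuscript("Phillipps MS 16402")
ms.add(SubObject(
    kind="Sold",
    year=1862,
    participants={
        "Buyer": Person("Sir Thomas Phillipps"),
        "Owner": Person("Guglielmo Libri"),  # assumption: previous owner recorded as Owner
        "Agent": Organization("Sotheby's"),  # assumption: auction house recorded as Agent
    },
    location=Location("London", "United Kingdom"),
))
```

Validating roles on entry is one way to keep data gathered from many heterogeneous sources consistent with a single model.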

Going beyond the data

• This is only about organizing the data – the first steps in the research process – what about the later steps?
• Computational representation of research processes – modelling humanities research in the digital environment
• What is the goal?
  – Making it quicker and easier to gather and organize evidence?
  – Changing the way we do humanities research?

Research processes: big science

Humphrey, Charles. (2006) “e-Science and the life cycle of research” http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc

When I go to libraries or archives, I make notes in a continuous form on sheets of paper, entering the page number and abbreviated title of the source opposite each excerpted passage. When I get home, I copy the bibliographical details of the works I have consulted into an alphabeticised index book, so that I can cite them in my footnotes.

I then cut up each sheet with a pair of scissors. The resulting fragments are of varying size, depending on the length of the passage transcribed. These sliced-up pieces of paper pile up on the floor. Periodically, I file them away in old envelopes, devoting a separate envelope to each topic. Along with them go newspaper cuttings, lists of relevant books and articles yet to be read, and notes on anything else which might be helpful when it comes to thinking about the topic more analytically.

If the notes on a particular topic are especially voluminous, I put them in a box file or a cardboard container or a drawer in a desk. I also keep an index of the topics on which I have an envelope or a file. The envelopes run into thousands.

When the time comes to start writing, I go through my envelopes, pick out a fat one and empty it out onto the table, to see what I have got.

Keith Thomas, “Diary”, London Review of Books, 10 June 2010

The humanities researcher’s data: pre-digital

• Annotations – collected in books, on sheets of paper, on cards, in notebooks

• Excerpts – collected on sheets of paper, on cards, in notebooks, in commonplace books

• Citations – recorded on cards, in notebooks

• Categorization scheme(s) – for filing and arranging notes (on index cards, file dividers, box labels)

• Storage mechanisms – including files, envelopes, cardboard boxes

• Collections of sources: Personal book collections, journals, off-prints and cuttings, image collections, map collections, microfilms – the ever-expanding office…

• Sharing: in own publications (especially via footnotes [1]), in bibliographies, through personal contact (e.g. letters)

[1] Grafton, Anthony, The Footnote: A Curious History (London: Faber and Faber, 1997)

The humanities researcher’s data: digital world

• Annotations and excerpts – in Word documents, Google Docs, citation management databases (EndNote)

• Citations – citation management databases (EndNote) and Web services (Zotero, Mendeley); downloaded or keyed-in

• Categorization scheme(s) – tagging, keywords

• Storage mechanisms – digital storage of various kinds (logical and physical) – hard drives, USBs, data stores

• Personal digital collections – iPads, Web sites (Omeka, Dropbox, Flickr)

• Copies of source materials – downloaded or scanned / photographed; stored on various media (or linked to using URLs and DOIs)

• Sharing – e-mail, social media, Web collaboration services (Google, Confluence)

Digital methods…

• Force you to be more overt and explicit about your assumptions, processes, methods, approach – to externalize and reify

• Help you to visualize the evidence – to explore the relevant material, to navigate complexity

• Some risks:
  – Rigidity, over-simplification, single perspective
  – Loss of serendipity
  – Mistaking the visualization for the analysis
  – Conflating the organizing of the evidence with explanations and conclusions

Jean-Claude Gardin on humanities systems

• ‘Searching in natural language’: fallacious; “representation issues are still with us”
• Ordering and classification: “rather primitive”
• Simulation: qualitative variables and values
• Dynamic system modelling: quantified variables, numerical functions
• Artificial intelligence and expert systems: logic, inference engines – data + rules of inference – but discursive rather than formal rules of reasoning

“It is incumbent upon the humanities as a ‘distinct’ science to define its own, alternative ways of reasoning”

“Our primary concern should be the study of mental processes at work in archaeological reasoning, with a view to making them amenable to machine handling in a Turing sense – that is, with or without computers”

“The formulation of rules of reasoning is here essential, in whichever form we chose to express them”

J-C. Gardin, “The Impact of Computer-based Techniques on Research in Archaeology”, in: Scholarship and Technology in the Humanities, ed. May Katzen (London: Bowker-Saur, 1991), pp. 95-110
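Gardin’s “data + rules of inference” can be illustrated with a tiny forward-chaining engine: rules are applied to a set of facts until nothing new can be derived. The rules and predicate names below are hypothetical and, unlike the discursive reasoning Gardin has in mind, strictly formal. The two starting facts restate the 1862 Libri sale example from this presentation.

```python
from typing import Callable, Iterable

Fact = tuple  # (subject, predicate, object)
Rule = Callable[[set], Iterable[Fact]]


def forward_chain(facts: set, rules: list) -> set:
    """Apply every rule repeatedly until no new facts appear (a fixed point)."""
    known = set(facts)
    while True:
        derived = {fact for rule in rules for fact in rule(known)} - known
        if not derived:
            return known
        known |= derived


def seller_owned_before(known):
    # If a manuscript was sold from X's collection, X owned it beforehand.
    for subject, predicate, obj in known:
        if predicate == "soldFromCollectionOf":
            yield (obj, "previouslyOwned", subject)


def bought_by_phillipps(known):
    # A manuscript Phillipps bought became part of the Phillipps collection.
    for subject, predicate, obj in known:
        if predicate == "bought" and subject == "Phillipps":
            yield (obj, "partOf", "Phillipps collection")


def sources_of_collection(known):
    # Earlier owners of collection items count as sources of the collection.
    members = {s for s, p, o in known if p == "partOf" and o == "Phillipps collection"}
    for subject, predicate, obj in known:
        if predicate == "previouslyOwned" and obj in members and subject != "Phillipps":
            yield (subject, "sourceOf", "Phillipps collection")


facts = {
    ("Phillipps", "bought", "Phillipps MS 16402"),
    ("Phillipps MS 16402", "soldFromCollectionOf", "Guglielmo Libri"),
}
rules = [seller_owned_before, bought_by_phillipps, sources_of_collection]
conclusions = forward_chain(facts, rules)
print(sorted(conclusions - facts))
```

The third rule only fires on the second pass, after the first two have produced its premises: a small instance of premises, inference and conclusion being made explicit.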

A logic of historical thought

• Frame the question
  – Revise and refine, test and verify
  – Must be resolvable in empirical terms
  – Fictional, counterfactual questions: heuristically useful but unprovable

• Use evidence to build up an explanation
  – Classificatory concepts, e.g. feudalism
  – Statistical generalizations: inferred by a special form of reasoning

• Assemble in temporal order as a narrative
  – Causation: influences, factors, elements, by reference to antecedents
  – Individual motivation and group behaviour
  – Inference using analogies: heuristically useful but unprovable

• Construct an argument (produce a historical interpretation)
  – Proceed from premises to a conclusion through rational inference
  – Define and clarify the terms (e.g., democracy, capitalism, nationalism): simple, working, consistent definition

David Hackett Fischer, Historians’ Fallacies: Toward a Logic of Historical Thought (London: Routledge & Kegan Paul, 1971)

Fischer’s logical steps | Systems requirements

Framing the question | Searching and browsing; Counterfactuals and “what if?”

Using evidence to build up an explanation | [Assembling evidence]; Organizing evidence – classification; Linking within the evidence – causation, analogy; Working at scale – from the individual to the general; Reasoning – inference; Working with time – constructing a narrative

Constructing an argument – producing a historical explanation | Defining terms; Reasoning: premises – inference – conclusion; [Distributing the results]

Dr Toby Burrows
Marie Curie Fellow
Department of Digital Humanities
King’s College London
26-29 Drury Lane
London WC2B 5RL

@tobyburrows
tobyburrows.wordpress.com

Questions: Digitization

1. How much of the material you need for your research has not been digitized?

2. How should the priorities for future digitization be decided?

3. How useful to you are services which only contain digital or digitized materials?

4. Should everything be free?

Questions: Tools

1. How do you choose which software to use for your research?

2. Do you use generic software, or tools specific to medieval studies?

3. What cloud- or Web-based tools do you use?

4. What is your university’s attitude to Open Source software?

5. What sort of support do you get with using software?

Questions: Computation and representation

1. What do we mean by “data” in the humanities? Can medieval research be data-centred?

2. To what extent can research processes in medieval studies be modelled for use in a computational environment?

3. What’s the point of visualizations?