

Page 1

http://digipal.eu

Data (Re)Use in the Humanities:

The Example of DigiPal

Peter A. Stokes

Department of Digital Humanities

King’s College, London

@pa_stokes @digipalproject

Workshop on Research Data Management and Sharing, 18–19 September 2014

Page 2

The Key Question of DigiPal

‘How is it possible to proceed in such a way that the description of a specimen of handwriting is as clear and convincing to its reader as it is to its author?’

A. Derolez, The Palaeography of Gothic Manuscript Books (2003), p. 7

Page 3

The DigiPal Project

• Running from October 2010 to September 2014

• Three primary components:

1. A model of handwriting and manuscripts, manifested in a digital framework for exploration and communication

2. The application of this framework to a research context: English Vernacular minuscule of the 11th century

3. The analysis of the data in (2), using (1) to answer research questions (through articles, monograph, etc.)


Page 10

DigiPal Content

Most material on the site is original content, so we can share it:

– 30 short blog articles on academic topics related to the project

– Approx. 30,000 lines of software

– 1,434 records of scribal hands (incl. some Latin)

– 57,283 marked-up images of individual letters (graphs)

– 8,941 structured descriptions of idiographs

– 1,222 structured descriptions of graphs

– 419 unstructured prose descriptions of scribal hands

Some is copyright of various libraries, archives, authors, publishers:

– 1,578 descriptions of manuscripts and charters

– 795 high-resolution images of pages from books and documents

– Several software libraries from various sources

Page 11

Challenges in DigiPal

• Content is visible to anyone but locked into ‘silo’ website

• We want to deposit content into repositories for reuse

• We have made data directly accessible through a web API

– You can harvest data to train OCR, machine vision, etc.

– You can deliver content through aggregated search sites (Medieval Electronic Scholarly Alliance, Manuscripts Online, Shared Canvas…)

– You can create ‘widgets’ to embed in your own websites

– You can create plugins for teaching in VLEs or MOOCs

– You can create downloadable mobile apps

– You can do many things we haven’t imagined!

There is already a lot of interest in this data, but we are only allowed to make some of it available
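
As a rough illustration of the harvesting use case listed above, the sketch below turns JSON records of annotated graphs into labelled samples that could feed an OCR or machine-vision training set. The record layout, the field names (`allograph`, `image_id`, `box`) and the sample payload are assumptions made for illustration only. They are not the actual schema of the DigiPal web API, which should be checked against the project's own documentation.

```python
import json
from collections import Counter
from typing import List, NamedTuple, Tuple


class GraphSample(NamedTuple):
    """One labelled letter region. All field names are hypothetical."""
    label: str                      # the letter form, e.g. "a" or "d"
    image_id: str                   # identifier of the page image
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def parse_graph_records(payload: str) -> List[GraphSample]:
    """Convert a JSON array of graph records into training samples.

    Records lacking a label or a four-number box are skipped, not guessed at.
    """
    samples = []
    for record in json.loads(payload):
        label = record.get("allograph")
        box = record.get("box")
        if not label or not isinstance(box, list) or len(box) != 4:
            continue
        x, y, w, h = (int(v) for v in box)
        samples.append(GraphSample(label, str(record.get("image_id", "")), (x, y, w, h)))
    return samples


# Hypothetical response body; a real harvest would fetch this over HTTP.
SAMPLE_PAYLOAD = """[
  {"allograph": "a", "image_id": "page-12", "box": [104, 220, 31, 40]},
  {"allograph": "d", "image_id": "page-12", "box": [310, 218, 29, 52]},
  {"allograph": "a", "image_id": "page-13", "box": [88, 97, 30, 41]},
  {"image_id": "page-13", "box": [5, 5, 10, 10]}
]"""

samples = parse_graph_records(SAMPLE_PAYLOAD)
label_counts = Counter(sample.label for sample in samples)
```

Keeping the parsing step separate from any network call means the same function could be reused for a repository dump or an aggregated search feed, provided the fields are mapped to whatever names the real source uses.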

Page 12

Challenges in DigiPal: Image Rights

‘Scholars in the humanities, especially those concerned with images, face a bewildering array of restrictions. A confusing patchwork of policies regarding access to images, image reproduction, and cultural heritage citation is hindering new research and publication in the humanities.’

Max Planck Institute, Best Practice For Access to Images (2009), p. 1

‘In many cases, tax-funded or state-supported research projects must expend significant financial and human resources on negotiating and paying for reproduction rights, even if those rights are being obtained from state repositories’

‘Computation and Palaeography’, Dagstuhl Manifestos 2:1 (2013), p. 15

Page 13

Challenges in DigiPal: Image Rights

• Images must be of sufficient quality

– Low-quality legacy images released as CC are usually insufficient

• Can’t use fixed-term licences

– What happens when the licence expires?

• Can’t depend on remote image servers

– What happens when the server setup changes after funding ends? Who pays to rewrite our software?

• Must allow manipulation of images

– We have to crop, enhance, rotate etc. for the framework to operate

• Must allow open reuse of data

– Most funders require OA/public IP; licences sometimes disallow this

These may be surmountable for one or two repositories, but what happens when five, ten, fifty are involved?
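
To make the manipulation requirement concrete, here is a minimal sketch of the two simplest operations mentioned above, cropping a letter out of a page image and rotating it. It treats an image as a plain list of pixel rows so that it runs without any imaging library. The function names and the toy 4×3 'page' are illustrative assumptions, not part of the DigiPal software.

```python
from typing import List

Pixels = List[List[int]]  # rows of grey values; a stand-in for a real image


def crop(pixels: Pixels, x: int, y: int, width: int, height: int) -> Pixels:
    """Return the region of `pixels` covered by the box (x, y, width, height)."""
    if not pixels or not pixels[0]:
        raise ValueError("image is empty")
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        raise ValueError("box needs a non-negative origin and a positive size")
    if y + height > len(pixels) or x + width > len(pixels[0]):
        raise ValueError("box extends beyond the image")
    return [row[x:x + width] for row in pixels[y:y + height]]


def rotate_clockwise(pixels: Pixels) -> Pixels:
    """Rotate the grid by 90 degrees clockwise."""
    return [list(column) for column in zip(*pixels[::-1])]


# Toy page image, 4 pixels wide and 3 pixels high.
page = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]

letter = crop(page, x=1, y=0, width=2, height=2)
rotated = rotate_clockwise(letter)
```

Even operations this small produce a new image derived from the original, which is why a licence that forbids modification is enough to stop the framework from working.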

Page 14

Successes in DigiPal

• Academics have been extremely generous with their data

• Librarians and archivists are genuinely trying to be helpful

– Typically haven’t seen anything like this before

– Often hampered by other factors, especially financial

– Often depends very much on national/local government policy

• Libraries and archives are increasingly contacting us to ask what they should do with their images

• ‘Leading by example’: as people see the advantages of what can be done, they start to come on board

– Will Noel (Walters Art Museum/UPenn) as leading example

Things have changed a lot in the four years of the project

Page 15

Draft Copyright/Permissions Policy

• Our software is GPL 3.0 (apart from 3rd-party libraries)

• News and blog posts are CC-BY

• Data is CC-BY, except when it’s not ours to give

• Images of manuscript pages are whatever the library or archive has decided (typically fully restricted but sometimes CC)

• The coordinates of the cropped images of letters are ours and therefore fall under CC-BY data

• The cropped images themselves are derivative works (?) or fair use (?) and therefore are ours to release as CC-BY (?), except where explicitly excluded by the image licences
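
One way to read the draft policy above is as a small set of rules that map each kind of item to a licence. The sketch below encodes that reading. The `Item` fields, the category names and the fallback to 'fully restricted' when a holder's terms are unknown are assumptions made for illustration, and the result for cropped images carries the same uncertainty as the question marks in the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """A piece of project content. Field and category names are hypothetical."""
    kind: str                              # "software", "blog", "data", "coordinates",
                                           # "page_image" or "cropped_image"
    ours: bool = True                      # False for third-party material
    holder_licence: Optional[str] = None   # terms set by the library or archive
    excluded_by_image_licence: bool = False


def draft_licence(item: Item) -> str:
    """Apply the draft policy to one item and return a licence label."""
    if item.kind == "software":
        return "GPL-3.0" if item.ours else "third-party licence"
    if item.kind == "blog":
        return "CC-BY"
    if item.kind in ("data", "coordinates"):
        return "CC-BY" if item.ours else "not ours to license"
    if item.kind == "page_image":
        # Assumption: treat unknown holder terms as fully restricted.
        return item.holder_licence or "fully restricted"
    if item.kind == "cropped_image":
        if item.excluded_by_image_licence:
            return item.holder_licence or "fully restricted"
        return "CC-BY (status uncertain)"
    raise ValueError(f"unknown kind of item: {item.kind!r}")


software = draft_licence(Item("software"))
coordinates = draft_licence(Item("coordinates"))
page = draft_licence(Item("page_image", ours=False))
page_cc = draft_licence(Item("page_image", ours=False, holder_licence="CC-BY-NC"))
crop_open = draft_licence(Item("cropped_image"))
crop_blocked = draft_licence(Item("cropped_image", excluded_by_image_licence=True))
```

Writing the rules out this way shows where the policy still depends on legal advice: every branch is mechanical except the cropped images, whose status rests on an unresolved question.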

Page 16

Conclusions

• I doubt that a single EU-wide policy is realistic (yet)

– There are too many local issues and constraints (esp. financial)

– It’s not helpful to present more rules that put people off

• Nevertheless, a template licence has proven appealing

– Allows room for negotiation and opens up discussion

– People are often genuinely bewildered and appreciate help

• More guidance would be very helpful

– Are our cropped images really covered by ‘fair use’ or not?

– What if the images are only used to train OCR systems?

– What else should we be copyrighting? ‘Look and feel’? Banner?

– Should we be using CC-BY? What about CC-BY-NC?

– Who should agree and sign licences? The PI, HI, ERC?

Page 17

Thanks to

The DigiPal Team (see website for details)

The European Research Council for this workshop and for funding DigiPal

http://digipal.eu/ @digipalproject

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7) under grant agreement n° 263751.