Transcript
Page 1: Big Data in the Arts and Humanities

Big Data in the Arts and Humanities

Andrew Prescott, University of Glasgow

AHRC Theme Leader for Digital Transformations

Big Data in a Transdisciplinary Perspective 7th Herrenhausen Conference of the Volkswagen Foundation

25 March 2015

Page 2: Big Data in the Arts and Humanities

Neurone activity in the brain of a zebra fish embryo. Each video sequence is one terabyte in size.

Ahrens, M. B. & Keller, P. J. Nature Meth. http://dx.doi.org/10.1038/NMETH.2434 (2013)

Page 3: Big Data in the Arts and Humanities

The high frequency telescopes of the Square Kilometre Array will produce 1 exabyte per day (more than current global internet traffic) in the first phase. This will eventually rise to many petabits (10^15 bits) per second, more than 10 times the current global internet traffic.
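As a rough check on the units in the caption above, the two figures can be put on a common scale. The sketch below assumes decimal (SI) prefixes, so 1 exabyte is 10^18 bytes and 1 petabit is 10^15 bits; the function names are illustrative and not taken from the talk.

```python
# Unit conversion sketch; assumes decimal (SI) prefixes.
EXABYTE = 10**18          # bytes
PETABIT = 10**15          # bits
TERABIT = 10**12          # bits
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day_to_bits_per_second(bytes_per_day):
    """Convert a daily data volume in bytes to a sustained rate in bits/s."""
    return bytes_per_day * 8 / SECONDS_PER_DAY


def petabits_per_second_to_exabytes_per_day(petabits_per_second):
    """Convert a sustained rate in petabits/s to a daily volume in exabytes."""
    bits_per_day = petabits_per_second * PETABIT * SECONDS_PER_DAY
    return bits_per_day / 8 / EXABYTE


first_phase_rate = bytes_per_day_to_bits_per_second(1 * EXABYTE)
print(f"1 exabyte per day is about {first_phase_rate / TERABIT:.1f} terabits per second")
print(f"1 petabit per second is about "
      f"{petabits_per_second_to_exabytes_per_day(1):.1f} exabytes per day")
```

On these assumptions, a single petabit per second already corresponds to roughly ten times the first-phase daily volume.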

Page 4: Big Data in the Arts and Humanities

BIG HUMANITIES DATASETS

Sound and Video:

• Shoah Holocaust Survivors testimonials collection is 20 terabytes (cf. Sloan Digital Sky Survey 10 terabytes)

• The BBC’s digital assets are estimated at about 52 petabytes of data

Structured data:

• US National Archives and Records Administration: 142 TB of data; estimated 347 PB by 2022

• Ancestry holds 14 billion records and is adding 2 million records daily. Brightsolid's (Findmypast) new data centre in Aberdeen will have 400 petabytes of storage

• Web archives: multi-petabyte

Linguistic corpora:

• Corpus of Contemporary American English: 450 million words

• Wikipedia Corpus: 1.9 billion words

• Google American books n-grams: 155 billion words
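For readers unfamiliar with the term, an n-gram is a run of n consecutive words, and an n-gram dataset records how often each run occurs. The sketch below is a toy illustration only: the tokenisation rule and the sample sentence are assumptions, and it does not describe how any of the corpora listed above were actually built.

```python
import re
from collections import Counter


def count_ngrams(text, n):
    """Count every run of n consecutive word tokens in text."""
    # Simplistic tokeniser (an assumption): lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


sample = "the cat sat on the mat and the cat slept"
bigrams = count_ngrams(sample, 2)

# Show the most frequent two-word sequences in the sample.
for ngram, count in bigrams.most_common(3):
    print(" ".join(ngram), count)
```

At corpus scale the same counting idea applies, but storage and processing have to be distributed rather than held in a single in-memory counter.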

Page 5: Big Data in the Arts and Humanities

THE CHANGING NATURE OF THE PRIMARY MATERIALS OF HUMANITIES RESEARCH

• The papers of the British prime minister William Ewart Gladstone (1809-1898): approx. 160,000 documents in 762 volumes.

• Margaret Thatcher archive: 1 million documents in 3,000 boxes occupying 300 metres of shelving

• Enron Corporation Corpus, acquired by Federal Energy Regulatory Commission during enquiry into corporation’s collapse. Approx. 600,000 e-mails generated by 158 employees; about 423MB (zipped).

Page 6: Big Data in the Arts and Humanities

Electronic records from the Executive Office of the President during the second presidency of George W. Bush: 82 TB of data; 200+ million e-mail messages; 3+ million digital photographs; 30+ million other electronic records

http://www.georgewbushlibrary.smu.edu/Research/Electronic-Records.aspx

Page 7: Big Data in the Arts and Humanities

###### Begin Original ARMS Header ######
RECORD TYPE: PRESIDENTIAL (NOTES MAIL)
CREATOR: Sandy Kress ( CN=Sandy Kress/OU=OPD/O=EOP [ OPD ] )
CREATION DATE/TIME: 14-JUN-2001 17:13:17.00
SUBJECT:: Education statement
TO: Claire E. Buchan ( CN=Claire E. Buchan/OU=WHO/O=EOP@EOP [ WHO ] )
READ: UNKNOWN
###### End Original ARMS Header ######

---------------------- Forwarded by Sandy Kress/OPD/EOP on 06/14/2001 05:13 PM ---------------------------

Sarah Pfeifer 06/14/2001 04:59:34 PM Record Type: Record

To: Sarah E. Youssef/OPD/EOP@EOP, Brian R. Besanceney/OPD/EOP@EOP, Sandy Kress/OPD/EOP@EOP cc: Subject: Education statement

---------------------- Forwarded by Sarah Pfeifer/OPD/EOP on 06/14/2001 04:59 PM ---------------------------

Sarah Pfeifer 06/14/2001 04:59:00 PM Record Type: Record

To: See the distribution list at the bottom of this message cc: Subject: Education statement

This statement has been approved by the President. Harriet called me several minutes ago with one last change, which I have incorporated.

Message Sent To:_____________________________________________________________

Harriet Miers/WHO/EOP@EOP
John Gardner/WHO/EOP@EOP
Barbara A. Barclay/WHO/EOP@EOP
Debra D. Bird/WHO/EOP@EOP
Carolyn E. Cleveland/WHO/EOP@EOP

E-mail by B. Alexander (Sandy) Kress, Senior Adviser to President George W. Bush on Education, concerning the drafting of the No Child Left Behind Act in 2001

http://www.georgewbushlibrary.smu.edu/en/Research/Electronic-Records/Email.aspx#Email

Page 8: Big Data in the Arts and Humanities

• Visualisation of relationship between terms in Wikileaks Significant Action Reports relating to Iraq

• Big data: ‘whose size forces us to look beyond the tried-and-true methods that are prevalent at that time’ (Jacobs, 2009)

• Illustrates how big data is already a current issue for humanities researchers

• Suggests the humanities are becoming not only more quantitative, but also more visual, haptic and exploratory

Page 9: Big Data in the Arts and Humanities

collateral exposure..? POSSIBLE INFORMATION

media diversion..? POSSIBLE INFORMATION

Extract from project publication for Insurance.AES256 by Michael Takeo Magruder (2011), using Wikileaks material to reflect on issues of information freedom and secrecy in today's ever-shifting media landscape.

http://www.takeo.org/nspace/2011-insurance_aes256/

Page 10: Big Data in the Arts and Humanities

Portfolio of Big Data projects funded by UK Arts and Humanities Research Council, 2014-15

• Dealing with large textual corpora: UK statute law; mining the history of medicine

• Linking existing databases: Snapdrgn; Big Data History of Music

• Annotation of unstructured data: DEEP film access; optical music recognition; Lost Visions

• Visualisation: International crime fiction; Seeing Data

• Critical study of data: Our Data Ourselves; Secret Life of a Weather Datum

Page 11: Big Data in the Arts and Humanities

Portfolio of Big Data projects funded by UK Arts and Humanities Research Council, 2014-15

• Mapping: Literary History of Edinburgh

• Internet of Things: archaeological 3D imaging; Tangible Memories

• Reflects range of activities currently used in ‘Big Humanities’.

• Does anything link these together methodologically? Do they represent anything different from what we have previously done?

• Is there a ‘Big Data moment’, or is it simply that data and expertise are now available on a larger scale?

• What distinctive contributions can the arts and humanities make to the Big Data debates?

Page 12: Big Data in the Arts and Humanities

HAVE WE BEEN HERE FOR A LONG TIME?

• If Big Data is defined as data whose size requires us to look beyond tried methods, it has been with us since antiquity

• Invention of writing linked to government need to manage information

• 1086: Detailed register of property in Domesday Book

• 12th century: development of pipe rolls and use of counters in government accounting

• 13th century: alphabetisation of the Bible by a team of Dominican friars

Page 13: Big Data in the Arts and Humanities

WHY BIG DATA IS DIFFERENT

• Historical examples like Domesday Book or census were inventories; descriptive and backward-looking

• The aim of Big Data techniques is predictive: ‘We know what you are going to do tomorrow’ (credit score agency)

• Results derive from quantity of data rather than quality; methods ‘inherently inexact but the vast amount of data compensates for the imperfections’ (Mayer-Schönberger, p. 187)

• Ignores causal relationships and looks for co-relations, e.g. how lifestyle factors predict likelihood of adhering to a medical prescription
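The claim that volume compensates for inexactness can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: measurement error is modelled as independent Gaussian noise, and the `bias` parameter is a hypothetical stand-in for a systematic flaw in how the data were collected. None of the names or numbers come from the talk or from the book quoted above.

```python
import random


def mean_of_noisy_readings(true_value, noise_sd, n, bias=0.0, seed=0):
    """Average n simulated readings of true_value.

    Each reading has independent Gaussian noise (standard deviation noise_sd)
    plus a constant systematic offset (bias). Both are modelling assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    readings = [rng.gauss(true_value + bias, noise_sd) for _ in range(n)]
    return sum(readings) / n


TRUE_VALUE = 10.0
for n in (10, 1_000, 100_000):
    unbiased = mean_of_noisy_readings(TRUE_VALUE, noise_sd=5.0, n=n)
    biased = mean_of_noisy_readings(TRUE_VALUE, noise_sd=5.0, n=n, bias=2.0)
    print(f"n={n:>7}  random noise only: {unbiased:6.3f}  with systematic bias: {biased:6.3f}")
```

In this toy model the random scatter shrinks as the number of readings grows, but the systematic offset does not: more data only compensates for imperfections that are random rather than built into the collection process.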

Page 14: Big Data in the Arts and Humanities

EXAMPLES OF PREDICTIVE ANALYTICS

• Driven largely by finance and retail, but rapidly spreading into other sectors

• Chicago: Automated Preventive Rodent Baiting Program analyses 31 indicators to predict where rodent infestations will occur

• New York: predicting where unlicensed building conversions have occurred to target inspections and issue vacate orders

• Chicago: Predictive Policing System

• AHRC programme includes projects on online betting on election results, and on legislation

• AHRC-Nesta project to use predictive analytics to improve museum attendance

Page 15: Big Data in the Arts and Humanities

Use of big data techniques in choosing film directors, cast, crew, etc.: the-numbers.com

Page 16: Big Data in the Arts and Humanities

Use of predictive analytics to ‘optimise scripts’ in film and TV: epagogix.com

John Wiley considering using IBM Pure Data analytics in a similar way for scientific and academic publishing

Page 17: Big Data in the Arts and Humanities

CHALLENGES OF BIG DATA TO THE ARTS AND HUMANITIES

• Not simply about role of quantification or scientific method in arts and humanities

• Challenges assumptions about role of information in research: if data is big enough, messy or poorly curated data need not be an issue

• Questions existing research methods: ‘data-driven research’

• Undermines assumptions about causality and human agency

• Role of retail and financial agencies in developing these methods - the enclosure of data

• Challenges existing critical and theoretical frameworks: not ‘end of theory’ but ‘big data needs big theory’

Page 18: Big Data in the Arts and Humanities

HOW THE ARTS AND HUMANITIES CAN ADDRESS BIG DATA CHALLENGES

• Developing new theoretical frameworks and responses: critical data studies

• Providing models in areas such as causality and ‘messiness of data’

• Exploring the spaces and flow of big data

• Promoting moral values of humanities research in a big data world

• Role of design

• ‘Radical contextualisation’ of big data

• Humanisation of big data

Page 19: Big Data in the Arts and Humanities

THE NEED FOR BIG THEORY

• Chris Anderson in Wired 2008: ‘Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves’.

• New York Times, 2010: ‘The next big idea in language, history and the arts? Data. Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical ‘ism’ and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitised materials that previous humanities scholars did not have’.

• Charles Darwin (cited by Callebaut): ‘all observation must be for or against some view if it is to be of any service’

Page 20: Big Data in the Arts and Humanities

THE NEED FOR BIG THEORY

• Bowker (2006): Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care

• Huggett (2014): Data are not 'out there', waiting to be discovered; if anything, data are waiting to be created. Information about the past is situated, contingent, and incomplete; data are theory-laden, and relationships are constantly changing depending on context.

• Kitchin and Lauriault (2014): Data are situated, contingent, relational, and framed, and used contextually to try and achieve certain aims and goals

Page 21: Big Data in the Arts and Humanities

CRITICAL DATA STUDIES

Dalton and Thatcher, What does a critical data studies look like, and why do we care? Seven points for a critical approach to ‘big data’ (Society and Space, 2014)

1. situate data regimes in time and space
2. expose data as inherently political and whose interests they serve
3. unpack the complex, non-deterministic relationship between data and society
4. illustrate the ways in which data are never raw
5. expose the fallacies that data can speak for themselves and that big data will replace small data
6. explore how new data regimes can be used in socially progressive ways
7. examine how academia engages with new data regimes and the opportunities of such engagement

Page 22: Big Data in the Arts and Humanities

lifeofdata.org.uk

Page 23: Big Data in the Arts and Humanities

big-social-data.net

Our Data Ourselves

Page 24: Big Data in the Arts and Humanities

RETHINKING THE IMPLICATIONS OF BIG DATA

• Is a switch from causality to co-relation so radical?

• As long ago as 1946, the historian Marc Bloch argued against the ‘idol of origins’ and sought a history with stronger social and cultural understanding

• Pioneering humanities scholarship, such as that of the Annales School of historians, has a lot to contribute in terms of integrating methodology, data and new techniques

• Continued importance of critical understanding of data, as the Google Flu Trends controversy illustrates

• Experience of humanities scholars in dealing with complex and messy historical datasets potentially very relevant

Page 25: Big Data in the Arts and Humanities

Visualisation of ontology for linking information about people in the ancient world developed by the Standards for Networking Ancient Prosopographies project: snapdrgn.net

Page 26: Big Data in the Arts and Humanities

seeingdata.org: includes videos on ‘Making Sense of Data Visualisations’

Page 27: Big Data in the Arts and Humanities
Page 28: Big Data in the Arts and Humanities

Erica Savig, M.Arch. PhD Candidate, Cancer Biology Stanford University Lab of Garry P. Nolan National Science Foundation Graduate Research Fellow Stanford Graduate Research Fellow

Common Design Strategies for Exploring Signaling Networks in Biology and Intellectual Geographies in History

Nicole Coleman Director, Humanities + Design Stanford University

Page 29: Big Data in the Arts and Humanities

Component and Behavior for Protein 1

Component and Behavior for Protein 2

Component and Behavior for Protein 3

Parametric Modeling Quantitatively Maps Single Cell Protein Levels to Individual Qualitative Components

Page 30: Big Data in the Arts and Humanities

Michael Takeo Magruder, Data Flower: www.takeo.org

Page 31: Big Data in the Arts and Humanities

Fabio Lattanzi Antinori, The Obelisk, 2012

http://fabiolattanziantinori.com/obelisk.php

Page 32: Big Data in the Arts and Humanities

co-curate.ncl.ac.uk

pararchive.com

bloodaxe.ncl.ac.uk

affectivedigitalhistories.org.uk

Page 33: Big Data in the Arts and Humanities

Tim Hitchcock on Big Data, Small Data and Meaning (historyonics.blogspot.co.uk):

‘Big Data’ supposedly lets you get away with dirty data. In contrast, humanists do read the data; and do so with a sharp eye for its individual rhythms and peculiarities – its weirdness.

In the rush towards 'Big Data' – the Longue durée, and automated network analysis; towards a vision of Humanist scholarship in which Bayesian probability is as significant as biblical allusion, the most urgent need seems to me to be to find the tools that allow us to do the job of close reading of all the small data that goes to make the bigger variety… we need to be able to contextualise every single word in a representation of every word, ever. Every gesture contextualised in the collective record of all gestures; and every brushstroke, in the collective knowledge of every painting.

Page 34: Big Data in the Arts and Humanities

Towards a ‘radical contextualisation’: Mapping Metaphor with the Historical Thesaurus of the English Language

http://blogs.arts.gla.ac.uk/metaphor/

Page 35: Big Data in the Arts and Humanities

tangible-memories.com

