europeana newspapers german infoday - digitale zeitungsarchive als quellen (digitaler)...
TRANSCRIPT
Digital Newspaper Archives as Sources for (Digital) Historical Research
Dr. Pim Huijnen
Universität Utrecht
Berlin, 28.02.2014
www.translantis.nl
Translantis
Digital Humanities Approaches to Reference Cultures; The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990
“…uses digital technologies to analyze the role of reference cultures in debates about social issues and collective identities, looking specifically at the emergence of the United States in public discourse in the Netherlands from the end of the nineteenth century to the end of the Cold War.”
The United States as a reference culture
Business
Society
Consumption
Media
Crime
Health
Americanization
Business/economy: Americanization
1870-1914 - 1918-1940 - 1945-1989
Fordism Taylorism
Professionalization Managerism
Productivity
Rationalisation Efficiency
Standardization Mass production
Mass market Consumer society
Credit
Consultancy Accountancy
Rejection, appropriation, entanglement
Leeuwarder Courant, 27 October 1950
The USA as a reference culture
[Chart: United States in Dutch news media. Series: ‘Vereenigde Staten’ and ‘Amerikaansch’; x-axis: 1850-1943; y-axis: 0-8000]
Text mining for historical research
National Library Den Haag: ~9.000.000 digitized pages from Dutch news media 1618-1995
Opportunities for comparative and transnational historical research
(esp. History of mentalities/ of ideas)
Development of a digital text mining tool
Digital research on public debates
Servers needed for storage (500 GB of data)
Computers needed for computational processing (memory)
Sustainability needed in storage and file formats (at least 5 years, preferably indefinitely)
Management needed (staffing)
Programming skills needed
Big Data?
“The change of scale has led to a change of state. The quantitative change has led to a qualitative one. […] [B]ig data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”
Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston 2013) 13.
Big Data
“Letting the data speak”
Top-down vs. bottom-up
Bob Nicholson, ‘The Digital Turn’, Media History 19 (2013) 59-73.
Query: ‘Standard oil’ <1900 (1030 hits)
Word cloud
Word cloud ‘manager’ 1910-1920
(3437 hits)
Word cloud ‘manager’ 1945-1950
(1173 hits)
Voyant word cloud ‘efficiency’ 1945-1960 (46040 hits)
Voyant word cloud ‘efficiëntie’ 1945-1990 (2861 hits)
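A word cloud of this kind is, at its core, a frequency table of the words that occur in the hits for a query. The following is a minimal sketch in Python, assuming the hits are available as plain-text strings. The sample sentences, the short stopword list and the function name `cloud_weights` are invented for illustration and are not part of Voyant or the Translantis tooling.

```python
import re
from collections import Counter

# Hypothetical mini stopword list; real Dutch lists are much longer.
STOPWORDS = {"de", "het", "een", "van", "en", "in", "is", "voor"}

def cloud_weights(articles, query, top_n=5):
    """Count the words that appear in articles mentioning the query term."""
    counts = Counter()
    for text in articles:
        # Lowercase and keep runs of letters only (handles accents such as ë).
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        if query not in tokens:
            continue  # only articles that are hits for the query
        counts.update(t for t in tokens if t != query and t not in STOPWORDS)
    return counts.most_common(top_n)

# Invented example sentences, not taken from the newspaper corpus.
articles = [
    "De manager van de fabriek is een Amerikaan",
    "Een manager voor het hotel in Amsterdam",
    "Het weer in Amsterdam",
]
print(cloud_weights(articles, "manager"))
```

The counts returned here would be the word sizes in the cloud. Which words end up in the stopword list directly shapes the picture.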
Histogram
Query: ‘consultancy’ (2167 hits)
Histogram (SPSS)
Query: ‘manager’ (191.710 hits)
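A histogram like the ones above boils down to counting hits per year. The sketch below shows one way this could be done in Python, assuming each hit is a `(year, article_id)` pair. The data and the function names are invented for illustration; the slides themselves used SPSS.

```python
from collections import Counter

def hits_per_year(hits):
    """hits: iterable of (year, article_id) pairs for one query."""
    return Counter(year for year, _ in hits)

def text_histogram(counts, width=40):
    """Render the counts as text bars, scaled so the peak year fills `width`."""
    peak = max(counts.values())
    lines = []
    # Years without any hit are not in `counts` and are therefore skipped.
    for year in sorted(counts):
        bar = "#" * max(1, round(counts[year] / peak * width))
        lines.append(f"{year} {bar} {counts[year]}")
    return lines

# Invented hits, not real search results.
hits = [(1950, "a1"), (1950, "a2"), (1951, "a3"),
        (1955, "a4"), (1955, "a5"), (1955, "a6")]
counts = hits_per_year(hits)
for line in text_histogram(counts, width=30):
    print(line)
```

Absolute counts depend on how many pages were digitized for each year, so dividing by the yearly corpus size would be a sensible next step before comparing periods.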
BILAND
Query: ‘Heredity’ (1876) (22/1465 hits)
BILAND
Query: ‘Heredity’ (1935) (1465 hits)
BILAND
Query: ‘Hygiene’ (87/41 hits)
‘Typically American’
Topic modeling
SPSS
Translantis
Query: ‘manager’ (191.710 hits)
Translantis
Query: ‘manager’ in advertisements (82.695 hits)
Empirical cycle
query
quantitative analysis
qualitative analysis
insight
Digital research on public debates
No limitation of source material
No selection issues
No representativeness issues
Enabling research on hidden debates, mentalities, implicit notions
Reproducibility of research, from various perspectives
Source criticism: data
representativeness
internal coherence
(OCR) quality
[Chart shown again: United States in Dutch news media. Series: ‘Vereenigde Staten’ and ‘Amerikaansch’; x-axis: 1850-1943; y-axis: 0-8000]
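OCR quality, listed above as part of source criticism, can be estimated roughly by checking what share of the words in a text occur in a word list. This is a crude proxy sketched under assumptions: the tiny lexicon, the sample strings and the function name `lexicon_coverage` are invented here, and this is not the measure used by the library or the project.

```python
import re

def lexicon_coverage(text, lexicon):
    """Share of tokens (0.0 to 1.0) that are found in the given word list."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in lexicon)
    return known / len(tokens)

# Hypothetical lexicon; a real check would use a full historical word list.
lexicon = {"de", "vereenigde", "staten", "van", "amerika"}

clean = "De Vereenigde Staten van Amerika"
noisy = "De Vereenigde Sta ten van Amcrika"  # simulated OCR errors

print(lexicon_coverage(clean, lexicon))
print(lexicon_coverage(noisy, lexicon))
```

A low score for a given year or newspaper suggests that a query will miss part of the relevant articles there, which matters when hit counts are compared over time.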
Representative?
“Libraries, archives, museums and other collection institutions have now been digitising corpora of material for many years, but with a very few exceptions, it is still quite rare for an entire run of primary sources to be digitised and made available online. This means that there are gaps within the digital record. Yet it is unusual for online resources to actively demonstrate these gaps; resources may be advertised as a growing corpus, but when searching through or downloading a digital resource there is rarely any indication of what has not been digitised. This skews the sense of the nature of the collection the scholar is working with and erodes trust.”
Abstract submitted to DH2014 by Alastair Dunning (The European Library) and Clemens Neudecker (KB National Library of the Netherlands).
See: http://availableonline.wordpress.com/
Source criticism: comparison
Source criticism: Press history
“[O]ne of the biggest challenges facing press historians will be to ensure that the historical agency and complex materiality of newspapers are not forgotten in a rush to mine their contents.”
Bob Nicholson, ‘The Digital Turn’, Media History 19 (2013) 59-73, on p. 67.
Source criticism (interpretation)
Newspapers = public debate?
What newspapers write = what public thinks?
How to interpret results?
What are stopwords? (“staat”)
Mining for meaning?
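The stopword question can be made concrete with “staat”, which in Dutch is both a verb form (“stands”) and the noun “state”. The sketch below, in Python, compares the most frequent terms with and without “staat” in the stopword list. The sentences, the stopword list and the function name `top_terms` are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical mini stopword list for the example.
BASE_STOPWORDS = {"de", "het", "een", "van", "en", "in", "er", "op"}

def top_terms(text, stopwords, n=3):
    """Return the n most frequent tokens that are not stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords).most_common(n)

# Invented text: "staat" occurs twice as a noun and twice as a verb.
text = ("De staat Texas staat bekend om olie. "
        "Er staat een fabriek in de staat Ohio.")

with_staat = dict(top_terms(text, BASE_STOPWORDS))
without_staat = dict(top_terms(text, BASE_STOPWORDS | {"staat"}))
print(with_staat)
print(without_staat)
```

Treating “staat” as a stopword removes the verb noise but also every mention of the noun, so the choice depends on the research question rather than on a fixed list.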