Building a New Political Sphere? Early European Integration in Dutch Digitized Newspapers
Mariona Coll Ardanuy
Maarten van den Bos
Caroline Sporleder
December 8th, 2014 CLARIN-NeDiMAH Workshop, The Hague
Introduction
● October 29th, 2004: Treaty establishing a Constitution for Europe
● 2005: rejection by referendum in France and The Netherlands
● 2005: ratification process is stopped due to lack of public support
European Integration Historiography
● Need for a transnational history of public opinion on the European project
● Focus on the role of:
– Non-state actors
– Civil society
– Public opinion
● Shift focus away from interstate relations and government policies
The role of media
● Mass digitization of historical materials
● Enhancement of digital techniques
Media: «an important but mostly overlooked player in integration history», H.J. Trenz (2008)
Approach
● Automatic extraction of networks of people mentioned in news stories
– Weighted according to significance
– Distributed according to co-occurrence
– Dynamic to represent change
– Containing most relevant concepts discussed
Social Networks in Historical Research
● Examples:
– Padgett and Ansell 1993: Rise and action of the Medici
– Rochat et al. 2014: Rise of Venetian maritime empire
– Jackson 2014: Unseen relationships in medieval Scotland
● Graphs created manually
● Mostly do not use newspapers as source
Computational Background
● In European integration studies, computational techniques to map out public discourse are still at the earliest stage:
– de Roode (2012): 1000 editorials, by hand
– Medrano (2003) and Meyer (2010): traditional techniques applied to newspapers
● Entity-centric analysis of data, popular in quantitative literary analysis (Coll Ardanuy and Sporleder 2014)
Advantages of the network approach
● No rigid theory
● Draws our attention to public debate
● Bird's eye view of a period of time
● Generator of hypotheses
The Data
● Digitized newspapers from the Dutch Royal Library
● Three newspapers:
– De Tijd (Catholic)
– Het Vrije Volk (socialist)
– De Telegraaf (no formal political affiliation)
● Period: 1945-1955
● Search restricted to the terms 'Europa', 'Europese', 'Europeaan'
● High OCR confidence
● Total number of articles: 6128
The Method: outline
● Obtaining the nodes
– Human Name Recognition
– Coreference Resolution
● Establishing the relations
● Building the network
Obtaining the Nodes: Human Name Recognition
● Stanford Named Entity Recognizer
– Training data for modern Dutch (CoNLL-2002 shared task)
– 309683 tokens, 3032 person names
● Heuristics to increase recall:
– Create a list of 1650 titles or professions preceding human names:
● From Wikipedia (Lijst_van_beroepen)
● Expand list by capturing the uncapitalized word between an age expression and a capitalized word
– Capture any capitalized word after an age expression or a title or profession
● F-score improvement from 0.70 to 0.76
Ex 1: 'de 63-jarige Frank Donoghue'
Ex 2: 'de kapitein Ben Shaw'
Ex 3: 'de 21-jarige pianist Theo'
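The recall heuristics above can be sketched with two regular expressions. This is a minimal illustration, not the authors' implementation: the pattern details, the function names, and the one-word starting title list are assumptions, and `[A-Z]` only covers unaccented capitals.

```python
import re

# Age expression such as "63-jarige" ("63-year-old").
AGE = r"(?P<age>\d{1,3})-jarige"
# One or more capitalized words, taken as the candidate name (ASCII capitals only).
NAME = r"(?P<name>[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)"

# Age expression, an optional uncapitalized word (a candidate title), then a name.
AGE_PATTERN = re.compile(rf"\b{AGE}\s+(?:(?P<title>[a-z]\w*)\s+)?{NAME}")


def title_pattern(titles):
    """Build a pattern matching any known title followed by a name."""
    alternatives = "|".join(sorted(map(re.escape, titles), key=len, reverse=True))
    return re.compile(rf"\b(?P<title>{alternatives})\s+{NAME}")


def find_candidates(text, titles):
    """Return ([(name, age, title), ...] in text order, expanded title set)."""
    titles = set(titles)
    by_start = {}
    for m in AGE_PATTERN.finditer(text):
        title = m.group("title")
        if title:
            titles.add(title)  # expand the list with the newly seen title
        by_start[m.start("name")] = (m.group("name"), int(m.group("age")), title)
    if titles:
        for m in title_pattern(titles).finditer(text):
            # Keep the age-based match when both heuristics hit the same name.
            by_start.setdefault(m.start("name"), (m.group("name"), None, m.group("title")))
    return [by_start[pos] for pos in sorted(by_start)], titles


# Hypothetical starting list; the slides mention 1650 entries.
text = ("De 63-jarige Frank Donoghue sprak met de kapitein Ben Shaw "
        "en de 21-jarige pianist Theo.")
candidates, expanded_titles = find_candidates(text, {"kapitein"})
```

On the three slide examples, the age pattern alone finds 'Frank Donoghue' and 'Theo' and learns 'pianist' as a new title, while 'Ben Shaw' is found through the title list. A real system would also need to filter false titles, since any uncapitalized word after an age expression is accepted here.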
Obtaining the Nodes: Coreference Resolution
● For each identified human name, we keep:
– Age information
– Title or profession information
● Coreference resolution per document by string matching:
– Assumption: two identical surface forms in the same article refer to the same person; we keep the less ambiguous referent
● We do not perform name disambiguation. Coreference across the whole dataset is resolved by string matching, from least to most ambiguous names, taking into account:
– Match initials, hypocorisms
– Age or title/profession conflict
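A rough sketch of dataset-wide matching under these considerations follows. It assumes names of the form "given names + surname", a tiny hypothetical hypocorism table, and token count as a proxy for ambiguity; none of these details come from the slides, and the birth years are illustrative values.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical pet-name table; a real one would be much larger.
HYPOCORISMS = {"wim": "willem"}


@dataclass
class Entity:
    name: str
    born: Optional[int] = None
    title: Optional[str] = None
    aliases: set = field(default_factory=set)


def norm(token):
    t = token.lower().rstrip(".")
    return HYPOCORISMS.get(t, t)


def tokens_match(a, b):
    """Equal after normalization, or one is an initial of the other."""
    a, b = norm(a), norm(b)
    if a == b:
        return True
    if len(a) == 1:
        return b.startswith(a)
    if len(b) == 1:
        return a.startswith(b)
    return False


def names_compatible(mention, full):
    m, f = mention.split(), full.split()
    if norm(m[-1]) != norm(f[-1]):  # surnames must agree
        return False
    return all(tokens_match(x, y) for x, y in zip(m[:-1], f[:-1]))


def conflict(x, y):
    return x is not None and y is not None and x != y


def ambiguity(name):
    """Sort key: more tokens and fewer initials means less ambiguous."""
    tokens = name.split()
    return (-len(tokens), sum(len(norm(t)) == 1 for t in tokens))


def resolve(mentions):
    """mentions: list of (name, birth year or None, title or None)."""
    entities = []
    for name, born, title in sorted(mentions, key=lambda m: ambiguity(m[0])):
        matches = [e for e in entities
                   if names_compatible(name, e.name)
                   and not conflict(born, e.born)
                   and not conflict(title, e.title)]
        if len(matches) == 1:  # merge only when the match is unambiguous
            e = matches[0]
            e.aliases.add(name)
            e.born = e.born or born
            e.title = e.title or title
        else:
            entities.append(Entity(name, born, title, {name}))
    return entities


mentions = [("W. Drees", None, None), ("Willem Drees", 1886, "minister"),
            ("Wim Drees", None, "minister"), ("Drees", 1922, None)]
entities = resolve(mentions)
```

Here the initial 'W.' and the hypocorism 'Wim' are merged into 'Willem Drees', while the bare 'Drees' with a conflicting birth year is kept as a separate node.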
Establishing the relations
● Edge: co-occurrence of nodes per article
● Characteristics of the network:
– Undirected
– Weighted
– Dynamic
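As a sketch of how per-article co-occurrence yields undirected, weighted edges: each article contributes one count to every unordered pair of people it mentions. Using the raw co-occurrence count as the weight is an assumption made here for illustration; the function name and sample articles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """articles: iterable of sets of resolved person names, one set per article.

    Returns a Counter mapping an alphabetically ordered pair (a, b) to the
    number of articles in which both people appear.
    """
    weights = Counter()
    for people in articles:
        # Sorting makes (a, b) and (b, a) the same key, so edges are undirected.
        for pair in combinations(sorted(set(people)), 2):
            weights[pair] += 1
    return weights


# Hypothetical input: three articles and the people mentioned in each.
articles = [
    {"Robert Schuman", "Konrad Adenauer"},
    {"Robert Schuman", "Konrad Adenauer", "Jean Monnet"},
    {"Jean Monnet"},
]
edges = cooccurrence_edges(articles)
```

An article that mentions a single person adds no edge, which is why the third article leaves the counts unchanged.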
December 8th, 2014 HistoInformatics Workshop, Barcelona 14
Establishing the Relations: Attributes of the node
● We extract three attributes per node:
– Other names by which the entity is referred to
– Year the entity was born
– Title or profession preceding the entity
Establishing the Relations: Attributes of the edge
● We extract three attributes per pair of nodes:
– Tf-idf weighting for the common documents
– Most common words of the common documents
– List of co-occurring news articles
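The slides do not spell out how the tf-idf attribute is computed. One plausible reading, sketched below, pools the articles in which both people appear, counts term frequencies over that pool, and scales them by inverse document frequency over the whole collection. The function name, the data layout, and the sample tokens are all assumptions.

```python
import math
from collections import Counter


def edge_attributes(docs, mentions, a, b, top=3):
    """docs: {doc_id: list of tokens}; mentions: {doc_id: set of person names}."""
    # Articles in which both people are mentioned.
    common = sorted(d for d, people in mentions.items() if a in people and b in people)
    # Term frequency over the pooled common articles.
    tf = Counter(tok for d in common for tok in docs[d])
    # Document frequency over the whole collection.
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    n = len(docs)
    tfidf = {t: count * math.log(n / df[t]) for t, count in tf.items()}
    return {
        "articles": common,
        "common_words": [w for w, _ in tf.most_common(top)],
        "top_tfidf": sorted(tfidf, key=lambda t: (-tfidf[t], t))[:top],
    }


# Hypothetical, already tokenized articles.
docs = {
    "d1": ["europa", "vrede", "schuman", "plan"],
    "d2": ["europa", "vrede", "kolen", "staal"],
    "d3": ["europa", "voetbal"],
}
mentions = {
    "d1": {"Robert Schuman", "Konrad Adenauer"},
    "d2": {"Robert Schuman", "Konrad Adenauer"},
    "d3": {"Abe Lenstra"},
}
attrs = edge_attributes(docs, mentions, "Robert Schuman", "Konrad Adenauer", top=2)
```

Because 'europa' occurs in every article of this toy collection, its tf-idf score is zero: it ranks among the most common words of the pair but not among the most distinctive ones, which is the contrast the two attributes are meant to capture.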
Fragment of a network
Building of the network
● Dynamic networks: succession of yearly static networks
● Python library NetworkX to construct the networks
● Network analysis software Gephi to visualize them
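To illustrate what a succession of yearly static networks makes possible, the sketch below tracks each person's weighted degree from year to year. It uses plain dictionaries so that it runs without extra dependencies; the snapshot layout, the function name, and the weights are hypothetical.

```python
from collections import defaultdict


def weighted_degree_by_year(snapshots):
    """snapshots: {year: {(a, b): weight}} -> {person: {year: weighted degree}}."""
    series = defaultdict(dict)
    for year, edges in sorted(snapshots.items()):
        for (a, b), weight in edges.items():
            # An undirected edge contributes its weight to both endpoints.
            for person in (a, b):
                series[person][year] = series[person].get(year, 0) + weight
    return dict(series)


# Hypothetical yearly snapshots with made-up weights.
snapshots = {
    1950: {("Konrad Adenauer", "Robert Schuman"): 2,
           ("Jean Monnet", "Robert Schuman"): 1},
    1951: {("Konrad Adenauer", "Robert Schuman"): 1},
}
degrees = weighted_degree_by_year(snapshots)
```

With NetworkX, each yearly snapshot could instead be loaded into a `networkx.Graph` through `add_weighted_edges_from` and exported with `networkx.write_gexf`, a format Gephi can open.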
Analysis: expected results
● Prominent presence of figures such as:
Robert Schuman, Dirk Stikker, Ernest Bevin, Konrad Adenauer, Winston Churchill, Georges Bidault, Willem Drees, Alcide de Gasperi, Jean Monnet
● Socialist newspaper gives more weight to local stories and less weight to big names
– 10 most common nodes in De Tijd: 16%
– 10 most common nodes in Het Vrije Volk: 10%
Analysis: interesting results
● Importance of politics:
– Central actors in the networks are consistently politicians
– Political process rather than economic
● Ideologically motivated process:
– Process of peace
– Central concepts are 'solidariteit' (solidarity) and 'gemeenschapszin' (community spirit)
● Continued centrality of American politicians, even though from May 1950 the process was seen as a truly European matter
Conclusions
● Ongoing work
● Computational techniques to strengthen the empirical foundations of new European integration history
● Network extraction:
– Raises new questions
– Allows more refined search
– Furthers the scope of inquiry
● European Integration history is transnational, multilingual and ramified
Thank you for your attention