
Building a New Political Sphere? Early European Integration in Dutch Digitized Newspapers

Mariona Coll Ardanuy

Maarten van den Bos

Caroline Sporleder


December 8th, 2014 CLARIN-NeDiMAH Workshop, The Hague 2

Introduction

● October 29th, 2004: Treaty establishing a Constitution for Europe is signed

● 2005: rejection by referendum in France and the Netherlands

● 2005: ratification process is halted for lack of public support


European Integration Historiography

● Need for a transnational history of public opinion on the European project

● Focus on the role of:

– Non-state actors

– Civil society

– Public opinion

● Shift the focus away from interstate relations and government policies


The role of media

● Mass digitization of historical materials

● Enhancement of digital techniques

Media: «an important but mostly overlooked player in integration history», H.J. Trenz (2008)


Approach

● Automatic extraction of networks of people mentioned in news stories:

– Weighted according to significance

– Distributed according to co-occurrence

– Dynamic, to represent change

– Containing the most relevant concepts discussed


Social Networks in Historical Research

● Examples:

– Padgett and Ansell 1993: Rise and action of the Medici

– Rochat et al. 2014: Rise of the Venetian maritime empire

– Jackson 2014: Unseen relationships in medieval Scotland

● Graphs created manually

● Most do not use newspapers as a source


Computational Background

● In European integration studies, computational techniques to map out public discourse are still at the earliest stage:

– de Roode (2012): 1000 editorials, analysed by hand

– Medrano (2003) and Meyer (2010): traditional techniques applied to newspapers

● Entity-centric analysis of data, popular in quantitative literary analysis (Coll Ardanuy and Sporleder 2014)


Advantages of the network approach

● No rigid theory

● Draws our attention to public debate

● Bird's-eye view of a period of time

● Generator of hypotheses


The Data

● Digitized newspapers from the Dutch Royal Library

● Three newspapers:

– De Tijd (Catholic)

– Het Vrije Volk (socialist)

– De Telegraaf (no formal political affiliation)

● Period: 1945-1955

● Search restricted to the terms 'Europa', 'Europese', 'Europeaan'

● High OCR confidence

● Total number of articles: 6128
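
The selection criteria above can be sketched as a simple filter. This is only an illustration, not the authors' pipeline: the `Article` record, its `ocr_confidence` field, the 0.9 threshold, and whole-word matching are all assumptions made for the example. Only the three search terms, the newspaper titles, and the period come from the slide.

```python
import re
from dataclasses import dataclass

SEARCH_TERMS = ("europa", "europese", "europeaan")   # from the slide
NEWSPAPERS = {"De Tijd", "Het Vrije Volk", "De Telegraaf"}
MIN_OCR_CONFIDENCE = 0.9   # assumed value; the slide only says "high"


@dataclass
class Article:             # hypothetical record layout
    newspaper: str
    year: int
    text: str
    ocr_confidence: float


def mentions_europe(text: str) -> bool:
    """True if any search term occurs as a whole word (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(term in words for term in SEARCH_TERMS)


def select(articles):
    """Keep articles that satisfy every criterion listed on the slide."""
    return [
        a for a in articles
        if a.newspaper in NEWSPAPERS
        and 1945 <= a.year <= 1955
        and a.ocr_confidence >= MIN_OCR_CONFIDENCE
        and mentions_europe(a.text)
    ]
```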


The Method: outline

● Obtaining the nodes

– Human Name Recognition

– Coreference Resolution

● Establishing the relations

● Building the network


Obtaining the Nodes: Human Name Recognition

● Stanford Named Entity Recognizer

– Training data for modern Dutch (CoNLL-2002 shared task)

– 309683 tokens, 3032 person names

● Heuristics to increase recall:

– Create a list of 1650 titles or professions preceding human names:

    ● From Wikipedia (Lijst_van_beroepen)

    ● Expand the list by capturing the uncapitalized word between an age expression and a capitalized word

– Capture any capitalized word after an age expression or a title or profession

● F-score improvement from 0.70 to 0.76

Ex 1: 'de 63-jarige Frank Donoghue'

Ex 2: 'de kapitein Ben Shaw'

Ex 3: 'de 21-jarige pianist Theo'
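
The two recall heuristics can be approximated with regular expressions. The sketch below covers the heuristics only, not the Stanford NER step. The patterns, the two-entry `TITLES` set standing in for the 1650-entry list, and the choice to capture a run of capitalized words rather than a single word are assumptions made for the example.

```python
import re

TITLES = {"kapitein", "minister"}        # tiny stand-in for the full list
AGE = r"\d{1,3}-jarige"                  # age expression, e.g. "63-jarige"
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"   # run of capitalized words


def expand_titles(text, titles):
    """Add lowercase words found between an age expression and a capitalized word."""
    found = re.findall(rf"{AGE}\s+([a-z]+)\s+(?=[A-Z])", text)
    return titles | set(found)


def candidate_names(text, titles):
    """Return capitalized sequences that follow an age expression and/or a title."""
    alt = "|".join(sorted(map(re.escape, titles), key=len, reverse=True))
    pattern = rf"(?:{AGE}\s+(?:(?:{alt})\s+)?|\b(?:{alt})\s+)({NAME})"
    return re.findall(pattern, text)


text = ("de 63-jarige Frank Donoghue sprak met de kapitein Ben Shaw "
        "en de 21-jarige pianist Theo")
titles = expand_titles(text, TITLES)     # learns "pianist" from Ex 3
names = candidate_names(text, titles)
```

Run on the three slide examples joined into one sentence, `titles` gains 'pianist' and `names` holds 'Frank Donoghue', 'Ben Shaw' and 'Theo'. Without the expansion step, 'Theo' is missed because 'pianist' is not yet a known title.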


Obtaining the Nodes: Coreference Resolution

● For each identified human name, we keep:

– Age information

– Title or profession information

● Coreference resolution per document by string matching:

– Assumption: two identical surface forms in the same article refer to the same person; we keep the least ambiguous referent

● We do not disambiguate names. Coreference across the whole dataset is resolved by string matching, from the least to the most ambiguous name, taking into account:

– Matching of initials and hypocorisms (pet names)

– Conflicts in age or title/profession
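
A minimal sketch of dataset-wide matching under these considerations follows. Several choices are assumptions, not details from the slides: ambiguity is approximated by the number of name tokens (fewer tokens means more ambiguous), surnames must agree, an initial matches any first name with that letter, and the `HYPOCORISMS` table is a two-entry illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

HYPOCORISMS = {"wim": "willem", "jan": "johannes"}   # illustrative only


@dataclass
class Person:                       # hypothetical mention/entity record
    name: str
    birth_year: Optional[int] = None
    title: Optional[str] = None
    aliases: set = field(default_factory=set)


def token_match(a: str, b: str) -> bool:
    """Equal tokens, a hypocorism of the same name, or an initial of the other."""
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    a, b = HYPOCORISMS.get(a, a), HYPOCORISMS.get(b, b)
    if a == b:
        return True
    return (len(a) == 1 and b.startswith(a)) or (len(b) == 1 and a.startswith(b))


def names_match(short: str, long: str) -> bool:
    s, l_ = short.split(), long.split()
    if len(s) > len(l_) or not token_match(s[-1], l_[-1]):   # surnames must agree
        return False
    return all(token_match(x, y) for x, y in zip(s[:-1], l_[:-1]))


def conflict(a: Person, b: Person) -> bool:
    """True when both sides carry a birth year or a title and the values differ."""
    year_clash = None not in (a.birth_year, b.birth_year) and a.birth_year != b.birth_year
    title_clash = None not in (a.title, b.title) and a.title != b.title
    return year_clash or title_clash


def resolve(mentions):
    """Merge mentions into entities, least ambiguous (most name tokens) first."""
    entities = []
    for m in sorted(mentions, key=lambda p: len(p.name.split()), reverse=True):
        for e in entities:
            if names_match(m.name, e.name) and not conflict(m, e):
                e.aliases.add(m.name)
                e.birth_year = e.birth_year or m.birth_year
                e.title = e.title or m.title
                break
        else:
            entities.append(m)      # no compatible entity found: start a new one
    return entities
```

In this sketch a bare surname attaches to the first compatible entity, which mirrors the slide's statement that names are not disambiguated.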


Establishing the relations

● Edge: co-occurrence of nodes per article

● Characteristics of the network:

– Undirected

– Weighted

– Dynamic
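
The edge definition can be illustrated with a short sketch. It assumes each article has already been reduced to a year and a set of resolved person names, and it uses the raw number of shared articles as the edge weight. Both the input layout and that weighting are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """articles: iterable of (year, set_of_names) -> {year: Counter{(a, b): count}}.

    Pairs are stored in sorted order, so each undirected edge has one key.
    """
    per_year = {}
    for year, people in articles:
        edges = per_year.setdefault(year, Counter())
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return per_year


articles = [
    (1950, {"Schuman", "Adenauer"}),
    (1950, {"Schuman", "Adenauer", "Monnet"}),
    (1951, {"Stikker"}),
]
nets = cooccurrence_edges(articles)
```

Keeping one edge table per year gives the dynamic aspect: each year is a static, undirected, weighted network.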


December 8th, 2014 HistoInformatics Workshop, Barcelona 14

Establishing the Relations: Attributes of the node

● We extract three attributes per node:

– Other names by which the entity is referred to

– Year in which the entity was born

– Title or profession preceding the entity


Establishing the Relations: Attributes of the edge

● We extract three attributes per pair of nodes:

– TF-IDF weighting for the shared documents

– Most common words of the shared documents

– List of news articles in which the pair co-occurs
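
One possible reading of these edge attributes is sketched below. The slides do not say how TF-IDF is computed. Here term frequency is counted over the articles a pair shares and document frequency over the whole corpus, which is an assumption, as are the function name and the input dictionaries.

```python
import math
from collections import Counter


def edge_attributes(corpus, people_by_doc, a, b, top_k=3):
    """corpus: {doc_id: [tokens]}; people_by_doc: {doc_id: set of names}."""
    # Articles in which both people occur.
    shared = sorted(d for d, people in people_by_doc.items() if {a, b} <= people)
    # Term frequency over the shared articles only.
    tf = Counter(tok for d in shared for tok in corpus[d])
    # Document frequency over the whole corpus.
    df = Counter(tok for tokens in corpus.values() for tok in set(tokens))
    n_docs = len(corpus)
    tfidf = {tok: n * math.log(n_docs / df[tok]) for tok, n in tf.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    top = sorted(tfidf, key=lambda t: (-tfidf[t], t))[:top_k]
    return {
        "articles": shared,
        "tfidf_terms": top,
        "most_common": [w for w, _ in tf.most_common(top_k)],
    }
```

Words that occur in every article, such as function words, get a TF-IDF score of zero under this scheme, so they drop out of `tfidf_terms` even when they dominate `most_common`.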


Fragment of a network


Building of the network

● Dynamic networks: succession of yearly static networks

● Python library NetworkX to construct the networks

● Network analysis software Gephi to visualize them
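
A succession of yearly static networks might be built along the following lines. The input layout (`edges_by_year`), the function names, and the output file naming are assumptions made for the example. `nx.Graph`, `add_edge`, and `nx.write_gexf` are standard NetworkX calls, and GEXF is a format that Gephi can open.

```python
import networkx as nx


def build_yearly_networks(edges_by_year):
    """edges_by_year: {year: {(a, b): weight}} -> {year: nx.Graph}."""
    networks = {}
    for year, edges in sorted(edges_by_year.items()):
        g = nx.Graph()                      # undirected graph
        for (a, b), weight in edges.items():
            g.add_edge(a, b, weight=weight)
        networks[year] = g
    return networks


def export_for_gephi(networks, prefix="network"):
    """Write one GEXF file per year (hypothetical file naming)."""
    for year, g in networks.items():
        nx.write_gexf(g, f"{prefix}_{year}.gexf")


nets = build_yearly_networks({
    1950: {("Adenauer", "Schuman"): 2},
    1951: {("Monnet", "Schuman"): 1, ("Adenauer", "Schuman"): 3},
})
```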


Analysis: expected results

● Prominent presence of figures such as:

Robert Schuman, Dirk Stikker, Ernest Bevin, Konrad Adenauer, Winston Churchill, Georges Bidault, Willem Drees, Alcide de Gasperi, Jean Monnet

● The socialist newspaper gives more weight to local stories and less weight to big names:

– 10 most common nodes in De Tijd: 16%

– 10 most common nodes in Het Vrije Volk: 10%


Analysis: interesting results

● Importance of politics:

– Central actors in the networks are consistently politicians

– A political rather than an economic process

● Ideologically motivated process:

– A process of peace

– Central concepts are 'solidariteit' (solidarity) and 'gemeenschapszin' (sense of community)

● Continued centrality of American politicians, even though from May 1950 the process was seen as a truly European matter


Conclusions

● Ongoing work

● Computational techniques to strengthen the empirical foundations of new European integration history

● Network extraction:

– Raises new questions

– Allows more refined search

– Broadens the scope of inquiry

● European Integration history is transnational, multilingual and ramified


Thank you for your attention