Building a New Political Sphere? Early European Integration in Dutch Digitized Newspapers

Mariona Coll Ardanuy

Maarten van den Bos

Caroline Sporleder

December 8th, 2014 CLARIN-NeDiMAH Workshop, The Hague

Introduction

● October 29th, 2004: Treaty establishing a Constitution for Europe

● 2005: rejection by referendum in France and The Netherlands

● 2005: ratification process is halted for lack of public support

European Integration Historiography

● Need for a transnational history of public opinion on the European project

● Focus on the role of:

– Non-state actors

– Civil society

– Public opinion

● Shift focus away from interstate relations and government policies

The role of media

● Mass digitization of historical materials

● Enhancement of digital techniques

Media: «an important but mostly overlooked player in integration history», H.J. Trenz (2008)

Approach

● Automatic extraction of networks of people mentioned in news stories

– Weighted according to significance

– Distributed according to co-occurrence

– Dynamic to represent change

– Containing most relevant concepts discussed

Social Networks in Historical Research

● Examples:

– Padgett and Ansell 1993: Rise and action of the Medici

– Rochat et al. 2014: Rise of Venetian maritime empire

– Jackson 2014: Unseen relationships in medieval Scotland

● Graphs created manually

● Mostly do not use newspapers as source

Computational Background

● In European integration studies, computational techniques to map out public discourse are at an early stage:

– de Roode (2012): 1000 editorials, by hand

– Medrano (2003) and Meyer (2010): traditional techniques applied to newspapers

● Entity-centric analysis of data, popular in quantitative literary analysis (Coll Ardanuy and Sporleder 2014)

Advantages of the network approach

● No rigid theory

● Draws our attention to public debate

● Bird's eye view of a period of time

● Generator of hypotheses

The Data

● Digitized newspapers from the Dutch Royal Library

● Three newspapers:

– De Tijd (Catholic)

– Het Vrije Volk (socialist)

– De Telegraaf (no formal political affiliation)

● Period: 1945-1955

● Search restricted to articles containing: 'Europa', 'Europese', 'Europeaan'

● High OCR confidence

● Total number of articles: 6128

The Method: outline

● Obtaining the nodes

– Human Name Recognition

– Coreference Resolution

● Establishing the relations

● Building the network

Obtaining the Nodes: Human Name Recognition

● Stanford Named Entity Recognizer

– Training data for modern Dutch (CoNLL-2002 shared task)

– 309683 tokens, 3032 person names

● Heuristics to increase recall:

– Create a list of 1650 titles or professions preceding human names:

● From Wikipedia (Lijst_van_beroepen)

● Expand list by capturing the uncapitalized word between an age expression and a capitalized word

– Capture any capitalized word after an age expression or a title or profession

● F-score improvement from 0.70 to 0.76

Ex 1: 'de 63-jarige Frank Donoghue'

Ex 2: 'de kapitein Ben Shaw'

Ex 3: 'de 21-jarige pianist Theo'
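
The recall heuristics can be illustrated with a small regular-expression sketch. It is not the authors' implementation: the function names, the pattern for a "capitalized word", and the one-entry seed list are assumptions made for illustration, and the Stanford NER step is left out entirely.

```python
import re

AGE = r"(?P<age>\d{1,3})-jarige"            # age expression such as "63-jarige"
CAP = r"[A-Z][\w'-]+"                        # a single capitalized word (assumed shape)
NAME = rf"(?P<name>{CAP}(?:\s+{CAP})*)"      # run of capitalized words

def expand_titles(text, titles):
    """Add lowercase words found between an age expression and a capitalized word."""
    found = re.findall(rf"\d{{1,3}}-jarige\s+([a-z]+)\s+(?={CAP})", text)
    return set(titles) | set(found)

def find_names(text, titles):
    """Return (name, age, title) for capitalized runs after an age and/or a title."""
    title_alt = "|".join(re.escape(t) for t in sorted(titles, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{AGE}\s+)?(?:(?P<title>{title_alt})\s+)?{NAME}")
    hits = []
    for m in pattern.finditer(text):
        if m.group("age") or m.group("title"):   # require at least one trigger
            age = int(m.group("age")) if m.group("age") else None
            hits.append((m.group("name"), age, m.group("title")))
    return hits

# Hypothetical seed list; the slides mention a list of 1650 entries.
text = "de 63-jarige Frank Donoghue, de kapitein Ben Shaw en de 21-jarige pianist Theo"
titles = expand_titles(text, {"kapitein"})
print(find_names(text, titles))
```

Capitalized words with no age expression or title in front are deliberately ignored here, since the heuristics only supplement the statistical recognizer.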

Obtaining the Nodes: Coreference Resolution

● For each identified human name, we keep:

– Age information

– Title or profession information

● Coreference resolution per document by string matching:

– Assumption: two identical surface forms in the same article refer to the same person; we keep the less ambiguous referent

● We do not disambiguate names. Coreference across the whole dataset is resolved by string matching, from least to most ambiguous, taking into account:

– Match initials, hypocorisms

– Age or title/profession conflict
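
A rough sketch of string-matching coreference along these lines is given below. Everything in it is an assumption for illustration: the `Person` class, the two-entry hypocorism table, the rule that surnames must be identical, the one-year tolerance on birth years, and the ordering key used for "least to most ambiguous".

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical hypocorism table; a real one would be far larger.
HYPOCORISMS = {"Wim": "Willem", "Jan": "Johannes"}

@dataclass
class Person:
    name: str                      # least ambiguous surface form seen so far
    born: Optional[int] = None     # birth year, if an age was found
    title: Optional[str] = None    # title or profession, if found
    aliases: set = field(default_factory=set)

def first_names_match(a, b):
    a, b = HYPOCORISMS.get(a, a), HYPOCORISMS.get(b, b)
    if a == b:
        return True
    # An initial such as "W." matches any first name starting with "W".
    return (a.endswith(".") or b.endswith(".")) and a[0] == b[0]

def compatible(person, name, born=None, title=None):
    p, q = person.name.split(), name.split()
    if p[-1] != q[-1]:                                   # surnames must agree
        return False
    if len(p) > 1 and len(q) > 1 and not first_names_match(p[0], q[0]):
        return False
    if born and person.born and abs(born - person.born) > 1:
        return False                                     # age conflict (assumed tolerance)
    if title and person.title and title != person.title:
        return False                                     # title/profession conflict
    return True

def resolve(mentions):
    """mentions: iterable of (name, born, title). Returns a list of Person clusters."""
    people = []
    # Least ambiguous first: more tokens, then fewer initials.
    order = sorted(mentions, key=lambda m: (-len(m[0].split()), m[0].count(".")))
    for name, born, title in order:
        match = next((p for p in people if compatible(p, name, born, title)), None)
        if match is None:
            people.append(Person(name, born, title, {name}))
        else:
            match.aliases.add(name)
            match.born = match.born or born
            match.title = match.title or title
    return people
```

Processing full names before initials and bare surnames means an ambiguous form such as 'Drees' is attached to an already established referent rather than starting its own cluster.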

Establishing the relations

● Edge: co-occurrence of nodes per article

● Characteristics of the network:

– Undirected

– Weighted

– Dynamic
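
Per-article co-occurrence can be turned into undirected, weighted edges in a few lines. The sketch below assumes the weight is simply the number of articles in which two people appear together; the slides do not state the weighting used for this step, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(articles):
    """articles: iterable of lists of person names, one list per article.

    Returns a Counter mapping an unordered pair (stored as a sorted tuple)
    to the number of articles in which both people are mentioned.
    """
    edges = Counter()
    for people in articles:
        # set(): a person mentioned twice in one article counts once.
        # sorted(): (a, b) and (b, a) collapse into one undirected edge.
        for pair in combinations(sorted(set(people)), 2):
            edges[pair] += 1
    return edges

articles = [
    ["Schuman", "Adenauer", "Schuman"],
    ["Adenauer", "Schuman", "Monnet"],
    ["Monnet"],
]
print(cooccurrence_edges(articles))
```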

December 8th, 2014 HistoInformatics Workshop, Barcelona 14

Establishing the Relations: Attributes of the node

● We extract three attributes per node:

– Other names by which the entity is referred

– Year the entity was born

– Title or profession preceding the entity

Establishing the Relations: Attributes of the edge

● We extract three attributes per pair of nodes:

– Tf-idf weighting for the common documents

– Most common words of the common documents

– List of co-occurring news articles
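
One possible reading of the tf-idf attribute is sketched here: words in the articles shared by a pair are scored by their frequency in those articles times the log inverse document frequency over the whole corpus. The exact formula, the tokenization, and the function name are assumptions; the slides do not specify them.

```python
import math
from collections import Counter

def top_terms(shared_docs, corpus, k=5):
    """Rank terms of the articles shared by two people with a plain tf-idf score.

    shared_docs: tokenized articles mentioning both people (a subset of corpus).
    corpus: all tokenized articles.
    """
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))   # document frequency
    tf = Counter(term for doc in shared_docs for term in doc)   # frequency in shared docs
    scores = {t: f * math.log(n / df[t]) for t, f in tf.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

corpus = [
    ["europa", "vrede", "kolen"],
    ["europa", "vrede", "staal"],
    ["europa", "voetbal"],
    ["europa", "weer"],
    ["europa", "sport"],
]
print(top_terms(corpus[:2], corpus, k=3))
```

A word such as 'europa', present in every article because of the search restriction, gets a score of zero and drops to the bottom of the ranking.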

Fragment of a network

Building the network

● Dynamic networks: succession of yearly static networks

● Python library NetworkX to construct the networks

● Network analysis software Gephi to visualize them
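
The idea of a dynamic network as a succession of yearly static networks can be sketched without any dependencies. The code below is an illustration only: it uses plain dictionaries instead of NetworkX, counts co-occurrences as weights, and writes each year as an edge table with Source, Target, Type and Weight columns, a layout that Gephi's spreadsheet import reads. Function names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

def yearly_networks(articles):
    """articles: iterable of (year, names). Returns {year: Counter of edge weights}."""
    snapshots = defaultdict(Counter)
    for year, people in articles:
        for pair in combinations(sorted(set(people)), 2):
            snapshots[year][pair] += 1
    return dict(sorted(snapshots.items()))   # chronological order

def to_gephi_csv(edges):
    """Serialize one yearly snapshot as an undirected, weighted edge table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, "Undirected", weight])
    return buf.getvalue()

articles = [
    (1950, ["Schuman", "Adenauer"]),
    (1950, ["Adenauer", "Schuman", "Monnet"]),
    (1951, ["Monnet", "Schuman"]),
]
for year, edges in yearly_networks(articles).items():
    print(year)
    print(to_gephi_csv(edges))
```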

Analysis: expected results

● Strong presence of figures such as:

Robert Schuman, Dirk Stikker, Ernest Bevin, Konrad Adenauer, Winston Churchill, Georges Bidault, Willem Drees, Alcide de Gasperi, Jean Monnet

● The socialist newspaper gives more weight to local stories and less weight to big names

– 10 most common nodes in De Tijd: 16%

– 10 most common nodes in Het Vrije Volk: 10%

Analysis: interesting results

● Importance of politics:

– central actors in the networks are consistently politicians

– a political rather than an economic process

● Ideologically-motivated process:

– process of peace

– central concepts are: 'solidariteit' and 'gemeenschapszin'

● Continued centrality of American politicians even though from May 1950 the process was seen as a truly European matter

Conclusions

● Ongoing work

● Computational techniques to strengthen the empirical foundations of new European integration history

● Network extraction:

– Raises new questions

– Allows more refined search

– Broadens the scope of inquiry

● European Integration history is transnational, multilingual and ramified

Thank you for your attention