attention for de publieke omroep in newspapers: public broadcasting in the news - arima models

Attention for De Publieke Omroep in newspapers

ARIMA models

Assignment 1

Mark Boukes ([email protected]), 5616298

1st semester 2010/2011

Dynamic Data Analysis

Lecturer: Dr. R. Vliegenthart

November 18, 2010

Communication Science (Research MSc) Faculty of Social and Behavioural Sciences

University of Amsterdam

Table of contents

INTRODUCTION

METHOD

RESULTS

CONCLUSION

REFERENCES

Appendix

Introduction

The public broadcasting system in the Netherlands is currently the subject of political discussion. The new government needs to save money and wants to do so, among other things, on this system, which is meant to inform citizens and provide them with different opinions on (political) issues. It is interesting to see whether, in periods like this, when the use and costs of public broadcasting are politically debated, the attention that other media (its competitors) pay to this institution also increases. It would also be interesting to compare the attention given to public broadcasting in the media with the number of people watching the programs on those channels. In short, a time series dataset of the attention given to the public broadcasting system in Dutch newspapers can be a first step towards research on media and public agenda setting. The research questions for this paper were therefore the following:

Can a dataset be created that contains information about the amount of attention paid in newspapers to De Publieke Omroep at different moments in time?

Can this dataset be used in a dynamic data analysis within the ARIMA framework?

What kind of ARIMA model should be used for that purpose?

Method

To construct a dataset containing information about the amount of attention paid to public broadcasting in newspapers at different moments in time, a computer-assisted content analysis was conducted using the digital archive of LexisNexis. Articles were selected with the Boolean search term hlead ("publieke omroep"), which finds all articles containing this term in the headline or lead paragraph, for the period from 1 January 1990 to 12 November 2010, in three newspapers: the largest popular newspaper, de Telegraaf, and the two largest quality newspapers, de Volkskrant and NRC Handelsblad. In total, 3143 articles were found. Using an SPSS syntax, the information provided by LexisNexis was transformed into usable data, with variables for the date on which each article was published. The items representing articles were then aggregated to show how many articles about public broadcasting were published in a given week. However, the SPSS syntax did not take into account that LexisNexis has used Dutch words to indicate the publication date for the last two years. Therefore, I only used articles from 2008 and earlier. Furthermore, the syntax did not create items for weeks in which no articles about public broadcasting were published, which left many gaps in the weekly series. I therefore chose to use the period from 2004 (week 18) until 2008 and added the missing weeks by hand, resulting in 205 weekly items.
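The aggregation step, including the weeks without any article, can be sketched as follows. This is a minimal Python illustration and not the SPSS syntax used for this paper: the function name weekly_counts and the example dates are hypothetical, and the sketch assumes that weeks are ISO calendar weeks and that a week without articles counts as zero.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(article_dates, start, end):
    """Count articles per ISO week between start and end (inclusive).

    Weeks without any article get an explicit count of 0, so the
    resulting series has no gaps.
    """
    counts = Counter(d.isocalendar()[:2] for d in article_dates)
    # Move to the Monday of the first week, then walk forward week by week.
    monday = start - timedelta(days=start.weekday())
    series = []
    while monday <= end:
        key = monday.isocalendar()[:2]  # (ISO year, ISO week number)
        series.append((key, counts.get(key, 0)))
        monday += timedelta(days=7)
    return series


# Hypothetical publication dates, for illustration only.
example_dates = [date(2004, 4, 26), date(2004, 4, 28), date(2004, 5, 12)]
example_series = weekly_counts(example_dates, date(2004, 4, 26), date(2004, 5, 16))
for (year, week), count in example_series:
    print(year, week, count)
```

Generating the full list of weeks first and then looking up the counts avoids the problem described above, where weeks without articles simply do not appear in the aggregated file.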


Results

The dataset was analysed according to the ARIMA framework described by Vliegenthart (n.d.). The first step was to establish whether the series was stationary. To this end, the data were plotted (see Figure 1) for visual inspection and an augmented Dickey-Fuller (ADF) test was conducted. Because the result of this test (ADF = -10.387, p<0.001) indicated that the null hypothesis of non-stationarity should be rejected, the series can be considered stationary, which makes it unnecessary to difference (integrate) the data.

Figure 1: Number of articles about public broadcasting in Dutch newspapers per week.
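The logic of the stationarity test can be illustrated with a short Python sketch. It is not the procedure used for the results reported here: it implements the simple Dickey-Fuller regression without the additional lagged differences of the augmented test, it runs on simulated data rather than on the newspaper series, and the function name dickey_fuller_t is hypothetical. The test statistic is the t-value of gamma in the regression of the first difference on the lagged level; strongly negative values speak against a unit root. For a regression with a constant and a sample of roughly this size, the 5% critical value in the Dickey-Fuller tables is about -2.88.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + e[t]."""
    x = y[:-1]                                  # lagged levels
    d = [b - a for a, b in zip(y[:-1], y[1:])]  # first differences
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    gamma = sxd / sxx                           # OLS slope
    alpha = mean_d - gamma * mean_x             # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return gamma / se_gamma


# Simulated stationary AR(1) series of 205 points (illustration only).
random.seed(1)
simulated = [0.0]
for _ in range(204):
    simulated.append(0.3 * simulated[-1] + random.gauss(0, 1))

t_stat = dickey_fuller_t(simulated)
print(round(t_stat, 2))  # far below -2.88 for a stationary series like this one
```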

The next step was to predict the data as well as possible by accounting for its own past, with autoregressive (AR) terms, moving average (MA) terms or both. This was done by inspecting the autocorrelation function (ACF) and the partial autocorrelation function (PACF). These functions did not show a clear-cut pattern on which to base the model, but they pointed towards an ARIMA(1,0,0) model: the ACF decayed exponentially, while the PACF had a peak only at the first lag, with all subsequent lags approximately zero (see Figure 2).

Figure 2: ACF and PACF of the number of articles in Dutch newspapers.
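How the ACF and PACF are obtained can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software output behind Figure 2: the series is simulated, the function names acf and pacf are chosen here, and the PACF is computed with the Durbin-Levinson recursion, which is one common way of deriving partial autocorrelations from the sample autocorrelations.

```python
import random


def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(y, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        out.append(phi_kk)
    return out


# Simulated AR(1) series with coefficient 0.3 (illustration only).
random.seed(2)
series = [0.0]
for _ in range(204):
    series.append(0.3 * series[-1] + random.gauss(0, 1))

acf_vals = acf(series, 10)
pacf_vals = pacf(series, 10)
band = 1.96 / len(series) ** 0.5  # approximate 95% band around zero
```

For an AR(1) process, the ACF is expected to decay gradually while the PACF drops to within the band after the first lag, which is the pattern described above.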


However, the Ljung–Box Q test statistic was significant for both the residuals and the squared residuals of this ARIMA(1,0,0) model. This means that the residuals are not white noise but are still autocorrelated, so the model is not yet adequately specified. To explain more variance and reduce the autocorrelation, the model was extended step by step until the Ljung–Box Q tests were no longer significant. This resulted in an ARIMA(4,0,4) model. Although this model is far from parsimonious, it reduces the risk of autocorrelation and heteroscedasticity. However, three of the autoregressive terms were not significant, and thus the model should be rejected according to McCleary and Hay (1980).
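The Ljung-Box Q statistic used in this step can be computed as in the following Python sketch. The function name ljung_box_q is hypothetical and the residuals are simulated, so the numbers do not correspond to Table 1. Q is n(n+2) times the sum of r_k squared divided by (n-k) over lags 1 to h, and it is compared with a chi-square distribution; the sketch assumes h degrees of freedom, although some texts subtract the number of estimated AR and MA parameters.

```python
import random


def ljung_box_q(residuals, max_lag=20):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Chi-square critical values for 20 degrees of freedom.
CHI2_20_5PCT = 31.41
CHI2_20_1PCT = 37.57

# Simulated white-noise residuals (illustration only).
random.seed(3)
noise = [random.gauss(0, 1) for _ in range(205)]

q_residuals = ljung_box_q(noise, 20)
q_squared = ljung_box_q([e * e for e in noise], 20)  # checks for changing variance
print(q_residuals > CHI2_20_5PCT, q_squared > CHI2_20_5PCT)
```

Applying the same statistic to the squared residuals, as in the second call, is what makes the test sensitive to heteroscedasticity as well as to autocorrelation.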

Because the last model did not meet the goal of a parsimonious model that explains the data reasonably well, I decided to difference the data once, even though this was not necessary according to the ADF test. Differencing the series (Yd_t = Y_t - Y_{t-1}) created a new dependent variable that was also stationary (ADF = -20.374, p<0.001; see Figures 3 and 4). Repeating the earlier steps until the most parsimonious model was found for which the null hypothesis of the Ljung–Box Q test was not rejected resulted in an ARIMA(0,1,1)(1,0,0)4 model. This model takes seasonality into account: the amount of attention might be correlated with the amount four weeks (one month) earlier, for example due to monthly press releases or monthly scheduled press meetings. Remarkably, the autoregressive effect at lag 4 is negative. This suggests that attention to De Publieke Omroep fluctuates in a monthly pattern, but the negative sign makes the effect difficult to interpret: when there was much attention for public broadcasting one month earlier, there will now be less attention, and vice versa. The negative moving average effect at lag 1 means that a peak in attention in one week leads to reduced attention in the next week, so that on average the series returns to its mean. However, according to McCleary and Hay (1980), the regular and the seasonal factor should be of the same type, either both autoregressive or both moving average. Therefore, this model was rejected too.

Figure 3: First difference (Y_t - Y_{t-1}) of the number of articles about public broadcasting in Dutch newspapers per week.


Figure 4: ACF and PACF of the first difference (Y_t - Y_{t-1}) of the number of articles in Dutch newspapers.
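Written out, the seasonal model on the differenced series has the following general form. The notation is chosen here and does not come from the cited texts: B is the backshift operator, a_t the white-noise error, c a constant, \Phi_4 the seasonal autoregressive coefficient and \theta_1 the moving average coefficient. The sketch assumes the additive sign convention for the moving average term; some textbooks and programs write (1 - \theta_1 B) instead, which flips the sign of the reported coefficient.

```latex
\begin{align}
  (1 - \Phi_4 B^4)(1 - B)\, Y_t &= c + (1 + \theta_1 B)\, a_t \\
  \Delta Y_t &= c + \Phi_4\, \Delta Y_{t-4} + a_t + \theta_1\, a_{t-1},
  \qquad \Delta Y_t = Y_t - Y_{t-1}
\end{align}
```

Table 1 reports -.224 for the autoregressive term at lag 4 and -.896 for the moving average term at lag 1. A moving average coefficient close to -1 in a differenced series is often read as a sign of over-differencing, which would be in line with the ADF test indicating that the original series was already stationary.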

In their examples, McCleary and Hay (1980) also encounter a time series for which it is difficult to build a model. An alternative solution they suggest is to transform the series logarithmically, which should stabilize the variance. I applied this transformation to the weekly, undifferenced data. The resulting series was also stationary according to the augmented Dickey-Fuller test (ADF = -10.260, p<0.001), as can also be seen in Figure 5. This time, the ACF and PACF indicated that an ARIMA(1,0,0) model is appropriate for the series (see Figure 6): the ACF decayed exponentially, while the PACF had a peak only at the first lag, with almost all subsequent lags approximately zero. The ARIMA(1,0,0) model also yielded insignificant Ljung–Box Q test statistics, so it can be assumed that the residuals and squared residuals of this model are white noise. The ARIMA(1,0,0) model of the log-transformed series of attention to De Publieke Omroep is therefore well specified. Table 1 in the appendix summarizes the results of all models, with coefficients, Ljung–Box Q test statistics and fit statistics.

Figure 5: Log-transformed number of articles about public broadcasting in Dutch newspapers per week.


Figure 6: ACF and PACF of the log-transformed number of articles about public broadcasting per week.
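The log transformation and the fit of a first-order autoregressive model can be sketched in Python as follows. Several things here are assumptions rather than a description of the analysis in this paper: the counts are hypothetical, an offset of 1 is added before taking the logarithm because the logarithm of zero is undefined (the paper does not state how weeks with zero articles were treated), and the model is estimated with conditional least squares instead of the maximum likelihood routines of a statistics package. The function names log_transform and fit_ar1 are chosen here.

```python
import math


def log_transform(counts, offset=1.0):
    """Natural log of (count + offset); the offset of 1 is an assumption."""
    return [math.log(c + offset) for c in counts]


def fit_ar1(y):
    """Conditional least squares for y[t] = c + phi * y[t-1] + e[t]."""
    x, z = y[:-1], y[1:]  # lagged values and current values
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mean_z - phi * mean_x
    residuals = [b - c - phi * a for a, b in zip(x, z)]
    return c, phi, residuals


# Hypothetical weekly article counts (illustration only).
counts = [3, 0, 7, 5, 2, 9, 4, 6]
logged = log_transform(counts)
c, phi, residuals = fit_ar1(logged)
# Implied long-run mean of the series, valid when |phi| < 1.
implied_mean = c / (1 - phi)
```

Software differs in whether the reported constant is the intercept c or the series mean c / (1 - phi), which matters when reading a table of estimates. The residuals returned here are what a Ljung-Box test would be applied to. AIC and BIC values of a model fitted to log-transformed data are on a different scale from those of models fitted to the raw counts, so they cannot be compared directly across the two.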

Conclusion

Using LexisNexis, it was possible to create a dataset that contains information about the amount of attention paid in newspapers to De Publieke Omroep at different moments in time. Initially, modelling this dataset ran into problems of autocorrelation in the (squared) residuals, a lack of parsimony, and conflicts with ARIMA model-building rules, all of which made it difficult to use in a dynamic data analysis. The solution was to log-transform the series, so that an ARIMA model with a single autoregressive term at lag 1, ARIMA(1,0,0), could be used, whose (squared) residuals were white noise. The ARIMA analyses suggest that the attention public broadcasting gets in newspapers depends on the attention it received in the past: the amount of attention for De Publieke Omroep in a particular week positively affects the amount of attention in the next week. Log-transforming the series solves the problems encountered with the original and differenced data, so the data can be used in a dynamic data analysis.

References

McCleary, R., & Hay, R. (1980). Applied Time Series Analysis for the Social Sciences. London: Sage.

Vliegenthart, R. (n.d.). Moving up. Applying aggregate level time series analysis in communication science. Unpublished manuscript.


Appendix

Table 1. ARIMA models for weekly attention to the Publieke Omroep in Dutch newspapers

                             ARIMA(1,0,0)      ARIMA(4,0,4)       ARIMA(0,1,1)(1,0,0)4   ARIMA(1,0,0), log-transformed
Constant                     5.150 (.528)***   5.159 (.479)***    .003                   1.434 (.066)***
AR(1)                        .300 (.066)***    -1.565 (.330)***                          .287 (.065)***
AR(2)                                          -1.010 (.541)
AR(3)                                          -.576 (.468)
AR(4)                                          -.315 (.238)       -.224 (.075)**
MA(1)                                          1.941 (.321)**     -.896 (.037)***
MA(2)                                          1.683 (.582)**
MA(3)                                          1.104 (.498)*
MA(4)                                          .552 (.252)*
Ljung-Box Q(20) residuals    38.01**           25.08              28.78                  17.68
Ljung-Box Q(20) residuals²   34.93*            29.75              29.84                  15.38
AIC                          1126.57           1123.07            1127.63                406.54
BIC                          1136.54           1152.98            1140.90                416.45

Note. Unstandardized coefficients. Standard errors in parentheses; * p<.05; ** p<.01; *** p<.001
