Attention for de Publieke Omroep in newspapers: public broadcasting in the news - ARIMA models
Attention for de Publieke Omroep in newspapers
ARIMA models
Assignment 1
Mark Boukes ([email protected]) 5616298
1st semester 2010/2011
Dynamic Data Analysis
Lecturer: Dr. R. Vliegenthart
November 18, 2010
Communication Science (Research MSc)
Faculty of Social and Behavioural Sciences
University of Amsterdam
Table of contents
Introduction
Method
Results
Conclusion
References
Appendix
Introduction
The public broadcasting system in the Netherlands is currently the subject of political debate.
The new government needs to save money and wants to cut costs on, among other things, this
system, which is meant to inform citizens in the country and provide them with different
opinions on (political) issues. It is interesting to see whether, in periods like this, when the
use and costs of public broadcasting are politically contested, the attention of other media
(its competitors) for this institution also increases. It would also be interesting to compare
the attention given to public broadcasting in the media with the number of people watching the
programs on its channels. In short, a time series dataset of the attention given to the public
broadcasting system in Dutch newspapers can be a first step towards research on media and
public agenda setting. The research questions for this paper were therefore the following:
Can a dataset be created that contains information about the amount of attention paid in newspapers to De Publieke Omroep at different moments?
Can this dataset be used in a dynamic data analysis based on an ARIMA model?
What kind of ARIMA model should be used for that purpose?
Method
To construct a dataset containing information about the amount of attention paid to public
broadcasting in newspapers at different moments in time, a computer-assisted content analysis
was conducted using the digital archive of LexisNexis. Articles were selected with the Boolean
search term hlead ("publieke omroep"), which finds all articles containing this phrase in the
title or lead paragraph, for the period from 1 January 1990 until 12 November 2010, in three
newspapers: the largest popular newspaper, de Telegraaf, and the two largest quality
newspapers, de Volkskrant and NRC Handelsblad. In total, 3143 articles were found. Using an
SPSS syntax, the information provided by LexisNexis was transformed into usable data, with
variables for the date on which each article was published. The article-level records were
then aggregated to show how many articles about public broadcasting were published in a given
week. However, the SPSS syntax did not take into account that LexisNexis had used Dutch words
during the last two years to indicate the publication date. Therefore I only used articles from
2008 and earlier. Furthermore, the syntax did not create records for the weeks in which no
articles about public broadcasting were published, resulting in many missing lags. I therefore
chose to use the period from 2004 (week 18) until 2008 and added the missing weeks by hand,
resulting in 205 weekly observations.
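The aggregation step, including the weeks without any article, can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPSS syntax that was actually used; the function name `weekly_counts` and the example dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(article_dates, start, end):
    """Count articles per ISO week between start and end (inclusive).

    Weeks without any article get an explicit count of 0, so the
    resulting series has no missing lags.
    """
    # Tally articles by (ISO year, ISO week number).
    tally = Counter(tuple(d.isocalendar())[:2] for d in article_dates)

    # Walk from the Monday of the first week to the end date, week by week.
    monday = start - timedelta(days=start.weekday())
    series = []
    while monday <= end:
        key = tuple(monday.isocalendar())[:2]
        series.append((key, tally.get(key, 0)))
        monday += timedelta(days=7)
    return series


# Hypothetical publication dates, for illustration only.
dates = [date(2004, 4, 26), date(2004, 4, 28), date(2004, 5, 12)]
series = weekly_counts(dates, date(2004, 4, 26), date(2004, 5, 16))
```

In this toy example the middle week has no articles and still appears in the series with a count of zero, which is what adding the missing weeks by hand achieves.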
Results
The created dataset was analysed according to the ARIMA framework described by
Vliegenthart (n.d.). The first step was to establish whether the series was stationary. To this
end, a graph was made (see Figure 1) to inspect the data visually, and an augmented
Dickey-Fuller (ADF) test was used. Because the result of this ADF test (-10.387, p<0.001)
indicated that the null hypothesis of non-stationarity should be rejected, the series can be
considered stationary, which makes it unnecessary to difference the data.
Figure 1: Number of articles about public broadcasting in Dutch newspapers per week.
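The logic of the stationarity test can be illustrated with a minimal sketch. It implements the plain Dickey-Fuller regression with a constant and without the lagged differences that the augmented test adds, so it is a simplification of the test reported above. The simulated series, the function name and the approximate 5% critical value of -2.88 (for roughly 200 observations) are assumptions for illustration; the statistics in this paper came from a statistical package.

```python
import math
import random

# Approximate 5% critical value for a Dickey-Fuller test with a constant
# and about 200 observations (assumption; exact tables depend on n).
CRITICAL_5PCT = -2.88


def dickey_fuller_t(y):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error."""
    x = y[:-1]                                   # lagged level y[t-1]
    d = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    gamma = sxd / sxx
    alpha = mean_d - gamma * mean_x
    # Residual variance with two estimated parameters (alpha and gamma).
    sse = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    se_gamma = math.sqrt(sse / (n - 2) / sxx)
    return gamma / se_gamma


# Simulated stationary AR(1) series of 205 points (not the newspaper data).
rng = random.Random(0)
simulated = [0.0]
for _ in range(204):
    simulated.append(0.3 * simulated[-1] + rng.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(simulated)
rejects_unit_root = t_stat < CRITICAL_5PCT
```

A t-statistic far below the critical value, as in the results reported here, leads to rejecting the null hypothesis of a unit root.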
The next step was to predict the data as well as possible by accounting for its own past, with
autoregressive (AR) terms, moving average (MA) terms, or both. This was done by inspecting
the autocorrelation function (ACF) and the partial autocorrelation function (PACF). These
functions did not show a clear-cut pattern on which to base the model, but they seemed to
point to an ARIMA(1,0,0) model: the ACF decayed exponentially, while the PACF only
had a peak at the first lag and all successive lags were approximately zero (see Figure 2).
Figure 2: ACF and PACF of the number of articles in Dutch newspapers.
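The pattern described here, an exponentially decaying ACF with a single PACF spike at lag 1, is the textbook signature of an AR(1) process. The sketch below shows one way to compute both functions: sample autocorrelations, and partial autocorrelations via the Durbin-Levinson recursion. The function names are illustrative and the demonstration uses theoretical AR(1) autocorrelations with a coefficient of 0.3, not the newspaper data.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def pacf(r):
    """Partial autocorrelations from autocorrelations r = [r_1, r_2, ...].

    Uses the Durbin-Levinson recursion; out[k-1] is the lag-k value.
    """
    out = []
    phi_prev = []
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi
    return out


# Theoretical ACF of an AR(1) process with coefficient 0.3: r_k = 0.3 ** k.
theoretical_acf = [0.3 ** k for k in range(1, 6)]
partial = pacf(theoretical_acf)  # spike at lag 1, (numerically) zero afterwards
```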
However, the Ljung–Box Q test statistic was significant for both the residuals and the
squared residuals of this ARIMA(1,0,0) model. This means that the residuals do not reflect
white noise but are autocorrelated, and thus that the model is not yet suitably specified.
To explain more variance and reduce the amount of autocorrelation, the model was
extended step by step until the results of the Ljung–Box Q test were insignificant. This
resulted in an ARIMA(4,0,4) model. Though this model is not parsimonious at all, it reduces
the risks of autocorrelation and heteroscedasticity. However, three of the autoregressive
coefficients were not significant, and thus the model should be rejected according
to McCleary and Hay (1980).
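The Ljung–Box Q statistic used in this step can be sketched as follows. The function is a minimal illustration, not the routine of the statistical package used for the analysis. The critical value 31.41 is the 5% point of a chi-square distribution with 20 degrees of freedom; note that some software reduces the degrees of freedom by the number of estimated AR and MA terms, which is ignored here. The Q values in the example are the ones reported in Table 1.

```python
# 5% critical value of the chi-square distribution with 20 degrees of freedom.
CHI2_20_CRIT_05 = 31.41


def ljung_box_q(residuals, max_lag=20):
    """Ljung-Box Q = n (n + 2) * sum_k r_k^2 / (n - k) for k = 1 .. max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        ck = sum(
            (residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n)
        )
        r_k = ck / c0                 # residual autocorrelation at lag k
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Q(20) values for the residuals as reported in Table 1.
reported_q = {
    "ARIMA(1,0,0)": 38.01,
    "ARIMA(1,0,0), log-transformed": 17.68,
}
rejects_white_noise = {
    name: q > CHI2_20_CRIT_05 for name, q in reported_q.items()
}
```

For the test on the squared residuals, the same function would be applied to the list of squared residuals instead.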
Because the last model did not satisfy the goal of a parsimonious model that explains the data
reasonably well, I decided to difference the data once, even though this was not
necessary according to the ADF test. By differencing the series (Yd_t = Y_t - Y_{t-1}), a new
stationary dependent variable (ADF: -20.374, p<0.001) was created (see Figures 3 and 4).
Repeating the earlier steps until the most parsimonious model was found for which the null
hypothesis of the Ljung–Box Q test was not rejected resulted in an
ARIMA(0,1,1)(1,0,0)_4 model. This model takes seasonality into account: the
amount of attention might be correlated with the amount four weeks earlier (one month), for
example due to monthly press releases or monthly scheduled press meetings. Remarkably,
the effect of the autoregressive part of the model is negative at lag 4. This suggests that
attention to De Publieke Omroep fluctuates in a monthly pattern, but the negative sign makes
it difficult to interpret: it implies that when there was much attention for public
broadcasting one month earlier, there will now be less attention, and vice versa. The negative
moving average effect at lag 1 means that a high peak in attention in one week leads to
reduced attention in the next week, so that on average the series returns to its mean.
However, according to McCleary and Hay (1980), the regular and the seasonal factor should be
of the same type, either both autoregressive or both moving average. Therefore this model was
rejected too.
Figure 3: Week-to-week difference (t minus t-1) in the number of articles about public broadcasting in Dutch newspapers.
Figure 4: ACF and PACF of the week-to-week difference in the number of articles in Dutch newspapers.
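Written out, an ARIMA(0,1,1)(1,0,0)_4 model has roughly the following form. The backshift operator B, the white-noise term a_t and the constant c are notation assumed here rather than taken from the source. The MA term is written with a plus sign; Box-Jenkins style notation subtracts it, so the sign of a reported MA coefficient depends on the software.

```latex
\begin{aligned}
(1 - \Phi_1 B^{4})\,(1 - B)\,Y_t &= c + (1 + \theta_1 B)\,a_t \\
\Delta Y_t &= c + \Phi_1\,\Delta Y_{t-4} + a_t + \theta_1\,a_{t-1},
\qquad \Delta Y_t = Y_t - Y_{t-1}
\end{aligned}
```

Because the model is fitted to the differenced series, the negative lag-4 coefficient relates the week-to-week change in attention to the change four weeks earlier, not the level of attention to the level four weeks earlier.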
In the examples they give, McCleary and Hay (1980) are also confronted with a time series
for which it is difficult to build a model. An alternative solution they suggest is to
transform the series logarithmically, which should make the variance more stationary. I
applied this transformation to the weekly, undifferenced data. The resulting series was also
stationary according to the Dickey-Fuller test (ADF: -10.260, p<0.001), as can also be seen in
Figure 5. Inspecting the ACF and PACF this time indicated that an ARIMA(1,0,0) model is
appropriate for this time series (see Figure 6). The ACF decayed exponentially, while the PACF
only had a peak at the first lag and almost all successive lags were approximately zero. The
ARIMA(1,0,0) model also resulted in Ljung–Box Q test statistics that were insignificant, so it
can be assumed that the residuals and squared residuals reflect white noise in this model. The
ARIMA(1,0,0) model of the log-transformed series of attention to De Publieke Omroep
is therefore well specified. Table 1 in the appendix summarizes the results of all the
models, with information on coefficients, Ljung–Box Q test statistics and fit statistics.
Figure 5: Log-transformed number of articles about public broadcasting in Dutch newspapers per week.
Figure 6: ACF and PACF of the log-transformed number of articles about public broadcasting per week.
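The log transformation followed by an AR(1) fit can be sketched as below. Two things are assumptions here and are not stated in the source: the transformation log(1 + count), chosen so that weeks with zero articles remain defined, and the use of conditional least squares instead of the estimation routine of the statistical package. The counts are hypothetical.

```python
import math


def log_transform(counts):
    # ASSUMPTION: log(1 + count), so weeks with zero articles stay defined.
    return [math.log(1 + c) for c in counts]


def fit_ar1(y):
    """Conditional least squares for y[t] = intercept + phi * y[t-1] + error."""
    x, z = y[:-1], y[1:]              # predictor y[t-1] and outcome y[t]
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi = sxz / sxx
    intercept = mean_z - phi * mean_x
    residuals = [b - intercept - phi * a for a, b in zip(x, z)]
    return phi, intercept, residuals


# Hypothetical weekly article counts, for illustration only.
counts = [5, 0, 3, 8, 12, 7, 4, 6, 2, 5]
logged = log_transform(counts)
phi, intercept, residuals = fit_ar1(logged)
```

The residuals returned by such a fit are what the Ljung–Box Q test would then be applied to.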
Conclusion
Using LexisNexis, it was possible to create a dataset that contains information about the
amount of attention paid in newspapers to De Publieke Omroep at different moments. This
dataset initially seemed to suffer from autocorrelation in the (squared) residuals,
a lack of parsimony and conflicts with ARIMA model-building rules, all of which make it
difficult to use in a dynamic data analysis. The solution was to log-transform the series,
so that an ARIMA model with one autoregressive term at lag 1, ARIMA(1,0,0), could be used, of
which the (squared) residuals reflected just white noise. The ARIMA analyses performed
suggest that the attention public broadcasting gets in newspapers depends on the
attention it has received in the past. The amount of attention paid to De Publieke Omroep in a
particular week positively affects the amount of attention given to it in the next week.
Log-transforming the series solves the problems that the original and differenced data
faced, so the data can be used in a dynamic data analysis.
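To give a sense of the size of this carry-over effect: in an AR(1) model the effect of a shock after k weeks is the coefficient raised to the power k. With the coefficient of .287 reported in Table 1, and writing Z_t for the log-transformed series and a_t for the shock (both symbols are assumed here), this works out as follows.

```latex
\frac{\partial Z_{t+k}}{\partial a_t} = \phi_1^{k},
\qquad
\phi_1 = .287:\quad
\phi_1^{1} = .287,\;\;
\phi_1^{2} \approx .082,\;\;
\phi_1^{3} \approx .024
```

On the log scale, the influence of an unusually busy week has therefore largely faded after about three weeks.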
References
McCleary, R., & Hay, R. (1980). Applied Time Series Analysis for the Social Sciences.
London: Sage.
Vliegenthart, R. (n.d.). Moving up. Applying aggregate level time series analysis in
communication science. Unpublished manuscript.
Appendix
Table 1. ARIMA models for weekly attention to the Publieke Omroep in Dutch newspapers
                              ARIMA(1,0,0)       ARIMA(4,0,4)        ARIMA(0,1,1)(1,0,0)_4   ARIMA(1,0,0), log-transformed
Constant                      5.150 (.528)***    5.159 (.479)***     .003                    1.434 (.066)***
AR(1)                         .300 (.066)***     -1.565 (.330)***                            .287 (.065)***
AR(2)                                            -1.010 (.541)
AR(3)                                            -.576 (.468)
AR(4)                                            -.315 (.238)        -.224 (.075)**
MA(1)                                            1.941 (.321)**      -.896 (.037)***
MA(2)                                            1.683 (.582)**
MA(3)                                            1.104 (.498)*
MA(4)                                            .552 (.252)*
Ljung-Box Q(20) residuals     38.01**            25.08               28.78                   17.68
Ljung-Box Q(20) residuals²    34.93*             29.75               29.84                   15.38
AIC                           1126.57            1123.07             1127.63                 406.54
BIC                           1136.54            1152.98             1140.90                 416.45
Note. Unstandardized coefficients. Standard errors in parentheses; * p<.05; ** p<.01; *** p<.001