detecting outliers at the end of the series using forecast intervals

Detecting outliers at the end of the series using forecast intervals Dario Buono Eurostat, Unit B.1: Methodology and corporate architecture Fabrice Gras Eurostat, Unit B.1: Methodology and corporate architecture Enrico Infante Eurostat, Unit C.1: National Accounts methodology, Sector Accounts, Financial Indicators Università degli Studi di Napoli Federico II, dipartimento di scienze economiche e statistiche Germana Scepi Università degli Studi di Napoli Federico II, dipartimento di scienze economiche e statistiche DSSR 2016, Napoli, 17-19 February 2016


Page 1: Detecting outliers at the end of the series using forecast intervals

Detecting outliers at the end of the series using forecast intervals

Dario Buono
Eurostat, Unit B.1: Methodology and corporate architecture

Fabrice Gras
Eurostat, Unit B.1: Methodology and corporate architecture

Enrico Infante
Eurostat, Unit C.1: National Accounts methodology, Sector Accounts, Financial Indicators
Università degli Studi di Napoli Federico II, dipartimento di scienze economiche e statistiche

Germana Scepi
Università degli Studi di Napoli Federico II, dipartimento di scienze economiche e statistiche

DSSR 2016, Napoli, 17-19 February 2016

Page 2: Detecting outliers at the end of the series using forecast intervals

Eurostat

Content
• Introduction
  • Big Data?
  • Basic Idea
• Methodology
  • 3 Steps
  • Further Considerations
• Case Study
  • 1 – Parmigiano Reggiano
  • 2 – Compensation of Employees of Household Sector
• Research Findings

Page 3: Detecting outliers at the end of the series using forecast intervals

Introduction
• An earlier version of this work was presented at NTTS 2013
• It aimed at identifying "market risk" for specific commodity prices by using the forecast volatility degree
• The methodology presented here is generalised so that it can be re-used for outlier identification and treatment when new observations occur
• This is a recurrent issue in the production of official statistics, where a large number of time series have to be validated

Page 4: Detecting outliers at the end of the series using forecast intervals

Introduction – Big Data?
• Big data are normally defined by their characteristics, such as volume, velocity, variety, timeliness, exhaustivity, flexibility, etc. (Kitchin)
• The amount of data used in our analysis can be considered big from an official statistics point of view in terms of volume and velocity (thousands of time series, frequently updated, up to a daily basis)
• However, the methodology is designed for time series, so it can only be applied to data with a known structure

Page 5: Detecting outliers at the end of the series using forecast intervals

Introduction – Basic Idea

When the observed data differ considerably from the expected forecast trend, an outlier is identified

Information about the type of outlier can also be derived

Step 1: Identification of the model
Step 2: Estimating forecast intervals
Step 3: Detecting the volatility degree

Page 6: Detecting outliers at the end of the series using forecast intervals

Methodology: Step 1
• The first Step is to model the price series X_t without the last r observations, using a seasonal ARIMA(p,d,q)(P,D,Q):

$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,X_{t^*} = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_{t^*}, \qquad t^* = t - r$$

where B is the backshift operator and s is the seasonal period

• The model is dynamic in the sense that it is estimated every time new information is available
• For pre-treatment (already known outliers, calendar effects, etc.), the RegARIMA model is used
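The two data operations behind Step 1 can be sketched in plain Python: setting aside the last r observations, and applying the differencing operators (1-B)^d and (1-B^s)^D that appear on the left-hand side of the model. This is only a minimal illustration; the function names `split_holdout` and `difference` are hypothetical, and estimating the ARMA coefficients themselves would normally be left to a time series library.

```python
def split_holdout(series, r):
    """Split a series into the estimation sample and the last r observations."""
    if not 0 < r < len(series):
        raise ValueError("r must be between 1 and len(series) - 1")
    return series[:-r], series[-r:]


def difference(series, lag=1):
    """Apply (1 - B^lag) once: each value minus the value `lag` periods earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Toy quarterly series (illustrative values only, not from the presentation)
toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
estimation, holdout = split_holdout(toy, r=3)

# d = 1 and D = 1 with s = 4: seasonal difference, then regular difference
stationary = difference(difference(estimation, lag=4), lag=1)
print(len(estimation), len(holdout), stationary)
```

Each application of `difference` shortens the series by `lag` values, which is one reason short series leave little room for seasonal differencing.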

Page 7: Detecting outliers at the end of the series using forecast intervals

Methodology: Step 2
• In the second Step, for each of the h = 1, …, r observations not considered in the model during Step 1, the SARIMA forecast intervals at the 5% significance level (95% coverage) are computed:

$$\hat{x}_{t^*}(h) \pm z_{\alpha/2}\,\sqrt{\mathrm{VAR}\left[e_{t^*}(h)\right]}$$

where e_{t*}(h) is the h-step-ahead forecast error

• The parameter r should be selected by the user. Our suggestion is to use r=3 for monthly series
• A dynamic selection of r could also be used: starting with r=1, the algorithm moves to r=2 if the last observation is an outlier, and continues until it stops finding outliers
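The interval formula of Step 2 can be sketched in plain Python as follows. The sketch assumes the point forecasts and the innovation variance are already available from a fitted model, and it uses the standard result that the h-step forecast error variance of an ARIMA model is the innovation variance times the sum of the first h squared psi-weights. The function names, the argument layout and the sign convention (MA terms enter with a plus sign) are assumptions made for the example, and the seasonal part is taken to be already multiplied into the coefficient lists.

```python
from math import sqrt
from statistics import NormalDist


def psi_weights(ar, ma, horizon):
    """psi_0 .. psi_{horizon-1} for
    X_t = sum_i ar[i-1] * X_{t-i} + e_t + sum_j ma[j-1] * e_{t-j}.
    `ar` is the generalised AR polynomial (differencing already multiplied in)."""
    psi = [1.0]
    for j in range(1, horizon):
        value = ma[j - 1] if j <= len(ma) else 0.0
        for i, coef in enumerate(ar, start=1):
            if i <= j:
                value += coef * psi[j - i]
        psi.append(value)
    return psi


def forecast_intervals(point_forecasts, ar, ma, sigma2, alpha=0.05):
    """Return (lower, upper) for h = 1..r: x_hat(h) +/- z * sqrt(VAR[e(h)])."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    psi = psi_weights(ar, ma, len(point_forecasts))
    intervals, cumulative = [], 0.0
    for h, x_hat in enumerate(point_forecasts, start=1):
        cumulative += psi[h - 1] ** 2          # sum of squared psi-weights up to h-1
        half_width = z * sqrt(sigma2 * cumulative)
        intervals.append((x_hat - half_width, x_hat + half_width))
    return intervals


# Random walk (ARIMA(0,1,0)): generalised AR polynomial is [1.0], no MA terms
for low, high in forecast_intervals([10.0, 10.0, 10.0], ar=[1.0], ma=[], sigma2=1.0):
    print(round(low, 3), round(high, 3))
```

For models with differencing the intervals widen with h, which matches the growing MIN/MAX ranges reported in the case studies.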

Page 8: Detecting outliers at the end of the series using forecast intervals

Methodology: Step 3
• In the third Step, the observed values at time t*+h are compared with the forecast intervals computed during the second Step
• If the observed value at time t*+h is not inside the forecast interval at time t*+h, then an outlier is detected and should be analysed

[Figure: observed value plotted against the forecast interval. Inside the interval: outlier NOT detected. Outside the interval: outlier detected.]
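The comparison in Step 3 amounts to a simple interval check per held-out observation. A minimal sketch, with a hypothetical function name and made-up numbers used purely for illustration:

```python
def flag_outliers(observed, intervals):
    """For each h, True if the observed value lies outside its forecast interval."""
    if len(observed) != len(intervals):
        raise ValueError("need one interval per observed value")
    return [not (low <= value <= high)
            for value, (low, high) in zip(observed, intervals)]


# Hypothetical values: only the second observation falls outside its interval
observed = [10.2, 9.1, 10.0]
intervals = [(9.8, 10.6), (9.7, 10.9), (9.5, 11.2)]
print(flag_outliers(observed, intervals))
```

Values exactly on a bound are treated as inside the interval here; the presentation does not say how ties should be handled, so that choice is an assumption.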

Page 9: Detecting outliers at the end of the series using forecast intervals

Methodology: Further Considerations

When the observed value falls outside the forecast interval, it is classified as an outlier. The type of outlier is detected by looking at all the r intervals together

The table below describes how to classify an outlier in the case of r=3

Is the price outside the interval?

t*+1   t*+2   t*+3   Outlier
Y      N      N      AO
N      Y      N      AO
N      N      Y      AO
Y      Y      N      TC
Y      N      Y      AO (2)
N      Y      Y      LS
Y      Y      Y      LS
N      N      N      -

AO = additive outlier, TC = transitory change, LS = level shift
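The decision table for r=3 can be encoded directly as a lookup. The sketch below copies the eight rows of the table; the dictionary and function names are hypothetical, the "AO (2)" label is kept exactly as it appears on the slide, and the "-" row is represented as `None`.

```python
# Keys: is the observation outside its interval at t*+1, t*+2, t*+3?
OUTLIER_TYPES_R3 = {
    (True, False, False): "AO",
    (False, True, False): "AO",
    (False, False, True): "AO",
    (True, True, False): "TC",
    (True, False, True): "AO (2)",
    (False, True, True): "LS",
    (True, True, True): "LS",
    (False, False, False): None,   # no outlier detected
}


def classify_r3(flags):
    """Map three outside-the-interval flags to the outlier type from the table."""
    if len(flags) != 3:
        raise ValueError("the table is defined for r = 3 only")
    return OUTLIER_TYPES_R3[tuple(bool(f) for f in flags)]


print(classify_r3([True, True, False]))
```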

Page 10: Detecting outliers at the end of the series using forecast intervals

Case Study: Prices of Parmigiano/1
• As a first case study, the price time series of the Italian Parmigiano Reggiano is analysed. The time span is from January 2000 to June 2012 (150 observations)

Page 11: Detecting outliers at the end of the series using forecast intervals

Case Study: Prices of Parmigiano/2
• The first step is to model the series without the last r=3 observations
• The model estimated is a SARIMA(2,1,0)(0,0,0)
• The forecast intervals are computed on the r=3 forecasted values of the series with 147 observations, and then compared with the observed prices

Month     Obs. price   MIN     MAX
Apr-12    9.57         9.66    10.07
May-12    9.23         9.37    10.30
Jun-12    9.20         9.10    10.56
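Written out, the SARIMA(2,1,0)(0,0,0) model estimated for this series has no seasonal part and no moving-average part, so it reduces to an AR(2) on the first differences. The coefficient names \phi_1 and \phi_2 are generic placeholders (the presentation does not report the estimates), and t runs over the estimation sample:

```latex
(1 - \phi_1 B - \phi_2 B^2)(1 - B)\,X_t = \varepsilon_t
\quad\Longleftrightarrow\quad
X_t = (1 + \phi_1)\,X_{t-1} + (\phi_2 - \phi_1)\,X_{t-2} - \phi_2\,X_{t-3} + \varepsilon_t
```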

Page 12: Detecting outliers at the end of the series using forecast intervals

Case Study: Prices of Parmigiano/3
• The basic idea is that when the observed data differ considerably from the expected forecast trend, commodity risk may be present

The observed price is outside the forecast interval in April and May 2012, but inside the interval in June 2012

A transitory change has been identified in April 2012
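The conclusion can be checked against the reported figures with a few lines of Python. The values are those of the Parmigiano Reggiano table; the helper name `outside_pattern` is hypothetical.

```python
# Rows of the Parmigiano Reggiano table: (month, observed price, MIN, MAX)
rows = [
    ("Apr-12", 9.57, 9.66, 10.07),
    ("May-12", 9.23, 9.37, 10.30),
    ("Jun-12", 9.20, 9.10, 10.56),
]


def outside_pattern(table_rows):
    """Return one 'Y' or 'N' per row: is the observed value outside [MIN, MAX]?"""
    return "".join(
        "N" if low <= observed <= high else "Y"
        for _, observed, low, high in table_rows
    )


pattern = outside_pattern(rows)
print(pattern)  # YYN
```

The pattern Y Y N is the row of the r=3 decision table labelled TC, in line with the transitory change reported for April 2012.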

Page 13: Detecting outliers at the end of the series using forecast intervals

Case Study: D1R_S1M/1
• As a second case study, the compensation of employees received by the household sector is analysed. The time span is from the first quarter of 1999 to the third quarter of 2009 (43 observations)

Page 14: Detecting outliers at the end of the series using forecast intervals

Case Study: D1R_S1M/2
• The first step is to model the series without the last r=3 observations
• The model estimated is a SARIMA(0,1,1)(0,1,1)
• The forecast intervals are computed on the r=3 forecasted values of the series with 40 observations, and then compared with the observed values

Quarter   Obs. value   MIN        MAX
2009Q1    1076641      1099909    1122499
2009Q2    1133088      1158914    1190498
2009Q3    1090471      1108281    1145282
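For reference, the SARIMA(0,1,1)(0,1,1) specification used here (often called the airline model) can be written out for quarterly data, so s=4. The coefficients \theta_1 and \Theta_1 are generic placeholders, the plus-sign convention for the moving-average terms is an assumption (some texts use minus signs), and t runs over the estimation sample:

```latex
(1 - B)(1 - B^4)\,X_t = (1 + \theta_1 B)(1 + \Theta_1 B^4)\,\varepsilon_t
\quad\Longleftrightarrow\quad
X_t - X_{t-1} - X_{t-4} + X_{t-5}
  = \varepsilon_t + \theta_1\,\varepsilon_{t-1} + \Theta_1\,\varepsilon_{t-4} + \theta_1\Theta_1\,\varepsilon_{t-5}
```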

Page 15: Detecting outliers at the end of the series using forecast intervals

Case Study: D1R_S1M/3
• The basic idea is that when the observed data differ considerably from the expected forecast trend, an outlier is detected

The observed value is outside the forecast interval in all three quarters considered

A level shift has been identified in 2009Q1... or before?

Page 16: Detecting outliers at the end of the series using forecast intervals

Case Study: D1R_S1M/4

Using a dynamic selection of r, we arrived at r=5, which identifies when the level shift starts
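One possible reading of the dynamic selection of r is sketched below: r grows by one as long as the earliest held-out observation lies outside the one-step interval computed from the observations before it. This stopping rule is an interpretation, not the authors' stated algorithm. To keep the example self-contained, a random-walk interval (last value plus or minus z times the standard deviation of past changes) stands in for the SARIMA forecast interval; the function names and the toy series are hypothetical.

```python
from statistics import NormalDist, stdev


def random_walk_interval(history, alpha=0.05):
    """One-step interval for a random walk. Stand-in for the SARIMA interval."""
    changes = [b - a for a, b in zip(history, history[1:])]
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * stdev(changes)
    return history[-1] - half_width, history[-1] + half_width


def select_r(series, interval_fn=random_walk_interval, min_history=4):
    """Increase r while the earliest held-out observation is flagged as an outlier."""
    max_r = len(series) - min_history
    r = 0
    while r < max_r:
        cut = len(series) - (r + 1)
        history, candidate = series[:cut], series[cut]
        low, high = interval_fn(history)
        if low <= candidate <= high:
            break          # no outlier found at this position: stop
        r += 1
    return r


# Toy series: stable around 100, then two consecutive drops at the end
series = [100, 101, 100, 101, 100, 101, 100, 101, 100, 90, 80]
print(select_r(series))  # 2
```

With a real SARIMA model, `interval_fn` would refit the model on `history` and return its one-step forecast interval, which is the re-estimation on every update that the methodology describes.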

Page 17: Detecting outliers at the end of the series using forecast intervals

Research Findings
• When assessing the quality of large sets of time series, it is vital to have an automatic procedure for detecting outliers among the end-of-series observations, as analysts usually focus their attention on the most recent part of the time series
• National statistical offices, among other organisations, face this challenge on a daily basis
• This paper proposes a possible approach to identify the presence of outliers among the end-of-series observations using forecast intervals
• The model used is updated every time new information is available. We aim at finding the best estimate possible, as close as possible to the truth (Giovannini)

Page 18: Detecting outliers at the end of the series using forecast intervals

Thank you for your attention!
Благодаря ви за вниманието!
Tack för er uppmärksamhet!
Děkuji vám za pozornost!
Tak for jeres opmærksomhed!
Dank u voor uw aandacht!
Tänan tähelepanu eest!
Kiitos huomiota!
Merci pour votre attention!
Vielen Dank für Ihre Aufmerksamkeit!
Σας ευχαριστώ για την προσοχή σας!
Köszönöm a figyelmet!
Go raibh maith agat as do aird!
Grazie per l'attenzione!
Paldies par jūsu uzmanību!
Ačiū už Jūsų dėmesį!
Grazzi għall-attenzjoni tiegħek!
Takk for oppmerksomheten!
Dziękuję za uwagę!
Obrigado pela vossa atenção!
Vă mulţumesc pentru atenţie!
Ďakujem vám za pozornosť!
Hvala za vašo pozornost!
Gracias por su atención!