Time Series in R: Forecasting and Visualisation
Time series in R
29 May 2017
Outline
1 ts objects
2 Time plots
3 Lab session 1
4 Seasonal plots
5 Seasonal or cyclic?
6 Lag plots and autocorrelation
7 Lab session 2
Time series
Time series consist of sequences of observations collected over time. We will assume the time periods are equally spaced.
Time series examples:
- Daily IBM stock prices
- Monthly rainfall
- Annual Google profits
- Quarterly Australian beer production
ts objects and ts function
A time series is stored in a ts object in R, which contains:
- a list of numbers
- information about the times at which those numbers were recorded.
Example
Year  Observation
2012  123
2013  39
2014  78
2015  52
2016  110
y <- ts(c(123,39,78,52,110), start=2012)
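To see what the ts object above actually stores, you can query its time attributes. A minimal base-R sketch: it only uses time(), start(), end() and frequency() from the stats package that ships with R, so it runs without the course packages.

```r
# Annual series from the example table above
y <- ts(c(123, 39, 78, 52, 110), start = 2012)

print(y)       # values printed against their years
time(y)        # the years 2012 to 2016, one per observation
start(y)       # 2012 1
end(y)         # 2016 1
frequency(y)   # 1: one observation per year
```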
ts objects and ts function
For observations that are more frequent than once per year, add a frequency argument. E.g., monthly data stored as a numerical vector z:
y <- ts(z, frequency=12, start=c(2003, 1))
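The slide does not define z, so the sketch below fills it with 24 made-up placeholder values (hypothetical, not real data) just to show how frequency and start together fix the time index of every observation. cycle() and window() are base-R stats functions.

```r
# Hypothetical monthly data: 24 placeholder values, not real observations
z <- seq(10, 240, by = 10)

# Monthly series starting January 2003
y <- ts(z, frequency = 12, start = c(2003, 1))

end(y)          # 2004 12: two full years of months
cycle(y)[1:3]   # month positions 1, 2, 3 (Jan, Feb, Mar 2003)
window(y, start = c(2004, 1), end = c(2004, 3))  # Jan to Mar 2004 only
```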
ts objects and ts function
ts(data, frequency, start)
Type of data   frequency             start example
Annual         1                     1995
Quarterly      4                     c(1995,2)
Monthly        12                    c(1995,9)
Daily          7 or 365.25           1 or c(1995,234)
Weekly         52.18                 c(1995,23)
Hourly         24 or 168 or 8,766    1
Half-hourly    48 or 336 or 17,532   1
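The non-obvious frequencies in the table follow from simple arithmetic. Where several values are listed, each corresponds to a different seasonal cycle (daily, weekly or annual), and the annual ones use an average year of 365.25 days to allow for leap years. A small sketch of that arithmetic; the variable names are illustrative only.

```r
# Average days per year, allowing for leap years
days_per_year <- 365.25

# Daily data: weekly cycle or annual cycle
daily <- c(week = 7, year = days_per_year)                              # 7, 365.25

# Weekly data: average number of weeks per year
weekly_year <- days_per_year / 7                                        # 52.178..., shown as 52.18

# Hourly data: daily, weekly or annual cycle
hourly <- c(day = 24, week = 24 * 7, year = 24 * days_per_year)         # 24, 168, 8766

# Half-hourly data: daily, weekly or annual cycle
half_hourly <- c(day = 48, week = 48 * 7, year = 48 * days_per_year)    # 48, 336, 17532
```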
ts objects
Class: “ts”
Print and plotting methods available.
ausgdp
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1971           4612 4651
## 1972 4645 4615 4645 4722
## 1973 4780 4830 4887 4933
## 1974 4921 4875 4867 4905
## 1975 4938 4934 4942 4979
## 1976 5028 5079 5112 5127
## 1977 5130 5101 5072 5069
## 1978 5100 5166 5244 5312
## 1979 5349 5370 5388 5396
## 1980 5388 5403 5442 5482
## 1981 5506 5531 5560 5583
## 1982 5568 5524 5452 5358
## 1983 5303 5320 5408 5531
## 1984 5624 5669 5697 5736
## 1985 5811 5894 5952 5965
## 1986 5943 5924 5935 5979
## 1987 6035 6097 6167 6227
## 1988 6256 6272 6295 6345
## 1989 6413 6468 6497 6511
## 1990 6514 6512 6490 6442
## 1991 6390 6346 6328 6340
## 1992 6362 6389 6433 6491
## 1993 6541 6566 6602 6671
## 1994 6765 6847 6890 6918
## 1995 6962 7018 7083 7134
## 1996 7173 7212 7242 7276
## 1997 7332 7400 7478 7550
## 1998 7618
ts objects
start(ausgdp)
## [1] 1971 3
end(ausgdp)
## [1] 1998 1
frequency(ausgdp)
## [1] 4
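start(), end() and frequency() together determine the length of the series: for ausgdp, 2 quarters in 1971, 26 full years and 1 quarter in 1998 give 107 observations. ausgdp itself comes from the course packages, so this base-R sketch builds a placeholder series (the values 1 to 107, not the GDP data) with the same start and frequency to check that arithmetic.

```r
# Placeholder quarterly series: values 1..107 stand in for the real data
q <- ts(1:107, start = c(1971, 3), frequency = 4)

start(q)      # 1971 3
end(q)        # 1998 1
frequency(q)  # 4

# Length implied by start/end/frequency:
# (end year - start year) * frequency + (end period - start period) + 1
n_implied <- (end(q)[1] - start(q)[1]) * frequency(q) + (end(q)[2] - start(q)[2]) + 1
n_implied     # 107, the same as length(q)
```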
ts objects
Residential electricity sales
elecsales
## Time Series:
## Start = 1989
## End = 2008
## Frequency = 1
##  [1] 2354 2380 2319 2469 2386 2569 2576 2763 2844
## [10] 3001 3108 3358 3076 3181 3222 3176 3431 3527
## [19] 3638 3655
ts objects
start(elecsales)
## [1] 1989 1
end(elecsales)
## [1] 2008 1
frequency(elecsales)
## [1] 1
fpp2
Main package used in this course:
> library(fpp2)
This loads:
- some data for use in examples and exercises
- forecast package (for forecasting functions)
- ggplot2 package (for graphics)
- fma package (for lots of time series data)
- expsmooth package (for more time series data)
ts objects
autoplot(ausgdp)
[Figure: time plot of ausgdp; x-axis: Time, y-axis: ausgdp]
Time plots
autoplot(a10) + ylab("$ million") + xlab("Year") +
  ggtitle("Antidiabetic drug sales")
[Figure: Antidiabetic drug sales; x-axis: Year, y-axis: $ million]
Lab Session 1
Seasonal plot
ggseasonplot(a10, year.labels=TRUE, year.labels.left=TRUE) +
  ylab("$ million") +
  ggtitle("Seasonal plot: antidiabetic drug sales")
[Figure: Seasonal plot: antidiabetic drug sales; one line per year (1991 to 2008), labelled at both ends; x-axis: Month (Jan to Dec), y-axis: $ million]
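The idea behind a seasonal plot is to cut the series into years and draw each year as its own line across the seasons. A rough base-R sketch of that idea, using the built-in monthly AirPassengers series (Jan 1949 to Dec 1960) as a stand-in because a10 needs the course packages. matplot() here is a substitute for illustration, not how ggseasonplot() is implemented, and the reshape assumes the series starts in January and covers complete years.

```r
# Built-in monthly series, Jan 1949 to Dec 1960 (12 complete years)
x <- AirPassengers

# One row per year, one column per month (assumes a January start and whole years)
m <- matrix(as.numeric(x), ncol = frequency(x), byrow = TRUE,
            dimnames = list(as.character(start(x)[1]:end(x)[1]), month.abb))

# Draw each year (row of m) as its own line across Jan..Dec
matplot(t(m), type = "l", lty = 1, xaxt = "n",
        xlab = "Month", ylab = "Passengers (thousands)")
axis(1, at = 1:12, labels = month.abb)
```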
Seasonal polar plots
ggseasonplot(a10, polar=TRUE) + ylab("$ million")
[Figure: Seasonal plot: a10, on polar coordinates; angle: Month (Jan to Dec), radius: $ million; one line per year (1991 to 2008)]
Seasonal subseries plots
ggsubseriesplot(a10) + ylab("$ million") +
  ggtitle("Subseries plot: antidiabetic drug sales")
[Figure: Subseries plot: antidiabetic drug sales; x-axis: Month (Jan to Dec), y-axis: $ million]
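A subseries plot gathers the values for each month across all years into a small time plot of its own, with a horizontal line marking that month's mean. The sketch below computes those monthly means directly and draws the same kind of plot with base R's stats::monthplot() (whose default reference line is the mean). The built-in AirPassengers series stands in for a10.

```r
x <- AirPassengers

# Mean for each month across all years: the level drawn as a horizontal line per month
monthly_means <- as.vector(tapply(as.numeric(x), cycle(x), mean))
names(monthly_means) <- month.abb

# Base-R subseries plot: one mini time plot per month, with a line at its mean
monthplot(x, ylab = "Passengers (thousands)")
```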
Quarterly Australian Beer Production
beer <- window(ausbeer, start=1992)
autoplot(beer)
[Figure: time plot of beer from 1992; x-axis: Time, y-axis: beer]
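window() keeps only the part of a ts between the given start and end times, and the result is still a ts with the original frequency. A base-R sketch using the built-in quarterly UKgas series (1960 Q1 to 1986 Q4) as a stand-in for ausbeer:

```r
# Built-in quarterly series: UK gas consumption, 1960 Q1 to 1986 Q4
g <- window(UKgas, start = 1980)   # equivalent to start = c(1980, 1)

start(g)       # 1980 1
end(g)         # 1986 4: the end of the original series is kept
frequency(g)   # 4: window() preserves the quarterly frequency
length(g)      # 28 quarters (1980 to 1986)
```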
Quarterly Australian Beer Production
ggseasonplot(beer,year.labels=TRUE)
[Figure: Seasonal plot: beer; one line per year (1992 to 2010); x-axis: Quarter (Q1 to Q4)]
Quarterly Australian Beer Production
ggsubseriesplot(beer)
[Figure: subseries plot of beer; x-axis: Quarter (Q1 to Q4), y-axis: beer]
Time series patterns
Trend: a pattern exists when there is a long-term increase or decrease in the data.
Seasonal: a pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
Cyclic: a pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
Time series patterns
autoplot(window(elec, start=1980)) +
  ggtitle("Australian electricity production") +
  xlab("Year") + ylab("GWh")
[Figure: Australian electricity production; x-axis: Year, y-axis: GWh]
Time series patterns
autoplot(bricksq) +
  ggtitle("Australian clay brick production") +
  xlab("Year") + ylab("million units")
[Figure: Australian clay brick production; x-axis: Year, y-axis: million units]
Time series patterns
autoplot(ustreas) +
  ggtitle("US Treasury Bill Contracts") +
  xlab("Day") + ylab("price")
[Figure: US Treasury Bill Contracts; x-axis: Day, y-axis: price]
Time series patterns
autoplot(lynx) +
  ggtitle("Annual Canadian Lynx Trappings") +
  xlab("Year") + ylab("Number trapped")
[Figure: Annual Canadian Lynx Trappings; x-axis: Year, y-axis: Number trapped]
Seasonal or cyclic?
Differences between seasonal and cyclic patterns:
- seasonal pattern constant length; cyclic pattern variable length
- average length of cycle longer than length of seasonal pattern
- magnitude of cycle more variable than magnitude of seasonal pattern
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
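One way to see the "variable length" of a cycle is to measure the gaps between peaks. The sketch below does this for the built-in annual lynx series (1821 to 1934), which is the same data as the lynx plot. A "peak" here is simply a year higher than both neighbours, a crude definition chosen for illustration that also picks up minor bumps; for a seasonal pattern the corresponding gaps would always equal the seasonal period.

```r
y <- as.numeric(lynx)
n <- length(y)

# Local maxima: years higher than both the year before and the year after
is_peak <- c(FALSE,
             y[2:(n - 1)] > y[1:(n - 2)] & y[2:(n - 1)] > y[3:n],
             FALSE)
peak_years <- time(lynx)[is_peak]

# Gaps between successive peaks: not a constant length
gaps <- diff(peak_years)
table(gaps)
```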
Example: Beer production
beer <- window(ausbeer, start=1992)
gglagplot(beer)
Example: Beer production
[Figure: lag plots of beer in nine panels, lag 1 to lag 9; each panel plots beer against its lagged values (both axes 400 to 500), with points coloured by Quarter (1 to 4)]
Lagged scatterplots
Each graph shows $y_t$ plotted against $y_{t-k}$ for different values of $k$.
The autocorrelations are the correlations associated with these scatterplots.
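The points in a lag-$k$ scatterplot are the pairs $(y_t, y_{t-k})$, obtained by lining the series up against a copy of itself shifted by $k$ steps. A base-R sketch of that pairing; lag_pairs() is a hypothetical helper written for this example, the built-in quarterly UKgas series stands in for beer, and stats::lag.plot() draws the same kind of grid without ggplot2.

```r
x <- as.numeric(UKgas)

# Hypothetical helper: the (y_t, y_{t-k}) pairs plotted in a lag-k scatterplot
lag_pairs <- function(x, k) {
  n <- length(x)
  data.frame(y_t = x[(k + 1):n], y_t_minus_k = x[1:(n - k)])
}

p4 <- lag_pairs(x, 4)
nrow(p4)   # length(x) - 4 pairs

# Grid of lag plots for lags 1 to 9 (points only, no connecting lines)
lag.plot(UKgas, lags = 9, do.lines = FALSE)
```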
Autocorrelation
Results for first 9 lags for beer data:
r1 r2 r3 r4 r5 r6 r7 r8 r9
-0.102 -0.657 -0.060 0.869 -0.089 -0.635 -0.054 0.832 -0.108
ggAcf(beer)
[Figure: correlogram, Series: beer; x-axis: Lag, y-axis: ACF]
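The values above are sample autocorrelations. The usual definition, which is what base R's stats::acf() computes, is $r_k = \sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y}) \big/ \sum_{t=1}^{T}(y_t-\bar{y})^2$. A sketch that implements it directly and checks it against acf(); r_k() is a hypothetical helper, and the built-in quarterly UKgas series is used because beer needs the course packages.

```r
# Sample autocorrelation at lag k:
# r_k = sum_{t=k+1}^{n} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{n} (y_t - ybar)^2
r_k <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  d <- y - mean(y)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

y <- UKgas
manual  <- sapply(1:9, function(k) r_k(y, k))
builtin <- as.numeric(acf(y, lag.max = 9, plot = FALSE)$acf)[-1]  # drop lag 0

round(manual, 3)
all.equal(manual, builtin)   # TRUE: same formula
```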
Autocorrelation
r4 is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 4 quarters apart.
r2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.
Together, the autocorrelations at lags 1, 2, ..., make up the autocorrelation function or ACF.
The plot is known as a correlogram.
Trend and seasonality in ACF plots
- When data have a trend, the autocorrelations for small lags tend to be large and positive.
- When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency).
- When data are trended and seasonal, you see a combination of these effects.
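These patterns can be checked on simulated data where the structure is known in advance. The sketch below uses two made-up series (hypothetical, generated with a fixed seed): one with a linear trend and one with a period-12 seasonal cycle, each with small noise, and inspects their sample autocorrelations with base R's acf(). acf_values() is a hypothetical helper for this example.

```r
set.seed(1)

# Trended series: linear trend plus small noise
trended <- ts(1:100 + rnorm(100))

# Seasonal series: period-12 cycle plus small noise, no trend
seasonal <- ts(10 * sin(2 * pi * (1:120) / 12) + rnorm(120), frequency = 12)

# Hypothetical helper: autocorrelations at lags 1..lag.max (lag 0 dropped)
acf_values <- function(x, lag.max) {
  as.numeric(acf(x, lag.max = lag.max, plot = FALSE)$acf)[-1]
}

r_trend    <- acf_values(trended, 10)
r_seasonal <- acf_values(seasonal, 24)

round(r_trend[1:3], 2)              # small lags: large and positive
round(r_seasonal[c(6, 12, 24)], 2)  # negative at half a season, large at lags 12 and 24
```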
Aus monthly electricity production
elec2 <- window(elec, start=1980)
autoplot(elec2)
[Figure: time plot of elec2 from 1980; x-axis: Time, y-axis: elec2]
Aus monthly electricity production
ggAcf(elec2, lag.max=48)
[Figure: correlogram, Series: elec2; x-axis: Lag (0 to 48), y-axis: ACF]
Google stock price
autoplot(goog)
[Figure: time plot of goog; x-axis: Time, y-axis: goog]
Google stock price
ggAcf(goog, lag.max=100)
[Figure: correlogram, Series: goog; x-axis: Lag (0 to 100), y-axis: ACF]
Which is which?
[Figure: four time plots to be matched with four ACF plots.
Time plots: 1. Daily temperature of cow (y-axis: chirps per minute); 2. Monthly accidental deaths (thousands); 3. Monthly air passengers (thousands); 4. Annual mink trappings (thousands).
ACF plots: A, B, C, D.]
Lab Session 2