Break Position Errors in Climate Records
Ralf Lindau & Victor Venema
University of Bonn, Germany
TRANSCRIPT
12th International Meeting on Statistical Climatology, Jeju, Korea – 24. June 2013
Internal and External Variance
Consider the differences of one station compared to a neighbor reference.
Breaks are defined by abrupt changes in the station-reference time series.
Internal variance: within the subperiods
External variance: between the means of different subperiods
Break criterion: maximum external variance
Decomposition of Variance
n: total number of years
N: number of subperiods
n_i: number of years within subperiod i
The sum of external and internal variance is constant.
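As a minimal sketch of this decomposition, the following Python code splits a series at given break positions and checks that internal and external variance add up to the total variance. The function name `decompose_variance` and the example series are illustrative assumptions, not part of the talk.

```python
import random

def decompose_variance(series, breaks):
    """Return (internal, external) variance for a given segmentation.

    `breaks` holds the start indices of all subperiods after the first.
    Both variances are normalized by the total length n.
    """
    n = len(series)
    total_mean = sum(series) / n
    edges = [0] + sorted(breaks) + [n]
    internal = 0.0
    external = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = series[start:stop]
        n_i = len(segment)
        seg_mean = sum(segment) / n_i
        # Scatter of the values around their own subperiod mean.
        internal += sum((x - seg_mean) ** 2 for x in segment)
        # Scatter of the subperiod mean around the overall mean, weighted by length.
        external += n_i * (seg_mean - total_mean) ** 2
    return internal / n, external / n

rng = random.Random(0)
# Hypothetical example: 100 years, unit noise, jump of height 2 at year 50.
series = [rng.gauss(0.0, 1.0) + (2.0 if i >= 50 else 0.0) for i in range(100)]
mean = sum(series) / len(series)
total = sum((x - mean) ** 2 for x in series) / len(series)

for breaks in ([50], [30], [20, 70]):
    internal, external = decompose_variance(series, breaks)
    print(breaks, round(internal, 3), round(external, 3), round(internal + external, 3))
```

Whatever segmentation is chosen, the last printed column stays the same, which is why maximizing the external variance is equivalent to minimizing the internal variance.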
Position errors
Two segments of lengths n1 and n2 with means x1 and x2.
A subsegment of length m with mean x0 is erroneously exchanged from segment 2 to segment 1.
x1 is strongly reduced, x2 differs slightly. x1 and x2 converge.
This reduces the external variance, and the wrong segmentation is rejected.
Change of external variance
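The change of external variance Δv is a function only of the means and lengths of the two segments and of the exchanged subsegment:
$$n\,\Delta v = -f_1 x_1^2 + f_2 x_2^2 + 2 x_0 \left( f_1 x_1 - f_2 x_2 \right) + f_0 x_0^2$$
with
$$f_1 = \frac{m\,n_1}{n_1 + m}, \qquad f_2 = \frac{m\,n_2}{n_2 - m}, \qquad f_0 = f_2 - f_1$$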
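The formula can be checked numerically. The sketch below compares it with a direct computation from the shifted segment means. The weights f1 = m·n1/(n1 + m), f2 = m·n2/(n2 − m) and f0 = f2 − f1 used here were obtained by working through the exchange of the subsegment and should be read as an assumption; all function names are illustrative.

```python
def external_sum(lengths, means):
    """n times the external variance: sum of n_i * (mean_i - total_mean)^2."""
    n = sum(lengths)
    total_mean = sum(l * x for l, x in zip(lengths, means)) / n
    return sum(l * (x - total_mean) ** 2 for l, x in zip(lengths, means))

def delta_v_direct(n1, n2, x1, x2, m, x0):
    """n * delta_v computed from the segment means before and after the exchange."""
    # Segment 1 gains the subsegment (length m, mean x0), segment 2 loses it.
    x1_new = (n1 * x1 + m * x0) / (n1 + m)
    x2_new = (n2 * x2 - m * x0) / (n2 - m)
    after = external_sum([n1 + m, n2 - m], [x1_new, x2_new])
    before = external_sum([n1, n2], [x1, x2])
    return after - before

def delta_v_formula(n1, n2, x1, x2, m, x0):
    """n * delta_v from the closed-form expression with weights f1, f2, f0."""
    f1 = m * n1 / (n1 + m)
    f2 = m * n2 / (n2 - m)
    f0 = f2 - f1
    return -f1 * x1**2 + f2 * x2**2 + 2 * x0 * (f1 * x1 - f2 * x2) + f0 * x0**2

# Hypothetical example values.
args = (50, 50, 2.0, 0.0, 3, 0.4)
print(delta_v_direct(*args), delta_v_formula(*args))
```

Both numbers agree up to rounding, and they are negative for this example: a subsegment whose mean stays close to x2 lowers the external variance when it is moved to segment 1.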
Express x0 by x2 plus scatter
The mean of the exchanged subsegment x0 is equal to x2, the mean of the segment it stems from, plus a random scatter variable δ.
$$x_0 = x_2 + \delta, \qquad \delta = \frac{\sigma}{m} \sum_{i=1}^{m} \delta_i, \qquad \delta_i \sim \mathcal{N}(0, 1)$$
δ depends on the internal variance σ² and the length m, because it is a mean over m random numbers:
$$\delta \sim \mathcal{N}\left(0, \frac{\sigma^2}{m}\right)$$
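A short simulation illustrates that the scatter of a subsegment mean shrinks with its length. This is a sketch only; the function name `subsegment_scatter` and the sample size are assumptions made for illustration.

```python
import random
import statistics

def subsegment_scatter(m, sigma, rng):
    """Mean of m independent N(0, sigma^2) values, i.e. one realization of delta."""
    return sigma / m * sum(rng.gauss(0.0, 1.0) for _ in range(m))

rng = random.Random(1)
sigma = 1.0
for m in (1, 2, 5, 10):
    sample = [subsegment_scatter(m, sigma, rng) for _ in range(20000)]
    # The empirical variance should be close to sigma^2 / m.
    print(m, round(statistics.pvariance(sample), 3), sigma**2 / m)
```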
Quadratic function for Δv
Replace x0 by x2 + δ and normalize by the square of the jump height d = x1 − x2.
$$n\,\Delta v = -f_1 (x_1 - x_2)^2 + 2 f_1 (x_1 - x_2)\,\delta + (f_2 - f_1)\,\delta^2$$
$$v^* := \frac{n\,\Delta v}{d^2} = -f_1 + 2 f_1 \varepsilon + f_0 \varepsilon^2$$
The change of the normalized external variance v*, which is the decision criterion for break detection, is a quadratic function of a random variable ε = δ/d, which depends on the signal-to-noise ratio and the length of the exchanged segment.
$$\varepsilon \sim \mathcal{N}\left(0, \frac{1}{4 m\,\mathrm{SNR}^2}\right), \qquad \mathrm{SNR} := \frac{|d/2|}{\sigma}$$
Zero points
If the parabola becomes positive, the
shift of the break position by m items
leads to increased external variance
so that this solution is preferred by
mistake.
Zero points at:
$$\varepsilon_1 \cong +\frac{1}{2}, \qquad \varepsilon_2 \cong -\frac{2 f_1}{f_0} - \frac{1}{2} \cong -\frac{n_1}{m}$$
Slope at the positive zero point:
$$2 f_1 + f_0 = f_1 + f_2 \cong 2m$$
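The approximate zero points can be compared with the exact roots of the parabola. The sketch below assumes the weights f1 = m·n1/(n1 + m), f2 = m·n2/(n2 − m), f0 = f2 − f1 and two segments of 50 years each; function names are illustrative.

```python
import math

def weights(n1, n2, m):
    """Weights f1, f2, f0 of the parabola for an exchanged subsegment of length m."""
    f1 = m * n1 / (n1 + m)
    f2 = m * n2 / (n2 - m)
    return f1, f2, f2 - f1

def v_star(eps, n1, n2, m):
    """Normalized change of external variance as a function of eps."""
    f1, _, f0 = weights(n1, n2, m)
    return -f1 + 2 * f1 * eps + f0 * eps**2

def zero_points(n1, n2, m):
    """Exact roots of f0*eps^2 + 2*f1*eps - f1 = 0, negative root first."""
    f1, _, f0 = weights(n1, n2, m)
    root = math.sqrt(f1**2 + f0 * f1)
    return (-f1 - root) / f0, (-f1 + root) / f0

for m in (1, 2, 3):
    neg, pos = zero_points(50, 50, m)
    f1, f2, f0 = weights(50, 50, m)
    slope = 2 * f1 + 2 * f0 * pos  # derivative of v* at the positive zero point
    print(m, round(neg, 2), round(pos, 3), round(slope, 3))
```

For m much smaller than the segment lengths, the positive root stays close to 1/2, the negative root close to −n1/m, and the slope at the positive root close to 2m.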
Simulated data
10,000 random time series of length 100.
Internal σ = 1
Jump height d = 2
The data confirm the existence of different parabolae for different m.
However, the data cover only scatter values near zero and never reach the negative solution.
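A comparable experiment can be sketched in a few lines of Python. The code below is not the authors' simulation: it assumes a single break in the middle of the series, searches for the one split that maximizes the external variance, and uses 2,000 instead of 10,000 series to keep the run time short. All names are illustrative.

```python
import random

def best_break(series):
    """Return the split position k (1 <= k < n) with maximum external variance.

    Position k means the first segment is series[:k] and the second series[k:].
    """
    n = len(series)
    total = sum(series)
    best_k, best_value = 1, -1.0
    cumulative = 0.0
    for k in range(1, n):
        cumulative += series[k - 1]
        mean1 = cumulative / k
        mean2 = (total - cumulative) / (n - k)
        # Equals n * external variance plus a constant that does not depend on k.
        value = k * mean1**2 + (n - k) * mean2**2
        if value > best_value:
            best_k, best_value = k, value
    return best_k

def position_errors(trials, n=100, true_break=50, sigma=1.0, jump=2.0, seed=0):
    """Detected minus true break position for `trials` random series."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        series = [rng.gauss(0.0, sigma) + (jump if i >= true_break else 0.0)
                  for i in range(n)]
        errors.append(best_break(series) - true_break)
    return errors

errors = position_errors(2000)
hit_rate = sum(e == 0 for e in errors) / len(errors)
print("hit rate:", hit_rate)
```

With σ = 1 and a jump height of 2, the signal-to-noise ratio is 1, so a clear majority of the detected breaks should fall exactly on the true position.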
[Figure: (n Δv)/4 plotted against δ for m = 1, m = 2 and m = 3; SNR = 1]
The negative solution
Typical situation:
Extremely low SNR.
A drastically disturbed measurement near the break.
Its exchange leads to x1’ < x2 and x2’ > x1. The two means diverge so that the external variance grows.
[Figure: segment means x1 and x2, and the means x1' and x2' after the exchange]
The positive solution
A subsegment adjacent to the true break is randomly lifted by more than half of the jump height.
Including it in the neighboring segment reduces the internal variance.
An erroneous break position is concluded.
Criterion: Maximum hatched area
Mathematical formulation of the criterion:
Brownian motion with drift
Drift = −SNR
[Figure: Brownian motion with drift; labels δ and σ]
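A discrete counterpart of this process can be simulated directly. The sketch below assumes Gaussian steps with mean −SNR and unit variance and records the step at which each walk reaches its maximum; it covers one side of the break only. Function names, step count and sample size are illustrative assumptions.

```python
import random
from collections import Counter

def argmax_of_walk(snr, steps, rng):
    """Step index at which a random walk with drift -snr reaches its maximum."""
    position, best_value, best_index = 0.0, 0.0, 0
    for k in range(1, steps + 1):
        position += rng.gauss(-snr, 1.0)
        if position > best_value:
            best_value, best_index = position, k
    return best_index

def argmax_distribution(snr, steps=40, trials=20000, seed=0):
    """Relative frequency of each argmax position over many simulated walks."""
    rng = random.Random(seed)
    counts = Counter(argmax_of_walk(snr, steps, rng) for _ in range(trials))
    return {k: counts[k] / trials for k in sorted(counts)}

dist = argmax_distribution(snr=1.0)
for k in range(4):
    print(k, dist.get(k, 0.0))
```

Position 0 corresponds to a walk that never rises above its start value, which is the case where the true break position is retained on this side.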
Theoretical retrace
Parabola equation
linear approximation around the zero point
inserting known slope and (positive) zero point
replacing f1 + f2 by 2m
multiplying by signal-to-noise ratio
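The steps listed above can be written out as follows. This is a sketch under stated assumptions, since the equations of this slide are not preserved in the transcript: ε = δ/d with d > 0, the positive zero point is taken as ε0 = 1/2, and m is small compared with the segment lengths.

```latex
\begin{align*}
v^* &= -f_1 + 2 f_1 \varepsilon + f_0 \varepsilon^2
  && \text{parabola equation} \\
v^* &\approx \left.\frac{dv^*}{d\varepsilon}\right|_{\varepsilon_0} (\varepsilon - \varepsilon_0)
  && \text{linear approximation around the zero point } \varepsilon_0 \\
    &= (f_1 + f_2)\left(\varepsilon - \tfrac{1}{2}\right)
  && \text{slope } 2 f_1 + 2 f_0 \varepsilon_0 = f_1 + f_2 \text{ at } \varepsilon_0 = \tfrac{1}{2} \\
    &\approx 2m\,\varepsilon - m
  && f_1 + f_2 \approx 2m \\
\mathrm{SNR} \cdot v^* &\approx \sum_{i=1}^{m} \delta_i - m\,\mathrm{SNR}
   = \sum_{i=1}^{m} \left( \delta_i - \mathrm{SNR} \right)
  && \varepsilon = \frac{\sigma}{m\,d} \sum_{i=1}^{m} \delta_i,\quad \mathrm{SNR} = \frac{d}{2\sigma}
\end{align*}
```

The last line is a sum of m unit-variance Gaussian steps, each with mean −SNR, that is, a discrete random walk with drift −SNR.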
Brownian motion with drift
Distribution of the time of the maximum of a Brownian motion with drift.
Strictly valid only for continuous processes.
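$$f(s) = 2 \left[ \frac{1}{\sqrt{s}}\,\varphi\!\left(\mu\sqrt{s}\right) + \mu\,\Phi\!\left(\mu\sqrt{s}\right) \right] \times \left[ \frac{1}{\sqrt{t-s}}\,\varphi\!\left(\mu\sqrt{t-s}\right) - \mu\,\Phi\!\left(-\mu\sqrt{t-s}\right) \right], \qquad 0 < s < t$$
Buffet, 2003, J Appl Math Stoch Anal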
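The density can be evaluated with the Python standard library alone. In the sketch below, φ and Φ are the standard normal density and distribution function, and the integration routine substitutes s = t·sin²θ to remove the integrable singularities at both ends of the interval. The function names and the chosen values of t and μ are illustrative assumptions.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def max_time_density(s, t, mu):
    """Density f(s), 0 < s < t, of the time of the maximum for drift mu."""
    rise = phi(mu * math.sqrt(s)) / math.sqrt(s) + mu * Phi(mu * math.sqrt(s))
    fall = (phi(mu * math.sqrt(t - s)) / math.sqrt(t - s)
            - mu * Phi(-mu * math.sqrt(t - s)))
    return 2.0 * rise * fall

def integrate_density(t, mu, points=20000):
    """Midpoint rule in theta after substituting s = t * sin(theta)^2."""
    h = (math.pi / 2.0) / points
    total = 0.0
    for i in range(points):
        theta = (i + 0.5) * h
        s = t * math.sin(theta) ** 2
        ds_dtheta = 2.0 * t * math.sin(theta) * math.cos(theta)
        total += max_time_density(s, t, mu) * ds_dtheta * h
    return total

# Without drift the density reduces to the arcsine law 1 / (pi * sqrt(s * (t - s))).
print(max_time_density(0.5, 1.0, 0.0), 2.0 / math.pi)
# With drift -1 (SNR = 1) the density still integrates to one.
print(integrate_density(20.0, -1.0))
```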
[Figure: panels for SNR = 0.5, SNR = 1 and SNR = 2. Dashed line: Buffet, 2003. Circles: numerical simulation of a discrete Brownian motion with drift. Crosses: complete break search simulation.]
Two more problems
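$$f(s) = 2 \left[ \frac{1}{\sqrt{s}}\,\varphi\!\left(\mu\sqrt{s}\right) + \mu\,\Phi\!\left(\mu\sqrt{s}\right) \right] \times \left[ \frac{1}{\sqrt{t-s}}\,\varphi\!\left(\mu\sqrt{t-s}\right) - \mu\,\Phi\!\left(-\mu\sqrt{t-s}\right) \right], \qquad 0 < s < t$$
Buffet, 2003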
• The hit rate is not accurately reproduced.
• Break errors are a two-sided symmetric process: breaks can be placed both too early and too late.
Hit rate
The hit rate h can be estimated for all drifts d by:
[Figure legend: true; + + + estimated]
h=1−Φ1−Φ2+Φ1
2
2
with
Two-sided processes
Deviations are caused by random scatter independently on both sides.
The hit rate h is reduced to h².
One-sided deviations have the probability:
with + without competitor
For two-sided deviations the probability is halved if a competitor occurs on the other side:
All other probabilities are reduced by
Practical application
The hit rate drops from 95% for SNR = 2 to 29% for SNR = 0.5.
For SNR > 1 the break positions quickly become very exact.
For SNR < 1 the break positions quickly become very inexact.
[Figure: true and estimated (+ + +); panels for SNR = 1, SNR = 2 and SNR = 0.5]
Conclusions
• Break position errors can be described by the distribution of the time of maximum of a Brownian motion with drift.
• The drift parameter is equal to the signal-to-noise ratio, defined as the ratio of half the jump height between homogeneous subperiods to the internal standard deviation within them.
Hit rate simulation
The hit rate is the probability that the initial value is never exceeded.
For realistic drift sizes the value converges after a few steps.
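Such a simulation can be sketched as follows, assuming a discrete random walk with Gaussian steps of mean −SNR and unit variance. The code estimates the one-sided hit rate after each number of steps and squares the final value for the two-sided case, since scatter acts independently on both sides of the break. Function names, step count and sample size are illustrative assumptions.

```python
import random

def one_sided_hit_rates(snr, max_steps=40, trials=10000, seed=0):
    """Estimate P(start value not exceeded within k steps) for k = 1..max_steps."""
    rng = random.Random(seed)
    # first_exceed[k] counts walks that rise above the start value first at step k.
    first_exceed = [0] * (max_steps + 1)
    for _ in range(trials):
        position = 0.0
        for k in range(1, max_steps + 1):
            position += rng.gauss(-snr, 1.0)
            if position > 0.0:
                first_exceed[k] += 1
                break
    rates, surviving = [], trials
    for k in range(1, max_steps + 1):
        surviving -= first_exceed[k]
        rates.append(surviving / trials)
    return rates

for snr in (2.0, 1.0, 0.5):
    rates = one_sided_hit_rates(snr)
    h = rates[-1]
    # First four intermediate values, the converged one-sided rate, and h squared.
    print(snr, [round(r, 3) for r in rates[:4]], round(h, 3), round(h * h, 3))
```

The intermediate values settle after only a few steps for SNR = 2, while for SNR = 0.5 noticeably more steps are needed before the estimate stops changing.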
Preliminary maximum
Instead of multiplying by h < 1, we can alternatively stop the summation earlier. k = 2 works well.
p_ik is defined as the probability that the k-th member of a Brownian motion is the preliminary maximum after i steps.
The probability to be also the absolute maximum is lower by a factor of h.
Thus: