bayesian model selection approach to boundary detection ... · bayesian model selection approach to...
TRANSCRIPT
Bayesian Model Selection Approach to BoundaryDetection with Non-Local Priors
Anonymous Author(s)AffiliationAddressemail
Abstract
We propose a Bayesian model selection (BMS) boundary detection procedure using1
non-local prior distributions for a sequence of data with multiple systematic mean2
changes. By using the non-local priors in the Bayesian model selection framework,3
the BMS method can effectively suppress the non-boundary spike points with large4
instantaneous changes. Further, we speed up the algorithm by reducing the multiple5
change points to a series of single change point detection problems. We establish6
the consistency of the estimated number and locations of the change points under7
various prior distributions. Extensive simulation studies are conducted to compare8
the BMS with existing methods, and our method is illustrated with application to9
the magnetic resonance imaging guided radiation therapy data.10
1 Introduction11
Traditional change point detection algorithms often handle the cases where the occurrent frequency of12
the change points is relatively consistent across the signal. For example, the popular narrowest-over-13
threshold (NOT) [1] algorithm is suitable under the setting where the different segments between the14
change points have comparable lengths, while the stepwise marginal likelihood (SML) method [5]15
works the best to identify the frequent change points. However, it is often case that the gaps between16
the consecutive change points are dramatically different, while only ones with certain distances are of17
interest. In this paper, we develop a computational efficient Bayesian model selection algorithm for18
identifying the change points under this specific setting.19
The inconsistent gaps between the change points can be observed from the signals generated by the20
magnetic resonance imaging guided radiation therapy (MRgRT). One problem with MRgRT is that21
when radiation traveled in the magnetic field, the dose level can be significantly enhanced near the22
boundaries between different tissues or organs in human bodies. The Duke Mid-sized Optical-CT23
System (DMOS) (See Figure A.1 in Appendix) was developed to identify the dose changes near the24
region of this boundary artifact. The right panel in Figure A.1 contains one radiation dose, where we25
can observe that the boundaries on and inside the dosimeter can be distinguished by noticeable peaks26
of the radiation dose levels. In the experiment, multiple radiations enter the cylindrical dosimeter from27
different directions, and a sequence of data ordered by their distances to the sources was collected.28
Because the dosimeter is a circle and the cavity is in the middle, the radiation from different directions29
would hit the boundary at similar distances away from their sources in the dosimeter.30
In the MRgRT data, radiations in certain directions may experience temporary changes at non-31
boundary locations, which may result from the abnormal status of the DMOS system rather than the32
true dose changes. The temporary change points, appear in the data sequence as the spike points,33
often mixed up with the ones on the boundary (the peak locations in Figure A.1) which makes the34
boundary detection extremely challenging. Figures A.2 in Appendix shows the change points in the35
MRgRT data identified by the popular NOT [1] and the SML [5] algorithms, respectively. It can be36
Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017). Do not distribute.
seen that neither of the algorithms correctly identify the true boundaries. This motivate us to propose37
a new algorithm in detecting the systematic changes when the segment length can have dramatic38
differences.39
As shown in the preliminary analysis of the MRgRT data, the local control of the discovery is crucial.40
To avoid picking the spike points, we enforce minimal distances between the change points. Moreover,41
we adopt the computational efficient local scan routine and propose a systematic two-stage procedure42
to speed up the change point detection procedure. More specifically, the local scan method first43
identifies the candidate points with minimal distance based on the local data, and then optimizes44
an utility function to obtain the estimates for the locations and the total number of change points.45
Because the change points are defined based on the mean changes in two consecutive segments, the46
local data are sufficient to detect the systemic changes [6, 7, 8, 13, 14, 16, 18, 19].47
To perserve the positive detection rate of the change points and reduce the false detection rate of the48
non-change points, we utilize a Bayesian marginal likelihood function as the utility, and propose a49
new Bayesian model selection (BMS) procedure for identifying change points. In particular, we show50
that the selection consistency is achieved by choosing both the local [2, 3, 4, 17] and non-local priors51
[11], whereas the convergence rate is faster under the non-local priors.52
Our BMS procedure is cast in the model selection framework, which is faster than the dynamic53
programming introduced under the SML framework. For example, for a single sequence in the54
MRgRT data, BMS takes roughly 1.3 seconds and SML takes 3.1 seconds when the maximum55
number of change points is capped at 100. The major factor that contributes to the efficiency of56
BMS algorithm is the fact that BMS reduces the search space dramatically by selecting a small set of57
candidate change-points. Further, once the candidate points are selected, BMS only needs to evaluate58
two consecutive segments at a time, which allows the algorithms to be implemented in parallel.59
2 Bayesian multiple change points detection60
2.1 Probability model61
Suppose there are p0 true change points t1 ¤, . . . ,¤ tp0 among n observations Yn � tY1, . . . , Ynu.62
As a convention, let t0 � 1 and tpp0�1q � n� 1. Denote λj � tj�1 � tj and λ � minj�0,...,p0 λj .63
We consider a set of Kn candidate points τ1, . . . , τKn , with τ0 � 1 and τKn�1 � n � 1, while64
postponing the discussion on the candidate set selection to Section 2.4. Define nj � τj�1 � τj ,65
nI � minj�0,...,Kn�1 nj , nI ¤ λ. Let HpnIq � tτj : j � 1, . . . ,Kn, |τj�1 � τj | ¡ nIu denote66
the set of all candidate points and T0pp0q � ttj : j � 1, . . . , p0u be the set of true change points.67
The specification of the candidate points allows the BMS method to be implemented in a lower68
dimensional space with the most influential points. It also guarantees that there are a sufficient69
number of non-change points surrounding the true ones so that the consistency conditions are met.70
The probability model takes the form of71
Yl � ντj � εl, l P rτj , τj�1q,where the random errors εl are independent with mean zero and variance σ2
j . Further, we define72
σ � maxj�0,...,p0 σj .73
For ease of exposition, we first consider the situation where the locations of the candidate change74
points are given and T0pp0q � HpnIq. Define Yτj � n�1j�1
°τj�1l�τj�1
Yl, which is the sample average75
for the pj � 1qth segment rτj�1, τjq, j ¡ 1. Suppose the candidate point τk is not a change point,76
then the points in rτk, τk�1q have the same mean as those in rτk�1, τkq. If τk is a change point, we77
expect a mean shift between the segments rτk, τk�1q and rτk�1, τkq. Hence, we can formulate the78
model and prior distribution for l ¥ τ1 as follows:79
Yl � Yτk � µk � ξl, l P rτk, τk�1q,µk � πpµkq, if τk is a change point,µk � 0, with probability 1, if τk is an nI -flat point,
where πp�q is a prior density and ξl is a mean-zero error. Note that the observations in the first80
segment are unchanged. Here the nI -flat point is defined as a non-change point which is at least nI81
apart from any change points. We require the nI distance between the true change points and the flat82
ones so that there are sufficient neighborhood samples to achieve the estimation consistency.83
2
Let µk0 be the true value of µk, and we assume |µk0| ¡ δ, where δ ¡ 0 is the lower bound of84
the µk0, for the k’s with τk P T0pp0q. The prior distribution on µk is crucial for determining the85
convergence rate of the BML procedure. We explore three types of priors: the local prior [9], the86
non-local moment prior and the inverse moment prior in [11].87
Local: πLpµq � Np0, ω2qMoment: πM pµq � µ2v{CM1{
?2π expp�µ2{2q
Inverse moment: πIpµq � sνq{2{Γpq{2sqµ�pq�1q exp �pµ2{νq�s( ,
where CM is the normalizing constant.88
Let Mk represent the model that τk is the sole change point. We define the marginal likelihood with89
the Gaussian kernel as90
PrpYn|Mkq �Kn¹
j�1,j�k
τj�1�1¹l�τj
expt�pYl � Yτj q2u» τk�1�1¹
l�τk
expt�pYl � Yτk � µq2uπpµqdµ.
The posterior model probability of Mk given Yn is91
PrpMk|Ynq � PrpYn|MkqPrpMkq°Knj�1 PrpYn|MjqPrpMjq
� PrpYn|Mkq°Knj�1 PrpYn|Mjq
, (1)
when Mj assumes a non-informative uniform prior, j � 1, . . . ,Kn. Note that it is not necessary for92
Yn to be normally distributed to ensure the selection consistency in detecting mean changes. The93
Gaussian kernel serves as a utility function, which tends to be large when the distances between the94
true and the hypothetical segment means are small. Hence, as n Ñ 8, PrpMk|Ynq approaches 195
when τk is indeed a change point and the τj’s pj � kq are nI -flat points.96
2.2 Change point detection97
We start with the simplest case that there is only one mean shift in the data, i.e., p0 � 1 is fixed98
a priori. We select the candidate point τk associated with the largest PrpMk|Ynq among all the99
candidates, which corresponds to the largest marginal likelihood PrpYn|Mkq. It can be shown that100
PrpMk|Ynq �#
1�Kn
j�k
PrpYn|MjqPrpYn|Mkq
+�1
,
where101
PrpYn|Mjq �³±τj�1�1
l�τjexpt�pYl � Yτj � µq2uπpµqdµ±τj�1�1
l�τjexpt�pYl � Yτj q2u
PrpYn|Mkq �³±τk�1�1
l�τkexpt�pYl � Yτk � µq2uπpµqdµ±τk�1�1
l�τkexpt�pYl � Yτkq2u
.
The two marginal likelihoods indicate that the selection consistency is fully determined by the102
evidence in favor of µk � πpµkq and that of µj � 0 for j � k. The product on the right hand side103
converges to 0 when nI grows to 8 with the sample size. The candidate points retain enough samples104
in the neighborhood to guarantee the convergency.105
For the case with multiple change points (p0 ¡ 1), we select the points associated with the p0 largest106
PrpMk|Ynq, and the selection consistency is presented as follows.107
Theorem 1. Let M � tMk, τk P T0pp0qu. If the Bayes factor satisfies108
PrpYn|Mjq � Oppanj q (2)
for τj R T0pp0q, anj � opp1q, and n1{2I δ{σ Ñ8, then109 ¸MkPM
PrpMk|Ynq � 1�OptKnanI expp�nIδ2qu. (3)
Hence when nI{logpnq Ñ c, 0 c ¤ 8, nI ¤ λ, we have¸MkPM
PrpMk|Ynq pÑ 1.
3
The proof of Theorem 1 is delineated in the Appendix. Clearly, the selection consistency depends110
on the convergence rate of anI , which is determined by the choice of the prior πp�q. Lemmas 2–4111
in the Appendix show that anj � n�1{2j when πp�q � πLpµq; anj � n
�v�1{2j if πp�q � πM pµq and112
anj � expt�ns{ps�1qj u if πp�q � πIpµq. Hence, the selection consistency is achieved at the fastest113
rate using the non-local inverse moment prior.114
2.3 Number of change points115
When p0 is unknown, let T ppq be the set containing p points from the procedure described in Section116
2.2. We define the marginal likelihood given T ppq, that is PrtYn|T ppqu as117
¹τjRT ppq
τj�1�1¹l�τj
expt�pYl � Yτj q2u¹
τkPT ppq
» τk�1�1¹l�τk
expt�pYl � Yτk � µq2uπpµqdµ.
We can estimate the locations and the number of change points in two steps: (i) for any given118
p, we obtain pT ppq using the procedure described in the previous section, and (ii) we estimate p0119
by pp via maximizing PrtYn|pT ppqu with respect to p. In contrast to the procedure in [5] which120
simultaneously estimates the locations and the number of change points by maximizing the marginal121
likelihood with respect to T ppq and p, our BMS splits the estimation procedure into a scanning step122
as described in Section 2.2 and an optimization step. This scanning–optimization mechanism reduces123
the computational burden substantially, because the optimization in step (ii) is merely implemented124
in a single dimension.125
2.4 Candidate points selection126
Previous discussions rely upon a critical assumption that the candidate points are specified in advance,127
which, however, is often infeasible in practice. To facilitate the implementation of the BMS, we128
need to find a candidate set HcpnIq that is close to HpnIq. For the selection consistency of the129
change points, we require for each tj there is a τk P HcpnIq, such that Prp|tj � τk| ¤ nIq �130
1�Oprmintexpp�nIδ2q, anI us. Define131
Ri �³±i�nI�1
l�i expt�pYl � Yi � µq2uπpµqdµ±i�nI�1l�i expt�pYl � Yiq2u
,
where Yi � pnI � 1q�1°i�1j�i�nI
Yj . By the argument similar to that in Lemma 1, Ri goes to infinity132
when i is a true change point, and Ri Ñ 0 in probability, when i is an nI -flat point. Hence, the value133
of Ri can distinguish a change point from a set of nI -flat points. To further eliminate the non-change134
points that are also not nI -flat, we implement the non-maximum suppression that removes away the135
points which do not give the largest Ri’s in their nI -neighborhood. More specifically, the screening136
procedure of selecting candidate points is described as follows.137
Algorithm 1 : Screening(1) For each i in rnI , n� nI s, compute Ri.(2) If Ri � maxtRj : j P pi� nI , i� nI su, i is selected as a candidate point.(3) After scanning through the data sequence, a set of Kn candidate points, denoted as HcpnIq,
is obtained.
This screening algorithm is comparable to that in [15], because by the Laplace approximation, we138
can write139
Ri � Dn
±i�nI�1l�i expt�pYl � Yi � µ�q2uπpµ�q±i�nI�1
l�i expt�pYl � Yiq2ut1� opp1qu
� Dn exp
#2
�i�nI�1¸l�i
Yl �i�1
j�i�nI
Yj
�µ� � nIµ
�2
+πpµ�qt1� opp1qu,
4
where Dn is a constant of order Oppn�1{2I q and µ� is the maximizer of �°i�nI�1
l�i pYl � Yi �140
µq2 � logπpµq. Clearly, the magnitude of the leading term in Ri is strongly associated with141
n�1I
�°i�nI�1l�i Yl �
°i�1j�i�nI
Yj
, which is the local diagnosis function with h � nI in [15].142
Next, we show the screening procedure identifies a set HcpnIq that would lead to the change point143
consistency.144
Proposition 1. Assume n1{2I δ{σ Ñ 8. For each tj P T0pp0q, there is a τ P HcpnIq such that145
Prttj P pτ � nI , τ � nIqu � 1�Ormintexpp�nIδ2q, anI us.146
In theory, i � tj maximizes Ri in the nI -neighborhood of tj asymptotically. By selecting the local147
maximal Ri in the screening procedure, HcpnIq would cover the nI -neighborhood of T0pp0q as148
nÑ8. Also the condition n1{2I δ{σ Ñ8 indicates that the effect size cannot be too small in order149
to find the candidate points around the true change points. After selecting the candidate points, we150
perform the refinement step to identify the locations and the total number of change points.151
Algorithm 2 : RefinementScanning
(1) Compute PrpYn|Mkq by scanning over all the candidate points in HcpnIq.(2) For each p, we obtain a set of change points pT ppq corresponding to the p largest PrpYn|Mkq,
k � 1, . . . ,Kn.Optimization
(3) Select pp that maximizes PrtYn|pT ppqu.Theorem 2. Assume that nI{logpnq Ñ c, 0 c ¤ 8, n1{2I δ{σ Ñ 8, lim supnÑ8 nI{λ 1{2,and equation (2) holds. Let HcpnIq be the set containing candidate points such that |τk�1�τk| ¡ nI ,and for each tj there is a τk P HcpnIq, Prp|tj � τk| ¤ nIq � 1 � Oprmintexpp�nIδ2q, anI us,Then,
Prppp � p0q � 1�Oprmaxtexpp�nIδ2q, anI us,
and furthermore,152
Pr
#sup
ptjP pT pppqinf
tjPT0pp0q|pptj � tjq{n| ¤ nI{n
+� 1�Otexpp�nIδ2qu.
and153
Pr
#sup
tjPT0pp0q
infptjP pT pppq
|pptj � tjq{n| nI{n+� 1�OpanI q.
Theorem 2 shows that BMS controls both the over- and under-segmentation errors. The in-154
trinsic rationale is that for any T ppq different from T0pp0q, there is at least a chosen point155
τ P T ppq whose nI -neighborhood does not contain true change points. Then the likelihood ra-156
tio PrtYn|T ppqu{PrtYn|T0pp0qu goes to 0 with probability 1, because the ratio contains at least157
one of PrpYn|Mjq and PrpYn|Mjq�1 for τk P T0pp0q and τj R T0pp0q, which converges to 0 in158
probability by Lemma 1 and (2).159
For a given p, each multiplicand in PrtYn|pT ppqu can be obtained and stored through the scanning160
step. The optimization step essentially picks the maximal PrtYn|pT ppqu among a set of known161
quantities, which avoids intensive optimization procedures. Because the computational time for162
PrpYn|Mkq grows at the speed ofOpnq for k � 1, . . . ,Kn, the computational time for the refinement163
stage grows with the sample size at the speed of OpnKnq.164
5
3 Simulation165
3.1 The sequence without spikes166
To evaluate the performance of the proposed BMS method in the setting without spike points, we167
generate data from two different models. Model I takes the form of168
Yi � hTJpxiq � σεi
where the error term εi � Np0, 1q, σ � 0.5, and h � p2.01,�2.51, 1.51,�2.01,2.51,�2.11, 1.05, 2.16,�1.56, 2.56,�2.11qT with p0 � 11. We set Jpxiq � tp1 � sgnpnxi �tjqq{2, j � 1, . . . , p0uT, and the xi’s are equally spaced points on [0, 1]. The true change points are
ptj{n, j � 1, . . . , p0q � p0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81q.The random errors are generated from three distributions: the standard normal distribution Np0, 1q;169
Student’s t distribution with 5 degrees of freedom tp5q, which is standardized to have a unit variance;170
and the log-normal distribution LNp0, 1q, which is the exponential of the standard normal distribution171
and also standardized to have a unit variance. Model II considers a heteroscedastic error term across172
the segments. The data generating model is173
Yi � hTJptiq � σεi
1TJptiq¹j�1
vj
where pvj , j � 1, . . . , 11q � p1, 0.5, 3, 2{3, 0.5, 3, 2{3, 0.5, 3, 2{3, 0.5q. Other model specificationsremain the same as those in model I. We define the over- and under-segmentation errors as dppGn|Gnqand dpGn|pGnq respectively,
dppGn|Gnq � supbPGn
infaP pGn
|a� b|, dpGn|pGnq � supbP pGn
infaPGn
|a� b|.
For the BMS procedure, we consider three different priors for πp�q, corresponding to the local prior,174
non-local moment prior and non-local inverse moment priors. To save the space, we only present175
the results by using the non-local inverse moment prior. We take nI � tlogpnqu1.5h, where h ¥ 0.5176
generally works well in the simulations.177
For a comprehensive comparison with existing methods, we consider BMS under the non-local178
inverse moment prior πIp�q with q � v � 2 and s � 6, PELT [12], WBS [7], NOT with normal or179
heavy-tail distributions [1] and SML [5]. Table 1 summarizes the numerical results under model I180
and model II with normal, Student’s t, and log-normal error distributions and their heteroscedastic181
counterparts, respectively. On average, BMS performs the best in selecting the number of change182
points and balancing both over- and under-segmentation errors. It is expected that the performances183
of WBS, PELT, and SML deteriorate when the errors do not follow a normal distribution, because184
all the three procedures heavily rely on the parametric model assumptions, and hence they are not185
robust to model mis-specifications. In contrast, both BMS and NOT behave well under various error186
distributions. Also, NOT and SML perform the best in controlling the over-segmentation errors,187
while the resulting estimator pp tends to be larger than the true p0. On the other hand, the BMS allows188
for slightly larger over-segmentation errors in order to maintain pp to be more concentrated around p0.189
3.2 The sequence with spike points.190
In addition, we illustrate the features of the BMS, NOT and SML methods on the sequences contami-191
nated with spike points. Assume the noises are normal, we generate 500 sequences each contains192
n � 1000 points with mean changes at 0.01 and �0.01 on the 400 and 440’s observations, respec-193
tively. We set the noise standard deviation to be 0.002. Further, we generate 10 random samples194
uniformly in the range of p�0.07,�0.08q and p0.07, 0.08q, and add them to the original sequence at195
random locations to form up the spike points. Note that we choose these parameters to mimic the196
real data setting. We implement BMS, NOT, and SML on the simulated samples. In BMS, we select197
nI � 12, which is the largest integer that smaller than 0.65tlogpnqu1.5.198
From Table 2, we can seen that the BMS is insensitive to the spike points with the smallest |pp� p0|199
on average. In the experiments (not show) NOT ignores both the change points with small signal200
6
Table 1: Comparison results averaged over 200 simulations among the BMS, PELT, WBS, NOT andSML methods under model I and model II under different error distributions: the standard normaldistribution Np0, 1q, Student’s tp5q, and log-normal LNp0, 1q with constant variances; and thecorresponding distributions with heteroscedastic variances. Standard deviations are in parentheses.
Error Method pp� p0 dpGn| pGnq dp pGn|GnqDistribution ¤ �3 �2 �1 0 1 2 ¥ 3Np0, 1q BMS 0 0 1 197 2 0 0 2.41 (6.06) 1.96 (3.94)
PELT 0 1 37 162 0 0 0 0.91 (1.19) 6.32 (11.92)WBS 0 0 0 194 6 0 0 1.22 (4.13) 0.86 (0.79)NOT 0 0 0 192 7 1 0 1.93 (8.04) 0.75 (0.80)SML 0 0 0 132 52 13 3 12.94 (42.98) 0.78 (0.90)
tp3q BMS 0 0 8 190 2 0 0 2.15 (5.76) 2.83 (7.01)PELT 0 4 31 165 0 0 0 0.95 (1.03) 6.24 (12.24)NOT 0 0 3 184 3 6 4 7.57 (27.70) 1.51 (2.57)SML 0 0 0 42 34 44 80 40.13 (53.68) 0.88 (0.87)
LNp0, 1q BMS 0 0 12 180 6 1 2 3.69 (12.12) 3.11 (6.89)PELT 1 2 21 135 15 23 3 12.10 (29.63) 7.22 (13.32)NOT 0 1 4 183 7 1 4 6.06 (26.32) 1.18 (4.45)SML 0 0 0 0 0 4 196 111.77 (52.05) 0.73 (1.33)
Heterosced BMS 0 0 13 176 8 3 0 3.69 (7.08) 3.88 (7.36)Np0, 1q PELT 0 0 31 169 0 0 0 1.49 (1.53) 6.15 (11.44)
NOT 0 0 0 150 23 21 6 7.52 (12.60) 1.66 (1.68)SML 0 0 0 119 55 20 6 6.75 (23.98) 1.37 (1.52)
Heterosced BMS 0 0 14 181 4 1 0 2.79 (4.87) 4.21 (8.17)tp5q PELT 0 4 35 159 2 0 0 1.50 (1.83) 7.76 (13.48)
NOT 0 1 5 179 10 2 3 8.15 (25.01) 2.36 (4.70)SML 0 0 0 43 22 41 94 26.74 (35.42) 1.27 (1.59)
Heterosced BMS 0 0 20 173 7 0 0 3.73 (11.72) 4.20 (7.85)LNp0, 1q PELT 0 2 21 142 22 12 1 6.07 (16.09) 8.10 (13.93)
NOT 0 0 4 183 7 4 2 6.32 (24.67) 1.42 (3.70)SML 0 0 0 2 1 0 197 68.05 (47.15) 0.87 (1.67)
noise ratio and the spike signals with small segment length. On the other hand, SML is sensitive to201
the extreme value. To this end, BMS is the most suitable procedure for the MRgRT data, because on202
the one hand it reinforces the minimal segment length to avoid the identification of the spike signal;203
on the other hand it retains small minimal segment length to detect change points with short distance.204
Table 2: Comparison results averaged over 500 simulations among the BMS, NOT and SML methodson the data sequences with spike points.
Method pp� p0¤ �1 0 1 2 ¥ 3
BMS 31 276 113 67 13NOT 387 32 20 11 50SML 0 0 0 0 500
4 MRgRT data205
We illustrate the BMS method with the application to the MRgRT data which contains 2265 data206
points ordered by the distances from the sources of the radiations dose. The R code for implementing207
the method can be downloaded from our GitHub repository [10]. Throughout the implementation, we208
use the non-local inverse moment prior πIpµq � sνq{2{Γtq{p2squµ�pq�1q exp �pµ2{νq�s( with209
q � 2, ν � 2. We set nI � 13, which is the largest integer that smaller than 0.65tlogpnqu1.5.210
We first vary s from 2 to 10. Figure 1 shows that when s is small, we identify more change211
points than the true ones, and as s grows, the number of identified change points decreases. This212
phenomenon is consistent with the result in Lemma 4 that the convergence rate for the non-local prior213
is Optexpp�ns{p1�sqI qu. When s is small, the Bayes factor vanishes slowly and hence the algorithm214
picks redundant change points. When s is sufficiently large, the convergence rate approaches to215
7
Optexpp�nIqu, and hence the algorithm eliminates the flat points more effectively. In the following216
analysis, we select s � 10, with which the algorithm provides the best result in Figure 1.217
-0.010
-0.005
0.000
0.005
0.010
Distance from the source
Y
-0.010
-0.005
0.000
0.005
0.010
Distance from the source
Y
-0.010
-0.005
0.000
0.005
0.010
Distance from the source
Y
0 12 24 35 47 59 71 83 95 123
s= 2
s= 5
s = 10
Figure 1: Change points detection when using different s � 2, 5, 10
Interestingly, after we remove the spike points and kept the data within the range of p�0.01, 0.01q,218
we implement the NOT, SML, and BMS on the truncated sequence. Figures 2 shows that the results219
from the three methods are overlapped. The NOT and BMS methods have similar results, and both220
outperform SML. This implies that removing the spike points improves the boundary detection221
accuracy for all the three methods. However, the spike remover is infeasible in practice, because the222
locations and the magnitudes of the spike are difficult to track in human bodies.
-0.010
-0.005
0.000
0.005
0.010
Distance from the source
Sig
nal l
evel
0 12 24 36 49 63 75 87 101
Figure 2: Change points detection after restricting the data in the range of (-0.01, 0.01). BMS: thered line solid; NOT: the blue dash line; SML: the brown two dash linear.
223
5 Conclusion224
We propose the BMS method that consistently identifies multiple mean changes in a data sequence.225
The BMS method removes the flat points effectively without sacrificing the detection accuracy.226
Further, our method is particularly useful when the data sequence contains spike points, which are227
not of interest. We apply the BMS to analyze the MRgRT data, for which the NOT, SMT and other228
methods fail to but the BMS algorithm correctly detects the mean changes boundaries. We explore229
the BMS performance with different tuning parameters, and the resulting patterns are consistent with230
the theoretical properties. Moreover, we demonstrate that the BMS is robust to the error distributions231
by evaluating the detection procedures on the sequence with different random errors.232
8
References233
[1] Rafal Baranowski, Yining Chen, and Piotr Fryzlewicz. Narrowest-over-threshold detection of234
multiple change-points and change-point-like features. arXiv preprint arXiv:1609.00293, 2016.235
[2] Francesco Bertolino, Walter Racugno, and Elías Moreno. Bayesian model selection approach to236
analysis of variance under heteroscedasticity. The Statistician, pages 503–517, 2000.237
[3] Caterina Conigliani and Anthony O’hagan. Sensitivity of the fractional bayes factor to prior238
distributions. Canadian Journal of Statistics, 28(2):343–352, 2000.239
[4] Fulvio De Santis and Fulvio Spezzaferri. Consistent fractional bayes factor for nested normal240
linear models. Journal of statistical planning and inference, 97(2):305–321, 2001.241
[5] Chao Du, Chu-Lan Michael Kao, and SC Kou. Stepwise signal extraction via marginal242
likelihood. Journal of the American Statistical Association, 111(513):314–330, 2016.243
[6] Peter Eiauer and Peter Hackl. The use of MOSUMS for quality control. Technometrics,244
20(4):431–436, 1978.245
[7] Piotr Fryzlewicz. Wild binary segmentation for multiple change-point detection. Annals of246
Statistics, 42(6):2243–2281, 2014.247
[8] Joseph Glaz, Joseph I Naus, Sylvan Wallenstein, Sylvan Wallenstein, and Joseph I Naus. Scan248
Statistics. Springer, 2001.249
[9] Harold Jeffreys. The theory of probability. Oxford University Press, 1998.250
[10] Fei Jiang, Guosheng Yin, and Francesca Dominici. Bayesian multiple change points detection251
with non-local priors. R code, 2017.252
[11] Valen E Johnson and David Rossell. On the use of non-local prior densities in Bayesian253
hypothesis tests. Journal of the Royal Statistical Society: Series B, 72(2):143–170, 2010.254
[12] Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of change points with a255
linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598,256
2012.257
[13] Claudia Kirch and Birte Muhsal. A MOSUM procedure for the estimation of multiple random258
change points. Preprint, 2014.259
[14] Marc Lavielle and Carenne Ludeña. The multiple change-points problem for the spectral260
distribution. Bernoulli, 6(5):845–869, 2000.261
[15] Yue S Niu and Heping Zhang. The screening and ranking algorithm to detect dna copy number262
variations. Annals of Applied Statistics, 6(3):1306, 2012.263
[16] Philip Preuss, Ruprecht Puchstein, and Holger Dette. Detection of multiple structural breaks in264
multivariate time series. Journal of the American Statistical Association, 110(510):654–668,265
2015.266
[17] Stephen G Walker. Modern bayesian asymptotics. Statistical Science, pages 111–117, 2004.267
[18] Yi-Ching Yao. Approximating the distribution of the maximum likelihood estimate of the268
change-point in a sequence of independent random variables. Annals of Statistics, 15(3), 1987.269
[19] Chun Yip Yau and Zifeng Zhao. Inference for multiple change points in time series via likelihood270
ratio scan statistics. Journal of the Royal Statistical Society: Series B, 78(4):895–916, 2015.271
9
Appendix272
273
A Additional Figures274
Figure A.1: Reconstructed image of a slice in a cylindrical dosimeter with a cavity in the middle (left) and atypical line profile through the center of the cavity (right). The radiation enter the dosimeter from the hole on theleft of the cylindrical dosimeter. The cylindrical rotates 360 degree so that the radiation can enter from differentdirection.
-0.010
-0.005
0.000
0.005
0.010 NOT
Distance from the source
Y
-0.010
-0.005
0.000
0.005
0.010 SML
Distance from the source
Y
Figure A.2: The change point detection results from the SML with normal prior and the maximal number ofchange points to be 30 and NOT methods for the dose level changes in the DMOS system.
B Proofs275
Let g1pµq and g2pµq denote the first and second derivatives of a generic function gpµq with respect to276
µ respectively, and further define the utility function as277
Ukpµq � �τk�1�1¸l�τk
pYl � Yτk � µq2.
The following conditions are imposed for the theoretical derivations.278
(A1) Assume µ to be in a closed set of points in R.279
(A2) Assume πpµq to be a continuous density function with bounded first and second derivatives.280
(A3) Assume that PrtYn|T ppqu has a unique maximizer in the neighborhood of T0pp0q.281
Lemma 1. Assume that τk is a change point for which the mean of Yl � Yτk satisfies |µk0| ¡ δ for282
δ ¡ 0, n1{2k δ{σ Ñ8, then there is a constant D ¡ 0 such that283
limnÑ8
Pr
�³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ±τk�1�1l�τk
expt�pYl � Yτkq2u¡ exppDnkδ2q
�� 1.
10
Proof: By the definition of Ukpµq, we can write284 ³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ±τk�1�1l�τk
expt�pYl � Yτkq2u�
³exptUkpµquπpµqdµ
exptUkp0qu .
We first define Nδpµk0q � tµ : |µ� µk0| δu and N cδ pµk0q as its compliment, and then show285
limnÑ8
Pr
�sup
µPN cδ pµk0q
tUkpµq � Ukpµk0qu �Dnkδ2�� 1. (4)
Note that286
Ukpµq � Ukpµk0q
� nk
#pµ2k0 � µ2q � n�1
k pµk0 � µqτk�1�1¸l�τk
2pYl � Yτkq+
� nktpµ2k0 � µ2q � 2pµk0 � µqµk0 �Oppn1{2k |µk0 � µ|σkqu
� nkt�pµ� µk0q2u �Oppn1{2k |µk0 � µ|σkq¤ �nkδ2 �Oppn1{2k |µk0 � µ|σkq� �nkδ2{2� nkδ
2{2�Oppn1{2k |µk0 � µ|σkq.As n1{2k δ{σ Ñ8, we have �nkδ2{2�Oppn1{2k |µk0 � µ|σkq 0 with probability 1, and thus287
limnÑ8
Pr
�sup
µPN cδ pµk0q
tUkpµq � Ukpµk0qu �Dnkδ2�� 1.
When τk is a change point, let µ � 0, because |µk0| ¡ δ, we have288
limnÑ8
Pr
�exptUkp0qu
exptUkpµk0qu expp�Dnkδ2q�� 1. (5)
By the Laplace approximation,289 »exptUkpµquπpµqdµ � Opr�U2
k prµq�1{2 exptUkprµquπprµqs, (6)
where rµ is the maximizer of Ukpµq � logtπpµqu, and U2k prµq � Oppnjq. Let pµ be the maximizer of290
Ukpµq, and then291
0 � L1kprµq � Blogπprµq{Bµ� L2kpµ�qprµ� pµq � Blogπprµq{Bµ,
where µ� is a point on the line segment between rµ and pµ. As πpµq has two bounded derivatives by292
condition (A2), L2kpµ�q � Oppnjq, we have rµ� pµ � Oppn�1j q. Therefore, (6) can be written as293 »
exptUkpµquπpµqdµ � Opr�U2k ppµq�1{2 exptUkppµquπppµqs
� Opr�U2k pµk0q�1{2 exptUkpµk0quπpµk0qs, (7)
where the last equality holds because pµ is the least squared estimator. This implies294
exptUkpµk0qu³exptUkpµquπpµqdµ � Oppn1{2k q, (8)
which in conjunction with (5) leads to295
limnÑ8
Pr
�exptUkp0qu³
exptUkpµquπpµqdµ expp�Dnkδ2q�� 1.
By condition (A2) and the boundedness of Ukpµq, we have296
limnÑ8
Pr
�³exptUkpµquπpµqdµ
exptUkp0qu ¡ exppDnkδq�� 1,
which completes the proof.297
11
Lemma 2. Let πpµq � πLpµq be a local prior, and assume that τj is not a change point, i.e., µj0 � 0,298
then299 ³±τj�1�1l�τj
expt�pYl � Yτj � µq2uπpµqdµ±τj�1�1l�τj
expt�pYl � Yτj q2u� Oppn�1{2
j q.
Proof: By the definition of Ujpµq, we can write300 ³exptUjpµquπpµqdµ
exptUjp0qu �³±τj�1�1
l�τjexpt�pYl � Yτj � µq2uπpµqdµ±τj�1�1
l�τjexpt�pYl � Yτj q2u
.
Using the same argument as that leading to (7) with µj0 � 0, we have301 »exptUjpµquπpµqdµ � Opr�U2p0q�1{2 exptUjp0quπp0qs.
Since U2p0q�1{2 � Oppn�1{2j q, and πp0q is a bounded density, we have302 ³
exptUjpµquπpµqdµexptUjp0qu � Oppn�1{2
j q.
Lemma 3. Let303
πpµq � πM pµq � µ2v
CMπbpµq,
where CM is a normalizing constant, πbpµq with πbp0q ¡ 0 is the base prior density with 2v finite304
moments, and bounded first two derivatives in the neighborhood around 0. Assume that τj is not a305
change point, i.e., µj0 � 0, then306 ³±τj�1�1l�τj
expt�pYl � Yτj � µq2uπpµqdµ±τj�1�1l�τj
expt�pYl � Yτj q2u� Oppn�v�1{2
j q.
Proof: We can write307 » τj�1�1¹l�τj
expt�pYl � Yτj � µq2uπpµqdµ �»
exptUjpµq � logπpµqudµ.
Let hpµq � Ujpµq � logπpµq � 2vlogpµq � logtπbpµqu � Ujpµq, and let rµ be the maximizer of308
hpµq, then we have309
2v{rµ� π1bprµq{πbprµq � L1jprµq � 0.
If we expand L1jprµq around pµ, the least squared estimator for µj0, the above equality can be rewritten310
as311
2v{n� n�1rµπ1bprµq{πbprµq � L2j pµ�qrµprµ� pµq � 0,
where µ� is the point on the line segment between pµ and rµ. Therefore,312
Oppn�1q � rµprµ� pµq� prµ� pµq2 � pµprµ� pµq� prµ� pµ� pµ{2q2 � pµ2{4.
Along with the fact that pµ � Oppn�1{2j q, we have rµ� pµ � Oppn�1{2
j q, and rµ � Oppn�1{2q. Next,313
by the Laplace expansion, we have314 »expthpµqudu � Oppt2v{rµ2 � U2
j prµqu�1{2 expr2vlogprµq � logtπbprµqu � Ujprµqsq,12
and also315
n�1Ujprµq � n�1Ujp0q � n�1U 1jpµ��qrµ
� n�1tU 1jppµq � U 1
jpµ��q � U 1jppµqurµ
� Oppµ�� � pµqrµ� Oppn�1q, (9)
where µ�� is on the line segment between rµ and 0. Thus,316
|Ujprµq � Ujp0q| � Opp1q.As a result,317 ³±τj�1�1
l�τjexpt�pYl � Yτj � µq2uπpµqdµ±τj�1�1
l�τjexpt�pYl � Yτj q2u
�³
exptUjpµquπpµqdµexptUjp0qu
�³
expthpuquduexptUjp0qu
� Oppt2v{rµ2 � U2j prµqu�1{2 expr2vlogprµq � logtπM prµqu � Ujprµq � Ujp0qsq
� Oppn�1{2j rµ2vq
� Oppn�1{2�vj q,
where the last equality holds by the fact that rµ � Oppn�1{2q. This proves the result.318
Lemma 4. Let319
πpµq � πIpµq � sνq{2
Γtq{p2squµ�pq�1q exp
#��µ2
ν
�s+.
Assume that τj is not a change point, i.e., µj0 � 0, then320 ³±τj�1�1l�τj
expt�pYl � Yτj � µq2uπpµqdµ±τj�1�1l�τj
expt�pYl � Yτj q2u� Optexpp�ns{ps�1q
j qu.
Proof: We first write321 » τj�1�1¹l�τj
expt�pYl � Yτj � µq2uπpµqdµ � c
»exp
Ujpµq � µ�2sνs � pq � 1qlogpµq( dµ,
for a constant c. Let hpµq � Ujpµq �µ�2s� pq� 1qlogpµq, and assume rµ is the maximizer of hpµq,322
then we have323
U 1jprµq � 2srµ�2s�1νs � pq � 1qrµ�1 � U 1
jpµ�qprµ� pµq � 2srµ�2s�1νs � pq � 1qrµ�1 � 0,
where µ� is on the line segment between rµ and pµ. The above equality yields324
njrµ2s�2p1� pµ{rµq � 2sνs � pq � 1qrµ2s
�U2j pµ�q{nj
, (10)
which implies rµ � Oppn1{p2s�2qj q.325
From (10), we have nrµ2s�1prµ� pµq � Opp1q, which leads to326
rµ� pµ � Optn�p4s�3q{p2s�2qj u. (11)
Following (30) in [11] and using our notation, we obtain327 »expthpµqudu � Op
�" p4s2 � 2sq2s�2rµ � U2j prµq*�1{2
|rµ|�q�1 expt�rµ�2sνs � Ujprµqu�.
13
Expanding Ujprµq around the least squared estimator pµ, we have328
Ujprµq � Ujppµq � 1{2U2j prµ� pµq2
� Ujppµq � opp1q� Ujp0q �Opp1q,
where the second equality follows (11), and the last equality follows the same argument as that329
leading to (9). Therefore, we have330 ³±τj�1�1l�τj
expt�pYl � Yτj � µq2uπpµqdµ±τj�1�1l�τj
expt�pYl � Yτj q2u
�³
exptUjpµquπpµqdµexptUjp0qu
�³
expthpuquduexptUjp0qu
� Op
�" p4s2 � 2sq2s�2rµ � U2j prµq*�1{2
|rµ|�q�1 expt�rµ�2sνs � Ujprµq � Ujp0qu�
� Optexpp�ns{ps�1qj qu.
This proves the result.331
Lemma 5. Assume p0 � 1 and τk is the only true change point. As n1{2k δ{σ Ñ8, PrpMk|Ynq �332
1 � OptKnanI expp�nIδ2qu. Hence when nI{logpnq Ñ c, 0 c ¤ 8, nI ¤ λ, we have333
PrpMk|Ynq pÑ 1.334
Proof: First we can write335
PrpMk|Ynq �#
1�Kn
j�k
PrpYn|MjqPrpYn|Mkq
+�1
. (12)
To show PrpMk|Ynq � 1 Ñ 0, it is equivalent to showingKn
j�1,j�k
PrpYn|MjqPrpYn|Mpq Ñ 0.
Note that336
PrpYn|MjqPrpYn|Mkq �
³±τj�1�1l�τj
expt�pYl � Yτj � µq2uπpµqdµ±τj�1�1l�τj
expt�pYl � Yτj q2u
�±τk�1�1l�τk
expt�pYl � Yτkq2u³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ� AB,
where337
A �³±τj�1�1
l�τjexpt�pYl � Yτj � µq2uπpµqdµ±τj�1�1
l�τjexpt�pYl � Yτj q2u
and338
B �±τk�1�1l�τk
expt�pYl � Yτkq2u³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ.
As shown in [11], A is a Bayes factor whose convergence rate is Oppanj q. For B, first note that the339
data in rτk, τk�1q are generated from the model with mean µk0 such that |µk0| ¡ δ. Hence, we have340
B � Optexpp�nkδ2qu,
14
where the last equality holds by Lemma 1 that341
limnÑ8
�Pr
� ±τk�1�1l�τk
expt�pYl � Yτkq2u³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ expp�Dnkδ2q
��� 1,
where D is a constant. Combining the convergence rates for A and B, we have342
AB � Optank expp�nkδ2qu. (13)
Thus,343
PrpYn|MjqPrpYn|Mkq � AB � OptanI expp�nIδ2qu, (14)
and344
Kn
j�1
PrpYn|MjqPrpYn|Mkq � 1 � OptKnanI expp�nIδ2qu
Plugging this result into (12), we have345
PrpMk|Ynq pÑ 1,
which completes the proof.346
Proof of Theorem 1347
First we can write348 ¸MkPM
PrpMk|Ynq �#
1�°MjRM PrpYn|Mjq°MkPM PrpYn|Mkq
+�1
. (15)
Note that349 °MjRM PrpYn|Mjq°MkPM PrpYn|Mkq ¤
°MjRM PrpYn|Mjq
PrpYn|Mkq ,
for Mk PM. Hence by the same argument as that leading to Lemma 5, we have350 °MjRM PrpYn|Mjq°MkPM PrpYn|Mkq � OptKnanI expp�nIδ2qu,
and351 ¸MkPM
PrpMk|Ynq � 1 � OptKnanI expp�nIδ2qu.
This proves the result.352
Proof of Proposition 1353
Following [15], we define x as an nI -flat point so that there is no change-point in px� nI , x� nIq.354
Let F be the set of all nI -flat points, then355
Pr
$&%�� £tPT0pp0q
Rt ¡ C
� £�£τPF
Rτ C
�,.-� 1� Pr
$&%�� ¤τPT0pp0q
Rτ ¡ C
� ¤�¤τPF
Rτ C
�,.-� 1� Pr
"�max
tPT0pp0qRt ¡ C
¤�minτPF
Rτ C
*¥ 1�
"Pr
�max
tPT0pp0qRt ¡ C
� Pr
�minτPF
Rτ C
*.
15
For each τ P T0pp0q,356
PrpRτ Cq � Otexpp�nIδ2qu,which holds by Lemma 1. Furthermore, for τ P F ,357
PrpRτ ¡ Cq � OpanI q,which holds by Lemmas 2 to 4. Hence,358
Pr
$&%�� £tPT0pp0q
Rt ¡ C
� £�£τPF
Rτ C
�,.-¥ 1�Ormintexpp�nIδ2q, anI us
By Lemma 3 in [15], for any t P T0pp0q we have a τ P HcpnIq such that Prtt P pτ �nI , τ �nIqu �359
1�Ormintexpp�nIδ2q, anI us.360
Proof of Theorem 2361
We first show that for a given p, pT ppq is the maximizer of PrtYn|T ppqu. Based on the BMS362
procedure, pT ppq is the maximizer of°MkPM PrpMk|Ynq, where M � tMk, τk P T ppqu. Since363
we impose the uniform prior on Mk, pT ppq is the maximizer of364 ¸MkPM
PrpYn|Mkq (16)
� Dn
¸MkPM
K¹j�1,j�k
τj�1�1¹l�τj
expt�pYl � Yτj q2u» τk�1�1¹
l�τk
expt�pYl � Yτk � µq2uπpµqdµ
� Dn
K¹j�1
τj�1�1¹l�τj
expt�pYl � Yτj q2u¸
MkPM
³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ±τk�1�1l�τk
expt�pYl � Yτkq2u
where Dn is a constant depending on n. Further note that365
PrtYn|T ppqu (17)
�¹
τjRT ppq
τj�1�1¹l�τj
expt�pYl � Yτj q2u¹
τkPT ppq
» τk�1�1¹l�τk
expt�pYl � Yτk � µq2uπpµqdµ
�K¹j�1
τj�1�1¹l�τj
expt�pYl � Yτj q2u¹
τkPT ppq
³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ±τk�1�1l�τk
expt�pYl � Yτkq2u.
Comparing (17) and (16), clearly they have the same optimizer, and thus pT ppq is the maximizer of366
(17). Hence, our BMS procedure results in the estimators pp and pT pppq that maximize PrtYn|T ppqu.367
Next, let E1 be the event that at least one j such that tj P pτk, τk�1q, and pti � τk, pti � τk�1 for all i368
and τk, τk�1 P HcpnIq, pti P pT pppq that maximizes PrtYn|T ppqu. Following the similar arguments369
as those in [5], we show that the probability of E1 goes to 0. Suppose that pT pppq is such an estimate.370
Consider the first case where ptj � τk � 1qpτk�1 � τk � 1q�1 � Op1q; that is, tj is bounded away371
from τk. Then, we can choose a set of change points that372
rT ppp� 1q � trτ1, . . . , rτpp�1u� tpt1, . . . ,pti, τk�1,pti�1,ptppu.
Then373
PrtYn|pT pppquPrtY|rT ppp� 1qu
�±τk�2�1l�τk�1
expt�pYl � Yτk�1q2u³±τk�2�1
l�τk�1expt�pYl � Yτk�1
� µq2uπpµqdµ.
16
Because limnÑ8 supnI{λ 1{2, there is an N , such that for all n ¡ N , tj�1 ¡ τk�2, and hence374
there is no change point within pτk�1, τk�2q. This prevents the situation where there are more than375
one change points in between τk and τk�2. Further for n ¡ N ,376
EpYl � Yτk�1q � pτk�1 � τk � 1q�1E
#tj¸
s�τk
pYl � Ysq �τk�1
s�tj�1
pYl � Ysq+
¥ ptj � τk � 1qpτk�1 � τk � 1q�1δ.
Therefore, by Lemma 1,377
PrtYn|pT pppquPrtY|rT ppp� 1qu
� Optexpp�nIδ2qu.
If ptj � τk � 1qpτk�1 � τk � 1q�1 � op1q, we define378 rT ppp� 1q � trτ1, . . . , rτpp�1u� tpt1, . . . ,pti, τk,pti�1,ptppu,
and then379
PrtYn|pT pppquPrtY|rT ppp� 1qu
�±τk�1�1l�τk
expt�pYl � Yτkq2u³±τk�1�1l�τk
expt�pYl � Yτk � µq2uπpµqdµ
�±τk�1�1l�tj
expt�pYl � Yτkq2u³±τk�1�1l�tj
expt�pYl � Yτk � µq2uπpµqdµ
�³±τk�1�1
l�tjexpt�pYl � Yτk � µq2uπpµqdµ³±τk�1�1
l�τkexpt�pYl � Yτk � µq2uπpµqdµ
�±τk�1�1l�τk
expt�pYl � Yτkq2u±τk�1�1l�tj
expt�pYl � Yτkq2u.
The first term in the last equation is of order Optexpp�nIδ2qu by Lemma 1, and the last two terms380
are of order Opp1q because ptj � τk � 1qpτk�1 � τk � 1q�1 � op1q. Therefore,381
Pr
�PrtYn|pT pppqu
PrtY|rT ppp� 1qu¡ 1|E1
�¤ E
�PrtYn|pT pppqu
PrtY|rT ppp� 1qu
�� Otexpp�nIδ2qu.
As pT pppq is the maximizer of PrtYn|T ppqu, Pr�
PrtYn| pT pppquPrtY| rT ppp�1qu
¡ 1�� 1 because pT pppq is the382
maximizer of PrtYn|pT pppqu and it is unique by condition (A3). By the Bayes rule, we have383
Pr
�E1| PrtYn|pT pppqu
PrtY|rT ppp� 1qu¡ 1
�¤ Otexpp�nIδ2qu. (18)
Hence, Prppti � τk or pti � τk�1q � 1 � Otexpp�nIδ2qu. Further note that τk and τk�1 are in the384
nI -neighborhood of tj , and thus for any tj , there is a pti such that385
Prttj P ppti � nI ,pti � nIqu � 1�Otexpp�nIδ2qu.Since it holds for any j, we can write386
Pr
#sup
ptjP pT pppqinf
tjPT0pp0q|pptj � tjq{n| ¤ nI{n
+� 1�Otexpp�nIδ2qu.
Next we show that for any pti there is a tj in the nI -neighborhood of pti. Define E2 as the event that387
there is at least one pti such that no tj in the nI -neighborhood of pti. Let pT pppq be such an estimate that388
17
pti is the kth candidate points and ppti, τk�1q, pτk�1,ptiq do not contain tj for all j. Then, we define a389
new set of change points by deleting pti,390 rT ppp� 1q � tpt1, . . . ,pti�1,pti�1,ptppu.Then,391
PrtYn|pT pppquPrtY|rT ppp� 1qu
�±τk�1�1
l�ptiexpt�pYl � Y
pti� µq2uπpµqdµ±τk�1�1
l�ptiexpt�pYl � Y
ptiq2u
� OppanI q
by Lemmas 2–4. Therefore, using the same argument as that leading to (18), we have392
PrpE2| PrtYn| pT pppquPrtY| rT ppp�1qu
¡ 1q � OpanI q. For any pti, there exists a tj such that393
Prtpti P ptj � nI , tj � nIqu � 1�OpanI q.It holds for any pti, and thus we have394
Pr
�sup
tjPT0pp0q
infptjP pT pppq
|pptj � tjq{n| nI{n�� 1�OpanI q
Because ptj � nI , tj � nIq contains only one estimate by the definition of pT pppq that |ptj�1 � ptj | ¡ λ,395
we have Prppp � p0q � 1 � Oppmaxtexpp�nIδ2q, anI uq by using the same arguments as those396
leading to Theorem 3.3 in [5].397
18