
Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

Anonymous Author(s)
Affiliation
Address
email

Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017). Do not distribute.

Abstract

We propose a Bayesian model selection (BMS) boundary detection procedure using non-local prior distributions for a sequence of data with multiple systematic mean changes. By using the non-local priors in the Bayesian model selection framework, the BMS method can effectively suppress the non-boundary spike points with large instantaneous changes. Further, we speed up the algorithm by reducing the multiple change points to a series of single change point detection problems. We establish the consistency of the estimated number and locations of the change points under various prior distributions. Extensive simulation studies are conducted to compare the BMS with existing methods, and our method is illustrated with application to the magnetic resonance imaging guided radiation therapy data.

1 Introduction

Traditional change point detection algorithms often handle cases where the frequency of change points is relatively consistent across the signal. For example, the popular narrowest-over-threshold (NOT) [1] algorithm is suitable in settings where the segments between change points have comparable lengths, while the stepwise marginal likelihood (SML) method [5] works best for identifying frequent change points. However, it is often the case that the gaps between consecutive change points differ dramatically, while only the change points at certain distances are of interest. In this paper, we develop a computationally efficient Bayesian model selection algorithm for identifying change points under this specific setting.

The inconsistent gaps between change points can be observed in the signals generated by magnetic resonance imaging guided radiation therapy (MRgRT). One problem with MRgRT is that when radiation travels in the magnetic field, the dose level can be significantly enhanced near the boundaries between different tissues or organs in the human body. The Duke Mid-sized Optical-CT System (DMOS) (see Figure A.1 in the Appendix) was developed to identify the dose changes near the region of this boundary artifact. The right panel in Figure A.1 contains one radiation dose profile, where we can observe that the boundaries on and inside the dosimeter can be distinguished by noticeable peaks in the radiation dose levels. In the experiment, multiple radiation beams enter the cylindrical dosimeter from different directions, and a sequence of data ordered by their distances to the sources was collected. Because the dosimeter is circular and the cavity is in the middle, the radiation from different directions hits the boundary at similar distances away from its source in the dosimeter.

In the MRgRT data, radiation in certain directions may experience temporary changes at non-boundary locations, which may result from an abnormal status of the DMOS system rather than true dose changes. These temporary change points appear in the data sequence as spike points and are often mixed up with the ones on the boundary (the peak locations in Figure A.1), which makes the boundary detection extremely challenging. Figure A.2 in the Appendix shows the change points in the MRgRT data identified by the popular NOT [1] and SML [5] algorithms, respectively. It can be seen that neither of the algorithms correctly identifies the true boundaries. This motivates us to propose a new algorithm for detecting systematic changes when the segment lengths can differ dramatically.

As shown in the preliminary analysis of the MRgRT data, local control of the discovery is crucial. To avoid picking the spike points, we enforce minimal distances between the change points. Moreover, we adopt a computationally efficient local scan routine and propose a systematic two-stage procedure to speed up the change point detection. More specifically, the local scan method first identifies the candidate points with a minimal distance based on the local data, and then optimizes a utility function to obtain the estimates of the locations and the total number of change points. Because the change points are defined based on the mean changes in two consecutive segments, the local data are sufficient to detect the systematic changes [6, 7, 8, 13, 14, 16, 18, 19].

To preserve the positive detection rate of the change points and reduce the false detection rate of the non-change points, we utilize a Bayesian marginal likelihood function as the utility and propose a new Bayesian model selection (BMS) procedure for identifying change points. In particular, we show that selection consistency is achieved under both the local [2, 3, 4, 17] and the non-local priors [11], whereas the convergence rate is faster under the non-local priors.

Our BMS procedure is cast in the model selection framework, which is faster than the dynamic programming introduced under the SML framework. For example, for a single sequence in the MRgRT data, BMS takes roughly 1.3 seconds and SML takes 3.1 seconds when the maximum number of change points is capped at 100. The major factor that contributes to the efficiency of the BMS algorithm is that BMS reduces the search space dramatically by selecting a small set of candidate change points. Further, once the candidate points are selected, BMS only needs to evaluate two consecutive segments at a time, which allows the algorithm to be implemented in parallel.

2 Bayesian multiple change points detection

2.1 Probability model

Suppose there are $p_0$ true change points $t_1 \le \cdots \le t_{p_0}$ among $n$ observations $\mathbf{Y}_n = \{Y_1, \ldots, Y_n\}$. As a convention, let $t_0 = 1$ and $t_{p_0+1} = n+1$. Denote $\lambda_j = t_{j+1} - t_j$ and $\lambda = \min_{j=0,\ldots,p_0}\lambda_j$. We consider a set of $K_n$ candidate points $\tau_1, \ldots, \tau_{K_n}$, with $\tau_0 = 1$ and $\tau_{K_n+1} = n+1$, while postponing the discussion on the candidate set selection to Section 2.4. Define $n_j = \tau_{j+1} - \tau_j$ and $n_I = \min_j n_j$, with $n_I \le \lambda$. Let $H(n_I) = \{\tau_j : j = 1, \ldots, K_n, |\tau_{j+1} - \tau_j| > n_I\}$ denote the set of all candidate points and $T_0(p_0) = \{t_j : j = 1, \ldots, p_0\}$ be the set of true change points. The specification of the candidate points allows the BMS method to be implemented in a lower dimensional space with the most influential points. It also guarantees that there are a sufficient number of non-change points surrounding the true ones so that the consistency conditions are met.

The probability model takes the form of
$$Y_l = \nu_{\tau_j} + \epsilon_l, \quad l \in [\tau_j, \tau_{j+1}),$$
where the random errors $\epsilon_l$ are independent with mean zero and variance $\sigma_j^2$. Further, we define $\sigma = \max_{j=0,\ldots,p_0}\sigma_j$.

For ease of exposition, we first consider the situation where the locations of the candidate change points are given and $T_0(p_0) \subset H(n_I)$. Define $\bar Y_{\tau_j} = n_{j-1}^{-1}\sum_{l=\tau_{j-1}}^{\tau_j - 1} Y_l$, which is the sample average for the $(j-1)$th segment $[\tau_{j-1}, \tau_j)$, $j > 1$. Suppose the candidate point $\tau_k$ is not a change point; then the points in $[\tau_k, \tau_{k+1})$ have the same mean as those in $[\tau_{k-1}, \tau_k)$. If $\tau_k$ is a change point, we expect a mean shift between the segments $[\tau_k, \tau_{k+1})$ and $[\tau_{k-1}, \tau_k)$. Hence, we can formulate the model and prior distribution for $l \ge \tau_1$ as follows:
$$Y_l = \bar Y_{\tau_k} + \mu_k + \xi_l, \quad l \in [\tau_k, \tau_{k+1}),$$
$$\mu_k \sim \pi(\mu_k) \text{ if } \tau_k \text{ is a change point}; \qquad \mu_k = 0 \text{ with probability 1 if } \tau_k \text{ is an } n_I\text{-flat point},$$
where $\pi(\cdot)$ is a prior density and $\xi_l$ is a mean-zero error. Note that the observations in the first segment are unchanged. Here, an $n_I$-flat point is defined as a non-change point which is at least $n_I$ apart from any change point. We require the $n_I$ distance between the true change points and the flat ones so that there are sufficient neighborhood samples to achieve the estimation consistency.


Let $\mu_{k0}$ be the true value of $\mu_k$, and we assume $|\mu_{k0}| > \delta$, where $\delta > 0$ is a lower bound of the $\mu_{k0}$, for the $k$'s with $\tau_k \in T_0(p_0)$. The prior distribution on $\mu_k$ is crucial for determining the convergence rate of the BMS procedure. We explore three types of priors: the local prior [9], and the non-local moment prior and inverse moment prior in [11]:
$$\text{Local: } \pi_L(\mu) = N(0, \omega^2); \qquad \text{Moment: } \pi_M(\mu) = \frac{\mu^{2v}}{C_M}\,\frac{1}{\sqrt{2\pi}}\exp(-\mu^2/2);$$
$$\text{Inverse moment: } \pi_I(\mu) = \frac{s\nu^{q/2}}{\Gamma\{q/(2s)\}}\,\mu^{-(q+1)}\exp\{-(\mu^2/\nu)^{-s}\},$$
where $C_M$ is the normalizing constant.
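To make these choices concrete, the following R sketch codes the three prior densities above; the hyperparameter defaults ($\omega = 1$, $v = 1$, $q = \nu = 2$, $s = 6$) are illustrative values borrowed from the simulation settings, and the numerical computation of $C_M$ is an assumption of this sketch rather than part of the paper's implementation.

```r
## Sketch of the three prior densities used by BMS (hyperparameter defaults are illustrative).
pi_L <- function(mu, omega = 1) dnorm(mu, mean = 0, sd = omega)

pi_M <- function(mu, v = 1) {
  ## C_M normalizes mu^{2v} times the N(0, 1) kernel; computed numerically here
  C_M <- integrate(function(u) u^(2 * v) * dnorm(u), -Inf, Inf)$value
  mu^(2 * v) * dnorm(mu) / C_M
}

pi_I <- function(mu, q = 2, nu = 2, s = 6) {
  ## inverse moment prior; the density vanishes as mu -> 0
  ifelse(mu == 0, 0,
         s * nu^(q / 2) / gamma(q / (2 * s)) * abs(mu)^(-(q + 1)) * exp(-(mu^2 / nu)^(-s)))
}
```

Both non-local densities place essentially no mass near $\mu = 0$, which is what suppresses candidates with only small local mean shifts.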

Let $M_k$ represent the model that $\tau_k$ is the sole change point. We define the marginal likelihood with the Gaussian kernel as
$$\Pr(\mathbf Y_n\mid M_k) = \prod_{j=1, j\ne k}^{K_n}\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k} - \mu)^2\}\pi(\mu)\,d\mu.$$
The posterior model probability of $M_k$ given $\mathbf Y_n$ is
$$\Pr(M_k\mid\mathbf Y_n) = \frac{\Pr(\mathbf Y_n\mid M_k)\Pr(M_k)}{\sum_{j=1}^{K_n}\Pr(\mathbf Y_n\mid M_j)\Pr(M_j)} = \frac{\Pr(\mathbf Y_n\mid M_k)}{\sum_{j=1}^{K_n}\Pr(\mathbf Y_n\mid M_j)}, \qquad (1)$$
when each $M_j$ assumes a non-informative uniform prior, $j = 1, \ldots, K_n$. Note that it is not necessary for $\mathbf Y_n$ to be normally distributed to ensure the selection consistency in detecting mean changes. The Gaussian kernel serves as a utility function, which tends to be large when the distances between the true and the hypothetical segment means are small. Hence, as $n \to \infty$, $\Pr(M_k\mid\mathbf Y_n)$ approaches 1 when $\tau_k$ is indeed a change point and the $\tau_j$'s ($j \ne k$) are $n_I$-flat points.
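A minimal sketch of how the posterior model probabilities in (1) can be evaluated is given below. Because the Gaussian-kernel factors for segments other than the $k$th are common to all models and cancel under the uniform model prior, only the segment-level Bayes factor of each candidate is needed; the function name, the grid integration, and the default grid range are illustrative assumptions, not the released code of [10].

```r
## Sketch: posterior model probabilities Pr(M_k | Y_n) in (1) for candidate points 'tau'
## (tau_1 < ... < tau_Kn; boundaries tau_0 = 1 and tau_{Kn+1} = n + 1 are added internally).
posterior_model_prob <- function(y, tau, prior, mu_grid = seq(-5, 5, length.out = 2000)) {
  n <- length(y)
  bnd <- c(1, tau, n + 1)
  step <- diff(mu_grid[1:2])
  log_bf <- sapply(seq_along(tau), function(k) {
    ybar <- mean(y[bnd[k]:(bnd[k + 1] - 1)])              # average of the previous segment
    seg  <- y[bnd[k + 1]:(bnd[k + 2] - 1)]                # segment [tau_k, tau_{k+1})
    S    <- sum(seg - ybar)
    expo <- 2 * mu_grid * S - length(seg) * mu_grid^2     # U_k(mu) - U_k(0)
    m    <- max(expo)
    m + log(sum(exp(expo - m) * prior(mu_grid)) * step)   # log Bayes factor of M_k
  })
  w <- exp(log_bf - max(log_bf))
  list(prob = w / sum(w),    # equation (1)
       log_bf = log_bf)      # kept for the scanning/optimization steps below
}
```

For example, `posterior_model_prob(y, tau, pi_I)$prob` returns a vector summing to one, and its largest entries identify the candidates most likely to be change points.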

2.2 Change point detection

We start with the simplest case that there is only one mean shift in the data, i.e., $p_0 = 1$ is fixed a priori. We select the candidate point $\tau_k$ associated with the largest $\Pr(M_k\mid\mathbf Y_n)$ among all the candidates, which corresponds to the largest marginal likelihood $\Pr(\mathbf Y_n\mid M_k)$. It can be shown that
$$\Pr(M_k\mid\mathbf Y_n) = \bigg\{1 + \sum_{j\ne k}^{K_n}\frac{\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)}\bigg\}^{-1},$$
where, up to a factor common to all models,
$$\Pr(\mathbf Y_n\mid M_j) \propto \frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}}, \qquad
\Pr(\mathbf Y_n\mid M_k) \propto \frac{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k})^2\}}.$$
The two marginal likelihoods indicate that the selection consistency is fully determined by the evidence in favor of $\mu_k \sim \pi(\mu_k)$ and that of $\mu_j = 0$ for $j \ne k$. Each ratio $\Pr(\mathbf Y_n\mid M_j)/\Pr(\mathbf Y_n\mid M_k)$ on the right-hand side converges to 0 when $n_I$ grows to $\infty$ with the sample size. The candidate points retain enough samples in the neighborhood to guarantee the convergence.

For the case with multiple change points ($p_0 > 1$), we select the points associated with the $p_0$ largest $\Pr(M_k\mid\mathbf Y_n)$, and the selection consistency is presented as follows.

Theorem 1. Let $\mathcal M = \{M_k : \tau_k \in T_0(p_0)\}$. If the Bayes factor satisfies
$$\Pr(\mathbf Y_n\mid M_j) = O_p(a_{n_j}) \qquad (2)$$
for $\tau_j \notin T_0(p_0)$, with $a_{n_j} = o_p(1)$ and $n_I^{1/2}\delta/\sigma \to \infty$, then
$$\sum_{M_k\in\mathcal M}\Pr(M_k\mid\mathbf Y_n) = 1 - O_p\{K_n a_{n_I}\exp(-n_I\delta^2)\}. \qquad (3)$$
Hence, when $n_I/\log(n) \to c$, $0 < c \le \infty$, and $n_I \le \lambda$, we have
$$\sum_{M_k\in\mathcal M}\Pr(M_k\mid\mathbf Y_n) \overset{p}{\to} 1.$$


The proof of Theorem 1 is delineated in the Appendix. Clearly, the selection consistency depends on the convergence rate of $a_{n_I}$, which is determined by the choice of the prior $\pi(\cdot)$. Lemmas 2–4 in the Appendix show that $a_{n_j} = n_j^{-1/2}$ when $\pi(\cdot) = \pi_L(\mu)$; $a_{n_j} = n_j^{-v-1/2}$ if $\pi(\cdot) = \pi_M(\mu)$; and $a_{n_j} = \exp\{-n_j^{s/(s+1)}\}$ if $\pi(\cdot) = \pi_I(\mu)$. Hence, the selection consistency is achieved at the fastest rate using the non-local inverse moment prior.

2.3 Number of change points

When $p_0$ is unknown, let $T(p)$ be the set containing $p$ points from the procedure described in Section 2.2. We define the marginal likelihood given $T(p)$, that is, $\Pr\{\mathbf Y_n\mid T(p)\}$, as
$$\prod_{\tau_j\notin T(p)}\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}\prod_{\tau_k\in T(p)}\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k} - \mu)^2\}\pi(\mu)\,d\mu.$$
We can estimate the locations and the number of change points in two steps: (i) for any given $p$, we obtain $\hat T(p)$ using the procedure described in the previous section, and (ii) we estimate $p_0$ by $\hat p$ via maximizing $\Pr\{\mathbf Y_n\mid \hat T(p)\}$ with respect to $p$. In contrast to the procedure in [5], which simultaneously estimates the locations and the number of change points by maximizing the marginal likelihood with respect to $T(p)$ and $p$, our BMS splits the estimation procedure into a scanning step as described in Section 2.2 and an optimization step. This scanning–optimization mechanism reduces the computational burden substantially, because the optimization in step (ii) is implemented in a single dimension.
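The two steps can be sketched as follows. Under the Gaussian-kernel utility, $\log\Pr\{\mathbf Y_n\mid T(p)\}$ equals a constant (common to all $T(p)$) plus the sum of the log Bayes factors of the selected candidates, so the optimization step only needs the per-candidate log Bayes factors already produced by the scanning step (e.g., the `log_bf` output of `posterior_model_prob` above); the helper name is hypothetical.

```r
## Sketch: estimate the number and locations of change points (Section 2.3).
## 'log_bf' is the vector of per-candidate log Bayes factors from the scanning step.
estimate_change_points <- function(tau, log_bf, p_max = length(tau)) {
  ord <- order(log_bf, decreasing = TRUE)          # scanning: rank candidates
  obj <- cumsum(log_bf[ord])[seq_len(p_max)]       # log Pr{Y_n | T_hat(p)}, up to a constant
  p_hat <- which.max(c(0, obj)) - 1                # optimization over p; allows p = 0
  list(p_hat = p_hat,
       t_hat = if (p_hat > 0) sort(tau[ord[seq_len(p_hat)]]) else integer(0))
}

## Usage sketch: fit <- posterior_model_prob(y, tau, pi_I)
##               estimate_change_points(tau, fit$log_bf)
```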

2.4 Candidate points selection

Previous discussions rely upon a critical assumption that the candidate points are specified in advance, which, however, is often infeasible in practice. To facilitate the implementation of the BMS, we need to find a candidate set $H_c(n_I)$ that is close to $H(n_I)$. For the selection consistency of the change points, we require that for each $t_j$ there is a $\tau_k \in H_c(n_I)$ such that $\Pr(|t_j - \tau_k| \le n_I) = 1 - O_p[\min\{\exp(-n_I\delta^2), a_{n_I}\}]$. Define
$$R_i = \frac{\int\prod_{l=i}^{i+n_I-1}\exp\{-(Y_l - \bar Y_i - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=i}^{i+n_I-1}\exp\{-(Y_l - \bar Y_i)^2\}},$$
where $\bar Y_i = (n_I - 1)^{-1}\sum_{j=i-n_I}^{i-1} Y_j$. By an argument similar to that in Lemma 1, $R_i$ goes to infinity when $i$ is a true change point, and $R_i \to 0$ in probability when $i$ is an $n_I$-flat point. Hence, the value of $R_i$ can distinguish a change point from a set of $n_I$-flat points. To further eliminate the non-change points that are also not $n_I$-flat, we implement a non-maximum suppression that removes the points which do not give the largest $R_i$ in their $n_I$-neighborhood. More specifically, the screening procedure for selecting candidate points is described as follows.

Algorithm 1: Screening
(1) For each $i$ in $[n_I, n - n_I]$, compute $R_i$.
(2) If $R_i = \max\{R_j : j \in (i - n_I, i + n_I]\}$, then $i$ is selected as a candidate point.
(3) After scanning through the data sequence, a set of $K_n$ candidate points, denoted as $H_c(n_I)$, is obtained.
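A sketch of Algorithm 1 in R follows; it reuses the segment-level Bayes factor from the previous sketches, computed over the two $n_I$-windows around each location, followed by non-maximum suppression. The function name, the grid integration, and the exact boundary handling of the index range are illustrative assumptions.

```r
## Sketch of Algorithm 1 (screening): compute R_i over the interior locations and keep local maxima.
screen_candidates <- function(y, n_I, prior, mu_grid = seq(-5, 5, length.out = 2000)) {
  n <- length(y)
  idx <- (n_I + 1):(n - n_I)                       # interior locations, roughly [n_I, n - n_I]
  step <- diff(mu_grid[1:2])
  logR <- sapply(idx, function(i) {
    ybar <- mean(y[(i - n_I):(i - 1)])             # left-window average, Y-bar_i
    S <- sum(y[i:(i + n_I - 1)] - ybar)            # right window [i, i + n_I)
    expo <- 2 * mu_grid * S - n_I * mu_grid^2      # U(mu) - U(0) for the right window
    m <- max(expo)
    m + log(sum(exp(expo - m) * prior(mu_grid)) * step)
  })
  keep <- sapply(seq_along(idx), function(j) {     # non-maximum suppression in (i - n_I, i + n_I)
    logR[j] == max(logR[abs(idx - idx[j]) < n_I])
  })
  idx[keep]                                        # plays the role of H_c(n_I)
}
```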

This screening algorithm is comparable to that in [15], because by the Laplace approximation we can write
$$R_i = D_n\,\frac{\prod_{l=i}^{i+n_I-1}\exp\{-(Y_l-\bar Y_i-\mu^*)^2\}\,\pi(\mu^*)}{\prod_{l=i}^{i+n_I-1}\exp\{-(Y_l-\bar Y_i)^2\}}\{1+o_p(1)\}
= D_n\exp\Big\{2\Big(\sum_{l=i}^{i+n_I-1}Y_l - \sum_{j=i-n_I}^{i-1}Y_j\Big)\mu^* - n_I\mu^{*2}\Big\}\pi(\mu^*)\{1+o_p(1)\},$$
where $D_n$ is a constant of order $O_p(n_I^{-1/2})$ and $\mu^*$ is the maximizer of $-\sum_{l=i}^{i+n_I-1}(Y_l - \bar Y_i - \mu)^2 + \log\pi(\mu)$. Clearly, the magnitude of the leading term in $R_i$ is strongly associated with $n_I^{-1}\big|\sum_{l=i}^{i+n_I-1}Y_l - \sum_{j=i-n_I}^{i-1}Y_j\big|$, which is the local diagnosis function with $h = n_I$ in [15].
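For intuition, this leading term can be computed directly as the local diagnosis statistic of [15]; a two-line sketch (assuming the same two $n_I$-windows as in $R_i$ and a valid interior index $i$) is:

```r
## Sketch: local diagnosis statistic |sum of right window - sum of left window| / n_I,
## which drives the magnitude of R_i (h = n_I in the notation of [15]).
local_diagnosis <- function(y, n_I, i) {
  abs(sum(y[i:(i + n_I - 1)]) - sum(y[(i - n_I):(i - 1)])) / n_I
}
```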

Next, we show that the screening procedure identifies a set $H_c(n_I)$ that leads to the change point consistency.

Proposition 1. Assume $n_I^{1/2}\delta/\sigma \to \infty$. For each $t_j \in T_0(p_0)$, there is a $\tau \in H_c(n_I)$ such that $\Pr\{t_j \in (\tau - n_I, \tau + n_I)\} = 1 - O[\min\{\exp(-n_I\delta^2), a_{n_I}\}]$.

In theory, $i = t_j$ maximizes $R_i$ in the $n_I$-neighborhood of $t_j$ asymptotically. By selecting the local maximal $R_i$ in the screening procedure, $H_c(n_I)$ covers the $n_I$-neighborhood of $T_0(p_0)$ as $n \to \infty$. Also, the condition $n_I^{1/2}\delta/\sigma \to \infty$ indicates that the effect size cannot be too small in order to find the candidate points around the true change points. After selecting the candidate points, we perform the refinement step to identify the locations and the total number of change points.

Algorithm 2: Refinement
Scanning
(1) Compute $\Pr(\mathbf Y_n\mid M_k)$ by scanning over all the candidate points in $H_c(n_I)$.
(2) For each $p$, obtain a set of change points $\hat T(p)$ corresponding to the $p$ largest $\Pr(\mathbf Y_n\mid M_k)$, $k = 1, \ldots, K_n$.
Optimization
(3) Select $\hat p$ that maximizes $\Pr\{\mathbf Y_n\mid\hat T(p)\}$.

Theorem 2. Assume that $n_I/\log(n) \to c$, $0 < c \le \infty$, $n_I^{1/2}\delta/\sigma \to \infty$, $\limsup_{n\to\infty} n_I/\lambda < 1/2$, and equation (2) holds. Let $H_c(n_I)$ be the set containing candidate points such that $|\tau_{k+1} - \tau_k| > n_I$, and for each $t_j$ there is a $\tau_k \in H_c(n_I)$ with $\Pr(|t_j - \tau_k| \le n_I) = 1 - O_p[\min\{\exp(-n_I\delta^2), a_{n_I}\}]$. Then,
$$\Pr(\hat p = p_0) = 1 - O_p[\max\{\exp(-n_I\delta^2), a_{n_I}\}],$$
and furthermore,
$$\Pr\Big\{\sup_{\hat t_j\in\hat T(\hat p)}\inf_{t_j\in T_0(p_0)}|(\hat t_j - t_j)/n| \le n_I/n\Big\} = 1 - O\{\exp(-n_I\delta^2)\}$$
and
$$\Pr\Big\{\sup_{t_j\in T_0(p_0)}\inf_{\hat t_j\in\hat T(\hat p)}|(\hat t_j - t_j)/n| < n_I/n\Big\} = 1 - O(a_{n_I}).$$

Theorem 2 shows that BMS controls both the over- and under-segmentation errors. The intrinsic rationale is that for any $T(p)$ different from $T_0(p_0)$, there is at least one chosen point $\tau \in T(p)$ whose $n_I$-neighborhood does not contain a true change point. Then the likelihood ratio $\Pr\{\mathbf Y_n\mid T(p)\}/\Pr\{\mathbf Y_n\mid T_0(p_0)\}$ goes to 0 with probability 1, because the ratio contains at least one of $\Pr(\mathbf Y_n\mid M_j)$ and $\Pr(\mathbf Y_n\mid M_k)^{-1}$ for $\tau_j \notin T_0(p_0)$ and $\tau_k \in T_0(p_0)$, each of which converges to 0 in probability by Lemma 1 and (2).

For a given $p$, each multiplicand in $\Pr\{\mathbf Y_n\mid\hat T(p)\}$ can be obtained and stored through the scanning step. The optimization step essentially picks the maximal $\Pr\{\mathbf Y_n\mid\hat T(p)\}$ among a set of known quantities, which avoids intensive optimization procedures. Because the computational time for $\Pr(\mathbf Y_n\mid M_k)$ grows at the speed of $O(n)$ for $k = 1, \ldots, K_n$, the computational time for the refinement stage grows with the sample size at the speed of $O(nK_n)$.


3 Simulation

3.1 The sequence without spikes

To evaluate the performance of the proposed BMS method in the setting without spike points, we generate data from two different models. Model I takes the form of
$$Y_i = h^{\mathrm T} J(x_i) + \sigma\epsilon_i,$$
where the error term $\epsilon_i \sim N(0,1)$, $\sigma = 0.5$, and $h = (2.01, -2.51, 1.51, -2.01, 2.51, -2.11, 1.05, 2.16, -1.56, 2.56, -2.11)^{\mathrm T}$ with $p_0 = 11$. We set $J(x_i) = [\{1 + \mathrm{sgn}(nx_i - t_j)\}/2,\ j = 1, \ldots, p_0]^{\mathrm T}$, and the $x_i$'s are equally spaced points on $[0, 1]$. The true change points are
$$(t_j/n,\ j = 1, \ldots, p_0) = (0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81).$$
The random errors are generated from three distributions: the standard normal distribution $N(0,1)$; Student's $t$ distribution with 5 degrees of freedom, $t(5)$, which is standardized to have a unit variance; and the log-normal distribution $LN(0,1)$, which is the exponential of the standard normal distribution and is also standardized to have a unit variance. Model II considers a heteroscedastic error term across the segments. The data generating model is
$$Y_i = h^{\mathrm T} J(x_i) + \sigma\epsilon_i\prod_{j=1}^{\mathbf 1^{\mathrm T}J(x_i)} v_j,$$
where $(v_j,\ j = 1, \ldots, 11) = (1, 0.5, 3, 2/3, 0.5, 3, 2/3, 0.5, 3, 2/3, 0.5)$. Other model specifications remain the same as those in Model I. We define the over- and under-segmentation errors as $d(\hat G_n\mid G_n)$ and $d(G_n\mid\hat G_n)$, respectively,
$$d(\hat G_n\mid G_n) = \sup_{b\in G_n}\inf_{a\in\hat G_n}|a - b|, \qquad d(G_n\mid\hat G_n) = \sup_{b\in\hat G_n}\inf_{a\in G_n}|a - b|.$$
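A sketch of the Model I data-generating process with standard normal errors follows; the sample size $n = 1000$ is an illustrative choice (the text does not fix $n$ for Model I), and the function name is hypothetical.

```r
## Sketch: generate one sequence from Model I, Y_i = h^T J(x_i) + sigma * eps_i.
simulate_model_I <- function(n = 1000, sigma = 0.5) {
  h <- c(2.01, -2.51, 1.51, -2.01, 2.51, -2.11, 1.05, 2.16, -1.56, 2.56, -2.11)
  t_frac <- c(0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)
  x <- seq(0, 1, length.out = n)                  # equally spaced points on [0, 1]
  ## jth entry of J(x_i) is {1 + sgn(n * x_i - t_j)} / 2, with t_j = n * t_frac[j]
  J <- outer(x, t_frac, function(xi, tj) (1 + sign(n * xi - n * tj)) / 2)
  as.vector(J %*% h) + sigma * rnorm(n)           # piecewise-constant mean plus noise
}
set.seed(1)
y <- simulate_model_I()
```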

For the BMS procedure, we consider three different priors for $\pi(\cdot)$, corresponding to the local prior, the non-local moment prior, and the non-local inverse moment prior. To save space, we only present the results using the non-local inverse moment prior. We take $n_I = \{\log(n)\}^{1.5}h$, where $h \ge 0.5$ generally works well in the simulations.

For a comprehensive comparison with existing methods, we consider BMS under the non-local inverse moment prior $\pi_I(\cdot)$ with $q = \nu = 2$ and $s = 6$, PELT [12], WBS [7], NOT with normal or heavy-tail distributions [1], and SML [5]. Table 1 summarizes the numerical results under Model I and Model II with normal, Student's $t$, and log-normal error distributions and their heteroscedastic counterparts, respectively. On average, BMS performs the best in selecting the number of change points and balancing both over- and under-segmentation errors. It is expected that the performances of WBS, PELT, and SML deteriorate when the errors do not follow a normal distribution, because all three procedures rely heavily on the parametric model assumptions, and hence they are not robust to model mis-specification. In contrast, both BMS and NOT behave well under various error distributions. Also, NOT and SML perform the best in controlling the over-segmentation errors, while the resulting estimator $\hat p$ tends to be larger than the true $p_0$. On the other hand, BMS allows for slightly larger over-segmentation errors in order to keep $\hat p$ more concentrated around $p_0$.

3.2 The sequence with spike points

In addition, we illustrate the features of the BMS, NOT, and SML methods on sequences contaminated with spike points. Assuming the noise is normal, we generate 500 sequences, each containing $n = 1000$ points with mean changes of 0.01 and $-0.01$ at the 400th and 440th observations, respectively. We set the noise standard deviation to 0.002. Further, we generate 10 random samples uniformly in the ranges $(-0.08, -0.07)$ and $(0.07, 0.08)$ and add them to the original sequence at random locations to form the spike points. Note that we choose these parameters to mimic the real data setting. We implement BMS, NOT, and SML on the simulated samples. In BMS, we select $n_I = 12$, which is the largest integer smaller than $0.65\{\log(n)\}^{1.5}$.
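The spike-contaminated sequences of this subsection can be generated as in the sketch below; the placement of the two mean changes and the spike magnitudes follow the description above, while the helper name and the random seed are illustrative.

```r
## Sketch: one sequence with two true mean changes and 10 added spike points (Section 3.2).
simulate_spiky <- function(n = 1000, sd_noise = 0.002, n_spike = 10) {
  mu <- rep(0, n)
  mu[400:439] <- 0.01                 # change of +0.01 at observation 400; -0.01 at 440
  y <- mu + rnorm(n, sd = sd_noise)
  spike_loc <- sample(seq_len(n), n_spike)
  spike_val <- sample(c(-1, 1), n_spike, replace = TRUE) * runif(n_spike, 0.07, 0.08)
  y[spike_loc] <- y[spike_loc] + spike_val
  y
}
set.seed(2)
y_spike <- simulate_spiky()           # n_I = 12 is used for these sequences in the text
```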

From Table 2, we can see that the BMS is insensitive to the spike points, with the smallest $|\hat p - p_0|$ on average.


Table 1: Comparison results averaged over 200 simulations among the BMS, PELT, WBS, NOT and SML methods under Model I and Model II with different error distributions: the standard normal distribution N(0,1), Student's t(5), and log-normal LN(0,1) with constant variances; and the corresponding distributions with heteroscedastic variances. Standard deviations are in parentheses.

Error distribution   Method   p̂ − p0: ≤−3   −2   −1     0    1    2   ≥3   d(Gn|Ĝn)        d(Ĝn|Gn)
N(0,1)               BMS              0    0    1   197    2    0    0   2.41 (6.06)     1.96 (3.94)
                     PELT             0    1   37   162    0    0    0   0.91 (1.19)     6.32 (11.92)
                     WBS              0    0    0   194    6    0    0   1.22 (4.13)     0.86 (0.79)
                     NOT              0    0    0   192    7    1    0   1.93 (8.04)     0.75 (0.80)
                     SML              0    0    0   132   52   13    3   12.94 (42.98)   0.78 (0.90)
t(3)                 BMS              0    0    8   190    2    0    0   2.15 (5.76)     2.83 (7.01)
                     PELT             0    4   31   165    0    0    0   0.95 (1.03)     6.24 (12.24)
                     NOT              0    0    3   184    3    6    4   7.57 (27.70)    1.51 (2.57)
                     SML              0    0    0    42   34   44   80   40.13 (53.68)   0.88 (0.87)
LN(0,1)              BMS              0    0   12   180    6    1    2   3.69 (12.12)    3.11 (6.89)
                     PELT             1    2   21   135   15   23    3   12.10 (29.63)   7.22 (13.32)
                     NOT              0    1    4   183    7    1    4   6.06 (26.32)    1.18 (4.45)
                     SML              0    0    0     0    0    4  196   111.77 (52.05)  0.73 (1.33)
Heterosced N(0,1)    BMS              0    0   13   176    8    3    0   3.69 (7.08)     3.88 (7.36)
                     PELT             0    0   31   169    0    0    0   1.49 (1.53)     6.15 (11.44)
                     NOT              0    0    0   150   23   21    6   7.52 (12.60)    1.66 (1.68)
                     SML              0    0    0   119   55   20    6   6.75 (23.98)    1.37 (1.52)
Heterosced t(5)      BMS              0    0   14   181    4    1    0   2.79 (4.87)     4.21 (8.17)
                     PELT             0    4   35   159    2    0    0   1.50 (1.83)     7.76 (13.48)
                     NOT              0    1    5   179   10    2    3   8.15 (25.01)    2.36 (4.70)
                     SML              0    0    0    43   22   41   94   26.74 (35.42)   1.27 (1.59)
Heterosced LN(0,1)   BMS              0    0   20   173    7    0    0   3.73 (11.72)    4.20 (7.85)
                     PELT             0    2   21   142   22   12    1   6.07 (16.09)    8.10 (13.93)
                     NOT              0    0    4   183    7    4    2   6.32 (24.67)    1.42 (3.70)
                     SML              0    0    0     2    1    0  197   68.05 (47.15)   0.87 (1.67)

In the experiments (not shown), NOT ignores both the change points with small signal-to-noise ratio and the spike signals with small segment lengths, whereas SML is sensitive to extreme values. To this end, BMS is the most suitable procedure for the MRgRT data: on the one hand, it enforces a minimal segment length to avoid identifying the spike signals; on the other hand, it retains a small minimal segment length to detect change points that are close together.

Table 2: Comparison results averaged over 500 simulations among the BMS, NOT and SML methods on the data sequences with spike points.

Method   p̂ − p0: ≤−1     0     1     2   ≥3
BMS              31   276   113    67   13
NOT             387    32    20    11   50
SML               0     0     0     0  500

4 MRgRT data

We illustrate the BMS method with an application to the MRgRT data, which contains 2265 data points ordered by their distances from the sources of the radiation dose. The R code for implementing the method can be downloaded from our GitHub repository [10]. Throughout the implementation, we use the non-local inverse moment prior $\pi_I(\mu) = s\nu^{q/2}/\Gamma\{q/(2s)\}\,\mu^{-(q+1)}\exp\{-(\mu^2/\nu)^{-s}\}$ with $q = 2$ and $\nu = 2$. We set $n_I = 13$, which is the largest integer smaller than $0.65\{\log(n)\}^{1.5}$.
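These tuning quantities can be reproduced directly; the snippet below is an illustration (not the released code of [10]) that evaluates $n_I$ for $n = 2265$ and sets up $\pi_I$ with $q = \nu = 2$ for a given $s$, reusing the `pi_I` function sketched in Section 2.1.

```r
## Sketch: tuning parameters for the MRgRT data (n = 2265 ordered dose measurements).
n <- 2265
n_I <- floor(0.65 * log(n)^1.5)                         # gives 13, as used in the text
prior_mrgrt <- function(mu, s = 10) pi_I(mu, q = 2, nu = 2, s = s)
```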

We first vary $s$ from 2 to 10. Figure 1 shows that when $s$ is small, we identify more change points than the true ones, and as $s$ grows, the number of identified change points decreases. This phenomenon is consistent with the result in Lemma 4 that the convergence rate for the non-local inverse moment prior is $O_p\{\exp(-n_I^{s/(1+s)})\}$. When $s$ is small, the Bayes factor vanishes slowly, and hence the algorithm picks redundant change points. When $s$ is sufficiently large, the convergence rate approaches $O_p\{\exp(-n_I)\}$, and hence the algorithm eliminates the flat points more effectively. In the following analysis, we select $s = 10$, with which the algorithm provides the best result in Figure 1.

Figure 1: Change point detection results when using different values of $s$ ($s = 2, 5, 10$). Each panel plots $Y$ against the distance from the source.

Interestingly, after removing the spike points and keeping the data within the range $(-0.01, 0.01)$, we implement NOT, SML, and BMS on the truncated sequence. Figure 2 shows that the results from the three methods largely overlap. The NOT and BMS methods have similar results, and both outperform SML. This implies that removing the spike points improves the boundary detection accuracy for all three methods. However, such spike removal is infeasible in practice, because the locations and the magnitudes of the spikes are difficult to track in human bodies.

Figure 2: Change point detection after restricting the data to the range $(-0.01, 0.01)$ (signal level plotted against the distance from the source). BMS: red solid line; NOT: blue dashed line; SML: brown two-dash line.

5 Conclusion

We propose the BMS method, which consistently identifies multiple mean changes in a data sequence. The BMS method removes the flat points effectively without sacrificing the detection accuracy. Further, our method is particularly useful when the data sequence contains spike points that are not of interest. We apply the BMS to analyze the MRgRT data, for which NOT, SML, and other methods fail, while the BMS algorithm correctly detects the mean-change boundaries. We explore the BMS performance with different tuning parameters, and the resulting patterns are consistent with the theoretical properties. Moreover, we demonstrate that the BMS is robust to the error distributions by evaluating the detection procedures on sequences with different random errors.


References
[1] Rafal Baranowski, Yining Chen, and Piotr Fryzlewicz. Narrowest-over-threshold detection of multiple change-points and change-point-like features. arXiv preprint arXiv:1609.00293, 2016.
[2] Francesco Bertolino, Walter Racugno, and Elías Moreno. Bayesian model selection approach to analysis of variance under heteroscedasticity. The Statistician, pages 503–517, 2000.
[3] Caterina Conigliani and Anthony O'Hagan. Sensitivity of the fractional Bayes factor to prior distributions. Canadian Journal of Statistics, 28(2):343–352, 2000.
[4] Fulvio De Santis and Fulvio Spezzaferri. Consistent fractional Bayes factor for nested normal linear models. Journal of Statistical Planning and Inference, 97(2):305–321, 2001.
[5] Chao Du, Chu-Lan Michael Kao, and S. C. Kou. Stepwise signal extraction via marginal likelihood. Journal of the American Statistical Association, 111(513):314–330, 2016.
[6] Peter Bauer and Peter Hackl. The use of MOSUMs for quality control. Technometrics, 20(4):431–436, 1978.
[7] Piotr Fryzlewicz. Wild binary segmentation for multiple change-point detection. Annals of Statistics, 42(6):2243–2281, 2014.
[8] Joseph Glaz, Joseph I. Naus, and Sylvan Wallenstein. Scan Statistics. Springer, 2001.
[9] Harold Jeffreys. The Theory of Probability. Oxford University Press, 1998.
[10] Fei Jiang, Guosheng Yin, and Francesca Dominici. Bayesian multiple change points detection with non-local priors. R code, 2017.
[11] Valen E. Johnson and David Rossell. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B, 72(2):143–170, 2010.
[12] Rebecca Killick, Paul Fearnhead, and I. A. Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012.
[13] Claudia Kirch and Birte Muhsal. A MOSUM procedure for the estimation of multiple random change points. Preprint, 2014.
[14] Marc Lavielle and Carenne Ludeña. The multiple change-points problem for the spectral distribution. Bernoulli, 6(5):845–869, 2000.
[15] Yue S. Niu and Heping Zhang. The screening and ranking algorithm to detect DNA copy number variations. Annals of Applied Statistics, 6(3):1306, 2012.
[16] Philip Preuss, Ruprecht Puchstein, and Holger Dette. Detection of multiple structural breaks in multivariate time series. Journal of the American Statistical Association, 110(510):654–668, 2015.
[17] Stephen G. Walker. Modern Bayesian asymptotics. Statistical Science, pages 111–117, 2004.
[18] Yi-Ching Yao. Approximating the distribution of the maximum likelihood estimate of the change-point in a sequence of independent random variables. Annals of Statistics, 15(3), 1987.
[19] Chun Yip Yau and Zifeng Zhao. Inference for multiple change points in time series via likelihood ratio scan statistics. Journal of the Royal Statistical Society: Series B, 78(4):895–916, 2015.


Appendix

A Additional Figures

Figure A.1: Reconstructed image of a slice in a cylindrical dosimeter with a cavity in the middle (left) and a typical line profile through the center of the cavity (right). The radiation enters the dosimeter from the hole on the left of the cylindrical dosimeter. The cylinder rotates 360 degrees so that the radiation can enter from different directions.

[Figure A.2 shows two panels, labeled NOT and SML, plotting $Y$ against the distance from the source.]

Figure A.2: The change point detection results from the SML (with normal prior and the maximal number of change points set to 30) and NOT methods for the dose level changes in the DMOS system.

B Proofs

Let $g'(\mu)$ and $g''(\mu)$ denote the first and second derivatives of a generic function $g(\mu)$ with respect to $\mu$, respectively, and further define the utility function as
$$U_k(\mu) = -\sum_{l=\tau_k}^{\tau_{k+1}-1}(Y_l - \bar Y_{\tau_k} - \mu)^2.$$
The following conditions are imposed for the theoretical derivations.
(A1) Assume $\mu$ to be in a closed set of points in $\mathbb{R}$.
(A2) Assume $\pi(\mu)$ to be a continuous density function with bounded first and second derivatives.
(A3) Assume that $\Pr\{\mathbf Y_n\mid T(p)\}$ has a unique maximizer in the neighborhood of $T_0(p_0)$.

Lemma 1. Assume that $\tau_k$ is a change point for which the mean of $Y_l - \bar Y_{\tau_k}$ satisfies $|\mu_{k0}| > \delta$ for $\delta > 0$ and $n_k^{1/2}\delta/\sigma \to \infty$. Then there is a constant $D > 0$ such that
$$\lim_{n\to\infty}\Pr\left[\frac{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}} > \exp(Dn_k\delta^2)\right] = 1.$$


Proof: By the definition of $U_k(\mu)$, we can write
$$\frac{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}} = \frac{\int\exp\{U_k(\mu)\}\pi(\mu)\,d\mu}{\exp\{U_k(0)\}}.$$
We first define $N_\delta(\mu_{k0}) = \{\mu : |\mu - \mu_{k0}| < \delta\}$ and $N_\delta^c(\mu_{k0})$ as its complement, and then show
$$\lim_{n\to\infty}\Pr\Big[\sup_{\mu\in N_\delta^c(\mu_{k0})}\{U_k(\mu) - U_k(\mu_{k0})\} < -Dn_k\delta^2\Big] = 1. \qquad (4)$$
Note that
$$U_k(\mu) - U_k(\mu_{k0}) = n_k\Big\{(\mu_{k0}^2 - \mu^2) - n_k^{-1}(\mu_{k0} - \mu)\sum_{l=\tau_k}^{\tau_{k+1}-1}2(Y_l - \bar Y_{\tau_k})\Big\}$$
$$= n_k\{(\mu_{k0}^2 - \mu^2) - 2(\mu_{k0} - \mu)\mu_{k0}\} + O_p(n_k^{1/2}|\mu_{k0} - \mu|\sigma_k)
= -n_k(\mu - \mu_{k0})^2 + O_p(n_k^{1/2}|\mu_{k0} - \mu|\sigma_k)$$
$$\le -n_k\delta^2 + O_p(n_k^{1/2}|\mu_{k0} - \mu|\sigma_k)
= -n_k\delta^2/2 - n_k\delta^2/2 + O_p(n_k^{1/2}|\mu_{k0} - \mu|\sigma_k).$$
As $n_k^{1/2}\delta/\sigma \to \infty$, we have $-n_k\delta^2/2 + O_p(n_k^{1/2}|\mu_{k0} - \mu|\sigma_k) < 0$ with probability 1, and thus
$$\lim_{n\to\infty}\Pr\Big[\sup_{\mu\in N_\delta^c(\mu_{k0})}\{U_k(\mu) - U_k(\mu_{k0})\} < -Dn_k\delta^2\Big] = 1.$$
When $\tau_k$ is a change point, setting $\mu = 0$ and using $|\mu_{k0}| > \delta$, we have
$$\lim_{n\to\infty}\Pr\Big[\frac{\exp\{U_k(0)\}}{\exp\{U_k(\mu_{k0})\}} < \exp(-Dn_k\delta^2)\Big] = 1. \qquad (5)$$
By the Laplace approximation,
$$\int\exp\{U_k(\mu)\}\pi(\mu)\,d\mu = O_p[\{-U_k''(\tilde\mu)\}^{-1/2}\exp\{U_k(\tilde\mu)\}\pi(\tilde\mu)], \qquad (6)$$
where $\tilde\mu$ is the maximizer of $U_k(\mu) + \log\{\pi(\mu)\}$ and $U_k''(\tilde\mu) = O_p(n_k)$. Let $\hat\mu$ be the maximizer of $U_k(\mu)$; then
$$0 = U_k'(\tilde\mu) + \partial\log\pi(\tilde\mu)/\partial\mu = U_k''(\mu^*)(\tilde\mu - \hat\mu) + \partial\log\pi(\tilde\mu)/\partial\mu,$$
where $\mu^*$ is a point on the line segment between $\tilde\mu$ and $\hat\mu$. Since $\pi(\mu)$ has two bounded derivatives by condition (A2) and $U_k''(\mu^*) = O_p(n_k)$, we have $\tilde\mu - \hat\mu = O_p(n_k^{-1})$. Therefore, (6) can be written as
$$\int\exp\{U_k(\mu)\}\pi(\mu)\,d\mu = O_p[\{-U_k''(\hat\mu)\}^{-1/2}\exp\{U_k(\hat\mu)\}\pi(\hat\mu)] = O_p[\{-U_k''(\mu_{k0})\}^{-1/2}\exp\{U_k(\mu_{k0})\}\pi(\mu_{k0})], \qquad (7)$$
where the last equality holds because $\hat\mu$ is the least squares estimator. This implies
$$\frac{\exp\{U_k(\mu_{k0})\}}{\int\exp\{U_k(\mu)\}\pi(\mu)\,d\mu} = O_p(n_k^{1/2}), \qquad (8)$$
which, in conjunction with (5), leads to
$$\lim_{n\to\infty}\Pr\Big[\frac{\exp\{U_k(0)\}}{\int\exp\{U_k(\mu)\}\pi(\mu)\,d\mu} < \exp(-Dn_k\delta^2)\Big] = 1.$$
By condition (A2) and the boundedness of $U_k(\mu)$, we have
$$\lim_{n\to\infty}\Pr\Big[\frac{\int\exp\{U_k(\mu)\}\pi(\mu)\,d\mu}{\exp\{U_k(0)\}} > \exp(Dn_k\delta^2)\Big] = 1,$$
which completes the proof.


Lemma 2. Let $\pi(\mu) = \pi_L(\mu)$ be a local prior, and assume that $\tau_j$ is not a change point, i.e., $\mu_{j0} = 0$. Then
$$\frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}} = O_p(n_j^{-1/2}).$$

Proof: By the definition of $U_j(\mu)$, we can write
$$\frac{\int\exp\{U_j(\mu)\}\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = \frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}}.$$
Using the same argument as that leading to (7) with $\mu_{j0} = 0$, we have
$$\int\exp\{U_j(\mu)\}\pi(\mu)\,d\mu = O_p[\{-U_j''(0)\}^{-1/2}\exp\{U_j(0)\}\pi(0)].$$
Since $\{-U_j''(0)\}^{-1/2} = O_p(n_j^{-1/2})$ and $\pi(0)$ is a bounded density, we have
$$\frac{\int\exp\{U_j(\mu)\}\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = O_p(n_j^{-1/2}).$$

Lemma 3. Let
$$\pi(\mu) = \pi_M(\mu) = \frac{\mu^{2v}}{C_M}\pi_b(\mu),$$
where $C_M$ is a normalizing constant and $\pi_b(\mu)$, with $\pi_b(0) > 0$, is a base prior density with $2v$ finite moments and bounded first two derivatives in a neighborhood around 0. Assume that $\tau_j$ is not a change point, i.e., $\mu_{j0} = 0$. Then
$$\frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}} = O_p(n_j^{-v-1/2}).$$

Proof: We can write
$$\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu = \int\exp\{U_j(\mu) + \log\pi(\mu)\}\,d\mu.$$
Let $h(\mu) = U_j(\mu) + \log\pi(\mu) = 2v\log(\mu) + \log\{\pi_b(\mu)\} + U_j(\mu)$ (up to an additive constant), and let $\tilde\mu$ be the maximizer of $h(\mu)$. Then we have
$$2v/\tilde\mu + \pi_b'(\tilde\mu)/\pi_b(\tilde\mu) + U_j'(\tilde\mu) = 0.$$
If we expand $U_j'(\tilde\mu)$ around $\hat\mu$, the least squares estimator of $\mu_{j0}$, the above equality can be rewritten as
$$2v/n + n^{-1}\tilde\mu\,\pi_b'(\tilde\mu)/\pi_b(\tilde\mu) + n^{-1}U_j''(\mu^*)\tilde\mu(\tilde\mu - \hat\mu) = 0,$$
where $\mu^*$ is a point on the line segment between $\hat\mu$ and $\tilde\mu$. Therefore,
$$O_p(n^{-1}) = \tilde\mu(\tilde\mu - \hat\mu) = (\tilde\mu - \hat\mu)^2 + \hat\mu(\tilde\mu - \hat\mu) = (\tilde\mu - \hat\mu + \hat\mu/2)^2 - \hat\mu^2/4.$$
Along with the fact that $\hat\mu = O_p(n_j^{-1/2})$, we have $\tilde\mu - \hat\mu = O_p(n_j^{-1/2})$ and $\tilde\mu = O_p(n^{-1/2})$. Next, by the Laplace expansion, we have
$$\int\exp\{h(\mu)\}\,d\mu = O_p\big(\{2v/\tilde\mu^2 - U_j''(\tilde\mu)\}^{-1/2}\exp[2v\log(\tilde\mu) + \log\{\pi_b(\tilde\mu)\} + U_j(\tilde\mu)]\big),$$
and also
$$n^{-1}U_j(\tilde\mu) - n^{-1}U_j(0) = n^{-1}U_j'(\mu^{**})\tilde\mu = n^{-1}\{U_j'(\hat\mu) + U_j'(\mu^{**}) - U_j'(\hat\mu)\}\tilde\mu = O_p(\mu^{**} - \hat\mu)\tilde\mu = O_p(n^{-1}), \qquad (9)$$
where $\mu^{**}$ is on the line segment between $\tilde\mu$ and 0. Thus,
$$|U_j(\tilde\mu) - U_j(0)| = O_p(1).$$
As a result,
$$\frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}} = \frac{\int\exp\{U_j(\mu)\}\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = \frac{\int\exp\{h(\mu)\}\,d\mu}{\exp\{U_j(0)\}}$$
$$= O_p\big(\{2v/\tilde\mu^2 - U_j''(\tilde\mu)\}^{-1/2}\exp[2v\log(\tilde\mu) + \log\{\pi_b(\tilde\mu)\} + U_j(\tilde\mu) - U_j(0)]\big) = O_p(n_j^{-1/2}\tilde\mu^{2v}) = O_p(n_j^{-1/2-v}),$$
where the last equality holds by the fact that $\tilde\mu = O_p(n^{-1/2})$. This proves the result.

Lemma 4. Let
$$\pi(\mu) = \pi_I(\mu) = \frac{s\nu^{q/2}}{\Gamma\{q/(2s)\}}\,\mu^{-(q+1)}\exp\Big\{-\Big(\frac{\mu^2}{\nu}\Big)^{-s}\Big\}.$$
Assume that $\tau_j$ is not a change point, i.e., $\mu_{j0} = 0$. Then
$$\frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}} = O_p\{\exp(-n_j^{s/(s+1)})\}.$$

Proof: We first write
$$\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu = c\int\exp\{U_j(\mu) - \mu^{-2s}\nu^s - (q+1)\log(\mu)\}\,d\mu,$$
for a constant $c$. Let $h(\mu) = U_j(\mu) - \mu^{-2s}\nu^s - (q+1)\log(\mu)$, and assume $\tilde\mu$ is the maximizer of $h(\mu)$; then we have
$$U_j'(\tilde\mu) + 2s\tilde\mu^{-2s-1}\nu^s - (q+1)\tilde\mu^{-1} = U_j''(\mu^*)(\tilde\mu - \hat\mu) + 2s\tilde\mu^{-2s-1}\nu^s - (q+1)\tilde\mu^{-1} = 0,$$
where $\mu^*$ is on the line segment between $\tilde\mu$ and $\hat\mu$. The above equality yields
$$n_j\tilde\mu^{2s+2}(1 - \hat\mu/\tilde\mu) = \frac{2s\nu^s - (q+1)\tilde\mu^{2s}}{-U_j''(\mu^*)/n_j}, \qquad (10)$$
which implies $\tilde\mu = O_p(n_j^{-1/(2s+2)})$. From (10), we have $n_j\tilde\mu^{2s+1}(\tilde\mu - \hat\mu) = O_p(1)$, which leads to
$$\tilde\mu - \hat\mu = O_p\{n_j^{-(4s+3)/(2s+2)}\}. \qquad (11)$$
Following (30) in [11] and using our notation, we obtain
$$\int\exp\{h(\mu)\}\,d\mu = O_p\Big[\{(4s^2+2s)\tilde\mu^{-(2s+2)} - U_j''(\tilde\mu)\}^{-1/2}|\tilde\mu|^{-q-1}\exp\{-\tilde\mu^{-2s}\nu^s + U_j(\tilde\mu)\}\Big].$$
Expanding $U_j(\tilde\mu)$ around the least squares estimator $\hat\mu$, we have
$$U_j(\tilde\mu) = U_j(\hat\mu) + \tfrac{1}{2}U_j''\,(\tilde\mu - \hat\mu)^2 = U_j(\hat\mu) + o_p(1) = U_j(0) + O_p(1),$$
where the second equality follows from (11), and the last equality follows from the same argument as that leading to (9). Therefore, we have
$$\frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}} = \frac{\int\exp\{U_j(\mu)\}\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = \frac{c\int\exp\{h(\mu)\}\,d\mu}{\exp\{U_j(0)\}}$$
$$= O_p\Big[\{(4s^2+2s)\tilde\mu^{-(2s+2)} - U_j''(\tilde\mu)\}^{-1/2}|\tilde\mu|^{-q-1}\exp\{-\tilde\mu^{-2s}\nu^s + U_j(\tilde\mu) - U_j(0)\}\Big] = O_p\{\exp(-n_j^{s/(s+1)})\}.$$
This proves the result.

Lemma 5. Assume $p_0 = 1$ and $\tau_k$ is the only true change point. As $n_k^{1/2}\delta/\sigma \to \infty$, $\Pr(M_k\mid\mathbf Y_n) = 1 - O_p\{K_n a_{n_I}\exp(-n_I\delta^2)\}$. Hence, when $n_I/\log(n) \to c$, $0 < c \le \infty$, and $n_I \le \lambda$, we have $\Pr(M_k\mid\mathbf Y_n) \overset{p}{\to} 1$.

Proof: First we can write
$$\Pr(M_k\mid\mathbf Y_n) = \Big\{1 + \sum_{j\ne k}^{K_n}\frac{\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)}\Big\}^{-1}. \qquad (12)$$
To show $\Pr(M_k\mid\mathbf Y_n) - 1 \to 0$, it is equivalent to show
$$\sum_{j=1, j\ne k}^{K_n}\frac{\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)} \to 0.$$
Note that
$$\frac{\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)} = \frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}}\times\frac{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k})^2\}}{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k} - \mu)^2\}\pi(\mu)\,d\mu} = AB,$$
where
$$A = \frac{\int\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j} - \mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l - \bar Y_{\tau_j})^2\}}$$
and
$$B = \frac{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k})^2\}}{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k} - \mu)^2\}\pi(\mu)\,d\mu}.$$
As shown in [11], $A$ is a Bayes factor whose convergence rate is $O_p(a_{n_j})$. For $B$, first note that the data in $[\tau_k, \tau_{k+1})$ are generated from the model with mean $\mu_{k0}$ such that $|\mu_{k0}| > \delta$. Hence, we have
$$B = O_p\{\exp(-n_k\delta^2)\},$$
which holds by Lemma 1, namely
$$\lim_{n\to\infty}\Pr\left[\frac{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k})^2\}}{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l - \bar Y_{\tau_k} - \mu)^2\}\pi(\mu)\,d\mu} < \exp(-Dn_k\delta^2)\right] = 1,$$
where $D$ is a constant. Combining the convergence rates for $A$ and $B$, we have
$$AB = O_p\{a_{n_j}\exp(-n_k\delta^2)\}. \qquad (13)$$
Thus,
$$\frac{\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)} = AB = O_p\{a_{n_I}\exp(-n_I\delta^2)\}, \qquad (14)$$
and
$$\sum_{j=1}^{K_n}\frac{\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)} - 1 = O_p\{K_n a_{n_I}\exp(-n_I\delta^2)\}.$$
Plugging this result into (12), we have
$$\Pr(M_k\mid\mathbf Y_n) \overset{p}{\to} 1,$$
which completes the proof.

Proof of Theorem 1

First we can write
$$\sum_{M_k\in\mathcal M}\Pr(M_k\mid\mathbf Y_n) = \bigg\{1 + \frac{\sum_{M_j\notin\mathcal M}\Pr(\mathbf Y_n\mid M_j)}{\sum_{M_k\in\mathcal M}\Pr(\mathbf Y_n\mid M_k)}\bigg\}^{-1}. \qquad (15)$$
Note that
$$\frac{\sum_{M_j\notin\mathcal M}\Pr(\mathbf Y_n\mid M_j)}{\sum_{M_k\in\mathcal M}\Pr(\mathbf Y_n\mid M_k)} \le \frac{\sum_{M_j\notin\mathcal M}\Pr(\mathbf Y_n\mid M_j)}{\Pr(\mathbf Y_n\mid M_k)},$$
for any $M_k\in\mathcal M$. Hence, by the same argument as that leading to Lemma 5, we have
$$\frac{\sum_{M_j\notin\mathcal M}\Pr(\mathbf Y_n\mid M_j)}{\sum_{M_k\in\mathcal M}\Pr(\mathbf Y_n\mid M_k)} = O_p\{K_n a_{n_I}\exp(-n_I\delta^2)\},$$
and
$$\sum_{M_k\in\mathcal M}\Pr(M_k\mid\mathbf Y_n) = 1 - O_p\{K_n a_{n_I}\exp(-n_I\delta^2)\}.$$
This proves the result.

Proof of Proposition 1

Following [15], we define $x$ as an $n_I$-flat point if there is no change point in $(x - n_I, x + n_I)$. Let $F$ be the set of all $n_I$-flat points; then
$$\Pr\Big\{\Big(\bigcap_{t\in T_0(p_0)}\{R_t > C\}\Big)\bigcap\Big(\bigcap_{\tau\in F}\{R_\tau < C\}\Big)\Big\}
= 1 - \Pr\Big\{\Big(\bigcup_{\tau\in T_0(p_0)}\overline{\{R_\tau > C\}}\Big)\bigcup\Big(\bigcup_{\tau\in F}\overline{\{R_\tau < C\}}\Big)\Big\}$$
$$= 1 - \Pr\Big\{\overline{\Big\{\min_{t\in T_0(p_0)}R_t > C\Big\}}\bigcup\overline{\Big\{\max_{\tau\in F}R_\tau < C\Big\}}\Big\}
\ge 1 - \Big[\Pr\Big(\overline{\min_{t\in T_0(p_0)}R_t > C}\Big) + \Pr\Big(\overline{\max_{\tau\in F}R_\tau < C}\Big)\Big].$$
For each $\tau\in T_0(p_0)$,
$$\Pr(R_\tau < C) = O\{\exp(-n_I\delta^2)\},$$
which holds by Lemma 1. Furthermore, for $\tau\in F$,
$$\Pr(R_\tau > C) = O(a_{n_I}),$$
which holds by Lemmas 2 to 4. Hence,
$$\Pr\Big\{\Big(\bigcap_{t\in T_0(p_0)}\{R_t > C\}\Big)\bigcap\Big(\bigcap_{\tau\in F}\{R_\tau < C\}\Big)\Big\} \ge 1 - O[\min\{\exp(-n_I\delta^2), a_{n_I}\}].$$
By Lemma 3 in [15], for any $t\in T_0(p_0)$ we have a $\tau\in H_c(n_I)$ such that $\Pr\{t\in(\tau - n_I, \tau + n_I)\} = 1 - O[\min\{\exp(-n_I\delta^2), a_{n_I}\}]$.

Proof of Theorem 2

We first show that, for a given $p$, $\hat T(p)$ is the maximizer of $\Pr\{\mathbf Y_n\mid T(p)\}$. Based on the BMS procedure, $\hat T(p)$ is the maximizer of $\sum_{M_k\in\mathcal M}\Pr(M_k\mid\mathbf Y_n)$, where $\mathcal M = \{M_k : \tau_k\in T(p)\}$. Since we impose the uniform prior on the $M_k$, $\hat T(p)$ is the maximizer of
$$\sum_{M_k\in\mathcal M}\Pr(\mathbf Y_n\mid M_k) \qquad (16)$$
$$= D_n\sum_{M_k\in\mathcal M}\prod_{j=1,j\ne k}^{K_n}\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l-\bar Y_{\tau_j})^2\}\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu$$
$$= D_n\prod_{j=1}^{K_n}\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l-\bar Y_{\tau_j})^2\}\sum_{M_k\in\mathcal M}\frac{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}},$$
where $D_n$ is a constant depending on $n$. Further note that
$$\Pr\{\mathbf Y_n\mid T(p)\} \qquad (17)$$
$$= \prod_{\tau_j\notin T(p)}\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l-\bar Y_{\tau_j})^2\}\prod_{\tau_k\in T(p)}\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu$$
$$= \prod_{j=1}^{K_n}\prod_{l=\tau_j}^{\tau_{j+1}-1}\exp\{-(Y_l-\bar Y_{\tau_j})^2\}\prod_{\tau_k\in T(p)}\frac{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}}.$$
Comparing (17) and (16), clearly they have the same optimizer, and thus $\hat T(p)$ is the maximizer of (17). Hence, our BMS procedure results in the estimators $\hat p$ and $\hat T(\hat p)$ that maximize $\Pr\{\mathbf Y_n\mid T(p)\}$.

Next, let $E_1$ be the event that there is at least one $j$ such that $t_j\in(\tau_k,\tau_{k+1})$ while $\hat t_i\ne\tau_k$ and $\hat t_i\ne\tau_{k+1}$ for all $i$, where $\tau_k,\tau_{k+1}\in H_c(n_I)$ and $\hat t_i\in\hat T(\hat p)$, the maximizer of $\Pr\{\mathbf Y_n\mid T(p)\}$. Following arguments similar to those in [5], we show that the probability of $E_1$ goes to 0. Suppose that $\hat T(\hat p)$ is such an estimate. Consider the first case where $(t_j-\tau_k+1)(\tau_{k+1}-\tau_k+1)^{-1} = O(1)$; that is, $t_j$ is bounded away from $\tau_k$. Then, we can choose a set of change points
$$\tilde T(\hat p+1) = \{\tilde\tau_1,\ldots,\tilde\tau_{\hat p+1}\} = \{\hat t_1,\ldots,\hat t_i,\tau_{k+1},\hat t_{i+1},\ldots,\hat t_{\hat p}\}.$$
Then
$$\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\}} = \frac{\prod_{l=\tau_{k+1}}^{\tau_{k+2}-1}\exp\{-(Y_l-\bar Y_{\tau_{k+1}})^2\}}{\int\prod_{l=\tau_{k+1}}^{\tau_{k+2}-1}\exp\{-(Y_l-\bar Y_{\tau_{k+1}}-\mu)^2\}\pi(\mu)\,d\mu}.$$
Because $\limsup_{n\to\infty} n_I/\lambda < 1/2$, there is an $N$ such that for all $n > N$, $t_{j+1} > \tau_{k+2}$, and hence there is no change point within $(\tau_{k+1},\tau_{k+2})$. This prevents the situation where there is more than one change point between $\tau_k$ and $\tau_{k+2}$. Further, for $n > N$,
$$E(Y_l-\bar Y_{\tau_{k+1}}) = (\tau_{k+1}-\tau_k+1)^{-1}E\Big\{\sum_{s=\tau_k}^{t_j}(Y_l-Y_s)+\sum_{s=t_j+1}^{\tau_{k+1}}(Y_l-Y_s)\Big\} \ge (t_j-\tau_k+1)(\tau_{k+1}-\tau_k+1)^{-1}\delta.$$
Therefore, by Lemma 1,
$$\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\}} = O_p\{\exp(-n_I\delta^2)\}.$$
If $(t_j-\tau_k+1)(\tau_{k+1}-\tau_k+1)^{-1} = o(1)$, we define
$$\tilde T(\hat p+1) = \{\tilde\tau_1,\ldots,\tilde\tau_{\hat p+1}\} = \{\hat t_1,\ldots,\hat t_i,\tau_k,\hat t_{i+1},\ldots,\hat t_{\hat p}\},$$
and then
$$\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\}} = \frac{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}}{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}$$
$$= \frac{\prod_{l=t_j}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}}{\int\prod_{l=t_j}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}\times\frac{\int\prod_{l=t_j}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}{\int\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k}-\mu)^2\}\pi(\mu)\,d\mu}\times\frac{\prod_{l=\tau_k}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}}{\prod_{l=t_j}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\tau_k})^2\}}.$$
The first term in the last equation is of order $O_p\{\exp(-n_I\delta^2)\}$ by Lemma 1, and the last two terms are of order $O_p(1)$ because $(t_j-\tau_k+1)(\tau_{k+1}-\tau_k+1)^{-1} = o(1)$. Therefore,
$$\Pr\bigg[\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\}} > 1\,\Big|\,E_1\bigg] \le E\bigg[\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\}}\bigg] = O\{\exp(-n_I\delta^2)\}.$$
As $\hat T(\hat p)$ is the maximizer of $\Pr\{\mathbf Y_n\mid T(p)\}$, we have $\Pr\big[\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}/\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\} > 1\big] = 1$, because $\hat T(\hat p)$ is the maximizer and it is unique by condition (A3). By the Bayes rule, we have
$$\Pr\bigg[E_1\,\Big|\,\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p+1)\}} > 1\bigg] \le O\{\exp(-n_I\delta^2)\}. \qquad (18)$$
Hence, $\Pr(\hat t_i = \tau_k\ \text{or}\ \hat t_i = \tau_{k+1}) = 1 - O\{\exp(-n_I\delta^2)\}$. Further note that $\tau_k$ and $\tau_{k+1}$ are in the $n_I$-neighborhood of $t_j$, and thus for any $t_j$ there is a $\hat t_i$ such that
$$\Pr\{t_j\in(\hat t_i-n_I,\hat t_i+n_I)\} = 1 - O\{\exp(-n_I\delta^2)\}.$$
Since this holds for any $j$, we can write
$$\Pr\Big\{\sup_{\hat t_j\in\hat T(\hat p)}\inf_{t_j\in T_0(p_0)}|(\hat t_j-t_j)/n| \le n_I/n\Big\} = 1 - O\{\exp(-n_I\delta^2)\}.$$
Next we show that for any $\hat t_i$ there is a $t_j$ in the $n_I$-neighborhood of $\hat t_i$. Define $E_2$ as the event that there is at least one $\hat t_i$ such that no $t_j$ lies in the $n_I$-neighborhood of $\hat t_i$. Let $\hat T(\hat p)$ be such an estimate, where $\hat t_i$ is the $k$th candidate point and $(\hat t_i,\tau_{k+1})$, $(\tau_{k-1},\hat t_i)$ do not contain $t_j$ for any $j$. Then, we define a new set of change points by deleting $\hat t_i$,
$$\tilde T(\hat p-1) = \{\hat t_1,\ldots,\hat t_{i-1},\hat t_{i+1},\ldots,\hat t_{\hat p}\}.$$
Then,
$$\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p-1)\}} = \frac{\int\prod_{l=\hat t_i}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\hat t_i}-\mu)^2\}\pi(\mu)\,d\mu}{\prod_{l=\hat t_i}^{\tau_{k+1}-1}\exp\{-(Y_l-\bar Y_{\hat t_i})^2\}} = O_p(a_{n_I})$$
by Lemmas 2–4. Therefore, using the same argument as that leading to (18), we have
$$\Pr\bigg[E_2\,\Big|\,\frac{\Pr\{\mathbf Y_n\mid\hat T(\hat p)\}}{\Pr\{\mathbf Y_n\mid\tilde T(\hat p-1)\}} > 1\bigg] = O(a_{n_I}).$$
For any $\hat t_i$, there exists a $t_j$ such that
$$\Pr\{\hat t_i\in(t_j-n_I,t_j+n_I)\} = 1 - O(a_{n_I}).$$
This holds for any $\hat t_i$, and thus we have
$$\Pr\Big\{\sup_{t_j\in T_0(p_0)}\inf_{\hat t_j\in\hat T(\hat p)}|(\hat t_j-t_j)/n| < n_I/n\Big\} = 1 - O(a_{n_I}).$$
Because $(t_j-n_I,t_j+n_I)$ contains only one estimate by the property of $\hat T(\hat p)$ that $|\hat t_{j+1}-\hat t_j| > \lambda$, we have $\Pr(\hat p = p_0) = 1 - O_p[\max\{\exp(-n_I\delta^2),a_{n_I}\}]$ by the same arguments as those leading to Theorem 3.3 in [5].
