INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS
By
CLAUDIO FUENTES
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2011
© 2011 Claudio Fuentes
To my parents, who have been there in every step
ACKNOWLEDGMENTS
I would like to gratefully and sincerely thank Dr. George Casella for his guidance,
understanding and patience during my graduate studies at the University of Florida.
Working with him, as a research assistant and as a student, has been one of the most
rewarding experiences of my life. His wealth of knowledge and experience has shaped
the way I understand statistics today.
I would also like to thank my graduate committee members, Dr. Michael Daniels,
Dr. Malay Ghosh and Dr. Gary Peter, for their understanding and support throughout
the whole process. Their sharp comments and suggestions have greatly improved the
quality of this work.
I am deeply grateful to all my teachers and professors, in particular those at
the University of Florida and the Pontificia Universidad Catolica de Chile. It is not an
exaggeration to say that almost everything I know today is the product of their dedication
and excellence in teaching. Without any doubt, they taught me more than I could
learn. Thank you, Dr. Alvaro Cofre; I would not be here writing these lines if it were not for
your constant support and inspiration.
Finally, I would like to thank my parents, Jorge Fuentes and Edith Melendez. It is
because of their unconditional love and support that I have been able to come this far.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Two Formulations of the Problem
  1.2 Inference on the Selected Mean

2 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION
  2.1 The Known Variance Case
  2.2 The Unknown Variance Case
  2.3 Numerical Studies
  2.4 Tables and Figures

3 CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS
  3.1 An Alternative Approach
  3.2 Numerical Studies
  3.3 Tables and Figures

4 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF A RANDOM NUMBER OF POPULATIONS
  4.1 Connection to FDR
  4.2 Tables and Figures

5 APPLICATION EXAMPLE
  5.1 Fixed Selection
  5.2 Random Selection
  5.3 Tables and Figures

6 CONCLUSIONS

LIST OF REFERENCES

BIOGRAPHICAL SKETCH
LIST OF TABLES

2-1 Configuration of the new parameterization for the coverage probability
2-2 Configuration of the new parameterization for the case p = 3
2-3 Representation of the parameters ∆i,j when p = k + 1
2-4 Coverage probability of 95% CI for the selected mean when p = 4
3-1 Structure of the ∆'s for the case p = 4, k = 2
3-2 Coverage probabilities for the number of population means vs the number of selected populations
3-3 Observed confidence coefficient for 95% CI when p = 6
3-4 Cutoff points for 95% CI using the new method
5-1 Confidence intervals for fixed top log-score differences
5-2 Confidence intervals for random top log-score differences
LIST OF FIGURES

2-1 Coverage probability as a function of ∆21 and ∆32 when p = 3
2-2 Plot of ∂h/∂∆21 when p = 3
2-3 Plots of the first two terms of ∂h/∂∆21
2-4 Confidence coefficient vs the number of populations for the iid case and α = 0.05
2-5 Cutoff point versus number of populations for the iid case and α = 0.05
3-1 Coverage probabilities as a function of ∆ when p = 6
4-1 Individual components of the coverage probability for random K
4-2 Lower bound for random K varying the selection probability
4-3 Coverage probabilities for random K for different values of p
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS
By
Claudio Fuentes
August 2011
Chair: Dr. George Casella
Major: Statistics
Consider an experiment in which p independent populations πi, with corresponding
unknown means θi, are available, and suppose that for every 1 ≤ i ≤ p we can obtain
a sample Xi1, ..., Xin from πi. In this context, researchers are sometimes interested
in selecting the populations that give the largest sample means as a result of the
experiment, and in estimating the corresponding population means θi. In this dissertation,
we present a frequentist approach to the problem, based on the minimization of the
coverage probability, and discuss how to construct confidence intervals for the mean
of k ≥ 1 selected populations, assuming the populations πi are normal and have a
common variance σ2. Finally, we extend the results for the case when the value of k
is randomly chosen and discuss the potential connection of the procedure with false
discovery rate analysis. We include numerical studies and a real application example
that corroborate that this new approach produces confidence intervals maintaining the
nominal coverage probability while taking into account the selection procedure.
CHAPTER 1
INTRODUCTION
Given a set of p available technologies (treatments, machines, etc.), researchers
must often determine which one is the best, or simply rank them according to a
pre-specified criterion. For instance, researchers may be interested in determining which
treatment is more efficient in fighting a certain disease, or they could be interested in
ranking a class of vehicles according to a safety standard. Problems of this type are
known as ranking and selection problems, and specific solutions and procedures have
been proposed in the literature since the second half of the 20th century, with a start that
is usually traced back to Bechhofer (1954) and Gupta and Sobel (1957). In his paper,
Bechhofer presents a single sample multiple decision procedure for ranking means of
normal populations. Assuming the variances of the populations are known, he is able
to obtain closed form expressions for the probabilities of a correct ranking in different
scenarios. This approach is more concerned with selection of the population with the
largest mean rather than estimation of that mean. Gupta and co-authors have pioneered
the subset selection approach, in which a subset of populations is selected so that it
contains the population with the largest mean with a guaranteed minimum probability
P∗ (see Gupta and Panchapakesan (2002)); Bechhofer, in contrast, uses an indifference
zone: there is a minimum guaranteed probability of selecting the population with the
largest mean, as long as that mean is separated from the second largest by a specified
distance δ (see Bechhofer et al. (1995)).
1.1 Two Formulations of the Problem
Here we are concerned with estimation, and describe two formulations of this
problem, with subtle differences between them. Suppose that we have p populations,
with unknown means θi (1 ≤ i ≤ p). Assuming that for every 1 ≤ i ≤ p we can obtain a
sample Xi1, ... ,Xini from the population πi , we can either:
1. Select the population that has the largest parameter, max{θ1, ..., θp}, and estimate its value.
2. Select the population with the largest sample mean, and estimate the corresponding θi.
The first of these problems has been widely discussed in the literature. For
example, Blumenthal and Cohen (1968) consider estimating the larger mean from
two normal populations and compare different estimators, but they do not discuss how
to make the selection. In this direction, Guttman and Tiao (1964) propose a Bayesian
procedure consisting of the maximization of the expected posterior utility for a certain
utility function U(θi). In the same direction, but from a frequentist perspective, Saxena
and Tong (1969), Saxena (1976), and Chen and Dudewicz (1976) consider point and
interval estimation of the largest mean.
1.2 Inference on the Selected Mean
Surprisingly, the second problem has received less attention. In this context, a
common and widely used estimator is δ(X) = ∑_{i=1}^{p} Xi I(Xi = X(p)), where X(p)
denotes the largest sample mean. This estimator has been discussed in the literature
and is known to be biased (Putter and Rubinstein (1968)). This issue becomes clear if
we consider all the populations to be identically distributed, for then we would be
estimating the common population mean by an extreme value.
Dahiya (1974) addresses this problem for the case of two normal populations and
proposes estimators that perform better in terms of the MSE. Progress was made by
Cohen and Sackrowitz (1982), Cohen and Sackrowitz (1986) and Gupta and Miescke
(1990), where Bayes and generalized Bayes rules were obtained and studied. However,
performance theorems are scarce. One exception is Hwang (1993), who proposes an
empirical Bayes estimator and shows that it performs better in terms of the Bayes risk
with respect to any normal prior. Another exception is Sackrowitz and Samuel-Cahn
(1984) who, in the case of the negative exponential distribution, find UMVUE and
minimax estimators of the mean of the selected population.
The problem of improving the intuitive estimator is technically difficult. In addition,
despite the obvious bias problem, it has been difficult to establish its optimality
properties. Standard investigations in admissibility and minimaxity, following ideas
such as Berger (1976), Brown (1979) and Lele (1993) are not straightforward. In
this direction, Stein (1964) established the minimaxity and admissibility of the naive
estimator for k = 2. Minimaxity for the general case was established later by Sackrowitz
and Samuel-Cahn (1986), where they discussed the normal case for k ≥ 3. Admissibility
for the general case appears to still be open.
Interval estimation is equally challenging, and again little can be found in the
literature. Typically, confidence intervals are constructed in the usual way,
using the standard normal distribution as a reference to attain the desired coverage
probability. However, these intervals do not maintain the nominal coverage probability as
the number of populations increases.
Qiu and Hwang (2007) propose an empirical Bayes approach to construct
simultaneous confidence intervals for K selected means; we are not aware of
any other attempts to solve this problem. In their paper, Qiu and Hwang consider a
normal-normal model for the mean of the selected population, which assumes that
each population mean θi follows a normal distribution. Under these assumptions they
are able to construct simultaneous confidence intervals that maintain the nominal
coverage probability and are substantially shorter than the intervals constructed
using Bonferroni bounds. However, the confidence intervals they propose are only
asymptotically optimal, and since their coverage probabilities are obtained by averaging
over both the sample space and the prior, they do not give a valid frequentist interval.
Recently, a modern variation of this problem has become very popular, with a
major reason being the explosion of genomic data, calling for the development of
new methodologies. For instance, in genomic studies, looking either for differential
expression or genome wide association, thousands of genes are screened, but only
a smaller number are selected for further study. Consequently, the assessment of
significance, through testing or interval estimation, must take this selection mechanism
into account. If the usual confidence intervals are used (not accounting for selection), the
actual confidence coefficient is smaller than the nominal level and approaches zero as
the number of genes (populations) increases.
In this dissertation, we address the problem of interval estimation and present
a frequentist approach to construct confidence intervals for the means of the selected
populations, where the selection mechanisms are properly described in the corresponding
chapters. In Chapter 2 we focus on the problem of selecting one population. In Chapter
3 we introduce a novel methodology to produce confidence intervals when selecting
k > 1 populations, where k is a fixed and known number. Later, in Chapter 4 we extend
the results to the case when k is a random quantity. In Chapter 5 we illustrate the
methods with a real data example, and finally, in Chapter 6 we discuss the main
conclusions and possible extensions of the results presented in this dissertation.
CHAPTER 2
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION
For 1 ≤ i ≤ p, let Xi1, ..., Xin be a random sample from a population πi with unknown
mean θi and variance σ². Assume the populations πi are independent and normally
distributed, so that the sample mean Xi = n^{-1} ∑_{j=1}^{n} Xij ∼ N(θi, σ²/n) for i = 1, ..., p, and
define the order statistics X(1), ..., X(p) as the sample means placed in descending order.
In other words, the order statistics satisfy X(1) ≥ ... ≥ X(p). In this context, we want
to construct confidence intervals for the mean of the population that gives the largest
sample mean as a result of the experiment.

Formally, if we define θ(1) = ∑_{i=1}^{p} θi I(Xi = X(1)), our aim is to produce confidence
intervals for θ(1), based on X(1), such that the confidence coefficient is at least 1 − α, for
any 0 < α < 1 specified prior to the experiment.
It is not difficult to realize that the standard confidence intervals do not maintain
the nominal coverage probability. For instance, if all the populations πi are normally
distributed with mean θ and variance 1, then, for samples of size n = 1, we have
X1, ..., Xp ∼ iid N(θ, 1). It follows that P(X(1) ≤ x) = Φ^p(x − θ), where Φ(·) denotes
the cdf of the standard normal distribution. Moreover, the mean of the selected population
is θ(1) = θ, and hence

P(θ(1) ∈ X(1) ± c) = Φ^p(c) − Φ^p(−c),

for any value of c > 0.

In particular, when p = 3, we obtain

P(θ(1) ∈ X(1) ± c) = Φ^3(c) − Φ^3(−c)
  = (Φ(c) − Φ(−c)) (Φ^2(c) + Φ(c)Φ(−c) + Φ^2(−c))
  = (2Φ(c) − 1) (1 − Φ(c) + Φ^2(c)).
Since 1 − Φ(c) + Φ^2(c) < 1, the coverage of the standard confidence interval falls below
the nominal level 2Φ(c) − 1. In fact, it is easy to show that the coverage probability
maintains the nominal level only for p = 1 and 2, and then decreases towards zero as p
goes to infinity.
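The rate of this decay is easy to check numerically. The following sketch (an illustration we add here, not part of the original text) evaluates Φ^p(c) − Φ^p(−c) at the standard cutoff c = 1.96:

```python
from math import erf, sqrt

def Phi(x):
    # cdf of the standard normal distribution, via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def naive_coverage(p, c=1.96):
    # coverage Phi^p(c) - Phi^p(-c) of the standard interval X(1) +/- c
    # when all p population means are equal (the iid case)
    return Phi(c) ** p - Phi(-c) ** p

for p in (1, 2, 5, 10, 30):
    print(p, round(naive_coverage(p), 4))
```

For p = 1 and p = 2 the coverage is essentially the nominal 0.95, and it then drops quickly, falling below 0.50 by p = 30, in line with the discussion above.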
The problem is that the traditional intervals do not take into account the selection
mechanism. Thus, in order to construct confidence intervals that maintain the nominal
level we must take into account the selection procedure. To this end, we first consider
the partition of the sample space induced by the order statistics and write
P(θ(1) ∈ X(1) ± c) = ∑_{i=1}^{p} P(θi ∈ Xi ± c, Xi = X(1)).   (2–1)
Observe that each term in the sum (2–1) can be explicitly determined using the joint
distribution of (X1, ... ,Xp). For example, when i = 1 (the first term of the sum), we have
P(θ1 ∈ X1 ± c ,X1 = X(1)) = P(θ1 ∈ X1 ± c ,X1 ≥ X2, ... ,X1 ≥ Xp). (2–2)
In the next section we derive a closed form expression for the coverage probability in
(2–1), assuming the population variance σ2 is known, and present a new approach to
obtain the desired confidence intervals.
2.1 The Known Variance Case
Suppose the population variance σ² is known and define Zj = √n(Xj − θj)/σ for
j = 1, ..., p. It follows that Z1, ..., Zp ∼ iid N(0, 1) and

X1 ≥ Xj ⇔ √n(X1 − θ1)/σ ≥ √n(Xj − θj + θj − θ1)/σ
        ⇔ Z1 ≥ Zj + ∆j1
        ⇔ Z1 − Zj ≥ ∆j1,

where ∆j1 = √n(θj − θ1)/σ for j = 1, ..., p.
At this point, to simplify the notation, we take n = σ² = 1. Then, if we consider the
transformation

T : z = z1, ω2 = z1 − z2, ..., ωp = z1 − zp,

we can rewrite (2–2) in terms of ∆21, ..., ∆p1 and obtain

P(θ1 ∈ X1 ± c, X1 ≥ X2, ..., X1 ≥ Xp) = P(|z| ≤ c, ω2 ≥ ∆21, ..., ωp ≥ ∆p1)
  = (1/(2π)^{p/2}) ∫_{−c}^{c} { ∏_{j=2}^{p} ∫_{∆j1}^{∞} e^{−(ωj−z)²/2} dωj } e^{−z²/2} dz.
Notice that for fixed z, the integrals within the curly brackets { } are essentially the
tail probability of a normal distribution centered at z. Therefore, we can write

P(|z| ≤ c, ω2 ≥ ∆21, ..., ωp ≥ ∆p1) = ∫_{−c}^{c} { ∏_{j=2}^{p} Φ(z − ∆j1) } φ(z) dz,
where φ(·) denotes the pdf of the standard normal distribution.
Of course, the same argument is valid for the remaining terms of the sum in (2–1).
It follows that we can fully describe the probability P(θ(1) ∈ X(1) ± c) in terms of a new
set of parameters ∆ij ’s, where ∆ij = θi − θj for 1 ≤ i , j ≤ p. Under this representation,
for every c > 0, the value of the coverage probability P(θ(1) ∈ X(1) ± c) is determined
by the relative distances between the population means θi, i = 1, ..., p. In other
words, the coverage probability defines a function hc(∆) = P(θ(1) ∈ X(1) ± c), where
∆ = (∆11, ∆12, ..., ∆pp) is the vector of possible configurations of the relative distances
∆ij's.
In this context, we can obtain confidence intervals for θ(1) that have (at least) the
right nominal level by first minimizing the function hc. Specifically, given 0 < α < 1, we
can determine the value of c > 0 that satisfies

P(θ(1) ∈ X(1) ± c) ≥ min_∆ hc(∆) = 1 − α.   (2–3)
In order to minimize the function hc , we first notice the following properties of the
parameters ∆ij ’s:
1. ∆jj = 0, for every j.
2. ∆ij = −∆ji, for every i, j.
3. For j > k, ∆jk = ∆j,j−1 + ∆j−1,j−2 + ... + ∆k+1,k.
These properties reveal a certain underlying symmetry in the structure of the
problem. This symmetry is portrayed in Table 2-1 where every entry ∆ij corresponds to
the difference between the values of θi and θj located in row i and column j respectively.
In addition, Property 3 indicates that we only need to consider p − 1 parameters in
order to determine the value of P(θ(1) ∈ X(1) ± c). In fact, for any given ordering of the
parameters θi ’s, we can always choose a representation of the probability in (2–1) based
on p − 1 parameters ∆ij . As a result, we have that the true ordering of the population
means θi ’s is not particularly relevant in this approach, and hence, we will assume
(without any loss of generality) that θ1 ≥ θ2 ≥ ... ≥ θp.
Although the introduction of the new parameterization seems to reduce (in a sense)
the complexity of the problem, the minimization of hc is still difficult: first, because of
the delicate balance existing between the ∆ij's in the full expression (see Table 2-1), and
second, because the formula for the coverage probability is somewhat involved.
To illustrate these problems, let us discuss the case p = 2. We have
P(θ(1) ∈ X(1) ± c) = ∫_{−c}^{c} Φ(z − ∆12) φ(z) dz + ∫_{−c}^{c} Φ(z + ∆12) φ(z) dz
  = ∫_{−c}^{c} [Φ(z − ∆12) + Φ(z + ∆12)] φ(z) dz,

where ∆12 > 0.
Since only the quantity in brackets [ ] depends on ∆12 and φ(z) > 0, it seems
reasonable to think that hc(∆12) = P(θ(1) ∈ X(1) ± c) is minimized at the same point
where gz(∆12) = Φ(z − ∆12) + Φ(z + ∆12) attains its minimum. However, differentiating gz
with respect to ∆12 we obtain

dgz/d∆12 = φ(z + ∆12) − φ(z − ∆12),

which is ≥ 0 when z ≤ 0 and < 0 when z > 0. We observe that the value of the derivative
depends on ∆12 and z, and consequently, the minimum of hc cannot be determined by
simple examination of the behavior of gz.
From the analysis of g′z , we conclude that gz(∆12) is minimized at ∆12 = 0, when
z ≤ 0 and (asymptotically) at ∆12 = +∞, when z > 0. Then, we can establish the
inequality

P(θ(1) ∈ X(1) ± c) ≥ ∫_{−c}^{0} 2Φ(z) φ(z) dz + ∫_{0}^{c} φ(z) dz;
however, this lower bound is not obtained by direct minimization of the coverage
probability and is less appealing. The problem is that a strategy based on this type
of lower bounds may be too conservative and lead to extremely wide intervals when
applied to higher dimensions (p > 2).
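A direct Monte Carlo check makes the situation concrete. The sketch below (an illustration we add; the simulation size, seed, and cutoff c = 1.96 are arbitrary choices) estimates P(θ(1) ∈ X(1) ± c) for a given configuration of means, with n = σ = 1:

```python
import numpy as np

def coverage_mc(theta, c=1.96, reps=200_000, seed=0):
    # Monte Carlo estimate of P(theta_(1) in X_(1) +/- c), taking n = sigma = 1
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    x = rng.normal(theta, 1.0, size=(reps, theta.size))
    sel = np.argmax(x, axis=1)                       # index of the selected population
    covered = np.abs(x[np.arange(reps), sel] - theta[sel]) <= c
    return covered.mean()

# equal means (the least favorable case) vs well-separated means
print(coverage_mc([0.0] * 5))
print(coverage_mc([0.0, 5.0, 10.0, 15.0, 20.0]))
```

With five equal means the estimated coverage is near 0.88, well below the nominal 0.95; with well-separated means the selection is essentially always correct and the coverage is close to 0.95, which corroborates that the equal-means configuration is the troublesome one.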
In order to find a formal solution to the minimization problem, we start with the case
p = 3. For this case, we can fully describe the probability of interest in terms of the two
parameters ∆12 and ∆23, as is shown in Table 2-2. We obtain

P(θ(1) ∈ X(1) ± c) = (1/√(2π)) ∫_{−c}^{c} Φ(z − ∆12) Φ(z − ∆23 − ∆12) e^{−z²/2} dz   (2–4)
  + (1/√(2π)) ∫_{−c}^{c} Φ(z + ∆12) Φ(z − ∆23) e^{−z²/2} dz
  + (1/√(2π)) ∫_{−c}^{c} Φ(z + ∆23) Φ(z + ∆23 + ∆12) e^{−z²/2} dz,

where ∆12, ∆23 ≥ 0 and Φ(·) denotes the cdf of the standard normal distribution.
Preliminary studies suggest that the global minimum of hc(∆12, ∆23) = P(θ(1) ∈ X(1) ± c)
is located at the origin (see Figure 2-1), but a formal proof is required. To this
end, it is sufficient to show that ∂hc/∂∆23 > 0 and ∂hc/∂∆12 > 0.
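As numerical corroboration, the surface in Figure 2-1 can be reproduced by evaluating (2–4) directly. The sketch below (our own illustration; the midpoint rule and the grid of ∆ values are arbitrary implementation choices) computes hc(∆12, ∆23) and checks that the value at the origin, Φ^3(c) − Φ^3(−c), is the smallest over the grid:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def h(c, d12, d23, m=4000):
    # midpoint-rule evaluation of the three integrals in (2-4)
    step = 2.0 * c / m
    total = 0.0
    for i in range(m):
        z = -c + (i + 0.5) * step
        total += (
            Phi(z - d12) * Phi(z - d23 - d12)
            + Phi(z + d12) * Phi(z - d23)
            + Phi(z + d23) * Phi(z + d23 + d12)
        ) * phi(z)
    return step * total

h0 = h(1.96, 0.0, 0.0)   # all three means equal
print(round(h0, 4))
```

At the origin the value agrees with Φ^3(c) − Φ^3(−c), and every other grid point gives a larger coverage, consistent with the minimum being at ∆12 = ∆23 = 0.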
Taking partial derivatives with respect to ∆12, we obtain

∂hc/∂∆12 = (1/2π) ∫_{−c}^{c} Φ(z + ∆23) e^{−(∆23+∆12+z)²/2 − z²/2} dz   (2–5)
  − (1/2π) ∫_{−c}^{c} Φ(z − ∆12) e^{−(∆23+∆12−z)²/2 − z²/2} dz
  + (1/2π) ∫_{−c}^{c} Φ(z − ∆23) e^{−(∆12+z)²/2 − z²/2} dz
  − (1/2π) ∫_{−c}^{c} Φ(z − ∆23 − ∆12) e^{−(∆12−z)²/2 − z²/2} dz.
Since the partial derivative depends on both ∆12 and ∆23, the behavior of its sign
is not obvious, but different numerical studies support the idea that the derivative is
non-negative. Figure 2-2 shows the plot of the integrand of ∂hc/∂∆12 for fixed values of
∆12 and ∆23.
Notice that if we group the first two terms and the last two terms of (2–5), we can
look at the partial derivative as the sum of two differences. In Figure 2-3 we observe (in
separate plots) the integrands of the first two terms of the partial derivative ∂hc/∂∆12, for
fixed values of ∆12 and ∆23. The plots suggest that the integrands differ only by a location
parameter. In fact, changing variables, we can rewrite the expression in (2–5) as
∂hc/∂∆12 = D1 + D2,   (2–6)

where

D1 = (1/2π) { ∫_{∆23+∆12−c}^{∆23+∆12+c} − ∫_{−c}^{c} } Φ(z − ∆12) e^{−(∆23+∆12−z)²/2 − z²/2} dz,

D2 = (1/2π) { ∫_{∆12−c}^{∆12+c} − ∫_{−c}^{c} } Φ(z − ∆23 − ∆12) e^{−(∆12−z)²/2 − z²/2} dz.
Recall that ∆12 > 0. Then, looking at D2, we have two possibilities for the intervals of
integration:
1. −c < ∆12 − c < c < ∆12 + c.
2. −c < c < ∆12 − c < ∆12 + c.
In other words, the intervals may or may not overlap. Denoting by R1 and R2 the
non-common regions of integration, that is,
• R1 = (−c, ∆12 − c) and R2 = (c, ∆12 + c) for case (1),
• R1 = (−c, c) and R2 = (∆12 − c, ∆12 + c) for case (2),
we have that D2 is guaranteed to be positive as long as the integral over R2 is greater
than the integral over R1, regardless of the case.
We first notice that R1 and R2 are intervals of the same length. In fact, ℓ(R1) =
ℓ(R2) = ∆12 for case (1), and ℓ(R1) = ℓ(R2) = 2c for case (2). Then, we only need to
show that for any two points z1 ∈ R1 and z2 ∈ R2, located at a certain distance ε > 0 from
the extremes of the corresponding intervals, the integrand evaluated at z2 is greater than
the integrand evaluated at z1.
Observe that for any z1 < z2,

[Φ(z2 − ∆23 − ∆12) e^{z2∆12 − z2²}] / [Φ(z1 − ∆23 − ∆12) e^{z1∆12 − z1²}]
  = q × exp{(z2 − z1)[∆12 − (z2 + z1)]},   (2–7)

where q = Φ(z2 − ∆23 − ∆12)/Φ(z1 − ∆23 − ∆12) > 1.

Then, for any 0 < ε < min{∆12, 2c}, take z1 = ∆12 − c − ε and z2 = c + ε whenever
min{∆12, 2c} = ∆12 (i.e., case 1), and z1 = c − ε and z2 = ∆12 − c + ε whenever
min{∆12, 2c} = 2c (i.e., case 2). Replacing these values in (2–7), we obtain that the ratio
is greater than 1 (regardless of the case), which allows us to conclude that D2 > 0.
Notice that the argument still holds if we replace the cdf Φ(·) by any non-decreasing
function, or if we replace the interval (−c, c) with (−c1, c2), where c1, c2 > 0. In this way,
we obtain the following more general result:

Proposition 2.1. Let ∆1, ∆2, c1, c2 > 0 and let the function f(z, λ) be non-decreasing in
z, where λ is an arbitrary set of parameters. Then,

{ ∫_{∆1−c1}^{∆1+c2} − ∫_{−c1}^{c2} } f(z, λ) e^{−(∆1−z)²/2 − z²/2} dz ≥ 0,

where the inequality is strict whenever the function f is monotonically increasing in z.
An immediate consequence of Proposition 2.1 is that D1 > 0. As a result, we obtain
that ∂hc/∂∆12 > 0. A similar argument shows that ∂hc/∂∆23 > 0, completing the proof. It
follows that the coverage probability P(θ(1) ∈ X(1) ± c) is minimized at ∆12 = ∆23 = 0, that
is, whenever θ1 = θ2 = θ3.
Observe that Proposition 2.1 gives a straightforward proof for the case p = 2. In
effect, for hc(∆12) = P(θ(1) ∈ X(1) ± c), we have

dhc/d∆12 = ∫_{∆12−c}^{∆12+c} φ(z − ∆12) φ(z) dz − ∫_{−c}^{c} φ(z − ∆12) φ(z) dz.

Then, applying Proposition 2.1 with f = 1/2π, we obtain that h′c(∆12) ≥ 0. It
immediately follows that the coverage probability is minimized at ∆12 = 0, or equivalently,
when θ1 = θ2.
For the general case (p > 3), we observe that when moving from the case p = k
to the case p = k + 1, we only need to include the extra parameter ∆k+1,k in order to
describe the problem (see Table 2-3). Then, using Proposition 2.1 and mathematical
induction we obtain the following result:
Lemma 1. Let c1, c2 > 0 and for p ≥ 2, let X1, ..., Xp be independent random variables
with Xi ∼ N(θi, 1). Then,

min_{θ1,...,θp} P(θ(1) ∈ (X(1) − c1, X(1) + c2)) = p ∫_{−c1}^{c2} Φ^{p−1}(z) φ(z) dz
  = Φ^p(c2) − Φ^p(−c1),

where Φ(·) and φ(·) are respectively the cdf and pdf of the standard normal distribution.
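The identity in Lemma 1 is easy to verify numerically, since pΦ^{p−1}(z)φ(z) is the derivative of Φ^p(z). The following sketch (our own check, added for illustration, using a simple midpoint rule) compares the integral with the closed form:

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def min_coverage_integral(p, c1, c2, m=4000):
    # p * integral_{-c1}^{c2} Phi^(p-1)(z) phi(z) dz by the midpoint rule
    step = (c1 + c2) / m
    total = 0.0
    for i in range(m):
        z = -c1 + (i + 0.5) * step
        total += p * Phi(z) ** (p - 1) * phi(z)
    return step * total

def min_coverage_closed(p, c1, c2):
    # closed form Phi^p(c2) - Phi^p(-c1)
    return Phi(c2) ** p - Phi(-c1) ** p

for p in (2, 5, 10):
    print(p, round(min_coverage_integral(p, 1.96, 1.96), 6),
          round(min_coverage_closed(p, 1.96, 1.96), 6))
```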
Using this lemma, we can easily obtain the following theorem, which summarizes the
main results of this section. The proof is straightforward.
Theorem 2.1. Let 0 < α < 1 and for i = 1, ..., p, suppose that Xi1, ..., Xin is a random
sample from a N(θi, σ²), where θi is unknown, but σ² is known. Then, a confidence
interval for θ(1) = ∑_{i=1}^{p} θi I(Xi = X(1)) with a confidence coefficient of (at least) 1 − α is
given by

X(1) ± (σ/√n) c,

where the value of c satisfies

Φ^p(c) − Φ^p(−c) = 1 − α.
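Under the theorem, the cutoff c solves a one-dimensional equation whose left-hand side is increasing in c, so a simple bisection suffices. The sketch below (an illustration we add) computes c for several values of p at α = 0.05:

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cutoff(p, alpha=0.05, lo=0.0, hi=10.0, tol=1e-10):
    # solve Phi^p(c) - Phi^p(-c) = 1 - alpha by bisection;
    # the left-hand side is increasing in c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Phi(mid) ** p - Phi(-mid) ** p < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (1, 2, 10, 100, 10_000):
    print(p, round(cutoff(p), 3))
```

For p = 1 (and p = 2) this recovers the usual 1.96, while for p = 10000 the cutoff is only about 4.41, the value mentioned in the numerical studies of this chapter; the slow growth of c with p is already visible.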
2.2 The Unknown Variance Case
If the variance σ² is unknown, we need to estimate its value. We assume that
we have an estimate s² of σ², independent of the Xi's, such that s/σ has a pdf ϕ. In a
regular experiment, where we observe a sample of size n from each population, s² can
be taken as the pooled variance estimate, so that νs²/σ² ∼ χ²_ν, a chi-square distribution
with ν = p(n − 1) degrees of freedom.

Suppose first that p = 3 and for simplicity take n = 1. Then, the coverage probability
can be written as

P(θ(1) ∈ X(1) ± sc) = P(|Z1| ≤ cs/σ, Z1 ≥ Z2 + ∆21, Z1 ≥ Z3 + ∆31)
  + P(Z2 ≥ Z1 + ∆12, |Z2| ≤ cs/σ, Z2 ≥ Z3 + ∆32)
  + P(Z3 ≥ Z1 + ∆13, Z3 ≥ Z2 + ∆23, |Z3| ≤ cs/σ),   (2–8)

where Zi = (Xi − θi)/σ and ∆ij = (θi − θj)/σ for 1 ≤ i, j ≤ 3.

Notice that taking t = s/σ, we can rewrite each term in the sum (2–8) as a mixture.
We obtain

P(θ(1) ∈ X(1) ± sc) = ∫_{0}^{∞} P(|Z1| ≤ ct, Z1 ≥ Z2 + ∆21, Z1 ≥ Z3 + ∆31 | t) ϕ(t) dt
  + ∫_{0}^{∞} P(Z2 ≥ Z1 + ∆12, |Z2| ≤ ct, Z2 ≥ Z3 + ∆32 | t) ϕ(t) dt
  + ∫_{0}^{∞} P(Z3 ≥ Z1 + ∆13, Z3 ≥ Z2 + ∆23, |Z3| ≤ ct | t) ϕ(t) dt,
where ϕ(·) denotes the pdf of t. It follows that

P(θ(1) ∈ X(1) ± sc) = ∫_{0}^{∞} P(θ(1) ∈ X(1) ± tc | t) ϕ(t) dt,
where we know (from Section 2.1) that the probability P(θ(1) ∈ X(1) ± tc |t) in the integral
is minimized at θ1 = θ2 = θ3.
The generalization of this result follows from a direct application of Lemma 1.
Lemma 2. Let c1, c2 > 0 and for p ≥ 2, let X1, ..., Xp be independent random variables
with Xi ∼ N(θi, σ²), where both θi and σ² are unknown. If s² is an estimate of σ²
independent of X1, ..., Xp, then

min_{θ1,...,θp} P(θ(1) ∈ (X(1) − sc1, X(1) + sc2)) = ∫_{0}^{∞} (Φ^p(c2 t) − Φ^p(−c1 t)) ϕ(t) dt,

where ϕ(·) is the pdf of s/σ and Φ(·) is the cdf of the standard normal distribution.
We end this section with the following theorem. The proof follows directly from
Lemma 2.
Theorem 2.2. Let 0 < α < 1 and for i = 1, ..., p, suppose that Xi1, ..., Xin is a random
sample from a N(θi, σ²), where θi and σ² are unknown. Then, a confidence interval for
θ(1) = ∑_{i=1}^{p} θi I(Xi = X(1)) with a confidence coefficient of (at least) 1 − α is given by

X(1) ± (s/√n) c,

where s² = p^{-1} ∑_{i=1}^{p} s_i², with s_i² = (n − 1)^{-1} ∑_{j=1}^{n} (Xij − Xi)² for i = 1, ..., p, and c
satisfies

∫_{0}^{∞} (Φ^p(ct) − Φ^p(−ct)) ϕ(t) dt = 1 − α.
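The cutoff in Theorem 2.2 can be computed the same way once the pdf ϕ of t = s/σ is written down; with the pooled estimate, νs²/σ² ∼ χ²_ν gives an explicit density. The sketch below (our own illustration; the integration grid and the truncation point t_max are implementation choices, adequate for moderate ν) solves for c and, as a check, recovers the usual t cutoff when p = 1:

```python
from math import erf, exp, gamma, log, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pdf_s_over_sigma(t, nu):
    # pdf of t = s/sigma when nu * s^2 / sigma^2 ~ chi-square(nu)
    if t <= 0.0:
        return 0.0
    v = nu * t * t
    log_chi2 = (0.5 * nu - 1.0) * log(v) - 0.5 * v \
        - 0.5 * nu * log(2.0) - log(gamma(0.5 * nu))
    return 2.0 * nu * t * exp(log_chi2)

def coverage(c, p, nu, m=2000, t_max=4.0):
    # integral_0^inf (Phi^p(ct) - Phi^p(-ct)) pdf(t) dt, midpoint rule
    h = t_max / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * h
        total += (Phi(c * t) ** p - Phi(-c * t) ** p) * pdf_s_over_sigma(t, nu)
    return h * total

def cutoff_unknown_var(p, nu, alpha=0.05):
    # bisection: the coverage is increasing in c
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if coverage(mid, p, nu) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(cutoff_unknown_var(1, 12), 3))
print(round(cutoff_unknown_var(4, 12), 3))
```

For p = 1 and ν = 12 the computed cutoff is approximately 2.18, the t_{0.975,12} quantile, which corroborates the mixture representation; for p > 1 the cutoff is larger, as the selection must be accounted for.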
2.3 Numerical Studies
In this chapter, we have proposed a method to construct confidence intervals for the
mean of the selected population that takes into account the selection procedure. In this
section we present some numerical results that compare the performance of the new
and the traditional intervals.
First, we study the behavior of the confidence coefficient as a function of the
number of populations. Results show that the confidence coefficient of the traditional
intervals decreases rapidly as the number of populations increases. This effect is
particularly extreme when all the populations have the same mean. Figure 2-4 shows
the result of simulations considering up to 30 populations with the same mean and
setting α = 0.05. The solid blue line represents the confidence coefficient obtained using
our proposed confidence intervals and the dashed red line depicts the behavior of the
confidence coefficient obtained using the standard confidence intervals. Observe that
the solid line is constant at the nominal level 95%.

Intuitively, in order to keep the coverage probability constant, the confidence
intervals need to get wider. However, this increment is not dramatic and slows down as
the number of populations increases. For instance, if we consider 10000 populations, the
value of the cutoff point is only about 4.41. In fact, from Theorem 2.1 it can be
determined that the cutoff value grows like c ≈ √log(p).
An indirect way to obtain confidence intervals for θ(1), that attain (at least) the
nominal level, would be to construct simultaneous confidence intervals for the means of
all the populations considered in the experiment using, for instance, Bonferroni intervals.
The natural question is whether such a procedure produces better intervals in terms of
length. The answer is no. In fact, the size of the Bonferroni intervals increases at a
faster rate compared to the intervals we propose. Figure 2-5 shows the behavior of the
cutoff point c as the number of populations increases for the case α = 0.05. The solid
line corresponds to the value of the standard cutoff point for a 95% confidence interval
(zα/2 = 1.96), the dashed/dotted line represents the value of c for the new confidence
intervals, and the dashed line corresponds to the cutoff values for the Bonferroni intervals.
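The comparison in Figure 2-5 can be reproduced by computing both cutoffs directly. In the sketch below (our own illustration), the Bonferroni cutoff z_{α/(2p)} is obtained by bisection on the normal cdf and compared with the new cutoff:

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def inv_Phi(q):
    # quantile of the standard normal by bisection
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def new_cutoff(p, alpha=0.05):
    # solve Phi^p(c) - Phi^p(-c) = 1 - alpha by bisection
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) ** p - Phi(-mid) ** p < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bonferroni_cutoff(p, alpha=0.05):
    # z_{alpha/(2p)}: simultaneous intervals for all p means
    return inv_Phi(1.0 - alpha / (2.0 * p))

for p in (2, 10, 100, 1000):
    print(p, round(new_cutoff(p), 3), round(bonferroni_cutoff(p), 3))
```

The new cutoff is uniformly smaller than the Bonferroni cutoff for p ≥ 2, consistent with the shorter intervals described above.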
In an applied situation, the population means θi (1 ≤ i ≤ p) will rarely be identical.
Hence, we need to compare the performance of the confidence intervals when the
population means are different. Table 2-4 summarizes some results obtained by
simulation for the case p = 4. The first column shows the true values of the population
means (all of them with variance equal to 1), while the second and third columns show the
observed coverage probability for the traditional and new intervals at a confidence level
of 95%. The reported values correspond to the average of the coverage probabilities
over ten replications, and the numbers in parentheses are the corresponding standard
errors.

We observe that our proposed intervals outperform the traditional ones, even when
the population means are far apart. It is interesting to notice that even in situations
where one of the populations should be somehow distinguishable (see row four in Table
2-4), the traditional intervals may perform poorly.
2.4 Tables and Figures
Table 2-1. Configuration of the new parameterization for the probability P(θ(1) ∈ X(1) ± c). In the table, ∆ij = θi − θj.

        θ1      θ2     · · ·    θp
θ1       0     −∆21    · · ·   −∆p1
θ2     ∆21       0     · · ·   −∆p2
⋮        ⋮       ⋮      ⋱       ⋮
θp     ∆p1     ∆p2     · · ·     0
Table 2-2. Configuration of the new parameterization for the case p = 3, when ∆12 and ∆23 are the free parameters. In the table, ∆ij = θi − θj.

            θ1          θ2           θ3
θ1           0         −∆12     −(∆23 + ∆12)
θ2         ∆12           0         −∆23
θ3     ∆23 + ∆12       ∆23           0
Table 2-3. Representation of the parameters ∆i,j for the case p = k + 1.

              θ1                       θ2                 ...         θk                  θk+1
θ1             0                      ∆21                 ...  ∆k,k−1 + ... + ∆21   ∆k+1,k + ... + ∆21
θ2           −∆21                      0                  ...  ∆k,k−1 + ... + ∆32   ∆k+1,k + ... + ∆32
⋮              ⋮                       ⋮                  ⋱          ⋮                    ⋮
θk     −(∆k,k−1 + ... + ∆21)   −(∆k,k−1 + ... + ∆32)      ...         0                 ∆k+1,k
θk+1   −(∆k+1,k + ... + ∆21)   −(∆k+1,k + ... + ∆32)      ...      −∆k+1,k                0
Table 2-4. Observed coverage probability of 95% CI for the mean of the selected population out of four populations, using the traditional and the new method. The reported values correspond to the average over ten replications and the numbers in parentheses are the corresponding standard errors.

(θ1, θ2, θ3, θ4)       Trad CI            New CI
(0, 0, 0, 0)        0.904 (0.0016)    0.952 (0.0012)
(0, 0.25, 0.5, 1)   0.907 (0.0020)    0.952 (0.0011)
(0, 5, 10, 15)      0.950 (0.0014)    0.974 (0.0009)
(0, 0, 0, 2)        0.928 (0.0042)    0.9584 (0.0027)
(0, 0, 0, 5)        0.952 (0.0031)    0.973 (0.0028)
Figure 2-1. Coverage probability as a function of ∆21 and ∆32 when p = 3.
Figure 2-2. Plot of ∂h/∂∆21 for predetermined values of ∆21 and ∆32.
Figure 2-3. Plots of the first two terms of ∂h/∂∆21 for predetermined values of ∆21 and∆32.
Figure 2-4. Confidence coefficient versus number of populations for the case of identicalpopulation means and α = 0.05. The solid blue line corresponds to theconfidence coefficient for the new confidence intervals, and the dashed redline corresponds to the confidence coefficient for the traditional confidenceintervals.
Figure 2-5. Cutoff point versus number of populations for the case of identical population means and α = 0.05. The solid line corresponds to the cutoff value for the traditional confidence interval, zα/2 = 1.96. The dashed/dotted line corresponds to the cutoff value for the new intervals and the dashed line corresponds to the cutoff value for the Bonferroni intervals.
CHAPTER 3
CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF K ≥ 1 POPULATIONS
Using the same framework as in Chapter 2, we assume that for i = 1, ..., p, we
have independent random variables Xi ∼ N(θi, σ2/n). Also, we define the order statistics
X(1), ..., X(p) according to the inequalities X(1) ≥ ... ≥ X(p) and, for simplicity, we start by
considering σ2 = n = 1. Then, we observe that the mean of the population from which
the j-th biggest observation, X(j), is sampled can be written as

θ(j) = ∑_{i=1}^{p} θi I(Xi = X(j)).
In this context, we want to find the value of c > 0 such that

P(θ(1) ∈ X(1) ± c, ..., θ(k) ∈ X(k) ± c) ≥ 1 − α     (3–1)

for any 0 < α < 1 and 1 ≤ k ≤ p.
Following the same approach we used in Chapter 2, we can write the probability in
(3–1) as

∑_{j1 ≠ ... ≠ jk} P(θ(1) ∈ X(1) ± c, ..., θ(k) ∈ X(k) ± c, X(1) = Xj1, ..., X(k) = Xjk),

where the sum has p!/(p − k)! terms, one for each ordered selection of k indices.
Let us consider first the case p = 4 and k = 2. Then, the probability of interest is

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c) = ∑_{i≠j} P(θi ∈ Xi ± c, θj ∈ Xj ± c, X(1) = Xi, X(2) = Xj),     (3–2)

where 1 ≤ i, j ≤ 4.
In order to obtain closed form expressions for each term in the sum, observe that
for X(1) = X1 and X(2) = X2, we have (X(1) = X1, X(2) = X2) = (X1 ≥ X2, X2 ≥ X3, X2 ≥ X4). In other words, the relative order between X3 and X4 is irrelevant.
It follows that we only need to pay attention to the possible configurations of the random
variables at the top. In this case the possible configurations are

(X1 ≥ X2, X2 ≥ X3, X2 ≥ X4)    (X3 ≥ X1, X1 ≥ X2, X1 ≥ X4)
(X1 ≥ X3, X3 ≥ X2, X3 ≥ X4)    (X3 ≥ X2, X2 ≥ X1, X2 ≥ X4)
(X1 ≥ X4, X4 ≥ X2, X4 ≥ X3)    (X3 ≥ X4, X4 ≥ X1, X4 ≥ X2)
(X2 ≥ X1, X1 ≥ X3, X1 ≥ X4)    (X4 ≥ X1, X1 ≥ X2, X1 ≥ X3)
(X2 ≥ X3, X3 ≥ X1, X3 ≥ X4)    (X4 ≥ X2, X2 ≥ X1, X2 ≥ X3)
(X2 ≥ X4, X4 ≥ X1, X4 ≥ X3)    (X4 ≥ X3, X3 ≥ X1, X3 ≥ X2)
If we define Zj = Xj − θj (1 ≤ j ≤ 4) and ∆ij = θi − θj (1 ≤ i, j ≤ 4), we observe

X1 ≥ X2 ⇔ Z1 ≥ Z2 + ∆21
X2 ≥ X3 ⇔ Z2 ≥ Z3 + ∆32
X2 ≥ X4 ⇔ Z2 ≥ Z4 + ∆42,

where Z1, ..., Z4 are iid N(0, 1).
Then, the first term of the sum in (3–2) can be written as

P(θ1 ∈ X1 ± c, θ2 ∈ X2 ± c, X1 ≥ X2, X2 ≥ X3, X2 ≥ X4)
= P(|Z1| ≤ c, |Z2| ≤ c, Z2 ≤ Z1 + ∆12, Z3 ≤ Z2 + ∆23, Z4 ≤ Z2 + ∆24)

and, making use of the normality assumptions, we can explicitly write

P(θ1 ∈ X1 ± c, θ2 ∈ X2 ± c, X1 ≥ X2, X2 ≥ X3, X2 ≥ X4)
= ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆21)} Φ(z2 − ∆32) Φ(z2 − ∆42) φ(z1)φ(z2) dz2 dz1.
Of course, the same argument is valid for the other terms in the sum. This way,
considering all 12 possible configurations for the order of the random variables X1,
X2, X3 and X4, we can write the sum in (3–2) in closed form:
P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)
= ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆21)} Φ(z2 − ∆32) Φ(z2 − ∆42) φ(z1)φ(z2) dz2 dz1
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆12)} Φ(z1 − ∆31) Φ(z1 − ∆41) φ(z1)φ(z2) dz1 dz2
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆31)} Φ(z3 − ∆23) Φ(z3 − ∆43) φ(z1)φ(z3) dz3 dz1
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z3−∆13)} Φ(z1 − ∆21) Φ(z1 − ∆41) φ(z1)φ(z3) dz1 dz3
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆41)} Φ(z4 − ∆24) Φ(z4 − ∆34) φ(z1)φ(z4) dz4 dz1
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z4−∆14)} Φ(z1 − ∆21) Φ(z1 − ∆31) φ(z1)φ(z4) dz1 dz4
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆32)} Φ(z3 − ∆13) Φ(z3 − ∆43) φ(z3)φ(z2) dz3 dz2
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z3−∆23)} Φ(z2 − ∆12) Φ(z2 − ∆42) φ(z3)φ(z2) dz2 dz3
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆42)} Φ(z4 − ∆14) Φ(z4 − ∆34) φ(z4)φ(z2) dz4 dz2
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z4−∆24)} Φ(z2 − ∆12) Φ(z2 − ∆32) φ(z4)φ(z2) dz2 dz4
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z3−∆43)} Φ(z4 − ∆14) Φ(z4 − ∆24) φ(z3)φ(z4) dz4 dz3
+ ∫_{−c}^{c} ∫_{−c}^{min(c, z4−∆34)} Φ(z3 − ∆13) Φ(z3 − ∆23) φ(z3)φ(z4) dz3 dz4
In order to minimize this expression, we need to address two equally challenging
difficulties:
• First, the construction of any lower bound needs to take into account the delicate balance between the ∆ij's in the expression.
• Second, special attention needs to be paid to the limits of integration. The "corners" of the form min(c, z − ∆ij) make any procedure based on differentiation nearly impossible.
To overcome the difficulty due to the "corners", we notice that the events
(Z2 ≤ Z1 + ∆12, Z3 ≤ Z2 + ∆23, Z4 ≤ Z2 + ∆24) and (Z2 ≥ Z1 + ∆12, Z3 ≤ Z1 + ∆13, Z4 ≤ Z1 + ∆14)
are disjoint. Hence, we can express the sum of the probabilities of these two events
as the probability of their union. Consequently, instead of writing down 12 terms for the
sum (one term per configuration), we can express the probability of interest using only 6
terms, each of them describing the two random variables positioned at the top.
Working out the details, we obtain:
• X1 and X2 at the top.
P(|Z1| ≤ c, |Z2| ≤ c, Z3 ≤ min{Z1 + ∆13, Z2 + ∆23}, Z4 ≤ min{Z1 + ∆14, Z2 + ∆24})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆13, z2 + ∆23}) Φ(min{z1 + ∆14, z2 + ∆24}) φ(z1)φ(z2) dz1 dz2
• X1 and X3 at the top.
P(|Z1| ≤ c, |Z3| ≤ c, Z2 ≤ min{Z1 + ∆12, Z3 − ∆23}, Z4 ≤ min{Z1 + ∆14, Z3 + ∆34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆12, z3 − ∆23}) Φ(min{z1 + ∆14, z3 + ∆34}) φ(z1)φ(z3) dz1 dz3
• X1 and X4 at the top.
P(|Z1| ≤ c, |Z4| ≤ c, Z2 ≤ min{Z1 + ∆12, Z4 − ∆24}, Z3 ≤ min{Z1 + ∆13, Z4 − ∆34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆12, z4 − ∆24}) Φ(min{z1 + ∆13, z4 − ∆34}) φ(z1)φ(z4) dz1 dz4
• X2 and X3 at the top.
P(|Z2| ≤ c, |Z3| ≤ c, Z1 ≤ min{Z2 − ∆12, Z3 − ∆13}, Z4 ≤ min{Z2 + ∆24, Z3 + ∆34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 − ∆12, z3 − ∆13}) Φ(min{z2 + ∆24, z3 + ∆34}) φ(z2)φ(z3) dz2 dz3
• X2 and X4 at the top.
P(|Z2| ≤ c, |Z4| ≤ c, Z1 ≤ min{Z2 − ∆12, Z4 − ∆14}, Z3 ≤ min{Z2 + ∆23, Z4 − ∆34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 − ∆12, z4 − ∆14}) Φ(min{z2 + ∆23, z4 − ∆34}) φ(z2)φ(z4) dz2 dz4
• X3 and X4 at the top.
P(|Z3| ≤ c, |Z4| ≤ c, Z1 ≤ min{Z3 − ∆13, Z4 − ∆14}, Z2 ≤ min{Z3 − ∆23, Z4 − ∆24})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z3 − ∆13, z4 − ∆14}) Φ(min{z3 − ∆23, z4 − ∆24}) φ(z3)φ(z4) dz3 dz4
(In each case the min arises because both non-selected variables must fall below the smaller of the two selected ones.)
This way, an alternative representation for the probability of interest is

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)     (3–3)
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆13, z2 + ∆23}) Φ(min{z1 + ∆14, z2 + ∆24}) φ(z1)φ(z2) dz1 dz2
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆12, z3 − ∆23}) Φ(min{z1 + ∆14, z3 + ∆34}) φ(z1)φ(z3) dz1 dz3
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆12, z4 − ∆24}) Φ(min{z1 + ∆13, z4 − ∆34}) φ(z1)φ(z4) dz1 dz4
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 − ∆12, z3 − ∆13}) Φ(min{z2 + ∆24, z3 + ∆34}) φ(z2)φ(z3) dz2 dz3
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 − ∆12, z4 − ∆14}) Φ(min{z2 + ∆23, z4 − ∆34}) φ(z2)φ(z4) dz2 dz4
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z3 − ∆13, z4 − ∆14}) Φ(min{z3 − ∆23, z4 − ∆24}) φ(z3)φ(z4) dz3 dz4
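As a sanity check (an illustration, not part of the original text), the sketch below compares a Monte Carlo estimate of P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c) for p = 4, k = 2 at θ1 = ... = θ4 = 0 against direct quadrature of the representation above. At the origin all six terms coincide, and the integrand uses the min of the two selected z's, since both non-selected variables must fall below the smaller selected one:

```python
import random
from math import erf, sqrt, exp, pi

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

c = 1.96  # cutoff under test

# Quadrature (midpoint rule): at the origin the coverage equals
# 6 * integral of Phi(min{z1, z2})^2 phi(z1) phi(z2) over [-c, c]^2.
n = 400
h = 2.0 * c / n
quad = 0.0
for i in range(n):
    z1 = -c + (i + 0.5) * h
    for j in range(n):
        z2 = -c + (j + 0.5) * h
        quad += Phi(min(z1, z2)) ** 2 * phi(z1) * phi(z2)
quad *= 6.0 * h * h

# Monte Carlo: with p = 4, k = 2 and all means 0, the joint coverage event
# is {|X(1)| <= c and |X(2)| <= c} for the two largest of four N(0,1) draws.
random.seed(1)
N = 200_000
hits = 0
for _ in range(N):
    xs = sorted(random.gauss(0.0, 1.0) for _ in range(4))
    if abs(xs[-1]) <= c and abs(xs[-2]) <= c:
        hits += 1
mc = hits / N
print(round(quad, 3), round(mc, 3))  # the two estimates agree closely
```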
Observe that this new representation does not completely solve the problem of the
"corners", but rather removes them from the limits of integration and puts them inside
the integrand. Now, we find expressions of the form min{zi + ∆, zj + ∆'} in the argument of
the normal cdfs Φ(·), which still makes any minimization approach based on
differentiation difficult.
However, this new representation reveals more clearly the symmetry in the structure
of the ∆'s, as portrayed in Table 3-1. This pattern is particularly important, since it
suggests how to generalize the expression to arbitrary values of p and k.
In order to determine the configuration of ∆'s that minimizes the expression in (3–3),
we assume (without loss of generality) that θ1 ≥ θ2 ≥ θ3 ≥ θ4, so that ∆ij ≥ 0 for any
i ≤ j. Also, we consider ∆12, ∆23 and ∆34 as free parameters.
Based on our previous results, it is reasonable to believe that the minimum of (3–3)
is reached at the origin. In order to prove this claim, we have studied the behavior of the
coverage probability (CP) for different configurations of the ∆ij's, with special attention to
the behavior at the boundary. Among others, we considered the following cases:
• ∆12 = ∆23 = ∆34 = 0:

CP = 6 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z1, z2}) φ(z1)φ(z2) dz1 dz2

• ∆12 > 0, ∆23 = ∆34 = 0:

CP = 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z1 + ∆12, z2}) φ(z1)φ(z2) dz1 dz2
+ 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 − ∆12, z3 − ∆12}) Φ(min{z2, z3}) φ(z2)φ(z3) dz2 dz3
→ 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(z2) φ(z1)φ(z2) dz1 dz2, as ∆12 ↑ +∞

• ∆12, ∆23 > 0 and ∆34 = 0:

CP = ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z1 + ∆13, z2 + ∆23}) φ(z1)φ(z2) dz1 dz2
+ 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + ∆12, z3 − ∆23}) Φ(min{z1 + ∆13, z3}) φ(z1)φ(z3) dz1 dz3
+ 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 − ∆12, z3 − ∆13}) Φ(min{z2 + ∆23, z3}) φ(z2)φ(z3) dz2 dz3
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z3 − ∆13, z4 − ∆13}) Φ(min{z3 − ∆23, z4 − ∆23}) φ(z3)φ(z4) dz3 dz4
→ ∫_{−c}^{c} ∫_{−c}^{c} φ(z1)φ(z2) dz1 dz2, as ∆12, ∆23 ↑ +∞
However, none of the cases we considered provided conclusive (analytical) evidence
that the minimum is at the origin. On the contrary, various numerical studies have
suggested that the minimum is not located at the origin (see Figure 3-1), but the
current formulation of the problem makes it difficult even to establish that it is not located
in the interior of the region determined by ∆12, ∆23 and ∆34.
These difficulties call for a different approach, which we discuss in the following
section.
3.1 An Alternative Approach
So far, we have approached the problem considering partitions of the coverage
probability based on the possible configurations of the vector (X(1), X(2), ..., X(k)). Notice
that such an approach, by construction, takes into account the relative orderings between
the variables that are selected (the top k).
Instead, we can consider an alternative that does not take explicit consideration of the
ordering between the variables that have been selected. Notice there are C(p, k) = p!/(k!(p − k)!) different
ways to select k out of p populations, without considering the order. Suppose that j
indexes one such arrangement, and denote by Xj1, ..., Xjk the top k variables and by
Xj(k+1), ..., Xjp the bottom p − k. Then, we can separate the sample space according to
min{Xj1, ..., Xjk} ≥ max{Xj(k+1), ..., Xjp} for j = 1, ..., C(p, k).
This way, the coverage probability can be written as

P(θ(1) ∈ X(1) ± c, ..., θ(k) ∈ X(k) ± c)
= ∑_{j=1}^{C(p,k)} P(θj1 ∈ Xj1 ± c, ..., θjk ∈ Xjk ± c, min{Xj1, ..., Xjk} ≥ max{Xj(k+1), ..., Xjp})
Let us first consider the term where (X1, X2, ..., Xk) are at the top. For this case, the
relevant piece of the probability is

P(θ1 ∈ X1 ± c, ..., θk ∈ Xk ± c, min{X1, ..., Xk} ≥ max{Xk+1, ..., Xp})
= ∫_{θ1−c}^{θ1+c} ··· ∫_{θk−c}^{θk+c} ∏_{j=k+1}^{p} P_{θj}(Xj ≤ min{x1, ..., xk}) f(x1, ..., xk) dx1 ··· dxk,

where f(x1, ..., xk) is the joint density of (X1, ..., Xk).
Hence, making use of the normality assumptions, we have

P(θ1 ∈ X1 ± c, ..., θk ∈ Xk ± c, min{X1, ..., Xk} ≥ max{Xk+1, ..., Xp})
= ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{j=k+1}^{p} Φ(min{z1 + θ1, ..., zk + θk} − θj) ∏_{i=1}^{k} φ(zi) dzi,

where zi = xi − θi for i = 1, ..., k.
From here, it is not difficult to obtain the following expression for the coverage
probability:

P(θ(1) ∈ X(1) ± c, ..., θ(k) ∈ X(k) ± c)
= ∑_{j=1}^{C(p,k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij} φ(zℓ) dzℓ,     (3–4)

where Ij = {j1, ..., jk} is the set of indices for the top k variables in the j-th arrangement
and Ij^c = {j(k+1), ..., jp} is the set of indices for the bottom p − k variables in the j-th
arrangement.
Notice that if k = 1 we are back in the case discussed in Chapter 2, and the case
k = p corresponds to simultaneous confidence intervals.
Let us take a closer look at this formula and consider first the case p = 6 and k = 3.
In this case, the sum in (3–4) will have C(6, 3) = 20 terms, determined by the configurations

123|456   234|156   345|126   456|123
124|356   235|146   346|125
125|346   236|145   356|124
126|345   245|136
134|256   246|135
135|246   256|134
136|245
145|236
146|235
156|234

where the numbers to the left of the vertical line are the indices of the set Ij (the
populations being selected) and the numbers to the right are the indices of the set Ij^c
(the populations not being selected). Observe that all the indices appear on the left side
(and on the right side) the same number of times (10), revealing some symmetry in the
problem.
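This symmetry is easy to verify by brute force; the short sketch below (illustrative only) enumerates the C(6, 3) = 20 configurations and counts how often each index appears on the selected side:

```python
from itertools import combinations

p, k = 6, 3
configs = list(combinations(range(1, p + 1), k))
print(len(configs))  # 20 configurations

# Count how often each index appears on the selected (left) side.
left_counts = {i: sum(1 for cfg in configs if i in cfg) for i in range(1, p + 1)}
print(left_counts)  # every index appears C(5, 2) = 10 times
```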
Using this symmetry, suppose that θ1 ≤ θ2 ≤ θ3 ≤ θ4 ≤ θ5 ≤ θ6 and let
θ6 ↑ ∞. Then, for the 10 groups for which 6 is on the right side, the corresponding
term goes to zero. For the remaining groups (for which 6 appears on the left), the value
of Φ(min_{ℓ}{zjℓ + θjℓ} − θjm) is not affected by θ6, and the coverage probability is
determined by the following configurations:

12|345   23|145   34|125   45|123
13|245   24|135   35|124
14|235   25|134
15|234

which correspond to the possible ways of choosing 2 out of 5 populations.
Repeating the argument, but letting θ5 ↑ ∞, we obtain the configurations

1|234   3|124
2|134   4|123
which are the possible ways to choose 1 out of 4 populations. For this case, we know
(from Chapter 2) that the minimum is reached at θ1 = θ2 = θ3 = θ4. This example
suggests that the coverage probability is minimized when the biggest k − 1 population
means are sent to +∞ and the remaining p − k + 1 are set to be equal. However, a formal
argument is required.
For the general case (1 ≤ k < p), the number of possible configurations is

C(p, k) = C(p − 1, k) + C(p − 1, p − k) = C(p − 1, k) + C(p − 1, k − 1),

where C(p − 1, k) is the number of configurations that have any given index j on the right side
(population j is not selected) and C(p − 1, k − 1) is the number of configurations that have index j
on the left side (population j is selected).
Suppose (without any loss of generality) that θ1 ≤ ... ≤ θp and define

Ij(θp) = I( min_{ℓ∈Ij−{p}} {zℓ + θℓ} ≥ zp + θp )
Ij^c(θp) = I( min_{ℓ∈Ij−{p}} {zℓ + θℓ} < zp + θp ),

where I(·) is the indicator function.
From the definition, it immediately follows that

min_{ℓ∈Ij} {zℓ + θℓ} = (zp + θp) Ij(θp) + min_{ℓ∈Ij−{p}} {zℓ + θℓ} Ij^c(θp)     (3–5)
and therefore, the coverage probability can be written as

P(θ(1) ∈ X(1) ± c, ..., θ(k) ∈ X(k) ± c)
= ∑_{j=1}^{C(p,k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ((zp + θp) − θm) Ij(θp) ∏_{ℓ∈Ij} φ(zℓ) dzℓ
+ ∑_{j=1}^{C(p,k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij−{p}} {zℓ + θℓ} − θm ) Ij^c(θp) ∏_{ℓ∈Ij} φ(zℓ) dzℓ
Now, observe that as θp ↑ ∞,

min_{ℓ∈Ij} {zℓ + θℓ} = (zp + θp) Ij(θp) + min_{ℓ∈Ij−{p}} {zℓ + θℓ} Ij^c(θp) → min_{ℓ∈Ij−{p}} {zℓ + θℓ}

and hence

∏_{m∈Ij^c} Φ( min_{ℓ∈Ij} {zℓ + θℓ} − θm ) → ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij−{p}} {zℓ + θℓ} − θm )

for all the terms for which θp is on the left side.
At the same time, for the terms where θp is on the right side, we have

∏_{m∈Ij^c} Φ( min_{ℓ∈Ij} {zℓ + θℓ} − θm ) → 0,
and therefore, as θp ↑ ∞, the coverage probability converges to

∑_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij−{p}} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij} φ(zℓ) dzℓ.
Before we move forward, let us consider the example p = 3, k = 2. Then, the
coverage probability is

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)
= ∑_{j=1}^{C(3,2)} ∫_{−c}^{c} ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij} φ(zℓ) dzℓ
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + θ1, z2 + θ2} − θ3) φ(z1)φ(z2) dz1 dz2
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + θ1, z3 + θ3} − θ2) φ(z1)φ(z3) dz1 dz3     (3–6)
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 + θ2, z3 + θ3} − θ1) φ(z2)φ(z3) dz2 dz3,

and, as θ3 ↑ ∞, we obtain

M = ∫_{−c}^{c} ∫_{−c}^{c} Φ(z1 + θ1 − θ2) φ(z1)φ(z3) dz1 dz3     (3–7)
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(z2 + θ2 − θ1) φ(z2)φ(z3) dz2 dz3.
Suppose now that, for a fixed θ3, min{z1 + θ1, z3 + θ3} = z3 + θ3. Since we are
assuming that θ1 ≤ θ2 ≤ θ3, this can only happen for certain values of z1 and z3. Let
R1 = {(z1, z3) : min{z1 + θ1, z3 + θ3} = z1 + θ1} and R2 = {(z1, z3) : min{z1 + θ1, z3 + θ3} = z3 + θ3}. Then, the integral in (3–6) can be written as
∫∫_{R1} Φ(z1 + θ1 − θ2) φ(z1)φ(z3) dz1 dz3 + ∫∫_{R2} Φ(z3 + θ3 − θ2) φ(z1)φ(z3) dz1 dz3.

Similarly, the integral in (3–7) can be written as

∫∫_{R1} Φ(z1 + θ1 − θ2) φ(z1)φ(z3) dz1 dz3 + ∫∫_{R2} Φ(z1 + θ1 − θ2) φ(z1)φ(z3) dz1 dz3
and, since θ3 − θ2 ≥ θ1 − θ2, we obtain

∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + θ1, z3 + θ3} − θ2) φ(z1)φ(z3) dz1 dz3 ≥ ∫_{−c}^{c} ∫_{−c}^{c} Φ(z1 + θ1 − θ2) φ(z1)φ(z3) dz1 dz3.

Using a similar argument with the third integral in the coverage probability, we
conclude that P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c) ≥ M.
For the general case, suppose that θp (fixed) is such that Ij(θp) = 1 for some j.
That is, min_{ℓ∈Ij} {zℓ + θℓ} = zp + θp. Under the assumption θ1 ≤ ... ≤ θp, we have
θp − θm ≥ θℓ − θm for any 1 ≤ m, ℓ ≤ p and, therefore, Ij(θp) can be equal to 1 only in
a certain region of the hyper-cube (−c, c)^k. Then, partitioning the integrals accordingly,
we obtain

P(θ(1) ∈ X(1) ± c, ..., θ(k) ∈ X(k) ± c)
= ∑_{j=1}^{C(p,k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij} φ(zℓ) dzℓ
≥ ∑_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij−{p}} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij} φ(zℓ) dzℓ,     (3–8)

where the equality is attained asymptotically as θp approaches infinity.
Integrating (3–8) with respect to zp, we obtain

(Φ(c) − Φ(−c)) [ ∑_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij−{p}} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij−{p}} φ(zℓ) dzℓ ],

where the quantity in brackets [ ] is exactly the coverage probability for selecting k − 1
out of p − 1.
Repeating the argument, but now letting θp−1 ↑ ∞, we obtain the lower bound

(Φ(c) − Φ(−c))² [ ∑_{j=1}^{C(p−2,k−2)} ∫_{−c}^{c} ··· ∫_{−c}^{c} ∏_{m∈Ij^c} Φ( min_{ℓ∈Ij−{p,p−1}} {zℓ + θℓ} − θm ) ∏_{ℓ∈Ij−{p,p−1}} φ(zℓ) dzℓ ].
This way, continuing the procedure until there is only 1 population on the left side
(selected) and p − k on the right side (not selected), the resulting lower bound for the
coverage probability is

(Φ(c) − Φ(−c))^{k−1} [ ∑_{j=1}^{p−k+1} ∫_{−c}^{c} ∏_{m∈Ij^c} Φ(z + θj − θm) φ(z) dz ].

Again, notice that the expression in brackets [ ] corresponds to the coverage probability
for selecting 1 out of p − k + 1 populations, which we already know is minimized at
θ1 = ... = θp.
Observe that nothing changes in the argument if we replace the intervals (−c, c)
by intervals of the form (−c1, c2), with c1, c2 > 0. This observation leads to the following
lemma:
Lemma 3. Let c1, c2 > 0 and for p ≥ 2, let X1, ..., Xp be independent random variables
with Xi ∼ N(θi, 1). Then,

min_{θ1,...,θp} P(θ(1) ∈ (X(1) − c1, X(1) + c2), ..., θ(k) ∈ (X(k) − c1, X(k) + c2))
= (Φ(c2) − Φ(−c1))^{k−1} [Φ^{p−k+1}(c2) − Φ^{p−k+1}(−c1)],

where Φ(·) is the cdf of the standard normal distribution.
If the variance σ2 is unknown, we can follow the same strategy used in Chapter 2
and extend this result by writing the coverage probability as a mixture. We obtain:
Lemma 4. Let c1, c2 > 0 and for p ≥ 2, let X1, ..., Xp be independent random vari-
ables with Xi ∼ N(θi, σ2), where both θi and σ2 are unknown. If s2 is an estimate of σ2
independent of X1, ..., Xp, then

min_{θ1,...,θp} P(θ(1) ∈ (X(1) − c1, X(1) + c2), ..., θ(k) ∈ (X(k) − c1, X(k) + c2))
= ∫_0^∞ (Φ(c2 t) − Φ(−c1 t))^{k−1} [Φ^{p−k+1}(c2 t) − Φ^{p−k+1}(−c1 t)] ϕ(t) dt,

where ϕ(·) is the pdf of s/σ and Φ(·) is the cdf of the standard normal distribution.
The following theorem summarizes the main results of this chapter:
Theorem 3.1. Let 0 < α < 1 and for i = 1, ..., p, suppose that Xi1, ..., Xin is a random
sample from a N(θi, σ2) distribution, where θi is unknown.
Case 1: If the variance σ2 is known, then confidence intervals for θ(1), ..., θ(k),
with a simultaneous confidence coefficient of (at least) 1 − α, are given by

X(j) ± (σ/√n) c,   j = 1, ..., k,

where the value of c satisfies

(Φ(c) − Φ(−c))^{k−1} [Φ^{p−k+1}(c) − Φ^{p−k+1}(−c)] = 1 − α.

Case 2: If the variance σ2 is unknown, then confidence intervals for θ(1), ..., θ(k),
with a simultaneous confidence coefficient of (at least) 1 − α, are given by

X(j) ± (s/√n) c,   j = 1, ..., k,

where s = [p^{−1} ∑_{i=1}^{p} s_i²]^{1/2}, with s_i² = (n − 1)^{−1} ∑_{j=1}^{n} (Xij − X̄i)² for 1 ≤ i ≤ p, c satisfies

∫_0^∞ (Φ(ct) − Φ(−ct))^{k−1} [Φ^{p−k+1}(ct) − Φ^{p−k+1}(−ct)] ϕ(t) dt = 1 − α,

and ϕ(·) is the pdf of s/σ.
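The equation in Case 1 is easy to solve numerically. The sketch below (illustrative, standard library only) finds c by bisection and reproduces entries of Table 3-4:

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def coverage(c, p, k):
    # Minimum simultaneous coverage from Theorem 3.1, Case 1.
    return (Phi(c) - Phi(-c)) ** (k - 1) * (
        Phi(c) ** (p - k + 1) - Phi(-c) ** (p - k + 1))

def cutoff(p, k, alpha=0.05):
    # Bisection: coverage(c, p, k) is increasing in c.
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if coverage(mid, p, k) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(cutoff(2, 1), 3))  # close to 1.96
print(round(cutoff(5, 1), 3))  # close to 2.319 (Table 3-4)
print(round(cutoff(5, 2), 3))  # close to 2.387 (Table 3-4)
```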
3.2 Numerical Studies
The results obtained in Section 3.1 suggest that the minimum coverage probability
is attained when k − 1 population means go to infinity and the remaining p − k + 1
populations have the same mean. To confirm this behavior, we performed several
simulation studies in which we consider the empirical coverage probability of the
confidence intervals, setting very large values for the components of θ that diverge to
infinity and setting the remaining ones equal to zero. Table 3-2 shows the result of a
simulation study in which we considered six populations and varied the number of
selected ones. In the first column we can see the number of population means set equal
to zero (the rest were set equal to 100 to represent infinity), and we observe that for every
1 ≤ k ≤ 6 the minimum coverage probability is obtained when 6 − k + 1 populations have equal
means.
A different concern is whether the new intervals maintain the nominal level. Table
3-3 summarizes the observed coverage probabilities obtained in a numerical study
considering 6 populations. The nominal level is 95%. In the table, the
first column shows different configurations of the population means and the first row
indicates the number of selected populations. We observe that for every configuration
the observed coverage probability is never below the nominal level. These results remain
valid for every other configuration we have considered (including changing the number of
populations), which validates the reliability of the procedure.
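A check of this kind is easy to script. The sketch below (an illustration, not the author's code) solves for the cutoff of Theorem 3.1 with p = 6, k = 2, simulates the equal-means configuration θ = 0, and verifies that the empirical coverage is at or above the nominal 95% level:

```python
import random
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cutoff(p, k, alpha=0.05):
    # Bisection on the minimum-coverage equation of Theorem 3.1, Case 1.
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        cov = (Phi(mid) - Phi(-mid)) ** (k - 1) * (
            Phi(mid) ** (p - k + 1) - Phi(-mid) ** (p - k + 1))
        if cov < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, k = 6, 2
c = cutoff(p, k)
random.seed(7)
N = 50_000
hits = 0
for _ in range(N):
    xs = sorted((random.gauss(0.0, 1.0) for _ in range(p)), reverse=True)
    # All means are 0, so the intervals X(j) +/- c must contain 0 for j <= k.
    if all(abs(xs[j]) <= c for j in range(k)):
        hits += 1
print(round(hits / N, 3))  # at or above the nominal 0.95
```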
Finally, we studied the behavior of the length of the intervals. In Chapter 2
we observed that the confidence intervals increase in length as the number of
populations increases. This behavior is also expected when we are selecting k > 1
populations; however, it is important to determine how the value of k affects the
length of the intervals. Table 3-4 shows the results of a numerical study in which we
considered different values of p (total number of populations) and k (number of selected
populations). In the table, the first column shows the number of populations, and the
first row the number of selected populations. In the body we observe the values of the
cutoff points for 95% confidence intervals for the corresponding configuration, and the
last column shows the cutoff values for 95% simultaneous confidence intervals using
Bonferroni. We notice that the proposed intervals are always shorter than Bonferroni, even
when we select all the available populations (p = k). This difference increases as the
number of populations increases.
3.3 Tables and Figures
Table 3-1. Structure of the ∆'s for the case p = 4, k = 2 (see 3–3). Each row represents a term in the sum.

Top          ∆'s
(X1, X2)   +∆13   +∆23   +∆14   +∆24
(X1, X3)   +∆12   −∆23   +∆14   +∆34
(X1, X4)   +∆12   −∆24   +∆13   −∆34
(X2, X3)   −∆12   −∆13   +∆24   +∆34
(X2, X4)   −∆12   −∆14   +∆23   −∆34
(X3, X4)   −∆13   −∆14   −∆23   −∆24
Table 3-2. Coverage probabilities for the number of population means equal to 0 (first column) vs the number of selected populations (first row).

# of θi = 0   k = 1   k = 2   k = 3   k = 4   k = 5   k = 6
6             0.740   0.740   0.739   0.738   0.714   0.531
5             0.898   0.698   0.698   0.697   0.682   0.531
4             0.904   0.813   0.662   0.662   0.654   0.531
3             0.861   0.853   0.730   0.626   0.623   0.531
2             0.819   0.818   0.805   0.658   0.592   0.531
1             0.777   0.777   0.776   0.757   0.590   0.531
0             0.740   0.740   0.739   0.738   0.714   0.531
Table 3-3. Observed coverage probability of 95% CI for the mean of the selected populations when p = 6, using the new method.

(θ1, θ2, θ3, θ4, θ5, θ6)   k = 1   k = 2   k = 3   k = 4   k = 5   k = 6
(0, 0, 0, 0, 0, 0)         0.955   0.954   0.960   0.968   0.969   0.953
(0, 1, 2, 3, 4, 5)         0.977   0.966   0.959   0.959   0.957   0.957
(0, 3, 6, 9, 12, 15)       0.982   0.972   0.965   0.953   0.951   0.953
(0, 0, 0, 0, 3, 3)         0.978   0.968   0.954   0.957   0.961   0.955
Table 3-4. Cutoff points for 95% CI for different values of p and k using the new method.

Num Pop   k = 1   k = 2   k = 3   k = 4   k = 5   Bonf
1         1.960                                   1.960
2         1.960   2.236                           2.241
3         2.121   2.236   2.388                   2.394
4         2.234   2.319   2.388   2.491           2.498
5         2.319   2.387   2.443   2.491   2.569   2.576
[Figure 3-1 shows six panels, Selecting 1 out of 6 through Selecting 6 out of 6, each plotting the coverage probability against the norm of ∆.]
Figure 3-1. Coverage probabilities as a function of ∆ when p = 6. The plots suggest theminimum is not reached at the origin.
CHAPTER 4
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF A RANDOM NUMBER OF POPULATIONS
From an application perspective, an interesting variation of the selection problem
occurs when the number of populations to be selected is random and depends on the
outcome of the experiment. For instance, in a standard multiple testing scheme, a
common approach is to run all the tests independently (without any corrections, such as
Tukey or Bonferroni) and then declare significant only a subset of the significant tests,
using procedures such as the false discovery rate (FDR).
In addition to the notation introduced in the previous chapters, we assume that
we observe a sequence of numbers d1 > ... > dp obtained as a result of the
experiment, such that di ∈ (−∞, ∞) for 1 ≤ i ≤ p. In this context, for any 0 < α < 1, we
want to determine the value of c > 0 such that

P(θ(1) ∈ X(1) ± c, ..., θ(K) ∈ X(K) ± c) ≥ 1 − α,

where K ∈ {0, ..., p} is a random quantity.
In order to obtain an expression for the coverage probability, we first write it as the sum

P(θ(1) ∈ X(1) ± c, ..., θ(K) ∈ X(K) ± c)
= ∑_{j=1}^{p} P(θ(1) ∈ X(1) ± c, ..., θ(K) ∈ X(K) ± c | K = j) P(K = j)
= ∑_{j=1}^{p} P(θ(1) ∈ X(1) ± c, ..., θ(j) ∈ X(j) ± c) P(K = j).     (4–1)
From our previous results, we notice that for every term in the sum

P(θ(1) ∈ X(1) ± c, ..., θ(j) ∈ X(j) ± c) ≥ (Φ(c) − Φ(−c))^{j−1} [Φ^{p−j+1}(c) − Φ^{p−j+1}(−c)],

and therefore, we can re-write (4–1) and obtain

P(θ(1) ∈ X(1) ± c, ..., θ(K) ∈ X(K) ± c)
≥ ∑_{j=1}^{p} (Φ(c) − Φ(−c))^{j−1} [Φ^{p−j+1}(c) − Φ^{p−j+1}(−c)] P(K = j),     (4–2)

where Φ(·) is the cdf of the standard normal distribution.
Since the inequality above is not obtained by direct minimization of the coverage
probability in (4–1), any solution based on (4–2) is likely to be too conservative.
Therefore, it is important to assess the performance of the proposed bound in terms
of its proximity to the coverage probability. The first thing to determine is the behavior
of the lower bounds at the component level (K = j). Figure 4-1 shows the results of
a numerical study considering the components K = 1, ..., K = 6 of the coverage
probability when p = 6. The dashed blue line shows the behavior of the respective
component as the norm of θ = (θ1, ..., θ6) increases and the red solid line shows the
corresponding lower bound. We observe that the lower bound (for the individual terms) is not
extremely conservative.
On the other hand, the probability that K = j is given by

P(K = j) = ∑_{i=1}^{C(p,j)} ∏_{ℓ∈Ii} P(Xℓ ≥ dj) ∏_{ℓ∈Ii^c} P(Xℓ ≤ dj)
= ∑_{i=1}^{C(p,j)} ∏_{ℓ∈Ii} [1 − Φ(dj − θℓ)] ∏_{ℓ∈Ii^c} Φ(dj − θℓ),     (4–3)
where P(Xℓ ≥ dj) = 1 − Φ(dj − θℓ) is the probability of selection for population ℓ.
Notice that the expression in (4–3) resembles a binomial distribution. In fact, taking
θ1 = ... = θp = θ, we have

∑_{i=1}^{C(p,j)} ∏_{ℓ∈Ii} [1 − Φ(dj − θ)] ∏_{ℓ∈Ii^c} Φ(dj − θ) = C(p, j) [1 − Φ(dj − θ)]^j [Φ(dj − θ)]^{p−j},
the binomial probability with success probability 1 − Φ(dj − θ). This observation
suggests we can use the quantities dj − θ as tuning parameters in order to improve the
performance of the lower bound. Figure 4-2 shows the results of a numerical study in
which we take d1 = ... = dp = d and use the quantity d − θ as a tuning parameter. We
see that by changing the value of the probability of selection we can move the position
of the lower bound (red solid line) and produce some improvement in the approximation of
the coverage probability.
Based on the previous observations, we can obtain an approximate solution to the
problem and determine c > 0 using the equation

1 − α = ∑_{j=1}^{p} C(p, j) (Φ(c) − Φ(−c))^{j−1} [Φ^{p−j+1}(c) − Φ^{p−j+1}(−c)] [1 − Φ(dj − θ)]^j [Φ(dj − θ)]^{p−j},

for any 0 < α < 1.
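A sketch of such a computation (illustrative only; it assumes a common threshold d, so that every population has the same selection probability sel = 1 − Φ(d − θ), as in Figure 4-2):

```python
from math import erf, sqrt, comb

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def rhs(c, p, sel):
    # Right-hand side of the approximate equation with a common
    # selection probability sel = 1 - Phi(d - theta) for every population.
    total = 0.0
    for j in range(1, p + 1):
        cover = (Phi(c) - Phi(-c)) ** (j - 1) * (
            Phi(c) ** (p - j + 1) - Phi(-c) ** (p - j + 1))
        weight = comb(p, j) * sel ** j * (1 - sel) ** (p - j)
        total += cover * weight
    return total

def cutoff(p, sel, alpha=0.05):
    # Bisection: rhs is increasing in c.
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rhs(mid, p, sel) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = cutoff(6, 0.5)
print(round(c, 3))  # between z_{0.025} = 1.96 and the Bonferroni-type cutoff
```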
Numerical studies suggest that the results based on the expression above are not
extremely conservative. In addition, the results suggest that the performance of the method
greatly improves as the number of populations increases (see Figure 4-3).
4.1 Connection to FDR
The false discovery rate (FDR) procedure was introduced by Benjamini and
Hochberg (1995) and is a technique commonly used by practitioners in the context
of multiple testing. The main idea is to control the proportion of errors committed by
falsely rejecting null hypotheses. In simple terms, the procedure works in the following
way: suppose that we need to test m hypotheses H1, ..., Hm and we are not willing to
accept a proportion of false discoveries greater than q. We first rank the P-values (and
corresponding hypotheses) resulting from all the tests from smallest to largest and define
the sequence q1, q2, ..., qm according to qi = (i/m)q for i = 1, ..., m. Then, we define k
to be the largest i such that P-value(i) < qi. If we reject all the hypotheses corresponding
to the first k ordered P-values, the procedure guarantees that the FDR is
no greater than q.
In the context of our problem, we observe that the FDR procedure can be easily
connected with the random selection idea. Suppose that we have m = p hypotheses
of the form H0 : θi ≤ 0 vs. H1 : θi > 0. In other words, we are interested in performing
p one-sided tests for the population means. Then, extreme observations will have
small P-values, and therefore, the selection criterion P-value(i) < qi can be expressed
as X(i) > di . It follows that for the sequence q1 < ... < qp ∈ (0, 1) we can construct
a corresponding sequence d1 > ... > dp ∈ (−∞,∞), and hence, we can produce
confidence intervals for the top K selected populations, where the value of K is
determined by the FDR.
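Under an assumed one-sided z-test with known variance, the translation from P-value thresholds q_i to cutoffs d_i for the sample means can be made explicit; for H0: θ_i ≤ 0, the P-value is 1 − Φ(√n X̄_i/σ), so P-value_i < q_i is equivalent to X̄_i > (σ/√n) z_{1−q_i}. The normal model with known σ and the function name here are illustrative assumptions, not the dissertation's general setting:

```python
from scipy.stats import norm

def fdr_cutoffs(qs, sigma=1.0, n=1):
    """Translate FDR thresholds q_1 < ... < q_p into selection cutoffs
    d_1 > ... > d_p for the sample means, assuming a one-sided z-test of
    H0: theta_i <= 0 with known variance sigma^2 and sample size n."""
    return [sigma / n ** 0.5 * norm.ppf(1 - q) for q in qs]
```

Since the q_i are increasing, the resulting d_i are decreasing, matching the ordering d1 > ... > dp in the text.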
4.2 Tables and Figures
Figure 4-1. Individual components and corresponding bounds for the terms of the coverage probability for random K when p = 6. The blue dashed line corresponds to the coverage probability and the red solid line is the lower bound. (Panels: components K = 1 through K = 6; x-axis: Norm of Delta; y-axis: Coverage Probability.)
Figure 4-2. Behavior of the lower bound for random K when p = 6 as the probability of selection varies. (Panels: probability of selection 0.3, 0.4, 0.5, 0.6, 0.7, 0.8; x-axis: Norm of Delta; y-axis: Coverage Probability.)
Figure 4-3. Behavior of the coverage probabilities and respective lower bounds for random K as the population size p varies. (Panels: random out of p = 5, 10, 15, 20, 40, 60; x-axis: Norm of Delta; y-axis: Coverage Probability.)
CHAPTER 5
APPLICATION EXAMPLE
In this chapter we show a potential application for the procedures we have
introduced in this dissertation. We consider data from a genetic experiment that
compares gene expressions between two different tissue types, infection cushions
(IC) and vegetative hyphae (VH), through competitive hybridization. Using a single
probe, a total of five hybridizations (independent biological replications) were run for
7494 genes. The data consists of the processed signal intensities as a measurement of
the fluorescence reaction of every gene to the probe.
In the context of the experiment, one question of interest is to determine what
genes are differentially expressed between the two tissue types. Since all the genes are
exposed to the same probe, differences that cannot be explained by chance indicate
a variation associated with the tissue type in the corresponding genes. A related
question is what the fold increase or decrease in gene expression is between the
treatments; in other words, what is the mean signal ratio for each gene between treatments. Here, we
consider the problem of determining confidence intervals for the mean signal ratio in
those genes that give the top largest increase between the two treatments.
First, we implement the procedure for the top k genes, where k is fixed and
pre-specified. We end the chapter by showing how the procedure works when K is
chosen at random, as determined by the FDR.
5.1 Fixed Selection
Suppose first that the number of populations to be selected is determined prior
to the experiment. Specifically, suppose that k = 100. For every gene we take the
difference of the log-scores for each of the 5 replications. Then, we have a total of
p = 7494 populations, from which we take independent samples of size n = 5. Although
the number of replications is not large enough to invoke the central limit theorem (CLT),
the data do not show clear deviations from normality. To correct for heterogeneity, we
use the log-scores for the analysis.
Then, we rank the averages of the differences in descending order and select the
genes corresponding to the top 100 values of the sample means. Using the results
presented in Chapter 3, the cutoff value for 95% confidence intervals when p = 7494
and k = 100 is c = 4.35. Table 5-1 shows the mean, standard deviation and confidence
intervals for the top 5 and bottom 5 selected genes. The table shows that, although the
value of the cutoff point c is seemingly large, the actual confidence intervals are narrow
enough to draw practical conclusions.
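As an illustration, the fixed-selection intervals of the form mean ± c s/√n can be computed as follows. This is a sketch with hypothetical function names; the half-width formula is consistent with Table 5-1 (e.g. 4.35 × 0.262/√5 ≈ 0.51 for the top gene, giving roughly (4.25, 5.27)):

```python
import numpy as np

def topk_intervals(samples, k, c):
    """For each population, compute the sample mean of the log-score
    differences, select the top k, and form intervals mean +/- c * s / sqrt(n).
    `samples` is a (p, n) array; `c` is the selection-adjusted cutoff
    (e.g. c = 4.35 for p = 7494, k = 100 at the 95% level)."""
    p, n = samples.shape
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)
    top = np.argsort(means)[::-1][:k]   # indices of the k largest sample means
    half = c * sds[top] / np.sqrt(n)
    return top, np.column_stack((means[top] - half, means[top] + half))
```

The only difference from the traditional intervals is the cutoff c, which replaces the usual normal or t quantile to account for the selection step.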
5.2 Random Selection
An alternative approach consists of performing one-sided t-tests for the mean
log-score difference for every gene. Then, all the P-values are ranked in ascending
order and we declare significance while controlling the FDR. Since the number of genes
that will be selected depends on the outcome of the experiment, we use the results
presented in Chapter 4.
Controlling for a false discovery rate of 5%, we select K = 25 populations. Using
the results from Chapter 4, we obtain that for p = 7494 populations the cutoff point for
95% confidence intervals is c = 4.44 (slightly bigger than the one obtained in the previous
section). Table 5-2 shows the mean, standard deviation, P-value and confidence
intervals for the mean difference of the 25 populations selected using the FDR criterion.
Again, we observe the intervals are narrow enough to carry out meaningful inference. In
fact, the results of all the intervals agree with the conclusions of the tests.
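The per-gene one-sided tests described above can be sketched as follows (the vectorized helper and its name are our own; newer SciPy versions also accept an `alternative='greater'` argument to `ttest_1samp` directly):

```python
import numpy as np
from scipy import stats

def one_sided_pvalues(diffs):
    """One-sided t-test P-values for H0: theta_i <= 0 vs H1: theta_i > 0,
    applied to each row of a (p, n) array of log-score differences.
    The two-sided P-value is halved on the correct side of zero."""
    t, p_two = stats.ttest_1samp(diffs, 0.0, axis=1)
    return np.where(t > 0, p_two / 2, 1 - p_two / 2)
```

These P-values would then be fed to an FDR procedure to determine K, and the selected genes receive intervals with the cutoff c from Chapter 4 in place of the usual t quantile.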
5.3 Tables and Figures
Table 5-1. Confidence intervals based on the selection of the top 100 log-score differences

Ranking  Mean  St Dev  95% CI
1        4.76  0.262   (4.247, 5.268)
2        4.38  0.303   (3.790, 4.969)
3        3.93  0.203   (3.534, 4.325)
4        3.79  0.519   (2.782, 4.804)
5        3.52  0.600   (2.351, 4.685)
96       1.35  0.930   (-0.457, 3.163)
97       1.35  0.680   (0.029, 2.675)
98       1.35  1.459   (-1.488, 4.189)
99       1.35  0.911   (-0.428, 3.118)
100      1.34  0.915   (-0.445, 3.117)
Table 5-2. Confidence intervals based on the selection of the top log-score differences, randomly chosen using FDR.

Mean  St Dev  P-value   95% CI
4.38  0.301   2.67e-08  (3.778, 4.981)
3.93  0.203   3.93e-08  (3.526, 4.333)
4.76  0.262   2.13e-07  (4.237, 5.279)
2.85  0.662   1.60e-06  (1.532, 4.161)
0.95  0.236   2.00e-05  (0.483, 1.420)
3.03  0.588   2.82e-05  (1.859, 4.194)
0.99  0.338   4.84e-05  (0.320, 1.662)
0.48  0.086   6.44e-05  (0.311, 0.652)
1.83  0.351   6.50e-05  (1.130, 2.526)
1.24  0.449   6.64e-05  (0.345, 2.129)
0.88  0.232   6.95e-05  (0.417, 1.337)
1.25  0.457   7.87e-05  (0.344, 2.159)
2.61  1.177   8.81e-05  (0.271, 4.943)
1.32  0.483   9.02e-05  (0.357, 2.277)
0.98  0.173   9.16e-05  (0.638, 1.324)
3.52  0.600   9.57e-05  (2.328, 4.709)
2.74  0.739   1.04e-04  (1.277, 4.212)
1.42  0.450   1.15e-04  (0.530, 2.319)
1.00  0.431   1.19e-04  (0.139, 1.851)
1.12  0.510   1.25e-04  (0.103, 2.127)
3.50  0.582   1.29e-04  (2.343, 4.654)
CHAPTER 6
CONCLUSIONS
We have proposed a method to construct confidence intervals for population means
following the selection of k ≥ 1 populations, where a population is selected if the
corresponding sample mean is among the top k sampled values. Unlike the traditional
intervals, our method takes into account the selection procedure and therefore
maintains the nominal coverage probability. Numerical studies show that the
new intervals perform better than the traditional intervals for any configuration of the
population means and they are consistently narrower than the Bonferroni intervals.
The methodology we have proposed to construct the intervals is based on the
minimization of the coverage probability. In Chapter 2 we proved that for k = 1 the
configuration of the population means (θ1, ... , θp) that minimizes the coverage probability
is the iid case, that is, whenever θ1 = θ2 = ... = θp = θ, for any value of θ. Moreover,
when this is the case, the coverage probability of the confidence intervals is determined
by the cumulative distribution function of the first order statistic, X(1) = max{X1, ... , Xp}.

For k > 1, we proved in Chapter 3 that the optimal configuration is reached
asymptotically when the top k − 1 population means go to +∞ and the remaining
p − k + 1 are equal. The approach we considered leads to an explicit formula for the
minimum of the coverage probability that includes k = 1 as a particular case.
In Chapter 4 we extended our results to the case where the number of selected
populations, K , is a random quantity depending on the outcome of the experiment.
Although we did not present a solution based on the direct minimization of the coverage
probability, we proposed a conservative approach, introducing a lower bound for the
coverage probability based on the results obtained in Chapter 3.
Intuitively, in order to construct confidence intervals that maintain the nominal level
in the context of selection, we need to take into account the variability coming from the
selection mechanism itself, and as a result, the confidence intervals are expected to be
longer. In addition, the conservative solutions tend to increase the length of the intervals.
However, the solutions we presented here have been shown to perform well in diverse
numerical studies and real applications. Although longer than the traditional intervals,
the proposed confidence intervals are not only shorter than the Bonferroni intervals, but
also grow at a very slow rate. In addition, all the main results presented remain valid if we
consider intervals of the form (c1(x), c2(x)). This opens the possibility of reducing the
length of the intervals by constructing non-symmetric confidence intervals, where the
interval limits can be shrunk using, for instance, empirical Bayes estimators.

Finally, observe that the approach discussed in Chapter 4 encourages the use of
confidence intervals as a way to determine significance. Such an approach could be used
in combination or in competition with FDR, but further investigation is required.
LIST OF REFERENCES

Bechhofer, R. (1954). A single-sample multiple decision procedure for ranking means of normal populations with known variances. The Annals of Mathematical Statistics 25(1), 16–39.

Bechhofer, R., T. Santner, and D. Goldsman (1995). Design and analysis of experiments for statistical selection, screening, and multiple comparisons. Wiley.

Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 289–300.

Berger, J. (1976). Inadmissibility results for generalized Bayes estimators of coordinates of a location vector. The Annals of Statistics, 302–333.

Blumenthal, S. and A. Cohen (1968). Estimation of the larger of two normal means. Journal of the American Statistical Association 63(323), 861–876.

Brown, L. (1979). A heuristic method for determining admissibility of estimators–with applications. The Annals of Statistics, 960–994.

Chen, H. and E. Dudewicz (1976). Procedures for fixed-width interval estimation of the largest normal mean. Journal of the American Statistical Association 71(355), 752–756.

Cohen, A. and H. Sackrowitz (1982). Estimating the Mean of the Selected Population. Third Purdue Symposium on Statistical Decision Theory and Related Topics.

Cohen, A. and H. Sackrowitz (1986). A Decision Theoretic Formulation for Population Selection Followed by Estimating the Mean of the Selected Population. Fourth Purdue Symposium on Statistical Decision Theory and Related Topics, 243–270.

Dahiya, R. (1974). Estimation of the mean of the selected population. Journal of the American Statistical Association 69(345), 226–230.

Gupta, S. and K. Miescke (1990). On finding the largest normal mean and estimating the selected mean. Sankhya: The Indian Journal of Statistics, Series B 52(2), 144–157.

Gupta, S. and S. Panchapakesan (2002). Multiple decision procedures: theory and methodology of selecting and ranking populations. Society for Industrial Mathematics.

Gupta, S. and M. Sobel (1957). On a statistic which arises in selection and ranking problems. The Annals of Mathematical Statistics 28(4), 957–967.

Guttman, I. and G. Tiao (1964). A Bayesian approach to some best population problems. The Annals of Mathematical Statistics 35(2), 825–835.

Hwang, J. (1993). Empirical Bayes Estimation for the Means of the Selected Populations. Sankhya: The Indian Journal of Statistics, Series A 55(2), 285–304.

Lele, C. (1993). Admissibility results in loss estimation. The Annals of Statistics 21(1), 378–390.

Putter, J. and D. Rubinstein (1968). On estimating the mean of a selected population. Technical Report 165.

Qiu, J. and J. Hwang (2007). Sharp simultaneous intervals for the means of selected populations with application to microarray data analysis. Biometrics 63, 767–776.

Sackrowitz, H. and E. Samuel-Cahn (1984). Estimation of the mean of a selected negative exponential population. Journal of the Royal Statistical Society, Series B (Methodological) 46(2), 242–249.

Sackrowitz, H. and E. Samuel-Cahn (1986). Evaluating the chosen population: a Bayes and minimax approach. Lecture Notes–Monograph Series, 386–399.

Saxena, K. (1976). A single-sample procedure for the estimation of the largest mean. Journal of the American Statistical Association, 147–148.

Saxena, K. and Y. Tong (1969). Interval estimation of the largest mean of k normal populations with known variances. Journal of the American Statistical Association, 296–299.

Stein, C. (1964). Contribution to the discussion of Bayesian and non-Bayesian decision theory. Handout, Institute of Mathematical Statistics Meeting.
BIOGRAPHICAL SKETCH
Claudio Fuentes was born in Chile in 1977. Upon graduation from high school, he
enrolled as a student at the Pontificia Universidad Catolica de Chile, where he received
a degree of Bachelor of Science in mathematics in 2001. During his undergraduate
studies he was appointed as a teaching assistant for several courses. It was then that he
developed a deep appreciation for teaching and decided to pursue an academic career.
In December 2003, he received a master's degree in statistics from the same institution.
In August 2005, he entered the graduate program in the Department of Statistics
at the University of Florida. During his education there, he had the opportunity to work
as a research assistant for Distinguished Professor Dr. George Casella, who became
his advisor. In August 2008 he earned the degree of Master of Science in statistics with
a thesis in cluster analysis, and in August 2011 he earned his Ph.D. in statistics with a
dissertation in interval estimation following selection. After graduation, he joined the
Department of Statistics at Oregon State University as an assistant professor.