2448 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 7, NOVEMBER 1999

Multihypothesis Sequential Probability Ratio Tests—Part I: Asymptotic Optimality

Vladimir P. Dragalin, Alexander G. Tartakovsky, and Venugopal V. Veeravalli, Senior Member, IEEE

Abstract—The problem of sequential testing of multiple hypotheses is considered, and two candidate sequential test procedures are studied. Both tests are multihypothesis versions of the binary sequential probability ratio test (SPRT), and are referred to as MSPRT's. The first test is motivated by Bayesian optimality arguments, while the second corresponds to a generalized likelihood ratio test. It is shown that both MSPRT's are asymptotically optimal relative not only to the expected sample size but also to any positive moment of the stopping time distribution, when the error probabilities or, more generally, risks associated with incorrect decisions are small. The results are first derived for the discrete-time case of independent and identically distributed (i.i.d.) observations and simple hypotheses. They are then extended to general, possibly continuous-time, statistical models that may include correlated and nonhomogeneous observation processes. It is also demonstrated that the results can be extended to hypothesis testing problems with nuisance parameters, where the composite hypotheses, due to nuisance parameters, can be reduced to simple ones by using the principle of invariance. These results provide a complete generalization of the results given in [36], where it was shown that the quasi-Bayesian MSPRT is asymptotically efficient with respect to the expected sample size for i.i.d. observations. In a companion paper [12], based on nonlinear renewal theory, we find higher order approximations, up to a vanishing term, for the expected sample size that take into account the overshoot over the boundaries of decision statistics.

Index Terms—Asymptotic optimality, invariant sequential tests, multialternative sequential tests, one-sided SPRT, renewal theory, slippage problems.

I. INTRODUCTION

THE goal of statistical hypothesis testing is to classify a sequence of observations into one of M possible hypotheses based on some knowledge of the statistical distributions of the observations under each of the hypotheses. For sequential testing problems, the number of observations

Manuscript received April 6, 1998; revised February 2, 1999. This work was supported in part by the National Science Foundation under Grant NCR-9523967, by the National Institutes of Health under Grant RO1 HL58751, and by the Office of Naval Research under Grant N00014-95-1-0229.

V. P. Dragalin is with the Department of Biostatistics, University of Rochester, Rochester, NY 14642 USA (e-mail: [email protected]).

A. G. Tartakovsky is with the Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113 USA (e-mail: [email protected]).

V. V. Veeravalli is with the School of Electrical Engineering and the Department of Statistical Science, Cornell University, Ithaca, NY 14853 USA (e-mail: [email protected]).

Communicated by M. L. Honig, Associate Editor for Communications. Publisher Item Identifier S 0018-9448(99)07008-X.

used (sample size) is allowed to be variable, i.e., the sample size is a function of the observations. A sequential test picks a stopping time and a final decision rule to effect a tradeoff between sample size and decision accuracy.

The majority of research in sequential hypothesis testing has been restricted to two hypotheses. However, there are several situations, particularly in engineering applications, where it is natural to consider more than two hypotheses. Examples include, among a multitude of others, target detection in multiple-resolution radar [26] and infrared systems [13], signal acquisition in direct sequence code-division multiple access systems [37], and statistical pattern recognition [16]. It is thus of interest to study sequential testing of more than two hypotheses.

It is well known that for binary hypothesis testing (M = 2), Wald's sequential probability ratio test (SPRT) is optimal, in the sense that it simultaneously minimizes both expectations of the sample size among all tests (sequential and nonsequential) for which the probabilities of error do not exceed predefined values (see, e.g., Wald and Wolfowitz [40], Chernoff [6], and Lehmann [22]). Unfortunately, if M > 2, it is not clear if there even exists a test that minimizes the expected sample size for all hypotheses. Moreover, existing research indicates that even if such a test exists, it would be very difficult to find its structure. For this reason a substantial part of the development of sequential multihypothesis testing has been directed toward the study of practical, suboptimal sequential tests and the evaluation of their asymptotic performance; see [3], [5]–[11], [21], [25], [30]–[38], and many others. A model for studying such asymptotics was first introduced by Chernoff [5], wherein he studied the independent and identically distributed (i.i.d.) case and assumed a cost per observation that was allowed to go to zero. Generalizations to non-i.i.d. cases (where log-likelihood ratios fail to be random walks) have been made by Golubev and Khas'minskii [18], Lai [23], Tartakovsky [31], [32], [34], [35], as well as Verdenskaya and Tartakovskii [38].
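As a concrete point of reference, the binary SPRT can be sketched in a few lines; the hypothesis setup, threshold values, and function names below are our own illustration, not taken from the paper:

```python
import math
import random

def wald_sprt(samples, llr, a, b):
    """Binary SPRT sketch: accumulate per-sample log-likelihood ratios
    log(f1/f0) and stop when the cumulative sum leaves (-b, a).
    Returns (decision, n): 1 accepts H1, 0 accepts H0, None if the
    sample is exhausted without a decision."""
    s = 0.0
    for n, x in enumerate(samples, start=1):
        s += llr(x)
        if s >= a:
            return 1, n
        if s <= -b:
            return 0, n
    return None, len(samples)

# Illustration: H0: N(0, 1) vs. H1: N(1, 1); the per-sample LLR is x - 0.5.
random.seed(1)
data_h1 = [random.gauss(1.0, 1.0) for _ in range(1000)]
decision, n = wald_sprt(data_h1, lambda x: x - 0.5,
                        a=math.log(99), b=math.log(99))
```

With thresholds near log(99), both error probabilities are (to first order) about 1%; the random sample size adapts to how informative the data happen to be.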

For the binary SPRT, it is well known that renewal theory is useful in obtaining asymptotically exact expressions for expected sample sizes and error probabilities (see, e.g., [29] and [41]). An approach to applying renewal theory techniques to sequential multihypothesis tests was recently elucidated by Baum and Veeravalli [3], wherein they studied a generalization¹ of the SPRT to more than two hypotheses that is motivated by a Bayesian setting. This quasi-Bayesian

¹This generalization also appears in other papers [18], [30]–[32] in various contexts.

0018–9448/99$10.00 © 1999 IEEE


multihypothesis SPRT (or MSPRT) was shown to be amenable to an asymptotic analysis using nonlinear renewal theory [41], and asymptotic expressions for the expected sample size and error probabilities were obtained in [3]. Furthermore, it was established in [36] that the quasi-Bayesian MSPRT is asymptotically efficient with respect to expected sample size, for the i.i.d. observation case.

In this paper, we continue to investigate the asymptotic behavior of the quasi-Bayesian MSPRT, as well as a related test that corresponds to a generalized likelihood ratio test. The latter test was suggested by Armitage [1], and studied in [8]–[10], [25], and [30]–[38] in various contexts. We provide a complete generalization of previous results, including that obtained in [36], in the following directions: i) we show that both MSPRT's are asymptotically optimal relative not only to the expected sample size but also to any positive moment of the stopping time distribution; and ii) we extend these results to general, possibly continuous-time, statistical models that may include correlated and nonhomogeneous observation processes. We also demonstrate that our results are applicable to hypothesis testing problems with nuisance parameters. This complete generalization justifies the use of the MSPRT's in a variety of applications, some of which are discussed in the paper.

The remainder of the paper is organized as follows. In Section II, we formulate the problem, present a motivation, and describe the structure of sequential test procedures. In Section III, we derive lower bounds for the finite moments of the stopping time distribution under very general conditions, which then are used in proving asymptotic optimality. In Section IV, first we consider the discrete-time i.i.d. case and prove that, under certain natural conditions, both tests minimize not only the expected sample size but also any positive moment of the stopping time distribution as error probabilities (or, more generally, decision risks) tend to zero. Then the asymptotic optimality results are shown to be valid for general statistical models that involve correlated and nonhomogeneous observation processes. This generalization to non-i.i.d. cases is done in the spirit of the work in [23] and [35], and we borrow many auxiliary results from this work. Section V deals with a specific class of multipopulation problems—the so-called "slippage" problems. It is shown that slippage problems are related to problems that arise in target detection in multiresolution or multichannel systems. In Section VI, we illustrate the general results through examples related to detecting signals in multichannel systems, including composite testing situations with nuisance parameters. Finally, in Section VII we give some concluding remarks.

Higher order approximations for the expected sample size, up to a vanishing term, will be given in the companion paper [12]. These approximations take into account the overshoot over the boundaries of decision statistics, and their derivation is based on nonlinear renewal theory. In the companion paper, we also present simulation results which confirm that the obtained approximations are highly accurate not only for small but also for moderate error probabilities, which are typical for many applications.

II. PROBLEM FORMULATION AND TEST PROCEDURES

A. Basic Notation

Throughout the paper (Ω, F, P) is a probability space on which everything is defined, and X = {X(t), t ≥ 0} is a random process (defined on this space) which is observed either in discrete or continuous time. By F_t will be denoted the sub-σ-algebra of F generated by X observed up to time t.

We first consider the problem of sequential testing of M simple hypotheses "H_i : P = P_i," i = 1, …, M, where the P_i are completely known distinct probability measures. An example where the measures P_i are not completely known (belong to some families of distributions) will be considered in Section VI.

A pair δ = (τ, d) is said to be a sequential hypothesis test if τ is a Markov stopping time with respect to the family {F_t} (i.e., {τ ≤ t} ∈ F_t), and d is an F_τ-measurable function (terminal decision function) with values in the set {1, 2, …, M}. Therefore, {d = i} is identified with accepting the hypothesis H_i.

Next, let π = (π_1, …, π_M) be the prior distribution of the hypotheses, π_i = P(H_i) > 0, and let the consequence of deciding d = j when the hypothesis H_i is true be evaluated by the loss function W(i, j) ≥ 0. Suppose, without loss of generality, that the losses due to correct decisions are zero (i.e., W(i, i) = 0). The risk associated with making the decision d = j is then given by

R_j(δ) = Σ_{i=1}^{M} π_i W(i, j) α_{ij}(δ)

where, for i ≠ j, α_{ij}(δ) = P_i(d = j) is the probability of accepting the hypothesis H_j when H_i is true (the conditional probability of error). Note that R_j(δ) is the risk associated with wrong decisions; the sum of these risks constitutes the average risk, which does not include the observation or experiment cost. In the particular case of the zero–one loss function, where W(i, j) = 1 for i ≠ j, the risk R_j(δ) is the same as the frequentist error probability α_j(δ), which is the probability of accepting H_j incorrectly. That is, for the zero–one loss function

α_j(δ) = R_j(δ) = Σ_{i ≠ j} π_i α_{ij}(δ).
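The risk computation just described is easy to sketch numerically. The priors, loss matrix, and conditional decision probabilities below are made-up illustrative values, and the function name is our own:

```python
def decision_risks(prior, loss, alpha):
    """Risk of deciding j: R_j = sum_i prior[i] * loss[i][j] * alpha[i][j],
    where alpha[i][j] is the probability of deciding j when hypothesis i
    is true."""
    M = len(prior)
    return [sum(prior[i] * loss[i][j] * alpha[i][j] for i in range(M))
            for j in range(M)]

# Zero-one loss: W(i, j) = 1 for i != j and W(i, i) = 0.
M = 3
zero_one = [[0 if i == j else 1 for j in range(M)] for i in range(M)]
prior = [0.5, 0.3, 0.2]
# Illustrative conditional decision probabilities alpha[i][j] = P_i(d = j).
alpha = [[0.96, 0.03, 0.01],
         [0.02, 0.95, 0.03],
         [0.01, 0.04, 0.95]]
risks = decision_risks(prior, zero_one, alpha)
# With zero-one loss, risks[j] is the frequentist probability of accepting
# hypothesis j incorrectly, e.g. risks[0] = 0.3*0.02 + 0.2*0.01 = 0.008.
```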

We now introduce the following class of tests:

Δ(α) = {δ : R_j(δ) ≤ α_j, j = 1, …, M}   (2.1)

where α = (α_1, …, α_M) is a vector of positive finite numbers. In other words, the class consists of the tests for which the risks do not exceed the predefined numbers α_1, …, α_M.

In what follows, E_i will denote the expectation with respect to the measure P_i. A reasonable formulation of the problem of optimizing sequential tests under given prior information is to find a test that minimizes the average observation time E[τ] = Σ_i π_i E_i[τ] among all tests belonging to the class (2.1).


The structure of such a test has been determined in [31] and [32] (see also [3]). Specifically, if time is discrete, i.e., t = n ∈ {1, 2, …}, and the observations are i.i.d., the optimal stopping time is the first time n, n ≥ 1, at which the posterior probability of some hypothesis exceeds a random threshold. Here the posterior probability of the i-th hypothesis is computed from the first n observations, and the threshold is a nonlinear function, depending on the distribution of the observations, of the vector of posterior probabilities with the i-th component excluded. Thus the optimal test involves a comparison of posterior probabilities with random thresholds which must be determined for each model. In general, it is nearly impossible to find the form of these threshold functions (for some special cases see [31], [32], and [3]). Furthermore, even if one is able to find the boundaries of the optimal test, it would be difficult to implement this procedure in practice. In addition, while in the case of two hypotheses Wald's SPRT minimizes not only the average observation time E[τ] but also both expectations E_1[τ] and E_2[τ], it is unclear if such a test exists for M > 2 (i.e., a test which minimizes E_i[τ] for all i = 1, …, M). However, in an asymptotic setting when the risks (error probabilities) are sufficiently small, such tests may be found.

B. Sequential Test Procedures

In the following, we simplify the structure of the optimal test by replacing the nonlinear random thresholds with simple functions (constants in the case of the zero–one loss function). Specifically, we consider the two following multialternative sequential tests, which will be called multihypothesis sequential probability ratio tests (MSPRT's).² The time parameter may be either discrete or continuous. Let P_i^t be the restriction of the measure P_i to the σ-algebra F_t, and let

Z_i^Q(t) = log (dP_i^t / dQ^t)

denote the log-likelihood ratio (LLR) processes with respect to a dominating measure Q. If Q = P_j, the corresponding LLR process will be denoted Z_{ij}(t).

1) Test δ_a: Introduce the Markov times ν_i, i = 1, …, M,

(2.2)

where the a_i are positive thresholds. The test procedure δ_a = (τ_a, d_a) is defined as follows:

τ_a = min_{1≤i≤M} ν_i;   d_a = i if τ_a = ν_i.

That is, we stop as soon as the threshold in (2.2) is exceeded for some i, and decide that this is the true hypothesis. This test is motivated by a Bayesian framework and was considered earlier by Fishman [15], Golubev and Khas'minskii [18], Sosulin and Fishman [30], Tartakovsky [31], [32], as well

²This generalization also appears in other papers [18], [30], [31], and [32] in various contexts.

as Baum and Veeravalli [3], [36]. Indeed, in the special case of zero–one losses the stopping times ν_i may be rewritten in terms of the posterior probability p_i(t) of the hypothesis H_i: the test stops on ν_i as soon as p_i(t) exceeds a threshold determined by a_i. Note also that if a_i = a (does not depend on i), the stopping time of the test δ_a is the first time at which the largest posterior probability, max_i p_i(t), exceeds a threshold.
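A minimal simulation sketch of this posterior-threshold stopping rule follows; the Gaussian hypotheses, the common threshold value, and all names are our own assumptions for illustration, not the paper's:

```python
import math
import random

def mspr_test_a(samples, densities, prior, threshold):
    """Quasi-Bayesian MSPRT sketch (zero-one loss, common threshold):
    recursively update the posterior probabilities of the M hypotheses
    and stop the first time the largest posterior exceeds `threshold`.
    Returns (accepted index, sample size), or (None, len(samples))."""
    post = list(prior)
    for n, x in enumerate(samples, start=1):
        post = [p * f(x) for p, f in zip(post, densities)]
        total = sum(post)
        post = [p / total for p in post]          # renormalize each step
        i = max(range(len(post)), key=post.__getitem__)
        if post[i] >= threshold:
            return i, n
    return None, len(samples)

# Three Gaussian hypotheses on the mean: H_i : N(mu_i, 1).  The 1/sqrt(2*pi)
# constant cancels in the normalization, so it is omitted from the densities.
mus = [0.0, 1.0, 2.0]
densities = [lambda x, m=m: math.exp(-0.5 * (x - m) ** 2) for m in mus]
random.seed(7)
data = [random.gauss(2.0, 1.0) for _ in range(500)]   # data from the third
decision, n = mspr_test_a(data, densities, prior=[1/3] * 3, threshold=0.99)
```

Normalizing the posterior at every step keeps the recursion numerically stable, so no log-domain bookkeeping is needed in this sketch.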

2) Test δ_b: Let b_{ij}, j ≠ i, be positive thresholds, and let

η_i = the first time t ≥ 0 such that Z_{ij}(t) ≥ b_{ij} for all j ≠ i   (2.3)

be the Markov "accepting" time for the hypothesis H_i. The test δ_b = (τ_b, d_b) is defined as follows:

τ_b = min_{1≤i≤M} η_i;   d_b = i if τ_b = η_i.

This test represents a modification of the matrix SPRT (the combination of one-sided SPRT's) that was suggested by Lorden [25]. It was considered earlier by Armitage [1], Lorden [25], Dragalin [8]–[10], Tartakovsky [32], [34], [35], as well as Verdenskaya and Tartakovskii [38]. Note also that if the thresholds do not depend on j and the loss function is zero–one, then the Markov time η_i may be represented in the form

η_i = inf{t : log [λ_i(t) / max_{j≠i} λ_j(t)] ≥ b_i}

where λ_i(t) is the likelihood of the hypothesis H_i; i.e., λ_i(t) / max_{j≠i} λ_j(t) is the generalized likelihood ratio between H_i and the remaining hypotheses.
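The zero–one, equal-threshold case of this rule can be sketched as follows; the Gaussian-mean hypotheses, threshold, and names are our own illustration:

```python
import math
import random

def mspr_test_b(samples, loglik, M, a):
    """Matrix-SPRT sketch (common threshold a): accept hypothesis i at the
    first time n at which the cumulative LLR of i against every competitor
    j != i is at least a, i.e. Z_i(n) - max_{j != i} Z_j(n) >= a."""
    Z = [0.0] * M                      # cumulative log-likelihoods
    for n, x in enumerate(samples, start=1):
        for i in range(M):
            Z[i] += loglik(i, x)
        for i in range(M):
            # generalized LLR of hypothesis i against its nearest competitor
            if Z[i] - max(Z[j] for j in range(M) if j != i) >= a:
                return i, n
    return None, len(samples)

# Three Gaussian hypotheses on the mean: H_i : N(mu_i, 1); additive
# constants of the log-density cancel in the LLR differences.
mus = [0.0, 1.0, 2.0]
loglik = lambda i, x: -0.5 * (x - mus[i]) ** 2
random.seed(3)
data = [random.gauss(0.0, 1.0) for _ in range(500)]   # data from the first
decision, n = mspr_test_b(data, loglik, M=3, a=math.log(100))
```

Note that, unlike the posterior-based rule, this sketch works entirely with sums of log-likelihoods, which is what makes it convenient for exponential-family models.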

Suppose the distributions of the observations come from exponential families, which are good models for many applications. Then the test δ_b (as defined in (2.3)) has an advantage over the first test in that it does not require exponential transformations of the observations. This fact makes it more convenient for practical realization and simulation. Furthermore, δ_b may easily be modified to meet constraints on conditional risks (see [8]–[10], [32], [34], and [35]) that may be more relevant in some practical applications. However, as we shall see in [12], δ_a has the advantage that it is easier to design the thresholds to precisely meet constraints on the risks.

We begin our analysis by deriving bounds on the risks in terms of the thresholds. This allows us to choose the thresholds in such a way as to guarantee that the tests belong to the class defined in (2.1). We emphasize that the bounds hold


under general assumptions, and require neither independence nor homogeneity of the observed data. The proof is given in the Appendix (see also [1], [3], [11], [32], [34], [35], and [38] for related results).

Lemma 2.1: Let {X(t)} be an arbitrary random process observed either in discrete or continuous time. Then the following bounds on the risks of the two tests hold for all i = 1, …, M.

Corollary 2.1: Let

(2.4)

Then both tests belong to the class introduced in (2.1).

It should be pointed out that the following implication holds:

(2.5)

To show (2.5) we rewrite the two stopping times as

and

Then, inequality (2.5) follows from the fact that

A consequence of (2.5) is that if the thresholds are chosen according to (2.4), then .

III. LOWER BOUNDS FOR MOMENTS OF THE STOPPING TIME DISTRIBUTION

The following theorem is of fundamental importance for proving asymptotic optimality. It establishes the potential asymptotic performance of any sequential or nonsequential multihypothesis test in the class (2.1) as the risks go to zero. For convenience, we write α_max = max_{1≤j≤M} α_j. Also the standard abbreviation P_i-a.s. is used for almost sure convergence under the measure P_i.

Theorem 3.1: Assume there exists an increasing nonnegative function f(t) and positive finite constants D_{ij}, j ≠ i, such that³

Z_{ij}(t)/f(t) → D_{ij}   P_i-a.s. as t → ∞.   (3.1)

In addition, assume that for all

P (3.2)

Then for all m > 0 and i = 1, …, M, every test δ in the class (2.1) satisfies

E_i[τ^m] ≥ [Φ(max_{j≠i} |log α_j| / D_{ij})]^m (1 + o(1))   as max_j α_j → 0   (3.3)

where Φ is the inverse function of f, and the o(1) term vanishes as max_j α_j → 0.

Proof: Let denote the class of tests for which P , , and let . It follows from Tartakovsky [35, Lemma 2.1] that under conditions (3.1) and (3.2)

P

for every

Since for any , it follows that

P

for every (3.4)

where .Denoting

and applying the Chebyshev inequality, we obtain

E P for any and

where by (3.4)

P for every

Thus

E for all

and (3.3) follows.

IV. ASYMPTOTIC OPTIMALITY OF THE MSPRT'S

A. The Case of i.i.d. Observations

So far we have considered quite a general case, imposing only minor restrictions on the structure of the observed process. In this subsection we consider the discrete-time case t = n ∈ {1, 2, …} and assume that, under hypothesis H_i, the random

³Note that (3.1) is nothing but the Strong Law of Large Numbers for the normalized LLR's Z_{ij}(t)/f(t). The function f(t) characterizes the degree of nonhomogeneity of the LLR's.


variables X_1, X_2, … are i.i.d. with a density f_i (relative to some sigma-finite measure), and that the densities f_i and f_j are distinct for all i ≠ j. In other words, these densities do not coincide with probability one (w.p. 1), in the sense that P_i(f_i(X) ≠ f_j(X)) > 0 for all i and all j ≠ i.

In general, the X_n's will represent vector-valued random variables. Let us define ΔZ_{ij}(n) = log [f_i(X_n)/f_j(X_n)] to be the log-likelihood ratio of the individual observation X_n, so that Z_{ij}(n) = Σ_{k=1}^{n} ΔZ_{ij}(k).

Further, we define

D_{ij} = E_i[ΔZ_{ij}(1)],   D_i = min_{j≠i} D_{ij}.

Note that the D_{ij} are the Kullback–Leibler (KL) information numbers ("distances"), and D_i is the minimal distance between the hypothesis H_i and the other hypotheses. Due to the aforementioned condition of distinctness of measures, these KL distances are strictly positive. We shall also assume that they are finite.

In what follows, we will consider a general, asymmetric

case (with respect to risk constraints), where it is assumed that the numbers α_j approach zero in such a way that for all j

(4.1)

That is, we assume that the ratios of the |log α_j|'s (or, more generally, the corresponding constants c_j) are bounded away from zero and infinity. This guarantees that no α_j goes to zero at an exponentially faster (or slower) rate than any other. Note that we do not require that the α_j's go to zero at the same rate—if this were the case all the c_j's would equal one. The reason for allowing c_j's other than one is that there are many interesting problems for which the risks for the various hypotheses may be orders of magnitude different.
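For concreteness, in a Gaussian-mean model the KL numbers have the closed form D_{ij} = (μ_i − μ_j)²/2σ², so the matrix of distances and the minimal distance from each hypothesis to its nearest alternative can be tabulated directly. The particular means below are our own illustration:

```python
def kl_gauss_mean(mu_i, mu_j, sigma=1.0):
    """Kullback-Leibler number between N(mu_i, sigma^2) and N(mu_j, sigma^2),
    which for equal variances reduces to (mu_i - mu_j)^2 / (2 sigma^2)."""
    return (mu_i - mu_j) ** 2 / (2.0 * sigma ** 2)

mus = [0.0, 1.0, 3.0]
M = len(mus)
D = [[kl_gauss_mean(mus[i], mus[j]) for j in range(M)] for i in range(M)]
# Minimal KL distance from each hypothesis to its nearest alternative.
D_min = [min(D[i][j] for j in range(M) if j != i) for i in range(M)]
# D_min = [0.5, 0.5, 2.0]: the middle hypothesis is close to both
# neighbors, while the third is well separated from the other two.
```

The minimal distances are exactly the quantities that govern the first-order growth of the expected sample size as the risks shrink.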

The following lemma establishes the asymptotic lower bounds for the moments of the stopping time in the i.i.d. case. It is then used to prove the asymptotic optimality of the analyzed tests in the class (2.1) as the risks go to zero.

Lemma 4.1: If X_1, X_2, … are i.i.d. random vectors and 0 < D_{ij} < ∞, then for any m > 0 and i = 1, …, M, every test δ in the class (2.1) satisfies

E_i[τ^m] ≥ [max_{j≠i} (|log α_j| / D_{ij})]^m (1 + o(1))   as max_j α_j → 0.   (4.2)

Proof: In the i.i.d. case the condition E_i|ΔZ_{ij}(1)| < ∞ is sufficient for conditions (3.1) and (3.2) to be satisfied with f(t) = t and Φ(t) = t. Thus the lemma follows directly from Theorem 3.1.

Now, in order to prove first-order asymptotic optimality, we have only to show that the lower bounds (4.2) are achieved for the tests considered. The details of the proof are given in the Appendix. The following theorem summarizes the main results on the asymptotic performance and the asymptotic optimality of the MSPRT's with respect to any positive moment of the stopping time distribution. Everywhere below, writing that two quantities are asymptotically equal means that their ratio tends to one in the indicated limit.

Theorem 4.1: Let 0 < D_{ij} < ∞.

i) For all m > 0 and i = 1, …, M, the moments E_i[τ_a^m] and E_i[τ_b^m] grow, asymptotically as the thresholds tend to infinity, like the m-th power of the ratio of the relevant threshold to the corresponding KL distance D_{ij}, the slowest pair j ≠ i dominating (4.3).

ii) If the thresholds are determined by (2.4), then

E_i[τ_a^m] ~ E_i[τ_b^m] ~ inf_δ E_i[τ^m]   as max_j α_j → 0, for all m > 0 and i = 1, …, M   (4.4)

where the infimum is taken over all tests δ in the class (2.1).

For m = 1, the asymptotic formulas of (4.3) describe the behavior of the first term of the expansion of the mean sample size. The behavior of the second term remains unclear—it may tend to infinity or may be of the order of a constant. In the companion paper [12], we derive higher order approximations to the expected sample size up to a vanishing term. These approximations require further conditions on the higher moments of the LLR's. As shown in [12], in a special asymmetric case, the second term is of the order of a constant whenever the second moment of the LLR's is finite. In general, however, the second term is shown to tend to infinity as a square root of the threshold.
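The first-order behavior can be checked by simulation in the simplest binary one-sided case, where the mean sample size should be roughly the threshold divided by the KL number, plus an overshoot contribution of the kind discussed above. The setup below is our own sketch, not the paper's experiment:

```python
import random

def one_sided_sprt_time(a, drift=0.5, sigma=1.0, rng=random):
    """Sample the stopping time inf{n : S_n >= a} for a random walk S_n of
    Gaussian LLR increments with mean `drift` (the KL number of the
    underlying binary Gaussian problem)."""
    s, n = 0.0, 0
    while s < a:
        n += 1
        s += rng.gauss(drift, sigma)
    return n

random.seed(11)
a, D = 10.0, 0.5
times = [one_sided_sprt_time(a, drift=D) for _ in range(2000)]
mean_n = sum(times) / len(times)
# First-order theory predicts mean_n close to a / D = 20; the excess over
# 20 observed in simulation is the overshoot contribution.
```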

Remark 4.1: Theorem 4.1 remains valid in the continuous-time case where the LLR's are random processes with independent and stationary increments with finite first absolute moment.

Remark 4.2: We emphasize that the tests considered asymptotically minimize all positive moments of the observation time under only the condition of positiveness and finiteness of the KL distances—finiteness of higher order moments is not required. The required condition holds in most practical problems. In particular, in problems of signal detection, all that is required is that the signal-to-noise ratio (SNR) be finite and positive.

Remark 4.3: In the "symmetric" case (with respect to risks), where the α_j go to zero at the same rate for all j (4.5) (cf. (4.1)), a much stronger (near-optimality) result is true for the expected observation time. More specifically, if the thresholds are chosen so that the risk constraints are met and (4.5) is fulfilled, then

E_i[τ_b] = inf_δ E_i[τ] + O(1)   as max_j α_j → 0   (4.6)

where the infimum is over all tests in the class (2.1); the same is, of course, true for the other test. This fact may be derived by using the results of Lorden [25]. We conjecture that (4.6) is also true for the more general case given in (4.1). We do not have a rigorous proof, but simulation results presented in [12] support our conjecture.


Remark 4.4: Since the two proposed MSPRT's have distinctly different structures, and are both asymptotically optimal, it is natural to ask if this asymptotic optimality is shared by a host of other tests. Clearly, since we only have first-order optimality, simple modifications of the proposed MSPRT's (such as adding a constant delay) preserve the optimality property. However, it is interesting to see if some other seemingly reasonable test with a very different structure will also have this property. As an example, consider the following. For simplicity, let α_j = α for all j, and consider the maximum-likelihood test procedure that stops the first time the largest likelihood exceeds a threshold and accepts the maximizing hypothesis, where the threshold is chosen so that the test meets the risk constraint α. Note that this test is a natural extension of the optimum fixed sample size test under a constraint on the Bayes risk. Now, using the results of Sosulin and Fishman [30, pp. 179–182], it can be proved that in a symmetric i.i.d. Gaussian case the expected sample size of this procedure is asymptotically strictly larger than that of the MSPRT's as α → 0. That is, the seemingly reasonable maximum-likelihood test is not asymptotically optimum.

B. The Non-i.i.d. Case

Many results in sequential analysis, including the ones that we obtained above, are based on the random walk structure of the LLR's. In this subsection we study the situation where the LLR's lose this nice property, and are general, possibly continuous-time, processes. The motivation for this study is as follows. The random walk structure is not valid in many practical applications where observations are nonhomogeneous and/or correlated. In addition, there is an important class of problems with nuisance parameters that admit invariant solutions. Invariant LLR's are not random walks even in the simplest cases where the models generate i.i.d. observations.

Note that the derivation of the lower bounds for E_i[τ^m] in Theorem 3.1 (see (3.3)) required only the Strong Law of Large Numbers

Z_{ij}(t)/f(t) → D_{ij}   w.p. 1 as t → ∞

with some increasing function f(t), where the D_{ij} play the role of KL distances in this case. However, except in the i.i.d. case, this condition does not even guarantee finiteness of the moments of the stopping time. To obtain the asymptotics for moments of the stopping times τ_a and τ_b in the general non-i.i.d. case, we hence need to strengthen the convergence condition.

Let Y = {Y(t)} be a random process, and suppose Y(t) converges to a limit Y* as t → ∞. Further, let

T_ε = sup{t : |Y(t) − Y*| > ε},   sup{∅} = 0   (4.7)

denote the last entry time of Y(t) in the set {y : |y − Y*| > ε}. The Strong Law "Y(t) → Y* P-a.s. as t → ∞" can be expressed in terms of T_ε as P(T_ε < ∞) = 1 for all positive ε. In other words, Y(t) → Y* w.p. 1 if and only if P(T_ε < ∞) = 1 for all positive ε.
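The last entry time is easy to estimate on a simulated path. The sketch below (our own illustration, with our own names) applies it to the sample means of i.i.d. Gaussian variables, which converge almost surely to the common mean:

```python
import random

def last_entry_time(path, limit, eps):
    """Last index at which the sequence `path` lies outside the
    eps-neighborhood of `limit` (0 if it never leaves it); the moments of
    this quantity are what r-quick convergence constrains."""
    T = 0
    for n, y in enumerate(path, start=1):
        if abs(y - limit) > eps:
            T = n
    return T

# Sample means of i.i.d. N(0.5, 1) variables converge a.s. to 0.5.
random.seed(5)
xs = [random.gauss(0.5, 1.0) for _ in range(5000)]
partial = []
s = 0.0
for n, x in enumerate(xs, start=1):
    s += x
    partial.append(s / n)
T = last_entry_time(partial, limit=0.5, eps=0.2)
```

On a typical path the sample mean settles into the 0.2-neighborhood early, so the last entry time is small compared with the path length; requiring finite moments of this quantity is precisely the strengthening of the Strong Law introduced next.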

The following is the strengthening of the Strong Law to theso-called -quick version (see, e.g., Lai [23] in the discrete-time case).

Definition: Let be a random process, and let be as defined in (4.7). For , is said to converge

-quickly under measure P to if and only if

E for all

where E denotes expectation with respect to P. If the corresponding -quick convergence condition holds for any positive , we say that the process converges strongly completely4

to . Let be an increasing nonnegative function with

being the inverse function. In many practical applications (see, e.g., examples in Section VI). We now

strengthen the a.s. convergence condition (3.1) in Theorem 3.1 into the following -quick convergence for some positive :

- -

(4.8)
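As an illustration of the definition above, the -quick convergence of a normalized LLR can be probed by Monte Carlo: simulate sample paths, record the last entry time into the complement of an ε-neighborhood of the limit, and estimate its moments. The sketch below uses a sample mean of i.i.d. N(q, 1) variables as a stand-in process; the model, the names `last_entry_time`, `simulate_T`, and all parameter values are our own illustrative choices, not the paper’s notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def last_entry_time(path, q, eps):
    """Last time t at which |Y_t - q| > eps along one sample path (0 if never).

    On a finite horizon this can only underestimate the true last entry time,
    so the horizon should be long relative to typical excursions."""
    idx = np.nonzero(np.abs(path - q) > eps)[0]
    return (idx[-1] + 1) if idx.size else 0

def simulate_T(q=1.0, eps=0.25, horizon=4000):
    # Y_t = sample mean of i.i.d. N(q, 1) increments; Y_t -> q w.p. 1.
    x = rng.normal(q, 1.0, size=horizon)
    y = np.cumsum(x) / np.arange(1, horizon + 1)
    return last_entry_time(y, q, eps)

# Monte Carlo estimate of E[T_eps^r]; finiteness for every eps > 0 is the
# r-quick convergence of the Definition (here it holds for every r, i.e.,
# the convergence is strongly complete).
r = 1.0
samples = [simulate_T() ** r for _ in range(200)]
est = float(np.mean(samples))
print(est)
```

In the Gaussian sample-mean case the estimate stabilizes at a small finite value for every moment order, in contrast with slowly drifting processes (compare the failure case discussed after (6.5)).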

The following theorem establishes the asymptotic optimality result for general statistical models (without the i.i.d. assumption) when the -quick convergence condition (4.8) holds.

Theorem 4.2: Let the condition (4.8) hold for some . Then the following three assertions are true.

i) E , E , , for any finite and .

ii) In addition, if the condition (3.2) holds, then for all and

E as (4.9)

E as (4.10)

where , , and .

iii) If , , and the condition (4.11) holds, then both tests belong to the class , and for all and

E E E

as (4.11)

In particular, if with , then

E E E

(4.12)

4This definition of strong complete convergence should not be confused with that of complete convergence (which is equivalent to 1-quick convergence) given in Hsu and Robbins [20].


Note that in [35], Tartakovsky established a result similar to (4.11) for the MSPRT but for the situation where constraints are imposed on the conditional error probabilities P . The asymptotics (4.9) and (4.10) are new and perhaps the most useful, since they allow us to obtain asymptotics under a variety of constraints.

The rigorous proof of Theorem 4.2 is given in the Appendix. Here we present a heuristic argument that leads to this theorem. We focus our attention on the test , since, as is clear from (2.5), the corresponding asymptotics for follow from those for .

Introduce the following notation:

Note that the condition (4.8) implies in particular that the mean value of the LLR is asymptotically equal to , as . Thus we may expect that for large

where is the stochastic part of . Now, if fluctuations of are not too large, which is expressed in terms of P- -quick convergence

, then it may be expected that the stochastic part is substantially less than (on average). Therefore,

which for large and gives

E

Since , we also expect

E

On the other hand, if we choose the numbers as in (2.4), the right-hand side of the latter inequality is the first term of expansion in the lower bound for E in the corresponding class (see Theorem 3.1). Thus we expect that the tests and are asymptotically optimal when the error probabilities approach zero.

Corollary 4.1: If the normalized LLR’s converge strongly completely to under P, then the asymptotic relationships (4.9)–(4.11) hold for all , and hence the test procedures and minimize any positive moment of the stopping time distribution.

It should be pointed out that since is not a Markov time, it is usually not easy to check the -quick convergence condition (4.8). The following condition implies (4.8) and is sometimes useful. If, for any

P

(4.13)

then (4.8) is true. The integral in (4.13) is replaced by a sum in the discrete-time case. (See Chow and Lai [7] for some related inequalities and conditions in the i.i.d. case, and Tartakovsky [35] for the general case.) This condition will be checked in a number of examples in Section VI.
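A sufficient condition of the (4.13) type is easy to evaluate numerically in concrete models. The sketch below checks the discrete-time sum for the normalized sample mean of i.i.d. N(q, 1) variables, where the tail probability is available in closed form; the weighting by a power of time and the specific parameter values are our own illustrative choices.

```python
import math

# Discrete-time check of a (4.13)-type condition for Y_n = S_n/n with
# i.i.d. N(q, 1) increments: the series sum_n n^(r-1) * P(|S_n/n - q| > eps)
# must be finite. Here P(|S_n/n - q| > eps) = 2*Phi(-eps*sqrt(n)), which
# decays faster than any power of n, so the series converges for every r.

def tail_prob(n, eps):
    # Exact Gaussian tail for the normalized sample mean:
    # 2*Phi(-eps*sqrt(n)) = erfc(eps*sqrt(n)/sqrt(2)).
    return math.erfc(eps * math.sqrt(n) / math.sqrt(2.0))

def partial_sum(r, eps, N):
    return sum(n ** (r - 1) * tail_prob(n, eps) for n in range(1, N + 1))

# The partial sums stabilize quickly, consistent with r-quick
# (indeed strongly complete) convergence of the sample mean.
s1 = partial_sum(r=2.0, eps=0.5, N=200)
s2 = partial_sum(r=2.0, eps=0.5, N=400)
print(s1, s2)
```

Doubling the truncation point changes the partial sum only negligibly, which is the numerical signature of the convergent series required by (4.13).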

Remark 4.5: In the i.i.d. case, the condition E is necessary and sufficient for the -quick convergence of (see (4.13) and [7]). Therefore, Theorem 4.2 implies the asymptotic formulas (4.4) in Theorem 4.1 for under conditions E . Since Theorem 4.1 holds for any under the weaker condition E , this shows that Theorem 4.1 does not follow from Theorem 4.2.

V. SLIPPAGE PROBLEMS

In this section we apply the general results obtained above to a specific multiple decision problem called the slippage problem (see Ferguson [14], Mosteller [27]). Suppose there are populations whose distribution functions

are identical except possibly for different shifts . On the basis of samples from the populations we wish to decide whether or not the populations are equal or one of these populations has slipped to the right of the rest and, if so, which one (e.g., which is the “odd” one). In other words, we may ask whether or not for some , we have . In the language of hypothesis testing we want to test the null hypothesis “ ” against alternatives “ ,” , where . Thus in this case . The slippage problem is of considerable practical importance and closely related to the so-called ranking and selection problem in which the goal of an experimenter is to select the best population [4], [17].

A problem of this kind was first discussed by Mosteller [27] in a nonsequential setting for the nonparametric case when both the form of the distribution function , and the values of and are unknown. Ferguson [14] generalized this result for the case of arbitrary (but known) distributions and (not necessarily just with different means), i.e., when, under hypothesis , all populations have the same distribution and, under , the th representative has some different distribution . Tartakovsky [33] considered the case of possibly different (and unknown) distribution functions in a minimax (again nonsequential) setting. As a result, the minimax-invariant solution to this problem has been obtained.

Another interesting application of this model is in signal (target) detection in a multichannel (multiresolution) system. There may be no useful signal at all (hypothesis ) or a signal may be present in one of the channels, in the th, say (hypothesis ). It is necessary to detect a signal as soon as possible and to indicate the number of the channel where the signal is located. This important practical problem will be emphasized in the subsequent examples. Moreover, we will consider a more general case of possibly correlated and nonidentically distributed observations in each population


while populations will be assumed statistically independent of each other.

To be specific, let be an -component process observed either in discrete or continuous time. The component corresponds to the observation in the th channel. It is assumed that all components may be observed simultaneously and may have a fairly general structure. We also suppose that they are mutually independent. For the problem under consideration, the following three-valued loss function is appropriate:

for
for
for
otherwise.

(5.1)

That is, we assume that the losses associated with false alarms, missing the signal, and choosing the wrong signal are, respectively, given by , , and . The decision risks are then given by

(5.2)

(5.3)

where P , , are the corresponding error probabilities.

Consider a commonly used additive model [2], [30], [32] where an observed process in the th channel represents either the additive mixture of a useful signal with a noise or only noise

if
if

(5.4)

where the superscript means that the process is regarded under , i.e., when a signal is present in the th channel. In general, the signal could be random and its structure may be different for the various channels.

Since and

P

P

P

P

the LLR depends on the observation process only through the components and .

In order to apply the general results of Section IV, we have to check the validity of the required conditions. This will be done in the next section for several specific examples.
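To fix ideas before the examples, the slippage/multichannel setup admits a compact simulation. The sketch below runs a generalized-likelihood-ratio-type multihypothesis stopping rule (stop when some hypothesis's LLR exceeds that of every rival by a threshold) on the Gaussian mean-shift slippage model. This is our own minimal rendering of one MSPRT variant, with a single threshold and names (`theta`, `a`, `msprt`) of our choosing; the paper's actual tests use per-hypothesis thresholds tied to the risks.

```python
import numpy as np

rng = np.random.default_rng(1)

def llr_increments(x, theta):
    """Per-step log-likelihood increments for H_0, H_1, ..., H_N in the
    Gaussian slippage model: under H_0 every channel is N(0,1) noise,
    under H_i channel i is N(theta,1). Increments are taken relative
    to H_0, so the H_0 entry is identically zero."""
    inc = np.empty(x.size + 1)
    inc[0] = 0.0
    inc[1:] = theta * x - 0.5 * theta ** 2
    return inc

def msprt(theta=1.0, a=4.0, true_channel=2, N=4, max_n=10000):
    lam = np.zeros(N + 1)  # accumulated LLR's relative to H_0
    for n in range(1, max_n + 1):
        x = rng.normal(0.0, 1.0, size=N)          # noise in all channels
        if true_channel > 0:
            x[true_channel - 1] += theta          # signal in one channel
        lam += llr_increments(x, theta)
        # Accept H_i at the first time its LLR against every rival
        # hypothesis exceeds the threshold a.
        for i in range(N + 1):
            if np.all(lam[i] - np.delete(lam, i) >= a):
                return n, i
    return max_n, int(np.argmax(lam))

n_stop, decision = msprt()
print(n_stop, decision)
```

With a moderate threshold the rule stops after a handful of samples and, with high probability, names the channel actually carrying the signal; raising `a` trades longer sampling for smaller error risk, which is the regime the asymptotic theory describes.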

VI. EXAMPLES

In this section we consider examples which are meaningful for many applications including target detection in multiple-resolution radar [26] and infrared systems [13], signal acquisition in direct sequence code-division multiple-access systems [37], and statistical pattern recognition [16]. The first is a discrete-time but non-i.i.d. example, the second is a continuous-time example, and the third includes nonparametric models and invariant sequential tests. The main goal in these examples is to show that the conditions required for our results, particularly in the non-i.i.d. cases, are reasonable and are usually met in practice. Detailed numerical and simulation results for the i.i.d. case are presented in the companion paper [12].

Example 1: Discrete-time detection/identification of a deterministic signal in the presence of correlated noise in a multichannel system. Assume that the functions in (5.4) are deterministic and the noise processes are stable first-order autoregressive Gaussian processes, i.e.,

where are i.i.d. Gaussian variables with zero mean and variance ( and are independent), and . It is easy to show that the LLR’s are of the form

and hence under

Here the “tilde values” are for and , and similarly for .

We denote the accumulated SNR for channel up to time by

Assume that

for some (6.1)

where are finite positive numbers. Now we will establish that

P -strongly completely

To establish this it is sufficient to show that (see (4.13))

P

for some and all (6.2)

where . Since is a weighted Gaussian sample sum with mean zero and variance


(for large ), it is easy to show that there is a number such that

P

and hence (6.2) is fulfilled. Thus by Theorem 4.2, the asymptotic equalities (4.11) hold with , ; , , , and the tests and are asymptotically optimal.

It should be noted that in the application to signal detection, the quantity is a constraint on the risk associated with missing the signal when it is actually present on one of the channels (see (5.2)). Similarly, is a constraint on the risk associated with deciding that the signal is present on channel , which is a weighted sum of the false alarm probability and the probability of incorrect classification (see (5.3)). Symmetry assumptions generally yield for . Now, since typically the accumulated SNR in the th channel does not depend on the number of the channel, i.e., is the same for all and equal to (say), the asymptotic formulas become

E E

(6.3)

E E (6.4)

In particular, if , then it is easy to see that works in (6.1), as well as in (6.3) and (6.4). It is also easy to see that the corresponding SNR is . In this case, the expected stopping times are proportional to the risk constraints and inversely proportional to the SNR.
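The whitening step behind the “tilde values” in Example 1 is easy to reproduce numerically. The sketch below simulates a constant signal in stable AR(1) Gaussian noise, forms the differenced (innovation) sequences, and checks that the normalized LLR settles near the per-step accumulated SNR; the concrete signal, the parameter values, and all names (`llr_path`, `gamma`, `sigma2`) are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def llr_path(x, s, gamma, sigma2):
    """LLR of 'signal present' vs 'noise only' for a deterministic signal s
    in AR(1) noise xi_n = gamma*xi_{n-1} + w_n. Differencing by gamma
    (the "tilde" transformation) whitens the noise, after which the LLR
    is a weighted Gaussian sum minus half the accumulated SNR term."""
    xt = np.concatenate(([x[0]], x[1:] - gamma * x[:-1]))
    st = np.concatenate(([s[0]], s[1:] - gamma * s[:-1]))
    return np.cumsum(st * xt / sigma2 - 0.5 * st ** 2 / sigma2)

n = 2000
gamma, sigma2 = 0.5, 1.0
s = np.ones(n)                        # constant signal: SNR grows linearly
w = rng.normal(0.0, np.sqrt(sigma2), size=n)
noise = np.zeros(n)
for k in range(1, n):
    noise[k] = gamma * noise[k - 1] + w[k]
x = s + noise                         # signal-present observations

lam = llr_path(x, s, gamma, sigma2)
# The normalized LLR should settle near the per-step accumulated SNR,
# (1 - gamma)**2 / (2 * sigma2) = 0.125 here; the a.s. convergence seen
# in this plot-free check is what (6.1) strengthens to strong complete
# convergence.
print(lam[-1] / n)
```

Linear growth of the accumulated SNR is exactly the case in which the power-law condition (6.1) holds with exponent one, so the asymptotic optimality of both tests applies.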

Now, from the analysis of Section IV-B, it is not clear if condition (4.8) is necessary to guarantee the corresponding optimality of the tests. In the following, we show that if (4.8) is not fulfilled, the sequential tests considered may have infinite moments. At the same time the best fixed sample size test has a finite sample size.

Instead of (6.1), which postulates the growth of the SNR as a power of for sufficiently large , suppose the following condition holds:

as (6.5)

i.e., the SNR grows as for sufficiently large . Since this “law” is too slow, one may expect that -quick convergence does not hold. Indeed, it is easy to see that

-

but not -quickly (for any ), since for sufficiently small

E P

where, as above in (6.2)

and for , and where is the standard normal distribution function. Thus E and Theorem 4.2 cannot be applied. In fact, in this case E for sufficiently small (actually for ; see, e.g., Golubev and Khas’minskii [18]).

Example 2: Slippage problem for a nonhomogeneous Poisson process. In applications involving infrared and optical warning systems, an appropriate model for noise and clutter is a point random process. Below we consider a multichannel system in which the noise process is a nonhomogeneous Poisson process and target appearance leads to a change in the intensity of this process. Specifically, under hypothesis (target absent), let the observed process be a vector nonstationary Poisson random process with independent components, each of which has intensity , while, under (target is located in the th channel), the th component has the intensity . Then

E

where

Note that the LLR’s are processes with independent but (generally) nonstationary increments

Now assume that is a power function, , where and . Then

E

and the increments are bounded as

Thus converges P- -quickly for all positive (i.e., P -strongly completely) to the numbers

(6.6)


By Theorem 4.2, the MSPRT’s are asymptotically optimal relative to any positive moment of the stopping time distribution. Note that in this case and hence (4.12) applies.

If and for all , then using (6.6), we obtain

and

Thus from (4.12)

E E

E E
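The independent-increments structure of the Poisson LLR in Example 2 can be checked directly in a discrete-time skeleton of the model. Below, counts in unit intervals are Poisson with power-law means under "signal" and "noise"; the LLR then tracks its growing mean, which is the normalization underlying the strong complete convergence to the limits in (6.6). The specific intensities `mu0`, `mu1` and all names are our own illustrative choices.

```python
import math
import random

random.seed(3)

# Discrete-time skeleton of a nonhomogeneous Poisson channel: counts
# N_k ~ Poisson(mu1(k)) under "signal present", with null intensity mu0(k);
# both are power functions of time. The LLR has independent, nonstationary
# increments:
#   dLambda_k = N_k * log(mu1(k)/mu0(k)) - (mu1(k) - mu0(k)).
def mu0(k):
    return 0.5 * k ** 0.5

def mu1(k):
    return 1.0 * k ** 0.5

def sample_poisson(mean):
    # Simple inversion sampler, adequate for the moderate means used here.
    u, k = random.random(), 0
    p = math.exp(-mean)
    c = p
    while u > c:
        k += 1
        p *= mean / k
        c += p
    return k

horizon = 400
lam = 0.0        # realized LLR
mean_lam = 0.0   # its exact mean under the "signal" measure
for k in range(1, horizon + 1):
    n_k = sample_poisson(mu1(k))
    inc_log = math.log(mu1(k) / mu0(k))
    lam += n_k * inc_log - (mu1(k) - mu0(k))
    mean_lam += mu1(k) * inc_log - (mu1(k) - mu0(k))

# The ratio settles near 1: the LLR, normalized by its growing mean,
# converges, which is the behavior the r-quick condition quantifies.
print(lam / mean_lam)
```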

Example 3: Nonparametric detection of a target in a multichannel system when training clutter data is available. So far we considered the case of simple hypotheses, assuming that the measures P were completely known. However, in many practical applications, the models are known only partially (parametric uncertainty) or they may even be unknown (nonparametric uncertainty). The above ideas, particularly Theorem 4.2, may be used to prove asymptotic optimality of invariant multihypothesis sequential tests for composite hypotheses “ P ,” . Here are families of distributions, either parametric or nonparametric, and it is assumed that they can be reduced to simple ones by using the principle of invariance (for further details related to the invariance principle in testing hypotheses, see, e.g., Ferguson [14] and Lehmann [22]).5

In fact, Theorem 4.2 and Corollary 4.1 remain true if we use the LLR’s constructed for a maximal invariant and if the class includes only invariant tests with the constraints (2.1). In particular, if converges strongly completely to , then the corresponding invariant MSPRT’s minimize all positive moments of the stopping time distribution in the class of invariant sequential tests with the corresponding constraints imposed on risks.

To illustrate this point, we apply Theorem 4.2 to an -sample nonparametric problem with Lehmann hypotheses (or proportional hazards, if we replace P by P below) and show that the extended multialternative version of the Savage sequential test is asymptotically optimal with respect to any positive moment of the stopping time distribution.

Let , where are i.i.d. with a continuous distribution function P, and independent of , which are i.i.d. either with P or one of them (say, the th one) has the distribution P , where are specified positive constants, , and P is completely unknown. In other words, the hypotheses are

P P for all

and

P for and P P

5Another possible approach is to use adaptive tests that employ estimates of the unknown nuisance parameters.

An interesting application of this problem is in target detection in the -channel system in the presence of clutter with unknown continuous distribution P, based on unclassified data when a classified training sequence is also available. That is, it is known in advance that the clutter only generates the data .

A maximal invariant with respect to the group

where is any continuous increasing function, is the vector of ranks of among . Let

P

be the empirical distribution functions, where . For the sake of simplicity consider . Then the LLR of the maximal invariant is [28]

P P

P P

Define

P P

P P

P P P P

P P

By [28, Lemma 2], for any there exists a number such that

P

which obviously implies the strong complete convergence of to under P.

Thus, based on Theorem 4.2, one may conclude that Savage’s nonparametric sequential rank-order test asymptotically minimizes all the moments of the stopping time distribution in the class of invariant tests when the risks and approach zero. A similar result is valid for the more general case when .

VII. CONCLUSIONS

Most of the research on sequential hypothesis testing over the last fifty years, starting with the classical works of Wald [39] and Wald and Wolfowitz [40], has dealt with the case of i.i.d. observations and the optimality (or asymptotic optimality) of sequential tests relative to the expected sample size. However, in many practical applications the i.i.d. assumption does not hold. Furthermore, the behavior of higher order moments


may be of interest. The results presented in this paper show that the proposed MSPRT’s are asymptotically optimal under fairly general conditions that cover nonhomogeneous and correlated stochastic processes observed either in discrete or continuous time, stochastic models with nuisance parameters, and even nonparametric models. In addition, the tests are shown to be asymptotically optimal not only with respect to the expected sample size but also with respect to any moment of the stopping time distribution. This justifies the use of the MSPRT’s in a variety of applications, some of which have been described in this paper.

Throughout the analysis of the paper we assumed that the hypotheses were simple with respect to the informative parameters. If the hypotheses are composite, then an adaptive approach may be applied (see, e.g., Dragalin and Novikov [11]) to find asymptotically optimal solutions. However, we have reasons to believe that in spite of their asymptotic optimality, the corresponding adaptive tests will not perform well in practice. (The asymptotic convergence is generally too slow for such tests.) Thus a reasonable partitioning of composite hypotheses into a number of simple ones and subsequent application of the MSPRT’s may be beneficial in practice.

The results presented above do not cover an important case for many applications, namely, one where there is an “indifference” zone. This important extension will be considered elsewhere.

APPENDIX

Proof of Lemma 2.1: Obviously,

P

E

E

where is the indicator of the event . Here we used the Fubini theorem and the definition of the stopping time according to which

on

Consider the second test. For the Markov times , we have the equivalent representation

which implies the inequality

for all on

Next, by Wald’s likelihood ratio identity [29], [39], [41]

E

Combining the last two relationships, we get

P for all

Since the event implies the event , it follows that

P

and the proof is complete.

Proof of Theorem 4.1: To prove the theorem we shall need the following result, which may be of independent interest.

Lemma A.1: Let be i.i.d., and suppose for all . Then

i) the stopping times and are exponentially bounded and hence E , E for any positive finite , , and ;

ii) the families

and

are uniformly integrable with respect to P for all .

Proof: Note first that both stopping times and do not exceed the stopping time

if

and

for tests and , respectively. Thus to prove i) it is sufficient to prove that is exponentially bounded for . To prove ii) we have only to prove the uniform integrability of .

i) Obviously,

P P

P

Applying Markov’s inequality

P E

we obtain that for any

P E

E

where

E


Evidently, for any when (in fact, P as ).

Thus there is a finite positive constant and a number , , such that P

and hence the Markov time is exponentially bounded for all finite positive , which implies the assertion i).

ii) For , define the random variables

Since

and

It follows from the construction that are i.i.d. and exponentially bounded and that are i.i.d. positive random variables (under P). Further, define the stopping time

It is easy to see that is a sum of i.i.d. and exponentially bounded random variables

By Theorem I.6.1 of Gut [19], the family is uniformly integrable whenever is uniformly integrable. Thus to prove the desired result we have only to show the uniform integrability of the latter family for all .

Since the increments of the random walks

are positive

where

We apply [19, Theorem III.7.1] to show that is uniformly integrable under P for all and hence the family is uniformly integrable too (for all ). This completes the proof of ii).

Proof of i) in Theorem 4.1: Consider the MSPRT , and let . By Gut [19, Theorem A.1.1], the following convergence of moments

E as

holds whenever

(C1) E for all ,

(C2) the family is P -uniformly integrable,

(C3) P -a.s. as .

By Lemma A.1, Conditions (C1) and (C2) are satisfied. The a.s. convergence (C3) follows from Baum and Veeravalli [3, Theorem 5.1].

For the MSPRT , Conditions (C1) and (C2) follow from Lemma A.1. The convergence

-as

where , is proved in essentially the same way as (C3).

Proof of ii) in Theorem 4.1: To prove (4.4) it suffices to show that the lower bound (4.2) is achieved for the tests and . This follows immediately by substitution of

and

in (4.3). This completes the proof.

Proof of Theorem 4.2: Theorem 4.2 will be proved only for the MSPRT . For the MSPRT the proof is essentially the same and is omitted. In fact, most of the results, including the asymptotic optimality of , follow immediately from (2.5).

Proof of i) and ii). Write

Let and . By the definition of the stopping time

and by condition (4.8)

on the set

These two inequalities show that

on

Hence for any

(A.1)

where is the indicator of the set .


Since , and since by the assumptions of the theorem

E E

it follows from (A.1) that E for any positive set of thresholds . Thus assertion i) holds.

Furthermore, (A.1) implies

E

for all as

where . Letting , we obtain the following upper estimate:

E as

(A.2)

To prove ii) for the MSPRT (the relation (4.10)), it suffices to show that the right-hand side of (A.2) is also the lower estimate for E . To this end, we first show that (similarly to (3.4))

P

as for every (A.3)

Write . Obviously, for any and

P E

E

P

P P

Since

P P P

P P

and since by Theorem 2.1 of Tartakovsky [35]

P

we obtain

P

P (A.4)

Now, set with . Then

P

P

P

P

P

By the condition (3.2), for any there is a finite w.p. 1 random variable such that

(in fact, one may take ) and hence

P

P

By the conditions (4.8) and (3.2), the right-hand side of the latter inequality approaches zero when and

. Thus

P for every

Now, setting with and using (A.4) and (A.5), we obtain that for all

P

P

as

which proves (A.3).

Finally, by Chebyshev’s inequality

E P

for any

where by (A.3) the probability in the right-hand side tends to as for every . This shows that

E for any

which along with (A.2) proves (4.10).


Proof of iii): To prove (4.11) it remains to set , and to use Corollary 2.1, Theorem 3.1, and (4.10). This completes the proof.

ACKNOWLEDGMENT

The authors are grateful to the reviewers, whose comments have led to improvements in the paper.

REFERENCES

[1] P. Armitage, “Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis,” J. Roy. Statist. Soc. B, vol. 12, pp. 137–144, 1950.

[2] P. A. Bakut, I. A. Bolshakov, B. M. Gerasimov, A. A. Kuriksha, V. G. Repin, G. P. Tartakovsky, and V. V. Shirokov, Statistical Radar Theory, vol. 1, G. P. Tartakovsky, Ed. Moscow, USSR: Sov. Radio, 1963 (in Russian).

[3] C. W. Baum and V. V. Veeravalli, “A sequential procedure for multihypothesis testing,” IEEE Trans. Inform. Theory, vol. 40, pp. 1994–2007, Nov. 1994.

[4] R. E. Bechhofer, J. Kiefer, and M. Sobel, Sequential Identification and Ranking Procedures (with special reference to Koopman–Darmois families). Chicago, IL: Univ. Chicago Press, 1968.

[5] H. Chernoff, “Sequential design of experiments,” Ann. Math. Statist., vol. 30, pp. 755–770, 1959.

[6] H. Chernoff, Sequential Analysis and Optimal Design. Philadelphia, PA: SIAM, 1972.

[7] Y. S. Chow and T. L. Lai, “Some one-sided theorems on the tail distribution of sample sums with applications to the last time and largest excess of boundary crossing,” Trans. Amer. Math. Soc., vol. 208, pp. 51–72, 1975.

[8] V. P. Dragalin, “Asymptotic solution of a problem of detecting a signal from k channels,” Russ. Math. Surv., vol. 42, no. 3, pp. 213–214, 1987.

[9] V. P. Dragalin, “Asymptotic solution of a problem of signal searching in a multichannel system,” Mat. Issled. (Acad. Sci. Moldova), vol. 109, pp. 15–35, 1989 (in Russian).

[10] V. P. Dragalin, “Third-order asymptotics for a sequential selection procedure,” Inst. Applications of Math. and Informatics, Milan, Italy, Tech. Rep., 1994; to appear in Statist. Decisions (Special Issue in Honor of Professor Bechhofer).

[11] V. P. Dragalin and A. A. Novikov, “Adaptive sequential tests for composite hypotheses,” Inst. Applications of Math. and Informatics, Milan, Italy, Tech. Rep. 94.4, 1994.

[12] V. P. Dragalin, A. G. Tartakovsky, and V. Veeravalli, “Multihypothesis sequential probability ratio tests, part II: Accurate asymptotic expansions for the expected sample size,” IEEE Trans. Inform. Theory, submitted for publication.

[13] B. Emlsee, S. K. Rogers, M. P. DeSimio, and R. A. Raines, “Complete automatic target recognition system for tactical forward-looking infrared images,” Opt. Eng., vol. 36, pp. 2593–2603, Sept. 1997.

[14] T. S. Ferguson, Mathematical Statistics: A Decision Theoretic Approach. New York/London: Academic, 1967.

[15] M. M. Fishman, “Average duration of asymptotically optimal multialternative sequential procedure for recognition of processes,” Sov. J. Commun. Technol. Electron., vol. 30, pp. 2541–2548, 1987.

[16] K. S. Fu, Sequential Methods in Pattern Recognition and Learning. New York: Academic, 1968.

[17] J. D. Gibbons, I. Olkin, and M. Sobel, Selecting and Ordering Populations: A New Statistical Methodology. New York: Wiley, 1977.

[18] G. K. Golubev and R. Z. Khas’minskii, “Sequential testing for several signals in Gaussian white noise,” Theory Prob. Appl., vol. 28, pp. 573–584, 1983.

[19] A. Gut, Stopped Random Walks: Limit Theorems and Applications. New York: Springer-Verlag, 1988.

[20] P. L. Hsu and H. Robbins, “Complete convergence and the law of large numbers,” Proc. Nat. Acad. Sci. U.S.A., vol. 33, pp. 25–31, 1947.

[21] J. Kiefer and J. Sacks, “Asymptotically optimal sequential inference and design,” Ann. Math. Statist., vol. 34, pp. 705–750, 1963.

[22] E. L. Lehmann, Testing Statistical Hypotheses. New York: Wiley, 1959.

[23] T. L. Lai, “Asymptotic optimality of invariant sequential probability ratio test,” Ann. Statist., vol. 9, pp. 318–333, 1981.

[24] R. Sh. Liptser and A. N. Shiryaev, Statistics of Random Processes, vol. 1. New York: Springer-Verlag, 1977.

[25] G. Lorden, “Nearly-optimal sequential tests for finitely many parameter values,” Ann. Statist., vol. 5, pp. 1–21, 1977.

[26] M. B. Marcus and P. Swerling, “Sequential detection in radar with multiple resolution elements,” IRE Trans. Inform. Theory, vol. IT-8, pp. 237–245, 1962.

[27] F. Mosteller, “A k-sample slippage test for an extreme population,” Ann. Math. Statist., vol. 19, pp. 58–65, 1948.

[28] I. R. Savage and J. Sethuraman, “Stopping time of a rank-order sequential probability ratio test based on Lehmann alternatives,” Ann. Math. Statist., vol. 37, pp. 1154–1160, 1966.

[29] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. New York: Springer-Verlag, 1985.

[30] Y. G. Sosulin and M. M. Fishman, Theory of Sequential Decisions and Its Applications. Moscow, USSR: Radio i Svyaz’, 1985 (in Russian).

[31] A. G. Tartakovskii, “Sequential testing of many simple hypotheses with dependent observations,” Probl. Inform. Transm., vol. 24, pp. 299–309, 1988.

[32] A. G. Tartakovskii, Sequential Methods in the Theory of Information Systems. Moscow, Russia: Radio i Svyaz’, 1991 (in Russian).

[33] A. G. Tartakovskii, “Minimax-invariant regret solution to the N-sample slippage problem,” Math. Meth. Statist., vol. 6, no. 4, pp. 491–508, 1997.

[34] A. G. Tartakovskii, “Asymptotically optimal sequential tests for nonhomogeneous processes,” Sequential Anal., vol. 17, no. 1, pp. 33–61, 1998.

[35] A. G. Tartakovskii, “Asymptotic optimality of certain multihypothesis sequential tests: non-i.i.d. case,” Statist. Inference for Stochastic Processes, 1999.

[36] V. V. Veeravalli and C. W. Baum, “Asymptotic efficiency of a sequential multihypothesis test,” IEEE Trans. Inform. Theory, vol. 41, pp. 1994–1997, Nov. 1995.

[37] V. V. Veeravalli and C. W. Baum, “Hybrid acquisition of direct sequence CDMA signals,” Int. J. Wireless Inform. Networks, vol. 3, pp. 55–65, Jan. 1996.

[38] N. V. Verdenskaya and A. G. Tartakovskii, “Asymptotically optimal sequential testing of multiple hypotheses for nonhomogeneous Gaussian processes in an asymmetric situation,” Theory Probab. Appl., vol. 36, pp. 536–547, 1991.

[39] A. Wald, Sequential Analysis. New York: Wiley, 1947.

[40] A. Wald and J. Wolfowitz, “Optimal character of the sequential probability ratio test,” Ann. Math. Statist., vol. 19, pp. 326–339, 1948.

[41] M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis. Philadelphia, PA: SIAM, 1982.