
An Exact McNemar Test for Paired Binary Markov Chains
Author(s): Woollcott Smith and Andrew R. Solow
Source: Biometrics, Vol. 52, No. 3 (Sep., 1996), pp. 1063-1070
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2533067



BIOMETRICS 52, 1063-1070 September 1996

An Exact McNemar Test for Paired Binary Markov Chains

Woollcott Smith

Statistics Department, Temple University, Philadelphia, Pennsylvania 19122, U.S.A.

and

Andrew R. Solow

Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, U.S.A.

SUMMARY

A straightforward extension of the McNemar test for paired binary data yields an exact test for the equality of the limiting marginal distributions for bivariate binary Markov chains. The exact distribution of the test statistic under the null hypothesis of equal marginals depends on the classical cell occupancy statistics for the Bose-Einstein model. Exact p-values are computed for the one-sided test, and the mean and variance of the test statistic are found. The power of the Markov-McNemar test is found to be close to the power of the classical McNemar test for independent paired observations when the independence assumption holds. The method is applied to the comparison of ribosomal DNA sequences.

1. Introduction

Paired experiments or observational studies are often designed to compare the probability of success or failure in two groups. The classic McNemar procedure tests the null hypothesis that the probabilities of success in both groups are equal, while accounting for the dependence within the matched pairs. The fundamental assumption in this test and other classical tests is that the paired observations are independent and identically distributed. In some cases we may wish to compare paired binary series where the observations are not only paired, as in the McNemar test, but correlated over time. For example, suppose the weather is observed on a series of days in two locations, say New York and Philadelphia, and for each day the binary outcomes "rain" or "no rain" are recorded for each location. Clearly, dependence over time must be accounted for when constructing a McNemar-like test for the null hypothesis that New York City and Philadelphia have the same probability of a "rainy" day.

In this note we make the fundamental assumption that the sequence of paired observations $\{(X_n, Y_n)\}$ is from a first-order Markov chain, and then test the null hypothesis that the limiting marginal probabilities are equal, i.e.,

$$\lim_{n\to\infty} \Pr(X_n = i) = \lim_{n\to\infty} \Pr(Y_n = i), \quad \text{for } i = 0, 1. \tag{1}$$

Statistical methods for Markov chains have been developed by Billingsley (1961), Boza (1971), Chatfield (1973), Morgan (1976), and Bedrick and Aragon (1989). Liang and Zeger (1989) have used general linear models to estimate the parameters of a multivariate binary Markov chain. All these results use asymptotic methods and distribution theory. In contrast, the test developed here is exact and unbiased, and the exact distributions are easily computable from well-known combinatorial results for the Bose-Einstein rearrangements.

The paper proceeds in the following steps. In Section 2 the basic definitions and notation for bivariate binary time-homogeneous Markov chains are developed, the Markov-McNemar test for bivariate binary processes is introduced, and the conditional distribution of the test statistic is found under the null and alternative hypotheses. Finally, the details of this test are illustrated using a comparison of ribosomal DNA sequences, with application to phylogenetic analysis.

Key words: Binary processes; Bose-Einstein; Markov chains; McNemar tests; Ribosomal DNA.

2. A Paired Markov-McNemar Test

In this note we consider a time-homogeneous bivariate binary chain $\{(X_n, Y_n)\}$ with its four states, and let $P = [p_{ii',jj'}]$ denote the $4 \times 4$ one-step transition matrix for the joint time-homogeneous chain,

$$p_{ii',jj'} = \Pr(X_{n+1} = j, Y_{n+1} = j' \mid X_n = i, Y_n = i').$$

Since the rows of a Markov transition matrix $[p_{ii',jj'}]$ sum to 1, this leaves $3 \times 4 = 12$ parameters to describe the dependence structure of the joint chain. In this note we consider the limiting distribution of the Markov chain,

$$\lim_{n\to\infty} \Pr(X_n = j, Y_n = j' \mid X_1 = i, Y_1 = i') = \pi_{jj'}. \tag{2}$$

Tavare and Altham (1983), Solow, Smith, and Recchia (1995), and others have investigated tests for the null hypothesis that the limiting distribution (equation (2)) is independent, i.e., $H_0\colon \pi_{jj'} = \pi_{j\cdot}\pi_{\cdot j'}$. In contrast, for the McNemar test the null hypothesis is that the marginals are equal, i.e., $H_0\colon \pi_{i\cdot} = \pi_{\cdot i}$. Both hypotheses on the limiting distribution impose a single degree of freedom constraint on the original 12-parameter Markov transition matrix. In general, exact unbiased tests are unavailable for the first (independence) null hypothesis. Thus, it is interesting, as this note demonstrates, that an exact conditional test does exist for the McNemar null hypothesis.

The construction of the Markov-McNemar test is similar to the construction of the classical McNemar test of independent paired observations. As in the classical McNemar test, the limiting marginal distributions are equal, i.e., $\pi_{\cdot i} = \pi_{i\cdot}$, if and only if $\pi_{10} = \pi_{01}$. This leads us to consider the embedded process observed only at the time points $m_n$ at which there are discordant pairs,

$$Z_n = \begin{cases} 1, & \text{if } X_{m_n} = 0 \text{ and } Y_{m_n} = 1, \\ 0, & \text{if } X_{m_n} = 1 \text{ and } Y_{m_n} = 0. \end{cases} \tag{3}$$
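The embedding in equation (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `discordant_series` and the short input series are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def discordant_series(x: Sequence[int], y: Sequence[int]) -> List[int]:
    """Return the embedded series Z of equation (3).

    Z is 1 where (x, y) = (0, 1) and 0 where (x, y) = (1, 0);
    concordant time points are dropped.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    z = []
    for xn, yn in zip(x, y):
        if xn != yn:  # keep discordant time points only
            z.append(1 if (xn, yn) == (0, 1) else 0)
    return z


# Hypothetical paired series (illustrative data, not from the paper).
x = [0, 1, 1, 0, 0, 1, 0]
y = [0, 0, 1, 1, 1, 0, 0]
print(discordant_series(x, y))  # [0, 1, 1, 0]
```

Only the four discordant time points survive, so the embedded series is shorter than the original paired series.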

If the original binary process $\{(X_n, Y_n)\}$ is time homogeneous and Markov, then the embedded process $\{Z_n\}$ is also time homogeneous and Markov (Kemeny and Snell, 1960). For example, consider the following one-step transition probability:

$$\alpha = \Pr(Z_{n+1} = 1 \mid Z_n = 0) = \Pr(A_{m_n} \mid (X_{m_n}, Y_{m_n}) = (1, 0)),$$

where $A_{m_n}$ is the event that, after time point $m_n$, the bivariate time-homogeneous Markov process visits the state $(x, y) = (0, 1)$ before visiting the state $(x, y) = (1, 0)$. Clearly, $A_{m_n}$ is defined in terms of events after the $m_n$th time point. Thus, by the Markov property, the probability that the event $A_{m_n}$ occurs depends on the past only through the present state $(X_{m_n}, Y_{m_n}) = (1, 0)$, or equivalently $Z_n = 0$. By the time-homogeneous property of the original Markov chain, the probability that the event $A_{m_n}$ occurs does not depend on the $n$th renewal time $m_n$, and thus the embedded chain $\{Z_n\}$ is time homogeneous in $n$.

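The properties of the time-homogeneous stationary binary Markov chain $\{Z_n\}$ play a fundamental role in this analysis. We introduce here some notation and the basic results needed. Consider a binary Markov chain with one-step transition matrix

$$P = [p_{ij}] = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}, \tag{4}$$

where $\Pr(Z_{n+1} = j \mid Z_n = i) = p_{ij}$, $i = 0, 1$ and $j = 0, 1$, with limiting distribution

$$\pi_0 = \frac{\beta}{\alpha+\beta} \quad \text{and} \quad \pi_1 = \frac{\alpha}{\alpha+\beta}. \tag{5}$$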

Let $N = [N_{ij}]$ denote the random table of one-step transition counts for the Markov chain $\{Z_n\}$. $N$ and the initial value $Z_1 = i$ are sufficient statistics for this binary chain. The distribution of $N$ given the initial state $Z_1 = i$ is given in Kedem (1980) and Gabriel (1959) and is a special case of the general Markov chain results in Billingsley (1961). The probability of observing a $2 \times 2$ transition matrix $n = [n_{ij}]$ can be written as

$$\Pr(N = n \mid Z_1 = i) = D(n, i)\,\alpha^{n_{01}} \beta^{n_{10}} (1-\alpha)^{n_{00}} (1-\beta)^{n_{11}}, \tag{6}$$




where $D(n, i)$ counts the number of paths starting at state $i$ with transition matrix $n$ and

$$D(n, i) = \binom{i + n_{01} + n_{11} - 1}{n_{11}} \binom{1 - i + n_{10} + n_{00} - 1}{n_{00}}. \tag{7}$$

Equation (6) is derived by noting that each sample path with transition matrix $n$ has probability $\alpha^{n_{01}} \beta^{n_{10}} (1-\alpha)^{n_{00}} (1-\beta)^{n_{11}}$. Each sample path consists of alternating runs of zeros and ones. The first binomial term in $D(n, i)$ counts the number of ways of placing the $r = n_{11}$ 1 -> 1 transitions into the $k = i + n_{01}$ runs labeled one. This is equivalent to counting the number of ways of placing $r$ indistinguishable balls into $k$ cells, which by standard results (Feller, 1967) is

$$\binom{k + r - 1}{r}.$$

The second term of $D(n, i)$ counts the number of ways of placing the $n_{00}$ 0 -> 0 transitions into the $1 - i + n_{10}$ runs labeled zero.
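The path count in equation (7) can be checked by brute force on short chains. The sketch below enumerates every binary path of a fixed length, tallies its transition table, and compares the tally with the two-binomial formula. The helper names (`ways`, `path_count`, `brute_force_counts`) are illustrative, and the convention that zero balls can be placed into zero cells in exactly one way is an assumption made here to handle paths that never visit one of the two states.

```python
from collections import Counter
from itertools import product
from math import comb


def ways(r: int, k: int) -> int:
    """Number of ways to place r indistinguishable balls into k cells."""
    if k == 0:
        return 1 if r == 0 else 0  # assumed convention for an absent state
    return comb(k + r - 1, r)


def path_count(n00: int, n01: int, n10: int, n11: int, i: int) -> int:
    """D(n, i) of equation (7): paths from state i with transition table n."""
    return ways(n11, i + n01) * ways(n00, 1 - i + n10)


def brute_force_counts(length: int, i: int) -> Counter:
    """Tally transition tables over all binary paths of a given length starting at i."""
    tally = Counter()
    for tail in product((0, 1), repeat=length - 1):
        path = (i,) + tail
        n = [[0, 0], [0, 0]]
        for a, b in zip(path, path[1:]):
            n[a][b] += 1
        tally[(n[0][0], n[0][1], n[1][0], n[1][1])] += 1
    return tally


for start in (0, 1):
    for table, count in brute_force_counts(10, start).items():
        assert path_count(*table, start) == count
print("equation (7) agrees with enumeration for paths of length 10")
```

For instance, the paths 0010 and 0100 share the table $n_{00} = n_{01} = n_{10} = 1$, $n_{11} = 0$, and the formula gives $D = 2$ for that table with $i = 0$.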

Since 0-to-1 transitions alternate with 1-to-0 transitions, the following relationships hold between the out-of-state transition counts, given that $Z_1 = i$:

$$n_{01} = \left[\frac{n_{\cdot\cdot} - n_{00} - n_{11} + 1 - i}{2}\right] \quad \text{and} \quad n_{10} = n_{\cdot\cdot} - n_{00} - n_{11} - n_{01}, \tag{8}$$

where $n_{\cdot\cdot} = n_{00} + n_{01} + n_{10} + n_{11}$ and $[x]$ is the largest integer less than or equal to $x$. Thus, $Z_1$, $N_{11}$, and the number of same-state transitions $S = N_{00} + N_{11}$ are sufficient statistics, with probability distribution given by equations (6) and (8).

The distribution in equation (6) depends on the two parameters $\alpha$ and $\beta$. However, the distribution of the test statistic $N_{11}$ conditioned on the number of same-state transitions $N_{11} + N_{00} = s$ depends only on $\delta = (1-\beta)/(1-\alpha)$, since, with $N_{11} = j$,

$$\alpha^{n_{01}} \beta^{n_{10}} (1-\alpha)^{s-j} (1-\beta)^{j} = \alpha^{n_{01}} \beta^{n_{10}} (1-\alpha)^{s}\, \delta^{j}.$$

Thus, given $N_{11} + N_{00} = s$ and $Z_1 = i$, the conditional distribution of $N_{11}$ depends only on the parameter $\delta$, so that

$$\Pr(N_{11} = j \mid N_{11} + N_{00} = s, Z_1 = i) = C^{-1} D(n_j, i)\, \delta^{j}, \tag{9}$$

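where $n_j$ denotes the transition matrix

$$n_j = \begin{pmatrix} s-j & n_{01} \\ n_{10} & j \end{pmatrix} \tag{10}$$

and the constant $C$ is the usual normalization constant,

$$C = \sum_{j=0}^{s} D(n_j, i)\, \delta^{j}. \tag{11}$$

The null hypothesis that $\pi_0 = \pi_1 = 1/2$ is equivalent to $\alpha = \beta$ or $\delta = 1$. Under this null hypothesis, the normalization constant (equation (11)) is

$$C = \binom{n_{01} + n_{10} + s}{s}. \tag{12}$$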

Equation (12) follows directly from a combinatorial identity (Feller, 1967). Also, the distribution under the null hypothesis $\delta = 1$ is equivalent to the Bose-Einstein distribution (Feller, 1967; Johnson and Kotz, 1977). Under the null hypothesis, all distinct paths are equally likely. All arrangements of paths can be formed by placing $n_{11} + n_{00} = s$ same-state transitions into $m = n_{01} + n_{10} + 1$ runs (or cells). The event that $N_{11} = j$ is just the event that $j$ same-state transitions are placed in the $k = n_{01} + i$ runs labeled 1.

Kunte (1977) gives the second-moment properties for the Bose-Einstein rearrangements. From these it is a standard calculation to show that the mean and variance for the conditional distribution (equation (9)) with $\delta = 1$ are

$$\mu_k(m, s) = \frac{sk}{m} \quad \text{and} \quad \sigma_k^2(m, s) = \frac{k(m-k)\,s\,(m+s)}{m^2(m+1)}, \tag{13}$$




where $m$, $k$, and $s$ are defined in the paragraph above. The reader should note that in equation (13) $\sigma_1^2(m, s)$, the variance for the number of balls in a single cell, corrects a misprint in the original Kunte paper. Equation (13) can be used to obtain a normal approximation to the distribution for $N_{11}$.
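As a sanity check on equation (13), the null distribution can be built with exact rational arithmetic and its moments compared with the closed forms. This is a sketch under stated assumptions: `null_pmf` and `moments` are illustrative names, the null probabilities are taken to be the Bose-Einstein occupancy probabilities implied by equations (7), (9), and (12), and the values $s = 50$, $k = 6$, $m = 11$ (that is, $r = 10$ out-of-state transitions with $Z_1 = 1$) are chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def null_pmf(s: int, k: int, m: int) -> list:
    """Pr(N11 = j), j = 0..s, when s balls fall into m cells and k cells are labeled 1.

    Assumes 1 <= k <= m - 1, so runs of both kinds are present.
    """
    total = comb(m + s - 1, s)
    return [
        Fraction(comb(k + j - 1, j) * comb(m - k + s - j - 1, s - j), total)
        for j in range(s + 1)
    ]


def moments(pmf: list) -> tuple:
    """Exact mean and variance of a pmf supported on 0..len(pmf)-1."""
    mean = sum(j * p for j, p in enumerate(pmf))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(pmf))
    return mean, var


s, k, m = 50, 6, 11  # illustrative values: s = 50, r = 10, Z1 = 1
mean, var = moments(null_pmf(s, k, m))
print(mean == Fraction(s * k, m))
print(var == Fraction(k * (m - k) * s * (m + s), m * m * (m + 1)))
```

Using `Fraction` avoids rounding, so the comparison with equation (13) is an exact equality rather than an approximate one.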

The results above are related to the work of Gabriel (1959) on the distribution of the number of successes in a binary Markov chain. The primary difference is that the distribution of the test statistic $N_{11}$ is conditioned on the number of observed same-state transitions in the $\{Z_n\}$ chain, and depends on a single parameter, $\delta$. It is this conditional argument that leads to the exact unbiased test and power calculation described below. Klotz (1973) used a similar conditional argument to find an exact unbiased test for Markov dependence in a sequence of Bernoulli trials.

To illustrate this test procedure, Table 1 gives selected exact p-values for Markov-McNemar tests of the null hypothesis of equal marginals (equation (1)) against the one-sided alternative that

$$\lim_{n\to\infty} \Pr(X_n = 1) < \lim_{n\to\infty} \Pr(Y_n = 1),$$

which is equivalent to

$$\lim_{n\to\infty} \Pr(Z_n = 1) > 1/2$$

for the embedded Markov chain defined by equation (3). We consider the case where there are exactly $s = 50$ same-state transitions ($N_{00} + N_{11} = 50$) and a variable number of out-of-state transitions ($r = n_{01} + n_{10}$); thus, there are a total of $r + s + 1$ time points and $r + s$ embedded chain transitions. We assume that the initial state is $Z_1 = 1$. Table 1 uses the probabilities given in equations (9), (10), and (12) to compute the exact p-values for $n_{11} \ge 36$. Also tabled is the normal approximation to this p-value, which uses the moments given in (13) and the usual continuity correction.

Table 1
Markov-McNemar test. p-value $= \Pr(N_{11} \ge 36 \mid N_{11} + N_{00} = 50, N_{10} + N_{01} = r, Z_1 = 1)$.

N10 + N01 = r              1      2      5      10     20     30     50     100    200

Markov p-value, exact      0.294  0.498  0.165  0.160  0.072  0.041  0.020  0.008  0.004
Markov p-value, approx.    0.238  0.429  0.147  0.150  0.071  0.042  0.021  0.009  0.004
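The entries of Table 1 can be recomputed from the Bose-Einstein tail probability and the moments in equation (13). The following is a minimal sketch, assuming that the cutoff is $N_{11} \ge 36$ and that $n_{01}$ follows from $r$ and $Z_1$ through equation (8); the function names `cells`, `exact_p`, and `normal_p` are illustrative and not from the paper.

```python
from math import comb, erfc, sqrt


def cells(r: int, i: int) -> tuple:
    """Return (k, m): runs labeled 1 and total runs, given r out-of-state transitions."""
    n01 = (r + 1 - i) // 2  # equation (8) with n.. - n00 - n11 = r
    return n01 + i, r + 1


def exact_p(s: int, r: int, i: int, cutoff: int) -> float:
    """Exact Pr(N11 >= cutoff | N11 + N00 = s, N01 + N10 = r, Z1 = i) under delta = 1."""
    k, m = cells(r, i)
    total = comb(m + s - 1, s)
    tail = sum(
        comb(k + j - 1, j) * comb(m - k + s - j - 1, s - j)
        for j in range(cutoff, s + 1)
    )
    return tail / total


def normal_p(s: int, r: int, i: int, cutoff: int) -> float:
    """Normal approximation with continuity correction, using equation (13)."""
    k, m = cells(r, i)
    mean = s * k / m
    var = k * (m - k) * s * (m + s) / (m * m * (m + 1))
    z = (cutoff - 0.5 - mean) / sqrt(var)
    return 0.5 * erfc(z / sqrt(2))  # upper-tail standard normal probability


for r in (1, 2, 5, 10, 20, 30, 50, 100, 200):
    print(r, round(exact_p(50, r, 1, 36), 3), round(normal_p(50, r, 1, 36), 3))
```

Two columns can be verified by hand. For $r = 1$ the null distribution of $N_{11}$ is uniform on $0, \ldots, 50$, giving $15/51 \approx 0.294$; for $r = 2$ the probabilities are proportional to $j + 1$, giving $660/1326 \approx 0.498$.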

To further consider the behavior of this Markov-McNemar test and to compare it to the classical McNemar test, we consider the true level and power of both tests. Conditioned on the total number of discordant observations, the power of both tests depends on the two parameters $\alpha$ and $\beta$ of the discordant Markov process. Given in Table 2 are the limiting probability $\pi_1$ defined in equation (5) and the limiting first-order correlation (Singh and Sutradhar, 1989) given by

$$\rho_1 = \lim_{i\to\infty} \frac{\operatorname{cov}(Z_{i+1}, Z_i)}{\operatorname{var}(Z_i)} = 1 - \alpha - \beta.$$

In both the independent and Markov-McNemar cases we test the null hypothesis that $\pi_1 = .5$ against the one-sided alternative that $\pi_1 > .5$. When $\rho_1 = 0$, the process $\{Z_n\}$ is a sequence of independent Bernoulli trials and the assumptions of the classical McNemar test are met.

In the notation of this paper the classical McNemar test statistic is the number of observed transitions to one, $N_{\cdot 1} = N_{11} + N_{01}$. As described in this paper, the Markov-McNemar test statistic is $N_{11}$, conditioned on the number of same-state transitions $s = N_{00} + N_{11}$. Exact power is computed in the following way. First, exact $\alpha^*$-level tests are constructed using a randomized decision rule. Thus, for the Markov-McNemar test, for each same-state transition count $N_{00} + N_{11} = s$, a randomized decision rule is found such that the null hypothesis is rejected if $N_{11} > n_c(s)$ and the null hypothesis is rejected with probability $p(s)$ if $N_{11} = n_c(s)$. Then for each $s$, $n_c(s)$ and $p(s)$ are found such that

$$\alpha^* = p(s) \Pr(N_{11} = n_c(s) \mid N_{11} + N_{00} = s, Z_1 = i) + \Pr(N_{11} > n_c(s) \mid N_{11} + N_{00} = s, Z_1 = i). \tag{14}$$
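The randomized rule in equation (14) can be found by scanning the null distribution from its upper end. The sketch below is one way to do this, not necessarily the authors' procedure; `null_pmf` and `randomized_rule` are illustrative names, and the example values $s = 50$, $k = 6$, $m = 11$ are hypothetical.

```python
from fractions import Fraction
from math import comb


def null_pmf(s: int, k: int, m: int) -> list:
    """Null (delta = 1) distribution of N11; assumes 1 <= k <= m - 1."""
    total = comb(m + s - 1, s)
    return [
        Fraction(comb(k + j - 1, j) * comb(m - k + s - j - 1, s - j), total)
        for j in range(s + 1)
    ]


def randomized_rule(pmf: list, level: Fraction) -> tuple:
    """Return (n_c, p) with p * Pr(N = n_c) + Pr(N > n_c) equal to level."""
    upper = Fraction(0)  # running value of Pr(N > n_c)
    for n_c in range(len(pmf) - 1, -1, -1):
        if upper + pmf[n_c] > level:
            return n_c, (level - upper) / pmf[n_c]
        upper += pmf[n_c]
    raise ValueError("level must be smaller than 1")


pmf = null_pmf(50, 6, 11)  # hypothetical case: s = 50, r = 10, Z1 = 1
level = Fraction(1, 20)
n_c, p = randomized_rule(pmf, level)
size = p * pmf[n_c] + sum(pmf[n_c + 1:])
print(n_c, float(p), size == level)
```

Because the randomization probability absorbs the leftover mass at the critical value, the size of the test equals the nominal level exactly, which is what makes the power comparison between the two procedures fair.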




For the classical McNemar test, standard binomial calculations yield a randomized exact $\alpha^*$-level test. In this case an $n^*$ and a $p^*$ are found such that the null hypothesis is rejected if $N_{\cdot 1} > n^*$ and the null hypothesis is rejected with probability $p^*$ if $N_{\cdot 1} = n^*$.

Given $Z_1 = 1$ and the distribution defined in (6), it is a straightforward computational problem to find the exact distribution of $N_{00}$ and $N_{11}$ and thus compute the exact level and power of both procedures.

Table 2
Probability of rejecting $H_0$ for nominal $\alpha^* = .05$ level Markov-McNemar and independent McNemar tests for 100 discordant transitions, $Z_1 = 1$

                 rho_1 = 0        rho_1 = .4       rho_1 = -.4
                 H0     H1        H0     H1        H0     H1

alpha            .50    .60       .30    .40       .70    .80
beta             .50    .40       .30    .20       .70    .60
pi_1             .50    .60       .50    .667      .50    .57
Markov           .050   .626      .050   .676      .050   .701
Independent      .050   .641      .150   .883      .006   .353

The last two rows of Table 2 give the probability of rejecting the null hypothesis for the Markov-McNemar and the independent McNemar test. The true level of the classical test depends strongly on the serial correlation of the Markov chain. For example, when $\rho_1 = .4$ the classical test with nominal Type I error rate of .05 has a true Type I error rate of .15. This effect has been described by Gastwirth and Rubin (1971) and others. More importantly, the power of the Markov-McNemar test is close to the power of the classical test (.626 vs. .641) when the independence assumptions of the classical test are met, i.e., $\rho_1 = 0$.
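The sensitivity of the classical test to serial correlation can be explored with a short dynamic program over the two-state chain. The sketch below is an illustration rather than the authors' computation: it assumes that "100 discordant transitions" means 100 one-step transitions of $\{Z_n\}$ starting from $Z_1 = 1$, and all function names are hypothetical.

```python
from math import comb


def dist_transitions_to_one(alpha: float, beta: float, n_trans: int, z1: int) -> list:
    """Distribution of N.1 (transitions ending in state 1) for the two-state chain."""
    # probs[state][c] = Pr(current state, c transitions into state 1 so far)
    probs = [[0.0] * (n_trans + 1) for _ in range(2)]
    probs[z1][0] = 1.0
    to_one = (alpha, 1.0 - beta)  # Pr(next = 1 | current = 0), Pr(next = 1 | current = 1)
    for _ in range(n_trans):
        nxt = [[0.0] * (n_trans + 1) for _ in range(2)]
        for state in (0, 1):
            for c, p in enumerate(probs[state]):
                if p == 0.0:
                    continue
                nxt[1][c + 1] += p * to_one[state]
                nxt[0][c] += p * (1.0 - to_one[state])
        probs = nxt
    return [probs[0][c] + probs[1][c] for c in range(n_trans + 1)]


def binomial_rule(n: int, level: float) -> tuple:
    """Randomized one-sided rule (n_star, p_star) for Binomial(n, 1/2) at the given level."""
    pmf = [comb(n, c) / 2 ** n for c in range(n + 1)]
    upper = 0.0
    for n_star in range(n, -1, -1):
        if upper + pmf[n_star] > level:
            return n_star, (level - upper) / pmf[n_star]
        upper += pmf[n_star]
    raise ValueError("level must be smaller than 1")


def rejection_prob(alpha: float, beta: float, n_trans: int = 100,
                   z1: int = 1, level: float = 0.05) -> float:
    """Probability that the classical randomized test rejects under the Markov chain."""
    n_star, p_star = binomial_rule(n_trans, level)
    dist = dist_transitions_to_one(alpha, beta, n_trans, z1)
    return p_star * dist[n_star] + sum(dist[n_star + 1:])


for a, b in ((0.5, 0.5), (0.3, 0.3), (0.7, 0.7)):
    print(a, b, round(rejection_prob(a, b), 3))
```

With $\alpha = \beta = .5$ the chain reduces to independent Bernoulli trials and the rule attains its nominal level; the other two settings correspond to the $\rho_1 = .4$ and $\rho_1 = -.4$ null columns of Table 2 and can be compared with the "Independent" row there.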

3. Comparison of Ribosomal DNA Sequences

To illustrate how Markov-McNemar paired binary analysis can be applied to complex situations, we consider an example from molecular evolutionary biology. Evolutionary hypotheses can be tested by comparing differences in DNA nucleotide sequences. DNA sequences from the stable 18S ribosomal DNA are often used for this purpose. We used the ribosomal DNA sequences analyzed by Halanych et al. (1995) and made available to us by those authors. The standard analysis of this kind of data involves the construction of evolutionary trees. The stability of a tree construction can then be evaluated using standard, but time-consuming, resampling or bootstrapping techniques.

We propose a different analysis which tests a specific subhypothesis. At the heart of any analysis of this kind is the comparison of three species, say species A, B, and C. Species A and B are on the same branch of the evolutionary tree, relative to species C, if the paths to species A and B share a common node (i.e., ancestor) not shared by the path to species C. Thus, the paths to species A and B diverge from the path to C at the same point in time. If the base substitution rate is constant over time, then the rate of sequence mismatches between species A and C will be the same as the rate of sequence mismatches between species B and C.

To test the hypothesis that species A and B are on the same branch of an evolutionary tree relative to species C, we propose the following analysis. From the DNA sequences of species A, B, and C we define an embedded discordant series (equation (3)) in the following way. First, select the sequence positions where there are no deletions in the three series. From this sequence select the embedded discordant sequence by selecting only the positions where the nucleotides match between A and C but do not match between species B and C ($Z = 1$), or the nucleotides do not match between A and C but match between B and C ($Z = 0$). The null hypothesis that, relative to species C, the base substitution rates for species A and B are the same is equivalent to the hypothesis that the limiting distribution of the discordant series is $\pi_0 = \pi_1 = .5$.

We give below a representative analysis of this kind. The species used for this analysis are species A, Glottidia pyramidata, an inarticulate brachiopod; species B, Glycera americana, a polychaete; and species C, Acanthopleura japonica, a chiton. Table 3 shows the data for the 65 discordant positions from the original sequences of 1605 aligned bases. Recorded in Table 3 are the positions of the discordances in the original sequence (displayed vertically, so that the sequence of positions is 69, 72, ..., 1529), the nucleotide bases of the three species, and the discordant series $\{Z_i\}$ defined above.



Table 3
Positions of the 65 discordances in the original sequence, the nucleotide bases of species A, B, and C, and the discordant series $\{Z_i\}$. (The table body could not be recovered from the source.)




The discordant series of 65 binary observations has 23 ones. Assuming that this is a series of independent observations, the standard binomial or sign test of the null hypothesis that $\pi_1 = .5$ against the two-sided alternative yields a p-value of .0248. For this short series the runs test for positive serial dependence (Klotz, 1973) yields a p-value of .12. However, some longer discordant series from the same DNA data base were strongly serially dependent. Assuming a first-order serial dependence for this discordant series, the appropriate test of the null hypothesis that $\pi_1 = .5$ is the proposed Markov-McNemar test. The summary statistics for the embedded series are $Z_1 = 0$, $N_{00} = 29$, $N_{11} = 10$, $N_{10} = 12$, and $N_{01} = 13$. The Markov-McNemar conditional two-sided test yields a p-value of $\Pr(N_{11} \le 10 \text{ or } N_{11} \ge 29 \mid Z_1 = 0, N_{01} + N_{10} = 25, N_{00} + N_{11} = 39) = .061$. As outlined in this paper, this is simply the probability calculation for the Bose-Einstein model for distributing 39 indistinguishable balls among a total of $m = 26$ cells, of which $k = 13$ are labeled 1.
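The two-sided p-value for the DNA example can be reproduced from the reported summary statistics alone. The sketch below assumes the rejection region is $N_{11} \le 10$ or $N_{11} \ge 29$, which is symmetric about the null mean $sk/m = 19.5$; variable names are illustrative.

```python
from math import comb

# Summary statistics reported for the embedded discordant series.
z1, n00, n01, n10, n11 = 0, 29, 13, 12, 10

s = n00 + n11   # same-state transitions
r = n01 + n10   # out-of-state transitions
k = n01 + z1    # runs (cells) labeled 1
m = r + 1       # total number of runs (cells)

# Consistency checks against the text: 64 transitions, 23 ones, equation (8).
assert s + r == 65 - 1
assert n01 + n11 == 23
assert n01 == (r + 1 - z1) // 2

total = comb(m + s - 1, s)
pmf = [
    comb(k + j - 1, j) * comb(m - k + s - j - 1, s - j) / total
    for j in range(s + 1)
]

p_two_sided = sum(pmf[: n11 + 1]) + sum(pmf[s - n11:])
print(s, k, m, round(p_two_sided, 3))
```

Since $k = m - k = 13$, the null distribution is symmetric and the two tails contribute equally; the printed value can be compared with the .061 reported in the text.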

From the above analysis one would provisionally accept the null hypothesis that species A and B (the polychaete and the inarticulate species) belong on the same tree branch relative to species C (the chiton species). Maximum likelihood tree construction methods yield the same result (Halanych et al., 1995). In general, Markov-McNemar analysis and the maximum likelihood tree analysis yield the same conclusions for most combinations of the 3 species from the 14 species data base. The maximum likelihood tree method models the detailed base substitution process at a single position over time, while assuming the substitution processes at different locations are independent (Waterman, 1995). On the other hand, Markov-McNemar analysis ignores the details of the base substitution process, but models the dependence between locations. The observed dependent structure could be induced by differences in base substitution rates between species A and species B at different locations in the sequence.

ACKNOWLEDGEMENTS

We would like to thank the referee for suggestions that led to the power calculations contained in Table 2. The Editor's and Associate Editor's advice improved the clarity of the paper. Dirk Moore and Peter Petraitis critiqued the ribosomal DNA analysis presented in Section 3. We thank Ken Halanych for providing the ribosomal DNA data and for his insightful explanations of the current methodologies in this area.

RÉSUMÉ

A direct extension of the McNemar test for paired binary data provides an exact test of the equality of the limiting marginal distributions for bivariate binary Markov chains. The exact distribution of the test statistic under the null hypothesis of equal marginals depends on the classical occupancy statistics of the Bose-Einstein model. Exact significance levels are computed for the one-sided test, and the mean and variance of the test statistic are found. The power of the Markov-McNemar test is found to be close to that of the classical McNemar test for independent paired observations when the independence assumption holds. The method is applied to the comparison of ribosomal DNA sequences.

REFERENCES

Bedrick, E. J. and Aragon, J. (1989). Approximate confidence intervals for parameters of a stationary binary Markov chain. Technometrics 31, 437-447.

Billingsley, P. (1961). Statistical methods in Markov chains. Annals of Mathematical Statistics 32, 12-40.

Boza, L. B. (1971). Asymptotically optimal test for finite Markov chains. Annals of Mathematical Statistics 42, 1992-2007.

Chatfield, C. (1973). Statistical inference regarding Markov chain models. Applied Statistics 22, 7-21.

Feller, W. (1967). An Introduction to Probability Theory and Its Applications, 2nd edition. New York: Wiley.

Gabriel, K. R. (1959). The distribution of the number of successes in a sequence of dependent trials. Biometrika 46, 454-460.

Gastwirth, J. L. and Rubin, H. (1971). Effect of dependence on the level of some one-sample tests. Journal of the American Statistical Association 66, 816-820.




Halanych, K. M., Bacheller, J. D., Aguinaldo, A. M. A., Liva, S. M., Hillis, D. M., and Lake, J. A. (1995). Evidence from 18S ribosomal DNA that the lophophorates are protostome animals. Science 267, 1641-1642.

Johnson, N. L. and Kotz, S. (1977). Urn Models and Their Application: An Approach to Modern Discrete Probability Theory. New York: John Wiley.

Kedem, B. (1980). Binary Time Series: Lecture Notes in Pure and Applied Math, Volume 52. New York: Marcel Dekker.

Kemeny, J. G. and Snell, J. L. (1960). Finite Markov Chains. Princeton: D. Van Nostrand.

Klotz, J. (1973). Statistical inference in Bernoulli trials with dependence. The Annals of Statistics 1, 373-379.

Kunte, S. (1977). The multinomial distribution, Dirichlet integrals and Bose-Einstein statistics. Sankhya 39, 305-308.

Liang, K.-Y. and Zeger, S. L. (1989). A class of logistic regression models for multivariate binary time series. Journal of the American Statistical Association 84, 447-451.

Morgan, B. J. T. (1976). Markov properties of sequences of behaviours. Applied Statistics 25, 31-36.

Singh, A. C. and Sutradhar, B. C. (1989). Testing proportions for Markov dependent Bernoulli trials. Biometrika 76, 809-813.

Solow, A. R., Smith, W., and Recchia, C. (1995). A conditional test of independence of Markov chains. Biometrical Journal 8, 973-977.

Tavare, S. and Altham, P. M. E. (1983). Serial observations leading to contingency tables, and corrections to chi-squared statistics. Biometrika 70, 139-144.

Waterman, M. S. (1995). Introduction to Computational Biology. London: Chapman and Hall.

Received October 1994; revised April 1995, September 1995, February 1996; accepted February 1996.


