
STATISTICS IN MEDICINE
Statist. Med. 2007; 26:1788–1801
Published online 12 January 2007 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/sim.2799

Linked surveys of health services utilization‡

Monroe G. Sirken∗,† and Iris M. Shimizu

National Center for Health Statistics, CDC, 3311 Toledo Road, Room 5212, Hyattsville, MD 20782, U.S.A.

SUMMARY

The linked population/establishment survey (LS) of health services utilization is a two-phase sample survey that links the sample designs of the population sample survey (PS) and the health-care provider establishment sample survey (ES) of health services utilization. In Phase I, household respondents in the PS identify their health-care providers during a specified calendar period. In Phase II, health-care providers identified in Phase I report the variables of interest for all or a sample of their transactions with all households during the same calendar period.

The LS has been proposed as a potential design alternative to the PS whenever the health-care transactions of interest are hard to find or enumerate in household surveys, and as a potential design alternative to the ES whenever it is infeasible or expensive to construct or maintain complete provider sampling frames that list all health-care providers with good measures of provider size. Assuming that non-sampling errors are ignorable, how do the LS, PS and ES sampling errors compare? This paper addresses that question by summarizing and extending recent research findings that compare expressions of the sampling variance of (1) the LS and PS of equivalent household sample size and (2) the LS and the ES of equivalent expected health-care provider and transaction sample sizes.

The paper identifies the parameters contributing to the precision differences and assesses the conditions that favour the LS or one of the other two surveys. Published in 2007 by John Wiley & Sons, Ltd.

KEY WORDS: population surveys; provider surveys; network sampling; sampling errors

1. INTRODUCTION

Health-care utilization statistics on the volume of visits between health-care users and providers are typically collected in the population sample survey (PS) or the health-care provider establishment sample survey (ES), which are independently designed. In the PS, health-care users report the variables of interest for their visits with health-care providers during a specified calendar period. In the ES, health-care providers report the variables of interest for all visits of health-care users during the specified calendar period. This paper proposes a third option: the linked provider survey of health services utilization (LS). The LS is a two-phase survey. In Phase I, health-care users in a population survey identify each health-care provider they visited during the specified calendar period and report the number of their visits with each provider. In Phase II, the health-care providers identified in Phase I report the variables of interest for all visits of health-care users during the specified calendar period.

∗Correspondence to: Monroe G. Sirken, National Center for Health Statistics, 3311 Toledo Road, Room 5212, Hyattsville, MD 20782, U.S.A.

†E-mail: [email protected]

‡This article is a U.S. Government work and is in the public domain in the U.S.A.

Received 15 October 2006. Accepted 16 November 2006. Published in 2007 by John Wiley & Sons, Ltd.

The LS of health services utilization was proposed more than a decade ago [1] as a potential design alternative to the ES. When the LS is viewed as a provider survey, the essential difference between it and the ES is the sampling frame. The LS uses a population survey-generated sampling frame that contains only the providers visited by households in a PS. The ES, on the other hand, uses a complete sampling frame that lists all providers in the universe. In both the LS and the ES, health-care providers report the variables of interest about all or a sample of their visits with all households. The LS is particularly appealing as a potential design alternative to the ES whenever the quality of the ES sampling frame is suspect because of incomplete provider coverage or inaccurate provider size measures.

More recently, it became apparent that the LS also has potential as a design alternative to the PS. When the LS is viewed as a household sample population survey, there are essentially two differences between the LS and the PS: (1) the counting rule that links visits to the households at which they are eligible to be counted and (2) the respondent rule that designates the eligible sources of information for the visits eligible to be counted at households. The LS uses a multiplicity counting rule that makes all visits with the same provider eligible to be counted at every household that is the usual place of residence of one of the provider's health-care users, whereas the PS uses the de jure residence rule that makes each visit of a health-care user eligible to be counted only at the one household that is that user's usual place of residence. The variables of interest about visits eligible to be counted at households by the multiplicity rule are reported by health-care providers in the LS, and those about visits eligible to be counted by the de jure residence rule are reported by health-care users in the PS. Whenever the visits of interest are hard to find or enumerate in household surveys, the LS deserves particularly serious consideration as a design alternative to the PS.

This paper summarizes the research findings that compare expressions of the sampling variance of the two-stage LS and ES of equivalent expected provider and visit sample sizes, and of the LS and PS of equivalent household sample size. The two-stage LS unbiased estimator of the volume of provider visits and its variance were derived by Sirken et al. [2]. The precision of two-stage LS and ES estimators of equivalent expected provider and visit sample sizes was compared by Sirken [3]. Most recently, the precision of LS and PS estimators of equivalent household sample size was compared by Sirken and Shimizu [4]. In these earlier publications, the precision comparisons assumed simple random sampling (srs) in the PS and sampling with probability proportionate to size (pps) in the ES. In this paper, the selection probabilities of households in the PS and of providers in the ES are unrestricted.

Section 2 presents basic notation. In subsequent sections, the research findings are presented in the chronological order in which the research was originally conducted. Section 3 describes the two-stage LS sample design and presents the unbiased LS estimator and its variance. Section 4 describes the ES sample design and compares the LS and ES estimators and their variances. Section 5 describes the PS sample design and compares the LS and PS estimators and their sampling variances. The concluding Section 6 summarizes the findings and briefly discusses their limitations.

2. BASIC NOTATION

A population residing in $N$ households has $M$ visits with a universe of $R$ providers. Let $N^*$ denote the households that have one or more visits with providers, while $N^0 = N - N^*$ denotes the households without provider visits. Let $M_{ij}$ be the number of visits of household $i$ ($i = 1, 2, 3, \ldots, N$) with provider $j$ ($j = 1, 2, 3, \ldots, R$), where $M_{ij} > 0$ when household $i$ has visits with provider $j$, and $M_{ij} = 0$ when household $i$ and provider $j$ do not have visits. Then

$$M_j = \sum_{i=1}^{N} M_{ij}$$

is the number of visits of the $N$ households with provider $j$, $M = \sum_{j=1}^{R} M_j$ is the number of visits of the $N$ households with the $R$ providers, and $\bar{M} = M/N$ is the average number of visits per household.

Let $X_{ijh}$ denote the value of the x-variable for visit $h$ ($h = 1, 2, 3, \ldots, M_{ij}$) of household $i$ ($i = 1, 2, 3, \ldots, N$) with provider $j$ ($j = 1, 2, \ldots, R$). Then $X_{ij} = \sum_{h=1}^{M_{ij}} X_{ijh}$ is the sum of the x-variable over the $M_{ij}$ visits of household $i$ with provider $j$ ($X_{ijh} = 0$ and, hence, $X_{ij} = 0$ if $M_{ij} = 0$). Also, let $X_{jk}$ denote the value of the x-variable for visit $k$ ($k = 1, 2, 3, \ldots, M_j$) of provider $j$, and let $X_j = \sum_{k=1}^{M_j} X_{jk}$ be the sum of the x-variable over the $M_j$ visits of provider $j$ ($j = 1, 2, 3, \ldots, R$). Then

$$X = \sum_{j=1}^{R} X_j = \sum_{i=1}^{N} X_i, \qquad X_i = \sum_{j=1}^{R} X_{ij},$$

is the sum of the x-variable over the $M$ visits of the $N$ households with the $R$ providers.

Let $\bar{X}_j = X_j/M_j$ be the average value of the x-variable over the $M_j$ visits of provider $j$. Also, let $\bar{X} = X/N$ be the average value of the x-variable per household.
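The notation can be made concrete with a small example. The Python sketch below stores a hypothetical population as visit records `(i, j, x)` (household, provider, x-value) and derives $M_{ij}$, $M_j$, $M$, $\bar{M}$, $X_j$, $\bar{X}_j$, $X_i$ and $X$ from them. The record layout, variable names and numbers are illustrative assumptions, not data from the paper.

```python
from collections import defaultdict

# Hypothetical visit records (household i, provider j, x-value); N = 4, R = 2.
# Household 3 has no visits, so N* = 3 and N0 = 1.
records = [(0, 0, 2.0), (0, 0, 9.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 5.0)]
N, R = 4, 2

M_ij = defaultdict(int)     # M_ij: visits of household i with provider j
X_ij = defaultdict(float)   # X_ij: x-variable summed over those visits
for i, j, x in records:
    M_ij[i, j] += 1
    X_ij[i, j] += x

M_j = [sum(M_ij[i, j] for i in range(N)) for j in range(R)]   # visits per provider
M = sum(M_j)                                                  # total visits
M_bar = M / N                                                 # visits per household
X_j = [sum(X_ij[i, j] for i in range(N)) for j in range(R)]   # provider totals
Xbar_j = [X_j[j] / M_j[j] for j in range(R)]                  # provider means
X_i = [sum(X_ij[i, j] for j in range(R)) for i in range(N)]   # household totals
X = sum(X_j)                                                  # population total
N_star = sum(1 for i in range(N) if any(M_ij[i, j] for j in range(R)))

assert abs(X - sum(X_i)) < 1e-12   # both summation orders give X
print(M_j, M, M_bar, Xbar_j, X, N_star)
```

Indices start at 0 in the code rather than at 1 as in the text.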

3. THE LINKED PROVIDER/POPULATION SAMPLE SURVEY (LS)

3.1. The LS sample design

The LS is a network sampling population survey using a multiplicity counting rule that counts the $M_j$ visits of provider $j$ ($j = 1, 2, 3, \ldots, R$) at every household $i$ ($i = 1, 2, 3, \ldots, N$) that visits provider $j$. [Network sampling applies whenever the same observation units (visits) are eligible to be counted at multiple selection units (households).] Let $A_i$ denote the cluster of zero or more providers seen by household $i$ ($i = 1, 2, 3, \ldots, N$).

The LS is modelled as a two-stage network sample population survey in which the $A_i$'s ($i = 1, 2, 3, \ldots, N$) are the primary sampling units (PSUs) and the $M_j$ visits of each provider $j \in A_i$ ($i = 1, 2, 3, \ldots, N$) are the second-stage selection units. In the first stage, a sample of $n$ households is selected with replacement with probabilities $\pi_i$ ($i = 1, 2, 3, \ldots, N$). In the second stage, for each provider $j \in A_i$ ($i = 1, 2, 3, \ldots, n$), a sample of $m_{ij} = c_{LS} M_{ij}$ of the provider's $M_j$ visits, where $c_{LS}$ is a positive number, is independently selected by srs without replacement. The total LS sample of visits is $m_{LS} = c_{LS} r_{LS}$, where

$$r_{LS} = \sum_{i=1}^{n} \sum_{j=1}^{R} M_{ij},$$

a random variable, is the size of the LS sample of providers.

In the first stage, respondents in household $i$ ($i = 1, 2, 3, \ldots, n$) identify their providers ($j \in A_i$) and report $M_{ij}$, the number of their visits with each provider. In the second stage, each provider $j \in A_i$ ($i = 1, 2, 3, \ldots, n$) reports the x-variable for an independent sample of $m_{ij} = c_{LS} M_{ij}$ of its $M_j$ visits.
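As a rough illustration of the two-stage LS selection just described, the following Python sketch draws $n$ households with replacement with probabilities $\pi_i$ and then, for each provider $j \in A_i$ of each sampled household, an independent srs without replacement of $c_{LS} M_{ij}$ of that provider's visits. The toy population, the function name `draw_ls_sample`, and the assumption that $c_{LS} M_{ij}$ is an integer no larger than $M_j$ are ours.

```python
import random

# Hypothetical toy population.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk for provider j
M = [[2, 0], [1, 1], [0, 1], [0, 0]]            # M_ij; household 3 has no visits
pi = [0.3, 0.3, 0.2, 0.2]                       # household selection probabilities
c = 1                                           # c_LS (assumed integer here)

def draw_ls_sample(n, rng):
    """First stage: n households with replacement; second stage: srs per (i, j)."""
    households = rng.choices(range(len(M)), weights=pi, k=n)
    second_stage = []
    for i in households:                 # a repeated household is processed again
        for j, m_ij in enumerate(M[i]):
            if m_ij == 0:
                continue                 # provider j is not in A_i
            # independent srs without replacement of c * M_ij of provider j's visits
            second_stage.append((i, j, rng.sample(visits[j], c * m_ij)))
    return households, second_stage

rng = random.Random(7)
hh, draws = draw_ls_sample(5, rng)
r_ls = sum(sum(M[i]) for i in hh)           # r_LS
m_ls = sum(len(s) for _, _, s in draws)     # m_LS
print(hh, r_ls, m_ls)
```

By construction the visit sample size equals $c_{LS}$ times the provider sample size, $m_{LS} = c_{LS} r_{LS}$.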

3.2. The unbiased LS estimator of X

The unbiased two-stage network sampling LS estimator of $X$ based on a first-stage sample of $n$ households and a second-stage sample of $m_{ij}$ visits of provider $j$ ($j \in A_i$, $i = 1, 2, 3, \ldots, n$) is

$$X'_{LS} = \frac{1}{n} \sum_{i=1}^{n} \frac{\Phi'_i}{\pi_i} \tag{1}$$

where

$$\Phi'_i = \sum_{j \in A_i} M_{ij} \bar{X}'_j(i) \tag{2}$$

and

$$\bar{X}'_j(i) = \frac{1}{m_{ij}} \sum_{k=1}^{m_{ij}} X_{jk} \tag{3}$$

is the unbiased estimate of the mean $\bar{X}_j = X_j/M_j$ based on the sample of $m_{ij} = c_{LS} M_{ij}$ visits of provider $j$ that are enumerated at sample household $i$ ($i = 1, 2, 3, \ldots, n$).

The unbiased single-stage LS1 estimator of $X$ is

$$X'_{LS1} = \frac{1}{n} \sum_{i=1}^{n} \frac{\Phi_i}{\pi_i} \tag{4}$$

where

$$\Phi_i = \sum_{j \in A_i} M_{ij} \bar{X}_j \tag{5}$$

Because households are sampled with replacement, the LS1 and LS estimators of $X$ count the quantities $\Phi_i$ and $\Phi'_i$, respectively, every time household $i$ ($i = 1, 2, 3, \ldots, N$) is drawn in the sample. Because the same provider is visited by multiple households, the LS1 and LS estimators count the quantities $M_{ij}\bar{X}_j$ and $M_{ij}\bar{X}'_j(i)$, respectively, every time provider $j$ ($j = 1, 2, 3, \ldots, R$) is seen by sample household $i$ ($i = 1, 2, 3, \ldots, n$).
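A minimal sketch of the single-stage LS1 estimator (4) on a hypothetical toy population: `phi(i)` implements (5), and the two assertions check the multiplicity property $\sum_i \Phi_i = X$ and that a single with-replacement draw has expectation $X$, which is what makes (4) unbiased for any $n$. Names and numbers are illustrative assumptions.

```python
import random

# Hypothetical toy population.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk
M = [[2, 0], [1, 1], [0, 1], [0, 0]]            # M_ij
pi = [0.3, 0.3, 0.2, 0.2]                       # pi_i

xbar = {j: sum(v) / len(v) for j, v in visits.items()}   # provider means
X = sum(sum(v) for v in visits.values())

def phi(i):
    """(5): provider means weighted by household i's visit counts."""
    return sum(M[i][j] * xbar[j] for j in visits)

def ls1_estimate(sample):
    """(4): average of Phi_i / pi_i over the draws; repeated households count each time."""
    return sum(phi(i) / pi[i] for i in sample) / len(sample)

# Multiplicity: each provider total is spread over the households that visit it.
assert abs(sum(phi(i) for i in range(len(M))) - X) < 1e-12
# A single draw has expectation X, hence (4) is unbiased for every n.
assert abs(sum(p * (phi(i) / p) for i, p in enumerate(pi)) - X) < 1e-12

rng = random.Random(1)
sample = rng.choices(range(len(M)), weights=pi, k=5)   # with replacement
print(ls1_estimate(sample))
```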

3.3. The variance of the LS estimator of X

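The sampling variance of the two-stage LS estimator of $X$ is [2]

$$\operatorname{Var}(X'_{LS}) = \frac{1}{n}\sigma^2_{LS1} + \frac{1}{n c_{LS}} \sum_{i=1}^{N} \frac{1}{\pi_i} \sum_{j \in A_i} M_{ij}\left(1 - \frac{c_{LS} M_{ij}}{M_j}\right)\sigma^2_j \tag{6}$$

where the first and second terms on the right side of (6) are the first- and second-stage components of sampling variance,

$$\sigma^2_{LS1} = \sigma^2_{\Phi_i} = \sum_{i=1}^{N} \pi_i \left(\frac{\Phi_i}{\pi_i} - \Phi\right)^2 \tag{7}$$

is the between-household population variance, where $\Phi = \sum_{i=1}^{N} \Phi_i = X$, and

$$\sigma^2_j = \frac{1}{M_j - 1} \sum_{k=1}^{M_j} \left(X_{jk} - \bar{X}_j\right)^2 \tag{8}$$

is the within component of variance of provider $j$, where $\bar{X}_j = X_j/M_j$. The sampling variance of the single-stage LS estimator of $X$ is

$$\operatorname{Var}(X'_{LS1}) = \frac{1}{n}\sigma^2_{LS1} \tag{9}$$

where $\sigma^2_{LS1}$ is the LS between-household population variance defined in (7).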
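Because (6) is exact for with-replacement first-stage sampling, it can be checked on a tiny hypothetical population by enumerating every first-stage draw and every second-stage srs outcome. The Python sketch below does this for $n = 1$ and $c_{LS} = 1$; the population and helper names are our assumptions, and `statistics.variance` is used because it applies the $M_j - 1$ divisor of (8).

```python
from itertools import combinations, product
from statistics import variance

# Hypothetical toy population (not from the paper).
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk for each provider j
M = [[2, 0], [1, 1], [0, 1], [0, 0]]            # M_ij; household 3 has no visits
pi = [0.3, 0.3, 0.2, 0.2]                       # pi_i, summing to 1
c = 1                                           # c_LS, so m_ij = c * M_ij <= M_j

R = len(visits)
M_j = [len(visits[j]) for j in range(R)]
xbar = [sum(visits[j]) / M_j[j] for j in range(R)]
X = sum(sum(v) for v in visits.values())
s2 = [variance(visits[j]) for j in range(R)]    # (8): divisor M_j - 1

phi = [sum(M[i][j] * xbar[j] for j in range(R)) for i in range(len(M))]   # (5)
sigma2_ls1 = sum(p * (phi[i] / p - X) ** 2 for i, p in enumerate(pi))    # (7)
second = sum(                                                             # (6), n = 1
    (1 / pi[i]) * sum(M[i][j] / c * (1 - c * M[i][j] / M_j[j]) * s2[j] for j in range(R))
    for i in range(len(M))
)
var_formula = sigma2_ls1 + second

def outcomes(i):
    """Equally likely values of Phi'_i in (2) over all joint second-stage samples."""
    per_provider = []
    for j in range(R):
        m = c * M[i][j]
        if m:
            per_provider.append([M[i][j] * sum(s) / m for s in combinations(visits[j], m)])
    return [sum(t) for t in product(*per_provider)]   # [0] if household has no visits

mean = second_moment = 0.0
for i, p in enumerate(pi):
    vals = outcomes(i)
    for v in vals:
        w, z = p / len(vals), v / p   # probability of this outcome, value of Phi'_i/pi_i
        mean += w * z
        second_moment += w * z * z
var_exact = second_moment - mean ** 2
print(mean, var_exact, var_formula)
```

The enumerated mean equals $X$ (unbiasedness of (1)) and the enumerated variance matches (6).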


In health services utilization surveys of some health-care providers, such as dentists and hospitals, substantial fractions of the population do not have any visits with health-care providers during the calendar period of interest. Denote the sum of the selection probabilities of the $N^*$ households with one or more health-care visits as

$$P = \sum_{i=1}^{N^*} \pi_i \tag{10}$$

When expressed as a function of $P$, the LS between-household population variance decomposes into two components:

$$\sigma^2_{LS1}(P) = \frac{1}{P}\left[\sigma^2_{LS1^*} + (1 - P)X^2\right] \tag{11}$$

The proof of (11) appears in the Appendix. The first term inside the brackets on the right side of (11),

$$\sigma^2_{LS1^*} = \sigma^2_{\Phi^*_i} = \sum_{i=1}^{N^*} \pi^*_i \left(\frac{\Phi_i}{\pi^*_i} - X\right)^2 \tag{12}$$

is the truncated variance of the $N^*$ households with visits, where

$$\pi^*_i = \pi_i / P \tag{13}$$

The second term inside the brackets, $(1 - P)X^2$, is the component due to the $N - N^* = N^0$ households without visits.
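The decomposition (11) is an algebraic identity, so a numerical check on any population should reproduce it. This hypothetical Python sketch computes $\sigma^2_{LS1}$ directly from (7) and again from (10) to (13), for a toy population in which one household has no visits ($P = 0.8$).

```python
# Hypothetical toy population.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk
M = [[2, 0], [1, 1], [0, 1], [0, 0]]            # M_ij; household 3 has no visits
pi = [0.3, 0.3, 0.2, 0.2]                       # pi_i

xbar = {j: sum(v) / len(v) for j, v in visits.items()}
X = sum(sum(v) for v in visits.values())
phi = [sum(M[i][j] * xbar[j] for j in visits) for i in range(len(M))]   # (5)

sigma2_ls1 = sum(p * (phi[i] / p - X) ** 2 for i, p in enumerate(pi))   # (7)

with_visits = [i for i, row in enumerate(M) if sum(row) > 0]            # the N* households
P = sum(pi[i] for i in with_visits)                                     # (10)
pi_star = {i: pi[i] / P for i in with_visits}                           # (13)
sigma2_star = sum(pi_star[i] * (phi[i] / pi_star[i] - X) ** 2
                  for i in with_visits)                                 # (12)
rhs = (sigma2_star + (1 - P) * X ** 2) / P                              # (11)
print(P, sigma2_ls1, rhs)
```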

4. ES AND LS COMPARISONS

4.1. Sample designs

The ES is a two-stage health-care provider sample survey in which the $R$ health-care providers are the primary sampling units and the $M_j$ visits of provider $j$ ($j = 1, 2, 3, \ldots, R$) with the $N$ households are the secondary sampling units. In the first stage, a sample of $r_{ES}$ providers is selected from the universe of $R$ providers with selection probabilities $\theta_j$ ($j = 1, 2, 3, \ldots, R$). In the second stage, a fixed-size sample of $c_{ES}$ (a positive integer) of the $M_j$ visits of provider $j$ ($j = 1, 2, 3, \ldots, r_{ES}$) is selected by srs without replacement. The total ES visit sample size is $m_{ES} = r_{ES} c_{ES}$.

The essential difference between the LS and the ES is the sampling frame. The ES uses a complete provider sampling frame that lists all providers ($j = 1, 2, 3, \ldots, R$) and may or may not list size measures of the providers. The LS uses a population survey-generated frame that lists the $n$ households enumerated in a PS and, for each survey household $i$ ($i = 1, 2, 3, \ldots, n$), lists the providers visited, if any, and the number of visits, $M_{ij}$, with each provider $j$ ($j = 1, 2, 3, \ldots, R$). In the ES sampling frame, the listed units are distinct providers and each provider is listed only once. In the LS sampling frame, the listed units are the clusters of providers visited by survey household $i$ ($i = 1, \ldots, n$), and the same provider is listed once for every distinct survey household that visited it.


4.2. Estimators

The unbiased two-stage ES estimator of $X$ based on a first-stage sample of $r_{ES}$ providers selected with probabilities $\theta_j$ ($j = 1, \ldots, R$) and a second-stage sample of $c_{ES}$ visits is

$$X'_{ES} = \frac{1}{r_{ES}} \sum_{j=1}^{r_{ES}} \frac{X'_j}{\theta_j} \tag{14}$$

where

$$X'_j = \frac{M_j}{c_{ES}} \sum_{k=1}^{c_{ES}} X_{jk} \tag{15}$$

is the unbiased ES estimate of $X_j$. The unbiased single-stage ES estimator of $X$ is

$$X'_{ES1} = \frac{1}{r_{ES}} \sum_{j=1}^{r_{ES}} \frac{X_j}{\theta_j} \tag{16}$$

The ES and LS estimators in (14) and (4), respectively, are not directly comparable because the PSUs are defined differently, but they become comparable when the ES provider and visit sample sizes, respectively, equal the expected LS provider and visit sample sizes in a sample of $n$ households, where $r_{LS} = \sum_{i=1}^{n}\sum_{j=1}^{R} M_{ij}$ is the LS provider sample size and $m_{LS} = c_{LS} r_{LS}$ is the LS visit sample size. Under those conditions, the ES provider sample size is

$$r_{ES} = E(r_{LS} \mid n) = n\mu \tag{17}$$

where $\mu = \sum_{i=1}^{N}\sum_{j=1}^{R} M_{ij}\pi_i$ is the expected number of visits per household, and the ES visit sample size is

$$m_{ES} = c\,E(r_{LS} \mid n) = cn\mu \tag{18}$$

where $c = c_{ES} = c_{LS}$.

Equating the ES and LS sample sizes in the manner just described approximates conducting the ES and LS provider surveys under roughly the same cost constraints, assuming that the ES and LS survey costs per provider and per visit, respectively, are equivalent and that none of the $N$ households has multiple visits with the same provider. If households have multiple visits with the same providers, the LS yields fewer distinct providers than the ES, and the expected provider survey costs would be less in the LS than in the ES.
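To illustrate (17) and (18) with made-up numbers, the sketch below computes $\mu = \sum_i \sum_j M_{ij}\pi_i$ and the matched ES sample sizes. It also computes the expected number of household-provider pairs per household draw, a quantity introduced here only for illustration: it falls below $\mu$ when some household visits the same provider more than once, in line with the remark that the LS then yields fewer distinct providers.

```python
# Hypothetical household-by-provider visit counts and selection probabilities.
M = [[2, 0], [1, 1], [0, 1], [0, 0]]   # M_ij; household 0 sees provider 0 twice
pi = [0.3, 0.3, 0.2, 0.2]              # pi_i
n, c = 100, 2                          # illustrative n and c = c_ES = c_LS

mu = sum(pi[i] * sum(row) for i, row in enumerate(M))   # mu in (17)
r_es = n * mu                                           # (17)
m_es = c * n * mu                                       # (18)

# Illustrative only: expected number of distinct (household, provider) pairs per
# household draw. It is below mu whenever some M_ij > 1.
pairs = sum(pi[i] * sum(1 for m in row if m > 0) for i, row in enumerate(M))
print(mu, r_es, m_es, pairs)
```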

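Substituting $r_{ES} = n\mu$ and $c_{ES} = c$ in (14), the unbiased two-stage ES estimator of $X$ is

$$X'_{ES} = \frac{1}{n\mu} \sum_{j=1}^{n\mu} \frac{X'_j}{\theta_j} \tag{19}$$

where $X'_j = (M_j/c)\sum_{k=1}^{c} X_{jk}$. The single-stage ES estimator of $X$ is

$$X'_{ES1} = \frac{1}{n\mu} \sum_{j=1}^{n\mu} \frac{X_j}{\theta_j} \tag{20}$$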
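The unbiasedness of the two-stage ES estimator can be checked by exact enumeration. Assuming, as the form of (14) suggests, that first-stage providers are drawn independently with replacement, one draw suffices. This Python sketch assumes an eps design ($\theta_j = 1/R$), $c_{ES} = 2$, and a hypothetical population; `x_prime` implements (15).

```python
from itertools import combinations

# Hypothetical toy population and design.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk
theta = {0: 0.5, 1: 0.5}                        # eps design, theta_j = 1/R
c = 2                                           # c_ES, must not exceed any M_j
X = sum(sum(v) for v in visits.values())

def x_prime(j, sample):
    """(15): M_j times the mean of the sampled visit values of provider j."""
    return len(visits[j]) * sum(sample) / len(sample)

# Exact expectation of one first-stage draw in (14): provider j with probability
# theta_j, then each size-c subset of its visits with equal probability.
expectation = 0.0
for j, p in theta.items():
    subsets = list(combinations(visits[j], c))
    for s in subsets:
        expectation += (p / len(subsets)) * x_prime(j, s) / p
print(expectation, X)
```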


4.3. Single-stage variances

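The variance of the single-stage ES estimator of $X$ based on a sample of $r_{ES} = E(r_{LS} \mid n) = n\mu$ providers is

$$\operatorname{Var}(X'_{ES1}) = \frac{1}{n\mu}\sigma^2_{ES1} \tag{21}$$

where

$$\sigma^2_{ES1} = \sum_{j=1}^{R} \theta_j \left(\frac{X_j}{\theta_j} - X\right)^2 \tag{22}$$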

is the between-provider component of variance. The ES1 and LS1 estimators in (16) and (4), and the ES1 and LS1 variances in (21) and (9), of equivalent expected provider sample sizes are equivalent when the LS1 satisfies the following conditions:

Condition I: The $M$ provider visits are uniformly distributed over the $N$ households such that every household has a single provider visit. This LS condition implies that

$$M = N^* = N \tag{23}$$

Condition II: Every household $i$ ($i = 1, 2, 3, \ldots, N$) that visits provider $j$ has the same LS selection probability, namely

$$\pi_i(j) = \theta_j / M_j \quad (j \in A_i,\ i = 1, \ldots, N) \tag{24}$$

For example, if $M = N^* = N$, the ES1 with a pps design, $\theta_j = M_j/M$ ($j = 1, 2, 3, \ldots, R$), is equivalent to the LS1 with an equal probability sample (eps) design, $\pi_i = 1/N$ ($i = 1, 2, 3, \ldots, N$), and the ES1 with an eps design, $\theta_j = 1/R$ ($j = 1, 2, 3, \ldots, R$), is equivalent to the LS1 with a sample design in which $\pi_i(j) = 1/(R M_j)$ ($j \in A_i$, $i = 1, \ldots, N$).
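The equivalence claimed under Conditions I and II can be checked numerically. The hypothetical Python sketch below builds a population in which each of $N = 5$ households has exactly one visit, pairs the epsLS1 ($\pi_i = 1/N$) with the ppsES1 ($\theta_j = M_j/M$), verifies Condition II (24), and computes $\rho$ from (25).

```python
# Hypothetical population satisfying Condition I: N = 5 households, one visit each.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk
provider_of = [0, 0, 0, 1, 1]                   # provider visited by household i
N = len(provider_of)

M_j = {j: len(v) for j, v in visits.items()}
M = sum(M_j.values())
X_j = {j: sum(v) for j, v in visits.items()}
xbar = {j: X_j[j] / M_j[j] for j in visits}
X = sum(X_j.values())
assert M == N                                   # (23): M = N* = N

pi = [1 / N] * N                                # epsLS1
theta = {j: M_j[j] / M for j in visits}         # ppsES1

# Condition II (24): pi_i(j) = theta_j / M_j for every household.
assert all(abs(pi[i] - theta[j] / M_j[j]) < 1e-12 for i, j in enumerate(provider_of))

phi = [xbar[j] for j in provider_of]            # (5) with M_ij = 1
sigma2_ls1 = sum(p * (f / p - X) ** 2 for f, p in zip(phi, pi))              # (7)
sigma2_es1 = sum(theta[j] * (X_j[j] / theta[j] - X) ** 2 for j in visits)   # (22)
mu = sum(pi)                                    # (17): one visit per household

rho = mu * sigma2_ls1 / sigma2_es1              # (25)
print(rho)
```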

Let $\rho$ denote the ratio of the variances of the LS1 and ES1 estimators of $X$ of equivalent expected provider sample sizes:

$$\rho = \frac{\operatorname{Var}(X'_{LS1})}{\operatorname{Var}(X'_{ES1})} = \frac{\mu\,\sigma^2_{LS1}}{\sigma^2_{ES1}} \tag{25}$$

If Conditions I and II are both met, $\rho = 1$, implying that the LS1 and ES1 have equal precision. If Condition I or Condition II (or both) is violated, precision is better in the ES1 than in the LS1 when $\rho > 1$, and better in the LS1 than in the ES1 when $\rho < 1$.

As shown in (11) and proved in the Appendix, when expressed as a function of $P$, $\sigma^2_{LS1} = \sigma^2_{LS1}(P > 0) = (1/P)\left[\sigma^2_{LS1^*} + (1 - P)X^2\right]$, where $P = \sum_{i=1}^{N^*} \pi_i$ is the sum of the selection probabilities of the $N^*$ households with visits. Substituting $\sigma^2_{LS1}(P)$ for $\sigma^2_{LS1}$ in the numerator of (25),

$$\rho(P > 0) = \frac{\mu\,\sigma^2_{LS1}(P)}{\sigma^2_{ES1}} = \frac{\mu}{P}\left[\frac{\sigma^2_{LS1^*}}{\sigma^2_{ES1}} + \frac{1 - P}{V^2_{ES1}}\right] \tag{26}$$

where

$$\frac{\mu}{P} = \frac{\sum_{i=1}^{N^*}\sum_{j=1}^{R} M_{ij}\pi_i}{\sum_{i=1}^{N^*} \pi_i} \geq 1 \tag{27}$$

and $V^2_{ES1} = \sigma^2_{ES1}/X^2$ is the relvariance of $X'_{ES1}$.

The first bracketed term on the right side of (26) equals one if $M = N^*$ and Condition II is satisfied, and the second bracketed term equals zero if $N^* = N$ and increases as $P \to 0$. Decreasing $P$ therefore adversely affects the relative precision of the LS1. For example, if $N^* < N$ but $M = N^*$ and Condition II is satisfied, then $\mu = P$, $\sigma^2_{LS1^*} = \sigma^2_{ES1}$, and $\rho(P < 1) = 1 + (1 - P)/V^2_{ES1} > 1$. On the other hand, if $M > N^*$, neither survey is necessarily more efficient than the other, and the outcome depends on the LS1 and ES1 sample designs.
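The special case at the end of the preceding paragraph can be reproduced numerically. In the hypothetical population below, $N^* = M = 5$ households have one visit each and two households have none, so $P = 5/7$; sampling is eps and the ES1 is pps. Condition II is checked on the truncated probabilities $\pi^*_i = \pi_i/P$, which is our reading of how it applies when $N^* < N$.

```python
# Hypothetical population: N = 7 households; households 0-4 have one visit each
# (so M = N* = 5) and households 5-6 have none. eps sampling: pi_i = 1/N.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}     # X_jk
provider_of = [0, 0, 0, 1, 1, None, None]         # provider visited by household i
N = len(provider_of)
pi = [1 / N] * N

M_j = {j: len(v) for j, v in visits.items()}
M = sum(M_j.values())
X_j = {j: sum(v) for j, v in visits.items()}
xbar = {j: X_j[j] / M_j[j] for j in visits}
X = sum(X_j.values())

visits_i = [0 if j is None else 1 for j in provider_of]          # M_i per household
mu = sum(m * p for m, p in zip(visits_i, pi))                     # (17)
P = sum(p for m, p in zip(visits_i, pi) if m > 0)                 # (10)

phi = [0.0 if j is None else xbar[j] for j in provider_of]        # (5)
sigma2_ls1 = sum(p * (f / p - X) ** 2 for f, p in zip(phi, pi))   # (7)

theta = {j: M_j[j] / M for j in visits}                           # pps ES1
sigma2_es1 = sum(theta[j] * (X_j[j] / theta[j] - X) ** 2 for j in visits)   # (22)
V2 = sigma2_es1 / X ** 2                                          # V^2_ES1

# Condition II, read here as applying to the truncated probabilities pi_i / P.
assert all(abs(pi[i] / P - theta[j] / M_j[j]) < 1e-12
           for i, j in enumerate(provider_of) if j is not None)

rho = mu * sigma2_ls1 / sigma2_es1                                # (25)
print(rho, 1 + (1 - P) / V2)
```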

First, consider the situation in which $M \geq N^*$, the LS1 has the eps design ($\pi_i = 1/N$, $i = 1, 2, 3, \ldots, N$) and the ES1 has the pps design ($\theta_j = M_j/M$, $j = 1, \ldots, R$). When $M = N^*$, the epsLS1 and the ppsES1 are equivalent. Hence, when $M > N^*$, the ppsES1 is likely to be more precise than the epsLS1, because sampling households with equal probabilities is virtually always more precise when every household has a single visit than when the visits are not uniformly distributed over the $N^*$ households. This conclusion applies only if the ES sampling frame has good measures of provider size (the $M_j$'s).

Next, consider the situation in which $M \geq N^*$, the LS1 has the eps design and the ES1 also has the eps design ($\theta_j = 1/R$, $j = 1, 2, 3, \ldots, R$). When $N^* = M$, the epsES1 is equivalent to the LS1 with unequal probability selections, $\pi_i(j) = 1/(R M_j)$ ($i = 1, 2, 3, \ldots, M_j$, $j = 1, 2, 3, \ldots, R$). Hence, when $M = N^*$, the epsLS1 is likely to be more precise than the epsES1, because sampling households with equal probabilities is virtually always more precise than sampling households with unequal probabilities. However, when $M > N^*$, the situation becomes more complicated and the outcome less transparent: sampling households with equal probabilities when visits are not uniformly distributed over the $N^*$ households is not necessarily more precise than sampling households with unequal probabilities when each of the $N^*$ households has a single visit.

4.4. Two-stage ES and LS variances

The variance of the two-stage ES estimator of $X$ shown in (19) is

$$\operatorname{Var}(X'_{ES}) = \frac{\sigma^2_{ES1}}{n\mu} + \frac{1}{nc\mu} \sum_{j=1}^{R} \frac{M_j^2}{\theta_j}\left(\frac{M_j - c}{M_j}\right)\sigma^2_j \tag{28}$$

where the first and second terms on the right side of (28) are, respectively, the between and within components of provider variance. If Conditions I and II are satisfied, it is easily verified, by subtracting the second terms on the right side of (6) and (28), that the ES and LS second-stage variance components are equivalent. Because the ES and LS first-stage variance components are also equivalent under these conditions, it follows that $\operatorname{Var}(X'_{ES}) = \operatorname{Var}(X'_{LS})$ when Conditions I and II are satisfied.

When Condition I is not satisfied because households have multiple provider visits, the difference between the second-stage variance components of the epsLS ($\pi_i = 1/N$) and the ppsES ($\theta_j = M_j/M$) is

$$\frac{1}{nc\mu}\sum_{j=1}^{R}\frac{M_j^2}{M_j/M}\left(\frac{M_j - c}{M_j}\right)\sigma^2_j - \frac{1}{nc}\sum_{i=1}^{N}\frac{1}{1/N}\sum_{j=1}^{R}\frac{M_{ij}(M_j - cM_{ij})}{M_j}\sigma^2_j = \frac{N}{n}\sum_{j=1}^{R}\Delta_j\sigma^2_j \geq 0 \tag{29}$$

where $\Delta_j = (1/M_j)\sum_{i=1}^{N} M_{ij}(M_{ij} - 1)$ is the difference between the finite population correction factors of the second-stage variance components of provider $j$ ($j = 1, \ldots, R$). If none of the $N$ households has multiple visits with the same provider, $\Delta = \sum_{j=1}^{R}\Delta_j = 0$, and it follows that the second-stage variance components are equivalent. On the other hand, if households have multiple visits with the same providers, the second-stage variance component is smaller in the epsLS than in the ppsES, and the difference between them is an increasing function of $\Delta$.
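Identity (29) can be verified on a small hypothetical population in which household 0 visits provider 0 twice, so that $\Delta_0 > 0$. The Python sketch computes the ppsES second-stage term of (28), the epsLS second-stage term of (6), and the right side of (29); $n = 10$ and $c = 1$ are arbitrary illustrative choices.

```python
from statistics import variance

# Hypothetical population with a repeat visit: household 0 sees provider 0 twice.
visits = {0: [2.0, 4.0, 9.0], 1: [1.0, 5.0]}   # X_jk
M = [[2, 0], [1, 1], [0, 1], [0, 0]]            # M_ij
N, R = len(M), len(visits)
n, c = 10, 1                                    # illustrative n and c = c_ES = c_LS

M_j = [len(visits[j]) for j in range(R)]
M_tot = sum(M_j)
s2 = [variance(visits[j]) for j in range(R)]    # (8)
mu = M_tot / N                                  # (17) under eps: mu = M / N

# ppsES second-stage term of (28) with theta_j = M_j / M.
es = sum(M_j[j] ** 2 / (M_j[j] / M_tot) * (M_j[j] - c) / M_j[j] * s2[j]
         for j in range(R)) / (n * c * mu)
# epsLS second-stage term of (6) with pi_i = 1/N.
ls = sum(N * sum(M[i][j] * (M_j[j] - c * M[i][j]) / M_j[j] * s2[j] for j in range(R))
         for i in range(N)) / (n * c)
# Right side of (29).
delta = [sum(M[i][j] * (M[i][j] - 1) for i in range(N)) / M_j[j] for j in range(R)]
rhs = N / n * sum(delta[j] * s2[j] for j in range(R))
print(es - ls, rhs)
```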

5. LS AND PS COMPARISONS

5.1. The LS and PS survey designs

When the LS is viewed as a population survey, a major difference between it and the PS is the counting rule used to link provider visits to the households at which they are eligible to be counted. The PS uses the de jure counting rule that uniquely links every visit to the usual place of residence of the health-care user. The LS uses a multiplicity counting rule that links the same visit to multiple households, namely the places of residence of all health-care users of the same provider. The counting rule difference implies that visits are eligible to be counted at the same $N^*$ households ($i = 1, 2, 3, \ldots, N^*$) in both surveys, but the PS counts the $M_i = \sum_{j=1}^{R} M_{ij}$ visits of the residents of household $i$, whereas the LS counts the $M_j$ visits of every provider visited by household $i$.

5.2. The PS and LS estimators

The unbiased PS estimator of $X$ based on a sample of $n$ households drawn with replacement with selection probabilities $\pi_i$ ($i = 1, 2, 3, \ldots, N$) from a universe of $N$ households is

$$X'_{PS} = \frac{1}{n}\sum_{i=1}^{n}\frac{\Psi_i}{\pi_i} \tag{30}$$

where

$$\Psi_i = \sum_{j \in A_i}\sum_{k=1}^{M_{ij}} X_{ijk} \tag{31}$$

is the sum of the x-values of the $M_i$ visits of household $i$ ($i = 1, 2, 3, \ldots, n$).

The PS estimator in (30) and the LS1 and LS estimators in (4) and (1), respectively, are similar except for the value of the x-variable that is counted when household $i$ has visit $k$ with provider $j$: the PS estimator counts $X_{ijk}$, and the LS1 and LS estimators count the quantities $\bar{X}_j$ and $\bar{X}'_j(i)$, respectively. The PS and the LS1 and LS estimators are equivalent if $X_{ijk} = \bar{X}_j$ for every visit ($i = 1, 2, 3, \ldots, N$, $j = 1, 2, 3, \ldots, R$), that is to say, if $\sum_{j=1}^{R}\sigma^2_j = 0$.

5.3. The PS and single-stage LS variances

The sampling variance of the PS estimator of $X$ is

$$\operatorname{Var}(X'_{PS}) = \frac{1}{n}\sigma^2_{PS} \tag{32}$$

where

$$\sigma^2_{PS} = \sigma^2_{\Psi_i} = \sum_{i=1}^{N}\pi_i\left(\frac{\Psi_i}{\pi_i} - X\right)^2 \tag{33}$$

is the PS between-household population variance. When expressed as a function of $P = \sum_{i=1}^{N^*}\pi_i$, $\sigma^2_{PS}$ decomposes as follows:

$$\sigma^2_{PS}(P) = \frac{1}{P}\left\{\sigma^2_{PS^*} + X^2(1 - P)\right\} \tag{34}$$

The proof of (34) appears in the Appendix. The first bracketed term on the right side of (34),

$$\sigma^2_{PS^*} = \sigma^2_{\Psi^*_i} = \sum_{i=1}^{N^*}\pi^*_i\left(\frac{\Psi_i}{\pi^*_i} - X\right)^2 \tag{35}$$

is the truncated variance of the $N^*$ households with visits, where $\pi^*_i = \pi_i/P$. The second bracketed term, $X^2(1 - P)$, is the truncated portion of $\sigma^2_{PS}$ summed over the $N^0 = N - N^*$ households without visits.

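Subtracting (9) from (32) and making appropriate substitutions, the difference between the variances of the PS and the LS1 estimators of $X$ is

$$\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS1}) = \frac{1}{n}\left(\sigma^2_{PS} - \sigma^2_{LS1}\right) = \frac{1}{nP}\left(\sigma^2_{PS^*} - \sigma^2_{LS1^*}\right), \quad P > 0 \tag{36}$$

Sirken and Shimizu have shown, in a paper being prepared for publication, that

$$\sigma^2_{PS^*} - \sigma^2_{LS1^*} = \sigma^2_{\delta^*_i} + 2\sigma_{\delta^*_i\Phi^*_i} \tag{37}$$

where $\delta^*_i = \Psi^*_i - \Phi^*_i$ is the difference between the PS and LS1 values of the x-variate of the visits counted at household $i$ ($i = 1, 2, 3, \ldots, N^*$), $\sigma^2_{\delta^*_i} = E[(\delta^*_i)^2] = \sum_{i=1}^{N^*}(\delta^*_i)^2/\pi^*_i$, and $\sigma_{\delta^*_i\Phi^*_i} = E(\delta^*_i\Phi^*_i) = \sum_{i=1}^{N^*}\delta^*_i\Phi^*_i/\pi^*_i$. Substituting (37) in (36), the difference between the variances of the PS and LS1 estimators of $X$ becomes

$$\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS1}) = \frac{1}{nP}\left\{\sigma^2_{\delta^*_i} + 2\sigma_{\delta^*_i\Phi^*_i}\right\}, \quad P > 0 \tag{38}$$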
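Because (36) and (37) are algebraic identities, they can be checked on a toy population. This hypothetical Python sketch assigns each visit to a household, computes $\Psi_i$ from (31) and $\Phi_i$ from (5), and compares $\sigma^2_{PS^*} - \sigma^2_{LS1^*}$ with $\sigma^2_{\delta^*_i} + 2\sigma_{\delta^*_i\Phi^*_i}$. In this particular population the covariance term is negative, yet the PS variance still exceeds the LS1 variance.

```python
# Hypothetical population: (household i, provider j) -> that household's visit values X_ijk.
own = {(0, 0): [2.0, 9.0], (1, 0): [4.0], (1, 1): [1.0], (2, 1): [5.0]}
N, n = 4, 10                      # household 3 has no visits; n is illustrative
pi = [0.3, 0.3, 0.2, 0.2]

visits = {}                       # provider j -> all of its visit values
for (i, j), xs in own.items():
    visits.setdefault(j, []).extend(xs)
xbar = {j: sum(v) / len(v) for j, v in visits.items()}
X = sum(sum(v) for v in visits.values())

psi = [sum(sum(xs) for (h, _), xs in own.items() if h == i) for i in range(N)]            # (31)
phi = [sum(len(xs) * xbar[j] for (h, j), xs in own.items() if h == i) for i in range(N)]  # (5)

def between(values, probs):
    """Between-household variance in the form of (7), (12), (33) and (35)."""
    return sum(p * (v / p - X) ** 2 for v, p in zip(values, probs))

star = sorted({i for (i, _) in own})              # the N* households with visits
P = sum(pi[i] for i in star)                      # (10)
pi_s = [pi[i] / P for i in star]                  # (13)
psi_s, phi_s = [psi[i] for i in star], [phi[i] for i in star]
delta = [a - b for a, b in zip(psi_s, phi_s)]     # delta*_i = Psi*_i - Phi*_i

lhs = between(psi_s, pi_s) - between(phi_s, pi_s)                    # left side of (37)
rhs = (sum(d * d / p for d, p in zip(delta, pi_s))
       + 2 * sum(d * f / p for d, f, p in zip(delta, phi_s, pi_s)))  # right side of (37)
var_diff = (between(psi, pi) - between(phi, pi)) / n                 # (36), first form
print(lhs, rhs, var_diff, lhs / (n * P))
```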

Conditional on $P$, the difference between the PS and LS1 variances depends on the joint effects of two factors: (1) the within-provider (wp) component of variance, $\sigma^2_{wp} = M^2\sum_{j=1}^{R}\theta_j\sigma^2_j$, where $\theta_j$ is the selection probability of provider $j$ ($j = 1, 2, 3, \ldots, R$), and (2) the within-household (wh) clustering of visits. The parameter $\sigma^2_{\delta^*_i}$ reflects the effects of the wp variance, and the parameter $\sigma_{\delta^*_i\Phi^*_i}$ reflects the effects of wh clustering of visits.

Variances of the PS and LS1 estimators of $X$ of equivalent sample size are equivalent if $\sigma^2_{\delta^*_i} + 2\sigma_{\delta^*_i\Phi^*_i} = 0$. If $\sigma^2_{wp} = 0$, for example, $\sigma^2_{\delta^*_i} = \sigma_{\delta^*_i\Phi^*_i} = 0$, and it follows that the PS and LS1 variances are equivalent. Though $\sigma^2_{wp} > 0$ does not necessarily imply that the LS1 is more reliable than the PS, that outcome seems far more likely than the reverse, because $\sigma^2_{wp}$ is a component of $\sigma^2_{PS}$ but not of $\sigma^2_{LS1}$. If $\sigma_{\delta^*_i\Phi^*_i} < 0$, neither survey is necessarily more reliable than the other. On the other hand, if $\sigma_{\delta^*_i\Phi^*_i} \geq 0$, the LS1 is always at least as reliable as the PS, since $\sigma^2_{\delta^*_i} \geq 0$. For example, if none of the $N^*$ households has multiple visits and households are selected by srs with replacement, Sirken and Shimizu have shown that $\sigma_{\delta^*_i\Phi^*_i} = 0$ and $\sigma^2_{\delta^*_i} = \sigma^2_{wp}$, and the difference between the PS and LS1 variances in (38) reduces to

$$\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS1}) = \frac{1}{nP}\sigma^2_{\delta^*_i} = \frac{1}{nP}\sigma^2_{wp} > 0, \quad P > 0 \tag{39}$$

and the relative reduction in the PS variance due to the LS1 is

$$\frac{\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS1})}{\operatorname{Var}(X'_{PS})} = \frac{\sigma^2_{wp}}{P\sigma^2_{PS}} = \frac{\sigma^2_{wp}}{\sigma^2_{wp} + \sigma^2_{bp} + X^2(1 - P)} \tag{40}$$

where $\sigma^2_{bp} = \sum_{j=1}^{R}\theta_j\left[(X_j/\theta_j) - X\right]^2$ is the between-provider component of variance.
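The statement that the covariance term vanishes when no household has multiple visits and households are drawn by srs with replacement can be illustrated numerically. In the hypothetical population below, each of five households has one visit and two have none. The sketch checks that $\sigma_{\delta^*_i\Phi^*_i} = 0$ and that the variance difference equals $\sigma^2_{\delta^*_i}/(nP)$ as in (39), and it computes the relative reduction on the left side of (40). It does not evaluate $\sigma^2_{wp}$ itself.

```python
# Hypothetical population: srs with replacement (pi_i = 1/N) from N = 7 households;
# households 0-4 have exactly one visit each and households 5-6 have none.
own = {0: (0, 2.0), 1: (0, 4.0), 2: (0, 9.0), 3: (1, 1.0), 4: (1, 5.0)}  # i -> (j, X_ijk)
N, n = 7, 10
pi = [1 / N] * N

visits = {}
for j, x in own.values():
    visits.setdefault(j, []).append(x)
xbar = {j: sum(v) / len(v) for j, v in visits.items()}
X = sum(x for _, x in own.values())

star = sorted(own)                          # the N* households with visits
P = sum(pi[i] for i in star)                # (10)
pi_s = {i: pi[i] / P for i in star}         # (13)
psi = {i: own[i][1] for i in star}          # (31): the household's own visit value
phi = {i: xbar[own[i][0]] for i in star}    # (5): the mean of its provider's visits
delta = {i: psi[i] - phi[i] for i in star}  # delta*_i

s2_delta = sum(delta[i] ** 2 / pi_s[i] for i in star)
s_delta_phi = sum(delta[i] * phi[i] / pi_s[i] for i in star)

def var_est(values):
    """(9)/(32): (1/n) times the between-household variance over all N households."""
    return sum(p * (values.get(i, 0.0) / p - X) ** 2 for i, p in enumerate(pi)) / n

var_ps, var_ls1 = var_est(psi), var_est(phi)
relative = (var_ps - var_ls1) / var_ps       # left side of (40)
print(s_delta_phi, var_ps - var_ls1, relative)
```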

5.4. The PS and two-stage LS variances

Subtracting equation (6) from (32), the difference between the variances of the PS and two-stage LS estimators of $X$ is

$$\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS}) = \frac{1}{n}\left[\sigma^2_{PS} - \sigma^2_{LS1}\right] - \frac{1}{nc_{LS}}\sum_{i=1}^{N}\frac{1}{\pi_i}\sum_{j \in A_i} M_{ij}\left(1 - \frac{c_{LS}M_{ij}}{M_j}\right)\sigma^2_j \tag{41}$$

The first term on the right side of equation (41) is the difference between the variances of the PS and the LS1 estimates of $X$, and the second term is the second-stage variance component of the two-stage LS estimator of $X$.

The Var(X ′LS)<Var(X ′

PS) if �2LS1<�2PS and sufficiently large visit samples are selected in thesecond stage of the LS. For example, if none of the N∗ households have multiple visits and nhouseholds are selected by srs with replacement, the difference between the variances of the PSand the two-stage LS variance can be expressed as a function of the difference between the PSand the single-stage LS variances and cLS, the second stage LS visit sample size

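$$\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS}) > \left(1 - \frac{1}{c_{LS}}\right)\left[\operatorname{Var}(X'_{PS}) - \operatorname{Var}(X'_{LS1})\right] = \frac{c_{LS}-1}{c_{LS}}\,\frac{1}{nP}\,\sigma^2_{wp} \qquad (42)$$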

The two-stage LS scales down the margin by which the PS variance exceeds the single-stage LS variance by the factor $(c_{LS}-1)/c_{LS}$. Nevertheless, the PS variance exceeds the two-stage LS variance of equivalent household sample size $n$ for values of $c_{LS} \geq 1$, and the difference between the variances increases as $c_{LS}$ increases. It is particularly noteworthy that the PS variance exceeds the two-stage LS variance even when $c_{LS} = 1$, that is, when the LS and PS have equivalent visit sample sizes as well as equivalent household sample sizes.
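The behaviour of the bound in (42) as $c_{LS}$ grows can be tabulated with a few lines of Python. This is a minimal sketch with hypothetical inputs ($\sigma^2_{wp} = 1$, $n = 1$, $P = 1$) and a hypothetical function name; it evaluates only the right side of (42), not the exact two-stage variance.

```python
def two_stage_lower_bound(sigma2_wp: float, n: int, P: float, c_ls: float) -> float:
    """Right side of (42): ((c_LS - 1) / c_LS) * sigma^2_wp / (n P)."""
    if c_ls < 1:
        raise ValueError("c_LS is a visit sample size and should be at least 1")
    return (c_ls - 1.0) / c_ls * sigma2_wp / (n * P)


# Hypothetical inputs: sigma^2_wp = 1, n = 1, P = 1.
bounds = [two_stage_lower_bound(1.0, 1, 1.0, c) for c in (1, 2, 4, 10)]
```

The bound is zero at $c_{LS} = 1$ and rises toward the single-stage gap $\sigma^2_{wp}/(nP)$ of (39) as $c_{LS}$ increases.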

6. SUMMARY AND CONCLUDING REMARKS

The linked survey (LS) of health services utilization is a hybrid of the independently designed population sample survey (PS) and establishment (provider) sample survey (ES). In the LS, health-care visits are reported by providers who were visited and identified by respondents in a household PS. In the ES, health-care visits are reported by providers sampled from a complete provider


sampling frame, and in the PS, visits are reported by respondents in a household sample survey. This paper compares expressions of the sampling variances of (1) the two-stage LS and two-stage ES of equivalent expected provider and transaction sample sizes and (2) the two-stage LS and PS of equivalent household sample size. It also identifies the parameters contributing to the precision differences between the surveys and assesses the conditions that favour the LS when compared to the ES and the PS.

The sampling variances of the LS and ES estimators of the volume of health-care transactions are equivalent if: (1) the LS population survey has a simple random sampling with replacement (srs) design and transactions are uniformly distributed over households such that every household has a single visit and (2) the ES has a probability proportionate to size (pps) sample design. Otherwise, precision could favour either survey depending on whether the LS or the ES deviates least from these conditions. Because transactions rarely, if ever, are uniformly distributed over households and population surveys rarely have the srs design, it seems almost certain that the ES with a pps design based on good measures of provider size will be more precise than the LS. On the other hand, if the ES sampling frame's coverage is incomplete or its provider size measures are biased, either the LS or the ES could be more precise than the other.

The direction and magnitude of the difference between the sampling variances of the LS and PS estimates of the volume of health-care utilizations depend on the size of the within-provider component of variance relative to the difference between the LS and PS intraclass correlations. The LS and PS variances are equivalent if the within-provider component of variance is ignorable. The variance is less in the LS than the PS if the intraclass correlation is greater in the PS. If the intraclass correlation is less in the PS than the LS, either survey can be more or less reliable than the other depending on the relative size of the within-provider component of variance. It is noteworthy that the likelihood of the intraclass correlation being greater in the LS than the PS increases to the extent that households have multiple transactions with the same providers.

The precision of the LS relative to the ES and the PS is compared in this paper assuming equivalent expected provider and transaction sample sizes in the LS and ES and equivalent household sample size in the LS and PS. Comparing precision of surveys of the same sample size is not the same as comparing precision of surveys of the same expected survey costs. The LS would probably compare more favourably with the ES and PS on the basis of equivalent survey costs than on the basis of equivalent sample sizes.

(a) In LS and ES comparisons, the equivalent expected provider sample size will likely yield fewer distinct providers in the LS than the ES if households have multiple visits with the same provider. Because the expected provider survey cost is a function of the number of distinct providers, equivalent expected provider survey costs in the LS and ES would likely yield a larger provider sample of distinct providers in the LS than in the ES and thereby improve the precision of the LS relative to the ES.

(b) In the LS and PS comparisons, the equivalent household sample size is likely to be larger than the optimum household sample size in a two-stage LS in which first- and second-stage sample sizes are optimized for fixed survey costs. Optimizing the two-stage allocation of the LS sample in this manner would improve the precision of the two-stage LS relative to the PS, particularly when costs per transaction are minuscule compared to costs per household, as would be the case in estimating the volume of health services utilization of small population domains.

Values of the parameters contributing to the differences in precision between the LS and ES and between the LS and PS are likely to vary considerably from one health services utilization survey to


another and even between the sub-domains of the same survey. Research findings presented in this paper demonstrate that LS sampling errors are often competitive with ES and PS sampling errors and provide guidelines for determining the survey conditions when the LS deserves most serious consideration as a design alternative to the ES and the PS. Empirical data and survey experiments are needed to compare sampling and non-sampling errors and survey costs of the three surveys under a broad range of survey conditions.

APPENDIX

An estimator of $Y$ with varying probabilities is $Y' = \left(\sum_{i=1}^{n} Y_i/\pi_i\right)/n$, where $Y_i$ is the value of the characteristic of interest for sample unit $i$ ($i = 1, 2, 3, \ldots, N$); $\pi_i$ is the probability of selecting unit $i$ on a single draw; and $n$ and $N$ are the numbers of units in the sample and in the population, respectively. The variance of $Y'$ is $\sigma^2_{Y'} = \sigma^2_{Y_i}/n$, where

$$\sigma^2_{Y_i} = \sum_{i}^{N} \pi_i \left(\frac{Y_i}{\pi_i} - Y\right)^2 \qquad (A1)$$
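Because $Y'$ averages $n$ independent draws, $\sigma^2_{Y_i}$ in (A1) is the variance of a single draw and $\operatorname{Var}(Y') = \sigma^2_{Y_i}/n$. The following Python sketch checks this exactly on a hypothetical four-unit population by enumerating every ordered sequence of draws with rational arithmetic; the population values and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def single_draw_variance(y, pi):
    """sigma^2_{Y_i} in (A1): sum_i pi_i (Y_i / pi_i - Y)^2."""
    total = sum(y)
    return sum(p * (yi / p - total) ** 2 for yi, p in zip(y, pi))


def exact_variance(y, pi, n):
    """Exact Var(Y') by enumerating every ordered sequence of n independent draws."""
    total = sum(y)
    var = Fraction(0)
    for draws in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for k in draws:
            prob *= pi[k]
        estimate = sum(Fraction(y[k]) / pi[k] for k in draws) / n
        var += prob * (estimate - total) ** 2  # Y' is unbiased, so E[Y'] = Y
    return var


# Hypothetical toy population of N = 4 units; pi must be positive and sum to 1.
y = [0, 3, 5, 2]
pi = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
```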

The variance $\sigma^2_{Y_i}$ can be decomposed into two parts as

$$\sigma^2_{Y_i} = \frac{1}{P}\left[\sigma^2_{Y^*_i} + Y^2(1-P)\right] \qquad (A2)$$

where $P = \sum_{1}^{N^*} \pi_i$, $N^*$ is the number of households having transactions,

$$\sigma^2_{Y^*_i} = \sum_{i}^{N^*} \pi^*_i \left(\frac{Y_i}{\pi^*_i} - Y\right)^2 \qquad (A3)$$

is the truncated variance of the estimator $Y_i$ exclusive of the $N^0 = N - N^*$ households which have no transactions, and

$$\pi^*_i = \pi_i \Big/ \sum_{1}^{N^*} \pi_i = \pi_i/P \qquad (A4)$$

is the conditional probability of selecting household $i$ ($i = 1, 2, 3, \ldots, N$) given household $i$ had a transaction with providers.
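The decomposition (A2), with the definitions (A3) and (A4), can be checked on a small hypothetical population in which some households have no transactions. The Python sketch below uses exact fractions; the toy values, the `has_tx` flags marking the $N^*$ households with transactions, and the function names are assumptions for illustration.

```python
from fractions import Fraction


def weighted_variance(y, probs, total):
    """sum_i probs_i (Y_i / probs_i - total)^2, the form shared by (A1) and (A3)."""
    return sum(p * (yi / p - total) ** 2 for yi, p in zip(y, probs))


def check_decomposition(y, pi, has_tx):
    """Return both sides of (A2) for a population with some zero-transaction households."""
    assert all(yi == 0 for yi, t in zip(y, has_tx) if not t), "(A2) needs Y_i = 0 without transactions"
    total = sum(y)
    P = sum(p for p, t in zip(pi, has_tx) if t)
    y_star = [yi for yi, t in zip(y, has_tx) if t]
    pi_star = [p / P for p, t in zip(pi, has_tx) if t]  # conditional probabilities (A4)
    lhs = weighted_variance(y, pi, total)  # (A1)
    truncated = weighted_variance(y_star, pi_star, total)  # (A3)
    rhs = (truncated + total**2 * (1 - P)) / P  # (A2)
    return lhs, rhs


# Hypothetical toy population: the second and fourth households have no transactions.
y = [2, 0, 5, 0, 3]
pi = [Fraction(1, 10), Fraction(2, 10), Fraction(2, 10), Fraction(3, 10), Fraction(2, 10)]
has_tx = [True, False, True, False, True]
lhs, rhs = check_decomposition(y, pi, has_tx)
```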

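It has been shown elsewhere [4] that if $\pi_i = 1/N$, then (A2) translates to

$$\sigma^2_{\bar{Y}} = P\left[\sigma^2_{\bar{Y}^*} + \left(\frac{Y}{N^*}\right)^2 (1-P)\right]$$

where $\sigma^2_{\bar{Y}} = \sigma^2_{Y_i}/N^2$ and $\sigma^2_{\bar{Y}^*} = \sigma^2_{Y^*_i}/(N^*)^2$.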
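The equal-probability form can be checked the same way. In the hypothetical Python sketch below, $\pi_i = 1/N$ and hence $\pi^*_i = \pi_i/P = 1/N^*$; the toy data and the function name are illustrative only.

```python
from fractions import Fraction


def check_equal_probability(y, has_tx):
    """Both sides of the pi_i = 1/N special case of (A2)."""
    N = len(y)
    n_star = sum(has_tx)  # N*, the number of households with transactions
    total = sum(y)
    P = Fraction(n_star, N)
    # sigma^2_{Y_i} with pi_i = 1/N, and sigma^2_{Y*_i} with pi*_i = 1/N*
    sigma2_y = sum(Fraction(1, N) * (yi * N - total) ** 2 for yi in y)
    sigma2_y_star = sum(Fraction(1, n_star) * (yi * n_star - total) ** 2
                        for yi, t in zip(y, has_tx) if t)
    lhs = sigma2_y / N**2  # sigma^2_{Ybar}
    rhs = P * (sigma2_y_star / n_star**2 + (Fraction(total) / n_star) ** 2 * (1 - P))
    return lhs, rhs


lhs, rhs = check_equal_probability([2, 0, 5, 0, 3], [True, False, True, False, True])
```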

Proof

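Ordering the households so that the $N^*$ households with transactions come first,

$$\sigma^2_{Y_i} = \sum_{i}^{N} \pi_i\left(\frac{Y_i}{\pi_i} - Y\right)^2 = \sum_{i=1}^{N^*} \pi_i\left(\frac{Y_i}{\pi_i} - Y\right)^2 + \sum_{i=N^*+1}^{N} \pi_i\left(\frac{Y_i}{\pi_i} - Y\right)^2 = A + B \qquad (A5)$$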


Multiplying by $(P^2/P^2)$ and using the definition of $\pi^*_i$ from (A4), the first term on the right side of (A5) becomes

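$$A = \left(\frac{P}{P}\right)^2 \sum_{1}^{N^*} \pi_i\left(\frac{Y_i}{\pi_i} - Y\right)^2 = \frac{1}{P}\sum_{1}^{N^*} \frac{\pi_i}{P}\left(\frac{Y_i}{\pi_i/P} - YP\right)^2 = \frac{1}{P}\sum_{1}^{N^*} \pi^*_i\left(\frac{Y_i}{\pi^*_i} - YP\right)^2 \qquad (A6)$$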

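Adding and subtracting $Y$ within the squared term of (A6) gives $Y_i/\pi^*_i - YP = (Y_i/\pi^*_i - Y) + Y(1-P)$. The cross-product term vanishes because $\sum_{1}^{N^*}\pi^*_i = 1$ and $\sum_{1}^{N^*} Y_i = Y$ (the other $N^0$ households have $Y_i = 0$), so that $\sum_{1}^{N^*}\pi^*_i\,(Y_i/\pi^*_i - Y) = 0$. Using (A3), (A6) then becomes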

$$A = \frac{1}{P}\left[\sum_{i=1}^{N^*} \pi^*_i\left(\frac{Y_i}{\pi^*_i} - Y\right)^2 + Y^2(1-P)^2\right] = \frac{1}{P}\left[\sigma^2_{Y^*_i} + Y^2(1-P)^2\right] \qquad (A7)$$

Recalling that $Y_i = 0$ if household $i$ is among the $N^0 = N - N^*$ households which had no transactions, the second term on the right side of (A5) can be written as

$$B = \sum_{i=N^*+1}^{N} \pi_i\left(\frac{Y_i}{\pi_i} - Y\right)^2 = Y^2 \sum_{i=N^*+1}^{N} \pi_i = Y^2(1-P) \qquad (A8)$$

Substituting (A7) and (A8) into (A5) and simplifying

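$$\begin{aligned}
\sigma^2_{Y_i} &= \frac{1}{P}\left[\sigma^2_{Y^*_i} + Y^2(1-P)^2\right] + Y^2(1-P) \\
&= \frac{1}{P}\left[\sigma^2_{Y^*_i} + Y^2(1-P)^2 + Y^2 P(1-P)\right] \\
&= \frac{1}{P}\left[\sigma^2_{Y^*_i} + Y^2(1-P)\right]
\end{aligned} \qquad (A9)$$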

Q. E. D.
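Each step of the proof can also be confirmed numerically. The Python sketch below evaluates $A$ and $B$ directly from the split in (A5) on a hypothetical toy population, compares them with the closed forms (A7) and (A8), and confirms that the cross-product term dropped between (A6) and (A7) is exactly zero. Flags rather than an ordering identify the $N^*$ households with transactions; the names and values are assumptions for illustration.

```python
from fractions import Fraction


def proof_terms(y, pi, has_tx):
    """Evaluate A and B of (A5) directly and via the closed forms (A7) and (A8)."""
    total = sum(y)
    P = sum(p for p, t in zip(pi, has_tx) if t)
    rows = list(zip(y, pi, has_tx))
    # Direct evaluation of the two sums in (A5)
    A = sum(p * (yi / p - total) ** 2 for yi, p, t in rows if t)
    B = sum(p * (yi / p - total) ** 2 for yi, p, t in rows if not t)
    # Truncated variance (A3) using pi*_i = pi_i / P from (A4)
    s2_star = sum((p / P) * (yi / (p / P) - total) ** 2 for yi, p, t in rows if t)
    # Cross-product sum dropped between (A6) and (A7); it should be exactly zero
    cross = sum((p / P) * (yi / (p / P) - total) for yi, p, t in rows if t)
    return {
        "A": A, "A7": (s2_star + total**2 * (1 - P) ** 2) / P,
        "B": B, "A8": total**2 * (1 - P),
        "cross": cross, "A9": (s2_star + total**2 * (1 - P)) / P,
    }


terms = proof_terms(
    [2, 0, 5, 0, 3],
    [Fraction(1, 10), Fraction(2, 10), Fraction(2, 10), Fraction(3, 10), Fraction(2, 10)],
    [True, False, True, False, True],
)
```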

REFERENCES

1. Wunderlich GS (ed.). Toward a National Health Care Survey: A Data System for the 21st Century. National Research Council and Institute of Medicine. National Academy Press: Washington, DC, 1992.

2. Sirken M, Shimizu I, Judkins D. The population based establishment survey. Proceedings of the Section on Survey Research Methods. American Statistical Association, 1995; 470–473.

3. Sirken MG. Design effects of sampling frames in establishment survey. Survey Methodology 2002; 28:183–190.

4. Sirken M, Shimizu I. Design effects of linked population/establishment surveys. 2004 Proceedings of the American Statistical Association, Survey Methods Research Section (CD-ROM). American Statistical Association: Alexandria, VA, 2004.
