The Impact of Delays on Service Times in the Intensive Care Unit
Carri W. Chan
Decision, Risk, and Operations, Columbia Business School, New York, NY 10027, [email protected]
Vivek F. Farias
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA [email protected]
Gabriel Escobar
Division of Research, Kaiser Permanente, Oakland, CA [email protected]
This version: July 23, 2013
Mainstream queueing models are frequently employed in modeling healthcare delivery in a number of settings, and
further used in making operational decisions for the same. The vast majority of these queueing models assume that the
service requirements of a job are independent of the state of the queue upon its arrival. In a healthcare setting, this
assumption is equivalent to ignoring the effects of delay experienced by a patient awaiting care. However, it is only natural
to conjecture that long delays may have adverse effects on patient outcomes and can potentially lead to longer lengths
of stay (LOS) when the patient ultimately does receive care. At a very coarse level, prior research confirms these natural
conjectures. This work sets out to understand these delay issues from an operational perspective. In particular, using data
of nearly 6,000 Emergency Department (ED) visits, we use an instrumental variable approach to empirically measure
how congestion in the Intensive Care Unit (ICU) can lead to delays in boarding from the ED to the ICU and measure the
impact on the patient’s ICU LOS.
Capturing these empirically observed effects in a queueing model is challenging as the effect introduces potentially
long range correlations in service and inter-arrival times. As such, we consider the problem of how to incorporate these
measured delay effects into a queueing model and characterize approximations to various quantities of interest when the
service time of a job is adversely impacted by the delay experienced by that job. Our findings suggest that this delay
effect can be substantial and ignoring it when using queueing models to model healthcare delivery systems may result in
significant under-provisioning.
Key words: Delay effects, queueing, Healthcare
1. Introduction
Delays arise routinely in various healthcare settings: they are a consequence of the inherent, highly variable
requirements of healthcare services and the overwhelming demand for these services. It is natural to conjec-
ture that delays in receiving the appropriate care can result in a variety of adverse outcomes – and indeed,
there is some support for such conjectures. This paper proposes to study one such adverse outcome in the
intensive care setting: delays in receiving intensive care can result in longer lengths of stay in Intensive
Care Units (ICUs). From an operational perspective, this effect has two consequences. The first, of course,
is the immediate impact on the delayed patient. The second, systemic impact is the increased congestion
caused by the increased care requirements for the delayed patient. In particular, the increased ICU length
of stay can result in delays to other patients requiring the same ICU resources, which in turn results in
longer lengths of stay for those patients, and so forth. This paper will (empirically) study the extent of this
phenomenon. We then propose to modify extant queueing models (that are frequently used to model such
systems) to account for the phenomenon and present a theoretical analysis for the same.
Delays and the ED-ICU Interface: One place where delays are apparent is in the Emergency Depart-
ment (ED). Due to growing demands and reductions in the number of physicians, nurses and beds, EDs are
often overcrowded (Burt and Schappert 2004). The overall median wait to see an ED physician increased
from 22 minutes in 1997 to 30 minutes by 2004 (Wilper et al. 2008). A number of factors can contribute to
delays. For instance, delays may be due to an overload of patients in the ED and insufficient resources to
treat patients in a timely manner. Delays can also occur following assessment and stabilization of patients.
About 10% of patients initially admitted to the hospital through the ED are treated in an ICU; here patients
can experience further ‘boarding’ delays due to a multitude of factors ranging from congestion in the ICU
which forces patients to wait in the ED until an ICU bed becomes available (see Litvak et al. (2001)) to
coordination mishaps.
Turning attention to the ICU, we note that ICUs typically provide the highest level of care with one nurse
for every one to two patients. These units are very expensive to operate and typically account for 20% of hospital
operating costs despite comprising only 10% of the beds (Rivera et al. 2009). Consequently, these units are
often operated at or ‘above’ capacity. Hospitals have developed a number of approaches to deal with ICU
congestion. For instance, ICU congestion can result in discharging current patients preemptively (Chalfin
2005, Dobson et al. 2010, Kc and Terwiesch 2012, Chan et al. 2012), blocking new patients via ambulance
diversion (Allon et al. 2013) or rerouting patients to different units (Thompson et al. 2009, Kim et al. 2012).
In this work, we focus on a frequent symptom of this congestion: admission delays. With an increase in
critical care usage (Halpern and Pastores 2010) and a relatively stagnant supply of ICU beds, it is no wonder
that delays for patients awaiting ICU admission are growing.
This paper will focus on the flow of patients from the ED into the ICU. In particular, we will examine
the ‘boarding’ delay experienced by these patients and the impact of this delay on the length of the patient’s
stay in the ICU.
Standard Queueing Models Fall Short: Queueing models are often used to model and analyze patient
flows in hospital settings. These models are predictive and can provide valuable insight into the impact of
changing demand scenarios as well as staffing, or more generally capacity provisioning alternatives. See
Green (2006) for an overview of how queueing models have been used in healthcare applications. The
vast majority of these queueing models assume that the service requirement of a job is independent of the
state of the queue upon its arrival. In a healthcare setting, this assumption is equivalent to ignoring the
effects of delay experienced by a patient awaiting care. As we show at a granular level in this paper, this
is not a tenable assumption. In addition, there have been various condition specific studies in the medical
community demonstrating that delays can result in an increase in mortality (de Luca et al. 2004, Chan et al.
2008, Buist et al. 2002, Yankovic et al. 2010) and/or extend patient Length-of-stay (LOS) (Chalfin et al.
2007, Renaud et al. 2009, Rivers et al. 2001).
As we shall see, even in the simplest settings, a natural queueing model that captures the impact of
delays on service time can be modeled as a high dimensional Markov chain which does not appear easy
to analyze. This is not surprising, since capturing the delay effect creates long-run correlations between
service times and inter-arrival times and very little can be said about such systems. While such models may
still be beneficial in simulation, the queueing phenomena made transparent by simple G/G/n type models
are obscured. As such, an important component of this paper is a simple set of closed-form approximations
to key performance metrics for such systems.
Questions and Contributions: In this work, we focus on the effect of delay on patient length of stay in
the ICU and characterize the potential congestion caused by any increase in ICU length of stay due to this
effect. In particular we consider the following questions:
1. What is the relationship between an additional hour of waiting for critical care and additional LOS
for a patient when she does eventually receive this care? We answer this question using empirical data
on patient flows from a large hospital network. Our study focuses on how delays in boarding from the
Emergency Department (ED) to the ICU impact patient LOS in the ICU. Our empirical study is granular
and characterizes the magnitude of this effect for a variety of patient primary conditions. We find strong
evidence for the conjecture that increased ED boarding times are associated with longer ICU lengths of stay.
Loosely, for some primary conditions (such as catastrophic patients), a single additional hour of boarding
delay (relative to mean delay) is associated with approximately four additional hours in the ICU (relative to
the mean LOS for that class of patients).
2. Can we incorporate such delay effects in the queueing models we use for capacity planning? The
natural analogue of an M/M/s queueing model unfortunately calls for the analysis of a high dimensional
Markov chain which is analytically intractable and obscures queueing phenomena. We present a rigorous,
analytically tractable approximation to such models that, in addition to being quite accurate, provides a
simple, transparent view of the impact of congestion on performance metrics of interest in the presence of
the delay effect. This, in turn, allows for the same flexibility of an M/M/s model while accounting for
the delay effect. We view the simplicity of these approximations as surprising since queueing systems with
long-range correlations in service and inter-arrival times are known to be notoriously difficult to analyze.
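To make the delay effect in question 2 concrete, the following is a minimal simulation sketch. It is not the model analyzed in this paper. It assumes a first-come-first-served multi-server queue with Poisson arrivals and exponential base service times, and it assumes, purely for illustration, that every hour of waiting adds `delta` hours to a patient's service time. The function name `simulate_mean_wait`, the linear form of the delay effect, and all parameter values (10 beds, a 56-hour mean stay, a 70% nominal load) are hypothetical choices.

```python
import heapq
import random


def simulate_mean_wait(num_beds, arrival_rate, mean_service, delta,
                       num_patients=20000, seed=1):
    """Mean wait (hours) in a FCFS multi-server queue with a linear delay effect.

    Hypothetical sketch: a patient's service time is an exponential base
    requirement plus `delta` hours for every hour that patient waited.
    """
    rng = random.Random(seed)
    bed_free_at = [0.0] * num_beds  # min-heap of times at which beds free up
    heapq.heapify(bed_free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_patients):
        clock += rng.expovariate(arrival_rate)           # next arrival time
        base = rng.expovariate(1.0 / mean_service)       # delay-free service time
        start = max(clock, heapq.heappop(bed_free_at))   # earliest available bed
        wait = start - clock
        heapq.heappush(bed_free_at, start + base + delta * wait)
        total_wait += wait
    return total_wait / num_patients


# Illustrative (assumed) parameters: 10 beds, 56-hour mean stay, 70% nominal load.
BEDS, MEAN_LOS = 10, 56.0
RATE = 0.70 * BEDS / MEAN_LOS  # arrivals per hour
wait_without_effect = simulate_mean_wait(BEDS, RATE, MEAN_LOS, delta=0.0)
wait_with_effect = simulate_mean_wait(BEDS, RATE, MEAN_LOS, delta=1.0)
print(f"mean wait, no delay effect:   {wait_without_effect:.2f} h")
print(f"mean wait, with delay effect: {wait_with_effect:.2f} h")
```

Both runs share one seed, so they see the same arrival times and base service requirements. Under that coupling the run with the delay effect can never produce a shorter wait for any patient, which is the feedback loop described above in miniature.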
While physicians recognize that delays are detrimental for an individual patient, our analysis provides
insight into the impact such delays may have on increasing overall congestion and reducing access to care
for other critical patients. Perhaps the most important operational insight that arises from our work is the
extent of the impact that interventions that decrease boarding delays can have on key system measures. In particular,
our analysis reveals that such interventions can prove just as important as capacity augmentation! Via our
empirical and theoretical analysis, we demonstrate that ignoring the delay effect when using queueing mod-
els to analyze healthcare operations can result in severe under-provisioning. Moreover, ignoring such delay
effects and the subsequent increase in congestion may result in hospitals utilizing other congestion control
measures, such as ambulance diversion, more frequently.
The rest of this paper proceeds as follows. We first review some related literature in Section 1.1. Section 2
provides empirical motivation for our delay-sensitive queueing model. Section 3 presents a simple queueing
model which incorporates state-dependent service times. We examine this model in a Markovian framework
in Section 3.2. In Section 4, we develop approximations for the system backlog and demonstrate that
the impact of delays can be substantial. In Section 5, we examine the performance of these approximations.
Section 6 concludes.
1.1. Related Literature
The medical community has invested significant effort into measuring the detrimental impact of delays on
patient outcomes. The majority of this work has focused on a binary notion of delay: was a patient delayed
or not? For instance, a transfer from the Emergency Department (ED) to the Intensive Care Unit (ICU) was
labeled as ‘delayed’ if it was greater than 6 hours (Chalfin et al. 2007); however, there was no distinguishing
between 6 and 20 hours of delay. They find that the median hospital length of stay (inclusive of ICU and
general medical ward stay) is 1 full day longer and the in-hospital mortality rate was 35% higher for patients
who were boarded more than 6 hours. The definition of delay varies across different medical conditions
and scenarios. Renaud et al. (2009) compares the outcomes of pneumonia patients who are transferred to
the ICU within 1 day (non-delayed) versus 3 days (delayed) of presenting symptoms. They find that the
median hospital LOS and 28-day mortality rate are nearly twice as high for delayed patients. The order of
magnitude for delay can be in minutes as in the case of cardiac patients (de Luca et al. 2004, Buist et al.
2002, Yankovic et al. 2010, Chan et al. 2008) or up to 5 days for burn-injured patients (Sheridan et al. 1999).
All of these works focus on a single patient condition in a single hospital and may lead one to conjecture
that the delay effect is isolated to a narrow section of the patient population that visits the ICU. We verify
instead that the delay-effect is prevalent across multiple hospitals and ailments.
In this work, we focus on how operational factors contribute to delay. Specifically, we empirically examine
the impact of ICU occupancy levels on ED boarding, where boarding time is defined as the time a patient
spends waiting in the ED for an inpatient bed assignment after a bed has already been requested. Addition-
ally, we consider how this delay impacts ICU LOS (as opposed to hospital LOS as the prior medical works
have considered). We are interested in examining the adverse feedback where congestion induces delays
which further increase congestion.
Shi et al. (2012) also consider ED boarding, but focus on the impact of hospital discharge policies on
patient boarding. Similar to our work, they consider empirical analysis to motivate stochastic models. Using
simulation models, they approximate inpatient operations in a hospital in Singapore. In our work, we aim to
provide analytic approximations to the impact of ED boarding on system dynamics such as average number
of patient hours in the system.
Most related to our empirical analysis are the works of Kc and Terwiesch (2009, 2012) and Anderson et al.
(2011). The authors consider how high load impacts ICU LOS following surgery. These works find that
high occupancy levels can result in shorter patient length-of-stay (LOS) due to a need to accommodate
new, more critical patients. Moreover, such reductions in LOS can increase risks for readmission and death.
In contrast, our work considers the admission, instead of discharge, process which is altogether a fundamentally
different medical decision. In particular, we examine how the occupancy level in the unit to which
a patient should be admitted can increase LOS in the current and subsequent unit. Kim et al. (2012) also
considers the impact of the occupancy levels of downstream hospital units; however, the focus is on how
high occupancy levels can affect patient routing and subsequently, patient outcomes. In the present work,
we focus on the ICU and how congestion impacts delays rather than the routing to a potentially less desirable
recovery unit.
Motivated by our empirical findings, we consider how to incorporate the measured delay effect
into our queueing models via state-dependent dynamics. There have been a number of works which
have considered state-dependent queueing systems. Powell and Schultz (2004), Ata and Shnerson (2006),
George and Harrison (2001) all consider queueing systems where service times can be increased or
decreased depending on congestion. In general, they find that service rates should increase with congestion.
Ata and Shnerson (2006) analyze an M/M/1 queue where service times can be reduced during congestion.
They consider a control problem of how to vary arrival rates, service rates, and prices depending on system
congestion. They find that the arrival rate should be decreased while the service rate should be increased
as the number of customers in the system grows. In contrast, we study a system where the service rate is
not controlled but a function of the system’s history and tackle the long range correlations that these effects
result in.
Anand et al. (2010) examines the quality-speed tradeoff in an M/M/1 queue where service times can be
reduced at the expense of service quality while reducing delay costs. They find that the equilibrium behavior
of a queueing system with service rates which vary with congestion is starkly different than in traditional
queueing models. We also compare the impact of congestion-dependent service times to traditional queueing
models. Our setting differs in two main respects: 1) service times increase with congestion, and we cannot
choose whether to increase or decrease them, and 2) we focus on the steady-state distribution of the queueing
system rather than the equilibrium control decisions.
Whitt (2003) considers how congestion increases with demand in an M/M/n system. In particular,
the arrival rate increases with congestion, whereas our service rate decreases. The arrival rate increases
with the number of servers $n$, but is strictly decreasing in a congestion measure which depends on the
number of servers. Depending on the congestion measure, different heavy-traffic regimes appear, which
can be used to estimate delay probabilities. While we also approximate the steady-state dynamics of a
congestion-dependent queueing system, we use a different approximation approach and focus on the impact
of congestion on service times, not arrival rates.
A number of approaches utilize limiting regimes to establish approximations for steady-state distributions
of state-dependent systems. For instance, Armony and Maglaras (2004) consider a system where customers
can select their service type, resulting in state-dependent arrival rates. Using approximations achievable
via analysis in the Halfin-Whitt regime, they establish estimates of the steady-state distributions of waiting
times. Mandelbaum and Pats (1998), Mandelbaum et al. (1998) use fluid approximations to approximate
state-dependent queueing networks. We also generate approximations of the steady-state distributions; how-
ever, we use a different approach by providing exact analysis for an upper bounding queueing system.
Perhaps the closest to our work is that of Whitt (1990) and Boxma and Vlasiou (2007) which examine a
G/G/1 queue with service times and interarrival times which depend linearly on delays. Under very special
conditions–e.g. the workload must decay over time, or interarrival times must increase as service rates
decrease–stability conditions and approximations to the waiting times can be derived. While both of these
works consider workload that may increase with delay, the dynamics of our system are very different. In
particular, we do not allow for the changes in interarrival times required for the results in Whitt (1990) and
Boxma and Vlasiou (2007). Consequently, the workload in our system will never decay as it must in the
aforementioned works.
While there has been important work focusing on state-dependent queueing systems, they are unable to
fully capture the healthcare specific dynamics which are estimated from real hospital data and presented in
this paper. Our goal is to develop a framework which accounts for the type of delay effect which can appear
in a healthcare setting. In doing so, we hope to expand the way queueing models can be used in such a
setting. Queueing theory has been a useful tool to estimate performance measures, such as waiting times,
and to provide support in operational decision making, such as determining staffing levels. For instance,
Yankovic and Green (2011) consider a variable finite-source queuing model to determine the impact of
nurse staffing on overcrowding in the Emergency Department. In a related vein, de Vericourt and Jennings
(2011) consider an M/M/s//n queue to estimate the impact of nurse-to-patient ratio constraints on patient
delay. Green et al. (2006) modified the traditional M/M/s queueing model to develop time-varying staffing
levels for the Emergency Department. To the best of our knowledge, despite the ever-present delay effect in
healthcare applications, no other works have explicitly taken it into account.
2. Empirical Motivation: Model and Analysis
In this section, we empirically examine delays for patients being transferred from the ED to the ICU. Since
delays can arise for a number of reasons, we intend to focus on congestion related delays; i.e. delays
in the ED due to the unavailability of a bed in the ICU. We find that delayed transfers from the ED to the
ICU due to high ICU occupancy levels are associated with significant increases in ICU LOS. These findings
have significant implications for capacity planning and resource allocation in the ICU.
We will posit and estimate a reduced form model that relates patient physiological factors and ED board-
ing time (i.e. the delay between when a bed in the ICU is requested for that patient and the time the patient
is actually physically transferred from the ED to the ICU). The model permits the impact of boarding time
to be different across different patient categories.
2.1. Data
We analyze a large patient data set collected from 19 facilities within a single hospital network for a total
of 212,064 patient visits over the course of 1 year. This data includes patient level characteristics such as
age, sex, primary condition for admission (e.g. congestive heart failure or pneumonia), and four separate
severity scores based on lab tests and comorbidities. It also includes operational data which tracks each
patient through each unit, marking time and dates of admission and discharge. Hospital units were classified
into six broad categories including Emergency Department (ED), General Medical Ward, Transitional Care
Unit (TCU), Intensive Care Unit (ICU), Operation Room (OR), and Post Anesthesia Recovery Unit. As
this was an inpatient dataset, the captured time in the ED is the time difference between the order to admit
to an inpatient unit and when the patient actually left the emergency department. Hence, this captures the
ED boarding time and is measured as the time from when the admit order was placed until the patient is
physically admitted to an inpatient unit. Note that this does not include the time for triage, stabilization, and
assessment, all of which will typically be activities that occur prior to the request for an ICU bed.
Severity scores in the data were determined at the time of hospital admission and capture the severity of
the patients at the time the request for an ICU bed was made. In order to use these scores for risk adjustment,
we excluded all patients who were admitted to the ICU more than 48 hours after hospital admission since it
is unlikely the scores will accurately measure the severity of patients after that. These scores are used for the
over 3 million patients in this hospital network and have predictive power similar to that of the APACHE and SAPS
scores, with a c statistic in the 0.88 range (Zimmerman et al. 2006, Moreno et al. 2005). See Escobar et al.
(2008) for further description of these severity scores.
To understand the impact of delay on different patient types, we classify patients based on over 16,000
ICD9 admission diagnosis codes into 10 broad groups of ailments based on the types of specialists who
treat them: Cancer, Catastrophic, Cardiac, Fluid&Hematologic, Infectious, Metabolic, Renal, Respiratory,
Skeletal, and Vascular (Escobar et al. 2008). While there are some patients who do not fall into one of these
categories, we focus on these main groupings which the majority of patients fall under. The dataset we
analyzed consisted of over 102,800 ED patients, 7,700 of whom were transferred to the ICU within
48 hours of hospital admission.
We consider patients whose admission was classified as ‘ED, medical’, i.e. their admission was via the
ED and their ailment was not considered surgical. 900 patients were removed from the sample because they
died. This is common practice in the medical community because various factors, such as Do-not-resuscitate
orders, can skew LOS estimates for patients who die (Norton et al. 2007, Rapoport et al. 1996). We note
that we verified the robustness of our empirical analysis by also including patients who died and find our
results are quite similar. When determining occupancy levels, all patients are included.
The final dataset consisted of 5,996 ED patients who survived to hospital discharge and were transferred
to the ICU within 48 hours of the admission decision in the ED. The average ICU LOS for these patient
classes was 56 hours with a maximum ICU LOS of nearly 37 days. The average ED boarding time was
3.5 hours. The average age of the patients was 64. Table 1 summarizes the statistics for the different patient
categories. The average occupancy of the ICU was 70%.
Condition Category   Number of   ED Boarding Time (hours)   ICU LOS (hours)   Age (years)
                     Patients    Mean ± Std.                Mean ± Std.       Mean ± Std.
Cancer               27          4.28 ± 4.99                52.50 ± 36.96     64.89 ± 11.01
Catastrophic         685         2.77 ± 3.91                87.15 ± 83.70     62.20 ± 18.37
Cardiac              2203        3.57 ± 4.21                37.75 ± 36.59     66.07 ± 14.31
Fluid&Hematologic    164         4.30 ± 5.24                45.78 ± 47.79     64.70 ± 16.10
Infectious           1012        3.85 ± 4.71                74.85 ± 84.08     65.73 ± 16.86
Metabolic            650         2.87 ± 3.30                51.70 ± 57.21     48.64 ± 19.92
Renal                123         3.49 ± 4.62                64.04 ± 63.75     60.67 ± 16.44
Respiratory          741         3.32 ± 4.05                65.50 ± 75.96     66.30 ± 15.62
Skeletal             98          4.83 ± 5.78                52.70 ± 55.09     66.00 ± 18.70
Vascular             293         3.27 ± 3.92                53.01 ± 42.01     69.72 ± 13.69
Table 1 Summary Statistics for 10 patient categories
2.2. Hypotheses
We wish to understand how ICU occupancy levels can impact delays to ICU admission and, in turn, how
this delay impacts patient ICU LOS. We consider the following hypotheses which are primarily motivated
by evidence in the medical literature as well as the medical expertise of one of the coauthors:
1. When the ICU is busy, patient admissions may be delayed. This results in an increase in ED Boarding
time for patients who are to be admitted to the ICU.
$$\frac{\partial\, \mathrm{ED\_BOARD}}{\partial\, \mathrm{ICU\_OCC}} > 0$$
2. The ‘delay effect’: ICU Admission delays can hurt patients’ outcomes. ICU LOS is increasing in ED
Boarding time.
$$\frac{\partial\, \mathrm{ICU\_LOS}}{\partial\, \mathrm{ED\_BOARD}} > 0$$
While both hypotheses are natural to conjecture, the significance these phenomena can play in capacity
management (as we will see in the subsequent sections) merits that we establish their veracity rigorously. In
addition, the empirical study in this section will also allow us to quantify the magnitude of the delay effect
for different classes of patients.
2.3. Estimation Model
We now describe our reduced-form model which forms the basis for our estimate of the impact of boarding
delay on ICU LOS. To test hypothesis 1, we regress ED boarding time for patient $i$, $\mathrm{ED\_BOARD}_i$, against
a measure of ICU occupancy and patient specific physiological variables. In particular, we let $\mathrm{ICU\_BUSY}$
be an indicator for the ICU being in a busy state, as will be described in detail later. Further, let $X_i$ be a
vector of various physiologic and operational factors which may affect ICU LOS as well as ED boarding
time, such as patient severity, age, primary condition, day of admission, and hospital where care is received.
One of $X_i$'s components is a constant. Our model is then:
$$\mathrm{ED\_BOARD}_i = \beta^T X_i + \gamma\, \mathrm{ICU\_BUSY}_i + \varepsilon_i \qquad (1)$$
where $\varepsilon_i$ is assumed to be zero-mean noise uncorrelated with $X_i$ and $\mathrm{ICU\_BUSY}_i$. The coefficient $\gamma$
measures the relationship between ICU occupancy levels and ED Boarding time: $\gamma > 0$ would support
hypothesis 1.
To test hypothesis 2, we consider the ICU LOS of patient $i$, $\mathrm{ICU\_LOS}_i$, and the ED boarding time for
that patient, $\mathrm{ED\_BOARD}_i$. Letting $X_i$ be the same vector of features as before, our model is then:
$$\mathrm{ICU\_LOS}_i = \beta^T X_i + \sum_j \delta_j\, \mathrm{ED\_BOARD}_i\, \mathbf{1}\{\mathrm{AILMENT}_i = j\} + \nu_i \qquad (2)$$
where $j$ indexes the set of possible ailments. The zero-mean noise term $\nu_i$ is assumed to be uncorrelated
with $X_i$. The coefficient $\delta_j$ may be interpreted as measuring how each additional hour of ED Boarding
increases expected ICU LOS for ailment group $j$: $\delta_j > 0$ would support hypothesis 2.
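The ailment-specific slopes in model (2) amount to one interaction column per ailment group: the boarding time appears in the column of the patient's own group and zero appears elsewhere. The sketch below shows one way such a regressor row could be assembled. The function name `los_design_row` and the example control values are hypothetical, and the ordering of the groups is simply the order in which they are listed in Section 2.1.

```python
# The 10 ailment groups named in Section 2.1 (the ordering is an arbitrary choice).
AILMENTS = [
    "Cancer", "Catastrophic", "Cardiac", "Fluid&Hematologic", "Infectious",
    "Metabolic", "Renal", "Respiratory", "Skeletal", "Vascular",
]


def los_design_row(controls, ed_board_hours, ailment):
    """Regressor row for model (2): controls X_i followed by one
    ED_BOARD_i * 1{AILMENT_i = j} interaction term per ailment group j."""
    if ailment not in AILMENTS:
        raise ValueError(f"unknown ailment group: {ailment}")
    interactions = [ed_board_hours if ailment == j else 0.0 for j in AILMENTS]
    return list(controls) + interactions


# Hypothetical patient: constant term, age 64, 3.5 hours of boarding, Cardiac.
row = los_design_row([1.0, 64.0], 3.5, "Cardiac")
print(row)
```

Only the Cardiac interaction entry is non-zero for this patient, so the patient's boarding time informs the Cardiac slope and no other.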
Instruments: We chose not to assume that $\nu_i$ and $\mathrm{ED\_BOARD}_i$ are uncorrelated; correlation between
these two variables can arise for several plausible reasons, one of which is the impact unobserved patient
severity can have on both $\mathrm{ICU\_LOS}_i$ and $\mathrm{ED\_BOARD}_i$. An exceptionally severe patient may naturally
require a longer length of stay in the ICU (due to the increased time required for recovery). The same
patient may also be prioritized in any scheduling, which could lead to shorter boarding times for that patient.
In particular, in such an event we would expect $\mathrm{ED\_BOARD}_i$ and $\nu_i$ to be negatively correlated. Since
such exceptional factors are unobserved in the model, the negative correlation, if ignored, would result in
underestimating $\delta$. To address this issue we require suitable instrumental variables.
The occupancy level in the ICU is unlikely to be correlated with patient severity but is likely corre-
lated with the boarding time experienced by the patient and hence constitutes an excellent candidate for
an instrumental variable. In particular, we use $\mathrm{ICU\_BUSY}_i$ as our instrumental variable. Our instrumental
variable regression permits an attractive interpretation as a two-stage regression: we replace $\mathrm{ED\_BOARD}_i$
in model (2) with $\widehat{\mathrm{ED\_BOARD}}_i$, the predicted ED Boarding time based on model (1).
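The two-stage interpretation can be illustrated on synthetic data. The sketch below is a toy, not the estimation code used for this study: it drops the covariates $X_i$ and the ailment-specific slopes, uses a single binary instrument, and generates data in which an unobserved severity term shortens boarding and lengthens LOS. Every number in `synthetic_patients` (including the assumed true effect of 3 hours of LOS per hour of boarding) and all function names are assumptions made for illustration.

```python
import random


def ols_fit(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_slope(instrument, regressor, outcome):
    """Two-stage least squares with one instrument and no other covariates."""
    intercept, gamma = ols_fit(instrument, regressor)      # stage 1
    fitted = [intercept + gamma * z for z in instrument]   # predicted regressor
    return ols_fit(fitted, outcome)[1]                     # stage 2


def synthetic_patients(n=20000, true_delta=3.0, seed=0):
    """Toy data: unobserved severity lowers boarding time but raises LOS."""
    rng = random.Random(seed)
    busy, board, los = [], [], []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.3 else 0.0   # ICU busy indicator
        severity = rng.gauss(0.0, 1.0)           # not seen by the analyst
        b = 3.5 + 1.3 * z - severity + rng.gauss(0.0, 1.0)
        y = 50.0 + true_delta * b + 10.0 * severity + rng.gauss(0.0, 5.0)
        busy.append(z)
        board.append(b)
        los.append(y)
    return busy, board, los


busy, board, los = synthetic_patients()
naive_slope = ols_fit(board, los)[1]
iv_slope = two_stage_slope(busy, board, los)
print(f"naive OLS slope: {naive_slope:.2f}")
print(f"2SLS slope:      {iv_slope:.2f}")
```

In this toy setup the naive regression is pulled downward by the negative correlation between boarding time and the noise term, while the two-stage estimate should land near the assumed true effect, mirroring the direction of the bias discussed above.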
2.4. Empirical Results
We first consider the impact of a busy ICU on ED Boarding. We define an ICU as ‘busy’ if the occupancy
level is greater than 80% of the maximum patient census over the course of the year. Because beds can be
flexed by bringing in additional staff, this is likely a lower bound on the actual occupancy level. Moreover,
it is possible that the delay effects will be seen prior to 100% occupancy as some beds may be reserved in
anticipation of patient arrivals from other hospital units, such as the Operating Room. This characterization
of the ICU being busy is similar to the approaches taken in Kc and Terwiesch (2012), Kim et al. (2012),
Chan et al. (2012) and Batt and Terwiesch (2012) among others. Note that we examined other measures of
busy, including different thresholds and times at which the occupancy was measured. The results are similar,
so we have included the most statistically significant ones.
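The ‘busy’ definition above can be written as a simple indicator. The following sketch assumes the census at each patient's admission time and the yearly maximum census are already available; the function name `busy_flags` and the census values are hypothetical.

```python
def busy_flags(census_at_admission, max_census, threshold=0.80):
    """Flag an ICU as busy when its census exceeds `threshold` times the
    maximum census observed over the year."""
    cutoff = threshold * max_census
    return [census > cutoff for census in census_at_admission]


# Hypothetical unit whose census peaked at 20 patients during the year.
flags = busy_flags([10, 15, 17, 20], max_census=20)
print(flags)
```

Other thresholds, such as those mentioned above as robustness checks, can be explored by changing the `threshold` argument.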
With p < .001, ED Boarding time increases by 1.3463 hours when the ICU occupancy level is greater
than 80%. This result supports hypothesis 1 and further supports using ICU occupancy as an instrumental
variable in Model (2).
We now consider the impact of ED Boarding on ICU LOS. As a measure of model robustness, we
consider two models: the first does not use any instrumental variables and the second uses ICU occupancy
as an instrument for ED Boarding as discussed earlier. Table 2 summarizes the delay effects for the 10
primary condition categories of interest. We see evidence of an endogeneity bias, especially in the case of
Renal patients, where it seems that increased boarding time actually reduces ICU LOS. This goes against
medical knowledge and intuition. We can see that the instrument is able to adjust for this bias. When we use
the IV, the coefficients which capture delay effects increase for every category except Cancer; all statistically significant coefficients in
this case are positive.
                                      (i)            (ii)
                                      Without IV     With IV
$\delta_{\mathrm{Cancer}}$             1.8272         1.6828
                                      (2.3566)       (5.7258)
$\delta_{\mathrm{Catastrophic}}$      -0.8235         3.7687**
                                      (0.5897)       (1.8169)
$\delta_{\mathrm{Cardiac}}$            0.0242         2.5668*
                                      (0.3232)       (1.4904)
$\delta_{\mathrm{Fluid\&Hematologic}}$ 0.3411         6.3874**
                                      (0.8982)       (2.5725)
$\delta_{\mathrm{Infectious}}$         0.5382         3.1042*
                                      (0.4094)       (1.6461)
$\delta_{\mathrm{Metabolic}}$         -0.4322         2.6632
                                      (0.7218)       (1.8336)
$\delta_{\mathrm{Renal}}$             -2.3084*       -1.7586
                                      (1.1789)       (2.7339)
$\delta_{\mathrm{Respiratory}}$        0.2935         4.2770**
                                      (0.5508)       (1.7418)
$\delta_{\mathrm{Skeletal}}$          -0.8038        -0.6972
                                      (1.0555)       (3.1126)
$\delta_{\mathrm{Vascular}}$          -0.6246         3.8718*
                                      (0.9028)       (2.2522)
Standard errors in parentheses. * p < 0.10; ** p < 0.05; *** p < 0.01
Table 2 ICU LOS regression results: (i) without instrumental variables; (ii) uses ICU Occupancy > 80% at
ICU admission time as an instrumental variable.
We can see that for patient categories: Catastrophic, Cardiac, Fluid & Hematologic, Infectious, Respira-
tory, and Vascular the delay effect is statistically significant (p < .10). For these ailments, 1 additional hour
in ED delay is associated with an increase in ICU LOS by 2.5-6.5 hours. As we will see in our analysis of
queueing systems with delay-dependent service times, this impact can be substantial.
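As a rough consistency check, the significance markers in column (ii) of Table 2 can be recomputed from the reported coefficients and standard errors. The sketch below assumes a two-sided test under a normal approximation, which is an assumption here because the text does not state which reference distribution was used. The helper names are hypothetical; the numbers are transcribed from Table 2.

```python
import math

# (coefficient, standard error) pairs transcribed from column (ii) of Table 2.
IV_ESTIMATES = {
    "Cancer": (1.6828, 5.7258),
    "Catastrophic": (3.7687, 1.8169),
    "Cardiac": (2.5668, 1.4904),
    "Fluid&Hematologic": (6.3874, 2.5725),
    "Infectious": (3.1042, 1.6461),
    "Metabolic": (2.6632, 1.8336),
    "Renal": (-1.7586, 2.7339),
    "Respiratory": (4.2770, 1.7418),
    "Skeletal": (-0.6972, 3.1126),
    "Vascular": (3.8718, 2.2522),
}


def two_sided_p_value(coefficient, std_error):
    """Two-sided p-value under a normal approximation (an assumption here)."""
    z = abs(coefficient / std_error)
    return math.erfc(z / math.sqrt(2.0))


def stars(p_value):
    """Significance markers using the thresholds in the Table 2 note."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


significant = {
    name: coef
    for name, (coef, se) in IV_ESTIMATES.items()
    if two_sided_p_value(coef, se) < 0.10
}
for name, (coef, se) in IV_ESTIMATES.items():
    p = two_sided_p_value(coef, se)
    print(f"{name:18s} {coef:8.4f} p={p:.3f} {stars(p)}")
print("range of significant effects:",
      min(significant.values()), "to", max(significant.values()))
```

Under this approximation the six categories named above come out significant at the 10% level, and their coefficients span 2.5668 to 6.3874, which matches the quoted range of roughly 2.5-6.5 hours.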
We do not see any statistically significant results for patient conditions Cancer, Metabolic, Renal, and
Skeletal. Cancer, Renal and Skeletal are the patient conditions with the fewest patients, so the
lack of statistically significant results may be attributed to the small sample size. There are 650 samples
of Metabolic patients, yet it seems that delays may have little impact on ICU LOS. This may be because
Metabolic, along with the Cancer and Renal categories, corresponds to chronic conditions including diabetes,
immune disorders, end stage renal disease, etc. Consequently, these patients may be more delay
tolerant. While the patients are considered severe (they still need ICU care), there is likely to be less urgency
when the patient's primary condition for admission is chronic. Finally, Skeletal refers to conditions such as
broken hips, which may be susceptible to infection if left untreated; however, their urgency is likely to be
lower than other patients such as those with infections in the blood (Infectious). Hence, these four categories
which do not display statistically significant results for the delay effect seem to correspond to the conditions
which are most likely to have little to no relationship between ICU LOS and delayed admission.
We note that prior work has demonstrated that when the ICU is busy, patient LOS may decrease
(Kc and Terwiesch 2012). In their work, they focus on a single cardiac ICU where patients are cared for
following cardiac surgery. In our case, we do not consider surgical patients. We focus on ED medical patients.
Kim et al. (2012) show that scheduled surgical patients are most likely to experience speedup when the
ICU is busy, while ED medical patients do not seem to experience speedup when the ICU becomes
congested. Our data are consistent with these findings. Moreover, our findings are robust to controls for the
possibility of speedup.
From our empirical analysis, it is clear that, for a large group of patients, delays in ICU admission are
associated with substantial increases in ICU LOS. As expected from the medical literature, the impact of
delays varies across different patient conditions. We next devote our attention to understanding the
implications of this delay effect on traditional queueing insights.
3. Incorporating the Delay Effect: M/M(f)/s Model
Motivated by our empirical analysis, we turn our attention to developing queueing models which incorporate
the delay effect. Such analysis allows one to measure the impact of ignoring the delay effect when using
conventional queueing approaches. To do this, we introduce an M/M/s-like queueing system which has
jobs with delay-dependent service times. Our analysis assumes a single patient class in order to focus on
the impact of the delay effect. Such an assumption is reasonable in hospitals with specialized ICUs. For
instance, some large hospitals have dedicated cardiac ICUs where non-surgical cardiac patients are given
priority.
We begin with a Markovian queueing system and modify it to account for the delay phenomenon. In
particular, we consider a model wherein the service time of a job is inflated from some nominal value by a
quantity which depends on the number of jobs in the queue upon the job's arrival. Hence, the service rate
of the standard exponential random variable depends on the delay of the job; we denote this dependence
by M(f), where f is an 'inflation' function that we will define shortly. Such a model is able to capture the
dynamics estimated from the patient data in the previous section.
We now formally introduce our delay-dependent queueing system. Consider an s server queueing system
described as follows: Jobs arrive according to a Poisson process at rate λ and are served in FIFO fashion.
We let $N_t$ denote the number of jobs in the system at time t. Job i arrives at time $t_i$ and its service time
is exponentially distributed with mean $1 + f(N_{t_i^-})$, where f(·) is a growth function which satisfies the
following requirements:
1. f(m) = 0 for m= 0.
2. f(·) is bounded and non-decreasing.
In what follows, we will examine the behavior of this system and the impact of the growth function,
f(m). We will refer to such a system as a queueing system with delay-dependent workload, and abbreviate
it with the notation M/M(f)/s.
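The definition above translates directly into a short event-driven simulation. The following is a minimal sketch under the stated model only (Poisson arrivals, FIFO, service time drawn on arrival with mean 1 + f(m)); the function name `simulate_mmfs`, its parameters, and the example step function are hypothetical choices made for illustration and are not the simulation code behind the figures in this paper.

```python
import heapq
import random

def simulate_mmfs(lam, s, f, n_jobs=200_000, seed=0):
    """Simulate a FIFO M/M(f)/s queue; return (mean workload, fraction delayed).

    lam : Poisson arrival rate
    s   : number of servers
    f   : growth function; a job that finds m jobs in the system on arrival
          draws an exponential service time with mean 1 + f(m)
    """
    rng = random.Random(seed)
    free_at = [0.0] * s   # min-heap of times at which each server next frees up
    departures = []       # min-heap of departure times of jobs still in the system
    t = 0.0
    area = 0.0            # integral of unfinished work over time
    delayed = 0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        while departures and departures[0] <= t:
            heapq.heappop(departures)
        m = len(departures)                       # occupancy seen on arrival
        service = rng.expovariate(1.0 / (1.0 + f(m)))
        start = max(t, heapq.heappop(free_at))    # FIFO: take the earliest free server
        wait = start - t
        delayed += wait > 0.0
        # The job's work sits untouched for `wait`, then drains at rate 1.
        area += service * wait + 0.5 * service ** 2
        heapq.heappush(free_at, start + service)
        heapq.heappush(departures, start + service)
    return area / t, delayed / n_jobs

# Hypothetical example: two servers, inflation k = 0.2 once N* = s = 2 jobs are present.
step = lambda m: 0.2 if m >= 2 else 0.0
workload, frac_delayed = simulate_mmfs(lam=1.2, s=2, f=step)
print(workload, frac_delayed)
```

Because service times are fixed at arrival and the discipline is FIFO, each job's start time is known as soon as it arrives, which is why two heaps suffice and no explicit event calendar is needed.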
3.1. Stability of an M/M(f)/s system
We begin our analysis of our queueing system with delay-dependent service times by considering the
stability of such a system. While the stability condition, and consequently the throughput of an M/M(f)/s
system, is a relatively coarse performance benchmark, it provides interesting insight into the behavior of
such systems. We have that:
Proposition 1 An M/M(f)/s system is stable if and only if
\[
\frac{\lambda}{s} \le \frac{1}{1 + f_{\max}},
\]
where $f_{\max}$ is the maximum value taken on by f(·).
The proof of this result can be found in the appendix. To provide some intuition for this result, if a burst of
jobs arrives, they will all experience some delay and an increase in service requirement. If a particularly bad
burst of jobs arrives in sequence, the system will quickly deteriorate to the point where all jobs are delayed
and require maximal service time. Hence, the stability requirement is based on the maximum possible job
requirement. While the question of stability reduces to the standard stability characterization under the
worst-case scenario of all jobs inflating maximally, the system dynamics are more nuanced.
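As a small illustration of Proposition 1, the condition can be checked numerically. The helper names below are hypothetical, and the example values (6 servers, maximum inflation 0.243) are borrowed from the calibration in Section 5.1 purely for concreteness.

```python
def is_stable(lam, s, f_max):
    """Stability condition of Proposition 1: lam / s <= 1 / (1 + f_max)."""
    return lam / s <= 1.0 / (1.0 + f_max)

def max_stable_arrival_rate(s, f_max):
    """Largest arrival rate that satisfies the condition of Proposition 1."""
    return s / (1.0 + f_max)

# With nominal mean service time 1, six servers could absorb lam up to 6
# if delays had no effect; with f_max = 0.243 the limit drops below 5.
print(max_stable_arrival_rate(6, 0.0), max_stable_arrival_rate(6, 0.243))
```

Note that the threshold at which inflation begins does not appear in the check: only the largest value of f matters for stability.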
3.2. A Markovian Model
Our delay-dependent queueing system can be represented as a multi-dimensional Markov Chain. For the
sake of concreteness and simplicity of exposition we will consider a very simple f(·), and simply indicate
corresponding results for general f(·). In particular, we assume that the workload increase function, f(·), is
defined as follows:
\[
f(m) = \begin{cases} 0, & m < N^*; \\ k, & m \ge N^*, \end{cases}
\]
for some threshold occupancy level, N∗ > 0. Hence, the mean service time of each job is 1 if there are fewer
than N∗ jobs in the system upon arrival and 1 + k otherwise. If N∗ = s, this means any job which is delayed
will have an increased service requirement. Relating back to our empirical findings, the increase in service
requirement seems to occur if a new job sees an occupancy level of 80%, corresponding to N∗ = .8 × s.
Let $X = (X_N, X_D)$ be the system state, where $X_N$ is the number of jobs in the system who arrived with
fewer than N∗ jobs in the system and $X_D$ is the number of jobs who arrived with N∗ or more jobs in the
system. Note that due to the FIFO and non-preemptive service discipline,
if $X_N > 0$, then necessarily there are $(X_N \wedge s)$ jobs currently in service at rate 1. The remaining servers,
$(s - X_N)^+$, will be serving jobs at rate $1/(1+k)$ if any are available. Otherwise, they will idle. We can verify
that the Markov Property holds for our state as defined.
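Proposition 2 An M/M(f)/s system can be represented as a Markovian system with state $X = (X_N, X_D)$.
PROOF: We show that the Markov Property holds for our system. We let $X(i) = (X_N(i), X_D(i))$ be the
state at the i-th state transition. What is left to show is that
\[
P\big(X(i+1) = (x_N, x_D) \,\big|\, X(0), X(1), \ldots, X(i-1), X(i)\big) = P\big(X(i+1) = (x_N, x_D) \,\big|\, X(i)\big).
\]
We demonstrate this by considering the precise transition probabilities:
\[
\begin{aligned}
&P\big(X(i+1) = (x_N, x_D) \,\big|\, X(0), X(1), \ldots, X(i-1), X(i) = (x'_N, x'_D)\big) \\
&\quad = \begin{cases}
\dfrac{\lambda}{\lambda + (x'_N \wedge s) + \frac{x'_D \wedge (s - x'_N)^+}{1+k}}, & \text{if } (x_N, x_D) = (x'_N + 1, x'_D) \text{ and } x'_N + x'_D < N^*; \\[2ex]
\dfrac{\lambda}{\lambda + (x'_N \wedge s) + \frac{x'_D \wedge (s - x'_N)^+}{1+k}}, & \text{if } (x_N, x_D) = (x'_N, x'_D + 1) \text{ and } x'_N + x'_D \ge N^*; \\[2ex]
\dfrac{x'_N \wedge s}{\lambda + (x'_N \wedge s) + \frac{x'_D \wedge (s - x'_N)^+}{1+k}}, & \text{if } (x_N, x_D) = (x'_N - 1, x'_D); \\[2ex]
\dfrac{\frac{x'_D \wedge (s - x'_N)^+}{1+k}}{\lambda + (x'_N \wedge s) + \frac{x'_D \wedge (s - x'_N)^+}{1+k}}, & \text{if } (x_N, x_D) = (x'_N, x'_D - 1); \\[2ex]
0, & \text{otherwise}
\end{cases} \\
&\quad = P\big(X(i+1) = (x_N, x_D) \,\big|\, X(i) = (x'_N, x'_D)\big).
\end{aligned}
\tag{3}
\]
It is clear that the transition probabilities depend only on the current state and are independent of the past.
□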
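The transition probabilities in (3) can be written as a small function, which makes it easy to confirm that they depend only on the current state and sum to one. This is an illustrative sketch: the function name and argument names are hypothetical, and it covers only the step growth function with threshold N∗ and inflation k.

```python
def transition_probs(x_n, x_d, lam, s, k, n_star):
    """Jump probabilities out of state (x_n, x_d), following equation (3)."""
    rate_n = min(x_n, s)                           # non-inflated jobs in service, rate 1 each
    rate_d = min(x_d, max(s - x_n, 0)) / (1 + k)   # inflated jobs in service, rate 1/(1+k) each
    total = lam + rate_n + rate_d
    probs = {}
    if x_n + x_d < n_star:
        probs[(x_n + 1, x_d)] = lam / total        # arrival sees fewer than N* jobs
    else:
        probs[(x_n, x_d + 1)] = lam / total        # arrival sees N* or more jobs
    if rate_n > 0:
        probs[(x_n - 1, x_d)] = rate_n / total
    if rate_d > 0:
        probs[(x_n, x_d - 1)] = rate_d / total
    return probs

# Hypothetical state: 2 non-inflated and 3 inflated jobs, 4 servers, N* = 4.
print(transition_probs(2, 3, lam=1.5, s=4, k=0.2, n_star=4))
```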
The transition matrix for this Markov Chain has a block diagonal structure. However, despite this struc-
ture, solving for the steady-state dynamics involves solving a high dimensional matrix inversion. While one
may be able to solve this numerically, it does not provide much insight for the general model. Moreover,
this approach quickly becomes intractable with more general growth functions f. The state-space must grow in the
number of break-points in the function f, so that the block sizes in the transition matrix grow exponentially
in the number of break-points.
Despite starting from the innocuous M/M/s queueing model, the introduction of the delay effect makes
the resulting system far too difficult to permit an exact analysis. As such we focus on producing
approximations to quantities of interest (such as the expected workload) by constructing suitable upper bounding
systems. This analysis provides some insight into how the issues above might impact nominal predictions
that do not account for the impact of delay on service time.
4. Approximating The Workload Process
This section will be concerned with establishing and interpreting a simple (and fairly accurate)
approximation to the long run average workload of an M/M(f)/s system. In particular, let us denote by $W_t$ and
$N_t$, respectively, the workload and number-in-system processes in this system. Consider also an M/M/s
system with arrival rate λ and service rate $1/(1+f_{\max})$, where $f_{\max} = \max_m f(m)$. Assume the service discipline
for this system is FIFO. We denote by $\bar{W}_t$ and $\bar{N}_t$, respectively, the workload and number-in-system
processes in this system. We will frequently refer to the former system (the system we are interested in
analyzing) as system 1 and the latter system (which will have value in our producing bounds) as system
2. Finally, we denote by $\underline{W}_t$ the workload process in an M/M/s system with arrival rate λ and service
rate 1, i.e. a system without any delay effect or relationship to the growth function f(m). We will refer to
this system as the baseline, delay-independent system and use its behavior as a comparison benchmark for
our M/M(f)/s system and the corresponding bound we will establish. We let $E[W]$, $E[\bar{W}]$, and $E[\underline{W}]$
denote the expected work in each system. That is, if we start the systems according to their respective
stationary distributions, then these correspond to the expected work in each system at time 0: $E[W] = E[W_0]$,
$E[\bar{W}] = E[\bar{W}_0]$, and $E[\underline{W}] = E[\underline{W}_0]$.
4.1. An Upperbound for A Step Function
In order to provide more insight into the bound we will derive, we start by examining a special case of the
delay-growth function, f. In particular, we focus on the case where jobs have nominal service requirement
of mean 1 which increases to 1 + k if there are N∗ or more jobs in the system upon arrival:
\[
f(m) = \begin{cases} 0, & m < N^*; \\ k, & m \ge N^*. \end{cases}
\]
Such a delay growth function captures the increased service time required by jobs (patients) who arrive to
a congested system (i.e., m ≥ N∗). As described in Section 3.2, we can relate this delay-growth function
directly to our empirical study by appropriately defining N∗. This bears similarities to some of the medical
literature which examines the increase in workload of delayed versus not delayed patients (Chalfin et al.
2007, Renaud et al. 2009). Moreover, we consider the case where the service times are exponentially
distributed. We can establish the following upperbound:
Theorem 1 Assume that f(·) is defined according to f(m) = k for m ≥ N∗, and f(m) = 0 otherwise. We
have that the expected workload, E[W], satisfies
\[
E[W] \le E[\bar{W}] - \lambda(2k + k^2) P(\bar{N} < N^*),
\]
where $\bar{W}$ and $\bar{N}$ denote the workload and number of jobs in a traditional M/M/s system with arrival rate
λ and service rate 1/(1 + k).
The upperbound consists of the amount of work in the system if all jobs were inflated, which is then
corrected according to the second term in the bound. To provide some intuition for the correction term, let us
consider the case where N∗ = s and examine the amount of work contributed by an arbitrary job, i. We
note that we correct for the extra amount of work that is introduced whenever a job does not have to wait
upon arrival, i.e. $N_{t_i^-} < s$. A job that immediately begins service contributes a total of $\frac{1}{2}\bar{W}_i^2$ work, i.e. it
brings work $\bar{W}_i$ that is depleted at constant rate 1 until it completes service. The total contribution is then
the area of the right triangle with width and height equal to $\bar{W}_i$. Because this job does not have to wait, the
amount of work that is actually contributed is $\bar{W}_i^2 / (2(1+k)^2)$, which accounts for the artificial inflation of the work
to expected size 1 + k. Therefore, to account for the actual amount of work introduced by a job who does
not have to wait, we subtract the amount of work contributed by the inflated job, $\frac{1}{2}\bar{W}_i^2$, and add the amount
of work of the correct mean-1 sized job, $\bar{W}_i^2 / (2(1+k)^2)$. See Figure 1 for an illustration of the accounting to correct for
the excess work introduced. Recognizing that the second moment of an exponential random variable with
mean µ is 2µ², we derive the desired result.
[Figure 1: workload contributed by a single job over time. The inflated job brings work $\bar{W}_i$ at time $t_i$ and
completes at $t_i + \bar{W}_i$; the correctly sized job brings work $\bar{W}_i/(1+k)$ and completes at $t_i + \bar{W}_i/(1+k)$. The
shaded area between the two triangles is $\frac{1}{2}\bar{W}_i^2 - \frac{1}{2}\left(\bar{W}_i/(1+k)\right)^2$.]
Figure 1 Due to the inflation of all jobs, each job which experiences zero delay contributes excess work,
which is shaded in gray.
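The constant $2k + k^2$ in Theorem 1 follows from this triangle accounting. As a worked step, assume (as in the bounding system) that the inflated job size $\bar{W}_i$ is exponential with mean $1+k$, so that $E[\bar{W}_i^2] = 2(1+k)^2$. The expected excess area per non-delayed job is then:

```latex
\begin{align*}
E\left[\tfrac{1}{2}\bar{W}_i^2 - \frac{\bar{W}_i^2}{2(1+k)^2}\right]
  &= \tfrac{1}{2}\left(1 - \frac{1}{(1+k)^2}\right) E\left[\bar{W}_i^2\right] \\
  &= \tfrac{1}{2}\left(1 - \frac{1}{(1+k)^2}\right) \cdot 2(1+k)^2 \\
  &= (1+k)^2 - 1 \;=\; 2k + k^2 .
\end{align*}
```

Jobs arrive at rate λ and, since Poisson arrivals see time averages, find fewer than N∗ jobs with probability $P(\bar{N} < N^*)$, which suggests a long-run excess of $\lambda(2k + k^2)P(\bar{N} < N^*)$ per unit time, the correction term in the bound.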
Note that for k = 0, we recover the results for a queueing system without delay-dependent service times.
In the case of Markovian dynamics, we recover the classical results of an M/M/s queue. The first
expression in the upper bound corresponds to a system where all jobs have their service time increased, irrespective
of the amount of delay experienced. However, the workload does not unilaterally increase with the load.
The second part of the expression represents the correction for over-inflating the workload for jobs which
do not experience excess congestion. We note that this is an upper bounding system because, while we
account for the correct workload if a job is not delayed, we do not correct for the propagation effect of its
inflated workload on delays for future jobs. Still, the upperbound is quite accurate for systems with various
growth factors, k, and numbers of servers, s. Figure 2 demonstrates the accuracy of the derived upperbound
in comparison to the simulated workload of the M/M(f)/s system.
[Figure 2: three panels plotting E[W] against λ for the simulated M/M(f)/s system and its upper bound
(UB M/M(f)/s), each with k = .05, .1, and .2: (a) 1 server; (b) 2 servers; (c) 10 servers.]
Figure 2 Comparison of expected workload in a simulated M/M(f)/s system versus the derived upperbound
for s = 1, 2, and 10. Inflation is given by a step function: f(m) = k 1{m ≥ s} with k = .05, .1, and .2.
We observe that the upperbound in Theorem 1 admits a simple analytical expression. This allows us to
generate a clean understanding of the impact of delay on the workload process akin to our understanding
of the role factors such as utilization play in a traditional M/M/s system. We do this by deriving explicit
expressions for the upperbound.
4.1.1. Exact Expressions and Interpretation To allow for additional interpretation of our
bound, we leverage established expressions for M/M/s queues to evaluate our bound. We have for an
M/M/s queueing system with arrival rate λ and service rate µ, i.e. ρ = λ/(µs):
\[
\pi_0 = \left[ \sum_{i=0}^{s-1} \frac{(s\rho)^i}{i!} + \frac{(s\rho)^s}{s!(1-\rho)} \right]^{-1},
\qquad
\pi_n = \begin{cases} \pi_0 \dfrac{(s\rho)^n}{n!}, & n < s; \\[1.5ex] \pi_0 \dfrac{\rho^n s^s}{s!}, & n \ge s. \end{cases}
\tag{4}
\]
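The expected work in the system, E[W] = E[N]/µ, is given as:
\[
E[W] = \frac{s\rho}{\mu} + \frac{1}{\mu} \cdot
\frac{\dfrac{\rho}{(1-\rho)^2} \dfrac{(s\rho)^s}{s!}}
     {\displaystyle\sum_{i=0}^{s-1} \frac{(s\rho)^i}{i!} + \frac{(s\rho)^s}{s!(1-\rho)}}
\tag{5}
\]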
Thus, for any number of servers, s, it is possible to compose exact expressions for our upperbound.
To demonstrate this process, we now explicitly evaluate our bound in two cases: a single server and
two servers. While such a small system may not be generally applicable to an ICU setting, there are
specialized ICUs which can be very small. For instance, in California, the smallest number of licensed
Medical/Surgical ICU beds amongst hospitals with such an ICU is 2 and three hospitals have a 3 bed
ICU (State of California Office of Statewide Health Planning & Development 2010-2011). More generally,
there are other service settings which include a delay effect and have few servers. For instance, Primary
Care may be one such setting (though the delay effect is likely much smaller than in the ED to ICU setting
which we are considering here). In our evaluation of explicit expressions, we consider N∗ = s, so that the
workload increases for any job which is delayed. Note that our empirical estimates find that occupancy
levels of 80% have a statistically significant relationship to increased ED boarding time (delay), which in
turn relates to an increase in ICU LOS. As we are examining the impact of delay (which is influenced by
occupancy levels), we introduce the delay effect in our queueing system when a job is actually delayed, i.e.
when N∗ = s.
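Expressions (4) and (5), combined with Theorem 1, can be evaluated numerically for any number of servers. The sketch below is one way to do so under the step growth function; the function names are hypothetical, and `mms_prob_fewer_than` is restricted to thresholds N∗ ≤ s, which covers the case N∗ = s considered here.

```python
from math import factorial

def mms_pi0(a, s):
    """Empty-system probability of an M/M/s queue with offered load a = s * rho (eq. 4)."""
    rho = a / s
    head = sum(a ** i / factorial(i) for i in range(s))
    return 1.0 / (head + a ** s / (factorial(s) * (1.0 - rho)))

def mms_expected_work(lam, mu, s):
    """E[W] = E[N] / mu for an M/M/s queue (eq. 5)."""
    a, rho = lam / mu, lam / (mu * s)
    queue = mms_pi0(a, s) * a ** s / factorial(s) * rho / (1.0 - rho) ** 2
    return (a + queue) / mu

def mms_prob_fewer_than(lam, mu, s, n_star):
    """P(N < n_star) for n_star <= s, summing pi_n from eq. 4."""
    if n_star > s:
        raise ValueError("this sketch only handles n_star <= s")
    a = lam / mu
    return mms_pi0(a, s) * sum(a ** n / factorial(n) for n in range(n_star))

def theorem1_upper_bound(lam, s, k, n_star=None):
    """Upper bound of Theorem 1 for the step function f(m) = k * 1{m >= n_star}."""
    n_star = s if n_star is None else n_star
    mu_bar = 1.0 / (1.0 + k)   # bounding system: every job inflated to mean 1 + k
    return (mms_expected_work(lam, mu_bar, s)
            - lam * (2 * k + k ** 2) * mms_prob_fewer_than(lam, mu_bar, s, n_star))

# Hypothetical example: 10 servers at lam = 7 with k = 0.1.
print(mms_expected_work(7.0, 1.0, 10), theorem1_upper_bound(7.0, 10, 0.1))
```

Setting k = 0 makes the correction vanish and returns the expected work of the ordinary M/M/s queue, which is a convenient sanity check.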
The Single Server Case M/M(f)/1: We want to compare the behavior of the M/M(f)/1 system to
a regular M/M/1 system which does not have any delay effect. We denote the workload in an M/M/1
system with arrival rate λ and service rate 1 as $\underline{W}$ and note that:
\[
E[\underline{W}] = \frac{\rho}{1 - \rho}
\]
for ρ = λ. For notational consistency, we maintain this definition of ρ = λ throughout the following analysis.
For our M/M(f)/1 system, we use the result derived in Theorem 1 to establish an upper bound to E[W],
the expected work in this system:
\[
E[W] \le W_{UB} = \frac{(1+k)^2 \rho}{1 - (1+k)\rho} - \lambda(2k + k^2)\frac{1}{1 - (1+k)\rho} = \frac{\rho}{1 - (1+k)\rho}
\]
We consider the ratio between these two expressions to understand the relative increase in workload due
to the delay effect:
\[
\frac{W_{UB}}{E[\underline{W}]} = \frac{1 - \rho}{1 - (1+k)\rho}
\]
We can see the precise dependence on the growth factor k. Traditional queueing systems assume that k = 0.
To understand the impact of ignoring the delay effect, we can examine how the relative workload increases
with k, especially around k = 0. Writing $E[\underline{W}] = \rho/(1-\rho)$ for the expected work in the baseline M/M/1
system, we have that:
\[
\frac{d}{dk}\left(\frac{W_{UB}}{E[\underline{W}]}\right)\Bigg|_{k=0}
= \rho\,\frac{1 - \rho}{(1 - (1+k)\rho)^2}\Bigg|_{k=0}
= \frac{\rho}{1 - \rho} = E[\underline{W}]
\]
Hence, if we use a Taylor series approximation, we have that
\[
\frac{W_{UB}}{E[\underline{W}]} \approx 1 + E[\underline{W}]\,k
\]
so that the workload in our M/M(f)/1 system grows quadratically with the expected work in a traditional
M/M/1 system. When considering that the work grows exponentially in ρ for a traditional M/M/1 system,
we see that in our new system with delay-dependent service times, the work will grow super exponentially
with ρ.
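To get a feel for how far the first-order approximation can be trusted, the single-server ratio and its Taylor approximation can be compared numerically. This is a small illustrative sketch; the function names and the grid of ρ and k values are arbitrary choices, and both formulas are taken as stated in the text above.

```python
def ratio_single_server(rho, k):
    """W_UB / E[W] for the single-server case, as given in the text."""
    return (1.0 - rho) / (1.0 - (1.0 + k) * rho)

def first_order(rho, k):
    """First-order Taylor approximation around k = 0: 1 + E[W] * k."""
    baseline_work = rho / (1.0 - rho)
    return 1.0 + baseline_work * k

for rho in (0.5, 0.8, 0.9):
    for k in (0.01, 0.05):
        exact, approx = ratio_single_server(rho, k), first_order(rho, k)
        print(f"rho={rho} k={k} ratio={exact:.4f} first-order={approx:.4f}")
```

The two agree closely for small k at moderate load, and the linear approximation understates the ratio as (1 + k)ρ approaches 1, since the ratio is convex in k.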
The Two Server Case M/M(f)/2: We now consider a similar analysis to the single server case when
there are two servers. Because there are two servers, we now define the system load ρ = λ/2 and maintain
this definition in what follows. The expected workload, $E[\underline{W}]$, of an M/M/2 system with arrival rate λ and service
rate 1 is:
\[
E[\underline{W}] = 2\rho +
\frac{\dfrac{\rho}{(1-\rho)^2}\dfrac{(2\rho)^2}{2!}}
     {\displaystyle\sum_{i=0}^{1}\frac{(2\rho)^i}{i!} + \frac{(2\rho)^2}{2!(1-\rho)}}
= \frac{2\rho}{1 - \rho^2}
\tag{6}
\]
For our M/M(f)/2 system, we use the upper bound derived in Theorem 1:
\[
E[W] \le W_{UB} = \frac{2(1+k)^2\rho}{1 - (1+k)^2\rho^2}
- 2\rho\,\frac{(2k + k^2)(1 + 2(1+k)\rho)(1 - (1+k)\rho)}{1 + (1+k)\rho}
\tag{7}
\]
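We consider the ratio between these two expressions to understand the relative increase in workload due
to the delay effect, where $E[\underline{W}] = 2\rho/(1-\rho^2)$ is the expected work in the baseline M/M/2 system:
\[
\frac{W_{UB}}{E[\underline{W}]} = \frac{(1+k)^2(1 - \rho^2)}{1 - (1+k)^2\rho^2}
- \frac{(2k + k^2)(1 + 2(1+k)\rho)(1 - (1+k)\rho)(1 - \rho^2)}{1 + (1+k)\rho}
\tag{8}
\]
Again, we can see the precise dependence on the growth factor k. However, it is still cumbersome to fully
understand the impact of k. To do this, we again utilize the Taylor series approximation to understand the
impact of introducing the delay effect, i.e. around k = 0. We have that:
\[
\begin{aligned}
\frac{d}{dk}\left(\frac{W_{UB}}{E[\underline{W}]}\right)\Bigg|_{k=0}
&= \frac{2\rho^2}{1 - \rho^2} + 6\rho^2 + 4\rho^3 \\
&= \rho E[\underline{W}] + 6\rho^2 + 4\rho^3 \\
&\approx \begin{cases} \rho E[\underline{W}], & \rho \approx 1; \\ 6\rho^2 + 4\rho^3, & \rho \approx 0. \end{cases}
\end{aligned}
\tag{9}
\]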
When the system is not very loaded, i.e. ρ ≈ 0, the polynomial terms, $6\rho^2 + 4\rho^3$, dominate in the
derivative. Hence, the relative increase in workload grows polynomially in the system load. We expect the
delay effect to have a much more substantial impact as the system becomes more heavily loaded. When ρ
is close to 1, the first term in the derivative dominates and the workload grows with respect to the expected
amount of work and system load in a delay-independent queueing system. When examining the impact of
the delay effect in conjunction with the M/M/1 case, we see that the delay effect increases the amount of
work in the system based on the expected workload in a traditional M/M/s system. In particular,
\[
\frac{W_{UB}}{E[\underline{W}]} \approx 1 + E[\underline{W}]\,\rho k
\]
around k = 0 and for ρ close to one. Similar to the single server case, we see the delay effect introduces a
quadratic term in $E[\underline{W}]$, the expected work of a traditional M/M/2 system. Figure 3 displays the derivative
in these regimes.
[Figure 3: two panels plotting the derivative at k = 0 against ρ: (a) comparison of the derivative with
6ρ² + 2ρ³ for ρ ≈ 0; (b) comparison of the derivative with ρE[W] for ρ ≈ 1.]
Figure 3 Simulation of M/M(f1)/s system
We can see that this bound lets us precisely characterize the impact of the delay-dependent service times
on the expected workload in the system. We can relate the increase in the expected work in the M/M(f)/s
system for any number of servers, s, to the workload in a system without any delay effect. Certainly, a
more heavily loaded system will experience more delay. This will magnify the impact of the delay effect.
On the other hand, when the system load is low, the delay effect will have little impact since few jobs
will experience delays and the subsequent growth in service requirement. Because most healthcare systems
operate in a regime where delays happen with relative frequency, it is important to understand the impact
of the delay effect. Our analysis is a first step in understanding how to incorporate delay-dependent service
times into queueing systems and how the delay effect can impact system behavior.
4.2. A General Upperbound for an M/M(f)/s System
As we saw in Section 2, the delay effect can be gradual. Thus, we now generalize our result from Theorem
1 to other delay-growth functions. Consider any growth function f(·) with a countable number of
discontinuities. Let $0 = M_0 < M_1 < M_2 < \cdots < M_{J-1} < M_J = \infty$ be break points in the function f, so that if the
number of jobs in the system upon arrival of a new job satisfies $M_{j-1} \le N_t < M_j$, the service rate of that job
is $1/(1+k_j)$, where $k_j \le k_{j+1}$. Hence,
\[
f(m) = k_j, \quad \text{if } M_{j-1} \le m < M_j.
\]
Thus, f is an arbitrary non-decreasing piece-wise constant function. Note that any growth function f can
be expressed via such a discontinuous function.
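Since the occupancy m only takes integer values, any non-decreasing growth function can be put into this break-point form by recording where its value changes. The sketch below illustrates one way to do that; the helper names and the capped linear growth function are hypothetical. It returns the list [M_0, ..., M_{J-1}] together with the levels [k_1, ..., k_J], and the last level is assumed to hold for every occupancy beyond the scanned range.

```python
from bisect import bisect_right

def to_piecewise(f, max_m):
    """Return (breakpoints, levels) with f(m) = levels[j] for
    breakpoints[j] <= m < breakpoints[j + 1], scanning integer m in [0, max_m].
    The last level is assumed to hold for all m beyond max_m.
    """
    breakpoints, levels = [0], [f(0)]
    for m in range(1, max_m + 1):
        if f(m) != levels[-1]:
            breakpoints.append(m)
            levels.append(f(m))
    return breakpoints, levels

def evaluate(breakpoints, levels, m):
    """Evaluate the piecewise-constant growth function at occupancy m."""
    return levels[bisect_right(breakpoints, m) - 1]

# Hypothetical gradual delay effect: grows linearly past 4 jobs, capped at 0.2.
capped_linear = lambda m: min(0.05 * max(m - 4, 0), 0.2)
breakpoints, levels = to_piecewise(capped_linear, max_m=20)
print(breakpoints, levels)
```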
As we have described before, $W_t$ and $N_t$ are defined as the workload and jobs processes for this
delay-dependent queueing system. Similarly, let $\bar{W}_t$ and $\bar{N}_t$ be the workload and jobs processes for an M/M/s
system with arrival rate λ and service rate $1/(1+k_J)$, where $k_J = \max_j k_j$. We can then establish the following
upperbound to our M/M(f)/s system:
Theorem 2 If f is a non-decreasing piece-wise constant function with $f(m) = k_j$ if $M_{j-1} \le m < M_j$, we
have that the workload process, $W_t$, satisfies
\[
E[W] \le E[\bar{W}] - \sum_{j=1}^{J} \left[ \lambda\left(2k_J + k_J^2 - 2k_j - k_j^2\right) P\left(M_{j-1} \le \bar{N}_t < M_j\right) \right]
\]
The proof of this result requires a coupling argument and can be found in Appendix B. To provide some
insight into the interpretation of this bound, we parse through the two expressions which compose the
upperbound:
1. The first term corresponds to the expected work in the system if all jobs are inflated maximally to
mean service time $1 + k_J = 1 + f_{\max}$. Thus, it corresponds to the expected work in an M/M/s system
with $\rho = \lambda(1+k_J)/s$. However, most jobs will not be inflated to the maximum size, which brings us to the
second term.
2. The second term corresponds to the correction necessary for over-inflating the workload of jobs with
moderate or no wait upon arrival. If this occurs, the work that the new job brings is a factor of $(1+k_j)/(1+k_J)$ less
than the amount of work that arrives in the $\bar{W}_t$ system. Removing this extra work results in the multiplier
of the last expression.
Note that the only time we rely on the exponential service times is to make the algebraic simplification
in Proposition 6 to establish the closed form expression for the correction term. Hence, the bound can be
extended to general service times, but may not result in as clean expressions.
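The bound in Theorem 2 can be evaluated with the standard M/M/s stationary distribution. The following is a minimal sketch, assuming the band probabilities run over $M_{j-1} \le \bar{N} < M_j$ so that the two-level case reduces to Theorem 1; the function names and the list-based encoding of break points and levels are hypothetical.

```python
from math import factorial

def mms_stationary(lam, mu, s, n_max):
    """Stationary probabilities pi_0..pi_{n_max} of an M/M/s queue."""
    a, rho = lam / mu, lam / (mu * s)
    pi0 = 1.0 / (sum(a ** i / factorial(i) for i in range(s))
                 + a ** s / (factorial(s) * (1.0 - rho)))
    return [pi0 * (a ** n / factorial(n) if n < s
                   else rho ** n * s ** s / factorial(s))
            for n in range(n_max + 1)]

def mms_expected_work(lam, mu, s):
    """E[W] = E[N] / mu for an M/M/s queue."""
    a, rho = lam / mu, lam / (mu * s)
    pi0 = mms_stationary(lam, mu, s, 0)[0]
    return (a + pi0 * a ** s / factorial(s) * rho / (1.0 - rho) ** 2) / mu

def theorem2_upper_bound(lam, s, breakpoints, levels):
    """Upper bound of Theorem 2.

    breakpoints = [M_0 = 0, M_1, ..., M_{J-1}] and levels = [k_1, ..., k_J],
    so that f(m) = k_j for M_{j-1} <= m < M_j, with M_J = infinity.
    """
    k_max = levels[-1]
    mu_bar = 1.0 / (1.0 + k_max)                 # bounding system: all jobs at 1 + k_J
    pi = mms_stationary(lam, mu_bar, s, breakpoints[-1])
    bound = mms_expected_work(lam, mu_bar, s)
    for j in range(len(levels) - 1):             # the j = J term is zero
        k_j = levels[j]
        p_band = sum(pi[breakpoints[j]:breakpoints[j + 1]])
        bound -= lam * (2 * k_max + k_max ** 2 - 2 * k_j - k_j ** 2) * p_band
    return bound

# Hypothetical three-level growth function on a two-server system.
print(theorem2_upper_bound(1.0, 2, [0, 2, 4], [0.0, 0.05, 0.1]))
```

Adding an intermediate level tightens the bound relative to the two-level step function with the same maximum, because jobs arriving at moderate occupancy are no longer charged the full inflation.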
5. Numerical Comparisons
We now turn our attention to examining the behavior of our delay-dependent queueing system along with
the quality of the derived upperbound. In particular, we wish to examine how this delay effect may impact
a real system. To do this we connect back to our empirical analysis in Section 2 to calibrate our model. We
consider a setting with a fixed number of servers (beds). If a job (patient) arrives and there is an available
server, it is immediately served. If there is no available server, it must wait. We consider the expected
workload in each system.
5.1. Calibration of Model
We consider a model where patients have a nominal ICU LOS. If a new patient's admission is delayed, the
patient's LOS increases by a constant factor k. That is, we examine the scenario where f(m) = k if m ≥ s and 0
otherwise. We need to determine the value of k. To do so, we turn back to our empirical analysis in Section
2. Recall that we found that when the ICU occupancy is above 80%, patient delays (ED Boarding)
seem to increase. In turn, longer ED boarding is associated with longer ICU LOS. In order to capture this
effect in our simulations, we account for an increase in service time whenever a job is delayed. Note that
because our initial queueing model does not account for the possibility of physicians 'saving' ICU beds for
scheduled surgeries or the potential of more severe patients arriving, delays occur only when the occupancy
level is 100%. That is, the service requirement increases whenever a job arrives and all servers are busy.
Given the heterogeneous impact on patients, we focus on a single condition category. Our numerical
calculations will be based on Cardiac patients. We selected this condition category because i) cardiac patients
demonstrate an increase in ICU LOS when delayed, ii) this is the largest group of patients in the hospital
system we are studying and iii) some hospitals have dedicated cardiac ICUs which primarily treat cardiac
patients. We notice that the mean LOS of Cardiac patients is 37.75 hours, the mean ED Boarding time is
3.57 hours, and each hour of boarding is associated with an increase in ICU LOS of 2.5668 hours.
We note that in our empirical analysis we estimated a linear growth function, which we will be
approximating with a step function. We do this in two ways. First we consider the smallest reasonable delay effect.
In this case, we know that 1 additional hour of boarding time is associated with 2.5668 additional hours in
the ICU. We make this the smallest delay effect possible and define the growth function $f = f_1$ as:
\[
f_1(m) = \begin{cases}
0.068 = \dfrac{2.5668 \text{ additional hours}}{37.75 \text{ mean ICU LOS}}, & m \ge s; \\[1.5ex]
0, & \text{otherwise.}
\end{cases}
\]
On the other hand, we notice that the mean ED Boarding time is more than 3 hours. We use this
observation to consider a larger delay effect. In particular, we consider that patients will experience an average
delay of 3.57 hours, which translates to 9.16 = 3.57 × 2.5668 extra hours in the ICU. Hence, we consider a
second growth function:
\[
f_2(m) = \begin{cases}
0.243 = \dfrac{9.16 \text{ additional hours}}{37.75 \text{ mean ICU LOS}}, & m \ge s; \\[1.5ex]
0, & \text{otherwise.}
\end{cases}
\]
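The two calibrated growth factors follow from simple arithmetic on the Cardiac estimates quoted above. The sketch below reproduces that arithmetic; the constant and function names are hypothetical, and the choice of s = 6 simply matches the smaller of the two ICU sizes simulated in this section.

```python
MEAN_LOS_HOURS = 37.75        # mean ICU LOS of Cardiac patients
MEAN_BOARDING_HOURS = 3.57    # mean ED boarding time
DELAY_EFFECT = 2.5668         # extra ICU hours per hour of boarding (Table 2, with IV)

k1 = DELAY_EFFECT / MEAN_LOS_HOURS                         # one hour of boarding
k2 = MEAN_BOARDING_HOURS * DELAY_EFFECT / MEAN_LOS_HOURS   # average boarding time

def step_growth(k, s):
    """Return the step function f(m) = k if m >= s else 0."""
    return lambda m: k if m >= s else 0.0

f1 = step_growth(k1, s=6)
f2 = step_growth(k2, s=6)
print(round(k1, 3), round(k2, 3))
```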
We simulate the behavior of these delay-dependent queueing systems for a small (6 beds) and moderately
sized (15 beds) ICU. We compare the expected workload to three benchmarks:
1. [M/M/s with ρ = λ/s] This represents a traditional queueing system without delay effects. This is a
(trivial) lower bound to the delay-dependent system.
2. [M/M/s with ρ = λ(1+k)/s] This represents a queueing system where the amount of work each
job brings is artificially inflated as if all jobs experienced delays. This is a (trivial) upper bound to the
delay-dependent system.
3. [Upperbound derived in Theorem 1] This corrects for the miscalculation of work for jobs who are
not delayed.
We next examine how accurate the approximations are in order to gain more understanding of the impact
of the delay effect and when it is most important to account for it when using queueing models to provide
insight into various service settings.
5.2. Simulation Results
Figure 4 plots the expected workload, E[W], for different arrival rates. We make two observations about
the delay-dependent system. First, the upper bound is very accurate. Second, even with this very small
delay effect, we can see the behavior of the system is quite different from that of an M/M/s system. At
low loads, the delay-dependent system looks like an M/M/s system where no jobs are extended; this is
because few jobs, if any, are delayed. However, as the system load increases, more jobs are delayed and the
delay-dependent system transitions from the M/M/s system without any job growth to the M/M/s
system with constant job growth. It is clear that ignoring the delay effect can be misleading as to the actual
work in the system.
[Figure 4: E[W] (days, log scale) against λ (patients/day) for four systems: M/M/s (µ = 1), UB M/M(f)/s,
M/M(f)/s, and M/M/s (µ = 1/(1+k)); panels (a) 6 bed ICU and (b) 15 bed ICU.]
Figure 4 Comparison of simulation of M/M(f1)/s system to the derived upperbound as well as traditional
M/M/s systems where no jobs or all jobs are inflated.
In order to get a better sense of the impact of the delay effect, in Figure 5, we examine the difference
in the expected workload of different models compared to a traditional M/M/s system where no jobs are
extended, i.e. ρ = λ/s. Most ICUs are not operated in a regime where patients are rarely or always delayed,
so we focus on arrival rates where at least a third of the beds turn over each day so there is some, but not
excessive, congestion in our system. Again, we see that our bound is fairly accurate. Moreover, it provides
more insight into the system workload than an M/M/s system where all jobs are inflated. Note that an
M/M/s system with µ = 1/(1 + k) precisely characterizes the stability condition for a delay-dependent
queueing system (see Proposition 1). However, the dynamics of the workload are more nuanced.
[Figure 5: ∆E[W] (patient days) against λ (patients/day) for three differences: M/M/s (µ = 1/(1+k)) − M/M/s (µ = 1),
UB M/M(f)/s − M/M/s (µ = 1), and M/M(f)/s − M/M/s (µ = 1); panels (a) 6 bed ICU and (b) 15 bed ICU.]
Figure 5 Simulation of M/M(f1)/s system: Difference in workload compared to a standard M/M/s system
with ρ = λ/s. Here the growth factor is 6.8%.
Figure 6 considers the increase in expected workload when the delay effect is much larger. In this case,
being delayed increases a patient's ICU LOS by nearly 25%, corresponding to patients seeing the average
delay of 3.57 hours. We notice that the upper bound is slightly looser. This is because the upper bound only
corrects the work a single job brings in, but not the propagation effect it has on delaying/not delaying future
jobs. This propagation is more substantial when the delay effect is larger. Still, we can see the upper bound
is a better measure of system load than the naive upper bound of an M/M/s system with ρ = λ(1+k)/s, i.e.
all jobs are extended.
Through our simulations, we can see that our derived upper bound can be quite accurate. Moreover, we
see that the expected workload for ourM/M(f)/s system is very different when comparing to a system
without a delay effect. Ignoring the impact delays may have on service times may result in poor capacity
management and substantial under provisioning when using traditional queueing models to guide such
decisions. It is especially important to consider the delay effect when the system is heavily loaded and most
jobs tend to experience some delay. Without accounting for the delay effect, a hospital ICU can become
even more congested. In order to manage this increase in system load, hospitals may have to cancel surgeries
and/or divert ambulances to reduce patient arrivals at a substantial loss in revenue. As the delay effect seems
to be prevalent in a number of healthcare settings, reconsidering the management of these systems in light
of delay sensitive service times may result in substantial operational and medical care improvements.
25
[Figure 6 plots: panel (a) 6 bed ICU and panel (b) 15 bed ICU. Horizontal axis: λ (patients/day). Vertical axis: ΔE[W] (patient days). Curves: M/M/s (µ = 1/(1+k)) − M/M/s (µ = 1); UB M/M(f)/s − M/M/s (µ = 1); M/M(f)/s − M/M/s (µ = 1).]
Figure 6 Simulation of M/M(f2)/s system (increased delay effect): Difference in workload compared to a standard M/M/s system with ρ = λ/s. Here the growth factor is 24.39%.
6. Conclusion
To summarize, this work quantifies a relatively unstudied queuing phenomenon in a critical care setting – the
impact of delays on care requirements. We see that this natural phenomenon is substantially verified by data
and attempt to incorporate the phenomenon into simple queueing models. The impact of this phenomenon
is comparable with moderate service provisioning adjustments (which are expensive and can have dramatic
impact) and, as such, warrants careful attention.
Analyzing queueing systems with delay-dependent service times exactly can be cumbersome and
intractable. As such, we focus on the development of reasonable approximations for the system workload.
We find that 1) our approximations are quite accurate and 2) they provide expressions which allow for inter-
pretations related to increases in system load. We find that ignoring the delay effect when using queueing
models to guide operational decision making may result in substantial under-provisioning of resources such
as beds, nurses, and physicians. Moreover, because the delay effect can be quite substantial, disregarding
it may impede future attempts to make ICUs more efficient and effective. Incorporating a delay effect will
result in more accurate estimates of system dynamics as well as targets for system improvement.
While we don’t expect our models to directly translate into new capacity management criteria for hospital
ICUs, we hope that this analysis demonstrates the impact of ignoring the delay effects when making such
decisions. By ignoring the delay effects, ICUs continue to be highly congested. Such congestion can lead
to other reactive actions such as rerouting (Kim et al. 2012), patient speedup (Kc and Terwiesch 2012), and
ambulance diversion (Allon et al. 2013), which can also be detrimental to patient outcomes. From both a
patient as well as systems level perspective, it is desirable to reduce delays. While reducing the average ED
boarding time by an hour may be practically difficult, the adverse feedback of delays on increased service
requirements suggests that even small reductions in boarding time on the order of 10 to 15 minutes may
help reduce congestion.
References
Allon, G., S. Deo, W. Lin. 2013. The impact of hospital size and occupancy of hospital on the extent of ambulance diversion: Theory and evidence. Operations Research 61 554–562.
Anand, K., M. F. Pac, S. Veeraraghavan. 2010. Quality-Speed Conundrum: Tradeoffs in Customer-Intensive Services. Management Science 57 40–56.
Anderson, D., C. Price, B. Golden, W. Jank, E. Wasil. 2011. Examining the discharge practices of surgeons at a large medical center. Health Care Management Science 1–10.
Armony, M., C. Maglaras. 2004. Contact centers with a call-back option and real-time delay information. Operations Research 52 527–545.
Ata, B., S. Shneorson. 2006. Dynamic Control of an M/M/1 Service System with Adjustable Arrival and Service Rates. Management Science 52 1778–1791.
Batt, R.J., C. Terwiesch. 2012. Doctors under load: An empirical study of state-dependent service times in emergency care. Working Paper, The Wharton School.
Boxma, O.J., M. Vlasiou. 2007. On queues with service and interarrival times depending on waiting times. Queueing Systems 56 121–132.
Buist, M.D., G.E. Moore, S.A. Bernard, B.P. Waxman, J.N. Anderson, T.V. Nguyen. 2002. Effects of a medical emergency team on reduction of incidence of and mortality from unexpected cardiac arrests in hospital: preliminary study. British Medical Journal 324 387–390.
Burt, C.W., S.M. Schappert. 2004. Ambulatory care visits to physician offices, hospital outpatient departments, and emergency departments: United States, 1999-2000. Vital Health Stat. 13(157) 1–70.
Chalfin, D. B. 2005. Length of intensive care unit stay and patient outcome: The long and short of it all. Critical Care Medicine 33 2119–2120.
Chalfin, D. B., S. Trzeciak, A. Likourezos, B. M. Baumann, R. P. Dellinger. 2007. Impact of delayed transfer of critically ill patients from the emergency department to the intensive care unit. Critical Care Medicine 35 1477–1483.
Chan, C. W., V. F. Farias, N. Bambos, G. Escobar. 2012. Optimizing ICU discharge decisions with patient readmissions. Operations Research 60 1323–1342.
Chan, P.S., H.M. Krumholz, G. Nichol, B.K. Nallamothu. 2008. Delayed time to defibrillation after in-hospital cardiac arrest. New England Journal of Medicine 358(1) 9–17.
de Luca, G., H. Suryapranata, J.P. Ottervanger, E.M. Antman. 2004. Time delay to treatment and mortality in primary angioplasty for acute myocardial infarction: every minute of delay counts. Circulation 109(10) 1223–1225.
de Vericourt, F., O. B. Jennings. 2011. Nurse Staffing in Medical Units: A Queueing Perspective. Operations Research 59 1320–1331.
Dobson, G., H.H. Lee, E. Pinker. 2010. A model of ICU bumping. Operations Research 58 1564–1576.
Durrett, R. 1996. Probability: Theory and Examples. Duxbury Press.
Escobar, G. J., J. D. Greene, P. Scheirer, M. N. Gardner, D. Draper, P. Kipnis. 2008. Risk-adjusting hospital inpatient mortality using automated inpatient, outpatient, and laboratory databases. Medical Care 46 232–239.
George, J. M., J. M. Harrison. 2001. Dynamic control of a queue with adjustable service rate. Operations Research 49(5) 720–731.
Ghahramani, S. 1986. Finiteness of moments of partial busy periods for M/G/c queues. Journal of Applied Probability 23(1) 261–264.
Green, L. 2006. Queueing analysis in healthcare. Randolph W. Hall, ed., Patient Flow: Reducing Delay in Healthcare Delivery, International Series in Operations Research & Management Science, vol. 91. Springer US, 281–307.
Green, L. V., J. Soares, J. F. Giglio, R. A. Green. 2006. Using queuing theory to increase the effectiveness of emergency department provider staffing. Academic Emergency Medicine 13 61–68.
Halpern, N. A., S. M. Pastores. 2010. Critical care medicine in the United States 2000-2005: An analysis of bed numbers, occupancy rates, payer mix, and costs. Critical Care Medicine 38 65–71.
Kc, D., C. Terwiesch. 2009. Impact of workload on service time and patient safety: An econometric analysis of hospital operations. Management Science 55 1486–1498.
Kc, D., C. Terwiesch. 2012. An econometric analysis of patient flows in the cardiac intensive care unit. Manufacturing & Service Operations Management 14 50–65.
Kim, S-H, C. W. Chan, M. Olivares, G. Escobar. 2012. Managing inpatient units: An empirical study of capacity allocation and its implication on service outcomes. Working Paper, Columbia Business School.
Litvak, E., M.C. Long, A.B. Cooper, M.L. McManus. 2001. Emergency department diversion: causes and solutions. Acad Emerg Med 8 1108–1110.
Mandelbaum, A., W. Massey, M. Reiman. 1998. Strong approximations for Markovian service networks. Queueing Systems 30(1-2) 149–201.
Mandelbaum, A., G. Pats. 1998. State-dependent stochastic networks. Part I: Approximations and applications with continuous diffusion limits. The Annals of Applied Probability 8(2) 569–646.
Moreno, R.P., P. G. Metnitz, E. Almeida, B. Jordan, P. Bauer, R.A. Campos, G. Iapichino, D. Edbrooke, M. Capuzzo, J.R. Le Gall. 2005. SAPS 3–From evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med 31 1345–1355.
Norton, S.A., L.A. Hogan, R.G. Holloway, H. Temkin-Greener, M.J. Buckley, T.E. Quill. 2007. Proactive palliative care in the medical intensive care unit: effects on length of stay for selected high-risk patients. Crit Care Med 35 1530–1535.
Powell, S. G., K. L. Schultz. 2004. Throughput in Serial Lines with State-Dependent Behavior. Management Science 50 1095–1105.
Rapoport, J., D. Teres, S. Lemeshow. 1996. Resource use implications of do not resuscitate orders for intensive care unit patients. Am J Respir Crit Care Med 153 185–190.
Renaud, B., A. Santin, E. Coma, N. Camus, D. Van Pelt, J. Hayon, M. Gurgui, E. Roupie, J. Herve, M.J. Fine, C. Brun-Buisson, J. Labarere. 2009. Association between timing of intensive care unit admission and outcomes for emergency department patients with community-acquired pneumonia. Critical Care Medicine 37(11) 2867–2874.
Rivera, A., J. F. Dasta, J. Varon. 2009. Critical Care Economics. Critical Care & Shock 12 124–129.
Rivers, E., B. Nguyen, S. Havstad, J. Ressler, A. Muzzin, B. Knoblich, E. Peterson, M. Tomlanovich. 2001. Early goal-directed therapy in the treatment of severe sepsis and septic shock. New England Journal of Medicine 345(19) 1368–1377.
Sheridan, R., J. Weber, K. Prelack, L. Petras, M. Lydon, R. Tompkins. 1999. Early burn center transfer shortens the length of hospitalization and reduces complications in children with serious burn injuries. J Burn Care Rehabil 20 347–50.
Shi, P., M. C. Chou, J. G. Dai, D. Ding, J. Sim. 2012. Hospital Inpatient Operations: Mathematical Models and Managerial Insights. Working Paper, Georgia Institute of Technology.
State of California Office of Statewide Health Planning & Development. 2010-2011. Annual Financial Data. URL http://www.oshpd.ca.gov/HID/Products/Hospitals/AnnFinanData/CmplteDataSet/index.asp.
Thompson, S., M. Nunez, R. Garfinkel, M.D. Dean. 2009. Efficient short-term allocation and reallocation of patients to floors of a hospital during demand surges. Operations Research 57(2) 261–273.
Whitt, W. 1990. Queues with service times and interarrival times depending linearly and randomly upon waiting times. Queueing Systems 6 335–352.
Whitt, W. 2003. How multiserver queues scale with growing congestion-dependent demand. Queueing Systems 51 531–542.
Wilper, A.P., S. Woolhandler, K.E. Lasser, D. McCormick, S.L. Cutrona, D.H. Bor, D.U. Himmelstein. 2008. Waits to see an emergency department physician: U.S. trends and predictors, 1997-2004. Health Affairs 27 w84–95.
Yankovic, N., S. Glied, L.V. Green, M. Grams. 2010. The impact of ambulance diversion on heart attack deaths. Inquiry 47 81–91.
Yankovic, N., L. Green. 2011. Identifying Good Nursing Levels: A Queuing Approach. Operations Research 59 942–955.
Zimmerman, J. E., A. A. Kramer, D.S. McNair, F. M. Malila. 2006. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today’s critically ill patients. Crit Care Med 34 1297–1310.
Appendix A: Miscellaneous Proofs
PROOF OF PROPOSITION 1:
Stability: First, we show that if $\lambda/s \le 1/(1+f_{\max})$, then the system is rate stable. This follows by examining a traditional M/M/s system with arrival rate $\lambda$ and mean service requirement $1 + f_{\max} = 1 + \max_m f(m)$. We couple the arrivals and the service times of the two systems so that if the mean service requirement of a job in our delay-dependent system is $\sigma \le 1 + f_{\max}$, its service requirement is $\sigma X$ and the service requirement in the M/M/s system is $(1+f_{\max})X$, where $X$ is a mean-1, exponentially distributed random variable. It is easy to see that this M/M/s system is an upper-bounding system for our delay-dependent system. Hence, if the upper-bounding system is stable, so is the M/M(f)/s system.
The stability condition for this upper-bounding system is the desired criterion.
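As a quick illustration of the criterion, the check below evaluates λ/s ≤ 1/(1+f_max) for a growth function given as a list of its values. The helper names and the numbers are hypothetical and only meant to show how the threshold moves with f_max.

```python
def is_rate_stable(lam, s, f_values):
    """Check lam / s <= 1 / (1 + f_max), where f_max is the largest growth value."""
    f_max = max(f_values)
    return lam / s <= 1.0 / (1.0 + f_max)

def critical_arrival_rate(s, f_values):
    """Largest arrival rate satisfying the criterion: s / (1 + f_max)."""
    return s / (1.0 + max(f_values))

# Hypothetical 6-server system whose growth function takes the values 0 and 0.2439.
print(is_rate_stable(4.0, 6, [0.0, 0.2439]))
print(is_rate_stable(5.5, 6, [0.0, 0.2439]))
print(critical_arrival_rate(6, [0.0, 0.2439]))
```

Note that an arrival rate of 5.5 would be stable for a standard M/M/s queue with 6 servers and mean service time 1, yet it violates the criterion once the maximal inflation is accounted for.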
Instability: We now show that if $\lambda/s > 1/(1+f_{\max})$, then the system is unstable. We do this in two steps: 1) we show that, from any initial state, there is a non-zero probability that the M/M(f)/s system reaches in finite time the state in which the number of jobs in the system is such that the service time of a new arrival would be maximally inflated and all the jobs in the system have been delayed enough that their service time is maximally inflated; 2) we establish the transience of this state, which will establish that our resulting system is transient and, hence, unstable.
We define the following notation. Let $N_{f_{\max}} = \min\{N : f(N) = \max_n f(n)\}$ be the minimum number of jobs in the system such that the service time for a new job is inflated maximally. Our state at time $t$ can be described by the $N_{f_{\max}}$-dimensional vector $Z_t$, where $(Z_t)_n$ is the number of jobs in the system which saw $n$ jobs upon arrival ($Z_{N_{f_{\max}}}$ is the number of jobs which saw $N_{f_{\max}}$ or more jobs in the system). Let $T_{xy} = \inf\{t > 0 : Z_t = y \mid Z_0 = x\}$ be the first passage time to state $y$ given that we start in state $x$ at time 0. Finally, we define the state with exactly $N = \max(N_{f_{\max}}, s)$ jobs in the system, all of whose service times are maximally inflated, as $S^* = \{Z : Z_{N_{f_{\max}}} = \sum_n Z_n = N\}$.
We begin by showing that the time to reach state $S^*$ is finite with non-zero probability from any initial state. Specifically, we will show that for any state $x$, $P(T_{xS^*} < \infty) > 0$. Consider a system which starts at state $x$, i.e., $Z_0 = x$. Let $N_x$ be the number of jobs in the system in state $x$. We start by assuming $N_x < N_{f_{\max}} + N$. Our goal is to find the first time to state $S^*$. One way to get to $S^*$ is to have $N + N_{f_{\max}} - N_x$ jobs arrive before any job departs the system and then have $N_{f_{\max}}$ jobs depart from the system before another job arrives. Thus, the probability of this particular sample path occurring, which we denote as $E$, can be lower bounded by:
\[
P(E) > \left(\frac{\lambda}{\lambda + s\mu_{\max}}\right)^{N + N_{f_{\max}} - N_x} \left(\frac{s\mu_{\min}}{\lambda + s\mu_{\max}}\right)^{N_{f_{\max}}} > 0
\]
Moreover, the time it takes for this cascade of events to occur is upper bounded by the sum of $N + 2N_{f_{\max}} - N_x$ exponentially distributed random variables, each with mean $1/(\lambda + s\mu_{\min})$. Specifically, the time has a gamma distribution $T_E \sim \Gamma(N + 2N_{f_{\max}} - N_x,\ 1/(\lambda + s\mu_{\min}))$, which is finite with non-zero probability. Hence, we have that:
\[
P(T_{xS^*} < \infty) > P(E)\, P(T_E < \infty) > 0
\]
Note that if $N_x > N_{f_{\max}} + N$, we simply need $N_x - N$ jobs to depart before the next arrival. Using the same argument as above, we can show that $P(T_{xS^*} < \infty) > 0$ for any $x$.
Next, we demonstrate that the recurrence time for state $S^*$ is infinite with non-zero probability, i.e., $P(T_{S^*S^*} < \infty) < 1$. To do this, we will leverage the fact that a standard M/M/s queueing system with $\rho = \frac{\lambda}{s\mu_{\min}} = \frac{\lambda(1+f_{\max})}{s} > 1$ is unstable and, hence, transient. We consider two states in this M/M/s system: state $y$, with $N$ jobs in the system, and state $y^+$, with $N+1$ jobs in the system. Because this M/M/s system is transient, the first passage time from $y^+$ to $y$ satisfies $P(T^{M/M/s}_{y^+y} < \infty) < 1$. Here we use the superscript M/M/s to differentiate from the first passage time of our delay-dependent M/M(f)/s system, $T_{xy}$.
We leverage the preceding observation and decompose the recurrence time $T_{S^*S^*}$ according to whether the next event is an arrival or a departure, with the new state denoted by $y^+$ and $y^-$, respectively:
\[
\begin{aligned}
P(T_{S^*S^*} < \infty) &= \frac{s\mu_{\min}}{\lambda + s\mu_{\min}} P(T_{y^-S^*} < \infty) + \frac{\lambda}{\lambda + s\mu_{\min}} P(T_{y^+S^*} < \infty) \\
&\le \frac{s\mu_{\min}}{\lambda + s\mu_{\min}} + \frac{\lambda}{\lambda + s\mu_{\min}} P(T_{y^+S^*} < \infty) < 1
\end{aligned}
\]
The last inequality comes from the observation that all jobs which arrive to the system will see at least $N \ge N_{f_{\max}}$ jobs in the system before the system hits state $S^*$; hence, they will have service times exponentially distributed with mean $1 + f_{\max} = 1/\mu_{\min}$. Hence, the dynamics of our M/M(f)/s system are identical to those of the M/M/s system with arrival rate $\lambda$ and service rate $\mu_{\min}$ during the trajectory to the first visit to state $S^*$ from state $y^+$. Because the M/M/s system is transient, state $S^*$ is also transient in our M/M(f)/s system.
By Theorem 3.4 in Durrett (1996), all states in our M/M(f)/s system are transient since the time to reach a transient state ($y \in S^*$) is finite with non-zero probability for all states. Hence, the M/M(f)/s queue is unstable. □
Appendix B: Proof of Theorem 2
We now proceed with the proof of our main result. The proof will examine the case of Theorem 1, which assumes that the growth function $f$ is defined as:
\[
f(m) = \begin{cases} 0, & m < N^*; \\ k, & m \ge N^*. \end{cases}
\]
We note that the generalized result for Theorem 2 will follow similarly. The only changes required are additional notation and bookkeeping to keep track of each breakpoint in the growth function $f$. The proof will proceed in several steps. Again we will refer to our M/M(f)/s system as system 1 and an M/M/s system with arrival rate $\lambda$ and service rate $1/(1+k)$ as system 2.
Coupling: To begin we will construct a natural coupling between the M/M(f)/s and M/M/s systems above. In particular, we assume that both systems see a common arrival process. With an abuse of notation, let the service time for the $i$th arriving job in the latter system be $\bar{W}_i$; the corresponding service time in the delay-dependent system is then either $W_i = \bar{W}_i/(1+k)$ or $W_i = \bar{W}_i$ depending on whether the delay-dependent system has low congestion ($N_{t_i^-} < N^*$) or is considered busy ($N_{t_i^-} \ge N^*$) upon the arrival of the $i$th job. Finally, we assume that both systems start empty. Now let $\tau_i$ ($\bar{\tau}_i$) denote the amount of time the $i$th arriving job waits in the former (latter) system, respectively, before beginning service. We have, as a consequence of our coupling, the following elementary result:
Proposition 3 $\tau_i \le \bar{\tau}_i$ for all $i$. Moreover, $N_t \le \bar{N}_t$ for all $t$.
PROOF: We prove the first statement. Proceeding by induction, observe that the statement is true for $i = 1$: $\tau_1 = \bar{\tau}_1 = 0$. Assume the statement is true for $i = l-1$ and consider $i = l$. For the sake of contradiction assume that $\tau_l > \bar{\tau}_l$. Since the service discipline is FIFO in both systems, it follows that when job $l$ starts service in system 2:
• There are at most $s-1$ jobs from among the first $l-1$ arriving jobs present in system 2.
• Simultaneously, at least $s$ jobs from among the first $l-1$ arriving jobs are still present in system 1 since job $l$ has not yet started service in system 1.
Consequently, given the induction hypothesis and the fact that by our coupling $W_i \le \bar{W}_i$ for $i = 1, 2, \ldots, l-1$, there is a job among the first $l-1$ arrivals that finished service strictly earlier in system 2 than in system 1. This is a contradiction. We have consequently established that $\tau_i \le \bar{\tau}_i$ for all $i$. The latter statement follows as a simple corollary.
□
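The pathwise ordering of waiting times can be observed directly on a simulated sample path. The sketch below is an illustration under stated assumptions (a single threshold, FIFO service, hypothetical parameter values, and a helper name that does not come from the paper). System 2 is obtained by setting the threshold to 0, so that every job is inflated.

```python
import heapq
import random

def coupled_waits(arrivals, base_services, s, k, n_star):
    """Waiting times in a FIFO s-server queue; a job seeing >= n_star jobs is inflated by (1 + k)."""
    free_at = [0.0] * s      # min-heap of server release times
    in_system = []           # min-heap of departure times of jobs still present
    waits = []
    for t, x in zip(arrivals, base_services):
        while in_system and in_system[0] <= t:
            heapq.heappop(in_system)
        busy = len(in_system) >= n_star
        service = x * (1.0 + k) if busy else x
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + service)
        heapq.heappush(in_system, start + service)
        waits.append(start - t)
    return waits

rng = random.Random(7)
arrivals, t = [], 0.0
for _ in range(2000):
    t += rng.expovariate(4.0)                 # common arrival process
    arrivals.append(t)
base = [rng.expovariate(1.0) for _ in arrivals]   # common base service draws

tau = coupled_waits(arrivals, base, s=6, k=0.2439, n_star=6)      # system 1
tau_bar = coupled_waits(arrivals, base, s=6, k=0.2439, n_star=0)  # system 2: all jobs inflated
print(all(a <= b for a, b in zip(tau, tau_bar)))
```

Feeding both systems the same arrival times and the same base service draws is what makes the comparison job by job, mirroring the coupling used in the proof.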
We next use the result above to construct a first upper bound. We have:
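Proposition 4
\[
W_i \tau_i + \frac{1}{2} W_i^2 \le \bar{W}_i \bar{\tau}_i + \frac{1}{2} \bar{W}_i^2 - \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right)
\]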
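PROOF: We begin with two elementary observations. First,
\[
W_i \le \bar{W}_i
\]
always under our coupling and, in particular, if $\bar{N}_{t_i^-} \ge N^*$. Further,
\[
W_i \le \frac{\bar{W}_i}{1+k}
\]
if $\bar{N}_{t_i^-} < N^*$. This follows from the fact that $\bar{N}_t \ge N_t$ (Proposition 3), so that $\bar{N}_{t_i^-} < N^*$ implies $N_{t_i^-} < N^*$. It follows that
\[
\begin{aligned}
W_i \tau_i + \frac{1}{2} W_i^2 &\le \bar{W}_i \bar{\tau}_i + \frac{1}{2} W_i^2 \\
&\le \bar{W}_i \bar{\tau}_i + \mathbf{1}\{\bar{N}_{t_i^-} \ge N^*\}\, \frac{1}{2} \bar{W}_i^2 + \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \frac{\bar{W}_i^2}{(1+k)^2} \\
&= \bar{W}_i \bar{\tau}_i + \frac{1}{2} \bar{W}_i^2 - \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right)
\end{aligned}
\]
The first inequality follows from the fact that $W_i \le \bar{W}_i$ (by our coupling) and $\tau_i \le \bar{\tau}_i$ (Proposition 3). The second inequality follows from the two observations we made at the outset. □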
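The final equality in the derivation rests on two elementary facts: the two indicators sum to one, and the factor multiplying the low-congestion indicator simplifies as shown below. The numerical value uses $k = 0.2439$ purely as an illustration, on the assumption that the growth factor quoted for Figure 6 plays the role of $k$.

```latex
\[
\mathbf{1}\{\bar{N}_{t_i^-} \ge N^*\} = 1 - \mathbf{1}\{\bar{N}_{t_i^-} < N^*\},
\qquad
1 - \frac{1}{(1+k)^2} = \frac{(1+k)^2 - 1}{(1+k)^2} = \frac{2k + k^2}{(1+k)^2}.
\]
\[
\text{For } k = 0.2439: \quad
\frac{2k + k^2}{(1+k)^2} = \frac{0.54728721}{1.54728721} \approx 0.354 .
\]
```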
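We next connect this result to the average workload in both systems (over a finite interval). Let $N(T)$ be the number of jobs that have arrived during $t \in [0, T]$. We have:
Proposition 5
\[
\frac{1}{T} \int_0^T W_t\, dt \le \frac{1}{T} \int_0^T \bar{W}_t\, dt - \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right) + \frac{\bar{W}_T}{T}
\]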
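PROOF: Notice that the total workload contributed by job $i$ over time in system 1 is given by the quantity $W_i \tau_i + \frac{1}{2} W_i^2$, where the first term in the sum corresponds to the workload contributed while job $i$ waits, and the latter term corresponds to the workload contributed while job $i$ is in service. We consequently have:
\[
\begin{aligned}
\frac{1}{T} \int_0^T W_t\, dt &\le \frac{1}{T} \sum_{i=1}^{N(T)} \left( W_i \tau_i + \frac{1}{2} W_i^2 \right) \\
&\le \frac{1}{T} \sum_{i=1}^{N(T)} \left( \bar{W}_i \bar{\tau}_i + \frac{1}{2} \bar{W}_i^2 \right) - \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right) \\
&= \frac{1}{T} \int_0^T \bar{W}_t\, dt + \frac{\bar{W}_T}{T} - \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right)
\end{aligned}
\]
□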
Note that the last equality comes from the fact that not all of the work which arrives during $[0, T]$ is completed by time $T$; hence $\bar{W}_T$ remains. What remains is to take limits on both sides of the inequality established in the previous result. To that end we begin with a few intermediary results. First, we provide a few definitions. We let $E[W]$ and $E[\bar{W}]$ be the expected work in our M/M(f)/s system and an M/M/s system with $\rho = \lambda(1+k)/s$, respectively.
Lemma 1 $\lim_T \frac{1}{T} \int_0^T W_t\, dt = E[W]$
PROOF: This result follows directly from the renewal reward theorem and the fact that the system is stable. The reward function is the cumulative work and is defined as $R(t) = \int_0^t W_\tau\, d\tau$. □
Lemma 2 $\lim_T \frac{1}{T} \int_0^T \bar{W}_t\, dt = E[\bar{W}]$
PROOF: Again, this result follows directly from the renewal reward theorem and the fact that the system is stable. The reward function is the cumulative work and is defined as $R(t) = \int_0^t \bar{W}_\tau\, d\tau$. □
Lemma 3 $\lim_T \frac{1}{T} \int_0^T \mathbf{1}\{\bar{N}_t < N^*\}\, dt = P(\bar{N}_t < N^*)$
PROOF: Again, this result follows directly from the renewal reward theorem and the fact that the system is stable. The reward function is the total time the number of jobs in the system is less than $N^*$ and is defined as $R(t) = \int_0^t \mathbf{1}\{\bar{N}_\tau < N^*\}\, d\tau$. □
Lemma 4 $\lim_T \frac{\bar{W}_T}{T} = 0$
PROOF: This follows from the fact that the system is stable and thus recurrent. If we consider that $\bar{W}_T$ is upper-bounded by the amount of work that arrives during $[T_0^*(T), T]$, where $T_0^*(T) = \sup\{t < T : \bar{W}_t = 0\}$ is the last time before $T$ at which the system was empty, then the fact that the system is recurrent establishes that $P(T - T_0^*(T) < \infty) = 1$. Assuming a finite first moment for $\bar{W}_i$ gives the desired result. □
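We next establish a limit for the second term on the right-hand side of the inequality in Proposition 5.
Proposition 6
\[
\lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \frac{1}{2} \bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right) = \lambda (2k + k^2) \lim_T \frac{1}{T} \int_0^T \mathbf{1}\{\bar{N}_t < N^*\}\, dt
\]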
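PROOF: Let us denote, for notational convenience,
\[
\frac{1}{2} E\bar{W}_i^2 \left(\frac{2k + k^2}{(1+k)^2}\right) = 2k + k^2 \triangleq \alpha
\]
and
\[
\frac{1}{2} \left(\frac{2k + k^2}{(1+k)^2}\right) \triangleq \beta.
\]
We begin with observing that
\[
\lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \alpha = \lambda \alpha \lim_T \frac{1}{T} \int_0^T \mathbf{1}\{\bar{N}_t < N^*\}\, dt \tag{10}
\]
by PASTA. Next, observe that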
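\[
\begin{aligned}
E\left[ \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \alpha \right]
&= \lim_T \frac{1}{T} E\left[ \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \alpha \right] \\
&= \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} E\left[ \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \alpha \right] \\
&= \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} E\left[ \mathbf{1}\{\bar{N}_{t_i^-} < N^*\} \right] E\bar{W}_i^2\, \beta \\
&= \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} E\left[ \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta \\
&= \lim_T E\left[ \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta
\end{aligned}
\tag{11}
\]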
The first equality above follows by dominated convergence (using the dominating random variable $N(T)/T$). The fourth equality (which is crucial) follows since $\mathbf{1}\{\bar{N}_{t_i^-} < N^*\}$ and $\bar{W}_i^2$ are independent random variables. Recall these are defined for the standard M/M/s system. Now, since $\lim_T \frac{1}{T} \int_0^T \mathbf{1}\{\bar{N}_t < N^*\}\, dt$ is a constant (by Lemma 3), (10) and (11) together yield
\[
\lim_T E\left[ \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta = \lambda \alpha \lim_T \frac{1}{T} \int_0^T \mathbf{1}\{\bar{N}_t < N^*\}\, dt.
\]
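But from Lemma 5, which will come in Appendix B.1,
\[
\begin{aligned}
\lim_T E\left[ \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta
&= E\left[ \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta \\
&= \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2\, \beta
\end{aligned}
\]
Using Lemmas 1 and 2 to replace the limit with expectations gives the desired result. This completes the proof. □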
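The identification of α with 2k + k² uses the second moment of the service time in system 2: an exponential random variable with mean 1+k has second moment 2(1+k)². The exact-arithmetic check below illustrates this; the helper name and the value of k are assumptions chosen for illustration.

```python
from fractions import Fraction

def alpha_beta(k):
    """Return (alpha, beta) as defined in the proof of Proposition 6 for growth factor k."""
    second_moment = 2 * (1 + k) ** 2                        # E[W^2] for an exponential with mean 1 + k
    beta = Fraction(1, 2) * (2 * k + k ** 2) / (1 + k) ** 2
    alpha = second_moment * beta                            # (1/2) E[W^2] (2k + k^2) / (1 + k)^2
    return alpha, beta

k = Fraction(2439, 10000)   # illustrative growth factor only
alpha, beta = alpha_beta(k)
print(alpha == 2 * k + k ** 2)
```

Using rational arithmetic avoids any floating-point tolerance in the comparison.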
B.1. Existence of a Limit
The partial busy period of an M/G/s queue is defined as the time between when an arriving customer sees an empty system and the first time after that at which a departing customer sees an empty system. We will use the following result:
Theorem 3 (Ghahramani (1986)) The $m$th moments of the partial busy period of an M/G/s queue are finite if and only if the service time distribution has finite $m$th moments.
We denote by $T_m$ the length of the $m$th partial busy period. We can now establish:
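Lemma 5 Assume the service time distribution has finite fourth moments. Then,
\[
\lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2
\]
exists and equals a constant. Further,
\[
\lim_T E\left[ \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta = E\left[ \lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \right] \beta
\]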
PROOF: We first establish that
\[
\lim_T \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2\, \beta
\]
exists and is constant. To see this, denote by $1 = j_1 < j_2 < j_3 < \ldots$ the arrivals $i$ for which $\bar{N}_{t_i^-} = 0$. Observe that the random variables
\[
X_m \triangleq \sum_{i=j_m}^{j_{m+1}-1} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2
\]
are independent random variables. Moreover, $\sum_{i=j_m}^{j_{m+1}-1} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \le s^2 T_m^2$. Note that since we have assumed the service time distribution has finite fourth moments, we have $E T_m^4 < \infty$. Now let $M(T) = \sup\{l \mid A_{j_l} \le T\}$; $M(T) \to \infty$. The strong law of large numbers then implies that
\[
\lim_T \frac{\sum_{m=1}^{M(T)} X_m}{M(T)}
\]
exists and is a constant a.s. Further, a simple argument using Chebyshev’s inequality and the Borel-Cantelli lemma implies that
\[
\lim_T \frac{X_{M(T)}}{T} = 0 \quad \text{a.s.}
\]
Finally, the elementary renewal theorem implies that $\lim_T \frac{M(T)}{T} = 1/E T_1$. But,
\[
\frac{\sum_{m=1}^{M(T)} X_m}{M(T)} \frac{M(T)}{T} - \frac{X_{M(T)}}{T} \le \frac{1}{T} \sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \le \frac{\sum_{m=1}^{M(T)} X_m}{M(T)} \frac{M(T)}{T} + \frac{X_{M(T)}}{T}
\]
so that taking limits throughout and employing the above observations yields the first conclusion of the Lemma.
Now to establish the second conclusion, observe that
\[
\sum_{i=1}^{N(T)} \mathbf{1}\{\bar{N}_{t_i^-} < N^*\}\, \bar{W}_i^2 \le \sum_{i=1}^{N(T)} \bar{W}_i^2
\]
and that
\[
E \sum_{i=1}^{N(T)} \bar{W}_i^2 = E N(T)\, E\bar{W}_i^2 = \lambda T\, E\bar{W}_i^2
\]
where the first equality is Wald’s identity. Consequently, we may apply the conclusion of the first part of the lemma along with the dominated convergence theorem to establish the second conclusion of the lemma.
□