risk prediction at the emergency department165310/fulltext01.pdf · prognostic similarity is not...

64
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1383 Risk Prediction at the Emergency Department BY THOMAS OLSSON ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2004

Upload: others

Post on 01-Feb-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Medicine 1383

Risk Prediction at the EmergencyDepartment

BY

THOMAS OLSSON

ACTA UNIVERSITATIS UPSALIENSISUPPSALA 2004

Page 2: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic
Page 3: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

List of Papers

This thesis is based on the following papers, which will be referred to by their Roman numerals:

I. T Olsson, A Terént, L Lind. Rapid Emergency Medicine score: a new prognostic tool for in-hospital mortality in non-surgical emergency department patients. J Intern Med. 2004 May;255(5):579-87.

II. T Olsson, L Lind. Comparison of the Rapid Emergency Medicine Score and APACHE II in Nonsurgical Emergency Department Patients.Acad Emerg Med.2003 Oct;10(10):1040-1048

III. T Olsson, A Terént, L Lind. Rapid Emergency Medicine Score can predict long-term mortality in non-surgical emer-gency department patients. Acad Emerg Med. 2004 Oct;11(10):1008-1013

IV. T Olsson, A Terént, L Lind. Charlson Comorbidity Index can add prognostic information to Rapid Emergency Medicine Score as a predictor of long-term mortality. Submitted.

Reprints were made with permission of the publishers.

Page 4: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

Contents

Introduction.....................................................................................................7Emergency Medicine..................................................................................7Scoring systems..........................................................................................8Co-morbidity ............................................................................................11The process of developing a new predictive instrument ..........................12

Check list for the creator of the “ideal” Scoring System.....................13Validating the instrument .........................................................................13Evaluating accuracy .................................................................................13Identifying sources of biases ....................................................................14

Selection bias .......................................................................................14Lead time bias......................................................................................14Detection bias ......................................................................................14

Balance between reliability and content validity......................................15Methodological rigour..............................................................................15Applications of scoring systems...............................................................15

Research...............................................................................................16Resource Use .......................................................................................16Quality Assurance................................................................................16Patient prediction .................................................................................17

Severity of illness scoring in the Emergency Department .........................17

Aims of the study ..........................................................................................18Major aim .................................................................................................18Secondary aims ........................................................................................18

Methods ........................................................................................................19The total cohort ........................................................................................19The sub-samples.......................................................................................20The scoring procedure ..............................................................................21Measurements...........................................................................................23Follow-up .................................................................................................24Statistical analyses....................................................................................24

Page 5: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

Results...........................................................................................................26Rapid Emergency Medicine Score (REMS) can predict in-hospital mortality in non-surgical ED patients over a wide range of disorders(paper I) ....................................................................................................26A comparison of APACHE II and REMS at the ED (paper II)................33REMS can predict long-term mortality in non-surgical ED patients (paper III) .................................................................................................38Charlson co-morbidity index can add prognostic information to Rapid Emergency Medicine Score as a predictor of long-term mortality (paper IV) .................................................................................42

Discussion.....................................................................................................44Development of REMS ............................................................................44Evaluation of REMS ................................................................................45REMS vs APACHE II..............................................................................45Long-term mortality .................................................................................47Co-morbidity ............................................................................................47Strengths and limitations ..........................................................................48Future perspectives...................................................................................51

Conclusions...................................................................................................54Major aim .................................................................................................54Secondary aims ........................................................................................54

Acknowledgements.......................................................................................55

Sammanfattning på svenska..........................................................................56

References.....................................................................................................58

Page 6: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

Abbreviations

ED Emergency Department APACHE Acute physiology and chronic health evaluation RAPS Rapid acute physiology score CCI Charlson co-morbid index ROC Receiving operating characteristic OR Odds ratio HR Hazard ratio CI Confidence interval SEM Standard error of mean

Page 7: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

7

Introduction

Emergency Medicine The history of emergency medicine in the Western world is relatively brief. The baby boom era resulted in an immense increase in population after World War II, which consequently led to urban sprawl.

House calls by physicians became increasingly burdensome and primary care doctors began to triage patients away from the consulatation rooms directly to hospital care. The combination of vanishing house calls, increas-ing numbers of patients and economic support for hospital-based services, compelled patients to flock to hospitals for care. Hospitals began to reorgan-ise to form centres where actual life-sustaining care could be provided.

Emergency Medicine matured as a clinical speciality in the United States as early as the 1960s (1). Many other Western countries have not yet formal-ized this speciality, but all countries with Emergency Departments (EDs) have experienced similar problems during the past two decades i.e.- EDs overcrowded with indigent patients, especially in the large cities. The pres-sure on the EDs has been escalating due to the increasing number of patients, advancing technology in diagnostics and of course the dwindling financial resources available to the health care system.

The ED is the main gateway to the medical system and subsequently to health care expenditure. The future will undoubtedly see ongoing analyses of the balance between access, quality, and cost for populations throughout the entire health care system. Providers will increasingly come to rely on an evidence-based approach to health care decisions, using data from popula-tion-based research to justify promising new therapies and abandon treat-ments that add little to patient outcome.

The outcomes measured in emergency medicine research are often short-term (i.e. during the period of time spent at the ED), with less attention being paid to the impact that an ED visit has on an episode of illness that includes other places providing care. Although short-term outcomes such as admis-sion rate, speedy changes in physiology due to ED treatment, or time allotted for treatment are often important, population-based data will become more important in the future. What role does emergency medicine play in a popu-lation-based health care system? What quality does emergency medicine add to the health care system? These are significant questions, so it would there-fore be of immense value to empower some kind of objective quantification

Page 8: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

8

of illness for the patients attending the EDs. A physician’s ability to accu-rately assess illness severity is critical to the practice of medicine. Many therapeutic decisions are dependent on the accuracy of physicians´ clinical judgement of severity. As crucial as such assessments are, physicians are not explicitly taught to judge just how sick patients are. In fact, as Feinstein ar-gues; “We often do not define the word (sickness), but we know what it communicates: a sense of urgency in the need for treatment, or a poor prog-nosis, or a difference between two patients with the same diagnosed dis-ease…” (2). Allowing for several disease-specific exceptions, we really do not know how physicians make such assessments (3). In studies of therapeu-tic efficacy or effectiveness for instance, investigators must insure that pa-tients assigned to different treatments have equal susceptibility for the de-velopment of adverse outcomes, such as morbidity and mortality. If such prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic disparities rather than to the effects of treatment. As Simon argues with respect to cancer trials: “the variability in prognosis among patients is greater than the size of the treatment differences usually seen. Consequently, failure to understand and adequately account for patient heterogeneity easily leads to unreliable claims and inefficient trials” (4-5).

An objectified severity classification in the ED could be used to evaluate the use of hospital resources and to compare the efficacy of different EDs in a short term as well as a long-term perspective. A severity of illness classifi-cation in the ED combined with an accurate description of the disease could also prognostically stratify acutely ill patients and assist investigators com-paring the success of new forms of therapy.

There is currently no such validated, objective instrument for assessment of the severity of illness, available for use at the ED.

Scoring systems A number of systems have been developed over the recent years, to score the severity of illness based on the physiological response to the illness or in-jury, (6). They have mainly focused on the critically ill, and tend to be used for predicting the outcome of a group of patients rather than an individual. They are not usually specific enough to be used as the basis for making indi-vidual clinical decisions.

Acute Physiology And Chronic Health Evaluation II (APACHE II) is probably the best-known score and is used extensively in intensive care scor-ing. The original Acute Physiology And Chronic Health Evaluation (APACHE) system used 34 physiological variables, taking the worst value obtained in the first 24 h of admission to intensive care, combined with a simple evaluation of chronic health, whereby the patient was allocated to one

Page 9: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

9

of four classes, A to D (7). The aim of the APACHE system was to allow classification of patients on the basis of severity of illness, enabling com-parison between like groups for such purposes as outcome assessment and the appraisal of new therapies. APACHE II is a modification in which the physiological component has been simplified by a reduction from 34 to 12 variables. Scores are added for age and chronic disease, resulting in a possi-ble total score ranging from 0 to 71(8) (See table 1).

Table 1. The Apache II severity of disease classification system.

PHYSIOLOGICAL VARIABLE High abnormal range Low abnormal range

A +4 +3 +2 +1 0 +1 +2 3+ +4 TEMPERATURE- rectal >40.9 39-40.9 38.5-38.9 36-38.4 34-35.9 32-33.9 30-31.9 <30

MEAN ARTERIAL PRESSURE MmHg

>159 130-159 110-129 70-109 50-69 <49

HEART RATE >179 140-179 110-139 70-109 55-69 40-54 <39

RESPIRATORY RATE >49 35-49 25-34 12-24 10-11 6-9 <5

OXYGENATION: PaO2 >70 61-70 55-60 <55

ARTERIAL PH >7.7 7.6-7.69 7.5-7.59 7.33-7.49 7.25-7.32 7.15-7.24 <7.15

SERUM SODIUM (mmol/L)

<180 160-179 155-159 150-154 7.33-7.49 7.25-7.32 7.15-7.24 <7.15

SERUM POTASSIUM >7 6-6.9 5.5-5.9 3.5-5.4 3-3.4 2.5-2.9 <2.5

SERUM CREATININE (MG/100 ML) (Double point score for acute renal failure)

>3.5 2-3.4 1.5-1.9 0.6-1.4 <0.6

HEMATOCRITE (%) >60 50-59.9 46-49.9 30-45.9 20-29.9 <20

WHITE BLOOD COUNT >40 20-39.9 15-19.9 3-14.9 1-2.9 <1

GLASGOW COMA SCALE Score= 15 minus actual GCS

<5 5-7 8-10 11-13 >13

Total Acute Physiology Score (APS) Sum of the 12 indi-vidual variable points

B C AGE POINTS Assign points to age as follows

CHRONIC HEALTH POINTS APACHE II SCORE

AGE POINTS <45 0 45-54 2 55-64 3 66-74 5 >74 6

If the patient has a history of severe organ system insufficiency or is immunocomprimised assign points as follows:

a. For non-operative or emergency postopera-tive patients-5 points

Or

b. For elective postoperative patients-2 points

Sum of A + B + C

A APS points-------

B Age points--------

C ChronicHealth points-------- Total APACHE II________

Page 10: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

10

The authors concluded that APACHE II could be helpful for investigators to determine whether control and treatment groups are similar in intervention studies. In addition, they proposed that an expected death rate based on the APACHE II score could be compared to the actual death rate, as a test of therapeutic efficacy. It was also suggested that data from very carefully de-scribed groups of patients could be used for clinical decisions, when inte-grated with a particular patient’s overall clinical course.

APACHE II has been validated in both general (9-12) and surgical (13-17) intensive care patients. Some studies (13, 14) have found the score to have the best prognostic characteristics when applied to emergency surgical patients, as opposed to elective surgical and non-surgical patients.

The APACHE III score (18) is an additional modification, which takes into consideration 18 physiological variables and chronic ill health. Its accu-racy is comparable to that of APACHE II and other scores (10), although its use has been limited, as the equation required for calculating the predicted mortality has not been published and is only available by purchase from the distributor.

The Simplified Acute Physiology Score (SAPS) is another derivation of the APACHE score, using 14 of the original 34 variables (19) to predict death and is comparable to APACHE II (10). SAPS II is a revision of this score using 13 physiological variables, as well as form of admission (elec-tive or emergency, surgical or medical) and chronic health points if AIDS, metastatic malignancy or haematological malignancy is present (20). In a comparison with APACHE II (21), SAPS II provided a better assessment of mortality risk in intensive care patients, although neither system performed well enough to be considered a reliable indicator of outcome in the individ-ual patient.

The Mortality Probability Model (MPM) published in 1985 (22), was also based on data from ICUs in the United States. It was developed using logis-tic regression techniques to identify and assess the variables that were then included in the model.

A multitude of other predictive models have been constructed for a vari-ety of situations. Some are designed for the general ICU population (23). However, many are usually designed from and intended to be applied to specific patient populations. The Physiological and Operative Severity Score for enumeration of mortality and morbidity (POSSUM) has been devised specifically for prediction in surgical patients in general (24) and also in specific groups, such as patients with vascular and colorectal disease (25-26). There are others designed for prediction in patients with multiple organ failure (27-29), sepsis (30), acute coronary syndromes (31-32), stroke (33)and asthma (34).

Most of these specific models have been constructed for the evaluation of patients suffering from trauma. The Trauma Injury Severity Score (TRISS)

Page 11: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

11

(35) is widely used at trauma centres around the world but there are several others in practise (36-38).

The APACHE II, and most of the disease-specific scoring systems, in-clude several blood chemistry variables, which require time-consuming analyses and are therefore not suitable for rapid scoring in the emergency room.

The Rapid Acute Physiology Score, RAPS, an abbreviated version of APACHE II including the physiological variables pulse rate, blood pressure, respiratory rate and Glasgow coma scale, has been previously evaluated as a prehospital scoring system in a group of helicopter-transported patients (39). It was concluded that RAPS could be useful when applied as a complement to APACHE II, but may have a limited value when used separately.

Co-morbiditySeveral researchers during the past 25 years have demonstrated that the in-fluence of co-morbidity on a prognosis is also important. Kaplan and Fein-stein classified clinical ailments by three grades of severity in their study of the effect of co-morbidity on diabetic outcome (40). Gonella et al advanced the staging concept, which defines different levels of severity for specific diseases, based primarily upon manifestations and complications of each disease (41-43). Horn et al produced the Severity of Illness Index, a disease-generic method which determines severity from seven indicators: stage of the principal diagnosis, its interaction with other diagnoses, patient rate of response to treatment, residual impairment after therapy, complications de-pendent on medical care, and the extent of non-operating room procedures required (44, 45).

Horns` group has also developed the Computerized Severity Index, a dis-ease-specific tool which uses objective clinical findings to rate severity for each ICD-9-CM classification code (46). Computerized scoring provides a severity level score for overall severity from co-morbid conditions (46).

The Charlson co-morbidity index was originally designed to classify prognostic co-morbidity in longitudinal studies (47). It has been used in a number of studies to stratify patients, in order to control the confusing influ-ence of co-morbid conditions on overall survival (48-50).

Charlson et al. have furthermore emphasized the importance of physician clinical judgement in the determination of severity (51, 52). Assessments by resident physicians were used to gauge severity, based upon how sick they considered each patient to be. Residents also rated patient functional ability and gave their prognoses for the patient’s five-year survival. Follow-up stud-ies showed that predictors of 1-year survival after hospitalisation were func-tional ability, severity of illness, extent of co-morbid disease and physician prognosis for survival. (52).

Page 12: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

12

The importance of co-morbidity was also demonstrated by Greenfield et al, in their studies of the Co-morbid Index, which measures baseline co-morbid severity, acute exacerbations, and patient functional status (53).

Most of the current methods depend upon encounter form information and/or medical record audits for their data. While these methods have the advantage of being relatively easy to use, and do not encumber the medical provider with the need for data collection, they have the distinct disadvan-tage of producing second hand information from records, which is often an abbreviated version of what the medical provider knows about the patient. It could therefore be of considerable interest to combine a severity of disease classification system based on physiologic response to the illness, with one based on co-existent diseases.

The APACHE II system (8) includes a measure of co-morbidity, called Chronic Health Points but this only comprises information about the liver, cardiovascular, respiratory or renal systems. Some prognostic models for critically ill patients, such as SAPS (19), MPM (22) and RAPS (39) ignore co-morbidity entirely. Roy et al (54) concluded that use of detailed informa-tion about co-morbidity, captured by the Charlson Comorbidity Index, could improve the predictive accuracy of the APACHE II in critically ill patients.

In an attempt to add prognostic information to the RAPS in the ED, a sub-set of patients in this study was scored according to the Charlson Co-morbidity Index.

The process of developing a new predictive instrument To understand how these scores are used in clinical research and practice and to be able to compare one model with another, one must first be familiar with several methodological concepts. Most of the current models described above rely on logistic regression equations to provide mortality, and to a less extent, length of stay as indicators of effectiveness and quality of care. The “ideal” predictive instrument has probably not yet been invented, but in or-der to create a new acceptable scoring instrument it is important to point out a few “lessons” learned from mistakes made in earlier attempts (55).

Page 13: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

13

Check list for the creator of the “ideal” Scoring System

Table 2. A summary of the aspects of the development of a new severity of disease classification system presented as a checklist (Ref 55-65).

Important components of the development process Done ( )

Validation of the instrument e.g. split-sample technique [ ]

Evaluate discriminatory power of the predictive model Calculate ROC-curves, SEM and p-values

Evaluate calibration of the predictive model Goodness of fit analyse, Hosmer-Lemshow

[ ]

[ ]

Avoid selection bias [ ] Avoid detection bias [ ] Avoid lead-time bias [ ] Methodological rigour [ ] Large data-base [ ] Routinely collected treatment independent variables [ ] Balance between reliability and content validity [ ] As few subjective variables as possible [ ] As few variables as possible [ ] How to deal with missing values? [ ]

Validating the instrument The scoring system can be validated internally by using the so-called split sample technique (56) (a portion of the group from which the data were originally collected, but from which data were not used to develop the model), or a completely new sample (57).

Evaluating accuracy When a likelihood ratio for the scoring system has been provided from a logistic regression model, two fundamental concepts have to be considered when evaluating the accuracy. The explanatory power of a predictive system is described in terms of discrimination and calibration.

Discrimination refers to a model’s ability to distinguish whether a patient died or survived. It’s generally measured by the area under the receiver-operating characteristic (ROC) curve, which plots the sensitivity against the specificity of the model over a wide range of mortality cut-off points. As the area under curve (AUC) increases from 0.5 (random chance) to 1 (perfect discrimination), the accuracy and predictive power of the model improves.

Page 14: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

14

Generally an AUC of 0.8 or better is expected for mortality predictions in current models and scores of 0.7 or less are considered to be unacceptable (58, 59).

The calibration of a model refers to how well the predicted risk compares to the observed outcome over a stratified range of potential risks. These goodness-of-fit analyses can be displayed in several ways, such as bar graphs that plot actual to predicted mortality (calibration curves), and/or with chi-square-type statistic developed by Hosmer and Lemeshow (60).

Identifying sources of biases Selection bias The test sample does not adequately represent the population to which the model will be applied. The sample can be too small or it can lack breadth regarding a variety of disease processes. This flaw usually appears when the model is shown to be inaccurate in certain subgroups. The selection of vari-ables and their weightings can also be a potential source of selection bias. It is well known that relying on a physician’s interpretations (subjective vari-ables) is not as reproducible as laboratory tests (61).

Lead time bias Lead-time bias occurs when a model is applied after certain assumptions about the model are no longer true. In the case of APACHE II, lead time bias results when patients are partially treated prior to ICU admission (at the ED or another hospital) (62-64). Doing so violates one of the APACHE prem-ises, namely, that the score reflects the physiologic severity of the underlying cause for ICU admission, independent of any treatment that would subse-quently be given.

Detection bias Detection bias is found not only in predictive instruments, but also in any study where data are collected. It refers to the issue of missing data and how this issue is addressed. This has been noted to be a potential problem with the APACHE-models. The way in which this problem has been rectified by the authors is to assume a missing variable to be normal and to assign it a weight of zero. However, the assumption of normality for parameters that are not routinely obtained or are missing can lead to bias. For instance, the patient with chronic renal failure, who has an abnormally high creatinine level, may not have this laboratory test as a daily routine check-up.

Page 15: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

15

Another source of detection bias arises in mental status examinations, such as the Glasgow coma scale. Error can be introduced into this measure due to treatment of sedated or chemically paralysed patients.

There have been several attempts to reduce this potential source of bias. One of the most frequently used techniques is to design models that use as few variables as possible and yet still obtain an adequate relationship be-tween predictors and the outcome. Current methods under investigation in-clude the construction of a data hierarchy, which would decide which vari-ables could be left out without damaging the predictive ability, or actually weighing missing variables (65).

Balance between reliability and content validity Content validity reflects the comprehensiveness of a model (66). Mortality may be influenced by a number of factors that may be difficult to quantify, such as the number of organ systems failing and the duration of failure. Reli-ability refers to inter (between) and intra (within) observer agreement in the use of any severity of illness score and represents the agreement in data col-lection. Polderman et al (67) noted that the most common sources of persis-tent variability were difficulty in agreeing on chronic health conditions, in-complete data, mistaken exclusion of data and calculation errors. Therefore, a balance between content validity and reliability must be struck. The inclu-sion of many variables (over-fitting) may actually reduce the performance of the model because some of these variables will be correlated with the out-come solely by chance. The more variables required and the more complex they are to obtain, the less likely they are to be included in the instrument (68, 69).

Methodological rigour If a mathematical technique, such as multivariate logistic regression is used to create a predictive system, it must be based on a large database of all con-secutive eligible patients. It is also important that the variables used are rou-tinely collected and independent of intervention (65).

Applications of scoring systems The wide range of uses of predictive instruments has been described by Hyzy (70).

Page 16: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

16

ResearchScoring systems can be used to stratify patients into equal groups for pro-spective trials.

Furthermore, differences in baseline populations can be analysed and ex-amined for misleading influences. This technique has been used in a number of retrospective studies (71).

Resource Use With the APACHE II model Wagner and colleagues (72) were able to dem-onstrate the section of the ICU-population not receiving any intervention at all. In a study of almost 2000 patients, who were only monitored at the ICU, it was noted that approximately 70% of the patients had an APACHE II pre-dicted mortality of less than 10%. Of these 1400 patients, less than 5% re-ceived any kind of active therapy. It was noted in this study that the APACHE II could be applied to triage the patients into the ICU when there were more requests for beds than there were beds available. Becker and col-leagues (73) used APACHE III successfully to compare length of stay in the ICU between different hospitals, for patients who underwent coronary artery by-pass grafting. These models may help health care providers gauge whether they are using ICU resources in accordance to the institutions from which the models were constructed, or other institutions using the instru-ment. Efforts could then be made to correct deficiencies that quite possibly may be responsible for the potentially poor handling of resources.

One of the current shortcomings of severity scores is their failure to esti-mate functional outcome or morbidity. The combination of a high risk of mortality and significant reduction in functional ability could be a powerful tool for the decrease in intensive care support. Unfortunately, at the moment, not enough is known about the impact of critical illness on functional ability and quality of life to allow for the development of predictive scores.

Quality Assurance In one study, the Paediatric Risk of Mortality instrument was used to assess possible differences in the care of paediatric patients in tertiary and non-tertiary care centres. Comparisons could be made among the centres with regard to the care of patients, stratified by risk. It was found that the tertiary care centres were better equipped to care for children in the higher illness severity strata (74). However, differences in patient selection and process can masquerade as differences in quality of care, and the differences in pre-dicted outcomes cannot be used as the sole basis for quality control (75).

Page 17: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

17

Patient prediction In acute diseases it is not quite safe to prognosticate either death or recovery.

Hippocrates

The most controversial potential use for predictive models is to use them to affect individual case management. Especially for issues of withholding or withdrawing treatment (76, 77) these twin problems apply to the extremes of predicted risk. On average, at the extremes, clinical judgement seems to be equal to, if not better than predictive models (78, 79). However, variability in clinical estimates is considerable (80) and the tendency to overestimate risk clinically can be significant (81). Schafer et al. (82) conducted a study analysing APACHE II, SAPS and MPM in a population of almost 1000 pa-tients at an ICU. These investigators found areas under the ROC in the range of 0.75 and poor results in goodness-of-fit testing in all the major modern predictive models. Another more recent study examined the use of daily APACHE II scores and the use of high decision thresholds to avoid false predictions of death. Not only were there patients who were predicted to die who survived, but also the sensitivity of the instrument was reduced so much that the model was rendered practically useless (83). By definition, no sys-tem will ever be able to predict an unpredictable outcome. The decision to withdraw care from a patient will never be made easily or with absolute cer-tainty. This is true with or without the presence of predictive instruments.

Severity of illness scoring in the Emergency Department The purpose of this thesis was to create a scoring system for non-surgical patients attending the ED, regarding in-hospital mortality and length of in-hospital stay (LOS). RAPS, the abbreviated version of the widely practised and extensively evaluated scoring system APACHE II was chosen.

The greatest advantage with the RAPS system as a prognostic tool in the ED would be the simplicity of the scoring procedure, since the four parame-ters included are easily collected even in the emergency situation. However, it would be of great value to improve the predictive accuracy of RAPS with-out complicating the system and making it less available. Body temperature and peripheral oxygen saturation are both easy to obtain in the ED. Chrono-logical age is a well-documented risk factor for death from acute illness, independent of the severity of disease (8).

Co-morbidity is another potential component of interest in the effort to create an accurate emergency scoring tool regarding both short-term- and long-term mortality.

To develop this new predictive model, data were collected in more than 12 000 consecutive non-surgical patients admitted to the ED during the pe-riod of one year.

Page 18: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

18

Aims of the study

Major aim To define a Rapid Emergency Medicine Score (REMS) as a predictor of in-hospital mortality in the non-surgical ED and to see if there is any relationbetween REMS and length of stay in a cohort of consecutive 12 000 non-surgical patients.

Secondary aims To compare REMS with the more complicated APACHE II system regard-ing discriminating ability and predictive accuracy for in-hospital mortality in a sub-sample of the non-surgical patients admitted to the ED.

To investigate if the REMS could predict long-term mortality in non-surgical patients admitted to the ED.

To investigate if co-existing medical disorders (Charlson Comorbidity In-dex) in non-surgical patients attending the ED could predict short-term and/or long-term mortality, and if it could add prognostic information to the REMS.

Page 19: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

19

Methods

The total cohort From October 1995 to November 1996 data were prospectively and consecu-tively collected in 12 006 consecutive non-surgical adult entries to the ED, at the University Hospital of Uppsala, Sweden, which has a 1 200 bed capacity. All 6 physiologic measurements could be collected in 97.9 % of the entries. Basic characteristics of these 11 751 patients constituting the cohort are given in table 3.

Table 3. Baseline characteristics and hospital course of the 11 751 patients pre-sented to the ED.

Mean St dev Min max median

Age 61.9 20.7 15 102 67 Length of stay (days) 3.2 5.7 0 117 1 Sex (female) 51.6 % Hospitalised 55.9 % In hospital mortality 2.4 % Mortality within 48 h 1.0 %

If the reason for admission was cardiac arrest (n=52) and the patients could not be resuscitated they were excluded, since these patients per definition would achieve a maximal score and thereby would skew the analyses. Pa-tients with more than 1 parameter missing in the protocol were excluded (n=185). If only one parameter was missing (n=1393) the data were col-lected later from medical records. If this single parameter was not mentioned in the patient’s record (n=82) it was regarded as normal.

The data included were gender, age, the main symptom presented on ad-mission, i.e. the reason for attending the ED, and six physiological meas-urements: blood pressure, pulse rate, Glasgow coma scale, respiratory rate, peripheral oxygen saturation and body temperature.

The nurse in charge of the patients applied the eight-point standard ex-amination within 20 minutes following admission. The protocol was ap-proved by the Local Ethics Committee.

Page 20: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

20

The sub-samples Sub-sample no 1

A database was available at our ED with REMS scores on every patient admitted to the ED from November 1, 1995 to November 1, 1996 (n= 11751). However, aim no 2 included both REMS and APACHE II, and there were no resources allocated for such a tremendous undertaking. We com-promised by taking the REMS score prospectively in 885 patients from the first cohort, who were admitted to a medical department during January and February 1996. Since this was a comparison of a new scoring system and a previously extensively evaluated scoring system in critically ill patients with a high mortality rate, we regarded it necessary to get a case mix consistent of a substantial part with critically ill patients. Therefore all patients who were referred directly to the ICU from November 1, 1995 to November 1, 1996 (n=185) were also enrolled, and the laboratory tests of the APACHE II were collected retrospectively, in both groups. These two groups were merged into sub sample no 1.

The 185 critically patients were involved in the following way: A call was made from the prehospital setting (ambulance, helicopter) to the ED, after which the nurse in charge informed the ICU about the in-coming patient (general ICU: n=77, Neuro-ICU: n=23 and Coronary Care Unit: n=62) and noted this in the log book. This was our method of identifying the worst subjects. These patients were in very poor condition and were not stopped to be checked by the emergency physician, but were transported directly to the ICU. However our nurse in charge had been instructed about the procedure of taking the physiological parameters in REMS (Respiratory rate, heart rate, blood pressure, peripheral oxygenation and coma) so these parameters were taken in the ED during transportation to the ICU. If the reason for admission was cardiac arrest (n=9) and the patients could not be resuscitated in the field, they were excluded, since these patients per definition would achieve a maximal score and thereby would skew the analyses. In the first group 20 of the 885 patients had more than one physiological parameter missing and were therefore excluded. If only one parameter was missing (n=62) the data was collected a later date from their medical records. If this single parameter was not mentioned in the record (n=8), it was regarded as normal (respira-tory rate, n=4, pulse rate, n=4). The mortality rate in this subgroup (n=865) was 7,5% ± 0.26 SD and 4.2 % ± 0.2 died during the first 48 hours (n=62). In the second subgroup, 23 of the 185 patients had more than one REMS-variable missing and were therefore excluded. The mortality rate in this sub-group of the population (n=162) was 31.5% ± 0.47 and 14.8 % ± 0.36 died during the first 48 hours (n=62). Thus, 43 patients were excluded and the remaining total sub-sample no 1 consisted of 1027 patients.

Page 21: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

21

Sub-sample no 2 The sub-sample no 2 consisted of the 865 patients from the first cohort,

who were admitted to a medical department during January and February 1996. In the attempt to improve REMS by adding co-existing diseases (paper 4) we regarded this sample, not including the 185 critically ill patients and with a subsequently less complicated methodology, as an acceptable case mix regarding the mortality rate. The Basic characteristics of these patients comprising both the sub-samples are given in table 4 and 5.

Table 4. Baseline characteristics, hospital courses in a sub-sample of 1027 patients in whom APACHE II was collected.

Mean St dev Min max median

Age 70 18.1 16 100 75 Length of stay (days) 5.0 7.1 0 70 3 Sex (female) 50.4 % In hospital mortality 11.3 % Mortality within 48 h 5.8 %

Table 5. Baseline characteristics, hospital course s in a sub-sample of 865 patients in whom co-existing diseases were collected.

Mean St dev Min max median

Age 68.8 7.1 16 100 74 Length of stay (days) 4.7 7.1 0 70 3 Sex (female) 50.9 % In hospital mortality 7.5 % Mortality within 48 h 4.2 %

The scoring procedure APACHE II (4) uses a point score based upon values of 12 routine physio-logical measurements, as well as age and previous health status. The vari-ables included in the APACHE II system are: body temperature, mean arte-rial pressure calculated from systolic and diastolic blood pressure, respira-tory rate, heart rate, oxygenation of arterial blood (PaO2), arterial pH, serum sodium, serum potassium, serum creatinine, hematocrite, white blood count and Glasgow Coma Scale (GCS). The acute physiology score is determined from the most deranged physiologic values collected during the initial 24 h after ICU admission. As previous organ failure and age are well-documented risk factors for death from acute illness, independent of the severity of dis-

Page 22: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

22

ease, both are incorporated into the APACHE II score. The maximal APACHE II score is 71 (Table 1).

RAPS (5) was developed by including the variables in APACHE II easily obtained in a prehospital setting. These variables were pulse rate, blood pres-sure, respiratory rate and GCS. The scoring procedure was identical to that of APACHE II, except for GCS, where the relative weight was reduced by two thirds, compared with the APACHE II. The scoring range for each vari-able was 0 to 4 and the maximal score was 16 in the RAPS system.

In our attempt to improve the predictive power of RAPS, peripheral oxy-gen saturation (0-4 points) and body temperature (0-4) were added to the four RAPS variables. Age was also added to RAPS, and was weighted as in APACHE II (0-6 points). This scoring procedure is given in table 6.

Table 6. The scoring procedure for the parameters in the RAPS, peripheral oxygen saturation, body temperature and for age.

PHYSIOLOGICAL VARIABLE High abnormal range Low abnormal range

+4 +3 +2 +1 0 +1 +2 3+ +4 BODY TEMPERATURE >40.9 39-40.9 38.5-38.9 36-38.4 34-35.9 32-33.9 30-31.9 <30

MEAN ARTERIAL PRESSURE

>159 130-159 110-129 70-109 50-69 <49

HEART RATE >179 140-179 110-139 70-109 55-69 40-54 <39

RESPIRATORY RATE >49 35-49 25-34 12-24 10-11 6-9 <5

PERIPHERAL OXYGEN SATURATION

<75 75-85 86-89 >89

GLASGOW COMA SCALE <5 5-7 8-10 11-13 >13

TOTAL SUM OF SCORING POINTS

AGE POINTS Assign points to age as follows

AGE POINTS <45 0 45-54 2 55-64 3 66-74 5 >74 6

As in the original work by Charlson et al (47) all co-morbid diseases were recorded. Conditions that had completely resolved (i.e. a history of operation for currently inactive conditions such as cholecystectomy) were not counted as co-morbidity. The scoring procedure as shown in table 7 is performed

Page 23: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

23

according to Charlson’s description in her original work (47). The different disorders were weighted according to their impact on mortality.

Table 7. The scoring procedure for the Charlson Co-morbidity Index as originally described by Charlson et al (47).

Weight Condition

1 Myocardial infarct Congestive heart failure Peripheral vascular disease Cerebrovascular disease Dementia Chronic pulmonary disease Ulcer disease Mild liver disease Diabetes 2 Hemiplegia Moderate or severe renal disease Diabetes with end-organ damage Any tumour Leukaemia Lymphoma 3 Moderate or severe liver disease 6 Metastatic solid tumour

MeasurementsBody temperature was taken in the ear, as this method is the standard proce-dure in our ED. Saturation of arterial blood was measured with a pulse oxi-meter.

The nurse in charge assessed the patient’s level of consciousness – normal or abnormal. If it was abnormal, a note was made in the study protocol. In the case of deterioration of consciousness medical records were evaluated to collect information regarding the level of consciousness. The level of con-sciousness was not well described in 18 of these 699 patients. These 18 were excluded from the study. In the other 681 cases, the information was col-lected easily. Respiratory frequency was missing in 302 patients. This was however collected from 250, from their medical records, described by a number or by words, and then translated into numbers according to a specific protocol (See table 8). If the respiratory frequency was not mentioned at all, it was considered to be normal.

Page 24: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

24

Table 8. Scoring on respiratory frequency by information collected from medical records.

Characteristics of respiration score

“Moderate dyspnoea” 2 p “Fast breathing”,” severe dysp-noea”

4 p

Not mentioned 0 p “Slow breathing” “Sporadic apnoeas”

2 p

“Frequent apnoeas” “Long apnoeas”

4 p

Systolic and diastolic blood pressure was measured with an automated arm cuff and converted into mean arterial BP by the standard formula (SBP-DBP/3+DBP).

Information about co-existent diseases, as well as the number of drugs used was collected retrospectively from medical records.

The variables in APACHE II that were not collected in the REMS, i.e. se-rum sodium, potassium, creatinine, hematocrite, and white blood count were analysed from blood samples drawn in the ED in the sub-sample. Arterial pH was not used in the scoring system since this variable is not measured rou-tinely in the ED or in the medical department.

Follow-up The cohort was followed with a mean follow-up period of 4.67 0.29 SD years. Outcome events were assessed using data from the Swedish Hospital Discharge and Cause of Death Registries.

Statistical analyses Student’s unpaired t-test was used only to calculate differences between groups for normally distributed variables, such as age.

Differences between groups regarding scores, which have a skewed dis-tribution (REMS; APACHE II), were calculated with Mann-Whitney U-test or Kruskal-Wallis test.

The correlation between the scoring systems (REMS and APACHE II) was evaluated using the Spearman Rank Correlation Coefficient, as were correlations between scoring systems and length of stay in hospital.

Both univariate and multivariate logistic regression models were per-formed to calculate Odds ratios, likelihood ratios, 95% CI and p-values for the different scoring systems.

Page 25: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

25

When long-term mortality was investigated, the Cox Hazard regression analysis was used to generate Hazard ratios and likelihood ratios for the REMS system. Kaplan-Maier life-table analysis was also performed with the log-rank test, in order to evaluate differences between groups.

Discriminatory power for the scoring systems was assessed by using Re-ceiver Operating Characteristic (ROC) curves. The SEM and p-values for the ROC-curves, as well as comparisons between them were calculated by the Hanley and McNeil methods (63, 64).

Calibration of the model is an evaluation of the extent to which the esti-mated probabilities of mortality of the model correspond to observed mortal-ity rates. Calibration of REMS and APACHE II in study I and II, was evalu-ated with the Hosmer-Lemshow goodness-of-fit test (22).

Validity was assessed by using so called split-sample technique (61). The total sample was split into two equal parts and evaluated independently.

Statistic analyses were performed using the Stat-View 5.0 package. Hos-mer-Lemshow goodness of fit statistics was carried out using the Statistical Package for the Social Sciences (SPSS) for Windows.

P-values below 0.05 were considered as significant.

Page 26: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

26

Results

Rapid Emergency Medicine Score (REMS) can predict in-hospital mortality in non-surgical ED patients over a wide range of disorders (paper I) Univariate logistic regression in the total sample showed all 6 measured physiological parameters and age to be significant predictors of in-hospital mortality (n=285). When a multivariate analysis was performed, body tem-perature and blood pressure did not independently predict mortality, while the other 4 parameters did. Blood pressure was not however removed from the scoring system, since it is incorporated in both APACHE II and RAPS and is measured on a routine basis in all patients. Oxygen saturation contrib-uted to the predictive accuracy and age was the strongest predictor in the multiple logistic regression analysis (Table 9).

Table 9. Univariate and multiple logistic regression for all parameters in RAPS and for age, body temperature and oxygen saturation. Odds ratios for in-hospital mor-tality given for an increase of one point in the score.

Univariate analysis Multivariate analysis

Variable Odds ratio CI 95% P-value Odds ratio CI 95% P-value

0-4 saturation 2.26 2.08-2.46 0.0001 1.39 1.22-1.58 0.0001 0-4 resp frequency 2.86 2.55-3.20 0.0001 1.55 1.30-1.85 0.0001 0-4 pulse frequency 1.53 1.38-1.69 0.0001 1.22 1.10-1.36 0.0002 0-4 body temperature 1.85 1.61-2.12 0.0003 1.18 0.97-1.42 0.0915 0-4 coma 2.28 2.08-2.50 0.0001 1.60 1.42-1.80 0.0001 0-4 blood pressure 1.20 1.08-1.32 0.0003 0.94 0.84-1.04 0.2485 0-6 age 2.64 2.30-3.04 0.0001 2.30 2.00-2.67 0.0001

The modified system (REMS) was therefore defined as being the total sum-ming up of age, coma, respiratory frequency, oxygen saturation, blood pres-sure, pulse rate (maximal score being 4 for all) and age (maximal score be-ing 6).

The differences regarding scores, age and LOS between those who sur-vived and those who died during their hospital stay are shown in table 10.

Page 27: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

27

Table 10. Differences between those who survived and those who died in the cohort.

Survived (n=11466) Dead (n=285) P-value

Age 61.5 20.7 80.2 10.6 <0.0001 LOS 3.2 5.6 6.5 8.3 <0.0001 REMS 5.5 3.4 10.5 4.9 <0.0001 RAPS 1.8 1.74 4.0 3.9 <0.0001

By determining sensitivity and specificity for different cut off levels for each test (Table 11), Receiver Operating Characteristic curves (ROC-curves) (fig 1), were drawn, showing graphically the discriminating power of the score.

Table 11. Sensitivity and specificity determined for different cut off-points.

SCORE SURVIVED DEAD SENSITIVITY SPECIFICITY 1-SPECIFICITY (Non-case) (Case)

0 1406 0 1 0.12 0.88 1 21 0 1 0.12 0.88 2 1510 0 1 0.26 0.74 3 548 5 0.98 0.30 0.70 4 676 3 0.97 0.36 0.64 5 1167 5 0.95 0.46 0.54 6 1492 53 0.77 0.59 0.41 7 974 19 0.70 0.68 0.32 8 1660 29 0.55 0.82 0.18 9 798 21 0.45 0.89 0.11 10 584 16 0.37 0.94 0.06 11 312 18 0.32 0.97 0.03 12 143 6 0.25 0.98 0.02 13 71 7 0.23 0.99 0.01 14 34 7 0.21 0.99 0.01 15 31 6 0.18 1.0 0 16 13 6 0.16 1.0 0 17 8 7 0.14 1.0 0 18 3 10 0.10 1.0 0 19 3 8 0.07 1.0 0 20 5 3 0.06 1.0 0 21 5 10 0.03 1.0 0 22 1 3 0.02 1.0 0 23 1 1 0.01 1.0 0 24 0 3 0 1.0 0 25 0 1 0 1.0 0 Totals 11466 285

Page 28: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

28

RAPS exhibited weaknesses in the sensitivity and specificity of predicting death (0.652±0.019 SEM), being below the areas for the most extensively evaluated scoring systems such as APACHE II which often have AUCs be-tween 0.8 and 0.9 in different settings (8). The ROC-curve for the REMS system had an acceptable area (0.852±0.014) and showed a superior dis-criminating power compared to RAPS (p<0.001).

0,0

0,2

0,4

0,6

0,8

1,0

0 0,2 0,4 0,6 0,8 1

1-specificity

sens

itivi

ty

Figure 1. Receiver Operating Characteristic (ROC) curves show graphically the predictive power for each test. Sensitivity is plotted on the vertical axis, while 1-specificity (equal to the false positive rate) is plotted on the horizontal axis. The area under the curve determines the power of the test. RAPS ( ) exhibited weaknesses in the sensitivity and specificity of predicting death, (0.652±0.019 SEM). The ROC-curve for the REMS system ( ) had an acceptable area (0.852±0.014) and showed a superior discriminating power com-pared to RAPS (p<0.001).

REMS appeared to be a powerful predictor of in-hospital mortality resulting in a likelihood ratio chi-square value of 487 (p<0.0001) in the logistic re-gression model and an OR of 1.40 for each point increase (95% CI 1.36-1.45). Similar results were obtained when REMS was applied in each of the major patient groups separately (chest pain, dyspnoea, stroke and diabetes) (table 12).

Page 29: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

29

Table 12. Odds ratios with 95% confidence intervals and p-values for in-hospital mortality resulting from an increase of one point in the REMS for different symp-toms are presented. Number of subjects and number of deaths are given for each symptom.

Reasons forattending the ED

Odds ratio 95% CI P-value Subjects (n)

Mortality (n)

Chest pain 1.47 1.36-1.60 0.0001 3234 59 Stroke 1.38 1.25-1.53 0.0001 831 46 Coma 1.34 1.16-1.56 0.0001 63 18 Dyspnoea 1.32 1.25-1.40 0.0001 1261 69 General weakness 1.20 1.08-1.32 0.0007 722 43 Hyperglycaemia 1.46 1.12-1.89 0.0048 324 8 Asthma 1.27 1.03-1.58 0.0261 290 5 Fever 1.31 1.03-1.67 0.0270 112 6 Cough 1.30 0.99-1.70 0,0600 181 5 Arrhythmia 1.60 0.97-2.63 0.0649 419 3 Vertigo 1.29 0.85-1.90 0.260 729 2 Syncope 1.24 0.84-1.80 0.270 304 3 Epileptic seizures 1.06 0.73-1.54 0.750 245 2 Hypoglycaemia 1.00 0.999 22 2 Nausea, diarrhoea, pain in the leg etc

3014 0

Total 1.40 1.36-1.45 0.0001 11751 285

RAPS was also a significant predictor of in-hospital death with a likelihood ratio chi-square value of 261.2 (p<0.0001) and an OR of 1.47, (95% CI 1.41-1.54). The higher OR is explained by the lower maximal score in RAPS (16 compared with 26 for REMS).

REMS seemed to be a stable system throughout different ranges of age, grouped as in the APACHE age score. In the group with the youngest pa-tients there were only 2 deaths so REMS was not evaluated in that subgroup, regarding in-hospital mortality (See table 13). There was no difference in ORs between men and women.

Page 30: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

30

Table 13. Odds Ratios for each REMS point increase, number of subjects and num-ber of deaths for different age groups.

Subgroups of the population

Odds ratio 95% CI P-value Subjects (n)

Mortality n (%)

age < 45 ----- ----------- ------- 2699 2 (0.07%) age 45-54 1.60 1.38-1.86 0.0001 1476 7 (0.47%) age 55-64 1.27 1.05-1.53 0.015 1298 16 (1.2%) age 65-74 1.43 1.29-1.58 0.0001 2012 38 (1.9%) age > 74 1.23 1.18-1.28 0.0001 4265 222 (5.2%) Total 1.40 1.36-1.45 0.0001 11751 285 (2.4%)

Two clear-cut off points could be defined in the REMS (Fig 2,3). Firstly, all patients presenting with REMS < 3 survived. Secondly, at 24 and 25 points all the patients died (Fig 2,3).

1427

2063

1851

2538 2531

933

23879 34 24 23 6 4

0

500

1000

1500

2000

2500

3000

subj

ects

(n)

0-1

2-3

4-5

6-7

8-9

10-11

12-13

14-15

16-17

18-19

20-21

22-23

24-26

REMS

Figure 2. Frequency distribution for REMS.

Page 31: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

31

0 5 8 72 73 3724

14

13

18

134

4

0102030405060708090

100

Mor

talit

y %

0-1

2-3

4-5

6-7

8-9

10-11

12-13

14-15

16-17

18-19

20-21

22-23

24-25

REMS

Figure 3. In-hospital mortality in percentage and cases (n) for each REMS point.

The calibration of REMS was poor. The Hosmer–Lemshow goodness-of-fit statistic was performed with deciles of risk and was found to have a low degree of fit with chi-square 62 (p<0.0001) (se table 14).

Table 14. Calibration of the REMS with the Hosmer-Lemshow goodness of fit test (22) comparing the observed mortality with the expected mortality within deciles of risk.

Deciles of risk

Observed alive

Expectedalive

Observed dead

Expecteddead

Total

1 1406 1403 0 3 1406 2 1531 1526 0 5 1531 3 1224 1224 8 8 1232 4 1167 1160 5 12 1172 5 1492 1524 53 21 1545 6 974 974 19 19 993 7 1660 1658 44 46 1704 8 1382 1371 50 61 1432 9 630 626 106 110 736

REMS was validated by the so-called split-sample method (61), i.e. the pre-dictions made on the first 50% (n=5 375) of the patients included in the study were compared with those of the second half (n=5 376) of the popula-tion. The odds ratios and their 95% confidence intervals for the two parts of the population were almost identical and are presented in table 15.

Page 32: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

32

Table 15. Validation of the REMS with split-sample technique. Odds Ratios for each point increase, area under receiver operating curve (AUC) and Goodness of fit (GOF) with Chi-square values in both the sample sets. Odds

ratioCI Likeli-

hoodratio

P-value AUC GOF No of cases

No of dead

REMS in population No 1

1.40 1.34-1.46 261 <0.0001 0.832±0.016 35.3 df=7

5876 145

REMS in population No 2

1.41 1.34-1.48 226 <0.0001 0.862±0.018 31.7 df=7

5875 140

REMS also appeared to be a powerful predictor of short-term mortality (n = 118 who died within 48 h) resulting in a likelihood ratio chi-square value of 322 (p<0.0001) in the logistic regression model and an OR of 1.46 for each point increase (95% CI 1.40-1.52). Similar results were obtained when REMS was applied in each of the major patient groups separately (chest pain, dyspnoea, stroke and diabetes).

There was a modest correlation between REMS and length of stay (me-dian days 3.3, range 0 to 117, Spearman’s rank coefficient rs=0.47, p< 0.0001), and a rather poor correlation between RAPS and LOS (rs=0.22, p=0.0001)

0

20

40

60

80

100

120

leng

th o

f sta

y (d

ays)

0 5 10 15 20 25 30

REMS

Figure 4. Regression plot for REMS and LOS.

Page 33: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

33

A comparison of APACHE II and REMS at the ED (paper II) As shown in figure 5, there was a strong correlation between REMS and APACHE II in the sub-sample no1.

0

5101520

25

3035

4045

APA

CH

E II

0 5 10 15 20 25REMS

Figure 5. Correlation between the two scoring systems REMS and APACHE II. (Spearman rank coefficient rs=0.87, p=0.0001)

REMS was a powerful predictor of in-hospital mortality (n=116) in the sub-sample no 1, resulting in a likelihood ratio chi-square value of 318.7 (p<0.0001) in the logistic regression model and an OR of 1.58 for each point increase (95% CI 1.48-1.70).

APACHE II was also a significant predictor of in-hospital death with a likelihood ratio chi-square value of 278.5 (p<0.0001) and an OR of 1.25 (95% CI 1.21-1.29). The lower OR for APACHE II is at least partly ex-plained by the lower maximal score in REMS (26 compared with 40 for APACHE II).

Figure 6 shows the ROC-curves for REMS and APACHE II regarding in-hospital mortality. The two methods showed a good discrimination with no difference between the areas under curves. (0.911 ± 0.015 SEM for REMS and 0.901 ± 0.015 for APACHE II).

The p-value was 0.218 when the two areas were compared by the Hanley-McNeil-method (64).

Page 34: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

34

0,0

0,2

0,4

0,6

0,8

1,0

0 0,2 0,4 0,6 0,8 1

1-specificity

sens

itivi

ty

Figure 6. Receiver Operating Characteristic (ROC) curves for REMS ( ) (0.911 ± 0.015) and APACHE II ( (0.901 ± 0.015) regarding in-hospital mortality did not differ (p-value: 0.218).

One clear cut off point could be defined in REMS as well as in the APACHE II system (Fig 7 and 8). All patients presenting with REMS < 6 and APACHE II < 6 points survived.

Page 35: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

35

6482

136

209

263

123

4733 24 21 18 7

0

50

100

150

200

250

300

subj

ects

(n)

0-1

2-3

4-5

6-7

8-9

10-11

12-13

14-15

16-17

18-19

20-21

22-24

REMS

0 0 8 911

12

21

18 1615 6

0

20

40

60

80

100

Mor

talit

y %

0-1

2-3

4-5

6-7

8-9

10-11

12-13

14-15

16-17

18-19

20-21

22-24

REMS0

Figure 7. Frequency distribution of the REMS score (top panel) and in-hospital mortality in the sub-sample no 1.

Page 36: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

36

78102

287

240

124

5238

25 24 3011 10 3 3

0

50

100

150

200

250

300

subj

ects

(n)

0-2

3-5

6-8

9-11

12-14

15-17

18-20

21-23

24-26

27-29

30-32

33-35

36-38

39-40

APACHE II

0 6 11 8

10

11

12

18

18

9

9

2 2

0

20

40

60

80

100

Mor

talit

y %

0-2

3-5

6-8

9-11

12-14

15-17

18-20

21-23

24-26

27-29

30-32

33-35

36-38

39-40

APACHE II

0

Figure 8. Frequency distribution of the APACHE II score (top panel) and in-hospital mortality in the sub-sample no 1.

Model calibration, as indicated by the Homer-Lemeshow goodness of fit test, demonstrated that there was no difference between observed and pre-dicted mortality for REMS and APACHE II. (Chi-square value for REMS: 9.3, p=0.23 and chi-square value for APACHE II: 8.7, p=0.36) (See tables 16 and 17).

Page 37: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

37

Table 16. Calibration of the REMS with the Hosmer-Lemshow goodness of fit test (22) comparing the observed mortality with the expected mortality within deciles of risk. A 2 x 10 table is produced with ten strata of probabilities. The expected out-comes were then compared with the observed outcome for each decile. The calibra-tion turned out to be acceptable (chi-square-value: 9.3, p=0.23).

Deciles of risk

Observed alive

Expectedalive

Observed dead

Expecteddead

Total

1 123 123 0 0 1232 63 63 0 0 633 96 95 0 1 964 121 122 3 2 1245 80 83 5 2 856 151 151 6 6 1577 103 99 3 7 1068 112 109 11 14 1239 62 67 88 83 150

Table 17. Calibration of the APACHE II with the Hosmer-Lemshow goodness of fit test (22) comparing the observed mortality with the expected mortality within dec-iles of risk. A 2 x 10 table is produced with ten strata of probabilities. The expected outcomes were then compared with the observed outcome for each decile. The cali-bration turned out to be acceptable (chi-square-value: 8.7, p=0.36).

Deciles of risk

Observed alive

Expectedalive

Observed dead

Expecteddead

Total

1 101 100 0 1 101 2 79 78 0 1 79 3 101 99 0 2 101 4 78 79 3 2 81 5 102 102 3 3 105 6 86 86 3 3 89 7 81 83 6 4 87 8 110 106 3 7 113 9 94 97 14 11 108 10 79 80 84 83 163

There was a significant correlation between REMS and length of stay in hospital (median days 1, range 0 to 117) but the Spearman’s rank coefficient was rather poor (rs=0.23, p< 0.0001). The correlation between APACHE II and LOS was also significant with a similar Spearman’s rank coefficient (rs=0.22, p< 0.0001).

Page 38: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

38

REMS can predict long-term mortality in non-surgical ED patients (paper III) Kaplan-Meier cumulative survival plot for the 4.7 years follow-up in the total sample is given in fig 9.

0

,2

,4

,6

,8

1

Cum

. Sur

viva

l

0 500 1000 1500 2000

TIME (days)

Figure 9. The Kaplan-Meier cumulative survival plot in the total sample during the 4.7-year follow up.

When the actual number of deaths in the cohort over the entire follow-up period was compared to general population age- and sex-matched data ob-tained from official mortality data from the same region of Sweden, the rela-tive risk was significantly increased in the non-surgical patients attending the ED in this study (2.42, 95% CI 2.34-2.50, Wald test).

The Cox proportional-hazard model was used to evaluate how well REMS could predict long-term mortality over a mean ( SD) of 4.67 (0.29) years of follow-up. Both REMS and age were strong predictors of mortality at one week, one month, three months, one year and at 4.7 years (table 18). When forced into a multivariate model together with age, REMS could still predict mortality at the same points independent of age.

Page 39: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

39

Table 18. Hazard ratios for REMS and age in separate models as well as forced into the same multivariate model to show that REMS can predict mortality at each point independent of age. No of survivors are presented at each point.

Univariate analysis Multivariate analysis

No of survi-vors

For a one-unit increase in:

HR CI P-value HR CI P-value

1 week REMS 1.34 1.30-1.37 0.0001 1.30 1.26-1.34 0.0001 11503 Age-score 1.81 1.62-2.03 0.0001 1.29 1.14-1.46 0.0001 1 month REMS 1.30 1.27-1.32 0.0001 1.24 1.21-1.27 0.0001 11286 Age-score 1.77 1.63-1.92 0.0001 1.36 1.24-1.48 0.0001 3 months REMS 1.29 1.27-1.31 0.0001 1.20 1.17-1.22 0.0001 10908 Age-score 1.82 1.71-1.94 0.0001 1.47 1.38-1.58 0.0001 1 year REMS 1.26 1.24-1.28 0.0001 1.13 1.11-1.15 0.0001 10171 Age-score 1.73 1.66-1.80 0.0001 1.50 1.43-1.58 0.0001 4.7 years REMS 1.26 1.25-1.27 0.0001 1.10 1.08-1.12 0.0001 8112 Age-score 1.84 1.79-1.89 0.0001 1.69 1.64-1.75 0.0001

Similar HRs were obtained when REMS was applied in most of the major patient groups separately, such as chest pain, dyspnoea, stroke, diabetes, while somewhat lower HRs (although highly significant) were seen for pa-tients with coma and general weakness (table 19).

Page 40: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

40

Table 19. Hazard ratios with 95% confidence intervals and p-values for in-hospital mortality during a 4.7 years follow-up resulting from an increase of one point in the REMS for different complaints are presented. Number of subjects and number of deaths are given for each symptom.

Each point increase in the REMS

Reasons for attendingthe ED

Hazard ratio 95% CI P-value Subjects (n) Mortality (n)

Chest pain 1.31 1.28-1.34 0.0001 3234 822Stroke 1.18 1.14-1.23 0.0001 831 399Coma 1.12 1.07-1.18 0.0001 63 42Dyspnoea 1.23 1.20-1.25 0.0001 1261 675General weakness 1.11 1.08-1.15 0.0001 722 498Hyperglycaemia 1.28 1.20-1.35 0.0048 324 142Asthma 1.20 1.14-1.27 0.0001 290 83Fever 1.24 1.14-1.35 0.0270 112 55Cough 1.28 1.19-1.37 0.0600 181 61Arrhythmia 1.37 1.24-1.50 0.0001 419 57Vertigo 1.25 1.18-1.32 0.0270 729 172Syncope 1.32 1.22-1.42 0.0270 304 69Epileptic seizures 1.24 1.16-1.32 0.0075 245 56Hypoglycaemia 1.28 1.02-1.60 0.0300 22 7Headache 1.42 1.30-1.56 0.0001 542 43Intoxication 1.37 1.27-1.48 0.0001 483 51Allergic reactions 1.62 1.34-1.96 0.03 413 12TOTAL 1.26 1.25-1.27 0.0001 11751 3640

In fig 10 the total population was split into a high-risk, an intermediate-risk group and a low-risk group. The gap between the low-risk and the interme-diate-risk groups and between the intermediate-risk and the high-risk groups increased continuously during the whole follow-up period, indicating that deviations of REMS influence mortality even several years following the visit to the ED.

Page 41: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

41

0

,2

,4

,6

,8

1

Cum

. Sur

viva

l

0 1 2 3 4 5 6Time (years)

Figure 10. Kaplan-Meier cumulative survival plot for 4.7 year survival in a low-risk population (1) , REMS <6, an inter-mediate-risk population (2), REMS 6-13 and a high-risk population, REMS>13 (3).

1

3

2

Page 42: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

42

Information on coexisting disorders can add prognostic information to Rapid Emergency Medicine Score as a predictor of long-term mortality (paper IV) REMS, was not strongly related to the indices of co-morbidity (r = 0.38, p 0.0001 vs. CCI (fig 11).

0

3

6

9

12

15

18

21

24

REM

S

0 2 4 6 8 10 12

Charlson Comorbidity Index

Figure 11. Correlation between the two scoring systems Charlson comorbidity index and REMS (Spearman rank coefficient rs=0.38, p=0.0001).

A univariate Cox proportional hazard analysis showed REMS, as well as Charlson Comorbidity Index (CCI) to be significant predictors of short-term as well as long-term mortality (table 20).

When a multivariate analysis was performed, however, CCI could not predict short-term mortality independently while REMS did. Both REMS and CCI could predict long-term mortality independently.

Page 43: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

43

Table 20. Hazard ratios for REMS and CCI in separate models and forced into the same multivariate model to show that CCI can add prognostic information to REMS in a long-term but not in a short-term perspective.

Univariate analysis Multivariate analysis

No of survi-vors

For a one-unit increase in:

HR CI P-value HR CI P-value

3 days REMS 1.31 1.25-1.38 0.0001 1.34 1.26-1.41 0.0001 826 CCI 1.12 0.99-1.27 0.068 0.87 0.74-1.03 0.1097 days REMS 1.33 1.28-1.39 0.0001 1.35 1.29-1.42 0.0001 814 CCI 1.15 1.04-1.28 0.01 0.910 0.79-1.047 0.18730 days REMS 1.32 1.28-1.38 0.0001 1.32 1.27-1.38 0.0001 787 CCI 1.20 1.11-1.31 0.0001 1.02 0.92-1.13 0.707 CCI90 days REMS 1.33 1.29-1.38 0.0001 1.32 1.27-1.37 0.0001 761 CCI 1.23 1.15-1.32 0.0001 1.07 0.98-1.17 0.10 CCI1 year REMS 1.29 1.24-1.32 0.0001 1.26 1.22-1.30 0.0001 681 CCI 1.25 1.19-1.32 0.0001 1.16 1.09-1.23 0.00013 years REMS 1.26 1.23-1.30 0.0001 1.24 1.20-1.27 0.0001 562 CCI 1.27 1.22-1.32 0.0001 1.18 1.13-1.24 0.00014.7 years REMS 1.25 1.22-1.28 0.0001 1.22 1.19-1.25 0.0001 471 CCI 1.28 1.23-1.33 0.0001 1.20 1.15-1.25 0.0001

Between CCI and length of stay (median days 3.0, range 0 to 70), there was a rather poor correlation (Spearman’s rank coefficient, rs=0.25, p< 0.0001).

Page 44: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

44

Discussion

Development of REMS The present study showed that RAPS, developed for a pre-hospital setting, was a predictor of in-hospital death in non-surgical patients admitted to the ED. However, addition of information on peripheral oxygen saturation and age to the RAPS system, which resulted in REMS, improved the predictive ability considerably and was applicable and equally powerful in the major patient groups, i.e. those with chest pain, coma, stroke, diabetes or dyspnoea. The REMS system offers rapid and easily performed scoring procedure with parameters accessible in the ED, and is independent of later treatment.

Since peripheral oxygen saturation is measured in all patients at the ED, it would therefore be considerably easy to develop the RAPS system further, by adding this parameter. As age was the strongest predictor of mortality in the multiple regression model it seemed natural to incorporate this variable, weighted as originally described in the APACHE II. Blood pressure did not independently predict mortality in the present study. In the original APACHE II work however, BP was measured invasively and in most sub-jects continuously. In this study precision may have been lost by the calcula-tion of mean blood pressure from a single non-invasively measured SBP and DBP.

REMS combined with an accurate description of the disease could proba-bly be used to stratify acutely ill patients and assist investigators comparing the success of new or differing forms of therapy. This scoring index could also be used to evaluate the use of hospital resources and compare the effi-cacy of EDs at different hospitals, or a specific hospital, over a period of time. Further research will determine if REMS could also be useful as a se-verity of disease classification system for surgical patients at the ED. Speci-fied trauma scores are used (35-38), but other surgical patients exist, such as those with acute abdominal disorders. One major advantage with REMS is its usefulness across different non-surgical patient groups. Disease-specific scoring systems already exist, such as those describing acute coronary syn-dromes (31, 32), stroke (33) and asthma (34), but most of these specific systems require collection of data not easily available at the ED.

The association between LOS in hospital and REMS was highly signifi-cant, but the Spearman rank coefficient was modest. One possible explana-tion for this may be that LOS reflects not only the acute severity of illness,

Page 45: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

45

but also general morbidity and cognitive function in elderly patients. Thus, the ability of REMS to predict the LOS in hospital is limited.

Evaluation of REMS The discriminatory power was acceptable for a severity of illness scoring system according to the established norms (62). REMS showed a reasonable potential with an area under the ROC-curve of 0.85.

The ROC-curve did not reveal any obvious cut-off level for RAPS. The RAPS system cannot therefore be used for clinical decisions, such as with-holding treatment, in individual patients.

However, REMS, the modified version of RAPS, showed clear cut-off levels with no deaths below 3 points and 100% mortality above 24 points. Although these results seem straightforward, it would be premature to use REMS as a tool to withhold treatment before the results have been con-firmed in other studies. Even so, the overall clinical course of the particular patient must be prioritised in the decision of treatment, but REMS could probably be used as an additional tool to help in this decision process.

The validity of the model carried out using split sample technique was good,with Odds ratios and likelihood ratios similar in both groups.

The calibration however, was poor with a chi-square value of 62 (p < 0.0001). This is consistent with several other studies in which different pre-dictive models have been valuated (11, 21, 75, 84). In the assessment of the performance of five intensive care scoring models within a large Scottish database (84), all models (SAPS, APACHE II and III, and MPM) had Hos-mer-Lemeshow goodness-of-fit tests that were statistically significant (p<0.001). This indicates a statistical significant difference between ob-served and predicted mortality in all cases, i.e. poor calibration of the mod-els. It is well known though, that small differences can achieve significance with large sample sizes. This sensitivity to larger patient numbers was dem-onstrated by Zhu et al (85). Given the size of our study population, it may therefore not be surprising that the goodness-of-fit tests were significant. This is further demonstrated in the analysis of calibration of REMS in the sub-sample in which the goodness-of-fit test was no longer significant.

REMS vs. APACHE II APACHE II is probably the most widely used and extensively evaluated scoring system for assessing the severity of illness, which has been devel-oped for ICUs during the past three decades. In view of the vast experience

Page 46: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

46

connected with this scale in the ICU, we conducted a comparative study between APACHE II and the Rapid Emergency Medicine Score (REMS) in the non-surgical ED.

Two questions were applied before the start of this study. Could the APACHE II system, developed for use on critically ill patients in an ICU setting, perform equally well as a scoring tool even in non-surgical patients attending the ED? Could the REMS scoring system, with its simplicity and fewer variables, described as a predictive model at the ED in the first aim of this thesis, perform as well as APACHE II?

At least three important established statistical methods must be consid-ered when comparing two scoring systems. Firstly, a linear regression model has to be performed, to show if there is a correlation between the methods. Secondly the discriminatory power must be compared and p-values calcu-lated according to one of the established methods, in this thesis the Hanley and McNeil Wilcoxon statistics method. Thirdly, a logistic regression model must be performed and the calibration compared between the two systems. The calibration technique is based on comparing the observed mortality with the expected mortality within deciles of risk. In this work the Hosmer -Lemeshow method is used. Furthermore, the availability of the different variables in the two systems, i.e. the usefulness in clinical practice, has to be discussed.

Both REMS and APACHE II appeared to be powerful predictors of in-hospital mortality in this study population. The chi-square value for APACHE II was slightly higher compared with REMS, but they were both highly significant. The Odds ratio was higher for REMS than for APACHE II. This is probably mainly due to the lower maximal score in REMS (26 compared with 40 for APACHE II). It is not possible to compare the predic-tive accuracy in terms of Odds ratios when the scales are different.

As could be expected there was a strong correlation between the two pre-dictive models, as the APACHE II includes all variables included in REMS. When ROC-curves were drawn for both of the scoring systems, both REMS and APACHE II had very good discriminatory power with areas under the curves (AUC) of more than 0.9 that did not differ according to the method of Hanley-McNeil. The calibrations were also adequate for both systems.

The similarity of the discriminatory power and the calibrations of the two systems is a strong argument for using the simpler scoring system. REMS requires fewer measurements to be collected than the APACHE II scoring system, causes less discomfort for the patient, is less expensive and can be calculated quicker, as no time consuming blood chemistry analyses are in-volved.

A simple standardized scoring system such as REMS, valid for a majority of pathologies would largely eliminate the need for specific scoring systems, thereby facilitating inter-ED comparisons of treatment and management. Although REMS cannot replace highly specific scoring systems, such as

Page 47: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

47

those used for burn patients or patients with myocardial infarction, it is an efficient indicator of mortality over a wide range of disorders. Although it is possible to improve a score by changing the variables, modifying weights, and including information on diagnosis, this also complicates the score, con-sequently prohibiting its routine application. However, although all scores have limitations, the REMS has the advantage of being simple, inexpensive and reliable. As both REMS and APACHE II performed adequately and almost equally well regarding calibration and discrimination in the ED set-ting there should be no doubt as to the preference of REMS in this situation.

Long-term mortality The present study showed that REMS was a predictor of mortality at one week, one month, three months, one year, and 4.7-year mortality in the non-surgical ED. It was found that REMS could independently predict long-term mortality even with age as a covariate. It is logical to assume that a de-rangement of the physiological parameters in REMS can have an impact on short-term mortality, but how such a score could predict long-term outcome seems less obvious. One explanation could well be that deviations in physio-logical variables during an emergency episode mirror a specific individual’s response to acute disease. The extent, to which a specific patient deteriorates in physiological parameters when exposed to an illness, could also be the crucial point for risk assessment during a long-term follow-up period.

The population was split into low-risk, intermediate-risk and high-risk populations. A life-table analysis showed then that the risk of dying in-creased during the complete follow-up period in those with REMS > 13 compared with those with REMS < 6 and in those with REMS 6-13 com-pared with the low-risk group. This plot shows that a derangement in physio-logical variables not only affects mortality in the short-term perspective but also indicates a relation between the physiologic response to illness on a specific occasion and the risk of dying over a longer period.

Co-morbidityThe Charlson Co-morbidity Index is one of the few validated indices cur-rently available, which categorises medical diagnoses based on case note review (47). One limitation of this study is that this type of approach could be subject to error due to incomplete documentation. Some disorders thought to be irrelevant at the time of clerking may not have been included in the assessment.

Furthermore, CCI is an inventory of disease, but it does not include any quantitative assessment of severity. Crabatree et al demonstrated a combined

Page 48: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

48

disease inventory and assessment of symptom severity scale, in the Co-morbidity Symptom Scale (86). This was administered by interviews in which the patient’s perspective of the severity of disease was determined.

Maybe a severity of symptom assessment such as this would stand a bet-ter chance of adding prognostic information to REMS at the non-surgical ED even in a short-term perspective. In the original work by Charlson and sev-eral other studies evaluating the CCI (47-50), the outcome has been long-term mortality. Incalzi et al could demonstrate that the arithmetical sum of coexistent disease cannot predict in-hospital mortality (87) and these find-ings are consistent with our study.

The REMS has been shown in several studies to possess an adequate pre-dictive accuracy and has the advantage of being simple to use, quick and cheap, and as a prognostic tool for short-term mortality in the non-surgical ED, REMS alone would suffice. It would, however, be of immense interest to study the combination of REMS and quantative measurements of comor-bidity on a larger population to get an even more accurate predictor of long-term mortality to serve as a support for ED-physicians in their communica-tion with other physicians, patients and their relatives about prognosis.

Strengths and limitations The strength of this thesis was the design of the study, which used a large sample of patients consecutively included and followed in a prospective manner during a prolonged period of time (4.7 years). The in-hospital mor-tality in the total sample was low, (2.4 %) but was compensated for by the size of the data set (11 751 subjects). Furthermore, all the three major de-mands needing to be considered when constructing a new severity of disease classification system were fulfilled.

The predictive power, expressed in Odds ratios was calculated in a multi-ple logistic regression model. The validity was evaluated by the split sample technique (56). This method has reached standard level and, as such, is gen-erally accepted by the medical community.

The discriminatory ability was assessed and presented in ROC curves and these were compared using an established method (Hanley and McNeil Wil-coxon statistics method (58). The discrimination as assessed by the area un-der the ROC-curve is important because it shows how many patients are correctly classified as survivors, respectively nonsurvivors by the model. The calibration of the scoring system i.e. the evaluation of the extent to which the estimated probabilities of mortality of the model corresponded to observed mortality rates was calculated with the Hosmer- Lemeshow good-ness of fit test (22). Most of the widely used scoring systems created for the ICU setting are evaluated by this method.

Page 49: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

49

Thus, the established methodology in this field was strictly followed in the process of developing the Rapid Emergency Medicine Score.

One of the limitations in this thesis was that these data represent patients from a non-surgical ED, at a university teaching hospital. We must therefore be cautious in generalizing the results presented here, when comparing with a non-university or a surgical ED.

The discrimination and validity were satisfactory for the REMS, but the calibration of the model was poor. Poor fit along with good discriminative ability have been previously reported in several studies for APACHE II and SAPS II in populations other than those for which the models were origi-nally developed (11, 21, 75, 84). There are several explanations given for this outcome of the Hosmer–Leme goodness of fit test. Zhu et al (85) claimed that a large amount of data is significant even with small differ-ences, and this may be part of the explanation, even if APACHE II per-formed well in this aspect in the original work (8). Another explanation could be variation of the data quality. Teres and Lemeshow showed consid-erable variation of data quality despite low inter-rater variability (88). No check up for the inter-rater variability was performed in our study, allowing additionally for this bias. The length of hospital stay can influence the ob-served mortality. One study comparing LOS and in-hospital mortality as well as 30-day mortality revealed higher hospital mortality in a region with longer LOS but identical 30-day mortality (89). Knaus and co-workers in-cluded a LOS variable in their estimation of probability of hospital death and showed a higher mortality in hospitals with a longer average LOS (90).

Compared with the total sample the sub-sample populations were rather small (n=1027 and n=865). It would of course have been more elegant to score all the 11 751 patients regarding the APACHE II and the Charlson Co-morbidity Score, but there were no resources allocated for such a tremen-dous undertaking. As a compromise, we scored the most degenerated pa-tients in the total cohort (n=162) and added them to the population consecu-tively being admitted to the hospital over a period of two months, in order to achieve a certain number of in-hospital deaths. This procedure gave a mor-tality rate in the sub-sample of 11.3% compared with 2.4 % in the total co-hort. These critically ill patients were identified in the logbooks at the ED, in which a note was made for every patient who was transferred directly to the ICU due to unstable conditions. These inclusion criteria in the sub-sample changed the case mix considerably compared with the total cohort, and con-sequently the results cannot automatically be applied to the general non-surgical ED setting.

However, the comparison of the REMS vs. APACHE II in the sub-sample no 1 and the evaluations of CCI in the sub sample no 2, were all undertaken in the same population and the results seemed straightforward. The OR was higher for REMS in the sub-sample no 1 than in the total cohort indicating that REMS is even more important in a high-risk non-surgical ED.

Page 50: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

50

To summarise, the table from the introduction states that the first three main items in the “check list” have been scrutinized (See table 2). It has however not been possible to get a completely clear-cut table regarding the three common types of biases, methodological rigour and reliability. This has been the case for most of the widely used scoring systems in the original works, as well as in the further evaluating studies.

Selection bias strikes at the core of a predictive instrument’s design, data collection, and construction phases. This prejudice can be perpetuated in two ways: (1) In the selection of the population to be studied and on which the instrument will be based and (2) in the selection of variables to be included for evaluation and eventually incorporated into the model. A considerable potential bias will be found if a model consists entirely of patients at one particular hospital, as in our study. Essential qualities of this specific patient population are then incorporated into the model, and are not adulterated by patients from other institutions. REMS should be tested in a multicentre study, in order to rectify such flaws. There can also be a selection bias sim-ply in the type of variables chosen if they rely on physician’s interpretations, e.g. as with coma in our system.

Lead-time bias is perhaps a greater potential problem in the ICU setting in which the patients are often treated, prior to admission to the ED. Patients attending the ED may however have been treated during transport, e.g. with oxygen.

Detection biases have been noted to be a potential problem with the APACHE II model (8). This problem has been rectified in this thesis by as-suming the missing variable to be normal if it is not mentioned in the medi-cal record. Such a retrospective approach was not undertaken in the APACHE II. Another source of detection bias emerges in the mental status examinations, such as the Glasgow Coma Scale. Due to treatment, sedated or chemically paralysed patients cause introduction of error into this measure (91). However, this is uncommon at the ED because patients rarely receive sedative drugs during transport.

Polderman et al claimed that there was a degree of variability inherent in the APACHE II of about 10 to 16% even if strict educational guidelines were incorporated for the physicians to assess the score (67). A certain de-gree in measurement variability is generally the case in all research and a limitation in this study is the lack of test for reliability in terms of inter-observer agreement. It should be understood that in physiology-based sever-ity models there is a presumption that the “noise” associated with the con-tinuous monitoring of extreme, non-clinically relevant values are minimized by an experienced ICU nurse. E.g. a patient on an oxygen mask produces fluctuations of oxygen saturation during suction, movement, spells of agita-tion or simply if she/he pulls off the mask. Which value should be recorded for the severity model?

Page 51: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

51

The “best practice” approach would follow GCP phase III clinical drug trials, where an independent research nurse collects data while a research monitor provides a constant overview of missing or inconsistent data. One potential risk for inter- or intra-observer disagreement is the amount of data needed. To give it credit, the REMS is easy to use, and requires only a few parameters that are already routinely measured in the ED.

Future perspectives Further studies are necessary in order to make sure that the potential biases do not distort the scoring system i.e. reliability studies. It would also be in-teresting to study those patients with low REMS scores who died. Could their deaths be due to a sudden unexpected occurrence, such as an intracra-nial haemorrhage or cardiac arrest?

The applications earlier proposed for the severity of disease scoring sys-tems developed for the ICU setting were mentioned in the introduction. The application of scoring systems outside their prime purpose of measuring severity of illness tends to degrade their performance and should be per-formed with extreme caution (92) (Table 23).

Table 21. Potential applications for REMS in the future.

Future applications for REMS

Research (stratify groups) Quality assurance in health care Resource utilization Triage in the ED Withhold or withdraw treatment??

The APACHE II and other extensively studied scoring systems for the ICU setting are being used to stratify patients into equal groups for prospective trials, to compare performances and different treatment between units or hospitals, for quality assurance purposes and for resource utilization analy-ses. REMS could probably been used for similar tasks in the ED.

The most controversial potential use for predictive models is to use them to affect individual case management, especially for issues of withholding or withdrawing treatment (76, 77). At the Emergency Department however, a morally significant difference rightfully exists between the withholding and withdrawal of medical treatment. Medical practitioners and patients some-times find it emotionally easier to forgo new interventions than to withdraw ongoing treatment. The emotional impact and methods of withholding and withdrawing treatment differ distinctly in different medical settings. In the

Page 52: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

52

ICU setting the withholding of further treatment is done quietly, often with-out input from the patient, whereas the withdrawal of ongoing medical treatment can be more obvious and difficult (93). This situation is reversed at the emergency department where there is often a lack of information about the identity, medical condition and wishes of the patient, as well as the spe-cial societal expectations that exist regarding the nature of emergency care. The withholding of treatment at the ED is the most obvious and traumatic of all decisions (94). The situations in which interventions are needed are pat-ently obvious. To do nothing requires an enormous effort, the clinician’s assurance that he or she is aware of all relevant information, and a sense of personal security. Later withdrawal of care, however, is often performed after more data has been gathered in a more secluded setting. These special circumstances not only make the withholding of medical treatment more problematic than later withdrawal of unwanted or useless interventions, they also ascribe a morally significant difference.

REMS could however be helpful for triaging in daily work at the ED. There is currently no reliable tool for this purpose available at EDs over-crowded with patients in need of help. The standard procedure in many hos-pitals to date is that the nurse in charge triages the patients regarding symp-toms e.g. patients with chest pain are prioritised before those who are dizzy. A combined system, using both symptoms and REMS could be of value in this work, and provide both the nurse and the physician with a better and more reliable tool to decide if a specific patient can wait for several hours or should be taken care of immediately. Specific cut-off points in the score might define those who are critically ill and have to be taken care of imme-diately.

Furthermore, it would be of immense interest to combine REMS with a clinical prediction of mortality. This could be done in the same way as in the PIOPED study, which has become the “gold standard” for the diagnosis of pulmonary embolism (95). The probability of a specific patient having the diagnosis was examined by using a combination of an objective measure-ment (scan) and the physician’s clinical risk scoring. There are studies where physicians´ likelihood estimates are compared with severity of illness scores such as APACHE II. Katzman et al (78) claimed that the experienced inten-sivist ability to identify patients at the extreme ends of the survival scale is quite comparable in accuracy to quantitative models. They suggested a model where a combination of the physician’s estimates and an objective severity of illness score could be a potential for good prognostication when making difficult resource allocation decisions. Poses et al stated that physi-cians with different backgrounds and experience may vary in the way they make prognostic judgments and that such variability may have major effects on communication, decision-making and care of patients (54). It would be of immense value to combine intuitive clinical forecasts with REMS estimates in a new scale, which could probably outperform both.

Page 53: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

53

Several authors have shown that functional measures described as Active Daily Living (ADL) or Mini-Mental Status provide independent measures of mortality and claim that updated severity of disease scores will need to in-corporate functional measures along with burden of illness (96-99). Since both the ADL and Mini Mental Status scores are easily obtained in the ED setting it would be of interest to combine these scores with REMS to evalu-ate if the predictive accuracy could be improved.

Turner et al claimed (100) that scoring patients later in their illness allows better predictability and decision-making for individual patients, and to achieve maximum benefit from severity of illness scores these should be repeated on successive days following admission. REMS is particularly well suited for this purpose, due to its simplicity and quickness of use, and could potentially become a system for continuous monitoring of patients during transport, at the ED and later in the hospital, with the purpose of following the patients’ status. It would then be possible to study if alterations in the REMS score per se could have any prognostic value.

The medical climate in the western world is rapidly changing. This change is to a large extent inevitable, and as Jonsen has noted (101), the modern medical Good Samaritan can no longer deal with only one patient at a time when the line of patients at the clinic is unlimited: “The old tutorist rule requires the physician to apply a procedure that offers any chance of saving life. A more relaxed rule would allow the selection of a level of prob-ability or reasonable chance. It would remove the moral vocabulary of medi-cine ´Do everything´ and substitute it with ´Do as much as is reasonable´”. Implicit in the idea of doing as much as is reasonable is that this involves not just one patient, but all patients. The modern medical climate calls for a uti-larian social policy in which the entire population will benefit, in part by denying life support services to a few who are unlikely to benefit (102). Prognostic scoring systems in varying clinical settings are likely to play an increased role in the future, particularly if their limitations are appreciated and their predictive ability improves. Hopefully, the role of prognostic scor-ing systems will continue to be adjunctive, and medical decision-making will remain the province of physicians, patients and families.

Page 54: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

54

Conclusions

Major aim The present study revealed REMS as being a significant predictor of in-hospital mortality in patients attending the Emergency Department over a wide range of common non-surgical disorders. The relation between REMS and the length of stay in hospital was however modest.

Secondary aims The APACHE II system was a strong predictor of mortality in non-surgical patients attending the ED. Both APACHE II and REMS showed adequate and comparable predictive accuracy, but REMS is suggested in preference as a scoring tool in the non-surgical ED setting because of its simplicity.

REMS could predict, over a wide range of disorders, long-term (4.7 years) mortality at the ED, with or without incorporation of the age component.

Information on coexisting disorders (Charlson Co-morbidity Index) can prognosticate both short-term and long-term mortality in the non-surgical ED. It can also add prognostic information to REMS as a predictor of long-term mortality.

Page 55: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

55

Acknowledgements

I wish to express my sincere gratitude and appreciation to all who contrib-uted in the present study with special attention to

Lars Lind, my supervisor, for good collaboration, and for his instructive and engaged guidance.

Andreas Térent, my co-supervisor, for support, encouragement and construc-tive criticism.

Solveig Karlsson, for her generosity and support.

Thomas Mooe, for his genuine interest and considerate but constructive criti-cism.

Per Nordin for encouragement and practical support.

Olov Wåhlinder, my first supervisor at the department of internal medicine, for inspiring me and for sharing his wide scientific knowledge.

The staff at the library of Östersund Hospital, for their competent, speedy and ready help with reference articles throughout the years.

All my colleagues and ward staff at the department of internal medicine.

Ragnar Asplund and Susanne Johansson at Mitthögskolan, the Fou unit, Jämtland for believing in this work and making it financially possible.

My friend John Selander with whom virtually every aspect of life, including research, can and has been discussed.

My parents for their constant support and encouragement.

My wife Lina for having patience and my son Johan, for always reminding me what really is important in life…

Page 56: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

56

Sammanfattning på svenska

De senaste 30 åren har olika poängsättningssystem utvecklats för att bedöma svårighetsgraden av sjukdom hos patienter på en intensivvårdsavdelning. Det mest använda och utvärderade systemet är Acute Physiology and Chronic Health evaluation (APACHE II), som poängsätter avvikelsen i 12 olika fysi-ologiska variabler. De användningsområden som föreslagits för detta system är jämförelse av vårdkvalité mellan olika enheter och jämförelse mellan oli-ka behandlingsformer. En skattning av sjukdomsgrad är även av värde i in-terventionsstudier för att kunna stratifiera patienter i olika grupper. På en akutmottagning skulle ett sådant system vara av stort värde för att jämföra resursutnyttjande mellan olika mottagningar och sjukhus, samt för att på ett rationellt sätt kunna prioritera bland de patienter som väntar på att få träffa en läkare på akutmottagningen. Hittills har något sådant system inte funnits tillgängligt för akutmottagningen.

Under ett år poängsattes 11 751 medicinska patienter på Akademiska sjukhusets akutmottagning med hjälp av systemet Rapid Acute Physiology Score (RAPS), som är en förenklad version av APACHE II. RAPS innehål-ler bara fyra fysiologiska variabler (blodtryck, andningsfrekvens, puls fre-kvens och medvetandegrad) och inga laboratorievariabler, och är därför mer lämpad för bruk på en akutmottagning. Denna patientkohort följdes sedan under 4.7 år med avseende på mortalitet under sjukhusvistelsen men även därefter.

Med hjälp av multipel logistisk regressionsmetodik konstaterades att peri-fer syrgassaturation och ålder signifikant adderade prognostisk information till RAPS med avseende på sjukhusmortalitet, vilket ledde till att nya syste-met Rapid Emergency Medicine score (REMS) skapades med tillägg av dessa variabler.

REMS var bättre än RAPS att predicera sjukhusmortalitet enligt analys avseende diskrimination med hjälp av ROC-kurvor ( p<0.001). En ökning med en poäng i REMS var associerad med ett odds ratio på 1.40 för död under vårdtid (95% CI 1.36-1.45, p<0.0001) Ett likartat odds ratio sågs i olika symtomgrupper (bröstsmärta, stroke, coma, dyspné och diabetes) och i båda könen. Associationen mellan REMS och längden på vårdtiden på sjuk-huset var endast av måttlig grad.

Charlson Comorbidity Index, som ett mått på sjukligheten hos den enskil-da patienten innan vårdtillfället på akutmottagningen kunde predicera död under vårdtiden när detta undersöktes i en delmängd av den totala studiepo-

Page 57: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

57

pulationen (n=865). Tillägg av denna information förbättrade förbättrade REMS förmåga att predicera långtidsmortalitet (1, 3 och 4,7) år men förbätt-rade inte REMS prognostiska förmåga vad gäller kort-tidsmortalitet (3, 7, 30 och 90 dagar). REMS var en lika kraftfull prognostisk markör för dödlighet under vårdtid som det betydligt mer komplicerade APACHE II systemet vid en jämförelse på akutmottagningen. Båda systemen påvisade i denna analys en god kalibrering med avseende på relationen observerat jämfört med för-väntat antal avlidna under sjukhusvistelsen utvärderat med Hosmer-Lemshows goodness-to-fit test. REMS på akutmottagningen kunde också predicera långtids mortalitet (4.7 år) i den totala kohorten utvärderat med Cox proportional hazard-metodik (hazard ratio 1.26, p<0.0001). Detta sågs även när ålder exkluderades från REMS.

I denna studie sågs att REMS var en användbar prognostisk markör för medicinska patienter på akutmottagningen både med avseende på död under vårdtid samt långtids mortalitet. Det är ett enklare system än det tidigare använda APACHE II och har lika bra prediktiv förmåga. Charlson Comorbi-dity Index adderade prognostisk information till REMS avseende långtids-mortalitet. Sammantaget tyder dessa fynd på att REMS kan vara ett värde-fullt tillskott vid karaktäriseringen av medicinska patienter på akutmottag-ningen.

Page 58: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

58

References

1. Clark C, The issues, In: Emergency Medicine, Congressional Quarterly Re-searcher, January 1996;6:3-9

2. Feinstein AR: Clinical judgement. Baltimorer: Williams and Wilkins, 1967, p. 224

3. Jalan KN, Prescott RJ, Sircus N, et al: Ulcerative colitis a clinical study of 399 patients. J R Coll Surg 1971; 16:338-355

4. Simon R: Use of regression models: statistical aspects. Cancer Clinical Trials: Design, Practice and Analysis. Oxford Press, 1983; 445-466

5. Simon R: Importance of prognostic factors in cancer clinical trials. Cancer treat Rep 1984 ; 68: 185-192

6. Wisner DH. History and current status of scoring systems for critical care. Arch surg 1992; 127: 352-6

7. Knaus W A, Draper E A, Wagner D P, Zimmerman J E: APACHE-acute physi-ology and chronic health evaluation:a physiologically based classification sys-tem. Crit Care Med 1981; 9: 591-597

8. Knaus W A, Draper E A, Wagner D P, Zimmerman J E: APACHE II: A sever-ity of disease classification system; Crit Care Med 1985; 13: 818-829

9. Porath a, Eldar N, Harman-Bohem I, Gurman G. Evaluation of the APACHE II scoring system in an Israeli intensive care unit. Isr J Med Sci 1994; 30: 514-520.

10. Wilairatana P, Noan NS, Chinprasat s, Prodeengam K, Kityaporn D, Looareesu-wan S. Scoring systems for preciting outcomes of critically ill patients in north-eastern Thailand. Southeast Asian J Trop Med Public Health 1995; 26: 66-72

11. Rowan KM, Kerr JH, Major E, Mcpherson K, Short A Vessey MP. Intensive Care Society´s Acute physiologiy And Chronic Health Evaluation (APAVHE II) study in Britain and Ireland: a prospective, multicenter, cohort study comparing two methods for predicting outcome for adult intensive care patietns. Crit Care Med 1994; 22: 1385-1391

12. Wong DT, Crofts SL, Gomez M, McGuire GP, Byrick RJ. Evaluation of predic-tive ability of APACHE II system and hospital outcome in Canadian intensive care unit patients. Crit Care Med 1995; 23: 1177-83.

13. Giangiuliani G, Mancini A, Gui D. Validation of a severity of illness score (APACHEII) in a surgical intensive care unit. Intensive Care Med 1989; 15: 519-22.

14. Bohnen JMA, Mustard RA, Oxholm SE, Schouten BD. APACHE II score and abdominal sepsis. A prospective study. Arch Surg 1988; 123: 225-9.

15. Poenaru D, Christou NV. Clinical outcome of seriously ill surgical patients with intra-abdominal infection depends on both physiologic (APACHE II score) and immunologic (DHT score) alterations.

16. Bosscha K, Reijnders K, Hulstaert PF, Algra A, van der Werken C. Prognostic scoring systems to predict outcome in peritonitis and intra-abdominal sepsis. Br J Surg 1997; 84: 1532-4.

Page 59: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

59

17. Berger MM, Marazzi A, Freeman J, Chiolero R. Evaluation of the consistency of Acute Physiologiy And Chronic Health Evaluation (APACHE II) scoring in a surgical intensive care unit. Crit Care Med 1992; 20: 1681-7

18. Knaus W A, Draper E A, Wagner D P, Zimmerman J E, Bergner M, Bastos PG et al. The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest 1991; 100: 1619-36.

19. Le Gall JR, Loirat P, Alperovitch A, Glaser P, Granthil C, Mathieu D et al. A simplified acute physiology score for ICU patients. Crit Care Med 1984; 12: 975-7

20. Le Gall J, Lemeshow S, Saulnier F. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA 1993; 270: 2957-63.

21. Moreno R, Morais P. Outcome prediction in intensive care: results of a prospec-tive, multicentre, Portuguese study. Intensive Care Med 1997; 23: 177-86.

22. Lemeshow S, Teres D, Pastides H, Avrunin JS, Steingrub JS. A method for predicting survival and mortality of ICU patients using objectively derived weights. Crit Care Med 1985; 519-25.

23. Snyder JV, McGuirk M, Grenvik A, Stickler D. Outcome of intensive care: an application of a predictive model. Crit Care Med 1981; 9: 598-603.

24. Copeland GP, Jones D, Walters M. POSSUM: a scoring system for surgical audit. Br J Surg 1991; 78: 355-60.

25. Copeland GP, Jones DR, Wilcox A, Harris PL. Comparative vascular audit using the POSSUM scoring system. Ann R Coll Surg Engl 1993; 75: 175-7.

26. Sagar PM, Hartley MN, Mancey-Jones B, Sedman PC, May J, Macfie J. Com-parative audit of colorectal resection with POSSUM scoring system. Br J Surg 1994; 81: 1492-4.

27. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. Prognosis in acute organ-system failure. Ann Surg 1985; 202: 685-93.

28. Goris RJA, te Boekhorst TPA, Nuytinck JKS, Gimbrere JSF. Multi-organ failu-re. Generalized autodestructive inflammation? Arch Surg 1985; 120: 1109-15

29. Ruokonen E, Takala J, Kari A, Alhava E. Septic shock and multiple organ fail-ure. Crit Care Med 1991; 19: 1146-51.

30. Elebute EA, Stoner HB. The grading of sepsis. Br J Surg 1983; 70: 29-31. 31. Antman EM, Cohen M, Bernink PJ, et al The TIMI risk score for unstable an-

gina/non-ST elevation MI: A method for prognostication and therapeutic deci-sion making, JAMA 2000; 284: 835-42

32. Lindahl B, Toss H, Siegbahn A, Venge P and Wallentin L. Markers of myocar-dial damage and inflammation and long-term cardiac mortality in unstable coro-nary artery disease. N Eng J Med 2000;343: 1139-47

33. Marco Fiorelli, et al: Prediction of Long-term Outcome in the Early Hours Fol-lowing Acute Ischemic Stroke. Arch Neurol 1995; 52: 250-254

34. Rodrigo G, Rodrigo C. A new index for early prediction of hospitalization in patients with acute asthma. Am J Emerg Med 1997 Jan;15(1):8-13

35. Boyd CR, Tolson MA, Copes WS: Evaluating trauma care: The TRISS method. J Trauma 1987; 27: 370-378

36. Champion HR, Copes WS, Sacco WJ, etal. A new characterization of injury severity. J Trauma 1990; 30: 539-45

37. Champion HR, Sacco WJ, Copes WS, Gann DS, Gennarelli TA, Flanagan ME. A revision of the trauma score. J Trauma 1989: 29: 623-9.

Page 60: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

60

38. Baker SP, O´Neill B, Haddon W Jr, Long WB. The Injury Severity Score: a method for describing patients with muliple injuries and evaluating emergency care. J Trauma 1974; 14:187-89

39. Rhee K, Fisher C, Willitis N: The Rapid Acute Physiology Score. Am J Emerg Med 1987; 5:278-286

40. Kaplan MH, Feinstein AR. The importance of classifying initial co-morbidity in evaluating the autcome of diabetes mellitus. J Chron Dis 1974; 27: 387-404.

41. Gonella JS, Louis DZ, McCord JJ. Quality of patient care-A mearurement of change: the staging concept. Med Care 1975; 13: 467-473

42. Gonella JS, Goran MJ. The staging concept-an aproach to the assessment of outcome of ambulatory care. Med Care 1976; 14: 13-21

43. Gonella JS, Hornbrook MC, Louis DZ. Staging of disease, a casemix measure-ment. JAMA 1984; 251:637-644

44. Horn SD. Measuring severity of illness: Comparisions across institutions. Am J Public Health 1983; 73:25-31

45. Horn SD, Buckley G, Sharkey PD et al. Interhospital differences in severity of illnes. N England J Med 1985; 313:20-24

46. Horn SD, Sharkey PD, Buckle JM et al. The relationship between severity of illness and hospital length of stay and mortality. Med Care 1991; 29: 305-317

47. Charlson ME, Pompei, PP, Ales KL, MacKenzie CR. A new method of calssify-ing prognostic co-morbidity in longitudinal studies; development and validation. J Chron Dis 1987; 5: 373-383.

48. Hlatkey MA, Paul SM, Gortner SR. Functional capacity after cardiac surgery in elderly patietns. J AM Coll Cardiol 1994; 24: 104-108

49. Charlson ME, Mac Kenzie CR, Gold JP, Ales KL, Topkins M, Shires GT. Risk for postoperative congestive heart failure. Surg Gynecol Obstet. 1991; 172: 95-104

50. Buntinx F, Niclaes L, Suetens C, Jans B, Mertens R, Van den Akker M. Evalua-tion of Charlson´s co-morbidity index in elderly living in nursing homes. J Clin Epidemiol 2002; 55: 1144-1147

51. Charlson ME, Sax FL, MacKenzie CR et al. Assessing illness severity; Does clinical judgement work? J Chron Dis 1986; 39: 439-452

52. Pompei P, Charlson ME, Douglas RG Jr. Clinical assessment as predictors of one year survival after hospitalization: Implications for prognostic stratification, J Clin Epidemiol 1988; 41: 275-284

53. Greenfield S, Aronow HU, Elashoff RM et al. Flaws in mortality data. JAMA 1988; 260: 2253-2255.

54. Poses R, McClish DK, Smith WR, Bekes C, Scott E. Prediction of survival of Critically Ill Patients by Admission Co-morbidity. J Clin Epidemiol 1996; 49:743-747

55. Meade MO, Cook DJ. A critical appraisal and systematic review of illness se-verity scoring systems in the intensive care unit. Current Opinion in Critical Care 1995; 1: 221-7

56. Efron B. Bootstrap methods: another look at the jackknife. Annals of statistics 1979; 7: 1-26

57. Schuster DP. Predicting Outcome after ICU Admission. Chest 1992; 102:1861-1870

58. Hanley J, McNeil B The Meaning and Use of the Area under a Receiver Operat-ing Characteristic (ROC) Curve. Radiology 143: 29-36 April 1982

Page 61: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

61

59. Hanley J, McNeil B A Method of Comparing the Areas under Receiver Operat-ing Characteristic Curves derived from the same cases. Radiology 148; 839-843, September 1983

60. Lemeshow S, Hosmer DW Jr: A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 1982; 115: 92-106

61. Fitzgerald FT. Physical diagnosis versus modern technology: a review. West Jmed 1990; 152:377-82

62. Dragsted L, Jorgensen J, Jensen NH, Bonsing E, Jacobsen, Knaus WA, et al. Interhospital comparisions of patients outcome from intensive care: importance of lead-time bias. Crit Care Med 1989; 17:418-22

63. Cerra FB, Negro F, Abrams J. APACHE II score does not predict multiple or-gan failure of mortality in postoperative surgical patients. Arch Surg 1990; 125: 519-22

64. Escarce JJ, Kelley MA. Admission source to the medical intensive care unit predicts hospital death independent of APACHE II score. JAMA 1990; 264: 2389-93

65. Cowen JS, Kelley MA. Errors and bias in using predictive scoring systems. Crit Care Clin 1994; 10:53-71

66. Rosenberg AL. Recent innovations in intensive care unit risk-prediction models. Crit Care. 2002; 8: 321-330

67. Polderman KH, Jorna EM, Girbes AR: Inter-observer variability in APACHE II scoring: effect of strict guidelines and training. Intensive Care Med 2001; 27: 1365-1369.

68. Champion HR, Sacco WJ. Measurement of patient illness severity. Crit Care Med 1982; 10:522-3

69. Li TL, Phillips MC, Shaw L, Cook EF, Natanson C, Goldman L. On-site physi-cian staffing in a community hospital intensive care unit: impact on test and procedure use and on patient outcome. JAMA 1984; 252:2023-7

70. Hyzy RC. ICU scoring and clinical decision making. Chest 1995; 107:1482-83 71. Hunt JP, Baker CC, Lentz CW, et al. Thoracis aorta injuries: management and

outcome of 144 patients. J Trauma 1996; 40: 547-56 72. Wagner DP, Knauss WA, Draper EA, Lawrence DE, Zimmerman JE. The range

of intensive care services today. JAMA 1981; 246: 2711-16 73. Becker RB, Zimmerman JE, Knaus WA, Wagner DP, Seneff MG, Draper EA.

Identification of low-risk monitor admissions to medical-surgical ICUs. Chest 1987; 92:423-8

74. Pollack MM, Alexander SR, Clarke N, Ruttiman UE, Tesselaar HM, Bachulis, AC. Improved outcomes from tertiary center pediatric intensive care: a state-wide comparision of tertiary and nontertiary care facilities. Crit Care Med 1991; 19: 150-9

75. Sirio CL, Tajami K, Tase C, Knaus WA, Wagner DP, Hirasawa H. An initial comparsion of intensive care in Japan and the United States. Crit Care Med 1991; 19: 150-9

76. Knaus WA, Rauss A, Alperovitch A, Le Gall JR, Loirat P, Patios E, et al. Do objective estimates of chances for survival influence decision to withhold or withdraw treatment? Med Desis Making 1990; 10:163-71

77. Knaus WA, Wagner DP, Lynn J. Short-term mortality predictions for critically ill hospitalized adults: science and ethics. Science 1991; 13: 17-35

Page 62: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

62

78. Katzman D, McClish DK, Powell SH. How well can physicians estimate mor-tality in a medical intensive care unit? Med Desis Making 1989; 9: 125-32

79. Brannen AL, Godfrey LJ, Goetter WE. Prediction of outcome from critical illness: a comparision of clinical judgment with a prediction rule. Arch Intern Med 1989; 149:1083-86

80. Pearlman RA. Variability in physician estimates of survival for acute respiratory failure in chronic obstructive pulmonary disease. Chest 1987; 91:515-21

81. Detsky AS, Stricker SC, Mulley AG, Thibault GE. Prognosis, survival and the expenditure of hospital resources for patients in an intensive care unit. N Engl Med 1981; 305: 667-72

82. Schafer JH, Maurer A, Jochimson F, et al. Outcome prediction models on ad-mission in a medical intensive care unit: do they predict individual outcome? Crit Care Med 1990; 18:1111-17

83. Rogers J, Fuller HD. Use of daily Acute Physiology and Chronic Health Evalua-tion (APACHE II) scores to predict individual patient survival rate. Crit Care Med 1994; 22: 1402-5.

84. Livingston BM, MacKirdy FN, Howie JC, Jones R, Norrie JD. Assessment of the performance of five intensive care scoring models within a large Scottish da-tabase. 2000; 6; 1820-1827

85. Zhu BP, Lemeshow S, Hosmer DW, et al. Factors affecting the performance of the Mortality Probability Model II system and strategies of customization: A simulation study. Crit Care Med 1996; 24: 57-63

86. Crabtree H.L, Gray C S, Hildreth A J, O´Conell M B, Brown J. The Co-morbidity Symptom Scale: A Combined Disease Inventory and Assessment of Symptom Severity. JAGS 2000; 48:1674-1678

87. Incalzi R A, Capparella O, Gemma A. et al. Predicting mortality and length of stay of geriatric patients in an acute care hospital. J Gerontol 1992; 47: M35-9.

88. Terese D, Lemeshow S: Why severity models should be used with caution. Crit Care Clin 1994; 10:93-110

89. Jenchs SF, Williams DK, Kay TL: Assessing hopital-associated deaths from discharge data. JAMA 1988; 260:2240-2246

90. Knaus WA, Wagner DP, Zimmerman JE, et al: Variations in mortality and length of stay in intensive care units. Ann Intern Med 1993; 118: 753-761

91. Castella X, Gilabert J, Torner F, Torres C. Mortality prediction models in inten-sive care: Acute Physiology and Chronic health Evaluation II and Mortality Pre-diction Model compared. 1991; 2: 191-198

92. Ridley. Severity of illness scoring. Anaesthesia 1998; 53:1185-1194 93. Faber-Langendon K, Bartels BM; Process of forgoing life-sustaineing treatment

in university hospital: An empirical study. Crit Care Med 1992; 20: 570-577 94. Iserson KV, Sanders AB, Mathieu DR: Ethics in Emergency Medicine. 1995

Tuscon: Galen Press;1:7-10 95. PIOPED (Prospective Investigation in Pulmonary Embolism Diagnosis) study

compares lung scans and pulmonary arteriography. J Nucl Med. 1989 Mar;30(3):279-80.

96. Steinman MG, Shea JA, Jette A et al. The Functional Independence Measure: test of scaling, assumptions structure and reliability across 20 diverse impair-ment categoies. Arch Phys Med Rehabil 1996;77:1101-1108

97. Folstein M, Folstein S, McHugh P. Mini-Mental Status: A practical method for grading the cognitive status for patients or the clinician. J Psychiatr Res 1975;12: 189-198

Page 63: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

63

98. Inouye SK, Peduzzi PN, Robinson JT, et al. The importance of functional meas-ures in predicting mortality among older hospitalized patients. JAMA 1998; 279:1187-1193

99. Davis RB, Iezzoni LI, Phillips RS, et al. Predicting in-hospital mortality. The importance of functional status information. Med care 1995; 33: 906-921

100. Turner JS, Potgieter PD, Linton DM. Systmems for scoring severity of illness in intensive care. SAMJ 1989; 76: 17-20

101. Jonsen AR: The New Medicine and the Old Ethics. Cambridge, MA Harard University Press 1990.

102. Knaus WA, Wagner DP, Lynn J: Short term mortality predictions for critically ill hospitalized adults: science and ethics. Science 1991;254: 389

Page 64: Risk Prediction at the Emergency Department165310/FULLTEXT01.pdf · prognostic similarity is not achieved, then differences in outcomes may be attributable to inherent prognostic

Acta Universitatis UpsaliensisComprehensive Summaries of Uppsala Dissertations

from the Faculty of MedicineEditor: The Dean of the Faculty of Medicine

Distribution:Uppsala University Library

Box 510, SE-751 20 Uppsala, Swedenwww.uu.se, [email protected]

ISSN 0282-7476ISBN 91-554-6070-4

A doctoral dissertation from the Faculty of Medicine, Uppsala University,is usually a summary of a number of papers. A few copies of the completedissertation are kept at major Swedish research libraries, while the sum-mary alone is distributed internationally through the series Comprehen-sive Summaries of Uppsala Dissertations from the Faculty of Medicine.(Prior to October, 1985, the series was published under the title “Abstracts ofUppsala Dissertations from the Faculty of Medicine”.)