Clinical Psychometrics
Per Bech
A John Wiley & Sons, Ltd., Publication
This edition first published 2012 © 2012 by John Wiley & Sons, Ltd
Danish original title: Klinisk psykometri, by Per Bech, ISBN 97887628-1011-2, copyright Munksgaard Danmark, Copenhagen 2011.
This edition of Klinisk psykometri is published with the title “Clinical Psychometrics”, by arrangement with Munksgaard Danmark.
Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley’s global Scientific, Technical and Medical business with Blackwell Publishing.
Registered Office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK; 111 River Street, Hoboken, NJ 07030-5774, USA
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell
The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting a specific method, diagnosis, or treatment by physicians for any particular patient. The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. Readers should consult with a specialist where appropriate. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloging-in-Publication Data
Bech, Per. [Klinisk psykometri. English] Clinical psychometrics / Per Bech. – 1st ed. p. ; cm. Includes bibliographical references and index. ISBN 978-1-118-32978-8 (pbk. : alk. paper) 1. Psychometrics. 2. Psychiatry. I. Title. [DNLM: 1. Psychometrics–history. 2. Factor Analysis, Statistical. 3. Psychology, Clinical– instrumentation. 4. Psychopharmacology. BF 39] BF39.B417 2012 150.1′5195–dc23
2012009839
A catalogue record for this book is available from the British Library.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Cover image: © Todd Harrison – iStockphoto.com Cover design by Sarah Dickinson
Set in 9.5/12pt Minion by SPi Publisher Services, Pondicherry, India
1 2012
I attempted to effect the scientific in my psychopathology by methodological
investigations, not by a dogmatic exposition of a complete psychiatric
epistemology.
Karl Jaspers (1950)
The debt of psychiatry to the psychologist is now great and growing. From
[Eysenck’s] rigorous inquiries, sustained and resourcefully developed over
years, psychiatry stands to gain an impetus and accuracy in some essential
matters which will advance it and reinforce the free play of clinical skill and
insight.
Aubrey Lewis (1952)
Emil Kraepelin is probably the most outstanding psychiatrist who ever lived.
Max Hamilton (1978)
To Ole Rafaelsen, a man larger than life, and to Erling Dein who
showed me how to use Occam’s razor in psychopathology
Contents
About the author
Preface
Introduction
1. Classical psychometrics
Emil Kraepelin: Symptom check list and pharmacopsychology
Charles Spearman: Factor analysis and intelligence tests
Harold Hotelling: Principal Component Analysis
Hans Eysenck: Factor analysis and personality questionnaires
Max Hamilton: Factor analysis and rating scales
Pierre Pichot: Symptom rating scales and clinical validity
2. Modern psychiatry: DSM-IV/ICD-10
Focusing on reliability
Focusing on validity
Quantitative, dimensional diagnosis
3. Modern dimensional psychometrics
Ronald A. Fisher: From Galton’s pioneer work to the sufficient statistic
Georg Rasch: From Guttman’s pioneer work to item response theory analysis (IRT)
Sidney Siegel: Non-parametric statistics
Robert J. Mokken: Non-parametric analysis for item response theory (IRT)
4. Modern psychometrics: Item categories and sufficient statistics
Rensis Likert: Scale step measurements
John Overall: Brief, sufficient rating scales
Clinical versus psychometric validity
Item-response theory versus factor analysis
Jacob Cohen: Effect size
5. The clinical consequence of IRT analyses: The pharmacopsychometric triangle
Effect size and clinical significance
The pharmacopsychometric triangle
Antidementia medication
Antipsychotic medication
Antimanic medication
Antidepressive medication
Antianxiety medication
Mood stabilising medications
Combination of antidepressants
6. The clinical consequence of IRT analyses: Health-related quality of life
The WHO-5 Questionnaire
7. The clinical consequences of IRT analyses: The concept of stress
Post-traumatic stress disorder
The work-related stress condition
Integration of Selye’s medical stress model
8. Questionnaires as ‘blood tests’
Population studies in depression and anxiety
The predictive validity of WHO-5
Screening scales
9. Summary and perspectives
10. Epilogue: Who’s carrying Einstein’s baton?
Glossary
Appendices
References
Index
About the author
Per Bech
Per Bech received a medical degree from Copenhagen University in 1969. In 1972 he received a gold medal award from Århus University for his thesis on the dose–response relationship between cannabis (tetrahydrocannabinol) and various psychological measurements, including time experience and reaction time in simulated car driving.
He completed a doctoral thesis (Dr. Med. Sci.) at Copenhagen University on the clinical and psychometric validity of rating scales in depression and mania in 1981.
He was appointed Professor of Psychiatry at Odense University in 1992, and in 2008 he was appointed Professor of Applied or Clinical Psychometrics at Copenhagen University.
Since 1981 he has held the post of chief psychiatrist at The Mental Health Centre North Zealand in Hillerød (Capital Region of Denmark) and is Head of the Psychiatric Research Unit there. He is an honorary member of the Royal College of Psychiatrists and of the European Psychiatric Association (EPA).
Preface
The first edition of this book was the original Danish version, published in January 2011 as an introduction to the very broad field covering clinical psychology, psychiatry and clinical psychopharmacology. It was an attempt to follow Kraepelin’s rating scale approach and his pharmacopsychometrics as they have developed in the twentieth century, especially with the introduction of psychopharmacology in the 1960s. The central concept here is the Pharmacopsychometric Triangle, in which (A) covers the desired clinical effect, (B) unwanted effects, or side effects, and (C) patient-reported quality of life. In connection with (A), short psychometric scales are described which can be used to measure the effects of such classes of drugs as antidementia drugs, antipsychotics, antimanics, antidepressants, antianxiety drugs and mood stabilisers.
The psychometric performance of the scales for (A), (B) and (C) is described with reference to both factor analysis and item response theory models. These models have been presented in a form accessible to readers without mathematical knowledge. Throughout the book, however, the judgement of experienced psychiatrists is used as an index of validity, in an attempt to bring the symptoms home to the dimensions within (A), (B) and (C) where they belong.
My thanks when preparing the Danish version of my book went, as so often before, to Peter Allerup, Professor of Theoretical Psychometrics at the University of Århus. He has been a ‘basic factor’ for my work with rating scales over nearly 40 years! My research coordinator Lone Lindberg has made a unique contribution, with invaluable help in typing and layout. Gabriele Bech-Andersen and Susan Søndergaard are behind the translation procedures for the scales in the Danish version, and Susan has translated this English version from the Danish. Ove Aaskoven has been my statistical research assistant for many years, often in collaboration with Peter Allerup. Finally, I owe a debt of thanks to the Munksgaard editors Marie Schack and Daniel R. Andersen, who made helpful suggestions for the earlier Danish versions.
In this English version, editor Jesper Konradsen has raised challenging queries, especially on the philosophical lines running through it, with focus on the development of psychometrics from a philosophical start, to mathematical aspects of measuring mental states, to clinical validity and dose–response relationships, and then back to the philosophy of Wittgenstein, which brings symptoms home to form relevant syndromes or dimensions.
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
Introduction
Clinical psychiatry has incorporated psychology as an important auxiliary subject in the same way as neuropharmacology and neuroanatomy. As a branch of medicine, clinical psychiatry has especially attempted to determine the organic cause of mental disorders, and before the establishment of psychometrics the psychological approach to patients was seen as a non-organic explanatory model for mental disorders. Freud’s psychoanalysis, in particular, was seen as such a psychological explanatory model, and the non-testability of the Freudian theories, which were thus without clinical validity, is part of the reason why psychiatry was regarded for many years as an atypical branch of medicine (1).
The scientific approach to psychology launched by psychometrics has resulted in psychiatry being regarded as a clinical branch of medicine. This only took place with the 1987 publication of Feinstein’s monograph on clinimetrics (2). Finding a comprehensive overview of the role of psychometrics in clinical psychiatry has proved difficult. The following is an attempt to put this to rights.
Clinical psychometrics falls naturally into two eras. The first of these, the classical era, covers the period from 1879 to 1945. It is the era of the greatest names: Wilhelm Wundt, who founded psychometrics in 1879, and his two most important pupils, Kraepelin and Spearman. The modern period, which developed after 1945, has Eysenck, Hamilton and Pichot as the major psychometricians. They developed the questionnaires and rating scales archetypal of modern clinical psychometrics in the period from 1945 to the 1970s (3). From a statistical point of view, however, Francis Galton and his London psychometric laboratory (founded in 1884) are essential elements, together with Galton’s two most important ‘students’ (Pearson and Fisher) and the three people (Rasch, Siegel and Mokken) who developed the psychometric analyses that are archetypal of modern clinical psychometrics in the period from 1945 to the 1970s (4) (see Figure I.1).
The most obvious impact of modern psychometric research, which has resulted in short, valid rating scales and the descriptive statistics of effect sizes, is the pharmacopsychometric triangle. It was the revolution in pharmacology 50 years ago that led to the rebirth of Kraepelin’s pharmacopsychology, now crystallised in the pharmacopsychometric triangle, the major focus of this book.
Figure I.1 Psychometrics (diagram of key contributors: Wundt, Leipzig, 1879–1904; Galton, London, 1884; Kraepelin, 1883, pharmacopsychology, 1892; Spearman, factor analysis, 1904; Pearson, The Grammar of Science, 1911; Fisher, sufficient statistic, 1922; item score: Likert, anchoring points, 1932; Hotelling, PCA, 1933; Eysenck, 1953; Siegel, nonparametric statistics, 1956; Rasch, IRT, 1960; Hamilton, 1967; total score: Cattell, transferability, 1973; Pichot, 1974; DSM-III/IV and ICD-10, 1994)
1 Classical psychometrics
More than a century ago, psychology was defined as the science of human mental manifestations and phenomena. However, it was psychometrics (the science of measuring these mental manifestations and phenomena) that made psychology scientific. Thus, psychometrics is a purely psychological area of research.
From a historical point of view, psychology branched out from philosophy as an independent university discipline at the close of the nineteenth century. It all started in Leipzig in 1879, when the philosopher Wilhelm Wundt (1832–1920) established his psychological laboratory at the university. Formally, however, his laboratory remained under the faculty of philosophy. Wundt succeeded in detaching psychology from philosophy, especially freeing it from the influence of Immanuel Kant, an extremely influential philosopher who stated that it is impossible to measure manifestations of the mind in the same way as physical objects (5). In his Critique of Pure Reason, Kant (1724–1804) established the very important distinction between ‘the essential nature of things’ (things in themselves) and ‘things as they seem’ (i.e., that which we sense or perceive as a phenomenon when faced with the object we are examining).
Figure 1.1 illustrates Kant’s philosophical approach with reference to present-day psychiatry, according to which depression is understood to be a clinical phenomenological perception (shared phenomenology of depressive symptoms) as measured by the six depression symptoms contained in the Hamilton Depression Scale (HAM-D6, see Figure 3.1). Modern neuropsychiatry attempts to describe the depression behind the phenomenological perception, i.e., depression ‘in itself’, as we believe it to be present in the brain, for example, as a serotonin 1A receptor problem (impairment).
The area of research now known as brain research is just such an attempt to measure the processes presumed to be taking place in the brain, that is, ‘das Ding an sich’. As pointed out by Sontag, reality has increasingly grown to
resemble what the camera shows us (6). When the neuropsychiatric camera demonstrates receptor binding in the brain, it is taken to show reality itself, while clinical reality is increasingly becoming what the camera visualises for us by means of assessment scales or patient-related questionnaires.
The ability to describe reality as it is in itself, i.e., to look at the world unclouded by any preconception of it, has been debated by such neo-Kantians as Wittgenstein and Quine (7). The quantification of endophenotypes, or deep phenotypes, is probably the most scientific image of the world. However, endophenotypes cannot tell us whether we can indeed describe reality, e.g., the brain, as it is in itself. Wittgenstein tells us that he does not want to say whether we can or cannot describe reality as it is in itself. As stated by Putnam, he wanted to bring our phenomenological items back to their home, here in clinical psychiatry. This is what clinical psychometrics is about (7).
Figure 1.2 shows a correlation between the so-called psychotic symptom items in an American rating scale (see Appendix) and serotonin 2A receptor binding, which it is now possible to measure by means of positron emission tomography (PET) scanning (8). The figure shows a correlation coefficient of −0.57; this is statistically significant but not clinically significant, because the squared coefficient (0.57² ≈ 0.32) means that the ‘psychosis’ scale on the ordinate can explain only about 32% of the variance on the abscissa (serotonin 2A receptor binding). If the two patients at the far left are excluded as outliers, the negative correlation is halved (to about −0.29), so that less than 10% of the variance is explained.
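The 32% and ‘less than 10%’ figures follow from squaring the correlation coefficient (the coefficient of determination). The small Python sketch below shows that arithmetic, together with how a Pearson coefficient is computed from paired scores. The halved coefficient simply follows the statement in the text; it is not recomputed from the patients’ data, which are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """Coefficient of determination r**2: the share of variance two measures have in common."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


r_all = -0.57          # value reported for Figure 1.2
r_trimmed = r_all / 2  # 'halved' once the two outliers are excluded, as stated in the text

for label, r in [("all patients", r_all), ("outliers excluded", r_trimmed)]:
    print(f"{label}: r = {r:+.3f}, variance explained = {variance_explained(r):.1%}")
```

Running it reports roughly 32% for the full sample and roughly 8% once the correlation is halved, which is why a statistically significant coefficient of this size says little about the individual patient.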
Figure 1.1 The philosophical background for the emergence of psychometrics. Kant’s philosophical approach: the psychometric frame of reference (the clinical scientist) concerns ‘das Ding für uns’, the phenomenon for us, i.e., things as we perceive them in time and space when measuring them (e.g., HAM-D6); the biological frame of reference (the brain scientist) concerns ‘das Ding an sich’, the noumenon, i.e., things in themselves, for which only biological comprehension is valid (e.g., the serotonin 1A receptor in the brain).
The scale in Figure 1.2 covers the positive symptoms of a schizophrenia rating scale. In the early 1970s, the American psychiatrist Nancy Andreasen found it important to label as positive those schizophrenic symptoms on which medication had an effect. In clinical psychiatry, these were termed productive symptoms, as they were often the reason for hospitalisation in a mental institution. Later on, Nancy Andreasen became interested in neuropsychiatric brain imaging methods [Computer Assisted Tomography (CAT scan), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET)], which became available in the 1980s and 1990s. However, in an interview from 2003, she had to admit that schizophrenia is probably not located in one specific section of the brain (9). Schizophrenia affects many different brain areas that cannot be visualised as ‘das Ding an sich’.
Wilhelm Wundt’s major achievement was to realise that mathematical models of ‘das Ding für uns’ can be used to measure the ‘shared phenomenology’ of the state one wishes to assess quantitatively. Wundt studied medicine at Heidelberg, where he obtained his medical degree. He then participated in studies in the physiology of perception under Helmholtz (1821–94) and Fechner (1801–87). He observed that subjects could reliably assess sensory impressions when the conditions of the study were standardised, e.g., with increasing light or noise exposure.
Figure 1.2 The problematic relationship between the clinical, the psychometric and the biological frames of reference, with a correlation coefficient of −0.57. Ordinate: clinical assessment, psychotic subscale of the PANSS (see Appendix). Abscissa: frontal 5-HT2A receptor binding in the brain (biological validity).
Wundt’s philosophical basis was that each manifestation of the mind corresponds to a neurobiological substrate in the brain, but in his opinion the psychometric measurement of this manifestation of the mind should focus only on the psychological phenomena (das Ding für uns) and not include any biological elements in any way. He belonged to the branch of philosophy called non-reductive monism (corresponding to Harald Høffding’s critical monism, which maintains that manifestations of the mind cannot be reduced to purely biological variables) (10). On the other hand, it is of course possible to reduce certain manifestations of the mind to less complicated ones in an attempt to obtain the most reliable or objective measure. He felt that in this way it would be possible to make psychology scientific within the frame of its own descriptive realm, since psychological and biological methods of description are two different ways of viewing reality.
Wundt’s approach was that of descriptive psychology, in which the various dimensions consisting of individual items (symptoms) can be added to give a total score. He excluded immediate peak experiences detached from relations, e.g., the spontaneous, stimulus-unrelated, perception-like images in the religious experience of the child, referred to as ‘Sensus numinis’ (11,12). The clearest description of Wundt’s scientific approach, based on his ‘Grundzüge der physiologischen Psychologie’ (Principles of Physiological Psychology), is found in Vannerus’ monograph (13).
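The additive idea, that item scores are summed into a total score, can be sketched in a few lines of Python. The item names, the 0–4 range and the ratings below are hypothetical and do not belong to any real scale; summing is only meaningful if the items belong to the same dimension, which is the kind of question factor analysis and item response theory are used to examine.

```python
def total_score(item_scores, max_per_item):
    """Sum item scores into a total, after checking each lies in 0..max_per_item."""
    for item, score in item_scores.items():
        if not 0 <= score <= max_per_item:
            raise ValueError(f"{item}: score {score} outside 0..{max_per_item}")
    return sum(item_scores.values())


# Hypothetical ratings on four made-up items, each scored 0 (absent) to 4 (severe)
ratings = {"item_a": 2, "item_b": 1, "item_c": 3, "item_d": 0}
print(total_score(ratings, max_per_item=4))
```

The range check is a design choice: a total is only interpretable if every item has been scored on the agreed scale.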
The psychometric method developed by Wundt is probably the only specific psychological method identified in mental science, i.e., in scientific psychology (14). The two most famous scientists to emerge from Wundt’s psychological laboratory in Leipzig were Emil Kraepelin and Charles Spearman; both of them understood that psychological measurement (psychometrics) and biological measurement are two different ways of viewing nature.
Emil Kraepelin: Symptom check list and pharmacopsychology
Kraepelin (1856–1926) had just obtained his medical degree when he applied for a post at Wundt’s laboratory in 1882. As Wundt was unable to finance a salary, Kraepelin held an unsalaried position at the laboratory and also had to take up a post as a locum at the local mental hospital in Leipzig. Kraepelin’s purpose was to introduce scientific psychology into psychiatry, so that his studies at Wundt’s psychological laboratory would further his career as a psychiatrist. In his job application to Wundt, he wrote that he would give a kingdom for a [research] topic; Wundt then gave him the opportunity to examine the influence of psychoactive substances such as alcohol and the hypnotic drug chloral hydrate on volunteer research subjects. Kraepelin set out to demonstrate a dose–response curve using reaction time measurements as the psychological response and psychoactive substances as the stimuli, so that increasing amounts of alcohol (number of drinks) led to lengthening reaction times. Since Wundt could see that Kraepelin had his heart set on psychiatry, he encouraged Kraepelin to employ this objective scientific method when subsequently assessing the various symptoms presented by patients suffering from mental disorders.
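Kraepelin’s design pairs a graded stimulus (number of drinks) with a measured response (reaction time). The simplest summary of such data, assumed here purely for illustration, is a straight line fitted by ordinary least squares, whose slope is the change in reaction time per drink. Real dose–response curves are often non-linear, and the numbers below are invented, not Kraepelin’s measurements.

```python
def least_squares_line(doses, responses):
    """Fit response = intercept + slope * dose by ordinary least squares."""
    n = len(doses)
    if n < 2 or n != len(responses):
        raise ValueError("need at least two paired dose/response values")
    mean_dose = sum(doses) / n
    mean_resp = sum(responses) / n
    sxx = sum((d - mean_dose) ** 2 for d in doses)
    if sxx == 0:
        raise ValueError("doses must differ to estimate a slope")
    sxy = sum((d - mean_dose) * (r - mean_resp) for d, r in zip(doses, responses))
    slope = sxy / sxx
    return mean_resp - slope * mean_dose, slope


# Invented illustration: number of drinks and mean reaction time (ms) for one volunteer
drinks = [0, 1, 2, 3, 4]
reaction_ms = [210, 222, 235, 251, 262]

intercept, slope = least_squares_line(drinks, reaction_ms)
print(f"reaction time = {intercept:.1f} ms + {slope:.1f} ms per drink")
```

A positive slope is the dose–response pattern the text describes: each additional drink lengthens the reaction time.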
Kraepelin published his first Psychiatric Compendium as early as 1883 (Compendium der Psychiatrie. Verlag von Ambr. Abel, Leipzig, 1883). In it he attempted to focus on the symptoms presented in the different disorders. After leaving the Leipzig laboratory and starting on his career as a psychiatrist in Munich, Kraepelin published several compendiums or textbooks on psychiatry. He revised his textbook roughly every two years, and in the 6th edition in 1899 he was able to describe two disorders with different symptom profiles: manic-depressive disorder and schizophrenia.
Figure 1.3 shows the checklist Kraepelin used when systematically monitoring his patients over several years in order to ascertain which symptoms possessed ‘shared phenomenology’ over this period of time. These are called checklist symptoms, as Kraepelin only determined whether the symptom was present or absent. This type of scale is called a nominal scale.
Kraepelin’s symptom checklist from his Zählkarten (counting cards)
• Nervousness
• Restlessness
• Irritability
• Depression
• Psychomotor retardation
• Aggression
• Grandiosity
• Negativistic behaviour
• Hallucinations
• Paranoid ideas
Matthias M. Weber and Eric J. Engstrom. Kraepelin’s ‘diagnostic cards’: the confluence of clinical research and preconceived categories. History of Psychiatry 1997; 8: 375–385.
Figure 1.3 The assessment scale or checklist used by Kraepelin (10)
Using this method, Kraepelin was able to demonstrate that during a period of about six
months, some patients presented with the first five or six symptoms in Figure 1.3, while in other episodes of shorter duration (up to three months) they had the next two symptoms (aggression and delusions of grandeur), along with restlessness, sleep disturbance and irritability. Between these episodes of depression or mania, these patients were discharged from hospital and were socially well-functioning. Other patients, who were often lifetime residents in asylums, had the last three symptoms in Figure 1.3. Kraepelin described them as suffering from dementia praecox (now schizophrenia), as the disorder typically started when they were about 20 years of age and was chronic in nature, often with an influence on intellectual functions as well. But these were consequences, not elements, of the schizophrenic symptomatology. Manic-depressive disorder, on the other hand, did not typically emerge at a specific
age. Based on Kraepelin’s original registrations on his Zählkarten (counting cards), including the checklist symptoms in Figure 1.3, Jablensky et al. made a comparison using the Present State Examination (PSE), from whose scores the ICD-9 diagnoses of schizophrenia and manic-depressive disorder can be made. In total, Jablensky et al. identified 721 patients assessed by Kraepelin and found a concordance of approximately 80% between Kraepelin’s diagnoses of schizophrenia and manic-depressive disorder and the ICD-9 diagnoses (15).
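Concordance here means the proportion of patients who receive the same diagnosis under both classifications. A minimal Python sketch of that calculation, with a cross-table of the agreeing and disagreeing pairs, is shown below. The ten patients are invented toy data, not the 721 patients re-examined by Jablensky et al.

```python
from collections import Counter

SZ = "schizophrenia"
MD = "manic-depressive"


def concordance(diagnoses_a, diagnoses_b):
    """Proportion of patients given the same diagnosis by both classifications."""
    if not diagnoses_a or len(diagnoses_a) != len(diagnoses_b):
        raise ValueError("need two equally long, non-empty lists of diagnoses")
    matches = sum(a == b for a, b in zip(diagnoses_a, diagnoses_b))
    return matches / len(diagnoses_a)


def cross_table(diagnoses_a, diagnoses_b):
    """Count every (classification A, classification B) pair, i.e. a 2 x 2 agreement table."""
    return Counter(zip(diagnoses_a, diagnoses_b))


# Invented toy cohort of ten patients (not real data)
original = [SZ, SZ, MD, SZ, MD, MD, SZ, SZ, MD, SZ]
icd9 = [SZ, SZ, MD, MD, MD, MD, SZ, SZ, SZ, SZ]

print(f"concordance: {concordance(original, icd9):.0%}")
for (a, b), count in sorted(cross_table(original, icd9).items()):
    print(f"{a:>16} -> {b:<16} {count}")
```

The cross-table shows where the disagreements fall, which a single percentage hides.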
In his work ‘Über die Beeinflussung einfacher psychischer Vorgänge durch einige Arzneimittel’ (On the influence of some drugs on simple mental processes; Jena, Fischer Verlag, 1892), Kraepelin established the area of research he designated pharmacopsychology.
In the 8th edition of his textbook, written between 1909 and 1913, Kraepelin added reflections on the therapeutic effects of certain drugs such as morphine, phenobarbital (Phenemal) and chloral hydrate. However, he found that the effects of these drugs on schizophrenia and manic-depressive disorder were extremely poor. He was thus able to observe the spontaneous course of illness in these two disorders.
In the schizophrenic patient, as stated previously, the condition was
unremitting, while manic-depressive disorder was characterised by episodes
with specific symptoms and then periods between episodes of a year or more
in which the patients were completely without symptoms and thus able to
function normally. In these descriptions, Kraepelin determinedly avoided
including the various theories on disease circulating at that time, such as
hereditary elements, stress burden and so on.
Kraepelin’s textbooks were not widely known outside Germany, as the two world wars made German psychiatry less acceptable. His system only began to make an international impact after World War II, not least in the USA.
During his research at Wundt’s Leipzig laboratory, Kraepelin conceived the idea of establishing pharmacopsychology. He thought it important to describe the symptoms found to be reversible during a course of pharmacological therapy. However, as mentioned previously, no therapeutically adequate drugs were developed during Kraepelin’s lifetime, so this research area was scaled down. It is of great interest that Kraepelin was among the first to propose dose–response comparisons as an essential pharmacological criterion when determining the clinical effect of a drug.
The Rorschach test
Until the breakthrough of modern psychopharmacology in the 1960s, Danish psychometric research was heavily influenced by the Rorschach Shape Interpretation test, published by the Swiss psychiatrist Hermann Rorschach in 1921. The Rorschach test consists of 10 symmetrical inkblots, which do not represent recognisable images per se but are used as indefinite visual stimuli open to many different interpretations, in the same way as abstract painting. No psychometric theory underlies this ‘inkblot test’, but in the hands of a trained psychologist it may provide an opening for the psychodynamic theories propounded with reference to Freud’s psychoanalysis. Psychoanalysis was an accepted method of treatment in psychiatry during the period between the two world wars. However, an inherent limitation of the Rorschach test is that the scoring depends heavily on the testing psychologist, so that the Rorschach test has very poor inter-observer reliability (agreement).
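The passage names inter-observer reliability (agreement) without giving a statistic. One common index, swapped in here and not mentioned in this section, is Cohen’s kappa, which corrects the raw proportion of agreement for the agreement expected by chance. A minimal Python sketch follows; the category codes and the two psychologists’ ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who coded the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must code the same, non-empty set of cases")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater assigned categories at random with their own frequencies
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical codes (A, B, C) given by two psychologists to the same eight test responses
psychologist_1 = ["A", "A", "B", "B", "C", "C", "A", "B"]
psychologist_2 = ["A", "B", "B", "C", "C", "A", "A", "B"]

raw = sum(a == b for a, b in zip(psychologist_1, psychologist_2)) / len(psychologist_1)
print(f"raw agreement: {raw:.3f}")
print(f"Cohen's kappa: {cohens_kappa(psychologist_1, psychologist_2):.3f}")
```

In this toy example the raters agree on five of eight responses (0.625), but once chance agreement is removed kappa falls to about 0.43, the kind of gap that makes raw agreement a flattering measure of reliability.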
In Denmark, experimental psychology with stimulus–response trials dominated research. Alfred Lehmann (1858–1921) was the founder of experimental psychology in Denmark. He had worked together with Kraepelin at Wundt’s laboratory, and he established a psychological laboratory at Copenhagen University in 1886; Kraepelin paid a visit to it in 1901. The first professor of clinical psychology at Copenhagen University, Lise Østergaard (1924–1996), used the Rorschach test in her doctoral thesis on formal thought disturbances in schizophrenia at the University of Copenhagen, but the clinical experience she had gained under the supervision of the consultant psychiatrist Erling Dein turned out to be more rewarding than her Rorschach results (16). In the introduction to her thesis, Lise Østergaard correctly states that Kraepelin, with his symptom checklist, was the first person able to delimit schizophrenia by its characteristic symptomatology. Kraepelin had emphasised that the symptom profile was rarely quite alike from one patient to another, but that in chronic schizophrenics the course of the disorder was completely homogeneous.
Lise Østergaard then adds that Kraepelin’s description of these patients could ‘have a rather sterile and external appearance’. She finds Kraepelin’s mode of description ‘marked by the stiffness and paucity of nuances that characterised Germanic psychology (Wundt). Kraepelin was not open to the new currents in the psychology of his period (i.e., the psychodynamic theories)’.
However, Lise Østergaard was forced to conclude that it was Kraepelin’s consistent, clinical descriptions of psychiatric patients that made it at all possible to delimit both schizophrenia and manic-depressive disorder.
With the introduction of modern psychopharmacology, it became vitally important to follow Kraepelin’s clinical but somewhat sterile measurement of symptoms, and as a consequence psychometrics had to reject the Rorschach test on scientific grounds (lack of reliability and validity) and to go on to promote the use of symptom rating scales based on Kraepelin’s checklist.
Clinical reality, as described by Kraepelin at the start of the 20th century, was ousted by Freud’s psychoanalysis, and only reinstated in the 1950s when modern psychopharmacology appeared on the scene. This made the clinical reality Kraepelin had described perfectly obvious to everyone, and it also made obvious that Freud’s clinical theorising had been dismissed. Because clinical psychology was so slow to realise this, its range became very limited. Thus, it is hardly a paradox that clinical psychiatrists were the ones to develop clinical rating scales.
Charles Spearman: Factor analysis and intelligence tests
In 1906, the English psychology student Charles Spearman (1863–1945)
completed his studies at Wundt’s laboratory with a PhD thesis, but as early
as 1904 he had published his first paper on the correlation method that was to
become the starting point of factor analysis (17).
Spearman then moved back to England and took up a professorship in London.
His psychological field of interest was intelligence testing of
primary school pupils. Spearman is generally regarded as the first
true promoter of psychometrics through his attempt to define certain dimensions
of intelligence by factor analysis. His idea was to use mathematical
factor analysis to identify the factors that make up the concept of
intelligence. Factor analysis is a method that indicates
which tests belong together and which do not. Thus, it is not a method of
measurement but a classification of the different tests (factor structures).
Worldwide, however, factor analysis was soon elevated to the status of an
important psychometric proof of the validity of a rating scale, i.e., proof that
the scale was scientifically valid.
The reasoning was that if factor analysis could show which tests pointed
the same way and which pointed in other directions, then a scientific analysis
had been performed.
In 1927, using factor analysis, Spearman was able to identify two factors of
intelligence: a general factor and a specific factor (18).
The principle of Spearman’s factor model is first to compute the
correlations between different intelligence tests and then to identify the
factors that best describe these connections. The weight of each test on a
given factor is then computed (the factor loadings). The first factor is usually
a general factor. The second factor is a specific factor, which shows in which
areas the person in question has their strong points.
An attempt to use the Spearman factor analysis tradition for empirical
research with different intelligence tests showed that the model does not
describe the real world. One of the problems was that factor analysis is very
sensitive to the range of variance in the sample being tested. If the analysis
is an attempt to determine factors in subjects who are all very intelligent
(i.e., a very narrow range of variance), too many factors will be identified. In a
very large population sample with very different levels of intelligence (i.e.,
a very great range of variance), usually only a single general factor emerges.
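The sensitivity to range can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of any published data: it assumes two hypothetical tests that each equal a latent general ability g plus independent noise, computes their correlation in the whole simulated population, and then recomputes it in a subgroup restricted to g > 1 (an arbitrary cut-off standing in for subjects who are all very intelligent). The shared correlation shrinks sharply in the narrow-range subgroup, which is why the general factor weakens there and more, smaller factors tend to emerge.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=5000, seed=1):
    """Hypothetical population: two tests that share one latent general ability g."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)           # latent general ability
        test1 = g + rng.gauss(0.0, 1.0)   # test 1 = g + test-specific noise
        test2 = g + rng.gauss(0.0, 1.0)   # test 2 = g + test-specific noise
        people.append((g, test1, test2))
    return people

people = simulate()
full_r = pearson([p[1] for p in people], [p[2] for p in people])

# Keep only the 'very intelligent' (g > 1): a very narrow range of variance.
gifted = [p for p in people if p[0] > 1.0]
narrow_r = pearson([p[1] for p in gifted], [p[2] for p in gifted])

print(f"whole population:   n = {len(people)}, r = {full_r:.2f}")
print(f"narrow-range group: n = {len(gifted)}, r = {narrow_r:.2f}")
```

Under these assumptions the whole-population correlation should lie near 0.5 and the narrow-range correlation far lower; the exact values depend on the seed.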
The fundamental element in factor analysis is the correlation coefficient.
Computation of the first factor will provide a rough estimate as to the size of
the correlation coefficients of the individual items in a scale; these are given
as factor loadings. When all the items have positive factor loadings (as is the
case with the first factor in Hamilton’s Anxiety Scale, see Table 1.1), then a
general factor is present (the general anxiety factor in Table 1.1). Should one wish
to ascertain whether some items have a higher mutual correlation coefficient
(loadings) than others, the second factor will provide this information
through its contrast between positive and negative loadings. In Table 1.1
the psychic anxiety symptoms have positive loadings while the physical
(bodily) anxiety symptoms have negative loadings. The sign direction in
itself is of minor importance and should not be dwelt upon; the significant
element here is that the symptoms with the same sign have a higher mutual
correlation than the items with the opposite sign. The result shown in
Table 1.1 has a very high clinical validity when assessing the antianxiety effect
of a drug.
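The reading of Table 1.1 described above can be expressed as a short sketch. It takes the Hamilton (1969) loadings from the table, checks that every item loads positively on factor 1 (the general factor), and then splits the items by the sign of their factor 2 loading (the dual factor). The helper name interpret_two_factors is hypothetical and not part of any statistics package.

```python
# (item, factor 1 loading, factor 2 loading), Hamilton (1969) column of Table 1.1.
HAMILTON_1969 = [
    ("Anxiety", 0.66, 0.50),
    ("Tension", 0.83, 0.32),
    ("Phobic fears", 0.48, 0.28),
    ("Insomnia", 0.62, 0.05),
    ("Concentration difficulties", 0.69, 0.37),
    ("Depressed mood", 0.69, 0.33),
    ("Motor tension", 0.52, -0.53),
    ("Sensory symptoms", 0.73, -0.30),
    ("Cardiovascular", 0.68, -0.41),
    ("Respiratory", 0.56, -0.40),
    ("Gastrointestinal", 0.66, -0.16),
    ("Genito-urinary", 0.45, -0.25),
    ("Other autonomic", 0.67, -0.14),
    ("General (agitation)", 0.80, 0.10),
]

def interpret_two_factors(loadings):
    """Return (is_general, positive_group, negative_group) for a two-factor solution.

    Factor 1 is read as general if every item loads positively on it.
    Factor 2 is read as dual: items are grouped by the sign of their loading.
    The sign itself is arbitrary; only membership of the same group matters.
    """
    is_general = all(f1 > 0 for _, f1, _ in loadings)
    positive = [name for name, _, f2 in loadings if f2 > 0]
    negative = [name for name, _, f2 in loadings if f2 < 0]
    return is_general, positive, negative

is_general, psychic, somatic = interpret_two_factors(HAMILTON_1969)
print("general factor present:", is_general)
print("positive on factor 2:", psychic)
print("negative on factor 2:", somatic)
```

With these loadings the split gives seven items in each group: the psychic symptoms (including insomnia, depressed mood and behaviour) on one side and the physical symptoms on the other.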
In short, it is the identification of the first two factors that is of clinical
significance. Typically, the first factor will demonstrate that the selected
symptoms are all positively correlated, to varying degrees; therefore
this factor is called the general factor. In the factor analysis literature the
second factor is called the bipolar factor, because it separates the symptoms
into two discriminating groups, namely the group with negative factor
loadings and the group with positive factor loadings. As this term has nothing
to do with bipolar affective disorder, it is now labelled the dual factor.
According to Spearman, in intelligence tests this dual factor would typically
discriminate between people with language skills and people with maths skills.
British versus American factor analysis
Spearman founded a special British approach to factor analysis, in which
factor analysis is used to interpret the first two factors of a rating scale analysis
(the general versus the dual). In contrast, an American approach rapidly
emerged in which factor analysis was used to identify as many factors as
possible. In the following, the emphasis will be on the British method. The
American factor-analytic tradition refers particularly to
Guilford’s classic monograph, which first appeared in 1936 (5) and in a
revised version in 1954 (19).
Table 1.1 Factor analysis. Archetypical two-factor model of Hamilton’s Anxiety Scale, with factor 1 as a general factor and factor 2 as a bipolar or dual factor with positive loadings on the psychic anxiety symptoms and negative loadings on the physical anxiety symptoms.
Loadings                        Hamilton (1969) (40)    Pichot et al. (1981) (41)
Items                           Factor 1  Factor 2      Factor 1  Factor 2
1  Anxiety                      0.66       0.50         0.50       0.39
2  Tension                      0.83       0.32         0.62       0.35
3  Phobic fears                 0.48       0.28         0.45       0.35
4  Insomnia                     0.62       0.05         0.65       0.26
5  Concentration difficulties   0.69       0.37         0.62       0.27
6  Depressed mood               0.69       0.33         0.66       0.38
7  Motor tension                0.52      -0.53         0.54      -0.25
8  Sensory symptoms             0.73      -0.30         0.58      -0.40
9  Cardiovascular               0.68      -0.41         0.53      -0.48
10 Respiratory                  0.56      -0.40         0.52      -0.43
11 Gastrointestinal             0.66      -0.16         0.29      -0.39
12 Genito-urinary               0.45      -0.25         0.33      -0.31
13 Other autonomic              0.67      -0.14         0.52      -0.30
14 General (agitation)          0.80       0.10         0.70       0.09
In the American tradition, Thurstone (20) recommended rotating the
factors in order to find simpler structures, while Guilford recommended
an ‘orthogonal’ rotation, i.e., the factors may not inter-correlate (they must be
at right angles to each other). Cattell, on the other hand, suggested a less rigorous
approach with the use of ‘oblique’ rotation, permitting a certain degree of
inter-correlation between factors (21). This basic attempt to eliminate negative
loadings through rotation is called ‘positive manifold’ (22). In contrast, the British
tradition advocates starting with a simple description of the principal component
analysis. According to this tradition, the entire core of Spearman’s factor analysis
must be examined before any rotation is performed. In this ‘Spearman’ algebra, the
first factor (the principal component) is a general factor that indicates the
degree of positive correlation among the different items in a scale. The second
factor is frequently a bipolar or dual factor (i.e., with negative loadings on some
items and positive loadings on other items). One might claim that the British
tradition is less invasive, less ‘manipulative’ than the American.
Reviewing the landmarks in the development of factor analysis over
its first 50 years, Vernon concludes that Hotelling’s principal component
analysis is mathematically more accurate than Spearman’s method, but that
its greater complexity implies tedious calculations (23). However, with
programs such as SPSS or SAS, a century after Spearman’s factor analysis, we
can now easily start with Hotelling’s method before performing the many
rotations available within factor analysis. The paradox is that we have difficulty in
understanding the mathematical superiority of Hotelling’s method over that
of Spearman. Therefore we do not realise that the first and second principal
components identified by Hotelling’s method are often sufficient. In other
words, we are often unable to provide an argument for performing all the
rotations inherent in the factor-analytic method.
Harold Hotelling: Principal Component Analysis
It was the American mathematician Harold Hotelling (1895–1973) who
became the best advocate for the British (Spearman) algebra of concentrating
on the initial simple correlation matrix and focusing on the first two factors: the
general factor and the bi-directional factor.
Hotelling received his PhD in Mathematics from Princeton University in
1924. In 1927, he wrote a review in the Journal of the American Statistical
Association of the first edition of Fisher’s Statistical Methods for Research
Workers, and subsequently visited Fisher in London in 1929. In 1933, from
his new base at Columbia University, Hotelling introduced his Principal
Component Analysis as a purely mathematical approach to factor analysis, in an
attempt to simplify the structure of a large number of items in a rating scale
(24, 25) (see Calculus Example 1).
The best description of Hotelling’s Principal Component Analysis (PCA)
has been given by Dunteman (26). PCA is an attempt to identify a few
components explaining most of the variance in the scores for individual
items in a rating scale in the original sample. Because PCA is conducted on
rating scales that contain items with some degree of positive inter-correlation,
the first component might explain up to 50% of the variance while the second
component explains 10–15% of the variance. PCA has no underlying statistical
model, but employs a mathematical focus to explain the total variance in
the item scores, thereby capturing most of the information within the items
of the rating scale. The first (general) component is the straight line of
closest fit to the item scores, i.e., the direction accounting for the largest
share of the total variance, and the second component is the straight line of
closest fit to the residuals from the first principal component. Since the two
principal components are uncorrelated, each one makes an independent
contribution to accounting for the variance of the original items. The
correlations of the items with the principal components are called loadings,
a term borrowed from Spearman’s factor analysis. Whereas the eigenvalue of the
first principal component is usually higher than 1.0, the eigenvalue of the
second principal component need not be higher than 0.7 (26). The first
principal component must be orthogonal to the second component, which will
have loadings of opposite sign, i.e., as many negative as positive loadings
(bi-directional, or dual), thereby contrasting the two groups of items that are
mutually most correlated.
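To make these PCA terms concrete, the sketch below extracts the first two principal components from a correlation matrix by power iteration with deflation. This is a standard numerical technique chosen only because it fits in a few lines of plain Python; it is not Hotelling’s original computation, and statistical packages use more robust eigen-solvers. The 4 × 4 correlation matrix is hypothetical: two ‘psychic’ items correlate 0.6 with each other, two ‘somatic’ items correlate 0.6 with each other, and every cross-pair correlates 0.3. Loadings are computed as the eigenvector scaled by the square root of its eigenvalue, which for a correlation matrix equals the correlation of each item with the component.

```python
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def power_iteration(m, start, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    v = list(start)
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return eigenvalue, v

def first_two_components(r):
    """(eigenvalue, loadings) for the first two principal components of r."""
    p = len(r)
    lam1, v1 = power_iteration(r, [1.0 + 0.1 * i for i in range(p)])
    # Deflation: subtract the first component, then find the largest one left.
    # The start vector is arbitrary; it must not be orthogonal to the target.
    r2 = [[r[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    lam2, v2 = power_iteration(r2, [1.0 - 0.6 * i for i in range(p)])
    # Loadings = eigenvector * sqrt(eigenvalue). The sign is arbitrary, so make
    # component 1 positive overall and the first item positive on component 2.
    load1 = [x * math.sqrt(lam1) for x in v1]
    load2 = [x * math.sqrt(lam2) for x in v2]
    if sum(load1) < 0:
        load1 = [-x for x in load1]
    if load2[0] < 0:
        load2 = [-x for x in load2]
    return (lam1, load1), (lam2, load2)

# Hypothetical correlations: items 0-1 'psychic', items 2-3 'somatic'.
R = [
    [1.0, 0.6, 0.3, 0.3],
    [0.6, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
]
(lam1, load1), (lam2, load2) = first_two_components(R)
for name, lam, load in (("component 1", lam1, load1), ("component 2", lam2, load2)):
    share = lam / len(R)  # total variance of a correlation matrix = number of items
    print(f"{name}: eigenvalue {lam:.2f} ({share:.0%}), loadings {[round(x, 2) for x in load]}")
```

For this particular matrix the first component has eigenvalue 2.2 (55% of the total variance of 4) with equal positive loadings, a general component, and the second has eigenvalue 1.0 (25%) with loadings of +0.5 on the first pair and -0.5 on the second pair, a dual component contrasting the two most mutually correlated groups.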
PCA should be clinically interpreted as a method of classifying items,
rather than as a method for solving the problems of measurement. The presence
of a general factor or component is not in itself an argument for summing all
the items of a rating scale, i.e., for treating the total score as a sufficient
statistic for measuring severity on a dimension.
PCA is a way to group items according to the second, bi-directional
component, for example into typical and atypical depression. In this context,
Bertrand Russell’s ramified hierarchy of typology is the best way to illustrate
the clinical meaningfulness of PCA (27). The example used by Russell is the
definition of a typical versus an atypical Englishman. Suppose a typical
Englishman is defined as one who possesses all the properties possessed by
most Englishmen. Since most Englishmen do not possess all of these
properties, a typical Englishman, according to this definition, might be
atypical. The problem raised by Russell is that the word ‘typical’ has been
defined by reference to all properties. It is in this situation that Russell
introduced his ramified hierarchy in order to deal with the apparent circularity
(27). Being a typical Englishman should not refer to the totality of properties
(all potential items) but to a sub-totality of the predicative items for which
over 50% of the properties are captured by the concept of a typical Englishman.
PCA can be considered as a method of ramified hierarchy in which the
second component has identified the predicative items by contrasting items
with negative and positive loadings.
In conclusion, with reference to Russell’s theory of typology, the general
component or factor identified by principal component analysis is the
description of being an Englishman, whereas the bi-directional second
principal component or factor describes being a typical or an atypical
Englishman through its contrast between positive and negative loadings,
e.g., positive = typical and negative = atypical.
Hans Eysenck: Factor analysis and personality questionnaires
In the autumn of 1945, Eysenck (1916–1997) was appointed Chief
Psychologist at the Psychiatric Institute in London, which is affiliated with
the Maudsley psychiatric hospital (1).
In the late 1940s, Eysenck set out to evaluate the validity of the psychological
tests used in clinical psychiatry. The problem has been summarised quite
neutrally by Schafer, who concludes that if the results of a psychological test
diverge from the diagnosis made by the psychiatrist, this does not necessarily
mean that the test is incorrect (28). A clinical diagnosis, e.g., of depression,
was not clear-cut at that time, as psychiatrists often found it difficult to
distinguish between neurotic and psychotic depression. This mirrored the
difference between Kraepelin and Freud in their understanding of ‘neurotic’
and ‘psychotic’. The Rorschach interpretation in schizophrenia mentioned
above is a good example of this (16).
In this connection, it is imperative to understand that Eysenck did not
himself treat patients and that his contact with clinically experienced
psychiatrists led him to perceive Freud’s psychoanalysis as both a theory of
personality and a treatment model (1). Eysenck soon realised that, as a
treatment method, psychoanalysis lacked clinical effect. In his personality
questionnaire studies, however, his frame of reference was
Freud’s and Jung’s psychoanalytic models of personality rather than true
clinical reality. In his trials with factor analysis, he adhered to Spearman’s
British tradition of examining the first two factors (the general versus the
dual), while using Hotelling’s principal component analysis.
As mentioned previously, it had become a tradition among psychologists to
use the Rorschach test, constructed by the psychiatrist Hermann Rorschach
(1884–1922). In the area of personality, Rorschach had discovered that
vision can be influenced by the personality behind the ‘glasses’. He thus
thought that coloured inkblots are especially stimulating for the extrovert
personality (extroversion dimension), while non-coloured inkblots, with
less movement of the figures, are connected to the introvert personality
(introversion dimension).
Eysenck demonstrated that these Rorschach theories could not be
empirically reproduced using the Rorschach test, as interpretations of
the test varied a good deal from one psychologist to another. In the field of
psychometrics, Eysenck adopted the position that it is important to work
with consistent personality dimensions. Using an empirical approach, he
demonstrated that it is possible to ask people what they are experiencing.
By using questionnaires, Eysenck was able to eliminate investigator influence
on testing behaviour, and he felt that the use of factor analysis would ensure that
the interpretation of the questionnaire response profiles would not be
influenced by the individual investigating psychologist. Eysenck
made use of lay subjects (initially often young men appearing before the medical
board prior to military service), but rarely included patients with a valid diagnosis.
His questionnaires had qualitative response options on a nominal scale, in which
only a ‘Yes’ or a ‘No’ was required. One of the reasons for this was the limited
capacity of the computers available in the 1950s and 1960s; nowadays, we have
the computing power needed to handle quantitative response categories.
Eysenck drew both on Jung’s personality theory of extroversion/introversion
(as used by Rorschach) and on Freud’s personality theory of neuroticism
as the basis of his psychological approach.
As a psychologist working on a theoretical basis, Eysenck was not sufficiently
aware of the fact that both Jung and Freud were primarily clinical experts. Thus,
Freud perceived neuroticism as a particularly pronounced degree of normal
behaviour, not as the qualitative departure from normal behaviour seen in
schizophrenia or the psychotic forms of depression or mania. As shown by Kline
(29), Eysenck attempted to validate his questionnaire dimensions, e.g.,
neuroticism and extroversion/introversion, within the field of learning psychology,
not in the clinical reality that formed the basis of Freud’s and Jung’s theories.
Among these personality dimensions (30), Eysenck’s neuroticism factor
proved the most clear-cut (31). Figure 1.4 gives an abbreviated version of
Eysenck’s Neuroticism Scale with the nine items that best show the structure of
the anxious neurotic personality. Of the remaining questions in Eysenck’s
Neuroticism Scale (23 items in all), many are closely associated with depression.
A psychometric analysis of Eysenck’s Personality Questionnaire (EPQ), based
on a study of persons experiencing relatively rapid remission after
posttraumatic stress (32) and a corresponding control group (N = 1353 persons),
gave a Loevinger coefficient of homogeneity of 0.42, showing that it is acceptable
to use the total score of the nine questions as a measure of neuroticism.
Another study, of patients suffering from different types of affective
disorders, showed that only Eysenck’s neuroticism scale was in accord with an
experienced psychiatrist’s assessment of the degree of neurosis (33).
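Loevinger’s coefficient of homogeneity H for Yes/No items can be computed as the sum of the observed inter-item covariances divided by the sum of the largest covariances the item frequencies would allow, i.e., the values reached when the responses form a perfect Guttman pattern. The sketch below implements that ratio in plain Python; the function name and the toy response patterns are hypothetical, and in practice dedicated Mokken scale analysis software would normally be used. As a commonly cited rule of thumb in Mokken scaling, H of at least 0.3 indicates a weak scale, at least 0.4 a medium scale and at least 0.5 a strong scale.

```python
from itertools import combinations

def loevinger_h(responses):
    """Loevinger's H for dichotomous items (1 = Yes, 0 = No).

    responses: list of respondents, each a list of 0/1 item scores.
    H = sum of observed covariances / sum of maximum possible covariances,
    taken over all item pairs, given each item's proportion of 'Yes' answers.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(r[i] for r in responses) / n for i in range(k)]   # proportion 'Yes'
    observed = maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(r[i] * r[j] for r in responses) / n        # both items 'Yes'
        observed += p_both - p[i] * p[j]
        maximum += min(p[i], p[j]) - p[i] * p[j]                 # perfect Guttman case
    if maximum <= 0:
        raise ValueError("H is undefined: no item pair can covary")
    return observed / maximum

# Hypothetical toy data: items ordered from most to least often endorsed.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = guttman + [[0, 0, 1]]   # endorses only the 'hardest' item

print(f"perfect Guttman pattern: H = {loevinger_h(guttman):.2f}")
print(f"with one Guttman error:  H = {loevinger_h(with_error):.2f}")
```

The first toy data set is a perfect Guttman pattern and gives H = 1; adding one respondent who endorses only the least frequently endorsed item introduces Guttman errors and lowers H to 2/7, about 0.29.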
Eysenck found that persons suffering specifically from anxiety
had a response pattern that was very sensitive to negatively formulated
questions, such as those dealing with symptoms: the higher the number of
affirmative responses, the more neurotic the subject.
When commencing his research with these questionnaires, Eysenck
labelled the Rorschach test the idiographic method of measurement and his
own questionnaires the nomothetic method.
The idiographic method is concerned with what is of unique significance
to one individual, with no relevance for others, and Eysenck therefore
correctly stated that the idiographic method cannot be used in measuring, since
to measure is precisely to observe individuals with reference to a common
scale. In contrast, the nomothetic method centres on what can be measured.
Eysenck’s use of factor analysis to prove that his questionnaires were nomothetic
measures is a paradox, because factor analysis is not a method of measurement.
Indeed, in modern research factor analysis is used in idiographic analyses, e.g.,
when describing an individual’s quality of life (34).
It is of great importance to understand that Eysenck’s intensive personality
questionnaire research using factor analysis actually confirms Spearman’s
results within the field of intelligence tests, in that special focus should be
placed on the first two factors identified by the analysis. Thus, Eysenck found
that the first factor was a general neuroticism factor (Figure 1.4), while factor
2 was a dual factor discriminating between extroversion and introversion
(30). It was Eysenck’s attempts to explain the remaining factors, and to relate
these to the psychoanalytic perception of personality rather than to clinical
reality, that blurred his results.
Eysenck’s Neuroticism Scale
The questions below address how you would describe yourself in general
No. Symptom Yes (= 1) No (= 0)
15 Are you an irritable person?
19 Are your feelings easily hurt?
31 Would you call yourself a nervous person?
34 Are you a worrier?
38 Do you worry about awful things that might happen?
41 Would you call yourself tense or “highly-strung”?
47 Do you worry about your health?
54 Do you suffer from sleeplessness?
72 Do you worry too long after an embarrassing experience?
Total score
Item numbers in accordance with the EPQ (30)
Figure 1.4 Scoring sheet for Eysenck’s neuroticism questionnaire
Around 1970, the American psychologist Charles Spielberger developed
a questionnaire to measure anxiety (35). In this he attempted to
discriminate between dispositional neurotic personality and present state
anxiety. The first of these he termed ‘trait’ anxiety and the second ‘state’
anxiety.
Figure 1.6 shows Spielberger’s ‘trait’ scale with 9 items selected from the
original 20. This selection is based on the criterion of clinical validity, so that
it corresponds with Eysenck’s neuroticism scale (Figure 1.4).
Around 1990, an international consensus that a five-factor personality
model could adequately cover the whole field was achieved among
psychologists (36). This model is called ‘The Big Five’ (37). On the basis of this
model, a questionnaire, the NEO-PI-R, was developed. The first two factors in
‘The Big Five’ are based on Eysenck’s EPQ and reflect Eysenck’s Neuroticism Scale
and Eysenck’s Extraversion Scale. Neuroticism and Extroversion are usually
referred to as ‘The Big Two’; however, the items in the NEO-PI-R do not
adequately cover Eysenck’s original dimensions. The abbreviated versions of
Eysenck’s Neuroticism and Extraversion Scales (shown in Figures 1.4 and 1.5)
are sufficient when measuring ‘The Big Two’.
Figure 1.7 shows the nine NEO-PI-R items that, from a clinical point of view,
correspond most closely to Eysenck’s neuroticism as shown in Figure 1.4.
Only five of the nine items in Figure 1.7 are negatively phrased, so the four
remaining items must be ‘flipped’ (reverse-scored) when measuring the degree
of neuroticism. When this is done, Loevinger’s coefficient of homogeneity is 0.42.
Eysenck’s Extraversion Scale
The questions below address how you would describe yourself in general
No. Symptom Yes (= 1) No (= 0)
5 Are you a talkative person?
10 Are you rather lively?
17 Do you enjoy meeting new people?
32 Do you have many friends?
52 Do you like mixing with people?
60 Do you like doing things in which you have to act quickly?
70 Can you get a party going?
82 Do you like plenty of bustle and excitement around you?
86 Do other people think of you as being very lively?
Total score
Item numbers in accordance with the EPQ (30)
Figure 1.5 Scoring sheet for Eysenck’s extraversion questionnaire
Spielberger’s Trait Anxiety Scale
The statements below address how you would describe yourself in general
No. Symptom Yes (= 1, 2, 3)* No (= 0)
2 I tire easily
4 I wish I could be as happy as others seem to be
8 I feel that difficulties are piling up so that I cannot overcome them
9 I worry too much over something that really doesn’t matter
11 I am inclined to take things hard
12 I lack self-confidence
14 I try to avoid facing a crisis or difficulty
18 I take disappointments so keenly that I can’t put them out of my mind
20 I become tense and upset when I think about present concerns
Total score
Item numbers in accordance with the original publication (35)
* Degrees 1, 2 and 3 all count as positive replies
Figure 1.6 Scoring sheet for Spielberger’s trait anxiety questionnaire
NEO items corresponding with Eysenck’s neuroticism dimension
The questions below address how you would describe yourself in general
No. Symptom Yes (= 1) No (= 0)
1 I am not the worrying type
31 I scare easily
61 I seldom feel anxious or uneasy
79 I hesitate to show anger, even when apposite
91 I often feel tense and nervous
121 I seldom worry about the future
147 I do not see myself as especially unworried
151 I often worry about things that might go wrong
216 Even minor factors can frustrate me
Total score
Item numbers in accordance with the original scale
Figure 1.7 Scoring sheet for modified neuroticism questionnaire (NEO)
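The scoring rules on these sheets amount to simple arithmetic: Yes counts 1 and No counts 0; on Spielberger’s sheet any degree of 1, 2 or 3 counts as Yes; and reverse-phrased NEO items are flipped before summing. The sketch below shows this with hypothetical answers. The text states that four NEO items must be flipped but the sheet does not mark which, so the reverse-keyed set used here (items 1, 61 and 121, which are worded in the non-neurotic direction) is an illustrative assumption only; the helper names are likewise hypothetical.

```python
def total_score(answers, reverse_keyed=frozenset()):
    """Sum 0/1 item scores, flipping (1 - x) any reverse-keyed items first.

    answers: dict mapping item number -> 0 (No) or 1 (Yes).
    """
    return sum(1 - x if item in reverse_keyed else x
               for item, x in answers.items())

def dichotomise(degree):
    """Spielberger-style sheet: degree 1, 2 or 3 -> Yes (1); degree 0 -> No (0)."""
    return 1 if degree >= 1 else 0

# Eysenck N scale (Figure 1.4): hypothetical answers, Yes = 1, No = 0.
eysenck_n = {15: 1, 19: 1, 31: 0, 34: 1, 38: 0, 41: 1, 47: 0, 54: 1, 72: 1}
n_total = total_score(eysenck_n)   # 6 of a possible 9

# Spielberger trait items (Figure 1.6): hypothetical degrees 0-3, then dichotomised.
trait_degrees = {2: 0, 4: 3, 8: 1, 9: 2, 11: 0, 12: 1, 14: 0, 18: 2, 20: 3}
trait_total = total_score({i: dichotomise(d) for i, d in trait_degrees.items()})  # 6

# NEO items (Figure 1.7): hypothetical answers. The reverse-keyed set below is
# an illustrative assumption, not a list given on the scoring sheet.
neo = {1: 0, 31: 1, 61: 0, 79: 0, 91: 1, 121: 1, 147: 1, 151: 1, 216: 1}
neo_total = total_score(neo, reverse_keyed={1, 61, 121})   # 7

print(n_total, trait_total, neo_total)
```

Flipping is done before summing so that every item points in the neurotic direction; otherwise the reverse-phrased items would cancel out part of the total score.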
Max Hamilton: Factor analysis and rating scales
Hamilton (1912–1988) commenced his career as a psychiatrist just after
World War II. He had the same starting point as Kraepelin, that of wishing
to utilise psychometrics as a means of making clinical psychiatry more
scientific in its approach. In 1945 he started working at the Maudsley Hospital
in London, at the same time and at the same place as Eysenck. He actually
attended Eysenck’s PhD courses in factor analysis (1).
His approach was that psychometrics in clinical psychiatry should be
considered a scientific discipline parallel to pharmacology and biochemistry.
During his career, Max Hamilton was Associate Professor of Psychiatry at
Leeds University from 1953 to 1957. These years saw the founding of modern
psychopharmacology, beginning with the establishment of the antimanic
effect of lithium compared to placebo, followed by the antimanic and
antipsychotic effect of chlorpromazine. Such placebo-controlled, randomised,
double-blind clinical trials became more and more common in Britain in the
1950s and Hamilton could see the need for reasonably brief rating scales to
be used when measuring the effects of these new psychotropic drugs.
Hamilton held a position as research assistant at Leeds University Hospital
from 1957 to 1960 while developing his two rating scales, the Hamilton
Anxiety Scale (HAM-A) from 1959 (38) and the Hamilton Depression Scale
(HAM-D) from 1960 (39). While Eysenck was interested in the more
permanent features of neuroticism, Hamilton was only interested in the symptoms
of anxiety or depression that appeared as signs of clinical disorders and were
reversible through psychopharmacological treatment. Like Kraepelin, he held
that these symptoms provide the best impression of the anxious
or the depressive patient.
With both of his scales, the HAM-A (see Figure 1.8) and the HAM-D (see
Figure 1.9), Hamilton’s purpose was to measure those mental and physical
symptoms that the patient and his or her relatives found to be the greatest
burden. Hamilton’s goal was not to make a diagnosis, only to measure the
severity of the anxious or depressive condition. Each week, therefore, the question
was how severe the symptoms listed in Figure 1.8 and Figure 1.9 had been
during the past week. Based on these weekly assessments during a course of
treatment with antianxiety or antidepressive medication, it would be possible
to describe the clinical effects of the medication.
Just as Eysenck did, Hamilton made use of factor analysis to demonstrate
the scientific value of his scales in his psychometric publications.
For the depression scale, Hamilton found a varying number of factors
during his studies (Hamilton, 1960, 1967). The first study population was
very homogeneous, namely, depressive patients who were so severely afflicted
that they were hospitalised. In the next study, the patient population was
more heterogeneous, consisting of depressive patients who were either
hospitalised or attending an out-patient clinic. Hamilton could see that, in an
increasingly homogeneous patient group, an increasing number of factors
could be identified, an unfortunate consequence of the correlation method as
a mathematical element of factor analysis.
With his anxiety scale studies in out-patients suffering from anxiety
neurosis, Hamilton found a two-factor model both in the first trial using a
13-item anxiety scale (38) and in the next trial with the final 14-item version
(Hamilton 1969) (40). Hamilton’s factor analysis showed that the first factor
was a general factor while the second factor was dual, as it had negative
loadings on the physical anxiety symptoms and positive factor loadings on
the psychic anxiety symptoms (Table 1.1). This was subsequently confirmed
by a French study using the HAM-A (41).
Hamilton Anxiety Scale (HAM-A14)
1∗ Anxiety
2∗ Tension
3∗ Fears
4 Insomnia
5∗ Difficulties in concentration
6 Depressed mood
7∗ General somatic symptoms (muscular)
8 General somatic symptoms (sensory)
9 Cardiovascular symptoms
10 Respiratory symptoms
11 Gastrointestinal symptoms
12 Genito-urinary symptoms
13 Other autonomic symptoms
14∗ Behaviour at interview
∗ HAM-A6 (the core symptoms of anxiety)
See Appendix 5a for Manual
Figure 1.8 Scoring sheet for HAM-A14
A major international trial with DSM-III panic attack patients confirmed
this HAM-A14 two-factor model (42). On the basis of these results, Hamilton
thought that the first factor was general (i.e., all the symptoms in the scale
concur in measuring one dimension), and that this provided enough evidence
to use the total score as a sufficient statistic. But Hamilton became less confident
about this conclusion when his anxiety scale was not able to distinguish between
placebo and an antianxiety drug (43).
The fact that the second factor in Hamilton’s anxiety scale is bipolar,
or dual, i.e., that some items have negative factor loadings and others have
positive factor loadings, is perhaps the most interesting element in the
factor analysis method. Factor loadings demonstrate the correspondence
between symptoms and the factor in question, and can thus provide
psychological insight. The demonstration that the anxiety condition consists of
physical and psychic anxiety symptoms in equal numbers, seven physical and
seven psychic anxiety symptoms in the HAM-A14, proved to be
highly significant later on. Hamilton did not look into this because interest
was centred on his depression scale in the period from 1969 to 1989.
The Hamilton Depression Scale (HAM-D17)
1∗ Depressed mood
2∗ Guilt feelings and self-depreciation
3 Suicidal ideation
4 Initial insomnia
5 Middle insomnia
6 Delayed insomnia
7∗ Work and interests
8∗ Psychomotor retardation
9 Psychomotor agitation
10∗ Anxiety (psychic)
11 Anxiety (somatic)
12 Gastro-intestinal symptoms
13∗ General somatic symptoms
14 Sexual interest
15 Hypochondriasis
16 Loss of insight
17 Weight loss
∗ HAM-D6 (core symptoms of depression)
See Appendix 3a for Manual
Figure 1.9 Scoring sheet for HAM-D17
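The starred items on the HAM-A and HAM-D scoring sheets define the six-item core scales, and the signs in Table 1.1 split the HAM-A14 into seven psychic and seven somatic items. The sketch below computes these subtotals from item ratings held in a dictionary. The item sets are copied from the figures and the table; the function names and the example ratings are hypothetical. The range check reflects the standard HAM-A format, in which each item is rated 0 to 4; the HAM-D17 items have mixed ranges, so no range check is applied to them here.

```python
# Item sets copied from the HAM-A/HAM-D scoring sheets and the signs in Table 1.1.
HAMA_PSYCHIC = {1, 2, 3, 4, 5, 6, 14}      # positive factor 2 loadings
HAMA_SOMATIC = {7, 8, 9, 10, 11, 12, 13}   # negative factor 2 loadings
HAMA6 = {1, 2, 3, 5, 7, 14}                # starred core items of the HAM-A14
HAMD6 = {1, 2, 7, 8, 10, 13}               # starred core items of the HAM-D17

def subtotal(ratings, items):
    """Sum the ratings (dict: item number -> score) over the given item numbers."""
    missing = items - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for items {sorted(missing)}")
    return sum(ratings[i] for i in items)

def hama_summary(ratings):
    """HAM-A14 total, psychic and somatic subtotals, and the HAM-A6 core score."""
    for item, score in ratings.items():
        if not 0 <= score <= 4:            # each HAM-A item is rated 0 to 4
            raise ValueError(f"item {item}: score {score} outside 0-4")
    return {
        "HAM-A14": subtotal(ratings, HAMA_PSYCHIC | HAMA_SOMATIC),
        "psychic": subtotal(ratings, HAMA_PSYCHIC),
        "somatic": subtotal(ratings, HAMA_SOMATIC),
        "HAM-A6": subtotal(ratings, HAMA6),
    }

# Hypothetical weekly ratings for one patient.
hama = {1: 3, 2: 3, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1,
        8: 1, 9: 0, 10: 0, 11: 1, 12: 0, 13: 1, 14: 2}
print(hama_summary(hama))

hamd = {i: 0 for i in range(1, 18)}
hamd.update({1: 3, 2: 2, 4: 2, 7: 3, 8: 1, 10: 2, 13: 1, 17: 1})
print("HAM-D17:", sum(hamd.values()), "HAM-D6:", subtotal(hamd, HAMD6))
```

Repeating such a summary at each weekly assessment gives the course of the total score, of the core score and of the psychic versus somatic balance during treatment.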
Factor analysis was not able to provide psychological insight into depressive
symptomatology through the factor structure of the HAM-D.
Factor analysis is a psychometric method that reveals a structure in an
assessment scale, but not whether it is a dimension in which the total score is
a meaningful expression of the severity of a condition. In his monograph on
clinimetric methods, Feinstein uses Hamilton’s scales as examples of scales
‘produced by factor analysis’, but without discussing the nature of this
validation procedure (2).
Here it is important to understand that Max Hamilton built on Spearman’s
and Eysenck’s factor analysis within the frame of the two interpretable factors.
Hamilton went on to demonstrate that (particularly in the HAM-A) the first
factor is a general factor while the second factor is bi-directional,
differentiating between somatic and psychic anxiety symptoms. This dualism between
body and mind seems to underline the accepted custom of calling factor 2 a
dual factor.
Factor-analytic studies with Hamilton’s Depression Scale have shown that
different clinical trials differ greatly in the number of factors produced and
in their item loadings. In other words, the American factor-analytic tradition
of extracting as many factors as possible leads to inferior results. The British
tradition (only interpreting the first two factors, the general versus the
dual) would seem to result in a fair amount of agreement between different
clinical trials. A recent landmark study in this respect is the STAR-D
analysis (44).
Pierre Pichot: Symptom rating scales and clinical validity
Pichot obtained his medical degree in Paris in 1947. Like Hamilton, he chose psychiatry, and his purpose was to use psychometrics as a scientific discipline on a par with pharmacology and biochemistry. Pichot therefore studied psychometrics at the Sorbonne in Paris immediately after obtaining his medical degree (3).
He took up a position as registrar at the Sainte-Anne psychiatric hospital in Paris under Professor Delay, who was among the first to demonstrate the antipsychotic effect of chlorpromazine.
In 1972, Pichot made it clear that, from a psychometric point of view, using the HAM-D total score in studies of the antidepressive effect of a drug was a dead end. His reason was that factor analysis had not justified the use of the HAM-D total score. Thus, Pichot did not accept the demonstration of a general factor as adequate evidence that the total score was a sufficient measure of the degree of depression.
Pichot had worked with the US rating scale BPRS (Brief Psychiatric Rating Scale), developed by Overall and Gorham (45). Studies drawing on a pool of more than sixty symptoms had demonstrated that the eighteen symptoms in Figure 1.10 were especially sensitive to change during a course of treatment. This applied to chlorpromazine therapy in psychotic patients and to imipramine therapy in depressive patients. The BPRS is perhaps the most widely used psychiatric rating scale worldwide, possibly because it seems easy to use; compare Kraepelin's symptom list in Figure 1.3.
Pichot then recommended the use of the six BPRS depressive symptoms to measure the antidepressive effect of a drug. A major review of the BPRS some years later confirmed that Pichot's depression factor was an independent factor in the BPRS (46).
Pichot had been trained in the French school of psychometrics. Alfred Binet (1857–1911) founded this school through his intelligence tests for primary school pupils. Binet's starting point was that school teachers knew most about the intellectual abilities of their pupils at the different levels of primary school. Binet therefore enlisted the aid of the most experienced school teachers when choosing intelligence tests, instead of using Spearman's factor analysis. Binet 'outperformed' Spearman, as updated versions of Binet's tests are now in general use.
In 1905 Binet declared:
Our aim is, when a child is put before us, to take the measurement
of his intellectual ability, in order to establish whether he is normal
or if he is retarded. For this purpose we have to study his present
condition, and this condition alone… as a result we shall neglect
entirely his aetiology… We shall confine ourselves to gathering
together the truth on his present condition ( 47 ).
Pichot thus held the opinion that rating scales measuring antipsychotic, antidepressive or antianxiety effect should be based on the clinical reality of the assessments of experienced psychiatrists, not on factor analysis (3). The version of the BPRS shown in Figure 1.10 is identical in its symptom descriptions to Overall's reference (The semi-structured Brief Psychiatric Rating Scale interview and rating guide) (48).
The descriptions applying to the absence of a symptom are taken from Turner's 1963 version (49). The first 18 items make up the BPRS-18. Two extra items are included to allow measurement of mania (Figure 1.11).
Brief Psychiatric Rating Scale Score
1 Somatic concern
2 Anxiety, psychic
3 Emotional withdrawal
4 Conceptual disorganization
5 Guilt feelings
6 Tension
7 Mannerisms and posturing
8 Grandiosity
9 Depressive mood
10 Hostility
11 Suspiciousness
12 Hallucinatory behaviour
13 Motor retardation
14 Uncooperativeness
15 Unusual thought content
16 Blunted or inappropriate affect
17 Elation/euphoria
18 Confusion and disorientation
Figure 1.10 Brief Psychiatric Rating Scale
Mania scale: Grandiosity [8]; Uncooperativeness [14]; Hostility [10]; Increased psychomotor activity [17]; Intrusive behaviour; Elevated mood
Schizophrenia scale: Emotional withdrawal [3]; Conceptual disorganisation [4]; Suspiciousness [11]; Hallucinations [12]; Unusual thought content [15]; Blunted affect [16]
Depression scale: Somatic concern [1]; Anxiety, psychic [2]; Guilt feelings [5]; Tension [6]; Depressive mood [9]; Motor retardation [13]
Figure 1.11 The three BPRS subscales. The numbers in brackets are the item numbers given in Figure 1.10.
A clinical validity analysis of the BPRS yields a depression factor, a mania factor and a schizophrenia factor, as seen in Figure 1.11. When the clinical effect of antipsychotics is assessed, the mania and schizophrenia scales are often combined into a general psychosis factor.
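Once each item has been rated, the subscale scores are obtained by summing the items listed in Figure 1.11. The following is a minimal sketch under three assumptions. First, ratings are held in a plain dictionary keyed by BPRS item number. Second, only the depression and schizophrenia subscales are included, because two of the mania items carry no item number in the figure. Third, the names `subscale_score`, `DEPRESSION_ITEMS`, `SCHIZOPHRENIA_ITEMS` and the example ratings are hypothetical, for illustration only.

```python
# Item numbers taken from Figure 1.11 (numbering of the BPRS-18 in Figure 1.10).
DEPRESSION_ITEMS = (1, 2, 5, 6, 9, 13)
SCHIZOPHRENIA_ITEMS = (3, 4, 11, 12, 15, 16)


def subscale_score(item_scores: dict[int, int], items: tuple[int, ...]) -> int:
    """Sum the already-rated scores of the listed BPRS items.

    A missing item raises KeyError instead of being silently counted as 0.
    """
    return sum(item_scores[i] for i in items)


# Hypothetical ratings for one patient (item number -> score); not real data.
ratings = {i: 0 for i in range(1, 19)}
ratings.update({1: 2, 2: 3, 5: 2, 6: 2, 9: 4, 13: 1, 11: 1})

print(subscale_score(ratings, DEPRESSION_ITEMS))     # 14
print(subscale_score(ratings, SCHIZOPHRENIA_ITEMS))  # 1
```

The two subscales share no items, so a patient's profile can be read off as separate depression and schizophrenia sums.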
In his final work, Psychology Down the Ages (1937) (50), Spearman writes that the correlation coefficient developed by Pearson and himself was understood and used only in English-speaking countries. In France, and especially in Germany, it was rejected. Classical psychometrics is based on the concept of correlation as used in factor analysis and in Cronbach's alpha. It is therefore typically described in the major American standard works on psychometrics:
- Guilford 1936 (5) and its second edition, 1954 (19);
- Nunnally 1967 (51), and Nunnally and Bernstein's third edition, 1994 (52);
- Comrey 1992 (22).
The interpretation of Hamilton Depression Scale results is problematic precisely because these major monographs on factor analysis lie within the American frame of reference. The American tradition lays stress on the number of factors. The British tradition instead uses Ockham's razor, i.e., the principle of simplicity, and focuses on the first two factors (the general versus the dual). Hamilton relied chiefly on Hotelling's principal component analysis.
The English philosopher William Ockham (1285–1349) described the principle afterwards known as Ockham's razor: when working with a scientific hypothesis, one should assume only what is strictly necessary (the law of parsimony).
This was precisely Pichot's point: psychometric analysis of rating scales should avoid the factor-analytic methods of the American tradition. Instead, such analysis should follow Binet's model and use experienced psychiatrists as a test of clinical validity. It should then use item response theory models to determine whether it is valid to sum the individual items into a total score. In Pichot's opinion, Binet had used the same reasoning in developing his intelligence tests as that behind the item analyses Rasch published in 1960.
2 Modern psychiatry: DSM-IV / ICD-10
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
Focusing on reliability
As can be seen from the preceding chapter, classical psychometrics in psychiatry has mainly been influenced by the work of clinical psychiatrists (Kraepelin, Hamilton and Pichot) rather than by psychologists. In the field of psychometrics, Spearman and Eysenck attempted to measure the dimensions of intelligence and personality, respectively. These are manifestations of the human mind of a more permanent nature. Kraepelin, Hamilton and Pichot were instead absorbed by symptoms that reflect clinical conditions, for which modern psychopharmacology has now made treatment possible. Here it should be mentioned that Gorham, rather than Overall, was the clinically experienced person behind the BPRS. He had seen the major effects of chlorpromazine and imipramine when these drugs became available during the 1950s and 60s.
In clinical psychiatry, the classic psychological test has always been regarded as a supplement to psychiatric diagnosis. There were two psychiatric diagnostic systems:
- the International Classification of Diseases (ICD), adopted by WHO in 1948 in its sixth edition (ICD-6);
- the American Diagnostic and Statistical Manual of Mental Disorders (DSM), whose first edition (DSM-I) was published in 1953.

Both systems, like the psychological tests (e.g., the Rorschach test), had a major problem: their reliability was only about 0.50. Reliability refers to the degree of unanimity a group of psychiatrists can achieve when making an ICD or DSM diagnosis. Likewise, it refers to the unanimity a group of psychologists achieves when interpreting a Rorschach test. Reliability is expressed by a 'coefficient of reliability', such as the intraclass or kappa coefficient. If the coefficient is around 0.50, one might just as well have tossed a coin. To be clinically meaningful, a coefficient of reliability must be around 0.80.
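Kappa corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa for two raters shows the idea. It assumes each rater assigns exactly one diagnostic category per patient. The function name `cohens_kappa` and the example diagnoses are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who each assign one category per patient."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of patients given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, product of the two raters' marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical diagnoses of ten patients by two psychiatrists (not real data).
psychiatrist_1 = ["dep"] * 5 + ["anx"] * 5
psychiatrist_2 = ["dep", "dep", "dep", "anx", "anx", "anx", "anx", "anx", "dep", "dep"]
print(round(cohens_kappa(psychiatrist_1, psychiatrist_2), 2))  # 0.2
```

In this example the two psychiatrists agree on 6 of 10 patients, yet kappa is only 0.2, because half of that agreement would be expected by chance alone.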
A complete revolution in clinical psychiatric diagnosis took place around 1980, about one hundred years after the establishment of Wundt's psychological laboratory in Leipzig. Two US psychiatrists, Spitzer and Klerman, had used rating scales for many years. They had noticed that when several psychiatrists diagnosed a patient according to the diagnostic system then in use, their agreement (reliability) was very poor. By contrast, the reliability of the HAM-D, HAM-A and BPRS was very high (53). Furthermore, Spitzer and especially Klerman were greatly concerned that modern psychopharmacology was often used for illnesses for which it was unsuitable. Conversely, it was often not used in patients who might benefit from drug therapy.
In 1980, the American Psychiatric Association published a completely new diagnostic system based solely on the symptom profile. This ensured adequately high reliability, so that patients with treatment-demanding depression or anxiety could receive the proper psychopharmacological treatment.
The new diagnostic system was the third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM), with the acronym DSM-III (54). With the DSM-III, a very good agreement emerged between the HAM-D score and the diagnosis of major depression.
In 1992, the World Health Organization (WHO) published its 10th revision of the International Classification of Diseases (ICD), subsequently given the acronym ICD-10 (55). This system largely copies the DSM-III but is unfortunately not its identical twin. Like the DSM-III, the ICD-10 is in high agreement with the HAM-D, HAM-A and BPRS.
Precisely because the DSM-III and ICD-10 are two not quite identical systems, rating scales such as the HAM-D have become the natural common denominator. Thus, a score of 18 or more on the HAM-D indicates a treatment-demanding depression according to both DSM-III and ICD-10.
Focusing on validity
The symptoms included in DSM-III were chosen through consensus rather than through empirical research (56). The same applies to DSM-IV, published in 1994. DSM-IV is almost identical to DSM-III but is still not adequately aligned with ICD-10.
According to DSM-IV, a treatment-demanding depression is called a major depression. It is defined by the algorithm that at least five of nine symptoms should have been present almost every day throughout the previous two weeks. According to ICD-10, a moderate depression requires that at least six out of ten symptoms be present almost every day throughout the previous two weeks. As can be seen, these DSM-IV and ICD-10 cut-off scores for 'typical' depression follow Russell's definition of a typical Englishman, i.e., more than 50% of the total number of items.
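The counting rule described above can be written out in a few lines. This is a minimal sketch of the simplified "at least m of n symptoms" rule only. The full manuals add further conditions that are not modelled here; DSM-IV, for example, requires depressed mood or loss of interest to be among the symptoms. The names `meets_count_rule` and `RULES` and the example patient are hypothetical.

```python
def meets_count_rule(symptoms_present: list[bool], minimum: int) -> bool:
    """True if at least `minimum` of the listed symptoms are present.

    Simplified count rule only: the full DSM-IV and ICD-10 criteria add
    further conditions that are not modelled here.
    """
    return sum(symptoms_present) >= minimum


# Count thresholds as summarised in the text: (minimum present, number of symptoms).
RULES = {"DSM-IV major depression": (5, 9), "ICD-10 moderate depression": (6, 10)}

for name, (minimum, total) in RULES.items():
    # Both thresholds lie above half of the listed symptoms.
    print(f"{name}: {minimum} of {total} = {minimum / total:.0%}")

# Hypothetical patient with 5 of the 9 DSM-IV symptoms present.
patient = [True, True, True, True, True, False, False, False, False]
print(meets_count_rule(patient, RULES["DSM-IV major depression"][0]))  # True
```

The printed proportions (56% and 60%) make the "more than 50%" pattern explicit.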
Klerman has called the introduction of the DSM-III a neo-Kraepelinic paradigm. This is often perceived as a biological–medical approach to clinical psychiatry. It stands in contrast to the Freudian approach that prevailed between the two world wars of the twentieth century.
The term neo-Kraepelinic refers only to the fact that Kraepelin introduced a thorough symptom description, so that the course of symptoms could provide diagnostic information. Kraepelin had learned from Wundt to describe clinical symptoms under standardised conditions without being influenced by aetiological considerations. However, Kraepelin's description was not completely atheoretical, since the medical model of disorders also 'hovered' at the back of his mind.
In all medical conditions, an attempt is made to delimit the symptom complex. This is the syndrome that the symptoms point to during the course of the illness, from the onset of symptoms to their reduction during treatment.
The clinical reality referred to by the various rating scales is purely psychiatric and thus neo-Kraepelinic. DSM-III/DSM-IV (54, 56) and ICD-10 (55) adhere, more or less, to this reality. Thus, lithium affects the 'positive' symptoms of mania but not those of schizophrenia, which makes the distinction between positive and negative symptoms meaningless.
Quantitative, dimensional diagnosis
Completely new revisions of DSM-IV and ICD-10 are planned in which the dimensional approach will be the decisive factor. However, factor analysis is still employed to identify the 'dimensions' to be combined with the diagnostic descriptions in DSM-V. A thematic section on DSM-V and ICD-11 in the journal Psychological Medicine used factor analysis to show how symptoms cluster (57):
- some symptoms cluster in a mania factor;
- some cluster around the positive and negative schizophrenia factors, respectively;
- a few cluster around depression factors.

These clusters are not true dimensions, in which the identified symptoms cover the whole of the dimension in question. Establishing such dimensions is what modern psychometrics can achieve through item response theory models, as will be shown in the next chapter.
Enhanced inter-rater reliability is the great improvement brought about by modern psychiatry (DSM-III/IV or ICD-10). Reliability is a major component of classical psychometrics, but certainly not of modern psychometrics. Furr and Bacharach's recently published book on psychometrics deals extensively with reliability (58). Modern psychometrics, by contrast, focuses on item response theory models. According to modern psychometrics, a scale with adequate validity also possesses adequate reliability (4).
When evaluating the reliability of an observer rating scale, the degree of unanimity between raters is analysed. According to the clinimetric approach, the experienced clinician (the 'master') must be the key. The analysis then determines the percentage of clinicians using a given scale whose ratings deviate from those of the master. Some consider a deviation of ±20% acceptable, as for example with the PANSS.
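The text does not specify exactly how the deviation from the master is computed. The sketch below assumes one possible reading: each rater's total score is compared with the master's total score for the same interview. A rater counts as acceptable if the relative deviation is at most ±20%. The function `share_within_tolerance` and the example totals are hypothetical.

```python
def share_within_tolerance(master_total: float, rater_totals: list[float],
                           tolerance: float = 0.20) -> float:
    """Proportion of raters whose total lies within +/- tolerance (a fraction of
    the master's total) of the master's total score."""
    if master_total == 0:
        raise ValueError("a relative deviation needs a non-zero master score")
    if not rater_totals:
        raise ValueError("no rater scores given")
    within = [abs(r - master_total) / master_total <= tolerance for r in rater_totals]
    return sum(within) / len(within)


# Hypothetical totals: the master scores one interview 50; six trainee raters
# score the same interview.
trainees = [45, 52, 58, 61, 41, 38]
print(f"{share_within_tolerance(50, trainees):.0%}")  # 67%
```

Here two of the six trainees (61 and 38) fall outside the ±20% band around the master's score of 50.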
A 'democratic' approach to inter-observer agreement uses an intraclass correlation coefficient, in which no single rater serves as the master. The interview-based scales included here all possess adequate reliability.
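There are several forms of the intraclass coefficient, and the text does not say which one the cited studies used. As an assumption, the sketch below uses the simplest: Shrout and Fleiss's one-way random-effects ICC(1,1). It compares the variance between patients with the disagreement between raters on the same patient, treating all raters alike. The function name `icc_oneway` and the scores are hypothetical.

```python
import numpy as np


def icc_oneway(ratings) -> float:
    """ICC(1,1), one-way random effects, single rater.

    `ratings` is an (n patients x k raters) array of scores.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    if n < 2 or k < 2:
        raise ValueError("need at least two patients and two raters")
    patient_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-patient mean square: spread of the patients' mean scores.
    ms_between = k * np.sum((patient_means - grand_mean) ** 2) / (n - 1)
    # Within-patient mean square: disagreement between raters on the same patient.
    ms_within = np.sum((x - patient_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


# Hypothetical total scores for four patients, each rated by two raters.
scores = [[10, 12], [20, 18], [15, 15], [25, 27]]
print(round(icc_oneway(scores), 3))  # 0.964
```

Two-way forms, which separate a systematic rater effect, are a common alternative when the same raters score every patient.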
The reliability of a questionnaire is indicated by a test–retest reliability coefficient. This is the agreement between the results of the questionnaire completed at two different points in time. When measuring anxiety and depression, the condition must have been fairly stable between the two test times; otherwise the test–retest reliability coefficient is not meaningful.
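A test–retest coefficient is often computed as a Pearson correlation between the two sets of scores. This is an assumption, since the text does not name the coefficient. Below is a minimal sketch with a hypothetical function `pearson_r` and invented scores.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical questionnaire totals for five stable patients at two time points.
test = [10, 14, 18, 22, 30]
retest = [11, 13, 19, 21, 31]
print(round(pearson_r(test, retest), 2))  # 0.99
```

A correlation does not detect a systematic shift. If every retest score were 5 points higher, r would still be 1, which is one reason an intraclass coefficient is sometimes preferred.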
Classical psychometrics uses Cronbach's coefficient alpha to avoid the test–retest problem of whether the condition has remained constant. The coefficient uses a single time point to indicate the degree of correlation between the individual questions. Cronbach's alpha says nothing about the validity of the individual questions, only about their reliability and mutual agreement. In his 1987 book on clinimetrics, Feinstein attempted to put a stop to the use of Cronbach's alpha, because its size depends on the number of questions: the more questions, the higher the reliability (2). Furr and Bacharach, mentioned above, use the same conditions as Feinstein, a mean correlation of 0.30 between questions. They demonstrate how alpha grows with the number of items:
- in a 4-item questionnaire, Cronbach's alpha is 0.40;
- in an 8-item questionnaire, it is 0.60;
- in a 20-item questionnaire, it approaches 0.80.

According to classical psychometrics, 0.80 is the value a questionnaire should achieve for adequate reliability.
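The dependence on the number of items can be made explicit with the standardised form of alpha. This form depends only on the number of items k and the mean inter-item correlation r̄: α = k·r̄ / (1 + (k − 1)·r̄). A minimal sketch, assuming this formula, follows. The function name `standardised_alpha` is hypothetical.

Computed this way, a mean correlation of 0.30 gives about 0.63, 0.77 and 0.90 for 4, 8 and 20 items. The values quoted above (0.40, 0.60 and about 0.80) are instead reproduced by a mean correlation of about 0.15, so the published example may rest on somewhat different assumptions.

```python
def standardised_alpha(k: int, mean_r: float) -> float:
    """Standardised Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


for mean_r in (0.15, 0.30):
    alphas = [round(standardised_alpha(k, mean_r), 2) for k in (4, 8, 20)]
    print(mean_r, alphas)
# 0.15 [0.41, 0.59, 0.78]
# 0.3 [0.63, 0.77, 0.9]
```

Whatever the mean correlation, alpha rises steadily with k. This is the property Feinstein objected to.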
To ensure a high Cronbach alpha coefficient, many questionnaires have approximately 20 items. This is perhaps the cause of a growing dislike of questionnaires in the general population, as it is obvious to everyone that many of the items are redundant. However, Furr and Bacharach do not share this concern. They devote many pages to explaining that modern statistical software programs (SPSS and SAS) make it extremely easy to compute Cronbach's alpha.
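The usual computation from raw item scores fits in a few lines. As a sketch, it is written here in Python with NumPy rather than in the SPSS or SAS procedures named in the text. The function name `cronbach_alpha` and the data are hypothetical. The formula compares the sum of the item variances with the variance of the total score: α = k/(k − 1) · (1 − Σ var(item) / var(total)).

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha from an (n respondents x k items) score matrix."""
    x = np.asarray(scores, dtype=float)
    n_respondents, k = x.shape
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = x.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # sample variance of the total scores
    return float(k / (k - 1) * (1 - item_variances.sum() / total_variance))


# Hypothetical scores of five respondents on a three-item questionnaire.
answers = [[1, 2, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 4, 5]]
print(round(cronbach_alpha(answers), 2))  # 0.95
```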
Furr and Bacharach attempt to convince their readers that an alpha coefficient of 0.75 does not in itself make a factor analysis superfluous. If coefficient alpha is perceived as a reliability coefficient, then stress must be placed on the mutual agreement between the different items. Principal component analysis is the simplest form of factor analysis. In it, the unanimity shown by the alpha coefficient corresponds to the demonstration of a general factor, i.e., all items correlating positively with the first component. Both of Hamilton's multidimensional assessment scales, the Depression Scale and the Anxiety Scale, have an alpha coefficient higher than 0.75 (59). Hamilton felt that demonstrating a general factor implies that the individual items in his scales may be summed as a measure of the severity of depression or anxiety. However, neither coefficient alpha nor factor analysis is a statistical method that tests whether a scale measures the degree of severity by the sum of its items.
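What "demonstrating a general factor" amounts to in principal component analysis can be sketched as follows. The items' loadings on the first principal component of the correlation matrix are computed, and the check is whether they all have the same (positive) sign. The function names and both correlation matrices are hypothetical. As the text stresses, passing this check does not test whether the sum of the items measures severity.

```python
import numpy as np


def first_component_loadings(corr) -> np.ndarray:
    """Loadings of the items on the first principal component of a correlation matrix."""
    r = np.asarray(corr, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(r)  # eigenvalues in ascending order
    loadings = eigenvectors[:, -1] * np.sqrt(eigenvalues[-1])
    # An eigenvector's sign is arbitrary: orient it so that most weight is positive.
    return loadings if loadings.sum() >= 0 else -loadings


def has_general_factor(corr) -> bool:
    """True if every item loads positively on the first principal component."""
    return bool(np.all(first_component_loadings(corr) > 0))


# Hypothetical 4-item correlation matrix with all inter-item correlations 0.30.
uniform = np.full((4, 4), 0.30)
np.fill_diagonal(uniform, 1.0)
print(has_general_factor(uniform))  # True

# Hypothetical matrix in which the third item correlates negatively with the others.
mixed = np.array([[1.0, 0.5, -0.5], [0.5, 1.0, -0.5], [-0.5, -0.5, 1.0]])
print(has_general_factor(mixed))  # False
```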
Only the modern item response theory model can statistically test whether an assessment scale measures degree of severity. In a certain sense, the item response theory model has demonstrated which items matter most in the measurement of depression or anxiety. For depression, these are the typical depressive symptoms. For anxiety, they are the psychic anxiety symptoms of the dual or bi-directional factor 2. This implies that relatively few items (fewer than ten) are important in the measurement of depression or anxiety. If Cronbach's alpha is used on its own, many more items, typically 15 to 20, are needed to exceed the 0.75 limit.
The DSM-IV/ICD-10 diagnostic systems (e.g., in schizophrenia or depression) and modern psychometric methods agree in recommending approximately ten symptoms as a suitable number. This indicates that clinical reality can be adequately described through 'a handful of items'. Classical psychometrics, with Cronbach's alpha coefficient or factor analysis, has typically been used by those interested in personality questionnaires. As Nunnally (1967) states: 'it is unrealistic for the measurement of most human traits only to have a handful of items' (51, 52).
3 Modern, dimensional psychometrics
In modern psychometrics, the measurement of the manifestations of the mind involves two absolutely essential elements. The first of these core features is that each symptom in a rating scale is itself measured on a scale. The term 'scale' is derived from the Italian scala, meaning 'stairs'. In the DSM-IV and ICD-10 systems, a symptom has only one step. Either the symptom is absent and one is on the ground floor, or it is present and one is one step down towards the basement. In clinical psychometrics, it is deemed essential to have several steps, and a six-step 'basement stair' is thought optimal for measuring each symptom.
The second core feature in modern psychometrics is whether the scores of the symptom items belonging to a syndrome, e.g., depression, can be added up. If they can, the sum of all the symptoms constitutes a sufficient statistic for the present state.
Three statisticians have, each in their own way, played a vital part in the development of modern psychometrics, namely Ronald A. Fisher, Georg Rasch and Sidney Siegel.
Ronald A. Fisher: From Galton's pioneer work to the sufficient statistic
Some people believe that psychometrics in fact started with Francis Galton (1822–1911). In contrast to Wundt, Galton attempted to connect psychometrics to the theory of evolution put forth by his cousin Charles Darwin (1809–82) in The Origin of Species. It was the psychosocial aspect of the theory of evolution that Galton attempted to measure. In 1883, he published 'Inquiries into Human Faculty and its Development', which was actually a collection of rather mixed essays. It is an anthropological rather than a psychometric publication. Of particular psychometric significance is Galton's attempt to develop 'verbal scales' containing several response categories. He discovered how difficult it is to describe these 'orders of magnitude' so that they are understood in the same way from one subject to another.
In 1884, Galton established the first psychological laboratory in Britain (in London). He developed an increasing interest in mathematical statistical problems. It was Galton's pupil Karl Pearson (1857–1936) who published a correlation analysis analogous to that of Spearman. In his ground-breaking work from 1904, Spearman writes that it was actually Galton who put forward the idea of correlation analysis in 1886. Galton was seeking a mathematical expression of how strongly two variables are related (17):
- the value 1 signified perfect correlation between two variables (e.g., that people with long arms usually also have long legs);
- the value 0 meant no correlation;
- the value −1 meant a perfect negative correlation.
After Galton's death in 1911, Karl Pearson took over his professorship in genetics. He also established an institute for applied statistics at the University of London, into which Galton's laboratory was incorporated.
In their statistical work, both Galton and Pearson were interested in those physical or mental qualities that have a normal distribution. Galton measured the height of 8585 British citizens. He found a mean and a dispersion that were in accordance with the normal distribution, the so-called Gaussian bell curve.
Ronald Fisher (1890–1962) worked at Galton’s laboratory in the
1930s ( 60 ).
Fisher was a mathematician who had developed a great interest in statistics. He worked on solving the problems that arise when statistics is applied to small data sets. The aim here was statistical inference: judging how representative a sample's observations were of the distribution one sought to estimate (e.g., the normal or Gaussian distribution). In other words, the problem was how to estimate the parameters of that distribution.
In 1922, Fisher published the paper 'On the Mathematical Foundations of Theoretical Statistics' (61). In it he states that the statistician's task is to ensure minimal loss of information when data are reduced, for example, to the parameters of a normal distribution. It is therefore important to find statistics that retain all the information the sample contains about these parameters, i.e., sufficient statistics.
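Sufficiency can be made concrete with the simplest case, a sequence of independent yes/no observations with an unknown probability p. The total number of "yes" answers is sufficient for p. Once the total is known, the probability of any particular sequence no longer depends on p, so the sequence carries no further information about it. The sketch below checks this by brute-force enumeration. The function names and the observed sequence are hypothetical.

```python
from itertools import product
from math import comb


def sequence_probability(seq: tuple[int, ...], p: float) -> float:
    """Probability of a particular 0/1 sequence of independent Bernoulli(p) trials."""
    prob = 1.0
    for x in seq:
        prob *= p if x == 1 else 1 - p
    return prob


def probability_given_total(seq: tuple[int, ...], p: float) -> float:
    """P(seq | total number of 1s in seq), by enumerating all sequences with that total."""
    n, s = len(seq), sum(seq)
    same_total = (other for other in product((0, 1), repeat=n) if sum(other) == s)
    return sequence_probability(seq, p) / sum(sequence_probability(o, p) for o in same_total)


observed = (1, 0, 1, 0)
for p in (0.2, 0.5, 0.9):
    print(p, round(probability_given_total(observed, p), 4))  # 0.1667 for every p
print(1 / comb(4, 2))  # the same value: 1 over the number of sequences with total 2
```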
Many regard Ronald A. Fisher as the founder of medical statistics, especially with reference to the first edition of 'Statistical Methods for Research Workers' (1925). As mentioned previously, Hotelling reviewed this book in 1927.
34 Clinical Psychometrics
Georg Rasch: From Guttman's pioneer work to item response theory analysis (IRT)
Georg Rasch (1901–80) was a professor of statistics for Danish psychologists. Like Fisher, Rasch had a degree in mathematics, with an MSc from the University of Copenhagen in 1925 and a doctorate in 1930. His doctoral thesis was entitled 'On Matrix Calculus and its Application in Differential Equations'. At that time two professorships in mathematics were available in Copenhagen. One was given to A.F. Andersen (1891–1972) and the other to the young Børge Jessen (1907–93).
In 1935, Rasch received a Rockefeller scholarship for 12 months of study at Fisher's London institute, as he was now moving on to statistics. Fisher's concept of sufficiency inspired the psychometric model Rasch developed in the 1950s, which was to become the basis of modern psychometrics (62).
The central question in modern psychometrics is whether a latent additive function underlies the symptoms of a rating scale. If it does, the total score is a sufficient statistic for the present symptom profile.
When discussing the item response theory model published by Rasch in 1960, it is important to realise that the model is not the result of a theoretical study. This IRT (item response theory) analysis was developed in connection with a research problem. That problem required a method for comparing subjects independently of the items with which they had been measured.
In his empirical studies, Rasch was very interested in his subjects' ability to solve mathematical problems. To assess this ability, arithmetical problems were chosen that could be ranked according to difficulty:
- some very easy to solve;
- some slightly more difficult;
- some moderately difficult;
- some markedly difficult;
- some highly or extremely difficult.

The bright student is able to solve almost all the problems, while the less able student can solve only the easier ones.
Each problem may be scored as correctly or incorrectly solved (on a nominal scale). If the data fit the Rasch model, it can then be demonstrated that the number of correct answers is a sufficient measure of the subject's present ability to solve arithmetical problems. The Rasch analysis investigates whether the skilled mathematician's ranking of the problems still holds when external factors such as age and gender are taken into account.
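The link between the Rasch model and sufficiency can be checked numerically. In the dichotomous Rasch model, the probability that a subject with ability θ solves an item of difficulty b is exp(θ − b) / (1 + exp(θ − b)). Take two response patterns with the same total score. The ratio of their probabilities is then the same for every θ, so once the total is known the pattern tells nothing more about ability. The sketch below assumes this standard dichotomous form. The item difficulties and all names are hypothetical.

```python
import math


def p_endorse(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: probability of a correct answer (or symptom present)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pattern_likelihood(pattern, theta, difficulties) -> float:
    """Probability of a whole 0/1 response pattern for a subject with ability theta."""
    likelihood = 1.0
    for x, b in zip(pattern, difficulties):
        p = p_endorse(theta, b)
        likelihood *= p if x == 1 else 1.0 - p
    return likelihood


# Hypothetical item difficulties, ordered from easiest to hardest.
difficulties = [-1.5, -0.5, 0.5, 1.5]
easy_two = (1, 1, 0, 0)  # the two easiest items solved: total score 2
hard_two = (0, 0, 1, 1)  # the two hardest items solved: also total score 2

for theta in (-1.0, 0.0, 2.0):
    ratio = pattern_likelihood(easy_two, theta, difficulties) / pattern_likelihood(hard_two, theta, difficulties)
    print(theta, round(ratio, 3))  # 54.598 (= e**4) whatever theta is
```

The expected pattern, solving the easy problems, is about 55 times more likely than the reversed one at every ability level. This is the statistical counterpart of the bright-versus-less-able student ordering.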
When Rasch analysis is applied to a symptom rating scale, the prevalence of the symptoms is analysed. Figure 3.1 shows a prevalence ranking of six depressive symptoms. In mild cases of depression, the symptoms 'lowered mood', 'loss of interest' and 'tiredness' are almost always present. These are therefore the three symptoms a GP must especially enquire about. 'Tiredness' is often the symptom that brings the patient to the doctor. The patient will often say that being very tired makes one depressed or less interested in daily activities. If the doctor finds no 'organ-related' or physical explanation for the tiredness, it is important to quantify whether lowered mood and reduced interest in daily activities are also present.
When presented with a depressed patient in a psychiatric emergency ward, the doctor on call has to decide whether hospitalisation is necessary. To do so, the doctor must determine whether suicidal impulses are present. At this point, the rarer symptoms in Figure 3.1 must be clarified, i.e., 'guilt feelings' and 'psychomotor retardation'. The symptom 'suicidal ideation' is extremely difficult to assess. However, a depressive state is by far the most common cause of suicide. It is therefore very important to establish the presence or absence of 'guilt feelings' and 'psychomotor retardation'.
Returning to the ability to solve mathematical problems, the bright student will be able to solve both easy and difficult problems. In the same way, a depressed patient with 'guilt feelings' will usually also have the milder depressive symptoms, i.e., lowered mood, loss of interest and tiredness.
In psychometric terminology, these three symptoms are termed 'ceiling symptoms', as they reach the frequency ceiling even in mild depressive states (Figure 3.1). The Rasch model uses the term 'item parameter difficulty', and 'ceiling items' are then classified as items with low difficulty. The symptoms 'guilt feelings' and 'psychomotor retardation' are referred to in psychometrics as 'floor symptoms', as they only emerge in more severe states of depression. In the Rasch model, the item parameter difficulty ranges from minus 2 to plus 2 (63). To reflect the underlying dimension of depressive states, the rank ordering into 'ceiling items' versus 'floor items' can be transformed into a dimension of depression on a scale from 1 to 5. Here Rasch minus 2 = 1, minus 1 = 2, 0 = 3, plus 1 = 4 and plus 2 = 5. Applying this to the HAM-D6 rating scale for depression, we have confirmed its clinical validity by the psychometric (Rasch) model of measurement (64).
Figure 3.1 Prevalence structure of the six depression symptoms (depressed mood, lack of interests, anxious mood, tiredness and pains, guilt feelings, psychomotor retardation). Axes: frequency percentage; severity of depression. The ceiling is marked.
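The transformation from Rasch difficulty to the 1–5 depression dimension given above is a shift by 3 at the integer points. The sketch below assumes that values between the integer points are mapped linearly; the text does not state this. It also shows that sorting items by difficulty lists 'ceiling items' before 'floor items'. The item labels and difficulty values are hypothetical, not estimates from any study.

```python
def rasch_to_dimension(difficulty: float) -> float:
    """Map a Rasch item difficulty on the -2..+2 range to the 1..5 depression dimension.

    The integer points follow the text (-2 -> 1, -1 -> 2, 0 -> 3, +1 -> 4, +2 -> 5);
    treating values in between linearly is an assumption of this sketch.
    """
    if not -2.0 <= difficulty <= 2.0:
        raise ValueError("difficulty outside the -2 to +2 range described in the text")
    return difficulty + 3.0


# Hypothetical item difficulties (not estimates from any study).
items = {"item A": -1.8, "item B": -1.2, "item C": 0.3, "item D": 1.6}
# Sorting by difficulty lists 'ceiling items' (low difficulty) before 'floor items'.
for name, b in sorted(items.items(), key=lambda kv: kv[1]):
    print(name, b, rasch_to_dimension(b))
```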
The symptom 'suicidal ideation' can be seen as a 'floor symptom', and thus as the last link in a chain. In this chain, the three 'ceiling symptoms' form the first link and the three 'floor symptoms' the next. When suicidal ideation is present, the patient should be closely monitored and hospitalised.
The so-called 'ceiling symptoms' (Figure 3.1) occur before the 'floor symptoms' whether one assesses men or women, older or younger persons. This is an expression of 'the concept of transferability' in applied psychometrics. Computerised Adaptive Testing (CAT) is often referred to in modern psychometrics (65).

A rating scale or questionnaire stores many single elements in its individual items, and the total score reduces them dramatically to a single sum. Some people view this as a sign of reductionism: an eagerness to reduce that may tempt one to claim to have extensively analysed what one seeks to measure. Rasch considered it extremely important to avoid this reductionism once empirical evidence showed that a rating scale or questionnaire fulfilled the item response theory model. The items isolated in this manner measured a very important quantitative aspect, while the excluded items might have important independent significance. When the quantitative degree of depression is measured with Hamilton's 17 items, items such as sleep disturbances and suicidal ideation are excluded. Sleep disturbances are excluded because they are often present in mild depressive states but not always in severe ones. The issue of suicidal ideation is often so complex that it is important first to have the underlying quantitative measurement performed.
As will be seen later on, the Rasch sufficiency line of thought, i.e., a truly reductive measurement technology, is important for dose–response relationships when using antidepressants. Clinical trials of antidepressants involve an inherent measurement problem: better or worse outcome, milder or more severe degree of depression. Rasch analysis has solved this problem.
It is indeed interesting to follow the thoughts of the psychologist J. Michell in his two monographs, in which he reviews psychometrics within scientific psychology from a historical perspective. In his first monograph (An Introduction to the Logic of Psychological Measurement), his review of psychometrics ends with Guttman's cumulative rating scale (66). This scale exactly fulfils the mathematical principle inherent in the item response theory model: that the difference between subjects can be measured when the total score is a sufficient statistic, i.e., that the clever student is able to solve both the difficult and the less difficult problems, while the less clever student has only managed to solve the easier problems. However, Guttman's cumulative scale is a deterministic scale, which does not permit the statistical uncertainty that must be accepted, not least in the clinical field. The Rasch method is often called a statistical (probabilistic) version of the Guttman scale.
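The deterministic character of the Guttman scale can be made concrete with a short Python sketch. It uses invented 0/1 data (not taken from the text), orders the items by how often they are endorsed, and counts 'Guttman errors', i.e., cases where a subject endorses a harder item while failing an easier one. In a perfect cumulative scale the count is zero and each subject's response pattern can be reproduced from the total score alone; the function names are illustrative only.

```python
from typing import List

Matrix = List[List[int]]


def item_order(responses: Matrix) -> List[int]:
    """Item indices sorted from most to least often endorsed (easiest item first)."""
    n_items = len(responses[0])
    endorsed = [sum(row[j] for row in responses) for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: -endorsed[j])


def guttman_errors(responses: Matrix) -> int:
    """Count, over all subjects, item pairs where the easier item is 0 but the harder item is 1."""
    order = item_order(responses)
    errors = 0
    for row in responses:
        ordered = [row[j] for j in order]
        for easy in range(len(ordered)):
            for hard in range(easy + 1, len(ordered)):
                if ordered[easy] == 0 and ordered[hard] == 1:
                    errors += 1
    return errors


def pattern_from_total(total: int, n_items: int) -> List[int]:
    """In a perfect Guttman scale, a total score s means exactly the s easiest items are endorsed."""
    return [1] * total + [0] * (n_items - total)


# Hypothetical 0/1 data: rows are subjects, columns are items (1 = problem solved / symptom present).
perfect = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
noisy = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],  # a harder item endorsed while an easier one is not
    [1, 1, 0, 0],
    [0, 1, 0, 0],  # the same kind of inconsistency
    [0, 0, 0, 0],
]

order = item_order(perfect)
print(guttman_errors(perfect))  # 0
print(guttman_errors(noisy))  # 2
print(all([row[j] for j in order] == pattern_from_total(sum(row), len(row)) for row in perfect))  # True
```

Any single Guttman error breaks the deterministic model outright, which is why a probabilistic model such as Rasch's is needed once clinical measurement error is admitted.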
Louis Guttman (1916–87) was professor of sociology and psychology at the Hebrew University of Jerusalem, where he was director of an institute that was later renamed the Guttman Institute in his honour. He set forth his model for accumulating individual scale items in the 1930s. During World War II, Guttman's model was used to study acute anxiety symptoms in American troops who had been under fire. It turned out that the somatic anxiety symptoms that appeared immediately or within hours after these combat situations could be ranked according to the Guttman principle: the milder anxiety symptoms included palpitations, 'butterflies in the stomach' and dizziness, while the more severe anxiety conditions included nausea, hand tremor and stiffness of the body (67).
Michell's second book, Measurement in Psychology, published in 1999, concludes with a section on item response theory precisely as developed by Rasch (68).
The psychologist Borsboom published Measuring the Mind in 2005, further extending Michell's summary of psychometrics from a psychologist's standpoint (69). Borsboom correctly attempts to distinguish between the clinical validity of a rating scale, which is clearly a technical, clinical (not primarily a psychometric) issue, and its psychometric validity. He then adds that once clinical validity has been established, it is also important to perform a psychometric validation analysis, and for this purpose he recommends Rasch analysis. A good introduction to Rasch analysis is Bond TG, Fox CM. Applying the Rasch Model (70). The best example of the practical procedure for performing a Rasch analysis of a rating scale is to be found in Allerup's Statistical analysis of MADRS – a rating scale developed in 1986 (71).
Modern psychometrics was founded by Georg Rasch. In fact, it was after many attempts to perform factor analysis, especially with the many suggested methods of rotation, that Rasch realised that this approach was unscientific, because the guidelines for these rotation procedures were based on 'trial and error', not on evidence (72). He found the rotation procedures more harmful than helpful in providing ability scales for measurement. This was the background against which Rasch developed his item response theory model. He emphasised that his analysis of measurements should only be performed when a rating scale had been proved clinically valid. Only then should the problem of measurement be tested, i.e., transferability, defined as a mathematical-statistical analysis of whether the scale contains one and only one dimension when used several times during a course of therapy, and when controlled for age or gender bias. As pointed out by Borsboom, one of the requirements of the Rasch model is local independence between items (69), an attempt by Rasch to screen out the tautological correlations between items, i.e., a problem inherent in factor analysis.
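The dichotomous Rasch model is usually written P(X = 1) = exp(θ − β) / (1 + exp(θ − β)), where θ is the person's location on the dimension and β the item's difficulty. A minimal sketch, assuming invented item difficulties, shows what 'the total score is a sufficient statistic' means in practice: once the total score is known, the probability of any particular response pattern no longer depends on θ. The product over items in pattern_prob is where the requirement of local independence enters; all names here are illustrative.

```python
import itertools
import math
from typing import Sequence


def p_endorse(theta: float, beta: float) -> float:
    """Dichotomous Rasch model: P(X = 1) for a person at theta and an item of difficulty beta."""
    return 1.0 / (1.0 + math.exp(beta - theta))


def pattern_prob(pattern: Sequence[int], theta: float, betas: Sequence[float]) -> float:
    """Probability of a whole 0/1 response pattern, assuming local independence between items."""
    prob = 1.0
    for x, beta in zip(pattern, betas):
        p = p_endorse(theta, beta)
        prob *= p if x == 1 else 1.0 - p
    return prob


def prob_given_total(pattern: Sequence[int], theta: float, betas: Sequence[float]) -> float:
    """P(pattern | total score) for a person at theta: divide by all patterns with the same total."""
    r = sum(pattern)
    same_total = [q for q in itertools.product((0, 1), repeat=len(betas)) if sum(q) == r]
    return pattern_prob(pattern, theta, betas) / sum(pattern_prob(q, theta, betas) for q in same_total)


# Hypothetical item difficulties in logits, easiest item first.
betas = [-1.5, -0.5, 0.5, 1.5]
pattern = (1, 1, 0, 0)  # the two easiest items endorsed, total score 2

for theta in (-2.0, 0.0, 2.0):
    # The printed conditional probability is (up to rounding) the same for every theta.
    print(theta, round(prob_given_total(pattern, theta, betas), 6))
```

This is only a numerical illustration of sufficiency; estimating item difficulties and testing fit, for example across age or gender groups, requires dedicated Rasch software.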
Sidney Siegel: Non-parametric statistics
Siegel (1916–61) completed his PhD in psychology in 1953 at Stanford University and then taught psychology and statistics at the University of Pennsylvania until his death in 1961. Together with the philosopher Donald Davidson (1917–2003), he worked on psychometric analysis, including measurement theory models. However, Davidson abandoned these psychological analyses due to the difficulties in measuring subjective experience, while still adhering to Wundt's approach of non-reductive monism, i.e., that it is only possible to reduce psychological dimensions to less complex psychological elements, but never to unique biological elements. Høffding subsequently designated this approach critical monism.
Siegel also worked with Patrick Suppes (1922–), who, independently of Georg Rasch, demonstrated that the latent additive function is the central element in psychometric measurement.
In 1956, Sidney Siegel published Nonparametric Statistics for the Behavioural Sciences, the first work to collect the non-parametric or distribution-free statistical tests, also known as rank order tests (73). When drawing conclusions based on a sample of measurement results, one might, especially in the field of psychology, feel uneasy about assuming that the underlying distribution belongs to a particular family of distributions, such as the normal distribution.
As one of the most significant non-parametric tests, Siegel included Fisher's exact test, which makes no assumptions about distribution parameters. In any case, Siegel's 1956 book has become a kind of bible on the relations between the scale step version (response category type) of the individual items in a rating scale and the corresponding statistical analysis. Thus, the nominal scale step (the category scale) is associated with, for example, Fisher's exact test; in this case, if one wishes to use Pearson's χ² test instead, one must, according to Siegel, apply Yates' correction.
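As a sketch of the two nominal-level tests paired here, the following pure-Python functions compute a two-sided Fisher's exact p value and a Yates-corrected χ² statistic for a 2×2 table. The table values are hypothetical, and the two-sided rule used (summing the probabilities of all tables with the same margins that are no more probable than the observed one) is one common convention, not necessarily Siegel's. In practice one would more likely call scipy.stats.fisher_exact and scipy.stats.chi2_contingency, whose correction argument applies Yates' correction to 2×2 tables.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, given the margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that tables exactly as probable as the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def chi2_yates(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table with Yates' continuity correction (no empty row or column)."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical 2x2 table: rows = treatment A / B, columns = responder / non-responder.
table = (3, 1, 1, 3)
print(round(fisher_exact_two_sided(*table), 4))  # 0.4857
print(chi2_yates(*table))  # 0.5
```

The χ² statistic would still have to be referred to a χ² distribution with one degree of freedom to obtain a p value; that step is omitted to keep the sketch dependency-free.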
The ordinal response category scale is associated with non-parametric tests such as the Wilcoxon Signed Rank Test or the Kruskal-Wallis One-Way Analysis of Variance by Ranks. (Spearman's rank correlation is a non-parametric method, while Pearson's correlation is a parametric method.) Siegel's great contribution was to focus on the relations between the item response category and the statistical (non-parametric) test. Some people believe this to be true psychometrics (see Figure 3.2).
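The remark about Spearman and Pearson can be illustrated directly: Spearman's coefficient is Pearson's coefficient computed on ranks, so it uses only the ordinal information in the scores. The sketch below assumes hypothetical data and gives tied values the average of the ranks they occupy.

```python
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation (a parametric measure of linear association)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: the relationship is perfectly monotone but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman(x, y))  # 1.0
print(round(pearson(x, y), 3))
```

Because only the order of the scores matters, Spearman's coefficient is unchanged by any monotone rescaling of the response categories, which is the property Siegel's pairing of ordinal scales with rank tests relies on.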
Robert J. Mokken: Non-parametric analysis for item response theory (IRT)
The relationship between the prevalence of a symptom (e.g., in depression) and the severity of depression in the group of patients under examination is expressed as a probability in the Rasch analysis, which is a parametric analysis.
Based on this relationship inherent in Rasch analysis, Mokken (1929–) developed a corresponding non-parametric analysis (74, 75). It is thus inherent in the model that items with a high prevalence (e.g., lowered mood or lack of interest in daily activities) are present in both the mildly depressed patient and the more severely depressed patient (ceiling effect), while items with a low prevalence (e.g., guilt feelings or psychomotor retardation) are only present in the more severely depressed patient. Within Mokken analysis, this is often referred to as invariant item ordering, i.e., transferability. Mokken published his non-parametric model in 1971 and was in many ways influenced by Rasch analysis. However, based on Siegel's defence of the use of non-parametric statistics when the individual items of a rating scale are measured with severity categories corresponding to those of the original scale, he stated that Loevinger's coefficient of homogeneity was the most relevant indication of whether a rating scale was in accordance with the item response theory model.
Nominal scale: classification (e.g., men versus women). Statistical tests: Fisher's exact test, χ² test.
Ordinal scale: ranking (more or less depressed on HAM-D17). Statistical test: Wilcoxon rank order test.
Interval scale: unit of measurement (HAM-D6). Statistical tests: Student's t-test, effect size.
Figure 3.2 Connection between measurement level and the corresponding statistical test (Modified from Siegel S. Nonparametric statistics for the behavioural sciences. New York: McGraw Hill, 1956)
Loevinger's coefficient of homogeneity was thus used by Mokken in his IRT analysis. Jane Loevinger (1918–2008) was one of the few women to contribute statistical tests to psychometrics. Her 1957 monograph (Objective Tests as Instruments of Psychological Theory) is her most widely cited work (76). She demonstrated that if one measures reliability as agreement between the items in a psychological questionnaire, one may end up in a tautological process of constructing parallel questionnaires. She employed the mind-set behind Guttman's cumulative model: that each individual item's degree of independent information should be examined, not whether or not it is identical to the other items in a questionnaire. Loevinger therefore developed her coefficient of homogeneity as an overall assessment of the Guttman model in a probabilistic formulation. Mokken then went further and established a coefficient of homogeneity for each item in a questionnaire in order to identify the items that do not fit the Guttman model. It may seem surprising that Loevinger herself did not complete Mokken's work. In his 1971 book, Mokken states that this coefficient of homogeneity should be regarded as a descriptive statistic, in the sense that a value of 0.40 or higher means that the total score of a rating scale is a sufficient statistic. Mokken regarded coefficients between 0.30 and 0.39 as doubtful, perhaps suitable, while coefficients of 0.50 or higher signified a perfect scale.
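A minimal sketch of the coefficient of homogeneity for dichotomous (present/absent) items, under the usual formulation in terms of Guttman errors: for each item pair, an error is 'the more prevalent item absent, the less prevalent item present', and H = 1 − (observed errors) / (errors expected if the two items were independent). The item-level version follows Mokken's extension described above. The data are invented, the polytomous (Likert-scored) generalisation is not shown, and items present in everyone or in no one are assumed absent, since they would make the expected count zero. For real analyses, dedicated software such as the R package mokken is the usual choice.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Data = Sequence[Sequence[int]]


def _pair_errors(data: Data, i: int, j: int) -> Tuple[int, float]:
    """Observed and expected Guttman errors for one item pair of 0/1 scores."""
    n = len(data)
    p_i = sum(row[i] for row in data) / n
    p_j = sum(row[j] for row in data) / n
    easy, hard = (i, j) if p_i >= p_j else (j, i)
    p_easy, p_hard = max(p_i, p_j), min(p_i, p_j)
    observed = sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
    expected = n * (1 - p_easy) * p_hard  # expected errors under independence of the two items
    return observed, expected


def scale_h(data: Data) -> float:
    """Loevinger's H for the whole scale: 1 - (sum of observed errors) / (sum of expected errors)."""
    observed_sum = expected_sum = 0.0
    for i, j in combinations(range(len(data[0])), 2):
        o, e = _pair_errors(data, i, j)
        observed_sum += o
        expected_sum += e
    return 1 - observed_sum / expected_sum


def item_h(data: Data) -> List[float]:
    """Item-level H_i: the same ratio, restricted to the pairs that contain item i."""
    k = len(data[0])
    observed_sum = [0.0] * k
    expected_sum = [0.0] * k
    for i, j in combinations(range(k), 2):
        o, e = _pair_errors(data, i, j)
        for item in (i, j):
            observed_sum[item] += o
            expected_sum[item] += e
    return [1 - o / e for o, e in zip(observed_sum, expected_sum)]


def interpret_h(h: float) -> str:
    """Thresholds as quoted in the text; the label below 0.30 is an assumption."""
    if h >= 0.50:
        return "perfect scale"
    if h >= 0.40:
        return "total score a sufficient statistic"
    if h >= 0.30:
        return "doubtful, perhaps suitable"
    return "not a scale (assumed)"


# Hypothetical symptom data (1 = present); rows are patients, columns are symptoms.
data = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
h = scale_h(data)
print(round(h, 3), interpret_h(h))  # 0.524 perfect scale
print([round(v, 2) for v in item_h(data)])  # [0.58, 0.17, 0.55, 1.0]
```

In this toy example the second item has by far the lowest H_i, which is the kind of misfitting item Mokken's item-level coefficient is meant to flag; five patients are of course far too few for any real conclusion.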
Mokken analysis is a much weaker test than Rasch analysis of whether a scale fits the item response theory model, because external factors such as age and gender are not included as part of the analysis in the same way as in the Rasch analysis (74, 75, 76).
With Mokken's 1971 monograph on rating scale analysis, one may claim that modern psychometrics had reached a level where the two central elements of this discipline are expressed in pure rating scale terms, that is, the quantification of the individual symptom on a Likert scale (see Chapter 4) and, for Mokken in particular, the cumulative Guttman scale. A good introduction to Mokken analysis is Sijtsma K, Molenaar IW. Introduction to Nonparametric Item Response Theory (75, 76).
The two psychometric procedures, classical versus modern (Figure 3.3), may, with reference to Wittgenstein, be regarded as two different pathways, that is, two different approaches (77). The classical approach serves to describe a family of types, which has been discussed in connection with Hotelling's principal component analysis and Russell's ramified hierarchy of typology. The modern approach, for family resemblances, has as its criterion of measurement that the total score is a sufficient statistic, which makes it comparable with other measuring instruments, such as a blood pressure apparatus or a thermometer. The Guttman cumulative rating scales and the item response theory models are examples of the modern approach, focusing on the summed total score as a sufficient statistic.
Classical: Factor analysis for typological issues
A mathematical model for type description. Spearman's two-factor model and unrotated principal component analysis are ranked together. The first factor is a general factor, which is tautological, while the next factor, with negative versus positive loadings, is the type description, i.e., physical versus mental anxiety or typical versus atypical depression. For an example of the principal component analysis, see Appendix 11a, Calculus example 1.
Modern: Item analysis for measurement issues
A mathematical model for measurement issues. An assessment scale which fulfils the item response theory model, e.g., the Rasch model, possesses the measurement-technical advantage that the total score is a sufficient statistic, in that we have the distance between, and the rank order of, the individual items (often referred to as invariant item ordering or transferability). In depression measurement, this means that we know that depressed mood, lack of interest and fatigue are present even in milder degrees of depression, while guilt and psychomotor retardation make their appearance in more severe degrees. For an example of the item analysis for measurement issues, see Appendix 11b, Calculus example 2.
www.psykforskhil.dk
Figure 3.3 The psychometric models: classical vs. modern
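The 'classical' panel of Figure 3.3 can be illustrated with an invented correlation matrix in which two 'mental' and two 'physical' anxiety items correlate more strongly within than between the two clusters. The sketch extracts unrotated principal components by power iteration with deflation, a technique chosen here only to keep the example dependency-free (numpy.linalg.eigh would normally be used). The first component loads with the same sign on every item, the general factor, while the second is bipolar and separates the two item clusters, i.e., the type description.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalise(v: List[float]) -> List[float]:
    length = sum(x * x for x in v) ** 0.5
    return [x / length for x in v]


def unrotated_components(r: Matrix, n_components: int = 2, iterations: int = 1000) -> List[Tuple[float, List[float]]]:
    """Leading eigenvalues and loadings of a correlation matrix by power iteration with deflation.

    Loadings are eigenvector * sqrt(eigenvalue); no rotation is applied.
    """
    m = [row[:] for row in r]
    n = len(m)
    components = []
    for _ in range(n_components):
        v = normalise([(-1.0) ** i / (i + 1) for i in range(n)])  # arbitrary start vector
        for _ in range(iterations):
            v = normalise(matvec(m, v))
        eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
        if v[0] < 0:  # sign convention: first loading positive
            v = [-x for x in v]
        components.append((eigenvalue, [x * eigenvalue ** 0.5 for x in v]))
        # Deflation: remove this component before extracting the next one.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return components


# Invented correlations: two 'mental' anxiety items (1, 2) and two 'physical' anxiety items (3, 4).
r = [
    [1.0, 0.6, 0.3, 0.3],
    [0.6, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
]
for eigenvalue, loadings in unrotated_components(r):
    print(round(eigenvalue, 3), [round(x, 3) for x in loadings])
# 2.2 [0.742, 0.742, 0.742, 0.742]   general factor: all loadings of the same sign
# 1.0 [0.5, 0.5, -0.5, -0.5]         bipolar factor: mental versus physical items
```

Note that nothing in this output tells us whether the sum of the four items measures anything; that question belongs to the 'modern' panel of the figure.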
Wittgenstein used his language-game approach as an argument against
private language. The speaker can only be sure that he or she is using words
correctly when an ‘inner’, ‘subjective’ or ‘private’ process is operating while the
words are used as part of their original public language ( 78 ). Wittgenstein
himself worked with games of applied mathematics ( 78 ). Inspired by his
attempt to follow the measurements of ‘inner’, ‘subjective’ feelings ( 79 ), the
familiar arrangement of the HAM-D17 items seems to follow that A, B, C version (Appendix 3a).
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
4 Modern psychometrics: Item categories and sufficient statistics
Immediately after the publication of the ICD-10 (55) and the DSM-III/DSM-IV (56), a major attempt was made to integrate modern psychometrics with these new diagnostic systems (4). In this attempt, Likert's response categories were limited to 0–4 scales, while Guttman's cumulative scale was described on the basis of the statistical models of item response theory, e.g., the Rasch or Mokken analyses. This was done by combining the Likert values of the individual symptoms to form a sufficient total score.
Rensis Likert: Scale step measurements
One of the basic elements in modern psychometrics is that each symptom must be measured on a scale with several steps, namely a Likert scale, named after Rensis Likert (1903–81). In 1932, he completed his PhD in psychology at Columbia University, New York, for which he had developed a response category scale with five steps. In his thesis, Likert used questions based on values: 'judgement of value rather than judgement of fact'. The response options for the individual questions were 'bipolar' in that they went from 'strongly approve', 'approve', 'undecided', 'disapprove' to 'strongly disapprove'. He found that 'attitudes are distributed fairly normally' and that this provided a basis for 'combining the different statements'. However, Likert did not investigate whether the sum of the individual questions actually constituted a sufficient statistic. It has subsequently been demonstrated that a Likert scale going from 0 to 6 (i.e., seven response categories) 'hits the ceiling', which is to say that a greater number of response options will not provide more information (80).
This might be the place to mention that in the first review of graphic rating scales, Freyd (The Graphic Rating Scale) (81) comments that, while Galton (1883) was the first to use a 'Likert' scale, he was not systematic as to methods. Freyd recommends that the line along which the measurement takes place should be long enough to permit five response categories. In many ways, this is the precursor of the Likert scale.
Figure 4.1 shows how the seven-category Likert scale used in the BPRS is a 'global' scale compared with the semi-global, seven-category Likert scale used in another assessment scale, namely the Montgomery-Åsberg Depression Rating Scale (MADRS). It also shows the exact scale step definitions based on the BPRS. The first of these was developed as early as 1963 by Professor William J. Turner (1907–2006) (49). An expanded version of this is incorporated in the PANSS scale.
According to the BPRS, the depression symptom is defined as lowered mood and, on the Likert scale from 0 to 6, this is a global clinical expression reflecting the adjectives given in Figure 5.1. In MADRS Item 1, lowered mood (observed), no definition is supplied for grades 1, 3 and 5. The reason for this is that the MADRS is a subscale derived from the Comprehensive Psychopathological Rating Scale, in which the individual items have a Likert scale from 0 to 3 (82). The scale has merely been doubled without taking note of the empty steps, thus making it semi-global (see Appendix 3c). The first psychometric analysis of the MADRS showed that psychiatrists using the scale had avoided the empty steps (71). This is probably the reason why the standardised MADRS cut-off for remission is relatively high, at 12, while the corresponding score on the HAM-D is 7.
Assessment of the symptom depressed mood with increasingly precise definitions (anchoring)
Score | BPRS | MADRS | PANSS / CIDRS
0 | not present | neutral mood | absent
1 | doubtful | (not defined) | on the verge of depressed mood
2 | very mild | looks dispirited but does brighten up without difficulty | quite mild tendency, but only occasionally
3 | mild to moderate | (not defined) | mild to moderate indications of depressed mood, but no hopelessness
4 | moderate to marked | appears sad and unhappy most of the time | moderate to marked indications of depressed mood, perhaps tendency to crying; reports feeling of hopelessness
5 | marked to severe | (not defined) | marked to severe indications of depressed mood, distinct hopelessness
6 | extremely severe | extreme and constant despondency | extremely severe indications of depressed mood, massive hopelessness
For the ABC scoring sheet of the MADRS, see Appendix 3c.
Figure 4.1 Schematic representation of the graduation of the symptom depressed mood
The use of empty scale definitions in questionnaires has been shown to give artificially higher scores than the use of well-defined scale steps, such as those in the PANSS example in Figure 4.1 (83). The most fruitful of the attempts made to improve the Likert scale in the Hamilton Depression Scale is Paykel's Clinical Interview for Depression (CID), precisely through its use of the 0–6 scale shown in Figure 4.1 (84). The assessment scale Clinical Interview for Depression and Related Syndromes (CIDRS) has been developed from the CID. It follows the endeavours of Turner and the PANSS, while avoiding the tendency to overlap seen in these two attempts. The example given in Figure 4.1 is in accordance with the CIDRS.
On the HAM-A, as shown in the Appendix, there are steps 1, 2, 3 and 4 descending from ground level = 0 down to the lowest level. On the HAM-D, some symptoms (39, 9) also go from 0 to 4, while others (85, 8) go from 0 to 2. Hamilton explained that this had been introduced because it was not clinically meaningful to employ a longer ladder than this. Most assessment scales tend to deal rather sketchily with the issue of measuring a specific symptom, for example lowered mood, as shown in Figure 4.1 (86). As mentioned previously, Hamilton gave a great deal of thought to which symptoms can only be measured from 0 to 2 and which can be measured from 0 to 4. A score of 3 on the BPRS in Figure 4.1 may signify: a) that during the interview, the patient typically seems mildly to moderately depressed, that is to say neither quite mildly nor markedly depressed; b) that during the interview, the patient has fluctuated between doubtful and marked to severe, but on average has a score of 3; or c) that one has the impression that during the last three days, taken as a whole, the patient has had a score of 3.
Recently, an attempt has been made to obtain a more exact score on the Hamilton Depression Scale by assessing both the frequency and the severity of a symptom in an integrated score (GRID-HAM-D6).
As 'grids' or nets, both the HAM-D6 in its GRID version and the MES can be viewed as attempts to 'tighten the net' to catch those symptoms that are difficult to pinpoint during an interview because of their varying frequency.
John Overall: Brief, sufficient rating scales
John Overall's (1929–) PhD dissertation in 1957 from the University of Texas at Austin in the field of general experimental psychology led to five years' training in psychometrics at Thurstone's Psychometric Laboratory in North Carolina, where he came into contact with the Central Neuropsychiatric Research Laboratory of the Veterans Administration Hospital in Perry Point, Maryland. As a consequence, Overall joined the programme that the Veterans Administration had initiated after seeing the revolutionary effects of chlorpromazine and imipramine, whereby schizophrenic or depressive patients could be discharged from mental hospitals. To make this more evidence-based, US multi-centre investigations had been initiated, both placebo-controlled and against active comparators, in accordance with the randomised, double-blind method that had been introduced within medical science in the 1950s. The programme was called 'Cooperative Studies of Chemotherapy in Psychiatry'. In this programme, Lorr's Inpatient Multidimensional Psychiatric Scale (IMPS) was included as a measure of desired clinical effect (87). On the basis of the first data analyses of the results from this programme, and using the statistical analysis methods learned from Thurstone (the grand old man of American factor analysis), Overall was able to show that the 63 subscales in the IMPS could be reduced to 16 items. In this, Overall received much help from two experienced clinicians, the psychiatrist Leo Hollister and the psychologist Don Gorham. Gorham's clinical experience was particularly valuable: he was 20 years older than Overall and had been in the midst of the dramatic change in clinical reality in mental hospitals caused by the introduction of chlorpromazine and imipramine. The clinical training provided by the physician Leo Hollister was also extremely important for the formulation of the 16 items that led to the development of the Brief Psychiatric Rating Scale (BPRS) in 1962 (45, 46). As noted by Overall, the language of the 16 BPRS items is that employed by experienced psychiatrists when treating patients:
The guiding principle in development of the BPRS was to provide
psychiatrists with a rating instrument that would permit them to
record their judgment at a level of abstraction consistent with the
manner in which they ordinarily conceptualised manifestations of
psychopathology ( 88 ).
In 1963, after publishing the BPRS, Overall returned to Texas as head of the Research Computation Center in Galveston. In 1978, he became professor of clinical psychometrics at UT Houston Medical School. Overall describes how the BPRS was accepted outside the US via the CINP (Collegium Internationale Neuro-Psychopharmacologicum) (89). Max Hamilton headed the CINP group that was to implement the BPRS as well as Hamilton's Anxiety Scale (HAM-A) and Depression Scale (HAM-D) in controlled clinical trials worldwide. With these scales, means and standard deviations can be computed to allow comparison of the results of clinical trials from different parts of the world. This is not possible with a diagnosis!
Both Max Hamilton and John Overall advocated the clinical approach: once the diagnosis had been made, the prescribed treatment should be monitored by the HAM-D/HAM-A or the BPRS in order to measure the level of response.
In 1969, Index Medicus accepted rating scales as scientific, evidence-based measuring instruments for the assessment of drug efficacy in psychiatry. The Brief Psychiatric Rating Scale (BPRS) was the scale referred to by Index Medicus in 1969, as the BPRS had been specifically developed to assess the effects of antipsychotics or antidepressants (4). Figure 1.10 shows the BPRS with its 18 symptoms covering depression and schizophrenia. Figure 1.11 shows that with the addition of two items, mania can also be assessed. Thus, a mere six BPRS symptoms make it possible to measure the three major fields in clinical psychiatry, namely schizophrenia, mania and depression.
Factor analytic studies with the BPRS brought the contrast between the American and the British traditions into sharper focus. Using the British tradition learnt during his studies in London, Pichot demonstrated the need to focus on the two most important factors, and showed that it is the depression factor rather than the psychotic factor that is important. Building on the American tradition of factor analysis, Overall attempted to discriminate between 'depression', 'anergia', 'thought disturbance', 'excitement' and 'hostility/suspicion', i.e., five factors in all (59, 60). It is worth noting here that the BPRS literature does not discriminate between positive and negative factors. This terminology entered with the PANSS scale, which is based on the BPRS (4). The distinction between positive and negative schizophrenia symptoms has not proved fruitful in clinical psychometrics, as it lacks clinical validity. The lack of understanding of the concept of schizophrenia in American psychiatry has stimulated the efforts to introduce the distinction between positive and negative symptoms. Thus, the DSM-IV states, concerning the diagnosis of schizophrenia, that the positive symptoms refer to an overreaction of normal functions, while the negative symptoms refer to a diminishment in, or even loss of, normal functions. This is not far removed from bipolar affective disorder, in which the manic symptoms are just such an overreaction (Freud termed this a contra-phobic reaction), while depression or melancholia precisely displays diminishment in, or even loss of, normal functions. In schizophrenic disorder, autism, ambivalence, distorted associative thought processes and distorted sensory perception are the core elements, as denoted in the psychotic dimension of the BPRS. The ten BPRS items identified by Overall as the most discriminating items for measuring schizophrenic states cover both 'negative' and 'positive' symptoms (90). An item response theory analysis (Rasch) showed that these items measure a dimension of severity of schizophrenic states (91).
Attempts to compare the validity of rating scales using the ICD-10 or DSM
systems reveal that these diagnostic systems do not contain a measurement
function ( 4 ).
Clinical versus psychometric validity
When analysing the measurement validity of an assessment scale, for example a depression scale, it is important first to evaluate its clinical validity; this can only be done by a highly experienced psychiatrist. In the first Danish assessment of the Hamilton Depression Scale, the two most experienced psychiatrists at the Psychiatric Clinic of the leading Danish hospital (Rigshospitalet), Erling Dein and Ove Jacobsen, were used as 'indices of validity'. As mentioned previously, Erling Dein was the supervisor of Lise Østergaard's doctoral thesis. My own doctoral thesis from 1981 describes how these two experienced psychiatrists assessed the degree of depression on a scale from 0 = no depression to 10 = maximum depression.
Of the 17 symptoms in the Hamilton scale, only six corresponded to our 'indices of validity'. These six symptoms are lowered mood, guilt feelings, lack of interest, psychomotor retardation, psychic anxiety and fatigue. To a certain degree, they correspond to the six BPRS symptoms measuring depression (Figure 1.11).
The mathematical or statistical method used in psychometrics to determine whether it is relevant to add up the different symptom scores as a measure of the present-state severity of a psychiatric disorder (item response theory analysis) is visualised in Figure 4.2, in which severity of depression is measured by six different symptoms (1, 3). These six symptoms have been taken from Hamilton's Depression Scale (see Figure 1.9), as they have turned out to be the most suitable as a 'ruler' (shown in Figure 4.2) going from 0 = no depression to 22 = maximum depression, to illustrate a present-state profile.
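The 0–22 range of the ruler follows from the item ranges. A minimal sketch, assuming the usual HAM-D17 scoring of these six items (five items scored 0–4 and the fatigue item, general somatic symptoms, scored 0–2), sums and range-checks a HAM-D6 total; the dictionary keys and the patient's scores are hypothetical.

```python
from typing import Dict

# Assumed maximum score per HAM-D6 item, following the usual HAM-D17 scoring:
# five items scored 0-4 and the general somatic (fatigue) item scored 0-2.
# The keys are illustrative names, not official item labels.
HAMD6_MAX: Dict[str, int] = {
    "depressed_mood": 4,
    "guilt_feelings": 4,
    "work_and_interests": 4,
    "psychomotor_retardation": 4,
    "psychic_anxiety": 4,
    "fatigue": 2,
}


def hamd6_total(scores: Dict[str, int]) -> int:
    """Sum the six item scores after checking that every item is present and within its range."""
    missing = HAMD6_MAX.keys() - scores.keys()
    unknown = scores.keys() - HAMD6_MAX.keys()
    if missing or unknown:
        raise ValueError(f"missing: {sorted(missing)}, unknown: {sorted(unknown)}")
    total = 0
    for item, max_score in HAMD6_MAX.items():
        value = scores[item]
        if not 0 <= value <= max_score:
            raise ValueError(f"{item} = {value} is outside 0-{max_score}")
        total += value
    return total


print(sum(HAMD6_MAX.values()))  # 22, the top of the 'ruler'

# Hypothetical patient
patient = {
    "depressed_mood": 2,
    "guilt_feelings": 1,
    "work_and_interests": 2,
    "psychomotor_retardation": 1,
    "psychic_anxiety": 1,
    "fatigue": 1,
}
print(hamd6_total(patient))  # 8
```

Simply adding the item scores is justified only because the six items have been shown to fit the item response theory model; the code itself cannot check that.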
Figure 4.2 is an attempt to illustrate how the contents of Figure 3.1 can be
translated into a measure or ruler by summing the six symptoms into a total
score. The six symptoms in Figure 4.2 are symbolised by boxes that may
overlap; this is termed statistical uncertainty. To allow each symptom to
express its particular piece of information (its particular prevalence)
corresponding to the area it covers on the ruler, there must not be much
overlap between the symptoms.
As can be seen, lowered mood, lack of interest and fatigue form the first
half of the ruler while anxiety, guilt feelings and psychomotor retardation
make up the second half. As is also seen, fatigue overlaps both lack of interest
and anxiety while guilt feelings overlap anxiety and retardation.
Thus, an assessment scale is an attempt to achieve a linear description of the
severity of the psychiatric disorder through the symptoms selected. This is often
spoken of as a visual scale, going for example from 0 to 22 – as in Figure 4.2 .
In the mathematical-statistical analysis (item response theory analysis) of
the six symptoms in Figure 4.2 , one has ensured that there is no influence of
age or gender on these symptoms (e.g., that older people score differently
from younger people, or that women score differently from men).
When all symptoms point in the same direction, in accordance with the order shown in Figure 3.1, one says that the degree of severity of the present-state syndrome has been found. Thus, one speaks of the severity of a depressive syndrome, a manic syndrome, et cetera.
Factor analysis would typically attempt to demarcate some symptoms covering a small part of the ruler, while a Rasch analysis is based on the assumption that a clinical analysis has been performed to determine whether these depressive symptoms provide an adequate description of the whole dimension. Rasch analysis then determines whether or not the placing of these symptoms on the ruler is influenced by external factors, such as age, gender and geographical area. In order to operate as a ruler, in the same way as the standard metre bar in Paris, the measuring instrument must function independently of external factors.
The six symptoms in Figure 4.2 comprise the HAM-D6 and have been found to fit the Rasch model. Thus, neither age nor gender has an effect on the HAM-D6 total score; this has been demonstrated in studies performed both in and outside Denmark, e.g., in Germany, France and the US.
Item-response theory versus factor analysis
It is extremely important to understand that factor analysis is not a method for testing whether a scale measures the degree of depression. Unfortunately, various software systems, such as the Statistical Analysis System (SAS), make it possible for anyone to perform a factor analysis. Previously, this operation necessitated the aid of a competent statistician, who would point out that the more the symptoms correlate in a factor analysis, or on Cronbach's alpha test, the lower the information value of each symptom. The key to performing an assessment of depression is precisely the ability to register the valid symptoms, as is apparent from Figure 4.2.
[Figure 4.2: The Depression Ruler: total score a sufficient statistic. A ruler from 0 (no depression) to 22 (maximum depression), with overlapping boxes for depressive mood, lack of interest, fatigue, anxiety, feelings of guilt and retardation.]
Figure 4.2 An elaboration of Figure 3.1 – prevalence is now substituted by item score
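The statistician's warning about Cronbach's alpha can be demonstrated with a small, invented data set, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). In this example, adding an exact copy of an existing symptom contributes no new information about the patient, yet it raises α, which is why a high α alone says nothing about whether the items cover the ruler.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(data[0])
    item_variances = [float(pvariance([row[j] for row in data])) for j in range(k)]
    total_variance = float(pvariance([sum(row) for row in data]))
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: five patients (rows) on three symptoms (columns).
data = [
    [0, 1, 0],
    [1, 0, 2],
    [2, 2, 1],
    [3, 4, 3],
    [4, 3, 4],
]
# Add a fourth 'symptom' that is an exact copy of the first: it carries no new information.
with_copy = [row + [row[0]] for row in data]

print(round(cronbach_alpha(data), 3))  # 0.908
print(round(cronbach_alpha(with_copy), 3))  # 0.952
```

Population variances are used throughout; since the same divisor appears in the numerator and the denominator, sample variances would give the same α.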
An important aspect of a Rasch analysis is not only a professional selection of the symptoms that cover the ruler or dimension under measurement, but also that there is no local dependency between the individual symptoms.
It has frequently been debated whether a simple visual analogue scale would suffice, i.e., a depression ruler corresponding to the BPRS depression symptom in Figure 4.1. In this connection, the Clinical Global Impression Scale (CGI) is often used (92). Figure 4.3 shows the CGI-S, where S stands for severity. Evidently, one first has to place the patient in the most relevant category of illness. If this is depression, then a grade of 6 signifies that one has the clinical, global, present-state impression that the person in question belongs to the most depressed group of patients one has seen. In other words, the CGI-S scale in Figure 4.3 can only be used by highly experienced clinicians. Less experienced clinicians are handicapped by not having seen enough severely depressed patients, and tend to overscore the condition. For this reason, the HAM-D6 is a more reliable scale when people with varying degrees of psychiatric training are involved. Furthermore, the use of a symptom assessment scale permits an investigator to explore whether a certain treatment is only effective on a few of the actual symptoms.
Jacob Cohen: Effect size
Like John Overall, Jacob Cohen (1924–98) studied psychology with special emphasis on statistics. He graduated in 1947, and the subject of his PhD dissertation from New York University in 1950 was factor analysis in intelligence tests. In addition to effect-size statistics, he is renowned for his measure of scale reliability, Cohen's kappa agreement coefficient (93). In modern psychometrics, Cohen is best known for the descriptive statistic known as the standardised effect size (94). This concept is dealt with in more detail in Chapter 5; here it is important to specify that effect size refers to the clinical significance of a specific treatment (e.g., when comparing an active drug with placebo) and not only to its statistical significance. Cohen probably gives his best explanation of this in his paper entitled 'The earth is round (P < 0.05)' (95). With reference to clinical psychometrics, one might say that the standardisation of a scale, for instance the HAM-D, implies that a depressive condition should be treated (the earth is round) when HAM-D ≥ 18, and not because of some P-value or other. As a crude measure of effect size, Cohen employed the norms 'small', 'medium' and 'large'. When evaluating the clinical significance of a drug compared with placebo, a 'medium' effect size is required. This is not to be translated into a P-value but into other clinical targets, e.g., 20% more effective than placebo (i.e., a 20% higher response rate) or a Number Needed to Treat of 5.
Figure 4.3 Scoring sheet for Clinical Global Impression Scale, Severity (CGI-S)
The Clinical Global Impression Scale, severity (CGI-S)
Score | Clinical Global Impression (degree of illness)
0 | No sign of mental illness
1 | Doubtful presence of mental illness
2 | Mild degree of illness
3 | Moderate degree of illness
4 | Marked degree of illness
5 | Severe degree of illness
6 | Among the most severely ill patients within the psychiatric diagnostic group to which the patient belongs
Figure 4.4 Scoring sheet for Global Depression Scale (0–10)
The Global (0–10) depression scale
Score | Depression measurement
0 | No depression
1 | Doubtful depression
2–10 | Increasing severity, anchored as mild, moderate and severe depression
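The link between a 20% higher response rate and a Number Needed to Treat of 5 is simple arithmetic: NNT = 1 / (difference in response rates). The minimal sketch below assumes that '20% more effective' means 20 percentage points. The response rates and the function name are hypothetical.

```python
def number_needed_to_treat(response_rate_drug: float, response_rate_placebo: float) -> float:
    """NNT = 1 / absolute difference in response rates (given as proportions)."""
    difference = response_rate_drug - response_rate_placebo
    if difference <= 0:
        raise ValueError("drug response rate must exceed placebo response rate")
    return 1.0 / difference


# Hypothetical trial: 55% responders on drug vs 35% on placebo (20 points apart).
print(round(number_needed_to_treat(0.55, 0.35), 2))  # 5.0
```

In other words, about five patients must be treated for one extra patient to respond.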
For the sake of completeness, Figure 4.4 shows the depression scale we used as our clinical reference in our first validity examination of the Hamilton Depression Scale (HAM-D 17). Clinical validity comes before the psychometric process of validation. The two experienced psychiatrists (Erling Dein and Ove Jacobsen) were very reliable in their use of the global depression scale, on a par with the psychiatrists who performed the HAM-D 17 ratings (Tom Bolwig and John Vitger). This was the basis for performing an item analysis, i.e., investigating how each of the 17 items in the HAM-D 17 adhered to the global score from 0 to 10 (Figure 4.4). The result was the six symptoms that constitute the depression ruler (Figure 4.2).
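The text does not state which statistic was used to judge how each HAM-D 17 item 'adhered' to the global 0–10 rating. Assuming a simple Pearson correlation between each item and the global score, the item analysis might be sketched as follows. The item names and data are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_global_correlations(item_scores: dict[str, list[float]],
                             global_scores: list[float]) -> dict[str, float]:
    """Correlate each item with the global rating, strongest item first."""
    r = {name: pearson(scores, global_scores) for name, scores in item_scores.items()}
    return dict(sorted(r.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical ratings for five patients.
global_rating = [1, 3, 5, 7, 9]
item_scores = {
    "depressed mood": [0, 1, 2, 3, 4],
    "insomnia": [2, 0, 2, 1, 2],
}
for name, r in item_global_correlations(item_scores, global_rating).items():
    print(f"{name}: {r:.2f}")
```

Items that track the clinicians' global rating closely would be candidates for the ruler, and weakly related items would be dropped.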
As far as the BPRS (Figure 1.10) is concerned, an item analysis of the kind performed when the HAM-D 17 was compared with a global depression scale is of course not relevant. This is because the BPRS is a 'bipolar' scale. It consists partly of a six-item depression scale corresponding to the depression ruler in Figure 4.2, and partly of 11 items that can be said to make up a psychosis ruler, i.e., a scale with positive (mania-type) items. In the Appendix, the BPRS is therefore shown as two scales (schizophrenicity and depression).
It is indeed very disappointing that over the past three decades the clinical validation of rating scales has no longer been the domain of experienced psychiatrists. Instead, it has been left to inexperienced research workers (social workers, psychologists and young medical doctors) using structured clinical interviews. These structured interviews were, admittedly, developed to help inexperienced research workers become more clinically competent in clinical trials. However, the investigation of the clinical validity of rating scales or questionnaires still has to be performed by experienced psychiatrists.
The Mania Scale (Appendix 6) has also been developed using experienced psychiatrists as the index of validity (64).
5 The clinical consequence of IRT analyses: The pharmacopsychometric triangle
Dr. Phil. Benny Karpatschof, a professor at the University of Copenhagen's Department of Psychology, has developed a scale describing the consequences of a Rasch analysis. This scale ranges from Hell via Purgatory to Paradise (96). Figure 5.1 shows a modified version.
The clinical consequence of having entered psychometric Paradise becomes apparent when effect-size statistics are used to denote clinical significance in placebo-controlled studies, especially when evaluating dose–response relationships in modern neuropsychopharmacology.
With reference to Wittgenstein, we might say that Karpatschof's approach is to bring the items back to their correct home ('Paradise') when tested in the stimulus–response model, i.e., the dose–response relationship. Drugs are major treatment modalities for all medical disorders, including psychiatric disorders. However, this does not imply that drugs can cure any mental disorder. On the other hand, demonstrating a dose–response relationship, which is the pharmacological approach, is a most important scientific principle. It has been studied only sporadically in clinical psychopharmacology, probably because of inadequate outcome measures and/or descriptive statistics, i.e., effect-size statistics.
Effect size and clinical significance
Figure 5.2 illustrates data from a placebo-controlled clinical trial in patients who all fulfilled the DSM-IV criteria for major depression before treatment, i.e., who had a depression requiring treatment. The patients were randomised to either placebo or active medication (verum), in this case a selective serotonin reuptake inhibitor (SSRI). In all, 102 patients entered the trial, which lasted six weeks. Of these, 52 received active treatment and 50 received placebo. The patients were assessed using the Hamilton Depression Scale (HAM-D 17). Before treatment, both the 52 patients receiving the SSRI and the 50 receiving placebo had a mean HAM-D 17 score of 24.
During the trial the patients were assessed once a week. The endpoint was six weeks after the start of the trial. In total, five patients dropped out of the SSRI group. Three of these left because of side effects (headache, nausea or hyperhidrosis). The other two withdrew because they felt that the treatment did not help, perhaps because they thought they were receiving placebo. The trial was double-blind, so neither patient nor treating physician knew which treatment was given. In total, four patients dropped out of the placebo group, one with headache and three because of lack of effect.
Figure 5.2 Example of calculation of effect size in a placebo-controlled antidepressant study in which HAM-D 17 was used. Mean HAM-D score (0–24) is plotted against weeks of therapy from baseline (24 in both groups) to endpoint (placebo HAM-D = 14; active drug (SSRI) HAM-D = 11). Pooled SD = 7.5; effect size = 3/7.5 = 0.40.
Figure 5.1 Diagram of the psychometric consequences from the results of an item response theory analysis (after Karpatschof). (Modified from Karpatschof B. Udforskning i psykologi. De kvantitative metoder. København: Akademisk Forlag 2006)
Hell | The selected symptoms are, taken as a whole, quite inhomogeneous, e.g., BPRS18 (Appendix 7), HAM-D17 (Appendix 3a)
Purgatory | A certain degree of homogeneity, but with local item dependency, necessitating revision or deletion of these items, e.g., MADRS10 (Appendix 3c)
Paradise | Distinct homogeneity without local dependency (total score a sufficient statistic), making it possible to demonstrate a dose–response relationship, e.g., HAM-D6, MES (see Appendix 3d), MAS (see Appendix 6)
The so-called LOCF method (Last Observation Carried Forward) was used to analyse the results. From the week in which each of the nine patients left the trial, that patient's last HAM-D 17 score was carried forward as if the patient had remained at that score until endpoint. The reason for using LOCF is the wish to retain all the patients who entered the trial in the analysis (intent-to-treat). In this way, an attempt is made to describe the treatment course for all the patients included in the trial, and not only for the 'well-behaved' patients who completed the full six-week treatment period.
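LOCF itself is a one-pass fill of missing visits. The minimal sketch below assumes that a patient's missed weekly visits after dropout are recorded as None. The data and the function name are hypothetical.

```python
from typing import Optional


def locf(weekly_scores: list[Optional[float]]) -> list[Optional[float]]:
    """Last Observation Carried Forward: fill each missing visit (None) with
    the most recent observed score; gaps before any observation stay None."""
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    for score in weekly_scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


# Hypothetical patient: baseline plus weeks 1-6, drops out after week 3.
scores = [24, 22, 20, 19, None, None, None]
filled = locf(scores)
print(filled)      # [24, 22, 20, 19, 19, 19, 19]
print(filled[-1])  # endpoint score entering the analysis: 19
```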
As can be seen in Figure 5.2, the HAM-D 17 mean score of the placebo-treated patients at endpoint was 14. The mean HAM-D 17 score was 11 for the SSRI-treated patients.
Effect size expresses the difference between the change in HAM-D 17 score from starting point (baseline) to endpoint on active medication (24 − 11 = 13) and the corresponding change on placebo (24 − 14 = 10). This HAM-D 17 mean score difference of 13 − 10 is thus 3. This difference is then considered in relation to the standard deviation of the change in HAM-D 17 for all patients. As seen in Figure 5.2, this 'pooled' standard deviation has been calculated to be 7.5.
The effect size (Figure 5.2) is the difference in mean HAM-D 17 change between the two treatments (i.e., 3) divided by the pooled standard deviation (i.e., 7.5). The effect size in the trial is therefore 3/7.5 = 0.40 when comparing SSRI treatment with placebo.
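The calculation in Figure 5.2 can be written as a short sketch. The function names are hypothetical. The pooled standard deviation of 7.5 is taken from the figure. Because the per-group standard deviations are not given, the pooled_sd helper only illustrates the usual formula for combining two groups.

```python
from math import sqrt


def pooled_sd(sd_1: float, n_1: int, sd_2: float, n_2: int) -> float:
    """Pooled standard deviation of two groups, weighting each by n - 1."""
    return sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2))


def effect_size(baseline_active: float, endpoint_active: float,
                baseline_placebo: float, endpoint_placebo: float,
                sd_change_pooled: float) -> float:
    """Difference in mean improvement (baseline - endpoint), drug minus placebo,
    divided by the pooled SD of the change scores."""
    improvement_active = baseline_active - endpoint_active      # 24 - 11 = 13
    improvement_placebo = baseline_placebo - endpoint_placebo   # 24 - 14 = 10
    return (improvement_active - improvement_placebo) / sd_change_pooled


print(effect_size(24, 11, 24, 14, 7.5))  # 0.4
```

Because the result is divided by a standard deviation in the scale's own units, it is dimensionless.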
Effect-size statistics were introduced by Cohen (97). This measure is of interest because it expresses the treatment effect relative to the standard deviation of the scale used for assessment (97). The effect size is therefore dimensionless. It is independent of the raw score of a particular scale, so different rating scales can be compared by means of the standardised effect-size statistic.
In his original publication, Cohen states that an effect size of 0.2 lacks
clinical significance. In his opinion, an effect size of 0.5 has medium or mod-
erate clinical significance, while an effect size of 0.8 or more means marked
clinical significance. These figures are relevant when comparing an active
treatment with placebo, as in Figure 5.2 . Cohen admits that these effect-size
values for clinical significance are actually provisional, subjective cut-offs.
In our 2000 analysis of the effect size of fluoxetine (the first SSRI drug to be approved) in patients fulfilling the DSM-III criteria for major depression, we found an effect size of 0.30 on the HAM-D 17, compared with 0.38 on the HAM-D 6 (98).
The effect-size range between 0.30 and 0.50 has been heavily debated because the US Food and Drug Administration (FDA) made it possible to re-analyse all the data submitted by the pharmaceutical industry when seeking FDA approval of a drug for DSM-III/DSM-IV major depression.
Turner et al. were given access to all the FDA data on 12 newer antidepressant drugs (99). They found a mean effect size of about 0.30, as did Kirsch et al. when analysing a subset of FDA data on six FDA-approved drugs (100). The HAM-D had been used as the outcome measure in more than 95% of these trials.
Figure 5.3 compares the HAM-D 17 and HAM-D 6 data for those antidepressants for which it was possible to gain access to the individual HAM-D items and not only to the total HAM-D score. A limitation of the Turner and Kirsch analyses is that the FDA provided only the HAM-D total mean score, not the individual item scores. Furthermore, the HAM-D total score is the HAM-D 17 in some trials and the HAM-D 21 in others (98, 100, 101).
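Item-level data matter because the HAM-D 6 is simply the sum of six HAM-D 17 items, and it cannot be recovered from a total score. The sketch below assumes the standard HAM-D 17 item numbering. In that numbering the six items are 1 (depressed mood), 2 (guilt), 7 (work and interests), 8 (psychomotor retardation), 10 (psychic anxiety) and 13 (general somatic symptoms). The function name and patient data are hypothetical.

```python
# Assumed standard HAM-D 17 item numbers that make up the HAM-D 6.
HAMD6_ITEMS = (1, 2, 7, 8, 10, 13)


def hamd6_from_hamd17(item_scores: dict[int, int]) -> int:
    """Sum the six HAM-D 6 items from a dict {HAM-D 17 item number: score}."""
    missing = [i for i in HAMD6_ITEMS if i not in item_scores]
    if missing:
        raise KeyError(f"HAM-D 17 items missing: {missing}")
    return sum(item_scores[i] for i in HAMD6_ITEMS)


# Hypothetical patient with all 17 items rated.
patient = {1: 3, 2: 2, 3: 0, 4: 2, 5: 1, 6: 1, 7: 3, 8: 2, 9: 1,
           10: 2, 11: 2, 12: 1, 13: 2, 14: 1, 15: 0, 16: 1, 17: 0}
print(sum(patient.values()), hamd6_from_hamd17(patient))  # 24 14
```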
Analyses of clinical significance through effect-size studies using health-related quality of life scales have shown that a clinically significant change corresponds to about half the standard deviation, i.e., an effect size of 0.50 (101). However, the effect size of 0.50 itself has a 95% confidence interval ranging from 0.36 to 0.63. Clinical significance when evaluating antidepressive effect lies precisely in this range between 0.36 and 0.63, as can be seen in Figure 5.3.
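The text does not say how the 0.36–0.63 interval was obtained. To show why any single effect size carries an interval, here is a sketch using a common large-sample approximation for the standard error of a standardised mean difference. The approximation is an assumption, not the author's method. It is applied to the single trial in Figure 5.2.

```python
from math import sqrt


def effect_size_ci(d: float, n_1: int, n_2: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a standardised mean difference d,
    using SE = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))."""
    se = sqrt((n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2)))
    return d - z * se, d + z * se


# The trial in Figure 5.2: d = 0.40 with 52 patients on SSRI and 50 on placebo.
low, high = effect_size_ci(0.40, 52, 50)
print(f"{low:.2f} to {high:.2f}")
```

For about 100 patients this gives roughly 0.01 to 0.79. A single trial of that size therefore cannot locate an effect size within a range as narrow as 0.36–0.63. Intervals that narrow require much larger pooled data sets.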
The pharmacopsychometric triangle
Within modern psychometrics the creation of a pharmacopsychometric
triangle has now become possible, based particularly on two specific
elements: the concept of transferability and effect-size statistics.
Cattell described how he had tried in vain to use factor analysis to test for transferability (107). Cattell understood transferability as an expression of whether a rating scale measures the same phenomenon or dimension in different groups of patients, or in the same group of patients when the scale is used for weekly assessments during a course of antidepressive therapy. Examples of such groups are men versus women, younger versus older age groups, and primary versus secondary depression.
This is precisely what item response theory models provide: a test of transferability. A landmark study in this area is the Rasch analysis performed in connection with an antidepressive medication study using weekly HAM-D assessments (108).
The use of effect-size statistics in the pharmacopsychometric triangle is
important in that it is independent of the rating scale used, as this
dimensionless statistic only uses the mean and standard deviation.
As can be seen in Figure 5.4 , the upper left corner of the triangle (A) is
the desired clinical effect, with special emphasis on dose–response
relationship. This dose–response relationship highlights what Rasch
expresses as follows:
If we want to know something about a quantity, then we have to
observe something that depends on that quantity, something that
changes if the quantity varies materially. In that case we have a
sufficient statistic ( 62, 63 ).
Figure 5.3 Effect size results in placebo-controlled antidepressive trials using HAM-D 17 and HAM-D 6
Study | Drug and dose | HAM-D17 | HAM-D6
Bech et al. (98, 100) | Fluoxetine 20–60 mg | 0.30 | 0.38
Entsuah et al. (102) | Fluoxetine 20–60 mg | 0.24 | 0.40
Bech et al. (103) | Citalopram 20 mg | 0.09 | 0.21
Bech et al. (103) | Citalopram 40 mg | 0.39 | 0.51
Bech et al. (104) | Escitalopram 10 mg | – | 0.38
Bech et al. (104) | Escitalopram 20 mg | – | 0.61
Bech (105) | Mirtazapine 15–60 mg | 0.49 | 0.42
Bech et al. (106) | Duloxetine 60 mg | 0.46 | 0.51
Bech et al. (106) | Duloxetine 120 mg | 0.49 | 0.57
The upper right corner (B) illustrates the undesired clinical effect, i.e., the
different side effects. The side effect scale used in the example is the UKU
Scale ( 4, 109 ).
Lastly, C illustrates patient-reported, clinically-related quality of life,
which can be said to be a balance between the desired versus the undesired
effects of the drug under examination.
When discussing the different classes of psychotropic drugs, we can refer to the ICD-10 hierarchy, or ladder (Figure 5.5), which ranks the various psychiatric disorders. At the 'bottom' are personality disturbances, for which there is, of course, no available pharmacological therapy. Anxiety is on the next step up. As implied in Figure 5.5, these areas of psychiatry are Freudian, while the steps further up are Kraepelinian. According to ICD-10, a patient suffering from both depression and anxiety should be diagnosed as depressive, a patient suffering from both mania and schizophrenia should be diagnosed as schizophrenic, and so on.
Looking at the six steps of the ICD-10 diagnostic ladder (Figure 5.5), we find dementia on the top step (step 1) and personality disturbances on the lowest step (step 6). For personality disturbances, the use of drugs to treat deviant character traits such as psychopathy has always been a very problematic issue.
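The hierarchy rule amounts to 'report the highest step present'. The minimal sketch below encodes the six steps exactly as the text describes them. It is a simplification for illustration, not the ICD-10 coding rules, and the function name is hypothetical.

```python
from typing import Optional

# The six steps of the ICD-10 ladder as described in the text (step 1 = top).
ICD10_LADDER = (
    "dementia",
    "schizophrenia",
    "mania",
    "depression",
    "anxiety",
    "personality disorder",
)


def hierarchical_diagnosis(present: set[str]) -> Optional[str]:
    """Return the disorder on the highest step of the ladder among those present."""
    for disorder in ICD10_LADDER:
        if disorder in present:
            return disorder
    return None


print(hierarchical_diagnosis({"anxiety", "depression"}))    # depression
print(hierarchical_diagnosis({"mania", "schizophrenia"}))   # schizophrenia
```

The sketch also shows the drawback discussed next. Whatever lies below the reported step, such as the work-related stress behind a depression, is no longer visible in the output.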
A major drawback of the hierarchical structure of ICD-10 is its inability to distinguish between a primary depressive condition and a work-related stress condition (distress). The latter is diagnosed as a depressive condition if its severity is consistent with a moderate (major) depression. It was precisely the ability of an experienced psychiatrist to distinguish between these two conditions that formed the basis for the introduction of antidepressive drugs (imipramine) (1). In epidemiological studies that use ICD-10 diagnoses, the prevalence of work-related stress lies below 1%, because many of these people develop moderate depression (110). The fact that this depression is secondary to work-related stress can no longer be read from the diagnosis.
Figure 5.4 The pharmacopsychometric triangle. (Modified from Bech P. Applied psychometrics in clinical psychiatry. Acta Psychiatr Scand 2009; 120: 400–409, Figure 1.)
A | Measurement of wanted clinical effect (e.g., HAM-D6, see Appendix 3f)
B | Measurement of unwanted clinical effect (e.g., PRISE, see Appendix 10)
C | Resulting patient-related quality of life (e.g., WHO-5, see Appendix 8a)
Antidementia medication
We have chosen data on the antidementia drug donepezil to illustrate how all three areas of the pharmacopsychometric triangle (A, B and C) provide an integrated picture (Figure 5.6). The data are from one of the best-designed antidementia studies among those assessed by the US Food and Drug Administration (FDA) when authorising this product (111). The patients included in the study fulfilled the DSM-III criteria for Alzheimer's disease. In a double-blind 15-week study, donepezil was administered in two fixed doses of 5 mg and 10 mg, and both doses were compared with placebo. The 11-item Alzheimer's Disease Assessment Scale (ADAS) and the Mini Mental State Examination (MMSE) were used as rating scales.
The MMSE effect size is negative, since a higher score on this scale indicates improved cognitive functioning. The ADAS effect size is positive, since a higher score on this scale indicates more symptoms of cognitive dysfunction. On the MMSE, an effect size greater than 0.40 in absolute value was achieved only with 10 mg donepezil.
On the QoL scale, a higher score signifies better quality of life. Here, 5 mg donepezil is entirely without effect. The 10 mg dose gives a statistically significant effect but no clinically relevant effect, as the effect size is merely −0.25 (negative because a higher score signifies improved quality of life).
Figure 5.5 Diagram of the six diagnostic hierarchy steps of the ICD-10 in which stress-related anxiety and personality disorders lie within the area covered by Freudian psychiatry
The concordance between the ICD-10 hierarchy (ladder) and the pharmacological classes of psychotropic drugs:
[1] Dementia | Anti-dementia medication
[2] Schizophrenia | Anti-psychotic medication
[3] Mania | Anti-manic medications
[4] Depression | Antidepressants
[5] Anxiety | Anti-anxiety medications
[6] Personality disorders | Psychotherapy
Steps 1–4: Kraepelin; steps 5–6: Freud
Figure 5.6 also shows that relatively few patients were unable to complete the study because of side effects, especially nausea, which is one of the main side effects of donepezil. The 10% dropout rate on 10 mg in this controlled study is in line with a recently published Danish study (111, 112).
Dementia therapy addresses the behavioural changes brought on by the condition. Here the burden resting on the relatives is of major importance for the course of the disease. Thus, it is often the relatives' quality of life that is assessed, since a useful patient-related measure is difficult to find. If the results of the patient's own quality of life assessment are counter-intuitive, the relatives' assessment is used instead.
The behavioural scale commonly used is the Neuropsychiatric Inventory (NPI), in which each symptom is assessed both for the patient and for the burden of illness as experienced by the relative (111, 115).
Antipsychotic medication
When evaluating the effect of antipsychotic medication, a comparison of
effects against placebo is important. However, treating psychotic patients
(i.e., especially schizophrenic patients) with inactive (placebo) medication
poses major ethical issues, as highly effective antipsychotic drugs are available.
There are two major categories of antipsychotic medication: the typical and the atypical antipsychotic drugs. Chlorpromazine was the first drug to demonstrate an antipsychotic effect that was quite different from that of the medicines available before it, such as phenobarbital (phenemal). The most potent typical antipsychotic drug is haloperidol, which was the most widely used antipsychotic in the treatment of acute psychosis worldwide until the arrival of the atypical antipsychotics.
Figure 5.6 The pharmacopsychometric triangle for Donepezil (111)
A | Wanted effect (effect size): ADAS 5 mg 0.47, 10 mg 0.58; MMSE 5 mg −0.31, 10 mg −0.41
B | Unwanted effect (% non-completers due to side effects): placebo 2%, 5 mg 4%, 10 mg 10%
C | QoL (effect size): 5 mg −0.05, 10 mg −0.25
Haloperidol was thus the most frequently used comparator at the end of the twentieth century when the antipsychotic effect of the new, atypical drugs was investigated. The best-designed trial was performed in the US (113).
This US trial is often designated a 'landmark' study, as fixed doses of both haloperidol (4, 8 and 16 mg) and the new atypical antipsychotic sertindole (12, 20 and 24 mg) were used. A re-analysis of this trial in accordance with the pharmacopsychometric triangle has recently been made (115).
From a scientific point of view, it is very important to include a placebo
group; in Europe, however, this would be perceived as ethically debatable.
Figure 5.7 shows the pharmacopsychometric triangle for the assessment of
antipsychotic actions when comparing the classical drug haloperidol with
the modern atypical drug sertindole; both of them compared to placebo in
the US-based trial ( 115, 118 ).
The antipsychotic effect (A) is measured on the PANSS 11, which consists of the 11 BPRS symptoms shown in Figure 1.10 (items 3, 4, 7, 8, 10, 11, 12, 14, 15, 16 and 17; see Appendix 7). The BPRS and the PANSS differ in that the latter has more precise anchors defining each scale step of the items. Moreover, the total score of these 11 BPRS items fulfils the item response theory model (91, 92, 115, 118).
The Simpson–Angus Scale is used to measure the side-effect profile (B). This scale is shown in Figure 5.8. It consists of ten symptoms, all measuring extrapyramidal symptoms (EPS) corresponding to those seen in Parkinson's disease. These extrapyramidal symptoms make the use of the typical, classical antipsychotics problematic, and a major goal in the development of the modern atypical drugs has been to avoid such extrapyramidal side effects.
Depression rating scales have often been used to assess quality of life in
schizophrenic patients; these may, however, provide counter-intuitive results
in schizophrenics, as they also do in dementia ( 116 ). In Figure 5.7 the
depression items correspond to the six BPRS items in Appendix 7.
Figure 5.9 shows data on Mokken's coefficient of homogeneity. This coefficient indicates whether the total score is a sufficient measure. On the PANSS 11 (A) and the Simpson–Angus Scale (B), the coefficient of homogeneity is above 0.40, which means that the total score is a sufficient measure on these scales.
On the PANSS 6 depression scale (C), the coefficient of homogeneity is just below 0.40, which indicates that use of the total score is only just permissible.
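Mokken's coefficient of homogeneity is Loevinger's H. A minimal sketch for dichotomous (0/1) items follows, assuming the usual definition H = 1 − (observed Guttman errors) / (Guttman errors expected if the items were independent). The PANSS and Simpson–Angus items are polytomous, and the polytomous version used in Figure 5.9 is more involved and is not shown. Names and data are hypothetical.

```python
from itertools import combinations


def loevinger_h(responses: list[list[int]]) -> float:
    """Scalability coefficient H for dichotomous (0/1) items.

    responses[p][i] is 1 if patient p shows symptom i. In each item pair the
    'easier' item is the one endorsed more often; a Guttman error is a patient
    who lacks the easier symptom but has the harder one. Raises
    ZeroDivisionError if every item is endorsed by all or by no patients.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    observed = 0.0   # Guttman errors actually seen
    expected = 0.0   # Guttman errors expected under independence
    for a, b in combinations(range(k), 2):
        easy, hard = (a, b) if p[a] >= p[b] else (b, a)
        observed += sum(1 for row in responses if row[easy] == 0 and row[hard] == 1)
        expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected


# Hypothetical perfect Guttman pattern: no errors, so H = 1.
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(loevinger_h(perfect))  # 1.0
```

By the convention used in the text, H of 0.40 or more supports using the total score.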
Figure 5.7 shows that all the haloperidol doses (4, 8 and 16 mg) are effective, with an effect size greater than 0.40 for antipsychotic effect (A). For sertindole, the lowest dose (12 mg) is only just effective, while 20 mg is the optimal dose.
Figure 5.7 The pharmacopsychometric triangle for antipsychotic medication. (Modified from Bech et al. Dose-response relationship of sertindole and haloperidol using the pharmacopsychometric triangle. Acta Psychiatr Scand 2011; 123: 154–161, Figure 1.)
A | Wanted effect, psychotic subscale PANSS 11 (effect size): sertindole 12 mg 0.39, 20 mg 0.64, 24 mg 0.45; haloperidol 4 mg 0.50, 8 mg 0.73, 16 mg 0.55
B | Unwanted effect, Simpson–Angus Scale (SAS) (effect size): sertindole 12 mg 0.02, 20 mg −0.05, 24 mg −0.33; haloperidol 4 mg −0.32, 8 mg −0.32, 16 mg −0.48
C | Generic QoL scale, depression subscale PANSS 6 (effect size): sertindole 12 mg 0.12, 20 mg 0.44, 24 mg 0.32; haloperidol 4 mg 0.35, 8 mg 0.37, 16 mg 0.11
The side-effect measures on the Simpson–Angus Scale (B) (see Figure 5.7) show an effect size beyond −0.30 for all haloperidol doses. The effect size is negative because the side effects emerge during treatment. For sertindole, the optimal dose for antipsychotic effect (20 mg) is entirely without side effects, since an effect size within ±0.20 has no clinical significance. However, the side effects are considerable at a dose of 24 mg sertindole.
As regards depression and quality of life, 20 mg sertindole (the optimal
antipsychotic dose) also has an antidepressive effect with an effect size greater
than 0.40. None of the haloperidol doses reaches an effect size of 0.40, and
with the highest dose, the effect size is only 0.11.
The pharmacopsychometric triangle thus makes it possible to determine whether the scales are valid (total score a sufficient statistic) and also gives an overview of the effect-size statistics.
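One way to read the triangle is as a screening rule over doses. The sketch below uses the effect sizes from Figure 5.7 and cut-offs taken from the surrounding text: at least 0.40 for wanted effect and for depression/quality of life, and within ±0.20 for side effects. Combining them into one rule, and the names used, are assumptions made for illustration.

```python
# Effect sizes from Figure 5.7: (A wanted PANSS 11, B unwanted SAS, C depression PANSS 6).
TRIANGLE: dict[tuple[str, int], tuple[float, float, float]] = {
    ("sertindole", 12): (0.39, 0.02, 0.12),
    ("sertindole", 20): (0.64, -0.05, 0.44),
    ("sertindole", 24): (0.45, -0.33, 0.32),
    ("haloperidol", 4): (0.50, -0.32, 0.35),
    ("haloperidol", 8): (0.73, -0.32, 0.37),
    ("haloperidol", 16): (0.55, -0.48, 0.11),
}


def acceptable_doses(triangle: dict[tuple[str, int], tuple[float, float, float]],
                     min_wanted: float = 0.40,
                     max_unwanted: float = 0.20,
                     min_qol: float = 0.40) -> list[tuple[str, int]]:
    """Doses with a clinically relevant wanted effect, no clinically relevant
    side effects and a relevant gain on the depression/QoL corner."""
    chosen = []
    for (drug, dose), (wanted, unwanted, qol) in sorted(triangle.items()):
        if wanted >= min_wanted and abs(unwanted) <= max_unwanted and qol >= min_qol:
            chosen.append((drug, dose))
    return chosen


print(acceptable_doses(TRIANGLE))  # [('sertindole', 20)]
```

Loosening or tightening the cut-offs shows how sensitive the conclusion is to these provisional thresholds.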
It is thought-provoking that even a relatively low dose of 4 mg haloperidol causes considerable Parkinsonian symptoms. The highest dose, 16 mg, causes very severe side effects without any signs of remission of depressive symptoms and, consequently, without any increase in quality of life. When using haloperidol as an alternative to the mood-stabilising effect of lithium in bipolar disorder, we used a very low dose of between 0.5 and 2 mg (117).
Figure 5.8 Scoring sheet of the Simpson–Angus side effect Scale. (Adapted from Simpson GM, Angus JWS. A rating scale for extrapyramidal side effects. Acta Psychiatr Scand 1970; 46 (suppl. 212): 11–19)
No. | Item | Score
1 | Gait | 0–4
2 | Arm dropping | 0–4
3 | Shoulder shaking | 0–4
4 | Elbow rigidity | 0–4
5 | Wrist rigidity | 0–4
6 | Leg pendulousness | 0–4
7 | Head dropping | 0–4
8 | Glabella tap | 0–4
9 | Tremor | 0–4
10 | Salivation | 0–4
Total score (0–40)
Figure 5.9 Psychometric validation of the scales in Figure 5.7 (Mokken analysis) (115)
Coefficient of homogeneity at treatment weeks 4, 6 and 8:
A (wanted effect, psychotic subscale PANSS 11) and B (unwanted effect, Simpson–Angus Scale): 0.44, 0.46, 0.44 and 0.48, 0.42, 0.45
C (generic QoL scale, depression subscale PANSS 6): 0.38, 0.39, 0.38
Antimanic medication
The first placebo-controlled trial in modern psychopharmacology took place in the Danish city of Århus (Risskov), where Professor Erik Strömgren initiated a study of manic patients using lithium as therapy. In 1949, lithium had been re-introduced in Australia, where John Cade (1912–1980) demonstrated that lithium seemed to possess an antimanic effect in bipolar patients but no antipsychotic effect in schizophrenia (118). Placebo-controlled clinical trials began in the 1950s. Mogens Schou headed the placebo-controlled study in Århus, where he was able to demonstrate a significantly greater effect of lithium than of placebo in the treatment of mania. In 1988, a 'landmark' study of lithium versus antipsychotic medication took place at Northwick Park Hospital in London (119).
In a randomised, controlled, double-blind trial, 120 patients admitted with psychosis (i.e., schizophrenia, schizo-affective psychosis or mania) were treated with lithium, with pimozide (a drug similar to haloperidol), with a combination of these two active drugs, or with placebo. The trial lasted three weeks. The results showed that the present-state symptom profile, and not the DSM-III diagnosis, was the valid factor. Regardless of diagnosis, pimozide had a specific effect on the psychotic symptoms (hallucinations and delusions), while lithium had a specific effect on the manic symptoms assessed by the Bech–Rafaelsen Mania Scale (see Appendix 6). In his award-winning book 'Madness Explained', the psychologist R.P. Bentall wonders why there have not been more studies of this type, in which all patients hospitalised during a specific period of time are treated according to standardised principles. He calls this investigation a landmark study (120).
Mogens Schou demonstrated the high prophylactic effect of lithium on
both mania and depression in bipolar patients. There are no placebo-
controlled trials with haloperidol in mania, as the use of placebo in such
severe cases is considered to be unethical. Therefore, the sertindole study
(Figure 5.7 ) is very important.
Around 1980, it became possible to measure haloperidol plasma concentrations, and the psychiatric department of the Danish Rigshospitalet performed a study to investigate a potential connection between haloperidol plasma concentration and clinical effect (121). This study showed that severely manic patients, measured on the Bech–Rafaelsen Mania Scale (see Appendix 6), could respond after 6 days of treatment with a fixed dose of 10 mg haloperidol. The patients with the highest plasma concentrations showed the best response. Patients differ in their metabolism of haloperidol, and haloperidol has no active metabolites. The trial therefore led to a recommendation to use blood sampling in haloperidol therapy.
With the emergence of atypical antipsychotics, olanzapine proved to have the most reliable antimanic effect. Because women metabolise olanzapine more slowly than men, we performed a study of manic women at the University Hospital of Geneva in Switzerland. These severely manic patients responded after 14 days on an olanzapine dose of 20 mg. We could once again show that the patients with the highest plasma concentrations had the most pronounced effect as assessed by the Bech–Rafaelsen Mania Scale (MAS) (122). In this trial, the MAS was compared with the US Young Mania Rating Scale (YMRS). The MAS proved to be far more valid, both in item response theory analysis and in plasma concentration–effect relationships.
Antidepressive medication
The 'second-generation' antidepressants provided us with a line of products developed on the basis of a hypothesis regarding their biological mode of action (123). They all had different chemical structures, in contrast to the 'first-generation' antidepressants. The first-generation drugs shared a tricyclic chemical structure and are therefore often called 'tricyclics'. The new generation had a selective inhibiting effect on serotonin reuptake (selective serotonin reuptake inhibitors, or SSRIs). The tricyclic antidepressants also possess this effect, together with many other modes of action, such as a quite potent antihistamine effect. Their sedative effect makes driving problematic. The antihistamine effect also increases appetite, so weight gain should be monitored.
The SSRIs do not have these ‘side effects’, but their serotonin reuptake inhibition can give other side effects. These include nausea and vomiting, hyperhidrosis, headache, sleep disturbances, agitation and sexual dysfunction. These side effects are caused by the SSRIs' serotonin 2A receptor stimulating effect, while their antidepressive effect is due to serotonin 1A receptor stimulation.
Many of these side effects are listed as depressive symptoms in the Hamilton Depression Scale or the MADRS (see Appendix 3), but not in the HAM-D6. It is therefore vital to use the HAM-D6 in dose–response studies. Otherwise, side effects that increase with dose would be scored as depressive symptoms and obscure the antidepressive effect.
Figure 5.10 shows the pharmacopsychometric triangle for the second-generation drug escitalopram. The HAM-D6 was used as the measure of antidepressive effect (A), and the Quality of Life Enjoyment and Satisfaction Questionnaire (Q-LES-Q) was used to measure patient-reported quality of life (C) (104, 106). The percentage of patients leaving the study before completion of the planned eight weeks of treatment was used as an overall measure of side effects (B).
Figure 5.10 The pharmacopsychometric triangle for escitalopram and citalopram in depression (104)
A. Wanted effect, HAM-D6 effect size: 10 mg escitalopram 0.31; 20 mg escitalopram 0.70; 40 mg citalopram 0.46
B. Non-completers due to side effects: placebo 7.4%; 10 mg escitalopram 6.7%; 20 mg escitalopram 10.4%; 40 mg citalopram 9.6%
C. Overall quality of life, Q-LES-Q effect size: 10 mg escitalopram –0.14; 20 mg escitalopram –0.48; 40 mg citalopram –0.43
Mokken analysis showed that both the HAM-D 6 and the Q-LES-Q were
unidimensional (coefficients of homogeneity of 0.40 or higher).
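The coefficient of homogeneity used in Mokken analysis is Loevinger's H. The following minimal sketch covers dichotomous (0/1) items only. The HAM-D6 and Q-LES-Q items are polytomous, so applying this version to them would mean dichotomising the items first. That is an assumption of this illustration, not the procedure used in the cited studies. The function name and toy data are hypothetical.

```python
from itertools import combinations

def loevinger_h(data):
    """Scale coefficient H (Loevinger/Mokken) for dichotomous 0/1 items.

    data: one list of 0/1 item scores per respondent.
    H = sum of observed inter-item covariances divided by the sum of the
    maximum covariances attainable given each pair's item proportions.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n
        observed += p_ij - p[i] * p[j]
        maximum += min(p[i], p[j]) - p[i] * p[j]
    if maximum == 0:
        raise ValueError("H is undefined when no item pair can covary")
    return observed / maximum

# A perfect Guttman pattern gives H = 1; unrelated items give H = 0.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
independent = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(loevinger_h(guttman), loevinger_h(independent))  # 1.0 0.0
```

A scale would be judged unidimensional in the sense used above when H is 0.40 or higher.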
The study shown in Figure 5.10 is a ‘landmark’ study in two respects. First, it included a quality of life scale. Second, escitalopram was compared not only with placebo but also with 40 mg of citalopram. A previous dose–response analysis using the HAM-D6 had shown this to be the optimal citalopram dose in patients with a baseline HAM-D17 score of 20 or higher.
The study data shown in Figure 5.10 included only patients with DSM-IV major depression who scored 30 or higher on the MADRS at baseline, indicating a rather marked degree of depression. As can be seen, 10 mg escitalopram was an inadequate dose in these patients, as is evident both on the HAM-D6 and on the Q-LES-Q. However, both 40 mg of citalopram and, in particular, 20 mg of escitalopram achieved an effect size greater than 0.40 (104, 106).
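The effect sizes in this section are not defined here. The sketch below assumes a common definition, Cohen's d: the difference between drug and placebo in mean improvement, divided by the pooled standard deviation. The function names and the trial figures are hypothetical; only the 0.40 limit comes from the text.

```python
import math

def cohens_d(mean_change_drug, mean_change_placebo, sd_drug, sd_placebo, n_drug, n_placebo):
    """Standardised mean difference between drug and placebo (Cohen's d, pooled SD).

    Changes are baseline minus endpoint, so on a symptom scale such as the
    HAM-D6 a positive d favours the drug.
    """
    pooled_var = ((n_drug - 1) * sd_drug ** 2 + (n_placebo - 1) * sd_placebo ** 2) / (
        n_drug + n_placebo - 2
    )
    return (mean_change_drug - mean_change_placebo) / math.sqrt(pooled_var)

def clinically_relevant(d, threshold=0.40):
    """Compare |d| with the 0.40 limit. On a well-being scale (higher = better),
    the same baseline-minus-endpoint convention makes an improvement negative."""
    return abs(d) >= threshold

# Hypothetical trial: 6.0 vs 4.0 points improvement, SD 4.0, 100 patients per arm.
d = cohens_d(6.0, 4.0, 4.0, 4.0, 100, 100)
print(d, clinically_relevant(d))  # 0.5 True
```

Applied to the Figure 5.10 values, `clinically_relevant(0.31)` is False for 10 mg escitalopram on the HAM-D6, while `clinically_relevant(-0.48)` is True for 20 mg escitalopram on the Q-LES-Q. Treating the limit as inclusive (`>=`) is our assumption.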
Figure 5.11 shows the pharmacopsychometric triangle for desvenlafaxine, the active metabolite of venlafaxine. Escitalopram, like other SSRIs, acts only on serotonin reuptake. In contrast, both venlafaxine and desvenlafaxine inhibit the reuptake of noradrenaline as well as serotonin, and for this reason they are known as SNRIs (serotonin and noradrenaline reuptake inhibitors). The trial shown in Figure 5.11 is a landmark study because the WHO-5 quality-of-life scale was used in the placebo-controlled trials that led to FDA approval of desvenlafaxine, with 50 mg as the lowest effective dose (124).
However, Figure 5.11 shows that at this dose the effect size exceeds 0.40 only on the HAM-D6 (0.43); on the HAM-D17 it is 0.33 and on the WHO-5 –0.30. On the WHO-5, the effect size is negative because a higher score signifies increased well-being. For the 100 mg dose, the HAM-D17, HAM-D6 and WHO-5 effect sizes are all above the 0.40 limit for clinical significance (the WHO-5 in absolute value). Hyperhidrosis is the most significant side effect, and on it there is no difference between 50 mg and 100 mg desvenlafaxine.
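The reading of a pharmacopsychometric triangle can be made explicit in code. The sketch below stores each dose arm's three corners (A wanted effect, B side effects, C quality of life) and lists the scales on which the effect size misses the 0.40 limit. The class and field names are hypothetical. The numbers are read from Figure 5.11, and flipping the sign of the well-being effect sizes follows the sign convention described in the text.

```python
from dataclasses import dataclass, field

@dataclass
class TriangleArm:
    """One dose arm summarised by the three corners of the triangle (hypothetical structure)."""
    dose: str
    wanted_effect: dict = field(default_factory=dict)    # A: effect sizes on symptom scales
    side_effect_rate: float = 0.0                        # B: proportion with the main side effect
    quality_of_life: dict = field(default_factory=dict)  # C: effect sizes on well-being scales

    def shortfalls(self, threshold=0.40):
        """Return the scales on which the effect size stays below the threshold.

        Symptom-scale effect sizes are positive when the drug helps; well-being
        effect sizes are negative when the drug helps, so their sign is flipped.
        """
        below = [s for s, es in self.wanted_effect.items() if es < threshold]
        below += [s for s, es in self.quality_of_life.items() if -es < threshold]
        return below

# Values read from Figure 5.11 (desvenlafaxine); placebo hyperhidrosis rate was 7%.
arms = [
    TriangleArm("50 mg", {"HAM-D17": 0.33, "HAM-D6": 0.43}, 0.12, {"WHO-5": -0.30}),
    TriangleArm("100 mg", {"HAM-D17": 0.41, "HAM-D6": 0.50}, 0.13, {"WHO-5": -0.45}),
]
for arm in arms:
    print(arm.dose, arm.shortfalls() or "all above 0.40")
```

The 50 mg arm misses the limit on the HAM-D17 and the WHO-5, while the 100 mg arm clears it on all three scales. This matches the reading of Figure 5.11 above. The side-effect corner is stored but not judged, since the text gives no threshold for it.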
Three decades ago it was concluded that even for the rather potent first-generation antidepressants, such as imipramine, it is not possible to demonstrate their actions from an aetiological point of view (64):
The influence of the disorder on the total variance in response to treatment obviously depends on the specificity of the therapeutic effect. Drugs acting on an aetiological factor parallel to vitamin B12 in pernicious anaemia are more specific than are drugs acting on an intermediary factor like digoxin in heart failure. However, drugs need not act on an aetiological factor to be of nosological importance. From our studies we cannot evaluate whether imipramine acts on an aetiological rather than on an intermediary factor in endogenous depression. What we have found is that in these patients with endogenous depression (defined by the diagnostic Newcastle Scale) a correlation emerged between plasma levels and goal outcome. By use of the HAM-D6 it was moreover possible to obtain a population-independent response-curve, i.e., a curve indicating the treatment effect in relation to treatment time. Such a curve might indicate that if an outcome is imipramine-dependent, the patient’s response has to follow the response-pattern for imipramine. (108).
Figure 5.11 The pharmacopsychometric triangle for desvenlafaxine in depression, using the WHO-5 (124)
A. Wanted effect, effect size: HAM-D6 50 mg 0.43, 100 mg 0.50; HAM-D17 50 mg 0.33, 100 mg 0.41
B. Unwanted effect, hyperhidrosis: placebo 7%; 50 mg 12%; 100 mg 13%
C. Quality of life, WHO-5 effect size: 50 mg –0.30; 100 mg –0.45
Antianxiety medication
In the 1960s, the benzodiazepines, in particular diazepam, became available to treat the different anxiety disorders, especially generalised anxiety. Because anxiety disorders are chronic in nature, the development of dependency on benzodiazepines was seen as very problematic. This dependency is of almost the same nature as that known for alcohol. Cross-tolerance between diazepam and alcohol was demonstrated, and in some places (including Denmark) this knowledge was used in the treatment of alcohol withdrawal. However, diazepam did not prove to be reliably effective in this critical alcohol withdrawal condition, which can be lethal when untreated. Other drugs, such as phenemal (phenobarbital), are safer than diazepam in alcohol withdrawal syndrome.
Both phenemal and diazepam have quite a significant effect on anxiety
and since the 1960s attempts have been made to find drugs that do not
generate dependency. General practitioners have often employed adrenergic
beta-receptor inhibitors such as propranolol, the archetypical ‘beta-blocker’.
It belongs to a group of drugs used in hypertension, also a chronic condition
in its milder forms. Long-term propranolol therapy in hypertension has not
caused the dependency seen with alcohol or benzodiazepines.
Hamilton's factor analysis (Table 1.1) showed the importance of distinguishing mental (psychic) from physical (somatic) anxiety symptoms, and this distinction has proved to be of major clinical significance. Thus, benzodiazepines and ‘beta-blockers’ (propranolol) act predominantly on the physical anxiety symptoms. These somatic anxiety symptoms dominate the picture in a normal stress-related anxiety reaction, which is why benzodiazepines, alcohol and propranolol are used in these anxiety states. Propranolol is used to calm exam nerves or by airplane pilots who experience anxious trembling during take-off. Propranolol acts mainly on peripheral beta-receptors and, unlike alcohol and benzodiazepines, has no marked sedative effect.
There are no definite ‘landmark’ studies with propranolol in generalised anxiety, and clinical experience with the drug is not convincing, owing to its specific effect on the physical anxiety symptoms. A trial drug developed in the 1980s by the then Swiss company Ciba-Geigy (CGP 361 A) demonstrated a central anxiolytic effect. As it had proved to have a greater anxiolytic than antihypertensive effect, the drug was assessed in a Danish placebo-controlled trial (125). This was quite a small pilot study, with about 17 patients in each treatment group.
The pharmacopsychometric triangle in Figure 5.12 shows that this beta-blocker was effective in generalised anxiety both on the Hamilton Anxiety Scale and on the six-item HAM-A6, which measures psychic anxiety symptoms (see Table 1.1). However, the drug’s effect on the quality of life scale was less pronounced, although it was well tolerated.
This study is mentioned here for two reasons. In contrast to propranolol, this beta-blocker demonstrated an effect on the psychic anxiety symptoms, and the study also included a positive well-being scale.
The five items used as a WHO-5 analogue in this study are actually items from the Hospital Anxiety and Depression Scale (HADS) (see Appendix 8b). Some of the items in this questionnaire are aimed at symptom experience (negatively phrased questions) and some at positive well-being (positively phrased questions). The WHO-5 is a questionnaire for measuring general, positive well-being.
As both phenemal and the benzodiazepines are antiepileptics, attempts have been made to measure the anxiolytic effects of modern antiepileptics that lack the dependency-producing effect of diazepam.
One of the new antiepileptics, pregabalin, has been found effective in generalised anxiety and is authorised for this indication. A re-analysis of the placebo-controlled pregabalin trials in patients with generalised anxiety has shown that 150 mg pregabalin is an inadequate dose. Its HAM-A14 effect size is 0.31, and it reaches merely 0.20 on the valid HAM-A6 (126). Pregabalin doses between 200 mg and 450 mg gave a HAM-A14 effect size of 0.56 and a HAM-A6 effect size of 0.49. Higher doses did not result in larger effect sizes.
These controlled pregabalin studies in generalised anxiety included different benzodiazepines as active comparators, but not diazepam. Clonazepam and alprazolam are thought to have the lowest risk of dependency syndrome. The alprazolam effect size was about 0.35 on both the HAM-A14 and the HAM-A6.
Only one trial exists in which pregabalin was compared with another antianxiety drug, venlafaxine. For a dose of a mere 75 mg venlafaxine, the HAM-A6 effect size was 0.40, but only 0.31 on the HAM-A14.
The Rickels et al study is the landmark study in generalised anxiety disorder. It is a placebo-controlled comparison of diazepam with imipramine and trazodone, and it focuses on the psychic anxiety symptoms of Hamilton’s Anxiety Scale (see Figure 1.8) (127). Using the total score of the psychic anxiety factor, Rickels et al demonstrated that, relative to placebo, imipramine was significantly superior to diazepam (127, 130). When the total score of all 14 Hamilton items is used, however, the superiority of imipramine over diazepam becomes less obvious. This is because the physical symptoms weigh too heavily in the total score of the complete scale.
Figure 5.12 The pharmacopsychometric triangle for anti-anxiety medication (125)
(Panels: A, wanted effect, with effect sizes and coefficients of homogeneity for the HAM-A14 and HAM-A6; B, tolerability, placebo versus active drug; C, quality of life, WHO-5 analogue from the HADS, see Appendix 8b.)
Mood stabilising medications
Lithium is still considered to be the most effective mood stabiliser (121). Evaluated within the framework of the pharmacopsychometric triangle, the profile of lithium in affective disorder is as illustrated in Figure 5.13.
In Figure 5.13, corner A covers the clinical effect of lithium. A dose–response relationship has been observed (121):
- For an acute antimanic effect, a dose resulting in concentrations between 0.8 and 1.2 mmol/l is most effective.
- For antidepressant augmentation in patients with therapy-resistant depression, a concentration between 0.3 and 0.5 mmol/l is most effective.
- For long-term mood stabilisation, a concentration between 0.5 and 0.8 mmol/l is most appropriate.
In this mood-stabilising approach, the side effects seen at high antimanic doses, such as tremor, should be eliminated (128). Driving-simulator trials have shown that in the range from 0.5 to 0.8 mmol/l, lithium has no sedative effect on the psychological functions relevant to car-driving behaviour.
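The three concentration windows above can be expressed as a simple lookup. This only illustrates the ranges quoted in the text. The dictionary keys and function names are hypothetical. Treating the boundaries as inclusive is our assumption, and it makes 0.5 and 0.8 mmol/l belong to two windows.

```python
from typing import List

# Serum lithium windows (mmol/l) as quoted in the text; boundaries treated as inclusive.
LITHIUM_WINDOWS = {
    "acute antimanic": (0.8, 1.2),
    "antidepressant augmentation": (0.3, 0.5),
    "long-term mood stabilisation": (0.5, 0.8),
}

def in_window(indication: str, level: float) -> bool:
    """True if a measured level (mmol/l) lies inside the window for the indication."""
    low, high = LITHIUM_WINDOWS[indication]
    return low <= level <= high

def windows_for(level: float) -> List[str]:
    """All indications whose window contains the level (0.5 and 0.8 fall in two)."""
    return [name for name, (low, high) in LITHIUM_WINDOWS.items() if low <= level <= high]

print(in_window("long-term mood stabilisation", 0.6))  # True
print(windows_for(0.5))  # ['antidepressant augmentation', 'long-term mood stabilisation']
```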
Very few reports have been published on quality of life in long-term lithium therapy using typical quality of life questionnaires such as the SF-36 or WHO-5. However, in instruments assessing quality of life, suicidal thoughts are often used to mark the lowest possible level of quality of life (‘life is not worth living’). Evidence has accumulated showing that lithium is the most effective antisuicidal medication in psychopharmacology (118, 121).
Figure 5.13 The pharmacopsychometric triangle for lithium
A. Clinical effect / dose of lithium, mmol/l (118): antimanic 0.8–1.2; antidepressive 0.3–0.5; mood stabilising 0.5–0.8
B. Side effects: non-sedative profile in simulated car driving (128)
C. Quality of life: antisuicidal effect (118)
Combination of antidepressants
In placebo-controlled trials we focus on the response to a single antidepressant medication in order to identify its effect size against placebo. We have rather few trials studying the effect of combining two antidepressants. Combination treatment is nevertheless common in daily clinical practice when a patient has not responded to the first drug. In that case the usual approach is to maintain treatment with the first drug and then add another drug to obtain remission. The landmark study of this augmentation approach is the STAR*D study (129). This study has recently been re-analysed using the pharmacopsychometric triangle as outcome, i.e., with the HAM-D6 as the criterion for a pure antidepressive effect (130). Using this valid subscale for antidepressive effect, we could demonstrate that in patients not responding to citalopram, augmentation with bupropion was superior (P = 0.03) to augmentation with buspirone (130).
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
6 The clinical consequence of IRT analyses: Health-related quality of life
Among the many foreigners who visited Wundt’s laboratory in Leipzig in the 1880s was the physician and psychologist William James (1842–1910). He attended one of Wundt’s lectures in November 1882 and was also shown the laboratory.
James must have sat in the auditorium and listened to Wundt’s lecture together with, among others, Kraepelin, who was the only physician among Wundt’s students. On that November day, Kraepelin was very preoccupied with his experiments in the psychological laboratory. We have no certain knowledge of an encounter between the two physicians at Wundt’s laboratory, but James spoke German and they probably exchanged a few words.
In his capacity as a physician, James had set up a physiological laboratory at Harvard University in 1875. Not until the beginning of 1884 did it become a psychological laboratory modelled on Wundt’s. In 1889 James was called to a professorship in psychology, having already been appointed professor of philosophy at Harvard in 1885 (131). The year 1890 saw the publication of his main work, ‘The Principles of Psychology’, still considered the most significant publication within scientific psychology. However, James remained more of a philosopher than a psychologist and became increasingly absorbed in what we now call health-related quality of life (132).
In 1897, James published a collection of essays entitled ‘The Will to Believe’ (133). Among these essays was ‘Is Life Worth Living?’, now regarded as the ‘landmark’ publication in health-related quality of life. James took as his starting point the fact that human well-being is a subjective, emotional perception and should thus be measured psychometrically, not biologically. He did not himself attempt to develop questionnaires measuring quality of life.
In his ‘Talks to Teachers on Psychology: and to Students on Some of Life’s Ideals’, James refers to the following statement by Wilhelm Wundt at the turn of the 20th century:
And if I [Wundt] were asked what the work of experimental
observation in psychology has consisted of, and still consists of for
me, I should say that it has given me an entirely new idea of the
nature and connection of our inner processes … the close union of all
those psychic functions normally separated by artificial observations
and names, such as ideation, feeling, will; and I saw the inner
homogeneity, in all its phases of mental life
Quality of life might be the interconnection of feelings, will and well-being (134).
It was the first proper philosopher of the welfare state, Jeremy Bentham
(1748–1832), who was also the first to attempt to measure hedonia, i.e.,
subjective well-being. He felt that each citizen should be able to achieve his or
her optimal mental well-being within the economic and cultural boundaries
of society ( 135 ). He defined subjective well-being as the difference between
the sum of all kinds of pleasure and the sum of all kinds of pain experienced
by the individual in a given period of time, e.g., during a course of treatment
lasting for some weeks or a few months.
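Bentham's definition can be written compactly as below. The notation is ours, not Bentham's. It assumes that every pleasure $P_i$ and every pain $Q_j$ experienced during the period $\Delta t$ can be expressed on a common scale:

```latex
W_{\Delta t} \;=\; \sum_{i=1}^{m} P_i \;-\; \sum_{j=1}^{n} Q_j
```

A positive $W_{\Delta t}$ would then indicate that pleasure outweighed pain over the course of treatment.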
However, the scientific ‘landmark’ study was not performed until the end of the twentieth century. It used the MOS SF-36 questionnaire (Medical Outcomes Study, Short Form) (Figure 6.1), in which ‘36’ refers to the 36 questions in the questionnaire (4). As seen in Figure 6.1, the 36 SF-36 items make up eight subscales. The first subscale at the top of Figure 6.1 is Physical Functioning (PF), which contains the ten items listed under question 3 of the questionnaire (3a–3j). They deal with difficulty in coping with such physical activities as lifting or carrying groceries, climbing stairs and taking walks. These ten questions intuitively fulfil the item response theory model: persons unable to bathe or dress themselves (3j) are also unable to walk a distance of 100 metres (3i), and so on.
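The ordering argument can be checked by counting Guttman errors. A Guttman error is a pair of items where a respondent reports the rarer limitation but not the more common one. The sketch below assumes the items have been dichotomised (1 = limited a lot or a little, 0 = not limited). It also assumes that the order from most to least commonly reported limitation is known; in practice this order would be estimated from the data. The function name and the three-item ordering are illustrative assumptions.

```python
from itertools import combinations

def guttman_errors(responses, order):
    """Count Guttman errors for one respondent.

    responses: item -> 1 if the respondent reports a limitation, else 0.
    order: items from the most commonly reported limitation to the rarest.
    An error is a pair where the rarer limitation is reported but the more
    common one is not.
    """
    return sum(
        1
        for common, rare in combinations(order, 2)
        if responses[rare] == 1 and responses[common] == 0
    )

# Hypothetical ordering of three SF-36 physical functioning items (1 = limited).
order = ["3a", "3i", "3j"]  # vigorous activities, walking 100 m, bathing or dressing
print(guttman_errors({"3a": 1, "3i": 1, "3j": 1}, order))  # 0
print(guttman_errors({"3a": 1, "3i": 0, "3j": 1}, order))  # 1
```

The second respondent, limited in bathing but not in walking 100 m, breaks the expected order once. A scale whose respondents produce few such errors behaves as the text describes.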
The Role Physical scale (RP) in Figure 6.1 measures the impact of physical
health on daily activities. Bodily Pain (BP) (items 7 and 8) measures physical
pain. General Health (GH) measures physical health with items 1 and 11.
As seen in Figure 6.1, these four subscales cover physical health. The four subscales at the bottom of Figure 6.1 deal with mental health: Vitality (VT), Social Functioning (SF), Role Emotional (RE) and Mental Health (MH).
The items in the four subscales dealing with mental health are both positively and negatively phrased. As shown in Figure 6.1, item 9e (full of energy) and item 9d (calm and peaceful) are positively phrased and measure positive mental well-being. In contrast, item 10 (difficulty visiting relatives and friends), item 5c (didn’t do work as carefully as usual) and item 9f (downhearted and blue) are negatively phrased. These should be seen as measuring actual symptoms of poor mental health. Mixing positively and negatively phrased questions is an old psychometric device. Its purpose was to ensure that respondents had actually read the questions and were not just filling in replies mechanically, regardless of whether each question was negatively or positively phrased.
Figure 6.1 Scoring sheet for the SF-12 items from the SF-36. The two factors, physical versus mental health, are also indicated
Factor: Physical Health (PCS)
Physical Functioning (PF): SF-36 items 3a–3j; SF-12 items 3b (moderate physical activities) and 3d (walking up several flights of stairs)
Role Physical (RP): SF-36 items 4a–4d; SF-12 items 4b (accomplished less work than before) and 4c (limited in the kind of work)
Bodily Pain (BP): SF-36 items 7 and 8; SF-12 item 8 (difficulty working due to pain)
General Health (GH): SF-36 items 1 and 11a–11d; SF-12 item 1 (health all in all)
Factor: Mental Health (MCS)
Vitality (VT): SF-36 items 9a, 9e, 9g and 9i; SF-12 item 9e (full of energy)
Social Functioning (SF): SF-36 items 6 and 10; SF-12 item 10 (difficulty visiting relatives and friends)
Role Emotional (RE): SF-36 items 5a–5c; SF-12 items 5b (accomplished less work than before) and 5c (didn’t do work as carefully as usual)
Mental Health (MH): SF-36 items 9b, 9c, 9d, 9f and 9h; SF-12 items 9d (calm and peaceful) and 9f (downhearted and blue)
Figure 6.1 also shows the SF-12. The 36 questions of the SF-36 are often seen as rather many, although SF stands for Short Form! In recent years the SF-12 has become popular. Like the SF-36, it measures physical, mental and social quality of life. The subscales are converted to a 0–100 scale, where ‘0’ signifies the worst imaginable and ‘100’ the best imaginable quality of life (4).
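The conversion to a 0–100 scale is usually done by a linear transformation. The lowest possible raw subscale score is subtracted from the raw score, the result is divided by the possible range, and this is multiplied by 100. The sketch assumes the items have already been recoded so that a higher raw score means better health. The function name is hypothetical.

```python
def to_0_100(raw, lowest, highest):
    """Linear transformation of a raw subscale score to 0 (worst) - 100 (best)."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100

# Physical Functioning: 10 items scored 1-3 (3 = not limited), so raw range 10-30.
print(to_0_100(25, 10, 30))  # 75.0
```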
SF-36 population studies have been carried out in several countries, with Denmark playing a leading role (136). Figure 6.2 shows the original US population study; the results of the Danish population studies are quite similar.
The American study is an interesting landmark study in that it demonstrates how depressive patients differ from the general population. As can be seen in Figure 6.2, the depressive patients score lower on all subscales. On the four mental health subscales (MCS, Figure 6.1), the difference equals one standard deviation. The problem is that the degree of clinical depression is poorly defined in this study. As regards mental quality of life, ‘0’ indicates that life is not worth living, while ‘100’ signifies maximum positive well-being. This mental quality of life measure in the SF-36 is based on a precursor scale, the Psychological General Well-Being (PGWB) scale. The PGWB was the scale actually used in the scientific trials performed in the 1980s to assess the efficacy of medication in chronic diseases such as hypertension (4).
Figure 6.2 Results of an American general population study (modified) comparing persons with and without depression. (Ware JE, Gandek B and the IQoLA project group. Int J Ment Health 1994;23:49–73)
(Panel: an American general population study and a group of depressed patients from the primary care setting. Mean scores for the normal population and for depressive patients on a scale from 0, worst imaginable, to 100, best imaginable, for each subscale: PF = Physical Functioning, RP = Role Physical, BP = Bodily Pain, GH = General Health, VT = Vitality, SF = Social Functioning, RE = Role Emotional, MH = Mental Health.)
When the World Health Organization (WHO) was established in 1948, health was defined not only as the absence of symptoms of illness but also as physical, mental and social well-being. This is why the SF-36 is termed a health-related quality of life scale. Among its components, positive psychological well-being is probably the most general measure, as opposed to physical and social quality of life.
However, in the SF-36, the mental quality of life questions are both
negatively phrased (as when measuring depressive symptoms, e.g., feeling
blue) and positively phrased (as is the case when positive well-being is being
measured).
Both types of phrasing were used in many questionnaires in the early days of psychometrics, for two reasons. The first was to ensure that the person being interviewed actually read the questions thoroughly and did not just mechanically tick a certain response option regardless of its content. The second was to avoid what is called ‘social disability’. This situation may arise if only negatively phrased questions are asked and the person being interviewed makes him- or herself appear more ill than is really the case.
The WHO-5 Questionnaire
In an extensive analysis of Murray’s basic human needs and their hierarchical arrangement, Rasmussen concluded that the hedonic need might be considered a global index of measurement (137, 138). The WHO-5 can be considered such a general psychological well-being scale, measuring a global hedonic dimension. It is actually derived from the Psychological General Well-Being scale (139, 140). The WHO-5 is a questionnaire that measures current mental well-being (over the previous two weeks), and it is probably the most robust questionnaire from a psychometric point of view (141). Eudemonia is not the actual perception of well-being but rather some meaningful causal element lying behind hedonia, and attempts at measuring it are still inconclusive. When measuring positive quality of life, it is important to avoid symptom-related language and to use only positively phrased questions. Based on previous experience with the PGWB and the SF-36, the WHO-5 was developed as a measure of general positive quality of life.
Quantifying each item in terms of its presence during the past two weeks proved to be a highly sensitive indicator of positive well-being. Subsequently, five items were shown to be sufficient to cover the dimension from 0 to 100, where a higher score means a higher level of well-being. As each item is scored from 0 to 5 (see Figure 6.3), the theoretical raw score ranges from 0 to 25. Multiplying the raw score by 4 gives a theoretical range from 0 (worst imaginable quality of life) to 100 (best imaginable quality of life).
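The scoring rule just described takes only a few lines of code. The rule itself (five items scored 0–5, raw sum multiplied by 4) is the one given in the text. The function name and the input checks are our own.

```python
def who5_percentage(item_scores):
    """WHO-5: five items scored 0-5; the raw sum (0-25) multiplied by 4 gives 0-100."""
    if len(item_scores) != 5:
        raise ValueError("the WHO-5 has exactly five items")
    if any(s not in range(6) for s in item_scores):
        raise ValueError("each item must be an integer from 0 to 5")
    return sum(item_scores) * 4

print(who5_percentage([3, 3, 2, 2, 3]))  # 52
```

A result of 52 lies between the mean of about 30 seen in depressive patients and the population mean of about 70 reported below.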
A Danish population study showed a WHO-5 mean score of about 70 (142, 143). Table 6.1 shows a general practice study in which the WHO-5 mean is close to 70 in patients without symptoms of mental illness. It also shows that the WHO-5 mean score is about 30 in depressive patients. The means then rise progressively across the anxiety, somatoform and other minor mental disorders.
Speer has shown, using the PGWB, that after 6 weeks of treatment depressive patients may achieve a statistically significant increase in well-being. This level, however, is still significantly lower than that of the average population (144). The national norm is not reached until after 12 weeks of therapy. Figure 6.4 illustrates equivalent results with the WHO-5. Here, depressive patients score approximately 30 on the WHO-5 before treatment. After six weeks of therapy the WHO-5 score increases to 50, which is a statistically significant improvement (P < 0.01). The present-day goal of antidepressive therapy, however, is to attain the same WHO-5 score as the average population, i.e., about 70. Often this does not happen until after 12 weeks of therapy, as illustrated in Figure 6.4.
Figure 6.3 The WHO-5 scoring sheet
Over the past two weeks… (All of the time = 5; Most of the time = 4; More than half the time = 3; Less than half the time = 2; Some of the time = 1; At no time = 0)
1. I have felt cheerful and in good spirits
2. I have felt calm and relaxed
3. I have felt active and vigorous
4. I woke up feeling fresh and rested
5. My daily life has been filled with things that interest me
Total score (raw score of items 1 to 5) × 4 = __________
Another standardisation of the WHO-5 has been performed by Lucas et al (145). They used the WHOQOL-BREF item on general quality of life: ‘How would you rate your quality of life?’, answered as poor, neither poor nor good, or good. Persons with ‘poor’ quality of life had a WHO-5 mean (SD) score of 37.5 (21.4), persons answering ‘neither poor nor good’ had a mean score of 59.6 (20.8), and those answering ‘good’ had a mean score of 68.9 (16.2).
Table 6.1 Results of a WHO-5 study in the primary care setting (146)
ICD-10 diagnoses: WHO-5 mean (sd)
Not diagnosed with mental disorders (N = 1162): 66.27 (19.57)
Mental disorders (N = 358): 43.66 (21.96)
• Depressive disorder (N = 116): 31.91 (21.38)
• Anxiety disorders (N = 30): 45.07 (20.29)
• Somatoform disorders (N = 173): 48.86 (20.03)
• Other minor mental disorders (N = 39): 50.60 (19.20)
Figure 6.4 The goal of depression therapy is that the depressive person should obtain a WHO-5 result in line with that of the general population, i.e., around 70. As can be seen, this will only happen after 12 weeks of therapy
(Panel: the goal of treatment in depression using the WHO-5 is to reach the general population mean. WHO-5 score, 0–100, plotted against weeks of therapy, 1–12: about 30 at baseline, about 50 after six weeks, and the general population mean of about 70 at the 12-week endpoint.)
A review article by McDowell shows that the WHO-5 has high sensitivity and specificity as a screening instrument for depression (142). In the general practice setting, the WHO-5 has proved better than both the General Health Questionnaire (GHQ) and a specific depression questionnaire designed to screen for depression in this setting (142, 143).
The GHQ consists of a mixture of positively and negatively phrased questions. A study in patients with chronic non-malignant pain therefore used factor analysis to determine whether respondents were compliant when completing the GHQ. The question was whether they noticed the ‘reversed’ questions, i.e., positively versus negatively phrased ones. The raw item scores are factor-analysed: if respondents distinguish the two kinds of question, the negatively phrased items load on the first factor and the positively phrased items on the second. In the study in question it could be demonstrated that the respondents were able to differentiate between positively and negatively phrased questions (147).
Table 6.1 shows a study from the family doctor setting in which the WHO-5 mean score was approximately 66 in patients without a mental disorder (146). Patients with major depression had a mean score of approximately 32, whereas patients with anxiety disorders had higher WHO-5 means.
When Eysenck started using questionnaires instead of the Rorschach test
in the 1940s to assess personality variables, he was especially interested in
measuring the dimension of neuroticism.
Eysenck was actually testing Freud’s concept of this dimension. The
hypothesis was that this dimension was present to a mild degree in the normal
population, while increasing with growing neurotic behaviour in patients
suffering from anxiety neurosis. In these questionnaire studies, Eysenck demonstrated that it was more reliable to use negatively phrased items when measuring an illness-related dimension, such as neuroticism.
Figure 1.4 shows the nine items that delimit the dimension of neuroticism.
European and American clinical psychologists have attempted to achieve
consensus on the most important personality dimensions and have identified
a five-factor model ( 36 ). In this model, Eysenck’s dimension of neuroticism
is the most important. As may be seen in Figure 1.4, the psychic anxiety
symptoms, and not the somatic anxiety symptoms, constitute the dimension
of neuroticism.
In a Danish study that used the clinical diagnoses made by the Danish
professor of psychiatry Thorkild Vanggaard as index of validity (clinical
validity), it was found that only Eysenck’s dimension of neuroticism had
clinical validity compared to ten other personality scales ( 33 ).
7 The clinical consequences of IRT analyses: The concept of stress
When Hans Selye developed his theory of stress in 1936, he distinguished between stressors (strain), stress (the bodily reactions to such strain) and distress (the mental reactions to the strain) (132).
In Selye’s original stress model, post-traumatic stress disorder (PTSD) was not the focal point; the focus was the stress condition that develops during daily strain at work or at home. When delimiting these daily ‘life events’, psychometric researchers tried to include as many items as possible, working within the framework of classical psychometrics (4). Cronbach’s alpha coefficient was thus used as the statistical index. This coefficient expresses the degree of intercorrelation among the different daily life events. However, Cronbach’s alpha gives no indication as to whether each event provides additional information about the ‘stressor dimension’. Furthermore, the number of events enters the formula for alpha. As a result, alpha rises as items are added even when the average (positive) correlation between them stays the same. This led to a tendency to include at least 20 items in the various ‘life-event questionnaires’.
Studies performed by the Danish National Research Centre for the Working Environment have established that the six items shown in Figure 7.2 provide an adequate measure of work-related stressors (148).
Post-traumatic stress disorder
In post-traumatic stress disorder, a single stressor completely dominates the picture. Studies in American Vietnam veterans constitute the ‘landmark’ research in PTSD (149).
With DSM-III, PTSD became an official diagnosis, and it is also included in ICD-10. Apart from combat situations, the catastrophes most commonly encountered today are earthquakes, airplane crashes, road traffic accidents and rape, i.e., a single, completely unexpected and devastating event. It has gradually become apparent that a PTSD ‘distress’ reaction follows a clearly defined trajectory, which is illustrated in Figure 7.3.
The initial symptoms typically begin after a latency period of two weeks. If the person has a physical pain reaction, e.g., whiplash syndrome, all therapeutic attention is focused on pain relief (150). As a result, PTSD is often allowed to develop without relevant therapy being started. If these initial symptoms are left untreated, they become increasingly pronounced (sleep disturbances, nightmares, repetitive thoughts or memories of the violent event).
In about 15% of all PTSD sufferers, the condition may develop into a genuine depressive state (HAM-D6), with lowered mood (hopelessness), guilt feelings (negative view of the past), lack of initiative (negative view of the present situation), fatigue, and feeling subdued and inactive. If this depressive state is neglected, it may become chronic over the course of three to six months, with the addition of symptoms such as introversion, emptiness, alienation and occasional suicidal impulses. The behavioural theory of response to stimuli seems obvious in the PTSD situation, but the course of symptoms (HAM-D9 → HAM-D6 → HAM-D2) seems programmed a priori to follow a genetically determined pattern of behaviour. In other words, we possess an innate disposition to react with the A, B, C course of syndromes as collected in the HAM-D17 (see Figure 7.3 or Appendix 3a).
Significant factors for stress and well-being in the work place
• Influence (you are listened to)
• Meaning (daily work processes)
• Relevant information (planning work)
• Support (social network)
• Rewards (recognition)
• Demands (overtime)
Figure 7.2 The six factors in Selye’s stress model found to be significant. (Kristensen TS, Borg V and Hannerz H. Socioeconomic status and psychosocial work environment: results from a Danish national study. Scand J Public Health 2002; 30: 41–48, and Bech P et al. Work-related stress, depression and quality of life in Danish managers. Eur Psychiatry 2005 (suppl 3):S318–S325)
Hans Selye’s model of clinical stress
Stressors (stressing influences), e.g., too high demands, lack of influence, lack of social support
Stress (the physical reaction), e.g., high blood cortisol, hypermetabolism, hypertension
Distress (the mental reaction), e.g., irritability, psychic anxiety, depression
Figure 7.1 Hans Selye’s medical stress model
The work-related stress condition
The distress syndrome connected to work-related stress is largely identical to the distress syndrome described in Figure 7.3. However, its progression over time is less clearly described, probably because the literature on work-related stress conditions is very unsystematic. The scales used in these studies often make it difficult to determine whether an individual symptom or a syndrome has been measured. Furthermore, DSM-IV or ICD-10 depressions are often brought into the picture without apparent awareness of a fatal flaw in these diagnostic systems: it is an inherent rule of these systems that if the condition (syndrome) is pronounced enough to count as major depression, the stress model must be abandoned (110). Consequently, according to DSM-IV and ICD-10, chronic stress conditions are merely mild anxious/depressive states with too few symptoms for a diagnosis of major depression or of a genuine anxiety disorder.
The development of Post Traumatic Stress Disorder (PTSD) as measured by the ABC version of HAM-D17 (see Appendix 3a)
Months after traumatic event | Symptoms | ABC HAM-D17
1–2 months | Disturbed sleep, then other arousal (anxiety) symptoms such as sweating, dizziness, heart pounding | HAM-D9 (B items)
3–4 months | Depressed mood, tiredness, lack of interests, guilt feelings, psychic anxiety, slowed down | HAM-D6 (A items)
5–6 months | Feelings of emptiness, alienation, lack of insight, suicidal thoughts | HAM-D2 (C items)
Figure 7.3 Development over time of post-traumatic stress disorder (PTSD). Trauma (stressor): catastrophes, accidents, war, traffic accidents, rape
The most widely used questionnaire developed specifically to measure ‘distress’ within the medical illness model is the Symptom Checklist (SCL-90), developed at Johns Hopkins Hospital in Baltimore in the 1950s (4). It originally contained 41 items and has since been expanded to 90 items.
Cohen’s Self-perceived Stress Scale (stress questionnaire) has been used in
several Danish general population studies, and has a Mokken coefficient of
homogeneity of 0.44.
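The Mokken coefficient of homogeneity quoted above is Loevinger’s H: one minus the ratio of observed to expected Guttman errors, summed over all item pairs. The sketch below is a minimal illustration under stated assumptions. It treats items as dichotomous (0/1), whereas Cohen’s scale has graded response categories, and the function name `loevinger_h` and the toy responses are hypothetical. By the usual rule of thumb in Mokken scaling, an H of at least 0.3 indicates a scale, and a value between 0.4 and 0.5, such as 0.44, indicates a medium-strength scale.

```python
from itertools import combinations


def loevinger_h(data):
    """Loevinger/Mokken scalability coefficient H for dichotomous items.

    data: list of respondents, each a list of 0/1 item scores.
    Assumes at least one item pair with a non-zero expected error count.
    """
    n = len(data)
    k = len(data[0])
    # Proportion of respondents endorsing each item ("item popularity").
    p = [sum(row[i] for row in data) / n for i in range(k)]

    observed = 0.0  # Guttman errors actually seen in the data
    expected = 0.0  # Guttman errors expected if items were independent
    for i, j in combinations(range(k), 2):
        # Order the pair so that `easy` is the more frequently endorsed item.
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # A Guttman error: endorsing the harder item but not the easier one.
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected


# Hypothetical toy data: a perfect Guttman pattern plus one inconsistent respondent.
responses = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],  # endorses the second item without the first: one Guttman error
]
print(round(loevinger_h(responses), 2))  # 0.5
```

Without the last respondent the four rows form a perfect Guttman pattern and H equals 1. Adding that single inconsistent respondent brings H down to 0.5 in this tiny sample.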
Integration of Selye’s medical stress model
In 1936, the Austrian-born physician Hans Selye (1907–82) described the stress state he had observed as a general syndrome in patients with chronic somatic diseases. Selye continued his career in Canada, where his research led to the concept of ‘the biological stress syndrome’ (151, 152). According to Selye’s stress model, ‘stressors’ are the demands or strains that cause the stress condition (Figure 7.1).
Cohen’s stress questionnaire attempts to measure the subjective ‘stressors’ the patient has experienced during the preceding two weeks. Question 3 of this questionnaire asks how much of the time during this period the respondent has felt nervous and ‘stressed’.
According to Selye’s medical stress model, the actual stress condition is a biological phenomenon: the pathophysiological part of the medical disease model. From a scientific point of view, it is therefore very important to use the correct terminology. The demands that have led to the stress condition are called ‘stressors’ and are typically psychosocial factors (see, e.g., Figure 7.1), while the stress condition itself is, according to Selye, biologically defined. He believed that the higher levels of the adrenal cortex hormone cortisol produced under chronic pressure give rise to the biological stress reaction (151, 152). Selye demonstrated that when chronic stressors disturb the body’s normal biological regulating mechanisms (the actual stress reaction), the body attempts to regain a state of balance by increasing the production of cortisol in the adrenal cortex.
After Selye’s death, some have sought to introduce the term ‘allostasis’ to describe a stressed organism’s attempt to achieve a new state of balance in the hormonal and nervous systems at the cost of increased cortisol production. Selye called slightly increased cortisol a ‘tolerance hormone’. Throughout the ages, women in particular have had to manage life at a higher level of cortisol, which is why they develop the unhealthy stress condition that Selye called ‘distress’ faster than men.
Selye’s final work, ‘Stress without Distress’, was translated into Danish under a title equivalent to ‘Stress without Anxiety’ (151, 152). Today, one would rather translate ‘distress’ as ‘depression’. As early as 1913, the renowned neurosurgeon H.W. Cushing (1869–1939) described a disorder in which a tumour causes cortisol production to increase gradually over many months (Cushing’s disease) (132). At the onset of the disease these patients are completely free from stressors, yet mental symptoms appear before the physical ones and develop over some months: anxiety, fatigue, sleep disturbances, concentration difficulties, despondency and lowered mood. If these symptoms are disregarded and the tumour is not diagnosed, cortisol production increases as the tumour grows, and somatic symptoms such as hypertension, diabetes and cardiac disease appear, which can prove fatal for the Cushing patient. Thus, it is the increased production of cortisol seen in a stress condition that explains the mental stress symptoms (distress) of anxiety, sleep disturbances and depression. However, serum cortisol levels are difficult to measure, and the mental symptoms already appear at the early stage of the increase. Viewing cortisol as the crucial factor is therefore far too materialistic an approach; it is the mental manifestations that matter. As Hans Selye himself concludes, it is important for each individual to find his or her own level of stress without distress (151). According to Selye’s model, all humans are stressed, as any kind of productive labour has an impact on cortisol production.
The American linguist Noam Chomsky views the body–mind discussion as a minor issue, arguing that we only think we comprehend the nature of a disease when we are able to describe it in biological terms (153). When mental symptoms enter the picture, as in distress, we call cortisol a ‘distress hormone’; in lower concentrations, we call it the ‘tolerance hormone’.
In clinical psychometrics (clinimetrics), we measure mental manifestations within the psychometric frame of reference. In connection with Selye’s stress model, we therefore measure distress through questionnaires. Both anxiety and depression questionnaires are used to measure distress. The Anxiety Symptom Scale (ASS) (see Appendix 5b) is recommended when screening for anxiety symptoms, while the Major Depression Inventory (MDI) (see Appendix 4a) is recommended when screening for depression.
The connection between depression and anxiety in the measurement of ‘distress’ is best illustrated by Beck’s cognitive model of depression (Figure 7.4) and Spielberger’s cognitive appraisal model of anxiety (Figure 7.5). In Figure 7.4, Beck’s negative triad is related to the corresponding symptoms in Hamilton’s depression scale (HAM-D). In Figure 7.5, Spielberger’s model of mental versus somatic anxiety is related to the corresponding symptoms in the HAM-D.
Grinker et al conducted a very comprehensive study covering the period 1956–60, i.e., before the release of the HAM-D or the BDI, in which they applied many relevant depression rating scales. Using factor analysis both without and with rotation, the authors identified what they considered to be a rather limited number of factors (154). For the quantification of depressive states, they found that the core items of subjective depression include:
Hopelessness
Helplessness
Worthlessness or guilt feelings
To their surprise, Grinker et al also identified anxiety as a core item of depression (154). Moreover, they regarded psychomotor retardation and tiredness as behavioural core items. The negative triad of depression, i.e., the negative bias in the depressed person’s information processing, has been considered the endophenotype, or deep phenotype, of depressive states. The extended HAM-D17 version includes more specific items in this respect, namely items for hopelessness, helplessness, and worthlessness or guilt (see Appendix 3b).
Negative view of the future (hopelessness) [HAM-D item 1] [BDI6 item 1]
Negative view of the present (helplessness) [HAM-D item 7] [BDI6 item 15]
Negative view of the past (guilt feelings) [HAM-D item 2] [BDI6 item 5]
The HAM-D items of depressed mood (Item 1), of guilt (Item 2), and of work and interests (Item 7) are the three angles in the negative triangle (triad).
Figure 7.4 Beck’s Negative Triad of Depression
For the ‘allostasis’ condition, i.e., the state in which long-term (or chronic) stressors are present, Eysenck’s neuroticism scale is typically used (Figure 1.4). The fact that women score significantly higher than men on Eysenck’s neuroticism scale, in both general population studies and clinical studies, gives food for thought. Perhaps the ‘villain’ here is the ‘politeness hormone’, or rather the ‘neuroticism hormone’: cortisol.
Spielberger’s cognitive appraisal model of anxiety (5), of which anxious mood is most valid, corresponding to Item 10 (psychic anxiety) in the HAM-D, while the somatic symptoms are contained in HAM-D Item 11
Subjective feelings of anxious mood (HAM-D Item 10)
• Nervousness
• Tension
• Worry
• Apprehension
• Fearfulness (panic)
Activation (arousal) of the nervous system (HAM-D Item 11)
• Nausea or upset stomach
• Sweating
• Dizziness
• Heart pounding
• Trembling
Figure 7.5 Spielberger’s cognitive appraisal model of anxiety
8 Questionnaires as ‘blood tests’
Many doctors have wished for a questionnaire that measures depression or anxiety in the same way as a blood sample measures a patient’s metabolism or cholesterol level. The results of such a blood test come back from the laboratory with the current value and, in brackets, the normal range for that test.
These normal ranges are derived from blood sample results in a representative sample of citizens.
Population studies in depression and anxiety
We have examined various representative samples with questionnaires assessing anxiety, depression and well-being. Over the last 10–15 years, the depression questionnaire has become the most relevant ‘blood test’ in general practice. It is used to identify true depression, depression secondary to medical disorders (as in patients with chronic pain, cancer or diabetes), and depression in people with psychosocial burdens. We have undertaken three population studies using the Major Depression Inventory (MDI). The MDI questionnaire is shown in the appendix. It includes the depressive symptoms of both the DSM-IV and the ICD-10. Using the algorithms given in these diagnostic systems, the MDI responses make it possible both to diagnose depression and to express its severity as the MDI total score. Because the MDI total score is a sufficient statistic (Rasch analysis), it can be used to measure the severity of both primary and secondary depression (155).
Table 8.1 lists the results of our three general population investigations. The first was performed in 1999 and has so far been published only in a PhD thesis by Vibeke Nørholm. Her topic was quality of life in patients with schizophrenia, which she measured with the voluminous WHOQoL questionnaires (141).
Table 8.1 The results of Danish population studies from 1999, 2000 and 2003 using the Major Depression Inventory questionnaire
Diagnosis | 1999 all (N=1078) | 1999 females (N=566) | 1999 males (N=512) | 2000 all (N=1141) | 2000 females (N=610) | 2000 males (N=531) | 2003 all (N=1867) | 2003 females (N=988) | 2003 males (N=879)
DSM-IV major depression | 3.2% | 4.2% | 2.1% | 3.4% | 3.8% | 3.0% | 2.6% | 3.2% | 2.1%
ICD-10 depression | 3.6% | 4.2% | 2.9% | 4.2% | 5.1% | 3.2% | 2.8% | 3.5% | 2.1%
MDI > 20 (mild depression) | 6.6% | 8.1% | 4.9% | 7.7% | 9.5% | 5.6% | 6.2% | 7.7% | 4.7%
MDI > 25 (moderate depression) | 4.1% | 5.1% | 2.9% | 4.2% | 5.2% | 3.0% | 3.7% | 5.0% | 2.4%
Response rate | 67.1% | | | 51% | | | 68% | |
Table 8.1 shows that, in the 1999 Danish general population sample, the prevalence of depression was 3.2% for DSM-IV major depression and 3.6% for ICD-10 depression. An MDI score of 25 or more (corresponding to a HAM-D17 score of 18 or more) gives a prevalence of about 4% in the population. According to WHO estimates from different parts of the world, the prevalence lies between 3 and 5%.
In 2000, we conducted a population survey in connection with Lis Raabæk Olsen’s PhD thesis (156). The result was again a 3–4% prevalence of depression in the general population, depending on the method used (DSM-IV, ICD-10, or MDI total score). In 2003, we undertook a new population study together with Dr. Odont. Erik Friis-Hasché, whose field of interest was fear of dentists. Approximately one third of the persons in this study had a marked fear of dentists (see http://www.ncbi.nlm.nih.gov/pubmed/7725561). Once more, the prevalence of depression in the general population was between 3 and 4%. Apart from the year 2000 sample, Table 8.1 shows a greater prevalence of depression in women than in men.
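The ‘all’ columns of Table 8.1 are, apart from rounding, the female and male prevalences weighted by the number of respondents of each sex. A minimal sketch of that arithmetic (the function name is hypothetical) reproduces two of the printed totals:

```python
def pooled_prevalence(groups):
    """Combine subgroup prevalences (in %) weighted by subgroup size.

    groups: list of (prevalence_percent, n) tuples, e.g. females and males.
    """
    total_n = sum(n for _, n in groups)
    cases = sum(prev / 100 * n for prev, n in groups)  # estimated number of cases
    return 100 * cases / total_n


# 1999, DSM-IV major depression: females 4.2% (N=566), males 2.1% (N=512).
print(round(pooled_prevalence([(4.2, 566), (2.1, 512)]), 1))  # 3.2
# 2000, ICD-10 depression: females 5.1% (N=610), males 3.2% (N=531).
print(round(pooled_prevalence([(5.1, 610), (3.2, 531)]), 1))  # 4.2
```

The subgroup percentages are themselves rounded to one decimal, so recomputed totals can differ from the printed ones by 0.1 percentage point. For example, MDI > 20 in 2003 recomputes to 6.3%, against 6.2% in the table.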
The family doctor diagnoses hypertension when the systolic blood pressure is ≥ 140 mm Hg and/or the diastolic blood pressure is ≥ 90 mm Hg. Using the MDI in the same way, the doctor can diagnose treatment-requiring depression or DSM-IV major depression when the MDI score is higher than 25. To continue the analogy, the family doctor then determines whether it is a question of primary or secondary hypertension, or of primary or secondary depression. The DSM-IV and ICD-10 simply presuppose that the symptoms of major depression are the same in primary depression (e.g., bipolar or unipolar depression) and in secondary depression (depression due to somatic illness or a stress condition). Scientific research, by demonstrating transferability, has shown that the HAM-D6 and the MDI do measure the same depressive condition in both primary and secondary depression. This is why the HAM-D6 or the MDI can be used when screening for depression.
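For readers who want to automate this ‘blood test’ reading, a minimal sketch of MDI total scoring follows. It assumes the usual MDI layout: ten items, each scored 0–5, where items 8 and 10 each have an a and a b part and only the higher of the two counts, giving a total of 0–50. It also assumes the severity thresholds printed in Table 8.1 (MDI > 20 mild, MDI > 25 moderate). The item labels and function names are hypothetical and should be checked against the questionnaire in the appendix. The sketch does not implement the DSM-IV or ICD-10 diagnostic algorithms.

```python
def mdi_total(answers):
    """Total MDI score (0-50) from a dict of item scores, each 0-5.

    Assumed keys: '1'-'7', '8a', '8b', '9', '10a', '10b'.
    For items 8 and 10 only the higher of the a/b answers is counted.
    """
    for key, value in answers.items():
        if not 0 <= value <= 5:
            raise ValueError(f"item {key} must be scored 0-5, got {value}")
    simple = [answers[str(i)] for i in (1, 2, 3, 4, 5, 6, 7, 9)]
    paired = [max(answers["8a"], answers["8b"]), max(answers["10a"], answers["10b"])]
    return sum(simple) + sum(paired)


def mdi_severity(total):
    """Severity bands taken from the row labels of Table 8.1."""
    if total > 25:
        return "moderate depression (MDI > 25)"
    if total > 20:
        return "mild depression (MDI > 20)"
    return "below the mild-depression threshold"


# Hypothetical respondent.
example = {str(i): 3 for i in range(1, 8)}
example.update({"8a": 2, "8b": 4, "9": 3, "10a": 1, "10b": 0})
score = mdi_total(example)
print(score, mdi_severity(score))  # 29 moderate depression (MDI > 25)
```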
In connection with the 2003 population study, anxiety was also measured, using the Spielberger Anxiety Scale. We found that 7.5% of the general population had a clinical anxiety condition (147).
Spielberger’s Anxiety Scale consists of a State scale (measuring present state anxiety) and a Trait scale (measuring personality propensity to anxiety). The State scale measures only psychic anxiety symptoms. The Trait (personality) scale is a mixture of anxiety-related and depression-related tendencies, though still with particular focus on bodily manifestations of anxiety. However, results on the Trait scale are very difficult to interpret, and Eysenck’s neuroticism scale (Figure 1.4) is the more valid measure of this propensity.
Spielberger’s State Anxiety scale consists of 20 items; ten of these are
negatively phrased (symptom orientation), while the remaining ten items are
positively phrased (well-being orientation).
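Because ten of the twenty State Anxiety items are phrased in the well-being direction, those items have to be reverse-scored before the total is formed. The following is a minimal sketch, assuming the usual 1–4 response format (total range 20–80). Which items are positively phrased is passed in as a parameter rather than asserted here, and the function name is hypothetical. A respondent who answers ‘1’ to every item lands in the middle of the range, which shows how the mixed phrasing can expose uniform test-taking behaviour.

```python
def stai_state_total(responses, positive_items):
    """Sum STAI State ratings after reversing the positively phrased items.

    responses: dict mapping item number (1-20) to a rating of 1-4.
    positive_items: set of item numbers phrased in the well-being direction;
    these are reversed (1<->4, 2<->3) so that a high total means more anxiety.
    """
    total = 0
    for item, rating in responses.items():
        if not 1 <= rating <= 4:
            raise ValueError(f"item {item}: rating must be 1-4, got {rating}")
        total += 5 - rating if item in positive_items else rating
    return total


# Hypothetical respondent answering '1' to every item; items 1-10 are
# assumed to be the positively phrased ones purely for illustration.
answers = {item: 1 for item in range(1, 21)}
print(stai_state_total(answers, positive_items=set(range(1, 11))))  # 50
```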
A factor analysis of Spielberger’s State Anxiety scale yields several factors despite a very high Cronbach’s alpha coefficient (between 0.82 and 0.96). However, these are method factors, not true factors that provide new insight (157). The two most significant factors merely show that the items describing symptoms (negatively phrased items) have positive loadings, while the items describing well-being have negative loadings (157, 160). This methodological issue is used as a measure of test-taking behaviour (147).
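As noted earlier for the life-event questionnaires, the number of items is part of the formula for Cronbach’s alpha. A 20-item scale such as the State Anxiety scale can therefore reach a high alpha even when its items form more than one factor. The minimal sketch below uses hypothetical function names. It shows the raw-score formula and the standardised form alpha = k·r / (1 + (k − 1)·r), where r is the average inter-item correlation; the standardised form shows alpha rising with the number of items k while r stays the same.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from raw scores.

    items: list of k lists, each holding one item's scores for the same respondents.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def standardised_alpha(k, mean_r):
    """Alpha implied by k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Same modest average correlation (0.2), two different test lengths.
for k in (6, 20):
    print(k, round(standardised_alpha(k, 0.2), 2))
# 6 0.6
# 20 0.83
```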
When a questionnaire dealing directly with social functioning is required, Sheehan’s Disability Scale is applicable.
The ability of the WHO-5 (with a cut-off score below 50) to detect depression in elderly diabetic patients was found to be quite acceptable. Using DSM-IV major depression as the index of validity, this study obtained a sensitivity of 100% and a specificity of 78% (158). In adolescents with diabetes (aged 13 to 17 years), the WHO-5 with a cut-off below 50 obtained a sensitivity of 89% and a specificity of 86%, using the Center for Epidemiologic Studies Depression Scale (CES-D) as the index of depression (159).
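Sensitivity and specificity in these studies come from a 2 × 2 table of screening result against the index diagnosis. The following is a minimal sketch, assuming the standard WHO-5 scoring (five items rated 0–5, with the raw sum multiplied by 4 to give 0–100) and the cut-off below 50 used above. The patient records are invented purely to illustrate the arithmetic and do not reproduce the cited studies. The function names are hypothetical.

```python
def who5_percent(item_scores):
    """WHO-5 percentage score: raw sum of five 0-5 items, times 4 (range 0-100)."""
    if len(item_scores) != 5 or not all(0 <= s <= 5 for s in item_scores):
        raise ValueError("WHO-5 needs five item scores between 0 and 5")
    return sum(item_scores) * 4


def sensitivity_specificity(records, cutoff=50):
    """records: list of (who5_items, has_index_diagnosis) pairs.

    A screen is positive when the WHO-5 percentage score is below `cutoff`.
    """
    tp = fn = tn = fp = 0
    for items, depressed in records:
        positive = who5_percent(items) < cutoff
        if depressed:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: 2 depressed and 3 non-depressed patients.
patients = [
    ([1, 2, 1, 2, 1], True),   # 28 -> screen positive (true positive)
    ([2, 2, 2, 2, 2], True),   # 40 -> screen positive (true positive)
    ([4, 4, 3, 4, 4], False),  # 76 -> screen negative (true negative)
    ([3, 3, 3, 3, 3], False),  # 60 -> screen negative (true negative)
    ([2, 3, 2, 2, 2], False),  # 44 -> screen positive (false positive)
]
sens, spec = sensitivity_specificity(patients)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 100%, specificity 67%
```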
The predictive validity of the WHO-5
The predictive validity of the WHO-5 has recently been demonstrated in a Danish study in which patients with cardiac disorders were followed for six years (160). Patients who scored below 50 on the WHO-5 at the start of the study had a significantly higher mortality than those who scored above 50, as shown in Figure 8.1.
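The curves in Figure 8.1 are Kaplan-Meier survival estimates. Below is a minimal sketch of the product-limit estimator, assuming non-informative censoring. The follow-up times are invented and the function name is hypothetical. In a study like the one cited, one curve would be computed for the patients scoring below 50 on the WHO-5 and another for those scoring above 50.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time for each patient (e.g. years since discharge).
    events: 1 if the patient died at that time, 0 if censored (alive when
            follow-up ended). Returns (time, survival) at each time with deaths.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect every observation that shares this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, round(survival, 3)))
        # Deaths and censored cases both leave the risk set after time t.
        at_risk -= deaths + censored
    return curve


# Invented cohort: deaths at years 1, 2 and 4; one patient censored at year 3.
print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))  # [(1, 0.75), (2, 0.5), (4, 0.0)]
```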
Screening scales
There is a range of questionnaires aimed at screening for a condition rather than measuring it. From among these screening instruments, the following have been selected: the Mini Mental State Examination (MMSE) with the clock-drawing test, and the Anxiety Symptom Scale (ASS) (see Appendix 5b).
MMSE/Clock test
The Mini Mental State Examination is a screening instrument because it assesses only certain aspects of cognitive functioning. As a result, some persons may perform very well on the test, with scores between 25 and 30, and still be in the initial stage of dementia. Nor does the scale give a dependable description of the more pronounced dementia states at the other end of the score range, i.e., scores below 15.
However, the MMSE is the most frequently used cognitive screening scale worldwide, as it is easy to administer and has high reliability. As mentioned in connection with antidementia medication, it is also used to measure treatment effect over the course of treatment.
In the clock-drawing test, the subject is presented with a pre-drawn circle
and asked to fill in numbers so as to represent the face of a clock. Then the
hands have to be set at a given time, e.g., 13.50 hours. The test is quick and
easy to administer. However, it cannot be used as the sole test and must be
viewed as a supplement to the MMSE.
The predictive value of WHO-5 in patients with cardiac disorders. A survival analysis. (160)
[Kaplan-Meier curves for WHO-5 > 50% and WHO-5 < 50%; x-axis marked at 3 years and 6 years; y-axis from 0.2 to 1.0]
Figure 8.1 Predictive validity of the WHO-5 in a study on survival in patients with heart disease. The Kaplan-Meier curves demonstrate that in patients scoring above 50 on the WHO-5 at discharge from hospital, 20% die within 6 years, while in patients with a WHO-5 score below 50, 80% die within 6 years (160)
Anxiety Symptom Scale (ASS)
The ASS screening instrument provides a quick way of determining which kind of anxiety predominates in the subject (see Appendix 5b).
When measuring the current state of anxiety, Spielberger’s Anxiety Scale (STAI) can be used. If a clinical anxiety condition is established, the ASS scoring profile can be used to determine whether, besides a general state of anxiety (items 1 and 2), there is avoidance behaviour (item 3), anxiety attacks as in panic attacks (items 4 and 5), obsessional phenomena (items 6, 7 and 8) or post-traumatic anxiety (item 9). Item 10 gives an indication of the anxiety condition’s impact on social functioning. When the ASS is used as a screening instrument, a score of 3 or higher is the clinical threshold.
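The item-to-syndrome mapping described above lends itself to a simple profile report. The following is a minimal sketch, assuming that each ASS item is rated on an ordinal scale and that the clinical threshold of 3 is applied item by item. That reading of the threshold is an assumption and should be checked against Appendix 5b. The dictionary and function names are hypothetical.

```python
# Item groupings as described in the text (items numbered 1-10).
ASS_PROFILE = {
    "general anxiety state": [1, 2],
    "avoidance behaviour": [3],
    "panic attacks": [4, 5],
    "obsessional phenomena": [6, 7, 8],
    "post-traumatic anxiety": [9],
    "impact on social functioning": [10],
}


def ass_profile(scores, threshold=3):
    """Return the profile domains in which any item reaches the threshold.

    scores: dict mapping ASS item number (1-10) to its rating.
    Assumption: the clinical threshold of 3 is applied to single items.
    """
    return [
        domain
        for domain, items in ASS_PROFILE.items()
        if any(scores.get(item, 0) >= threshold for item in items)
    ]


# Hypothetical respondent with marked general anxiety and panic attacks.
ratings = {1: 3, 2: 2, 3: 1, 4: 4, 5: 3, 6: 0, 7: 1, 8: 0, 9: 2, 10: 3}
print(ass_profile(ratings))
# ['general anxiety state', 'panic attacks', 'impact on social functioning']
```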
9 Summary and perspectives
Clinical psychometrics has developed into a discipline within clinical psychiatry of similar importance to genetics, epidemiology or pharmacology. Psychometrics was originally a discipline within psychology, established at Wundt’s psychological laboratory about a century ago. There, Kraepelin learnt how to measure his subjects’ mental manifestations under standardised conditions, e.g., the dose–response relationship of alcohol to reaction times. When he went on to a career in clinical psychiatry, Kraepelin continued his ‘laboratory-like’ assessments of his patients’ symptoms over time by recording them on his ‘symptom checklist’. On this basis, he was able to delimit the course of illness in both schizophrenic and manic-depressive patients.
At the beginning of the 20th century, Kraepelin attempted to establish a discipline that he named pharmacopsychology, in the hope that psychoactive drugs with the desired effect on schizophrenia or manic-depressive disorder would appear. However, this only came to pass at the beginning of the 1950s.
Attempts have been made to test scientifically the rating scales developed since the 1950s, using the classical psychometric method that the psychologist Spearman developed during his studies at Wundt’s psychological laboratory: the factor analytic method.
Table 9.1 gives an overview of the questionnaire (Eysenck’s personality scale) and the rating scales (HAM-A, HAM-D, BPRS) that were developed in the 1950s and tested using Spearman’s factor analysis.
The British tradition used the two-factor model introduced by Spearman in his research on intelligence measurement. In Spearman’s research, the first factor was a general factor and the second a dual or ‘bi-directional’ factor. This indicates that, within the general factor, two subgroups with opposite signs can be isolated, as the symptoms within each subgroup have the highest inter-correlations.
In modern psychometrics, however, factor analysis has faded into the background. Principal component analysis is included here as an example of the factor analytic method that survives in clinical psychometrics. Here it is the bi-directional factor 2 that is of interest, since it identifies symptom patterns relevant to classification issues.
Factor 1 is the general factor that is presumed to measure the degree of severity, as it reflects the fact that all of the selected symptoms are more or less positively correlated. However, this correlation is already mirrored in Cronbach’s coefficient alpha and is not an argument for adding up the symptoms. To justify a total score, item response theory analysis must be used.
If factor 2 indicates a clinically meaningful symptom pattern through its positive versus negative factor loadings, factor rotation is quite unnecessary. When interpreting the symptom pattern in factor 2, all loadings must be taken into account, not only those conventionally regarded as significant (e.g., not just loadings of 0.30 or greater).
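The pattern of a general factor 1 followed by a bi-directional factor 2 can be reproduced from a correlation matrix by principal component analysis. The minimal sketch below uses NumPy’s symmetric eigen-decomposition on an invented correlation matrix for two three-item symptom clusters (within-cluster r = 0.6, between-cluster r = 0.3). Loadings are eigenvectors scaled by the square root of their eigenvalues, and the sign of each component is arbitrary. All six loadings on component 1 share the same sign, which corresponds to the general factor. Component 2 splits the items into two subgroups with opposite signs, which corresponds to the bi-directional factor read without rotation. The function name is hypothetical.

```python
import numpy as np


def principal_loadings(corr, n_components=2):
    """Unrotated principal component loadings from a correlation matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]  # largest first
    loadings = eigenvectors[:, order] * np.sqrt(eigenvalues[order])
    # Eigenvector signs are arbitrary; flip components whose loadings sum to < 0.
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return eigenvalues[order], loadings * signs


# Invented correlations: items 0-2 (say, depressive symptoms) and items 3-5
# (say, anxiety symptoms) correlate 0.6 within and 0.3 between the clusters.
corr = np.full((6, 6), 0.3)
corr[:3, :3] = 0.6
corr[3:, 3:] = 0.6
np.fill_diagonal(corr, 1.0)

eigenvalues, loadings = principal_loadings(corr)
print(np.round(eigenvalues, 2))  # [3.1 1.3]
print(np.round(loadings, 2))     # column 1: 0.72 for every item; column 2: +/-0.47 by cluster
```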
However, pharmacopsychometrics has discarded classical psychometrics (factor analysis), as the general factor cannot test transferability. Transferability means that a rating scale measures the same phenomenon or the same dimension in different groups of patients (men versus women, younger versus older age groups, primary versus secondary depression), or in the same group of patients when the scale is used in weekly assessments during antidepressive therapy. Modern psychometrics has been able to demonstrate transferability by using item response theory models.
Table 9.1 A schematic review of the British factor analytic tradition, focusing on the general versus the dual factor.
Study | General factor | Dual factor
Spearman 1927 (17) | General intelligence factor | Linguistic versus mathematical intelligence
Eysenck 1953 (31) | General neuroticism factor (EPQ-N) | Extraversion versus introversion
Hamilton 1959 (38) (Appendix 5a) | General anxiety factor (HAM-A14) | Psychic versus somatic anxiety
Hamilton 1960 (39) (Appendix 3a) | General depression factor (HAM-D17) | Depression versus anxiety
Overall 1962 (44, 45) (Appendix 7) | General psychoticism factor (BPRS18) | Schizophrenicity versus depression
Bech et al 2010 (161) (Appendix 3e and 3h) | General distress factor (SCL-92) | Depression versus anxiety
These models were constructed precisely because factor analysis was
unable to measure transferability, no matter how many times the different
factors were rotated in accordance with the American tradition.
In the pharmacopsychometric triangle, the transferability requirement (that the total score on a rating scale for desired clinical effect is a sufficient statistic) is important, because the magnitude of pharmacological effect is measured in variation units, which makes it independent of the scoring system of the rating scale.
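The text does not spell out how the ‘variation unit’ is computed. One common way to express an effect in units of variation is the standardised mean difference (Cohen’s d): the difference between two group means divided by their pooled standard deviation. Assuming, for illustration only, that a measure of this kind is meant, the sketch below shows why it does not depend on the scoring system: multiplying every score by a constant leaves d unchanged. The HAM-D6 reductions are invented, and `cohens_d` is a hypothetical helper name.

```python
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Invented HAM-D6 reductions from baseline after treatment.
drug = [8, 6, 7, 9, 5, 7]
placebo = [7, 5, 6, 4, 8, 6]
d = cohens_d(drug, placebo)

# The same data on a scoring system with twice the range gives the same effect size.
d_rescaled = cohens_d([2 * x for x in drug], [2 * x for x in placebo])
print(round(d, 2), round(d_rescaled, 2))  # 0.71 0.71
```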
A group of approximately six symptoms has proved to be a sufficient
measure of desired clinical effect. When considering the second angle of the
pharmacopsychometric triangle, any unwanted effects of the drug, a separate
analysis of each side effect symptom is often necessary. Use of the item
response theory model has shown that the third angle in the pharmacopsy-
chometric triangle, subjective quality of life, can be measured with relatively
few items, e.g., the WHO-5.
The pharmacopsychometric triangle provides an easily grasped overview
of the importance of a drug in clinical psychiatry.
Table 9.1 shows the rating scales most widely used worldwide. Apart from the SCL-90, these scales can be found in the Appendices. The Danish SCL-92 (as well as many of the other scales) is available in an electronic version at www.psykforskhil.dk.
Figure 9.1 illustrates the issue in depression called ‘the one and the many’. The standardisation introduced with the diagnoses of ‘major’ versus ‘minor’ depression is rooted in the Hamilton Depression Scale, which provides a common ground: ‘the one’. However, depression also appears in many forms (‘the many’). These include primary depression (when no definite cause can be established) and secondary depression, which emerges after stress (burden) or after medical conditions (postnatal depression, post-stroke depression, etc.). These subtypes are numbered with Roman numerals in Figure 9.1, together with the corresponding therapy according to Lichtenberg and Belmaker (162).
Among international collections of rating scales, the book by Lam et al can be recommended (163). This work notes that rating scales (assessment scales or questionnaires) are widely used in scientific research but still only to a minor extent in daily clinical work, even though electronic patient records encourage such use. Perhaps the use of rating scales will only become a requirement in daily clinical work with the introduction of DSM-V or ICD-11.
Lam et al discuss two approaches to evaluating the course of antidepressive drug treatment, personified by two physicians, Dr Scales and Dr Gestalt. When assessing their patients, Dr Scales uses the HAM-D, while Dr Gestalt uses a general question (‘are you feeling better or are you not feeling better today?’). This difference in approach is illustrated in Figure 9.2 (adapted from Lam et al).
Before treatment (week 0 in Figure 9.2), both Dr Scales and Dr Gestalt have diagnosed moderate depression according to ICD-10. Dr Scales has also established his patient’s score on the Hamilton Depression Scale (HAM-D17): a total score of 24.
Major depression sub-types | HAM-D17 mean (sd) | Treatment
Primary depression (melancholia)
I Psychotic depression | 30 (6) | ECT, TCA
II Bipolar depression | 24 (5) | Mood stabilisers, SSRI
III Unipolar depression | 24 (5) | SSRI, SNRI, TCA
IV Atypical depression | 21 (5) | MAO-I
Secondary (to stress) depression
V Stress-adjustment disorder with depression and anxiety | 18 (4) | Stress-reducing exercises
VI Depression after childhood trauma | 18 (4) | Cognitive therapy
VII Depressive reaction to stress in connection with separation | 18 (4) | Psycho-social intervention
Secondary (to somatic illness) depression
VIII Post-natal depression | 18 (5) | Cognitive therapy/SSRI
IX Age-related depression (post-stroke) | 18 (5) | SSRI
X Substance abuse disorder | 18 (5) | Treatment of underlying disorder
Less than major depression sub-types
XI Dysthymia (depressive neurosis) | 14 (3) | Cognitive therapy/SSRI
XII PTSD | | Stress-reducing exercises
XIII Other stress-related neuroses | | Cognitive therapy/SSRI
ECT = electroconvulsive therapy; TCA = tricyclic antidepressants; SSRI = selective serotonin reuptake inhibitors; SNRI = serotonin-noradrenaline reuptake inhibitors; MAO-I = monoamine oxidase inhibitors
Figure 9.1 Subtypes of depression, modified from Lichtenberg and Belmaker (162)
Dr Scales and Dr Gestalt agree to start a course of antidepressive
medication at a dosage of 20 mg during the first week of therapy.
After the first week of therapy, Dr Gestalt asks how the patient feels and, when the answer is that there is no improvement, increases the dosage to 30 mg. Dr Scales informs the patient that the HAM-D17 has now decreased to 20, which means that the dosage should not be altered.
After the second week of therapy, Dr Gestalt enquires how the patient is doing and, when the answer is ‘largely unchanged’, increases the dosage to 40 mg. Dr Scales informs the patient that the HAM-D17 has decreased to 14, which means that the dosage should not be altered.
After the third week of therapy, Dr Gestalt enquires how the patient is doing and, as the reply is still ‘by and large the same’, increases the dosage to 60 mg (the maximum dosage). Dr Scales informs the patient that the HAM-D17 is now 12, half of the original score, that they are on the right track and that the dosage should remain unchanged.
After the fourth week of therapy, the patient informs Dr Gestalt that the side effects (heavy perspiration, inner unrest and headache) are such a burden that his family feels that the medication should be stopped. Dr Scales informs the patient that the HAM-D17 is now 8 and that remission (absence of symptoms) is within reach.
After the fifth week of therapy, Dr Scales can announce that the HAM-D17 has fallen below the remission value of 7 and that continuation therapy can now commence.
[Figure 9.2: HAM-D17 score (0–24) against weeks of treatment (1–6), with response and remission levels marked. Dosages are shown as Dr Scales [Dr Gestalt]: Dr Scales stays at 20 mg throughout, while Dr Gestalt escalates to 30, 40 and 60 mg as his patient reports ‘No improvement’ and ‘Unchanged’, until treatment ‘Stops’.]
Figure 9.2 A course of treatment as conducted by Dr Scales versus Dr Gestalt. (Modified from Lam et al. Assessment scales in depression and anxiety. London: Taylor & Francis, 2006.)
[Figure 9.3: HAM-D-17 total score (severity of depression) against weeks of short-term therapy (0–8) and of medium-term therapy (up to 52 weeks), showing the sequence of early improvement (25%), response (50%), remission, relapse and recovery, based on John Rush's original model.]
Figure 9.3 Course of therapy in depressive patients with a HAM-D17 score of approximately 24 before the start of antidepressant therapy. (Reproduced from Bech P. Pharmacological treatment of depressive disorders: a review. In: Maj M, Sartorius N (eds). Depressive Disorders. WPA Series. Chichester: Wiley, 1999, pp 89–127. Reproduced with permission.)
The development in Figure 9.3 shows how to use an assessment scale during a course of treatment. When Dr Scales informs the patient that the continuing decrease in his HAM-D17 depression score is following the expected trajectory, this in itself has a calming influence on the patient. Because of his ‘holistic approach’, Dr Gestalt gives his patient's own assessment too much weight, resulting in a far too high dosage.
In the STAR*D study, the use of itemised symptom measures (Dr Scales) was found to reveal a 25 to 45% earlier reduction in baseline severity of depression than the global impression assessment (Dr Gestalt). According to Rush (164, 166): ‘Analogous to treating hypertension, “less hypertensive” is not a goal of treatment of hypertension. Nor should “less depressed” be the goal for our depressed patients…’.
Figure 9.3 shows the average curve for Dr Scales' depressive patients during treatment. The patients consult Dr Scales at time point 0, when the mean HAM-D17 is about 24. After four weeks of therapy the mean HAM-D17 is about 11, i.e., a reduction of about 50%. Internationally, a HAM-D17 reduction of 50% or more is used as an indication of ‘response’ to treatment. Two weeks of therapy typically gives a 25% reduction from the week 0 HAM-D17 score, and this is called ‘early improvement’. A score of 7 or less on the HAM-D17 is termed ‘remission’, i.e., a relative absence of symptoms. ‘Relapse’ occurs when remission has been obtained, only to be followed by an increase in the HAM-D17 to 16 or more. After an absence of symptoms for 52 weeks in the older age group and 26 weeks in the younger age group, it is highly likely that the patient is completely beyond the depressive phase, and Dr Scales can then end antidepressive therapy. The period between ‘remission’ and ‘recovery’ is termed maintenance therapy (Figure 9.3). If the patient has a history of depressive episodes, relapse prevention therapy should be offered.
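The cut-offs in the paragraph above can be written down as a small classification routine. The following is a minimal Python sketch, not a clinical tool: the function name `classify_course` and the week-to-score dictionary are hypothetical, early improvement is assumed to be assessed at week 2, and the thresholds (25% and 50% reduction from week 0, a score of 7 or less for remission, 16 or more after remission for relapse) are those given in the text.

```python
def classify_course(scores: dict[int, int]) -> dict[str, object]:
    """Classify a course of HAM-D17 totals keyed by treatment week (week 0 = baseline).

    Cut-offs follow the text: early improvement = at least 25% reduction by week 2,
    response = at least 50% reduction, remission = a total of 7 or less,
    relapse = a total of 16 or more at a later visit once remission has been reached.
    Assumes a baseline (week 0) score greater than zero.
    """
    baseline = scores[0]
    weeks = sorted(w for w in scores if w > 0)

    def reduction(week: int) -> float:
        # Proportional fall from the baseline score (0.5 means a 50% reduction).
        return (baseline - scores[week]) / baseline

    early_improvement = 2 in scores and reduction(2) >= 0.25
    response_week = next((w for w in weeks if reduction(w) >= 0.50), None)
    remission_week = next((w for w in weeks if scores[w] <= 7), None)
    relapse_week = None
    if remission_week is not None:
        relapse_week = next(
            (w for w in weeks if w > remission_week and scores[w] >= 16), None
        )
    return {
        "early_improvement": early_improvement,
        "response_week": response_week,
        "remission_week": remission_week,
        "relapse_week": relapse_week,
    }


# Hypothetical weekly totals modelled on Dr Scales' patient (the week 5 value is assumed).
course = {0: 24, 1: 20, 2: 14, 3: 12, 4: 8, 5: 6}
print(classify_course(course))
# → {'early_improvement': True, 'response_week': 3, 'remission_week': 5, 'relapse_week': None}
```

With Dr Scales' weekly scores, response is reached at week 3 (24 to 12) and remission at week 5; the week 5 score of 6 is an assumption, since the text says only that the score fell below 7.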
The practical medical approach of Dr Scales has had an impact on clinical psychometrics that goes beyond the superficial approach of Dr Gestalt. Profound phenotyping in clinical psychiatry, e.g., endophenotypes, is considered the pathway between psychiatric disorders and the distal genotypes (165). This deep phenotyping has been captured by the Newcastle scales, with such items as sudden onset of the depressive episode, diurnal variation and morning worsening of depression (4). Similarly, it is reflected in the double book-keeping behaviour seen in schizophrenia.
In Kant's philosophical approach (Figure 1.1), the dichotomy between the phenomena and the noumena is covered by Wittgenstein's ‘family resemblances’, in which the similarities between proximal and more distal phenotypes are referred to as ‘applied mathematics’ (166).
When Hotelling looked back on the first decade of using his principal component analysis, he advised psychometricians to consult mathematical experts rather than psychologists to improve the use of his method. The mathematician Georg Rasch put a stop to the use of factor analysis in the 1950s. At that time Wittgenstein had criticised Freud's psychoanalysis as a method in which we never know when to stop the process of free association: Freud never showed the right solution (167).
In the same way, Rasch criticised factor analysis as a method in which we never know when to stop the rotations.
As Putnam remarked, Wittgenstein's approach was to bring our items back to their homes by reference to the ‘family resemblances’ (7). Putnam added that Wittgenstein could not have been so farsighted had he not stood on the shoulders of Kant. Similarly, we could not have developed clinical psychometrics had we not been standing on the shoulders of Kraepelin, Hamilton, Pichot, Spearman, Hotelling and Rasch. Clinical psychometrics, then, combines theories of measurement with the family resemblances in clinical phenomenology, including deep phenotyping, i.e., theories of clinical validity.
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
10 Epilogue: Who's carrying Einstein's baton?
In a certain sense, clinical psychometrics has followed the continuity usually found in clinical medicine: that of a relay race in which the older clinicians pass the baton to the younger generation. However, this is true only in a certain sense, as the ‘relay race’ has been more Platonistic in clinical psychometrics than in other branches of medicine. This epilogue attempts to answer the question of who took over Einstein's office mentally, that is, who received the psychometric baton.
As an example, it was Bengt Strömgren (professor of astronomy at the University of Copenhagen and brother of the Danish professor of psychiatry Erik Strömgren) who physically took over Einstein's office at the Institute for Advanced Study in Princeton, New Jersey, USA, rather than someone Einstein had designated as his successor, his ‘crown prince or princess’ (168). Often enough, it is a purely bureaucratic decision process that lies behind the choice of successor to a chair or an office, even that of a world-famous scientist, rather than the selection of a natural successor within the particular field of research.
Figure 10.1 illustrates the more or less Platonistic office takeovers in the wake of the three great psychiatric clinimetricists (Kraepelin, Hamilton and Pichot). John Overall is still in his office in Texas and has yet to pass on his baton.
As regards Kraepelin, Professor Hanns Hippius took over his office in 1971. In a certain sense, this office had remained empty during the nearly fifty years following Kraepelin's departure. During the period from 1926 to 1971, German psychiatry was marked by the two world wars, and in the US Freud had taken over the scene (123). With the arrival of Hippius, Kraepelin's work in both psychopathology and pharmacology once again became concentrated around Kraepelin's office and large library in Munich, for all German-speaking psychiatrists.
Professor Jules Angst, whose 1966 thesis had demonstrated the importance of distinguishing between unipolar depression (patients suffering from recurrent depressive episodes but never mania) and bipolar affective disorder (patients suffering from both depressive and manic episodes), something Kraepelin had not covered sufficiently in his studies, was among the applicants for the Munich chair in 1971 and might have been preferred (169). However, he chose to withdraw his application in favour of a chair in Zurich that Hanns Hippius had also applied for. In an attempt to bring Kraepelin's checklist into line with the international rating scale standards of the 1970s, Hippius and Angst developed a very comprehensive scale system, the AMDP (Arbeitsgemeinschaft für Methodik und Dokumentation in der Psychiatrie), in 1979; it is the most extensively used rating scale system in German-speaking countries (170).
During his Zurich period, Jules Angst continued his major work with scales demarcating the bipolar affective disorders, most recently with the Hypomania Checklist (HCL-32) (171). The HCL-32 and the American Mood Disorder Questionnaire (MDQ) are both intended to capture the previous history of the depressive patient in order to ascertain any possible ‘upswings’ (hidden bipolarity) (172). Introversion is often a characteristic of patients with recurrent depression but without signs of mania (unipolar depression), whereas in the bipolar patient extraversion is the more predominant personality type. For this reason, some of the items in the HCL-32 overlap with those of the MDQ and of Eysenck's extraversion scale (EPQ-E).
Following Hanns Hippius' retirement, Professor Hans Jürgen Möller took over Kraepelin's Munich office. Möller has continued work on the AMDP system, but has been particularly occupied with modern psychometric studies of Hamilton's depression scale as a measure of effect in antidepressive medication (173). Möller has very recently published an important review of rating scales in psychiatry with particular emphasis on methodological issues (174).
[Figure 10.1: diagram titled ‘Who got their offices?’, linking Kraepelin E, Hamilton M, Pichot P and Overall JE with Hippius H, Angst J, Möller HJ, Paykel ES, Lingjærde O, Klerman GL, Williams JBW, Rush J, Lecrubier Y, Kay SR and Lindenmayer JP.]
Figure 10.1 Diagram of the psychiatrists who continued Kraepelin's, Hamilton's, Pichot's and Overall's pioneering work in scale construction
Physically, Max Hamilton's office at the University of Leeds was taken over by Mindham when Hamilton retired. However, Mindham was not particularly interested in further work on Hamilton's scales. It was Eugene Paykel, Professor of Psychiatry at Cambridge University, who developed Hamilton's scales further in the UK.
In 1985, Paykel published his Clinical Interview for Depression (CID), the first attempt to use a 0–6 Likert scale with both the Hamilton Depression Scale and the Hamilton Anxiety Scale (84). Because he had also added some new items, Paykel discarded some of the original items so as to keep the total number within 36. However, these modifications meant that the CID never caught on, as national medicines agencies all over the world insist that the HAM-D17 or the HAM-A14, respectively, form part of the documentation for the clinical effects of antidepressive or anti-anxiety drugs (175).
Most factor-analytic studies using the Hamilton Depression Scale are of a more ‘invasive’ nature, carrying out various rotations of the factor structure. Paykel's CID study is among the few to assess only the unrotated factor structure (176). In his sample he compared depressed hospitalised patients (N = 65) with outpatients (N = 100). He identified three factors, the first of which is a general factor and the second a bipolar, or dual, factor. The most important element in this study is Paykel's demonstration that the symptoms that especially discriminate between inpatients and outpatients are the true core symptoms of depression (lowered mood, guilt feelings, work and interests, psychomotor retardation), and that these are also the symptoms with negative loadings on the dual factor, while the symptoms with positive loadings are sleep problems, anxiety and irritability.
Paykel has made the most valuable standardisation of the original HAM-D17 (177). His London-based study was carried out in patients treated by their general practitioners. The antidepressive drug amitriptyline was compared with placebo in mildly to moderately depressed patients. The results showed that in patients with a HAM-D17 of 12 or less prior to the start of therapy, placebo was just as effective as amitriptyline, while in patients with a HAM-D17 of 13 to 24 prior to the start of therapy, amitriptyline was clearly better than placebo, and this effect was the same whether the HAM-D17 start score was from 13 to 17 or from 18 to 25.
Following the development of psychopharmacological drugs in the 1950s, many psychopharmacological societies were established outside the UK in different parts of the world. Among the oldest, besides the parent association, the Collegium Internationale Neuro-Psychopharmacologicum (CINP), is the Scandinavian College of Neuro-Psychopharmacology (SCNP), which celebrated its 50th anniversary in 2009. In comparison, the American College of Neuropsychopharmacology (ACNP) celebrated its 50th anniversary in 2011, while the British Association for Psychopharmacology (BAP) will have to wait until 2024.
In 1969, the SCNP set up a committee for clinical investigations under the acronym UKU (Udvalg for Kliniske Undersøgelser), with the Norwegian Professor Odd Lingjærde as chairman and one member from each of the other Scandinavian countries (1). Lingjærde arranged for the translation of the Hamilton Depression Scale into the different Scandinavian languages. The scale was then used in a UKU study demonstrating that lithium, in combination with tricyclic antidepressants, was significantly more effective than placebo in treatment-resistant depression (178). In the early 1980s, because of a surprisingly small number of side-effect reports on psychopharmacological drugs, the Swedish Medical Agency asked the UKU to design a reliable side-effects rating scale. This led to the UKU Side Effect Rating Scale (109), which is still the most comprehensive side-effect rating scale in use. A UKU subscale has been constructed for use with the newer antidepressants (4). In 1993, the UKU published a detailed review of rating scales measuring the wanted and unwanted effects of psychopharmacological therapy (179).
Figure 10.1 shows Klerman as the American ‘heir’ to the Hamilton Depression Scale; he translated the scale into American English, making such radical changes that Hamilton protested (1). However, Klerman's version was included in Guy's Early Clinical Drug Evaluation Unit (ECDEU) manual (92), which is used by the FDA and therefore also by the pharmaceutical industry.
Janet Williams developed the most widely used structured interview for the HAM-D internationally (180). She also wrote a very important review of the various versions of the HAM-D, including the GRID-HAM-D (181, 182).
It is natural to mention John Rush in this connection, as he is viewed as another American ‘heir’ of the Hamilton Depression Scale with his Inventory of Depressive Symptomatology (IDS-30), which builds on the HAM-D with extra items measuring the ‘atypical’ depressive symptoms of DSM-IV (183).
Professor Loo, who took over Pichot's chair in Paris, carried out important analyses of the HAM-D together with Marcelo Fleck and Professor Guelfi (184). Professor Yves Lecrubier (1944–2010) recently compared the HAM-D17 and the HAM-D6 (185) and, together with Professor David Sheehan, developed a neuropsychiatric interview, the MINI International Neuropsychiatric Interview (MINI) (73).
As for the BPRS, which was introduced in Europe by Pichot in collaboration with John Overall (the US developer of the BPRS), further European progress seems to have been put on hold after Pichot's retirement, as the PANSS, also American, has become its successor. It was the collaboration between John Overall and Pichot in Europe, and between John Overall, Don Gorham and Leo Hollister in the US, that inspired Overall to develop a clinical scale like the BPRS. Hollister worked as a professor of psychiatry, although he was trained only as a specialist in internal medicine (with a particular interest in antihypertensive medicine) and had no formal training as a psychiatrist. He became the administrative head of the largest psychiatric hospital in the USA at the time when placebo-controlled trials were being conducted in psychiatry in the 1950s. Hollister undertook what was probably the first US placebo-controlled study of chlorpromazine in schizophrenia using ‘between-groups analysis’ as opposed to ‘cross-over analysis’. He found the BPRS clinically meaningful compared with the Rorschach test on the one hand and the Minnesota Multiphasic Personality Inventory (MMPI) on the other (89). Hollister found it difficult to grasp how a psychiatrist working as a serious clinician could listen to and observe a patient while at the same time, as described by Greenberg (123, 126), frantically leafing through and completing a stack of the quite complex scales now required by the pharmaceutical industry in its study protocols.
In Figure 10.1, Overall is placed on the same level as Pichot, as his BPRS and Hamilton's scales are the archetypal rating scales of the 50 years of psychopharmacological history. In 1988, Overall arranged a symposium under the auspices of the New Clinical Drug Evaluation Unit (NCDEU), sponsored by the National Institute of Mental Health (NIMH), with the title ‘The Brief Psychiatric Rating Scale (BPRS): Recent Developments in Ascertainment and Scaling’. Here he stressed the importance of avoiding too many changes in a scale widely used on an international basis (186). The mere addition in 1965 of two items to the 1962 version, so that the scale now consists of 18 items, means that users of the most common version, the BPRS-18, always, and incorrectly, refer to his 1962 paper describing the original 16-item version. In his 1988 introduction Overall also mentions that the ‘pain limit’ of a ‘brief’ scale is 18 items. The version he recommended in 1988 is shown in Figure 1.10. Finally, Overall remarks that he would have liked to add ‘elevated mood’ in order to cover the manic state (186, 193).
In the 1980s, at the Albert Einstein Medical Center in New York, Stanley R. Kay (1946–90) developed ‘The Positive and Negative Syndrome Scale’ (PANSS) in collaboration with J.P. Lindenmayer (187). This scale is based on the BPRS, with adequate anchoring of the individual items. The PANSS is not a brief scale, as it contains 30 items. However, an 11-item version, with reference to the BPRS, is sufficient for measuring antipsychotic effect.
Offices other than those previously belonging to Kraepelin, Hamilton and Pichot have, of course, also conducted studies in both Europe and the US, in particular to improve ICD-10 or DSM-IV with more complex rating scale systems; however, they have misunderstood psychometrics by regarding item response theory models as a special case of factor analysis (188, 189).
As regards the offices of modern psychometrics, only that of Georg Rasch in Copenhagen will be mentioned here. In both a Platonistic and a physical sense, Peter Allerup may be said to have taken over Rasch's office after the latter's retirement, even though the chair Peter Allerup holds at the Danish School of Education was not established until recently (the School being an institution belonging to Århus University).
Europe has played the major role in this summary of clinical psychometrics. However, American psychometrics has also been important. In addition to the Likert scale and Siegel's non-parametric statistics, one might mention Jane Loevinger's coefficient of homogeneity which, from a Platonistic point of view, moved to Amsterdam when Mokken included it in his non-parametric item response theory analysis, after he had become familiar with Guttman's model through Rasch.
At Johns Hopkins Hospital in Baltimore, Derogatis took over the SCL-90 baton from the psychiatrist and professor Jerry Frank, who studied the effect of psychotherapy in anxiety and depression. However, Derogatis' main interest lay in obtaining a copyright on the SCL-90 by changing two items, resulting in the SCL-90-R. The version used in Denmark, the SCL-92, covers both the SCL-90 and the SCL-90-R (157).
At Harvard University, psychology did not become detached from philosophy until Ralph Barton Perry's time in office as successor to William James. Later on, Edwin Boring and then Fred Skinner, who were in charge of psychology at Harvard, downplayed the role of psychometrics. It was Willard Quine, who became professor of philosophy in Boring's time, whose set theory was more in line with the field of psychometrics (190). Willard Van Orman Quine (1908–2002) was appointed professor at Harvard in 1948. After the death of Wittgenstein he was among the most influential philosophers in the English-speaking world. Building on Russell's theory of descriptions and theory of types, Quine concluded in his book ‘From a Logical Point of View’ (1953) that to be is to be the value of a variable. In other words, to be depressed is to have a score of 9 or more on the HAM-D6.
Finally, it is worth mentioning that the University Hospital of Munich marked the 50th anniversary of Emil Kraepelin's death by issuing a Kraepelin Gold Medal. This award was presented to Professor Erik Strömgren.
Glossary
Allostasis When subjected to severe stress, the human organism attempts to attain a new stability in its hormonal and nervous systems at the cost of increased cortisol production. When this succeeds with a relatively small increase in cortisol production, cortisol is called a ‘tolerance hormone’.
Calvinism (pharmacological) A concept introduced by the American psychiatrist Gerald Klerman in 1972, referring to the view that psychopharmacological drugs have an effect only on mental disorders (depression, psychosis, mania or anxiety) and are not to be perceived as recreational drugs like amphetamine or cannabis. The term refers to Calvin, who stated that life is predestined and that God determines the course of our lives from birth on, ‘as a doctor's prescription’.
Clinimetrics A term introduced into medicine long after the first rating scales were used in psychiatry. It was introduced by Alvan R Feinstein (1925–2001), professor of medicine and epidemiology and the ‘father of clinical epidemiology’. Clinical psychometrics is clinimetrics in psychiatry.
Compliance (in filling in a questionnaire) When constructing a questionnaire, various methods are used to ensure that the person completing it reads the questions properly. One of these methods is to alternate between positively and negatively worded questions. Experience has shown that this method has more disadvantages than advantages. If a questionnaire has ‘mixed’ questions, factor analysis can be applied to find out whether the positively worded questions constitute one factor (one pole) and the negatively worded questions the other factor (the opposite pole).
Correlation coefficient A mathematical expression of the correlation between two variables. Francis Galton, who was the first to formulate a correlation coefficient, in 1886, used as an example the fact that persons with long arms often have long legs as well; in his formulation, 1 meant perfect correlation, 0 meant no correlation, and -1 inverse, or negative, correlation. One of his pupils, Karl Pearson, developed the parametric (product-moment) correlation coefficient, while another pupil, Charles Spearman, developed the non-parametric (rank) correlation coefficient. These correlation coefficients led on to factor analysis (principal component analysis).
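As an illustration of the difference between the two coefficients, the sketch below computes Pearson's coefficient from the raw values and Spearman's as Pearson's coefficient applied to ranks, with tied values given their average rank (the usual convention). It is a minimal plain-Python sketch; the function names and the arm and leg lengths are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: the Pearson coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical arm and leg lengths (cm) for five persons, echoing Galton's example.
arms = [60.0, 62.0, 65.0, 70.0, 74.0]
legs = [80.0, 85.0, 84.0, 92.0, 99.0]
print(round(pearson(arms, legs), 3), round(spearman(arms, legs), 3))
```

Because Spearman's coefficient uses only the ordering of the values, any strictly increasing relation gives a coefficient of 1, whereas Pearson's coefficient reaches 1 only for a straight-line relation.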
Factor analysis Introduced by Spearman in 1904 as a statistical method by which the items in a rating scale or a questionnaire are reduced to simple factors. Spearman himself felt that his two-factor model was adequate. The term ‘factor analysis’ is used in its widest sense, in particular to encompass principal component analysis, the mathematical version developed by Hotelling (24). The first factor was called a general factor, as it demonstrated the degree of positive correlation between the items (questions). The second factor was termed a bipolar (dual) factor, as it identified items (questions) that were highly correlated with each other without this being the case for the remaining items (questions). This emerged from the signs of the factor loadings: the items with negative loadings form an independent, specific scale, as do the items with positive loadings, that is to say, two specific scales.
Feighner criteria John P. Feighner (1937–2006) was an American psychiatrist who was the first author of the 1972 paper ‘Diagnostic criteria for use in psychiatric research’ (Arch Gen Psychiatry 1972; 26(1):57–63), which became the basis of DSM-III. These criteria defined major depression by an algorithm in which five of the nine depressive symptoms must be present for the diagnosis to be made. Feighner thought that these symptoms were the same in primary and secondary depression. Psychometrically, this is a transferability issue (see Transferability).
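The ‘five of nine’ counting rule can be sketched in a few lines of Python. This is an illustration only: the symptom labels are hypothetical shorthand for the nine DSM-IV symptoms, duration and exclusion criteria are ignored, and the optional `require_core` flag adds the DSM-IV condition, not mentioned in the entry above, that at least one of the symptoms must be depressed mood or loss of interest.

```python
# Hypothetical shorthand labels for the nine DSM-IV major depression symptoms.
SYMPTOMS = frozenset({
    "depressed_mood",
    "loss_of_interest",
    "appetite_or_weight_change",
    "sleep_disturbance",
    "psychomotor_change",
    "fatigue",
    "worthlessness_or_guilt",
    "poor_concentration",
    "thoughts_of_death",
})
CORE = frozenset({"depressed_mood", "loss_of_interest"})


def meets_symptom_count(present: set[str], require_core: bool = False) -> bool:
    """Return True if at least five of the nine symptoms are present.

    With require_core=True, one of them must also be depressed mood or loss of
    interest (the DSM-IV condition). Duration and exclusion rules are ignored.
    """
    unknown = present - SYMPTOMS
    if unknown:
        raise ValueError(f"unknown symptom labels: {sorted(unknown)}")
    if len(present) < 5:
        return False
    return not require_core or bool(present & CORE)


patient = {"depressed_mood", "sleep_disturbance", "fatigue",
           "poor_concentration", "worthlessness_or_guilt"}
print(meets_symptom_count(patient, require_core=True))  # → True
```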
Primary depression A depressive state which cannot be explained as secondary either to physical disorders (e.g., post-stroke depression) or to stress (stress-induced depression). By analogy with hypertension, primary depression can be termed essential, or idiopathic.
Psychoanalysis A diagnostic and therapeutic method developed by Freud. As a therapy, psychoanalysis has been found to be without effect on mental disorders (depression, mania, schizophrenia).
Psychopharmacology The study of drugs acting on mental functions, including their clinical effect (antidepressant, antipsychotic, antimanic or antianxiety) and their fate when entering the organism, in terms of pharmacodynamics and pharmacokinetics.
Reductionism When a complex questionnaire or rating scale is reduced so that it covers the whole area and not just a single aspect.
Relapse When a person suffers a setback in the months after obtaining freedom from symptoms. On the HAM-D17, a score of 13 is seen as a relapse score.
Reliability (questionnaire) The reliability of a questionnaire is often shown by its test-retest coefficient, i.e., the agreement between two sets of responses by the same person given about 2–3 weeks apart. This measure of reliability depends, of course, on the person's condition being unchanged between test and retest.
Reliability (rating scale) When several interviewers (clinicians) assess the same patient or patients, the reliability of a rating scale is shown statistically by their intraclass correlation coefficient, where 1.0 means complete agreement and 0.6 only just acceptable agreement. All the rating scales included have satisfactory reliability. The Rorschach test, however, has an intraclass coefficient of only 0.40 or lower.
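There are several forms of the intraclass correlation coefficient; the sketch below assumes the simplest, the one-way random-effects ICC(1,1), which compares the variation between patients with the disagreement between raters scoring the same patient. The function name and the example scores are hypothetical, and published reliability studies may use other ICC forms.

```python
def icc_oneway(ratings: list[list[float]]) -> float:
    """One-way random-effects intraclass correlation, ICC(1,1).

    ratings[i][j] is the total score given to patient i by rater j; every
    patient must be rated by the same number (k >= 2) of raters.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each patient needs the same number (>= 2) of ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    patient_means = [sum(row) / k for row in ratings]
    # Between-patient mean square: how much the patients differ from each other.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    # Within-patient mean square: how much the raters disagree about the same patient.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, patient_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical HAM-D17 totals from two raters interviewing the same three patients.
print(round(icc_oneway([[9, 11], [19, 21], [29, 31]]), 3))  # → 0.98
```

Raters who differ by only a couple of points on patients who differ by ten points give a coefficient close to 1.0; if the patients did not differ at all, any disagreement between raters would push the coefficient towards or below zero.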
Remission Being relatively free of symptoms, i.e., a score of 7 or lower on the HAM-D17.
Response A sufficient reduction of symptoms during treatment. A 50% reduction of symptoms from the start of treatment is frequently used as the criterion. In dose–response studies, effect size is a more discriminating response measure. Both measures are universal, as they do not depend on the raw score of the rating scale used.
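The two response measures can be sketched as follows, assuming the 50% criterion above for individual response and, for effect size, one common convention (mean change divided by the standard deviation of the baseline scores). Function names and data are hypothetical. The example also shows in what sense both measures are ‘universal’: multiplying all scores by the same factor leaves them unchanged.

```python
import statistics


def percent_reduction(baseline: float, endpoint: float) -> float:
    """Percentage fall from the baseline score (assumes baseline > 0)."""
    return 100.0 * (baseline - endpoint) / baseline


def is_response(baseline: float, endpoint: float, threshold: float = 50.0) -> bool:
    """Response = a reduction of at least `threshold` per cent from baseline."""
    return percent_reduction(baseline, endpoint) >= threshold


def effect_size(baselines: list[float], endpoints: list[float]) -> float:
    """Mean change divided by the standard deviation of the baseline scores.

    This is one of several conventions; others divide by the SD of the change
    scores or by a pooled SD between treatment groups.
    """
    changes = [b - e for b, e in zip(baselines, endpoints)]
    return statistics.mean(changes) / statistics.stdev(baselines)


# Doubling every score (as if on a scale with twice the range) leaves both measures unchanged.
print(is_response(24, 12), is_response(48, 24))  # → True True
print(effect_size([20, 24, 28], [10, 12, 14]),
      effect_size([40, 48, 56], [20, 24, 28]))  # → 3.0 3.0
```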
Standardisation The scale scores defined to indicate response, remission and relapse.
Transferability When a scale still measures the same dimension each time it is applied during treatment, or when different assessors rate the same subject, no matter whether the subject's condition is primary or secondary. Psychometrically, only item response theory analyses can show whether transferability has been achieved.
Unidimensionality A rating scale is said to be unidimensional when it is accepted by Rasch analysis. Rasch analysis presupposes that scores on items with low prevalence are preceded by scores on items with higher prevalence. Items with low prevalence measure the more severe degrees of the dimension to be assessed, while items with high prevalence measure the milder degrees.
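The ordering described above follows from the dichotomous Rasch model, in which the probability that a patient at severity θ scores on an item located at b is exp(θ − b)/(1 + exp(θ − b)). The minimal sketch below assumes hypothetical item locations; an actual Rasch analysis estimates these from data and tests whether the items fit the model, which is not shown here.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(score 1) on a dichotomous item under the Rasch model: exp(theta-b) / (1 + exp(theta-b)).

    theta: the patient's location on the latent dimension (e.g. severity of depression).
    b:     the item's location; items with low prevalence sit high on the scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mild_to_severe(item_locations: dict[str, float]) -> list[str]:
    """Order items from the most prevalent (mild) to the least prevalent (severe)."""
    return sorted(item_locations, key=lambda name: item_locations[name])


# Hypothetical item locations, chosen only to illustrate the ordering.
items = {"suicidal thoughts": 2.0, "lowered mood": -1.5, "guilt": 0.5, "insomnia": -0.5}
for theta in (-1.0, 1.0):
    probs = {name: round(rasch_probability(theta, items[name]), 2)
             for name in mild_to_severe(items)}
    print(theta, probs)
```

For every value of theta the printed probabilities fall from the mildest to the most severe item, which is the probabilistic version of the requirement that scores on rare items are preceded by scores on common ones.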
Validity (clinical) Clinical validity means the degree to which a rating scale or a questionnaire has clinical significance, i.e., is clinically valid. After DSM-III, DSM-IV and ICD-10 had been introduced, it became customary to use these systems as an index of clinical validity. An example is the Major Depression Inventory (MDI), which has high clinical validity (‘face validity’) because its questions correspond to the depressive symptoms of DSM-IV major depression.
Validity (psychometric) Psychometric validity means that the rating scale or questionnaire has been analysed psychometrically, e.g., by means of an item response theory model, to find out whether it is unidimensional, including when women and men, or younger and older persons, are compared. This type of validity is also called ‘internal’ validity.
Validity (external) External validity describes the degree to which a scale correlates with factors outside the scale, e.g., the dosage of a drug (dose–response relation), or whether it is able to discriminate between treatment with an active drug (verum) and an inactive drug (placebo).
Visual analogue scale (VAS) An assessment method which measures the dimension in question on a line from zero to 10 (centimetres) or from zero to 100 (millimetres). Zero indicates that there is nothing to measure, and 10 or 100 indicates an extreme degree.
Window A term used for the time frame a rating scale covers, e.g., the past three days. It derives from considering a rating scale as a camera visualising clinical reality (6).
114
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
Wundt, Kraepelin and Wittgenstein all stood on the shoulders of Kant
with their phenomenological approach, saying that when we know things
clinically, we then know how to bring symptoms or signs back to their respective syndromes. They tried very consistently to avoid all etiological factors of the clinical syndromes, focusing on the description alone. At the end of his Philosophical Investigations Wittgenstein says: ‘Can one learn this descriptive knowledge? Yes, some can. Not, however, by taking a course in it, but through “experience”. Can someone else be a man’s teacher in this? Certainly. From time to time he gives him the right tip’.
In this Appendix some of the right tips in the spirit of Wittgenstein are
indicated, clinically and psychometrically. To enable the clinician to make
more effective and economic use of his or her basically limited capacities
for handling scales we have focussed on brief scales. Thus, the Hamilton
Depression Scale has been decomposed into three familiar subscales (spe-
cific, arousal, suicidal). It is easier to remember these three words than the
whole string of seventeen items in this scale.
The dialogue between the interviewer and his or her patient should be
considered as an informal conversation in which the task of the interviewer
is to give the patient a feeling of relief in knowing that the interviewer is thor-
oughly familiar with the problems the patient had feared were private and
non-communicable. Throughout this Appendix Wittgenstein’s approach has
been followed when selecting and describing the various scales by ‘bringing
the items back to their respective syndromes’.
Table A.1 shows how the informal conversation is finally measured by a
total score which has been standardised.
Appendix 1 is the Hamilton Copenhagen Lecture, which can be seen as a paraphrase of Wittgenstein’s concept of phenomenological or descriptive knowledge. The expert judgment about the general expression of feelings is most valid, according to Wittgenstein (1953): ‘Most valid from the judgment of those who understand by experience people better (des bessern Menschen kennen).’
Figure A.1 The six steps in the translation procedure leading to the final acceptance by the developer of the scale, exemplified by the Danish version of HAM-D:
Max Hamilton’s HAM-D → Danish version [1] and Danish version [2] → Consensus Danish version [3] → English version [4] and English version [5] → Consensus English back translation [6] → To be accepted by Max Hamilton [7]
Table A.1 Standardisation of three different depression scales: Hamilton Depression Rating Scale (HAM-D17 and HAM-D6) and Bech-Rafaelsen Melancholia Scale (MES)
                                    HAM-D17  MES    HAM-D6
Theoretical score range             0–52     0–44   0–22
Remission (relative zero point)     ≤ 7      ≤ 6    ≤ 4
Degrees of clinical depression
Doubtful depression                 8–12     7–10   5–6
Mild depression                     13–17    11–14  7–8
Moderate depression                 18–24    15–24  9–11
Medium-severe to severe depression  25–52    25–44  12–22
Appendix 2 is an example of how to learn to use the Hamilton Depression Scale: the tips from the ABC version.
Appendix 3 contains a selection of depression scales, notably the Melancholia Scale (191).
This collection of scales includes those mentioned several times in the
text, as well as others that merit a more detailed description, together with
their standardised values. When a psychometric analysis has shown that a
total score is a sufficient reduction of the information available in the indi-
vidual items, then the question naturally arises of how to interpret this score.
This is the meaning of standardisation (Table A.1 gives an example).
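Standardisation can be illustrated as a small lookup over the cut-offs in Table A.1. The following is a minimal sketch. The band labels, the `BANDS` dictionary and the function name are hypothetical choices of ours, not part of any published software. The remission value in the table is read as an inclusive upper bound, so a HAM-D17 total of 7 counts as remission and 8 as doubtful depression.

```python
from typing import Dict, List, Tuple

# Inclusive upper bound of each band per scale, copied from Table A.1.
# The first band is the remission range (the "relative zero point").
BANDS: Dict[str, List[Tuple[int, str]]] = {
    "HAM-D17": [(7, "remission"), (12, "doubtful"), (17, "mild"),
                (24, "moderate"), (52, "medium-severe to severe")],
    "MES": [(6, "remission"), (10, "doubtful"), (14, "mild"),
            (24, "moderate"), (44, "medium-severe to severe")],
    "HAM-D6": [(4, "remission"), (6, "doubtful"), (8, "mild"),
               (11, "moderate"), (22, "medium-severe to severe")],
}


def degree_of_depression(scale: str, total: int) -> str:
    """Return the Table A.1 category for a total score on the given scale."""
    if scale not in BANDS:
        raise KeyError(f"unknown scale: {scale}")
    bands = BANDS[scale]
    maximum = bands[-1][0]
    if not 0 <= total <= maximum:
        raise ValueError(f"{scale} total must lie between 0 and {maximum}, got {total}")
    for upper, label in bands:
        if total <= upper:
            return label
    raise AssertionError("unreachable: the last band ends at the scale maximum")


if __name__ == "__main__":
    # The same raw total means different things on different scales.
    for scale in ("HAM-D17", "MES", "HAM-D6"):
        print(scale, 13, "->", degree_of_depression(scale, 13))
```

A raw total of 13 is mild depression on the HAM-D17 and the MES. On the shorter HAM-D6, however, it falls in the medium-severe to severe band, which is why a total score has to be interpreted against the standardised values of its own scale.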
Appendix 4 contains the Major Depression Inventory and Appendix 5 the
Hamilton Anxiety Scale.
Appendix 6 contains the Mania Scale (MAS) (192).
The interview-based scales in the appendix contain both scoring sheets and scoring manuals. Some of the scales included in the appendix consist of items selected from more comprehensive scales; this applies to Appendices 3f, 3g and 3i. Appendix 3h consists of items from the Hopkins Symptom Checklist (SCL-92), and Appendix 3i contains the six items of the Beck Depression Inventory (BDI, version I) that correspond to the six HAM-D6 items.
Figure A.1 demonstrates the translation procedure recommended by
WHO. HAM-D is used as an example, precisely because it is to be found in
so many different translations even within the same language area. This often
results in not knowing which of these versions was used in a specific study.
Often the reference is to Hamilton’s first English version from 1960, but this
version is not used any more, as Hamilton himself could not recommend it.
The Danish version is a very free translation. Its back translation into English
was published in 1986 after prior approval from Max Hamilton. The Danish
professor Ole Rafaelsen (1930–87) was primus motor here. He also played a
major part in the development of the MES and the MAS. Ole Rafaelsen also
made a back translation of BDI into US-English.
Appendix 1 The clinical validity of rating scales for depression, Copenhagen 1977
Hamilton M.
Rating scales are so extensively used in clinical trials that it is difficult to find a report of a drug trial that does not use at least one scale.
The young psychiatrist will find it difficult to believe that the use of such scales in psychiatry is still comparatively new. As little as forty years ago, a leading British psychiatrist declared that to make a scale of different symptoms and to add scores on them to produce a total was a meaningless procedure. It was impossible! Opinions have changed so much that it has almost become accepted that a clinical trial is not “scientific” without the use of a rating scale. Of course, this is quite mistaken. Clinical trials have been carried out without scales in the past and will be in the future. The excessive preoccupation with scales has led almost to a sort of worship of them, and has undoubtedly led to some misuse.
A rating scale is no more than a particular way of recording a clinical judgment. The clinician puts down his opinion on the presence or absence of a symptom, or its severity, but whether he does so in words or in the form of a number, the judgment is the same. However, judgments are of different kinds and have therefore to be recorded on appropriate scales. The commonest type of scale is that used for recording severity of illness and is quite different in nature from those which are related to the other kinds of clinical decisions, e.g. making a diagnosis or selecting patients for treatment. Although rating scales have a clearly defined role to play in clinical psychiatry it must not be forgotten that it is a very limited one.
The fundamental basis of scales rests on the everyday work of the clinician. The psychiatrist is accustomed to say of a patient “This patient is now better or worse than last week”. The patients themselves can make the same sort of judgment. Furthermore, a clinician can say “This patient is worse, or better, than that patient”. The process can be carried further. If we can say that one patient is more ill than another, we can place a group of patients in an order, in which the first is the most severely ill, going down to the last who is least ill. When we try to put a large number of patients into rank order, it is easier to assemble them into groups which have a rank order. The experienced clinician can remember the characteristics of such groups and can allocate a patient to the appropriate group without making a direct comparison. This is what is meant by making a Global Judgment. The same procedure can be applied to the individual manifestations of the illness, i.e. the symptoms.
It is generally accepted in all branches of medicine that the more symptoms a patient experiences, the more ill he is. This forms a crude but surprisingly effective way of measuring the severity of illness. The doctor
goes through a list of symptoms and checks how many are shown by the
patient. The total checked is a measure of severity of illness. An obvious
improvement is to take into consideration the extent of a symptom. A severe
symptom should contribute more to the total score than a mild one. In other
words, the symptom is given a weight according to its severity. Such a system
of weighting converts a check-list into a rating scale, and it is clear that the
total score is merely a way of recording the clinician’s judgment.
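The step from check-list to rating scale can be shown in a minimal sketch. The symptom names, the grades and the function names are hypothetical illustrations. The grades follow the 0 to 4 scheme Hamilton uses in this lecture (absent, trivial, mild, moderate, severe).

```python
from typing import Dict

# Severity grades: 0 = absent, 1 = trivial, 2 = mild, 3 = moderate, 4 = severe.


def checklist_total(grades: Dict[str, int]) -> int:
    """Check-list: count the symptoms that are present, ignoring severity."""
    return sum(1 for grade in grades.values() if grade > 0)


def rating_scale_total(grades: Dict[str, int]) -> int:
    """Rating scale: each symptom contributes its severity grade as its weight."""
    for symptom, grade in grades.items():
        if not 0 <= grade <= 4:
            raise ValueError(f"{symptom}: grade {grade} outside 0-4")
    return sum(grades.values())


# Two hypothetical patients with the same three symptoms.
patient_a = {"depressed mood": 4, "insomnia": 3, "loss of libido": 4}
patient_b = {"depressed mood": 1, "insomnia": 1, "loss of libido": 2}

print(checklist_total(patient_a), checklist_total(patient_b))        # 3 3
print(rating_scale_total(patient_a), rating_scale_total(patient_b))  # 11 4
```

The check-list cannot separate the two patients. Once each symptom is weighted by its severity, the totals differ.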
There arise immediately three questions which may seem naïve but which
are really very important. The first one goes as follows. Counting the number
of symptoms will show that a patient who has eighteen is more ill than one
who has six, but what do you do when one patient has six symptoms and
another has six completely different ones? How is a decision made then? One
answer is to say that no decision is made; but that is not the whole truth.
It is generally true that symptoms are not mutually exclusive, i.e. the presence of one does not prevent another from appearing. In general, symptoms tend to be associated with each other. This can be shown by calculating the correlations between symptoms, which are all found to be positive.
These positive correlations provide the mathematical justification for adding
the scores on the symptoms to make a total score. However, there are some
special circumstances in which a group of symptoms, all correlated positively
with each other, will have negative correlations with another group. This
shows that when the symptoms of the one group are present, those in the
other group will tend to be absent. One way of dealing with this situation is to
have two separate scales. From the clinical point of view, it is better to divide
the patients into two groups and to deal with them separately. This is only a
partial answer, but it serves to show that the question is not a simple one.
The second question asks “How is the weighting determined?” If a symptom
is absent, it is scored zero. If it is trivial, mild, moderate or severe, it is scored 1,
2, 3 or 4 respectively. Why not 1, 2, 4 or 8 respectively? And how is a comparison
made between 1 to 4 for depression and 1 to 4 for paranoid thinking?
The second part is very similar to the first question and again the answer
is not a simple one. There are technical ways of determining what should be
a value appropriate to every category of symptoms and every grade of sever-
ity. But in general, these complicated techniques give an answer which is very
much the same as the simple ones. The difference is so small that it is not
worth the trouble. However, in some circumstances simple crude weights are
unsatisfactory.
The third question is the one which worried psychiatrists 40 years ago. How
can one add scores on depression, loss of weight and loss of libido and obtain
a total which makes any sense? It does not appear to be capable of having any
meaning. There is some truth in this but it misses the point. To the patient,
one of the most important aspects of mental illness, as of all illnesses, is that it
is a loss of functional capacity. The patient suffers from disabilities: he cannot
work, he cannot sleep, he cannot carry on life in the usual way, and each extra
symptom is, in a sense, an additional burden on him. When we add scores we
are not so much adding scores on depression, loss of weight or loss of libido,
as adding up measures of disability. It is disability which is common to all the
symptoms and so a total score represents, in a way, the suffering of the patient.
These three questions seem to be concerned with very simple elementary
points, but in fact although they are simple, they are not elementary.
The most important classification of scales is that which distinguishes
between those used by an observer and those used by the patient.
Each type of scale has its advantages and disadvantages. For example, the
observer scale will include items on information which a patient cannot give.
By definition, a patient cannot describe his loss of insight nor can he say that
he has delusions, although he may say that he has hallucinations. The observer scale, when used by an experienced clinician, can record very small and delicate changes which are difficult for the inexperienced person, and especially for the patient, to recognize. However, observer scales do take a long time; even half an hour’s interview is, in my opinion, not really enough.
A disadvantage of the self-rating scale is that the patients are likely to fill in
the form about their condition with the help of wife or husband. If they make
daily assessments and take home the forms, then the children, grandparents
and cousins will come to help. Even the milkman and butcher may offer
assistance to help fill in the scale! The self-assessment scale has the great
advantage that it is easy to use repeatedly. A patient can be asked to describe
how he feels today or even this hour. Most observer scales have difficulties
over this and some scales make an assessment covering a period of a week or
two weeks.
In the end, there is no such thing as the best scale for all circumstances, all
patients and under all conditions. The clinician who is going to use a scale
must decide what he wants to get from it, i.e. what information he is looking for. The two types of scale give different information. Two important requirements are high validity and reliability, and these are found in most observer scales. Validity signifies that scales measure truly what is wanted of
them. One way of measuring validity is to compare a group of severely ill
patients with a group which is only moderately ill. If the first group obtains
high scores and the second low scores, we can say that the validity is high.
High (inter-rater) reliability means that if two raters use the scale at the same
time, the scores they obtain will be very close. It is an astonishing fact that
rating scales can be more accurate and reliable than some physical measure-
ments. A last word on these points: a clinician should ask himself not only
what he wants to measure and how, but also why. This last question is not
asked sufficiently often.
Originally scales were validated against a global judgment, i.e. when a scale
was designed it was tested by comparing the results obtained with the scale
against the physicians’ judgment. This took priority and determined whether
the scale could be regarded as satisfactory. Now that rating scales are regarded
as acceptable for assessment, we can reverse the process, as I have been sug-
gesting for many years. We can use the scales to look at global judgment, to
examine what the psychiatrist does and how he does it. In this respect, one of
the most interesting pieces of research is one carried out here in Copenhagen
by Per Bech and his colleagues. What they showed was that the Hamilton
scale did correlate very well with global judgment except at the most severe
levels. Furthermore, they found that to get an exact or a better correspond-
ence between the scale and global judgment, the full 17 items were unneces-
sary. Six of them did all the work and the other 11 were, so to speak, passengers
which just interfered with the work.
I think it is not an accident that 6 items are sufficient to equal the global
judgment. We know from research by psychologists in all sorts of ways that
the human mind is capable of holding, on an average, only 7 items of information. There is a very famous paper on this, “The magical number seven, plus or minus two”. The fact that 6 items in the scale do the work of
global judgment suggests that what the clinician is doing is to hold in his
mind about 6 or 7 items of information and this is what he assembles into his
judgment. Of course, which items he assembles is another matter. Bech and
his colleagues showed that the items which played little or no part were either
those which did not occur often or those which the physician thinks are not
important.
It would also appear that the weight given to a particular symptom is not
the same at all levels of severity. I suspect that when a symptom begins to be
very severe, it is given increasing importance. A depressive patient who is actively suicidal creates a crisis situation for the physician; whatever the other symptoms may be, they are overshadowed.
When suicidal thinking is mild, it takes its place with the other symptoms,
but as it becomes more severe, the physician takes more and more notice of it
and less and less of the other symptoms.
References
Bech, P., Gram, L.F., Dein, E., Jacobsen, O., Vitger, J., & Bolwig, T.G. (1975) Quantitative rating of depressive states. Acta Psychiatrica Scandinavica, 51, 161–70.
Miller, G.A. (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Appendix 2 The ABC profile of the HAM-D17
With the introduction of the new classification systems of psychiatric disorders (ICD-10 and DSM-IV) two decades ago, it became impossible to distinguish between primary and secondary (stress-related) depression [1].
The stimulus-response models for both PTSD (one single, severe life event) and exhaustion depression (multiple distressing life events) are placed within the anxiety disorders in ICD-10 and DSM-IV, although the delayed distress response in these syndromes often progresses into the full clinical picture of depression when untreated. The internationally most valid measure of depressive states is the Hamilton Depression Scale (HAM-D17) [1].
Figure A2.1 shows how the 17 items in the HAM-D can be re-allocated to the corners of a triangle: “A” covers the core items of the depressive state (HAM-D6), while “B” covers the unspecific stress (arousal) items, with reference to Selye’s original definition of stress as the non-specific response of the body to any demand made upon it [2]. Finally, “C” covers the items of suicidal thoughts and lack of insight. In a patient with primary or secondary depression, suicidal thoughts are often activated if the patient lacks insight into his disorder [3].
When Hamilton developed his scale [4], he consulted Kraepelin’s original description of primary depression (manic-depressive illness), as well as Kraepelin’s description of secondary depression (exhaustion depression). However, Hamilton also conducted focused interviews with his depressed patients and their relatives [4]. This was the background for his selection of the 17 items in the HAM-D.
Psychometric analyses with either principal component analysis or item response theory models [5] have shown that the HAM-D6 (A in Figure A2.1) is a valid measure of depression and thereby the most specific outcome measure of the effect of antidepressant medication.
Figure A2.1 ABC version of the Hamilton Depression Scale (HAM-D)
(A) The pure depression picture (HAM-D6): 1. Depressed mood; 2. Guilt; 7. Activities and interests; 8. Psychomotor retardation; 10. Anxiety, psychic; 13. Somatic symptoms, general
(B) The stress-related arousal (HAM-D9): 4. Insomnia: initial; 5. Insomnia: middle; 6. Insomnia: late; 9. Psychomotor agitation; 11. Anxiety, somatic; 12. Gastrointestinal symptoms; 14. Sexual disturbances; 15. Hypochondriasis; 17. Weight loss
(C) The suicide risk behaviour (HAM-D2): 3. Suicidal thoughts; 16. Insight
HAM-D17 total score: (A+B+C)
When evaluating the specific antidepressive effect of an intervention we need to focus on the HAM-D6 [1]. The theoretical score range of the HAM-D6 goes from 0 to 22, whereas that of the whole HAM-D17 goes from 0 to 52. In other words, the HAM-D6 theoretically covers only about 40% (22/52) of the HAM-D17 score range. In patients with major depression, however, the HAM-D6 typically explains over 50% of the HAM-D17 total score. For instance, in the STAR*D study the HAM-D6 explained 53% of the variance in the baseline data set [5].
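The allocation in Figure A2.1 and the score ranges quoted above can be reproduced with a short sketch. It assumes a complete HAM-D17 rating stored as a mapping from item number to score. The item maxima come from the scoring sheet in Appendix 3a; the function and key names are hypothetical. Scoring every item at its maximum recovers the theoretical ranges: 22 for the HAM-D6, 24 for the HAM-D9, 6 for the HAM-D2 and 52 in total. The HAM-D6 therefore spans about 42% of the HAM-D17 range.

```python
from typing import Dict, Tuple

# Maximum score of each HAM-D17 item, taken from the scoring sheet (Appendix 3a).
ITEM_MAX: Dict[int, int] = {
    1: 4, 2: 4, 3: 4, 4: 2, 5: 2, 6: 2, 7: 4, 8: 4, 9: 4,
    10: 4, 11: 4, 12: 2, 13: 2, 14: 2, 15: 4, 16: 2, 17: 2,
}

# Allocation of the items to the three corners of the ABC triangle (Figure A2.1).
SUBSCALES: Dict[str, Tuple[int, ...]] = {
    "A: HAM-D6": (1, 2, 7, 8, 10, 13),              # pure depression
    "B: HAM-D9": (4, 5, 6, 9, 11, 12, 14, 15, 17),  # stress-related arousal
    "C: HAM-D2": (3, 16),                           # suicide risk behaviour
}

# Every item belongs to exactly one corner.
assert sorted(i for items in SUBSCALES.values() for i in items) == sorted(ITEM_MAX)


def abc_profile(scores: Dict[int, int]) -> Dict[str, int]:
    """Split a complete HAM-D17 rating into its A, B and C subscores plus the total."""
    if set(scores) != set(ITEM_MAX):
        raise ValueError("scores must cover exactly items 1-17")
    for item, value in scores.items():
        if not 0 <= value <= ITEM_MAX[item]:
            raise ValueError(f"item {item}: score {value} outside 0-{ITEM_MAX[item]}")
    profile = {name: sum(scores[i] for i in items) for name, items in SUBSCALES.items()}
    profile["HAM-D17"] = sum(scores.values())
    return profile


# Scoring every item at its maximum gives the theoretical score ranges.
maxima = abc_profile(ITEM_MAX)
print(maxima)
share = maxima["A: HAM-D6"] / maxima["HAM-D17"]
print(f"HAM-D6 share of the HAM-D17 range: {share:.0%}")
```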
The nine items covered by the HAM-D9 (B in Figure A2.1) measure the unspecific stress reaction in the body. Antidepressants with antihistamine effects are often superior to selective serotonin reuptake inhibitors (SSRIs) on the HAM-D9 items [5]. Activation of the hypothalamic-pituitary-adrenal (HPA) axis, resulting in high cortisol levels in the body, is a dysregulation that accompanies depression as an unspecific reaction, i.e., it should not be seen as the cause of primary depression. In the STAR*D study, the HAM-D9 explained 41% of the variance [5].
The increased risk of suicide discussed in connection with the initial phase of SSRI treatment of depressed patients might reflect an activation on the HAM-D9 relative to the HAM-D6. When prescribing SSRIs, it is therefore important to assess the ABC profile of the HAM-D17. In the daily routine therapy of patients with depressive illness, the ABC profile is the most valid way to monitor outcome.
For young doctors being trained in the use of the HAM-D17, the ABC profile is a simple way of recalling how the items in the HAM-D6, HAM-D9 and HAM-D2 are best applied. It is recommended to start the interview at corner B, as these unspecific symptoms are easiest to capture, then go on to A and finish with C. This order also mirrors the way in which a spontaneous PTSD syndrome develops: during the first weeks the HAM-D9 symptoms develop, and after some months the symptoms covered by the HAM-D6 appear. In PTSD cases that do not remit, the HAM-D2 symptoms should be carefully assessed.
The use of the ABC profile in the HAM-D interview should give the depressed patient a feeling of relief, as the interviewer seems to be thoroughly familiar with the kind of illness that confronts him and acquainted with the kind of feelings and thoughts that depression brings to the patient. This is a vital start of the treatment process in the patient-doctor relationship. The evaluation of the HAM-D9 items (unspecific arousal items) is important when measuring outcomes of antidepressive treatment because they might overlap with the side-effects of the medication prescribed. The use of a scale for the assessment of tolerable versus intolerable side-effects, as in the STAR*D study, is an important supplement to the ABC profile of the HAM-D17.
References
1 Bech P. Struggle for subtypes in primary and secondary depression and their mode-specific treatment or healing. Psychother Psychosom. 2010;79(6):33–38.
2 Selye H. The evolution of the stress concept. Am Sci. 1973;61:692–699.
3 Bech P, Olsen LR, Nimeus A. Psychometric scales in suicide risk assessment. In: Wasserman D, editor. Suicide – an unnecessary death. London: Martin Dunitz; 2001. p. 147–158.
4 Bech P. Fifty years with the Hamilton scales for anxiety and depression. A tribute to Max Hamilton. Psychother Psychosom. 2009;78(4):202–211.
5 Bech P, Fava M, Trivedi MH, Wisniewski SR, Rush AJ. Factor structure and dimensionality of the two depression scales in STAR*D using level 1 datasets. J Affect Disord. 2011;132:396–400.
Appendix 3a Hamilton Depression Scale (HAM-D17)
The time frame (window) is the past three days.
Scoring sheet
Nr. Symptom Score
1 * Depressed mood 0–4
2 * Low self-esteem, guilt 0–4
3 Suicidal thoughts 0–4
4 Insomnia: initial 0–2
5 Insomnia: middle 0–2
6 Insomnia: late 0–2
7 * Work and interests 0–4
8 * Psychomotor retardation 0–4
9 Psychomotor agitation 0–4
10 * Anxiety, psychic 0–4
11 Anxiety, somatic 0–4
12 Gastrointestinal symptoms (appetite) 0–2
13 * Somatic symptoms, general 0–2
14 Sexual disturbances 0–2
15 Hypochondriasis (somatisation) 0–4
16 Insight 0–2
17 Weight loss 0–2
Total score 0–52
* Depression factor (HAM-D6)
Sum
No depression: 0–7
Doubtful depression: 8–12
Mild depression: 13–17
Moderate depression: 18–24
Severe depression: 25–52
Hamilton Depression Scale (HAM-D17)
Manual
1. Depressed mood
This item covers both the verbal and the non-verbal communication of sadness, depression, despondency and hopelessness.
0: Absent.
1: Slight tendency to despondency or sadness.
2: Clearer indications of lowered mood, moderately depressed but no
hopelessness.
3: Mood significantly lowered, perhaps non-verbal signs (e.g. weeping).
Reports hopelessness.
4: Mood severely lowered, clear signs of hopelessness.
2. Self-depreciation and guilt feelings
This item covers lowered self-esteem with guilt feelings.
0: No self-depreciation or guilt feelings.
1: Lowered self-esteem in relation to family, friends or colleagues, feeling him-/herself to be a burden during the present depressive state.
2: Indications of guilt feelings more clearly present because the patient is
concerned with incidents in the past prior to current episode (minor
omissions or failings).
3: Feels that the current depressive suffering is some sort of punishment. However, still intellectually able to recognize that this is hardly correct.
4: Guilt feelings and impression that current depressive condition is a
punishment, cannot be persuaded otherwise (delusion).
3. Suicidal impulses
0: Absent.
1: The patient feels that life is not worthwhile, but he/she expresses no wish to die.
2: The patient wishes to die (e.g. not waking up the next morning), but has
no plans to take his/her own life.
3: Vague, but still active plans to take own life.
4: Has certain plans to take own life.
4. Initial insomnia
Ask about the last three nights irrespective of possible sedatives.
0: Absent.
1: At least on one night awake in bed more than half an hour trying to fall
asleep.
2: Each night awake in bed more than half an hour trying to fall asleep.
5. Middle insomnia
The patient wakes up one or more times between midnight and 5 a.m. Ask about the last three nights irrespective of possible sedatives.
0: Absent.
1: Wakes up once or twice during the last 3 nights.
2: Wakes up at least once every night.
6. Delayed insomnia (premature awakening)
The patient wakes up before planned. Ask about the last three nights irrespective of possible sedatives.
0: Absent.
1: Once woken up an hour or more before planned.
2: Consistently woken up an hour or more before planned.
7. Work and interests
0: No problems.
1: Slight problems with usual daily activities (at home or outside home).
2: More pronounced insufficiency but still only moderate.
3: Problems managing routine tasks, only completed with major effort.
Clear signs of helplessness.
4: Completely unable to go through with routine activities without aid,
i.e. extreme helplessness.
8. Psychomotor retardation
0: Absent.
1: Patient’s usual motor level of activity only slightly reduced.
2: Clearer signs of reduced motor activity, e.g. moderately reduced gesticu-
lation and slow pace or moderately slowed speech.
3: The interview is clearly prolonged or made difficult due to brief answers.
4: The interview is very difficult to complete due to verbal retardation and/or extremely reduced motor activity.
9. Psychomotor agitation
0: Absent.
1: Slight motor agitation, e.g. tendency to change position in chair or scratch head.
2: Clearer signs of motor agitation; wringing hands, moderate problem
sitting still in chair, but remains seated.
3: The patient gets up from chair once during interview.
4: The patient is so agitated that he/she has to get up and pace about several times during the interview.
10. Anxiety (psychic)
0: Absent.
1: Slight worrying and fear.
2: Clearer indications of psychic anxiety, appears moderately worried, inse-
cure or afraid, but still able to control insecurity.
3: Psychic anxiety and worry so pronounced that it is difficult for patient to
control; at times impact on daily activities.
4: Psychic anxiety very pronounced; constant impact on daily activities.
11. Anxiety (somatic)
This item includes physiological or autonomic anxiety phenomena. Psychic tension should be rated in item 10.
0: Absent.
1: Slight tendency to somatic anxiety symptoms such as stomach upset,
sweating or trembling.
2: Clearer indications of somatic tension, e.g. moderate stomach upset, palpitations, sweating or tremor. Still without impact on daily life.
3: Somatic anxiety so pronounced that the patient experiences difficulty
controlling this. At times impact on daily life.
4: Somatic anxiety extremely pronounced; fairly constant impact on daily life.
12. Somatic, Gastro-intestinal
This item covers symptoms from the entire gastro-intestinal tract. Dry mouth, loss of appetite and constipation are among the most frequent symptoms. Upset stomach (“butterflies in the stomach”) is an autonomic somatic anxiety manifestation to be assessed in item 11. A feeling that the “stomach disintegrates” is a nihilistic paranoid manifestation of hypochondriasis and should be assessed in item 15.
0: Absent.
1: Slightly reduced appetite or food intake about normal, but without
enjoyment.
2: Appetite moderately or extremely reduced. Still eats, as he/she recog-
nizes that this is important.
13. Somatic, General
This item is about feelings of fatigue and exhaustion, reduced energy, but also diffuse muscular aches and pains in neck, shoulders, back or limbs.
0: Absent.
1: Slight fatigue, muscle pains or perhaps headache.
2: Moderate or pronounced fatigue or muscle pains.
14. Sexual interest
This item is about reduced libido or interest. It is often difficult to approach, especially in older patients.
0: No disturbances.
1: Mild disturbances.
2: Moderate to severe disturbances.
15. Hypochondriasis
0: Absent.
1: Slight preoccupation with bodily functions.
2: Clear indications of concern as to somatic health. Appears moderately
afraid that he/she is somatically ill, somatises depression but at a “neurotic” level.
3: Hypochondriasis more pronounced. The patient is convinced that he/
she is suffering from somatic condition (e.g. fear of cancer), but can be
persuaded that this is not the case for a short while.
4: Hypochondriasis extremely pronounced, paranoid delusions. Often
nihilistic: “rotting insides”; “stomach disappearing”.
16. Loss of insight
This item is, of course, only meaningful if the observer is convinced that the patient is still in a depressive state at the interview.
0: The patient agrees to having depressive symptoms or a “nervous” illness.
1: The patient still agrees to being depressed, but feels this to be secondary
to non-illness related conditions like malnutrition, climate, overwork.
2: Denies being ill at all. Delusional patients are by definition without
insight. Enquiries should therefore be directed to the patient’s attitude to
his symptoms of Guilt (item 2) or Hypochondriasis (item 15), but other
delusional symptoms should also be considered.
17. Weight loss
Try to get objective information; if this is not available, be conservative in the estimate.
0: No weight loss.
1: Weight loss less than 2 kg.
2: Weight loss of 2 kg or more.
ABC version of the Hamilton Depression Scale (HAM-D)
(A) Pure depression (HAM-D6 total score): 1. Depressed mood; 2. Guilt; 7. Activities and interests; 8. Psychomotor retardation; 10. Anxiety, psychic; 13. Somatic symptoms – general
(B) Stress-related arousal (HAM-D9 total score): 4. Insomnia: initial; 5. Insomnia: middle; 6. Insomnia: late; 9. Psychomotor agitation; 11. Anxiety, somatic; 12. Gastrointestinal symptoms; 14. Sexual disturbances; 15. Hypochondriasis; 17. Weight loss
(C) Suicide risk behaviour (HAM-D2 total score): 3. Suicidal thoughts; 16. Insight
HAM-D17 total score: (A+B+C)
132
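The ABC grouping can be expressed as a small scoring routine. The sketch below is illustrative, not part of the published scale: it assumes item scores are supplied as a dictionary keyed by HAM-D17 item number, and the names `HAMD_ABC`, `HAMD17_MAX` and `score_hamd17_abc` are hypothetical. The item maxima (0–4 or 0–2) are taken from the HAM-D scoring sheet.

```python
# Hypothetical helper: HAM-D17 ABC subscale scoring (illustrative sketch).
HAMD_ABC = {
    "A_HAM-D6": [1, 2, 7, 8, 10, 13],              # pure depression
    "B_HAM-D9": [4, 5, 6, 9, 11, 12, 14, 15, 17],  # stress-related arousal
    "C_HAM-D2": [3, 16],                           # suicide risk behaviour
}

# Maximum score per item, taken from the HAM-D scoring sheet.
HAMD17_MAX = {1: 4, 2: 4, 3: 4, 4: 2, 5: 2, 6: 2, 7: 4, 8: 4, 9: 4,
              10: 4, 11: 4, 12: 2, 13: 2, 14: 2, 15: 4, 16: 2, 17: 2}


def score_hamd17_abc(items: dict[int, int]) -> dict[str, int]:
    """Return the A, B and C subscale totals and the HAM-D17 total."""
    for item, max_score in HAMD17_MAX.items():
        score = items[item]  # raises KeyError if an item is missing
        if not 0 <= score <= max_score:
            raise ValueError(f"item {item}: {score} outside 0-{max_score}")
    result = {name: sum(items[i] for i in members)
              for name, members in HAMD_ABC.items()}
    result["HAM-D17"] = sum(result.values())  # A + B + C
    return result
```

Scoring every item at its maximum gives A = 22, B = 24, C = 6 and a HAM-D17 total of 52, which matches the ranges on the scoring sheet.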
Appendix 3b Hamilton Depression Scale (HAM-D24)

Scoring sheet
No. Symptom Score
1 Depressed mood 0–4
2 Low self-esteem, guilt 0–4
3 Suicidal thoughts 0–4
4 Insomnia: initial 0–2
5 Insomnia: middle 0–2
6 Insomnia: late 0–2
7 Work and interests 0–4
8 Psychomotor retardation 0–4
9 Psychomotor agitation 0–4
10 Anxiety, psychic 0–4
11 Anxiety, somatic 0–4
12 Gastrointestinal symptoms (appetite) 0–2
13 Somatic symptoms, general 0–2
14 Sexual disturbances 0–2
15 Hypochondriasis (somatisation) 0–4
16 Insight 0–2
17 Weight loss 0–2
18 Diurnal variation 0–2
19 Depersonalization and derealisation 0–4
20 Paranoid symptoms 0–4
21 Obsessional and compulsive symptoms 0–2
22 Helplessness 0–4
23 Hopelessness 0–4
24 Worthlessness 0–4
Total score 0–76
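Because the HAM-D24 items do not share a common range, a scoring routine should check each item against its own maximum before summing. The sketch below is illustrative only; `HAMD24_MAX` and `hamd24_total` are hypothetical names, and the maxima are copied from the HAM-D24 scoring sheet.

```python
# Hypothetical sketch: HAM-D24 total with per-item range checks.
# Maximum score per item, as listed on the HAM-D24 scoring sheet.
HAMD24_MAX = {
    1: 4, 2: 4, 3: 4, 4: 2, 5: 2, 6: 2, 7: 4, 8: 4, 9: 4, 10: 4, 11: 4, 12: 2,
    13: 2, 14: 2, 15: 4, 16: 2, 17: 2, 18: 2, 19: 4, 20: 4, 21: 2, 22: 4,
    23: 4, 24: 4,
}


def hamd24_total(items: dict[int, int]) -> int:
    """Sum items 1-24 after checking each score against its range."""
    if set(items) != set(HAMD24_MAX):
        raise ValueError("expected scores for exactly items 1-24")
    for item, score in items.items():
        if not 0 <= score <= HAMD24_MAX[item]:
            raise ValueError(f"item {item}: {score} outside 0-{HAMD24_MAX[item]}")
    return sum(items.values())


# The theoretical maximum reproduces the 0-76 range on the scoring sheet.
print(hamd24_total(HAMD24_MAX))  # 76
```

Items 1–17 alone sum to a maximum of 52 (the HAM-D17 range); items 18–24 add the remaining 24 points.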
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
Hamilton Depression Scale (HAM-D24)
Manual
18. Diurnal variation
0: None.
1: Mild.
2: Severe.
19. Depersonalization and derealisation
Such as feelings of unreality and nihilistic ideas.
0: Absent.
1: Mild.
2: Moderate.
3: Severe.
4: Incapacitating.
20. Paranoid symptoms
0: None.
1: Suspicious.
2: Ideas of reference.
3: Delusions of reference and persecution.
4: Hallucinations.
21. Obsessional and compulsive symptoms
0: Absent.
1: Mild.
2: Severe.
22. Helplessness
0: Not present.
1: Patient reports mild feelings of helplessness.
2: Moderate feelings of helplessness.
3: Strong feelings of helplessness.
4: Strong feelings of helplessness AND has given up routine activities of
normal life (decreased personal hygiene, doesn’t get out of bed, difficulty
feeding self, etc.).
23. Hopelessness
Pessimistic about the future.
0: Not present.
1: Very mild feelings of hopelessness.
2: Feels “hopeless” but accepts reassurances.
3: Expresses feelings of discouragement, despair, pessimism about future,
which cannot be dispelled.
4: Inappropriately perseverates, “I’ll never get well” or equivalent.
24. Worthlessness
Ranges from mild loss of esteem, feelings of inferiority and self-deprecation to delusional notions of worthlessness.
0: Not present.
1: Very mild feelings of low self-esteem.
2: Feelings of worthlessness.
3: Strong feelings of worthlessness.
4: Delusions of worthlessness, “I am a sinner”.
Appendix 3c ABC Version of the Montgomery-Åsberg Depression Scale (MADRS10)

(A) Specific depression state (MADRS6)
1. Apparent sadness (0–6)
2. Reported sadness (0–6)
3. Inner tension (0–6)
7. Lassitude (0–6)
8. Inability to feel (0–6)
9. Pessimistic thoughts (0–6)
MADRS6 total score:

(B) Unspecific (arousal) state (MADRS3)
4. Reduced sleep (0–6)
5. Reduced appetite (0–6)
6. Concentration (0–6)
MADRS3 total score:

(C) Suicide risk behaviour (MADRS1)
10. Suicidal thoughts (0–6)

MADRS10 total score: (A+B+C)
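The MADRS10 ABC subscales follow the same pattern as the HAM-D17 version, with every item scored 0–6. The sketch below is a hypothetical illustration (`MADRS_ABC` and `score_madrs_abc` are not from the source); it assumes whole-number item scores keyed by MADRS item number.

```python
# Hypothetical helper: MADRS10 ABC subscales (illustrative sketch).
MADRS_ABC = {
    "A_MADRS6": [1, 2, 3, 7, 8, 9],  # specific depression state
    "B_MADRS3": [4, 5, 6],           # unspecific (arousal) state
    "C_MADRS1": [10],                # suicide risk behaviour
}


def score_madrs_abc(items: dict[int, int]) -> dict[str, int]:
    """Return the A, B and C subscale totals and the MADRS10 total."""
    if set(items) != set(range(1, 11)):
        raise ValueError("expected scores for items 1-10")
    if any(not 0 <= s <= 6 for s in items.values()):
        raise ValueError("MADRS items are scored 0-6")
    result = {name: sum(items[i] for i in members)
              for name, members in MADRS_ABC.items()}
    result["MADRS10"] = sum(result.values())  # A + B + C
    return result


example = {1: 4, 2: 4, 3: 3, 4: 4, 5: 2, 6: 3, 7: 3, 8: 2, 9: 3, 10: 1}
print(score_madrs_abc(example))  # A = 19, B = 9, C = 1, MADRS10 = 29
```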
Appendix 3d The Bech-Rafaelsen Melancholia Scale (MES)

The time frame (window) is the past three days.
Scoring sheet
No. Symptom Score
1 Depressed mood 0–4
2 Tiredness 0–4
3 Work and interests 0–4
4 Concentration difficulties 0–4
5 Sleep disturbances 0–4
6 Psychic anxiety 0–4
7 Emotional introversion 0–4
8 Worthless and guilt 0–4
9 Suicidal thoughts 0–4
10 Decreased verbal activity 0–4
11 Decreased motor activity 0–4
Total score 0–44
No depression: 0–6; Doubtful depression: 7–10; Mild depression: 11–14; Moderate depression: 15–24; Severe depression: 25–44
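The MES severity bands are contiguous and cover the whole 0–44 range, so a total can be classified by finding the first band whose upper bound it does not exceed. A minimal sketch using Python's standard `bisect` module; `MES_UPPER`, `MES_LABELS` and `mes_severity` are hypothetical names.

```python
import bisect

# Upper bounds of the MES severity bands (0-6, 7-10, 11-14, 15-24, 25-44).
MES_UPPER = [6, 10, 14, 24, 44]
MES_LABELS = ["no depression", "doubtful depression", "mild depression",
              "moderate depression", "severe depression"]


def mes_severity(items: list[int]) -> tuple[int, str]:
    """Sum the 11 MES items (each 0-4) and classify the total."""
    if len(items) != 11 or any(not 0 <= s <= 4 for s in items):
        raise ValueError("expected 11 item scores in the range 0-4")
    total = sum(items)
    # bisect_left returns the index of the first upper bound >= total.
    return total, MES_LABELS[bisect.bisect_left(MES_UPPER, total)]


print(mes_severity([2, 1, 2, 1, 1, 1, 1, 1, 0, 1, 1]))  # (12, 'mild depression')
```

`bisect_left` places a total equal to a band's upper bound (e.g. 6) in that band, which matches the inclusive ranges on the scoring sheet.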
The Bech-Rafaelsen Melancholia Scale (MES)
Manual
Item 1 Depressed mood
0: Not depressed
1: Slight tendencies to lowered spirits
2: More clearly preoccupied with unpleasant feelings although without
clear hopelessness
3: Markedly lowered mood. Feelings of hopelessness clearly present and/or
clear non-verbal signs of lowered mood
4: Severe degrees of lowered mood. Pronounced degree of hopelessness
Item 2 Tiredness
0: Not present
1: Very mild feelings of tiredness
2: More clearly in a state of tiredness or weakness, but still no impairment
of daily life activities
3: Marked feelings of tiredness which occasionally interfere with daily
life activities
4: Extreme feelings of tiredness which interfere more constantly with
daily life activities
Item 3 Work and interests
0: No difficulties in social life (work) activities or interests
1: Slight problems with usual daily activities (at home or outside home)
2: Clearer insufficiency in social life activities or interests, but without
helplessness
3: Difficulties in performing even daily routine activities, which are carried
out with great effort. Tendencies to helplessness
4: Completely unable to go through with routine activities without aid
from others, i.e. extreme helplessness
Item 4 Concentration difficulties
This item includes both concentration difficulties and memory problems
0: Not present
1: Very mild tendencies to concentration disturbances
2: Clearer difficulties in concentration or problems in decision
making, but still without impact on daily life activities
3: Concentration disturbances/memory problems so great that reading
more than newspaper headlines or watching even a short television
programme is difficult
4: It is clear even during the interview that there are difficulties in concentration
Item 5 Sleep disturbances
This item only covers the subjective experience of reduced sleep length
(hours of sleep/24 hours), irrespective of possible sedatives. The assessment
should be based on the three preceding nights; the score is the average of the
past three nights
0: No reduced sleep length
1: Duration of sleep slightly reduced
2: Duration of sleep clearly but still only moderately reduced, i.e. still less
than a 50% reduction
3: Duration of sleep reduced by 50% or more
4: Duration of sleep extremely reduced, e.g. as if not having slept at all
Item 6 Psychic anxiety
0: Not present
1: Very mild tendencies to worry, feeling fear or apprehension
2: More clearly in a state of worrying, feeling insecure or afraid, which,
however, it is still possible to control
3: The psychic anxiety or apprehension is at times more difficult to control.
On the edge of panic
4: Extreme degree of anxiety, interfering greatly with the daily life activities
Item 7 Emotional introversion
0: Not present
1: Very mild tendencies to draw back from emotional contact with other
people, e.g. colleagues
2: More clearly emotionally introverted towards other people, apart from close
friends or family members
3: Moderately to markedly introverted even towards close friends or family
members
4: Is isolated or emotionally introverted to an extreme degree
Item 8 Worthless and guilt
0: No loss of self-esteem, no self-depreciation or guilt feelings
1: Is concerned with the experience of being a burden to family, friends
or colleagues due to reduced interests or introversion
2: Focussing on negative events in the past prior to the current episode of
depression. However, still to a mild degree
3: More clearly focussed on negative events in the past, accompanied by
the feeling that the current depression is a kind of punishment for previous
omissions or failures. However, can intellectually still see that this
view is unfounded
4: The guilt feelings have become paranoid ideas
Item 9 Suicidal thoughts
0: Not present
1: Feels that life is not worthwhile, but expresses no wish to die
2: Wishes to die (“it would be a relief not to wake up next morning”) but
has no plans to take own life.
3: Probably has plans to take own life
4: Definitely has plans to take own life
Item 10 Decreased verbal activity
0: Not present
1: Very mild problems in verbal formulation
2: More pronounced inertia in conversation, for example, a trend to longer
pauses
3: Interview is clearly influenced by brief responses or longer pauses
4: Interview is clearly prolonged due to decreased verbal formulation
activity
Item 11 Decreased motor activity
0: Not present
1: Very mild tendencies to decreased motor activity, for example, facial
expression slightly reduced
2: Moderately reduced motor activity, e.g. reduced gestures
3: Markedly reduced motor activity, e.g. all movements slow
4: Severely reduced motor activity, approaching stupor
Appendix 3e ABC version of the SCL-92 analogue with HAM-D17

(A) Pure depression (SCL-6)
30. Feeling blue
26. Blaming yourself for things
32. Feeling no interest in things
71. Feeling everything is an effort
31. Worrying too much about things
14. Feeling low in energy or slowed down
SCL-6 total score:

(B) Stress-related arousal (SCL-9)
44. Trouble falling asleep
66. Sleep that is restless or disturbed
64. Awakening in the early morning
78. Feeling so restless you couldn’t sit still
2. Nervousness or shaking inside
57. Feeling tense or keyed up
19. Poor appetite
5. Loss of sexual interest or pleasure
87. The idea that something serious is wrong with your body
SCL-9 total score:

(C) Suicide risk behaviour (SCL-2)
59. Thoughts of death or dying
15. Thoughts of ending your life
SCL-2 total score:
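Because the SCL-92 subscales are defined by questionnaire item numbers rather than by consecutive items, a lookup table keyed by item number is a natural way to score them. The sketch below is hypothetical: `SCL_ABC` and `score_scl_abc` are illustrative names, and it assumes responses are coded 0 (Not at all) to 4 (Extremely), the usual SCL coding, which this appendix does not state explicitly.

```python
# Hypothetical sketch: SCL-92 ABC subscales keyed by questionnaire item number.
SCL_ABC = {
    "A_SCL-6": [30, 26, 32, 71, 31, 14],            # pure depression
    "B_SCL-9": [44, 66, 64, 78, 2, 57, 19, 5, 87],  # stress-related arousal
    "C_SCL-2": [59, 15],                            # suicide risk behaviour
}


def score_scl_abc(responses: dict[int, int]) -> dict[str, int]:
    """Sum the 0-4 responses for each ABC subscale; other items are ignored."""
    result = {}
    for name, members in SCL_ABC.items():
        scores = [responses[i] for i in members]  # KeyError if an item is missing
        if any(not 0 <= s <= 4 for s in scores):
            raise ValueError(f"{name}: responses must be coded 0-4")
        result[name] = sum(scores)
    return result


# All 92 items answered "A little bit" (assumed code 1).
print(score_scl_abc({i: 1 for i in range(1, 93)}))
```

Under the assumed 0–4 coding, the subscale ranges are 0–24 (SCL-6), 0–36 (SCL-9) and 0–8 (SCL-2).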
Appendix 3f HAM-D6 – clinician version

The time frame (window) is the past three days.
Hamilton Depression Subscale and item definitions
1 DEPRESSED MOOD Score
0 Not present.
1 Very mild tendencies towards lowered spirits.
2 Moderate signs of being depressed.
3 Markedly depressed. Some hopelessness and/or clear non-verbal signs of depression.
4 Severe degree of lowered mood. Pronounced hopelessness.
2 LOW SELF-ESTEEM AND GUILT
0 No self-depreciation, low self-esteem or guilt feelings.
1 Concerned with the fact of being a burden to the family, friends or colleagues.
2 Signs of guilt feelings about incidents (minor omissions or failures) prior to current episode of depression.
3 Feels that current depression is a punishment for failures or omissions in the past.
4 Feels that the current depression is a well-deserved punishment.
3 WORK AND INTERESTS
0 No difficulties; time feels useful.
1 Mild insufficiencies in social and day-to-day activities.
2 Moderate signs of lack of interest in doing things or day-to-day activities.
3 Difficulties in performing even daily routine activities which are carried out with great effort.
4 Often needs help in performing self-care activities (unable to function independently).
4 PSYCHOMOTOR RETARDATION, GENERAL
0 Normal psychomotor condition.
1 Motoric speed slightly reduced.
2 Clear signs of reduced speed, e.g. reduced gestures, facial expression and slow pace.
3 The interview is clearly prolonged due to long breaks and brief answers.
4 The interview can hardly be completed, or cannot be completed at all, due to retardation.
5 PSYCHIC ANXIETY
0 Not present.
1 Mild tendencies towards tenseness, worry, fear or apprehension.
2 Moderate anxiety, apprehension or insecurity.
3 Difficulty controlling anxiety or apprehension; sometimes at the edge of panic.
4 Extreme degree of anxiety.
6 TIREDNESS AND PAINS
0 Not present.
1 Doubtful or very vague feelings of tiredness or pain.
2 Moderate to severe tiredness or pains.
Total score
Sum (HAM-D6): No depression: 0–4; Depression doubtful: 5–6; Mild depression: 7–8; Moderate depression: 9–11; Severe depression: 12–22
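The HAM-D6 severity bands can be applied with a short lookup. This is a hypothetical sketch (`HAMD6_MAX`, `HAMD6_BANDS` and `hamd6_severity` are not from the source): items 1–5 are checked against 0–4 and item 6 against 0–2, so the total cannot exceed 22 and the last band always matches.

```python
# Hypothetical helper: HAM-D6 total and severity band (illustrative sketch).
HAMD6_MAX = [4, 4, 4, 4, 4, 2]  # items 1-5 are scored 0-4, item 6 is scored 0-2
HAMD6_BANDS = [  # (upper bound of band, label), in ascending order
    (4, "no depression"),
    (6, "depression doubtful"),
    (8, "mild depression"),
    (11, "moderate depression"),
    (22, "severe depression"),
]


def hamd6_severity(items: list[int]) -> tuple[int, str]:
    """Sum the six HAM-D6 items and return (total, severity label)."""
    if len(items) != 6 or any(not 0 <= s <= m for s, m in zip(items, HAMD6_MAX)):
        raise ValueError("expected six item scores within their ranges")
    total = sum(items)
    # First band whose upper bound is not exceeded; always found since total <= 22.
    label = next(lbl for upper, lbl in HAMD6_BANDS if total <= upper)
    return total, label


print(hamd6_severity([3, 2, 2, 1, 2, 1]))  # (11, 'moderate depression')
```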
Appendix 3g The HAM-D 6 Questionnaire
In this questionnaire you will find six groups of statements. Please choose the
one statement in each group that best describes how you have been feeling
over the past three days, including today, and mark it with an X in the
corresponding box.
(1) During the past three days
I have been in my usual good mood 0
I have felt a little more sad than usual 1
I have been clearly more sad than usual, but haven’t felt hopeless 2
I have been so gloomy that I briefly have felt overpowered by hopelessness 3
I have been so low in my moods that everything seems dark and hopeless 4
(2) During the past three days
I have been quite satisfied with myself 0
I have been a little more self-critical than usual with a tendency to feel less worthy than others 1
I have been brooding over my failures in the past 2
I have been plagued with distressing guilt feelings 3
I have been convinced that my current condition is a punishment 4
(3) During the past three days
My daily activities have been as usual 0
I have been less interested in my usual activities 1
I have felt that I have had difficulty performing my daily activities, but I was still able to perform them with great effort 2
I have had difficulty performing even simple routine activities 3
I have not been able to do any of the most simple day-to-day activities without help 4
(4) During the past three days
I have felt neither restless nor slowed down 0
I have felt a little slowed down 1
I have felt rather slowed down or have been talking a little less than usual 2
I have felt clearly slowed down or subdued or have talked much less than usual 3
I have hardly been talking at all or felt extremely slowed down all the time 4
(5) During the past three days
I have been calm and relaxed 0
I have felt a little more tense or insecure than usual 1
I have been clearly more worried or tense than usual, but have not felt that I lost control 2
I have been so tense or worried that I have briefly felt close to panic 3
I have had episodes where I was overwhelmed by panic 4
(6) During the past three days
I have been as active and have had as much energy as usual 0
I have felt rather low in energy or physically unwell with some bodily pains 1
I have felt very low in energy or had bodily pains 2
Appendix 3h SCL-D6 subscale for depression

In this questionnaire please mark with an X how you have been feeling over
the past week, including today.

How much were you bothered by:
(Not at all | A little bit | Moderately | Quite a bit | Extremely)
(30) Feeling blue
(26) Blaming yourself for things
(31) Worrying too much about things
(71) Feeling everything is an effort
(14) Feeling low in energy or slowed down
(32) Feeling no interest in things
SCL-D6
Appendix 3i The BDI6 subscale for depression

In this questionnaire you will find six groups of statements. Please choose the
one statement in each group (A, B, C or D) that best describes how you have
been feeling over the past three days, including today, and mark it with an X
in the corresponding box (A, B, C or D).

1
A I do not feel sad
B I feel sad and depressed
C I feel constantly sad and depressed and feel unable to get out of it
D I feel so blue and unhappy that I cannot bear it

5
A I don’t feel particularly guilty
B I feel bad or unworthy a good part of the time
C I feel quite guilty
D I feel constantly as though I am guilty and worthless
11
A I am no more irritable now than I ever was
B I get annoyed or irritable more easily than I used to
C I feel irritated all the time
D I don’t get irritated at all about the things that used to irritate me

13
A I make decisions about as well as ever
B I try to put off making decisions
C I have great difficulty in making decisions
D I cannot make any decisions at all anymore

15
A I can work about as well as before
B It takes extra effort to get started at doing something
C I have to push myself very hard to do anything
D I can’t do any work at all

17
A I don’t get more tired than usual
B I get tired more easily than I used to
C I get tired from doing anything
D I get too tired to do anything
Appendix 4a Major Depression Inventory

The following questions ask about how you have been feeling over the last
two weeks. Please put a tick in the box which is closest to how you have been
feeling. A higher number signifies a higher degree of depression.

How much of the time in the last two weeks …
All the time (5) | Most of the time (4) | Slightly more than half the time (3) | Slightly less than half the time (2) | Some of the time (1) | At no time (0)

1 Have you felt low in spirits or sad? 5 4 3 2 1 0
2 Have you lost interest in your daily activities? 5 4 3 2 1 0
3 Have you felt lacking in energy and strength? 5 4 3 2 1 0
4 Have you felt less self-confident? 5 4 3 2 1 0
5 Have you had a bad conscience or feelings of guilt? 5 4 3 2 1 0
6 Have you felt that life wasn’t worth living? 5 4 3 2 1 0
7 Have you had difficulty in concentrating, e.g. when reading the newspaper or watching TV? 5 4 3 2 1 0
8a Have you felt very restless? 5 4 3 2 1 0
8b Have you felt subdued or slowed down? 5 4 3 2 1 0
9 Have you had trouble sleeping at night? a: too little sleep b: too much sleep 5 4 3 2 1 0
10a Have you suffered from reduced appetite? 5 4 3 2 1 0
10b Have you suffered from increased appetite? 5 4 3 2 1 0
Total score
Major Depression Inventory (MDI): Scoring key
At the top the diagnostic demarcation line is indicated. The total score of the
10 items is filled in below.

The diagnostic demarcation line
How much of the time …
All the time (5) | Most of the time (4) | Slightly more than half the time (3) | Slightly less than half the time (2) | Some of the time (1) | At no time (0)

Core symptoms
1 Have you felt low in spirits or sad? 5 4 3 2 1 0
2 Have you lost interest in your daily activities? 5 4 3 2 1 0
3 Have you felt lacking in energy and strength? 5 4 3 2 1 0

Accompanying symptoms
4 Have you felt less self-confident? 5 4 3 2 1 0
5 Have you had a bad conscience or feelings of guilt? 5 4 3 2 1 0
6 Have you felt that life wasn’t worth living? 5 4 3 2 1 0
7 Have you had difficulty in concentrating, e.g. when reading the newspaper or watching TV? 5 4 3 2 1 0
Highest score (8a or 8b):
8a Have you felt restless? 5 4 3 2 1 0
8b Have you felt subdued or slowed down? 5 4 3 2 1 0
9 Have you had difficulty sleeping at night? a: too little sleep b: too much sleep 5 4 3 2 1 0
Highest score (10a or 10b):
10a Have you suffered from reduced appetite? 5 4 3 2 1 0
10b Have you suffered from increased appetite? 5 4 3 2 1 0

Total score (items 1–10)
Diagnosis: ICD-10 ___________________ DSM-IV ___________________
Major Depression Inventory (MDI): A depression questionnaire with a dual function
MDI: Scoring instructions
The questionnaire consists of the ten symptoms contained in the World
Health Organization’s (WHO) depression demarcation. WHO employs the last
two weeks as the period of time in which to assess whether each symptom
has been present for more than half the time. These symptoms are mainly
subjective; it is therefore natural to ask the patient to complete the
questionnaire, ticking each symptom. A higher number signifies a
more constant presence of the symptom in question. Remember to fill in the
patient’s name and the date.
The patient’s completed questionnaire is scored using the scoring key.
The MDI (Major Depression Inventory) has a dual function, as it is scored both
(A) as an instrument of severity, similar to the Hamilton Depression Scale,
and (B) as a diagnostic tool.
(A) If the MDI is used as a rating scale in the same way as the Hamilton scales,
the sum of the ten items indicates the degree of depression. For
items 8 and 10, which each have two answer categories, (a) and (b), the higher
score is used. The theoretical score range is thus from 0 (no depression)
to 50 (maximum depression).
Mild depression: MDI total score from 21 to 25
Moderate depression: MDI total score from 26 to 30
Severe depression: MDI total score of 31 or higher
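The rating-scale use of the MDI, taking the higher of (a) and (b) for items 8 and 10 and then applying the severity bands, can be sketched as follows. The dictionary keys (`"1"`…`"7"`, `"8a"`, `"8b"`, `"9"`, `"10a"`, `"10b"`) and the function names are hypothetical conventions. Because the instructions give no label for totals under 21, the sketch returns `None` there rather than inventing one.

```python
from typing import Optional


def mdi_total(scores: dict[str, int]) -> int:
    """Sum the ten MDI items (0-5 each); items 8 and 10 use the higher of (a)/(b).

    The keys "1".."7", "8a", "8b", "9", "10a", "10b" are a hypothetical convention.
    """
    if any(not 0 <= s <= 5 for s in scores.values()):
        raise ValueError("MDI items are scored 0-5")
    items = [scores[str(i)] for i in range(1, 8)]
    items.append(max(scores["8a"], scores["8b"]))
    items.append(scores["9"])
    items.append(max(scores["10a"], scores["10b"]))
    return sum(items)


def mdi_severity(total: int) -> Optional[str]:
    """Map an MDI total (0-50) to the severity bands; None below 21."""
    if total >= 31:
        return "severe depression"
    if total >= 26:
        return "moderate depression"
    if total >= 21:
        return "mild depression"
    return None  # the instructions give no band for totals under 21


example = {"1": 4, "2": 4, "3": 3, "4": 2, "5": 2, "6": 1, "7": 3,
           "8a": 1, "8b": 3, "9": 2, "10a": 0, "10b": 1}
total = mdi_total(example)
print(total, mdi_severity(total))  # 25 mild depression
```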
(B) MDI as a diagnostic tool: the vertical line (the diagnostic demarcation
line) is used as indicated above. The three top symptoms, which reflect
the core symptoms of the WHO/ICD-10 diagnosis of depression, must
have been present for most of the time during the last two weeks. The
accompanying symptoms in the remaining seven MDI items must have
been present for more than half of the time during the last two weeks.
The ICD-10 algorithm:
Mild depression: 2 core symptoms and 2 accompanying symptoms
Moderate depression: 2 core symptoms and 4 accompanying symptoms
Severe depression: 3 core symptoms and 5 accompanying symptoms.
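The ICD-10 algorithm can be sketched by counting core items scored at least 4 ("most of the time" or more) and accompanying items scored at least 3 ("more than half the time"). This is a hypothetical illustration, not a validated implementation: the key names are assumed, and the symptom counts in the algorithm are read as minimums, checked from the most severe category downwards.

```python
from typing import Optional


def mdi_icd10(scores: dict[str, int]) -> Optional[str]:
    """ICD-10 depression category from MDI item scores (0-5), or None.

    Keys "1".."7", "8a", "8b", "9", "10a", "10b" are a hypothetical convention.
    """
    item = {i: scores[str(i)] for i in range(1, 8)}
    item[8] = max(scores["8a"], scores["8b"])
    item[9] = scores["9"]
    item[10] = max(scores["10a"], scores["10b"])
    # Core items 1-3 count if present "most of the time" or more (score >= 4).
    core = sum(item[i] >= 4 for i in (1, 2, 3))
    # Accompanying items 4-10 count if present "more than half the time" (score >= 3).
    accompanying = sum(item[i] >= 3 for i in range(4, 11))
    # Counts are read as minimums; the most severe category is checked first.
    if core >= 3 and accompanying >= 5:
        return "severe depression"
    if core >= 2 and accompanying >= 4:
        return "moderate depression"
    if core >= 2 and accompanying >= 2:
        return "mild depression"
    return None


example = {"1": 4, "2": 5, "3": 2, "4": 3, "5": 4, "6": 0, "7": 3,
           "8a": 0, "8b": 3, "9": 1, "10a": 0, "10b": 0}
print(mdi_icd10(example))  # moderate depression
```

In the example, items 1 and 2 pass the core threshold and items 4, 5, 7 and 8 pass the accompanying threshold, giving 2 + 4.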
The MDI can also be used to diagnose DSM-IV major depression.
DSM-IV uses only nine symptoms, because MDI item 4 is included in item 5
in DSM-IV; the higher of the two item scores is therefore used.
The DSM-IV algorithm: 5 of the 9 symptoms should be present, and at least one
of these should be one of the first two items, which according to DSM-IV are the core symptoms.
A more precise major depression diagnosis depends on the answers to item 9
(a or b) and item 10 (a or b):
Major depression without inverse neurovegetative symptoms: a score on 9a and 10a.
Major depression with inverse neurovegetative symptoms: a score on 9b and 10b.
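The DSM-IV algorithm reduces the MDI to nine symptoms and then counts how many pass their thresholds. The sketch below takes the thresholds from Appendix 4b (first three items at least 4, last six at least 3); the key names and the function name are hypothetical, and the sketch does not handle missing items.

```python
def mdi_dsm4_major_depression(scores: dict[str, int]) -> bool:
    """True if MDI item scores (0-5) meet the DSM-IV major depression algorithm.

    Keys "1".."7", "8a", "8b", "9", "10a", "10b" are a hypothetical convention.
    """
    nine = [
        scores["1"], scores["2"], scores["3"],
        max(scores["4"], scores["5"]),  # DSM-IV includes item 4 in item 5
        scores["6"], scores["7"],
        max(scores["8a"], scores["8b"]),
        scores["9"],
        max(scores["10a"], scores["10b"]),
    ]
    # First three items count if scored >= 4, the last six if scored >= 3.
    present = [s >= 4 for s in nine[:3]] + [s >= 3 for s in nine[3:]]
    # At least 5 of 9 symptoms, including at least one of the two core items.
    return sum(present) >= 5 and any(present[:2])


example = {"1": 4, "2": 5, "3": 2, "4": 3, "5": 4, "6": 0, "7": 3,
           "8a": 0, "8b": 3, "9": 1, "10a": 0, "10b": 0}
print(mdi_dsm4_major_depression(example))  # True (5 of 9 symptoms present)
```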
Appendix 4b Dealing with missing values in the Major Depression Inventory (MDI)
A. As a rating scale (total score)
1. Items 8a and 8b: use the higher score.
2. Items 10a and 10b: use the higher score.
3. When one or two of the resulting ten items are missing, the total
score is calculated as (sum of the available items) / (number of available items) × 10.
4. If more than two of the ten items are missing, do not calculate a
total score.
B. As a diagnostic tool
1. As in the first two steps of section A.
2. Items 4 and 5: use the higher score.
3. For the resulting nine items:
a) For the first 3 items: a score ≥ 4 = 1, a score < 4 = 0
b) For the last 6 items: a score ≥ 3 = 1, a score < 3 = 0
4. Major depression is present if the sum of the 9 items is ≥ 5 and the sum of
the first two items is ≥ 1.
5. Major depression can be ruled out if the sum of the 9 items < 5 or the sum
of the first two items = 0.
6. Thus, theoretically, major depression can be confirmed when there are
fewer than 5 missing items.
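The prorating rule in section A can be written as a small function. This is a minimal sketch under two assumptions: items 8 and 10 have already been reduced to their higher sub-score (steps 1 and 2), and "only two missing" is read as "one or two missing". `mdi_total_with_missing` is a hypothetical name; `None` marks a missing item.

```python
from typing import Optional


def mdi_total_with_missing(items: list[Optional[int]]) -> Optional[float]:
    """Prorate the MDI total when one or two of the ten items are missing.

    `items` holds the ten item scores after steps 1 and 2 (items 8 and 10
    already reduced to their higher sub-score); None marks a missing item.
    """
    if len(items) != 10:
        raise ValueError("expected ten item scores")
    answered = [s for s in items if s is not None]
    missing = 10 - len(answered)
    if missing == 0:
        return float(sum(answered))
    if missing > 2:
        return None  # too many missing items: omit the total score
    # (sum of the available items) / (number of available items) * 10
    return sum(answered) / len(answered) * 10


print(mdi_total_with_missing([3, 3, 3, 3, 3, 3, 3, 3, None, None]))  # 30.0
```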
Appendix 5a Hamilton Anxiety Scale HAM-A14

The time frame (window) is the past three days.
Scoring sheet
No. Symptom Score
1 Anxious mood 0–4
2 Tension 0–4
3 Fears 0–4
4 Insomnia 0–4
5 Difficulties in concentration and memory 0–4
6 Depressed mood 0–4
7 General somatic symptoms (muscular symptoms) 0–4
8 General somatic symptoms (sensory symptoms) 0–4
9 Cardiovascular symptoms 0–4
10 Respiratory symptoms 0–4
11 Gastrointestinal symptoms 0–4
12 Genito-urinary symptoms 0–4
13 Other autonomic symptoms 0–4
14 Behaviour during interview 0–4
Total score 0–56
Symptoms are scored from 0 to 4: 0 = not present; 1 = mild degree; 2 = moderate degree; 3 = marked degree; 4 = maximum degree.
Sum: 6 to 14 = mild anxiety; 15 to 28 = moderate anxiety; 29 to 52 = severe anxiety.
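The HAM-A14 severity bands can be applied in the same way as the depression scales. Note that the scoring sheet lists bands only for totals from 6 to 52, while the possible total is 0–56; rather than guess how the remaining totals should be labelled, this hypothetical sketch (`HAMA_BANDS`, `hama_severity`) returns `None` for them.

```python
from typing import Optional

# (lowest total, highest total, label) as printed on the HAM-A14 scoring sheet.
HAMA_BANDS = [(6, 14, "mild anxiety"), (15, 28, "moderate anxiety"),
              (29, 52, "severe anxiety")]


def hama_severity(items: list[int]) -> tuple[int, Optional[str]]:
    """Sum the 14 HAM-A items (each 0-4) and return (total, band label or None)."""
    if len(items) != 14 or any(not 0 <= s <= 4 for s in items):
        raise ValueError("expected 14 item scores in the range 0-4")
    total = sum(items)
    for low, high, label in HAMA_BANDS:
        if low <= total <= high:
            return total, label
    return total, None  # total falls outside the printed bands (0-5 or 53-56)


print(hama_severity([2, 2, 1, 2, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]))  # (15, 'moderate anxiety')
```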
Hamilton Anxiety Scale (HAM-A14)
Manual
1. Anxiety
This item covers the emotional condition of uncertainty about the future,
ranging from worry, insecurity, irritability and apprehension to overpowering
dread. The patient’s report of worrying, insecurity, uncertainty, fear and
panic, i.e., the psychic or mental (‘central’) anxiety experience, is weighed.
0: The patient is neither more nor less insecure or irritable than usual.
1: The patient reports more tension or irritability, or feeling more insecure,
than usual.
2: The patient more clearly expresses being in a state of anxiety,
apprehension or irritability, which he may find difficult to control. It is,
however, still without influence on the patient’s daily life, because the worrying
is still about minor matters.
3: The anxiety or insecurity is at times more difficult to control because
the worrying is about major injuries or harms which might occur in the
future. E.g., the anxiety may be experienced as panic, i.e., overpowering
dread, and has occasionally interfered with the patient’s daily life.
4: The feeling of dread is present so often that it markedly interferes with
the patient’s daily life.
2. Tension
This item includes inability to relax, nervousness, bodily tensions, trembling
and restless fatigue.
0: The patient is neither more nor less tense than usual.
1: The patient indicates being somewhat more nervous and tense than usual.
2: The patient clearly expresses being unable to relax and full of inner unrest,
which he finds difficult to control, but still without influence on the
patient’s daily life.
3: The inner unrest and nervousness is so intense or so frequent that it
has occasionally interfered with the patient’s daily work.
4: Tensions and unrest interfere with the patient’s life and work at all times.
3. Fears
A type of anxiety which arises when the patient finds himself in special
situations. Such situations may be open or closed spaces, queuing, or riding a bus
or a train. The patient experiences relief by avoiding such situations. It is
important to note in this evaluation whether there has been more phobic
anxiety during the present episode than usual.
0: Not present.
1: Doubtful if present.
2: The patient has experienced phobic anxiety, but was able to fight it.
3: It has been difficult for the patient to fight or overcome his phobic
anxiety, which has thus to a certain extent interfered with the patient’s
daily life and work.
4: The phobic anxiety has clearly interfered with the patient’s daily life
and work.
4. Insomnia
This item covers only the patient’s subjective experience of sleep length (hours
of sleep per 24-hour period) and sleep depth (superficial and interrupted
sleep versus deep and steady sleep). The rating is based on the three preceding
nights. Note: administration of hypnotics or sedatives shall be disregarded.
0: Usual sleep length and sleep depth.
1: Sleep length is doubtfully or slightly reduced (e.g., due to difficulties
falling asleep), but no change in sleep depth.
2: Sleep depth is now also reduced, sleep being more superficial. Sleep as a
whole is somewhat disturbed.
3: Sleep duration as well as sleep depth is markedly changed. The broken
sleep periods total only a few hours per 24-hour period.
4: It is difficult here to ascertain sleep duration, as sleep is so shallow that
the patient speaks of short periods of slumber or dozing, but no real sleep.
5. Difficulties in concentration and memory
This item covers difficulties in concentration, making decisions about
everyday matters, and memory.
0: The patient has neither more nor less difficulty in concentration and/or
memory than usual.
1: It is doubtful whether the patient has difficulties in concentration and/or
memory.
2: Even with a major effort it is difficult for the patient to concentrate on
his daily routine work.
3: More pronounced difficulties with concentration, memory, or decision
making. E.g., has difficulties reading an article in a newspaper or watching
a television programme right through. Score 3 as long as the loss of
concentration or poor memory has not clearly influenced the interview.
4: When the patient has shown difficulty in concentration and/or memory during
the interview, and/or when decisions are reached with considerable delay.
6. Depressed mood
This item covers both the verbal and the non-verbal communication of
sadness, depression, despondency and hopelessness.
0: Natural mood.
1: When it is doubtful whether the patient is more despondent or sad than
usual, e.g., the patient vaguely indicates being more depressed than usual.
2: When the patient is more clearly concerned with unpleasant
experiences, although still without hopelessness.
3: The patient shows clear non-verbal signs of depression and/or
hopelessness.
4: The patient’s verbal and non-verbal expressions of despondency dominate
the interview, and the patient cannot be distracted.
7. General somatic symptoms (muscular symptoms)
This item includes weakness, stiffness and soreness merging into real pain,
which is more or less diffusely localised in the muscles, e.g., jaw ache or
neck ache.
0: The patient is neither more nor less sore or stiff in his muscles than usual.
1: The patient indicates being somewhat more sore or stiff in his muscles
than usual.
2: The symptoms have gained the character of pain.
3: The muscle pains interfere to some extent with the patient’s daily life
and work.
4: The muscle pains are present most of the time and interfere clearly with
the patient’s daily life and work.
8. General somatic symptoms (sensory symptoms)
This item includes increased fatigability and weakness merging into real
functional disturbances of the senses, including tinnitus, blurring of vision,
hot and cold flushes and prickling sensations.
0: Not present.
1: It is doubtful whether the patient’s indications of pressing or prickling
sensations (e.g., in ears, eyes or skin) are more pronounced than usual.
2: The pressing sensations in the ear reach the character of buzzing in the
ears, in the eye as visual disturbances, and in the skin as prickling or
itching sensations (paraesthesias).
3: The generalised sensory symptoms interfere to some extent with the
patient’s daily life and work.
4: The generalised sensory symptoms are present most of the time and
interfere clearly with the patient’s daily life and work.
9. Cardiovascular symptoms
This item includes tachycardia, palpitations, oppression, chest pain, throbbing
in the blood vessels, and feelings of fainting.
0: Not present.
1: Doubtful if present.
2: Cardiovascular symptoms are present, but the patient can still control the
symptoms.
3: The patient now and again has difficulty controlling the
cardiovascular symptoms, which thus to some extent interfere with the
patient’s daily life and work.
4: The cardiovascular symptoms are present most of the time and interfere
clearly with the patient’s daily life and work.
10. Respiratory symptoms
This item includes feelings of constriction or contraction in the throat or chest,
dyspnoea merging into choking sensations, and sighing respiration.
0: Not present.
1: Doubtful if present.
2: Respiratory symptoms are present, but the patient can still control the
symptoms.
3: The patient now and again has difficulty controlling the respiratory
symptoms, which thus to some extent interfere with the patient’s daily
life and work.
4: The respiratory symptoms are present most of the time and interfere
clearly with the patient’s daily life and work.
11. Gastro-intestinal symptoms
This item includes difficulties in swallowing, ‘sinking’ sensation in the stomach,
dyspepsia (heartburn or burning sensations in the stomach, abdominal
pains related to meals, fullness, nausea and vomiting), abdominal rumbling
and diarrhoea.
0: Not present.
1: Doubtful if present (or doubtful if different from the patient’s ordinary
gastrointestinal sensations).
2: One or more of the above-mentioned gastro-intestinal symptoms are
present, but the patient can still control the symptoms.
3: The patient now and again has difficulty controlling the gastrointestinal
symptoms, which thus to some extent interfere with the patient’s daily life
and work. E.g., a tendency to lose control over the bowels.
4: The gastrointestinal symptoms are present most of the time and
interfere clearly with the patient’s daily life and work. E.g., losing control
over the bowels.
12. Genito-urinary symptoms
This item includes non-organic or psychic symptoms such as frequent
or more pressing passing of urine, menstrual irregularities, anorgasmia,
dyspareunia, premature ejaculation and loss of erection.
0: Not present.
1: Doubtful if present (or doubtful if different from the ordinary genito-urinary
sensations).
2: One or more of the above-mentioned genito-urinary symptoms are
present, but they do not interfere with the patient’s daily life and work.
3: The patient now and again has one or more of the above-mentioned
genito-urinary symptoms to such a degree that they to some extent
interfere with the patient’s daily life and work, e.g., a tendency to lose
control over micturition.
4: The genito-urinary symptoms are present most of the time and interfere
clearly with the patient’s daily life and work, e.g., loss of control over
micturition.
13. Autonomic symptoms
This item includes dryness of mouth, blushing or pallor, sweating and dizziness.
0: Not present.
1: Doubtful if present.
2: One or more of the above-mentioned autonomic symptoms are present,
but they do not interfere with the patient’s daily life and work.
3: The patient now and again has one or more of the above-mentioned
autonomic symptoms to such a degree that they to some extent interfere
with the patient’s daily life and work.
4: The autonomic symptoms are present most of the time and interfere
clearly with the patient’s daily life and work.
14. Behaviour at interview
This item is based on the patient’s behaviour during the interview. Did the patient
appear tense, nervous, agitated, restless, fidgety, tremulous, pale, hyperventilating,
or sweating?
On the basis of such observations a global estimate is made:
0: The patient does not appear anxious.
1: It is doubtful whether the patient is anxious.
2: The patient is moderately anxious.
3: The patient is clearly anxious.
4: The patient is overwhelmed by anxiety, e.g., shaking and trembling all over.
Appendix 5b Anxiety Symptom Scale (ASS)
The following questions ask about how you have been feeling over the past two
weeks. Please put a tick in the box that is closest to how you have been feeling.
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
Appendix 6 The Bech-Rafaelsen Mania Scale (MAS)
The time frame (window) is the past three days.
Scoring sheet
No.  Symptom                                     Score
1    Elevated mood                               0–4
2    Increased verbal activity                   0–4
3    Increased social contact (intrusiveness)    0–4
4    Increased motor activity                    0–4
5    Sleep disturbances                          0–4
6    Work activities (distractibility)           0–4
7    Irritable mood, hostility                   0–4
8    Increased sexual activity                   0–4
9    Increased self-esteem                       0–4
10   Flight of thoughts                          0–4
11   Noise level                                 0–4
     Total score                                 0–44
No mania: 0–6. Doubtful mania: 7–10. Hypomania: 11–14. Moderate mania: 15–24. Marked/severe mania: 25–44.
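The total score and the severity bands on the MAS scoring sheet lend themselves to a small lookup. The sketch below is only an illustration, assuming integer item scores of 0–4 for all eleven items; the function names, and the choice to reject out-of-range input rather than clip it, are illustrative assumptions and not part of the published scale.

```python
from typing import Sequence

# Severity bands of the MAS total score, as printed on the scoring sheet.
MAS_BANDS = [
    (0, 6, "No mania"),
    (7, 10, "Doubtful mania"),
    (11, 14, "Hypomania"),
    (15, 24, "Moderate mania"),
    (25, 44, "Marked/severe mania"),
]


def mas_total(item_scores: Sequence[int]) -> int:
    """Sum the 11 MAS items; each item is assumed to be an integer from 0 to 4."""
    if len(item_scores) != 11:
        raise ValueError("the MAS has 11 items")
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("each item is scored 0-4")
    return sum(item_scores)


def mas_category(total: int) -> str:
    """Map a total score (0-44) to the severity band on the scoring sheet."""
    for low, high, label in MAS_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("total score must lie between 0 and 44")


if __name__ == "__main__":
    ratings = [2, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1]  # hypothetical ratings for items 1-11
    total = mas_total(ratings)
    print(total, mas_category(total))  # 13 Hypomania
```

Keeping the bands in one table means the cut-offs can be checked against the scoring sheet at a glance.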
The Bech-Rafaelsen Mania Scale (MAS) Manual
Item 1 Elevated mood
0: Not present
1: Slightly elevated mood, optimistic, but still adapted to situation
2: Moderately elevated mood, joking, laughing, however, somewhat irrelevant
to situation
3: Markedly elevated mood, exuberant both in manner and speech, clearly
irrelevant to situation
4: Extremely elevated mood, quite irrelevant to situation
Item 2 Increased verbal activity
0: Not present
1: Somewhat talkative
2: Clearly talkative, few spontaneous intervals in the conversation, but still
not difficult to interrupt
3: Almost no spontaneous intervals in the conversation, difficult to
interrupt
4: Impossible to interrupt, dominates the conversation completely
Item 3 Increased social contact (intrusiveness)
0: Not present
1: Slightly meddling (putting his/her oar in), slightly intrusive
2: Moderately meddling and arguing or intrusive
3: Dominating, arranging, directing, but still in context with the setting
4: Extremely dominating and manipulating, not in context with the
setting
Item 4 Increased motor activity
0: Not present
1: Slightly increased motor activity (e.g., some tendency to lively facial
expression)
2: Clearly increased motor activity (e.g., lively facial expression, not able to
sit quietly in chair)
3: Excessive motor activity, on the move most of the time, but the patient
can sit still if urged to (rises only once during interview)
4: Constantly active, restlessly energetic. Even if urged to, the patient
cannot sit still
Item 5 Sleep disturbances
This item covers the patient’s subjective experience of the duration of sleep
(hours of sleep per 24-h period). The rating should be based on the three
preceding nights, irrespective of the administration of hypnotics or sedatives.
The score is the average of the past three nights.
0: Not present (habitual duration of sleep)
1: Duration of sleep reduced by 25%
2: Duration of sleep reduced by 50%
3: Duration of sleep reduced by 75%
4: No sleep
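The anchors of item 5 are percentage reductions of habitual sleep, so the rating can be sketched as arithmetic. This is a hypothetical helper, not part of the manual: it assumes that each night is first scored against the nearest 25% anchor (rounding half up), that sleeping longer than usual counts as no reduction, and that "the average of the past three nights" means the mean of the three nightly scores rather than a score computed from mean hours.

```python
from typing import Sequence


def night_score(hours_slept: float, habitual_hours: float) -> int:
    """Score one night against the item 5 anchors (0, 25, 50, 75 and 100% reduction).

    Assumption: the reduction is rounded half-up to the nearest 25% anchor;
    sleeping longer than habitual counts as no reduction.
    """
    if habitual_hours <= 0:
        raise ValueError("habitual_hours must be positive")
    reduction = max(0.0, 1.0 - hours_slept / habitual_hours)
    return min(4, int(reduction * 4 + 0.5))


def sleep_item_score(last_three_nights: Sequence[float], habitual_hours: float) -> float:
    """Mean of the three nightly scores (hypnotics or sedatives are ignored, as the manual says)."""
    if len(last_three_nights) != 3:
        raise ValueError("rate the three preceding nights")
    return sum(night_score(h, habitual_hours) for h in last_three_nights) / 3


if __name__ == "__main__":
    # Habitual sleep of 8 h; the last three nights gave 6, 4 and 8 h (scores 1, 2, 0).
    print(sleep_item_score([6, 4, 8], habitual_hours=8))  # 1.0
```

The result may be fractional; whether and how to round it to an integer item score is left to the rater.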
Item 6 Work activities (distractibility)
Work activity should be measured in terms of the degree of disability or
distractibility in social, occupational or other important areas of
functioning.
0: No difficulties
1: Slightly increased drive, but work quality is slightly reduced as motivation
is changing; the patient is somewhat distractible (attention drawn to
irrelevant stimuli)
2: Work activity clearly affected by distractibility, but still to a moderate degree
3: The patient occasionally loses control of routine tasks because of marked
distractibility
4: Unable to perform any task without help
Item 7 Irritable mood, hostility
0: Not present
1: Somewhat impatient or irritable, but control is maintained
2: Moderately impatient or irritable. Does not tolerate provocations
3: Provocative, makes threats, but can be calmed down
4: Overt physical violence; physically destructive
Item 8 Increased sexual activity
0: Not present
1: Slight increase in sexual interest and activity, for example, slightly flirtatious
2: Moderate increase in sexual interest and activity, for example, clearly
flirtatious
3: Marked increase in sexual interest and activity, excessively flirtatious
4: Completely preoccupied by sexual interests
Item 9 Increased self-esteem
0: Not present
1: Slightly increased self-esteem, for example, slightly overestimates own
habitual capabilities
2: Moderately increased self-esteem, for example, overestimates own habitual
capabilities more clearly or hints at unusual abilities
3: Markedly unrealistic ideas, for example, believes he/she possesses
extraordinary abilities, powers or knowledge (scientific, religious etc.),
but can quickly be corrected
4: Grandiose ideas which cannot be corrected
Item 10 Flight of thoughts
0: Not present
1: Somewhat lively in descriptions, explanations and elaborations without
losing the connection with the topic of the conversation. The thoughts
are thus still coherent
2: The patient’s thoughts are occasionally distracted by random associations
(often rhymes, slang, puns, pieces of verse or music)
3: The line of thought is more regularly disrupted by diversionary
associations
4: It is very difficult or impossible to follow the patient because of the flight
of thoughts; he or she constantly jumps from one topic to another
Item 11 Noise level
0: Not present
1: Speaks somewhat loudly without being noisy
2: Voice discernible at a distance, and somewhat noisy
3: Vociferous, voice discernible at a long distance, is markedly noisy or singing
4: Shouting, screaming; or using other sources of noise due to hoarseness
Appendix 7 Brief Psychiatric Rating Scale (BPRS)
With the two subscales
Nr.  Item                                   Score   Schizophrenicity subscale   Depression subscale
1    Somatic concern                        (0–6)
2    Anxiety (psychic)                      (0–6)
3    Emotional withdrawal                   (0–6)
4    Conceptual disorganisation             (0–6)
5    Self-depreciation and guilt feelings   (0–6)
6    Anxiety (somatic)                      (0–6)
7    Specific motor disturbances            (0–6)
8    Exaggerated self-esteem                (0–6)
9    Depressive mood                        (0–6)
10   Hostility                              (0–6)
11   Suspiciousness                         (0–6)
12   Hallucinations                         (0–6)
13   Psychomotor retardation                (0–6)
14   Uncooperativeness                      (0–6)
15   Unusual thought content                (0–6)
16   Blunted or inappropriate affect        (0–6)
17   Psychomotor agitation                  (0–6)
18   Disorientation and confusion           (0–6)
Total BPRS
Subtotal schizophrenicity
Subtotal depression
Appendix 8a
Psychiatric Research Unit, WHO Collaborating Centre in Mental Health
WHO (Five) Well-Being Index (1998 version)
Please indicate for each of the five statements which is closest to how you
have been feeling over the last two weeks. Notice that higher numbers mean
better well-being.
Example: If you have felt cheerful and in good spirits more than half of the
time during the last two weeks, put a tick in the box with the number 3 in the
upper right corner.
Over the last two weeks: All of the time (5), Most of the time (4), More than half of the time (3), Less than half of the time (2), Some of the time (1), At no time (0)
1 I have felt cheerful and in good spirits                       5 4 3 2 1 0
2 I have felt calm and relaxed                                   5 4 3 2 1 0
3 I have felt active and vigorous                                5 4 3 2 1 0
4 I woke up feeling fresh and rested                             5 4 3 2 1 0
5 My daily life has been filled with things that interest me     5 4 3 2 1 0
Scoring
The raw score is calculated by totalling the figures of the five answers.
The raw score ranges from 0 to 25, 0 representing worst possible and 25
representing best possible quality of life.
To obtain a percentage score ranging from 0 to 100, the raw score is multiplied
by 4. A percentage score of 0 represents worst possible, whereas a score
of 100 represents best possible quality of life.
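The WHO-5 scoring rule is plain arithmetic, which the following sketch makes explicit. The function name and the input check are illustrative assumptions; the only rules taken from the text are "sum the five answers (0–5 each)" and "multiply the raw score by 4".

```python
from typing import Sequence, Tuple


def who5_scores(answers: Sequence[int]) -> Tuple[int, int]:
    """Return (raw score 0-25, percentage score 0-100) for the five WHO-5 answers."""
    if len(answers) != 5 or any(a not in range(6) for a in answers):
        raise ValueError("expected five answers, each an integer from 0 to 5")
    raw = sum(answers)
    return raw, raw * 4  # percentage score = raw score x 4


if __name__ == "__main__":
    print(who5_scores([3, 3, 4, 2, 3]))  # (15, 60)
```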
Interpretation
It is recommended to administer the Major Depression (ICD-10) Inventory
if the raw score is below 13 or if the patient has answered 0 or 1 to any of the
five items. A score below 13 indicates poor well-being and is an indication for
testing for depression under ICD-10.
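The interpretation rule combines two conditions with "or", which is easy to misapply by checking only the total. A minimal sketch, assuming the five answers are already coded 0–5 as on the form; the function name is hypothetical.

```python
from typing import Sequence


def who5_suggests_depression_testing(answers: Sequence[int]) -> bool:
    """Raw score below 13, or any single item answered 0 or 1."""
    if len(answers) != 5:
        raise ValueError("expected five answers")
    raw = sum(answers)
    return raw < 13 or any(a <= 1 for a in answers)


if __name__ == "__main__":
    print(who5_suggests_depression_testing([3, 3, 3, 2, 2]))  # False: raw 13, no item below 2
    print(who5_suggests_depression_testing([5, 5, 5, 5, 1]))  # True: raw 21, but one item is 1
```

The second example shows why the item-level condition matters: a high total can still hide a single very low answer.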
© Psychiatric Research Unit, WHO Collaborating Center for Mental Health,
Frederiksborg General Hospital, DK-3400 Hillerød
Appendix 8b The HADS subscales for positive well-being and anxiety symptoms
The correct scoring of the Hospital Anxiety and Depression Scale (HADS) to cover
positive well-being (WHO-5) and anxiety symptoms or neuroticism.
HADS items corresponding to the WHO-5 (positive well-being):
2. I still enjoy the things I used to enjoy
4. I can laugh and see the funny side of things
6. I feel cheerful
7. I feel relaxed
12. I look forward with enjoyment to things
HADS items corresponding to Eysenck Neuroticism (anxiety symptoms):
1. I feel tense or ‘wound up’
3. I get a sort of frightened feeling as if something awful is about to happen
5. Worrying thoughts go through my mind
11. I feel restless as if I have to be on the move
13. I get sudden feelings of panic
Remaining items:
8. I feel as if I am slowed down
9. I get a feeling like ‘butterflies’ in the stomach
10. I have lost interest in my appearance
14. I can enjoy a good book
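The regrouping of the 14 HADS items into two five-item subscales can be sketched as a mapping from item numbers to subscales. This is only an illustration: it assumes each response has already been coded 0–3 so that a higher value means more of the construct measured by its subscale (the coding direction of individual HADS items is not given here), and the names are hypothetical.

```python
from typing import Dict, Mapping

# Item numbers as grouped in Appendix 8b.
HADS_WELLBEING_ITEMS = (2, 4, 6, 7, 12)   # correspond to the WHO-5 (positive well-being)
HADS_ANXIETY_ITEMS = (1, 3, 5, 11, 13)    # correspond to Eysenck neuroticism (anxiety symptoms)
HADS_REMAINING_ITEMS = (8, 9, 10, 14)     # not used in either subscale


def hads_subscales(responses: Mapping[int, int]) -> Dict[str, int]:
    """Sum the two five-item subscales (each 0-15).

    Assumption: every response is already coded 0-3 so that a higher value
    means more of the construct measured by its subscale.
    """
    for item in HADS_WELLBEING_ITEMS + HADS_ANXIETY_ITEMS:
        if responses.get(item) not in range(4):
            raise ValueError(f"item {item} must be coded 0-3")
    return {
        "positive_wellbeing": sum(responses[i] for i in HADS_WELLBEING_ITEMS),
        "anxiety": sum(responses[i] for i in HADS_ANXIETY_ITEMS),
    }


if __name__ == "__main__":
    answers = {i: 1 for i in range(1, 15)}  # hypothetical answers to all 14 items
    print(hads_subscales(answers))
```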
Appendix 9a Etiological considerations in major depression by use of the Clinical Interview for Depression and Related Syndromes (CIDRS)
F Etiological considerations
F1 Lack of insight (the last 3 days)
0 Absent.
1 Doubtful.
2 Admits to mental problems but not to being mentally ill.
3 Acknowledges possible change in behaviour, but denies mental illness.
4 Denies any change in behaviour. Thus does not even feel stressed.
F2a Psychological stress (stressors) (around beginning of episode and 6 months retrospectively)
0 Absent. No psychological stress.
1 Doubtful.
2 Definitely present: presence of a long-term psycho-social stressor (e.g.,
divorce or work-related problems) considered to have etiological
significance, i.e., the condition would not have occurred without it.
F2b Post-traumatic stress disorder
0 Absent. No post-traumatic stress disorder.
1 Doubtful.
2 Definitely present when the condition has developed during the course
of a few weeks after exposure to an exceptionally catastrophic event.
F3 Neuroticism (covering premorbid history)
0 Absent.
1 Doubtful presence of chronic tendency from early youth to anxiety,
worrying or feelings of inferiority.
2 Mild. Slight tendency to personality structure with anxiety, worrying
and tension.
3 Mild to moderate. Mildly to moderately anxious personality structure
(neuroticism), however without causing constraints in daily life.
4 Moderate to marked neuroticism, including tendency to introversion,
some degree of limitation in daily life.
5 Marked to severe neuroticism, causing constraints in daily life.
6 Extremely severe neuroticism causing chronic constraints in daily
life.
F4 Increased reactivity towards environment (the last 3 days)
0 Absent.
1 Doubtful or minimally present.
2 Mild. Unspecific factors, such as having someone to talk to, lead to
limited improvement.
3 Mild to moderate. Unspecific factors or certain specific situations
lead either to improvement or to deterioration.
4 Moderate to marked. The condition varies to a considerable degree,
depending on the factors making up the situation.
5 Marked to severe. Certain factors frequently lead to complete disappearance
or triggering of the condition.
6 Extremely severe. The condition depends entirely on quite specific
situations, which each time lead to its complete disappearance or
triggering.
F5 Diurnal variation – symptoms worse in evening (the last 3 days)
0 Absent.
1 Doubtful, minimally present.
2 Mild.
3 Mild to moderate. Fluctuations of greater intensity or frequency.
4 Moderate to marked.
5 Marked to severe. Regular changes from considerable depression to
hardly any symptoms.
6 Extremely manifest changes.
F6 Diurnal variation – symptoms worse in morning (the last 3 days)
0 Absent.
1 Doubtful or minimal.
2 Mild.
3 Mild to moderate. Fluctuations of greater intensity or frequency.
4 Moderate to marked.
5 Marked to severe. Regular changes from considerable condition to
hardly any symptoms.
6 Extremely marked changes in condition.
F7 Quality of depression (covering whole episode)
0 Absent. No difference from ordinary grief reaction or stress
condition.
1 Doubtfully present, as not a question of ordinary grief reaction or
stress condition.
2 Mild. Felt to be slightly different from ordinary feeling of stress.
3 Mild to moderate, definitely different from ordinary feeling of
stress.
4 Moderately to markedly different from ordinary feeling of stress; all
is negative.
5 Markedly to severely different from ordinary feeling of stress.
6 Extremely severe, pronounced difference from ordinary feeling of
stress, exceedingly different.
F8 Persistency and duration of condition (covering whole episode)
0 Absent.
1 Doubtful. Quite insignificant day-to-day variations.
2 Definite persistency. Condition the same from day to day; if there is any
change, it tends to be an increase of symptoms.
3 Duration less than 6 months.
4 Duration 6–12 months.
5 Duration 12–24 months.
6 Duration more than 24 months.
F9 Depressive delusions (the last 3 days)
0 Absent.
1 Doubtful presence of actual delusions.
2 Mild. Vague depressive delusions which are not adhered to.
3 Mild to moderate depressive delusions as to physical illness or
financial problems. Not especially adhered to.
4 Moderate to marked depressive delusions, adhered to, to a certain
extent.
5 Marked to severe depressive delusions, obstinately adhered to.
6 Extremely marked depressive delusions, completely dominating the
condition.
F10 Previous depressive downs
0 Absent.
1 Doubtful whether the current episode has been preceded by depressive
downs differing from actual depressive episodes by short duration
(typically 4 days or less) and a lesser degree of severity. However, the latter
element (degree of severity) is not as significant here as the presence
of recurrent episodes of short duration. Should not be confused with
premenstrual tension.
2 Has previously had one depressive down.
3 Has previously had 2–3 downs.
4 Has previously had 4–5 depressive downs.
5 Has previously had around 1 down per year.
6 Has previously had several downs per year.
F11 Previous depressive episodes (covering whole history [anamnesis])
0 Absent.
1 Doubtful whether the current episode has been preceded by a delimited
depressive episode of at least 2 weeks’ duration.
2 Has previously had one depressive episode.
3 Has previously had 2 depressive episodes.
4 Has previously had 3 depressive episodes.
5 Has previously had 4 depressive episodes.
6 Has previously had 5 or more depressive episodes.
F12 Previous hypomanic ups
0 Absent.
1 Doubtful whether the current episode has been preceded by
hypomanic ups differing from actual manic episodes by short
duration (typically 4 days or less) and a lesser degree of severity
(i.e., without major impact on ability to work or on other social
activities).
2 Has previously had one up.
3 Has previously had 2–3 ups.
4 Has previously had 4–5 ups.
5 Has previously had around 1 up per year.
6 Has previously had several ups per year.
F13 Previous manic episodes (covering whole history [anamnesis])
0 Absent.
1 Doubtful whether the current episode has been preceded by a delimited
manic episode of at least 1 week’s duration.
2 Has previously had 1 manic episode.
3 Has previously had 2 manic episodes.
4 Has previously had 3 manic episodes.
5 Has previously had 4 manic episodes.
6 Has previously had 5 or more manic episodes.
F14 Previous mixed states (covering whole history [anamnesis])
0 Absent.
1 Doubtful whether the current episode has been preceded by an episode
with both depressive and manic symptoms.
2 Has previously had 1 episode with mixed states.
3 Has previously had 2 episodes with mixed states.
4 Has previously had 3 episodes with mixed states.
5 Has previously had 4 episodes with mixed states.
6 Has previously had 5 or more episodes with mixed states.
F15 Hereditary disposition
0 Absent.
1 Doubtful.
2 Mild. Scanty information about a distant relative with affective disorder
characteristics.
3 Mild to moderate. Definite information about a distant relative with
affective disorder (committed suicide, hospitalised for this, treated
for this).
4 Moderate to marked. Closer relatives (grandparents, half-siblings)
have/had affective disorder.
5 Marked to severe. A brother, sister or parent has/had affective disorder.
6 Extremely severe. Both a parent and a sibling have/had affective disorder.
F16 Somatic illness (around start of episode and 6 months retrospectively); includes, e.g., postpartum depression, post-stroke depression and withdrawal symptoms after substance abuse (alcohol and other psychoactive drugs)
0 Absent.
1 Doubtful.
2 Definitely present when the somatic illness is considered to have
etiological significance, i.e., the condition would not have occurred without
it.
F17 Drug-/substance-induced condition
0 Absent.
1 Doubtful.
2 Definitely present when treatment with the drug is considered to
have etiological significance, i.e., the condition would not have occurred
without it.
Appendix 9b Newcastle Diagnostic Depression Scale (1965)
No.  Item                            Calculation value
                                     Score 2   Score 1   Score 0
1    Deviant personality             0         +½        +1
2    Psychological stresses          0         +1        +2
3    The quality of depression       +1        +½        0
4    Weight loss                     +2        +1        0
5    Previous depressive episodes    +1        +½        0
6    Motor activity                  +2        +1        0
7    Anxiety                         –1        –½        0
8    Nihilistic delusions            +2        +1        0
9    Accusations of others           –1        –½        0
10   Feelings of guilt               +1        +½        0
Calculated total value
Endogenous depression = +6 or more
Dubiously endogenous depression = +5½
Non-endogenous depression = +5 or less
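The Newcastle scale converts each item score (0, 1 or 2) into a weighted calculation value and sums them. The sketch below encodes the table as data, using `fractions.Fraction` so that the half-point values add up exactly; the dictionary keys and function names are illustrative assumptions. Because every calculation value is a multiple of ½, a total is always either +5 or less, exactly +5½, or +6 or more, which is what the three-way classification relies on.

```python
from fractions import Fraction
from typing import Mapping

HALF = Fraction(1, 2)

# Calculation values for item scores 2, 1 and 0 (in that order), from the table.
NEWCASTLE_VALUES = {
    "deviant_personality": (0, HALF, 1),
    "psychological_stresses": (0, 1, 2),
    "quality_of_depression": (1, HALF, 0),
    "weight_loss": (2, 1, 0),
    "previous_depressive_episodes": (1, HALF, 0),
    "motor_activity": (2, 1, 0),
    "anxiety": (-1, -HALF, 0),
    "nihilistic_delusions": (2, 1, 0),
    "accusations_of_others": (-1, -HALF, 0),
    "feelings_of_guilt": (1, HALF, 0),
}


def newcastle_total(scores: Mapping[str, int]) -> Fraction:
    """Add up the calculation values for the ten item scores (each 0, 1 or 2)."""
    total = Fraction(0)
    for item, values in NEWCASTLE_VALUES.items():
        score = scores[item]
        if score not in (0, 1, 2):
            raise ValueError(f"{item}: score must be 0, 1 or 2")
        total += values[2 - score]  # index 0 holds the value for score 2
    return total


def newcastle_diagnosis(total: Fraction) -> str:
    """Classify the calculated total value (always a multiple of 1/2)."""
    if total >= 6:
        return "Endogenous depression"
    if total == Fraction(11, 2):
        return "Dubiously endogenous depression"
    return "Non-endogenous depression"  # +5 or less


if __name__ == "__main__":
    patient = {item: 0 for item in NEWCASTLE_VALUES}  # hypothetical patient
    patient.update(quality_of_depression=2, weight_loss=2, motor_activity=1)
    total = newcastle_total(patient)
    print(total, newcastle_diagnosis(total))  # 7 Endogenous depression
```

Note that scoring 0 on "deviant personality" and "psychological stresses" already contributes +3, since the absence of these features points towards endogenous depression.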
Appendix 10 The modified PRISE 20 questionnaire for side effects of antidepressants
PRISE 20
(Patient Related Inventory of Side Effects). (Bech P, Csillag C. Rational
polypharmacy in the acute therapy of major depression. InTech 2011)
Modified after Wisniewski et al. (2006)
Have you had any of these side effects over the past two weeks?
                                       No   Yes, but tolerable   Yes – Distressing
1. Dry mouth                           ☐    ☐                    ☐
2. Nausea                              ☐    ☐                    ☐
3. Diarrhoea                           ☐    ☐                    ☐
4. Constipation                        ☐    ☐                    ☐
5. Dizziness                           ☐    ☐                    ☐
6. Palpitations                        ☐    ☐                    ☐
7. Sweating                            ☐    ☐                    ☐
8. Headache                            ☐    ☐                    ☐
9. Tremors                             ☐    ☐                    ☐
10. Difficulty sleeping: too little    ☐    ☐                    ☐
11. Difficulty sleeping: too much      ☐    ☐                    ☐
12. Loss of sexual desire              ☐    ☐                    ☐
13. Trouble achieving orgasm           ☐    ☐                    ☐
14. Trouble with erections             ☐    ☐                    ☐
15. Anxiety                            ☐    ☐                    ☐
16. Restlessness                       ☐    ☐                    ☐
17. Decreased energy                   ☐    ☐                    ☐
18. Increased appetite                 ☐    ☐                    ☐
19. Increased weight                   ☐    ☐                    ☐
20. Emotional indifference             ☐    ☐                    ☐
Appendix 11a Calculus Example 1
Principal component analysis (PCA) for typing
Thomas Teasdale, DMSc, Associate Professor of Psychology at the University
of Copenhagen, has attempted to explain the mathematics of factor analysis
in his contribution to ‘Undersøgelsesmetoder i klinisk psykologi’ (Evaluation
methods in clinical psychology) (Munksgaard 1992). For this purpose he
presents a fictitious correlation matrix (Table A11.1) of the kind which
emerges when intelligence is measured by six different tests, or items (A, B, C,
D, E, F). Table A11.1 shows that the six items all correlate positively with
one another to a certain degree, most strongly within A, B, C and within D, E, F.
Based on this correlation matrix, Thomas Teasdale has performed the
matrix algebra used in principal component analysis (PCA), the
mathematical method described by Hotelling in 1933, in which one moves from
the correlation coefficients to the eigenvectors of the matrix and their
eigenvalues; each eigenvalue expresses the variance accounted for by its component.
Figure A11.1 shows the eigenvalues calculated by Teasdale for his fictitious
matrix. The sum of these eigenvalues is 6 (= the number of items, and hence
of components). Thus the 1st component has an eigenvalue of 3.1, the 2nd
component a value of 1.3, the 3rd component a value of 0.43, the 4th component a
value of 0.41, the 5th component 0.39 and the 6th component 0.36. These
values are given in Figure A11.1, together with the percentage of variance
each of these components is responsible for.
In Figure A11.1 ‘explained variance’ as a percentage is seen on the ordinate
axis. The six components are distributed on the abscissa axis. Thus the 1st
component explains 51.7% of the variance (3.1/6) and the 2nd component explains
21.7% of the variance (1.3/6), which means that together the first two principal
components explain 73.4% of the variance, making the remaining components
quite insignificant.
The abscissa in Figure A11.1 is labelled ‘Ramified hierarchy of typological
components’ to allow a reference to Russell’s typology.
The first principal component, which explains slightly more than 50% of
the variance, is named the general intelligence factor; here all six tests (A, B,
C, D, E and F) correlate positively, as we already knew from the
results in Table A11.1.
However, as demonstrated in Table A11.2, this is shown more precisely by
the factor loadings, which give the correlation between each individual
test and the component itself. The next principal component is bidirectional,
as seen in Table A11.2, as items A, B and C have positive
loadings while items D, E and F have negative loadings. Loadings are
thus correlation coefficients and lie between –1.0 and 1.0.
[Figure A11.1 The calculated eigenvalues, e.g. 3.1 for the 1st component, and the corresponding percentages (explained variance). Ordinate: explained variance (%); abscissa: ramified hierarchy of typological components, 1st to 6th component.]
Table A11.1 Correlation matrix: inter-correlation coefficients for the six items A, B, C, D, E, F
     A     B     C     D     E     F
A    –
B    0.62  –
C    0.58  0.60  –
D    0.31  0.29  0.28  –
E    0.32  0.33  0.29  0.60  –
F    0.30  0.31  0.29  0.63  0.59  –
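The step from Table A11.1 to eigenvalues, explained variance and loadings can be reproduced numerically. The sketch below is an illustration only: Teasdale's own computations are not described, so the choice of the cyclic Jacobi eigenvalue method (written in plain Python so that no external library is needed) is an assumption, as are all function names. Loadings are taken as eigenvector × √eigenvalue, and since eigenvector signs are arbitrary, each component is flipped so that test A loads positively. Run on Table A11.1, the first two eigenvalues should come out close to 3.1 and 1.3 and the loadings close to Table A11.2, though small differences from the rounded published values are to be expected.

```python
import math

# Lower triangle of Table A11.1 (Teasdale's fictitious correlations for tests A-F).
LOWER = [
    [],
    [0.62],
    [0.58, 0.60],
    [0.31, 0.29, 0.28],
    [0.32, 0.33, 0.29, 0.60],
    [0.30, 0.31, 0.29, 0.63, 0.59],
]


def full_matrix(lower):
    """Expand the lower triangle into a symmetric correlation matrix with unit diagonal."""
    n = len(lower)
    r = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            r[i][j] = r[j][i] = value
    return r


def jacobi_eigen(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues (descending) and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors


def principal_components(r, k):
    """Eigenvalues, % explained variance and loadings of the first k principal components."""
    values, vectors = jacobi_eigen(r)
    n = len(r)
    explained = [100.0 * value / n for value in values]  # trace of a correlation matrix = n
    loadings = []
    for value, vec in zip(values[:k], vectors[:k]):
        sign = 1.0 if vec[0] >= 0 else -1.0  # signs are arbitrary; make test A positive
        loadings.append([sign * math.sqrt(value) * x for x in vec])
    return values, explained, loadings


if __name__ == "__main__":
    values, explained, loadings = principal_components(full_matrix(LOWER), k=2)
    print("eigenvalues:", [round(x, 2) for x in values])
    print("explained variance %:", [round(x, 1) for x in explained])
    for i, row in enumerate(loadings, start=1):
        print(f"component {i} loadings:", [round(x, 2) for x in row])
```

The sign flip is the only arbitrary choice: it does not change which tests go together, only which end of the bidirectional second component is called positive.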
Teasdale then goes on to demonstrate that if you perform an actual exploratory
factor analysis with rotation, you will end up with the result seen
in Table A11.3. Here the rotated factor 1 consists of A, B and C with
high (significant) loadings, and the rotated factor 2 consists of D, E and F
with high, significant, loadings, i.e., loadings above 0.30. The exploratory factor
analysis is statistical, with ‘significant’ loadings, while the PCA, based on
sound mathematics, directly shows the loading signs (+ or −). The factor-analytical
rotation has merely ensured that all loadings will be positive!
Many people interpret the result of this PCA as indicating that the first
principal component ‘measures’ a general level of intelligence because all six
items or tests have positive loadings. Russell’s typology is a good way to illustrate
that PCA is not a method with which to illustrate pure measurement techniques.
In his example Russell uses the typical Englishman. Suppose that a
typical Englishman is especially linguistically gifted while a typical continental
European is especially non-linguistically gifted. According to Russell,
it is then no use taking all six tests, or items (A, B, C, D, E, F), into consideration:
a typical Englishman will often have a high score on A and B, but not on C,
and low scores on D, E and F, and will thus become atypical if all six
criteria are used as part of being a typical Englishman. According to
Russell, one must move one step away from the first component and look at
the verbal tests among the items with positive loadings in the next component
(A, B and C). This example also shows that the sum of all six
items, or tests (A+B+C+D+E+F), is not an adequate measure of intelligence.
Table A11.2 Factor loadings for the first two principal components
               A     B     C     D      E      F
Component 1    0.72  0.73  0.70  0.72   0.73   0.72
Component 2    0.45  0.47  0.48  –0.49  –0.43  –0.47
Table A11.3 Exploratory factor rotation
Factor loadings
Rotated factors  A     B     C     D     E     F
Factor 1         0.83  0.84  0.83  0.16  0.20  0.18
Factor 2         0.19  0.19  0.16  0.85  0.82  0.84
In order to assess whether the total score of a collection of tests, or items,
is a sufficient measure of intelligence, or of depression, it is necessary to
perform an item response theory (IRT) analysis (see the next Calculus Example).
Thus PCA can be used to determine whether certain items in a scale
correlate with many of the other items in the scale, but especially to determine
whether there is a bidirectional component which can be used to classify or type
rather than to perform an actual measurement.
In the field of depression, the typology of items is important when classifying
antidepressive drugs as either sedative or non-sedative, while measurement
techniques are important when assessing actual antidepressive effect.
References
Teasdale, T.W. (1992) Psykometriske aspekter af kvantitativ testning (Psychometric
aspects of quantitative testing). In: Undersøgelsesmetoder i klinisk psykologi
(Evaluation methods in clinical psychology) (ed L. Østergaard), pp. 112–35.
København, Munksgaard.
Russell, B. (1956) My philosophical development. London, Routledge.
Child, D. (2006) The essentials of factor analysis. 3rd edition. London, Continuum.
Appendix 11b Calculus Example 2
Rasch analysis (IRT)
[Figure: item-characteristic curves for the symptoms ‘Lowered mood’, ‘Sleep disturbances’ and ‘Guilt feelings’. Ordinate: percentage presence of symptoms; abscissa: total score.]
This figure is a modified version of an example from Teasdale (1992). It is modified in the sense
that, amongst other things, it shows three symptoms on a depression scale.
Each symptom is scored from 0 to 4; theoretically the sum should thus range
from 0 to 12.
‘Lowered mood’ is seen to be present at a total score of approximately 3,
as half of the patients with a score of 3 have lowered mood. In contrast, the
symptom ‘Guilt feelings’ is only present in half of the patients when the total
score is approximately 7. These two symptoms fulfil the Rasch requirement
that patients with the symptom ‘Guilt feelings’ should also demonstrate
‘Lowered mood’. Conversely, patients who score approximately 3 present
only with lowered mood, not with guilt feelings.
The case is different with the symptom ‘Sleep disturbances’. Among
patients with low scores some already suffer from sleep disturbances. Thus,
at a total score of around 3, approximately 20% have sleep disturbances.
In patients with a total score of approximately 7, 80% present with sleep
disturbances, but it is not known whether these patients also have guilt feelings.
The two curves in the figure showing ‘Lowered mood’ and ‘Guilt feelings’
are correct item-characteristic curves according to the Rasch analysis,
as they are S-shaped and do not intersect. The ‘Sleep disturbances’ curve is
not S-shaped and intersects both: first the ‘Lowered mood’ curve (here showing
that 20% with a low total score suffer from sleep disturbances) and further on the
‘Guilt feelings’ curve (now showing that 20% with severe depression do not suffer
from sleep disturbances). Thus the symptom ‘Sleep disturbances’ does not behave
in such a way that the total score is a sufficient measure of depression. The HAM-D6,
with its six different depression symptoms, fulfils the requirements of the Rasch model.
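The Rasch requirement that item-characteristic curves must not cross can be illustrated with the dichotomous Rasch model, in which every item has the same slope and differs only in its location b: P(present) = 1 / (1 + exp(−(θ − b))). The sketch below is an illustration under stated assumptions, not a reproduction of Teasdale's data: the item parameters are hypothetical, the curves are drawn on a latent severity scale θ rather than on the total-score axis of the figure, and the misfitting ‘Sleep disturbances’ item is mimicked by giving it a flatter slope (a two-parameter logistic curve), which is one simple way to make a curve cross the others.

```python
import math


def icc(theta: float, b: float, a: float = 1.0) -> float:
    """Item characteristic curve: probability that the symptom is present.

    With a = 1 for every item this is the dichotomous Rasch model,
    P = 1 / (1 + exp(-(theta - b))). Letting a differ between items gives
    the two-parameter logistic model, used here only to mimic a misfitting item.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def curves_cross(item1, item2, thetas) -> bool:
    """True if the difference between two item curves changes sign on the grid."""
    diffs = [icc(t, *item1) - icc(t, *item2) for t in thetas]
    return min(diffs) < 0 < max(diffs)


# Hypothetical item parameters (b = location, a = slope) on a latent scale.
LOWERED_MOOD = (-1.0, 1.0)
GUILT_FEELINGS = (1.0, 1.0)
SLEEP_DISTURBANCES = (0.0, 0.4)  # flatter curve: violates the equal-slope requirement

GRID = [x / 10 for x in range(-100, 101)]  # latent severity from -10 to +10

if __name__ == "__main__":
    print(curves_cross(LOWERED_MOOD, GUILT_FEELINGS, GRID))        # False
    print(curves_cross(SLEEP_DISTURBANCES, LOWERED_MOOD, GRID))    # True
    print(curves_cross(SLEEP_DISTURBANCES, GUILT_FEELINGS, GRID))  # True
```

With equal slopes, an item located further up the scale (‘Guilt feelings’) is less likely than ‘Lowered mood’ at every level of severity, which is the ordering the Rasch requirement demands.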
References
Teasdale, T.W. (1992) Psykometriske aspekter af kvantitativ testning (Psychometric
aspects of quantitative testing). In: Undersøgelsesmetoder i klinisk psykologi
(Evaluation methods in clinical psychology) (ed L. Østergaard), pp. 112–35.
København, Munksgaard.
Bech, P. (1984) The instrumental use of rating scales for depression. Pharmacopsychiatry,
17, 22–8.
References
1 Bech, P. (2009) Fifty years with the Hamilton scales for anxiety and depression.
A tribute to Max Hamilton. Psychotherapy and Psychosomatics, 78(4), 202–11.
2 Feinstein, A.R. (1987) Clinimetrics. New Haven, Yale University Press.
3 Bech, P. (2008) Pichot P – A tribute to the European pharmacopsychologist on his
90th birthday. European Psychiatric Review, 2, 76–80.
4 Bech, P. (1993) Rating scales for psychopathology, health status and quality of life. A
compendium on documentation in accordance with the DSM-III-R and WHO systems.
Berlin, Springer.
5 Guilford, J.P. (1936) Psychometric methods. New York, McGraw-Hill.
6 Sontag, S. (1977) Photography unlimited. The New York Review of Books
(June 23), 26–31.
7 Putnam, H. (1995) Pragmatism. Oxford, Blackwell.
8 Rasmussen, H., Erritzoe, D., Andersen, R., Ebdrup, B.H., Aggernaes, B., Oranje,
B., et al. (2010) Decreased frontal serotonin2A receptor binding in antipsychotic-naive
patients with first-episode schizophrenia. Archives of General Psychiatry,
67(1), 9–16.
9 Tone, A. (2010) Andreasen, N. Interview by A. Tone. In: An oral history of
neuropsychopharmacology. The first fifty years (ed T. Ban). Tennessee, American
College of Neuropsychopharmacology.
10 Høffding, H. (1906) The problems of philosophy (with a preface by William James).
London, MacMillan.
11 Otto, R. (1932) Das Gefühl des Überweltlichen (Sensus Numinis). Munich,
C.H. Beck.
12 Maslow, A.H. (1968) Toward a psychology of being. New York, D. Van Nostrand Co.
13 Vannerus, A. (1929) Wundts psykologi. Stockholm, Bonniers.
14 Thomson, R. (1968) The Pelican history of psychology. London, Penguin Books Ltd.
15 Jablensky, A., Hugler, H., Von Cranach, M., & Kalinov, K. (1993) Kraepelin revisited:
a reassessment and statistical analysis of dementia praecox and manic-depressive
insanity in 1908. Psychological Medicine, 23(4), 843–58.
16 Østergaard, L. (1962) En psykologisk analyse af de formelle skizofrene tankeforstyrrelser (A psychological analysis of schizophrenic formal thought disorder). Copenhagen, Munksgaard.
17 Spearman, C. (1904) General intelligence objectively determined and measured. American Journal of Psychology, 15, 201–93.
18 Spearman, C. (1927) The abilities of man: Their nature and measurement. New York, Macmillan.
19 Guilford, J.P. (1954) Psychometric methods. New York, McGraw-Hill.
20 Thurstone, L.L. (1947) Multiple factor analysis: A development and expansion of vectors of the mind. Chicago, University of Chicago Press.
21 Cattell, R.B. (1978) The scientific use of factor analysis. New York, Plenum Press.
22 Comrey, A.L., & Lee, H.B. (1992) A first course in factor analysis. New York, Lawrence Erlbaum.
23 Vernon, P.E. (1950) The structure of human abilities. London, Methuen.
24 Hotelling, H. (1933) Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–41.
25 Hotelling, H. (1936) Simplified calculation of principal components. Psychometrika, 1, 27–35.
26 Dunteman, G.H. (1989) Principal components analysis. Newbury Park, SAGE Publications.
27 Russell, B. (1956) My philosophical development. London, Routledge.
28 Schafer, R. (1948) The clinical application of psychological tests. New York, International Universities Press.
29 Kline, P. (1993) The handbook of psychological testing. London, Routledge.
30 Eysenck, H.J., & Eysenck, S.B.G. (1975) Manual of the Eysenck Personality Questionnaire. London, Hodder & Stoughton.
31 Eysenck, H.J. (1953) The structure of human personality. London, Methuen.
32 Beckmann, J.H. (1995) Røveriets bio-psyko-sociale konsekvenser (The bio-psycho-social consequences of robbery). Odense, Denmark, Odense University Hospital.
33 Bech, P., Jorgensen, B., Jeppesen, K., Loldrup Poulsen, D., & Vanggaard, T. (1986) Personality in depression: concordance between clinical assessment and questionnaires. Acta Psychiatrica Scandinavica, 74(3), 263–8.
34 Thunedborg, K., Black, C.H., & Bech, P. (1995) Beyond the Hamilton depression scores in long-term treatment of manic-melancholic patients: prediction of recurrence of depression by quality of life measurements. Psychotherapy and Psychosomatics, 64(3–4), 131–40.
35 Spielberger, C.D., Gorsuch, R., & Lushene, R.E. (1970) The State-Trait Anxiety Inventory: Test Manual (STAI). Palo Alto, CA, Consulting Psychologists Press.
36 Digman, J.M. (1990) Personality structure: Emergence of the Five-Factor Model. Annual Review of Psychology, 41, 417–40.
37 Wiggins, J.S. (ed.) (1996) The five factor model of personality. Theoretical perspectives. New York, Guilford Press.
38 Hamilton, M. (1959) The assessment of anxiety states by rating. British Journal of Medical Psychology, 32(1), 50–5.
39 Hamilton, M. (1960) A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56–62.
40 Hamilton, M. (1969) Diagnosis and rating of anxiety. British Journal of Psychiatry, Special Publication 3, 76–9.
41 Pichot, P., Pull, C.B., von Frenckell, R., & Pull, M.C. (1981) Une analyse factorielle de l'échelle d'appréciation de l'anxiété de Hamilton (A factor analysis of the Hamilton anxiety rating scale). Psychiatria Fennica, 13, 183–9.
42 Bech, P., Allerup, P., Maier, W., Albus, M., Lavori, P., & Ayuso, J.L. (1992) The Hamilton scales and the Hopkins Symptom Checklist (SCL-90). A cross-national validity study in patients with panic disorders. British Journal of Psychiatry, 160, 206–11.
43 Hamilton, M. (1958) Treatment of anxiety states. III. Components of anxiety and their response to benactyzine. Journal of Mental Science, 104(437), 1062–8.
44 Bech, P., Fava, M., Trivedi, M.H., Wisniewski, S.R., & Rush, A.J. (2011) Factor structure and dimensionality of the two depression scales in STAR*D using level 1 datasets. Journal of Affective Disorders, 132(3), 396–400.
45 Overall, J.E., & Gorham, D.R. (1962) The brief psychiatric rating scale. Psychological Reports, 10, 799–812.
46 Hedlund, J.L., & Vieweg, B.W. (1980) The Brief Psychiatric Rating Scale (BPRS): a comprehensive review. Journal of Operational Psychiatry, 11, 48–65.
47 Binet, A., & Simon, T. (1905) New methods for the diagnosis of the intellectual level of subnormals (translated by Wiseman, S. Intelligence and ability. London, Penguin Books, 1967). L'Année Psychologique, 12, 191–244.
48 Rhoades, H.M., & Overall, J.E. (1988) The semi-structured Brief Psychiatric Rating Scale interview and rating guide. Psychopharmacology Bulletin, 24, 101–4.
49 Turner, W.J. (1963) Glossaries for use with the Overall and Gorham Brief Psychiatric Rating Scale. New York, Research Division, Central Islip State Hospital.
50 Spearman, C. (1937) Psychology down the ages. London, Macmillan.
51 Nunnally, J.C. (1967) Psychometric theory. New York, McGraw-Hill.
52 Nunnally, J.C., & Bernstein, I.H. (1994) Psychometric theory. Third ed. New York, McGraw-Hill.
53 Bech, P. (2009) Applied psychometrics in clinical psychiatry: the pharmacopsychometric triangle. Acta Psychiatrica Scandinavica, 120(5), 400–9.
54 American Psychiatric Association. (1980) The Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III). Washington DC, American Psychiatric Association.
55 World Health Organization. (1992) International Classification of Diseases. Tenth Revision (ICD-10). Geneva, World Health Organization.
56 American Psychiatric Association. (1994) The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Washington DC, American Psychiatric Association.
57 Demjaha, A., Morgan, K., Morgan, C., Landau, S., Dean, K., Reichenberg, A., et al. (2009) Combining dimensional and categorical representation of psychosis: the way forward for DSM-V and ICD-11? Psychological Medicine, 39(12), 1943–55.
58 Furr, R.M., & Bacharach, V.R. (2008) Psychometrics. London, SAGE Publications.
59 Bech, P. (2008) The use of rating scales in affective disorders. European Psychiatric Review, 1, 14–18.
60 Box, J.F. (1978) R.A. Fisher: The life of a scientist. Chichester, John Wiley.
61 Fisher, R.A. (1922) On the mathematical foundations of theoretical statistics. Philosophical Transactions, 222, 309–68.
62 Olsen, L.W. (1999) Georg Rasch og målingsmodellerne (Georg Rasch and the measurement models). Statistical Department, University of Copenhagen.
63 Fischer, G.H., & Molenaar, I.W. (1995) Rasch models. Berlin, Springer.
64 Bech, P. (1981) Rating scales for affective disorders: their validity and consistency. Acta Psychiatrica Scandinavica, 295, 1–101.
65 de Mars, C. (2010) Item response theory. Oxford, Oxford University Press.
66 Michell, J. (1990) An introduction to the logic of psychological measurement. New York, Psychology Press.
67 Suchman, E.A. (1950) The utility of scalogram analysis. In: Measurement and prediction (eds S.A. Stouffer, L. Guttman, & E.A. Suchman), pp. 122–71. Princeton, Princeton University Press.
68 Michell, J. (1999) Measurement in psychology. Cambridge, Cambridge University Press.
69 Borsboom, D. (2005) Measuring the mind. Cambridge, Cambridge University Press.
70 Bond, T.G., & Fox, C.M. (2001) Applying the Rasch model. London, Lawrence Erlbaum.
71 Allerup, P. (1986) Statistical analysis of MADRS: A rating scale. Copenhagen, Danish Institute for Educational Research.
72 Rasch, G. (1953) On simultaneous factor analysis in several populations. Uppsala, Nordisk Psykologi's Monograph Series No. 3, pp. 65–71.
73 Siegel, S. (1956) Nonparametric statistics for the behavioural sciences. New York, McGraw-Hill.
74 Mokken, R.J. (1971) Theory and procedure of scale analysis. Berlin, Mouton.
75 Sijtsma, K., & Molenaar, I.W. (2002) Introduction to nonparametric item response theory. London, Sage Publications.
76 Loevinger, J. (1957) Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–94.
77 Wittgenstein, L. (1953) Philosophical investigations. Oxford, Blackwell.
78 Ryle, G. (ed.) (1967) The revolution in philosophy. London, Macmillan.
79 Bech, P. (2011) The ABC profile of the HAM-D17. Revista Brasileira de Psiquiatria, 33(2), 109–10.
80 Ramsay, J.O. (1973) The effect of number of categories in rating scales on precision of estimation of scale values. Psychometrika, 38, 513–32.
81 Freyd, M. (1923) The graphic rating scale. Journal of Educational Psychology, 14, 83–102.
82 Asberg, M., Montgomery, S.A., Perris, C., Schalling, D., & Sedvall, G. (1978) A comprehensive psychopathological rating scale. Acta Psychiatrica Scandinavica, Suppl. 271, 5–27.
83 Bent-Hansen, J., & Bech, P. (2011) Validity of the Definite and Semidefinite Questionnaire version of the Hamilton Depression Scale, the Hamilton Subscale and the Melancholia Scale. Part I. European Archives of Psychiatry and Clinical Neuroscience, 261, 37–46.
84 Paykel, E.S. (1985) The clinical interview for depression. Development, reliability and validity. Journal of Affective Disorders, 9(1), 85–96.
85 Hamilton, M. (1967) Development of a rating scale for primary depressive illness. British Journal of Social & Clinical Psychology, 6(4), 278–96.
86 Fiske, D.W. (1983) Methodological perspectives on psychiatric rating scales. In: Statistical and methodological advances in psychiatric research (eds R.D. Gibbons & M.W. Dysken), pp. 35–58. Lancaster, MTP Press.
87 Lorr, M. (1974) Assessing psychotic behaviour by the IMPS. In: Psychological measurements in psychopharmacology (ed P. Pichot), pp. 50–63. Basel, Karger.
88 Overall, J.E. (1974) The Brief Psychiatric Rating Scale in psychopharmacology research. In: Psychological measurements in psychopharmacology (ed P. Pichot), pp. 67–78. Basel, Karger.
89 Ban, T. (ed.) (2010) An oral history of neuropsychopharmacology. The first fifty years. Brentwood, TN, American College of Neuropsychopharmacology.
90 Overall, J.E. (1979) Criteria for selection of subjects for research in biological psychiatry. In: Handbook of biological psychiatry (ed H.M. van Praag), pp. 359–91. New York, Dekker.
91 Andersen, J., Larsen, J.K., Schultz, V., Nielsen, B.M., Korner, A., Behnke, K., et al. (1989) The Brief Psychiatric Rating Scale. Dimension of schizophrenia: reliability and construct validity. Psychopathology, 22(2–3), 168–76.
92 Guy, W. (1976) Early Clinical Drug Evaluation (ECDEU) Assessment manual. Rockville, National Institute of Mental Health.
93 Cohen, J. (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
94 Cohen, J. (1969) Statistical power analysis for the behavioural sciences. Hillsdale, Lawrence Erlbaum.
95 Cohen, J. (1994) The earth is round (P < 0.05). American Psychologist, 49, 997–1003.
96 Karpatschof, B. (2006) Udforskning i psykologi. De kvantitative metoder (Research in psychology. The quantitative methods). Copenhagen, Akademisk Forlag.
97 Cohen, J. (1976) Statistical power analysis for the behavioural sciences. Second Ed. New York, Lawrence Erlbaum.
98 Bech, P., Cialdella, P., Haugh, M.C., Birkett, M.A., Hours, A., Boissel, J.P., et al. (2000) Meta-analysis of randomised controlled trials of fluoxetine v. placebo and tricyclic antidepressants in the short-term treatment of major depression. British Journal of Psychiatry, 176, 421–8.
99 Turner, E.H., Matthews, A.M., Linardatos, E., Tell, R.A., & Rosenthal, R. (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3), 252–60.
100 Kirsch, I., Deacon, B.J., Huedo-Medina, T.B., Scoboria, A., Moore, T.J., & Johnson, B.T. (2008) Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Medicine, 5(2), e45.
101 Norman, G.R., Sloan, J.A., & Wyrwich, K.W. (2003) Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Medical Care, 41(5), 582–92.
102 Entsuah, R., Shaffer, M., & Zhang, J. (2002) A critical examination of the sensitivity of unidimensional subscales derived from the Hamilton Depression Rating Scale to antidepressant drug effects. Journal of Psychiatric Research, 36(6), 437–48.
103 Bech, P., Tanghoj, P., Andersen, H.F., & Overo, K. (2002) Citalopram dose-response revisited using an alternative psychometric approach to evaluate clinical effects of four fixed citalopram doses compared to placebo in patients with major depression. Psychopharmacology, 163(1), 20–5.
104 Bech, P., Tanghoj, P., Cialdella, P., Andersen, H.F., & Pedersen, A.G. (2004) Escitalopram dose-response revisited: an alternative psychometric approach to evaluate clinical effects of escitalopram compared to citalopram and placebo in patients with major depression. International Journal of Neuropsychopharmacology, 7(3), 283–90.
105 Bech, P. (2001) Meta-analysis of placebo-controlled trials with mirtazapine using the core items of the Hamilton Depression Scale as evidence of a pure antidepressive effect in the short-term treatment of major depression. International Journal of Neuropsychopharmacology, 4(4), 337–45.
106 Bech, P., Kajdasz, D.K., & Porsdal, V. (2006) Dose-response relationship of duloxetine in placebo-controlled clinical trials in patients with major depressive disorder. Psychopharmacology, 188(3), 273–80.
107 Cattell, R.B. (1973) Personality and mood by questionnaire. San Francisco, Jossey-Bass Publishers.
108 Bech, P., Allerup, P., Reisby, N., & Gram, L.F. (1984) Assessment of symptom change from improvement curves on the Hamilton depression scale in trials with antidepressants. Psychopharmacology, 84(2), 276–81.
109 Lingjaerde, O., Ahlfors, U.G., Bech, P., Dencker, S.J., & Elgen, K. (1987) The UKU side effect rating scale. A new comprehensive rating scale for psychotropic drugs and a cross-sectional study of side effects in neuroleptic-treated patients. Acta Psychiatrica Scandinavica, 334, 1–100.
110 Casey, P., Maracy, M., Kelly, B.D., Lehtinen, V., Ayuso-Mateos, J.L., Dalgard, O.S., et al. (2006) Can adjustment disorder and depressive episode be distinguished? Results from ODIN. Journal of Affective Disorders, 92(2–3), 291–7.
111 Rogers, S.L., Doody, R.S., Mohs, R.C., & Friedhoff, L.T. (1998) Donepezil improves cognition and global function in Alzheimer disease: a 15-week, double-blind, placebo-controlled study. Donepezil Study Group. Archives of Internal Medicine, 158(9), 1021–31.
112 Caroe, T.K., & Moe, C. (2009) Adverse events causing discontinuation of donepezil for Alzheimer's dementia. Ugeskr Laeger, 171(50), 3690–3.
113 Zimbroff, D.L., Kane, J.M., Tamminga, C.A., Daniel, D.G., Mack, R.J., Wozniak, P.J., et al. (1997) Controlled, dose-response study of sertindole and haloperidol in the treatment of schizophrenia. Sertindole Study Group. American Journal of Psychiatry, 154(6), 782–91.
114 Simpson, G.M., & Angus, J.W. (1970) A rating scale for extrapyramidal side effects. Acta Psychiatrica Scandinavica, 212, 11–19.
115 Bech, P., Tanghoj, P., Andreasson, K., & Overo, K.F. (2011) Dose-response relationship of sertindole and haloperidol using the pharmacopsychometric triangle. Acta Psychiatrica Scandinavica, 123, 154–61.
116 Lehman, A.F. (1996) Measures of quality of life among persons with severe and persistent mental disorders. Social Psychiatry and Psychiatric Epidemiology, 31(2), 78–88.
117 Bech, P., & Rafaelsen, O.J. (1980) Personality and manic-melancholic illness. Psychiatria Fennica, Supplementum, 223–31.
118 Bech, P. (2006) The full story of lithium. A tribute to Mogens Schou (1918–2005). Psychotherapy and Psychosomatics, 75(5), 265–9.
119 Johnstone, E.C., Crow, T.J., Frith, C.D., & Owens, D.G. (1988) The Northwick Park "functional" psychosis study: diagnosis and treatment response. Lancet, 2(8603), 119–25.
120 Bentall, R.P. (2003) Madness explained. London, Allen Lane.
121 Gjerris, A., Bech, P., Broen-Christensen, C., Geisler, A., Klysner, R., & Rafaelsen, O.J. (1981) Haloperidol levels in relation to antimanic effect. In: Clinical pharmacology and psychiatry (eds E. Usdin, S. Dahl, L.F. Gram, & O. Lingjærde), pp. 227–32. London, Macmillan Press.
122 Bech, P., Gex-Fabry, M., Aubry, J.M., Favre, S., & Bertschy, G. (2006) Olanzapine plasma level in relation to antimanic effect in the acute therapy of manic states. Nordic Journal of Psychiatry, 60(2), 181–2.
123 Greenberg, G. (2010) Manufacturing depression. The secret history of a modern disease. London, Bloomsbury.
124 Boyer, P., Montgomery, S., Lepola, U., Germain, J.M., Brisard, C., Ganguly, R., et al. (2008) Efficacy, safety, and tolerability of fixed-dose desvenlafaxine 50 and 100 mg/day for major depressive disorder in a placebo-controlled trial. International Clinical Psychopharmacology, 23(5), 243–53.
125 Bjerrum, H., Allerup, P., Thunedborg, K., Jakobsen, K., & Bech, P. (1992) Treatment of generalized anxiety disorder: comparison of a new beta-blocking drug (CGP 361 A), low-dose neuroleptic (flupenthixol), and placebo. Pharmacopsychiatry, 25(5), 229–32.
126 Bech, P. (2007) Dose-response relationship of pregabalin in patients with generalized anxiety disorder. A pooled analysis of four placebo-controlled trials. Pharmacopsychiatry, 40(4), 163–8.
127 Rickels, K., Downing, R., Schweizer, E., & Hassman, H. (1993) Antidepressants for the treatment of generalized anxiety disorder. A placebo-controlled comparison of imipramine, trazodone, and diazepam. Archives of General Psychiatry, 50(11), 884–95.
128 Bech, P., Thomsen, J., Prytz, S., Vendsborg, P.B., Zilstorff, K., & Rafaelsen, O.J. (1979) The profile and severity of lithium-induced side effects in mentally healthy subjects. Neuropsychobiology, 5(3), 160–6.
129 Trivedi, M.H., Fava, M., Wisniewski, S.R., Thase, M.E., Quitkin, F., Warden, D., et al. (2006) Medication augmentation after the failure of SSRIs for depression. New England Journal of Medicine, 354(12), 1243–52.
130 Bech, P., Fava, M., Trivedi, M.H., Wisniewski, S.R., & Rush, A.J. (2012) Outcomes on the Pharmacopsychometric Triangle: bupropion-SR versus buspirone augmentation of citalopram in the STAR*D Trial. Acta Psychiatrica Scandinavica, 125(4), 342–8.
131 Harper, R.S. (1949) The laboratory of William James. Harvard Alumni Bulletin, November, 169–73.
132 Bech, P. (1999) Stress og livskvalitet (Stress and quality of life). Copenhagen, PsykiatriFondens Forlag.
133 James, W. (1897) The will to believe. London, Longmans, Green & Co.
134 James, W. (1907) Talks to teachers. New York, Norton.
135 Bentham, J. (1834) Deontology or the science of morality. London, University of London.
136 Ware, J.E., Jr., Kosinski, M., Gandek, B., Aaronson, N.K., Apolone, G., Bech, P., et al. (1998) The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. Journal of Clinical Epidemiology, 51(11), 1159–65.
137 Murray, H.A. (1938) Explorations in personality. New York, Oxford University Press.
138 Rasmussen, E.T. (1965) Dynamisk psykologi og dens grundlag (Dynamic psychology and its basis). Copenhagen, Munksgaard.
139 Dupuy, H.J. (1984) The Psychological General Well-Being Index (PGWB). In: Assessment of quality of life in clinical trials of cardiovascular therapy (eds N.K. Wenger, M.E. Mattson, C.D. Furberg, & J. Elinson), pp. 184–8. New York, Le Jacq Publishing.
140 Bech, P., Gudex, C., & Johansen, K.S. (1996) The WHO (Ten) Well-Being Index: validation in diabetes. Psychotherapy and Psychosomatics, 65(4), 183–90.
141 Noerholm, V., Groenvold, M., Watt, T., Bjorner, J.B., Rasmussen, N.A., & Bech, P. (2004) Quality of life in the Danish general population – normative data and validity of WHOQOL-BREF using Rasch and item response theory models. Quality of Life Research, 13(2), 531–40.
142 Bech, P., Olsen, L.R., Kjoller, M., & Rasmussen, N.K. (2003) Measuring well-being rather than the absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and the WHO-Five Well-Being Scale. International Journal of Methods in Psychiatric Research, 12(2), 85–91.
143 McDowell, I. (2010) Measures of self-perceived well-being. Journal of Psychosomatic Research, 69(1), 69–79.
144 Speer, D.C. (1998) Mental health outcome evaluations. San Diego, Academic Press.
145 Carrasco-Lucas, R., Allerup, P., & Bech, P. (2012) The validity of the invariant item ordering of the World Health Organization-Five Well-Being Index in screening for the elements of tiredness and unrested sleep within apathy in an elderly population.
146 Christensen, K.S., Bech, P., & Fink, P. (2010) Measuring mental health by questionnaires in primary care – unidimensionality, responsiveness and compliance. European Psychiatric Review, 3, 8–12.
147 Bech, P., Gormsen, L., Loldrup, D., & Lunde, M. (2009) The clinical effect of clomipramine in chronic idiopathic pain disorder revisited using the Spielberger State Anxiety Symptom Scale (SSASS) as outcome scale. Journal of Affective Disorders, 119(1–3), 43–51.
148 Kristensen, T.S., Borg, V., & Hannerz, H. (2002) Socioeconomic status and psychosocial work environment: results from a Danish national study. Scandinavian Journal of Public Health, 59, 41–48.
149 Davidson, J.R.T., & Foa, E.B. (1993) Posttraumatic stress disorder. DSM-IV and beyond. Washington DC, American Psychiatric Press.
150 Buitenhuis, J., de Jong, P.J., Jaspers, J.P., & Groothoff, J.W. (2006) Relationship between posttraumatic stress disorder symptoms and the course of whiplash complaints. Journal of Psychosomatic Research, 61(5), 681–9.
151 Selye, H. (1974) Stress without distress. 1st ed. New York, Lippincott.
152 Selye, H. (1980) Stress uden angst (Stress without anxiety). Copenhagen, Gyldendal.
153 Bech, P. (2002) Measurement issues. In: Biological psychiatry (eds H. D'Haenen, J.A. Den Boer, & P. Willner), pp. 25–36. New York, John Wiley.
154 Grinker, R.R., Sr., Miller, J., Sabshin, M., & Nunnally, J.C. (1961) The Phenomena of Depressions. New York, Hoeber.
155 Olsen, L.R., Mortensen, E.L., & Bech, P. (2004) Prevalence of major depression and stress indicators in the Danish general population. Acta Psychiatrica Scandinavica, 109(2), 96–103.
156 Olsen, L.R. (2007) Measurements of depressive illness and mental distress in the Danish general population. Copenhagen, Copenhagen University.
157 Endler, N.S., & Magnusson, D. (1976) Multidimensional aspects of State and Trait anxiety: A cross-cultural study of Canadian and Swedish college students. In: Cross-cultural anxiety (eds C.D. Spielberger & R. Diaz-Guerrero), pp. 143–72. Washington DC, Hemisphere Publishing.
158 Awata, S., Bech, P., Yoshida, S., Hirai, M., Suzuki, S., Yamashita, M., et al. (2007) Reliability and validity of the Japanese version of the World Health Organization-Five Well-Being Index in the context of detecting depression in diabetic patients. Psychiatry and Clinical Neurosciences, 61(1), 112–19.
159 de Wit, M., Pouwer, F., Gemke, R.J., Delemarre-van de Waal, H.A., & Snoek, F.J. (2007) Validation of the WHO-5 Well-Being Index in adolescents with type 1 diabetes. Diabetes Care, 30(8), 2003–6.
160 Birket-Smith, M., Hansen, B.H., Hanash, J.A., Hansen, J.F., & Rasmussen, A. (2009) Mental disorders and general well-being in cardiology outpatients – 6-year survival. Journal of Psychosomatic Research, 67(1), 5–10.
161 Bech, P., Bille, J., Lindberg, L., Waarst, S., Lauge, N., & Treufeldt, P. (2010) Health of the Nation Outcome Scales (HoNOS). Ti år med HoNOS: 2000–2009 (Ten years with HoNOS: 2000–2009). Hillerød, Psykiatrisk Center Nordsjælland, Forskningsenheden.
162 Lichtenberg, P., & Belmaker, R.H. (2010) Subtyping major depressive disorder. Psychotherapy and Psychosomatics, 79(3), 131–5.
163 Lam, R.W., Michalak, E.E., & Swinson, R.P. (2006) Assessment scales in depression and anxiety. London, Taylor & Francis.
164 Rush, A.J. (2007) STAR*D: what have we learned? American Journal of Psychiatry, 164(2), 201–4.
165 Gottesman, I.I., & Gould, T.D. (2003) The endophenotype concept in psychiatry: etymology and strategic intentions. American Journal of Psychiatry, 160(4), 636–45.
166 Körner, S. (1986) The philosophy of mathematics. New York, Dover Publications.
167 Barrett, C. (ed.) (1966) Wittgenstein. Oxford, Blackwell.
168 Regis, E. (1987) Who got Einstein's office? New York, Addison-Wesley.
169 Angst, J. (1966) Zur Ätiologie und Nosologie endogener depressiver Psychosen (On the aetiology and nosology of endogenous depressive psychoses). Berlin, Springer.
170 Stieglitz, R.D., Fahndrich, E., & Renfordt, E. (1988) Interrater study for the AMDP system. Pharmacopsychiatry, 21(6), 451–2.
171 Angst, J., Adolfsson, R., Benazzi, F., Gamma, A., Hantouche, E., Meyer, T.D., et al. (2005) The HCL-32: towards a self-assessment tool for hypomanic symptoms in outpatients. Journal of Affective Disorders, 88(2), 217–33.
172 Hirschfeld, R.M., Williams, J.B., Spitzer, R.L., Calabrese, J.R., Flynn, L., Keck, P.E., Jr., et al. (2000) Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire. American Journal of Psychiatry, 157(11), 1873–5.
173 Moller, H.J. (2001) Methodological aspects in the assessment of severity of depression by the Hamilton Depression Scale. European Archives of Psychiatry and Clinical Neuroscience, 251 Suppl 2, II13–20.
174 Moller, H.J. (2009) Standardised rating scales in psychiatry: methodological basis, their possibilities and limitations and descriptions of important rating scales. World Journal of Biological Psychiatry, 10(1), 6–26.
175 Guidi, J., Fava, G.A., Bech, P., & Paykel, E.S. (2011) The Clinical Interview for Depression: A comprehensive review of studies and clinimetric properties. Psychotherapy and Psychosomatics, 80, 10–27.
176 Paykel, E.S., Klerman, G.L., & Prusoff, B.A. (1970) Treatment setting and clinical depression. Archives of General Psychiatry, 22, 11–21.
177 Paykel, E.S. (1990) Use of the Hamilton Depression Scale in general practice. In: The Hamilton Scales (eds P. Bech & A. Coppen), pp. 40–9. Berlin, Springer.
178 Lingjaerde, O., Edlund, A.H., Gormsen, C.A., Gottfries, C.G., Haugstad, A., Hermann, I.L., et al. (1974) The effects of lithium carbonate in combination with tricyclic antidepressants in endogenous depression. A double-blind, multicenter trial. Acta Psychiatrica Scandinavica, 50(2), 233–42.
179 Bech , P ., Malt , U.F ., Dencker , S.J ., Ahlfors , U.G ., Elgen , K ., Lewander , T ., et al.
( 1993 ) Scales for assessment of diagnosis and severity of mental disorders . Acta
Psychiatrica Scandinavica , 87 (Supplementum 372 ), 1 – 91 .
180 Williams , J.B.W. ( 1990 ) Structured interview guide for the Hamilton Rating
Scale . In: The Hamilton Scales (eds P . Bech , A. Coppen ), pp. 48 – 63 . Berlin ,
Springer .
181 Williams , J.B . ( 2001 ) Standardizing the Hamilton Depression Rating Scale: past,
present, and future . European Archives of Psychiatry and Clinical Neurosciences ,
251 Suppl 2 , II6 – 12 .
182 Williams , J.B. , Kobak , K.A. , Bech , P. , Engelhardt , N. , Evans , K. , Lipsitz , J. , et al.
( 2008 ) The GRID-HAMD: standardization of the Hamilton Depression Rating
Scale . International Clinical Psychopharmacology , 23 ( 3 ), 120 – 9 .
183 Rush , A.J. , Giles , D.E. , Schlesser , M.A. , Fulton , C.L. , Weissenburger , J. , & Burns ,
C. ( 1986 ) The Inventory for Depressive Symptomatology (IDS): preliminary
findings . Psychiatry Research , 18 ( 1 ), 65 – 87 .
184 Fleck , M.P. , Poirier-Littre , M.F. , Guelfi , J.D. , Bourdel , M.C. , & Loo , H. ( 1995 )
Factorial structure of the 17-item Hamilton Depression Rating Scale . Acta
Psychiatrica Scandinavica , 92 ( 3 ), 168 – 72 .
185 Lecrubier , Y. , & Bech , P. ( 2007 ) The Ham D(6) is more homogenous and as
sensitive as the Ham D(17) . European Psychiatry , 22 ( 4 ), 252 – 5 .
186 Overall , J.E. , Gorham , D.R. ( 1988 ) The Brief Psychiatric Rating Scale (BPRS) .
Recent developments in ascertainment and scaling. Psychopharmacology Bulletin ,
24 , 97 – 9 .
187 Kay , S.R. , Opler , L.A. , Lindenmayer , J.P. ( 1988 ) Reliability and validity of the
positive and negative syndrome scale for schizophrenics . Psychiatry Research ,
23 ( 1 ), 99 – 110 .
188 Van Os , J. , Gilvarry , C. , Bale , R. , Van Horn , E. , Tattan , T. , White , I ., et al. ( 1999 )
A comparison of the utility of dimensional and categorical representations of
psychosis . UK700 Group. Psychological Medicine , 29 ( 3 ), 595 – 606 .
189 Mellenbergh, G.J. (1994) Generalized linear item response theory. Psychological Bulletin, 115, 300–7.
190 Quine, W.V. (1985) The time of my life. Boston, MIT Press.
191 Bech, P. (2002) The Bech-Rafaelsen Melancholia Scale (MES) in clinical trials of therapies in depressive disorders: a 20-year review of its use as outcome measure. Acta Psychiatrica Scandinavica, 106 (4), 252–64.
192 Bech, P. (2005) The Bech-Rafaelsen Mania and Melancholic Scales in clinical trials. In: Focus on Bipolar Research (ed M.C. Brown), pp. 131–51. New York, Nova Science Publishers.
Clinical Psychometrics, First Edition. Per Bech.
© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.
Index
Note: Page references in bold refer to entries in the Glossary
ABC Hamilton Depression Scale 84,
122–5, 131
ADAS (Alzheimer’s Disease Assessment
Scale) 59
alcohol 69
Allerup, Peter 37, 108
allostasis 85, 88, 109
alprazolam 71
Alzheimer’s Disease Assessment Scale
(ADAS) 59
AMDP (Arbeits-Gemeinschaft für
Methodik und Dokumentation
in der Psychiatrie)
system 104
American College of
Neuropsychopharmacology
(ACNP) 106
amitriptyline 105
Andersen, A.F. 34
Andreasen, Nancy 5
Angst, Jules 104
antianxiety medication 69–72
antidementia medication 59–60, 93
antidepressants 36, 56, 57, 66–9
combination of 72–3
tricyclics 66, 106
antimanic medication 65–6
antipsychotic medication 60–4, 66
anxiety 18
Anxiety Symptom Scale (ASS) 86, 92,
93–4, 160
applied mathematics 102
Arbeits-Gemeinschaft für Methodik und
Dokumentation in der
Psychiatrie (AMDP) system 104
Bacharach, V.R. 30, 31
Bech-Rafaelsen Mania Scale (see MAS)
Bech-Rafaelsen Melancholia Scale
(see MES)
Beck Depression Inventory (BDI) 87, 116
BDI version 6 146–7
Beck’s cognitive model of depression
86, 87
Bentall, R.P. 65
Bentham, Jeremy 75
benzodiazepines 69, 70, 71
Bernstein, I.R. 26
beta-blocker 70
between-groups analysis 107
bi-directional factor 13–15
Big Five model 18
Big Two model 18
Binet, Alfred 24, 26
bipolar affective disorder 63, 65, 104
bipolar factor 12, 13
Bolwig, T.G. 52
Boring, Edwin 108
Borsboom, D. 37, 38
brain research 3–4
Brief Psychiatric Rating Scale
(BPRS) 24–6, 27, 28, 44, 46, 47,
50, 52, 61, 107–8, 165–6
British Association for
Psychopharmacology (BAP) 106
buspirone 73
Cade, John 65
Calvinism (pharmacological) 109
Cattell, R.B. 13, 56
ceiling items 35, 36, 39
Centre for Epidemiologic Studies
Depression Scale (CES-D) 92
Chi-Squared Test 38
chloral hydrate 8
chlorpromazine 20, 23, 27, 46, 60, 107
Chomsky, Noam 86
citalopram 67, 68, 73
classical psychometric procedures
40–1
Clinical Global Impression Scale,
Severity (CGI-S) 50
Clinical Interview for Depression
(CID) 45, 105
Clinical Interview for Depression and
Related Syndromes (CIDRS) 44,
45, 170–5
clinimetrics 1, 23, 30, 86, 109
clonazepam 71
coefficient of homogeneity
Loevinger 20, 40, 108
Mokken 61, 64, 85
coefficient of reliability 27
Cohen, Jacob 50–2, 55, 85
Collegium Internationale Neuro-
Psychopharmacologicum
(CINP) 106
compliance 81, 109–10
Comprehensive Psychopathological
Rating Scale 44
computer adaptive testing (CAT) 36
computer assisted tomography (CAT)
scan 5
Comrey, A.L. 26
contra-phobic reaction 47
Copenhagen lecture (Hamilton)
117–21
correlation coefficient 11, 26, 110
correlation matrix 13
cortisol 85–6, 88
critical monism 6, 38
Cronbach’s alpha 26, 30–1, 50, 82,
92, 96
cross-over analysis 107
Cushing, H.W. 86
Cushing’s Disease 86
Darwin, Charles 32
Davidson, Donald 38
Dein, Erling 9, 48, 52
Delay, J. 23
depression 3, 34–5, 47
subtypes 98
unipolar 104
depression ruler 48, 49
Derogatis, L.R. 108
desvenlafaxine 68, 69
Diagnostic and Statistical Manual of
Mental Disorders (DSM)
27, 48
DSM-I 27
DSM-III 28, 29, 43, 82
DSM-IV 27–31, 32, 43, 84–5,
89, 108
DSM-V 29
diazepam 69–70, 71, 72
donepezil 59–60
dose-response relationship 53, 57, 68
dual factor 12, 13, 23
Early Clinical Drug Evaluation
(ECDEU) manual (Guy) 106
effect size 50–2, 53–6
in pharmacopsychometric
triangle 56–7
escitalopram 66, 67, 68
extrapyramidal symptoms (EPS) 61
extraversion/introversion 16
Eysenck, Hans 1, 15–19, 20, 27, 81
Extraversion scale 18
Personality scale 95
Personality Questionnaire (EPQ) 16,
18, 104
Neuroticism scale 16, 17, 18, 19, 88
factor analysis 10–12, 14, 24, 26, 29, 31,
49, 95–7, 102, 110, 179
British vs American 12–13
vs item response theory (IRT)
analysis 39–42
personality questionnaires and 15–20
rating scales and 20–3
family resemblances 102
Fechner, Gustav 5
Feighner criteria 110–11
Feinstein, Alvan R. 23, 30, 109
Fisher, Ronald A. 1, 13, 32–3, 34
Fisher’s exact test 38
Fleck, Marcelo 106
floor items 36
fluoxetine 55
Frank, Jerry 108
Freud, Sigmund 1, 9, 16, 43, 47, 81,
102, 103
personality theory of neuroticism 16
Friis-Hasché, Erik 91
Furr, R.M. 30, 31
Galton, Francis 1, 32, 43
Gaussian bell curve 33
general factor 11, 12
General Health Questionnaire (GHQ) 81
Global Depression Scale 51
Gorham, Don 24, 27, 46, 107
graphic rating scales 43
Greenberg, G. 107
GRID-HAM-D 106
Grinker, R. 87
Guelfi, J.D. 106
Guilford, J.P. 12, 26
guilt feelings 35, 36, 39
Guttman, Louis 37, 108
cumulative model 37, 40, 42, 43
haloperidol 60–1, 61–3, 65
HAM-A 11, 20, 28, 31, 45, 46, 47, 70,
71, 95
HAM-A6 21, 22, 70, 71, 72
HAM-A13 21
HAM-A14 21, 22, 23, 71, 72, 105,
154–9
HAM-D 3, 20–4, 26, 28, 31, 45, 46, 47,
48, 51, 53, 86–7, 97, 105
GRID version 45
HAM-D6 4, 22, 36, 49, 50, 55, 56, 57,
66, 68
clinician version 141–2
Questionnaire 143–4
HAM-D9 84
HAM-D17 22, 39, 42, 52, 54–7, 68, 69,
98–101, 122–5, 126–31
ABC version 84
HAM-D21 56
HAM-D24 132–4
Hamilton, Max 1, 20–3, 27, 46–7, 102,
103, 105, 108, 117–21
Hamilton Anxiety Scale see HAM-A
Hamilton Depression Scale see
HAM-D
Helmholtz, Hermann von 5
Hippius, Hanns 103, 104
Høffding, Harald 6, 38
Hollister, Leo 46, 107
Hospital Anxiety and Depression Scale
(HADS) 70
Hotelling, Harold 13–14, 26, 33, 42,
102, 179
Hypomania Checklist (HCL-32) 104
idiographic method of measurement
17
imipramine 27, 46, 58, 68, 69, 71, 72
indices of validity 48
Inpatient Multi-dimensional Scale
(IMPS) 46
intelligence tests 10–12, 24, 26
International Classification of Disease
(ICD) (WHO) 28
ICD-6 27
ICD-10 27–31, 32, 43, 48, 82, 84–5,
89, 98, 108
hierarchy or ladder 58–9
ICD-11 29
intraclass coefficient 27
invariant item ordering 39
Inventory of Depressive
Symptomatology (IDS-30) 106
item parameter difficulty 35
item response theory (IRT) analysis 26,
29–31, 34–8, 43, 47, 48, 49, 54,
56, 96, 108, 182
vs factor analysis 49–50
non-parametric analysis for
39–42
Jacobsen, Ove 48, 52
James, William 74, 108
Jessen, Borge 34
Jung, Carl Gustav 16
Kant, Immanuel 3, 4, 102
Kaplan-Meier curves 93
Kappa coefficient 27, 51
Karpatchof, Benny 53, 54
Kay, Stanley R. 107
Kirsch, I. 56
Klerman, G.L. 28, 29, 106
Kline, P. 16
Kraepelin, Emil 1, 2, 6–9, 9–10, 20, 27,
74, 95, 102, 103, 104, 108
‘diagnostic cards’ 7, 8
Psychiatric Compendium 7
symptom checklist 6–9
Kruskal-Wallis One-Way Analysis of
Variance by Ranks 39
Lam, R.W. 97–101
language-game approach 42
Last Observation Carried Forward
(LOCF method) 55
Lecrubier, Yves 106
Lehmann, Alfred 9
Likert, Rensis 43–5
Likert response 43
Likert scale 40, 44, 108
Lindenmayer, J.P. 107
Lingjærde, Odd 106
lithium 20, 29, 63, 65, 72, 106
local independency of items 38, 50, 54
Loevinger, Jane 20, 40, 108
Loevinger coefficient of
homogeneity 20, 40, 108
Loo, H. 106
Lorr, M. 46
MADRS 37, 44–5, 66, 68
ABC scoring sheet 44
magnetic resonance imaging (MRI) 5
Major Depression Inventory (MDI) 86,
89, 148–53
mania 29
MAS 52, 65, 66, 161–4
manic-depressive disorder 8, 10
medical model (etiological considerations)
29, 91, 97–8, 170–6
medical stress model (Selye) 82, 83,
85–6
MES 54, 115, 116, 136–9
Mindham 105
MINI International Neuropsychiatric
Interview (MINI) 106
Mini Mental State Examination
(MMSE) 59–60, 92
Minnesota Multiphasic Personality
Inventory (MMPI) 107
Mitchell, J. 36–7
modern psychometric procedures
40–2
Mokken, Robert J. 1, 39–40, 43, 108
coefficient of homogeneity 61,
64, 85
Molenaar, I.W. 40
Mood Disorder Questionnaire
(MDQ) 104
Möller, Hans Jürgen 104
Montgomery-Åsberg Depression Rating
Scale see MADRS
mood stabilising medications 72
morphine 8
National Institute of Mental Health
(NIMH) 107
NEO-PI-R 18
Neuropsychiatric Inventory (NPI) 60
neuroticism 81
New Clinical Drug Evaluation Unit
(NCDEU) 107
Newcastle Diagnostic Depression Scale
(1965) 176–7
nominal scale 8, 16, 38, 39
non-parametric statistics 38–9, 108
non-reductive monism 6, 38
Nørholm, Vibeke 89
normal (Gaussian) distribution 33
nomothetic method 17
Nunnally, J.C. 26
Ockham, William 26
Ockham’s razor 26
olanzapine 66
Olsen, Lis Raabæk 91
ordinal scale 39
Østergaard, Lise 9–10, 48
Overall, John 24, 27, 45–8, 103, 107
Paykel, Eugene 105
Parkinson’s Disease 61
parsimony, law of 26
Patient Related Inventory of Side Effects
(PRISE-20) 178
Pearson, Karl 1, 26, 33
Pearson’s correction 39
Perry, Ralph Barton 108
pharmacopsychology 2, 6–9
pharmacopsychometric triangle 56–9,
61, 66, 70, 71, 72, 73, 97
pharmacopsychometrics 96
phenemal 8, 60, 70, 71
phenotyping 101
Pichot, Pierre 1, 23–6, 27, 47, 102, 103,
106, 107, 108
pimozide 65
population-independent
response-curve 69
population studies in depression and
anxiety 89–94
Positive and Negative Syndrome Scale
(PANSS) 4–5, 30, 44, 45, 47, 61,
107–8
positive manifold 13
positron emission tomography (PET)
scanning 4, 5
post-traumatic stress disorder
(PTSD) 82–4
pregabalin 71
Present State Examination (PSE) 8
primary depression 111
principal component analysis
(PCA) 13–15, 26, 42, 96, 179–82
PRISE 20 (Patient Related Inventory of
Side Effects) 178
propranolol 70
psychoanalysis 1, 9, 102, 111
Psychological General Well-Being
(PGWB) 78, 79
psychomotor retardation 35, 36, 39
psychopharmacology 111
psychotic symptom items 4
Putnam, H. 4, 102
Q-LES-Q 66, 68
quality of life 61, 74–5
Quality of Life scale 56, 58, 59, 60, 66,
68, 70
Quine, William Van Orman 4, 108
Rafaelsen, O. 116
ramified hierarchy of typology
(Russell) 14, 42
rank order tests 38
Rasch, Georg 1, 26, 34–8, 47, 102, 108
Rasch analysis 34, 36, 37, 39, 40, 43, 49,
50, 56, 89, 183–4
reductionism 36, 111
relapse 100, 101, 111
reliability (questionnaire) 30, 111
reliability (rating scale) 27–8, 29, 111–12
reliability, coefficient of 27
remission 16, 45, 63, 72, 101, 112
response 101, 112
Rorschach, Hermann 9
Rorschach test 9–10, 16, 17, 27, 81, 107
Rush, John 106
Russell, Bertrand 14–15, 42, 108, 180, 181
scale step measurements 43–5
Scandinavian College of
Neuro-Psychopharmacology
(SCNP) 106
Schafer, R. 15
schizophrenia 5, 8, 9, 10, 29, 47, 61, 65
schizophrenicity 96, 165–6
Schou, Mogens 65
screening scales 92
selective serotonin reuptake inhibitor
(SSRI) 53–4, 66
Self-perceived Stress Scale (Cohen) 85
Selye, Hans 82, 83, 85–6
‘sensus numinis’ 6
serotonin and noradrenaline reuptake
inhibitors (SNRI) 68
sertindole 61–3, 65
SF-12 75, 76
SF-36 (Medical Outcomes Studies, Short
Form) 72, 75–8
Sheehan, David 107
Sheehan’s Disability Scale 92
Siegel, Sidney 1, 38–9, 108
Sijtsma, K. 40
Simpson-Angus scale 61, 63
Skinner, Fred 108
Spearman, Charles 1, 6, 10–13, 14, 17,
24, 27, 33, 95, 102
Spearman correlation analysis 39
Spielberger State Anxiety Scale
(STAI) 19, 91–2, 93
Spielberger, Charles 18
antianxiety model 86, 88
Spitzer, R.L. 28
standardisation 112, 115
STAR-D analysis 23
Statistical Analysis System (SAS) 49
statistical uncertainty 48
stress 82–8
Strömgren, Bengt 103
Strömgren, Erik 65, 103, 108
sufficiency, concept of 34
sufficient rating scales 45–8
sufficient statistic 32, 34, 37, 41–3, 49,
54, 61, 89, 97
suicidal ideation 35, 36
Suppes, Patrick 38
Symptom Checklist (SCL)
SCL-90 85, 97, 108
SCL-90-R 108
SCL-92 97, 108
SCL-D6 145
symptom checklist (Kraepelin) 7, 9, 95
Teasdale, Thomas 179
test-retest reliability coeffi cient 30
Thurstone, L.L. 12, 46
trait anxiety 18
transferability 36, 38, 41, 56, 96–7, 112
translation procedure 115
trazodone 71
tricyclic antidepressants 66, 106
Turner, William J. 24, 44, 45, 56
UKU (Udvalg for Kliniske
Undersøgelser) Scale 58, 106
Side Effect Rating Scale 106
unidimensionality 68, 112
unipolar depression 104
validity (clinical) 1, 11, 15, 18, 23–6, 37,
48–9, 112
validity (external) 34, 113
validity (psychometric) 28–9, 37, 48–9,
112–13
Vanggaard, Thorkild 81
Vannerus, A. 6
venlafaxine 68, 71
Vernon, P.E. 13
visual analogue scale (VAS) 50, 113
Vitger, John 52
WHO-5 questionnaire 71, 72,
78–81, 97
predictive value 92, 93
quality-of-life scale 68, 89
Well-Being Index (1998
version) 167–8
Wilcoxon Signed Rank Test 39
Williams, Janet 106
window (time frame) 113
Wittgenstein, Ludwig 4, 40, 42, 53, 102
work-related stress condition 84–5
Wundt, Wilhelm 1, 3, 5–6, 6–7, 10, 28,
29, 32, 38, 74, 75, 95
Yates’ correction 38
Young Mania Rating Scale (YMRS) 66