Clinical Psychometrics

Per Bech

A John Wiley & Sons, Ltd., Publication


This edition first published 2012 © 2012 by John Wiley & Sons, Ltd

Danish original title: Klinisk psykometri, by Per Bech, ISBN 97887628-1011-2, copyright Munksgaard Danmark, Copenhagen 2011.

This edition of Klinisk psykometri is published with the title “Clinical Psychometrics”, by arrangement with Munksgaard Danmark.

Wiley-Blackwell is an imprint of John Wiley & Sons, formed by the merger of Wiley’s global Scientific, Technical and Medical business with Blackwell Publishing.

Registered Office John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Offices 9600 Garsington Road, Oxford, OX4 2DQ, UK 111 River Street, Hoboken, NJ 07030-5774, USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell

The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting a specific method, diagnosis, or treatment by physicians for any particular patient. The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. Readers should consult with a specialist where appropriate. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data

Bech, Per. [Klinisk psykometri. English] Clinical psychometrics / Per Bech. – 1st ed. p. ; cm. Includes bibliographical references and index. ISBN 978-1-118-32978-8 (pbk. : alk. paper) 1. Psychometrics. 2. Psychiatry. I. Title. [DNLM: 1. Psychometrics–history. 2. Factor Analysis, Statistical. 3. Psychology, Clinical– instrumentation. 4. Psychopharmacology. BF 39] BF39.B417 2012 150.1′5195–dc23

2012009839

A catalogue record for this book is available from the British Library.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Cover image: © Todd Harrison – iStockphoto.com Cover design by Sarah Dickinson

Set in 9.5/12pt Minion by SPi Publisher Services, Pondicherry, India

1 2012

I attempted to effect the scientific in my psychopathology by methodological investigations, not by a dogmatic exposition of a complete psychiatric epistemology.

Karl Jaspers (1950)

The debt of psychiatry to the psychologist is now great and growing. From [Eysenck’s] rigorous inquiries, sustained and resourcefully developed over years, psychiatry stands to gain an impetus and accuracy in some essential matters which will advance it and reinforce the free play of clinical skill and insight.

Aubrey Lewis (1952)

Emil Kraepelin is probably the most outstanding psychiatrist who ever lived.

Max Hamilton (1978)

To Ole Rafaelsen, a man larger than life, and to Erling Dein who showed me how to use Occam’s razor in psychopathology

Contents

About the author

Preface

Introduction

1. Classical psychometrics
  Emil Kraepelin: Symptom check list and pharmacopsychology
  Charles Spearman: Factor analysis and intelligence tests
  Harold Hotelling: Principal Component Analysis
  Hans Eysenck: Factor analysis and personality questionnaires
  Max Hamilton: Factor analysis and rating scales
  Pierre Pichot: Symptom rating scales and clinical validity

2. Modern psychiatry: DSM-IV/ICD-10
  Focusing on reliability
  Focusing on validity
  Quantitative, dimensional diagnosis

3. Modern dimensional psychometrics
  Ronald A. Fisher: From Galton’s pioneer work to the sufficient statistic
  Georg Rasch: From Guttman’s pioneer work to item response theory analysis (IRT)
  Sidney Siegel: Non-parametric statistics
  Robert J. Mokken: Non-parametric analysis for item response theory (IRT)

4. Modern psychometrics: Item categories and sufficient statistics
  Rensis Likert: Scale step measurements
  John Overall: Brief, sufficient rating scales
  Clinical versus psychometric validity
  Item-response theory versus factor analysis
  Jacob Cohen: Effect size

5. The clinical consequence of IRT analyses: The pharmacopsychometric triangle
  Effect size and clinical significance
  The pharmacopsychometric triangle
  Antidementia medication
  Antipsychotic medication
  Antimanic medication
  Antidepressive medication
  Antianxiety medication
  Mood stabilising medications
  Combination of antidepressants

6. The clinical consequence of IRT analyses: Health-related quality of life
  The WHO-5 Questionnaire

7. The clinical consequences of IRT analyses: The concept of stress
  Post-traumatic stress disorder
  The work-related stress condition
  Integration of Selye’s medical stress model

8. Questionnaires as ‘blood tests’
  Population studies in depression and anxiety
  The predictive validity of WHO-5
  Screening scales

9. Summary and perspectives

10. Epilogue: Who’s carrying Einstein’s baton?

Glossary

Appendices

References

Index

About the author

Per Bech

Per Bech received a medical degree from Copenhagen University in 1969. In 1972 he received a gold medal award from Århus University for his thesis on the dose-response relationship between cannabis (tetrahydrocannabinol) and various psychological measurements, including time experience and reaction time in simulated car driving.

In 1981 he completed a doctoral thesis (Dr. Med. Sci.) at Copenhagen University on the clinical and psychometric validity of rating scales in depression and mania.

He was appointed Professor of Psychiatry at Odense University in 1992 and in 2008 he was appointed Professor of Applied or Clinical Psychometrics at Copenhagen University.

Since 1981 he has held the post of chief psychiatrist at The Mental Health Centre North Zealand in Hillerød (Capital Region of Denmark) and is Head of the Psychiatric Research Unit there. He is an honorary member of the Royal College of Psychiatrists and of the European Psychiatric Association (EPA).

Preface

The first edition of this book was the original Danish version published in January 2011, as an introduction to the very broad field covering clinical psychology, psychiatry and clinical psychopharmacology. It was an attempt to follow Kraepelin’s rating scale approach and his pharmacopsychometrics as they have developed in the twentieth century, especially with the introduction of psychopharmacology in the 1960s. The central concept here is the Pharmacopsychometric Triangle, in which (A) covers desired clinical effect, (B) unwanted effects, or side effects, and (C) patient-reported quality of life. In connection with (A), short psychometric scales are described which can be used to measure the effect of such classes of drugs as antidementias, antipsychotics, antimanics, antidepressants, antianxiety drugs, and mood stabilisers.

The psychometric performances of scales for (A), (B) and (C) are described with reference both to factor analysis and to item response theory models. These models have been adapted for readers without mathematical knowledge. However, throughout the book experienced psychiatrists are referred to as an index of validity in an attempt to bring the symptoms home to the dimensions within (A), (B) and (C) where they belong.

My thanks when preparing the Danish version of my book went, as so often before, to Peter Allerup, Professor of Theoretical Psychometrics at the University of Århus. He has been a ‘basic factor’ for my work with rating scales over nearly 40 years! My research coordinator Lone Lindberg has made a unique contribution, with invaluable help in typing and layout. Gabriele Bech-Andersen and Susan Søndergaard are behind the translation procedures for the scales in the Danish version, and Susan has translated this English version from the Danish. Ove Aaskoven has been my statistical research assistant for many years, often in collaboration with Peter Allerup. Finally, I owe a debt of thanks to the Munksgaard editors Marie Schack and Daniel R. Andersen who made helpful suggestions for the earlier Danish versions.

In this English version editor Jesper Konradsen has raised challenging queries, especially on the philosophical lines running through it, with focus on the development of psychometrics from a philosophical start to mathematical aspects of measuring mental stages, to clinical validity and dose–response relationships and then back to the philosophy of Wittgenstein, which brings symptoms home to form relevant syndromes or dimensions.


Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

Introduction

Clinical psychiatry has incorporated psychology as an important auxiliary subject in the same way as neuropharmacology and neuroanatomy. As a branch of medicine, clinical psychiatry has especially attempted to determine the organic cause of mental disorders; and before the establishment of psychometrics, the psychological approach to patients was seen as a non-organic explanatory model for mental disorders. Freud’s psychoanalysis, in particular, was seen as a psychological explanatory model, and it was partly because of the non-testability of the Freudian theories, which were thus without clinical validity, that psychiatry was regarded for many years as an atypical branch of medicine (1).

The scientific approach to psychology launched by psychometrics has resulted in psychiatry being regarded as a clinical branch of medicine. This only took place with the 1987 publication of Feinstein’s monograph on clinimetrics (2). Finding a comprehensive overview of the role of psychometrics in clinical psychiatry has proved difficult. The following is an attempt to put this to rights.

It falls naturally to divide clinical psychometrics into two eras. The first of these, the classical era, covers the period from 1879 to 1945. It is the era of the greatest names: Wilhelm Wundt, who founded psychometrics in 1879, and his two most important pupils, Kraepelin and Spearman. The modern era, which developed after 1945, has Eysenck, Hamilton and Pichot as its major psychometricians. They developed the questionnaires and rating scales archetypal of modern clinical psychometrics in the period from 1945 to the 1970s (3). From a statistical point of view, however, Francis Galton and his London psychometric laboratory (founded in 1884) are essential elements, together with Galton’s two most important ‘students’ (Pearson and Fisher) and the three people (Rasch, Siegel and Mokken) who developed the psychometric analyses that are archetypal of modern clinical psychometrics in the period from 1945 to the 1970s (4) (see Figure I.1).

The most obvious impact of modern psychometric research, which has resulted in short valid rating scales and the descriptive statistics of effect sizes, is the pharmacopsychometric triangle. It was the revolution in pharmacology 50 years ago that led to the rebirth of Kraepelin’s pharmacopsychology, now crystallised in the pharmacopsychometric triangle, the major focus of this book.

Figure I.1 Psychometrics

[Diagram. Recoverable labels: Wundt, Leipzig (1879–1904); Galton, London (1884); Kraepelin (1883) and pharmacopsychology (1892); Spearman (1904), factor analysis; Hotelling (1933), PCA; Eysenck (1953); Hamilton (1967); Pichot (1974); DSM-III/IV and ICD-10 (1994); Pearson (1911), The Grammar of Science; Fisher (1922), sufficient statistic; Siegel (1956), nonparametric statistics; Rasch (1960), IRT; item score: Likert (1932), anchoring points; total score: Cattell (1973), transferability.]


1 Classical psychometrics

More than a century ago, psychology was defined as the science of human mental manifestations and phenomena. However, it was psychometrics (the science of measuring these mental manifestations and phenomena) that made psychology scientific. Thus, psychometrics is a purely psychological area of research.

From a historical point of view, psychology branched out from philosophy as an independent university discipline at the close of the nineteenth century. It all started in Leipzig in 1879. Here the philosopher Wilhelm Wundt (1832–1920) established his psychological laboratory at the university. Formally, however, his laboratory remained under the faculty of philosophy.

Wundt succeeded in detaching psychology from philosophy, especially freeing it from the influence of Immanuel Kant, an extremely influential philosopher who stated that it is impossible to measure manifestations of the mind in the same way as physical objects (5). With his criticism of pure reason, Kant (1724–1804) established the very important distinction between ‘the essential nature of things’ (things in themselves) and ‘things as they seem’ (i.e., that which we sense or perceive as a phenomenon when faced with the object we are examining).

Figure 1.1 illustrates Kant’s philosophical approaches with reference to present-day psychiatry, according to which depression is understood to be a clinical phenomenological perception (shared phenomenology of depressive symptoms) as measured by the six depression symptoms contained in the Hamilton Depression Scale (HAM-D6, see Figure 3.1). Modern neuropsychiatry attempts to describe the depression behind the phenomenological perception, i.e., depression ‘in itself’, as we believe it to be present in the brain, for example, as a serotonin 1A receptor problem (impairment).

The area of research now known as brain research is just such an attempt to measure the processes presumed to be taking place in the brain, that is ‘das Ding an sich’. As pointed out by Sontag, reality has increasingly grown to resemble what the camera shows us (6). It is reality itself when the neuropsychiatric camera demonstrates receptor binding in the brain, while clinical reality is increasingly becoming what the camera visualises for us by means of assessment scales or patient-related questionnaires.

The ability to describe reality as it is in itself, i.e., looking at the world unclouded by any preconception of it, has been debated by such neo-Kantians as Wittgenstein and Quine (7). The quantification of endophenotypes or deep phenotypes is probably the most scientific image of the world. However, we do not have endophenotypes to tell us whether we indeed can describe reality, e.g., the brain, as it is in itself. Wittgenstein tells us that he does not want to say whether we can or cannot describe reality as it is in itself. He wanted, as stated by Putnam, to bring our phenomenological items back to their home in clinical psychiatry. This is what clinical psychometrics is about (7).

Figure 1.2 shows a correlation between the so-called psychotic symptom items in an American rating scale (see Appendix) and serotonin 2A receptor binding, which it is now possible to measure by means of positron emission tomography (PET) scanning (8). The figure shows a correlation coefficient of −0.57; this is statistically significant but not clinically significant, as the variance on the ordinate axis (the ‘psychosis’ scale) can explain only about 32% of the variance on the axis of abscissas (serotonin 2A receptor binding). If the two patients at the far left are excluded as outliers, then the negative correlation value is halved, so that less than 10% of the variance is explained.
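The percentages quoted here follow from squaring the correlation coefficient, since the share of variance explained is r². The short Python sketch below only checks that arithmetic; the function name is illustrative and does not come from the book.

```python
def variance_explained(r: float) -> float:
    """Return the proportion of shared variance, i.e. the squared correlation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


r_all = -0.57          # correlation reported for Figure 1.2
r_halved = r_all / 2   # the text states the value is halved without the two outliers

share_all = variance_explained(r_all)
share_halved = variance_explained(r_halved)

print(f"all patients: {share_all:.1%} of the variance explained")
print(f"outliers excluded: {share_halved:.1%} of the variance explained")
```

With r = −0.57 the share is roughly a third, and halving r cuts the share to a quarter of that, which is why it drops below 10%.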

Figure 1.1 The philosophical background for the emergence of psychometrics

Kant’s philosophical approach

Psychometric frame of reference (the clinical scientist): das Ding für uns, the phenomenon for us. Things as we perceive them in time and space when measuring them, e.g. HAM-D6.

Biological frame of reference (the brain scientist): das Ding an sich, the noumenon. Things in themselves; only biological comprehension is valid, e.g. serotonin 1A receptor in the brain.


The scale in Figure 1.2 shows the positive symptoms in a schizophrenia scale. In the early 1970s, the American psychiatrist Nancy Andreasen found it important to label those schizophrenic symptoms on which medication had an effect as positive. In clinical psychiatry, these were termed productive symptoms as they were often the reason for hospitalisation in a mental institution. Later on, Nancy Andreasen became interested in neuropsychiatric brain imaging methods [Computer Assisted Tomography (CAT scan), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET)], which became available in the 1980s and 1990s. However, in an interview from 2003, she had to admit that schizophrenia is probably not located in one specific section of the brain (9). Schizophrenia affects many different brain areas that cannot be visualised as ‘das Ding an sich’.

Wilhelm Wundt’s major achievement was to realise that mathematical models of ‘das Ding für uns’ can be used to measure the ‘shared phenomenology’ of the state one wishes to assess quantitatively. During his studies at the Heidelberg faculty of medicine, he obtained a degree in medicine. Wundt then participated in studies in the physiology of perception under Helmholtz (1821–94) and Fechner (1801–87). He observed that it was possible to get subjects to reliably assess sensory impressions when the conditions of the study were standardised, e.g., with increasing light or noise exposure.

Figure 1.2 The problematic relationship between the clinical, the psychometrical and the biological frames of reference, with a correlation coefficient of −0.57

[Scatter plot. Vertical axis: clinical assessment, psychotic subscale of the PANSS (see Appendix), 10 to 40. Horizontal axis: frontal 5-HT2A receptor binding in the brain (biological validity), 2.00 to 4.00.]


Wundt’s philosophical basis was that each manifestation of the mind corresponds to a neurobiological substrate in the brain, but in his opinion the psychometric measurement of this manifestation of the mind should only focus on the psychological phenomena (das Ding für uns) and not include any biological elements in any way. He belonged to the branch of philosophy called non-reductive monism (corresponding to Harald Høffding’s critical monism, which maintains that manifestations of the mind cannot be reduced to purely biological variables) (10). On the other hand, it is of course possible to reduce certain manifestations of the mind to less complicated ones in an attempt to obtain the most reliable or objective measure. He felt that it would be possible in this way to make psychology scientific within the frame of its own descriptive realm, since psychological and biological methods of description are two different ways of viewing reality.

Wundt’s approach was that of descriptive psychology, where the various dimensions consisting of individual items (symptoms) can be added to give a total score. He excluded the immediate peak experiences detached from relations, e.g., the spontaneous, stimulus-unrelated, perception-like images in the religious experience of the child, actually referred to as ‘Sensus numinis’ (11,12). The clearest description of Wundt’s scientific approach based on his ‘Grundzüge der physiologischen Psychologie’ is found in Vannerus’ monograph (13).

The psychometric method developed by Wundt is probably the only specific psychological method identified in mental science, i.e., in scientific psychology (14). The two most famous scientists to emerge from Wundt’s psychological laboratory in Leipzig were Emil Kraepelin and Charles Spearman; both of them understood that psychological measurement (psychometrics) and biological measurement are two different ways of viewing nature.

Emil Kraepelin: Symptom check list and pharmacopsychology

Kraepelin (1856–1926) had just obtained his medical degree when he applied for a post at Wundt’s laboratory in 1882. As Wundt was unable to finance his salary, Kraepelin also had to take up a post as a locum at the local mental hospital in Leipzig. Thus, Kraepelin held an unsalaried position at the Wundt laboratory. Kraepelin’s purpose was to introduce scientific psychology into psychiatry so that his career as a psychiatrist would be furthered by his studies at Wundt’s psychological laboratory. In his job application to Wundt, he wrote that he would give a kingdom for a [research] topic; Wundt then gave him the opportunity to examine the influence of psychoactive substances such as alcohol and the hypnotic drug chloral hydrate on volunteer research subjects. Kraepelin set out to demonstrate a dose-response curve using reaction time measurements as the psychological response and psychoactive substances as the stimuli, so that increasing amounts of alcohol (number of drinks) led to lengthening reaction times. Since Wundt could see that Kraepelin had his heart set on psychiatry, he encouraged Kraepelin to employ this objective scientific method when subsequently assessing the various symptoms presented by patients suffering from mental disorders.
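A dose-response relationship of this kind can be summarised by the slope of a straight line fitted to dose and response. The Python sketch below is a minimal illustration only: the doses and reaction times are invented and are not Kraepelin’s measurements, and an ordinary least-squares line is assumed here rather than taken from the book.

```python
def least_squares_slope(doses, responses):
    """Slope of the ordinary least-squares line of response on dose."""
    n = len(doses)
    if n != len(responses) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_dose = sum(doses) / n
    mean_resp = sum(responses) / n
    sxx = sum((d - mean_dose) ** 2 for d in doses)
    if sxx == 0:
        raise ValueError("doses must not all be identical")
    sxy = sum((d - mean_dose) * (r - mean_resp)
              for d, r in zip(doses, responses))
    return sxy / sxx


# Hypothetical values for illustration only (not from any study).
drinks = [0, 1, 2, 3, 4]                   # dose: number of drinks
reaction_ms = [250, 265, 285, 310, 340]    # response: reaction time in ms

slope = least_squares_slope(drinks, reaction_ms)
print(f"reaction time changes by {slope:.1f} ms per additional drink")
```

A positive slope corresponds to the pattern described above, in which reaction times lengthen as the dose increases.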

Kraepelin published his first Psychiatric Compendium as early as 1883. In this he attempted to focus on the symptoms presented in the different disorders (Compendium der Psychiatrie. Verlag von Ambr. Abel, Leipzig, 1883). After leaving the Leipzig laboratory and starting on his career as a psychiatrist in Munich, Kraepelin published several compendiums or textbooks on psychiatry. He revised his textbook almost bi-annually and in the 6th edition in 1899, he was able to describe two disorders with different symptom profiles: manic-depressive disorder and schizophrenia.

Kraepelin’s symptom checklist from his Zählkarten (counting cards)

• Nervousness

• Restlessness

• Irritability

• Depression

• Psychomotor retardation

• Aggression

• Grandiosity

• Negativistic behaviour

• Hallucinations

• Paranoid ideas

Matthias M. Weber and Eric J. Engstrom. Kraepelin’s ‘diagnostic cards’: the confluence of clinical research and preconceived categories. History of Psychiatry 1997; 8: 375–385.

Figure 1.3 The assessment scale or checklist used by Kraepelin (10)

Figure 1.3 shows the checklist Kraepelin used when systematically monitoring his patients over several years in order to ascertain which symptoms possessed ‘shared phenomenology’ over this period of time. These are called checklist symptoms, as Kraepelin only determined whether the symptom was present or absent. This type of scale is called a nominal scale. Using this method, Kraepelin was able to demonstrate that during a period of about six months, some patients presented with the first five or six symptoms in Figure 1.3, while in other episodes of shorter duration (up to three months) they had the next two symptoms (aggression and delusions of grandeur), along with restlessness, sleep disturbance and irritability. Between these episodes of depression or mania, these patients were discharged from hospital and were socially well-functioning. Other patients, who were often lifetime residents in asylums, had the last three symptoms in Figure 1.3. Kraepelin described them as suffering from dementia praecox (now schizophrenia), as the disorder typically started when they were about 20 years of age and was chronic in nature, often with an influence on intellectual functions as well. But these were consequences, not elements, of the schizophrenic symptomatology. Manic-depressive disorder, on the other hand, did not typically emerge at a specific age.

Based on the original registrations by Kraepelin on his ‘Zählkarten’ (counting cards) including the checklist symptoms in Figure 1.3, Jablensky et al. made a comparison using the Present State Examination (PSE). From the PSE scores the ICD-9 diagnoses of schizophrenia and manic-depressive disorder can be made. In total Jablensky et al. identified 721 patients assessed by Kraepelin and found a concordance for the diagnoses of schizophrenia and manic-depressive disorder of approximately 80% with the ICD-9 diagnoses (15).
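The two ideas in this passage, present/absent (nominal) coding and diagnostic concordance, can be sketched in a few lines of Python. The patient record, the diagnosis lists and the function names below are invented for illustration and are not Kraepelin’s or Jablensky’s data. Simple percent agreement is assumed as the concordance measure, which may differ from the statistic used in the study (15).

```python
# The ten checklist symptoms as listed in Figure 1.3.
CHECKLIST = [
    "nervousness", "restlessness", "irritability", "depression",
    "psychomotor retardation", "aggression", "grandiosity",
    "negativistic behaviour", "hallucinations", "paranoid ideas",
]


def code_nominal(observed):
    """Code every checklist symptom as present (True) or absent (False)."""
    unknown = set(observed) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    return {symptom: symptom in observed for symptom in CHECKLIST}


def percent_agreement(diagnoses_a, diagnoses_b):
    """Percentage of patients given the same diagnosis by both sources."""
    if len(diagnoses_a) != len(diagnoses_b) or not diagnoses_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(diagnoses_a, diagnoses_b))
    return 100 * matches / len(diagnoses_a)


# Hypothetical patient: only presence or absence is recorded, not severity.
card = code_nominal({"hallucinations", "paranoid ideas"})

# Hypothetical diagnoses for five patients (invented for illustration).
original = ["schizophrenia", "manic-depressive", "schizophrenia",
            "manic-depressive", "schizophrenia"]
reassessed = ["schizophrenia", "manic-depressive", "manic-depressive",
              "manic-depressive", "schizophrenia"]

agreement = percent_agreement(original, reassessed)
print(f"symptoms present: {sum(card.values())} of {len(CHECKLIST)}")
print(f"diagnostic agreement: {agreement:.0f}%")
```

A nominal record of this kind carries no information about severity, which is the limitation that later graded rating scales address.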

In his thesis ‘Über die Beeinflussung einfacher psychischer Vorgänge durch einige Arzneimittel’ (Jena, Fischer Verlag 1892), Kraepelin established the area of research he designated pharmacopsychology.

In the 8th edition of his textbook, written between 1909 and 1913, Kraepelin added reflections on the psychotherapeutic effects of certain drugs such as morphine, phenemal (phenobarbital) and chloral hydrate. However, he found that the effects of these drugs on schizophrenia and manic-depressive disorder were extremely poor. He was thus able to observe the spontaneous course of illness in these two disorders.

In the schizophrenic patient, as stated previously, the condition was unremitting, while manic-depressive disorder was characterised by episodes with specific symptoms and then periods between episodes of a year or more in which the patients were completely without symptoms and thus able to function normally. In these descriptions, Kraepelin determinedly avoided including the various theories on disease circulating at that time, such as hereditary elements, stress burden and so on.

Kraepelin’s textbooks were not widely known outside Germany, as the two world wars made German psychiatry less acceptable. His system only began to make an international impact after World War II, not least in the USA.


During his research at Wundt’s Leipzig laboratory, Kraepelin conceived the idea of establishing pharmacopsychology. He thought it important to describe the symptoms found to be reversible during a course of pharmacological therapy. However, as mentioned previously, no therapeutically adequate drugs were developed during Kraepelin’s lifetime, so this research area was scaled down. It is a fact of great interest that Kraepelin was among the first to propose the use of dose-response comparisons as an essential pharmacological criterion when determining the clinical effect of a drug.

The Rorschach test

Until the breakthrough of modern psychopharmacology in the 1960s, Danish psychometric research was heavily influenced by the Rorschach Shape Interpretation test, published by the Swiss psychiatrist Hermann Rorschach in 1921. The Rorschach test consists of 10 symmetrical inkblots, which do not represent recognisable images per se, but are used as indefinite visual stimuli open to many different interpretations, in the same way as with abstract painting. No psychometric theory underlies this ‘inkblot test’, but in the hands of a trained psychologist it may provide an opening for the psychodynamic theories propounded with reference to Freud’s psychoanalysis. Psychoanalysis was an accepted method of treatment in psychiatry during the period between the two world wars. However, an inherent limitation of the Rorschach test is that the scoring is heavily dependent on the testing psychologist, so that the Rorschach test has very poor inter-observer reliability (agreement).

In Denmark, experimental psychology with stimulus-response trials dominated research. Alfred Lehmann (1858–1921) was the founder of experimental psychology in Denmark. He had worked together with Kraepelin at Wundt’s laboratory. He established a psychological laboratory at Copenhagen University in 1886; Kraepelin paid a visit to it in 1901. The first professor of clinical psychology at Copenhagen University, Lise Østergaard (1924–1996), used the Rorschach test in her doctoral thesis on formal thought disturbances in schizophrenia at the University of Copenhagen, but the clinical experience she had gained under the supervision of the consultant psychiatrist Erling Dein turned out to be more rewarding than her Rorschach results (16). In the introduction to her thesis, Lise Østergaard correctly states that Kraepelin with his symptom checklist was the first person able to delimit schizophrenia by its characteristic symptomatology. Kraepelin had emphasised that the symptom profile was rarely quite alike from one patient to another, but in chronic schizophrenics the course of their disorder was completely homogenous.

Lise Østergaard then adds that Kraepelin ’ s description of these patients

could ‘have a rather sterile and external appearance’. She finds Kraepelin ’ s

mode of description ‘marked by the stiffness and paucity of nuances

that characterised Germanic psychology (Wundt). Kraepelin was not open to

the new currents in the psychology of his period (i.e., the psychodynamic

theories)’.

However, Lise Østergaard was forced to conclude that it was Kraepelin ’ s

consistent, clinical descriptions of psychiatric patients that made it at all

possible to delimit both the schizophrenic as well as the manic-depressive

disorder.

With the introduction of modern psychopharmacology, it became vitally

important to follow Kraepelin ’ s clinical but somewhat sterile measuring of

symptoms, and as a consequence psychometrics had to reject the Rorschach

test on a scientific basis (lack of reliability and validity) and to go on to

promote the use of symptom rating scales based on Kraepelin ’ s checklist.

Clinical reality, as described by Kraepelin at the start of the 20th century,

was ousted by Freud’s psychoanalysis, and only reinstated in the 1950s when

modern psychopharmacology appeared on the scene. This made the clinical

reality Kraepelin had described perfectly obvious to everyone, as well as the

fact that Freud’s clinical theorising had been dismissed. Because clinical

psychology was so slow to realise this, its range became very limited. Thus, it

is hardly a paradox that clinical psychiatrists were the ones to develop clinical

rating scales.

Charles Spearman: Factor analysis and intelligence tests

In 1906, the English psychology student Charles Spearman (1863–1945)

finalised his studies at Wundt ’ s laboratory with a PhD thesis, but in 1904 he

had already published his first paper on the correlation method that was to

become the starting point of factor analysis ( 17 ).

Spearman then moved back to England and took up a London profes-

sorship. His psychological field of interest was that of intelligence tests for

use with primary school pupils. Spearman is generally regarded as the first

actual promoter of psychometrics via his attempt to define certain dimensions

of intelligence through factor analysis. His idea was to use mathematical

factor analysis to identify the factors that make up the concept of intelli-

gence. Factor analysis is a method by which one may get an indication of

which tests belong together and which do not. Thus, it is not a method of

measurement but a classification of the different tests (factor structures).

Worldwide, however, factor analysis was soon elevated to the status of an

important psychometric proof of validity of a rating scale, i.e., that the scale

was scientifically valid.

If it was possible to show, by the use of factor analysis, which tests pointed

the same way and which pointed in other directions, then a scientific analysis

had been performed.

In 1927, using factor analysis, Spearman was able to identify two factors of

intelligence: a general factor and a specific factor ( 18 ).

The principle of Spearman ’ s factor model is first to compute the correla-

tions between different intelligence tests, identifying those factors that best

describe the connection. The weighting of the tests comprising a certain

factor is computed (factor loadings). The first factor is usually a general

factor. The second factor is a specific factor, which shows in which areas the

person in question has their strong points.

An attempt to use the Spearman factor analysis tradition for empirical

research with different intelligence tests showed that the model does not

describe the real world. One of the problems was that factor analysis is very

sensitive to the range of variance in the sample being tested. If the analysis

is an attempt to determine factors in subjects who are all very intelligent

(i.e., a very narrow range of variance), too many factors will be identified. In a

very large population sample with very different levels of intelligence (i.e.,

a very great range of variance), usually only a single general factor emerges.
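
The sensitivity to the range of variance can be illustrated with a small simulation. The sketch below is not taken from the book. It assumes a hypothetical one-factor model in which two test scores share a common ability plus independent noise, and compares the correlation between the two tests in the whole sample with the correlation among high-ability subjects only. The function names, the cut-off of one standard deviation, the sample size and the seed are all illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, cutoff=1.0, seed=1):
    """Correlation of two tests in the full sample and in a high-ability subsample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)            # shared ability (assumed model)
        test_a = g + rng.gauss(0.0, 1.0)   # test score = ability + noise
        test_b = g + rng.gauss(0.0, 1.0)
        rows.append((g, test_a, test_b))
    full = pearson([r[1] for r in rows], [r[2] for r in rows])
    top = [r for r in rows if r[0] > cutoff]   # narrow range of variance
    restricted = pearson([r[1] for r in top], [r[2] for r in top])
    return full, restricted


r_full, r_restricted = simulate()
print(f"full sample r = {r_full:.2f}, restricted sample r = {r_restricted:.2f}")
```

Under these assumptions the correlation is clearly lower in the restricted subsample. Since the correlation coefficient is the raw material of factor analysis, a general factor has less shared variance to draw on in such a sample, and the remaining factors look relatively more prominent.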

The fundamental element in factor analysis is the correlation coefficient.

Computation of the first factor will provide a rough estimate as to the size of

the correlation coefficients of the individual items in a scale; these are given

as factor loadings. When all the items have positive factor loadings (as is the

case with the first factor in Hamilton ’ s Anxiety Scale, see Table  1.1 ) then a

general factor is present (general anxiety factor in Table  1.1 ). Should one wish

to ascertain whether some items have a higher mutual correlation coefficient

(loadings) than others, then the second factor will provide this information,

through its contrast between positive versus negative loadings. In Table  1.1

the psychic anxiety symptoms have positive loadings while the physical

(bodily) anxiety symptoms have negative loadings. The sign direction in

itself is of minor importance and should not be dwelt upon, as the significant

element here is that the symptoms with the same sign have a higher mutual

correlation than the items with the opposite sign. The result shown in

Table 1.1 has very high clinical validity when assessing the antianxiety effect of

a drug.
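
The reading of Table 1.1 described above can be sketched in a few lines. The loadings are those reported in Table 1.1 for Hamilton (1969). The dictionary layout and the helper name `split_by_sign` are illustrative assumptions, not part of any published scoring procedure.

```python
# Loadings from Table 1.1, Hamilton (1969): item -> (factor 1, factor 2)
HAMILTON_1969 = {
    "Anxiety": (0.66, 0.50),
    "Tension": (0.83, 0.32),
    "Phobic fears": (0.48, 0.28),
    "Insomnia": (0.62, 0.05),
    "Concentration difficulties": (0.69, 0.37),
    "Depressed mood": (0.69, 0.33),
    "Motor tension": (0.52, -0.53),
    "Sensory symptoms": (0.73, -0.30),
    "Cardiovascular": (0.68, -0.41),
    "Respiratory": (0.56, -0.40),
    "Gastrointestinal": (0.66, -0.16),
    "Genito-urinary": (0.45, -0.25),
    "Other autonomic": (0.67, -0.14),
    "General (agitation)": (0.80, 0.10),
}


def split_by_sign(loadings):
    """Group items by the sign of their loading on the second (dual) factor."""
    positive = [item for item, (_, f2) in loadings.items() if f2 > 0]
    negative = [item for item, (_, f2) in loadings.items() if f2 < 0]
    return positive, negative


# A general factor is present when every item loads positively on factor 1.
general_factor_present = all(f1 > 0 for f1, _ in HAMILTON_1969.values())
psychic, physical = split_by_sign(HAMILTON_1969)

print("General factor present:", general_factor_present)
print("Positive on factor 2 (psychic):", psychic)
print("Negative on factor 2 (physical):", physical)
```

Only the grouping matters here. Reversing every sign on factor 2 would leave the two groups unchanged, which is why the sign direction in itself carries no meaning.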

In short, it is the identification of the first two factors that is of clinical

significance. Typically, the first factor will demonstrate that the symptoms

selected obviously have varying degrees of positive correlations; therefore

this factor is called the general factor. The second factor is the bipolar factor

according to the factor analysis literature as it attempts to establish two

discriminatory symptom groups, namely the group with negative factor

loadings and the group with positive factor loadings. Hence this factor is

called the bipolar factor. As this term has nothing to do with bipolar affective

disorder, it is now labelled the dual factor. According to Spearman, in intel-

ligence tests this dual factor would typically discriminate between people

with language skills and people with maths skills.

British versus American factor analysis

Spearman founded a special British approach to factor analysis, in which

factor analysis is used to interpret the first two factors of a rating scale analysis

(the general versus the dual). In contrast, an American approach rapidly

emerged in which factor analysis was used to identify as many factors as

possible. In the following, emphasis will be on the British method. The

American factor-analytic tradition refers particularly to

Guilford’s classical monograph, which first appeared in 1936 ( 5 ) and in a

revised version in 1954 ( 19 ).

Table 1.1 Factor analysis. Archetypical two-factor model of Hamilton’s anxiety scale with factor 1 as a general factor and factor 2 as a bipolar or dual factor with positive loadings on the psychic anxiety symptoms and negative loadings on the physical anxiety symptoms.

                                   Hamilton (1969) (40)    Pichot et al (1981) (41)
                                   Loadings                Loadings
Items                              Factor 1    Factor 2    Factor 1    Factor 2
 1 Anxiety                          0.66        0.50        0.50        0.39
 2 Tension                          0.83        0.32        0.62        0.35
 3 Phobic fears                     0.48        0.28        0.45        0.35
 4 Insomnia                         0.62        0.05        0.65        0.26
 5 Concentration difficulties       0.69        0.37        0.62        0.27
 6 Depressed mood                   0.69        0.33        0.66        0.38
 7 Motor tension                    0.52       –0.53        0.54       –0.25
 8 Sensory symptoms                 0.73       –0.30        0.58       –0.40
 9 Cardiovascular                   0.68       –0.41        0.53       –0.48
10 Respiratory                      0.56       –0.40        0.52       –0.43
11 Gastrointestinal                 0.66       –0.16        0.29       –0.39
12 Genito-urinary                   0.45       –0.25        0.33       –0.31
13 Other autonomic                  0.67       –0.14        0.52       –0.30
14 General (agitation)              0.80        0.10        0.70        0.09

In the American tradition, Thurstone ( 20 ) recommended rotating the

factors in order to find simpler structures, while Guilford recommended

an ‘orthogonal’ rotation, i.e., factors may not inter-correlate (must be at right

angles to each other). Cattell, on the other hand, suggested a less rigorous

approach with the use of ‘oblique’ rotation, permitting a certain degree of

inter-correlation between factors ( 21 ). This basic attempt to eliminate negative

loadings through rotation is called ‘positive manifold’ ( 22 ). In contrast, British

tradition advocates an initial simple description of the principal component

analysis. According to this, the entire core of Spearman’s factor analysis must

be examined before performing any rotation. In this ‘Spearman’ algebra, the

first factor (the principal component) is a general factor that indicates the

degree of positive correlation among the different items in a scale. The second

factor is frequently a bipolar or dual factor (i.e., with negative loadings on some

items and positive loadings on other items). One might claim that the British

tradition is less invasive, less ‘manipulative’ than the American.

When focusing on the landmarks in the development of factor analysis over

the first 50 years, Vernon concludes that Hotelling ’ s principal component

analysis is mathematically more accurate than Spearman ’ s method, but that

its greater complexity implies tedious calculations ( 23 ). However, with the

SPSS or SAS programs, a century after Spearman’s factor analysis, we may

now actually start with Hotelling ’ s method before we perform all the many

rotations within factor analysis. The paradox is that we have difficulty in

understanding the mathematical superiority of Hotelling ’ s method over that

of Spearman. Therefore we do not realise that the first and second principal

components identified by Hotelling ’ s method are often sufficient. In other

words, we are often unable to provide an argument for making all the

rotations inherent in the factor analytic method.

Harold Hotelling: Principal Component Analysis

It was the American mathematician Harold Hotelling (1895–1973) who

became the best advocate for the British (Spearman) algebra of concentrating

on the initial simple correlation matrix, focusing on the first two factors; the

general factor and the bi-directional factor.

Hotelling received his PhD in Mathematics from Princeton University in

1924. In 1927, he wrote a review in the Journal of the American Statistical

Association on the first edition of Fisher’s Statistical Methods for Research

Workers and subsequently visited Fisher in London in 1929. In 1933, from

his new base at Columbia University, Hotelling introduced his Principal

Component Analysis as a pure mathematical approach to factor analysis in an

attempt to simplify the structure of a large number of items in a rating scale

( 24,25 ) (see Calculus Example 1).

The best description of Hotelling ’ s Principal Component Analysis (PCA)

has been made by Dunteman ( 26 ). PCA is an attempt to identify a few

components explaining most of the variance in the scores for individual

items in a rating scale in the original sample. Because PCA is conducted on

rating scales that contain items with some degree of positive inter-correlation,

the first component might explain up to 50% of the variance while the second

component explains 10–15% of the variance. PCA has no underlying statistical

model, but employs a mathematical focus to explain the total variance in

the item scores, thereby capturing most of the information within the items

of the rating scale. The first (general) component is a straight line in the

correlation matrix with closest fit to the total variance, and the second

component is a straight line of closest fit to the residuals from the first prin-

cipal component. Since both principal components are uncorrelated, each

one makes an independent contribution to accounting for the variance of the

original items. The correlations of items within the principal components are

called loadings, a term borrowed from Spearman ’ s factor analysis. Whereas

the eigenvalue of the first principal component is usually higher than 1.0,

the eigenvalue of the second principal component need not be higher than

0.7 ( 26 ). The first principal component must be orthogonal to the second

component, which will have alternative loadings, i.e., as many negative as

positive loadings (bi-directional, or dual), thereby contrasting the two groups

of items that are mutually most correlated.
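
The properties just described (eigenvalues, share of variance, orthogonality, and a second component with alternating signs) can be checked on a toy example. The sketch below uses a hypothetical correlation matrix for four items, two ‘psychic’ and two ‘somatic’. The numbers are invented for illustration. To stay within the Python standard library it extracts the components by power iteration with deflation. This technique is swapped in for a statistical package and is not Hotelling’s own computational procedure.

```python
import math

# Hypothetical correlation matrix: items 1-2 "psychic", items 3-4 "somatic".
R = [
    [1.0, 0.6, 0.3, 0.3],
    [0.6, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.6],
    [0.3, 0.3, 0.6, 1.0],
]


def matvec(m, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def leading_component(m, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    v = normalise([1.0 / (i + 1) for i in range(len(m))])
    for _ in range(iterations):
        v = normalise(matvec(m, v))
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v


def deflate(m, eigenvalue, v):
    """Remove the part of the matrix already accounted for by one component."""
    n = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]


ev1, v1 = leading_component(R)
ev2, v2 = leading_component(deflate(R, ev1, v1))

# Loadings are the correlations of the items with each component.
loadings_1 = [math.sqrt(ev1) * x for x in v1]
loadings_2 = [math.sqrt(ev2) * x for x in v2]

# For standardised items the total variance equals the number of items.
share_1 = ev1 / len(R)
share_2 = ev2 / len(R)

print("eigenvalues:", round(ev1, 3), round(ev2, 3))
print("share of variance:", round(share_1, 3), round(share_2, 3))
print("loadings, component 1:", [round(x, 2) for x in loadings_1])
print("loadings, component 2:", [round(x, 2) for x in loadings_2])
```

In this toy matrix all four items load positively on the first component, while the second component sets the two psychic items against the two somatic items. Which of the two groups receives the positive sign is arbitrary.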

PCA should be clinically interpreted as a method of classifying items,

rather than a method to validate the problems of measurement. The presence

of a general factor or component is not an argument for summing all items of

a rating scale so that the total score is a sufficient statistic for measuring

severity on a dimension.

PCA is a way to group items according to the second, bi-directional

component, for example into typical and atypical depression. In this context,

Bertrand Russell ’ s ramified hierarchy of typology is the best way to illustrate

the clinical meaningfulness of PCA ( 27 ). The example used by Russell is the

definition of a typical versus an atypical Englishman. It is clear that most

Englishmen do not possess all of the properties that most Englishmen

possess. Therefore, a typical Englishman, according to this definition, might

be atypical. The problem raised by Russell is that the word ‘typical’ has been

defined by a reference to all properties. It is in this situation that Russell

introduced his ramified hierarchy in order to deal with the apparent circularity

( 27 ). Being a typical Englishman should not refer to the totality of properties

(all potential items) but to a sub-totality of the predicative items for which

over 50% of the properties are captured by the concept of a typical Englishman.

The PCA can be considered as a method of ramified hierarchy in which the

second component has identified the predicative items by contrasting items

with negative and positive loadings.

In conclusion, with reference to Russell ’ s theory of typology, the general

component or factor identified by principal component analysis is the

description of being an Englishman, whereas the bi-directional second prin-

cipal component or factor is the description of being a typical or an atypical

Englishman by the contrasting positive versus negative loadings of the second,

bi-directional factor, e.g., positive = typical and negative = atypical.

Hans Eysenck: Factor analysis and personality questionnaires

In the autumn of 1945, Eysenck (1916–1997) was appointed Chief

Psychologist at the Psychiatric Institute in London, which is affiliated with

the Maudsley psychiatric hospital ( 1 ).

Eysenck set out to evaluate the validity of the psychological tests used in

clinical psychiatry in the late 1940s. This has quite neutrally been summarised

by Schafer, who concludes that if the results of a psychological test

diverge from the diagnosis made by the psychiatrist, this does not necessarily

mean that the test is incorrect ( 28 ). A clinical diagnosis, e.g., in depression,

was not at that time clear-cut, as psychiatrists often found it difficult to

distinguish between neurotic and psychotic depression. This mirrored

Kraepelin versus Freud in their understanding of ‘neurotic’ and ‘psychotic’.

The above-mentioned Rorschach interpretation in schizophrenia is a good

example of this ( 16 ).

In this connection, it is imperative to understand that Eysenck did not

himself treat patients and that his contact with clinically experienced

psychiatrists led him to perceive Freud’s psychoanalysis as both a theory of

personality and a treatment model ( 1 ). Eysenck soon realised that as a

treatment method, psychoanalysis lacked clinical effect. In his personality

questionnaire studies, however, his reference frame was to be found in

Freud’s and Jung’s psychoanalytic models of personality rather than in true

clinical reality. In his trials with factor analysis, he adhered to Spearman’s

British tradition by examining the first two factors (the general versus the

dual), while using Hotelling’s principal component analysis.

As mentioned previously, it had become a tradition among psychologists to

use the test constructed by the psychiatrist Hermann Rorschach (1884–1922)

(the Rorschach test). In the area of personality, Rorschach had discovered that

vision can be influenced by the personality behind the ‘glasses’. He thus

thought that coloured inkblots are especially stimulating for the extrovert

personality (extroversion dimension), while non-coloured inkblots, with

less movement of the figures, are connected to the introvert personality

(introversion dimension).

Eysenck demonstrated that these Rorschach theories could not be

empirically reproduced using the Rorschach test, as interpretations of

the test varied a good deal from one psychologist to another. In the field of

psychometrics, Eysenck adopted the position that it is important to work

with consistent personality dimensions. Using an empirical approach, he

demonstrated that it is possible to ask people what they are experiencing.

By using questionnaires, Eysenck was able to eliminate investigator influence

on testing behaviour, and he felt that the use of factor analysis would ensure that

the interpretation of the questionnaire response profiles would not be influ-

enced by the interpretation of the individual investigating psychologist. Eysenck

made use of lay subjects (initially often young men up before the medical board

prior to military service), but rarely included patients with a valid diagnosis. His

questionnaires had qualitative response options on a nominal scale, in which

only a ‘Yes’ or a ‘No’ was required. One of the reasons for this was the limited

capacity of the computers available in the 1950s and 1960s; nowadays, we have

access to the necessary power when using quantitative response categories.

Eysenck drew on both Jung’s personality theory of extroversion/introversion

(as used by Rorschach), as well as on Freud’s personality theory of neuroticism,

as the basis of his psychological approach.

As a psychologist working on a theoretical basis, Eysenck was not sufficiently

aware of the fact that both Jung and Freud were primarily clinical experts. Thus,

Freud perceived neuroticism as a particularly pronounced degree of normal

behaviour, not as the qualitative remove from normal behaviour seen in schiz-

ophrenia or the psychotic forms of depression or mania. As shown by Kline

( 29 ), Eysenck attempted to validate his questionnaire dimensions, e.g., neuroti-

cism and extroversion/introversion, within the field of learning psychology, not

in the clinical reality that formed the basis of Freud ’ s and Jung ’ s theories.

Among these personality dimensions ( 30 ), Eysenck ’ s neuroticism factor

proved the most definite ( 31 ). Figure  1.4 gives an abbreviated version of

Eysenck ’ s Neuroticism Scale with the nine items that best show the structure of

the anxious neurotic personality. Of the remaining questions in Eysenck ’ s

Neuroticism Scale (23 in all), many are closely associated with depression.

A psychometric analysis of Eysenck’s Personality Questionnaire (EPQ), based

on a study with persons experiencing relatively rapid remission after posttraumatic

stress ( 32 ) and a corresponding control group (N = 1353 persons), gave

a Loevinger coefficient of homogeneity of 0.42, proving that it is acceptable

to use the total score of the nine questions as a measure of neuroticism.

Another study, with patients suffering from differing types of affective disorders,

showed that only Eysenck’s neuroticism scale was in accord with an

experienced psychiatrist’s assessment of the degree of neurosis ( 33 ).
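
For readers unfamiliar with the Loevinger coefficient of homogeneity, the following sketch shows one common way of computing it for yes/no items: the sum of the observed covariances between all item pairs, divided by the sum of the largest covariances those pairs could attain given how often each item is answered ‘yes’. The function name `loevinger_h` and the tiny data sets are illustrative assumptions. This is not the analysis reported in the studies cited above.

```python
from itertools import combinations


def loevinger_h(responses):
    """Loevinger's H for dichotomous (0/1) items.

    responses: list of rows, one row of 0/1 answers per person.
    Assumes at least one item pair in which both items vary.
    """
    n = len(responses)
    k = len(responses[0])
    # Proportion of 'yes' answers per item.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_both = sum(row[i] * row[j] for row in responses) / n
        observed += p_both - p[i] * p[j]            # observed covariance
        maximum += min(p[i], p[j]) - p[i] * p[j]    # largest covariance possible
    return observed / maximum


# A perfectly cumulative (Guttman) pattern: each 'yes' implies all easier items.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# Two items answered independently of each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(loevinger_h(guttman))
print(loevinger_h(unrelated))
```

A perfectly cumulative response pattern gives the maximum value of 1, and unrelated items give 0. A reported coefficient of 0.42 lies between these two reference points.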

Eysenck found that those persons specifically suffering from anxiety

had a response pattern that was very sensitive to negatively formulated

questions – such as those dealing with symptoms: the higher the number of

affirmative responses, the more neurotic the subject is.

When commencing his research with these questionnaires, Eysenck

labelled the Rorschach test the idiographic method of measurement and his

own questionnaires, the nomothetic method.

The idiographic method is concerned with what is of unique significance

to one individual with no relevance for others and Eysenck therefore cor-

rectly stated that the idiographic method cannot be used in measuring, since

to measure is precisely to observe individuals with reference to a common

scale. In contrast, the nomothetic method centres on what can be measured.

Eysenck ’ s use of factor analysis to prove the fact of a nomothetic measure is a

paradox, because factor analysis is not a method of measurement. Thus, in

modern research factor analysis is used in idiographic analyses, e.g., when

describing an individual ’ s quality of life ( 34 ).

It is of great importance to understand that Eysenck ’ s intensive personality

questionnaire research using factor analysis actually confirms Spearman ’ s

results within the field of intelligence tests, in that a special focus should

be placed on the first two factors identified by the analysis. Thus, Eysenck found

that the first factor was a general neuroticism factor (Figure  1.4 ), while factor

2 was a dual factor discriminating between extroversion versus introversion

( 30 ). It was Eysenck ’ s attempts to explain the remaining factors and to relate

these to the psychoanalytic perception of personality rather than to clinical

reality that blurred his results.

Eysenck’s Neuroticism Scale

The questions below address how you would describe yourself in general.

No.  Symptom                                                    Yes (= 1)  No (= 0)
15   Are you an irritable person?
19   Are your feelings easily hurt?
31   Would you call yourself a nervous person?
34   Are you a worrier?
38   Do you worry about awful things that might happen?
41   Would you call yourself tense or “highly-strung”?
47   Do you worry about your health?
54   Do you suffer from sleeplessness?
72   Do you worry too long after an embarrassing experience?
     Total score

Item numbers in accordance with the EPQ (30).

Figure 1.4 Scoring sheet for Eysenck’s neuroticism questionnaire

Around 1970, the American psychologist Charles Spielberger developed

a questionnaire to measure anxiety ( 35 ). In this he attempted to

discriminate between dispositional neurotic personality and present state

anxiety. The first of these he termed ‘trait’ anxiety and the second ‘state’

anxiety.

Figure  1.6 shows Spielberger ’ s ‘trait’ scale with 9 items selected from the

original 20. This selection is based on the criterion of clinical validity, so that

it corresponds with Eysenck ’ s neuroticism scale (Figure  1.4 ).

Around 1990, an international consensus that a five-factor personality

model could adequately cover the whole field was achieved among psychologists

( 36 ). This model is called ‘The Big Five’ ( 37 ). On the basis of this model,

a questionnaire, the NEO-PI-R, was developed. The two first factors in ‘The

Big Five’ are based on Eysenck ’ s EPQ and reflect Eysenck ’ s Neuroticism Scale

and Eysenck ’ s Extraversion Scale. Neuroticism and Extroversion are usually

referred to as ‘The Big Two’; however, the items in the NEO-PI-R do not ade-

quately cover Eysenck ’ s original dimension. The abbreviated versions of

Eysenck ’ s Neuroticism and Extroversion Scales (shown in Figures  1.4 and 1.5 )

are sufficient when measuring ‘The Big Two’.

Figure 1.7 shows the nine NEO-PI-R items that correspond most closely to

Eysenck’s neuroticism from a clinical point of view as shown in Figure 1.4.

Only five out of the nine items in Figure 1.7 are negatively phrased, so the four

remaining items must be ‘flipped’ when measuring the degree of neuroticism.

When this is done Loevinger’s coefficient of homogeneity is 0.42.

Eysenck’s Extraversion scale

The questions below address how you would describe yourself in general.

No.  Symptom                                                       Yes (= 1)  No (= 0)
5    Are you a talkative person?
10   Are you rather lively?
17   Do you enjoy meeting new people?
32   Do you have many friends?
52   Do you like mixing with people?
60   Do you like doing things in which you have to act quickly?
70   Can you get a party going?
82   Do you like plenty of bustle and excitement around you?
86   Do other people think of you as being very lively?
     Total score

Item numbers in accordance with the EPQ (30).

Figure 1.5 Scoring sheet for Eysenck’s extraversion questionnaire

Spielberger’s trait anxiety scale

The statements below address how you would describe yourself in general.

No.  Symptom                                                                  Yes (= 1,2,3)*  No (= 0)
2    I tire easily
4    I wish I could be as happy as others seem to be
8    I feel that difficulties are piling-up so that I cannot overcome them
9    I worry too much over something that really doesn’t matter
11   I am inclined to take things hard
12   I lack self-confidence
14   I try to avoid facing a crisis or difficulty
18   I take disappointments so keenly that I can’t put them out of my mind
20   I become tense and upset when I think about present concerns
     Total score

Item numbers in accordance with the original publication (35).
* Degrees 1, 2 and 3 all give positive replies

Figure 1.6 Scoring sheet for Spielberger’s trait anxiety questionnaire

NEO items corresponding with Eysenck’s neuroticism dimension

The questions below address how you would describe yourself in general.

No.  Symptom                                             Yes (= 1)  No (= 0)
1    I am not the worrying type
31   I scare easily
61   I seldom feel anxious or uneasy
79   I hesitate to show anger, even when apposite
91   I often feel tense and nervous
121  I seldom worry about the future
147  I do not see myself as especially unworried
151  I often worry about things that might go wrong
216  Even minor factors can frustrate me
     Total score

Item numbers in accordance with the original scale.

Figure 1.7 Scoring sheet for modified neuroticism questionnaire (NEO)
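
The ‘flipping’ of items before summation can be sketched as follows. The item numbers are those of Figure 1.7 and the yes = 1, no = 0 coding follows the scoring sheet. The text does not list which four items are to be flipped, so the set used below is an assumption based solely on the wording of the items, and the function name is likewise illustrative.

```python
# Item numbers from Figure 1.7.
NEO_ITEMS = [1, 31, 61, 79, 91, 121, 147, 151, 216]

# ASSUMPTION: the text does not say which four items are flipped.
# This set is a guess from the item wording and may not match the author's choice.
ASSUMED_FLIPPED = {1, 61, 79, 121}


def neuroticism_score(answers, flipped_items=ASSUMED_FLIPPED):
    """Sum of yes (1) / no (0) answers after flipping the reverse-phrased items.

    answers: dict mapping item number -> 0 or 1.
    """
    total = 0
    for item in NEO_ITEMS:
        answer = answers[item]
        # A flipped item counts towards neuroticism when the answer is 'no'.
        total += (1 - answer) if item in flipped_items else answer
    return total


all_yes = {item: 1 for item in NEO_ITEMS}
all_no = {item: 0 for item in NEO_ITEMS}
print(neuroticism_score(all_yes))
print(neuroticism_score(all_no))
```

Without flipping, a person who answers ‘yes’ to every question would receive the maximum score although four of those answers point away from neuroticism. After flipping, every item contributes in the same direction and the total score becomes interpretable.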

Max Hamilton: Factor analysis and rating scales

Hamilton (1912–1988) commenced his career as a psychiatrist just after

World War II. He had the same starting point as Kraepelin, that of wishing

to  utilise psychometrics as a means of making clinical psychiatry more

scientific in its approach. In 1945 he started working at the Maudsley Hospital

in London – at the same time and at the same place as Eysenck. He actually

attended Eysenck ’ s PhD courses in factor analysis ( 1 ).

His approach was that psychometrics in clinical psychiatry should be

considered a scientific discipline parallel to pharmacology and biochemistry.

During his career, Max Hamilton was Associate Professor of Psychiatry at

Leeds University from 1953–1957. These years saw the founding of modern

psychopharmacology, beginning with the establishment of the antimanic

effect of lithium compared to placebo, followed by the antimanic and antip-

sychotic effect of chlorpromazine. Such placebo-controlled, randomised,

double-blind clinical trials became more and more common in Britain in the

1950s and Hamilton could see the need for reasonably brief rating scales to

be used when measuring the effects of these new psychotropic drugs.

Hamilton held a position as research assistant at Leeds University Hospital

from 1957 to 1960 while developing his two rating scales, the Hamilton

Anxiety Scale (HAM-A) from 1959 ( 38 ) and the Hamilton Depression Scale

(HAM-D) ( 39 ) from 1960. While Eysenck was interested in the more perma-

nent features of neuroticism, Hamilton was only interested in the symptoms

of anxiety or depression that appeared as signs of clinical disorders and were

reversible through psychopharmacological treatment. Like Kraepelin, his

opinion was that these symptoms provide the best impression of the anxious

or the depressive patient.

With both of his scales, the HAM-A (see Figure  1.8 ) and the HAM-D (see

Figure  1.9 ), Hamilton ’ s purpose was to measure those mental and physical

symptoms found by the patient and his or her relatives to be the greatest

burden. Hamilton ’ s goal was not to make a diagnosis, only to measure the

severity of the anxious or depressive condition. So each week the question

was how severe the symptoms listed in Figure  1.8 and Figure  1.9 had been

during the past week. Based on these weekly assessments during a course of

treatment with antianxiety or antidepressive medication, it would be possible

to describe their clinical effects.

Just as Eysenck did, Hamilton made use of factor analysis to demonstrate

the scientific value of his scales in his psychometric publications.

For the depression scale, Hamilton found a varying number of factors

during his studies (Hamilton, 1960, 1967). The first study population was

very homogeneous, namely, depressive patients who were so severely afflicted

that they were hospitalised. In the next study, the patient population was

more heterogeneous, consisting of depressive patients who were either

hospitalised or attending an out-patient clinic. Hamilton could see that, in an

increasingly homogeneous patient group, an increasing number of factors

could be identified; an unfortunate consequence of the correlation method as

a mathematical element of factor analysis.

With his anxiety scale studies in out-patients suffering from anxiety

neurosis, Hamilton found a two-factor model both in the first trial using a

13-item anxiety scale ( 38 ) and in the next trial with the final 14-item version

(Hamilton 1969) ( 40 ). Hamilton’s factor analysis showed that the first factor

was a general factor while the second factor was dual, as it had negative

loadings on the physical anxiety symptoms and positive factor loadings on

the psychic anxiety symptoms (Table 1.1 ). This was subsequently confirmed

by a French study using the HAM-A ( 41 ).

Hamilton Anxiety Scale HAM-A14
∗ HAM-A6 (the core symptoms of anxiety)

1∗   Anxiety
2∗   Tension
3∗   Fears
4    Insomnia
5∗   Difficulties in concentration
6    Depressed mood
7∗   General somatic symptoms (muscular)
8    General somatic symptoms (sensory)
9    Cardiovascular symptoms
10   Respiratory symptoms
11   Gastrointestinal symptoms
12   Genito-urinary symptoms
13   Other autonomic symptoms
14∗  Behaviour at interview

See Appendix 5a for Manual

Figure 1.8 Scoring sheet for HAM-A14
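
The relation between the full HAM-A14 and the starred HAM-A6 items of Figure 1.8 can be sketched as two sums over the same weekly ratings. The function name and the example ratings are hypothetical. The permitted range of each item rating is defined in the manual (Appendix 5a) and is deliberately not enforced here.

```python
# Starred items in Figure 1.8 form the HAM-A6 (core symptoms of anxiety).
HAM_A6_ITEMS = (1, 2, 3, 5, 7, 14)


def ham_a_scores(ratings):
    """Return (HAM-A14 total, HAM-A6 subscale) from one weekly assessment.

    ratings: dict mapping item number (1 to 14) -> item rating.
    """
    if sorted(ratings) != list(range(1, 15)):
        raise ValueError("ratings for items 1 to 14 are required")
    total = sum(ratings.values())
    core = sum(ratings[item] for item in HAM_A6_ITEMS)
    return total, core


# Hypothetical assessment in which every item is rated 2.
week_1 = {item: 2 for item in range(1, 15)}
total_14, core_6 = ham_a_scores(week_1)
print(total_14, core_6)
```

Repeating this calculation for each weekly assessment gives two severity curves over the course of treatment, one for the full scale and one for the core symptoms.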

A major international trial with DSM-III panic attack patients confirmed

this HAM-A14 two-factor model ( 42 ). On the basis of these results, Hamilton

thought that the first factor is general (i.e., all the symptoms in the scale

concur in measuring one dimension), providing enough evidence to use the

total score as a sufficient statistic. But Hamilton became less confident about

this conclusion when his anxiety scale was not able to distinguish between

placebo and an antianxiety drug ( 43 ).

The fact that the second factor in Hamilton ’ s anxiety scale is bipolar,

or dual, i.e., that some items have negative factor loadings and others have

positive factor loadings, is perhaps the most interesting element in the

factor analysis method. Factor loadings demonstrate the correspondence

1∗ Depressed mood

2∗ Guilt feelings and self-depreciation

3 Suicidal ideation

4 Initial insomnia

5 Middle insomnia

6 Delayed insomnia

7∗ Work and interests

8∗ Psychomotor retardation

9 Psychomotor agitation

10∗ Anxiety (psychic)

11 Anxiety (somatic)

12 Gastro-intestinal symptoms

13∗ General somatic symptoms

14 Sexual interest

15 Hypochondriasis

16 Loss of insight

17 Weight loss

See Appendix 3a for Manual

The Hamilton Depression Scale (HAM-D17)

∗ HAM-D6 (core symptoms of depression)

Figure 1.9 Scoring sheet for HAM-D 17


between symptoms and the factor in question, thus implying a psychological

insight. This demonstration that the anxiety condition consists of

physical and psychic anxiety symptoms with an equal distribution, seven

physical and seven psychic anxiety symptoms in the HAM-A14, proved to be

highly significant later on. Hamilton did not look into this because interest

was centred on his depression scale in the period from 1969 to 1989.

Factor analysis was not able to provide a psychological insight into depressive

symptomatology through the factor structure of the HAM-D.

Factor analysis is a psychometric method that reveals a structure in an

assessment scale, but not whether it is a dimension in which the total score is

a meaningful expression of the severity of a condition. In his monograph on

clinimetric methods, Feinstein uses Hamilton’s scales as examples of scales

‘produced by factor analysis’, but without discussing the nature of this

validation procedure ( 2 ).

Here it is important to understand that Max Hamilton built on Spearman’s

and Eysenck’s factor analysis within the frame of the two explainable factors.

Hamilton went on to demonstrate that (particularly in the HAM-A) the first

factor is a general factor while the second factor is bi-directional, differentiating

between somatic and psychic anxiety symptoms. This dualism between

body and mind seems to underlie the accepted custom of calling factor 2 a

dual factor.
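The pattern of a general first factor and a dual second factor can be illustrated with a minimal Python sketch. It assumes an invented correlation matrix for six items (three 'psychic', three 'somatic') and uses NumPy for a principal component analysis; the numbers and the function name are hypothetical and are not taken from Hamilton's data or procedure.

```python
import numpy as np

def first_two_loadings(corr):
    """Loadings of the first two principal components of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:2]         # indices of the two largest
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    if loadings[:, 0].sum() < 0:                  # orient the first component positively
        loadings[:, 0] *= -1
    return loadings

# Invented correlations: items 0-2 are "psychic", items 3-5 are "somatic".
corr = np.full((6, 6), 0.3)      # correlation between the two groups
corr[:3, :3] = 0.6               # correlation within the psychic group
corr[3:, 3:] = 0.6               # correlation within the somatic group
np.fill_diagonal(corr, 1.0)

loadings = first_two_loadings(corr)
print(np.round(loadings, 2))     # column 0: general factor, column 1: dual factor
```

In this constructed case every item loads with the same sign on the first component, whereas on the second component the psychic and somatic items load with opposite signs. Which group is positive is arbitrary, since the sign of a component is not determined.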

Factor-analytic studies with Hamilton’s Depression Scale have shown

that the great difference between different clinical trial results is in the

number of factors produced and their item loadings. In other words, the

American factor-analytic tradition leads to inferior results. The British

tradition (only interpreting the first two factors – the general versus the

dual) would seem to result in a fair amount of agreement between different

clinical trials. A recent landmark study in this respect is the STAR*D

analysis ( 44 ).

Pierre Pichot: Symptom rating scales and clinical validity

Pichot obtained his medical degree in Paris in 1947. When he, like Hamilton,

chose psychiatry, his purpose was to use psychometrics as a scientific discipline

on the same plane as pharmacology and biochemistry. Pichot therefore

studied psychometrics at the Sorbonne in Paris immediately after getting his

medical degree ( 3 ).

He took up a position as registrar at the psychiatric hospital Sainte-Anne in

Paris under Professor Delay, who was among the first to demonstrate the

antipsychotic effect of chlorpromazine.


In 1972, Pichot made it clear that, from a psychometric point of view,

using the HAM-D total score in studies on the antidepressive effect of a drug

was a dead end. His reason was that factor analysis had not supported the use

of the HAM-D total score. Thus, Pichot did not acknowledge a demonstration

of a general factor as sufficient evidence that the total score was a sufficient

measure of the degree of depression.

Pichot had worked with the US rating scale, the BPRS (Brief Psychiatric

Rating Scale), developed by Overall and Gorham ( 45 ). Drawing on a pool

of more than sixty symptoms, they had demonstrated that the

eighteen symptoms in Figure 1.10 were especially sensitive to change during

a course of chlorpromazine therapy in psychotic patients and imipramine

therapy in depressive patients. The BPRS is perhaps the most widely used

psychiatric rating scale worldwide, possibly because it is seemingly

easy to use; see Kraepelin’s symptom list in Figure 1.3.

Pichot then recommended the use of the six BPRS depressive symptoms to

measure the antidepressive effect of a drug. A major review of the BPRS some

years later confirmed that Pichot’s depression factor was an independent

factor in the BPRS ( 46 ).

Pichot had been brought up in the French school of psychometrics,

founded by Alfred Binet (1857–1911) through his intelligence tests for primary

school pupils. Binet’s starting point was that school teachers possessed the

greatest knowledge about the intellectual abilities of their pupils in the different

levels of primary school. So Binet enlisted the aid of the most experienced

school teachers when choosing intelligence tests, instead of using Spearman’s

factor analysis. Binet ‘outperformed’ Spearman, as the updated versions of

Binet’s tests are now generally used.

In 1905 Binet declared that:

Our aim is, when a child is put before us, to take the measurement

of his intellectual ability, in order to establish whether he is normal

or if he is retarded. For this purpose we have to study his present

condition, and this condition alone… as a result we shall neglect

entirely his aetiology… We shall confine ourselves to gathering

together the truth on his present condition ( 47 ).

Pichot thus held the opinion that rating scales measuring antipsychotic

effect, antidepressive effect, or antianxiety effect should be based on the

clinical reality of the assessments of experienced psychiatrists and not on

factor analysis ( 3 ). The version of the BPRS scale shown in Figure 1.10 is

identical to Overall’s reference (The semi-structured Brief Psychiatric Rating

Scale interview and rating guide) as to symptom description ( 48 ).

The descriptions applying to absence of a symptom are taken from Turner’s


Brief Psychiatric Rating Scale Score

1 Somatic concern

2 Anxiety, psychic

3 Emotional withdrawal

4 Conceptual disorganization

5 Guilt feelings

6 Tension

7 Mannerisms and posturing

8 Grandiosity

9 Depressive mood

10 Hostility

11 Suspiciousness

12 Hallucinatory behaviour

13 Motor retardation

14 Uncooperativeness

15 Unusual thought content

16 Blunted or inappropriate affect

17 Elation/euphoria

18 Confusion and disorientation

Figure 1.10 Brief Psychiatric Rating Scale

Mania scale

Grandiosity [8]

Uncooperativeness [14]

Hostility [10]

Increased psychomotor activity [17]

Intrusive behaviour

Elevated mood

Schizophrenia scale

Emotional withdrawal [3]

Conceptual disorganisation [4]

Suspiciousness [11]

Hallucinations [12]

Unusual thought content [15]

Blunted affect [16]

Depression scale

Somatic concern [1]

Anxiety, psychic [2]

Guilt feelings [5]

Tension [6]

Depressive mood [9]

Motor retardation [13]

Figure 1.11 The three BPRS subscales. In the brackets the item numbers as indicated in Figure 1.10


1963 version ( 49 ). The first 18 items make up the BPRS-18. Two extra items

are included to allow measurement of mania (Figure  1.11 ).

A clinical validity analysis of the BPRS would result in a depression factor,

a mania factor and a schizophrenia factor, as seen in Figure 1.11. The mania

and schizophrenia scales are often combined in a general psychosis factor

when assessing the clinical effect of antipsychotics.
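Scoring by subscale can be sketched as follows, using the item numbers listed in Figure 1.11. The rating values and the function name are hypothetical. The mania scale is left out of this sketch because two of its items (intrusive behaviour and elevated mood) carry no item number in Figure 1.10.

```python
# Item numbers as given in Figure 1.11 (numbering from Figure 1.10).
BPRS_SUBSCALES = {
    "depression": [1, 2, 5, 6, 9, 13],
    "schizophrenia": [3, 4, 11, 12, 15, 16],
}

def subscale_scores(item_scores):
    """Sum the item ratings of each subscale; item_scores maps item number -> rating."""
    return {
        name: sum(item_scores[i] for i in items)
        for name, items in BPRS_SUBSCALES.items()
    }

# Hypothetical ratings for one patient: every item rated 0 except three.
ratings = {i: 0 for i in range(1, 19)}
ratings.update({9: 3, 13: 2, 12: 4})

scores = subscale_scores(ratings)
print(scores)
```

Here items 9 and 13 contribute to the depression subscale and item 12 to the schizophrenia subscale, so the two subscale sums are reported separately rather than as one total.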

In his final work, Psychology Down the Ages (1937) ( 50 ), Spearman writes

that the correlation coefficient developed by Pearson and himself was

understood and used exclusively in English-speaking countries. In

France and especially in Germany it was rejected. Classical psychometrics,

which is based on the concept of correlation in factor analysis and

Cronbach’s alpha, is thus typically described in the major American standard

works on psychometrics: Guilford 1936 ( 5 ), Guilford second edition

1954 ( 19 ); Nunnally 1967 ( 51 ); and Nunnally and Bernstein third edition

1994 ( 52 ) as well as Comrey 1992 ( 22 ).

It is precisely because these major monographs on factor analysis lie

within the American reference frame that the interpretation of Hamilton

Depression Scale results is so problematic; this American tradition lays

stress on the number of factors, while the British tradition uses Ockham’s

razor, i.e., the principle of simplicity, and focuses on the first two factors (the

general versus the dual). Hamilton relied chiefly on Hotelling’s principal

component analysis.

The English philosopher William Ockham (1285–1349) described the

principle afterwards known as Ockham’s razor: the scientific community

should only assume what is strictly necessary when working with a scientific

hypothesis (the law of parsimony).

This was precisely Pichot’s point: that psychometric analysis of rating

scales should avoid the use of factor-analytic methods, as in the American

tradition. Such analysis should follow Binet’s model in using experienced

psychiatrists as a test of clinical validity, and use item response theory models

to determine if it is valid to sum the individual items as a total score. In

Pichot’s opinion, Binet had used the same reasoning when developing his

intelligence tests as that which lies behind the item analyses published by

Rasch in 1960.


Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

2 Modern psychiatry: DSM-IV/ICD-10

Focusing on reliability

As can be seen from the preceding chapters, classical psychometrics in

psychiatry has mainly been influenced by the work of clinical psychiatrists

(Kraepelin, Hamilton and Pichot) and not by psychologists. In the field of

psychometrics, Spearman and Eysenck attempted to measure the dimensions

of intelligence and personality, respectively; i.e., areas of human manifestations

of the mind of a more permanent nature. Kraepelin, Hamilton and Pichot

were absorbed by those symptoms that reflect clinical conditions and for

which modern psychopharmacology has now made treatment possible. Here

it should be mentioned that Gorham, rather than Overall, was the clinically

experienced person behind the BPRS. He had seen the major effects of

chlorpromazine and imipramine when these drugs became available during

the 1950s and 60s.

In clinical psychiatry, the classic psychological test has always been

regarded as a supplement to psychiatric diagnosis. However, it was a major

issue for the two psychiatric diagnosis systems [the International

Classification of Diseases (ICD) adopted by WHO in 1948 in the sixth edition

(ICD-6) and the American system, the Diagnostic and Statistical Manual of

Mental Disorders (DSM), first edition (DSM-I) published in 1952], as well as

the psychometric tests (e.g., the Rorschach test), that reliability was about

0.50. Reliability refers to the degree of unanimity a group of psychiatrists can

achieve when making an ICD- or DSM-diagnosis; or a group of psychologists

when interpreting a Rorschach test. Reliability is shown by a ‘coefficient of

reliability’, such as the intraclass or kappa coefficient, and if the coefficient is around

0.50, one might just as well have tossed a coin. To be clinically meaningful, a

coefficient of reliability must be around 0.80.
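As a hedged illustration of how a kappa coefficient is obtained for two raters, the sketch below uses Cohen's formula, in which the observed proportion of agreement is corrected for the agreement expected from the raters' marginal frequencies. The diagnoses are invented and the function name is hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per patient."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters assigned categories independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented diagnoses for eight patients ("dep" = depression, "anx" = anxiety).
rater_a = ["dep", "dep", "anx", "anx", "dep", "anx", "dep", "anx"]
rater_b = ["dep", "anx", "anx", "anx", "dep", "dep", "dep", "anx"]

kappa = cohens_kappa(rater_a, rater_b)
print(kappa)
```

In this invented example the raters agree on six of eight patients, the expected agreement is one half, and the resulting kappa is 0.5.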


A complete revolution in clinical psychiatric diagnosing took place

around 1980, about one hundred years after the establishment of Wundt’s

psychological laboratory in Leipzig. It so happened that two US psychiatrists,

Spitzer and Klerman, who had used rating scales for many years, had noticed

that while agreement (reliability) was very poor when several psychiatrists

diagnosed a patient according to the diagnostic system in use at that time, the

reliability of the HAM-D, HAM-A or BPRS was very high ( 53 ). Furthermore,

Spitzer and especially Klerman were greatly concerned by the fact that

modern psychopharmacology was often used for illnesses for which it was

unsuitable or, conversely, not used in patients who might benefit from drug

therapy.

In 1980, the American Psychiatric Association published a completely

new diagnostic system based solely on the symptom profile. In this manner,

an adequately high reliability was ensured, and patients with treatment-

demanding depression or anxiety received the proper psychopharmacological

treatment.

The new diagnostic system was the third revision of the Diagnostic and

Statistical Manual of Mental Disorders, DSM, with the acronym DSM-III

( 54 ). With the DSM-III, a very good agreement emerged between the HAM-D

score and the diagnosis of major depression.

In 1992, the World Health Organization (WHO) published their 10th

revision of the International Classification of Diseases (ICD) diagnostic

system, subsequently given the acronym ICD-10 ( 55 ). This system copies the

DSM-III, but is unfortunately not its identical twin. As is the case with the

DSM-III, the ICD-10 is in high agreement with the HAM-D, HAM-A and

BPRS.

It is precisely because we have these two not quite identical systems in the

DSM-III and ICD-10, that rating scales such as the HAM-D have become the

natural common denominator. Thus, a score of 18 or more on the HAM-D

indicates a treatment-demanding depression both according to DSM-III and

ICD-10.

Focusing on validity

The symptoms included in DSM-III (or DSM-IV which was published in

1994 and is almost identical to DSM-III but still not adequately identical to

ICD-10) have been chosen through consensus, and not through empirical

research ( 56 ).

According to DSM-IV, a treatment-demanding depression is called a

major depression, with the algorithm that at least five of nine symptoms

should have been present almost every day throughout the previous two


weeks. According to ICD-10, a moderate depression implies that at least six

out of ten symptoms should be present almost every day throughout the

previous two weeks. As can be seen, these DSM-IV and ICD-10 cut-off scores

for ‘typical’ depression follow Russell’s definition of a typical Englishman; i.e.,

more than 50 % of the total number of items.
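The two counting rules can be written out as a small sketch. It encodes only the thresholds stated above (five of nine, six of ten) and deliberately ignores any further requirements the manuals may impose; the function name and the symptom vectors are hypothetical.

```python
def meets_count_rule(symptoms_present, minimum):
    """True if at least `minimum` of the listed symptoms are present."""
    return sum(bool(s) for s in symptoms_present) >= minimum

# Thresholds as stated in the text.
DSM_IV_MAJOR_DEPRESSION = {"n_symptoms": 9, "minimum": 5}
ICD_10_MODERATE_DEPRESSION = {"n_symptoms": 10, "minimum": 6}

# Invented patient with five symptoms present on either list.
patient_dsm = [True, True, True, False, True, False, True, False, False]
patient_icd = [True, True, True, False, True, False, True, False, False, False]

dsm_positive = meets_count_rule(patient_dsm, DSM_IV_MAJOR_DEPRESSION["minimum"])
icd_positive = meets_count_rule(patient_icd, ICD_10_MODERATE_DEPRESSION["minimum"])
print(dsm_positive, icd_positive)
```

With five symptoms present, this invented patient reaches the DSM-IV count but not the ICD-10 count, which shows in miniature why the two systems are not quite identical.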

Klerman has called the introduction of the DSM-III a neo-Kraepelinic

paradigm. This is often perceived as a biological–medical approach to clinical

psychiatry, as opposed to the Freudian approach that prevailed between the

two twentieth century world wars.

This neo-Kraepelinic paradigm only refers to the fact that Kraepelin

introduced a thorough symptom description so that the course of symptoms

could provide diagnostic information. Kraepelin had learned from Wundt to

describe clinical symptoms under standardised conditions without letting

oneself be influenced by etiological deliberations. However, Kraepelin’s

description was not completely atheoretical, since the medical model of

disorders also ‘hovered’ at the back of his mind.

In all medical conditions an attempt is made to delimit the symptom

complex; i.e., the syndrome that the symptoms point to during the course of

the illness (from the onset of symptoms to their remission during treatment).

The clinical reality referred to by the various rating scales is purely

psychiatric and thus neo-Kraepelinic. DSM-III/DSM-IV ( 54,56 ) and ICD-10

( 55 ) adhere (more or less) to this reality. Thus, the effect of lithium on the

‘positive’ symptoms of mania, but not schizophrenia, makes the distinction of

positive and negative symptoms meaningless.

Quantitative, dimensional diagnosis

Completely new revisions of DSM-IV and ICD-10 are planned in which the

dimensional approach will be the conclusive factor. However, factor analysis

is still employed to identify the ‘dimensions’ to be combined with the

diagnostic descriptions in DSM-V. A thematic section on DSM-V and ICD-11

in the journal Psychological Medicine used factor analysis to show that some

symptoms cluster in a mania factor; some around the positive and negative

schizophrenia factors, respectively; and a few around depression factors ( 57 ).

Thus, they do not comprise true dimensions in which the symptoms

identified cover the whole of the dimension in question. This is what modern

psychometrics is capable of through the use of item response theory models,

as will be shown in the next chapter.

Enhanced inter-rater reliability is the great improvement brought about by

the introduction of modern psychiatry (DSM-III/IV or ICD-10). Reliability

is a major component of classical psychometrics, but certainly not of modern


psychometrics. In a recently published book on psychometrics by Furr and

Bacharach ( 58 ), the reliability issue is dealt with extensively, in contrast to

the  focus modern psychometrics places on item-response theory models.

According to modern psychometrics, a scale with adequate validity also

possesses adequate reliability ( 4 ).

When evaluating the reliability of an assessment questionnaire, the degree

of unanimity is analysed. According to the clinimetric approach, the

experienced clinician must be the key, and subsequent analysis is made of the

percentage of clinicians using a certain scale that deviate from the master.

Some feel that a deviation of +/− 20% is acceptable, as for example with the

PANSS scale.

Satisfying a ‘democratic’ desire for inter-observer agreement can be

achieved by using an intra-class coefficient. The interview-related scales

included here all possess an adequate reliability.
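One common variant of the intra-class coefficient is the one-way random-effects form, sketched below with NumPy. The choice of this variant, the function name and the ratings are assumptions for illustration; the text does not specify which form was used for the scales discussed here.

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC for a subjects x raters table of scores."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                                  # n subjects, k raters
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Mean squares between subjects and within subjects (between raters).
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((x - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented total scores from two raters for three patients.
ratings = [[1, 2], [3, 4], [5, 6]]
icc = icc_oneway(ratings)
print(round(icc, 3))
```

The coefficient approaches 1 when the raters' scores differ little within each patient compared with the differences between patients.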

The reliability of a questionnaire is indicated through a test–retest

reliability coefficient, which is to say the agreement between the results of the

questionnaire performed at two different points in time. When measuring

anxiety and depression, one must be sure that the profile of the condition has

been fairly stable during the period between the two test times if the test-retest

reliability coefficient is to be meaningful.

Classical psychometrics uses Cronbach’s coefficient alpha in order to avoid

the issue of condition profile constancy in the test–retest method, as this

coefficient uses a single time point to indicate the degree of correlation

between the individual questions. Cronbach’s alpha does not tell anything

about the validity of the individual questions, only about their reliability and

their mutual agreement. In his book on clinimetrics from 1987 Feinstein

attempted to put a stop to the use of Cronbach’s alpha, as its size depends on

the number of questions: the higher the number of questions, the higher the

reliability ( 2 ). Using the same conditions as Feinstein (a 0.30 mean correlation

value between questions), the above-mentioned Furr and Bacharach

demonstrate that in a 4-item questionnaire, Cronbach’s alpha is 0.40, in an

8-item questionnaire alpha is 0.60, but in a 20-item questionnaire alpha

approaches 0.80; according to classical psychometrics, this is the value that a

questionnaire should achieve for adequate reliability.
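The dependence of alpha on the number of items can be made concrete with the standardized form of the coefficient, which needs only the number of items and the mean inter-item correlation. This is a sketch under that assumption; the exact figures it produces depend on the formula used and need not reproduce the rounded values quoted above.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

# Same mean correlation throughout; only the number of items changes.
alphas = {k: standardized_alpha(k, 0.30) for k in (4, 8, 20)}
for k, alpha in alphas.items():
    print(k, round(alpha, 2))
```

Whatever the exact values, the coefficient rises steadily as items are added at a fixed mean correlation, which is the property Feinstein objected to.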

In order to ensure a high Cronbach alpha coefficient, many questionnaires

have approximately 20 items; this is perhaps the cause of a growing dislike of

questionnaires in the general population, as it is obvious to everyone that

many of the items are redundant. However, Furr and Bacharach do not agree

with this sentiment; they use many pages to explain that modern statistical

software programs (SPSS and SAS) make it extremely easy to compute

Cronbach’s alpha.


Furr and Bacharach attempt to convince their readers that an alpha

coefficient of 0.75 does not in itself make a factor analysis superfluous. If the

coefficient alpha is perceived as a reliability coefficient, then stress must be

placed on the mutual agreement between the different items. In principal

component analysis, the simplest form of factor analysis, demonstration of a

general factor (i.e., all items positively correlated) is the unanimity shown

by the alpha coefficient. Multidimensional assessment scales (Hamilton’s

Depression Scale and Hamilton’s Anxiety Scale) both have an alpha coefficient

higher than 0.75 ( 59 ). Hamilton felt that demonstrating a general factor

implies that the individual items in his scales may be summed as a measure

of degree of depression or anxiety severity. However, coefficient alpha and

factor analysis are not statistical methods which test whether a scale measures

the degree of severity by the sum of its items.

Only the modern item response theory model is able to statistically test

whether an assessment scale measures degree of severity. In a certain sense

one could say that the item response theory model has demonstrated the

importance of the typical depressive symptoms or the psychic anxiety

symptoms of the dual or bi-directional factor 2 in the measurement of

depression or anxiety. This then implies that relatively few items (fewer than

ten) are important in the measurement of depression or anxiety. If Cronbach’s

alpha is used on its own, then many more items are needed to go beyond the

0.75 limit, typically 15 to 20 items.

The DSM-IV/ICD-10 diagnosis systems (e.g., in schizophrenia or

depression) and modern psychometric methods agree in recommending

approximately ten symptoms as a suitable number. This indicates that clinical

reality can be adequately described through ‘a handful of items’. Classical

psychometrics, with Cronbach’s alpha coefficient or factor analysis, has typically been used by those interested in personality questionnaires.

Nunnally (1967) states: ‘it is unrealistic for the measurement of most human

traits only to have a handful of items’ ( 51, 52 ).


3 Modern, dimensional psychometrics

Measurement of the manifestations of the mind in modern psychometrics

includes two absolutely essential elements. The first of these core features is

that each symptom in a rating scale is itself measured on a scale. The term

‘scale’ is derived from the Italian and means ‘stairs’. In the DSM-IV and ICD-10

systems, a symptom only has one step: either the symptom is absent and

one is on the ground floor, or the symptom is present and one is one step

down towards the basement. In clinical psychometrics, it is deemed essential

to have several steps and a six-step ‘basement stair’ is thought optimal to

measure each symptom.

The second core feature in modern psychometrics is whether or not the

score of the symptom items belonging to a syndrome, e.g., depression, can be

added up, so that the sum of all the symptoms constitutes a sufficient statistic

for the impression of the present state.

Three statisticians have, each in their own way, played a vital part in the

development of modern psychometrics, namely Ronald A. Fisher, Georg

Rasch and Sidney Siegel.

Ronald A. Fisher: From Galton’s pioneer work to the sufficient statistic

Some people believe that psychometrics in fact started with Francis Galton

(1822–1911). In contrast to Wundt, Galton attempted to connect psychometrics

to the theory of evolution put forth by his cousin Charles Darwin

(1809–82) in his ‘The Origin of Species’. It was the psycho-social aspect of

the theory of evolution that Galton attempted to measure. In 1883, he

published ‘Inquiries into Human Faculty and its Development’, which was

actually a collection of rather mixed essays. It is an anthropological rather

than a psychometric publication. Of particular psychometric significance is

Galton’s attempt to develop ‘verbal scales’ containing several response

categories. He discovered how difficult it is to describe these ‘orders of magnitude’

so that they are understood in the same way from one subject to

another.

In 1884, Galton established the first psychological laboratory in Britain

(in  London). Galton developed an increasing interest in mathematical

statistical problems, and it was Galton’s pupil Karl Pearson (1857–1936), who

published a correlation analysis, analogous to that of Spearman. In his

ground-breaking work from 1904, Spearman writes that it was actually

Galton who put forward in 1886 the thinking behind correlation analysis, when

seeking a mathematical expression, where the value 1 signified perfect

correlation between two factors (e.g., that people with long arms usually also

have long legs), where the value 0 meant no correlation, and the value −1

meant a negative correlation ( 17 ).
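The three landmark values of the coefficient can be reproduced with a short sketch of the product-moment correlation. The measurements are invented and the function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

arm_length = [60, 65, 70, 75]          # invented measurements in cm
leg_length = [80, 85, 90, 95]          # rises exactly in step with arm length

r_positive = pearson_r(arm_length, leg_length)
r_negative = pearson_r(arm_length, leg_length[::-1])
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])
print(r_positive, r_negative, r_none)
```

The first pair of lists gives a perfect positive correlation, the reversed list a perfect negative one, and the last pair has no linear relationship.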

After Galton’s death in 1911, Karl Pearson acquired his professorship in

genetics, but established an institute for applied statistics at the University of

London, in which Galton’s laboratory was incorporated.

In their statistical work, both Galton and Pearson were interested in those

physical or mental qualities that have a normal distribution. Galton measured

the height of 8585 British citizens and found a mean and a dispersion that

was in accordance with the normal distribution, the so-called Gaussian bell

curve.

Ronald Fisher (1890–1962) worked at Galton’s laboratory in the

1930s ( 60 ).

Fisher was a mathematician and had developed a great interest in statistics.

He worked on solving the problems that had arisen when statistics was

applied to small data sets. Here, one was attempting to construct a statistical

understanding (inference), including how representative the observations

of the test sample were of the distribution one sought to estimate (e.g., the

normal distribution or Gaussian distribution), i.e., how to calculate the

distribution parameters.

In 1922, Fisher published a paper ‘On the Mathematical Foundation of

Theoretical Statistics’ in which he states that the statistician’s task is to ensure

minimal loss of information when data are reduced, for example, to a normal

distribution ( 61 ). It is important to find sufficient statistical expressions

( sufficient statistics).

Ronald A. Fisher is regarded by many as the founder of medical statistics,

especially with reference to the first edition of ‘Statistical methods for

research workers’ in 1925, which, as mentioned previously, Hotelling reviewed

in 1927.


Georg Rasch: From Guttman’s pioneer work to item response theory analysis ( IRT )

Georg Rasch (1901–80) was a professor of statistics for Danish psychologists.

Like Fisher, Rasch had a degree in mathematics, with an MSc from the

University of Copenhagen in 1925 and a doctoral degree in 1930. His doctoral

thesis was entitled: ‘On Matrix Calculus and its Application in Differential

Equations’. At that time two professorships in mathematics were available in

Copenhagen, but one of these was given to A.F. Andersen (1891–1972) and

the other to the young Børge Jessen (1907–93).

In 1935, Rasch received a Rockefeller scholarship for 12 months of study at

Fisher’s London institute, as he was now moving on to statistics. Fisher’s

concept of sufficiency served as inspiration for the psychometric model

developed by Rasch in the 1950s, which was to become the basis of modern

psychometrics ( 62 ).

The central element in modern psychometrics is whether there is a latent

additive function when the symptoms in a rating scale are used. If this is the

case, the total score is then a sufficient statistic for the present symptom

profile.

When discussing the item response theory model published by Rasch in

1960, it is important to realise that the model is not the result of a theoretical

study. This IRT (item response theory) analysis was used in connection with

a research problem, where it was necessary to have a method for comparing

subjects independently of which items they had been measured with.

In his empirical studies, Rasch was very interested in his subjects’ ability to

solve mathematical problems. In order to assess the capabilities of the

subjects, one chose arithmetical problems that could be ranked according to

difficulty, so that some are very easy to solve, some slightly more difficult,

some again moderately difficult, some markedly difficult and some highly or

extremely difficult. The bright student is able to solve almost all the problems,

while the less clever student is only able to solve the easier ones.

If each problem is scored as correctly solved or incorrectly solved (on a

nominal scale), it is then possible to demonstrate, provided the Rasch analysis

is valid, that the sum of correct answers is a sufficient measure of the

subject’s present ability to solve arithmetical problems. What is investigated

in the Rasch analysis is whether or not the ranking of the problems, as made

by the skilled mathematician, is preserved when such external factors as age

and gender are taken into account.
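Why the sum of correct answers is sufficient under the Rasch model can be checked numerically. The sketch below uses the usual logistic form of the model for right/wrong items and shows that, once the total score is known, the probability of a particular answer pattern no longer depends on the subject's ability. The item difficulties, ability values and function names are hypothetical.

```python
import math
from itertools import product

def rasch_p(theta, difficulty):
    """Probability of a correct answer for ability theta and item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def pattern_prob(pattern, theta, difficulties):
    """Probability of one complete answer pattern (1 = correct, 0 = incorrect)."""
    prob = 1.0
    for answer, difficulty in zip(pattern, difficulties):
        p = rasch_p(theta, difficulty)
        prob *= p if answer else 1.0 - p
    return prob

def prob_given_total(pattern, theta, difficulties):
    """Probability of the pattern among all patterns with the same total score."""
    total = sum(pattern)
    same_total = [q for q in product((0, 1), repeat=len(pattern)) if sum(q) == total]
    return pattern_prob(pattern, theta, difficulties) / sum(
        pattern_prob(q, theta, difficulties) for q in same_total
    )

difficulties = [-1.0, 0.0, 1.5]        # invented: easy, medium, hard problem
pattern = (1, 1, 0)                    # solved the two easier problems

weak = prob_given_total(pattern, -1.0, difficulties)
bright = prob_given_total(pattern, 2.0, difficulties)
print(weak, bright)
```

The two conditional probabilities coincide for the weak and the bright subject, so the pattern carries no information about ability beyond what the total score already provides.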

When using this Rasch analysis on a symptom rating scale, the prevalence

of the symptoms is analysed. Figure  3.1 shows a prevalence ranking of six

depressive symptoms. In mild cases of depression, the symptoms ‘lowered


mood’, ‘loss of interest’ and ‘tiredness’ are almost always present. So these are

the three symptoms a GP must especially enquire about. Often ‘tiredness’

is the symptom that brings the patient to the doctor, and he or she will often

tell the doctor that when one is very tired then one becomes depressed or less

interested in one’s daily activities: if the doctor finds no ‘organ-related’ or

physical explanation of the tiredness, it is then important to quantify whether

lowered mood is present as well as less interest in daily activities.

When presented with a depressed patient in a psychiatric emergency ward,

the doctor on call has to determine if suicidal impulses are present in order

to decide whether hospitalisation is necessary. At this point, the rarer

symptoms in Figure 3.1 must be clarified, i.e., ‘guilt feelings’ and ‘psychomotor

retardation’. The symptom ‘suicidal ideation’ is extremely difficult to

assess, but as a depressive state is by far the most common cause of suicide, it

is very important to establish the presence or absence of ‘guilt feelings’ and

‘psychomotor retardation’.

With reference to the ability to solve mathematical problems, the bright

student will be able to solve both easy and difficult problems. In the same

way, a depressed patient with ‘guilt feelings’ may also have the

‘milder’ depressive symptoms; i.e., lowered mood, loss of interest and tiredness.

In psychometric terminology, these three symptoms are termed ‘ceiling

symptoms’ as they reach the frequency ceiling even in mild depressive states

(Figure  3.1 ). In the Rasch model the term ‘item parameter difficulty’ is used,

[Figure 3.1: plot with the axis labels ‘Frequency percentage’ and ‘Severity of depression’ and a marked ‘Ceiling’ level. Symptoms shown: Depressed mood; Lack of interests; Anxious mood; Tiredness and pains; Guilt feelings; Psychomotor retardation]

Figure 3.1 Prevalency structure of the six depression symptoms


and ‘ceiling items’ are then classified as items with low difficulty. The

symptoms of ‘guilt feelings’ and ‘psychomotor retardation’ are referred to in

psychometrics as ‘floor symptoms’ as they only emerge in more severe states

of depression. In the Rasch model the item parameter difficulty ranges from

minus 2 to plus 2 ( 63 ). When reflecting the underlying dimensions of depressive

states, the rank ordering of items into ‘ceiling items’ versus ‘floor items’

can be transformed to a dimension of depression on a scale from 1 to 5 where

the Rasch minus 2 = 1, minus 1 = 2, 0 = 3, plus 1 = 4, and plus 2 = 5. Applying

this to the HAM-D 6 rating scale for depression, we have confirmed its clinical

validity by the psychometric (Rasch) model of measurement ( 64 ).
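
The relation between item difficulty and symptom frequency described above can be illustrated with a short sketch. It assumes the standard dichotomous form of the Rasch model, in which the probability that a symptom is present depends only on the difference between the patient's severity and the item's difficulty (both in logits), and it applies the transformation from minus 2 to plus 2 onto 1 to 5 given in the text. The function names and the difficulty values assigned to the symptoms are hypothetical and chosen only for illustration; they are not estimates from any data set.

```python
import math


def rasch_probability(severity: float, difficulty: float) -> float:
    """Probability that a symptom is present under the dichotomous Rasch model.

    Both arguments are in logits. This is the textbook form of the model,
    used here as an assumption; the source does not print the formula.
    """
    return 1.0 / (1.0 + math.exp(-(severity - difficulty)))


def logit_to_five_point(difficulty: float) -> float:
    """Map a difficulty of -2..+2 logits onto the 1..5 scale from the text."""
    return difficulty + 3.0


# Hypothetical difficulties: 'ceiling' items are easy, 'floor' items are hard.
ITEMS = {
    "lowered mood": -2.0,
    "tiredness": -1.0,
    "guilt feelings": 1.0,
    "psychomotor retardation": 2.0,
}

for name, difficulty in ITEMS.items():
    mild = rasch_probability(-1.0, difficulty)    # a mildly depressed patient
    severe = rasch_probability(2.0, difficulty)   # a severely depressed patient
    print(f"{name}: scale step {logit_to_five_point(difficulty):.0f}, "
          f"P(mild) = {mild:.2f}, P(severe) = {severe:.2f}")
```

Under these assumptions the ‘ceiling’ items are likely to be present even in the mild state, while the ‘floor’ items become likely only at higher severity.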

The symptom ‘suicidal ideation’ can be seen as a ‘floor symptom’ and thus as

the last link in a chain in which the three ‘ceiling symptoms’ are the first link

and the three ‘floor symptoms’ the next. A patient who presents with it should

therefore be closely monitored and hospitalised.

That the so-called ‘ceiling symptoms’ (Figure  3.1 ) occur before the ‘floor

symptoms’ when one assesses men versus women and older versus younger

persons is an expression of ‘the concept of transferability’ in applied psychometrics.

Computerised Adaptive Testing (CAT) is often referred to in modern

psychometrics ( 65 ). Some people view the extremely dramatic reduction of

the many single elements stored in the individual items of a rating scale or a

questionnaire to a sum score of all items (total score), as a sign of reduction-

ism. This is understood as an eagerness to reduce that may tempt one to

claim that one has extensively analysed what one seeks to measure. Rasch

found it extremely important to avoid this reductionism when one had found

empirical evidence that a rating scale or a questionnaire fulfilled the item

response theory model. The items one had isolated in this manner measured

a very important quantitative aspect, while the excluded items might possess

an important independent significance. When measuring the quantitative

degree of depression in depressive states using Hamilton’s 17 items, items like

sleep disturbances and suicidal ideation are excluded. This is because sleep

disturbances are often present in mild depressive states, but not always in

severe states. The issue of suicidal ideation is often so complex that it is

important to have the underlying quantitative measurement performed.

As will be seen later on, the Rasch sufficiency line of thought, viz the true

reductional measurement of technology, is important for dose response rela-

tionships when using antidepressants. The measurement problem inherent

in clinical trials of antidepressants (better or worse outcome, milder or more

severe degree of depression) has been solved by Rasch analysis.

It is indeed interesting to follow the thoughts of the psychologist J. Michell

in his two monographs, in which he reviews psychometrics within scien-

tific  psychology from a historical perspective. In his first monograph

(An Introduction to the Logic of Psychological Measurement) his review of

psychometrics ends with Guttman’s cumulative rating scale ( 66 ). This scale

exactly fulfils the mathematical principle inherent in the item response

theory model; that the difference between different subjects can be measured

when the total score is a sufficient statistic, e.g., that the clever student is able

to solve both the difficult and less difficult problems, while the less clever

student has only managed to solve the easier problems. However, Guttman’s

cumulative scale is a deterministic scale which does not permit the statistical

uncertainty that must be accepted, not least in the clinical field. The Rasch

method is often called a statistical version of the Guttman scale.
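
The deterministic character of the Guttman scale can be made concrete with a small sketch. It assumes dichotomous items (1 = solved or present, 0 = not) that are already ordered from easiest to most difficult; the function names are hypothetical. A response pattern is Guttman-consistent when no harder item is endorsed while an easier one is not, and each such violation counts as one Guttman error. A probabilistic model such as Rasch's tolerates a certain number of these errors, whereas the deterministic scale does not.

```python
def guttman_errors(responses):
    """Count Guttman errors in one response pattern.

    `responses` is a list of 0/1 scores ordered from the easiest to the
    most difficult item. An error is any pair (easier, harder) in which
    the harder item is scored 1 while the easier item is scored 0.
    """
    errors = 0
    for easier in range(len(responses)):
        for harder in range(easier + 1, len(responses)):
            if responses[easier] == 0 and responses[harder] == 1:
                errors += 1
    return errors


def is_guttman_pattern(responses):
    """True when the pattern fits the deterministic cumulative scale."""
    return guttman_errors(responses) == 0


# The clever student solves easy and difficult problems alike.
print(is_guttman_pattern([1, 1, 1, 1]))
# The less clever student solves only the easier problems.
print(is_guttman_pattern([1, 1, 0, 0]))
# A difficult problem solved after an easier one was missed: one error.
print(guttman_errors([1, 0, 1, 0]))
```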

Louis Guttman (1916–87) was professor of sociology and psychology at

the Hebrew University of Jerusalem, where he was director of an institute

that was later renamed the Guttman Institute in his honour. He set forth his

model for accumulating individual scale items in the 1930s. During World

War II, Guttman’s model was used to study instant anxiety symptoms in

American troops who had been under fire. It turned out that the somatic

anxiety symptoms that appeared immediately or within hours after these

combat situations could be ranked according to the Guttman principle, so

that the milder anxiety symptoms included palpitations, ‘butterflies in the

stomach’, and dizziness. The more severe anxiety conditions included nausea,

hand tremor and stiffness of the body ( 67 ).

Michell’s second book (Measurement in Psychology), published in 1999,

concludes with a paragraph on item response theory precisely as

developed by Rasch ( 68 ).

The psychologist Borsboom published ‘Measuring the Mind’ in 2005,

further extending Michell’s summary of psychometrics from a psychologist’s

standpoint ( 69 ). Borsboom correctly attempts to distinguish between the

clinical validity of a rating scale, which is clearly a technical, clinical (not

primarily a psychometric) issue, and psychometric validity. However, he then

adds that once clinical validity has been established, it is also important to

perform a psychometric validation analysis, and for this purpose he recom-

mends the Rasch analysis. A good introduction to the Rasch analyses is found

in: Bond TG, Fox CM, Applying the Rasch Model ( 70 ). The best example of the

practical procedure when performing a Rasch analysis of a rating scale is to

be found in Allerup’s Statistical analysis of MADRS – a rating scale developed

in 1986 ( 71 ).

Modern psychometrics was founded by Georg Rasch. In fact, it was after

many attempts to perform factor analysis, especially with the many suggested

ways of rotation, that Rasch realised that this approach was unscientific,

because the guidelines for these rotation procedures were based on ‘trial and

error’, not on evidence ( 72 ). He found the rotation procedures more harmful

than helpful in providing ability scales for measurements. This was the

background upon which Rasch developed his item response theory model.

He emphasised that his analysis of measurements should only be performed

when a rating scale had been proved clinically valid. Then the problem of

measurement should be tested, i.e., transferability defined as a mathematical-

statistical analysis of whether the scale contains one and only one dimension

when used several times during a course of therapy, and when controlled for

age or gender bias. As pointed out by Borsboom, one of the requirements

in the Rasch model is local independence between items ( 69 ), an attempt

by Rasch to screen out the tautological correlations between items, i.e., a

problem inherent in factor analysis.

Sidney Siegel: Non-parametric statistics

Siegel (1916–61) completed his PhD in psychology in 1953 at Stanford

University and then taught psychology and statistics at the University of

Pennsylvania until his death in 1961. Together with the philosopher Donald

Davidson (1917–2003), he worked on psychometric analysis, including

measurement theory models. However, Davidson abandoned these psycho-

logical analyses due to the difficulties in measuring subjective experience,

while still adhering to Wundt’s approach to non-reductive monism, i.e., that

it is only possible to reduce psychological dimensions to less complex

psychological elements, but never to unique biological elements. Høffding

subsequently designated this approach critical monism.

Siegel also worked with Patrick Suppes (1922–) who independently of

Georg Rasch demonstrated that the latent additive function is the central

element in psychometric measurements.

In 1956, Sidney Siegel published ‘Nonparametric Statistics for the

Behavioural Sciences’, the first work to collect the non-parametric or

distribution-free statistical tests, also known as rank order tests ( 73 ). When

drawing conclusions based on a sample of measurement results, one might,

especially in the field of psychology, feel uneasy about assuming that the

underlying distribution belongs to a certain category of distribution.

As one of the most significant non-parametric tests, Siegel included

Fisher’s exact test, which is without parameters. In any case, Siegel’s book

from 1956 has become a kind of bible on the relations between the scale step

version (response category type) of the individual items in a rating scale and

the corresponding statistical analysis. Thus, the nominal scale step (the cate-

gory scale) is associated with, for example, Fisher’s exact test; in this case,

when wishing to use Pearson’s χ² test, one must, according to Siegel, perform

a Yates’ correction.
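
The two tests named for the nominal scale step can be sketched for a 2 × 2 table with cells a, b (first row) and c, d (second row). This is a minimal pure-Python illustration under stated assumptions, not Siegel's own procedure: the function names are hypothetical, all row and column totals are assumed to be non-zero, and the two-sided Fisher p-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention. The Yates-corrected statistic uses the usual formula N(|ad − bc| − N/2)² divided by the product of the four marginal totals.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that equally likely tables are included.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


def yates_chi_square(a, b, c, d):
    """Pearson chi-square statistic with Yates' continuity correction."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical table: symptom present/absent in men versus women.
print(fisher_exact_two_sided(3, 1, 1, 3))
print(yates_chi_square(3, 1, 1, 3))
```

With samples as small as in this hypothetical table, the exact test is the natural choice; the corrected χ² statistic is shown only for comparison.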

The ordinal response category scale is associated with non-parametric

tests such as the Wilcoxon Signed Rank Test or the Kruskal-Wallis One-Way

Analysis of Variance by Ranks. (The Spearman correlation analysis is a non-parametric

test, while Pearson’s correlation is a parametric method.) Siegel’s

great contribution was to focus on the relations between item response category

and the statistical (non-parametric) test. Some people believe this to be

true psychometrics (see Figure  3.2 ).
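
The difference between a rank-based and a parametric coefficient can be shown in a few lines. The sketch below assumes the usual definitions: Pearson's coefficient is computed on the raw scores, while Spearman's coefficient is Pearson's coefficient computed on the ranks of the scores, with tied values receiving the average of their ranks. Function names and the example scores are hypothetical, and both variables are assumed to vary (a constant variable would give a zero denominator).

```python
def average_ranks(values):
    """Rank the values from 1 upwards, giving ties the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation (parametric)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation (non-parametric)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: the ordering agrees perfectly, the spacing does not.
scale_a = [1, 2, 3, 4, 5]
scale_b = [1, 4, 9, 16, 100]
print(spearman(scale_a, scale_b), pearson(scale_a, scale_b))
```

Because the rank order of the two hypothetical scales is identical, the rank coefficient is unaffected by the extreme value that lowers the parametric coefficient.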

Robert J. Mokken: Non-parametric analysis for item response theory (IRT)

The connection between the prevalence of a symptom (e.g., in depression)

and the severity of depression in the group of patients under examination,

has a probability value that is included in the Rasch analysis, which is a parametric

analysis.

Based on this connection inherent in Rasch analysis, Mokken (1929–)

developed a corresponding non-parametric analysis ( 74, 75 ). It is thus inher-

ent in the model that items with a high prevalence (e.g., lowered mood or

lack of interest in daily activities) are present in both the mildly depressed

patient and the more severely depressed patient (ceiling effect), while items

with a low prevalence (e.g., guilt feelings or psychomotor retardation) are

only present in the more severely depressed patient. This is often referred to

within the Mokken analysis as invariant item ordering, i.e., transferability.

Mokken published his non-parametric model in 1971 and was in many ways

Figure 3.2 Connection between measurement level and the corresponding statistical test (Modified from Siegel S. Nonparametric statistics for the behavioural sciences. New York: McGraw Hill, 1956)

Level of measurement    Property                                        Statistical tests
Nominal scale           Classification (e.g., men versus women)         Fisher’s exact test, χ² test
Ordinal scale           Ranking (more or less depressive on HAM-D17)    Wilcoxon rank order test
Interval scale          Unit of measurement (HAM-D6)                    Student’s t-test, effect size

influenced by Rasch analysis. However, based on Siegel’s defence of the use of

non-parametric statistics when the individual items of a rating scale are

measured with severity categories corresponding to those of the original

scale, he stated that Loevinger’s coefficient of homogeneity was the most

relevant indication of whether a rating scale was in accordance with the item

response theory model.

Loevinger’s coefficient of homogeneity was thus used by Mokken in his

IRT analysis. Jane Loevinger (1918–2008) was one of the few women to con-

tribute statistical tests in psychometrics. Her thesis from 1957 (Objective

Tests as Instruments of Psychological Theory) is her most widely cited work

( 76 ). She demonstrated that if one measures reliability as an agreement

between the items in a psychological questionnaire, one may end up in a

tautological process by making parallel questionnaires. She employed the

mind-set behind Guttman’s cumulative model: that each individual item’s

degree of independent information should be examined, not whether or not

it is identical to the other items in a questionnaire. Loevinger therefore devel-

oped her coefficient of homogeneity as an overall assessment of the Guttman

model in its probability formula. Mokken then went further and established

a coefficient of homogeneity for each item in a questionnaire in order to

identify the items that do not fit the Guttman model. It may seem surprising

that Loevinger herself did not complete Mokken’s work. In his 1971 book,

Mokken states that this coefficient of homogeneity should be regarded as a

descriptive statistic in the sense that a value of 0.40 or higher means that the

total score of a rating scale is a sufficient statistic. Actually, Mokken regarded

coefficients between 0.30 and 0.39 as doubtful, perhaps suitable, while

coefficients of 0.50 or higher were perfect and signified a perfect scale.

Mokken analysis is a much weaker test than the Rasch test on whether a

scale fits the item response theory model, because external factors such as age

and gender are not included as part of the analysis in the same way as in the

Rasch analysis ( 74, 75, 76 ).
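
How a coefficient of homogeneity is obtained can be sketched for the simplest case. The example assumes dichotomous items (symptom present = 1, absent = 0), whereas the rating scales discussed here have several response categories, and it uses the common textbook definition: H equals 1 minus the ratio of observed Guttman errors to the number of errors expected if the items were statistically independent, summed over all item pairs for the scale and over the pairs involving one item for the item coefficient. Function names are hypothetical, every item is assumed to be endorsed by some but not all respondents, and the interpretation labels simply restate the thresholds quoted in the text.

```python
def loevinger_h(data):
    """Return (scale H, list of item H) for a table of 0/1 item scores.

    `data` holds one list of item scores per respondent.
    """
    n = len(data)
    k = len(data[0])
    popularity = [sum(row[i] for row in data) / n for i in range(k)]
    observed = [0.0] * k
    expected = [0.0] * k
    total_observed = total_expected = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # The more frequently endorsed item is the 'easier' one.
            easy, hard = (i, j) if popularity[i] >= popularity[j] else (j, i)
            # Guttman error: harder item endorsed, easier item not.
            errors = sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
            under_independence = n * (1 - popularity[easy]) * popularity[hard]
            total_observed += errors
            total_expected += under_independence
            for item in (i, j):
                observed[item] += errors
                expected[item] += under_independence
    scale_h = 1 - total_observed / total_expected
    item_h = [1 - o / e for o, e in zip(observed, expected)]
    return scale_h, item_h


def interpret_h(h):
    """Labels restating the thresholds quoted in the text above."""
    if h >= 0.50:
        return "perfect scale"
    if h >= 0.40:
        return "total score is a sufficient statistic"
    if h >= 0.30:
        return "doubtful, perhaps suitable"
    return "below the range discussed in the text"


# Hypothetical, perfectly cumulative data: three symptoms, four patients.
patients = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
scale_h, item_h = loevinger_h(patients)
print(scale_h, item_h, interpret_h(scale_h))
```

An item whose own coefficient falls clearly below that of the others would be a candidate for exclusion, which is the use Mokken made of the item-level coefficient.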

With Mokken’s 1971 monograph on rating scale analysis, one may claim

that modern psychometrics had reached a level where the two central

elements of this discipline are expressed in pure rating scale terms, that is, the

quantification of the individual symptom on a Likert scale (see Chapter 4),

and for Mokken in particular the cumulative Guttman scale. A good introduction

to Mokken analysis is: Sijtsma K, Molenaar IW. Introduction to

Nonparametric Item Response Theory ( 75, 76 ).

The two psychometric procedures, classical versus modern (Figure 3.3),

may, with reference to Wittgenstein, be considered as two different pathways,

that is, as two different approaches ( 77 ). The classical approach serves

to describe a family of types which have been discussed in connection with

Figure 3.3 The psychometric models: classical vs. modern

Classical: Factor analysis for typological issues

A mathematical model for type description. Spearman’s two-factor model and unrotated principal component analysis are ranked together. The first factor is a general factor, which is tautological, while the next factor, with negative versus positive loadings, is the type description, i.e., physical versus mental anxiety or typical versus atypical depression. For an example of the principal component analysis, see Appendix 11a, Calculus example 1 (www.psykforskhil.dk).

Modern: Item analysis for measurement issues

A mathematical model for measurement issues. An assessment scale which fulfils the item response theory model, e.g. the Rasch model, possesses the measurement technical advantage that the total score is a sufficient statistic, in that we have the distance between and rank order of the individual items (often referred to as invariant item ordering or transferability). In depression measurement this means that we know that depressed mood, lack of interest, and fatigue are present even in milder degrees of depression, while guilt and psychomotor retardation make their appearance in more severe degrees. For an example of the item analysis for measurement issues, see Appendix 11b, Calculus example 2 (www.psykforskhil.dk).

Hotelling’s principal component analysis and Russell’s ramified hierarchy of

typology. The modern approach, for family resemblances, has the criterion of

measurement (total score a sufficient statistic) comparable with other meas-

urement instruments, such as a blood pressure apparatus or a thermometer.

The Guttman cumulative rating scales with the item response theory models

are examples of the modern approach, focussing on the summed total score

as a sufficient statistic.

Wittgenstein used his language-game approach as an argument against

private language. The speaker can only be sure that he or she is using words

correctly when an ‘inner’, ‘subjective’ or ‘private’ process is operating while the

words are used as part of their original public language ( 78 ). Wittgenstein

himself worked with games of applied mathematics ( 78 ). Inspired by his

attempt to follow the measurements of ‘inner’, ‘subjective’ feelings ( 79 ), the

familiar arrangement of the HAM-D17 items seems to follow that A, B, C

version (Appendix 3a).

4 Modern psychometrics: Item categories and sufficient statistics

Immediately after the publishing of the ICD-10 ( 55 ) and the DSM-III/

DSM-IV ( 56 ), a major attempt was made to integrate modern psychometrics

with these new diagnostic systems ( 4 ). In this attempt, Likert’s response

categories were limited to 0–4 scales, while Guttman’s cumulative scale was

described on the basis of the statistical models within the item response theory

analyses, e.g., the Rasch or Mokken analyses. This was done by combining the

Likert values of the individual symptoms to form a sufficient total score.

Rensis Likert: Scale step measurements

One of the basic elements in modern psychometrics is that each symptom

must be measured on a scale with several steps, namely a Likert scale, named

for Rensis Likert (1903–81). In 1932, he completed his PhD in psychology at

Columbia University, New York, in which he had developed a response

category scale with five steps. In his thesis, Likert used questions based on

values: ‘judgement of value rather than judgement of fact’. The response options

for the individual questions were ‘bipolar’ in that they went from ‘strongly

approve’, ‘approve’, ‘undecided’, ‘disapprove’ to ‘strongly disapprove’. He found

that ‘attitudes are distributed fairly normally’ and that this provided a basis for

‘combining the different statements’. However, Likert did not investigate

whether the sum of the individual questions actually constituted a sufficient

statistic. Subsequently it has been demonstrated that a Likert scale going from

0 to 6 (i.e., seven response categories) ‘hits the ceiling’, which is to say that a

greater number of response options will not provide more information ( 80 ).

This might be the place to mention that in the first review on graphic

rating scales, Freyd (The Graphic Rating Scale) ( 81 ) comments that, while

Galton (1883) was the first to use a ‘Likert’ scale, he was not systematic as to

methods. Freyd recommends that the line along which the measurement

takes place should be long enough to permit five response categories. In

many ways, this is the precursor of the Likert scale.

Figure  4.1 shows how the seven-response category Likert scale, used in

the BPRS, is a ‘global’ scale compared with the semi-global, seven-response

category Likert scales used in another assessment scale, namely the

Montgomery-Åsberg Depression Rating Scale (MADRS). It also shows

the  exact scale step definitions based on the BPRS. The first of these was

developed as early as in 1963 by Professor William J. Turner (1907–2006)

( 49 ). An expanded version of this is incorporated in the PANSS scale.

According to the BPRS, the depression symptom is defined as lowered

mood and, on the Likert scale from 0 to 6, this is a global clinical expression

reflecting the adjectives given in Figure 5.1. In the MADRS Item 1, lowered

mood (observed), no definition is supplied for grades 1, 3 and 5. The reason

for this is that MADRS is a subscale derived from the Comprehensive

Psychopathological Rating Scale, in which the individual items have a Likert

scale from 0–3 ( 82 ). The scale has merely been doubled without taking note

of the empty steps, thus making it semi-global (see Appendix 3c). The first

psychometric analysis of the MADRS showed that psychiatrists using the

Figure 4.1 Schematic representation of the graduation of the symptom depressed mood: assessment with increasingly precise definitions (anchoring). For ABC scoring sheet of MADRS see Appendix 3c.

Score 0. BPRS: not present. MADRS: neutral mood. PANSS/CIDRS: absent.
Score 1. BPRS: doubtful. MADRS: (no definition). PANSS/CIDRS: on the verge of depressed mood.
Score 2. BPRS: very mild. MADRS: looks dispirited but does brighten up without difficulty. PANSS/CIDRS: quite mild tendency, but only occasionally.
Score 3. BPRS: mild to moderate. MADRS: (no definition). PANSS/CIDRS: mild to moderate indications of depressed mood, but no hopelessness.
Score 4. BPRS: moderate to marked. MADRS: appears sad and unhappy most of the time. PANSS/CIDRS: moderate to marked indications of depressed mood, perhaps tendency to crying; reports feeling of hopelessness.
Score 5. BPRS: marked to severe. MADRS: (no definition). PANSS/CIDRS: marked to severe indications of depressed mood, distinct hopelessness.
Score 6. BPRS: extremely severe. MADRS: extreme and constant despondency. PANSS/CIDRS: extremely severe indications of depressed mood, massive hopelessness.

scale had avoided the empty steps ( 71 ). This is probably the reason why the

standardisation of the MADRS has a relatively high value of 12 for remission,

while the corresponding score on the HAM-D is 7.

The use of empty scale definitions in questionnaires has proven to give an

artificially higher score than the use of well-defined scale steps, such as in the

PANSS example in Figure  4.1 ( 83 ). The most fruitful of the attempts made to

improve the Likert scale in the Hamilton Depression Scale is Paykel’s Clinical

Interview for Depression (CID), precisely through its use of the 0–6 scale

shown in Figure  4.1 ( 84 ). The assessment scale, Clinical Interview for

Depression and Related Syndromes (CIDRS) is developed from the CID. It

follows the endeavours of Turner and the PANSS, while avoiding the ten-

dency to overlap seen in these two attempts. The example given in Figure  4.1

is in accordance with the CIDRS.

On the HAM-A, as shown in the Appendix, there are 1, 2, 3, 4 steps

descending from ground level = 0 down to the lowest level. On the HAM-D

some symptoms ( 39, 9 ) also go from 0 to 4 while others ( 85, 8 ) go from 0 to

2. Hamilton explained that this had been introduced because it was not

clinically meaningful to employ a longer ladder than this. Most assessment

scales tend to deal rather sketchily with the issue of measuring a specific

symptom, for example, lowered mood as shown in Figure  4.1 ( 86 ). As men-

tioned previously, Hamilton gave a great deal of thought as to symptoms that

can only be measured from 0 to 2 versus symptoms that can be measured

from 0 to 4. A score of 3 on the BPRS in Figure  4.1 may signify: a) that dur-

ing the interview, the patient typically seems mildly to moderately depressed,

that is to say neither quite mildly nor markedly depressed, b) that during the

interview, the patient has fluctuated between doubtful and marked to severe,

but on average has a score of 3, or c) one has the impression that during the

last three days, taken as a whole, the patient has had a score of 3.

Recently, an attempt has been made to ensure a more exact score on the

Hamilton Depression Scale by assessing both frequency and severity of a

symptom in an integrated score (GRID – HAM-D 6 ).

As ‘grids’ or nets, both the HAM-D 6 in its GRID version and the MES

can be viewed as attempts to ‘tighten the net’ to catch those symptoms

that  are difficult to pinpoint during an interview due to their varying

frequency.

John Overall: Brief, sufficient rating scales

John Overall’s (1929–) PhD dissertation in 1957 from Texas University in

Austin in the field of general experimental psychology led to five years’ train-

ing in psychometrics at Thurstone’s Psychometric Laboratory in North

Carolina, where he came into contact with the Central Neuropsychiatric

Research Laboratory of the Veterans Administration Hospital in Perry Point,

Maryland. As a consequence, Overall joined the programme that the Veterans

Administration had initiated after seeing the revolutionary effects of

chlorpromazine and imipramine, whereby schizophrenic or depressive

patients could be discharged from mental hospitals. To make this more

evidence-based, US multi-centre investigations had been initiated, both placebo-controlled

and against an active comparator, in accordance

with the randomised, double-blind method that had been introduced within

medical science in the 1950s. The programme was called ‘Cooperative

Studies of Chemotherapy in Psychiatry’. In this programme, Lorr’s ‘Inpatient

Multi-dimensional Scale’ (IMPS) was included as a measure of desired clini-

cal effect ( 87 ). On the basis of the first data analyses of the results from this

programme, and using the statistical analysis methods learned from

Thurstone (the grand old man of American factor analysis), Overall was able

to show that the 63 subscales in the IMPS could be reduced to 16 items. In

this, Overall received much aid from two experienced clinicians, the psychia-

trist Leo Hollister and the psychologist Don Gorham. In particular Gorham’s

clinical experience was used. He was 20 years older than Overall and had

been in the midst of the dramatic change in clinical reality in mental hospi-

tals caused by the introduction of chlorpromazine and imipramine. The

clinical training provided by the physician Leo Hollister was also extremely

important for the formulation of the 16 items that led to the development of

the Brief Psychiatric Rating Scale (BPRS) in 1962 ( 45, 46 ). As noted by Overall,

the language of the 16 BPRS items is that employed by experienced psychiatrists

when treating patients:

The guiding principle in development of the BPRS was to provide

psychiatrists with a rating instrument that would permit them to

record their judgment at a level of abstraction consistent with the

manner in which they ordinarily conceptualised manifestations of

psychopathology ( 88 ).

In 1963, after publishing the BPRS, Overall returned to Texas as head of

the Research Computation Center in Galveston. In 1978, he became profes-

sor of clinical psychometrics at UT Houston Medical School. Overall tells

how the BPRS was accepted outside the US via the CINP (Collegium

Internationale Neuro-Psychopharmacologicum) ( 89 ). Max Hamilton headed the

CINP group that was to implement both the BPRS and Hamilton’s Anxiety

Scale (HAM-A) and Depression Scale (HAM-D) via controlled clinical trials

worldwide. With these scales, averages and deviations can be computed to

allow comparison of results of clinical trials from different parts of the world.

This is not possible with a diagnosis!

Both Max Hamilton and John Overall advocated the clinical approach:

when the diagnosis had been made, the prescribed treatment should then be

monitored by HAM-D/HAM-A or BPRS in order to measure the level of

response.

In 1969, Index Medicus accepted rating scales as scientific, evidence-based

measuring instruments for the assessment of drug efficacy in psychiatry.

The Brief Psychiatric Rating Scale (BPRS) was the scale referred to by Index

Medicus in 1969, as the BPRS was specifically developed to assess the effects

of antipsychotics or antidepressants ( 4 ). Figure 1.10 shows the BPRS with its

18 symptoms covering depression and schizophrenia. Figure 1.11 shows that

with the addition of two items, mania can also be assessed. Thus, a mere six

BPRS symptoms make it possible to measure the three major fields in clinical

psychiatry; namely schizophrenia, mania and depression.

Factor analytic studies with BPRS brought into sharper focus the American

tradition versus the British. Using the British tradition learnt during his

studies in London, Pichot demonstrated the need to focus on the two most

important factors, and showed that it is the depression factor rather than the

psychotic factor that is important. Building on the American tradition of fac-

tor analysis, Overall attempted to discriminate between ‘depression’, ‘anergia’,

‘thought disturbance’, ‘excitement’, and ‘hostility/suspicion’; thus, five factors

in all ( 59, 60 ). It is worth noting here, that the BPRS literature does not

discriminate between positive versus negative factors. This terminology

entered with the PANSS scale, based on the BPRS ( 4 ). The distinction

between positive and negative schizophrenia symptoms has not proved

fruitful in clinical psychometrics, as it lacks clinical validity. The lack of

understanding of the concept of schizophrenia in American psychiatry has

stimulated the efforts to introduce the discrimination between positive

versus negative symptoms. Thus, the DSM-IV states, concerning the

diagnosis of schizophrenia, that the positive symptoms refer to an overreac-

tion of normal functions, while the negative symptoms refer to a diminish-

ment in, or even loss of, normal functions. This is not far removed from the

bipolar affective disorder in which the manic symptoms are just such an

overreaction (Freud termed this a contra-phobic reaction), while depression

or melancholia precisely display diminishment in or even loss of normal

functions. In the schizophrenic disorder, autism, ambivalence, distorted

associative thought processes, and distorted sensory perception are the core

elements, as denoted in the psychotic dimension of the BPRS. The ten BPRS

items identified by Overall to be the most discriminating items for measuring

schizophrenic states cover both ‘negative’ and ‘positive’ symptoms ( 90 ).

An item response theory analysis (Rasch) showed that these items measure a

dimension of severity of schizophrenic states ( 91 ).

Attempts to compare the validity of rating scales using the ICD-10 or DSM

systems reveal that these diagnostic systems do not contain a measurement

function ( 4 ).

Clinical versus psychometric validity

When analysing the measurement validity of an assessment scale such as,

for example, a depression scale, it is important first to evaluate its clinical

validity; this can only be done by a highly experienced psychiatrist. In the

first Danish assessment of the Hamilton Depression scale, the two most

experienced psychiatrists at the Psychiatric Clinic of the leading Danish

hospital (Rigshospitalet) were used as ‘Indices of validity’. These were Erling

Dein and Ove Jacobsen. As mentioned previously, Erling Dein was the

supervisor of Lise Østergaard’s doctoral thesis. My own doctoral thesis

from 1981 describes how these two experienced psychiatrists assessed the

degree of depression on a scale from 0 = no depression to 10 = maximum

depression.

Of the 17 symptoms in the Hamilton scale, only six symptoms corresponded

to our “Indices of validity”. These six symptoms are: lowered mood, guilt

feelings, lack of interest, psychomotor retardation, psychic anxiety, and

fatigue. They correspond, to a certain degree, to the six BPRS symptoms

measuring depression (Figure 1.11).

The mathematical or statistical method used in psychometrics to

determine whether it is relevant to add up the different symptom scores as a

measure of present state severity of a psychiatric disorder (item response

theory analyses) is visualised in Figure  4.2 , in which severity of depression is

measured by six different symptoms ( 1, 3 ). These six symptoms have been

taken from Hamilton’s Depression scale (see Figure 1.9), as they have turned

out to be the most suitable as a ‘ruler’ (shown in Figure  4.2 ) going from 0 = no

depression to 22 = maximum depression, to illustrate a present state profile.

Figure  4.2 is an attempt to illustrate how the contents of Figure 3.1 can be

translated into a measure or ruler by summing the six symptoms into a total

score. The six symptoms in Figure  4.2 are symbolised by boxes that may

overlap; this is termed statistical uncertainty . To allow each symptom to

express its particular piece of information (its particular prevalence)

corresponding to the area it covers on the ruler, there must not be much

overlap between the symptoms.

As can be seen, lowered mood, lack of interest and fatigue form the first

half of the ruler while anxiety, guilt feelings and psychomotor retardation

make up the second half. As is also seen, fatigue overlaps both lack of interest

and anxiety while guilt feelings overlap anxiety and retardation.


Thus, an assessment scale is an attempt to achieve a linear description of the

severity of the psychiatric disorder through the symptoms selected. This is often

spoken of as a visual scale, going for example from 0 to 22 – as in Figure  4.2 .
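As a minimal sketch of the summing step described above, the following Python snippet adds six item ratings into a total score on the 0 to 22 ruler. The item ranges are an assumption that is not stated in this section: five items rated 0–4 and fatigue rated 0–2, which is consistent with the stated maximum of 22. The dictionary and the function name are hypothetical.

```python
# Assumed maximum rating per item (hypothetical table, not given in the text):
# five items rated 0-4 and fatigue rated 0-2, which adds up to the stated maximum of 22.
HAMD6_ITEM_MAX = {
    "lowered mood": 4,
    "guilt feelings": 4,
    "lack of interest": 4,
    "psychomotor retardation": 4,
    "psychic anxiety": 4,
    "fatigue": 2,
}


def hamd6_total(ratings):
    """Sum the six item ratings into a total score (0 = no depression, 22 = maximum)."""
    if set(ratings) != set(HAMD6_ITEM_MAX):
        raise ValueError("ratings must contain exactly the six items")
    total = 0
    for item, score in ratings.items():
        if not 0 <= score <= HAMD6_ITEM_MAX[item]:
            raise ValueError(f"{item}: rating {score} is outside 0-{HAMD6_ITEM_MAX[item]}")
        total += score
    return total


# Hypothetical ratings for one patient, for illustration only.
example = {
    "lowered mood": 2,
    "guilt feelings": 1,
    "lack of interest": 2,
    "psychomotor retardation": 1,
    "psychic anxiety": 2,
    "fatigue": 1,
}
print(hamd6_total(example))  # 9
```

The sketch only performs the addition; whether the total score is a sufficient statistic is exactly what the item response theory analysis has to establish.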

In the mathematical-statistical analysis (item response theory analysis) of the six symptoms in Figure 4.2, it has been ensured that age and gender have no influence on these symptoms (e.g., that older people do not score differently from younger people, or women differently from men).

When all symptoms point in the same direction, in accordance with the order shown in Figure 3.1, one says that the degree of severity of the present state syndrome has been found. Thus, one speaks of the severity of a depressive syndrome, a manic syndrome, et cetera.

Factor analysis would typically attempt to demarcate some symptoms, covering a small part of the ruler, while a Rasch analysis is based on the assumption that a clinical analysis has been performed to determine whether these depressive symptoms provide an adequate description of the whole dimension. Rasch analysis then determines whether or not the placing of these symptoms on the ruler is influenced by external factors, such as age, gender and geographical area. In order to operate as a ruler, in the same way as the standard metre bar in Paris, the instrument of measure must function independently of external factors.

The six symptoms in Figure 4.2 comprise the HAM-D6 and have been found to fulfil the Rasch model. Thus, neither age nor gender has an effect on the HAM-D6 total score; this has been demonstrated in studies performed both in and outside Denmark, e.g., in Germany, France and the US.

Item-response theory versus factor analysis

It is extremely important to understand that factor analysis is not a method to test whether a scale measures the degree of depression. Unfortunately, different software systems, such as Statistical Analysis System (SAS), make it possible for anyone to perform a factor analysis. Previously, this operation necessitated the aid of a competent statistician, who would point out that the more the symptoms correlate in a factor analysis, or on Cronbach’s alpha test, the lower the information value of each symptom. The key to performing an assessment of a depression is precisely the ability to register the valid symptoms, as is apparent from Figure 4.2.

[Figure 4.2: The Depression Ruler: total score a sufficient statistic. The items lack of interest, depressive mood, anxiety, fatigue, retardation and feelings of guilt are shown as boxes along a ruler from 0 (no depression) to 22 (maximum depression).]

Figure 4.2 An elaboration of Figure 3.1: prevalence is now substituted by item score

A Rasch analysis requires not only a professional selection of symptoms that cover the ruler or dimension being measured, but also that there is no local dependency between the individual symptoms.

It has frequently been debated whether a simple visual analogue scale would suffice, i.e., a depression ruler corresponding to the BPRS depression symptom in Figure 4.1. In this connection, one often uses the Clinical Global Impression Scale (CGI) (92). Figure 4.3 shows the CGI-S; the S stands for severity. Evidently, one has, first of all, to place the patient in the most relevant category of illness. If this is depression, then a Grade 6 signifies that one has the clinical, global, present state impression of the person in question as belonging to the most depressed group of patients one has seen. In other words, the CGI-S scale in Figure 4.3 can only be used by highly experienced clinicians. The less experienced are handicapped by not having seen enough severely depressed patients, and tend to overscore the condition. Because of this, the HAM-D6 is a more reliable scale when people with varying degrees of psychiatric training are involved. Furthermore, the use of a symptom assessment scale permits an investigator to explore whether a certain treatment is only effective on a few of the actual symptoms.

The Clinical Global Impression Scale, severity (CGI-S)

Score   Clinical Global Impression (degree of illness)
0       No sign of mental illness
1       Doubtful presence of mental illness
2       Mild degree of illness
3       Moderate degree of illness
4       Marked degree of illness
5       Severe degree of illness
6       Among the most severely ill patients within the psychiatric diagnostic group to which the patient belongs

Figure 4.3 Scoring sheet for Clinical Global Impression Scale, Severity (CGI-S)

Jacob Cohen: Effect size

Like John Overall, Jacob Cohen (1924–98) studied psychology with special emphasis on statistics. He majored in 1947; the subject of his PhD dissertation from New York University in 1950 was factor analysis in intelligence tests. In addition to effect size statistics, he is renowned for his scale reliability measurement, Cohen’s kappa agreement coefficient (93). In modern psychometrics, Cohen is best known for the descriptive statistic known as the standardised effect size (94). This concept will be dealt with in more detail in Chapter 5; here it is important to specify that effect size refers to the clinical significance of a specific treatment (e.g., when comparing an active drug with placebo) and not only to the statistical significance. Cohen probably provides his best explanation of this in his paper entitled ‘The earth is round (P < 0.05)’ (95). With reference to clinical psychometrics, one might say that the standardisation of a scale, for instance the HAM-D, implies that a depressive condition should be treated (the earth is round) when HAM-D ≥ 18 and not because of some or other P-value. As a crude measure of effect size, Cohen employed the norms ‘small’, ‘medium’ and ‘large’. When evaluating clinical significance of a drug compared to placebo, a ‘medium’ effect size is required, which is not to be translated into a P-value, but into other clinical targets,

e.g., 20% more effective (= 20% higher response rate) than placebo or a Number Needed to Treat = 5.

The Global (0–10) depression scale

Score   Depression measurement
0       No depression
1       Doubtful depression
2–4     Mild depression
5–7     Moderate depression
8–10    Severe depression

Figure 4.4 Scoring sheet for Global Depression Scale (0–10)
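The correspondence stated above between a 20% higher response rate and a Number Needed to Treat of 5 follows from the standard definition of the NNT as the reciprocal of the absolute difference in response rates. The sketch below assumes that ‘20% higher’ means 20 percentage points; the function name and the example response rates are hypothetical and are not taken from any trial.

```python
def number_needed_to_treat(response_rate_active, response_rate_placebo):
    """NNT = 1 / absolute difference in response rates (rates given as fractions)."""
    difference = response_rate_active - response_rate_placebo
    if difference <= 0:
        raise ValueError("the active treatment must have the higher response rate")
    return 1.0 / difference


# Hypothetical response rates chosen so that the difference is 20 percentage points.
nnt = number_needed_to_treat(0.50, 0.30)
print(round(nnt, 2))  # 5.0
```

In words: if 20 more patients out of every 100 respond on the active drug than on placebo, five patients have to be treated to obtain one additional responder.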

In the interest of comprehensiveness, the depression scale we used as our clinical reference in our first validity examination of the Hamilton Depression Scale (HAM-D17) is shown in Figure 4.4. Clinical validity comes before the psychometric process of validation. The two experienced psychiatrists (Erling Dein and Ove Jacobsen) were very reliable in their use of the global depression scale, on a par with the psychiatrists who performed the HAM-D17 ratings (Tom Bolwig and John Vitger). This was the basis for performing an item analysis, i.e., investigating how each of the 17 items in the HAM-D17 adhered to the global score from 0 to 10 (Figure 4.4). The result was the six symptoms that constitute the depression ruler (Figure 4.2).

As far as the BPRS (Figure 1.10) is concerned, it is of course not relevant to do an item analysis such as when the HAM-D17 is compared to a global depression scale. This is because the BPRS is a ‘bipolar’ scale, partly consisting of a six-item depression scale corresponding to the depression ruler in Figure 4.2, and partly consisting of 11 items that can be said to make up a psychosis ruler or a scale with positive (mania-type) items. In the Appendix, the BPRS is therefore shown as two scales (schizophrenicity and depression).

It is indeed very disappointing that over the past three decades the clinical validity of a rating scale has no longer been the domain of experienced psychiatrists but has been left to inexperienced research workers (social workers, psychologists, and young medical doctors) using structured clinical interviews. However, these structured interviews were developed to help the inexperienced research worker to be clinically more competent in clinical trials. The investigation of clinical validity of rating scales or questionnaires still has to be performed by experienced psychiatrists.

The Mania Scale (Appendix 6) has also been developed using experienced

psychiatrists as index of validity ( 64 ).

5 The clinical consequence of IRT analyses: The pharmacopsychometric triangle

Dr. Phil. Benny Karpatschof, a professor at the University of Copenhagen’s Department of Psychology, has developed a scale covering the consequences of a Rasch analysis. This scale ranges from Hell via Purgatory to Paradise (96). Figure 5.1 is a modified version.

The psychometric consequences by the results of an item theory analysis (after Karpatschof)

Hell: The selected symptoms are, when taken as a whole, quite inhomogeneous, e.g. BPRS18 (Appendix 7), HAM-D17 (Appendix 3a)

Purgatory: A certain degree of homogeneity, but with local item dependency, necessitating revision or deletion of these items, e.g. MADRS10 (Appendix 3c)

Paradise: Distinct homogeneity without local dependency (total score a sufficient statistic), making it possible to demonstrate dose response relationship, e.g. HAM-D6, MES (see Appendix 3d), MAS (see Appendix 6)

Figure 5.1 Diagram of the psychometric consequences from the results of an item response theory analysis. (Modified from Karpatschof B. Udforskning i psykologi. De kvantitative metoder. København: Akademisk Forlag 2006)

The clinical consequence of having entered psychometric Paradise is recognised when employing effect-size statistics to denote clinical significance in placebo-controlled studies, especially when evaluating dose–response relationships in modern neuropsychopharmacology.

With reference to Wittgenstein we might say that Karpatschof’s approach is to bring the items back to their correct home (‘Paradise’) when tested in the stimulus–response model, the dose–response relationship. Drugs are major treatment modalities for all medical disorders, including psychiatric disorders. However, this does not imply that drugs can cure any mental disorder. On the other hand, the pharmacological approach of demonstrating a dose–response relationship is a most important scientific principle. It has been studied only sporadically in clinical psychopharmacology, probably due to inadequate outcome measures and/or descriptive statistics, i.e., effect-size statistics.

Effect size and clinical significance

Figure 5.2 illustrates data from a placebo-controlled clinical trial in patients who all fulfilled the DSM-IV criteria for major depression prior to treatment, i.e., had a treatment-demanding depression. The patients were randomised to either placebo or active medication (verum); in this case a selective serotonin reuptake inhibitor (SSRI). In all, 102 patients entered the trial, which lasted six weeks. Of these, 52 received active treatment and 50 received placebo. The patients were assessed using the Hamilton Depression Scale (HAM-D17). Before treatment, the patients had a HAM-D17 mean score of 24; this applied both to the 52 patients receiving the SSRI drug and to the 50 on placebo. During the trial the patients were assessed once a week. Endpoint was six weeks after start of the trial. In total, five patients dropped out in the SSRI group, three of these due to too many side effects (headache, nausea or hyperhidrosis), while two patients withdrew because they felt that the treatment did not help, perhaps because they thought they were receiving placebo. The trial was double-blind, so that neither patient nor treating physician was aware of which type of treatment was given. In total, four patients dropped out in the placebo group, one with headache and three because of lack of effect.

[Figure 5.2: line plot of the HAM-D mean score (vertical axis, 0 to 24) against weeks of therapy (baseline to endpoint). Both groups start at 24; at endpoint, placebo HAM-D = 14 and active drug (SSRI) HAM-D = 11. Pooled sd = 7.5. Effect size 3/7.5 = 0.40.]

Figure 5.2 Example of calculation of effect size in a placebo controlled antidepressant study in which HAM-D17 was used

The so-called LOCF method (Last Observation Carried Forward) was used to analyse the results. From the week when the nine patients left the trial, their HAM-D17 scores were carried forward as if these patients had remained on their score until endpoint. The reason for this LOCF method is a desire to retain all the patients entering the trial in the analysis (intent-to-treat). In this way, an attempt is made to describe the treatment course for all the patients included in the trial and not only the ‘well-behaved’ patients who completed the full six-week treatment period.
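The carrying-forward step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, each patient is represented as a list of weekly scores starting with baseline, and the example scores are invented for the purpose of the illustration.

```python
def locf(observed_scores, n_assessments):
    """Fill a patient's series up to n_assessments by repeating the last observed score."""
    if not observed_scores:
        raise ValueError("at least a baseline score is required")
    filled = list(observed_scores[:n_assessments])
    while len(filled) < n_assessments:
        filled.append(filled[-1])  # last observation carried forward
    return filled


# Hypothetical patient: baseline and weeks 1-3 observed, then drop-out.
# A six-week trial with weekly ratings has 7 assessments (baseline + 6 weeks).
observed = [24, 22, 20, 19]
print(locf(observed, 7))  # [24, 22, 20, 19, 19, 19, 19]
```

The endpoint value used in the analysis is then simply the last element of the filled series, so a patient who dropped out after week 3 contributes the week 3 score at endpoint.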

As can be seen in Figure 5.2, the HAM-D17 mean score of the placebo-treated patients at endpoint was 14. The mean HAM-D17 score was 11 for the SSRI-treated patients.

Effect size is an expression of the difference between the change in HAM-D17 score from starting point (baseline) to endpoint for active medication (24 − 11 = 13) and the corresponding change for placebo (24 − 14 = 10). This HAM-D17 mean score difference of 13 − 10 is thus 3. This difference is now considered in relation to the standard deviation of the change in HAM-D17 for all patients. As seen in Figure 5.2, this ‘pooled’ distribution (standard deviation) has been calculated to be 7.5.

Effect size (Figure 5.2) is the fraction made up by the difference in the mean HAM-D17 change for the two types of treatment (i.e., 3) divided by the pooled standard deviation (i.e., 7.5). In this manner the effect size in the trial is 0.40 when comparing SSRI treatment with placebo.
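The arithmetic of this example can be written out as a short Python sketch. The function and parameter names are hypothetical; the numbers are those of the trial described above (baseline 24, endpoints 11 and 14, pooled standard deviation 7.5).

```python
def effect_size(baseline, endpoint_active, endpoint_placebo, pooled_sd):
    """Difference in mean change (active minus placebo) divided by the pooled standard deviation."""
    change_active = baseline - endpoint_active    # 24 - 11 = 13
    change_placebo = baseline - endpoint_placebo  # 24 - 14 = 10
    return (change_active - change_placebo) / pooled_sd


es = effect_size(baseline=24, endpoint_active=11, endpoint_placebo=14, pooled_sd=7.5)
print(round(es, 2))  # 0.4
```

Because both groups share the same baseline mean in this example, the difference in change (3) equals the difference between the two endpoint means (14 − 11).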

Effect size statistics were introduced by Cohen (97). This measure is interesting in that it provides a measure of treatment effect in relation to the standard deviation on the scale used for assessment (97). In this way, the effect size is dimensionless, in that it becomes independent of the raw score of a particular scale and thus permits a comparison of different rating scales by use of the standardised effect-size statistic.

In his original publication, Cohen states that an effect size of 0.2 lacks clinical significance. In his opinion, an effect size of 0.5 has medium or moderate clinical significance, while an effect size of 0.8 or more means marked clinical significance. These figures are relevant when comparing an active treatment with placebo, as in Figure 5.2. Cohen admits that these effect-size values for clinical significance are actually provisional, subjective cut-offs.

In our 2000 analysis of the effect size of fluoxetine (the first SSRI drug to be approved) in patients fulfilling the DSM-III criteria for major depression, we found an effect size of 0.30 on HAM-D17, while it was 0.38 on HAM-D6 (98).


The effect size area between 0.30 and 0.50 has been heavily debated because the US Food and Drug Administration (FDA) made it possible to re-analyse all the data submitted by the pharmaceutical industry when seeking FDA approval of a drug for DSM-III/DSM-IV major depression.

Turner et al. were allowed access to all the FDA data on 12 newer antidepressive drugs (99). They found a mean effect size of about 0.30, as also found by Kirsch et al. when analysing a subgroup of FDA data on six FDA-approved drugs (100). The HAM-D had been used as an effect measure in more than 95% of these trials.

Figure 5.3 shows a comparison of the HAM-D17 and HAM-D6 data for those of the antidepressants where it was possible to gain access to the individual HAM-D items and not only the total HAM-D score. It is a limitation of the Turner and Kirsch analyses that the FDA only provided the HAM-D total mean score, not the individual item scores. Furthermore, in some trials the HAM-D total score is HAM-D17 while in other trials it is HAM-D21 (98, 100, 101).

Analyses of clinical significance through effect-size studies using health-related quality of life scales have demonstrated that a clinically significant change corresponds to half a standard deviation, i.e., an effect size of 0.50 (101). However, the effect size of 0.50 obviously has in itself a 95% confidence interval, which ranges from 0.36 to 0.63. Clinical significance when evaluating antidepressive effect lies precisely in this range between 0.36 and 0.63, as can be seen in Figure 5.3.

The pharmacopsychometric triangle

Within modern psychometrics the creation of a pharmacopsychometric

triangle has now become possible, based particularly on two specific

elements: the concept of transferability and effect-size statistics.

Cattell demonstrated how he had attempted in vain to use factor analysis to test for transferability (107). Cattell understood transferability as an expression of whether a rating scale measures the same phenomenon or the same dimension in different groups of patients (e.g., men versus women, younger age groups versus older age groups, primary depression versus secondary depression) or in the same group of patients when the rating scale is used for weekly assessments during a course of antidepressive therapy.

This is precisely what the item response theory models ensure; a test of

transferability. A landmark study in this area is the Rasch analysis that was

performed in connection with an antidepressive medication study using

weekly HAM-D assessments ( 108 ).


The use of effect-size statistics in the pharmacopsychometric triangle is

important in that it is independent of the rating scale used, as this

dimensionless statistic only uses the mean and standard deviation.

As can be seen in Figure  5.4 , the upper left corner of the triangle (A) is

the desired clinical effect, with special emphasis on dose–response

relationship. This dose–response relationship highlights what Rasch

expresses as follows:

If we want to know something about a quantity, then we have to

observe something that depends on that quantity, something that

changes if the quantity varies materially. In that case we have a

sufficient statistic ( 62, 63 ).

Studies                 Drug and dose            HAM-D17   HAM-D6
Bech et al. (98, 100)   Fluoxetine 20–60 mg      0.30      0.38
Entsuah et al. (102)    Fluoxetine 20–60 mg      0.24      0.40
Bech et al. (103)       Citalopram 20 mg         0.09      0.21
                        Citalopram 40 mg         0.39      0.51
Bech et al. (104)       Escitalopram 10 mg       –         0.38
                        Escitalopram 20 mg       –         0.61
Bech (105)              Mirtazapine 15–60 mg     0.49      0.42
Bech et al. (106)       Duloxetine 60 mg         0.46      0.51
                        Duloxetine 120 mg        0.49      0.57

Figure 5.3 Effect size results in placebo-controlled antidepressive trials using HAM-D17 and HAM-D6
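As a small illustration of how the fixed-dose rows of Figure 5.3 can be read as a dose–response relationship, the sketch below transcribes those rows and checks whether the effect size rises with dose. The data structure and the function name are hypothetical; the numbers are those of the figure, with None marking values that the figure does not report.

```python
# Fixed-dose rows of Figure 5.3 as (dose in mg, HAM-D17 effect size, HAM-D6 effect size).
# None marks a value that is not reported in the figure.
FIXED_DOSE_TRIALS = {
    "citalopram": [(20, 0.09, 0.21), (40, 0.39, 0.51)],
    "escitalopram": [(10, None, 0.38), (20, None, 0.61)],
    "duloxetine": [(60, 0.46, 0.51), (120, 0.49, 0.57)],
}
HAMD17, HAMD6 = 1, 2  # column positions within each row


def rises_with_dose(rows, column):
    """True if the reported effect sizes strictly increase from lower to higher dose."""
    ordered = sorted(rows, key=lambda row: row[0])
    values = [row[column] for row in ordered if row[column] is not None]
    return len(values) >= 2 and all(low < high for low, high in zip(values, values[1:]))


for drug, rows in FIXED_DOSE_TRIALS.items():
    print(drug, rises_with_dose(rows, HAMD17), rises_with_dose(rows, HAMD6))
```

A check of this kind says nothing about statistical uncertainty; it only makes explicit the ordering of the effect sizes that the figure displays.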


The upper right corner (B) illustrates the undesired clinical effect, i.e., the

different side effects. The side effect scale used in the example is the UKU

Scale ( 4, 109 ).

Lastly, C illustrates patient-reported, clinically-related quality of life,

which can be said to be a balance between the desired versus the undesired

effects of the drug under examination.

When discussing the different classes of psychotropic drugs, we can refer to the ICD-10 hierarchy, or ladder (Figure 5.5), which ranks the various psychiatric disorders so that at the ‘bottom’ we have personality disturbances (for which there is, of course, no available pharmacological therapy), then on the next step anxiety (as implied in Figure 5.5, these areas of psychiatry are Freudian, while the steps further up are Kraepelinian). According to ICD-10, a patient suffering from both depression and anxiety should be diagnosed as depressive, and a patient suffering from both mania and schizophrenia should be diagnosed as schizophrenic, and so on.

Looking at the six steps of the ICD-10 diagnosis ladder (Figure 5.5), we find dementia on the top step (1). Personality disturbances are placed on the lowest step (6). As far as these are concerned, the use of drugs to treat such deviant character traits as psychopathy has always been a very problematic issue.

A major drawback of the hierarchical structure of ICD-10 is the lack of ability to distinguish between a primary depressive condition and a work-related stress condition (distress), as the latter is diagnosed as a depressive condition if its severity is consistent with a moderate (major) depression. It was precisely the ability of an experienced psychiatrist to distinguish between these two conditions that formed the basis for the introduction of antidepressive drugs (imipramine) (1). In epidemiological studies that use ICD-10 diagnoses, the prevalence of work-related stress lies below 1%, because many people develop moderate depression (110); that this is secondary to work-related stress can no longer be read from the diagnosis.

[Figure 5.4: triangle with corners A, B and C. A: Measurement of wanted clinical effect (e.g. HAM-D6, see Appendix 3f). B: Measurement of unwanted clinical effect (e.g. PRISE, see Appendix 10). C: Resulting patient-related quality of life (e.g. WHO-5, see Appendix 8a).]

Figure 5.4 The pharmacopsychometric triangle. (Modified from Bech P. Applied psychometrics in clinical psychiatry: Acta Psychiatr Scand 2009; 120: 400–409, Figure 1.)

Antidementia medication

We have chosen data on the antidementia drug donepezil to illustrate how all three areas of the pharmacopsychometric triangle (A, B and C) provide an integrated picture (Figure 5.6). The data are from one of the most well-designed antidementia studies among those assessed by the US Food and Drug Administration (FDA) when authorising this product (111). The patients included in the study fulfilled the DSM-III criteria for Alzheimer’s Disease. In a double-blind 15-week study, donepezil was administered in two fixed doses of 5 mg and 10 mg, and both doses were compared to placebo. The 11-item Alzheimer’s Disease Assessment Scale (ADAS) and the Mini Mental State Examination (MMSE) were used as rating scales.

The MMSE effect size is negative, since a higher score on this scale indicates improved cognitive functioning, while the ADAS effect size is positive, since a higher score on this scale indicates more symptoms of cognitive dysfunction. On the MMSE, an effect size higher than 0.40 was only achieved on 10 mg donepezil.

On the QoL scale, a higher score signifies better quality of life, but here

5 mg of donepezil is quite without effect, while 10 mg gives a statistically

higher effect but no clinically relevant effect as the effect size is merely −0.25

(negative as a higher score signifies an improved quality of life).

The concordance between the ICD-10 hierarchy (ladder) and the pharmacological classes of psychotherapeutic drugs

Step   Diagnosis               Treatment                     Tradition
[1]    Dementia                Anti-dementia medication      Kraepelin
[2]    Schizophrenia           Anti-psychotic medication     Kraepelin
[3]    Mania                   Anti-manic medications        Kraepelin
[4]    Depression              Antidepressants               Kraepelin
[5]    Anxiety                 Anti-anxiety medications      Freud
[6]    Personality disorders   Psychotherapy                 Freud

Figure 5.5 Diagram of the six diagnostic hierarchy steps of the ICD-10 in which stress-related anxiety and personality disorders lie within the area covered by Freudian psychiatry


Figure  5.6 also shows that relatively few patients are unable to complete the

study because of side effects, especially nausea, which is one of the main

donepezil side effects. The 10% drop out in this controlled study is in line

with a recently published Danish study ( 111, 112 ).

Dementia therapy addresses the behavioural changes brought on by the

condition; here the weight of the burden resting on the relatives is of major

importance for the course of the disease. Thus, their quality of life is often

what is assessed, since a useful patient-related measure is difficult to find. If

the results of the patient’s own quality of life assessment are counter-intuitive,

then their relative’s assessment is used instead.

The behavioural scale commonly used is the Neuropsychiatric Inventory

(NPI), in which each symptom is assessed based on both the patient and on

the burden of illness as experienced by their relative ( 111, 115 ).

Antipsychotic medication

When evaluating the effect of antipsychotic medication, a comparison of

effects against placebo is important. However, treating psychotic patients

(i.e., especially schizophrenic patients) with inactive (placebo) medication

poses major ethical issues, as highly effective antipsychotic drugs are available.

There are two major categories of antipsychotic medication, the typical and the atypical antipsychotic drugs. Chlorpromazine was the first drug to demonstrate an antipsychotic effect that was quite different from that of the medicines available prior to this time, such as phenemal (phenobarbital). The most potent typical antipsychotic drug is haloperidol. This was the most used antipsychotic in the treatment of acute psychosis worldwide until the arrival of the atypical antipsychotics.

A  Wanted effect (effect size)
   Dose     ADAS    MMSE
   5 mg     0.47    −0.31
   10 mg    0.58    −0.41

B  Unwanted effect (% non-completers due to side effects)
   Placebo  5 mg    10 mg
   2%       4%      10%

C  QoL (effect size)
   Dose     QoL
   5 mg     −0.05
   10 mg    −0.25

Figure 5.6 The pharmacopsychometric triangle for Donepezil (111)


Haloperidol was thus the most frequently employed comparative medication at the end of the twentieth century when investigating the antipsychotic effect of the new, atypical drugs. The most well-designed trial was performed in the US (113).

This US trial is often designated a ‘landmark’-study, as fixed doses of both

haloperidol (4, 8 and 16 mg) and the new atypical antipsychotic sertindole

(12, 20 and 24 mg) were used. A re-analysis of this trial in accordance with

the pharmacopsychometric triangle has recently been made ( 115 ).

From a scientific point of view, it is very important to include a placebo

group; in Europe, however, this would be perceived as ethically debatable.

Figure  5.7 shows the pharmacopsychometric triangle for the assessment of

antipsychotic actions when comparing the classical drug haloperidol with

the modern atypical drug sertindole; both of them compared to placebo in

the US-based trial ( 115, 118 ).

The antipsychotic effect (A) is measured on PANSS11, which consists of the 11 BPRS symptoms (items 3, 4, 7, 8, 10, 11, 12, 14, 15, 16 and 17 (see Appendix 7)) shown in Figure 1.10. BPRS and PANSS differ in the latter’s more precise anchors for the items, i.e., the definition of each scale step. Moreover, the total score of these 11 BPRS items fulfils the item response theory model (91, 92, 115, 118).

The Simpson–Angus scale is used to measure the side-effects profile (B). This is shown in Figure 5.8 and consists of ten symptoms, all measuring the extrapyramidal symptoms (EPS) corresponding to those seen in Parkinson’s Disease. These extrapyramidal symptoms make the use of the typical classical antipsychotics problematic, and in the development of the modern atypical drugs a major goal has been to avoid such extrapyramidal side effects.

Depression rating scales have often been used to assess quality of life in

schizophrenic patients; these may, however, provide counter-intuitive results

in schizophrenics, as they also do in dementia ( 116 ). In Figure  5.7 the

depression items correspond to the six BPRS items in Appendix 7.

In Figure 5.9, data on Mokken’s coefficient of homogeneity are illustrated; this coefficient is a precise indication of whether the total score is a sufficient measure. On the PANSS11 (A) and the Simpson–Angus Scale (B), the coefficient of homogeneity is above 0.40, which means that the total score is a sufficient measure in these scales.

On the PANSS6 depression scale (C), the coefficient of homogeneity is just below 0.40, and this indicates that use of the total score is only just permissible.
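To give an impression of what a coefficient of homogeneity expresses, the sketch below computes Loevinger’s H, the scalability coefficient used in Mokken analysis, for the simplest case of dichotomous (0/1) items. This is a simplification and an assumption on our part: the scales discussed here have graded item scores, for which a generalised version of the coefficient and dedicated software would normally be used. The function name and the two small data sets are hypothetical.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale coefficient H for 0/1 items; rows are persons, columns are items.

    H = (sum of observed item-pair covariances) /
        (sum of the largest covariances possible given the item marginals).
    """
    n_persons = len(data)
    n_items = len(data[0])
    # Proportion of persons scoring 1 on each item
    p = [sum(row[j] for row in data) / n_persons for j in range(n_items)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(n_items), 2):
        p_both = sum(row[i] * row[j] for row in data) / n_persons
        cov_sum += p_both - p[i] * p[j]
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    if cov_max_sum == 0:
        raise ValueError("H is undefined when no item pair can covary")
    return cov_sum / cov_max_sum


# Perfectly ordered (Guttman) response patterns: H equals 1.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(loevinger_h(guttman))  # 1.0

# One extra person who breaks the ordering lowers the coefficient.
with_error = guttman + [[0, 1, 0]]
print(round(loevinger_h(with_error), 2))  # 0.5
```

The closer the response patterns come to a perfect ordering of the items along the ruler, the closer H comes to 1; the cut-off of 0.40 is the one used in the text above.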

Figure 5.7 shows that all the haloperidol doses (4, 8 and 16 mg) are effective with an effect size greater than 0.40 as regards antipsychotic effect (A). As regards sertindole, the lowest dose (12 mg) is only just effective, while 20 mg is the optimal dose.

A  Wanted effect: psychotic subscale PANSS11 (effect size)
   Sertindole    12 mg  0.39    20 mg  0.64    24 mg  0.45
   Haloperidol   4 mg   0.50    8 mg   0.73    16 mg  0.55

B  Unwanted effect: Simpson–Angus Scale (SAS) (effect size)
   Sertindole    12 mg  0.02    20 mg  −0.05   24 mg  −0.33
   Haloperidol   4 mg   −0.32   8 mg   −0.32   16 mg  −0.48

C  Generic QoL scale: depression subscale PANSS6 (effect size)
   Sertindole    12 mg  0.12    20 mg  0.44    24 mg  0.32
   Haloperidol   4 mg   0.35    8 mg   0.37    16 mg  0.11

Figure 5.7 The pharmacopsychometric triangle for antipsychotic medication. (Modified from Bech et al, Dose-response relationship of sertindole and haloperidol using the pharmacopsychometric triangle. Acta Psychiatr Scand 2011; 123: 154–161, Figure 1.)

The side-effect measures on the Simpson–Angus Scale (B) (see Figure 5.9) show an effect size beyond −0.30 (i.e., more negative) for all haloperidol doses. The effect size is negative due to the fact that the side effects emerge during treatment. For sertindole, the optimal dose for antipsychotic effect (20 mg) is entirely without side effects, since an effect size within +/−0.20 has no clinical significance. However, the side effects are considerable at a dose of 24 mg sertindole.

As regards depression and quality of life, 20 mg sertindole (the optimal

antipsychotic dose) also has an antidepressive effect with an effect size greater

than 0.40. None of the haloperidol doses reaches an effect size of 0.40, and

with the highest dose, the effect size is only 0.11.

By use of the pharmacopsychometric triangle one can thus determine whether the scales are valid (total score a sufficient measure) as well as get an overview of effect size statistics.

It is thought-provoking that even such a relatively low dose as 4 mg of haloperidol causes considerable Parkinsonian symptoms, and that the highest dose of 16 mg causes very severe side effects without any signs of remission of depressive symptoms and consequently no increase in quality of life. When using haloperidol as an alternative to the mood stabilising effect of lithium in bipolar disorder, we operated with a very small dose between 0.5 and 2 mg (117).

No.  Item                 Score

1    Gait                 0–4
2    Arm dropping         0–4
3    Shoulder shaking     0–4
4    Elbow rigidity       0–4
5    Wrist rigidity       0–4
6    Leg pendulousness    0–4
7    Head dropping        0–4
8    Glabella tap         0–4
9    Tremor               0–4
10   Salivation           0–4

     Total score          0–40

Figure 5.8 Scoring sheet of the Simpson–Angus side effect Scale. (Adapted from Simpson GM, Angus JWS. A rating scale for extrapyramidal side effects. Acta Psychiatr Scand 1970; 46 (suppl. 212): 11–19)

Figure 5.9 Psychometric validation of the scales in Figure 5.7 (Mokken analysis). (115)

Panels: A, wanted effect (psychotic subscale PANSS11); B, unwanted effect (Simpson–Angus Scale); C, generic QoL scale (depression subscale PANSS6). Each panel lists the coefficient of homogeneity at treatment weeks 4, 6 and 8. Coefficients reported in the three panels: 0.44, 0.46, 0.44; 0.48, 0.42, 0.45; 0.38, 0.39, 0.38.

Antimanic medication

The first placebo-controlled trial in modern psychopharmacology took place

in the Danish city of Århus (Risskov), where Professor Erik Strömgren

initiated a study with manic patients, using lithium as therapy. In 1949, the

use of lithium was re-introduced in Australia, where John Cade (1912–1980)

demonstrated that lithium seemed to possess an antimanic effect in bipolar

patients, while it did not have an antipsychotic effect in schizophrenia ( 118 ).

The 1950s saw a commencement of clinical trials using placebo control.

Mogens Schou headed the placebo-controlled study in Århus, where he was

able to demonstrate a significantly higher effect of lithium than of placebo in

the treatment of mania. In 1988, a ‘landmark’ study of lithium versus antip-

sychotic medication took place at Northwick Park Hospital in London ( 119 ).

In a randomised controlled, double-blind trial, patients (120 in all) admitted

with psychosis (i.e., schizophrenia, schizo-affective psychosis, mania) were

either treated with lithium, pimozide (a drug similar to haloperidol), a com-

bination of these two active drugs, or with placebo. The trial had a duration

of three weeks and the results showed that the present state symptom profile

and not the DSM-III diagnosis was the valid factor. Regardless of diagnosis,

pimozide had a specific effect on the psychotic symptoms (hallucinations

and delusions), while lithium had a specific effect on the manic symptoms

assessed by the Bech–Rafaelsen Mania Scale (see Appendix 6). In his award-winning

book ‘Madness Explained’, the psychologist R.P. Bentall wonders

why there has not been more of this type of study, in which all patients hospitalised

during a specific period of time are treated according to standardised

principles. He calls this investigation a landmark study (120).

Mogens Schou demonstrated the high prophylactic effect of lithium on

both mania and depression in bipolar patients. There are no placebo-

controlled trials with haloperidol in mania, as the use of placebo in such

severe cases is considered to be unethical. Therefore, the sertindole study

(Figure  5.7 ) is very important.

Around 1980, it became possible to measure haloperidol plasma

concentrations and the psychiatric department of the Danish Rigshospitalet

performed a study to investigate a potential connection between haloperidol

plasma concentration and clinical effect (121). This study showed that

severely manic patients (as measured on the Bech–Rafaelsen Mania Scale;

see Appendix 6) could respond after 6 days of treatment with a fixed dose

of 10 mg haloperidol. The patients with the highest plasma concentration

showed the best response. As patients differ in their metabolism of haloperidol

and as there are no active metabolites, the trial resulted in a recommendation

to use blood sampling in haloperidol therapy.


With the emergence of atypical antipsychotics, the drug olanzapine

proved to have the most reliable antimanic effect. As women are slower

metabolisers of olanzapine than men, we performed a study on manic

women at the University Hospital of Geneva in Switzerland. These severely

manic patients responded after 14 days on an olanzapine dose of 20 mg,

and we could yet again show that the patients with the highest plasma

concentration had the most pronounced effect as assessed by the Bech–

Rafaelsen Mania Scale (MAS) (122). In this trial, the MAS was compared

with the American Young Mania Rating Scale (YMRS) and proved to be far more

valid, both in item response theory analysis and in plasma concentration–effect

relationships.

Antidepressive medication

The ‘second generation’ antidepressants provided us with a line of products

developed on the basis of a hypothesis regarding their biological mode of

action ( 123 ). They all had different chemical formulations, in contrast to

the ‘first generation’ antidepressants that had their tricyclic chemical

structure in common; for this reason these antidepressants are often called

‘tricyclics’. The new generation had a selective inhibiting effect on

serotonin reuptake (selective serotonin reuptake inhibitors, or SSRIs). The

tricyclic antidepressants also possess this effect, together with many other

modes of action, such as their antihistamine effect, which is quite potent.

Their sedative effect makes car driving problematic. The antihistamine

effect also causes an increased appetite, so that weight gain should be

monitored.

The SSRIs do not have these ‘side effects’, but their serotonin reuptake

inhibition can give other side effects, such as nausea and vomiting,

hyperhidrosis, headache, sleep disturbances, agitation and sexual dysfunc-

tion; these side effects are caused by their serotonin 2A receptor stimulating

effect while the SSRIs’ antidepressive effect is due to serotonin 1A receptor

stimulation.

As many of these side effects are listed as depressive symptoms in the

Hamilton Depression Scale or the MADRS (see Appendix 3), but not in

the HAM-D6, it is vital to use the HAM-D6 in dose–response relationship

studies.

Figure  5.10 shows the pharmacopsychometric triangle for the second-

generation drug escitalopram where HAM-D 6 was used as measure of

antidepressive effect (A) and the quality of life scale Q-LES-Q (C) was used

to measure patient-related quality of life (104, 106). The percentage of

patients leaving the study before completion of the planned eight weeks of

treatment was used as an overall measure of side effects.

Figure 5.10 The pharmacopsychometric triangle for escitalopram and citalopram in depression (104)

A. Wanted effect (HAM-D6)

Dose                   Effect size
10 mg escitalopram     0.31
20 mg escitalopram     0.70
40 mg citalopram       0.46

B. Non-completers due to side effects

Dose                   Non-completers
Placebo                7.4%
10 mg escitalopram     6.7%
20 mg escitalopram     10.4%
40 mg citalopram       9.6%

C. Overall quality of life (Q-LES-Q)

Dose                   Effect size
10 mg escitalopram     –0.14
20 mg escitalopram     –0.48
40 mg citalopram       –0.43

Mokken analysis showed that both the HAM-D 6 and the Q-LES-Q were

unidimensional (coefficients of homogeneity of 0.40 or higher).
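To make the coefficient of homogeneity concrete, the following Python sketch computes Loevinger's H, the statistic on which Mokken analysis rests, for the simplest case of dichotomous (0/1) items. It is a simplified illustration rather than the procedure used in the cited studies: the rating scales discussed here have graded items and would normally be analysed with dedicated software, and the function name `loevinger_h` is hypothetical.

```python
from itertools import combinations


def loevinger_h(responses: list[list[int]]) -> float:
    """Scale-level Loevinger H for dichotomous items (rows = persons).

    Assumes every item has some variance; otherwise the denominator is zero.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    # Proportion of persons scoring 1 on each item.
    p = [sum(row[i] for row in responses) / n_persons for i in range(n_items)]

    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(n_items), 2):
        # Proportion of persons scoring 1 on both items of the pair.
        p_both = sum(row[i] * row[j] for row in responses) / n_persons
        observed += p_both - p[i] * p[j]          # observed covariance
        maximum += min(p[i], p[j]) - p[i] * p[j]  # largest covariance the marginals allow
    return observed / maximum


# A perfect Guttman pattern: each person passes all easier items first.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(loevinger_h(guttman))  # 1.0
```

H equals 1 when the items form a perfect cumulative hierarchy and falls towards 0 as the response patterns become unordered, which is why a lower bound such as 0.40 can serve as a criterion for accepting the total score.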

The study shown in Figure  5.10 is a ‘landmark’ study in the sense that it

included a Quality of Life scale and in that escitalopram was not only com-

pared with placebo but also with 40 mg of citalopram, which a previous

dose–response analysis using the HAM-D6 had shown to be the optimal dose

in patients with a baseline HAM-D17 of 20 or higher.

The study data shown in Figure  5.10 included only patients with a DSM-IV

major depression who scored 30 or higher at baseline on the MADRS,

indicating a rather marked degree of depression. As can be seen, 10 mg

escitalopram was an inadequate dose in these patients as evident both on the

HAM-D6 and on the Q-LES-Q. Both 40 mg of citalopram and in particular

20 mg of escitalopram, however, achieved an effect size greater than 0.40

( 104, 106 ).

Figure  5.11 shows the pharmacopsychometric triangle for desvenlafaxine,

which is the active metabolite of venlafaxine. While escitalopram, like other

SSRI drugs, only has a serotonin specific action, both venlafaxine and

desvenlafaxine have a reuptake action on noradrenaline as well as serotonin.

For this reason, these drugs have the acronym SNRI (serotonin and

noradrenaline reuptake inhibitors). The element that makes the trial shown

in Figure 5.11 a landmark study is that the WHO-5 quality-of-life scale was

used in the placebo-controlled trials leading to an FDA approval of desvenlafaxine

with 50 mg as the lowest effective dose (124).

However, Figure 5.11 shows that effect size only reaches 0.40 on the

HAM-D6 for this dose. On the WHO-5, the effect size is negative since a

higher score signifies increased well-being. For the 100 mg desvenlafaxine

dose, the effect sizes on the HAM-D17, the HAM-D6 and the WHO-5 are all

numerically above the 0.40 limit for clinical significance. As regards side effects,

of which hyperhidrosis is the most significant, there is no difference between

50 mg and 100 mg desvenlafaxine.

Three decades ago it was concluded that even for the rather potent first-

generation antidepressants (i.e., imipramine) we are not able to demonstrate

their actions from an aetiological point of view ( 64 ):

The influence of the disorder on the total variance in response to

treatment obviously depends on the specificity of the therapeutic

effect. Drugs acting on an aetiological factor parallel to vitamin B 12

in pernicious anaemia are more specific than are drugs acting on an

intermediary factor like digoxin in heart failure. However, drugs need


not act on an aetiological factor to be of nosological importance.

From our studies we cannot evaluate whether imipramine acts on an

aetiological rather than on an intermediary factor in endogenous

depression. What we have found is that in these patients with

endogenous depression (defined by the diagnostic Newcastle Scale)

a correlation emerged between plasma levels and goal outcome.

By use of the HAM-D 6 it was moreover possible to obtain a

population-independent response-curve, i.e., a curve indicating the

treatment effect in relation to treatment time. Such a curve might

indicate that if an outcome is imipramine-dependent, the patient’s

response has to follow the response-pattern for imipramine. ( 108 ).

Antianxiety medication

In the 1960s, the benzodiazepines, in particular diazepam, became available

to treat the different anxiety disorders, especially generalised anxiety.

As anxiety disorders, especially generalised anxiety, are chronic in their

nature, the development of dependency on benzodiazepines was seen as very

problematic; this dependency is almost of the same nature as that known for

alcohol. Cross tolerance between diazepam and alcohol was demonstrated;

in some places (including in Denmark) this knowledge was used in the

treatment of alcohol withdrawal. However, diazepam did not prove to be reliably

effective in this critical alcohol withdrawal condition, which can be

lethal when untreated. Other drugs, such as phenemal (phenobarbital), are safer than

diazepam in alcohol withdrawal syndrome.

Figure 5.11 The pharmacopsychometric triangle for desvenlafaxine in depression, using the WHO-5 (124)

A. Wanted effect (effect size)

Dose      HAM-D17    HAM-D6
50 mg     0.33       0.43
100 mg    0.41       0.50

B. Unwanted effect (hyperhidrosis)

Placebo   7%
50 mg     12%
100 mg    13%

C. Quality of life / WHO-5

Dose      Effect size
50 mg     –0.30
100 mg    –0.45

Both phenemal and diazepam have quite a significant effect on anxiety

and since the 1960s attempts have been made to find drugs that do not

generate dependency. General practitioners have often employed adrenergic

beta-receptor inhibitors such as propranolol, the archetypical ‘beta-blocker’.

It belongs to a group of drugs used in hypertension, also a chronic condition

in its milder forms. Long-term propranolol therapy in hypertension has not

caused the dependency seen with alcohol or benzodiazepines.

The differentiation between mental anxiety symptoms and physical

(somatic) anxiety symptoms that Hamilton showed to be important by his

factor analysis (Table  1.1) has proved to be of major clinical significance.

Thus, the effect of benzodiazepines and ‘beta-blockers’ (propranolol) is

predominantly on the physical anxiety symptoms. These somatic anxiety

symptoms dominate the picture in a normal stress-related anxiety reaction.

This is why benzodiazepines, alcohol and propranolol are used in these anxiety

states. Propranolol is used to calm exam nerves or for airplane pilots who

experience anxious trembling during take-off. As propranolol does not cross

the blood-brain barrier, it has no sedative effect, unlike alcohol

and benzodiazepines.

While there are no definite ‘landmark’ studies with propranolol in general-

ised anxiety, clinical experience with the drug is not convincing, due to its

specific effect on the physical anxiety symptoms. A trial drug developed in

the 1980s by the then Swiss company Ciba-Geigy (CGP 361 A) demonstrated

a central anxiolytic effect. As it had proved to have a greater anxiolytic than

antihypertensive effect, the drug was assessed in a Danish placebo-controlled

trial ( 125 ). This was quite a small pilot study with about 17 patients in each

treatment group.

The pharmacopsychometric triangle in Figure  5.12 shows that this beta-

blocker was effective in generalised anxiety on both the Hamilton Anxiety

Scale and on the six-item HAM-A 6, which measures psychic anxiety

symptoms (see Table 1.1). However the drug’s effect on the Quality of Life

scale was less pronounced, although it was well tolerated.

This study is mentioned here due to the fact that, in contrast to pro-

pranolol, this beta-blocker demonstrated an effect on the psychic anxiety

symptoms, and also because a positive well-being scale was included.

The five WHO-5 analogue symptoms are actually items from the Hospital

Anxiety and Depression Scale (HADS) (see Appendix 8b). Some of the

items in this questionnaire are aimed at symptom experience (negatively

phrased questions) and some at positive well-being (positively phrased


questions). The WHO-5 is a questionnaire for measurement of general,

positive well-being.

As both phenemal and the benzodiazepines are antiepileptics, attempts

have been made to measure the anxiolytic effects of modern antiepileptics

that do not possess the dependency producing effect of diazepam.

One of the new antiepileptics, pregabalin, has been found effective in generalised

anxiety and is authorised for use in this indication. A re-analysis of

the placebo-controlled pregabalin trials in patients with generalised anxiety

has shown that 150 mg pregabalin is an inadequate dose, with a HAM-A14

effect size of 0.31, and merely 0.20 on the valid HAM-A6 (126).

Pregabalin doses between 200 mg and 450 mg gave a HAM-A14 effect size of

0.56 and a HAM-A6 effect size of 0.49. Higher doses did not result in larger

effect sizes.

These controlled pregabalin studies in generalised anxiety included different

benzodiazepines, but not diazepam. Clonazepam and alprazolam are

thought to have the lowest dependency syndrome risk. The alprazolam effect

size was about 0.35 on the HAM-A14 and HAM-A6.

Only one trial exists in which pregabalin was compared with an antianxiety

drug, venlafaxine. For a dose of a mere 75 mg venlafaxine, the HAM-A6 effect

size was 0.40, but only 0.31 on the HAM-A14.

The Rickels et al study is the landmark study in generalised anxiety

disorder, as it is a placebo-controlled comparison of diazepam with imipramine

and trazodone, focusing, however, on the psychic anxiety

symptoms of Hamilton’s Anxiety Scale (see Figure 1.8) (127). Using the

total score of the psychic anxiety factor, Rickels et al demonstrated that,

relative to placebo, imipramine was significantly superior to diazepam

(127, 130). When using the total score of all 14 Hamilton items, however,

the superior effect of imipramine versus diazepam became less obvious as

the physical symptoms weigh too heavily in the total score when the

complete scale is used.

Figure 5.12 The pharmacopsychometric triangle for anti-anxiety (125)

A. Wanted effect (effect size): HAM-A14 0.47; HAM-A6 0.63

B. Tolerability: placebo 100%; active 86.7%

C. Quality of life, HADS (see Appendix 8b): effect size –0.26

Coefficient of homogeneity: HAM-A14 0.34; HAM-A6 0.46; WHO-5 0.68

Mood stabilising medications

Lithium is still considered to be the most effective mood stabiliser ( 121 ).

Evaluated within the framework of the pharmacopsychometric triangle, the

profile of lithium in affective disorder is as illustrated below.

In Figure 5.13, (A) covers the clinical effect of lithium. A dose–response

relationship has been observed (121). Thus for an acute antimanic effect, a

dose resulting in concentrations between 0.8 and 1.2 mmol/l is most effective.

For antidepressant augmentation in patients with therapy-resistant

depression, a concentration between 0.3 and 0.5 mmol/l is most effective. For

long-term mood stabilisation, between 0.5 and 0.8 mmol/l is most appropriate.

In this mood stabilising approach the side effects seen at high antimanic

doses, such as tremor, should be eliminated (128). Car

simulator trials have shown that in a range from 0.5 to 0.8 mmol/l, lithium

has no sedative effect on the psychological functions relevant for car driving

behaviour.

Very few reports have been published on quality of life in long-term

lithium therapy with reference to typical quality of life questionnaires such as

SF-36 or WHO-5. However, within instruments assessing quality of life,

suicidal thoughts are often used to demarcate the lowest possible level of

quality of life (‘life is not worth living’). Evidence has been accumulated

showing that lithium is the most effective antisuicidal medication in

psychopharmacology (118, 121).

Figure 5.13 The pharmacopsychometric triangle

A. Clinical effect / dose of lithium in mmol/l (118): antimanic 0.8–1.2; antidepressive 0.3–0.5; mood stabilising 0.5–0.8

B. Side effects: non-sedative profile, simulated car driving (128)

C. Quality of life: antisuicidal effect (118)

Combination of antidepressants

In placebo-controlled trials we are focusing on the response to a single

antidepressant medication to identify the effect size for this medication

against placebo. We have rather few trials studying the effect of combining

two antidepressants, which is often used in daily clinical practice, if a patient

has not responded to the first drug attempt. In this case the common

approach is to maintain the treatment with the first drug and then to add

another drug to obtain remission. The landmark study of this augmentation

approach is the STAR*D study (129). This study has

recently been re-analysed using the pharmacopsychometric triangle as

outcome, i.e., with the HAM-D6 as criterion for a pure antidepressive

effect (130). By use of this valid subscale for antidepressive effect we could

demonstrate that augmentation with bupropion in patients not responding to

citalopram was superior (P = 0.03) to augmentation with buspirone (130).

6 The clinical consequence of IRT analyses: Health-related quality of life

Among the many foreigners to visit Wundt’s laboratory in Leipzig in

the 1880s was the medical candidate and psychologist William James

(1842–1910), who was present at one of Wundt’s lectures in November 1882

and was also shown the laboratory.

James must have sat in the auditorium and listened to Wundt’s lecture

with, among others, Kraepelin, who was the only physician amongst Wundt’s

students. On that November day, Kraepelin was very preoccupied with his

experiments in the psychological laboratory. We have no certain knowledge

of a possible encounter between the two physicians at Wundt’s laboratory, but

James spoke German and they probably exchanged a few words.

In his capacity as a physician, James had set up a physiological laboratory

at Harvard University in 1875, but not until the beginning of 1884 did it

become a psychological laboratory modelled upon Wundt’s. In 1889 James

was called to a professorship in psychology, having already been appointed

professor of philosophy at Harvard in 1885 (131). 1890 saw the publication of

his main work ‘The Principles of Psychology’, still thought to be the most

significant publication within scientific psychology. However, James

remained more of a philosopher than a psychologist and became more and

more absorbed with what we now call health-related quality of life (132).

In 1897, James published a collection of essays entitled ‘The Will to Believe’

(133). Among these essays was ‘Is Life Worth Living?’, now regarded as the

‘landmark’ publication in health-related quality of life. James took as his starting

point the fact that human well-being is a subjective, emotional perception

and should thus be measured psychometrically, not biologically. He did not

himself attempt to develop questionnaires measuring quality of life.

In his ‘Talks to Teachers on Psychology: and to Students on Some of Life’s

Ideals’, James refers to the following statement by Wilhelm Wundt at the turn

of the 20th century:

And if I [Wundt] were asked what the work of experimental

observation in psychology has consisted of, and still consists of for

me, I should say that it has given me an entirely new idea of the

nature and connection of our inner processes … the close union of all

those psychic functions normally separated by artificial observations

and names, such as ideation, feeling, will; and I saw the inner

homogeneity, in all its phases of mental life

Quality of life might be the interconnection of feelings, will and well-

being ( 134 ).

It was the first proper philosopher of the welfare state, Jeremy Bentham

(1748–1832), who was also the first to attempt to measure hedonia, i.e.,

subjective well-being. He felt that each citizen should be able to achieve his or

her optimal mental well-being within the economic and cultural boundaries

of society ( 135 ). He defined subjective well-being as the difference between

the sum of all kinds of pleasure and the sum of all kinds of pain experienced

by the individual in a given period of time, e.g., during a course of treatment

lasting for some weeks or a few months.

However, the scientific ‘landmark’ study was not performed until the end

of the twentieth century. It used the MOS SF-36 questionnaire (Medical

Outcomes Study, Short Form) (Figure 6.1), in which ‘36’ refers to the

36 questions in the questionnaire (4). As seen in Figure 6.1, the 36 SF-36

items constitute eight subscales. The first subscale at the top of Figure 6.1 is

Physical Functioning (PF) and contains the ten items listed in the questionnaire’s

Subheading 3. They deal with difficulty in coping with such physical

activities as lifting or carrying groceries, climbing stairs, taking walks. These

ten questions intuitively fulfil the item response theory model, as persons

unable to bathe or clothe themselves (3j) are also unable to walk a distance of

100 metres (3i), etc.

The Role Physical scale (RP) in Figure  6.1 measures the impact of physical

health on daily activities. Bodily Pain (BP) (items 7 and 8) measures physical

pain. General Health (GH) measures physical health with items 1 and 11.

As seen in Figure 6.1, these four subscales cover physical health. The four

subscales at the bottom of Figure 6.1 deal with mental health: Vitality

(VT), Social Functioning (SF), Role Emotional (RE) and Mental Health (MH).

The items in the four subscales dealing with mental health are both

positively and negatively phrased. As shown in Figure  6.1 , both Item 9e

(being full of energy) and 9d (being calm and relaxed) are positively phrased

and measure positive mental well-being. In contrast, Item 10 (difficulty

visiting relatives and friends), 5c (less careful doing daily activities) and Item 9f

(feeling sad) are negatively phrased and should be seen to measure actual


symptoms of poor mental health. The reason for the use of both positively

and negatively phrased questions is an old psychometric issue. Its purpose

was to ensure that the subject had actually read the questions and was not just

mechanically filling in replies in the same way regardless of whether they

were negative or positive.

Figure 6.1 Scoring sheet for the SF-12 items from the SF-36. The two factors, physical versus mental health, are also indicated

Factor: Physical Health (PCS)

Scale                       SF-36 items         SF-12 items
Physical Functioning (PF)   3a–3j               3b (only moderate physical activities), 3d (able to walk up several flights of stairs)
Role Physical (RP)          4a–4d               4b (accomplished less work than before), 4c (limited in the kind of work)
Bodily Pain (BP)            7, 8                8 (difficulty working due to pain)
General Health (GH)         1, 11a–11d          1 (health all in all)

Factor: Mental Health (MCS)

Scale                       SF-36 items         SF-12 items
Vitality (VT)               9a, 9e, 9g, 9i      9e (full of energy)
Social Functioning (SF)     6, 10               10 (difficulty visiting relatives and friends)
Role Emotional (RE)         5a, 5b, 5c          5b (accomplished less work than before), 5c (didn’t do work as carefully as usual)
Mental Health (MH)          9b, 9c, 9d, 9f, 9h  9d (calm and peaceful), 9f (downhearted and blue)

Figure 6.1 also shows the SF-12. The 36 questions in SF-36 are often seen

as quite a large number, although SF stands for Short Form! In recent years,

the use of SF-12 has become popular. This scale measures both bodily or

physical quality of life, mental quality of life and social quality of life. These

subscales are converted to a 0–100 value scale where ‘0’ signifies worst

imaginable quality of life and ‘100’ best imaginable quality of life ( 4 ).
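The conversion of a subscale to the 0–100 scale can be illustrated with a short Python sketch. The text states only that the subscales are converted; the linear rescaling used below is an assumption made for illustration, as are the function name `to_percent_scale` and the example raw-score range of 10 to 30.

```python
def to_percent_scale(raw: float, lowest: float, highest: float) -> float:
    """Rescale a raw subscale score linearly so that lowest -> 0 and highest -> 100."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score lies outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100


# Hypothetical subscale whose raw score can range from 10 to 30.
print(to_percent_scale(20, lowest=10, highest=30))  # 50.0
```

Putting every subscale on the same 0–100 range is what allows profiles such as the one in Figure 6.2 to be drawn on a single axis.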

SF-36 population studies have been carried out in several countries with

Denmark playing a leading role ( 136 ). Figure  6.2 shows the original US

population study. The results of the Danish population studies are quite

similar to these.

The American study is an interesting landmark study, in that it

demonstrates how depressive patients differ from a normal population.

As can be seen (Figure  6.2 ), the depressive patients score less on all

subscales, and on the four Mental Health Functioning subscales (MCS,

Figure  6.1 ) the difference equals one standard deviation. The problem in

this respect is that the degree of clinical depression is poorly defined in

this study. Thus as regards mental quality of life, ‘0’ indicates that life is not

worth living while ‘100’ signifies maximum positive well-being. This mental

quality of life measure in the SF-36 is based on a precursor of the scale, the

Psychological General Well-Being (PGWB) scale, which was actually the

scale used in all the scientific trials performed in the 1980s to assess efficacy

of medication in chronic diseases such as hypertension (4).

Figure 6.2 Results of an American general population study (modified) comparing persons with and without depression. (Ware JE, Gandek B and the IQoLA project group. Int J Ment Health 1994; 23: 49–73)

Panel title: An American general population study and a group of depressed patients from the primary care setting. Vertical axis: score from 0 (worst imaginable) to 100 (best imaginable). Horizontal axis: the eight SF-36 subscales. Series: normal population; depressive patients.

PF = Physical Functioning; RP = Role Physical; BP = Bodily Pain; GH = General Health; VT = Vitality; SF = Social Functioning; RE = Role Emotional; MH = Mental Health

When the World Health Organization (WHO) was established in 1948,

health was defined as not only the absence of symptoms of illness, but also

as physical, mental and social well-being. This is why SF-36 is termed a

health-related quality of life scale. Among its components, positive psycho-

logical well-being is probably the most general measure, as opposed to

physical and social quality of life.

However, in the SF-36, the mental quality of life questions are both

negatively phrased (as when measuring depressive symptoms, e.g., feeling

blue) and positively phrased (as is the case when positive well-being is being

measured).

The use of both types of phrasing was included in many questionnaires in

the early days of psychometrics, partly to ensure that the person being

interviewed actually read the questions thoroughly and did not just mechan-

ically tick a certain response option no matter its content, and partly to avoid

what is called ‘social disability’, a situation that may arise if only negatively

phrased questions are asked and the person being interviewed makes him

or herself appear more ill than is really the case.

The WHO-5 Questionnaire

In an extensive analysis of Murray’s basic human needs and their hierarchic

arrangements, Rasmussen concluded that the hedonic need might be

considered as a global index of measurement ( 137, 138 ). The WHO-5 can be

considered as such a general psychological well-being scale measuring a

global hedonic dimension and is actually derived from the Psychological

General Well-Being scale ( 139, 140 ). The WHO-5 is a questionnaire that

measures current (the previous two weeks) mental well-being. As such, the

WHO-5 is probably the most robust questionnaire from a psychometric

point of view ( 141 ). Attempts at measuring eudemonia, which is not the

actual perception of well-being, but rather some meaningful causal element

lying behind hedonia, are still inconclusive. When measuring positive quality

of life, it is important to avoid symptom-related language and to use only

positively phrased questions. Based on previous experience with the PGWB

and the SF-36, the WHO-5 was developed as a measure of general positive

quality of life.

The quantification of the individual items in terms of their presence

during the past two weeks proved to be highly sensitive as an indicator of


positive well-being. Subsequently, a five-item questionnaire was shown to be

sufficient to cover the dimension from 0 to 100, where a higher score means

a higher level of well-being. As each item is scored from 0 to 5 (see Figure 6.3),

the theoretical raw score ranges from 0 to 25. By multiplying the raw score by 4,

a theoretical score span from 0 = worst imaginable quality of life to 100 = best

imaginable quality of life is achieved.
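The scoring rule just described can be written out as a short Python sketch. The arithmetic (five items scored 0 to 5, raw sum multiplied by 4) is taken from the text; the function name `who5_score` and the input checks are assumptions added for illustration.

```python
def who5_score(item_scores: list[int]) -> int:
    """Return the WHO-5 score on the 0-100 scale from five item scores (each 0-5)."""
    if len(item_scores) != 5:
        raise ValueError("the WHO-5 has exactly five items")
    if any(score not in range(6) for score in item_scores):
        raise ValueError("each item is scored from 0 to 5")
    raw_score = sum(item_scores)  # theoretical range 0-25
    return raw_score * 4          # theoretical range 0-100


# Hypothetical respondent answering 3, 4, 3, 4, 3 on the five items.
print(who5_score([3, 4, 3, 4, 3]))  # 68
```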

A Danish population study showed a WHO-5 mean score of about 70

( 142, 143 ). Table  6.1 shows a general practitioner study in which WHO-5 is

close to 70 in patients without symptoms of mental illness. It also shows that

in depressive patients, the WHO-5 mean score is about 30 and that in the

various anxiety disorders, the WHO-5 is linearly increasing.

Speer has shown, using the PGWB, that after 6 weeks of treatment depres-

sive patients may achieve a statistically significant increase in well-being

which, however, is still significantly lower than that of the average population

( 144 ). The national norm is not reached until after 12 weeks of therapy.

Figure 6.4 illustrates equivalent results with the WHO-5. Here, depressive patients score approximately 30 on the WHO-5 prior to treatment. After six weeks of therapy the WHO-5 score increases to 50, which is statistically significant (P < 0.01). The present-day goal of antidepressive therapy is, however, to attain the same WHO-5 score as the average population, i.e., about 70. Often, this does not happen until after 12 weeks of therapy, as illustrated in Figure 6.4.

The WHO-Five questionnaire

Over the past two weeks…                                      | All of the time | Most of the time | More than half the time | Less than half the time | Some of the time | At no time
1. I have felt cheerful and in good spirits                   | 5 | 4 | 3 | 2 | 1 | 0
2. I have felt calm and relaxed                               | 5 | 4 | 3 | 2 | 1 | 0
3. I have felt active and vigorous                            | 5 | 4 | 3 | 2 | 1 | 0
4. I woke up feeling fresh and rested                         | 5 | 4 | 3 | 2 | 1 | 0
5. My daily life has been filled with things that interest me | 5 | 4 | 3 | 2 | 1 | 0

Total score (raw score of item 1 to item 5) x 4 = __________

Figure 6.3 The WHO-5 scoring sheet

Another standardisation of the WHO-5 has been performed by Lucas et al (145). Using the WHOQoL-BREF item of general quality of life: ‘How would you rate your quality of life?, Poor, Neither poor nor good, or Good’ it was found that persons with ‘poor’ quality of life had a WHO-5 mean score of 37.5 (21.4), persons answering ‘neither poor nor good’ had a WHO-5 mean score of 59.6 (20.8) whereas those answering ‘good’ had a WHO-5 mean score of 68.9 (16.2).

Table 6.1 Results of a WHO-5 study in the primary care setting (146)

ICD-10 diagnoses                                | WHO-5 mean (sd)
Not diagnosed with mental disorders (N = 1162)  | 66.27 (19.57)
Mental disorders (N = 358)                      | 43.66 (21.96)
  • Depressive disorder (N = 116)               | 31.91 (21.38)
  • Anxiety disorders (N = 30)                  | 45.07 (20.29)
  • Somatoform disorders (N = 173)              | 48.86 (20.03)
  • Other minor mental disorders (N = 39)       | 50.60 (19.20)

[Figure 6.4: WHO-5 score (axis marks at 30, 50, 70 and 100) plotted against weeks of therapy (0 to 12), from baseline to endpoint, with the general population mean marked; changes annotated P ≤ 0.01 and P < 0.01. Title: The goal of treatment in depression using WHO-Five is to reach the general population mean]

Figure 6.4 The goal of depression therapy is that the depressive person should obtain a WHO-5 result in line with that of the general population, i.e., around 70. As can be seen, this will only happen after 12 weeks of therapy

A review article by McDowell shows that the WHO-5 possesses high sensitivity and specificity as a screening instrument in depression (142). In the general practice setting, the WHO-5 has proved to be better than both the General Health Questionnaire (GHQ) and a specific depression questionnaire designed to screen for depression in this setting (142, 143).

As the GHQ consists of a mixture of positively and negatively phrased questions, a study in patients with chronic non-malignant pain has used factor analysis to determine whether the respondents were compliant when completing the GHQ, i.e., whether they noticed the questions that are ‘reversed’, that is, phrased with positive versus negative signs. This is done by taking the raw scores and using a factor analysis in which the first factor takes the negatively phrased items and the second factor the positively phrased items. In the study in question it could be demonstrated that the respondents were able to differentiate between positively and negatively phrased questions (147).

Table  6.1 shows a study from the family doctor setting where WHO-5

had a mean score of approximately 66 in the patients not having a mental

disorder ( 146 ). Patients with major depression had a mean score of approxi-

mately 32. The patients with anxiety disorder had higher WHO-5 means.

When Eysenck started using questionnaires instead of the Rorschach test

in the 1940s to assess personality variables, he was especially interested in

measuring the dimension of neuroticism.

Eysenck was actually testing Freud’s concept of this dimension. The

hypothesis was that this dimension was present to a mild degree in the normal

population, while increasing with growing neurotic behaviour in patients

suffering from anxiety neurosis. In these questionnaire studies, Eysenck

demonstrated that it was more reliable to use items with negative sign when

measuring an illness-related dimension, such as neuroticism.

Figure 1.4 shows the nine items that delimit the dimension of neuroticism.

European and American clinical psychologists have attempted to achieve

consensus on the most important personality dimensions and have identified

a five-factor model ( 36 ). In this model, Eysenck’s dimension of neuroticism

is the most important. As may be seen in Figure  1.4, the psychic anxiety

symptoms, and not the somatic anxiety symptoms, constitute the dimension

of neuroticism.

In a Danish study that used the clinical diagnoses made by the Danish

professor of psychiatry Thorkild Vanggaard as index of validity (clinical

validity), it was found that only Eysenck’s dimension of neuroticism had

clinical validity compared to ten other personality scales ( 33 ).

7 The clinical consequences of IRT analyses: The concept of stress

When Hans Selye developed his theory of the concept of stress in 1936, he discriminated between stressors (strain), stress (the bodily reactions to such strain) and distress (the mental reactions to the strain) (132).

In Selye’s original stress model, post-traumatic stress disorder (PTSD) was not the focal point; it was the stress condition that develops during daily strain at work or at home. When delimiting these daily ‘life events’ in psychometric research, one attempted to consider as many items as possible, operating within the field of classical psychometrics (4). Cronbach’s alpha coefficient was thus used as a statistical index. This coefficient denotes the degree of correlation between the different daily life events. However, Cronbach’s alpha gives no indication as to whether each event provides additional information about the ‘stressor dimension’. Furthermore, the number of events is part of the formula for the calculation of alpha. This resulted in a tendency to include at least 20 items in the various ‘life-event questionnaires’.

Studies performed by the Danish National Research Centre for the Working Environment have established that the six items shown in Figure 7.2 provide an adequate measure of work-related stressors (148).

Post-traumatic stress disorder

In post-traumatic stress disorder, a single stressor completely dominates the picture. Studies in American Vietnam veterans have formed the ‘landmark’ research in PTSD (149).

With the DSM-III, PTSD became an official diagnosis, which is also included in ICD-10. Apart from combat situations, the catastrophes most commonly encountered today are earthquakes, airplane crashes, road traffic accidents and rape, i.e., a single, completely unexpected and devastating event. It has gradually become apparent that a PTSD ‘distress’ reaction follows a clearly defined trajectory, which is illustrated in Figure 7.3.

The initial symptoms typically commence after a two-week latency period.

If the person has a physical pain reaction, e.g., whiplash syndrome, then all

therapeutic attention is focused on pain relief ( 150 ). Due to this, PTSD is

often allowed to develop without initiation of relevant therapy. If these initial

symptoms are allowed to develop, they obviously become more and more

pronounced (sleep disturbances, nightmares, repetitive thoughts or memo-

ries of the violent event).

The Hans Selye’s model of clinical stress

Stressors (stressing influences) | Stress (the physical reaction) | Distress (the mental reaction)
E.g. Too high demands            | E.g. High blood cortisol       | E.g. Irritability
     Lack of influence           |      Hypermetabolism           |      Psychic anxiety
     Lack of social support      |      Hypertension              |      Depression

Figure 7.1 Hans Selye’s medical stress model

Significant factors for stress and well-being in the work place

• Influence (you are listened to)
• Meaning (daily work processes)
• Relevant information (planning work)
• Support (social network)
• Rewards (recognition)
• Demands (overtime)

Figure 7.2 The six factors in Selye’s stress model found to be significant. (Kristensen TS, Borg V and Hannerz H. Socioeconomic status and psychosocial work environment: results from a Danish national study. Scand J Public Health 2002; 30: 41–48, and Bech P et al. Work-related stress, depression and quality of life in Danish managers. Eur Psychiatry 2005 (suppl 3):S318–S325)

In about 15% of all PTSD sufferers, the condition may develop into a proper depressive state (HAM-D6), with lowered mood (hopelessness), guilt feelings (negative view of the past), lack of initiative (negative view of present situation), fatigue, feeling subdued and inactive. If this depressive state is neglected it may, over the course of three to six months, become chronic, with the addition of symptoms such as introversion, emptiness, alienation reaction, and occasional suicidal impulses. The behavioural theory of response to stimuli seems obvious in the PTSD situation, but the course of symptoms (HAM-D9 → HAM-D6 → HAM-D2) seems a priori programmed to adapt to a certain form of genetic behaviour. In other words, we possess an innate disposition to react with the A, B, C course of syndromes as collected in the HAM-D17 (see Figure 7.3 or Appendix 3a).

The work-related stress condition

The distress syndrome connected to work-related stress is largely identical to

the distress syndrome described in Figure  7.3 . However, the progression over

time in work-related stress conditions is less clearly described; this is proba-

bly due to the very unsystematic literature on work-related stress conditions.

The scales used in these studies often make it difficult to ascertain when an

individual symptom or a syndrome has been measured. Furthermore,

DSM-IV or ICD-10 depressions are often brought into the picture without

apparent awareness of the fatal flaw in these diagnostic systems. It is, thus, an

inherent rule of these diagnostic systems that if the condition (syndrome) is

so pronounced as to be major depression, then the stress model must be

abandoned ( 110 ).

The development of Post Traumatic Stress Disorder (PTSD) as measured by the ABC version of HAM-D17 (see Appendix 3a)

Months after traumatic event | Symptoms                                                                                                 | ABC HAM-D17
1–2 months                   | Disturbed sleep, hereafter other arousal (anxiety) symptoms such as sweating, dizziness, heart pounding | HAM-D9 (B items)
3–4 months                   | Depressed mood, tiredness, lack of interests, guilt feelings, psychic anxiety, slowed down              | HAM-D6 (A items)
5–6 months                   | Feelings of emptiness, alienation, lack of insight, suicidal thoughts                                   | HAM-D2 (C items)

Figure 7.3 Development over time of post-traumatic stress disorder (PTSD). Trauma (stressor): Catastrophes, accidents, war, traffic accidents, rape

Due to this, according to DSM-IV and ICD-10, chronic stress conditions

are merely mild anxious/depressive states where there are not enough symp-

toms to make a major depression or proper anxiety diagnosis.

The most widely used questionnaire developed specifically to measure

‘distress’ within the medical illness model was developed at Johns Hopkins

Hospital in Baltimore in the 1950s ( 4 ). Originally containing 41 items, it has

since expanded to 90 items. This distress questionnaire is the Symptom

Checklist (SCL-90).

Cohen’s Self-perceived Stress Scale (stress questionnaire) has been used in

several Danish general population studies, and has a Mokken coefficient of

homogeneity of 0.44.

Integration of Selye’s medical stress model

In 1936, the Austrian born physician Hans Selye (1907–86) described the

stress state he had observed as a general syndrome in patients with chronic

somatic diseases. Selye continued his career in Canada, where his research

led to the development of ‘the biological stress syndrome’ ( 151, 152 ).

According to Selye’s stress model, ‘stressors’ are the demands or strains that

cause the stress condition (Figure  7.1 ).

Cohen’s stress questionnaire is an attempt to measure the subjective ‘stressors’ experienced by the patient during the preceding two weeks. Question 3 in this stress questionnaire asks how much of the time during the previous weeks you felt nervous and ‘stressed’.

According to Selye’s medical stress model, the actual stress condition is a

biological phenomenon; the pathophysiological part of the medical disease

model. From a scientific point of view, it is thus very important to use

the correct terminology, as the stress demands that have led to the stress

condition are named ‘stressors’, and are typically psychosocial factors (see

e.g., Figure  7.1 ), while the stress condition itself is biologically defined,

according to Selye. He believed that the higher levels of the adrenal cortex

hormone cortisol produced during chronic pressure result in the biological

stress condition reaction ( 151, 152 ). Hans Selye demonstrated that when

chronic stressors cause imbalance in the normal biological regulating mech-

anism of the body (the actual stress reaction), the body attempts to regain a

state of balance by increasing its production of the hormone cortisol in the

adrenal cortex.

After Selye’s death, some have sought to introduce the term ‘allostasis’ to

describe a stressed organism’s attempt to achieve a new state of balance in the

hormone and nervous system at the cost of increased cortisol production.

When slightly increased, Selye called the cortisol hormone a ‘tolerance


hormone’. Throughout the ages, it has been women in particular who have

had to manage life on a higher level of cortisol, which is why they are faster

than men to develop the unhealthy stress condition that Selye called ‘distress’.

Selye’s final work ‘Stress without Distress’ was translated into Danish with

a title equivalent to ‘Stress without Anxiety’ ( 151, 152 ). Today, one would

rather translate ‘distress’ to ‘depression’. As early as 1913, the renowned neuro-

surgeon H.W. Cushing (1869–1939) described a disorder in which cancer

causes the production of cortisol to gradually increase over many months

(Cushing’s Disease) (132). At the beginning of the disease, these patients are

completely free from stressors, but mental symptoms appear prior to the

physical ones, i.e., over the course of some months, with anxiety, fatigue, sleep

disturbances, concentration difficulties, despondency and lowered mood. If

these symptoms are disregarded and the cancer is not diagnosed, cortisol

production will increase with increasing growth of the tumour and somatic

symptoms will appear such as hypertension, diabetes and cardiac disease,

which will prove fatal for the Cushing patient. Thus, it is the increased pro-

duction of cortisol seen in a stress condition that explains the mental stress

symptoms (distress) of anxiety, sleep disturbances and depression. However,

it is difficult to measure serum cortisol levels and the mental symptoms

appear already at the early stage of increase. Viewing cortisol as the crucial

factor is thus far too materialistic an approach. It is the mental manifestations

that are important. As Hans Selye himself concludes, it is important for each

individual to find his or her own level of stress without distress ( 151 ).

According to Selye’s model, all humans are stressed, as any kind of productive

labour has an impact on cortisol production.

The American linguist Noam Chomsky views the body–mind discussion

as a minor issue, as we only think we can comprehend the nature of a disease

when we are able to describe it in biological terms ( 153 ). When mental symp-

toms enter the picture, like in distress, we call cortisol a distress hormone (in

lower concentrations we call it the ‘tolerance hormone’).

In clinical psychometrics (clinimetrics), we measure mental manifestations within the psychometric frame of reference, so that in connection with Selye’s stress model, we measure distress through questionnaires. Both anxiety and depression questionnaires are used to measure distress. The Anxiety Symptom Scale (ASS), see Appendix 5b, is recommended when screening for anxiety symptoms, while the Major Depression Inventory (MDI), see Appendix 4a, is recommended when screening for depression.

The connection between depression and anxiety, when measuring ‘distress’, is best illustrated by Beck’s cognitive model of depression (Figure 7.4) and Spielberger’s model of anxiety (Figure 7.5). In Figure 7.4, Beck’s negative triad is related to the corresponding symptoms in Hamilton’s depression scale (HAM-D) and in Figure 7.5, Spielberger’s model of mental versus somatic anxiety is related to the corresponding symptoms in the HAM-D.

In a very comprehensive study by Grinker et al containing many rele-

vant depression rating scales covering the period from 1956–60, i.e., prior

to the release of the HAM-D or BDI, these authors found, when using

factor analysis without and with rotations, that in their opinion a rather

limited number of factors was identified ( 154 ). For the quantification of

depressive states the authors found that the core items of subjective depres-

sion include

Hopelessness

Helplessness

Worthlessness or guilt feelings

To their surprise, Grinker et al also identified anxiety as a core item of depression (154). Moreover, they considered psychomotor retardation and tiredness as behavioural core items. The negative triad of depression or the bias of the negative depressed person in his or her information processing system has been considered to be the endophenotype, or deep phenotyping, in depressive states. The extended HAM-D17 version includes more specific items in this respect, namely the item of hopelessness, the item of helplessness, and the item of worthlessness or guilt (see Appendix 3b).

The HAM-D items of depressed mood (Item 1), of guilt (Item 2), and of work and interests (Item 7) are the three angles in the negative triangle (triad).

• Negative view of the future (hopelessness) [HAM-D item 1] [BDI6 item 1]
• Negative view of the present (helplessness) [HAM-D item 7] [BDI6 item 15]
• Negative view of the past (guilt feelings) [HAM-D item 2] [BDI6 item 5]

Figure 7.4 Beck’s Negative Triad of Depression

With regard to the ‘allostasis’ condition, i.e., the state in which long-term (or chronic) stressors are present, Eysenck’s neuroticism scale is typically employed (Figure 1.4). The fact that women score significantly higher than men on Eysenck’s neuroticism scale in general population studies or in clinical studies gives food for thought. Perhaps the ‘villain’ here is the ‘politeness hormone’ or rather the ‘neuroticism hormone’, cortisol.

Spielberger’s cognitive appraisal model of anxiety (5), of which anxious mood is most valid, corresponding to Item 10 (psychic anxiety) in the HAM-D while the somatic symptoms are contained in HAM-D item 11

Subjective feelings of anxious mood (HAM-D Item 10)
• Nervousness
• Tension
• Worry
• Apprehension
• Fearfulness (panic)

Activation (arousal) of the nervous system (HAM-D Item 11)
• Nausea or upset stomach
• Sweating
• Dizziness
• Heart pounding
• Trembling

Figure 7.5 Spielberger’s cognitive appraisal model of anxiety

8 Questionnaires as ‘blood tests’

Many doctors have often wished for a questionnaire measuring depression or anxiety in the same way as a blood sample can be used to measure the patient’s metabolism or cholesterol level. The results of such a blood test come back from the laboratory with the current value and with the normal range of this blood test given in brackets.

These normal ranges have emerged from blood sample results from a representative sample of citizens.

Population studies in depression and anxiety

We have looked at various representative samples involving questionnaires assessing anxiety, depression and well-being. During the last 10–15 years, the depression questionnaire has become the most relevant ‘blood test’ in general practice, used to identify true depression, depression secondary to medical disorders (such as in patients with chronic pain, cancer or diabetes) or depression in people with psychosocial burdens. We have undertaken three population studies using the Major Depression Inventory (MDI). The MDI questionnaire is shown in the appendix. It includes the depressive symptoms of both the DSM-IV and the ICD-10. Via the algorithms given in these diagnostic systems, the MDI response makes it possible both to diagnose the depression, and to denote the severity of the depression as indicated by the MDI total score, since the MDI total score is a sufficient statistic (Rasch analysis) and may therefore be used to measure the severity of both primary and secondary depression (155).

Table 8.1 lists the results of our three general population investigations. The first was performed in 1999 and so far it has only been published in a PhD thesis by Vibeke Nørholm. Her topic was the quality of life in schizophrenic patients and she used the voluminous WHOQoL questionnaires (141).

Table 8.1 The results of Danish population studies from 1999, 2000 and 2003 using the Major Depression Inventory questionnaire

Diagnosis                      | 1999 (N=1078) | females (N=566) | males (N=512) | 2000 (N=1141) | females (N=610) | males (N=531) | 2003 (N=1867) | females (N=988) | males (N=879)
DSM-IV major depression        | 3.2 %         | 4.2 %           | 2.1 %         | 3.4 %         | 3.8 %           | 3.0 %         | 2.6 %         | 3.2 %           | 2.1 %
ICD-10 depression              | 3.6 %         | 4.2 %           | 2.9 %         | 4.2 %         | 5.1 %           | 3.2 %         | 2.8 %         | 3.5 %           | 2.1 %
MDI > 20 (mild depression)     | 6.6 %         | 8.1 %           | 4.9 %         | 7.7 %         | 9.5 %           | 5.6 %         | 6.2 %         | 7.7 %           | 4.7 %
MDI > 25 (moderate depression) | 4.1 %         | 5.1 %           | 2.9 %         | 4.2 %         | 5.2 %           | 3.0 %         | 3.7 %         | 5.0 %           | 2.4 %
Response rate                  | 67.1 %        |                 |               | 51 %          |                 |               | 68 %          |                 |


Table  8.1 shows that, in the Danish general population sample, the

prevalence of depression was 3.2% for DSM-IV major depression and 3.6%

for ICD-10 depression. An MDI score of 25 or more (corresponding to a HAM-D17 score of 18 or more) gives about 4% prevalence in the population.

According to WHO’s estimates from different parts of the world, the

prevalence lies between 3 and 5%.
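As a quick consistency check on Table 8.1, the overall prevalence in a sample should equal the sample-size-weighted average of the prevalences in women and men. The sketch below applies this to the 1999 DSM-IV figures from the table; the function name `pooled_prevalence` is hypothetical and the check is an illustration, not part of the original analysis.

```python
def pooled_prevalence(groups):
    """groups: list of (n, prevalence_percent) pairs; returns the pooled percent."""
    total_n = sum(n for n, _ in groups)
    return sum(n * p for n, p in groups) / total_n


# 1999 sample, DSM-IV major depression:
# females N=566 (4.2 %), males N=512 (2.1 %), total N=1078
print(round(pooled_prevalence([(566, 4.2), (512, 2.1)]), 1))  # 3.2
```

The pooled value agrees with the 3.2% reported for the whole 1999 sample.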

In 2000, we performed a sampling in connection with Lis Raabæk Olsen’s

PhD thesis ( 156 ). The result was again a 3–4% prevalence of depression in

the general population, depending on the method used (DSM-IV, ICD-10, or

MDI total score). In 2003, we undertook a new population study, together

with Dr. Odont. Erik Friis-Hasché, whose field of interest was fear of dentists.

Approximately 1/3 of the persons in this study actually had a marked fear of

dentists (see http://www.ncbi.nlm.nih.gov/pubmed/7725561 ). Once more,

the prevalence of depression in the general population was between 3 and

4%. Apart from the year 2000 sample, Table  8.1 shows a greater prevalence of

depression in women than in men.

The family doctor will diagnose hypertension when systolic and diastolic

results are ≥ 140 mm Hg and ≥ 90 mm Hg, and by using the MDI in the same

way, the doctor may diagnose treatment-requiring depression or DSM-IV

major depression when the MDI is higher than 25. To continue the analogy,

the family doctor then determines whether it is a question of primary or

secondary hypertension, or of primary or secondary depression. While the

DSM-IV or ICD-10 major depression symptoms are presupposed to be the

same in primary depression (e.g., bipolar or unipolar depression) and in

secondary depression (depression due to somatic illness or a stress condition),

scientific research has proved through demonstration of transferability, that

the HAM-D 6 or the MDI do measure the same depressive condition in both

primary and secondary depression. This is the reason why the HAM-D 6 or

the MDI may be used when screening for depression.

In connection with the 2003 population study, anxiety was also measured,

using the Spielberger Anxiety Scale. We found that 7.5% of the general

population had a clinical anxiety condition ( 147 ).

Spielberger’s Anxiety Scale consists of a State scale (measuring present

state anxiety) and a Trait scale (measuring personality propensity to anxiety).

The present state scale only measures the psychic anxiety symptoms, while

the personality scale is a mixture of anxiety-related and depression-related

tendencies, but still with particular focus on bodily manifestations of anxiety.

However, results of the Trait scale are very difficult to interpret, so Eysenck’s

neuroticism scale (Figure 1.4) is the more valid.

Spielberger’s State Anxiety scale consists of 20 items; ten of these are

negatively phrased (symptom orientation), while the remaining ten items are

positively phrased (well-being orientation).


A factor analysis of Spielberger’s State Anxiety scale results in several

factors, despite a very high Cronbach’s alpha coefficient (between 0.82 and

0.96); however, these factors are method factors, not true factors that provide

new insight ( 157 ). Thus, the two most significant factors only show that the

items describing symptoms have positive loadings (negatively phrased items)

while the items describing well-being have negative loadings ( 157,160 ). This

methodological issue is used as a measure of test-taking behaviour ( 147 ).
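That a very high alpha can coexist with several factors is easier to see once the role of the number of items is made explicit. The sketch below assumes the standardised form of Cronbach’s alpha, alpha = k·r / (1 + (k − 1)·r), where k is the number of items and r the mean inter-item correlation. The function name and the value r = 0.2 are illustrative assumptions, not figures from the studies cited.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardised Cronbach's alpha from item count and mean inter-item correlation."""
    k, r = n_items, mean_inter_item_r
    return k * r / (1 + (k - 1) * r)


# Same modest mean inter-item correlation, increasing number of items
for k in (5, 10, 20):
    print(k, round(standardized_alpha(k, 0.2), 2))
```

With a mean inter-item correlation of only 0.2, a 20-item scale already reaches an alpha of about 0.83, whereas a 5-item scale with the same correlation stays at about 0.56. Alpha therefore says little about whether the items form a single dimension.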

When requiring a questionnaire that deals directly with social functioning,

Sheehan’s Disability Scale is applicable.

The ability of the WHO-5 in detecting depression in elderly diabetic patients

(with a cut-off < 50) was found quite acceptable ( 158 ). Thus this study using the

DSM-IV major depression as index of validity ( 158 ) obtained a sensitivity of

100% and a specificity of 78%. In a population of adolescents with diabetes

(aged 13 to 17 years), the WHO-5 with a cut-off of < 50 using the Centre for

Epidemiologic Studies Depression Scale (CES-D) as index for depression,

obtained a sensitivity of 89% and a specificity of 86% ( 159 ).
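How sensitivity and specificity follow from a cut-off can be sketched as below. The example treats a WHO-5 score below 50 as a positive screen and compares it with a reference diagnosis serving as the index of validity. The function name and the toy data are hypothetical and are not taken from the studies cited; the sketch also assumes that the data contain at least one person with and one without the diagnosis.

```python
def screen_accuracy(who5_scores, has_depression, cutoff=50):
    """Sensitivity and specificity of 'WHO-5 < cutoff' against a reference diagnosis."""
    tp = fn = tn = fp = 0
    for score, depressed in zip(who5_scores, has_depression):
        screen_positive = score < cutoff
        if depressed and screen_positive:
            tp += 1   # depressed and detected
        elif depressed:
            fn += 1   # depressed but missed
        elif screen_positive:
            fp += 1   # not depressed but flagged
        else:
            tn += 1   # not depressed and not flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy data, for illustration only
scores = [28, 36, 44, 48, 52, 60, 68, 72, 76, 88]
diagnosis = [True, True, True, False, False, False, False, False, False, False]
sensitivity, specificity = screen_accuracy(scores, diagnosis)
print(sensitivity, round(specificity, 2))
```

In this toy data set all three depressed persons score below 50, while one of the seven non-depressed persons is also flagged, so the sensitivity is 1.0 and the specificity 6/7.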

The predictive validity of WHO-5

The predictive validity of the WHO-5 has recently been demonstrated in a

Danish study, where patients with cardiac disorders have been followed for a

period of six years ( 160 ). Patients who scored less than 50 on the WHO-5 at

the start of the study proved to have a significantly higher mortality than those

scoring more than 50 at the start of the study. This is apparent from Figure  8.1 .
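Survival curves of the kind shown in Figure 8.1 are Kaplan-Meier estimates, computed separately for each WHO-5 group. The sketch below is a minimal plain-Python version of the estimator; the function name and the follow-up data are hypothetical and are not the study data.

```python
def kaplan_meier(times, died):
    """Return [(time, survival_probability)] at each distinct time with a death.

    times: follow-up time per patient.
    died:  True if death was observed at that time, False if the patient
           was censored (still alive at last contact).
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, d in zip(times, died) if x == t and d)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times in years for one group (illustrative only)
times = [1, 2, 3, 4, 6, 6]
died = [True, True, False, True, False, False]
for t, s in kaplan_meier(times, died):
    print(t, round(s, 3))
```

Running the same function on the patients scoring below 50 and on those scoring above 50 gives the two curves that are compared in a figure of this type.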

Screening scales

There is a range of questionnaires aimed at screening for a condition

rather than measuring it. Among these different screening instruments,

the following have been selected: the Mini Mental State Examination

(MMSE) with the clock and the Anxiety Symptom Scale (ASS) (see

Appendix 5b).

MMSE/Clock test

The Mini Mental State Examination is a screening instrument, as the scale

only assesses certain aspects of cognitive functioning. Therefore, some per-

sons may perform very well on the test with scores between 25 and 30 and

still be in the initial stage of dementia. Nor does the scale provide a depend-

able description of the more pronounced dementia state at the other end of

the score variation, i.e., scores below 15.


However, the scale is the most frequently used worldwide, as it is easy to

administer and has a high reliability. As mentioned in connection with

antidementia medication, it is also used to measure effect during a course of

treatment.

In the clock-drawing test, the subject is presented with a pre-drawn circle

and asked to fill in numbers so as to represent the face of a clock. Then the

hands have to be set at a given time, e.g., 13.50 hours. The test is quick and

easy to administer. However, it cannot be used as the sole test and must be

viewed as a supplement to the MMSE.

Anxiety Symptom Scale (ASS)

The ASS screening instrument provides a swift method to ascertain

which kind of anxiety is the most predominant in the subject (see

Appendix 5b).

[Figure 8.1: Kaplan-Meier survival curves (survival probability from 0.2 to 1.0, follow-up marked at 3 years and 6 years) for patients with WHO-5 > 50 and WHO-5 < 50. Title: The predictive value of WHO-5 in patients with cardiac disorders. A survival analysis. (160)]

Figure 8.1 Predictive validity of the WHO-5 in a study on survival in patients with heart disease. The Kaplan-Meier curves demonstrate that in patients scoring above 50 on the WHO-5 at discharge from hospital, 20% die within 6 years, while in patients with a WHO-5 score below 50, 80% die within 6 years (160)

When measuring the current state of anxiety, Spielberger’s Anxiety Scale (STAI) may be used. If a clinical anxiety condition is established, the ASS scoring profile can be used to determine whether, besides a general state (items 1 and 2), there is avoidance behaviour (item 3), anxiety attacks as in panic attacks (items 4 and 5), obsessional phenomena (items 6, 7 and 8) or post-traumatic anxiety (item 9). Item 10 gives an indication of the anxiety condition’s impact on social functioning. When using the ASS as a screening instrument, a score of 3 or higher is the clinical threshold.
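The item profile described above can be written down as a simple lookup. The sketch below is only an illustration: the names `ASS_DOMAINS` and `ass_profile` are hypothetical, and it assumes that the threshold of 3 is applied to each item separately, which the text does not state explicitly.

```python
# Mapping of ASS item numbers to the aspects of anxiety named in the text
ASS_DOMAINS = {
    "general anxiety state": (1, 2),
    "avoidance behaviour": (3,),
    "panic attacks": (4, 5),
    "obsessional phenomena": (6, 7, 8),
    "post-traumatic anxiety": (9,),
    "impact on social functioning": (10,),
}


def ass_profile(item_scores, threshold=3):
    """item_scores: dict {item number 1-10: score}.

    Returns the domains in which at least one item reaches the threshold
    (assumption: the threshold applies per item).
    """
    return [
        domain
        for domain, items in ASS_DOMAINS.items()
        if any(item_scores[i] >= threshold for i in items)
    ]


# Hypothetical response pattern
scores = {1: 3, 2: 2, 3: 0, 4: 4, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0, 10: 2}
print(ass_profile(scores))  # ['general anxiety state', 'panic attacks']
```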

Page 104: Clinical Psychometrics

95

Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

9 Summary and perspectives

Clinical psychometrics has developed into a discipline within clinical psychiatry of similar importance to genetics, epidemiology, or pharmacology. Psychometrics was originally a discipline within psychology that was established at Wundt’s psychological laboratory about a century ago. Here Kraepelin learnt how to measure his subjects’ mental manifestations under standardised conditions, e.g., the dose–response relationship of alcohol to reaction times. When proceeding to a career in clinical psychiatry, Kraepelin continued his ‘laboratory-like’ assessments of his patients’ symptoms over time by measuring their symptoms on his ‘symptom checklist’. On this basis, he was able to delimit the course of illness in both schizophrenic and manic-depressive patients.

At the beginning of the 20th century, Kraepelin attempted to establish a discipline that he named pharmacopsychology in the hope that psychoactive drugs with the desired effect on schizophrenia or the manic-depressive disorder would make their appearance. However, this only came to pass at the beginning of the 1950s.

Attempts have been made to scientifically test the rating scales developed since the 1950s by using the classical psychometric method developed by the psychologist Spearman during his studies at Wundt’s psychometric laboratory: the factor analytic method.

Table 9.1 gives an overview of the questionnaire (Eysenck’s personality scale) and the rating scales (HAM-A, HAM-D, BPRS) that were developed in the 1950s and tested by use of Spearman’s factor analysis.

The British tradition used the two-factor model introduced by Spearman in his research on intelligence measurements. In Spearman’s research, the first factor was a general factor and the next factor a dual or ‘bi-directional’ factor (indicating that in the general factor, two subgroups with opposite signs can be isolated, as the symptoms within these subgroups have the highest inter-correlation).

Page 105: Clinical Psychometrics


In modern psychometrics, however, factor analysis has faded into the background. Principal component analysis is included as an example of the factor analytic method that survives in clinical psychometrics. Here it is the bi-directional factor 2 that is of interest, since it focuses on a pattern of symptoms in classification issues.

Factor 1 is the general factor that is presumed to measure the degree of severity, as it reflects that all of the selected symptoms are more or less positively correlated. However, this correlation is already mirrored in Cronbach’s coefficient alpha and is not in itself an argument for adding up the symptoms. To justify a total score, item response theory analysis must be employed.
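To make the point about coefficient alpha concrete, the following minimal sketch computes Cronbach’s alpha from the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). The function name and the small data set are hypothetical and serve only as an illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's coefficient alpha for a list of respondents.

    Each row holds one respondent's item scores, in the same item order.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five respondents, three positively correlated items.
ratings = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 4], [4, 4, 4]]
alpha = cronbach_alpha(ratings)
```

A high alpha only says that the items are positively inter-correlated; as the text notes, it does not show that the total score is a sufficient statistic.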

If factor 2 indicates a clinically meaningful symptom pattern by the

positive versus negative factor loadings, then factor rotation is quite

unnecessary. When interpreting the symptom pattern in factor 2, all loadings

must be taken into account, and not only those demonstrating statistical

significance (e.g., not just loadings of 0.30 or greater).
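The general and the bi-directional factor can be illustrated with an un-rotated principal component analysis of a toy correlation matrix. The matrix below is hypothetical (two pairs of symptoms that correlate more strongly within a pair than between pairs), and power iteration with deflation is simply one convenient way of extracting the first two components without external libraries.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dominant_component(m, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    v = normalise([1.0 + 0.1 * i for i in range(len(m))])
    for _ in range(iterations):
        v = normalise(matvec(m, v))
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v


def deflate(m, eigenvalue, v):
    """Remove an extracted component so that the next one can be found."""
    n = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)]


# Hypothetical correlations between four symptoms (not data from the book).
R = [
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]

ev1, f1 = dominant_component(R)
ev2, f2 = dominant_component(deflate(R, ev1, f1))

# Un-rotated loadings: eigenvector scaled by the square root of its eigenvalue.
loadings1 = [math.sqrt(ev1) * x for x in f1]
loadings2 = [math.sqrt(ev2) * x for x in f2]
```

In this toy case all loadings on the first component have the same sign (the general factor), while the second component separates the two symptom pairs by opposite signs (the dual factor). Which pair receives the negative sign is arbitrary.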

However, pharmacopsychometrics has discarded classic psychometrics (factor analysis), as the general factor was unable to measure transferability, that is, whether or not a rating scale measures the same phenomenon or the same dimension in different groups of patients (men versus women, younger versus older age groups, primary versus secondary depression) or in the same group of patients when the scale is used in weekly assessments during antidepressive therapy. Modern psychometrics was able to demonstrate this concept of transferability by the use of item response theory models.

Table 9.1 A schematic review of British factor analytic tradition, focusing on the general versus the dual factor.

Source                                        General factor (scale)                 Dual factor
Spearman 1927 (17)                            General intelligence factor            Linguistic versus mathematical intelligence
Eysenck 1953 (31)                             General neuroticism factor (EPQ-N)     Extraversion versus introversion
Hamilton 1959 (38) (Appendix 5a)              General anxiety factor (HAM-A14)       Psychic versus somatic anxiety
Hamilton 1960 (39) (Appendix 3a)              General depression factor (HAM-D17)    Depression versus anxiety
Overall 1962 (44,45) (Appendix 7)             General psychotism factor (BPRS18)     Schizophrenicity versus depression
Bech et al 2010 (161) (Appendix 3e and 3h)    General distress factor (SCL-92)       Depression versus anxiety

Page 106: Clinical Psychometrics


These models were constructed precisely because factor analysis was

unable to measure transferability, no matter how many times the different

factors were rotated in accordance with the American tradition.

In the pharmacopsychometric triangle, the transferability requirement is important (the total score on a rating scale for desired clinical effect is a sufficient statistic), as the magnitude of pharmacological effect is expressed in variation units, which is what makes the magnitude of effect independent of the rating scale scoring system.
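Why an effect expressed in units of variation does not depend on the scoring system can be shown with a short sketch. It assumes that the ‘variation unit’ corresponds to the pooled standard deviation, as in a standardised mean difference; this reading, the function name and the endpoint scores below are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def standardised_effect(active, placebo):
    """Mean difference (placebo minus active) in pooled standard deviation units.

    Lower scores are taken to mean fewer symptoms, so a positive value
    favours the active drug (a sign convention assumed for this sketch).
    """
    n1, n2 = len(active), len(placebo)
    pooled_variance = (
        (n1 - 1) * variance(active) + (n2 - 1) * variance(placebo)
    ) / (n1 + n2 - 2)
    return (mean(placebo) - mean(active)) / sqrt(pooled_variance)


# Hypothetical endpoint scores on some rating scale.
active = [8, 10, 12, 9, 11]
placebo = [14, 16, 12, 15, 13]
effect = standardised_effect(active, placebo)

# The same patients scored on a scale with twice as many points per item.
effect_rescaled = standardised_effect(
    [2 * x for x in active], [2 * x for x in placebo]
)
```

Both calls give the same value, because rescaling the scores changes the mean difference and the standard deviation by the same factor.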

A group of approximately six symptoms has proved to be a sufficient measure of desired clinical effect. When considering the second angle of the pharmacopsychometric triangle, any unwanted effects of the drug, a separate analysis of each side effect symptom is often necessary. Use of the item response theory model has shown that the third angle in the pharmacopsychometric triangle, subjective quality of life, can be measured with relatively few items, e.g., the WHO-5.

The pharmacopsychometric triangle provides an easily grasped overview

of the importance of a drug in clinical psychiatry.

Table 9.1 shows the most used rating scales worldwide. Apart from the SCL-90, these scales can be found in the Appendices. The Danish SCL-92 (as well as many of the others) is to be found in an electronic version at: www.psykforskhil.dk.

Figure  9.1 illustrates the issue in depression called ‘the one and the many’.

The standardisation introduced with the diagnoses of ‘major’ versus ‘minor’

depression is rooted in the Hamilton Depression Scale. This scale gives a

common ground: ‘the one’. However, depression also appears in many forms

(‘the many’), such as primary depression (when no certain cause can be

established) and secondary depression when emerging after stress (burden)

or after medical conditions (postnatal depression, post-stroke depression

etc.). These manifold subtypes are marked with Roman numerals in

Figure  9.1 and with reference to the corresponding therapy according to

Lichtenberg and Belmaker ( 162 ).

Among international collections of rating scales, the book by Lam et al can be

recommended ( 163 ). This work mentions the fact that rating scales (assessment

scales or questionnaires) are widely used in scientific research, but still only to a

minor extent in daily clinical work, even though electronic patient records are

encouraging such use. Perhaps the use of rating scales will only become a

requirement in daily clinical work with the introduction of DSM-V or ICD-11.

Lam et al discuss the difference between two different approaches to

treatment evaluation during antidepressive drug treatment. These approaches

are personified by two physicians, Dr Scales and Dr Gestalt. Dr Scales uses

the HAM-D and Dr Gestalt uses a general measure (‘are you feeling better or

Page 107: Clinical Psychometrics


are you not feeling better today?’) when assessing their patients. This

difference in their approaches to treatment is illustrated in Figure  9.2

(according to Lam et al).

Prior to treatment (week 0 on Figure  9.2 ), both Dr Scales and Dr Gestalt

have diagnosed moderate depression according to ICD-10, and Dr Scales has also established his patient’s symptom score on the Hamilton Depression Scale (HAM-D17): a total score of 24 (see Figure 9.2).

Sub-type                                                            HAM-D17 mean (sd)   Treatment

Major depression sub-types

Primary depression (melancholia)
I     Psychotic depression                                          30 (6)              ECT, TCA
II    Bipolar depression                                            24 (5)              Mood stabilizers, SSRI
III   Unipolar depression                                           24 (5)              SSRI, SNRI, TCA
IV    Atypical depression                                           21 (5)              MAO-I

Secondary (to stress) depression
V     Stress-adjustment disorder with depression and anxiety        18 (4)              Stress-reducing exercises
VI    Depression after childhood trauma                             18 (4)              Cognitive therapy
VII   Depressive reaction to stress in connection with separation   18 (4)              Psycho-social intervention

Secondary (to somatic illness) depression
VIII  Post-natal depression                                         18 (5)              Cognitive therapy/SSRI
IX    Age-related depression (post-stroke)                          18 (5)              SSRI
X     Substance abuse disorder                                      18 (5)              Treatment of underlying disorder

Less than major depression sub-types
XI    Dysthymia (depressive neurosis)                               14 (3)              Cognitive therapy/SSRI
XII   PTSD                                                                              Stress-reducing exercises
XIII  Other stress-related neuroses                                                     Cognitive therapy/SSRI

ECT = electroconvulsive therapy; TCA = tricyclic antidepressants; SSRI = specific serotonin reuptake inhibitors; SNRI = serotonin-/noradrenaline reuptake inhibitors; MAO-I = monoamine oxidase inhibitors

Figure 9.1 Subtypes of depression, modified from Lichtenberg and Belmaker (162)

Page 108: Clinical Psychometrics


Dr Scales and Dr Gestalt agree to start a course of antidepressive

medication at a dosage of 20 mg during the first week of therapy.

After the first week of therapy, Dr Gestalt asks how the patient feels and when the answer is that there is no improvement Dr Gestalt increases the dosage to 30 mg. Dr Scales informs the patient that the HAM-D17 has now decreased to 20, which means that the dosage should not be altered.

After the second week of therapy, Dr Gestalt enquires how the patient is doing and when the answer is ‘largely unchanged’, he now increases the dosage to 40 mg. Dr Scales informs the patient that the HAM-D17 has decreased to 14, which means that the dosage should not be altered.

After the third week of therapy Dr Gestalt enquires how the patient is doing and as the reply is still ‘by and large the same’ he increases the dosage to 60 mg (the maximum dosage). Dr Scales informs the patient that the HAM-D17 is now 12, half of the original score, and that they are on the right track and that the dosage should remain unchanged.

After the fourth week of therapy, the patient informs Dr Gestalt that the

side effects (heavy perspiration, inner unrest and headache) are such a

burden that his family feels that the medication should be stopped. Dr Scales

[Figure: HAM-D score (vertical axis, 6 to 24, with the response and remission levels marked) against weeks of treatment (horizontal axis, 1 to 6). Weekly dosage for Dr. Scales, with Dr. Gestalt in brackets: 20 mg [30 mg], 20 mg [40 mg], 20 mg [60 mg], then 20 mg. Dr. Gestalt’s patient reports ‘No improvement’, then ‘Unchanged’, and then stops.]

Figure 9.2 A course of treatment as conducted by Dr Scales versus Dr Gestalt. (Modified from Lam et al Assessment scales in depression and anxiety. London. Taylor & Francis 2006)

Page 109: Clinical Psychometrics

[Figure: Severity of depression. HAM-D-17 total score (ticks at 7, 12, 18 and 24) against weeks of short-term therapy (0 to 8) and weeks of the medium-term therapy (up to 52). Marked on the curve: major depression, symptoms, early improvement (25%), response (50%), remission, relapse, recovery. Panel note: The sequence of improvement, response, remission, relapse and recovery based on John Rush’s original model. WPA Series 1999.]

Figure 9.3 Course of therapy in depressive patients with a HAM-D17 score of approximately 24 before start of antidepressant therapy. (Reproduced from Bech P. Pharmacological treatment of depressive disorders: A review. In: Maj M, Sartorius N (eds) Depressive Disorders. Chichester, Wiley 1999 pp 89–127. Reproduced with permission.)

Page 110: Clinical Psychometrics


informs the patient that HAM-D17 is now 8 and that remission (absence of symptoms) is within reach.

After the fifth week of therapy, Dr Scales can announce that HAM-D17 has fallen below the remission value of 7 and that continuation therapy can now commence.

The development in Figure  9.3 shows how to use an assessment scale

during a course of treatment. When Dr Scales informs the patient that the continuing decrease in his HAM-D17 depression score is following the expected trajectory, this has in itself a calming influence on the patient. Due to his ‘holistic approach’, Dr Gestalt gives his patient’s own assessment too much weight, resulting in a far too high dosage.

The use of itemised symptom measures (Dr. Scales) in the STAR*D study was found to reveal a 25 to 45% earlier reduction in baseline severity of depression than the global impression assessment (Dr. Gestalt). According to Rush (164,166): ‘Analogous to treating hypertension, “less hypertensive” is not a goal of treatment of hypertension. Nor should “less depressed” be the goal for our depressed patients…’.

Figure 9.3 shows the average curve for Dr Scales’ depressive patients during treatment. The patients consult Dr Scales at the time point 0, where the mean HAM-D17 is about 24. After four weeks of therapy the mean HAM-D17 is about 11, i.e., a reduction of roughly 50%. Internationally, a HAM-D17 reduction of 50% or more is used as an indication of ‘response’ to treatment. Two weeks of therapy typically gives a 25% reduction in HAM-D17 relative to the score at week 0, and this is called ‘early improvement’. A score of 7 or less on HAM-D17 is termed ‘remission’, i.e., a relative absence of symptoms. ‘Relapse’ happens when remission has been obtained, only to be followed by an increase in HAM-D17 to 16 or more. After an absence of symptoms for 52 weeks in the older age group and 26 weeks in the younger age group, it is highly likely that the patient is completely beyond the depressive phase and Dr Scales can then finish antidepressive therapy. The period between ‘remission’ and ‘recovery’ is termed maintenance therapy (Figure 9.3). If the patient has a history of depressive episodes, relapse prevention therapy should be offered.
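The cut-offs described above can be written down as a small classification sketch. The thresholds (25%, 50%, a score of 7 or less, a rise to 16 or more after remission) are those given in the text; the function names, the label ‘no clear improvement’ and the final score of 6 in the example are assumptions made for illustration.

```python
REMISSION_MAX = 7   # a HAM-D17 score of 7 or less
RELAPSE_MIN = 16    # a rise to 16 or more after remission


def percent_reduction(baseline, current):
    return 100.0 * (baseline - current) / baseline


def classify(baseline, current):
    """Label one follow-up score relative to the week 0 score."""
    if current <= REMISSION_MAX:
        return "remission"
    reduction = percent_reduction(baseline, current)
    if reduction >= 50:
        return "response"
    if reduction >= 25:
        return "early improvement"
    return "no clear improvement"  # label chosen for this sketch


def course(scores):
    """Label every follow-up score; the first score is the baseline."""
    baseline = scores[0]
    labels = []
    remitted = False
    for score in scores[1:]:
        if remitted and score >= RELAPSE_MIN:
            labels.append("relapse")
            remitted = False
        else:
            label = classify(baseline, score)
            remitted = remitted or label == "remission"
            labels.append(label)
    return labels


# Dr Scales' patient: 24 at week 0, then 20, 14, 12 and 8 as in the text;
# the week 5 value of 6 is a hypothetical score below the remission value.
weekly = [24, 20, 14, 12, 8, 6]
labels = course(weekly)
```

Applied to these scores, the sketch labels week 2 as early improvement, weeks 3 and 4 as response and week 5 as remission.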

The practical medical approach of Dr. Scales has had an impact on clinical

psychometrics going beyond the superficial approach of Dr. Gestalt.

Profound phenotyping in clinical psychiatry, e.g., endophenotypes, is

considered as the pathway between psychiatric disorders and the distal

genotypes ( 165 ). This deep phenotyping has been captured by the Newcastle

scales, with such items as sudden onset of depressive episode, diurnal

variation and morning worsening of depression ( 4 ). Similarly, this is reflected

in the double book-keeping behaviour in schizophrenia.

Page 111: Clinical Psychometrics


In Kant’s philosophic approach (Figure 1.1) the dichotomy between the phenomena and the noumena is covered by Wittgenstein’s ‘family resemblances’ in which the similarities between proximal and more distal phenotypes are referred to as ‘applied mathematics’ (166).

When Hotelling looked back on the first decade of using his principal

component analysis, he advised psychometricians to consult mathematical

experts rather than psychologists to improve the use of his analysis method.

The mathematician Georg Rasch put a stop to the use of factor analysis in the

1950s. Wittgenstein had at that time criticised Freud’s psychoanalysis as

being a method by which we never know when to stop in the process of free

association: Freud never showed the right solution ( 167 ).

Rasch criticised factor analysis as being a method in which we never know

when to stop the rotations.

As remarked by Putnam, the Wittgenstein approach was to bring our items

back to their homes by reference to the ‘family resemblances’ (7). Putnam

added that Wittgenstein could not have been so farsighted, had he not stood

on the shoulders of Kant. Similarly, we could not have developed clinical

psychometrics, had we not been standing on the shoulders of Kraepelin,

Hamilton, Pichot, Spearman, Hotelling and Rasch. Clinical psychometrics,

then, combines theories of measurement with the family resemblances in

clinical phenomenology, including deep phenotypings, i.e., theories of

clinical validity.

Page 112: Clinical Psychometrics

10 Epilogue: Who’s carrying Einstein’s baton?

In a certain sense, clinical psychometrics has followed the continuity usually found in clinical medicine: that of a relay race in which the older clinicians pass the baton to the younger generation. However, this is true only in a certain sense, as the ‘relay race’ has been more Platonistic in clinical psychometrics than in other branches of medicine. This epilogue attempts to give an answer as to who took over Einstein’s office mentally, that is, who received the psychometric baton.

As an example, it was Bengt Strömgren (professor of astronomy at the University of Copenhagen and brother of the Danish professor of psychiatry Erik Strömgren) who physically took over Einstein’s office at the Institute for Advanced Study at Princeton, New Jersey, USA, rather than someone designated by Einstein as his successor, or crown prince/princess (168). Often enough, it is a purely bureaucratic decision process that lies behind the choice of successor to a chair or an office, even that of a world famous scientist, rather than the selection of a natural successor within the particular field of research.

Figure 10.1 illustrates the more or less Platonistic office takeovers in the wake of the three great psychiatric clinimetricists (Kraepelin, Hamilton and Pichot). John Overall is still in his office in Texas and has yet to pass on his baton.

As regards Kraepelin, Professor Hanns Hippius took over the former’s office in 1971. In a certain sense, this office had remained empty during the fifty years following Kraepelin’s departure. During the period from 1926 to 1971, German psychiatry was marked by the two world wars, and in the US, Freud had taken over the scene (123). With the advent of Hippius, Kraepelin’s work in both psychopathology and pharmacology became concentrated around Kraepelin’s office and large library in Munich for all German-speaking psychiatrists.

Page 113: Clinical Psychometrics


Professor Jules Angst, whose 1966 thesis had demonstrated the importance of distinguishing between unipolar depression (patients suffering from recurrent depressive episodes but never mania) and bipolar affective disorder (patients suffering from both depressive and manic episodes), something Kraepelin had not covered sufficiently in his studies, was among the applicants for the Munich chair in 1971 and might have been preferred (169). However, he chose to withdraw his application in favour of a chair in Zurich that Hanns Hippius had also applied for. In an attempt to improve Kraepelin’s checklist in line with the international rating scale standard of the 1970s, Hippius and Angst developed a very comprehensive scale system, the AMDP (Arbeitsgemeinschaft für Methodik und Dokumentation in der Psychiatrie), in 1979; it is the most extensively used rating scale system in German-speaking countries (170).

During his Zurich period, Jules Angst continued his major work with

scales, demarcating the bipolar affective disorders, most recently with the

Hypomania Checklist – the HCL-32 ( 171 ). The HCL-32 or the American

Mood Disorder Questionnaire are both intended to capture the previous

history of the depressive patient to ascertain any possible ‘upswings’

(hidden bipolarity) (172). Introversion is often a characteristic of patients with recurrent depression but without signs of mania (unipolar depression). In the bipolar patient, extraversion is a more predominant personality type. Due to this, some of the items in the HCL-32 overlap with those of the MDQ and Eysenck’s EPQ-E.

Following Hanns Hippius’ retirement, Professor Hans Jürgen Möller took

over Kraepelin’s Munich office. Möller has continued work on the AMDP

system, but has been particularly preoccupied with modern psychometric

[Figure: diagram titled ‘Who got their offices?’ showing Kraepelin E, Hamilton M and Pichot P together with Lecrubier Y, Kay SR, Paykel ES, Angst J, Hippius H, Möller HJ, Lingjærde O, Klerman GL, Overall JE, Lindenmayer JP, Williams JBW and Rush J.]

Figure 10.1 Diagram of the psychiatrists who continued Kraepelin’s, Hamilton’s, Pichot’s and Overall’s pioneering work in scale construction

Page 114: Clinical Psychometrics


studies on Hamilton’s depression scale as an effect measure in antidepressive

medication ( 173 ). Möller has very recently published an important review of

rating scales in psychiatry with particular emphasis on methodological

issues ( 174 ).

Physically, Max Hamilton’s office at the University of Leeds was taken over by Mindham when Hamilton retired. However, Mindham was not particularly interested in further work on Hamilton’s scales. Eugene Paykel, Professor of Psychiatry at Cambridge University, developed Hamilton’s scales further in the UK.

In 1985, Paykel published his Clinical Interview for Depression (CID), the

first attempt to use a 0–6 Likert scale with both the Hamilton Depression

Scale and the Hamilton Anxiety Scale ( 84 ). During this process, Paykel

discarded some of the original items so as to keep the number within 36, as he had also added some new items. However, this modification meant that the CID never caught on, as national medical agencies all over the world insist that HAM-D17 or HAM-A14, respectively, are part of the documentation for the clinical effects of antidepressive or antianxiety drugs (175).

Most factor-analytic studies using the Hamilton Depression Scale are of a

more ‘invasive’ nature, carrying out various rotations of the factor structure.

Paykel’s CID study is among the few to assess only the un-rotated factor

structure (176). In his patient selection he compared, in particular, depressive hospitalised patients (N = 65) with outpatients (N = 100). He identified three factors, the first of which is a general factor and the second a bipolar, or dual, factor. The very important element in this study is Paykel’s demonstration that the symptoms that especially discriminate between inpatients and outpatients are the true core symptoms of depression and that these symptoms (lowered mood, guilt feelings, work and interests, psychomotor retardation) are also the ones that are negatively loaded in the dual factor, while the positively loaded symptoms are sleep problems, anxiety and irritability.

Paykel has made the most valuable standardisation of the original HAM-D17 (177). His London-based study took place in patients treated by their GPs. The antidepressive drug amitriptyline was compared to placebo in mildly to moderately depressed patients. The results showed that in patients with a HAM-D17 of 12 or less prior to start of therapy, placebo was just as effective as amitriptyline, while in patients with a HAM-D17 of 13 to 24 prior to start of therapy, amitriptyline was clearly better than placebo, and this effect was the same no matter whether the HAM-D17 start score was from 13 to 17 or from 18 to 25.

Following the development of psychopharmacological drugs in the 1950s,

many psychopharmacological societies were established outside the UK in

different parts of the world. Among the oldest, besides the parent association

Page 115: Clinical Psychometrics


Collegium Internationale Neuro-Psychopharmacologicum (CINP), is the

Scandinavian College of Neuro-Psychopharmacology (SCNP), which

celebrated its 50th anniversary in 2009. In comparison, the American College

of Neuropsychopharmacology (ACNP) celebrated its 50 years in 2011, while

the British Association for Psychopharmacology (BAP) will have to wait

until 2024.

In 1969, the SCNP set up a committee for clinical investigations under the acronym UKU (Udvalg for Kliniske Undersøgelser) with the Norwegian Professor Odd Lingjærde as chairman and one member from each of the other Scandinavian countries (1). Lingjærde arranged for the translation of the Hamilton Depression Scale into the different Scandinavian languages. The scale was then used in a UKU study demonstrating that lithium, in combination with tricyclic antidepressants, was significantly more effective than placebo in treatment-resistant depression (178). In the early 1980s, due to a surprisingly small number of side-effect reports on psychopharmacological drugs, the Swedish Medical Agency asked the UKU to design a reliable side-effects rating scale. This led to the very comprehensive UKU Side Effect Rating Scale (109), still the most comprehensive side-effect rating scale used. A UKU subscale has been constructed for use in connection with the newer antidepressants (4). In 1993, the UKU published a detailed review of rating scales measuring the wanted and unwanted effects of psychopharmacological therapy (179).

Figure  10.1 shows Klerman as the American ‘heir’ to the Hamilton

Depression Scale; he translated this scale into American English, making

such radical changes that Hamilton protested ( 1 ). However, Klerman’s ver-

sion was included in Guy’s Early Clinical Drug Evaluation (ECDEU) manual

( 92 ), which is used by the FDA and therefore, also by the pharmaceutical

industry.

Janet Williams developed the most internationally used structured inter-

view for the HAM-D ( 180 ). She also wrote a very important review of the

various versions of the HAM-D including the GRID-HAM-D ( 181, 182 ).

It falls naturally to mention John Rush in this connection, as he is viewed

as another American ‘heir’ of the Hamilton Depression Scale with his

Inventory of Depressive Symptomatology (IDS-30), which builds on the

HAM-D with extra items measuring the ‘atypical’ depressive symptoms of the

DSM-IV ( 183 ).

Professor Loo, who took over Pichot’s chair in Paris, made important

analyses with the HAM-D together with Marcelo Fleck and Professor

Guelfi (184). Professor Yves Lecrubier (1944–2010) has recently compared the HAM-D17 and the HAM-D6 (185) and has developed a neuropsychiatric interview, the MINI International Neuropsychiatric Interview (MINI)

Page 116: Clinical Psychometrics


together with Professor David Sheehan ( 73 ). As for the BPRS, which was

introduced by Pichot in Europe in collaboration with John Overall (the US

developer of the BPRS), further European progress seems to have been put

on hold after Pichot’s retirement, as the PANSS version, also American, has

become its successor. It was the collaboration between John Overall and

Pichot in Europe and John Overall, Don Gorham and Leo Hollister in the US

that inspired Overall to develop a clinical scale like the BPRS. Hollister

worked as a professor of psychiatry, although he was only trained as a

specialist in internal medicine (with particular interest in antihypertensive

medicine) and had no formal training as a psychiatrist. He became the

administrative head of the largest psychiatric hospital in USA at the time

when placebo-controlled trials were conducted in psychiatry in the 1950s.

Hollister undertook what was probably the first US placebo-controlled study

on chlorpromazine in schizophrenia with ‘between-groups-analysis’ as

opposed to ‘cross-over-analysis’. He found the BPRS clinically meaningful

compared to the Rorschach test on one hand and to the Minnesota

Multiphasic Personality Inventory (MMPI) on the other ( 89 ). Hollister

found it difficult to grasp how a psychiatrist working as a serious clinician

could be able to listen to and observe a patient while at the same time, as

described by Greenberg ( 123, 126 ), frantically leafing through and completing

a stack of the quite complex scales now required by the medical industry in

their study protocols.

In Figure  10.1 , Overall is placed on the same level as Pichot, as his BPRS

scale together with the Hamilton are the archetypical rating scales of the

50 years of psychopharmacological history. In 1988, Overall arranged a

symposium under the auspices of the New Clinical Drug Evaluation Unit

(NCDEU), sponsored by the National Institute of Mental Health (NIMH)

with the title: ‘The Brief Psychiatric Rating Scale (BPRS): Recent Developments

in Ascertainment and Scaling’. Here he stresses the importance of avoiding

too many changes in a scale widely used on an international basis ( 186 ). The

mere addition in 1965 of two items to the 1962 version, so that it now consists of 18 items, means that users of the most common BPRS-18 always, incorrectly, refer to his 1962 paper with the original 16-item version. Overall also mentions in his 1988 introduction that the ‘pain limit’ of a ‘brief’ scale is 18 items. The version he recommended in 1988 is shown in Figure 1.10.

Overall finally remarks that he would like to have added ‘elevated mood’ in

order to include the manic state (186, 193).

In the 1980s, at the Albert Einstein Medical Center in New York, Stanley R.

Kay (1946–90) developed ‘The Positive and Negative Syndrome Scale’

(PANSS) in collaboration with J.P. Lindenmayer ( 187 ). This scale is based on

the BPRS, with adequate anchorings in the individual items. The PANSS is

Page 117: Clinical Psychometrics


not a brief scale as it contains 30 items. An 11-item version is, however, with

reference to the BPRS, sufficient for the measurement of antipsychotic effect.

Other offices than those previously belonging to Kraepelin, Hamilton and

Pichot have, of course, also conducted studies in both Europe and the US in

particular to improve ICD-10 or DSM-IV with more complex rating scale

systems, but have misunderstood psychometrics by seeing item response

theory models as a special case of factor analysis ( 188,189 ).

As regards the offices of modern psychometrics, only that of Georg Rasch

in Copenhagen will be mentioned here. In a both Platonistic and physical

sense, Peter Allerup may be said to have taken over Rasch’s office after the

latter’s retirement, even though the chair Peter Allerup holds at the Danish

School of Education was not established until recently (as an institution

belonging to Århus University).

Europe has played the major role in this summary of clinical psychometrics. However, American psychometrics has also been important. In addition to the Likert scale and Siegel’s non-parametric statistics, one might mention Jane Loevinger’s coefficient of homogeneity, which, from a Platonistic point of view, moved to Amsterdam when Mokken included it in his non-parametric item response theory analysis after he had become familiar with Guttman’s model through Rasch.

At Johns Hopkins Hospital in Baltimore, Derogatis took over the SCL-90

baton from the psychiatrist and professor Jerry Frank, who studied the effect

of psychotherapy in anxiety and depression. However Derogatis’ main

interest lay in gaining a SCL-90 copyright by changing two items, resulting in

the SCL-90-R. The version used in Denmark, the SCL-92, covers both the

SCL-90 and SCL-90-R ( 157 ).

At Harvard University, psychology did not become detached from philosophy until Ralph Barton Perry’s time in office, as successor to William James.

Later on, Edwin Boring and then Fred Skinner, who took care of psychology

in Boston, downplayed the role of psychometrics. It was Willard Quine, who

became professor of philosophy in Boring’s time, whose set theory was more

in line with the field of psychometrics ( 190 ). Willard Van Orman Quine

(1908–2002) was appointed professor at Harvard in 1948. After the death of

Wittgenstein he was among the most influential philosophers in the English-

speaking world. Based on Russell’s theory of descriptions and typology,

Quine concluded in his book ‘From a logical point of view’ (1953) that to be

is to be the value of a variable. In other words, to be depressed is to have a

score on the HAM-D 6 of 9 or more.

Finally, it is worth mentioning here, that the University Hospital of Munich marked the 50th anniversary of Emil Kraepelin’s death by issuing a Kraepelin Gold Medal. This award was presented to Professor Erik Strömgren.

Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

Glossary

Allostasis When subjected to severe stress the human organism

attempts to attain a new stability in its hormonal and

nervous system at the cost of an increased cortisol

production. When this succeeds with a relatively

small increase in cortisol production, cortisol is

called a ‘tolerance hormone’.

Calvinism (Pharmacological) A concept introduced by the American psychiatrist

Gerald Klerman in 1972, referring to the fact that

psychopharmacologic drugs only have an effect on

mental disorders (depression, psychosis, mania, or

anxiety) and are not to be perceived as recreational

drugs in line with amphetamine or cannabis. This is

a reference to Calvin, who stated that life is predestined,

that God determines the course of our lives

from birth on, ‘as a doctor’s prescription’.

Clinimetrics A term introduced into medicine long after the first

rating scales were used in psychiatry. Alvan R.

Feinstein (1925–2001), professor of medicine

and epidemiology and the ‘father of clinical epidemiology’,

introduced the term ‘clinimetrics’.

Clinical psychometrics is clinimetrics in psychiatry.

Compliance (in filling in a questionnaire) When constructing a questionnaire, various methods

are used to ensure that the person completing the

questionnaire reads the questions properly. One

of the methods used is that of alternating between

positively and negatively worded questions. Experience

has shown that there are more disadvantages than

advantages in this method. If a questionnaire has

‘mixed’ questions, then it is possible to apply factor

analysis to find out whether the positively worded

questions constitute the one factor (the one pole) and

the negatively worded questions the other factor (the

opposite pole).

Correlation coefficient A mathematical expression of the correlation between

two variables. Francis Galton used as an example

the fact that persons with long arms often have

long legs as well when, as the first to do so, he

formulated a correlation coefficient in 1886; here

his position was that 1 meant perfect correlation, 0

meant no correlation, and -1 inverse, or negative,

correlation. One of his pupils, Karl Pearson,

developed the parametric correlation test, while

another pupil, Charles Spearman, developed the

non-parametric correlation test.

These correlation tests led on to factor analysis

(principal component analysis).

Factor analysis Introduced by Spearman in 1904 as a statistical

method by which the items in a rating scale or a

questionnaire are reduced to simple factors.

Spearman himself felt that his two-factor model

was adequate. The term ‘factor analysis’ is used in its

widest sense, especially to encompass principal

component analysis as developed by Hotelling [REF

24] in a mathematical version.

The first factor was called a general factor as it

demonstrated the degree of positive correlation

between the items (questions). The second factor

was termed a bipolar (dual) factor as it demon-

strated the items (questions) with a high degree of

correlation without this being the case for the

remaining items (questions).

This emerged through the factor loading signs.

The negative loadings form an independent, specific

scale, as do the positive loadings, that is to say two

specific scales.

Feighner criteria John P. Feighner (1937–2006) was an American

psychiatrist who was the first author of the 1972

paper: ‘Diagnostic criteria for use in psychiatric

research’ (Arch Gen Psychiatry 1972; 1:57–63)

which became the basis of DSM-III. These criteria

defined Major Depression as an algorithm in which

five of the nine depression symptoms must be

present to make this diagnosis. Feighner thought

that these symptoms were the same in primary and

secondary depression. Psychometrically, this is a

transferability issue (s.d.).

Primary depression A depressive state which cannot be explained as

secondary either to physical disorders (e.g.,

post-stroke depression) or to stress (stress-induced depression).

By analogy with hypertension, primary depression

can be termed essential, or idiopathic.

Psychoanalysis A diagnostic and a therapeutic method developed by

Freud. As a therapy psychoanalysis has been found

to be without effect on mental disorders (depression,

mania, schizophrenia).

Psychopharmacology The study of drugs acting on mental functions,

including their clinical effect (antidepressant,

antipsychotic, antimanic, or antianxiety) and their

fate when entering the organism, in terms of phar-

macodynamics and pharmacokinetics.

Reductionism When a complex questionnaire or rating scale is

reduced so that it covers the whole area and not just

a single aspect.

Relapse When a person suffers a setback over the following

months after obtaining freedom from symptoms. On

the HAM-D 17, a score of 13 is seen as a relapse score.

Reliability (questionnaire) The reliability of a questionnaire is often shown by

its test-retest coefficient, i.e., when two responses by

the same person, given with a period of about

2–3 weeks between completions, are in agreement

with each other. This reliability target depends, of

course, on the person’s unchanged condition in the

period between test and retest.

Reliability (rating scale) The reliability of a rating scale, when several

interviewers (clinicians) assess the same patient or

patients, is statistically shown by their intraclass

coefficient, where 1.0 means complete equivalence

and 0.6 only just an equivalence. All rating scales

included have a satisfactory reliability. The Rorschach

test, however, has an intraclass coefficient of only

0.40 or lower.

Remission Being relatively free of symptoms, i.e., a score of 7 or

lower on the HAM-D 17.

Response A sufficient reduction of symptoms during treat-

ment. A 50% reduction of symptoms from the time

when the treatment started is frequently used as a

measure. In dose–response studies, effect size is a

more distinctive response measure. Both methods

are universal measures as they do not depend on the

raw score of the rating scale used.

Standardisation The scale scores defined to indicate response, remis-

sion and relapse.

Transferability When a scale still measures the same dimension

each time it is applied several times during treat-

ment, or when different assessors rate the same sub-

ject, no matter whether their condition is primary or

secondary. Psychometrically, only item response

theory analyses are able to show whether transfera-

bility has been achieved.

Unidimensionality A rating scale is said to be unidimensional, when it is

accepted by Rasch analysis. Rasch analysis presup-

poses that scores on items with low prevalence are

preceded by scores on items with higher prevalence.

Items with low prevalence measure the more severe

degrees of the dimension to be assessed while items

with high prevalence measure the milder degrees.

Validity (clinical) Clinical validity means the degree to which a rating

scale or a questionnaire has clinical significance or is

clinically valid. After DSM-III, DSM-IV and ICD-10

had been introduced it became customary to use

these systems as an index of clinical validity. An

example is the Major Depression Inventory (MDI)

which has a high clinical validity (‘face validity’)

because its questions correspond with the depression

symptoms of DSM-IV major depression.

Validity (psychometric) Psychometric validity means that the rating scale or

the questionnaire has been analysed psychometrically,

e.g., by means of the item response theory

model to find out whether it is unidimensional, also

when women and men, or younger and older

persons are compared. This type of validity is also

called ‘internal’ validity.

Validity (external) External validity describes the degree to which a

scale correlates with factors outside the scale, e.g.,

dosage of a drug (dose–response relation) or

whether it is able to discriminate between treatment

with an active drug (verum) and an inactive drug

(placebo).

Visual analogue scale (VAS) An assessment method which measures the

dimension in question on a line from zero to 10

(centimeters) or from zero to 100 (millimeters).

Zero indicates that there is nothing to measure, and

10 or 100 indicate an extreme degree.

Window A term used for the time frame a rating scale covers,

e.g., the past three days. It is derived from considering

a rating scale as a camera, visualising clinical

reality (6).
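The glossary definitions of response, remission and relapse can be written down as simple rules. The following is a minimal sketch, not a validated scoring tool: the function names are hypothetical, the 50% response criterion and the HAM-D 17 cut-offs of 7 and 13 are taken from the glossary entries, and treating "a score of 13" as "13 or higher" is an assumption.

```python
def percent_reduction(baseline: float, current: float) -> float:
    """Percentage reduction of the total score relative to the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - current) / baseline


def is_response(baseline: float, current: float, threshold: float = 50.0) -> bool:
    """Response: at least a 50% reduction from the start of treatment."""
    return percent_reduction(baseline, current) >= threshold


def is_remission_hamd17(total: int) -> bool:
    """Remission: a HAM-D 17 total score of 7 or lower."""
    return total <= 7


def is_relapse_hamd17(total: int) -> bool:
    """Relapse: a HAM-D 17 total score of 13 (assumed here to mean 13 or higher)."""
    return total >= 13
```

Because the response rule uses a percentage, it does not depend on the raw score range of the scale, which is the point made in the Response entry.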

Appendix

Wundt, Kraepelin and Wittgenstein all stood on the shoulders of Kant

with their phenomenological approach, saying that when we know things

clinically, we then know how to bring symptoms or signs back to their respective

syndromes. They tried very consistently to avoid all etiological factors of

the clinical syndromes, focusing on the description alone. At the end of his

Philosophical Investigations Wittgenstein says: ‘Can one learn this descriptive

knowledge? Yes, some can. Not, however, by taking a course in it, but through

“experience”. Can someone else be a man’s teacher in this? Certainly. From

time to time he gives him the right tip’.

In this Appendix some of the right tips in the spirit of Wittgenstein are

indicated, clinically and psychometrically. To enable the clinician to make

more effective and economic use of his or her basically limited capacities

for handling scales we have focussed on brief scales. Thus, the Hamilton

Depression Scale has been decomposed into three familiar subscales (specific,

arousal, suicidal). It is easier to remember these three words than the

whole string of seventeen items in this scale.

The dialogue between the interviewer and his or her patient should be

considered as an informal conversation in which the task of the interviewer

is to give the patient a feeling of relief in knowing that the interviewer is thoroughly

familiar with the problems the patient had feared were private and

non-communicable. Throughout this Appendix Wittgenstein’s approach has

been followed when selecting and describing the various scales by ‘bringing

the items back to their respective syndromes’.

Table A.1 shows how the informal conversation is finally measured by a

total score which has been standardised.

Appendix 1 is the Hamilton Copenhagen Lecture which can be seen as a

paraphrase of Wittgenstein’s concept of phenomenological or descriptive

knowledge. The expert judgment about the general expression of feelings is

most valid, according to Wittgenstein (1953): ‘Most valid from the judgment

of those who understand by experience people better (des bessern Menschen

kennen).’

Figure A.1 The six steps in the translation procedure leading to the final acceptance by the developer of the scale, exemplified by the Danish version of HAM-D

Max Hamilton’s HAM-D → Danish version [1] and Danish version [2] → Consensus Danish version [3] → English version [4] and English version [5] → Consensus English back translation [6] → To be accepted by Max Hamilton [7]

Table A.1 Standardisation of three different depression scales: Hamilton Depression Rating Scale (HAM-D 17 and HAM-D 6) and Bech-Rafaelsen Melancholia Scale (MES)

                                     HAM-D 17   MES     HAM-D 6
Theoretic score range                0–52       0–44    0–22
Remission (relative zero point)      7          6       4

Degrees of clinical depression
Doubtful depression                  8–12       7–10    5–6
Mild depression                      13–17      11–14   7–8
Moderate depression                  18–24      15–24   9–11
Medium-severe to severe depression   25–52      25–44   12–22

Appendix 2 is an example of how to learn the use of the Hamilton

Depression Scale; the tips from the A, B, C version.

Appendix 3 contains a selection of depression scales; especially the

Melancholia Scale (191).

This collection of scales includes those mentioned several times in the

text, as well as others that merit a more detailed description, together with

their standardised values. When a psychometric analysis has shown that a

total score is a sufficient reduction of the information available in the indi-

vidual items, then the question naturally arises of how to interpret this score.

This is the meaning of standardisation (Table  A.1 gives an example).
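To illustrate what standardisation means in practice, the cut-offs of Table A.1 can be turned into a small lookup. This is a sketch only: the dictionary layout and the function name `severity` are hypothetical, while the numeric bands are transcribed from Table A.1.

```python
# Lower bounds of each severity band, transcribed from Table A.1,
# listed from the most severe band downwards.
BANDS = {
    "HAM-D17": [(25, "medium-severe to severe depression"), (18, "moderate depression"),
                (13, "mild depression"), (8, "doubtful depression"), (0, "remission")],
    "MES": [(25, "medium-severe to severe depression"), (15, "moderate depression"),
            (11, "mild depression"), (7, "doubtful depression"), (0, "remission")],
    "HAM-D6": [(12, "medium-severe to severe depression"), (9, "moderate depression"),
               (7, "mild depression"), (5, "doubtful depression"), (0, "remission")],
}

# Theoretical maximum total scores from Table A.1.
MAX_SCORE = {"HAM-D17": 52, "MES": 44, "HAM-D6": 22}


def severity(scale: str, total: int) -> str:
    """Return the Table A.1 severity band for a total score on the given scale."""
    if not 0 <= total <= MAX_SCORE[scale]:
        raise ValueError(f"{scale} total must lie between 0 and {MAX_SCORE[scale]}")
    for lower_bound, label in BANDS[scale]:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

For example, `severity("HAM-D6", 9)` falls in the moderate band, in line with the earlier remark that a HAM-D 6 score of 9 or more marks depression.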

Appendix 4 contains the Major Depression Inventory and Appendix 5 the

Hamilton Anxiety Scale.

Appendix 6 contains the Mania Scale (MAS) (192).

The interview based scales in the appendix contain both scoring sheets

and scoring manuals. Some of the scales included in the appendix consist of

items selected from more comprehensive scales. Thus Appendix 3f, 3g and 3i

are each made up of items taken from more comprehensive scales. Appendix

3h consists of items from the Hopkins Symptom Checklist (SCL-92).

Appendix 3i contains the six items in the Beck Depression Inventory (BDI

version I), corresponding to the six HAM-D 6 items.

Figure  A.1 demonstrates the translation procedure recommended by

WHO. HAM-D is used as an example, precisely because it is to be found in

so many different translations even within the same language area. This often

results in not knowing which of these versions was used in a specific study.

Often the reference is to Hamilton’s first English version from 1960, but this

version is not used any more, as Hamilton himself could not recommend it.

The Danish version is a very free translation. Its back translation into English

was published in 1986 after prior approval from Max Hamilton. The Danish

professor Ole Rafaelsen (1930–87) was primus motor here. He also played a

major part in the development of the MES and the MAS. Ole Rafaelsen also

made a back translation of BDI into US-English.

Appendix 1 The clinical validity of rating scales for depression, Copenhagen 1977

Hamilton M.

Rating scales are so extensively used in clinical trials that it is difficult to find

a report of a drug trial that does not use at least one scale.

The young psychiatrist will find it difficult to believe that the use of such

scales in psychiatry is still comparatively new. As little as forty years ago, a

leading British psychiatrist declared that to make a scale of different symptoms

and to add scores on them to produce a total was a meaningless procedure.

It was impossible! Opinions have changed so much that it has almost

become accepted that a clinical trial is not “scientific” without the use of a

rating scale. Of course, this is quite mistaken. Clinical trials have been carried

out without scales in the past and will be in the future. The excessive preoccupation

with scales has led almost to a sort of worship of them, and has

undoubtedly led to some misuse.

A rating scale is no more than a particular way of recording a clinical judgment.

The clinician puts down his opinion on the presence or absence of a

symptom, or its severity, but whether he does so in words or in the form of a

number, the judgment is the same. However, judgments are of different kinds

and have therefore to be recorded on appropriate scales. The commonest

type of scale is that used for recording severity of illness and is quite different

in nature from those which are related to the other kinds of clinical decisions,

e.g. making a diagnosis or selecting patients for treatment. Although rating

scales have a clearly defined role to play in clinical psychiatry it must not be

forgotten that it is a very limited one.

The fundamental basis of scales rests on the everyday work of the clinician.

The psychiatrist is accustomed to say of a patient “This patient is now

better or worse than last week”. The patients themselves can make the same

sort of judgment. Furthermore, a clinician can say “This patient is worse, or

better, than that patient”. The process can be carried further. If we can say that

one patient is more ill than another, we can place a group of patients in an

order, in which the first is the most severely ill, going down to the last who is

least ill. When we try to put a large number of patients into rank order, it is

easier to assemble them into groups which have a rank order. The experi-

enced clinician can remember the characteristics of such groups and can

allocate a patient to the appropriate group without making a direct compari-

son. This is what is meant by making a Global Judgment. The same proce-

dure can be applied to the individual manifestations of the illness, i.e. the

symptoms.

It is generally accepted in all branches of medicine that the more symptoms

a patient experiences, the more ill he is. This forms a crude but

surprisingly effective way of measuring the severity of illness. The doctor

goes through a list of symptoms and checks how many are shown by the

patient. The total checked is a measure of severity of illness. An obvious

improvement is to take into consideration the extent of a symptom. A severe

symptom should contribute more to the total score than a mild one. In other

words, the symptom is given a weight according to its severity. Such a system

of weighting converts a check-list into a rating scale, and it is clear that the

total score is merely a way of recording the clinician’s judgment.
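The step from check-list to rating scale described above can be sketched in a few lines. The symptom names and grades below are hypothetical illustrations, not data from the lecture; the 0 to 4 grading follows the scoring convention mentioned later in the text.

```python
def checklist_score(grades: dict[str, int]) -> int:
    """Check-list: count how many symptoms are present (grade above zero)."""
    return sum(1 for grade in grades.values() if grade > 0)


def rating_scale_score(grades: dict[str, int]) -> int:
    """Rating scale: weight each symptom by its severity grade (0 to 4) and add."""
    if any(not 0 <= grade <= 4 for grade in grades.values()):
        raise ValueError("severity grades must lie between 0 and 4")
    return sum(grades.values())


# Two hypothetical patients with the same three symptoms but different severity.
patient_a = {"depressed mood": 1, "insomnia": 1, "loss of weight": 1}
patient_b = {"depressed mood": 4, "insomnia": 3, "loss of weight": 2}

print(checklist_score(patient_a), checklist_score(patient_b))
print(rating_scale_score(patient_a), rating_scale_score(patient_b))
```

The check-list cannot separate the two patients, whereas the weighted total does, which is the improvement the paragraph describes.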

There arise immediately three questions which may seem naïve but which

are really very important. The first one goes as follows. Counting the number

of symptoms will show that a patient who has eighteen is more ill than one

who has six, but what do you do when one patient has six symptoms and

another has six completely different ones? How is a decision made then? One

answer is to say that no decision is made; but that is not the whole truth.

It is generally true that symptoms are not mutually exclusive, i.e. the

presence of one does not prevent the other appearing. In general, symptoms tend to be

associated with each other. This can be shown by calculating the correlations

between symptoms, when it is found that the correlations are all positive.

These positive correlations provide the mathematical justification for adding

the scores on the symptoms to make a total score. However, there are some

special circumstances in which a group of symptoms, all correlated positively

with each other, will have negative correlations with another group. This

shows that when the symptoms of the one group are present, those in the

other group will tend to be absent. One way of dealing with this situation is to

have two separate scales. From the clinical point of view, it is better to divide

the patients into two groups and to deal with them separately. This is only a

partial answer, but it serves to show that the question is not a simple one.
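The check described above, that all inter-item correlations are positive before scores are added, can be sketched as follows. The item names and scores are hypothetical and the helper names are illustrative only; the Pearson coefficient is used here as one possible choice of correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def all_pairs_positive(items: dict[str, list[float]]) -> bool:
    """True when every pair of items correlates positively."""
    return all(pearson(items[a], items[b]) > 0 for a, b in combinations(items, 2))


# Hypothetical severity grades (0 to 4) for six patients.
items = {
    "depressed mood": [0, 1, 2, 3, 4, 4],
    "insomnia": [0, 1, 1, 2, 2, 2],
    "work and interests": [1, 1, 2, 3, 3, 4],
}
# A hypothetical item that runs against the others, as in the special case in the text.
with_opposing_item = dict(items, agitation=[4, 3, 2, 2, 1, 0])
```

Here `all_pairs_positive(items)` holds, so adding the three items is defensible, while the fourth item would call for a separate scale or a separate patient group.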

The second question asks “How is the weighting determined?” If a symptom

is absent, it is scored zero. If it is trivial, mild, moderate or severe, it is scored 1,

2, 3 or 4 respectively. Why not 1, 2, 4 or 8 respectively? And how is a comparison

made between 1 to 4 for depression and 1 to 4 for paranoid thinking?

The second part is very similar to the first question and again the answer

is not a simple one. There are technical ways of determining what should be

a value appropriate to every category of symptoms and every grade of sever-

ity. But in general, these complicated techniques give an answer which is very

much the same as the simple ones. The difference is so small that it is not

worth the trouble. However, in some circumstances simple crude weights are

unsatisfactory.

The third question is the one which worried psychiatrists 40 years ago. How

can one add scores on depression, loss of weight and loss of libido and obtain

a total which makes any sense? It does not appear to be capable of having any

meaning. There is some truth in this but it misses the point. To the patient,

one of the most important aspects of mental illness, as of all illnesses, is that it

is a loss of functional capacity. The patient suffers from disabilities: he cannot

work, he cannot sleep, he cannot carry on life in the usual way, and each extra

symptom is, in a sense, an additional burden on him. When we add scores we

are not so much adding scores on depression, loss of weight or loss of libido,

as adding up measures of disability. It is disability which is common to all the

symptoms and so a total score represents, in a way, the suffering of the patient.

These three questions seem to be concerned with very simple elementary

points, but in fact although they are simple, they are not elementary.

The most important classification of scales is that which distinguishes

between those used by an observer and those used by the patient.

Each type of scale has its advantages and disadvantages. For example, the

observer scale will include items on information which a patient cannot give.

By definition, a patient cannot describe his loss of insight nor can he say that

he has delusions, although he may say that he has hallucinations. The observer

scale when used by an experienced clinician, can record very small and deli-

cate changes, which are difficult for the inexperienced person and especially

for the patient, to recognize. However, they do take a long time; even half an

hour’s interview is, in my opinion, not really enough.

A disadvantage of the self-rating scale is that the patients are likely to fill in

the form about their condition with the help of wife or husband. If they make

daily assessments and take home the forms, then the children, grandparents

and cousins will come to help. Even the milkman and butcher may offer

assistance to help fill in the scale! The self-assessment scale has the great

advantage that it is easy to use repeatedly. A patient can be asked to describe

how he feels today or even this hour. Most observer scales have difficulties

over this and some scales make an assessment covering a period of a week or

two weeks.

In the end, there is no such thing as the best scale for all circumstances, all

patients and under all conditions. The clinician who is going to use a scale

must decide what he wants to get from it; what is the information he is look-

ing for. The two types of scale give different information. Two important

requirements are high validity and reliability, and this is found in most

observer scales. Validity signifies that scales measure truly what is wanted of

them. One way of measuring validity is to compare a group of severely ill

patients with a group which is only moderately ill. If the first group obtains

high scores and the second low scores, we can say that the validity is high.

High (inter-rater) reliability means that if two raters use the scale at the same

time, the scores they obtain will be very close. It is an astonishing fact that

rating scales can be more accurate and reliable than some physical measure-

ments. A last word on these points: a clinician should ask himself not only

what he wants to measure and how, but also why. This last question is not

asked sufficiently often.
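Inter-rater reliability of this kind is commonly summarised by an intraclass coefficient. The sketch below uses the one-way random-effects form, often written ICC(1,1); this choice of variant is an assumption, since the lecture does not say which coefficient is meant, and the ratings are hypothetical.

```python
def icc_oneway(ratings: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1) for a table of n patients by k raters.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB is the mean square
    between patients and MSW the mean square within patients.
    Undefined (division by zero) when every rating in the table is identical.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients, each rated by the same k >= 2 raters")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical total scores given by two raters to three patients.
perfect_agreement = [[10, 10], [20, 20], [30, 30]]
close_agreement = [[10, 12], [20, 18], [30, 31]]
```

With identical scores from both raters the coefficient is 1.0; small disagreements pull it below 1.0 while it stays high as long as the patients differ far more from each other than the raters do.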

Originally scales were validated against a global judgment, i.e. when a scale

was designed it was tested by comparing the results obtained with the scale

against the physicians’ judgment. This took priority and determined whether

the scale could be regarded as satisfactory. Now that rating scales are regarded

as acceptable for assessment, we can reverse the process, as I have been sug-

gesting for many years. We can use the scales to look at global judgment, to

examine what the psychiatrist does and how he does it. In this respect, one of

the most interesting pieces of research is one carried out here in Copenhagen

by Per Bech and his colleagues. What they showed was that the Hamilton

scale did correlate very well with global judgment except at the most severe

levels. Furthermore, they found that to get an exact or a better correspond-

ence between the scale and global judgment, the full 17 items were unneces-

sary. Six of them did all the work and the other 11 were, so to speak, passengers

which just interfered with the work.

I think it is not an accident that 6 items are sufficient to equal the global

judgment. We know from research by psychologists in all sorts of ways that

the human mind is capable of holding, on an average, only 7 items of infor-

mation. There is a very famous paper published on this “The magic number

seven, plus or minus two”. The fact that 6 items in the scale do the work of

global judgment suggests that what the clinician is doing is to hold in his

mind about 6 or 7 items of information and this is what he assembles into his

judgment. Of course, which items he assembles is another matter. Bech and

his colleagues showed that the items which played little or no part were either

those which did not occur often or those which the physician thinks are not

important.

It would also appear that the weight given to a particular symptom is not

the same at all levels of severity. I suspect that when a symptom begins to be

very severe, it is given increasing importance. A depressive patient, if actively

suicidal, presents a crisis situation to the physician; whatever the other

symptoms may be, they are overshadowed.

When suicidal thinking is mild, it takes its place with the other symptoms,

but as it becomes more severe, the physician takes more and more notice of it

and less and less of the other symptoms.

References

Bech, P., Gram, L.F., Dein, E., Jacobsen, O., Vitger, J., & Bolwig, T.G. (1975) Quantitative

rating of depressive states. Acta Psychiatrica Scandinavica, 51, 161–70.

Miller, G.A. (1956) The magical number seven plus or minus two: some limits on our

capacity for processing information. Psychological Review, 63, 81–97.

Appendix 2 The ABC profile of the HAM-D 17

With the introduction of the new classification systems of psychiatric

disorders (ICD-10 and DSM-IV) two decades ago it became impossible to

distinguish between primary and secondary (stress-related) depression [1].

The stimulus-response models for both PTSD (one single, severe life

event) and for exhaustion depression (multiple distressing life events) are

placed within the anxiety disorders in the ICD-10 and DSM-IV, although the

delayed distress response in these syndromes often progresses into the full

clinical picture of depression when untreated. The most internationally valid

measure of depressive states is the Hamilton Depression Scale (HAM-D 17) [1].

Figure A 2.1 shows how the 17 items in the HAM-D can be re-allocated

following the triangle corners so that “A” covers the core items of the

depressive state (HAM-D 6), while “B” covers the unspecific stress (arousal)

items with reference to Selye’s original definition of stress as the non-specific

response of the body to any demand made upon it [2]. Finally, “C” covers the

items of suicidal thoughts and lack of insight. In a patient with primary or

secondary depression, suicidal thoughts are often activated if there is a lack

of insight on the part of the patient into his disorder [3].

When Hamilton developed his scale [4], he consulted Kraepelin’s original

description of primary depression (manic-depressive illness), as well as

Kraepelin’s description of secondary depression (exhaustion depression).

However, Hamilton also conducted focus interviews with his depressed patients

and their relatives [4]. This was the background for his selection of the 17 items

in the HAM-D.

Psychometric analyses with either principal component analysis or

item-response theoretical models [5] have shown that the HAM-D 6 (A in Figure A 2.1)

is a valid measure of depression and thereby the most specific outcome

measure of the effect of antidepressant medication.

Figure A 2.1 ABC version of the Hamilton Depression Scale (HAM-D)

(A) The pure depression picture, HAM-D 6 total score: 1. Depressed mood; 2. Guilt; 7. Activities and interests; 8. Psychomotor retardation; 10. Anxiety, psychic; 13. Somatic symptoms, general

(B) The stress-related arousal, HAM-D 9 total score: 4. Insomnia: initial; 5. Insomnia: middle; 6. Insomnia: late; 9. Psychomotor agitation; 11. Anxiety, somatic; 12. Gastrointestinal symptoms; 14. Sexual disturbances; 15. Hypochondriasis; 16. Weight loss

(C) The suicide risk behaviour, HAM-D 2 total score: 3. Suicidal thoughts; 17. Insight

HAM-D 17 total score: A + B + C
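The ABC profile amounts to three subscale sums over the 17 item scores. The sketch below assumes the conventional HAM-D 17 item numbering (1 = depressed mood through 17 = insight) and the allocation of items to corners A, B and C shown in Figure A 2.1; the names `ABC_ITEMS` and `abc_profile` are hypothetical.

```python
# Item numbers per corner of the ABC triangle (assumed standard HAM-D 17 numbering).
ABC_ITEMS = {
    "A": (1, 2, 7, 8, 10, 13),               # HAM-D 6: the pure depression picture
    "B": (4, 5, 6, 9, 11, 12, 14, 15, 16),   # HAM-D 9: the stress-related arousal
    "C": (3, 17),                            # HAM-D 2: the suicide risk behaviour
}


def abc_profile(item_scores: dict[int, int]) -> dict[str, int]:
    """Return the A, B and C subscale totals plus the HAM-D 17 total."""
    missing = set(range(1, 18)) - set(item_scores)
    if missing:
        raise ValueError(f"missing item scores: {sorted(missing)}")
    profile = {
        corner: sum(item_scores[i] for i in items)
        for corner, items in ABC_ITEMS.items()
    }
    profile["total"] = profile["A"] + profile["B"] + profile["C"]
    return profile
```

Reporting the three subtotals next to the total keeps a change on the arousal items (B) from being read as a change in the core depressive state (A).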

When evaluating the specific antidepressive effect of an intervention we

need to focus on the HAM-D 6 [1]. The theoretical score range of the

HAM-D 6 goes from 0 to 22, whereas the theoretical score range of the whole

HAM-D 17 goes from 0 to 52. In other words, the explained variance of the

HAM-D 6 theoretically covers no more than approximately 40% of the

HAM-D 17. In patients with major depression, however, the HAM-D 6

typically explains over 50% of the total score of the HAM-D 17. For instance, in

the STAR*D study the HAM-D 6 explained 53% of the variance in the

baseline data set [5].
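One way to read the "approximately 40%" figure is as the ratio of the two theoretical maxima quoted above. This is an interpretation rather than something the text states, and the names below are hypothetical.

```python
HAMD6_MAX = 22   # theoretical maximum of the HAM-D 6, as quoted in the text
HAMD17_MAX = 52  # theoretical maximum of the HAM-D 17, as quoted in the text


def max_share(sub_max: int, full_max: int) -> float:
    """Share of the full scale's theoretical maximum carried by a subscale."""
    return sub_max / full_max


share = max_share(HAMD6_MAX, HAMD17_MAX)
print(f"{share:.0%}")
```

On this reading the six core items can contribute at most about two fifths of the maximum total, so an observed share above 50% in depressed patients means the core items carry more than their nominal weight.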

The nine items covered by the HAM-D 9 (B in Figure A 2.1) measure the

unspecific stress reaction in the body. Antidepressants with antihistamine

effects are often superior to selective serotonin reuptake inhibitors (SSRIs)

on the HAM-D 9 items [5]. Activation of the hypothalamic-pituitary-adrenal

(HPA) axis resulting in high cortisol levels in the body is a dysregulation that

accompanies depression as an unspecific reaction, i.e., it should not be seen

as the cause of primary depression. In the STAR*D study, the HAM-D 9

explained 41% of the variance [5].

The risk of suicide discussed in connection with initial SSRI treatment of

depressed patients might reflect an activation on the HAM-D 9 relative to

the HAM-D 6. When prescribing SSRIs, it is therefore important to assess

the ABC profile of the HAM-D 17. In the daily routine therapy of patients

with depressive illness the most valid way to monitor outcome is the

ABC profile.

For untrained young doctors being educated in the use of the HAM-D17, the ABC profile is a simple way of recalling how the items in the HAM-D6, HAM-D9, and HAM-D2 are best applied. It is recommended to start the interview from corner B, as these unspecific symptoms are easiest to capture, and then go on to A and finish with C. Actually, this order is also the way in which the spontaneous PTSD syndrome develops. During the first weeks, the HAM-D9 symptoms develop, and after some months the symptoms covered by the HAM-D6 appear. In PTSD cases that do not remit, symptoms in the HAM-D2 should be carefully assessed.

The use of the ABC profile in the HAM-D interview should give the depressed patient a feeling of relief, as the interviewer seems to be thoroughly familiar with the kind of illness that confronts the patient and to be acquainted with the kind of feelings and thoughts that depression brings. This is a vital start of the treatment process in the patient-doctor relationship. The evaluation of the HAM-D9 items (unspecific arousal items) is important when measuring outcomes of antidepressive treatment because they might overlap with the side-effects of the medication prescribed. The use of a scale for the assessment of tolerable versus intolerable side-effects, as in the STAR*D study, is an important supplement to the ABC profile of the HAM-D17.

References

1 Bech P. Struggle for subtypes in primary and secondary depression and their mode-specific treatment or healing. Psychother Psychosom. 2010; 79(6): 33–38.

2 Selye H. The evolution of the stress concept. Am Sci. 1973; 61: 692–699.

3 Bech P, Olsen LR, Nimeus A. Psychometric scales in suicide risk assessment. In: Wasserman D, editor. Suicide – an unnecessary death. London: Martin Dunitz; 2001. p. 147–158.

4 Bech P. Fifty years with the Hamilton scales for anxiety and depression. A tribute to Max Hamilton. Psychother Psychosom. 2009; 78(4): 202–211.

5 Bech P, Fava M, Trivedi MH, Wisniewski SR, Rush AJ. Factor structure and dimensionality of the two depression scales in STAR*D using level 1 datasets. J Affect Disord. 2011; 132: 396–400.


Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

Appendix 3a Hamilton Depression Scale (HAM-D17)

The time frame (window) is the past three days.

Scoring sheet

Nr.    Symptom                                  Score
1 *    Depressed mood                           0–4
2 *    Low self-esteem, guilt                   0–4
3      Suicidal thoughts                        0–4
4      Insomnia: initial                        0–2
5      Insomnia: middle                         0–2
6      Insomnia: late                           0–2
7 *    Work and interests                       0–4
8 *    Psychomotor retardation                  0–4
9      Psychomotor agitation                    0–4
10 *   Anxiety, psychic                         0–4
11     Anxiety, somatic                         0–4
12     Gastrointestinal symptoms (appetite)     0–2
13 *   Somatic symptoms, general                0–2
14     Sexual disturbances                      0–2
15     Hypochondriasis (somatisation)           0–4
16     Insight                                  0–2
17     Weight loss                              0–2
       Total score                              0–52

* Depression factor (HAM-D6)

Sum

No depression: 0–7

Doubtful depression: 8–12

Mild depression: 13–17

Moderate depression: 18–24

Severe depression: 25–52
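The cut-off scores listed above can be applied mechanically once the 17 item ratings have been summed. The following is a minimal sketch of such a lookup; the function name `classify_hamd17` and the table `HAMD17_BANDS` are hypothetical helpers introduced for illustration, not part of the scale.

```python
# Severity bands for the HAM-D17 total score, as listed on the scoring sheet.
HAMD17_BANDS = [
    (0, 7, "No depression"),
    (8, 12, "Doubtful depression"),
    (13, 17, "Mild depression"),
    (18, 24, "Moderate depression"),
    (25, 52, "Severe depression"),
]


def classify_hamd17(total: int) -> str:
    """Return the severity label for a HAM-D17 total score (0-52)."""
    for low, high, label in HAMD17_BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"HAM-D17 total must be between 0 and 52, got {total}")


print(classify_hamd17(19))
```

Keeping the bands in a table rather than in nested conditions makes it easy to reuse the same lookup for other scales with different cut-offs.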

Hamilton Depression Scale (HAM-D17)

Manual

1. Depressed mood

This item covers both the verbal and the non-verbal communication of sadness, depression, despondency and hopelessness.

0: Absent.

1: Slight tendency to despondency or sadness.

2: Clearer indications of lowered mood, moderately depressed but no hopelessness.

3: Mood significantly lowered, perhaps non-verbal signs (e.g. weeping). Reports hopelessness.

4: Mood severely lowered, clear signs of hopelessness.

2. Self-depreciation and guilt feelings

This item covers lowered self-esteem with guilt feelings.

0: No self-depreciation or guilt feelings.

1: Lowered self-esteem in relation to family, friends or colleagues, feeling him-/herself to be a burden during present depressive state.

2: Indications of guilt feelings more clearly present because the patient is concerned with incidents in the past prior to current episode (minor omissions or failings).

3: Feels that the current depressive suffering is some sort of punishment. However, still intellectually able to recognize that this is hardly correct.

4: Guilt feelings and impression that current depressive condition is a punishment, cannot be persuaded otherwise (delusion).

3. Suicidal impulses

0: Absent.

1: The patient feels that life is not worthwhile, but expresses no wish to die.

2: The patient wishes to die (e.g. not waking up the next morning), but has no plans to take his/her own life.

3: Vague, but still active plans to take own life.

4: Has certain plans to take own life.

4. Initial insomnia

Ask about the last three nights irrespective of possible sedatives.

0: Absent.

1: At least on one night awake in bed more than half an hour trying to fall asleep.

2: Each night awake in bed more than half an hour trying to fall asleep.

5. Middle insomnia

The patient wakes up one or more times between midnight and 5 a.m. Ask about the last three nights irrespective of possible sedatives.

0: Absent.

1: Wakes up once or twice during the last 3 nights.

2: Wakes up at least once every night.

6. Delayed insomnia = Premature awakening

The patient wakes up before planned. Ask about the last three nights irrespective of possible sedatives.

0: Absent.

1: Once woken up an hour or more before planned.

2: Consistently woken up an hour or more before planned.

7. Work and interests

0: No problems.

1: Slight problems with usual daily activities (at home or outside home).

2: More pronounced insufficiency but still only moderate.

3: Problems managing routine tasks, only completed with major effort.

Clear signs of helplessness.

4: Completely unable to go through with routine activities without aid,

i.e. extreme helplessness.

8. Psychomotor retardation

0: Absent.

1: Patient’s usual motor level of activity only slightly reduced.

2: Clearer signs of reduced motor activity, e.g. moderately reduced gesticulation and slow pace or moderately slowed speech.

3: The interview is clearly prolonged or made difficult due to brief answers.

4: The interview is very difficult to complete due to verbal retardation and/or extremely reduced motor activity.

9. Psychomotor agitation

0: Absent.

1: Slight motor agitation, e.g. tendency to change position in chair or scratch head.

2: Clearer signs of motor agitation; wringing hands, moderate problem

sitting still in chair, but remains seated.

3: The patient gets up from chair once during interview.

4: The patient so agitated that he/she has to get up and pace about several

times during interview.

10. Anxiety (psychic)

0: Absent.

1: Slight worrying and fear.

2: Clearer indications of psychic anxiety, appears moderately worried, insecure or afraid, but still able to control insecurity.

3: Psychic anxiety and worry so pronounced that it is difficult for patient to control; at times impact on daily activities.

4: Psychic anxiety very pronounced; constant impact on daily activities.

11. Anxiety (somatic)

This item includes physiological or autonomic anxiety phenomena. Psychic tension should be rated in item 10.

0: Absent.

1: Slight tendency to somatic anxiety symptoms such as stomach upset, sweating or trembling.

2: Clearer indications of somatic tension, e.g. moderate stomach upset, palpitations, sweating or tremor. Still without impact on daily life.

3: Somatic anxiety so pronounced that the patient experiences difficulty

controlling this. At times impact on daily life.

4: Somatic anxiety extremely pronounced; fairly constant impact on daily life.

12. Somatic, Gastro-intestinal

Symptoms have impact on entire gastro-intestinal tract. Dry mouth, loss of appetite, and constipation are among the most frequent symptoms. Upset stomach (“butterflies in the stomach”) is an autonomic somatic anxiety manifestation to be assessed in item 11. A feeling that the “stomach disintegrates” is a nihilistic paranoid manifestation of hypochondriasis and should be assessed in item 15.

0: Absent.

1: Slightly reduced appetite or food intake about normal, but without enjoyment.

2: Appetite moderately or extremely reduced. Still eats, as he/she recognizes that this is important.

13. Somatic, General

This item is about feelings of fatigue and exhaustion, reduced energy, but also diffuse muscular aches and pains in neck, shoulders, back or limbs.

0: Absent.

1: Slight fatigue, muscle pains or perhaps headache.

2: Moderate or pronounced fatigue or muscle pains.

14. Sexual interest

This item is about reduced libido or interest. It is often difficult to approach, especially in older patients.

0: No disturbances.

1: Mild disturbances.

2: Moderate to severe disturbances.

15. Hypochondriasis

0: Absent.

1: Slight preoccupation with bodily functions.

2: Clear indications of concern as to somatic health. Appears moderately afraid that he/she is somatically ill, somatises depression but at a “neurotic” level.

3: Hypochondriasis more pronounced. The patient is convinced that he/she is suffering from a somatic condition (e.g. fear of cancer), but can be persuaded that this is not the case for a short while.

4: Hypochondriasis extremely pronounced, paranoid delusions. Often

nihilistic: “rotting insides”; “stomach disappearing”.

16. Loss of insight

This item is, of course, only meaningful if the observer is convinced that the patient is still in a depressive state at the interview.

0: The patient agrees to having depressive symptoms or a “nervous” illness.

1: The patient still agrees to being depressed, but feels this to be secondary to non-illness related conditions like malnutrition, climate, overwork.

2: Denies being ill at all. Delusional patients are by definition without insight. Enquiries should therefore be directed to the patient’s attitude to his/her symptoms of Guilt (item 2) or Hypochondriasis (item 15), but other delusional symptoms should also be considered.

17. Weight loss

Try to get objective information; if such is not available be conservative in estimation.

0: No weight loss.

1: Weight loss less than 2 kg.

2: Weight loss of 2 kg or more.

ABC version of the Hamilton Depression scale (HAM-D)

(A) Pure depression (HAM-D6)
1. Depressed mood
2. Guilt
7. Activities and interests
8. Psychomotor retardation
10. Anxiety, psychic
13. Somatic symptoms – general
HAM-D6 total score:

(B) Stress-related arousal (HAM-D9)
4. Insomnia: initial
5. Insomnia: middle
6. Insomnia: late
9. Psychomotor agitation
11. Anxiety, somatic
12. Gastrointestinal symptoms
14. Sexual disturbances
15. Hypochondriasis
17. Weight loss
HAM-D9 total score:

(C) Suicide risk behaviour (HAM-D2)
3. Suicidal thoughts
16. Insight
HAM-D2 total score:

HAM-D17 total score: (A+B+C)
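The ABC version above amounts to three subscale sums over one set of 17 item ratings. The following is a minimal sketch of how the profile could be computed; the function `abc_profile`, the table `ABC_ITEMS` and the example ratings are hypothetical and only illustrate the grouping of items.

```python
from typing import Dict

# Item groups of the ABC version (item numbers of the HAM-D17).
ABC_ITEMS = {
    "A": (1, 2, 7, 8, 10, 13),              # pure depression, HAM-D6
    "B": (4, 5, 6, 9, 11, 12, 14, 15, 17),  # stress-related arousal, HAM-D9
    "C": (3, 16),                           # suicide risk behaviour, HAM-D2
}


def abc_profile(ratings: Dict[int, int]) -> Dict[str, int]:
    """Sum the item ratings within each corner and add the HAM-D17 total."""
    missing = set(range(1, 18)) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for items: {sorted(missing)}")
    profile = {corner: sum(ratings[i] for i in items)
               for corner, items in ABC_ITEMS.items()}
    # The HAM-D17 total is the sum of the three corners (A+B+C).
    profile["total"] = sum(profile.values())
    return profile


# Hypothetical ratings for one interview (illustrative values only).
example = {1: 3, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 3, 8: 2, 9: 0,
           10: 2, 11: 1, 12: 1, 13: 2, 14: 1, 15: 0, 16: 0, 17: 1}
print(abc_profile(example))
```

Reporting the three corner scores next to the total keeps the specific depression items (A) visible, rather than letting them be masked by changes in the arousal items (B).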

Appendix 3b Hamilton Depression Scale (HAM-D24)

Scoring sheet

Nr.    Symptom                                  Score
1      Depressed mood                           0–4
2      Low self-esteem, guilt                   0–4
3      Suicidal thoughts                        0–4
4      Insomnia: initial                        0–2
5      Insomnia: middle                         0–2
6      Insomnia: late                           0–2
7      Work and interests                       0–4
8      Psychomotor retardation                  0–4
9      Psychomotor agitation                    0–4
10     Anxiety, psychic                         0–4
11     Anxiety, somatic                         0–4
12     Gastrointestinal symptoms (appetite)     0–2
13     Somatic symptoms, general                0–2
14     Sexual disturbances                      0–2
15     Hypochondriasis (somatisation)           0–4
16     Insight                                  0–2
17     Weight loss                              0–2
18     Diurnal variation                        0–2
19     Depersonalization and derealisation      0–4
20     Paranoid symptoms                        0–4
21     Obsessional and compulsive symptoms      0–2
22     Helplessness                             0–4
23     Hopelessness                             0–4
24     Worthlessness                            0–4
       Total score                              0–76

Hamilton Depression Scale (HAM-D24)

Manual

18. Diurnal variation

0: None.

1: Mild.

2: Severe.

19. Depersonalization and derealization

Such as: feelings of unreality, nihilistic ideas.

0: Absent.

1: Mild.

2: Moderate.

3: Severe.

4: Incapacitating.

20. Paranoid symptoms

0: None.

1: Suspicious.

2: Ideas of reference.

3: Delusions of reference and persecution.

4: Hallucinations.

21. Obsessional and compulsive symptoms

0: Absent.

1: Mild.

2: Severe.

22. Helplessness

0: Not present.

1: Patient reports mild feelings of helplessness.

2: Moderate feelings of helplessness.

3: Strong feeling of helplessness.

4: Strong feelings of helplessness AND has given up routine activities of normal life (decreased personal hygiene, doesn’t get out of bed, difficulty feeding self, etc.).

23. Hopelessness

Pessimistic about future.

0: Not present.

1: Very mild feelings of hopelessness.

2: Feels “hopeless” but accepts reassurances.

3: Expresses feelings of discouragement, despair, pessimism about future,

which cannot be dispelled.

4: Inappropriately perseverates, “I’ll never get well” or equivalent.

24. Worthlessness

Ranges from mild loss of esteem, feelings of inferiority, self-deprecation to delusional notions of worthlessness.

0: Not present.

1: Very mild feelings of low self-esteem.

2: Feelings of worthlessness.

3: Strong feelings of worthlessness.

4: Delusions of worthlessness, “I am a sinner”.


Appendix 3c ABC Version of the Montgomery-Åsberg Depression Scale (MADRS10)

(A) Specific depression state (MADRS6)
1. Apparent sadness (0–6)
2. Reported sadness (0–6)
3. Inner tension (0–6)
7. Lassitude (0–6)
8. Inability to feel (0–6)
9. Pessimistic thoughts (0–6)
MADRS6 total score:

(B) Unspecific (arousal) state (MADRS3)
4. Reduced sleep (0–6)
5. Reduced appetite (0–6)
6. Concentration (0–6)
MADRS3 total score:

(C) Suicide risk behaviour (MADRS1)
10. Suicidal thoughts (0–6)

MADRS10 total score: (A+B+C)

Appendix 3d The Bech-Rafaelsen Melancholia Scale (MES)

The time frame (window) is the past three days.

Scoring sheet

No.    Symptom                        Score
1      Depressed mood                 0–4
2      Tiredness                      0–4
3      Work and interests             0–4
4      Concentration difficulties     0–4
5      Sleep disturbances             0–4
6      Psychic anxiety                0–4
7      Emotional introversion         0–4
8      Worthlessness and guilt        0–4
9      Suicidal thoughts              0–4
10     Decreased verbal activity      0–4
11     Decreased motor activity       0–4
       Total score                    0–44

No depression: 0–6

Doubtful depression: 7–10

Mild depression: 11–14

Moderate depression: 15–24

Severe depression: 25–44

The Bech-Rafaelsen Melancholia Scale (MES)

Manual

Item 1 Depressed mood

0: Not depressed

1: Slight tendencies to lowered spirits

2: More clearly preoccupied with unpleasant feelings although without

clear hopelessness

3: Markedly lowered mood. Feelings of hopelessness clearly present and/or

clear non-verbal signs of lowered mood

4: Severe degrees of lowered mood. Pronounced degree of hopelessness

Item 2 Tiredness

0: Not present

1: Very mild feelings of tiredness

2: More clearly in a state of tiredness or weakness, but still no impairment

on the daily life activities

3: Marked feelings of tiredness which occasionally interfere with the daily

life activities

4: Extreme feelings of tiredness which interfere more constantly with the

daily life activities

Item 3 Work and interests

0: No difficulties in social life (work) activities or interests

1: Slight problems with usual daily activities (at home or outside home)

2: Clearer insufficiency in social life activities or interests, but without helplessness

3: Difficulties in performing even daily routine activities, which are carried

out with great effort. Tendencies to helplessness

4: Completely unable to go through with routine activities without aid

from others, i.e. extreme helplessness

Item 4 Concentration difficulties

This item includes both concentration difficulties and memory problems

0: Not present

1: Very mild tendencies to concentration disturbances

2: Clearer difficulties in concentration or problems in decision making, but still without impact on daily life activities

3: Concentration disturbances/memory problems so great that reading more than newspaper headlines or watching even a short television programme is difficult

4: It is clear even during the interview that there are difficulties in concentration

Item 5 Sleep disturbances

This item only covers the subjective experience of reduced sleep length (hours of sleep/24 hours), irrespective of possible sedatives. The assessment should be based on the three preceding nights. The score is the average of the past three nights.

0: No reduced sleep length

1: Duration of sleep slightly reduced

2: Duration of sleep clearly but still only moderately reduced, i.e. still less than a 50% reduction

3: Duration of sleep reduced by 50% or more

4: Duration of sleep extremely reduced, e.g. as if not having slept at all

Item 6 Psychic anxiety

0: Not present

1: Very mild tendencies to worry, feeling fear or apprehension

2: More clearly in a state of worrying, feeling insecure or afraid, which,

however, it is still possible to control

3: The psychic anxiety or apprehension is at times more difficult to control.

On the edge of panic

4: Extreme degree of anxiety, interfering greatly with the daily life activities

Item 7 Emotional introversion

0: Not present

1: Very mild tendencies to draw back from emotional contact with other people, e.g. colleagues

2: More clearly emotionally introverted towards other people apart from close friends or family members

3: Moderately to markedly introverted even towards close friends or family

members

4: Is isolated or emotionally introverted to an extreme degree

Item 8 Worthlessness and guilt

0: No loss of self-esteem, no self-depreciation or guilt feelings

1: Is concerned with the experience of being a burden to family, friends or colleagues due to reduced interests or introversion

2: Focussing on negative events in the past prior to the current episode of depression. However, still to a mild degree

3: More clearly focussed on negative events in the past accompanied with the feeling that the current depression is a kind of punishment for previous omissions or failures. However, can intellectually still see that this view is unfounded

4: The guilt feelings have become paranoid ideas

Item 9 Suicidal thoughts

0: Not present

1: Feels that life is not worthwhile, but expresses no wish to die

2: Wishes to die (“it would be a relief not to wake up next morning”) but has no plans to take own life

3: Probably has plans to take own life

4: Definitely has plans to take own life

Item 10 Decreased verbal activity

0: Not present

1: Very mild problems in verbal formulation

2: More pronounced inertia in conversation, for example, a trend to longer

pauses

3: Interview is clearly influenced by brief responses or longer pauses

4: Interview is clearly prolonged due to decreased verbal formulation

activity

Item 11 Decreased motor activity

0: Not present

1: Very mild tendencies to decreased motor activity, for example, facial

expression slightly reduced

2: Moderately reduced motor activity, e.g. reduced gestures

3: Markedly reduced motor activity, e.g. all movements slow

4: Severely reduced motor activity, approaching stupor


Appendix 3e ABC version of the SCL-92 analogous to the HAM-D17

(A) Pure depression (SCL-6)
30. Feeling blue
26. Blaming yourself for things
32. Feeling no interest in things
71. Feeling everything is an effort
31. Worrying too much about things
14. Feeling low in energy or slowed down
SCL-6 total score:

(B) Stress-related arousal (SCL-9)
44. Trouble falling asleep
66. Sleep that is restless or disturbed
64. Awakening in the early morning
78. Feeling so restless you couldn’t sit still
2. Nervousness or shaking inside
57. Feeling tense or keyed up
19. Poor appetite
5. Loss of sexual interest or pleasure
87. The idea that something serious is wrong with your body
SCL-9 total score:

(C) Suicide risk behaviour (SCL-2)
59. Thoughts of death or dying
15. Thoughts of ending your life
SCL-2 total score:

Appendix 3f HAM-D6 – clinician version

The time frame (window) is the past three days.

Hamilton Depression Subscale and item definitions

1 DEPRESSED MOOD Score

0 Not present.

1 Very mild tendencies towards lowered spirits.

2 Moderate signs of being depressed.

3 Markedly depressed. Some hopelessness and/or clear non-verbal signs of depression.

4 Severe degree of lowered mood. Pronounced hopelessness.

2 LOW SELF-ESTEEM AND GUILT

0 No self-depreciation, low self-esteem or guilt feelings.

1 Concerned with the fact of being a burden to the family, friends or colleagues.

2 Signs of guilt feelings about incidents (minor omissions or failures) prior to current episode of depression.

3 Feels that current depression is a punishment for failures or omissions in the past.

4 Feels that the current depression is a well-deserved punishment.

3 WORK AND INTERESTS

0 No difficulties; time feels useful.

1 Mild insufficiencies in social and day-to-day activities.

2 Moderate signs of lack of interest in doing things or day-to-day activities.

3 Difficulties in performing even daily routine activities which are carried out with great effort.

4 Often needs help in performing self-care activities (unable to function independently).

4 PSYCHOMOTOR RETARDATION, GENERAL

0 Normal psychomotor condition.

1 Motoric speed slightly reduced.

2 Clear signs of reduced speed, e.g. reduced gestures, facial expression and slow pace.

3 The interview is clearly prolonged due to long breaks and brief answers.

4 The interview can hardly or not be completed due to retardation.

5 PSYCHIC ANXIETY

0 Not present.

1 Mild tendencies towards tenseness, worry, fear or apprehension.

2 Moderate anxiety, apprehension or insecurity.

3 Difficulty controlling anxiety or apprehension; sometimes at the edge of panic.

4 Extreme degree of anxiety

6 TIREDNESS AND PAINS

0 Not present

1 Doubtful or very vague feelings of tiredness or pain.

2 Moderate to severe tiredness or pains.

Total score

Sum: HAM-D6

No depression: 0–4

Depression doubtful: 5–6

Mild depression: 7–8

Moderate depression: 9–11

Severe depression: 12–22

Appendix 3g The HAM-D 6 Questionnaire

In this questionnaire you will find six groups of statements. Please choose the

one statement in each group that best describes how you have been feeling

over the past three days, including today, and mark it with an X in the

corresponding box.

(1) During the past three days

I have been in my usual good mood 0

I have felt a little more sad than usual 1

I have been clearly more sad than usual, but haven’t felt hopeless 2

I have been so gloomy that I briefly have felt overpowered by hopelessness 3

I have been so low in my moods that everything seems dark and hopeless 4

(2) During the past three days

I have been quite satisfied with myself 0

I have been a little more self-critical than usual with a tendency to feel less worthy than others 1

I have been brooding over my failures in the past 2

I have been plagued with distressing guilt feelings 3

I have been convinced that my current condition is a punishment 4


(3) During the past three days

My daily activities have been as usual 0

I have been less interested in my usual activities 1

I have felt that I have had difficulty performing my daily activities, but I was still able to perform them with great effort 2

I have had difficulty performing even simple routine activities 3

I have not been able to do any of the most simple day-to-day activities without help 4

(4) During the past three days

I have felt neither restless nor slowed down 0

I have felt a little slowed down 1

I have felt rather slowed down or have been talking a little less than usual 2

I have felt clearly slowed down or subdued or have talked much less than usual 3

I have hardly been talking at all or felt extremely slowed down all the time 4

(5) During the past three days

I have been calm and relaxed 0

I have felt a little more tense or insecure than usual 1

I have been clearly more worried or tense than usual, but have not felt that I lost control 2

I have been so tense or worried that I have briefly felt close to panic 3

I have had episodes where I was overwhelmed by panic 4

(6) During the past three days

I have been as active and have had as much energy as usual 0

I have felt rather low in energy or physically unwell with some bodily pains 1

I have felt very low in energy or had bodily pains 2

Appendix 3h SCL-D6 subscale for depression

In this questionnaire please mark with an X how you have been feeling over the past week, including today.

SCL-D6

How much were you bothered by:

Response categories: Not at all / A little bit / Moderately / Quite a bit / Extremely

(30) Feeling blue
(26) Blaming yourself for things
(31) Worrying too much about things
(71) Feeling everything is an effort
(14) Feeling low in energy or slowed down
(32) Feeling no interest in things

Appendix 3i The BDI6 subscale for depression

In this questionnaire you will find six groups of statements. Please choose the one statement in each group (A, B, C or D) that best describes how you have been feeling over the past three days, including today, and mark it with an X in the corresponding box (A, B, C or D).

BDI6

1
A  I do not feel sad
B  I feel sad and depressed
C  I feel constantly sad and depressed and feel unable to get out of it
D  I feel so blue and unhappy that I cannot bear it

5
A  I don’t feel particularly guilty
B  I feel bad or unworthy a good part of the time
C  I feel quite guilty
D  I feel constantly as though I am guilty and worthless

11
A  I am no more irritable now than I ever was
B  I get annoyed or irritable more easily than I used to
C  I feel irritated all the time
D  I don’t get irritated at all about the things that used to irritate me

13
A  I make decisions about as well as ever
B  I try to put off making decisions
C  I have great difficulty in making decisions
D  I cannot make any decisions at all anymore

15
A  I can work about as well as before
B  It takes extra effort to get started at doing something
C  I have to push myself very hard to do anything
D  I can’t do any work at all

17
A  I don’t get more tired than usual
B  I get tired more easily than I used to
C  I get tired from doing anything
D  I get too tired to do anything

Appendix 4a Major Depression Inventory

The following questions ask about how you have been feeling over the last two weeks. Please put a tick in the box which is closest to how you have been feeling. A higher number signifies a higher degree of depression.

How much of the time in the last two weeks …

Response columns (left to right): All the time / Most of the time / Slightly more than half the time / Slightly less than half the time / Some of the time / At no time

1 Have you felt low in spirits or sad? 5 4 3 2 1 0

2 Have you lost interest in your daily activities? 5 4 3 2 1 0

3 Have you felt lacking in energy and strength? 5 4 3 2 1 0

4 Have you felt less self-confident? 5 4 3 2 1 0

5 Have you had a bad conscience or feelings of guilt? 5 4 3 2 1 0

6 Have you felt that life wasn’t worth living? 5 4 3 2 1 0

7 Have you had difficulty in concentrating, e.g. when reading the newspaper or watching TV? 5 4 3 2 1 0

8a Have you felt very restless? 5 4 3 2 1 0

8b Have you felt subdued or slowed down? 5 4 3 2 1 0

9 Have you had trouble sleeping at night? a: too little sleep b: too much sleep 5 4 3 2 1 0

10a Have you suffered from reduced appetite? 5 4 3 2 1 0

10b Have you suffered from increased appetite? 5 4 3 2 1 0

Total score

Major Depression Inventory (MDI): Scoring key

At the top the diagnostic demarcation line is indicated. The total score of the 10 items is filled in below.

The diagnostic demarcation line

How much of the time …

Response columns (left to right): All the time / Most of the time / Slightly more than half the time / Slightly less than half the time / Some of the time / At no time

Core symptoms

1 Have you felt low in spirits or sad? 5 4 3 2 1 0

2 Have you lost interest in your daily activities? 5 4 3 2 1 0

3 Have you felt lacking in energy and strength? 5 4 3 2 1 0

Accompanying symptoms

4 Have you felt less self-confident? 5 4 3 2 1 0

5 Have you had a bad conscience or feelings of guilt? 5 4 3 2 1 0

6 Have you felt that life wasn’t worth living? 5 4 3 2 1 0

7 Have you had difficulty in concentrating, e.g. when reading the newspaper or watching TV? 5 4 3 2 1 0

Highest score of 8a and 8b:

8a Have you felt restless? 5 4 3 2 1 0

8b Have you felt subdued or slowed down? 5 4 3 2 1 0

9 Have you had difficulty sleeping at night? a: too little sleep b: too much sleep 5 4 3 2 1 0

Highest score of 10a and 10b:

10a Have you suffered from reduced appetite? 5 4 3 2 1 0

10b Have you suffered from increased appetite? 5 4 3 2 1 0

Total score (item 1–10)

Diagnosis: ICD-10 ___________________ DSM-IV ___________________

Major Depression Inventory (MDI): A depression questionnaire with a dual function

MDI: Scoring instructions

The questionnaire consists of the ten symptoms contained in the World Health Organization (WHO) demarcation of depression. WHO employs the last two weeks as the period of time in which to assess whether each symptom has been present for more than half the time. These symptoms are mainly subjective; therefore it is natural to ask the patient to complete the questionnaire, allowing the patient to tick each symptom. A higher number signifies a more constant presence of the symptom in question. Remember to fill in the patient’s name and the date.

The patient’s completed questionnaire is scored using the scoring key.

MDI (Major Depression Inventory) has a dual function, as it is scored both

as an instrument of severity (A) similar to the Hamilton Depression Scale,

and (B) as a diagnostic tool.

(A) If the MDI is used as a rating scale in the same way as the Hamilton scales, the sum of the ten questions indicates the degree of depression. For items 8 and 10, each of which has two answer categories, (a) and (b), the highest score is used. The theoretical score range is thus from 0 (no depression) to 50 (maximum depression).

Mild depression: MDI total score from 21 to 25

Moderate depression: MDI total score from 26 to 30

Severe depression: MDI total score of 31 or higher
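The rating-scale use of the MDI described in (A) can be sketched in a few lines of Python. This is a minimal illustration, not part of the published scoring key: the function names and the dictionary keys ("1" to "9", "8a", "8b", "10a", "10b") are assumptions chosen for the example, and totals below 21 return no label because the text gives no band for them.

```python
from typing import Dict, Optional

# Items answered with a single score (hypothetical key names).
SINGLE_ITEMS = ["1", "2", "3", "4", "5", "6", "7", "9"]


def mdi_total_score(answers: Dict[str, int]) -> int:
    """Sum the ten MDI items; items 8 and 10 count the higher of (a) and (b)."""
    for key, value in answers.items():
        if not 0 <= value <= 5:
            raise ValueError(f"item {key}: score {value} is outside 0-5")
    total = sum(answers[item] for item in SINGLE_ITEMS)
    total += max(answers["8a"], answers["8b"])
    total += max(answers["10a"], answers["10b"])
    return total


def mdi_severity(total: int) -> Optional[str]:
    """Map a total score to the severity bands listed above."""
    if total >= 31:
        return "severe"
    if total >= 26:
        return "moderate"
    if total >= 21:
        return "mild"
    return None  # below 21: no band is given in the text


answers = {"1": 3, "2": 3, "3": 3, "4": 3, "5": 3, "6": 3, "7": 3,
           "8a": 5, "8b": 1, "9": 3, "10a": 0, "10b": 2}
total = mdi_total_score(answers)
print(total, mdi_severity(total))
```

In this example the eight single items contribute 24, item 8 contributes 5 (the higher of 8a and 8b) and item 10 contributes 2, giving a total of 31.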

(B) MDI as a diagnostic tool: the vertical line (the diagnostic demarcation line) is used as indicated above. The top three symptoms, which reflect the core symptoms of the WHO/ICD-10 diagnosis of depression, must have been present during the last two weeks for most of the time. The accompanying symptoms in the remaining seven MDI items must have been present during the last two weeks for more than half of the time.

The ICD-10 algorithm:

Mild depression: 2 core symptoms and 2 accompanying symptoms

Moderate depression: 2 core symptoms and 4 accompanying symptoms

Severe depression: 3 core symptoms and 5 accompanying symptoms.
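The ICD-10 algorithm can be illustrated with a short Python sketch. It rests on stated assumptions rather than on the published key: a score of 4 or 5 is taken as "most of the time", a score of 3 or more as "more than half of the time", the symptom counts are read as minimums, and items 8 and 10 are assumed to have been reduced to the higher of their (a) and (b) scores beforehand. The function name is hypothetical.

```python
from typing import Optional, Sequence


def mdi_icd10_category(items: Sequence[int]) -> Optional[str]:
    """Classify ten MDI item scores (items 1 to 10, in order) by the ICD-10 algorithm.

    Assumes items 8 and 10 already hold the higher of their (a)/(b) scores.
    """
    if len(items) != 10:
        raise ValueError("expected exactly ten item scores")
    # Core symptoms (items 1-3): present if scored 4 or 5 (most of the time).
    core = sum(1 for score in items[:3] if score >= 4)
    # Accompanying symptoms (items 4-10): present if scored 3 or more.
    accompanying = sum(1 for score in items[3:] if score >= 3)
    if core == 3 and accompanying >= 5:
        return "severe"
    if core >= 2 and accompanying >= 4:
        return "moderate"
    if core >= 2 and accompanying >= 2:
        return "mild"
    return None  # criteria for a depressive episode are not met


print(mdi_icd10_category([4, 4, 2, 3, 3, 0, 0, 0, 0, 0]))
```

The checks run from the strictest category downwards, so a patient who meets the severe criteria is not reported as moderate or mild.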


The MDI can also be employed when diagnosing DSM-IV major depression. DSM-IV uses only nine symptoms, because in DSM-IV item 4 is included in item 5; the higher of the two item scores is therefore used here.

The DSM-IV algorithm: 5 out of the 9 symptoms should be present. At least one of these should be one of the first two items, which according to DSM-IV are the core symptoms.

A more precise major depression diagnosis depends on the answer to item

9 (a) or (b) and to item 10 (a) or (b).

Major depression without inverse neurovegetative symptoms: a score on

9a and 10a.

Major depression with inverse neurovegetative symptoms: a score on 9b

and 10b.


Appendix 4b Dealing with missing values in the Major Depression Inventory (MDI)

A. As a rating scale (total score)

1. Items 8a and 8b: use the highest score

2. Items 10a and 10b: use the highest score

3. When no more than two of these ten new items are missing, the total score is calculated as (the sum of the items) / (number of items) * 10.

4. If more than two of the ten items are missing, do not calculate the total score.
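The prorating rule in section A can be sketched as follows. This is an illustration under assumptions: "number of items" is read as the number of items actually answered, a single missing item is handled the same way as two, and missing answers are represented by `None`. The function name is hypothetical.

```python
from typing import Optional, Sequence


def mdi_prorated_total(items: Sequence[Optional[int]]) -> Optional[float]:
    """Prorate the MDI total over ten items; None marks a missing item.

    Assumes items 8 and 10 already hold the higher of their (a)/(b) scores.
    """
    if len(items) != 10:
        raise ValueError("expected exactly ten items")
    answered = [score for score in items if score is not None]
    missing = len(items) - len(answered)
    if missing > 2:
        return None  # too many missing items: no total score is calculated
    return sum(answered) / len(answered) * 10


print(mdi_prorated_total([3, 3, 3, 3, 3, 3, 3, 3, None, None]))
```

With eight answered items that sum to 24, the prorated total is 24 / 8 * 10 = 30.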

B. As a diagnostic tool

1. As in the first two paragraphs of section A.

2. For items 4 and 5: use the highest score

3. For the nine new items:

a) For the first 3 items: a score ≥ 4 is coded 1, a score < 4 is coded 0

b) For the last 6 items: a score ≥ 3 is coded 1, a score < 3 is coded 0

4. Major depression is present if the sum of the 9 items is ≥ 5 and the sum of the first two items is ≥ 1.

5. Major depression can be ruled out if the sum of the 9 items is < 5 or the sum of the first two items is 0.

6. Thus, theoretically, major depression can be confirmed when there are fewer than 5 missing items.
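The diagnostic steps in section B can be sketched in Python as below. The handling of missing values is an assumption made for this illustration, not a rule from the text: a missing item is counted first as absent and then as present, so that the diagnosis is confirmed only if it holds in the least favourable case and ruled out only if it fails in the most favourable case. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def _flag(scores: Sequence[Optional[int]], threshold: int) -> Optional[int]:
    """1 if any known score reaches the threshold, 0 if all are known and none does, else None."""
    if any(s is not None and s >= threshold for s in scores):
        return 1
    if any(s is None for s in scores):
        return None
    return 0


def mdi_dsm4_major_depression(items: Sequence[Optional[int]]) -> Optional[bool]:
    """Return True (confirmed), False (ruled out) or None (undetermined).

    `items` holds the ten MDI scores in order, with items 8 and 10 already
    reduced to the higher of (a)/(b); None marks a missing item.
    """
    if len(items) != 10:
        raise ValueError("expected exactly ten items")
    # Items 4 and 5 are merged (highest score), giving nine symptoms.
    groups: List[List[Optional[int]]] = [
        [items[0]], [items[1]], [items[2]], [items[3], items[4]]
    ]
    groups += [[score] for score in items[5:]]
    thresholds = [4, 4, 4] + [3] * 6
    flags = [_flag(g, t) for g, t in zip(groups, thresholds)]
    low_total = sum(f for f in flags if f is not None)   # missing counted as absent
    high_total = low_total + flags.count(None)            # missing counted as present
    low_core = sum(f for f in flags[:2] if f is not None)
    high_core = low_core + flags[:2].count(None)
    if low_total >= 5 and low_core >= 1:
        return True
    if high_total < 5 or high_core == 0:
        return False
    return None


print(mdi_dsm4_major_depression([4, 4, 4, 3, 0, 3, 0, 0, 0, 0]))
```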

Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.


The time frame (window) is the past three days.

Scoring sheet

No. Symptom Score

1 Anxious mood 0-4

2 Tension 0-4

3 Fears 0-4

4 Insomnia 0-4

5 Difficulties in concentration and memory 0-4

6 Depressed mood 0-4

7 General somatic symptoms (Muscular symptoms) 0-4

8 General somatic symptoms (Sensory symptoms) 0-4

9 Cardiovascular symptoms 0-4

10 Respiratory symptoms 0-4

11 Gastrointestinal symptoms 0-4

12 Genito-urinary symptoms 0-4

13 Other autonomic symptoms 0-4

14 Behaviour during interview 0-4

Total score 0-56

Symptoms scored from 0 to 4:

0 = not present

1 = mild degree

2 = moderate degree

3 = marked degree

4 = maximum degree

Sum:

6 to 14 = mild anxiety

15 to 28 = moderate anxiety

29 to 52 = severe anxiety

Appendix 5a Hamilton Anxiety Scale HAM-A 14
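As a rough illustration of how the scoring sheet is totalled, the following Python sketch sums the 14 item scores and looks up the band printed on the sheet. The function names are assumptions for this example. Totals that fall outside the printed bands (0 to 5 and 53 to 56) return `None`, because the sheet gives no label for them.

```python
from typing import Optional, Sequence


def hama_total(scores: Sequence[int]) -> int:
    """Sum the 14 HAM-A item scores (each 0 to 4)."""
    if len(scores) != 14:
        raise ValueError("expected exactly 14 item scores")
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("each item must be scored from 0 to 4")
    return sum(scores)


def hama_band(total: int) -> Optional[str]:
    """Return the band printed on the scoring sheet, or None if no band is listed."""
    if 6 <= total <= 14:
        return "mild anxiety"
    if 15 <= total <= 28:
        return "moderate anxiety"
    if 29 <= total <= 52:
        return "severe anxiety"
    return None


total = hama_total([2] * 14)
print(total, hama_band(total))
```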


Hamilton Anxiety Scale (HAM-A14)

Manual

1. Anxiety

This item covers the emotional condition of uncertainty about the future, ranging from worry, insecurity, irritability and apprehension to overpowering dread. The patient’s report of worrying, insecurity, uncertainty, fear and panic, i.e., the psychic or mental (‘central’) anxiety experience, is weighed.

0: The patient is neither more nor less insecure or irritable than usual.

1: The patient reports more tension, irritability or feeling more insecure than usual.

2: The patient expresses more clearly being in a state of anxiety, apprehension or irritability, which he may find difficult to control. It is still without influence on the patient’s daily life, because the worrying is still about minor matters.

3: The anxiety or insecurity is at times more difficult to control because the worrying is about major injuries or harms which might occur in the future. E.g., the anxiety may be experienced as panic, i.e., overpowering dread, and has occasionally interfered with the patient’s daily life.

4: The feeling of dread is present so often that it markedly interferes with the patient’s daily life.

2. Tension

This item includes inability to relax, nervousness, bodily tensions, trembling and restless fatigue.

0: The patient is neither more nor less tense than usual.

1: The patient indicates being somewhat more nervous and tense than usual.

2: The patient clearly expresses being unable to relax and full of inner unrest, which he finds difficult to control, but still without influence on the patient’s daily life.

3: The inner unrest and nervousness are so intense or so frequent that they have occasionally interfered with the patient’s daily work.

4: Tensions and unrest interfere with the patient’s life and work at all times.

3. Fears

A type of anxiety which arises when the patient finds himself in special situations. Such situations may be open or closed rooms, queuing, or riding a bus or a train. The patient should experience relief by avoiding such situations. It is important to note at this evaluation whether there has been more phobic anxiety during the present episode than usual.

0: Not present.

1: Doubtful if present.

2: The patient has experienced phobic anxiety, but was able to fight it.

3: It has been difficult for the patient to fight or overcome his phobic anxiety, which has thus to a certain extent interfered with the patient’s daily life and work.

4: The phobic anxiety has clearly interfered with the patient’s daily life and work.

4. Insomnia

This item covers only the patient’s subjective experience of sleep length (hours of sleep per 24-hour period) and sleep depth (superficial and interrupted sleep versus deep and steady sleep). The rating is based on the three preceding nights. Note: Administration of hypnotics or sedatives shall be disregarded.

0: Usual sleep length and sleep depth.

1: Sleep length is doubtfully or slightly reduced (e.g., due to difficulties falling asleep), but no change in sleep depth.

2: Sleep depth is now also reduced, sleep being more superficial. Sleep as a whole somewhat disturbed.

3: Sleep duration as well as sleep depth is markedly changed. The broken sleep periods total only a few hours per 24-hour period.

4: It is difficult here to ascertain sleep duration as sleep depth is so shallow that the patient speaks of short periods of slumber or dozing, but no real sleep.

5. Difficulties in concentration and memory

This item covers difficulties in concentration, making decisions about everyday matters, and memory.

0: The patient has neither more nor less difficulties in concentration and/or memory than usual.

1: It is doubtful whether the patient has difficulties in concentration and/or memory.

2: Even with a major effort it is difficult for the patient to concentrate on his daily routine work.

3: More pronounced difficulties with concentration, memory, or decision making. E.g., has difficulties reading an article in a newspaper or watching a television programme right through. Scores 3 as long as the loss of concentration or poor memory has not clearly influenced the interview.

4: When the patient during the interview has shown difficulty in concentration and/or memory, and/or when decisions are reached with considerable delay.

6. Depressed mood

This item covers both the verbal and the non-verbal communication of sadness, depression, despondency, and hopelessness.

0: Natural mood.

1: When it is doubtful whether the patient is more despondent or sad than usual. E.g., the patient indicates vaguely being more depressed than usual.

2: When the patient is more clearly concerned with unpleasant experiences, although he is still without hopelessness.

3: The patient shows clear non-verbal signs of depression and/or hopelessness.

4: The patient’s remarks on despondency and the non-verbal signs dominate the interview, in which the patient cannot be distracted.

7. General somatic symptoms (muscular symptoms)

This item includes weakness, stiffness, soreness merging into real pain, which is more or less diffusely localised in the muscles. E.g., jaw ache or neck ache.

0: The patient is neither more nor less sore or stiff in his muscles than usual.

1: The patient indicates being somewhat more sore or stiff in his muscles than usual.

2: The symptoms have gained the character of pain.

3: The muscle pains interfere to some extent with the patient’s daily life and work.

4: The muscle pains are present most of the time and interfere clearly with the patient’s daily life and work.

8. General somatic symptoms (sensory symptoms)

This item includes increased fatigability and weakness merging into real functional disturbances of the senses, including tinnitus, blurring of vision, hot and cold flushes and prickling sensations.

0: Not present.

1: It is doubtful whether the patient’s indications of pressing or prickling sensations (e.g., in ears, eyes or skin) are more pronounced than usual.

2: The pressing sensations in the ear reach the character of buzzing in the ears, in the eye as visual disturbances, and in the skin as prickling or itching sensations (paraesthesias).

3: The generalised sensory symptoms interfere to some extent with the patient’s daily life and work.

4: The generalised sensory symptoms are present most of the time and interfere clearly with the patient’s daily life and work.

9. Cardiovascular symptoms

This item includes tachycardia, palpitations, oppression, chest pain, throbbing in the blood vessels, and feelings of fainting.

0: Not present.

1: Doubtful if present.

2: Cardiovascular symptoms are present, but the patient can still control the symptoms.

3: The patient now and again has difficulties in controlling the cardiovascular symptoms, which thus to some extent interfere with the patient’s daily life and work.

4: The cardiovascular symptoms are present most of the time and interfere clearly with the patient’s daily life and work.

10. Respiratory symptoms

This item includes feelings of constriction or contraction in throat or chest, dyspnoea merging into choking sensations and sighing respiration.

0: Not present.

1: Doubtful if present.

2: Respiratory symptoms are present, but the patient can still control the symptoms.

3: The patient now and again has difficulties in controlling the respiratory symptoms, which thus to some extent interfere with the patient’s daily life and work.

4: The respiratory symptoms are present most of the time and interfere clearly with the patient’s daily life and work.

11. Gastrointestinal symptoms

This item includes difficulties in swallowing, ‘sinking’ sensation of the stomach, dyspepsia (heartburn or burning sensations in the stomach, abdominal pains related to meals, fullness, nausea and vomiting), abdominal rumbling and diarrhoea.

0: Not present.

1: Doubtful if present (or doubtful if different from the patient’s ordinary gastrointestinal sensations).

2: One or more of the above-mentioned gastrointestinal symptoms are present, but the patient can still control the symptoms.

3: The patient now and again has difficulties in controlling the gastrointestinal symptoms, which thus to some extent interfere with the patient’s daily life and work. E.g., tendency to lose control over the bowels.

4: The gastrointestinal symptoms are present most of the time and interfere clearly with the patient’s daily life and work. E.g., losing control over the bowels.


12. Genito-urinary symptoms

This item includes non-organic or psychic symptoms such as frequent or more pressing passing of urine, menstrual irregularities, anorgasmia, dyspareunia, premature ejaculation, loss of erection.

0: Not present.

1: Doubtful if present (or doubtful if different from the ordinary genito-urinary sensations).

2: One or more of the above-mentioned genito-urinary symptoms are present, but they do not interfere with the patient’s daily life and work.

3: The patient now and again has one or more of the above-mentioned genito-urinary symptoms to such a degree that they to some extent interfere with the patient’s daily life and work. E.g., tendency to lose control over micturition.

4: The genito-urinary symptoms are present most of the time and interfere clearly with the patient’s daily life and work. E.g., losing control over micturition.

13. Autonomic symptoms

This item includes dryness of mouth, blushing or pallor, sweating and dizziness.

0: Not present.

1: Doubtful if present.

2: One or more of the above-mentioned autonomic symptoms are present, but they do not interfere with the patient’s daily life and work.

3: The patient now and again has one or more of the above-mentioned autonomic symptoms to such a degree that they to some extent interfere with the patient’s daily life and work.

4: The autonomic symptoms are present most of the time and interfere clearly with the patient’s daily life and work.

14. Behaviour at interview

This item is based on patient behaviour during the interview. Did the patient appear tense, nervous, agitated, restless, fidgeting, tremulous, pale, hyperventilating, or sweating?

On the basis of such observations a global estimate is made:

0: The patient does not appear anxious.

1: It is doubtful whether the patient is anxious.

2: The patient is moderately anxious.

3: The patient is clearly anxious.

4: The patient is overwhelmed by anxiety. E.g., shaking and trembling all over.


Appendix 5b Anxiety Symptom Scale (ASS)

The following questions ask about how you have been feeling over the past two

weeks. Please put a tick in the box that is closest to how you have been feeling.


The time frame (window) is the past three days.

Scoring sheet

No. Symptom Score

1 Elevated mood 0–4

2 Increased verbal activity 0–4

3 Increased social contact (intrusiveness) 0–4

4 Increased motor activity 0–4

5 Sleep disturbances 0–4

6 Work activities (distractibility) 0–4

7 Irritable mood, hostility 0–4

8 Increased sexual activity 0–4

9 Increased self-esteem 0–4

10 Flight of thoughts 0–4

11 Noise level 0–4

Total score 0–44

No mania: 0–6

Doubtful mania: 7–10

Hypomania: 11–14

Moderate mania: 15–24

Marked/severe mania: 25–44

Appendix 6 The Bech-Rafaelsen Mania Scale (MAS)
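The total score and its interpretation bands on the MAS scoring sheet can be illustrated with a small Python sketch. The function names are assumptions made for this example; the band limits are those printed on the sheet.

```python
from typing import Sequence

# (upper limit, label) pairs taken from the scoring sheet, in ascending order.
MAS_BANDS = [
    (6, "No mania"),
    (10, "Doubtful mania"),
    (14, "Hypomania"),
    (24, "Moderate mania"),
    (44, "Marked/severe mania"),
]


def mas_total(scores: Sequence[int]) -> int:
    """Sum the 11 MAS item scores (each 0 to 4)."""
    if len(scores) != 11:
        raise ValueError("expected exactly 11 item scores")
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("each item must be scored from 0 to 4")
    return sum(scores)


def mas_band(total: int) -> str:
    """Return the interpretation band for a total score from 0 to 44."""
    if not 0 <= total <= 44:
        raise ValueError("total must lie between 0 and 44")
    for upper, label in MAS_BANDS:
        if total <= upper:
            return label
    raise AssertionError("unreachable: the bands cover 0 to 44")


total = mas_total([2] * 11)
print(total, mas_band(total))
```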


The Bech-Rafaelsen Mania Scale (MAS) Manual

Item 1 Elevated mood

0: Not present

1: Slightly elevated mood, optimistic, but still adapted to situation

2: Moderately elevated mood, joking, laughing, however, somewhat irrelevant to situation

3: Markedly elevated mood, exuberant both in manner and speech, clearly irrelevant to situation

4: Extremely elevated mood, quite irrelevant to situation

Item 2 Increased verbal activity

0: Not present

1: Somewhat talkative

2: Clearly talkative, few spontaneous intervals in the conversation, but still not difficult to interrupt

3: Almost no spontaneous intervals in the conversation, difficult to interrupt

4: Impossible to interrupt, dominates the conversation completely

Item 3 Increased social contact (intrusiveness)

0: Not present

1: Slightly meddling (putting his/her oar in), slightly intrusive

2: Moderately meddling and arguing or intrusive

3: Dominating, arranging, directing, but still in context with the setting

4: Extremely dominating and manipulating, not in context with the setting

Item 4 Increased motor activity

0: Not present

1: Slightly increased motor activity (e.g., some tendency to lively facial expression)

2: Clearly increased motor activity (e.g., lively facial expression, not able to sit quietly in chair)

3: Excessive motor activity, on the move most of the time, but the patient can sit still if urged to (rises only once during interview)

4: Constantly active, restlessly energetic. Even if urged to, the patient cannot sit still


Item 5 Sleep disturbances

This item covers the patient’s subjective experience of the duration of sleep (hours of sleep per 24-h period). The rating should be based on the three preceding nights, irrespective of the administration of hypnotics or sedatives. The score is the average of the past three nights.

0: Not present (habitual duration of sleep)

1: Duration of sleep reduced by 25%

2: Duration of sleep reduced by 50%

3: Duration of sleep reduced by 75%

4: No sleep

Item 6 Work activities (distractibility)

Work activity should be measured in terms of the degree of disability or distractibility in social, occupational or other important areas of functioning.

0: No difficulties

1: Slightly increased drive, but work quality is slightly reduced as motivation is changing; the patient is somewhat distractible (attention drawn to irrelevant stimuli)

2: Work activity clearly affected by distractibility, but still to a moderate degree

3: The patient occasionally loses control of routine tasks because of marked distractibility

4: Unable to perform any task without help

Item 7 Irritable mood, hostility

0: Not present

1: Somewhat impatient or irritable, but control is maintained

2: Moderately impatient or irritable. Does not tolerate provocations

3: Provocative, makes threats, but can be calmed down

4: Overt physical violence; physically destructive

Item 8 Increased sexual activity

0: Not present

1: Slight increase in sexual interest and activity, for example, slightly flirtatious

2: Moderate increase in sexual interest and activity, for example, clearly flirtatious

3: Marked increase in sexual interest and activity, excessively flirtatious

4: Completely preoccupied by sexual interests


Item 9 Increased self-esteem

0: Not present

1: Slightly increased self-esteem, for example, slightly overestimates own habitual capabilities

2: Moderately increased self-esteem, for example, more clearly overestimates own habitual capabilities or hints at unusual abilities

3: Markedly unrealistic ideas, for example, believes he/she possesses extraordinary abilities, powers or knowledge (scientific, religious etc.), but can quickly be corrected

4: Grandiose ideas which cannot be corrected

Item 10 Flight of thoughts

0: Not present

1: Somewhat lively in descriptions, explanations and elaborations without losing the connection with the topic of the conversation. The thoughts are thus still coherent

2: The patient’s thoughts are occasionally distracted by random associations (often rhymes, slang, puns, pieces of verse or music)

3: The line of thoughts is more regularly disrupted by diversionary associations

4: It is very difficult or impossible to follow the patient because of the flight of thoughts; he or she constantly jumps from one topic to another

Item 11 Noise level

0: Not present

1: Speaks somewhat loudly without being noisy

2: Voice discernible at a distance, and somewhat noisy

3: Vociferous, voice discernible at a long distance, is markedly noisy or singing

4: Shouting, screaming; or using other sources of noise due to hoarseness


With the two subscales

No. Item Score Schizophrenicity subscale Depression subscale

1 Somatic concern (0–6)

2 Anxiety (psychic) (0–6)

3 Emotional withdrawal (0–6)

4 Conceptual disorganisation (0–6)

5 Self-depreciation and guilt feelings (0–6)

6 Anxiety (somatic) (0–6)

7 Specific motor disturbances (0–6)

8 Exaggerated self-esteem (0–6)

9 Depressive mood (0–6)

Appendix 7 Brief Psychiatric Rating Scale (BPRS)


10 Hostility (0–6)

11 Suspiciousness (0–6)

12 Hallucinations (0–6)

13 Psychomotor retardation (0–6)

14 Uncooperativeness (0–6)

15 Unusual thought content (0–6)

16 Blunted or inappropriate affect (0–6)

17 Psychomotor agitation (0–6)

18 Disorientation and confusion (0–6)

Total BPRS

Subtotal schizophrenicity

Subtotal depression


WHO (Five) Well-Being Index (1998 version)

Please indicate for each of the five statements which is closest to how you

have been feeling over the last two weeks. Notice that higher numbers mean

better well-being.

Example: If you have felt cheerful and in good spirits more than half of the

time during the last two weeks, put a tick in the box with the number 3 in the

upper right corner.

Over the last two weeks

All of the time

Most of the time

More than half of the time

Less than half of the time

Some of the time

At no time

1 I have felt cheerful and in good spirits

5 4 3 2 1 0

2 I have felt calm and relaxed

5 4 3 2 1 0

3 I have felt active and vigorous

5 4 3 2 1 0

4 I woke up feeling fresh and rested

5 4 3 2 1 0

5 My daily life has been filled with things that interest me

5 4 3 2 1 0

Appendix 8a

Psychiatric Research Unit, WHO Collaborating Centre in Mental Health


Scoring

The raw score is calculated by totalling the figures of the five answers. The raw score ranges from 0 to 25, 0 representing worst possible and 25 representing best possible quality of life.

To obtain a percentage score ranging from 0 to 100, the raw score is multiplied by 4. A percentage score of 0 represents worst possible, whereas a score of 100 represents best possible quality of life.

Interpretation

It is recommended to administer the Major Depression (ICD-10) Inventory if the raw score is below 13 or if the patient has answered 0 or 1 to any of the five items. A score below 13 indicates poor well-being and is an indication for testing for depression under ICD-10.
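The scoring and interpretation rules above can be sketched in Python as follows. The class and function names are assumptions made for this illustration; the arithmetic (raw score, multiplication by 4, and the two conditions for recommending the MDI) follows the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Who5Result:
    raw_score: int          # 0 to 25
    percentage_score: int   # 0 to 100
    recommend_mdi: bool     # True if follow-up with the MDI is recommended


def score_who5(answers: Sequence[int]) -> Who5Result:
    """Score the five WHO-5 answers (each 0 to 5)."""
    if len(answers) != 5:
        raise ValueError("expected exactly five answers")
    if any(not 0 <= a <= 5 for a in answers):
        raise ValueError("each answer must be scored from 0 to 5")
    raw = sum(answers)
    # The MDI is recommended if the raw score is below 13
    # or if any single item was answered 0 or 1.
    recommend = raw < 13 or any(a <= 1 for a in answers)
    return Who5Result(raw, raw * 4, recommend)


print(score_who5([5, 5, 5, 5, 1]))
```

The example shows that a high raw score (21) can still lead to a recommendation, because one item was answered 1.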

© Psychiatric Research Unit, WHO Collaborating Center for Mental Health,

Frederiksborg General Hospital, DK-3400 Hillerød


The correct scoring of the Hospital Anxiety and Depression Scale (HADS) to cover positive well-being (WHO-5) and anxiety symptoms or neuroticism.

Appendix 8b The HADS subscales for positive well-being and anxiety symptoms

HADS

WHO-5 (positive well-being):

2. I still enjoy the things I used to enjoy

4. I can laugh and see the funny side of things

6. I feel cheerful

7. I feel relaxed

12. I look forward with enjoyment to things

Eysenck Neuroticism (anxiety symptoms):

1. I feel tense or ‘wound up’

3. I get a sort of frightened feeling as if something awful is about to happen

5. Worrying thoughts go through my mind

11. I feel restless as if I have to be on the move

13. I get sudden feelings of panic

Remaining items:

8. I feel as if I am slowed down

9. I get a feeling like ‘butterflies’ in the stomach

10. I have lost interest in my appearance

14. I can enjoy a good book


Appendix 9a Etiological considerations in major depression by use of the Clinical Interview for Depression and Related Syndromes (CIDRS)

F Etiological considerations

F1 Lack of insight (the last 3 days)

0 Absent.

1 Doubtful.

2 Admits to mental problems but not to being mentally ill.

3 Acknowledges possible change in behaviour, but denies mental illness.

4 Denies any change in behaviour. Thus does not even feel stressed.

F2a Psychological stress (stressors) (around beginning of episode and 6 months retrospectively)

0 Absent. No psychological stress.

1 Doubtful.

2 Definitely present. Presence of long-term psycho-social stressor (e.g., divorce or work-related problems) considered to have etiological significance (i.e., condition would not have occurred without it).

F2b Post-traumatic stress disorder

0 Absent. No post-traumatic stress disorder.

1 Doubtful.

2 Definitely present when condition has developed during the course of a few weeks after exposure to exceptionally catastrophic event.

F3 Neuroticism (covering premorbid history)

0 Absent.

1 Doubtful presence of chronic tendency from early youth to anxiety, worrying or feelings of inferiority.

2 Mild. Slight tendency to personality structure with anxiety, worrying and tension.

3 Mild to moderate. Mildly to moderately anxious personality structure (neuroticism), however without causing constraints in daily life.

4 Moderate to marked neuroticism, including tendency to introversion, some degree of limitation in daily life.

5 Marked to severe neuroticism, causing constraints in daily life.

6 Extremely severe neuroticism causing chronic constraints in daily life.

F4 Increased reactivity towards environment (the last 3 days)

0 Absent.

1 Doubtful or minimally present.

2 Mild. Unspecific factors, such as having someone to talk to, lead to limited improvement.

3 Mild to moderate. Unspecific factors or certain specific situations either lead to improvement or deterioration.

4 Moderate to marked. This condition varies to a considerable degree, depending on the factors making up the situation.

5 Marked to severe. Certain factors frequently lead to complete disappearance or triggering of condition.

6 Extremely severe. The condition depends entirely on quite specific situations, which each time lead to complete disappearance or triggering of it.

F5 Diurnal variation – symptoms worse in evening (the last 3 days)

0 Absent.

1 Doubtful, minimally present.

2 Mild.

3 Mild to moderate. Fluctuations of greater intensity or frequency.

4 Moderate to marked.

5 Marked to severe. Regular changes from considerable depression to hardly any symptoms.

6 Extremely manifest changes.

F6 Diurnal variation – symptoms worse in morning (the last 3 days)

0 Absent.

1 Doubtful or minimal.

2 Mild.

3 Mild to moderate. Fluctuations of greater intensity or frequency.

4 Moderate to marked.

5 Marked to severe. Regular changes from considerable condition to hardly any symptoms.

6 Extremely marked changes in condition.

F7 Quality of depression (covering whole episode)

0 Absent. No difference from ordinary grief reaction or stress condition.

1 Doubtfully present, as not a question of ordinary grief reaction or stress condition.

2 Mild. Felt to be slightly different from ordinary feeling of stress.

3 Mild to moderate, definitely different from ordinary feeling of stress.

4 Moderately to markedly different from ordinary feeling of stress, all is negative.

5 Markedly to severely different from ordinary feeling of stress.

6 Extremely severe, pronounced difference from ordinary feeling of stress, exceedingly different.

F8 Persistency and duration of condition (covering whole episode)

0 Absent.

1 Doubtful. Quite insignificant day-to-day variations.

2 Definite persistency. Condition the same from day to day; if there is any change, it tends to be an increase of symptoms.

3 Duration less than 6 months.

4 Duration 6–12 months.

5 Duration 12–24 months.

6 Duration more than 24 months.

F9 Depressive delusions (the last 3 days)

0 Absent.

1 Doubtful presence of actual delusions.

2 Mild. Vague depressive delusions which are not adhered to.

3 Mild to moderate depressive delusions as to physical illness or financial problems. Not especially adhered to.

4 Moderate to marked depressive delusions, adhered to, to a certain extent.

5 Marked to severe depressive delusions, obstinately adhered to.

6 Extremely marked depressive delusions, completely dominating condition.

F10 Previous depressive downs

0 Absent.

1 Doubtful whether current episode has been preceded by depressive downs differing from actual depressive episodes by short duration (typically 4 days or less) and lesser degree of severity. However, the latter element (degree of severity) is not so significant here as the presence of recurrent episodes of short duration. Should not be confused with premenstrual tension.

2 Has previously had one depressive down.

3 Has previously had 2–3 downs.

4 Has previously had 4–5 depressive downs.

5 Has previously had around 1 down per year.

6 Has previously had several downs per year.

F11 Previous depressive episodes (covering whole history [anamnesis])

0 Absent.

1 Doubtful whether current episode has been preceded by a delimited depressive episode of at least 2 weeks’ duration.

2 Has previously had one depressive episode.

3 Has previously had 2 depressive episodes.

4 Has previously had 3 depressive episodes.

5 Has previously had 4 depressive episodes.

6 Has previously had 5 or more depressive episodes.

F12 Previous hypomanic ups

0 Absent.

1 Doubtful whether current episode has been preceded by hypomanic ups differing from actual manic episodes by short duration (typically 4 days or less) and lesser degree of severity (i.e., without major impact on ability to work or on other social activities).

2 Has previously had one up.

3 Has previously had 2–3 ups.

4 Has previously had 4–5 ups.

5 Has previously had around 1 up per year.

6 Has previously had several ups per year.

Page 183: Clinical Psychometrics


F13 Previous manic episodes (covering whole history [anamnesis])

0 Absent.

1 Doubtful whether current episode has been preceded by a delimited manic episode of at least 1 week’s duration.

2 Has previously had 1 manic episode.

3 Has previously had 2 manic episodes.

4 Has previously had 3 manic episodes.

5 Has previously had 4 manic episodes.

6 Has previously had 5 or more manic episodes.

F14 Previous mixed states (covering whole history [anamnesis])

0 Absent.

1 Doubtful whether current episode has been preceded by an episode with both depressive and manic symptoms.

2 Has previously had 1 episode with mixed states.

3 Has previously had 2 episodes with mixed states.

4 Has previously had 3 episodes with mixed states.

5 Has previously had 4 episodes with mixed states.

6 Has previously had 5 or more episodes with mixed states.
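Items F11, F13 and F14 anchor the number of previous episodes to the item score in the same way. The anchoring can be sketched in Python as follows; this is only an illustration, and the function name and the `doubtful` flag are hypothetical, not part of the scale itself.

```python
def episode_item_score(previous_episodes, doubtful=False):
    """Map a count of previous episodes to the 0-6 score of items F11, F13 and F14.

    Score 1 is reserved for the 'doubtful' anchor, so one definite
    previous episode already gives a score of 2.
    """
    if previous_episodes < 0:
        raise ValueError("number of previous episodes cannot be negative")
    if doubtful:
        return 1
    if previous_episodes == 0:
        return 0
    # 1 episode -> 2, 2 -> 3, 3 -> 4, 4 -> 5, 5 or more -> 6
    return min(previous_episodes, 5) + 1


scores = [episode_item_score(n) for n in range(0, 8)]
print(scores)
```

Items F10 and F12 (downs and ups) use different anchors (2–3, 4–5, around 1 per year, several per year) and are therefore not covered by this sketch.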

F15 Hereditary disposition

0 Absent.

1 Doubtful.

2 Mild. Scanty information about distant relative with affective disorder characteristics.

3 Mild to moderate. Definite information about distant relative with affective disorder (committed suicide, hospitalised for this, treated for this).

4 Moderate to marked. Closer relatives (grandparents, half-siblings) have/had affective disorder.

5 Marked to severe. A brother, sister or parent has/had affective disorder.

6 Extremely severe. Both a parent and a sibling have/had affective disorder.

F16 Somatic illness (around start of episode and 6 months retrospectively). Includes, e.g., postpartum depression, post-stroke depression and withdrawal symptoms after substance abuse (alcohol and other psychoactive drugs).

0 Absent.

1 Doubtful.

Page 184: Clinical Psychometrics


2 Definitely present when the somatic illness is considered to have etiological significance, i.e., condition would not have occurred without it.

F17 Drug-/substance-induced condition

0 Absent.

1 Doubtful.

2 Definitely present when treatment with drug is considered to have etiological significance, i.e., condition would not have occurred without it.

Page 185: Clinical Psychometrics


Appendix 9b Newcastle Diagnostic Depression Scale (1965)

No.  Item                          Score     Calculation value   Score   Calculation value

1    Deviant personality           2, 1, 0   0, +1

2    Psychological stresses        2, 1, 0   0, +1, +2

3    The quality of depression     2, 1, 0   +1, 0

4    Weight loss                   2, 1, 0   +2, +1, 0

5    Previous depressive episodes  2, 1, 0   +1, 0

6    Motor activity                2, 1, 0   +2, +1, 0

Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

Page 186: Clinical Psychometrics


No.  Item                          Score     Calculation value   Score   Calculation value

7    Anxiety                       2, 1, 0   –1, –½, 0

8    Nihilistic delusions          2, 1, 0   +2, +1, 0

9    Accusations of others         2, 1, 0   –1, –½, 0

10   Feelings of guilt             2, 1, 0   +1, 0

Calculated total value

Endogenous depression = +6 or more

Dubiously endogenous depression = +5½

Non-endogenous depression = +5 or less
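The calculation of the total value and its three cut-off categories can be sketched in Python as follows. This is only an illustration: the function and variable names are hypothetical, the example patient is invented, and the sketch takes the calculation values directly as input because the table does not state, for every item, which score corresponds to which calculation value.

```python
# Calculation values listed for each of the ten items in the table above.
ALLOWED_VALUES = {
    1: (0, 1),
    2: (0, 1, 2),
    3: (1, 0),
    4: (2, 1, 0),
    5: (1, 0),
    6: (2, 1, 0),
    7: (-1, -0.5, 0),
    8: (2, 1, 0),
    9: (-1, -0.5, 0),
    10: (1, 0),
}


def newcastle_classification(calculation_values):
    """Sum the ten calculation values and apply the cut-offs of the scale."""
    if len(calculation_values) != 10:
        raise ValueError("expected one calculation value per item (10 items)")
    for item_no, value in enumerate(calculation_values, start=1):
        if value not in ALLOWED_VALUES[item_no]:
            raise ValueError(f"item {item_no}: {value} is not a listed value")
    total = sum(calculation_values)
    if total >= 6:
        label = "endogenous depression"
    elif total == 5.5:
        label = "dubiously endogenous depression"
    else:
        label = "non-endogenous depression"
    return total, label


# Invented example: calculation values for items 1 to 10.
example_values = [1, 2, 1, 1, 0, 1, -0.5, 0, 0, 0]
total, label = newcastle_classification(example_values)
print(total, label)
```

Because all calculation values are multiples of ½, a total can fall exactly on +5½ but never between the three categories.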

Page 187: Clinical Psychometrics


Appendix 10 The modified PRISE 20 questionnaire for side effects of antidepressants

PRISE 20 (Patient Rated Inventory of Side Effects). (Bech P, Csillag C. Rational polypharmacy in the acute therapy of major depression. InTech 2011.) Modified after Wisniewski et al (2006).

Have you had any of these side effects over the past two weeks?

                                        No    Yes, but tolerable    Yes – distressing

 1. Dry mouth                           [ ]          [ ]                  [ ]
 2. Nausea                              [ ]          [ ]                  [ ]
 3. Diarrhoea                           [ ]          [ ]                  [ ]
 4. Constipation                        [ ]          [ ]                  [ ]
 5. Dizziness                           [ ]          [ ]                  [ ]
 6. Palpitations                        [ ]          [ ]                  [ ]
 7. Sweating                            [ ]          [ ]                  [ ]
 8. Headache                            [ ]          [ ]                  [ ]
 9. Tremors                             [ ]          [ ]                  [ ]
10. Difficulty sleeping: too little     [ ]          [ ]                  [ ]
11. Difficulty sleeping: too much       [ ]          [ ]                  [ ]
12. Loss of sexual desire               [ ]          [ ]                  [ ]
13. Trouble achieving orgasm            [ ]          [ ]                  [ ]
14. Trouble with erections              [ ]          [ ]                  [ ]
15. Anxiety                             [ ]          [ ]                  [ ]
16. Restlessness                        [ ]          [ ]                  [ ]
17. Decreased energy                    [ ]          [ ]                  [ ]
18. Increased appetite                  [ ]          [ ]                  [ ]
19. Increased weight                    [ ]          [ ]                  [ ]
20. Emotional indifference              [ ]          [ ]                  [ ]

Page 188: Clinical Psychometrics


Appendix 11a Calculus Example 1

Principal component analysis (PCA) for typing

Thomas Teasdale, DMSc, Associate Professor of Psychology at the University of Copenhagen, has attempted to explain the mathematics of factor analysis in his contribution to ‘Undersøgelsesmetoder i klinisk psykologi’ (Evaluation methods in clinical psychology) (Munksgaard 1992). For this purpose he presents a fictitious correlation matrix (Table A11.1) of the kind that emerges when intelligence is measured by six different tests, or items (A, B, C, D, E, F). Table A11.1 demonstrates that the six items all correlate positively with one another to a certain degree.

Based on this correlation matrix Teasdale has performed the matrix algebra found in principal component analysis (PCA), namely the mathematical method described by Hotelling in 1933, in which one moves from the correlation coefficients to eigenvectors, each with an eigenvalue that expresses the variance accounted for by the corresponding component.

Figure A11.1 shows the eigenvalues calculated by Teasdale in his fictitious example. The sum of these eigenvalues is 6 (= the number of components). The 1st component has an eigenvalue of 3.1, the 2nd component a value of 1.3, the 3rd component a value of 0.43, the 4th component a value of 0.41, the 5th component a value of 0.39 and the 6th component a value of 0.36. These values are given in Figure A11.1, together with the percentage of variance each of these components is responsible for.

In Figure A11.1 ‘explained variance’ as a percentage is shown on the ordinate and the six components are distributed along the abscissa. The 1st component explains 51.7% of the variance (3.1/6) and the 2nd component explains 21.7% (1.3/6), which means that together the first two principal components explain 73.4% of the variance, making the remaining components quite insignificant.
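The step from the correlation matrix in Table A11.1 to the first two eigenvalues can be retraced with a short Python sketch. It is an illustration only and makes several assumptions: it uses power iteration with deflation, which is a simple numerical technique and not necessarily the procedure Teasdale followed, and all function and variable names are hypothetical. Loadings are obtained as the eigenvector multiplied by the square root of its eigenvalue.

```python
import math

ITEMS = ["A", "B", "C", "D", "E", "F"]

# Lower triangle of the correlation matrix in Table A11.1.
LOWER = [
    [],
    [0.62],
    [0.58, 0.60],
    [0.31, 0.29, 0.28],
    [0.32, 0.33, 0.29, 0.60],
    [0.30, 0.31, 0.29, 0.63, 0.59],
]


def full_matrix(lower):
    """Build the symmetric correlation matrix with 1.0 on the diagonal."""
    n = len(lower)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            m[i][j] = m[j][i] = r
    return m


def multiply(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def dominant_eigenpair(m, iterations=2000):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    v = [1.0 + 0.1 * i for i in range(len(m))]  # arbitrary starting vector
    for _ in range(iterations):
        w = multiply(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[0] < 0:  # the sign of an eigenvector is arbitrary; fix it for display
        v = [-x for x in v]
    eigenvalue = sum(a * b for a, b in zip(v, multiply(m, v)))
    return eigenvalue, v


def deflate(m, eigenvalue, v):
    """Remove a component so that the next one becomes dominant."""
    n = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]


R = full_matrix(LOWER)
lambda1, v1 = dominant_eigenpair(R)
lambda2, v2 = dominant_eigenpair(deflate(R, lambda1, v1))

explained1 = 100 * lambda1 / len(ITEMS)
explained2 = 100 * lambda2 / len(ITEMS)
loadings1 = [math.sqrt(lambda1) * x for x in v1]
loadings2 = [math.sqrt(lambda2) * x for x in v2]

print(f"eigenvalues: {lambda1:.2f}, {lambda2:.2f}")
print(f"explained variance: {explained1:.1f}%, {explained2:.1f}%")
print("loadings 1:", [round(x, 2) for x in loadings1])
print("loadings 2:", [round(x, 2) for x in loadings2])
```

The values obtained this way should lie close to the rounded figures quoted in the text (about 3.1 and 1.3), with all six loadings positive on the first component and opposite signs for A, B, C versus D, E, F on the second.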

Page 189: Clinical Psychometrics


The abscissa in Figure A11.1 is labelled ‘Ramified hierarchy of typological components’ to allow a reference to Russell’s typology.

The first principal component, which explains slightly more than 50% of the variance, is named the general intelligence factor. Here all six tests (A, B, C, D, E and F) correlate positively; we were already aware of this from the results in Table A11.1.

However, as demonstrated in Table A11.2, this is shown more precisely by the use of factor loadings, which give only the correlation between the individual tests and the component itself. The next principal component is bidirectional, as seen in Table A11.2, as items A, B and C have positive loadings while items D, E and F have negative loadings. Loadings are thus related to correlation coefficients and lie between –1.0 and 1.0.

Figure A11.1 The calculated eigenvalues, e.g. 3.1 for the 1st component, and the corresponding percentages (explained variance). Ordinate: explained variance; abscissa: ramified hierarchy of typological components.

Component   Eigenvalue   Explained variance (cumulative)
1st         3.1          51.7%
2nd         1.3          73.4%
3rd         0.43         80.1%
4th         0.41         87.3%
5th         0.39         93.8%
6th         0.36         100%

Table A11.1 Correlation matrix. Inter-correlation coefficients for the 6 items A, B, C, D, E, F

     A      B      C      D      E      F
A    –
B    0.62   –
C    0.58   0.60   –
D    0.31   0.29   0.28   –
E    0.32   0.33   0.29   0.60   –
F    0.30   0.31   0.29   0.63   0.59   –

Page 190: Clinical Psychometrics

Teasdale then goes on to demonstrate that if you perform an actual explorative factor analysis with rotation you will merely end up with the result seen in Table A11.3. Here the rotated factor 1 consists of A, B and C with high (significant) loadings, and the rotated factor 2 consists of D, E and F with high (significant) loadings, i.e., loadings above 0.30. The explorative factor analysis is statistical, with ‘significant’ loadings, while the PCA, based on sound mathematics, directly shows the loading signs (+ or −). The factor-analytical rotation has merely ensured that all loadings are positive!

Many people interpret the result of this PCA as indicating that the first principal component ‘measures’ a general level of intelligence because all six items or tests have positive loadings. Russell’s typology is a good way to illustrate that PCA is not a method for demonstrating pure measurement techniques.

In his example Russell uses the typical Englishman. If we presume that a typical Englishman is especially linguistically gifted while a typical continental European is especially non-linguistically gifted, then, according to Russell, it is no use taking all six tests, or items (A, B, C, D, E, F), into consideration. Doing so will often show that the typical Englishman has a high score on A and B, but not on C, and low scores on D, E and F; he will then become atypical if all six criteria are required for being a typical Englishman. According to Russell one must move one step away from the first component and look at the verbal tests, or items, i.e., the items in the next component with positive loadings (A, B and C). This example also shows that the sum of all six items, or tests (A+B+C+D+E+F), is not an adequate measure of intelligence.

Table A11.2 Factor loadings for the two first principal components

              A      B      C      D       E       F
Component 1   0.72   0.73   0.70   0.72    0.73    0.72
Component 2   0.45   0.47   0.48   –0.49   –0.43   –0.47

Table A11.3 Explorative factor rotation. Factor loadings

Rotated factors   A      B      C      D      E      F
Component 1       0.83   0.84   0.83   0.16   0.20   0.18
Component 2       0.19   0.19   0.16   0.85   0.82   0.84
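The relation between Table A11.2 and Table A11.3 can be illustrated with a short Python sketch. It rests on an assumption: the text does not say which rotation method Teasdale used, so the sketch simply applies an orthogonal rotation of 45 degrees (with the sign of the second factor chosen so that its high loadings are positive), an angle picked here because it comes close to the tabulated values. The names are hypothetical. The sketch also checks that the sum of squared loadings of each unrotated component approximates its eigenvalue.

```python
import math

ITEMS = ["A", "B", "C", "D", "E", "F"]

# Unrotated loadings from Table A11.2.
COMPONENT_1 = [0.72, 0.73, 0.70, 0.72, 0.73, 0.72]
COMPONENT_2 = [0.45, 0.47, 0.48, -0.49, -0.43, -0.47]

# Rotated loadings from Table A11.3, used only for comparison.
TABLE_FACTOR_1 = [0.83, 0.84, 0.83, 0.16, 0.20, 0.18]
TABLE_FACTOR_2 = [0.19, 0.19, 0.16, 0.85, 0.82, 0.84]


def sum_of_squares(values):
    return sum(v * v for v in values)


def rotate(c1, c2, angle_degrees):
    """Orthogonal rotation of two loading vectors, second factor reflected."""
    a = math.radians(angle_degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    f1 = [cos_a * x + sin_a * y for x, y in zip(c1, c2)]
    f2 = [sin_a * x - cos_a * y for x, y in zip(c1, c2)]
    return f1, f2


eigen1 = sum_of_squares(COMPONENT_1)  # close to the eigenvalue 3.1
eigen2 = sum_of_squares(COMPONENT_2)  # close to the eigenvalue 1.3

factor_1, factor_2 = rotate(COMPONENT_1, COMPONENT_2, 45.0)  # assumed angle

for item, f1, f2 in zip(ITEMS, factor_1, factor_2):
    print(f"{item}: factor 1 = {f1:.2f}, factor 2 = {f2:.2f}")
print(f"sums of squared loadings: {eigen1:.2f}, {eigen2:.2f}")
```

After the rotation every loading is positive, and the variance that the two principal components carried unequally (about 3.1 and 1.3) is shared almost equally between the two rotated factors, while the total for each item is unchanged.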

Page 191: Clinical Psychometrics


In order to assess whether the total score of a collection of tests, or items, is a sufficient measure of intelligence, or of depression, it is necessary to perform an item response theory (IRT) analysis (see the next Calculus Example). PCA can thus be used to determine whether certain items in a scale correlate with many of the other items in the scale, but especially to determine whether there is a dual component which can be used to classify or type rather than to perform an actual measurement.

In the field of depression the typology of items is important when classifying antidepressive drugs as either sedative or non-sedative, and measurement techniques are important when assessing actual antidepressive effect.

References

Teasdale, T.W. (1992) Psykometriske aspekter af kvantitativ testning (Psychometric aspects of quantitative testing). In: Undersøgelsesmetoder i klinisk psykologi (Evaluation methods in clinical psychology) (ed L. Østergaard), pp. 112–35. København, Munksgaard.

Russell, B. (1956) My philosophical development. London, Routledge.

Child, D. (2006) The essentials of factor analysis. 3rd edition. London, Continuum.

Page 192: Clinical Psychometrics


Appendix 11b Calculus Example 2

Rasch analysis (IRT)

[Figure: curves for ‘Lowered mood’, ‘Sleep disturbances’ and ‘Guilt feelings’. Ordinate: percentage presence of symptoms (20%, 50%, 80% and 100% marked). Abscissa: total score (3, 7 and 12 marked).]

This figure is a modified version of an example from Teasdale (1992). It is modified in the sense that, amongst other things, it shows three symptoms on a depression scale. Each symptom is scored from 0 to 4; theoretically the total score thus ranges from 0 to 12.

‘Lowered Mood’ is considered present at a total score of approximately 3, as half of the patients with a total score of 3 have lowered mood. In contrast, the symptom ‘Guilt Feelings’ is present in half of the patients only when the total score is approximately 7. These two symptoms fulfil the Rasch requirement that patients with the symptom ‘Guilt Feelings’ should also demonstrate ‘Lowered Mood’. Conversely, patients who score approximately 3 present only with Lowered Mood, not Guilt Feelings.

The case is different with the symptom ‘Sleep Disturbances’. Some patients with low scores already suffer from sleep disturbances: at a total score of around 3, approximately 20% have sleep disturbances.


Page 193: Clinical Psychometrics


In patients with a total score of approximately 7, 80% present with sleep disturbances, but it is not known whether these patients also have guilt feelings. The two curves in the figure showing ‘Lowered Mood’ and ‘Guilt Feelings’ are correct item-characteristic curves according to the Rasch analysis, as they are S-shaped and do not intersect. The ‘Sleep Disturbances’ curve is not S-shaped and intersects both: first the ‘Lowered Mood’ curve (here showing that 20% of patients with a low total score suffer from sleep disturbances) and further on the ‘Guilt Feelings’ curve (now showing that 20% of patients with severe depression do not suffer from sleep disturbances). Thus the symptom ‘Sleep Disturbances’ cannot be said to contribute in such a way that the total score is a sufficient measure of depression. The HAM-D6, with its six different depression symptoms, fulfils the requirements of the Rasch analysis.
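The difference between the two kinds of curve can be illustrated with a small Python sketch. In the Rasch model the probability that an item is present is exp(θ − b) / (1 + exp(θ − b)), where θ is the person's position and b the item's location, so that all item curves have the same S-shape and cannot intersect. The sketch rests on assumptions: it uses the total score as a stand-in for θ, places ‘Lowered Mood’ at 3 and ‘Guilt Feelings’ at 7, and invents a curve for ‘Sleep Disturbances’ that runs from about 20% to about 80%, chosen only to mimic the description of the figure. All names are hypothetical.

```python
import math


def rasch_curve(score, location):
    """S-shaped Rasch curve: 50% presence exactly at the item's location."""
    return 1.0 / (1.0 + math.exp(-(score - location)))


def lowered_mood(score):
    return rasch_curve(score, 3.0)


def guilt_feelings(score):
    return rasch_curve(score, 7.0)


def sleep_disturbances(score):
    """Invented non-Rasch curve with a floor of 20% and a ceiling of 80%."""
    return 0.2 + 0.6 / (1.0 + math.exp(-2.0 * (score - 5.0)))


def count_intersections(f, g, grid):
    """Count how often the difference f - g changes sign along the grid."""
    count = 0
    previous = f(grid[0]) - g(grid[0])
    for x in grid[1:]:
        current = f(x) - g(x)
        if previous * current < 0:
            count += 1
        previous = current
    return count


grid = [i / 10 for i in range(0, 121)]  # total scores from 0 to 12

mood_guilt = count_intersections(lowered_mood, guilt_feelings, grid)
sleep_mood = count_intersections(sleep_disturbances, lowered_mood, grid)
sleep_guilt = count_intersections(sleep_disturbances, guilt_feelings, grid)

print("Lowered Mood vs Guilt Feelings:", mood_guilt)
print("Sleep Disturbances vs Lowered Mood:", sleep_mood)
print("Sleep Disturbances vs Guilt Feelings:", sleep_guilt)
```

With these assumed curves the two Rasch items never cross, so their ordering is the same at every level of severity, whereas the invented ‘Sleep Disturbances’ curve crosses each of them once.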

References

Teasdale, T.W. (1992) Psykometriske aspekter af kvantitativ testning (Psychometric aspects of quantitative testing). In: Undersøgelsesmetoder i klinisk psykologi (Evaluation methods in clinical psychology) (ed L. Østergaard), pp. 112–35. København, Munksgaard.

Bech, P. (1984) The instrumental use of rating scales for depression. Pharmacopsychiatry, 17, 22–8.

Page 194: Clinical Psychometrics

185

Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

1 Bech , P. ( 2009 ) Fifty years with the Hamilton scales for anxiety and depression.

A tribute to Max Hamilton . Psychotherapy and Psychosomatics , 78 ( 4 ), 202 – 11 .

2 Feinstein , A.R. ( 1987 ) Clinimetrics . New Haven , Yale University Press .

3 Bech , P. ( 2008 ) Pichot P - A tribute to the European pharmacopsychologist on his

90th birthday . European Psychiatric Review , 2 , 76 – 80 .

4 Bech , P. ( 1993 ) Rating scales for psychopathology, health status and quality of life. A

compendium on documentation in accordance with the DSM-III-R and WHO sys-

tems . Berlin , Springer .

5 Guilford , J.P. ( 1936 ) Psychometric methods . New York , Mc Graw-Hill .

6 Sontag , S . ( 1977 ) Photography unlimited . The New York Review of Books 1977

(June 23), 26 – 31 .

7 Putman , H. ( 1995 ) Pragmatism . Oxford , Blackwell .

8 Rasmussen , H. , Erritzoe , D. , Andersen , R. , Ebdrup , B.H. , Aggernaes , B. , Oranje ,

B. , et al. ( 2010 ) Decreased frontal serotonin2A receptor binding in antipsychotic-

naive patients with first-episode schizophrenia . Archives of General Psychiatry ,

67 ( 1 ), 9 – 16 .

9 Tone , A. ( 2010 ) Andreasen, N. Interview by A. Tone. In: An oral history of neu-

ropsychopharmacology. The first fifty years (ed T . Ban ). Tennessee , American

College of Neuropsychopharmacology .

10 Høffding , H. ( 1906 ) The problems of philosophy (with a preface by William James) .

London , MacMillan .

11 Otto , R . ( 1932 ) Das Gefühl des überweltlichen. (Sensus Numinis) . Munich ,

C.H.Beck .

12 Maslow , A.H . ( 1968 ) Toward a psychology of being . New York , D. Van Nostrand Co .

13 Vannerus , A. ( 1929 ) Wundts psykologi . Stockholm , Bonniers .

14 Thomsen , R. ( 1968 ) The Pelican history of psychology . London , Penguin Books Ltd .

15 Jablensky , A. , Hugler , H. , Von Cranach , M. , & Kalinov , K. ( 1993 ) Kraepelin revis-

ited: a reassessment and statistical analysis of dementia praecox and manic-

depressive insanity in 1908 . Psychological Medicine , 23 ( 4 ), 843 – 58 .

References

Page 195: Clinical Psychometrics

186 References

16 Østergaard , L. ( 1962 ) En psykologisk analyse af de formelle skizofrene tankeforstyr-

relser (A psychological analysis of schizophrenic formal thought disorder) .

Copenhagen , Munksgaard .

17 Spearman , C. ( 1904 ) General intelligence objectively determined and measured .

American Journal of Psychology , 15 , 201 – 93 .

18 Spearman , C. ( 1927 ) The abilities of man: Their nature and measurement . New

York , Macmillan .

19 Guilford , J.P. ( 1954 ) Psychometric methods . New York , McGraw-Hill .

20 Thurstone , L.L. ( 1947 ) Multiple factor analysis: A development and expansion of

vectors of the mind . Chicago , Chicago University Press .

21 Cattell , R.B. ( 1978 ) The scientific use of factor analysis . New York , Plenum Press .

22 Comrey , A.L. , & Lee H.B. ( 1992 ) A first course in factor analysis . New York ,

Laurence Erlbaum .

23 Vernon , P.E. ( 1950 ) The structure of human abilities . London , Methuen .

24 Hotelling , H. ( 1933 ) Analysis of a Complex of Statistical Variables with Principal

Components . Journal of Educational Psychology , 24 , 417 – 41 .

25 Hotelling , H. ( 1936 ) Simplified calculation of principal components . Psychometrika ,

1 ,  27 – 35 .

26 Dunteman , G.H. ( 1989 ) Principal components analysis . Newbury Park , SAGE

Publications .

27 Russell , B. ( 1956 ) My philosophical development . London , Routledge .

28 Schafer , R. ( 1948 ) The clinical application of psychological tests . New York ,

International Universities Press .

29 Kline , P. ( 1993 ) The handbook of psychological testing . London , Routledge .

30 Eysenck , H.J. , & Eysenck , S.B.G. ( 1975 ) Manual of the Eysenck Personality

Questionnaire . London , Hodder Stoughton .

31 Eysenck , H.J. ( 1953 ) The structure of human personality . London , Methuen .

32 Beckmann , J.H. ( 1995 ). Røveriets bio-psyko-sociale konsekvenser (The bio-psycho-

social consequences of robbery) . Odense , Denmark, Odense University Hospital .

33 Bech , P. , Jorgensen , B. , Jeppesen , K. , Loldrup Poulsen , D. , & Vanggaard , T. ( 1986 )

Personality in depression: concordance between clinical assessment and question-

naires . Acta Psychiatrica Scandinavica , 74 ( 3 ), 263 – 8 .

34 Thunedborg , K. , Black , C.H. , & Bech , P. ( 1995 ) Beyond the Hamilton depression

scores in long-term treatment of manic-melancholic patients: prediction of

recurrence of depression by quality of life measurements . Psychotherapy and

Psychosomatics , 64 ( 3–4 ), 131 – 40 .

35 Spielberger , C.D. , Gorsuch , R. , & Lushene , R.E. ( 1970 ) The State-Trait Inventory:

Test Manual ( STAI ) . Palo Alto, CA, Consulting Psychologist Press.

36 Digman , J.M. ( 1990 ) Personality structure: Emergence of the Five-Factor Model .

Annual Review of Psychology , 41 , 417 – 40 .

37 Wiggins , J.S. (ed.) ( 1996 ) The five factor model of personality. Theoretical

perspectives . New York , Guildford Press .

38 Hamilton , M. ( 1959 ) The assessment of anxiety states by rating . British Journal of

Medical Psychology , 32 ( 1 ), 50 – 5 .

Page 196: Clinical Psychometrics

References 187

39 Hamilton , M. ( 1960 ) A rating scale for depression . Journal of Neurology

Neurosurgery and Psychiatry , 23 , 56 – 62 .

40 Hamilton , M. ( 1969 ) Diagnosis and rating of anxiety . British Journal of Psychiatry,

Special Publication 3 , 76 – 9 .

41 Pichot , P. , Pull , C.B , von Frenckell , R. , & Pull , M.C. ( 1981 ) Une analyse factorielle

de l ’ echelle d ’ appreciation de l ’ anxieté de Hamilton . Psychiatria Fennica , 13 , 183 – 9 .

42 Bech , P. , Allerup , P. , Maier , W. , Albus , M. , Lavori , P. , & Ayuso , J.L. ( 1992 ) The

Hamilton scales and the Hopkins Symptom Checklist (SCL-90) . A cross-national

validity study in patients with panic disorders . British Journal of Psychiatry , 160 ,

206 – 11 .

43 Hamilton , M. ( 1958 ) Treatment of anxiety states. III. Components of anxiety and

their response to benactyzine . Journal of Mental Science , 104 ( 437 ), 1062 – 8 .

44 Bech , P. , Fava , M. , Trivedi , M.H. , Wisniewski , S.R. , & Rush , A.J. ( 2011 ) Factor

structure and dimensionality of the two depression scales in STAR*D using level

1 datasets . Journal of Affective Disorders , 132 ( 3 ), 396 – 400 .

45 Overall , J.E. , & Gorham , D.R. ( 1962 ) The brief psychiatric rating scale . Psychological

reports , 10 , 799 – 812 .

46 Hedlund , J.L. , & Vieweg , B.W. ( 1980 ) The Brief Psychiatric Rating Scale BPRS: a

comprehensive review . Journal of Operational Psychiatry , 11 , 48 – 65 .

47 Binet , A. , & Simon , T . ( 1905 ) New methods for the diagnosis of the intellectual

level of subnormals (translated by Wiseman S. Intelligence and ability. London,

Penguin Books, 1967). L ’ Année Psychologique , 12 , 191 – 244 .

48 Rhoades , H.M. , & Overall , J.E. ( 1988 ) The semi-structured Brief Psychiatric

Rating Scale interview and rating guide . Psychopharmacology Bulletin , 24 , 101 – 4 .

49 Turner , W.J. ( 1963 ) Glossaries for use with the Overall and Gorham Brief Psychiatric

Rating Scale . New York , Research Division, Central Islip State Hospital .

50 Spearman , C. ( 1937 ) Psychology down the ages . London , MacMillan .

51 Nunnally , J.C. ( 1967 ) Psychometric theory . New York , McGraw-Hill .

52 Nunnally , J.C. , & Bernstein , I.R. ( 1994 ) Psychometric theory . Third ed. New York ,

McGraw-Hill .

53 Bech , P. ( 2009 ) Applied psychometrics in clinical psychiatry: the pharmacopsy-

chometric triangle . Acta Psychiatrica Scandinavica , 120 ( 5 ), 400 – 9 .

54 American Psychiatric Association . ( 1980 ) The Diagnostic and Statistical Manual of

Mental Disorders , third edition ( DSM-III ) . Washington DC , American Psychiatric

Association .

55 World Health Organization . ( 1992 ) International Classification of Disease . Tenth

Revision (ICD-10). Geneva , World Health Organization.

56 American Psychiatric Association . ( 1994 ) The Diagnostic and Statistical Manual of

Mental Disorders , Fourth Edition ( DSM-IV ) . Washington DC , American

Psychiatric Association .

57 Demjaha , A. , Morgan , K. , Morgan , C. , Landau , S. , Dean , K. , Reichenberg , A. , et al.

( 2009 ) Combining dimensional and categorical representation of psychosis: the

way forward for DSM-V and ICD-11? Psychological Medicine , 39 ( 12 ), 1943 – 55 .

58 Furr , R.M. , & Bacharach , V.R. ( 2008 ) Psychometrics . London , SAGE Publications .

Page 197: Clinical Psychometrics

188 References

59 Bech , P. ( 2008 ) The use of rating scales in affective disorders . European Psychiatric

Review , 1 , 14 – 18 .

60 Box , J.F. , & Fisher , R.A. ( 1978 ) The life of a scientist . Chichester , John Wiley .

61 Fisher , R.A. ( 1922 ) On the mathematical foundation of theoretical statistics .

Philosophical Transactions , 222 , 309 – 68 .

62 Olsen , L.W . ( 1999 ) Georg Rasch og målingsmodellerne (Georg Rasch and the

measurement models). Statistical Department, University of Copenhagen.

63 Fischer , G.H. , & Molenaar , I.W. ( 1995 ) Rasch models . Berlin , Springer .

64 Bech , P. ( 1981 ) Rating scales for affective disorders: their validity and consistency .

Acta Psychiatrica Scandinavica , 295 , 1 – 101 .

65 de Mars , C. ( 2010 ) Item response theory . Oxford , Oxford University Press .

66 Michell , J. ( 1990 ) An introduction to the logic of psychological measurement .

New York , Psychology Press .

67 Suchman , E .A. ( 1950 ) The utility of scalegram analysis . In: Measurement and pre-

dictions . (eds S.A. Stouffer , L. Guttman , & E.A. Suchman ), pp. 122 – 71 . Princeton ,

Princeton University Press .

68 Michell , J. ( 1999 ) Measurement in psychology . Cambridge , Cambridge University

Press .

69 Borsboom , D. ( 2005 ) Measuring the mind . Cambridge , Cambridge University

Press .

70 Bond , T.G. , & Fox , C.M. ( 2001 ) Applying the Rasch model . London , Lawrence

Erlbaum .

71 Allerup , P. ( 1986 ) Statistical analysis of MADRS : A rating scale . Copenhagen ,

Danish Institute for Educational Research .

72 Rasch , G. ( 1953 ) On simultaneous factor analysis in several populations . Uppsala,

Nordisk Psykologi ’ s Monograph Series No. 3 , pp. 65 – 71 .

73 Siegel , S. ( 1956 ) Nonparametric statistics for the behavioural sciences . New York ,

McGraw Hill .

74 Mokken , R.J. ( 1971 ) Theory and procedure of scale analysis . Berlin , Monton .

75 Sijtsna , K. , & Molenaar , I.W. ( 2002 ) Introduction to nonparametric item response

theory . London , Sage Publications .

76 Loevinger , J. ( 1957 ) Objective tests as instruments of psychological theory .

Psychological Reports , 3 , 635 – 94 .

77 Wittgenstein , L. ( 1953 ) Philosophical investigations . Oxford , Blackwell .

78 Ryle , G. (ed.) ( 1967 ) The revolution in philosophy . London , MacMillan .

79 Bech , P. ( 2011 ) The ABC profile of the HAM-D17 . Revista Brasileira de Psiquiatria ,

33 ( 2 ), 109 – 10 .

80 Ramsey , J.O. ( 1973 ) The effect of number of categories in rating scales in precision

of estimation of scale values . Psychometrika , 38 , 513 – 32 .

81 Freyd , M. ( 1923 ) The graphical rating scale . Journal of Educational Psychology , 14 ,

83 – 102 .

82 Asberg , M. , Montgomery , S.A. , Perris , C. , Schalling , D. , & Sedvall , G. ( 1978 ) A

comprehensive psychopathological rating scale . Acta Psychiatrica Scandinavica,

Suppl 1978 ( 271 ), 5 – 27 .

Page 198: Clinical Psychometrics

References 189

83 Bent-Hansen , J. , & Bech , P. ( 2011 ) Validity of the Definite and Semidefinite

Questionnaire version of the Hamilton Depression Scale, the Hamilton Subscale

and the Melancholia Scale . Part I. European Archives of Psychiatry and Clinical

Neuroscience , 261 , 37 – 46 .

84 Paykel , E.S . ( 1985 ) The clinical interview for depression. Development, reliability

and validity . Journal of Affective Disorders , 9 ( 1 ), 85 – 96 .

85 Hamilton , M. ( 1967 ) Development of a rating scale for primary depressive illness .

British Journal of Social & Clinical Psychology , 6 ( 4 ), 278 – 96 .

86 Fiske , D.W. ( 1983 ) Methodological perspectives on psychiatric rating scales . In:

Statistical and methodological advances in psychiatric research , (eds R.D . Gibbons , &

M.W . Dysken ), pp. 35 – 58 . Lancaster , MTP Press .

87 Lorr , M. ( 1974 ) Assessing psychotic behaviour by the IMPS . In: Psychological

measurements in psychopharmacology , (ed P . Pichot ), pp. 50 – 63 . Basel , Karger .

88 Overall , J.E. ( 1974 ) The Brief Psychiatric Rating Scale in psychopharmacology

research . In: Psychological measurements in psychopharmacology , (ed P. Pichot ),

pp. 67 – 78 . Basel , Karger .

89 Ban , T. (ed.) ( 2010 ) An oral history of neuropsychopharmacology. The first fifty

years . Brentwood, TN , American College of Neuropsychopharmacology .

90 Overall , J.E. ( 1979 ) Criteria for selection of subjects for research in biological psy-

chiatry . In: Handbook of biological psychiatry , (ed H.M.V . Praag ), pp. 359 – 91 . New

York , Decker .

91 Andersen , J. , Larsen , J.K. , Schultz , V. , Nielsen , B.M. , Korner , A. , Behnke , K. , et al.

( 1989 ) The Brief Psychiatric Rating Scale. Dimension of schizophrenia-reliability

and construct validity . Psychopathology , 22 ( 2–3 ), 168 – 176 .

92 Guy , W. ( 1976 ) Early Clinical Drug Evaluation ( ECDEU ) Assessment manual .

Rockville , National Institute of Health.

93 Cohen , J. ( 1960 ) A coefficient of agreement for nominal scales . Educational and

Psychological Measurement , 29 , 37 – 46 .

94 Cohen , J. ( 1969 ) Statistical power analysis for the behavioural sciences . Hillsdale ,

Lawrence Erlbaum .

95 Cohen , J. ( 1994 ) The earth is round (P < 0.05) . American Psychologist , 49 ,

997 – 1003 .

96 Karpatschof , B . ( 2006 ) Udforskning i psykologi. De kvantitative metoder

(Research in psychology. The quantitative methods). Copenhagen , Akademisk

Forlag .

97 Cohen , J. ( 1976 ) S tatistical power analysis for the behavioural sciences . Second Ed.

New York , Lawrence Erlbaum .

98 Bech , P. , Cialdella , P. , Haugh , M.C. , Birkett , M.A. , Hours , A. , Boissel , J.P ., et al.

( 2000 ) Meta-analysis of randomised controlled trials of fluoxetine v. placebo and

tricyclic antidepressants in the short-term treatment of major depression . British

Journal of Psychiatry , 176 , 421 – 8 .

99 Turner , E.H , Matthews , A.M. , Linardatos , E. , Tell , R.A. , & Rosenthal , R. ( 2008 )

Selective publication of antidepressant trials and its influence on apparent efficacy .

New England Journal of Medicine , 358 ( 3 ), 252 – 60 .

Page 199: Clinical Psychometrics

190 References

100 Kirsch , I. , Deacon , B.J. , Huedo-Medina , T.B. , Scoboria , A. , Moore , T.J. , &

Johnson ,  B.T. ( 2008 ) Initial severity and antidepressant benefits: a meta-analysis of

data submitted to the Food and Drug Administration . PLoS Medicine , 5 ( 2 ), e45 .

101 Norman , G.R. , Sloan , J.A. , & Wyrwich , K.W. ( 2003 ) Interpretation of changes in

health-related quality of life: the remarkable universality of half a standard devia-

tion . Medical Care , 41 ( 5 ), 582 – 92 .

102 Entsuah , R. , Shaffer , M. , & Zhang , J. ( 2002 ) A critical examination of the sensi-

tivity of unidimensional subscales derived from the Hamilton Depression

Rating Scale to antidepressant drug effects . Journal of Psychiatric Research ,

36 ( 6 ), 437 – 48 .

103 Bech , P. , Tanghoj , P. , Andersen , H.F. , & Overo , K. ( 2002 ) Citalopram dose-

response revisited using an alternative psychometric approach to evaluate clinical

effects of four fixed citalopram doses compared to placebo in patients with major

depression . Psychopharmacology , 163 ( 1 ), 20 – 5 .

104 Bech , P. , Tanghoj , P. , Cialdella , P. , Andersen , H.F. , & Pedersen , A.G. ( 2004 )

Escitalopram dose-response revisited: an alternative psychometric approach to

evaluate clinical effects of escitalopram compared to citalopram and placebo in patients with major depression. International Journal of Neuropsychopharmacology, 7(3), 283–90.

105 Bech, P. (2001) Meta-analysis of placebo-controlled trials with mirtazapine using the core items of the Hamilton Depression Scale as evidence of a pure antidepressive effect in the short-term treatment of major depression. International Journal of Neuropsychopharmacology, 4(4), 337–45.

106 Bech, P., Kajdasz, D.K., & Porsdal, V. (2006) Dose-response relationship of duloxetine in placebo-controlled clinical trials in patients with major depressive disorder. Psychopharmacology, 188(3), 273–80.

107 Cattell, R.B. (1973) Personality and mood questionnaire. San Francisco, Jossey-Bass Publishers.

108 Bech, P., Allerup, P., Reisby, N., & Gram, L.F. (1984) Assessment of symptom change from improvement curves on the Hamilton depression scale in trials with antidepressants. Psychopharmacology, 84(2), 276–81.

109 Lingjaerde, O., Ahlfors, U.G., Bech, P., Dencker, S.J., & Elgen, K. (1987) The UKU side effect rating scale. A new comprehensive rating scale for psychotropic drugs and a cross-sectional study of side effects in neuroleptic-treated patients. Acta Psychiatrica Scandinavica, 334, 1–100.

110 Casey, P., Maracy, M., Kelly, B.D., Lehtinen, V., Ayuso-Mateos, J.L., Dalgard, O.S., et al. (2006) Can adjustment disorder and depressive episode be distinguished? Results from ODIN. Journal of Affective Disorders, 92(2–3), 291–7.

111 Rogers, S.L., Doody, R.S., Mohs, R.C., & Friedhoff, L.T. (1998) Donepezil improves cognition and global function in Alzheimer disease: a 15-week, double-blind, placebo-controlled study. Donepezil Study Group. Archives of Internal Medicine, 158(9), 1021–31.

112 Caroe, T.K., & Moe, C. (2009) Adverse events causing discontinuation of donepezil for Alzheimer’s dementia. Ugeskr Laeger, 171(50), 3690–3.


113 Zimbroff, D.L., Kane, J.M., Tamminga, C.A., Daniel, D.G., Mack, R.J., Wozniak, P.J., et al. (1997) Controlled, dose-response study of sertindole and haloperidol in the treatment of schizophrenia. Sertindole Study Group. American Journal of Psychiatry, 154(6), 782–91.

114 Simpson, G.M., & Angus, J.W. (1970) A rating scale for extrapyramidal side effects. Acta Psychiatrica Scandinavica, 212, 11–19.

115 Bech, P., Tanghoj, P., Andreasson, K., & Overo, K.F. (2011) Dose-response relationship of sertindole and haloperidol using the pharmacopsychometric triangle. Acta Psychiatrica Scandinavica, 123, 154–61.

116 Lehman, A.F. (1996) Measures of quality of life among persons with severe and persistent mental disorders. Social Psychiatry and Psychiatric Epidemiology, 31(2), 78–88.

117 Bech, P., & Rafaelsen, O.J. (1980) Personality and manic-melancholic illness. Psychiatria Fennica, Supplementum, 223–31.

118 Bech, P. (2006) The full story of lithium. A tribute to Mogens Schou (1918–2005). Psychotherapy and Psychosomatics, 75(5), 265–9.

119 Johnstone, E.C., Crow, T.J., Frith, C.D., & Owens, D.G. (1988) The Northwick Park “functional” psychosis study: diagnosis and treatment response. Lancet, 2(8603), 119–25.

120 Bentall, R.P. (2003) Madness explained. London, Allen Lane.

121 Gjerris, A., Bech, P., Broen-Christensen, C., Geisler, A., Klysner, R., & Rafaelsen, O.J. (1981) Haloperidol levels in relation to antimanic effect. In: Clinical pharmacology and psychiatry (eds E. Usdin, S. Dahl, L.F. Gram, O. Lingjærde), pp. 227–32. London, MacMillan Press.

122 Bech, P., Gex-Fabry, M., Aubry, J.M., Favre, S., & Bertschy, G. (2006) Olanzapine plasma level in relation to antimanic effect in the acute therapy of manic states. Nordic Journal of Psychiatry, 60(2), 181–2.

123 Greenberg, G. (2010) Manufacturing depression. The secret history of a modern disease. London, Bloomsbury.

124 Boyer, P., Montgomery, S., Lepola, U., Germain, J.M., Brisard, C., Ganguly, R., et al. (2008) Efficacy, safety, and tolerability of fixed-dose desvenlafaxine 50 and 100 mg/day for major depressive disorder in a placebo-controlled trial. International Clinical Psychopharmacology, 23(5), 243–53.

125 Bjerrum, H., Allerup, P., Thunedborg, K., Jakobsen, K., & Bech, P. (1992) Treatment of generalized anxiety disorder: comparison of a new beta-blocking drug (CGP 361 A), low-dose neuroleptic (flupenthixol), and placebo. Pharmacopsychiatry, 25(5), 229–32.

126 Bech, P. (2007) Dose-response relationship of pregabalin in patients with generalized anxiety disorder. A pooled analysis of four placebo-controlled trials. Pharmacopsychiatry, 40(4), 163–8.

127 Rickels, K., Downing, R., Schweizer, E., & Hassman, H. (1993) Antidepressants for the treatment of generalized anxiety disorder. A placebo-controlled comparison of imipramine, trazodone, and diazepam. Archives of General Psychiatry, 50(11), 884–95.


128 Bech, P., Thomsen, J., Prytz, S., Vendsborg, P.B., Zilstorff, K., & Rafaelsen, O.J. (1979) The profile and severity of lithium-induced side effects in mentally healthy subjects. Neuropsychobiology, 5(3), 160–6.

129 Trivedi, M.H., Fava, M., Wisniewski, S.R., Thase, M.E., Quitkin, F., Warden, D., et al. (2006) Medication augmentation after the failure of SSRIs for depression. New England Journal of Medicine, 354(12), 1243–52.

130 Bech, P., Fava, M., Trivedi, M.H., Wisniewski, S.R., & Rush, A.J. (2012) Outcomes on the Pharmacopsychometric Triangle: bupropion-SR versus buspirone augmentation of citalopram in the STAR*D Trial. Acta Psychiatrica Scandinavica, 125(4), 342–8.

131 Harper, R.S. (1949) The laboratory of William James. Harvard Alumni Bulletin November, 169–73.

132 Bech, P. (1999) Stress og livskvalitet (Stress and quality of life). Copenhagen, PsykiatriFondens Forlag.

133 James, W. (1897) The will to believe. London, Longmans, Green & Co.

134 James, W. (1907) Talks to teachers. New York, Norton.

135 Bentham, J. (1834) Deontology or the science of morality. London, University of London.

136 Ware, Jr., J.E., Kosinski, M., Gandek, B., Aaronson, N.K., Apolone, G., Bech, P., et al. (1998) The factor structure of the SF-36 Health Survey in 10 countries: results from the IQOLA Project. International Quality of Life Assessment. Journal of Clinical Epidemiology, 51(11), 1159–65.

137 Murray, H.A. (1938) Exploration in personality. New York, Oxford University Press.

138 Rasmussen, E.T. (1965) Dynamisk psykologi og dens grundlag (Dynamic psychology and its basis). Copenhagen, Munksgaard.

139 Dupuy, H.J. (1984) The Psychological General Well-Being Index (PGWB). In: Assessment of quality of life in clinical trials of cardiovascular therapy (eds N.K. Wenger, M.E. Mattson, C.D. Furberg, J. Elinson), pp. 184–8. New York, Le Jacq Publishing.

140 Bech, P., Gudex, C., & Johansen, K.S. (1996) The WHO (Ten) Well-Being Index: validation in diabetes. Psychotherapy and Psychosomatics, 65(4), 183–90.

141 Noerholm, V., Groenvold, M., Watt, T., Bjorner, J.B., Rasmussen, N.A., & Bech, P. (2004) Quality of life in the Danish general population–normative data and validity of WHOQOL-BREF using Rasch and item response theory models. Quality of Life Research, 13(2), 531–40.

142 Bech, P., Olsen, L.R., Kjoller, M., & Rasmussen, N.K. (2003) Measuring well-being rather than the absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and the WHO-Five Well-Being Scale. International Journal of Methods in Psychiatric Research, 12(2), 85–91.

143 McDowell, I. (2010) Measures of self-perceived well-being. Journal of Psychosomatic Research, 69(1), 69–79.

144 Speer, D.C. (1998) Mental health outcome evaluations. San Diego, Academic Press.


145 Carrasco-Lucas, R., Allerup, P., & Bech, P. (2012) The validity of the invariant item ordering of the World Health Organization-Five Well-Being Index in screening for the elements of tiredness and unrested sleep within apathy in an elderly population.

146 Christensen, K.S., Bech, P., & Fink, P. (2010) Measuring mental health by questionnaires in primary care – unidimensionality, responsiveness and compliance. European Psychiatric Review, 3, 8–12.

147 Bech, P., Gormsen, L., Loldrup, D., & Lunde, M. (2009) The clinical effect of clomipramine in chronic idiopathic pain disorder revisited using the Spielberger State Anxiety Symptom Scale (SSASS) as outcome scale. Journal of Affective Disorders, 119(1–3), 43–51.

148 Kristensen, T.S., Borg, V., & Hannerz, H. (2002) Socioeconomic status and psychosocial work environment: results from a Danish national study. Scandinavian Journal of Public Health, 59, 41–8.

149 Davidson, J.R.T., & Foa, E.B. (1993) Posttraumatic stress disorder. DSM-IV and beyond. Washington DC, American Psychiatric Press.

150 Buitenhuis, J., de Jong, P.J., Jaspers, J.P., & Groothoff, J.W. (2006) Relationship between posttraumatic stress disorder symptoms and the course of whiplash complaints. Journal of Psychosomatic Research, 61(5), 681–9.

151 Selye, H. (1974) Stress without distress. 1st ed. New York, Lippincott.

152 Selye, H. (1980) Stress uden angst (Stress without anxiety). Copenhagen, Gyldendal.

153 Bech, P. (2002) Measurement issues. In: Biological psychiatry (eds H. D’Haenen, J.A. Den Boer, P. Willner), pp. 25–36. New York, John Wiley.

154 Grinker, R.R.S., Miller, J., Sabshin, M., & Nunnally, J.C. (1961) The Phenomena of Depressions. New York, Hoeber.

155 Olsen, L.R., Mortensen, E.L., & Bech, P. (2004) Prevalence of major depression and stress indicators in the Danish general population. Acta Psychiatrica Scandinavica, 109(2), 96–103.

156 Olsen, L.R. (2007) Measurements of depressive illness and mental distress in the Danish general population. Copenhagen, Copenhagen University.

157 Endler, N.S., & Magnusson, D. (1976) Multidimensional aspects of State and Trait anxiety: A cross-cultural study of Canadian and Swedish college students. In: Cross-cultural anxiety (eds C.D. Spielberger, R. Diaz-Guerrero), pp. 143–72. Washington DC, Hemisphere Publishing.

158 Awata, S., Bech, P., Yoshida, S., Hirai, M., Suzuki, S., Yamashita, M., et al. (2007) Reliability and validity of the Japanese version of the World Health Organization-Five Well-Being Index in the context of detecting depression in diabetic patients. Psychiatry and Clinical Neurosciences, 61(1), 112–19.

159 de Wit, M., Pouwer, F., Gemke, R.J., Delemarre-van de Waal, H.A., & Snoek, F.J. (2007) Validation of the WHO-5 Well-Being Index in adolescents with type 1 diabetes. Diabetes Care, 30(8), 2003–6.

160 Birket-Smith, M., Hansen, B.H., Hanash, J.A., Hansen, J.F., & Rasmussen, A. (2009) Mental disorders and general well-being in cardiology outpatients–6-year survival. Journal of Psychosomatic Research, 67(1), 5–10.


161 Bech, P., Bille, J., Lindberg, L., Waarst, S., Lauge, N., & Treufeldt, P. (2010) Health of the Nation Outcome Scales (HoNOS). Ti år med HoNOS: 2000–2009 (Ten years with HoNOS: 2000–2009). Hillerød, Psykiatrisk Center Nordsjælland, Forskningsenheden.

162 Lichtenberg, P., & Belmaker, R.H. (2010) Subtyping major depressive disorder. Psychotherapy and Psychosomatics, 79(3), 131–5.

163 Lam, R.W., Michalak, E.E., & Swinson, R.P. (2006) Assessment scales in depression and anxiety. London, Taylor & Francis.

164 Rush, A.J. (2007) STAR*D: what have we learned? American Journal of Psychiatry, 164(2), 201–4.

165 Gottesman, I.I., & Gould, T.D. (2003) The endophenotype concept in psychiatry: etymology and strategic intentions. American Journal of Psychiatry, 160(4), 636–45.

166 Körner, S. (1986) The philosophy of mathematics. New York, Dover Publications.

167 Barrett, C. (ed.) (1966) Wittgenstein. Oxford, Blackwell.

168 Regis, E. (1987) Who got Einstein’s office? New York, Addison-Wesley.

169 Angst, J. (1966) Zur Ätiologie und Nosologie endogener depressiver Psychosen. Berlin, Springer.

170 Stieglitz, R.D., Fahndrich, E., & Renfordt, E. (1988) Interrater study for the AMDP system. Pharmacopsychiatry, 21(6), 451–2.

171 Angst, J., Adolfsson, R., Benazzi, F., Gamma, A., Hantouche, E., Meyer, T.D., et al. (2005) The HCL-32: towards a self-assessment tool for hypomanic symptoms in outpatients. Journal of Affective Disorders, 88(2), 217–33.

172 Hirschfeld, R.M., Williams, J.B., Spitzer, R.L., Calabrese, J.R., Flynn, L., Keck, Jr, P.E., et al. (2000) Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire. American Journal of Psychiatry, 157(11), 1873–5.

173 Moller, H.J. (2001) Methodological aspects in the assessment of severity of depression by the Hamilton Depression Scale. European Archives of Psychiatry and Clinical Neurosciences, 251 Suppl 2, II13–20.

174 Moller, H.J. (2009) Standardised rating scales in psychiatry: methodological basis, their possibilities and limitations and descriptions of important rating scales. World Journal of Biological Psychiatry, 10(1), 6–26.

175 Guidi, J., Fava, G.A., Bech, P., & Paykel, E.S. (2011) The Clinical Interview for Depression: A comprehensive review of studies and clinimetric properties. Psychotherapy and Psychosomatics, 80, 10–27.

176 Paykel, E.S., Klerman, G.L., & Prusoff, B.A. (1970) Treatment setting and clinical depression. Archives of General Psychiatry, 22, 11–21.

177 Paykel, E.S. (1990) Use of the Hamilton Depression Scale in general practice. In: The Hamilton Scales (eds P. Bech, A. Coppen), pp. 40–9. Berlin, Springer.

178 Lingjaerde, O., Edlund, A.H., Gormsen, C.A., Gottfries, C.G., Haugstad, A., Hermann, I.L., et al. (1974) The effects of lithium carbonate in combination with tricyclic antidepressants in endogenous depression. A double-blind, multicenter trial. Acta Psychiatrica Scandinavica, 50(2), 233–42.


179 Bech, P., Malt, U.F., Dencker, S.J., Ahlfors, U.G., Elgen, K., Lewander, T., et al. (1993) Scales for assessment of diagnosis and severity of mental disorders. Acta Psychiatrica Scandinavica, 87 (Supplementum 372), 1–91.

180 Williams, J.B.W. (1990) Structured interview guide for the Hamilton Rating Scale. In: The Hamilton Scales (eds P. Bech, A. Coppen), pp. 48–63. Berlin, Springer.

181 Williams, J.B. (2001) Standardizing the Hamilton Depression Rating Scale: past, present, and future. European Archives of Psychiatry and Clinical Neurosciences, 251 Suppl 2, II6–12.

182 Williams, J.B., Kobak, K.A., Bech, P., Engelhardt, N., Evans, K., Lipsitz, J., et al. (2008) The GRID-HAMD: standardization of the Hamilton Depression Rating Scale. International Clinical Psychopharmacology, 23(3), 120–9.

183 Rush, A.J., Giles, D.E., Schlesser, M.A., Fulton, C.L., Weissenburger, J., & Burns, C. (1986) The Inventory for Depressive Symptomatology (IDS): preliminary findings. Psychiatry Research, 18(1), 65–87.

184 Fleck, M.P., Poirier-Littre, M.F., Guelfi, J.D., Bourdel, M.C., & Loo, H. (1995) Factorial structure of the 17-item Hamilton Depression Rating Scale. Acta Psychiatrica Scandinavica, 92(3), 168–72.

185 Lecrubier, Y., & Bech, P. (2007) The Ham D(6) is more homogenous and as sensitive as the Ham D(17). European Psychiatry, 22(4), 252–5.

186 Overall, J.E., & Gorham, D.R. (1988) The Brief Psychiatric Rating Scale (BPRS). Recent developments in ascertainment and scaling. Psychopharmacology Bulletin, 24, 97–9.

187 Kay, S.R., Opler, L.A., & Lindenmayer, J.P. (1988) Reliability and validity of the positive and negative syndrome scale for schizophrenics. Psychiatry Research, 23(1), 99–110.

188 Van Os, J., Gilvarry, C., Bale, R., Van Horn, E., Tattan, T., White, I., et al. (1999) A comparison of the utility of dimensional and categorical representations of psychosis. UK700 Group. Psychological Medicine, 29(3), 595–606.

189 Mellenbergh, G.J. (1994) Generalized linear item response theory. Psychological Bulletin, 115, 300–7.

190 Quine, W.V. (1985) The time of my life. Boston, MIT Press.

191 Bech, P. (2002) The Bech-Rafaelsen Melancholia Scale (MES) in clinical trials of therapies in depressive disorders: a 20-year review of its use as outcome measure. Acta Psychiatrica Scandinavica, 106(4), 252–64.

192 Bech, P. (2005) The Bech-Rafaelsen Mania and Melancholic Scales in clinical trials. In: Focus on bipolar research (ed M.C. Brown), pp. 131–51. New York, Nova Science Publishers.

Index

Clinical Psychometrics, First Edition. Per Bech.

© 2012 John Wiley & Sons, Ltd. Published 2012 by John Wiley & Sons, Ltd.

Note: Page references in bold refer to entries in the Glossary

ABC Hamilton Depression Scale 84, 222–5, 131

ADAS (Alzheimer’s Disease Assessment Scale) 59

alcohol 69

Allerup, Peter 37, 108

allostasis 85, 88, 109

alprazolam 71

Alzheimer’s Disease Assessment Scale (ADAS) 59

AMDP (Arbeits-Gemeinschaft für Methodik und Dokumentation in der Psychiatrie) system 104

American College of Neuropsychopharmacology (ACNP) 106

amitriptyline 105

Andersen, A.F. 34

Andreasen, Nancy 5

Angst, Jules 104

antianxiety medication 69–72

antidementia medication 59–60, 93

antidepressants 36, 56, 57, 66–9
    combination of 72–3
    tricyclics 66, 106

antimanic medication 65–6

antipsychotic medication 60–4, 66

anxiety 18

Anxiety Symptom Scale (ASS) 86, 92, 93–4, 160

applied mathematics 102

Arbeits-Gemeinschaft für Methodik und Dokumentation in der Psychiatrie (AMDP) system 104

Bacharach, V.R. 30, 31

Bech-Rafaelsen Mania Scale (see MAS)

Bech-Rafaelsen Melancholia Scale (see MES)

Beck Depression Inventory (BDI) 87, 116
    BDI version 6 146–7

Beck’s cognitive model of depression 86, 87

Bentall, R.P. 65

Bentham, Jeremy 75

benzodiazepines 69, 70, 71

Bernstein, I.R. 26

beta-blocker 70

between-groups analysis 107

bi-directional factor 13–15

Big Five model 18

Big Two model 18

Binet, Alfred 24, 26

bipolar affective disorder 63, 65, 104

bipolar factor 12, 13

Bolwig, T.G. 52

Boring, Edwin 108


Borsboom, D. 37, 38

brain research 3–4

Brief Psychiatric Rating Scale (BPRS) 24–6, 27, 28, 44, 46, 47, 50, 52, 61, 107–8, 165–6

British Association for Psychopharmacology (BAP) 106

buspirone 73

Cade, John 65

Calvinism (pharmacological) 109

Cattell, R.B. 13, 56

ceiling items 35, 36, 39

Centre for Epidemiologic Studies Depression Scale (CES-D) 92

Chi-Squared Test 38

chloral hydrate 8

chlorpromazine 20, 23, 27, 46, 60, 107

Chomsky, Noam 86

citalopram 67, 68, 73

classical psychometric procedures 40–1

Clinical Global Impression Scale, Severity (CGI-S) 50

Clinical Interview for Depression (CID) 45, 105

Clinical Interview for Depression and Related Syndromes (CIDRS) 44, 45, 170–5

clinimetrics 1, 23, 30, 86, 109

clonazepam 71

coefficient of homogeneity
    Loevinger 20, 40, 108
    Mokken 61, 64, 85

coefficient of reliability 27

Cohen, Jacob 50–2, 55, 85

Collegium Internationale Neuro-Psychopharmacologicum (CINP) 106

compliance 81, 109–10

Comprehensive Psychopathological Rating Scale 44

computer adaptive testing (CAT) 36

computer assisted tomography (CAT) scan 5

Comrey, A.L. 26

contra-phobic reaction 47

Copenhagen lecture (Hamilton) 117–21

correlation coefficient 11, 26, 110

correlation matrix 13

cortisol 85–6, 88

critical monism 6, 38

Cronbach’s alpha 26, 30–1, 50, 82, 92, 96

cross-over analysis 107

Cushing, H.W. 86

Cushing’s Disease 86

Darwin, Charles 32

Davidson, Donald 38

Dein, Erling 9, 48, 52

Delay, J. 23

depression 3, 34–5, 47
    subtypes 98
    unipolar 104

depression ruler 48, 49

Derogatis, L.R. 108

desvenlafaxine 68, 69

Diagnostic and Statistical Manual of Mental Disorders (DSM) 27, 48
    DSM-I 27
    DSM-III 28, 29, 43, 82
    DSM-IV 27–31, 32, 43, 84–5, 89, 108
    DSM-V 29

diazepam 69–70, 71, 72

donepezil 59–60

dose-response relationship 53, 57, 68

dual factor 12, 13, 23

Early Clinical Drug Evaluation (ECDEU) manual (Guy) 106

effect size 50–2, 53–6
    in pharmacopsychometric triangle 56–7

escitalopram 66, 67, 68

extrapyramidal symptoms (EPS) 61

extraversion/introversion 16


Eysenck, Hans 1, 15–19, 20, 27, 81
    Extraversion scale 18
    Personality scale 95
    Personality Questionnaire (EPQ) 16, 18, 104
    Neuroticism scale 16, 17, 18, 19, 88

factor analysis 10–12, 14, 24, 26, 29, 31, 49, 95–7, 102, 110, 179
    British vs American 12–13
    vs item response theory (IRT) analysis 39–42
    personality questionnaires and 15–20
    rating scales and 20–3

family resemblances 102

Fechner, Gustav 5

Feighner criteria 110–11

Feinstein, Alvan R. 23, 30, 109

Fisher, Ronald A. 1, 13, 32–3, 34

Fisher’s exact test 38

Fleck, Marcelo 106

floor items 36

fluoxetine 55

Frank, Jerry 108

Freud, Sigmund 1, 9, 16, 43, 47, 81, 102, 103
    personality theory of neuroticism 16

Friis-Hasché, Erik 91

Furr, R.M. 30, 31

Galton, Francis 1, 32, 43

Gaussian bell curve 33

general factor 11, 12

General Health Questionnaire (GHQ) 81

Global Depression Scale 51

Gorham, Don 24, 27, 46, 107

graphic rating scales 43

Greenberg, G. 107

GRID-HAM-D 106

Grinker, R. 87

Guelfi, J.D. 106

Guilford, J.P. 12, 26

guilt feelings 35, 36, 39

Guttman, Louis 37, 108
    cumulative model 37, 40, 42, 43

haloperidol 60–1, 61–3, 65

HAM-A 11, 20, 28, 31, 45, 46, 47, 70, 71, 95
    HAM-A6 21, 22, 70, 71, 72
    HAM-A13 21
    HAM-A14 21, 22, 23, 71, 72, 105, 154–9

HAM-D 3, 20–4, 26, 28, 31, 45, 46, 47, 48, 51, 53, 86–7, 97, 105
    GRID version 45
    HAM-D6 4, 22, 36, 49, 50, 55, 56, 57, 66, 68
        clinician version 141–2
        Questionnaire 143–4
    HAM-D9 84
    HAM-D17 22, 39, 42, 52, 54–7, 68, 69, 98–101, 122–5, 126–31
        ABC version 84
    HAM-D21 56
    HAM-D24 132–4

Hamilton, Max 1, 20–3, 27, 46–7, 102, 103, 105, 108, 117–21

Hamilton Anxiety Scale see HAM-A

Hamilton Depression Scale see HAM-D

Helmholtz, Hermann von 5

Hippius, Hanns 103, 104

Høffding, Harald 6, 38

Hollister, Leo 46, 107

Hospital Anxiety and Depression Scale (HADS) 70

Hotelling, Harold 13–14, 26, 33, 42, 102, 179

Hypomania Checklist (HCL-32) 104

idiographic method of measurement 17

imipramine 27, 46, 58, 68, 69, 71, 72

indices of validity 48

Inpatient Multi-dimensional Scale (IMPS) 46

intelligence tests 10–12, 24, 26

International Classification of Disease (ICD) (WHO) 28
    ICD-6 27


    ICD-10 27–31, 32, 43, 48, 82, 84–5, 89, 98, 108
        hierarchy or ladder 58–9
    ICD-11 29

intraclass coefficient 27

invariant item ordering 39

Inventory of Depressive Symptomatology (IDS-30) 106

item parameter difficulty 35

item response theory (IRT) analysis 26, 29–31, 34–8, 43, 47, 48, 49, 54, 56, 96, 108, 182
    vs factor analysis 49–50
    non-parametric analysis for 39–42

Jacobsen, Ove 48, 52

James, William 74, 108

Jessen, Borge 34

Jung, Carl Gustav 16

Kant, Emanuel 3, 4, 102

Kaplan-Meier curves 93

Kappa coefficient 27, 51

Karpatchof, Benny 53, 54

Kay, Stanley R. 107

Kirsch, I. 56

Klerman, G.L. 28, 29, 106

Kline, P. 16

Kraepelin, Emil 1, 2, 6–9, 9–10, 20, 27, 74, 95, 102, 103, 104, 108
    ‘diagnostic cards’ 7, 8
    Psychiatric Compendium 7
    symptom checklist 6–9

Kruskal-Wallis One-Way Analysis of Variance by Ranks 39

Lam, R.W. 97–101

language-game approach 42

Last Observation Carried Forward (LOCF method) 55

Lecrubier, Yves 106

Lehmann, Alfred 9

Likert, Rensis 43–5

Likert response 43

Likert scale 40, 44, 108

Lindenmayer, J.P. 107

Lingjærde, Odd 106

lithium 20, 29, 63, 65, 72, 106

local independency of items 38, 50, 54

Loevinger, Jane 20, 40, 108

Loevinger coefficient of homogeneity 20, 40, 108

Loo, H. 106

Lorr, M. 46

MADRS 37, 44–5, 66, 68
    ABC scoring sheet 44

magnetic resonance imaging (MRI) 5

Major Depression Inventory (MDI) 86, 89, 148–53

mania 29

MAS 52, 65, 66, 161–4

manic-depressive disorder 8, 10

medical model (etiological considerations) 29, 91, 97–8, 170–6

medical stress model (Selye) 82, 83, 85–6

MES 54, 115, 116, 136–9

Mindham 105

MINI International Neuropsychiatric Interview (MINI) 106

Mini Mental State Examination (MMSE) 59–60, 92

Minnesota Multiphasic Personality Inventory (MMPI) 107

Mitchell, J. 36–7

modern psychometric procedures 40–2

Mokken, Robert J. 1, 39–40, 43, 108
    coefficient of homogeneity 61, 64, 85

Molenaar, I.W. 40

Mood Disorder Questionnaire (MDQ) 104

Möller, Hans Jürgen 104

Montgomery-Åsberg Depression Rating Scale see MADRS

mood stabilising medications 72

morphine 8


National Institute of Mental Health (NIMH) 107

NEO-PI-R 18

Neuropsychiatric Inventory (NPI) 60

neuroticism 81

New Clinical Drug Evaluation Unit (NCDEU) 107

Newcastle Diagnostic Depression Scale (1965) 176–7

nominal scale 8, 16, 38, 39

non-parametric statistics 38–9, 108

non-reductive monism 6, 38

Nørholm, Vibeke 89

normal (Gaussian) distribution 33

nomothetic method 17

Nunnally, J.C. 26

Ockham, William 26

Ockham’s razor 26

olanzapine 66

Olsen, Lis Raabæk 91

ordinal scale 39

Østergaard, Lise 9–10, 48

Overall, John 24, 27, 45–8, 103, 107

Paykel, Eugene 105

Parkinson’s Disease 61

parsimony, law of 26

Patient Related Inventory of Side Effects (PRISE-20) 178

Pearson, Karl 1, 26, 33

Pearson’s correction 39

Perry, Ralph Barton 108

pharmacopsychology 2, 6–9

pharmacopsychometric triangle 56–9, 61, 66, 70, 71, 72, 73, 97

pharmacopsychometrics 96

phenemal 8, 60, 70, 71

phenotyping 101

Pichot, Pierre 1, 23–6, 27, 47, 102, 103, 106, 107, 108

pimozide 65

population-independent response-curve 69

population studies in depression and anxiety 89–94

Positive and Negative Syndrome Scale (PANSS) 4–5, 30, 44, 45, 47, 61, 107–8

positive manifold 13

positron emission tomography (PET) scanning 4, 5

post-traumatic stress disorder (PTSD) 82–4

pregabalin 71

Present State Examination (PSE) 8

primary depression 111

principal component analysis (PCA) 13–15, 26, 42, 96, 179–82

PRISE 20 (Patient Related Inventory of Side Effects) 178

propranolol 70

psychoanalysis 1, 9, 102, 111

Psychological General Well-Being (PGWB) 78, 79

psychomotor retardation 35, 36, 39

psychopharmacology 111

psychotic symptom items 4

Putnam, H. 4, 102

Q-LES-Q 66, 68

quality of life 61, 74–5

Quality of Life scale 56, 58, 59, 60, 66, 68, 70

Quine, Willard Van Orman 4, 108

Rafaelsen, O. 116

ramified hierarchy of typology (Russell) 14, 42

rank order tests 38

Rasch, Georg 1, 26, 34–8, 47, 102, 108

Rasch analysis 34, 36, 37, 39, 40, 43, 49, 50, 56, 89, 183–4

reductionism 36, 111

relapse 100, 101, 111

reliability (questionnaire) 30, 111

reliability (rating scale) 27–8, 29, 111–12

reliability, coefficient of 27


remission 16, 45, 63, 72, 101, 112

response 101, 112

Rorschach, Hermann 9

Rorschach test 9–10, 16, 17, 27, 81, 107

Rush, John 106

Russell, Bertrand 14–15, 42, 108, 180, 181

scale step measurements 43–5

Scandinavian College of Neuro-Psychopharmacology (SCNP) 106

Schafer, R. 15

schizophrenia 5, 8, 9, 10, 29, 47, 61, 65

schizophrenicity 96, 165–6

Schou, Mogens 65

screening scales 92

selective serotonin reuptake inhibitor (SSRI) 53–4, 66

Self-perceived Stress Scale (Cohen) 85

Selye, Hans 82, 83, 85–6

‘sensus numinis’ 6

serotonin and noradrenaline reuptake inhibitors (SNRI) 68

sertindole 61–3, 65

SF-12 75, 76

SF-36 (Medical Outcomes Studies, Short Form) 72, 75–8

Sheehan, David 107

Sheehan’s Disability Scale 92

Siegel, Sidney 1, 38–9, 108

Sijtsma, K. 40

Simpson-Angus scale 61, 63

Skinner, Fred 108

Spearman, Charles 1, 6, 10–13, 14, 17, 24, 27, 33, 95, 102

Spearman correlation analysis 39

Spielberger State Anxiety Scale (STAI) 19, 91–2, 93

Spielberger, Charles 18
    antianxiety model 86, 88

Spitzer, R.L. 28

standardisation 112, 115

STAR-D analysis 23

Statistical Analysis System (SAS) 49

statistical uncertainty 48

stress 82–8

Strömgren, Bengt 103

Strömgren, Erik 65, 103, 108

sufficiency, concept of 34

sufficient rating scales 45–8

sufficient statistic 32, 34, 37, 41–3, 49, 54, 61, 89, 97

suicidal ideation 35, 36

Suppes, Patrick 38

Symptom Checklist (SCL)
    SCL-90 85, 97, 108
    SCL-90-R 108
    SCL-92 97, 108
    SCL-D6 145

symptom checklist (Kraepelin) 7, 9, 95

Teasdale, Thomas 179

test-retest reliability coefficient 30

Thurstone, L.L. 12, 46

trait anxiety 18

transferability 36, 38, 41, 56, 96–7, 112

translation procedure 115

trazodone 71

tricyclic antidepressants 66, 106

Turner, William J. 24, 44, 45, 56

UKU (Udvalg for Kliniske Undersøgelser) Scale 58, 106
    Side Effect Rating Scale 106

unidimensionality 68, 112

unipolar depression 104

validity (clinical) 1, 11, 15, 18, 23–6, 37, 48–9, 112

validity (external) 34, 113

validity (psychometric) 28–9, 37, 48–9, 112–13

Vanggaard, Thorkild 81

Vannerus, A. 6

venlafaxine 68, 71

Vernon, P.E. 13

visual analogue scale (VAS) 50, 113

Vitger, John 52


WHO-5 questionnaire 71, 72, 78–81, 97
    predictive value 92, 93
    quality-of-life scale 68, 89
    Well-Being Index (1998 version) 167–8

Wilcoxon Signed Rank Test 39

Williams, Janet 106

Window (time frame) 113

Wittgenstein, Ludwig 4, 40, 42, 53, 102

work-related stress condition 84–5

Wundt, Wilhelm 1, 3, 5–6, 6–7, 10, 28, 29, 32, 38, 74, 75, 95

Yates’ correction 38

Young Mania Rating Scale (YMRS) 66