TRANSCRIPT
Making Social Work Count Lecture 4
An ESRC Curriculum Innovation and Researcher Development Initiative
What is being studied? Approaches to measuring variables
Assessment and judgment
• Social workers have to assess all the time:
– Is there a problem or need here?
– What is the risk of things getting worse?
– Have I made a difference?
• Researchers carry out similar tasks
• This lecture considers the key issue of developing meaningful measurements for use in quantitative research
• Many of the issues are of relevance to the more general task of “assessment”
Quantitative and qualitative
• All research involves simplification
– The question is whether we know what is gained and lost by simplification
• Qualitative studies tend to focus on meaning
– A common strategy is identifying themes of relevance
• Quantitative studies convert issues to numbers
– This allows certain types of important description (e.g. how many people have this problem?)
– And – crucially – comparison (e.g. are things getting better? Does one group have more problems?)
Quantitative and qualitative
Quantitative research
• This session focuses on quantitative research
• It identifies key considerations in thinking about the quality of a quantitative study:
– Reliability
– Validity
Qualitative research
• Some of these considerations can also be applied to qualitative research
• However, qualitative studies also have their own criteria for assessing good research
Learning outcomes
• Understand what a variable is
• Appreciate different types of variable that can be used in quantitative research
• Understand issues in relation to reliability and validity
• Know what a standardised instrument is
• Have had the opportunity to reflect on implications for practice
Example of children in care
• Returning to the idea that care “fails” children
• Lecture 3 suggested that comparing children who have left care with the general population is not a valid comparison sample
• Now let’s look at outcome measures
Forrester et al. (2009) review
• The literature review focused on studies that looked at child welfare over time for children in care
• Strongest finding: very poor research base – this is a difficult area to research
• Of 13 studies, almost all suggested:
– Most of the harm occurs before care
– Children tend to do better once in care
– Some harm occurs as children leave care
– Even in good placements children still tend to have problems
But…
What “outcomes” were being measured? What outcomes do YOU think should be measured for children in care?
Key points
• Deciding on “outcomes” or variables for a study is NOT some value-neutral, technocratic activity
• Key issues to consider:
– WHO is deciding what is to be measured? (e.g. experts? Government? Service users?)
– WHAT is being measured?
– HOW is it being measured? [focus of this lecture]
Key points
What is measured?
• For instance, in studies reviewed by Forrester:
– the most common issue “measured” was behaviour (and particularly problem behaviour)
– education was the second most common
– others included physical growth, social relations, etc.
How is it measured?
• Studies in the review:
– obtained information from social work files and made a researcher “judgment”
– used school tests
– pooled interview and other data and made a researcher “judgment”
– used questionnaires to carers
• What are the strengths and weaknesses of each?
Attributes and variables
• An attribute is a characteristic of an individual, e.g. height, intelligence, beauty, serenity
• A variable is the operationalisation of an attribute, e.g. metres, IQ score, marks out of 10?, err…
– It allows attributes to be compared and described
• The focus of this lecture is on how attributes are operationalised
Variables need to be reliable and valid
Reliability
• Are the results consistent, e.g. can the results be replicated in different conditions and across different groups?
Validity
• Does the instrument measure what it claims to measure?
Measures should be both reliable and valid
[Diagram: a measure cannot be valid if it is not reliable; a measure may be reliable but not valid; the worst case is neither reliable nor valid]
Standardised Instruments (SIs)
• Tools that measure a specific quality or characteristic e.g. psychological distress
• They let us compare results across groups in different settings, e.g. social workers, families, teachers, police…
• SIs need to be high in both reliability and validity
Reliability – overview
• The consistency of a measure
• A test is considered reliable if we get the same result repeatedly
• Reliability can be estimated in a number of different ways:
– Test-retest reliability: over time
– Inter-rater reliability: between different scorers
– Internal consistency reliability: across items on the same test
Test-Retest Reliability
• Tests the extent to which the test is repeatable and stable over time
• The same social workers are given the same questions 2 to 3 weeks later
• If the results differ substantially, and there has been no intervention, then we should question the reliability of those questions
Inter-rater reliability
• Where two or more people rate/score/judge the test
• The scores of the judges are compared to find the degree of correlation/consistency between their judgements
• If there is a high degree of correlation between the different judgements, the test can be said to be reliable
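A common statistic for this is Cohen’s kappa, which corrects raw agreement between two raters for the agreement expected by chance alone. A self-contained sketch (the ratings are invented):

```python
# Inter-rater reliability via Cohen's kappa for two raters making the
# same yes/no judgement on a set of cases. Ratings are invented.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of cases where the raters actually agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal rates
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
b = ["yes", "yes", "no", "no", "no", "no", "yes", "no"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # kappa = 0.75
```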
Internal Consistency Reliability
• For example, where there are two questions within a SI that seem to be asking the same thing
• If the test is internally consistent, the respondent should give the same answer to both questions
• More generally, questions should be linked to one another if they are measuring the same attribute
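Internal consistency is often summarised with Cronbach’s alpha, which compares the variance of individual items with the variance of the total score; items that “hang together” give an alpha near 1. A sketch using invented responses:

```python
# Cronbach's alpha for internal consistency. All data invented.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per question, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Three questions answered by the same five respondents (columns line up).
q1 = [2, 3, 4, 4, 5]
q2 = [2, 2, 4, 3, 5]
q3 = [1, 3, 3, 4, 5]
print(f"alpha = {cronbach_alpha([q1, q2, q3]):.2f}")  # alpha = 0.94
```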
Validity
• The extent to which a test measures what it claims to measure:
– Construct validity: the degree to which the test measures the construct it is intended to measure – the overarching type of validity
– Predictive validity: the degree of effectiveness with which performance on a test predicts performance in a real-life situation
– Content validity: whether items on the test represent the entire range of possible items the test should cover
Construct validity
• The degree to which the test measures what it is intended to measure
• The over-arching concept in validity – all other types of validity are ways of assessing this
• As a result construct validity has many elements:
– Predictive validity (can it predict things, e.g. IQ scores and later test results?)
– Criterion validity (does it correctly differentiate, e.g. does a screening instrument identify people who are depressed?)
– Content validity (is the full range of the construct included?)
– And other types…
Predictive validity
• Can structured risk assessment tools predict children who will be abused?
• Are the predictions more accurate than practitioners’ decisions?
Predictive validity
• Barlow et al (2013) found that most attempts to predict had low success i.e. high numbers of false positives or false negatives
• Further research needed to develop reliable tools that predict abuse or re-abuse
• Though this is also true for practitioners…
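The false positive/negative problem can be made concrete with a small worked example: when abuse is rare, even a tool with seemingly good accuracy flags far more children wrongly than rightly. All figures below are invented for illustration:

```python
# Predictive validity of a hypothetical screening tool, summarised by
# sensitivity, specificity and positive predictive value (PPV).
def screening_stats(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)  # proportion of abused children flagged
    specificity = tn / (tn + fp)  # proportion of non-abused cleared
    ppv = tp / (tp + fp)          # chance a flagged child is actually abused
    return sensitivity, specificity, ppv

# Invented counts: 100 abused and 9,900 non-abused children screened.
# The tool catches 90 of 100 abused children, but also wrongly flags 990.
sens, spec, ppv = screening_stats(tp=90, fp=990, fn=10, tn=8910)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

Here 90% sensitivity and 90% specificity still leave eleven false alarms for every correct flag, which is the kind of result behind the review’s caution.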
Content validity
• Refers to the extent to which a measure represents elements of a social construct or trait
• For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioural dimension
• Or: how should “ethnicity” be defined? In practice it is not possible to capture the full range of possible ethnicities – but what level of simplification is “valid”?
General Health Questionnaire (GHQ)
• A reliable and valid screening instrument identifying aspects of current mental health (anxiety/depression/social phobia)
• The self-administered questionnaire asks whether someone has experienced a particular symptom or behaviour recently
• Each item is rated on a four-point scale
• Used in many countries in different languages
GHQ 12 questions
Questions include: Have you recently…
1. Been able to concentrate on whatever you are doing
2. Lost much sleep over worry
3. Felt that you are playing a useful part in things
4. Felt capable of making decisions about things
5. Felt constantly under strain
6. Felt you couldn’t overcome your difficulties
7. Been able to enjoy your normal day to day activities
8. Been able to face up to your problems
9. Been feeling unhappy and depressed
10. Been losing confidence in yourself
11. Been thinking of yourself as a worthless person
12. Been feeling reasonably happy, all things considered
GHQ 12
• There are different ways of measuring risk of psychiatric problems using GHQ data
• All show a reasonable link with clinical diagnosis
• A common method counts a ‘yes’ or ‘no’ answer (depending on the question) on 4 or more questions
• How do social workers do…?
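The common scoring method described above can be sketched in code: each four-point item is collapsed to 0 or 1, and a total of 4 or more flags possible risk. The 0–3 response coding is an assumption made for illustration:

```python
# GHQ-12 binary scoring sketch. Assumes each item's four response
# options are coded 0-3 (0 = least symptomatic); the first two options
# count as 'no' (0) and the last two as 'yes' (1).
def ghq12_score(responses):
    """responses: 12 integers, 0-3, one per GHQ-12 item."""
    total = sum(1 for r in responses if r >= 2)  # count 'yes'-type answers
    return total, total >= 4  # (score, above the common threshold of 4?)

score, at_risk = ghq12_score([0, 2, 1, 0, 3, 2, 1, 0, 2, 3, 0, 1])
print(score, at_risk)  # 5 True
```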
Clinical scores for social workers and general population using GHQ
[Bar chart, scale 0–50: newly qualified social workers (NQSW) 33; one year later 43; general population 18]
(Carpenter et al., 2010; ONS, 2010)
How to measure children’s emotional and behavioural welfare?
• SDQ: a questionnaire designed for carers, children and teachers
• Reliability is tested by comparing measures of emotional and behavioural welfare – and over time
• Validity is tested by:
– seeing whether scores predict “real world” outcomes such as children receiving specialist help, criminal behaviour and exclusion from school
– comparing with clinical assessment and other instruments
Strengths and Difficulties Questionnaire (SDQ)
• A brief behavioural screening questionnaire for parents/carers/teachers of 3–16 year olds
• Asks about psychological attributes, some positive and others negative
– e.g. emotional symptoms, conduct, hyperactivity, peer relationships, prosocial behaviour
SDQ questions
• 25 questions composed of five scales with five questions in each scale
• E.g. the 5 questions in the Emotional Symptoms Scale:
1. I get a lot of headaches
2. I worry a lot
3. I am often unhappy
4. I am nervous in new situations
5. I have many fears
• Responses: Not true / Somewhat true / Certainly true
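As a sketch of how such a scale might be scored – assuming the common coding of Not true = 0, Somewhat true = 1, Certainly true = 2, summed across the five items so each scale runs from 0 to 10:

```python
# Scoring one SDQ scale. The 0/1/2 coding is the commonly published
# convention; treat it as an assumption for this illustration.
SCORES = {"Not true": 0, "Somewhat true": 1, "Certainly true": 2}

def sdq_scale_score(answers):
    """answers: the five responses to one scale's items."""
    return sum(SCORES[a] for a in answers)

emotional = ["Somewhat true", "Certainly true", "Not true",
             "Somewhat true", "Certainly true"]
print(sdq_scale_score(emotional))  # 6
```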
Why does this matter?
• It is worth considering common social work research methods, such as coming to a “researcher judgment” – how reliable? How valid?
• More importantly – what about your practice?
• What is a better way of judging whether a child has emotional or behavioural problems, or an adult is at risk of psychological problems – your judgment or a standardised instrument?
• If you want to evaluate whether you are making a difference – what role might a standardised instrument have?
Learning outcomes
Do you…?
• Understand what a variable is
• Appreciate different types of variable that can be used in quantitative research
• Understand issues in relation to:
– Reliability
– Validity
• Know what a standardised instrument is
• Have had the opportunity to reflect on implications for practice
References
• Goldberg, D. & Williams, P. (1988) A User’s Guide to the General Health Questionnaire. Slough: NFER-Nelson.
• Goodman, R. (1997) The Strengths and Difficulties Questionnaire: A Research Note. Journal of Child Psychology and Psychiatry, 38, 581–586.
• http://www.sdqinfo.com/d0.html
• Barlow, J., Fisher, J.D. & Jones, D. (2013) Systematic Review of Models for Analysing Significant Harm. London: Department for Education. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183949/DFE-RR199.pdf