TRANSCRIPT
Making Social Work Count Lecture 4
An ESRC Curriculum Innovation and Researcher Development Initiative
What is being studied? Approaches to measuring variables
Assessment and judgment
• Social workers have to assess all the time:
– Is there a problem or need here?
– What is the risk of things getting worse?
– Have I made a difference?
• Researchers carry out similar tasks
• This lecture considers the key issue of developing meaningful measurements for use in quantitative research
• Many of the issues are of relevance to the more general task of “assessment”
Quantitative and qualitative
• All research involves simplification
– The question is whether we know what is gained and lost by simplification
• Qualitative studies tend to focus on meaning
– A common strategy is identifying themes of relevance
• Quantitative studies convert issues to numbers
– This allows certain types of important description (e.g. how many people have this problem?)
– And – crucially – comparison (e.g. are things getting better? Does one group have more problems?)
Quantitative and qualitative
Quantitative research
• This session focuses on quantitative research
• It identifies key considerations in thinking about the quality of a quantitative study:
– Reliability
– Validity
Qualitative research
• Some of these considerations can also be applied to qualitative research
• However, qualitative studies also have their own criteria for assessing good research
Learning outcomes
• Understand what a variable is
• Appreciate different types of variable that can be used in quantitative research
• Understand issues in relation to reliability and validity
• Know what a standardised instrument is
• Have had the opportunity to reflect on implications for practice
Example of children in care
• Returning to the idea that care “fails” children
• Lecture 3 suggested that comparing children who have left care with the general population is not a valid comparison sample
• Now let’s look at outcome measures
Forrester et al. (2009) review
• The literature review focused on studies that looked at child welfare over time for children in care
• Strongest finding: very poor research base – this is a difficult area to research
• Of 13 studies, almost all suggested:
– Most of the harm occurs before care
– Children tend to do better once in care
– Some harm occurs as children leave care
– Even in good placements children still tend to have problems
But…
What “outcomes” were being measured? What outcomes do YOU think should be measured for children in care?
Key points
• Deciding on “outcomes” or variables for a study is NOT some value-neutral, technocratic activity
• Key issues to consider:
– WHO is deciding what is to be measured? (e.g. experts? Government? Service users?)
– WHAT is being measured?
– HOW is it being measured? [focus of this lecture]
Key points
What is measured?
• For instance, in studies reviewed by Forrester:
– the most common issue “measured” was behaviour (and particularly problem behaviour)
– education was the second most common
– others included physical growth, social relations, etc.
How is it measured?
• Studies in the review:
– obtained information from social work files and made a researcher “judgment”
– used school tests
– pooled interview and other data and made a researcher “judgment”
– used questionnaires to carers
• What are the strengths and weaknesses of each?
Attributes and variables
• An attribute is a characteristic of an individual, e.g. height, intelligence, beauty, serenity
• A variable is the operationalisation of an attribute, e.g. metres, IQ score, marks out of 10?, err…
– It allows attributes to be compared and described
• The focus of this lecture is on how attributes are operationalised
Variables need to be reliable and valid
Reliability
• Are the results consistent, e.g. can the results be replicated in different conditions and across different groups?
Validity
• Does the instrument measure what it claims to measure?
Measures should be both reliable and valid
[Diagram: a measure cannot be valid if it is not reliable; a measure may be reliable but not valid; the worst case is neither reliable nor valid]
Standardised Instruments (SIs)
• Tools that measure a specific quality or characteristic e.g. psychological distress
• They let us compare results across groups in different settings, e.g. social workers, families, teachers, police…
• SIs need to be high in both reliability and validity
Reliability – overview
• The consistency of a measure
• A test is considered reliable if we get the same result repeatedly
• Reliability can be estimated in a number of different ways:
– Test-retest reliability: over time
– Inter-rater reliability: between different scorers
– Internal consistency reliability: across items on the same test
Test-Retest Reliability
• Tests the extent to which the test is repeatable and stable over time
• The same social workers are given the same questions 2 to 3 weeks later
• If the results differ substantially, and there has been no intervention, then we should question the reliability of those questions
Inter-rater reliability
• Where two or more people rate/score/judge the test
• The scores of the judges are compared to find the degree of correlation/consistency between their judgements
• If there is a high degree of correlation between the different judgements, the test can be said to be reliable
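A common statistic for this is Cohen’s kappa, which corrects raw agreement between two raters for the agreement expected by chance alone. A self-contained sketch (the ratings are invented):

```python
# Inter-rater reliability via Cohen's kappa for two raters making the
# same yes/no judgement on a set of cases. Ratings are invented.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of cases where the raters actually agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal rates
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
b = ["yes", "yes", "no", "no", "no", "no", "yes", "no"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # kappa = 0.75
```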
Internal Consistency Reliability
• For example, where there are two questions within a SI that seem to be asking the same thing
• If the test is internally consistent, the respondent should give the same answer to both questions
• More generally, questions should be linked to one another if they are measuring the same attribute
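Internal consistency is often summarised with Cronbach’s alpha, which compares the variance of individual items with the variance of the total score; items that “hang together” give an alpha near 1. A sketch using invented responses:

```python
# Cronbach's alpha for internal consistency. All data invented.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per question, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Three questions answered by the same five respondents (columns line up).
q1 = [2, 3, 4, 4, 5]
q2 = [2, 2, 4, 3, 5]
q3 = [1, 3, 3, 4, 5]
print(f"alpha = {cronbach_alpha([q1, q2, q3]):.2f}")  # alpha = 0.94
```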
Validity
• The extent to which a test measures what it claims to measure:
– Construct validity: the degree to which the test measures the construct it is intended to measure – the overarching type of validity
– Predictive validity: the degree of effectiveness with which performance on a test predicts performance in a real-life situation
– Content validity: whether items on the test represent the entire range of possible items the test should cover
Construct validity
• The degree to which the test measures what it is intended to measure
• The over-arching concept in validity – all other types of validity are ways of assessing this
• As a result construct validity has many elements:
– Predictive validity (can it predict things, e.g. IQ scores and later test results?)
– Criterion validity (does it correctly differentiate, e.g. does a screening instrument identify people who are depressed?)
– Content validity (is the full range of the construct included?)
– And other types…
Predictive validity
• Can structured risk assessment tools predict children who will be abused?
• Are the predictions more accurate than practitioners’ decisions?
Predictive validity
• Barlow et al (2013) found that most attempts to predict had low success i.e. high numbers of false positives or false negatives
• Further research needed to develop reliable tools that predict abuse or re-abuse
• Though this is also true for practitioners…
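The false positive/negative problem can be made concrete with a small worked example: when abuse is rare, even a tool with seemingly good accuracy flags far more children wrongly than rightly. All figures below are invented for illustration:

```python
# Predictive validity of a hypothetical screening tool, summarised by
# sensitivity, specificity and positive predictive value (PPV).
def screening_stats(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)  # proportion of abused children flagged
    specificity = tn / (tn + fp)  # proportion of non-abused cleared
    ppv = tp / (tp + fp)          # chance a flagged child is actually abused
    return sensitivity, specificity, ppv

# Invented counts: 100 abused and 9,900 non-abused children screened.
# The tool catches 90 of 100 abused children, but also wrongly flags 990.
sens, spec, ppv = screening_stats(tp=90, fp=990, fn=10, tn=8910)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

Here 90% sensitivity and 90% specificity still leave eleven false alarms for every correct flag, which is the kind of result behind the review’s caution.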
Content validity
• Refers to the extent to which a measure represents elements of a social construct or trait
• For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioural dimension
• Or: how should “ethnicity” be defined? In practice it is not possible to capture the full range of possible ethnicities – but what level of simplification is “valid”?
General Health Questionnaire (GHQ)
• A reliable and valid screening instrument identifying aspects of current mental health (anxiety/depression/social phobia)
• The self-administered questionnaire asks whether someone has experienced a particular symptom or behaviour recently
• Each item is rated on a four-point scale
• Used in many countries in different languages
GHQ 12 questions
Questions include: Have you recently…
1. Been able to concentrate on whatever you are doing
2. Lost much sleep over worry
3. Felt that you are playing a useful part in things
4. Felt capable of making decisions about things
5. Felt constantly under strain
6. Felt you couldn’t overcome your difficulties
7. Been able to enjoy your normal day to day activities
8. Been able to face up to your problems
9. Been feeling unhappy and depressed
10. Been losing confidence in yourself
11. Been thinking of yourself as a worthless person
12. Been feeling reasonably happy, all things considered
GHQ 12
• There are different ways of measuring risk of psychiatric problems using GHQ data
• All show a reasonable link with clinical diagnosis
• A common method counts a ‘yes’ or ‘no’ answer (depending on the question) on 4 or more questions
• How do social workers do…?
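The common scoring method described above can be sketched in code: each four-point item is collapsed to 0 or 1, and a total of 4 or more flags possible risk. The 0–3 response coding is an assumption made for illustration:

```python
# GHQ-12 binary scoring sketch. Assumes each item's four response
# options are coded 0-3 (0 = least symptomatic); the first two options
# count as 'no' (0) and the last two as 'yes' (1).
def ghq12_score(responses):
    """responses: 12 integers, 0-3, one per GHQ-12 item."""
    total = sum(1 for r in responses if r >= 2)  # count 'yes'-type answers
    return total, total >= 4  # (score, above the common threshold of 4?)

score, at_risk = ghq12_score([0, 2, 1, 0, 3, 2, 1, 0, 2, 3, 0, 1])
print(score, at_risk)  # 5 True
```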
Clinical scores for social workers and general population using GHQ
[Bar chart, scale 0–50: newly qualified social workers (NQSW) 33; one year later 43; general population 18]
(Carpenter et al., 2010; ONS, 2010)
How to measure children’s emotional and behavioural welfare?
• SDQ: a questionnaire designed for carers, children and teachers
• Reliability is tested by comparing measures of emotional and behavioural welfare – and over time
• Validity is tested by:
– seeing whether scores predict “real world” outcomes such as children receiving specialist help, criminal behaviour and exclusion from school
– comparing with clinical assessment and other instruments
Strengths and Difficulties Questionnaire (SDQ)
• A brief behavioural screening questionnaire for parents/carers/teachers of 3–16 year olds
• Asks about psychological attributes, some positive and others negative
– e.g. emotional symptoms, conduct, hyperactivity, peer relationships, prosocial behaviour
SDQ questions
• 25 questions composed of five scales with five questions in each scale
• E.g. the 5 questions in the Emotional Symptoms Scale:
1. I get a lot of headaches
2. I worry a lot
3. I am often unhappy
4. I am nervous in new situations
5. I have many fears
• Responses: Not true / Somewhat true / Certainly true
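As a sketch of how such a scale might be scored – assuming the common coding of Not true = 0, Somewhat true = 1, Certainly true = 2, summed across the five items so each scale runs from 0 to 10:

```python
# Scoring one SDQ scale. The 0/1/2 coding is the commonly published
# convention; treat it as an assumption for this illustration.
SCORES = {"Not true": 0, "Somewhat true": 1, "Certainly true": 2}

def sdq_scale_score(answers):
    """answers: the five responses to one scale's items."""
    return sum(SCORES[a] for a in answers)

emotional = ["Somewhat true", "Certainly true", "Not true",
             "Somewhat true", "Certainly true"]
print(sdq_scale_score(emotional))  # 6
```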
Why does this matter?
• It is worth considering common social work research methods, such as coming to a “researcher judgment” – how reliable? How valid?
• More importantly – what about your practice?
• What is a better way of judging whether a child has emotional or behavioural problems, or an adult is at risk of psychological problems – your judgment or a standardised instrument?
• If you want to evaluate whether you are making a difference – what role might a standardised instrument have?
Learning outcomes
Do you…?
• Understand what a variable is
• Appreciate different types of variable that can be used in quantitative research
• Understand issues in relation to:
– Reliability
– Validity
• Know what a standardised instrument is
• Have had the opportunity to reflect on implications for practice
References
• Goldberg, D. & Williams, P. (1988) A User’s Guide to the General Health Questionnaire. Slough: NFER-Nelson.
• Goodman, R. (1997) The Strengths and Difficulties Questionnaire: A Research Note. Journal of Child Psychology and Psychiatry, 38, 581–586.
• http://www.sdqinfo.com/d0.html
• Barlow, J., Fisher, J.D. & Jones, D. (2013) Systematic Review of Models for Analysing Significant Harm. London: Department for Education. https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/183949/DFE-RR199.pdf