Math Candel Maastricht University

Post on 21-Dec-2015


1. Internal validity: Do your conclusions reflect the “true state of nature”?

2. External validity or generalizability: Do your conclusions hold for other persons, situations, and times?

1. Bad operationalisation of the constructs you want to measure
2. Incorrect design of the study
3. Errors in the statistical analysis
4. Wrong choice of the sample
5. Sample too small
6. Non-response, dropout and non-compliance

Does active coping lead to an increase in the quality of life as experienced by rheumatic patients?

If we carry out an empirical study, and the answer is yes, how sure can we be that this really is the case?

• How did we measure quality of life? How good is this measure?

• E.g. Sickness Impact Profile (SIP; 1989, 1994):
– Emotional stability
– Social behavior
– Somatic autonomy
– Mobility control
– Mobility range
– Psychological autonomy

• Social behavior (e.g.):
– I am cutting down the length of visits with friends
– I am doing fewer community activities
– I stay away from home only for brief periods of time

• Emotional stability (e.g.):
– I often act irritable toward those around me, for example, snap at people, give sharp answers, criticize easily
– I act disagreeable to family members, for example, I act spiteful, I am stubborn

Suppose we administer this scale on a different day: does it lead to the same scores on the different subscales?

(1) How reliable is the scale?
One could calculate the correlation between the scores obtained on two different occasions: test-retest reliability.

Remedy for unreliability: repeated measurements.

This is a motivation for multi-item measures!
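The test-retest correlation mentioned above can be sketched in a few lines. This is only an illustration: the scores are made-up numbers for six hypothetical patients, and the helper name `pearson_r` is an assumption, not part of the SIP.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical subscale scores of six patients on two occasions (made-up data).
occasion_1 = [12, 18, 25, 31, 40, 44]
occasion_2 = [14, 17, 27, 29, 41, 42]

retest_reliability = pearson_r(occasion_1, occasion_2)
print(f"test-retest reliability: {retest_reliability:.2f}")
```

A value close to 1 means that patients keep roughly the same rank order and spacing on both occasions; it says nothing yet about whether the scale measures the intended construct.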

(2) Are we sure that the scale is measuring what we would like to measure?

Does the scale have validity?

Reliability does not guarantee validity: a reliable scale may still be invalid!

• Content validity: Is the scale representative of the domain to be measured? Is there something missing?

• Nomological or construct validity: Does the scale behave as one would expect it to behave (e.g. on theoretical grounds)?

E.g. the score on the SIP should correlate with the score on another quality of life measure.

1. Respondent’s tendency to agree (or disagree) with each statement
2. Respondent’s tendency to give answers that are socially accepted
3. Respondent’s tendency to give extreme answers
4. Respondent’s tendency to bring some variation into his or her answers
5. Desire of the respondent to please the interviewers/researchers
...

• How good was the design of the study? Are alternative explanations possible for the relations found?

Suppose the design of the study was:

Baseline measurement → Intervention → Post-measurement

What problems may be involved?

• Suppose the weather was good during the period between pre- and post-measurement, so there were fewer complaints related to the disease.

History effects

• Rheumatic disease is progressive. One may expect a decrease in quality of life over time.

“Maturation”

• Remedy: include a control group

Treatment group: Baseline measurement → Intervention → Post-measurement

Control group: Baseline measurement → Post-measurement

• Other effects that may also be brought under control:

Testing effects: e.g. subjects are made aware of their functional health status, and increased complaints may be due to the pre-measurement

Selection effects: the subjects are not assigned in a random fashion to the treatment and control group

• Other possible threats:

Diffusion of treatments: e.g. when patients are recruited from the same hospital (rheumatology clinic), they may learn about each other's therapy

Remedy: randomize at the level of hospitals

• Observational studies: confounders!

• E.g. the relation between smoking and blood pressure may also be explained by differences in physical exercise and nutrition between smokers and non-smokers

• Remedies:
– Stratify on confounders, and perform the analysis for each stratum
– Look for persons in both groups that match each other on confounders
– Include confounders as covariates in the analysis
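The first remedy, stratifying on a confounder, can be illustrated with simulated data. The sketch below rests on loudly stated assumptions: the population is made up, exercise is the only confounder, and smoking is given no effect at all on blood pressure, so any crude difference between smokers and non-smokers is produced purely by confounding. All names and numbers are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Made-up population: exercise lowers blood pressure AND makes smoking less
# likely. Smoking itself has NO effect on blood pressure in this simulation.
people = []
for _ in range(4000):
    exercises = rng.random() < 0.5
    smokes = rng.random() < (0.2 if exercises else 0.7)
    blood_pressure = 130 - (10 if exercises else 0) + rng.gauss(0, 5)
    people.append((smokes, exercises, blood_pressure))


def smoker_difference(rows):
    """Mean blood pressure of smokers minus that of non-smokers."""
    smokers = [bp for smokes, _, bp in rows if smokes]
    non_smokers = [bp for smokes, _, bp in rows if not smokes]
    return mean(smokers) - mean(non_smokers)


# Crude comparison ignores the confounder; the stratified one does not.
crude = smoker_difference(people)
per_stratum = {
    level: smoker_difference([p for p in people if p[1] == level])
    for level in (False, True)
}

print(f"crude difference: {crude:.1f}")
for level, diff in per_stratum.items():
    print(f"exercise={level}: difference within stratum: {diff:.1f}")
```

In this simulation the crude comparison suggests that smokers have clearly higher blood pressure, while within each exercise stratum the difference is close to zero, which is the pattern one expects when the relation is due to the confounder.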

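• Were the subjects selected on the variable on which we would like to find an effect?

Watch out for regression towards the mean! Subjects selected for their extreme scores tend to score closer to the mean on a second measurement, even without any intervention.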

[Figures: series of scatter plots of quality of life after (vertical axis) against quality of life before (horizontal axis), with the means of both measurements marked, illustrating regression towards the mean.]
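Regression towards the mean can be reproduced in a small simulation. The sketch below assumes, purely for illustration, that each observed quality-of-life score is a stable true score plus independent measurement noise, and that nothing happens between the two measurements. The distributions and the 10% selection rule are hypothetical choices.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Observed score = stable "true" quality of life + independent measurement noise.
true_scores = [rng.gauss(50, 10) for _ in range(5000)]
before = [t + rng.gauss(0, 10) for t in true_scores]
after = [t + rng.gauss(0, 10) for t in true_scores]  # no intervention at all

# Select the 10% of subjects with the lowest scores at baseline.
cutoff = sorted(before)[len(before) // 10]
selected = [i for i, score in enumerate(before) if score < cutoff]

mean_before = mean(before[i] for i in selected)
mean_after = mean(after[i] for i in selected)

print(f"overall mean after:         {mean(after):.1f}")
print(f"selected group, mean before: {mean_before:.1f}")
print(f"selected group, mean after:  {mean_after:.1f}")
```

The selected group scores higher at the second measurement although no treatment was given, while still remaining below the overall mean. In a study without a control group, such a shift could be mistaken for a treatment effect.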

• In the treatment group not all people are strongly motivated to follow the program (and possibly to do some exercises at home)

• So a therapy may be effective, but the study may fail to show this.

Should we leave out the non-compliers?

• Non-compliance: not all people are willing to follow the procedures as prescribed by a doctor or a program

• The more statistical tests you perform, the larger the probability that at least one of them shows an effect, even if there is none

• The chance of making at least one type I error increases

[Figure: Increase of type I error due to multiple testing. Horizontal axis: number of statistical tests (0 to 12); vertical axis: chance of a type I error (0.0 to .5); curves: ideal type I error and realized type I error.]
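The growth of the type I error with the number of tests can be computed directly. The sketch assumes that the tests are independent and that each is carried out at a significance level of 0.05; both are assumptions made here for illustration. Under independence, the chance of at least one type I error in k tests is 1 − (1 − α)^k.

```python
def familywise_error(alpha, k):
    """Chance of at least one type I error in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


# Assumed per-test significance level of 0.05.
for k in (1, 2, 4, 6, 8, 10, 12):
    print(f"{k:2d} tests: {familywise_error(0.05, k):.3f}")
```

With these assumptions, ten tests already give a chance of about 0.40 of at least one false positive, against the nominal 0.05 for a single test.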

• Select a sample that is large enough to be able to detect an effect (in an experiment) or a relation (in an observational study)

• If there is an effect or relation, but we are not able to detect it, we make a type II error

[Figure: Relation between type II error and sample size. Horizontal axis: sample size per group (0 to 120); vertical axis: chance of a type II error (0.0 to 1.0); separate curves for large, medium and small effects.]
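The relation between sample size and type II error can be approximated as follows. This is a sketch under stated assumptions, not the calculation behind the original figure: it uses a normal approximation to a two-sided comparison of two group means at a significance level of 0.05, and it takes standardized effect sizes of 0.2, 0.5 and 0.8 as "small", "medium" and "large" (a common convention, assumed here). The function name `type_ii_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(effect_size, n_per_group, alpha=0.05):
    """Approximate chance of missing a true difference between two group means.

    Normal approximation to a two-sided two-sample test; effect_size is the
    standardized mean difference (difference divided by the common SD).
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Power: probability that the test statistic falls in either rejection region.
    power = (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)
    return 1 - power


# Assumed conventional effect sizes for the three curves.
effects = {"small": 0.2, "medium": 0.5, "large": 0.8}
for n in (20, 40, 60, 80, 100, 120):
    row = ", ".join(f"{name}: {type_ii_error(d, n):.2f}" for name, d in effects.items())
    print(f"n per group = {n:3d} -> {row}")
```

The output shows the same pattern as the figure: the chance of a type II error falls as the sample size per group grows, and it falls much faster for large effects than for small ones.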

• How can one maximize the external validity or the generalizability?
– Take a random sample from the population: best alternative, but not always feasible
– Take a heterogeneous sample (stratified sampling): in case of interaction no average effects can be reported
– Take a sample with subjects similar to modal persons for the strata: even per stratum there is some risk in interpreting relations

• The target population is not reached:
– Are the participants different from the non-participants?
– Are the people that drop out of the study different from the others?

• Remedies:
– Minimize nonparticipation and dropout
– Obtain information on nonparticipation and dropout

• The target population is reached, but is one able to generalize to other target populations?

• Interaction between selected sample and treatment
• Interaction between time and treatment
• Interaction between setting and treatment

• Strategies to get insight into these interactions:
– Replicate the study with other groups, at other times and within other settings
– Search the literature for prior evidence