How to extract more (and better quality) information from your study?
‘simple questions’ ≠ ‘simple analyses’
University of Oulu
November 13, 2014
Emmanuel Lesaffre (& Dominique Declerck)
Part II: Misclassification


Page 1: How to extract more (and better quality) information from ... · Content of this workshop • Multi-site and split-mouth studies • Misclassification issues • Time-to-event studies



Examiner misclassification
Example from Caries Research

• Usually based on visual screening, sometimes complemented with tactile examination (dental probe)
• Substantial variation in scoring between EXAMINERS may lead to considerable misclassification (Ismail, 2004; Assaf et al, 2004 & 2006)
• Problematic for:
– Comparability of results
– Repeatability

Need for standardisation!


Examiner misclassification
Example from Caries Research

• Comparability of results: when multiple examiners are involved
– e.g. collection of country-wide CE data
– e.g. collection of data in different dental clinics or dental practices

Need for standardisation!

[Figure: map of Flanders by province (West Flanders, East Flanders, Brabant, Antwerp, Limburg); key: [0.49,1.10], (1.10,1.40], (1.40,2.25]]


Why so difficult?

[Figure: caries stages d1, d2, d3, d4, d2. REVERSAL possible!]


Standardisation

For this purpose, guidelines were developed by different research groups

With the following aims:

To provide a systematic approach to the collection and reporting of data on caries experience

To ensure that data collected in a wide range of environments are valid and comparable

http://www.who.int/en/ (World Health Organization, 1971)
http://www.bascd.org/ (Pitts et al., 1997)
http://www.dundee.ac.uk/dhsru/ (Pitts, 2004)


Is there a problem?

Review of methodological aspects related to caries experience assessment in reports of epidemiological surveys published between January 2000 and December 2008 (n=89), more specifically:

Reporting of methodological aspects

Application of standardized methodology

J. Agbaje, D. Declerck and E. Lesaffre: Assessment of caries experience in epidemiological surveys: a review. Community Dental Health, 2012, 29, 14-19


Frequency of REPORTING of caries experience assessment methodology

ITEM | REPORTED: EXPLICITLY MENTIONED | REPORTED: ASSUMED TO BE APPLIED* | NOT REPORTED
Use of standardization criteria | 80 (89.9%) | NA | 9 (10.1%)
Materials and setting
– Use of probe | 60 (67.4%) | 82 (92.1%) | 7 (7.9%)
– Type of probe | 51 (57.3%) | 75 (84.3%) | 14 (15.7%)
– Light condition | 60 (67.4%) | 82 (92.1%) | 7 (7.9%)
– Use of radiographs | 31 (34.8%) | 80 (89.9%) | 9 (10.1%)
– Cleaning / debris removal | 28 (31.5%) | 77 (86.5%) | 12 (13.5%)
Detection threshold applied | 42 (47.2%) | 84 (94.4%) | 5 (5.6%)
Examiner characteristics
– Training | 57 (64.0%) | 58 (65.2%) | 31 (34.8%)
– Calibration | 61 (68.5%) | [illegible] | 28 (31.5%)
– Reliability assessed | 47 (52.8%) | 48 (53.9%) | 41 (46.1%)
– Reliability reported | 41 (46.1%) | NA | 48 (53.9%)

* The column “assumed to be applied” contains the sum of surveys where information was explicitly mentioned and those where reference was made to standardisation criteria containing information on the item of interest. NA = not applicable

Journals with Impact Factor performed better


Consistency in APPLICATION of caries experience assessment methodology

Table: Percentage of reports applying WHO Basic Methods for Oral Health Surveys NOT adhering to the guidelines (52 reports included)

ITEM CONSIDERED | N (%)
Use of probe | 2 (3.8%)
Type of probe | 27 (51.9%)
Light condition | 16 (30.8%)
Cleaning | 5 (9.6%)
Use of radiographs | 1 (1.9%)
Detection threshold | 2 (3.8%)
Measurement of reliability | 24 (46.2%)
Reporting of reliability measurement | 29 (55.8%)

NA = not applicable

Deviations from the original recommendations were often present


Introducing … the Smile for Life study

A multi-component oral health intervention in young children in Flanders (Belgium)


Project design

Intervention group: 1080 children
Control group: 1057 children

Standard care
– Medical check up
– Education on feeding, child rearing, sleeping, safety, …
– 2 specific education topics on oral health

+

Extended care program on oral health promotion
– Communication tools: placemat, toothbrush, cup, booklet
– Education (55 topics): dietary advice, pacifier use, toothbrushing, dental visit, …

Delivered by nurses and physicians of Child & Family
Evaluation: at age 3 and 5 years


[Figure: map of Flanders (scale bar: 120 km) showing the participating municipalities, each marked as control region or intervention region]


ORAL EXAMINATIONS
• child seated on ordinary chair
• mouth mirror with built-in light source (Mirrolite® by Defend® from Medident, Belgium)
• WHO/CPITN type E probe (ball-ended probe) (Prima Dental Instruments, Gloucester, UK)
• no cleaning (cotton rolls)
• no radiographs


ORAL EXAMINATIONS
• at school, exact date not announced beforehand
• trained dentist-examiners (n=8) + nurse
• training consisted of explanation of overall set-up, organisational aspects, illustration of clinical variables
• calibration exercises were organised: slides, examination of subjects

Despite training, considerable variation remains....


Introducing … the Signal Tandmobiel® study


Longitudinal oral health survey in school-aged children (primary school, age 6–12 years)

Focus on CARIES, but also EMERGENCE TIMES of permanent teeth, ORAL HYGIENE, GINGIVAL HEALTH, FLUOROSIS… and QUESTIONNAIRE data

Evaluation of oral health promotion INTERVENTION


• N= 4468 children (2153 girls)

• Sample representative of Flemish children born in 1989 (7.3%)

• Stratified cluster random sampling of schools (15 strata = 5 provinces × 3 education systems)

• Each child had approximately equal probability of being sampled

• Annual examinations for 6 years (primary school)
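The sampling design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling frame, the school identifiers, the three education-system labels, the 10% fraction and the helper names `build_frame` and `sample_schools` are all hypothetical and not taken from the study. Drawing the same fraction of schools in every stratum is one simple way to give children roughly equal inclusion probabilities, assuming all children of a selected school are examined.

```python
import random
from itertools import product

PROVINCES = ["West Flanders", "East Flanders", "Brabant", "Antwerp", "Limburg"]
SYSTEMS = ["system A", "system B", "system C"]  # hypothetical labels


def build_frame(schools_per_stratum=20):
    """Hypothetical sampling frame: a list of school ids for each stratum."""
    return {
        (province, system): [
            f"{province}/{system}/school{i:02d}" for i in range(schools_per_stratum)
        ]
        for province, system in product(PROVINCES, SYSTEMS)
    }


def sample_schools(frame, fraction, seed=2014):
    """Draw the same fraction of schools (clusters) at random within each stratum."""
    rng = random.Random(seed)
    selected = {}
    for stratum, schools in frame.items():
        n_draw = max(1, round(fraction * len(schools)))
        selected[stratum] = rng.sample(schools, n_draw)
    return selected


frame = build_frame()
selected = sample_schools(frame, fraction=0.10)
n_selected = sum(len(schools) for schools in selected.values())
print(f"{len(frame)} strata, {n_selected} schools selected")
```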


Data structure

PROVINCES

EDUCATIONAL SYSTEM

SCHOOLS

CLASS


Data structure

Levels of measurement: CHILD, TOOTH, SURFACE, SITE

Clinical variables:
• Caries experience (BASCD – Pine et al., 1997)
• Gingival health (SBI – Mühlemann & Son, 1971)
• Oral hygiene (PI – Silness & Löe, 1964)
• Occlusal plaque accumulation (adapted from Carvalho et al., 1989)
• Clinical eruption stage of permanent teeth (Carvalho et al., 1989)
• Fluorosis (Thylstrup & Fejerskov, 1978)

Questionnaire data:
• Oral hygiene habits
• (Systemic) fluoride supplements
• Dietary habits
• Socio-demographic data
• …

Calibration was undertaken for each of the clinical variables, on several occasions


Misclassification terminology

• Misclassification = measurement error on categorical measurements, often binary

• Gold standard = perfect scorer

• Benchmark scorer = reference scorer who is likely not perfect

• Main data = data collected for epidemiologic research

• Validation data = data on smaller group of subjects to evaluate scoring behavior of scorers

Two types of misclassification mechanisms:

– Non-differential: misclassification does not depend on other factors

– Differential: misclassification depends on other factors, related to subject or scorer


Effect of misclassification

• Distorted estimates of

– Prevalence and incidence

– Impact of risk factors, often attenuation of effect of risk factors on disease outcome

– Even for large studies

• Drop in statistical efficiency

– SD of the measurements increases

– Power to detect significant difference decreases

– Sample size needs to be increased!
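The attenuation mentioned above can be checked with a small calculation. The sketch below assumes non-differential misclassification of a binary outcome; the true risks (0.40 and 0.20) and the examiner's sensitivity (0.70) and specificity (0.95) are made-up numbers chosen only for illustration.

```python
import math


def observed_risk(true_risk, sens, spec):
    """Probability of being SCORED as diseased when the true risk is `true_risk`."""
    return sens * true_risk + (1.0 - spec) * (1.0 - true_risk)


def log_odds_ratio(p_exposed, p_unexposed):
    """Log odds ratio comparing exposed with unexposed subjects."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return math.log(odds_exposed / odds_unexposed)


# Hypothetical values, not taken from any of the studies in this workshop
true_p1, true_p0 = 0.40, 0.20   # true risk in exposed / unexposed
sens, spec = 0.70, 0.95         # same scoring behaviour in both groups

obs_p1 = observed_risk(true_p1, sens, spec)
obs_p0 = observed_risk(true_p0, sens, spec)

true_beta1 = log_odds_ratio(true_p1, true_p0)
observed_beta1 = log_odds_ratio(obs_p1, obs_p0)

print(f"prevalence in exposed: true {true_p1:.2f}, observed {obs_p1:.2f}")
print(f"log odds ratio: true {true_beta1:.3f}, observed {observed_beta1:.3f}")
```

Because the bias comes from the scoring itself and not from sampling variation, it does not shrink when the sample grows, which is the point of "even for large studies".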


Measuring misclassification

How is scoring evaluated?

• Gold standard is available:

– Sensitivity: % diseased subjects scored as diseased

– Specificity: % healthy subjects scored as healthy

• Gold standard is NOT available:

– % agreement: % scores that 2 scorers agree upon

– kappa: % agreement corrected for random agreement


Sensitivity, specificity

Example

EXAMINER | GOLD STANDARD 0 | GOLD STANDARD 1 | Total
0 | 37 | 13 | 50
1 | 3 | 27 | 30
Total | 40 | 40 | 80

$$\text{Sens} = \frac{27}{40} = 0.675$$

$$\text{Spec} = \frac{37}{40} = 0.925$$

Sensitivity 67.5%

Specificity 92.5%
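As a minimal sketch, the same calculation in code, using the counts of the example table (rows are the examiner's score, columns the gold standard). The function name `sens_spec` is only illustrative.

```python
def sens_spec(table):
    """table[examiner_score][gold_score] holds the counts of a 2x2 table."""
    true_neg, false_neg = table[0]   # examiner scored 0: gold 0, gold 1
    false_pos, true_pos = table[1]   # examiner scored 1: gold 0, gold 1
    sensitivity = true_pos / (true_pos + false_neg)   # among gold-standard diseased
    specificity = true_neg / (true_neg + false_pos)   # among gold-standard healthy
    return sensitivity, specificity


example = [[37, 13],
           [3, 27]]
sens, spec = sens_spec(example)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```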


Example

• Smile for Life study

• Illustration of:

– Scoring behavior depends on level of measurement

– Scoring behavior can drastically depend on type of measurement


[Figure: Smile for Life study, caries at d3 level: sensitivity and specificity (d3mf) per examiner; vertical axis from 0 to 1]


[Figure: Smile for Life study, caries at d1 versus d3 level: specificity (d1mf and d3mf) per examiner; vertical axis from 0 to 1]


[Figure: Smile for Life study, plaque scoring: sensitivity and specificity per examiner; vertical axis from 0 to 1]


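% Agreement, kappa

Example

Examiner 2 | Examiner 1: 0 | Examiner 1: 1 | Total
0 | 37 | 13 | 50
1 | 3 | 27 | 30
Total | 40 | 40 | 80

$$p_O = \frac{37 + 27}{40 + 40} = \frac{64}{80} = 0.80$$

$$p_E = \frac{50}{80} \cdot \frac{40}{80} + \frac{30}{80} \cdot \frac{40}{80} = 0.50$$

$$\kappa = \frac{p_O - p_E}{1 - p_E} = \frac{0.80 - 0.50}{1 - 0.50} = 0.6$$

% agreement = 80%

% random agreement = 50%

Kappa = 0.6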
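The three quantities above can be reproduced with a short function. This is a plain sketch of Cohen's kappa for a square table of counts; the name `cohen_kappa` is illustrative and the table is the one from the example.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square count table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return p_obs, p_exp, (p_obs - p_exp) / (1.0 - p_exp)


example = [[37, 13],
           [3, 27]]
p_obs, p_exp, kappa = cohen_kappa(example)
print(f"% agreement = {p_obs:.0%}, % random agreement = {p_exp:.0%}, kappa = {kappa:.2f}")
```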


Kappa

Intra- and inter-examiner kappa:

– Intra-examiner kappa: two scores of the same scorer

– Inter-examiner kappa: two scores of different scorers

Weighted kappa:

– Kappa for ordinal scores

– Difference in scoring is weighted
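A weighted kappa can be sketched as follows. The weighting scheme is an assumption: the slides do not say which weights were used, so the sketch offers the two common choices (linear, `power=1`, and quadratic, `power=2`). The 3×3 table is made-up and only serves to show the call; for a 2×2 table the weighted kappa reduces to the ordinary kappa.

```python
def weighted_kappa(table, power=1):
    """Weighted kappa with disagreement weights (|i - j| / (k - 1)) ** power."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    weights = [[(abs(i - j) / (k - 1)) ** power for j in range(k)] for i in range(k)]
    # Weighted disagreement actually observed
    observed = sum(weights[i][j] * table[i][j]
                   for i in range(k) for j in range(k)) / n
    # Weighted disagreement expected if both scorers scored independently
    expected = sum(weights[i][j] * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / n**2
    return 1.0 - observed / expected


binary_example = [[37, 13], [3, 27]]                        # table from the kappa example
ordinal_example = [[20, 5, 0], [4, 15, 6], [1, 5, 24]]      # hypothetical ordinal scores

kappa_binary = weighted_kappa(binary_example)
kappa_linear = weighted_kappa(ordinal_example, power=1)
kappa_quadratic = weighted_kappa(ordinal_example, power=2)
print(kappa_binary, kappa_linear, kappa_quadratic)
```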


Kappa

Problems with kappa

Scoring pattern | Table (row 0; row 1) | Kappa
UNDERSCORING | 37 13; 3 27 | 0.6
 | 35 11; 5 29 | 0.6
VARIABILITY | 32 8; 8 32 | 0.6
 | 29 5; 11 35 | 0.6
OVERSCORING | 27 3; 13 37 | 0.6

What is the message?

Impact on risk estimates?
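The message can be made concrete by computing, for each of the five tables, both kappa and the sensitivity and specificity. The sketch assumes that rows are the examiner's score and columns the score of a reference scorer; that orientation is an assumption, since the tables themselves are unlabelled.

```python
def kappa(table):
    """Cohen's kappa for a 2x2 count table."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    p_obs = (table[0][0] + table[1][1]) / n
    p_exp = (rows[0] * cols[0] + rows[1] * cols[1]) / n**2
    return (p_obs - p_exp) / (1.0 - p_exp)


def sens_spec(table):
    """Sensitivity and specificity, taking the column scorer as the reference."""
    sens = table[1][1] / (table[0][1] + table[1][1])
    spec = table[0][0] / (table[0][0] + table[1][0])
    return sens, spec


TABLES = [
    [[37, 13], [3, 27]],
    [[35, 11], [5, 29]],
    [[32, 8], [8, 32]],
    [[29, 5], [11, 35]],
    [[27, 3], [13, 37]],
]

results = [(kappa(t), *sens_spec(t)) for t in TABLES]
for k, sens, spec in results:
    print(f"kappa = {k:.2f}   sens = {sens:.3f}   spec = {spec:.3f}")
```

All five tables give the same kappa, while sensitivity runs from 0.675 up to 0.925 and specificity from 0.925 down to 0.675: kappa alone does not reveal the direction of the scoring error.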


Kappa versus sens, spec

[Figure: kappa (vertical axis, 0.0 to 1.0) versus prevalence (horizontal axis, 0.0 to 1.0), with curves for Sens=90%, Sens=70% and Sens=50%; markers for the 1st study and the 2nd study]
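The dependence of kappa on prevalence can be sketched by writing kappa (examiner versus gold standard) as a function of prevalence, sensitivity and specificity. The specificity behind the curves is not stated on the slide, so `ASSUMED_SPEC = 0.95` is a placeholder, and the prevalence grid is arbitrary.

```python
def kappa_vs_gold(prevalence, sens, spec):
    """Kappa between an examiner and a gold standard, for given prevalence, sens, spec."""
    p_both_pos = prevalence * sens
    p_both_neg = (1.0 - prevalence) * spec
    p_obs = p_both_pos + p_both_neg
    # Probability that the examiner scores 'diseased'
    p_exam_pos = prevalence * sens + (1.0 - prevalence) * (1.0 - spec)
    p_exp = prevalence * p_exam_pos + (1.0 - prevalence) * (1.0 - p_exam_pos)
    return (p_obs - p_exp) / (1.0 - p_exp)


ASSUMED_SPEC = 0.95  # assumption: not given on the slide
for sens in (0.90, 0.70, 0.50):
    curve = [kappa_vs_gold(prev, sens, ASSUMED_SPEC) for prev in (0.05, 0.25, 0.50)]
    print(f"sens = {sens:.2f}: " + "  ".join(f"{k:.2f}" for k in curve))

# Same examiner (sens 0.675, spec 0.925) in two populations
k_balanced = kappa_vs_gold(0.50, 0.675, 0.925)
k_rare = kappa_vs_gold(0.10, 0.675, 0.925)
print(f"kappa at prevalence 0.50: {k_balanced:.2f}, at prevalence 0.10: {k_rare:.2f}")
```

With unchanged scoring behaviour, kappa drops when the condition is rarer, so kappa values from two studies with different prevalence are not directly comparable.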


Example

• Signal Tandmobiel study

• Illustration of:

– Correcting for misclassification error using estimates of sensitivity and specificity

– That kappa values may not be helpful

E. Lesaffre, S.M. Mwalili, D. Declerck: Analysis of caries experience taking inter-observer bias and variability into account. Journal of Dental Research, 2004, 83 (12), 951-955.


Epidemiological risk model

• Logistic regression

• Purpose: determine effect of exposure on disease = estimate β1

$$\log \frac{P(Y = 1 \mid x)}{P(Y = 0 \mid x)} = \beta_0 + \beta_1 x$$

Y = 1 disease (caries)

Y = 0 no disease (no caries)

x = 1 exposure (brushing < 1/day)

x = 0 no exposure (brushing ≥ 1/day)

β1 = 0: NO EFFECT

β1 > 0: exposure increases risk for disease


Correction for misclassification

• Solution: direct correction in epidemiological risk model

– Misclassification in response Y

– Validation set (calibration exercise) available

– Y_E score of examiner & Y_G score of gold standard

$$P(Y_E = 1 \mid x) = \alpha_0 + (1 - \alpha_0 - \alpha_1)\, P(Y_G = 1 \mid x)$$

– with

$$\alpha_0 = P(Y_E = 1 \mid Y_G = 0) \quad (1 - \text{spec})$$

$$\alpha_1 = P(Y_E = 0 \mid Y_G = 1) \quad (1 - \text{sens})$$

– estimate α0 and α1 using validation data

– estimate (β0), β1 using main data, but take into account that α0 and α1 are estimated
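The relation between the examiner's score and the gold-standard score can be inverted to recover the true probability from the observed one. The sketch below is only this simple plug-in inversion (often called the Rogan-Gladen estimator), not the model-based correction of the slides: it treats α0 and α1 as known and so ignores their estimation uncertainty. The values α0 = 0.075 and α1 = 0.325 correspond to the sensitivity (67.5%) and specificity (92.5%) of the earlier example; the true probability 0.30 is hypothetical.

```python
def observed_prob(p_true, alpha0, alpha1):
    """P(Y_E = 1) implied by P(Y_G = 1) = p_true, with alpha0 = 1 - spec, alpha1 = 1 - sens."""
    return alpha0 + (1.0 - alpha0 - alpha1) * p_true


def corrected_prob(p_obs, alpha0, alpha1):
    """Invert the misclassification relation; the result is clipped to [0, 1]."""
    denom = 1.0 - alpha0 - alpha1
    if denom <= 0.0:
        raise ValueError("sens + spec must exceed 1 for the correction to be defined")
    return min(1.0, max(0.0, (p_obs - alpha0) / denom))


alpha0 = 1.0 - 0.925   # 1 - specificity
alpha1 = 1.0 - 0.675   # 1 - sensitivity

p_true = 0.30                                   # hypothetical true probability
p_obs = observed_prob(p_true, alpha0, alpha1)   # what the examiner would report
p_back = corrected_prob(p_obs, alpha0, alpha1)
print(f"true {p_true:.3f} -> observed {p_obs:.3f} -> corrected {p_back:.3f}")
```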


Caries experience in Flanders

• From the first year's results (7-year-old children)

[Figure: map of caries experience by province (West Flanders, East Flanders, Brabant, Antwerp, Limburg); key: [0.49,1.10], (1.10,1.40], (1.40,2.25], from low caries to high caries]

East-west gradient in caries experience

But significant?


Caries experience in Flanders: epidemiological model

• Ordinal logistic (random effects) model

– Ordinal: dmft-score is split up in 4 classes (0,1,2,3)

– Extension of previous logistic regression model

– Random effects: assume each school has its own level

$$\log \frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \theta_j + \beta_1\,\text{age} + \beta_2\,\text{gender} + \beta_3\,\text{xcor} + \beta_4\,\text{ycor} + b_{0k}$$

xcor = x-coordinate of municipality in Flanders

ycor = y-coordinate of municipality in Flanders

θj = intercept corresponding to jth level (j=1,2,3)

b0k = (random) effect of school k (has a distribution which must be estimated)
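To show how an ordinal (cumulative logit) model turns a linear predictor into probabilities for the four dmft classes, here is a minimal sketch. It assumes the common parameterisation in which the logit of P(Y ≤ j) equals a class-specific intercept plus the linear predictor; the direction of the cumulative probability, the intercept values and the function name `class_probabilities` are assumptions made for illustration only. The linear predictor `eta` stands for the covariate terms plus the random school effect.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(intercepts, eta):
    """Class probabilities under logit P(Y <= j) = intercept_j + eta.

    `intercepts` must be increasing; with 3 intercepts there are 4 ordered classes.
    """
    cumulative = [expit(a + eta) for a in intercepts] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


intercepts = [-0.5, 0.5, 1.5]   # hypothetical values
probs_reference = class_probabilities(intercepts, eta=0.0)
probs_shifted = class_probabilities(intercepts, eta=0.2)   # e.g. covariate effect or school effect

print([round(p, 3) for p in probs_reference])
print([round(p, 3) for p in probs_shifted])
```

Under this parameterisation a larger linear predictor shifts probability towards the lower classes; with the opposite convention the shift is reversed.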


Caries experience in Flanders: epidemiological model

• Result from statistical analysis

$$\hat\beta_3 = 0.198 \quad (P < 0.0001)$$

$$\hat\beta_4 = 0.017 \quad (P = 0.35)$$

• Conclusion:

Significant east-west gradient in caries experience

• But: Due to different scoring of examiners?


Caries experience in Flanders: calibration exercises

• 4 calibration exercises

– 16 dental examiners

– 1 gold standard

– Weighted kappa values versus “gold standard”: 0.65 => 1.00

• Effect on estimated regression coefficients??


Caries experience in Flanders: calibration exercises

• Dental examiners active in restricted geographical areas

[Figure: map of Flanders by province (West Flanders, East Flanders, Brabant, Antwerp, Limburg); key: (-15%,-5%], (-5%,5%], (5%,18%), from underscoring to overscoring]

Is east-west gradient created by misclassification by examiners?


Caries experience in Flanders: correction for misclassification by examiners

• Correction by estimating correction factors from calibration exercises

• Correction terms depend on examiner

• Correction analysis takes uncertainty of estimation of correction terms into account

$$\hat\beta_3 = 0.225 \quad (P < 0.0001)$$

$$\hat\beta_4 = 0.023 \quad (P = 0.32)$$

East-west gradient is NOT created by misclassification by examiners!


Caries experience in Flanders: Some conclusions

• Reporting kappa-values does NOT give much insight on the actual impact of the random and systematic error in scoring disease and/or exposure.

• Correction should be done using correction terms (sensitivity, specificity) estimated from a validation data set, taking into account that the correction terms are estimated.

• But, are the sensitivity and specificity obtained from calibration exercises good measures of misclassification behavior in practice?

• Good news: for kappa values above 0.90, results are relatively stable (but there is always a loss in efficiency)


Quality of sensitivity & specificity from calibration exercises

• Validation data from calibration exercises do not represent scoring behavior of examiners in practice.

• Experiment was set up in Smile for Life study to compare:

– Sensitivity and specificity of examiners versus gold standard (benchmark scorer) in calibration exercise & field conditions

J. Agbaje, T. Mutsvari, E. Lesaffre, D. Declerck: Examiner performance in calibration exercises compared with field conditions when scoring caries experience. Clinical Oral Investigations, 2012, 16, 481-488


Quality of sensitivity & specificity from calibration exercises

• Results:

Examiner | Calibration exercise: Sens (%) | Calibration exercise: Spec (%) | Field conditions: Sens (%) | Field conditions: Spec (%)
EX01 | 70.00 | 99.67 | 50.00 | 99.72
EX02 | 68.33 | 99.71 | - | 100.00
EX03 | 85.00 | 99.00 | 41.67 | 99.64
EX04 | 72.73 | 99.49 | 42.86 | 99.47
EX05 | 86.67 | 98.94 | 60.00 | 99.02
EX06 | 76.67 | 99.12 | 53.85 | 97.86
EX07 | 83.33 | 99.23 | 41.94 | 98.46
EX08 | 53.33 | 99.60 | 70.59 | 99.16
Average | 74.52 | 99.34 | 51.56 | 99.17
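A short sketch of how the table can be summarised: the change in sensitivity per examiner between calibration exercise and field conditions, and the column means. The numbers are copied from the table; the missing field sensitivity of EX02 is kept as `None` and left out of the mean. Means recomputed from the rounded entries can differ from the slide's averages in the last digit.

```python
# (calibration sens, calibration spec, field sens, field spec), in %
RESULTS = {
    "EX01": (70.00, 99.67, 50.00, 99.72),
    "EX02": (68.33, 99.71, None, 100.00),
    "EX03": (85.00, 99.00, 41.67, 99.64),
    "EX04": (72.73, 99.49, 42.86, 99.47),
    "EX05": (86.67, 98.94, 60.00, 99.02),
    "EX06": (76.67, 99.12, 53.85, 97.86),
    "EX07": (83.33, 99.23, 41.94, 98.46),
    "EX08": (53.33, 99.60, 70.59, 99.16),
}


def mean(values):
    """Mean of the available (non-missing) values."""
    available = [v for v in values if v is not None]
    return sum(available) / len(available)


# Field minus calibration sensitivity, for examiners with both values
sens_change = {
    examiner: field_sens - cal_sens
    for examiner, (cal_sens, _, field_sens, _) in RESULTS.items()
    if field_sens is not None
}
n_lower_in_field = sum(1 for change in sens_change.values() if change < 0)

mean_field_sens = mean(row[2] for row in RESULTS.values())
mean_field_spec = mean(row[3] for row in RESULTS.values())

for examiner, change in sens_change.items():
    print(f"{examiner}: change in sensitivity {change:+.2f} points")
print(f"{n_lower_in_field} of {len(sens_change)} examiners had lower sensitivity in the field")
print(f"mean field sensitivity {mean_field_sens:.2f}%, mean field specificity {mean_field_spec:.2f}%")
```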


Additional comments

• The sensitivity, specificity, kappa values from calibration exercises are probably too good compared to what is obtained in practice. Some additional checking in field conditions is therefore useful.

• Validation data should ideally be obtained from a random sample of the main study (double sampling).

• Here the focus was on misclassification of the response in epidemiological models. But correction for misclassification can also be applied when misclassification occurs on risk factors.

• Correction for misclassification can be done also in more complicated models such as multilevel models, longitudinal studies, etc.

• For caries experience measured in a longitudinal study, correction for misclassification can be done without a validation data set (but only in a longitudinal study).


Additional literature

E. Lesaffre, J. Feine, B. Leroux, D. Declerck (editors): Statistical and Methodological Aspects of Oral Health Research. Wiley-Blackwell, 2009 (ISBN 13: 9780470517925)

T. Mutsvari, D. Bandyopadhyay, D. Declerck and E. Lesaffre: A multilevel model for spatially correlated binary data in the presence of misclassification: An application in oral health research. Statistics in Medicine, 2013 Aug 29. doi: 10.1002/sim.5944

T. Mutsvari, D. Declerck and E. Lesaffre: Correction for misclassification of caries experience in the absence of internal validation data. Clinical Oral Investigations, 2013 Nov;17(8):1799-805

M.J. García-Zattera, A. Jara, E. Lesaffre and G. Marshall: Modelling of multivariate monotone disease processes in the presence of misclassification. JASA (Applications and Case Studies), 2012, 107, 976-989

T. Mutsvari, M.J. García-Zattera, D. Declerck and E. Lesaffre: Dealing with misclassification and missing data when estimating prevalence and incidence of caries experience. Community Dentistry and Oral Epidemiology, 2012, 40, S1, 28-35

M.J. García-Zattera, T. Mutsvari, A. Jara, D. Declerck, E. Lesaffre: Correcting for misclassification for a monotone disease process with an application in dental research. Statistics in Medicine, 2010, 29, 3103–3117

S. Mwalili, E. Lesaffre, D. Declerck: The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Statistical Methods in Medical Research, 2008, 17, 123–139