How to extract more (and better quality) information from your study?

‘simple questions’ ≠ ‘simple analyses’

University of Oulu

November 13, 2014

Emmanuel Lesaffre (& Dominique Declerck) Part II: Misclassification

Examiner misclassification

Example from Caries Research

• Usually based on visual screening, sometimes complemented with tactile examination (dental probe)

• Substantial variation in scoring between EXAMINERS may lead to considerable misclassification (Ismail, 2004; Assaf et al, 2004 & 2006)

• Problematic for:

– Comparability of results

– Repeatability

Need for standardisation!

• Comparability of results:

– when multiple examiners are involved, e.g. collection of country-wide caries experience (CE) data, or collection of data in different dental clinics or dental practices

[Figure: map of Flanders by province (West Flanders, East Flanders, Brabant, Antwerp, Limburg); key classes [0.49,1.10], (1.10,1.40], (1.40,2.25]]

[Figure: sequence d1, d2, d3, d4, d2 with the note "REVERSAL possible!"]

Why so difficult?

Standardisation

For this purpose, guidelines were developed by different research groups, with the following aims:

• To provide a systematic approach to the collection and reporting of data on caries experience

• To ensure that data collected in a wide range of environments are valid and comparable

http://www.who.int/en/ http://www.bascd.org/ http://www.dundee.ac.uk/dhsru/

(World Health Organization, 1971) (Pitts et al., 1997) (Pitts, 2004)

Is there a problem?

Review of methodological aspects related to caries experience assessment in reports of epidemiological surveys published between January 2000 and December 2008 (n=89), more specifically:

– Reporting of methodological aspects

– Application of standardized methodology

J. Agbaje, D. Declerck and E. Lesaffre: Assessment of caries experience in epidemiological surveys: a review. Community Dental Health, 2012, 29, 14-19

Frequency of REPORTING of caries experience assessment methodology

ITEM | REPORTED: EXPLICITLY MENTIONED | REPORTED: ASSUMED TO BE APPLIED* | NOT REPORTED
Use of standardization criteria | 80 (89.9%) | NA | 9 (10.1%)
Materials and setting | | |
– Use of probe | 60 (67.4%) | 82 (92.1%) | 7 (7.9%)
– Type of probe | 51 (57.3%) | 75 (84.3%) | 14 (15.7%)
– Light condition | 60 (67.4%) | 82 (92.1%) | 7 (7.9%)
– Use of radiographs | 31 (34.8%) | 80 (89.9%) | 9 (10.1%)
– Cleaning / debris removal | 28 (31.5%) | 77 (86.5%) | 12 (13.5%)
Detection threshold applied | 42 (47.2%) | 84 (94.4%) | 5 (5.6%)
Examiner characteristics | | |
– Training | 57 (64.0%) | 58 (65.2%) | 31 (34.8%)
– Calibration | 61 (68.5%) | 0 | 28 (31.5%)
– Reliability assessed | 47 (52.8%) | 48 (53.9%) | 41 (46.1%)
– Reliability reported | 41 (46.1%) | NA | 48 (53.9%)

* The column “assumed to be applied” contains the sum of surveys where information was explicitly mentioned and those where reference was made to standardisation criteria containing information on the item of interest. NA = not applicable

Journals with Impact Factor performed better

Consistency in APPLICATION of caries experience assessment methodology

ITEM CONSIDERED | N (%)
Use of probe | 2 (3.8%)
Type of probe | 27 (51.9%)
Light condition | 16 (30.8%)
Cleaning | 5 (9.6%)
Use of radiographs | 1 (1.9%)
Detection threshold | 2 (3.8%)
Measurement of reliability | 24 (46.2%)
Reporting of reliability measurement | 29 (55.8%)

Table: Percentage of reports applying WHO Basic Methods for Oral Health Surveys NOT adhering to the guidelines (52 reports included)

Deviations from the original recommendations were often present

Introducing … the Smile for Life study

A multi-component oral health intervention in young children in Flanders (Belgium)

Project design

• Intervention group: 1080 children

• Control group: 1057 children

Standard care

– Medical check up

– Education on feeding, child rearing, sleeping, safety, …

– 2 specific education topics on oral health

+ Extended care program on oral health promotion

– Communication tools: placemat, toothbrush, cup, booklet

– Education (55 topics): dietary advice, pacifier use, toothbrushing, dental visit, …

Delivered by nurses and physicians of Child & Family

Evaluation: at age 3 and 5 years

[Figure: map of Flanders (scale bar 120 km) showing the participating municipalities, each marked as control region or intervention region]

ORAL EXAMINATIONS

• child seated on ordinary chair

• mouth mirror with built-in light source (Mirrolite® by Defend® from Medident, Belgium)

• WHO/CPITN type E probe (ball-ended probe) (Prima Dental Instruments, Gloucester, UK)

• no cleaning (cotton rolls)

• no radiographs

ORAL EXAMINATIONS

• at school, exact date not announced beforehand

• trained dentist-examiners (n=8) + nurse

• training consisted of explanation of overall set-up, organisational aspects, illustration of clinical variables

• calibration exercises were organised: slides, examination of subjects

Despite training, considerable variation remains…

Introducing … the Signal Tandmobiel® study

Longitudinal oral health survey in school-aged children (primary school, age 6–12 years)

Focus on CARIES, but also EMERGENCE TIMES of permanent teeth, ORAL HYGIENE, GINGIVAL HEALTH, FLUOROSIS… and QUESTIONNAIRE data

Evaluation of oral health promotion INTERVENTION

• N = 4468 children (2153 girls)

• Sample representative of Flemish children born in 1989 (7.3%)

• Stratified cluster random sampling of schools (15 strata = 5 provinces × 3 education systems)

• Children had approximately equal probability of being sampled

• Annual examinations for 6 years (primary school)

Data structure

Sampling levels: PROVINCES, EDUCATIONAL SYSTEM, SCHOOLS, CLASS

Data structure

Measurement levels: CHILD, TOOTH, SURFACE, SITE

Clinical variables:

• Caries experience (BASCD – Pine et al., 1997)

• Gingival health (SBI – Mühlemann & Son, 1971)

• Oral hygiene (PI – Silness & Löe, 1964)

• Occlusal plaque accumulation (adapted from Carvalho et al., 1989)

• Clinical eruption stage of permanent teeth (Carvalho et al., 1989)

• Fluorosis (Thylstrup & Fejerskov, 1978)

Questionnaire data:

• Oral hygiene habits

• (Systemic) fluoride supplements

• Dietary habits

• Socio-demographic data

• …

Calibration was undertaken for each of the clinical variables, on several occasions

Misclassification terminology

• Misclassification = measurement error on categorical measurements, often binary

• Gold standard = perfect scorer

• Benchmark scorer = reference scorer who is likely not perfect

• Main data = data collected for epidemiologic research

• Validation data = data on smaller group of subjects to evaluate scoring behavior of scorers

Two types of misclassification mechanisms:

– Non-differential: misclassification does not depend on other factors

– Differential: misclassification depends on other factors, related to subject or scorer

Effect of misclassification

• Distorted estimates of

– Prevalence and incidence

– Impact of risk factors, often attenuation of effect of risk factors on disease outcome

– Even for large studies

• Drop in statistical efficiency

– SD of the measurements increases

– Power to detect significant difference decreases

– Sample size needs to be increased!
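The attenuation mentioned above can be illustrated with a small calculation. The sketch below assumes non-differential misclassification of a binary outcome; the sensitivity (0.70), specificity (0.95) and the true risks in exposed and unexposed subjects (0.40 and 0.20) are hypothetical values chosen only for illustration, not results from the studies discussed here.

```python
def observed_prob(p_true, sens, spec):
    """Probability of being scored as diseased when the true probability is p_true."""
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def odds_ratio(p1, p0):
    """Odds ratio comparing probability p1 (exposed) with p0 (unexposed)."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical values, chosen only for illustration.
sens, spec = 0.70, 0.95
p_exposed, p_unexposed = 0.40, 0.20

true_or = odds_ratio(p_exposed, p_unexposed)
obs_or = odds_ratio(
    observed_prob(p_exposed, sens, spec),
    observed_prob(p_unexposed, sens, spec),
)

print(f"true odds ratio:     {true_or:.2f}")
print(f"observed odds ratio: {obs_or:.2f}")
```

With these inputs the odds ratio computed from the misclassified scores is closer to 1 than the true odds ratio, and this holds no matter how many subjects are examined.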

Measuring misclassification

How is scoring evaluated?

• Gold standard is available:

– Sensitivity: % diseased subjects scored as diseased

– Specificity: % healthy subjects scored as healthy

• Gold standard is NOT available:

– % agreement: % scores that 2 scorers agree upon

– kappa: % agreement corrected for random agreement

Sensitivity, specificity

Example

EXAMINER | GOLD STANDARD 0 | GOLD STANDARD 1 | Total
0 | 37 | 13 | 50
1 | 3 | 27 | 30
Total | 40 | 40 | 80

$$\text{Sens} = \frac{27}{40} = 0.675 \qquad \text{Spec} = \frac{37}{40} = 0.925$$

Sensitivity 67.5%

Specificity 92.5%
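The calculation in this example can be written as a short function. This is a minimal sketch that assumes the table layout used above (rows are the examiner's score, columns are the gold-standard score); the function name `sens_spec` is only illustrative.

```python
def sens_spec(table):
    """Sensitivity and specificity from a 2x2 table.

    table[i][j] = number of subjects with examiner score i
    and gold-standard score j (0 = healthy, 1 = diseased).
    """
    (true_neg, false_neg), (false_pos, true_pos) = table
    sensitivity = true_pos / (true_pos + false_neg)   # among gold standard = 1
    specificity = true_neg / (true_neg + false_pos)   # among gold standard = 0
    return sensitivity, specificity


table = [[37, 13],
         [3, 27]]
sens, spec = sens_spec(table)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```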

Example

• Smile for Life study

• Illustration of:

– Scoring behavior depends on level of measurement

– Scoring behavior can drastically depend on type of measurement

[Figure: Caries at d3 level. Sensitivity (d3mf) and specificity (d3mf) per examiner; vertical axis from 0 to 1. Smile for Life study]

[Figure: Caries at d1 versus d3 level. Specificity (d3mf) and specificity (d1mf) per examiner; vertical axis from 0 to 1. Smile for Life study]

[Figure: Plaque scoring. Sensitivity and specificity per examiner; vertical axis from 0 to 1. Smile for Life study]

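% Agreement, kappa

Example

Examiner 2 | Examiner 1: 0 | Examiner 1: 1 | Total
0 | 37 | 13 | 50
1 | 3 | 27 | 30
Total | 40 | 40 | 80

$$p_O = \frac{37 + 27}{40 + 40} = \frac{64}{80} = 0.80$$

$$p_E = \frac{50}{80}\cdot\frac{40}{80} + \frac{30}{80}\cdot\frac{40}{80} = 0.50$$

$$\kappa = \frac{p_O - p_E}{1 - p_E} = \frac{0.80 - 0.50}{1 - 0.50} = 0.6$$

% agreement = 80%

% random agreement = 50%

Kappa = 0.6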
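The same kappa calculation can be sketched in a few lines of code. The function name `cohen_kappa` is an illustrative choice; the table is the one from the example, with one examiner in the rows and the other in the columns.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return p_obs, p_exp, (p_obs - p_exp) / (1.0 - p_exp)


p_obs, p_exp, kappa = cohen_kappa([[37, 13],
                                   [3, 27]])
print(f"agreement = {p_obs:.2f}, random agreement = {p_exp:.2f}, kappa = {kappa:.2f}")
```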

Kappa

Intra- and inter-examiner kappa:

– Intra-examiner kappa: two scores of the same scorer

– Inter-examiner kappa: two scores of different scorers

Weighted kappa:

– Kappa for ordinal scores

– Difference in scoring is weighted
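As an illustration of how differences in scoring can be weighted, the sketch below computes a weighted kappa with linear weights, so that a disagreement of two categories counts more heavily than a disagreement of one. The choice of linear weights is an assumption made here (quadratic weights are another common option), and the 3×3 table is hypothetical.

```python
def weighted_kappa(table):
    """Weighted kappa for ordinal scores, using linear agreement weights.

    Weight for cell (i, j) is 1 - |i - j| / (k - 1): full credit on the
    diagonal, partial credit for near misses, none for the largest gap.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = 0.0
    p_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = 1.0 - abs(i - j) / (k - 1)
            p_obs += w * table[i][j] / n
            p_exp += w * row_tot[i] * col_tot[j] / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical ordinal scores (3 categories) of two examiners.
ordinal_table = [[20, 5, 1],
                 [4, 15, 6],
                 [1, 3, 25]]
print(f"weighted kappa = {weighted_kappa(ordinal_table):.2f}")
```

For a 2×2 table the linear weights reduce to plain agreement, so the weighted kappa equals the ordinary kappa.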

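Kappa

Problems with kappa

Five 2×2 tables, ranging from UNDERSCORING through VARIABILITY to OVERSCORING, all with kappa = 0.6 (cells listed row by row, for column scores 0 and 1):

Scoring pattern | Row 0 | Row 1 | Kappa
UNDERSCORING | 37, 13 | 3, 27 | 0.6
| 35, 11 | 5, 29 | 0.6
VARIABILITY | 32, 8 | 8, 32 | 0.6
| 29, 5 | 11, 35 | 0.6
OVERSCORING | 27, 3 | 13, 37 | 0.6

What is the message?

Impact on risk estimates?

Kappa versus sens, spec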
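The point of these five tables can be checked numerically. The sketch below assumes the same layout as in the sensitivity/specificity example (rows are the examiner's score, columns the reference score); under that assumption kappa is identical for all five tables while sensitivity and specificity move in opposite directions.

```python
def kappa_sens_spec(table):
    """Kappa, sensitivity and specificity for a 2x2 table.

    Rows: examiner score 0/1. Columns: reference score 0/1 (assumed layout).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    sens = d / (b + d)
    spec = a / (a + c)
    return kappa, sens, spec


tables = [
    [[37, 13], [3, 27]],   # underscoring
    [[35, 11], [5, 29]],
    [[32, 8], [8, 32]],    # variability only
    [[29, 5], [11, 35]],
    [[27, 3], [13, 37]],   # overscoring
]
results = [kappa_sens_spec(t) for t in tables]
for kappa, sens, spec in results:
    print(f"kappa = {kappa:.2f}  sens = {sens:.3f}  spec = {spec:.3f}")
```

A single kappa value therefore does not reveal whether an examiner underscores, overscores or is simply variable.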

[Figure: Kappa (vertical axis, 0.0 to 1.0) as a function of prevalence (horizontal axis, 0.0 to 1.0), with curves for Sens=90%, Sens=70% and Sens=50%; the positions of a 1st study and a 2nd study are marked]
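The dependence of kappa on prevalence can be reproduced with a short calculation. This sketch computes the kappa between an examiner and a gold standard from the prevalence, sensitivity and specificity. The three sensitivity values follow the figure legend; the specificity of 0.95 and the prevalence grid are assumptions made here for illustration.

```python
def kappa_vs_gold(prevalence, sens, spec):
    """Kappa between an examiner and a gold standard at a given prevalence."""
    # Probability that the examiner scores 'diseased'.
    p_pos = sens * prevalence + (1.0 - spec) * (1.0 - prevalence)
    # Observed agreement: true positives plus true negatives.
    p_obs = sens * prevalence + spec * (1.0 - prevalence)
    # Agreement expected by chance from the two marginal distributions.
    p_exp = prevalence * p_pos + (1.0 - prevalence) * (1.0 - p_pos)
    return (p_obs - p_exp) / (1.0 - p_exp)


spec = 0.95  # assumed value, not taken from the slides
for sens in (0.90, 0.70, 0.50):
    row = [kappa_vs_gold(prev, sens, spec) for prev in (0.05, 0.10, 0.25, 0.50)]
    print(f"sens = {sens:.2f}: " + "  ".join(f"{k:.2f}" for k in row))
```

The same examiner (same sensitivity and specificity) obtains a different kappa in two studies that differ only in prevalence, which makes kappa values hard to compare across studies.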

Example

• Signal Tandmobiel study

• Illustration of:

– Correcting for misclassification error using estimates of sensitivity and specificity

– That kappa values may not be helpful

E. Lesaffre, S.M. Mwalili, D. Declerck: Analysis of caries experience taking inter-observer bias and variability into account. Journal of Dental Research, 2004, 83 (12), 951-955.

Epidemiological risk model

• Logistic regression

• Purpose: Determine effect of exposure on disease = estimate $\beta_1$

$$\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)} = \beta_0 + \beta_1 x$$

Y = 1 disease (caries)

Y = 0 no disease (no caries)

x = 1 exposure (brushing < 1/day)

x = 0 no exposure (brushing > 1/day)

$\beta_1 = 0$: NO EFFECT

$\beta_1 > 0$: exposure increases risk for disease
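With a single binary exposure, the coefficients of this logistic model can be read directly from a 2×2 table: the intercept is the log odds of disease in the unexposed, and the slope is the log odds ratio. The sketch below uses hypothetical counts (40 of 100 exposed and 20 of 100 unexposed children with caries); the function and variable names are illustrative.

```python
import math


def logistic_coefficients(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Intercept and slope of a logistic model with one binary exposure."""
    p1 = cases_exposed / n_exposed        # P(Y = 1 | x = 1)
    p0 = cases_unexposed / n_unexposed    # P(Y = 1 | x = 0)
    beta0 = math.log(p0 / (1.0 - p0))             # log odds when x = 0
    beta1 = math.log(p1 / (1.0 - p1)) - beta0     # log odds ratio
    return beta0, beta1


# Hypothetical counts, for illustration only.
beta0, beta1 = logistic_coefficients(40, 100, 20, 100)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, odds ratio = {math.exp(beta1):.2f}")
```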

Correction for misclassification

• Solution: direct correction in epidemiological risk model

– Misclassification in response Y

– Validation set (calibration exercise) available

– $Y_E$ score of examiner & $Y_G$ score of gold standard

$$P(Y_E = 1 \mid x) = \alpha_0 + (1 - \alpha_0 - \alpha_1)\,P(Y_G = 1 \mid x)$$

– with

$$\alpha_0 = P(Y_E = 1 \mid Y_G = 0) = (1 - \text{spec}), \qquad \alpha_1 = P(Y_E = 0 \mid Y_G = 1) = (1 - \text{sens})$$

– estimate $\alpha_0$ and $\alpha_1$ using validation data

– estimate ($\beta_0$), $\beta_1$ using main data, but take into account that $\alpha_0$ and $\alpha_1$ are estimated
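The idea behind the correction can be sketched by inverting the relation between the examiner's and the gold standard's probability of scoring disease. Below, `alpha0` stands for 1 − specificity and `alpha1` for 1 − sensitivity (symbol names chosen here for illustration), taken from the earlier 2×2 example (sens 67.5%, spec 92.5%). The observed proportions 0.30 and 0.18 are hypothetical. This simplified version treats the misclassification probabilities as known, whereas a full analysis has to account for the fact that they are estimated.

```python
import math


def corrected_prob(p_obs, alpha0, alpha1):
    """Invert p_obs = alpha0 + (1 - alpha0 - alpha1) * p_true."""
    return (p_obs - alpha0) / (1.0 - alpha0 - alpha1)


def log_odds_ratio(p1, p0):
    """Log odds ratio comparing probability p1 with p0."""
    return math.log((p1 / (1.0 - p1)) / (p0 / (1.0 - p0)))


alpha0 = 1.0 - 0.925   # 1 - specificity, from the earlier example
alpha1 = 1.0 - 0.675   # 1 - sensitivity, from the earlier example

# Hypothetical observed proportions scored as caries by the examiner.
p_obs_exposed, p_obs_unexposed = 0.30, 0.18

beta1_naive = log_odds_ratio(p_obs_exposed, p_obs_unexposed)
beta1_corrected = log_odds_ratio(
    corrected_prob(p_obs_exposed, alpha0, alpha1),
    corrected_prob(p_obs_unexposed, alpha0, alpha1),
)
print(f"naive beta1     = {beta1_naive:.3f}")
print(f"corrected beta1 = {beta1_corrected:.3f}")
```

In this hypothetical case the corrected coefficient is larger than the naive one, which is the attenuation effect in reverse.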

Caries experience in Flanders

• From first-year results (7-year-old children)

[Figure: map of Flanders by province (West Flanders, East Flanders, Brabant, Antwerp, Limburg), shaded from low caries to high caries; key classes [0.49,1.10], (1.10,1.40], (1.40,2.25]]

East-west gradient in caries experience

But is it significant?

Caries experience in Flanders: epidemiological model

• Ordinal logistic (random effects) model

– Ordinal: dmft-score is split up in 4 classes (0,1,2,3)

– Extension of previous logistic regression model

– Random effects: assume each school has its own level

$$\log\frac{P(Y \ge j \mid x)}{P(Y < j \mid x)} = \theta_j + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{gender} + \beta_3\,\mathrm{xcor} + \beta_4\,\mathrm{ycor} + b_{0k}$$

xcor = x-coordinate of municipality in Flanders

ycor = y-coordinate of municipality in Flanders

$\theta_j$ = intercept corresponding to jth level (j=1,2,3)

$b_{0k}$ = (random) effect of school k (has a distribution which must be estimated)

Caries experience in Flanders: epidemiological model

• Result from statistical analysis:

$\hat\beta_3$ (xcor) = 0.198 (P < 0.0001)

$\hat\beta_4$ (ycor) = 0.017 (P = 0.35)

• Conclusion:

Significant east-west gradient in caries experience

• But: Due to different scoring of examiners?

Caries experience in Flanders: calibration exercises

• 4 calibration exercises

– 16 dental examiners

– 1 gold standard

– Weighted kappa values versus “gold standard”: ranging from 0.65 to 1.00

• Effect on estimated regression coefficients?

Caries experience in Flanders: calibration exercises

• Dental examiners active in restricted geographical areas

[Figure: map of Flanders by province (West Flanders, East Flanders, Brabant, Antwerp, Limburg), shaded from underscoring to overscoring; key classes (-15%,-5%], (-5%,5%], (5%,18%)]

Is east-west gradient created by misclassification by examiners?

Caries experience in Flanders: correction for misclassification by examiners

• Correction by estimating correction factors from calibration exercises

• Correction terms depend on examiner

• Correction analysis takes uncertainty of estimation of correction terms into account

$\hat\beta_3$ (xcor) = 0.225 (P < 0.0001)

$\hat\beta_4$ (ycor) = 0.023 (P = 0.32)

East-west gradient is NOT created by misclassification by examiners!

Caries experience in Flanders: Some conclusions

• Reporting kappa-values does NOT give much insight on the actual impact of the random and systematic error in scoring disease and/or exposure.

• Correction should be done using correction terms (sensitivity, specificity) estimated from a validation data set, taking into account that the correction terms are estimated.

• But, are the sensitivity and specificity obtained from calibration exercises good measures of misclassification behavior in practice?

• Good news: for kappa values above 0.90, results are relatively stable (but there is always a loss in efficiency)

Quality of sensitivity & specificity from calibration exercises

• Validation data from calibration exercises do not represent scoring behavior of examiners in practice.

• Experiment was set up in Smile for Life study to compare:

• Sensitivity and specificity of examiners versus gold standard (benchmark scorer) in calibration exercise and in field conditions

J. Agbaje, T. Mutsvari, E. Lesaffre, D. Declerck: Examiner performance in calibration exercises compared with field conditions when scoring caries experience. Clinical Oral Investigations, 2012, 16, 481-488

• Results:

Examiner | Calibration exercise: Sens (%) | Calibration exercise: Spec (%) | Field conditions: Sens (%) | Field conditions: Spec (%)
EX01 | 70.00 | 99.67 | 50.00 | 99.72
EX02 | 68.33 | 99.71 | - | 100.00
EX03 | 85.00 | 99.00 | 41.67 | 99.64
EX04 | 72.73 | 99.49 | 42.86 | 99.47
EX05 | 86.67 | 98.94 | 60.00 | 99.02
EX06 | 76.67 | 99.12 | 53.85 | 97.86
EX07 | 83.33 | 99.23 | 41.94 | 98.46
EX08 | 53.33 | 99.60 | 70.59 | 99.16
Average | 74.52 | 99.34 | 51.56 | 99.17

Additional comments

• The sensitivity, specificity, kappa values from calibration exercises are probably too good compared to what is obtained in practice. Some additional checking in field conditions is therefore useful.

• Validation data should ideally be obtained from a random sample of the main study (double sampling).

• Here the focus was on misclassification of the response in epidemiological models. But correction for misclassification can also be applied when misclassification occurs on risk factors.

• Correction for misclassification can be done also in more complicated models such as multilevel models, longitudinal studies, etc.

• For caries experience measured in a longitudinal study, correction for misclassification can be done without a validation data set (but only in a longitudinal study).

Additional literature

E. Lesaffre, J. Feine, B. Leroux, D. Declerck (editors): Statistical and Methodological Aspects of Oral Health Research. Wiley-Blackwell, 2009 (ISBN 13: 9780470517925)

T. Mutsvari, D. Bandyopadhyay, D. Declerck and E. Lesaffre: A multilevel model for spatially correlated binary data in the presence of misclassification: An application in oral health research. Statistics in Medicine, 2013 Aug 29. doi: 10.1002/sim.5944

T. Mutsvari, D. Declerck and E. Lesaffre: Correction for misclassification of caries experience in the absence of internal validation data. Clinical Oral Investigations, 2013 Nov;17(8):1799-805

M.J. García-Zattera, A. Jara, E. Lesaffre and G. Marshall: Modelling of multivariate monotone disease processes in the presence of misclassification. JASA - Applications and Case Studies, 2012, 107, 976-989

T. Mutsvari, M.J. García-Zattera, D. Declerck and E. Lesaffre: Dealing with misclassification and missing data when estimating prevalence and incidence of caries experience. Community Dentistry and Oral Epidemiology, 2012, 40, S1, 28-35

M.J. García-Zattera, T. Mutsvari, A. Jara, D. Declerck, E. Lesaffre: Correcting for misclassification for a monotone disease process with an application in dental research. Statistics in Medicine, 2010, 29, 3103–3117

S. Mwalili, E. Lesaffre, D. Declerck: The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Statistical Methods in Medical Research, 2008, 17, 123–139