How to extract more (and better quality) information from your study?
‘simple questions’ ≠ ‘simple analyses’
University of Oulu
November 13, 2014
Emmanuel Lesaffre (& Dominique Declerck)
Part II: Misclassification
1
Content of this workshop
• Multi-site and split-mouth studies
• Misclassification issues
• Time-to-event studies
• Missing data
2
Assessment of outcomes (e.g. disease levels) in oral health surveys: is there a problem?
Example from Caries Research
3
Examiner misclassification
• Outcome assessment is usually based on visual screening, sometimes complemented with tactile examination (dental probe)
• Substantial variation in scoring between EXAMINERS may lead to considerable misclassification (Ismail, 2004; Assaf et al., 2004 & 2006)
• Problematic for:
– Comparability of results
– Repeatability
Need for standardisation!
4
• Comparability of results:
– between different surveys, e.g. comparison of caries experience (CE) scores between different countries or regions
5
• Comparability of results:
– when multiple examiners are involved, e.g. collection of country-wide CE data, or collection of data in different dental clinics or dental practices
6
[Figure: map of the Flemish provinces (West Flanders, East Flanders, Brabant, Antwerp, Limburg), shaded by key class: [0.49,1.10], (1.10,1.40], (1.40,2.25]]
• Repeatability:
– e.g. in longitudinal surveys with repeated measurements in the same group over time, ...
7
Why so difficult?
d1 d2 d3 d4 d2
REVERSAL possible!
8
Standardisation
For this purpose, guidelines were developed by different research groups, with the following aims:
• To provide a systematic approach to the collection and reporting of data on caries experience
• To ensure that data collected in a wide range of environments are valid and comparable
http://www.who.int/en/ (World Health Organization, 1971)
http://www.bascd.org/ (Pitts et al., 1997)
http://www.dundee.ac.uk/dhsru/ (Pitts, 2004)
9
WHO guidelines, 5th edition
Oral Health Surveys: Basic Methods. World Health Organization, 2013. English, 132 pages. ISBN-13 9789241548649, ISBN-10 9241548649.
10
Is there a problem?
Review of methodological aspects related to caries experience assessment in reports of epidemiological surveys published between January 2000 and December 2008 (n=89), more specifically:
Reporting of methodological aspects
Application of standardized methodology
J. Agbaje, D. Declerck and E. Lesaffre: Assessment of caries experience in epidemiological surveys: a review. Community Dental Health, 2012, 29, 14-19
11
Frequency of REPORTING of caries experience assessment methodology
(REPORTED = explicitly mentioned, or assumed to be applied*)
ITEM                               EXPLICITLY MENTIONED   ASSUMED TO BE APPLIED*   NOT REPORTED
Use of standardization criteria    80 (89.9%)             NA                       9 (10.1%)
Materials and setting
  Use of probe                     60 (67.4%)             82 (92.1%)               7 (7.9%)
  Type of probe                    51 (57.3%)             75 (84.3%)               14 (15.7%)
  Light condition                  60 (67.4%)             82 (92.1%)               7 (7.9%)
  Use of radiographs               31 (34.8%)             80 (89.9%)               9 (10.1%)
  Cleaning / debris removal        28 (31.5%)             77 (86.5%)               12 (13.5%)
Detection threshold applied        42 (47.2%)             84 (94.4%)               5 (5.6%)
Examiner characteristics
  Training                         57 (64.0%)             58 (65.2%)               31 (34.8%)
  Calibration                      61 (68.5%)             0                        28 (31.5%)
  Reliability assessed             47 (52.8%)             48 (53.9%)               41 (46.1%)
  Reliability reported             41 (46.1%)             NA                       48 (53.9%)
* The column “assumed to be applied” contains the sum of surveys where information was explicitly mentioned and those where reference was made to standardisation criteria containing information on the item of interest. NA = not applicable
Journals with Impact Factor performed better
12
ITEM CONSIDERED                         N (%)
Use of probe                            2 (3.8%)
Type of probe                           27 (51.9%)
Light condition                         16 (30.8%)
Cleaning                                5 (9.6%)
Use of radiographs                      1 (1.9%)
Detection threshold                     2 (3.8%)
Measurement of reliability              24 (46.2%)
Reporting of reliability measurement    29 (55.8%)
Consistency in APPLICATION of caries experience assessment methodology
Deviations from the original recommendations were often present
NA = not applicable
Table: Percentage of reports applying WHO Basic Methods for Oral Health Surveys NOT adhering to the guidelines (52 reports included)
13
Introducing .... the Smile for Life study
14
A multi-component oral health intervention in young children in Flanders (Belgium)
Project design
Intervention group 1080 children
Control group 1057 children
Standard care
• Medical check-up
• Education on feeding, child rearing, sleeping, safety, …
• 2 specific education topics on oral health
Communication tools
• Placemat
• Toothbrush
• Cup
• Booklet
Extended care program on oral health promotion
• Education (55 topics)
• Dietary advice
• Pacifier use
• Toothbrushing
• Dental visit
• …
+
Delivered by nurses and physicians of Child & Family
Evaluation: at age 3 and 5 years
15
[Figure: map of Flanders (scale bar: 120 km) showing municipalities in the control region and in the intervention region. Municipalities labelled on the map: Genk, Hasselt, Lummen, Tienen, Leuven, Haacht, Kraainem, Brussel, Tielt-Winge, Halle, Gooik (Leerbeek), Merchtem, Puurs, Berlaar, Zandhoven, Herentals (Olen), Kasterlee, Turnhout (Merksplas), Brecht (Wuustwezel), Overpelt, Peer, Borgloon, Tongeren (Hoeselt), Zottegem, Oudenaarde, Gent, Eeklo, Aalst, Dendermonde, St. Niklaas, Beveren (Vrassene), Beernem (Oostkamp), Tielt, Diksmuide, Ieper, Brugge, Lokeren, Wetteren, Deinze, Ninove, Vilvoorde (Strombeek-Bever), Mortsel (Edegem), Brasschaat, Mechelen (St. Katelijne Waver), Oostende, Roeselare (Rumbeke), Kortrijk, Kortrijk-rand (Gulligem), Dilsen-Stokkem, Waregem, Antwerpen, Geel (Meerhout)]
16
ORAL EXAMINATIONS
• child seated on ordinary chair
• mouth mirror with built-in light source (Mirrolite® by Defend® from Medident, Belgium)
• WHO/CPITN type E probe (ball-ended probe) (Prima Dental Instruments, Gloucester, UK)
• no cleaning (cotton rolls)
• no radiographs
17
ORAL EXAMINATIONS
• at school, exact date not announced beforehand
• trained dentist-examiners (n=8) + nurse
• training consisted of explanation of overall set-up, organisational aspects, illustration of clinical variables
• calibration exercises were organised: slides, examination of subjects
Despite training, considerable variation remains....
18
Introducing … the Signal Tandmobiel® study
19
Longitudinal oral health survey in school-aged children (primary school, age 6–12 years)
Focus on CARIES, but also EMERGENCE TIMES of permanent teeth, ORAL HYGIENE, GINGIVAL HEALTH, FLUOROSIS… and QUESTIONNAIRE data
Evaluation of oral health promotion INTERVENTION
• N= 4468 children (2153 girls)
• Sample representative for Flemish children born in 1989 (7.3%)
• Stratified cluster random sampling of schools (15 strata = 5 provinces x 3 education systems)
• Children +/- equal probability of being sampled
• Annual examinations for 6 years (primary school)
20
Data structure
PROVINCES
EDUCATIONAL SYSTEM
SCHOOLS
CLASS
22
Data structure
Levels: CHILD, TOOTH, SURFACE, SITE
• Caries experience (BASCD – Pine et al., 1997)
• Gingival health (SBI – Mühlemann & Son, 1971)
• Oral hygiene (PI – Silness & Löe, 1964)
• Occlusal plaque accumulation (adapted from Carvalho et al., 1989)
• Clinical eruption stage of permanent teeth (Carvalho et al., 1989)
• Fluorosis (Thylstrup & Fejerskov, 1978)
• Oral hygiene habits
• (Systemic) fluoride supplements
• Dietary habits
• Socio-demographic data
• …
Calibration was undertaken for each of the clinical variables, on several occasions
23
25
Misclassification terminology
• Misclassification = measurement error on categorical measurements, often binary
• Gold standard = perfect scorer
• Benchmark scorer = reference scorer who is likely not perfect
• Main data = data collected for epidemiologic research
• Validation data = data on smaller group of subjects to evaluate scoring behavior of scorers
Two types of misclassification mechanisms:
– Non-differential: misclassification does not depend on other factors
– Differential: misclassification depends on other factors, related to subject or scorer
26
Effect of misclassification
• Distorted estimates of
– Prevalence and incidence
– Impact of risk factors, often attenuation of effect of risk factors on disease outcome
– Even for large studies
• Drop in statistical efficiency
– SD of the measurements increases
– Power to detect significant difference decreases
– Sample size needs to be increased!
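As a rough illustration of how misclassification distorts a prevalence estimate, the sketch below computes the prevalence an imperfect examiner would be expected to report, and then inverts that relation (the Rogan-Gladen estimator, named here as a standard technique, not as the method used in this workshop). The prevalence, sensitivity and specificity values are hypothetical.

```python
def apparent_prevalence(true_prev, sens, spec):
    """Expected proportion scored as diseased by an imperfect examiner."""
    return true_prev * sens + (1.0 - true_prev) * (1.0 - spec)


def corrected_prevalence(observed_prev, sens, spec):
    """Rogan-Gladen estimator: inverts the relation above (needs sens + spec > 1)."""
    return (observed_prev + spec - 1.0) / (sens + spec - 1.0)


# Hypothetical values, chosen only for illustration.
true_prev, sens, spec = 0.30, 0.70, 0.95

observed = apparent_prevalence(true_prev, sens, spec)
recovered = corrected_prevalence(observed, sens, spec)

print(f"true prevalence:      {true_prev:.3f}")
print(f"apparent prevalence:  {observed:.3f}")
print(f"corrected prevalence: {recovered:.3f}")
```

With these values the examiner under-reports the prevalence, and no increase in sample size removes that bias: it only disappears once sensitivity and specificity are taken into account.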
27
Measuring misclassification
How is scoring evaluated?
• Gold standard is available:
– Sensitivity: % diseased subjects scored as diseased
– Specificity: % healthy subjects scored as healthy
• Gold standard is NOT available:
– % agreement: % scores that 2 scorers agree upon
– kappa: % agreement corrected for random agreement
28
Sensitivity, specificity
Example
                GOLD STANDARD
EXAMINER        0       1       Total
0               37      13      50
1               3       27      30
Total           40      40      80

$$\text{Sens} = \frac{27}{40} = 0.675, \qquad \text{Spec} = \frac{37}{40} = 0.925$$
Sensitivity 67.5%
Specificity 92.5%
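The calculation in this example can be sketched in a few lines. The layout is an assumption taken from the table: rows are the examiner's score, columns the gold standard's score; the function name is illustrative only.

```python
def sensitivity_specificity(table):
    """table[i][j] = number of subjects scored i by the examiner
    and j by the gold standard (0 = healthy, 1 = diseased)."""
    true_neg, false_neg = table[0]   # examiner scored 0
    false_pos, true_pos = table[1]   # examiner scored 1
    sens = true_pos / (true_pos + false_neg)   # diseased scored as diseased
    spec = true_neg / (true_neg + false_pos)   # healthy scored as healthy
    return sens, spec


example = [[37, 13],
           [3, 27]]
sens, spec = sensitivity_specificity(example)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```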
29
Example
• Smile for Life study
• Illustration of:
– Scoring behavior depends on level of measurement
– Scoring behavior can drastically depend on type of measurement
30
[Figure: sensitivity (d3mf) and specificity (d3mf) per examiner (x-axis: examiner, 0 to 8; y-axis: 0 to 1). Caries at d3 level. Smile for Life study]
31
[Figure: specificity (d3mf) and specificity (d1mf) per examiner (x-axis: examiner, 0 to 8; y-axis: 0 to 1). Caries at d1 versus d3 level. Smile for Life study]
32
[Figure: sensitivity and specificity per examiner (x-axis: examiner, 0 to 8; y-axis: 0 to 1). Plaque scoring. Smile for Life study]
33
% Agreement, kappa
Example
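                Examiner 1
Examiner 2      0       1       Total
0               37      13      50
1               3       27      30
Total           40      40      80

$$p_O = \frac{37 + 27}{40 + 40} = \frac{64}{80} = 0.80$$
$$p_E = \frac{50}{80}\cdot\frac{40}{80} + \frac{30}{80}\cdot\frac{40}{80} = 0.50$$
$$\kappa = \frac{p_O - p_E}{1 - p_E} = \frac{0.80 - 0.50}{1 - 0.50} = 0.6$$
% agreement = 80%
% random agreement = 50%
Kappa = 0.6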
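The same three quantities can be computed directly from the 2×2 table. This is a minimal sketch of Cohen's kappa; the function name and return layout are illustrative choices, not part of the workshop material.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square
    table of counts, table[i][j] = scorer A gave i and scorer B gave j."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected if the two scorers scored independently.
    p_exp = sum(r * c for r, c in zip(row_share, col_share))
    return p_obs, p_exp, (p_obs - p_exp) / (1.0 - p_exp)


p_obs, p_exp, kappa = cohen_kappa([[37, 13], [3, 27]])
print(f"% agreement = {p_obs:.2f}, % random agreement = {p_exp:.2f}, kappa = {kappa:.2f}")
```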
34
Kappa
Evaluation of kappa
35
Kappa
Intra- and inter-examiner kappa:
– Intra-examiner kappa: two scores of the same scorer
– Inter-examiner kappa: two scores of different scorers
Weighted kappa:
– Kappa for ordinal scores
– Difference in scoring is weighted
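For ordinal scores, a weighted kappa penalises a disagreement more the further apart the two scores are. The sketch below uses the common linear (`power=1`) or quadratic (`power=2`) disagreement weights; the 3×3 table is hypothetical and only serves to exercise the function.

```python
def weighted_kappa(table, power=1):
    """Weighted kappa for a k x k table of counts with ordinal categories.
    Disagreement weight for cell (i, j) is (|i - j| / (k - 1)) ** power."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    observed = 0.0   # weighted observed disagreement
    expected = 0.0   # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed += weight * table[i][j] / n
            expected += weight * row_share[i] * col_share[j]
    return 1.0 - observed / expected


# Hypothetical ordinal scores (3 categories) from two examiners.
hypothetical = [[20, 5, 1],
                [4, 15, 6],
                [1, 4, 24]]
print(f"linear weights:    {weighted_kappa(hypothetical, power=1):.3f}")
print(f"quadratic weights: {weighted_kappa(hypothetical, power=2):.3f}")
```

For a 2×2 table every disagreement has weight 1, so the weighted and unweighted kappa coincide.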
36
Kappa
Problems with kappa
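Five tables, all with kappa = 0.6:
Table 1 (UNDERSCORING), kappa = 0.6
    0    1
0   37   13
1   3    27
Table 2, kappa = 0.6
    0    1
0   35   11
1   5    29
Table 3 (VARIABILITY), kappa = 0.6
    0    1
0   32   8
1   8    32
Table 4, kappa = 0.6
    0    1
0   29   5
1   11   35
Table 5 (OVERSCORING), kappa = 0.6
    0    1
0   27   3
1   13   37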
What is the message?
Impact on risk estimates?
Kappa versus sens, spec
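One way to see the message is to compute kappa, sensitivity and specificity for the five tables on this slide. The sketch assumes the same layout as the earlier example (rows: examiner, columns: reference scorer); that layout is not stated on the slide itself.

```python
def kappa(table):
    """Cohen's kappa for a 2 x 2 table of counts."""
    n = sum(sum(row) for row in table)
    p_obs = (table[0][0] + table[1][1]) / n
    row1 = (table[1][0] + table[1][1]) / n   # share scored 1 by row scorer
    col1 = (table[0][1] + table[1][1]) / n   # share scored 1 by column scorer
    p_exp = row1 * col1 + (1 - row1) * (1 - col1)
    return (p_obs - p_exp) / (1 - p_exp)


def sens_spec(table):
    """Sensitivity and specificity, taking the column scorer as reference."""
    sens = table[1][1] / (table[0][1] + table[1][1])
    spec = table[0][0] / (table[0][0] + table[1][0])
    return sens, spec


tables = [
    [[37, 13], [3, 27]],
    [[35, 11], [5, 29]],
    [[32, 8], [8, 32]],
    [[29, 5], [11, 35]],
    [[27, 3], [13, 37]],
]

results = [(kappa(t), *sens_spec(t)) for t in tables]
for number, (k, sens, spec) in enumerate(results, start=1):
    print(f"table {number}: kappa = {k:.2f}, sens = {sens:.3f}, spec = {spec:.3f}")
```

Under that assumption, kappa stays at 0.6 while sensitivity and specificity change from table to table, so kappa alone cannot distinguish underscoring from overscoring.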
37
[Figure: kappa (y-axis, 0.0 to 1.0) as a function of prevalence (x-axis, 0.0 to 1.0), with curves for Sens=90%, Sens=70% and Sens=50%; a 1st study and a 2nd study are marked]
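The dependence of kappa on prevalence can be sketched numerically: for an examiner with fixed sensitivity and specificity, the expected kappa against the gold standard changes when only the prevalence changes. The specificity of 0.95 and the prevalence grid are assumptions for illustration; the values behind the figure are not recoverable from the slide.

```python
def kappa_vs_gold(prev, sens, spec):
    """Expected kappa between an examiner and a gold standard, given the
    true prevalence and the examiner's sensitivity and specificity."""
    p_obs = prev * sens + (1 - prev) * spec                 # both agree
    examiner_pos = prev * sens + (1 - prev) * (1 - spec)    # examiner scores 1
    p_exp = examiner_pos * prev + (1 - examiner_pos) * (1 - prev)
    return (p_obs - p_exp) / (1 - p_exp)


SPEC = 0.95  # assumed value, for illustration only
prevalences = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]

for sens in (0.90, 0.70, 0.50):
    row = ", ".join(f"{kappa_vs_gold(p, sens, SPEC):.2f}" for p in prevalences)
    print(f"Sens={sens:.0%}: {row}")
```

Two studies with identical examiner behaviour can therefore report different kappa values simply because the disease prevalence differs.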
38
Example
• Signal Tandmobiel study
• Illustration of:
– Correcting for misclassification error using estimates of sensitivity and specificity
– That kappa values may not be helpful
E. Lesaffre, S.M. Mwalili, D. Declerck: Analysis of caries experience taking inter-observer bias and variability into account. Journal of Dental Research, 2004, 83 (12), 951-955.
Epidemiological risk model
• Logistic regression
• Purpose: determine the effect of exposure on disease = estimate $\beta_1$
$$\log \frac{P(Y = 1 \mid x)}{P(Y = 0 \mid x)} = \beta_0 + \beta_1 x$$
Y = 1 disease (caries)
Y = 0 no disease (no caries)
x = 1 exposure (brushing < 1/day)
x = 0 no exposure (brushing > 1/day)
$\beta_1 = 0$: NO EFFECT
$\beta_1 > 0$: exposure increases risk for disease
39
Correction for misclassification
• Solution: direct correction in epidemiological risk model
– Misclassification in response Y
– Validation set (calibration exercise) available
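– $Y_E$ = score of examiner & $Y_G$ = score of gold standard
– $P(Y_E = 1 \mid x) = \alpha_0 + (1 - \alpha_0 - \alpha_1)\, P(Y_G = 1 \mid x)$, with
$\alpha_0 = P(Y_E = 1 \mid Y_G = 0)$ (1 − spec)
$\alpha_1 = P(Y_E = 0 \mid Y_G = 1)$ (1 − sens)
– estimate $\alpha_0$ and $\alpha_1$ using validation data
– estimate ($\beta_0$), $\beta_1$ using main data, but take into account that $\alpha_0$ and $\alpha_1$ are estimated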
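A small numerical sketch of this correction for a binary exposure: with false-positive probability `alpha0` = 1 − spec and false-negative probability `alpha1` = 1 − sens, the examiner's disease probability is `alpha0 + (1 - alpha0 - alpha1)` times the true one. All numbers below are hypothetical, population probabilities are used instead of sampled data, and the misclassification probabilities are treated as known, so the uncertainty in estimating them (which the slide stresses) is deliberately ignored here.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def observed_prob(p_true, alpha0, alpha1):
    """P(examiner scores 1) given the true disease probability."""
    return alpha0 + (1.0 - alpha0 - alpha1) * p_true


def corrected_prob(p_obs, alpha0, alpha1):
    """Invert the relation above to recover the true disease probability."""
    return (p_obs - alpha0) / (1.0 - alpha0 - alpha1)


# Hypothetical true model and examiner behaviour.
beta0, beta1 = -1.0, 1.0
sens, spec = 0.70, 0.95
alpha0, alpha1 = 1.0 - spec, 1.0 - sens

p_true = {x: expit(beta0 + beta1 * x) for x in (0, 1)}
p_obs = {x: observed_prob(p_true[x], alpha0, alpha1) for x in (0, 1)}
p_corr = {x: corrected_prob(p_obs[x], alpha0, alpha1) for x in (0, 1)}

# With a binary exposure, beta1 is the difference of the two log-odds.
naive_beta1 = logit(p_obs[1]) - logit(p_obs[0])
corrected_beta1 = logit(p_corr[1]) - logit(p_corr[0])

print(f"true beta1      = {beta1:.3f}")
print(f"naive beta1     = {naive_beta1:.3f}")
print(f"corrected beta1 = {corrected_beta1:.3f}")
```

The naive estimate is attenuated towards zero, while the corrected one recovers the true effect. In a real analysis the correction terms come from validation data and their sampling variability has to be propagated into the standard error of the exposure effect.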
40
Caries experience in Flanders
• From first-year results (7-year-old children)
[Figure: map of the Flemish provinces (West Flanders, East Flanders, Brabant, Antwerp, Limburg), shaded by key class [0.49,1.10], (1.10,1.40], (1.40,2.25], with low-caries and high-caries areas labelled]
East-west gradient in caries experience
But significant?
41
Caries experience in Flanders: epidemiological model
• Ordinal logistic (random effects) model
– Ordinal: dmft-score is split up in 4 classes (0,1,2,3)
– Extension of previous logistic regression model
– Random effects: assume each school has its own level
$$\log \frac{P(Y \ge j \mid x)}{P(Y < j \mid x)} = \theta_j + \beta_1\,\text{age} + \beta_2\,\text{gender} + \beta_3\,\text{xcor} + \beta_4\,\text{ycor} + b_{0k}$$
xcor = x-coordinate of municipality in Flanders
ycor = y-coordinate of municipality in Flanders
$\theta_j$ = intercept corresponding to jth level (j=1,2,3)
$b_{0k}$ = (random) effect of school k (has a distribution which must be estimated)
42
Caries experience in Flanders: epidemiological model
• Result from statistical analysis:
– xcor: $\hat{\beta}_3 = 0.198$ (P < 0.0001)
– ycor: $\hat{\beta}_4 = 0.017$ (P = 0.35)
• Conclusion:
Significant east-west gradient in caries experience
• But: Due to different scoring of examiners?
43
Caries experience in Flanders: calibration exercises
• 4 calibration exercises
– 16 dental examiners
– 1 gold standard
– Weighted kappa values versus “gold standard”: 0.65 => 1.00
• Effect on estimated regression coefficients??
44
Caries experience in Flanders: calibration exercises
• Dental examiners active in restricted geographical areas
[Figure: map of the Flemish provinces (West Flanders, East Flanders, Brabant, Antwerp, Limburg), shaded by key class (-15%,-5%], (-5%,5%], (5%,18%), with areas of underscoring and overscoring labelled]
Is east-west gradient created by misclassification by examiners?
45
Caries experience in Flanders: correction for misclassification by examiners
• Correction by estimating correction factors from calibration exercises
• Correction terms depend on examiner
• Correction analysis takes uncertainty of estimation of correction terms into account
– xcor: $\hat{\beta}_3 = 0.225$ (P < 0.0001)
– ycor: $\hat{\beta}_4 = 0.023$ (P = 0.32)
East-west gradient is NOT created by misclassification by examiners!
46
Caries experience in Flanders: Some conclusions
• Reporting kappa-values does NOT give much insight on the actual impact of the random and systematic error in scoring disease and/or exposure.
• Correction should be done using correction terms (sensitivity, specificity) estimated from a validation data set, taking into account that the correction terms are estimated.
• But, are the sensitivity and specificity obtained from calibration exercises good measures of misclassification behavior in practice?
• Good news: for kappa values above 0.90, results are relatively stable (but there is always a loss in efficiency)
47
• Validation data from calibration exercises do not represent scoring behavior of examiners in practice.
• Experiment was set up in Smile for Life study to compare:
• Sensitivity and specificity of examiners versus the gold standard (benchmark scorer), in a calibration exercise and in field conditions
48
Quality of sensitivity & specificity from calibration exercises
J. Agbaje, T. Mutsvari, E. Lesaffre, D. Declerck: Examiner performance in calibration exercises compared with field conditions when scoring caries experience. Clinical Oral Investigations, 2012, 16, 481-488
Quality of sensitivity & specificity from calibration exercises
• Results:
            Calibration exercise      Field conditions
Examiner    Sens (%)    Spec (%)      Sens (%)    Spec (%)
EX01        70.00       99.67         50.00       99.72
EX02        68.33       99.71         -           100.00
EX03        85.00       99.00         41.67       99.64
EX04        72.73       99.49         42.86       99.47
EX05        86.67       98.94         60.00       99.02
EX06        76.67       99.12         53.85       97.86
EX07        83.33       99.23         41.94       98.46
EX08        53.33       99.60         70.59       99.16
Average     74.52       99.34         51.56       99.17
49
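The averages in the table can be re-derived from the per-examiner rows. This sketch uses the rounded values printed on the slide, so the results can differ from the reported averages in the last digit; EX02 has no field sensitivity ("-") and is left out of that average.

```python
# (calibration sens, calibration spec, field sens, field spec), in %.
# None marks the value shown as "-" on the slide.
results = {
    "EX01": (70.00, 99.67, 50.00, 99.72),
    "EX02": (68.33, 99.71, None, 100.00),
    "EX03": (85.00, 99.00, 41.67, 99.64),
    "EX04": (72.73, 99.49, 42.86, 99.47),
    "EX05": (86.67, 98.94, 60.00, 99.02),
    "EX06": (76.67, 99.12, 53.85, 97.86),
    "EX07": (83.33, 99.23, 41.94, 98.46),
    "EX08": (53.33, 99.60, 70.59, 99.16),
}


def column_mean(column):
    """Mean of one column, skipping missing values."""
    values = [row[column] for row in results.values() if row[column] is not None]
    return sum(values) / len(values)


calib_sens, calib_spec = column_mean(0), column_mean(1)
field_sens, field_spec = column_mean(2), column_mean(3)

# Examiners whose sensitivity was lower in the field than in calibration.
lower_in_field = [
    name for name, row in results.items()
    if row[2] is not None and row[2] < row[0]
]

print(f"mean sensitivity: calibration {calib_sens:.2f}%, field {field_sens:.2f}%")
print(f"mean specificity: calibration {calib_spec:.2f}%, field {field_spec:.2f}%")
print(f"lower sensitivity in the field: {len(lower_in_field)} of 7 examiners with data")
```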
Additional comments
• The sensitivity, specificity, kappa values from calibration exercises are probably too good compared to what is obtained in practice. Some additional checking in field conditions is therefore useful.
• Validation data should ideally be obtained from a random sample of the main study (double sampling).
• Here the focus was on misclassification of the response in epidemiological models, but correction for misclassification can also be applied when misclassification occurs on risk factors.
• Correction for misclassification can be done also in more complicated models such as multilevel models, longitudinal studies, etc.
• For caries experience measured in a longitudinal study, correction for misclassification can be done without a validation data set (but only in a longitudinal study).
50
Example from literature
51
Example from literature
• Did the authors consider possible misclassification?
• What did they undertake in order to limit misclassification?
• Did they report this appropriately in their paper?
• What could be an alternative approach?
54
Further clinical reflections
55
Misclassification
• Misclassification can have different causes but always impacts upon research results
• It is important to explore the nature of misclassification
• The extent and type of misclassification should be assessed
• Information on misclassification can be used in statistical analysis
56
Additional literature
E. Lesaffre, J. Feine, B. Leroux, D. Declerck (editors): Statistical and Methodological Aspects of Oral Health Research. Wiley-Blackwell, 2009 (ISBN 13: 9780470517925)
T. Mutsvari, D. Bandyopadhyay, D. Declerck and E. Lesaffre: A multilevel model for spatially correlated binary data in the presence of misclassification: An application in oral health research. Statistics in Medicine, 2013 Aug 29. doi: 10.1002/sim.5944
T. Mutsvari, D. Declerck and E. Lesaffre: Correction for misclassification of caries experience in the absence of internal validation data. Clinical Oral Investigations, 2013 Nov;17(8):1799-805
M.J. García-Zattera, A. Jara, E. Lesaffre and G. Marshall: Modelling of multivariate monotone disease processes in the presence of misclassification. JASA - Applications and Case Studies, 2012, 107, 976-989
T. Mutsvari, M.J. García-Zattera, D. Declerck and E. Lesaffre: Dealing with misclassification and missing data when estimating prevalence and incidence of caries experience. Community Dentistry and Oral Epidemiology, 2012, 40, S1, 28-35
57
Additional literature
M.J. García-Zattera, T. Mutsvari, A. Jara, D. Declerck, E. Lesaffre: Correcting for misclassification for a monotone disease process with an application in dental research. Statistics in Medicine, 2010, 29, 3103–3117
S. Mwalili, E. Lesaffre, D. Declerck: The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Statistical Methods in Medical Research, 2008, 17, 123–139
58
59