MISSING DATA IN THE INFECTIOUS DISEASES
INSTITUTE CLINIC DATABASE
Agnes N Kiragga
East Africa IeDEA investigators’ meeting
4-5th May 2010
East African
Regional consortium
2
Objectives
• Describe level of missing data for key variables
• Factors associated with missing data for patients on antiretroviral therapy (ART)
• Assess missing data assumptions in observational databases
3
Assumptions of missing data
• "Missing completely at random" [MCAR]: not dependent on anything important
  – blood sample lost or not taken in error
• "Missing at random" [MAR]: dependent only on other measured factors, not on the missing (unobserved) value
  – study specifies blood pressure below a threshold, so after registering a high value, patient is withdrawn [blood pressure at this visit]
• "Missing not at random" [MNAR]: related to the missing outcome itself
  – patient withdrew from study because they "didn't feel well"
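The three mechanisms can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an analysis of the clinic database; the function name `simulate_missingness`, the covariate `x`, the outcome `y` and all probabilities are hypothetical choices made for illustration.

```python
import random

def simulate_missingness(n=20000, seed=0):
    """Return the full-data mean of y and the mean of the observed y
    under three hypothetical missingness mechanisms."""
    rng = random.Random(seed)
    all_y = []
    observed = {"MCAR": [], "MAR": [], "MNAR": []}
    for _ in range(n):
        x = rng.gauss(0, 1)       # covariate that is always observed
        y = x + rng.gauss(0, 1)   # outcome that may go missing
        all_y.append(y)
        # MCAR: every value has the same 30% chance of being missing.
        if rng.random() >= 0.3:
            observed["MCAR"].append(y)
        # MAR: chance of being missing depends only on the observed x.
        if rng.random() >= (0.5 if x > 0 else 0.1):
            observed["MAR"].append(y)
        # MNAR: chance of being missing depends on the unobserved y itself.
        if rng.random() >= (0.5 if y > 0 else 0.1):
            observed["MNAR"].append(y)

    def mean(values):
        return sum(values) / len(values)

    return mean(all_y), {name: mean(vals) for name, vals in observed.items()}

full_mean, observed_means = simulate_missingness()
print(round(full_mean, 2), {k: round(v, 2) for k, v in observed_means.items()})
```

In this sketch the MCAR mean stays close to the full-data mean, while the unadjusted MAR and MNAR means are pulled downward. The difference is that under MAR the shift is explained by the observed `x` and could be corrected by conditioning on it, whereas under MNAR it cannot be corrected from the observed data alone.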
4
Study population, 04/2000 – 04/2010
Registered: 23121
Active: 15070
Non-ART: 13310
ART: 9811
DART: 300
Before 2005: 1043
After 2005: 8468
Total: 9511 (ART 9811 less DART 300)
5
Variables and data recorded
Columns: recorded at every clinic visit | recorded only when event occurs
Demographic: Gender (a), Date of birth (a), Weight, Height (a) | X X X X
Clinical: WHO stage, Karnofsky score | X X
Laboratory: CD4 T-cell, CBC (hemoglobin, lymphocytes) | X X
Other variables: Opportunistic infections, Toxicity, ART regimen, Reason for ART switch/stop, ART switch date, Adherence score | X X X X X X
6
Source of CD4 data
• Electronic download: 86146 (95%)
• Recorded: 3085 (5%)
7
Missing baseline variables (N = 9511)
Variable                     | Number missing (%)
-----------------------------+-------------------
Age                          | 0 (0)
Gender                       | 0 (0)
WHO clinical stage           | 0 (0)
Weight (kg)                  | 13 (0.1)
Height (cm)                  | 1032 (10.8)
CD4+ count (cells/μL) [1]    | 3126 (32.8)
CD4+ count (cells/μL) [2]    | 1641 (17.2)
CD4+ count (cells/μL) [3]    | 1350 (14.2)
ART regimen                  | 0 (0)
Note: [1] = 3 months pre-ART, [2] = 6 months pre-ART, [3] = 12 months pre-ART
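A per-variable missingness table like this one can be produced with a simple count. The following is a minimal sketch, assuming each patient is a dictionary with `None` marking a missing value; the function `missing_summary`, the field names and the toy records are hypothetical and are not the clinic's data or schema.

```python
def missing_summary(records, variables):
    """Return {variable: (n_missing, percent_missing)} for a list of dict records."""
    n = len(records)
    summary = {}
    for var in variables:
        # A value counts as missing when the key is absent or set to None.
        n_missing = sum(1 for r in records if r.get(var) is None)
        summary[var] = (n_missing, round(100 * n_missing / n, 1))
    return summary

# Hypothetical toy records for illustration only.
patients = [
    {"age": 34, "weight_kg": 61.0, "height_cm": 170, "cd4": 210},
    {"age": 29, "weight_kg": 55.5, "height_cm": None, "cd4": None},
    {"age": 41, "weight_kg": None, "height_cm": 165, "cd4": 95},
    {"age": 38, "weight_kg": 70.2, "height_cm": 172, "cd4": None},
]

for variable, (count, percent) in missing_summary(
    patients, ["age", "weight_kg", "height_cm", "cd4"]
).items():
    print(f"{variable}: {count} ({percent})")
```

For a baseline CD4 count, the pre-ART window (3, 6 or 12 months) would have to be applied before this step, when deciding which measurement, if any, counts as the baseline value.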
8
Number of missing baseline variables (a) by year of ART start
Year of ART start | Number of missing variables, n (%)
                  | 0            | 1            | 2
------------------+--------------+--------------+------------
≤ 2004            | 198 (3.5)    | 648 (19.2)   | 197 (47.4)
2005              | 1878 (32.9)  | 848 (25.1)   | 100 (24.0)
2006              | 971 (17.0)   | 419 (12.4)   | 24 (5.8)
2007              | 1154 (20.2)  | 627 (18.6)   | 30 (7.2)
2008              | 739 (12.9)   | 379 (11.2)   | 33 (7.9)
2009              | 694 (12.1)   | 385 (11.4)   | 26 (6.3)
2010              | 81 (1.4)     | 74 (2.2)     | 6 (1.4)
Total             | 5715 (100)   | 3380 (100)   | 416 (100)
Note: (a) variables include weight, height and CD4 count
9
Factors associated with missing baseline CD4 count
No association with gender, age, weight
Variable                        | Missing (N=3167) | Not missing (N=6344) | p
--------------------------------+------------------+----------------------+--------
Missing baseline height, n (%)  | 411 (13)         | 621 (9.8)            | <0.0001
ART regimen                     |                  |                      | <0.0001
  Nevirapine                    | 1798 (56.8)      | 3698 (58.3)          |
  Efavirenz                     | 1143 (36.0)      | 2425 (38.2)          |
  PI                            | 154 (4.9)        | 83 (1.3)             |
  Other                         | 72 (2.3)         | 138 (2.2)            |
Year of ART initiation          |                  |                      | <0.0001
  ≤2004                         | 562 (17.7)       | 481 (7.6)            |
  2005                          | 720 (22.7)       | 2106 (33.2)          |
  2006                          | 415 (13.2)       | 999 (15.7)           |
  2007                          | 614 (19.4)       | 1197 (18.9)          |
  2008                          | 390 (12.3)       | 761 (12.0)           |
  2009                          | 387 (12.2)       | 718 (11.3)           |
  2010                          | 79 (2.5)         | 82 (1.3)             |
OUTCOMES
Study status                    |                  |                      | <0.0001
  Active                        | 2365 (33.8)      | 4623 (72.9)          |
  Dead                          | 199 (24.0)       | 629 (9.9)            |
  Lost                          | 76 (65.5)        | 40 (0.6)             |
  Transferred                   | 354 (33.1)       | 716 (11.3)           |
  Missing                       | 173 (34.0)       | 336 (5.3)            |
10
CD4 counts at follow-up visits
• CD4 tested 6 monthly (± 2 months)
• Exclude baseline CD4 counts
• Complete CD4 data: No. of CD4 tests expected given duration on ART >= total No. of CD4 counts observed
• Missing CD4 data: No. of CD4 tests expected given duration on ART ≠ total No. of CD4 counts observed
• 1423 (15%): insufficient follow-up
• 8088 (85%) assessed for missing CD4
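The comparison of expected against observed CD4 tests can be sketched as follows. This is a hedged illustration, not the rule used in the analysis: it assumes one test is expected per full 6 months on ART, that a patient counts as complete when the observed follow-up tests reach the expected number, and that fewer than 6 months on ART means insufficient follow-up. The ± 2 month window is not modelled, and both function names are hypothetical.

```python
def expected_cd4_tests(months_on_art, interval_months=6):
    """Follow-up CD4 tests expected under 6-monthly testing.
    Assumption: one test per full interval completed on ART."""
    return int(months_on_art // interval_months)

def classify_followup(months_on_art, n_observed, interval_months=6):
    """Classify follow-up CD4 data for one patient.
    n_observed excludes the baseline CD4 count."""
    expected = expected_cd4_tests(months_on_art, interval_months)
    if expected == 0:
        # Too little time on ART for any follow-up test to be due.
        return "insufficient follow-up"
    # Assumption: extra tests beyond the expected number still count as complete.
    return "complete" if n_observed >= expected else "missing"

# Hypothetical patients: (months on ART, follow-up CD4 counts observed)
for months, observed in [(4, 0), (18, 3), (18, 1), (30, 5)]:
    print(months, observed, classify_followup(months, observed))
```

A stricter version would also check that each test falls inside its ± 2 month window, which is closer to the "timely CD4 tests" criterion.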
11
Categorization of follow-up CD4 data (N= 8088)
Categorization                          |      Freq.     Percent
----------------------------------------+-----------------------
complete baseline + complete follow-up  |      2,878       35.58
complete baseline + missing follow-up   |      2,529       31.27
missing baseline + complete follow-up   |      1,315       16.26
missing baseline + missing follow-up    |      1,366       16.89
----------------------------------------+-----------------------
Total                                   |      8,088      100.00
• Complete baseline + complete follow-up + CD4 testing + timely CD4 tests = 864 (10.7%)
• Included all nested research cohort patients
12
Categorization of follow-up CD4 data by year of ART initiation, for patients with at least 6 months follow-up
[Figure: percentage of patients (0% to 100%) in each category, by year of ART initiation]
<=2004 (n=995), 2005 (n=2487), 2006 (n=1174), 2007 (n=1555), 2008 (n=960), 2009 (n=917)
13
Validation of incident Post-ART Tuberculosis cases
• Tuberculosis was the most common opportunistic infection in the first 24 months after ART initiation (rate 2.79, 95% CI 2.45-3.16)
• Merged flagged TB cases with TB drug database
• Identified patients on TB treatment
• 334 incident post-ART cases
14
Assumption 1
Baseline CD4 data missing completely at random
[Figure: Probability of development of tuberculosis (TB) by baseline CD4 data. Curves: complete baseline CD4 data vs missing baseline CD4 data. x-axis: analysis time (0 to 2.5); y-axis: probability (0.00 to 0.30)]
Log rank P<0.435
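The comparison of TB curves between patients with complete and with missing baseline CD4 data relies on a log-rank test. The slides do not show how it was computed, so the following is only a minimal pure-Python sketch of the standard two-group log-rank statistic; the function name `logrank_test` and the toy follow-up times are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events are 1 (TB observed) or 0 (censored).
    Returns (chi-square statistic, p-value) on 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        groups_at_risk = [g for (u, _, g) in data if u >= t]
        n = len(groups_at_risk)            # at risk just before time t
        n_a = groups_at_risk.count(0)      # of which in group A
        d = sum(1 for (u, e, _) in data if u == t and e)
        d_a = sum(1 for (u, e, g) in data if u == t and e and g == 0)
        # Observed minus expected events in group A (hypergeometric mean).
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical toy data: follow-up time and TB indicator per patient.
complete_t, complete_e = [0.4, 0.9, 1.2, 2.0, 2.5], [1, 0, 1, 0, 0]
missing_t, missing_e = [0.3, 1.0, 1.5, 2.2, 2.5], [1, 1, 0, 0, 0]
print(logrank_test(complete_t, complete_e, missing_t, missing_e))
```

A large p-value, as on this slide, means the test found no evidence that the TB curves differ between the two groups. That is consistent with the MCAR assumption but does not by itself prove it.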
15
Assumption 2
Baseline CD4 data missing at random
[Figure: Probability of development of tuberculosis (TB) by follow-up CD4 data. Curves: missing follow-up CD4 data vs complete follow-up CD4 data. x-axis: analysis time (0 to 2.5); y-axis: probability (0.00 to 0.30)]
16
Preliminary Insights from analysis
• Reconcile local and IeDEA wide analyses
• Baseline CD4 missing completely at random (MCAR)
• Follow-up CD4 data missing at random
• Ignoring the missing data will lead to biased estimates of ART
• Strategies needed to identify patterns and mechanisms of missing data in observational data prior to analysis
17
Planned analyses
• Missing data and other HIV outcomes, e.g.
  – immune response
  – incidence of other opportunistic infections
  – toxicity
  – treatment changes/switches
• Strength of nested research cohort can be used to validate imputed data in large database
• CD4 trajectories versus mortality: estimate the distribution of CD4 marker trajectories and the distribution of log survival time using mixed-effects models, measuring time from the first pre-HAART CD4