04/20/23 H.S. 1
Error check in data
Hein Stigum
Presentation, data and programs at:
http://folk.uio.no/heins/
Example data
• HUMIS
– Birth cohort, 5 counties in Norway
– N=475 mother-child pairs
– Repeated questionnaires
• Purpose
– Outcome: Growth after birth
– Exposure: Contaminants in mother’s milk
Agenda
• Potential problems
– String variables, Missing, …
• Univariate
• Bivariate
• Multivariable
• Individual growth
Potential problems
String variables
encode KJONN if KJONN!=" ", generate(sex3)
String to numeric
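A minimal sketch of the string-to-numeric step on toy data. The values, the variable `wtxt` and the use of `trim()` are assumptions for illustration, not part of the HUMIS data. `encode` is meant for text categories, while `destring` is meant for digits that were stored as text.

```stata
* Toy data (hypothetical): sex stored as text, one blank entry,
* and a weight column that was read in as text
clear
input str6 KJONN str4 wtxt
"Gutt"  "3500"
"Jente" "3200"
" "     "4100"
"Jente" "n/a"
end

* encode maps each distinct string to an integer code with a value label.
* trim() is an assumed variant of the condition: it skips both empty
* and blank strings, so those rows stay missing in sex3
encode KJONN if trim(KJONN)!="", generate(sex3)
tabulate sex3, missing

* destring converts digits stored as text; force turns
* non-numeric text such as "n/a" into missing
destring wtxt, generate(weight) force
list, clean
```

After `encode`, `sex3` is numeric with value labels, so it can be used in regressions and still prints as text in tables.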
Missing
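A minimal sketch of how missing values might be checked in Stata, assuming toy data with hypothetical ids and BMI values. It also shows one well-known pitfall: Stata sorts numeric missing above every number, so a plain `>` condition also selects the missing rows.

```stata
* Toy data (hypothetical): BMI1 with one missing value
clear
input id BMI1
1 22.5
2 37.0
3 .
4 15.2
end

misstable summarize              /* which variables have missing values, and how many */
count if missing(BMI1)           /* rows with BMI1 missing */

* Numeric missing is treated as larger than any number,
* so this condition also counts the missing row (ids 2 and 3)
count if BMI1>35

* Excluding missing explicitly leaves id 2 only
count if BMI1>35 & !missing(BMI1)
```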
Univariate outliers
[Figure: six horizontal box plots with outliers labelled by subject id. Panels: Child age in days; Child weight in grams; fHCB; BMI before pregnancy; Weight before this pregnancy; Height in cm.]
Commands for previous plot
* One horizontal box plot per variable; outliers are labelled with the subject id
local i=1
foreach var of varlist age1 weight1 fHCB BMI1 mHeight mWeight {
    graph hbox `var', marker(1, mlabel(id) msymbol(i) mlabpos(0) mlabangle(-90)) ///
        name(plt`i', replace)
    local ++i
}
graph combine plt1 plt2 plt3 plt4 plt5 plt6, col(2)    /* 6 plots in 2 columns */
Bivariate outliers
[Figure: scatter plot of mother's weight (y-axis, 40 to 120) against mother's height (x-axis, 150 to 200), with the text label "BMI>35".]
Commands for previous plot
twoway (scatter mWeight mHeight) /// all mothers
    (scatter mWeight mHeight if BMI1>35 | BMI1<16, mcol(red)) /// extreme BMI in red
    (qfit mWeight mHeight) /// quadratic fit, all mothers
    (qfit mWeight mHeight if mHeight<185) /// quadratic fit, height below 185
    , legend(off) text(110 195 "BMI>35", col(red)) ///
    ytitle("Mother's weight") xtitle("Mother's height")
Multivariable outliers
[Figure: scatter plot of weight (y-axis, 0 to 15000) against age (x-axis, 0 to 800).]
Commands for previous plot
gen agesq=age^2
gen ageqb=age^3
regress weight age agesq ageqb if age>=0 & age<1000    /* cubic in age */
capture: drop xb res
predict xb, xb      /* predicted value */
predict res, res    /* residuals */
tw (scatter weight age) (scatter weight age if abs(res)>4000, mcol(red)) ///
    (line xb age, sort lcol(red)) if age>=0 & age<1000, legend(off)
Plot of individual growth patterns:
weight versus age
Weight by age 1 to 16
[Figures: 16 pages of small line plots of weight (y-axis, 0 to 15000) against age (x-axis, 0 to 2000), one panel per child, each panel titled with the child's id (graphs by LNR, numeric variable). Ids per page:
Weight by age 1: 100034 to 100574
Weight by age 2: 100607 to 101305
Weight by age 3: 101316 to 102341
Weight by age 4: 102352 to 103487
Weight by age 5: 103678 to 105142
Weight by age 6: 105153 to 106885
Weight by age 7: 106931 to 108999
Weight by age 8: 109023 to 110777
Weight by age 9: 110924 to 112375
Weight by age 10: 112397 to 114311
Weight by age 11: 114366 to 115672
Weight by age 12: 115705 to 117393
Weight by age 13: 117437 to 119248
Weight by age 14: 119261 to 121083
Weight by age 15: 121094 to 122872
Weight by age 16: 123164 to 127326]
Commands for previous plots
* Individual growth patterns. Note: 16 pages of 30 plots each
* Repeated measurements, long format, age nested in id
sort id age                      /* sort by id-number and age */
global d=30                      /* 30 plots per page */
forvalues i=1(1)16 {             /* 16 pages*30 plots=480 subjects */
    local j=(`i'-1)*$d+1         /* plot subjects in id-interval: j<=id<=k */
    local k=`i'*$d
    twoway (line weight age, connect(ascending)) if id>=`j' & id<=`k' ///
        , by(id, compact title("Weight by age, `i'") note("")) ///
        ylabel(0(5000)15000) xlabel(0(200)800)
    graph export "H:\Projects\HUMIS\Weight gain\plt`i'.emf", replace    /* Enhanced Metafile Format */
}                                /* end of loop */
* Make a new Photo album in PowerPoint and add all plots. This gives one plot per page at maximum size.
After new data merge
Plot of individual growth patterns:
weight versus age
[Figures: "Weight by age, 1" to "Weight by age, 16". 16 pages of small line plots of weight (y-axis, 0 to 15000) against age (x-axis, 0 to 800), one panel per child, each panel titled with the child's id, from 100034 on page 1 to 127326 on page 16.]
Individual plots in large datasets?
• Scan 1 page (=30 curves) in 5 sec
– Hours used=5N/(30*60*60)
• Scan all
– If N=50 000, need 2.3 hours
• May instead scan curves of subjects with medium to large residuals
– Residual>1000
  • finds 190 of the 470 children = 40%
  • finds 12 of the 15 deviant growth patterns = 80%
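The two ideas on this slide can be sketched in Stata as follows. The first part evaluates the slide's formula for N=50 000. The second part is an illustration on simulated data only: the data, the quadratic model, the planted error and the variable names `maxres` and `toscan` are all assumptions, and only the threshold of 1000 comes from the slide.

```stata
* Part 1: scanning time from the slide formula, hours = 5N/(30*60*60)
local N = 50000
display "Hours to scan all pages: " %4.1f 5*`N'/(30*60*60)

* Part 2: flag children with at least one large residual (simulated toy data)
clear
set seed 12345
set obs 20                                   /* 20 hypothetical children */
generate id = _n
expand 5                                     /* 5 measurements per child */
bysort id: generate age = 100*_n
generate weight = 3500 + 25*age - 0.01*age^2 + rnormal(0, 300)
replace weight = weight + 5000 if id==7 & age==300    /* planted data error */

regress weight age c.age#c.age               /* quadratic growth curve */
predict res, residuals
bysort id: egen maxres = max(abs(res))       /* largest absolute residual per child */
generate byte toscan = maxres > 1000         /* threshold from the slide */
tabulate toscan if age==100                  /* one row per child */
```

Only the curves of children with `toscan==1` would then be plotted and inspected, which is how the scan is cut down in a large dataset.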
Summing up
• Graph, outliers
– Uni: Boxplots
– Bi: Scatterplots
– Multi: Scatterplots+residuals
– Individual growth
• Merge errors are not rare!
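A minimal sketch of how a merge might be checked before the merged data are used, assuming toy data. The ids, values and the split into a mothers file and a measurements file are hypothetical. The idea is to confirm the key on each side first and then inspect `_merge` for unmatched records.

```stata
* Toy example (hypothetical ids and values): check a merge before trusting it
clear
input id mHeight
100034 168
100045 172
100067 165
end
isid id                          /* mothers file must have one row per id */
tempfile mothers
save `mothers'

clear
input id age weight
100034 0   3500
100034 90  6100
100045 0   3300
100099 0   3600
end
duplicates report id age         /* repeated measurements should be unique on id and age */

merge m:1 id using `mothers'
tabulate _merge                  /* 1=only in measurements, 2=only in mothers, 3=matched */
list id if _merge!=3, clean      /* unmatched ids deserve a closer look */
```

In this toy example one id appears only in the measurements and one only in the mothers file, which is the kind of mismatch worth tracing back to the source files.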