1
Population and Housing Census and Survey Editing
Michael J. Levin
Center for Population and Development Studies
Harvard University
2
Appendix A Censuses where some of these methods were applied
Country              Census Years
American Samoa       1974, 1980, 1990, 2000
Ethiopia             2007
Fiji                 1996, 2007
Ghana                1984, 2000, 2010
Grenada              2001
Guam                 1980, 1990, 2000
Indonesia            1980, 2010
Kenya                1999
Kiribati             2005
Lesotho              1996, 2006
Malawi               1998, 2008
Maldives             2006
Marshall Islands     1973, 1980, 1988
Micronesia           1973, 1980, 1994, 2000
Northern Marianas    1973, 1980, 1990, 1995, 2000
Palau                1973, 1980, 1990, 1995, 2000, 2005
Papua New Guinea     1990
Samoa                2001
Sierra Leone         2004
Solomon Islands      1999
South Africa         2001
Sudan                2008
Tanzania             2002
Timor-Leste          2004
Tonga                1996, 2006
Uganda               1991, 2002
US Virgin Islands    1980, 1990, 2000
Vanuatu              1989
Zambia               2000
Note: For some, processing occurred during the census; for others it was during preparation or during analysis (including own-children estimation).
3
Purpose of Handbook
No census data are ever perfect
Changes are made -- little documentation
Promote communication between subject specialists and programmers
“Cookbook” of suggestions -- presents possible resolutions
But country edit teams must decide
4
The Census Process
Data collection
Capture
Editing
Tabulation and Dissemination
Archiving
5
History of census editing
Early years – manual or nothing
Computers
Within-record editing
Between-record editing
Hot decking
6
Editing in Historical Perspective
Before computers: manual editing
With computers:
Increased complexity
Automated changes
Generalized editing packages
New philosophies of editing
Personal computers
Appropriate levels of computer editing
7
Major Elements in a Census
Preparatory work
Enumeration
Data processing -- keying, editing and tabulations
Building data bases and dissemination
Evaluation of results
Analysis of results
8
Errors in Census Process
Coverage errors
Questionnaire design
Enumerator/respondent errors
Coding errors
Data entry errors
Computer editing errors
Tabulation errors
9
What is editing
Editing is the systematic inspection of invalid and inconsistent responses, and subsequent manual or automatic correction according to pre-determined rules.
The editing team!!
10
Editing Team
Appropriate internal subject matter specialists
Computer programmers
Work together as a team
Edit specs as means of communication
Outside experts -- academicians
Outside experts -- private sector
11
Why edit?
Edited vs unedited data
Always preserve original data
Consider the users!!
12
Table 1. Sample population by 15-year age group and sex, using unedited and edited data

                              Unedited data                      Edited data
Age group             Total   Male  Female  Not reported    Total   Male  Female
Total                 4,147  2,033   2,091            23    4,147  2,045   2,102
Less than 15 years    1,639    799     825            15    1,743    855     888
15 to 29 years        1,256    612     643             1    1,217    603     614
30 to 44 years          727    356     369             2      695    338     357
45 to 59 years          360    194     166             0      341    182     159
60 to 74 years          116     54      59             3      114     53      61
75 years and over        34     12      22             0       37     14      23
Not reported             15      6       7             2
13
TABLE 2. POPULATION AND POPULATION CHANGE BY 15-YEAR AGE GROUP WITH UNKNOWNS: 1990 AND 2000

                         Numbers       Number  Per cent      Per cent
Age group             2000    1990     change    change    2000    1990
Total                 4147    3319        828      24.9   100.0   100.0
Less than 15 years    1639    1348        291      21.6    39.5    40.6
15 to 29 years        1256     902        354      39.2    30.3    27.2
30 to 44 years         727     538        189      35.1    17.5    16.2
45 to 59 years         360     200        160      80.0     8.7     6.0
60 to 74 years         116      89         27      30.3     2.8     2.7
75 years and over       34      25          9      36.0     0.8     0.8
Not reported            15     217       -202     -93.1     0.4     6.5
Table showing trends with unknowns
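The change columns of Table 2 follow from simple arithmetic, sketched below in Python. The function names are illustrative only and are not part of the original slides; the figures are taken from the table.

```python
def change(new, old):
    """Return (number change, per cent change) between two census counts."""
    diff = new - old
    return diff, round(100.0 * diff / old, 1)


def share(part, total):
    """Return a group's per cent share of the total."""
    return round(100.0 * part / total, 1)


# Total population, 2000 versus 1990
print(change(4147, 3319))
# "Not reported" ages, 2000 versus 1990
print(change(15, 217))
# Share of the 2000 population with age not reported
print(share(15, 4147))
```

The large fall in the "Not reported" row reflects a change in the number of unknowns rather than a demographic trend, which is why unknowns left in the data make comparisons between censuses harder to read.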
14
WHAT CENSUS EDITING SHOULD DO
1 Give users measures of the quality of the data
2 Identify the types and sources of error, and
3 Provide adjusted census results
15
Goals of the edit
The imputed household should closely resemble the household that failed the edit
Imputed data should come from a single donor person or house resembling the donee
Equally good donors should have equal chances
16
Basics of Census Editing
Systematic inspection and change (not always correction)
Fatal edits -- invalid or missing entries
Query edits -- inconsistencies
Must preserve the original data as much as possible
Quality enumeration more important than editing
Edit does not improve data quality -- makes the data more esthetic
Team must determine how far to go
17
More of Basics
Over-editing is harmful
Treatment of unknowns
Spurious changes
Determining tolerances
Learning from the edit process
Quality assurance
Costs of editing
Imputation
Archiving
18
How Over-editing is Harmful
Timeliness
Finances
Distortion of true values
A false sense of security
19
What we have to look out for
Treatment of unknowns
Spurious changes
Using tolerances
Learning from the editing process
Quality assurance
Costs of editing
20
Two parts of a national edit
Structure editing
Content editing
21
Editing Applications
Manual versus automatic correction
Guidelines for correcting data
Validity and consistency checks
Methods of correcting and imputing data
Other editing systems
22
Manual versus Automatic Correction
Manual correction: takes a long time and is very subject to error
Automatic correction: faster and consistent. Not necessarily correct, just consistent.
Can look at many variables at the same time
Can keep an audit trail
23
Figure 1. Sample editing specifications to correct sex variable, in pseudocode

If SEX of the HEAD OF HOUSEHOLD = SEX of the SPOUSE
    If FERTILITY of the HEAD OF HOUSEHOLD is not blank
        If FERTILITY of the SPOUSE is blank
            If the SEX of the head of household is not already female
                Make the SEX of the head = female
            Endif
            If the SEX of the spouse is not already male
                Make the SEX of the spouse = male
            Endif
        Else
            Do something else because they have the same sex and both have fertility!
            [The “something” could be using the sex of the previous head, or alternating the sex of the head, or using ratios of sexes of all heads for an appropriate response, etc.]
        Endif
    Else (this is the case where the head of household’s fertility is blank)
        If FERTILITY of the SPOUSE is not blank
            If the SEX of the head of household is not already male
                Make the SEX of the head = male
            Endif
            If the SEX of the spouse is not already female
                Make the SEX of the spouse = female
            Endif
        Else
            Do something else because BOTH have no fertility!
            [The “something” could be using the sex of the previous head, or alternating the sex of the head, or using ratios of sexes of all heads for an appropriate response, etc.]
        Endif
    Endif
Endif
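The specification in Figure 1 could be turned into code along the following lines. This is a minimal Python sketch, not the program used in any census: the sex codes (1 = male, 2 = female), the dictionary layout, and the `flip_spouse` fallback are assumptions made for illustration. The fallback stands in for the "do something else" branches, which the edit team must decide.

```python
MALE, FEMALE = 1, 2  # assumed codes, not taken from the slides


def flip_spouse(head, spouse):
    """Hypothetical fallback: keep the head's sex, give the spouse the other."""
    return head["sex"], (FEMALE if head["sex"] == MALE else MALE)


def edit_couple_sex(head, spouse, fallback=flip_spouse):
    """Resolve a head and spouse reported with the same sex.

    Each person is a dict with 'sex' and 'fertility' (None means blank).
    Returns a list of (who, old_sex, new_sex) changes as an audit trail.
    """
    if head["sex"] != spouse["sex"]:
        return []  # nothing to edit
    head_has_fert = head["fertility"] is not None
    spouse_has_fert = spouse["fertility"] is not None
    if head_has_fert and not spouse_has_fert:
        targets = (FEMALE, MALE)    # fertility reported for head only
    elif spouse_has_fert and not head_has_fert:
        targets = (MALE, FEMALE)    # fertility reported for spouse only
    else:
        targets = fallback(head, spouse)  # both or neither have fertility
    changes = []
    for who, person, new_sex in (("head", head, targets[0]),
                                 ("spouse", spouse, targets[1])):
        if person["sex"] != new_sex:
            changes.append((who, person["sex"], new_sex))
            person["sex"] = new_sex
    return changes


head = {"sex": MALE, "fertility": 3}
spouse = {"sex": MALE, "fertility": None}
print(edit_couple_sex(head, spouse))
```

Returning the list of changes rather than silently overwriting values keeps the audit trail that the automatic-correction slide calls for.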
24
Guidelines for Correcting Data
Make the fewest required changes possible to the originally collected data
Eliminate obvious inconsistencies among the entries
Systematically supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or group
When appropriate, use “not reported”
25
26
Types of editing
Top Down
• The usual way
• Is simple and straightforward
Multiple-variable editing approach
• Uses more information
• Is likely to be a better guess
27
Methods of Correction and Imputation
When imputation is not needed – toggling sexes
Static imputation – cold deck technique
Dynamic imputation – hot deck technique
28
Hot Deck Imputation
Geographic considerations
Use of related items
Sequence of the items
Complexity of the matrices
Standardized hot decks
Size of hot decks -- too big, audit trail, too small, difficult items
29
In developing hot decks
Imputation matrices – structure of the matrices
Standardized imputation matrices
Seeding the decks
Big, but not too big
Understanding what the matrix is doing
When the matrix is too small …
Occupation and industry!!
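A seeded hot deck can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not a production edit: the matrix is keyed by sex and 15-year age group, the item name `mar` and its codes are hypothetical, and the seed values stand in for the cold-deck starting values that a real team would choose from earlier data.

```python
def age_group(age):
    """Map single years of age to 15-year groups 0..5 (75+ is group 5)."""
    return min(age // 15, 5)


def hot_deck_edit(records, item, valid, seed):
    """Impute invalid values of `item` from the most recent valid donor
    in the same (sex, age group) cell of the imputation matrix.

    `seed` supplies a starting value for every cell so that the first
    failing record always finds something to use.
    Returns the indexes of the records that were changed.
    """
    deck = dict(seed)
    imputed = []
    for i, rec in enumerate(records):
        cell = (rec["sex"], age_group(rec["age"]))
        if rec[item] in valid:
            deck[cell] = rec[item]      # valid record becomes the donor
        else:
            rec[item] = deck[cell]      # failing record takes donor value
            imputed.append(i)
    return imputed


# Seed every cell with a hypothetical "never married" code of 5.
seed = {(sex, g): 5 for sex in (1, 2) for g in range(6)}
people = [
    {"sex": 1, "age": 30, "mar": 2},
    {"sex": 1, "age": 35, "mar": None},   # takes 2 from the record above
    {"sex": 2, "age": 5, "mar": None},    # no donor yet, takes the seed
]
print(hot_deck_edit(people, "mar", {1, 2, 3, 4, 5}, seed))
```

A matrix with more dimensions finds closer donors but leaves more cells that are rarely refreshed, which is the "big, but not too big" trade-off on the slide.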
30
Aids to checking edits
1. Listings
2. Writing whole households before and after with changes
3. Frequency matrices
31
Figure 4. Example of a listing summary for Malawi 2008 Census [LISTING]

1718  336574    -  ******************************** ...                         -
1719  336574    -  ******* Age & Head ********* ...                             -
1720  336574    -  ******************************** ...                         -
1805    1546  0.1  *P00-1* Head is not first person, is %2d...            1748490
1823     877  0.1  *P00-2* No head of household, first person 14+...      1748490
1835      62  0.0  *P00-3* No head 14+, first person becomes head...      1748490
1850    5074  0.3  *P00-4* Too many heads of household - 1 ...            1748490
1860    5238  0.3  *P00-5* Remaining heads made other RELATIONSHI...      1748490
1874     939  0.1  *P00-6* After head edit, not one and only one ...      1748490
1889    2301  0.1  *P00-6a* Spouses too young made other relative...      1748490
1909    1062  0.1  *P00-6ax* Multiple spouses for unmarried head...       1748490
1911    1062  0.1  *P00-6ax* Multiple spouses for unmarried head...       1748490
1929      44  0.0  *P00-6a1* Crazy case where spouse is visitor a...      1748490
1949      89  0.0  *P00-6a3* Crazy case where spouse is visitor a...      1748490
1998      12  0.0  *P00-6a1* Extra spouses who are visitors...            1748490
2017    1483  0.1  *P00-6a2* Extra spouses not married...                 1748490
32
Figure 5. Example of a listing summary for Lesotho 2006 Census [LISTING]

4388  21471    -  ...                                                      -
4389  21471    -  ******* Sisterhood Characteristics *********...          -
4390  21471    -  ...                                                      -
4401   1449  1.2  *G45-1* Total sisters out of range [%2d] illeg...   124839
4410   2897  2.3  *G45-2* Dead sisters out of range [%2d] illega...   124839
4419   3791  3.0  *G45-3* Pregnant sisters [%2d] illegal...           124839
4426   3895  3.1  *G45-4* At birth sisters [%2d] illegal...           124839
4433   4908  3.9  *G45-5* Week 6 sisters [%2d] illegal...             124839
4440    103  0.1  *G45-6* Sum of Dead Sisters [%2d][%2d][%2d] gr...   124839
4453      8  0.0  *G45-7* Sum of Dead Sisters [%2d][%2d][%2d] gr...   124839
4461    616  0.5  *G45-8* Dead Sisters [%2d] greater than total ...   124839
33
Figure 8. Example of a write listing for Ethiopia 2007 Census [WRITE]

BARCODE REGION ZONE WEREDA TOWN SUB_CITY SA KEBELE EA HHNO HUNO
-------------------------------------------------------------------------
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8 9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL RS LY ES MS MH FH MA FA MD FD LB
01 01 01 01 31 01 05 67 02 08 01 01 01 97 12 01 01 07 01
02 01 06 01 34 01 05 67 02 08 01 01 01 97 17 01 01 03 01
03 01 09 02 30 01 05 05 02 07 02 02 97 05 01 01 05 04 00 00 00 00 00 00 00
04 01 09 02 20 01 05 05 02 03 02 02 02 98 03 01 03 01 01 00 00 00 00 00 00
05 01 09 01 01 01 05 05 02 08 03 07 08 01
P18-3 No literacy, but schooling 97, so literate, PN = 3
P20-20 Unable to read and write 98 because never attended school, PN = 4
P16-1 Mother's vital status invalid = PN = 5
P17-1 Father's vital status invalid = PN = 5
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8 9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL RS LY ES MS MH FH MA FA MD FD LB
01 01 01 01 31 01 05 67 02 08 01 01 01 97 12 01 01 07 01
02 01 06 01 34 01 05 67 02 08 01 01 01 97 17 01 01 03 01
03 01 09 02 30 01 05 05 02 07 02 02 01 97 05 01 01 05 04 00 00 00 00 00 00 00
04 01 09 02 20 01 05 05 02 03 02 02 02 98 00 03 01 03 01 01 00 00 00 00 00 00
05 01 09 01 01 01 05 05 02 08 01 01 01 01
34
Figure 10. Example of a frequency distribution for Sudan 2008 Census [FREQUENCY]

Imputed Item Q18_ATTAINMENT: Education Attainment - all occurrences

Categories                   Frequency  CumFreq      %  Cum %  Net %  cNet %
 1 No Qualification                105      105    2.2    2.2    2.4     2.4
 2 Incomplete Primary             1564     1669   33.5   35.7   35.3    37.7
 3 Primary 4                       529     2198   11.3   47.0   11.9    49.6
 4 Primary 6                       492     2690   10.5   57.6   11.1    60.7
 5 Primary 8                       302     2992    6.5   64.0    6.8    67.5
 6 Junior 3                        251     3243    5.4   69.4    5.7    73.2
 7 Junior 4                         58     3301    1.2   70.7    1.3    74.5
 8 Secondary 3                      95     3396    2.0   72.7    2.1    76.6
 9 Secondary 4                       5     3401    0.1   72.8    0.1    76.7
10 Post Secondary Diploma            2     3403    0.0   72.8    0.0    76.8
11 University Degree               154     3557    3.3   76.1    3.5    80.3
12 Post Graduate Diploma            10     3567    0.2   76.3    0.2    80.5
13 Master                           52     3619    1.1   77.5    1.2    81.7
14 Ph.D                              1     3620    0.0   77.5    0.0    81.7
15 Khalwa                            1     3621    0.0   77.5    0.0    81.7
@17                                144     3765    3.1   80.6    3.2    85.0
@98                                667     4432   14.3   94.9   15.0   100.0
NotAppl                            240     4672    5.1  100.0
TOTAL                             4672     4672  100.0  100.0
35
Figure 11. Example of a frequency distribution for additional edit for Zambia 1990 Census [FREQUENCY]

Input: 1IN100.DAT   Program: ZAMHOUSE   ROOMS
-------------------------------------------------------------
Values       Number of                  Cum.
Imputed      Imputations    Percent     Percent
-------------------------------------------------------------
< 1             1,415        37.21       37.21
1               2,185        57.45       94.66
2                 121         3.18       97.84
3                  22         0.58       98.42
4                  16         0.42       98.84
5                  23         0.60       99.45
6                  21         0.55      100.00
> 6                 -            -           -
-------------------------------------
                3,803
36
Other considerations
Running the edit three times: seed, run, check
Saving original responses
Imputation flags
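One way to keep original responses and imputation flags together is sketched below in Python. The field names `original` and `flags` and the rule label are assumptions for illustration; census packages usually store these as extra variables on the record.

```python
def impute(record, item, new_value, rule):
    """Change `item` while keeping the collected response and a flag.

    The first value ever seen for the item is stored under 'original'
    and is never overwritten; 'flags' records the last rule that
    changed the item.
    """
    if record.get(item) == new_value:
        return record  # nothing changes, so nothing is flagged
    record.setdefault("original", {}).setdefault(item, record.get(item))
    record.setdefault("flags", {})[item] = rule
    record[item] = new_value
    return record


person = {"sex": 1, "age": 63}
impute(person, "sex", 2, "V.7i")
print(person)
```

Because the original value is stored only once, later edits of the same item cannot erase what the enumerator actually recorded.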
37
Computer Edit Specifications for Pilot Census 2001
Data Processing Project
Christopher S. Corlett
Data Processing Adviser
U.S. Census Bureau
38
Editing examples:
Language – the general edit
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility
Source for all data except language: South Africa Pilot Census 2001
39
Language Edit
If this is the head and language is missing, first look for someone else in the house with language, and assign that.
If this is the head without language and no one else in the house has language, use a neighboring head of similar characteristics to assign a best guess.
If this is someone else in the house and language is missing, assign the head’s language.
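The three language rules can be sketched as follows in Python. This is a simplified illustration: it assumes the head is the first person in the household list, and it reduces the "neighboring head of similar characteristics" to the language of the most recently processed head, whereas a real edit would use an imputation matrix.

```python
def edit_language(household, valid, last_head_language):
    """Apply the language edit to one household.

    `household` is a list of person dicts with the head first
    (an assumption of this sketch). Returns the head's final language
    so the caller can carry it to the next household.
    """
    head, others = household[0], household[1:]
    if head["lang"] not in valid:
        # Rule 1: take the language of someone else in the house.
        donors = [p["lang"] for p in others if p["lang"] in valid]
        # Rule 2: otherwise fall back on the neighboring head.
        head["lang"] = donors[0] if donors else last_head_language
    for person in others:
        # Rule 3: everyone else without a language takes the head's.
        if person["lang"] not in valid:
            person["lang"] = head["lang"]
    return head["lang"]


valid_codes = {"02", "06"}
house_a = [{"lang": None}, {"lang": "06"}, {"lang": "06"}]
house_b = [{"lang": None}, {"lang": None}]
deck = edit_language(house_a, valid_codes, "02")
deck = edit_language(house_b, valid_codes, deck)
print(house_a, house_b)
```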
40
Language Edit: Within House
91200217 Population Group Case = 0009
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 06 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1
41
Language Edit: Imputed House
91200697 Language Case = 0027
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 1 09 1 1
02 2 027 02 1 1 09 1 1
03 1 005 03 1 1 09 1 1
V.14d: P07 invalid, imputing from deck ALANGUAGE PN = 01 Lang =
V.15d: P08 invalid for head, impute from deck ARELIGIO PN = 01 Head Relig =
V.14f: P07 invalid, imputing from head PN = 02 Lang = Head's lang = 06
V.15f: P08 invalid, imputing from head's religion PN = 02 Relig = Head's relig = 38
V.14f: P07 invalid, imputing from head PN = 03 Lang = Head's lang = 06
V.15b: imputing P08 from mother's religion PN = 03 Relig = Mo relig = 38
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 06 38 1 09 1 1
02 2 027 02 1 06 38 1 09 1 1
03 1 005 03 1 06 38 1 09 1 1
42
Editing examples:
Language – the general edit
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility
Source for all data: Pilot Census 2001
43
Young heads of household
V.3 (relationship for head) and V.5 (age of head)
Related issue: each HH must have 1 and only 1 head.
For invalid ages of heads, try to obtain via:
– spouse (impute from deck based on spouse's age and head's sex)
– otherwise, children (child's age and head's sex)
– otherwise, impute from deck (household size and head's sex)
44
Young heads
Skepticism about young heads; if younger than 12 then confirm:
– if someone else older is present, then make them the head (V.3)
– can't be married (must be 12+ years to be married)
– has to be 12 years older than biological children
– confirm consistency of age and educational level
– confirm consistency of age and educational institution
– can't have economic activity responses if younger than 10
– can't have fertility (for girls)
If head doesn't pass these age tests, then impute (based on head’s sex and household size).
45
Young heads
Effect: number of heads younger than 12 years old drops from 1296 (1.3%) to 627 (0.6%)
46
47
48
Notes:
PN = Person number
SEX = Sex
DOB = Day of birth
MOB = Month of birth
YOB = Year of birth
REL = Relationship to head
MAR = Marital status
SPN = Spouse person number
CEB = Children ever born (total)
CS = Children surviving (total)
MPN = Mother person number
FPN = Father person number
Case 1:
49
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 11 01 1950 051 01 1 99 99 01
02 1 17 07 1977 023 03 5 01
03 2 04 04 1985 005 03 5 00 09 01
04 1 24 10 1987 011 03 5 53 01
05 1 01 07 1990 010 03 5 49 01
06 1 20 02 1994 007 01 5 99 01
07 1 20 02 1994 007 5 99 01
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 005 Date = 04/04/1985
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 24/10/1987
V.3: either no heads or > 1= 0002
V.3h: more than 1 head =
V.3i: multiple heads, making oldest= 0051
V.3k: multiple heads, making excess other rel
V.9g: Relation invalid, has a dad, impute Rela
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 11 01 1950 051 01 1 99 99 01
02 1 17 07 1977 023 03 5 01
03 2 04 04 1985 015 03 5 00 09 01
04 1 24 10 1987 013 03 5 53 01
05 1 01 07 1990 010 03 5 49 01
06 1 20 02 1994 007 11 5 99 01
07 1 20 02 1994 007 03 5 99 01
50
Case 2:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 2 01 09 1986 015 01 5 00 90
02 2 09 06 1990 011 06 5
03 1 01 09 1991 010 06 5 99
04 2 01 09 1994 007 06 5 99
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 015 Date = 01/09/1986
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 09/06/1990
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 010 Date = 01/09/1991
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 007 Date = 01/09/1994
V.3a1: head is younger than 16, Age = 014
V.3a3: no older relatives found; keep young head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 2 01 09 1986 014 01 5 00 90
02 2 09 06 1990 010 06 5
03 1 01 09 1991 009 06 5 99
04 2 01 09 1994 006 06 5 99
51
Case 3:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 12 01 1998 003 09 05
02 2 008 09 05
V.3b: no head of household!
V.3e: no head, making oldest person the head
V.5: head is younger than 12, about to confirm this
V.5e1: young head, but age consistent with educ lvl
V.5i1: young head, but age consistent with educ inst
V.5k: imputing young head's age from AHEADAGE for econ activity inconsistency
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 99 99 1908 092 01 05
02 2 12 01 1998 003 09 05
52
Editing examples:
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility
53
Population Group (V.13)
For invalid population group, try to obtain via:
– Head of household
– Someone else in the household
– Otherwise, impute from deck (age by household size)
Effects:
– Removes 2.9% blank/invalid responses
54
Population Group (percents)
[Figure: bar chart of raw versus edited per cent by population group (Black African, Coloured, Indian or Asian, White, Other, blank, invalid); vertical axis runs from 0.0% to 80.0%.]
55
Population Group
Parts of the current edit might need refinement for South Africa
Issues to explore:
– Imputations in HHs with multiple pop groups;
– Tolerances and household size:
   Case where whole HH has blank/invalid pop group;
   Case where all but 1 HH member has same pop group;
   Situations between these two extremes
– Effect on planning/data use of leaving the variable “not stated”
56
Population Group
57
Case 1:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 073 01 1 06 55 1 09 1 1
02 2 063 02 1 06 55 1 09 1 1
03 2 025 11 1 06 55 1 09 1 1
04 1 016 09 1 06 55 1 09 1 1
05 1 014 09 1 06 55 1 09 1 1
06 2 011 09 1 06 55 1 09 1 1
07 2 000 11 1 09 1 1
V.13e: Pop group invalid, impute from head PN=07 Group=Head Group= 1
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 073 01 1 06 55 1 09 1 1
02 2 063 02 1 06 55 1 09 1 1
03 2 025 11 1 06 55 1 09 1 1
04 1 016 09 1 06 55 1 09 1 1
05 1 014 09 1 06 55 1 09 1 1
06 2 011 09 1 06 55 1 09 1 1
07 2 000 11 1 06 55 1 09 1 1
58
Case 2:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 032 01 3 02 39 1 08 710 1 1
02 2 028 02 1 08 1 1
03 1 068 07 1 08 1 1
04 2 057 07 1 08 1 1
05 2 007 03 1 06 1 1
06 1 006 03 1 08 1 1
07 1 001 03 1 08 1 1
08 2 030 12 1 07 09 1 1
V.13e: Pop group invalid, impute from head (SIX TIMES)
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 032 01 3 02 39 1 08 1 1
02 2 028 02 3 02 39 1 08 1 1
03 1 068 07 3 02 39 1 08 1 1
04 2 057 07 3 02 39 1 08 1 1
05 2 007 03 3 02 39 1 06 1 1
06 1 006 03 3 02 39 1 08 1 1
07 1 001 03 3 02 39 1 08 1 1
08 2 030 12 1 07 39 1 09 1 1
59
Case 3:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 045 01 01 32 1 01 1
02 2 048 02 01 32 1 01 1
V.13b: Pop group invalid, impute from deck
V.13e: Pop group invalid, impute from head
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 045 01 4 01 32 1 01 1 1
02 2 048 02 4 01 32 1 01 1 1
60
Editing examples:
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility
61
Telephones and cell phones (IV.16)
Telephone access is not applicable for households that have telephones or cell phones.
Households with responses to the telephone access question should not have telephones or cell phones.
Impute these variables from hot decks (based on dwelling type and tenure status) if necessary.
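The consistency rule between phone ownership and phone access can be sketched as below in Python. The codes (1 = yes, 2 = no), the item names `tel`, `cll`, and `acc`, and the way conflicts are resolved are assumptions of this sketch; the single-valued `deck` stands in for a hot deck keyed on dwelling type and tenure status.

```python
YES, NO = 1, 2  # assumed codes


def edit_phones(hh, deck):
    """Make telephone, cell phone, and access responses consistent.

    `hh` holds 'tel', 'cll', and 'acc' (None means blank);
    `deck` holds donor values for the same three items.
    Returns the names of the items that were changed.
    """
    changed = []
    answered_access = hh["acc"] is not None
    for item in ("tel", "cll"):
        if hh[item] not in (YES, NO):
            # An access response implies the household has no phone.
            hh[item] = NO if answered_access else deck[item]
            changed.append(item)
    has_phone = YES in (hh["tel"], hh["cll"])
    if has_phone and hh["acc"] is not None:
        hh["acc"] = None            # access is not applicable
        changed.append("acc")
    elif not has_phone and hh["acc"] is None:
        hh["acc"] = deck["acc"]     # access is required, so impute it
        changed.append("acc")
    return changed


house = {"tel": NO, "cll": None, "acc": 2}
print(edit_phones(house, {"tel": NO, "cll": YES, "acc": 4}), house)
```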
62
63
Telephones and cell phones
Many left all questions blank
– Problems with capture of continuation qsts
– Confusion of “blank” and “no” (also seen in disabilities section)
64
Summary report:
65
Case 1:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS
01 2 006 1 4 7 4 1 5 1 1 1 2 1 2 2 4
IV.16c: impute cell phone = no Phone2 Cell= Access= 2
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS
01 2 006 1 4 7 4 1 5 1 1 1 2 1 2 2 2 4
66
Case 2:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS
01 006 1 4 1 4 1 1 1 1 1 2 1 1 4
IV.16h: imputed cell = 1 from deck
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS
01 2 006 1 4 1 4 1 1 1 1 1 2 1 1 1 4
67
Case 3:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS
02 2 005 1 4 1 4 4 4 1 1
IV.13c: imputed television
IV.14c: imputed computer
IV.15c: imputed refrigerator
IV.16f: imputed telephone
IV.16h: imputed cell
IV.16j: imputed access
IV.17c: imputed rubbish
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS
02 2 005 1 4 1 4 4 4 1 1 1 2 2 2 2 1 4
68
Editing examples:
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility
69
Same-sex marriages (V.7, V.8, and V.12)
Treated as part of the marital status edits for heads and rest of household
Imputations for invalid sex never result in a same-sex marriage
No polygamous combinations of same-sex allowed
70
Same-sex marriages
Skepticism about same-sex marriages; only allowable if:
– Both partners 12 years or older;
– Both sexes valid;
– Relationships to head consistent (for sub-families);
– Both partners’ marital statuses reported as “living together” (4).
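The allowability test above might be expressed as a single predicate. The Python sketch below assumes sex codes 1 and 2 and dictionary records, and it leaves out the relationship-to-head check for sub-families, which needs the full household.

```python
VALID_SEXES = (1, 2)      # assumed codes
LIVING_TOGETHER = 4       # marital status code given on the slide
MIN_AGE = 12


def same_sex_couple_allowed(a, b):
    """Return True only if a reported same-sex couple passes the tests
    listed on the slide (sub-family relationship check omitted)."""
    return (
        a["sex"] in VALID_SEXES and b["sex"] in VALID_SEXES
        and a["sex"] == b["sex"]
        and a["age"] >= MIN_AGE and b["age"] >= MIN_AGE
        and a["mar"] == LIVING_TOGETHER and b["mar"] == LIVING_TOGETHER
    )


head = {"sex": 2, "age": 25, "mar": 4}
partner = {"sex": 2, "age": 21, "mar": 4}
print(same_sex_couple_allowed(head, partner))
```

A couple that fails the test is treated as a reporting error, and the sex or marital status is then edited as in the cases that follow.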
71
72
Same-sex marriages
Investigation shows that almost all of the reported same-sex marriages are erroneous.
Enumerator’s manual contains instructions that add bias against accurate collection.
Social situation in SA means that this might become a contentious issue.
73
Same-sex marriages
Enumerator’s Manual, pg 38:
“Question P-05: Marital Status …
Couples who are not married to each other but live together as if they are married, belong to category 4. This category is for people who live in every respect as a married couple except that they have not undergone a marriage ceremony. Only male/female couples should indicate this category – the census does not collect data on gay couples.”
74
Case 1:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 28 12 1930 071 01 1 02
02 1 17 06 1937 064 02 1 01
03 1 06 02 1935 066 06 8
04 2 06 03 1984 007 09 5 00 99
V.2b4b: age and DOB inconsistent, age <= DOB,Age=071 Date=28/12/1930
V.2b4b: age and DOB inconsistent, age <= DOB,Age=064 Date=17/06/1937
V.2b4b: age and DOB inconsistent, age <= DOB,Age=007 Date=06/03/1984
V.7i: same sex marriage w/ MSs not both 4
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 28 12 1930 070 01 1 02
02 2 17 06 1937 063 02 1 01
03 1 06 02 1935 066 06 8
04 2 06 03 1984 016 09 5 00 99
75
Case 2:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 2 16 01 1956 044 01 5 01 01
02 2 09 05 1991 009 02 5
V.2b4b: age and DOB inconsistent, age <= DOB,Age=044 Date=16/01/1956
V.7a: imputing SPN for head to point to spouse SPN=Spouse= 0002
V.7e: imputing head MS from female head MS= 5 SPN= 02
V.7g: spouse too young ... impute from age Head Age = 045 Sp Age= 009
V.7i: same sex marriage w/ MSs not both 4
V.7m: imputing sp MS from hot deck
V.7n: making spouse SPN point to head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 2 16 01 1956 045 01 1 02 01 01
02 1 09 05 1991 026 02 1 01
76
Case 3:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 2 03 03 1976 025 01 4 01 01 99 99
02 2 14 08 1979 021 02 4 99
03 1 03 08 1995 005 03 5 02 01
V.2b4b: age and DOB inconsistent, age <= DOB, Age=025 Date=03/03/1976
V.7a: imputing SPN for head to point to spouse
V.7h: same sex marriage, both head & spouse MS = 4
V.7n: making spouse SPN point to head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 2 03 03 1976 024 01 4 02 01 01 99 99
02 2 14 08 1979 021 02 4 01 99
03 1 03 08 1995 005 03 5 02 01
77
Editing examples:
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility
78
Fertility (V.27)
Fertility is not applicable to men, or to women outside ages 12 to 49.
For women aged 12 to 49, blanks in the fertility section are treated as zeros.
Handle common enumerator and reporting errors:
– Switched lines when turning to the next page;
– Husband reports fertility, not wife;
– Last child info recorded with the child, not the mother.
79
Notes:
TCEB = Total children ever born
MCEB = Male children ever born
FCEB = Female children ever born
TCS = Total children surviving
MCS = Male children surviving
FCS = Female children surviving
SXLAST = Sex of last child born
VSLAST = Vital status of last child born (still alive?)
YRLAST = Year of birth of last child born
MOLAST = Month of birth of last child born
80
Fertility
Fertility is valid if all of the following are true:
– TCEB = MCEB + FCEB, and
– TCS = MCS + FCS, and
– TCEB >= TCS, and
– MCEB >= MCS, and
– FCEB >= FCS, and
– number of boys in the household who declared this person as their mother (using mother person number) ≤ MCS, and
– number of girls in the household who declared this person as their mother (using mother person number) ≤ FCS, and
– woman's age ≥ (11 + TCEB), and
– FCEB>0 if SXLAST=female, and
– MCEB>0 if SXLAST=male, and
– FCS>0 if SXLAST=female and VSLAST=alive, and
– MCS>0 if SXLAST=male and VSLAST=alive, and
– all responses for last child born information (YRLAST, MOLAST, SXLAST, VSLAST) are complete and valid, or else they are all blank (indicating no births).
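The validity conditions translate almost line for line into code. The Python sketch below is illustrative: the lower-case field names mirror the abbreviations in the notes, the codes for sex (1 = male, 2 = female) and vital status (1 = alive) are assumptions, counts are assumed to be integers with blanks already set to zero, and the last-child items are checked only for completeness, not for valid ranges.

```python
MALE, FEMALE, ALIVE = 1, 2, 1  # assumed codes


def fertility_is_valid(w, boys_in_hh, girls_in_hh):
    """Check one woman's fertility block against the listed conditions.

    `boys_in_hh` and `girls_in_hh` count household members who give
    this woman's person number as their mother.
    """
    last = (w["yrlast"], w["molast"], w["sxlast"], w["vslast"])
    last_all_blank = all(v is None for v in last)
    last_complete = all(v is not None for v in last)
    last_alive = w["vslast"] == ALIVE
    checks = (
        w["tceb"] == w["mceb"] + w["fceb"],
        w["tcs"] == w["mcs"] + w["fcs"],
        w["tceb"] >= w["tcs"],
        w["mceb"] >= w["mcs"],
        w["fceb"] >= w["fcs"],
        boys_in_hh <= w["mcs"],
        girls_in_hh <= w["fcs"],
        w["age"] >= 11 + w["tceb"],
        w["sxlast"] != FEMALE or w["fceb"] > 0,
        w["sxlast"] != MALE or w["mceb"] > 0,
        not (w["sxlast"] == FEMALE and last_alive) or w["fcs"] > 0,
        not (w["sxlast"] == MALE and last_alive) or w["mcs"] > 0,
        last_all_blank or last_complete,
    )
    return all(checks)


woman = {"age": 38, "tceb": 4, "mceb": 2, "fceb": 2,
         "tcs": 4, "mcs": 2, "fcs": 2,
         "yrlast": 1991, "molast": 8, "sxlast": FEMALE, "vslast": ALIVE}
print(fertility_is_valid(woman, boys_in_hh=1, girls_in_hh=1))
```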
81
Fertility
Also, maximum number of children (24 total and 12 per sex).
When bad CEB or CS values can be calculated, then we do that.
When fertility is not valid, impute a consistent set of fertility responses from a deck (based on age, marital status, education level); then confirm last child born info from woman’s children in household.
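Calculating a bad value from the other two, before falling back on the deck, could look like the Python sketch below. The function name and the decision to return `None` when the values cannot be repaired are assumptions of this sketch; the limits of 24 in total and 12 per sex come from the slide.

```python
MAX_TOTAL, MAX_PER_SEX = 24, 12


def repair_counts(total, male, female):
    """Try to repair one bad value among total, male, and female counts.

    Returns a consistent (total, male, female) tuple, or None when the
    values cannot be calculated and must be imputed from a deck.
    """
    def ok(value, limit):
        return value is not None and 0 <= value <= limit

    ok_t, ok_m, ok_f = ok(total, MAX_TOTAL), ok(male, MAX_PER_SEX), ok(female, MAX_PER_SEX)
    if ok_t and ok_m and ok_f:
        # All three are plausible: accept only if they already agree.
        return (total, male, female) if total == male + female else None
    if ok_m and ok_f:
        return (male + female, male, female)        # TCEB = MCEB + FCEB
    if ok_t and ok_m and ok(total - male, MAX_PER_SEX):
        return (total, male, total - male)          # FCEB = TCEB - MCEB
    if ok_t and ok_f and ok(total - female, MAX_PER_SEX):
        return (total, total - female, female)      # MCEB = TCEB - FCEB
    return None


# Values from fertility Case 1, person 03: TCEB=71, MCEB=01, FCEB=00
print(repair_counts(71, 1, 0))
```

The same function applies to children surviving (TCS, MCS, FCS), since the slide treats CEB and CS values alike.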
82
83
Total Births (for women 12:49 years)
[Figure: chart of raw versus edited per cent of women by number of children; vertical axis runs from 0.0% to 60.0%.]
Total children still living (for women 12:49 years)
[Figure: chart of raw versus edited per cent of women by number of children; vertical axis runs from 0.0% to 60.0%.]
84
Case 1:
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 1 041
02 2 038 04 02 02 04 02 02 08 1991 2 1
03 2 022 71 01 00 01 01 00 06 1999 1 1
04 1 012
05 2 009
06 1 001
V.27: problems detected in fertility info ... PN= 03
V.27b: imputing TCEB = MCEB+FCEB PN= 03 TCEB=71 MCEB=01 FCEB=00
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 1 041
02 2 038 04 02 02 04 02 02 08 1991 2 1
03 2 022 01 01 00 01 01 00 06 1999 1 1
04 1 012
05 2 009
06 1 001
85
Case 2:
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 2 054
02 2 035 02 01 01 02 01 01 2 1
03 2 020 00
04 2 014 00
05 1 012
06 2 005
V.27: problems detected in fertility info ... PN=02
V.27POST: LAST info blank, imputing from youngest child PN= 02
(updates FCEB, TCEB, FCS, TCS)
V.27e: imputing fertility data from AFERTILITY PN=03
V.27e: imputing fertility data from AFERTILITY PN=04
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 2 054
02 2 035 02 01 01 02 01 01 11 1995 2 1
03 2 020 00 00 00 00 00 00
04 2 014 00 00 00 00 00 00
05 1 012
06 2 005
86
Case 3:
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 2 042 05 02 03 05 02 03 09 1997 2 1
02 2 021 01 01 01 01 04 1994 1 1
03 2 018 01 00 00 01 00 00 01 2001 1
04 1 020
05 1 014
06 2 003
07 1 002
08 2 000
V.27: problems detected in fertility info ... PN= 02
V.27c: imputing FCEB = TCEB-MCEB PN= 02
V.27g: imputing FCS = TCS-MCS PN= 02
V.27: problems detected in fertility info ... PN= 03
V.27b: imputing TCEB = MCEB+FCEB PN= 03
V.27j: imputing fertility from hot deck PN= 03
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 2 042 05 02 03 05 02 03 09 1997 2 1
02 2 021 01 01 00 01 01 00 04 1998 1 1
03 2 018 01 00 01 01 00 01 01 2001 2 1
04 1 020
05 1 014
06 2 003
07 1 002
08 2 000
87
Fertility
Issues:
– If woman reports zero TCEB and leaves rest blank, does that mean “no fertility” or “error”?
– See if last child born can be handled separately from rest of fertility, so that full set is not imputed when last child born has problems and rest is valid
88
Conclusions
Edits are part of the series of census procedures
Usually more for aesthetics than technical enhancement
Hardware and software changing rapidly
The revolution continues!