Page 1: 1 Population and Housing Census and Survey Editing Michael J. Levin Center for Population and Development Studies Harvard University Michael.levin@yahoo.com

1

Population and Housing Census and Survey Editing

Michael J. Levin
Center for Population and Development Studies
Harvard University
Michael.levin@yahoo.com

2

Appendix A Censuses where some of these methods were applied

Country              Census Years
American Samoa       1974, 1980, 1990, 2000
Ethiopia             2007
Fiji                 1996, 2007
Ghana                1984, 2000, 2010
Grenada              2001
Guam                 1980, 1990, 2000
Indonesia            1980, 2010
Kenya                1999
Kiribati             2005
Lesotho              1996, 2006
Malawi               1998, 2008
Maldives             2006
Marshall Islands     1973, 1980, 1988
Micronesia           1973, 1980, 1994, 2000
Northern Marianas    1973, 1980, 1990, 1995, 2000
Palau                1973, 1980, 1990, 1995, 2000, 2005
Papua New Guinea     1990
Samoa                2001
Sierra Leone         2004
Solomon Islands      1999
South Africa         2001
Sudan                2008
Tanzania             2002
Timor-Leste          2004
Tonga                1996, 2006
Uganda               1991, 2002
US Virgin Islands    1980, 1990, 2000
Vanuatu              1989
Zambia               2000

Note: For some censuses, processing occurred during the census; for others it occurred during preparation or during analysis (including own-children estimation).

3

Purpose of Handbook

No census data are ever perfect
Changes are made -- little documentation
Promote communication between subject specialists and programmers
“Cookbook” of suggestions -- presents possible resolutions
But country edit teams must decide

4

The Census Process

Data collection
Capture
Editing
Tabulation and Dissemination
Archiving

5

History of census editing

Early years – manual or nothing
Computers
Within record editing
Between record editing
Hot decking

6

Editing in Historical Perspective

Before computers: manual editing
With computers:
Increased complexity
Automated changes
Generalized editing packages
New philosophies of editing
Personal computers
Appropriate levels of computer editing

7

Major Elements in a Census

Preparatory work
Enumeration
Data processing -- keying, editing and tabulations
Building data bases and dissemination
Evaluation of results
Analysis of results

8

Errors in Census Process

Coverage errors
Questionnaire design
Enumerator/respondent errors
Coding errors
Data entry errors
Computer editing errors
Tabulation errors

9

What is editing

Editing is the systematic inspection of invalid and inconsistent responses, and their subsequent manual or automatic correction according to pre-determined rules.

The editing team!!

10

Editing Team

Appropriate internal subject matter specialists
Computer programmers
Work together as a team
Edit specs as means of communication
Outside experts -- academicians
Outside experts -- private sector

11

Why edit?

Edited vs unedited data
Always preserve original data
Consider the users!!

12

Table 1. Sample population by 15-year age group and sex, using unedited and edited data

                      Unedited data                          Edited data
Age group             Total    Male  Female  Not reported    Total    Male  Female
Total                 4,147   2,033   2,091            23    4,147   2,045   2,102
Less than 15 years    1,639     799     825            15    1,743     855     888
15 to 29 years        1,256     612     643             1    1,217     603     614
30 to 44 years          727     356     369             2      695     338     357
45 to 59 years          360     194     166             0      341     182     159
60 to 74 years          116      54      59             3      114      53      61
75 years and over        34      12      22             0       37      14      23
Not reported             15       6       7             2

13

TABLE 2. POPULATION AND POPULATION CHANGE BY 15-YEAR AGE GROUP WITH UNKNOWNS: 1990 AND 2000

                      Numbers          Number   Per cent   Per cent
Age group             2000    1990     Change   Change     2000    1990
Total                 4147    3319        828     24.9    100.0   100.0
Less than 15 years    1639    1348        291     21.6     39.5    40.6
15 to 29 years        1256     902        354     39.2     30.3    27.2
30 to 44 years         727     538        189     35.1     17.5    16.2
45 to 59 years         360     200        160     80.0      8.7     6.0
60 to 74 years         116      89         27     30.3      2.8     2.7
75 years and over       34      25          9     36.0      0.8     0.8
Not reported            15     217       -202    -93.1      0.4     6.5

Table showing trends with unknowns
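
The change columns in Table 2 follow directly from the two census counts. A minimal Python sketch using the figures from the table; the function and variable names are illustrative and not part of the handbook:

```python
# Counts by age group copied from Table 2 as (2000, 1990).
counts = {
    "Less than 15 years": (1639, 1348),
    "15 to 29 years": (1256, 902),
    "30 to 44 years": (727, 538),
    "45 to 59 years": (360, 200),
    "60 to 74 years": (116, 89),
    "75 years and over": (34, 25),
    "Not reported": (15, 217),
}


def change(new, old):
    """Return (number change, per cent change rounded to one decimal)."""
    diff = new - old
    return diff, round(100.0 * diff / old, 1)


for group, (y2000, y1990) in counts.items():
    diff, pct = change(y2000, y1990)
    print(f"{group:<20}{diff:>6}{pct:>8.1f}")
```

The "Not reported" row changes far more than any real age group, which illustrates how unknowns complicate comparisons between censuses.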

14

WHAT CENSUS EDITING SHOULD DO

1 Give users measures of the quality of the data

2 Identify the types and sources of error, and

3 Provide adjusted census results

15

Goals of the edit

Imputed household should closely resemble failed edit household

Imputed data should come from a single donor person or house resembling donee

Equally good donors should have equal chances

16

Basics of Census Editing

Systematic inspection and change (not always correction)
Fatal edits -- invalid or missing entries
Query edits -- inconsistencies
Must preserve the original data as much as possible
Quality enumeration is more important than editing
Edit does not improve data quality -- makes it more esthetic
Team must determine how far to go

17

More of Basics

Over-editing is harmful
Treatment of unknowns
Spurious changes
Determining tolerances
Learning from the edit process
Quality assurance
Costs of editing
Imputation
Archiving

18

How Over-editing is Harmful

Timeliness
Finances
Distortion of true values
A false sense of security

19

What we have to look out for

Treatment of unknowns
Spurious changes
Using tolerances
Learning from the editing process
Quality assurance
Costs of editing

20

Two parts of a national edit

Structure editing
Content editing

21

Editing Applications

Manual versus automatic correction
Guidelines for correcting data
Validity and consistency checks
Methods of correcting and imputing data
Other editing systems

22

Manual versus Automatic Correction

Manual correction: takes a long time and very subject to error
Automatic correction: faster and consistent. Not necessarily correct, just consistent.
Can look at many variables at the same time
Can keep an audit trail

23

Figure 1. Sample editing specifications to correct sex variable, in pseudocode

If SEX of the HEAD OF HOUSEHOLD = SEX of the SPOUSE
    If FERTILITY of the HEAD OF HOUSEHOLD is not blank
        If FERTILITY of the SPOUSE is blank
            If the SEX of the head of household is not already female
                Make the SEX = female
            Endif
            If the SEX of the spouse is not already male
                Make the SEX = male
            Endif
        Else
            Do something else because they have same sex and both have fertility !!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        Endif
    Else
        (This is the case where the head of household’s fertility is blank)
        If FERTILITY of the SPOUSE is not blank
            If the SEX of the head of household is not already male
                Make the SEX = male
            Endif
            If the SEX of the spouse is not already female
                Make the SEX = female
            Endif
        Else
            Do something else because BOTH have no fertility!!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        Endif
    Endif
Endif
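
The specification in Figure 1 could be programmed roughly as follows. This is a sketch only: the sex codes (1 = male, 2 = female), the dictionary layout, and the function name are assumptions, and the "do something else" branches are left as an unresolved message because the figure leaves that choice to the edit team.

```python
MALE, FEMALE = 1, 2  # assumed codes; Figure 1 does not fix them


def edit_sex(head, spouse):
    """Resolve a head/spouse pair reported with the same sex.

    Each person is a dict with 'sex' and 'fertility' (None when blank).
    Returns a list of messages describing what was changed.
    """
    log = []
    if head["sex"] != spouse["sex"]:
        return log  # sexes already differ, nothing to edit

    head_has_fertility = head["fertility"] is not None
    spouse_has_fertility = spouse["fertility"] is not None

    if head_has_fertility and not spouse_has_fertility:
        new_head, new_spouse = FEMALE, MALE
    elif spouse_has_fertility and not head_has_fertility:
        new_head, new_spouse = MALE, FEMALE
    else:
        # Both or neither report fertility: a fallback rule is needed.
        log.append("unresolved: needs fallback rule")
        return log

    # Change only the value that is not already correct.
    if head["sex"] != new_head:
        head["sex"] = new_head
        log.append("head sex changed")
    if spouse["sex"] != new_spouse:
        spouse["sex"] = new_spouse
        log.append("spouse sex changed")
    return log
```

Returning the messages rather than printing them keeps an audit trail that can later be tallied in a listing summary.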

24

Guidelines for Correcting Data

Make the fewest required changes possible to the originally collected data

Eliminate obvious inconsistencies among the entries

Systematically supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or group

When appropriate, use “not reported”

26

Types of editing

Top Down
• The usual way
• Is simple and straightforward
Multiple-variable editing approach
• Uses more information
• Is likely to be a better guess

27

Methods of Correction and Imputation

When imputation is not needed – toggling sexes
Static imputation – cold deck technique
Dynamic imputation – hot deck technique

28

Hot Deck Imputation

Geographic considerations
Use of related items
Sequence of the items
Complexity of the matrices
Standardized hot decks
Size of hot decks -- too big, audit trail, too small, difficult items
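
A hot deck can be pictured as a small matrix of donor values that is updated as valid records pass through the edit. The sketch below is illustrative only: the class, the choice of sex by age group as the related items, and the marital status codes are assumptions, not the handbook's specification. Records are assumed to arrive in geographic order, so the most recent donor is a nearby person.

```python
class HotDeck:
    """Imputation matrix keyed by related items (here: sex and age group)."""

    def __init__(self, seed):
        # Seed every cell with a static ("cold deck") value so that a
        # donor exists before the first valid record has been seen.
        self.cells = dict(seed)

    def update(self, key, value):
        """Store the latest valid response as the donor for this cell."""
        self.cells[key] = value

    def impute(self, key):
        """Return the most recent donor value for this cell."""
        return self.cells[key]


def age_group(age):
    # Two groups only, to keep the matrix small in this sketch.
    return "0-14" if age < 15 else "15+"


def edit_marital_status(records, deck, valid=frozenset({1, 2, 3, 4, 5})):
    """Impute invalid marital status from the deck; flag imputed items."""
    for rec in records:
        key = (rec["sex"], age_group(rec["age"]))
        if rec["mar"] in valid:
            deck.update(key, rec["mar"])   # valid record becomes a donor
        else:
            rec["mar"] = deck.impute(key)  # donee takes the donor's value
            rec["mar_imputed"] = True
    return records
```

With more related items the matrix grows quickly, and cells that are rarely updated hand out stale donors, which is one reading of the "too big" warning above.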

29

In developing hot decks

Imputation matrices – structure of the matrices
Standardized imputation matrices
Seeding the decks
Big, but not too big
Understanding what the matrix is doing
When the matrix is too small …
Occupation and industry!!

30

Aids to checking edits

1. Listings

2. Writing whole households before and after with changes

3. Frequency matrices
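
A listing summary of the kind shown in the figures that follow can be produced by tallying the edit messages by code. A minimal sketch, assuming each message is a (code, text) pair collected during the edit run; the function name is hypothetical:

```python
from collections import Counter


def listing_summary(messages, total_records):
    """Tally edit messages by code; return rows of (code, count, per cent)."""
    counts = Counter(code for code, _text in messages)
    return [
        (code, n, round(100.0 * n / total_records, 1))
        for code, n in sorted(counts.items())
    ]


# Example: three messages raised while editing 1,000 records.
messages = [
    ("P00-1", "Head is not first person"),
    ("P00-1", "Head is not first person"),
    ("P00-4", "Too many heads of household"),
]
for code, n, pct in listing_summary(messages, 1000):
    print(f"{n:>6} {pct:>5.1f}  *{code}*")
```

A code whose share is unexpectedly high usually points to a problem in the questionnaire, the data capture, or the edit itself.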

31

Figure 4. Example of a listing summary for Malawi 2008 Census [LISTING]

1718  336574    -  ******************************** ...                        -
1719  336574    -  ******* Age & Head ********* ...                            -
1720  336574    -  ******************************** ...                        -
1805    1546  0.1  *P00-1* Head is not first person, is %2d...           1748490
1823     877  0.1  *P00-2* No head of household, first person 14+...     1748490
1835      62  0.0  *P00-3* No head 14+, first person becomes head...     1748490
1850    5074  0.3  *P00-4* Too many heads of household - 1 ...           1748490
1860    5238  0.3  *P00-5* Remaining heads made other RELATIONSHI...     1748490
1874     939  0.1  *P00-6* After head edit, not one and only one ...     1748490
1889    2301  0.1  *P00-6a* Spouses too young made other relative...     1748490
1909    1062  0.1  *P00-6ax* Multiple spouses for unmarried head...      1748490
1911    1062  0.1  *P00-6ax* Multiple spouses for unmarried head...      1748490
1929      44  0.0  *P00-6a1* Crazy case where spouse is visitor a...     1748490
1949      89  0.0  *P00-6a3* Crazy case where spouse is visitor a...     1748490
1998      12  0.0  *P00-6a1* Extra spouses who are visitors...           1748490
2017    1483  0.1  *P00-6a2* Extra spouses not married...                1748490

32

Figure 5. Example of a listing summary for Lesotho 2006 Census [LISTING]

4388  21471    -  ...                                                    -
4389  21471    -  ******* Sisterhood Characteristics *********...        -
4390  21471    -  ...                                                    -
4401   1449  1.2  *G45-1* Total sisters out of range [%2d] illeg...    124839
4410   2897  2.3  *G45-2* Dead sisters out of range [%2d] illega...    124839
4419   3791  3.0  *G45-3* Pregnant sisters [%2d] illegal...            124839
4426   3895  3.1  *G45-4* At birth sisters [%2d] illegal...            124839
4433   4908  3.9  *G45-5* Week 6 sisters [%2d] illegal...              124839
4440    103  0.1  *G45-6* Sum of Dead Sisters [%2d][%2d][%2d] gr...    124839
4453      8  0.0  *G45-7* Sum of Dead Sisters [%2d][%2d][%2d] gr...    124839
4461    616  0.5  *G45-8* Dead Sisters [%2d] greater than total ...    124839

33

Figure 8. Example of a write listing for Ethiopia 2007 Census [WRITE]

BARCODE REGION ZONE WEREDA TOWN SUB_CITY SA KEBELE EA HHNO HUNO
-------------------------------------------------------------------------
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8 9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL RS LY ES MS MH FH MA FA MD FD LB
01 01 01 01 31 01 05 67 02 08 01 01 01 97 12 01 01 07 01
02 01 06 01 34 01 05 67 02 08 01 01 01 97 17 01 01 03 01
03 01 09 02 30 01 05 05 02 07 02 02 97 05 01 01 05 04 00 00 00 00 00 00 00
04 01 09 02 20 01 05 05 02 03 02 02 02 98 03 01 03 01 01 00 00 00 00 00 00
05 01 09 01 01 01 05 05 02 08 03 07 08 01
P18-3 No literacy , but schooling 97, so literate, PN = 3
P20-20 Unable to read and write 98 because never attended school , PN = 4
P16-1 Mother's vital status invalid = PN = 5
P17-1 Father's vital status invalid = PN = 5
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8 9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL RS LY ES MS MH FH MA FA MD FD LB
01 01 01 01 31 01 05 67 02 08 01 01 01 97 12 01 01 07 01
02 01 06 01 34 01 05 67 02 08 01 01 01 97 17 01 01 03 01
03 01 09 02 30 01 05 05 02 07 02 02 01 97 05 01 01 05 04 00 00 00 00 00 00 00
04 01 09 02 20 01 05 05 02 03 02 02 02 98 00 03 01 03 01 01 00 00 00 00 00 00
05 01 09 01 01 01 05 05 02 08 01 01 01 01

34

Figure 10. Example of a frequency distribution for Sudan 2008 Census [FREQUENCY]

Imputed Item Q18_ATTAINMENT: Education Attainment - all occurrences

Categories                  Frequency  CumFreq      %  Cum %  Net %  cNet %
 1 No Qualification               105      105    2.2    2.2    2.4     2.4
 2 Incomplete Primary            1564     1669   33.5   35.7   35.3    37.7
 3 Primary 4                      529     2198   11.3   47.0   11.9    49.6
 4 Primary 6                      492     2690   10.5   57.6   11.1    60.7
 5 Primary 8                      302     2992    6.5   64.0    6.8    67.5
 6 Junior 3                       251     3243    5.4   69.4    5.7    73.2
 7 Junior 4                        58     3301    1.2   70.7    1.3    74.5
 8 Secondary 3                     95     3396    2.0   72.7    2.1    76.6
 9 Secondary 4                      5     3401    0.1   72.8    0.1    76.7
10 Post Secondary Diploma           2     3403    0.0   72.8    0.0    76.8
11 University Degree              154     3557    3.3   76.1    3.5    80.3
12 Post Graduate Diploma           10     3567    0.2   76.3    0.2    80.5
13 Master                          52     3619    1.1   77.5    1.2    81.7
14 Ph.D                             1     3620    0.0   77.5    0.0    81.7
15 Khalwa                           1     3621    0.0   77.5    0.0    81.7
@17                               144     3765    3.1   80.6    3.2    85.0
@98                               667     4432   14.3   94.9   15.0   100.0
NotAppl                           240     4672    5.1  100.0
TOTAL                            4672     4672  100.0  100.0

35

Figure 11. Example of a frequency distribution for additional edit for Zambia 1990 Census [FREQUENCY]

Input: 1IN100.DAT    Program: ZAMHOUSE

ROOMS
-------------------------------------------------------------
Values      Number of                   Cum.
Imputed     Imputations    Percent      Percent
-------------------------------------------------------------
< 1               1,415      37.21        37.21
1                 2,185      57.45        94.66
2                   121       3.18        97.84
3                    22       0.58        98.42
4                    16       0.42        98.84
5                    23       0.60        99.45
6                    21       0.55       100.00
> 6                   -          -            -
-------------------------------------
                  3,803

36

Other considerations

Running the edit three times: seed, run, check
Saving original responses
Imputation flags
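
One way to save original responses and set imputation flags is to store both alongside the edited value. The sketch below is only an illustration; the record layout, the key names "original" and "flags", and the function name are assumptions.

```python
def impute_with_flag(record, item, new_value, rule):
    """Change an item while keeping the original response and a flag."""
    record.setdefault("original", {})
    record.setdefault("flags", {})
    # Keep the first reported value, even if the item is edited twice.
    record["original"].setdefault(item, record.get(item))
    record[item] = new_value
    record["flags"][item] = rule  # which edit rule made the change
    return record


person = {"pn": 1, "lan": None}
impute_with_flag(person, "lan", "06", "V.14c")
print(person)
```

Keeping the rule code in the flag lets tabulations separate reported from imputed values and lets analysts trace each change back to the edit specification.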

37

Computer Edit Specifications for Pilot Census 2001
Data Processing Project

Christopher S. Corlett

Data Processing Adviser

U.S. Census Bureau

38

Editing examples:

Language – the general edit
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility

Source for all data except language: South Africa Pilot Census 2001

39

Language Edit

If this is the head and language is missing, first look for someone else in the house with language, and assign that.

If this is the head without language and no one else in the house has language, use a neighboring head of similar characteristics to assign a best guess.

If this is someone else in the house and language is missing, assign the head’s language.
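
The three rules above can be sketched in Python. The record layout, the relationship code for head, and the single `neighbor_value` argument (standing in for a hot deck lookup of a neighboring head with similar characteristics) are all assumptions made for illustration.

```python
HEAD = 1  # assumed relationship code for head of household


def edit_language(household, neighbor_value):
    """Apply the language rules to one household (a list of dicts).

    Each person has 'relationship' and 'lan' (None when missing).
    """
    head = next(p for p in household if p["relationship"] == HEAD)

    if head["lan"] is None:
        # Rule 1: look for someone else in the house with a language.
        others = [p["lan"] for p in household
                  if p is not head and p["lan"] is not None]
        # Rule 2: otherwise use a neighboring head as a best guess.
        head["lan"] = others[0] if others else neighbor_value

    # Rule 3: everyone else with missing language gets the head's.
    for person in household:
        if person["lan"] is None:
            person["lan"] = head["lan"]
    return household
```

Editing the head first matters: once the head has a language, the remaining members can always be resolved within the household.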

40

Language Edit: Within House

91200217 Population Group Case = 0009
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 06 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1

41

Language Edit: Imputed House

91200697 Language Case = 0027
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 1 09 1 1
02 2 027 02 1 1 09 1 1
03 1 005 03 1 1 09 1 1
V.14d: P07 invalid, imputing from deck ALANGUAGE PN = 01 Lang =
V.15d: P08 invalid for head, impute from deck ARELIGIO PN = 01 Head Relig =
V.14f: P07 invalid, imputing from head PN = 02 Lang = Head's lang = 06
V.15f: P08 invalid, imputing from head's religion PN = 02 Relig = Head's relig = 38
V.14f: P07 invalid, imputing from head PN = 03 Lang = Head's lang = 06
V.15b: imputing P08 from mother's religion PN = 03 Relig = Mo relig = 38
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 06 38 1 09 1 1
02 2 027 02 1 06 38 1 09 1 1
03 1 005 03 1 06 38 1 09 1 1

42

Editing examples:

Language – the general edit
Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility

Source for all data: Pilot Census 2001

43

Young heads of household

V.3 (relationship for head) and V.5 (age of head)
Related issue: each HH must have 1 and only 1 head.
For invalid ages of heads, try to obtain via:
– spouse (impute from deck based on spouse's age and head's sex)
– otherwise, children (child's age and head's sex)
– otherwise, impute from deck (household size and head's sex)

44

Young heads

Skepticism about young heads; if younger than 12 then confirm:

– if someone else older is present, then make them the head (V.3)
– can't be married (must be 12+ years to be married)
– has to be 12 years older than biological children
– confirm consistency of age and educational level
– confirm consistency of age and educational institution
– can't have economic activity responses if younger than 10
– can't have fertility (for girls)

If head doesn't pass these age tests, then impute (based on head’s sex and household size).
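
Several of these tests can be sketched as a function that lists the reasons a young head fails. The field names and the female sex code are assumptions, and the two education checks are omitted because they need a country-specific table of plausible ages by level and institution.

```python
FEMALE = 2  # assumed sex code


def young_head_failures(head, household):
    """Return the failed checks for a head reported younger than 12."""
    failures = []
    age = head["age"]
    if age >= 12:
        return failures  # not a young head, nothing to confirm

    if any(p is not head and p["age"] > age for p in household):
        failures.append("older person present")
    if head.get("married"):
        failures.append("married under 12")
    # Head must be at least 12 years older than each biological child.
    if any(c["age"] > age - 12 for c in head.get("children", [])):
        failures.append("less than 12 years older than child")
    if age < 10 and head.get("econ_activity") is not None:
        failures.append("economic activity under 10")
    if head.get("sex") == FEMALE and head.get("ceb") not in (None, 0):
        failures.append("fertility reported")
    return failures
```

In this sketch an older person in the household is reported as a failure; in the edit described above that case is handled by making the older person the head (V.3) rather than by imputing the age.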

45

Young heads

Effect: number of heads younger than 12 years old drops from 1296 (1.3%) to 627 (0.6%)

48

Notes:

PN = Person number

SEX = Sex

DOB = Day of birth

MOB = Month of birth

YOB = Year of birth

REL = Relationship to head

MAR = Marital status

SPN = Spouse person number

CEB = Children ever born (total)

CS = Children surviving (total)

MPN = Mother person number

FPN = Father person number

Case 1:

49

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 11 01 1950 051 01 1 99 99 01
02 1 17 07 1977 023 03 5 01
03 2 04 04 1985 005 03 5 00 09 01
04 1 24 10 1987 011 03 5 53 01
05 1 01 07 1990 010 03 5 49 01
06 1 20 02 1994 007 01 5 99 01
07 1 20 02 1994 007 5 99 01

V.2b4b: age and DOB inconsistent, age <= DOB, Age = 005 Date = 04/04/1985
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 24/10/1987
V.3: either no heads or > 1= 0002
V.3h: more than 1 head =
V.3i: multiple heads, making oldest= 0051
V.3k: multiple heads, making excess other rel
V.9g: Relation invalid, has a dad, impute Rela

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 11 01 1950 051 01 1 99 99 01
02 1 17 07 1977 023 03 5 01
03 2 04 04 1985 015 03 5 00 09 01
04 1 24 10 1987 013 03 5 53 01
05 1 01 07 1990 010 03 5 49 01
06 1 20 02 1994 007 11 5 99 01
07 1 20 02 1994 007 03 5 99 01

50

Case 2:

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 2 01 09 1986 015 01 5 00 90

02 2 09 06 1990 011 06 5

03 1 01 09 1991 010 06 5 99

04 2 01 09 1994 007 06 5 99

V.2b4b: age and DOB inconsistent, age <= DOB, Age = 015 Date = 01/09/1986

V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 09/06/1990

V.2b4b: age and DOB inconsistent, age <= DOB, Age = 010 Date = 01/09/1991

V.2b4b: age and DOB inconsistent, age <= DOB, Age = 007 Date = 01/09/1994

V.3a1: head is younger than 16, Age = 014

V.3a3: no older relatives found; keep young head

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 2 01 09 1986 014 01 5 00 90

02 2 09 06 1990 010 06 5

03 1 01 09 1991 009 06 5 99

04 2 01 09 1994 006 06 5 99
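
The age and date-of-birth check behind the V.2b4b messages can be sketched as a comparison of the reported age with the completed years at the census reference date. The reference date below is hypothetical: the slides do not state it, and 1 March 2001 was chosen only because it reproduces the edited ages in the Case 2 listing.

```python
from datetime import date

# Hypothetical reference date; the slides do not give the real one.
REFERENCE = date(2001, 3, 1)


def age_from_dob(dob, reference=REFERENCE):
    """Completed years between date of birth and the reference date."""
    years = reference.year - dob.year
    # Subtract one year if the birthday has not yet occurred.
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years


def check_age(reported_age, dob, reference=REFERENCE):
    """Return (edited age, True if the reported age was changed)."""
    derived = age_from_dob(dob, reference)
    return derived, derived != reported_age


# Head of household in Case 2: reported age 15, born 01/09/1986.
print(check_age(15, date(1986, 9, 1)))
```

Here the date of birth is trusted over the reported age; an edit team could equally decide the reverse, or apply a tolerance of one year before making any change.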

51

Case 3:

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 1 12 01 1998 003 09 05

02 2 008 09 05

V.3b: no head of household!

V.3e: no head, making oldest person the head

V.5: head is younger than 12, about to confirm this

V.5e1: young head, but age consistent with educ lvl

V.5i1: young head, but age consistent with educ inst

V.5k: imputing young head's age from AHEADAGE for econ activity inconsistency

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 1 99 99 1908 092 01 05

02 2 12 01 1998 003 09 05

52

Editing examples:

Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility

53

Population Group (V.13)

For invalid population group, try to obtain via:
– Head of household
– Someone else in the household
– Otherwise, impute from deck (age by household size)

Effects:
– Removes 2.9% blank/invalid responses

54

Population Group (percents)

[Bar chart: per cent of population by population group (Black African, Coloured, Indian or Asian, White, Other, blank, invalid), raw versus edited data; vertical axis from 0.0% to 80.0%]

55

Population Group

Parts of the current edit might need refinement for South Africa

Issues to explore:
– Imputations in HHs with multiple pop groups;
– Tolerances and household size:
  • Case where whole HH has blank/invalid pop group;
  • Case where all but 1 HH member has same pop group;
  • Situations between these two extremes
– Effect on planning/data use of leaving the variable “not stated”

56

Population Group

57

Case 1:

PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM

01 1 073 01 1 06 55 1 09 1 1

02 2 063 02 1 06 55 1 09 1 1

03 2 025 11 1 06 55 1 09 1 1

04 1 016 09 1 06 55 1 09 1 1

05 1 014 09 1 06 55 1 09 1 1

06 2 011 09 1 06 55 1 09 1 1

07 2 000 11 1 09 1 1

V.13e: Pop group invalid, impute from head PN=07 Group=Head Group= 1

PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM

01 1 073 01 1 06 55 1 09 1 1

02 2 063 02 1 06 55 1 09 1 1

03 2 025 11 1 06 55 1 09 1 1

04 1 016 09 1 06 55 1 09 1 1

05 1 014 09 1 06 55 1 09 1 1

06 2 011 09 1 06 55 1 09 1 1

07 2 000 11 1 06 55 1 09 1 1

58

Case 2:

PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 032 01 3 02 39 1 08 710 1 1
02 2 028 02 1 08 1 1
03 1 068 07 1 08 1 1
04 2 057 07 1 08 1 1
05 2 007 03 1 06 1 1
06 1 006 03 1 08 1 1
07 1 001 03 1 08 1 1
08 2 030 12 1 07 09 1 1

V.13e: Pop group invalid, impute from head (SIX TIMES)

PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 032 01 3 02 39 1 08 1 1
02 2 028 02 3 02 39 1 08 1 1
03 1 068 07 3 02 39 1 08 1 1
04 2 057 07 3 02 39 1 08 1 1
05 2 007 03 3 02 39 1 06 1 1
06 1 006 03 3 02 39 1 08 1 1
07 1 001 03 3 02 39 1 08 1 1
08 2 030 12 1 07 39 1 09 1 1

59

Case 3:

PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM

01 1 045 01 01 32 1 01 1

02 2 048 02 01 32 1 01 1

V.13b: Pop group invalid, impute from deck

V.13e: Pop group invalid, impute from head

PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM

01 1 045 01 4 01 32 1 01 1 1

02 2 048 02 4 01 32 1 01 1 1

60

Editing examples:

Young heads of household
Population group
Access to telephones
Same-sex marriages
Fertility

61

Telephones and cell phones (IV.16)

Telephone access is not applicable for households that have telephones or cell phones.

Households with responses to the telephone access question should not have telephones or cell phones.

Impute these variables from hot decks (based on dwelling type and tenure status) if necessary.
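The three rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: codes 1 = yes and 2 = no are read off the cases, any non-blank access answer is treated as a response, DWL and TEN are taken to be dwelling type and tenure, and a household that reports both a phone and an access answer keeps the phone and loses the access answer (the slides do not say which side wins). `edit_phones` and the deck layout are hypothetical.

```python
YES, NO = 1, 2  # codes as they appear in the cases; the full code list is assumed

def edit_phones(hh, deck):
    """Edit TEL, CLL and ACC of one housing record in place.

    `deck` maps (DWL, TEN) to the last valid values seen for that cell.
    """
    key = (hh["DWL"], hh["TEN"])
    # Assumed starting values for a cell that has no donor yet.
    donor = deck.setdefault(key, {"TEL": NO, "CLL": NO, "ACC": NO})
    has_access_answer = hh.get("ACC") is not None
    for item in ("TEL", "CLL"):
        if hh.get(item) in (YES, NO):
            donor[item] = hh[item]   # a valid response refreshes the deck
        elif has_access_answer:
            hh[item] = NO            # an access answer implies no phone of its own
        else:
            hh[item] = donor[item]   # otherwise impute from the hot deck
    if YES in (hh["TEL"], hh["CLL"]):
        hh["ACC"] = None             # access is not applicable
    elif has_access_answer:
        donor["ACC"] = hh["ACC"]
    else:
        hh["ACC"] = donor["ACC"]
    return hh
```

A record with telephone = 2, a blank cell phone and access = 2 comes out with cell phone = 2, which is the kind of change logged as IV.16c.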

Telephones and cell phones

Many left all questions blank

– Problems with capture of continuation questions

– Confusion of “blank” and “no” (also seen in disabilities section)

Summary report:

Case 1:

DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS

01 2 006 1 4 7 4 1 5 1 1 1 2 1 2 2 4

IV.16c: impute cell phone = no Phone= 2 Cell= Access= 2

DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS

01 2 006 1 4 7 4 1 5 1 1 1 2 1 2 2 2 4

Case 2:

DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS

01 006 1 4 1 4 1 1 1 1 1 2 1 1 4

IV.16h: imputed cell = 1 from deck

DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS

01 2 006 1 4 1 4 1 1 1 1 1 2 1 1 1 4

Case 3:

DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS

02 2 005 1 4 1 4 4 4 1 1

IV.13c: imputed television

IV.14c: imputed computer

IV.15c: imputed refrigerator

IV.16f: imputed telephone

IV.16h: imputed cell

IV.16j: imputed access

IV.17c: imputed rubbish

DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD TV CMP FRG TEL CLL ACC RFS

02 2 005 1 4 1 4 4 4 1 1 1 2 2 2 2 1 4

Editing examples:

Young heads of household

Population group

Access to telephones

Same-sex marriages

Fertility

Same-sex marriages (V.7, V.8, and V.12)

Treated as part of the marital status edits for heads and rest of household

Imputations for invalid sex never result in a same-sex marriage

No polygamous combinations of same-sex allowed

Same-sex marriages

Skepticism about same-sex marriages; only allowable if:

– Both partners 12 years or older;

– Both sexes valid;

– Relationships to head consistent (for sub-families);

– Both partners’ marital statuses reported as “living together” (4).
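The four conditions can be read as a single predicate. The Python sketch below is one way to write it, not the production edit: sex codes 1 and 2 are assumed to be the only valid ones, the sub-family relationship check is passed in as a flag because the slides do not spell it out, and `same_sex_couple_allowed` is a hypothetical name.

```python
LIVING_TOGETHER = 4   # marital status code quoted on the slide
VALID_SEXES = (1, 2)  # assumed valid sex codes

def same_sex_couple_allowed(a, b, relationships_consistent=True):
    """True only if a reported same-sex couple meets all four conditions."""
    return (
        a["AGE"] >= 12 and b["AGE"] >= 12
        and a["SEX"] in VALID_SEXES and b["SEX"] in VALID_SEXES
        and relationships_consistent
        and a["MAR"] == LIVING_TOGETHER and b["MAR"] == LIVING_TOGETHER
    )
```

A couple who both report marital status 1 fails the last condition, so the edit would go on to change one partner's reported values; a couple who both report status 4 and meet the other conditions is kept.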

Same-sex marriages

Investigation shows that almost all of the reported same-sex marriages are erroneous.

Enumerator’s manual contains instructions that add bias against accurate collection.

Social situation in SA means that this might become a contentious issue.

Same-sex marriages

Enumerator’s Manual, pg 38:

“Question P-05: Marital Status …

Couples who are not married to each other but live together as if they are married, belong to category 4. This category is for people who live in every respect as a married couple except that they have not undergone a marriage ceremony. Only male/female couples should indicate this category – the census does not collect data on gay couples.”

Case 1:

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 1 28 12 1930 071 01 1 02

02 1 17 06 1937 064 02 1 01

03 1 06 02 1935 066 06 8

04 2 06 03 1984 007 09 5 00 99

V.2b4b: age and DOB inconsistent, age <= DOB, Age=071 Date=28/12/1930

V.2b4b: age and DOB inconsistent, age <= DOB, Age=064 Date=17/06/1937

V.2b4b: age and DOB inconsistent, age <= DOB, Age=007 Date=06/03/1984

V.7i: same sex marriage w/ MSs not both 4

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 1 28 12 1930 070 01 1 02

02 2 17 06 1937 063 02 1 01

03 1 06 02 1935 066 06 8

04 2 06 03 1984 016 09 5 00 99

Case 2:

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 2 16 01 1956 044 01 5 01 01

02 2 09 05 1991 009 02 5

V.2b4b: age and DOB inconsistent, age <= DOB, Age=044 Date=16/01/1956

V.7a: imputing SPN for head to point to spouse SPN=Spouse= 0002

V.7e: imputing head MS from female head MS= 5 SPN= 02

V.7g: spouse too young ... impute from age Head Age = 045 Sp Age= 009

V.7i: same sex marriage w/ MSs not both 4

V.7m: imputing sp MS from hot deck

V.7n: making spouse SPN point to head

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 2 16 01 1956 045 01 1 02 01 01

02 1 09 05 1991 026 02 1 01

Case 3:

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 2 03 03 1976 025 01 4 01 01 99 99

02 2 14 08 1979 021 02 4 99

03 1 03 08 1995 005 03 5 02 01

V.2b4b: age and DOB inconsistent, age <= DOB, Age=025 Date=03/03/1976

V.7a: imputing SPN for head to point to spouse

V.7h: same sex marriage, both head & spouse MS = 4

V.7n: making spouse SPN point to head

PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN

01 2 03 03 1976 024 01 4 02 01 01 99 99

02 2 14 08 1979 021 02 4 01 99

03 1 03 08 1995 005 03 5 02 01

Editing examples:

Young heads of household

Population group

Access to telephones

Same-sex marriages

Fertility

Fertility (V.27)

Fertility is not applicable to men, or to women outside the 12:49 age range.

For women 12:49, blanks in fertility section are treated as zeros.

Handle common enumerator and reporting errors:

– Switch lines when turning to next page;

– Husband reports fertility, not wife;

– Last child info with child, not mother.
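The applicability and zero-fill rules can be sketched as a small normalisation step. This Python fragment is illustrative only: it assumes sex code 2 means female (as the cases suggest), it touches only the six count items, and `normalize_fertility` is a hypothetical name.

```python
FEMALE = 2  # assumed sex code for women, as suggested by the cases
COUNT_ITEMS = ("TCEB", "MCEB", "FCEB", "TCS", "MCS", "FCS")

def normalize_fertility(person):
    """Blank the counts where fertility is not applicable; zero-fill otherwise."""
    eligible = person["SEX"] == FEMALE and 12 <= person["AGE"] <= 49
    for item in COUNT_ITEMS:
        if not eligible:
            person[item] = None        # not applicable: keep the item blank
        elif person.get(item) is None:
            person[item] = 0           # blank for an eligible woman means zero
    return person
```

The enumerator and reporting errors listed above (switched lines, husband reporting, last child recorded with the child) need household-level logic and are not covered by this fragment.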

Notes:

TCEB = Total children ever born

MCEB = Male children ever born

FCEB = Female children ever born

TCS = Total children surviving

MCS = Male children surviving

FCS = Female children surviving

SXLAST = Sex of last child born

VSLAST = Vital status of last child born (still alive?)

YRLAST = Year of birth of last child born

MOLAST = Month of birth of last child born

Fertility

Fertility is valid if all of the following are true:

– TCEB = MCEB + FCEB, and

– TCS = MCS + FCS, and

– TCEB >= TCS, and

– MCEB >= MCS, and

– FCEB >= FCS, and

– number of boys in the household who declared this person as their mother (using mother person number) ≤ MCS, and

– number of girls in the household who declared this person as their mother (using mother person number) ≤ FCS, and

– woman's age ≥ (11 + TCEB), and

– FCEB > 0 if SXLAST = female, and

– MCEB > 0 if SXLAST = male, and

– FCS > 0 if SXLAST = female and VSLAST = alive, and

– MCS > 0 if SXLAST = male and VSLAST = alive, and

– all responses for last child born information (YRLAST, MOLAST, SXLAST, VSLAST) are complete and valid, or else they are all blank (indicating no births).
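As a rough illustration, the validity conditions translate almost line for line into a Python predicate. The sketch assumes the counts are already integers (blanks zero-filled), that sex codes are 1 = male and 2 = female, and that VSLAST = 1 means alive; these codes are inferred from the cases rather than stated on the slides. Range checks on the last-child items are left out, so only their completeness is tested, and `fertility_is_valid` is a hypothetical name.

```python
MALE, FEMALE = 1, 2  # sex codes as they appear in the cases (assumed meaning)
ALIVE = 1            # assumed code for "last child still alive"

def fertility_is_valid(w, boys_in_hh, girls_in_hh):
    """Check one woman's record `w` (a dict) against the listed conditions.

    boys_in_hh / girls_in_hh count the children in the household whose
    mother person number points to this woman.
    """
    last = [w.get(k) for k in ("YRLAST", "MOLAST", "SXLAST", "VSLAST")]
    all_blank = all(v is None for v in last)
    all_present = all(v is not None for v in last)
    sx, vs = w.get("SXLAST"), w.get("VSLAST")
    checks = [
        w["TCEB"] == w["MCEB"] + w["FCEB"],
        w["TCS"] == w["MCS"] + w["FCS"],
        w["TCEB"] >= w["TCS"],
        w["MCEB"] >= w["MCS"],
        w["FCEB"] >= w["FCS"],
        boys_in_hh <= w["MCS"],
        girls_in_hh <= w["FCS"],
        w["AGE"] >= 11 + w["TCEB"],
        sx != FEMALE or w["FCEB"] > 0,
        sx != MALE or w["MCEB"] > 0,
        not (sx == FEMALE and vs == ALIVE) or w["FCS"] > 0,
        not (sx == MALE and vs == ALIVE) or w["MCS"] > 0,
        all_blank or all_present,  # range checks on these items are omitted
    ]
    return all(checks)
```

Keeping the conditions in one list makes it easy to report which check failed if the function is later extended to return the failing index instead of a single boolean.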

Fertility

Also, a maximum number of children is enforced (24 total and 12 per sex).

When a bad CEB or CS value can be calculated from the other values, we do that.

When fertility is not valid, impute a consistent set of fertility responses from a deck (based on age, marital status, education level); then confirm last child born info from woman’s children in household.
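The "calculate where possible" step can be sketched as follows. The idea is that when exactly one of total, male and female is blank or outside the limits and the other two are usable, the third follows by arithmetic; otherwise the record falls through to the hot deck. This Python fragment is a sketch under those assumptions: blanks are represented as `None` (so it would run before any zero-filling), the same function is applied to the CEB triplet and the CS triplet, and `repair_triplet` is a hypothetical name.

```python
MAX_TOTAL, MAX_PER_SEX = 24, 12  # limits quoted on the slide

def _usable(value, limit):
    return value is not None and 0 <= value <= limit

def repair_triplet(total, male, female):
    """Return a consistent (total, male, female), or None if not derivable."""
    t_ok = _usable(total, MAX_TOTAL)
    m_ok = _usable(male, MAX_PER_SEX)
    f_ok = _usable(female, MAX_PER_SEX)
    if t_ok and m_ok and f_ok:
        # All three in range: either already consistent, or no way to tell
        # which one is wrong.
        return (total, male, female) if total == male + female else None
    if m_ok and f_ok:
        return (male + female, male, female)   # total = male + female
    if t_ok and m_ok and 0 <= total - male <= MAX_PER_SEX:
        return (total, male, total - male)     # female = total - male
    if t_ok and f_ok and 0 <= total - female <= MAX_PER_SEX:
        return (total, total - female, female) # male = total - female
    return None  # caller falls back to hot-deck imputation
```

For example, `repair_triplet(71, 1, 0)` yields `(1, 1, 0)` and `repair_triplet(1, 1, None)` yields `(1, 1, 0)`, while an in-range but inconsistent set such as `(1, 0, 0)` returns `None` and is left for the deck.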

[Figure: Total births (for women 12:49 years). Percent of women (0.0% to 60.0%) by number of children, raw vs. edited.]

[Figure: Total children still living (for women 12:49 years). Percent of women (0.0% to 60.0%) by number of children, raw vs. edited.]

Case 1:

PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB

01 1 041

02 2 038 04 02 02 04 02 02 08 1991 2 1

03 2 022 71 01 00 01 01 00 06 1999 1 1

04 1 012

05 2 009

06 1 001

V.27: problems detected in fertility info ... PN= 03

V.27b: imputing TCEB = MCEB+FCEB PN= 03 TCEB=71 MCEB=01 FCEB=00

PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB

01 1 041

02 2 038 04 02 02 04 02 02 08 1991 2 1

03 2 022 01 01 00 01 01 00 06 1999 1 1

04 1 012

05 2 009

06 1 001

Case 2:

PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB

01 2 054

02 2 035 02 01 01 02 01 01 2 1

03 2 020 00

04 2 014 00

05 1 012

06 2 005

V.27: problems detected in fertility info ... PN=02

V.27POST: LAST info blank, imputing from youngest child PN= 02

(updates FCEB, TCEB, FCS, TCS)

V.27e: imputing fertility data from AFERTILITY PN=03

V.27e: imputing fertility data from AFERTILITY PN=04

PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB

01 2 054

02 2 035 02 01 01 02 01 01 11 1995 2 1

03 2 020 00 00 00 00 00 00

04 2 014 00 00 00 00 00 00

05 1 012

06 2 005

Case 3:

PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB

01 2 042 05 02 03 05 02 03 09 1997 2 1

02 2 021 01 01 01 01 04 1994 1 1

03 2 018 01 00 00 01 00 00 01 2001 1

04 1 020

05 1 014

06 2 003

07 1 002

08 2 000

V.27: problems detected in fertility info ... PN= 02

V.27c: imputing FCEB = TCEB-MCEB PN= 02

V.27g: imputing FCS = TCS-MCS PN= 02

V.27: problems detected in fertility info ... PN= 03

V.27b: imputing TCEB = MCEB+FCEB PN= 03

V.27j: imputing fertility from hot deck PN= 03

PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB

01 2 042 05 02 03 05 02 03 09 1997 2 1

02 2 021 01 01 00 01 01 00 04 1998 1 1

03 2 018 01 00 01 01 00 01 01 2001 2 1

04 1 020

05 1 014

06 2 003

07 1 002

08 2 000

Fertility

Issues:

– If woman reports zero TCEB and leaves rest blank, does that mean “no fertility” or “error”?

– See if last child born can be handled separately from rest of fertility, so that full set is not imputed when last child born has problems and rest is valid

Conclusions

Edits part of the series of census procedures

Usually more for aesthetics than technical enhancement

Hardware and software changing rapidly

The revolution continues!