By Cleophas Kiio Director, ICT 15-sep-10 1 The Best Practices in Census Data Processing Operation: Case of 2009 Census:


Overview
• Data processing activities review
• Planning for data processing
• Setting up the data processing site
• Implementation
• Data capture
• Analysis
• Dissemination
• Archival


Data Processing Activities Review

DP follows the completion of field data collection and entails the following:
• Capture
• Cleaning/Editing
• Tabulation
• Analysis
• Dissemination
• Archival


Planning for Data Processing (DP)

1. Identification of methodology/technology:
– Keying From Paper (KFP): manual data entry, largely used in KNBS for small surveys
– Keying From Image (KFI): scanning
– Optical Mark Reading (OMR): scanning
– Optical/Intelligent Character Recognition (OCR/ICR): scanning
– Online data capture: use of PC
– Use of mobile devices (PDA)
• For the 2009 Census, KNBS chose scanning technology with OCR/ICR, having used the same in the 1999 Census.
• A study tour to the US Census Bureau was conducted to study best practices.
• Major considerations were the budget and the availability of technical know-how.


Planning for Data Processing (DP) cont’d

2. Selection of Tools and Equipment:
• Computers: acquired 125 high-capacity computers with dual screens.
• Servers: 3 high-end servers handled the census processing (each with 32 GB memory, multiple processors and 1 terabyte of secondary storage).
• Storage: 3 high-capacity Storage Area Networks (SANs) were procured, initially 5 terabytes (TB) each but later upgraded to 14 TB each.
• Software:
– Capture software: given the challenges faced in the 1999 Census, where the Bureau used the AFPS Pro from Top Image Systems (TIS), the Bureau chose to use the iCADE system (integrated Computer Assisted Data Entry system) developed by the US Census Bureau.
– Cleaning/tabulation: CSPro (Census and Survey Processing System)
• Scanners: 3 new Kodak 1860 high-volume scanners were acquired in addition to the 2 existing Kodak 1900 scanners used during the 1999 Census. They are capable of scanning over 200 pages per minute (ppm).
• Network infrastructure: all computers, scanners, servers and SAN were connected in a wide area network (WAN).


Planning for Data Processing (DP) cont’d

3. Design of Questionnaires
• As standard practice, questionnaires are developed and designed with the technology to be used in mind.
• The 2009 Census questionnaires were designed by highly trained Bureau staff.
• Technical support was offered by the US Census Bureau.
• Precision in design was critical for compatibility with the iCADE system.


Setting Up the DP Site

1. Planning the layout (library, KFI, OCR/manual registration, server room, editing)
2. Installing the computer network
3. Installing the power supply system and provisioning for a power backup system: UPS and generators
4. Installing the furniture, lifts and air-conditioning
5. Procuring high-bandwidth internet
6. A warehouse for storage
7. Recruitment of staff


Implementation

– Installation of systems and testing was completed after census enumeration.
– Training on the Integrated Computer Assisted Data Entry (iCADE) system.
– In 2009 we had approximately 12 million A3 questionnaires.
– Close to 500 personnel were engaged for the processing.
– Processing took less than one (1) year to complete.


a) 2009 Data Capture Processes

• Tracking of questionnaires was done with a custom-made tracking system with an inbuilt geocode list to ascertain completeness and control flow.
• Guillotining: trimming/cutting off the spirals
• iCADE system processes:
o Batching: registering books from each enumeration area (EA) in the iCADE system
o Scanning
o Auto and manual registration
o Exception review
o OCR review
o Key From Image (KFI)
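The completeness check that a geocode-driven tracking system performs can be sketched as follows. This is a minimal illustration only, not the KNBS tracking system: the `Tracker` class, the `check_in` method and the sample geocodes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Toy questionnaire tracker keyed by enumeration area (EA) geocode."""

    expected: set                                  # master geocode list (format assumed)
    received: dict = field(default_factory=dict)   # geocode -> books checked in

    def check_in(self, geocode, books=1):
        # Flow control: refuse books whose geocode is not in the master list.
        if geocode not in self.expected:
            raise ValueError(f"unknown geocode: {geocode}")
        self.received[geocode] = self.received.get(geocode, 0) + books

    def missing(self):
        # Completeness: EAs from which no book has been checked in yet.
        return sorted(self.expected - set(self.received))


tracker = Tracker(expected={"101-01-001", "101-01-002", "101-02-001"})
tracker.check_in("101-01-001", books=2)
tracker.check_in("101-02-001")
print(tracker.missing())
```

Here the only EA still outstanding is the one that was never checked in, which is the kind of gap the geocode list is meant to expose before batching starts.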


[Figure: iCADE processes flow. Boxes: Library (questionnaires holding area); Check-in and guillotining; Batching; Scanning; Auto and manual registration; Exception review; OCR review; Key From Image (KFI); Images and script files database (server/SAN); Output data (server/SAN). Annotation: minimum interaction process flow.]


Capture Output
– Captured data was output to a text file, then auto-formatted as input to the CSPro software.
– OCR characters read: 2,485,008,272 with an accuracy rate of 99.86% (0.14% error)
– KFI characters keyed: 228,771,647 with an accuracy rate of 99.94% (0.055% error)
– This means the OCR read over 90% of the characters with a very high accuracy rate (OCR review definitely helped achieve this accuracy rate, but customized algorithms also had to be added to improve quality).
– 22,326,373 images from the census questionnaires
– 273,201 books in 144,098 batches
– 10,602 batches went to exception review, and 133,496 batches bypassed exception review altogether and went straight into OCR.
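The arithmetic behind these figures can be checked with a few lines of Python. The numbers are the ones quoted on the slide; the derived averages (images per book, books per batch) are not stated in the presentation and are computed here only as a consistency check.

```python
# Figures quoted on the slide.
ocr_chars = 2_485_008_272
kfi_chars = 228_771_647
images = 22_326_373
books = 273_201
batches_total = 144_098
batches_exception = 10_602
batches_direct = 133_496

total_chars = ocr_chars + kfi_chars
ocr_share = ocr_chars / total_chars

# Every batch either went to exception review or bypassed it.
assert batches_exception + batches_direct == batches_total

print(f"OCR share of characters: {ocr_share:.1%}")
print(f"Batches sent to exception review: {batches_exception / batches_total:.1%}")
print(f"Average images per book: {images / books:.1f}")
print(f"Average books per batch: {books / batches_total:.2f}")
```

The OCR share comes out at roughly 91.6%, which supports the "over 90%" statement, and the two batch counts add up exactly to the 144,098 total.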


b) 2009 Data Analysis
– KNBS used CSPro, freeware from the US Census Bureau.
– This process required:
o Subject matter specialists to provide editing rules
o Programmers to implement the editing rules through programs
– The team developed the editing program with which the data is cleaned.


Editing/cleaning and Imputation


• Editing is the systematic inspection of invalid and inconsistent responses, and their subsequent manual or automatic correction according to predetermined rules (edit specs).

• Imputation is the procedure of assigning values to missing, invalid, or inconsistent data using a set of predefined criteria embedded in an editing program.
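A rough sketch of how an edit rule and an imputation rule fit together is given below. It is written in Python for illustration, whereas the census editing program was built in CSPro; the value codes, the age limit and the hot-deck rule (reuse the value from the most recent acceptable record) are all assumptions, not the actual 2009 edit specs.

```python
VALID_SEX = {1, 2}   # assumed coding: 1 = male, 2 = female
MAX_AGE = 120        # assumed upper bound for a plausible age


def edit_and_impute(records):
    """Check each record against two toy edit rules and impute failures.

    A failing value is replaced with the value carried by the previous
    record after editing, which is a simple hot-deck rule.
    """
    donor_sex, donor_age = 1, 0   # arbitrary starting donors (assumption)
    cleaned, log = [], []
    for i, rec in enumerate(records):
        out = dict(rec)
        if out.get("sex") not in VALID_SEX:
            log.append(f"record {i}: sex={out.get('sex')!r} -> {donor_sex}")
            out["sex"] = donor_sex
        age = out.get("age")
        if not isinstance(age, int) or not 0 <= age <= MAX_AGE:
            log.append(f"record {i}: age={age!r} -> {donor_age}")
            out["age"] = donor_age
        # The edited record becomes the donor for the next one.
        donor_sex, donor_age = out["sex"], out["age"]
        cleaned.append(out)
    return cleaned, log


raw = [
    {"sex": 1, "age": 34},
    {"sex": 9, "age": 29},     # invalid sex code
    {"sex": 2, "age": None},   # missing age
    {"sex": 2, "age": 250},    # implausible age
]
cleaned, log = edit_and_impute(raw)
for line in log:
    print(line)
```

Keeping a log of every imputed value, as done here, is what makes it possible to identify the types and sources of error afterwards.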


Why Edit and Impute?
• Clean up data to facilitate analysis
• Identify types and sources of error
• Improve quality of census data
• Errors must be detected and their causes identified
• Appropriate corrective measures are taken to improve the overall data quality.


[Figure: Graphic flow of editing and imputation. Inputs: iCADE output data; codes book (dictionary); editing and imputation rules (edit specs). Process: data cleaning program. Output: clean data.]


c) Data Tabulation
– Process of producing data outputs (tables, frequencies, cross-tabulations, …)
– Requires subject matter specialists to prepare dummy output layouts, supported by programmers
– Data is then presented in these tabular layouts.
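The idea of filling a pre-agreed dummy layout with counts can be illustrated with a small cross-tabulation. This is a Python sketch under assumed codes and age groups, not the CSPro tabulation programs used for the census; the layout labels and sample records are hypothetical.

```python
from collections import Counter

# Dummy output layout agreed in advance: row and column labels (hypothetical).
ROWS = ["Male", "Female"]
COLS = ["0-14", "15-64", "65+"]
SEX_LABELS = {1: "Male", 2: "Female"}   # assumed coding


def age_group(age):
    if age < 15:
        return "0-14"
    if age < 65:
        return "15-64"
    return "65+"


def cross_tabulate(records):
    """Count persons in each (sex, age group) cell of the layout."""
    return Counter((SEX_LABELS[r["sex"]], age_group(r["age"])) for r in records)


def render(counts):
    """Fill the dummy layout with counts and return it as text."""
    lines = ["Sex".ljust(8) + "".join(c.rjust(7) for c in COLS) + "Total".rjust(7)]
    for row in ROWS:
        cells = [counts[(row, c)] for c in COLS]
        lines.append(row.ljust(8) + "".join(str(n).rjust(7) for n in cells)
                     + str(sum(cells)).rjust(7))
    return "\n".join(lines)


clean = [
    {"sex": 1, "age": 34}, {"sex": 1, "age": 29}, {"sex": 2, "age": 29},
    {"sex": 2, "age": 70}, {"sex": 2, "age": 8},
]
table = cross_tabulate(clean)
print(render(table))
```

Separating the layout (`ROWS`, `COLS`) from the counting step mirrors the division of labour on the slide: specialists define the table shape, programmers fill it from clean data.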


[Figure: Graphic flow of tabulation. Inputs: clean data; codes book (dictionary); area names; preferred presentation (tabulation specs). Process: tabulation program. Outputs: reports (Volume IA, Volume IB, Volume IC, Volume II, …).]


d) Data Dissemination
– Providing the public with information through census books, fliers, CDs, DVDs and online databases (Census info, IMIS, SMS service)

e) Data Archival
– Documentation for permanent storage for further and future analysis


Challenges

– The warehouse was located about 10 km from the processing centre.
– Inadequate processing space.
– Printing was not perfect, which affected the OCR.
– The limited number of lifts and the constant breakdown of the dedicated KNBS lift slowed down processing.
– Power outages posed a major challenge.
– Because the system was new, its acceptance was cautious and slow.


Best practices: Lessons learnt

– A comprehensive DP plan should be developed with clearly defined objectives:
1. Efficiency and effectiveness, to process in the shortest time possible.
2. Control of processing costs to avoid budget overruns.
3. Quality data output.
– Carry out risk analysis beforehand to identify potential pitfalls and put mitigation measures in place.


Best practices: Lessons learnt cont’d

– Cartographic mapping should be completed 1 year before the census.
– Geographical codes and related documentation (geo-codes) should be ready 6 months before enumeration.
– Timely acquisition of census tools and equipment.
– The DP site should be ready 6 months before the enumeration date for test runs.
– Technical and maintenance support measures must be instituted and enforced.


Best practices: Lessons learnt cont’d

– Questionnaires and manuals should be ready 5 months before the census date to allow for logistics and pretesting.
– Total quality control at the printing press must be ensured for precision printing.
– Recruitment and training of staff should be done before the census date.
– The DP site should be located in close proximity to the questionnaire warehouse.
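The lead times recommended in these lessons can be turned into concrete deadlines once a census date is fixed. The sketch below is illustrative only: the census date is hypothetical, the task labels are paraphrased from the slides, and the census date and enumeration date are assumed to be the same day.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`, clamping the day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Lead times in months, taken from the lessons-learnt slides.
LEAD_TIMES = {
    "Cartographic mapping complete": 12,
    "Geo-codes and documentation ready": 6,
    "DP site ready for test runs": 6,
    "Questionnaires and manuals ready": 5,
}

census_date = date(2029, 8, 24)   # hypothetical date, for illustration only

schedule = {task: months_before(census_date, m) for task, m in LEAD_TIMES.items()}
for task, deadline in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"{deadline.isoformat()}  {task}")
```

Working backwards from the census date in this way makes it easy to see which preparations must already be under way a full year ahead.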


Conclusion
• Despite the challenges, it was possible to complete DP in less than a year after the census.
• However, with better planning and organization, it is possible to complete the exercise within 6 months after enumeration.
• The lessons learnt may serve as recommendations which, if adopted, would allow the above to be attained.


Thank You!