CHAPTER 4

EVALUATION OF DATA EDITING PROCESS

FOREWORD
by Leopold Granquist, Statistics Sweden
This Chapter aims to serve as a basis to elaborate and finally establish standards for useful indicators and/or statistical measures on the rationality and efficiency of the data processing, and for indicating the problem areas of the data collection. What should be measured and how it should be done are the underlying issues throughout the Chapter.

An important step towards the mentioned goal is taken in the first paper. It outlines the general requirements for a computer system to measure and monitor the impact of data editing and imputation for the 2001 UK Census of Population. The UK Office for National Statistics identified the need when evaluating the 1991 Census operation. The paper documents the requirements for the "Data Quality Monitoring System" (DQMS), which have been gathered so far. DQMS will be developed iteratively on a prototype basis and the requirements will be enhanced or re-prioritized as work proceeds. One key requirement will be to allow data experts to check the assumptions built into the capture, coding, editing, and imputation of census data, and the impact of these assumptions on the data. This requirement is broken down into two parts, standard reports and ad-hoc enquiry facilities. The latter will allow intuitive and complementary analysis of the data. A number of standard reports and requirements for the ad-hoc reports are proposed. All of them will tell implicitly or explicitly what is recognized as important to measure in processing census data.

The second paper, written by Bogdan Stefanowicz, proposes indicators on the rationality and the effectiveness of the set of edits in detecting all errors. The author suggests improvements to take into account that different types of errors may have a different impact on quality. It does not deal with errors introduced by editors. The efficiency indicator involves the number of undetected errors, which cannot be found from studies of error lists. The author suggests that it might be estimated by simulations, but does not discuss this issue further. The second part of the paper discusses the role of error lists in evaluating editing processes and as a basis for improvements of the data collection.

The third paper is an overview of selected evaluation studies. Most studies use the error list method in different ways, and perform analysis with the aid of computers. The rationality indicator suggested by Stefanowicz is used by some authors and called the hit-rate. A number of evaluations are focused on the efficiency of editing processes and raise the question whether resources spent on editing are justified in terms of quality improvements. In some cases the question can be answered by studying the impact of editing changes on the estimates. How to carry out such studies is also described in Chapter 1. However, those methods cannot be used for measuring the data quality. To obtain measures on how editing affects quality, it is necessary to conduct reinterview studies, record check studies or simulation studies. Examples of such studies are presented in the paper, which also provides hints as to how the various methods can be evaluated. Some results from almost all of the studies are given.

The fourth paper is a description of two evaluation studies, each one consisting of a comparison of micro data collected from two different data sources: survey data and administrative data. This is a unique situation. Firstly, among the evaluation studies discussed in the second paper by Granquist, there is no evaluation of statistics collected from administrative data files. Secondly, in general, data from administrative sources are not available for comparing survey micro-data with administrative data except for a few items from the register used as the sampling frame. Although the paper does not provide details concerning the editing methods, it covers many aspects of editing and evaluation. The need for resources to measure or assess the impact of editing is stressed.
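To make the hit-rate indicator concrete: it is the share of records flagged by the edits that turn out to actually contain an error. A minimal illustrative sketch in Python (the function name and counts are hypothetical, not taken from any of the papers):

```python
def hit_rate(flagged, confirmed_errors):
    """Rationality ('hit-rate') indicator: the proportion of records
    flagged by the edit rules that were confirmed to be in error.

    flagged           -- number of records the edits flagged for review
    confirmed_errors  -- number of flagged records found to contain a real error
    """
    if flagged == 0:
        return 0.0
    return confirmed_errors / flagged

# e.g. 5 000 records flagged, of which 1 200 contained confirmed errors
print(hit_rate(5_000, 1_200))  # -> 0.24
```

A low hit-rate suggests that many edits flag records needlessly, which is exactly the kind of efficiency question the evaluation studies in the third paper raise.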
STATISTICAL MEASUREMENT AND MONITORING OF DATA EDITING AND IMPUTATION IN THE 2001 UNITED KINGDOM CENSUS OF POPULATION

by Jan Thomas, Census Division, Office for National Statistics, United Kingdom
The Census Offices of the United Kingdom have identified the need to measure and monitor the impacts of the processing systems on census data. Requirements have been identified for a computer system to be known as the “Data Quality Monitoring System” to give this information.
The system will produce a series of standard and ad hoc reports, and will provide comparisons of distributions before and after editing and imputation and simple cross tabulations at various area levels. The data can be visualised, possibly by geographical area, to see where the error is occurring. Work has started on a prototype system and it is hoped that the prototype will be developed for use in the 1997 Census Test.
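A before/after comparison of distributions of the kind described above might be sketched as follows. This is purely illustrative; the item names and data are hypothetical and do not come from the DQMS design:

```python
from collections import Counter

def distribution(records, item):
    """Relative frequency distribution of one item over a set of records."""
    counts = Counter(r[item] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def compare_distributions(before, after, item):
    """Before/after shares per category, as a DQMS report might present them."""
    d_before = distribution(before, item)
    d_after = distribution(after, item)
    return {v: (d_before.get(v, 0.0), d_after.get(v, 0.0))
            for v in set(d_before) | set(d_after)}

# Hypothetical example: 'tenure' before and after imputation of a missing value
before = [{"tenure": "owned"}, {"tenure": "rented"}, {"tenure": None}]
after = [{"tenure": "owned"}, {"tenure": "rented"}, {"tenure": "owned"}]
print(compare_distributions(before, after, "tenure"))
```

A large shift between the two distributions for an area would be the kind of signal the data experts are intended to investigate.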
In addition, it is planned to appoint a team of “data experts” to analyse and interpret the results that will be reported from the system.
This paper outlines the general requirements of the system and those specifically relating to editing and imputation.
Keywords: editing; imputation; measuring and monitoring data quality.
1. INTRODUCTION
The UK Census Offices have identified the need to measure and monitor the impacts of processing systems on census data. The evaluation of the 1991 Census operation highlighted the fact that the facilities which were in place to monitor the quality of the data were inadequate, and were employed too late to have any impact on problem resolution. Research is currently underway to produce a computer system which will measure and monitor the data as it is being processed in the 2001 Census.
It is planned that the use of this system, to be known as the "Data Quality Monitoring System" (DQMS), will be extended to cover data capture, coding, derivation of variables and sampling. This paper considers only the editing and imputation requirements, as these are the ones relevant to the data editing process.
It is recognised that to operate this system a team of people who are experts in data analysis will be needed. It is planned to appoint a team of six "Data Experts", with the responsibility for monitoring data as it is processed.
2. BACKGROUND TO EDITING AND IMPUTATION IN THE BRITISH CENSUS
In the 1991 Census, the edit system checked the validity of data and performed sequence and structure checks. Invalid, missing and inconsistent items were identified for the imputation process; the editing process itself filled in only a few missing items. The edit matrices were constructed so as to consider every possible combination of values for the relevant items and to give the action (if any) required should that combination arise, by making the least number of changes.
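The "least number of changes" principle can be illustrated with a small sketch. The edit rules, item names and replacement values below are hypothetical and much simpler than the actual census edit matrices; the sketch only shows the idea of searching for the smallest set of items whose alteration satisfies all edits:

```python
from itertools import combinations

# Hypothetical consistency edits: each returns True when the record passes.
EDITS = [
    lambda r: not (r["age"] < 16 and r["marital_status"] == "married"),
    lambda r: not (r["age"] < 16 and r["economic_activity"] == "employed"),
]

def failed_edits(record):
    """Indices of the edits that the record fails."""
    return [i for i, edit in enumerate(EDITS) if not edit(record)]

def minimal_change_set(record, candidate_values):
    """Smallest set of items that, when changed to the candidate values,
    makes every edit pass (the 'least number of changes' principle)."""
    items = list(candidate_values)
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            trial = dict(record)
            for item in subset:
                trial[item] = candidate_values[item]
            if not failed_edits(trial):
                return subset
    return None

record = {"age": 14, "marital_status": "married", "economic_activity": "employed"}
# Changing 'age' alone satisfies both edits, so it is the minimal change.
print(minimal_change_set(record, {"age": 30, "marital_status": "single",
                                  "economic_activity": "student"}))
```

In practice the candidate values would come from imputation rather than being fixed in advance, but the minimisation idea is the same.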
Imputation was only carried out on the 100% questions and not on the sample (10%) questions; this was because most of the 10% questions have lengthy classifications, such as occupation, and hence are difficult to impute with any accuracy. Automatic imputation on a record-by-record basis was first introduced in the 1981 Census, and was based on the work by Fellegi and Holt in the 1970s, the so-called hot-deck methodology. This worked well in 1981, and so was carried through to 1991 with only minor changes for new questions.
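The essence of hot-deck imputation is that a missing item is replaced with the value last seen on a similar, already-processed record. The following sketch illustrates that idea only; it is not the actual 1981/1991 Census implementation, and the item and attribute names are invented:

```python
def hot_deck_impute(records, item, donor_attrs):
    """Sequential hot-deck imputation (illustrative sketch): a missing item
    is filled with the value most recently seen on a record that matches
    on the donor attributes."""
    deck = {}  # donor-attribute values -> last valid value of `item`
    for rec in records:
        cell = tuple(rec[a] for a in donor_attrs)
        if rec.get(item) is None:
            if cell in deck:           # impute from the most recent donor
                rec[item] = deck[cell]
        else:
            deck[cell] = rec[item]     # this record becomes the new donor
    return records

people = [
    {"sex": "F", "age_group": "30-44", "tenure": "owned"},
    {"sex": "F", "age_group": "30-44", "tenure": None},  # imputed as "owned"
    {"sex": "M", "age_group": "30-44", "tenure": None},  # no donor yet, left missing
]
hot_deck_impute(people, "tenure", ["sex", "age_group"])
```

Because the deck is updated as records stream past, the donor is always a recent record from the same processing area, which is what gives hot-deck its local plausibility.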
This paper documents the requirements for the DQMS which have been gathered so far. It is a working paper, as the DQMS will be developed iteratively on a prototype basis and the requirements will be enhanced or re-prioritised as work proceeds. The general system requirements are shown in italics for ease of reference and are classified as either "standard" or "ad-hoc". The specific requirements for editing and imputation are then listed.
Although the role of the data expert is not yet fully defined, it is anticipated that they will be in place sometime during 1998-99, and that they will become familiar with the data during the Dress Rehearsal for the Census which takes place in 1999. One way of organising the team would be to give them topic- and area-specific responsibilities. The Data Experts could get to know a geographical area, and geographical displays/boundaries could be available in a digital form. It might be possible to feed in some information based on the enumeration district (ED) grading so that hard-to-enumerate areas are apparent as possible problem areas from early on. Deviations (above pre-set thresholds) from the expected numbers (from external sources) would be highlighted during processing.
3. THE OPERATION OF THE DATA QUALITY MONITORING SYSTEM
The DQMS will be responsible for monitoring the quality of data from the point at which data is captured. It will be an automated system with standard and flexible outputs; it will work only on machine-readable data, with the ability to both print and provide electronic output. It must fit seamlessly into Census processing and not be the cause of any delay to the operation. Counts taken during processing will be compared with previous census data and with data from external sources. The DQMS may need to link to ArcInfo (used by the geography planners), spreadsheets and statistical interrogation packages.
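The kind of check described above, comparing counts taken during processing against expected numbers and highlighting deviations above a pre-set threshold, might be sketched as follows. The category names, counts and 10% threshold are illustrative assumptions, not DQMS specifications:

```python
def flag_deviations(observed, expected, threshold=0.10):
    """Flag categories whose processed counts deviate from the expected
    counts (e.g. previous census or external sources) by more than a
    pre-set relative threshold."""
    flags = {}
    for category, exp in expected.items():
        obs = observed.get(category, 0)
        rel_dev = abs(obs - exp) / exp if exp else float("inf")
        if rel_dev > threshold:
            flags[category] = rel_dev
    return flags

observed = {"households": 23_000, "persons": 50_300}   # counts during processing
expected = {"households": 20_000, "persons": 52_000}   # external benchmark
# Only 'households' exceeds the 10% threshold (15% deviation).
print(flag_deviations(observed, expected))
```

In the DQMS such flags would feed the standard reports, with the Data Experts deciding whether a flagged deviation reflects a processing problem or a genuine change in the population.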
3.1 General Requirements