Data Quality Control
by
Naila Baig Ansari
Research Fellow
Dept of Community Health Sciences
The Aga Khan University
Karachi, Pakistan
Who am I?
Education:
MSc (Epidemiology), The Aga Khan University, 2001. Thesis: Care and feeding practices and their association with stunting among young children residing in Karachi's squatter settlements
BBA (Management), The College of William and Mary, Williamsburg, VA, USA, 1989
Research interest: Nutritional and behavioral epidemiology, methodological issues in dietary assessment methods, household food security and gender-related issues, care and feeding practices, management of data and questionnaire designing
Learning Objectives
To know the steps necessary for ensuring quality assurance and control of data at various stages of a study
To understand the difference between pilot testing and pre-testing
To understand the importance of designing data collection instruments
To understand how data can be managed using an audit trail and the various techniques that can be used to inspect your dataset after it has been entered
Performance Objectives
Know the difference between quality assurance and quality control and ways to ensure them
Know the objectives of a pilot test and a pre-test
Understand how data collection instruments should be designed and coded
Be able to manage data using an audit trail
Be able to inspect datasets for errors and rectify them
Data Quality Control
Quality Assurance
– Activities to ensure quality of data before data collection
Quality Control
– Monitoring and maintaining the quality of data during the conduct of the study
Data Management
– Handling and processing of data throughout the study
Steps in Quality Assurance
1. Specify the study hypothesis
2. Specify general design to test study hypothesis; develop an overall study protocol
3. Choose or prepare specific instruments
4. Develop procedures for data collection and processing; develop operation manuals
5. Train staff; certify staff
6. Using certified staff, pretest and pilot-study data collection and processing instruments and procedures
Quality Assurance: Standardization of procedures
Why is standardization important?
– In order to achieve the highest possible level of uniformity and standardization of data collection procedures in the entire study population
Preparation of written manual of operations
– Detailed descriptions of exactly how the procedures specific to each data collection instrument are to be carried out (BP example)
– Q by Q's (question by question) instructions for interviews
Quality Assurance: Training of Staff
Aim to make each staff person thoroughly familiar with procedures under his/her responsibility
Training leads to certification of the staff member to perform a specific procedure
Quality Assurance: Pretesting and Pilot testing
Pretesting
– Involves assessing specific procedures on a sample in order to detect major flaws
Pilot Testing
– Formal rehearsal of study procedures
– Attempts to reproduce the whole flow of operations in a sample as similar as possible to study participants
Pretesting and Pilot testing results
Pretesting of questionnaire used to assess:
– flow of questions,
– presence of sensitive questions,
– appropriateness of categorization of variables,
– clarity of the q by q instructions to the interviewer
Pilot testing
– In addition to the above, flow of process
Quality Assurance: Data Management
Designing data collection
– Layout, questions to ask, sequence of questions, phrasing of questions, response categories, skip patterns
– Collect and record "raw", not processed information (e.g. age)
– Codebook: link between the questionnaire and the data entered in the computer
Code book example
Variable  QNo  Meaning            Codes               Format
Q1Id      Q1   Quest. No          1-750               C 3
Q2Sex     Q2   Respondent's sex   1 male              N 1.0
                                  2 female
Q3Child   Q3   No of children     99 no response      N 2.0
Q4Wt      Q4   Weight in kg       999 not recorded    N 3.1
Q5roof    Q5   Roof type          1 RCC               N 2.0
                                  2 Cement sheet
                                  3 Tin sheet
                                  4 Thatched
                                  Other (specify)
Quality Assurance: Use of a Code book
Variable names
– Up to 8 characters a-z and 0-9, must start with a letter
– Combination of question number and description (eg. q3age)
Meaning:
– short text description describing the meaning of the variable
– SPSS software can incorporate this info as variable labels and display it in the output
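The naming rule above can be checked mechanically before the codebook is finalized. A minimal sketch in Python, assuming names are compared in lower case; the function name `is_valid_varname` is hypothetical and not part of SPSS or any other package:

```python
import re

# Rule from the slide: up to 8 characters, a-z and 0-9, starting with a letter.
VARNAME_RE = re.compile(r"[a-z][a-z0-9]{0,7}")

def is_valid_varname(name: str) -> bool:
    """Return True if `name` follows the codebook naming rule."""
    return VARNAME_RE.fullmatch(name.lower()) is not None

# A few candidate names; the last two break the rule.
for name in ["q3age", "q5roof", "3age", "q10household"]:
    print(name, is_valid_varname(name))
```

Running such a check over all variable names catches names that are too long or start with a digit before any data are entered.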
Quality Assurance: Use of a Code book
Codes
– Try and use numerical codes
Pre-decide codes for no response and missing values, distinguishing the following cases:
– Question could not be asked or not applicable (eg. pregnancy outcome)
– Question was asked but respondent did not reply (eg salary)
– Respondent replied “don’t know”
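The three kinds of missing answer can be kept apart at analysis time if each has its own code. A minimal sketch in Python; only 99 ("no response") appears in the code book example, while 97 and 98 are assumed here purely for illustration:

```python
from collections import Counter

# Hypothetical codes decided before data collection.
MISSING_CODES = {
    97: "not applicable",
    98: "don't know",
    99: "no response",
}

def split_missing(values):
    """Separate real answers from pre-decided missing codes."""
    valid = [v for v in values if v not in MISSING_CODES]
    missing = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    return valid, missing

# Made-up answers to a "number of children" question.
q3child = [2, 0, 99, 4, 98, 1, 99]
valid, missing = split_missing(q3child)
print(valid)
print(dict(missing))
```

Keeping the codes in one place means summary statistics are computed on `valid` only, while the counts per reason remain available for reporting.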
Quality Control
Observation of procedures and performance of staff members for identification of obvious protocol deviations
Strategies include:
– Over-the-shoulder observation of staff
– Taping all interviews and reviewing a random sample
– Ongoing field supervision
– field editing by interviewer as well as field supervisor
– Office editing which includes coding
– log book maintenance
– Statistical assessment of trends over time in the performance of each observer/interviewer/technician
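The last strategy can start from something as simple as an error rate per interviewer per month, taken from the log book. A minimal sketch in Python; the log entries and their layout are hypothetical, and a real assessment would add a proper statistical test for trend:

```python
from collections import defaultdict

# Hypothetical log-book entries:
# (interviewer, month, forms checked, forms with errors)
log = [
    ("A", "month 1", 40, 2),
    ("A", "month 2", 38, 6),
    ("B", "month 1", 42, 3),
    ("B", "month 2", 45, 3),
]

def error_rates(entries):
    """Return {interviewer: [(month, error rate), ...]} in month order."""
    rates = defaultdict(list)
    for interviewer, month, checked, with_errors in sorted(entries):
        rates[interviewer].append((month, with_errors / checked))
    return dict(rates)

for interviewer, series in error_rates(log).items():
    print(interviewer, [(m, round(r, 3)) for m, r in series])
```

A rising rate for one interviewer, as for "A" in this made-up log, is a prompt for retraining or closer supervision.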
Data Management: Audit trail
Researcher should be able to trace each piece of information back to the original document:
– ID included in the original documents and in the dataset
– All corrections must be documented and explained
– All modifications to the dataset must be documented by command files
– Each analysis must be documented by a command file
Purpose of audit is to
– protect yourself against mistakes, errors, waste of time and loss of information
– enable external audit (revision)
Data Management: Handling of Data
Entering data
– Use professional data entry program like EpiData
Preparations
– complete codebook
– examine questionnaires for obvious inconsistencies, skip patterns
Data Management: Handling of Data
Error prevention:
– Set up a data entry form resembling your questionnaire
– Define valid values before entering data
– Double data entry by two different operators
– Compare contents to get a list of discrepancies (EpiInfo)
– Correct errors in both files and run a new comparison
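The comparison step of double data entry amounts to matching records on the questionnaire ID and listing every field where the two operators disagree. A minimal sketch in Python; it illustrates the idea only and is not how EpiInfo or EpiData implement it, and the function name and sample records are hypothetical:

```python
def compare_entries(first, second):
    """List discrepancies between two data-entry passes.

    Each argument maps questionnaire ID -> {variable: value}.
    Returns tuples (ID, variable, value in first, value in second).
    """
    discrepancies = []
    for qid in sorted(set(first) | set(second)):
        rec1, rec2 = first.get(qid), second.get(qid)
        if rec1 is None or rec2 is None:
            # Questionnaire entered by only one operator.
            discrepancies.append((qid, "<record>", rec1 is not None, rec2 is not None))
            continue
        for var in sorted(set(rec1) | set(rec2)):
            if rec1.get(var) != rec2.get(var):
                discrepancies.append((qid, var, rec1.get(var), rec2.get(var)))
    return discrepancies

entry1 = {"001": {"q2sex": 1, "q4wt": 12.5}, "002": {"q2sex": 2, "q4wt": 10.0}}
entry2 = {"001": {"q2sex": 1, "q4wt": 15.2}, "002": {"q2sex": 2, "q4wt": 10.0}}
for row in compare_entries(entry1, entry2):
    print(row)
```

Each listed discrepancy is resolved against the paper questionnaire, both files are corrected, and the comparison is repeated until the list is empty.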
First Inspection of data. Error Finding
Add variable and value labels to your data using a syntax command
Searching for errors
– make printouts of codebook from the data, overview of variables, simple frequency tables of appropriate variables
– compare codebook created with original codebook and see if label information is correct
– Inspect the generated summary/frequency tables for illegal or improbable minimum and maximum values of variables and inconsistencies (eg. 250 years age, pregnant male; 23 yr woman with 19 yr son)
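Range and consistency checks like these can also be scripted so that they are rerun after every correction. A minimal sketch in Python; the valid ranges, variable names and sample records are hypothetical, and sex code 1 = male follows the code book example:

```python
# Hypothetical valid ranges; in practice these come from the codebook.
VALID_RANGES = {"age": (0, 110), "sex": (1, 2)}

def check_record(rec):
    """Return a list of problems found in one record (a dict)."""
    problems = []
    for var, (low, high) in VALID_RANGES.items():
        value = rec.get(var)
        if value is None or not (low <= value <= high):
            problems.append(f"{var}={value} outside {low}-{high}")
    # Consistency between variables: a male respondent cannot be pregnant.
    if rec.get("sex") == 1 and rec.get("pregnant") == 1:
        problems.append("pregnant male")
    return problems

records = [
    {"id": "001", "age": 250, "sex": 2, "pregnant": 0},
    {"id": "002", "age": 30, "sex": 1, "pregnant": 1},
    {"id": "003", "age": 23, "sex": 2, "pregnant": 0},
]
for rec in records:
    for problem in check_record(rec):
        print(rec["id"], problem)
```

Checks that link two records, such as a mother's age against her son's, follow the same pattern but need both records to be looked up by ID.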
Calculate the error rate by
– randomly selecting 10% or at least 40 of your questionnaires and re-entering them into a new file
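The sampling rule can be written down so that the selection is random and reproducible. A minimal sketch in Python, reading "10% or at least 40" as the larger of the two; the error-rate definition (discrepant fields divided by fields re-entered) is an assumption, since the slide does not define it:

```python
import random

def sample_size(n_questionnaires):
    """10% of questionnaires (rounded up) or 40, whichever is larger,
    capped at the total number available."""
    ten_percent = (n_questionnaires + 9) // 10
    return min(n_questionnaires, max(40, ten_percent))

def draw_sample(ids, seed=0):
    """Randomly pick the questionnaires to re-enter (fixed seed so the
    selection can be reproduced and documented)."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(ids), sample_size(len(ids))))

def error_rate(n_discrepant_fields, n_fields_reentered):
    """Assumed definition: share of re-entered fields that differ."""
    return n_discrepant_fields / n_fields_reentered

ids = range(1, 751)  # questionnaire numbers 1-750, as in the code book example
chosen = draw_sample(ids)
print(len(chosen))
```

The re-entered file is then compared with the original entry for the chosen IDs, and the number of differing fields goes into `error_rate`.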
Correction of errors - Documentation
If errors are discovered
– Make corrections in a command file (SPSS syntax file), this will provide full documentation of changes made to the dataset
If errors are discovered when comparing files after double data entry
– you can make corrections directly in the data entered, provided you end this step with a comparison of the two files entered and corrected
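The slides refer to SPSS syntax files; the same idea of a command file can be illustrated in Python. In this minimal sketch every correction is listed with the old value, the new value and a reason, and is applied to a copy of the data. All IDs, values and reasons are hypothetical:

```python
import copy

# Each correction is documented: ID, variable, old value, new value, reason.
CORRECTIONS = [
    ("017", "q4wt", 125.0, 12.5, "decimal point misplaced; checked against questionnaire"),
    ("203", "q2sex", 3, 2, "illegal code; checked against questionnaire"),
]

def apply_corrections(data, corrections):
    """Return a corrected copy of `data` (ID -> record) and a log of changes."""
    corrected = copy.deepcopy(data)  # the primary data are never changed in place
    log = []
    for qid, var, old, new, reason in corrections:
        current = corrected[qid][var]
        if current != old:
            # Refuse to overwrite a value that is not the one documented.
            log.append(f"{qid} {var}: expected {old}, found {current}; NOT changed")
            continue
        corrected[qid][var] = new
        log.append(f"{qid} {var}: {old} -> {new} ({reason})")
    return corrected, log

raw = {"017": {"q2sex": 1, "q4wt": 125.0}, "203": {"q2sex": 3, "q4wt": 9.8}}
clean, log = apply_corrections(raw, CORRECTIONS)
print("\n".join(log))
```

Because the script can be rerun from the raw file at any time, it doubles as the documentation of every change made to the dataset.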
Correction of errors - Documentation
Split the process into distinct and well-defined steps and make sure that your documentation from one step to another is consistent
Archive
– once you have a “clean” documented version of your primary data, save one copy in a safe place and do your work with another copy
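One way to confirm later that a working copy still matches the archived "clean" version is to record a checksum when archiving. This is an added suggestion rather than something the slides prescribe. A minimal sketch in Python using the standard `hashlib` module; the file names are hypothetical and a temporary folder stands in for the real archive:

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration with temporary files standing in for the real datasets.
with tempfile.TemporaryDirectory() as tmp:
    archived = Path(tmp) / "primary_clean.csv"   # hypothetical file name
    working = Path(tmp) / "working_copy.csv"     # hypothetical file name
    archived.write_text("id,q2sex\n001,1\n002,2\n")
    working.write_text(archived.read_text())
    recorded = sha256_of(archived)               # store this with the archive
    print(sha256_of(working) == recorded)
```

If the working copy is ever modified outside a command file, its digest no longer matches the recorded one.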
Analysis
Make sure you use the right data set
– it is recommended to create command files for analysis that start with the command reading the dataset
Late discovery of errors and inconsistencies
Backing up vs Archiving
Backing up
– everyday activity
– purpose is to enable you to restore your data and documents in case of destruction or loss of data
– not only datasets, but also command files modifying your data, written documents such as the protocol, log book and other documenting information
Archiving
– takes place once or a few times during the life of the project
– purpose is to preserve your data and documents for a more distant future, maybe to even allow other researchers access to the information.