topic 5 quality datafile_management
![Page 1: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/1.jpg)
Data File Management, Quality Checking a Dataset & Missing Values
Srinivasulu Rajendran
Centre for the Study of Regional Development (CSRD)
Jawaharlal Nehru University (JNU)
New Delhi
![Page 2: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/2.jpg)
Objective of the session
To understand data file management, quality checking of a dataset, and the handling of missing values using software packages
![Page 3: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/3.jpg)
1. What procedures should one follow before proceeding to statistical analysis in a software package?
2. How do we check the quality of data?
3. How do we organize the dataset in a software package?
![Page 4: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/4.jpg)
Data sources
International Food Policy Research Institute (IFPRI) – 2006-07
Bangladesh Bureau of Statistics – Household Income and Expenditure Surveys (HIES) – 2004/2005
Bangladesh Demographic and Health Survey (BDHS) - 2007
![Page 5: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/5.jpg)
IFPRI Dataset
Chronic Poverty Study (resurvey of 3 studies)
1. Micronutrients Gender/Agricultural Technology (1996-97) – 5 Thanas
2. Food for Education/Cash for Education (2000 (10 Thanas) & 2003 (8 Thanas))
3. Microfinance (1994) – 5 Thanas
Institutes involved: IFPRI, Chronic Poverty Research Center, Data Analysis and Technical Assistance
![Page 6: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/6.jpg)
In the 2006-07 resurvey, all thanas from the 1994, 1996-97 & 2003 rounds were resurveyed
![Page 7: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/7.jpg)
Micronutrients Gender/Agricultural Technology
Hereafter we refer to this as the MCG study, also known as Agricultural Technology or Ag Tech.
“A census of households was conducted in villages where the NGO had introduced the agricultural technology and comparable villages where NGO was operating, but where the new technologies had not yet been introduced”.
![Page 8: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/8.jpg)
Two major types of households were selected from the census:
1. NGO-member households adopting the agricultural technology
2. NGO-member households likely to adopt, in villages where the technology was not yet introduced
![Page 9: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/9.jpg)
330 Households
1304 HHs in the resurvey for AgriTech
AgriTech introduced – "A" type villages
110 NGO-member adopter HHs – "A" HHs
55 non-adopter non-NGO members & NGO members UNLIKELY to adopt – "C1" HHs
AgriTech not introduced – "B" type villages
110 NGO-member LIKELY adopter HHs – "B" HHs
55 non-likely-adopter non-NGO members & NGO members unlikely to adopt – "C2" HHs
![Page 10: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/10.jpg)
What procedures should one follow before proceeding to statistical analysis in a software package?
SPSS
![Page 11: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/11.jpg)
1. Identify the data file format and convert the file into the data file format of the relevant software (for SPSS, *.sav)
2. Make sure that ALL variables and observations have been converted into SPSS format
3. Identify the characteristics of the variables needed for the analysis
4. Keep the file name short
5. It is better to have no spaces in the file name
6. Organize the data files in one place, in a single folder
7. Whenever we work on the data, append the new commands to the previous programme file.
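Steps 2, 4 and 5 above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides, and it assumes that both the original file and the converted file have been exported to CSV text (reading *.sav directly would need a third-party library). The function names, the sample data and the 20-character limit on file names are all hypothetical.

```python
import csv
import io


def dataset_shape(csv_text):
    """Return (number of observations, list of variable names) for CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, header


def conversion_report(source_csv, converted_csv):
    """Compare the source and converted files: dropped variables, lost rows."""
    src_n, src_vars = dataset_shape(source_csv)
    new_n, new_vars = dataset_shape(converted_csv)
    return {
        "missing_variables": [v for v in src_vars if v not in new_vars],
        "observations_lost": src_n - new_n,
    }


def is_tidy_file_name(name, max_length=20):
    """True when the name has no spaces and is short (limit is an assumption)."""
    return " " not in name and len(name) <= max_length


# Hypothetical example: the conversion dropped one variable and one household.
source = "hhid,thana,income\n1,A,1200\n2,B,900\n3,A,1500\n"
converted = "hhid,thana\n1,A\n2,B\n"
report = conversion_report(source, converted)
print(report)
print(is_tidy_file_name("agtech_hh.sav"), is_tidy_file_name("ag tech hh.sav"))
```

A report with an empty `missing_variables` list and zero `observations_lost` suggests the conversion was complete; anything else is worth investigating before analysis.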
![Page 12: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/12.jpg)
How do we check quality of data?
There are a few things that need to be checked before we proceed to any statistical analysis:
1. Missing values
2. Wrong coding
3. Outliers
4. Number of digits in the variables (especially for value-term variables)
5. Unique ID numbers for the observations
6. Relevant variable characteristics, i.e. string, numeric, etc.
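Several of the checks listed above (missing values, wrong codes, unique IDs, variable type) can be illustrated with a short Python sketch. It is not from the original slides: the records, the variable names (`hhid`, `sex`, `income`) and the codebook are invented for illustration, and `None` is assumed to mark a missing value.

```python
from collections import Counter

# Hypothetical household records; None marks a missing value.
records = [
    {"hhid": 101, "sex": 1, "income": 1200.0},
    {"hhid": 102, "sex": 2, "income": None},
    {"hhid": 102, "sex": 3, "income": 950.5},   # duplicate id, invalid sex code
    {"hhid": 104, "sex": 1, "income": "870"},   # number stored as a string
]

VALID_CODES = {"sex": {1, 2}}   # assumed codebook
NUMERIC_VARS = ["income"]       # variables expected to be numeric


def missing_counts(rows):
    """Count missing values per variable."""
    counts = Counter()
    for row in rows:
        for var, value in row.items():
            if value is None:
                counts[var] += 1
    return dict(counts)


def invalid_codes(rows, codebook):
    """List (id, variable, value) for values outside the codebook."""
    return [
        (row["hhid"], var, row[var])
        for row in rows
        for var, valid in codebook.items()
        if row[var] is not None and row[var] not in valid
    ]


def duplicate_ids(rows, id_var="hhid"):
    """Return the ids that occur more than once."""
    counts = Counter(row[id_var] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)


def wrong_type(rows, numeric_vars):
    """List (id, variable) where a numeric variable holds a non-number."""
    return [
        (row["hhid"], var)
        for row in rows
        for var in numeric_vars
        if row[var] is not None and not isinstance(row[var], (int, float))
    ]


print(missing_counts(records))
print(invalid_codes(records, VALID_CODES))
print(duplicate_ids(records))
print(wrong_type(records, NUMERIC_VARS))
```

Each function returns the offending cases rather than fixing them, so the analyst can decide whether to correct, recode or drop each one.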
![Page 13: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/13.jpg)
SPSS has some good routines for detecting outliers
There is always the FREQUENCIES routine, of course.
The PLOTS command can do scatterplots of 2 variables.
The EXAMINE procedure includes an option for printing out
the cases with the 5 lowest and 5 highest values.
The REGRESSION command can print out scatterplots
(particularly good is *ZRESID by *ZPRED, which is a plot of
the standardized residuals by the standardized predicted
values). In addition, the regression procedure will produce
output on CASEWISE DIAGNOSTICS, which indicate which
cases are extreme outliers.
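To make two of these ideas concrete outside SPSS, here is a small Python sketch (an illustration, not SPSS output). It lists the lowest and highest cases, in the spirit of the EXAMINE option, and computes standardized residuals from a simple one-predictor regression, taken here as each residual divided by the residual standard error. The data and the cutoff of 2 are assumptions chosen for this small example.

```python
import statistics


def extreme_cases(values, k=5):
    """Return (k lowest, k highest) as lists of (case index, value) pairs."""
    ranked = sorted(enumerate(values), key=lambda pair: pair[1])
    return ranked[:k], ranked[-k:][::-1]


def standardized_residuals(x, y):
    """Residuals of a least-squares line of y on x, divided by the
    residual standard error (n - 2 degrees of freedom)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5
    return [r / s for r in residuals]


# Hypothetical data: y = 2x except for case 4, which was entered as 60.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, 6, 8, 60, 12, 14, 16, 18, 20]

lowest, highest = extreme_cases(y, k=2)
z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2]  # cutoff is an assumption

print(lowest, highest)
print(flagged)
```

In this ten-case example the flagged residual is about 2.7, so a cutoff of 3 would flag nothing; with so few cases a lower cutoff or a plot is more informative.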
![Page 14: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/14.jpg)
Detecting the problem
Scatterplots and frequencies can reveal atypical cases.
We can also look for cases with very large residuals.
Suspicious correlations sometimes indicate the presence of outliers.
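As a hedged illustration of the last point, the Python sketch below computes a Pearson correlation with and without one suspicious case. The data are invented: six cases lie on a perfectly decreasing line, and a seventh case (20, 20) is assumed to be a data-entry error.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: the last case is the suspected outlier.
x = [1, 2, 3, 4, 5, 6, 20]
y = [6, 5, 4, 3, 2, 1, 20]

r_all = pearson(x, y)             # all seven cases
r_trim = pearson(x[:-1], y[:-1])  # suspicious case excluded

print(round(r_all, 3), round(r_trim, 3))
```

Here a single case turns a perfectly negative relationship into a positive correlation, which is the kind of suspicious result that should prompt a look at the scatterplot.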
![Page 15: Topic 5 quality datafile_management](https://reader038.vdocuments.site/reader038/viewer/2022100600/55566c48d8b42abc5a8b4ba6/html5/thumbnails/15.jpg)
The difference between STATA & SPSS
Probably the most critical difference between
SPSS and STATA is that STATA includes
additional routines (e.g. rreg, qreg) for
addressing the problem of outliers, which we
will discuss in future classes.