Data preparation
Logging the Data
• Use Serial #
• Keep Original Forms
Checking for Accuracy
• Legible Response
• Item Non-response
• Physically Damaged
Developing Code Book
• Name of Variable
• Data Type
• Measurement Type
• Instruction
Checking for Accuracy
• Out of Range
• Illogical
• Extreme Values
• Missing Values
Transformation
• Item Reversal
• Categories Collapsing
• Averaging or Subtotal
• Forming New Variables
• Standardization
Logging the Data
• Keep the Original Form: Keep the original forms. They let you revisit the recorded values and they increase the authenticity of the data.
• Use Serial Number: A serial number on each form helps match the punched-in data with the data on the paper form.
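A minimal sketch of how serial numbers support this matching. The field name `serial` and all values are hypothetical; the idea is only that the serial numbers on the logged forms can be compared with those in the data file.

```python
# Serial numbers written on the paper forms (hypothetical values).
form_serials = {101, 102, 103, 104}

# Records punched into the data file; "serial" is an assumed field name.
entered = [
    {"serial": 101, "age": 34},
    {"serial": 102, "age": 41},
    {"serial": 104, "age": 38},
    {"serial": 104, "age": 83},  # the same form entered twice
]

entered_serials = [row["serial"] for row in entered]

# Forms that were logged but never entered.
not_entered = sorted(form_serials - set(entered_serials))

# Serial numbers that appear more than once in the data file.
duplicates = sorted({s for s in entered_serials if entered_serials.count(s) > 1})

print("Forms not yet entered:", not_entered)
print("Serials entered more than once:", duplicates)
```

Any serial that shows up in either list points to a form that should be pulled and checked by hand.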
Checking for Accuracy
• Legible Response: Check whether the responses are legible and comprehensible.
• Item Non-response: Some questions are left unanswered; this is called item non-response.
• Physically Damaged Questionnaire: The questionnaire is physically incomplete.
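Item non-response can be flagged mechanically once the data is entered. The sketch below assumes an unanswered question is stored as `None`; the serial numbers and question names are made up for illustration.

```python
# Each respondent maps question -> answer; None marks an unanswered question.
responses = {
    101: {"Q1": 1, "Q2": 4, "Q3": None},
    102: {"Q1": 2, "Q2": 3, "Q3": 5},
    103: {"Q1": None, "Q2": None, "Q3": 2},
}


def unanswered_items(answers):
    """Return the questions a respondent left unanswered."""
    return [question for question, answer in answers.items() if answer is None]


# Keep only the respondents who have at least one unanswered question.
flagged = {}
for serial, answers in responses.items():
    items = unanswered_items(answers)
    if items:
        flagged[serial] = items

print(flagged)
```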
Checking for Accuracy: Treatment of Unsatisfactory Response
• Return to the Field: Respondents are identifiable and accessible.
• Assigning Missing Values: The sample is large with a small number of non-responses, the proportion of unsatisfactory responses is small, and the response on key variables is not missing.
• Discarding: The sample is large with a large number of non-responses, the proportion of unsatisfactory responses is large, or the response on key variables is missing.
What Coding Contains
Issues with Coding
Coding Process
Issues with Coding
• Appropriateness: The categories give the best partitioning for hypothesis testing and allow comparison with available data. Ex: Marks are converted into grades so that they are comparable.
• Exhaustiveness: The options cover all the data. Ex: use of an "Other" option; "More than" a specific value.
• Mutual Exclusivity: One person can be placed in one group only. Ex: Married, Unmarried; 1 to 10, 11 to 20 & More than 20.
• Single Dimension: Each category set measures one dimension only. Ex: "I am an unemployed teacher." This statement has two dimensions, namely Job and Job Status.
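Exhaustiveness and mutual exclusivity can be checked mechanically for numeric categories. The sketch below uses the 1 to 10, 11 to 20, and More than 20 example; the function names and the second, deliberately faulty category set are illustrative assumptions.

```python
import math

# Categories from the example: (label, lowest value, highest value), bounds inclusive.
categories = [
    ("1 to 10", 1, 10),
    ("11 to 20", 11, 20),
    ("More than 20", 21, math.inf),
]


def matching_categories(value, categories):
    """Return the labels of every category that contains the value."""
    return [label for label, low, high in categories if low <= value <= high]


def check_coding(values, categories):
    """Return (uncovered, overlapping) values.

    Uncovered values show the categories are not exhaustive;
    overlapping values show they are not mutually exclusive.
    """
    uncovered = [v for v in values if len(matching_categories(v, categories)) == 0]
    overlapping = [v for v in values if len(matching_categories(v, categories)) > 1]
    return uncovered, overlapping


values = [1, 10, 11, 20, 21, 57]
print(check_coding(values, categories))

# A faulty scheme: 10 falls in two groups and nothing covers values above 20.
faulty = [("1 to 10", 1, 10), ("10 to 20", 10, 20)]
print(check_coding(values, faulty))
```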
What Coding Contains
• Column Position
• Variable Name
• Variable Description
• Data Type
• Values & Missing Values
• Measurement Type
Example of Coded Question
Q1: Do you currently have a paid job?
Variable Name: PJOB
Column: 5
Data Type: Numeric
Measurement: Nominal
Valid Code Values: 1 = Yes, 2 = No
Missing Code Values: 0 = Inappropriate, 9 = Not Ascertained
Instruction: In SPSS, an empty cell is missing data
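The same code book entry can be written as a small data structure, which makes it easy to tell valid codes from missing codes and from entry errors. This is a sketch only; the dictionary layout and the `classify` function are assumptions, not part of SPSS.

```python
# Code book entry for Q1, taken from the example above.
PJOB = {
    "question": "Do you currently have a paid job?",
    "column": 5,
    "data_type": "Numeric",
    "measurement": "Nominal",
    "valid": {1: "Yes", 2: "No"},
    "missing": {0: "Inappropriate", 9: "Not Ascertained"},
}


def classify(code, entry):
    """Classify a raw code as valid, missing, or not in the code book."""
    if code is None:
        # An empty cell is treated as missing data.
        return ("missing", "Empty cell")
    if code in entry["valid"]:
        return ("valid", entry["valid"][code])
    if code in entry["missing"]:
        return ("missing", entry["missing"][code])
    return ("invalid", "Code not in code book")


for raw in [1, 2, 0, 9, None, 7]:
    print(raw, classify(raw, PJOB))
```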
Coding With SPSS
Develop the code book for the given questionnaire

Gender: Male / Female
Age: __________ Years
Qualification: Inter / Bachelor / Master
Income: Less than 10k / 11k to 20k / 21k to 30k / More than 30k

Do you like the given brands?
Coke    Yes  No
Pepsi   Yes  No
7 up    Yes  No
DEW     Yes  No

Do you use the given brands?
Coke    Yes  No
Pepsi   Yes  No
7 up    Yes  No
DEW     Yes  No

Rating
#  Statements                          1 2 3 4 5
1  I find coke refreshing.             1 2 3 4 5
2  I find coke to be tasty.            1 2 3 4 5
3  I find coke to be expensive.        1 2 3 4 5
4  I would recommend coke to others.   1 2 3 4 5
5  Coke is easily available.           1 2 3 4 5

Rank the given brands (1 for highest brand, 4 for lowest brand):
Coke
Pepsi
7 up
DEW
Name      Data Type  Measurement  Values
Gender    Numeric    Nominal      1 = M, 2 = F
Age       Numeric    Scale        -
Qualif    Numeric    Ordinal      1 = Inter, 2 = Bachelor, 3 = Master
Income    Numeric    Ordinal      1 = Less than 10k, 2 = 11k to 20k, ...
Like_C    Numeric    Nominal      1 = Yes, 2 = No
Like_P    Numeric    Nominal      1 = Yes, 2 = No
Like_7up  Numeric    Nominal      1 = Yes, 2 = No
Like_Dew  Numeric    Nominal      1 = Yes, 2 = No
Use_C     Numeric    Nominal      1 = Yes, 2 = No
Use_P     Numeric    Nominal      1 = Yes, 2 = No
Use_7up   Numeric    Nominal      1 = Yes, 2 = No
Use_Dew   Numeric    Nominal      1 = Yes, 2 = No
Rank_C    Numeric    Ordinal      1 = first, 2 = 2nd, 3 = 3rd, 4 = 4th
Rank_P    Numeric    Ordinal      1 = first, 2 = 2nd, 3 = 3rd, 4 = 4th
Rank_7    Numeric    Ordinal      1 = first, 2 = 2nd, 3 = 3rd, 4 = 4th
Rank_D    Numeric    Ordinal      1 = first, 2 = 2nd, 3 = 3rd, 4 = 4th
C_Ref     Numeric    Scale        1 to 5 (see below)
C_Tas     Numeric    Scale        1 to 5 (see below)
C_Exp     Numeric    Scale        1 to 5 (see below)
C_Recom   Numeric    Scale        1 to 5 (see below)
C_Easy    Numeric    Scale        1 to 5 (see below)

Values for C_Ref to C_Easy: 1 = Strongly Disagree, 2 = Disagree, 3 = Neither Agree Nor Disagree, 4 = Agree, 5 = Strongly Agree
Data Cleaning
• Consistency Checks: Out of Range, Logically Inconsistent, Extreme Values
• Data Exploration: Max & Min, Box Plot
• Missing Data
  Substitute: Neutral Value, Imputed Value
  Delete: Case Wise Deletion, Pair Wise Deletion
Data Cleaning: Out of Range Data
Example: Data is collected using a 5-point Likert scale.
Correct Value: In the range of 1 to 5
Incorrect Value: Any value outside this range
SPSS Routine: Max & Min
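The Max & Min check works the same way outside SPSS. A minimal sketch with made-up responses, where the values 7 and 0 stand in for data entry errors:

```python
# Responses to one 5-point Likert item (hypothetical data).
responses = [4, 5, 3, 7, 1, 2, 0, 5]

VALID_MIN, VALID_MAX = 1, 5

# Step 1: the minimum and maximum reveal whether any value is out of range.
print("min =", min(responses), "max =", max(responses))

# Step 2: locate the offending cases so they can be checked against the forms.
out_of_range = [
    (position, value)
    for position, value in enumerate(responses, start=1)
    if not VALID_MIN <= value <= VALID_MAX
]
print(out_of_range)
```

The positions are counted from 1 so that they line up with case numbers in a data sheet.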
Data Cleaning: Extreme Values
Example: Data on age is collected from an age group of 30 to 45, but one respondent has an age of 65.
Correct Value: 30 to 45
Incorrect Value: Values outside this range
SPSS Routine: Box Plot
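A box plot marks points that lie far outside the middle half of the data. The sketch below applies the common 1.5 × IQR convention to made-up ages; the multiplier and the quartile method are assumptions, and SPSS may compute its box plot boundaries slightly differently.

```python
from statistics import quantiles

# Ages from a 30 to 45 group, with one suspicious value (hypothetical data).
ages = [30, 32, 33, 35, 36, 38, 40, 41, 43, 45, 65]

# Quartiles of the data; the middle value returned is the median.
q1, _median, q3 = quantiles(ages, n=4)
iqr = q3 - q1

# Fences of the box plot rule: points beyond them are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

extreme = [age for age in ages if age < lower_fence or age > upper_fence]
print("fences:", lower_fence, upper_fence)
print("extreme values:", extreme)
```

A flagged value is a prompt to go back to the original form, not proof of an error.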
Data Cleaning: Missing Data Treatment
Treatment of Missing Responses
• Substitute a Neutral Value: A neutral value, typically the mean, is substituted for the missing responses.
• Substitute an Imputed Response: The respondent's answers to other questions are used to impute, or calculate, the missing responses.
• Case-wise Deletion: Respondents with missing responses are discarded completely from the analysis. This can reduce the sample size considerably, because many respondents may have some missing data.
• Pair-wise Deletion: The whole respondent is not discarded. A respondent is left out only of the analyses that involve the missing value. It is used when:
  1. The sample size is large
  2. There are few missing responses
  3. The variables are not highly related
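Three of these treatments can be sketched in a few lines. The data, variable names, and function names below are hypothetical, and `None` stands for a missing response; imputation from other answers is left out because it depends on the specific questions.

```python
from statistics import mean

# Hypothetical data; None marks a missing response.
data = [
    {"id": 1, "q1": 4, "q2": 5},
    {"id": 2, "q1": None, "q2": 3},
    {"id": 3, "q1": 2, "q2": None},
    {"id": 4, "q1": 5, "q2": 4},
]
variables = ["q1", "q2"]


def observed(rows, var):
    """Return the non-missing values of one variable."""
    return [row[var] for row in rows if row[var] is not None]


def mean_substitute(rows, variables):
    """Neutral value: replace each missing response with the variable mean."""
    means = {var: mean(observed(rows, var)) for var in variables}
    filled = []
    for row in rows:
        new_row = dict(row)
        for var in variables:
            if new_row[var] is None:
                new_row[var] = means[var]
        filled.append(new_row)
    return filled


def casewise_delete(rows, variables):
    """Case-wise deletion: drop respondents with any missing response."""
    return [row for row in rows if all(row[var] is not None for var in variables)]


def pairwise_means(rows, variables):
    """Pair-wise deletion: each mean uses every respondent who answered that variable."""
    return {var: mean(observed(rows, var)) for var in variables}


filled = mean_substitute(data, variables)
complete = casewise_delete(data, variables)
means = pairwise_means(data, variables)
print(len(filled), len(complete), means)
```

With this data, case-wise deletion keeps two of the four respondents, while the pair-wise means are each based on three.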
Data Transformation: Type of Transformation, Why You Do It, How You Do It
• Forming New Variables
  Why: Collecting the data directly is difficult or impossible
  How: Data Transformation, Compute Variables (SPSS)
• Categories Collapsing
  Why: Understanding improves
  How: Data Transformation, Recode Variable (SPSS)
• Item Reversals
  Why: Required for summated analysis
  How: Data Transformation, Recode Variable (SPSS)
• Standardization
  Why: Comparing data collected through different scales
  How: Data Transformation, Compute Variables (SPSS)
• Missing Values
  Why: Make up for the missing data
  How: Transform, Replace Missing Values (SPSS)
• Weighing
  Why: Assigning weight to different groups
  How: Data, Weight Cases (SPSS)
Data Transformation: Type of Transformation and Example
• Forming New Variables: Data is collected on family members and income; we can use them to find per capita family income.
• Categories Collapsing: A five-point Likert scale is collapsed into two categories, Agree and Disagree.
• Item Reversals: Data is collected on a negative statement, but the construct requires its positive value.
• Standardization: Variation in data collected through two different scales has to be compared.
• Missing Values: Some answers were not present, but the data points were required for analysis.
• Weighing: One group that was more present in the data was given more weight.
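Two of these transformations, categories collapsing and standardization, are sketched below. Two choices are assumptions rather than anything stated in the slides: the neutral midpoint 3 is left uncategorised, and z-scores use the sample standard deviation.

```python
from statistics import mean, stdev


def collapse(score):
    """Collapse a five-point Likert score into Agree / Disagree."""
    if score in (1, 2):
        return "Disagree"
    if score in (4, 5):
        return "Agree"
    return None  # assumption: the neutral midpoint (3) is left uncategorised


def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (assumption)
    return [(value - centre) / spread for value in values]


likert = [1, 2, 3, 4, 5]
collapsed = [collapse(score) for score in likert]
print(collapsed)

# Scores on a 5-point and a 7-point scale become comparable as z-scores.
scale_5 = [1, 2, 3, 4, 5]
scale_7 = [1, 3, 4, 5, 7]
z_5 = standardize(scale_5)
z_7 = standardize(scale_7)
print(z_5)
print(z_7)
```

After standardization both sets of scores have mean 0 and standard deviation 1, so their variation can be compared directly.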
Tasks
1. Form Variable
2. Item Reversal
3. Averaging or Summing
4. Categories Collapsing
5. Standardization
Suppose Optim2 and Optim4 are collected on negative statements.
Forming New Variable
Find Income per Person
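The computation behind this step is a simple division of family income by family size. The sketch below uses made-up households and assumed field names, and returns `None` where the division is not possible.

```python
# Hypothetical households; the field names are assumptions.
households = [
    {"family_members": 4, "income": 40000},
    {"family_members": 5, "income": 30000},
    {"family_members": 0, "income": 25000},  # data entry error
]


def income_per_person(row):
    """Return income divided by family size, or None if it cannot be computed."""
    members = row["family_members"]
    if not members or members <= 0 or row["income"] is None:
        return None
    return row["income"] / members


for row in households:
    row["income_per_person"] = income_per_person(row)

print([row["income_per_person"] for row in households])
```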
Item Reversal
Reverse items Optim2 & Optim4
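Reversing an item means mapping the highest score to the lowest and vice versa. The sketch below assumes the Optim items are scored from 1 to 5, which the slides do not state; on that assumption the reversed score is 6 minus the original.

```python
# Assumed scoring range of the Optim items; adjust if the scale differs.
SCALE_MIN, SCALE_MAX = 1, 5


def reverse(score):
    """Reverse a score so that 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - score


# One hypothetical respondent.
respondent = {"Optim1": 4, "Optim2": 2, "Optim3": 5, "Optim4": 1}

# Store the reversed scores in new variables and keep the originals.
for item in ("Optim2", "Optim4"):
    respondent["Rev_" + item] = reverse(respondent[item])

print(respondent)
```

Writing the result to a new variable keeps the original responses available for checking.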
Summated Analysis
Average Optim1, Optim3, Rev_Optim2 & Rev_Optim4 to form Optim
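The averaging step can be sketched as follows, with two made-up respondents. The name `Optim` for the resulting score is taken from the slide; everything else is illustrative.

```python
from statistics import mean

# The two reversed items are used in place of the original negative items.
ITEMS = ["Optim1", "Optim3", "Rev_Optim2", "Rev_Optim4"]

# Hypothetical respondents with the reversed items already computed.
respondents = [
    {"Optim1": 4, "Optim3": 5, "Rev_Optim2": 4, "Rev_Optim4": 5},
    {"Optim1": 2, "Optim3": 3, "Rev_Optim2": 1, "Rev_Optim4": 2},
]

# The summated score is the mean of the four items for each respondent.
for row in respondents:
    row["Optim"] = mean(row[item] for item in ITEMS)

print([row["Optim"] for row in respondents])
```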