data preparation

Upload: saad-niazi

Post on 03-Jul-2015


Page 1: Data preparation
Page 2: Data preparation

Data preparation

Logging the Data
• Use Serial #
• Keep Original Forms

Checking for Accuracy
• Legible Response
• Item Non-response
• Physically Damaged

Developing Code Book
• Name of Variable
• Data Type
• Measurement Type
• Instruction

Checking for Accuracy (Data Cleaning)
• Out of Range
• Illogical
• Extreme Values
• Missing Values

Transformation
• Item Reversal
• Categories Collapsing
• Averaging or Subtotal
• Forming New Variables
• Standardization

Page 3: Data preparation

Logging the Data

• Keep the Original Form: Keep the original forms, because they let you revisit the recorded values and they add to the authenticity of the data.
• Use Serial Number: A serial number helps in matching each punched-in (entered) record with the corresponding paper form.
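The serial-number check can be sketched in a few lines of Python. This is only an illustration: the function name compare_serials and the sample serial numbers are made up, and it assumes each form carries one unique serial number.

```python
def compare_serials(form_serials, entered_serials):
    """Compare serial numbers on the paper forms with those in the data file."""
    forms = set(form_serials)
    entered = set(entered_serials)

    # Serial numbers that were typed in more than once
    seen, duplicates = set(), set()
    for serial in entered_serials:
        if serial in seen:
            duplicates.add(serial)
        seen.add(serial)

    return {
        "not_entered": sorted(forms - entered),  # form exists, no record
        "no_form": sorted(entered - forms),      # record exists, no form
        "duplicates": sorted(duplicates),
    }


# Hypothetical data: five forms, five entered records
report = compare_serials([1, 2, 3, 4, 5], [1, 2, 2, 4, 6])
```

Any serial listed under not_entered, no_form or duplicates points to a form that should be pulled and re-checked against the data file.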

Page 4: Data preparation

Checking for Accuracy

• Legible Response: Check whether the responses are comprehensible.
• Item Non-response: Some questions are left unanswered; this is called item non-response.
• Physically Damaged Questionnaire: The questionnaire is physically incomplete.

Page 5: Data preparation

Checking for Accuracy: Treatment of Unsatisfactory Response

Return to the Field
• Identifiable & accessible respondents

Discarding
• Sample is large with a large number of non-responses
• Large proportion of unsatisfactory responses
• Responses on key variables are missing

Assigning Missing Values
• Sample is large with a small number of non-responses
• Small proportion of unsatisfactory responses
• Responses on key variables are not missing

Page 6: Data preparation

What Coding Contains

Issues with Coding

Coding Process

Page 7: Data preparation

Issues with Coding

• Appropriateness: Best partitioning for hypothesis testing; availability of comparison data.
  Ex: Marks are converted into grades so that they are comparable.
• Exhaustiveness: It ensures that the options cover all the data.
  Ex:
  • Use of an "Other" option
  • "More than a specific value"
• Mutual Exclusivity: One person can be placed in one group only.
  Ex:
  • Married, Unmarried
  • 1 to 10, 11 to 20 & More than 20
• Single Dimension: "I am an unemployed teacher." This statement has two dimensions, namely:
  • Job
  • Job Status
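Exhaustiveness and mutual exclusivity can be checked mechanically. The sketch below is an assumption-laden illustration: the function names, the whole-number sample values and the deliberately flawed "bad" scheme are invented here; only the "1 to 10, 11 to 20 & More than 20" scheme comes from the slide.

```python
INF = float("inf")


def matching_categories(value, categories):
    """Return the labels of every category whose range contains value."""
    return [label for label, (low, high) in categories.items()
            if low <= value <= high]


def check_coding(values, categories):
    """Return (uncovered, overlapping) values for a coding scheme."""
    # Not exhaustive: the value falls into no category
    uncovered = [v for v in values if not matching_categories(v, categories)]
    # Not mutually exclusive: the value falls into more than one category
    overlapping = [v for v in values
                   if len(matching_categories(v, categories)) > 1]
    return uncovered, overlapping


good = {"1 to 10": (1, 10), "11 to 20": (11, 20), "More than 20": (21, INF)}
bad = {"1 to 10": (1, 10), "10 to 20": (10, 20)}  # hypothetical flawed scheme
sample = [1, 10, 15, 20, 35]

good_result = check_coding(sample, good)
bad_result = check_coding(sample, bad)
```

With the flawed scheme, the value 10 fits two groups and the value 35 fits none, which is exactly what the two rules are meant to prevent.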

Page 8: Data preparation

What Coding Contains

• Column Position
• Variable Name
• Variable Description
• Data Type
• Values & Missing Values
• Measurement Type

Page 9: Data preparation

Example of Coded Question

Q1: Do you currently have a paid job?

Variable Name: PJOB
Column: 5
Data Type: Numeric
Measurement: Nominal
Valid Code Values: 1 = Yes, 2 = No
Missing Code Values: 0 = Inappropriate, 9 = Not Ascertained
Instruction: In SPSS, an empty cell is missing data
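A code book entry like this one can also be written down as a small data structure and used to decode raw values. The sketch below is in Python rather than SPSS; the dictionary layout and the decode function are assumptions made for illustration, while the codes themselves are the ones listed for PJOB.

```python
# Code book entry for Q1, using the codes given on the slide
PJOB = {
    "question": "Do you currently have a paid job?",
    "column": 5,
    "valid": {1: "Yes", 2: "No"},
    "missing": {0: "Inappropriate", 9: "Not Ascertained"},
}


def decode(raw, entry):
    """Return (label, status) for one raw cell value.

    Assumes raw is None, an empty string, or something int() accepts.
    """
    if raw is None or raw == "":
        return (None, "missing")  # empty cell is treated as missing data
    code = int(raw)
    if code in entry["valid"]:
        return (entry["valid"][code], "valid")
    if code in entry["missing"]:
        return (entry["missing"][code], "missing")
    return (None, "invalid")      # a code the code book does not define


decoded = [decode(value, PJOB) for value in ["1", 2, 9, "", 3]]
```

Values that come back as "invalid" are candidates for the out-of-range check described under data cleaning.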

Page 10: Data preparation

Coding With SPSS

Page 19: Data preparation

Gender: Male / Female

Age: __________ Years

Qualification: Inter / Bachelor / Master

Income: Less than 10k / 11k to 20k / 21k to 30k / More than 30k

Do you like the given brands?

Coke    Yes  No
Pepsi   Yes  No
7 up    Yes  No
DEW     Yes  No

Do you use the given brands?

Coke    Yes  No
Pepsi   Yes  No
7 up    Yes  No
DEW     Yes  No

Develop the code book for the given questionnaire.

Page 20: Data preparation

Rating

#   Statements                           1 2 3 4 5
1   I find coke refreshing.              1 2 3 4 5
2   I find coke to be tasty.             1 2 3 4 5
3   I find coke to be expensive.         1 2 3 4 5
4   I would recommend coke to others.    1 2 3 4 5
5   Coke is easily available.            1 2 3 4 5

Rank the given brands (1 for the highest brand, 4 for the lowest brand).

• Coke
• Pepsi
• 7 up
• DEW

Page 21: Data preparation

Name       Data Type   Measurement   Values                                    Instruction
Gender     Numeric     Nominal       1 = M, 2 = F
Age        Numeric     Scale         -
Qualif     Numeric     Ordinal       1 = Inter, 2 = Bachelor, 3 = Master
Income     Numeric     Ordinal       1 = Less than 10k, 2 = 11k to 20k, ...
Like_C     Numeric     Nominal       1 = Yes, 2 = No
Like_P     Numeric     Nominal       1 = Yes, 2 = No
Like_7up   Numeric     Nominal       1 = Yes, 2 = No
Like_Dew   Numeric     Nominal       1 = Yes, 2 = No
Use_C      Numeric     Nominal       1 = Yes, 2 = No
Use_P      Numeric     Nominal       1 = Yes, 2 = No
Use_7up    Numeric     Nominal       1 = Yes, 2 = No
Use_Dew    Numeric     Nominal       1 = Yes, 2 = No

Page 22: Data preparation

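Name       Data Type   Measurement   Values                                    Instruction
Rank_C     Numeric     Ordinal       1 = 1st, 2 = 2nd, 3 = 3rd, 4 = 4th
Rank_P     Numeric     Ordinal       1 = 1st, 2 = 2nd, 3 = 3rd, 4 = 4th
Rank_7     Numeric     Ordinal       1 = 1st, 2 = 2nd, 3 = 3rd, 4 = 4th
Rank_D     Numeric     Ordinal       1 = 1st, 2 = 2nd, 3 = 3rd, 4 = 4th
C_Ref      Numeric     Scale         1 to 5 (see below)
C_Tas      Numeric     Scale         1 to 5 (see below)
C_Exp      Numeric     Scale         1 to 5 (see below)
C_Recom    Numeric     Scale         1 to 5 (see below)
C_Easy     Numeric     Scale         1 to 5 (see below)

Values for C_Ref to C_Easy: 1 = Strongly Disagree, 2 = Disagree, 3 = Neither Agree Nor Disagree, 4 = Agree, 5 = Strongly Agree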

Page 23: Data preparation

Data Cleaning

Consistency Checks
• Out of Range (Max & Min)
• Logically Inconsistent
• Extreme Values (Box Plot)

Missing Data
• Substitute
  • Neutral Value
  • Imputed Value
• Delete
  • Case Wise Deletion
  • Pair Wise Deletion

Data Exploration

Page 24: Data preparation

Data Cleaning: Out of Range Data

Example: Data is collected using a 5-point Likert scale.

Correct Value: In the range of 1 to 5

Incorrect Value: Any value outside this range

SPSS Routine: Max & Min
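The same Max & Min check can be sketched outside SPSS. The Python below is a minimal illustration; the function name out_of_range and the sample responses are made up, and only the 1 to 5 range comes from the example above.

```python
def out_of_range(values, low=1, high=5):
    """Return (position, value) pairs for values outside low..high."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical responses to one 5-point Likert item
responses = [4, 5, 1, 7, 3, 0, 2]

smallest, largest = min(responses), max(responses)
problems = out_of_range(responses)
```

If the minimum is below 1 or the maximum is above 5, at least one value was mis-entered; the positions in problems show which records to check against the original forms.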

Page 26: Data preparation

Data Cleaning: Extreme Values

Example: Data on age is collected from a group aged 30 to 45, but one respondent has an age of 65.

Correct Value: 30 to 45

Incorrect Value: Values out of this range

SPSS Routine: Box Plot
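A box plot flags points that lie far from the quartiles. The Python sketch below assumes the common 1.5 x IQR convention and the quartile method of the standard library, which may differ slightly from what SPSS draws; the ages are invented to match the example (a 30 to 45 group with one value of 65).

```python
from statistics import quantiles


def extreme_values(values, k=1.5):
    """Flag values more than k * IQR beyond the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)  # first, second and third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical ages: ten respondents within 30 to 45 and one of 65
ages = [30, 32, 33, 35, 36, 38, 40, 41, 43, 45, 65]
flagged = extreme_values(ages)
```

A flagged value is not automatically wrong; it is a prompt to go back to the form and confirm what the respondent wrote.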

Page 30: Data preparation

Data Cleaning: Missing Data Treatment

Page 31: Data preparation

Data Cleaning

Consistency Checks
• Out of Range
• Logically Inconsistent
• Extreme Values

Treatment of Missing Responses
• Substitute a Neutral Value: A neutral value, typically the mean, is substituted for the missing responses.
• Substitute an Imputed Response: Answers to other questions are used to impute, or calculate, the missing responses.
• Case-wise Deletion: Respondents with missing responses are discarded completely from the analysis. This can reduce the sample size considerably, because many respondents have some missing data.
• Pair-wise Deletion: The respondent is not discarded as a whole. A respondent with a missing value is left out only of the analyses that involve the missing variable. It is used when:
  1. The sample size is large
  2. There are few missing responses
  3. The variables are not highly related
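Three of these treatments can be compared side by side on a tiny data set. The Python below is a sketch only: the variable names q1 to q3, the data and the use of None for a missing response are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical responses; None marks a missing response
rows = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": 3, "q2": 3, "q3": 2},
]


def mean_substitute(rows, var):
    """Neutral value: replace missing responses on var with its mean."""
    fill = mean(r[var] for r in rows if r[var] is not None)
    return [fill if r[var] is None else r[var] for r in rows]


def casewise(rows):
    """Case-wise deletion: keep only respondents with no missing response."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, var_a, var_b):
    """Pair-wise deletion: keep respondents complete on these two variables."""
    return [(r[var_a], r[var_b]) for r in rows
            if r[var_a] is not None and r[var_b] is not None]


q2_filled = mean_substitute(rows, "q2")
complete_cases = casewise(rows)
q1_q3_pairs = pairwise(rows, "q1", "q3")
```

Case-wise deletion leaves two of the four respondents here, while pair-wise deletion keeps three for any analysis of q1 and q3, which illustrates why the latter preserves more of the sample.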

Page 32: Data preparation

Data Transformation

Forming New Variables
• Why you do it: Collecting the data directly is difficult or impossible
• How you do it: Transform, Compute Variable (SPSS)

Item Reversals
• Why you do it: Required for summated analysis
• How you do it: Transform, Recode Variable (SPSS)

Categories Collapsing
• Why you do it: Understanding improves
• How you do it: Transform, Recode Variable (SPSS)

Standardization
• Why you do it: Comparing data collected through different scales
• How you do it: Transform, Compute Variable (SPSS)

Missing Values
• Why you do it: Make up for the missing data
• How you do it: Transform, Replace Missing Values (SPSS)

Weighting
• Why you do it: Assigning weight to different groups
• How you do it: Data, Weight Cases (SPSS)

Page 33: Data preparation

Data Transformation

Forming New Variables
• Example: Data is collected on family members and income; we can use them to find per capita family income.

Item Reversals
• Example: Data is collected on a negative statement, but the construct requires its positive value.

Categories Collapsing
• Example: A five-point Likert scale is collapsed into two categories, Agree and Disagree.

Standardization
• Example: Variation in data collected through two different scales had to be compared.

Missing Values
• Example: Some answers were not present, but the data points were required for analysis.

Weighting
• Example: One group that was more present was given more weight.
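Two of these transformations, collapsing categories and standardization, can be sketched briefly in Python. The treatment of the neutral point (3) is not stated on the slide, so leaving it uncategorized is an assumption here, as are the function names and the use of the sample standard deviation for z-scores.

```python
from statistics import mean, stdev


def collapse(score):
    """Collapse a 5-point Likert score into Agree / Disagree.

    Assumption: the neutral point (3) is left uncategorized.
    """
    if score >= 4:
        return "Agree"
    if score <= 2:
        return "Disagree"
    return None


def standardize(values):
    """Convert values to z-scores: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


collapsed = [collapse(s) for s in [1, 2, 3, 4, 5]]
z_scores = standardize([10, 20, 30])
```

Once two variables are expressed as z-scores they share a mean of 0 and a standard deviation of 1, so their variation can be compared even if they were collected on different scales.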

Page 34: Data preparation

Tasks
1. Form Variable
2. Item Reversal
3. Averaging or Summing
4. Categories Collapsing
5. Standardization

Suppose Optim2 and Optim4 are collected on negative statements.

Page 35: Data preparation
Page 36: Data preparation

Forming New Variable

Find Income per Person
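The computation behind this task is a single division. The Python sketch below stands in for the SPSS Compute Variable step; the field names "income" and "members", the figures and the handling of missing or zero family size are all assumptions made for illustration.

```python
def income_per_person(family_income, family_members):
    """Per capita family income, or None when it cannot be computed."""
    if family_income is None or not family_members:
        return None  # missing income, or missing / zero family size
    return family_income / family_members


# Hypothetical households
households = [
    {"income": 60000, "members": 4},
    {"income": 45000, "members": 3},
    {"income": 30000, "members": 0},  # data-entry problem: no members
]

per_person = [income_per_person(h["income"], h["members"]) for h in households]
```

Returning None instead of dividing by zero keeps a bad record visible as missing rather than stopping the computation.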

Page 40: Data preparation

Item Reversal: Reverse items Optim2 & Optim4
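Reversing an item maps the lowest score to the highest and vice versa. The sketch below assumes the Optim items use a 1 to 5 scale, which the slides do not state; on such a scale the reversed score is 6 minus the original. The function name and sample scores are illustrative.

```python
def reverse_item(score, low=1, high=5):
    """Reverse a score on a low..high scale (assumed here to be 1 to 5)."""
    if score is None:
        return None  # a missing response stays missing
    return low + high - score


# Hypothetical raw scores on Optim2 for five respondents
optim2 = [1, 2, 3, 4, 5]
rev_optim2 = [reverse_item(s) for s in optim2]
```

The same function would be applied to Optim4; for a scale with different end points, only the low and high arguments change.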

Page 45: Data preparation

Summated Analysis

Averaging Optim1, Optim3, Rev_Optim2 & Rev_Optim4 into Optim
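The averaging step can be sketched as follows. This Python stands in for the SPSS computation; averaging over whichever items are present, rather than requiring all four, is an assumption, and the respondent data is invented.

```python
from statistics import mean

ITEMS = ["Optim1", "Optim3", "Rev_Optim2", "Rev_Optim4"]


def optim_score(row):
    """Average the four items, ignoring missing (None) responses."""
    present = [row[name] for name in ITEMS if row[name] is not None]
    return mean(present) if present else None


# Hypothetical respondents; the Rev_ items are already reversed
respondents = [
    {"Optim1": 4, "Optim3": 5, "Rev_Optim2": 4, "Rev_Optim4": 5},
    {"Optim1": 4, "Optim3": None, "Rev_Optim2": 2, "Rev_Optim4": 3},
]

scores = [optim_score(r) for r in respondents]
```

The reversed items are used in the average so that a high score means the same thing on every item.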
