Analyzing Surveys

Marcos Carzolio, Associate Collaborator for LISA, PhD Student, Department of Statistics, VT
Laboratory for Interdisciplinary Statistical Analysis


Post on 25-Feb-2016


Page 1: Analyzing Surveys

Analyzing Surveys

Marcos Carzolio

Associate Collaborator for LISA

PhD Student

Department of Statistics, VT

Laboratory for Interdisciplinary Statistical Analysis

Page 2: Analyzing Surveys

Outline

• Data Cleaning and Preprocessing
  • Outlier Detection
  • Missing Value Imputation

• Visualizing and Understanding Data
  • Boxplots, Histograms, and Scatterplots
  • Correlation Matrices

• Analyzing Data
  • Contingency Tables
  • Analysis of Variance (ANOVA)
  • Regression

Page 3: Analyzing Surveys

Laboratory for Interdisciplinary Statistical Analysis

www.lisa.stat.vt.edu

LISA helps VT researchers benefit from the use of Statistics

Experimental Design • Data Analysis • Interpreting Results

Grant Proposals • Software (R, SAS, JMP, SPSS...)

Our goal is to improve the quality of research and the use of statistics at Virginia Tech.

Page 4: Analyzing Surveys

How can LISA help?

• Formulate research question.

• Screen data for integrity and unusual observations.

• Implement graphical techniques to showcase the data – what is the story?

• Develop and implement an analysis plan to address research question.

• Help interpret results.

• Communicate! Help with writing the report or giving the talk.

• Identify future research directions.

Page 5: Analyzing Surveys

Laboratory for Interdisciplinary Statistical Analysis

Collaboration
From our website request a meeting for personalized statistical advice
Great advice right now: Meet with LISA before collecting your data

Short Courses
Designed to help graduate students apply statistics in their research

Walk-In Consulting
Monday–Friday 1-3 pm in 401 Hutcheson
Also, Tuesdays 1-3 pm in ICTAS Café X & Thursdays 1-3 pm in GLC Video Conf. Room
For questions requiring <30 mins

All services are FREE for VT researchers.

LISA helps VT researchers benefit from the use of Statistics

www.lisa.stat.vt.edu

Designing Experiments • Analyzing Data • Interpreting Results

Grant Proposals • Using Software (R, SAS, JMP, Minitab...)

Page 6: Analyzing Surveys

Some Useful Resources

• R Statistical Computing Software
  • Can be downloaded for free from: http://www.r-project.org/
  • R Studio, a free Integrated Development Environment: http://rstudio.org/

• For a more interactive and user-friendly experience, try JMP
  • Downloadable from the Virginia Tech software library: http://www2.ita.vt.edu/software/department/products/sas/jmp/index.html

• Amelia II: A Program for Missing Data
  • Visit: http://gking.harvard.edu/amelia/

Page 7: Analyzing Surveys

Types of Survey Data

Data Type | Description | Examples | Statistics
Nominal | Data with no intrinsic relative meaning behind labels | Strawberry, Banana, Hispanic | Mode
Ordinal | Data with an ordered structure | Small, Extra Large, Likert Scale* | Median and Percentiles
Interval (continuous or discrete) | Data with meaningful difference relations | Degrees in Celsius, Birthdates, GPS Coordinates | Mean, Standard Deviation, Correlation
Ratio (continuous or discrete) | Data with scale relations | Weight, Income, Length | Mean, Standard Deviation, Correlation
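
To make the table concrete, here is a minimal Python sketch that picks a summary statistic according to the measurement level. The slides work in R; Python's standard library is used here only to keep the example self-contained. All data values and the assumed ordering of the size labels are hypothetical.

```python
import statistics

# Hypothetical survey responses (not from the slides).
fruit = ["Strawberry", "Banana", "Banana", "Strawberry", "Banana"]   # nominal
size = ["Small", "Large", "Medium", "Small", "Extra Large"]          # ordinal
weight_kg = [61.0, 72.5, 58.0, 80.0, 66.5]                           # ratio

# Nominal: labels have no order, so only the mode is meaningful.
fruit_mode = statistics.mode(fruit)

# Ordinal: map labels to ranks (assumed ordering), then take the median rank.
size_order = ["Small", "Medium", "Large", "Extra Large"]
ranks = sorted(size_order.index(s) for s in size)
size_median = size_order[ranks[len(ranks) // 2]]  # odd n, so the middle rank

# Interval/ratio: differences are meaningful, so mean and SD apply.
weight_mean = statistics.mean(weight_kg)
weight_sd = statistics.stdev(weight_kg)  # sample standard deviation

print(fruit_mode, size_median, round(weight_mean, 2), round(weight_sd, 2))
```

Taking a mean of the ordinal ranks would run without error, but it would treat the gaps between labels as equal, which the ordinal level does not guarantee.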

Page 8: Analyzing Surveys

Outlier Detection and Handling

Outlier

• Outliers are data points that deviate far from the main body of data so as to arouse suspicion about their origins

• Visualize your data
  • Boxplots, histograms, and scatterplots

• Only remove outliers that are verifiable errors

• Extremeness in observations is not in itself cause for data removal

• R Package ‘outliers’
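
One common way to screen for suspicious points is the 1.5 × IQR rule that boxplots use to draw their whiskers. The Python sketch below (standard library only; the slides use R and its ‘outliers’ package, which is not reproduced here) flags candidates for review rather than deleting them, in line with the advice above. The function name `flag_outliers` and the height data are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical heights in inches; 720 looks like a data-entry error for 72.
heights = [64, 66, 67, 68, 69, 70, 71, 72, 720]
suspects = flag_outliers(heights)
print(suspects)
```

A flagged value is only a prompt to check the original questionnaire; it should be corrected or removed only if it can be verified as an error.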

Page 9: Analyzing Surveys

Missing Value Imputation

• Imputation is the process of filling in the missing values of a dataset

• Before considering imputation, try following up with respondents to get their true answers

• Imputation can be very tricky (come to LISA for help)

• If only one or two missing values are present in a vast dataset, use the mean of available values as a “best guess”

Honaker, James et al., AMELIA II: A Program for Missing Data
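
The "best guess" approach for one or two missing values can be sketched in a few lines of Python (standard library only; the slides use R). The helper name `impute_mean`, the use of `None` to mark a missing answer, and the data are assumptions made for illustration.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical responses with one missing value (None).
hours_exercise = [3.0, 5.0, None, 4.0, 8.0]
completed = impute_mean(hours_exercise)
print(completed)
```

Filling in the mean shrinks the variability of the variable, which is why this shortcut is reasonable only when very few values are missing; with more missing data, a multiple imputation tool such as Amelia II is the better route.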

Page 10: Analyzing Surveys

Visualizing Your Data

Boxplots

SAS/GRAPH(R) 9.2: Statistical Graphics Procedures Guide, Second Edition

Page 11: Analyzing Surveys

Visualizing Your Data

Histograms

Page 12: Analyzing Surveys

Visualizing Your Data

Scatter Plots

Page 13: Analyzing Surveys

Understanding Your Data

Correlation Matrices
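
A correlation matrix simply collects the pairwise Pearson correlations between numeric (interval or ratio) variables. The following Python sketch computes one by hand using the standard library; the column names and values are hypothetical and are not the survey data from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}

# Hypothetical numeric survey columns.
data = {
    "height": [60, 62, 65, 68, 70],
    "dad_height": [66, 67, 69, 72, 73],
    "exercise": [5, 1, 4, 2, 3],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, [round(r, 2) for r in row.values()])
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be read.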

Page 14: Analyzing Surveys

Contingency Tables

• Tabulates the number of responses in each category

• Helps to visualize the distribution of data

• Use the approximate χ² test for independence

Pearson's Chi-squared test

data: tab
X-squared = 0.7658, df = 2, p-value = 0.6819

Warning message:
In chisq.test(tab) : Chi-squared approximation may be incorrect
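
To show what the test computes, here is a Python sketch (standard library only; the slide's output comes from R's `chisq.test`) of the Pearson statistic, which sums (observed − expected)² / expected over the cells. The 2 × 3 table of counts is hypothetical, since the slide does not show `tab`. For df = 2 the upper-tail probability has the closed form exp(−x/2), which lets the reported p-value be checked without a statistics library.

```python
import math

def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 3 table of counts (not the table behind the slide).
tab = [[10, 6, 4],
       [8, 9, 3]]
stat, df = chi_squared_independence(tab)

# Closed form for the upper tail when df = 2; other df need a stats library.
p_value = math.exp(-stat / 2) if df == 2 else None
print(round(stat, 4), df, p_value)

# Check the slide's numbers: X-squared = 0.7658 with df = 2.
slide_p = math.exp(-0.7658 / 2)
print(round(slide_p, 4))
```

R issues the "approximation may be incorrect" warning when some expected counts are small (below 5), as is also the case in the hypothetical table here; with sparse tables an exact test is usually a safer choice.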

Page 15: Analyzing Surveys

Analysis of Variance

• Technique used to test for differences between group means

• Always plot your data before doing analyses

Call:
   aov(formula = resp_height ~ gender)

Terms:
                 gender Residuals
Sum of Squares  297.744   588.567
Deg. of Freedom       1        39
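
The sums of squares in the output above are the ingredients of the F statistic: each is divided by its degrees of freedom, and the ratio of the two mean squares is F. A minimal Python sketch of a one-way ANOVA follows (standard library only; the slides use R's `aov`). The two groups of heights are hypothetical; only the last line uses numbers taken from the slide.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical heights (inches) for two groups.
women = [62, 64, 65, 66, 63]
men = [68, 70, 71, 69, 72]
ssb, ssw, dfb, dfw, f_stat = one_way_anova([women, men])
print(ssb, ssw, dfb, dfw, round(f_stat, 2))

# The same ratio applied to the sums of squares shown on the slide.
slide_f = (297.744 / 1) / (588.567 / 39)
print(round(slide_f, 2))
```

Turning F into a p-value requires the F distribution with (1, 39) degrees of freedom, which this sketch does not compute.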

Page 16: Analyzing Surveys

Regression• Actually a generalization of ANOVA

• Again, always plot your data

Call:
lm(formula = exercise ~ dad_height)

Residuals:
    Min      1Q  Median      3Q     Max
-5.9866 -3.4205 -0.3236  2.6709 14.0949

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -7.8573    10.7968  -0.728    0.471
dad_height    0.1938     0.1546   1.253    0.218

Residual standard error: 4.381 on 37 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared: 0.04073, Adjusted R-squared: 0.0148
F-statistic: 1.571 on 1 and 37 DF, p-value: 0.2179
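
As a rough guide to reading this output, the Python sketch below (standard library only; the slide's fit comes from R's `lm`) fits a one-predictor least-squares line and then checks two relationships among the reported numbers. The helper `simple_ols` and the small data set are hypothetical; the variable names are borrowed from the slide only for readability.

```python
def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical data; rows with a missing value (None) are dropped first,
# mirroring the "observations deleted due to missingness" note.
dad_height = [66, 68, None, 70, 72, 74]
exercise = [2, 4, 4, None, 4, 6]
pairs = [(a, b) for a, b in zip(dad_height, exercise) if a is not None and b is not None]
x, y = zip(*pairs)
intercept, slope, r2 = simple_ols(x, y)
print(round(intercept, 3), round(slope, 3), round(r2, 3))

# Consistency checks on the slide's numbers: t = estimate / std. error,
# and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - 2) with n = 37 + 2 = 39.
slide_t = 0.1938 / 0.1546
slide_adj_r2 = 1 - (1 - 0.04073) * (39 - 1) / (39 - 2)
print(round(slide_t, 3), round(slide_adj_r2, 4))
```

On the slide, the slope's p-value of 0.218 and the R-squared of about 0.04 both indicate that dad_height explains very little of the variation in exercise.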

Page 17: Analyzing Surveys

Other Useful Resources

• A PowerPoint on more automated outlier detection techniques:
  • http://www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf

• R Package ‘outliers’:
  • http://cran.r-project.org/web/packages/outliers/outliers.pdf

• On multiple imputation:
  • http://sites.stat.psu.edu/~jls/mifaq.html#bayes