Describing and Exploring Data Initial Data Analysis

Upload: russell-ward

Post on 29-Dec-2015


TRANSCRIPT

Page 1: Describing and Exploring Data Initial Data Analysis

Describing and Exploring Data

Initial Data Analysis

Page 2: Describing and Exploring Data Initial Data Analysis

Overview
• Describing and Exploring Data
• Initial Data Analysis
  • Characteristics
  • (Some) Steps involved
  • Methods
• Statistics
  • Central Tendency
  • Variability
  • Relationships
• Issues

Page 3: Describing and Exploring Data Initial Data Analysis

Describing and Exploring Data
• Once data has been collected, the raw information must be manipulated in some fashion to make it more informative.
• Several options are available, including plotting the data or calculating descriptive statistics.

Page 4: Describing and Exploring Data Initial Data Analysis

Plotting Data
• Often, one of the first things one does with a set of raw data is to plot it in some manner.
• One should start with a visual display of the data.
• Examples
  • Frequency and density information
    • Histograms, violin plots
    • Bar plots
  • Trends over time or across groups
    • Line graphs
    • Display of interval information (error bars)
  • Relationships
    • Scatterplots
  • Combinations
• Visual display of data allows for more rapid comprehension of distributions and relationships
• Use it whenever possible
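
As a rough illustration of the frequency information a histogram conveys, here is a minimal sketch that bins values and draws one row of `#` marks per bin. The function name `text_histogram`, the bin width, and the scores are all made up for illustration. A plotting library would normally do this job.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count integer values into equal-width bins; return counts and text rows."""
    # Map each value to the lower edge of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from lowest to highest so empty bins still get a row.
    for lower in range(min(counts), max(counts) + bin_width, bin_width):
        upper = lower + bin_width - 1
        lines.append(f"{lower:>4}-{upper:<4} {'#' * counts[lower]}")
    return counts, lines

# Hypothetical test scores, not taken from any real dataset.
scores = [52, 55, 61, 63, 64, 67, 70, 71, 72, 75, 78, 81, 84, 93]
counts, lines = text_histogram(scores, 10)
print("\n".join(lines))
```

Even in this crude form, the shape of the distribution (where the values pile up, how far the tails reach) is visible at a glance, which is the point of plotting first.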

Page 5: Describing and Exploring Data Initial Data Analysis

Descriptives
• The other main part of an initial examination of data is acquiring descriptive statistics
• Measures of central tendency: ‘expected’ values
  • Single-measure estimates
  • Mean, median, mode, trimmed means, M-estimators
• Measures of variability: estimates of uncertainty
  • Standard deviation, MAD
  • Allow for interval estimates on any number of statistics via the standard error
• Simple correlation measures among the variables under consideration
  • You should think of the correlation statistic as a descriptive, not an inferential, statistic
  • Except for purely exploratory endeavors, correlations are a starting point for analysis, not an end
  • In fact, many of the analyses you come across use the correlation matrix as the dataset
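
The measures listed above can be sketched with Python's standard library. This is only an illustration under assumptions: the data are made up, the helper names `trimmed_mean`, `mad`, and `pearson_r` are hypothetical, the MAD is left unscaled, and the trimmed mean simply drops a fixed proportion from each tail. M-estimators are not shown.

```python
import math
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.mean(ordered[k:len(ordered) - k])

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data; 40 is a deliberately extreme value.
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 40]

print("mean        ", statistics.mean(data))
print("median      ", statistics.median(data))
print("mode        ", statistics.mode(data))
print("10% trimmed ", trimmed_mean(data, 0.10))
print("sd          ", statistics.stdev(data))
print("MAD         ", mad(data))
# Standard error of the mean: sample SD divided by sqrt(n).
print("se of mean  ", statistics.stdev(data) / math.sqrt(len(data)))
print("r           ", pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Comparing the mean with the median and trimmed mean on the same data shows how much a single extreme value can pull the non-robust estimate.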

Page 6: Describing and Exploring Data Initial Data Analysis

Initial Data Analysis (IDA)
• Also called Initial Examination of Data or Exploratory Data Analysis
• Often overlooked or thought of as not all that important, but…
• It is at the beginning stages that much trouble can be avoided. If the data is glossed over, the result can be missed findings, or results that cannot be replicated because they rest on bad data.
• Bad data?

Page 7: Describing and Exploring Data Initial Data Analysis

Initial Data Analysis
• IDA includes:
  • General descriptive and graphical output
  • A healthy inspection of the individual variables’ behaviors
    • Especially visually
  • Outlier analysis
    • Outliers in terms of the model, not necessarily the individual variables
  • Possible model selection or re-specification
  • Initial inference measures and testing assumptions of the analysis
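
To make the idea of an outlier "in terms of the model" concrete, here is a minimal sketch that fits a straight line by least squares and flags cases with large residuals. The data are made up, the function names are hypothetical, and the cutoff of 2 residual standard deviations is an arbitrary rule of thumb rather than anything prescribed by the slides.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def model_outliers(xs, ys, cutoff=2.0):
    """Indices whose residual exceeds `cutoff` residual standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual SD uses n - 2 degrees of freedom (two fitted parameters).
    resid_sd = math.sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > cutoff * resid_sd]

# Made-up data: y is 2x everywhere except at x = 5, where y is 16.
xs = list(range(1, 11))
ys = [2, 4, 6, 8, 16, 12, 14, 16, 18, 20]
flagged = model_outliers(xs, ys)
print("flagged indices:", flagged)
```

The value 16 sits comfortably inside the range of `ys`, so looking at that variable alone would not raise suspicion. It stands out only relative to the fitted line.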

Page 8: Describing and Exploring Data Initial Data Analysis

Steps of an analysis
1. Clarify the objectives of the investigation
2. Collect the data in the appropriate way
3. Investigate the structure and quality of the data
4. Carry out IDA (descriptive)
5. Select and carry out formal statistical analysis (inferential)
6. Compare findings with previous results, collect more data if necessary
7. Interpret and communicate results

* Be flexible in your approach, and treat each research situation uniquely

Page 9: Describing and Exploring Data Initial Data Analysis

Method of IDA
• Data scrutiny and description
  • Study variables in light of how they were collected
  • Look for troublesome variables that may warrant special analysis later if used inferentially
  • Search for outliers, missing data, etc. that may result in a less powerful inferential analysis
  • Gather summary (descriptive) statistics and graphs, presented so as not to be misleading
  • See if transformations or robust statistics are necessary
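
One way the scrutiny step might look in practice is a quick screen that counts missing entries and flags values far from the median. This is a sketch under assumptions: the data are made up, `None` stands in for a missing value, the function name `screen` is hypothetical, and the 0.6745 scaling with a 3.5 cutoff is a commonly quoted convention for MAD-based scores, not something taken from the slides.

```python
import statistics

def screen(values, cutoff=3.5):
    """Return (number of missing entries, values far from the median in MAD units)."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    center = statistics.median(present)
    spread = statistics.median(abs(v - center) for v in present)
    if spread == 0:
        # More than half the values are identical; the score is undefined here.
        return n_missing, []
    # 0.6745 rescales the MAD so the score is roughly comparable to a z-score
    # when the data are normally distributed.
    flagged = [v for v in present if 0.6745 * abs(v - center) / spread > cutoff]
    return n_missing, flagged

# Made-up measurements with two missing entries and one suspicious value.
measurements = [12, 14, None, 13, 15, 11, 98, None, 14, 13]
n_missing, flagged = screen(measurements)
print("missing:", n_missing, "flagged:", flagged)
```

A flagged value is a prompt to go back to how the data were collected, not an automatic deletion.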

Page 10: Describing and Exploring Data Initial Data Analysis

Method of IDA
• Use inferential analyses in an exploratory way
• Model formulation
  • Include relevant theory
  • Recognize important features of the data
  • Do the model and data go together?
  • Might there be new hypotheses worthy of examination?
• Is further analysis even necessary?

Page 11: Describing and Exploring Data Initial Data Analysis

Initial Data Analysis
• Problem
  • Although seen by most stats folk as an important part of data analysis, IDA is often underused as a source of information and as an important first step in data interpretation
  • “Theories looking for data”
  • Too much concern with inferential analysis and statistical significance
  • A far too typical approach seems to be: get the data, run descriptives, and at the same time or immediately afterward run the actual analysis
    • Then, because the results are poor, start figuring out ways to ‘fix’ it

Page 12: Describing and Exploring Data Initial Data Analysis

Why the lack of emphasis on IDA?
• Assumed it is the natural way that people conduct their research anyway
  • It isn’t if they are left to their own devices
• Assumed lack of standard methods for going about it
  • In fact there are guidelines for how to do it
  • See Chatfield in related articles section
• Assumed it’s too exploratory
  • IDA != fishing
  • Don’t disregard prior knowledge and theory
• Risk of invalid conclusions
  • This would be a concern if you didn’t perform IDA

Page 13: Describing and Exploring Data Initial Data Analysis

Conclusion
• Analysis of data takes time, and one must be prepared to exhaustively examine all aspects of the information collected
• The purpose of analysis is to allow the data to tell its story, not to impose our own story on the data

• An open-minded and thoughtful approach is necessary to any investigation