Copyright © 2012, SAS Institute Inc. All rights reserved. Copyright © 2010 SAS Institute Inc. All rights reserved.
DATA DIAGNOSTICS IN SAS®
ENTERPRISE GUIDE™
AGENDA
How to…
• describe data (descriptive statistics)
• graph the data
• detect and deal with outliers
• assess normality
• transform variables to meet assumptions (transformations)
• sample (for modeling purposes)
Q&A
SCENARIO
• The company sells outdoor and sports items
• It has obtained a list of customers with valid email addresses
• We need to compile a data table with information so we can build a predictive model
PRODUCT ORDER DETAIL DATA - TRANSACTIONAL
INTRODUCING ENTERPRISE GUIDE: SIMPLE AS 1, 2, 3
To work with SAS Enterprise Guide, you:
1. Create a project
2. Add data to the project
3. Run tasks against the data.
DESCRIPTIVE STATISTICS & GRAPHS
• Characterize Data
• One-way Frequencies
• Distributions
• Reports
• Bar Charts
• Box Plots
• Scatter Plots
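Outside Enterprise Guide, the numbers behind a descriptive summary and a one-way frequency table can be sketched in a few lines of Python. This is only an illustration, not what the Enterprise Guide tasks run internally; the sample values, the column names, and the helper names `describe` and `one_way_freq` are all made up for the example.

```python
import statistics
from collections import Counter

# Hypothetical sample: order totals and product categories (made-up values).
total_spent = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]
category = ["Outdoor", "Sports", "Sports", "Outdoor",
            "Sports", "Sports", "Outdoor", "Sports"]

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

def one_way_freq(labels):
    """Return (level, count, percent) rows, most frequent level first."""
    counts = Counter(labels)
    total = len(labels)
    return [(level, count, 100.0 * count / total)
            for level, count in counts.most_common()]

summary = describe(total_spent)
freq = one_way_freq(category)
```

Quartile definitions differ between tools, so `q1` and `q3` here may not match the values a SAS task reports for the same data.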
DESCRIPTIVE STATISTICS & GRAPHS DEMO
ASSESS NORMALITY
Tasks > Describe > Distribution Analysis
Graphs
• Histograms
• Q-Q Plot
• Kernel Density Plot
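To make the Q-Q plot less of a black box, here is a rough Python sketch of the points such a plot draws: each sorted observation is paired with the quantile a fitted normal distribution would put at the same rank. The plotting position `(i - 0.5) / n` is one common convention and an assumption here; the sample data and the name `qq_points` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted value with the normal quantile at the same rank."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Made-up, right-skewed sample.
sample = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]
points = qq_points(sample)

# Normal data would fall close to the line y = x; large gaps suggest non-normality.
largest_gap = max(abs(observed - theoretical) for theoretical, observed in points)
```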
4 Tests
• Shapiro-Wilk
• Kolmogorov-Smirnov (K-S)
• Cramér-von Mises
• Anderson-Darling
Testing Normality of Data using SAS
Guido’s Guide to PROC Univariate: A tutorial for SAS Users
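As an illustration of what one of these tests measures, the Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical distribution of the data and a fitted normal curve. The sketch below computes only that statistic in plain Python. It does not produce a p-value, because with the mean and standard deviation estimated from the same data the standard K-S tables no longer apply. The function name and both samples are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps at x, so check both sides of the step.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Made-up samples: one roughly symmetric, one strongly right-skewed.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]

d_symmetric = ks_statistic(symmetric)
d_skewed = ks_statistic(skewed)  # larger gap, so further from normal
```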
TRANSFORMATIONS FOR NORMALITY
• Log
• Square Root
• Cube Root
• Reciprocal
• Square Transformation
• Many more…
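One way to compare the transformations listed above is to apply each one and see how the skewness changes. The following Python sketch does that under the assumption that all values are strictly positive, which the log and reciprocal require. The sample data and the names `skewness`, `TRANSFORMS`, and `skew_by_transform` are hypothetical.

```python
import math
from statistics import mean

def skewness(values):
    """Population skewness: third central moment over the second to the power 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Each transform assumes strictly positive input.
TRANSFORMS = {
    "log": math.log,
    "square_root": math.sqrt,
    "cube_root": lambda x: x ** (1.0 / 3.0),
    "reciprocal": lambda x: 1.0 / x,
    "square": lambda x: x * x,
}

def skew_by_transform(values):
    """Return the skewness of the raw data and of each transformed copy."""
    result = {"none": skewness(values)}
    for name, fn in TRANSFORMS.items():
        result[name] = skewness([fn(x) for x in values])
    return result

# Made-up, right-skewed sample.
sample = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]
skews = skew_by_transform(sample)
```

For this right-skewed sample the log pulls the skewness toward zero while squaring pushes it further away, which is the usual pattern for positive, long-tailed data.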
TRANSFORMING VARIABLES
• TotalSpent – Log Transformation
• Age – Recode to categorical
Transforming Variables for Normality and Linearity
Before Logistic Modeling – A Toolkit for Identifying and Transforming Relevant Predictors
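The two changes named on this slide amount to two computed columns per customer row. A minimal Python sketch follows. The age cut points, the output column names `LogTotalSpent` and `AgeGroup`, and the sample rows are assumptions for illustration; the slides do not say which bands or names were used.

```python
import math

# Hypothetical age bands: (exclusive upper bound, label).
AGE_BANDS = [(25, "Under 25"), (45, "25-44"), (65, "45-64")]

def recode_age(age):
    """Map a numeric age to a categorical band."""
    for upper, label in AGE_BANDS:
        if age < upper:
            return label
    return "65+"

def add_computed_columns(row):
    """Return a copy of the row with the two derived columns added."""
    out = dict(row)
    # log() is undefined at zero, so this sketch leaves non-positive totals missing.
    out["LogTotalSpent"] = math.log(row["TotalSpent"]) if row["TotalSpent"] > 0 else None
    out["AgeGroup"] = recode_age(row["Age"])
    return out

# Made-up customer rows.
customers = [
    {"TotalSpent": 100.0, "Age": 24},
    {"TotalSpent": 0.0, "Age": 45},
    {"TotalSpent": 54.6, "Age": 70},
]
prepared = [add_computed_columns(c) for c in customers]
```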
COMPUTED COLUMNS ‘ADVANCED EXPRESSION’
DETECT AND DEAL WITH OUTLIERS
WHAT IS AN OUTLIER
Outliers are observations with extreme values relative to other observations made under the same conditions.
Sources:
• Data Entry Errors
• Implausible Values
• Rare Events
WHY DETECT AND DEAL WITH OUTLIERS
• Bias or distortion of estimates
• Inflated sums of squares
• Distortion of p-values
• Faulty conclusions
DETECT OUTLIERS
• Graphs - Box Plots, Distributions, Scatter Plots
• Univariate Statistics
• Regression
» Cook's D
» RSTUDENT statistic
» DFFITS statistic
» DFBETAS
Introduction to Building a Linear Regression Model
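The rule a box plot typically uses to flag points can be sketched in Python as follows: anything more than 1.5 interquartile ranges beyond the quartiles is marked as a potential outlier. The multiplier 1.5 is the conventional default and an assumption here, as are the sample data and the name `boxplot_outliers`. The regression diagnostics listed above are not covered by this sketch.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return values outside the box-plot fences, plus the fences themselves."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in values if x < lower or x > upper]
    return flagged, (lower, upper)

# Made-up sample with one extreme order total.
sample = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]
flagged, fences = boxplot_outliers(sample)
# flagged is [250.0] for this sample
```

Quartile conventions vary between tools, so the exact fences may differ slightly from those drawn by a SAS box plot.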
DEAL WITH OUTLIERS
Several Approaches
• Deleting
• Capping/Flooring Approach
• Sigma Approach
• Exponential Smoothing Approach
• Mahalanobis Distance Approach
• Robust-Reg Approach
Selecting the Appropriate Outlier Treatment for Common Industry Applications
A SAS Application to Identify and Evaluate Outliers
Robust Regression and Outlier Detection with the RobustReg Procedure
Robust Outlier Identification using SAS
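Two of the approaches above are simple enough to sketch in Python: capping and flooring at chosen percentiles, and computing limits a fixed number of standard deviations from the mean. The 5th and 95th percentiles and the factor of 3 are assumed defaults, not values taken from the slides, and the function names and sample data are hypothetical.

```python
from statistics import mean, stdev, quantiles

def cap_and_floor(values, lower_pct=5, upper_pct=95):
    """Clamp values to the chosen percentiles (also known as winsorizing)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    floor, cap = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(x, floor), cap) for x in values]

def sigma_limits(values, k=3.0):
    """Return (lower, upper) limits k standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s

# Made-up sample with one extreme order total.
sample = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]
capped = cap_and_floor(sample)
low, high = sigma_limits(sample)
```

In this tiny sample the value 250 still falls inside the three-sigma limits, because the extreme value inflates the standard deviation it is judged against. That is one reason percentile-based or robust approaches are often preferred for small or heavily skewed data.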
DETECT AND DEAL WITH OUTLIERS DEMO
SAS ENTERPRISE MINER TRANSFORM NODE
Simple Transformations
• Log
• Square Root
• Inverse
• Square
• Exponential
• Standardized
Binning Transformations
• Bucket
• Quantile
• Optimal Binning
Best Power Transformations
• Maximize Normality
• Maximize Correlation with Target
• Equalize Spread with Target Levels
• Optimal Maximum Equalize Spread with Target Level
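The difference between bucket and quantile binning is easiest to see side by side. The Python sketch below is a generic illustration of equal-width versus equal-frequency bins, not the Transform node's own algorithm. The function names and data are hypothetical, and `bucket_bins` assumes the values are not all identical.

```python
def bucket_bins(values, n_bins):
    """Equal-width bins: each bin spans the same range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((x - lo) / width), n_bins - 1) for x in values]

def quantile_bins(values, n_bins):
    """Equal-frequency bins: each bin holds about the same number of rows."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Made-up, right-skewed sample.
sample = [12.5, 40.0, 18.0, 250.0, 33.5, 27.0, 19.5, 61.0]
by_width = bucket_bins(sample, 4)     # [0, 0, 0, 3, 0, 0, 0, 0]
by_count = quantile_bins(sample, 4)   # [0, 2, 0, 3, 2, 1, 1, 3]
```

With skewed data, equal-width buckets leave most rows in one bin, while quantile bins spread the rows evenly. Note that this simple rank-based version can split tied values across adjacent bins.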
SAS ENTERPRISE MINER REPLACEMENT NODE
Interval Variables
• Mean Absolute Deviation (MAD)
• User-Specified Limits
• Metadata Limits
• Extreme Percentiles
• Modal Center
• Standard Deviations from the Mean
WHY SAMPLE?
• Smaller Data
» exploratory analysis
» cost
» speed/performance
• Oversample rare events
• To get to population of interest
• Other Statistical Reasons
• Validation or test of models
• Adequate representation of the population
TYPES OF SAMPLING
• Simple Random Sampling (SRS)
• Stratified Sampling
• Proportional Sampling
• Other types
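The first two sampling types can be sketched with Python's standard library. In this illustration, stratified sampling draws the same fraction from every stratum, which is the proportional allocation case. The field name `responded`, the 90/10 split, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def simple_random_sample(rows, n, seed=0):
    """Draw n rows without replacement, every row equally likely."""
    return random.Random(seed).sample(rows, n)

def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Made-up population: 90 non-responders and 10 responders.
population = [{"id": i, "responded": "yes" if i < 10 else "no"} for i in range(100)]

srs = simple_random_sample(population, 20)
strat = stratified_sample(population, "responded", 0.2)
n_yes = sum(1 for row in strat if row["responded"] == "yes")
```

A simple random sample of 20 may contain few or no responders by chance, whereas the stratified version always keeps the 90/10 mix. Oversampling a rare event would instead use a larger fraction for the rare stratum.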
RESOURCES ENTERPRISE GUIDE
Enterprise Guide
• Interactive Tour
• SAS Talks
• Enterprise Guide Public Courses
Enterprise Guide for SAS Programmers
• New Goodies for the SAS® Programmer in SAS® Enterprise Guide® 4.3
• SAS® Enterprise Guide® for Programmers
ADDITIONAL SUPPORT: ENTERPRISE GUIDE TUTORIALS
• View Free Tutorials: http://support.sas.com/training/resources/
» SAS Enterprise Guide Tutorial
» Getting Started with SAS Enterprise Guide
» SAS Enterprise Guide Tutorial for Statistics
FURTHER TRAINING FROM SAS EDUCATION
• Enterprise Guide 1: Query and Reporting
• Enterprise Guide 2: Advanced Tasks and Querying
• Enterprise Guide for Experienced SAS Programmers
• Data Preparation for Data Mining
support.sas.com/training
PAPERS AVAILABLE
• Ad Hoc Data Preparation for Analysis Using SAS Enterprise Guide
• Introduction to Using SAS Enterprise Guide for Statistical Analysis
• Introduction to Building a Linear Regression Model
• Take a Fresh Look at SAS Enterprise Guide: From point-and-click ad hocs to robust enterprise solutions
• Advanced Analytics with Enterprise Guide
Copyright © 2013, SAS Institute Inc. All rights reserved. sas.com
QUESTIONS?
Thank you for your time and attention!
Connect with me:
LinkedIn: https://www.linkedin.com/in/melodierush
Twitter: @Melodie_Rush