DATA DIAGNOSTICS IN SAS® ENTERPRISE GUIDE

Copyright © 2012, SAS Institute Inc. All rights reserved. Copyright © 2010 SAS Institute Inc. All rights reserved.



AGENDA

How to…

• describe data (descriptive statistics)

• graph the data

• detect and deal with outliers

• assess normality

• transform variables to meet assumptions (transformations)

• sample (for modeling purposes)

Q&A


SCENARIO

• The company sells outdoor and sports items

• It has obtained a list of customers with valid email addresses

• We need to compile a data table with enough information to build a predictive model

CUSTOMER DATA

PRODUCT ORDER DETAIL DATA - TRANSACTIONAL
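The scenario needs one modeling row per customer, while the order detail data is transactional (many rows per customer). The slides do this with Enterprise Guide tasks; the Python below is only a minimal sketch of the idea. The column layout (customer id, order id, amount) and the names `order_lines` and `summarize_orders` are hypothetical.

```python
from collections import defaultdict

# Hypothetical order-detail rows: (customer_id, order_id, amount).
order_lines = [
    (101, "A1", 25.00),
    (101, "A1", 40.00),
    (101, "A2", 15.50),
    (102, "B1", 300.00),
]

def summarize_orders(rows):
    """Collapse transactional rows to one summary record per customer."""
    totals = defaultdict(float)
    orders = defaultdict(set)
    for customer_id, order_id, amount in rows:
        totals[customer_id] += amount
        orders[customer_id].add(order_id)
    return {
        cid: {"total_spent": round(totals[cid], 2), "n_orders": len(orders[cid])}
        for cid in totals
    }

customer_summary = summarize_orders(order_lines)
```

The resulting per-customer records could then be joined to the customer table on the customer id.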

INTRODUCING ENTERPRISE GUIDE: SIMPLE AS 1, 2, 3

To work with SAS Enterprise Guide, you:

1. Create a project

2. Add data to the project

3. Run tasks against the data.

DESCRIPTIVE STATISTICS & GRAPHS

• Characterize Data

• One-way Frequencies

• Distributions

• Reports

• Bar Charts

• Box Plots

• Scatter Plots
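For readers without the software at hand, the kind of output the descriptive tasks above produce can be approximated in a few lines. This is a rough Python sketch, not the Enterprise Guide implementation; the sample values and the names `describe` and `one_way_freq` are made up for illustration.

```python
import statistics
from collections import Counter

# Hypothetical values standing in for a numeric and a categorical column.
total_spent = [20.0, 35.0, 35.0, 50.0, 410.0]
region = ["East", "West", "East", "South", "East"]

def describe(values):
    """Return basic descriptive statistics for a numeric column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def one_way_freq(values):
    """Return (level, count, percent) rows, most frequent level first."""
    counts = Counter(values)
    n = len(values)
    return [(level, c, 100.0 * c / n) for level, c in counts.most_common()]

summary = describe(total_spent)
freq_table = one_way_freq(region)
```

A mean far above the median, as in this toy column, is an early hint of skewness or outliers.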

DESCRIPTIVE STATISTICS & GRAPHS DEMO

ASSESS NORMALITY

Tasks → Describe → Distribution Analysis

Graphs

• Histograms

• Q-Q Plot

• Kernel Density Plot
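To make the Q-Q plot concrete: it pairs each sorted observation with the normal quantile expected for its rank, and normal data fall close to a straight line. The sketch below is plain Python rather than the Distribution Analysis task; the plotting position (i - 0.5) / n is one common convention and is an assumption here, as are the function names.

```python
import math
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted value with the standard normal quantile for its rank.
    Plotting positions (i - 0.5) / n are an assumed convention."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def qq_correlation(values):
    """Pearson correlation between theoretical and observed quantiles.
    Values near 1 mean the Q-Q plot is close to a straight line."""
    points = qq_points(values)
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_x = sum((x - mean_x) ** 2 for _, x in points)
    return cov / math.sqrt(var_t * var_x)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 40]
```

Here `qq_correlation(symmetric)` comes out higher than `qq_correlation(skewed)`, which matches what the plot would show visually.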

ASSESS NORMALITY

Tasks → Describe → Distribution Analysis

4 Tests

• Shapiro-Wilk

• Kolmogorov-Smirnov (K-S)

• Cramér-von Mises

• Anderson-Darling
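As an illustration of what one of these tests measures, the Kolmogorov-Smirnov statistic D is the largest gap between the empirical distribution of the data and a fitted normal distribution. The following is a hedged Python sketch of the statistic only; it does not compute a p-value, and the function name is hypothetical.

```python
import statistics
from statistics import NormalDist

def ks_statistic(values):
    """Kolmogorov-Smirnov distance D between the empirical CDF and a normal
    CDF whose mean and standard deviation are estimated from the sample."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(statistics.mean(ordered), statistics.stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 40]
```

A larger D indicates a worse fit to the normal curve; the skewed sample above yields a clearly larger D than the symmetric one.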

Testing Normality of Data using SAS

Guido’s Guide to PROC Univariate: A tutorial for SAS Users

ASSESS NORMALITY DEMO

TRANSFORM VARIABLES

TRANSFORMATIONS FOR NORMALITY

• Log

• Square Root

• Cube Root

• Reciprocal

• Square Transformation

• Many more…
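One way to compare the transformations listed above is to check how each changes the skewness of a variable. This is a minimal Python sketch under stated assumptions: inputs are positive (required by log, square root and reciprocal), skewness is the simple population version, and the names `TRANSFORMS`, `skewness` and `skew_after` are illustrative.

```python
import math
import statistics

# The transformations named on the slide. Log, square root and reciprocal
# need positive inputs; the reciprocal also reverses the ordering of values.
TRANSFORMS = {
    "log": math.log,
    "square_root": math.sqrt,
    "cube_root": lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x),
    "reciprocal": lambda x: 1.0 / x,
    "square": lambda x: x * x,
}

def skewness(values):
    """Population skewness: third central moment over the cubed std dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (len(values) * sd ** 3)

def skew_after(values):
    """Skewness of the raw values and after each transformation."""
    result = {"raw": skewness(values)}
    for name, fn in TRANSFORMS.items():
        result[name] = skewness([fn(x) for x in values])
    return result

right_skewed = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
skews = skew_after(right_skewed)
```

For right-skewed data like this toy sample, the log pulls skewness toward zero while squaring makes it worse, which is why the choice of transformation depends on the direction of the skew.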

TRANSFORMING VARIABLES

• TotalSpent – Log Transformation

• Age – Recode to categorical
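The two changes above can be sketched as follows. In Enterprise Guide they are built as computed columns; this Python version only illustrates the logic. The age cut points are hypothetical (the slides do not give them), and using log(x + 1) so that zero spend stays defined is also an assumption.

```python
import math

# Hypothetical age bands as (exclusive upper bound, label).
AGE_BANDS = [(25, "Under 25"), (45, "25-44"), (65, "45-64")]

def log_total_spent(total_spent):
    """Natural log of spend. log(x + 1) is used so zero spend stays defined;
    the slide only says 'Log Transformation'."""
    return math.log1p(total_spent)

def recode_age(age):
    """Map a numeric age to a category label."""
    if age is None or age < 0:
        return "Unknown"
    for upper, label in AGE_BANDS:
        if age < upper:
            return label
    return "65+"

recoded = [recode_age(a) for a in (19, 30, 64, 70, None)]
```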

Transforming Variables for Normality and Linearity

Before Logistic Modeling – A Toolkit for Identifying and Transforming Relevant Predictors

COMPUTED COLUMNS ‘ADVANCED EXPRESSION’

COMPUTED COLUMNS ‘RECODED’

TRANSFORM VARIABLES DEMO

DETECT AND DEAL WITH OUTLIERS

WHAT IS AN OUTLIER

Outliers are observations that have extreme values relative to other observations observed under the same conditions.

Sources:

• Data Entry Errors

• Implausible Values

• Rare Events

WHY DETECT AND DEAL WITH OUTLIERS

• Bias or distortion of estimates

• Inflated sums of squares

• Distortion of p-values

• Faulty conclusions


DETECT OUTLIERS

• Graphs - Box Plots, Distributions, Scatter Plots

• Univariate Statistics

• Regression

» Cook's D

» RSTUDENT statistic

» DFFITS statistic

» DFBETAS

Introduction to Building a Linear Regression Model
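Two of the detection ideas listed above can be sketched in plain Python: the box-plot rule for a single variable, and Cook's D for a regression. This is an illustrative sketch, not SAS output. It assumes the conventional 1.5 × IQR fences and a simple linear regression with one predictor; the function names are hypothetical.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Box-plot rule: values beyond k * IQR outside the quartiles are flagged."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cooks_distance(x, y):
    """Cook's D for each observation in the simple regression y = a + b * x."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # estimated parameters: intercept and slope
    mse = sum(e * e for e in residuals) / (n - p)
    result = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        result.append((e * e / (p * mse)) * leverage / (1.0 - leverage) ** 2)
    return result

values = [10, 12, 11, 13, 12, 11, 95]
low, high = iqr_fences(values)
flagged = [v for v in values if v < low or v > high]

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]
influence = cooks_distance(x, y)
```

Cook's D combines the size of the residual with leverage, so the last point in the toy data, which is both far out in x and far from the line, receives the largest value.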

DEAL WITH OUTLIERS

Several Approaches

• Deleting

• Capping/Flooring Approach

• Sigma Approach

• Exponential Smoothing Approach

• Mahalanobis Distance Approach

• Robust-Reg Approach
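As a rough illustration of two of these approaches, the sketch below clamps values to chosen percentiles (capping/flooring) and flags values beyond k standard deviations of the mean (sigma approach). It is plain Python, not a SAS procedure; the default percentiles, the value of k and the function names are assumptions.

```python
import statistics

def cap_floor(values, lower_pct=1, upper_pct=99):
    """Capping/flooring: clamp values to the chosen lower and upper percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

def sigma_flags(values, k=3.0):
    """Sigma approach: flag values more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) > k * sd for v in values]

values = list(range(1, 101)) + [10_000]
capped = cap_floor(values, lower_pct=5, upper_pct=95)
flags = sigma_flags(values)
```

Capping keeps every row but limits the influence of extremes, whereas flagging leaves the decision (delete, correct, keep) to the analyst.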

Selecting the Appropriate Outlier Treatment for Common Industry Applications

A SAS Application to Identify and Evaluate Outliers

Robust Regression and Outlier Detection with the RobustReg Procedure

Robust Outlier Identification using SAS

DETECT AND DEAL WITH OUTLIERS DEMO

SAS ENTERPRISE MINER TRANSFORM NODE

Simple Transformations

• Log

• Square Root

• Inverse

• Square

• Exponential

• Standardized

Binning Transformations

• Bucket

• Quantile

• Optimal Binning

Best Power Transformations

• Maximize Normality

• Maximize Correlation with Target

• Equalize Spread with Target Levels

• Optimal Maximum Equalize Spread with Target Level
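The difference between bucket and quantile binning can be shown with a small example. This is a generic Python sketch, assuming bucket means equal-width intervals and quantile means roughly equal counts per bin; it is not the Transform node's code, and the function names are hypothetical.

```python
import statistics
from bisect import bisect_right

def bucket_edges(values, n_bins):
    """Bucket binning: interior cut points that split the range into equal widths."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + width * i for i in range(1, n_bins)]

def quantile_edges(values, n_bins):
    """Quantile binning: interior cut points holding roughly equal counts."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")

def assign_bins(values, edges):
    """Return the 0-based bin index of each value given interior cut points."""
    return [bisect_right(edges, v) for v in values]

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
bucket_bins = assign_bins(values, bucket_edges(values, 2))
quantile_bins = assign_bins(values, quantile_edges(values, 2))
```

With one extreme value, equal-width buckets put nine of the ten observations in the first bin, while quantile bins split them five and five.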

SAS ENTERPRISE MINER REPLACEMENT NODE

Interval Variables

• Mean Absolute Deviation (MAD)

• User-Specified Limits

• Metadata Limits

• Extreme Percentiles

• Modal Center

• Standard Deviations from the Mean
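To give a feel for limit-based replacement, here is a hedged Python sketch that derives limits from the mean absolute deviation and replaces values outside them with the nearest limit. The slides do not say how the node defines MAD or what it substitutes, so measuring deviations from the mean, the multiplier k = 3 and the function names are all assumptions.

```python
import statistics

def mad_limits(values, k=3.0):
    """Limits at k mean absolute deviations on either side of the mean.
    Assumption: deviations are measured from the mean."""
    center = statistics.fmean(values)
    mad = statistics.fmean([abs(v - center) for v in values])
    return center - k * mad, center + k * mad

def replace_outside(values, limits):
    """Replace values outside (low, high) with the nearest limit."""
    low, high = limits
    return [min(max(v, low), high) for v in values]

values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 200]
limits = mad_limits(values)
cleaned = replace_outside(values, limits)
```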

SAMPLING

WHY SAMPLE?

• Smaller Data

• exploratory analysis

• cost

• speed/performance

• Oversample rare events

• To get to population of interest

• Other Statistical Reasons

• Validation or test of models

• Adequate representation of the population

TYPES OF SAMPLING

• Simple Random Sampling (SRS)

• Stratified Sampling

• Proportional Sampling

• Other types
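A minimal Python sketch of two of these sampling types follows. The slides do not define them, so the sketch assumes SRS means sampling without replacement with equal selection chances, and it treats proportional sampling as a stratified sample that takes the same fraction from every stratum. The function names and the fixed seed are illustrative.

```python
import random
from collections import defaultdict

def simple_random_sample(rows, n, seed=0):
    """SRS without replacement: every row has the same chance of selection."""
    return random.Random(seed).sample(rows, n)

def proportional_stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from each stratum so the sample keeps the
    population's stratum proportions (at least one row per stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 "East" customers and 20 "West" customers.
rows = [("East", i) for i in range(80)] + [("West", i) for i in range(20)]
srs = simple_random_sample(rows, 10)
stratified = proportional_stratified_sample(rows, key=lambda r: r[0], fraction=0.1)
```

The stratified sample is guaranteed to contain 8 East and 2 West rows, whereas the SRS only matches those proportions on average.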

SAMPLING DEMO

RESOURCES

RESOURCES: ENTERPRISE GUIDE

Enterprise Guide

• Interactive Tour

• SAS Talks

• Enterprise Guide Public Courses

Enterprise Guide for SAS Programmer

• New Goodies for the SAS® Programmer in SAS® Enterprise Guide® 4.3

• SAS® Enterprise Guide® for Programmers

ADDITIONAL SUPPORT: ENTERPRISE GUIDE TUTORIALS

• View Free Tutorials

• http://support.sas.com/training/resources/

» SAS Enterprise Guide Tutorial

» Getting Started with SAS Enterprise Guide

» SAS Enterprise Guide Tutorial for Statistics

FURTHER TRAINING FROM SAS EDUCATION

• Enterprise Guide 1: Query and Reporting

• Enterprise Guide 2: Advanced Tasks and Querying

• Enterprise Guide for Experienced SAS Programmers

• Data Preparation for Data Mining

support.sas.com/training

PAPERS AVAILABLE

• Ad Hoc Data Preparation for Analysis Using SAS Enterprise Guide

• Introduction to Using SAS Enterprise Guide for Statistical Analysis

• Introduction to Building a Linear Regression Model

• Take a Fresh Look at SAS Enterprise Guide: From point-and-click ad hocs to robust enterprise solutions

• Advanced Analytics with Enterprise Guide

Copyright © 2013, SAS Institute Inc. All rights reserved. sas.com

QUESTIONS?

Thank you for your time and attention!

Connect with me:

LinkedIn: https://www.linkedin.com/in/melodierush

Twitter: @Melodie_Rush