discriminant analysis database marketing instructor:nanda kumar
Post on 16-Dec-2015
Discriminant Analysis
Database Marketing
Instructor: Nanda Kumar
Multiple Regression
Y = b0 + b1 X1 + b2 X2 + …+ bn Xn
Same as Simple Regression in principle
New Issues:
– Each Xi must represent something unique
– Variable selection
Multiple Regression
Example 1:
– Spending = a + b income + c age
Example 2:
– weight = a + b height + c sex + d age
Real Estate Example
How is price related to the characteristics of the house?
SAS Code
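/* Regress price on the house characteristics. */
/* With no DATA= option, PROC REG uses the most recently created data set. */
proc reg;
  model price = section lotsize bed bath age other;
run;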
Interpreting the Regression Output
Parameter Estimates or Slope Coefficients capture the marginal impact of each explanatory variable on price, holding the other variables fixed
Example: the coefficient of the variable bed represents the impact on price of increasing the number of bedrooms by one
Significance of the Coefficients
Are they significantly different from zero?
– Look at the T values and p values
• |T| above about 2, or p<0.05, is good
• Sometimes p<0.10 is considered reasonably significant
Overall Goodness of Fit
– Look at R2 (also refer to note in Session 1)
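The link between T values and p values can be checked numerically. The sketch below is only an approximation: it assumes a large sample, so the standard normal distribution stands in for the t distribution, and the helper name two_sided_p is made up for this illustration.

```python
from statistics import NormalDist


def two_sided_p(t_value: float) -> float:
    """Two-sided p-value for a T statistic.

    Assumption: the sample is large enough that the standard normal
    distribution approximates the t distribution.
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))


# Compare a few T values against the usual 0.05 and 0.10 thresholds.
for t in (1.65, 1.80, 1.96, 2.50):
    print(f"|T| = {t:.2f}  ->  approximate p = {two_sided_p(t):.3f}")
```

With small samples the exact t distribution gives somewhat larger p values, so the T value needed for p<0.05 is higher than in this approximation.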
Where are we Now?
[Roadmap diagram. Labels: Secondary Data; Factor Analysis; Cluster Analysis; Behavior (Segment 1, Segment 2); Discriminant/Logit Analysis; Distinguishing Characteristics; Targeting]
Web Browsing
Identified two groups of consumers
– One that visits your website frequently
– One that doesn’t
Can the differences in behavior be related to socio-demographic variables?
Can we use these discriminators to classify prospects into one of these two groups?
Catalog Business
Identified two consumer segments
– One which buys a lot
– Other which does not buy as much
Can we find variables that help discriminate the behavior of these two groups?
Can we use these discriminators to classify other consumers into one of these two groups?
Promotional Campaigns
Identify groups based on their response to promotional campaigns
– One group purchases a lot on promotion
– Other does not
Identify characteristics that distinguish these two groups
Can we use these discriminators to separate price-sensitive prospects from less price-sensitive ones?
Segmentation Analysis
General Problem
– Identified segments in the population based on behavior
– Want to find targetable characteristics that discriminate these groups
– Classify prospects into different groups
Data
Stock #   GE/A     ROI       Stock #   GE/A     ROI
1         0.158    0.182     13        -0.012   -0.031
2         0.21     0.206     14        0.036    0.053
3         0.207    0.188     15        0.038    0.036
4         0.28     0.236     16        -0.063   -0.074
5         0.197    0.193     17        -0.054   -0.119
6         0.227    0.173     18        0        -0.005
7         0.148    0.196     19        0.005    0.039
8         0.254    0.212     20        0.091    0.122
9         0.079    0.147     21        -0.036   -0.072
10        0.149    0.128     22        0.045    0.064
11        0.2      0.15      23        -0.026   -0.024
12        0.187    0.191     24        0.016    0.026
[Figures: three scatter plots of ROI (vertical axis) against GE/A (horizontal axis), titled Good Stocks, Bad Stocks, and All Stocks]
Identifying the Best Discriminators
The two groups appear to be well separated on each ratio taken alone: ROI and GE/A
They are also well separated in two-dimensional space
But this need not always be the case!
Discriminating Variables
[Figure: plot with axes X1 and X2]
Discriminant Analysis
Identify a set of variables that best discriminate between the two groups
Does so by choosing a new line (a weighted combination of the variables) that maximizes the similarity between members of the same group and minimizes the similarity between members of different groups
Discriminant Function
Z = w1 GEA + w2 ROI
Between-Group Sum of Squares – SSb
Within-Group Sum of Squares – SSw
Criterion: choose w1 and w2 to maximize λ = SSb/SSw
More on the Criterion
For Z to provide maximum separation between the groups, the following must be satisfied:
– The means of Z for the two groups should be as far apart as possible (or high SSb)
– Values of Z for each group should be as homogeneous as possible (or low SSw)
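The criterion can be tried out on the stock data from these slides. The sketch below is an illustration under stated assumptions, not the course's own computation: it assumes stocks 1-12 are the good group and stocks 13-24 the bad group, and it uses the standard two-group result that the best weights are proportional to W^-1 (m1 - m2), where W is the pooled within-group sums-of-squares-and-cross-products matrix and m1, m2 are the group means. The function names are made up for this example.

```python
# (GE/A, ROI) pairs copied from the Data table in these slides.
# Assumption: stocks 1-12 form the "good" group, stocks 13-24 the "bad" group.
GOOD = [(0.158, 0.182), (0.210, 0.206), (0.207, 0.188), (0.280, 0.236),
        (0.197, 0.193), (0.227, 0.173), (0.148, 0.196), (0.254, 0.212),
        (0.079, 0.147), (0.149, 0.128), (0.200, 0.150), (0.187, 0.191)]
BAD = [(-0.012, -0.031), (0.036, 0.053), (0.038, 0.036), (-0.063, -0.074),
       (-0.054, -0.119), (0.000, -0.005), (0.005, 0.039), (0.091, 0.122),
       (-0.036, -0.072), (0.045, 0.064), (-0.026, -0.024), (0.016, 0.026)]


def mean(values):
    return sum(values) / len(values)


def criterion(w, groups=(GOOD, BAD)):
    """Return SSb / SSw for the scores Z = w[0]*GEA + w[1]*ROI."""
    scores = [[w[0] * gea + w[1] * roi for gea, roi in grp] for grp in groups]
    grand = mean([z for grp in scores for z in grp])
    ssb = sum(len(grp) * (mean(grp) - grand) ** 2 for grp in scores)
    ssw = sum((z - mean(grp)) ** 2 for grp in scores for z in grp)
    return ssb / ssw


def fisher_weights(g1=GOOD, g2=BAD):
    """Weights proportional to W^-1 (m1 - m2) for two variables."""
    m1 = [mean([p[k] for p in g1]) for k in (0, 1)]
    m2 = [mean([p[k] for p in g2]) for k in (0, 1)]
    # Pooled within-group sums of squares and cross products (2 x 2).
    w = [[0.0, 0.0], [0.0, 0.0]]
    for grp, m in ((g1, m1), (g2, m2)):
        for p in grp:
            dev = (p[0] - m[0], p[1] - m[1])
            for i in (0, 1):
                for j in (0, 1):
                    w[i][j] += dev[i] * dev[j]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    d = (m1[0] - m2[0], m1[1] - m2[1])
    # Explicit 2 x 2 inverse applied to the mean difference.
    return ((w[1][1] * d[0] - w[0][1] * d[1]) / det,
            (w[0][0] * d[1] - w[1][0] * d[0]) / det)


best = fisher_weights()
print("GE/A alone:", round(criterion((1.0, 0.0)), 3))
print("ROI alone: ", round(criterion((0.0, 1.0)), 3))
print("Best line: ", round(criterion(best), 3),
      "with weight ratio w1/w2 =", round(best[0] / best[1], 3))
```

Only the ratio of the weights matters here, because multiplying both weights by the same constant leaves SSb/SSw unchanged. The printed ratio can be compared with 15.0919/5.769 from the estimated function reported in these slides.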
Classification
Discriminant Function: The line that separates the members of the two groups
Methods of Classification
– Cut-Off Value Method
– Decision Theory Approach
– Classification Function Approach
– Mahalanobis Distance Method
Cut-Off Value Method
Uses the Discriminant Function line to score new observations (prospects) and classify them into one of two groups based on a cut-off value
Classification
[Figure: the Z axis divided at the Cut-off Value into regions R1 and R2]
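A minimal sketch of the cut-off value method, using the canonical discriminant function reported later in these slides and the stock data from the Data table. Two things here are assumptions rather than statements from the slides: stocks 1-12 are taken to be the good group and 13-24 the bad group, and the cut-off is taken as the midpoint between the two group mean scores, which is a common choice when the groups are the same size and both kinds of error cost the same.

```python
# (GE/A, ROI) pairs copied from the Data table in these slides.
# Assumption: stocks 1-12 form the "good" group, stocks 13-24 the "bad" group.
GOOD = [(0.158, 0.182), (0.210, 0.206), (0.207, 0.188), (0.280, 0.236),
        (0.197, 0.193), (0.227, 0.173), (0.148, 0.196), (0.254, 0.212),
        (0.079, 0.147), (0.149, 0.128), (0.200, 0.150), (0.187, 0.191)]
BAD = [(-0.012, -0.031), (0.036, 0.053), (0.038, 0.036), (-0.063, -0.074),
       (-0.054, -0.119), (0.000, -0.005), (0.005, 0.039), (0.091, 0.122),
       (-0.036, -0.072), (0.045, 0.064), (-0.026, -0.024), (0.016, 0.026)]


def z_score(gea, roi):
    """Canonical discriminant function as reported in the slides."""
    return -2.0018 + 15.0919 * gea + 5.769 * roi


def mean(values):
    return sum(values) / len(values)


# Assumption: cut-off = midpoint between the two group mean scores.
cutoff = (mean([z_score(*p) for p in GOOD]) +
          mean([z_score(*p) for p in BAD])) / 2.0


def classify(gea, roi):
    """Assign an observation to a group by comparing its score with the cut-off."""
    return "good" if z_score(gea, roi) > cutoff else "bad"


# Re-classify the 24 stocks and list the stock numbers that end up in the wrong group.
stocks = GOOD + BAD
actual = ["good"] * len(GOOD) + ["bad"] * len(BAD)
misclassified = [number for number, (point, label)
                 in enumerate(zip(stocks, actual), start=1)
                 if classify(*point) != label]

print("cut-off value:", round(cutoff, 4))
print("misclassified stock numbers:", misclassified)
```

A new prospect is scored the same way: compute Z from its GE/A and ROI and compare it with the cut-off. Re-classifying the same stocks that were used to estimate the function tends to give an optimistic error rate.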
Classification Function Approach
Classifications based on this approach are identical to those from the Decision Theory approach
Classification functions are computed for each group:
C1 = -7.87 + 61.237*GEA + 21.027*ROI
C2 = -0.004 + 2.551*GEA - 1.404*ROI
Basic Idea
Score each new observation using these two scoring functions
The observation gets assigned to the group with the higher score
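The scoring rule can be sketched directly from the two classification functions given in these slides. The function names are made up for this illustration, and which group C1 and C2 correspond to is not stated in the slides, so the sketch simply returns the group number 1 or 2.

```python
def c1(gea, roi):
    """Classification function for group 1, as given in the slides."""
    return -7.87 + 61.237 * gea + 21.027 * roi


def c2(gea, roi):
    """Classification function for group 2, as given in the slides."""
    return -0.004 + 2.551 * gea - 1.404 * roi


def assign_group(gea, roi):
    """Return the number of the group whose classification score is higher."""
    return 1 if c1(gea, roi) > c2(gea, roi) else 2


# Two observations taken from the Data table: stock 1 and stock 13.
for name, gea, roi in [("stock 1", 0.158, 0.182), ("stock 13", -0.012, -0.031)]:
    print(name, "C1 =", round(c1(gea, roi), 3), "C2 =", round(c2(gea, roi), 3),
          "-> group", assign_group(gea, roi))
```

Only the difference C1 - C2 matters for the assignment, so this rule is again a linear cut on GE/A and ROI.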
What To Look For In The Results?
Significance of the Discriminating Variables
– Idea is to test whether the means of the discriminating variables are statistically different across the two groups
– Statistic: Wilks’ Lambda must be small (look at its p value/significance level)
Estimate of The Discriminant Function
Canonical Discriminant Function
Z = -2.0018 + 15.0919*GEA + 5.769*ROI
It is possible that the group means are statistically different even though for all practical purposes, the differences between the groups may not be large
Look at the squared Canonical Correlation: the ratio of between-group SS to total SS (high is good)
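As a rough check, the squared canonical correlation can be computed from the discriminant scores of the 24 stocks in the Data table. The sketch assumes stocks 1-12 are the good group and 13-24 the bad group. It also prints the share of within-group SS in total SS. With two groups and a single discriminant function, that share equals Wilks' Lambda and is simply one minus the squared canonical correlation. This is a standard result rather than something stated in the slides.

```python
# (GE/A, ROI) pairs copied from the Data table in these slides.
# Assumption: stocks 1-12 form the "good" group, stocks 13-24 the "bad" group.
GOOD = [(0.158, 0.182), (0.210, 0.206), (0.207, 0.188), (0.280, 0.236),
        (0.197, 0.193), (0.227, 0.173), (0.148, 0.196), (0.254, 0.212),
        (0.079, 0.147), (0.149, 0.128), (0.200, 0.150), (0.187, 0.191)]
BAD = [(-0.012, -0.031), (0.036, 0.053), (0.038, 0.036), (-0.063, -0.074),
       (-0.054, -0.119), (0.000, -0.005), (0.005, 0.039), (0.091, 0.122),
       (-0.036, -0.072), (0.045, 0.064), (-0.026, -0.024), (0.016, 0.026)]


def z_score(gea, roi):
    """Canonical discriminant function as reported in the slides."""
    return -2.0018 + 15.0919 * gea + 5.769 * roi


def mean(values):
    return sum(values) / len(values)


# Discriminant scores, kept separately for each group.
scores = [[z_score(*p) for p in GOOD], [z_score(*p) for p in BAD]]
grand_mean = mean([z for grp in scores for z in grp])

ss_between = sum(len(grp) * (mean(grp) - grand_mean) ** 2 for grp in scores)
ss_within = sum((z - mean(grp)) ** 2 for grp in scores for z in grp)
ss_total = sum((z - grand_mean) ** 2 for grp in scores for z in grp)

squared_canonical_corr = ss_between / ss_total
within_share = ss_within / ss_total  # equals Wilks' Lambda in this two-group case

print("squared canonical correlation:", round(squared_canonical_corr, 3))
print("within-group share (Wilks' Lambda):", round(within_share, 3))
```

A high squared canonical correlation and a small Lambda are two views of the same split of total SS into between-group and within-group parts.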
Importance of the Discriminant Variables and the Discriminant Function
How important is a variable to the Discriminant Function?
Look at the structure loadings: Pooled Within Canonical Structure
– Variable with the higher loading is relatively more important
– Caution: If the variables are highly correlated, the relative importance of the variables can change with the sample
Classification Summary
Look at Cross-Validation results
Web Browsing
Can use the Discriminant function to classify prospects into one of these two groups
Target Appropriately
Catalog Business
Classify other consumers into one of these two groups
Do stuff!
Promotional Campaigns
Classify prospects into price-sensitive and less price-sensitive segments
Target appropriately
Summary
Discriminant Analysis is an extremely useful Segmentation Analysis tool
Intermediate step in the overall picture: helps classify prospects and devise the appropriate targeting strategies