Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven.
Statistical Tools vs. Variable Types
Response (output) | Predictor (input): Numerical   | Predictor (input): Categorical/Mixed
Numerical         | Simple and Multiple Regression | Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA)
Categorical       | Categorical data analysis
Example: Broker Study
A financial firm would like to determine whether the brokers it uses to execute trades differ in their ability to buy stock for the firm at a low price per share. To measure cost, an index, Y, is used.
Y = 1000(A - P)/A, where P = per-share price paid for the stock and A = average of the high price and low price per share for the day.
“The higher Y is the better the trade is.”
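A minimal Python sketch of the index may make the formula concrete. The function name `trade_index` and the prices in the usage lines are hypothetical illustrations, not values from the study.

```python
def trade_index(price_paid: float, day_high: float, day_low: float) -> float:
    """Cost index Y = 1000 * (A - P) / A, where A is the average of the day's high and low."""
    a = (day_high + day_low) / 2.0
    return 1000.0 * (a - price_paid) / a


# Hypothetical prices: day's high 52 and low 48, so A = 50.
y_cheap = trade_index(49.0, 52.0, 48.0)   # bought below the day's midpoint
y_costly = trade_index(51.0, 52.0, 48.0)  # bought above the day's midpoint
print(y_cheap, y_costly)
```

Paying less than the day's midpoint gives a positive Y and paying more gives a negative Y, which is why a higher Y indicates a better trade.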
Trade index Y by broker (Col: broker; R = 6 trades per column):

Broker         1    2    3    4    5
              12    7    8   21   24
               3   17    1   10   13
               5   13    7   15   14
              -1   11    4   12   18
              12    7    3   20   14
               5   17    7    6   19
Column mean    6   12    5   14   17
Five brokers were in the study and six trades were randomly assigned to each broker.
Statistical Model
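“LEVEL” OF BROKER (Broker is, of course, represented as “categorical”)
Data layout: columns j = 1, ..., C are the levels of the factor (brokers); rows i = 1, ..., n are the replicates within each level.

         1      2     ...     C
1       Y11    Y12    ...    Y1C
2       Y21    Y22    ...    Y2C
...     ...    ...    Yij    ...
n       Yn1    Yn2    ...    YnC

Model: $Y_{ij} = \mu_j + \varepsilon_{ij}$, i = 1, ..., n; j = 1, ..., C
where $\mu_j$ is the mean response at level j and $\varepsilon_{ij}$ is the random error.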
One-Way ANOVA F-Test:
H0: Level of X has no impact on Y
H1: Level of X does have an impact on Y
H0: $\mu_1 = \mu_2 = \cdots = \mu_C$
H1: not all $\mu_j$ are equal
ONE WAY ANOVA
Estimate of the common standard deviation: Root MSE, the square root of the Error mean square.
The GLM Procedure
Dependent Variable: TRADE
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4        640.800000     160.200000       7.56    0.0004
Error              25        530.000000      21.200000
Corrected Total    29       1170.800000

R-Square    Coeff Var    Root MSE    TRADE Mean
0.547318     42.63283    4.604346      10.80000
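The entries of this table can be reproduced by hand. The sketch below is a plain-Python check, not the SAS procedure itself, and it uses the thirty trade values as read from the Broker Study data. It stops at the F value because the standard library has no F distribution for the p-value.

```python
from math import sqrt

# Trade index Y for each broker, as read from the Broker Study data table.
trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

all_values = [y for ys in trades.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)
group_means = {b: sum(ys) / len(ys) for b, ys in trades.items()}

# Between-broker (Model) and within-broker (Error) sums of squares.
ss_model = sum(len(ys) * (group_means[b] - grand_mean) ** 2 for b, ys in trades.items())
ss_error = sum((y - group_means[b]) ** 2 for b, ys in trades.items() for y in ys)

df_model = len(trades) - 1                # C - 1
df_error = len(all_values) - len(trades)  # N - C
ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_value = ms_model / ms_error
root_mse = sqrt(ms_error)                 # estimate of the common standard deviation
r_square = ss_model / (ss_model + ss_error)

print(f"Model SS={ss_model:.1f}  Error SS={ss_error:.1f}  F={f_value:.2f}")
print(f"Root MSE={root_mse:.4f}  R-Square={r_square:.4f}")
```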
Diagnosis: Normality
• Check normality on the residuals only, not separately within each group.
• The points on the normality plot must more or less follow a straight line to claim “normally distributed”.
• There are statistical tests to verify it formally.
• The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.
Normality plot: normal scores vs. residuals
From the Broker data:
[Figure: normal quantile plot of the residuals. Horizontal axis: Normal Quantiles (-3 to 3); vertical axis: RESIDUAL (-10.0 to 7.5).]
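The coordinates of such a plot can be computed directly. This is a rough sketch under two assumptions that are not from the slides: the normal scores use Blom's plotting positions, (i - 0.375)/(n + 0.25), and the straightness of the plot is summarized by a plain correlation coefficient (the helper `correlation` is written out here for illustration).

```python
from math import sqrt
from statistics import NormalDist

# Trade index Y for each broker, as read from the Broker Study data table.
trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

# Residual = observation minus its own broker's mean, pooled over all brokers.
residuals = sorted(y - sum(ys) / len(ys) for ys in trades.values() for y in ys)

# Normal scores from Blom's plotting positions (an assumed convention).
n = len(residuals)
normal_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Pearson correlation, written out to avoid depending on newer library versions."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# A value close to 1 means the plotted points lie close to a straight line.
qq_correlation = correlation(normal_scores, residuals)
print(f"smallest residual={residuals[0]}, largest={residuals[-1]}, r={qq_correlation:.3f}")
```

Pooling the residuals gives 30 points for one plot, rather than five separate plots with 6 points each.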
Diagnosis: Equal Variances
• The points on the residual plot must fall more or less within a horizontal band to claim “constant variance”.
• There are statistical tests to verify it formally.
• The ANOVA method we learn here is not sensitive to the constant-variance assumption. That is, slightly different variances within groups will not change our conclusions much.
Residual plot: predicted values vs. residuals
From the Broker data:
[Figure: residual plot. Horizontal axis: PREDICTED (5 to 17); vertical axis: RESIDUAL (-8 to 7).]
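A numerical companion to the residual plot is to list each broker's predicted value (its mean) next to the spread of its residuals. The sketch below is only an informal check, not one of the formal tests mentioned above. With equal group sizes the pooled variance is the simple average of the group variances, so it should match the Error mean square in the ANOVA table.

```python
from statistics import mean, variance

# Trade index Y for each broker, as read from the Broker Study data table.
trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

# For each broker: (predicted value, sample variance of the trades around it).
summary = {b: (mean(ys), variance(ys)) for b, ys in trades.items()}

# Equal group sizes, so pooling reduces to averaging the group variances.
pooled_variance = mean(v for _, v in summary.values())

for broker, (predicted, var) in sorted(summary.items()):
    print(f"broker {broker}: predicted={predicted}, variance={var:.1f}")
print(f"pooled variance={pooled_variance:.1f}")
```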
Multiple Comparison Procedures
Once we reject H0: $\mu_1 = \mu_2 = \cdots = \mu_C$ in favor of H1: not all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, but simply that they’re not all the same. If there are 4 columns, are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? etc.
Pairwise Comparison
Goal: grouping levels
Method: Compare each pair of levels
The SNK (Student-Newman-Keuls) procedure is a popular choice and is introduced here.
SAS Output for SNK Procedure
Number of Means            2            3            4            5
Critical Range     5.4749249    6.6214244    7.3120942    7.8071501

Means with the same letter are not significantly different.

SNK Grouping      Mean    N    BROKER
     A          17.000    6    5
     A
     A          14.000    6    4
     A
     A          12.000    6    2

     B           6.000    6    1
     B
     B           5.000    6    3
Brokers 1 and 3 are not significantly different from each other, but both are significantly different from the other 3 brokers.
Brokers 2, 4, and 5 share the letter A, so no pair among them is significantly different: the largest gap, 17 - 12 = 5 between brokers 5 and 2, is below the 3-mean critical range of 6.62.
Conclusion: 5 4 2 | 1 3 (brokers in decreasing order of mean; brokers on the same side of the bar are not significantly different)
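The step-down logic behind the SNK grouping can be sketched in a few lines. The critical ranges are copied from the SAS output rather than computed, since they come from the studentized range distribution, which the Python standard library does not provide. The variable names are illustrative.

```python
# Broker means and critical ranges copied from the SAS output.
means = {5: 17.0, 4: 14.0, 2: 12.0, 1: 6.0, 3: 5.0}
critical_range = {2: 5.4749249, 3: 6.6214244, 4: 7.3120942, 5: 7.8071501}

ordered = sorted(means, key=means.get, reverse=True)  # brokers, largest mean first
k = len(ordered)
not_different = set()  # position pairs protected by a non-significant range
significant = []       # broker pairs declared significantly different

# Work from the widest span of ordered means down to adjacent pairs.
for span in range(k, 1, -1):
    for lo in range(0, k - span + 1):
        hi = lo + span - 1
        if (lo, hi) in not_different:
            continue  # already inside a wider range that was not significant
        gap = means[ordered[lo]] - means[ordered[hi]]
        if gap > critical_range[span]:
            significant.append((ordered[lo], ordered[hi]))
        else:
            # Every pair inside a non-significant range is also declared not different.
            for a in range(lo, hi + 1):
                for b in range(a + 1, hi + 1):
                    not_different.add((a, b))

print("significantly different pairs:", significant)
```

A span of p ordered means is compared against the critical range for p means, so wider spans face a larger threshold than adjacent pairs.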
Comparisons to Control: Dunnett Procedure
Designed specifically for comparing several “treatments” to a “control.”
In our example, column (broker) 1 is the CONTROL and R = 6:
Col      1    2    3    4    5
Mean     6   12    5   14   17
- Cols 4 and 5 differ from the control [1].
- Cols 2 and 3 are not significantly different from the control.
Comparisons significant at the 0.05 level are indicated by ***.
BROKER Comparison    Difference Between Means    Simultaneous 95% Confidence Limits
5 - 1                          11.000               4.070     17.930    ***
4 - 1                           8.000               1.070     14.930    ***
2 - 1                           6.000              -0.930     12.930
3 - 1                          -1.000              -7.930      5.930
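The intervals in this output all have the same half-width, so the comparison with the control reduces to simple arithmetic. The sketch below takes the half-width from the output (17.930 - 11.000 = 6.930) rather than from a Dunnett table, and backs out the critical value it implies. The names are illustrative.

```python
from math import sqrt

# Broker means, Error mean square and group size from the earlier output.
means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
control = 1
mse, n = 21.2, 6

# Half-width of each simultaneous interval, read off the output (17.930 - 11.000).
half_width = 6.930

# Standard error of a difference between two means, and the critical value implied by the output.
std_err_diff = sqrt(2 * mse / n)
implied_critical_value = half_width / std_err_diff

results = {}
for broker, m in means.items():
    if broker == control:
        continue
    diff = m - means[control]
    lower, upper = diff - half_width, diff + half_width
    # Significant when the interval excludes zero.
    results[broker] = (diff, lower, upper, lower > 0 or upper < 0)

for broker, (diff, lower, upper, flag) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{broker} - {control}: {diff:7.3f} [{lower:7.3f}, {upper:7.3f}] {'***' if flag else ''}")
```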
Contrast
Question 1: Broker 1 vs. the others
Question 2: Brokers 1 and 2 are more experienced than the others: experienced vs. less experienced brokers.
SAS Output for Question 1
Contrast                    DF    Contrast SS    Mean Square    F Value    Pr > F
BROKER 1 VS THE OTHERS       1    172.8000000    172.8000000       8.15    0.0085
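The contrast sum of squares can be checked by hand from the broker means. The sketch assumes coefficients (4, -1, -1, -1, -1) for Question 1 and (3, 3, -2, -2, -2) for Question 2. These are one natural coding and may not be the exact coefficients used to produce the output, although any rescaling gives the same sum of squares. The slides show output for Question 1 only, so the Question 2 numbers are computed here, not quoted.

```python
# Broker means (brokers 1..5), group size and Error mean square from the earlier output.
means = [6.0, 12.0, 5.0, 14.0, 17.0]
n, mse = 6, 21.2


def contrast_f(coeffs):
    """Return (contrast SS, F value) for a contrast among equal-sized group means."""
    assert sum(coeffs) == 0, "contrast coefficients must sum to zero"
    estimate = sum(c * m for c, m in zip(coeffs, means))
    ss = estimate ** 2 / (sum(c * c for c in coeffs) / n)
    return ss, ss / mse  # 1 degree of freedom, so the mean square equals the SS


# Question 1: broker 1 vs. the others (assumed coefficients).
ss_q1, f_q1 = contrast_f([4, -1, -1, -1, -1])

# Question 2: brokers 1 and 2 vs. brokers 3, 4 and 5 (assumed coefficients).
ss_q2, f_q2 = contrast_f([3, 3, -2, -2, -2])

print(f"Question 1: SS={ss_q1:.1f}, F={f_q1:.2f}")
print(f"Question 2: SS={ss_q2:.1f}, F={f_q2:.2f}")
```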
KRUSKAL - WALLIS TEST
(Non - Parametric Alternative)
H0: The probability distributions are identical for each level of the factor
H1: Not all the distributions are the same
Example: Life Insurance Amount
State
1: CA 2: KA 3: CO
90 80 165
200 140 160
225 150 140
100 140 160
170 150 175
300 300 155
250 280 180
SAS Code
DATA INSURANCE;
    INPUT STATE $ AMOUNT @@;
DATALINES;
CA 90 CA 200 CA 225 CA 100 CA 170 CA 300 CA 250
KA 80 KA 140 KA 150 KA 140 KA 150 KA 300 KA 280
CO 165 CO 160 CO 140 CO 160 CO 175 CO 155 CO 180
;

** NON-PARAMETRIC TEST;
PROC NPAR1WAY DATA=INSURANCE WILCOXON;
    TITLE "NONPARAMETRIC TEST TO COMPARE STATES";
    CLASS STATE;
    VAR AMOUNT;
RUN;
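For readers who want to see what the Kruskal-Wallis test computes, here is a plain-Python sketch of the statistic for the same data, using midranks for ties and the usual tie correction. It is an illustration, not the SAS procedure, and the slides do not show the SAS output to compare against. The closed-form p-value used here is valid only because three groups give 2 degrees of freedom.

```python
from math import exp

# Life insurance amounts by state, as listed in the example.
amounts = {
    "CA": [90, 200, 225, 100, 170, 300, 250],
    "KA": [80, 140, 150, 140, 150, 300, 280],
    "CO": [165, 160, 140, 160, 175, 155, 180],
}

pooled = sorted(v for vs in amounts.values() for v in vs)
n_total = len(pooled)

# Midrank of each distinct value: the average of the 1-based positions it occupies.
midrank = {}
tie_term = 0
i = 0
while i < n_total:
    j = i
    while j < n_total and pooled[j] == pooled[i]:
        j += 1
    t = j - i                              # number of tied observations
    midrank[pooled[i]] = (i + 1 + j) / 2
    tie_term += t ** 3 - t
    i = j

rank_sums = {s: sum(midrank[v] for v in vs) for s, vs in amounts.items()}

# Kruskal-Wallis H, then the correction for ties.
h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[s] ** 2 / len(vs) for s, vs in amounts.items()
) - 3 * (n_total + 1)
h_corrected = h / (1 - tie_term / (n_total ** 3 - n_total))

# Chi-square upper tail with 2 degrees of freedom is exp(-x / 2); not valid for other df.
p_value = exp(-h_corrected / 2)

print(rank_sums)
print(f"H (tie-corrected)={h_corrected:.4f}, approximate p-value={p_value:.3f}")
```

A large p-value means the data give little evidence against H0, that is, against identical distributions across the three states.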