-
CHAPTER 1
1) (causal relationship) , (
: Multiple Regression, : Multivariate ANOVA) 2)
(reduction) (classification)
.
(large), (complicate & complex) .
. .
(1)
, , , 4
. 4 ()
(), () () .
(common entity).
. (
) .
4 (, , , ) 2 (, )
. . 4
.
-
Chapter1.
2
(2)
100 IQ, , , , , ,
. 100
. .
.
(3)
(, , ) (, , )
, .
.
,
(confirmatory)
, ((exploratory) .
.
1.1.
1.1.1.
(Statistics is about data) .
(collection), (summarization), (analysis), (presentation)
.
(: population)
(sample) (: , , ) (, : IQ, , ,
, ) . .
(: ,
), (: ),
(: IQ, , ), (: ,
) .
.
(1) (2)
.
.
. 30
. , ,
.
,
(data matrix) .
. n p (, ,
, , , ) .
. () () .
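$x_{ij}$
.
$$X_{n \times p} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$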
          var1        var2       var3            ...  varp
1  obs1   X11 = 180   X12 = 82   X13 = Married   ...  X1p
2  obs2   X21 = 163   X22 = 56   X23 = Single    ...  X2p
...
n  obsn   Xn1 = 173   Xn2 = 75   Xn3 = Single    ...  Xnp
. $x_i$
$p$ .
$$x = (x_1, x_2, \ldots, x_p)'$$
$$\tilde{x}_j = (x_{1j}, x_{2j}, \ldots, x_{nj})', \quad j = 1, 2, \ldots, p$$
1.1.2.
, .
, , ,
. (history)
(trail) . ,
.
.
1.1.3.
() (random experiment,
) (sample space, S ) (element)
.
. .
(1)(continuous)
.
, ,
.
(2)(discrete)
, , , .
(1) (metric, measurable, quantitative)
, , , IQ, ,
.
. (: )
(2), (Non-metric, categorical, classified)
, , , (, , )
. .
(nominal)
(, ), (, ) .
(ordinal)
(A>B>C>D>E) .
(time series)
(Cross-section: ) . (,
) ( ) .
(causal relationship) ( ) (explanatory)
(independent) ( )
(dependent) (response) . Y , X
. .
,
.
.
1.2.
. 1.2.2
.
1.2.1.
(1) (Multiple Regression)
, (
.) 2
(Multivariate Regression),
(Simultaneous Equation Regression) .
||||
(: ) ,
, , . 1) ,
, , ( ) 2)
( ) ( ) 3)
( ) .
(2) (Logistic Regression)
(binary, dichotomous)
. .
||||
, ,
(: ),
.
(3) (ANOVA: Analysis Of Variance)
.
||||
(: ) ( , , ), (,
, ), (30 , 40 , 50 )
.
(4) (Multivariate ANOVA)
2
.
||||
() , , (, , ),
(/)
.
1.2.2.
(variable directed
technique) .
(component), (factor), (canonical variate) .
(1) (Principal Component Analysis: PCA)
(2) (Factor Analysis: FA)
(3) (Canonical Analysis: CA)
.
1.2.3.
(Individual Directed) .
(1) (Cluster Analysis: CA): Multi-Dimensional Scaling (MDS: )
(2) (Discriminant Analysis: DA): (Canonical Discriminant
Analysis: CDA), (Logistic Discriminant Analysis: LDA)
.
1.3.
. 30
( 15, 15) , , , .
1.3.1.
(abnormality) (outliers:
) (
) () .
.
. p )( pk
.
, , , 2~3
.
1.3.2.
. (, , , )
( .) .
, ( .)
, .
( )
(variable-directed technique)
.
.
1.3.3.
2 (
) . ,
, ,
(, , , ) .
() ,
, , .
Logistic Logistic .
30 (, , , )
, , ,
() .
1.3.4.
.
.
(2 ) (MDS: Multi-Dimensional Scaling)
.
,
. 30 (, )
. , , , 30 .
.
1.3.5.
2 . ,
( 3 ) , , (, , ),
(, ) . ,
, 3 (ANOVA)
.
(1) , , ( , )
.
(2)1 . k 1 ( )
. ()
$1 - (1 - \alpha)^k$ . .
(false
significant) 1
.
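The inflation of the overall error rate when k tests are each run at level alpha can be checked numerically. The following is a minimal Python sketch (the notes themselves use SAS); it assumes the k tests are independent, and the value alpha = 0.05 and the values of k are illustrative choices, not taken from the notes.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false rejection across k independent tests,
    each carried out at level alpha: 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** k


# Illustrative settings (assumed): per-test level 0.05, k comparisons.
for k in (1, 3, 10):
    print(f"k = {k:2d}   family-wise error = {familywise_error(0.05, k):.4f}")
```

With three comparisons the chance of at least one false significance is already about 14%, which is why a single overall test is preferred to repeated pairwise tests.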
1.3.6.
, .
.
.
S D N N N S
D S N S N N
Yes Yes No No No Yes
No No Yes Yes No No
P P R R Yes No
P P N N N D
D P N N N N
N: , P: , R: , S: , D:
-
CHAPTER 3.
.
(box-whisker plot) - (stem and leaf
plot) , , .
(W-), (, ) ( )
. 2
(scatter plot) .
.
3 ? 3
4
. .
. )3(p 2
.
2 ( ) .
3.1.
p ix
.
$$x = (x_1, x_2, \ldots, x_p)'$$
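$p$ (Multivariate Normal Distribution) , $x \sim N_p(\mu, \Sigma_{p \times p})$
$$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right]$$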
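$$\mu = E(x) = (E(x_1), E(x_2), \ldots, E(x_p))' = (\mu_1, \mu_2, \ldots, \mu_p)'$$
,
$$\Sigma = Cov(x) = E[(x - \mu)(x - \mu)'] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$
$$\sigma_{ij} = Cov(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)] \quad \text{for } i \ne j$$
$$\sigma_{ii} = Cov(x_i, x_i) = Var(x_i) = E(x_i - \mu_i)^2$$
$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} \ \text{for } i \ne j, \qquad \rho_{ij} = 1 \ \text{for } i = j$$
$$R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp} \end{pmatrix}$$
( $\mu_j$ ) $\bar{x}_j$ , ( $\sigma_{jj}$ ) $s_{jj}$ .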
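$p = 2$ : $\mu = (\mu_1, \mu_2)'$ , $\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$ .
$$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{2/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right] ,$$
$$\mu = (E(x_1), E(x_2))' = (\mu_1, \mu_2)' , \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$$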
3.2.
3.2.1.
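$$r = \frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})/(n-1)}{\sqrt{\sum (X - \bar{X})^2/(n-1)}\,\sqrt{\sum (Y - \bar{Y})^2/(n-1)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
[Figure: four scatter plots, two with $|r| = 0.9$ and two with $|r| = 0.1$]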
(linear association) .
(1) $-1 \le r \le 1$ .
(2)1 . ()
().
(3)1 . ()
().
(4)0 .
(comparable)
.
, .
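The sample correlation coefficient can be computed directly from its definition. The following is a minimal Python sketch (the notes themselves use SAS); the three height and weight pairs are the illustrative observations shown in the Chapter 1 data matrix, and the function name is a hypothetical choice.

```python
import math


def pearson_r(x, y):
    """Sample correlation: sum of cross-products over the root of the
    product of the two sums of squares (the n - 1 divisors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Height (cm) and weight (kg) of the three observations in the Chapter 1 table.
height = [180, 163, 173]
weight = [82, 56, 75]
print(round(pearson_r(height, weight), 4))
```

Because r only measures linear association, a value near zero from this function does not rule out a nonlinear relationship between the two variables.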
3.2.2.
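$H_0: \rho = 0$
(1)
$$T = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t(n-2) \quad \text{where} \quad \rho = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E(X - \mu_X)^2\,E(Y - \mu_Y)^2}}$$
( )
(2) $H_0: \rho = \rho_0$
$$z = 0.5 \ln\frac{1 + r}{1 - r} \overset{app}{\sim} Normal\left(0.5 \ln\frac{1 + \rho}{1 - \rho},\ \frac{1}{n - 3}\right)$$
.
(3)
$$0.5 \ln\frac{1 + r}{1 - r} \pm z_{\alpha/2}\sqrt{\frac{1}{n - 3}} = (L, U)$$ .
$$\rho_L = \frac{e^{2L} - 1}{e^{2L} + 1} , \qquad \rho_U = \frac{e^{2U} - 1}{e^{2U} + 1}$$
$n = 50,\ r = 0.527$ . $z = 0.5 \ln\frac{1 + r}{1 - r} = 0.5 \ln\frac{1 + 0.527}{1 - 0.527} = 0.586$ 95%
$(L, U) = (0.3, 0.872)$ .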
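The worked example (n = 50, r = 0.527) can be reproduced step by step. The following is a minimal Python sketch, not the SAS procedure the notes rely on; the critical value 1.96 for a 95% interval and the function names are assumptions made for illustration.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher transformation z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_to_r(z: float) -> float:
    """Back-transformation (e^{2z} - 1) / (e^{2z} + 1)."""
    return (math.exp(2.0 * z) - 1.0) / (math.exp(2.0 * z) + 1.0)


def t_statistic(r: float, n: int) -> float:
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), referred to t(n - 2) under rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def correlation_ci(r: float, n: int, z_crit: float = 1.96):
    """Return (L, U) on the z scale and the matching limits for rho."""
    z = fisher_z(r)
    half_width = z_crit * math.sqrt(1.0 / (n - 3))
    lo, hi = z - half_width, z + half_width
    return (lo, hi), (z_to_r(lo), z_to_r(hi))


z_limits, rho_limits = correlation_ci(0.527, 50)
print("z =", round(fisher_z(0.527), 3))
print("(L, U) =", tuple(round(v, 3) for v in z_limits))
print("limits for rho =", tuple(round(v, 3) for v in rho_limits))
```

The interval (L, U) is on the transformed z scale; the last line applies the back-transformation to obtain limits for the correlation itself.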
3.2.3.
0
. 0
( 1212 4.0100 XXX ) .
(p-) .
.
(1) control 0.9 .
(2) control 0.7 .
(3) 20-30 0.6 .
(4) (1-5 ,
)
. ( $R^2 = SSR/SST$ ) .
. Spearman , Kendall's Tau
.
3.2.4. SAS
DELIMITER='09'x MISSOVER DSD Tab
.
(1)NOPRINT output . NOPRINT
(, ) , ( p -) .
NOSIMPLE .
(2)OUTP=OUT1 OUT1 SAS data
. .
3.3.
3.3.1.
A 48 10 15
6 . [APPLICANT.TXT]/ [Applied
Multivariate Methods for Data Analysts, Dallas E. Johnson, p. 101]
ID( ) Letter ($X_1$) Appearance ($X_2$) Academic Ability ($X_3$)
Likeability ($X_4$) Self-Confidence ($X_5$) Lucidity ($X_6$)
Honesty ($X_7$) Salesmanship ($X_8$) Experience ($X_9$)
Drive ($X_{10}$) Ambition ($X_{11}$) Grasp Concept ($X_{12}$)
Potential ($X_{13}$) Keenness to Join ($X_{14}$) Suitability ($X_{15}$)
3.3.2.
15 15/)( AAAPLAVG 6
. .
(1/15)
.
HOMEWORK#2-2 . .
/* */ , /* */ mean
.
3.3.3.
( ) (weight)
. ( $Avg = w_1 L + w_2 AP + \cdots + w_{15} SU$ , where $\sum_i w_i = 1$ )
.
A , , ,
, . 5 2
. 5 2
20.
3.3.4.
.
(grouping)
. 15 .
.
. ,
.
Group 1: $X_5, X_6, X_8, X_{10}, X_{11}, X_{12}, X_{13}$
Group 2: $X_1, X_9, X_{15}$
Group 3: $X_4, X_7, X_{14}$
Group 4: $X_2$
Group 5: $X_3$
15 (, ) 5
. group 1 ( ) 7
1
.
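$$AVG_w = \left[(X_5 + X_6 + X_8 + X_{10} + X_{11} + X_{12} + X_{13})/7 + (X_1 + X_9 + X_{15})/3 + (X_4 + X_7 + X_{14})/3 + X_2 + X_3\right]/5$$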
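The group-based average gives every group the same weight, so a variable that forms its own group counts far more than one that shares a group with six others. The following is a minimal Python sketch of that calculation (the notes use SAS); the applicant scores are hypothetical, while the group memberships come from the table above.

```python
# Variable groups as listed in the table above (indices refer to X1..X15).
GROUPS = {
    1: [5, 6, 8, 10, 11, 12, 13],
    2: [1, 9, 15],
    3: [4, 7, 14],
    4: [2],
    5: [3],
}


def grouped_average(scores):
    """Average of the five group means; scores maps variable index -> score."""
    group_means = [sum(scores[j] for j in idx) / len(idx) for idx in GROUPS.values()]
    return sum(group_means) / len(group_means)


def implied_weights():
    """Weight each variable effectively receives: 1 / (number of groups * group size)."""
    return {j: 1.0 / (len(GROUPS) * len(idx)) for idx in GROUPS.values() for j in idx}


# Hypothetical applicant: 8 points on every item except appearance (X2), scored 3.
applicant = {j: 8 for j in range(1, 16)}
applicant[2] = 3
print(grouped_average(applicant))      # group-based average
print(sum(applicant.values()) / 15)    # simple average, for comparison
print(implied_weights()[2], implied_weights()[5])
```

In this sketch X2 carries weight 1/5 while each Group 1 variable carries 1/35, which is the sense in which grouping correlated variables changes the weighting.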
3.4.
3.4.1.
(scatter
plot). 3 .
(scatter plot matrix) . SAS SAS/INSIGHT
.
. CTRL
. .
.
.
.
3.4.2. Bubble
2 3
Bubble ( blob ) . 2
. GC y , PO
x AM Bubble plot .
. $X_5$ ( y ) $X_6$ ( x )
. $X_5$ $X_6$
( $X_5$ $X_6$ . $r = 0.80755$ ) . $X_5$ $X_8$
$X_5$ ( y )
( $r = 0.79963$ ) . $X_6$ $X_8$ $X_6$ ( x )
( $r = 0.81802$ ) .
Chernoff , , , , , , ,
3
.
[EXERCISE]
(1) . [CLASS.txt]
, . (=0.05)
95% .
(2) . [APPLICANT.txt]
6 .
6
.
6 .
(3) 50 15 . [POLICE.txt]
ID: REACT:
HEIGHT (cm) WEIGHT (kg)
SHLDR: (cm) PELVIC: (cm)
CHEST: (cm) THIGH: (mm)
PULSE: DIAST:
CHNUP: BREATH: (liter)
RECVR: (treadmill) 5
SPEED:
ENDUR: ()
FAT:
15 .
, , Bubble plot .
-
CHAPTER 4
? .
. PCA(Principal Component Analysis: )
? . ,
, ,
(, ) .
( Big & Tall) .
2
.
.
3 ( 15 )
. 15
.
, , , 30
. 30
. 1-
2 .
(1) 3p 1-2 ( 3
.) (2) ( )
. p p
() 1-2 () .
. 80% .
4.1.
4.1.1.
19 (: pound) IQ .
IQ ( ) .
IQ
.
.
.[2 SAS/IML ]
$\lambda_1 = 518.69$ $\lambda_2 = 1.18$
. () .
(, IQ) ()
(99.77%) .
4.1.2.
(1)
- , -
.
. 3
? 3 Bubble
3
. 3p (1-2 )
.
(2)
() 1-2 .
4.1.1 IQ () ,
. 2
3p
.
(3)
.
-
.
() .
.
(4)
(multicollinearity)
( $(X'X)^{-1}$
$|X'X| \approx 0$ $s^2(b) = MSE\,(X'X)^{-1}$ )
.
(Ridge Regression:
)
. (
) .
..
4.2.
(principal components) .
(1) . ()
(2) (, ) 2, 3,
.
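$p$
$$x = (x_1, x_2, \ldots, x_p)'$$
.
$$y = (y_1, y_2, \ldots, y_p)' , \qquad y = Lx , \qquad L = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1p} \\ l_{21} & l_{22} & \cdots & l_{2p} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pp} \end{pmatrix}$$
( )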
.
.
.
4.2.1.
p
.
. ?
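$$x = (x_1, x_2, \ldots, x_p)' , \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$
.
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ ( $\lambda_i$ )
$e_i$ $y_i = e_i' x$
$Var(Y_i) = e_i' \Sigma e_i = \lambda_i$ ,
$Cov(Y_i, Y_k) = e_i' \Sigma e_k = 0$ for $i \ne k$
.
$X_i$ .
$P$ .
$$\sum_{i=1}^{p} Var(x_i) = tr(\Sigma) = tr(P \Lambda P') = tr(\Lambda P' P) = tr(\Lambda) = \sum_{i=1}^{p} \lambda_i$$ .
$k$ $y_k = e_k' x = e_{k1} x_1 + e_{k2} x_2 + \cdots + e_{kp} x_p$
$\lambda_k / \sum_{i=1}^{p} \lambda_i$
.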
4.2.2.
(1) (first principal component)
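$a_1' a_1 = 1$ $a_1$ $a_1'(x - \mu)$ $V(a_1'(x - \mu))$ $a_1$
$y_1 = a_1'(x - \mu)$ .
$\mu = (\mu_1, \mu_2, \ldots, \mu_p)'$ $\mu_i$
$x_i$ .
(2) (second principal component)
$a_2' a_2 = 1$ , $a_1' a_2 = 0$ ( .
) $a_2'(x - \mu)$ $a_2$
$y_2 = a_2'(x - \mu)$ .
(3) (third principal component)
$a_3' a_3 = 1$ , $a_1' a_3 = 0$ , $a_2' a_3 = 0$ ( , )
$a_3'(x - \mu)$ $a_3$ $y_3 = a_3'(x - \mu)$
.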
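( $y_1, y_2, \ldots, y_p$ ) .
( ).
.
$$y_{p \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} a_1' \\ a_2' \\ \vdots \\ a_p' \end{pmatrix} (x - \mu)_{p \times 1}$$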
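(4) $a_j$ ?
- $\Sigma$ $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$
$e_1, e_2, \ldots, e_p$ .
$a_1 = e_1,\ a_2 = e_2,\ \ldots,\ a_p = e_p$
$e_i' e_j = 1,\ i = j$ , $e_i' e_j = 0,\ i \ne j$ .
.
$y_j$ $\lambda_j$ .
- trace $tr(\Sigma)$ $x_1, x_2, \ldots, x_p$ () .
$tr(\Sigma) = \sum_i V(x_i) = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \lambda_1 + \lambda_2 + \cdots + \lambda_p$ $y_j$
$\lambda_j / \sum_{j=1}^{p} \lambda_j$ .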
(5)
- -
S . - )( S
. (, IQ)
S .
, .
,
(518.65+1.228) .
1 1)999958.0()009142.0( 22 0 .
.
4.3.
4.3.1.
. ()
. r j .
$y_{rj} = e_j'(x_r - \bar{x})$ , $x_r$ r- ( $r = 1, 2, \ldots, n$ )
(IQ, Weight)
68.1192.100 x .
j () 110(pound) IQ=125 j
$Y_{j1} = 0.999958\,(110 - 100.02) + 0.009142\,(125 - 119.68)$
$Y_{j2} = -0.009142\,(110 - 100.02) + 0.999958\,(125 - 119.68)$ .
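A principal component score is the inner product of an eigenvector with the mean-centered observation. The following is a minimal Python sketch of the calculation for the observation (weight 110, IQ 125), using the means and the first eigenvector quoted in the text. The sign pattern of the second eigenvector was lost in the source, so the choice below is an assumption that only guarantees orthogonality to the first; flipping both signs gives an equally valid component.

```python
MEAN = (100.02, 119.68)         # means used in the worked score above
E1 = (0.999958, 0.009142)       # first eigenvector, from the text
E2 = (-0.009142, 0.999958)      # second eigenvector; sign pattern assumed


def pc_scores(x, mean=MEAN, eigenvectors=(E1, E2)):
    """Scores y_j = e_j'(x - mean) for each eigenvector e_j."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return tuple(sum(e * c for e, c in zip(ev, centered)) for ev in eigenvectors)


y1, y2 = pc_scores((110, 125))
print(round(y1, 4), round(y2, 4))

# The eigenvectors have unit length and are orthogonal to each other.
print(round(sum(e * e for e in E1), 6), sum(a * b for a, b in zip(E1, E2)))
```

Because the first eigenvector loads almost entirely on weight, the first score is essentially the centered weight, which mirrors the point made in the text about analysing the covariance matrix of variables with very different variances.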
. SAS OUT=
. (4.5 )
4.3.2.
$e_j$ $c_j = \sqrt{\lambda_j}\, e_j$ , $j = 1, 2, \ldots, p$
(component loading vector) .
.
.
.
,
.
.
0.99 , IQ 0.009
. IQ
.
.
(1)
.
,
.(4.5 )
(2)
.
4.4.
. .
() () 100% .
(, IQ) IQ 2
() 1 . 4.1
y (2 1 )
115 3 . IQ
3
. .
(
80%) 2 .
.
4.4.1.
$\lambda_k / \sum_{j=1}^{p} \lambda_j$ .
( 1 ) , ,2
. 80%
?
$$0.7 \le \frac{\lambda_1 + \lambda_2 + \cdots}{tr(S)} \le 0.9$$
2-3 90%,
5-6 70% .
1 80% 1
.
99.8%
2 .
. .
4.4.2. SCREE plot
$(1, \lambda_1), (2, \lambda_2), \ldots$ SCREE 0
. 9 SCREE plot .
4 0 3 .
[Figure: SCREE plot, eigenvalue (vertical axis, 0 to 70) against component number (horizontal axis, 0 to 10)]
518.69 1.18
1 . SCREE
plot . 4.5 80%
1 .
4.5.
.
,
.
,
.
(, ) ,
.
. (:
) ,
.
.
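$$\begin{pmatrix} \frac{\sigma_{11}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{11}}} & \frac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}} \\ \frac{\sigma_{21}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{11}}} & \frac{\sigma_{22}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{2p}}{\sqrt{\sigma_{22}}\sqrt{\sigma_{pp}}} \\ \vdots & \vdots & & \vdots \\ \frac{\sigma_{p1}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{11}}} & \frac{\sigma_{p2}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{22}}} & \cdots & \frac{\sigma_{pp}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{pp}}} \end{pmatrix} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}$$
$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$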
- - S
R R .
- S
.
(1) S R .
(2) 1 80% .
(3) .
R
$\lambda_j / tr(S) = \lambda_j / \sum_{i=1}^{p} \lambda_i$ $\lambda_j / p$
4.6.
4.6.1. 2
.
(1)IQ(y ) (x ) .
(2)IQ .
(3) . $\lambda_1 = 545.25$ $\lambda_2 = 16.27$
(4) , (0.0666, 0.998)
(0.998, -0.0666). () y1 y , y2 x
.
(5)(1) (4) .
SAS & (4)
(, score) . )( x
x .
(2)
y1 y2 0 .
(3)
(1), (5)
, .
( ) (, IQ)
545.25/(545.25+16.27)=97(%) .
y (IQ) x () .
4.6.2.
48 .
2~3
. [APPLICANT.txt]
15
. 15 6 ( )
. 15 1~2 (
)
.
DATA APPLICANT;
  INFILE "C:\TEMP\APPLICANT.TXT";
  INPUT ID L AP AA LI SC LC HO SM EX DR AM GC PO KJ SU;
RUN;
PROC PRINCOMP DATA=APPLICANT OUT=SCORE COVARIANCE;
  VAR L--SU;
RUN;
PROC PRINT DATA=SCORE;
  VAR PRIN1-PRIN15;
RUN;
(1)OUT SCORE SAS . SAS
PRIN1( ), PRIN2, .
SCORE SAS . SCORE
. 15
4.3040, 0.3819, .
)( x .
(2)COVARIANCE(COV) -
. Default ( R ) .
(3)VAR L--SU . VAR L AP AA SU;
. .
.
p .
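$s_{11} = \hat{\sigma}_{11} = 7.15$ , $s_{12} = s_{21} = \hat{\sigma}_{12} = \hat{\sigma}_{21} = 1.26$ , S
$\lambda_1 = 66.5,\ \lambda_2 = 18.2,\ \lambda_3 = 10.6,\ \lambda_4 = 6.8, \ldots$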
(15x15) (66.53+18.18+...+0.30) 15
(122.53 = $\sum_{i=1}^{15} s_{ii}$ = 7.15+3.87+3.95+... ) .
( $\lambda_i / \sum_i \lambda_i$ ) . Difference
. ( S ) 66.53
54%(=66.54/122.54) . 48.36 (66.54-18.18) .
Cumulative = . 4
83% 15 4 .
80%
eigen-value 1 . 10
1~2 80% .
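The cumulative-proportion rule can be applied mechanically to the eigenvalues reported above. The following is a minimal Python sketch (the notes use the SAS output directly); it uses only the four leading eigenvalues and the trace 122.53 quoted in the text, and the 80% threshold is the rule of thumb mentioned in the notes.

```python
def components_needed(eigenvalues, total_variance, threshold=0.80):
    """Smallest k whose leading eigenvalues explain at least `threshold` of the
    total variance; returns (k, cumulative proportion), or None if never reached."""
    cumulative = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        cumulative += lam
        if cumulative / total_variance >= threshold:
            return k, cumulative / total_variance
    return None


# Leading eigenvalues of S and the total variance tr(S) as quoted in the text.
leading = [66.53, 18.18, 10.6, 6.8]
total = 122.53

running = 0.0
for k, lam in enumerate(leading, start=1):
    running += lam
    print(k, round(lam / total, 3), round(running / total, 3))
print(components_needed(leading, total))
```

The same function works for a correlation-matrix analysis by passing the number of variables p as the total variance.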
( 69%)
. 6
.
PROC FACTOR DATA=APPLICANT SCREE COVARIANCE;
  VAR L--SU;
RUN;
SCREE plot
54% .
1-10
. 1~2 80%
.
$a_1 = e_1,\ a_2 = e_2,\ \ldots,\ a_{15} = e_{15}$ .
$r$ () $j$ $y_{rj} = e_j'(x_r - \bar{x})$ , ( $x_r$ $r$ -
). OUT=SCORE SAS score
PROC PRINT .
(4.304)
)(*2745.0...)(*0296.0)(*132.0)(*149.0 SUSUAAAAAPAPLL
4304.0)96.510(*2745.0...)08.72(*0296.0)08.77(*132.0)66(*149.0
4.6.3.
, .
. (
) .
.
DATA APPLICANT;
  INFILE "D:\TEMP\APPLICANT.TXT";
  INPUT ID L AP AA LI SC LC HO SM EX DR AM GC PO KJ SU;
RUN;
PROC PRINCOMP DATA=APPLICANT OUT=SCORE1;
  VAR L--SU;
RUN;
PROC PRINT DATA=SCORE1;
  VAR PRIN1-PRIN15;
RUN;
[ ]
. (L, AP) .
8652.3149.72553.12388.0
( S ) .
1 (4 ) 80% .
$\lambda_i$ $\lambda_i / p$ .
1 .
.
.
.
.
4.7.
4.7.1.
( )
.
.
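$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_p' \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = Lx$$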
.
( ) ( )
.
.
(index)
. .
. (
80% ) 4 .
1 LC(), SM(), DR(), AM(),
GC(), PO() 1 &
.
2 EX(), SU()
.
3 LI(), HO(), KJ()
.
4 AA(), KJ() .
? .
.
2 (,
) .
4.7.2.
.
. (Prin1)
(Prin2) .
SAS data OUT1 . [OUTSTAT=OUT1]
OUT .
OUT1 SAS .
() _TYPE_ SCORE .
(transpose) .
4.6.2 PRINCOMP .
Prin1 Prin2 _NAME_ ID . SAS
PLOTIT MACRO .
, .
72 .
. 3 4
. %PLOTIT PRIN3 PRIN4
.
4.7.3.
(1)
. (2.7.4 )
. OUT=SCORE
.
PLOTIT macro . DATA=,
LABELID=, PLOTVARS= .
4 . 1 ( 1 2
), 2 ( 1 2 ), 3 , 4
. 0 2
. (80% ) 1
(0 ) (0 ) .
3 (10, 11, 12, 37, 38) (mild) (41,
42) ( ) . (41, 42)
() . . (10, 11, 12, 37, 38)
.
(2)
& ( 1) 6 .
PRIN1 .
.
( 2) 6 .
( 3) .
6 . 3
.
4.8.
()
. $y$ , $x$ $y = Lx$ .
. 1) ?
. 2) ( L )
? (
) $e_i' e_i = 1,\ e_i' e_j = 0$
. 3) ? 80%
( 1 )
. . ( $\lambda_i / \sum_{i=1}^{p} \lambda_i$ , $\lambda_i / p$ )
(1)
(multivariate normal dist)
. (Why?
) ( ? Normal plot, Shapiro-Wilk W statistic)
- (Box-whisker plot)
.
4
. OUT SAS data
SCORE . PROC UNIVARIATE
(STEM-LEAF PLOT), -
(BOX-WHISKER PLOT) .
W- .
1 4 2, 3 .
.
SAS/INSIGHT Box-whisker plot . SCORE data
.
SAS data
.
( CTRL
) .
Box-whisker plot . Prin2
Prin3 .
bullet Prin2 (41, 42)
. Prin2 (EX. SU)
.
(2)
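( $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + e_i$ )
() $|X'X| \approx 0$ . ( $|X'X| \approx 0$ : $X$
.
. ( $X_k = aX_j$ )
0 .)
$$(X'X)^{-1} = \frac{1}{|X'X|}\, adj(X'X)$$
$(X'X)^{-1}$ .
$\hat{\beta} = (X'X)^{-1} X'Y$ , $s^2(\hat{\beta}) = MSE\,(X'X)^{-1}$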
( ) t-
.
.
(
.)
VIF(Variance Inflation Factor) Condition Index ,
10 .
.
(Ridge Regression:
MSE(Mean Square of Error)
) .
.
.
.
( )
.
. .
$Y_i = \beta_0 + \beta_1 Prin1_i + \beta_2 Prin2_i + \cdots + \beta_p Prinp_i + e_i$
PRIN1, PRIN2 OUT= .
.
.
, .
2~3 .
(stepwise, backward, forward)
. ,
.
.
( y )
.
4.9.
.
(1)
(covariance
matrix) . (: kg pound, :
: ) (correlation matrix)
.
(2) / , ]
DATA APPLICANT;
  INFILE "C:\TEMP\APPLICANT.TXT";
  INPUT ID X1-X20;
RUN;
PROC PRINCOMP DATA=APPLICANT OUT=SCORE OUTSTAT=OUT1 COVARIANCE;
  VAR X1-X20;
RUN;
(3)
80%,
1 . xLy L .
OUT1 . PRIN1, PRIN2 .
(4)
()
. .
(5)
xLy x
y OUT=SCORE SCORE . PRIN1, PRIN2,
.
PRIN1, PRIN2, ( PRIN3)
.
, PRIN1, PRIN2, ( PRIN3) ,
- .
()
. PRIN1(, ), PRIN2(, ) 4
.
PRIN1, PRIN2
.
SAS data
[EXERCISE]
(1)1990 (MMH) 25
7 (ADORMS) . [Applied Multivariate
Methods for Data Analysts, Dallas E. Johnson, 1998, p94] NAVY.txt(
)
7 MMH .
.
( 1).
( ).
(
2). . ?
?
.
2)( ii yy .
(2) 1994 BIG8 8 football .
[http://lib.stat.cmu.edu/DASL] BIG8.TXT
, , Rushing (RO_YDS), Rushing , Passing ,
Passing , , (TD_YDS), (Scoring Offence),
(SD), (Turn-over margin per game), (WINS)
6 .
- ? ?
.
.
.
4) .
.
(ID=)
? ?
1 , 2 ,
. . .
-
CHAPTER 5
5.1.
(FA: Factor Analysis )
Galton( )
(1888) Spearman(1904 )
.
(1) ( ) (2)
( ) .
,
(Likert)
. ( ) Cronbach .
, , , ,
, , , . 8
? . A 48
15 () .
20 (, , , )
.
5.1.1. Spearman (1904)
Spearman 6
.(,
) . (
)
           Classic  French  English  Math  Discover  Music
Classic    1        .83     .78      .70   .66       .63
French              1       .67      .67   .65       .57
English                     1        .64   .54       .51
Math                                 1     .45       .51
Discover                                   1         .40
Music                                                1
Spearman () (f:
factor ) () .
f j .
f .
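$$\begin{aligned} classic &= \lambda_1 f + \epsilon_1 \\ french &= \lambda_2 f + \epsilon_2 \\ english &= \lambda_3 f + \epsilon_3 \\ math &= \lambda_4 f + \epsilon_4 \\ discover &= \lambda_5 f + \epsilon_5 \\ music &= \lambda_6 f + \epsilon_6 \end{aligned}$$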
( fact )
. ( factor1, factor2 ) 6 2
.
()
() . (classic), (French), (English)
(Math), (Discovery), (Music) .
5.1.2.
p 2~3
$p$ $m\ (m < p)$
.
.
.
.
$$x = (x_1, x_2, \ldots, x_p)'$$
, R .
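$Y = LX$ $l_{ij}$
.
$X = LF + \epsilon$ $l_{ij}$ loading ()
$$\begin{aligned} y_1 &= l_{11} x_1 + l_{12} x_2 + \cdots + l_{1p} x_p \\ y_2 &= l_{21} x_1 + l_{22} x_2 + \cdots + l_{2p} x_p \\ &\ \ \vdots \\ y_p &= l_{p1} x_1 + l_{p2} x_2 + \cdots + l_{pp} x_p \end{aligned}$$
$$\begin{aligned} x_1 &= l_{11} f_1 + l_{12} f_2 + \cdots + l_{1p} f_p + \epsilon_1 \\ x_2 &= l_{21} f_1 + l_{22} f_2 + \cdots + l_{2p} f_p + \epsilon_2 \\ &\ \ \vdots \\ x_p &= l_{p1} f_1 + l_{p2} f_2 + \cdots + l_{pp} f_p + \epsilon_p \end{aligned}$$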
ij =
.
-
.
ijl . ,
. ( ) )( jiij el
( ) )( jiiij el
ijl
ijl
.
.
5.2.
5.2.1.
(1) () .
(2) () .
(3) .
(4) .
5.2.2.
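$p$ $x = (x_1, x_2, \ldots, x_p)'$ , -
.
$x - \mu = Lf + \epsilon$
$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} - \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_p \end{pmatrix}$$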
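(1) $f_1, f_2, \ldots, f_m$ (: common factor)
(2) $l_{ij}$ ( : factor loading)
$i$ $j$ $L$ (factor loading
matrix) .
(3) $\epsilon_1, \epsilon_2, \ldots, \epsilon_p$ ( : specific factor)
$j$ $j$
$L$ (factor loading matrix) . .
(1) $f_k$ 0, 1 . ( $k = 1, 2, \ldots, m$ )
$f \sim (0, I)$
(2) $\epsilon_j$ 0, $\psi_j$ . ( $j = 1, 2, \ldots, p$ )
$\epsilon \sim (0, \Psi)$ , $\Psi = diag(\psi_1, \psi_2, \ldots, \psi_p)$ .
(3) $f_k$ $\epsilon_j$ . $Cov(f, \epsilon) = 0$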
5.2.3.
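$x - \mu = Lf + \epsilon$
(1) $\Sigma = Cov(x) = Cov(Lf + \epsilon) = L\,Cov(f)\,L' + Cov(\epsilon) = LL' + \Psi$ ( $\epsilon \sim (0, diag(\psi_1, \psi_2, \ldots, \psi_p))$ )
( $x_1, x_2, \ldots, x_p$ ) .
0 .
(2) $j$
$$Var(x_j) = \sigma_{jj} = l_{j1}^2 + l_{j2}^2 + \cdots + l_{jm}^2 + \psi_j = \sum_{k=1}^{m} l_{jk}^2 + \psi_j$$
$\sum_{k=1}^{m} l_{jk}^2$ (communality) $\psi_j$ (specific) .
$\sum_{k=1}^{m} l_{jk}^2$ $x_j$ (common factor) .
1 $1 = \sum_{k=1}^{m} l_{jk}^2 + \psi_j$
.
(3) $i$ $j$
$Cov(x_i, x_j) = \sum_{k=1}^{m} l_{ik} l_{jk}$ $x_j$ (j )
$f_k$ ( k ) $Cov(x_j, f_k) = l_{jk}$ .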
5.3.
5.3.1.
( R ) . ( )
(1) $R$ $R = LL' + \Psi$ , $L$ .
$P$ ( $PP' = I$ ) $R = LL' + \Psi = (LP)(LP)' + \Psi = L^{*} L^{*\prime} + \Psi$
L .
(2) L , )( ppm P
2/)1( pp .
3 6 (
3 , 3 ) 9 . pm LL
0 .
(3) )( pm L ?
(factor rotation) . (: SAS
ROTATE=VARIMAX )
5.3.2.
principal factoring w/ or w/o iteration, Rao's canonical
factoring, alpha factoring, image factoring, maximum likelihood, unweighted least squares factor
analysis, Harris factoring .
/ . (principal factoring
w/ or w/o iteration) .
(1) (Principal Component (factor) method)
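$R$ ,
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ , $e_1, e_2, \ldots, e_p$ . $R = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'$
.
$$R = \left[\sqrt{\lambda_1} e_1 \mid \sqrt{\lambda_2} e_2 \mid \cdots \mid \sqrt{\lambda_p} e_p\right] \begin{pmatrix} \sqrt{\lambda_1} e_1' \\ \sqrt{\lambda_2} e_2' \\ \vdots \\ \sqrt{\lambda_p} e_p' \end{pmatrix} = LL'$$
$m < p$ $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ $L$
.
$$R = \left[\sqrt{\lambda_1} e_1 \mid \sqrt{\lambda_2} e_2 \mid \cdots \mid \sqrt{\lambda_m} e_m\right] \begin{pmatrix} \sqrt{\lambda_1} e_1' \\ \sqrt{\lambda_2} e_2' \\ \vdots \\ \sqrt{\lambda_m} e_m' \end{pmatrix} + \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \psi_p \end{pmatrix} = LL' + \Psi$$
, $\psi_i = s_{ii} - \sum_{j=1}^{m} l_{ij}^2$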
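$tr(S) = s_{11} + s_{22} + \cdots + s_{pp}$
$tr(R) = p$ . $j$
$l_{1j}^2 + l_{2j}^2 + \cdots + l_{pj}^2 = (\sqrt{\lambda_j} e_j)'(\sqrt{\lambda_j} e_j) = \lambda_j$ . $j$
.
$S$ : $\lambda_i / (s_{11} + s_{22} + \cdots + s_{pp})$
, $R$ : $\lambda_i / p$
. $y_1, y_2, \ldots, y_p$
$Var(y_1) = \lambda_1$ , $Var(y_2) = \lambda_2$ , ..., $Var(y_p) = \lambda_p$
.
$f_1 = y_1/\sqrt{\lambda_1}$ , $f_2 = y_2/\sqrt{\lambda_2}$ , ..., $f_p = y_p/\sqrt{\lambda_p}$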
() 1 75
. . fLx
ppppppp
ppp
ppp
fefefex
fefefex
fefefex
...
:
...
...
222111
2222212112
1212211111
0
0)...( 22221
2 jmjjjjj lll . )( pm
.
(2) (Maximum Likelihood Method)
MLE L f
. jf j jx
,L ,
)|,()|,(max,
xLLLxLL
MLE . L
0 1 . MLE
1
Heywood . . Heywood
.
5.4.
principal factoring (factor)
. $f_i = y_i / \sqrt{\lambda_i}$
.
(S) ,
.
, . ,
, .
15 ( ) .
SAS (default) principal factoring
. Maximum
Likelihood . PROC FACTOR DATA=APPLICANT METHOD=ML;
.
( x ) ( y )
xLy .
.
i j Loading() ijf = iji e . 1y
ap ( 2131.05138.7 ) 1 ap . fLx L . (Loading)
. () .
.
. 1
( : 1 )
( ) . (LC, SM, DR, AM, GC, PO)
? 1 ? (LC=, SM=, DR=,
AM=, GC= , PO=) 1
.
(Marketing Ability)=(LC+SM+DR+AM+GC+PO)/6 ( )
(LC, SM, DR, AM, GC, PO)
. .
5.5.
(loading) ()
.
$$x - \mu = Lf + \epsilon , \qquad L = (l_{jk})_{p \times m} , \quad f = (f_1, f_2, \ldots, f_m)' , \quad \epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_p)'$$
( )
. .
(1)trivial . 1-2 .
1-2 .
(2)Kaiser ( )
0 ( ) R I
1
1 .
1 1
. SAS . (5.4 4
.)
(3)SCREE
SCREE ( 47)
. 80% ()
. APPLICANT
Kaiser 4 7.51 2.05 1.46 1.19
1 2 .
(4)Large-sample Test( 2 -)
MLE
2 - . p , m
.
pppmmppp LLH :0
factor1 factor2 Error
Common factors
:
(positive definite)
nSnSn
)1( 20 )(~]
||||ln[
L max maxlnln2 app
SnHunderLL
n
. 5.9 .
5.6.
( )
. (1)
: 2 (2) 0
.
(rotate)
QUARTIMAX rotation, OBLIQUE rotation, PROMAX rotation
VARIMAX . VARIMAX Kaiser
.
m p
LL L .
.
. , LL L
$L^{*} = LP$ ( $P$ ) $L^{*} L^{*\prime} = LL'$ .
( $f_1, f_2$ ) . 20°
.
[Figure: loadings of X1 to X6 plotted on factor 1 (horizontal axis, 0 to 1) and factor 2 (vertical axis, -0.5 to 0.5)]
      1      2
X1    0.55   0.43
X2    0.56   0.29
X3    0.39   0.45
X4    0.74   -0.27
X5    0.72   -0.21
X6    0.59   -0.13
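An orthogonal rotation changes the individual loadings but leaves each variable's communality unchanged. The following is a minimal Python sketch applying a 20 degree rotation to the six loading pairs in the table above. The direction of rotation and the function names are assumptions made for illustration; the notes obtain rotated loadings from SAS (ROTATE=VARIMAX), which chooses the angle by a criterion rather than fixing it.

```python
import math

# Unrotated loadings on (factor 1, factor 2) from the table above.
LOADINGS = {
    "X1": (0.55, 0.43), "X2": (0.56, 0.29), "X3": (0.39, 0.45),
    "X4": (0.74, -0.27), "X5": (0.72, -0.21), "X6": (0.59, -0.13),
}


def rotate(loadings, degrees):
    """Return L* = L P for the 2 x 2 orthogonal matrix P of the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return {name: (a * c + b * s, -a * s + b * c) for name, (a, b) in loadings.items()}


def communality(row):
    """Sum of squared loadings of one variable."""
    return sum(v * v for v in row)


rotated = rotate(LOADINGS, 20.0)
for name in LOADINGS:
    before, after = communality(LOADINGS[name]), communality(rotated[name])
    print(name, tuple(round(v, 3) for v in rotated[name]), round(before, 4), round(after, 4))
```

The last two columns agree for every variable, which is the numerical counterpart of L*L*' = LL'.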
5.7.
5.7.1. [POLICE.txt]
50 15 . [Applied Multivariate Methods for
Data Analysts, Dallas E. Johnson, 1998, p160]
ID: /REACT: /HEIGHT (cm) / WEIGHT (kg)
SHLDR: (cm)/PELVIC: (cm)/CHEST: (cm)
THIGH: (mm)/PULSE: /DIAST: /CHNUP:
BREATH: (liter)/RECVR: (treadmill) 5
SPEED:
ENDUR: ()/FAT:
5.7.2. SAS
principal factoring
(default) VARIMAX . ( )
. REACT FAT
Maximum Likelihood .
5.7.3.
(1)PROC FACTOR COVARIANCE default
.
p ( 15 )
1 .
(2) ? Kaiser . SAS default
1 .
1
.
(3)(MINEIGEN= minimum eigen value) (NFACTOR=)
default 1 .
80% .
. [ ] 1 5 .
15 5
.
(4) ( ) 1
. 5 .
.
5.7.4.
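$$x - \mu = Lf + \epsilon , \qquad f = (f_1, f_2, \ldots, f_m)' , \quad \epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_p)'$$
$$L = \begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1m} \\ l_{21} & l_{22} & \cdots & l_{2m} \\ \vdots & \vdots & & \vdots \\ l_{p1} & l_{p2} & \cdots & l_{pm} \end{pmatrix}$$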
.
Principal factoring
( ie : ) i .
. 1 5.21
.
15p . ( )
5.7.5.
$$Var(x_j) = \sigma_{jj} = l_{j1}^2 + l_{j2}^2 + \cdots + l_{jm}^2 + \psi_j = \sum_{k=1}^{m} l_{jk}^2 + \psi_j$$
$\sum_{k=1}^{m} l_{jk}^2$ (communality)
$\psi_j$ (specific) . 5
5m REACT REACT 1~ 5(5
) . , $0.11577^2 + 0.23649^2 + 0.12762^2 + 0.06168^2 + 0.90082^2 = 0.9009$
. .
1(100%)
. SPEED ENDUR
5 ( 1 ) 80% . ENDUR
5 5
.
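The communality of a variable is the sum of its squared loadings over the retained factors, and for a standardized variable the specific variance is what remains out of 1. The following is a minimal Python sketch using the five REACT loadings quoted above; the signs of the loadings were lost in the source but do not affect the squares, and the function names are hypothetical.

```python
def communality(loadings):
    """Sum of squared factor loadings for one variable."""
    return sum(l * l for l in loadings)


def specific_variance(loadings, total_variance=1.0):
    """Variance left unexplained by the common factors
    (total variance is 1 when the correlation matrix is analysed)."""
    return total_variance - communality(loadings)


# Loadings of REACT on the five retained factors (magnitudes as quoted in the text).
react = [0.11577, 0.23649, 0.12762, 0.06168, 0.90082]
print(round(communality(react), 4))
print(round(specific_variance(react), 4))
```

A variable whose communality stays low after several factors are retained, as the text notes for ENDUR, is poorly represented by the common factors.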
5.7.6. (Rotate )
( p ) Kaiser 1
( m )
specific .
$$x - \mu = Lf + \epsilon , \qquad L = (l_{jk})_{p \times m} , \quad f = (f_1, f_2, \ldots, f_m)' , \quad \epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_p)'$$
fLx fL, L
f . L
f
.
( pxxx ,...,, 21 ) ( mfff ,,, 21 )
.
.
() .
.
VARIMAX . REORDER
.
(1)
.
(2)
. WEIGHT 1(Factor1) 0.65 2
0.68 2 . WEIGHT 1
FAT, THIGH .
(3)(FAT , THIGH , CHEST , CHNUP )
. . (obesity)
. ?
. .
.
. (FAT+THIGH+CHEST-CHNUP)/4 ( ) .
.
(4) ENDUR 0.38966 1
( 2~ 5)
. ENDUR 1~ 5
38% . ENDUR 1~ 5
.
(5)(HEIGHT , SHLDR , PELVIC , WEIGHT , BREATH
) . 1
0.65 2 0.68
1 . ( )
(6)WEIGHT 1 0.65
1 . 2 .
(7)(PULSE , RECVR 5 )
.
(8)DIAST , REACT . 4-5
. 3 2
? 82 1-2 15
.
.
=(FAT , THIGH , CHEST , CHNUP )
=(HEIGHT , SHLDR , PELVIC , WEIGHT ,
BREATH )
(9) (FAT_INDEX), (BODY), 6
8 . 15 2 ( ,
, ) 8
.
.
. CHNUP
.
. .
. ROTATE
.
5.7.7.
OUTSTAT F_STAT SAS data
.
F_STAT _TYPE_ UNROTATE , PATTERN
.
SUBSET . _TYPE_=PATTERN : TEMP
1 2 . 3 2 ,
4 5 2 .
MACRO PLOTIT . 1
2 .
5.8. (Factor score)
pxxx ,...,, 21
.
fLx
$$x - \mu = Lf + \epsilon , \qquad L = (l_{jk})_{p \times m} , \quad f = (f_1, f_2, \ldots, f_m)' , \quad \epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_p)'$$
f
f .
.
( : factor score) . .
( )
.
2
.
5.8.1. Bartlett's Method (Weighted Least Square Method)
$r$ $z_r = (x_r - \bar{x})$ . $(z_r - Lf_r)' \Psi^{-1} (z_r - Lf_r)$
$f_r$ . $\hat{f}_r = (L' \Psi^{-1} L)^{-1} L' \Psi^{-1} z_r$
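5.8.2. Thompson's Method (Regression Method)
$$\begin{pmatrix} f \\ z \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I & L' \\ L & P \end{pmatrix}\right)$$
$z$ $E(f \mid z) = L' P^{-1} z$ $\hat{f}_r = L' R^{-1} z_r$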
5.8.3. SAS
(1)Default Regression . SCORE
.
(2) . (NFACTORS=2) 2
5 .
.
(3) 2 . 1 .8
WEIGHT, FAT, CHEST
.
(4) $\hat{f}_r = L' R^{-1} z_r$ $L' R^{-1}$ (standardized scoring
coefficient) . OUT
SAS data .
(5) SAS data OUT
. SCORE1 SAS data
PRINT procedure .
NFACTOR .
. SAS data SAS
1 SAS data
.
(7) ?
. ( 1, 2)
21, yy .
2 ( , ,
) .
5.9. Comment
5.9.1.
1 ,
. 1 5
5 . ENDUR 5
.
2 - .
ML (Maximum Likelihood)
.
HEYWOOD 1
. ML . ( )
(5 ) 5 .
1 .
?
5 2
.
ML ( 2 )
. 2 . NFACTORS=3, 4
5 Kaiser .
.
5.9.2.
. .
.
POLICE 1 2 .
WEIGHT 1 2 .
1, 2
.
.
. FAT_INDEX, BODY
.
5.10.
? (Likert)
. (index)
. pxxx ,...,, 21
. Q4-Q13
. 10 2-3
. 10 ()
. 2
.
1
.
2 .
(1) .
(2) .
5.10.1.
.[CODING.TXT]
Q41 ?
7 6 5 4 3 2 1
Q52 ?
7 6 5 4 3 2 1
Q63 ~
7 6 5 4 3 2 1
Q74 ~
7 6 5 4 3 2 1
Q85 ?
7 6 5 4 3 2 1
Q96 ?
7 6 5 4 3 2 1
Q107 ?
7 6 5 4 3 2 1
Q118 ?
7 6 5 4 3 2 1
Q129 ?
7 6 5 4 3 2 1
Q4-Q12 .
(Likert) (4 , 5 , 7 )
.
. DATA = ~ VAR = ~ ~
.
ROTATE=VARIMAX VARIANCE
.
REORDER ()
.
COVARIANCE () .
(COVARIANCE . default)
.
(METHOD) default= .
1 80%
1, 2, 3, 4, 5
.(53%)
.
0.6 ( ) () () .
1(factor 1) Q5-Q8 , 2
Q10-Q11 . Q4, Q12 2
0.75 0.55 .
( : Q5, Q6, Q7, Q8), ( : Q10, Q11)
.
5.10.2.
.
Factor1 Factor2
Q7 0.74 0.22
Q8 0.70 -0.04
Q6 0.64 0.20
Q5 0.62 0.37
Q9 0.56 0.49
Q10 0.13 0.80
Q11 0.22 0.75
Q12 0.09 0.55
Q4 0.51 0.53
0.69 0.68
. 1(factor 1) Q5-Q8
2 Q10, Q11 .
( , , ) (
. (1) (2) )
.
2 55% Q8
1 Q4, Q6, Q8 .
.
.
.
( ) . .
5.10.3.
(index)
(internal consistency)
Cronbach alpha( ) . .
(: observed value)
(measurement error) . $Y = T + E$ , $cov(T, E) = 0$ .
(reliability coefficient) .
$$\rho^2(Y, T) = \frac{cov(Y, T)^2}{var(Y)\,var(T)} = \frac{var(T)^2}{var(Y)\,var(T)} = \frac{var(T)}{var(Y)}$$
, (
) Cronbach . p
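$Y_j = T_j + E_j \ (j = 1, 2, \ldots, p)$ $Y_O = \sum_j Y_j$ , $T_O = \sum_j T_j$ .
$$\alpha = \frac{p}{p-1} \cdot \frac{\sum_{i \ne j} cov(Y_i, Y_j)}{var(Y_O)} = \frac{p}{p-1}\left(1 - \frac{\sum_j var(Y_j)}{var(Y_O)}\right)$$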
Cronbach 0 1 1 .
? 0.6 ? 0.7 ? .
Cronbach ,
.
Cronbach ,
.
(Cronbach ) .
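Cronbach's alpha can be computed from the item variances and the variance of the total score. The following is a minimal Python sketch of that formula (the notes obtain it from SAS with the ALPHA option); the three items and five respondents are hypothetical data invented for illustration, not values from CODING.TXT.

```python
def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(items):
    """alpha = p/(p-1) * (1 - sum of item variances / variance of the total score).
    `items` is a list of p lists, each holding one item's scores for all respondents."""
    p = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(sample_variance(item) for item in items)
    return p / (p - 1) * (1.0 - item_variance_sum / sample_variance(totals))


# Hypothetical Likert responses: 3 items answered by 5 respondents.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 1, 4, 3, 5]
q3 = [1, 3, 2, 5, 4]
print(round(cronbach_alpha([q1, q2, q3]), 4))
```

Items that move together inflate the variance of the total relative to the sum of item variances, which pushes alpha toward 1.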
CRONBACH
.
2 (binary, dichotomous (0,1)) Cronbach Kuder-
Richardson 20 (KR-20) .
Cronbach
.
NOCORR () .
NOSIMPLE (, ) .
ALPHA CRONBACH .
. Q5-Q8 4
0.69 . Q5 Q6-Q8
0.59 .() Q8 Q5-Q7
0.68 . 4 () ( )
0.68 .
2 . (Q10, Q11)
0.68 .
5.10.4.
(1)Q5. ? (5 )
(2)Q6. ? (5 )
(3)Q7. ? (5 )
(4)Q8. ? (5 )
4 Q5-Q8
(factor analysis) .
Q5, Q6, Q7
.
Q6 . Q6=1( ), ,
5( ). ()
.
Q5, Q6, Q7 ( , ) .
.
Q6 . .
[EXERCISE]
(1)
Maximum Likelihood
. .
(2) 26 9 .
[http://lib.stat.cmu.edu/DASL]JOB.txt
9 .
9 .
9 .
( )
.
Country:
Agr:
Min:
Man:
PS:
Con:
SI:
Fin:
SPS:
TC:
-
CHAPTER 6
6.1.
pxxx ,...,, 21 (
)
(Variable-directed techniques) . (Discriminant Analysis)
(Clustering Analysis) ()
(individual directed techniques) .
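$$\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$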
:
() .
:
.
: (, )
, ( ) .
: .
6.1.1.
40 , (=/: ), , ,
, ( ), ( ) .
7 (distance) (similarity)
.
.
.
() 7
( , ) .
.
16 ( 9 , 7 )
.
(, ) ( )
. ( 3
. )
. .
,
. 2 ( )
3 .
( ) .
.
(:, :) 1)()
2)
. ( ) (1) (2)
. (2)
. (Fisher Linear Discriminant
Function ) (1) .
6.1.2.
, ,
() . (1)
?
. (2) ,
(misclassification) ?
2 2 . (1)
() (2)
() .
.
.
.
6.2.
2 .
$\pi_1$: $x \sim N_p(\mu_1, \Sigma_1)$ , $\pi_2$: $x \sim N_p(\mu_2, \Sigma_2)$
$n_1$ , $n_2$ $p$
$x_0$ . .
()
( ) ( x ), - ( ) -( S ) .
(pooled) - $$\hat{\Sigma} = \frac{(n_1 - 1)\hat{\Sigma}_1 + (n_2 - 1)\hat{\Sigma}_2}{n_1 + n_2 - 2} = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$
.
6.2.1.
.
(1)(Likelihood)
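$L(x_0; \mu_1, \Sigma_1) > L(x_0; \mu_2, \Sigma_2)$ $\pi_1$ ,
$L(x_0; \mu_1, \Sigma_1) < L(x_0; \mu_2, \Sigma_2)$ $\pi_2$ .
$$L(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right]$$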
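(2) (Fisher's Linear Discriminant Function)
- (variance-covariance )
(likelihood function) . $b' x_0 - k > 0$ (Linear
Discriminant function) $\pi_1$ , $\pi_2$ .
$b' = (\mu_1 - \mu_2)' \Sigma^{-1}$ , $k = \tfrac{1}{2} (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 + \mu_2)$
(3)Mahalanobis
- (variance-covariance ) ,
. $d_1 < d_2$ $\pi_1$ , $\pi_2$
.
Mahalanobis Distance: $d_i = (x_0 - \mu_i)' \Sigma^{-1} (x_0 - \mu_i)$ , $i = 1, 2$.
(4) (Posterior Probability)
- 1
$P(\pi_1 \mid x_0) > P(\pi_2 \mid x_0)$ $\pi_1$ , $\pi_2$
.
$$P(\pi_i \mid x) = \frac{\exp\left[-\tfrac{1}{2} d_i\right]}{\exp\left[-\tfrac{1}{2} d_1\right] + \exp\left[-\tfrac{1}{2} d_2\right]}$$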
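With a common covariance matrix the linear discriminant rule, the Mahalanobis rule and the posterior-probability rule all lead to the same classification. The following is a minimal Python sketch for two variables that checks this numerically. The group means, the covariance matrix and the new observation are hypothetical values chosen for illustration, and the function names are not from the notes (which use PROC DISCRIM).

```python
import math


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mahalanobis(x, mu, sigma_inv):
    """d = (x - mu)' Sigma^{-1} (x - mu), the squared distance used in the text."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    return (d0 * (sigma_inv[0][0] * d0 + sigma_inv[0][1] * d1)
            + d1 * (sigma_inv[1][0] * d0 + sigma_inv[1][1] * d1))


def fisher_rule(mu1, mu2, sigma_inv):
    """Coefficients b' = (mu1 - mu2)' Sigma^{-1} and constant k."""
    diff = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    b = (diff[0] * sigma_inv[0][0] + diff[1] * sigma_inv[1][0],
         diff[0] * sigma_inv[0][1] + diff[1] * sigma_inv[1][1])
    k = 0.5 * (b[0] * (mu1[0] + mu2[0]) + b[1] * (mu1[1] + mu2[1]))
    return b, k


def posterior(x, means, sigma_inv):
    """Posterior probabilities under equal priors."""
    weights = [math.exp(-0.5 * mahalanobis(x, mu, sigma_inv)) for mu in means]
    return [w / sum(weights) for w in weights]


# Hypothetical two-group setting.
mu1, mu2 = (0.0, 0.0), (2.0, 1.0)
sigma_inv = inverse_2x2(((1.0, 0.5), (0.5, 1.0)))
x0 = (0.5, 0.2)

b, k = fisher_rule(mu1, mu2, sigma_inv)
score = b[0] * x0[0] + b[1] * x0[1] - k
d1, d2 = mahalanobis(x0, mu1, sigma_inv), mahalanobis(x0, mu2, sigma_inv)
print("b'x0 - k =", round(score, 4), "-> group", 1 if score > 0 else 2)
print("d1, d2 =", round(d1, 4), round(d2, 4))
print("posterior =", [round(p, 4) for p in posterior(x0, (mu1, mu2), sigma_inv)])
```

The agreement follows from the identity d2 - d1 = 2(b'x0 - k), so a positive discriminant score is the same statement as d1 < d2.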
6.2.2.
2 1 2
2 1 .
.
.
(1)Re-substitution
(overestimate)
.
(2)
,
. 1/2
.
(3)Cross-validation
Lachenbruch(1968) .
,
. Jackknife . 2
.
2 Cross-validation
.
1 2
95 10 90 5
5 90 10 90
1 .
(equal cost function) (ratio cost function) .
6.2.3.
Kansas Dr. Michael Finnegan
82 9 .
TURKEY.txt/[Applied Multivariate Methods for Data Analysts, Dallas E. Johnson, 1998, p223]
ID: id HUM: RAD:
ULN: ) FEMUR: TIN:
CAR: carp metacarpus D3P: COR:
SCA: TYPE: (WILD), (DOMESTIC)
HUM, ULN
.
.
(1)
HUM( ), ULN( ) TYPE
.
SYMBOL V Value( ), C
color( ) .
GOPTIONS RESET=ALL
. .
.
(2)Fisher
CROSSLIST cross-VALIDATION (
) . Resubstitution .
METHOD=NORMAL NPAR NORMAL
Fisher
default .
Nonparametric .
OUT SAS data .
CLASS VAR .
(3)
82 HUM, ULN 40 .
2 2 .
() 22 , 18
.
prior prob. ( ) .
default .
.
Fisher
(constant) $k = \tfrac{1}{2} (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 + \mu_2)$ / (coefficient)
$b' = (\mu_1 - \mu_2)' \Sigma^{-1}$
(classification function): $b' x_0 - k > 0$ $\pi_1$ ,
$\pi_2$ . $b' x_0$
.
Resubstitution .
Re-substitution under-estimate
.
19 Fisher
. From type Classified into
.
Posterior Prob. ( ) ()
. 2 (Obs=2) 0.13,
0.87 .
1 .
, .
Fisher , cross-validation .
6 6/22=0.2727 .
3 3/18=0.1667 .
(0.2727+0.1667)/2=0.2197 .
.
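The error-rate arithmetic above averages the two group-wise misclassification rates. The following is a minimal Python sketch of that calculation using the counts quoted in the text (6 of 22 and 3 of 18); the pooled rate is added only for comparison and is an assumption about what proportional priors would give, not a figure from the notes.

```python
def misclassification_rates(misclassified, group_sizes):
    """Per-group error rates and their unweighted average (equal priors)."""
    rates = [m / n for m, n in zip(misclassified, group_sizes)]
    return rates, sum(rates) / len(rates)


def pooled_rate(misclassified, group_sizes):
    """Overall proportion misclassified, weighting groups by their size."""
    return sum(misclassified) / sum(group_sizes)


# Cross-validation counts quoted in the text: 6 of 22 and 3 of 18 misclassified.
rates, average = misclassification_rates([6, 3], [22, 18])
print([round(r, 4) for r in rates], round(average, 4))
print(round(pooled_rate([6, 3], [22, 18]), 4))
```

The unweighted average matches PRIORS EQUAL; when group sizes differ the pooled figure is slightly different.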
SAS data OUT1 . (OUT=OUT1
)
_INTO_ .
.
IF (GROUP^=.) .
, .
.
(3 ) . 6 3
.( .)
(4)
2
. (HUM, ULN) .
(HUM, ULN) = (145, 150) , (HUM, ULN) = (150, 145)
SAS . OUTPUT
NEW 2 HUM, ULN .
TYPE
.
(HUM, ULN) = (145, 150) 0.64
(HUM, ULN) = (150, 145) 0.56 . (
0.5 )
2 .
.
6.3.
6.3.1.
3 .
Resubstitution, Test Data , Cross-Validation. SAS Posterior Prob.
Cross-validation . SAS data
OUTCROSS= . OUT=(:OUT1)
Resubstitution . Test data
OUTTEST= .
6.3.2.
2 .
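(1) $C(2|1)$ : 1 2
(2) $C(1|2)$ : 2 1
.
2 1 $p_1$ , 2 $p_2$ .
$$d_i^{*} = \tfrac{1}{2} (x_0 - \mu_i)' \Sigma^{-1} (x_0 - \mu_i) - \ln(p_i^{*}) , \quad i = 1, 2$$
$$p_1^{*} = \frac{p_1 C(2|1)}{p_1 C(2|1) + p_2 C(1|2)} , \qquad p_2^{*} = \frac{p_2 C(1|2)}{p_1 C(2|1) + p_2 C(1|2)}$$
$d_1^{*} < d_2^{*}$ 1 ( 1 )
$d_1^{*} > d_2^{*}$ 2 ( 2 ) .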
Wild Domestic .
.
( $p_1$ $p_2$
$C(1|2) = C(2|1)$ ), ( $p_1 = p_2$ ) .
.
. .
(1)PRIORS EQUAL
(default)
(2)PRIORS PROPORTIONAL
(3)PRIORS WILD=0.4 DOMESTIC=0.6
0.6, 0.4
6.2.3 => 6->2 4 , =>
3 4 .
.
6.3.3.
SAS
- (within) -
. -
POOL=YES .
.
( 0.1)
.
6.4.
() .
(1) ? (2) ?
. Forward , Backward
, Stepwise .
6.4.1. Forward
.
(1) () ( )
(ANOVA) . F- .
.
(2) ? (covariate)
(ANCOVA) SS3 F- .
.
: .
20 2
, .
.
. (Y) ( / )
(ANOVA). .
. .
. .
.
? , (covariate)
(ANCOVA) SS3 F- .
. F- (SS3 p-
) . 0.25 0.5 . SAS
SLE(Significance Level for Entry) .
6.4.2. Backward
.
(1) , , ( )
(Type III, Partial SS) F-
.
(2) . SS3 F- (p-
) . 0.15 . SAS
SLS(Significance Level for Stay) .
6.4.3. Stepwise
Forward .
.
.
,
. SAS SLS, SLE
.
6.4.4.
15 Backward , 15 Stepwise
. .
F-
. (X1, X2) X1
X2 . .
.
X2
X1 .
. .
(1) (2)
.
6.4.5.
.
.
. (TURKEY0.TXT)
(1)
CROSSLIST
OUTCROSS . 8 cross-
validation (, ) OUT1 .
9 32 19 , 13
. PRIOR equal (0.5) .
(HUM, ULN) . 6.2.3
9 .
(2)
Forward, Backward, Stepwise
Stepwise
Fisher .
0.25-0.4, 0.15
. 0.25, 0.15
Stepwise . SAS
STEPDISC procedure .
. TIN, COR,
D3P, ULN . Fisher .
4 =>, => .
.
.
(3)
4 9
.
. 4 . ()
OUT2 TYPE _INTO_ .
.
.
B710 COR L684 L750 4
. WILD COR
6.4.6.
. (TIN, COR, D3P, ULN) = (140, 105, 300, 145) .
(domestic) 0.957 .
6.5.
Fisher Fisher's between-within method .
(Canonical)
. ( p ) p -
. (BOX-PLOT )
. ( )
.
.
6.5.1.
.
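$\pi_i \sim N_p(\mu_i, \Sigma)$ , $n_i$ . $i = 1, 2, \ldots, m$ (m )
$p$ - .
Between sample mean: $$B = \sum_{i=1}^{m} n_i (\hat{\mu}_i - \hat{\mu}_o)(\hat{\mu}_i - \hat{\mu}_o)' , \qquad \hat{\mu}_o = \frac{1}{n_o} \sum_{i=1}^{m} n_i \hat{\mu}_i , \qquad n_o = \sum_{i=1}^{m} n_i$$
Within sample mean: $$W = \sum_{i=1}^{m} \sum_{r=1}^{n_i} (x_{ri} - \hat{\mu}_i)(x_{ri} - \hat{\mu}_i)'$$
$$\max_{b \ne 0} \frac{b' B b}{b' (B + W) b}$$
$b$ $(B + W)^{-1} B$
. $b_1$ . $y_1 = b_1' x$
.
$d_i = |b_1' x - b_1' \hat{\mu}_i|$ , $i = 1, 2, \ldots, m$ .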
6.5.2.
$b_2$ $(B + W)^{-1} B$ . $(B + W)^{-1} B$
2 . 2
. $$d_i^2 = (b_1' x - b_1' \hat{\mu}_i)^2 + (b_2' x - b_2' \hat{\mu}_i)^2 , \quad i = 1, 2, \ldots, m$$
,
() .
.
6.5.3.
( 80% SCREE plot
. p
2 .
6.5.4.
PROC CANDISC . 8
. (
.) NCAN . NCAN=2
2 OUT CANSCORE SAS CAN1 CAN2
.
(1)
$$\max_{b \ne 0} \frac{b' B b}{b' (B + W) b}$$
NCAN=2 100%
. . (
100%) .
.
.
2 .
2
3969.0128*0234.0...140*1906.0153*1172.022016.0128*0033.0...140*022.0153*029.01
CanCan
(3)
, .
(4)
2 SYMBOL 2 . (
) .
. Bullet
SYMBOL1 V=DOT V=CIRCLE .
8 100%
. 2
.
2
.
. SAS default
.
?
.
.
CAN1 2.09 CAN1 -1.54.
.
6.6. K Nearest Neighbor
( ) Mahalanobis ( $d_i = (x_0 - \mu_i)' \Sigma^{-1} (x_0 - \mu_i)$ )
. K nearest neighbor .
(1) Mahalanobis
.
(2) 2 .
(3)2
3 . k nearest neighbor
Mahalanobis k k
. 3
.
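The k nearest neighbor rule assigns a new observation to the group that holds the majority among its k closest training observations. The following is a minimal Python sketch of that vote. It uses plain Euclidean distance for brevity, whereas the text describes Mahalanobis distance, and the training points and labels are hypothetical; neither is taken from the TURKEY data.

```python
import math
from collections import Counter


def knn_classify(training, x0, k=3):
    """Majority vote among the k training points closest to x0.
    `training` is a list of (point, label) pairs; distance is Euclidean here."""
    nearest = sorted(training, key=lambda record: math.dist(record[0], x0))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-variable training sample with two groups.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
print(knn_classify(training, (2, 2), k=3))
print(knn_classify(training, (6, 5), k=3))
```

An odd k avoids ties in the two-group case, which is one practical reason values such as 3 or 5 are compared in the section that follows.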
6.6.1.
LIST CROSSVALIDATE
.
K=3
K=2
K=5
K=3, K=5 () K
. TURKEY K=5 nearest
neighbor .
Fisher (6.4.5 ) 3
. Fisher
2 (10.53%), 1 (6.25%), 8.39% K Nearest
neighbor 8.88% Fisher
.
6.6.2.
( ) , , Binary Classification Trees
. Breiman, Friedman, Olshen, Stone (1984)
CART(Classification And Regression Trees) . J. A. Hartigan
CHAID(Chi-square Automatic Interaction Detector) . Data
Mining . SAS Enterprise Miner, SPSS Clementine Data
Mining Tool .
6.7. 3
6.7.1.
(Wheat) Arthur (soft ) Arkan (hard ) Group 1, 2 Group 3,
4 . 4 .
. [WHEAT.txt]
(Right) (Area), (Perimeter), (Length), (breadth) (down)
(Area), (Perimeter), (Length), (breadth) . [Applied
Multivariate Methods for Data Analysts, Dallas E. Johnson, 1998, p237]
6.7.2.
2 .
Fisher
. .
OUT1 data
. Obs. 10 1
2 0.621 2 . ()
6.7.3.
BACKWARD .
3 (D_L, R_P, D_B) default 0.15
. SLS=0.2 .
.
.
(1)
(2)
K-nearest neighbor
.
6.8.
(: continuous, measurement, metric))
. decision tree (CART, CHAID)
.
(Logistic Regression) (binary, dichotomous:
0 1 ) (ordinal: //)
. .
.
0, 1(: , ) .
. 1y , 0y
. 1 0 .
.
(ordinal)
. (, , )
. 3 LOGISTIC
CATMOD . CATMOD
CATegorical data MODeling LOGISTIC
CATMOD
6.8.1
$y_i = f(x_i) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i$ , $e_i \sim iid\ N(0, \sigma^2)$
0 1 (
) ( 2R ) F- t- ,
. iy ( 0, 1 ) OLS
.
(1)ODDS: $odds = \frac{p}{1 - p}$
[p=0.5 1 . ] .
2002 16 0.1 1/9 Odds . 1$ betting
9$ return 2002 16 0.8 4 Odds .
4$ betting 1$ return
(2)ODDS transformation ()
$p^{*} = \frac{p}{1 - p}$
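(3)
$p_i = \Pr(Y = 1)$
( $Y = 1$ ) . odds Odds .
$$p_i^{*} = \frac{p_i}{1 - p_i}$$
$p_i$ (0,1) $p_i^{*}$ (0, $\infty$) . $\ln(p_i^{*})$
($-\infty$, $\infty$) $e_i \sim Normal(0, \sigma^2)$ (
) . Logistic model .
$$y_i^{*} = \ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i , \qquad e_i \sim Normal(0, \sigma^2)$$ :
.
$$p_i = \Pr(Y = 1 \mid x_i) = \frac{e^{\{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\}}}{1 + e^{\{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\}}} = \frac{1}{1 + e^{-\{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\}}}$$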
ip (: 1Y )
ip
.
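The odds and logit transformations, and the way a linear predictor is mapped back to a probability, can be checked with a few lines of code. The following is a minimal Python sketch; the probabilities 0.1 and 0.8 are the two betting examples from the text, while the intercept and slope used at the end are hypothetical values chosen for illustration, not fitted coefficients.

```python
import math


def odds(p: float) -> float:
    """Odds p / (1 - p); maps (0, 1) onto (0, infinity)."""
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); maps (0, 1) onto the whole real line."""
    return math.log(odds(p))


def inverse_logit(eta: float) -> float:
    """Probability recovered from a linear predictor: 1 / (1 + e^{-eta})."""
    return 1.0 / (1.0 + math.exp(-eta))


print(round(odds(0.1), 4), round(odds(0.8), 4), odds(0.5))

# Hypothetical coefficients (assumed): intercept -3.0, slope 0.5 for one predictor.
intercept, slope = -3.0, 0.5
for x in (2, 6, 10):
    print(x, round(inverse_logit(intercept + slope * x), 4))
```

Whatever value the linear predictor takes, the fitted probability stays strictly between 0 and 1, which is the property ordinary least squares on a 0/1 response lacks.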
(4)
-2 Log L, AIC(Akaike Information Criterion) Schwarz Criterion
(Adjusted ) Wald Chi-
square .
6.8.2. OLS
EXAMPLE
Remission.txt (cell, smear, infil, li, blast, temp)
() .
OLS (Li ) .
OLS .
remiss=1
remiss=0 .
OLS Y OUT1
Li
(p=0.0035) .
( Y )?
6.8.3.
OLS Li
.
descending ? SAS event()
non-event . 1 ( event )
descending . output
( Y ) OUT2 .
$\chi^2$ - . p- 0.0146
Li .
OLS Y ( oYhat _ ), Y ( lYhat _ )
event non-event
Li
(i=join) .
OLS ( ) .
0 1 )|1Pr( xYpY ii , event(
) 0 1 .
6.8.3.
Logistic . ,
(Cross Table) .
. stepwise Entry
0.2, Stay 0.1 . ( SLE=0.25 , SLS=0.15
)
SAS Order Value=1 (event) , Order Value=2 (non-event)
. (domestic) .
}*5.03.73{11)|Pr( TINi e
xDomesticYp . TIN
.
(1)INCLUDE
INCLUDE . TURKEY
Fisher TIN, D3P, COR, ULN TIN
D3P Logistic . (TURKEY
)
INCLUDE=2 2 . TIN,
D3P .
Tentative }*61.23*67.0*55.72.544{11)|Pr( FEMURPDTINi e
xDomesticYp
. TIN , D3P FEMUR
.
(3 . 3 0 )
(: )
(2) (Standardized Beta Coefficient)
CTABLE (cross-
tabulation) . STB (
) . ( TIN, D3P ).
.
p-
.
TIN 4.37 1.54
(Event) . (-0.92,
0.23) .
CTABLE (cross-
tabulation) . STB . (
TIN, D3P ).
(DOMESTIC) EVENT (=1)
. Pr(Y=1) Pr(Y=Event) .
Prob. Level Logit Pr(Event) Event
. Pr(Event) 0.0 Event
. EVENT 19
Event non-EVENT 18 Event
.
Correct Event(Domestic) Event Non-event Non-event
In-Correct Event(Domestic) non-Event Non-event event
Correct . 51.4
19/(19+18). 81.1 (19+7)/(19+18) .
Sensitivity Event() Event()
False Pos. Event() non-Event() ,
Sensitivity + False Pos. 1 .
Specificity non-Event() non-Event()
False Neg. non-Event() Event() ,
Specificity + False Neg. 1 .
CORRECT Prob. Level .
Sensitivity Specificity . ( )
Prob. Level 0.24~0.8 Correct,
Sensitivity, Specificity . Prob. Level 0.4 .
Pr(Event) 0.4 Event() non-
Event() . 0.4 3 ( 8.1%=3/37,
Non-event Event Event Non-event ) Fisher K-
nearest .
6.8.4.
Pr(Event) Phat SAS data OUT1
.
_LEVEL_ EVENT(Domestic) . Prob.
Level 0.4 . PHAT 0.4
WILD() _LEVEL_ WILD . PHAT
.
.
CTABLE 2 . . .
() 2 logistic ( 2 : TIN, D3P)
.
2 Pr(EVENT) .
0.4 (TIN=145, D3P=320) Domestic() (TIN=150, D3P=300)
Wild () .
6.8.5.
Fisher K nearest neighbor
. ( CART, CHAID ) Logistic
1) ( ) 2) ( )
. 2 ( )
Logistic . ()
(: ) .
3 Logistic
. WHEAT GROUP (1
1( 1Y ) 3 Phat .
$\Pr(Y = 1 \mid x_1 = (54, 56, \ldots, 226, 89)) = 0.467$
$\Pr(Y = 2 \mid x_1 = (54, 56, \ldots, 226, 89)) = 0.764 - 0.467 = 0.297$
$\Pr(Y = 3 \mid x_1 = (54, 56, \ldots, 226, 89)) = 0.939 - 0.764 = 0.175$
$\Pr(Y = 4 \mid x_1 = (54, 56, \ldots, 226, 89)) = 1 - 0.939 = 0.061$
( )
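With more than two ordered groups the fitted model reports cumulative probabilities, and the probability of each single group is obtained by differencing. The following is a minimal Python sketch of that step, using the three cumulative values quoted above for the first observation; the function name is a hypothetical choice.

```python
def category_probabilities(cumulative):
    """Turn cumulative probabilities P(Y <= 1), P(Y <= 2), ... into the
    individual category probabilities; the last category gets the remainder."""
    bounds = [0.0] + list(cumulative) + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Cumulative predicted probabilities for the first observation, from the text.
probs = category_probabilities([0.467, 0.764, 0.939])
print([round(p, 3) for p in probs])

# The observation is assigned to the group with the largest probability.
print(max(range(len(probs)), key=probs.__getitem__) + 1)
```

For this observation group 1 has the largest probability, although at 0.467 it is below one half, which is common when four groups compete.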
[EXERCISE]
[CAR.TXT : http://lib.stat.cmu.edu/DASL]
(1)CAR.TXT 5 (MPGDISPLACEMENT) 2 (
) () .
. 2 , -
. 3 ? 2 .
(2) (US, non-US) HOMEWORK #7-1
( )
.
(3)CAR.TXT 2 (MPG, HORSEPOWER) (US, non-
US) .
MPG, HORSEPOWER .
FISHER CROSS-VALIDATION
.
(1) .
(4) 2 . (3)
.
(MPG, HORSEPOWER)=(20, 100) (MPG, HORSEPOWER)=(25, 120)
(5)CAR.TXT , 2 MPG, HORSEPOWER, (US, non-US) 2
, non- 70%, 30% .
.
.
(6)CAR.TXT , (MPGDISPLACEMENT) 5, (US, non-US)
PROC STEPDISC . ( &
) =0.2 (Type III SS Group .)
(7)CAR.TXT , (MPGDISPLACEMENT) 5, 2(US, non-US)
5 .
.
MPG, HORSEPOWER 2 , .
(8)CAR.TXT 5 (MPGDISPLACEMENT)
.
(9)CAR.TXT 5 (MPGDISPLACEMENT) K nearest
K .
(10)Fisher , K-nearest ,
.
(11)(Wheat) [WHEAT.txt] 4 , 8
Fisher . (8 )
Fisher .
K-nearest . K . (
.)
Location (
) cut-off .
-
.( ) 2
,
-
CHAPTER 7
, , , , ,
. (cluster)
(Clustering Analysis).
() .
. grouping classification
.
? 2
. 12 2 () .
Euclidean () ( ) .
1) 2 (scatter
plot) 2) 3 Bubble Plot 3) 4
2-3 .
.
7.1.
7.1.1. Euclidean
. (similarity)
The Euclidean distance between observations r and s is
$$d_{rs} = [(x_r - x_s)'(x_r - x_s)]^{1/2}$$
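7.1.2. Standardized Euclidean distance
The standardized Euclidean distance between observations r and s is computed from the standardized values z in place of x.
$$d_{rs} = [(z_r - z_s)'(z_r - z_s)]^{1/2}$$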
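7.1.3. Mahalanobis distance
The Mahalanobis distance between observations r and s weights the differences by the inverse of the within-cluster variance-covariance matrix $\Sigma$.
$$d_{rs} = [(x_r - x_s)'\Sigma^{-1}(x_r - x_s)]^{1/2}$$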
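The three distances of this section can be compared on a small example. The Python sketch below is illustrative only: the function names, the standard deviations and the covariance matrix are assumptions, and the linear system for the Mahalanobis distance is solved by plain Gaussian elimination rather than by any SAS routine.

```python
import math


def euclidean(x_r, x_s):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_r, x_s)))


def standardized_euclidean(x_r, x_s, std):
    # Dividing each difference by the variable's standard deviation equals
    # the Euclidean distance between the standardized (z) values.
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x_r, x_s, std)))


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    y = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * y[k] for k in range(row + 1, n))
        y[row] = (aug[row][n] - tail) / aug[row][row]
    return y


def mahalanobis(x_r, x_s, cov):
    diff = [a - b for a, b in zip(x_r, x_s)]
    # diff' * inverse(cov) * diff, computed without forming the inverse.
    return math.sqrt(sum(d * w for d, w in zip(diff, solve(cov, diff))))


# Two observations on two variables; std and cov are hypothetical values.
x_r, x_s = [180.0, 82.0], [163.0, 56.0]
d_euc = euclidean(x_r, x_s)
d_std = standardized_euclidean(x_r, x_s, std=[10.0, 13.0])
d_mah = mahalanobis(x_r, x_s, cov=[[100.0, 0.0], [0.0, 169.0]])
```

When the covariance matrix is diagonal, as in this toy case, the Mahalanobis distance coincides with the standardized Euclidean distance; the two differ once the variables are correlated.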
7.2.
7.2.1.
seed seed (
) () . 3 . 1)
() . 2) seed
. 3) seed
. SAS
procedure FASTCLUS .
7.2.2.
() single-linkage clustering
. Neighbor Method single-linkage clustering
.
(1) (n) . 6
Euclidean () . 6 .
       1      2      3      4
1             0.1    0.7    0.2
2                    0.4    0.6
3                           0.3
4
.
(2) ( ) . (3.5)
.
          (1, 2)   3      4
(1, 2)             ?      ?
3                         0.3
4
?: (1, 2) ) () ?
(3) .
( ) () 5 .
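Nearest neighbor (single linkage): the distance between two clusters is the smallest distance between a member of one and a member of the other.
Furthest neighbor (complete linkage): the distance between two clusters is the largest distance between a member of one and a member of the other.
Centroid neighbor: the distance between two clusters is the distance between their centroids (mean vectors).
Average neighbor: the distance between two clusters is the average of all distances between a member of one and a member of the other.
Ward's minimum variance: the two clusters merged are those whose union gives the smallest increase in the within-cluster sum of squares.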
.
Nearest, Furthest, Centroid neighbor, Average neighbor, Ward's minimum variance
? Nearest
Furthest
. 2-3
.
Average neighbor .
(4) Nearest neighbor .
1 3 0.7, 2 3 0.4 (1, 2) 3 0.4 . 1
4 0.2 2 4 0.6 0.2 (1, 2) 4
.
          (1, 2)   3      4
(1, 2)             0.4    0.2
3                         0.3
4
(1, 2) 0.4 3 4 0.3 (1, 2, 4) 3 0.3
.
             (1, 2, 4)   3
(1, 2, 4)                0.3
3
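The nearest neighbor steps worked through above can be checked with a short program. This Python sketch uses the four-observation distance table of the example; the function name `single_linkage` and the dictionary layout are hypothetical choices, not part of any SAS procedure.

```python
def single_linkage(dist):
    """Agglomerate clusters by nearest neighbor (single) linkage.

    dist maps frozenset({i, j}) to the distance between observations i and j.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    points = sorted({p for pair in dist for p in pair})
    clusters = [(p,) for p in points]

    def linkage(a, b):
        # Nearest neighbor: smallest distance over all cross-cluster pairs.
        return min(dist[frozenset((i, j))] for i in a for j in b)

    history = []
    while len(clusters) > 1:
        pairs = [(linkage(a, b), a, b)
                 for idx, a in enumerate(clusters)
                 for b in clusters[idx + 1:]]
        d, a, b = min(pairs)  # closest pair of clusters is merged next
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        history.append((a, b, d))
    return history


# Distances from the worked example in the text.
dist = {
    frozenset((1, 2)): 0.1, frozenset((1, 3)): 0.7, frozenset((1, 4)): 0.2,
    frozenset((2, 3)): 0.4, frozenset((2, 4)): 0.6, frozenset((3, 4)): 0.3,
}
history = single_linkage(dist)
```

The merges occur at distances 0.1, 0.2 and 0.3: first (1, 2), then (1, 2) with 4, and finally (1, 2, 4) with 3, matching the tables above. Replacing `min` by `max` inside `linkage` would give the furthest neighbor rule instead.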
7.3.
? Tree diagram
Hotelling's $T^2$ Cubic Clustering Criterion .
7.3.1.
(Tree Diagram) , ,
(). (diagram) 2 (2, 4), (3, 5, 6, 1)
.
7.3.2. Pseudo Hotelling's $T^2$
Hotelling's $T^2$ .
.
7.3.3. CCC
Sarle (1983) CCC (Cubic Clustering Criterion)
CCC 3 .
.
7.4.
(Discriminant Analysis) (Clustering Analysis)
.
.
( $X_1, X_2, \ldots, X_P$ )
.
(1)
Fisher method (
, )
K Nearest Discriminant Analysis
Logistic Regression (
,
)
(2)
(by )
. 2
.
(3)
()
.
(1) .
Nearest neighbor
Furthest neighbor
Centroid neighbor
Average neighbor
Ward's minimum variance
(2) .
CCC
Pseudo Hotelling's $T^2$
Tree Diagram
(3)
. 3
.
.
(4) .
7.5.
56 MOIS( ), PROT( ), FAT( ),
ASH(ash ), SODIUM( ), CARB( ), CAL()
. 56 . PIZZA.txt/[Applied Multivariate
Methods for Data Analysts, Dallas E. Johnson, 1998, p331]
7.5.1.
(1)STANDARD Standardized Euclidean Distance
. STANDARD .
(2) Average neighbor( )
. (METHOD=AVERAGE)
AVERAGE | AVE    average linkage
CENTROID | CEN   centroid method
COMPLETE | COM   complete linkage (furthest neighbor, maximum method, diameter method, rank order typal analysis)
DENSITY | DEN    density linkage, which is a class of clustering methods using nonparametric probability density estimation. You must also specify one of the K=, R=, or HYBRID options.
EML              maximum-likelihood hierarchical clustering
MEDIAN | MED     Gower's median method
SINGLE | SIN     single linkage (nearest neighbor, minimum method, connectedness method, elementary linkage analysis, or dendritic method)
WARD | WAR       Ward's minimum-variance method
(3)CCC Cubic Clustering Criterion , PSEUDO Pseudo Hotelling's $T^2$ . .
(4) TREE SAS data .
7.5.2.
7 . 7
2 .
.
. ()
.
7.5.3.
(1) .
(2) . 34021 34026 .
() (Norm Distance): ) 0.0203 .
(3) 24107 34022 . 0.0304
(4) 14072 24030 . 0.0366
(5) CL54( 54: 24107, 34022) 55( 34021, 34026)
.
(6) 24049, 24033 14118, 14143
.
(7) 52( 54, 55) 14067 . .
(8)FREQ . 2
.
7.5.4.
.
. NCL Number of Clustering . Tie()
. Tie (Tie) .
(1)CCC (Cubic Clustering Criterion, Sarle; 1983) 20% .
CCC 3 . CCC
4 .
(2)Hotelling's $T^2$
. PST2 (Pseudo Hotelling's $T^2$) . PST2
.
. PST2 NCL=6, NCL=3
(NCL) . NCL=6
7 NCL=3 7 .
(3)PSF Pseudo F PST2 Pseudo Hotelling's $T^2$ . CCC PST2 .
(4)CCC 4 , PST2 4 7 .
7.5.5. Hierarchical Tree
7.5.6.
( 4 7 )
() . 7
.
NCL=2
NCL=7
NCL=4
=7
1 7 .
2001 1 2001 .
.
.
7.5.7.
.
.
7 (MOIS, PROT, FAT, ASH, SODIUM, CARB, CAL)
. ( )
.
? 2 .
. .
. 2
3 2 .
HPOS=50 (H) 50%
VPOS=50 (V) 25% .
( : ) .
152 2 6 91.8%
. Prin1 (PROT, FAT, ASH, SODIUM, CARB( ))
Prin2 ( ) . 1
, 2 .
()
.
() .
.
. 4 .
7 4 .
4 .
7.6. Fast Cluster
Fast clustering (hierarchical clustering)
[() ] (non-
hierarchical clustering) . seed seed
. (number of clusters)
(size: radius ) .
SAS FASTCLUS .
Fast Cluster .
,
NCLUSTER=7
,
,
,
(STEP1) seeds . seed .
MAXCLUSTERS=2 SEED 2 .
(STEP2) seeds . DRIFT
seed .
(1) (2) .
(STEP3) SEED .
(STEP4) STEP 2)-STEP3) . STEP SEED
. STEP . MAXITER
. MAXITER=3
STEP 3 .
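The seed-based steps above follow the familiar pattern of assigning observations to the nearest seed and then updating the seeds. The Python sketch below shows that pattern under stated assumptions: it is a generic nearest-seed iteration, not the FASTCLUS implementation, it ignores options such as DRIFT and RADIUS, and the function names and toy points are hypothetical. The `max_iter` argument plays the role that MAXITER plays in the text.

```python
import math


def nearest_seed(point, seeds):
    distances = [math.dist(point, seed) for seed in seeds]
    return distances.index(min(distances))


def fast_cluster(points, seeds, max_iter=3):
    """Assign points to the nearest seed, then move each seed to its cluster mean."""
    seeds = [tuple(seed) for seed in seeds]  # STEP 1: initial seeds are supplied
    assignment = []
    for _ in range(max_iter):
        # STEP 2: assign every observation to the closest seed.
        assignment = [nearest_seed(p, seeds) for p in points]
        # STEP 3: replace each seed by the mean of its assigned observations.
        new_seeds = []
        for k, seed in enumerate(seeds):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_seeds.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_seeds.append(seed)  # an empty cluster keeps its seed
        # STEP 4: repeat, stopping early once the seeds no longer move.
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return seeds, assignment


# Hypothetical two-dimensional points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
seeds, assignment = fast_cluster(points, seeds=[points[0], points[3]], max_iter=3)
```

Because the result depends on the initial seeds, it is common to rerun such a procedure with different starting seeds and compare the solutions.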
Non-hierarchical clustering
RANDOM .
FASTCLUS . (
3 PRIN1, PRIN2 .)
.
7.6.1.
Fisher Iris() [IRIS.TXT]. 4 .
(VARIETY: S, C, V 3 )
.
(petal length), (petal width), (sepal length), (sepal width)
7.6.2.
PROC CLUSTER FASTCLUS () .
procedure
.
(SL, SW), (PL, PW)
.
7.6.3.
Fast cluster . IRIS
(3 : S, C, V)
.
2~3
.
1 (SL, PL, PW: ), 2 SW( ) .
VAXIS Y , HAXIS X . VPOS 25%, HPOS 50%
PLOT .
2
PRIN1 . (
, ) .
7.6.4. Fast clustering
3 .
2 3
1 .
SAS data (IRIS_PRIN)
PRIN1, PRIN2 .
RMS Std Deviation
.
Maximum Distance from Seed to Observation
seed seed
R-Square
RSQ/(1-RSQ)
?
Distance Between Cluster Centroids
()
PL () . PW
. 1
.
.
A
B
2 2 .(A
, B ) 1, 2
.
ID
.
7.6.5. 1
3 , SEED (DRIFT), 3
168 RMS .
7.6.6. 2
.
2 .
.
7.7.
(MDS: Multi Dimensional Scaling) n (
2 ) . (similarity)
. .
Under-arm Deodorant
() .
. , , ,
10 . 2
.
2
7
,
.
3(RADIUS=3)
3
RMS
.
? Intuition & Trial Error
metric non-metric .
(1)Euclidean distance (Metric )
(2) () . (Metric/non-Metric )
(3) . (non-
Metric )
.
(1)
(2)
(3) ,
7.7.1.
n ( 2 )
() .
() .
(1)metric
metric () Euclidean distance .
       X1     X2     ...    Xp
1      x11    x12    ...    x1p
2      x21    x22    ...    x2p
...
n      xn1    xn2    ...    xnp
( $D_1$ , $D_2$ )
$$d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$$ (Euclidean distance)
() (10 , 100 )
. $x_1$ , $x_2$ , $\ldots$ , $x_p$
( ) . Deodorant ,
, , 10
.
(2)non-metric
. . 50 20
(1, 2) 30, (1, 20)
25 , (2, 20) 45
       1      2      ...    20
1      0
2      30     0
...
20     25     45     ...    0
. . MDS
(1-/) .
7.7.2.
. n k=n(n-1)/2
.
(1) . $S_{i_1 j_1} < S_{i_2 j_2} < \cdots < S_{i_k j_k}$
(2) m( 2) .
2) . 2
STRESS .
$$\mathrm{STRESS} = \left[ \frac{\sum_{i < j} (S_{ij} - \hat{S}_{ij})^2}{\sum_{i < j} S_{ij}^2} \right]^{1/2}$$
2 Stress .
Stress    Goodness of fit
20%       Poor
10%       Fair
5%        Good
2.5%      Excellent
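A stress value can be computed directly once the observed dissimilarities and the distances fitted in the low-dimensional configuration are available. The Python sketch below assumes the usual Kruskal form of stress, the square root of the sum of squared differences divided by the sum of squared observed values. The function names, the fitted values and the way the table's percentages are turned into upper bounds are all assumptions made for illustration.

```python
import math


def stress(observed, fitted):
    """Kruskal-type stress over all pairs: sqrt(sum (S - S_hat)^2 / sum S^2)."""
    num = sum((s - f) ** 2 for s, f in zip(observed, fitted))
    den = sum(s ** 2 for s in observed)
    return math.sqrt(num / den)


def goodness_of_fit(value):
    """Label a stress value, reading the table's percentages as upper bounds."""
    guide = [(0.025, "Excellent"), (0.05, "Good"), (0.10, "Fair"), (0.20, "Poor")]
    for bound, label in guide:
        if value <= bound:
            return label
    return "Worse than poor"  # hypothetical label for values above 20%


observed = [30.0, 25.0, 45.0]  # dissimilarities for pairs (1, 2), (1, 20), (2, 20)
fitted = [29.0, 26.0, 44.0]    # hypothetical distances in the fitted configuration
value = stress(observed, fitted)
label = goodness_of_fit(value)
```

In practice the stress is recomputed for each candidate number of dimensions, and the smallest dimension with an acceptable value is retained.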
7.7.3. 1 (Metric)
.
(5.) 5 @56 56 city .
10 MDS 2
. . non-metric
(1-/) .
(1)MDS
LEVEL=absolute .
LEVEL=ordinal . default ordinal .
SHAPE=square .
SHAPE=triangle . triangle default .
PLOTIT . () 2 .
MDS ()
.
2 STRESS
. SAS data OUT
.
(2)MDS
. ()
(Dim1, Dim2) .
7.7.4. 2
. 6 15 15
. 15 .
cheek() . mouth=>face=>head ..
.
(1)CONDITION data ROW ,
MATRIX(default) . . (
)
(2)LEVEL data ORDINAL(default) , ABSOLUTE
.
(3)DIMENSION .
(4)PFINAL OUT .
7.7.5. 3
34 HP(), TIM1( 60 :) TIM2(1/4
:), TS() BRAK1(60 ) BRAK2 (80
) SP( ) SS( ) MPG(
) . MDS . [CARS.txt]
CARS4 CARS2
.
-
Chapter 7.
218
ONE I . ORIG J
( $i, j$ )=(1,2), (1,3), ..., (34,34) 34*34 .
DUP i j NN .
.
. .
=SHAPE( , nrow, ncol) (nrow, ncol)
. ( DIST nn DD .)
[EXERCISE]
(1)ORANGE.txt 5 .
Boron(B), Barium(BA), Calcium(CA), Potassium(PO), Magnesium(MA), Manganese(MN),
Phosphorous(PH), Rubidium(RU), Zinc(ZN)
( Centroid( ), Average( ) )
.
.
. (
?)
(2)ORANGE.txt FASTCLUSTER (1) Hierarchical
.
(3) (WHEAT.txt) (, , )
(1) (2) .
(4) .
Metric . .
Non-metric .
-
CHAPTER 8
(Canonical Correlation Analysis)
. (, , ) (,
, )
.
( $X_1, X_2, \ldots, X_m$ ) ( $Y_1, Y_2, \ldots, Y_n$ ) . Xs
Ys $V = a_1X_1 + a_2X_2 + \cdots + a_mX_m$ , $W = b_1Y_1 + b_2Y_2 + \cdots + b_nY_n$ 1)
2) $\mathrm{Corr}(V, W)$ . p
2 .
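$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix} \sim \mathrm{Normal}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$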
. 1) ( 21, xx )
2)
(determination coefficient = SSR/SST) ( $R^2$ ).
( ) ( $a_1X_1 + a_2X_2 + \cdots + a_pX_p$ )
.
8.1.
8.1.1.
.
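$$\rho_1 = \max_{a_1, b_1} \mathrm{corr}(V_1, W_1), \quad \text{where } V_1 = a_1' x_1,\ W_1 = b_1' x_2$$
The pair ($V_1$, $W_1$) defined by $a_1, b_1$ is the first canonical variate pair, and $\rho_1$ is the first canonical correlation.
The coefficients are scaled so that $\mathrm{var}(V_1) = \mathrm{var}(W_1) = 1$, that is, $a_1' \Sigma_{11} a_1 = 1$ and $b_1' \Sigma_{22} b_1 = 1$.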
8.1.2.
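The second canonical variates $V_2 = a_2' x_1$ and $W_2 = b_2' x_2$ are defined by the coefficients $a_2, b_2$ that maximize the correlation subject to two conditions.
(1) V2 and W2 are uncorrelated with V1 and W1.
(2) $\mathrm{var}(V_2) = \mathrm{var}(W_2) = 1$
The resulting $\rho_2 = \mathrm{corr}(V_2, W_2)$ is the second canonical correlation.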
.
2-3 .
8.1.3.
.
With p variables in one set and q variables in the other, there are $\min(p, q)$ canonical correlations.
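As a sketch of where these canonical correlations come from, the standard result can be written in terms of the covariance blocks of the two variable sets. The notation $\Sigma_{11}$, $\Sigma_{22}$ and $\Sigma_{12} = \Sigma_{21}'$ is assumed here for the within-set and between-set covariance matrices; the squared canonical correlations are the nonzero eigenvalues of the two matrix products below, and the coefficient vectors are the corresponding eigenvectors.

```latex
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\, a_i = \rho_i^2\, a_i,
\qquad
\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\, b_i = \rho_i^2\, b_i,
\qquad i = 1, \ldots, \min(p, q),
\qquad \rho_1 \ge \rho_2 \ge \cdots \ge \rho_{\min(p, q)} \ge 0
```

Since the two products share the same nonzero eigenvalues, at most $\min(p, q)$ of them can be nonzero, which is why the number of canonical correlations is limited by the smaller variable set.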
8.1.4.
.
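(1) $H_0: \rho_1 = 0$ vs. $H_1: \rho_1 \neq 0$, equivalently $H_0: \Sigma_{12} = 0$ vs. $H_1: \Sigma_{12} \neq 0$
$$T = \frac{|\Sigma|}{|\Sigma_{11}|\,|\Sigma_{22}|} = \prod_{i=1}^{k} (1 - \rho_i^2), \quad k = \min(p, q)$$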
(2) $H_0: \rho_r = 0$ vs. $H_1: \rho_r \neq 0$
$$T_r = \prod_{i=r}^{k} (1 - \rho_i^2), \quad -\log(T_r) \sim \chi^2_{(p-r+1)(q-r+1)}$$
8.2.
(WHEAT.txt) (, , )
(, , ) .
8.2.1.
SAS .
. NOSIMPLE
(, ) .
.
. (pair-wise)
.
(1)OUT SAS SCORES .
(2)NCAN=2 2 . (V1, W1), (V2,W2)
(3)CORR . PROC CORR
.
(4)VPREFIX=DOWN V DOWN . WPREFIX=RIGHT W
RIGHT .
(5) VAR V, WITH W
.
8.2.2.
, .
.
CANONICAL
4 . ( 4 )
Corr(V1, W1), Corr(V2, W2) . Corr(V1, W2) ? 0
.
1
2
3
CANONICAL
.
0 . 0
. 3 0.03 0.05
. . 4
0.9617 .
,
$V_1 = a_1' x_1$, $W_1 = b_1' x_2$ , $V_2 = a_2' x_1$, $W_2 = b_2' x_2$ $a_1, b_1$ , $a_2, b_2$ .
Coefficients on the standardized (Z) variables
               DA       DP       DL       DB
V1 = DOWN1     0.0797   0.7768   0.2544   0.2537
V2 = DOWN2     0.2867   0.6688   1.2224   0.4587

               RA       RP       RL       RB
W1 = RIGHT1    0.0165   0.8941   0.1601   0.0407
W2 = RIGHT2    0.8764   0.1905   0.8096   0.4412
.
**_Z .
RAW STANDARDIZED
. ()
.
. V1 , V2
, W1 , W2 ( )
. .
. V1 V2, W1 W2 .
.
DOWN1 , ,
DOWN2 .
(RIGHT1) , (RIGHT2) .
(RIGHT1) , , .
, , . (RIGHT2)
, , .
(DOWN1) , , ,
(DOWN2) , , .
. .
OUT SCORE .
(DOWN1, RIGHT1) (DOWN2, RIGHT2) (DOWN1, RIGHT2), (DOWN2, RIGHT1)
0. V1, W1, V2, W2 .
.
0.88 0.39 1
. .
[EXERCISE]
6 .
( n =20)
191   36   50    5   162    60
189   37   52    2   110    60
193   38   58   12   101   101
162   35   62   12   105    37
189   35   46   13   155    58
182   36   56    4   101    42
211   38   56    8   101    38
167   34   60    6   125    40
176   31   74   15   200    40
154   33   56   17   251   250
169   34   50   17   120    38
166   33   52   13   210   115
154   34   64   14   215   105
247   46   50    1    50    50
193   36   46    6    70    31
202   37   62   12   210   120
176   37   54    4    60    25
157   32   52   11   230    80
156   33   54   15   225    73
138   33   68    2   110    43
(, , ) . 8
.