Simple linear regression

Upload: souvenirsouvenir

Post on 18-Jul-2015


Simple linear regression analysis


Simple linear regression analysis is essentially an extension of the correlation analysis model that I explained in the previous part. Correlation analysis gives us the correlation coefficient (coefficient of correlation), which reflects the degree of association between two variables. Linear regression analysis gives us a model for predicting one variable from another factor. Because it is a model, it must have parameters; therefore, in linear regression analysis we need to estimate the parameters of the prediction model.

The linear regression model is perhaps one of the most popular statistical methods: it is the most widely applied (and also the most abused) method in medical research. The model has a fairly long history. In 1885, the English-born scientist Francis Galton (one of the pioneers of genetics) introduced the concept of "regression" in a study showing that children's heights tend to correlate not with the height of the father or of the mother alone, but with the average height of both parents. He called this tendency regression. In fact, Galton was not the first to develop the linear regression model, although he was the first to apply it. The legendary French mathematician Adrien Marie Legendre was the first to develop and publish work on linear regression, in 1805 (though he did not use the term "regression"). The original idea of linear regression belongs to Carl Friedrich Gauss (another legendary mathematician), who had touched on the concept at the beginning of the 19th century.

1. Summary of the theory

The linear regression model states the following. Let y_i be the measurement of a dependent variable on subject i (i = 1, 2, 3, …, n), and x_i the measurement of an independent variable on the same subject i. The linear relationship between the two variables can be described by an equation with two parameters:

$$ y_i = \alpha + \beta x_i + \varepsilon_i \tag{1} $$

Here α and β are the two parameters of the linear regression model, to be estimated from the observed data, and ε_i is the residual, i.e., the part that cannot be predicted from the measurement of the independent variable. The model above is valid only if the following assumptions hold:

(i) the values of x are not affected by measurement (random) error;
(ii) ε follows a normal distribution with mean 0 and variance σ²;
(iii) ε is not correlated with x; and
(iv) successive values of ε (for example ε_1 and ε_2) are independent of one another.

Under these assumptions, and because the two parameters α and β are constants, for a given measurement of x we can estimate the expected value (more precisely, the mean) of y as:

$$ E(y_i \mid x_i) = \alpha + \beta x_i $$

The question is how, given a set of data (x_1, y_1), (x_2, y_2), …, (x_n, y_n), the two parameters α and β should be estimated. The least squares method is the standard way to estimate them (under assumptions (i) to (iv), it gives the best linear unbiased estimates). With this method, we look for two estimates a and b (corresponding to α and β) that make the sum of squared differences between the observed values (y_i) and the predicted values (ŷ_i = a + b x_i) as small as possible; in other words, we minimize:

$$ Q = \sum_{i=1}^{n} (y_i - a - b x_i)^2 $$

It turns out that to minimize Q we only need to solve the following simple system of equations, obtained by setting the partial derivatives of Q to zero:

$$ \frac{\partial Q}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0, \qquad \frac{\partial Q}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 $$

Its solution is:

$$ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{4} $$

$$ a = \bar{y} - b\bar{x} \tag{5} $$
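As a check that formulas [4] and [5] really minimize Q, here is a minimal R sketch (not the author's code) that minimizes Q numerically with base R's optim() on the weight and waist data of Example 1 (Table 1) and compares the result with the closed-form estimates. The starting point (a horizontal line at the mean of y) and the tolerance setting are illustrative assumptions.

```r
# Table 1 data (Example 1): weight in kg (x) and waist circumference in cm (y)
x <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)
y <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)

# Q(a, b): sum of squared differences between observed and predicted y
Q <- function(par) sum((y - par[1] - par[2] * x)^2)

# Analytic gradient of Q: (dQ/da, dQ/db), the left-hand sides of the system above
grad_Q <- function(par) {
  r <- y - par[1] - par[2] * x
  c(-2 * sum(r), -2 * sum(x * r))
}

# Numerical minimization, starting from a horizontal line at the mean of y
opt <- optim(c(mean(y), 0), Q, grad_Q, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 1000))

# Closed-form least squares estimates from formulas [4] and [5]
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)

print(round(rbind(numerical = opt$par, closed_form = c(a, b)), 3))
print(round(Q(c(a, b)), 2))  # minimum value of Q
```

Both rows should agree to about three decimals (a ≈ 29.958, b ≈ 0.800); the closed form is what one uses in practice, and the optimizer only confirms that it sits at the minimum of Q.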

Note that in the equations above, x̄ and ȳ are the means of x and y. Recall that we do not know the true values of α and β; we can only estimate them, and their estimates are a and b. In statistical terminology, a is called the intercept and b the gradient or slope. As the equation shows, the intercept is the value of y when x = 0. Formula [4] shows that the estimate b is simply the covariance of x and y divided by the variance of x. Although these formulas may look complicated at first glance, in practice they are very simple: a hand calculator, or better still, Excel, is enough to compute them easily.

Example 1 (continued), weight and waist circumference: In the previous part (correlation analysis), we had data on the weight and waist circumference of 15 subjects, reprinted here for convenience:

Table 1. Weight and waist circumference of 15 Vietnamese subjects
Weight (kg): 51.0 66.0 47.0 54.0 64.0 75.0 54.0 52.0 53.0 52.0 48.0 46.0 63.0 40.0 90.0
Waist (cm): 71.0 89.0 64.0 74.0 87.0 93.0 66.0 74.0 75.0 72.0 70.0 66.0 81.0 57.0 94.0
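A short R sketch (not part of the original article) that computes the means and standard deviations of the Table 1 variables and illustrates the remark that b is the covariance of x and y divided by the variance of x; it uses only base R functions.

```r
# Table 1: weight (kg) and waist circumference (cm) of 15 subjects
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)
waist  <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)

# Means and standard deviations (sd() uses the n - 1 denominator)
stats <- c(mean_x = mean(weight), sd_x = sd(weight),
           mean_y = mean(waist),  sd_y = sd(waist))
print(round(stats, 1))

# Formula [4] written as covariance / variance of x, then formula [5]
b <- cov(weight, waist) / var(weight)
a <- mean(waist) - b * mean(weight)
print(round(c(a = a, b = b), 2))
```

Because cov() and var() use the same n − 1 denominator, it cancels in the ratio, so this gives exactly the b of formula [4].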

Let weight be x and waist circumference be y. With this notation, we intend to use weight to predict a subject's waist circumference. Recall that in the previous article we obtained the following results:

x̄ = 57.0 kg, s_x = 12.8 kg; ȳ = 75.5 cm, s_y = 11.1 cm

With n = 15 subjects, and using formulas [4] and [5], we can estimate the parameters b and a as follows:

$$ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1831}{2290} \approx 0.80, \qquad a = \bar{y} - b\bar{x} \approx 75.53 - 0.80 \times 57.0 \approx 30 $$

We now have a formula for predicting waist circumference from a subject's weight:

$$ \hat{y} = 30 + 0.80x $$

Note that in the equation above, y carries a hat to remind us that ŷ is a predicted value, as distinct from the measured (actual) value y_i.

What does this equation mean? Here a = 30 has no real practical meaning (no one weighs 0 kg), but b = 0.80 means that each kilogram of weight is associated with 0.8 cm of waist circumference. For example, if we know that a patient weighs 60 kg, the equation predicts that the patient's waist circumference is ŷ = 30 + 0.80 × 60 = 78 cm; for another patient weighing 62 kg, we can predict a waist circumference of ŷ = 30 + 0.80 × 62 = 79.6 cm.
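In practice the same estimates come from R's built-in lm() function. The sketch below (an illustration, not the author's code) fits the model to the Table 1 data and predicts the mean waist circumference for patients weighing 60 kg and 62 kg.

```r
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)
waist  <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)

# Least squares fit of waist on weight: waist = a + b * weight
fit <- lm(waist ~ weight)
print(round(coef(fit), 2))   # intercept (a) and slope (b)

# Predicted mean waist for patients weighing 60 kg and 62 kg
new_patients <- data.frame(weight = c(60, 62))
pred <- predict(fit, newdata = new_patients)
print(round(pred, 1))
```

With the unrounded coefficients (a ≈ 29.96, b ≈ 0.7996) the predictions are about 77.9 cm and 79.5 cm; the rounded values a = 30 and b = 0.80 give slightly higher predictions.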

Now let us check how accurately the equation predicts waist circumference, by using it to estimate the waist of each subject in Table 1:

Table 2. Predicted waist circumference based on weight for 15 subjects

| ID (i) | Weight x_i (kg) | Actual waist y_i (cm) | Predicted waist ŷ_i (cm) | Difference y_i − ŷ_i (cm) |
|---|---|---|---|---|
| 1 | 51.0 | 71.0 | 70.8 | 0.2 |
| 2 | 66.0 | 89.0 | 82.8 | 6.2 |
| 3 | 47.0 | 64.0 | 67.6 | -3.6 |
| 4 | 54.0 | 74.0 | 73.2 | 0.8 |
| 5 | 64.0 | 87.0 | 81.2 | 5.8 |
| 6 | 75.0 | 93.0 | 90.0 | 3.0 |
| 7 | 54.0 | 66.0 | 73.2 | -7.2 |
| 8 | 52.0 | 74.0 | 71.6 | 2.4 |
| 9 | 53.0 | 75.0 | 72.4 | 2.6 |
| 10 | 52.0 | 72.0 | 71.6 | 0.4 |
| 11 | 48.0 | 70.0 | 68.4 | 1.6 |
| 12 | 46.0 | 66.0 | 66.8 | -0.8 |
| 13 | 63.0 | 81.0 | 80.4 | 0.6 |
| 14 | 40.0 | 57.0 | 62.0 | -5.0 |
| 15 | 90.0 | 94.0 | 102.0 | -8.0 |

Column 4 shows the waist circumference predicted by the equation ŷ = 30 + 0.80x, and column 5 gives the accuracy of the prediction (measured waist minus predicted waist). As the table shows, the equation predicts waist circumference fairly accurately. For example, for subject 1 the actual waist is 71 cm and the model predicts 70.8 cm. What does this mean? It means that the predicted value of 70.8 cm is the mean waist circumference of all people who weigh 51 kg.
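Table 2 can be reproduced in a few lines of R. This sketch (illustrative only) uses the rounded coefficients a = 30 and b = 0.80, as the table does, so its columns match the printed values.

```r
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)
waist  <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)

predicted  <- 30 + 0.80 * weight   # column 4: predicted waist (rounded coefficients)
difference <- waist - predicted    # column 5: actual minus predicted

table2 <- data.frame(id = seq_along(weight), weight, waist,
                     predicted = round(predicted, 1),
                     difference = round(difference, 1))
print(table2)
```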

However, the table also shows that for some subjects the equation does not predict well. For example, for subject 2 the prediction is 6.2 cm lower than the actual value, and for subject 15 it is 8 cm higher! We will return to assessing the quality and accuracy of this model in a later part. A better way is to plot the measured waist values against the predicted values, as in Figure 1 below. The figure shows that the model predicts waist circumference fairly accurately for people weighing less than 60 kg, but for people above that threshold the predictions are less accurate.

2. Testing hypotheses about α and β

Returning to model [1], we can make a few observations. If β = 0, the equation simplifies to y_i = α + ε_i, meaning there is no relationship between x and y; but if β ≠ 0 (whether negative or positive), a relationship between x and y exists. Therefore, testing the linear regression model focuses on testing the hypothesis β = 0. To test this hypothesis, we need to compute the variance of b (remember that b is the estimate of β).

We know that ȳ is the mean waist circumference, and the variance of waist circumference before weight is known can be estimated as:

$$ s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 $$

This formula is also called the unconditional variance. But in model [1], as just mentioned, ŷ_i is the mean waist circumference conditional on x_i. That is why the variance of y (denoted s²) given weight is estimated by replacing ȳ with ŷ_i:

$$ s^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6} $$

(Note that the denominator is n − 2, not n − 1, because this variance is estimated using two parameters, a and b, so 2 is subtracted from n.) Let e_i be the difference between the actual and the predicted waist circumference:

$$ e_i = y_i - \hat{y}_i $$

In statistical terminology, e_i is called the residual. The variance in equation [6] can be rewritten as:

$$ s^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2 $$

s² is the estimate of σ² in model [1].

After some algebra, it can be shown that the variances of b and a can be written as:

$$ \operatorname{var}(b) = \frac{s^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s^2}{(n-1)s_x^2} \tag{8} $$

and:

$$ \operatorname{var}(a) = s^2\left[\frac{1}{n} + \frac{\bar{x}^2}{(n-1)s_x^2}\right] $$
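The variance formulas [6] and [8] are easy to check numerically. The minimal R sketch below (not the author's code) computes s², var(b) and var(a) from the formulas for the Example 1 data and compares them with the diagonal of vcov(), base R's estimated covariance matrix of the coefficients.

```r
x <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)  # weight, kg
y <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)  # waist, cm
n <- length(x)

fit <- lm(y ~ x)
e <- resid(fit)                          # residuals e_i = y_i - yhat_i

s2    <- sum(e^2) / (n - 2)              # formula [6]: n - 2 because a and b are estimated
Sxx   <- sum((x - mean(x))^2)            # equals (n - 1) * sd(x)^2
var_b <- s2 / Sxx                        # formula [8]
var_a <- s2 * (1 / n + mean(x)^2 / Sxx)

print(round(c(s2 = s2, var_a = var_a, var_b = var_b), 4))
print(round(diag(vcov(fit)), 4))         # the same values, computed by R
```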

When n (the number of subjects) is relatively large, b follows a normal distribution with mean β and the variance given in [8]. Therefore, the hypothesis β = 0 can be tested with the following ratio T_b:

$$ T_b = \frac{b}{\sqrt{\operatorname{var}(b)}} = \frac{b\, s_x \sqrt{n-1}}{s} $$

(Note that s_x is the standard deviation of x.) If β = 0, T_b follows a t distribution with n − 2 degrees of freedom. Similarly, the hypothesis α = 0 can be tested with the ratio T_a:

$$ T_a = \frac{a}{\sqrt{\operatorname{var}(a)}} $$

If α = 0, T_a follows a t distribution with n − 2 degrees of freedom.
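A sketch of the two t tests in R, assuming the Example 1 data: T_a and T_b are each coefficient divided by its standard error, and two-sided p-values come from the t distribution with n − 2 degrees of freedom via pt(). The two-sided 5% critical value qt(0.975, n - 2), about 2.16 for n = 15, is why the range −2 to +2 serves as a rough guide under the null hypothesis.

```r
x <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)  # weight, kg
y <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)  # waist, cm
n <- length(x)
fit <- lm(y ~ x)

est    <- coef(fit)                         # a and b
se     <- sqrt(diag(vcov(fit)))             # standard errors, sqrt of var(a), var(b)
t_stat <- est / se                          # T_a and T_b
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-values
crit   <- qt(0.975, df = n - 2)             # two-sided 5% critical value

print(round(cbind(estimate = est, se, t_stat, p_val), 4))
print(round(crit, 3))
```

The t values printed here are the same ones shown in the "t value" column of summary(fit).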

Example 1 (continued): We continue with the data from Example 1 to illustrate the calculations above. To estimate the variance s² with formula [6], we need to compute the squared differences between y_i and ŷ_i and their sum:

Table 3. Computing the variance of y

| ID (i) | Weight x_i (kg) | Actual waist y_i (cm) | Predicted waist ŷ_i (cm) | y_i − ŷ_i | (y_i − ŷ_i)² |
|---|---|---|---|---|---|
| 1 | 51.0 | 71.0 | 70.8 | 0.2 | 0.04 |
| 2 | 66.0 | 89.0 | 82.8 | 6.2 | 38.44 |
| 3 | 47.0 | 64.0 | 67.6 | -3.6 | 12.96 |
| 4 | 54.0 | 74.0 | 73.2 | 0.8 | 0.64 |
| 5 | 64.0 | 87.0 | 81.2 | 5.8 | 33.64 |
| 6 | 75.0 | 93.0 | 90.0 | 3.0 | 9.00 |
| 7 | 54.0 | 66.0 | 73.2 | -7.2 | 51.84 |
| 8 | 52.0 | 74.0 | 71.6 | 2.4 | 5.76 |
| 9 | 53.0 | 75.0 | 72.4 | 2.6 | 6.76 |
| 10 | 52.0 | 72.0 | 71.6 | 0.4 | 0.16 |
| 11 | 48.0 | 70.0 | 68.4 | 1.6 | 2.56 |
| 12 | 46.0 | 66.0 | 66.8 | -0.8 | 0.64 |
| 13 | 63.0 | 81.0 | 80.4 | 0.6 | 0.36 |
| 14 | 40.0 | 57.0 | 62.0 | -5.0 | 25.00 |
| 15 | 90.0 | 94.0 | 102.0 | -8.0 | 64.00 |
| Summary | x̄ = 57.0, s_x = 12.8 | ȳ = 75.5, s_y = 11.1 | | Total: | 251.7 |

The table shows that s² = 251.7 / 13 = 19.36. Therefore the variance of b is:

$$ \operatorname{var}(b) = \frac{s^2}{(n-1)s_x^2} = \frac{19.36}{14 \times 12.8^2} \approx 0.0084 $$

and the standard deviation (standard error) of b is √0.0084 ≈ 0.092. Hence the test of the hypothesis β = 0 is:

$$ T_b = \frac{b}{\sqrt{\operatorname{var}(b)}} \approx \frac{0.80}{0.092} \approx 8.7 $$

If β = 0, T_b would be expected to lie roughly between −2 and +2, but here T_b = 8.69, more than four times that expected value, so we can conclude that the association between waist circumference and weight is statistically significant.

3. Analysis of variance

One of the aims of linear regression analysis is to find out what percentage of the variation in the dependent variable can be explained by the independent variable. In this example, we specifically want to know what percentage of the variation (or differences) in waist circumference between individuals can be explained by weight. The key term here is variation. Note that each individual has three values: the actual waist circumference y_i, the waist circumference predicted from weight ŷ_i, and the population mean waist circumference ȳ. The relationship between these three values can be written as:

$$ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \tag{9} $$

In other words, the difference between an individual's waist circumference and the mean is the sum of two differences: (a) between the predicted value and the mean (ŷ_i − ȳ), and (b) between the actual value and the predicted value (y_i − ŷ_i).

Hence, one index that measures the variation of a variable is its total sum of squares. In other words, if y_i is the waist circumference of individual i and ȳ is the mean waist circumference, the total sum of squares (denoted SST) is:

$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

In model [1], ŷ_i is the predicted value of y_i after adjusting for weight (x_i), so by the same logic the total variation in waist circumference that model [1] can explain is (denoted SSR):

$$ SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 $$

The remainder, i.e., the variation in waist circumference that model [1] cannot explain, is (denoted SSE):

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

Therefore, from relation [9], we can show that:

$$ SST = SSR + SSE \tag{13} $$

or:

$$ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

(When [9] is squared and summed, the cross term 2Σ(ŷ_i − ȳ)(y_i − ŷ_i) equals zero for the least squares estimates, which is why only the two sums of squares remain.)

To compute SST we must spend one parameter (the mean), so the degrees of freedom of SST are n − 1, and the mean square is MST = SST / (n − 1). To compute SSE we need two parameters (a and b), so the degrees of freedom of SSE are n − 2, and the mean square is MSE = SSE / (n − 2). These quantities can be summarized in an analysis of variance table:

Table 4. Analysis of variance for the linear regression model

| Source of variation | Degrees of freedom | Sum of squares | Mean squares |
|---|---|---|---|
| Regression | 1 | SSR | MSR = SSR / 1 |
| Residual | n − 2 | SSE | MSE = SSE / (n − 2) |
| Total | n − 1 | SST | |
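The decomposition SST = SSR + SSE and the mean squares of Table 4 can be computed directly from the fitted values. A minimal base-R sketch with the Example 1 data (the variable names are illustrative, not from the article):

```r
x <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)  # weight, kg
y <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)  # waist, cm
n <- length(y)

fit  <- lm(y ~ x)
yhat <- fitted(fit)                  # predicted values

SST <- sum((y - mean(y))^2)          # total variation
SSR <- sum((yhat - mean(y))^2)       # variation explained by the model
SSE <- sum((y - yhat)^2)             # unexplained (residual) variation

ms <- c(MSR = SSR / 1, MSE = SSE / (n - 2), MST = SST / (n - 1))
print(round(c(SST = SST, SSR = SSR, SSE = SSE), 2))
print(round(ms, 2))
```

Because this uses the exact mean (75.533 rather than 75.5), SST comes out at about 1715.73 instead of 1715.75.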

F test. To test the statistical significance of the hypothesis β = 0 we have already used the ratio T_b. But there is another, equivalent test, the F test, with the following formula:

$$ F = \frac{MSR}{MSE} \tag{15} $$

If β = 0, F follows an F distribution with 1 and n − 2 degrees of freedom. In fact, F as defined in formula [15] is exactly the square of T_b.

Coefficient of determination. Since the total variation is SST, and of this the variation explained by linear regression model [1] is SSR, we can estimate the percentage of the total variation in y that the model explains. This index is called the coefficient of determination and is denoted R²:

$$ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} $$

Looking at relation [13], we can easily see that R² takes values from 0 to 1. If R² = 0, the linear regression model is essentially useless and explains none of the variation in y. If R² = 1 or close to 1, the linear regression model can predict the value of y accurately. However, it must be stressed that a model with a high R² is not necessarily a good model. Indeed, R² can be high simply because b is large or because the independent variable has a wide range. In addition, R² can be high when a linear model is fitted to a nonlinear relationship.
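R's anova() and summary() report the same quantities. This sketch (illustrative, base R only) computes F = MSR / MSE and R² = SSR / SST from the ANOVA table for Example 1 and checks that F equals the square of T_b.

```r
x <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)  # weight, kg
y <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)  # waist, cm

fit <- lm(y ~ x)
tab <- anova(fit)                      # rows: "x" (regression) and "Residuals"

F_manual <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]   # formula [15]: MSR / MSE
t_b      <- summary(fit)$coefficients["x", "t value"]   # T_b
R2       <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])    # SSR / SST

print(tab)
print(round(c(F = F_manual, t_b_squared = t_b^2, R2 = R2), 3))
```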

Example 1 (continued): We continue with the data from Example 1 to illustrate the calculations above.

Table 5. Detailed analysis of variance for the linear regression model

| ID (i) | y_i | ŷ_i | (y_i − ȳ)² | (y_i − ŷ_i)² |
|---|---|---|---|---|
| 1 | 71.0 | 70.8 | 20.25 | 0.04 |
| 2 | 89.0 | 82.8 | 182.25 | 38.44 |
| 3 | 64.0 | 67.6 | 132.25 | 12.96 |
| 4 | 74.0 | 73.2 | 2.25 | 0.64 |
| 5 | 87.0 | 81.2 | 132.25 | 33.64 |
| 6 | 93.0 | 90.0 | 306.25 | 9.00 |
| 7 | 66.0 | 73.2 | 90.25 | 51.84 |
| 8 | 74.0 | 71.6 | 2.25 | 5.76 |
| 9 | 75.0 | 72.4 | 0.25 | 6.76 |
| 10 | 72.0 | 71.6 | 12.25 | 0.16 |
| 11 | 70.0 | 68.4 | 30.25 | 2.56 |
| 12 | 66.0 | 66.8 | 90.25 | 0.64 |
| 13 | 81.0 | 80.4 | 30.25 | 0.36 |
| 14 | 57.0 | 62.0 | 342.25 | 25.00 |
| 15 | 94.0 | 102.0 | 342.25 | 64.00 |
| Total | | | 1715.75 | 251.7 |

From the calculations in the table above, SST = 1715.75 and SSE = 251.7. Therefore SSR = SST − SSE = 1715.75 − 251.7 = 1463.95. Table 6 summarizes these quantities:

Table 6. Analysis of variance (Example 1)

| Source of variation | Degrees of freedom | Sum of squares | Mean squares |
|---|---|---|---|
| Regression | 1 | SSR = 1463.95 | 1463.95 |
| Residual | 13 | SSE = 251.7 | 19.36 |
| Total | 14 | SST = 1715.75 | |

Using the sums of squares from [13], the coefficient of determination can be estimated as:

$$ R^2 = \frac{SSR}{SST} = \frac{1463.95}{1715.75} \approx 0.85 $$

In other words, the linear model with weight as the independent variable can explain about 85% of the total variation (or the differences) in waist circumference between individuals.

4. Residual analysis and checking assumptions

As mentioned above, the residual is the difference between the actual and predicted values of the dependent variable: e_i = y_i − ŷ_i. Residuals are very important for assessing the validity and accuracy of a prediction model. Recall that the linear regression model we applied in Example 1 has scientific value only if it meets the following assumptions:

1. The relationship between weight and waist circumference (the independent and dependent variables, x and y) must be linear, i.e., follow a straight line;
2. The variance of y does not change with the value of x; in other words, the residuals e_i do not change systematically with x_i;
3. The dependent variable y (or, equivalently, e_i) follows a normal distribution;
4. The values of e_i are not related to one another.

To check these assumptions, the following plots can be used:

- A plot of e_i against ŷ_i, to look for outliers (values that the model does not predict accurately) and to see whether the model violates assumptions 1 and 2.
- A plot comparing the observed values of e_i with their expected values under the normal distribution, to check whether assumption 3 is met.
- A plot of e_i against x, to check whether x needs to be transformed.

However, the variance of e_i is not constant. Researchers therefore recommend using standardised residuals. The standardised residual (denoted r_i) is obtained by dividing the residual e_i by the standard deviation of the model:

$$ r_i = \frac{e_i}{s} $$

With this transformation, r_i has mean 0 and variance of approximately 1. We can use r_i to check the assumptions of the linear regression model.
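A minimal sketch of these residual checks in base R, without drawing the plots: it computes r_i = e_i / s as defined above and, for comparison, rstandard(), which divides each residual by s·√(1 − h_ii), where h_ii is the leverage of observation i (a refinement that the text does not describe). qqnorm(..., plot.it = FALSE) returns the coordinates of the normal quantile plot used for assumption 3.

```r
x <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)  # weight, kg
y <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)  # waist, cm
fit <- lm(y ~ x)

e        <- resid(fit)
s        <- summary(fit)$sigma            # s = sqrt(SSE / (n - 2))
r_simple <- e / s                         # r_i as defined in the text
r_std    <- rstandard(fit)                # e_i / (s * sqrt(1 - h_ii))

# Coordinates of the normal quantile plot (theoretical vs observed); no plot is drawn
qq <- qqnorm(r_std, plot.it = FALSE)

print(round(cbind(weight = x, e, r_simple, r_std), 2))
print(which(abs(r_std) > 2))              # candidate outliers
```

With these data only subject 15 (90 kg) has |r_std| > 2: its residual is large and, being far from the mean weight, it also has high leverage. The sample variance of r_simple is exactly (n − 2)/(n − 1) = 13/14, which is why its variance is only approximately 1.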


(To be continued)

Technical note: The following R code was used for the analysis presented above.

# Code for Figure 1
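As an illustration (not the author's code), a base-R sketch of a plot in the spirit of Figure 1, observed against predicted waist circumference, could look like this. Writing to a PDF file in tempdir(), the file name, and the dashed identity line are choices made for this sketch only.

```r
weight <- c(51, 66, 47, 54, 64, 75, 54, 52, 53, 52, 48, 46, 63, 40, 90)
waist  <- c(71, 89, 64, 74, 87, 93, 66, 74, 75, 72, 70, 66, 81, 57, 94)

fit       <- lm(waist ~ weight)
predicted <- fitted(fit)

out_file <- file.path(tempdir(), "figure1.pdf")   # hypothetical output path
pdf(out_file, width = 6, height = 5)
plot(predicted, waist,
     xlab = "Predicted waist (cm)", ylab = "Observed waist (cm)",
     main = "Observed vs predicted waist circumference", pch = 19)
abline(a = 0, b = 1, lty = 2)   # points on this line are predicted exactly
invisible(dev.off())
```

Points far from the dashed line correspond to the large differences in Table 2, such as subjects 2, 7 and 15.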