Post on 19-Dec-2015
Multiple and non-linear regression
What is what?
• Regression: One variable is considered dependent on the other(s)
• Correlation: No variables are considered dependent on the other(s)
• Multiple regression: More than one independent variable
• Linear regression: The dependent variable is scalar and linearly dependent on the independent variable(s)
• Logistic regression: The dependent variable is categorical (ideally with only two levels) and follows an S-shaped relation.
Remember the simple linear regression?
If Y is linearly dependent on X, simple linear regression is used:
$Y_j = \alpha + \beta X_j$
$\alpha$ is the intercept, the value of Y when X = 0
$\beta$ is the slope, the rate at which Y increases when X increases
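The intercept and slope of a simple linear regression can be estimated from data by least squares. The following is a minimal sketch, not taken from the slides: the data are made up, and `fit_simple_linear`, `a` and `b` are illustrative names for the fitting routine and the estimated intercept and slope.

```python
# Hypothetical data, made up for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_simple_linear(xs, ys):
    """Return (a, b): least-squares estimates of the intercept and the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx          # slope
    a = mean_y - b * mean_x      # intercept: the line passes through the means
    return a, b

a, b = fit_simple_linear(X, Y)
print(f"Y = {a:.3f} + {b:.3f} * X")
```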
Is the relation linear?
[Figure: plot with x-axis from -3 to 3 and y-axis from -4 to 12]
Multiple linear regression
If Y is linearly dependent on more than one independent variable:
$Y_j = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j}$
$\alpha$ is the intercept, the value of Y when $X_1$ and $X_2$ = 0
$\beta_1$ and $\beta_2$ are termed partial regression coefficients
$\beta_1$ expresses the change in Y for a one-unit change in $X_1$ when $X_2$ is kept constant
Multiple linear regression – residual error and estimations
As the collected data are not expected to fall exactly in a plane, an error term must be added:
$Y_j = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j} + \varepsilon_j$
The error terms sum to zero.
Estimating the dependent variable and the population parameters:
$\hat{Y}_j = a + b_1 X_{1j} + b_2 X_{2j}$
Multiple linear regression – general equations
In general, a finite number (m) of independent variables may be used to estimate the hyperplane:
$Y_j = \alpha + \sum_{i=1}^{m} \beta_i X_{ij} + \varepsilon_j$
The number of sample points must be at least two more than the number of variables.
Multiple linear regression – least sum of squares
The principle of the least sum of squares is usually used to perform the fit, that is, the estimates are chosen to minimize:
$\sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2$
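For two independent variables the least-squares fit can be written out by hand. The sketch below is an illustration under stated assumptions, not the software used for the slides: the data are made up, and `fit_two_predictors` is a hypothetical helper that solves the two normal equations on mean-centered data.

```python
# Hypothetical data, made up for illustration only.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0]
Y = [9.2, 7.9, 19.1, 17.8, 29.0]

def mean(values):
    return sum(values) / len(values)

def centered(values):
    m = mean(values)
    return [v - m for v in values]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2   # zero if X1 and X2 are perfectly correlated
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

a, b1, b2 = fit_two_predictors(X1, X2, Y)
Y_hat = [a + b1 * u + b2 * v for u, v in zip(X1, X2)]
residuals = [obs - est for obs, est in zip(Y, Y_hat)]
residual_ss = sum(r ** 2 for r in residuals)   # the minimized sum of squares

print(f"Y_hat = {a:.3f} + {b1:.3f} * X1 + {b2:.3f} * X2")
print(f"residual SS = {residual_ss:.4f}, sum of residuals = {sum(residuals):.2e}")
```

With an intercept in the model, the fitted residuals sum to zero up to rounding, which is what the last line prints.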
Multiple linear regression – An example
Multiple linear regression – The fitted equation
Multiple linear regression – Are any of the coefficients significant?
F = regression MS / residual MS
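As a small sketch of how the F ratio is formed from an ANOVA table: the sums of squares below are made up, `f_statistic` is an illustrative name, and the degrees of freedom follow the usual convention (m for the regression, n - m - 1 for the residual), which is assumed here rather than stated on the slide.

```python
def f_statistic(regression_ss, residual_ss, n, m):
    """F = regression MS / residual MS for n samples and m independent variables."""
    regression_ms = regression_ss / m          # regression DF = m
    residual_ms = residual_ss / (n - m - 1)    # residual DF = n - m - 1
    return regression_ms / residual_ms

# Hypothetical sums of squares, made up for illustration only.
F = f_statistic(regression_ss=80.0, residual_ss=20.0, n=13, m=2)
print(f"F = {F:.2f} with 2 and 10 degrees of freedom")
```

The resulting F is then compared with a critical value from an F table for the same degrees of freedom.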
Multiple linear regression – Is it a good fit?
R² = regression SS / total SS = 1 - residual SS / total SS
• Is an expression of how much of the variation can be described by the model
• When comparing models with different numbers of variables, the adjusted R-square should be used:
Ra² = 1 - residual MS / total MS
The multiple correlation coefficient: R = sqrt(R²)
The standard error of the estimate = sqrt(residual MS)
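The goodness-of-fit measures can be computed directly from the sums of squares. This is a hedged sketch with made-up numbers; `fit_summary` is an illustrative name, and it uses R² = regression SS / total SS (equivalently 1 - residual SS / total SS) with the usual degrees of freedom, n - m - 1 for the residual and n - 1 for the total.

```python
import math

def fit_summary(regression_ss, residual_ss, n, m):
    """Goodness-of-fit measures for n samples and m independent variables."""
    total_ss = regression_ss + residual_ss
    residual_ms = residual_ss / (n - m - 1)
    total_ms = total_ss / (n - 1)
    r2 = regression_ss / total_ss
    return {
        "R2": r2,
        "adjusted_R2": 1 - residual_ms / total_ms,
        "R": math.sqrt(r2),                          # multiple correlation coefficient
        "standard_error": math.sqrt(residual_ms),    # standard error of the estimate
    }

# Hypothetical sums of squares, made up for illustration only.
summary = fit_summary(regression_ss=80.0, residual_ss=20.0, n=13, m=2)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```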
Multiple linear regression – Which of the coefficients are significant?
• $s_{b_i}$ is the standard error of the regression parameter $b_i$
• The t-test tests if $b_i$ is different from 0
• $t = b_i / s_{b_i}$
• $\nu$ is the residual DF
• p values can be found in a table
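A minimal sketch of the t-test for one coefficient, with made-up values: the estimate `b_i`, its standard error `s_bi` and the sample size are hypothetical, and the critical value 2.228 is the tabulated two-tailed 5% value for 10 degrees of freedom.

```python
def t_statistic(b, s_b):
    """t = b / s_b, testing whether a regression coefficient differs from 0."""
    return b / s_b

# Hypothetical estimate and standard error, made up for illustration only.
b_i, s_bi = 1.5, 0.5
n, m = 13, 2                  # samples and independent variables (assumed)
residual_df = n - m - 1       # degrees of freedom used to look up the p value

t = t_statistic(b_i, s_bi)
critical_t = 2.228            # two-tailed 5% critical value for 10 DF, from a t table
significant = abs(t) > critical_t
print(f"t = {t:.2f}, DF = {residual_df}, significant at 5%: {significant}")
```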
Multiple linear regression – Which of the coefficients are most important?
• The standardized regression coefficient, b', is a normalized version of b:
$b'_i = b_i \sqrt{\dfrac{\sum x_i^2}{\sum y^2}}$
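The standardized coefficient rescales b by the spread of its variable relative to the spread of Y. A small sketch under assumptions: the data and the coefficient `b` are made up, `standardized_coefficient` is an illustrative name, and the sums of squares are taken over deviations from the means.

```python
import math

def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def standardized_coefficient(b, xs, ys):
    """b' = b * sqrt(sum of x^2 / sum of y^2), with x and y as deviations from the means."""
    return b * math.sqrt(sum_sq_dev(xs) / sum_sq_dev(ys))

# Hypothetical data and coefficient, made up for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 6.0, 8.0, 10.0]
b = 2.0

b_std = standardized_coefficient(b, X, Y)
print(f"b' = {b_std:.3f}")
```

Because b' is unit-free, the coefficients of variables measured on different scales can be compared directly.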
Multiple linear regression - multicollinearity
• If two factors are well correlated, the estimated b's become inaccurate.
• Also known as collinearity, intercorrelation, nonorthogonality or ill-conditioning
• Tolerance or variance inflation factors can be computed
• Extreme correlation is called singularity, and one of the correlated variables must be removed.
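For the special case of two independent variables, tolerance and the variance inflation factor follow from their correlation alone: tolerance = 1 - r² and VIF = 1 / tolerance. The sketch below assumes that two-variable case and uses made-up data; with more variables, r² would be replaced by the R² from regressing one independent variable on all the others.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical independent variables, made up for illustration only.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(X1, X2)
tolerance = 1 - r ** 2     # share of X1's variation not explained by X2
vif = 1 / tolerance        # variance inflation factor
print(f"r = {r:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

A tolerance near zero (a very large VIF) signals the near-singular situation described in the last bullet.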
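Multiple linear regression – Pairwise correlation coefficients
$r_{xy} = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$; $\sum xy = \sum (X_i - \bar{X})(Y_i - \bar{Y})$; $\sum x^2 = \sum (X_i - \bar{X})^2$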
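The pairwise coefficients can be computed for every pair of variables with the formula above. This is a minimal sketch using made-up data; the variable names `Y`, `X1` and `X2` and the helper `pearson_r` are illustrative.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y as deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data, made up for illustration only.
data = {
    "Y": [9.2, 7.9, 19.1, 17.8, 29.0],
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [2.0, 1.0, 4.0, 3.0, 6.0],
}

# One coefficient per unordered pair of variables.
pairwise = {(u, v): pearson_r(data[u], data[v]) for u, v in combinations(data, 2)}
for (u, v), r in pairwise.items():
    print(f"r({u}, {v}) = {r:.3f}")
```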
Multiple linear regression – Assumptions
The same as for simple linear regression:
1. Y's are randomly sampled
2. The residuals are normally distributed
3. The residuals have equal variance
4. The X's are fixed factors (their errors are small)
5. The X's are not perfectly correlated
Logistic regression
Logistic Regression
• What if the dependent variable is categorical, and especially binary?
• Use some interpolation method
• Linear regression cannot help us.
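The sigmoidal curve
$p = \dfrac{1}{1 + e^{-z}}$, $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
[Figure: sigmoidal curve, p against x, for $\beta_0 = 0$; $\beta_1 = 1$]
With a single independent variable X:
$p = \dfrac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}} = \dfrac{1}{1 + e^{-(\alpha + \beta X)}}$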
The sigmoidal curve
• The intercept basically just shifts the curve along the input variable
[Figure: sigmoidal curves, p against x, for $\beta_0 = 0$, $\beta_1 = 1$; $\beta_0 = 2$, $\beta_1 = 1$; $\beta_0 = -2$, $\beta_1 = 1$]
The sigmoidal curve
[Figure: sigmoidal curves, p against x, for $\beta_0 = 0$, $\beta_1 = 1$; $\beta_0 = 0$, $\beta_1 = 2$; $\beta_0 = 0$, $\beta_1 = 0.5$]
• Large regression coefficient → risk factor strongly influences the probability
The sigmoidal curve
[Figure: sigmoidal curves, p against x, for $\beta_0 = 0$, $\beta_1 = 1$; $\beta_0 = 0$, $\beta_1 = -1$]
• Positive regression coefficient → risk factor increases the probability
• Logistic regression uses maximum likelihood estimation, not least squares estimation
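The effect of the intercept and the regression coefficient on the curve can be tabulated with a few lines of code. This is a minimal sketch for a single input variable; `logistic_p` is an illustrative name, and the parameter pairs are the ones shown in the figure legends of these slides.

```python
import math

def logistic_p(x, beta0, beta1):
    """p = 1 / (1 + e^(-z)) with z = beta0 + beta1 * x (one input variable)."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Parameter pairs (beta0, beta1) taken from the figure legends.
settings = [(0, 1), (2, 1), (-2, 1), (0, 2), (0, 0.5), (0, -1)]
for beta0, beta1 in settings:
    row = [round(logistic_p(x, beta0, beta1), 3) for x in (-2, 0, 2)]
    print(f"beta0 = {beta0:>2}, beta1 = {beta1:>4}: p at x = -2, 0, 2 -> {row}")
```

Changing the intercept moves the point where p = 0.5, a larger coefficient makes the transition steeper, and a negative coefficient makes p decrease as x increases.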
Does age influence the diagnosis? Continuous independent variable
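Variables in the Equation

| Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B), lower | 95% C.I. for Exp(B), upper |
|---|---|---|---|---|---|---|---|---|---|
| 1a | Age | 0.109 | 0.010 | 108.745 | 1 | 0.000 | 1.115 | 1.092 | 1.138 |
| | Constant | -4.213 | 0.423 | 99.097 | 1 | 0.000 | 0.015 | | |

a. Variable(s) entered on step 1: Age.

$p = \dfrac{1}{1 + e^{-z}}$, $z = B_0 + B_1 \cdot \text{age}$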
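As a sketch of how the fitted coefficients are used, the code below evaluates the model for one age. The coefficients are the B values for Constant and Age from the table; the age of 50 is a hypothetical input chosen for illustration, and `probability_for_age` is an illustrative name.

```python
import math

B0 = -4.213   # Constant, from the table
B1 = 0.109    # Age, from the table

def probability_for_age(age):
    """p = 1 / (1 + e^(-z)) with z = B0 + B1 * age."""
    z = B0 + B1 * age
    return 1.0 / (1.0 + math.exp(-z))

age = 50      # hypothetical patient age
z = B0 + B1 * age
p = probability_for_age(age)
odds_ratio_per_year = math.exp(B1)   # the Exp(B) column for Age

print(f"z = {z:.3f}, p = {p:.3f}")
print(f"Exp(B) for age = {odds_ratio_per_year:.3f}")
```

Exp(B) is the factor by which the odds are multiplied for each additional year of age.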
Does previous intake of OCP influence the diagnosis? Categorical independent variable
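Variables in the Equation

| Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B), lower | 95% C.I. for Exp(B), upper |
|---|---|---|---|---|---|---|---|---|---|
| 1a | OCP(1) | -0.311 | 0.180 | 2.979 | 1 | 0.084 | 0.733 | 0.515 | 1.043 |
| | Constant | 0.233 | 0.123 | 3.583 | 1 | 0.058 | 1.263 | | |

a. Variable(s) entered on step 1: OCP.

$p = \dfrac{1}{1 + e^{-z}}$, $z = B_0 + B_1 \cdot \text{OCP}$

If OCP = 1: $p(Y=1) = \dfrac{1}{1 + e^{-(B_0 + B_1)}} = \dfrac{1}{1 + e^{-(0.233 - 0.311)}} = 0.4805$

If OCP = 0: $p(Y=1) = \dfrac{1}{1 + e^{-B_0}} = \dfrac{1}{1 + e^{-0.233}} = 0.5580$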
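The two probabilities for the categorical variable can be checked numerically. A minimal sketch, using the B values for Constant and OCP(1) from the table; `probability` is an illustrative name.

```python
import math

B0 = 0.233    # Constant, from the table
B1 = -0.311   # OCP(1), from the table

def probability(ocp):
    """p(Y=1) = 1 / (1 + e^(-z)) with z = B0 + B1 * OCP, where OCP is 0 or 1."""
    z = B0 + B1 * ocp
    return 1.0 / (1.0 + math.exp(-z))

p_ocp1 = probability(1)
p_ocp0 = probability(0)
print(f"OCP = 1: p = {p_ocp1:.4f}")
print(f"OCP = 0: p = {p_ocp0:.4f}")
```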
Odds ratio
$o = \dfrac{p}{1 - p} = e^{z}$
odds ratio $= \dfrac{e^{B_0 + B_1}}{e^{B_0}} = e^{B_0 + B_1 - B_0} = e^{B_1} = e^{-0.311} = 0.7327$
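The odds ratio can be obtained either from the two probabilities or directly from the coefficient. The sketch below, using the OCP coefficients B0 = 0.233 and B1 = -0.311 from the previous table, shows that both routes agree; the helper names are illustrative.

```python
import math

B0 = 0.233    # Constant, from the OCP table
B1 = -0.311   # OCP(1), from the OCP table

def probability(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """o = p / (1 - p)."""
    return p / (1.0 - p)

odds_ocp1 = odds(probability(B0 + B1))   # equals e^(B0 + B1)
odds_ocp0 = odds(probability(B0))        # equals e^(B0)

odds_ratio = odds_ocp1 / odds_ocp0
print(f"odds ratio from probabilities = {odds_ratio:.4f}")
print(f"e^B1                          = {math.exp(B1):.4f}")
```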
Multiple logistic regression
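Variables in the Equation

| Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B), lower | 95% C.I. for Exp(B), upper |
|---|---|---|---|---|---|---|---|---|---|
| 1a | Age | 0.123 | 0.011 | 115.343 | 1 | 0.000 | 1.131 | 1.106 | 1.157 |
| | BMI | 0.083 | 0.019 | 18.732 | 1 | 0.000 | 1.087 | 1.046 | 1.128 |
| | OCP | 0.528 | 0.219 | 5.808 | 1 | 0.016 | 1.695 | 1.104 | 2.603 |
| | Constant | -6.974 | 0.762 | 83.777 | 1 | 0.000 | 0.001 | | |

a. Variable(s) entered on step 1: Age, BMI, OCP.

$p = \dfrac{1}{1 + e^{-z}}$, $z = B_0 + B_1 \cdot \text{age} + B_2 \cdot \text{BMI} + B_3 \cdot \text{OCP}$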
Predicting the diagnosis by logistic regression
What is the probability that the tumor of a 50 year old woman who has been using OCP and has a BMI of 26 is malignant?
z = -6.974 + 0.123*50 + 0.083*26 + 0.528*1 = 1.8620
p = 1/(1+e^-1.8620) = 0.8655
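The prediction can be reproduced in a few lines. This is a sketch using the B values from the "Variables in the Equation" table (Constant -6.974, Age 0.123, BMI 0.083, OCP 0.528) and the patient described in the question; `predict_probability` is an illustrative name.

```python
import math

# Coefficients from the "Variables in the Equation" table.
COEFFICIENTS = {"constant": -6.974, "age": 0.123, "bmi": 0.083, "ocp": 0.528}

def predict_probability(age, bmi, ocp):
    """Return (z, p) for the multiple logistic model; ocp is 1 for users, 0 otherwise."""
    z = (COEFFICIENTS["constant"]
         + COEFFICIENTS["age"] * age
         + COEFFICIENTS["bmi"] * bmi
         + COEFFICIENTS["ocp"] * ocp)
    p = 1.0 / (1.0 + math.exp(-z))
    return z, p

# The patient from the question: 50 years old, BMI 26, has used OCP.
z, p = predict_probability(age=50, bmi=26, ocp=1)
print(f"z = {z:.4f}, p = {p:.4f}")
```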
Exercises
20.1, 20.2