Linear Regression (Part II)
nanopoulos@ismll.de
11/3/2008
Outline
• Generalize the linear regression
• Simple multiple regression
• Regularization
• Bias vs. Variance tradeoff
• Model selection
Linear regression in D dimensions
From D = 1:  y(x, w) = w₀ + w₁x
To D > 1, linear in x (multiple regression):  y(x, w) = w₀ + w₁x₁ + … + w_D x_D
To D > 1, with any set of nonlinear functions of x
All of these models remain linear in w.

Linear basis function models:  y(x, w) = ∑ⱼ wⱼ φⱼ(x)
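As a sketch of the general form, a model linear in w can be built from any nonlinear basis; the Gaussian basis, its width, and the toy data below are illustrative choices, not from the slides:

```python
import numpy as np

# Linear basis function model: y(x, w) = sum_j w_j * phi_j(x).
def gaussian_basis(x, centers, s=0.1):
    """Design matrix with a bias column plus Gaussian bumps phi_j(x)."""
    Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), Phi])

x = np.linspace(0, 1, 25)
t = np.sin(2 * np.pi * x)                       # noise-free toy targets
Phi = gaussian_basis(x, centers=np.linspace(0, 1, 9))

# The model stays linear in w even though phi_j(x) is nonlinear in x,
# so ordinary least squares still applies.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
y_hat = Phi @ w
```

The same fitting machinery works for any basis (polynomials, sigmoids, splines); only the design matrix changes.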
Example of basis functions transformation
Maximum likelihood and least squares
Assuming normally distributed noise, t = y(x, w) + ε with ε ~ N(0, β⁻¹), the likelihood of observing the N target values t from the set X of D-dimensional predictors is
p(t | X, w, β) = ∏ₙ N(tₙ | wᵀφ(xₙ), β⁻¹)
Maximizing the log of the likelihood with respect to w…
…is the same as minimizing the sum-of-squares error
E_D(w) = ½ ∑ₙ (tₙ − wᵀφ(xₙ))²
Solving the MLE
Find the optimal w…
Setting the gradient of E_D(w) to zero gives the normal equations, solved via the Moore-Penrose pseudo-inverse:
ŵ_ML = (ΦᵀΦ)⁻¹ Φᵀ t = Φ⁺ t
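The pseudo-inverse solution can be checked numerically; the toy sine data and degree-3 polynomial basis below are assumptions for illustration:

```python
import numpy as np

# Toy data: noisy samples of a sine curve (illustrative, not from the slides).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=20)

# Design matrix Phi with polynomial basis phi_j(x) = x^j, j = 0..3.
Phi = np.vander(x, 4, increasing=True)

# Maximum-likelihood weights via the Moore-Penrose pseudo-inverse.
w_ml = np.linalg.pinv(Phi) @ t

# Equivalent least squares solve (numerically preferable in practice).
w_ls, *_ = np.linalg.lstsq(Phi, t, rcond=None)
```

Both routes give the same ŵ; `lstsq` avoids forming the pseudo-inverse explicitly.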
Outline
• Generalize the linear regression
• Simple multiple regression
• Regularization
• Bias vs. Variance tradeoff
• Model selection
The case of simple multiple regression
Y = w₀ + w₁X₁ + … + w_D X_D = w₀ + ∑ᵢ wᵢXᵢ
The coefficients w₀, w₁, …, w_D form the parameter vector w.
Least squares estimate
In matrix form, the least squares estimate minimizes ‖t − Xw‖² and is obtained from the normal equations:
ŵ = (XᵀX)⁻¹ Xᵀ t
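A small numeric check of the estimate ŵ = (XᵀX)⁻¹Xᵀt; the data matrix below is made up for illustration:

```python
import numpy as np

# Hypothetical data: 5 observations, intercept column plus two predictors.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [1.0, 3.0, 0.0],
              [1.0, 2.0, 2.0]])
t = np.array([5.0, 8.0, 5.0, 4.0, 7.0])

# Least squares estimate from the normal equations: (X^T X) w = X^T t.
w_hat = np.linalg.solve(X.T @ X, X.T @ t)

# Residuals of the fit.
residuals = t - X @ w_hat
```

A defining property of the least squares fit is that the residuals are orthogonal to every column of X.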
Example
(a numerical data set is given on the slide)
Solution
(the normal equations are set up and solved on the slide)
Solution (cont.)
(the resulting estimate ŵ is shown on the slide)
What does it look like?
(plot of the fitted regression surface)
Outline
• Generalize the linear regression
• Simple multiple regression
• Regularization
• Bias vs. Variance tradeoff
• Model selection
Regularization
Recall the case of
• a single predictor
• and polynomial basis functions φⱼ(x) = xʲ
1st order polynomial
3rd order polynomial
9th order polynomial
(fits of increasing degree to the same data; the 9th order fit passes through every point but oscillates wildly)
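The effect of the polynomial degree can be reproduced with a short simulation; the noisy sin(2πx) data set is an assumed stand-in for the one on the slides:

```python
import numpy as np

# N = 10 noisy samples of sin(2*pi*x), plus a separate test set.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=100)

def fit_poly(degree):
    """Least squares fit with polynomial basis phi_j(x) = x^j."""
    Phi = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def rms_error(w, xs, ts):
    """RMS error of the fitted polynomial w on data (xs, ts)."""
    Phi = np.vander(xs, len(w), increasing=True)
    return np.sqrt(np.mean((Phi @ w - ts) ** 2))

for M in (1, 3, 9):
    w = fit_poly(M)
    print(M, rms_error(w, x, t), rms_error(w, x_test, t_test))
```

With M = 9 and 10 points the fit interpolates, so training error vanishes while test error blows up.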
Over-fitting
Root-mean-square (RMS) error:  E_RMS = √(2E(w★)/N)
Training error keeps falling as M grows, but test error rises once the model starts to over-fit.
Polynomial coefficients
(table: the fitted coefficients grow rapidly in magnitude as M increases)
Data set size: 9th order polynomial
(shown for two increasingly large data sets: with more data, the same 9th order model over-fits less)
Regularization
Penalize large coefficient values by adding a weight penalty to the error function:
Ẽ(w) = ½ ∑ₙ (tₙ − wᵀφ(xₙ))² + (λ/2)‖w‖²
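The penalized error has the closed-form minimizer ŵ = (λI + ΦᵀΦ)⁻¹Φᵀt. A sketch, with an illustrative λ and assumed toy data:

```python
import numpy as np

# Noisy sine samples and a 9th order polynomial basis (illustrative setup).
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)
Phi = np.vander(x, 10, increasing=True)

def ridge_fit(lam):
    """Regularized least squares: w = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
    return np.linalg.solve(lam * np.eye(Phi.shape[1]) + Phi.T @ Phi, Phi.T @ t)

w_unreg, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # lambda = 0
w_reg = ridge_fit(1e-8)                            # small illustrative lambda
```

Even a tiny λ shrinks the coefficient vector sharply, taming the wild 9th order fit.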
Regularization: fit of the 9th order polynomial for a small value of λ
Regularization: the same fit for a much larger λ (over-regularized, the model under-fits)
Regularization: small vs. large λ — training and test E_RMS as a function of λ
Polynomial coefficients
(with regularization, the coefficient magnitudes stay small)
Outline
• Generalize the linear regression
• Simple multiple regression
• Regularization
• Bias vs. Variance tradeoff
• Model selection
Bias vs. Variance
The expected squared loss decomposes into three terms:
expected loss = (bias)² + variance + noise
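The decomposition can be estimated by Monte Carlo: fit many data sets drawn from a known true function and measure how the average prediction deviates (bias) and how individual fits scatter (variance). The true function, noise level, and degrees below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 10)         # fixed design points
x_grid = np.linspace(0.05, 0.95, 50)    # where bias/variance are measured
n_datasets, sigma = 500, 0.2

stats = {}
for M in (1, 9):
    preds = np.empty((n_datasets, x_grid.size))
    for d in range(n_datasets):
        # Fresh noisy data set from the same true function.
        t = true_f(x_train) + rng.normal(0, sigma, x_train.size)
        w, *_ = np.linalg.lstsq(np.vander(x_train, M + 1, increasing=True),
                                t, rcond=None)
        preds[d] = np.vander(x_grid, M + 1, increasing=True) @ w
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    stats[M] = (bias2, variance)
```

The rigid degree-1 model shows high bias and low variance; the flexible degree-9 model shows the reverse.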
How does this apply to regression?
For regression under squared loss, rigid models (small M, large λ) have high bias and low variance, while flexible models (large M, small λ) have low bias and high variance; the best model balances the two.
Outline
• Generalize the linear regression
• Simple multiple regression
• Regularization
• Bias vs. Variance tradeoff
• Model selection
Model Selection
M (the degree of the polynomial) corresponds to the complexity of the model.
The larger M, the greater the risk of over-fitting.
Regularization (λ) helps by controlling the effective complexity.
But how do we find good combinations of M and λ?
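A standard answer is cross-validation over a grid of (M, λ) pairs; a sketch with assumed synthetic data (the grid values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)

def cv_error(M, lam, K=5):
    """Mean held-out squared error of a degree-M polynomial ridge fit."""
    folds = np.array_split(rng.permutation(len(x)), K)
    errs = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        Phi_tr = np.vander(x[train], M + 1, increasing=True)
        w = np.linalg.solve(lam * np.eye(M + 1) + Phi_tr.T @ Phi_tr,
                            Phi_tr.T @ t[train])
        Phi_te = np.vander(x[test], M + 1, increasing=True)
        errs.append(np.mean((Phi_te @ w - t[test]) ** 2))
    return float(np.mean(errs))

# Pick the (M, lambda) pair with the lowest cross-validated error.
grid = [(M, lam) for M in range(1, 10) for lam in (0.0, 1e-6, 1e-3, 1e-1)]
best_M, best_lam = min(grid, key=lambda p: cv_error(*p))
```

The held-out error proxies the test error, so the selected pair guards against over-fitting the training set.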
Information criteria
p(D|w_ML) is the likelihood; M is the number of parameters.
• AIC (Akaike Information Criterion)
Choose the model that maximizes ln p(D|w_ML) − M
Tends to favor overly simple models.
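For the Gaussian linear model, the criterion can be computed directly from the residuals; a sketch with assumed synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)

def aic(M):
    """ln p(D|w_ML) - M for a degree-M polynomial with Gaussian noise."""
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    resid = t - Phi @ w
    n = len(t)
    beta = n / np.sum(resid ** 2)              # ML estimate of noise precision
    loglik = (0.5 * n * (np.log(beta) - np.log(2 * np.pi))
              - 0.5 * beta * np.sum(resid ** 2))
    return loglik - (M + 1)                    # penalty: number of parameters

# Score each candidate degree and keep the maximizer.
scores = {M: aic(M) for M in range(1, 10)}
best_M = max(scores, key=scores.get)
```

Higher degrees raise the likelihood but pay a per-parameter penalty, so the criterion trades fit against complexity.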