computational intelligence sew · why linear regression? • simplest machine learning algorithm...

59
COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 2: Linear Regression Gradient Descent Non-linear basis functions

Upload: others

Post on 19-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

COMPUTATIONAL

INTELLIGENCE SEW(INTRODUCTION TO MACHINE LEARNING) SS18

Lecture 2:

• Linear Regression

• Gradient Descent

• Non-linear basis functions

Page 2: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Practical of Friday rescheduled to Monday

11:00 to 12:00 and

12:00 to 13:00

News group:

tu-graz.lv.ew

Page 3: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

LINEAR REGRESSION

MOTIVATION

Page 4: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Why Linear Regression?

• Simplest machine learning algorithm for regression• Widely used in biological, behavioural and social sciences to describe

and to extract relationships between variables from data

• Prediction of real-valued outputs

• Easy to implement, fast to execute

• Benchmark algorithm for comparison with more complex algorithms

• Introduction to notation and concepts that we will need again later in

the course• Data format, vector & matrix notation

• Learning from data by minimizing a cost function

• Gradient descent

• Non-linear features and basis functions• Preparation for neural networks

Page 5: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Applications of (linear) regression

• Brain computer interfaces

• https://www.youtube.com/watch?v=Ae6En8-eaww

• Neuroprosthetic control

• https://www.youtube.com/watch?v=X_AI4MiY6L4

Page 6: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

LINEAR REGRESSION

WITH ONE INPUT

Page 7: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

A regression problem• We want to learn to predict a person’s height based on his/her

knee height and/or arm span

• This is useful for patients who are bed bound or in a wheelchair

and cannot stand to take an accurate measurement of their height

Knee

Height

[cm]

Arm

span

[cm]

Height

[cm]

50 166 171

56 172 175

52 174 168

… … …

Page 8: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Linear regression with one input

Learning algorithm

„Hypothesis“

hx

Training set

Hypothesis

Parameters

Test input

Prediction

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

?

?

Page 9: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Example Data

Knee

height

[cm]

Arm

span

[cm]

Height

[cm]

50 166 171

56 172 175

52 174 168

… … …

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

160 165 170 175 180 185 190170

175

180

185

190

armspan

body h

eig

ht

m=30 data points

Page 10: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Example Data

4550

5560

160

180

200170

175

180

185

190

knee heightarmspan

body h

eig

ht

Knee

Height

[cm]

Arm

span

[cm]

Height

[cm]

50 166 171

56 172 175

52 174 168

… … …

Page 11: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Linear regression with one input

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

Knee

Height

[cm]

Height

[cm]

50 171

56 175

52 168

… …

HypothesisParameters ?

Which hypothesis is better?

In what sense is it better?

Page 12: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Formalization of problem

• Given m training examples

• Goal: learn parameters

such that

for all training examples i=1…30.

Knee

Height

[cm]

Height

[cm]

50 171

56 175

52 168

… …

m=30 data points

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

Page 13: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

Least Squares Objective

• Minimize Error

0.6

150

Page 14: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Least Squares Objective

• Minimize Error

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

10.77

0.6

150

cost function mean squared error

Page 15: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

Least Squares Objective

• Minimize Error

5.94

0.75

140

cost function mean squared error

Page 16: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Cost function illustrated

Properties of cost function:

• Quadratic function

• „Bowl“-shaped

• Unique local and global

minimum (under

„regular“ conditions)

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

10.77

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

5.94

Page 17: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Minimizing the cost

• Two ways to find the parameters

minimizing

• Gradient descent

• Direct analytical solution

(setting derivatives = 0)

Page 18: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Recall: Functions of multiple variables

• Example:

• Partial derivatives

• Gradients vectors formed with the partial derivatives (fundamental in lecture 2)

• Chain rule (fundamental for neural networks in lecture 4)

• Function of multiple variable with high dimensional values

• Jacobian matrix formed with the partial derivatives

Page 19: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

GRADIENT DESCENT

Page 20: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Descending in the steepest directionGradient descent on some arbitrary cost function …

Page 21: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

learning rate („eta“)

Gradient descent algorithm

• Repeat until convergence

(simultaneously updating

and )

partial derivative of

with respect to

negative gradient =

descent

Page 22: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Gradient is orthogonal to contour

lines

-2-1

01

2 -2

-1

0

1

20

0.5

1

1.5

2

2.5

3

3.5

4

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

A contour line

is a line along which

= const

Page 23: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Potential issues with gradient descent

• May get stuck in local minima

• Learning rate too small: slow

convergence

• Learning rate too large: oscillations,

divergence

too small too large

Page 24: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

LINEAR REGRESSION

WITH GRADIENT

DESCENT(ONE INPUT)

Page 25: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Application of gradient descent

• Linear regression cost • Gradient descent

(simultaneous update)

(simultaneous

update)

“error” “input”

”learning rate”

Page 26: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Predicting height from knee height

• Optimal fit to training data

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

0.8

137.4

Page 27: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

LINEAR REGRESSIONMORE GENERAL FORMULATION: MULTIPLE FEATURES

Page 28: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Multiple inputs (features)

• Notation:

… number of training examples

… number of features

… input features of i‘th training example (vector-valued)

…. value of feature j in i‘th training example

Knee

Height

x1

Arm

span

x2

Age

x3

Height

y

50 166 32 171

56 172 17 175

52 174 62 168

… … … …

= 3

=

56

172

17

= 17

Page 29: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Linear hypothesis

• Hypothesis (one input):

• Hypothesis (multiple input features):

• More compact notation:

Example: h(x) = 50 + 0.5*kneeheight + 0.3*armspan + 0.1*age

Introduce

Why? Notation convenience!

Page 30: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Multiple inputs (features) revisited

• Notation:

… number of training examples

… number of features

… input features of i‘th training example (vector-valued)

…. value of feature j in i‘th training example

x0

Knee

Height

x1

Arm

span

x2

Age

x3

Height

y

1 50 166 32 171

1 56 172 17 175

1 52 174 62 168

1 … … … …

= 3

=

1

56

172

17

= 17

= 1

Page 31: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Matrix and vector notation

x0

Knee

Height

x1

Arm

span

x2

Age

x3

Height

y

1 50 166 32 171

1 56 172 17 175

1 52 174 62 168

(n+1) ˟ 1 m ˟ (n+1) m ˟ 1

design matrixfeatures of i‘th training example output/target vector

Page 32: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Matrix and vector notation

x0

Knee

Height

x1

Arm

span

x2

Age

x3

Height

y

1 50 166 32 171

1 56 172 17 175

1 52 174 62 168

𝐻 𝜽 = 𝑋𝜽

Page 33: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

LINEAR REGRESSION

WITH GRADIENT

DESCENT(GENERAL FORMULATION)

Page 34: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Linear regression problem statement

• Hypothesis:

• Cost function:

Goal is to find parameters which minimize the cost

high-dimensional quadratic

(„bowl“-shaped) function

Page 35: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Gradient descent (multiple features)

(simultaneous

update for

j=0…n)

For j = 0: define for convenience

with one input feature:

with n input features:

(simultaneous

update)

“error”

“error”

“input”

“input””learning rate”

”learning rate”

Page 36: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

LINEAR REGRESSION

ANALYTICAL SOLUTION

Page 37: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Analytical solution

… design matrix

… output/target vector

• Set all partial derivatives of cost

function = 0

• Solving system of linear

equations yields:

Moore-Penrose Pseudoinverse of

• Note: This analytical solution requires that columns of are linearly

independent („regular“ conditions)

Page 38: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Example: analytical solution applied

to problem with one input

Knee

Height

[cm]

Height

[cm]

50 171

56 175

52 168

… …

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

Page 39: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Example: analytical solution applied

to problem with one input

Knee

Height

[cm]

Height

[cm]

50 171

56 175

52 168

… … 30 ˟ 2 30 ˟ 1

2 ˟ 2

2 ˟ 2

2 ˟ 1

Page 40: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Predicting height from knee height

45 50 55 60170

175

180

185

190

knee height

body h

eig

ht

0.8

137.4

Page 41: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Gradient descent Analytical solution

• Need to choose learning

rate

• Iterative algorithm (needs

many iterations to

converge)

• Works well even when

number of input features

is large

• No need to choose

• Direct solution (no

iteration)

• Slow if is too large

(inverting n x n matrix)

Page 42: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

NON-LINEAR FEATURES(NON-LINEAR BASIS FUNCTIONS)

Page 43: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Non-linear trends in data

x y

0.01 -0.27

-1.22 2.63

0.17 -0.13

… …

-4 -3 -2 -1 0 1 2 3-2

0

2

4

6

8

10

12

14

16

-4 -3 -2 -1 0 1 2 3-2

0

2

4

6

8

10

12

14

16

• How can we learn non-linear hypotheses?

?

? ? ?

Page 44: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Linear fit to this “non-linear” data

x y

0.01 -0.27

-1.22 2.63

0.17 -0.13

… …

standard design matrix

Hypothesis:

Optimal parameters:

Page 45: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Linear fit to this “non-linear” data

-4 -3 -2 -1 0 1 2 3-2

0

2

4

6

8

10

12

14

16

Page 46: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Non-linear (quadratic) fit

x y

0.01 -0.27

-1.22 2.63

0.17 -0.13

… …

design matrix with

non-linear features

Hypothesis:

Optimal parameters:

Page 47: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Non-linear (quadratic) fit

-4 -3 -2 -1 0 1 2 3-2

0

2

4

6

8

10

12

14

16

Page 48: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Non-linear (sinusoid) fit

x y

0.01 -0.27

-1.22 2.63

0.17 -0.13

… …

design matrix with

non-linear features

Hypothesis:

Optimal parameters:

Page 49: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Non-linear (sinusoidal) fit

-4 -3 -2 -1 0 1 2 3-2

0

2

4

6

8

10

12

14

16

Page 50: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Non-linear input features (in general)

• Feature 2 for each training example i is computed by applying a

non-linear basis function:

• Allows to learn a variety of non-linear functions with the same technique(s):• Analytical or gradient descent

all features of

1st training example

feature 2 of all training examples

Page 51: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Polynomial regression• Features are powers of x

n = degree of polynome

to be learned

n=0 n=1

n=3 n=9

What happened here?

Next lecture…

Page 52: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Radial basis functions

• „Gaussian“-shaped RBFs:• Each basis function j has a center in the input space

• The width of the basis functions is determined by

-6 -4 -2 0 2 4 6 80

0.2

0.4

0.6

0.8

1

x

Page 53: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

-6 -4 -2 0 2 4 6 80

0.2

0.4

0.6

0.8

1

x

Radial basis functions

• „Gaussian“-shaped RBFs:• Each basis function j has a center in the input space

• The width of the basis functions is determined by

Page 54: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

-6 -4 -2 0 2 4 6 80

0.2

0.4

0.6

0.8

1

x

Radial basis functions

• „Gaussian“-shaped RBFs:• Each basis function j has a center in the input space

• The width of the basis functions is determined by

Page 55: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Fitting a single RBF to data

-4 -2 0 2 4 6-2

0

2

4

6

8

10

12

14

16

RBF with

-4 -2 0 2 4 6-2

0

2

4

6

8

10

12

14

16

Page 56: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

-4 -2 0 2 4 60

0.2

0.4

0.6

0.8

1

-4 -2 0 2 4 6-15

-10

-5

0

Fitting RBFs to data

-4 -2 0 2 4 6-2

0

2

4

6

8

10

12

14

16

-4 -2 0 2 4 6-2

0

2

4

6

8

10

12

14

16

RBFs with

Page 57: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

SUMMARY (QUESTIONS)

Page 58: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

Some questions…

• Hypothesis for linear regression = ?

• Cost function for linear regression = ?

• How many local minima may the cost function for lin. reg. have (under

regular conditions)?

• Name two ways to minimize the cost function?

• General gradient descent formula?

• How is Linear regression with gradient descent solved?

• What issues can arise during gradient descent?

• What is the design matrix? What are its dimensions?

• Analytical solution for linear regression = ?• What are the components of the solution?

• Pros and Cons of gradient descent vs. analytical solution?

• How can one learn non-linear hypotheses with linear regression?

• What is polynomial regression?

• What are radial basis functions?

Page 59: COMPUTATIONAL INTELLIGENCE SEW · Why Linear Regression? • Simplest machine learning algorithm for regression • Widely used in biological, behavioural and social sciences to describe

What is next?

• Classification with Logistic Regression

• Gradient descent tricks & more advanced optimization techniques

• Underfitting & Overfitting

• Model selection (Training, Validation and test set)