linear algebra review - university of michigan · 2011-09-12 · linear algebra review part 2: ax=b...

Linear Algebra Review

Part 2: Ax=b

Edwin Olson

University of Michigan

Saturday, September 10, 11

The Three-Day Plan

• Geometry of Linear Algebra

‣ Vectors, matrices, basic operations, lines, planes, homogeneous coordinates, transformations

• Solving Linear Systems

‣ Gaussian Elimination, LU and Cholesky decomposition, over-determined systems, calculus and linear algebra, non-linear least squares, regression

• The Spectral Story

‣ Eigensystems, singular value decomposition, principal component analysis, spectral clustering


Linear Systems

• System of simultaneous equations

‣ Can be interpreted as intersection of hyperplanes

‣ Left: normal directions of the hyperplanes

- Do they intersect at a point?

‣ Right: where do they intersect?


The classic approach

• Eliminate variables by adding/subtracting multiples of equations

Solve using Gaussian Elimination

x=[2 -1 1]’

Note upper-triangular form.
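The elimination procedure can be sketched in a few lines of Python. The system on the slide did not survive extraction, so the matrix `A` and right-hand side `b` below are hypothetical, chosen only so that the solution matches x=[2 -1 1]’. The function name `gaussian_elimination` and the partial-pivoting step (row swaps for numerical safety) are additions of this sketch, not something the slide prescribes.

```python
def gaussian_elimination(A, b):
    """Solve Ax = b by forward elimination followed by back substitution."""
    n = len(A)
    # Work on an augmented copy [A | b] so the inputs are left untouched.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        M[col], M[pivot] = M[pivot], M[col]

        # Subtract multiples of the pivot row to zero out the entries below it.
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]

    # M is now upper triangular; solve from the last row upward.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - known) / M[row][row]
    return x


# Hypothetical system (not the one from the slide), built so that x = [2, -1, 1].
A = [[1, 1, 1],
     [2, -1, 1],
     [1, 2, -1]]
b = [2, 6, -1]
x = gaussian_elimination(A, b)
print(x)
```

After the elimination loop, the left part of `M` is in the upper-triangular form the slide points out, which is what makes the final back-substitution pass cheap.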


LU Decomposition

• Factor matrix into product of lower triangular and upper triangular matrix

‣ We have 12 degrees of freedom but A only has 9 degrees of freedom. Let’s set the diagonal elements of L to 1.


LU Decomposition

• Why is this factorization useful?

• The last two steps are trivial... Only the LU step is hard.
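The steps the slide refers to are presumably: factor A=LU, solve Ly=b by forward substitution, then solve Ux=y by back substitution. A minimal Python sketch of that pipeline follows, assuming a Doolittle-style factorization (ones on the diagonal of L, as chosen on the previous slide) with no row swapping. The function names and the 3x3 matrix are hypothetical, picked so the factors come out as small integers.

```python
def lu_decompose(A):
    """Doolittle LU without pivoting: A = L U, with ones on the diagonal of L."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of U.
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if abs(U[i][i]) < 1e-12:
            raise ValueError("zero pivot: this sketch does no row swapping")
        # Fill column i of L below the diagonal.
        for r in range(i + 1, n):
            L[r][i] = (A[r][i] - sum(L[r][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def forward_substitute(L, b):
    """Solve L y = b for lower-triangular L, top row first."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def back_substitute(U, y):
    """Solve U x = y for upper-triangular U, bottom row first."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


# Hypothetical example matrix (not from the slides).
A = [[2, 1, 1],
     [4, 3, 3],
     [8, 7, 9]]
L, U = lu_decompose(A)  # the expensive step, done once

# The same factors are reused for several right-hand sides.
x1 = back_substitute(U, forward_substitute(L, [7, 19, 49]))
x2 = back_substitute(U, forward_substitute(L, [2, 4, 8]))
print(x1, x2)
```

The factorization costs on the order of n^3 operations, while each pair of triangular solves costs on the order of n^2, so reusing `L` and `U` pays off whenever several right-hand sides share the same A.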


Checkpoint

• Use LU decomposition to solve

• Ax=b

• A=[1 3 ; 2 8 ]

• b=[2 -6]

• x = [17 -5]


Over-determined systems

• Is there an (exact) solution to this 3x2 system? [No.]

• Is it ever possible for a 3x2 system to have an exact solution?

‣ What does this imply about the hyperplane geometry? [Some of them don’t thin down the solution space.]


Over-determined Systems

• Derive the least-squares solution for Ax=b

• Given some x, what is our error on each row?

• Minimize the sum of squared errors

Ax-b

how to take derivatives of x’A and x’Ax...

show 2x2 example worked out

Only do this algebraically on this slide... save example for next slide
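The algebra outlined above can be written out as follows. This is a sketch of the standard derivation rather than a transcript of the slide; the symbols e (the per-row error vector) and E (the sum of squared errors) are names introduced here. It uses the identities \partial(x^T c)/\partial x = c and \partial(x^T M x)/\partial x = 2Mx for symmetric M.

```latex
\begin{aligned}
e &= Ax - b \\
E(x) = e^T e &= (Ax-b)^T(Ax-b) = x^T A^T A x - 2\,x^T A^T b + b^T b \\
\frac{\partial E}{\partial x} &= 2A^TAx - 2A^Tb = 0 \\
A^TAx &= A^Tb \quad\Rightarrow\quad x = (A^TA)^{-1}A^Tb
\end{aligned}
```

The last step assumes A^TA is invertible, which holds when the columns of A are linearly independent.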


Checkpoint

step one: everyone arrive at the expression x=(A^TA)^{-1}A^Tb

step two: LU decomposition again.

x=[1 -1]


Geometric Intuition

• Let’s think about it in three dimensions:

‣ We have three ingredients (vectors) that we can mix together in order to get as close as possible to b.

‣ What is the right amount to move in each direction?

• Let’s project the problem so that our variables are the distances to move in each direction.

‣ What are our new directions? The columns of A.

- A’Ax=A’b

• This is called the normal equation. Why?

‣ We project b into the column space of A.

‣ Any component of b that is perpendicular (normal) to the columns of A is zeroed out when we multiply by A’.

• The resulting equation finds the best distance to travel for each column of A such that the remaining error is normal to all of our columns.

• Why is this the same as the least-squares solution?
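One way to see the connection is to write the orthogonality condition out. Here r (the remaining error) and a_i (the i-th column of A) are names introduced for this sketch.

```latex
\begin{aligned}
r &= b - Ax \\
a_i^T r &= 0 \quad \text{for every column } a_i \text{ of } A \\
A^T(b - Ax) &= 0 \;\Longleftrightarrow\; A^TAx = A^Tb
\end{aligned}
```

This is the same condition obtained by setting the derivative of the sum of squared errors to zero, so requiring the remaining error to be normal to every column and minimizing the squared error pick out the same x.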


Symmetric Positive Definite (SPD) Matrices

• With non-linear least squares, we see matrices of the form A’A

• These matrices are symmetric.

‣ (Prove it!)

• They’re also positive semi-definite

‣ (We haven’t defined this yet, but we’ll be able to show it next lecture easily using SVD)
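Assuming the matrices in question have the form A^TA, as in the normal equations, both properties follow from a short direct argument that does not need the SVD. This is offered as one possible proof, not necessarily the one intended for the lecture.

```latex
\begin{aligned}
(A^TA)^T &= A^T (A^T)^T = A^TA \\
x^T(A^TA)\,x &= (Ax)^T(Ax) = \|Ax\|^2 \ge 0 \quad \text{for all } x
\end{aligned}
```

The inequality is strict for every nonzero x exactly when the columns of A are linearly independent, in which case A^TA is positive definite rather than only semi-definite.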


Cholesky Decomposition

• Definition: A = LL’

• Similar to LU decomposition, but U=L’

‣ Exists for SPD matrices

• Advantages over LU decomposition:

‣ About twice as fast, half as much memory.

L=\left[\begin{array}{cc}2\sqrt{2} & 0 \\1/\sqrt{2} & \sqrt{4.5}\end{array}\right]
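A minimal Python sketch of the factorization follows. The input matrix did not survive on the slide, so `A = [[8, 2], [2, 5]]` below is an assumption, reconstructed by multiplying the L shown above by its transpose. The function name `cholesky` is likewise just a label for this sketch.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Part of A[i][j] already explained by the columns of L to the left.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


# Assumed input: the product L L^T of the factor shown above.
A = [[8, 2],
     [2, 5]]
L = cholesky(A)
print(L)
```

Only the lower triangle is computed and stored, which is where the savings in time and memory relative to LU come from.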


Least squares regression

• Estimate a continuous-valued quantity in terms of a number of features

• Example: AAPL stock price

‣ Features:

- Number of news articles about upcoming products

- Last quarter’s revenue

- Cash on hand

- Whether Steve Jobs is CEO

• Example: Movie rating predictions

‣ Features:

- How much did the user like other movies?

- How much did other users like this movie?


Fitting Lines

• Let’s start with the linear case:

• Which line is best?


Minimize Prediction Error

• What else could we minimize?


Minimize distance?

• This makes sense too. Which one should we minimize?

‣ Depends on the nature of the error.

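[Figure: a line through point p with unit normal n; e_i is the perpendicular distance from data point i to the line]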

n is the unit normal to the line

p is any point on the line
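The two candidate objectives can be written side by side. The equations from the slides were lost, so this is a reconstruction from the definitions of n and p above; q_i = (x_i, y_i) is a name introduced here for the i-th data point, and the line model y = mx + b is assumed for the prediction-error case.

```latex
\begin{aligned}
\text{prediction error:}\quad & e_i = y_i - (m x_i + b), \qquad \min_{m,\,b} \sum_i e_i^2 \\
\text{distance to the line:}\quad & e_i = n^T(q_i - p), \qquad \min_{n,\,p} \sum_i e_i^2 \ \text{ subject to } \|n\| = 1
\end{aligned}
```

The first measures error only along the y axis, which suits data where x is known exactly and y is noisy. The second measures error perpendicular to the line, which suits data where both coordinates are noisy.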


Fitting a line

• In 2D, suppose the hyperplane (“line”) goes through:

‣ (1,1)

‣ (2,2)

‣ (3,2)

• Model: y = mx + b

‣ Other models require other tools...

• How do we formulate our problem into an Ax=b problem?
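One possible formulation: each point contributes a row [x_i, 1] to A and its y_i to the right-hand side, with the unknown vector [m, b]. The sketch below then solves the normal equations for the three points above. To avoid clashing with the intercept b, the right-hand side is called `y` here; that name, `AtA`, and `Aty` are choices of this sketch.

```python
points = [(1, 1), (2, 2), (3, 2)]

# Each point gives one row [x_i, 1] of A and one entry y_i of the right-hand side.
A = [[x, 1.0] for x, _ in points]
y = [float(yi) for _, yi in points]

# Normal equations (A^T A) p = A^T y for the unknowns p = [m, b].
AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(2)]

# A 2x2 system is small enough to solve directly with Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
m = (Aty[0] * AtA[1][1] - AtA[0][1] * Aty[1]) / det
b = (AtA[0][0] * Aty[1] - AtA[1][0] * Aty[0]) / det

print(f"y = {m:.3f} x + {b:.3f}")  # prints: y = 0.500 x + 0.667
```

The three points are not collinear, so no line passes through all of them; the fitted line is the one that minimizes the sum of squared vertical errors.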


Non-linear regression

• Model:

‣ y = ax^2 + bx + c

Data (x_i, y_i): (0,3), (1,1), (2,0), (3,0), (4,3)

Augment x vector
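Although the model is non-linear in x, it is still linear in the unknowns a, b, c, so the same machinery applies once each x_i is augmented to the row [x_i^2, x_i, 1]. The sketch below fits the five data points this way. The helper `solve` (Gaussian elimination with partial pivoting) and the variable names are assumptions of this example, not part of the slides.

```python
data = [(0, 3), (1, 1), (2, 0), (3, 0), (4, 3)]

# Augment each x_i into a row [x_i^2, x_i, 1]; the unknowns are [a, b, c].
A = [[float(x * x), float(x), 1.0] for x, _ in data]
y = [float(yi) for _, yi in data]


def solve(M, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


# Normal equations (A^T A) p = A^T y, exactly as in the straight-line case.
n = 3
AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(n)]
a, b, c = solve(AtA, Aty)

print(f"y = {a:.4f} x^2 + {b:.4f} x + {c:.4f}")
```

Any model that is a linear combination of fixed functions of x can be fitted the same way by adding one column per function to A.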

