linear algebra review - university of michigan · 2011-09-12 · linear algebra review part 2: ax=b...
TRANSCRIPT
Linear Algebra Review
Part 2: Ax=b
Edwin Olson
University of Michigan
Saturday, September 10, 11
The Three-Day Plan
• Geometry of Linear Algebra
‣ Vectors, matrices, basic operations, lines, planes, homogeneous coordinates, transformations
• Solving Linear Systems
‣ Gaussian Elimination, LU and Cholesky decomposition, over-determined systems, calculus and linear algebra, non-linear least squares, regression
• The Spectral Story
‣ Eigensystems, singular value decomposition, principal component analysis, spectral clustering
Linear Systems
• System of simultaneous equations
‣ Can be interpreted as the intersection of hyperplanes
‣ Left: normal directions of the hyperplanes
- Do they intersect at a point?
‣ Right: where do they intersect?
The classic approach
• Eliminate variables by adding/subtracting multiples of equations
Solve using Gaussian Elimination
x=[2 -1 1]’
Note upper-triangular form.
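The elimination procedure can be sketched in a few lines of Python. The slide's own system is not in the transcript, so the matrix `A` and vector `b` below are hypothetical, chosen only so that the solution comes out as [2 -1 1]. The function name `gaussian_solve` and the use of partial pivoting are also assumptions made for illustration.

```python
from fractions import Fraction

def gaussian_solve(A, b):
    """Solve Ax=b by elimination to upper-triangular form, then back substitution."""
    n = len(A)
    # Augmented matrix [A | b], using exact rational arithmetic.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Subtract multiples of the pivot row to zero the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    # M is now upper triangular; solve from the last row upward.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical system (not from the slides), built to have solution [2, -1, 1].
A = [[1, 1, 1],
     [2, -1, 1],
     [1, 2, -1]]
b = [2, 6, -1]
x = gaussian_solve(A, b)
print([str(v) for v in x])
```

Exact fractions are used so the result can be compared without rounding concerns; a floating-point version would follow the same steps.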
LU Decomposition
• Factor a matrix into the product of a lower triangular and an upper triangular matrix
‣ For a 3x3 matrix, L and U together have 12 degrees of freedom (6 each), but A only has 9. Let’s set the diagonal elements of L to 1, which removes the 3 extra ones.
LU Decomposition
• Why is this factorization useful?
• The last two steps are trivial... Only the LU step is hard.
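The slide's equations did not survive in the transcript. The standard way LU factorization is used to solve Ax=b is sketched below; the intermediate vector y is a name introduced here for illustration.

```latex
\begin{aligned}
A &= LU && \text{(factor; the expensive step)} \\
Ax = b \;&\Rightarrow\; L(Ux) = b \\
Ly &= b && \text{(forward substitution, with } y = Ux\text{)} \\
Ux &= y && \text{(back substitution)}
\end{aligned}
```

Both substitutions only touch a triangular matrix, which is why they are cheap compared with the factorization itself. The same L and U can be reused for any new right-hand side b.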
Checkpoint
• Use LU decomposition to solve
• Ax=b
• A=[1 3 ; 2 8 ]
• b=[2 -6]
• x = [17 -5]
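A minimal sketch of the checkpoint in Python, assuming the Doolittle convention (ones on the diagonal of L) and no row exchanges. The helper names `lu_decompose`, `forward_substitute`, and `back_substitute` are made up for this example. Substituting the result back into Ax=b, as done at the end, is a quick way to check any candidate answer.

```python
from fractions import Fraction

def lu_decompose(A):
    """Doolittle LU without pivoting: A = L U, with ones on the diagonal of L."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(v) for v in row] for row in A]
    for col in range(n):
        if U[col][col] == 0:
            raise ValueError("zero pivot; this sketch does no row exchanges")
        for r in range(col + 1, n):
            # The elimination multiplier is exactly the entry of L.
            L[r][col] = U[r][col] / U[col][col]
            U[r] = [u - L[r][col] * p for u, p in zip(U[r], U[col])]
    return L, U

def forward_substitute(L, b):
    """Solve L y = b from the first row down."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y

def back_substitute(U, y):
    """Solve U x = y from the last row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[1, 3], [2, 8]]
b = [2, -6]
L, U = lu_decompose(A)
y = forward_substitute(L, b)
x = back_substitute(U, y)
# Residual Ax - b; all zeros means x solves the system exactly.
residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
print("x =", [str(v) for v in x], "residual =", [str(r) for r in residual])
```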
Over-determined systems
• Is there an (exact) solution to this 3x2 system? [No.]
• Is it ever possible for a 3x2 system to have an exact solution?
‣ What does this imply about the hyperplane geometry? [Some of them don’t thin down the solution space.]
Over-determined Systems
• Derive the least-squares solution for Ax=b
• Given some x, what is our error on each row? [Ax-b]
• Minimize the sum of squared errors
[Notes: how to take derivatives of x’A and x’Ax... Show 2x2 example worked out. Only do this algebraically on this slide... save example for next slide.]
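The algebra is not in the transcript. A standard version of the derivation is sketched below, using the error Ax-b from this slide; the symbols e and E are names introduced here, and the derivative identities used are that the gradient of c^T x is c and the gradient of x^T M x is 2Mx for symmetric M.

```latex
\begin{aligned}
e &= Ax - b \\
E(x) &= e^T e = (Ax-b)^T (Ax-b) = x^T A^T A x - 2\, b^T A x + b^T b \\
\frac{\partial E}{\partial x} &= 2 A^T A x - 2 A^T b = 0 \\
A^T A x &= A^T b \\
x &= (A^T A)^{-1} A^T b \quad \text{(when } A^T A \text{ is invertible)}
\end{aligned}
```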
Checkpoint
step one: everyone arrive at the expression below
x=(A^TA)^{-1}A^Tb
step two: LU decomposition again.
x=[1 -1]
Geometric Intuition
• Let’s think about it in 3 dimensions:
‣ We have three ingredients (vectors) that we can mix together in order to get as close as possible to b.
‣ What is the right amount to move in each direction?
• Let’s project the problem so that our variables are the distances to move in each direction.
‣ What are our new directions? The columns of A.
- A’Ax=A’b
• This is called the normal equation. Why?
‣ We project b into the column space of A.
‣ Any component of b that is perpendicular (normal) to the columns of A is mapped to zero by A’.
• The resulting equation finds the best distance to travel for each column of A such that the remaining error is normal to all of our columns.
• Why is this the same as the least-squares solution?
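The claim that the remaining error is normal to every column of A can be checked numerically. The sketch below uses a hypothetical 3x2 system (not from the slides) and a made-up helper `normal_equations_2col`, which solves the 2x2 normal equation with Cramer's rule.

```python
from fractions import Fraction

def normal_equations_2col(A, b):
    """Solve A'A x = A'b for a matrix A with exactly two columns."""
    a1 = [Fraction(row[0]) for row in A]
    a2 = [Fraction(row[1]) for row in A]
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    # Entries of the symmetric 2x2 matrix A'A and of the vector A'b.
    s11, s12, s22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    t1, t2 = dot(a1, b), dot(a2, b)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("columns of A are linearly dependent")
    # Cramer's rule on the 2x2 system.
    return [(t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det]

# Hypothetical over-determined system: three equations, two unknowns.
A = [[1, 0],
     [0, 1],
     [1, 1]]
b = [1, 1, 0]
x = normal_equations_2col(A, b)

# Remaining error after traveling x[0] along column 1 and x[1] along column 2.
r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
# Dot product of the error with each column of A, i.e. A'r.
At_r = [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(2)]
print("x =", [str(v) for v in x], " A'r =", [str(v) for v in At_r])
```

The error r itself is not zero here, since the system has no exact solution, but its projection onto each column of A is.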
Symmetric Positive Definite (SPD) Matrices
• With non-linear least squares, we see matrices of the form A’A
• These matrices are symmetric.
‣ (Prove it!)
• They’re also positive semi-definite
‣ (We haven’t defined this yet, but we’ll be able to show it next lecture easily using SVD)
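Assuming the matrices meant here have the form A^T A, as in the normal equation, both properties follow in one line each. This is a sketch of a direct argument that does not need the SVD; it is not necessarily the proof the lecture intends.

```latex
\begin{aligned}
\text{Symmetric:} \quad & (A^T A)^T = A^T (A^T)^T = A^T A \\
\text{Positive semi-definite:} \quad & x^T (A^T A)\, x = (Ax)^T (Ax) = \lVert Ax \rVert^2 \ge 0 \quad \text{for every } x
\end{aligned}
```

The inequality is strict for all nonzero x exactly when Ax = 0 only for x = 0, that is, when the columns of A are linearly independent. In that case A^T A is positive definite.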
Cholesky Decomposition
• Definition: A = LL’, where L is lower triangular
• Similar to LU decomposition, but U=L’
‣ Exists for SPD matrices
• Advantages over LU decomposition:
‣ About twice as fast, half as much memory.
L=\left[\begin{array}{cc}2\sqrt{2} & 0 \\1/\sqrt{2} & \sqrt{4.5}\end{array}\right]
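The factor L shown above can be reproduced with a short routine. The slide's input matrix is missing from the transcript; multiplying out L L' for that L gives [[8, 2], [2, 5]], so that matrix is assumed here. The function name `cholesky` is chosen for this sketch.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L' for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Contribution of the already-computed columns to entry (i, j).
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

# Assumed input, inferred from the L on the slide (not stated in the transcript).
A = [[8.0, 2.0],
     [2.0, 5.0]]
L = cholesky(A)
print(L)
```

Only the lower triangle of A is read and only one triangular factor is stored, which is where the savings over LU come from.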
Least squares regression
• Estimate a continuous-valued quantity in terms of a number of features
• Example: AAPL stock price
‣ Features:
- Number of news articles about upcoming products
- Last quarter’s revenue
- Cash on hand
- Whether Steve Jobs is CEO
• Example: Movie rating predictions
‣ Features:
- How much did the user like other movies?
- How much did other users like this movie?
Fitting Lines
• Let’s start with the linear case:
• Which line is best?
Minimize Prediction Error
• What else could we minimize?
Minimize distance?
• This makes sense too. Which one should we minimize?
‣ Depends on the nature of the error.
[Figure labels: p, e_i]
‣ n is the unit normal to the line
‣ p is any point on the line
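The slide's formula is not in the transcript. With n and p defined as above, the usual expression for the signed perpendicular distance from a data point to the line is sketched below; q_i is a name introduced here for the i-th data point.

```latex
\begin{aligned}
e_i &= n^T (q_i - p) \\
\text{minimize} \quad & \sum_i e_i^2 \quad \text{subject to } \lVert n \rVert = 1
\end{aligned}
```

Unlike the prediction error, which measures only the vertical gap in y, this error treats both coordinates of each data point as uncertain.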
Fitting a line
• In 2D, suppose the hyperplane (“line”) goes through:
‣ (1,1)
‣ (2,2)
‣ (3,2)
• Model: y = mx + b
‣ Other models require other tools...
• How do we formulate our problem into an Ax=b problem?
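One way to set this up, sketched in Python: each data point contributes a row [x_i, 1] to A, the unknown vector is [m, b], and the right-hand side holds the y_i. The three points do not lie on one line, so the system is over-determined and is solved here through the normal equation A'Ax = A'b. Variable names such as `sxx` and `rhs` are chosen for this example.

```python
from fractions import Fraction

points = [(1, 1), (2, 2), (3, 2)]

# Row i of A is [x_i, 1]; the unknowns are [m, b]; rhs holds the y_i.
A = [[Fraction(x), Fraction(1)] for x, _ in points]
rhs = [Fraction(y) for _, y in points]

# Entries of A'A (a symmetric 2x2 matrix) and of A'rhs.
sxx = sum(row[0] * row[0] for row in A)
sx = sum(row[0] * row[1] for row in A)
n = sum(row[1] * row[1] for row in A)
sxy = sum(row[0] * y for row, y in zip(A, rhs))
sy = sum(row[1] * y for row, y in zip(A, rhs))

# Solve the 2x2 normal equation with Cramer's rule.
det = sxx * n - sx * sx
m = (sxy * n - sx * sy) / det
b = (sxx * sy - sx * sxy) / det
print("m =", m, " b =", b)
```

Note that b in the model y = mx + b is the intercept, not the right-hand-side vector of Ax=b; the code calls the latter `rhs` to keep the two apart.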
Non-linear regression
• Model:
‣ y = ax^2 + bx + c
Data (xi, yi):
(0,3), (1,1), (2,0), (3,0), (4,3)
Augment x vector
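A sketch of what augmenting the x vector can look like: each data point contributes a row [x_i^2, x_i, 1], so the model stays linear in the unknowns a, b, c even though it is quadratic in x. The small solver `solve` is a helper written for this example, and solving through the normal equation is an assumption about the intended method.

```python
from fractions import Fraction

def solve(M, v):
    """Gaussian elimination with back substitution for a small square system."""
    n = len(M)
    aug = [list(row) + [rhs] for row, rhs in zip(M, v)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[p][c] == 0:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [a - f * q for a, q in zip(aug[r], aug[c])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

data = [(0, 3), (1, 1), (2, 0), (3, 0), (4, 3)]

# Augmented rows [x^2, x, 1]; the unknowns are the coefficients [a, b, c].
A = [[Fraction(x) ** 2, Fraction(x), Fraction(1)] for x, _ in data]
y = [Fraction(v) for _, v in data]

# Normal equation: (A'A) [a, b, c]' = A'y.
AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(3)]
a, b, c = solve(AtA, Aty)
print("a =", a, " b =", b, " c =", c)
```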