
Page 1: Homework 3 Solution - EE263: Introduction to Linear ...ee263.stanford.edu/hw/hw3/hw3_sol.pdf · Homework 3 Solution EE263 Stanford University, Fall 2017 Due: Wednesday 10/18/17 11:59pm

Homework 3 Solutions

EE 263 Stanford University Summer 2018

July 12, 2018

1. Least-squares residuals. Suppose A is skinny and full-rank. Let x_ls be the least-squares approximate solution of Ax = y, and let y_ls = Ax_ls. Show that the residual vector r = y − y_ls satisfies

‖r‖² = ‖y‖² − ‖y_ls‖².

Also, give a brief geometric interpretation of this equality (just a couple of sentences, and maybe a conceptual drawing).

Solution. Let us first show that r ⊥ y_ls. Since y_ls = Ax_ls = AA†y = A(AᵀA)⁻¹Aᵀy,

\[
\begin{aligned}
y_{ls}^T r = y_{ls}^T (y - y_{ls}) &= y_{ls}^T y - y_{ls}^T y_{ls} \\
&= y^T A (A^T A)^{-1} A^T y - y^T A (A^T A)^{-1} A^T A (A^T A)^{-1} A^T y \\
&= y^T A (A^T A)^{-1} A^T y - y^T A (A^T A)^{-1} (A^T A) (A^T A)^{-1} A^T y \\
&= y^T A (A^T A)^{-1} A^T y - y^T A (A^T A)^{-1} A^T y \\
&= 0.
\end{aligned}
\]


Thus, ‖y‖² = ‖y_ls + r‖² = (y_ls + r)ᵀ(y_ls + r) = ‖y_ls‖² + 2y_lsᵀr + ‖r‖² = ‖y_ls‖² + ‖r‖². Therefore

‖r‖² = ‖y‖² − ‖y_ls‖².

[Figure: y, its projection y_ls = Ax_ls onto R(A), and the residual r form a right triangle with legs ‖y_ls‖ and ‖r‖ and hypotenuse ‖y‖.]

→ By Pythagoras' theorem, ‖y‖² = ‖y_ls‖² + ‖r‖².
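The identity is easy to sanity-check numerically. Here is a small NumPy sketch; the random A and y are arbitrary stand-ins (A skinny and full rank, as assumed in the problem), not course data:

```python
import numpy as np

# Numerical check of the identity ||r||^2 = ||y||^2 - ||y_ls||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))   # skinny, full rank (almost surely)
y = rng.standard_normal(10)

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares approximate solution
y_ls = A @ x_ls                               # projection of y onto R(A)
r = y - y_ls                                  # residual

print(abs(y_ls @ r))                          # r is orthogonal to y_ls
print(abs(r @ r - (y @ y - y_ls @ y_ls)))     # the identity above
```

Both printed values are zero up to floating-point roundoff, reflecting the orthogonality that drives the Pythagorean identity.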

2. Least-squares model fitting. In this problem you will use least-squares to fit several different types of models to a given set of input/output data. The data consist of a scalar input sequence u, and a scalar output sequence y, for t = 1, . . . , N. You will develop several different models that relate the signals u and y.

• Memoryless models. In a memoryless model, the output at time t, i.e., y(t), depends only on the input at time t, i.e., u(t). Another common term for such a model is static.

constant model: y(t) = c0

static linear: y(t) = c1u(t)

static affine: y(t) = c0 + c1u(t)

static quadratic: y(t) = c0 + c1u(t) + c2u(t)²

• Dynamic models. In a dynamic model, y(t) depends on u(s) for some s ≠ t. We consider some simple time-series models (see problem 2 in the reader), which are linear dynamic


models.

moving average (MA): y(t) = a0u(t) + a1u(t− 1) + a2u(t− 2)

autoregressive (AR): y(t) = a0u(t) + b1y(t− 1) + b2y(t− 2)

autoregressive moving average (ARMA): y(t) = a0u(t) + a1u(t− 1) + b1y(t− 1)

Note that in the AR and ARMA models, y(t) depends indirectly on all previous inputs, u(s) for s < t, due to the recursive dependence on y(t − 1). For this reason, the AR and ARMA models are said to have infinite memory. The MA model, on the other hand, has a finite memory: y(t) depends only on the current and two previous inputs. (Another term for this MA model is 3-tap system, where taps refer to taps on a delay line.)

Each of these models is specified by its parameters, i.e., the scalars ci, ai, bi. For each of these models, find the least-squares fit to the given data. In other words, find parameter values that minimize the sum-of-squares of the residuals. For example, for the ARMA model, pick a0, a1, and b1 that minimize

\[
\sum_{t=2}^{N} \left( y(t) - a_0 u(t) - a_1 u(t-1) - b_1 y(t-1) \right)^2 .
\]

(Note that we start the sum at t = 2, which ensures that u(t − 1) and y(t − 1) are defined.) For each model, give the root-mean-square (RMS) residual, i.e., the square root of the mean of the optimal residual squared. Plot the output ŷ predicted by your model, and plot the residual (which is y − ŷ). The data for this problem are available from the class web page in the file uy_data.json. This file contains the vectors u and y and the scalar N (the length of the vectors). Now you can plot u, y, etc. Note: the dataset u, y is not generated by any of the models above. It is generated by a nonlinear recursion, which has infinite memory.

Solution. For each of the given models, we get a linear relationship between the outputs and the unknown parameters. For example, for the constant model we have

\[
\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{bmatrix}
=
\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} c_0 .
\]

Or for the static quadratic model

\[
\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{bmatrix}
=
\begin{bmatrix}
1 & u(1) & u(1)^2 \\
1 & u(2) & u(2)^2 \\
\vdots & \vdots & \vdots \\
1 & u(N) & u(N)^2
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}.
\]

Similarly, for the autoregressive moving average model we get

\[
\begin{bmatrix} y(2) \\ y(3) \\ \vdots \\ y(N) \end{bmatrix}
=
\begin{bmatrix}
u(2) & u(1) & y(1) \\
u(3) & u(2) & y(2) \\
\vdots & \vdots & \vdots \\
u(N) & u(N-1) & y(N-1)
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ b_1 \end{bmatrix}.
\]


(Note that for this model we start from y(2), since u(0) and y(0) are undefined.) All of the above are in the form y = Ax, where y is the output sequence and x is the vector of corresponding unknown coefficients. The goal is to find the coefficients that minimize the sum-of-squares of the residuals. This is nothing but the least-squares solution of y = Ax, given by x_ls = (AᵀA)⁻¹Aᵀy. Then using x_ls, the model output can be computed as ŷ = Ax_ls. This can be done easily in matlab:

uy_data; % read u,y,N
A1=ones(N,1); A2=u; A3=[ones(N,1), u]; A4=[ones(N,1), u, u.^2];
x1=A1\y; y1_hat=A1*x1; r1=y-y1_hat; rms1=sqrt(mean(r1.^2))
x2=A2\y; y2_hat=A2*x2; r2=y-y2_hat; rms2=sqrt(mean(r2.^2))
x3=A3\y; y3_hat=A3*x3; r3=y-y3_hat; rms3=sqrt(mean(r3.^2))
x4=A4\y; y4_hat=A4*x4; r4=y-y4_hat; rms4=sqrt(mean(r4.^2))
A5=[u(3:N), u(2:N-1), u(1:N-2)]; y5=y(3:N);
A6=[u(3:N), y(2:N-1), y(1:N-2)]; y6=y(3:N);
A7=[u(2:N), u(1:N-1), y(1:N-1)]; y7=y(2:N);
x5=A5\y5; y5_hat=A5*x5; r5=y5-y5_hat; rms5=sqrt(mean(r5.^2))
x6=A6\y6; y6_hat=A6*x6; r6=y6-y6_hat; rms6=sqrt(mean(r6.^2))
x7=A7\y7; y7_hat=A7*x7; r7=y7-y7_hat; rms7=sqrt(mean(r7.^2))
figure(1); subplot(211); plot(y1_hat,'b'); grid on; hold on;
plot(r1,'--r'); hold off; title('constant'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y2_hat,'b'); grid on; hold on;
plot(r2,'--r'); hold off; title('linear'); xlabel('n'); ylabel('y_{hat}');
figure(2); subplot(211); plot(y3_hat,'b'); grid on; hold on;
plot(r3,'--r'); hold off; title('affine'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y4_hat,'b'); grid on; hold on;
plot(r4,'--r'); hold off; title('quadratic'); xlabel('n'); ylabel('y_{hat}');
figure(3); subplot(211); plot(y5_hat,'b'); grid on; hold on;
plot(r5,'--r'); hold off; title('MA'); xlabel('n'); ylabel('y_{hat}');
subplot(212); plot(y6_hat,'b'); grid on; hold on;
plot(r6,'--r'); hold off; title('AR'); xlabel('n'); ylabel('y_{hat}');
figure(4); subplot(211); plot(y7_hat,'b'); grid on; hold on;
plot(r7,'--r'); hold off; title('ARMA'); xlabel('n'); ylabel('y_{hat}');
figure(1); print uy_1
figure(2); print uy_2
figure(3); print uy_3
figure(4); print uy_4
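The same kind of fit can be sketched in NumPy. Since uy_data.json is not reproduced here, the u and y below are synthetic stand-ins generated by an arbitrary linear recursion, so the RMS values will differ from those quoted for the course data:

```python
import numpy as np

# Synthetic stand-in data (arbitrary recursion, not uy_data.json).
rng = np.random.default_rng(1)
N = 100
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = u[t] + 0.3 * u[t - 1] + 0.5 * y[t - 1]

def rms(r):
    return np.sqrt(np.mean(r**2))

# Static affine model y(t) = c0 + c1 u(t): one row per t = 1, ..., N.
A_aff = np.column_stack([np.ones(N), u])
c_aff, *_ = np.linalg.lstsq(A_aff, y, rcond=None)

# ARMA model y(t) = a0 u(t) + a1 u(t-1) + b1 y(t-1): rows t = 2, ..., N.
A_arma = np.column_stack([u[1:], u[:-1], y[:-1]])
theta, *_ = np.linalg.lstsq(A_arma, y[1:], rcond=None)

print("affine RMS:", rms(y - A_aff @ c_aff))
print("ARMA RMS:  ", rms(y[1:] - A_arma @ theta))
```

Here the stand-in data happen to be generated exactly by an ARMA recursion, so the ARMA residual is zero up to roundoff while the memoryless fit is not; on the course data no model fits exactly.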

And the following RMS values for the residuals are obtained: constant: 1.1181, linear: 0.5940, affine: 0.5210, quadratic: 0.5179; MA: 0.2504, AR: 0.1783, ARMA: 0.1853. For the memoryless models, the error decreases as the model becomes more complicated. However, the models with memory perform significantly better. Among these, the error decreases with the introduction of autoregressive terms. Among the memoryless models, the affine model would be a good choice, since the more complicated quadratic model yields only slightly smaller residuals. Overall, the autoregressive model seems to do a good job. Of course, to choose a


model, we should really validate on another batch of data. Note that in this problem we were only concerned with model fitting to the data, and not in 'validating' the model, i.e., how well this model will work for inputs other than the ones used for fitting the model. The following plots show ŷ (solid) and the residuals y − ŷ (dashed):

[Plots: for the constant, linear, affine, and quadratic models, ŷ (solid) and the residual (dashed) against n = 0, . . . , 100.]


[Plots: for the moving average, autoregressive, and autoregressive moving average models, ŷ (solid) and the residual (dashed) against n = 0, . . . , 100.]

3. Identifying a system from input/output data. We consider the standard setup:

y = Ax + v,

where A ∈ R^{m×n}, x ∈ R^n is the input vector, y ∈ R^m is the output vector, and v ∈ R^m is the noise or disturbance. We consider here the problem of estimating the matrix A, given some input/output data. Specifically, we are given the following:

x(1), . . . , x(N) ∈ R^n,   y(1), . . . , y(N) ∈ R^m.

These represent N samples or observations of the input and output, respectively, possibly corrupted by noise. In other words, we have

y(k) = Ax(k) + v(k), k = 1, . . . , N,

where the v(k) are assumed to be small. The problem is to estimate the (coefficients of the) matrix A, based on the given input/output data. You will use a least-squares criterion to form an


estimate Â of A. Specifically, you will choose as your estimate Â the matrix that minimizes the quantity

\[
J = \sum_{k=1}^{N} \left\| \hat{A} x^{(k)} - y^{(k)} \right\|^2
\]

over Â.

a) Explain how to do this. If you need to make an assumption about the input/output data to make your method work, state it clearly. You may want to use the matrices X ∈ R^{n×N} and Y ∈ R^{m×N} given by

\[
X = \begin{bmatrix} x^{(1)} & \cdots & x^{(N)} \end{bmatrix}, \qquad
Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(N)} \end{bmatrix}
\]

in your solution.

b) On the course web site you will find some input/output data for an instance of this problem in the file sysid_data.json. Executing this Julia file will assign values to m, n, and N, and create two matrices that contain the input and output data, respectively. The n×N matrix variable X contains the input data x(1), . . . , x(N) (i.e., the first column of X contains x(1), etc.). Similarly, the m×N matrix Y contains the output data y(1), . . . , y(N). You must give your final estimate Â, your source code, and also give an explanation of what you did.

Solution.

a) We start by expressing the objective function J as

\[
\begin{aligned}
J &= \sum_{k=1}^{N} \left\| \hat{A} x^{(k)} - y^{(k)} \right\|^2
   = \sum_{k=1}^{N} \sum_{i=1}^{m} \left( \hat{A} x^{(k)} - y^{(k)} \right)_i^2 \\
  &= \sum_{k=1}^{N} \sum_{i=1}^{m} \left( a_i^T x^{(k)} - y_i^{(k)} \right)^2
   = \sum_{i=1}^{m} \left( \sum_{k=1}^{N} \left( a_i^T x^{(k)} - y_i^{(k)} \right)^2 \right),
\end{aligned}
\]

where a_i^T is the ith row of Â. The last expression shows that J is a sum of expressions J_i (shown in parentheses), each of which depends only on a_i. This means that to minimize J, we can minimize each of these expressions separately. That makes sense: we can estimate the rows of Â separately. Now let's see how to minimize

\[
J_i = \sum_{k=1}^{N} \left( a_i^T x^{(k)} - y_i^{(k)} \right)^2 ,
\]


which is the contribution to J from the ith row of Â. First we write it as

\[
J_i = \left\| \begin{bmatrix} x^{(1)T} \\ \vdots \\ x^{(N)T} \end{bmatrix} a_i
- \begin{bmatrix} y^{(1)}_i \\ \vdots \\ y^{(N)}_i \end{bmatrix} \right\|^2 .
\]

Now that we have the problem in the standard least-squares format, we're pretty much done. Using the matrix X ∈ R^{n×N} given by

\[
X = \begin{bmatrix} x^{(1)} & \cdots & x^{(N)} \end{bmatrix},
\]

we can express the estimate as

\[
\hat{a}_i = (XX^T)^{-1} X \begin{bmatrix} y^{(1)}_i \\ \vdots \\ y^{(N)}_i \end{bmatrix}.
\]

(This requires X to have full row rank, so that XXᵀ is invertible; this is the assumption needed in part (a).) Using the matrix Y ∈ R^{m×N} given by

\[
Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(N)} \end{bmatrix},
\]

we can express the estimate of A as

\[
\hat{A}^T = (XX^T)^{-1} X Y^T .
\]

Transposing this gives the final answer:

\[
\hat{A} = Y X^T (XX^T)^{-1} .
\]
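The closed-form estimate is easy to check against a generic least-squares solver. In this NumPy sketch, A_true, the noise level, and the sizes m, n, N are arbitrary stand-ins for the sysid_data instance:

```python
import numpy as np

# Check of Ahat = Y X^T (X X^T)^{-1} on synthetic data.
rng = np.random.default_rng(2)
m, n, N = 3, 4, 50
A_true = rng.standard_normal((m, n))
X = rng.standard_normal((n, N))                      # columns are x(k)
Y = A_true @ X + 0.01 * rng.standard_normal((m, N))  # y(k) = A x(k) + v(k)

A_hat = Y @ X.T @ np.linalg.inv(X @ X.T)             # the closed-form estimate

# Equivalent: solve the transposed least-squares problem Y^T ~ X^T A^T.
A_hat2 = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

print(np.allclose(A_hat, A_hat2))                    # the two routes agree
print(np.abs(A_hat - A_hat2).max(), np.abs(A_hat - A_true).max())
```

The second route is preferable numerically (it avoids forming XXᵀ explicitly), which mirrors the remark below about the backslash operator.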

b) Once you have the neat formula found above, it's easy to get matlab to compute the estimate. It's a little inefficient, but perfectly correct, to simply use

Ahat = Y*X'*inv(X*X');

This yields the estimate

\[
\hat{A} = \begin{bmatrix}
2.03 & 5.02 & 5.01 \\
0.01 & 7 & 1.01 \\
7.04 & 0 & 6.94 \\
7 & 3.98 & 4 \\
9.01 & 1.04 & 7 \\
4.01 & 3.96 & 9.03 \\
4.99 & 6.97 & 8.03 \\
7.94 & 6.09 & 3.02 \\
0.01 & 8.97 & -0.04 \\
1.06 & 8.02 & 7.03
\end{bmatrix}.
\]


Once you've got Â, it's a good idea to check the residual, just to make sure it's reasonable, by comparing it to

\[
\sum_{k=1}^{N} \| y^{(k)} \|^2 .
\]

Here we get a residual of (64.5)², around 4.08% of this quantity. There are several other ways to compute Â in matlab. You can calculate the rows of Â one at a time, using

aihat = (X'\(Y(i,:)'))';

In fact, the backslash operator in matlab solves multiple least-squares problems at once, so you can use

AhatT = X' \ (Y');
Ahat = AhatT';

or

Ahat = (X'\(Y'))';

In any case, it’s not exactly a long matlab program . . .

4. Curve-smoothing. We are given a function F : [0, 1] → R (whose graph gives a curve in R²). Our goal is to find another function G : [0, 1] → R, which is a smoothed version of F. We'll judge the smoothed version G of F in two ways:

• Mean-square deviation from F, defined as

\[
D = \int_0^1 (F(t) - G(t))^2 \, dt.
\]

• Mean-square curvature, defined as

\[
C = \int_0^1 G''(t)^2 \, dt.
\]

We want both D and C to be small, so we have a problem with two objectives. In general there will be a trade-off between the two objectives. At one extreme, we can choose G = F, which makes D = 0; at the other extreme, we can choose G to be an affine function (i.e., to have G′′(t) = 0 for all t ∈ [0, 1]), in which case C = 0. The problem is to identify the optimal trade-off curve between C and D, and explain how to find smoothed functions G on the optimal trade-off curve. To reduce the problem to a finite-dimensional one, we will represent the functions F and G (approximately) by vectors f, g ∈ R^n, where

f_i = F(i/n),   g_i = G(i/n).


You can assume that n is chosen large enough to represent the functions well. Using this representation we will use the following objectives, which approximate the ones defined for the functions above:

• Mean-square deviation, defined as

\[
d = \frac{1}{n} \sum_{i=1}^{n} (f_i - g_i)^2 .
\]

• Mean-square curvature, defined as

\[
c = \frac{1}{n-2} \sum_{i=2}^{n-1} \left( \frac{g_{i+1} - 2g_i + g_{i-1}}{1/n^2} \right)^2 .
\]

In our definition of c, note that

\[
\frac{g_{i+1} - 2g_i + g_{i-1}}{1/n^2}
\]

gives a simple approximation of G′′(i/n). You will only work with this approximate version of the problem, i.e., the vectors f and g and the objectives c and d.

a) Explain how to find g that minimizes d + µc, where µ ≥ 0 is a parameter that gives the relative weighting of sum-square curvature compared to sum-square deviation. Does your method always work? If there are some assumptions you need to make (say, on the rank of some matrix, independence of some vectors, etc.), state them clearly. Explain how to obtain the two extreme cases: µ = 0, which corresponds to minimizing d without regard for c, and also the solution obtained as µ → ∞ (i.e., as we put more and more weight on minimizing curvature).

b) Get the file curve_smoothing.json from the course web site. This file defines a specific vector f that you will use. Find and plot the optimal trade-off curve between d and c. Be sure to identify any critical points (such as, for example, any intersection of the curve with an axis). Plot the optimal g for the two extreme cases µ = 0 and µ → ∞, and for three values of µ in between (chosen to show the trade-off nicely). On your plots of g, be sure to include also a plot of f, say with dotted line type, for reference. Submit your matlab code.

Solution.

a) Let's start with the two extreme cases. When µ = 0, finding g to minimize d + µc reduces to finding g to minimize d. Since d is a sum of squares, d ≥ 0. Choosing g = f trivially achieves d = 0. This makes perfect sense: to minimize the deviation measure, just take the smoothed version to be the same as the original function. This yields zero deviation, naturally, but also, it yields no smoothing! Next, consider the extreme case where µ → ∞. This means we want to make the curvature as small as possible. Can we drive it to zero? The answer is yes, we can: the curvature is zero if and only if g is an affine function, i.e., has the form g_i = ai + b for some constants a and b. There are lots of vectors g that have this form; in fact, we have one for every pair of numbers a, b.


All of these vectors g make c zero. Which one do we choose? Well, even if µ is huge, we still have a small contribution to d + µc from d, so among all g that make c = 0, we'd like the one that minimizes d. Basically, we want to find the best affine approximation, in the sum-of-squares sense, to f. We want to find a and b that minimize

\[
\left\| f - A \begin{bmatrix} a \\ b \end{bmatrix} \right\|
\quad \text{where} \quad
A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ \vdots & \vdots \\ n & 1 \end{bmatrix}.
\]

For n ≥ 2, A is skinny and full rank, and a and b can be found using least-squares. Specifically, [a b]ᵀ = (AᵀA)⁻¹Aᵀf. In the general case, minimizing d + µc is the same as choosing g to minimize

\[
\left\| \frac{1}{\sqrt{n}} I g - \frac{1}{\sqrt{n}} f \right\|^2
+ \mu \left\|
\underbrace{\frac{n^2}{\sqrt{n-2}}
\begin{bmatrix}
-1 & 2 & -1 & 0 & \cdots & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 \\
0 & 0 & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -1 & 2 & -1
\end{bmatrix}}_{S}
\, g \right\|^2 .
\]

This is a multi-objective least-squares problem. The minimizing g is

\[
g = (\tilde{A}^T \tilde{A})^{-1} \tilde{A}^T \tilde{y}
\quad \text{where} \quad
\tilde{A} = \begin{bmatrix} I/\sqrt{n} \\ \sqrt{\mu}\, S \end{bmatrix}
\quad \text{and} \quad
\tilde{y} = \begin{bmatrix} f/\sqrt{n} \\ 0 \end{bmatrix}.
\]

The inverse of ÃᵀÃ always exists because I is full rank. The expression can also be written as g = (I/n + µSᵀS)⁻¹(f/n).
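A small NumPy sketch of this regularized solution for one value of µ; the vector f below is a synthetic stand-in for the one in curve_smoothing.json:

```python
import numpy as np

# Sketch of g = (I/n + mu * S'S)^{-1} (f/n) for one value of mu.
rng = np.random.default_rng(3)
n = 50
grid = np.arange(1, n + 1) / n
f = np.sin(4 * grid) + 0.2 * rng.standard_normal(n)  # noisy stand-in curve

# Scaled second-difference matrix, so that c = ||S g||^2.
S = np.zeros((n - 2, n))
for i in range(n - 2):
    S[i, i:i + 3] = [1.0, -2.0, 1.0]
S *= n**2 / np.sqrt(n - 2)

mu = 1e-5
g = np.linalg.solve(np.eye(n) / n + mu * (S.T @ S), f / n)

d = np.mean((f - g) ** 2)   # deviation of the smoothed curve
c = np.sum((S @ g) ** 2)    # curvature of the smoothed curve
print(d, c)
```

At µ = 0 the same formula returns g = f exactly, and any µ > 0 strictly reduces the curvature relative to c(f), consistent with the trade-off discussion above.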

b) The following plots show the optimal trade-off curve and the optimal g corresponding


to representative µ values on the curve.

[Plot: optimal trade-off curve of sum-square curvature c against sum-square deviation d; the curve meets the d axis at d ≈ 0.3347 (where c = 0) and the c axis at c ≈ 1.9724e+06 (where d = 0).]

[Plot: curves illustrating the trade-off, showing f together with the optimal g for µ = 0, µ = 10e-7, µ = 10e-5, µ = 10e-4, and µ = ∞.]


The following matlab code finds and plots the optimal trade-off curve between d and c. It also finds and plots the optimal g for representative values of µ. As expected, when µ = 0, g = f and no smoothing occurs. At the other extreme, as µ goes to infinity, we get an affine approximation of f. Intermediate values of µ correspond to approximations of f with different degrees of smoothness.

close all; clear all;
curve_smoothing
S = toeplitz([-1; zeros(n-3,1)],[-1 2 -1 zeros(1,n-3)]);
S = S*n^2/(sqrt(n-2));
I = eye(n);
g_no_deviation = f;
error_curvature(1) = norm(S*g_no_deviation)^2;
error_deviation(1) = 0;
u = logspace(-8,-3,30);
for i = 1:length(u)
  A_tilde = [1/sqrt(n)*I; sqrt(u(i))*S];
  y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
  g = A_tilde\y_tilde;
  error_deviation(i+1) = norm(1/sqrt(n)*I*g-f/sqrt(n))^2;
  error_curvature(i+1) = norm(S*g)^2;
end
a1 = 1:n;
a1 = a1';
a2 = ones(n,1);
A = [a1 a2];
affine_param = inv(A'*A)*A'*f;
for i = 1:n
  g_no_curvature(i) = affine_param(1)*i+affine_param(2);
end
g_no_curvature = g_no_curvature';
error_deviation(length(u)+2) = 1/n*norm(g_no_curvature-f)^2;
error_curvature(length(u)+2) = 0;
figure(1);
plot(error_deviation, error_curvature);
xlabel('Sum-square deviation (y intercept = 0.3347)');
ylabel('Sum-square curvature (x intercept = 1.9724e06)');
title('Optimal tradeoff curve');
print curve_extreme.eps;
u1 = 10e-7;
A_tilde = [1/sqrt(n)*I; sqrt(u1)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g1 = A_tilde\y_tilde;
u2 = 10e-5;
A_tilde = [1/sqrt(n)*I; sqrt(u2)*S];


y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g2 = A_tilde\y_tilde;
u3 = 10e-4;
A_tilde = [1/sqrt(n)*I; sqrt(u3)*S];
y_tilde = [1/sqrt(n)*f; zeros(n-2,1)];
g3 = A_tilde\y_tilde;
figure(3);
plot(f,'*');
hold;
plot(g_no_deviation);
plot(g1,'--');
plot(g2,'-.');
plot(g3,'-');
plot(g_no_curvature,':');
axis tight;
legend('f','u = 0','u = 10e-7','u = 10e-5','u = 10e-4','u = infinity',0);
title('Curves illustrating the trade-off');
print curve_tradeoff.eps;

Note: Several exams had a typo that defined

\[
c = \frac{1}{n-1} \sum_{i=2}^{n-1} \left( \frac{g_{i+1} - 2g_i + g_{i-1}}{1/n^2} \right)^2
\]

instead of

\[
c = \frac{1}{n-2} \sum_{i=2}^{n-1} \left( \frac{g_{i+1} - 2g_i + g_{i-1}}{1/n^2} \right)^2 .
\]

The solutions above reflect the second definition. Full credit was given for answers consistent with either definition. Some common errors:

• Several students tried to approximate f using low-degree polynomials. While fitting f to a polynomial does smooth f, it does not necessarily minimize d + µc for some µ ≥ 0, nor does it illustrate the trade-off between curvature and deviation.

• In explaining how to find the g that minimizes d + µc as µ → ∞, many people correctly observed that if g ∈ null(S), then c = 0. For full credit, however, solutions had to show how to choose the vector in null(S) that minimizes d.

• Many people chose to zoom in on a small section of the trade-off curve rather than plot the whole range from µ = 0 to µ → ∞. Those solutions received full credit provided they calculated the intersections with the axes (i.e., provided they found the minimum value for d + µc when µ = 0 and when µ → ∞).

5. Hovercraft with limited range. We have a hovercraft moving in the plane with two thrusters, each pointing through the center of mass, exerting forces in the x and y directions


with 100% efficiency. The hovercraft has mass 1. The discretized equations of motion for the hovercraft are

\[
x(t+1) =
\begin{bmatrix}
1 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix} x(t)
+
\begin{bmatrix}
\tfrac{1}{2} & 0 \\
1 & 0 \\
0 & \tfrac{1}{2} \\
0 & 1
\end{bmatrix}
\begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix}
\]

where x₁ and x₂ are the position and velocity in the x-direction, and x₃, x₄ are the position and velocity in the y-direction. Here

\[
u(t) = \begin{bmatrix} u_1(t) \\ u_2(t) \end{bmatrix}
\]

is the force acting on the hovercraft for time in the interval [t, t + 1). Let the position of the vehicle at time t be q(t) ∈ R².

a) The hovercraft starts at the origin. We'd like to apply thrust to make it move through points p₁, p₂, p₃ at times t₁, t₂, t₃, where

\[
p_1 = \begin{bmatrix} 1 \\ -\tfrac{1}{2} \end{bmatrix}, \quad
p_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad
p_3 = \begin{bmatrix} -\tfrac{3}{2} \\ 0 \end{bmatrix},
\qquad
t_1 = 6, \quad t_2 = 40, \quad t_3 = 50.
\]

We will run the hovercraft on the time interval [0, 70]. We'd like to apply a sequence of inputs u(0), u(1), . . . , u(70) to make the hovercraft position pass through the above sequence of points at the specified times.

We would like to find the sequence of inputs that drives the hovercraft through the desired points and has the minimum cost, given by the sum of the squares of the forces:

\[
\sum_{t=0}^{70} \|u(t)\|^2 .
\]

To do this, pick A_hov and y_des to set this problem up as an equivalent minimum-norm problem, where we would like to find the minimum-norm u_seq which satisfies

\[
A_{hov}\, u_{seq} = y_{des}
\]

where u_seq is the sequence of force inputs

\[
u_{seq} = \begin{bmatrix} u(0) \\ u(1) \\ \vdots \\ u(70) \end{bmatrix}.
\]

Plot the trajectory of the hovercraft using this input, and the way-points p₁, . . . , p₃. Also plot the optimal u against time.


b) Now we would like to compute the trade-off curve between the accuracy with which the mass passes through the waypoints and the norm of the force used. Let our two objective functions be

\[
J_1 = \sum_{i=1}^{3} \|q(t_i) - p_i\|^2 = \|A_{hov}\, u_{seq} - y_{des}\|^2
\]

and

\[
J_2 = \sum_{t=0}^{70} \|u(t)\|^2 .
\]

By minimizing the weighted sum J₁ + µJ₂ for a range of values of µ, plot the trade-off curve of J₁ against J₂ showing the achievable performance. To generate suitable values of µ, you may find the logspace command useful in Matlab; you'll need to pick appropriate maximum and minimum values. This trade-off curve shows how we can trade off between how accurately the hovercraft passes through the waypoints and how much input energy is used.

c) For each of the following values of µ

\[
\{\, 10^{p/2} \mid p = -2, 0, 2, \ldots, 10 \,\}
\]

plot the trajectories all on the same plot, together with the waypoints.

d) Now suppose we are controlling the hovercraft by radio control, and the maximum range possible between the transmitter and receiver is 2 (in whatever units we are using for distance). Notice that, if we use the minimum-norm input, then the hovercraft passes out of range, both when making its first turn and on the final stretch (between times 50 and 70).

We'd like to do something about this, but trading off the input norm as above doesn't do the right thing; if µ is large then the hovercraft stays within range, but misses the waypoints entirely; if µ is small then it comes close to the waypoints, but goes out of range. Notice that this is particularly a problem on the final stretch between times 50 and 70; explain why this is.

e) One remedy for this problem is to solve a constrained multiobjective least-squares problem. We would like to impose the constraint that

\[
A_{hov}\, u_{seq} = y_{des},
\]

that is, achieve zero waypoint error J₁ = 0. We can attempt to keep the hovercraft in range by trading off the sum of the squares of the position,

\[
J_3 = \sum_{t=0}^{70} \|q(t)\|^2 ,
\]

against the input cost J₂ subject to this constraint. To do this, we'll solve

\[
\begin{array}{ll}
\text{minimize} & J_3 + \gamma J_2 \\
\text{subject to} & A_{hov}\, u_{seq} = y_{des}.
\end{array}
\]


First, find the matrix W so that the cost function is given by

\[
J_3 + \gamma J_2 = \|W u_{seq}\|^2 .
\]

f) Now we have a problem of the form

\[
\begin{array}{ll}
\text{minimize} & \|Wu\|^2 \\
\text{subject to} & Au = y_{des}.
\end{array}
\]

This is called a weighted minimum-norm solution; the only difference from the usual minimum-norm solution to Au = y_des is the presence of the matrix W, and when W = I the optimal u is just given by u_opt = A†y_des. Show that the solution for general W is

\[
u_{opt} = \Sigma^{-1} A^T (A \Sigma^{-1} A^T)^{-1} y_{des}
\]

where Σ = WᵀW. (One way to do this is using Lagrange multipliers.) Use this to solve the remaining parts of this problem.

g) For each of the following values of γ

\[
\{\, 10^{p/2} \mid p = 0, 2, 4, \ldots, 20 \,\}
\]

plot the trajectories all on the same plot, together with the waypoints. Explain what you see.

h) By trying different values of γ, you should be able to find a trajectory which just keeps the hovercraft within range. Plot the trajectory of the hovercraft; what is the corresponding value of γ? Is this the smallest-norm input u that just keeps the hovercraft within range and drives the hovercraft through the waypoints? Explain why, or why not.

i) For a range of values of γ, plot the trade-off curve of J₃ against J₂ showing the achievable performance.

Solution.

a) Setting

\[
C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\]

gives the position of the hovercraft at time t as

\[
y(t) = \sum_{\tau=0}^{t-1} C A^{t-1-\tau} B\, u(\tau).
\]

The parameters for the least-squares problem are therefore

\[
A_{hov} =
\begin{bmatrix}
C A^{t_1-1} B & C A^{t_1-2} B & \cdots & CB & 0 & 0 & \cdots & 0 \\
C A^{t_2-1} B & C A^{t_2-2} B & \cdots & & & & \cdots & 0 \\
C A^{t_3-1} B & C A^{t_3-2} B & \cdots & & & & \cdots & 0
\end{bmatrix},
\qquad
y_{des} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}.
\]


Solving this least-squares problem gives the optimal trajectory.

[Plot: optimal trajectory in the plane passing through the waypoints; both axes run from −2 to 2.]

The corresponding optimal input sequence is below.

[Plot: optimal input components against t = 0, . . . , 70, with values between −0.05 and 0.1.]

b) The weighted-sum objective is

\[
J_1 + \mu J_2 =
\left\| \begin{bmatrix} A_{hov} \\ \sqrt{\mu}\, I \end{bmatrix} u_{seq}
- \begin{bmatrix} y_{des} \\ 0 \end{bmatrix} \right\|^2
\]

where

\[
u_{seq} = \begin{bmatrix} u(0) \\ \vdots \\ u(70) \end{bmatrix}
\]

and so the optimal input sequence is given by

\[
u_{seq} = \begin{bmatrix} A_{hov} \\ \sqrt{\mu}\, I \end{bmatrix}^{\dagger}
\begin{bmatrix} y_{des} \\ 0 \end{bmatrix}.
\]

Choosing values of µ between 1 and 10⁷ using mus=logspace(0,7,50), the trade-off curve is shown below.

[Plot: trade-off curve of J₁ against J₂; J₂ runs from 0 to about 0.03 and J₁ from 0 to about 4.5.]

c) All of the trajectories together are shown below.

[Plot: trajectories for all values of µ on one plot, together with the waypoints; both axes run from −2 to 2.]

We can see clearly that increasing µ reduces the accuracy with which the trajectory passes through the waypoints.


d) On the final stretch the input is zero, and so is unaffected by increasing µ. We were attempting to use the heuristic 'keeping u small keeps x small', but this fails, because when u = 0 the hovercraft just keeps going in a straight line.

e) We would like to minimize J₃ + γJ₂ subject to the constraints that the hovercraft moves through the waypoints. Denote the sequence of positions of the hovercraft by

\[
y_{seq} = \begin{bmatrix} y(0) \\ \vdots \\ y(T) \end{bmatrix}
\]

where T = 70. Then we have

\[
y_{seq} = T u_{seq}
\]

where T is the Toeplitz matrix

\[
T =
\begin{bmatrix}
0 & & & \\
CB & 0 & & \\
CAB & CB & 0 & \\
\vdots & & \ddots & \\
C A^{T-1} B & C A^{T-2} B & \cdots & CB
\end{bmatrix}.
\]

Now the cost function is

\[
J_3 + \gamma J_2 = \|T u_{seq}\|^2 + \gamma \|u_{seq}\|^2 = \|W u_{seq}\|^2
\]

where

\[
W = \begin{bmatrix} T \\ \sqrt{\gamma}\, I \end{bmatrix}.
\]

f) We'd like to solve

\[
\begin{array}{ll}
\text{minimize} & \|Wu\|^2 \\
\text{subject to} & Au = y_{des}.
\end{array}
\]

One way to solve this is using Lagrange multipliers; if we augment the cost function by the Lagrange multipliers multiplied by the constraints, we have

\[
L(u, \lambda) = u^T \Sigma u + \lambda^T (Au - y_{des})
\]

and the optimality conditions are

\[
\frac{\partial L}{\partial u} = 2 u_{opt}^T \Sigma + \lambda^T A = 0, \qquad
\frac{\partial L}{\partial \lambda} = u_{opt}^T A^T - y_{des}^T = 0.
\]


The first condition gives

\[
u_{opt} = -\tfrac{1}{2} \Sigma^{-1} A^T \lambda,
\]

and substituting this into the second we have

\[
-\tfrac{1}{2} A \Sigma^{-1} A^T \lambda = y_{des},
\]

hence

\[
\lambda = -2 (A \Sigma^{-1} A^T)^{-1} y_{des}
\]

and

\[
u_{opt} = \Sigma^{-1} A^T (A \Sigma^{-1} A^T)^{-1} y_{des}
\]

as desired.


g) The trajectory for a range of γ values is shown below. (Actually these are clearer on separate plots.)

[Plots: trajectories, each on axes running from −2 to 2, for γ = 1, 100, 1000, 3000, 10000, 1e+06, 1e+08, and 1e+10.]

We can see the trade-off clearly; decreasing γ causes the hovercraft to try very hard to


stay close to the origin. Also notice the asymmetry caused by the different times atwhich the hovercraft must be at the waypoints.

h) A good choice of γ is about 1.7 × 10⁴. Here the trajectory just remains within range, as shown below.

[Figure: trajectory for γ ≈ 1.7 × 10⁴; both axes from −2 to 2.]

This is not the smallest-norm u that keeps the hovercraft within range and drives the hovercraft through the waypoints, because we are minimizing the sum of the squares of ‖q(t)‖, rather than constraining each ‖q(t)‖ independently. You can see this in the plot, since in the final stretch the hovercraft is expending extra effort to stay well within range, and this excessive input could be reduced.

In fact, one can compute the exact optimum, but this is not required and not covered in this course; (an approximation of) it is shown below.


[Figure: (approximately) exact optimal trajectory; both axes from −2 to 2.]

i) The trade-off is below.

[Figure: trade-off curve of J3 (vertical axis, 0 to 100) versus J2 (horizontal axis, 0 to 2).]

Notice that the vertical asymptote occurs when J2 ≈ 0.03; this corresponds to the minimum-norm u which drives the hovercraft through the desired trajectory, as seen in part (b).

6. You Must Construct Additional Pylons. You are the Hierarch of the Baelaam, charged with maintaining the power levels of energizing pylons which power various structures in your base of operations. Consider m structures powered by n pylons. Each structure's energy level y_j, for j = 1, ..., m, is given by

\[
y_j(p) = \log\left( \sum_{i=1}^n \exp\!\left( \frac{p_i}{d_{j,i}^2} \right) \right)
\]

where p_i is the power level of the i-th pylon and d_{j,i} is the distance between the j-th structure and the i-th pylon (we choose log-sum-exp as a smooth approximation of the max function). While each structure has some given target energy level R_j, it can handle some deviation (either over or under); however, that deviation will cause damage to the Nexus Crystals that act as energy conduits for the structure. Your goal as Hierarch is to find a set of pylon power levels p ∈ Rⁿ that minimizes the total square deviation, J, from the required energy levels:

\[
J(p) = \sum_{j=1}^m \left( R_j - y_j(p) \right)^2.
\]

Your chief engineer proposes that you could linearize the function y_j(p) to find an update algorithm that starts with some initial pylon power level and changes the power by a small amount each step to reduce the total energy deviation J.

a) Find an update expression for the approximate energy level y(p + δp) as a linear dynamical system, where y ∈ Rᵐ is the vector of structure energy levels; i.e., find A and B such that

\[
y(p + \delta p) \approx A y(p) + B \delta p.
\]

We want to relate the energy level at p + δp to the energy level at p and the change in energy from a small change in power δp.

Hint: B is not necessarily constant.

b) Derive an expression for the one-step change in power levels that minimizes

\[
J(p + \delta p) = \sum_{j=1}^m \left( R_j - y_j(p + \delta p) \right)^2
\]

as a function of y(p), A, B, and δp. Use the result of this minimization problem (the optimal δp) to determine an update expression p[k + 1] = p[k] + α δp, where α is a given step size and k is the current iteration. If your method involves an inverse, explain what conditions must hold in order for the inverse to exist.

c) Given the following list of required energy levels and locations of each structure and pylon, apply your algorithm for 200 iterations with α = 0.01 and an initial power level of p[0] = [20, 40, 20].

Plot the pylon power levels, the structure energy levels, and the power deviation metric J at each iteration; there should be 3 plots total. The algorithm should converge in roughly 150–200 iterations.

Also report the final cost and pylon power levels.

Stucture_energy_goal=[10, 20, 5, 10, 5]


Strcuture_location=[2 8; 4 5; 6 8; 2 2; 4 1]
Pylon_location=[2 5; 3 4; 5 4]
p0=[20 40 20]

For locations, each row is an x, y location.
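The model and cost above can be evaluated directly on this data. The following minimal sketch is in Python/NumPy (the assignment's reference code is in Julia); the array names are my own, and the data values are transcribed from the problem statement.

```python
import numpy as np

# data from part (c); array names are mine
R = np.array([10.0, 20.0, 5.0, 10.0, 5.0])                        # targets R_j
structures = np.array([[2, 8], [4, 5], [6, 8], [2, 2], [4, 1]], dtype=float)
pylons = np.array([[2, 5], [3, 4], [5, 4]], dtype=float)
# squared distances d_{j,i}^2 between structure j and pylon i
D2 = ((structures[:, None, :] - pylons[None, :, :]) ** 2).sum(axis=2)

def y(p):
    """Energy levels y_j(p) = log(sum_i exp(p_i / d_{j,i}^2))."""
    return np.log(np.exp(p / D2).sum(axis=1))

def J(p):
    """Total squared deviation from the targets."""
    return ((R - y(p)) ** 2).sum()

p0 = np.array([20.0, 40.0, 20.0])
print(y(p0))
print(J(p0))
```

Broadcasting `p / D2` divides each pylon's power by its column of squared distances, giving the m × n matrix of exponents in one step.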

Solution.

a) 6 points

First we linearize the energy function by finding its Jacobian. Clearly A = I, but B is a bit more involved:

\[
y(p + \delta p) \approx y(p) + B \delta p = y(p) +
\begin{bmatrix}
\frac{\partial y_1}{\partial p_1} & \cdots & \frac{\partial y_1}{\partial p_n} \\
\vdots & & \vdots \\
\frac{\partial y_m}{\partial p_1} & \cdots & \frac{\partial y_m}{\partial p_n}
\end{bmatrix} \delta p.
\]

Now we find the partial derivatives:

\[
\frac{\partial y_j}{\partial p_i}
= \frac{\exp\!\left( p_i / d_{j,i}^2 \right)}{\sum_{k=1}^n \exp\!\left( p_k / d_{j,k}^2 \right)} \cdot \frac{1}{d_{j,i}^2}.
\]

We see that B is a function of p. Partial credit was given to those who attempted to find B.

b) 7 points

We can substitute our expression for y[k + 1] here:

\[
J[k + 1] = \| R - (y(p) + B \delta p) \|^2 = \| z - H \delta p \|^2
\]

where z = R − y[k] is the current error vector and H = B is the Jacobian of the energy function. This yields δp = (HᵀH)⁻¹Hᵀz as the minimizer of the one-step cost; the inverse exists provided H has full column rank (rank n). Thus the update to the pylon power is given by

\[
p[k + 1] = p[k] + \alpha \left( H^T H \right)^{-1} H^T \left( R - y[k] \right).
\]
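This damped Gauss-Newton-style iteration can be sketched as follows. The code is Python/NumPy rather than the course's Julia; the data arrays repeat the values given in part (c), and all variable names are my own. No specific output values are claimed here.

```python
import numpy as np

# data from part (c); names are mine
R = np.array([10.0, 20.0, 5.0, 10.0, 5.0])
structures = np.array([[2, 8], [4, 5], [6, 8], [2, 2], [4, 1]], dtype=float)
pylons = np.array([[2, 5], [3, 4], [5, 4]], dtype=float)
D2 = ((structures[:, None, :] - pylons[None, :, :]) ** 2).sum(axis=2)

def y(p):
    # y_j(p) = log(sum_i exp(p_i / d_{j,i}^2))
    return np.log(np.exp(p / D2).sum(axis=1))

def jacobian(p):
    # H_{j,i} = exp(p_i/d_{j,i}^2) / (sum_k exp(p_k/d_{j,k}^2)) * 1/d_{j,i}^2
    E = np.exp(p / D2)
    return E / E.sum(axis=1, keepdims=True) / D2

p0 = np.array([20.0, 40.0, 20.0])
alpha = 0.01
p = p0.copy()
for _ in range(200):
    H = jacobian(p)
    z = R - y(p)
    dp = np.linalg.solve(H.T @ H, H.T @ z)  # requires H full column rank
    p = p + alpha * dp

final_cost = ((R - y(p)) ** 2).sum()
print(p, final_cost)
```

Each step solves the linearized least-squares problem for δp and moves a fraction α of the way, so the cost decreases gradually rather than in one jump.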

c) 7 points

Final pylon power levels after 200 iterations: [81.3, 35.2, 34.7], with final cost 3.7.

After 50 iterations: pylon power levels [52.7, 40.3, 35.7], with cost 24.1.

Code (in Julia) that solves this problem is in code2.jl.


[Figure 1: pylon power levels]

[Figure 2: structure energy levels]

[Figure 3: cost]
