
Chapter 3

Hilbert spaces

Contents

Introduction
Inner products
Cauchy-Schwarz inequality
Induced norm
Hilbert space
Minimum norm optimization problems
Chebyshev sets
Projectors
Orthogonality
Orthogonal complements
Orthogonal projection
Direct sum
Orthogonal sets
Gram-Schmidt procedure
Approximation
Normal equations
Gram matrices
Orthogonal bases
Approximation and Fourier series
Linear varieties
Dual approximation problem
Applications
Fourier series
Complete orthonormal sequences / countable orthonormal bases
Wavelets
Vector spaces
Normed spaces
Inner product spaces
Summary
Minimum distance to a convex set
Projection onto convex sets
Summary


Key missing geometrical concepts: angle and orthogonality (“right angles”).

3.1 Introduction

We now turn to the subset of normed spaces called Hilbert spaces, which must have an inner product. These are particularly useful spaces in applications/analysis.

Why not introduce Hilbert spaces first then? For generality: it is helpful to see which properties are general to vector spaces, or to normed spaces, vs. which require additional assumptions like an inner product.

Overview
• inner product
• orthogonality
• orthogonal projections
• applications
• least-squares minimization
• orthonormalization of a basis
• Fourier series

General forms of things you have seen before: Cauchy-Schwarz, Gram-Schmidt, Parseval’s theorem

3.2 Inner products

Definition. A pre-Hilbert space, aka an inner product space, is a vector space X defined on the field F = R or F = C, along with an inner product operation 〈·, ·〉 : X × X → F, which must satisfy the following axioms ∀x, y ∈ X, α ∈ F.

1. 〈x, y〉 = 〈y, x〉* (Hermitian symmetry), where * denotes complex conjugate.
2. 〈x + y, z〉 = 〈x, z〉 + 〈y, z〉 (additivity)
3. 〈αx, y〉 = α〈x, y〉 (scaling)
4. 〈x, x〉 ≥ 0 and 〈x, x〉 = 0 iff x = 0 (positive definiteness).

Properties of inner products

Bilinearity property:
〈∑_i αᵢxᵢ, ∑_j βⱼyⱼ〉 = ∑_i ∑_j αᵢβⱼ* 〈xᵢ, yⱼ〉.

Lemma. In an inner product space, if 〈x, y〉 = 0 for all y, then x = 0.
Proof. Let y = x. □

Cauchy-Schwarz inequality

Lemma. For all x, y in an inner product space,
|〈x, y〉| ≤ √〈x, x〉 √〈y, y〉 = ‖x‖ ‖y‖ (see induced norm below),
with equality iff x and y are linearly dependent.
Proof. For any λ ∈ F, the positive definiteness of 〈·, ·〉 ensures that
0 ≤ 〈x − λy, x − λy〉 = 〈x, x〉 − λ〈y, x〉 − λ*〈x, y〉 + |λ|² 〈y, y〉.
If y = 0, the inequality holds trivially. Otherwise, consider λ = 〈x, y〉/〈y, y〉 and we have
0 ≤ 〈x, x〉 − |〈y, x〉|² / 〈y, y〉.
Rearranging yields |〈y, x〉| ≤ √(〈x, x〉〈y, y〉) = ‖x‖ ‖y‖. The proof of the equality condition is Problem 3.1. □

This result generalizes all the “Cauchy-Schwarz inequalities” you have seen in previous classes, e.g., vectors in Rⁿ, random variables, discrete-time and continuous-time signals, each of which corresponds to a particular inner product space.


Angle

Thanks to this inequality, we can generalize the notion of the angle between vectors to any general inner product space as follows:
θ = cos⁻¹( |〈x, y〉| / (‖x‖ ‖y‖) ), ∀x, y ≠ 0.
This definition is legitimate since the argument of cos⁻¹ will always be between 0 and 1 due to the Cauchy-Schwarz inequality.
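As a quick numerical illustration, the following minimal sketch (Python/NumPy; the test vectors are arbitrary choices, not from the notes) evaluates this angle for the usual Euclidean inner product:

    import numpy as np

    def angle(x, y):
        # theta = arccos(|<x, y>| / (||x|| ||y||)), well defined by Cauchy-Schwarz
        inner = np.vdot(y, x)  # <x, y> = sum_i x_i conj(y_i)
        c = abs(inner) / (np.linalg.norm(x) * np.linalg.norm(y))
        return np.arccos(min(c, 1.0))  # clip tiny rounding overshoot above 1

    print(angle(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # pi/4 ~ 0.785
    print(angle(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # pi/2: orthogonal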

Induced norm

Proposition. In an inner product space (X, 〈·, ·〉), the induced norm ‖x‖ = √〈x, x〉 is indeed a norm.

Proof. What must we show?
• The first axiom ensures that 〈x, x〉 is real.
• ‖x‖ ≥ 0 with equality iff x = 0 follows from Axiom 4.
• ‖αx‖ = √〈αx, αx〉 = √(α〈x, αx〉) = √(α〈αx, x〉*) = √(αα*〈x, x〉*) = |α| √〈x, x〉 = |α| ‖x‖, using Axioms 1 and 3.
• The only condition remaining to be verified is the triangle inequality:
‖x + y‖² = 〈x, x〉 + 〈x, y〉 + 〈y, x〉 + 〈y, y〉 = ‖x‖² + 2 real(〈x, y〉) + ‖y‖² ≤ ‖x‖² + 2 |〈x, y〉| + ‖y‖² ≤ ‖x‖² + 2 ‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)². □

(Recall if z = a + ıb, then a = real(z) ≤ √(a² + b²) = |z|.)

Any inner product space is necessarily a normed space. Is the reverse true? Not in general.

The following property distinguishes inner product spaces from mere normed spaces.

Lemma. (The parallelogram law.) In an inner product space:
‖x + y‖² + ‖x − y‖² = 2 ‖x‖² + 2 ‖y‖², ∀x, y ∈ X. (3-1)
Proof. Expand the norms into inner products and simplify. □

(Figure: a parallelogram with sides x and y and diagonals x + y and x − y.)

Remarkably, the converse of this Lemma also holds (see, e.g., the problem in [2, p. 175]).

Proposition. If (X, ‖·‖) is a normed space over C or R, and its norm satisfies the parallelogram law (3-1), then X is also an inner product space, with inner product:
〈x, y〉 = (1/4) ( ‖x + y‖² − ‖x − y‖² + i ‖x + iy‖² − i ‖x − iy‖² ).

Proof. Homework challenge problem.

Continuity of inner products

Lemma. In an inner product space (X, 〈·, ·〉), if xₙ → x and yₙ → y, then 〈xₙ, yₙ〉 → 〈x, y〉.
Proof.
|〈xₙ, yₙ〉 − 〈x, y〉| = |〈xₙ, yₙ〉 − 〈x, yₙ〉 + 〈x, yₙ〉 − 〈x, y〉|
≤ |〈xₙ, yₙ〉 − 〈x, yₙ〉| + |〈x, yₙ〉 − 〈x, y〉|
= |〈xₙ − x, yₙ〉| + |〈x, yₙ − y〉|
≤ ‖xₙ − x‖ ‖yₙ‖ + ‖x‖ ‖yₙ − y‖ (by Cauchy-Schwarz)
≤ ‖xₙ − x‖ M + ‖x‖ ‖yₙ − y‖ (since yₙ is convergent and hence bounded)
→ 0 as n → ∞.
Thus 〈xₙ, yₙ〉 → 〈x, y〉. □


Examples

Many of the normed spaces we considered previously actually have norms induced by suitable inner products.

Example. In Euclidean space, the usual inner product (aka “dot product”) is
〈x, y〉 = ∑_{i=1}^n aᵢbᵢ, where x = (a₁, . . . , aₙ) and y = (b₁, . . . , bₙ).
Verifying the axioms is trivial. The induced norm is the usual ℓ₂ norm.

Example. For the space ℓ₂ over the complex field, the usual inner product is¹ 〈x, y〉 = ∑ᵢ aᵢbᵢ*.
The Hölder inequality, which is equivalent to the Cauchy-Schwarz inequality for this space, ensures that |〈x, y〉| ≤ ‖x‖₂ ‖y‖₂, so the inner product is indeed finite for x, y ∈ ℓ₂. Thus ℓ₂ is not only a Banach space, it is also an inner product space.

Example. What about ℓp for p ≠ 2? Do suitable inner products exist?

Consider X = (R², ‖·‖_p) with x = (1, 0) and y = (0, 1).

The parallelogram law holds (for this x and y) iff 2(1 + 1)^{2/p} = 2 · 1² + 2 · 1², i.e., iff 2^{2/p} = 2, i.e., iff p = 2.
Thus ℓ₂ is the only inner product space in the ℓp family of normed spaces.
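This failure is easy to check numerically. A minimal sketch (Python/NumPy, using the same x and y as above) shows that among the ℓp norms on R² the parallelogram law holds only for p = 2:

    import numpy as np

    x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    for p in [1, 1.5, 2, 3]:
        lhs = np.linalg.norm(x + y, p)**2 + np.linalg.norm(x - y, p)**2  # 2 * 2^(2/p)
        rhs = 2*np.linalg.norm(x, p)**2 + 2*np.linalg.norm(y, p)**2      # always 4
        print(p, lhs, rhs, np.isclose(lhs, rhs))  # equal only when p = 2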

Example. The space of measurable functions on [a, b] with inner product
〈f, g〉 = ∫_a^b w(t) f(t) g*(t) dt,
where w(t) > 0, ∀t, is some (real) weighting function. Choosing w = 1 yields L₂[a, b].

Hilbert space

Definition. A complete inner product space is called a Hilbert space.

In other words, a Hilbert space is a Banach space along with an inner product that induces its norm. The addition of the inner product opens many analytical doors, as we shall see.

The concept “complete” is appropriate here since any inner product space is a normed space.

All of the preceding examples of inner product spaces were complete vector spaces (under the induced norm).

Example. The following is an inner product space, but not a Hilbert space, since it is incomplete:
R₂[a, b] = { f : [a, b] → R : the Riemann integral ∫_a^b f²(t) dt < ∞ },
with inner product (easily verified to satisfy the axioms): 〈f, g〉 = ∫_a^b f(t) g(t) dt.

¹Note that the conjugate goes with the second argument because of Axiom 3. I have heard that some treatments scale the second argument in Axiom 3, which affects where the conjugates go in the inner products.


Minimum norm optimization problems

Section 3.3 is called “the projection theorem” and it is about a certain type of minimum norm problem. Before focusing on that specific minimum norm problem, we consider the broad family of such problems.

Consider the following problem, which arises in many applications such as approximation problems: Given x in a normed space (X, ‖·‖) and a subset S of X, find “the” vector s ∈ S that minimizes ‖x − s‖.

Example. Control subject to energy constraint. See Section 3.11.

Example. Least-squares estimation: min_θ ‖y − ∑_{i=1}^n θᵢxᵢ‖ is equivalent to min_{m ∈ [{x₁, . . . , xₙ}]} ‖y − m‖.

What questions should we ask about such problems?
• Is there any best s? I.e., does there exist s⋆ ∈ S s.t. ‖x − s⋆‖ = d(x, S)? What answers do we have so far for this question? ??
• If so, is s⋆ unique? Answers thus far? ??
• How is s⋆ characterized? (Better yet would be an explicit formula for s⋆.)

Chebyshev sets

One way to address such questions is “answer by definition.”

Definition. A set S in a normed space (X, ‖·‖) is called proximinal [16] iff ∀x ∈ X, there exists at least one point s ∈ S s.t. ‖x − s‖ = d(x, S).

Definition. In a normed space, a set S is called a Chebyshev set iff ∀x ∈ X, there exists a unique s ∈ S s.t. ‖x − s‖ = d(x, S).

Fact. Any proximinal set is closed. (The points in S̄ − S do not have a closest point in S.)
Fact. Any Chebyshev set is a proximinal set.
Fact. Any compact set is a proximinal set (due to the Weierstrass theorem).

Note that we have not said anything about inner products here, so why not study minimum norm problems in detail in Banach spaces? The answer is due to one of the most famous unsolved problems in functional analysis: characterization of Chebyshev sets in general Banach spaces and in infinite-dimensional Hilbert spaces. What is known includes the following. (See the Deutsch paper [17], a scanned version of which is available on the course web page.)
• Inner product spaces:
  • In finite-dimensional Hilbert spaces, any Chebyshev set is closed, convex, and nonempty.
  • “Conversely,” in any inner product space, any complete and convex set is Chebyshev. (We will prove this later in 3.12.)
• Normed spaces:
  • Are all Chebyshev sets convex? In general: no. A nonconvex Chebyshev set (in an incomplete infinite-dimensional normed space within the space of finite-length sequences) is given in [18].
  • In [3, p. 285], an example is given in a Banach space of a closed (and thus complete) subspace (hence convex) that is not a Chebyshev set.

There is continued effort to characterize Chebyshev sets, e.g., [19–21].

Since the characterization of Chebyshev sets is unsolved in normed spaces, we focus primarily on closed convex sets in inner product spaces hereafter.

However, this fact is encouraging. If S is a nonempty closed subset of Rⁿ, then
{ x ∈ Rⁿ : arg min_{y∈S} ‖x − y‖ is nonunique }
has Lebesgue measure zero [16, p. 493].

Are all complete and bounded sets also proximinal? The answer is yes in finite-dimensional normed spaces, since there all closed and bounded sets are compact. But the answer is “not necessarily” in infinite-dimensional normed spaces, even in a Hilbert space in fact.

Example. Let S = {(1 + 1/n)eₙ : n ∈ N} ⊂ ℓp. Then S is bounded, and is complete since there are no “non-trivial” Cauchy sequences in S. Since d(0, (1 + 1/n)eₙ) = 1 + 1/n, we have d(0, S) = 1, yet there is no s ∈ S for which ‖s − 0‖ = 1.


(Venn diagram relating compact, complete, proximinal, and closed sets: compact sets are proximinal, and proximinal sets are closed.)

Projectors

If S is a Chebyshev set in a normed space (X, ‖·‖), then we can define a projector P : X → S that, for each point x ∈ X, gives the closest point P(x) ∈ S. In other words, for a Chebyshev set S, we can legitimately define
P(x) = arg min_{s∈S} ‖x − s‖,
and “arg min” is well defined since there exists a unique minimizer when S is Chebyshev.

When needed for clarity, we will write P_S rather than just P.

Such a projector satisfies the following properties.
• P(x) ∈ S, ∀x ∈ X
• ‖x − P(x)‖ ≤ ‖x − y‖, ∀y ∈ S, i.e., ‖x − P(x)‖ = d(x, S)
• P(P(x)) = P(x), or more concisely: P² = P.

As noted above, closedness of S is a necessary condition for existence of a projector defined on all of X.

Example. Consider X = R and S = [0, ∞). This is a Chebyshev set with P(x) = max{x, 0}.

Example. Consider X = R² and the (compact) set K = { y ∈ R² : ‖y‖ = 1 } (the unit circle). There is not a unique minimizer of the distance to x = 0, the center of the unit circle.

This is why there is not a plethora of signal processing papers on “projections onto compact sets” (POKS?) methods.

3.3 Orthogonality

Definition. In an inner product space, two vectors x, y are called orthogonal iff 〈x, y〉 = 0, which is denoted x ⊥ y. (This is consistent with the earlier cos⁻¹ definition of angle.)

Definition. A vector x is called orthogonal to a set S iff ∀s ∈ S, x ⊥ s, in which case we write x ⊥ S.

Definition. Two sets S and T in an inner product space (X, 〈·, ·〉) are called orthogonal, written S ⊥ T, iff
〈s, t〉 = 0, ∀s ∈ S, ∀t ∈ T.

Exercise. Show x ⊥ S = [y₁, . . . , yₙ] iff x ⊥ yᵢ, i = 1, . . . , n.

Lemma. (Pythagorean theorem) x ⊥ y ⟹ ‖x + y‖² = ‖x‖² + ‖y‖².

Proof. ‖x + y‖² = ‖x‖² + ‖y‖² + 2 real(〈x, y〉). □

The converse does not hold. Consider C and the vectors x = 1 and y = ı. Then ‖x + y‖² = ‖x‖² + ‖y‖² = 2, but x and y are not perpendicular since here 〈x, y〉 = xy* = −ı ≠ 0.


We first consider the easiest version of the general minimum norm problem, where the set of interest is actually a subspace of X.

Theorem. (Pre-projection theorem) Let X be an inner product space, M a subspace of X, and x a vector in X.
• If there exists a vector m₀ ∈ M such that ‖x − m₀‖ = d(x, M), then that m₀ is unique.
• A necessary and sufficient condition that m₀ ∈ M be the unique minimizing vector in M is that the error vector x − m₀ be orthogonal to M.

Proof.

Claim 1. If ∃m₀ ∈ M (not necessarily unique) s.t. ‖x − m₀‖ = d(x, M), then x − m₀ ⊥ M.
Pf. We show this by contradiction. Suppose m₀ ∈ M is a minimizer, but x − m₀ is not orthogonal to M. Then ∃m ∈ M s.t. 〈x − m₀, m〉 = δ ≠ 0 for some δ, where w.l.o.g. we assume ‖m‖ = 1. Consider m₁ ≜ m₀ + δm ∈ M. Then
‖x − m₁‖² = ‖x − m₀ − δm‖² = ‖x − m₀‖² − δ*〈x − m₀, m〉 − δ〈m, x − m₀〉 + |δ|² = ‖x − m₀‖² − |δ|² < ‖x − m₀‖²,
contradicting the assumption that m₀ is a minimizer.

Claim 2. If ∃m₀ ∈ M s.t. x − m₀ ⊥ M, then m₀ is the unique minimizer.
Pf. For any m ∈ M, since m₀ − m ∈ M, by the Pythagorean theorem:
‖x − m‖² = ‖(x − m₀) + (m₀ − m)‖² = ‖x − m₀‖² + ‖m₀ − m‖² > ‖x − m₀‖²
if m ≠ m₀. So m₀, and only m₀, is the minimizer. □

(Figure: the vector x, its closest point m₀ in the subspace M, and the origin 0.)

In words: in an inner product space, if a subspace is proximinal, then it is Chebyshev.

The “good thing” about this theorem is that it does not require completeness of X, only the presence of an inner product. However, it does not prove the existence of a minimizer!

Is this lack of an existence proof simply because “we” have not been clever enough to find it? Or, are there (incomplete) inner product spaces in which no such minimizer exists? (We cannot find an example by drawing 2d or 3d pictures, since Eⁿ is complete!)

Example. Consider the Hilbert space ℓ₂ with the incomplete (and hence non-closed) subspace M that consists of sequences with only finitely many nonzero terms, and consider x = (1, 1/2, 1/3, 1/4, . . .).

For n ∈ N, let mₙ be identical to x for the first n terms, and then zero thereafter. Then ‖x − mₙ‖₂² = ∑_{k=n+1}^∞ (1/k)², which approaches 0 as n → ∞, but {mₙ} converges to x ∉ M. So d(x, M) = inf_{m∈M} ‖x − m‖ = 0.

But clearly no m ∈ M achieves this minimum since x has an infinite number of nonzero terms.


But how about an example where X is incomplete and M is a closed subspace?

Example. (See [3, p. 289].)
Consider the (incomplete) inner product space X = (“finite-length” sequences, ‖·‖₂). Let u ∈ ℓ₂ denote the sequence of reals uᵢ = 1/2ⁱ, and define the following (uncountably infinite) subspace: M = { x ∈ X : ∑_{i=1}^∞ xᵢuᵢ = 0 }.

Claim 1. M is closed.
Suppose {yₙ} ∈ M and yₙ → y ∈ X. Then (borrowing the ℓ₂ inner product): |〈y, u〉| = |〈y − yₙ, u〉| ≤ ‖y − yₙ‖₂ ‖u‖₂ → 0 as n → ∞ (using the Cauchy-Schwarz inequality), so 〈y, u〉 = 0 and hence y ∈ M.

Claim 2. inf_{m∈M} ‖x − m‖ is not achieved, where x = (1, 0, 0, . . .) ∉ M but x ∈ X.
Suppose m ∈ M achieves that infimum. Then by the pre-projection theorem, z ≜ x − m ⊥ M. Since z ∈ X, z = (z₁, . . . , z_N, 0, 0, . . .) for some N ∈ N. Define wᵢ = (1/uᵢ)eᵢ − (1/u_{N+1})e_{N+1} for i = 1, . . . , N, where {eᵢ} denotes the standard basis vectors for ℓ₂. Since wᵢ ∈ M, 〈z, wᵢ〉 = 0. But therefore zᵢ = 0 for i = 1, . . . , N, so z = 0, i.e., x = m ∈ M. This contradicts the fact that x ∉ M.

To establish existence of a minimizer, we make a stronger assumption: completeness of the subspace. Or, more frequently, we assume the inner product space is complete, so that all closed subspaces within it are complete. Why? ??

Theorem. (The classical projection theorem) Let M be a complete subspace of an inner product space H (e.g., M may be a closed subspace of a Hilbert space).
• For any x ∈ H, there exists a unique m₀ ∈ M such that ‖x − m₀‖ = d(x, M), i.e., M is Chebyshev.
• Furthermore, a necessary and sufficient condition that m₀ ∈ M be the unique minimizer is that x − m₀ ⊥ M.

Proof. Uniqueness and the characterization of m₀ in terms of orthogonality were established in the pre-projection theorem, so we focus on existence.

Clearly, if x ∈ M, then m₀ = x and we are done. For x ∉ M, δ = d(x, M) > 0. Why? If d(x, M) were zero, then we could find a sequence mᵢ ∈ M such that d(x, mᵢ) → 0, meaning mᵢ → x, but that would imply x ∈ M because M is closed, contradicting x ∉ M. So δ > 0.

Let {mᵢ} denote a sequence of vectors in M such that ‖x − mᵢ‖ < δ + 1/i, which is possible by definition of d(x, M).

Claim 1. {mᵢ} is Cauchy.

By the parallelogram law:
‖(mⱼ − x) + (x − mᵢ)‖² + ‖(mⱼ − x) − (x − mᵢ)‖² = 2 ‖mⱼ − x‖² + 2 ‖x − mᵢ‖²,
so
‖mᵢ − mⱼ‖² = 2 ‖mᵢ − x‖² + 2 ‖x − mⱼ‖² − 4 ‖x − ½(mⱼ + mᵢ)‖².

However, ½(mⱼ + mᵢ) ∈ M, so ‖x − ½(mⱼ + mᵢ)‖ ≥ δ. Thus
‖mᵢ − mⱼ‖² ≤ 2 ‖mᵢ − x‖² + 2 ‖x − mⱼ‖² − 4δ² → 2δ² + 2δ² − 4δ² = 0 as i, j → ∞.

Since {mᵢ} is Cauchy and M is complete, ∃m₀ ∈ M s.t. mᵢ → m₀.

Since norms are continuous,
‖x − m₀‖ = ‖x − lim_{i→∞} mᵢ‖ = lim_{i→∞} ‖x − mᵢ‖ ≤ lim_{i→∞} (δ + 1/i) = δ = d(x, M). □

Remark. The key step in the proof was (the clever use of) the parallelogram law, a defining property of inner product spaces.

Remark. The proof uses only completeness of M, not of H. We will use this generality in a subsequent example.


Polynomial approximation example

Consider the problem of approximating the function x(t) = sin⁻¹ t over the interval [−1, 1] by a third-order polynomial.

If we want the approximation to fit better at the ends than in the middle, then the following inner product space is reasonable:

X = { f : [−1, 1] → R : f is continuous }, 〈f, g〉 = ∫_{−1}^{1} w(t) f(t) g(t) dt, where w(t) = 1 + t². (Picture)

Since x(t) is an odd function, the following subspace of X suffices:
M = { at + bt³ : a, b ∈ R }.

Is X complete? ?? Is M? ??

To find the best 3rd-order polynomial approximation, i.e., m⋆ = arg min_{m∈M} ‖x − m‖, we apply the projection theorem, which characterizes the minimizer through x − m⋆ ⊥ M. Denoting m⋆(t) = ct + dt³, then
∫_{−1}^{1} w(t) (sin⁻¹ t − ct − dt³)(at + bt³) dt = 0, ∀a, b ∈ R.

Since aq + br = 0, ∀a, b ∈ R ⟺ q = r = 0, we can reduce the problem to the following finite-dimensional system of equations:

⎡ ∫_{−1}^{1} (1 + t²) t sin⁻¹t dt ⎤   ⎡ ∫_{−1}^{1} (1 + t²) t² dt   ∫_{−1}^{1} (1 + t²) t⁴ dt ⎤ ⎡ c ⎤
⎣ ∫_{−1}^{1} (1 + t²) t³ sin⁻¹t dt ⎦ = ⎣ ∫_{−1}^{1} (1 + t²) t⁴ dt   ∫_{−1}^{1} (1 + t²) t⁶ dt ⎦ ⎣ d ⎦.

Using MATLAB’s symbolic toolbox for the integration yields:

⎡ c ⎤   ⎡ 16/15  24/35 ⎤⁻¹ ⎡ 13/32 ⎤     ⎡ 296/1027 ⎤
⎣ d ⎦ = ⎣ 24/35  32/63 ⎦   ⎣ 13/48 ⎦ π = ⎣ 148/1027 ⎦ π.

Thus m⋆(t) = (296π/1027) t + (148π/1027) t³, so the slope at the origin is ṁ⋆(0) = 296π/1027 ≈ 0.91.

The following figure shows x(t), the Taylor approximation t + t³/3!, and the minimum norm approximation m⋆(t). Although the Taylor approximation fits best near t = 0, the minimum norm approximation has a better overall fit.

(Figure: f(t) = asin(t), its Taylor approximation, and the minimum norm approximation, plotted on [−1, 1].)

It would be fair to argue that we did not really need the general version of the projection theorem for this problem. We could have solved min_{a,b∈R} ∫_{−1}^{1} w(t)(x(t) − at − bt³)² dt by conventional methods. The forthcoming Fourier series examples, where M is infinite dimensional, are (perhaps?) more compelling.

What about ‖·‖1? ??
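Numerically, the computation above is easy to verify. A minimal sketch (Python with NumPy/SciPy quadrature, standing in for the MATLAB symbolic integration used above) forms the 2 × 2 Gram system and reproduces c = 296π/1027 ≈ 0.905 and d = 148π/1027 ≈ 0.453:

    import numpy as np
    from scipy.integrate import quad

    w = lambda t: 1 + t**2                    # the weighting function above
    basis = [lambda t: t, lambda t: t**3]     # basis for M = {at + bt^3}

    # Gram matrix G_ij = <y_j, y_i> and right-hand side b_i = <x, y_i>, x = arcsin
    G = np.array([[quad(lambda t: w(t)*bi(t)*bj(t), -1, 1)[0] for bj in basis]
                  for bi in basis])
    b = np.array([quad(lambda t: w(t)*np.arcsin(t)*bi(t), -1, 1)[0] for bi in basis])

    c, d = np.linalg.solve(G, b)
    print(c, d)                               # ~0.9054, 0.4527
    print(296*np.pi/1027, 148*np.pi/1027)     # the symbolic answer, for comparison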


Complete subspaces versus Chebyshev subspaces

The preceding theorems and examples might lead one to conjecture that completeness of a subspace M is a necessary condition for M to be Chebyshev. The next example shows otherwise.

Example. Consider the (incomplete) inner product space X consisting of sequences with finitely many nonzero terms, with the usual ℓ₂ inner product, and define the subspace M = {(a₁, a₂, . . .) ∈ X : a₁ = 0}. This subspace is closed, but is not complete, for the same reasons that X is incomplete. Nevertheless, M is Chebyshev, and x = (a₁, a₂, a₃, . . .) ⟹ P_M(x) = (0, a₂, a₃, . . .).

These complications are eliminated if we focus on Hilbert spaces. In a Hilbert space, completeness of a subspace becomes a necessary condition for the subspace to be Chebyshev.

Theorem. [22, p. 192] In a Hilbert space H, a subspace M is Chebyshev if and only if M is complete.

Proof. M complete ⟹ M Chebyshev follows from the projection theorem.
M Chebyshev ⟹ M closed ⟹ M complete (since H is also a complete normed space). □

Summary

The following Venn diagram summarizes the situation with subspaces in any inner product space.

(Venn diagram: among the subspaces, the complete, Chebyshev, and closed ones form nested ellipses.)

In Hilbert spaces (including all finite-dimensional inner product spaces), the three ellipses coincide!

Example. In signal processing problems, the subspace of band-limited (continuous-time) signals with a certain band-limit is important. Is this subspace complete?

Consider X = L₂[R] and define M ⊂ X to be the subspace of all square-integrable functions having Fourier transform supported on the frequency interval [−ν_max, ν_max]. Since X is a Hilbert space, the preceding theorem says that the question of whether M is complete is equivalent to determining whether M is Chebyshev. In this case, a simple way to answer that is to construct a projector for M. Given x ∈ X, can we find a (unique, in the L₂ sense) band-limited function y⋆ such that
‖y⋆ − x‖ ≤ ‖y − x‖, ∀y ∈ M?

Applying Parseval’s theorem from Fourier analysis:
y ∈ M ⟹ ‖y − x‖² = ∫_{−∞}^{∞} |y(t) − x(t)|² dt = ∫_{−∞}^{∞} |Y(ν) − X(ν)|² dν
= ∫_{−ν_max}^{ν_max} |Y(ν) − X(ν)|² dν + ∫_{|ν|>ν_max} |X(ν)|² dν.

Clearly this is minimized by taking y⋆ to be the (unique in the L₂ sense) signal having Fourier transform
Y⋆(ν) = X(ν) for |ν| ≤ ν_max, and Y⋆(ν) = 0 otherwise.

Since a projector exists, M is Chebyshev, and hence M is complete.

Of course, in this case M is also convex.
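For discrete sampled signals, this projector is simply an ideal lowpass filter in the DFT domain. A minimal sketch (Python/NumPy; a discrete stand-in for the continuous-time statement, with an arbitrary band limit) illustrating that P is idempotent and that x − P(x) ⊥ P(x):

    import numpy as np

    def P(x, keep=50):
        # Projector onto band-limited signals: zero DFT bins outside |k| <= keep.
        X = np.fft.fft(x)
        X[keep + 1:-keep] = 0
        return np.fft.ifft(X).real

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1024)
    y = P(x)
    print(np.allclose(P(y), y))            # idempotent: P(P(x)) = P(x)
    print(abs(np.vdot(x - y, y)) < 1e-9)   # the error x - P(x) is orthogonal to P(x)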


3.4 Orthogonal complements (The key to duality)

We saw in the projection theorem that an orthogonality condition characterizes the closest point in a complete subspace of an inner product space. We now examine orthogonality in more detail.

In R², we decompose any vector into a sum of an “x-component” vector and a “y-component” vector, the two of which are orthogonal. We can extend this concept considerably in general Hilbert spaces.

Definition. If S is a subset of an inner product space (X, 〈·, ·〉), then the orthogonal complement of S is the subspace:
S⊥ ≜ { x ∈ X : x ⊥ S }.

Clearly
• x ∈ S⊥ ⟺ x ⊥ S
• y ∈ S ⟹ y ⊥ S⊥.

Example. What is {0}⊥? ??

Example. In 3-space, what is the orthogonal complement of the “cone” {αx : α ∈ [0, ∞)} for some x ≠ 0? ??

Proposition. If S and T are subsets of an inner product space (X, 〈·, ·〉), then the following hold.
(a) S⊥ is a closed subspace of X.
(b) S ⊆ S⊥⊥. When are they equal? See the next theorem!
(c) If S ⊆ T, then T⊥ ⊆ S⊥.
(d) S⊥⊥⊥ = S⊥.
(f) S⊥ = (S̄)⊥, and S⊥ equals its own closure.

Proof.

(a) S⊥ is a subspace since linear combinations of vectors orthogonal to S are also orthogonal to S. Suppose {xₙ} ∈ S⊥ with xₙ → x ∈ X. Since 〈xₙ, s〉 = 0, ∀s ∈ S, by continuity of the inner product we have: 〈x, s〉 = 〈limₙ→∞ xₙ, s〉 = limₙ→∞ 〈xₙ, s〉 = 0, so x ∈ S⊥.

(b) y ∈ S ⟹ y ⊥ S⊥ ⟹ y ∈ S⊥⊥, so S ⊆ S⊥⊥.

(c) If S ⊆ T, then y ∈ T⊥ ⟹ y ⊥ x, ∀x ∈ T ⟹ y ⊥ x, ∀x ∈ S ⟹ y ∈ S⊥. So T⊥ ⊆ S⊥.

(d) From the above, S⊥ ⊆ S⊥⊥⊥. Also S ⊆ S⊥⊥, so by property (c): S⊥⊥⊥ ⊆ S⊥. Thus S⊥⊥⊥ = S⊥.

(f) From (a), S⊥ is closed, so it equals its own closure, which is the second claim. Now pick any x ⊥ S. If y ∈ S̄, then ∃sₙ ∈ S s.t. sₙ → y. Since x ⊥ S, we have x ⊥ sₙ, ∀n, so 〈x, y〉 = 〈x, limₙ→∞ sₙ〉 = limₙ→∞ 〈x, sₙ〉 = 0. Thus x ⊥ S ⟹ x ⊥ S̄. Since x was arbitrary, S⊥ ⊆ (S̄)⊥. Furthermore, S ⊆ S̄ ⟹ (S̄)⊥ ⊆ S⊥. Thus S⊥ = (S̄)⊥. □

Proposition. If S is a subset of a Hilbert space, then the following hold.
(e) S⊥⊥ is the closure of [S], i.e., S⊥⊥ is the smallest closed subspace containing S.
(g) S⊥ is complete.

Proof.
(e) See Problem 3.9. ??
(g) follows since S⊥ is closed and a Hilbert space is complete. □

Example. E² with S = {(1, 0)}. Then S⊥ = [(0, 1)], S⊥⊥ = [(1, 0)] ≠ S, and S⊥⊥⊥ = [(0, 1)] = S⊥.

Example. E² with S = {(1, 0), (2, 0)} and T = {(1, 0)}. Then S⊥ = [(0, 1)] = T⊥, yet it is not the case that S ⊆ T. So the converse of (c) above fails.


Orthogonal projection

The projection theorem allows us to extend the geometric projection property of Euclidean space to general inner product spaces.

Definition. Let M be a Chebyshev subspace of an inner product space X. For each x ∈ X, let P(x) be the unique point in M closest to x:
P(x) = arg min_{m∈M} ‖x − m‖.
Then P : X → M is called the orthogonal projection of X onto M. When needed for clarity, we write P_M rather than just P.

Lemma. If M is a Chebyshev subspace of an inner product space, then
• M⊥ is a Chebyshev set,
• P_{M⊥}(x) = P⊥_M(x) ≜ x − P_M(x), which is called the projection onto the orthogonal complement²,
• x = P_M(x) + P⊥_M(x), where P_M(x) ∈ M and P⊥_M(x) ∈ M⊥.

Proof. (Exercise)

Recall the following properties of the projector P onto a Chebyshev subset S.
• P(x) ∈ S,
• ‖x − P(x)‖ = d(x, S),
• P(P(x)) = P(x) (idempotent).

Here are some trivial properties of orthogonal projections (onto subspaces) that follow directly from the projection theorem.
• x − P(x) ⊥ M
• P⊥(x) ∈ M⊥
• P(x) ⊥ P⊥(x)
• x = P(x) + P⊥(x)

Here are some less trivial properties.

Proposition. If M is a Chebyshev subspace of an inner product space X, then the orthogonal projector P = P_M has the following properties.
(a) P(x) = 0 iff x ⊥ M (cf. earlier figure)
(b) P : X → M is a linear operator (cf. earlier figure)
(c) |||P||| = sup_{x≠0} ‖P(x)‖ / ‖x‖ = 1 (see p. 105) (provided M is at least 1 dimensional, i.e., not simply {0})
(d) P is continuous, i.e., xₙ → x ⟹ P(xₙ) → P(x).

Proof.
(a) P(x) = 0 ⟺ x ⊥ M follows directly from the “characterization” part of the pre-projection theorem.
(b) P(x₁) = m₁ and P(x₂) = m₂ ⟹ x₁ − m₁ ⊥ M and x₂ − m₂ ⊥ M. Thus α(x₁ − m₁) + β(x₂ − m₂) ⊥ M, so (αx₁ + βx₂) − (αm₁ + βm₂) ⊥ M. Thus P(αx₁ + βx₂) = αm₁ + βm₂ = α P(x₁) + β P(x₂) by the pre-projection theorem.
(c) x ∈ M ⟹ P(x) = x, so |||P||| ≥ 1. x ∈ H ⟹ x − P(x) ⊥ M ⟹ x − P(x) ⊥ P(x) since P(x) ∈ M. So by the Pythagorean theorem: ‖x‖² = ‖x − P(x) + P(x)‖² = ‖x − P(x)‖² + ‖P(x)‖² ≥ ‖P(x)‖², so ‖P(x)‖ / ‖x‖ ≤ 1.
(d) Exercise. ?? □

(Figure: a sequence xₙ → x, with P(xₙ) → P(x) in the subspace M.)

²Notice the reuse of the symbol ⊥ here. This is reasonable since P_{M⊥} = P⊥_M when M is a Chebyshev subspace in an inner product space.


Direct sum

Recall that if S, T are subsets of a common vector space X, then S + T = {s + t : s ∈ S, t ∈ T}. However, in general, if x ∈ S + T, decompositions of the form x = s + t need not be unique.

Definition. A vector space X is called the direct sum of subspaces M and N iff each x ∈ X has a unique representation as x = m + n where m ∈ M and n ∈ N. We notate this situation as follows:
X = M ⊕ N.

Fact. If {u₁, . . . , uₙ} is a linearly independent set of vectors, then
[{u₁, . . . , uₙ}] = [{u₁}] ⊕ · · · ⊕ [{uₙ}].

This is an algebraic concept, so we could have introduced it much earlier. But its main use is in the context of Hilbert spaces.

Theorem. If M is a Chebyshev subspace of an inner product space X, then
X = M ⊕ M⊥, and M⊥⊥ = M.

Proof. As shown previously, x = m⋆ + n⋆ where m⋆ = P(x) ∈ M and n⋆ = P⊥(x) ∈ M⊥.

(However, uniqueness of m⋆ as the minimizer in M of ‖x − m‖ does not alone ensure uniqueness of the decomposition m⋆ + n⋆, so we must prove uniqueness next.)

Suppose x = m + n with m ∈ M and n ∈ M⊥. Then 0 = x − x = (m⋆ + n⋆) − (m + n) = (m⋆ − m) + (n⋆ − n), but m⋆ − m ⊥ n⋆ − n, so by the Pythagorean theorem: 0 = ‖0‖² = ‖m⋆ − m‖² + ‖n⋆ − n‖². Thus m⋆ = m and n⋆ = n.

Since M ⊆ M⊥⊥ was shown previously, we need to show M⊥⊥ ⊆ M. Suppose x ∈ M⊥⊥. By the first part of this theorem, x = m + n where m ∈ M ⊆ M⊥⊥ and n ∈ M⊥. Since both x and m are in M⊥⊥, n = x − m ∈ M⊥⊥. But also n ∈ M⊥, so n ⊥ n, i.e., 〈n, n〉 = 0 ⟹ n = 0. Thus x = m ∈ M, and hence M⊥⊥ ⊆ M since x was arbitrary. □

Corollary. For any subset S of a Hilbert space X: X = [S]‾ ⊕ [S]⊥, where [S]‾ denotes the closure of the span [S].

Summarizing previous results:
• In any inner product space, for any subset S we have S ⊆ S⊥⊥.
• In any inner product space, for any Chebyshev subspace M we have M = M⊥⊥.

Example. A subspace M in a Hilbert space where M ≠ M⊥⊥.

Take X = ℓ₂ and M = {sequences with finitely many nonzero terms} (not closed). Then M⊥ = {0} and M⊥⊥ = ℓ₂ = M̄ ≠ M.

Example of a closed subspace in an inner product space where M ≠ M⊥⊥? (Exercise). ??


Having established the fundamental theory of inner product spaces, we now move towards “applications:” Fourier series and other minimum norm problems like approximation.

3.5 Orthogonal sets

Orthogonal sets of vectors in a Hilbert space (such as complex exponentials for ordinary Fourier series) are very useful in applications such as signal analysis.

Definition. A set S of vectors in an inner product space is called an orthogonal set iff
u, v ∈ S, u ≠ v ⟹ u ⊥ v.
If in addition each vector in S has unity norm, then S is called an orthonormal set.

Remark. An orthogonal set can include the zero vector. An orthonormal set cannot.

Example. L₂[0, 1] with S = {cos(2πkt) : k ∈ N} (countable).

Do uncountable orthogonal sets exist? The example S = { 1_{{t=a}} : a ∈ (0, 1) } fails since each of its elements is ≡ 0 in L₂.

Proposition. In any inner product space, an orthogonal set of nonzero vectors is a linearly independent set.

Proof. Suppose {u₁, . . . , uₙ} ⊂ S and ∑_{i=1}^n αᵢuᵢ = 0, yet uᵢ ≠ 0, ∀i. Then
0 = 〈0, u_k〉 = 〈∑_{i=1}^n αᵢuᵢ, u_k〉 = α_k ‖u_k‖²,
so α_k = 0, k = 1, . . . , n. Thus the vectors are linearly independent. Since n and the αᵢ's and uᵢ's were arbitrary, the set is linearly independent. □

Fact. (Projection onto the span of a single vector.) Using a convenient shorthand:
P_u(x) ≜ P_{[{u}]}(x) = (〈x, u〉 / ‖u‖²) u for u ≠ 0, and P_u(x) = 0 for u = 0.

Proof. 〈αu, x − P_u(x)〉 = α 〈u, x − (〈x, u〉/‖u‖²) u〉 = α (〈u, x〉 − 〈x, u〉*) = 0, so x − P_u(x) ⊥ [{u}].

Fact. If {u₁, . . . , uₙ} form an orthogonal set, then
P_{[{u₁,...,uₙ}]}(x) = P_{u₁}(x) + · · · + P_{uₙ}(x).

Proof. Exercise.

Fact. If M and N are orthogonal Chebyshev subspaces of an inner product space, then (Prob. 3.7)
• M + N = M ⊕ N,
• M ⊕ N is a Chebyshev subspace, and
• P_{M⊕N}(x) = P_M(x) + P_N(x).


Gram-Schmidt procedure

Since orthonormal sets are so convenient, it is fortunate that we can always create such sets by using the Gram-Schmidt orthogonalization procedure described in the proof of the following theorem.

This is another generalization of a familiar method in finite dimensions to general inner product spaces.

Theorem. (Gram-Schmidt) If {xᵢ} is a finite or countable sequence of linearly independent vectors in an inner product space (X, 〈·, ·〉), then there exists an orthonormal sequence {eᵢ} such that
[{e₁, . . . , eₙ}] = [{x₁, . . . , xₙ}], ∀n ∈ N.

Proof. Linearly independent vectors are necessarily nonzero, so ‖xᵢ‖ ≠ 0.

Take e₁ = x₁/‖x₁‖, which clearly has unity norm and spans the same space as x₁.

Form the remaining vectors recursively:
zₙ = xₙ − ∑_{i=1}^{n−1} 〈xₙ, eᵢ〉eᵢ, eₙ = zₙ/‖zₙ‖, n = 2, 3, . . . .

Being a linear combination of linearly independent vectors, zₙ is nonzero. And zₙ ⊥ eᵢ for i = 1, . . . , n − 1 is easily verified. Since we can write xₙ as a linear combination of the eᵢ vectors, by an induction argument the span of {e₁, . . . , eₙ} equals the span of {x₁, . . . , xₙ}. □

Note:
zₙ = xₙ − ∑_{i=1}^{n−1} 〈xₙ, eᵢ〉eᵢ = xₙ − ∑_{i=1}^{n−1} P_{eᵢ}(xₙ) = xₙ − P_{[{e₁,...,eₙ₋₁}]}(xₙ) = P⊥_{[{e₁,...,eₙ₋₁}]}(xₙ) = P⊥_{[{x₁,...,xₙ₋₁}]}(xₙ).
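A minimal sketch of the procedure (Python/NumPy, for the Euclidean inner product; swapping in a different inner product changes only the 〈xₙ, eᵢ〉 and ‖zₙ‖ computations):

    import numpy as np

    def gram_schmidt(X):
        # Orthonormalize the rows of X exactly as in the proof:
        # z_n = x_n - sum_i <x_n, e_i> e_i ;  e_n = z_n / ||z_n||
        es = []
        for x in X:
            z = x - sum(np.vdot(e, x) * e for e in es)
            es.append(z / np.linalg.norm(z))
        return np.array(es)

    X = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    E = gram_schmidt(X)
    print(np.round(E @ E.T, 12))  # identity matrix: the e_n are orthonormal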

Corollary. Any finite-dimensional inner product space has an orthonormal basis.

Example. The polynomials xᵢ(t) = tⁱ, i = 0, 1, . . . , are linearly independent in L₂[−1, 1]. Why? ??

Applying Gram-Schmidt yields the orthogonal polynomials:
e₀(t) = x₀(t)/‖x₀‖ = 1/√2,
z₁(t) = x₁(t) − 〈x₁, e₀〉e₀(t) = t − ( ∫_{−1}^{1} t · (1/√2) dt ) · (1/√2) = t,
‖z₁‖² = ∫_{−1}^{1} t² dt = 2/3,
e₁(t) = z₁(t)/‖z₁‖ = √(3/2) t,
z₂(t) = x₂(t) − 〈x₂, e₀〉e₀(t) − 〈x₂, e₁〉e₁(t) = t² − 1/3, . . . .

One can show by induction that
eₙ(t) = √((2n + 1)/2) Pₙ(t), n = 0, 1, 2, . . . ,
where the Pₙ(t) are the Legendre polynomials
Pₙ(t) = (1/(2ⁿ n!)) (dⁿ/dtⁿ)(t² − 1)ⁿ.

Clearly some subset of these eᵢ(t)'s will be an orthonormal basis for any finite-dimensional space of polynomials.
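One can spot-check this claim numerically. A minimal sketch (Python/SciPy; eval_legendre implements the same Pₙ as the Rodrigues formula above) verifying that the eₙ are orthonormal on [−1, 1]:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import eval_legendre

    def e(n, t):
        # e_n(t) = sqrt((2n+1)/2) P_n(t)
        return np.sqrt((2*n + 1) / 2) * eval_legendre(n, t)

    for m in range(4):
        row = [quad(lambda t: e(m, t)*e(n, t), -1, 1)[0] for n in range(4)]
        print(np.round(row, 10))  # rows of the 4x4 identity matrix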

Is the entire collection some type of “basis” for L₂[−1, 1]? (It is not a Hamel basis for L₂.) We will return to this question soon!


(Figure: the Legendre-based orthonormal polynomials eₙ(t) on [−1, 1] for n = 0, . . . , 5.)


Approximation

A previous example considered the problem of approximating arcsin(t) by a 3rd-order polynomial, which reduced to a 2 by 2 system of equations since only 2 of the 4 coefficients were relevant.

We now consider such approximation problems more generally, and see that such reductions to a finite system of equations are the general behavior when the subspace M is finite dimensional.

Suppose M is a finite-dimensional subspace of an inner product space (X, 〈·, ·〉). Then by definition, M = [{y₁, . . . , yₙ}] for some vectors yᵢ ∈ X. Furthermore, being finite dimensional, M is complete, so by the projection theorem, M is a Chebyshev set. Thus, given an arbitrary vector x ∈ X, there exists a unique approximation x̂ ∈ M that is closest to x, as measured, of course, by the norm induced by the inner product:
‖x − x̂‖ = d(x, M) = inf_{m∈M} ‖x − m‖.

Now we would like to find an explicit formula for x̂, since “existence and uniqueness” alone is inadequate for most practical applications.

3.6 Normal equations

Since x̂ ∈ M ⟹ x̂ = ∑_{i=1}^n αᵢyᵢ, we must find the n scalars {αᵢ} that minimize ‖x − ∑_{i=1}^n αᵢyᵢ‖.

The projection theorem ensures that x̂ exists, and is characterized by x − x̂ ⊥ M, or equivalently x − x̂ ⊥ yⱼ for j = 1, . . . , n. Thus
0 = 〈x − x̂, yⱼ〉 = 〈x − ∑_{i=1}^n αᵢyᵢ, yⱼ〉 = 〈x, yⱼ〉 − ∑_{i=1}^n αᵢ〈yᵢ, yⱼ〉, j = 1, . . . , n.

Rearranging yields the n × n system of linear equations for the scalar coefficients, called the normal equations:

⎡ 〈y₁, y₁〉 · · · 〈yₙ, y₁〉 ⎤ ⎡ α₁ ⎤   ⎡ 〈x, y₁〉 ⎤
⎢    ⋮      ⋱      ⋮     ⎥ ⎢  ⋮  ⎥ = ⎢    ⋮    ⎥
⎣ 〈y₁, yₙ〉 · · · 〈yₙ, yₙ〉 ⎦ ⎣ αₙ ⎦   ⎣ 〈x, yₙ〉 ⎦.

If the yᵢ's are vectors in Cᵐ, with the usual inner product 〈x, y〉 = y′x, then defining the m × n matrix Y = [y₁ . . . yₙ] we have
α = (Y′Y)⁻¹Y′x.

In particular, if n = 1, then x̂ = 〈x, y/‖y‖〉 y/‖y‖ (cf. the picture we draw).

Example. See the previous polynomial approximation to arcsin(t).
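For the Cᵐ (or Rᵐ) case, the normal equations are one line of linear algebra. A minimal sketch (Python/NumPy, with arbitrary test data) comparing the explicit formula against a library least-squares solver, and checking the orthogonality condition x − x̂ ⊥ M:

    import numpy as np

    rng = np.random.default_rng(0)
    Y = rng.standard_normal((5, 3))   # columns y_1, ..., y_n with m = 5, n = 3
    x = rng.standard_normal(5)

    alpha = np.linalg.solve(Y.T @ Y, Y.T @ x)   # normal equations (Y'Y) alpha = Y'x
    xhat = Y @ alpha                            # closest point in M = span(Y)

    print(np.allclose(alpha, np.linalg.lstsq(Y, x, rcond=None)[0]))  # True
    print(np.round(Y.T @ (x - xhat), 12))       # zeros: x - xhat is orthogonal to M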


Gram matrices

The n × n matrix above is called the Gram matrix of {y₁, . . . , yₙ}. Its determinant g(y₁, . . . , yₙ) is called the Gram determinant.

To find the best αᵢ's, we must solve the above system of equations, which has a unique solution iff the Gram determinant is nonzero.

Proposition. The Gram determinant is nonzero iff the vectors {y₁, . . . , yₙ} are linearly independent.

Proof. If the yᵢ's are linearly dependent, then ∃ αᵢ's not all zero such that ∑ᵢ αᵢyᵢ = 0. Thus the rows of the Gram matrix are also linearly dependent, so the determinant is zero.

Conversely, if the determinant is zero, then the rows of the Gram matrix are linearly dependent, so ∃ αᵢ's not all zero such that ∑ᵢ αᵢ〈yᵢ, yⱼ〉 = 0 for all j. Thus 〈∑ᵢ αᵢyᵢ, yⱼ〉 = 0 for all j, so ∑ⱼ αⱼ* 〈∑ᵢ αᵢyᵢ, yⱼ〉 = 0. Thus ‖∑ᵢ αᵢyᵢ‖ = 0, so ∑ᵢ αᵢyᵢ = 0, so the yᵢ's are linearly dependent. □

Remark. If the yᵢ's are linearly dependent, then there are multiple solutions to the normal equations, all of which are equally good approximations. Often, at least in signal processing, of these many solutions we prefer the one that minimizes the Euclidean norm of (α₁, . . . , αₙ).

However, no matter which solution for α we choose, when added up via x̂ = ∑_{i=1}^n αᵢyᵢ we will get the same x̂, since x̂ is unique by the projection theorem! Uniqueness of x̂ is different than uniqueness of α.

The text also describes the explicit error norm formula (which does not seem particularly useful):
‖x − x̂‖² = g(y₁, . . . , yₙ, x) / g(y₁, . . . , yₙ).

We see now that the approximation problem has an easily computed solution when M is a finite-dimensional subspace. We will see later in 3.10 that the same is true if M⊥ is finite dimensional!

Orthogonal bases

What happens if the yᵢ's are orthonormal? Then the Gram matrix is just the identity matrix, and we can immediately write down the optimal approximation:
x̂ = ∑_{i=1}^n αᵢyᵢ, αᵢ = 〈x, yᵢ〉, or equivalently x̂ = ∑_{i=1}^n P_{yᵢ}(x).

To generalize this result, we want to consider infinite-dimensional approximating subspaces, since that is often of most interest in practice (e.g., ordinary Fourier series).

3.9 Approximation and Fourier series

Returning to the case of a finite-dimensional subspace M, we can find the minimum norm solution by first applying Gram-Schmidt to orthonormalize a (linearly independent) basis for M, and then using the Fourier series:
x̂ = ∑_{i=1}^n 〈x, eᵢ〉eᵢ.
Thus applying Gram-Schmidt is equivalent to inverting the Gram matrix. Pick your poison...


Weighted least-squares FIR filter design

Example. Suppose we have a given desired frequency response D(ω) that we would like to approximate by an FIR filter with impulse response
h[n] = ∑_{k=0}^{M} h[k] δ[n − k]
and corresponding frequency response
H(ω) = ∑_{k=0}^{M} h[k] e^{−ıωk} ∈ [1, e^{−ıω}, e^{−ı2ω}, . . . , e^{−ıMω}].

The natural inner product space here is L₂[−π, π] but with a weighted inner product
〈H₁, H₂〉 = ∫_{−π}^{π} H₁(ω) H₂*(ω) W(ω) dω,
where the positive weighting function W(ω) can influence which frequency bands require the closest match between D(ω) and H(ω), since the induced norm is
‖D − H‖² = ∫_{−π}^{π} |D(ω) − H(ω)|² W(ω) dω.

The Gram matrix G has elements
G_{kl} = 〈e^{−ıωk}, e^{−ıωl}〉 = ∫_{−π}^{π} e^{−ıω(k−l)} W(ω) dω, k, l = 0, . . . , M,
so the WLS optimal filter design has coefficients h = G⁻¹d, where d = (d₀, . . . , d_M) with
d_k = 〈D, e^{−ıωk}〉 = ∫_{−π}^{π} D(ω) e^{ıωk} W(ω) dω, k = 0, . . . , M.

Because the complex exponentials e^{−ıωk} are linearly independent, the Gram matrix is invertible.

What if we want to minimize ‖D − H‖∞ instead? Use the Remez algorithm...
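A minimal numerical sketch of this WLS design (Python/NumPy; the desired response D(ω), weight W(ω), and order M below are illustrative assumptions, and the integrals are approximated on a frequency grid):

    import numpy as np

    M = 8                                          # filter order
    om = np.linspace(-np.pi, np.pi, 4001)          # frequency grid
    dom = om[1] - om[0]
    D = (np.abs(om) < np.pi/2).astype(float)       # ideal lowpass desired response
    W = np.where(np.abs(om) < np.pi/2, 1.0, 10.0)  # weight the stopband 10x

    E = np.exp(-1j * np.outer(om, np.arange(M + 1)))  # columns e^{-i om k}
    G = E.conj().T @ (E * W[:, None]) * dom        # G_kl = int e^{-i om(k-l)} W dom
    d = E.conj().T @ (D * W) * dom                 # d_k = int D e^{+i om k} W dom

    h = np.linalg.solve(G, d).real                 # WLS coefficients h = G^{-1} d
    err = np.sum(np.abs(D - E @ h)**2 * W) * dom   # weighted squared error
    print(h, err)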


We now explore other minimum norm problems, in particular infinite-dimensional ones where neither the normal equations nor the Fourier series solutions are applicable directly. In particular, we consider the broad family of problems involving linear varieties.

Linear varieties (from 2.3 and 3.10)

Definition. A subset V of a vector space X is called a linear variety if V = x₀ + M for some x₀ ∈ X and some subspace M of X. Another term used is affine subspace.

Exercise. If (X, F) is a vector space and V ⊂ X, then the following are equivalent.
• V is a linear variety.
• For any x⋆ ∈ V, the set M = {x − x⋆ : x ∈ V} is a subspace.
• ∀x, y, z ∈ V, α ∈ F, α(x − y) + z ∈ V.

Example. In R², consider V = {(a, b) : a + b = 1, a, b ∈ R}. (A line not through the origin.)

Exercise. If V = x₀ + M, then V is complete / Chebyshev / closed iff M is complete / Chebyshev / closed.

• Any subspace is a linear variety (take x₀ = 0), so linear varieties are a small generalization of subspaces.
• A single point x₀ is a linear variety (take M = {0}).

Fact. The point x₀ need not be unique. If V = x₀ + M is a linear variety, then for any v₀ ∈ V we can write V = v₀ + M, because v₀ = x₀ + m₀ for some m₀ ∈ M, so if v ∈ V then v = x₀ + m = v₀ − m₀ + m = v₀ + (m − m₀), where m − m₀ ∈ M.

In a “variety” of problems one would like to find the x having minimum norm (e.g., minimum energy) subject to certain constraints.

Example. Continuing the previous example, the following figure shows the v⋆ ∈ V with minimum ‖·‖₂ norm.

(Figure: the line V and its minimum norm point v⋆.)

3.10 Dual approximation problem

The following theorem shows that such problems always have a unique solution, and characterizes that solution.

Theorem. (Modified projection theorem.) Let M be a Chebyshev subspace of an inner product space X. If V = x₀ + M is a linear variety where x₀ ∈ X, then there exists a unique v⋆ ∈ V having minimum norm, and v⋆ is characterized completely by the two conditions v⋆ ∈ V and v⋆ ⊥ M.

Proof. Simply translate V by −x₀ and apply the projection theorem:
inf_{v∈V} ‖v‖ = inf_{m∈M} ‖x₀ + m‖ = inf_{m∈M} ‖x₀ − m‖ = ‖x₀ − x⋆‖, where x⋆ = P_M(x₀) and x₀ − x⋆ ⊥ M.
So use v⋆ ≜ x₀ − x⋆ = P⊥_M(x₀) ∈ V; then v⋆ ⊥ M and v⋆ has minimum norm in V. □


Remark. Note that v⋆ ⊥ M, not v⋆ ⊥ V; cf. the preceding figure.

(Figure: the subspace M, the variety V = x₀ + M, the closest point x⋆ = P_M(x₀), and v⋆ = x₀ − x⋆.)

Why called dual? Perhaps:
x⋆ = arg min_{m∈M} ‖x₀ − m‖ = x₀ − v⋆, where v⋆ = arg min_{v ∈ V = x₀+M} ‖v‖.

Exercise. Generalize to the problem arg min_{v∈V} ‖x − v‖ if possible. ??

One important application of linear varieties is in the study of problems with constraints.

In particular, the projection theorem led to the very convenient normal equations for the case of finite-dimensional subspaces. But there are also problems where something akin to the normal equations still applies even though the problem appears to be infinite dimensional.

Example. The set V in the previous example could be written V = { x ∈ R² : 〈x, y〉 = 1 }, where y = (1, 1).

The following proposition shows that such sets are always linear varieties.

Proposition. Let {y₁, . . . , yₙ} be a finite set of linearly independent vectors in an inner product space X. Then for given scalars c₁, . . . , cₙ, the following set is a closed linear variety:
V = { x ∈ X : 〈x, yᵢ〉 = cᵢ, i = 1, . . . , n }.
Moreover, there exists a unique x₀ ∈ [y₁, . . . , yₙ] such that V = x₀ + [y₁, . . . , yₙ]⊥.

Proof. The yᵢ's are linearly independent, so the Gram matrix is nonsingular, and there is a unique x₀ ∈ [y₁, . . . , yₙ] such that 〈x₀, yᵢ〉 = cᵢ for i = 1, . . . , n. Thus V is nonempty and consists at least of this single point x₀.

Claim. V = x₀ + [y₁, . . . , yₙ]⊥.

x ∈ V ⟺ 〈x, yᵢ〉 = cᵢ, i = 1, . . . , n
⟺ 〈x − x₀, yᵢ〉 = 〈x, yᵢ〉 − 〈x₀, yᵢ〉 = cᵢ − cᵢ = 0, i = 1, . . . , n
⟺ x − x₀ ⊥ [y₁, . . . , yₙ] ⟺ x − x₀ ∈ [y₁, . . . , yₙ]⊥.

Thus V = x₀ + [y₁, . . . , yₙ]⊥. Recall that orthogonal complements are closed, so [y₁, . . . , yₙ]⊥ is closed. (We did not use completeness to show this, rather just the continuity of the inner product!) Thus V is closed by a preceding Exercise.

Suppose V = x₁ + [y₁, . . . , yₙ]⊥ with x₁ ∈ [y₁, . . . , yₙ]. Then x₀ ∈ V ⟹ x₀ = x₁ + n where n ∈ [y₁, . . . , yₙ]⊥. Thus 〈x₀, yᵢ〉 = 〈x₁ + n, yᵢ〉 = 〈x₁, yᵢ〉 = cᵢ for i = 1, . . . , n, so x₁ = x₀ by the uniqueness discussed above. □

Remark. In general a linear variety V can be infinite dimensional. However, for the specific type of V in the above proposition, we say V has codimension n, since the orthogonal complement of the underlying subspace has dimension n.


Applications

Two types of linear varieties are of particular interest in optimization problems.
• V = { x + ∑_{i=1}^n cᵢxᵢ : cᵢ ∈ R }, where the xᵢ's are linearly independent
• V = { x ∈ X : 〈x, yᵢ〉 = cᵢ, i = 1, . . . , n }
Both reduce to finite-dimensional problems thanks to the projection theorem.

Theorem. Let {y₁, . . . , yₙ} be a set of linearly independent vectors in an inner product space X. Let
V = { x ∈ X : 〈x, yᵢ〉 = cᵢ, i = 1, . . . , n }.
Then there exists a unique v⋆ ∈ V with minimum norm. Moreover, v⋆ = ∑_{i=1}^n βᵢyᵢ, where the βᵢ's satisfy the normal equations

⎡ 〈y₁, y₁〉 · · · 〈yₙ, y₁〉 ⎤ ⎡ β₁ ⎤   ⎡ c₁ ⎤
⎢    ⋮      ⋱      ⋮     ⎥ ⎢  ⋮  ⎥ = ⎢  ⋮  ⎥
⎣ 〈y₁, yₙ〉 · · · 〈yₙ, yₙ〉 ⎦ ⎣ βₙ ⎦   ⎣ cₙ ⎦.

Remark. Note that V is necessarily nonempty by the previous Proposition, due to the linear independence.

Proof. From the previous proposition, V = x₀ + M⊥, where M = [y₁, . . . , yₙ] and x₀ ∈ M. Being finite dimensional, M is Chebyshev, so M⊥ is also Chebyshev. So existence of a unique minimizing v⋆ follows from the modified projection theorem. Likewise, v⋆ ⊥ M⊥ follows from that theorem. Thus v⋆ ∈ M⊥⊥ = M since M is Chebyshev. Since v⋆ ∈ M, we have v⋆ = ∑_{j=1}^n βⱼyⱼ for some β's.

We also need v⋆ ∈ V, i.e., v⋆ must satisfy the constraints 〈v⋆, yᵢ〉 = cᵢ, or equivalently
〈∑_{j=1}^n βⱼyⱼ, yᵢ〉 = ∑_{j=1}^n βⱼ〈yⱼ, yᵢ〉 = cᵢ, i = 1, . . . , n,
leading to the normal equations. □

Remark. Combining the projection theorem and the derivation of the normal equations yields the following theorem, which should be contrasted with the previous theorem.

Theorem. ((3.10-2) Really just a corollary to the projection theorem.) If M = [y₁, . . . , yₙ] is a finite-dimensional subspace of an inner product space X, then given x ∈ X, there exists a unique x⋆ ∈ M s.t. ‖x − x⋆‖ = inf_{m∈M} ‖x − m‖. Furthermore, x − x⋆ ⊥ M, and x⋆ = ∑_{i=1}^n αᵢyᵢ where

⎡ 〈y₁, y₁〉 · · · 〈yₙ, y₁〉 ⎤ ⎡ α₁ ⎤   ⎡ 〈x, y₁〉 ⎤
⎢    ⋮      ⋱      ⋮     ⎥ ⎢  ⋮  ⎥ = ⎢    ⋮    ⎥
⎣ 〈y₁, yₙ〉 · · · 〈yₙ, yₙ〉 ⎦ ⎣ αₙ ⎦   ⎣ 〈x, yₙ〉 ⎦.

As the nice figure at the bottom of p. 67 shows, if either M or M⊥ is finite dimensional, then minimum norm problems reduce to a finite set of linear equations.

Skim 3.11: control problem example


Example. (See [4, p. 66].)

Consider the linear system

y(t) = y(0) +

∫ t

0

h(t, τ)u(τ) dτ

whereh ∈ C([0, T ] × [0, T ]),i.e., h(t, τ) is a real-valued function that is continuous on[0, T ] × [0, T ],

Problem: find u(·) that minimizes ∫_0^T |u(t)|² dt subject to y(T) = yf.

This is a minimum energy control problem.

Solution. Let H = L2[0, T] with 〈x, y〉 = ∫_0^T x(t) y(t) dt.

Now h(T, ·) ∈ L2[0, T] since φ(t) = h(T, t) is a continuous function on the compact set [0, T].

Thus

yf = y(0) + ∫_0^T h(T, τ) u(τ) dτ = y(0) + 〈h(T, ·), u(·)〉,

so we have the following constraint set (with codimension = 1):

V = {u ∈ L2[0, T] : y(T) = yf} = {u ∈ L2[0, T] : yf − y(0) = 〈h(T, ·), u(·)〉},

where yf − y(0) plays the role of "c1" and h(T, ·) plays the role of "y1" in the preceding theorem.

So in function space notation our problem is

min_{u∈V} ‖u‖ .

By a previous theorem, there is a unique solution u⋆(t) that satisfies

u⋆(t) = β h(T, t), where 〈h(T, ·), h(T, ·)〉 β = yf − y(0) (normal equation),

so the solution is

u⋆(t) = [ (yf − y(0)) / ∫_0^T h²(T, τ) dτ ] h(T, t).

Interestingly, the system itself, through h(T, t), determines the form of the solution; this particular constraint affects only a scale factor.
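For concreteness, a numerical sketch of this example; the kernel h(t, τ) = e^{−(t−τ)} and the endpoint values T, yf, y(0) below are made up for illustration, and the L2[0, T] inner product is discretized crudely:

```python
import numpy as np

T, yf, y0 = 1.0, 2.0, 0.0
t = np.linspace(0.0, T, 1001)
dt = t[1] - t[0]
inner = lambda f, g: float(f @ g) * dt   # crude discretization of <f, g> on L2[0, T]

phi = np.exp(-(T - t))                   # phi(tau) = h(T, tau)
beta = (yf - y0) / inner(phi, phi)       # normal equation: <phi, phi> beta = yf - y(0)
u_star = beta * phi                      # u*(t) = beta h(T, t)

# Endpoint constraint y(T) = y(0) + <h(T, .), u*> = yf holds by construction:
assert np.isclose(y0 + inner(phi, u_star), yf)
```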


3.7 Fourier series

Recall that an infinite series of the form ∑_{i=1}^∞ xi is said to converge to x in a normed space iff the sequence of partial sums sn = ∑_{i=1}^n xi converges to x, in which case we write x = ∑_{i=1}^∞ xi as shorthand for x = lim_{n→∞} ∑_{i=1}^n xi.

Lemma. If {ei} is an orthonormal sequence in an inner product space, then

‖∑_{i=1}^n ci ei‖² = ∑_{i=1}^n |ci|², ∀ci ∈ F, ∀n ∈ N.

This lemma is a form of Parseval's relationship.

The following theorem gives necessary and sufficient conditions for convergence of an infinite series of orthogonal vectors.

Theorem. If {ei} is an orthonormal sequence in a Hilbert space H, then a series of the form ∑_{i=1}^∞ ci ei converges to some x ∈ H iff ∑_{i=1}^∞ |ci|² < ∞. In that case, we have ci = 〈x, ei〉.

Proof. Let sn = ∑_{i=1}^n ci ei and βn = ∑_{i=1}^n |ci|². Note {βn} is an increasing sequence.

Claim. {sn} is Cauchy iff ∑_{i=1}^∞ |ci|² < ∞. For m > n,

‖sn − sm‖² = ‖∑_{i=n+1}^m ci ei‖² = ∑_{i=n+1}^m |ci|² = |βn − βm|.

So {sn} is Cauchy ⇐⇒ {βn} is Cauchy ⇐⇒ {βn} converges in R (since R is complete) ⇐⇒ ∑_{i=1}^∞ |ci|² < ∞.

Since H is complete, when {sn} is Cauchy it converges to some limit x ∈ H s.t. x = ∑_{i=1}^∞ ci ei.

Now 〈sn, ei〉 = ci for n ≥ i, so by continuity of the inner product:

〈x, ei〉 = 〈lim_{n→∞} sn, ei〉 = lim_{n→∞} 〈sn, ei〉 = lim_{n→∞} ci = ci. □

This “continuity of the inner product” technique is very useful in such proofs.

The ci = 〈x, ei〉 values are called the Fourier coefficients of x w.r.t. {ei}.

Remark. The above theorem does not yet ensure that any x in H can be written as a Fourier series. That will come soon though.


Lemma. (Bessel's inequality) If x is an element of an inner product space and {ei} is an orthonormal sequence in that space, then

∑_{i=1}^∞ |〈x, ei〉|² ≤ ‖x‖².

Proof. Let ci = 〈x, ei〉. Then

0 ≤ ‖x − ∑_{i=1}^n ci ei‖² = 〈x − ∑_{i=1}^n ci ei, x − ∑_{i=1}^n ci ei〉 = ‖x‖² − ∑_{i=1}^n |ci|²,

so ∀n ∈ N, ∑_{i=1}^n |ci|² ≤ ‖x‖², so ∑_{i=1}^∞ |ci|² ≤ ‖x‖². □

Remark. Combined with the preceding theorem, Bessel's inequality guarantees that in a Hilbert space H, for any x ∈ H the series ∑_{i=1}^∞ 〈x, ei〉 ei converges to some x̂ ∈ H.

Now we need to characterize x̂.

Theorem. If x is an element of a Hilbert space H, and {ei} is an orthonormal sequence in H, then

x̂ ≜ ∑_{i=1}^∞ 〈x, ei〉 ei

lies in M, the closure of [{ei}_{i=1}^∞], which is called the closed subspace generated by the ei's. Furthermore, x − x̂ ⊥ M.

Why do we need a closure above? ??

Proof. Convergence of the series follows from Bessel's inequality and the preceding theorem. Clearly x̂ ∈ M since x̂ is the limit of the partial sums sn = ∑_{i=1}^n ci ei ∈ [{ei}_{i=1}^∞], where ci = 〈x, ei〉. By continuity of the inner product:

〈x − x̂, ei〉 = 〈x − lim_{n→∞} sn, ei〉 = lim_{n→∞} 〈x − sn, ei〉 = lim_{n→∞} (ci − ci) = 0,

so x − x̂ ⊥ [{ei}_{i=1}^∞]. Using (f) of the proposition on orthogonal complements, we conclude x − x̂ ⊥ M. □

Corollary. If M is a closed subspace of a Hilbert space H and {ei} is an orthonormal sequence such that M is the closure of [{ei}_{i=1}^∞], then P : H → M, the orthogonal projection, is given by

P(x) = ∑_{i=1}^∞ 〈x, ei〉 ei.
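A small numerical sketch of this corollary in R⁶ (the orthonormal columns come from a QR factorization; all names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 6, 3
E, _ = np.linalg.qr(rng.standard_normal((d, m)))  # columns e_1, ..., e_m are orthonormal
x = rng.standard_normal(d)

coeffs = E.T @ x                       # Fourier coefficients <x, e_i>
Px = E @ coeffs                        # P(x) = sum_i <x, e_i> e_i

assert np.allclose(E.T @ (x - Px), 0)  # x - P(x) is orthogonal to M = span{e_i}
```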

Now the key question is: when is the closure of [{ei}_{i=1}^∞] equal to H? When the closed subspace generated by the ei's is all of H, then we can expand any vector in H as a series of the ei's with the Fourier coefficients.


3.8 Complete orthonormal sequences / countable orthonormal bases

See “review of bases”.

Definition. An orthonormal sequence {ei} in a Hilbert space H is called complete (Luenberger) or a countable orthonormal basis (Naylor and others) iff the closed subspace generated by the ei's is H, i.e., iff H equals the closure of [{ei}_{i=1}^∞].

Lemma. If {ei} is an orthonormal sequence in a Hilbert space H, then the following are equivalent.
• {ei} is complete.
• The only vector that is orthogonal to all of the ei's is the zero vector, i.e., [⋃_{i=1}^∞ {ei}]⊥ = {0}.
• Every vector in H has a Fourier series representation, i.e., ∀x ∈ H, x = ∑_{i=1}^∞ 〈x, ei〉 ei.
• ∀x ∈ H, ‖x‖² = ∑_{i=1}^∞ |〈x, ei〉|², which is called Parseval's equality [23, p. 24].

Proof. (Left to reader)

When does a Hilbert space H have a countable orthonormal basis? When (and only when) H is separable [23, p. 21]. For a (complicated) example of a nonseparable Hilbert space, see [3, p. 230].

Example. Completeness of the orthogonal polynomials in L2[−1, 1].

Sketch of proof. (See text.) It suffices to show that the only function that is orthogonal to all the polynomials is the zero function. Suppose f ⊥ tⁿ for all n = 0, 1, ... for some f ∈ L2[−1, 1]. Then its integral F (which is continuous) is also orthogonal to the polynomials, by integration by parts. Use Weierstrass and Cauchy-Schwarz to show that ‖F‖ can be made arbitrarily small by choosing a sufficiently accurate polynomial approximation to F. So F must be zero, and hence f must be zero a.e.
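As a numerical aside (not part of the proof sketch above): applying Gram-Schmidt to the monomials in a discretized L2[−1, 1] inner product produces an orthonormal polynomial sequence, proportional to the Legendre polynomials; a minimal sketch:

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 4001)
dt = t[1] - t[0]
inner = lambda f, g: float(f @ g) * dt  # crude discretization of <f, g> on L2[-1, 1]

basis = []
for n in range(5):                      # orthonormalize 1, t, t^2, t^3, t^4
    p = t ** n
    for q in basis:
        p = p - inner(p, q) * q         # remove components along earlier vectors
    basis.append(p / np.sqrt(inner(p, p)))

G = np.array([[inner(p, q) for q in basis] for p in basis])
assert np.allclose(G, np.eye(5))        # orthonormal in the discrete inner product
```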

Example. Completeness of the complex exponentials ek(t) = e^{ikt}/√(2π) for k = 0, ±1, ±2, ... in L2[0, 2π]. (See text.)

Caution. Equality in L2 is meant in the L2 sense, i.e., x = y means x(t) = y(t) a.e.

Remark. Completeness of the complex exponentials does not contradict the Gibbs phenomenon for truncated Fourier series. Completeness in L2 implies that ‖x − sn‖₂ → 0, but it can still be the case that ‖x − sn‖∞ does not go to zero, and indeed it does not for functions with discontinuities. There is a big difference between convergence in integrated squared error and pointwise convergence.
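A quick numerical illustration of this remark (the square wave and grid below are made up for illustration): the L2 error of the truncated Fourier series decreases, while the sup error stays on the order of 1 near the jumps:

```python
import numpy as np

t = np.linspace(0.0, 2 * np.pi, 8001)
x = np.where(t < np.pi, 1.0, -1.0)        # square wave with jumps at 0, pi, 2*pi

for n in (9, 99, 499):                    # truncate the series at harmonic n
    k = np.arange(1, n + 1, 2)            # only odd harmonics are nonzero
    s_n = (4 / np.pi) * np.sum(np.sin(np.outer(k, t)) / k[:, None], axis=0)
    l2 = np.sqrt(np.mean((x - s_n) ** 2) * 2 * np.pi)  # approximates ||x - s_n||_2
    sup = np.max(np.abs(x - s_n))                      # approximates ||x - s_n||_inf
    print(f"n = {n:3d}: L2 error {l2:.3f}, sup error {sup:.3f}")
# The L2 error decreases toward 0; the sup error does not (Gibbs phenomenon).
```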

Remark. The Fourier series coefficients of x(t) = 1_{t∈Q} are all zero (consistent with x = 0 a.e.).

Wavelets

What major post-1969 topic is missing here? Wavelets, of course.

There are both orthonormal wavelets and non-orthogonal wavelets.

A set {yn} ⊂ H (a Hilbert space) is called a frame iff ∃A > 0, B < ∞ such that

A ‖x‖² ≤ ∑_n |〈x, yn〉|² ≤ B ‖x‖², ∀x ∈ H.

If the frame bounds A and B are equal, then we call the frame tight.

In a tight frame [23, p. 27],

‖x‖² = (1/A) ∑_n |〈x, yn〉|²   and   x = (1/A) ∑_n 〈x, yn〉 yn.

Despite how similar this last expression looks to the Fourier series representation for an orthonormal basis, the yn's here need not be orthogonal, and in fact may be linearly dependent [23, p. 27, 320], in which case we call it an overcomplete expansion.
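As a finite-dimensional sketch of these formulas (an illustration, not from the text): three unit vectors spaced 120° apart in R² (the so-called Mercedes-Benz frame) are linearly dependent yet form a tight frame with A = B = 3/2; the frame bounds are the extreme eigenvalues of the frame operator S, where Sx = ∑_n 〈x, yn〉 yn:

```python
import numpy as np

angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
Y = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # rows y_1, y_2, y_3

S = Y.T @ Y                      # frame operator: S x = sum_n <x, y_n> y_n
A, B = np.linalg.eigvalsh(S)     # frame bounds = smallest/largest eigenvalues
assert np.allclose([A, B], 1.5)  # tight frame: A = B = 3/2

x = np.array([0.7, -0.2])
x_rec = (1 / A) * Y.T @ (Y @ x)  # x = (1/A) sum_n <x, y_n> y_n
assert np.allclose(x_rec, x)     # the overcomplete expansion reproduces x
```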


Bases

Vector spaces

Definition. A finite sum ∑_{i=1}^n αi xi for xi ∈ X and αi ∈ F is called a linear combination.

Definition. If S is a subset of a vector space X, then the span of S is the subspace of linear combinations drawn from S:

[S] = { x ∈ X : x = ∑_{i=1}^n αi xi, for xi ∈ S, αi ∈ F, and n ∈ N } .

Definition. A set S of vectors is called linearly independent iff each vector in the set is linearly independent of the others, i.e.,

∀x ∈ S, x ∉ [S − {x}] .

S can even be uncountable.

Definition. A set S is called a Hamel basis [3, p. 183] for X iff S is linearly independent and [S] = X. Luenberger says "finite set" but Naylor [3, p. 183] and Maddox [2, p. 74] do not. Let us agree to use the above definition, rather than Luenberger's.

If S is a linearly independent set in a vector space X, then there exists a basis B for X such that S ⊆ B [3, p. 184]. Thus, every vector space has a Hamel basis (since the empty set is linearly independent).

However, "Hamel basis is not the only concept of basis that arises in analysis. There are concepts of basis that involve topological as well as linear structure. ... In applications involving infinite-dimensional spaces, a useful basis, if one even exists, is usually something other than a Hamel basis. For example, a complete orthonormal set is far more useful in an infinite-dimensional Hilbert space than a Hamel basis" [3, p. 183].

Normed spaces

A (Hamel) basis for a Banach space is either finite or uncountably infinite [3, p. 218].

What is the closure of a span?
• If S is finite, then [S] is finite-dimensional, so [S] is closed and equals its own closure.
• If S is countably infinite, i.e., S = ⋃_{i=1}^∞ {xi}, then the closure of [S] contains (at least) all the convergent series formed from S:

closure of [S] ⊃ { ∑_{i=1}^∞ αi xi : αi ∈ F, ∑_{i=1}^∞ αi xi is convergent } .

An example where the closure of [S] contains more than its convergent series is given in Naylor [3, p. 316], where S is a countable collection of linearly independent vectors!

Definition. In a normed space, a sequence {xn} ⊂ X is a Schauder basis for X iff for each x ∈ X, there exists a unique sequence {λn} such that x = ∑_{n=1}^∞ λn xn [2, p. 98].

The famous Banach conjecture that every separable Banach space has a Schauder basis was shown to be incorrect by Enflo in 1973 [1], [2, p. 100].

So for more satisfactory answers we need to turn to Hilbert spaces, which have better structure.


Inner product spaces

Any finite-dimensional inner product space has an orthonormal basis (via Gram-Schmidt).

Definition. An orthonormal set S = {xα} is maximal iff there is no unit vector x0 ∈ X such that S ∪ {x0} is an orthonormal set.

Definition. A maximal orthonormal set B in a Hilbert space H is called an orthonormal basis.

Theorem. If {xn} is an orthonormal set in a Hilbert space, then [3, p. 307]:

{xn} is an orthonormal basis ⇐⇒ x = ∑_n 〈x, xn〉 xn, ∀x ∈ X.

When S = {xn} is an orthonormal basis in a Hilbert space H, we have

H = closure of [S] = { ∑_{i=1}^∞ αi xi : ∑_{i=1}^∞ |αi|² < ∞ } .

Orthonormal sets can be countable or uncountable, and the previous theorem applies to both cases [3, p. 314]. In engineering we usually work in separable Hilbert spaces (notably ℓ2 and L2).

Theorem. A Hilbert space has a countable orthonormal basis iff it is separable [3, p. 314].

In other words: Luenberger's complete orthonormal sequence ≡ Naylor's countable orthonormal basis, and Naylor's terminology is more common in the signal processing literature, e.g., Vetterli's wavelet book.

Naylor notes that he deliberately avoids the term “complete” to describe certain orthonormal sets.

Summary

In Hilbert spaces, we have two kinds of bases:
• Hamel bases, which are not very useful (uncountable for infinite-dimensional spaces),
• orthonormal bases, which are extremely useful.


3.12 Minimum distance to a convex set

Thus far we have considered only minimum distances to subspaces and linear varieties. Many applications need broader sets.

Theorem.
• Let K be a nonempty complete convex subset of an inner product space X (e.g., K may be a nonempty closed convex subset of a Hilbert space). Then K is Chebyshev: for any x ∈ X, there exists a unique vector k⋆ ∈ K such that

‖x − k⋆‖ = d(x, K) = inf_{k∈K} ‖x − k‖ .

• Let K be a convex Chebyshev subset of an inner product space X. Then the projector PK for K is characterized in the following necessary and sufficient sense: k⋆ = PK(x) ⇐⇒ real(〈x − k⋆, k − k⋆〉) ≤ 0 for all k ∈ K.
• Let K be a subset of an inner product space X with the property that for every x ∈ X, there exists a unique point k⋆ ∈ K such that real(〈x − k⋆, k − k⋆〉) ≤ 0 for all k ∈ K. Then K is Chebyshev and k⋆ = PK(x).

Proof.
First we prove existence. Let {ki} be a sequence in K such that ‖x − ki‖ → δ ≜ d(x, K).

Claim 1. {ki} is Cauchy.
By the parallelogram law,

‖ki − kj‖² = ‖(ki − x) − (kj − x)‖² = 2 ‖ki − x‖² + 2 ‖kj − x‖² − 4 ‖x − (ki + kj)/2‖².

Since K is convex, (ki + kj)/2 ∈ K, so ‖x − (ki + kj)/2‖² ≥ δ². Thus

‖ki − kj‖² ≤ 2 ‖ki − x‖² + 2 ‖kj − x‖² − 4δ² → 0 as i, j → ∞.

Since {ki} is Cauchy and K is complete, {ki} converges to some k⋆ ∈ K. By continuity of the norm, ‖x − k⋆‖ = lim_{i→∞} ‖x − ki‖ = δ.

Claim 2. k⋆ is unique. (Proof by contradiction.)
Suppose k1 ∈ K also satisfies ‖x − k1‖ = δ. Then the sequence with kn = k⋆ for n even and kn = k1 for n odd is Cauchy by the same argument used for Claim 1, so {kn} is convergent, which can only happen if k⋆ = k1.

Claim 3. real(〈x − k⋆, k − k⋆〉) ≤ 0 for all k ∈ K. (Proof by contradiction.)
Suppose ∃k ∈ K s.t. real(〈x − k⋆, k − k⋆〉) = ε > 0. Define kα = αk + (1 − α)k⋆, which lies in K for α ∈ [0, 1] since K is convex. Define f(α) = ‖x − kα‖², so f(0) = δ². Now

f(α) = ‖x − αk − (1 − α)k⋆‖² = ‖(x − k⋆) − α(k − k⋆)‖² = δ² + α² ‖k − k⋆‖² − 2α real(〈x − k⋆, k − k⋆〉)

and

(d/dα) f(α) |_{α=0} = [2α ‖k − k⋆‖² − 2 real(〈x − k⋆, k − k⋆〉)] |_{α=0} = −2ε < 0.

Thus ∃α > 0 s.t. f(α) < f(0) = δ², contradicting the minimizing norm property of k⋆.

Claim 4. If real(〈x − k⋆, k − k⋆〉) ≤ 0 for some k ∈ K, then ‖x − k⋆‖ ≤ ‖x − k‖. (So "is characterized by" means "iff.")

‖x − k‖² = ‖x − k⋆ − (k − k⋆)‖² = ‖x − k⋆‖² + ‖k − k⋆‖² − 2 real(〈x − k⋆, k − k⋆〉) ≥ ‖x − k⋆‖². □

Remark.
• The convexity of K was used for both the existence and characterization parts.
• However, Claim 4 did not use convexity, which explains the last item in the Theorem.
• In this case the characterization is an inequality, which is usually harder to work with.
• Although PK exists since K is Chebyshev, we do not have a general formula for it other than the "characterization" inequality.
• Can we generalize further? No. In any finite-dimensional inner product space, K Chebyshev =⇒ K closed, convex, and nonempty.


Revisiting subspaces

What happens when K is in fact a subspace? Then for any k0 ∈ K we can pick k = k⋆ − k0 ∈ K and k = k⋆ + k0 ∈ K to show that real(〈x − k⋆, k0〉) = 0. For complex subspaces we can also pick k = k⋆ − ı k0 ∈ K to show that the imaginary part is zero, to conclude that x − k⋆ ⊥ K. Conversely, if k⋆ ∈ K and x − k⋆ ⊥ K, then 〈x − k⋆, k − k⋆〉 = 0 for all k ∈ K. So we could have presented convex sets first, and then specialized to subspaces.

Example. See text p. 71 for a somewhat unsatisfying example involving nonnegativity constraints.

Example. In R², consider the half plane K = {(a, b) ∈ R² : a + b ≤ 0}, which is a closed convex set. By sketching this set and using geometric intuition, one can conjecture that the projector is given by

PK((a, b)) = (a − p, b − p), where p = [(a + b)/2]₊, and [x]₊ = x for x > 0, 0 otherwise.

Note that [x]₊ (x − [x]₊) = 0.

To verify that the above projector is correct, we can check it against the characterization condition in the preceding Theorem.

If x = (a, b) and k⋆ = PK(x), then for any k = (k1, k2) ∈ K,

〈x − k⋆, k − k⋆〉 = 〈(a, b) − (a − p, b − p), (k1, k2) − (a − p, b − p)〉 = 〈(p, p), (k1 − a + p, k2 − b + p)〉
 = 2p² − p(a + b) + p(k1 + k2)
 ≤ 2p² − p(a + b)   (since k ∈ K =⇒ k1 + k2 ≤ 0 and p ≥ 0)
 = 2p (p − (a + b)/2) = 0,

where the final equality follows from the note above with x = (a + b)/2. Thus the above projector is correct.
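As an extra numerical sanity check (not from the text; the random sampling scheme is purely illustrative), one can also test the characterization inequality directly:

```python
import numpy as np

def proj_K(x):                                # conjectured projector onto the half plane
    p = max((x[0] + x[1]) / 2.0, 0.0)         # p = [(a + b)/2]_+
    return np.array([x[0] - p, x[1] - p])

rng = np.random.default_rng(3)
for _ in range(1000):
    x = 3.0 * rng.standard_normal(2)
    k_star = proj_K(x)
    a = rng.standard_normal()
    k = np.array([a, -a - abs(rng.standard_normal())])  # random point with k1 + k2 <= 0
    assert np.dot(x - k_star, k - k_star) <= 1e-12      # characterization inequality holds
```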

Projection onto convex sets (POCS)

In light of the previous theorem, if K is a nonempty closed convex subset of a Hilbert space H, then K is Chebyshev and we can legitimately define a projection operator PK : H → K by

PK(x) = arg min_{k∈K} ‖x − k‖ .

It is not an orthogonal projection in general, i.e., x − PK(x) need not be orthogonal to K (unless of course K happens to be a subspace).

Indeed, in general this projection inherits only the trivial properties of projectors given previously:
• PK(x) ∈ K
• PK(PK(x)) = PK(x)
• ‖x − PK(x)‖ = d(x, K)

In addition, PK(·) is continuous [24] (homework problem).

There are a variety of convex sets of interest in signal and image processing problems, such as:
• the subspace of band-limited signals with a given band-limit,
• the set of nonnegative signals,
• the subspace of signals with a given time or spatial support.

Because of such examples, POCS methods are (somewhat) useful in signal processing. A typical problem would be "find the signal with a given spatial support whose spectrum is given over certain frequency ranges only." A numerical sketch of this alternating-projection idea follows.
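Below is a minimal POCS sketch under made-up assumptions (length-64 real signals, a hypothetical band-limit of 8 DFT bins, samples observed on every third index); it alternates the projector onto the band-limited subspace with the projector onto the data-consistency linear variety:

```python
import numpy as np

N, cutoff = 64, 8

def P_bandlimit(x):                   # orthogonal projector onto band-limited signals
    X = np.fft.fft(x)
    X[cutoff:N - cutoff + 1] = 0      # zero out the high frequencies
    return np.fft.ifft(X).real

rng = np.random.default_rng(4)
truth = P_bandlimit(rng.standard_normal(N))  # an unknown band-limited signal
known = np.arange(0, N, 3)                   # indices where truth is observed

x = np.zeros(N)
for _ in range(500):                  # alternate the two projections (POCS)
    x = P_bandlimit(x)
    x[known] = truth[known]           # projector onto the data-consistency variety

print(np.max(np.abs(x - truth)))      # the error shrinks as iterations proceed
```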

Example. In Rⁿ consider K = {x : xj ≥ 0, j = 1, ..., n}. Then if k⋆ = PK(x), we have k⋆_j = xj for xj ≥ 0, and 0 otherwise.

If k ∈ K then 〈x − k⋆, k − k⋆〉 = ∑_{j=1}^n (xj − k⋆_j)(kj − k⋆_j) = ∑_{j : xj<0} (xj − 0)(kj − 0) ≤ 0, as required, since kj ≥ 0.

??


Summary
• Inner products and properties
  • Cauchy-Schwarz
  • induced norm
  • parallelogram law
  • continuity
• Orthogonality
  • Pythagorean theorem
  • orthogonal complements
  • direct sum
  • orthogonal sets
  • Gram-Schmidt procedure
  • orthonormal bases
• Minimum norm problems
  • (orthogonal) projections onto subspaces
  • normal equations
  • Fourier series
  • complete orthonormal sequences (countable orthonormal bases)
  • minimum norm within linear variety (constraints)
  • minimum distance to convex sets


References

1. P. Enflo. A counterexample to the approximation problem in Banach spaces. Acta Math., 130:309–17, 1973.
2. I. J. Maddox. Elements of functional analysis. Cambridge, 2nd edition, 1988.
3. A. W. Naylor and G. R. Sell. Linear operator theory in engineering and science. Springer-Verlag, New York, 2nd edition, 1982.
4. D. G. Luenberger. Optimization by vector space methods. Wiley, New York, 1969.
5. J. Schauder. Zur Theorie stetiger Abbildungen in Funktionenräumen. Math. Zeitschr., 26:47–65, 1927.
6. A. M. Ostrowski. Solution of equations in Euclidean and Banach spaces. Academic, 3rd edition, 1973.
7. R. R. Meyer. Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. System Sci., 12(1):108–21, 1976.
8. M. Rosenlicht. Introduction to analysis. Dover, New York, 1985.
9. A. R. De Pierro. On the relation between the ISRA and the EM algorithm for positron emission tomography. IEEE Tr. Med. Im., 12(2):328–33, June 1993.
10. A. R. De Pierro. On the convergence of the iterative image space reconstruction algorithm for volume ECT. IEEE Tr. Med. Im., 6(2):174–175, June 1987.
11. A. R. De Pierro. Unified approach to regularized maximum likelihood estimation in computed tomography. In Proc. SPIE 3171, Comp. Exper. and Num. Meth. for Solving Ill-Posed Inv. Imaging Problems: Med. and Nonmed. Appl., pages 218–23, 1997.
12. J. A. Fessler. Grouped coordinate descent algorithms for robust edge-preserving image restoration. In Proc. SPIE 3170, Im. Recon. and Restor. II, pages 184–94, 1997.
13. A. R. De Pierro. A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Tr. Med. Im., 14(1):132–137, March 1995.
14. J. A. Fessler and A. O. Hero. Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms. IEEE Tr. Im. Proc., 4(10):1417–29, October 1995.
15. M. W. Jacobson and J. A. Fessler. Properties of optimization transfer algorithms on convex feasible sets. SIAM J. Optim., 2003. Submitted.
16. P. L. Combettes and H. J. Trussell. Method of successive projections for finding a common point of sets in metric spaces. J. Optim. Theory Appl., 67(3):487–507, December 1990.
17. F. Deutsch. The convexity of Chebyshev sets in Hilbert space. In Th. M. Rassias, H. M. Srivastava, and A. Yanushauskas, editors, Topics in polynomials of one and several variables and their applications, pages 143–50. World Sci. Publishing, River Edge, NJ, 1993.
18. M. Jiang. On Johnson's example of a nonconvex Chebyshev set. J. Approx. Theory, 74(2):152–8, August 1993.
19. V. S. Balaganskii and L. P. Vlasov. The problem of convexity of Chebyshev sets. Russian Mathematical Surveys, 51(6):1127–90, November 1996.
20. V. Kanellopoulos. On the convexity of the weakly compact Chebyshev sets in Banach spaces. Israel Journal of Mathematics, 117:61–9, 2000.
21. A. R. Alimov. On the structure of the complements of Chebyshev sets. Functional Analysis and Its Applications, 35(3):176–82, July 2001.
22. Y. Bresler, S. Basu, and C. Couvreur. Hilbert spaces and least squares methods for signal processing, 2000. Draft.
23. M. Vetterli and J. Kovacevic. Wavelets and subband coding. Prentice-Hall, New York, 1995.
24. D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part I—Theory. IEEE Tr. Med. Im., 1(2):81–94, October 1982.