Lappeenranta University of Technology
Department of Mathematics and Physics

Linear Algebra and Normed Spaces
Lecture Notes

Matti Heiliö
Matti.Heilio@lut.fi

Lappeenranta 2012

Acknowledgement: A few of my students have helped in writing down my lecture notes in LaTeX. I thank Pekka Paalanen, Sapna Sharma, Vladimir X and N.N.


Contents

1 Vector Space
    1.1 What is a vector space?
    1.2 Basic Concepts
    1.3 Examples
    1.4 Sum of subspaces
    1.5 Product space

2 Normed Spaces
    2.1 Important inequalities
    2.2 Norm in the product space
    2.3 Equivalent norms, isomorphic spaces
    2.4 Isometry
    2.5 Ill posed problems, norm sensitive mappings

3 Convergence and continuity
    3.1 Topological Terms
    3.2 Convergence of a Sequence
    3.3 Continuity of a function
    3.4 Uniform Continuity
    3.5 Compactness
    3.6 Convexity, finite dimensional spaces
    3.7 Cauchy Sequence

4 Banach Space
    4.1 Completion
    4.2 Contraction Theorem
    4.3 Separability
    4.4 Compact Support

5 Metric Spaces
    5.1 Translation Invariance
    5.2 Application areas

6 Hilbert Space
    6.1 Properties of Inner Product
    6.2 Orthogonality
    6.3 Projection Principle
    6.4 Projection Operator
    6.5 Orthogonal Sequence
    6.6 Orthonormal Sequence
    6.7 Maximal Orthonormal Sequence
    6.8 Orthogonal Polynomials
    6.9 Orthogonal Subspaces

7 Measure theory, measure spaces
    7.1 Lebesgue Integral
    7.2 Riemann-Stieltjes Integral

8 Linear transformations
    8.1 Bounded Linear Transformation
    8.2 Norm of an Operator
    8.3 Composite operator
    8.4 Exponential Operator Function
    8.5 Filters in Signal Spaces
    8.6 Linear Functional
    8.7 Duality, weak solutions
    8.8 Some classes of operators
    8.9 Eigenvalues

9 Operators in Hilbert Space
    9.1 Self Adjoint Operator
    9.2 Example
    9.3 Spectral representation of an operator
    9.4 Spectral decomposition on matrices

10 Fourier Analysis
    10.1 Fourier Transform

11 Time/frequency localization, Wavelets
    11.1 Window function
    11.2 Continuous wavelet transform CWT
    11.3 Redundancy of CWT
    11.4 Discrete wavelet transform DWT
    11.5 Haar MRA

12 Calculus in Banach Space
    12.1 Bochner integral
    12.2 Gateaux derivative
    12.3 Frechet derivative

13 Stochastic Calculus
    13.1 Random variables
    13.2 Stochastic process
    13.3 Gaussian stochastic process
    13.4 Stochastic differential equation
    13.5 Wiener process and White noise
    13.6 Stochastic integral, an introduction
    13.7 Black Scholes model
    13.8 System with noisy input

14 Optimization and introduction to optimal control
    14.1 Least squares minimization
    14.2 Inverse problems and regularization
    14.3 Example from earth science

15 Optimal Control
    15.1 Examples
    15.2 Classical calculus of variations problem

1 Vector Space

1.1 What is a vector space?

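A vector space is a useful mathematical structure which can represent a set of points, functions, matrices, signals and many other types of objects for which addition and scalar multiplication are defined. In more exact terms, a vector space $W$ is a set of objects with two operations:

\[ (x_1, x_2) \mapsto x_1 \oplus x_2 \in W \quad \text{(addition)}, \]

\[ (\alpha, x_1) \mapsto \alpha x_1 \in W \quad \text{(scalar multiplication)}. \]

Here $x_i \in W$ and $\alpha \in \mathbb{C}$ are arbitrary elements. In a vector space these operations must satisfy the following eight axioms for all $x, y, z \in W$ and all $\alpha, \beta \in \mathbb{C}$:

1. $x \oplus y = y \oplus x$
2. $(x \oplus y) \oplus z = x \oplus (y \oplus z)$
3. $\exists\, 0$ s.t. $x \oplus 0 = x$
4. $\exists\, {-x}$ s.t. $x \oplus (-x) = 0$
5. $\alpha(\beta x) = (\alpha\beta)x$
6. $1x = x$
7. $\alpha(x \oplus y) = \alpha x \oplus \alpha y$
8. $(\alpha + \beta)x = \alpha x \oplus \beta x$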

For the rest of the document we will use + instead of ⊕.

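Examples of vector spaces: The following are familiar examples of sets that have such a structure.

1. $\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) \mid x_i \in \mathbb{R}\}$, with $x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)$ and $\lambda x = (\lambda x_1, \lambda x_2, \dots, \lambda x_n)$.

2. $F(a,b) = \{f \mid f : (a,b) \to \mathbb{R}\}$, the function space, with $(f + g)(x) = f(x) + g(x)$ and $(\lambda f)(x) = \lambda f(x)$.

3. $F(A, W) = \{f \mid f : A \to W\}$, where $W$ is a vector space.

4. $C[a,b] = \{f \in F[a,b] \mid f \text{ continuous}\}$.

5. $s = \{(x_n) \mid x_n \in \mathbb{R}\}$, the space of signals.

6. $l^1 = \{(x_n) \in s \mid \sum_{n=1}^{\infty} |x_n| < \infty\}$ and $l^2 = \{(x_n) \in s \mid \sum_{n=1}^{\infty} |x_n|^2 < \infty\}$.

7. $\mathbb{R}^{n \times m} = \{A \mid A \text{ is an } n \times m \text{ matrix}\}$.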

Later we will see how new vector spaces can be constructed by different procedures from given ones (sum, product, subspace etc).

1.2 Basic Concepts

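Some basic vocabulary and concepts are listed below.

• A subspace is a subset $M \subset W$ such that
  1. $u, v \in M \Rightarrow u + v \in M$
  2. $u \in M,\ \lambda \in \mathbb{R} \Rightarrow \lambda u \in M$

• A hyperplane is a maximal proper subspace.
If $W = \mathbb{R}^4$ then $M = \{x \in \mathbb{R}^4 \mid x_1 + x_2 + x_3 + x_4 = 0\}$ is a hyperplane.
If $W = C[a,b]$ then $M = \{f \mid \int_a^b f(t)\,dt = 0\}$ is also a hyperplane.

• A sum of subspaces is defined as follows.
If $M_1, M_2$ are subspaces, then $M_1 + M_2 = \{x + y \mid x \in M_1,\ y \in M_2\}$.
Example: In $C[0,1]$ let $M_1 = \{f \mid f = a \text{ constant}\}$ and $M_2 = \{f \mid f(t) = ct\}$. Then $M_1 + M_2 = \{f \mid f(t) = a + ct,\ a, c \in \mathbb{R}\}$.

• span of a set of vectors
$\operatorname{span}\{x_1, x_2, \dots, x_n\} = \{\sum_{i=1}^{n} \lambda_i x_i \mid \lambda_i \in \mathbb{C}\}$

• linear independence
The vectors $x_1, \dots, x_n$ are linearly independent if $c_1 x_1 + c_2 x_2 + \dots + c_n x_n = 0$ holds only when $c_1 = c_2 = \dots = c_n = 0$.

• dimension of a vector space or subspace
The maximal number of linearly independent vectors in the space.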
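The notion of linear independence listed above can be tested mechanically for vectors in $\mathbb{R}^n$. The following is a minimal sketch, not part of the original notes: the helper names `rank` and `linearly_independent` are chosen here for illustration, and exact rational arithmetic is assumed so that rounding plays no role. The vectors are independent exactly when Gaussian elimination finds as many pivots as there are vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors, via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # look for a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True if c1*x1 + ... + cn*xn = 0 forces all ci = 0."""
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 1]]))  # three independent vectors
print(linearly_independent([[1, 2, 3], [2, 4, 6]]))             # second vector is twice the first
```

The rank computed this way is also the dimension of the span of the given vectors.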


1.3 Examples

Function (sub)spaces. The function space $F(a,b) = \{f \mid f : (a,b) \to \mathbb{R}\}$ contains many important subspaces. For instance, $C[0,1]$ is the space of continuous functions and $C^2[0,1]$ is the set of twice continuously differentiable functions on the interval $[0,1]$. It is clear that

\[ C^2[a,b] \subset C[a,b] \subset F[a,b]. \]

The notation $L^1[0,1]$ means the set of integrable functions

\[ L^1[0,1] = \Big\{ u : [0,1] \to \mathbb{R} \;\Big|\; u \text{ is integrable and } \int_{[0,1]} |u|\,d\mu < \infty \Big\} \]

and similarly $L^2[0,1]$ is the space of square-integrable functions. These and the so-called $L^p(\Omega)$ spaces are explained in the chapter on measure theory.

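Solution set of a differential equation. Define the following two subsets:

\[ M = \{u \in C^2[0,1] \mid u'' + \alpha(x)u = \beta(x)\}, \qquad N = \{u \in C^2[0,1] \mid u'' + \alpha(x)u = 0\}. \]

$N$ is a vector subspace of $C^2[0,1]$, while $M$ is not (unless $\beta = 0$). Here is the proof.

Let $u, v \in N$, so that

\[ u'' + \alpha(x)u = 0, \qquad v'' + \alpha(x)v = 0. \]

Let us check the two subspace conditions.

1. Is $u + v \in N$?

\[ (u + v)'' + \alpha(x)(u + v) = u'' + v'' + \alpha(x)u + \alpha(x)v = [u'' + \alpha(x)u] + [v'' + \alpha(x)v] = 0. \]

For $u, v \in M$ the same computation yields $2\beta(x)$ instead of $\beta(x)$, so $M$ is not a subspace.

2. Is $\lambda u \in N$? In $N$ we have

\[ (\lambda u)'' + \alpha(x)(\lambda u) = \lambda[u'' + \alpha(x)u] = \lambda \cdot 0 = 0, \]

while in $M$ we have

\[ (\lambda u)'' + \alpha(x)(\lambda u) = \lambda[u'' + \alpha(x)u] = \lambda\beta(x) \neq \beta(x) \quad \text{whenever } \lambda \neq 1. \]

Let $u_0$ be one particular solution to $u'' + \alpha(x)u = \beta(x)$. The general solution is then $u_0(x) + N$ (Fig. 1), a shifted version of the subspace $N$.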

Figure 1: Illustration of solutions to u′′ + α(x)u = β(x)

1.4 Sum of subspaces

In the vector space $W = C[-1,1]$ define

\[ N = \{f \in W \mid f(-1) = f(0) = f(1) = 0\}, \qquad P_2 = \{f \in W \mid f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2\}. \]

$N$ and $P_2$ are vector subspaces. Study whether $W = N + P_2$.

In $F = \mathbb{R}^{n \times n}$ define the subspaces of symmetric and antisymmetric matrices $M_1 = \{A \in F \mid A = A^\top\}$ and $M_2 = \{A \in F \mid A = -A^\top\}$. Clearly $M_1$ and $M_2$ are subspaces. Show that $F = M_1 + M_2$.

In the vector space $W = C[0,1]$ define $C^+[0,1] = \{f \in W \mid f \ge 0\}$. This subset is not a subspace. Why?

Also the set $B = \{x \in \mathbb{R}^n \mid x_1^2 + x_2^2 + \dots + x_n^2 \le 1\}$ is not a subspace.

In the space of matrices the important set of binary matrices $M = \{A \in \mathbb{R}^{n \times n} \mid A(i,j) \in \{0, 1\}\}$ is not a subspace.

Direct sum of subspaces

If in a sum of subspaces $M = M_1 + M_2$ we have $M_1 \cap M_2 = \{0\}$, we say that $M$ is a direct sum of $M_1$ and $M_2$ and we write

\[ M = M_1 \oplus M_2. \]

In this case every vector $v \in M$ has a unique representation as a sum $v = x + y$ where $x \in M_1$ and $y \in M_2$.

Example. In $W = C[-1,1]$ define

\[ N = \{f \in W \mid f(-1) = f(0) = f(1) = 0\}, \qquad P_2 = \{f \in W \mid f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2\}. \]

Using basic results from interpolation theory one finds that $W = N + P_2$ and $N \cap P_2 = \{0\}$, and so $W = N \oplus P_2$.

In $F = \mathbb{R}^{n \times n}$ define $M_1 = \{A \in F \mid A = A^\top\}$ and $M_2 = \{A \in F \mid A = -A^\top\}$. Study whether the sum $F = M_1 + M_2$ is a direct sum.
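As a hint for the matrix exercise, a numerical experiment may help. The sketch below is an illustration only, with function names chosen here: it splits a square matrix $A$ into the candidate parts $S = (A + A^\top)/2$ and $K = (A - A^\top)/2$ and lets one check that $S$ is symmetric, $K$ is antisymmetric and $S + K = A$. Proving that this splitting is the only possible one is still left to the reader.

```python
def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def sym_antisym_parts(a):
    """Return (S, K) with S = (A + A^T)/2 and K = (A - A^T)/2."""
    at = transpose(a)
    n = len(a)
    s = [[(a[i][j] + at[i][j]) / 2 for j in range(n)] for i in range(n)]
    k = [[(a[i][j] - at[i][j]) / 2 for j in range(n)] for i in range(n)]
    return s, k


A = [[1, 2], [4, 3]]
S, K = sym_antisym_parts(A)
print(S)  # symmetric part
print(K)  # antisymmetric part
```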


1.5 Product space

If X, Y are vector spaces, we can construct a new vector space

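\[ Z = X \times Y = \{(u, v) \mid u \in X,\ v \in Y\}. \]

The operations of vector addition and scalar multiplication are defined in the natural way: $(u_1, v_1) + (u_2, v_2) = (u_1 + u_2, v_1 + v_2)$ and $\lambda(u, v) = (\lambda u, \lambda v)$. This is called the product space of $X$ and $Y$. A trivial example is $\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \dots \times \mathbb{R}$.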

2 Normed Spaces

In a vector space one can often define a “distance” or “length” concept called a norm. This is a nonnegative function $x \mapsto \|x\| \ge 0$ which satisfies the three axioms given below. Such a space is called a normed space and is denoted by $(X, \|\cdot\|)$.

Axioms of a norm:

1. $\|x\| = 0 \Leftrightarrow x = 0$
2. $\|\alpha x\| = |\alpha|\,\|x\|$
3. $\|x + y\| \le \|x\| + \|y\|$

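Example 1: The space $\mathbb{R}^n$ with norm $\|x\| = \sqrt{\sum_i x_i^2}$ and $\mathbb{C}^n$ with norm $\|z\| = \sqrt{\sum_i |z_i|^2}$.

Example 2: In $\mathbb{C}^n$ or $\mathbb{R}^n$ the following are norms: $\|x\|_1 = |x_1| + |x_2| + \dots + |x_n|$ and $\|x\|_\infty = \max_i |x_i|$. In general the $l^p$-norm, $p \ge 1$, is defined as follows:

\[ \|x\|_p = \Big( \sum_{j=1}^{n} |x_j|^p \Big)^{1/p}. \]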
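The norms of Examples 1 and 2 are easy to compute directly. The following is a small sketch, where the function name `p_norm` is an assumption made here: it evaluates the $l^p$-norm of a finite vector and treats $p = \infty$ as the max-norm.

```python
def p_norm(x, p):
    """l^p-norm of a finite real or complex vector; p = float('inf') gives the max-norm."""
    if p == float("inf"):
        return max(abs(xi) for xi in x)
    if p < 1:
        raise ValueError("p must be >= 1 for the triangle inequality to hold")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3, -4]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, float("inf")))
```

For a fixed vector the value decreases as $p$ grows, and the max-norm is the limiting case.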

Example 3: The following are spaces of sequences (or signals).

\[ l^1 = \Big\{(x_i) \;\Big|\; \sum_{i=1}^{\infty} |x_i| < \infty\Big\} \quad \text{with norm} \quad \|x\|_1 = \sum_{i=1}^{\infty} |x_i| \]

\[ l^\infty = \Big\{(x_i) \;\Big|\; \sup_i |x_i| < \infty\Big\} \quad \text{with norm} \quad \|x\|_\infty = \sup_i |x_i| \]

\[ l^p = \Big\{(x_i) \;\Big|\; \sum_{i=1}^{\infty} |x_i|^p < \infty\Big\} \quad \text{with norm} \quad \|x\|_p = \Big(\sum_{i=1}^{\infty} |x_i|^p\Big)^{1/p} \]

They can be used to model discrete data structures like digital signals, measurement time series etc.

In each of the cases mentioned above we should prove that the norm axioms are satisfied. Some of these are left as exercises. All can be found in textbooks of functional analysis [See ....].

The space $l^p$ is sometimes written as $l_p$, but they mean the same thing. The same applies to the notations $l^\infty$, $L^p$ etc.

Example 4: We denote by $C[a,b]$ the space of continuous functions on the interval $[a,b]$.

Unless stated otherwise, this space is equipped with the sup-norm

\[ \|u\|_\infty = \sup_t |u(t)|. \]

The symbol $C^k[a,b]$ means the space of $k$ times continuously differentiable functions.

2.1 Important inequalities

The following important inequalities are needed in the proof of the triangle inequality for the norms above. They are also often used in proving results about convergence etc. These inequalities are true for arbitrary sets of real or complex numbers $x_i$ and $y_i$.

Schwarz inequality

\[ \sum_{i=1}^{n} |x_i y_i| \le \Big( \sum_{i=1}^{n} |x_i|^2 \Big)^{1/2} \Big( \sum_{i=1}^{n} |y_i|^2 \Big)^{1/2} \]

This result can be proved as follows. For every real $\lambda$ the following is obviously true: $\sum_{i=1}^{n} (|x_i| + \lambda |y_i|)^2 \ge 0$. This quantity can be written as a function of $\lambda$ as $A\lambda^2 + B\lambda + C \ge 0$ for all $\lambda$, where $A = \sum |y_i|^2$, $B = 2\sum |x_i y_i|$ and $C = \sum |x_i|^2$. As an upward parabola this function never goes below the $\lambda$-axis, so the discriminant $\delta = B^2 - 4AC$ of this polynomial cannot be positive. The condition $B^2 \le 4AC$ is exactly the Schwarz inequality.

Hölder inequality

\[ \sum_{i=1}^{n} |x_i y_i| \le \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p} \Big( \sum_{i=1}^{n} |y_i|^q \Big)^{1/q}, \quad \text{where} \quad \frac{1}{p} + \frac{1}{q} = 1. \]

Here we say that $p$ and $q$ are conjugate exponents. For instance, if $p = 3$ then $q = 1/[1 - (1/3)] = 3/2$. The proof of Hölder's inequality is based on an ingenious convexity argument [See...]. A consequence of this inequality is the following result (the triangle inequality for the $l^p$-norm).

Minkowski inequality

\[ \Big( \sum_{i=1}^{n} |x_i + y_i|^p \Big)^{1/p} \le \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p} + \Big( \sum_{i=1}^{n} |y_i|^p \Big)^{1/p}, \quad p \ge 1. \]
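The inequalities of this section can be spot-checked numerically, which is of course no substitute for a proof. The sketch below uses names chosen here for illustration and assumes $p > 1$. It computes the difference between the right and left sides of the Hölder and Minkowski inequalities for random vectors; both gaps should be nonnegative, and $p = 2$ gives the Schwarz inequality.

```python
import random


def p_norm(x, p):
    """l^p-norm of a finite vector, for 1 <= p < infinity."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def conjugate_exponent(p):
    """The q with 1/p + 1/q = 1, for p > 1."""
    return p / (p - 1.0)


def holder_gap(x, y, p):
    """Right side minus left side of the Hölder inequality."""
    q = conjugate_exponent(p)
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - lhs


def minkowski_gap(x, y, p):
    """Right side minus left side of the Minkowski inequality."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)


random.seed(0)
for p in (1.5, 2.0, 3.0):
    x = [random.uniform(-1, 1) for _ in range(10)]
    y = [random.uniform(-1, 1) for _ in range(10)]
    print(p, holder_gap(x, y, p) >= 0, minkowski_gap(x, y, p) >= 0)
```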

2.2 Norm in the product space

If $X$, $Y$ are normed spaces, we can construct a new normed space $Z = X \times Y = \{(u, v) \mid u \in X,\ v \in Y\}$. A norm $\|(u, v)\|$ in this space can be, for example,

\[ \|u\| + \|v\| \quad \text{or} \quad \sqrt{\|u\|^2 + \|v\|^2} \quad \text{or} \quad \max\{\|u\|, \|v\|\}. \]

A trivial example is $\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \dots \times \mathbb{R}$ with any $l^p$-norm. The notion of product space is a useful idea in the formulation and analysis of mathematical models for many technical arrangements, as the following examples show.

Example (Fig. 2). Consider a wire under tension, having some vibrations. The state of the system can be described as $[u(x), \lambda] \in C[a,b] \times \mathbb{R}$, where $u(x; t)$ is the form of the wire at time $t$ and $\lambda$ is the stretching force.

Figure 2: A wire under tension; example of a product space.

Example (Fig. ??). A support beam of varying thickness is attached to a wall and loaded with a mass distribution. Let $x$ denote the vertical distance, $u(x)$ the design (the varying thickness), $m(x)$ the mass density of the loading and $d(x)$ the deflection of the bar.

The system's reaction to the load is a mapping $F : [u(x), m(x)] \mapsto d(x)$.

Here $[u, m] \in C[0,1] \times L^1[0,1]$ and $F[u, m] \in C[0,1]$.

Think about possible norms in these spaces.

2.3 Equivalent norms, isomorphic spaces

Let $X$, $Y$ be normed spaces and $T : X \to Y$ a mapping such that

\[ T(x + y) = Tx + Ty, \qquad T(\lambda x) = \lambda Tx. \]

Such a mapping is called a linear mapping, linear transformation or linear operator (see chapter X).

An isomorphism is a bijective linear mapping between two vector spaces. When such a mapping exists, the two vector spaces can be seen to have an identical structure. Hence an isomorphic relationship may be used to derive useful conclusions.

Example. Let us consider two vector spaces

\[ X = \{A \mid A \in \mathbb{R}^{2 \times 2}\}, \qquad Y = \{u \in C[0,1] \mid u(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3\} = P_3[0,1]. \]

We define a mapping $T$ as a correspondence between these spaces, illustrated by the diagram

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \longleftrightarrow a + bx + cx^2 + dx^3 \]

This mapping $T : X \to Y$ is obviously bijective and a linear operator. So $X$ and $Y$ are isomorphic vector spaces.

If both spaces $X$ and $Y$ possess norms, then the operator $T$ is a mapping between two normed spaces. For instance, let us define norms as follows.

For $A \in X$ the norm is $\|A\|_1 = |a| + |b| + |c| + |d|$, and for $u \in Y$ with $u(t) = a + bt + ct^2 + dt^3$ we define $\|u\|_0 = \max\{|a|, |b|, |c|, |d|\}$. Then we have a diagram

\[ (X, \|\cdot\|_1) \xrightarrow{\;T\;} (Y, \|\cdot\|_0) \]

We say that the mapping $T$ is a "topological isomorphism" if the norms of $x$ and $Tx$ are comparable in the following sense: there are fixed constants $m > 0$ and $M$ such that for each $x$

\[ m\|x\|_1 \le \|Tx\|_0 \le M\|x\|_1. \]

We will learn later (Chapter xx) that this condition also means continuity of the mapping $T$ in both directions. If $\|\cdot\|_0$ and $\|\cdot\|_1$ are two norms in the same space satisfying $m\|x\|_1 \le \|x\|_0 \le M\|x\|_1$ (the condition above with $T$ the identity mapping), they are said to be equivalent norms.
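For the example above the constants can be found by hand: the largest of four absolute values is at most their sum and at least a quarter of it, so $m = 1/4$ and $M = 1$ work. The sketch below, with function names that are assumptions of this illustration, checks the bounds $\tfrac14 \|A\|_1 \le \|TA\|_0 \le \|A\|_1$ on random $2 \times 2$ matrices.

```python
import random


def norm_1(a):
    """Sum of absolute values of the entries of a 2x2 matrix."""
    return sum(abs(e) for row in a for e in row)


def norm_0_of_image(a):
    """Max-coefficient norm of the polynomial T(A) = a + b x + c x^2 + d x^3."""
    return max(abs(e) for row in a for e in row)


def ratio(a):
    """The quotient ||TA||_0 / ||A||_1 for a nonzero matrix A."""
    return norm_0_of_image(a) / norm_1(a)


random.seed(0)
ratios = [ratio([[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)])
          for _ in range(1000)]
print(min(ratios), max(ratios))  # all values stay between 1/4 and 1
```

Both bounds are attained: the matrix with all entries equal to 1 gives the ratio 1/4, and a matrix with a single nonzero entry gives the ratio 1.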


2.4 Isometry

Assume that we have a mapping between two normed spaces

\[ T : (X, \|\cdot\|_1) \to (Y, \|\cdot\|_2). \]

If for each $x$ we have $\|Tx\|_2 = \|x\|_1$, then we say that $T$ is a norm-preserving mapping, or that $T$ is an isometry.

Example. Let $V = \mathbb{R}^n$ with the Euclidean norm and let $U \in \mathbb{R}^{n \times n}$ be an orthogonal matrix ($U^\top U = I$).

Define a mapping $U : V \to V$ as $x \mapsto Ux$. Then $\|Ux\|^2 = x^\top U^\top U x = \|x\|^2$ for all $x$, so we see that $U$ is an isometry. In fact we know that such a matrix multiplication means just a rotation (possibly combined with a reflection) in the space $\mathbb{R}^n$.
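A plane rotation gives a concrete instance of this example. The following is a minimal sketch with helper names chosen here for illustration. It also shows that the isometry property depends on the norm: the same rotation does not preserve the $l^1$-norm.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply(u, x):
    """Matrix-vector product Ux."""
    return [sum(uij * xj for uij, xj in zip(row, x)) for row in u]


def euclidean_norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def l1_norm(x):
    return sum(abs(xi) for xi in x)


x = [3.0, 4.0]
y = apply(rotation(0.7), x)
print(euclidean_norm(x), euclidean_norm(y))   # equal up to rounding
e1 = [1.0, 0.0]
print(l1_norm(e1), l1_norm(apply(rotation(math.pi / 4), e1)))  # not equal
```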

2.5 Ill posed problems, norm sensitive mappings

Many technical algorithms and mathematical operations have the generic form

F (x) = y.

Here $F$ may represent a system or an algorithm/program, $x$ represents the input (initial values, initial state of the system) and $y$ represents the observed, measured or forecast output. In practice $F$ may represent the (left side of a) differential equation, partial differential equation, system of ODEs, stochastic system, algebraic system of equations or a mixture of these.

Using this system model may mean computation of $y$ when $x$ is known or measured (direct problem) or solving for $x$ when $y$ is known or measured (inverse problem).

When applying this model one may have errors, uncertainty, inaccuracy etc. in either $y$ or $x$ or both. A system may exhibit sensitivity to initial values.

Some examples. Here $F$ may refer to a partial differential equation, like the heat equation. Then $x$ is an initial temperature distribution and $y$ is the final temperature distribution. Computing the initial temperature backwards starting from the final one is extremely sensitive to errors.

When extruding a plastic/elastic polymer mass through an opening (a die), one wants to produce, after solidification, a profile of a given form. The output form $y$ is (due to elastic/plastic deformation) slightly different from the design of the die $x$. One would need to find a form for the die that creates the desired output profile. Here we have the inverse problem $x = F^{-1}(y)$.

Similar phenomena are seen in light scattering, image enhancement, optical lens corrections etc.

In all these cases the question is how the norms $\|x\|$ and $\|Fx\|$ are related. When the mapping is an isomorphism (or an isometry), the situation is easy.
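The sensitivity of the backward heat problem can be made quantitative in a simple model case. The sketch below rests on assumptions made here, not in the notes: the one-dimensional heat equation $u_t = u_{xx}$ on $[0, \pi]$ with zero boundary values, for which the $k$-th sine mode is damped by the factor $e^{-k^2 t}$. Going backwards in time, an error in that mode of the measured final state is therefore multiplied by $e^{k^2 t}$.

```python
import math


def forward_damping(k, t):
    """Factor by which the k-th sine mode of u_t = u_xx shrinks over time t (model assumption)."""
    return math.exp(-k * k * t)


def backward_amplification(k, t):
    """Factor by which an error in the k-th mode of the final state grows when solving backwards."""
    return math.exp(k * k * t)


t_final = 0.1
for k in (1, 5, 10):
    print(k, forward_damping(k, t_final), backward_amplification(k, t_final))
```

The direct mapping is harmless in every norm, while the inverse mapping has no bound of the form $\|F^{-1}y\| \le M\|y\|$ valid for all modes.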

3 Convergence and continuity

3.1 Topological Terms

A metric space is a set possessing a distance function $d(x, y)$. This will be defined in a later chapter (see XX). A normed space has a natural metric $d(x, y) = \|x - y\|$. In metric spaces the following concepts are defined.

• open ball
Let $X$ be a metric space with metric $d$ and $x_0 \in X$. An open ball is a set $\{x \in X \mid d(x, x_0) < r\}$, $r > 0$.

• closed ball
A closed ball is a set $\{x \in X \mid d(x, x_0) \le r\}$, $r > 0$.

• open set
A subset $M$ of a metric space $X$ is open if it contains a ball about each of its points (it contains none of its boundary points). An open ball is an example.

• closed set
A subset $K$ of a metric space $X$ is closed if its complement in $X$ is open. A closed ball is an example.

• boundary (point)
A boundary point of a set $A \subset X$ is a point, which may or may not belong to $A$, whose every neighbourhood contains points belonging to $A$ and also points not belonging to $A$.

• neighbourhood
A neighbourhood of a point $x_0 \in X$ is any set in $X$ containing an open ball of radius $\varepsilon > 0$ about $x_0$ ($\varepsilon$-neighbourhood).

• $\mathring{A}$ is the interior of $A$, the largest open set contained in $A$.

• $\bar{A}$ is the closure of $A$, the smallest closed set containing $A$.

• $\partial A$ is the boundary of $A$, the set of all boundary points of $A$.

• accumulation point
A point $x_0 \in X$ is an accumulation point of a set $M \subset X$ if every neighbourhood of $x_0$ contains at least one point $y \in M$ distinct from $x_0$. The point $x_0$ may or may not be in $M$. Note that the union of $M$ and the set of all accumulation points of $M$ is the closure of $M$. Hence one can construct an infinite sequence of points in $M$ that converges to an accumulation point, but the limit does not necessarily belong to $M$.

Some implications of these definitions are the following. For an open set all points are interior points, or $A = \mathring{A}$. A closed set contains its boundary points, hence $A = \bar{A}$. More exactly, $\bar{A} = \mathring{A} \cup \partial A$.

3.2 Convergence of a Sequence

A sequence $(x_n)$ in a metric space $X = (X, d)$ converges if there exists $x \in X$ such that

\[ \lim_{n \to \infty} d(x_n, x) = 0. \]

Here $x$ is the limit of the sequence: $x_n \to x$. In a normed space we can define $d(x, y) = \|x - y\|$, so the condition can be written

\[ \lim_{n \to \infty} \|x_n - x\| = 0. \]

3.3 Continuity of a function

Let $X$, $Y$ be normed spaces, $F : X \to Y$ a function and $x_0 \in X$. The function $F$ is continuous at $x_0$ if $x \to x_0 \Rightarrow F(x) \to F(x_0)$, or

\[ \|x - x_0\| \to 0 \;\Rightarrow\; \|F(x) - F(x_0)\| \to 0. \]

Stated in exact terms this means that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\|F(x) - F(x_0)\| \le \varepsilon$ for all $x$ satisfying $\|x - x_0\| \le \delta$.

Exercise: If the function $F$ is linear and continuous at some point $a$, then it is continuous everywhere.

3.4 Uniform Continuity

Uniform continuity means that in the above definition of continuity one can select one $\delta$ which satisfies the condition at every point. More exactly: for every $\varepsilon > 0$ there exists $\delta > 0$ such that for all $x, y \in X$, $\|x - y\| \le \delta$ implies $\|F(x) - F(y)\| \le \varepsilon$.

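Figure 3 presents the curve $f(x) = \operatorname{sgn}(x)\sqrt{|x|}$ on the interval $[-1, 1]$, whose slope becomes infinite at one point. Near that point no bound of the form $|f(x) - f(y)| \le L|x - y|$ holds, so a very small $\delta$ is needed for a given $\varepsilon$. Note, however, that a continuous function on a closed and bounded interval is always uniformly continuous, so this curve fails only the stronger Lipschitz condition. An example of a continuous function that is not uniformly continuous is $f(x) = 1/x$ on the interval $(0, 1]$.

Figure 3: A curve whose slope becomes infinite at one point.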

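Example. A function $A : \mathbb{R}^n \to \mathbb{R}^n$, defined by a matrix $A \in \mathbb{R}^{n \times n}$ by $x \mapsto Ax$, is continuous and uniformly continuous. Indeed, $\|Ax - Ay\| = \|A(x - y)\| \le M\|x - y\|$ with a constant $M$ depending only on $A$, so

\[ \|x - y\| \le \delta = \varepsilon / M \;\Rightarrow\; \|Ax - Ay\| \le \varepsilon. \]

Exercise: If a function $F$ between normed spaces is linear and continuous at some point $a$, then it is uniformly continuous.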

3.5 Compactness

A set $A$ is said to be compact if every infinite sequence of points $(x_n)$, $x_n \in A$, has a convergent subsequence with limit in $A$, i.e., there exist $a \in A$ and a subsequence $(x_{n(i)}) \to a$.

An example of a compact set is a closed and bounded set in $\mathbb{R}^n$. To illustrate the idea assume that $A \subset \mathbb{R}^2$ is a closed and bounded subset (Fig. 4). We show that it is compact.

Figure 4: A compact set.

Choose a sequence $(x_n)$ from this set $A$. The set can be enclosed in a square $Q_1$. Divide the square into four identical subsquares. One of them, call it $Q_2$, must contain $x_n$ for infinitely many indices $n$. Divide this square into 4 subsquares. One of them, call it $Q_3$, must again contain $x_n$ for infinitely many indices $n$. Continuing, we generate a nested sequence of squares with size converging to zero. It is easy to pick a subsequence such that $x_{n(i)} \in Q_i$. This subsequence is a Cauchy sequence in $\mathbb{R}^2$ and hence convergent, and since $A$ is closed the limit belongs to $A$.

This observation is no longer true if the space has infinite dimension.

Example. A closed and bounded set in the normed space

\[ l^1 = \Big\{(x_n) \;\Big|\; \sum_{n=1}^{\infty} |x_n| < \infty\Big\} \]

is defined as follows:

\[ B = \Big\{(x_n) \;\Big|\; \sum_{n=1}^{\infty} |x_n| \le 1\Big\} = \{x \in l^1 \mid \|x\|_1 \le 1\}. \]

This set is not compact. To see this, pick a sequence $(e_i)$ from this set as follows: $e_1 = (1, 0, 0, 0, \dots)$, $e_2 = (0, 1, 0, 0, \dots)$, $e_3 = (0, 0, 1, 0, \dots)$, $\dots$ Then $(e_n)$ is an infinite sequence in $B$, but it does not contain any convergent subsequence. Why?
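A hint for the question above is to look at the mutual distances of the points $e_i$. The sketch below is an illustration under an assumption made here: each $e_i$ is truncated to finitely many coordinates, which does not change the distances. It computes $\|e_i - e_j\|_1$ for several pairs.

```python
def unit_vector(i, length):
    """First `length` coordinates of e_i (1-based index i)."""
    return [1.0 if k == i else 0.0 for k in range(1, length + 1)]


def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


n = 6
vectors = [unit_vector(i, n) for i in range(1, n + 1)]
distances = {l1_distance(vectors[i], vectors[j])
             for i in range(n) for j in range(n) if i != j}
print(distances)  # the set of distinct pairwise distances
```

If every pair of distinct terms is the same fixed distance apart, no subsequence can satisfy the Cauchy condition.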

Compactness is an important property in optimization, for instance. Figure 5 shows a function of two variables. We are interested in finding the maximum of the function over two different sets (constraints) in the xy-plane; one is unbounded and the other is closed and bounded. The unbounded set is not compact, the bounded set is compact. The picture illustrates the following important theorem.

Figure 5: An unbounded, non-compact set and a bounded compact set.

Theorem 3.1. Assume that $A \subset X$ is a compact set in a normed space $X$. Let $F(u)$ be a continuous function $F : X \to \mathbb{R}$. Then $F$ has a maximum (and a minimum) point in $A$.

To see this, select a sequence $u^{(i)} \in A$ such that $F(u^{(i)}) \to \sup_A F$ and use compactness to select a convergent subsequence. This theorem can be used to guarantee that certain optimization problems have a solution.

Example. Assume that the cost function of some control problem depends on the applied control/design/policy function $f(t)$ according to an energy functional

\[ F(f) = \int_0^1 \Phi(t)\,|f(t)|^2\,dt. \]

Assume that the optimum is sought in a subset of functions $B = \{f \in C[0,1] \mid \|f\| \le 0.1\}$. This would be a constrained optimization problem in optimal control. Unfortunately the constraint set $B$ in this case is not compact.

3.6 Convexity, finite dimensional spaces

A real valued function $f$ on a normed space $(V, \|\cdot\|)$ is called convex if for every $x, y \in V$ and $0 \le \lambda \le 1$ we have

\[ f[\lambda x + (1 - \lambda)y] \le \lambda f(x) + (1 - \lambda) f(y). \]

Assume that $f(x)$ is a continuous convex function on the unit ball $B = \{x : \|x\| \le 1\}$ in the finite dimensional space $\mathbb{R}^n$. It can be shown, using known results from the theory of real functions, that if $f(x) > 0$ for every $x \in B$ then the overall minimum is positive, that is $\min_B f(x) > 0$.

Using a convexity argument one can show that in a normed space each finite dimensional subspace is closed. Assume that $H = \operatorname{span}\{x_1, x_2, \dots, x_n\}$ with $x_1, \dots, x_n$ linearly independent, and that a point $x$ is not in $H$. Define functions $d(t)$ and $f(t)$ of $t = (t_1, \dots, t_n) \in \mathbb{R}^n$ as follows:

\[ d(t_1, t_2, \dots, t_n) = \Big\| \sum_{i=1}^{n} t_i x_i \Big\| \]

and

\[ f(t_1, t_2, \dots, t_n) = \Big\| x - \sum_{i=1}^{n} t_i x_i \Big\|. \]

Both $d$ and $f$ are convex and continuous (exercise). The function $d$ attains its minimum $\Delta$ on the compact set $\{t : \max_i |t_i| = 1\}$, the surface of the unit cube. Because of linear independence we know that $\Delta > 0$. For an arbitrary point $t = (t_i)$ in $\mathbb{R}^n$ we must then have, by homogeneity of the norm,

\[ d(t_1, t_2, \dots, t_n) = \Big\| \sum_{i=1}^{n} t_i x_i \Big\| \ge \max_i |t_i| \, \Delta. \]

Similarly for the function $f(t)$ we can estimate

\[ f(t) = \Big\| x - \sum_{i=1}^{n} t_i x_i \Big\| \ge \Big\| \sum_{i=1}^{n} t_i x_i \Big\| - \|x\| \ge \max_i |t_i| \, \Delta - \|x\|. \]

Using this one can show that $f(t)$ must have a positive global minimum $m$. Indeed, $f(t) > \|x\| = f(0)$ as soon as $\max_i |t_i| > 2\|x\|/\Delta$, so the minimum of $f$ over the compact cube $\max_i |t_i| \le 2\|x\|/\Delta$ is a global minimum, and it is positive because $x \notin H$. This means that $x$ is an exterior point regarding the subspace $H$ with separating distance $m$. Since the complement is open, $H$ must be closed.

One can also show that on a finite dimensional normed space all norms are equivalent. A related fact is that on a finite dimensional normed space every linear mapping $T$ must be continuous. None of these facts holds in infinite dimensional spaces.

3.7 Cauchy Sequence

Next we define a condition which could be called 'almost convergence'. Let $(V, \|\cdot\|)$ be a normed space and $(x_n)$ a sequence in $V$. The sequence $(x_n)$ is a Cauchy sequence if $\|x_n - x_m\| \to 0$ when $n$ and $m \to \infty$.

Later we will show that any Cauchy sequence actually is convergent, but the limit point may be 'outside the space'.

Example. Each convergent sequence satisfies the Cauchy condition. Let $(x_n)$, $x_n \to x_0$, be a convergent sequence. Then $\|x_n - x_0\| \to 0$ and so

\[ \|x_n - x_m\| = \|x_n - x_0 + x_0 - x_m\| \le \|x_n - x_0\| + \|x_0 - x_m\| \to 0 \quad \text{as } n, m \to \infty. \]

Is the converse true? In $\mathbb{R}^n$ yes, but not in general. This is shown in the next example.

Example. Let $V = C[0,1]$, the space of continuous functions equipped with the $\|\cdot\|_1$-norm. Define a sequence of functions $(u_n)$ as in Figure 6: the middle part is located in the interval $\big[\tfrac12 - \tfrac1n,\ \tfrac12 + \tfrac1n\big]$ and the slope grows with $n$.

Figure 6: A sequence of functions.

The expression of the function $u_n(t)$ can be written

\[ u_n(t) = \begin{cases} 0, & 0 \le t \le \tfrac12 - \tfrac1n \\ \tfrac12 - \tfrac{n}{4} + \tfrac{n}{2}\,t, & \tfrac12 - \tfrac1n \le t \le \tfrac12 + \tfrac1n \\ 1, & \tfrac12 + \tfrac1n \le t \le 1 \end{cases} \]

As illustrated in Figure 7,

\[ \|u_n - u_m\|_1 = \int_0^1 |u_n(t) - u_m(t)|\,dt \longrightarrow 0 \]

and similarly

\[ \|u_n - u_m\|_2^2 = \int_0^1 |u_n(t) - u_m(t)|^2\,dt \longrightarrow 0 \]

when $n$ and $m$ grow to infinity. Hence, this is a Cauchy sequence in $V$. However, the limit of this sequence is not a continuous function (Fig. 8), so the sequence is not convergent in $V$.

One can see, however, that the sequence has a limit in the space $L^1[0,1]$ or $L^2[0,1]$.
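The Cauchy property of this example can be observed numerically. The sketch below is an illustration with names and a step count chosen here. It approximates $\|u_n - u_m\|_1$ with the midpoint rule and compares it with the value $\tfrac12 |1/n - 1/m|$, which follows from the areas of the triangles between the two ramps when $n, m \ge 2$.

```python
def u(n, t):
    """The ramp function u_n of the example, for n >= 2 and 0 <= t <= 1."""
    lo, hi = 0.5 - 1.0 / n, 0.5 + 1.0 / n
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return 0.5 - n / 4.0 + n * t / 2.0


def l1_distance(n, m, steps=20000):
    """Midpoint-rule approximation of the integral of |u_n - u_m| over [0, 1]."""
    h = 1.0 / steps
    return h * sum(abs(u(n, (i + 0.5) * h) - u(m, (i + 0.5) * h)) for i in range(steps))


for n, m in ((4, 8), (40, 80), (400, 800)):
    print(n, m, l1_distance(n, m), 0.5 * abs(1.0 / n - 1.0 / m))
```

The distances shrink as $n$ and $m$ grow, although the pointwise limit jumps from 0 to 1 at $t = 1/2$.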

Figure 7: ‖un − um‖

Figure 8: Limit of the sequence of functions.


4 Banach Space

A normed space $(V, \|\cdot\|)$ is called complete if every Cauchy sequence is convergent in $V$. A complete normed space is called a Banach space.

Examples of Banach spaces are $\mathbb{C}^n$, $\mathbb{R}^n$ and $(C[a,b], \|\cdot\|_\infty)$. In fact every finite dimensional normed space is complete and so a Banach space.

Further examples are the sequence space $l^p$ with the $l^p$-norm and $l^\infty$ with the sup-norm. Also the space $L^p[a,b]$ of (Lebesgue) integrable functions with norm

\[ \|u\|_p = \Big( \int_{[a,b]} |u(t)|^p\,dt \Big)^{1/p} \]

is a Banach space, see chapter (xxx). Proofs of these facts are given in many basic textbooks of functional analysis.

Why are we interested in whether a given set of mathematical objects is a Banach space? The reason is the many facts and powerful theorems that are known to be true in Banach spaces. Many useful mathematical tools, arguments and algorithms are ready to be applied once we know that we are working in a Banach space.

4.1 Completion

Many important normed spaces are not complete. However we know that every normed space is almost complete. These spaces can be 'fixed' to become Banach spaces. This is done by a procedure called completion. The basis of this idea is given below.

Let $V$ be a normed space. There exists a complete normed space $W$ such that $V \subset W$ and the closure of $V$ is $W$, $\bar{V} = W$. We say that $V$ is dense in $W$.

A subset $S \subset V$ is called dense if $\bar{S} = V$. This means that for every $a \in V$ there exists a sequence $(x_n)$ in $S$ such that $x_n \to a$.

Example. $(C[a,b], \|\cdot\|_1)$ is not a Banach space. However $C[a,b] \subset L^1[a,b]$, $C$ is dense in $L^1$ and $L^1$ is a Banach space. Here $L^1$ is the completion of $C$.

Example. Consider the space $V = C^\infty[a,b]$ of infinitely many times differentiable functions. For these functions the so-called Sobolev norms are defined as

\[ \|u\|_{n,p} = \Big[ \int_a^b \sum_{i=0}^{n} |D^i u(t)|^p\,dt \Big]^{1/p}. \]

$(C^\infty[a,b], \|\cdot\|_{n,p})$ is not in general a Banach space. The above mentioned spaces and Sobolev norms are important in the theory of PDEs. The simplest Sobolev norm is

\[ \|u\| = \int_a^b \big( |u(t)| + |u'(t)| \big)\,dt. \]

The significance of this norm can be seen from the following example. In surface inspection of a machining process or quality evaluation of a newly built road we measure the difference between an ideal curve $f_0(t)$ and the real result $f(t)$. See the figure below. Compare the results when (a) the $L^1$-norm or (b) the Sobolev norm is used.

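Example. The following Banach space might appear in CFD or other engineering applications of PDEs. Let $\Omega \subset \mathbb{R}^n$ be the "interior of some pressurised vessel" and

\[ C^k(\Omega) = \{u \mid \text{the function } u \text{ has continuous derivatives up to order } k\}. \]

Let $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$ be a multi-index (a vector of nonnegative integers), $|\alpha| = \sum |\alpha_i|$ and

\[ D^\alpha u = \frac{\partial^{|\alpha|} u}{\partial x_1^{\alpha_1}\, \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}. \]

Define

\[ \|u\| = \max_{|\alpha| \le k} \|D^\alpha u\|_\infty. \]

Then the space $(C^k(\Omega), \|\cdot\|)$ is a Banach space.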

4.2 Contraction Theorem

We present an example of the power of Banach space theory. Let $(V, \|\cdot\|)$ be a normed space and $F : V \to V$ a function. $F$ is a contraction mapping if there exists $k$, $0 \le k < 1$, such that $\|F(x) - F(y)\| \le k\|x - y\|$ for all $x, y \in V$.

With a contraction mapping, whatever $x$ and $y$ you choose, their images are closer together than the original points, by a factor less than 1.

Theorem 4.1. Let $X$ be a Banach space and $F : X \to X$ a contraction mapping. Then there exists a unique $x^\star \in X$ such that $F(x^\star) = x^\star$, called the fixed point of $F$.
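The standard proof of this theorem is constructive: starting from any point, the iterates $x_{k+1} = F(x_k)$ converge to the fixed point. The sketch below illustrates this with a choice made here, not in the notes: the function $\cos$ on the closed interval $[0, 1]$, which it maps into itself with $|\cos'(x)| \le \sin 1 < 1$. A closed interval is a complete metric space, so the same argument applies to it.

```python
import math


def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x -> f(x) until two successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


x_star = fixed_point(math.cos, 0.0)
print(x_star, math.cos(x_star))  # the two numbers agree up to the tolerance
```

Starting from a different point of the interval gives the same limit, in line with the uniqueness statement.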

The contraction theorem has some important applications. This is illustrated by the following.

Example. We study a general differential equation with initial condition

\[ u' = f(u, t), \qquad u(t_0) = u_0. \tag{1} \]

Transform this equation into the equivalent integral equation

\[ u(t) = u_0 + \int_{t_0}^{t} f(u(s), s)\,ds. \]

We define a space of functions (with a suitable radius $a$, not specified here) $X = C[t_0 - a, t_0 + a]$ with the sup-norm and an operator $F : X \to X$ by the following formula:

\[ (Fu)(t) = u_0 + \int_{t_0}^{t} f(u(s), s)\,ds. \]

The following equivalence is obvious: Eq. (1) $\Leftrightarrow$ $u = Fu$.

Solving the differential equation has been transformed into a question about a fixed point of the operator $F$. Existence of the solution is guaranteed for a large class of differential equations. Whenever the kernel function $f(u, t)$ is such that the integral operator becomes a contraction, the solution exists by the contraction principle.
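The fixed point iteration $u_{k+1} = F u_k$ for this operator can be carried out exactly in a simple case. The sketch below assumes the specific equation $u' = u$, $u(0) = 1$, chosen here for illustration, for which $F$ is a contraction in the sup-norm on $[-a, a]$ whenever $a < 1$. Polynomials are stored as coefficient lists, and the helper names are hypothetical.

```python
from fractions import Fraction


def picard_step(coeffs, u0=Fraction(1)):
    """Apply (Fu)(t) = u0 + integral from 0 to t of u(s) ds to a polynomial u."""
    integrated = [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]
    return [u0] + integrated


def picard_iterates(n_steps):
    """Start from the constant function u = 1 and apply F n_steps times."""
    u = [Fraction(1)]
    for _ in range(n_steps):
        u = picard_step(u)
    return u


def evaluate(coeffs, t):
    return sum(float(c) * t ** k for k, c in enumerate(coeffs))


print(picard_iterates(3))                # coefficients of the third iterate
print(evaluate(picard_iterates(15), 1.0))  # close to e
```

The iterates turn out to be the Taylor polynomials of $e^t$, the known solution of this initial value problem.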

Figure 9: An example of what?

4.3 Separability

A normed space (V, ‖ ‖) is separable ⇔ there exists a denumerable (= countable) dense subset of V. Separability is an important property which may be used in the study of convergence, approximation, numerical algorithms etc.

The set of rational numbers Q is denumerable, R is not. The first claim is seen by constructing an infinite table of integer pairs (n, m). Every rational is of the form q = n/m and hence has a place in this table. The cells of this table can easily be enumerated: (1, 1) → (1, 2) → (2, 1) → (3, 1) → (2, 2) → (1, 3) → (1, 4) → (2, 3) → · · · .

The second claim is proved by contradiction using the famous diagonal argument. Assume that the reals in [0, 1) can be enumerated, that is, ordered into a sequence (xn). We represent each of them by its binary representation

\[
\begin{aligned}
x_1 &= 0.\alpha^1_1\alpha^1_2\ldots\alpha^1_m\ldots\\
x_2 &= 0.\alpha^2_1\alpha^2_2\ldots\alpha^2_m\ldots\\
x_3 &= 0.\alpha^3_1\alpha^3_2\ldots\alpha^3_m\ldots\\
&\;\;\vdots\\
x_n &= 0.\alpha^n_1\alpha^n_2\ldots\alpha^n_m\ldots
\end{aligned}
\]

Then we can construct $z = 0.(1-\alpha^1_1)(1-\alpha^2_2)(1-\alpha^3_3)\ldots$, which differs from every xn in the n-th digit and so is not found in the sequence (xn). This is a contradiction.

Examples. It is clear that the set of rationals Q is dense in R, so R is trivially a separable space.

How about $l_1 = \{ (x_n) \mid \sum_{n=1}^{\infty} |x_n| < \infty \}$? Is this a separable space?

Let $S = \{ x \in l_1 \mid x_i \in \mathbb{Q} \}$ be the set of rational sequences. It is easy to see that this set is dense in l1. Exercise! However S is not denumerable. Think of binary sequences of 0s and 1s. A binary signal α = (α0, α1, α2, . . .), αi ∈ {0, 1}, can be made to correspond to the binary number a = 0.α0α1α2 . . ., 0 ≤ a < 1, so we get a mapping [0, 1) ↔ binary signals. Scaling the i-th term, for instance $x_i = \alpha_i 2^{-i}$, gives a rational sequence in l1 for every binary signal, so S contains at least as many elements as the interval [0, 1).

However, the argument can be modified in a nice way. Let $S_0 = \{ (r_1, r_2, \ldots, r_k, 0, 0, \ldots) \mid r_i \in \mathbb{Q},\ k \in \mathbb{N} \}$ be the set of truncated rational sequences. A countable union of countable sets is countable (Exercise!) and so S0 is denumerable.

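S0 is also dense in l1. This is seen as follows (Figure 10). Choose an arbitrary x = (xi) ∈ l1 and ε > 0. From the definition of l1 we have $\sum |x_i| < \infty$, so we can choose N big enough that

\[ \sum_{i=N+1}^{\infty} |x_i| < \frac{\varepsilon}{2}. \]

Next we choose rationals r1, r2, · · · , rN so that

\[ \sum_{i=1}^{N} |r_i - x_i| < \frac{\varepsilon}{2}. \]

For the vector r = (r1, . . . , rN, 0, 0, . . .) we then have

\[ \sum_{i=1}^{\infty} |r_i - x_i| = \sum_{i=1}^{N} |r_i - x_i| + \sum_{i=N+1}^{\infty} |x_i| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]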

We have constructed a vector r ∈ S0 that is arbitrarily close to x ∈ l1.
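The two-step construction (cut the tail, then round the head to rationals) can be sketched as follows. The example sequence x_i = √2 / 2^i, its tail bound and the helper name `truncate_rational` are assumptions chosen so that the tail sum is known in closed form; floating point numbers stand in for the real terms.

```python
from fractions import Fraction
import math

def truncate_rational(x, tail, eps):
    """x(i): i-th term (i >= 1); tail(N): value of sum_{i>N} |x(i)|.

    Returns rationals r_1..r_N with ||r - x||_1 < eps, where r is padded with zeros.
    """
    N = 1
    while tail(N) >= eps / 2:          # step 1: make the tail smaller than eps/2
        N += 1
    D = math.ceil(N / eps) + 1         # step 2: each |r_i - x_i| <= 1/(2D) < eps/(2N)
    return [Fraction(x(i)).limit_denominator(D) for i in range(1, N + 1)]

x = lambda i: math.sqrt(2) / 2**i      # assumed example element of l1
tail = lambda N: math.sqrt(2) / 2**N   # sum of the geometric tail beyond index N
eps = 1e-3
r = truncate_rational(x, tail, eps)
head_error = sum(abs(float(ri) - x(i)) for i, ri in enumerate(r, start=1))
distance = head_error + tail(len(r))   # l1 distance between x and (r_1, ..., r_N, 0, 0, ...)
```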

Figure 10: A vector of space l1.

4.4 Compact Support

Let F be a function on X. The support of the function is defined as
supp F = {x ∈ X | F(x) ≠ 0}
(more precisely, the closure of this set).

F is said to have compact support ⇔ supp F is compact. Such functions are important for instance in the study of PDEs, FEM methods, weak solutions etc., where so-called test functions are often taken from this class.

The symbol C00(R) is used to denote the space of continuous functions of compact support. The space $C^{\infty}_{00}(\mathbb{R})$ is an important space of smooth (infinitely differentiable) compactly supported functions appearing in the theory of FEM methods, distributions, generalized derivatives etc.

5 Metric Spaces

There are situations where a different idea of measuring difference, distance etc. appears, more general than a norm. Let X be a set equipped with a two-place function (x, y) ↦ d(x, y), x, y ∈ X. This function d is called a metric if for all x, y, z ∈ X

1. d(x,y) = 0 ⇔ x = y

2. d(x,y) = d(y,x)

3. d(x, z) ≤ d(x,y) + d(y, z).

Example. Let W be a normed space. Then d(x, y) = ‖x − y‖ is a metric. If U ⊂ W, then (U, d) is a metric space, but not necessarily a vector space.

Example. Let C+[0, 1] = {f ∈ C[0, 1] | f ≥ 0}. Define λ(A) = "the length of A" and define a distance by d(f, g) = λ{x | f(x) ≠ g(x)}. Figure 11.

Figure 11: An example of a metric in function space.

Example. Let Φ[a, b] = {f : [a, b] → R | 0 ≤ f ≤ 1, f integrable} and

\[ d(f,g) = \int_a^b \min\{|f(x)-g(x)|,\ \varepsilon\}\,dx. \]

In this example we evaluate the difference between two functions by applying a certain threshold. If the functions differ by at least ε, then it does not matter how large the difference is. Figure 12.

Figure 12: An example of a metric in function space.

Example. We define BV[0, 1] = {f | f has bounded variation} and

\[ \operatorname{Var} f = |f(0)| + \sup_{D} \sum_{i} |f(x_{i+1}) - f(x_i)|, \]

where D = {x0, x1, . . . , xN} is an arbitrary subdivision of the interval [0, 1]. Figure 13, "length of the curve".

A function of unbounded variation is sin(1/x), for example. Now d(f, g) = Var(f − g) is a metric in BV[0, 1].

Figure 13: Piecewise linear approximation of curve length.

This space BV[0, 1] has some theoretical interest because, suitably normalized, it can be shown to be the dual space of C[0, 1]. See the later section on linear functionals.

5.1 Translation Invariance

Let V be a vector space. If there exists a metric d, does

d(x,y) = d(x+ a,y + a)

hold? Generally no, but if the metric is defined through a norm, we can calculate

d(x, y) = ‖x − y‖ = ‖(x + a) − (y + a)‖ = d(x + a, y + a).

We see that a norm always produces a translation invariant metric.

5.2 Application areas

One example of using a metric in technical applications is image recognition. Think of a gas station or railway station where you can pay with a bank note. The machine must recognize whether the bank note is real. The recognition is based on comparison of the measured signal, the digital image of the received bank note, with a stored official image of a real bank note. These two images are not exactly identical (why?) and so the machine needs to compute the distance between the two images. So here we need a metric to compare two images. Other application areas would be

• evaluating the convergence of a numerical algorithm by measuring adistance between two functions

• in digital codes to define a distance between code words (especially error correcting codes)

• comparing a measured heart curve EKG with an ideal curve of a healthyperson to diagnose the condition of the heart

• measuring similarity of text samples in comparing authors’ styles

• measuring similarity/dissimilarity of fuzzy sets.

6 Hilbert Space

An important class of normed spaces are those where the norm is generatedby an inner product.

Let H be a vector space. Mapping H ×H → R or C denoted by symbol(x,y) 7→ 〈x,y〉 is an inner product if it satisfies the following axioms:

1. 〈αx+ βy, z〉 = α 〈x, z〉+ β 〈y, z〉

2. $\langle x, y\rangle = \overline{\langle y, x\rangle}$ (complex conjugate)

3. 〈x,x〉 ≥ 0, 〈x,x〉 = 0 ⇔ x = 0

The inner product generates a norm ‖x‖ = √〈x, x〉. This norm satisfies the Schwarz inequality

|〈x, y〉| ≤ ‖x‖‖y‖.

The proof is an easy adaptation of the proof presented for vectors in Rn.

We call (H, 〈 , 〉) an inner product space. If H is complete, that is if every Cauchy sequence is convergent in H, we say that H is a Hilbert space. The following inclusions hold: Hilbert ⊂ Banach ⊂ normed ⊂ vector spaces.

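Examples. The spaces Rn and Cn, equipped with

\[ \langle x, y\rangle \equiv x_1\overline{y_1} + x_2\overline{y_2} + \cdots + x_n\overline{y_n} \]

(the conjugates can be dropped in Rn), are Hilbert spaces.

C[a, b] with $\langle u, v\rangle \equiv \int_a^b u(t)v(t)\,dt$ is an inner product space, but not a Hilbert space.

The space l2 with $\langle x, y\rangle \equiv \sum_{n=1}^{\infty} x_n y_n$ is the only lp-type Hilbert space. In fact every separable infinite-dimensional Hilbert space is isomorphic to l2.

The function space $L^2(a,b) = \{ f \mid \int_a^b |f(t)|^2\,dt < \infty \}$ with inner product $\langle u, v\rangle \equiv \int_a^b u(t)v(t)\,dt$ is a Hilbert space.

Several other inner products can be defined, for instance an inner product with a weight function,

\[ \langle u, v\rangle_{\varphi} \equiv \int_a^b \varphi(t)u(t)v(t)\,dt, \qquad \varphi(t) > 0. \]

The following inner product is an example from Sobolev spaces:

\[ \langle u, v\rangle_{S} \equiv \int_a^b uv\,dt + \int_a^b Du\,Dv\,dt, \]

valid for continuously differentiable functions, for example in $C^1[a,b] \cap L^2(a,b)$.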

6.1 Properties of Inner Product

The following note is often useful

〈x,y〉 = 0 ∀x⇒ y = 0

Parallelogram law. The following is true in all inner product spaces: ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖², Figure 14.

Figure 14: Sum and difference of two vectors.
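The parallelogram law also gives a quick numerical test of whether a norm can come from an inner product. The sketch below evaluates the defect of the law for the p-norms in R²; the vectors and the function names are illustrative assumptions.

```python
def norm_p(x, p):
    """The p-norm of a finite vector."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """Return ||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 in the p-norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (norm_p(s, p) ** 2 + norm_p(d, p) ** 2
            - 2 * norm_p(x, p) ** 2 - 2 * norm_p(y, p) ** 2)

x, y = [1.0, 0.0], [0.0, 1.0]
defect_2 = parallelogram_defect(x, y, 2)   # 2-norm: generated by an inner product
defect_1 = parallelogram_defect(x, y, 1)   # 1-norm: the law fails
```

A nonzero defect for p = 1 is consistent with the remark that l2 is the only lp-type Hilbert space.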

Continuity of norm and inner product. Let (xi) be a sequence in a normed space V. If xi → x, then ‖xi‖ → ‖x‖. If in a Hilbert space xi → x and yi → y, then one can easily prove 〈xi, yi〉 → 〈x, y〉. Think of the Schwarz inequality.

Infinite sums. If (xi) is a sequence in a normed space V and the finite sums $\sum_{i=1}^{k} x_i$ form a convergent sequence, that is $\sum_{i=1}^{k} x_i \to x$ or $\|\sum_{i=1}^{k} x_i - x\| \to 0$, we write $\sum_{i=1}^{\infty} x_i = x$ and say that the series is convergent.

If we have a convergent series $\sum_{i=1}^{\infty} x_i = x$ in a Hilbert space H and z ∈ H, then using the continuity of the inner product we can write

\[ \Big\langle \sum_{i=1}^{k} x_i,\ z \Big\rangle = \sum_{i=1}^{k} \langle x_i, z\rangle \to \langle x, z\rangle \]

and so

\[ \langle x, z\rangle = \sum_{i=1}^{\infty} \langle x_i, z\rangle. \]

6.2 Orthogonality

Two vectors are orthogonal if their inner product is zero.

x ⊥ y ⇔ 〈x,y〉 = 0

The Pythagorean theorem holds for inner product spaces.

x ⊥ y ⇒ ‖x+ y‖2 = ‖x‖2 + ‖y‖2

Define the orthogonal complement of S by S⊥ = {x ∈ H | x ⊥ y, ∀y ∈ S}. S⊥ is always a closed subspace (Exercise!). Also it is not difficult to see that $S^{\perp\perp} = \overline{\operatorname{span} S}$. The notation refers to the closure of span S.

6.3 Projection Principle

Let S be a closed subspace in a Hilbert space H. Let x ∈ H, but x /∈ S.One can prove that there exists a unique point y ∈ S such that

‖x − y‖ = min{‖x − z‖ | z ∈ S}. Figure 15.

Figure 15: Projecting x onto S.

This point of minimal distance is found by orthogonal projection: find ysuch that (x− y) ⊥ S or 〈x− y, z〉 = 0 ∀z ∈ S.

Example. Approximate the function $f(t) = \sin t^2$ on an interval [a, b] with a polynomial of degree n. Here f ∈ L2(a, b) and we define a subspace

S = span{1, t, t², . . . , tⁿ}.

Application of the projection principle means the following task. Solve the coefficients αi from

\[ \int_a^b \big[f(t) - (\alpha_0 + \alpha_1 t + \alpha_2 t^2 + \cdots + \alpha_n t^n)\big]\,t^k\,dt = 0 \]

for k = 0, 1, 2, . . . , n. This system of equations gives the optimal approximation.
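The orthogonality conditions form a linear system for the coefficients αi, which can be solved numerically. The following sketch assumes [a, b] = [0, 1], n = 2 and reads the function as sin(t²); the Simpson quadrature and the small Gaussian elimination routine are illustrative choices, not part of the notes.

```python
import math

def simpson(g, a, b, m=200):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

a, b, n = 0.0, 1.0, 2                      # assumed interval and degree
f = lambda t: math.sin(t * t)
# Row k of the system: sum_j alpha_j <t^j, t^k> = <f, t^k>
gram = [[simpson(lambda t: t ** (j + k), a, b) for j in range(n + 1)] for k in range(n + 1)]
rhs = [simpson(lambda t: f(t) * t ** k, a, b) for k in range(n + 1)]
alpha = solve(gram, rhs)
p = lambda t: sum(c * t ** i for i, c in enumerate(alpha))
# The error f - p should be orthogonal to every basis function t^k.
residuals = [simpson(lambda t: (f(t) - p(t)) * t ** k, a, b) for k in range(n + 1)]
```

For larger n the monomial basis gives a badly conditioned system, which is one motivation for the orthogonal bases discussed later.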


6.4 Projection Operator

The operator PS : H → S that maps a vector x ∈ H to the orthogonal projection vector y ∈ S is called the projection operator (Figure 15).

It is clear that this operator is linear. Moreover PS · PS = PS, so that a projection operator satisfies the equation P² = P.

Example. Consider the moving average operator M in signal filtering. Obviously M² ≠ M and so it is not a projection.
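The difference between the two operators can be checked on a short signal. In this sketch the moving average uses the window Z_n = (X_n + X_{n−1} + X_{n−2})/3 on a periodic signal, and the projection is onto the constant signals; the test signal, the periodic boundary and the function names are assumptions for illustration.

```python
def moving_average(x, k=2):
    """Z_n = (X_n + X_{n-1} + ... + X_{n-k}) / (k + 1), treating the signal as periodic."""
    N = len(x)
    return [sum(x[(n - j) % N] for j in range(k + 1)) / (k + 1) for n in range(N)]

def project_onto_constants(x):
    """Orthogonal projection onto span{(1, 1, ..., 1)}: replace every sample by the mean."""
    m = sum(x) / len(x)
    return [m] * len(x)

x = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
Mx = moving_average(x)
MMx = moving_average(Mx)                   # applying M twice smooths further
Px = project_onto_constants(x)
PPx = project_onto_constants(Px)           # applying P twice changes nothing
```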

6.5 Orthogonal Sequence

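Let (ui) be a sequence, ui ∈ H. An orthogonal sequence is defined by

ui ⊥ uj, or 〈ui, uj〉 = 0, for all i ≠ j.

Take u1, u2, . . . , uk orthogonal. Then an arbitrary vector x in the space spanned by the ui can be represented as $x = \sum_{i=1}^{k} \alpha_i u_i$. Take the inner product of x with each uj:

\[ \langle x, u_j\rangle = \sum_{i=1}^{k} \alpha_i \langle u_i, u_j\rangle = \alpha_j \langle u_j, u_j\rangle \quad\Rightarrow\quad \alpha_j = \frac{\langle x, u_j\rangle}{\langle u_j, u_j\rangle}. \]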

Let (uj) be an infinite orthogonal sequence. Often one can represent a vector as the sum of an infinite series as follows:

\[ x = \sum_{i=1}^{\infty} \alpha_i u_i. \]

This means that the finite partial sums

\[ x^{(k)} = \sum_{i=1}^{k} \alpha_i u_i \]

converge to x in the norm of the space,

\[ \|x - x^{(k)}\| \to 0 \quad (k \to \infty). \]

One would like to know whether the calculation

\[ \langle x, u_j\rangle = \Big\langle \sum_{i=1}^{\infty} \alpha_i u_i,\ u_j \Big\rangle = \sum_{i=1}^{\infty} \alpha_i \langle u_i, u_j\rangle \]

is valid also in the case of infinite series. For finite sums it is clearly true. To prove this for infinite sums one needs to use the fact that the norm x ↦ ‖x‖ and the inner product (x, y) ↦ 〈x, y〉 are continuous functions, meaning that if xn → x0 and yn → y0 then

\[ \|x_n\| \to \|x_0\| \quad\text{and}\quad \langle x_n, y_n\rangle \to \langle x_0, y_0\rangle. \]

Hence we know that the formula given earlier for the coefficients is also valid for infinite orthogonal series:

\[ \alpha_j = \langle x, u_j\rangle / \langle u_j, u_j\rangle. \]

6.6 Orthonormal Sequence

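A sequence (ui) is orthonormal if

ui ⊥ uj when i ≠ j, and ‖ui‖ = 1 for all i.

An equivalent condition is

\[ \langle u_i, u_j\rangle = \begin{cases} 0, & i \neq j \\ 1, & i = j. \end{cases} \]

The coefficients of the infinite series representation $x = \sum_{i=1}^{\infty} \alpha_i u_i$ are easy to compute:

\[ \langle x, u_j\rangle = \ldots = \alpha_j \langle u_j, u_j\rangle = \alpha_j. \]

We get the Fourier representation

\[ x = \sum_{i=1}^{\infty} \langle x, u_i\rangle u_i. \]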

6.7 Maximal Orthonormal Sequence

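Let H be a Hilbert space and (ui) an orthonormal sequence. If

\[ \overline{\operatorname{span}\{u_i\}} = H \]

then we say that (ui) is a maximal orthonormal sequence (or total orthonormal sequence).

Let x ∈ H be an arbitrary vector. Set $y = \sum_{i=1}^{\infty} \langle x, u_i\rangle u_i$. We can compute as follows:

\[
\begin{aligned}
\langle x - y, u_j\rangle &= \langle x, u_j\rangle - \langle y, u_j\rangle \\
&= \langle x, u_j\rangle - \Big\langle \sum_{i=1}^{\infty} \langle x, u_i\rangle u_i,\ u_j \Big\rangle \\
&= \langle x, u_j\rangle - \sum_{i=1}^{\infty} \langle x, u_i\rangle \langle u_i, u_j\rangle \\
&= \langle x, u_j\rangle - \langle x, u_j\rangle = 0.
\end{aligned}
\]

It follows that for all linear combinations we have

\[ \Big\langle x - y,\ \sum_{i=1}^{k} c_i u_i \Big\rangle = 0. \]

These linear combinations are dense in H, so we have 〈x − y, z〉 = 0 for all z ∈ H.

We remember the property of the inner product noted earlier: 〈v, z〉 = 0 ∀z ∈ H ⇒ v = 0. Hence x = y and we see that the representation $x = \sum_{i=1}^{\infty} \langle x, u_i\rangle u_i$ exists for every x ∈ H.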

6.8 Orthogonal Polynomials

Define an inner product in a Hilbert function space as

\[ \langle u, v\rangle_M = \int_a^b M(t)u(t)v(t)\,dt. \]

The weight function satisfies M(t) > 0, except that at the end points it can also be zero. For many choices of weight function, (L2(a, b), 〈 , 〉M) is a Hilbert space.

Starting from the usual polynomials tⁿ and applying the well-known Gram-Schmidt process one can generate an orthonormal sequence with respect to the inner product 〈 , 〉M:

\[ 1, t, t^2, \cdots, t^n, \cdots \ \xrightarrow{\ \text{Gram-Schmidt}\ }\ \Phi^M_n. \]

For M(t) = 1, t ∈ [−1, 1], we get the Legendre polynomials. For $M(t) = 1/\sqrt{1-t^2}$, t ∈ [−1, 1], we get the Chebyshev polynomials. For $M(t) = (1-t)^{\alpha}(1+t)^{\beta}$ one gets the Jacobi polynomials. The weight function $M(t) = e^{-t}$ on [0, ∞) generates the Laguerre polynomials. Hermite polynomials are orthogonal with respect to the Gaussian weight function $M(t) = e^{-t^2/2}$ on R.

Orthogonal polynomials are used in deriving solutions for PDEs, for instance. In so-called spectral methods one uses orthogonal expansions of basis functions.
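The Gram-Schmidt step can be carried out exactly for the weight M(t) = 1 on [−1, 1]. The sketch below stores polynomials as coefficient lists and, as a simplifying assumption, only orthogonalises (it does not normalise) so that all numbers stay rational; the results are the Legendre polynomials up to a constant factor. The function names are hypothetical.

```python
from fractions import Fraction

def inner(p, q):
    """<p, q> = integral_{-1}^{1} p(t) q(t) dt for coefficient lists (lowest degree first)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                  # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

def gram_schmidt_monomials(n):
    """Orthogonalise 1, t, ..., t^n with respect to inner(); returns monic polynomials."""
    basis = []
    for k in range(n + 1):
        v = [Fraction(0)] * k + [Fraction(1)]     # the monomial t^k
        for u in basis:
            c = inner(v, u) / inner(u, u)         # component of v along u
            padded = u + [Fraction(0)] * (len(v) - len(u))
            v = [vi - c * ui for vi, ui in zip(v, padded)]
        basis.append(v)
    return basis

polys = gram_schmidt_monomials(3)
# polys[2] represents t^2 - 1/3 and polys[3] represents t^3 - (3/5) t
```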

6.9 Orthogonal Subspaces

Let H be a Hilbert space and S ⊂ H a closed subspace. The spaces S and S⊥ are orthogonal subspaces and H = S ⊕ S⊥. This means that for any x ∈ H one can find y ∈ S and z ∈ S⊥ so that x = y + z.

Consider a mutually orthogonal set of closed subspaces M1, M2, · · · , Mn, where Mi ⊥ Mj, i ≠ j. Let Pi be the projection operator onto the subspace Mi.

In case the subspaces span the whole space, we can write

H = M1 ⊕ M2 ⊕ · · · ⊕ Mn.

This is called an orthogonal decomposition of the space H. For each vector x we can write

\[ x = x_1 + x_2 + \cdots + x_n = P_1x + P_2x + \cdots + P_nx, \qquad x_i \perp x_j \ \text{for } i \neq j. \]

This means that the identity operator has been split into a sum of projections

I = P1 + P2 + · · · + Pn.

Example. Let H = L2(−π, π) and Mn = span{cos nt}. It is known from calculus that

\[ \langle \cos nt, \cos mt\rangle = \int_{-\pi}^{\pi} \cos(nt)\cos(mt)\,dt = 0 \quad \text{for all } n \neq m. \]

Hence we have a decomposition

\[ H = \underbrace{M_1 \oplus M_2 \oplus \cdots \oplus M_n}_{H_n} \oplus H_n^{\perp}. \]

7 Measure theory, measure spaces

If A ⊂ R² is a set, we can intuitively define a function µ(A) = "area of A". The definition is clear for a rectangle, or if the set can be approximated by a combination of rectangles. However, it is known that it is not possible to define this concept for arbitrary sets A. Area can be defined only for a collection of measurable sets. The collection of measurable sets has the structure of a sigma algebra.

Let Ω be a set (a universe) and P a collection of subsets A ⊂ Ω. This collection is a sigma algebra if

1. ∅ ∈ P

2. A ∈ P ⇒ Aᶜ ∈ P

3. Ai ∈ P (i = 1, 2, . . .) ⇒ $\bigcup_{i=1}^{\infty} A_i \in P$

A measure is a function defined on a sigma algebra. More exactly, let µ : P → R+ be a function such that

1. µ(∅) = 0

2. $\mu\big(\bigcup_i E_i\big) = \sum_{i=1}^{\infty} \mu(E_i)$ if Ei ∩ Ej = ∅ for i ≠ j

We say that µ is a measure and (Ω, P, µ) is a measure space.

Example: Let (pi) be a sequence of points in R and #(A) = #{i | pi ∈ A}. Here # means the counting function and is also called the "counting measure". If B is any sigma algebra of sets in R, then (R, B, #) is a measure space.

Example: Suppose we have a random experiment, where the range of all possible outcomes is Ω and the collection of possible events is a sigma algebra P. We define the probability measure by p(A) = P{ω ∈ A}. Then (Ω, P, p) is a measure space, also called a probability space.

Example: Let us have an amount of some material (imagine smoke in space or ink on paper, dye dissolved in liquid, electrical charge in material) distributed/dissolved in a space Ω. We define a function

µ(A) = amount of substance in A.

This defines a measure.

The material can be distributed in a continuous cloud, or in pointwise grains (like color/dye). The mass distribution can be a mixture of both types. Electrical charge can be distributed in a mixture consisting of space cloud, surfaces, sharp edges, or points. Singular points may have non-zero measure µ({x}) > 0.

Example: Imagine the following experiment. We have a disc where one tiny section of the perimeter has been cut so that there is a short line segment. The disc is located in a space between two walls where it can roll easily. The disc is perturbed from its start position and rolls back and forth until it stops. Denote by α the angle of rotation from the start position. In this random experiment the stopping angle α has an interesting distribution. What kind?

7.1 Lebesgue Integral

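Let (Ω, P, µ) be a measure space. A simple function s : Ω → R is defined as

\[ s = \sum_{i=1}^{n} c_i \chi_{E_i}, \quad\text{where}\quad \chi_E(t) = \begin{cases} 1 & \text{if } t \in E \\ 0 & \text{otherwise} \end{cases} \]

means the characteristic function of a set E. The integral of a simple function over Ω is defined as

\[ \int_{\Omega} s\,d\mu = \sum_{i=1}^{n} c_i\,\mu(E_i). \]

The integral over a subset A is

\[ \int_{A} s\,d\mu = \sum_{i=1}^{n} c_i\,\mu(E_i \cap A). \]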

The integral of an arbitrary integrable (measurable) function u : Ω → R is defined via a limit process.

Definition. Let (sn) be an increasing sequence of simple functions such that sn → u (pointwise). The limit

\[ \int_A u\,d\mu = \lim_{n} \int_A s_n\,d\mu \]

is called a lower integral.

A function is integrable if a similar approximation by a decreasing sequence from above produces the same limit. Such a function is called Lebesgue integrable. Integrable functions may be non-smooth, with infinitely many discontinuities etc. However, they are very close to nice functions.

If u is integrable and ε > 0, then there exists v ∈ C∞ such that

\[ \int_{-\infty}^{\infty} |u - v|\,d\mu < \varepsilon. \]

The space of Lebesgue integrable functions is defined as

\[ L^p(\Omega) = \Big\{ u : \Omega \to \mathbb{R} \ \Big|\ u \text{ integrable},\ \int_{\Omega} |u|^p\,d\mu < \infty \Big\}. \]

The following expression looks like a norm:

\[ \|u\|_p = \Big[ \int_{\Omega} |u|^p\,d\mu \Big]^{1/p}. \]

However, the norm axiom ‖u‖ = 0 ⇔ u = 0 is not true, because ‖u‖ = 0 ⇔ µ{x | u(x) ≠ 0} = 0. In this case we say that u = 0 almost everywhere.

We identify all functions that are equal almost everywhere. This is done by introducing equivalence classes [u] = {f | f = u almost everywhere}.

Finally we define the space Lp as the space of equivalence classes

\[ L^p(\Omega) = \Big\{ [u] : \int |u|^p\,d\mu < \infty \Big\}. \]

This space is a Banach space.

7.2 Riemann-Stieltjes Integral

Let α : [a, b] → R be a non-decreasing function that is continuous on the right, and f : [a, b] → R a measurable function. We divide the interval [a, b] into subintervals by intermediate points. This is called a partition of the interval,

π = {a = x0 < x1 < x2 < . . . < xn = b}.

We study the following sums taken over all partitions:

\[ \sum_{i=0}^{n-1} f(\xi_i)\,[\alpha(x_{i+1}) - \alpha(x_i)] \to \int_a^b f(x)\,d\alpha(x), \]

where ξi ∈ [xi, xi+1]. If this sum has a limit when the partition is refined so that ‖π‖ = max |xi+1 − xi| → 0, this limit is called the Riemann-Stieltjes integral of the function f(x) with respect to the integrator function α(x).
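The defining sum can be evaluated directly on a fine uniform partition. The sketch below takes ξi as the midpoint of each subinterval and tries two integrators: a smooth one, α(x) = x², and one with a unit jump at x = 0.5. The test functions, the partition size and the function name are assumptions for illustration.

```python
def riemann_stieltjes(f, alpha, a, b, n=10000):
    """Sum f(xi_i) [alpha(x_{i+1}) - alpha(x_i)] on a uniform partition, xi_i = midpoint."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x_left, x_right = a + i * h, a + (i + 1) * h
        total += f(0.5 * (x_left + x_right)) * (alpha(x_right) - alpha(x_left))
    return total

f = lambda x: x
# Smooth integrator: d(alpha) = 2x dx, so the integral equals 2/3.
smooth = riemann_stieltjes(f, lambda x: x * x, 0.0, 1.0)
# Right-continuous integrator with a unit jump at 0.5: the jump contributes f(0.5).
jump = riemann_stieltjes(f, lambda x: x + (1.0 if x >= 0.5 else 0.0), 0.0, 1.0)
```

The second case shows how a single integral formula covers a mixture of continuous and pointwise mass, as in the measure examples earlier.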


8 Linear transformations

Let X, Y be normed spaces and T : X → Y a mapping such that
T(x + y) = Tx + Ty,
T(λx) = λTx.

Such a mapping is called a linear transformation or linear operator.

Let X = C[0, ∞) and Y = C²[0, ∞). Then taking a derivative is a linear operator D : Y → X mapping u ↦ u′. Similarly D² : Y → X mapping u ↦ u″ is a linear operator.

The following is a familiar differential operator, being the left hand side of a well-known circuit equation:

\[ S = LD^2 + RD + \tfrac{1}{c}I. \]

Solving the circuit equation $Lu'' + Ru' + \frac{1}{c}u = v$ is equivalent to solving the operator equation Su = v.

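From the theory of differential equations we know that the solution (with zero initial conditions) is

\[ u(t) = \int_0^t k(t-s)\,v(s)\,ds, \quad\text{where}\quad k(t) = \frac{e^{\lambda_1 t} - e^{\lambda_2 t}}{L(\lambda_1 - \lambda_2)} \]

and λ1, λ2 are the roots of the characteristic equation Lλ² + Rλ + 1/c = 0.

This integral formula u = Tv defines another linear operator T : X → Y. It is clear that S · T = I, showing that S and T are inverse to each other.

Examples: The formula

\[ \mathcal{L}(f)(s) = \int_0^{\infty} e^{-st} f(t)\,dt \]

defines a linear operator f(t) ↦ F(s). This is called the Laplace transform.

The formula

\[ \Phi(f)(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t} f(t)\,dt \]

defines the Fourier transform.

In signal analysis we have the convolution

\[ (f * g)(t) = \int_{-\infty}^{\infty} f(s)\,g(t-s)\,ds, \]

which is also a linear operation.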

A linear partial differential equation (PDE) contains a linear operator acting on functions u = u(x, t).

In the following PDE

\[ \frac{\partial u}{\partial t} - \frac{\partial}{\partial x}\Big[k(x)\frac{\partial u}{\partial x}\Big] = q(x) \]

we see the action of the following linear operators:

u ↦ Dt u

u ↦ k(x)u

u ↦ Dx u and

u ↦ [Dt − Dx[k(x)Dx]] u

Example: The following is a generic formula of an integral equation. In the space X = L2(a, b) define an operator T : X → X by

\[ (Tu)(s) = \int_a^b k(s,t)\,u(t)\,dt. \]

Here k = k(s, t) ∈ L2[(a, b) × (a, b)] is called the kernel of the operator. Note that the transforms mentioned above are actually special cases of this generic formula.

8.1 Bounded Linear Transformation

A linear operator T between normed spaces is called bounded if

‖Tx‖ ≤ K · ‖x‖ for all x,

for some constant K. It is easily seen that a linear mapping T is bounded ⇔ T is continuous.

If xn → x0, then ‖Txn − Tx0‖ = ‖T(xn − x0)‖ ≤ K · ‖xn − x0‖ → 0. The reverse implication is left as an exercise.

Example: Some common operators are not bounded. Consider D : C¹[0, ∞) → C[0, ∞), with the ‖ ‖∞-norm. By studying the functions un(t) = (1/√n) · sin(nt) one can see that D is not a bounded operator.
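The computation behind this example is short and can be written out as a worked equation. The functions tend to zero in the sup-norm while their derivatives grow, so no constant K can satisfy the boundedness inequality.

```latex
\[
\|u_n\|_\infty = \frac{1}{\sqrt{n}} \to 0, \qquad
(Du_n)(t) = \sqrt{n}\,\cos(nt), \qquad
\|Du_n\|_\infty = \sqrt{n} \to \infty ,
\]
\[
\frac{\|Du_n\|_\infty}{\|u_n\|_\infty} = n \quad\text{is unbounded as } n \to \infty .
\]
```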

We call an operator T invertible if T⁻¹ exists and is bounded.

8.2 Norm of an Operator

The norm of an operator T : X → Y is defined as

‖T‖ = sup{‖Tx‖/‖x‖ : x ≠ 0}

or equivalently

‖T‖ = sup{‖Tx‖ : ‖x‖ ≤ 1}.

Then we always have ‖Tx‖ ≤ ‖T‖‖x‖.

The set of operators L(X, Y) = {T | T : X → Y, bounded and linear} equipped with this norm then becomes itself a normed space (L(X, Y), ‖ ‖).

Example: A matrix A ∈ R^{m×n} defines a linear operator A : Rn → Rm via matrix multiplication x ↦ Ax. Specifying norms on both spaces we get an operator between normed spaces, such as A : [Rn, ‖.‖∞] → [Rm, ‖.‖1].

In this case the operator norm ‖A‖∞,1 = sup{‖Ax‖1 | ‖x‖∞ ≤ 1} is called a matrix norm.

Example: When we have A : [Rn, ‖.‖2] → [Rm, ‖.‖2] the matrix norm is known to be

\[ \|A\| = \|A\|_{2,2} = |\lambda_{\max}|^{1/2}, \]

where λmax is the largest eigenvalue of the matrix AᵀA. This is used for instance in the construction of the singular value decomposition (SVD).
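The formula can be checked numerically on a small matrix. The sketch below estimates the largest eigenvalue of AᵀA by power iteration, which is a swapped-in technique chosen here for brevity, not the method of the notes; it assumes the start vector is not orthogonal to the dominant eigenvector. The example matrix and the function name are illustrative.

```python
import math

def spectral_norm(A, iters=200):
    """Estimate ||A||_{2,2} = sqrt(lambda_max(A^T A)) by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    x = [1.0] + [0.5] * (n - 1)              # arbitrary nonzero start vector
    lam = 0.0
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        y = [sum(A[i][j] * Ax[i] for i in range(m)) for j in range(n)]   # y = A^T A x
        lam = math.sqrt(sum(v * v for v in y))   # tends to lambda_max once x is normalised
        x = [v / lam for v in y]
    return math.sqrt(lam)

A = [[3.0, 0.0], [4.0, 5.0]]     # here A^T A = [[25, 20], [20, 25]] with eigenvalues 45 and 5
norm_A = spectral_norm(A)
```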

8.3 Composite operator

Assume that X, Y and Z are normed spaces and we have operators T and S between them:

\[ X \xrightarrow{\ T\ } Y \xrightarrow{\ S\ } Z. \]

Then the composite mapping ST : X → Z is defined by ST(x) = S(T(x)). In this situation it is clear that the following norm inequality holds: ‖ST‖ ≤ ‖S‖‖T‖.

8.4 Exponential Operator Function

When an operator T acts inside the same space, that is T : X → X, we can define the n-th power of the operator, T · T · · · T = Tⁿ. It is clear that then

‖Tⁿ‖ ≤ ‖T‖ⁿ.

If A : X → X is a bounded operator on X, we can define an exponential function exp(A) or e^A of the operator A as the limit of the power series

\[ T_k = I + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{k!}A^k \to e^A. \]

To derive this result on convergence, the following observations are used. The space of operators L(X) = L(X, X) is a Banach space if X is a Banach space. Moreover, for the partial sums Tk we can show that ‖Tk − Tm‖ → 0 as k, m → ∞. Note that for k > m we can calculate

\[ \|T_k - T_m\| = \Big\| \sum_{j=m+1}^{k} \frac{A^j}{j!} \Big\| \le \sum_{j=m+1}^{k} \frac{\|A\|^j}{j!} \to 0, \]

since this is a section of the power series of the real exponential function e^t at t = ‖A‖.
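The partial sums Tk are easy to compute for a matrix. The sketch below uses the 2×2 matrix A = [[0, 1], [−1, 0]], for which A² = −I and therefore e^A = cos(1) I + sin(1) A; the matrix and the helper names are assumptions chosen so that the limit is known in closed form.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def exp_partial_sum(A, k):
    """T_k = I + A + A^2/2! + ... + A^k/k!"""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0 / 0! = I
    total = [row[:] for row in term]
    for m in range(1, k + 1):
        term = [[v / m for v in row] for row in mat_mul(term, A)]  # A^m / m!
        total = [[t + v for t, v in zip(tr, vr)] for tr, vr in zip(total, term)]
    return total

A = [[0.0, 1.0], [-1.0, 0.0]]       # generator of a plane rotation
T20 = exp_partial_sum(A, 20)        # close to [[cos 1, sin 1], [-sin 1, cos 1]]
```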

8.5 Filters in Signal Spaces

In digital electronics, when processing video/audio signals, one is often working with linear operations. In the space X = {s = (xn) | n = −∞, . . . , ∞} we can define a linear map T : X → X as follows: (Tx)n = xn−1. This is called the shift operator. It is a linear mapping in the signal space. What would the n-th power Tⁿ mean? What would the operator αT + βT² do?

Next we define another mapping (Xn) ↦ (Zn) in the signal space. The following operator is called the 'moving average filter':

\[ Z_n = \frac{1}{k+1} \sum_{j=0}^{k} X_{n-j}. \]

It is also a linear map.

In general the formula

\[ Z_n = \sum_{j=0}^{k} a_j X_{n-j} \]

represents an arbitrary filter in the signal space. The coefficients (aj) define the behavior of the filter. By a choice of a suitable coefficient vector the filter may modify the signal by removing (smoothing out) certain features (high-pass, low-pass filters etc.).
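The general filter formula covers both the shift operator and the moving average as special choices of coefficients. The sketch below applies it to a finite signal, assuming samples before the start of the signal are zero; the function name `fir_filter` and the test signal are illustrative.

```python
def fir_filter(x, a):
    """Z_n = sum_j a[j] * X_{n-j}, with X_m taken as 0 for m < 0 (finite signal assumption)."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

x = [1.0, 2.0, 3.0, 4.0]
shifted = fir_filter(x, [0.0, 1.0])               # shift operator T: Z_n = X_{n-1}
shifted_twice = fir_filter(shifted, [0.0, 1.0])   # T applied twice
direct_T2 = fir_filter(x, [0.0, 0.0, 1.0])        # T^2 as one filter: Z_n = X_{n-2}
averaged = fir_filter(x, [0.5, 0.5])              # moving average with k = 1
```

In this picture the power Tⁿ delays the signal by n samples, and αT + βT² is the filter with coefficients (0, α, β).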


8.6 Linear Functional

Let X be a normed space. A linear mapping f : X → R (or into C) is called a linear functional. The space X′ = {f | f is a linear functional} is called the algebraic dual space of X. The subspace X* = {f | f is a continuous linear functional} is called the dual space, or sometimes the topological dual space, of X. The norm in this space is

\[ \|f\| = \sup_{\|x\| \le 1} |f(x)| < \infty. \]

Example: X = l1, x = (xn) ∈ l1. Examples of linear functionals in this space are

\[ f(x) = \sum_{i=1}^{\infty} x_i, \qquad f(x) = x_1 - x_{10}, \qquad f(x) = \sum_{i=1}^{100} x_i. \]

In the function space X = C[a, b] the following are linear functionals:

\[ f(u) = \int_a^b u(t)\,dt, \qquad g(u) = \tfrac{1}{2}[u(a) + u(b)], \qquad h(u) = \int_a^b \varphi(t)u(t)\,dt. \]

Example: Let H be the so-called Haar function [see]. The functions $h_{i,k}(x) = 2^{-i}H(2^{i}(x-k))$ are a basis for an orthonormal wavelet expansion

\[ f(x) = \sum_{i,k} \alpha_{ik}\,h_{i,k}(x). \]

The formula to compute the coefficient αik is, as we know from Fourier theory,

\[ f \mapsto \int_{-\infty}^{\infty} f(x)\,h_{i,k}(x)\,dx. \]

This formula can also be seen as a linear functional on X = L2(R).

Example: Let $X = l^n_p = (\mathbb{R}^n, \|\cdot\|_p)$ and x = (x1, x2, . . . , xn) ∈ X. We define a linear function f : Rn → R as follows:

\[ f(x) = a_1x_1 + a_2x_2 + \ldots + a_nx_n = \langle a, x\rangle, \]

where a = (a1, a2, . . . , an) ∈ Rn. It is clearly a linear functional. A consequence of the Hölder inequality is that if ‖x‖p ≤ 1 then |f(x)| = |〈x, a〉| ≤ ‖a‖q, where 1/p + 1/q = 1. One can prove that in fact

\[ \|f\| = \sup_{\|x\|_p \le 1} |\langle x, a\rangle| = \sup_{\|x\|_p \le 1} |a_1x_1 + a_2x_2 + \ldots + a_nx_n| = \|a\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1. \]

This means that we have identified the elements of the dual, and

\[ (l^n_p)^* = l^n_q. \]

One can also prove that the same holds in the infinite dimensional case (for 1 ≤ p < ∞), that is

\[ (l_p)^* = l_q. \]
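That the supremum is attained, so that ‖f‖ equals ‖a‖q exactly, can be checked with the standard extremal vector x_i = sign(a_i)|a_i|^{q−1} / ‖a‖_q^{q−1}. The sketch below assumes 1 < p < ∞; the vector a, the exponent p and the function names are illustrative.

```python
def norm_p(x, p):
    """The p-norm of a finite vector."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def extremal_vector(a, p):
    """Unit vector in the p-norm at which x -> <a, x> attains ||a||_q (1 < p < infinity)."""
    q = p / (p - 1.0)                      # conjugate exponent, 1/p + 1/q = 1
    scale = norm_p(a, q) ** (q - 1.0)
    return [(1 if v >= 0 else -1) * abs(v) ** (q - 1.0) / scale for v in a]

a, p = [1.0, -2.0, 3.0], 3.0
q = p / (p - 1.0)
x = extremal_vector(a, p)
value = sum(ai * xi for ai, xi in zip(a, x))   # f(x) = <a, x>
```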

Example: The case of the function space X = Lp is similar and very important. One can show that in the space X = Lp(a, b), 1 ≤ p < ∞, continuous linear functionals take the form

\[ F(u) = \int_a^b u(t)\,v(t)\,dt = \langle u, v\rangle, \]

where v ∈ Lq(a, b), and so we can say that

\[ [L^p(a,b)]^* = L^q(a,b). \]

To identify the dual space of a given normed space is not trivial. In the space of continuous functions C[a, b] the following formula defines a linear functional:

\[ f(u) = \int_a^b u(t)\,dF(t), \]

where F(t) is a function of bounded variation and the integral is of Riemann-Stieltjes type. Let us define NBV[a, b] to be the space of BV-functions F(t) normalized as F(a) = 0. One can show that all continuous functionals on C[a, b] are of this type and so

C*[a, b] = NBV[a, b].


8.7 Duality, weak solutions

If X and Y are normed spaces and A : X → Y is a linear operator, then wecan define a dual or adjoint operator A∗ : Y ∗ → X∗ between the dual spacesby formula

A∗g(x) = g(Ax) for all g ∈ Y ∗.

The notation 〈x, f〉 is often used instead of f(x) when speaking of linearfunctionals. Hence the definition of the dual operator can also be written

〈x,A∗g〉 = 〈Ax, g〉 .

A sequence xn in a normed space X is said to be weakly convergentto a limit x0, if

〈xn − x0, a〉 → 0 for all a ∈ X∗.

Convergence in norm implies weak convergence but not vice versa. As a consequence of a well-known theorem (Hahn-Banach) we know that for any vector x ∈ X we have

‖x‖ = sup{|〈x, a〉| : a ∈ X*, ‖a‖ ≤ 1}.

A consequence of this is that if for a vector x we have 〈x, a〉 = 0 for all linear functionals a ∈ X*, then necessarily x = 0. In fact we know that if we have a dense subset S ⊂ X* then

〈x, a〉 = 0 for all a ∈ S ⇒ x = 0.

This fact is used in the derivation of so called weak solutions. Assumethat we are studying an equation

Lx = b,

where L : X → Y is a linear operator. If D is a subspace of the dual Y* and

〈Lx − b, a〉 = 0 for all a ∈ D,

such a vector x is sometimes called a weak solution of the equation. In case the subset D is dense we have a real solution.

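Example. Let us look at the Poisson PDE

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = b(x,y) \]

defined on an open set Ω ⊂ R². Let us consider the space

\[ C_0^{\infty}(\Omega) = \{ \varphi(x,y) : \varphi \text{ is smooth and has compact support} \subset \Omega \}, \]

often called test functions. The derivation of the weak solutions of the equation is now done as follows. For every test function φ we require

\[ \iint_{\Omega} \Big[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} - b(x,y) \Big] \varphi(x,y)\,dx\,dy = 0. \]

Applying Green's theorem this can be written

\[ \iint_{\Omega} \Big[ \frac{\partial u}{\partial x}\frac{\partial \varphi}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial \varphi}{\partial y} + b(x,y)\varphi(x,y) \Big]\,dx\,dy = 0, \]

or in shorter notation

\[ \iint_{\Omega} [\,u_x\varphi_x + u_y\varphi_y + b\varphi\,]\,dx\,dy = 0. \]

Solutions of this equation are called weak solutions of the Poisson equation. Such a technique is central in the Galerkin method for finite element solutions of PDEs.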

8.8 Some classes of operators

Some special types of linear operators appear in various contexts. An operator A : X → Y is called finite dimensional if the image A(X) ⊂ Y is finite dimensional. The operator is called normal if A*A = AA*. The operator is called unitary if A*A = AA* = I. An operator is nilpotent in case Aᵏ = 0 for some power k. An operator is compact if the image of the unit ball {x : ‖x‖ ≤ 1} is contained in a compact set.

Example. Let X = C[0, 1] and I be a set of interpolation pointsz1, z2, .., zk ∈ [0, 1]. Define an operator

T : C[0, 1]→ P k[0, 1] ⊂ C∞[0, 1]

which maps a function f into the interpolation polynomial

Tf = pI(f).

This operator is clearly finite dimensional, dimT (X) = k.


Compact operators are very close to finite dimensional operators and this determines their behavior. The following is true, at least in Hilbert spaces. An operator

T : X → Y

is compact if and only if there is a sequence of finite dimensional operators Tn : X → Y such that

‖Tn − T‖ → 0.

In a general Banach space the "if" part still holds, but the "only if" part can fail.

Example: Consider the integral operator

\[ (Tu)(s) = \int_a^b k(s,t)\,u(t)\,dt \]

on the space L1[a, b], for instance. Assume that the kernel function k(s, t) can be approximated by a sequence of simple functions

\[ k_n(s,t) = \sum_{k} c_k\,\chi_{A_k}(s,t). \]

Define the operator Tn as follows:

\[ (T_n u)(s) = \int_a^b k_n(s,t)\,u(t)\,dt. \]

It is clear that these are finite dimensional operators and

‖T − Tn‖ → 0.

So this generic integral operator formula, with some mild assumptions on k(s, t), generates a compact operator.

The possibility to approximate a compact operator by finite dimensional operators is used in many contexts. One can show for instance that the range T(X) of a compact operator must always be separable, that the adjoint operator T* is also compact etc. An interesting example in Hilbert space is the operator

\[ Ax = \sum_{i} \lambda_i \langle x, e_i\rangle e_i, \]

where (ei) is a sequence of orthonormal vectors. One can show that this operator is compact if and only if λi → 0. All compact self-adjoint operators in a Hilbert space have such a representation.

8.9 Eigenvalues

Recall that for a matrix A ∈ R^{n×n} an eigenvalue λ is a number which solves the equation Ax = λx, or (A − λI)x = 0, for some vector x ≠ 0. This is equivalent to saying that the linear operator (A − λI) has a nontrivial null space, or that (A − λI) is non-invertible.

The same idea can be transferred to an arbitrary normed space X. If T : X → X is a linear operator on X, then λ is an eigenvalue of T ⇔ Tx = λx for some x ≠ 0. This means also that the linear operator T − λI does not have an inverse operator. In normed spaces one usually distinguishes the cases

T − λI has an inverse operator

T − λI has an inverse operator and it is bounded.

The set

{λ : T − λI does not have a bounded inverse operator}

is called the spectrum of the operator T.

Study of eigenvectors and eigenvalues of linear operators is called spectral theory. Eigenvalues and eigenvectors appear in many contexts, including spectral methods in solving PDEs, integral equations etc.

Example: The integral equation

\[ \lambda u(t) - \int_a^b k(t,s)\,u(s)\,ds = v(t) \]

can be written as λu − Tu = v, where $(Tu)(t) = \int_a^b k(t,s)\,u(s)\,ds$. Here we see the familiar structure (λI − T)u = v.

Example: This equation represents a model of forced vibration

u′′(t) + λu(t) = v(t) with u(0) = u(1) = 0.

Using the notation with differential operator D this reads(D2 + λI)u = v . Eigenvalues in this case would reveal the typical resonantfrequencies (eigenfrequency) and the eigenvectors would be the correspond-ing function shapes uk(t), often called vibration-eigenmodes in engineering.

Example: The following is an equation of a vibrating membrane:

λu − ∇²u = f in Ω, where Ω ⊂ R², and u = 0 on ∂Ω.

This can be written (λI − ∇²)u = f. This model has been used in a medical context to study the dynamics of a heart valve with the intention to diagnose flawed structure or malfunction from the measured electrocardiogram.

Example : The following is a simple heat equation in 1D.ut − kuxx = 0. Boundary cohnditions are given as

u(−1, t) = 0, u(1, t) = 0, u(x, 0) = f(x)

The usual method of separating variables will illustrate the idea of eigen-vector method in solving PDE:s. Let us look for solutions of the followingtype u(x, t) = M(x) · L(t) . By differentiating this expression (twice on x)and separating x and t on different sides of the equation we will get

$\frac{L'(t)}{L(t)} = k\,\frac{M''(x)}{M(x)} = -\lambda$

The last equality holds because the left side depends only on t and the right side only on x, so both must equal a constant, which we denote by −λ. This equation splits into two separate ODEs

kM″(x) = −λM(x)

L′(t) = −λL(t)

The equation for M(x) has the characteristic equation −ka² = λ with roots a = ±i√(λ/k), so the solutions are

$M(x) = A\cos\left(\sqrt{\lambda/k}\,x\right) + B\sin\left(\sqrt{\lambda/k}\,x\right).$

The boundary conditions require that M(−1) = M(1) = 0, and so only certain values of λ are admissible. We have

λ_n = k(nπ/2)², n = 1, 2, . . . .

These admissible values of λ are the eigenvalues of this problem and the corresponding eigenmodes are

M_n(x) = A cos(nπx/2) + B sin(nπx/2),

where the boundary conditions force B = 0 for odd n and A = 0 for even n.

When the values λ_n are inserted into the L-equation we obtain the solutions L_n(t). The solution of the original problem is finally sought as a series representation

$u(x,t) = \sum_{n=1}^{\infty} \alpha_n M_n(x) L_n(t)$
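The series can be evaluated numerically. The following is only a sketch under stated assumptions: the eigenmodes are taken as cos(nπx/2) for odd n and sin(nπx/2) for even n (these vanish at x = ±1 and have unit L²-norm on [−1, 1]), the time factor is taken as the decaying solution L_n(t) = exp(−λ_n t), and the coefficients α_n = ⟨f, M_n⟩ are approximated by the midpoint rule. The function names and the truncation parameters are illustrative choices, not part of the notes.

```python
import math

def eigenmode(n, x):
    """M_n(x) on [-1, 1]: cosine for odd n, sine for even n, so M_n(-1) = M_n(1) = 0."""
    arg = n * math.pi * x / 2
    return math.cos(arg) if n % 2 == 1 else math.sin(arg)

def heat_series(f, k, x, t, n_terms=20, n_quad=400):
    """Truncated series u(x, t) = sum_n alpha_n M_n(x) exp(-lambda_n t)."""
    h = 2.0 / n_quad
    nodes = [-1.0 + (j + 0.5) * h for j in range(n_quad)]  # midpoint rule nodes
    total = 0.0
    for n in range(1, n_terms + 1):
        # alpha_n = <f, M_n>, since <M_n, M_n> = 1 on [-1, 1]
        alpha = sum(f(s) * eigenmode(n, s) for s in nodes) * h
        lam = k * (n * math.pi / 2) ** 2  # eigenvalue lambda_n
        total += alpha * eigenmode(n, x) * math.exp(-lam * t)
    return total
```

If the initial profile f is itself the first eigenmode cos(πx/2), only the term n = 1 survives and the series reduces to cos(πx/2) · exp(−kπ²t/4), which gives a simple check of the code.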


9 Operators in Hilbert Space

In a Hilbert space H the formula x → ⟨x, a⟩ defines a continuous linear functional for each a ∈ H. A very important theorem of Riesz states that these are the only continuous linear functionals on H and that the norm of this functional is ‖a‖. This means that the dual space of H can be identified with H, that is H = H*. As an example (l²)* = l².

9.1 Self Adjoint Operator

If T : H → H is an operator on H, the adjoint operator T* : H → H is defined by the formula

⟨T*x, y⟩ = ⟨x, Ty⟩

If V and W are any normed spaces and T : V → W is a linear operator, the corresponding formula defines an adjoint operator T* : W* → V* between the dual spaces. However, in a Hilbert space H = H*, so the dual operator acts on the space itself.

In the case of Rⁿ this reduces to a familiar notion. The inner product is now ⟨x, y⟩ = xᵀy. If H = Rⁿ and A ∈ R^{n×n}, the adjoint of the operator x → Ax will be simply x → Aᵀx, that is, taking the adjoint equals taking the transpose.

The following condition defines a self adjoint operator

〈Tx, y〉 = 〈x, Ty〉 .

This means that T = T*. In the case of a matrix A ∈ R^{n×n} this condition means a symmetric matrix, as the following calculation shows

⟨Ax, y⟩ = ⟨x, Ay⟩, ∀x, y

(Ax)ᵀy = xᵀ(Ay),

xᵀAᵀy = xᵀAy

Aᵀ = A

Self adjoint operators have many useful properties. The eigenvectors corresponding to distinct eigenvalues are always orthogonal. Assume T : H → H is self adjoint, λ ≠ µ and

Tu = λu, Tv = µv

Then we can calculate

λ⟨u, v⟩ = ⟨λu, v⟩ = ⟨Tu, v⟩ = ⟨u, Tv⟩ = ⟨u, µv⟩ = µ⟨u, v⟩

Since λ ≠ µ we must have ⟨u, v⟩ = 0, which means that u ⊥ v.

9.2 Example

Finding the adjoint of a given operator requires careful computing. As an example consider an operator A : L²(R) → L²(R) defined as

u(t) → a(t)u(t + 1),

where a(t) is a bounded integrable function. Find the adjoint operator A*. Is the operator A self adjoint?

9.3 Spectral representation of an operator

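Let {u_n} be an orthonormal sequence in H and {α_n} a square summable sequence of real scalars,

$\sum_{i=1}^{\infty} |\alpha_i|^2 < \infty.$

Define an operator T : H → H by the formula

$Tx = \sum_{i=1}^{\infty} \alpha_i \langle x, u_i \rangle u_i$

Then $Tu_k = \sum_{i=1}^{\infty} \alpha_i \langle u_k, u_i \rangle u_i = \alpha_k u_k$, so each u_k is an eigenvector and α_k is an eigenvalue. Moreover the following calculation shows that the operator is self adjoint.

$\langle Tx, y \rangle = \left\langle \sum_{i=1}^{\infty} \alpha_i \langle x, u_i \rangle u_i,\; y \right\rangle = \sum_{i=1}^{\infty} \alpha_i \langle x, u_i \rangle \langle u_i, y \rangle$, and similarly

$\langle x, Ty \rangle = \sum_{i=1}^{\infty} \alpha_i \langle y, u_i \rangle \langle u_i, x \rangle$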


Let us study a case where T : H → H is a self adjoint operator with finitely many eigenvalues λ1, λ2, . . . , λn. For each λ we can define a set

H(λ) = {x ∈ H | Tx = λx}.

This is always a subspace and it is called an eigenspace. It is clear that dim H(λ) > 0 ⇐⇒ λ = λi for some i. Due to the remark above, H(λi) ⊥ H(λj) whenever λi ≠ λj.

Let us assume that the eigenspaces span the whole space (this happens when the operator has enough eigenvectors). Then we can write

H = H(λ1) ⊕ H(λ2) ⊕ . . . ⊕ H(λn).

Let Pi : H → H(λi) be the natural orthogonal projection onto the eigenspace H(λi). Then for each x we can derive the following decomposition

x = P1x+ P2x+ . . .+ Pnx

Tx = T (P1x+ P2x+ . . .+ Pnx)

Tx = T (P1x) + T (P2x) + . . .+ T (Pnx)

Tx = λ1P1x+ λ2P2x+ . . .+ λnPnx which means that

T = λ1P1 + λ2P2 + . . .+ λnPn

The operator has been decomposed into a combination of projections.

9.4 Spectral decomposition on matrices

In the finite dimensional case we have H = Rⁿ and a symmetric matrix A ∈ R^{n×n}, Aᵀ = A. This matrix has orthonormal eigenvectors u1, u2, . . . , un. Let the eigenvalues be λ1, λ2, . . . , λn.

Let us form a matrix of the eigenvectors P = [u1, u2, . . . , un]. Due to well known facts in the theory of matrices this P is orthogonal, PᵀP = I (why?), and A = PDPᵀ, where D = diag(λ1, . . . , λn) is the diagonal matrix of eigenvalues. From columnwise matrix multiplication in

A = PDPᵀ = [u1, u2, . . . , un] D [u1, u2, . . . , un]ᵀ

one gets

$A = \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T + \ldots + \lambda_n u_n u_n^T$

This is the spectral decomposition of a symmetric matrix.
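For a 2 × 2 symmetric matrix the decomposition can be checked by hand or with a few lines of code. The sketch below is an illustration only: it finds the eigenvectors with a single Jacobi rotation (a standard technique, swapped in here for convenience and not taken from the notes) and then rebuilds the matrix as the sum of the rank-one terms λ_i u_i u_iᵀ. The function names are hypothetical.

```python
import math

def spectral_decomposition_2x2(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]] via one Jacobi rotation."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # rotation angle that removes the off-diagonal entry
    cs, sn = math.cos(theta), math.sin(theta)
    u1, u2 = (cs, sn), (-sn, cs)              # orthonormal eigenvectors
    lam1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn  # u1^T A u1
    lam2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs  # u2^T A u2
    return (lam1, u1), (lam2, u2)

def rebuild(pairs):
    """Sum of lambda_i * u_i u_i^T over the given eigenpairs."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for lam, u in pairs:
        for i in range(2):
            for j in range(2):
                m[i][j] += lam * u[i] * u[j]
    return m
```

For the matrix with rows (2, 1) and (1, 2) this yields the eigenvalues 3 and 1 with eigenvectors (1, 1)/√2 and (−1, 1)/√2, and `rebuild` returns the original matrix up to rounding.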


10 Fourier Analysis

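Classical Fourier analysis is an example of orthonormal decomposition, basis functions etc. Let us consider the space H = L²(−π, π).

The functions e_k(t) = e^{ikt} form an orthogonal system in this space. One can show that

$\langle e_k, e_m \rangle = \int_{-\pi}^{\pi} e^{ikt} e^{-imt}\,dt = 0$ for all k ≠ m,

while ⟨e_k, e_k⟩ = 2π, so the normalized functions e_k/√(2π) form an orthonormal system.

Given a function f(t) defined on the interval [−π, π] we want to represent the function using this basis

$f = \sum_k \frac{\langle f, e_k \rangle}{\langle e_k, e_k \rangle}\, e_k = \sum_k \alpha_k e^{ikt}$

If such a decomposition exists, we know that due to orthogonality

$\alpha_k = \frac{1}{2\pi} \langle f, e_k \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-ikt}\,dt$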

In case the function f(t) does not admit the above decomposition, we know that the series nevertheless gives the best approximation, the orthogonal projection of the function f(t) on the subspace spanned by the functions e^{ikt}.
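The coefficients and the projection onto finitely many harmonics can be approximated numerically. The sketch below assumes the normalization α_k = (1/2π) ∫ f(t) e^{−ikt} dt over (−π, π), so that f = Σ α_k e^{ikt}, and it replaces the integral by the midpoint rule. The function names and the number of quadrature points are illustrative assumptions.

```python
import cmath
import math

def fourier_coefficient(f, k, n_quad=2000):
    """alpha_k = (1 / 2 pi) * integral over (-pi, pi) of f(t) exp(-i k t) dt, midpoint rule."""
    h = 2 * math.pi / n_quad
    total = 0j
    for j in range(n_quad):
        t = -math.pi + (j + 0.5) * h
        total += f(t) * cmath.exp(-1j * k * t)
    return total * h / (2 * math.pi)

def partial_sum(f, t, n_max):
    """Orthogonal projection of f onto span{exp(i k t) : |k| <= n_max}, evaluated at t."""
    return sum(fourier_coefficient(f, k) * cmath.exp(1j * k * t)
               for k in range(-n_max, n_max + 1))
```

For f(t) = cos(3t) the only non-zero coefficients are α_3 = α_{−3} = 1/2, so any partial sum with n_max ≥ 3 reproduces f.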

10.1 Fourier Transform

Let f(t) be a function defined on the whole real axis (−∞, ∞). We generate the Fourier series representation of f(t) on an interval [−T/2, T/2]. Define ω = 2π/T = ω(T). The functions e^{inωt} are now an orthogonal basis in the Hilbert space L²[−T/2, T/2] and we can generate the Fourier decomposition

$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{in\omega t} = \sum_{n=-\infty}^{\infty} c_n(T)\, e^{in\omega t}$

The coefficients are

$c_n(T) = \frac{\omega}{2\pi} \int_{-T/2}^{T/2} f(t)\, e^{-in\omega t}\,dt$

This can be written as

$f(t) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in\omega t}\, G(in\omega)\, \omega,$

where

$G(in\omega) = \int_{-T/2}^{T/2} f(t)\, e^{-in\omega t}\,dt.$

When T → ∞, this series will approach the integral

$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega t} F(\omega)\,d\omega,$

where $F(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t} f(t)\,dt$ is called the Fourier transform of f(t). The preceding integral defines the inverse Fourier transform F(ω) → f(t). The Fourier transform of f is often denoted also by the symbol $\hat{f}$.

11 Time/frequency localization, Wavelets

The Fourier transform is meant to reveal how a function is composed as a mixture of frequencies. In the case of Fourier series on a finite interval one obtains a denumerable decomposition into harmonic components. In the case of the Fourier transform we have a continuum of harmonics mingled together and the function is assumed to be defined on the whole real axis.

One would often like to analyse the frequency composition of a function that changes its behavior in time. We may have a signal which exhibits local variations in time. Think of a voice/audio signal with transient vibrations, seismic signals, syllables in speech, chirps in natural sounds etc.

The Fourier transform is not able to detect local passing structures in the signal, such as localising the time when a sound is uttered. Fourier analysis is about the global behavior of the function. Some examples are illustrated below. The Fourier transform of a monochromatic sine wave function is concentrated at one point, the single frequency ω. On the other hand, if one has an infinitely sharp pulse (the so called delta function) at time point t0, the Fourier transform has constant magnitude over the entire spectrum ω ∈ (−∞, ∞). If we have a rectangular pulse on the interval [−T, T], the magnitude of the Fourier transform does not reveal in any way where this pulse happened. Any translate of the pulse would give the same magnitude; only the phase changes.

[Figure: three signals f (sine wave of frequency ω, delta pulse at t0, rectangular pulse on [−T, T]) and their Fourier transforms F.]

11.1 Window function

To analyse time-localised (or, in image analysis, spatially localised) features in a signal, one solution is the windowed Fourier transform. The idea is to cut out a piece of the signal by multiplying it by a local function φ(t − s) and computing the Fourier transform of the windowed function

f → φf → F[φf] (2)

When the window is moved around by shifting it, φ(t − s), one can analyze the local frequency content everywhere. One possible window function is the rectangle function. However, the sharp edges will cause undesirable ripples in the transform. To overcome this, smoother window functions have been proposed. A famous one is

$\varphi(t) = \frac{1}{2\sqrt{\pi\alpha}}\, e^{-t^2/(4\alpha)}$

This is the well known Gaussian function, and used as a window it is called the Gabor filter. The windowed Fourier transform thus generated is called the Gabor transform.

11.2 Continuous wavelet transform CWT

We study next a linear filter called the continuous wavelet transform. The basis is a function ψ ∈ L²(R) which satisfies the condition

$\int_{-\infty}^{\infty} \psi(t)\,dt = 0$

The function ψ will be called the "mother wavelet". The specific properties of this function will appear later.

The function ψ generates a transformation W : L²(R) → L²(R²), mapping f(t) → W(a, b), where

$W(a,b) = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) dt = \int_{-\infty}^{\infty} f(t)\, \psi_{a,b}(t)\,dt = \langle f, \psi_{a,b} \rangle$

The notation

$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right)$

simply means the translated and dilated versions of the base function (mother) ψ, with scale a > 0. The coefficient in front is a normalizing constant giving

$\int_{-\infty}^{\infty} |\psi_{a,b}|^2\,dt = \int_{-\infty}^{\infty} |\psi|^2\,dt = 1$

Note the familiar inner product, which can also be read as a convolution,

W(a, b) = ⟨f, ψ_{a,b}⟩ = (f ∗ ψ̃_a)(b), where ψ̃_a(t) = ψ_{a,0}(−t).

Note also that the CWT transforms the signal f(t) (a function of one variable) into an image, which is a function of two variables W(a, b).
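A single value W(a, b) can be approximated directly from the defining integral. The following sketch assumes the Haar function as mother wavelet (+1 on [0, 1/2), −1 on [1/2, 1), zero elsewhere) and uses the midpoint rule on the support of ψ_{a,b}. The function names and the quadrature size are illustrative assumptions.

```python
import math

def haar(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def cwt(f, a, b, psi=haar, support=(0.0, 1.0), n_quad=2000):
    """W(a, b) = integral of f(t) * a**-0.5 * psi((t - b) / a) dt, midpoint rule."""
    lo, hi = b + a * support[0], b + a * support[1]  # interval where psi_{a,b} is non-zero
    h = (hi - lo) / n_quad
    total = 0.0
    for j in range(n_quad):
        t = lo + (j + 0.5) * h
        total += f(t) * psi((t - b) / a)
    return total * h / math.sqrt(a)
```

Because the wavelet has zero mean, a constant signal gives W(a, b) = 0 for every a and b, while a jump in the signal produces a non-zero response only for those (a, b) whose wavelet support covers the jump. This is the localization property that the Fourier transform lacks.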

Possible mother wavelets are the so called Haar wavelet function

$H(t) = \chi_{[0,1/2]}(t) - \chi_{[1/2,1]}(t)$

or the Mexican hat, which has the form of the function

$(1 - t^2/\sigma^2)\, \exp\left[-t^2/(2\sigma^2)\right].$

The Morlet wavelet has the shape

$e^{-t^2} \cos(\alpha t),$

where the constant is $\alpha = \pi\sqrt{2/\ln 2}$.

Example: The following example will describe the nature of the CWT. Select as the mother wavelet for instance the Haar wavelet or the Mexican hat. Define

f1(t) = sin(πt100), if 0 ≤ t ≤ 10;

f2(t) = sin(πt200), if 5 ≤ t ≤ 15,

and set f = f1 + f2. The computation creates a two-variable function W(a, b) depicted in the following image.

11.3 Redundancy of CWT

Many features of the Fourier transform are also true for the CWT. The transform

f(t) ↔ W(a, τ)

is reciprocal. One can reconstruct the function f(t) from the wavelet transform W(a, τ) by an inverse transform. For the formula f ← W see []. An important feature is that the transform W(a, b) contains a lot of redundant information. One can reconstruct the function f(t) knowing the value of the CWT only at a denumerable discrete set of points.

This means that instead of all dilations and translations

$\psi\!\left(\frac{t-\tau}{a}\right)$

we only need to use integer translates τ = l and the dyadic dilations a = 2^k. In this way we arrive at the discrete WT.

11.4 Discrete wavelet transform DWT

The discrete version of the wavelet transform is now

$f(t) = \sum_k \sum_l d(k,l)\, 2^{-k/2}\, \psi\!\left(2^{-k} t - l\right)$

The wavelet transform, when successfully computed, decomposes the signal into components that dissect it into pieces of information. The dilated versions

$\psi\!\left(2^{-k} t\right)$, −∞ < t < ∞ and k ∈ Z

are able to detect structures of the signal at different scales. The translates represent the localization of the events (in time or space).

The discovery and choice of the mother wavelet function ψ(t) is in general the crucial and exciting part of wavelet theory. The powerful properties depend on the choice.

11.5 Haar MRA

A simple example of a wavelet is the Haar wavelet

$H(t) = \chi_{[0,1/2]}(t) - \chi_{[1/2,1]}(t)$

This is conceptually clear but not very useful in applications. Due to its simplicity it can be used to explain the main ideas.

The purpose of wavelet analysis with a chosen mother wavelet is to generate a splitting of the signal into approximations which represent the information of the signal on different scales. Such a decomposition is called Multiresolution Analysis (MRA).

The simplest example is the following, called the Haar MRA. We study a function f(t) defined on the real axis. We define a set of functions f_k(t) which are constants (= average of f) on the dyadic intervals [2^k l, 2^k (l + 1)], l ∈ Z:

$f_0(t) = \int_l^{l+1} f(\tau)\,d\tau$, if l ≤ t ≤ l + 1;

$f_1(t) = \frac{1}{2} \int_{2l}^{2l+2} f(\tau)\,d\tau$, if 2l ≤ t ≤ 2l + 2;

$f_{-1}(t) = \frac{1}{1/2} \int_{l/2}^{(l+1)/2} f(\tau)\,d\tau$, if l/2 ≤ t ≤ l/2 + 1/2.

Define a set of functions

V_k = {f ∈ L²(R) | f ≡ constant on 2^k l ≤ t < 2^k (l + 1) for every l ∈ Z}.

This is a subspace of the Hilbert space L², or V_k ⊂ L². In fact we have generated a chain of nested subspaces

. . . ⊂ V_2 ⊂ V_1 ⊂ V_0 ⊂ V_{−1} ⊂ . . . ⊂ V_{−k} ⊂ . . . ⊂ L²(R)

It is natural to define

V_∞ = {0} and V_{−∞} = L²(R)
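The averaging that defines the Haar MRA has a simple discrete analogue. In the sketch below the signal is assumed to be given as a list of samples on the finest grid, and level k replaces each block of 2^k consecutive samples by its average; this is a discretized illustration of the functions f_k and not the continuous construction itself. The function name is hypothetical.

```python
def haar_average(samples, level):
    """Replace each block of 2**level consecutive samples by the block average."""
    size = 2 ** level
    if len(samples) % size:
        raise ValueError("number of samples must be a multiple of 2**level")
    out = []
    for start in range(0, len(samples), size):
        block = samples[start:start + size]
        out.extend([sum(block) / size] * size)  # piecewise constant on the block
    return out
```

The nesting of the subspaces shows up as follows: averaging first at level 1 and then at level 2 gives the same result as averaging at level 2 directly, since a function that is constant on the coarse blocks is also constant on the fine ones.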


12 Calculus in Banach Space

Next we discuss the integration of functions with values in a Banach space X. Let (Ω, Σ, µ) be a measure space and u : Ω → X a measurable function.

Example: We have a corrosive messy liquid in a tube with a profile of acidity varying in x and t. Let Ω = [0, T] be the time period of the experiment. Let E = L¹[0, 1] be the space of acid concentration profiles in the tube situated at x ∈ [0, 1]. The acidity profile f(x, t) is changing in the x-direction and pulsating randomly in time t. Here (Σ, µ) models the probability structure regarding how often and how long an acidity profile f(x, t) persists when randomly hopping in E = L¹[0, 1].

In this way we have defined a function u : [0, T] → L¹[0, 1] which maps t → u(x, t). In the following we explain what we mean by the integral

$\int_\Omega u\,d\mu = \int_0^T u\,d\mu,$

which is supposed to represent the accumulated total corrosive effect on the pipe along the distance x.

12.1 Bochner integral

The following function u : Ω → X is called a simple function

$u(\omega) = \sum_{i=1}^{n} u_i\, \chi_{E_i}(\omega).$

Here u_i ∈ X and the sets E_i ⊂ Ω are disjoint measurable sets. We define the integral of the function u as follows

$\int_\Omega u\,d\mu = \sum_{i=1}^{n} u_i\, \mu(E_i).$

For an arbitrary measurable subset A ⊂ Ω the integral is defined as

$\int_A u\,d\mu = \sum_{i=1}^{n} u_i\, \mu(E_i \cap A).$

Let u : Ω → X be an arbitrary function on Ω and A ⊂ Ω. We assume that there is a sequence of approximating simple functions u_n : Ω → X such that

‖u_n(ω) − u(ω)‖ → 0, for all ω.

Now if $\int_\Omega \|u\|\,d\mu < \infty$, the integrals of the simple functions

$\int_A u_n\,d\mu$

will then become a Cauchy sequence in X. The limit is called the Bochner integral of u(ω)

$\int_A u\,d\mu = \lim_{n} \int_A u_n\,d\mu.$

12.2 Gateaux derivative

Let X be a vector space, Y a normed space and u : X → Y a function. If x ∈ X and η ∈ X, we define

$Du(x)\cdot\eta = \lim_{\alpha \to 0} \frac{u(x + \alpha\eta) - u(x)}{\alpha}.$

If this limit exists, it is called the directional derivative of u at the point x in the direction η. The directional derivative is a function Du(x) : X → Y (defined maybe only in some directions).

The following fundamental fact has important applications in optimization, especially optimal control and optimal shape design.

Theorem 12.1. Assume that f is a function f : X → R, x is an extremal point of this function and the directional derivative Df(x) exists. Then Df(x) = 0.

Assume that we need to find a function shape y(t) so that it minimizes a given utility/cost functional. The cost functional may represent costs of manufacturing, use of raw material, penalty due to technical performance etc. In optimal control we may be interested in optimizing cash flow, minimizing time to finish etc.

Example: We need to find a function shape y(t) on the interval [0, 1] so that it minimizes the functional F(y) given below.

$F : C[0,1] \to \mathbb{R}, \quad F(y) = \int_0^1 \left[ t\,y(t) + y(t)^2 \right] dt$

We compute the Gateaux derivative of this functional at the point y(t) ∈ C[0, 1] in the direction η = η(t) ∈ C[0, 1].

$F(y + \alpha\eta) - F(y) = \int_0^1 \left\{ t\,[y(t) + \alpha\eta(t)] + [y(t) + \alpha\eta(t)]^2 \right\} dt - \int_0^1 \left[ t\,y(t) + y(t)^2 \right] dt$

From this one can easily derive the limit (Exercise!)

$DF(y)\eta = \int_0^1 (t + 2y(t))\,\eta(t)\,dt$

If there is a minimum for the functional F(y(t)) it should satisfy

DF(y)η = 0 ∀η

In particular, by setting η = t + 2y(t) we get

$DF(y)\eta = \int_0^1 (t + 2y(t))^2\,dt = 0$

This is possible only if t + 2y(t) = 0, so we have shown that the candidate for an optimal shape is

y(t) = −t/2.
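The formula for the Gateaux derivative can be compared with the difference quotient from the definition. The sketch below is a numerical check only: integrals over [0, 1] are approximated by the midpoint rule, and the step size α = 10⁻⁶ as well as the function names are illustrative assumptions.

```python
import math

def integrate(g, n_quad=2000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n_quad
    return sum(g((j + 0.5) * h) for j in range(n_quad)) * h

def F(y):
    """Cost functional F(y) = integral over [0, 1] of t*y(t) + y(t)**2."""
    return integrate(lambda t: t * y(t) + y(t) ** 2)

def gateaux_numeric(y, eta, alpha=1e-6):
    """Difference quotient (F(y + alpha*eta) - F(y)) / alpha."""
    return (F(lambda t: y(t) + alpha * eta(t)) - F(y)) / alpha

def gateaux_formula(y, eta):
    """Closed form DF(y)eta = integral over [0, 1] of (t + 2*y(t)) * eta(t)."""
    return integrate(lambda t: (t + 2 * y(t)) * eta(t))
```

For instance, with y = sin and η = cos both functions agree up to an error of order α, and for the candidate y(t) = −t/2 the derivative vanishes in every direction η that one tries.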

Example: As an exercise compute the Gateaux derivative of the following cost functional

$F(x) = \int_0^1 \left[ x(t)^2 + x(t)\,x'(t) \right] dt$

Often the optimization should be carried out in a subspace of functions, expressed by boundary conditions, such as

V = {x(t) ∈ C¹[0, 1] : x(0) = x(1) = 0}, or

W = {x(t) ∈ C¹[0, 1] : x(0) = 0, x′(1) = 0} etc.

12.3 Frechet derivative

There is another way to define the derivative of a function u(x) between normed spaces X and Y. Assume that the function u can be locally approximated in the neighbourhood of the point x by a linear function. More precisely, assume that L : X → Y is a bounded linear operator such that

u(x + h) = u(x) + Lh + ‖h‖φ(h)

and φ(h) → 0 as h → 0. Then we say that the linear operator L is the Frechet derivative of u at the point x. We denote this operator by L = du(x). If u has a Frechet derivative at x, it is easy to see that it also has a Gateaux derivative in every direction.

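Example: In optimal control theory one often has a cost functional of the generic form

$F(x) = \int_a^b u[x(t), x'(t), t]\,dt.$

Here u = u(x, y, z) is a function in C²(R³), that is, it has continuous second partial derivatives. One can show (CP page 98) that the Frechet derivative of this functional, applied to an increment h, is

$\int_a^b \left[ \frac{\partial u}{\partial x} - \frac{d}{dt}\left( \frac{\partial u}{\partial y} \right) \right] h\,dt + \left[ \frac{\partial u}{\partial y}\, h \right]_a^b.$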

This is the basis of Euler-Lagrange equations in variational calculus.

13 Stochastic Calculus

13.1 Random variables

A random experiment refers to a system with an uncertain, unpredictable outcome ω. The sample space Ω is the universe of all possible outcomes. Subsets of Ω are called events. The events constitute a collection called a σ-algebra Σ. A random variable is a numerical variable that is determined by the outcome ω.

Mathematically speaking, probability p is now a measure on the σ-algebra Σ. A random variable

x : Ω → R

is a measurable function on Ω and we assume that x ∈ L²(Ω). The expectation of x is defined as $E(x) = \int_\Omega x\,dp$. If x, y : Ω → R are two random variables, the covariance between them is defined as

cov(x, y) = E[(x − Ex)(y − Ey)] = Exy − Ex · Ey

and the variance is

Var(x) = cov(x, x) = E(x − Ex)² = Ex² − (Ex)²

Note: If for a random variable we have µ = E(x) = 0, then

$\mathrm{Var}(x) = E(x^2) = \int_\Omega x^2\,dp = \|x\|_2^2.$

For jointly Gaussian variables, if E(x) = E(y) = 0, then

ρ(x, y) = 0 ⇔ E(xy) = 0 ⇔ x, y are independent.

13.2 Stochastic process

This chapter is an introduction to how functional analytic concepts and methods are used in stochastic calculus, the analysis of stochastic processes and stochastic differential equations.

A stochastic process is a sequence or continuous chain of random variables X_t : Ω → R, where t ∈ [0, T]. Here Ω is a probability space. Some examples could be

x(k) = "number of calls in a GSM network"
y(k) = "share value of a company X"
z(k) = "diameter of copper wire"
v(k) = "amount of insurance claims in week k received by a car insurance company"

Knowledge of the stochastic behaviour of the process can be given as the joint probability distribution of samples

x(t1), x(t2), . . . , x(tk)

For instance, in the case of monitoring the quality of a continuous manufacturing process (paper weight, copper wire diameter) one can assume that the variable x(t) has a probability distribution x(t) ∼ n(µ(t), σ²).

If the variable exhibits internal coupling between time points t and t + k, this would be visible in the correlation coefficient of the joint distribution of the random vector (x(t), x(t + k)).

13.3 Gaussian stochastic process

The multinormal distribution is defined by the density

$f(x) = C \exp\left[ -\tfrac{1}{2} (x - \mu)^T A\, (x - \mu) \right].$

Here A is the inverse of the covariance matrix, which gives the structure of internal dependencies of the random vector (x(t1), x(t2), . . . , x(tk)).

A Gaussian stochastic process is one where the vector (x(t1), . . . , x(tk)) has a multinormal distribution. The stochastic process has its characteristic features, the expectation

E(X(t)) = µ(t),

and the covariance function

r(s, t) = cov(X(t), X(s)) = E(X(t) · X(s)) − µ(t) · µ(s).

13.4 Stochastic differential equation

A familiar differential equation of forced vibration of a mass-spring system can be used to model the behavior of a car tire suspension, where the effect of the non-smooth road appears as a forcing term affecting the system. This equation (when changed into a first-order system) has the typical form

x′(t) + a(t)x(t) = F(t)

The complicated irregular shape of the non-smooth road surface suggests modelling the forcing term as a stochastic process F(t) = b(t)w′(t), where w′(t) means so called white noise and b(t) its amplitude. In this way we arrive at a simple example of a stochastic differential equation

x′(t) + a(t)x(t) = b(t)w′(t)

The analysis of such an equation is done by transforming it into a corresponding equation about integrals

$x(t) = x(0) - \int_0^t a(s)\,x(s)\,ds + \int_0^t b(s)\,dw(s).$

To understand and analyse such equations one needs to know the basics of a stochastic integral.

Let us have a stochastic process w(t) : Ω → R (or R^k), where t ∈ [0, T]. If we have a measurable function B ∈ L²[0, T], we want to define what we mean by the integral of B(t) with respect to the stochastic process w(t)

$\int_0^T B(t)\,dw(t) = \,?$

The definition of the integral will be based on the idea of the Riemann-Stieltjes integral, by studying the convergence of partial sums

Σ_j B(t_j)[w(t_{j+1}) − w(t_j)],

when the partition t0, t1, . . . , tn is refined. Notice that since {w(t) | t ∈ [0, T]} is a random orbit of the process, the integral will also become a random variable. More exactly, we will have a random variable

$\int_0^T B(t)\,dw(t) : \Omega \to \mathbb{R}.$

In most cases the integrating process w(t) will be of a special type called a Wiener process.

13.5 Wiener process and White noise

A (discrete) white noise process is defined as a sequence x_n of independent identically distributed random variables with E(x_n) = 0. Hence the covariance function is

$\mathrm{cov}(x_n, x_m) = r(n,m) = \begin{cases} \sigma^2, & m = n \\ 0, & m \neq n \end{cases}$

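One can also have a white noise process in k dimensions, where

$x_n = (x_n^1, x_n^2, \ldots, x_n^k) \in \mathbb{R}^k$

and E(x_n) = 0 = (0, 0, . . . , 0). Then for m = n

$R(m,n) = E(x_n x_m') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & \sigma_k^2 \end{pmatrix}$

and R(m, n) = 0 for m ≠ n.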

One can also define a continuous time white noise process x(t), where x(t) ∼ n(0, σ²) and cov(x(t), x(s)) = 0 when t ≠ s.

Let w(t), t ≥ 0, be a Gaussian stochastic process with E[w(t)] = 0. It is called a Wiener process if its covariance function is

r(s, t) = cov[w(s), w(t)] = σ² min(s, t).

This assumption implies the following

(1) E[(w(t) − w(s))²] = σ²(t − s) for s ≤ t

(2) If [s1, t1], [s2, t2] are disjoint intervals, then w(s1) − w(t1), w(s2) − w(t2) are independent.

The uncertainty regarding the value of the variable increases linearly in time and the process has independent increments.


13.6 Stochastic integral, an introduction

The integral of a simple function

$B(t) = \sum_{i=1}^{n} B_i\, \chi_{E_i}(t), \quad E_i = [t_i, t_{i+1}),$

is clearly

$\int_0^T B(t)\,dw(t) = \sum_{i=1}^{n} B_i\,[w(t_{i+1}) - w(t_i)]$

Since w(t) is a Wiener process, we have E[w(t_i)] = 0 for all i. Hence also

$E\left[ \int_0^T B(t)\,dw(t) \right] = 0.$

Due to the assumptions of the Wiener process (variance rule, independence of increments) we can compute the L²-norm

$\left\| \int_0^T B(t)\,dw(t) \right\|_2^2 = \mathrm{Var}\, \sum_j B_j [w(t_{j+1}) - w(t_j)]$

$= \sum_j B_j^2\, \mathrm{Var}[w(t_{j+1}) - w(t_j)]$

$= \sigma^2 \sum_j B_j^2\, [t_{j+1} - t_j]$

$= \sigma^2 \|B\|_2^2$

For a general measurable function B(t) we take a sequence of simple functions B_n(t) so that

‖B_n − B‖₂ → 0.

The integral will then be defined as a limit

$\int_0^T B(t)\,dw(t) = \lim_{n\to\infty} \int_0^T B_n(t)\,dw(t).$

The limit exists because the values of the integrals ∫ B_n dw(t) are a Cauchy sequence in L²

$\left\| \int_0^T B_n(t)\,dw(t) - \int_0^T B_m(t)\,dw(t) \right\|_2 = \left\| \int_0^T [B_n(t) - B_m(t)]\,dw(t) \right\|_2 = \sigma \|B_n - B_m\|_2.$


13.7 Black Scholes model

A famous example of a stochastic differential equation is the Black-Scholes model

dS_t = µS_t dt + σS_t dw(t), or equivalently

$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dw(t).$

This model is used to describe the time evolution of stock prices. The model has been the basis of financial trading since it can be used to compute option prices. The coefficient µ is the natural interest rate, σ the so called market volatility and w(t) a Brownian motion describing the non-predictable impulses affecting the market.

13.8 System with noisy input

Let us consider a simple system modelled by a differential equation

x′(t) = −ax(t).

The model may describe growth/decay and also a system with a tendency to approach an equilibrium x = 0. Assume that the system is also subject to outside perturbation, random impulses. We write

x′(t) = −ax(t) + w′(t),

where w′(t) is a white noise process, the derivative of a Wiener process w(t). Multiplying by exp(at) we get

$e^{at} x'(t) + e^{at} a\,x(t) = e^{at} w'(t)$

$\frac{d}{dt}\left[ e^{at} x(t) \right] = e^{at} w'(t)$

Integrating this equation gives

$e^{at} x(t) = x(0) + \int_0^t e^{as} w'(s)\,ds$

$x(t) = e^{-at} x(0) + \int_0^t e^{a(s-t)} w'(s)\,ds$

$x(t) = e^{-at} x(0) + \int_0^t e^{a(s-t)}\,dw(s)$


It is easy to generate numerical simulations of this model. The solutions are, as they should be, random realizations of a stochastic process. Due to the properties of the Wiener process we know that the increments

∆w(t) = w(t + h) − w(t)

are Gaussian random variables with variance hσ², and so we have

w(t + h) = w(t) + Zσ√h,

where Z is a random variable sampled from the standard normal distribution Z ∼ n(0, 1). Using this one can generate simulated paths for x(t).
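One possible way to generate such paths is sketched below. It assumes the explicit Euler scheme for the drift term (often called the Euler-Maruyama method, a choice made here for illustration and not prescribed by the notes) and draws each increment as Zσ√h. The function name, the default seed and the parameter names are hypothetical.

```python
import math
import random

def simulate_path(a, sigma, x0, t_end, n_steps, rng=None):
    """Euler-Maruyama path of x'(t) = -a x(t) + w'(t) on [0, t_end]."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so that runs are reproducible
    h = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, 1.0) * sigma * math.sqrt(h)  # w(t + h) - w(t) = Z sigma sqrt(h)
        x = x + (-a * x) * h + dw
        path.append(x)
    return path
```

With sigma = 0 the noise disappears and the path follows the deterministic decay x0 · exp(−at) up to the discretization error, which gives a quick sanity check before the random term is switched on.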

14 Optimization and introduction to optimal control

Vector space concepts, especially Hilbert space methods, are important tools in optimization theory. Let us start with some notes on bilinear forms. Let H be a Hilbert space and F = F(x, y) a function H × H → R that is linear with respect to both variables. Such functions are called bilinear forms.

A simple example in R³ would be

F(x, y) = Σ_{i,j} a_ij x_i y_j = 4x1y1 + 6x1y2 − 2x2y2 + x2y3

If L1 and L2 are linear functionals on H then

F(x, y) = L1(x)L2(y)

is also a bilinear form.

A bilinear form is coercive if for some c > 0

F(x, x) ≥ c‖x‖²

for all x ∈ H.

Using well-known theorems in Hilbert spaces and a convexity argument one can prove the following (see Curtain Pritchard p 254).

Theorem. If F(x, y) is a continuous symmetric coercive bilinear form and L is a continuous linear functional on H, the functional

J(x) = F(x, x) − L(x)

has a unique minimum point x* and this point satisfies the equation

$F(x^*, y) = \tfrac{1}{2} L(y)$

for all y ∈ H.


14.1 Least squares minimization

Let A : H1 → H2 be a bounded linear operator. We want to solve an equation

Ax = b.

In many real situations we have an equation which does NOT have a solution. A simple example is an overdetermined system where errors in the data make it unsolvable. The idea of the least squares solution is, instead of seeking an exact solution, to minimize the functional J(x) = ‖Ax − b‖². Now

J(x) = ⟨Ax − b, Ax − b⟩
= ⟨A*Ax, x⟩ − 2⟨A*b, x⟩ + ‖b‖²
= F(x, x) − 2L(x) + ‖b‖²,

where F is a symmetric continuous bilinear form and L a bounded linear functional. If now the operator A*A is coercive, or equivalently A satisfies

‖Ah‖² ≥ c‖h‖² for some c > 0,

then we could use the previous theorem to find the LS-solution.

In this case we can also directly compute the Frechet differential of the functional J(x). After a straightforward calculation one gets

dJ(x) = 2A*Ax − 2A*b.

The extremum point u is found by solving the so called normal equation

A*Au = A*b.
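In the finite dimensional case A is a matrix, A* is its transpose and the normal equation is a small linear system. The sketch below assumes a real n × 2 matrix A (for example the fit of a straight line to n data points) and solves the 2 × 2 normal equation by Cramer's rule. The function names are hypothetical and the restriction to two unknowns is only for brevity.

```python
def solve_2x2(m, r):
    """Solve the 2x2 linear system m z = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("A^T A is singular, so it is not coercive")
    return ((r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - r[0] * m[1][0]) / det)

def least_squares(A, b):
    """Least squares solution of A x = b for an n x 2 real matrix A via A^T A x = A^T b."""
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]
    return solve_2x2(AtA, Atb)
```

For the overdetermined system with rows (1, 0), (1, 1), (1, 2) and right-hand side (1, 2, 4), which has no exact solution, this gives the intercept 5/6 and the slope 3/2. The residual b − Ax is then orthogonal to both columns of A, which is exactly the content of the normal equation.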

Example: Consider finding a solution u = u(x) on an interval [a, b] for the differential equation arising for instance in mechanics

$-\frac{d}{dx}\left[ a(x)\,\frac{du}{dx} \right] = f(x).$

Here a(x) > 0 and the function f(x) represents an outside loading/input onto the system. If for instance the loading is given as a step function with discontinuities, it may lead to an equation which does not allow an exact solution. Let us write

$A = -\frac{d}{dx}\left[ a(x)\,\frac{d}{dx} \right].$


One would like to apply the method explained above to find the least squares solution for Au = f (Exercise). Note that the space C²[a, b] is not a Hilbert space. Let us consider instead the Sobolev space

H²[a, b] = {u(x) | u, Du, D²u ∈ L²[a, b]}.

Here D and D² mean generalized derivatives (see Hall-Porshing p 108). This space becomes a Hilbert space when we equip it with the inner product

$\langle u, v \rangle = \int_a^b \left( uv + Du\,Dv + D^2u\,D^2v \right) dx$

and the norm

$\|u\| = \left[ \int_a^b \left( |u|^2 + |Du|^2 + |D^2u|^2 \right) dx \right]^{1/2}.$

To carry out the task one needs to find the adjoint operator A* first.

14.2 Inverse problems and regularization

We have a linear operator T : X → Y between normed spaces. We are solving an operator equation Tx = y in the case where this operator does not have a well-behaved inverse operator. Either T⁻¹ is unbounded or the norm ‖T⁻¹‖ is very large. In this case the problem is called ill-posed. In practice we cannot know y exactly but our model is disturbed by some measurement error ε, which means that we are actually solving an equation

Tx = y + ε.

A small deviation in y will now generate a big or huge deviation in the solution x = T⁻¹(y + ε). This is the nature of so called inverse problems. Examples of such a phenomenon are the well-known integral operators

$Tu(s) = \int k(s,t)\,u(t)\,dt.$

Example. In photography an accidental shift of the camera and perhaps poor focus will result in a "blurred image". If u = u(x, y) is the correct image (colour density in gray scale), the blurred image b = b(x, y) will be the result of an integral transform

b(x, y) = T[u(s, t)]

of the type described above, this time with a double integral of course.


Example. Let us look at the integral equation

$\int k(s,t)\,u(t)\,dt = y(s)$

We consider this integral operator T between normed spaces, where the norm of u(t) ∈ X is defined as

$\|u\| = \left[ \int |u(t)|^2 + |Du(t)|^2\,dt \right]^{1/2}$

and as the norm of y(s) ∈ Y we take the usual L²-norm ‖y‖₂. Assume that we have measured an output η(s) which is known to be inaccurate, so some error is involved. Due to this error in the approximation η(s) ≈ y(s) and the unbounded inverse T⁻¹, the solution of the integral equation can be far from correct.

The following regularization idea tries to decrease the sensitivity of the solution. This is done by the minimization

‖Tu − η‖² + α‖u‖² → minimum!

The idea is to control the error sum and at the same time keep the norm of the solution small. The parameter α > 0 is called the regularization parameter. To find the minimum we compute the Frechet derivative of the functional

J(u) = ‖Tu − η‖² + α‖u‖²
= ⟨Tu − η, Tu − η⟩ + α⟨u, u⟩
= ⟨T*Tu, u⟩ − 2⟨T*η, u⟩ + ⟨η, η⟩ + α⟨u, u⟩

We see that the Frechet derivative is

dJ(u) = 2T*Tu + 2αu − 2T*η.

The minimum should satisfy the equation

T*Tu + αu = T*η.

From this we get

u = [T*T + αI]⁻¹ T*η.
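The effect of the regularization parameter can be seen already for a 2 × 2 matrix. The sketch below is a finite dimensional illustration with the Euclidean norm, not the integral operator of the example: it solves (TᵀT + αI)u = Tᵀη by Cramer's rule. The function name and the numerical values used in the note after the code are assumptions chosen for illustration.

```python
def tikhonov_2x2(T, eta, alpha):
    """Solve (T^T T + alpha I) u = T^T eta for a real 2x2 matrix T."""
    m = [[sum(T[k][i] * T[k][j] for k in range(2)) + (alpha if i == j else 0.0)
          for j in range(2)] for i in range(2)]            # T^T T + alpha I
    r = [sum(T[k][i] * eta[k] for k in range(2)) for i in range(2)]  # T^T eta
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - r[0] * m[1][0]) / det]
```

Take the nearly singular matrix T with rows (1, 1) and (1, 1.0001). The exact data y = (2, 2.0001) correspond to u = (1, 1). If the second measurement is disturbed to 2.0002, the solution with α = 0 jumps to approximately (0, 2), while α = 10⁻⁴ gives approximately (0.99995, 1.00005), so a small amount of regularization removes the sensitivity in this case.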


14.3 Example from earth science

Let p(x) be the permeability of soil at depth x, which describes the ability of water to seep through the layer of earth. We denote by w(x, t) the amount of water (per unit volume) in the soil at depth x at time t. In earth science one has derived a law which governs the permeation of water (after a rain shower, let us say) into the soil. The equation modeling the process looks like this

$w(x,t) = T(p(x)) = \int_0^t K(x,s)\,p(s)\,ds$

This equation poses an inverse problem. The solution will give the "water hold-up" w(x, t).

15 Optimal Control

Let us study the following scheduling problem in the production of a biochemical substance in the agrobio-industry. The substance is an important raw material in bakery and food production. The material is spoiled easily in storage, so the consumption of the material must be closely matched with the demand. The material decays with a small decay constant α = 0.04 (evaporation or some leakage). By adding a catalytic and nutritious material one can generate growth. More exactly, let x(t) = the amount of the substance and u(t) = the amount of catalytic nutrient. The natural decay process is modeled by the equation

x′(t) = −αx(t).

The organic growth can be modelled by

x′(t) = βx(t)u(t).

The period for scheduling the production is [0, T]. The demand curve varies periodically as D(t) = 10 + sin(t).

The amount of the key product is described by the differential equation

x′(t) = −αx(t) + βx(t)u(t) (1)

We call the amount x(t) the state of our system. This state can be controlled by adding the nutrient. The amount u(t) is called the control variable of our system.


We want to make the production curve to be as close to the demand curveas possible. We try to minimize the total integral of [x(t)D(t)]2. The secondand simultaneous objective is to save the nutrient which is quite expensive touse. So we want to minimize also the square integral of u(t). So all togetherwe want to find a schedule for applying u(t) and the resulting output x(t)following differential equation (*) so that the total cost function

$$F(x(t), u(t)) = \int \left( [x(t) - D(t)]^2 + u(t)^2 \right) dt \qquad (2)$$

is minimized. The optimal solution, if we can find it, is a function $u^*(t)$ in the space of functions $C[0, T]$. Likewise, the optimal production is a function $x^*(t)$ in $C[0, T]$.
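The cost of a given schedule can be estimated numerically. The following is only a sketch under stated assumptions: the value of `BETA`, the initial amount `x0`, the horizon `T` and the step count are all hypothetical (the text gives only $\alpha = 0.04$ and $D(t)$), and explicit Euler with a left Riemann sum is just one simple choice of discretization.

```python
import math

ALPHA = 0.04   # decay constant from the text
BETA = 0.1     # assumed growth coefficient (not given in the text)


def demand(t):
    """Demand curve D(t) = 10 + sin(t) from the text."""
    return 10.0 + math.sin(t)


def simulate_cost(u, x0=10.0, T=10.0, n=10_000):
    """Integrate x' = -ALPHA*x + BETA*x*u(t) with explicit Euler and
    accumulate the integral of [x - D]^2 + u^2 as a left Riemann sum.
    x0, T and n are illustrative assumptions."""
    dt = T / n
    x, cost = x0, 0.0
    for i in range(n):
        t = i * dt
        ut = u(t)
        cost += ((x - demand(t)) ** 2 + ut ** 2) * dt
        x += dt * (-ALPHA * x + BETA * x * ut)
    return x, cost


# Two hypothetical constant schedules: no nutrient, and the level
# ALPHA / BETA that exactly balances the decay.
x_none, cost_none = simulate_cost(lambda t: 0.0)
x_bal, cost_bal = simulate_cost(lambda t: ALPHA / BETA)
```

With these assumed numbers the balancing schedule keeps the state at its initial level and gives a lower cost than using no nutrient at all. Neither schedule is claimed to be optimal; the sketch only evaluates the cost functional for a chosen $u(t)$.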

A generic form of the optimal control problem can be formulated as follows. We have a system whose state at time $t$ is described by $x(t)$. The state usually is a vector

$$x(t) = [x_1(t), x_2(t), \ldots, x_n(t)],$$

but in the example above the state vector is 1-dimensional. The system evolves in time, and the time-path can be influenced by a control function $u(t)$, which in the general case may also be vector valued.

The time evolution of the system is determined by a differential equation (a system of differential equations in general)

$$\dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0 \qquad (3)$$

We see that the system has the initial state $x(0) = x_0$. The objective function that one wants to maximize (or minimize, depending on the application) has the form

$$F(x(t), u(t)) = \int g(x(t), u(t))\, dt + \gamma(x(T)) \qquad (4)$$

The latter term in this expression means that we may put a price on the final state $x(T)$ of the system. It is sometimes called the terminal payoff.

The task is to find a function $u = u^*(t)$ which, together with the state solving equation (3), maximizes the functional $F(x(t), u(t))$. The solution will be a pair $(x, u) = (x^*(t), u^*(t))$ giving the optimal control and the resulting optimal time evolution leading to the maximum of the objective functional.

15.1 Examples

Drug therapy. Let $x(t)$ = number of bacteria in the intestinal system and $u(t)$ = concentration of a drug which can decrease the bacterial growth. Without treatment the bacteria would grow exponentially. Let us model the bacterial growth by

$$\dot{x}(t) = \alpha x(t) - u(t). \qquad (5)$$

When the initial infection $x(0) = x_0$ is given, the wish is to minimize the amount of bacteria at the end of the treatment, at time $T$. At the same time, taking a lot of drugs is harmful, so we would also like to minimize the negative effect. Let us model the harm caused by drug consumption by $\int u(t)^2\, dt$. Hence our objective would be:

$$\text{minimize} \quad x(T) + \int u(t)^2\, dt. \qquad (6)$$
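To get a feeling for the trade-off in objective (6), one can restrict attention to constant doses $u(t) \equiv u$. This restriction is an assumption made here for illustration and does not give the optimum over all functions. For a constant dose, equation (5) has the closed-form solution $x(T) = x_0 e^{\alpha T} - (u/\alpha)(e^{\alpha T} - 1)$, so the objective is a quadratic in $u$. The parameter values below are hypothetical, and the simple model does not by itself prevent $x$ from becoming negative for large doses.

```python
import math


def objective_constant_dose(u, alpha=0.5, x0=10.0, T=2.0):
    """Objective (6) for a constant dose u(t) = u, using the closed-form
    solution of x' = alpha*x - u with x(0) = x0.
    alpha, x0 and T are illustrative assumptions."""
    growth = math.exp(alpha * T)
    x_T = x0 * growth - (u / alpha) * (growth - 1.0)
    return x_T + u ** 2 * T


def best_constant_dose(alpha=0.5, T=2.0):
    """Minimizer of the quadratic u -> objective_constant_dose(u),
    obtained by setting its derivative with respect to u to zero."""
    return (math.exp(alpha * T) - 1.0) / (2.0 * alpha * T)


u_best = best_constant_dose()

# Coarse grid search as an independent check of the closed-form minimizer.
grid = [i / 1000 for i in range(0, 3001)]
u_grid = min(grid, key=objective_constant_dose)
```

The grid search and the closed-form minimizer should agree to within the grid spacing, which is a quick sanity check that the quadratic was differentiated correctly.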

Fishery model. We model the total population of fish in a fishery farm by $x(t)$. It may be a huge artificial tank or a natural lake. If the fish are allowed to breed and grow freely, the time evolution will follow a model

$$\dot{x}(t) = k x(t)(M - x(t)).$$

This model describes logistic growth towards a saturation limit value $M$, in the case of a fishery the maximum amount of fish that the volume can sustain, due to limitations of space and food. Let $u(t)$ describe the intensity of fishing, more precisely the proportion of the total amount of fish that fishermen remove from the living space of the fish population. Hence the time evolution of the size of the fish population is modeled by

$$\dot{x}(t) = k x(t)(M - x(t)) - u(t) x(t). \qquad (7)$$

The commercial value of sold fish is modelled by a function

$$p(x) = ax - bx^2$$

with certain coefficients $a$ and $b$. This function means that the income does not grow linearly, because the unit price gets lower when the supply of fish increases. The term diminishing returns is used for this effect. The cost of fishing is modeled by the function $c\,u(t)$. The net profit to be maximized is now

$$F(x(t), u(t)) = \int e^{-\delta t} \left[ a\, x(t) u(t) - b\, x(t)^2 u(t)^2 - c\, u(t) \right] dt \qquad (8)$$

Here we have added the discount factor $e^{-\delta t}$, which emphasizes the fact that income arriving early is more valuable because it can be invested to produce more profit.
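The dynamics (7) are easy to explore numerically for a constant fishing intensity. The sketch below uses explicit Euler steps. All parameter values (`k`, `M`, `x0`, `T`, `n`) are hypothetical and chosen only so that the behaviour is visible. For constant $u$, setting the right-hand side of (7) to zero gives the nonzero steady state $x = M - u/k$, which the simulation should approach when $u < kM$.

```python
def simulate_fishery(u, k=0.01, M=100.0, x0=20.0, T=200.0, n=20_000):
    """Explicit Euler for x' = k*x*(M - x) - u*x with a constant
    fishing intensity u. All default values are illustrative assumptions."""
    dt = T / n
    x = x0
    for _ in range(n):
        x += dt * (k * x * (M - x) - u * x)
    return x


def equilibrium(u, k=0.01, M=100.0):
    """Nonzero steady state of equation (7) for constant u:
    k*(M - x) = u, i.e. x = M - u/k (clipped at zero)."""
    return max(M - u / k, 0.0)


x_moderate = simulate_fishery(0.3)   # moderate fishing
x_heavy = simulate_fishery(1.5)      # intensity above k*M
```

With these assumed numbers, moderate fishing settles at a reduced but positive population, while an intensity above $kM$ drives the population towards zero.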

15.2 Classical calculus of variations problem

As an example, let us consider the problem of minimizing the functional

$$F(x(t)) = \int g(x(t), \dot{x}(t), t)\, dt \qquad (10)$$

over the class of differentiable functions on $[0, T]$ where the initial and final states are given: $x(0) = x_0$ and $x(T) = x_T$. Here $g = g(x, y, t)$ is a function of three variables. By using rules of calculus, including the chain rule and Taylor expansion, one can calculate the Gateaux derivative of the functional $F(x(t))$. This means that we need to study the difference

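$$\begin{aligned}
F(x + h) - F(x) &= F(x(t) + h(t)) - F(x(t)) \\
&= \int \left[ g(x(t) + h(t), \dot{x}(t) + \dot{h}(t), t) - g(x(t), \dot{x}(t), t) \right] dt \\
&= \int \left[ \frac{\partial}{\partial x} g(x, \dot{x}, t)\, h(t) + \frac{\partial}{\partial y} g(x, \dot{x}, t)\, \dot{h}(t) \right] dt + \text{remainder} \qquad (11)
\end{aligned}$$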

Because the function $x(t) + h(t)$ must also satisfy the boundary conditions, we must have $h(0) = h(T) = 0$. As $h \to 0$, one can show that the remainder will vanish and so the first term will give the Gateaux derivative. Applying integration by parts and leaving out some arguments this will be

$$dF(x)h = \int_0^T \left[ \frac{\partial g}{\partial x} - \frac{d}{dt} \frac{\partial g}{\partial y} \right] h(t)\, dt \qquad (12)$$

Because $h(0) = h(T) = 0$, the boundary term in the integration by parts is zero, that is, the substitution

$$\left[ \frac{\partial g}{\partial y}\, h(t) \right]_0^T = 0.$$
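Formula (12) can be checked numerically on a concrete case. The sketch below assumes a hypothetical integrand $g(x, y, t) = x^2 + y^2$, a trial function $x(t) = t^2$ on $[0, 1]$ and an increment $h(t) = \sin(\pi t)$, none of which come from the text. It compares the difference quotient $(F(x + \varepsilon h) - F(x))/\varepsilon$ for a small $\varepsilon$ with the integral in (12), using a composite midpoint rule for all integrals.

```python
import math

T = 1.0
N = 2000  # number of subintervals for the midpoint rule


def integrate(func, a=0.0, b=T, n=N):
    """Composite midpoint rule for the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))


def x(t):
    return t ** 2                       # trial function (assumed)


def dx(t):
    return 2.0 * t                      # its derivative


def h(t):
    return math.sin(math.pi * t / T)    # increment with h(0) = h(T) = 0


def dh(t):
    return (math.pi / T) * math.cos(math.pi * t / T)


def F(eps):
    """Value of the functional with g = x^2 + y^2 at x + eps*h."""
    return integrate(
        lambda t: (x(t) + eps * h(t)) ** 2 + (dx(t) + eps * dh(t)) ** 2
    )


# Difference quotient for a small step in the direction h.
eps = 1e-6
quotient = (F(eps) - F(0.0)) / eps

# Formula (12): dg/dx = 2x and dg/dy = 2x', so d/dt(dg/dy) = 2x'' = 4 here.
formula = integrate(lambda t: (2.0 * x(t) - 4.0) * h(t))
```

The two numbers should agree up to discretization and rounding error, which illustrates that the boundary term really drops out when $h$ vanishes at both end points.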

If $x = x(t)$ is a function that minimizes the functional, then the Gateaux derivative must be zero for all increment vectors $h(t)$. This is possible only if

$$\frac{\partial g}{\partial x} - \frac{d}{dt} \left( \frac{\partial g}{\partial y} \right) = 0 \qquad (13)$$

By solving this so-called Euler-Lagrange equation one gets the optimal function $x(t)$.
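A short worked case may help. Assume the hypothetical integrand $g(x, y, t) = y^2$, so that $F(x) = \int \dot{x}(t)^2\, dt$ over $[0, T]$; this choice is made here only for illustration. Equation (13) then reduces to a second-order equation that can be solved with the boundary conditions $x(0) = x_0$ and $x(T) = x_T$:

```latex
\frac{\partial g}{\partial x} = 0, \qquad
\frac{\partial g}{\partial y} = 2y = 2\dot{x}(t)
\quad \Longrightarrow \quad
-\frac{d}{dt}\bigl( 2\dot{x}(t) \bigr) = -2\ddot{x}(t) = 0
\quad \Longrightarrow \quad
x(t) = x_0 + \frac{x_T - x_0}{T}\, t .
```

In this case the function singled out by the Euler-Lagrange equation is the straight line joining the two prescribed end states.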
