From Vector Spaces to Function Spaces


7/26/2019 From Vector Spaces to Function Spaces

    Society for Industrial and Applied Mathematics

    Philadelphia

From Vector Spaces to Function Spaces

Introduction to Functional Analysis with Applications

Yutaka Yamamoto, Kyoto University, Kyoto, Japan


    Copyright 2012 by the Society for Industrial and Applied Mathematics

    10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

    Library of Congress Cataloging-in-Publication Data

    Yamamoto, Yutaka, 1950- [Shisutemu to seigyo no sugaku. English]

From vector spaces to function spaces : introduction to functional analysis with applications / Yutaka Yamamoto.
p. cm. -- (Other titles in applied mathematics)
Includes bibliographical references and index.
ISBN 978-1-611972-30-6 (alk. paper)
1. Functional analysis. 2. Engineering mathematics. I. Title.
TA347.F86Y3613 2012
515.7--dc23    2012010732



    Contents

    Preface ix

    Glossary of Notation xiii

    1 Vector Spaces Revisited 1

1.1 Finite-Dimensional Vector Spaces 1
1.2 Linear Mappings and Matrices 14
1.3 Subspaces and Quotient Spaces 19
1.4 Duality and Dual Spaces 30

    2 Normed Linear Spaces and Banach Spaces 39

2.1 Normed Linear Spaces 39
2.2 Banach Spaces 49
2.3 Closed Linear Subspaces and Quotient Spaces 51
2.4 Banach's Open Mapping and Closed Graph Theorems 54
2.5 Baire's Category Theorem 55

    3 Inner Product and Hilbert Spaces 59

3.1 Inner Product Spaces 59
3.2 Hilbert Space 63
3.3 Projection Theorem and Best Approximation 73

    4 Dual Spaces 77

4.1 Dual Spaces and Their Norms 77

4.2 The Riesz–Fréchet Theorem 81
4.3 Weak and Weak* Topologies 82
4.4 Duality Between Subspaces and Quotient Spaces 85

5 The Space L(X, Y) of Linear Operators 89

5.1 The Space L(X, Y) 89
5.2 Dual Mappings 90
5.3 Inverse Operators, Spectra, and Resolvents 92
5.4 Adjoint Operators in Hilbert Space 94
5.5 Examples of Adjoint Operators 96
5.6 Hermitian Operators 99
5.7 Compact Operators and Spectral Resolution 101


    6 Schwartz Distributions 107

6.1 What Are Distributions? 107
6.2 The Space of Distributions 112
6.3 Differentiation of Distributions 116
6.4 Support of Distributions 119
6.5 Convergence of Distributions 119
6.6 Convolution of Distributions 126
6.7 System Theoretic Interpretation of Convolution 131
6.8 Application of Convolution 132

    7 Fourier Series and Fourier Transform 141

7.1 Fourier Series Expansion in L2[−π, π] or of Periodic Functions 141
7.2 Fourier Series Expansion of Distributions 147
7.3 Fourier Transform 150
7.4 Space S of Rapidly Decreasing Functions 152
7.5 Fourier Transform and Convolution 156
7.6 Application to the Sampling Theorem 159

    8 Laplace Transform 165

8.1 Laplace Transform for Distributions 165
8.2 Inverse Laplace Transforms 170
8.3 Final-Value Theorem 171

    9 Hardy Spaces 175

9.1 Hardy Spaces 175

9.2 Poisson Kernel and Boundary Values 178
9.3 Canonical Factorization 184
9.4 Shift Operators 189
9.5 Nehari Approximation and Generalized Interpolation 190
9.6 Application of Sarason's Theorem 196
9.7 Nehari's Theorem: Supplements 201

    10 Applications to Systems and Control 203

10.1 Linear Systems and Control 203
10.2 Control and Feedback 205
10.3 Controllability and Observability 207

10.4 Input/Output Correspondence 215
10.5 Realization 217
10.6 H∞ Control 220
10.7 Solution to the Sensitivity Minimization Problem 224
10.8 General Solution for Distributed Parameter Systems 229
10.9 Supplementary Remarks 230

    A Some Background on Sets, Mappings, and Topology 233

A.1 Sets and Mappings 233
A.2 Reals, Upper Bounds, etc. 235
A.3 Topological Spaces 238
A.4 Product Topological Spaces 239


A.5 Compact Sets 240
A.6 Norms and Seminorms 241
A.7 Proof of the Hahn–Banach Theorem 241
A.8 The Hölder–Minkowski Inequalities 245

    B Table of Laplace Transforms 247

    C Solutions 249

C.1 Chapter 1 249
C.2 Chapter 2 251
C.3 Chapter 3 252
C.4 Chapter 4 252
C.5 Chapter 5 252
C.6 Chapter 6 254

C.7 Chapter 7 256
C.8 Chapter 8 256
C.9 Appendix 257

    D Bibliographical Notes 259

    Bibliography 261

    Index 265


    Preface

This book intends to give an accessible account of applied mathematics, mainly of analysis subjects, with emphasis on functional analysis. The intended readers are senior or graduate students who wish to study analytical methods in science and engineering, and researchers who are interested in functional analytic methods.

Needless to say, scientists and engineers can benefit from mathematics. This is not confined to a mere means of computational aid, and indeed the benefit can be greater and far-reaching if one becomes more familiar with advanced topics such as function spaces, operators, and generalized functions. This book aims at giving an accessible account of elementary real analysis, from normed spaces to Hilbert and Banach spaces, with some extended treatment of distribution theory, Fourier and Laplace analyses, and Hardy spaces, accompanied by some applications to linear systems and control theory. In short, it is a modernized version of what has been taught as applied analysis in science and engineering schools.

To this end, a more conceptual understanding is required. In fact, conceptual understanding is not only indispensable but also a great advantage even in manipulating computational tools. Unfortunately, it is not always accomplished, and indeed often left aside. Mathematics is often learned by many people as a collection of mere techniques and swallowed as very formal procedures.

This is deplorable, but from my own experience of teaching, its cure seems quite difficult. For students and novices, definitions are often difficult to understand, and mathematical structures are hard to penetrate, let alone the background motivation as to how and why they are formulated and studied.

This book has a dual purpose: one is to provide young students with an accessible account of a conceptual understanding of fundamental tools in applied mathematics. The other is to give those who already have some exposure to applied mathematics, but wish to acquire a more unified and streamlined comprehension of this subject, a deeper understanding through background motivations.

    To accomplish this, I have attempted to

• elaborate upon the underlying motivation of the concepts that are being discussed, and
• describe how one can get an idea for a proof and how one should formalize the proof.

I emphasized more verbal, often informal, explanations rather than streams of logically complete yet rather formal and detached arguments that are often difficult to follow for nonexperts.


The topics that are dealt with here are quite standard and include fundamental notions of vector spaces, normed, Banach, and Hilbert spaces, and the operators acting on them. They are in one way or another related to linearity, and understanding linear structures forms a core of the treatment of this book. I have tried to give a unified approach to real analysis, such as Fourier analysis and the Laplace transforms. To this end, distribution theory gives an optimal platform. The second half of this book thus starts with distribution theory. With a rigorous treatment of this theory, the reader will see that various fundamental results in Fourier and Laplace analyses can be understood in a unified way. Chapter 9 is devoted to a treatment of Hardy spaces. This is followed by a treatment of a remarkably successful application in modern control theory in Chapter 10.

Let us give a more detailed overview of the contents of the book. As an introduction, we start with some basics in vector spaces in Chapter 1. This chapter intends to give a conceptual overview and review of vector spaces. This topic is often, quite unfortunately, a hidden stumbling point for students lacking an in-depth understanding of what linearity is all about. I have tried to illuminate the conceptual sides of the notion of vector spaces in this chapter. Sometimes, I have attempted to show the idea of a proof first and then show that a complete proof is a realization of making such an idea logically more complete. I have also chosen to give detailed treatments of dual and quotient spaces in this chapter. They are often either very lightly touched on or neglected completely in standard courses in applied mathematics. I expect that the reader will become more accustomed to standard mathematical thinking as a result of this chapter.

From Chapter 2 on, we proceed to more advanced treatments of infinite-dimensional spaces. Among them, normed linear spaces are most fundamental and allow rather direct generalizations of finite-dimensional spaces. A new element here is the notion of norms, which introduces the concept of topology. Topology plays a central role in studying infinite-dimensional spaces and linear maps acting on them. Normed spaces give the first step toward such studies.

A problem is that limits cannot, generally, be taken freely for those sequences that may appear to converge (i.e., so-called Cauchy sequences). The crux of analysis lies in limiting processes, and to take full advantage of them, the space in question has to be closed under such operations. In other words, the space must be complete. Complete normed linear spaces are called Banach spaces, and there are many interesting and powerful theorems derived for them. If, further, the norm is derived from an inner product, the space is called a Hilbert space. Hilbert spaces possess a number of important properties due to the very nature of inner products, for example, the notion of orthogonality. The Riesz representation theorem for continuous linear functionals, as well as its outcome of the orthogonal projection theorem, is a typical consequence of an inner product structure and completeness. Hilbert space appears very frequently in measuring signals due to its affinity with such concepts as energy, and hence in many optimization problems in science and engineering applications. The problem of best approximation is naturally studied through the projection theorem in the framework of Hilbert space. This is also a topic of Chapter 3.

Discussing properties of spaces on their own will give only half the story. What is equally important is their interrelationship, and this exhibits itself through linear operators. In this connection, dual spaces play crucial roles in studying Banach and Hilbert spaces. We give in Chapter 5 a basic treatment of them and prove the spectral resolution theorem for compact self-adjoint operators, what is known as the Hilbert–Schmidt expansion theorem.


We turn our attention to Schwartz distributions in Chapter 6. This theory makes transparent the treatments of many problems in analysis such as differential equations, Fourier analysis (Chapter 7), Laplace transforms (Chapter 8), and Poisson integrals (Chapter 9), and it is highly valuable in many areas of applied mathematics, both technically and conceptually. In spite of this fact, this theory is often very informally treated in introductory books and thereby hardly appreciated by engineers. I have strived to explain why it is important and how some more rigorous treatments are necessary, attempting an easily accessible account for this theory while not sacrificing mathematical rigor too much.

The usefulness of distributions hinges largely on the notion of the delta function (distribution). This is the unity element with respect to convolution, and this is why it appears so frequently in many situations of applied mathematics. Many basic results in applied mathematics are indeed understood from this viewpoint. For example, a Fourier series or Poisson's integral formula is the convolution of an objective function with the Dirichlet or the Poisson kernel, and its convergence to such a target function is a result of the fact that the respective kernel converges to the delta distribution. We take this as a leading principle of the second half of this book, and I attempted to clarify the structure of this line of ideas in the treatments of such convergence results in Chapter 6, and subsequent Chapters 7 and 8 dealing with Fourier and Laplace transforms.

Chapter 9 gives a basic treatment of Hardy spaces, which in turn played a fundamental role in modern control theory. So-called H∞ control theory is what we are concerned with. Of particular interest is generalized interpolation theory given here, which also plays a fundamental role in this new control theory. We will prove Nehari's theorem as well as Sarason's theorem, along with the applications to the Nevanlinna–Pick interpolation and the Carathéodory–Fejér theorem. We will also discuss the relationship with boundary values and the inner-outer factorization theorem. I have tried to give an easy entry point to this theory.

Chapter 10 is devoted to basic linear control system theory. Starting with an inverted pendulum example, we will see such basic concepts as linear system models, the concept of feedback, controllability, and observability, a realization problem, an input/output framework, and transfer functions, leading to the simplest case of H∞ control theory. We will see solutions via the Nevanlinna–Pick interpolation, Nehari's theorem, and Sarason's theorem, applying the results of Chapter 9. Fourier analysis (Chapter 7) and the Laplace transforms (Chapter 8) also play key roles here. The reader will no doubt see part of a beautiful application of Hardy space theory to control and systems. It is hoped that this chapter can serve as a concise introduction to those who are not necessarily familiar with the subject.

It is always a difficult question how much preliminary knowledge one should assume and how self-contained the book should be. I have made the following assumptions:

• As a prerequisite, I assumed that the reader has taken an elementary course in linear algebra and basic calculus. Roughly speaking, I assumed the reader is at the junior or a higher level in science and engineering schools.

• I did not assume much advanced knowledge beyond the level above. The theory of integration (Lebesgue integral) is desirable, but I chose not to rely on it.

• However, if applied very rigorously, the above principles can lead to a logical difficulty. For example, Fubini's theorem in Lebesgue integration theory, various theorems in general topology, etc., can be an obstacle for self-contained treatments. I tried to give precise references in such cases and not overload the reader with such concepts.

• Some fundamental notions in sets and topology are explained in Appendix A. I tried to make the exposition as elementary and intuitive as to be beneficial to students who are not well versed in such notions. Some advanced background material, e.g., the Hahn–Banach theorem, is also given here.

This book is based on the Japanese version published by the Asakura Publishing Company, Ltd., in 1998. The present English version differs from the predecessor in many respects. Particularly, it now contains Chapter 10 for application to system and control theory, which was not present in the Japanese version. I have also made several additions, but it took much longer than expected to complete this version. Part of the reason lies in its dual purpose: to make the book accessible to those who first study the above subjects and simultaneously worthwhile for reference purposes on advanced topics. A good balance was not easy to find, but I hope that the reader finds the book helpful in both respects.

It is a pleasure to acknowledge the precious help I have received from many colleagues and friends, to whom I am so much indebted, not only during the course of the preparation of this book but also over the course of a long-range friendship from which I have learned so much. Particularly, Jan Willems read through the whole manuscript and gave extensive and constructive comments. I am also most grateful for his friendship and what I learned from numerous discussions with him. Likewise, I have greatly benefited from the comments and corrections made by Thanos Antoulas, Brian Anderson, Bruce Francis, Tryphon Georgiou, Nobuyuki Higashimori, Pramod Khargonekar, Hitay Özbay, Eduardo Sontag, and Mathukumalli Vidyasagar. I would also like to acknowledge the help of Masaaki Nagahara and Naoki Hayashi for polishing some proofs and also helping me in preparing some figures. I wish to acknowledge the great help I received from the editors at SIAM in publishing the present book.

I would like to conclude this preface by thanking my family, particularly my wife Mamiko, for her support in every respect in the past 30 years. Without her help it would not have been possible to complete this work.

Yutaka Yamamoto
Kyoto
February, 2012


Chapter 10

Applications to Systems and Control

Modern control and system theory is built upon a solid mathematical basis. Advanced mathematical concepts developed in this book are indispensable for further in-depth study of elaborate theory of systems and control. We here present some fundamentals of this theory.

    10.1 Linear Systems and Control

Many physical systems are described by differential equations, ordinary or partial. We often wish to control such systems, natural or artificial. In other words, given a system, we want it to behave in such a way that is in some sense desirable for us. Normally, a system may contain three types of variables: an input variable that drives the system, an output variable that can either be observed by some devices or affect the external environment, and a state variable that describes the internal behavior of the system which is not necessarily observable from the external environment. Summarizing, a differential equation description may take the following form:

dx/dt(t) = f(x(t), u(t)),    (10.1)
y(t) = g(x(t)),              (10.2)

where x(t) ∈ R^n, u(t) ∈ R^m, and y(t) ∈ R^p denote, respectively, the state, input, and output variables. These two equations have the structure that

1. starting with an initial state x0 at some time t0, and for a given input u(t), t ≥ t0, (10.1) describes how the system state x(t) evolves in time, and

    2. the output y(t) is determined by the present value of the state x(t) according to (10.2).

The fundamental objective of control is to design or synthesize u(t) so as to make the above system behave desirably. Here the control input u(t) is something we can maneuver to control the system state x(t), and the output y(t) is then controlled accordingly. In general, it is necessary to make x(t) behave nicely rather than merely controlling the behavior of y(t) alone, as there can be some hidden behavior in x(t) that does not explicitly appear in that of y(t).
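To make the structure of (10.1)-(10.2) concrete, here is a minimal numerical sketch (not from the book; the one-dimensional f and g below are hypothetical illustrative choices): the state equation is stepped forward in time from x0, and the output is read off through g.

```python
# Minimal sketch of (10.1)-(10.2) with a hypothetical one-dimensional system:
#   f(x, u) = -x + u   (state equation),   g(x) = x**2   (output map).
# Illustrative only; these are not the book's equations of motion.

def f(x, u):
    return -x + u

def g(x):
    return x * x

# Item 1: from the initial state x0 at t0 = 0, the input u(.) drives the
# state forward in time (a crude forward-Euler step here).
x = 1.0                      # initial state x0
h = 1e-4                     # Euler step size
for k in range(20000):       # integrate up to t = 2
    u = 0.0                  # zero input for this run
    x = x + h * f(x, u)

# Item 2: the output is determined by the present value of the state.
y = g(x)
```

With u ≡ 0 the exact solution here is x(t) = e^{-t}, so x ends near e^{-2} ≈ 0.135; the Euler error shrinks as h is reduced.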


While most systems are nonlinear as above, or may be even more complex, such as partial differential or integral equations, their treatment is often difficult. In such cases, one often resorts to a more simplified approximation. This amounts to linearizing the differential equation along a given reference trajectory, by taking the error (variation) from this prespecified reference trajectory.

We thus consider the simplified ordinary linear differential system equations as follows:

dx/dt(t) = Ax(t) + Bu(t),    (10.3)
y(t) = Cx(t).                (10.4)

Here x(t) ∈ R^n, u(t) ∈ R^m, and y(t) ∈ R^p are the state, input, and output vectors, as above, and A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{p×n} are constant matrices. When necessary, we regard an initial state x0 as fixed at time t0 as x(t0) = x0. For every such x0 ∈ R^n, (10.3) yields a unique solution once u is specified.

The interpretation of these equations is as follows: There is a system Σ described by (10.3), (10.4), and this system accepts input u(t), and the state x(t) changes according to the state transition equation (10.3) with initial state x(t0) = x0. The state variable x(t) simultaneously produces an output y(t) according to the output equation (10.4). Since the correspondences [x, u] → dx/dt and x → y are linear in (10.3), (10.4), we call system Σ a linear system. It is also finite-dimensional, since the space (called the state space of Σ) R^n to which x(t) belongs is a finite-dimensional vector space over R. For brevity of notation, we will denote the system by the triple (A, B, C).
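As a sketch of working with the triple (A, B, C) (the matrices below are arbitrary illustrative values, not from the book), the state transition and output equations can be integrated directly:

```python
# Simulate dx/dt = A x + B u, y = C x with forward Euler.
# A, B, C below are arbitrary illustrative matrices, not from the book.

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def simulate(A, B, C, x0, u, t_end, h=1e-3):
    """Integrate the state transition equation (10.3); return x(t_end), y(t_end)."""
    x = list(x0)
    for k in range(round(t_end / h)):
        ax = mat_vec(A, x)
        bu = mat_vec(B, u(k * h))
        x = [x[i] + h * (ax[i] + bu[i]) for i in range(len(x))]
    return x, mat_vec(C, x)          # output equation (10.4)

# A two-state example with stable dynamics (eigenvalues have real part -1/2)
A = [[0.0, 1.0], [-1.0, -1.0]]
B = [[0.0], [1.0]]
C = [[1.0, 0.0]]

x_end, y_end = simulate(A, B, C, [1.0, 0.0], lambda t: [0.0], 10.0)
```

Here the free motion decays, so y(10) is close to zero; with an unstable A (as for the inverted pendulum below) the same loop would blow up instead.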

Example 10.1.1. Consider Figure 10.1. The pendulum is placed upside down and is supported by a free joint at the bottom. Ignoring the friction at the joint, and also that between the cart and the ground, its equation of motion is given as follows:

(M + m)x'' + ml(θ'' cos θ − (θ')^2 sin θ) = u,    (10.5)
x'' cos θ + lθ'' − g sin θ = 0.                   (10.6)

Figure 10.1. Inverted pendulum (cart of mass M and position x, pendulum bob of mass m, pendulum length l, control force u)


Set [x1, x2, x3, x4]^T := [x, x', θ, θ']^T. Then the above equations can be rewritten as

d/dt [x1, x2, x3, x4]^T = f(x) + b(x)u,

where

f(x) = [ x2,
         (m l x4^2 sin x3 − m g sin x3 cos x3) / (M + m sin^2 x3),
         x4,
         (g/l) sin x3 − m sin x3 cos x3 (l x4^2 − g cos x3) / ((M + m sin^2 x3) l) ]^T,

b(x) = [ 0,
         1 / (M + m sin^2 x3),
         0,
         −cos x3 / ((M + m sin^2 x3) l) ]^T,

which is in the form of (10.1)-(10.2).

If we assume that θ and θ' are sufficiently small, the following linearized equations can be obtained from (10.5)-(10.6):

(M + m)x'' + mlθ'' = u,
x'' + lθ'' − gθ = 0.    (10.7)

This can further be rewritten in state space form (i.e., that not involving higher-order derivatives) as

d/dt [x1, x2, x3, x4]^T =

    [ 0   1   0             0 ] [x1]     [ 0        ]
    [ 0   0   −mg/M         0 ] [x2]  +  [ 1/M      ]  u  := Ax + Bu,    (10.8)
    [ 0   0   0             1 ] [x3]     [ 0        ]
    [ 0   0   (M+m)g/(Ml)   0 ] [x4]     [ −1/(Ml)  ]

    which is in the form of (10.3).
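As a quick numerical sanity check (a sketch; the parameter values are illustrative, not from the book), the accelerations produced by rows 2 and 4 of (10.8) can be substituted back into the linearized equations (10.7):

```python
# Consistency check: the A and B of (10.8) must reproduce the linearized
# equations (10.7). Parameter values are illustrative only.
M, m, l, g = 1.0, 0.1, 1.0, 9.8

A = [[0, 1, 0, 0],
     [0, 0, -m * g / M, 0],
     [0, 0, 0, 1],
     [0, 0, (M + m) * g / (M * l), 0]]
B = [0, 1 / M, 0, -1 / (M * l)]

# An arbitrary state (x, x', theta, theta') and input u
x1, x2, x3, x4 = 0.3, -0.2, 0.05, 0.1
u = 0.7

xdd = A[1][2] * x3 + B[1] * u     # row 2 of Ax + Bu: x''
thdd = A[3][2] * x3 + B[3] * u    # row 4 of Ax + Bu: theta''

# (10.7): (M + m) x'' + m l theta'' = u  and  x'' + l theta'' - g theta = 0
eq1 = (M + m) * xdd + m * l * thdd
eq2 = xdd + l * thdd - g * x3
```

Both residuals vanish (to rounding), confirming that (10.8) is just (10.7) solved for the accelerations.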

    10.2 Control and Feedback

Let us examine Example 10.1.1 a little further. The eigenvalues of the A matrix in (10.8) are 0, 0, ω, −ω, where ω = √((M + m)g/(Ml)). The fundamental solution for u = 0 is given by e^{At}x0 with initial condition x(0) = x0. Clearly the positive eigenvalue ω and also the double root 0 make the total system unstable; i.e., e^{At}x0 does not approach zero (i.e., the equilibrium state) as t → ∞. That is, if the initial state were nonzero, the solution x(t) would not remain in a neighborhood of the origin, and the pendulum would fall. While this conclusion is based on the linearized model (10.8) or (10.7), the conclusion easily carries over to the nonlinear model (10.5) and (10.6) as well.
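This instability is easy to see numerically (a sketch with illustrative parameters, not values from the book): with u = 0, row 4 of (10.8) decouples as θ'' = ω²θ, so a tiny initial tilt grows like cosh(ωt).

```python
import math

# Illustrative parameters, not values from the book
M, m, l, g = 1.0, 0.1, 1.0, 9.8
omega2 = (M + m) * g / (M * l)     # omega^2 = (M + m) g / (M l)

# Free motion (u = 0): theta'' = omega^2 * theta. Integrate with classical RK4.
def deriv(s):
    theta, theta_dot = s
    return (theta_dot, omega2 * theta)

def rk4_step(s, h):
    k1 = deriv(s)
    k2 = deriv((s[0] + h / 2 * k1[0], s[1] + h / 2 * k1[1]))
    k3 = deriv((s[0] + h / 2 * k2[0], s[1] + h / 2 * k2[1]))
    k4 = deriv((s[0] + h * k3[0], s[1] + h * k3[1]))
    return (s[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            s[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

s = (0.01, 0.0)                    # 0.01 rad initial tilt, at rest
h = 1e-3
for _ in range(1000):              # integrate to t = 1
    s = rk4_step(s, h)

theta_exact = 0.01 * math.cosh(math.sqrt(omega2) * 1.0)
```

For these parameters the tilt grows by a factor of cosh(ω) ≈ 13 in one second: the pendulum falls, in agreement with the exact cosh solution.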

Now consider the following problem: is it possible to keep the system state x(t) in a neighborhood of the origin (or even approach the origin)? As it is, without the action of the input u, this is clearly not possible, but it can be with a suitable choice of an input u.

How do we do this? A rough idea is to choose u(t) such that when the pendulum is falling to the right, we apply a u(t) to move the cart to the right to catch up, and do the opposite when it is falling to the left. This is only a rough scenario, and it need not work without a more elaborate choice of u.


There is another serious problem: even if we happen to find a suitable control input u, it depends on the initial state x0. If we start from a different initial state x̃, then we need to find a different u for this x̃. This appears to be an impossibly complicated problem: for each initial state x̃, we need to find a suitable input function u corresponding to x̃.

Note that this is very different in nature from the seemingly easy appearance of the differential equations (10.3), (10.4). The objective here is not merely to solve them. Indeed, it is almost trivial to give a concrete solution,

x(t) = e^{At}x0 + ∫_0^t e^{A(t−τ)}Bu(τ) dτ;    (10.9)

this is far from our objective to design the controller here to make the total system behave desirably.
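Formula (10.9) can be verified numerically in a case where e^{At} is available in closed form (a sketch on a hypothetical double integrator, not the book's example): for A = [[0, 1], [0, 0]] and B = [0, 1]^T, e^{At} = [[1, t], [0, 1]], and with x0 = 0 and constant u ≡ 1, carrying out the integral in (10.9) gives x(t) = (t²/2, t).

```python
# Check (10.9) against direct integration for the double integrator
#   dx1/dt = x2,  dx2/dt = u   (hypothetical example, not from the book).
# With x0 = 0 and u = 1, (10.9) evaluates in closed form to x(t) = (t^2/2, t).

def euler(t_end, h=1e-5):
    """Forward-Euler integration of the double integrator with u = 1."""
    x1 = x2 = 0.0
    for _ in range(round(t_end / h)):
        x1, x2 = x1 + h * x2, x2 + h * 1.0
    return x1, x2

x1, x2 = euler(1.0)
# (10.9) predicts x(1) = (0.5, 1.0); Euler should agree to O(h)
```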

The notion of feedback is particularly useful. It uses the measured output signal y(t) and modifies the input signal u(t) accordingly. A simple example is a linear feedback. This gives a control input u as a linear function of the output y or the state x. The former is called an output feedback, while the latter is called a state feedback. Consider the simplest case where u(t) = Kx(t), where K is a constant matrix.99 Substituting this into (10.3), we obtain a new differential equation,

dx/dt(t) = (A + BK)x(t),    (10.10)

and this changes the system dynamics completely. Part of the problem in control can be reduced to designing a suitable K to make the system behave desirably. But what do we mean by desirable? One requirement is stability. The linear system (10.3) is said to be (asymptotically) stable if e^{At}x → 0 as t → ∞ for every initial state x. It is unstable if it is not stable. System (10.3) is stable if and only if the real parts of all eigenvalues of A are negative. The inverted pendulum in Example 10.1.1 is clearly unstable. Note that the input term B does not enter into this definition. The role of feedback as in (10.10) is to change the dynamics A as A → A + BK with some K. How can we achieve stability then?
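A small numerical illustration of (10.10) (a sketch on a hypothetical double integrator, not the pendulum): A = [[0, 1], [0, 0]], B = [0, 1]^T has the unstable double eigenvalue 0, but the state feedback K = [−1, −2] gives A + BK = [[0, 1], [−1, −2]], whose characteristic polynomial (s + 1)² puts both closed-loop eigenvalues at −1.

```python
# State feedback u = K x on a hypothetical double integrator:
#   A = [[0, 1], [0, 0]],  B = [0, 1]^T,  K = [-1, -2]
#   A + BK = [[0, 1], [-1, -2]]  ->  characteristic polynomial (s + 1)^2.
A_cl = [[0.0, 1.0], [-1.0, -2.0]]    # closed-loop dynamics A + BK

def f(x):
    return [A_cl[0][0] * x[0] + A_cl[0][1] * x[1],
            A_cl[1][0] * x[0] + A_cl[1][1] * x[1]]

def rk4_step(x, h):
    k1 = f(x)
    k2 = f([x[i] + h / 2 * k1[i] for i in range(2)])
    k3 = f([x[i] + h / 2 * k2[i] for i in range(2)])
    k4 = f([x[i] + h * k3[i] for i in range(2)])
    return [x[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(2)]

x = [1.0, 0.0]                       # nonzero initial state
h = 1e-3
for _ in range(10000):               # simulate to t = 10
    x = rk4_step(x, h)
# exact closed-loop solution: x1(t) = (1 + t) e^{-t}, about 5e-4 at t = 10
```

The open-loop state would drift forever; the feedback-modified dynamics pull it to the origin, which is exactly the A → A + BK mechanism described above.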

Lord Maxwell's paper [34] is probably the first tractable theoretical account on stability of control systems. There he derived a condition for stability of a third-order linear differential equation obtained from an equation of a centrifugal governor. Needless to say, centrifugal governors played a crucial role in Watt's steam engine invention, and its stability was obviously of central importance. The problem is how one can obtain a condition on coefficients without solving for the characteristic roots of the system matrix A.

If Σ is not stable, we may want to modify the system (for example, by feedback) to make it stable. This is called a stabilization problem. In other words, we want to study properties of A, B, and C or combinations thereof to find out how Σ can or cannot be stabilized. Solving (10.3), (10.4) is only a small part of it, or may not even be necessary, to arrive at a solution.

However, before asking more elaborate system questions, let us ask a fundamental question: Can Σ be controlled at all? This is no doubt one of the most basic questions that can be raised if one wants to control Σ.

99 In general, we should consider output feedback, and the feedback may also depend on the past values of y, not just the present value y(t); but this simple example case gives rise to a generic construction in conjunction with a state estimator often known as the Kalman filter [24] or an observer.


Somewhat surprisingly, it was relatively recently, with the advent of so-called modern control theory in the 1950s (see [25]), that such a question was properly addressed in the literature. Part of the reason was that classical control theory did not employ the state space model (10.3), (10.4) but described its direct input to output relationship via so-called transfer functions. We will come back to this issue later, but let us proceed to formulate the notion of controllability.

    10.3 Controllability and Observability

Controllability and observability are two fundamental notions in modern system and control theory. The former deals with the capability of how control inputs can affect the system, while the latter is concerned with how we can identify the internal behavior of the system by observing the output. Interestingly, for finite-dimensional linear systems such as (10.3)-(10.4), they are completely dual to each other. Let us start our discussion with controllability.

    10.3.1 Controllability

    Let x_0, x_1 ∈ R^n be two arbitrary elements in the state space R^n. We say that system Σ is controllable if there exists a control input u and a time interval [t_0, t_1) such that u steers the initial state x_0 given at time t_0 to the final state x_1 at time t_1 along the trajectory given by (10.3). Since (10.3) is clearly invariant under time shifts, we may take t_0 = 0. Formalizing this and using (10.9), we arrive at the following definition.

    Definition 10.3.1. System Σ defined by (10.3)–(10.4) is said to be controllable if for every pair x_0, x_1 ∈ R^n, there exists an input function u given on some interval [0, t_1) such that

        x_1 = e^{At_1} x_0 + ∫_0^{t_1} e^{A(t_1−τ)} B u(τ) dτ.    (10.11)

    If this holds for x_0, x_1 ∈ R^n, we also say that x_0 is controllable to x_1 in time t_1. See Figure 10.2.

    A related notion is reachability. We say that x_1 is reachable from 0 if (10.11) holds for x_0 = 0.^100 System Σ is said to be reachable if every x ∈ R^n is reachable from 0.

    Note here that for a fixed pair (x_0, x_1), t_1 and u need not be unique, and hence the trajectory connecting these two states is not unique, either (Figure 10.2).

    Reachability as above is clearly a special case of controllability. However, the two notions are actually equivalent in the present context. We state this as a lemma.

    Lemma 10.3.2. System Σ as above is controllable if and only if it is reachable.

    Proof. We need only prove the sufficiency. Suppose that Σ is reachable. We want to show that for every pair x_0, x_1 ∈ R^n, x_0 is controllable to x_1 (or x_1 can be reached from x_0) by some action of an input.

    100 We also say that x_1 is reachable from 0 in time t_1.


    Figure 10.2. Control steering x_0 to x_1

    First, there exist T_1 and w such that

        x_1 = ∫_0^{T_1} e^{A(T_1−τ)} B w(τ) dτ.    (10.12)

    Since every state must be reachable from 0, we see that −e^{AT_0} x_0 is also reachable from 0 for a suitable T_0; that is, there exist u and T_0 such that

        −e^{AT_0} x_0 = ∫_0^{T_0} e^{A(T_0−τ)} B u(τ) dτ.    (10.13)

    Let T := max{T_0, T_1}. Note that we may replace both T_0 and T_1 above by T as follows:

        −e^{AT} x_0 = ∫_0^{T} e^{A(T−τ)} B u(τ) dτ

    and

        x_1 = ∫_0^{T} e^{A(T−τ)} B w(τ) dτ.

    If T > T_0, then set u to zero on [T_0, T) to obtain

        −e^{AT} x_0 = e^{A(T−T_0)} ∫_0^{T_0} e^{A(T_0−τ)} B u(τ) dτ = ∫_0^{T} e^{A(T−τ)} B u(τ) dτ    (10.14)

    because u is zero on [T_0, T). Similarly, if T > T_1, set

        w̃(t) := { 0,                0 ≤ t < T − T_1,
                   w(t − T + T_1),   T − T_1 ≤ t < T,


    to obtain

        x_1 = ∫_0^{T} e^{A(T−τ)} B w̃(τ) dτ.    (10.15)

    Defining v(t) := u(t) + w̃(t) for 0 ≤ t < T then easily yields

        e^{AT} x_0 + ∫_0^{T} e^{A(T−τ)} B v(τ) dτ
            = e^{AT} x_0 + ∫_0^{T} e^{A(T−τ)} B u(τ) dτ + ∫_0^{T} e^{A(T−τ)} B w̃(τ) dτ = x_1,

    by (10.14) and (10.15), as desired.

    If the state space is finite-dimensional, reachability implies that the system is reachable in a uniformly bounded time.

    Proposition 10.3.3. Suppose that system Σ defined by (10.3) is reachable. Then there exists T > 0 such that for every x ∈ R^n there exists u such that

        x = ∫_0^{T} e^{A(T−τ)} B u(τ) dτ.

    Sketch of proof. Consider a basis x_1, . . . , x_n for R^n, and take suitable u_1, . . . , u_n that give

        x_i = ∫_0^{T_i} e^{A(T_i−τ)} B u_i(τ) dτ.

    Take T := max{T_1, . . . , T_n}, and modify the u_i suitably as in the proof of Lemma 10.3.2 to obtain

        x_i = ∫_0^{T} e^{A(T−τ)} B u_i(τ) dτ.

    Express x as a linear combination of the x_i, and also take u as the corresponding linear combination of the u_i. This u drives 0 to x in time T.

    We now give some criteria for reachability. Given (10.3), consider the mapping

        R_T : L^2[0, T] → R^n : u ↦ ∫_0^{T} e^{At} B u(t) dt.    (10.16)

    It is easy to see that x ∈ R^n is reachable from zero in time T if and only if x ∈ im R_T.

    Now let R_T^* : R^n → L^2[0, T] be the adjoint operator of R_T. According to Example 5.5.2, the adjoint operator R_T^* is given by

        R_T^* : R^n → L^2[0, T] : x ↦ B^* e^{A^*t} x.

    Now consider the composed operator M_T := R_T R_T^* : R^n → R^n. We first show the following lemma.

    Lemma 10.3.4. The kernel of M_T consists of the elements that are not reachable from zero in time T, except 0 itself.


    Proof. Let x ∈ ker M_T, i.e., M_T x = 0. Then

        x^* M_T x = ∫_0^{T} x^* e^{At} B B^* e^{A^*t} x dt = 0.

    Hence

        ∫_0^{T} ‖x^* e^{At} B‖^2 dt = 0,

    where ‖·‖ denotes the Euclidean norm. Since the integrand is continuous, it follows that x^* e^{At} B ≡ 0 on [0, T].

    Now suppose that there exists u ∈ L^2[0, T] such that

        x = ∫_0^{T} e^{A(T−τ)} B u(τ) dτ.

    Multiply x^* from the left to obtain

        x^* x = ∫_0^{T} x^* e^{A(T−τ)} B u(τ) dτ = ∫_0^{T} x^* e^{At} B u(T − t) dt = 0,

    and hence x = 0.

    We are now ready to give the following theorem.

    Theorem 10.3.5. The following conditions are equivalent:

    1. System Σ defined by (10.3) is reachable.

    2. M_T is of full rank, i.e., it is invertible.

    3.  rank [B, AB, . . . , A^{n−1}B] = n,    (10.17)

    where n is the dimension of the state space, i.e., the size of A.

    Proof. Suppose Σ is reachable. Then by Lemma 10.3.4, ker M_T = 0. Since M_T is a square matrix, this means that M_T must be of full rank.

    Conversely, suppose that M_T is invertible. Then Theorem 3.3.1 (p. 75) implies that

        u(t) := R_T^* M_T^{−1} x

    gives a solution to R_T u = x; that is, x is reachable from 0 in time T by input u ∈ L^2[0, T]. (Moreover, this u is the input with minimum norm, according to Theorem 3.3.1.) Hence property 2 implies reachability.

    Now suppose that condition 2 fails; i.e., there exists a nonzero x ∈ ker M_T. Then, as in the proof of Lemma 10.3.4, x^* e^{At} B ≡ 0 on [0, T]. Differentiating successively and evaluating at t = 0, we see that

        x^* [B, AB, . . . , A^{n−1}B] = 0.    (10.18)


    Since x ≠ 0, [B, AB, . . . , A^{n−1}B] cannot have full rank. Hence condition 3 implies condition 2.

    Conversely, suppose that (10.17) fails; i.e., there exists a nonzero vector x such that (10.18) holds. Note that by the Cayley–Hamilton theorem

        A^n = −a_1 A^{n−1} − · · · − a_n I.    (10.19)

    Hence we have

        x^* A^k B = 0,  k = 0, 1, 2, . . . .

    This implies that x^* e^{At} B is identically zero, since it is an analytic function and its Taylor coefficients are all zero. Hence B^* e^{A^*t} x ≡ 0, which readily implies x ∈ ker M_T.

    Remark 10.3.6. Note that M_T is always a nonnegative definite matrix. Hence condition 2 above is equivalent to M_T being positive definite.
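As a quick illustration of condition 3 of Theorem 10.3.5, the rank of the controllability matrix [B, AB, . . . , A^{n−1}B] can be checked numerically. The following sketch uses only the Python standard library (exact rational arithmetic, so no round-off issues); the matrices A, B are hypothetical examples, not taken from the text.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rank(M):
    """Rank over the rationals by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def ctrb(A, B):
    """Controllability matrix [B, AB, ..., A^(n-1)B]."""
    n, blocks, AkB = len(A), [], B
    for _ in range(n):
        blocks.append(AkB)
        AkB = mat_mul(A, AkB)
    # stack the n x 1 blocks side by side
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

A = [[0, 1, 0], [0, 0, 1], [-6, -11, -6]]   # hypothetical companion-form A
B = [[0], [0], [1]]
r = rank(ctrb(A, B))                        # rank is 3 = n, so (A, B) is reachable
```

For a pair failing condition 3 (e.g., a repeated column in B with a diagonal A), the same computation returns a rank smaller than n.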

    A remarkable consequence of controllability is that it enables the property known as pole-shifting. We state this as a theorem without giving a detailed proof.

    Theorem 10.3.7 (pole-shifting/pole-assignment theorem). Let Σ be the system defined by (10.3), and suppose that it is controllable. Take any complex numbers λ_1, . . . , λ_n. Then there exists an m × n matrix K such that the characteristic polynomial φ(s) = det(sI − A + BK) has zeros λ_1, . . . , λ_n.

    For a complete proof, we refer the reader to standard textbooks such as [19, 23, 56]. We here give a simple proof of the sufficiency for the special case of single inputs, i.e., m = 1 in (10.3).

    Proof of sufficiency for the case m = 1. In this case, B is a column vector, and we write b for B. By Theorem 10.3.5,

        {b, Ab, . . . , A^{n−1}b}

    is linearly independent and hence forms a basis for R^n. Let det(sI − A) = s^n + a_1 s^{n−1} + · · · + a_n be the characteristic polynomial of A. Consider the set of vectors

        e_1 = A^{n−1}b + a_1 A^{n−2}b + · · · + a_{n−1}b,
        ...
        e_{n−1} = Ab + a_1 b,
        e_n = b.

    This is a triangular linear combination of {b, Ab, . . . , A^{n−1}b} and hence forms a basis. With respect to this basis, it is easy to see that A and b take the form

        A = [  0      1        0       · · ·   0
               0      0        1       · · ·   0
               ...
               0      0        0       · · ·   1
              −a_n  −a_{n−1}  −a_{n−2}  · · ·  −a_1 ],    b = [ 0
                                                                0
                                                                ...
                                                                0
                                                                1 ].    (10.20)

  • 7/26/2019 From Vector spaces to function spces

    19/267

    212 Chapter 10. Applications to Systems and Control

    Let φ(s) = s^n + β_1 s^{n−1} + · · · + β_n be the monic polynomial with roots λ_1, . . . , λ_n. It suffices to realize any such φ(s) as det(sI − A + bK). Just choose

        K := [β_n − a_n, β_{n−1} − a_{n−1}, . . . , β_1 − a_1].

    Then A − bK takes the companion form

        A − bK = [  0      1        0       · · ·   0
                    0      0        1       · · ·   0
                    ...
                    0      0        0       · · ·   1
                   −β_n  −β_{n−1}  −β_{n−2}  · · ·  −β_1 ].

    Hence det(sI − A + bK) = φ(s) = s^n + β_1 s^{n−1} + · · · + β_n.
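The gain formula K = [β_n − a_n, . . . , β_1 − a_1] from the proof is easy to exercise numerically once A, b are in the companion form (10.20). A minimal sketch (with hypothetical pole locations; Python standard library only):

```python
def monic_from_roots(roots):
    """Descending coefficients [1, c1, ..., cn] of prod_i (s - r_i)."""
    c = [1.0]
    for r in roots:
        # multiply the current polynomial by (s - r)
        c = [a - r * b for a, b in zip(c + [0.0], [0.0] + c)]
    return c

def place_companion(a, desired):
    """Feedback gain K (a row vector) so that det(sI - A + bK) has the
    desired roots, assuming (A, b) is in companion form (10.20).
    a = [a1, ..., an] from det(sI - A) = s^n + a1 s^{n-1} + ... + an."""
    beta = monic_from_roots(desired)[1:]          # [beta1, ..., betan]
    # K = [betan - an, beta_{n-1} - a_{n-1}, ..., beta1 - a1]
    return [bi - ai for bi, ai in zip(reversed(beta), reversed(a))]

a = [3.0, 2.0]                        # det(sI - A) = s^2 + 3s + 2 (poles -1, -2)
K = place_companion(a, [-5.0, -6.0])  # desired polynomial s^2 + 11s + 30
# K = [30 - 2, 11 - 3] = [28, 8]
```

Subtracting bK only alters the last row of the companion matrix, which is exactly why the closed-loop characteristic polynomial can be assigned freely.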

    Remark 10.3.8. Naturally, when the λ_i's are complex, K will also be a complex matrix. In order that K be real, the λ_i's should satisfy the condition that the set {λ_1, . . . , λ_n} be symmetric with respect to the real axis; i.e., if λ belongs to this set, then its complex conjugate λ̄ should also belong to it.

    10.3.2 Observability

    A completely dual notion is observability. For simplicity, we confine ourselves to system (10.3)–(10.4).

    Conceptually, observability means the determinability of the initial state of the system under the assumption of output observation with suitable application of an input. Let x_0 ∈ R^n be an initial state of system (10.3)–(10.4). We say that x_0 is indistinguishable from 0 if the output derived from x_0 satisfies

        C e^{At} x_0 = 0    (10.21)

    for all t ≥ 0. That is, there is no way to distinguish x_0 from 0 by observing its output. Note that the input term can play no role in view of the linearity in u entering into the solution (10.9). We say that x_0 is distinguishable if it is not indistinguishable from 0, i.e., if there exists T ≥ 0 such that

        C e^{AT} x_0 ≠ 0.    (10.22)

    This leads to the following definition.

    Definition 10.3.9. The system (10.3)–(10.4) is said to be observable if every nonzero state x_0 ∈ R^n is distinguishable from 0.

    Some remarks are in order. First, in principle, there is no uniform upper bound for T in (10.22) for different x_0 even if the system is observable. However, when the system is finite-dimensional, as in the present situation, there is indeed an upper bound for T.

    We first note the following lemma.

    Lemma 10.3.10. Suppose that the initial state x ∈ R^n satisfies C e^{At} x ≡ 0 on [0, T] for some T > 0. Then C A^k x = 0 for every nonnegative integer k, and C e^{At} x ≡ 0 for all t ≥ 0.

    Proof. Evaluate d^k(C e^{At} x)/dt^k at t = 0 to obtain C x = C A x = · · · = 0.


    Since C e^{At} x is a real analytic function in t, it readily follows that it is identically zero on the real axis.

    Fix T > 0, and consider the mapping

        O_T : R^n → L^2[0, T] : x ↦ C e^{At} x.    (10.23)

    In view of Lemma 10.3.10 above, system (10.3)–(10.4) is observable if and only if (10.23) is an injective mapping for some T > 0 (indeed, for any T > 0).

    The adjoint operator of O_T is given by

        O_T^* : L^2[0, T] → R^n : u ↦ ∫_0^{T} e^{A^*t} C^* u(t) dt

    according to Example 5.5.3 on page 97. Consider the composed operator

        N_T := O_T^* O_T : R^n → R^n.

    We have the following theorem, which is dual to Theorem 10.3.5.

    Theorem 10.3.11. The following conditions are equivalent:

    1. System Σ defined by (10.3) and (10.4) is observable.

    2. N_T is of full rank for every T > 0, i.e., it is invertible.

    3.  rank [ C
               CA
               ...
               CA^{n−1} ] = n,    (10.24)

    where n is the dimension of the state space, i.e., the size of A.

    Proof. Suppose Σ is observable, and suppose that N_T x = 0 for some T > 0 and x ∈ R^n. Then

        x^* N_T x = ∫_0^{T} x^* e^{A^*t} C^* C e^{At} x dt = 0.

    Hence

        ∫_0^{T} ‖C e^{At} x‖^2 dt = 0.

    This readily implies (cf. the proof of Lemma 10.3.4) C e^{At} x ≡ 0 on [0, T], which in turn yields C e^{At} x ≡ 0 for all t ≥ 0 by Lemma 10.3.10. Since Σ is observable, x must be zero. Hence N_T is of full rank.

    Now suppose that N_T is of full rank, i.e., it is invertible. Suppose also that (10.24) fails. Then by the Cayley–Hamilton theorem (see (10.19)), C A^k x = 0 for every nonnegative integer k for some nonzero x (cf. the proof of Theorem 10.3.5). This implies C e^{At} x ≡ 0 for every t ≥ 0, and hence N_T x = 0. This is a contradiction; hence condition 2 implies condition 3.


    Suppose now that Σ is not observable. Then there exists a nonzero x such that C e^{At} x ≡ 0 for every t ≥ 0. Lemma 10.3.10 readily yields C A^k x = 0 for every nonnegative integer k, which contradicts (10.24).

    Finally, suppose that (10.24) fails. Then there exists a nonzero x such that C A^k x = 0 for k = 0, . . . , n − 1. By the Cayley–Hamilton theorem, this implies C A^k x = 0 for all k ≥ 0. It follows that C e^{At} x ≡ 0, t ≥ 0, contradicting observability.

    10.3.3 Duality between Reachability and Observability

    Theorems 10.3.5 and 10.3.11 are in clear duality. In this subsection, we will further establish a more direct relationship.

    Let Σ be given by (10.3) and (10.4). Its dual system, denoted Σ^*, is defined by

        Σ^* :  dz/dt (t) = A^* z(t) + C^* v(t),
               w(t) = B^* z(t),    (10.25)

    where ^* denotes the adjoint (conjugate transpose), and z(t) ∈ R^n, v(t) ∈ R^p, w(t) ∈ R^m.

    The following theorem is immediate from Theorems 10.3.5 and 10.3.11, particularly (10.17) and (10.24).

    Theorem 10.3.12. System Σ given by (10.3) and (10.4) is reachable if and only if Σ^* given by (10.25) is observable; Σ is observable if and only if Σ^* is reachable.
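In the finite-dimensional setting, the duality of Theorem 10.3.12 can be seen very concretely: the observability matrix of (A, C) is the transpose of the controllability matrix of the dual pair (A^*, C^*) (real matrices here, so ^* is plain transposition). A small sketch with a hypothetical pair, using only the Python standard library:

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def ctrb(A, B):
    """[B, AB, ..., A^(n-1)B], blocks placed side by side."""
    n, blocks, AkB = len(A), [], B
    for _ in range(n):
        blocks.append(AkB)
        AkB = mat_mul(A, AkB)
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def obsv(A, C):
    """[C; CA; ...; CA^(n-1)], blocks stacked vertically."""
    rows, CAk = [], C
    for _ in range(len(A)):
        rows.extend(CAk)
        CAk = mat_mul(CAk, A)
    return rows

A = [[0, 1], [-2, -3]]      # hypothetical 2-state example
C = [[1, 0]]
# observability of (A, C) vs. controllability of the dual pair (A^T, C^T)
dual_check = obsv(A, C) == transpose(ctrb(transpose(A), transpose(C)))
```

Since rank is invariant under transposition, conditions (10.17) for the dual pair and (10.24) for the original pair are one and the same statement.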

    We now give a more intrinsic interpretation of this fact. First suppose for simplicity that Σ is stable. This is by no means necessary but makes the subsequent treatment easier.

    Define the mappings R and O as follows:

        R : (L^2(−∞, 0])^m → R^n : u ↦ ∫_{−∞}^{0} e^{−At} B u(t) dt,    (10.26)

        O : R^n → (L^2[0, ∞))^p : x ↦ C e^{At} x.    (10.27)

    Dually, we also define

        R̃ : (L^2(−∞, 0])^p → R^n : v ↦ ∫_{−∞}^{0} e^{−A^*t} C^* v(t) dt,    (10.28)

        Õ : R^n → (L^2[0, ∞))^m : z ↦ B^* e^{A^*t} z.    (10.29)

    The following lemma is then readily obvious.

    Lemma 10.3.13. System (10.3)–(10.4) is reachable if and only if R is surjective, and is observable if and only if O is injective. Similarly, the dual system Σ^* is reachable if and only if R̃ is surjective, and is observable if and only if Õ is injective.

    Now define the following bilinear form between (L^2(−∞, 0])^ℓ and (L^2[0, ∞))^ℓ:

        (L^2(−∞, 0])^ℓ × (L^2[0, ∞))^ℓ → C : (φ, ψ) ↦ ⟨φ, ψ⟩ := ∫_{−∞}^{0} ψ(−t)^* φ(t) dt = ∫_{0}^{∞} ψ(t)^* φ(−t) dt,
    (10.30)


    where ℓ is either m or p. With respect to this duality, it is easily seen that ((L^2(−∞, 0])^ℓ)^* = (L^2[0, ∞))^ℓ and ((L^2[0, ∞))^ℓ)^* = (L^2(−∞, 0])^ℓ. The following proposition gives a duality between reachability and observability.

    Proposition 10.3.14. With respect to the duality (10.30) above, the following duality relations hold:

        R^* = Õ,  O^* = R̃.    (10.31)

    Proof. Apply Example 5.5.2 (p. 96) in Chapter 5 to (10.26) with K(t) := e^{−At}B, a = −∞, b = 0.^101 By reversing the time direction as demanded by (10.30), we obtain K(−t)^* = B^* e^{A^*t}, and hence R^* = Õ. The same is true for O.

    Hence if Σ is reachable, R is surjective, and by Proposition 5.2.3 in Chapter 5, R^* = Õ must be injective, and hence Σ^* is observable. The same is true for O. This readily yields Theorem 10.3.12 and establishes a complete duality between reachability and observability in a function space setting.

    10.4 Input/Output Correspondence

    The system description (10.3)–(10.4) gives the following flows of signals: inputs → states and states → outputs.

    There is also a contrasting viewpoint that directly describes the correspondence from inputs to outputs. In so-called classical control theory, one often deals with such a correspondence.

    Let an input u be applied to (10.3) on the interval [0, t) with initial state x at time 0. Then we have

        x(t) = e^{At} x + ∫_0^{t} e^{A(t−τ)} B u(τ) dτ,

    and hence

        y(t) = C e^{At} x + ∫_0^{t} C e^{A(t−τ)} B u(τ) dτ.    (10.32)

    We have a family of input to output correspondences

        { u ↦ C e^{At} x + ∫_0^{t} C e^{A(t−τ)} B u(τ) dτ : x ∈ R^n }    (10.33)

    parametrized by state x.

    Hence, strictly speaking, the correspondence from inputs to outputs is not a mapping; that is, an input gives rise to a family of output functions parametrized by the state x, which is unknown. To bypass this difficulty, in the approach of classical control theory, one assumes that x is zero in the above and considers the correspondence

        u ↦ ∫_0^{t} C e^{A(t−τ)} B u(τ) dτ.    (10.34)

    101 With a slight abuse of the result, because in Example 5.5.2 [a, b] is a bounded interval; but this is not essential.


    This is perhaps an acceptable assumption when the underlying system is a priori known to be stable, so that an effect due to the initial state will soon decay to zero. However, strictly, or philosophically, speaking, this is rather unsatisfactory, and one should attack the correspondence (10.33) head-on. This is the viewpoint of behavioral system theory proposed by Willems, and it is studied extensively in the literature; see [44].

    However, we will not pursue this relatively modern approach here and will content ourselves with the classical viewpoint, setting the initial state x = 0, mainly for technical convenience.

    Another rationale for assuming the initial state to be zero is the following: Almost every linear system arises as a linearized approximation of a nonlinear system. This linearization is performed around an equilibrium trajectory, and the state variable x(t) represents a deviation from such a reference trajectory. In this sense it is rather natural to assume that the initial state x(0) is zero. See Figure 10.3.

    Figure 10.3. Linearization around a reference trajectory

    Under this assumption, (10.32) takes the form

        y(t) = ∫_0^{t} C e^{A(t−τ)} B u(τ) dτ = ∫_0^{t} g(t − τ) u(τ) dτ,    (10.35)

    where g(t) := C e^{At} B. In other words, y = g ∗ u, and g is called the impulse response of system Σ. This is because g = g ∗ δ; that is, g is obtained as the response against the impulse δ. The Laplace transform L[g] is called the transfer function (matrix) of Σ. These notions were already encountered in a more specialized situation in Chapter 6, section 6.7 (p. 131).
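The convolution (10.35) is straightforward to approximate numerically by a Riemann sum. The sketch below takes the hypothetical scalar system A = −1, B = C = 1, whose impulse response is g(t) = e^{−t}, and checks a step response against the closed-form answer 1 − e^{−t} (standard library only; the step size and horizon are illustrative choices):

```python
import math

def convolve_response(g, u, T, N):
    """Left Riemann-sum approximation of y(t) = int_0^t g(t - tau) u(tau) dtau
    on the grid t_k = k*T/N, k = 0, ..., N."""
    h = T / N
    t = [k * h for k in range(N + 1)]
    y = [h * sum(g(tk - tau) * u(tau) for tau in t[:k])
         for k, tk in enumerate(t)]
    return t, y

g = lambda t: math.exp(-t)        # impulse response of A = -1, B = C = 1
u = lambda t: 1.0                 # unit step input

t, y = convolve_response(g, u, 5.0, 2000)
# exact step response is 1 - exp(-t); a first-order sum matches it to O(h)
err = max(abs(yk - (1.0 - math.exp(-tk))) for tk, yk in zip(t, y))
```

The same routine, with g replaced by any sampled Ce^{At}B, approximates the zero-initial-state output of (10.35) for arbitrary inputs.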


    10.5 Realization

    In the previous sections, we have given two different types of representations for linear systems: one via the differential equation description (10.3)–(10.4), and the other via the input/output correspondence y = g ∗ u (under the assumption that the initial state is zero). Let us now denote the system described by (10.3)–(10.4) by the triple (A, B, C). The dimension of the state is implicitly specified by the size of A.

    Given the triple (A, B, C), the impulse response g is given by C e^{At} B, t ≥ 0, and the transfer function by C(sI − A)^{−1}B. The realization problem asks the following converse question.

    The realization problem. Given a p × m matrix function g with support in [0, ∞), find a system Σ = (A, B, C) such that g(t) = C e^{At} B for all t ≥ 0. The obtained system Σ = (A, B, C) is called a realization of g.

    If necessary, assume sufficient regularity for g. For t < 0, we set g(t) = 0, and hence it will usually have a discontinuity at t = 0, as shown in Figure 10.4. Naturally, not every such g can be realized by a finite-dimensional system as above. For example, g(t) = e^{t^2} is clearly seen to be nonrealizable by any (A, B, C) because it grows much faster than any exponential order.

    Figure 10.4. Impulse response

    Even if g is realizable by some (A, B, C), the realization is not necessarily unique. For example, consider

        g(t) := { e^{−t},  t ≥ 0,
                  0,       t < 0.


    A realization is said to be minimal if it possesses the least dimension among all realizations for a given impulse response. It is known that a minimal realization is essentially uniquely determined (up to basis change in the state space) for a given impulse response [23].

    In general, a reasonably regular (smooth) g always admits a realization, but not necessarily with a finite-dimensional state space. We say that a linear system is finite-dimensional if the state space (in which x(t) resides) is finite-dimensional.

    For example, consider the input/output relation

        u(t) ↦ u(t − 1),    (10.38)

    which represents a unit time delay. In order to represent this relation, it is clear that we need to store a complete function piece defined on a unit time interval, and hence a function space consisting of all such functions. For example, if we consider L^2 inputs u, this space will become L^2[0, 1], which is clearly not finite-dimensional.

    The following criterion for finite-dimensional realizability is well known.

    Theorem 10.5.2. Let g be an impulse response. A necessary and sufficient condition for g to admit a finite-dimensional realization is that the associated transfer function matrix ĝ(s) be a strictly proper rational function of s.

    Proof. For simplicity, we give a proof for a single-input/single-output system, i.e., the case m = p = 1.

    Necessity. Let g(t) = C e^{At} B for some A, B, C. Then ĝ(s) = C(sI − A)^{−1}B, and this is clearly rational in s. Rewriting C(sI − A)^{−1}B in terms of Cramer's rule, we immediately see that this is strictly proper.

    Sufficiency. Conversely, suppose that ĝ(s) = b(s)/a(s), where deg b < deg a. Without loss of generality, we can take a to be a monic polynomial; i.e., its highest order coefficient is 1. Write a(s) = s^n + a_1 s^{n−1} + · · · + a_n, and b(s) = b_1 s^{n−1} + · · · + b_n. Then the triple

        A := [  0      1        0     · · ·  0
                0      0        1     · · ·  0
                ...
                0      0        0     · · ·  1
               −a_n  −a_{n−1}       · · ·   −a_1 ],    B := [ 0
                                                              0
                                                              ...
                                                              0
                                                              1 ],    C := [b_n, b_{n−1}, . . . , b_1]

    is easily seen to satisfy C(sI − A)^{−1}B = b(s)/a(s). This readily yields g(t) = C e^{At} B.

    Exercise 10.5.3. Generalize the above to the matrix case.
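The companion-form construction in the sufficiency proof can be sketched in code. Below, the realization of a hypothetical ĝ(s) = (s + 1)/(s^2 + 3s + 2) = 1/(s + 2) is checked through its Markov parameters C A^k B, which must match the derivatives g^(k)(0+) of g(t) = e^{−2t}, namely 1, −2, 4 (exact rational arithmetic from the Python standard library):

```python
from fractions import Fraction

def realize(a, b):
    """Companion-form (A, B, C) for b(s)/a(s):
    a = [a1, ..., an] (monic denominator), b = [b1, ..., bn] (deg b < deg a)."""
    n = len(a)
    A = [[Fraction(1) if j == i + 1 else Fraction(0) for j in range(n)]
         for i in range(n - 1)]
    A.append([Fraction(-ai) for ai in reversed(a)])   # last row: -an, ..., -a1
    B = [[Fraction(0)] for _ in range(n - 1)] + [[Fraction(1)]]
    C = [list(map(Fraction, reversed(b)))]            # [bn, ..., b1]
    return A, B, C

def markov(A, B, C, k):
    """The scalar Markov parameter C A^k B."""
    v = [row[0] for row in B]
    for _ in range(k):
        v = [sum(Ai[j] * v[j] for j in range(len(v))) for Ai in A]
    return sum(C[0][j] * v[j] for j in range(len(v)))

# g^(s) = (s + 1)/(s^2 + 3s + 2): a = [3, 2], b = [1, 1]
A, B, C = realize([3, 2], [1, 1])
h = [markov(A, B, C, k) for k in range(3)]   # h = [CB, CAB, CA^2B]
```

Matching the Markov parameters is exactly matching the Taylor coefficients of g at 0+, which is why this check certifies the realization in the scalar case.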

    10.5.1 Transfer Functions and Steady-State Response

    Let us return to formula (10.35) for the input/output relations. In terms of the Laplace transform, this means (Theorem 8.1.2)

        ŷ = ĝ û.    (10.39)


    Figure 10.5. Feedback control system

    Figure 10.6. Input/output relationship with transfer function ĝ

    Figure 10.7. Steady-state response against e^{iωt}

    An advantage of this representation is that it can place the whole framework of control systems into an algebraic structure. For example, the feedback structure

        y = ĝ e,  e = u − h y

    can be schematically represented by the block diagram in Figure 10.5.

    A remarkable feature of this representation is that it yields a concise expression of the steady-state response against sinusoidal inputs (see Figure 10.7). Consider Figure 10.6, and suppose that (i) the system is stable and (ii) an input e^{iωt} u_0 is applied to the system. The following proposition states that the system output becomes asymptotically equal to a multiple of this sinusoid, with the same frequency ω, and no other frequencies are present.

    Proposition 10.5.4. Consider the linear system with a strictly proper rational transfer function ĝ as in Figure 10.6. Suppose that ĝ has its poles only in the open left half complex plane C_− = {s : Re s < 0}. Apply the input u(t) = H(t) e^{iωt} u_0 to this system. Then the output y(t) asymptotically approaches ĝ(iω) e^{iωt} u_0 as t → ∞.

    Proof. For simplicity, assume m = p = 1 and u_0 = 1. Let us first suppose that the initial state x_0 at t = 0 is zero. Then the response y against u must satisfy ŷ(s) = ĝ(s)û(s) = ĝ(s)/(s − iω). Since ĝ is analytic at s = iω, we can expand it as

        ĝ(s) = ĝ(iω) + (s − iω) g_1(s),


    where g_1(s) is analytic at iω and has no poles in the closed right half complex plane C_+ = {s : Re s ≥ 0}, because neither does ĝ. It follows that

        ĝ(s)/(s − iω) = ĝ(iω)/(s − iω) + g_1(s).

    Taking the inverse Laplace transform of the terms on the right, we see that the first term becomes ĝ(iω) e^{iωt}, and the second term approaches zero as t → ∞ because of the absence of closed right half plane poles. Hence the system response asymptotically approaches ĝ(iω) e^{iωt}.

    When there is a nonzero initial state, simply observe that the corresponding response approaches zero as t → ∞ due to the nonexistence of closed right half plane poles of ĝ.

    The response ĝ(iω) e^{iωt} is called the steady-state response against e^{iωt}.

    This proposition shows the following fact: For a stable linear system described by a transfer function ĝ, the steady-state response against a sinusoid e^{iωt} is proportional to it, obtained by multiplying by ĝ(iω). Rewriting ĝ(iω) e^{iωt}, we obtain

        ĝ(iω) e^{iωt} = |ĝ(iω)| e^{i(ωt + φ)},  φ = arg ĝ(iω),    (10.40)

    where arg ĝ(iω) denotes the argument (phase angle) of ĝ(iω). This means that in steady state, the amplitude is multiplied by |ĝ(iω)|, and the phase angle is shifted by arg ĝ(iω). The mapping

        R → C : ω ↦ ĝ(iω)    (10.41)

    is called the frequency response of the system.

    In general, when an input is applied to a system, it can be decomposed into frequency components via the Fourier transform, and the resulting output is a composite (i.e., the sum or integral) of the respective responses at all frequencies. The plot exhibiting the magnitude |ĝ(iω)| and the phase arg ĝ(iω) for each frequency is very useful in analyzing the steady-state response of ĝ. The plot that shows the curves of |ĝ(iω)| and arg ĝ(iω) against the frequency axis is called the Bode plot, named after Hendrik Bode, who devised this scheme. In particular, the plot exhibiting the gain |ĝ(iω)| is called the Bode magnitude plot, while that for the phase arg ĝ(iω) is called the Bode phase plot. To show the gain, its dB value (i.e., 20 log_10 |ĝ(iω)|) is used. This is often convenient for evaluating a composite effect when two systems are connected in cascade.^102 Figure 10.8 shows the Bode magnitude and phase plots of g(s) = 100/(s^2 + 2s + 100). This example shows that the system exhibits a highly sensitive response at ω = 10 [rad/sec].

    Observe also that the maximum of the Bode magnitude plot gives the H^∞ norm of ĝ.
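The numbers behind Figure 10.8 can be reproduced with complex arithmetic alone: evaluating g(iω) gives both curves of the Bode plot. At the resonance of g(s) = 100/(s^2 + 2s + 100), namely ω = 10, we get g(10i) = 100/(20i) = −5i, i.e., a gain of 5 (about 14 dB) and a phase of −90 degrees. A minimal sketch:

```python
import cmath
import math

def g(s):
    """g(s) = 100 / (s^2 + 2s + 100), evaluated at a complex point s."""
    return 100.0 / (s * s + 2.0 * s + 100.0)

def gain_db(omega):
    """Bode magnitude in dB at frequency omega [rad/sec]."""
    return 20.0 * math.log10(abs(g(1j * omega)))

def phase_deg(omega):
    """Bode phase in degrees at frequency omega [rad/sec]."""
    return math.degrees(cmath.phase(g(1j * omega)))

peak_gain = abs(g(1j * 10.0))   # = 5 at the resonant frequency omega = 10
```

Sampling gain_db and phase_deg over a logarithmic grid of frequencies reproduces the two curves of Figure 10.8, and the largest sampled gain approximates the H^∞ norm mentioned above.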

    10.6 H^∞ Control

    We are now at the point of discussing some basics of H^∞ control.

    Consider the control system depicted in Figure 10.9. Here P(s) is the plant to be controlled, and C(s) is the controller to be designed. Such a control system is usually called

    102 In this case, the transfer function is the product of the two component transfer functions, and the gain becomes the product at each frequency; hence the dB value can be obtained as the sum.


    Figure 10.8. Bode plots for 100/(s^2 + 2s + 100)

    Figure 10.9. Unity feedback system

    a unity feedback system in the sense that the feedback loop has the unity gain. The feedback signal is subtracted from the input, and the controller C(s) is driven by the error e. In this sense the closed-loop system has a negative feedback. The original idea of feedback, from the time of Watt, was to reduce the sensitivity of the control system against various fluctuations inherent to the system and operation conditions. Hence the error from the reference is to be measured, and this leads to the negative feedback. This structure is fairly standard. If one needs to incorporate a different gain in the feedback loop, that can be accomplished by suitably scaling the output y.

    The objective here is to design C(s) so as to make the characteristics of this closed-loop system desirable with respect to some performance index.

    Observe that

        y = P u,  u = C e,  e = r − y.

    Eliminating u and y, we obtain the transfer function S(s) = (1 + P(s)C(s))^{−1} from the reference input r to the error e. This S is called the sensitivity function. An objective here


    is to make this function small. This will accomplish small errors against references, even under modeling errors and fluctuations in the plant.

    Unfortunately, this objective cannot be accomplished uniformly over all frequencies (Bode's theorem [8]). If the gain of S becomes lower in a certain frequency range, it becomes higher somewhere else. On the other hand, making the sensitivity function uniformly small is not necessarily always required. There is usually a relatively important frequency range where S(iω) has to be small, while in other ranges this is not necessarily so. In view of this, one usually takes a weighting function W(s) (usually a proper and stable rational function) and then attempts to minimize W(iω)S(iω).

    A classical criterion is to take the H^2 norm as the performance index:

        J = inf_{C(s): stabilizing} ‖W(s)S(s)‖_2,    (10.42)

    where C runs over all controllers that make the closed-loop (i.e., the feedback) system stable. This is a minimization problem in the Hilbert space H^2, and the projection theorem as described in Theorem 3.2.7 (Chapter 3) can be invoked to derive its solution.

    While this H^2 minimization has played some historically important roles in classical optimal control theory, it gives only a limited estimate of the closed-loop performance. For example, if there is a sharp peak in |W(iω)S(iω)| at a certain frequency, it is not necessarily reflected in the H^2 norm as a large value, because the H^2 norm measures an integrated (i.e., averaged) performance over the whole frequency range.

    Around 1980, George Zames introduced the H^∞ performance index [69]

        γ_opt = inf_{C(s): stabilizing} ‖W(s)S(s)‖_∞    (10.43)

    and asserted that it gives a very natural performance criterion for many control objectives. We can safely say that this is the origin of H^∞ control theory.

    The performance index (10.43) attempts to minimize the maximum (supremum) of the Bode magnitude plot. Hence, if this index can be made small, it guarantees the worst-case response to be within a tolerable range and hence is desirable. On the other hand, since H^∞ is not a Hilbert space, the projection theorem, a key to many optimization problems, is not available to us here. Naturally, at the time, there was doubt as to whether this problem is solvable in sufficient generality as to be applicable to general control systems.

    However, contrary to such concerns, a general solution was found in subsequent developments within a few years [15, 70]. In due course, deep connections with Sarason's generalized interpolation theory (Theorem 9.5.2, p. 191), Nehari's theorem (Theorem 9.5.4, p. 192), and Nevanlinna–Pick interpolation theory (Theorem 9.6.4, p. 198) were found and clarified.

    This is one of the cornerstone accomplishments in control theory in the past 30 years, and it has become standard. It is now widely applied in many control problems.

    We here give solutions to an elementary H^∞ control problem. Our objective is to find the optimal value

        inf_{C(s): stabilizing} ‖W(s)(1 + P(s)C(s))^{−1}‖_∞    (10.44)

    and find a controller C accomplishing this value.
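Although computing the optimal value (10.44) exactly requires the machinery developed below, the H^∞ norm of a given weighted sensitivity is easy to estimate by gridding the imaginary axis, since it is the supremum of the Bode magnitude plot. The plant, controller, and weight in the following sketch are hypothetical choices, and the gridding is only a crude approximation of the supremum, not a rigorous computation:

```python
def P(s):
    """Hypothetical stable plant P(s) = 1/(s + 1)."""
    return 1.0 / (s + 1.0)

def W(s):
    """Weighting function; taken to be 1 here for simplicity."""
    return 1.0

def weighted_sens(omega, k):
    """|W(iw) (1 + P(iw) C(iw))^{-1}| for the constant controller C(s) = k."""
    s = 1j * omega
    return abs(W(s) / (1.0 + P(s) * k))

def hinf_estimate(k, w_max=1e4, n=20000):
    """Grid-based lower estimate of ||W S||_inf over omega in [0, w_max]."""
    return max(weighted_sens(w_max * i / n, k) for i in range(n + 1))

# For P = 1/(s+1) and C = k, S(s) = (s + 1)/(s + 1 + k): the gain rises from
# 1/(1 + k) at omega = 0 toward 1 at high frequency, so ||S||_inf = 1.
est = hinf_estimate(9.0)
```

The example also illustrates Bode's trade-off noted above: the constant gain k pushes |S| down at low frequencies but cannot reduce its high-frequency value below 1.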


    10.6.1 Preliminaries for the H^∞ Solution

    For brevity of exposition, let us assume that the plant P(s) is stable, i.e., it has poles only in the open left-half complex plane C_− := {s : Re s < 0}.

    SinceS(s) +T(s)1, however, it is impossible to attenuate these two measuressimultaneously. In many practical cases, S is required to be small in a low-frequencyrange, while

    T(s) needs to be small mainly in a high-frequency range. This is because the

    tracking performance is more important in low frequency (typically against the step inputs(see Remark 10.6.2 below) for which = 0), while plant uncertainties are more dominantin high frequency. To accommodate this constraint, we often employ weighting functionsW1,W2and attempt to minimize W1S W2T instead of just one sensitivity function. This problem is called the mixed sensitivity problem.See, e.g., [8, 14, 73] for more general and advanced materials.

    Remark 10.6.2. The "step input" here means a constant multiple of the Heaviside unit step function H(t). In many applications, a control command is given in the form of tracking such a signal. For example, when we ride on an elevator, we push the button of a target floor. This action is transferred to the system as a proper voltage value that should be tracked, and it is representable as a constant multiple of the unit step function. The Laplace transform of the unit step function is 1/s, and hence it corresponds to ω = 0 in the frequency response, i.e., the direct current component. In some other cases, one may wish to reject disturbances arising from an AC power supply, which is often represented as a sinusoid at 60 [Hz]. These simple examples show why we are interested in tracking/rejection properties in control systems.

    On the other hand, the complementary sensitivity function T = (1 + PC)^{-1}PC gives the correspondence r ↦ y in Figure 10.9. This has much uncertainty at high frequencies, particularly when identified from experimental data.

    10.7 Solution to the Sensitivity Minimization Problem

    We here give two different types of solutions: one via the Nevanlinna-Pick interpolation and the other via Nehari's theorem.

    However, before going into detail, let us note the difference between the treatment of Hardy spaces given in Chapter 9 and those needed in this section. While the Hardy spaces in Chapter 9 are given on D, we here deal with those defined on half planes in C. Typically, H2 is the space consisting of functions f analytic on the open right half plane C+ := {s : Re s > 0} and satisfying

        sup_{x>0} ∫_{−∞}^{∞} |f(x + iy)|² dy < ∞.

    The H2 norm of f ∈ H2 is defined by

        ‖f‖2 := sup_{x>0} ( (1/2π) ∫_{−∞}^{∞} |f(x + iy)|² dy )^{1/2}.    (10.48)
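As a numerical sanity check of (10.48) (an illustration, not from the book): for f(s) = 1/(s+1) one has ∫|f(x+iy)|² dy = π/(x+1), so the supremum is approached as x → 0+ and ‖f‖2 = 1/√2 ≈ 0.7071. A short quadrature sketch:

```python
import numpy as np
from scipy.integrate import quad

f = lambda s: 1.0 / (s + 1.0)

def h2_norm_on_line(x):
    # (1/2pi) * integral of |f(x + iy)|^2 over y, on the line Re s = x
    val, _ = quad(lambda y: abs(f(x + 1j * y)) ** 2, -np.inf, np.inf)
    return np.sqrt(val / (2 * np.pi))

# The supremum over x > 0 is approached as x -> 0+.
norm_est = h2_norm_on_line(1e-8)
print(norm_est)   # close to 1/sqrt(2) = 0.7071...
```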

    Similarly, the space (H2)⊥ is the one consisting of functions f analytic on the open left half plane C− := {s : Re s < 0} and satisfying the corresponding boundedness condition with the supremum taken over x < 0.


    Its norm is defined by

        ‖f‖2 := sup_{x<0} ( (1/2π) ∫_{−∞}^{∞} |f(x + iy)|² dy )^{1/2}.


    10.7.1 Solution via the Nevanlinna-Pick Interpolation

    The problem (10.46) is equivalent to finding the least γ_opt among all γ's that satisfy

        ‖W − mQ‖∞ ≤ γ,    (10.52)

    subject to Q ∈ H∞. We give a solution based on the Nevanlinna-Pick interpolation, following the treatment in [8]. Let us first characterize the γ's satisfying the suboptimal inequality (10.52).

    Take any γ > 0, and set

        G := (1/γ)(W − mQ).

    If Q ∈ H∞, then G belongs to H∞, but not necessarily conversely. We need some interpolation conditions.

    Let {s1, . . . , sn} be the zeros of m(s) in C+. For simplicity, we assume that these si's are all distinct. Since m is inner, there is no zero on the imaginary axis, and hence Re si > 0, i = 1, . . . , n. This readily means that the interpolation condition

        G(si) = (1/γ) W(si),   i = 1, . . . , n,

    must be satisfied. Conversely, if G ∈ H∞ satisfies this condition, then

        Q := m^{-1}(W − γG)

    clearly belongs to H∞. All unstable poles si arising from m^{-1} are cancelled by the numerator W − γG. Hence the optimum γ_opt is given by the minimum of all γ such that

        G(si) = (1/γ) W(si),   i = 1, . . . , n,   and   ‖G‖∞ ≤ 1

    is satisfied for some H∞ function G. This is nothing but the Nevanlinna-Pick interpolation problem with respect to the interpolation data

        s1, . . . , sn  ↦  W(s1)/γ, . . . , W(sn)/γ.

    We know from Theorem 9.6.4 (p. 198) that γ satisfies the condition above if and only if the Pick matrix P satisfies

        P = (pij) = ( (1 − W(si) W(sj)* / γ²) / (si + sj*) ) ≥ 0

    (positive semidefinite), where * denotes complex conjugation.

    The optimum γ_opt is the minimum of such γ. The corresponding Q is given by

        Q := (W − γ_opt G) / m.
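To make the procedure concrete, here is a small numerical sketch (the data are illustrative assumptions, not from the book): take an inner m(s) with zeros s1 = 1, s2 = 2 in C+, weight W(s) = 1/(s+1), and bisect on γ, checking positive semidefiniteness of the Pick matrix:

```python
import numpy as np

# Illustrative data (assumed, not from the text): inner m(s) with zeros
# s1 = 1, s2 = 2 in C+, and weight W(s) = 1/(s + 1).
s = np.array([1.0, 2.0])
w = 1.0 / (s + 1.0)                 # W(s_i)

def pick_psd(gamma):
    # Pick matrix  p_ij = (1 - w_i conj(w_j)/gamma^2) / (s_i + conj(s_j))
    P = (1.0 - np.outer(w, np.conj(w)) / gamma**2) \
        / (s[:, None] + np.conj(s)[None, :])
    return np.min(np.linalg.eigvalsh(P)) >= -1e-12

# gamma_opt is the smallest gamma with P >= 0; since P increases (in the
# semidefinite order) with gamma, a bisection locates it.
lo, hi = float(max(abs(w))), 1.0    # max|W(s_i)| <= gamma_opt <= ||W||_inf = 1
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if pick_psd(mid) else (mid, hi)
gamma_opt = hi
print(gamma_opt)                    # about 0.7287 for this data
```

For this two-point data the boundary case det P = 0 can be solved by hand, which gives an independent check on the bisection.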


    10.7.2 Solution via Nehari's Theorem

    We here give another solution via Nehari's theorem. (The following exposition mainly follows that of [14].) Multiplying by m̃(s) = m^{-1}(s) on both sides of (10.46), we obtain

        γ_opt = inf_{Q ∈ H∞} ‖m̃W − Q‖∞.    (10.53)

    However, note here that m̃W ∉ H∞, and hence the norm cannot be taken in the sense of H∞. It should be understood in the sense of L∞(−i∞, i∞).

    Decompose m̃W as

        m̃W = W1 + W2,

    such that W1 ∈ H∞, and W2 is strictly proper and has all its poles in the open right-half complex plane. Define the Hankel operator Γ_{W2} associated with W2 as

        Γ_{W2} : H2 → (H2)⊥ : x ↦ P W2 x,    (10.54)

    where P is the canonical projection L2(−i∞, i∞) → (H2)⊥. Since W2 clearly belongs to L∞(−i∞, i∞), this gives a Hankel operator according to Nehari's theorem, Theorem 9.5.4 (p. 192), and this implies that

        γ_opt = ‖Γ_{W2}‖.

    An actual computation of γ_opt may be done as follows: Let (A, B, C) be a minimal realization of W2 (note that W2 is strictly proper; i.e., the degree of the numerator is less than that of the denominator). Hence W2(s) = C(sI − A)^{-1}B. Taking the inverse Laplace transform (via the bilateral Laplace transform), we have

        W2(t) = C e^{At} B,   t < 0.


    Since the Hankel operator Γ_{W2} is decomposed as the composition of two (finite-rank) operators Ψc and Ψo, it is a compact operator. Hence its norm is given by its maximal singular value (Chapter 5, Problem 12). To this end, consider the eigenvalue problem

        Ψc* Ψo* Ψo Ψc u = λu.    (10.55)

    Since the operator Ψo Ψc must be nonzero, it suffices to consider nonzero eigenvalues. Then by Lemma 10.7.4 below, we can interchange the order of operators to obtain

        Ψc Ψc* Ψo* Ψo x = λx.    (10.56)

    Moreover,

        Ψc Ψc* = ∫₀^∞ e^{−At} B B^T e^{−A^T t} dt,
        Ψo* Ψo = ∫₀^∞ e^{−A^T t} C^T C e^{−At} dt

    hold true. Observe that (10.56) is a finite-dimensional eigenvalue problem, in contrast to (10.55).

    It is also known that the matrices

        Lc := Ψc Ψc*,   Lo := Ψo* Ψo

    are the unique solutions of the following Lyapunov equations:

        A Lc + Lc A^T = B B^T,
        A^T Lo + Lo A = C^T C.

    Solving these equations and finding the maximal eigenvalue of LcLo, we arrive at the solution: γ_opt = (λ_max(LcLo))^{1/2}.
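As an illustration of this recipe (the data below are made up for the example), take the antistable W2(s) = 1/(s−1) + 1/(s−2), i.e., A = diag(1, 2), B = (1, 1)^T, C = (1, 1). SciPy's Lyapunov solver matches the equations above directly:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative antistable data: W2(s) = 1/(s-1) + 1/(s-2)
A = np.diag([1.0, 2.0])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

# solve_continuous_lyapunov(a, q) solves a X + X a^H = q, which matches
# A Lc + Lc A^T = B B^T  and  A^T Lo + Lo A = C^T C directly.
Lc = solve_continuous_lyapunov(A, B @ B.T)
Lo = solve_continuous_lyapunov(A.T, C.T @ C)

# gamma_opt = sqrt(lambda_max(Lc Lo)) = ||Gamma_{W2}||
gamma_opt = float(np.sqrt(np.max(np.linalg.eigvals(Lc @ Lo).real)))
print(gamma_opt)   # (9 + sqrt(73))/24, about 0.7310, for this example
```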

    It remains to show the following lemma. (See also Problems 2 and 8, Chapter 5, for the case X = Y.)

    Lemma 10.7.4. Let X, Y be normed linear spaces, and let A ∈ L(X, Y), B ∈ L(Y, X). Then a nonzero λ ∈ C belongs to the resolvent set ρ(AB) if and only if λ ∈ ρ(BA). As a consequence, σ(AB) \ {0} = σ(BA) \ {0}.

    Proof. Note first that if λ ≠ 0, then

        (λI − T)^{-1} = λ^{-1}(I − T/λ)^{-1}.

    Hence we may assume without loss of generality that λ = 1. Suppose 1 ∈ ρ(AB). Consider¹⁰⁵ T := I + B(I − AB)^{-1}A, which is clearly a bounded operator. Then

        T(I − BA) = I − BA + B(I − AB)^{-1}A − B(I − AB)^{-1}ABA
                  = I − BA + B(I − AB)^{-1}(I − AB)A
                  = I − BA + BA = I.

    ¹⁰⁵This may appear ad hoc. For a supporting intuition, consider the Neumann series expansion (5.7), page 92: (I − BA)^{-1} = I + BA + BABA + ··· + (BA)^n + ···, which is equal to I + B(I + AB + ABAB + ···)A = I + B(I − AB)^{-1}A.


    Similarly, (I − BA)T = I, and hence T = (I − BA)^{-1}. This means 1 ∈ ρ(BA). Since σ(AB) = C \ ρ(AB), the last assertion for the spectra immediately follows.
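In the finite-dimensional case, Lemma 10.7.4 is easy to test numerically (a sketch with random matrices; the lemma itself of course covers general normed spaces):

```python
import numpy as np

# Random A in L(X, Y) and B in L(Y, X) with dim X = 2, dim Y = 4.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 4))

eig_AB = np.linalg.eigvals(A @ B)   # 4 eigenvalues (at least two must be 0)
eig_BA = np.linalg.eigvals(B @ A)   # 2 eigenvalues

# The nonzero parts of the two spectra coincide.
nz_AB = np.sort_complex(eig_AB[np.abs(eig_AB) > 1e-9])
nz_BA = np.sort_complex(eig_BA[np.abs(eig_BA) > 1e-9])
print(nz_AB, nz_BA)
```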

    10.8 General Solution for Distributed Parameter Systems

    The solutions given above depend crucially on the rationality of m(s). When the given plant has spatially dependent parameters, i.e., when it is a distributed parameter system, the plant becomes infinite-dimensional, and these solutions do not directly carry over. There is much research that attempts similar ideas. However, since many of these attempts end up with infinite-dimensional Lyapunov equations or so-called Riccati equations, obtaining practically computable solutions is nontrivial.

    In contrast to these approaches, there is an alternative studied by Bercovici, Foias, Özbay, Tannenbaum, Zames, and others (see, e.g., [13]) that makes use of the rationality of W, and this leads to an interesting rank condition in spite of the underlying infinite-dimensionality. Since one has freedom in choosing the weighting function W, this is a very plausible choice.

    Let m(s) be an arbitrary (real) inner function that is not necessarily rational. Then the space mH2 is a right-shift invariant closed subspace of H2, which is indeed invariant under multiplication by an arbitrary H∞ function (see Theorem 9.4.1, p. 190). As seen in Chapter 9 (however, see Remark 10.8.1 below), its orthogonal complement is H(m) := H2 ⊖ mH2. For each t ≥ 0, let M_{e^{−ts}} denote the multiplication operator induced by e^{−ts}, i.e.,

        M_{e^{−ts}} : H2 → H2 : φ(s) ↦ e^{−ts} φ(s).

    Since this occurs in the Laplace transformed domain, it corresponds to the right shift operator by t in L2. Let T(t) denote its compression to H(m):

        T(t) := π_m M_{e^{−ts}} |_{H(m)},

    where π_m : H2 → H(m) is the canonical projection. More generally, let W2 be an arbitrary function in H∞, and we define its compression W2(T) to H(m) by

        W2(T) := π_m M_{W2} |_{H(m)}.    (10.57)

    Remark 10.8.1. While in Chapter 9 the Hp spaces are defined on D, we deal here with those spaces defined on C+, etc. Via the bilinear transform z ↦ (z − 1)/(z + 1), the results for D naturally carry over to results for Hp spaces on C+.

    We state Sarason's theorem (Theorem 9.5.2, p. 191) in the following modified form, which is more suitable to the present context. The difference here is that we take T(t) instead of T as in Chapter 9 (p. 191).

    Theorem 10.8.2 (Sarason [49]). Let F : H(m) → H(m) be a bounded linear operator that commutes with T(t) for every t ≥ 0. Then there exists f ∈ H∞ such that F = f(T) and ‖F‖ = ‖f‖∞.


    10.8.1 Computation of Optimal Sensitivity

    Let us now return to the computation of the optimal sensitivity:

        γ_opt = inf_{Q ∈ H∞} ‖W − mQ‖∞.

    Let F := W(T). According to Sarason's theorem, there exists f_opt ∈ H∞ such that f_opt(T) = W(T) and ‖W(T)‖ = ‖f_opt‖∞. The action of W and f_opt must agree on H(m), and hence

        f_opt = W − mQ_opt,   and   γ_opt = ‖W(T)‖.

    Hence the computation of the optimum γ_opt is reduced to that of ‖W(T)‖. For a large class of systems, it is known that this is equivalent to computing the maximal singular value of W(T), and, moreover, its computation is reducible to computing the rank of a certain matrix [13].

    10.8.2 Zhou-Khargonekar Formula

    We here present a beautiful formula, first proven by Zhou and Khargonekar [74] for m(s) = e^{−Ls}, and later extended to general inner functions by Lypchuk, Smith, and Tannenbaum [30, 55]. Later, some simplified proofs were given in [16, 62]. Let (A, B, C) be a minimal realization of W(s). Define the Hamiltonian matrix H by

        H := [  A          BB^T/γ
               −C^TC/γ    −A^T    ].

    Then γ > 0 is a singular value of the Hankel operator Γ_{m̃W} if and only if

        det( m̃(H)|22 ) = 0,    (10.58)

    where m̃(s) = m^{−1}(s) = m(−s) and (·)|22 denotes the (2,2)-block when partitioned conformably with H. In the case of m(s) = e^{−Ls}, (10.58) takes the form

        det( exp( [  A          BB^T/γ
                    −C^TC/γ    −A^T    ] L ) |22 ) = 0.

    Needless to say, the maximal singular value is the largest γ among those satisfying the above condition. (For brevity we omit some extra conditions; see, e.g., [62].)
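For a one-dimensional illustration (data and sign convention are assumptions for this sketch: W(s) = 1/(s+1), delay inner function m(s) = e^{−s}, and H = [[A, BB^T/γ], [−C^TC/γ, −A^T]]), one can locate singular values by scanning the determinant of the (2,2)-block of exp(HL) for sign changes:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data: W(s) = 1/(s+1) => A = -1, B = C = 1;  m(s) = e^{-Ls}, L = 1
A, B, C, L = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]]), 1.0
n = A.shape[0]

def block22(gamma):
    # Hamiltonian H and det of the (2,2)-block of exp(H L)
    H = np.block([[A, B @ B.T / gamma],
                  [-C.T @ C / gamma, -A.T]])
    return np.linalg.det(expm(H * L)[n:, n:])

# gamma is a singular value iff det(exp(H L)|22) = 0; scan for sign changes.
gammas = np.linspace(0.05, 1.0, 400)
vals = np.array([block22(g) for g in gammas])
crossings = np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
sigma_max = gammas[crossings[-1]] if crossings.size else None
print(sigma_max)   # largest root found in the scanned range
```

In this scalar case the condition reduces to a transcendental equation in γ, which is why a grid scan (followed, in practice, by a root refinement) is a natural way to evaluate it.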

    10.9 Supplementary Remarks

    We have given a brief overview of systems and control theory, with emphasis on the frequency-domain treatment of the sensitivity optimization of H∞ control. Except for some extensions to nonlinear or infinite-dimensional systems, H∞ control theory has become fairly complete and has been used in many important applications.

    Control theory can provide solutions to many practical problems. It is no exaggeration to say that almost no system, artificial or natural, can work without control, and control theory also raises many interesting mathematical problems, whose resolution can lead to fruitful new directions. It is hoped that the brief overview here motivates the reader to look more closely into the subject.

    For a more general treatment of H∞ control, the reader is referred to [9, 13, 73]. For a functional analytic approach to a more general H∞ problem [13], one needs the commutant lifting theorem, which is a generalization of Sarason's theorem treated here. For this, the reader is referred to [40].

    For generalizations to distributed parameter systems, see, e.g., [4, 13] and also [27] for some recent developments. While there are many standard textbooks in systems and control, we list [23] as a classical textbook and also [19, 56] for a more recent treatment. The book by Young [67] is also useful in grasping H∞ control from a Hilbert space point of view.

    Appendix A

    Some Background on Sets, Mappings, and Topology

    A.1 Sets and Mappings

    Let us briefly review and summarize the fundamentals on sets and mappings. We will content ourselves with the naive understanding that a set is a collection of objects. Rigorously speaking, this vague "definition" is unsatisfactory because it can lead to a contradiction if one proceeds to enlarge this class of "objects" too much, for example, by considering "the set of all sets." But we do not deal with such overly big objects.

    A member of a set is called an element of the set. For vector spaces, it may be called a vector or a point. When we wish to say that a property holds for every element of a set X, we may often use the expression "for an arbitrary element of X," meaning "for every (any) element of X." This expression does not seem to be as common in daily life as it is in mathematics and perhaps requires a little care.

    Given a proposition P(x) on an element x, we denote by

        {x : P(x)}

    the set of all x such that P(x) holds true. In particular, when we explicitly show that x is an element in X and consider the above as a subset of X, then we write

        {x ∈ X : P(x)}.

    For example, if X = R and P(x) is given by |x| ≤ 1, then

        {x ∈ R : |x| ≤ 1}

    is the interval [−1, 1].

    The union ∪, intersection ∩, and inclusion ⊂ are assumed to be already familiar to the reader. We just note that A ⊂ B does not exclude the case A = B. For a subset A of X, its complement A^c is defined as

        A^c := {x ∈ X : x ∉ A}.

    A set consisting of finitely many elements is called a finite set; otherwise it is called an infinite set. The size of an infinite set, called the cardinal number or cardinality, i.e., the number of elements in the set, is beyond the scope of this book and is hence omitted. We only note that if the elements of the set can be shown with an ordering as A = {x1, x2, x3, . . .}, the set A is called a countable set, or is said to possess a countable number of elements. The sets N of natural numbers, Z of integers, and Q of rational numbers are countable sets.

    Let X, Y be sets. If to every x in X there is always associated one (and only one) element y in Y, this correspondence is called a mapping or simply a map (it may be called a function depending on the context; the distinction is ambiguous) and is denoted as

        f : X → Y,   or   X →f Y.

    The set X is called the domain, and Y the codomain, of f. When we wish to explicitly include in this notation the correspondence between x and y under f, we write it (in this book) as

        f : X → Y : x ↦ y,

    using the notation ↦. For example,

        sin : R → R : x ↦ sin(x).

    We may also write simply x ↦ y when showing the domain and codomain is not necessary. Of course, the notation y = f(x) is also used. We also use the following expressions: f maps (takes, sends) x to y, or the action of f on x is y.

    Strictly speaking, y = f(x) shows that the image of the element x under f is f(x). In other words, representing a mapping (function) as f(x) (showing its dependence on x explicitly) is somewhat imprecise. To be more rigorous, one should write either f or f(·) without reference to a particular element x.

    Let f : X → Y be a mapping, and A a subset of X. The set

        f(A) := {f(a) ∈ Y : a ∈ A}

    of all elements of Y mapped by f from A is called the image of A under f. When A = X, it is called the image, or the range, of f and is denoted by im f or R(f). Here the symbol := means that the right-hand side defines the left-hand one. Given two mappings f : X → Y and g : Y → Z, we define their composition (or composed mapping) as

        g ∘ f : X → Z : (g ∘ f)(x) := g(f(x)).

    When Y = f(X), i.e., any element of Y is an image of some element of X under f, we say that the mapping f is an onto mapping, a surjection, or a surjective mapping. On the other hand, if f(x1) = f(x2) always implies x1 = x2, i.e., f sends different elements to different elements, we say that f is a one-to-one mapping, or an injection (injective mapping). A mapping that is both surjective and injective simultaneously is called a bijection, a bijective mapping, or a one-to-one correspondence. For a bijection f, as clearly seen from Figure A.1, there exists its inverse mapping f^{-1} that maps every f(x) ∈ Y to x ∈ X, i.e., f^{-1} : Y → X : f(x) ↦ x. Clearly f^{-1} ∘ f = I_X and f ∘ f^{-1} = I_Y. Here I_X denotes the identity mapping on X, that is, the mapping that sends every x to x itself.
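For finite sets these notions can be checked mechanically; the following sketch (an illustration, not from the text) tests injectivity and surjectivity of a correspondence given as a Python dict and builds the inverse mapping of a bijection:

```python
# A mapping f : X -> Y given by an explicit table (illustrative example)
X = {1, 2, 3}
Y = {'a', 'b', 'c'}
f = {1: 'a', 2: 'b', 3: 'c'}

def is_injective(f):
    # different elements must be sent to different elements
    return len(set(f.values())) == len(f)

def is_surjective(f, Y):
    # every element of Y must be the image of some element
    return set(f.values()) == Y

# f is a bijection, so the inverse mapping f^{-1} : f(x) |-> x exists
if is_injective(f) and is_surjective(f, Y):
    f_inv = {y: x for x, y in f.items()}

print(is_injective(f), is_surjective(f, Y), f_inv[f[2]])  # True True 2
```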

    For sets X, Y, their (direct) product X × Y means the set of ordered pairs (x, y), x ∈ X, y ∈ Y:

        X × Y := {(x, y) : x ∈ X, y ∈ Y}.    (A.1)

    Figure A.1. Inverse mapping

    Since this concept arose originally from the analytical geometry and coordinate system due to Descartes, it is also commonly referred to as the Cartesian product. We can likewise consider the product of an arbitrary number (possibly infinite) of sets. In particular, the n-fold product of the same set X is denoted as X^n. Spaces such as R^n and C^n are nothing but the n-fold products of R and C, respectively.

    A.2 Reals, Upper Bounds, etc.

    The construction of real numbers (or, simply, reals) is a basis for analysis, but it is often treated very lightly. It would be an interesting experience to read [6] by Dedekind himself.

    The crux of the construction of the reals lies, in short, in the completion of the set of rational numbers Q while maintaining its algebraic structures represented by the four fundamental rules of arithmetic (i.e., addition, subtraction, multiplication, and division), and, further, the order structure and the concept of closeness (topology) determined by the absolute value, etc. In other words, we wish to enlarge the set Q so that we can freely take limits. As also pointed out in the main text, the source of difficulty lies in the fact that we can use no other concepts than rationals in constructing the reals. Unfortunately, we often forget this trivial fact, since we are so used to expressions such as √2. One tends to implicitly assume that an object such as √2 already exists before proving its existence. Just the fact that one is familiar with an expression such as √2 cannot be a proof of existence, of course.

    Consider for example the definition of √2 using the Dedekind cut (see Figure A.2). It is well known and rather easy to prove that there is no rational number whose square

    Figure A.2. Dedekind cut

    is 2. This means that if we draw a "line" with rationals only, there should be a "hole" corresponding to this number. This hole should be what we desire to define as √2. The difficulty here is how we can define, e.g., √2, only with rationals, without illegitimately importing the concept of reals, which is what we want to define.

    The idea of Dedekind is roughly as follows: If there is a "hole," then the rationals should be separated into two disjoint parts: one subset A should consist of all rationals that are less than √2, and the other set B should consist of those that are greater than √2. Of course, we have to express these two sets A and B without using the expression √2. Once this is done, the hole (i.e., √2) can be identified with the pair (A, B). One may think this is trivial, but it is based on the beautiful change of views from the "hole" itself to the opposite objects (i.e., the subsets separated by the hole). The difference is that while it is difficult to pin down the hole itself, it is possible to describe these sets A and B using rationals only. In the present case, A and B are given by

        A := {a ∈ Q : a ≤ 0, or a > 0 and a² < 2},
        B := {b ∈ Q : b > 0 and b² > 2}.

    Since every element of A is smaller than any element of B, this pair separates the whole set of rationals into two disjoint parts. Dedekind called such a pair a cut. The remaining technical problems include formal procedures of defining the four fundamental rules of arithmetic, absolute values, etc., so that we can safely deal with pairs like (A, B) as bona fide "numbers." But we will not delve into this further. The totality of all such pairs is called the real numbers.
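The sets A and B can be probed entirely within Q. The sketch below (an illustration using Python's exact rational arithmetic, not part of the book) bisects using rationals only, producing a ∈ A and b ∈ B that squeeze the "hole" arbitrarily tightly, without ever naming √2:

```python
from fractions import Fraction

def in_A(q):
    # q <= 0, or q > 0 and q^2 < 2 -- decidable using rationals only
    return q <= 0 or q * q < 2

def in_B(q):
    # q > 0 and q^2 > 2
    return q > 0 and q * q > 2

a, b = Fraction(1), Fraction(2)   # a is in A, b is in B to start
for _ in range(50):               # rational bisection; no q has q^2 == 2
    mid = (a + b) / 2
    if in_A(mid):
        a = mid
    else:
        b = mid

print(float(a), float(b))   # both approximate the "hole", yet a^2 < 2 < b^2 exactly
```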

    A possible doubt is that we may arrive at a further extension by considering cuts of real numbers. But it can be shown that we will not obtain any further extension and will remain in the realm of real numbers. Thus any limit of reals remains in the reals, and this leads to the completeness of the reals.

    A remarkable consequence of the completeness can be seen, for example, in the following fact:

        A monotone increasing (nondecreasing) sequence that is bounded above is convergent.

    In fact, if we deal only with rationals, there are many (in fact, infinitely many) "holes" in the sense described by the Dedekind cut, and, as we saw in the example of √2, the limit does not exist in Q. To circumvent this situation, we had to construct the reals. The statement above asserts that for reals, there are no such "holes."

    In the same vein, for a subset M of the reals, its supremum or least upper bound

        sup M = sup{x ∈ R : x ∈ M}

    always exists within the reals, provided that M is bounded above. It is indeed known that this statement is equivalent to the completeness of the reals. The same can be said of an infimum or a greatest lower bound. A least upper bound is not necessarily a maximum. Hence sup M does not necessarily belong to the set M. Because of this, some may find it difficult to handle such concepts. It is worthwhile to note the following fundamental properties of the supremum and infimum of a sequence {an}.
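A standard concrete instance of a supremum that is not a maximum: for M = {1 − 1/n : n ∈ N}, sup M = 1, yet 1 ∉ M. A quick numerical sketch (illustrative):

```python
# M = {1 - 1/n : n in N}: every element is < 1, yet elements come
# arbitrarily close to 1.  Hence sup M = 1, and sup M does not belong to M.
terms = [1 - 1 / n for n in range(1, 10**5 + 1)]

upper_bound_attained = any(t == 1 for t in terms)
print(max(terms), upper_bound_attained)   # close to 1, and False
```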
