

Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations

Uri M. Ascher and Linda R. Petzold

December 2, 1997


Preface

This book has been developed from course notes that we wrote, having repeatedly taught courses on the numerical solution of ordinary differential equations (ODEs) and related problems. We have taught such courses at a senior undergraduate level as well as at the level of a first graduate course on numerical methods for differential equations. The audience typically consists of students from Mathematics, Computer Science and a variety of disciplines in engineering and sciences such as Mechanical, Electrical and Chemical Engineering, Physics, Earth Sciences, etc.

The material that this book covers can be viewed as a first course on the numerical solution of differential equations. It is designed for people who want to gain a practical knowledge of the techniques used today. The course aims to achieve a thorough understanding of the issues and methods involved and of the reasons for the successes and failures of existing software. On one hand, we avoid an extensive, thorough, theorem-proof type exposition: we try to get to current methods, issues and software as quickly as possible. On the other hand, this is not a quick recipe book, as we feel that a deeper understanding than can usually be gained by a recipe course is required to enable the student or the researcher to use their knowledge to design their own solution approach for any nonstandard problems they may encounter in future work. The book covers initial-value and boundary-value problems, as well as differential-algebraic equations (DAEs). In a one-semester course we have typically covered over 75% of the material it contains.

We wrote this book partially as a result of frustration at not being able to assign a textbook adequate for the material that we have found ourselves covering. There is certainly excellent, in-depth literature around. In fact, we make repeated references to exhaustive texts which, combined, cover almost all the material in this book. Those books contain the proofs and references which we omit. They span thousands of pages, though, and the time commitment required to study them in adequate depth may be more than many students and researchers can afford to invest. We have tried to stay below a 350-page limit and to address all three ODE-related areas mentioned above.


A significant amount of additional material is covered in the Exercises. Other additional important topics are referred to in brief sections of Notes and References. Software is an important and well-developed part of this subject. We have attempted to cover the most fundamental software issues in the text. Much of the excellent and publicly-available software is described in the Software sections at the end of the relevant chapters, and available codes are cross-referenced in the index. Review material is highlighted and presented in the text when needed, and it is also cross-referenced in the index.

Traditionally, numerical ODE texts have spent a great deal of time developing families of higher order methods, e.g. Runge-Kutta and linear multistep methods, applied first to nonstiff problems and then to stiff problems. Initial value problems and boundary value problems have been treated in separate texts, although there is much in common. There have been fundamental differences in approach, notation, and even in basic definitions, between ODE initial value problems, ODE boundary value problems, and partial differential equations (PDEs).

We have chosen instead to focus on the classes of problems to be solved, mentioning wherever possible applications which can lend insight into the requirements and the potential sources of difficulty for numerical solution. We begin by outlining the relevant mathematical properties of each problem class, then carefully develop the lower-order numerical methods and fundamental concepts for the numerical analysis. Next we introduce the appropriate families of higher-order methods, and finally we describe in some detail how these methods are implemented in modern adaptive software. An important feature of this book is that it gives an integrated treatment of ODE initial value problems, ODE boundary value problems, and DAEs, emphasizing not only the differences between these types of problems but also the fundamental concepts, numerical methods and analysis which they have in common. This approach is also closer to the typical presentation for PDEs, leading, we hope, to a more natural introduction to that important subject.

Knowledge of significant portions of the material in this book is essential for the rapidly emerging field of numerical dynamical systems. These are numerical methods employed in the study of the long term, qualitative behavior of various nonlinear ODE systems. We have emphasized and developed in this work relevant problems, approaches and solutions. But we avoided developing further methods which require deeper, or more specific, knowledge of dynamical systems, which we did not want to assume as a prerequisite.

The plan of the book is as follows. Chapter 1 is an introduction to the different types of mathematical models which are addressed in the book. We use simple examples to introduce and illustrate initial- and boundary-value problems for ODEs and DAEs. We then introduce some important applications where such problems arise in practice.


Each of the three parts of the book which follow starts with a chapter which summarizes essential theoretical, or analytical issues (i.e. before applying any numerical method). This is followed by chapters which develop and analyze numerical techniques. For initial value ODEs, which comprise roughly half this book, Chapter 2 summarizes the theory most relevant for computer methods, Chapter 3 introduces all the basic concepts and simple methods (relevant also for boundary value problems and for DAEs), Chapter 4 is devoted to one-step (Runge-Kutta) methods and Chapter 5 discusses multistep methods.

Chapters 6-8 are devoted to boundary value problems for ODEs. Chapter 6 discusses the theory which is essential to understand and to make effective use of the numerical methods for these problems. Chapter 7 briefly considers shooting-type methods and Chapter 8 is devoted to finite difference approximations and related techniques.

The remaining two chapters consider DAEs. This subject has been researched and solidified only very recently (in the past 15 years). Chapter 9 is concerned with background material and theory. It is much longer than Chapters 2 and 6 because understanding the relationship between ODEs and DAEs, and the questions regarding reformulation of DAEs, is essential and already suggests a lot regarding computer approaches. Chapter 10 discusses numerical methods for DAEs.

Various courses can be taught using this book. A 10-week course can be based on the first 5 chapters, with an addition from either one of the remaining two parts. In a 13-week course (or shorter in a more advanced graduate class) it is possible to cover comfortably Chapters 1-5 and either Chapters 6-8 or Chapters 9-10, with a more superficial coverage of the remaining material. The exercises vary in scope and level of difficulty. We have provided some hints, or at least warnings, for those exercises that we (or our students) have found more demanding.

Many people helped us with the tasks of shaping up, correcting, filtering and refining the material in this book. First and foremost there are our students in the various classes we taught on this subject. They made us acutely aware of the difference between writing with the desire to explain and writing with the desire to impress. We note, in particular, G. Lakatos, D. Aruliah, P. Ziegler, H. Chin, R. Spiteri, P. Lin, P. Castillo, E. Johnson, D. Clancey and D. Rasmussen. We have benefited particularly from our earlier collaborations on other, related books with K. Brenan, S. Campbell, R. Mattheij and R. Russell. Colleagues who have offered much insight, advice and criticism include E. Biscaia, G. Bock, C. W. Gear, W. Hayes, C. Lubich, V. Murata, D. Pai, J. B. Rosen, L. Shampine and A. Stuart. Larry Shampine, in particular, did an incredibly extensive refereeing job and offered many comments which have helped us to significantly improve this text.


We have also benefited from comments of numerous anonymous referees.

December 2, 1997
U. M. Ascher
L. R. Petzold



Contents

1  Ordinary Differential Equations
   1.1  Initial Value Problems
   1.2  Boundary Value Problems
   1.3  Differential-Algebraic Equations
   1.4  Families of Application Problems
   1.5  Dynamical Systems
   1.6  Notation

2  On Problem Stability
   2.1  Test Equation and General Definitions
   2.2  Linear, Constant Coefficient Systems
   2.3  Linear, Variable Coefficient Systems
   2.4  Nonlinear Problems
   2.5  Hamiltonian Systems
   2.6  Notes and References
   2.7  Exercises

3  Basic Methods, Basic Concepts
   3.1  A Simple Method: Forward Euler
   3.2  Convergence, Accuracy, Consistency and 0-Stability
   3.3  Absolute Stability
   3.4  Stiffness: Backward Euler
   3.5  A-Stability, Stiff Decay
   3.6  Symmetry: Trapezoidal Method
   3.7  Rough Problems
   3.8  Software, Notes and References
        3.8.1  Notes
        3.8.2  Software
   3.9  Exercises

4  One Step Methods
   4.1  The First Runge-Kutta Methods
   4.2  General Formulation of Runge-Kutta Methods
   4.3  Convergence, 0-Stability and Order for Runge-Kutta Methods
   4.4  Regions of Absolute Stability for Explicit Runge-Kutta Methods
   4.5  Error Estimation and Control
   4.6  Sensitivity to Data Perturbations
   4.7  Implicit Runge-Kutta and Collocation Methods
        4.7.1  Implicit Runge-Kutta Methods Based on Collocation
        4.7.2  Implementation and Diagonally Implicit Methods
        4.7.3  Order Reduction
        4.7.4  More on Implementation and SIRK Methods
   4.8  Software, Notes and References
        4.8.1  Notes
        4.8.2  Software
   4.9  Exercises

5  Linear Multistep Methods
   5.1  The Most Popular Methods
        5.1.1  Adams Methods
        5.1.2  Backward Differentiation Formulae
        5.1.3  Initial Values for Multistep Methods
   5.2  Order, 0-Stability and Convergence
        5.2.1  Order
        5.2.2  Stability: Difference Equations and the Root Condition
        5.2.3  0-Stability and Convergence
   5.3  Absolute Stability
   5.4  Implementation of Implicit Linear Multistep Methods
        5.4.1  Functional Iteration
        5.4.2  Predictor-Corrector Methods
        5.4.3  Modified Newton Iteration
   5.5  Designing Multistep General-Purpose Software
        5.5.1  Variable Step-Size Formulae
        5.5.2  Estimating and Controlling the Local Error
        5.5.3  Approximating the Solution at Off-Step Points
   5.6  Software, Notes and References
        5.6.1  Notes
        5.6.2  Software
   5.7  Exercises

6  More BVP Theory and Applications
   6.1  Linear Boundary Value Problems and Green's Function
   6.2  Stability of Boundary Value Problems
   6.3  BVP Stiffness
   6.4  Some Reformulation Tricks
   6.5  Notes and References
   6.6  Exercises

7  Shooting
   7.1  Shooting: a Simple Method and its Limitations
        7.1.1  Difficulties
   7.2  Multiple Shooting
   7.3  Software, Notes and References
        7.3.1  Notes
        7.3.2  Software
   7.4  Exercises

8  Finite Difference Methods for BVPs
   8.1  Midpoint and Trapezoidal Methods
        8.1.1  Solving Nonlinear Problems: Quasilinearization
        8.1.2  Consistency, 0-stability and Convergence
   8.2  Solving the Linear Equations
   8.3  Higher Order Methods
        8.3.1  Collocation
        8.3.2  Acceleration Techniques
   8.4  More on Solving Nonlinear Problems
        8.4.1  Damped Newton
        8.4.2  Shooting for Initial Guesses
        8.4.3  Continuation
   8.5  Error Estimation and Mesh Selection
   8.6  Very Stiff Problems
   8.7  Decoupling
   8.8  Software, Notes and References
        8.8.1  Notes
        8.8.2  Software
   8.9  Exercises

9  More on Differential-Algebraic Equations
   9.1  Index and Mathematical Structure
        9.1.1  Special DAE Forms
        9.1.2  DAE Stability
   9.2  Index Reduction and Stabilization: ODE with Invariant
        9.2.1  Reformulation of Higher-Index DAEs
        9.2.2  ODEs with Invariants
        9.2.3  State Space Formulation
   9.3  Modeling with DAEs
   9.4  Notes and References
   9.5  Exercises

10 Numerical Methods for Differential-Algebraic Equations
   10.1  Direct Discretization Methods
         10.1.1  A Simple Method: Backward Euler
         10.1.2  BDF and General Multistep Methods
         10.1.3  Radau Collocation and Implicit Runge-Kutta Methods
         10.1.4  Practical Difficulties
         10.1.5  Specialized Runge-Kutta Methods for Hessenberg Index-2 DAEs
   10.2  Methods for ODEs on Manifolds
         10.2.1  Stabilization of the Discrete Dynamical System
         10.2.2  Choosing the Stabilization Matrix F
   10.3  Software, Notes and References
         10.3.1  Notes
         10.3.2  Software
   10.4  Exercises

Bibliography

Index


List of Tables

3.1   Maximum errors for Example 3.1
3.2   Maximum errors for long interval integration of y' = (cos t) y
4.1   Errors and calculated convergence rates for the forward Euler, the explicit midpoint (RK2) and the classical Runge-Kutta (RK4) methods
5.1   Coefficients of Adams-Bashforth methods up to order 6
5.2   Coefficients of Adams-Moulton methods up to order 6
5.3   Coefficients of BDF methods up to order 6
5.4   Example 5.3: Errors and calculated convergence rates for Adams-Bashforth methods
5.5   Example 5.3: Errors and calculated convergence rates for Adams-Moulton methods
5.6   Example 5.3: Errors and calculated convergence rates for BDF methods
8.1   Maximum errors for Example 8.1 using the midpoint method: uniform meshes
8.2   Maximum errors for Example 8.1 using the midpoint method: nonuniform meshes
8.3   Maximum errors for Example 8.1 using collocation at 3 Gaussian points: uniform meshes
8.4   Maximum errors for Example 8.1 using collocation at 3 Gaussian points: nonuniform meshes
10.1  Errors for Kepler's problem using various 2nd order methods
10.2  Maximum drifts for the robot arm (* denotes an error overflow)


List of Figures

1.1   u vs t for u(0) = 1 and various values of u'(0)
1.2   Simple pendulum
1.3   Periodic solution forming a cycle in the y1-y2 plane
1.4   Method of lines. The shaded strip is the domain on which the diffusion PDE is defined. The approximations yi(t) are defined along the dashed lines
2.1   Errors due to perturbations for stable and unstable test equations. The original, unperturbed trajectories are in solid curves, the perturbed in dashed. Note that the y-scales in Figures (a) and (b) are not the same
3.1   The forward Euler method. The exact solution is the curved solid line. The numerical values are circled. The broken line interpolating them is tangential at the beginning of each step to the ODE trajectory passing through that point (dashed lines)
3.2   Absolute stability region for the forward Euler method
3.3   Approximate solutions for Example 3.1 using the forward Euler method, with h = .19 and h = .21. The oscillatory profile corresponds to h = .21; for h = .19 the qualitative behavior of the exact solution is obtained
3.4   Approximate solution and plausible mesh, Example 3.2
3.5   Absolute stability region for the backward Euler method
3.6   Approximate solution on a coarse uniform mesh for Example 3.2, using backward Euler (the smoother curve) and trapezoidal methods
3.7   Sawtooth function for τ = 0.2
4.1   Classes of higher order methods
4.2   Approximate area under curve
4.3   Midpoint quadrature
4.4   Stability regions for p-stage explicit Runge-Kutta methods of order p, p = 1, 2, 3, 4. The inner circle corresponds to forward Euler, p = 1. The larger p is, the larger the stability region. Note the "ear lobes" of the 4th order method protruding into the right half plane
4.5   Schematic of a mobile robot
4.6   Toy car routes under constant steering: unperturbed (solid line), steering perturbed by ±δ (dash-dot lines), and corresponding trajectories computed by the linear sensitivity analysis (dashed lines)
4.7   Energy error for the Morse potential using leapfrog with h = 2.3684
4.8   Astronomical orbit using the Runge-Kutta Fehlberg method
4.9   Modified Kepler problem: approximate and exact solutions
5.1   Adams-Bashforth methods
5.2   Adams-Moulton methods
5.3   Zeros of ρ(ξ) for a 0-stable method
5.4   Zeros of ρ(ξ) for a strongly stable method. It is possible to draw a circle contained in the unit circle about each extraneous root
5.5   Absolute stability regions of Adams methods
5.6   BDF absolute stability regions. The stability regions are outside the shaded area for each method
5.7   Lorenz "butterfly" in the y1-y3 plane
6.1   Two solutions u(t) for the BVP of Example 6.2
6.2   The function y1(t) and its mirror image y2(t) = y1(b − t), for λ = −2, b = 10
7.1   Exact (solid line) and shooting (dashed line) solutions for Example 7.2
7.2   Exact (solid line) and shooting (dashed line) solutions for Example 7.2
7.3   Multiple shooting
8.1   Example 8.1: Exact and approximate solutions (indistinguishable) for λ = 50, using the indicated mesh
8.2   Zero-structure of the matrix A, m = 3, N = 10. The matrix size is m(N + 1) = 33
8.3   Zero-structure of the permuted matrix A with separated boundary conditions, m = 3, k = 2, N = 10
8.4   Classes of higher order methods
8.5   Bifurcation diagram for Example 8.5: ||u||₂ vs λ
8.6   Solution for Example 8.6 with λ = −1000 using an upwind discretization with a uniform step size h = 0.1 (solid line). The "exact" solution is also displayed (dashed line)
9.1   A function and its less smooth derivative
9.2   Stiff spring pendulum, ε = 10⁻³, initial conditions q(0) = (1 − ε^{1/4}, 0)^T, v(0) = 0
9.3   Perturbed (dashed lines) and unperturbed (solid line) solutions for Example 9.9
9.4   A matrix in Hessenberg form
10.1  Methods for the direct discretization of DAEs in general form
10.2  Maximum errors for the first 3 BDF methods for Example 10.2
10.3  A simple electric circuit
10.4  Results for a simple electric circuit: U2(t) (solid line) and the input Ue(t) (dashed line)
10.5  Two-link planar robotic system
10.6  Constraint path for (x2, y2)


Chapter 1
Ordinary Differential Equations

Ordinary differential equations (ODEs) arise in many instances when using mathematical modeling techniques for describing phenomena in science, engineering, economics, etc. In most cases the model is too complex to allow finding an exact solution or even an approximate solution by hand: an efficient, reliable computer simulation is required.

Mathematically, and computationally, a first cut at classifying ODE problems is with respect to the additional or side conditions associated with them. To see why, let us look at a simple example. Consider

    u''(t) + u(t) = 0,    0 ≤ t ≤ b

where t is the independent variable (it is often, but not always, convenient to think of t as "time"), and u = u(t) is the unknown, dependent variable. Throughout this book we use the notation

    u' = du/dt,    u'' = d²u/dt²,

etc. We shall often omit explicitly writing the dependence of u on t.

The general solution of the ODE for u depends on two parameters α and φ,

    u(t) = α sin(t + φ).

We can therefore impose two side conditions:

- Initial value problem: Given values u(0) = c1 and u'(0) = c2, the pair of equations

      α sin φ = u(0) = c1
      α cos φ = u'(0) = c2

  can always be solved uniquely for φ = tan⁻¹(c1/c2) and α = c1/sin φ (or α = c2/cos φ; at least one of these is well-defined).


  The initial value problem has a unique solution for any initial data c = (c1, c2)^T. Such solution curves are plotted for c1 = 1 and different values of c2 in Fig. 1.1.

Figure 1.1: u vs t for u(0) = 1 and various values of u'(0).

- Boundary value problem: Given values u(0) = c1 and u(b) = c2, it appears from Fig. 1.1 that for b = 2, say, if c1 and c2 are chosen carefully then there is a unique solution curve that passes through them, just like in the initial value case. However, consider the case where b = π: now different values of u'(0) yield the same value u(π) = −u(0) (see again Fig. 1.1). So, if the given value of u(b) = c2 = −c1 then we have infinitely many solutions, whereas if c2 ≠ −c1 then no solution exists.

This simple illustration already indicates some important general issues. For initial value problems, one starts at the initial point with all the solution information and marches with it (in "time"): the process is local. For boundary value problems the entire solution information (for a second order problem this consists of u and u') is not locally known anywhere, and the process of constructing a solution is global in t. Thus we may expect many more (and different) difficulties with the latter, and this is reflected in the numerical procedures discussed in this book.
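As a quick numerical check of this discussion (a sketch we add in Python; it is not part of the original text and uses only the closed-form solution above):

```python
# Recover alpha and phi from (c1, c2) and evaluate u(t) = alpha*sin(t + phi).
# Note: u(pi) = -u(0) regardless of u'(0), so at b = pi the BVP has either
# infinitely many solutions (if c2 = -c1) or none.
import numpy as np

c1 = 1.0                            # u(0)
for c2 in [-1.0, 0.0, 0.5, 2.0]:    # various values of u'(0)
    phi = np.arctan2(c1, c2)        # alpha*sin(phi) = c1, alpha*cos(phi) = c2
    alpha = np.hypot(c1, c2)        # consistent with the formulas above
    u = lambda t: alpha * np.sin(t + phi)
    print(f"u'(0) = {c2:5.2f}:  u(0) = {u(0.0):.5f},  u(pi) = {u(np.pi):.5f}")
```

Every run prints u(π) = −1 = −u(0), confirming that at b = π the boundary value u(b) cannot be chosen independently of u(0).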


1.1 Initial Value Problems

The general form of an initial value problem (IVP) that we shall discuss is

    y' = f(t, y),    0 ≤ t ≤ b
    y(0) = c (given).                                         (1.1)

Here y and f are vectors with m components, y = y(t), and f is in general a nonlinear function of t and y. When f does not depend explicitly on t, we speak of the autonomous case. When describing general numerical methods we shall often assume the autonomous case simply in order to carry less notation around. The simple example from the beginning of this chapter is in the form (1.1) with m = 2, y = (u, u')^T, f = (u', −u)^T.

In (1.1) we assume, for simplicity of notation, that the starting point for t is 0. An extension to an arbitrary interval of integration [a, b] of everything which follows is obtained without difficulty.

Before proceeding further, we give three examples which are famous for being very simple on one hand and for representing important classes of applications on the other hand.

Example 1.1 (Simple pendulum) Consider a tiny ball of mass 1 attached to the end of a rigid, massless rod of length 1. At its other end the rod's position is fixed at the origin of a planar coordinate system (see Fig. 1.2).

Figure 1.2: Simple pendulum.

Denoting by θ the angle between the pendulum and the y-axis, the friction-free motion is governed by the ODE (cf. Example 1.5 below)

    θ'' = −g sin θ                                            (1.2)

where g is the (scaled) constant of gravity. This is a simple, nonlinear ODE for θ.


The initial position and velocity configuration translates into values for θ(0) and θ'(0). The linear, trivial example from the beginning of this chapter can be obtained from an approximation of (a rescaled) (1.2) for small displacements θ. □

The pendulum problem is posed as a second order scalar ODE. Much of the software for initial value problems is written for first order systems in the form (1.1). A scalar ODE of order m,

    u^(m) = g(t, u, u', ..., u^(m−1))

can be rewritten as a first-order system by introducing a new variable for each derivative, with y1 = u:

    y1' = y2
    y2' = y3
    ...
    y'_{m−1} = y_m
    y'_m = g(t, y1, y2, ..., ym).
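In code, this order reduction is mechanical. For instance, a minimal sketch (ours, not the book's) of the pendulum equation (1.2) in the standard form (1.1):

```python
# Pendulum (1.2) as a first-order system: y1 = theta, y2 = theta'.
import numpy as np

def pendulum_rhs(t, y, g=9.81):
    """Right-hand side f(t, y) for y' = f(t, y); g is the (scaled) gravity constant."""
    theta, omega = y
    return np.array([omega, -g * np.sin(theta)])
```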

Example 1.2 (Predator-prey model) Following is a basic, simple model from population biology which involves differential equations. Consider an ecological system consisting of one prey species and one predator species. The prey population would grow unboundedly if the predator were not present, and the predator population would perish without the presence of the prey. Denote

- y1(t): the prey population at time t
- y2(t): the predator population at time t
- α: (prey's birthrate) − (prey's natural death rate) (α > 0)
- β: probability of a prey and a predator to come together
- γ: predator's natural growth rate (without prey; γ < 0)
- δ: increase factor of growth of predator if prey and predator meet.

Typical values for these constants are α = .25, β = .01, γ = −1.00, δ = .01. Writing

    y = (y1, y2)^T,    f = (αy1 − βy1y2, γy2 + δy1y2)^T       (1.3)

we obtain an ODE in the form (1.1) with m = 2 components, describing the time evolution of these populations.

Figure 1.3: Periodic solution forming a cycle in the y1-y2 plane.

The qualitative question here is: starting from some initial values y(0) out of a set of reasonable possibilities, will these two populations survive or perish in the long run? As it turns out, this model possesses periodic solutions: starting, say, from y(0) = (80, 30)^T, the solution reaches the same pair of values again after some time period T, i.e. y(T) = y(0). Continuing to integrate past T yields a repetition of the same values, y(T + t) = y(t). Thus, the solution forms a cycle in the phase plane (y1, y2) (see Fig. 1.3). Starting from any point on this cycle the solution stays on the cycle for all time. Other initial values not on this cycle yield other periodic solutions with a generally different period. So, under these circumstances the populations of the predator and prey neither explode nor vanish for all future times, although their number never becomes constant. [1] □

[1] In other examples, such as the Van der Pol equation (7.13), the solution forms an attracting limit cycle: starting from any point on the cycle the solution stays on it for all time, and starting from points nearby the solution tends in time towards the limit cycle. The neutral stability of the cycle in our simple example, in contrast, is one reason why this predator-prey model is discounted among mathematical biologists as being too simple.
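The cycle is easy to observe numerically. Here is a small sketch (ours; it relies on scipy, and the final time 150 is an arbitrary choice) integrating (1.3) from y(0) = (80, 30)^T:

```python
# Integrate the predator-prey system (1.3) and trace the cycle in (y1, y2).
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta, gamma, delta = 0.25, 0.01, -1.00, 0.01   # typical values above

def f(t, y):
    y1, y2 = y
    return [alpha * y1 - beta * y1 * y2, gamma * y2 + delta * y1 * y2]

sol = solve_ivp(f, (0.0, 150.0), [80.0, 30.0], rtol=1e-8, atol=1e-8)
print(sol.y[:, -1])   # the solution stays on the closed cycle through (80, 30)
```

Plotting sol.y[0] against sol.y[1] reproduces the closed curve of Fig. 1.3.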


Example 1.3 (A diffusion problem) A typical diffusion problem in one space variable x and time t leads to the partial differential equation (PDE)

    ∂u/∂t = ∂/∂x (p ∂u/∂x) + g(x, u),

for an unknown function u(t, x) of two independent variables defined on a strip 0 ≤ x ≤ 1, t ≥ 0. For simplicity, assume that p = 1 and g is a known function. Typical side conditions which make this problem well-posed are

    u(0, x) = q(x),   0 ≤ x ≤ 1                 (initial conditions)
    u(t, 0) = α(t),   u(t, 1) = β(t),   t ≥ 0   (boundary conditions)

To solve this problem numerically, consider discretizing in the space variable first. For simplicity assume a uniform mesh with spacing Δx = 1/(m + 1), and let yi(t) approximate u(xi, t), where xi = iΔx, i = 0, 1, ..., m + 1. Then replacing ∂²u/∂x² by a second-order central difference we obtain

    dyi/dt = (y_{i+1} − 2yi + y_{i−1})/Δx² + g(xi, yi),   i = 1, ..., m

with y0(t) = α(t) and y_{m+1}(t) = β(t) given. We have obtained an initial value ODE problem of the form (1.1) with the initial data ci = q(xi).

This technique of replacing spatial derivatives by finite difference approximations and solving an ODE problem in time is referred to as the method of lines. Fig. 1.4 illustrates the origin of the name. Its more general form is discussed further in Example 1.7 below. □

Figure 1.4: Method of lines. The shaded strip is the domain on which the diffusion PDE is defined. The approximations yi(t) are defined along the dashed lines.
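The semi-discretization above is straightforward to program. The following sketch (ours; the choices of q, α, β and g below are arbitrary illustrations, not taken from the example) assembles the ODE system and hands it to a stiff initial value solver:

```python
# Method of lines for u_t = u_xx + g(x, u) on 0 <= x <= 1 (cf. Example 1.3).
import numpy as np
from scipy.integrate import solve_ivp

m = 49
dx = 1.0 / (m + 1)
x = np.linspace(dx, 1.0 - dx, m)        # interior mesh points x_1, ..., x_m

q = lambda xx: np.sin(np.pi * xx)       # initial condition u(0, x)
alpha = lambda t: 0.0                   # boundary value u(t, 0)
beta = lambda t: 0.0                    # boundary value u(t, 1)
g = lambda xx, u: 0.0 * u               # source term (none here)

def rhs(t, y):
    yl = np.concatenate(([alpha(t)], y[:-1]))    # neighbors y_{i-1}
    yr = np.concatenate((y[1:], [beta(t)]))      # neighbors y_{i+1}
    return (yr - 2.0 * y + yl) / dx**2 + g(x, y)

# The semi-discretized diffusion system is stiff (see Chapter 3),
# so an implicit method such as BDF is appropriate.
sol = solve_ivp(rhs, (0.0, 0.5), q(x), method="BDF", rtol=1e-6, atol=1e-8)
print(sol.y[:, -1].max())   # for these data the solution decays like exp(-pi**2 * t)
```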

We now return to the general initial value problem for (1.1). Our intention in this book is to keep the number of theorems down to a minimum: the references which we quote have them all in much detail. But we will nonetheless write down those which are of fundamental importance, and the one just below captures the essence of the (relative) simplicity and locality of initial value ODEs. For the notation that is used in this theorem and throughout the book, we refer to §1.6.

Theorem 1.1 Let f(t, y) be continuous for all (t, y) in a region D = {0 ≤ t ≤ b, −∞ < |y| < ∞}. Moreover, assume Lipschitz continuity in y: there exists a constant L such that for all (t, y) and (t, ŷ) in D,

    |f(t, y) − f(t, ŷ)| ≤ L |y − ŷ|.                          (1.4)

Then

1. For any c ∈ R^m there exists a unique solution y(t) throughout the interval [0, b] for the initial value problem (1.1). This solution is differentiable.

2. The solution y depends continuously on the initial data: if ŷ also satisfies the ODE (but not the same initial values) then

       |y(t) − ŷ(t)| ≤ e^{Lt} |y(0) − ŷ(0)|.                  (1.5)

3. If ŷ satisfies, more generally, a perturbed ODE

       ŷ' = f(t, ŷ) + r(t, ŷ)

   where r is bounded on D, ||r|| ≤ M, then

       |y(t) − ŷ(t)| ≤ e^{Lt} |y(0) − ŷ(0)| + (M/L)(e^{Lt} − 1).    (1.6)

Thus we have solution existence, uniqueness and continuous dependence on the data, in other words a well-posed problem, provided that the conditions of the theorem hold. Let us check these conditions: if f is differentiable in y (we shall automatically assume this throughout) then L can be taken as a bound on the first derivatives of f with respect to y. Denote by fy the Jacobian matrix,

    (fy)_{ij} = ∂fi/∂yj,   1 ≤ i, j ≤ m.


We can write

    f(t, y) − f(t, ŷ) = ∫₀¹ d/ds f(t, ŷ + s(y − ŷ)) ds
                      = ∫₀¹ fy(t, ŷ + s(y − ŷ)) (y − ŷ) ds.

Therefore, we can choose L = sup_{(t,y)∈D} ||fy(t, y)||.

In many cases we must restrict D in order to be assured of the existence of such a (finite) bound L. For instance, if we restrict D to include bounded y such that |y − c| ≤ γ, and on this D both the Lipschitz bound (1.4) holds and |f(t, y)| ≤ M, then a unique existence of the solution is guaranteed for 0 ≤ t ≤ min(b, γ/M), giving the basic existence result a more local flavor. For further theory and proofs see, for instance, Mattheij & Molnaar [67].
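For instance (a quick check we add here, not worked out in the text), for the pendulum of Example 1.1 written as a first order system, y = (θ, θ')^T and

    f(t, y) = (y2, −g sin y1)^T,    fy(t, y) = [ 0          1 ]
                                               [ −g cos y1  0 ],

so in the maximum norm ||fy(t, y)|| ≤ max(1, g) everywhere, and we may take L = max(1, g) on all of D.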

Reader's advice: Before continuing our introduction, let us remark that a reader who is interested in getting to the numerics of initial value problems as soon as possible may skip the rest of this chapter and the next, at least on first reading.

1.2 Boundary Value Problems

The general form of a boundary value problem (BVP) which we consider is a nonlinear first order system of m ODEs subject to m independent (generally nonlinear) boundary conditions,

    y' = f(t, y)                                              (1.7a)
    g(y(0), y(b)) = 0.                                        (1.7b)

We have already seen in the beginning of the chapter that in those cases where solution information is given at both ends of the integration interval (or, more generally, at more than one point in time), nothing general like Theorem 1.1 can be expected to hold. Methods for finding a solution, both analytically and numerically, must be global and the task promises to be generally harder than for initial value problems. This basic difference is manifested in the current status of software for boundary value problems, which is much less advanced or robust than that for initial value problems.

Of course, well-posed boundary value problems do arise on many occasions.

Example 1.4 (Vibrating spring) The small displacement u of a vibrating spring obeys a linear differential equation

    −(p(t)u')' + q(t)u = r(t)

where p(t) > 0 and q(t) ≥ 0 for all 0 ≤ t ≤ b. (Such an equation describes also many other physical phenomena in one space variable t.) If the spring is fixed at one end and is left to oscillate freely at the other end then we get the boundary conditions

    u(0) = 0,    u'(b) = 0.

We can write this problem in the form (1.7) for y = (u, u')^T. Better still, we can use y = (u, pu')^T, obtaining f = (p⁻¹y2, qy1 − r)^T, g = (y1(0), y2(b))^T. This boundary value problem has a unique solution (which gives the minimum for the energy in the spring), as shown and discussed in many books on finite element methods, e.g. Strang & Fix [90]. □

Another example of a boundary value problem is provided by the predator-prey system of Example 1.2, if we wish to find the periodic solution (whose existence is evident from Fig. 1.3). We can specify y(0) = y(b). However, note that b is unknown, so the situation is more complex. Further treatment is deferred to Chapter 6 and Exercise 7.5. A complete treatment of finding periodic solutions for ODE systems falls outside the scope of this book.

What can be generally said about existence and uniqueness of solutions to a general boundary value problem (1.7)? We may consider the associated initial value problem (1.1) with the initial values c as a parameter vector to be found. Denoting the solution for such an IVP y(t; c), we wish to find the solution(s) for the nonlinear algebraic system of m equations

    g(c, y(b; c)) = 0.                                        (1.8)

However, in general there may be one, many or no solutions for a system like (1.8). We delay further discussion to Chapter 6.
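To make (1.8) concrete, here is a small sketch (ours; the choices p = q = r = 1 and b = 1 are arbitrary) applying this idea to the vibrating spring of Example 1.4: treat s = u'(0) as the unknown parameter, integrate the IVP, and adjust s until u'(b) = 0. Because this problem is linear in s, one secant step already solves (1.8) to integration accuracy; this "shooting" approach is developed properly in Chapter 7.

```python
# Shooting for the spring BVP  -u'' + u = 1,  u(0) = 0,  u'(b) = 0,  via (1.8).
from scipy.integrate import solve_ivp

b = 1.0
rhs = lambda t, y: [y[1], y[0] - 1.0]       # y = (u, u'); -u'' + u = 1

def residual(s):                            # g(c, y(b; c)) with c = (0, s)
    sol = solve_ivp(rhs, (0.0, b), [0.0, s], rtol=1e-10, atol=1e-12)
    return sol.y[1, -1]                     # u'(b)

s0, s1 = 0.0, 1.0
s = s0 - residual(s0) * (s1 - s0) / (residual(s1) - residual(s0))  # secant step
print("u'(0) =", s, "  residual u'(b) =", residual(s))
```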

1.3 Differential-Algebraic Equations

Both the prototype IVP (1.1) and the prototype BVP (1.7) refer to an explicit ODE system

    y' = f(t, y).                                             (1.9)

A more general form is an implicit ODE

    F(t, y, y') = 0                                           (1.10)

where the Jacobian matrix ∂F(t, u, v)/∂v is assumed nonsingular for all argument values in an appropriate domain. In principle it is then often possible to solve for y' in terms of t and y, obtaining the explicit ODE form (1.9). However, this transformation may not always be numerically easy or cheap to realize (see Example 1.6 below). Also, in general there may be additional questions of existence and uniqueness; we postpone further treatment until Chapter 9.

Consider next another extension of the explicit ODE, that of an ODE with constraints:

    x' = f(t, x, z)                                           (1.11a)
    0 = g(t, x, z).                                           (1.11b)

Here the ODE (1.11a) for x(t) depends on additional algebraic variables z(t), and the solution is forced in addition to satisfy the algebraic constraints (1.11b). The system (1.11) is a semi-explicit system of differential-algebraic equations (DAE). Obviously, we can cast (1.11) in the form of an implicit ODE (1.10) for the unknown vector y = (x, z)^T; however, the obtained Jacobian matrix

    ∂F(t, u, v)/∂v = [ I  0 ]
                     [ 0  0 ]

is no longer nonsingular.

Example 1.5 (Simple pendulum revisited) The motion of the simple pendulum of Fig. 1.2 can be expressed in terms of the Cartesian coordinates (x1, x2) of the tiny ball at the end of the rod. With z(t) a Lagrange multiplier, Newton's equations of motion give

    x1'' = −z x1
    x2'' = −z x2 − g

and the fact that the rod has a fixed length 1 gives the additional constraint

    x1² + x2² = 1.

After rewriting the two second-order ODEs as four first order ODEs, we obtain a DAE system of the form (1.11) with four equations in (1.11a) and one in (1.11b).

In this very simple case of a multibody system, the change of variables x1 = sin θ, x2 = −cos θ allows elimination of z by simply multiplying the ODE for x1 by x2 and the ODE for x2 by x1 and subtracting. This yields the simple ODE (1.2) of Example 1.1. Such a simple elimination procedure is usually impossible in more general situations, though. □
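To spell the elimination out (a verification we add; it is not carried out in the text): multiplying the ODE for x1 by x2, the ODE for x2 by x1 and subtracting gives

    x2 x1'' − x1 x2'' = −z x1 x2 + (z x2 + g) x1 = g x1,

and substituting x1 = sin θ, x2 = −cos θ and differentiating twice, the left-hand side collapses to −θ''(sin²θ + cos²θ) = −θ''. Hence −θ'' = g sin θ, which is exactly (1.2).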

The difference between an implicit ODE (with a nonsingular Jacobian matrix) and a DAE is fundamental. Consider the simple example

    x' = z
    0 = x − t.

Clearly, the solution is x = t, z = 1, and no initial or boundary conditions are needed. In fact, if an arbitrary initial condition x(0) = c is imposed it may well be inconsistent with the DAE (unless c = 0, in which case this initial condition is just superfluous). We refer to Chapter 9 for more on this. Another fundamental point to note is that even if consistent initial values are given we cannot expect a simple, general existence and uniqueness theorem like Theorem 1.1 to hold for (1.11). The nonlinear equations (1.11b) alone may have any number of solutions. Again we refer the reader to Chapter 9 for more details.

1.4 Families of Application Problems

Initial-value and boundary-value problems for ODE and DAE systems arise in a wide variety of applications. Often an application generates a family of problems which share a particular system structure and/or solution requirements. Here we briefly mention three families of problems from important applications. The notation we use is typical for these applications, and is not necessarily consistent with (1.1) or (1.11). You don't need to understand the details given in this section in order to follow the rest of the text; this material is supplemental.

Example 1.6 (Mechanical systems) When attempting to simulate the motion of a vehicle for design or in order to simulate safety tests, or in physically based modeling in computer graphics, or in a variety of instances in robotics, one encounters the need for a fast, reliable simulation of the dynamics of multibody systems. The system considered is an assembly of rigid bodies (e.g. comprising a car suspension system). The kinematics define how these bodies are allowed to move with respect to one another. Using generalized position coordinates q = (q1, ..., qn)^T for the bodies, with m (so-called holonomic) constraints gj(t, q(t)) = 0, j = 1, ..., m, the equations of motion can be written as

    d/dt (∂L/∂qi') − ∂L/∂qi = 0,   i = 1, ..., n

where L = T − U − Σ λj gj is the Lagrangian, T is the kinetic energy and U is the potential energy. See almost any book on classical mechanics, for example Arnold [1], or the lighter Marion & Thornton [65]. The resulting equations of motion can be written as

    q' = v                                                    (1.12a)
    M(t, q) v' = f(t, q, v) − G^T(t, q) λ                     (1.12b)
    0 = g(t, q)                                               (1.12c)

where G = ∂g/∂q, M is a positive definite generalized mass matrix, f are the applied forces (other than the constraint forces) and v are the generalized velocities. The system sizes n and m depend on the chosen coordinates q. Typically, using relative coordinates (describing each body in terms of its near neighbor) results in a smaller but more complicated system. If the topology of the multibody system (i.e. the connectivity graph obtained by assigning a node to each body and an edge for each connection between bodies) does not have closed loops, then with a minimal set of coordinates one can eliminate all the constraints (i.e. m = 0) and obtain an implicit ODE in (1.12). For instance, Example 1.1 uses a minimal set of coordinates, while Example 1.5 does not, for a particular multibody system without loops. If the multibody system contains loops (e.g. a robot arm, consisting of two links, with the path of the "hand" prescribed) then the constraints cannot be totally eliminated in general and a DAE must be considered in (1.12) even if a minimal set of coordinates is employed. □

Example 1.7 (Method of lines) The diffusion equation of Example 1.3 is an instance of a time-dependent partial differential equation (PDE) in one space dimension,

    ∂u/∂t = f(t, u, ∂u/∂x, ∂²u/∂x²).                          (1.13)


Time-dependent PDEs naturally arise also in more than one space dimension, with higher order spatial derivatives, and as systems of PDEs. The process described in Example 1.3 is general: such a PDE can be transformed into a large system of ordinary differential equations by replacing the spatial derivatives in one or more dimension by a discrete approximation (via finite-difference, finite-volume or finite-element methods; see texts on numerical methods for PDEs, e.g. Strikwerda [91]). Typically, we obtain an initial value problem. This technique of semi-discretizing in space first and solving an initial value ODE problem in time is referred to as the method of lines. It makes sense when two conditions are satisfied: (i) the 'time' variable t is sufficiently different from the 'space' variables to warrant a special treatment; (ii) there is no sharp front in the solution that moves rapidly as a function of both space and time, i.e. the rapidly moving fronts (if there are any) can be reasonably well-decoupled in time and space. Typically, the method of lines is more suitable for parabolic PDEs than for hyperbolic ones.

Remaining still with the prototype diffusion problem considered in Example 1.3, in some situations the 'special' independent variable is not time but one of the spatial variables. This is the case in some interface problems. Another way to convert a PDE to an ODE system is then to replace the time derivative by a difference approximation. Replacing the time derivative by a simple backward difference approximation using time step Δt in the diffusion equation yields

    (uⁿ − uⁿ⁻¹)/Δt = ∂²uⁿ/∂x² + g(x, uⁿ)

and using u⁰ = q(x) and the given boundary conditions yields a boundary value problem in x for each n. This technique, of replacing the time derivative by a difference approximation and solving the boundary value problem in space, is called the transverse method of lines. □

Example 1.8 (Optimal control) A rather large number of applications give rise to optimal control problems. For instance, the problem may be to plan a route for a vehicle traveling between two points (and satisfying equations of motion) such that fuel consumption is optimized, or the travel time is minimized. Another instance is to optimize the performance of a chemical processing plant. Typically, the state variables of the system, y(t), satisfy an ODE system which involves a control function u(t) [2],

    y' = f(t, y, u),   0 ≤ t ≤ b.                             (1.14a)

[2] The dimension of u(t) is generally different from that of y(t).


This system may be subject to some side conditions, e.g.

    y(0) = c                                                  (1.14b)

but it is possible that y(b) is prescribed as well, or that there are no side conditions at all. The control u(t) must be chosen so as to optimize some criterion (or cost) function, say

    minimize J = φ(y(b), b) + ∫₀ᵇ L(t, y(t), u(t)) dt         (1.15)

subject to (1.14).

The necessary conditions for an optimum in this problem are found by considering the Hamiltonian function

    H(t, y, u, λ) = Σᵢ₌₁ᵐ λᵢ fᵢ(t, y, u) + L(t, y, u)

where λᵢ(t) are adjoint variables, i = 1, ..., m. The conditions

    yᵢ' = ∂H/∂λᵢ,   i = 1, ..., m

yield the state equations (1.14a), and in addition we have ordinary differential equations for the adjoint variables,

    λᵢ' = −∂H/∂yᵢ = −Σⱼ₌₁ᵐ λⱼ ∂fⱼ/∂yᵢ − ∂L/∂yᵢ,   i = 1, ..., m     (1.16)

and

    0 = ∂H/∂uᵢ,   i = 1, ..., mᵤ.                             (1.17)

This gives a DAE in general; however, often u(t) can be eliminated from (1.17) in terms of y and λ, yielding an ODE system. Additional side conditions are required as well,

    λᵢ(b) = ∂φ/∂yᵢ(b),   i = 1, ..., m.                        (1.18)

The system (1.14), (1.16), (1.17), (1.18) comprises a boundary value ODE (or DAE).

An indirect approach for solving this optimal control problem involves the numerical solution of the BVP just prescribed. The techniques described in Chapters 7 and 8 are directly relevant. In contrast, a direct approach involves the discretization of (1.14), (1.15), and the subsequent numerical solution of the resulting large, sparse (but finite dimensional) constrained optimization problem.


Chapter 1: Ordinary Di�erential Equations 15the resulting large, sparse (but �nite dimensional) constrained optimizationproblem. The techniques described in this book are relevant for this approachtoo, although less directly. Each of these two approaches has its advantages(and fans). Note that, even though (1.14) is an IVP, the direct approachdoes not yield a local process, which would have allowed a simple marchingalgorithm, because a change in the problem anywhere has a global e�ect,necessitating a global solution process (as needed for a BVP).A closely related family of applications involves parameter estimation inan ODE system. Given a set of solution data in time (usually obtained byexperiment), the problem is to choose the parameters to minimize a measureof the distance between the data and the solution of the ODE (or DAE)depending on the parameters.We note, furthermore, that optimal control applications often require, inaddition to the above model, also inequality (algebraic) constraints on thecontrols u(t) and on the state variables y(t) (e.g. a maximum speed or accel-eration which must not, or cannot, be exceeded in the vehicle route planningapplication). Such inequality constraints complicate the analysis yielding nec-essary conditions, but we do not pursue this further. There are many bookson optimal control and parameter estimation, e.g. Bryson & Ho [22]. 21.5 Dynamical SystemsRecent years have seen an explosion of interest and e�orts in the study of thelong term, qualitative behavior of various nonlinear ODE systems. Typically,one is interested in the behavior of the ow of a system y0 = f(t;y), not onlyin one trajectory for a given initial value c. Attention is often focussed thenon limit sets (a limit set is a special case of an invariant set, i.e., a set ofinitial data that is mapped into itself by the ow).While most of our book is concerned with the accurate and reliable sim-ulation of solution trajectories, and the reader is not assumed to necessar-ily possess a background in dynamical systems, the techniques we exploreare essential for numerical dynamical systems. Moreover, various additionalchallenges arise when considering the simulation of such qualitative proper-ties. In some cases these additional challenges can be addressed using simpletricks (e.g. for �nding a periodic solution, or for projecting onto a giveninvariant de�ned by algebraic equations), while on other occasions the chal-lenge is rather more substantial (e.g. �nding an invariant set in general, ornumerically integrating an ODE over a very long time period).


Throughout this book we will pay attention to such additional considerations, especially when they extend our investigation in a natural way. We will certainly not attempt to do a complete job; rather, we will point out problems, some solutions and some directions. For much more, we refer the reader to Stuart & Humphries [93].

1.6 Notation

Throughout the book, we use the following conventions for notation.

- Scalar variables and constants are denoted by Roman and Greek letters, e.g. t, u, y, K, L, N, α, λ, etc.

- Vectors are denoted by boldface letters, e.g. f, y, c, etc. The ith component of the vector y is denoted yi. (Distinguish this from the notation yn which will be used later on to denote a vector approximating y at position tn.)

- The maximum norm of a vector is denoted just like the absolute value of a scalar: |y| = max_{1≤i≤m} |yi|. Occasionally the Euclidean vector norm |y|₂ = √(yᵀy) proves more convenient than the maximum norm; we may drop the subscript when the precise vector norm used does not matter or is obvious.

- Capital Roman letters are used for matrices. The induced norms of matrices are denoted by double bars:

      ||A|| = sup_{|x|=1} |Ax| / |x|.

  Occasionally, a boldfaced capital Roman letter, e.g. A, is used for large matrices consisting of blocks which are themselves matrices.

- The (sup) norms of functions are denoted as follows:

      ||y|| = sup_{0≤t≤b} |y(t)|.

- Letters from other alphabets, e.g. D, L, N_h, are used to denote domains and operators. Also, Re and Im denote the real and imaginary parts of a complex scalar, and R is the set of real numbers.


- For a vector function g(x), where g has n components and x has k components (g may depend on other variables too, e.g. g = g(t, x, y)), we denote the Jacobian matrix, i.e. the n × k matrix of first partial derivatives of g with respect to x, by gx or by ∂g/∂x:

      (∂g/∂x)_{i,j} ≡ (gx)_{i,j} = ∂gi/∂xj,   1 ≤ i ≤ n,  1 ≤ j ≤ k.

  We use the Jacobian matrix notation a lot in this book, and occasionally find one of these common notational forms to be clearer than the other in a particular context. Hence we keep them both.

- The gradient of a scalar function of k variables g(x), denoted ∇g(x), is its one-row Jacobian matrix transposed into a vector function:

      ∇g(x) = gxᵀ.

  The divergence of a vector function g(x), where g and x both have k components, is the scalar function denoted by ∇ · g(x) and given by

      ∇ · g(x) = Σᵢ₌₁ᵏ ∂gi/∂xi.



Chapter 2
On Problem Stability

The term stability has been used in the literature for a large variety of different concepts. The basic, qualitative idea is that a model that produces a solution (output) for given data (input) should possess the property that if the input is perturbed by a small amount then the output should also be perturbed by only a small amount. But the precise application of this idea to initial value ODEs, to boundary value ODEs and to numerical methods has given rise to a multitude of definitions. The reader should therefore be careful when speaking of stability to distinguish between stability of problems and of numerical methods, and between stability of initial and boundary value problems.

In this chapter we briefly discuss the stability of initial value problems. No numerical solutions or methods are discussed yet; that will start only in the next chapter. Matrix eigenvalues play a central role here, so we also include a quick review below.


Review: Matrix eigenvalues. Given an m × m real matrix A, an eigenvalue λ is a scalar which satisfies

    Ax = λx

for some vector x ≠ 0. In general, λ is complex, but it is guaranteed to be real if A is symmetric. The vector x, which is clearly determined only up to a scaling factor, is called an eigenvector. Counting multiplicities, A has m eigenvalues.

A similarity transformation is defined, for any nonsingular matrix T, by

    B = T⁻¹AT.

The matrix B has the same eigenvalues as A and the two matrices are said to be similar. If B is diagonal, B = diag{λ1, ..., λm}, then the displayed λi are the eigenvalues of A, the corresponding eigenvectors are the columns of T, and A is said to be diagonalizable. Any symmetric matrix is diagonalizable, in fact by an orthogonal matrix (i.e. T can be chosen to satisfy Tᵀ = T⁻¹). For a general matrix, however, an orthogonal similarity transformation can only bring A to a matrix B in upper triangular form (which, however, still features the eigenvalues on the main diagonal of B).

For a general A there is always a similarity transformation into a Jordan canonical form,

    B = diag{Λ1, Λ2, ..., Λs},    Λi = [ λi  1            ]
                                       [     λi  1        ]
                                       [         ...   1  ]
                                       [              λi  ],   i = 1, ..., s.
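In practice eigenvalues are computed numerically; for instance (a sketch we add, using numpy rather than anything from the text):

```python
# Eigenvalues and eigenvectors of a small real matrix.
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # the matrix arising in Example 2.2 below
lam, T = np.linalg.eig(A)        # eigenvalues; eigenvectors are the columns of T
print(lam)                       # +1 and -1: A is symmetric, so both are real
print(np.allclose(np.linalg.inv(T) @ A @ T, np.diag(lam)))  # similar to diag
```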


2.1 Test Equation and General Definitions

Consider at first the simple scalar ODE, often referred to later as the test equation

    y' = λy                                                   (2.1)

where λ is a constant. We allow λ to be complex, because it represents an eigenvalue of a system's matrix. The solution for t ≥ 0 is

    y(t) = e^{λt} y(0).

If y(t) and ŷ(t) are two solutions of the test equation then their difference for any t depends on their difference at the initial time:

    |y(t) − ŷ(t)| = |(y(0) − ŷ(0)) e^{λt}| = |y(0) − ŷ(0)| e^{Re(λ)t}.

We may consider y(t) as the "exact" solution sought, and ŷ(t) as the solution where the initial data has been perturbed. Clearly then, if Re(λ) ≤ 0 this perturbation difference remains bounded at all later times, if Re(λ) < 0 it decays in time, and if Re(λ) > 0 the difference between the two solutions grows unboundedly with t. These possibilities correspond to a stable, an asymptotically stable and an unstable solution, respectively.

The precise definition for a general ODE system

    y' = f(t, y)                                              (2.2)

is more technical, but the spirit is the same as for the test equation. We consider (2.2) for all t ≥ 0 and define a solution (or trajectory) y(t) to be

- stable if given any ε > 0 there is a δ > 0 such that any other solution ŷ(t) satisfying the ODE (2.2) and

      |y(0) − ŷ(0)| ≤ δ

  also satisfies |y(t) − ŷ(t)| ≤ ε for all t ≥ 0;

- asymptotically stable if, in addition to being stable,

      |y(t) − ŷ(t)| → 0 as t → ∞.

It would be worthwhile for the reader to compare these definitions to the bound (1.5) of the fundamental Existence and Uniqueness Theorem 1.1. Note that the existence theorem speaks of a finite, given integration interval.

These definitions are given with respect to perturbations in the initial data. What we really need to be considering are perturbations at any later time and in the right hand side of (2.2) as well. These correspond to the bound (1.6) in Theorem 1.1 and lead to slightly stronger requirements. But the spirit is already captured in the simple definitions above, and the more complete definitions are left to ODE texts.


Example 2.1 Suppose we integrate a given IVP exactly for t ≥ 0; then we perturb this trajectory at a point t_0 = h by an amount δ = δ(h) and integrate the IVP exactly again for t ≥ t_0, starting from the perturbed value. This process, which resembles the effect of a numerical discretization step, is now repeated a few times. The question is then, how do the perturbation errors propagate? In particular, how far does the value of the last trajectory computed at t = b (≫ h) get from the value of the original trajectory at t = b?

For the test equation (2.1), we can calculate everything precisely. If y(t_0) = c then y(t) = c e^{λ(t−t_0)}. So, starting from y(0) = 1, we calculate the trajectories

    y(t) = y_I(t) = e^{λt},
    y_II(t) = (e^{λh} − δ) e^{λ(t−h)} = e^{λt} − δ e^{λ(t−h)},
    y_III(t) = e^{λt} − δ e^{λ(t−h)} − δ e^{λ(t−2h)},
    ...

For each such step we can define the error due to the jth perturbation,

    e_j(t) = δ e^{λ(t−jh)}.

So, after n steps the difference between the original trajectory and the last one computed at t ≥ nh is

    e(t) = Σ_{j=1}^{n} e_j(t).

Apparently from the form of e_j(t), the errors due to perturbations tend to decrease in time for asymptotically stable problems and to increase in time for unstable problems. This effect is clearly demonstrated in Fig. 2.1, where we took h = 0.1, δ = 0.05 and plotted curves for the values λ = −1, 1, 0. Note that the instability of y' = y can really generate a huge deviation for large t (e.g. t = 30). □


Figure 2.1: Errors due to perturbations for stable and unstable test equations. Panel (a): error propagation for y' = −y; panel (b): error propagation for y' = y; panel (c): error propagation for y' = 0. The original, unperturbed trajectories are in solid curves, the perturbed in dashed. Note that the y-scales in panels (a) and (b) are not the same.
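The experiment of Example 2.1 is straightforward to reproduce; here is a minimal sketch (Python with NumPy; the parameter values are those quoted above, and plotting is omitted) that accumulates the perturbation errors e_j at the final time.

    import numpy as np

    h, delta, b = 0.1, 0.05, 3.0
    for lam in (-1.0, 1.0, 0.0):
        # e_j(t) = delta * exp(lam*(t - j*h)); sum the errors at t = b
        n = int(b / h)
        e = sum(delta * np.exp(lam * (b - j * h)) for j in range(1, n + 1))
        print(f"lambda = {lam:4.1f}:  accumulated error at t = {b} is {e:.4f}")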


2.2 Linear, Constant Coefficient Systems

Here we consider the extension of the test equation analysis to a simple ODE system,

    y' = Ay    (2.3)

where A is a constant m×m matrix. The solution for t ≥ 0 is

    y(t) = e^{At} y(0).    (2.4)

Review: The matrix exponential. The matrix exponential is defined via power series expansion by

    e^{At} = Σ_{n=0}^{∞} t^n A^n / n! = I + tA + t²A²/2 + t³A³/6 + ⋯ .

If A = TΛT^{-1}, where Λ is a diagonal matrix, then it is easy to show that e^{At} = T e^{Λt} T^{-1}, where e^{Λt} = diag(e^{λ_i t}).

Denote the eigenvalues of A by λ_1, λ_2, ..., λ_m, and let

    Λ = diag{λ_1, λ_2, ..., λ_m}

be the diagonal m×m matrix having these eigenvalues as its diagonal elements. If A is diagonalizable then there exists a similarity transformation that carries it into Λ, viz.

    T^{-1} A T = Λ.

Then the change of variables

    w = T^{-1} y

yields the ODE for w

    w' = Λw.

The system for w is decoupled: for each component w_i of w we have a test equation w_i' = λ_i w_i. Therefore, the stability for w, hence also for y, is determined by the eigenvalues: stability is obtained if Re(λ_i) ≤ 0 for all i = 1, ..., m, and asymptotic stability holds if the inequalities are all strict.

In the more general case, A may not be similar to any diagonal matrix. Rather, we face a Jordan canonical form:

    T^{-1} A T = \begin{pmatrix} \Lambda_1 & & 0 \\ & \ddots & \\ 0 & & \Lambda_l \end{pmatrix}


where each Jordan block Λ_i has the form

    \Lambda_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{pmatrix}.

A little more is required then. A short analysis which we omit establishes that in general, the solution of the ODE (2.3) is

- stable iff all eigenvalues λ of A satisfy either Re(λ) < 0, or Re(λ) = 0 and λ is simple (i.e. it belongs to a 1×1 Jordan block);

- asymptotically stable iff all eigenvalues λ of A satisfy Re(λ) < 0.

Example 2.2 Consider the second order ODE

    −u'' + u = 0

obtained by taking p = q = 1 in the vibrating spring example. Writing it as a first order system we obtain

    y' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} y.

The eigenvalues of this matrix are λ_1 = −1 and λ_2 = 1. Hence this initial value problem is unstable. (Note that in Chapter 1 we considered this ODE in the context of a boundary value problem. With appropriate boundary conditions the problem can become stable, as we'll see in Chapter 6.)

Returning to the experiment of Example 2.1, here we have one source of growing error and one source of decreasing error for the IVP. Obviously, after a sufficiently long time the growing perturbation error will dominate, even if it starts from a very small deviation δ. This is why one "bad" eigenvalue of A is sufficient for the onset of instability. □

Example 2.3 The general homogeneous, scalar ODE with constant coefficients,

    a_k u + a_{k-1} u' + ⋯ + a_0 u^{(k)} = 0    (2.5)

(or Σ_{j=0}^{k} a_j d^{k−j}u/dt^{k−j} = 0) with a_0 > 0, can be converted, as we saw in Chapter 1, to a first order ODE system. This gives a special case of (2.3) with m = k,


y_1 = u and

    A = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ -a_k/a_0 & -a_{k-1}/a_0 & \cdots & & -a_1/a_0 \end{pmatrix}.

It is easy to verify that the eigenvalues of this matrix are the roots of the characteristic polynomial

    φ(λ) = Σ_{j=0}^{k} a_j λ^{k−j}.    (2.6)

The solution of the higher order ODE (2.5) is therefore

- stable iff all roots λ of the characteristic polynomial satisfy either Re(λ) < 0, or Re(λ) = 0 and λ is simple;

- asymptotically stable iff all roots λ of the characteristic polynomial satisfy Re(λ) < 0. □

2.3 Linear, Variable Coefficient Systems

The general form of a linear ODE system is

    y' = A(t)y + q(t)    (2.7)

where the m×m matrix A(t) and the m-vector inhomogeneity q(t) are given for each t, 0 ≤ t ≤ b.

We briefly review elementary ODE theory. The fundamental solution Y(t) is the m×m matrix function which satisfies

    Y'(t) = A(t)Y(t),  0 ≤ t ≤ b,    (2.8a)
    Y(0) = I,    (2.8b)


i.e., the jth column of Y(t), often referred to as a mode, satisfies the homogeneous version of the ODE (2.7) with the jth unit vector as initial value. The solution of the ODE (2.7) subject to given initial values

    y(0) = c

is then

    y(t) = Y(t) [ c + ∫_0^t Y^{-1}(s) q(s) ds ].    (2.9)

Turning to stability, it is clear that for a linear problem the difference between two solutions y(t) and ŷ(t) can be directly substituted into (2.9) in place of y(t), with the corresponding differences in data substituted into the right hand side (say c − ĉ in place of c). So the question of stability relates to the boundedness of y(t) for a homogeneous problem (i.e. with q = 0) as we let b → ∞. Then the solution of the ODE is

- stable iff sup_{0≤t<∞} ||Y(t)|| is bounded;

- asymptotically stable iff, in addition to being stable, ||Y(t)|| → 0 as t → ∞.

We can define the stability constant

    κ = sup_{0≤t<∞} ||Y(t)||

in an attempt to get a somewhat more quantitative feeling. But an examination of (2.9) suggests that a more careful definition of the stability constant, taking into account also perturbations in the inhomogeneity, is

    κ = sup_{0≤s≤t<∞} ||Y(t) Y^{-1}(s)||.    (2.10)

Example 2.4 The simple ODE

    y' = (cos t) y

has the eigenvalue λ(t) = cos t and the fundamental solution Y(t) = e^{sin t}. This problem is stable, with a moderate stability constant κ = e² < 8, even though the eigenvalue does not always remain below 0. □
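For this scalar example the stability constant (2.10) can be evaluated directly, since Y(t)Y^{-1}(s) = e^{sin t − sin s}. A minimal numerical check (Python with NumPy; the grid length and resolution are arbitrary) is sketched below.

    import numpy as np

    t = np.linspace(0.0, 20.0, 2001)
    sin_t = np.sin(t)
    running_min = np.minimum.accumulate(sin_t)   # min of sin(s) over s <= t
    kappa = np.exp(sin_t - running_min).max()    # sup_{s<=t} e^{sin t - sin s}
    print(kappa, np.exp(2))                      # both about 7.389, i.e. e^2 < 8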


2.4 Nonlinear Problems

A full exposition of stability issues for nonlinear problems is well beyond the scope of this book. A fundamental difference from the linear case is that the stability depends on the particular solution trajectory considered.

For a given, isolated solution y(t) of (2.2)¹, a linear analysis can be applied locally, to consider trends of small perturbations. Thus, if ŷ(t) satisfies the same ODE with ŷ(0) = ĉ not too far from c, then (under certain conditions) we can ignore the higher order term r(t, y, ŷ) in the Taylor expansion

    f(t, ŷ) = f(t, y) + (∂f/∂y)(ŷ − y) + r(t, y, ŷ)

and consider the linear, variational equation

    z' = A(t, y) z    (2.11)

for z (not ŷ), with the Jacobian matrix A = ∂f/∂y.

¹ i.e., there is some tube in which y(t) is the only solution of (2.2), but no global uniqueness is postulated.

Example 2.5 Often, one is interested in steady state solutions, i.e. when y(t) becomes independent of t, hence y' = 0 = f(y). An example is

    y' = y(1 − y)

which obviously has the steady state solutions y = 0 and y = 1. The Jacobian is A = 1 − 2y, hence A > 0 for the value y = 0 and A < 0 for y = 1. We conclude that the steady state solution y = 0 is unstable whereas the steady state solution y = 1 is stable. Thus, even if we begin the integration of the ODE from an initial value close to the steady state y = 0, 0 < c ≪ 1, the solution y(t) will be repelled from it and attracted to the stable steady state y = 1. □
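The pull toward the stable steady state is easy to observe numerically; the sketch below (Python; a simple explicit time stepping with an arbitrarily small step, anticipating the methods of Chapter 3, stands in for the exact flow) integrates Example 2.5 from an initial value near y = 0.

    f = lambda y: y * (1.0 - y)       # right hand side of Example 2.5
    h, y = 0.01, 0.01                 # small step; start near the unstable steady state
    for n in range(int(20.0 / h)):    # step to t = 20
        y = y + h * f(y)
    print(y)                          # approximately 1: attracted to the stable steady state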


Since the Jacobian matrix depends on the solution trajectory y(t), its eigenvalues do not necessarily retain the same sign throughout the integration interval. It is then possible to have a system with a bounded solution over an arbitrarily long integration interval, which contains time subintervals whose total length is also arbitrarily large, where the system behaves unstably. This is already possible for linear problems with variable coefficients, e.g. Example 2.4, but it is not possible for the constant coefficient problems of §2.2. Through the periods of solution growth, perturbation errors grow as well. Then, unless the system is sufficiently simple so that these errors shrink again, they may remain bounded through stable periods only to grow even further when the system becomes unstable again. This generates an effect of unpredictability, where the effect of errors in data grows uncontrollably even if the solution remains bounded.

Reader's advice: In the following section we give some brief background for material of current research interest. But for those who operate on a need-to-know basis we note that this material appears, in later chapters, only in sections on notes and references and in selected exercises.

2.5 Hamiltonian Systems

A lot of attention has been devoted in recent years to Hamiltonian systems. A Hamiltonian system consists of m = 2l differential equations,

    q_i' = ∂H/∂p_i,    (2.12a)
    p_i' = −∂H/∂q_i,  i = 1, ..., l,    (2.12b)

or in vector notation (with ∇_p H denoting the gradient of H with respect to p, etc.),

    q' = ∇_p H(q, p),  p' = −∇_q H(q, p).

The scalar function H(q, p), assumed to have continuous second derivatives, is the Hamiltonian.²

Differentiating H with respect to time t and substituting (2.12) we get

    H' = ∇_p H^T p' + ∇_q H^T q' = 0,

so H(q, p) is constant for all t. A typical example to keep in mind is that of a conservative system of particles. Then the components of q(t) are the generalized positions of the particles, and those of p(t) are the generalized momenta. The Hamiltonian H in this case is the total energy (the sum of kinetic and potential energies), and the constancy of H is a statement of conservation of energy.

Next, consider an autonomous ODE system of order m = 2,

    y' = f(y)

² In Chapters 9 and 10 we use e instead of H to denote the Hamiltonian.


with y(0) = (y_1(0), y_2(0))^T ∈ B, for some set B in the plane. Each initial value y(0) = c from B spawns a trajectory y(t) = y(t; c), and we can follow the evolution of the set B under this flow,

    S(t)B = { y(t; c) : c ∈ B }.

We then ask how the area of S(t)B compares to the initial area of B: does it grow or shrink in time? It is easy to see for linear problems that this area shrinks for asymptotically stable problems and grows for unstable problems (recall Example 2.1). It is less easy to see, but it can be shown, that the area of S(t)B remains constant, even for nonlinear problems, if the divergence of f vanishes,

    ∇·f = ∂f_1/∂y_1 + ∂f_2/∂y_2 = 0.

This remains valid for m > 2 provided that ∇·f = 0, with an appropriate extension of the concept of volume in m dimensions.

Now, for a Hamiltonian system with l = 1,

    q' = H_p,  p' = −H_q,

we have

    ∇·f = ∂²H/∂p∂q − ∂²H/∂q∂p = 0,

hence the Hamiltonian flow preserves area. In more dimensions, l > 1, it turns out that the area of each projection of S(t)B on a q_i × p_i plane, i = 1, ..., l, is preserved, and this property is referred to as a symplectic map.

Since a Hamiltonian system cannot be asymptotically stable, its stability (if it is stable, which is true in case H can be considered a norm at each t, e.g. if H is the total energy of a friction-free multibody system) is in a sense marginal. The solution trajectories do not simply decay to a rest state, and their long-time behavior is therefore of interest. This leads to some serious numerical challenges.

We conclude this brief exposition with a simple example.

Example 2.6 The simplest Hamiltonian system is the linear harmonic oscillator. The quadratic Hamiltonian

    H = (ω/2)(p² + q²)

yields the linear equations of motion

    q' = ωp,  p' = −ωq,


or

    \begin{pmatrix} q \\ p \end{pmatrix}' = ω J \begin{pmatrix} q \\ p \end{pmatrix},   J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.

Here ω > 0 is a known parameter. The general solution is

    \begin{pmatrix} q(t) \\ p(t) \end{pmatrix} = \begin{pmatrix} \cos ωt & \sin ωt \\ -\sin ωt & \cos ωt \end{pmatrix} \begin{pmatrix} q(0) \\ p(0) \end{pmatrix}.

Hence, S(t)B is just a rotation of the set B at a constant rate depending on ω. Clearly, this keeps the area of B unchanged.

Note that the eigenvalues of J are purely imaginary. Thus, a small "push" (i.e. a perturbation of the system) of these eigenvalues towards the positive half plane can make the system unstable. □

2.6 Notes and References

There are many books and papers on the subject of this chapter. The books by Hairer, Norsett & Wanner [50], Mattheij & Molnaar [67] and Stuart & Humphries [93] treat the theory carefully with computations in mind, so we recommend them in particular. See also [77, 47, 8, 55, 4]. For Hamiltonian systems, see [82, 93, 50].

2.7 Exercises

1. For each of the following constant coefficient systems y' = Ay, determine if the system is stable, asymptotically stable or unstable.

   (a) A = \begin{pmatrix} -1 & 0 \\ 0 & -100 \end{pmatrix}   (b) A = \begin{pmatrix} -1 & 10 \\ 0 & -2 \end{pmatrix}

   (c) A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}   (d) A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}


2. (a) Compute the eigenvalues of the matrix

       A(t) = \begin{pmatrix} -\frac{1}{4} + \frac{3}{4}\cos 2t & 1 - \frac{3}{4}\sin 2t \\ -1 - \frac{3}{4}\sin 2t & -\frac{1}{4} - \frac{3}{4}\cos 2t \end{pmatrix}

   (b) Determine whether the variable coefficient system y' = A(t)y is stable, asymptotically stable or unstable.

   [You may want to use T(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.]

3. The Lyapunov function is an important tool for analyzing stability of nonlinear problems. The scalar, C¹-function V(y) is a Lyapunov function at ȳ if

       (d/dt) V(y(t)) ≤ 0    (2.13)

   for all y in a neighborhood of ȳ. If also V(ȳ) = 0 and V(y) > 0 in the neighborhood then V is a positive definite Lyapunov function at ȳ.

   It can be shown that if ȳ is a steady state solution of (2.2) then ȳ is stable if there is a corresponding positive definite Lyapunov function. If the inequality in (2.13) is sharp (except at y = ȳ) then the steady state solution is asymptotically stable.

   (a) Construct a suitable Lyapunov function to show that ȳ = 1 is stable in Example 2.5. (You should find it difficult to construct a similar function for the other steady state, ȳ = 0, for this example.)

   (b) Let U(y) be a smooth, scalar function with a minimum at ȳ (note that y is not necessarily scalar), and consider the system

       y' = −∇_y U = −∂U/∂y.

       Show that ȳ is a stable steady state solution of this nonlinear ODE system.

4. Consider a nonlinear ODE system (2.2) which has an invariant set M defined by the equations

       h(t, y) = 0    (2.14)

   i.e., assuming that the initial conditions satisfy h(0, y(0)) = 0, the solution of the ODE satisfies h(t, y(t)) = 0 for all later times t ≥ 0.


   Let us assume below, to save on notation, that f and h are autonomous. Define the Jacobian matrix

       H(y) = ∂h/∂y

   and assume that it has full row rank for all t (in particular, there are no more equations in (2.14) than in (2.2)).

   Next we stabilize the vector field, replacing the autonomous (2.2) by

       y' = f(y) − γ H^T (HH^T)^{-1} h(y).    (2.15)

   (a) Show that if h(y(0)) = 0 then the solution of (2.15) coincides with that of the original y' = f(y).

   (b) Show that if there is a constant γ_0 such that

       |Hf(y)|_2 ≤ γ_0 |h(y)|_2

       for all y in the neighborhood of the invariant set M, then M becomes asymptotically stable, i.e. |h(y(t))| decreases in t for trajectories of (2.15) starting near M, provided that γ ≥ γ_0.




Chapter 3
Basic Methods, Basic Concepts

We begin our discussion of numerical methods for initial value ODEs with an introduction of the most basic concepts involved. To illustrate these concepts, we use three simple discretization methods: forward Euler, backward Euler (also called implicit Euler), and trapezoidal. The problem to be solved is written, as before, in the general form

    y' = f(t, y),  0 ≤ t ≤ b,    (3.1)

with y(0) = c given. You can think of this at first as a scalar ODE; most of what we are going to discuss generalizes to systems directly, and we will highlight occasions where the size of the system is important.

We will assume sufficient smoothness and boundedness on f(t, y) so as to guarantee a unique existence of a solution y(t) with as many bounded derivatives as referred to in the sequel. This assumption will be relaxed in §3.7.

3.1 A Simple Method: Forward Euler

To approximate (3.1), we first discretize the interval of integration by a mesh

    0 = t_0 < t_1 < ⋯ < t_{N-1} < t_N = b

and let h_n = t_n − t_{n-1} be the nth step size. We then construct approximations

    y_0 (= c), y_1, ..., y_{N-1}, y_N,

with y_n an intended approximation of y(t_n).


In the case of an initial value problem we know y_0 and may proceed to integrate the ODE in steps, where on each step n (1 ≤ n ≤ N) we know an approximation y_{n-1} at t_{n-1} and we seek y_n at t_n. Thus, as we progress towards t_n we do not need an advance knowledge of the entire mesh beyond it (or even of N, for that matter). Let us concentrate on one such step, n (≥ 1).

Review: Order notation. Throughout this book we consider various computational errors depending on a discretization step-size h > 0, and ask how they decrease as h decreases. We denote, for a vector d depending on h,

    d = O(h^p)

if there are two positive constants p and C such that for all h > 0 small enough,

    |d| ≤ C h^p.

For example, comparing (3.2) and (3.3) we see that in (3.3) the order notation involves a constant C which bounds (1/2)||y''||.

In other instances, such as when estimating the efficiency of a particular algorithm, we are interested in a bound on the work estimate as a parameter N increases unboundedly (e.g. N = 1/h). For instance,

    w = O(N log N)

means that there is a constant C such that

    w ≤ C N log N

as N → ∞. It will be easy to figure out from the context which of these two meanings is the relevant one.

To construct a discretization method consider Taylor's expansion

    y(t_n) = y(t_{n-1}) + h_n y'(t_{n-1}) + (1/2) h_n² y''(t_{n-1}) + ⋯,    (3.2)

which we can also write, using the order notation, as

    y(t_n) = y(t_{n-1}) + h_n y'(t_{n-1}) + O(h_n²).    (3.3)

The forward Euler method can be derived by dropping the rightmost term in this Taylor expansion and replacing y' by f, yielding the scheme

    y_n = y_{n-1} + h_n f(t_{n-1}, y_{n-1}).    (3.4)


This is a simple, explicit method: starting from y_0 = c we apply (3.4) iteratively for n = 1, 2, ..., N. The effect of the approximation is depicted in Fig. 3.1. The curved lines represent a family of solutions for the ODE

Figure 3.1: The forward Euler method. The exact solution is the curved solid line. The numerical values are circled. The broken line interpolating them is tangential at the beginning of each step to the ODE trajectory passing through that point (dashed lines).

with different initial values. At each step, the approximate solution y_{n-1} is on one of these curves at t_{n-1}. The forward Euler step amounts to taking a straight line in the tangential direction to the exact trajectory starting at (t_{n-1}, y_{n-1}), continuing until the end of the step. (Recall Example 2.1.) One hopes that if h is small enough then y_n is not too far from y(t_n). Let us assess this hope.
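A forward Euler integrator takes only a few lines. The following sketch (Python with NumPy; the test problem and step count are arbitrary illustrations) implements (3.4) for a scalar ODE on a uniform mesh.

    import numpy as np

    def forward_euler(f, c, b, N):
        """Integrate y' = f(t, y), y(0) = c, on [0, b] with N uniform steps (3.4)."""
        h = b / N
        t = np.linspace(0.0, b, N + 1)
        y = np.empty(N + 1)
        y[0] = c
        for n in range(1, N + 1):
            y[n] = y[n - 1] + h * f(t[n - 1], y[n - 1])
        return t, y

    # Example: y' = -y, y(0) = 1, whose exact solution is e^{-t}.
    t, y = forward_euler(lambda t, y: -y, 1.0, 3.0, 30)
    print(abs(y[-1] - np.exp(-3.0)))   # a small error, decreasing like O(h)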


3.2 Convergence, Accuracy, Consistency and 0-Stability

We now rewrite Euler's method (3.4) in a form compatible with the approximated ODE,

    (y_n − y_{n-1})/h_n − f(t_{n-1}, y_{n-1}) = 0.

To formalize a bit, let the difference operator

    N_h u(t_n) ≡ (u(t_n) − u(t_{n-1}))/h_n − f(t_{n-1}, u(t_{n-1}))    (3.5)

be applied for n = 1, 2, ..., N for any function u defined at mesh points with u(t_0) specified, and consider y_h to be a mesh function which takes on the value y_n at each t_n, n = 0, 1, ..., N. Then the numerical method is given by

    N_h y_h(t_n) = 0

(with y_0 = c).

Much of the study of numerical ODEs is concerned with the errors on each step that are due to the difference approximation, and how they accumulate. One measure of the error made on each step is the local truncation error. It is the residual of the difference operator when it is applied to the exact solution,

    d_n = N_h y(t_n).    (3.6)

The local truncation error measures how closely the difference operator approximates the differential operator. This definition of the local truncation error applies to other discretization methods as well (they differ from one another in the definition of the difference operator). The difference method is said to be consistent (or accurate) of order p if

    d_n = O(h_n^p)    (3.7)

for a positive integer p.

For the forward Euler method (3.4), the Taylor expansion (3.2) yields

    d_n = (h_n/2) y''(t_n) + O(h_n²),

so the method is consistent of order 1.

A straightforward design of difference approximations to derivatives naturally leads to consistent approximations to differential equations. However, our real goal is not consistency but convergence. Let

    h = max_{1≤n≤N} h_n


and assume Nh is bounded independently of N. The difference method is said to be convergent of order p if the global error e_n, where e_n = y_n − y(t_n), e_0 = 0, satisfies

    e_n = O(h^p)    (3.8)

for n = 1, 2, ..., N. The positive integer p does not really have to be the same as the one in (3.7) for the definition to hold. But throughout this book we will consider methods where the order of convergence is inherited from the order of accuracy. For this we need 0-stability.

The difference method is 0-stable if there are positive constants h_0 and K such that for any mesh functions x_h and z_h with h ≤ h_0,

    |x_n − z_n| ≤ K { |x_0 − z_0| + max_{1≤j≤N} |N_h x_h(t_j) − N_h z_h(t_j)| },  1 ≤ n ≤ N.    (3.9)

What this bound says in effect is that the difference operator is invertible, and that its inverse is bounded by K. Note the resemblance between (3.9) and the bound (1.6) which the differential operator satisfies. The bound in (3.9) measures the effect on the numerical solution of small perturbations in the data. The importance of this requirement lies in the following fundamental theorem.

Theorem 3.1  Consistency + 0-stability ⇒ convergence.

In fact, if the method is consistent of order p and 0-stable, then it is convergent of order p:

    |e_n| ≤ K max_j |d_j| = O(h^p).    (3.10)

The proof of this fundamental theorem is immediate: simply let x_n ← y_n and z_n ← y(t_n) in the stability bound (3.9), and use the definitions of accuracy and local truncation error. □

Turning to the forward Euler method, by this fundamental convergence theorem we will obtain convergence of order 1 (assuming that a bounded y'' exists) if we show that the 0-stability bound (3.9) holds. To see this, denote

    s_n = x_n − z_n,  δ = max_{1≤j≤N} |N_h x_h(t_j) − N_h z_h(t_j)|.

Then for each n,

    δ ≥ | (s_n − s_{n-1})/h_n − (f(t_{n-1}, x_{n-1}) − f(t_{n-1}, z_{n-1})) |
      ≥ |s_n|/h_n − | s_{n-1}/h_n + (f(t_{n-1}, x_{n-1}) − f(t_{n-1}, z_{n-1})) |.


Using Lipschitz continuity,

    | s_{n-1}/h_n + (f(t_{n-1}, x_{n-1}) − f(t_{n-1}, z_{n-1})) | ≤ |s_{n-1}|/h_n + L|s_{n-1}| = (1/h_n + L)|s_{n-1}|,

so that

    |s_n| ≤ (1 + h_n L)|s_{n-1}| + h_n δ
          ≤ (1 + h_n L)[(1 + h_{n-1} L)|s_{n-2}| + h_{n-1} δ] + h_n δ
          ≤ ⋯
          ≤ (1 + h_1 L) ⋯ (1 + h_{n-1} L)(1 + h_n L)|s_0| + δ Σ_{j=1}^{n} h_j (1 + h_{j+1} L) ⋯ (1 + h_n L)
          ≤ e^{L t_n}|s_0| + (1/L)(e^{L t_n} − 1) δ.

The last inequality above is obtained by noting that 1 + hL ≤ e^{Lh} implies

    (1 + h_{j+1} L) ⋯ (1 + h_n L) ≤ e^{L(t_n − t_j)},  0 ≤ j ≤ n,

and also

    Σ_{j=1}^{n} h_j e^{L(t_n − t_j)} ≤ Σ_{j=1}^{n} ∫_{t_{j-1}}^{t_j} e^{L(t_n − t)} dt = e^{L t_n} ∫_0^{t_n} e^{−Lt} dt = (1/L)(e^{L t_n} − 1).

The stability bound is therefore satisfied, with K = max{ e^{Lb}, (1/L)(e^{Lb} − 1) } in (3.9).

It is natural to ask next if the error bound (3.10) is useful in practice, i.e., if it can be used to reliably estimate the step size h needed to achieve a given accuracy. This is a tempting possibility. For instance, let M be an estimated bound on ||y''||. Then the error using forward Euler can be bounded by

    |e_n| ≤ (hM/2L)(e^{L t_n} − 1),  1 ≤ n ≤ N.    (3.11)

However, it turns out that this bound is too pessimistic in many applications, as the following example indicates.

Example 3.1 Consider the scalar problem

    y' = −5ty² + 5/t − 1/t²,  y(1) = 1,

for 1 ≤ t ≤ 25. (Note that the starting point of the integration is t = 1, not t = 0 as before. But this is of no significance: just change the independent variable to τ = t − 1.) The exact solution is y(t) = 1/t.


To estimate the Lipschitz constant L, note that near the exact solution,

    f_y = −10ty ≈ −10.

Similarly, use the exact solution to estimate M = 2 ≥ 2/t³. Substituting this into (3.11) yields the bound

    |e_n| ≤ (h/10) e^{10(t_n − 1)},

so |e_N| ≤ (h/10) e^{240}, not a very useful bound at all. □

We will be looking in later chapters into the question of realistic error estimation.

We close this section by mentioning another, important measure of the error made at each step, the local error. It is defined as the amount by which the numerical solution y_n at each step differs from the solution ȳ(t_n) to the initial value problem

    ȳ'(t) = f(t, ȳ(t)),    (3.12)
    ȳ(t_{n-1}) = y_{n-1}.

Thus the local error is given by

    l_n = ȳ(t_n) − y_n.    (3.13)

Under normal circumstances, it can be shown that the numerical solution exists and

    |d_n| = |N_h ȳ(t_n)| + O(h^{p+1}).

Moreover, it is easy to show, for all of the numerical ODE methods considered in this book, that¹

    h_n |N_h ȳ(t_n)| = |l_n| (1 + O(h_n)).    (3.14)

The two local error indicators, h_n d_n and l_n, are thus often closely related.

¹ We caution here that for stiff problems, to be discussed in §3.4, the constant implied in this O(h_n) may be quite large.
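The gap between the pessimistic bound and actual behavior is easy to probe numerically; here is a sketch (Python; the step sizes are chosen for illustration) that applies forward Euler to Example 3.1 and observes errors that are orders of magnitude below (h/10)e^{240}, shrinking roughly linearly with h.

    f = lambda t, y: -5.0 * t * y**2 + 5.0 / t - 1.0 / t**2   # Example 3.1

    for h in (0.1, 0.05, 0.025):
        N = int(round(24.0 / h))        # integrate on [1, 25]
        t, y, err = 1.0, 1.0, 0.0
        for n in range(N):
            y = y + h * f(t, y)         # forward Euler step
            t = t + h
            err = max(err, abs(y - 1.0 / t))
        print(f"h = {h}: max error = {err:.2e}")   # error roughly halves with h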


3.3 Absolute Stability

Example 3.1 may make one wonder about the meaning of the fundamental convergence Theorem 3.1. The theorem is not violated: we still have that |e_n| ≤ Ch for some constant C, even if large, so as h → 0, |e_n| → 0. However, the theorem may not give a quantitative indication of what happens when we actually compute with a step size h not very small. (The name "0-stability" now becomes more intuitive: this concept deals with the limit of h → 0.) The basic reason why the constant in this example is so pessimistically large is that while f_y ≈ −10, i.e. the exact solution mode is decaying, the stability bound uses the Lipschitz constant L = 10, and consequently is exponentially increasing.

For large step-sizes, the difference equation should mimic the behavior of the differential equation in the sense that their stability properties should be similar.

What stability requirements arise for h which is not vanishingly small? Consider the scalar test equation

    y' = λy,    (3.15)

where λ is a complex constant (complex because later we will be looking at ODE systems, and there λ corresponds to an eigenvalue). If y(0) = c (say c > 0 for notational convenience), then the exact solution is

    y(t_n) = c e^{λ t_n},

whereas Euler's method, with a uniform step size h_n = h, gives

    y_n = y_{n-1} + hλ y_{n-1} = (1 + hλ) y_{n-1} = ⋯ = c(1 + hλ)^n.

Let us distinguish between three cases (cf. Example 2.1).

- If Re(λ) > 0, then |y(t)| = c e^{Re(λ)t} grows exponentially with t. This is an unstable problem, although for e^{Re(λ)b} not too large, one can still compute solutions which are meaningful in the relative sense. In this case, the error bound (3.10) is realistic. For unstable problems, the distance between solution curves increases in time.

- If Re(λ) = 0, the solution is oscillating (unless λ = 0) and the distance between solution curves stays the same.

- If Re(λ) < 0, then |y(t)| decays exponentially. The distance between solution curves decreases. The problem is (asymptotically) stable, and we cannot tolerate growth in |y_n|. This is usually the interesting case, and it yields an additional absolute stability requirement,

    |y_n| ≤ |y_{n-1}|,  n = 1, 2, ....    (3.16)


For a given numerical method, the region of absolute stability is that region of the complex z-plane such that applying the method for the test equation (3.15), with z = hλ from within this region, yields an approximate solution satisfying the absolute stability requirement (3.16).

For the forward Euler method we obtain the condition

    |1 + hλ| ≤ 1,    (3.17)

which yields the region of absolute stability depicted in Fig. 3.2.

Figure 3.2: Absolute stability region for the forward Euler method (the disk |1 + z| ≤ 1 in the complex z-plane).

For instance, if λ is negative, then h must be restricted to satisfy

    h ≤ 2/(−λ).

For Example 3.1 this gives h < .2. In this case the restriction is not practically unbearable. To see the effect of violating the absolute stability restriction, we plot in Fig. 3.3 the approximate solutions obtained with uniform step sizes h = .19 and h = .21. For h = .19 the solution profile looks like the exact one (y = 1/t). The other, oscillatory profile, is obtained for h = .21, which is outside the absolute stability region. When computing with h = .4 using the same forward Euler method and floating point arithmetic with a 14-hexadecimal-digit mantissa (this is the standard "double precision" in IEEE FORTRAN, for example), the computed solution oscillates and then blows up (i.e., overflow is detected) before reaching t = 25.


Figure 3.3: Approximate solutions for Example 3.1 using the forward Euler method, with h = .19 and h = .21. The oscillatory profile corresponds to h = .21; for h = .19 the qualitative behavior of the exact solution is obtained.

It is important to understand that the absolute stability restriction is indeed a stability, not accuracy, requirement. Consider the initial value c = 10^{-15} for the test equation with Re(λ) < 0, so the exact solution is approximated very well by the constant 0. Such an initial value corresponds to an unavoidable perturbation in the numerical method, due to roundoff errors. Now, the forward Euler solution corresponding to this initial perturbation of 0 remains very close to 0 for all t_n > 0, like the exact solution, when using any h from the absolute stability region, but it blows up as n increases if h is from outside that region, i.e. if |1 + hλ| > 1.

The concept of absolute stability was defined with respect to a very simple test equation (3.15), an ODE whose numerical solution is not a computationally challenging problem in itself. Nonetheless, it turns out that absolute stability gives useful information, at least qualitatively, in more general situations, where complicated systems of nonlinear ODEs are integrated numerically.
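The h = .19 versus h = .21 experiment can be reproduced with a few lines; a sketch (Python; the step sizes are those from the text) follows.

    f = lambda t, y: -5.0 * t * y**2 + 5.0 / t - 1.0 / t**2   # Example 3.1 again

    for h in (0.19, 0.21):
        t, y = 1.0, 1.0
        while t < 25.0 - 1e-12:
            y = y + h * f(t, y)     # forward Euler step
            t = t + h
        # h = 0.19 ends near the exact 1/t; h = 0.21 oscillates noticeably
        print(f"h = {h}: y at t = {t:.2f} is {y:.4f}, exact {1.0/t:.4f}")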


Reader's advice: Those readers who are prepared to trust us on the above statement may wish to skip the rest of this section, at least on first reading (especially if your linear algebra needs some dusting).

We now consider the extension of the test equation analysis to a simple ODE system,

    y' = Ay,    (3.18)

where A is a constant, diagonalizable, m×m matrix.

Denote the eigenvalues of A by λ_1, λ_2, ..., λ_m, and let

    Λ = diag{λ_1, λ_2, ..., λ_m}

be the diagonal m×m matrix composed of these eigenvalues. Again, the interesting case is when (3.18) is stable, i.e. Re(λ_j) ≤ 0, j = 1, ..., m. The diagonalizability of A means that there is a nonsingular matrix T, consisting of the eigenvectors of A (scaled to have unit Euclidean norm, say), such that

    T^{-1} A T = Λ.

Consider the following change of dependent variables,

    w = T^{-1} y.

For w(t) we obtain, upon multiplying (3.18) by T^{-1} and noting that T is constant in t, the decoupled system

    w' = Λw.    (3.19)

The components of w are separated, and for each component we get a scalar ODE in the form of the test equation (3.15) with λ = λ_j, j = 1, ..., m. Moreover, since A and therefore T are constant, we can apply the same transformation to the discretization: let w_n = T^{-1} y_n, for all n. Then the forward Euler method for (3.18),

    y_n = y_{n-1} + h_n A y_{n-1},

transforms into

    w_n = w_{n-1} + h_n Λ w_{n-1},

which is the forward Euler method for (3.19). The same commutativity of the discretization and the w-transformation (in the case that T is constant!) holds for other discretization methods as well.


Now, for the decoupled system (3.19), where we can look at each scalar ODE separately, we obtain that if the h_n are chosen such that hλ_1, hλ_2, ..., hλ_m are all in the absolute stability region of the difference method (recall h = max_n h_n), then

    |w_n| ≤ |w_{n-1}| ≤ ⋯ ≤ |w_0|,

so

    |y_n| ≤ ||T|| |w_n| ≤ ⋯ ≤ ||T|| |w_0| ≤ ||T|| ||T^{-1}|| |y_0|.

Denoting by

    cond(T) = ||T|| ||T^{-1}||    (3.20)

the condition number of the eigenvector matrix T (measured in the norm induced by the vector norm used for |y_n|), we obtain the stability bound

    |y_n| ≤ cond(T) |c|,  n = 0, 1, ..., N    (3.21)

(recall y(0) = c).

Note that in general the stability constant cond(T) is not guaranteed to be of moderate size, although it is independent of n, and it may often depend on the size m of the ODE system. An additional complication arises when A is not diagonalizable. The considerations here are very similar to those arising in eigenvalue sensitivity analysis in linear algebra. Indeed the essential question is similar too: how representative are the eigenvalues of the properties of the matrix A as a whole?

But there are important special cases where we encounter more favorable winds. If A is (real and) symmetric, then not only are its eigenvalues real, also its eigenvectors are orthogonal to one another. We may therefore choose T to be orthogonal, i.e.,

    T^{-1} = T^T.

In this case it is advantageous to use the Euclidean norm l_2, because we get

    cond(T) = 1

regardless of the size of the system. Thus, if h min_{1≤j≤m} λ_j is in the absolute stability region of the difference method, then (3.21) yields the stability bound in the l_2 norm,

    y_n^T y_n ≤ c^T c,  0 ≤ n ≤ N.    (3.22)

The importance of obtaining a bound on cond(T) which is independent of m increases, of course, when m is large. Such is the case for the method of


lines (Examples 1.3 and 1.7), where the ODE system arises from a spatially discretized time-dependent PDE. In this case m is essentially the number of spatial grid points. This is worked out further for some instances in Exercises 3.6 and 3.7.

3.4 Stiffness: Backward Euler

Ideally, the choice of step size h_n should be dictated by approximation accuracy requirements. But we just saw that when using the forward Euler method (and, as it turns out, many other methods too), h_n must be chosen sufficiently small to obey an additional, absolute stability restriction as well. Loosely speaking, the initial value problem is referred to as being stiff if this absolute stability requirement dictates a much smaller step size than is needed to satisfy approximation requirements alone. In this case other methods, which do not have such a restrictive absolute stability requirement, should be considered.

To illustrate this, consider a simple example.

Example 3.2 The scalar problem

    y' = −100(y − sin t),  t ≥ 0,  y(0) = 1,

has a solution which starts at the given initial value and varies rapidly. But after a short while, say for t ≥ 0.03, y(t) varies much more slowly, satisfying y(t) ≈ sin t; see Fig. 3.4. For the initial small interval of rapid change (commonly referred to as an initial layer or transient), we expect to use small step sizes, so that 100h_n ≤ 1, say. This is within the absolute stability region of the forward Euler method. But when y(t) ≈ sin t, accuracy considerations alone allow a much larger step size, so we want 100h_n ≫ 2. A reasonable mesh is plotted using markers on the t axis in Fig. 3.4. Obviously, however, the plotted solution in this figure was not found using the forward Euler method (but rather, using another method) with this mesh, because the absolute stability restriction of the forward Euler method is severely violated here. □

Scientists often describe stiffness in terms of multiple time scales. If the problem has widely varying time scales, and the phenomena (or, solution modes) that change on fast scales are stable, then the problem is stiff. For example, controllers are often designed to bring a system rapidly back to a steady state and are thus a source of stiffness. In chemically reacting systems, stiffness often arises from the fact that some chemical reactions occur much more rapidly than others.
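The stiffness barrier is easy to exhibit; the sketch below (Python; parameter values from Example 3.2, step sizes chosen for illustration) shows forward Euler tracking the smooth solution when 100h = 1 but exploding when 100h = 5, even though by then the exact solution is slowly varying.

    import math

    f = lambda t, y: -100.0 * (y - math.sin(t))   # Example 3.2

    for h in (0.01, 0.05):                        # 100h = 1 (stable) vs 100h = 5 (unstable)
        t, y = 0.0, 1.0
        for n in range(int(3.0 / h)):
            y = y + h * f(t, y)                   # forward Euler step
            t = t + h
        print(f"h = {h}: y(3) approx {y:.3e}, sin(3) = {math.sin(3.0):.3e}")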


Figure 3.4: Approximate solution and plausible mesh (marked on the t axis), Example 3.2.

The concept of stiffness is best understood in qualitative, rather than quantitative, terms. In general, stiffness is defined in terms of the behavior of an explicit difference method, and the behavior of forward Euler is typical of such methods.

Definition 3.1 An IVP is stiff in some interval [0, b] if the step size needed to maintain stability of the forward Euler method is much smaller than the step size required to represent the solution accurately.

We note that stiffness depends, in addition to the differential equation itself, on the

- accuracy criterion,
- length of the interval of integration,
- region of absolute stability of the method.

In Example 3.2, for a moderate error tolerance, the problem is stiff after about t = 0.03. If it were required to solve the problem to great accuracy, then it would not be stiff because the step size would need to be small in order to attain that accuracy, and hence would not be restricted by stability.

For stable, homogeneous, linear systems, stiffness can be determined by the system's eigenvalues. For the test equation (3.15) on [0, b], we say that


the problem is stiff if

    b Re(λ) ≪ −1.    (3.23)

Roughly, the general ODE system (3.1) is stiff in a neighborhood of the solution y(t) if there exists, for some bounded data, a component of y which decays rapidly on the scale of the interval length b. In the general case, stiffness can often be related to the eigenvalues λ_j of the local Jacobian matrix f_y(t, y(t)), generalizing (3.23) to

    b min_j Re(λ_j) ≪ −1.    (3.24)

Thus, we look for methods which do not violate the absolute stability requirement when applied to the test equation (3.15), even when hRe(λ) ≪ −1. Such a method is the backward Euler method. It is derived for the general ODE (3.1) just like the forward Euler method, except that everything is centered at t_n, rather than at t_{n-1}. This gives the first-order method

    y_n = y_{n-1} + h_n f(t_n, y_n).    (3.25)

Geometrically, instead of using the tangent at (t_{n-1}, y_{n-1}), as in the forward Euler method, the backward Euler method uses the tangent at the future point (t_n, y_n), thus enhancing the stability. The local truncation error of this method is similar in magnitude to that of the forward Euler method, and correspondingly the convergence bound (3.10) is similar too (we leave the 0-stability proof in this case as an exercise). The two major differences between these simple methods are:

- While the forward Euler method is explicit, the backward Euler method is implicit: the unknown vector y_n at each step appears on both sides of the equation (3.25), generally in a nonlinear expression. Consequently, a nonlinear system of algebraic equations has to be (approximately) solved at each step. That's the bad news for backward Euler.

- The good news is the method's stability. Applying the backward Euler method (3.25) to the test equation, we obtain

    y_n = y_{n-1} + hλ y_n,

i.e.,

    y_n = (1 − hλ)^{-1} y_{n-1}.

The amplification factor, i.e. what multiplies |y_{n-1}| to get |y_n| in absolute value, satisfies

    1/|1 − hλ| ≤ 1


for all values of h > 0 and λ satisfying Re(λ) ≤ 0. In particular, there is no absolute stability prohibition from taking hRe(λ) ≪ −1, e.g. in Example 3.2.

The region of absolute stability of the backward Euler method is depicted in Fig. 3.5.

Figure 3.5: Absolute stability region for the backward Euler method (the method is stable everywhere outside the shaded disk |1 − z| < 1 in the complex z-plane).

It contains, in addition to the entire left half-plane of z = hλ, also a major part of the right half-plane. (The latter is a mixed blessing, though, as will be discussed later on.) For a given stiff problem, the backward Euler method needs fewer steps than the forward Euler method. In general, however, each backward Euler step may be more expensive in terms of computing time. Still, there are many applications where the overall computational expense using the implicit method is much less than with the explicit Euler method.

For an implicit method like backward Euler, a nonlinear system of equations must be solved in each time step. For backward Euler, this nonlinear system is

    g(y_n) = y_n − y_{n-1} − h f(t_n, y_n) = 0

(where h = h_n for notational simplicity). There are a number of ways to solve this nonlinear system. We mention two basic ones.


Functional iteration

Our first impulse might be to solve the nonlinear system by functional iteration. This yields

    y_n^{ν+1} = y_{n-1} + h f(t_n, y_n^ν),  ν = 0, 1, ...,

where we can choose y_n^0 = y_{n-1}, for instance. (Note that ν is an iteration counter, not a power.)

The advantage here is simplicity. However, the convergence of this iteration requires h||∂f/∂y|| < 1 in some norm.² For stiff systems, ∂f/∂y is large, so the step size h would need to be restricted and this would defeat the purpose of using the method.

Example 3.3 Let us generalize the ODE of Example 3.1 to

    y' = λ(ty² − t^{-1}) − t^{-2},  t > 1,

where λ < 0 is a parameter. With y(1) = 1, the exact solution is still y(t) = 1/t. The backward Euler method gives a nonlinear equation for y_n,

    y_n − y_{n-1} = h_n λ (t_n y_n² − t_n^{-1}) − h_n t_n^{-2},    (3.26)

and functional iteration reads

    y_n^{ν+1} − y_{n-1} = h_n λ (t_n (y_n^ν)² − t_n^{-1}) − h_n t_n^{-2},  ν = 0, 1, ....    (3.27)

The question is, under what conditions does the iteration (3.27) converge rapidly?

Subtracting (3.27) from (3.26) and denoting ε_n^ν = y_n − y_n^ν, we get

    ε_n^{ν+1} = h_n λ t_n (y_n² − (y_n^ν)²) = h_n λ t_n (y_n + y_n^ν) ε_n^ν ≈ 2 h_n λ ε_n^ν,  ν = 0, 1, ....

This iteration obviously converges iff |ε_n^{ν+1}| < |ε_n^ν|, and the approximate condition for this convergence is therefore

    h_n < 1/(2|λ|).

The convergence is rapid if h_n ≪ 1/(2|λ|). Now, if λ = −5, as in Example 3.1, then convergence of this nonlinear iteration is obtained with h < 0.1, and choosing h = .01 yields rapid convergence (roughly, one additional significant digit is gained at each iteration). But if λ = −500 then we must take h < .001 for convergence of the iteration, and this is a harsh restriction, given the smoothness and slow variation of the exact solution. Functional iteration is therefore seen to be effective only in the nonstiff case. □

Functional iteration is often used in combination with implicit methods for the solution of nonstiff problems, as we will see later, in Chapter 5.

² This would yield a contraction mapping, and therefore convergence as ν → ∞ to the fixed point y_n.
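A sketch of one backward Euler step for Example 3.3 solved by functional iteration (Python; the values of λ, t_n, h and the sweep counts are illustrative) makes the convergence threshold visible.

    def functional_iteration_step(lam, t_n, y_prev, h, sweeps=20):
        """One backward Euler step for Example 3.3 via functional iteration (3.27)."""
        y = y_prev                                   # initial guess y_n^0 = y_{n-1}
        for nu in range(sweeps):
            y = y_prev + h * (lam * (t_n * y**2 - 1.0 / t_n) - 1.0 / t_n**2)
        return y

    # lam = -5 with h = 0.01 satisfies h << 1/(2|lam|) = 0.1: rapid convergence.
    print(functional_iteration_step(-5.0, 1.01, 1.0, 0.01))       # near 1/1.01
    # lam = -500 with h = 0.01 violates h < 1/(2|lam|) = 0.001: the iterates diverge.
    print(functional_iteration_step(-500.0, 1.01, 1.0, 0.01, sweeps=5))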


Newton iteration

Variants of Newton's method are used in virtually all modern stiff ODE codes. Given the nonlinear system

    g(y_n) = y_n − y_{n-1} − h f(t_n, y_n) = 0,

Newton's method yields

    y_n^{ν+1} = y_n^ν − (∂g/∂y)^{-1} g(y_n^ν)
              = y_n^ν − (I − h ∂f/∂y)^{-1} (y_n^ν − y_{n-1} − h f(t_n, y_n^ν)),  ν = 0, 1, ....


Review: Newton's method. For a nonlinear equation

    g(x) = 0

we define a sequence of iterates as follows: x^0 is an initial guess. For a current iterate x^ν, we write

    0 = g(x) = g(x^ν) + g'(x^ν)(x − x^ν) + ⋯ .

Approximating the solution x by neglecting the higher order terms in this Taylor expansion, we define the next iterate x^{ν+1} by the linear equation

    0 = g(x^ν) + g'(x^ν)(x^{ν+1} − x^ν).

We can generalize this directly to a system of m algebraic equations in m unknowns,

    g(x) = 0.

Everything remains the same, except that the first derivative of g is replaced by the m×m Jacobian matrix ∂g/∂x. We obtain the iteration

    x^{ν+1} = x^ν − (∂g/∂x(x^ν))^{-1} g(x^ν),  ν = 0, 1, ....

We note that it is not good practice to compute a matrix inverse. Moreover, rather than computing x^{ν+1} directly, it is better in certain situations (when ill-conditioning is encountered), and never worse in general, to solve the linear system for the difference δ between x^{ν+1} and x^ν, and then update. Thus, δ is computed (for each ν) by solving the linear system

    (∂g/∂x) δ = −g(x^ν),

where the Jacobian matrix is evaluated at x^ν, and the next Newton iterate is obtained by

    x^{ν+1} = x^ν + δ.


The matrix (I − h ∂f/∂y) is evaluated at the current iterate y_n^ν. This matrix is called the iteration matrix, and the costs of forming it and solving the linear system (for δ = y_n^{ν+1} − y_n^ν) often dominate the costs of solving the problem. We can take the initial guess

    y_n^0 = y_{n-1},

although better ones are often available. Newton's method is iterated until an estimate of the error due to terminating the iteration is less than a user-specified tolerance, for example

    |y_n^{ν+1} − y_n^ν| ≤ NTOL.

The tolerance NTOL is related to the local error bound that the user aims to achieve, and is usually well above roundoff level. Because there is a very accurate initial guess, most ODE initial value problems require no more than a few Newton iterations per time step. A strategy which iterates no more than, say, 3 times, and if there is no convergence decreases the step size h_n (thus improving the initial guess) and repeats the process, can be easily conceived. We return to these ideas in the next two chapters.

Newton's method works well for Example 3.3, without a severe restriction on the time step.

Newton's method requires the evaluation of the Jacobian matrix, ∂f/∂y. This presents no difficulty for Example 3.3; however, in practical applications, specifying these partial derivatives analytically is often a difficult or cumbersome task. A convenient technique is to use difference approximations: at y = y_n^ν, evaluate f̂ = f(t_n, ŷ) and f̃ = f(t_n, ỹ), where ŷ and ỹ are perturbations of y in one coordinate, ŷ_j = y_j + η, ỹ_j = y_j − η, and ŷ_l = ỹ_l = y_l, l ≠ j. Then the jth column of ∂f/∂y can be approximated by

    ∂f/∂y_j ≈ (1/2η)(f̂ − f̃),

where η is a small positive parameter.

This very simple trick is very easy to program and it does not affect the accuracy of the solution y_n. It often works very well in practice with the choice η = 10^{-d}, if floating point arithmetic with roughly 2d significant digits is being used (e.g. d = 7). The technique is useful also in the context of boundary value problems, see for instance §7.1 and §8.1.1. It does not always work well, though, and moreover, such an approximation of the Jacobian matrix may at times be relatively expensive, depending on the application. But it gives the user a simple technique for computing an approximate Jacobian matrix when it is needed. Most general-purpose codes provide a finite-difference Jacobian as an option, using a somewhat more sophisticated algorithm to select the increment.
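Putting the pieces together, here is a sketch of a backward Euler step using Newton's method with a central finite-difference Jacobian (Python with NumPy; the tolerance, increment and iteration cap are illustrative choices, not those of any particular code).

    import numpy as np

    def fd_jacobian(f, t, y, eta=1e-7):
        """Approximate df/dy column by column with central differences."""
        m = len(y)
        J = np.empty((m, m))
        for j in range(m):
            yp, ym = y.copy(), y.copy()
            yp[j] += eta
            ym[j] -= eta
            J[:, j] = (f(t, yp) - f(t, ym)) / (2.0 * eta)
        return J

    def backward_euler_step(f, t_n, y_prev, h, ntol=1e-10, maxit=10):
        """Solve y_n = y_prev + h f(t_n, y_n) by Newton's method."""
        y = y_prev.copy()                       # initial guess y_n^0 = y_{n-1}
        for nu in range(maxit):
            g = y - y_prev - h * f(t_n, y)
            M = np.eye(len(y)) - h * fd_jacobian(f, t_n, y)   # iteration matrix
            delta = np.linalg.solve(M, -g)
            y += delta
            if np.linalg.norm(delta) <= ntol:
                break
        return y

    # One stiff step for Example 3.3 with lambda = -500, written as a 1-vector system.
    lam = -500.0
    f = lambda t, y: lam * (t * y**2 - 1.0 / t) - 1.0 / t**2
    print(backward_euler_step(f, 1.5, np.array([1.0 / 1.4]), 0.1))   # near 1/1.5

Note that the step size h = 0.1 here would be hopeless for functional iteration with this λ, while Newton's method converges in a handful of iterations.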


Review: Matrix decompositions. Consider a linear system of m equations

    Ax = b,

where A is real, square and nonsingular, b is given and x is a solution vector to be found. The solution is given by

    x = A^{-1} b.

However, it is usually bad practice to attempt to form A^{-1}.

The well-known algorithm of Gaussian elimination (without pivoting) is equivalent to forming an LU-decomposition of A,

    A = LU,

where L is a unit lower triangular matrix (i.e. l_ij = 0, i < j, and l_ii = 1) and U is upper triangular (i.e. u_ij = 0, i > j). Note that this decomposition is independent of the right hand side b. It can be done without knowing b and it can be used for more than one right hand side. The LU-decomposition requires (1/3)m³ + O(m²) flops (i.e., elementary floating-point operations).

Given a data vector b we can now find x by writing

    L(Ux) = Ax = b.

Solving Lz = b for z involves forward substitution and costs O(m²) flops. Subsequently solving Ux = z completes the solution process using a back substitution and another O(m²) flops. The solution process is therefore much cheaper, when m is large, than the cost of the decomposition.

Not every nonsingular matrix has an LU-decomposition, and even if there exists such a decomposition the numerical process may become unstable. Thus, partial pivoting must be applied (unless the matrix has some special properties, e.g. it is symmetric positive definite). A row-partial pivoting involves permuting rows of A to enhance stability and results in the decomposition

    A = PLU,

where P is a permutation matrix (i.e. the columns of P are the m unit vectors, in some permuted order). We will refer to an LU-decomposition, assuming that partial pivoting has been applied as necessary.
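The factor-once, solve-many point deserves a tiny illustration (Python with SciPy; the matrix and right hand sides are random): the O(m³) factorization is computed once and reused, each additional solve costing only O(m²).

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 500))
    lu, piv = lu_factor(A)                 # PLU decomposition with partial pivoting
    for k in range(3):                     # several right hand sides, one factorization
        b = rng.standard_normal(500)
        x = lu_solve((lu, piv), b)         # forward and back substitution only
        print(np.linalg.norm(A @ x - b))   # small residuals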


Review: Matrix decompositions, continued. Another important matrix decomposition is the QR-decomposition

    A = QR,

where R is upper triangular (like U) and Q is orthogonal: Q^T Q = I. This decomposition costs twice as much as the LU-decomposition, but it has somewhat better stability properties, because ||Q||_2 = ||Q^{-1}||_2 = 1, which implies an ideal conditioning, cond(Q) = 1 (see (3.20)). This is useful also for finding least squares solutions to over-determined linear systems and as a building block in algorithms for finding matrix eigenvalues.

If A is large and sparse (i.e. most of its elements are zero) then the LU and the QR decompositions may or may not remain suitable. For instance, if all the nonzero elements of A are contained in a narrow band, i.e. in a few diagonals along the main diagonal (whence A is called banded), then both the LU and the QR algorithms can be easily adjusted to not do any work outside the band. For boundary value ODEs this typically leads to a reduction in the algorithm's complexity from cubic to linear in the matrix dimension. But inside the band the sparsity is usually lost, and other, iterative algorithms become more attractive. The latter is typically the case for elliptic PDEs, and is outside the scope of our book.

3.5 A-Stability, Stiff Decay

Ideally, one would desire that a numerical discretization method mimic all properties of the differential problem to be discretized, for all problems. This is not possible. One then lowers expectations, and designs discretization methods which capture the essential properties of a class of differential problems.

A first study of absolute stability suggests that, since for all stable test equations |y(t_n)| ≤ |y(t_{n-1})|, a good discretization method for stiff problems should do the same, i.e. satisfy |y_n| ≤ |y_{n-1}|. This gives the concept of A-stability: a difference method is A-stable if its region of absolute stability contains the entire left half-plane of z = hλ.


A glance at Figs. 3.2 and 3.5 indicates that the backward Euler method is A-stable, whereas the forward Euler method is not.

But a further probe into A-stability reveals two deficiencies. The first is that it does not distinguish between the cases

    Re(λ) ≪ −1

and

    −1 ≪ Re(λ) ≤ 0,  |Im(λ)| ≫ 1.

The latter case gives rise to a highly oscillatory exact solution, which does not decay much. The difficulties arising are of a different type, so when addressing stiffness of the type that we have been studying, it is not essential to include points near the imaginary axis in the absolute stability region of the difference method.

The second possible weakness of the A-stability definition arises from its exclusive use of absolute stability. In the very-stiff limit, h_n Re(λ) ≪ −1, the exact solution of the test equation satisfies |y(t_n)| = |y(t_{n-1})| e^{h_n Re(λ)} ≪ |y(t_{n-1})|. The corresponding absolute stability requirement, |y_n| ≤ |y_{n-1}|, seems anemic in comparison, since it does not exclude |y_n| ≈ |y_{n-1}|.

Let us generalize the test equation a bit, to include an inhomogeneity,

    y' = λ(y − g(t)),    (3.28)

where g(t) is a bounded, but otherwise arbitrary, function. We can rewrite (3.28) as

    ε y' = μ(y − g(t)),

where ε = 1/|Re(λ)|, μ = ελ, and note that the reduced solution, obtained for ε = 0, is y(t) = g(t). This motivates saying that the discretization method has stiff decay if for t_n > 0 fixed,

    |y_n − g(t_n)| → 0  as  h_n Re(λ) → −∞.    (3.29)

This is a stronger requirement than absolute stability in the very-stiff limit, and it does not relate to what happens elsewhere in the hλ-plane. The backward Euler method has stiff decay, because when applied to (3.28) it yields

    y_n − g(t_n) = (1 − h_n λ)^{-1} (y_{n-1} − g(t_n)).

The forward Euler method of course does not have stiff decay.

The practical advantage of methods with stiff decay lies in their ability to skip fine-level (i.e. rapidly varying) solution details and still maintain a


decent description of the solution on a coarse level in the very-stiff (not the highly oscillatory!) case. For instance, using backward Euler with a fixed step h = .1 to integrate the problem of Example 3.2, the initial layer is poorly approximated, and still the solution is qualitatively recovered where it varies slowly; see Fig. 3.6. Herein lies a great potential for efficient use, as well as a great danger of misuse, of such discretization methods.

Figure 3.6: Approximate solution on a coarse uniform mesh for Example 3.2, using backward Euler (the smoother curve) and trapezoidal methods.

3.6 Symmetry: Trapezoidal Method

The forward Euler method was derived using a Taylor expansion centered at t_{n-1}. The backward Euler method was likewise derived, centered at t_n instead. Both methods are first order accurate, which is often insufficient for an efficient computation. Better accuracy is obtained by centering the expansions at t_{n-1/2} = t_n − (1/2)h_n.


Writing

    y(t_n) = y(t_{n-1/2}) + (h_n/2) y'(t_{n-1/2}) + (h_n²/8) y''(t_{n-1/2}) + (h_n³/48) y'''(t_{n-1/2}) + ⋯,
    y(t_{n-1}) = y(t_{n-1/2}) − (h_n/2) y'(t_{n-1/2}) + (h_n²/8) y''(t_{n-1/2}) − (h_n³/48) y'''(t_{n-1/2}) + ⋯,

dividing by h_n and subtracting, we obtain

    (y(t_n) − y(t_{n-1}))/h_n = y'(t_{n-1/2}) + (h_n²/24) y'''(t_{n-1/2}) + O(h_n⁴).    (3.30)

Furthermore, writing similar expansions for y' instead of y and adding, we replace y'(t_{n-1/2}) by (1/2)(y'(t_n) + y'(t_{n-1})) and obtain

    (y(t_n) − y(t_{n-1}))/h_n = (1/2)(y'(t_n) + y'(t_{n-1})) − (h_n²/12) y'''(t_{n-1/2}) + O(h_n⁴).    (3.31)

The latter equation suggests the trapezoidal method for discretizing our prototype ODE system (3.1),

    y_n = y_{n-1} + (h_n/2)(f(t_n, y_n) + f(t_{n-1}, y_{n-1})).    (3.32)

The local truncation error can be read off (3.31): this method is second-order accurate.

The trapezoidal method is symmetric: a change of variable τ = −t on [t_{n-1}, t_n] (i.e. integrating from right to left) leaves it unchanged.³ Like the backward Euler method, it is implicit: the cost per step of these two methods is similar. But the trapezoidal method is more accurate, so perhaps fewer integration steps are needed to satisfy a given error tolerance. Before being able to conclude that, however, we must check stability.

Both the trapezoidal method and the backward Euler method are 0-stable (Exercise 3.1). To check absolute stability, apply the method with a step size h to the test equation. This gives

    y_n = ((2 + hλ)/(2 − hλ)) y_{n-1}.

³ Consider for notational simplicity the ODE y' = f(y). A discretization method given by y_n = y_{n-1} + h_n ψ(y_{n-1}, y_n, h) is symmetric if ψ(u, v, h) = ψ(v, u, −h), because then, by letting z_n ← y_{n-1}, z_{n-1} ← y_n and h ← −h, we get the same method for z_n as for y_n.


The region of absolute stability is precisely the left half-plane of hλ, so this method is A-stable. Moreover, the approximate solution is not dampened when Re(λ) > 0, which is qualitatively correct since the exact solution grows in that case.

On the other hand, we cannot expect stiff decay with the trapezoidal method, because its amplification factor satisfies |(2 + hλ)/(2 − hλ)| → 1 in the very-stiff limit. This is typical for symmetric methods. Precisely, for the trapezoidal method,

    (2 + hλ)/(2 − hλ) → −1  as  hRe(λ) → −∞.

The practical implication of this is that any solution details must be resolved even if only a coarse picture of the solution is desired, because the fast mode components of local errors (for which h is "large") get propagated, almost undamped, throughout the integration interval [0, b]. This is evident in Fig. 3.6, where we contrast integrations using the trapezoidal and the backward Euler methods for Example 3.2 with a uniform step size h = .1. To apply the trapezoidal method intelligently for this example, we must use a small step size through the initial layer, as in Fig. 3.4. Then the step size can become larger. The indicated mesh in Fig. 3.4 yields the solution profile shown when using the trapezoidal method.

Finally, in Table 3.1 we display the maximum error at mesh points for Example 3.1, when using each of the three methods introduced hitherto, with uniform step sizes h = .1, h = .05, and h = .025.

    method           h      max error
    forward Euler    .1     .91e-2
    forward Euler    .05    .34e-2
    forward Euler    .025   .16e-2
    backward Euler   .1     .52e-2
    backward Euler   .05    .28e-2
    backward Euler   .025   .14e-2
    trapezoidal      .1     .42e-3
    trapezoidal      .05    .14e-3
    trapezoidal      .025   .45e-4

    Table 3.1: Maximum errors for Example 3.1.
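Numbers like those in Table 3.1 can be reproduced with a short script; the sketch below (Python; a scalar Newton solve handles the two implicit methods, with an arbitrary tolerance) runs all three methods on Example 3.1.

    f  = lambda t, y: -5.0 * t * y**2 + 5.0 / t - 1.0 / t**2    # Example 3.1
    fy = lambda t, y: -10.0 * t * y                             # df/dy for Newton

    def newton(g, dg, y0, tol=1e-12):
        y = y0
        for _ in range(20):
            d = -g(y) / dg(y)
            y += d
            if abs(d) <= tol:
                break
        return y

    for h in (0.1, 0.05, 0.025):
        N = int(round(24.0 / h))
        yF = yB = yT = 1.0
        eF = eB = eT = 0.0
        t = 1.0
        for n in range(N):
            tn = t + h
            yF = yF + h * f(t, yF)                               # forward Euler
            yB = newton(lambda y: y - yB - h * f(tn, y),         # backward Euler
                        lambda y: 1.0 - h * fy(tn, y), yB)
            yT = newton(lambda y: y - yT - 0.5*h*(f(tn, y) + f(t, yT)),  # trapezoidal
                        lambda y: 1.0 - 0.5*h*fy(tn, y), yT)
            t = tn
            ex = 1.0 / t
            eF, eB, eT = max(eF, abs(yF-ex)), max(eB, abs(yB-ex)), max(eT, abs(yT-ex))
        print(f"h={h}: FE {eF:.2e}  BE {eB:.2e}  TR {eT:.2e}")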


3.7 Rough Problems

In the beginning of this chapter we have assumed that the given ODE (3.1) is "sufficiently smooth", in the sense that all derivatives mentioned in the sequel are bounded by a constant of moderate size. This is often the case in practice. Still, there are many important instances where the problem is not very smooth. In this section we discuss some such situations and their implication on the choice of discretization method.

In general, if f(t, y) has k bounded derivatives at the solution y(t),

    sup_{0 <= t <= b} |(d^j/dt^j) f(t, y(t))| <= M,   j = 0, 1, ..., k,

then by (3.1), y(t) has k+1 bounded derivatives, and in particular,

    ||y^{(j)}|| <= M,   j = 1, ..., k+1.

So, if f is discontinuous but bounded then y has a bounded, discontinuous first derivative. But the higher derivatives of y appearing in the Taylor expansion (3.2) (and hence in the expression for the local truncation error) are not bounded, so a discretization across such a discontinuity may yield rather inaccurate approximations.

Suppose first that there is one point, \bar{t}, 0 < \bar{t} < b, where f(\bar{t}, y(\bar{t})) is discontinuous, and everywhere else f is smooth and bounded. Note that the conditions of Theorem 1.1 do not hold on the interval [0, b], but they do hold on each of the intervals [0, \bar{t}] and [\bar{t}, b]. Thus, we may consider integrating the problem

    y' = f(t, y),   0 < t < \bar{t},   y(0) = c,

followed by the problem

    z' = f(t, z),   \bar{t} < t < b,   z(\bar{t}) = y(\bar{t}).

For each of these subproblems we can discretize using one of the methods described in this chapter and the next one, and expect to realize the full accuracy order of the method. Now, the algorithm does not "know" that we have switched problems at t = \bar{t}. The integration can therefore proceed as before from 0 to b, provided that \bar{t} coincides with one of the mesh points, or


step ends, t_{\bar{n}}. On the other hand, if \bar{t} is in the interior of a step, i.e. for some \bar{n},

    t_{\bar{n}-1} < \bar{t} < t_{\bar{n}},

then an O(h_{\bar{n}}) error results, regardless of the order of the (consistent) discretization method applied.

Example 3.4 Consider the function

    f(t, y) = t - j\tau,   j\tau <= t < (j+1)\tau,   j = 0, 1, ..., J,

where \tau > 0 is a parameter. The ODE y' = f is therefore a quadrature problem for a sawtooth function; see Fig. 3.7.

[Figure 3.7: Sawtooth function for \tau = 0.2.]

With y(0) = 0, the solution is

    y(t) = j\tau^2/2 + (t - j\tau)^2/2,   j\tau <= t < (j+1)\tau,   j = 0, 1, ..., J.

We also calculate that away from the discontinuity points j\tau,

    y''(t) = 1,   y'''(t) = 0.

For the trapezoidal method, the local truncation error therefore vanishes on a step [t_{n-1}, t_n] that does not contain a point of discontinuity. For this special case, the trapezoidal method using a constant step size h reproduces y(t_n) exactly if \tau = lh for some positive integer l. If, on the other hand, \tau = (l+r)h for some fraction r, then an O(h\tau) error results, up to J times, for a combined O(h) error. A worse, O(1), error may result if the "teeth are sharper", e.g. f(t) = t/\tau - j, j\tau <= t < (j+1)\tau, and \tau = O(h).
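To observe this behavior numerically, here is a small experiment (our sketch, not from the book). It applies the trapezoidal method to the sawtooth quadrature problem, evaluating f on the smooth branch containing each step (the branch index is chosen from the step's midpoint, which amounts to integrating each smooth piece separately when the mesh is aligned), and compares an aligned mesh with a misaligned one. The exact value is y(1) = 5\tau^2/2 = 0.1 for \tau = 0.2.

    import math

    tau, b = 0.2, 1.0

    def trapezoid_error(h):
        # assumes h divides b, so the last mesh point is exactly t = 1
        n = round(b / h)
        y = 0.0
        for i in range(n):
            t0, t1 = i * h, (i + 1) * h
            j = math.floor((t0 + t1) / (2 * tau))    # tooth containing the step's midpoint
            y += 0.5 * h * ((t0 - j * tau) + (t1 - j * tau))   # trapezoidal rule on f(t) = t - j*tau
        return abs(y - 0.1)

    print(trapezoid_error(0.05))      # tau = 4h (aligned): error at roundoff level
    print(trapezoid_error(0.0625))    # tau = 3.2h (misaligned): an O(h) error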


If f(t, y) has a discontinuity then it is important to locate it and place a mesh point as close to it as possible. Unlike in the simple Example 3.4, however, the precise location of such a discontinuity may not be known. In many examples, the discontinuity is defined by a switching function, e.g.

    f(t, y) = f_I(t, y(t))    if g(t, y(t)) < 0,
    f(t, y) = f_II(t, y(t))   if g(t, y(t)) > 0.

This situation often occurs when simulating mechanical systems (Example 1.6) with dry friction. A simple event location algorithm, which automatically detects when g changes sign and then solves a local nonlinear algebraic equation (using interpolation of nearby y_n-values) to locate the switching point g = 0 more accurately, proves very useful in practice. An alternative, when using a general-purpose code which features adaptive step-size selection, is to simply rely on the code to select a very small step size near the discontinuity (because the local error is large there). But this is usually inferior to the event location strategy, because the theory on which the step size selection is based is typically violated in the presence of a discontinuity, and because the code can become quite inefficient when taking such small step sizes.

Note that the first order Euler methods utilize y'' in the expression for the local truncation error, while the second order trapezoidal method utilizes the higher derivative y'''. This is general: a method of order p matches p+1 terms in a local Taylor expansion such as (3.2), so the local truncation error is expected to behave like O(h_n^p y^{(p+1)}(t_n)). Thus, if only the first l+1 derivatives of y(t) exist and are bounded then, in general, any difference method will exhibit an order of at most l. As we will see in the next few chapters, higher order methods cost more per step, so if the problem is rough (i.e. l is low) then lower order methods get the nod.

Example 3.5 The harmonic oscillator

    u'' + \omega^2 u = 0,   0 < t < b,
    u(0) = 1,   u'(0) = 0,

has the solution u(t) = cos \omega t. If the frequency \omega is high, \omega >> 1, then the derivatives grow larger and larger, because

    ||u^{(p)}|| = \omega^p.

The local error of a discretization method of order p is

    O(h^{p+1} \omega^{p+1}).


This means that to recover the highly oscillatory solution u(t) accurately, we must restrict

    h < 1/\omega,

regardless of the order of the method. In fact, for h > 1/\omega, increasing the order of the method as such is useless.

3.8 Software, Notes and References

3.8.1 Notes

The basic questions of a numerical method's order, consistency, 0-stability and convergence are discussed, in one form or another, in most books that are concerned with the numerical solution of differential equations. But there is a surprisingly wide variety of expositions; see, e.g., [54, 43, 62, 85, 50, 26, 8, 93, 67]. We chose to expose concepts in a way which highlights how they mimic properties of the differential equation. One benefit of this is that the concepts and the treatment naturally extend to boundary value problems and to PDEs (which we do not pursue directly in this book). Omitted is therefore a general derivation of the global error as an accumulation of local errors, such as what leads to (3.11).

For this reason we also chose to define the local truncation error as in (3.5)-(3.6). Some other authors have chosen to define this quantity multiplied by h_n, making d_n and l_n generally of the same order. We also chose not to use the local error (whose definition depends on the existence of the approximate solution: see, e.g., Exercise 3.9 to note that existence of the numerical solution is not always a foregone conclusion) as a tool in our exposition of fundamental concepts, despite its practical usefulness as discussed in the next chapter.

Another decision we took which deviates from most texts on numerical methods for initial value ODEs was to introduce the concept of stiffness at an early stage. This not only highlights the basic importance of the topic; it is also a natural approach if one has the numerical solution of PDEs in mind, and moreover it allows a natural treatment of the concept of absolute stability. The extension of ODE stability restrictions to time-dependent PDEs is somewhat facilitated in §3.3, but it remains a nontrivial issue in general, because the ODE system size m is very large for the corresponding method of lines. Some cases do extend directly, though; see Exercises 3.6 and 3.7. For more challenging cases see, e.g., Reddy and Trefethen [78] and references therein.


While the concept of stiffness is intuitively simple, its precise definition and detection have proved to be elusive. Our discussion has excluded, for instance, the influence of rough forcing terms and of small but positive eigenvalues. These issues are treated at length in many texts, e.g. Hairer & Wanner [52], Shampine [85] and Butcher [26].

The concept of absolute stability is due to Dahlquist [33]. But the search for a good stability concept to capture the essence of stiffness in numerical methods has brought about an often confusing plethora of definitions (for an extensive exposition see, e.g., [52]). We feel that an appropriate concept should reflect what one aims to capture: a simple, intuitive phenomenon which is independent of the discretization approach. Hence we have used the somewhat less well-known terminology and definition of stiff decay. This term was introduced by J. Varah in 1980, following Prothero & Robinson [73].

A more reliable, and in some cases also more efficient, means than finite differences for obtaining a Jacobian matrix without user intervention is to use automatic differentiation software [16]. This software takes as input a user-provided routine that computes f, and produces another routine which efficiently computes the Jacobian matrix. At present, this requires an initial time investment, to install and learn to use the automatic differentiation software. However, we expect that simpler interfaces will soon be available.

The stiff problems dealt with in Chapters 3-5 have eigenvalues with large, negative real parts. Another type of "stiffness" is when the problem has (nearly) purely imaginary large eigenvalues. This yields highly (i.e. rapidly) oscillatory problems. See Examples 3.5 and 9.8. A recent survey on the numerical solution of such problems is given by Petzold, Jay & Yen [71].

3.8.2 Software

In each of the chapters following this one which deal with numerical methods, there is a section (just before the Exercises) which briefly describes some available software packages where corresponding methods and techniques are implemented. The methods described in the current chapter are too basic to be implemented alone in quality general-purpose codes. Here we quickly mention instead some outlets through which such software is available. There are three types.

1. A complete environment is made available in which users can do their own code development interactively, having direct access to various software tools. These tools include certain ODE packages (and much more). Examples of such programming environments are Matlab and Mathematica. We have used Matlab in most of the coding developed for the examples and exercises in this book, and we strongly recommend it. An interactive environment such as Matlab does not


replace procedural languages like C or FORTRAN, though, especially for production runs and large scale computing. The tools provided with Matlab currently do not cover all the problem classes of this book, nor are they always the best available for a given application. Matlab does allow interfacing with external software written in C or FORTRAN.

2. Collected programs are available for a fee through software libraries such as Nag and Imsl. These programs are written in FORTRAN or C. The range of software available in this way in some areas is considerably more extensive than what is available as part of the integrated programming environments, and it is more suitable for production codes. The advantage here, compared to the next alternative, is that there is a measure of quality control and support of the software available in this way. This occasionally also implies some limitations on the range and richness of the software available.

3. A large collection of codes is available electronically through Netlib. The web page is at

    http://netlib.bell-labs.com/netlib/master/readme .

It is possible also to e-mail Netlib a request such as

    send codename from ode

which causes the (hypothetical) ODE code codename to be e-mailed back. Netlib is a software repository: it is available for free and comes with no guarantee.

The codes colsys, dassl and their derivatives, which solve stiff initial value problems, boundary value problems and differential-algebraic problems, and are distinguished by the fact that one of your authors took part in writing them, are available through Netlib, as well as through this book's web page.

Most software for scientific computation to date is written in FORTRAN, but software is available to convert to, or interface with, C and C++ programs. The user therefore does not need to be fluent in FORTRAN in order to use these codes. Such porting programs are available through Netlib and also through this book's web page.


3.9 Exercises

1. Show that the backward Euler method and the trapezoidal method are 0-stable.

2. To draw a circle of radius r on a graphics screen, one may proceed to evaluate pairs of values x = r cos \theta, y = r sin \theta for a succession of values \theta. But this is expensive. A cheaper method may be obtained by considering the ODE

       \dot{x} = -y,   x(0) = r,
       \dot{y} = x,    y(0) = 0,

   where \dot{x} = dx/d\theta, and approximating this using a simple discretization method. However, care must be taken so as to ensure that the obtained approximate solution looks right, i.e. that the approximate curve closes rather than spirals. For each of the three discretization methods introduced in this chapter, namely, the forward Euler, backward Euler and trapezoidal methods, carry out this integration using a uniform step size h = .02 for 0 <= \theta <= 120. Determine if the solution spirals in, spirals out, or forms an approximate circle as desired. Explain the observed results. [Hint: this has to do with a certain invariant function of x and y, rather than with the order of the methods.]

3. The following ODE system,

       y_1' = \alpha - y_1 - 4 y_1 y_2 / (1 + y_1^2),
       y_2' = \beta y_1 (1 - y_2 / (1 + y_1^2)),

   where \alpha and \beta are parameters, represents a simplified approximation to a chemical reaction [92]. There is a parameter value \beta_c = 3\alpha/5 - 25/\alpha such that for \beta > \beta_c solution trajectories decay in amplitude and spiral in phase space into a stable fixed point, whereas for \beta < \beta_c trajectories oscillate without damping and are attracted to a stable limit cycle. [This is called a Hopf bifurcation.]

   (a) Set \alpha = 10 and use any of the discretization methods introduced in this chapter with a fixed step size h = 0.01 to approximate the solution starting at y_1(0) = 0, y_2(0) = 2, for 0 <= t <= 20. Do this for the parameter values \beta = 2 and \beta = 4. For each case plot y_1 vs t and y_2 vs y_1. Describe your observations.


   (b) Investigate the situation closer to the critical value \beta_c = 3.5. [You may have to increase the length of the integration interval b to get a better look.]

4. When deriving the trapezoidal method, we proceeded to replace y'(t_{n-1/2}) in (3.30) by an average and then use the ODE (3.1). If instead we first use the ODE, replacing y'(t_{n-1/2}) by f(t_{n-1/2}, y(t_{n-1/2})), and then average y, we obtain the implicit midpoint method,

       y_n = y_{n-1} + h_n f(t_{n-1/2}, (y_n + y_{n-1})/2).    (3.33)

   (a) Show that this method is symmetric, second-order and A-stable. How does it relate to the trapezoidal method for the constant coefficient ODE (3.18)?

   (b) Show that even if we allow \lambda to vary in t, i.e. we consider the scalar ODE

           y' = \lambda(t) y

       in place of the test equation, what corresponds to A-stability holds, namely, using the midpoint method,

           |y_n| <= |y_{n-1}|   if Re(\lambda) <= 0

       (this property is called AN-stability [24]). Show that the same cannot be said about the trapezoidal method: the latter is not AN-stable.

5. (a) Show that the trapezoidal step (3.32) can be viewed as half a step of forward Euler followed by half a step of backward Euler.

   (b) Show that the midpoint step (3.33) can be viewed as half a step of backward Euler followed by half a step of forward Euler.

   (c) Consider an autonomous system y' = f(y) and a fixed step size, h_n = h, n = 1, ..., N. Show that the trapezoidal method applied N times is equivalent to applying first half a step of forward Euler (i.e. forward Euler with step size h/2), followed by N-1 midpoint steps, finishing off with half a step of backward Euler. Conclude that these two symmetric methods are dynamically equivalent [34], i.e., for h small enough their performance is very similar independently of N, even over a very long time: b = Nh >> 1.

   (d) However, if h is not small enough (compared to the problem's small parameter, say \lambda^{-1}) then these methods do not necessarily perform similarly. Construct an example where one of these


       methods blows up (error > 10^5, say) while the other yields an error below 10^{-5}. [Do not program anything: this is a (nontrivial) pen-and-paper question.]

6. Consider the method of lines applied to the simple heat equation in one space dimension,

       u_t = a u_{xx},

   with a > 0 a constant, u = 0 at x = 0 and x = 1 for t >= 0, and u(x, 0) = g(x) given as well. Formulate the method of lines, as in Example 1.3, to arrive at a system of the form (3.18) with A symmetric. Find the eigenvalues of A and show that, when using the forward Euler discretization for the time variable, the resulting method is stable if

       h <= \Delta x^2 / (2a).

   (This is a rather restrictive condition on the time step.) On the other hand, if we discretize in time using the trapezoidal method (the resulting method, second order in both space and time, is called Crank-Nicolson), or the backward Euler method, then no stability restriction for the time step arises. [Hint: to find the eigenvalues, try eigenvectors v^k in the form v_i^k = sin(ik\pi\Delta x), i = 1, ..., m, for 1 <= k <= m.]

7. Consider the same question as the previous one, but this time the heat equation is in two space variables on a unit square,

       u_t = a (u_{xx} + u_{yy}),   0 <= x, y <= 1,   t >= 0.

   The boundary conditions are u = 0 around the square, and u(x, y, 0) = g(x, y) is given as well. Formulate a system (3.18) using a uniform grid with spacing \Delta x on the unit square. Conclude again that no restrictions on the time step arise when using the implicit methods which we have presented for time discretization. What happens with the forward Euler method? [Hint: don't try this exercise before you have solved the previous one.]

8. Consider the ODE

       dy/dt = f(t, y),   0 <= t <= b,

   where b >> 1.


   (a) Apply the stretching transformation t = \tau b to obtain the equivalent ODE

           dy/d\tau = b f(\tau b, y),   0 <= \tau <= 1

       (strictly speaking, y in these two ODEs is not quite the same function; rather, it stands in each case for the unknown function).

   (b) Show that applying any of the discretization methods in this chapter to the ODE in t with step size h = \Delta t is equivalent to applying the same method to the ODE in \tau with step size \Delta\tau satisfying \Delta t = b \Delta\tau. In other words, the same stretching transformation can be equivalently applied to the discretized problem.

9. Write a short program which uses the forward Euler, the backward Euler and the trapezoidal or midpoint methods to integrate a linear, scalar ODE with a known solution, using a fixed step size h = b/N, and finds the maximum error. Apply your program to the problem

       dy/dt = (cos t) y,   0 <= t <= b,

   y(0) = 1. The exact solution is

       y(t) = e^{sin t}.

   Verify those entries given in Table 3.2 and complete the missing ones. Make as many (useful) observations as you can on the results in the complete table. Attempt to provide explanations. [Hint: plotting these solution curves for b = 20, N = 10b, say, may help.]

10. Consider two linear harmonic oscillators (recall Example 2.6), one fast and one slow: u_1'' = -\varepsilon^{-2}(u_1 - \bar{u}_1) and u_2'' = -(u_2 - \bar{u}_2). The parameter is small: 0 < \varepsilon << 1. We write this as a first order system,

        u' = diag(\varepsilon^{-1}, 1) v,
        v' = -diag(\varepsilon^{-1}, 1) (u - \bar{u}),

    where u(t), v(t) and the given constant vector \bar{u} each have two components. It is easy to see that E_F = (1/(2\varepsilon))(v_1^2 + (u_1 - \bar{u}_1)^2) and E_S = (1/2)(v_2^2 + (u_2 - \bar{u}_2)^2) remain constant for all t (see §2.5).


    Table 3.2: Maximum errors for long interval integration of y' = (cos t) y (blank entries are to be completed in Exercise 9):

        b     N       forward Euler  backward Euler  trapezoidal  midpoint
        1     10      .35e-1         .36e-1          .29e-2       .22e-2
        1     20      .18e-1         .18e-1          .61e-3       .51e-3
        10    100
        10    200
        100   1000    2.46           25.90           .42e-2       .26e-2
        100   2000
        1000  1000
        1000  10000   2.72           1.79e+11        .42e-2       .26e-2
        1000  20000
        1000  100000  2.49           29.77           .42e-4       .26e-4

    Next, we apply the time-dependent linear transformation

        u = Q x,   v = Q z,   Q(t) = [  cos \omega t   sin \omega t ]
                                     [ -sin \omega t   cos \omega t ],

    where \omega >= 0 is another parameter, and note that \dot{Q}^T Q = \omega K with

        K = [ 0  -1 ]
            [ 1   0 ].

    This yields the coupled system

        x' = Q^T diag(\varepsilon^{-1}, 1) Q z + \omega K x,                  (3.34a)
        z' = -Q^T diag(\varepsilon^{-1}, 1) Q (x - \bar{x}) + \omega K z,     (3.34b)

    where \bar{x} = Q^T \bar{u}. We can write the latter system in our usual notation as a system of order 4,

        y' = A(t) y + q(t).

    (a) Show that the eigenvalues of the matrix A are all purely imaginary for all \omega. [Hint: show that A^T = -A.]


    (b) Using the values \bar{u} = (1, \pi/4)^T, u(0) = \bar{u}, v(0)^T = (1, -1)/\sqrt{2} and b = 20, apply the midpoint method with a constant step size h to the system (3.34) for the following parameter combinations: \varepsilon = 0.001; h = 0.1, 0.05, 0.001; and \omega = 0, 1, 10 (a total of 9 runs). Compute the error indicators max_t |E_F(t) - E_F(0)| and max_t |E_S(t) - E_S(0)|. Discuss your observations.

    (c) Attempt to show that the midpoint method is unstable for this problem if h > 2\sqrt{\varepsilon}/\omega [12]. Conclude that A-stability and AN-stability do not automatically extend to ODE systems.

11. Consider the implicit ODE

        M(y) y' = f(t, y),

    where M(y) is nonsingular for all y. The need for integrating initial value problems of this type typically arises in robotics. When the system size m is large, the cost of inverting M may dominate the entire solution cost. Also, \partial M/\partial y is complicated to evaluate, but it is given that its norm is not large, say O(1).

    (a) Extend the forward Euler and the backward Euler discretizations for this case (without inverting M). Justify.

    (b) Propose a method for solving the nonlinear system of equations resulting at each time step when using backward Euler, for the case where |\partial f/\partial y| is very large.


Chapter 4

One Step Methods

The basic methods developed in Chapter 3 can be adequate for computing approximate solutions of a relatively low accuracy (as we will see, for instance, in Example 4.1), or if the problem being solved is rough in a certain sense (see §3.7). But often in practice a quality solution of high accuracy to a relatively smooth problem is sought, and then using a basic, low order method necessitates taking very small steps in the discretization. This makes the integration process inefficient. Far fewer steps are needed when using a higher order method in such circumstances.

In order to develop efficient, highly accurate approximation algorithms, we therefore design higher order difference methods. The higher order methods we consider in this book are of two types: one step and linear multistep. In each of these classes of methods it will be useful to distinguish further between methods for stiff problems and methods for nonstiff problems. The large picture is depicted in Fig. 4.1.

In this chapter we will explore higher-order one-step methods. These are methods which do not use any information from previous steps (in contrast to linear multistep methods, which will be taken up in the next chapter). Thus, in a typical step of size h = h_n = t_n - t_{n-1}, we seek an approximation y_n to y(t_n) given the previous step end result, y_{n-1}.


[Figure 4.1: Classes of higher order methods. The basic methods branch into the Runge-Kutta and linear multistep families, each of which is subdivided into methods for nonstiff and for stiff problems.]

Review: Recall from Advanced Calculus that Taylor's Theorem for a function of several variables gives

    F(x, y) = F + [ (\partial F/\partial x)(x - \bar{x}) + (\partial F/\partial y)(y - \bar{y}) ]
              + (1/2!) [ (\partial^2 F/\partial x^2)(x - \bar{x})^2 + 2 (\partial^2 F/\partial x \partial y)(x - \bar{x})(y - \bar{y}) + (\partial^2 F/\partial y^2)(y - \bar{y})^2 ]
              + ... + (1/n!) l_n F + ...,

where the functions on the right hand side are evaluated at (\bar{x}, \bar{y}) and

    l_n F = \sum_{j=0}^{n} \binom{n}{j} (\partial^n F / \partial x^j \partial y^{n-j})(\bar{x}, \bar{y}) (x - \bar{x})^j (y - \bar{y})^{n-j}.

The conceptually simplest approach for achieving higher order is to use the differential equation to construct the Taylor series for the solution. For a scalar ODE y' = f(t, y), the Taylor series method is given by replacing the higher derivatives in a truncated Taylor expansion, yielding the formula

    y_n = y_{n-1} + h y'_{n-1} + (h^2/2) y''_{n-1} + ... + (h^p/p!) y^{(p)}_{n-1}


with f(t, y) and its derivatives evaluated at (t_{n-1}, y_{n-1}),

    y'_{n-1}   = f,
    y''_{n-1}  = f_t + f_y f,
    y'''_{n-1} = f_{tt} + 2 f_{ty} f + f_y f_t + f_{yy} f^2 + f_y^2 f,    (4.1)

etc. The local truncation error is h^p y^{(p+1)}(t_n)/(p+1)! + O(h^{p+1}). For a system of differential equations, the derivatives are defined similarly.

A problem with this method is that it requires analytic expressions for derivatives, which in a practical application can be quite complicated. On the other hand, advances in compiler technology have enabled much more robust programs for symbolic and automatic differentiation in recent years, which may make this method more attractive for some applications.

We thus seek one-step methods that achieve a higher accuracy order without forming the symbolic derivatives of the Taylor series method. This leads to Runge-Kutta methods, to which the rest of this chapter is devoted.

4.1 The First Runge-Kutta Methods

We stay with a scalar ODE for some of this exposition. The extension to ODE systems is straightforward, and will be picked up later.

Many Runge-Kutta (RK) methods are based on quadrature schemes. In fact, the reader may want to quickly review basic quadrature rules at this point.


Review: Basic quadrature rules. Given the task of evaluating an integral

    \int_a^b f(t) dt

for some function f(t) on an interval [a, b], basic quadrature rules are derived by replacing f(t) with an interpolating polynomial \phi(t) and integrating the latter exactly. If there are s distinct interpolation points c_1, ..., c_s, then we can write the interpolating polynomial of degree < s in Lagrange form,

    \phi(t) = \sum_{j=1}^{s} f(c_j) L_j(t),

where

    L_j(t) = \prod_{i=1, i \ne j}^{s} (t - c_i)/(c_j - c_i).

Then

    \int_a^b f(t) dt \approx \sum_{j=1}^{s} w_j f(c_j),

where the weights w_j are given by

    w_j = \int_a^b L_j(t) dt.

The precision of the quadrature rule is p if the rule is exact for all polynomials of degree < p, i.e., if for any polynomial f of degree < p,

    \int_a^b f(t) dt = \sum_{j=1}^{s} w_j f(c_j).

If b - a = O(h) then the error in a quadrature rule of precision p is O(h^{p+1}). Obviously, p >= s, but p may be significantly larger than s if the points c_j are chosen carefully. The midpoint and trapezoidal rules have precision p = 2. Simpson's rule has precision p = 4. Gaussian quadrature at s points has the highest precision possible at p = 2s.

Let's reconsider the methods we have already seen in the previous chapter


in the context of quadrature. Writing

    y(t_n) - y(t_{n-1}) = \int_{t_{n-1}}^{t_n} y'(t) dt,    (4.2)

we can approximate the area under the curve y'(t) (see Fig. 4.2) using either the lower sum based on y'(t_{n-1}) (forward Euler) or the upper sum based on y'(t_n) (backward Euler). These are first order methods.

[Figure 4.2: Approximate area under the curve y'(t) on [t_{n-1}, t_n].]

For a better approximation, we can use the height at the midpoint of the interval, i.e. y'(t_{n-1/2}) where t_{n-1/2} = t_n - h/2; see Fig. 4.3.

[Figure 4.3: Midpoint quadrature.]

This leads to the midpoint method (recall (3.28)),

    y_n = y_{n-1} + h f(t_{n-1/2}, (y_{n-1} + y_n)/2).


This is an implicit Runge-Kutta method. We can construct an explicit method based on the same idea by first approximating y(t_{n-1/2}) by the forward Euler method, and then substituting into the midpoint method to obtain

    y_{n-1/2} = y_{n-1} + (h/2) f(t_{n-1}, y_{n-1}),    (4.3a)
    y_n = y_{n-1} + h f(t_{n-1/2}, y_{n-1/2}).          (4.3b)

The obtained explicit midpoint method (4.3) gives us a first real taste of the original Runge-Kutta idea: a higher order is achieved by repeated function evaluations of f within the interval [t_{n-1}, t_n]. Note that this method is not linear in f anymore (substitute (4.3a) into (4.3b) to see this). At first glance, it might seem that the order would be limited to one, because the first stage (4.3a) uses the forward Euler method, which is first order. However, note that the term involving y_{n-1/2} enters into (4.3b) multiplied by h, and therefore its error becomes less important.

Indeed, the local truncation error of (4.3) is given by

    d_n = (y(t_n) - y(t_{n-1}))/h - f(t_{n-1/2}, y(t_{n-1}) + (h/2) f(t_{n-1}, y(t_{n-1})))
        = y' + (h/2) y'' + (h^2/6) y''' - [ f + (h/2)(f_t + f_y f) + (h^2/8)(f_{tt} + 2 f_{ty} f + f_{yy} f^2) ] + O(h^3),    (4.4)

where all quantities on the right hand side are evaluated at (t_{n-1}, y(t_{n-1})). Using the ODE and its derivatives, all but O(h^2) terms cancel. Thus the method is consistent of order 2.

The trapezoidal method considered in the previous chapter is obtained in a similar manner based on applying the trapezoidal quadrature rule to (4.2),

    y_n = y_{n-1} + (h/2) f(t_n, y_n) + (h/2) f(t_{n-1}, y_{n-1}).

This is another implicit Runge-Kutta method. To obtain an explicit method based on this idea, we can approximate y_n in f(t_n, y_n) by the forward Euler method, yielding

    \hat{y}_n = y_{n-1} + h f(t_{n-1}, y_{n-1}),                           (4.5a)
    y_n = y_{n-1} + (h/2) f(t_n, \hat{y}_n) + (h/2) f(t_{n-1}, y_{n-1}).   (4.5b)

This is called the explicit trapezoidal method. Like the explicit midpoint method it is an explicit two-stage Runge-Kutta method of order two.
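For concreteness, here is a minimal sketch (ours, not the book's code) of one step of each of these two explicit two-stage methods for a scalar ODE y' = f(t, y):

    def explicit_midpoint_step(f, t, y, h):
        y_half = y + 0.5 * h * f(t, y)            # (4.3a): forward Euler to t + h/2
        return y + h * f(t + 0.5 * h, y_half)     # (4.3b): midpoint quadrature

    def explicit_trapezoidal_step(f, t, y, h):
        y_hat = y + h * f(t, y)                   # (4.5a): forward Euler predictor
        return y + 0.5 * h * (f(t, y) + f(t + h, y_hat))   # (4.5b): trapezoidal rule

Note that each step costs two evaluations of f, in agreement with the two stages.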


The famous classical fourth order Runge-Kutta method is closely related to Simpson's quadrature rule applied to (4.2),

    y(t_n) - y(t_{n-1}) \approx (h/6) [ y'(t_{n-1}) + 4 y'(t_{n-1/2}) + y'(t_n) ].    (4.6)

To build an explicit approximation of y'(t_{n-1/2}) is not a simple matter anymore, though. The formula is given by

    Y_1 = y_{n-1},
    Y_2 = y_{n-1} + (h/2) f(t_{n-1}, Y_1),
    Y_3 = y_{n-1} + (h/2) f(t_{n-1/2}, Y_2),                                                      (4.7)
    Y_4 = y_{n-1} + h f(t_{n-1/2}, Y_3),
    y_n = y_{n-1} + (h/6) [ f(t_{n-1}, Y_1) + 2 f(t_{n-1/2}, Y_2) + 2 f(t_{n-1/2}, Y_3) + f(t_n, Y_4) ].

It has order 4.

Example 4.1 We compute the solution of the simple Example 3.1,

    y' = -5ty^2 + 5/t - 1/t^2,   y(1) = 1,

using three explicit Runge-Kutta methods: forward Euler, explicit midpoint and the classical fourth order method. We use various fixed step sizes to integrate up to t = 25 and record the absolute errors at the end of the interval in Table 4.1 (the exact solution, to recall, is y(t) = 1/t). We also record for each method a calculated convergence rate. This "rate" is calculated as follows: if the error at step n behaves like e_n(h) = y_n - y(t_n) \approx c h^p for some unknown constant c and rate p, then the error with half the step size should satisfy e_{2n}(h/2) \approx c (h/2)^p. Thus

    p \approx rate := log_2 ( e_n(h) / e_{2n}(h/2) ).

A number of general observations can be deduced already from this very simple example.

1. The error for a given step size is much smaller for the higher order methods. That is the basic reason for embarking on the search for higher order methods in this chapter and the next. Of course, the cost of each step is also higher for a higher order method. Roughly, if the cost is measured simply by the number of evaluations of f (which in complex applications is usually the determining cost factor) then the cost of an RK4 step is double that of the explicit midpoint method, which in turn is double that of the forward Euler method.


    step h   Euler error  rate   RK2 error  rate   RK4 error  rate
    0.2      .40e-2              .71e-3            .66e-6
    0.1      .65e-6       12.59  .33e-6     11.08  .22e-7     4.93
    0.05     .32e-6       1.00   .54e-7     2.60   .11e-8     4.34
    0.02     .13e-6       1.00   .72e-8     2.20   .24e-10    4.16
    0.01     .65e-7       1.00   .17e-8     2.08   .14e-11    4.07
    0.005    .32e-7       1.00   .42e-9     2.04   .89e-13    3.98
    0.002    .13e-7       1.00   .66e-10    2.02   .13e-13    2.13

    Table 4.1: Errors and calculated convergence rates for the forward Euler, the explicit midpoint (RK2) and the classical Runge-Kutta (RK4) methods.

2. Thus, the choice of method depends on the accuracy requirements. Generally, the smaller the error tolerance and the smoother the problem and its solution, the more advantageous it becomes to use higher order methods (see §3.7). Here, if the maximum error tolerance is 10^{-4} then the best choice would be forward Euler. But for an error tolerance 10^{-12} the fourth order method is best.

3. The error is polluted, as evidenced by the deviations of the computed rates from their predicted values of 1, 2 and 4, both for very large and for very small step sizes. For h = .2 an error due to partial violation of absolute stability is observed. For h = .002 the truncation error in the classical fourth order method is so small that the total error begins to be dominated by roundoff error (we have been using floating point arithmetic with 14 hexadecimal digits). Roundoff error generally increases as h decreases, because more steps are required to cover the integration interval. The assumption e_n(h) \approx c h^p presupposes that the roundoff error is dominated for this h by the truncation error, which is often the case in practice.
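A computation like the one summarized in Table 4.1 is easy to reproduce. Here is our own sketch (not the book's code) using the classical method (4.7) for Example 4.1, including the calculated rate:

    import math

    def rk4_step(f, t, y, h):
        k1 = f(t, y)
        k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
        k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
        k4 = f(t + h, y + h * k3)
        return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    def solve(f, t0, y0, b, h):
        n = round((b - t0) / h)
        t, y = t0, y0
        for _ in range(n):
            y = rk4_step(f, t, y, h)
            t += h
        return y

    f = lambda t, y: -5 * t * y**2 + 5 / t - 1 / t**2    # Example 4.1; exact solution y(t) = 1/t

    e1 = abs(solve(f, 1.0, 1.0, 25.0, 0.1) - 1.0 / 25)
    e2 = abs(solve(f, 1.0, 1.0, 25.0, 0.05) - 1.0 / 25)
    print(e1, e2, math.log2(e1 / e2))                    # the rate should be close to 4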


4.2 General Formulation of Runge-Kutta Methods

In general, an s-stage Runge-Kutta method for the ODE system

    y' = f(t, y)

can be written in the form

    Y_i = y_{n-1} + h \sum_{j=1}^{s} a_{ij} f(t_{n-1} + c_j h, Y_j),   i = 1, ..., s,    (4.8a)
    y_n = y_{n-1} + h \sum_{i=1}^{s} b_i f(t_{n-1} + c_i h, Y_i).                        (4.8b)

The Y_i's are intermediate approximations to the solution at times t_{n-1} + c_i h which may be correct to a lower order of accuracy than the solution y_n at the end of the step. Note that the Y_i are local to the step from t_{n-1} to t_n, and the only approximation that the next step "sees" is y_n. The coefficients of the method are chosen in part so that error terms cancel and y_n is more accurate.

The method can be represented conveniently in a shorthand notation,

    c_1 | a_{11}  a_{12}  ...  a_{1s}
    c_2 | a_{21}  a_{22}  ...  a_{2s}
    ... | ...     ...     ...  ...
    c_s | a_{s1}  a_{s2}  ...  a_{ss}
    ----+----------------------------
        | b_1     b_2     ...  b_s

We will always choose

    c_i = \sum_{j=1}^{s} a_{ij},   i = 1, ..., s.    (4.9)

The Runge-Kutta method is explicit iff a_{ij} = 0 for j >= i, because then each Y_i in (4.8a) is given in terms of known quantities. Historically, the first Runge-Kutta methods were explicit. However, implicit Runge-Kutta methods are useful for the solution of stiff systems, as well as for boundary value problems (see Chapter 8).

Some examples of explicit Runge-Kutta methods are given below:


Forward Euler:

    0 | 0
    --+---
      | 1

One-parameter family of second order methods:

    0      | 0                0
    \alpha | \alpha           0
    -------+----------------------------
           | 1 - 1/(2\alpha)  1/(2\alpha)

For \alpha = 1, we have the explicit trapezoidal method, and for \alpha = 1/2 it is the explicit midpoint method.

There are three one-parameter families of third order 3-stage methods. One such family is

    0   | 0                    0              0
    2/3 | 2/3                  0              0
    2/3 | 2/3 - 1/(4\gamma)    1/(4\gamma)    0
    ----+---------------------------------------
        | 1/4                  3/4 - \gamma   \gamma

where \gamma is a parameter.

Finally, the classical fourth order method is written using this notation as

    0   | 0    0    0    0
    1/2 | 1/2  0    0    0
    1/2 | 0    1/2  0    0
    1   | 0    0    1    0
    ----+------------------
        | 1/6  1/3  1/3  1/6

We see that there are s-stage explicit Runge-Kutta methods of order p = s, at least for p <= 4. One may wonder if it is possible to obtain order p > s, and if it is possible to always maintain at least p = s. The answers are both negative. There will be more on this in the next section.
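All of these explicit methods fit one program. Here is a sketch (our illustration, not the book's code) of a generic explicit Runge-Kutta step driven by the tableau coefficients (A, b, c), implementing (4.8) under the assumption that A is strictly lower triangular; internally it accumulates the stage values of f, in the spirit of (4.10) below.

    import numpy as np

    def erk_step(f, t, y, h, A, b, c):
        s = len(b)
        K = [None] * s                       # K[i] holds f(t + c_i h, Y_i)
        for i in range(s):
            Y_i = y + h * sum(A[i][j] * K[j] for j in range(i))
            K[i] = f(t + c[i] * h, Y_i)
        return y + h * sum(b[i] * K[i] for i in range(s))

    # the classical fourth order method in tableau form:
    A = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1.0, 0]]
    b = [1/6, 1/3, 1/3, 1/6]
    c = [0, 0.5, 0.5, 1.0]

    y = np.array([1.0])
    for n in range(10):                      # integrate y' = -y on [0, 1] with h = 0.1
        y = erk_step(lambda t, y: -y, 0.1 * n, y, 0.1, A, b, c)
    print(y, np.exp(-1.0))                   # agreement to about 1e-6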


The choice of intermediate variables Y_i to describe the Runge-Kutta method (4.8) is not the only natural one. Sometimes it is more natural to use intermediate approximations to f rather than y at the interior stages. We leave it to the reader to verify that the general s-stage Runge-Kutta method (4.8) can be written as

    K_i = f( t_{n-1} + c_i h, y_{n-1} + h \sum_{j=1}^{s} a_{ij} K_j ),    (4.10a)
    y_n = y_{n-1} + h \sum_{i=1}^{s} b_i K_i.                             (4.10b)

4.3 Convergence, 0-Stability and Order for Runge-Kutta Methods

The basic convergence of one-step methods is essentially automatic. All of the methods we have seen so far, and any that we will see, are accurate to at least first order, i.e. they are consistent. The fundamental Theorem 3.1 tells us that convergence (to the order of accuracy, as in (3.10)) is achieved, provided only that the method is 0-stable. We can write any reasonable one-step method in the form

    y_n = y_{n-1} + h \psi(t_{n-1}, y_{n-1}, h),    (4.11)

where \psi satisfies a Lipschitz condition in y. (This is obvious for explicit methods. For implicit methods the Implicit Function Theorem is applied.) In the previous chapter we showed, following Theorem 3.1, that the forward Euler method is 0-stable. Replacing f in that proof by \psi yields the same conclusion for the general one-step method (4.11). We leave the details to the exercises.

We next consider the question of verifying the order of a given Runge-Kutta method. Order conditions for general Runge-Kutta methods are obtained by expanding the numerical solution in Taylor series, as we did for second-order methods. For an autonomous ODE system

    y' = f(y)

(without loss of generality, we can consider systems of autonomous differential equations only, because the ODE y' = f(t, y) can be transformed to autonomous form by adding t to the dependent variables: t' = 1, y' = f(t, y)),


the exact solution satisfying y(t_{n-1}) = y_{n-1} (recall that we are interested here in what happens in just one step) has at t_{n-1} the derivatives

    y'   = f =: f^0,
    y''  = f' = (\partial f/\partial y) f =: f^1,
    y''' = (\partial f^1/\partial y) f =: f^2,
    ...
    y^{(k)} = (\partial f^{k-2}/\partial y) f =: f^{k-1}.

(Note: f^j is not the jth power of f.) By Taylor's expansion at y = y_{n-1},

    y(t_n) = y + h y' + (h^2/2) y'' + ... + (h^{p+1}/(p+1)!) y^{(p+1)} + ...
           = y + h f + (h^2/2) f^1 + ... + (h^{p+1}/(p+1)!) f^p + ...

For an s-stage Runge-Kutta method (4.8), substituting y(t) into the difference equations gives that in order to obtain a method of order p we must have

    \sum_{i=1}^{s} b_i f(Y_i) = f + (h/2) f^1 + ... + (h^{p-1}/p!) f^{p-1} + O(h^p).

A Taylor expansion of the f(Y_i) therefore follows.

Although this is conceptually simple, there is an explosion of terms to match which is severe for higher order methods. An elegant theory involving trees for enumerating these terms was developed by J. C. Butcher, in a long series of papers starting in the mid-1960's. The details are complex, though, and do not yield a methodology for designing a method of a desirable order, only for checking the order of a given one.

Instead, we proceed to derive simple, necessary order conditions, to get the hang of it. The idea is that the essence of a method's accuracy is often captured by applying our analysis to very simple equations. Consider the scalar ODE

    y' = y + t^{l-1},   t >= 0,    (4.12)

with l a positive integer and y(0) = 0, for the first step (i.e. t_{n-1} = y_{n-1} = 0). Then

    Y_i = h \sum_{j=1}^{s} a_{ij} (Y_j + (h c_j)^{l-1}).


We can write this in matrix form,

    (I - hA) Y = h^l A C^{l-1} 1,

where A is the s x s coefficient matrix from the tableau defining the Runge-Kutta method, Y = (Y_1, ..., Y_s)^T, C = diag{c_1, ..., c_s} is the diagonal matrix with the coefficients c_j on its diagonal, and 1 = (1, 1, ..., 1)^T. It follows that Y = h^l (I - hA)^{-1} A C^{l-1} 1 and

    y_n = h \sum_{i=1}^{s} b_i (Y_i + (h c_i)^{l-1})
        = h^l b^T [ I + hA + ... + h^k A^k + ... ] C^{l-1} 1,

where b^T = (b_1, b_2, ..., b_s). Now we compare the two expansions for y_n and for y(t_n) and equate equal powers of h. For the exact solution y(t) = \int_0^t e^{t-s} s^{l-1} ds, we have

    y(0) = ... = y^{(l-1)}(0) = 0,   y^{(l+j)}(0) = (l-1)!,   j >= 0.

This yields that for the method to be of order p the following order conditions must be satisfied:

    b^T A^k C^{l-1} 1 = (l-1)!/(l+k)! = 1/( l(l+1) ... (l+k) ),   1 <= l+k <= p.    (4.13)

(The indices run as follows: for each l, 1 <= l <= p, we have order conditions for k = 0, 1, ..., p-l.) In component form the order conditions (4.13) read

    \sum_{i, j_1, ..., j_k} b_i a_{i,j_1} a_{j_1,j_2} ... a_{j_{k-1},j_k} c_{j_k}^{l-1} = (l-1)!/(l+k)!.

The vector form is not only more compact, though; it is also easy to program.

We next consider two simple subsets of these order conditions. Setting k = 0 in (4.13), we obtain the pure quadrature order conditions

    b^T C^{l-1} 1 = \sum_{i=1}^{s} b_i c_i^{l-1} = 1/l,   l = 1, 2, ..., p.    (4.14)

Note that the coefficients a_{ij} of the Runge-Kutta method do not appear in (4.14). Next, setting l = 1 in (4.12), and k <- k+1 in (4.13), we obtain that for the method to be of order p the following order conditions must be satisfied:

    b^T A^{k-1} 1 = 1/k!,   k = 1, 2, ..., p.    (4.15)

These conditions are really additional to (4.14) only when k >= 3, because A1 = c, so A^{k-1} 1 = A^{k-2} c.
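Since the vector form is easy to program, here is a small sketch (ours, not from the book) of a routine that checks the necessary conditions (4.13) for a given tableau and target order p; the function name and tolerance are our own choices.

    import numpy as np
    from math import factorial

    def satisfies_necessary_conditions(A, b, c, p, tol=1e-12):
        A, b = np.asarray(A, float), np.asarray(b, float)
        C, one = np.diag(c), np.ones(len(b))
        for l in range(1, p + 1):
            for k in range(0, p - l + 1):
                # check b^T A^k C^(l-1) 1 = (l-1)!/(l+k)!
                lhs = b @ np.linalg.matrix_power(A, k) @ np.linalg.matrix_power(C, l - 1) @ one
                if abs(lhs - factorial(l - 1) / factorial(l + k)) > tol:
                    return False
        return True

    # the classical method passes for p = 4 and fails for p = 5:
    A = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
    b, c = [1/6, 1/3, 1/3, 1/6], [0, 0.5, 0.5, 1]
    print(satisfies_necessary_conditions(A, b, c, 4))    # True
    print(satisfies_necessary_conditions(A, b, c, 5))    # False

Keep in mind that these conditions are only necessary; a True answer bounds the order from above but does not guarantee it.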


The leading term of the local truncation error for the ODE (4.12) with l = 1 is

    d_n \approx h^p ( b^T A^p 1 - 1/(p+1)! ).

For explicit Runge-Kutta methods, A is strictly lower triangular, hence A^j = 0 for all j >= s. This immediately leads to the conclusions:

1. An explicit Runge-Kutta method can have at most order s, i.e., p <= s.

2. If p = s then the leading term of the local truncation error for the test equation (4.12) with l = 1 cannot be reduced by any choice of coefficients of the explicit Runge-Kutta method.

Example 4.2 Consider all explicit 2-stage Runge-Kutta methods,

    0   | 0    0
    c_2 | c_2  0
    ----+---------
        | b_1  b_2

For l = 1, k = 0, condition (4.13) reads b_1 + b_2 = 1. For l = 1, k = 1, we have b_2 c_2 = 1/2. The condition for l = 2, k = 0, is the same. Denoting \alpha = c_2 results in the family of two-stage, order-two methods displayed in the previous section. For the choice of the parameter \alpha, we can minimize the local truncation error for the quadrature test equation y' = t^p. It is

    d_n \approx h^p ( \sum_{i=1}^{s} b_i c_i^p - 1/(p+1) ).

Trying to achieve b_2 c_2^2 = 1/3 gives the choice \alpha = 2/3, b_2 = 3/4. But this choice does nothing special for the ODE y' = y, for instance.

The obtained set of order conditions (4.13) (recall also (4.9)) is certainly necessary for the method to have order p. These order conditions are not sufficient in general! Still, they can be used to find a simple upper bound on the order of a given method (Exercise 4.4), and also for the purpose of designing new Runge-Kutta methods. In fact, often the order is already determined by the conditions (4.14) plus (4.15) alone.

Example 4.3 We can now view the classical Runge-Kutta method as a result of a methodical design process. The starting point is an attempt to extend Simpson's quadrature rule, which is fourth order. Although Simpson's rule


has only 3 abscissae, 0, 1/2 and 1, we know already that a method of order 4 will not result from only 3 stages, so we settle for 4 stages and choose abscissae c_i = 0, 1/2, 1/2, 1. Next, we must have from (4.15) b^T A^3 1 = b_4 a_21 a_32 a_43 = 1/24. In particular, we must choose a_{i+1,i} \ne 0. The simplest choice then is to set the rest of the a_{ij} to 0, yielding c_{i+1} = a_{i+1,i}, i = 1, 2, 3. The choice b_1 = b_4 = 1/6 is as in Simpson's rule and results from the quadrature conditions (4.14) alone. The final choice b_2 = b_3 is determined by the condition (4.15) with k = 3. This completes the definition of the method. Its order does turn out to be p = 4.

Example 4.4 The simple necessary order conditions (4.14), (4.15) give an upper bound on a method's order which turns out to agree with the order for many if not all the methods in actual use. However, counter-examples where these conditions are not sufficient can be constructed. For explicit Runge-Kutta methods of order p with p stages, these conditions are sufficient for p = 1, 2 and 3. One is tempted to conclude that this is always true, as the famous joke goes, "by induction". For p = 4, however, there are two additional conditions:

    b_3 a_32 c_2^2 + b_4 (a_42 c_2^2 + a_43 c_3^2) = 1/12

and

    b_3 c_3 a_32 c_2 + b_4 c_4 (a_42 c_2 + a_43 c_3) = 1/8.

The first of these is covered by (4.13), but the second, which is of the form b^T C A C 1 = 1/8, is not. Together with (4.14) and (4.15) these conditions imply in particular that we must choose c_4 = 1 (Exercise 4.5). But the conditions (4.13) alone do not imply this. A particular example where these conditions are not sufficient is

    0   | 0    0     0    0
    1/4 | 1/4  0     0    0
    1/2 | 0    1/2   0    0
    3/4 | 0    1/4   1/2  0
    ----+---------------------
        | 0    2/3  -1/3  2/3

Example 4.5 For the sake of completeness, we list below the full set of conditions that must hold for a method to have order at least 5, in addition to those conditions already necessary for order 4. (The order-4 conditions are discussed in the previous Example 4.4. For a higher order we recommend that the reader get a hold of a more in-depth book.) The first 4 of these are


included in (4.13).

    b^T C^4 1 = 1/5
    b^T A^4 1 = 1/120
    b^T A^2 C^2 1 = 1/60
    b^T A C^3 1 = 1/20
    b^T C^2 A C 1 = 1/10
    b^T C A C^2 1 = 1/15
    b^T C A^2 C 1 = 1/30
    b^T C A C A C 1 = 1/40
    \sum_{i,j,k} b_i a_{ij} c_j a_{ik} c_k = 1/20

Finally, we return to the question: what is the maximal attainable order p by an explicit s-stage Runge-Kutta method? This question turns out to have a complicated answer. Given (4.9), the number of coefficients in an explicit s-stage method is s(s+1)/2, but it does not bear a simple relationship to the number of independent order conditions. Indeed, one often encounters families of "eligible" methods for given order and number of stages in practice. Still, we have the following limitations on the attainable order as a function of the number of stages:

    number of stages   1  2  3  4  5  6  7  8  9  10
    attainable order   1  2  3  4  4  5  6  6  7  7

This explains in part why the fourth order explicit Runge-Kutta method is so popular (especially when no adaptive error control is contemplated).


4.4 Regions of Absolute Stability for Explicit Runge-Kutta Methods

In this section we investigate the regions of absolute stability for explicit Runge-Kutta methods. To recall, this region is obtained for a given method by determining for what values of z = h\lambda we get |y_n| <= |y_{n-1}| when applying the method to the test equation

    y' = \lambda y.    (4.16)

(We must consider \lambda, hence z, to be a complex number, because it represents an eigenvalue of a matrix in general.) This test equation is an obvious generalization of (4.12) without the inhomogeneity. Repeating the same arguments here, we obtain

    y_n = [ 1 + z b^T (I - zA)^{-1} 1 ] y_{n-1}                       (4.17)
        = [ 1 + z b^T (I + zA + ... + z^k A^k + ...) 1 ] y_{n-1}.

Substituting (4.15) into this expression and writing (4.17) as

    y_n = R(z) y_{n-1},

we get for a Runge-Kutta method of order p

    R(z) = 1 + z + z^2/2 + ... + z^p/p! + \sum_{j>p} z^j b^T A^{j-1} 1.    (4.18)

For an s-stage explicit method of order p, since A^{j-1} = 0 for j > s, we get

    y_n = [ 1 + z + z^2/2 + ... + z^p/p! + \sum_{j=p+1}^{s} z^j b^T A^{j-1} 1 ] y_{n-1}.

In particular, the region of absolute stability of an explicit pth order RK method for s = p <= 4 is given by

    | 1 + h\lambda + (h\lambda)^2/2 + ... + (h\lambda)^p/p! | <= 1.    (4.19)

Thus we note that all p-stage explicit Runge-Kutta methods of order p have the same region of absolute stability. For an s-stage method with order p < s, the absolute stability region is seen to depend somewhat on the method's coefficients. (For the fourth, and to a lesser extent the third, order methods depicted in Fig. 4.4 there is a stretch along the imaginary axis of z where |R(z)| < 1. This translates to dissipativity when such methods are used to construct finite difference approximations to hyperbolic PDEs, and it facilitates using such methods as smoothers when designing multigrid solvers for certain PDEs. No such effect occurs for lower order discretizations. A full discussion of this is well beyond the scope of this book.)

The stability regions for the explicit p-stage pth order RK methods, 1 <= p <= 4, are shown in Fig. 4.4.


[Figure 4.4: Stability regions in the complex z-plane for p-stage explicit Runge-Kutta methods of order p, p = 1, 2, 3, 4. The inner circle corresponds to forward Euler, p = 1. The larger p is, the larger the stability region. Note the "ear lobes" of the 4th order method protruding into the right half plane.]

How do you plot a region of absolute stability? Recall that the numbers of modulus one in the complex plane are represented by e^{i\theta}, for 0 <= \theta <= 2\pi. The stability condition is given by |R(z)| <= 1, where R(z) is given by (4.18). For explicit Runge-Kutta methods R(z) is a polynomial in z = h\lambda given, e.g., by the expression whose magnitude appears in (4.19). Thus, to find the boundary of the region of absolute stability, we find the roots z(\theta) of

    R(z) = e^{i\theta}

for a sequence of \theta values. Starting with \theta = 0, for which z = 0, we repeatedly increase \theta by a small increment, each time applying a root finder to find the corresponding z, starting from the z of the previous \theta as a first guess (this is an elementary example of a continuation method), until the stability boundary curve returns to the origin.

It is also possible to compute the region of absolute stability via a brute force approach. To do this, we first form a grid over a large part of the complex plane including the origin. Then at each mesh point z_{ij}, if |R(z_{ij})| < 1, we mark z_{ij} as being inside the stability region.
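As an illustration of the brute force approach, here is our short sketch (not the book's code): evaluate the stability polynomial of the p-stage, order-p explicit methods on a grid and mark where |R(z)| < 1. The grid extents are chosen to match Fig. 4.4.

    import numpy as np
    from math import factorial

    def R(z, p):                              # the stability polynomial in (4.19)
        return sum(z**j / factorial(j) for j in range(p + 1))

    x = np.linspace(-6.0, 1.0, 701)
    y = np.linspace(-3.5, 3.5, 701)
    X, Y = np.meshgrid(x, y)
    Z = X + 1j * Y

    for p in (1, 2, 3, 4):
        inside = np.abs(R(Z, p)) < 1.0        # boolean mask of the stability region
        print(p, inside.mean() * 7.0 * 7.0)   # crude area estimate; it grows with p

Plotting the contour |R(z)| = 1 over such a grid (e.g. with matplotlib's contour function) reproduces Fig. 4.4.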


Finally, we note that no explicit Runge-Kutta method can have an unbounded region of absolute stability. This is because all Runge-Kutta methods applied to the test equation y' = \lambda y yield

    y_n = R(z) y_{n-1},   z = h\lambda,

where R(z) is a polynomial of degree s. Since |R(z)| -> infinity as |z| -> infinity, very large negative values of z cannot be in the region of absolute stability. In fact, it turns out that all known explicit Runge-Kutta methods are inappropriate for stiff problems, and we are led to consider implicit Runge-Kutta methods in §4.7. Before that, though, we discuss some of the ingredients necessary to write robust software for initial value ODEs.

4.5 Error Estimation and Control

In virtually all modern codes for ODEs, the step size is selected automatically to achieve both reliability and efficiency. Any discretization method with a constant step size will perform poorly if the solution varies rapidly in some parts of the integration interval and slowly in other, large parts of the integration interval, and if it is to be resolved well everywhere by the numerical method (see Exercise 4.12). In this section we will investigate several ways to estimate the error and select the next step h = h_n = t_n - t_{n-1}. Since we strive to keep the entire integration process local in time (i.e., we march in time with all the information locally known) we attempt to control the local error or the local truncation error, rather than the global error. Basically, by specifying an error tolerance ETOL a user can require a more accurate (and more expensive) approximate solution or a less accurate (and cheaper) one. Our step-size selection strategy may attempt to roughly equate the errors made at each step, e.g.,

    |l_n| \approx ETOL,

where l_n is the local error. (Recall from (3.14) that the local truncation error in the nth time step is related to the local error by h_n(|d_n| + O(h^{p+1})) = |l_n|(1 + O(h_n)); thus, local error control and step-size selection are sometimes viewed as controlling the local truncation error. The vector l_n has m components for a system of m first order ODEs.) This makes the step size as large as possible, but to achieve a higher success rate in such step-size predictions, we typically use some fraction of ETOL for safety. The global error also relates to the tolerance in case it can be obtained as a simple sum of local errors.

If the components of the solution y are very different in magnitude then we are better off to consider an array of tolerances. In fact, for each component j of y (1 <= j <= m) it may be necessary to specify an absolute error


tolerance ATOL_j, in addition to a common relative error tolerance RTOL. One then wants to choose h so that for each j, 1 <= j <= m,

    |(l_j)_n| <= frac [ ATOL_j + |(y_j)_n| RTOL ],

where frac is a safety fraction (say, frac = .9). Good codes allow the specification of m+1 tolerances as well as a default option of specifying only one or two.

Let us next assume again a scalar ODE, for notational simplicity. A basic problem for estimating the step size in Runge-Kutta methods is that the expression for the local truncation error is so complicated. For example, the local truncation error of the 2-stage family of explicit second order methods derived earlier for a scalar ODE is given by

    h d_n = (h^3/6) [ (3/4)(f_{yy} f^2 + 2 f_{ty} f + f_{tt}) - y''' ] + O(h^4).

Since the whole purpose of Runge-Kutta methods was to eliminate the need to calculate the symbolic partial derivatives explicitly, we will look for methods to estimate the error at each step which do not use the partial derivatives directly. For this reason, it is convenient in the case of Runge-Kutta methods to estimate the local error, rather than the local truncation error.

The essential idea of the methods described below is to calculate two approximate solutions y_n and \bar{y}_n at t_n, such that \bar{y}_n - y_n gives an estimate of the local error of the less accurate of the two approximate solutions, y_n. We can then check whether |\bar{y}_n - y_n| <= ETOL. If this inequality is not satisfied then the step h is rejected and another step \tilde{h} is selected instead. If the method for finding y_n has order p then l_n(\tilde{h}) \approx c \tilde{h}^{p+1}, so we choose \tilde{h} to satisfy

    (\tilde{h}/h)^{p+1} |\bar{y}_n - y_n| \approx frac ETOL

and repeat the process until an acceptable step size is found. (Another safety factor, ensuring that \tilde{h}/h is neither too large nor too small, is used in practice, because we are using a simplified model for the error which does not take into account large h, roundoff error and absolute stability effects.) If the step is accepted then the same formula can be used to predict a larger step size h_{n+1} = \tilde{h} for the next time step.

Embedded methods

We are searching, then, for methods which deliver two approximations, y_n and \bar{y}_n, at t_n. A pair of Runge-Kutta methods of orders p and p+1, respectively, will do the job. The key idea of embedded methods is that such a pair


will share stage computations. Thus we seek to derive an s-stage formula of order p+1 such that there is another formula of order p embedded inside it (therefore, using the same function evaluations). If the original method is given by the tableau (c, A) with weights b^T, and the embedded method by the same (c, A) with weights \hat{b}^T, then we use a combined notation for an embedded method:

    c | A
    --+----------
      | b^T
      | \hat{b}^T

The simplest example is forward Euler embedded in the modified trapezoid method:

    0 | 0    0
    1 | 1    0
    --+---------
      | 1    0
      | 1/2  1/2

Probably the most famous embedded formula is the Fehlberg 4(5) pair. It has 6 stages and delivers a method of order 4 with an error estimate (or a method of order 5 without):

    0     |
    1/4   | 1/4
    3/8   | 3/32        9/32
    12/13 | 1932/2197   -7200/2197  7296/2197
    1     | 439/216     -8          3680/513     -845/4104
    1/2   | -8/27       2           -3544/2565   1859/4104     -11/40
    ------+----------------------------------------------------------------------
          | 25/216      0           1408/2565    2197/4104     -1/5      0
          | 16/135      0           6656/12825   28561/56430   -9/50     2/55


Note that we have omitted obvious 0's in the tableau. The somewhat unintuitive coefficients of the Fehlberg pair arise not only from satisfying the order conditions but also from an attempt to minimize the local error in y_n.

A question arises with any error estimate, whether one should add the estimate to the solution to produce a more accurate method (but now with no close error estimate). Here this would simply mean using \bar{y}_n rather than y_n for the start of the next step. Of course, this casts doubt on the quality of the error estimation, but users rarely complain when a code provides more accuracy than requested. Besides, the quality of ETOL as an actual error estimate is questionable in any case, because it does not directly relate to the actual, global error. More on this later. This strategy, called local extrapolation, has proven to be successful for some methods for nonstiff problems, and all quality explicit Runge-Kutta codes use it, but it is not so common in the solution of stiff problems.

The methods of Dormand and Prince bite the bullet. They are designed to minimize the local error in \bar{y}_n, in anticipation that the latter will be used for the next step. The 4(5) pair given below has 7 stages, but the last stage is the same as the first stage for the next step (y_n = Y_7, and in the next step n-1 <- n, Y_1 = y_{n-1}, so Y_7 at the current step and Y_1 at the next step are the same), so this method has the cost of a 6-stage method.

    0    |
    1/5  | 1/5
    3/10 | 3/40        9/40
    4/5  | 44/45       -56/15       32/9
    8/9  | 19372/6561  -25360/2187  64448/6561   -212/729
    1    | 9017/3168   -355/33      46732/5247   49/176         -5103/18656
    1    | 35/384      0            500/1113     125/192        -2187/6784     11/84
    -----+------------------------------------------------------------------------------------
         | 5179/57600  0            7571/16695   393/640        -92097/339200  187/2100  1/40
         | 35/384      0            500/1113     125/192        -2187/6784     11/84     0

Note: For stiff problems, the stability properties of a method and its embedded pair should be similar; see Exercise 5.7.
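Here is a minimal sketch (ours, not from any production code) of the resulting adaptive step, using the simplest embedded pair displayed above: forward Euler (order p = 1) inside the explicit trapezoidal method. The acceptance test and the update rule (\tilde{h}/h)^{p+1} |err| \approx frac ETOL follow the strategy described earlier; the growth cap of 2 is an illustrative safety factor.

    def adaptive_step(f, t, y, h, etol, frac=0.9):
        while True:
            k1 = f(t, y)
            k2 = f(t + h, y + h * k1)
            y_low = y + h * k1                     # order 1 (forward Euler)
            y_high = y + 0.5 * h * (k1 + k2)       # order 2 (explicit trapezoid)
            err = abs(y_high - y_low)              # estimates the local error of y_low
            h_new = h * (frac * etol / err) ** 0.5 if err > 0 else 2 * h
            if err <= etol:
                return t + h, y_low, min(h_new, 2 * h)   # accept; suggest the next step size
            h = h_new                              # reject; retry with the smaller step

Returning y_high instead of y_low in the accept branch would correspond to the local extrapolation strategy discussed above.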


Step doubling

The idea behind step doubling is simple. By subtracting the solution obtained with two steps of size h_n = h from the solution obtained using one step of size 2h, we obtain an estimate of the local error. Since we know the form of the local error as h -> 0, we can estimate it well. To make this more precise, write the local error (recall (3.12)-(3.13)) as

    l_n = \psi(t_n, y(t_n)) h^{p+1} + O(h^{p+2}).

The function \psi is called the principal error function. Now, let y_n be the solution using two steps of size h starting from y_{n-2}, and let \tilde{y}_n be the solution taking one step of size 2h from y_{n-2}. Then the two local errors satisfy

    2 l_n(h) = 2 \psi h^{p+1} + O(h^{p+2}),
    l_n(2h) = \psi (2h)^{p+1} + O(h^{p+2}),

where we have assumed that the local error after two steps is twice the local error after one step. (This is true in the limit as h -> 0.) Then

    |\tilde{y}_n - y_n| \approx 2 h^{p+1} (2^p - 1) |\psi(t_n, y_n)| + O(h^{p+2}).

Thus,

    2 l_n \approx 2 h^{p+1} \psi(t_n, y(t_n)) \approx |\tilde{y}_n - y_n| / (2^p - 1).

Although step doubling gives an accurate local error estimate, it is more expensive per step, especially for stiff problems. The embedded error estimates are cheaper, especially if the importance of an accurate local error estimate is discounted. The step doubling procedure is general, though, and works without inventing special embedded pair formulae.
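The step doubling estimate is easy to code generically. A sketch (ours), assuming step implements any one-step method of order p, e.g. the rk4_step sketched earlier with p = 4:

    def local_error_estimate(step, p, f, t, y, h):
        y_two = step(f, t + h, step(f, t, y, h), h)    # two steps of size h
        y_big = step(f, t, y, 2 * h)                   # one step of size 2h
        return abs(y_big - y_two) / (2**p - 1), y_two  # estimate of 2 l_n, and the fine solution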


Example 4.6  While the global error often behaves like |e_n| ≈ max_j |d_j|, there are exceptions. Consider

  y' = λ(y − sin t) + cos t,   t ≥ 0

with y(0) = 0 and λ = 50, say. Here the exact solution y(t) = sin t is smooth and nicely bounded, and this is what local errors and locally based step size selection relate to. But globally the error accumulates roughly like |e_n| ≈ e^{50 t_n} max_j |d_j|. Therefore the actual, global error at t = 1, say, will be much poorer than the local error, and will not relate well to a user-given local error tolerance.

Fortunately, examples like this are rare in practice, or so one hopes (see Exercise 4.18).  □

While local error estimates are more easily obtained and they allow a more robust, dynamical error control and step size selection, satisfying a global error tolerance is typically closer to what the user (i.e. the person looking for a solution to a particular application) may want. But how does a user go about specifying a global error tolerance? And how accurately need the error tolerance(s) be specified and satisfied?

These questions arise in modeling and depend strongly on the application. Here we just note that a precise error bound is often unknown and not really needed. When a ball-park value for the global error would do, the stock-value of a local error tolerance goes up.

4.6 Sensitivity to Data Perturbations

One important factor in assessing the choice of error tolerance for a given application is the accuracy expected of the exact solution. Real-world models often involve various parameters and initial data whose values are determined by inaccurate means. The exact solution of the given initial value ODE system may therefore be viewed as a sample out of a cluster of trajectories. It makes no sense then (it is a waste of resources) to impose an error tolerance so strict that the computed solution is much closer to the exact solution than this exact solution trajectory is to its equally valid neighbor trajectories.

So, to assess solution accuracy (and worth) in practice a user often needs a sensitivity analysis, i.e., we ask by how much does the exact solution change when the data are perturbed? Below we consider the relatively simple case of small perturbations.

To be specific, let us consider an IVP depending on parameters

  y' = f(t, y, p),   0 < t < b                                    (4.20)
  y(0) = c.


The l parameters p can be functions of t, but for simplicity assume they are all given constants. Denote the exact solution of (4.20) by y(t). We next consider a perturbation vector δ, i.e.,

  p̄ = p + δ.

Call the resulting solution of (4.20) (i.e., with p̄ replacing p) ȳ(t). We seek a bound on ||y − ȳ|| in terms of |δ| in case |δ| is so small that O(|δ|²) terms can be considered negligible.

Thus we write

  ȳ(t) = ȳ(t, p + δ) ≈ y(t, p) + (∂y(t, p)/∂p) δ

and obtain

  |ȳ(t) − y(t)| ≤ |P(t)δ| + O(|δ|²),   0 ≤ t ≤ b                  (4.21a)

where

  P = ∂y/∂p                                                       (4.21b)

is an m × l matrix function. The simplest form of sensitivity analysis therefore consists of approximately calculating P(t). Then, given bounds on the parameter variation

  |δ_j| ≤ δ^U_j,   1 ≤ j ≤ l,

we can determine for each t

  ε_i = max |Σ_{j=1}^l P_{ij} δ_j| ≤ Σ_{j=1}^l |P_{ij}| δ^U_j,

giving the approximate bound

  |ȳ(t) − y(t)| ≤ ε(t) + O(|δ^U|²),   0 ≤ t ≤ b.                  (4.22)

The behavior of the perturbation matrix function P(t) is governed by a linear initial value ODE. Differentiating (4.20) with respect to p and noting that the initial conditions are assumed independent of p, we obtain

  P' = (∂f/∂y) P + ∂f/∂p,   0 < t < b                             (4.23)
  P(0) = 0.

For each column of P (corresponding to one parameter in p), we therefore have an initial value problem which depends on y(t) but is linear in P.


Thus, in order to estimate the perturbation function in practice we may solve (4.20) with a relatively permissive error tolerance, and compute P by integrating (4.23) along as well, using the same time step sequence. The combined system can be solved efficiently, noting that the sensitivity system (4.23) is linear and shares the iteration matrix of the original system (4.20), and exploiting this structure.

Before turning to an example, we remark that a similar treatment can be applied to assess solution sensitivity with respect to perturbations in the initial data c. This is left to Exercise 6.4.

Example 4.7  A simplified description of the motion of a car in an arena is given by the equations corresponding to the ODE in (4.20),

  x' = v cos θ                                                    (4.24a)
  y' = v sin θ                                                    (4.24b)
  θ' = v tan φ / L                                                (4.24c)
  v' = a − γ v                                                    (4.24d)

where x and y are the Cartesian coordinates of the car's center, θ is the angle the car makes with the x-axis (see Fig. 4.5), and v is the velocity with which the car is moving. Denote y = (x, y, θ, v)^T.

Figure 4.5: Schematic of a mobile robot (not to scale).


The damping (friction) factor γ and the car's length L are given: we take γ = 1/6 and L = 11.5 cm (it's a toy car). The acceleration a and the steering angle φ that the front wheels make with the car's body are two functions which one normally controls in order to drive a car. Here we take them, for simplicity, as constant parameters: we set a = 100 cm s^{-2}, φ = 1. We are interested in the sensitivity of the car's position (x, y) with respect to a constant change in φ.

Since we are checking sensitivity with respect to only one parameter, P(t) is a vector of length 4. The differential equations (4.23) for P(t) are found by differentiating (4.24) with respect to the parameter φ, to obtain

  P_1' = −v P_3 sin θ + P_4 cos θ
  P_2' = v P_3 cos θ + P_4 sin θ
  P_3' = [P_4 tan φ + v/cos²φ] / L
  P_4' = −γ P_4.

Note that P depends on y but y does not depend on P, and that the ODE for P is linear, given y.

We use Matlab to compute the solutions for y(t) and P(t), starting with y(0) = (10, 10, 1, 0)^T, P(0) = 0. We evaluate ȳ_±(t) = y(t) ± δ P(t), and we also numerically solve (4.20) directly for the perturbed problems, where φ is replaced by φ ± δ. The resulting plots for δ = 0.01 and for δ = 0.05 are given in Fig. 4.6. We see that for δ = 0.01 the linear sensitivity analysis captures the trajectory perturbation rather well. Also, not surprisingly for anyone who drives, the distance between y(t) and the perturbed trajectories increases with t. As the size of the perturbation is increased to δ = 0.05, the linear approximation becomes less valid.  □

Sensitivity analysis plays an important role in a number of situations, in addition to assessing the accuracy of a model. In model development and model reduction, the sensitivity of the solution with respect to perturbations in the parameters is often used to help make decisions on which parts of the model are actively contributing in a given setting. Partial derivative matrices similar to those occurring in sensitivity analysis also arise in the shooting and multiple shooting methods for boundary value problems (see Chapter 7), as well as in parameter estimation, design optimization and optimal control.
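The computation in Example 4.7 can be reproduced with a short script. The following Python sketch (using scipy here in place of the Matlab tools mentioned above) integrates (4.24) together with the sensitivity system for P; the integration interval and tolerances are assumptions made only for illustration.

  import numpy as np
  from scipy.integrate import solve_ivp

  gamma, L, a, phi = 1.0/6.0, 11.5, 100.0, 1.0     # constants of Example 4.7

  def rhs(t, z):
      # z packs y = (x, y, theta, v) and P = dy/dphi, integrated together
      x, yy, th, v, P1, P2, P3, P4 = z
      dy = [v*np.cos(th), v*np.sin(th), v*np.tan(phi)/L, a - gamma*v]
      dP = [-v*P3*np.sin(th) + P4*np.cos(th),          # d/dphi of (4.24a)
             v*P3*np.cos(th) + P4*np.sin(th),          # d/dphi of (4.24b)
            (P4*np.tan(phi) + v/np.cos(phi)**2)/L,     # d/dphi of (4.24c)
            -gamma*P4]                                 # d/dphi of (4.24d)
      return dy + dP

  z0 = [10.0, 10.0, 1.0, 0.0] + [0.0]*4              # y(0) and P(0) = 0
  sol = solve_ivp(rhs, (0.0, 1.0), z0, rtol=1e-6, atol=1e-8)

  # first order prediction of the perturbed route, cf. Fig. 4.6
  delta = 0.01
  x_plus = sol.y[0] + delta*sol.y[4]
  y_plus = sol.y[1] + delta*sol.y[5]

Note that, as remarked above, a production implementation would instead reuse the time step sequence and iteration matrix of the state integration when advancing P.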


(a) δ = 0.01   (b) δ = 0.05

Figure 4.6 [plots of y vs. x]: Toy car routes under constant steering: unperturbed (solid line), steering perturbed by ±δ (dash-dot lines), and corresponding trajectories computed by the linear sensitivity analysis (dashed lines).


4.7 Implicit Runge-Kutta and Collocation Methods

Compared to explicit Runge-Kutta methods, for implicit Runge-Kutta methods there are many more parameters to choose in (4.8) or (4.10). Thus, we might expect to be able to attain a higher order for a given number of stages. This turns out to be the case, as we have already seen for the implicit midpoint method, which is a 1-stage method of order 2. Moreover, the amplification function R(z) is no longer a polynomial. This enables the construction of implicit Runge-Kutta methods which are appropriate for the solution of stiff systems.

Many of the most commonly used implicit Runge-Kutta methods are based on quadrature methods - that is, the points at which the intermediate stage approximations are taken are the same points used in certain classes of quadrature formulas. There are several classes of these methods, of which we give some examples with the first two instances for each.

Gauss methods - these are the maximum order methods - an s-stage Gauss method has order 2s:

  1/2 | 1/2
  ----+----
      | 1                          implicit midpoint, s = 1, p = 2

  (3−√3)/6 | 1/4          (3−2√3)/12
  (3+√3)/6 | (3+2√3)/12   1/4
  ---------+------------------------
           | 1/2          1/2                         s = 2, p = 4

Radau methods - these correspond to quadrature rules where one end of the interval is included (c_1 = 0 or c_s = 1), and attain order 2s−1. The choice c_1 = 0 makes no sense so we consider only the case c_s = 1:

  1 | 1
  --+---
    | 1                            backward Euler, s = 1, p = 1

  1/3 | 5/12   -1/12
  1   | 3/4    1/4
  ----+-------------
      | 3/4    1/4                                    s = 2, p = 3


Lobatto methods - these correspond to quadrature rules where the function is sampled at both ends of the interval. The order of accuracy is 2s−2. There are three families. One such is:

  0 | 0     0
  1 | 1/2   1/2
  --+----------
    | 1/2   1/2                    trapezoidal method, s = 2, p = 2

  0   | 0      0     0
  1/2 | 5/24   1/3   -1/24
  1   | 1/6    2/3   1/6
  ----+-------------------
      | 1/6    2/3   1/6                              s = 3, p = 4

Note that, in constructing a Runge-Kutta method, common sense should prevail. For example, while there is no analytical reason why we should choose 0 ≤ c_i ≤ 1, in physical applications it sometimes does not make sense to evaluate the function outside the interval.

A Runge-Kutta method with a nonsingular coefficient matrix A which satisfies a_{sj} = b_j, j = 1, ..., s, is called stiffly accurate. This gives stiff decay (Exercise 4.7).

4.7.1 Implicit Runge-Kutta Methods Based on Collocation

Collocation is an idea which runs throughout numerical analysis. The basic idea is to choose a function from a simple space (usually a polynomial), and a set of collocation points, and require that the function satisfy the given problem at the collocation points.

Starting with a set of s distinct points 0 ≤ c_1 < c_2 < ... < c_s ≤ 1, and considering for simplicity a scalar ODE y' = f(t, y) at first, we seek the polynomial φ(t) of degree at most s which collocates the ODE as follows:

  φ(t_{n−1}) = y_{n−1}
  φ'(t_i) = f(t_i, φ(t_i)),   i = 1, 2, ..., s

where t_i = t_{n−1} + c_i h are the collocation points. This defines φ(t) uniquely.^7

^7 Note that if we collect the polynomial pieces defined in this way on each step interval [t_{n−1}, t_n] into one function defined on [0, b], then we get a continuous, piecewise polynomial approximation of the solution y(t).


Now, take

  y_n = φ(t_n).

This gives an s-stage implicit Runge-Kutta method. Why? Observe that φ' is a polynomial of degree at most s−1 which interpolates the s data points f(t_i, φ(t_i)). Define K_i = φ'(t_i). Now, write φ' as a Lagrange interpolation formula

  φ'(t_{n−1} + τh) = Σ_{j=1}^s L_j(t_{n−1} + τh) K_j

where L_j(t_{n−1} + τh) = Π_{i=1, i≠j}^s (τ − c_i)/(c_j − c_i). (Because φ' is a polynomial of degree < s, it agrees with its s-point interpolant identically.) Integrating φ' with respect to t from t_{n−1} to t_i, i = 1, 2, ..., s, and from t_{n−1} to t_n, we get

  φ(t_i) − φ(t_{n−1}) = h Σ_{j=1}^s (∫_0^{c_i} L_j(r) dr) K_j
  φ(t_n) − φ(t_{n−1}) = h Σ_{j=1}^s (∫_0^1 L_j(r) dr) K_j.

(Recall again our brief review of basic quadrature.) Now define

  a_{ij} = ∫_0^{c_i} L_j(r) dr,
  b_j = ∫_0^1 L_j(r) dr.                                          (4.25)

Thus, K_i = f(t_i, φ(t_i)) = f(t_i, y_{n−1} + h Σ_{j=1}^s a_{ij} K_j), and y_n = y_{n−1} + h Σ_{i=1}^s b_i K_i. The obtained formula is therefore a Runge-Kutta method in the form (4.10). Finally, note that for the general ODE system

  y' = f(t, y)

precisely the same argument can be repeated, where now we have a vector of m collocation polynomials, φ(t).

The Gauss, Radau and Lobatto methods introduced above are collocation methods. That is, given the quadrature points c_i, in each case all the other method's coefficients are determined by (4.25). We note:

• Runge-Kutta methods which are also collocation methods are easy to derive.
• The order of such a collocation Runge-Kutta method is at least s and is determined only by the quadrature order condition (4.14) (i.e., the order limitation is a result from quadrature theory).


• The maximum order of an s-stage Runge-Kutta method is 2s.

The last two conclusions require a proof which is left for the exercises. We note here that the order is restricted to be at most 2s by the quadrature order condition (4.14), and that a simple collocation analysis reveals that this order 2s is attained by collocation at Gaussian points.

With regard to absolute stability, we have already seen in (4.17) that for the test equation y' = λy a Runge-Kutta method reads

  y_n = R(z) y_{n−1}

where z = hλ and

  R(z) = 1 + z b^T (I − zA)^{−1} 𝟙.                               (4.26)

The region of absolute stability is given by the set of values z such that |R(z)| ≤ 1. For an explicit method, we saw that R(z) is a polynomial, and hence the method cannot be A-stable. For implicit Runge-Kutta methods, in contrast, R(z) is a rational function, i.e. it is a quotient of two polynomials,

  R(z) = P(z)/Q(z),

and A-stable methods are abundant. All of the implicit Runge-Kutta methods which we have seen so far turn out to be A-stable.

When Re(z) → −∞ we also would like a method to have stiff decay. For this we must have R(−∞) = 0 in (4.26), which is achieved if P(z) has a lower degree than Q(z). Note that by (4.26), if A is nonsingular then R(−∞) = 1 − b^T A^{−1} 𝟙, so R(−∞) = 0 if the last row of A coincides with b^T. For a collocation Runge-Kutta method this happens when c_1 > 0 and c_s = 1. In particular,

• The Radau methods, extending backward Euler, have stiff decay.
• The Gauss and Lobatto methods, which extend midpoint and trapezoid, do not have stiff decay, although they are A-stable.

These innocent looking conclusions have in fact far reaching importance. The Gauss and Lobatto methods are families of symmetric methods - this is important particularly in the context of boundary value problems. Symmetric methods can work for stiff problems, but for very stiff problems they do not approximate the exponential function well. The arguments of the previous chapter extend here directly. The Radau methods, on the other hand, are particularly suitable for the solution of stiff initial value ODEs.
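Both (4.25) and (4.26) are easy to evaluate numerically. The following Python sketch builds the coefficients of a collocation Runge-Kutta method from its points c_i and then samples the stability function; the printed values for the 2-stage Radau points c = (1/3, 1) should reproduce the tableau given earlier and exhibit stiff decay.

  import numpy as np

  def collocation_rk(c):
      # coefficients from (4.25): a_ij = int_0^{c_i} L_j, b_j = int_0^1 L_j
      s = len(c)
      A, b = np.zeros((s, s)), np.zeros(s)
      for j in range(s):
          Lj = np.poly1d([1.0])
          for i in range(s):
              if i != j:
                  Lj = Lj * np.poly1d([1.0, -c[i]]) / (c[j] - c[i])
          Ij = np.polyint(Lj)            # antiderivative, with Ij(0) = 0
          b[j] = Ij(1.0)
          for i in range(s):
              A[i, j] = Ij(c[i])
      return A, b

  def R(z, A, b):
      # stability function (4.26): R(z) = 1 + z b^T (I - zA)^{-1} 1
      s = len(b)
      return 1.0 + z * (b @ np.linalg.solve(np.eye(s) - z*A, np.ones(s)))

  A, b = collocation_rk(np.array([1.0/3.0, 1.0]))    # 2-stage Radau
  print(A, b)                  # rows 5/12, -1/12 and 3/4, 1/4; b = (3/4, 1/4)
  print(abs(R(-1e8, A, b)))    # nearly 0: stiff decay; for Gauss points it is ~1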


Reader's advice: The technical level of the rest of §4.7 is higher than what we've had so far, and although it is of practical interest, skipping it should not impede the reading of the next section nor of subsequent chapters.

4.7.2 Implementation and Diagonally Implicit Methods

One of the challenges for implicit Runge-Kutta methods is the development of efficient implementations. To see why efficiency could be a problem, we consider again the general Runge-Kutta method

  Y_i = y_{n−1} + h Σ_{j=1}^s a_{ij} f(t_{n−1} + c_j h, Y_j)
  y_n = y_{n−1} + h Σ_{i=1}^s b_i f(t_{n−1} + c_i h, Y_i).

For the ν-th Newton iterate, let δ_i = Y_i^{ν+1} − Y_i^ν and r_i = Y_i^ν − y_{n−1} − h Σ_{j=1}^s a_{ij} f(Y_j^ν). Then the Newton iteration takes the form

  [ I − h a_{11} J_1    −h a_{12} J_2   ...    −h a_{1s} J_s ] [ δ_1 ]     [ r_1 ]
  [  −h a_{21} J_1   I − h a_{22} J_2   ...    −h a_{2s} J_s ] [ δ_2 ]     [ r_2 ]
  [       ...                ...        ...         ...      ] [ ... ] = − [ ... ]
  [  −h a_{s1} J_1      −h a_{s2} J_2   ...  I − h a_{ss} J_s ] [ δ_s ]     [ r_s ]

where J_i = (∂f/∂y), evaluated at Y_i^ν, i = 1, 2, ..., s. We note that for a system of m differential equations, this is an sm × sm system of equations to be solved at each time step. This is usually not competitive with the multistep methods to come in Chapter 5, which require only the solution of an m × m nonlinear system at each time step. Thus, it is important to look for ways to make the iteration process less expensive.


Review: The Kronecker product, or direct product, of two matrices A and B is given by

  A ⊗ B = [ a_{11}B  a_{12}B  ...  a_{1s}B ]
          [ a_{21}B  a_{22}B  ...  a_{2s}B ]
          [   ...      ...    ...    ...   ]
          [ a_{s1}B  a_{s2}B  ...  a_{ss}B ]

There are two important properties of the Kronecker product that we will need:
1. (A ⊗ B)(C ⊗ D) = AC ⊗ BD
2. (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.

First, we simplify the Newton iteration by taking J_1 = J_2 = ... = J_s = J = (∂f/∂y), evaluated at y_{n−1}. Using the approximate Jacobian does not reduce the method's accuracy, provided that the Newton iteration can still converge. Using the Kronecker product notation, the simplified Newton method can now be written as

  (I − hA ⊗ J) δ = −r.                                            (4.27)

Note that while δ and r depend on the iteration ν, the matrix in (4.27) is the same for all iterations, and it depends only on the step counter n. So, at most one LU-decomposition is needed per time step (at most, because we may hold this matrix fixed over a few time steps).

Unfortunately, collocation tends to yield Runge-Kutta methods where the coefficient matrix A = (a_{ij})_{i,j=1}^s has few zeros. In particular, there are no zeros in the coefficient matrices of Radau collocation methods. For efficiency reasons one can therefore consider also non-collocation implicit Runge-Kutta methods for which A is a lower triangular matrix.
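As a small illustration of (4.27), the sketch below forms the sm × sm matrix with numpy's Kronecker product and factors it once; each Newton iteration then costs only a pair of triangular solves. This is a dense toy version with assumed data; for large m one would exploit sparsity or special structure instead.

  import numpy as np
  from scipy.linalg import lu_factor, lu_solve

  # 2-stage Radau coefficients and an assumed 2x2 stiff Jacobian J at y_{n-1}
  A = np.array([[5/12., -1/12.], [3/4., 1/4.]])
  J = np.array([[-100.0, 1.0], [0.0, -2.0]])
  h = 0.01

  M = np.eye(4) - h*np.kron(A, J)   # the matrix of (4.27), factored once per step
  lu = lu_factor(M)

  r = np.ones(4)                    # residual of the current Newton iterate (dummy data)
  delta = lu_solve(lu, -r)          # corrections for all s stages at once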


One such family of methods are the Diagonally Implicit Runge-Kutta (DIRK) methods. These are implicit Runge-Kutta methods for which the coefficient matrix A is lower triangular, with equal coefficients a along the diagonal.^8 Thus, the stages in the Runge-Kutta method are defined by

  Y_i − h a f(t_{n−1} + c_i h, Y_i) = y_{n−1} + h Σ_{j=1}^{i−1} a_{ij} f(t_{n−1} + c_j h, Y_j),
  i = 1, ..., s.                                                  (4.28)

Instead of having to solve an sm × sm system of linear equations we now have to solve s systems of size m × m each, all with the same matrix I − haJ. Hence the nonlinear system can be solved by block back-substitution. Only one evaluation of the local Jacobian J and one LU-decomposition of the m × m sub-matrix I − haJ need to be done on each time step.

DIRK methods have become popular for the numerical solution of time-dependent PDEs via the method of lines (recall Examples 1.3 and 1.7). Here the Jacobian J is very large and sparse, and iterative methods are often used for the linear algebra. A fully implicit Runge-Kutta method makes things cumbersome, but DIRK methods offer a Runge-Kutta alternative to the BDF methods of the next chapter in this situation. When iterative methods are used it becomes less important to insist that the diagonal elements of the coefficient matrix A be all the same; however, it turns out that this extra freedom in designing the DIRK method does not buy much.

Because so many of the coefficients of DIRK methods have been specified by construction to be zero, it is not surprising that the maximum attainable order is much less than for general implicit Runge-Kutta methods. In fact, it has been shown that the maximum order of an s-stage DIRK method cannot exceed s+1. Some such methods are the midpoint method (s = 1, p = 2) and

  γ   | γ      0
  1−γ | 1−2γ   γ
  ----+----------
      | 1/2    1/2                                    s = 2, p = 3

with γ = (3+√3)/6. This latter method satisfies R(−∞) = 1 − √3 ≈ −0.7321. Thus |R(−∞)| < 1, a marked improvement over the midpoint method which has no attenuation (R(−∞) = −1) at the stiffness limit z = Re(λ)h = −∞. If the method is to have stiff decay as well then the order is further restricted to s.

^8 Strictly speaking, DIRK methods do not require equal coefficients along the diagonal. However, the subset of DIRK methods with equal diagonal coefficients, which is called singly diagonally implicit Runge-Kutta (SDIRK), are the methods most commonly used because of the possibility of a more efficient implementation. So, we refer only to the SDIRK methods, and we call them DIRK to distinguish them from the rather different SIRK methods which arise in §4.7.4.


Examples are backward Euler (s = p = 1),

  γ | γ     0
  1 | 1−γ   γ
  --+---------
    | 1−γ   γ                                         s = 2, p = 2

where γ = (2−√2)/2, and:

  0.4358665215 | 0.4358665215   0              0
  0.7179332608 | 0.2820667392   0.4358665215   0
  1            | 1.208496649    -0.644363171   0.4358665215
  -------------+--------------------------------------------
               | 1.208496649    -0.644363171   0.4358665215     s = 3, p = 3
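The point of (4.28) is that the stages can be computed one after the other, each by a Newton iteration with the same matrix I − haJ. A Python sketch for the 2-stage, stiffly accurate method just given (γ = (2−√2)/2) follows; the frozen Jacobian and fixed iteration cap are simplifying assumptions.

  import numpy as np
  from scipy.linalg import lu_factor, lu_solve

  def sdirk2_step(f, J, t, y, h):
      # one step of the stiffly accurate 2-stage DIRK method above;
      # one factorization of I - h*g*J serves both stage solves
      g = (2.0 - np.sqrt(2.0))/2.0
      lu = lu_factor(np.eye(len(y)) - h*g*J(t, y))
      Y1 = y.copy()
      for _ in range(10):                        # stage 1: Y1 = y + h*g*f(Y1)
          res = Y1 - y - h*g*f(t + g*h, Y1)
          Y1 -= lu_solve(lu, res)
          if np.linalg.norm(res) < 1e-12:
              break
      Y2 = Y1.copy()
      for _ in range(10):                        # stage 2, using the computed Y1
          res = Y2 - y - h*(1-g)*f(t + g*h, Y1) - h*g*f(t + h, Y2)
          Y2 -= lu_solve(lu, res)
          if np.linalg.norm(res) < 1e-12:
              break
      return Y2      # stiff accuracy: b equals the last row of A, so y_n = Y2

  # stiff test problem y' = -500(y - sin t) + cos t
  f = lambda t, y: -500.0*(y - np.sin(t)) + np.cos(t)
  J = lambda t, y: np.array([[-500.0]])
  y = np.array([1.0])
  for n in range(10):
      y = sdirk2_step(f, J, n*0.1, y, 0.1)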


4.7.3 Order Reduction

When speaking of the order of accuracy of a method and making statements like

  ||e|| = max_{0≤n≤N} |e_n| = O(h^p)

regarding the error in a method of order p, we mean that as the maximum step size h → 0 (and N → ∞), the error also shrinks, so fast that h^{−p}||e|| remains bounded in the limit. In a finite world we must of course think of h as small but finite, and we normally think of it as being much smaller than the smallest scale of the ODE being approximated.

This changes in the very stiff limit. We have considered already in Chapter 3 problems like

  y' = λ(y − q(t)),   0 < t < 1                                   (4.29)

where 0 < −1/Re(λ) ≪ h ≪ 1. Here there are two small parameters, one the method's step size and the other the problem's. We consider the limit process in which −1/Re(λ) → 0 faster than h → 0. In this case our statements about the method's order may have to be revised.

Indeed, some Runge-Kutta methods suffer from a reduction in their order of convergence in the very stiff limit. The essential reason is that these methods are based on quadrature, and in the very stiff case the integration effect is very weak, at least for some solution components. For example, assume that λ is real in (4.29). Upon dividing (4.29) by −λ we see that y' is multiplied by a constant which shrinks to 0 while the right hand side is scaled to 1, so there is almost no integration in determining y(t) from q(t). Upon applying a Runge-Kutta method in this case we obtain almost an interpolation problem at the internal stages of the Runge-Kutta method. The accuracy order of this interpolation, and not the quadrature precision, then takes center stage.

The phenomenon of order reduction is important particularly for DAEs, so we leave a fuller discussion to Chapter 10. Here we note that some methods are affected by this more than others. Unfortunately, DIRK methods are only first order accurate at the very stiff limit, which suggests that they should not be used for very stiff problems. Fortunately, many time-dependent PDEs are stiff but not very stiff. Collocation at Radau points retains its full usual order in the very stiff limit. That is one reason why these methods are so popular in practice despite the more expensive linear algebra necessary for their implementation.

4.7.4 More on Implementation and SIRK Methods

The DIRK methods require a special zero-structure for the coefficient matrix A, which implies the restrictions discussed above. A clever alternative is to seek implicit RK methods where A can be transformed by a similarity transformation T into a particularly simple form,

  T^{−1} A T = S.

It can be shown that, upon transforming the variables

  η = (T^{−1} ⊗ I) δ

and multiplying equations (4.27) by T^{−1} ⊗ I, the matrix problem to be solved has the form

  (I − hS ⊗ J) η = −r̄                                             (4.30)

where r̄ = (T^{−1} ⊗ I) r. Thus, any lower triangular matrix S yields the DIRK structure in (4.30) for the transformed variables η. An efficient implementation of Radau collocation can be obtained in this way (or using another simple form of S). We can go further, and require that S be a scalar multiple of the identity matrix, i.e. we look for methods in which A has a single s-fold real eigenvalue, a. This yields the SIRK (singly implicit Runge-Kutta) methods. Here, at most one m × m matrix needs be formed and decomposed per step. A good s-stage method of this sort has order s+1, and unlike DIRK this order does not reduce in the very stiff limit.


4.8 Software, Notes and References

4.8.1 Notes

Runge [81] and Kutta [61] did not collaborate to invent the methods bearing their names: rather, Runge was first, and Kutta gave the general form. But together they were responsible for much of the development of the early explicit RK methods. Chapter II of Hairer, Norsett & Wanner [50] gives a full exposition which we do not repeat here.

As noted earlier, the order of an efficient RK method can be a challenge to verify if it is higher than 3 or 4, and our exposition in §4.3 aims to give a taste of the issues involved, rather than to cover the most general case. Excellent and elaborate expositions of the general theory for order conditions can be found in Butcher [26] or in [50]. These references also discuss order barriers, i.e., limitations on the attainable order for explicit RK methods as a function of the number of stages.

Error estimation and control is discussed in all modern texts, e.g. [62, 43, 50, 52, 85]. Shampine [85] has many examples and elaborates on points of practical concern.

For a comprehensive treatment of implicit RK methods and related collocation methods we refer to Hairer & Wanner [52]. It contains the theorems, proofs and references which we have alluded to in §4.7. The concepts of stiff accuracy and stiff decay were introduced in [73].

A large number of topics have been omitted from our presentation, despite their importance. Below we briefly mention some of these topics. Others have made their way into the exercises.

The Runge-Kutta family is not the only practical choice for obtaining high order one-step methods. Another family of methods is based on extrapolation. Early efforts in the 60's are due to W.B. Gragg, to J. Stoer and R. Bulirsch, and to H. Stetter [89]. See [50] for a full description. These methods have performed well, but overall they do not appear to outperform the families of methods discussed in this book. The extrapolation idea is discussed in Chapter 8.

A great deal of theory relating to stability and convergence for stiff problems has been developed in the past few decades. We have described some of the more accessible work. Many different stability definitions have been proposed over the years; perhaps not all of them have stood the test of time. But we mention in particular the theories of order stars (and rational approximations to the exponential) and B-convergence, for their elegance and ability to explain the behavior of numerical methods [52].


A topic of practical importance is dense output, or continuous extension; see, e.g., [50] or [85]. Normally, a discretization method yields approximate solution values only at discrete mesh points. If every point at which a solution is required is taken to be a mesh point, an inefficient procedure may result. A better, obvious idea is to use cubic Hermite interpolation (which is a cubic polynomial interpolant using two function values and two derivative values), because at mesh points we have both y_n and f(t_n, y_n), which approximates y'(t_n); see the sketch below. Dense output can often be accomplished more efficiently, and at higher order, using Runge-Kutta methods which have been specifically designed for this purpose.
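For instance, a cubic Hermite interpolant over one step can be assembled from quantities any one-step code already has. The Python sketch below uses the standard Hermite basis; this is a generic construction, not a particular code's dense output formula.

  def hermite_dense_output(t0, y0, f0, t1, y1, f1, t):
      # cubic Hermite interpolant on [t0, t1] matching y and y' ~ f at both ends
      h = t1 - t0
      s = (t - t0)/h                       # normalized abscissa in [0, 1]
      h00 = (1 + 2*s)*(1 - s)**2           # basis for y0
      h10 = s*(1 - s)**2                   # basis for f0 (scaled by h)
      h01 = s*s*(3 - 2*s)                  # basis for y1
      h11 = s*s*(s - 1)                    # basis for f1 (scaled by h)
      return h00*y0 + h*h10*f0 + h01*y1 + h*h11*f1

An event location module can then bracket a sign change of the switching function on [t_{n−1}, t_n] and iterate on this interpolant, rather than taking additional integration steps.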


One important application, in addition to plotting solutions, for which dense output is needed is event location. Recall from §3.7 that if f has discontinuities, determined according to a change of sign of some switching function, then it is desirable to place mesh points at or very near such switching points (which are points of discontinuity) in t. Hence the "event" of the discontinuity occurrence requires detection. This is done by solving a nonlinear equation iteratively, and the function values needed are supplied by the dense output mechanism.

Another important instance where dense output may be required is in solving delay differential equations. A simple prototype is

  y'(t) = f(t, y(t), y(t − τ))

where τ is the delay. Delay equations can be very complicated; see, e.g., [50]. We do not pursue this except to say that when discretizing in a straightforward way at t = t_n, say, also the value of y at t_n − τ is required. If that does not fall on a past mesh point then the dense output extension must be called upon. An IVP implementation is described in [50]. For possibilities of converting delay differential equations to standard ODEs, see Exercise 7.6.

We have commented in §4.4 on the dissipativity of explicit Runge-Kutta methods of order > 2 and the use of these methods as a smoother in PDE solvers. A reference is Jameson [57].

A lot of attention has been devoted in recent years to symplectic methods. Recall from §2.5 that a Hamiltonian system provides a symplectic map. As a corollary, considering a set of initial values each spawning a trajectory of a Hamiltonian system, the volume of this set at a later time remains constant under the flow. Next, we may ask if the property of the symplectic map is retained by a given numerical method. A numerical discretization that preserves symplecticity for a constant step size is called a symplectic method. Such methods are particularly desirable for applications involving long time integration, i.e. Nh = b ≫ 1, where h is the step size and N is the number of steps taken. Examples appear in molecular dynamics and celestial mechanics simulations. For much more on this topic, see Sanz-Serna & Calvo [82] and [52]. See also Exercises 4.10, 4.11 and 4.19. As it turns out, there are difficulties in constructing general, efficient methods of this sort, and varying the step size is also a challenge, yet there are instances where symplectic methods impressively outperform standard methods.

Viewing the discretization of a nonlinear ODE as a continuous dynamical system yields a discrete dynamical system. One may wonder if the dynamical properties of the two systems are qualitatively similar, especially if the discretization step size h is not very small or the number of steps taken N is very large. We have already indicated above that this is not necessarily so, e.g. when a non-symplectic method is applied to a Hamiltonian system. Another instance is where the discrete dynamical system has more solutions than its continuous counterpart. Viewed as functions of the step size h there are principal solutions which tend towards the corresponding genuine solutions, and in addition there may be spurious solutions. The latter would tend to 0 as h → 0, but they may be confusingly present for a finite h. For more on this see, e.g., Stuart & Humphries [93].

Many efforts have been invested since the 1980's in the development of Runge-Kutta methods suitable for the solution of large ODE systems on parallel computer architectures. The book by Burrage [23] covers such methods well. Basic ideas include the design of Runge-Kutta methods where different stages are sufficiently independent from one another that they can be evaluated in parallel. This is parallelism in time t. Another direction exploits parallelism also in the large system being integrated, and often leads to more specialized methods. We mention the closely related multirate methods and waveform relaxation methods. In the former, different components of the system that vary more slowly are discretized over larger elements (time steps) than more rapidly varying components, allowing for parallel evaluations of different parts of the system at a given stage. (The system being integrated must be such that a block decoupling of this sort is possible; this is often the case in VLSI circuit simulation, for example.) In the latter, a global iterative method in time such as

  (y^{ν+1})' − M y^{ν+1} = f(t, y^ν) − M y^ν

is considered, where y^ν(t) is known at the start of iteration ν, ν = 0, 1, .... This is really considered not for all time but over a time window (such as the large step size in a multirate method), and the matrix M is now chosen to allow parallelism in the evaluation of the iteration.

4.8.2 Software

Here we briefly mention some (certainly not all) general-purpose codes and the methods on which they are based. All of these codes provide error control and step-size selection. Some of them are available through Netlib.

For nonstiff problems:

• Many Runge-Kutta codes have been based on the Fehlberg 4(5) embedded pair [40]. An early influential code was rkf45 by Shampine & Watts [87].


• The code dopri5 presented in the monumental book [50] is based on the Dormand-Prince 4(5) formulae [39], which use local extrapolation. The code ode45 used in Matlab 5 is also based on these formulae, a switch from earlier Matlab versions where it was based on the Fehlberg pair. This reflects the current accumulated experience which suggests that the Dormand-Prince pair performs better in practice.

• Other codes based on embedded Runge-Kutta pairs are dverk by Hull, Enright & Jackson [56], which uses a pair of formulae by Verner [94], and rksuite by Brankin, Gladwell & Shampine [18], which implements three Runge-Kutta pairs, 2(3), 4(5) and 7(8), and uses local extrapolation. The latter has an option for global error estimation and it also automatically checks for stiffness.

For stiff problems:

• The code radau5 [52] uses the Radau 3-stage formula of order 5 with an implementation of the linear algebra along the lines described in §4.7.4.

• The code stride by Burrage, Butcher & Chipman [25] uses a family of SIRK methods.

The codes dopri5, ode45, rksuite, radau5 and stride all have a dense output option. The code ode45 also has a built-in event location module.

4.9 Exercises

1. Show that a general Runge-Kutta method (4.8) can be written in the form (4.10). What is the relationship between K_i of (4.10a) and Y_i of (4.8a)?

2. Show that the explicit Runge-Kutta methods described in §4.1 can all be written in the form (4.11), with ψ Lipschitz continuous in y if f is.

3. Prove: the one-step method (4.11) is 0-stable if ψ satisfies a Lipschitz condition in y.

4. Write a computer program that will find an upper bound on the order of a given Runge-Kutta method by checking the order conditions (4.13),


or alternatively, only (4.14) and (4.15) (and (4.9)). For a given number of stages s your program should check these conditions for k = 1, 2, ... until one is violated (this will happen before k reaches 2s+1). Note: do not use any matrix-matrix multiplications!

(a) Apply your program to all the embedded methods given in §4.5 (both methods of each pair).
(b) Apply your program to the Lobatto methods given in §4.7.
(c) What does your program give for the counter-example of Example 4.4 (§4.3)? What is the actual order of that method?

5. For a 4-stage explicit Runge-Kutta method of order 4 show:

(a)
  Σ_{i=1}^s b_i a_{ij} = b_j (1 − c_j),   j = 1, ..., s.

(This is a useful additional design requirement in general.) [The proof is a bit tricky and is given in [50].]

(b) Using this result show that we must have c_4 = 1 in this case.

6. It has been argued that displaying absolute stability regions as in Fig. 4.4 is misleading: since a step of an s-stage explicit method costs essentially s times a forward Euler step, its stability region should be compared with what forward Euler can do in s steps. Thus, the scaled stability region of an s-stage explicit method is the stability region shrunk by a factor s.

For all Runge-Kutta methods with p = s, s = 1, 2, 3, 4, plot the scaled stability regions. Observe that forward Euler looks mighty good: no other method's scaled stability region fully contains the forward Euler circle [52].

7. An RK method with a nonsingular A satisfying

  a_{sj} = b_j,   j = 1, ..., s                                   (4.31)

is called stiffly accurate [73].

(a) Show that a stiffly accurate method has stiff decay.
(b) Show that a collocation method is stiffly accurate iff c_s = 1 and c_1 > 0.
(c) Not all RK methods which have stiff decay satisfy (4.31). Show that stiff decay is obtained also if A is nonsingular and its first column is b_1 𝟙.


8. For a given ODE

  y' = f(y)

consider the θ-method

  y_n = y_{n−1} + h_n [θ f(y_n) + (1−θ) f(y_{n−1})]

for some value θ, 0 ≤ θ ≤ 1.

(a) Which methods are obtained for the values (i) θ = 0, (ii) θ = 1, and (iii) θ = 1/2?
(b) Find a range of θ-values, i.e. an interval [α, β], such that the method is A-stable for any α ≤ θ ≤ β.
(c) For what values of θ does the method have stiff decay?
(d) For a given ε, 0 ≤ ε < 1, let us call a method ε-damping if

  |y_n| ≤ ε |y_{n−1}|

for the test equation y' = λy as λh → −∞. (Thus, if y_0 = 1 then for any tolerance TOL > 0, |y_n| ≤ TOL after n steps when n exceeds log TOL / log ε.)

Find the range of θ-values such that the θ-method is ε-damping.

(e) Write the θ-method as a general Runge-Kutta method, i.e. specify A, b and c in the tableau

  c | A
  --+----
    | b^T

(f) What is the order of the θ-method? [If you managed to answer the previous question then try to answer this one without any Taylor expansions.]

9. The solution of the problem y' = f(y), y(0) = c, where

  f(y) = (−y_2, y_1)^T,   c = (1, 0)^T,

satisfies

  y_1² + y_2² = 1,

i.e. it is a circle of radius 1 centered at the origin. Integrating this ODE numerically, though, does not necessarily satisfy this invariant, and the obtained curve in the y_1-y_2 plane does not necessarily close. Show that when using collocation based on Gaussian points, the approximate solution does satisfy the invariant, i.e. the obtained approximate solution stays on the circle. [See also Exercise 4.15, where a hint is provided.]


10. In molecular dynamics simulations using classical mechanics modeling, one is often faced with a large nonlinear ODE system of the form

  M q'' = f(q),   where f(q) = −∇U(q).                            (4.32)

Here q are generalized positions of atoms, M is a constant, diagonal, positive mass matrix, and U(q) is a scalar potential function. Also, ∇U(q) = (∂U/∂q_1, ..., ∂U/∂q_m)^T. A small (and somewhat nasty) instance of this is given by the Morse potential [83] where q = q(t) is scalar, U(q) = D(1 − e^{−S(q−q_0)})², and we use the constants D = 90.5 · 0.4814e−3, S = 1.814, q_0 = 1.41 and M = 0.9953.

(a) Defining the momenta p = M q', the corresponding first order ODE system for q and p is given by

  M q' = p                                                        (4.33a)
  p' = f(q).                                                      (4.33b)

Show that the Hamiltonian

  H(q, p) = p^T M^{−1} p / 2 + U(q)

is constant for all t > 0.

(b) Use a library nonstiff Runge-Kutta code based on a 4-5 embedded pair to integrate this problem for the Morse potential on the interval 0 ≤ t ≤ 2000, starting from q(0) = 1.4155, p(0) = (1.545/48.888) M. Using a tolerance TOL = 1.e−4 the code should require a little more than 1000 time steps. Plot the obtained values for H(q(t), p(t)) − H(q(0), p(0)). Describe your observations.

11. The system (4.33) is in partitioned form. It is also a Hamiltonian system with a separable Hamiltonian, i.e., the ODE for q depends only on p and the ODE for p depends only on q. This can be used to design special discretizations. Consider a constant step size h.

(a) The symplectic Euler method applies backward Euler to (4.33a) and forward Euler to (4.33b). Show that the resulting method is explicit and first order accurate.

(b) The leapfrog, or Verlet method can be viewed as a staggered midpoint discretization:

  M(q_{n+1/2} − q_{n−1/2}) = h p_n
  p_n − p_{n−1} = h f(q_{n−1/2})


i.e., the mesh on which the q-approximations "live" is staggered by half a step compared to the p-mesh. The method can be kick-started by

  q_{1/2} = q_0 + (h/2) M^{−1} p_0.

To evaluate q_n at any mesh point, the expression

  q_n = (q_{n−1/2} + q_{n+1/2}) / 2

can be used. Show that this method is explicit and 2nd order accurate.

(c) Integrate the Morse problem defined in the previous exercise using 1000 uniform steps h. Apply three methods: forward Euler, symplectic Euler and leapfrog. Try the values h = 2, h = 2.3684 and h = 2.3685, and plot in each case the discrepancy in the Hamiltonian (which equals 0 for the exact solution). The plot for h = 2.3684 is given in Fig. 4.7.


Figure 4.7 [plot of H(t) − H(0) vs. t]: Energy error for the Morse potential using leapfrog with h = 2.3684.

What are your observations? [The surprising increase in leapfrog accuracy from h = 2.3684 to h = 2.3685 relates to a phenomenon called resonance instability.]

[Both the symplectic Euler and the leapfrog method are symplectic: like the exact ODE they conserve certain volume projections for Hamiltonian systems (§2.5). We refer to [82, 50, 93] for much more on symplectic methods.]


12. The following classical example from astronomy gives a strong motivation to integrate initial value ODEs with error control.

Consider two bodies of masses μ = 0.012277471 and μ̂ = 1 − μ (earth and sun) in a planar motion, and a third body of negligible mass (moon) moving in the same plane. The motion is governed by the equations

  u_1'' = u_1 + 2u_2' − μ̂ (u_1 + μ)/D_1 − μ (u_1 − μ̂)/D_2
  u_2'' = u_2 − 2u_1' − μ̂ u_2/D_1 − μ u_2/D_2
  D_1 = ((u_1 + μ)² + u_2²)^{3/2}
  D_2 = ((u_1 − μ̂)² + u_2²)^{3/2}.

Starting with the initial conditions

  u_1(0) = 0.994,  u_2(0) = 0,  u_1'(0) = 0,
  u_2'(0) = −2.00158510637908252240537862224,

the solution is periodic with period < 17.1. Note that D_1 = 0 at (−μ, 0) and D_2 = 0 at (μ̂, 0), so we need to be careful when the orbit passes near these singularity points.

The orbit is depicted in Fig. 4.8. It was obtained using the Fehlberg embedded pair with a local error tolerance 1.e−6. This necessitated 204 time steps.


Figure 4.8 [plot of u_2 vs. u_1]: Astronomical orbit using the Runge-Kutta Fehlberg method.


Using the classical Runge-Kutta method of order 4, integrate this problem on [0, 17.1] with a uniform step size, using 100, 1000, 10000 and 20000 steps. Plot the orbit for each case. How many uniform steps are needed before the orbit appears to be qualitatively correct?

13. For an s-stage Runge-Kutta method (4.8), define the s × s matrix M by

  m_{ij} = b_i a_{ij} + b_j a_{ji} − b_i b_j.

The method is called algebraically stable [24] if b ≥ 0 (componentwise) and M is nonnegative definite. Show:

(a) Radau collocation is algebraically stable.
(b) Gauss collocation is algebraically stable. In fact, M = 0 in this case.
(c) The trapezoidal method, hence Lobatto collocation, is not algebraically stable.
(d) Algebraic stability is equivalent to AN-stability, i.e. for the nonautonomous test equation

  y' = λ(t) y

one gets |y_n| ≤ |y_{n−1}| whenever Re λ < 0, for all t.

[This exercise is difficult. The basic idea is to write the expression for |y_n|² and substitute y_{n−1} in terms of Y_i in it.]

14. A Runge-Kutta method (4.8) is symmetric if it remains invariant under a change of direction of integration. Thus, letting z_n ← y_{n−1}, z_{n−1} ← y_n, Z_j ← Y_{s+1−j} and h ← −h, the same method (4.8) is obtained for z_n.

(a) Let E be the s × s permutation matrix with ones on the anti-diagonal,

  E = [ 0  ...  0  1 ]
      [ 0  ...  1  0 ]
      [      ...     ]
      [ 1  0  ...  0 ]

Show that the Runge-Kutta method (4.8) is symmetric if

  c + Ec = 𝟙,   b = Eb,
  EAE + A = 𝟙 b^T.

(These conditions are essentially necessary for symmetry as well.)


(b) Show that a symmetric Runge-Kutta method is algebraically stable if and only if M = 0.

15. The problem considered in Exercise 4.9 is a simple instance of a system with an invariant [2]. More generally, an ODE system y' = f(y) may have an invariant defined by algebraic equations

  h(y) = 0                                                        (4.34)

meaning that for the exact solution y(t) of the ODE we have h(y(t)) = 0, provided the initial values satisfy h(y(0)) = 0. The question is, which numerical discretization of the ODE (if any) satisfies the invariant precisely, i.e.,

  h(y_n) = h(y_{n−1}),   n = 1, 2, ..., N.

Denote the Jacobian H = h_y and assume it has full row rank for all relevant y. We say that we have an integral invariant if

  H f = 0,   for all y.

(See, e.g., [93].)

(a) Show that any Runge-Kutta method preserves linear invariants.
(b) Show that collocation at Gaussian points, and only at Gaussian points, preserves quadratic integral invariants. [Hint: Write h(y_n) = h(y_{n−1}) + ∫_{t_{n−1}}^{t_n} h' and use your knowledge of quadrature.] (More generally, for Runge-Kutta methods the needed condition is M = 0.)
(c) The non-dimensionalized equations in Cartesian coordinates for the simple pendulum can be written as

  q̇_1 = v_1,   v̇_1 = −q_1 λ,   q̇_2 = v_2,   v̇_2 = −q_2 λ − 1,
  q_1² + q_2² = 1.

Differentiating the constraint twice and eliminating λ yields the ODE

  q̇_1 = v_1,   v̇_1 = −(q_1/(q_1² + q_2²)) (v_1² + v_2² − q_2)
  q̇_2 = v_2,   v̇_2 = −(q_2/(q_1² + q_2²)) (v_1² + v_2² − q_2) − 1

with the invariants

  q_1² + q_2² = 1


  q_1 v_1 + q_2 v_2 = 0.

Show that the midpoint method preserves the second of these invariants but not the first. [You may show this by a numerical demonstration.]

16. This exercise builds on the previous one.

(a) Consider the matrix differential system

  U̇ = A(t, U) U,   0 < t < b                                      (4.35)
  U(0) = I

where A and U are m × m and A is skew-symmetric for all U, t:

  A^T = −A.

It can be shown that the solution U(t) is then an orthogonal matrix for each t.

Show that collocation at Gaussian points (including the midpoint method) preserves this orthogonality. We note that collocation at Lobatto points (including the trapezoidal method) does not preserve orthogonality.

(b) A number of interesting applications lead to problems of isospectral flow [27], where one seeks a matrix function satisfying

  L̇ = AL − LA                                                     (4.36)
  L(0) = L_0

for a given initial value matrix L_0, where A = A(L) is again skew-symmetric. The eigenvalues of L(t) are then independent of t.

Verify that

  L = U L_0 U^T,

where U(t) is the orthogonal matrix function satisfying

  U̇ = AU,   U(0) = I,

and propose a discretization method that preserves the eigenvalues of L.

17. This exercise continues the previous two.

Collocation at Gaussian points is an implicit, expensive method. An alternative idea is to use an explicit Runge-Kutta method, orthogonalizing U at the end of each time step [37]. Consider a method of the form (4.11) for which we consider the matrix U(t) written as an m²-length


vector of unknowns. Since the result of this step is not necessarily an orthogonal matrix, a step of this method starting with an orthogonal U_{n−1} approximating U(t_{n−1}) consists of two phases:

  Ũ_n = U_{n−1} + h ψ(t_{n−1}, U_{n−1}, h)
  Ũ_n = U_n R_n

where U_n R_n is a QR-decomposition of Ũ_n. The orthogonal matrix U_n is then the projection of the result of the Runge-Kutta step onto the invariant manifold, and it is taken as the end result of the step.

Write a program which carries out this algorithm using the classical 4th order Runge-Kutta method. (A library routine from LINPACK or Matlab can be used for the decomposition.) Try your program on the problem

  U' = ω [ 0   1 ] U
         [ −1  0 ]

whose exact solution is the rotation matrix

  U(t) = [ cos ωt   sin ωt ]
         [ −sin ωt  cos ωt ]

for various values of ω, h and b. What are your conclusions?

[Note that the QR-decomposition of a matrix is only determined up to the signs of the elements on the main diagonal of the upper triangular matrix R. You will have to ensure that U_n is the orthogonal matrix which is close to Ũ_n.]

18. If you are a Matlab fan, like we are, then this exercise is for you. Matlab (version 5) offers the user a simple ODE integrator, called ode45, which is based on the Dormand-Prince embedded pair. We used this facility to generate the plot of Fig. 4.8 in less than one person-hour, in fact. In the interest of keeping things simple, the designers of Matlab kept the interface for this routine on an elementary level, and the user simply obtains 'the solution'.

(a) Use Matlab to solve the problem of Example 4.6. Plot the obtained solution. Does it look like the exact one, y(t) = sin t? Explain your observations.

(b) It can be argued that the solution that Matlab produces for this example does not look plausible (or "physical"), i.e. we could


guess it's wrong even without knowing the exact one. Can you construct an example that will make Matlab produce a plausible-looking solution which nonetheless is in 100% error? [This question is somewhat more difficult.]

19. The modified Kepler problem [82, 51] is a Hamiltonian system, i.e.,

  q' = H_p,   p' = −H_q,

with the Hamiltonian

  H(q, p) = (p_1² + p_2²)/2 − 1/r − ε/(2r³)

where r = √(q_1² + q_2²) and we take ε = 0.01. Clearly, H' = H_q q' + H_p p' = 0, so H(q(t), p(t)) = H(q(0), p(0)) for all t. We consider simulating this system over a long time interval with a relatively coarse, uniform step size h, i.e. b/h ≫ 1. The mere accumulation of local errors may then become a problem. For instance, using the explicit midpoint method with h = 0.1 and b = 500, the approximate solution for r becomes larger than the exact one by two orders of magnitude.

But some methods perform better than would normally be expected. In Fig. 4.9 we plot q_1 vs. q_2 ("phase plane portrait") for (a) the implicit midpoint method using h = 0.1, (b) the classical explicit Runge-Kutta method of order 4 using h = 0.1, and (c) the exact solution (or rather, a sufficiently close approximation to it). The initial conditions are

  q_1(0) = 1 − α,  q_2(0) = 0,  p_1(0) = 0,  p_2(0) = √((1+α)/(1−α))

with α = 0.6. Clearly, the midpoint solution with this coarse step size outperforms not only the explicit midpoint method but also the 4th order method. Even though the pointwise error reaches close to 100% when t is close to b, the midpoint solution lies on a torus, like the exact solution, whereas the RK4 picture is noisy. Thus, we see yet again that truncation error is not everything, even in some nonstiff situations, and the theory in this case must include other aspects.

Integrate these equations using the two methods of Fig. 4.9 with a constant step size, h = 0.1 and h = 0.01 (four runs in total), monitoring the maximum deviation |H(q(t), p(t)) − H(q(0), p(0))|. (This is a simple error indicator which typically underestimates the error in the solution components, and is of interest in its own right.) What are your conclusions?


Figure 4.9 [plots of q_2 vs. q_1]: Modified Kepler problem: approximate and exact solutions. (a) Implicit midpoint, 5000 uniform steps; (b) RK4, 5000 uniform steps; (c) exact solution for T = 500.


Chapter 5

Linear Multistep Methods

In this chapter we consider another group of methods extending the basic methods of Chapter 3 to higher order; see Fig. 4.1. The methods considered here use information from previous integration steps to construct higher order approximations in a simple fashion. Compared to the Runge-Kutta methods of the previous chapter the methods here typically require fewer function evaluations per step, and they allow a simpler, more streamlined method design, at least from the point of view of order and error estimation. On the other hand, the associated overhead is higher as well, e.g. when wanting to change the step size, and some of the flexibility of one-step methods is lost.

For our prototype ODE system

  y' = f(t, y),   t ≥ 0,

it is customary to denote

  f_l = f(t_l, y_l),

where y_l is the approximate solution at t = t_l. The general form of a k-step linear multistep method is given by

  Σ_{j=0}^k α_j y_{n−j} = h Σ_{j=0}^k β_j f_{n−j}

where α_j, β_j are the method's coefficients. We will assume that α_0 ≠ 0 and that |α_k| + |β_k| ≠ 0. To eliminate arbitrary scaling, we set α_0 = 1. The linear multistep method is explicit if β_0 = 0 and implicit otherwise. Note that the past k integration steps are assumed to be equally spaced.

Throughout most of this chapter we again consider a scalar ODE

  y' = f(t, y)

to simplify the notation. The extension to ODE systems is straightforward unless otherwise noted. We also assume, as before, that f has as many bounded derivatives as needed.


The general form of the method is rewritten for the scalar ODE for later reference,

  Σ_{j=0}^k α_j y_{n−j} = h Σ_{j=0}^k β_j f_{n−j}.                (5.1)

The method is called linear because, unlike general Runge-Kutta, the expression in (5.1) is linear in f. Make sure you understand that this does not mean that f is a linear function of y or t: it is the method which is linear, not the ODE problem to be solved. A consequence of this linearity is that the local truncation error, to be defined later, always has the simple expression

  d_n = C_{p+1} h^p y^{(p+1)}(t_n) + O(h^{p+1})                   (5.2)

where p is the method's order and C_{p+1} is a computable constant. We will show this in §5.2.

The most popular linear multistep methods are based on polynomial interpolation, and even methods which are not based on interpolation use interpolation for such purposes as changing the step size. So be sure that you're up on polynomial interpolation in Newton's form.
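Anticipating the order conditions derived in §5.2, the order p and the constant C_{p+1} in (5.2) can be computed mechanically from the coefficients: with the convention of (5.1), C_0 = Σ_j α_j and C_q = (1/q!) Σ_j (−j)^q α_j − (1/(q−1)!) Σ_j (−j)^{q−1} β_j for q ≥ 1, and the order is the largest p with C_0 = ... = C_p = 0. A small Python sketch under this convention:

  from math import factorial

  def lmm_order(alpha, beta, qmax=10, tol=1e-12):
      # returns (order p, error constant C_{p+1}) for the method (5.1)
      k = len(alpha) - 1
      for q in range(qmax + 1):
          if q == 0:
              C = sum(alpha)
          else:
              C = (sum((-j)**q * alpha[j] for j in range(k + 1))/factorial(q)
                   - sum((-j)**(q - 1) * beta[j] for j in range(k + 1))/factorial(q - 1))
          if abs(C) > tol:
              return q - 1, C
      return qmax, 0.0

  # 2-step Adams-Bashforth: order 2, C_3 = 5/12 (cf. Table 5.1 below)
  print(lmm_order([1.0, -1.0, 0.0], [0.0, 1.5, -0.5]))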


Review: The interpolating polynomial and divided differences.

Let f(t) be a function to be interpolated at the k distinct points t_1, t_2, ..., t_k by the unique polynomial φ(t) of degree < k which satisfies the relations

  φ(t_l) = f(t_l),   l = 1, 2, ..., k.

The polynomial can be written down explicitly in Lagrangian form, as we did in Chapter 4. Here, though, it is more convenient to write φ(t) in Newton's form:

  φ(t) = f[t_1] + f[t_1, t_2](t − t_1) + ... + f[t_1, t_2, ..., t_k](t − t_1)(t − t_2)...(t − t_{k−1})

where the divided differences are defined recursively by f[t_l] = f(t_l),

  f[t_l, ..., t_{l+i}] = (f[t_{l+1}, ..., t_{l+i}] − f[t_l, ..., t_{l+i−1}]) / (t_{l+i} − t_l).

The interpolation error at any point t is then

  f(t) − φ(t) = f[t_1, ..., t_k, t] Π_{i=1}^k (t − t_i).

If the points t_i and t are all in an interval of size O(h) and f has k bounded derivatives, then the interpolation error is O(h^k). If h is small then k! f[t_1, ..., t_k, t] ≈ f^{(k)}(t). Finally, for the case where the points t_l are equally spaced, the expression for divided differences obviously simplifies. We define for future reference the backward differences

  ∇^0 f_l = f_l,
  ∇^i f_l = ∇^{i−1} f_l − ∇^{i−1} f_{l−1}.                        (5.3)

An important property of these differences is that they approximate the derivatives: ∇^k f ≈ h^k f^{(k)}.
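The divided difference table and the Newton form are simple to compute; a Python sketch (nodes and values as plain lists):

  def divided_differences(t, f):
      # leading divided differences f[t1], f[t1,t2], ..., f[t1,...,tk]
      n = len(t)
      d = list(f)
      coeffs = [d[0]]
      for i in range(1, n):
          d = [(d[j+1] - d[j])/(t[j+i] - t[j]) for j in range(n - i)]
          coeffs.append(d[0])
      return coeffs

  def newton_eval(coeffs, t, x):
      # evaluate the Newton form by nested multiplication (Horner-like)
      p = coeffs[-1]
      for c, tj in zip(coeffs[-2::-1], t[len(coeffs)-2::-1]):
          p = p*(x - tj) + c
      return p

  # interpolating t -> t^2 at 0, 1, 2 recovers the quadratic exactly
  cf = divided_differences([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
  print(newton_eval(cf, [0.0, 1.0, 2.0], 1.5))   # prints 2.25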


5.1 The Most Popular Methods

Linear multistep methods typically come in families. The most popular for nonstiff problems is the Adams family and the most popular for stiff problems is the BDF family. In this section we derive these methods via the interpolating polynomial. In the next section we give an alternative derivation which is applicable for general multistep methods. We note that although the derived formulae in this section are for a constant step size h, the derivations themselves also suggest how to obtain formulae for a variable step size.

5.1.1 Adams Methods

Starting with the differential equation

  y' = f(t, y),

we can integrate both sides to obtain

  y(t_n) = y(t_{n−1}) + ∫_{t_{n−1}}^{t_n} f(t, y(t)) dt.

For Adams methods, the integrand f(t, y) is approximated by an interpolating polynomial through previously computed values of f(t_l, y_l). In the general form (5.1) we therefore set, for all Adams methods, α_0 = 1, α_1 = −1 and α_j = 0, j > 1.

The k-step explicit Adams method is obtained by interpolating f through the previous points t = t_{n−1}, t_{n−2}, ..., t_{n−k}; see Fig. 5.1.


Figure 5.1 [sketch of y'(t) interpolated through f_{n−k}, ..., f_{n−2}, f_{n−1}]: Adams-Bashforth methods

The explicit Adams methods^1 are the most popular among explicit multistep methods. A simple exercise in polynomial interpolation yields the formulae

  y_n = y_{n−1} + h Σ_{j=1}^k β_j f_{n−j},

^1 Called Adams-Bashforth after J. C. Adams, who invented them to solve a problem of capillary action in collaboration with F. Bashforth, published in 1883.


where^2

  β_j = (−1)^{j−1} Σ_{i=j−1}^{k−1} C(i, j−1) γ_i,
  γ_i = (−1)^i ∫_0^1 C(−s, i) ds.

^2 Recall the binomial coefficients

  C(s, i) = s(s−1)...(s−i+1)/i!,   C(s, 0) = 1.

This formula is a k-step method because it uses information at the k points t_{n−1}, t_{n−2}, ..., t_{n−k}. It is sometimes also called a (k+1)-value method, because the total information per step, which determines storage requirements, involves also y_{n−1}.

The local truncation error turns out to be C_{p+1} h^p y^{(p+1)}(t_n) + O(h^{p+1}), where p = k. Note that there is only one function evaluation per step.

Example 5.1  The first order Adams-Bashforth method is the forward Euler method. The second order Adams-Bashforth method is given by the above formula with k = 2 and γ_0 = 1, γ_1 = 1/2. This yields

  y_n = y_{n−1} + h (f_{n−1} + (1/2) ∇f_{n−1})

or, equivalently,

  y_n = y_{n−1} + h ((3/2) f_{n−1} − (1/2) f_{n−2}).  □

Table 5.1 gives the coefficients of the Adams-Bashforth methods for k up to 6.

The Adams-Bashforth methods are explicit methods with very small regions of absolute stability. This has inspired the implicit versions of the Adams methods, also called Adams-Moulton.

The k-step implicit Adams method is derived similarly to the explicit method. The difference is that for this method, the interpolating polynomial is of degree ≤ k and it interpolates f at the unknown value t_n as well; see Fig. 5.2. This yields an implicit multistep method

  y_n = y_{n−1} + h Σ_{j=0}^k β_j f_{n−j}.


  p  k               j = 1     2      3      4      5     6     C_{p+1}
  1  1  β_j              1                                      1/2
  2  2  2β_j             3     -1                               5/12
  3  3  12β_j           23    -16      5                        3/8
  4  4  24β_j           55    -59     37     -9                 251/720
  5  5  720β_j        1901  -2774   2616  -1274    251          95/288
  6  6  1440β_j       4277  -7923   9982  -7298   2877   -475   19087/60480

  Table 5.1: Coefficients of Adams-Bashforth methods up to order 6
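A complete integration with the second order method of Example 5.1 takes only a few lines. The Python sketch below generates the one extra starting value with a second order Runge-Kutta (explicit midpoint) step, in the spirit of §5.1.3 below.

  import numpy as np

  def ab2(f, t0, y0, h, N):
      # y_n = y_{n-1} + h(3/2 f_{n-1} - 1/2 f_{n-2}); explicit midpoint start
      t = t0 + h*np.arange(N + 1)
      y = np.zeros(N + 1)
      y[0] = y0
      y[1] = y0 + h*f(t0 + h/2, y0 + (h/2)*f(t0, y0))
      f_old, f_new = f(t[0], y[0]), f(t[1], y[1])
      for n in range(2, N + 1):
          y[n] = y[n-1] + h*(1.5*f_new - 0.5*f_old)
          f_old, f_new = f_new, f(t[n], y[n])
      return t, y

  # y' = -y, y(0) = 1: the global error at t = 5 behaves like O(h^2)
  t, y = ab2(lambda t, y: -y, 0.0, 1.0, 0.1, 50)
  print(abs(y[-1] - np.exp(-5.0)))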


Figure 5.2 [sketch of y'(t) interpolated through f_{n−k}, ..., f_{n−2}, f_{n−1} and f(t_n, y_n)]: Adams-Moulton methods

The order of the k-step Adams-Moulton method is p = k+1 (that p ≥ k+1 follows immediately from the fact that k+1 points are used in the underlying polynomial interpolation). An exception is the case k = 1 where f_{n−1} is not used, yielding p = k = 1. A straightforward interpolation yields the coefficients summarized in Table 5.2.

Example 5.2  Here are some examples of Adams-Moulton methods:

• k = 1 with β_1 = 0 gives the backward Euler method;
• k = 1 with β_1 ≠ 0 gives the implicit trapezoidal method;
• k = 2 gives y_n = y_{n−1} + (h/12)[5 f_n + 8 f_{n−1} − f_{n−2}].  □

The Adams-Moulton methods have smaller error constants than the Adams-Bashforth methods of the same order, and use one step less for the same order. They have much larger stability regions than the Adams-Bashforth methods. But they are implicit. Adams-Moulton methods are often used together with Adams-Bashforth methods for the solution of nonstiff ODEs. This type of implementation is called predictor-corrector and will be described later, in §5.4.2.


    p  k              j=0    j=1    j=2    j=3    j=4    j=5    C_{p+1}
    1  1      β_j:      1                                       -1/2
    2  1     2β_j:      1      1                                -1/12
    3  2    12β_j:      5      8     -1                         -1/24
    4  3    24β_j:      9     19     -5      1                  -19/720
    5  4   720β_j:    251    646   -264    106    -19           -3/160
    6  5  1440β_j:    475   1427   -798    482   -173     27    -863/60480

    Table 5.2: Coefficients of Adams-Moulton methods up to order 6

They have much larger stability regions than the Adams-Bashforth methods. But they are implicit. Adams-Moulton methods are often used together with Adams-Bashforth methods for the solution of nonstiff ODEs. This type of implementation is called predictor-corrector and will be described later, in §5.4.2.

5.1.2 Backward Differentiation Formulae

The most popular multistep methods for stiff problems are the backward differentiation formulae (BDF). Their distinguishing feature is that f(t, y) is evaluated only at the right end of the current step, (t_n, y_n). A motivation behind this is to obtain formulae with the stiff decay property (recall §3.5). Applying the general linear multistep method (5.1) to the ODE y' = λ(y - g(t)) and considering the limit h Re(λ) → -∞, we have

    ∑_{j=0}^{k} β_j (y_{n-j} - g(t_{n-j})) → 0.

To obtain y_n - g(t_n) → 0 for an arbitrary function g(t) we must therefore set β_0 ≠ 0 and β_j = 0, j > 0. This leaves treating y' in the differential equation y'(t) = f(t, y(t)). In contrast to the Adams methods, which were derived by integrating the polynomial which interpolates past values of f, the BDF methods are derived by differentiating the polynomial which interpolates past values of y, and setting the derivative at t_n to f(t_n, y_n). This yields the k-step BDF formula, which has order p = k,

    ∑_{i=1}^{k} (1/i) ∇^i y_n = h f(t_n, y_n).

This can be written in scaled form where α_0 = 1,

    ∑_{i=0}^{k} α_i y_{n-i} = h β_0 f(t_n, y_n).


The BDF formulae are implicit and are usually implemented together with a modified Newton method to solve the nonlinear system at each time step. The first 6 members of this family are listed in Table 5.3. The first, one-step method, is again backward Euler.

    p  k   β_0       α_0   α_1       α_2       α_3        α_4       α_5       α_6
    1  1   1          1    -1
    2  2   2/3        1    -4/3      1/3
    3  3   6/11       1    -18/11    9/11      -2/11
    4  4   12/25      1    -48/25    36/25     -16/25     3/25
    5  5   60/137     1    -300/137  300/137   -200/137   75/137    -12/137
    6  6   60/147     1    -360/147  450/147   -400/147   225/147   -72/147   10/147

    Table 5.3: Coefficients of BDF methods up to order 6

5.1.3 Initial Values for Multistep Methods

For one-step methods we set y_0 = y(0), the given initial value. Nothing else is needed to start up the iteration in time for n = 1, 2, ....

With a k-step method, in contrast, the method is applied for n = k, k+1, .... Thus, k initial values y_0, y_1, ..., y_{k-1} are needed to start it up. The additional initial values y_1, ..., y_{k-1} must be O(h^p) accurate for a method of order p, if the full convergence order is to be realized (§5.2.3). If error control is used, these additional starting values must be accurate to a given error tolerance.

To obtain these additional initial values, an appropriate Runge-Kutta method can be used. Another approach, utilized in all modern multistep packages, is to recursively use a (k-1)-step method. As we have seen, linear multistep methods tend to come in families, so a general-purpose code can be written which implements the first methods of such a family, for k = 1, 2, ..., p, say. Then the code can at a starting (or a restarting) point t gradually and adaptively increase the method's number of steps (and correspondingly its order).

Example 5.3 We compute the solution of the simple Example 3.1,

    y' = -5ty² + 5/t - 1/t²,    y(1) = 1.


The exact solution is y(t) = 1/t. We record results parallel to Example 4.1, i.e. for the same constant step sizes, measuring absolute errors and convergence rates at t = 25. Results for some Adams-Bashforth, Adams-Moulton and BDF methods are displayed in Tables 5.4, 5.5 and 5.6, respectively. The initial values for the k-step method are obtained by the values of the (k-1)-step method of the same family. The symbol * denotes an "infinite" error, which occurs when the absolute stability restriction is strongly violated.

    step h   k=1 error   rate    k=2 error   rate    k=4 error   rate
    0.2      .40e-2              *                   *
    0.1      .65e-6      12.59   .32e-2              *
    0.05     .32e-6      1.00    .16e-8      20.9    .16e-1
    0.02     .13e-6      1.00    .26e-9      2.00    .35e-14     31.8
    0.01     .65e-7      1.00    .65e-10     2.00    .16e-14     1.17
    0.005    .32e-7      1.00    .16e-10     2.00    .11e-14     0.54
    0.002    .13e-7      1.00    .26e-11     2.00    .47e-14     -1.61

    Table 5.4: Example 5.3: Errors and calculated convergence rates for Adams-Bashforth methods.

In Table 5.4 we can observe the high accuracy that the higher order methods achieve for this very smooth problem. However, these small errors are wiped out by an explosion of the roundoff error if the step size is so large that -10hy is not in the absolute stability region of the method. The region of absolute stability is seen to be shrinking as the order of the method increases, in contrast to the Runge-Kutta results of Table 4.1.

In the first column (k = 1) of Table 5.5 we see the results for the backward Euler method. For h small they are very close to those of the forward Euler method (Table 4.1), but for the larger values of h they are much better. Newton's method was used to obtain convergence of the nonlinear iteration for h > .02, and functional iteration was used for the smaller step sizes. The column of p = 2 describes the performance of the 2nd order trapezoidal method. For the 4th order method the error reaches roundoff level already for h = .02. The BDF methods perform similarly to the Adams-Moulton methods for this nonstiff problem. The order of the methods, before the onset of roundoff error, is clearly reflected in the results. The absolute value of the errors is unusually small.    □

    step h   k=1, p=1    rate    k=1, p=2    rate    k=3, p=4    rate
             error               error               error
    0.2      .13e-5              .52e-8              .22e-11
    0.1      .65e-6      1.01    .13e-8      2.00    .14e-12     4.03
    0.05     .32e-6      1.00    .33e-9      2.00    .87e-14     3.96
    0.02     .13e-6      1.00    .52e-10     2.00    .50e-15     3.12
    0.01     .65e-7      1.00    .13e-10     2.00    .17e-14     -1.82
    0.005    .32e-7      1.00    .33e-11     2.00    .11e-14     0.73
    0.002    .13e-7      1.00    .52e-12     2.00    .47e-14     -1.62

    Table 5.5: Example 5.3: Errors and calculated convergence rates for Adams-Moulton methods.

    step h   k=1 error   rate    k=2 error   rate    k=4 error   rate
    0.2      .13e-5              .21e-7              .17e-10
    0.1      .65e-6      1.01    .53e-8      2.02    .10e-11     4.06
    0.05     .32e-6      1.00    .13e-8      2.01    .65e-13     4.02
    0.02     .13e-6      1.00    .21e-9      2.00    .94e-15     4.62
    0.01     .65e-7      1.00    .52e-10     2.00    .18e-14     -0.96
    0.005    .32e-7      1.00    .13e-10     2.00    .10e-14     0.86
    0.002    .13e-7      1.00    .21e-11     2.00    .46e-14     -1.66

    Table 5.6: Example 5.3: Errors and calculated convergence rates for BDF methods.
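The experiment of Example 5.3 is easy to reproduce. The following Python sketch (ours, not the authors' code) applies the 2-step Adams-Bashforth method to this problem, generating the extra starting value with one forward Euler step, and prints absolute errors at t = 25 with observed convergence rates; for step sizes within the absolute stability region the rates should be near 2, as in Table 5.4.

    import numpy as np

    def f(t, y):
        # right-hand side of Example 5.3; the exact solution is y = 1/t
        return -5.0 * t * y**2 + 5.0 / t - 1.0 / t**2

    def ab2(h, t0=1.0, tend=25.0):
        """2-step Adams-Bashforth, started with one forward Euler step."""
        n = int(round((tend - t0) / h))
        t, y = t0, 1.0
        f_old = f(t, y)                  # will play the role of f_{n-2}
        y, t = y + h * f_old, t + h      # forward Euler supplies y_1
        for _ in range(n - 1):
            f_new = f(t, y)
            # y_n = y_{n-1} + h(3/2 f_{n-1} - 1/2 f_{n-2})
            y, f_old = y + h * (1.5 * f_new - 0.5 * f_old), f_new
            t += h
        return abs(y - 1.0 / t)

    errs = [(h, ab2(h)) for h in [0.02, 0.01, 0.005, 0.002]]
    for (h1, e1), (h2, e2) in zip(errs, errs[1:]):
        rate = np.log(e1 / e2) / np.log(h1 / h2)
        print(f"h={h2:6.3f}  error={e2:.2e}  rate={rate:.2f}")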


5.2 Order, 0-Stability and Convergence

As in the two previous chapters, the basic convergence theory requires that a method have a certain (positive) order of accuracy (i.e. consistency) and that it be 0-stable. The emphasis, though, is somewhat different here from what we had for Runge-Kutta methods: whereas there 0-stability was trivial and attaining useful methods with a high order of accuracy was tricky, here 0-stability is not automatic (although it is not difficult to check), whereas attaining high order is straightforward, provided only that we are prepared to use sufficiently many past values and provide sufficiently accurate initial values. Note also that the restriction to a constant step size, which is not needed in §4.3, simplifies life considerably in this section.

5.2.1 Order

The simple derivation below is incredibly general: it will give us a tool not only for checking a method's order but also for finding its leading local truncation error term and even for designing linear multistep methods, given some desired criteria.

Define the linear operator L_h y(t) by

    L_h y(t) = ∑_{j=0}^{k} [α_j y(t - jh) - h β_j y'(t - jh)],    (5.4)

where y(t) is an arbitrary continuously differentiable function on [0, b]. The local truncation error is naturally defined as the defect obtained when plugging the exact solution into the difference equation (which here is (5.1) divided by h; see §3.2). This can be written as

    d_n = h^{-1} L_h y(t_n),    (5.5)

where y(t) is the exact solution.


In particular, the exact solution satisfies y' = f(t, y(t)), so

    L_h y(t) = ∑_{j=0}^{k} [α_j y(t - jh) - h β_j f(t - jh, y(t - jh))].

If we now expand y(t - jh) and y'(t - jh) in Taylor series about t and collect terms, we have

    L_h y(t) = C_0 y(t) + C_1 h y'(t) + ... + C_q h^q y^{(q)}(t) + ...,

where the C_q are computable constants. Recall that the order of the method is p if d_n = O(h^p). Thus:

• The order of the linear multistep method is p iff

    C_0 = C_1 = ... = C_p = 0,    C_{p+1} ≠ 0.

• The local truncation error is given, as advertised in (5.2), by

    d_n = C_{p+1} h^p y^{(p+1)}(t_n) + O(h^{p+1}).


From the Taylor series expansions, it can be easily seen that the coefficients are given by:

    C_0 = ∑_{j=0}^{k} α_j,
    C_i = (-1)^i [ (1/i!) ∑_{j=1}^{k} j^i α_j + (1/(i-1)!) ∑_{j=0}^{k} j^{i-1} β_j ],    i = 1, 2, ....    (5.6)

To obtain a method of order p, therefore, the first p of these expressions must be set to 0. The first few of these conditions read

    0 = α_0 + α_1 + α_2 + ... + α_k,
    0 = (α_1 + 2α_2 + ... + kα_k) + (β_0 + β_1 + β_2 + ... + β_k),
    0 = (1/2)(α_1 + 4α_2 + ... + k²α_k) + (β_1 + 2β_2 + ... + kβ_k),

etc. When the order is p, C_{p+1} is called the error constant of the method.

Example 5.4 For the forward Euler method, α_1 = -1, β_1 = 1. So

    C_0 = 1 - 1 = 0,    C_1 = 1 - 1 = 0,    C_2 = -1/2 + 1 = 1/2.

For the 2-step Adams-Bashforth method, α_1 = -1, β_1 = 3/2, β_2 = -1/2. So

    C_0 = 1 - 1 = 0,    C_1 = 1 - 3/2 + 1/2 = 0,    C_2 = -1/2 + 3/2 - 1 = 0,
    C_3 = -(-1/6 + (1/2)(3/2 - 2)) = 5/12.    □

Example 5.5 The coefficients of the methods of the previous section can be obtained by applying their family design criteria to select some method coefficients and then using the order conditions to choose the remaining coefficients such that the order is maximized.

For instance, consider a 2-step BDF, β_0 ≠ 0, β_1 = β_2 = 0. The method is

    y_n + α_1 y_{n-1} + α_2 y_{n-2} = h β_0 f_n.

The order conditions give the linear equations

    1 + α_1 + α_2 = 0,
    α_1 + 2α_2 + β_0 = 0,
    α_1 + 4α_2 = 0.

This system can be easily solved to yield β_0 = 2/3, α_1 = -4/3, α_2 = 1/3, as per Table 5.3. The coefficient of the leading term of the local truncation error is

    C_3 = -(1/6)(-4/3 + 8/3) = -2/9.    □
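The conditions (5.6) are easy to evaluate mechanically, which is essentially what Exercise 5.3 asks for. A small Python sketch (ours), using exact rational arithmetic:

    from fractions import Fraction as Fr
    from math import factorial

    def lmm_error_coeffs(alpha, beta, qmax=6):
        """C_0, C_1, ... from (5.6) for the method
        sum_j alpha[j] y_{n-j} = h sum_j beta[j] f_{n-j};
        the order is p iff C_0 = ... = C_p = 0 and C_{p+1} != 0."""
        C = [sum(Fr(a) for a in alpha)]
        for i in range(1, qmax + 1):
            s = sum(Fr(j) ** i * Fr(a) for j, a in enumerate(alpha)) / factorial(i)
            s += sum(Fr(j) ** (i - 1) * Fr(b) for j, b in enumerate(beta)) / factorial(i - 1)
            C.append((-1) ** i * s)
        return C

    # 2-step Adams-Bashforth (Example 5.4): C_0 = C_1 = C_2 = 0, C_3 = 5/12
    print(lmm_error_coeffs([1, -1, 0], [0, Fr(3, 2), Fr(-1, 2)])[:4])
    # 2-step BDF (Example 5.5): C_0 = C_1 = C_2 = 0, C_3 = -2/9
    print(lmm_error_coeffs([1, Fr(-4, 3), Fr(1, 3)], [Fr(2, 3), 0, 0])[:4])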


Given some of the α's and β's we can obviously use these relations to find the remaining α's and β's for the method of maximal order (see Exercise 5.3; note, though, that this method may not be optimal, or even usable, due to stability considerations).

A linear multistep method is consistent if it has order p ≥ 1. Thus, the method is consistent iff

    ∑_{j=0}^{k} α_j = 0,    ∑_{j=1}^{k} j α_j + ∑_{j=0}^{k} β_j = 0.

Sometimes it is more convenient to express the linear multistep method in terms of the characteristic polynomials

    ρ(ξ) = ∑_{j=0}^{k} α_j ξ^{k-j},    (5.7a)
    σ(ξ) = ∑_{j=0}^{k} β_j ξ^{k-j}.    (5.7b)

In terms of these polynomials, the linear multistep method is consistent iff ρ(1) = 0 and ρ'(1) = σ(1).

Reader's advice: The material that follows below is important for the fundamental understanding of linear multistep methods. We derive simple conditions on the roots of the characteristic polynomial ρ(ξ) which guarantee that a method is 0-stable. This, together with consistency, then gives convergence. Recall that ρ(1) = 0 by consistency, so this determines one root. The bottom line of the following discussion is that a usable method must have all other roots of the polynomial ρ(ξ) strictly inside the unit circle. A reader who is interested mainly in practical aspects may therefore skip the next few pages until after Example 5.7, at least on first reading.

5.2.2 Stability: Difference Equations and the Root Condition

One way of looking at a linear multistep method is that it is a difference equation which approximates the differential equation.


The stability of the linear multistep method, and the essential theoretical difference between multistep and one-step methods, are given by the stability of the difference equation. Before discussing stability for linear multistep methods, we review some basic facts about linear difference equations with constant coefficients.

Given such a scalar difference equation

    a_k y_{n-k} + a_{k-1} y_{n-k+1} + ... + a_0 y_n = q_n,    n = k, k+1, ...,

if {v_n} is a particular solution for this equation then the general solution is y_n = x_n + v_n, where x_n is the general solution to the homogeneous difference equation

    a_k x_{n-k} + a_{k-1} x_{n-k+1} + ... + a_0 x_n = 0,    n = k, k+1, ....

[Footnote: For example, {v_n} can be the solution of the difference equation with zero initial conditions, v_0 = v_1 = ... = v_{k-1} = 0.]

There are k linearly independent solutions to the homogeneous equation. To find them, we try the educated guess (ansatz) x_n = ξ^n. Substituting into the homogeneous difference equation we have

    φ(ξ) = ∑_{j=0}^{k} a_j ξ^{k-j} = 0,    (5.8)

thus ξ must be a zero of the polynomial φ(ξ). If all k roots are distinct then the general solution is given by

    y_n = ∑_{i=1}^{k} c_i ξ_i^n + v_n,

where the c_i, i = 1, ..., k, are arbitrary constants which are determined by the k initial conditions required for the difference equation. If the roots are not distinct, say ξ_1 = ξ_2 is a double root, then the solution is given by

    y_n = c_1 ξ_1^n + c_2 n ξ_2^n + ∑_{i=3}^{k} c_i ξ_i^n + v_n.

For a triple root, we have ξ^n, nξ^n, n(n-1)ξ^n as solution modes, etc. Thus the solutions to the difference equation are intimately related to the roots of the characteristic polynomial which is associated with it.

We can define stability for this difference equation similarly to stability for a differential equation (see §§2.1-2.2, particularly Example 2.2). Clearly, for a perturbation in the c_i not to grow unboundedly with n, we need to bound the roots ξ_i. We define, in complete analogy to the constant coefficient ODE case:


• The difference equation is stable if all k roots of φ(ξ) satisfy |ξ_i| ≤ 1, and if |ξ_i| = 1 then ξ_i is a simple root.

• The difference equation is asymptotically stable if all roots satisfy |ξ_i| < 1.

This completes our review of difference equations.

For multistep methods applied to the test equation y' = λy, the difference equation is given by

    ∑_{j=0}^{k} (α_j - hλ β_j) y_{n-j} = 0.    (5.9)

This is a homogeneous, constant-coefficient difference equation, like what we have just treated, with a_j = α_j - hλ β_j. A solution to this difference equation is {ξ_i^n} if ξ_i is a root of the polynomial φ(ξ) = ρ(ξ) - hλ σ(ξ) = 0. Since the solution to the ODE (with y(0) = 1) is y = e^{λt} = (e^{hλ})^n, we expect one root to approximate e^{hλ} so that y_n can approximate y(t_n) (i.e., this is a consistency requirement). That root is called the principal root. The other roots are called extraneous roots.

What should strike you in the above review is how closely the solution procedure for the difference equation is related to that of a scalar differential equation of order k. The source of these extraneous roots (also referred to at times as parasitic roots) is the discrepancy between the ODE of order 1 which should be approximated and the ODE of order k which is approximated instead by the multistep method. A good multistep method therefore must ensure that these extraneous roots, which cannot do any good, do not cause any harm either. This is what 0-stability (and strong stability, to be defined below) are about.

5.2.3 0-Stability and Convergence

Recall that in the previous two chapters convergence followed from accuracy using a perturbation bound, i.e. 0-stability. Consider an ODE system y' = f(t, y) on the interval [0, b]. The definition (3.9) of 0-stability for one-step methods needs to be updated here to read that the linear multistep method is 0-stable if there are positive constants h_0 and K such that for any mesh functions x_h and z_h with h ≤ h_0,

    |x_l - z_l| ≤ K { ∑_{i=0}^{k-1} |x_i - z_i|
        + max_{k≤n≤N} | h^{-1} ∑_{j=0}^{k} α_j (x_{n-j} - z_{n-j}) - ∑_{j=0}^{k} β_j (f(t_{n-j}, x_{n-j}) - f(t_{n-j}, z_{n-j})) | },    1 ≤ l ≤ N.    (5.10)


If we have this bound then convergence follows immediately. In fact, by plugging x_n ← y_n and z_n ← y(t_n) in the stability bound (5.10) we obtain that if the k initial values are accurate to order p and the method has order p, then the global error is O(h^p).

The 0-stability bound is cumbersome to check for a given linear multistep method. Fortunately, it turns out that it is equivalent to a simple condition on the roots of the characteristic polynomial ρ(ξ) of (5.7a). The complete proof is technical and appears in classical texts. Instead, we bring its essence.

As the name implies, 0-stability is concerned with what happens in the limit h → 0. In this limit, it is sufficient to consider the ODE y' = 0 (corresponding to the fact that y' is the dominant part of the differential operator y' - f(t, y)). Now, the ODE y' = 0 is decoupled, so we can consider a scalar component y' = 0. For the latter ODE, the method reads

    α_k y_{n-k} + α_{k-1} y_{n-k+1} + ... + α_0 y_n = 0.

This is a difference equation of the type considered in the previous subsection. It must be stable for the multistep method to be 0-stable. Identifying φ(ξ) of (5.8) with ρ(ξ) of (5.7a), we obtain the following theorem:

Theorem 5.1

• The linear multistep method is 0-stable iff all roots ξ_i of the characteristic polynomial ρ(ξ) satisfy

    |ξ_i| ≤ 1,

and if |ξ_i| = 1 then ξ_i is a simple root, 1 ≤ i ≤ k.

• If the root condition is satisfied, the method is accurate to order p, and the initial values are accurate to order p, then the method is convergent to order p.

Note that the root condition guaranteeing 0-stability relates to the characteristic polynomial ρ(ξ) alone; see Fig. 5.3. Also, for any consistent method the polynomial ρ has the root 1. One-step methods have no other roots, which again highlights the fact that they are automatically 0-stable.

Example 5.6 Instability is a disaster. Here is an example of an unstable method:

    y_n = -4y_{n-1} + 5y_{n-2} + 4h f_{n-1} + 2h f_{n-2}.

In terms of the local truncation error, this is the most accurate explicit 2-step method. However, ρ(ξ) = ξ² + 4ξ - 5 = (ξ - 1)(ξ + 5). The extraneous root is ξ_2 = -5 and the root condition is violated.


[Figure 5.3: Zeros of ρ(ξ) for a 0-stable method: all zeros lie in the closed unit disk.]

Consider solving y' = 0 with initial values y_0 = 0, y_1 = ε. Then

    y_2 = -4y_1 = -4ε,
    y_3 = -4y_2 + 5y_1 = 21ε,
    y_4 = -4y_3 + 5y_2 = -104ε,
    ...

There is no hope for convergence here.    □

Consider again the test equation, and its discretization (5.9). If Re(λ) < 0 then the exact solution decays and we must prevent any growth in the approximate solution. This is not possible for all such λ if there are extraneous roots of the polynomial ρ with magnitude 1. For h > 0 sufficiently small the difference equation (5.9) must be asymptotically stable in this case; see Fig. 5.4. We define a linear multistep method to be

• strongly stable if all roots of ρ(ξ) = 0 are inside the unit circle except for the root ξ = 1,

• weakly stable if it is 0-stable but not strongly stable.

Example 5.7 Weak stability can be a disaster for some problems, too.


[Figure 5.4: Zeros of ρ(ξ) for a strongly stable method. It is possible to draw a circle contained in the unit circle about each extraneous root.]

Consider Milne's method,

    y_n = y_{n-2} + (h/3)(f_n + 4f_{n-1} + f_{n-2}),

for y' = λy. The error satisfies the equation

    e_n = e_{n-2} + (h/3) λ (e_n + 4e_{n-1} + e_{n-2}).

Substituting as before e_n = ξ^n, we have

    (1 - (1/3)hλ) ξ² - (4/3)hλ ξ - (1 + (1/3)hλ) = 0

(i.e. ρ(ξ) - hλ σ(ξ) = 0). Clearly, ρ = ξ² - 1 has a root at +1 and a root at -1. The roots of the full polynomial equation are given by

    ξ = [ (2/3)hλ ± √(1 + (1/3)(hλ)²) ] / (1 - (1/3)hλ).

By expanding ξ into a power series in hλ, we find that

    ξ_1 = e^{hλ} + O((hλ)^5),
    ξ_2 = -e^{-hλ/3} + O(h³).

For λ < 0, the extraneous root dominates, so the solution is unstable.    □

A practically minded reader must conclude that any useful linear multistep method must be strongly stable. We shall not be interested henceforth in any other methods. But this restricts the attainable order of accuracy.
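The explosion in Example 5.6 takes only a few lines to reproduce. In the Python sketch below (ours), ε plays the role of a rounding-level perturbation; the ξ = -5 mode multiplies the error by roughly 5 at every step:

    eps = 1.0e-10
    y_prev, y = 0.0, eps          # y_0 = 0, y_1 = eps, solving y' = 0
    for n in range(2, 12):
        y_prev, y = y, -4.0 * y + 5.0 * y_prev
        print(n, y)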


G. Dahlquist showed that

• Strongly stable k-step methods can have at most order k + 1.

Example 5.8 The Adams methods, both explicit and implicit, have the characteristic polynomial

    ρ(ξ) = ξ^k - ξ^{k-1} = (ξ - 1) ξ^{k-1},

so the extraneous roots are all 0, for any k. These methods are all strongly stable. The implicit methods have the highest order attainable. This explains in part the popularity of Adams methods.    □

Example 5.9 The BDF methods were motivated in §5.1 by the desire to achieve stiff decay. This, however, does not automatically mean that they are strongly stable. Exercise 5.4 shows that BDF methods are 0-stable for 1 ≤ k ≤ 6 and unstable for k > 6. Thus, only the first 6 members of this family are usable.    □

5.3 Absolute Stability

Recall that the general linear multistep method

    ∑_{j=0}^{k} α_j y_{n-j} = h ∑_{j=0}^{k} β_j f_{n-j}

applied to the test equation y' = λy gives (5.9), i.e.,

    ∑_{j=0}^{k} α_j y_{n-j} = hλ ∑_{j=0}^{k} β_j y_{n-j}.

If we let y_n = ξ^n, then ξ must satisfy

    ∑_{j=0}^{k} α_j ξ^{k-j} = hλ ∑_{j=0}^{k} β_j ξ^{k-j},    (5.11)

or ρ(ξ) = hλ σ(ξ).

Now, the method is absolutely stable for those values of z = hλ such that |y_n| does not grow with n. This corresponds to values for which all roots of (5.11) satisfy |ξ| ≤ 1.
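Checking the root condition numerically is straightforward (this is Exercise 5.4 in essence). The Python sketch below (ours) builds the BDF characteristic polynomial from the backward-difference form of §5.1.2, ρ(ξ) = ∑_{i=1}^{k} (1/i) ξ^{k-i} (ξ - 1)^i up to scaling, and tests the root condition; it reports True for k = 1, ..., 6 and False for k = 7, 8.

    import numpy as np

    def bdf_rho(k):
        """Unscaled characteristic polynomial of the k-step BDF."""
        rho = np.poly1d([0.0])
        for i in range(1, k + 1):
            # (1/i) * xi^(k-i) * (xi - 1)^i
            rho = rho + np.poly1d([1.0, -1.0]) ** i * np.poly1d([1.0] + [0.0] * (k - i)) / i
        return rho

    def root_condition(rho, tol=1e-8):
        """0-stability: |xi| <= 1, and roots of modulus one must be simple."""
        roots = rho.roots
        if np.any(np.abs(roots) > 1.0 + tol):
            return False
        for r in roots[np.abs(np.abs(roots) - 1.0) <= tol]:
            if np.sum(np.abs(roots - r) <= tol) > 1:   # repeated root on the circle
                return False
        return True

    for k in range(1, 9):
        print(k, root_condition(bdf_rho(k)))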


For differential equations with positive eigenvalues, it is sometimes convenient to define an alternate concept: the region of relative stability. This is a region where the extraneous roots may be growing, but they are growing more slowly than the principal root, so that the principal root still dominates. We will not pursue this further.

Finding the region of absolute stability is simple for linear multistep methods. Just look for the boundary

    z = ρ(e^{iθ}) / σ(e^{iθ}),

and plot (the complex scalar) z for θ ranging from 0 to 2π.

In Fig. 5.5 we plot absolute stability regions for the Adams methods. The first two Adams-Moulton methods are missing because they are A-stable. Notice how much larger the stability regions for the Adams-Moulton methods are compared to the Adams-Bashforth methods for the same order (or for the same number of steps): interpolation is a more stable process than extrapolation.

[Figure 5.5: Absolute stability regions of Adams methods in the complex z-plane. (a) Adams-Bashforth, k = 1, 2, 3, 4; (b) Adams-Moulton, k = 2, 3, 4.]

Recall the definition of A-stability: a numerical method is A-stable if its region of absolute stability contains the left half plane h Re(λ) < 0. Unfortunately, A-stability is very difficult to attain for multistep methods. It can be shown that:


• An explicit linear multistep method cannot be A-stable.

• The order of an A-stable linear multistep method cannot exceed two.

• The second order A-stable implicit linear multistep method with smallest error constant (C_3 = -1/12) is the trapezoidal method.

The utility of the trapezoidal method has already been discussed in §3.6. [Footnote: These results were given by Dahlquist in the 1960's. They had a major impact on research in this area in the 1960's and 1970's. Today it is still easy to appreciate their mathematical beauty, and the sophistication that went into the proofs, even though a glance at the methods used in successful implementations makes it clear that A-stability is not the property that separates the winners from the also-rans.]

If we want to use linear multistep methods for stiff problems, the A-stability requirement must be relaxed. Moreover, the discussion in Chapter 3 already reveals that in the very stiff limit h Re(λ) → -∞, the A-stability bound may not be sufficient and the concept of stiff decay is more useful. The BDF methods introduced in §5.1 trade in chunks of absolute stability regions near the imaginary axis for stiff decay. The size of these chunks increases with the number of steps k, until the methods become unstable for k > 6; see Fig. 5.6.

[Figure 5.6: BDF absolute stability regions in the complex z-plane; the stability regions are outside the shaded area for each method. (a) BDF k = 1, 2, 3; (b) BDF k = 4, 5, 6.]
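The boundary locus z = ρ(e^{iθ})/σ(e^{iθ}) makes pictures such as Figs. 5.5 and 5.6 a few lines of code. A minimal Python/matplotlib sketch (ours), shown for the 2-step Adams-Bashforth and 2-step BDF methods; note that the curve only traces the boundary, and one still has to decide on which side of it the stability region lies:

    import numpy as np
    import matplotlib.pyplot as plt

    def boundary_locus(alpha, beta, n=400):
        """z(theta) = rho(e^{i theta}) / sigma(e^{i theta}), cf. (5.7a)-(5.7b)."""
        xi = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, n))
        k = len(alpha) - 1
        rho = sum(a * xi ** (k - j) for j, a in enumerate(alpha))
        sigma = sum(b * xi ** (k - j) for j, b in enumerate(beta))
        return rho / sigma

    for name, al, be in [("AB2", [1, -1, 0], [0, 1.5, -0.5]),
                         ("BDF2", [1, -4.0/3, 1.0/3], [2.0/3, 0, 0])]:
        z = boundary_locus(al, be)
        plt.plot(z.real, z.imag, label=name)
    plt.axhline(0, color="gray", lw=0.5)
    plt.axvline(0, color="gray", lw=0.5)
    plt.legend(); plt.xlabel("Re(z)"); plt.ylabel("Im(z)")
    plt.show()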


5.4 Implementation of Implicit Linear Multistep Methods

When using an implicit, k-step linear multistep method, i.e. β_0 ≠ 0 in the formula

    ∑_{j=0}^{k} α_j y_{n-j} = h ∑_{j=0}^{k} β_j f_{n-j},

a system of m nonlinear equations for y_n has to be solved at each step (recall §3.4). We can solve this system by some variant of functional iteration (for nonstiff systems), or by a modified Newton iteration (for stiff systems). For any of these iterative methods we must "guess", or predict, a starting iterate y_n^0, usually by evaluating an interpolant passing through past values of y and/or f at t_n, or via an explicit multistep method.

5.4.1 Functional Iteration

The simplest way to solve the nonlinear algebraic system for y_n is via functional iteration. The iteration is given by

    y_n^{ν+1} = h β_0 f(t_n, y_n^ν) - ∑_{j=1}^{k} α_j y_{n-j} + h ∑_{j=1}^{k} β_j f_{n-j},    ν = 0, 1, ....    (5.12)

[Footnote: Do not confuse the notation y_n^{ν+1} for the (ν+1)st iterate of y_n with the notation for the (ν+1)st power. Which of these is the correct interpretation should be clear from the context. We have reserved the superscript ν for an iteration counter in this chapter.]

This is a fixed point iteration. It converges to the fixed point y_n if it is a contraction, i.e., if ‖h β_0 ∂f/∂y‖ ≤ r < 1. Hence it is appropriate only for nonstiff problems. The iteration is continued until it has been determined to have converged, as described for the Newton iteration below. Usually, if convergence is not attained within two to three iterations, or if the rate of convergence is found to be too slow, the current step is rejected and retried with a smaller step size (for example, halve the step size).

5.4.2 Predictor-Corrector Methods

Often in nonstiff codes, the iteration is not taken to convergence. Instead, a fixed number of iterations is used for each time step. First, an approximation y_n^0 to y_n is predicted, usually by an explicit multistep method of the same order as the implicit method (for example, by the k-step Adams-Bashforth method of order k),

    P:  y_n^0 + α̂_1 y_{n-1} + ... + α̂_k y_{n-k} = h (β̂_1 f_{n-1} + ... + β̂_k f_{n-k}).


Then the function is evaluated at y_n^0,

    E:  f_n^0 = f(t_n, y_n^0),

and inserted into the corrector formula (for example, Adams-Moulton of order k or k+1) to obtain a new approximation to y_n. Setting ν = 0,

    C:  y_n^{ν+1} + α_1 y_{n-1} + ... + α_k y_{n-k} = h (β_0 f_n^ν + β_1 f_{n-1} + ... + β_k f_{n-k}).

The procedure can be stopped here (this is called a PEC method), or the function can be evaluated at y_n^1 to give

    E:  f_n^1 = f(t_n, y_n^1)

(this is called a PECE method), or the steps E and C can be iterated ν times to form a P(EC)^ν method or a P(EC)^ν E method. The final function evaluation in a P(EC)^ν E method yields a better value for f to be used in the next time step (i.e. n ← n+1) as the new f_{n-1}. Although it appears that the method might be expensive, the final function evaluation is usually advantageous because it yields a significant increase in the region of absolute stability over the corresponding P(EC)^ν method.

It should be noted that because the corrector formula is not iterated to convergence, the order, error and stability properties of the P(EC)^ν E or P(EC)^ν methods are not necessarily the same as for the corrector formula alone. The methods of this subsection are different, in principle, from the methods of the previous subsection §5.4.1 for the same implicit formula. Predictor-corrector methods are explicit methods [Footnote: Unlike an implicit method, an explicit method evaluates the next y_n at each step precisely (in the absence of roundoff error) in a finite number of elementary operations. This relates to the fact that no such predictor-corrector formula has an unbounded absolute stability region, even if the implicit corrector formula has one. These predictor-corrector methods are suitable only for nonstiff problems.] which are members of a class of methods called general linear methods. This class contains also linear multistep methods.

Example 5.10 Combining the 2-step Adams-Bashforth method (i.e. α̂_1 = -1, α̂_2 = 0, β̂_1 = 3/2, β̂_2 = -1/2) with the 2nd order 1-step Adams-Moulton method (i.e. the trapezoidal method, α_1 = -1, β_0 = β_1 = 1/2), we obtain the following method.

Given y_{n-1}, f_{n-1}, f_{n-2}:

1. y_n^0 = y_{n-1} + (h/2)(3f_{n-1} - f_{n-2})
2. f_n^0 = f(t_n, y_n^0)
3. y_n = y_{n-1} + (h/2)(f_{n-1} + f_n^0)
4. f_n = f(t_n, y_n).

This is an explicit, 2nd order method which has the local truncation error

    d_n = -(1/12) h² y'''(t_n) + O(h³).    □
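Example 5.10 translates directly into code. A minimal Python sketch (ours); the predictor value y_n^0 is returned as well, because the predictor-corrector difference y_n - y_n^0 is exactly what feeds Milne's error estimate in §5.5.2:

    import numpy as np

    def pece_step(f, t_new, h, y_prev, f_prev, f_prev2):
        """One step of the PECE pair of Example 5.10:
        2-step Adams-Bashforth predictor, trapezoidal corrector."""
        y_pred = y_prev + 0.5 * h * (3.0 * f_prev - f_prev2)    # P
        f_pred = f(t_new, y_pred)                               # E
        y_new = y_prev + 0.5 * h * (f_prev + f_pred)            # C
        f_new = f(t_new, y_new)                                 # E, reused next step
        return y_new, f_new, y_pred

    # drive it on y' = -y, y(0) = 1, with exact starting values
    f = lambda t, y: -y
    h, t, y = 0.1, 0.1, np.exp(-0.1)
    f_prev, f_prev2 = f(0.1, y), f(0.0, 1.0)
    for _ in range(9):
        y, f_new, _ = pece_step(f, t + h, h, y, f_prev, f_prev2)
        f_prev2, f_prev, t = f_prev, f_new, t + h
    print(y, np.exp(-t))   # should agree to roughly 1.e-3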


The most-used variant of predictor-corrector methods is PECE. In the common situation where the order of the predictor formula is equal to the order of the corrector formula, the principal term of the local truncation error for the PECE method is the same as that of the corrector:

    d_n = C_{p+1} h^p y^{(p+1)}(t_n) + O(h^{p+1}).

The local error is given by a similar expression (see (3.14)). Roughly speaking, the principal terms of the error are the same for the corrector as for the PECE method because y_n^0, which is already accurate to the order of the corrector, enters into the corrector formula multiplied by h, hence the error which is contributed by this term is O(h^{p+1}).

5.4.3 Modified Newton Iteration

For stiff systems, a variant of Newton's method is used to solve the nonlinear algebraic equations at each time step. For the general linear multistep method we write

    y_n - h β_0 f(t_n, y_n) = -∑_{j=1}^{k} α_j y_{n-j} + h ∑_{j=1}^{k} β_j f_{n-j},

where the right hand side is known. Newton's iteration yields

    y_n^{ν+1} = y_n^ν - (I - h β_0 ∂f/∂y)^{-1} [ ∑_{j=0}^{k} α_j y_{n-j} - h ∑_{j=0}^{k} β_j f_{n-j} ],

where y_n, f_n and ∂f/∂y are all evaluated at y_n^ν. The initial guess y_n^0 is usually obtained by evaluating an interpolant passing through past values of y at t_n. For a simple implementation, this method does the job. However, it is often not the cheapest possible.

A modified Newton method is usually employed in stiff ODE packages, where the Jacobian matrix ∂f/∂y and its LU decomposition are evaluated (updated) only when deemed necessary. The matrix may be evaluated whenever

1. the iteration fails to converge, or


2. the step size has changed by a significant amount or the order has changed, or

3. after a certain number of steps have passed.

Since forming and LU-decomposing the matrix in Newton's iteration are often the major computational expense in carrying out the next step's approximation, relatively large savings are realized by the modified Newton's method.

The iteration is considered to have converged, e.g., when

    (ρ / (1 - ρ)) |y_n^{ν+1} - y_n^ν| < NTOL,

where the Newton iteration tolerance NTOL is usually taken to be a fraction of ETOL, the user error tolerance, say NTOL = .33 ETOL, and ρ is an indication of the rate of convergence of the iteration, which can be estimated by

    ρ = ( |y_n^{ν+1} - y_n^ν| / |y_n^1 - y_n^0| )^{1/ν}.

Reader's advice: The material that follows deals with some of the nuts and bolts for writing general-purpose software based on multistep methods. Depending on your orientation, you may wish to read it with special care, or to skip it.

5.5 Designing Multistep General-Purpose Software

The design of an effective general-purpose code for solving initial-value problems using multistep methods is a challenging task. It involves decisions regarding error estimation and control, varying the step size, varying the method's order and solving the nonlinear algebraic equations. The latter has been considered already. Here we outline some of the important options for the resolution of the remaining issues.
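A schematic of the modified Newton iteration of §5.4.3 in Python (ours; the treatment of the very first increment, for which no rate estimate exists yet, and the cap of four iterations are arbitrary implementation choices, not prescriptions from the text):

    import numpy as np
    from scipy.linalg import lu_solve

    def modified_newton(f, tn, h, beta0, rhs, y0, lu_piv, ntol, maxit=4):
        """Solve y - h*beta0*f(tn, y) = rhs, reusing the frozen factorization
        lu_piv of (I - h*beta0*J) from scipy.linalg.lu_factor; J may be stale."""
        y, d1 = y0.copy(), None
        for m in range(1, maxit + 1):
            residual = y - h * beta0 * f(tn, y) - rhs
            dy = lu_solve(lu_piv, -residual)
            y += dy
            dm = np.linalg.norm(dy)
            if d1 is None:
                d1 = dm
                if dm < ntol:                       # converged on the spot
                    return y, True
            else:
                rho = (dm / d1) ** (1.0 / (m - 1))  # rate estimate, as above
                if rho < 1.0 and rho / (1.0 - rho) * dm < ntol:
                    return y, True
        return y, False    # caller refreshes the Jacobian and/or cuts h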


5.5.1 Variable Step-Size Formulae

We have seen in the previous chapters that in some applications varying the step size is crucial for the effective performance of a discretization method. The general k-step linear methods that we have seen so far,

    ∑_{j=0}^{k} α_j y_{n-j} = h ∑_{j=0}^{k} β_j f_{n-j},

assume that we know the past values (y_{n-j}, f_{n-j}), j = 1, ..., k, at a sequence of equally spaced mesh points defined by the step length h. Now, if at t = t_{n-1} we want to take a step of size h_n which is different from the step size h_{n-1} used before, then we need solution values at past times t_{n-1} - j h_n, 1 ≤ j ≤ k-1, whereas what we have from previous steps are values at t_{n-1} - j h_{n-1}, 1 ≤ j ≤ k-1. To obtain approximations for the missing values, there are three main options. We will illustrate them in terms of the second order BDF method. Note that for Adams methods the interpolations will be of past values of f instead of past values of y, and errors are estimated via differences of f and not y.

Fixed-Coefficient Strategy

The constant step-size, second order BDF formula (for step size h_n) is given by

    (3/2) ( y_n - (4/3) y_{n-1} + (1/3) y_{n-2} ) = h_n f(t_n, y_n),

where t_n = t_{n-1} + h_n. The BDF formula requires values of y at t_{n-1} and t_{n-1} - h_n. The fixed-coefficient method computes these values from the values at t_{n-1}, t_{n-1} - h_{n-1} and t_{n-1} - h_{n-1} - h_{n-2} by quadratic (more generally, polynomial) interpolation. The interpolated values of y at t_{n-1} - h_n become the "new" past values y_{n-2}, and are used in the fixed-coefficient BDF formula to advance the step.

Fixed-coefficient formulae have the advantage of simplicity. However, there is an error due to the interpolation, and they are less stable than variable-coefficient formulae. Stability of the variable step-size formulae is an important consideration for problems where the step size must be changed frequently or drastically (i.e., h_n ≪ h_{n-1} or h_n ≫ h_{n-1}).

Variable-Coefficient Strategy

Better stability properties are obtained by deriving directly the formulae which are based on unequally-spaced data. Recall that the BDF formulae were derived by first approximating y by an interpolating polynomial, and then differentiating the interpolating polynomial and requiring it to satisfy the ODE at t_n.


The variable-coefficient BDF formulae are derived in exactly the same way, using an interpolating polynomial which is based on unequally-spaced data. The Adams methods can also be directly extended, using polynomial interpolation of unequally-spaced f-values.

For example, to derive the variable-coefficient form of the second order BDF method, we can first construct the interpolating quadratic polynomial φ(t) based on unequally-spaced data (here it is written in Newton form):

    φ(t) = y_n + (t - t_n)[y_n, y_{n-1}] + (t - t_n)(t - t_{n-1})[y_n, y_{n-1}, y_{n-2}].

[Footnote: Recall that the divided differences are defined by [y_n] = y_n and

    [y_n, y_{n-1}, ..., y_{n-i}] = ( [y_n, y_{n-1}, ..., y_{n-i+1}] - [y_{n-1}, y_{n-2}, ..., y_{n-i}] ) / (t_n - t_{n-i}).]

Next we differentiate the interpolating polynomial to obtain

    φ'(t_n) = [y_n, y_{n-1}] + (t_n - t_{n-1})[y_n, y_{n-1}, y_{n-2}].

Then the variable-coefficient form of the second-order BDF formula is given by

    f(t_n, y_n) = [y_n, y_{n-1}] + h_n [y_n, y_{n-1}, y_{n-2}].

Note that on an equally spaced mesh, this formula reduces to the fixed step-size BDF method. The coefficients in this method depend on h_n and on h_{n-1}.

The variable-coefficient method has the advantage for problems which require frequent or drastic changes of step size. However, in the case of implicit methods, it can be less efficient than the alternatives. To see this, rewrite the formula in terms of past steps:

    h_n f(t_n, y_n) = y_n - y_{n-1} + (h_n² / (h_n + h_{n-1})) ( (y_n - y_{n-1})/h_n - (y_{n-1} - y_{n-2})/h_{n-1} ).

Then the iteration matrix for Newton's method is given by

    (1 + h_n/(h_n + h_{n-1})) I - h_n ∂f/∂y.

So the coefficients of the iteration matrix depend not only on the current step size, but also on the previous one, and more generally on the sequence of k-1 past steps. For economy, it is advantageous to try to save and reuse the iteration matrix and/or its factorization from one step to the next. However, if the coefficients in the matrix change frequently, then this is not possible.
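In code, the divided differences of the footnote and the variable-coefficient BDF2 relation look as follows (a sketch, ours). Both h_n and h_{n-1} enter the relation, which is precisely the step-history dependence of the iteration matrix just discussed.

    def divided_differences(ts, ys):
        """[y_n], [y_n, y_{n-1}], ... for nodes ts = [t_n, t_{n-1}, ...]
        (most recent first), following the recursive definition above."""
        table = [list(ys)]
        for i in range(1, len(ys)):
            prev = table[-1]
            table.append([(prev[j] - prev[j + 1]) / (ts[j] - ts[j + i])
                          for j in range(len(prev) - 1)])
        return [row[0] for row in table]

    def bdf2_residual(f, ts, ys):
        """Residual of f(t_n, y_n) = [y_n, y_{n-1}] + h_n [y_n, y_{n-1}, y_{n-2}];
        it vanishes when (ts[0], ys[0]) satisfies the variable-coefficient BDF2."""
        _, d1, d2 = divided_differences(ts[:3], ys[:3])
        return f(ts[0], ys[0]) - (d1 + (ts[0] - ts[1]) * d2)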


This changing Jacobian is a serious shortcoming of the variable-coefficient strategy, in the case of implicit methods for stiff problems. In the design of codes for nonstiff problems, for example in Adams codes, the Jacobian matrix does not arise and there is no need to consider the next alternative.

Fixed Leading-Coefficient Strategy

This is a compromise which incorporates the best features of both previous methods. We describe it for the k-step BDF. First a polynomial φ(t) of degree ≤ k, which is sometimes called a predictor polynomial, is constructed such that it interpolates y_{n-i} at the last k+1 values on the unequally-spaced mesh,

    φ(t_{n-i}) = y_{n-i},    i = 1, ..., k+1.

Then the fixed leading-coefficient form of the k-step BDF formula is given by requiring that a second polynomial ψ(t) of degree ≤ k, which interpolates the predictor polynomial on a fixed mesh t_{n-1}, t_{n-1} - h_n, ..., t_{n-1} - k h_n, satisfies the ODE at t_n:

    ψ(t_n - i h_n) = φ(t_n - i h_n),    1 ≤ i ≤ k,
    ψ'(t_n) = f(t_n, ψ(t_n)),

and setting

    y_n = ψ(t_n).

The fixed leading-coefficient form has stability properties which are intermediate between the other two forms, but is as efficient as the fixed-coefficient form.

Whichever method is chosen to vary the step size, it is clear that the effort is more significant than what is required for Runge-Kutta methods. On the other hand, estimating the local truncation error is easier with linear multistep methods, as we will see next.

5.5.2 Estimating and Controlling the Local Error

As was the case for Runge-Kutta methods, the errors made at each step are much easier to estimate than the global error. Thus, even though the global error is more meaningful, the local truncation error is the one that general-purpose multistep codes usually estimate in order to control the step size and to decide on the order of the method to be used. We recall from (3.14) that the local truncation error is related to the local error by

    h_n (|d_n| + O(h^{p+1})) = |l_n| (1 + O(h_n)).


Thus, to control the local error, multistep codes attempt to estimate and control h_n d_n.

In developing estimates below using the local truncation error, we will pretend that there is no error in previous steps. This is of course not true in general, but it turns out that the errors in previous time steps are often correlated so as to create a higher order contribution, so the expressions derived by ignoring these past errors do yield the leading term of the current local error. There are more difficulties with the theory when the order is varied.

Estimating the Local Truncation Error

In the case of predictor-corrector methods (§5.4.2), the error estimate can be expressed in terms of the difference between the predictor and the corrector. Let the local truncation error of the predictor formula be given by

    d̂_n = Ĉ_{p+1} h^p y^{(p+1)}(t_n) + O(h^{p+1}).

Subtracting the predicted from the corrected values, we obtain

    y_n - y_n^0 = (Ĉ_{p+1} - C_{p+1}) h^p y^{(p+1)}(t_n) + O(h^{p+1}).

Hence an estimate for the local truncation error of the corrector formula, or of the PECE formula, is given in terms of the predictor-corrector difference by

    C_{p+1} h^p y^{(p+1)}(t_n) + O(h^{p+1}) = ( C_{p+1} / (Ĉ_{p+1} - C_{p+1}) ) (y_n - y_n^0).

This is called Milne's estimate. [Footnote: Note that in contrast to the methods used in the Runge-Kutta context to evaluate two approximations to y(t_n), here the predictor and the corrector methods have the same order.] In an Adams predictor-corrector pair, a k-step Adams-Bashforth predictor is used together with a (k-1)-step Adams-Moulton corrector to obtain a PECE method of order p = k with a local error estimate, at the cost of two function evaluations per step. See Example 5.10.

Alternatively, it is also possible to use a predictor of order p-1. This is an instance of local extrapolation, as defined in the previous chapter.

The local truncation error for more general multistep methods can be estimated directly by approximating y^{(p+1)} using divided differences. [Footnote: In the case of Adams methods, y^{(p+1)} is approximated via the divided difference of f, using y^{(p+1)} = f^{(p)}.] For example, for the second order BDF, if φ(t) is the quadratic interpolating y_n, y_{n-1} and y_{n-2}, then

    f(t_n, y_n) = φ'(t_n) = [y_n, y_{n-1}] + h_n [y_n, y_{n-1}, y_{n-2}] + r_n,

where

    r_n = h_n (h_n + h_{n-1}) [y_n, y_{n-1}, y_{n-2}, y_{n-3}].

The principal term of the local truncation error is then given by β_0 r_n.


The error estimate is used to decide whether to accept the results of the current step or to redo the step with a smaller step size. The step is accepted based on a test

    EST ≤ ETOL,

where EST is h_n times the estimated local truncation error.

Choosing the Step Size and Order for the Next Step

Once the current step has been accepted, the next task is to choose the step size and order for the next step. We begin by forming estimates of the error which we expect would be incurred on the next step, if it were taken with a method of order p, for several possible orders, for example p-2, p-1, p and p+1, where p is the current order.

There are several philosophies for choosing the next order:

1. Choose the next order so that the step size at that order is the largest possible. We will show how to compute these step sizes.

2. Raise or lower the order depending on whether

    |h^{p-1} y^{(p-1)}|,  |h^p y^{(p)}|,  |h^{p+1} y^{(p+1)}|,  |h^{p+2} y^{(p+2)}|

form an increasing or decreasing sequence, where h is the current step size. The philosophy behind this type of order selection strategy is that the Taylor series expansion is behaving as expected for higher orders only if the magnitudes of successive higher order terms form a decreasing sequence. If the terms fail to form a monotone decreasing sequence, the order is lowered. The effect is to bias the formulae towards lower orders, especially in situations where the higher order formulae are unstable (thus causing the higher order differences to grow).

Given the order p, the step size for the next step is computed as follows. Because the error for the next step is a highly nonlinear function of the step size to be chosen, a simplifying assumption is made. The step size expected for a step of order p is computed as if the last p+1 steps were taken at the current step size, and the step size is chosen so that the error estimate satisfies the tolerance. More precisely, the new step size h_{n+1} = r h_n is chosen conservatively, so that the estimated error is a fraction of the desired integration error tolerance ETOL,

    |r^{p+1} h_n^{p+1} C_{p+1} y^{(p+1)}| = frac · ETOL,

with frac = 0.9, say. If EST = |h_n^{p+1} C_{p+1} y^{(p+1)}| is the error estimate, then r^{p+1} EST = frac · ETOL. Thus

    r = ( frac · ETOL / EST )^{1/(p+1)}.
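As a code fragment (ours; the caps on the growth and reduction of the step size are common practice in codes, not part of the derivation):

    def next_step_size(h, est, etol, p, frac=0.9, rmax=2.0, rmin=0.1):
        """New step size h*r with r = (frac*ETOL/EST)^(1/(p+1))."""
        r = (frac * etol / est) ** (1.0 / (p + 1))
        return h * min(rmax, max(rmin, r))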


5.5.3 Approximating the Solution at Off-Step Points

In many applications, the approximate solution is needed at intermediate times which may not coincide with the mesh points chosen by the code. Generally, it is easy and cheap to construct polynomial interpolants based on solution values at mesh points. Then, we just evaluate the interpolant at the off-step points. However, we note that the natural interpolant for BDF is continuous but not differentiable, and the natural interpolant for Adams methods is not even continuous (although its derivative is)! Although the natural interpolants yield the requested accuracy, for applications where more smoothness is required of the numerical solution, interpolants which match the solution with greater continuity have been derived.

5.6 Software, Notes and References

5.6.1 Notes

The Adams-Bashforth methods date back to 1883; J. C. Adams also designed the implicit formulas known as Adams-Moulton. Both F. R. Moulton and W. E. Milne used these formulae in 1926 in predictor-corrector combinations. The BDF methods were introduced in the 1950's, if not earlier, but they came to prominence only much later, due to the work of C. W. Gear [43]. See [50] for more background and early references.

The material in §§5.2-5.4 is standard, although different ways have been used to prove the basic Stability Theorem 5.1. It is covered (plus more) in a number of other texts, e.g. [50, 52, 62, 43, 85]. Early works of G. Dahlquist and others, reported in [54], laid the foundations of this material.

For our presentation we chose a different order than the other texts by combining the nonstiff and the stiff cases. This reflects our belief that stiff equations should not be considered advanced material to be taught only towards the end of a course. Also, as in Chapter 4, we have omitted many stability concepts that have been proposed in the literature in the past 30 years,


and have instead concentrated only on the properties of stiff decay and A-stability.

Writing a general-purpose code based on multistep methods is a more complicated endeavor than for one-step methods, as §5.5 may already suggest. The books by Shampine & Gordon [86] and Brenan, Campbell & Petzold [19] describe such implementations in detail.

While most recent developments in the numerical ODE area seem to have related more to Runge-Kutta methods, you should not conclude that linear multistep methods may be forgotten. In fact, there are still some serious holes in the theory behind the practical implementation issues in §5.5 on one hand, and on the other hand these methods are winners for certain (but not all) applications, both stiff and nonstiff. Software exists as well. Note also that a number of the features and additional topics described in the previous chapter, including for example global error estimation, dense output and waveform relaxation, are equally relevant here.

5.6.2 Software

A variety of excellent and widely used software based on linear multistep methods is readily available. A few of the codes are described here.

• ode, written by Shampine and described in detail in [86], is based on variable-coefficient Adams PECE methods. It is useful for nonstiff problems, and has a feature to diagnose stiffness.

• vode, written by Hindmarsh, Brown and Byrne [21], offers fixed leading-coefficient Adams and BDF methods. The implicit formulae are solved via functional iteration or modified Newton, depending on the option selected. Thus, this code has options to deal with both stiff and nonstiff problems.

• difsub, written by Gear [43], solves stiff problems and was a very influential code popularizing the BDF methods.

• vodpk is an extension of vode for large-scale stiff systems. In addition to the direct methods for solving linear systems used in vode, vodpk offers the option of preconditioned Krylov iterative methods (see, e.g., [48, 14]); the user must write a routine which gives the preconditioner, and this in some applications is a major task.

• dassl and daspk [19] are based on fixed leading-coefficient BDF formulae and can accommodate differential-algebraic equations as well as stiff ODEs (see Chapter 10).
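For instance, vode is accessible from Python through scipy.integrate.ode (a SciPy interface detail, not part of the original Fortran distribution). A minimal sketch (ours) on the stiff case of Exercise 2 below:

    import numpy as np
    from scipy.integrate import ode

    def f(t, y):
        # y' = lam*(y - sin t) + cos t; the exact solution is e^{lam*t} + sin t
        lam = -500.0
        return lam * (y - np.sin(t)) + np.cos(t)

    solver = ode(f).set_integrator("vode", method="bdf", rtol=1e-6, atol=1e-9)
    solver.set_initial_value(np.array([1.0]), 0.0)
    y_end = solver.integrate(1.0)
    print(y_end, np.exp(-500.0) + np.sin(1.0))   # BDF handles h*Re(lambda) << -1 easily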


5.7 Exercises

1. (a) Construct a consistent, unstable multistep method of order 2 (other than the one in Example 5.6).

   (b) Is it possible to construct a consistent, unstable one-step method of order 2? Why?

2. For the numerical solution of the problem

       y' = λ(y - sin t) + cos t,    y(0) = 1,    0 ≤ t ≤ 1,

   whose exact solution is y(t) = e^{λt} + sin t, consider using the following four 2-step methods, with y_0 = 1 and y_1 = y(h) (i.e. using the exact solution so as not to worry here about y_1):

   (a) Your unstable method from the previous question,
   (b) The midpoint 2-step method y_n = y_{n-2} + 2h f_{n-1},
   (c) Adams-Bashforth y_n = y_{n-1} + (h/2)(3f_{n-1} - f_{n-2}),
   (d) BDF y_n = (4y_{n-1} - y_{n-2})/3 + (2h/3) f_n.

   Consider using h = .01 for λ = 10, λ = -10 and λ = -500. Discuss the expected quality of the obtained solutions in these 12 calculations. Try to do this without calculating any of these solutions. Then confirm your predictions by doing the calculations.

3. Write a program which, given k and the values of some of the coefficients α_1, α_2, ..., α_k, β_0, β_1, ..., β_k of a linear k-step method, will

   • find the rest of the coefficients, i.e., determine the method, such that the order of the method is maximized,
   • find the error coefficient C_{p+1} of the leading local truncation error term.

   Test your program to verify the 2nd and the last rows in each of the Tables 5.1 and 5.2. Now use your program to find C_{p+1} for each of the six BDF methods in Table 5.3.


4. Write a program which, given a linear multistep method, will test whether the method is

   • 0-stable,
   • strongly stable.

   [Hint: This is a very easy task using Matlab, for example.]

   Use your program to show that the first 6 BDF methods are strongly stable, but the 7-step and 8-step BDF methods are unstable. (For this you may want to combine your program with the one from the previous exercise.)

5. The famous Lorenz equations provide a simple example of a chaotic system (see, e.g., [92, 93]). They are given by

       y' = f(y) = ( σ(y_2 - y_1),  r y_1 - y_2 - y_1 y_3,  y_1 y_2 - b y_3 )^T,

   where σ, r, b are positive parameters. Following Lorenz we set σ = 10, b = 8/3, r = 28 and integrate starting from y(0) = (0, 1, 0)^T. Plotting y_3 vs. y_1 we obtain the famous "butterfly" depicted in Fig. 5.7.

   (a) Using a software package of your choice, integrate these equations for 0 ≤ t ≤ 100 with an error tolerance 1.e-6, and plot y_3 vs. y_1, as well as y_2 as a function of t. What do you observe?

   (b) Plot the resulting trajectory in the three dimensional phase space (i.e. the three y-coordinates; if in Matlab, type 'help plot3'). Observe the strange attractor that the trajectory appears to settle into.

   (c) Chaotic solutions are famous for their highly sensitive dependence on initial data. This leads to unpredictability of the solution (and the physical phenomena it represents). When solving numerically we also expect large errors to result from the numerical discretization. Recompute your trajectory with the same initial data using the same package, changing only the error tolerance to 1.e-7. Compare the values of y(100) for the two computations, as well as the plots in phase plane. Discuss.

6. (a) Show that the only k-step method of order k which has the stiff decay property is the k-step BDF method.


[Figure 5.7: Lorenz "butterfly" in the y_1-y_3 plane.]

   (b) Is it possible to design a strongly stable linear multistep method of order 7 which has stiff decay?

7. Explain why it is not a good idea to use an Adams-Bashforth method to predict the first iterate y_n^0 to start a Newton method for a BDF step when solving a stiff problem.

8. Given an ODE system y' = f(t, y), y(0) = y_0, we can calculate y'(0) = f(0, y_0). The initial derivatives are used in modern BDF codes to estimate the error reliably.

   Consider the opposite problem: given y'(T) at some t = T, find y(T) satisfying the ODE. For example, finding the ODE solution at a steady state corresponds to specifying y'(T) = 0.

   (a) What is the condition necessary to find y(T), given y'(T)? How would you go about finding y(T) in practice? [Note also the possibility for multiple solutions, in which case we want the condition for finding an isolated solution.]

   (b) Suppose that the condition for solvability that you have just specified does not hold, but it is known that the solution of the IVP satisfies a set of nonlinear equations at each t,

       0 = h(t, y).


   How would you modify the solvability condition? How would you implement it? [Hint: Exercise 5.9 provides an example.]

9. The following ODE system due to H. Robertson models a chemical reaction system and has been used extensively as a test problem for stiff solvers [85, 52]:

       y_1' = -α y_1 + β y_2 y_3,
       y_2' = α y_1 - β y_2 y_3 - γ y_2²,
       y_3' = γ y_2².

   Here α = 0.04, β = 1.e+4, and γ = 3.e+7 are slow, fast and very fast reaction rates. The starting point is y(0) = (1, 0, 0)^T.

   (a) It is known that this system reaches a steady state, i.e., where y' = 0. Show that ∑_{i=1}^{3} y_i(t) = 1, 0 ≤ t ≤ b; then find the steady state.

   (b) Integrate the system using a nonstiff code with a permissive error tolerance (say 1.e-2) for the interval length b = 3, just to see how inadequate a nonstiff solver can be. How far is y(b) from the steady state?

   (c) The steady state is reached very slowly for this problem. Use a stiff solver to integrate the problem for b = 1.e+6 and plot the solution on a semilog scale in t. How far is y(b) from the steady state?

10. Consider the following 2-step method [10],

        y_n - y_{n-1} = (h/16)(9f_n + 6f_{n-1} + f_{n-2}).    (5.13)

    Investigate the properties of this method in comparison with the 1-step and the 2-step Adams-Moulton formulae. Does this method have any advantage? [Hint: when h Re(λ) → -∞ one must consider the roots of the characteristic polynomial σ(ξ) of (5.7b).]

11. The border of the absolute stability region is the curve in the hλ-plane where |y_n| = |ξ^n| = |ξ^{n-1}|, for ξ satisfying ρ(ξ) = hλ σ(ξ). Occasionally it is interesting to plot the region where the approximate solution for y' = λy is actually dampened by a factor δ < 1, i.e. |ξ| = δ (recall Exercise 4.8).

    (a) Show that the boundary of this δ-region is given by

        hλ = ρ(δ e^{iθ}) / σ(δ e^{iθ}).


    (b) Plot the δ-curves with δ = .9, .5, .1 for the backward Euler, trapezoidal and (5.13) methods. Discuss your observations.

12. Often in practice one has to solve an ODE system of the form

        y' = f(t, y) + g(t, y),    t ≥ 0,    (5.14)

    where f and g have significantly different characteristics. For instance, f may be nonlinear and the ODE z' = f(t, z) is nonstiff, while g is linear but the ODE z' = g(t, z) is stiff. This suggests mixing an explicit method for f with an implicit method, suitable for stiff problems, for g. An implicit-explicit (IMEX) [10] k-step method has the form

        ∑_{j=0}^{k} α_j y_{n-j} = h ∑_{j=1}^{k} β_j f_{n-j} + h ∑_{j=0}^{k} γ_j g_{n-j}.    (5.15)

    The combination of 2-step Adams-Bashforth for f and trapezoidal rule for g is common in the PDE literature (especially in combination with spectral methods). Show that:

    (a) The method (5.15) has order p if

        ∑_{j=0}^{k} α_j = 0,
        (1/i!) ∑_{j=1}^{k} j^i α_j = -(1/(i-1)!) ∑_{j=1}^{k} j^{i-1} β_j = -(1/(i-1)!) ∑_{j=0}^{k} j^{i-1} γ_j    (5.16)

        for i = 1, 2, ..., p, and such a condition does not hold for i = p+1.

    (b) The 2p+1 constraints (5.16) are linearly independent, provided that p ≤ k; thus, there exist k-step IMEX methods of order k.

    (c) A k-step IMEX method cannot have order greater than k.

    (d) The family of k-step IMEX methods of order k has k parameters.

13. A convection-diffusion partial differential equation in one space variable has the form (recall Examples 1.3 and 1.7)

        ∂u/∂t = u ∂u/∂x + ∂/∂x ( p(x) ∂u/∂x ),    0 ≤ x ≤ 1,  t ≥ 0,

    where p = p(x) > 0 is a given function which may be small in magnitude (in which case the equation is said to be convection-dominated).


    We now apply the method of lines (cf. §§1.1, 1.3). Discretizing in x on a mesh 0 = x_0 < x_1 < ... < x_J = 1, Δx_i = x_i - x_{i-1}, Δx = max_i Δx_i, let y_i(t) be the approximation along the line of u(x_i, t), and obtain the ODE system of the form (5.14),

        y_i' = y_i (y_{i+1} - y_{i-1}) / (Δx_i + Δx_{i+1})
               + (2 / (Δx_i + Δx_{i+1})) [ (p_{i+1/2}/Δx_{i+1})(y_{i+1} - y_i) - (p_{i-1/2}/Δx_i)(y_i - y_{i-1}) ]
             = f_i(y) + g_i(y),    i = 1, ..., J-1.    (5.17)

    Here, p_{i-1/2} = (1/2)(p(x_i) + p(x_{i-1})) or, if p is a rough function, we choose the harmonic average

        p_{i-1/2} ≈ Δx_i ( ∫_{x_{i-1}}^{x_i} p^{-1}(x) dx )^{-1}.

    If p is small then the centered discretization leading to f_i(y) is questionable, but we do not pursue this further here.

    It is natural to apply an IMEX method to (5.17), since the nonlinear convection term typically yields an absolute stability requirement of the form h ≤ const · Δx, which is not difficult to live with, whereas the linear diffusion term is stiff (unless p is very small). Moreover, due to the hyperbolic nature of the convection term and the parabolic nature of the diffusion term, an appropriate test equation to investigate the stability properties of the IMEX method (5.15) is

        y' = (a + ib) y    (5.18)

    with a, b real constants, a < 0 (i = √-1), and where we identify f(y) = iby and g(y) = ay in (5.14), (5.15).

    (a) What is the domain of absolute stability for an IMEX method with respect to this test equation? What corresponds to a δ-region as in Exercise 5.11?

    (b) Plot δ-curves with δ = 1, .9, .5, .1 for the following 2-step IMEX methods:

        • Adams-Bashforth with trapezoidal method:
          y_n = y_{n-1} + (h/2)(3f_{n-1} - f_{n-2} + g_n + g_{n-1}).

        • Adams-Bashforth with (5.13):
          y_n = y_{n-1} + (h/16)(24f_{n-1} - 8f_{n-2} + 9g_n + 6g_{n-1} + g_{n-2}).

        • Semi-explicit BDF:
          y_n = (1/3)(4y_{n-1} - y_{n-2}) + (2h/3)(2f_{n-1} - f_{n-2} + g_n).

        Discuss your observations.


Chapter 6

More BVP Theory and Applications

In this chapter and the next two we will consider an ODE system with m components,

    y' = f(t, y),    0 < t < b,    (6.1)

subject to m two-point boundary conditions

    g(y(0), y(b)) = 0.    (6.2)

We denote the Jacobian matrices of g(u, v) with respect to its first and second argument vectors by

    B_0 = ∂g/∂u,    B_b = ∂g/∂v.    (6.3)

Often in applications g is linear, i.e. the boundary conditions can be written as

    B_0 y(0) + B_b y(b) = b    (6.4)

for some given data b, and the m × m matrices B_0 and B_b are constant. [Footnote: Note that the data vector b and the interval end b are not related. Alas, we seem to be running out of good notation.]

Also, often in applications the boundary conditions are separated, i.e. each of the components of g is given either at t = 0 or at t = b, but none involves both ends simultaneously. [Footnote: A notable exception is the case of periodic boundary conditions.] In this case for each i, 1 ≤ i ≤ m, either the ith row of B_0 or the ith row of B_b is identically zero.

Example 6.1 Recall the vibrating spring Example 1.4,

    -(p(t)u')' + q(t)u = r(t),

Page 179: Ascher Petzold

164 Chapter 6: Boundary Value Problemsu(0) = 0; u0(b) = 0where p(t) > 0, q(t) � 0 for all 0 � t � b. In more independent variables, aproblem like this corresponds to an elliptic partial di�erential equation.To convert this into a system we have two popular options.� The standard option is to set y1 = u, y2 = u0, resulting in a system ofthe form (6.1) with m = 2.� Often in practice the function p(t) has discontinuities. In this case it isbetter to de�ne the unknowns y1 = u and y2 = py01 (this y2 is sometimesreferred to as the ux). This gives an ODE system in the form (6.1)with f(t;y) =0@ p�1y2qy1 � r1A.The boundary conditions are separated and are given for both choices of un-knowns by B0 = 0@1 00 01A ; Bb = 0@0 00 11A ; b = 0 : 2We have already seen in Chapter 1 that there is no chance for extendingthe general Existence and Uniqueness Theorem 1.1 for initial value problemsto the boundary value problem case. In particular, assuming that the con-ditions of that theorem hold for f , for each initial value vector c we have asolution y(t) = y(t; c) for the ODE (6.1) satisfying y(0; c) = c. Substitutinginto (6.2) we have g(c;y(b; c)) = 0:This gives a set of m nonlinear algebraic equations for the m unknownsc (unknown, because we are asking what initial conditions would yield asolution that satis�es the boundary conditions). It is well-known that ingeneral such a system may have many solutions, one, or none at all.Example 6.2 The problem u00 + eu+1 = 0u(0) = u(1) = 0has two solutions of the formu(t) = �2 lnfcosh[(t� 1=2)�=2]cosh(�=4) g

Page 180: Ascher Petzold

Chapter 6: More BVP Theory and Applications 1650 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.5

1

1.5

2

2.5

3

Figure 6.1: Two solutions u(t) for the BVP of Example 6.2.where � satis�es � = p2e cosh(�=4) :This nonlinear algebraic equation has two solutions for � (Exercise 6.1). Thecorresponding two solutions of the BVP are plotted in Fig. 6.1. 2The possibility of having more than one solution does not in itself preventus from expecting to be able to �nd them. The question of existence ofunique solutions for a nonlinear BVP must be considered in a local sense.The important question is whether a BVP solution is isolated, i.e., if thereis a neighborhood about it in which it is the only solution. For this purposewe look at the variational problem for the BVP (6.1)-(6.2): assuming forsimplicity of notation that g is linear, i.e. that the boundary conditions arein the form (6.4), the variational problem corresponding to linearizing theproblem about an exact solution y(t) isz0 = A(t;y(t))z (6.5)B0z(0) +Bbz(b) = 0where A = @f@y is the Jacobian matrix. Now, if the variational problem hasthe unique solution z = 0 then the solution y(t) of the given nonlinear prob-lem is isolated, or locally unique. We will show this claim following (6.14)below. The uniqueness of the zero solution z means that the linearizationis nonsingular and this gives us a �ghting chance at �nding isolated solu-tions using the numerical methods described in the next two chapters. ForExample 6.2, it can be veri�ed that both solutions are isolated.


In order to understand the issues arising in the numerical solution of BVPs, then, we must get a better idea of the theory of linear BVPs.

6.1 Linear Boundary Value Problems and Green's Function

Consider the linear ODE system of $m$ equations,
\[ y' = A(t) y + q(t), \qquad 0 < t < b \qquad (6.6) \]
and recall that a fundamental solution $Y(t)$ is the $m \times m$ matrix function satisfying
\[ Y' = A(t) Y, \qquad 0 < t < b \]
and $Y(0) = I$. Using this fundamental solution, the general solution of the ODE (6.6) is
\[ y(t) = Y(t) \left[ c + \int_0^t Y^{-1}(s) q(s)\, ds \right] . \qquad (6.7) \]
The parameter vector $c$ in (6.7) is determined by the linear boundary conditions (6.4). Substituting, we get
\[ [B_0 Y(0) + B_b Y(b)]\, c = b - B_b Y(b) \int_0^b Y^{-1}(s) q(s)\, ds . \]
The right hand side in the above expression depends on the given data. Thus we have obtained a basic existence and uniqueness theorem for linear boundary value problems.

Theorem 6.1 Let $A(t)$ and $q(t)$ be continuous and define the matrix
\[ Q = B_0 + B_b Y(b) \qquad (6.8) \]
(remember, $Y(0) = I$). Then
- The linear BVP (6.6),(6.4) has a unique solution iff $Q$ is nonsingular.
- If $Q$ is nonsingular then the solution is given by (6.7) with
\[ c = Q^{-1} \left[ b - B_b Y(b) \int_0^b Y^{-1}(s) q(s)\, ds \right] . \]


Example 6.3 Returning to the first example in Chapter 1,
\[ u'' = -u, \qquad u(0) = b_1, \ u(b) = b_2 \]
we write this in first order form with
\[ A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad B_0 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B_b = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} . \]
It is easy to verify that
\[ Y(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} \]
so
\[ Q = B_0 + B_b \begin{pmatrix} \cos b & \sin b \\ -\sin b & \cos b \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \cos b & \sin b \end{pmatrix} . \]
This matrix is singular iff $b = j\pi$ for some integer $j$. Theorem 6.1 now implies that a unique solution exists if $b \ne j\pi$, for any integer $j$ (see Fig. 1.1). $\Box$

The fundamental solution $Y(t)$ satisfies $Y(0) = I$, i.e., it is scaled for an initial value problem. A better scaled fundamental solution for the BVP at hand is
\[ \Phi(t) = Y(t) Q^{-1} . \qquad (6.9) \]
Note that $\Phi$ satisfies the homogeneous ODE, i.e. it is indeed a fundamental solution. We have
\[ \Phi' = A \Phi, \qquad 0 < t < b \qquad (6.10) \]
\[ B_0 \Phi(0) + B_b \Phi(b) = I . \]
So $\Phi(t)$ plays the same role for the BVP as $Y(t)$ plays for the IVP.

We often refer to the columns of the scaled fundamental solution $\Phi(t)$ (or $Y(t)$ in the IVP case) as solution modes, or just modes for short. They indicate the solution sensitivity to perturbation in the initial data (recall Chapter 2).
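As a quick numerical cross-check of this example (our own sketch, not part of the text), one can build $Y(b)$ column by column with an initial value solver, form $Q$, and watch $Q$ become singular as $b$ approaches $\pi$; scipy is an assumed dependency here.

import numpy as np
from scipy.integrate import solve_ivp

# Build Y(b) by integrating Y' = A Y, Y(0) = I, one column at a time,
# then form Q = B0 + Bb Y(b) as in (6.8).
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B0 = np.array([[1.0, 0.0], [0.0, 0.0]])
Bb = np.array([[0.0, 0.0], [1.0, 0.0]])

def Q_of(b):
    Y = np.empty((2, 2))
    for j in range(2):
        sol = solve_ivp(lambda t, y: A @ y, (0.0, b), np.eye(2)[:, j],
                        rtol=1e-10, atol=1e-12)
        Y[:, j] = sol.y[:, -1]
    return B0 + Bb @ Y

for b in (1.0, 3.0, np.pi - 1e-3, np.pi):
    print(b, np.linalg.cond(Q_of(b)))   # cond(Q) blows up as b -> j*pi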

If we carry out the suggestion in Theorem 6.1 and substitute the expression for $c$ into (6.7) then we get an expression for the solution $y(t)$ in terms of the data $b$ and $q(t)$. Rearranging, this gives
\[ y(t) = \Phi(t)\, b + \int_0^b G(t, s) q(s)\, ds \qquad (6.11) \]
where $G(t, s)$ is the Green's function
\[ G(t, s) = \begin{cases} \Phi(t) B_0 \Phi(0) \Phi^{-1}(s) & s \le t \\ -\Phi(t) B_b \Phi(b) \Phi^{-1}(s) & s > t \end{cases} \qquad (6.12) \]
Green's function may be loosely viewed as the inverse of the differential operator (or, as the solution operator).

6.2 Stability of Boundary Value Problems

To understand the fundamental issues in stability for BVPs, the reader must be familiar with the rudiments of stability of IVPs. Therefore, please make sure that you are familiar with the contents of Chapter 2.

Consider the test equation $y_1' = \lambda y_1$ for $0 < t < b$ and regard $b$ as very large. The IVP (e.g. $y_1(0) = 1$) is stable if $\mathrm{Re}(\lambda) \le 0$. Now apply a variable transformation $\tau = b - t$. The same problem in $\tau$ then reads $\frac{dy_2}{d\tau} = -\lambda y_2$, with $y_2(b) = 1$, i.e. this is a terminal value problem and we are integrating from $b$ to $0$, see Fig. 6.2. Of course, reversing the direction of time does not affect the stability, which has to do with the effect of small changes in the data on the solution, so this terminal value problem is stable as well (for $\mathrm{Re}(-\lambda) \le 0$).

[Figure 6.2: The function y1(t) and its mirror image y2(t) = y1(b - t), for lambda = -2, b = 10.]


Putting the two together, we obtain that the following BVP is stable:
\[ y' = A y, \qquad A = \begin{pmatrix} \lambda & 0 \\ 0 & -\lambda \end{pmatrix} \]
\[ y_1(0) = 1, \qquad y_2(b) = 1 \]
although the IVP for the same ODE is unstable when $\mathrm{Re}(\lambda) \ne 0$. Thus, the stability of solutions for a given ODE depends on how (and where) the boundary conditions are specified.

For a general linear BVP (6.6), (6.4), the sensitivity of the solution to perturbations in the data is immediately given by introducing bounds in (6.11), because this formula gives the solution in terms of the data. Let the stability constant of the BVP be defined by
\[ \kappa = \max( \|\Phi\|_\infty, \|G\|_\infty ) . \qquad (6.13) \]
Then from (6.11),
\[ \|y\| = \max_{0 \le t \le b} |y(t)| \le \kappa \left[ |b| + \int_0^b |q(s)|\, ds \right] . \qquad (6.14) \]
Rather than considering families of problems with $b$ becoming unbounded, we shall say qualitatively that the linear BVP is stable if the stability constant $\kappa$ is of moderate size. Roughly, "moderate size" means not much larger than the magnitude of the problem's coefficients, $\|A(t)\|\, b$.

Why is (6.14) a stability bound? Consider a perturbed problem, $\bar{y}' = A(t)\bar{y} + \bar{q}(t)$, $B_0 \bar{y}(0) + B_b \bar{y}(b) = \bar{b}$. Thus, the inhomogeneities are perturbed by $\delta(t) = \bar{q}(t) - q(t)$ and $\beta = \bar{b} - b$. Then the perturbation in the solution, $x(t) = \bar{y}(t) - y(t)$, satisfies the same linear BVP (6.6), (6.4) for the perturbation in the data,
\[ x' = A(t) x + \delta(t), \qquad 0 < t < b \]
\[ B_0 x(0) + B_b x(b) = \beta . \]
So (6.14) bounds $x$ in terms of the perturbations in the data,
\[ \|x\| \le \kappa \left[ |\beta| + \int_0^b |\delta(s)|\, ds \right] . \qquad (6.15) \]

Now we can further explain the concept of an isolated solution for the nonlinear problem (6.1), (6.4). Suppose that $y(t)$ is a non-isolated solution, i.e., for any arbitrarily small $\epsilon > 0$ there is another solution $\bar{y}$ which satisfies $\bar{y}' = f(t, \bar{y})$, $B_0 \bar{y}(0) + B_b \bar{y}(b) = b$, $\|\bar{y} - y\| = \epsilon$. Then the difference $x(t) = \bar{y}(t) - y(t)$ satisfies
\[ x' = f(t, \bar{y}) - f(t, y) = A(t; y(t))\, x + O(\epsilon^2), \qquad 0 < t < b \]
\[ B_0 x(0) + B_b x(b) = 0 . \]


Note that the variational problem (6.5) has the unique zero solution iff the corresponding $Q$ of (6.8) is nonsingular. But if $Q$ is nonsingular then for some finite $\kappa$ we get from (6.15) that
\[ \epsilon = \|x\| \le \kappa\, O(\epsilon^2) . \]
This inequality cannot hold if $\epsilon$ is arbitrarily small and positive. Hence, the nonsingularity of the variational problem implies that $y(t)$ is an isolated solution of the nonlinear problem.

The stability of the problem essentially means that Green's function is nicely bounded. Consider next the case of separated boundary conditions, i.e., assume that the first $k$ rows of $B_b$ and the last $m - k$ rows of $B_0$ are all zeros. Then from (6.10), clearly
\[ B_0 \Phi(0) = P = \begin{pmatrix} I_k & 0 \\ 0 & 0 \end{pmatrix}, \qquad B_b \Phi(b) = I - P \]
where $I_k$ is the $k \times k$ identity, so $P$ is an orthogonal projection matrix (meaning $P^2 = P$) of rank $k$. In this case we can write Green's function as
\[ G(t, s) = \begin{cases} \Phi(t) P \Phi^{-1}(s) & s \le t \\ -\Phi(t) (I - P) \Phi^{-1}(s) & s > t \end{cases} \]
The BVP is said to have dichotomy if there is a constant $K$ of moderate size such that
\[ \|\Phi(t) P \Phi^{-1}(s)\| \le K, \qquad s \le t \qquad (6.16a) \]
\[ \|\Phi(t) (I - P) \Phi^{-1}(s)\| \le K, \qquad s > t . \qquad (6.16b) \]
The BVP has exponential dichotomy if there are positive constants $\lambda, \mu$ such that
\[ \|\Phi(t) P \Phi^{-1}(s)\| \le K e^{\lambda(s - t)}, \qquad s \le t \qquad (6.17a) \]
\[ \|\Phi(t) (I - P) \Phi^{-1}(s)\| \le K e^{\mu(t - s)}, \qquad s > t . \qquad (6.17b) \]
Dichotomy and exponential dichotomy correspond to stability and asymptotic stability, respectively, in IVPs. (Compare (2.10) with (6.16a) for $k = m$.) Dichotomy implies that the first $k$ columns of $\Phi(t)$ are non-increasing (actually decreasing in case of exponential dichotomy) as $t$ grows, and that the last $m - k$ columns of $\Phi(t)$ are nondecreasing (actually increasing in case of exponential dichotomy) as $t$ grows. The $k$ non-increasing modes are controlled in size by the boundary conditions at $0$, whereas the $m - k$ nondecreasing modes are controlled in size by the boundary conditions at $b$. Dichotomy is a necessary and sufficient condition for stability of the BVP.

The situation for non-separated boundary conditions is much more complicated, although the conclusions remain essentially the same.

Example 6.4 For the problem
\[ u'' = u, \qquad u(0) = b_1, \ u(b) = b_2 \]
i.e., with the ODE different from Example 6.3 but the boundary conditions the same, we convert to first order form with $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. The fundamental solution satisfying $Y(0) = I$ is
\[ Y(t) = \begin{pmatrix} \cosh t & \sinh t \\ \sinh t & \cosh t \end{pmatrix} . \]
Clearly, $\|Y(t)\|$ grows exponentially with $t$, indicating that the initial value problem is unstable. For the boundary value problem, however, we have
\[ Q = B_0 Y(0) + B_b Y(b) = \begin{pmatrix} 1 & 0 \\ \cosh b & \sinh b \end{pmatrix}, \]
so
\[ \Phi(t) = Y(t) Q^{-1} = \frac{1}{\sinh b} \begin{pmatrix} \sinh(b - t) & \sinh t \\ -\cosh(b - t) & \cosh t \end{pmatrix} . \]
Thus, the first column of $\Phi(t)$ (here $k = 1$ and $m = 2$) is decreasing in $t$ and the second column of $\Phi(t)$ is increasing. Both of these columns are nicely scaled:
\[ \|\Phi\| \approx 1 \]
even though $Q$ becomes extremely ill-conditioned as $b$ grows. We leave it as Exercise 6.2 to show that this boundary value problem is stable and has exponential dichotomy. $\Box$

6.3 BVP Stiffness

In §3.4 we introduced the notion of stiffness for IVPs. In the terminology of the previous section, a stiff (linear) problem is a stable problem which has very fast modes. For an IVP such modes can only be decreasing. But for a stable BVP we must entertain the possibility of both rapidly decreasing and rapidly increasing modes being present.

Corresponding to (3.23) in §3.4 we say that a stable BVP for the test equation
\[ y' = \lambda y, \qquad 0 < t < b \]
is stiff if
\[ b\, |\mathrm{Re}(\lambda)| \gg 1 . \qquad (6.18) \]
In contrast to the IVP case, here we no longer require $\mathrm{Re}(\lambda) < 0$. Similarly to (3.24), this generalizes for a nonlinear system $y' = f(t, y)$ to
\[ b\, |\mathrm{Re}(\lambda_j)| \gg 1 \qquad (6.19) \]
where $\lambda_j$ are the eigenvalues of the local Jacobian matrix $\frac{\partial f}{\partial y}(t, y(t))$. (Of course, $\lambda_j = \lambda_j(t)$ may in general have a large real part in some parts of the interval and a small (in magnitude) real part in others, but let us assume here, for simplicity of the exposition, that this does not happen.)

This extension of the IVP definition makes sense, in light of the discussion of dichotomy in the previous section. The practical understanding of the qualitative notion behind the inequalities in (6.18) and (6.19) is that we must look for numerical methods that work also when $h\, |\mathrm{Re}(\lambda_j)| \gg 1$, where $h$ is a typical discretization step size.

However, this is easier said than done. There are really no known discretization methods which have a robustness similar to that of backward Euler and its higher order extensions (e.g. BDF methods and collocation at Radau points) in the IVP case. The methods discussed in the next chapter, and other variants which are not discussed there, are not suitable for very stiff BVPs. Symmetric difference methods like midpoint, which are our methods of choice for BVPs and are discussed in Chapter 8, often perform well in practice for stiff BVPs, but their theoretical foundation is somewhat shaky in this case, as discussed further in Chapter 8. There are methods (e.g. Riccati) which attempt to decouple rapidly increasing and rapidly decreasing modes explicitly, and then integrate such modes only in their corresponding stable directions. But these methods appear more suitable for special applications than for general-purpose use for nonlinear problems. To explicitly decouple modes, especially for nonlinear problems, is no easy task.

6.4 Some Reformulation Tricks

While general-purpose codes for BVPs usually assume a system of the form (6.1) subject to boundary conditions of the form (6.2) or, even more frequently, separated boundary conditions, the natural way in which boundary value problems arise in applications often does not conform to this standard form. An example that we have already seen is the conversion from a higher order ODE system to a first order system. There are other, less obvious situations where a given BVP can be reformulated. Of course all this can be said of IVPs as well, but there is more diversity in the BVP case. There are a number of reformulation "tricks" that can be used to convert a given problem to standard form, of which we describe a few basic ones here.

In many applications, the ODE system depends on an unknown constant, $a$, and this gives rise to an additional boundary condition. One can then add the ODE
\[ a' = 0 \]
to the system. This means that the constant $a$ is viewed as a function over the interval of integration which is independent of $t$.

Example 6.5 The flow in a channel can be modeled by the ODE
\[ f''' - R[(f')^2 - f f''] + R a = 0 \]
\[ f(0) = f'(0) = 0, \qquad f(1) = 1, \ f'(1) = 0 . \]
The constant $R$ (Reynolds number) is known, but the constant $a$ is undetermined. There are 4 boundary conditions on the potential function $f$ which determine both it and $a$. To convert to standard form we write $y_1 = f$, $y_2 = f'$, $y_3 = f''$, $y_4 = a$, and obtain
\[ y' = f(y) = \begin{pmatrix} y_2 \\ y_3 \\ R[y_2^2 - y_1 y_3 - y_4] \\ 0 \end{pmatrix} . \]
The boundary conditions are obviously in separated, standard form as well. $\Box$
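As an illustration of this trick in action (our own sketch, not from the text), Example 6.5 in this standard form can be handed directly to a library BVP solver; the use of scipy's solve_bvp and the value of $R$ below are our assumptions.

import numpy as np
from scipy.integrate import solve_bvp

# Example 6.5 in standard form: the unknown constant a becomes y4 with y4' = 0.
R = 10.0   # illustrative Reynolds number

def rhs(t, y):
    y1, y2, y3, y4 = y
    return np.vstack([y2, y3, R * (y2**2 - y1*y3 - y4), np.zeros_like(y4)])

def bc(ya, yb):
    # f(0) = f'(0) = 0, f(1) = 1, f'(1) = 0
    return np.array([ya[0], ya[1], yb[0] - 1.0, yb[1]])

t = np.linspace(0.0, 1.0, 11)
y0 = np.zeros((4, t.size))
y0[0] = t**2 * (3 - 2*t)      # crude initial profile satisfying the BCs
y0[1] = 6 * t * (1 - t)
sol = solve_bvp(rhs, bc, t, y0)
print(sol.success, sol.y[3, 0])   # the recovered constant a = y4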

The unknown constant can be the size of the interval of integration. Assuming that the problem is given in the form (6.1) but with the integration range $b$ unknown, we can apply the change of variable
\[ \tau = t/b \]
to obtain the ODE system
\[ \frac{dy}{d\tau} = b\, f(b\tau, y), \qquad 0 < \tau < 1 \]
\[ \frac{db}{d\tau} = 0 . \]
So, the new vector of unknowns is $\begin{pmatrix} y(\tau) \\ b \end{pmatrix}$ and its length is $m + 1$. This is also the number of independent boundary conditions that should be given for this system.

The unknown constants trick can also be used to convert non-separated boundary conditions to separated boundary conditions, at the expense of increasing the size of the ODE system. (In the end, representing a constant by an unknown function to be integrated throughout the interval in $t$ is never very economical, so there has to be a good reason for doing this.) We have seen in the previous section that the theory is easier and simpler for the case of separated boundary conditions. This tends also to be reflected in simpler solution methods for the linear systems arising from a finite difference or multiple shooting discretization.

Given the boundary conditions $g(y(0), y(b)) = 0$, let $a = y(0)$ be our unknown constants. Then we can rewrite the system in the form
\[ y' = f(t, y) \]
\[ a' = 0, \]
with
\[ y(0) = a(0) \]
\[ g(a(b), y(b)) = 0 . \]

6.5 Notes and References

Chapter 3 of Ascher, Mattheij & Russell [8] contains a much more detailed account of the material presented in §6.1 and §6.2, which includes the various extensions and proofs mentioned here. Classical references on Green's function and on dichotomy are Stakgold [88] and Coppel [32], respectively. For periodic solutions, see Stuart & Humphries [93]. Stiffness and decoupling in the linear case are discussed at length in [8], where more reformulation examples and references can be found as well.


6.6 Exercises

1. Show that the equation
\[ \lambda = \sqrt{2e} \cosh(\lambda/4) \]
has two solutions $\lambda$.

2. (a) Show that the problem in Example 6.4 is stable for all $b > 0$ and has exponential dichotomy. What are its Green's function and stability constant?
(b) Same question for the periodic boundary conditions
\[ u(0) = u(b), \qquad u'(0) = u'(b) . \]

3. Consider the problem
\[ u''' = 2u'' + u' - 2u, \qquad 0 < t < b \]
\[ u'(0) = 1, \qquad u(b) - u'(b) = 0, \qquad u(\beta) = 1 \]
with $b = 100$.
(a) Convert the ODE to a first order system and find its fundamental solution satisfying $Y(0) = I$.
[Hint: another, well scaled fundamental solution is
\[ \Psi(t) = \begin{pmatrix} e^{-t} & e^{t-b} & e^{2(t-b)} \\ -e^{-t} & e^{t-b} & 2e^{2(t-b)} \\ e^{-t} & e^{t-b} & 4e^{2(t-b)} \end{pmatrix} \]
and recall that $Y(t) = \Psi(t) R$ for some constant matrix $R$.]
(b) It is not given whether the last boundary condition is prescribed at $\beta = 0$ or at $\beta = b$. But it is known that the BVP is stable (with stability constant $\kappa < 20$). Determine where this boundary condition is prescribed.

4. Consider an ODE system of size $m$,
\[ y' = f(t, y) \qquad (6.20a) \]
where $f$ has bounded first and second partial derivatives, subject to initial conditions
\[ y(0) = c \qquad (6.20b) \]
or boundary conditions
\[ B_0 y(0) + B_b y(b) = b . \qquad (6.20c) \]
It is often important to determine the sensitivity of the problem with respect to the data $c$ or $b$. For instance, if we change $c_j$ to $c_j + \epsilon$ for some $j$, $1 \le j \le m$, where $|\epsilon| \ll 1$, and call the solution of the perturbed problem $\bar{y}(t)$, what can be said about $|\bar{y}(t) - y(t)|$ for $t \ge 0$?
(a) Writing the solution of (6.20a),(6.20b) as $y(t; c)$, define the $m \times m$ matrix function
\[ Y(t) = \frac{\partial y(t; c)}{\partial c} . \]
Show that $Y$ satisfies the initial value problem
\[ Y' = A(t) Y \]
\[ Y(0) = I \]
where $A = \frac{\partial f}{\partial y}(t, y(t; c))$.
(b) Let $\bar{y}(t)$ satisfy (6.20a) and
\[ \bar{y}(0) = c + \epsilon d \]
where $|d| = 1$ and $|\epsilon| \ll 1$. Show that
\[ \bar{y}(t) = y(t) + \epsilon Y(t) d + O(\epsilon^2) . \]
In particular, what can you say about the sensitivity of the problem with respect to the $j$-th initial value?
(c) Answer questions analogous to (a) and (b) above regarding the sensitivity of the boundary value problem (6.20a),(6.20c) with respect to the boundary values $b$. How would a bound on $\|\bar{y} - y\|_\infty = \max_{0 \le t \le b} |\bar{y}(t) - y(t)|$ relate to the stability constant $\kappa$ of (6.13)?


Chapter 7

Shooting

Shooting is a straightforward extension to boundary value problems of the initial value techniques that we have seen so far in this book. Essentially, one "shoots" trajectories of the same ODE with different initial values until one "hits" the correct given boundary values at the other interval end. The advantages are conceptual simplicity and the ability to make use of the excellent, widely available, adaptive initial-value ODE software. But there are fundamental disadvantages as well, mainly in that the algorithm inherits its stability properties from the stability of the initial value problems that it solves, not just the stability of the given boundary value problem.

7.1 Shooting: a Simple Method and its Limitations

For a system of ODEs of order $m$,
\[ y' = f(t, y), \qquad 0 < t < b \qquad (7.1) \]
subject to $m$ two-point boundary conditions
\[ g(y(0), y(b)) = 0 \qquad (7.2) \]
we denote by $y(t) = y(t; c)$ the solution of the ODE (7.1) satisfying the initial condition $y(0; c) = c$. Substituting into (7.2) we have
\[ h(c) \equiv g(c, y(b; c)) = 0 . \qquad (7.3) \]
This gives a set of $m$ nonlinear algebraic equations for the $m$ unknowns $c$.

The simple (or single) shooting method consists of a numerical implementation of these observations, which we have used in previous chapters for theoretical purposes. Thus, one couples a program module for solving nonlinear algebraic equations (such library routines are available) with a module that, for a given $c$, solves the corresponding initial value ODE problem.

Example 7.1 Recall Example 6.2, which considers a very simple model of a chemical reaction
\[ u'' + e^{u+1} = 0 \]
\[ u(0) = u(1) = 0 . \]
The two solutions are depicted in Fig. 6.1 (only the lower one is a physically stable steady state). Converting to first order form for $y = (u, u')^T$, we know that $y_1(0) = 0 = c_1$, so only $y_2(0) = c_2$ is unknown. The IVP has a unique solution $y(t; c)$ (or $u(t; c_2)$) for each value $c_2$, even though it is not guaranteed that this solution will reach $t = 1$ for any $c_2$. But, as it turns out, this problem is easy to solve using simple shooting. With a starting "angle" of shooting (for the nonlinear iteration) $c_2^0 = 0.5$, the lower curve of Fig. 6.1 is obtained after a few Newton iterations to solve (7.3), and with a starting "angle" of shooting $c_2^0 = 10$, the high curve of Fig. 6.1 is easily obtained as well (Exercise 7.1). $\Box$

Let us consider Newton's method for the solution of the nonlinear equations (7.3). The iteration is
\[ c^{\nu+1} = c^\nu - \left( \frac{\partial h}{\partial c} \right)^{-1} h(c^\nu) \]
where $c^0$ is a starting iterate (the superscript $\nu$ is an iteration counter, not a power); different starting guesses can lead to different solutions, as in Example 7.1. To evaluate $h(c^\nu)$ at a given iterate we have to solve an IVP for $y(t; c)$ (see (7.3)). Moreover, to evaluate $\frac{\partial h}{\partial c}$ at $c = c^\nu$, we must differentiate the expression in (7.3) with respect to $c$. Using the chain rule of differentiation and the notation of (6.3), this gives
\[ \frac{\partial h}{\partial c} = B_0 + B_b Y(b) = Q \]
where $Y(t)$ is the $m \times m$ fundamental solution matrix satisfying
\[ Y' = A(t) Y, \qquad 0 < t < b \]
\[ Y(0) = I \]
with $A(t) = \frac{\partial f}{\partial y}(t, y(t; c^\nu))$ (see Chapter 6; this variational ODE should be familiar to you at this point).


We see therefore that using Newton's method, $m + 1$ IVPs are to be solved at each iteration (one for $h$ and $m$ linear ones for the columns of $Y(t)$). However, the $m$ linear systems are simple and they share the same matrix $A(t)$, which can therefore be evaluated once for all $m$ systems, so the solution of these IVPs typically costs much less than $m + 1$ times the solution of the IVP for $h$. (Solving the variational ODE is equivalent to computing the sensitivity of the solution to the original ODE (7.1) with respect to variations in the initial conditions, see §4.6.)

Once convergence has been obtained, i.e. the appropriate initial value vector $c$ which solves $h(c) = 0$ has been (approximately) found, we integrate the corresponding IVP to evaluate the solution of the BVP at any given points.

To summarize, here is the algorithm combining shooting with Newton's method for a nonlinear BVP (7.1)-(7.2).

Algorithm 7.1 Shooting with Newton

- Given
  1. $f$, $\frac{\partial f}{\partial y}$ for each $t$ and $y$;
  2. $g(u, v)$, $\frac{\partial g}{\partial u}$, $\frac{\partial g}{\partial v}$ for each $u$ and $v$;
  3. an initial-value solver;
  4. an initial guess $c^0$; and
  5. a convergence tolerance TOL for the nonlinear iteration.
- For $s = 0, 1, \dots$, until $|c^{s+1} - c^s| <$ TOL,
  1. Solve the IVP (7.1) with $y(0) = c^s$, obtaining a mesh and solution values $y_n^s$, $n = 0, \dots, N_s$.
  2. Construct $h(c^s) = g(c^s, y_{N_s}^s)$.
  3. Integrate the fundamental matrix $Y_n$, $n = 0, \dots, N_s$ ($Y_0 = I$), on the same mesh, using $A(t_n) = \frac{\partial f}{\partial y}(t_n, y_n^s)$.
  4. Form $Q = B_0 + B_b Y_{N_s}^s$ using
\[ B_0 = \frac{\partial g}{\partial u}(c^s, y_{N_s}^s), \qquad B_b = \frac{\partial g}{\partial v}(c^s, y_{N_s}^s) , \]
     and solve the linear system
\[ Q \delta = -h(c^s) \]
     for the Newton correction vector $\delta$.
  5. Set
\[ c^{s+1} = c^s + \delta . \]
- Solve the IVP for (7.1) with $y(0) = c$, with the values $c$ obtained by the Newton iteration.

We note that to maximize the efficiency of a shooting code, methods other than Newton's (e.g. quasi-Newton) should be used. We do not pursue this further, though.
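A compact Python sketch of Algorithm 7.1 for Example 7.1 follows (our own illustration; the choice of scipy's solve_ivp and the tolerances are assumptions, not prescribed by the text). For brevity the fundamental matrix is integrated together with $y$ by augmenting the IVP, rather than on a stored mesh as in step 3.

import numpy as np
from scipy.integrate import solve_ivp

def f(t, y):                        # u'' + e^{u+1} = 0 in first order form
    return np.array([y[1], -np.exp(y[0] + 1.0)])

def dfdy(t, y):
    return np.array([[0.0, 1.0], [-np.exp(y[0] + 1.0), 0.0]])

def F(t, z):
    """y together with the columns of the fundamental matrix Y."""
    y, Y = z[:2], z[2:].reshape(2, 2)
    return np.concatenate([f(t, y), (dfdy(t, y) @ Y).ravel()])

B0 = np.array([[1.0, 0.0], [0.0, 0.0]])   # Jacobians of g(u, v) = (u_1, v_1)
Bb = np.array([[0.0, 0.0], [1.0, 0.0]])

c = np.array([0.0, 0.5])   # shooting "angle" c2 = 0.5; try 10.0 for the upper solution
for s in range(20):
    z0 = np.concatenate([c, np.eye(2).ravel()])
    sol = solve_ivp(F, (0.0, 1.0), z0, rtol=1e-10, atol=1e-12)
    yb, Yb = sol.y[:2, -1], sol.y[2:, -1].reshape(2, 2)
    h = np.array([c[0], yb[0]])            # h(c) = g(c, y(1; c))
    Q = B0 + Bb @ Yb
    delta = np.linalg.solve(Q, -h)         # Newton correction
    c = c + delta
    if np.max(np.abs(delta)) < 1e-12:
        break

print(s, c[1])    # converged initial slope u'(0)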

7.1.1 Difficulties

From the above description we also see the potential trouble that the simple shooting method may run into: the conditioning of each iteration depends on the IVP stability, not only on the BVP stability. The matrix that features in the iteration is $Q = B_0 + B_b Y(b)$, and this matrix can be extremely poorly conditioned (recall Example 6.4) even when the BVP is stable and not very stiff. Finding the solution, once the correct initial values are known, also involves integrating a potentially unstable IVP.

It is not difficult to see that if a method of order $p$ is used for the initial value integrations (in the sense of the IVP methods studied in Chapters 3, 4 and 5) then a method of order $p$ is obtained for the boundary value problem. This follows directly if we assume that the nonlinear iteration for (7.3) converges, and in the absence of roundoff errors. The trouble in finding $c$ (if there is any) does not arise because of truncation errors, because for a stable BVP error growth along unstable modes gets cancelled (recall §6.2), and this effect is reproduced by a consistent, stable IVP discretization. Also, if the BVP is unstable then the shooting method is expected to have difficulties, but these will be shared by other standard methods; the case where the simple shooting method is particularly unsatisfactory is when other simple methods (discussed in the next chapter) work well while this method does not. Such is the case in the following example.

Example 7.2 The following problem
\[ y' = A(t) y + q(t) \]
\[ A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2\lambda^3 & \lambda^2 & 2\lambda \end{pmatrix} \]
\[ y_1(0) = b_1, \qquad y_1(1) = b_2, \qquad y_2(1) = b_3 \]
has the exact solution
\[ y(t) = (u(t), u'(t), u''(t))^T, \qquad u(t) = e^{\lambda(t-1)} + e^{2\lambda(t-1)} + \frac{e^{-\lambda t}}{2 + e^{-\lambda}} + \cos \pi t \]
(you can evaluate the expressions and values for $q(t) = y'(t) - A(t) y(t)$ and the boundary values $b$ from this exact solution). The problem is in the form (7.7) with
\[ B_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad B_b = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} . \]
For $\lambda = 20$, say, the BVP is stable but the IVP is not.

In Figs. 7.1 and 7.2 we display the exact and approximate solutions (solid and dashed lines, resp.) for various values of $\lambda$ ranging from a harmless 1 to a tough 50. We use the classical Runge-Kutta method of order 4 with a fixed step size $h = .004$ and a 14-hexadecimal-digit floating point arithmetic.

[Figure 7.1: Exact (solid line) and shooting (dashed line) solutions for Example 7.2: (a) lambda = 1, (b) lambda = 10.]

[Figure 7.2: Exact (solid line) and shooting (dashed line) solutions for Example 7.2: (a) lambda = 20, (b) lambda = 50.]

Note that the disastrous effect observed is due to the propagation of errors in the obtained initial values $c$ by unstable modes (recall Example 2.1). The error in $c$ is unavoidable and is due to roundoff, not truncation errors. We have chosen the discretization step-size so small, in fact, that for the case $\lambda = 1$ the errors in the initial values vector are all below $10^{-9}$, as is the maximum error in $u$ in the ensuing integration for the approximate solution of the BVP. For $\lambda = 10$, already an $O(1)$ error is observed (the maximum error in $u$ is $.34$). This case may be regarded as particularly worrisome, because the wrong solution obtained for a moderate value of $\lambda$ may also look plausible. For $\lambda = 20$, this maximum error is $206.8$ (although the error in the initial conditions is less than $10^{-4}$); and for $\lambda = 50$, the overall error in $u$ is $2.1\mathrm{e}{+}32$ and in the initial conditions it is about $10^8$.

The instability is already extreme for $\lambda = 20$, a value for which the BVP is not very stiff. $\Box$

Another potential difficulty with the simple shooting method arises for nonlinear problems. The method assumes that the initial value problems encountered will have solutions, even for inaccurate initial values, that reach all the way to $t = b$. For nonlinear problems, however, there is no guarantee that this would be the case. Initial value solutions with incorrect initial values are typically guaranteed to exist locally in $t$, but not necessarily globally. For another potential difficulty with the nonlinear iteration see Exercise 7.7.


[Figure 7.3: Multiple shooting.]

7.2 Multiple Shooting

Both disadvantages of the simple shooting method become worse for larger intervals of integration of the initial value problems. In fact, a rough bound on the propagation error, which is approximately achieved in Example 7.2, is $e^{Lb}$, where $L = \max_t \|A(t)\|$. The basic idea of multiple shooting is then to restrict the size of intervals over which IVPs are integrated. Defining a mesh
\[ 0 = t_0 < t_1 < \dots < t_{N-1} < t_N = b \]
we consider approximating the solution of the ODE system $y' = f(t, y)$ by constructing an approximate solution on each subinterval $[t_{n-1}, t_n]$ and patching these approximate solutions together to form a global one (see Fig. 7.3). Thus, let $y_n(t; c_{n-1})$ be the solution of the initial value problem
\[ y_n' = f(t, y_n), \qquad t_{n-1} < t < t_n \qquad (7.4a) \]
\[ y_n(t_{n-1}) = c_{n-1}, \qquad (7.4b) \]
for $1 \le n \le N$. (It is important not to confuse this notation with what is used in the finite difference chapters 3, 4, 5 and 8 for a slightly different purpose. Here $y_n$ is meant to be the exact solution on a subinterval $[t_{n-1}, t_n]$, provided we can find the right $c_{n-1}$.)

Assuming for the moment that the initial value problems (7.4) are solved exactly, we then have that the exact solution of the problem (7.1)-(7.2) satisfies
\[ y(t) = y_n(t; c_{n-1}), \qquad t_{n-1} \le t \le t_n, \quad 1 \le n \le N \]
if
\[ y_n(t_n; c_{n-1}) = c_n, \qquad 1 \le n \le N - 1 \qquad (7.5a) \]
\[ g(c_0, y_N(b; c_{N-1})) = 0 . \qquad (7.5b) \]
The conditions (7.5a) are patching conditions which ensure that $y(t)$, patched from the different pieces $y_n(t; c_{n-1})$, is continuous on the entire interval $[0, b]$, and (7.5b) is just the resulting expression for the boundary conditions (7.2).


The conditions (7.5) give $Nm$ algebraic equations for the $Nm$ coefficients
\[ c = (c_0^T, c_1^T, \dots, c_{N-1}^T)^T . \]
We write these equations, as before, as
\[ h(c) = 0 . \qquad (7.6) \]
Applying Newton's method to solve the nonlinear equations (7.6) results at each iteration $\nu$ in a system of linear equations which can be viewed as arising from the same multiple shooting method for the linearized boundary value problem,
\[ A (c^{\nu+1} - c^\nu) = -h(c^\nu) \]
where $A = \frac{\partial h}{\partial c}(c^\nu)$ has a sparse block structure, as in (7.10) below. An advantage of Newton's method here (not shared by quasi-Newton methods) is that this sparse block structure remains intact during the iteration process.

Since the system of linear equations is the same as the one obtained by applying the same multiple shooting method to the linearized problem, let us consider the latter further. For the linear problem
\[ y' = A(t) y + q(t) \qquad (7.7) \]
\[ B_0 y(0) + B_b y(b) = b \]
we can write
\[ y_n(t; c_{n-1}) = Y_n(t) c_{n-1} + v_n(t) \]
where $Y_n(t)$ is the fundamental solution satisfying
\[ Y_n' = A(t) Y_n, \qquad Y_n(t_{n-1}) = I \]
(in particular, $Y_1 \equiv Y$), and $v_n(t)$ is a particular solution satisfying, e.g.,
\[ v_n' = A(t) v_n + q(t), \qquad v_n(t_{n-1}) = 0 . \]
The patching conditions and boundary conditions are then
\[ I c_n - Y_n(t_n) c_{n-1} = v_n(t_n), \qquad 1 \le n \le N - 1 \qquad (7.8a) \]
\[ B_0 c_0 + B_b Y_N(b) c_{N-1} = b - B_b v_N(b) . \qquad (7.8b) \]
Writing these conditions as a linear system, we get
\[ A c = r \qquad (7.9) \]
where
\[ A = \begin{pmatrix} -Y_1(t_1) & I & & & \\ & -Y_2(t_2) & I & & \\ & & \ddots & \ddots & \\ & & & -Y_{N-1}(t_{N-1}) & I \\ B_0 & & & & B_b Y_N(b) \end{pmatrix}, \]
\[ c = \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{N-2} \\ c_{N-1} \end{pmatrix}, \qquad r = \begin{pmatrix} v_1(t_1) \\ v_2(t_2) \\ \vdots \\ v_{N-1}(t_{N-1}) \\ b - B_b v_N(b) \end{pmatrix} . \qquad (7.10) \]
The matrix $A$ is large and sparse when $N$ is large, but there are well-known variants of Gaussian elimination which allow the solution of the linear system of equations (7.9) in $O(N)$ time. This will be discussed in §8.2. In fact, given $N$ parallel processors in a computational model which ignores communication costs, the solution time for this linear system can be reduced to $O(\log N)$. Note that the blocks $Y_n(t_n)$ can be constructed in parallel too (for this reason the multiple shooting method is sometimes referred to as the parallel shooting method). Initial value integration is applied for these constructions, as well as for the construction of the $v_n$'s.

Turning to the question of whether the instability of the single shooting method has been improved upon, note that, assuming that the boundary matrices are scaled to $O(1)$,
\[ \|A\| = \text{const} \left( \max_{1 \le n \le N} \{ \|Y_n(t_n)\| \} + 1 \right) . \]
It can also be verified directly that $A$ has the inverse
\[ A^{-1} = \begin{pmatrix} G(t_0, t_1) & \cdots & G(t_0, t_{N-1}) & \Phi(t_0) \\ \vdots & & \vdots & \vdots \\ G(t_{N-1}, t_1) & \cdots & G(t_{N-1}, t_{N-1}) & \Phi(t_{N-1}) \end{pmatrix} \qquad (7.11) \]
where $G$ and $\Phi$ are defined in (6.12) and (6.9), resp. Therefore, with $\kappa$ the stability constant of the given boundary value problem (recall (6.13)),
\[ \|A^{-1}\| \le N \kappa \]
so
\[ \mathrm{cond}(A) = \|A\| \|A^{-1}\| \le \text{const}\, \kappa\, N \left( \max_{1 \le n \le N} \{ \|Y_n(t_n)\| \} + 1 \right) \qquad (7.12) \]
for some moderate constant const.
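The contrast between $\|Y(b)\|$ for simple shooting and the bound (7.12) can be seen numerically. The following sketch (our own illustration) assembles the matrix (7.10) for Example 7.2 with $\lambda = 20$ and prints cond(A) for $N = 1$ (which reduces to the simple shooting matrix $Q$) and for $N = 10$; since $A(t)$ is constant here, the blocks are computed exactly by the matrix exponential.

import numpy as np
from scipy.linalg import expm

lam = 20.0
A_ode = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-2*lam**3, lam**2, 2*lam]])
B0 = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
Bb = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

def shooting_matrix(N):
    t = np.linspace(0.0, 1.0, N + 1)
    M = np.zeros((3 * N, 3 * N))
    for n in range(1, N):                    # patching rows (7.8a)
        Yn = expm(A_ode * (t[n] - t[n - 1])) # Y_n(t_n), exact since A is constant
        M[3*(n-1):3*n, 3*(n-1):3*n] = -Yn
        M[3*(n-1):3*n, 3*n:3*(n+1)] = np.eye(3)
    YN = expm(A_ode * (t[N] - t[N - 1]))     # boundary row (7.8b)
    M[3*(N-1):, :3] += B0
    M[3*(N-1):, 3*(N-1):] += Bb @ YN
    return M

for N in (1, 10):
    print(N, np.linalg.cond(shooting_matrix(N)))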

The problem with simple shooting is that $\|Y(b)\|$ can be very large, even for stable BVPs, and it features prominently in the conditioning of the shooting algorithm (because $\|A\|$ is very large). The bound on the condition number in (7.12) is often much more acceptable. For Example 7.2 with $\lambda = 20$, 10 equally spaced multiple shooting points produce an accurate solution (to 7 digits, using the same discretization for the IVPs), in contrast to the simple shooting results shown in Fig. 7.2. The other disadvantage, resulting from finite escape time in nonlinear initial value problems, is corrected to a large extent by multiple shooting as well.

However, with the significant improvement of the various deficiencies, the conceptual simplicity of simple shooting is also gone. Moreover, for very stiff BVPs the number of shooting points must grow unacceptably large with the stiffness parameter (e.g., it is proportional to $\lambda$, as $\lambda \to \infty$, in Example 7.2).

7.3 Software, Notes and References

7.3.1 Notes

A detailed treatment of the techniques covered in this chapter can be found in Chapter 4 of Ascher, Mattheij & Russell [8]. See also Mattheij & Molenaar [67]. Earlier references include Keller [59]. Our presentation is deliberately short; we have chosen to concentrate more on finite difference methods in the next chapter.

The simple shooting method applied to a linear BVP, see (6.6), can be viewed as a method of superposition, where the solution is composed of a linear combination of solution modes (columns of $Y(t)$) plus a particular solution of the nonhomogeneous problem (6.6) subject to (say) homogeneous initial conditions. There are more efficient, reduced superposition variants as well; see [8] and references therein.

There are other initial value techniques like stabilized march and Riccati methods which possess certain advantages (and disadvantages) over the multiple shooting method presented here. They can be viewed as achieving, for a linear(ized) problem, a decoupling of rapidly increasing modes (whose forward integration yields stability problems) from the other modes. The algorithm described in Exercise 7.8 can be made to be stable then. See §8.7 for more on decoupling. For reasons of space and bias, however, we do not explore these methods further. The interested reader can consult Chapter 4 of [8].

For use of a multiple shooting method for parameter identification, i.e. attempting to find unknown parameters which define the ODE given observations on its solution, see [17].

7.3.2 Software

Many scientists and engineers seem to implement their own application-dependent shooting codes, making use of the excellent and abundant software which is available for initial value problems. Shooting handles problems with non-separated boundary conditions, and extends naturally to handle problems with parameters. Sparse linear algebra is avoided, at least when $m$ is not large. The Nag library has a simple shooting code written by I. Gladwell. Another shooting code is being developed by L. Shampine, at the time of this writing, for Matlab. However, we find the limited applicability of this method somewhat unsettling for general purposes.

A number of multiple shooting codes were developed in the 1970's and 1980's. We mention the code mus by Mattheij & Staarink [68, 8], which is available from Netlib. Earlier codes include suport by Scott & Watts [84].

7.4 Exercises

1. Write a simple shooting code, using available software modules for initial value ODE integration, solution of nonlinear algebraic equations, and solution of linear algebraic equations as you find necessary. Apply your code to the following problems:
(a) Find both solutions of Example 6.2. What are the correct initial values for each of the two solutions?
(b) Use your program (only after verifying that it is correct) on some stable boundary value problem of your choice where it is not supposed to work, and explain the observed results.

2. (a) Verify that the expression given in (7.11) is indeed the inverse of $A$ given in (7.10).
(b) Estimate $\mathrm{cond}(A)$ for Example 7.2 with $\lambda = 20$, using 10 equally spaced multiple shooting points.
(c) How many multiple shooting points are needed to obtain a similar bound on $\mathrm{cond}(A)$ when $\lambda = 5000$?

3. Consider the car model of Example 4.7 with the same initial conditions as employed there. Given that $a = 100$, the task is to find a constant steering angle $\phi$ so that the car will pass through the point $x(b) = 100$, $y(b) = 0$.
(a) Formulate this as a BVP (of order 6) in standard form.
(b) Solve this BVP numerically, using a package of your choice or your own home-grown program. Verify that the final speed is $v(b) = 137.63$. What is the required angle $\phi$? How long does it take the car to get to $x(b), y(b)$?

4. Consider the nonlinear problem
\[ v'' + \frac{4}{t} v' + (t v - 1) v = 0, \qquad 0 < t < \infty \qquad (7.13) \]
\[ v'(0) = 0, \qquad v(\infty) = 0 . \]
This is a well-behaved problem with a smooth, nontrivial solution. To solve it numerically, we replace $[0, \infty)$ by a finite, large interval $[0, L]$ and require
\[ v(L) = 0 . \]
For large $t$ the solution is expected to decay exponentially, like $e^{-\gamma t}$, for some $\gamma > 0$.
(a) Find the asymptotic behavior of the solution for large $t$ (i.e. find $\gamma$). [You may assume that $v(t)$ is very (i.e. exponentially) small when $t$ is large.]
(b) Show that the simple shooting method is unstable for this problem.
(c) Describe the application of the multiple shooting method for this problem. Estimate (roughly) the number and location of the needed shooting points.
(d) What would you do to obtain convergence of your scheme, avoiding convergence to the trivial solution?

5. The so-called SH equations, arising when calculating the ground displacements caused by a point moment seismic source in a layered medium, form a simple ODE system
\[ y' = A(t; \omega, k)\, y, \qquad 0 < t < b \]
where
\[ A = \begin{pmatrix} 0 & \mu^{-1} \\ \mu k^2 - \rho \omega^2 & 0 \end{pmatrix} . \]
Here the angular frequency $\omega$ and the horizontal wave number $k$ are parameters, $-\infty < \omega < \infty$, $0 \le k < \infty$. The independent variable $t$ corresponds to depth into the earth (which is the medium in this seismological application). See [60, 8] and references therein for more details, although you don't really need to understand the physics in order to solve this exercise. A hefty assumption is made that the earth in the area under consideration consists of horizontal layers. Thus, there is no horizontal variation in medium properties. Assume, moreover, that there is a partition
\[ 0 = t_0 < t_1 < \dots < t_N = b \]
such that the S-wave velocity $\beta(t)$, the density $\rho(t)$, and thus also $\mu(t) = \rho \beta^2$ are constant in each layer:
\[ \beta = \beta_n, \quad \rho = \rho_n, \quad \mu = \mu_n, \qquad t_{n-1} \le t < t_n . \]
(a) At the earth's surface $t = 0$, $y_2(0) \ne 0$ is given. Another boundary condition is derived from a radiation condition, requiring that only down-going waves exist for $t \ge b$. Assuming that the properties of the medium are constant for $t \ge b$, this yields
\[ \bar{\nu}\, y_1(b) + \bar{\mu}^{-1} y_2(b) = 0 \]
where
\[ \bar{\nu} = \sqrt{k^2 - (\omega/\bar{\beta})^2} . \]
Derive this boundary condition. [Hint: the eigenvalues of $A$ are $\pm \bar{\nu}$.]
(b) Describe a multiple shooting method that would yield the exact solution (except for roundoff errors) for this BVP.
[Note that this problem has to be solved many times, for various values of $k$ and $\omega$, because the obtained solution is used for integrand evaluation for a double integral in $k$ and $\omega$. It is therefore worthwhile to tailor a particularly good method for this simple BVP.]

6. Delay differential equations arise often in applications. There are some situations in which a conversion to an ODE system can be useful. Consider a problem with a single, constant delay $\tau > 0$,
\[ z'(t) = f(t, z(t)) + A(t)\, z(t - \tau), \qquad 0 < t < b \qquad (7.14a) \]
\[ B_{01} z(t) = b_1(t), \qquad -\tau \le t \le 0 \qquad (7.14b) \]
\[ B_{b2} z(b) = b_2 \qquad (7.14c) \]
where in (7.14a) there are $m$ equations, $B_{01}$ is full rank $k \times m$, $B_{b2}$ is $(m - k) \times m$, $B_0 = \begin{pmatrix} B_{01} \\ 0 \end{pmatrix}$ is $m \times m$, and $A(t)$ can be written as $A(t) = \mathcal{A}(t) B_0$. We further assume that there is a unique, continuous solution and that $b = \tau J$ for some positive integer $J$.
(a) Show that the functions
\[ y_j(s) = z(s + (j-1)\tau), \qquad j = 1, \dots, J \]
satisfy the ODE system
\[ y_j'(s) = f(s + (j-1)\tau, y_j(s)) + \mathcal{A}(s + (j-1)\tau)\, y_{j-1}(s), \qquad j = 2, \dots, J, \quad 0 < s < \tau \]
\[ y_1'(s) = f(s, y_1(s)) + \mathcal{A}(s)\, \bar{b}_1(s - \tau) \]
\[ B_{01} y_1(0) = b_1(-\tau), \qquad B_{b2} y_J(\tau) = b_2 \]
\[ y_j(\tau) = y_{j+1}(0), \qquad j = 1, \dots, J - 1, \]
where $\bar{b}_1$ is just $b_1$ extended by $m - k$ zeros. This is a BVP in standard form.
(b) Solve the following problem using your code of Exercise 7.1 or a library BVP package:
\[ u''(t) = -\frac{1}{16} \sin u(t) - (t + 1)\, u(t - 1) + t, \qquad 0 < t < 2 \]
\[ u(t) = t - 1/2, \qquad -1 \le t \le 0 \]
\[ u(2) = -\frac{1}{2} . \]
[Recall §6.4, in case you need the boundary conditions in separated form.]
(c) In the case that $k = m$ and $B_0 = I$, (7.14) is an initial value delay ODE. Describe a method to convert this to a sequence of initial value ODEs of increasing size.
(d) Explain why both conversion tricks, to standard BVP and to standard IVP forms, lose their appeal when $\tau$ shrinks, i.e. $\tau \ll b$. [This is a curious thing, because as $\tau \to 0$ the delay ODE (7.14a) becomes "closer to" an ODE system of size $m$. For more on this topic see [85, 8, 50] and references therein.]

7. The well-known Newton-Kantorovich Theorem guarantees convergence of Newton's method starting at $c^0$ for the nonlinear system of algebraic equations $h(c) = 0$. With the notation $J(c) = \frac{\partial h}{\partial c}$, a sufficient condition for convergence is
\[ \alpha \beta \gamma < 1/2 \]
where
\[ \|J(c^0)^{-1} h(c^0)\| \le \alpha \]
\[ \|J(c^0)^{-1}\| \le \beta \]
\[ \|J(c) - J(d)\| \le \gamma \|c - d\| . \]
Show that for the simple shooting method for (7.1), (7.2), we can merely bound $\gamma \propto e^{Lb}$, where $L$ is the Lipschitz constant of $f$.
[This bound may be realized in practice, and indicates potential trouble in the convergence of Newton's method, unless $c^0$ is very close to the exact solution $c$ so that $\alpha$ is very small. The bound on $\gamma$ is improved a lot when using multiple shooting with uniformly distributed shooting points [96].]

8. For the multiple shooting method we are faced with the challenge of solving the linear equations (7.9), where the matrix $A$ may be large and sparse if there are many shooting points. A simple way of doing this involves viewing the equations (7.8) as a recursion. Thus, we write for (7.8a)
\[ c_{N-1} = Y_{N-1}(t_{N-1}) c_{N-2} + v_{N-1}(t_{N-1}) \]
\[ = Y_{N-1}(t_{N-1}) \left[ Y_{N-2}(t_{N-2}) c_{N-3} + v_{N-2}(t_{N-2}) \right] + v_{N-1}(t_{N-1}) = \cdots \]
until we can express $c_{N-1}$ in terms of $c_0$, and this is substituted in (7.8b). The linear system to be solved is then only $m \times m$ and can be solved by usual means. This method is called compactification in [8].
(a) Carry out the method just outlined for finding $c$, i.e., find the formula.
(b) Show that, unfortunately, this method can degrade the stability properties of the multiple shooting method to those of the simple shooting method, i.e. this method for solving the linear system (7.9) can be unstable.
(c) Discuss the application of this method to the problem of Exercise 7.5 [53].
[Note that, compared to simple shooting, the method just outlined does have improved convergence properties for nonlinear problems.]


9. This exercise is concerned with finding periodic solutions for given ODE problems. In each case you are required to plot the obtained solution in phase space and to find the length of the period accurate to 5 digits (so eye-balling or trial-and-error would not work well enough for the purpose of finding the period). You are allowed to use any initial value software and boundary value software you want (including your own program from Exercise 1).
(a) Find the period of the heavenly bodies example of Exercise 4.12 (Fig. 4.8).
(b) Find the period of the solution of the Predator-Prey Example 1.2 (Fig. 1.3). The initial value used in that example was $(80, 30)$.
(c) Find the attracting limit cycle and the period of the Van der Pol equation
\[ u'' = (1 - u^2) u' - u . \qquad (7.15) \]


Chapter 8

Finite Difference Methods for BVPs

As in the previous chapter, we seek numerical methods for BVPs based on our knowledge of methods for IVPs. But unlike the previous chapter, here we will not integrate IVPs. Rather, we consider the suitability of the discretizations studied in Chapters 3, 4 and 5 for BVPs. Consider a system of ODEs of order $m$,
\[ y' = f(t, y), \qquad 0 < t < b \qquad (8.1) \]
subject to $m$ two-point boundary conditions
\[ g(y(0), y(b)) = 0 . \qquad (8.2) \]
Define a mesh (or a sequence of steps; we refer to the entire mesh as $\pi$)
\[ \pi = \{ 0 = t_0 < t_1 < \dots < t_{N-1} < t_N = b \} \]
with $h_n = t_n - t_{n-1}$ the $n$th step size, and consider solving for
\[ y_0, y_1, \dots, y_{N-1}, y_N \]
with $y_n$ the intended approximation of $y(t_n)$. The following observations are straightforward.

- For BVPs, no particular $y_n$ is entirely known before all other mesh values for $y$ are known. Hence, no difference method can be regarded as explicit. So, using what we called in Chapter 4 an explicit Runge-Kutta method, for instance, offers no advantage over using what was referred to in the IVP context as implicit Runge-Kutta methods.
- It makes no sense to use multistep methods either, both because there are really no "past", known solution values, and because the sparsity structure of the linear system that results is adversely affected, compared to one-step methods. (Note that we are discussing first order ODEs. For a second order ODE a natural discretization stencil would involve two steps, see Exercises 8.9-8.12.)
- Symmetric, implicit Runge-Kutta methods are natural, because like BVPs they are indifferent to the direction of integration, i.e. they act similarly for nondecreasing and for nonincreasing modes.

In the sequel we therefore concentrate, with the exception of §8.6, on symmetric, one-step methods. As in Chapter 3, we start with the midpoint and the trapezoidal methods.

8.1 Midpoint and Trapezoidal Methods

We consider below the midpoint method, and leave the parallel development for the trapezoidal method to Exercise 8.1. Recall that the midpoint method for the ODE system (8.1) reads
\[ \frac{y_n - y_{n-1}}{h_n} = f\left( t_{n-1/2}, \tfrac{1}{2}(y_n + y_{n-1}) \right), \qquad n = 1, \dots, N \qquad (8.3) \]
and we require also that the boundary conditions be satisfied,
\[ g(y_0, y_N) = 0 . \qquad (8.4) \]
In (8.3)-(8.4) we have $m(N+1)$ algebraic equations for the $m(N+1)$ unknown mesh values (including the end values). These equations are nonlinear if $f$ is nonlinear in $y$, and there are many such equations; it is not unusual to get 500 equations for a small ODE system. Their solution is discussed below. Before this we consider an example.

Example 8.1 Consider again Example 7.2. To recall, this is a linear problem of the form (7.7) with $m = 3$,
\[ A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2\lambda^3 & \lambda^2 & 2\lambda \end{pmatrix}, \]
and the exact solution is
\[ y = (u, u', u'')^T, \qquad u(t) = e^{\lambda(t-1)} + e^{2\lambda(t-1)} + \frac{e^{-\lambda t}}{2 + e^{-\lambda}} + \cos \pi t, \]
which determines the inhomogeneity vector $q(t)$. For boundary conditions, $u(0)$, $u(1)$ and $u'(1)$ are prescribed. In Tables 8.1 and 8.2 we record maximum errors in $u$ at the mesh points for $\lambda = 1, 50$ and $500$ using uniform meshes and specialized, nonuniform meshes. For the uniform meshes, $h = 1/N$. These results are for the midpoint method; similar results are obtained also for the trapezoidal method.

  N  | lambda=1  rate | lambda=50  rate | lambda=500  rate
  10 |  .60e-2        |  .57            |  .96
  20 |  .15e-2   2.0  |  .32       .84  |  .90        .09
  40 |  .38e-3   2.0  |  .14e-1    1.9  |  .79        .19
  80 |  .94e-4   2.0  |  .34e-1    1.9  |  .62        .35

Table 8.1: Maximum errors for Example 8.1 using the midpoint method: uniform meshes.

  N  | lambda=50  rate | lambda=500  rate
  10 |  .14            |  *
  20 |  .53e-1    1.4  |  .26e-1
  40 |  .14e-1    1.9  |  .60e-2     2.1
  80 |  .32e-2    2.2  |  .16e-2     1.9

Table 8.2: Maximum errors for Example 8.1 using the midpoint method: nonuniform meshes.

Note that for $\lambda = 1$ the second order accuracy of the midpoint method is reflected in the computed results. Given the smoothness of the exact solution it is also clear that there is room for employing higher order methods (see Table 8.3), especially if highly accurate trajectories are desired.

For $\lambda = 50$, and even more so for $\lambda = 500$, the method is much less accurate if we use a uniform mesh, and the convergence order is reduced. The reason has already been discussed in Chapter 3: $O(1)$ errors which are generated in the narrow layer regions near the interval ends propagate almost undamped throughout the interval (recall Fig. 3.4). To retrieve the potential accuracy of the midpoint method in regions where the solution varies slowly, the mesh in layer regions must be dense. The nonuniform meshes used for Table 8.2 result from a primitive effort to handle the layer regions. They are given (for $N = 10$) by
\[ 0, \ \frac{1}{2\lambda}, \ \frac{3}{\lambda}, \ \frac{8}{\lambda}, \ .25, \ .5, \ .75, \ 1 - \frac{8}{\lambda}, \ 1 - \frac{3}{\lambda}, \ 1 - \frac{1}{2\lambda}, \ 1 \]
and the refinements are obtained by successively subdividing each of the mesh elements into two to obtain the next mesh. For $\lambda = 500$ the errors are measured only at mesh points away from the layer.

[Figure 8.1: Example 8.1: Exact and approximate solutions (indistinguishable) for lambda = 50, using the indicated mesh.]

Even with these simple nonuniform meshes, a significant improvement in the quality of the solution is obtained. The exact and approximate solutions for $\lambda = 50$ are plotted in Fig. 8.1, together with the mesh that was used to generate these curves. This mesh corresponds to the last entry of Table 8.2 ($N = 80$). The approximate solution is in agreement with the exact one, as far as the eye can tell. It turns out that for this type of problem it is possible to construct more sophisticated meshes on which we obtain good, accurate solutions for any $\lambda \gg 1$ with $N$ independent of $\lambda$. This is in contrast to multiple shooting techniques, where $N$ grows linearly with $\lambda$. $\Box$
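For concreteness, here is the layer mesh and its bisection refinement in a few lines of Python (our own transcription; treat the exact breakpoints as our reading of the mesh just quoted):

import numpy as np

lam = 50.0
mesh = np.array([0, 1/(2*lam), 3/lam, 8/lam, .25, .5, .75,
                 1 - 8/lam, 1 - 3/lam, 1 - 1/(2*lam), 1])

def refine(mesh):
    """Subdivide each mesh element into two."""
    mid = 0.5 * (mesh[:-1] + mesh[1:])
    return np.sort(np.concatenate([mesh, mid]))

mesh80 = refine(refine(refine(mesh)))   # the N = 80 mesh of Table 8.2
print(mesh80.size - 1)                  # 80 elements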

For solving the many nonlinear algebraic equations we again consider Newton's method, because it is basic, it is fast when it works well, and it retains the sparsity structure of the Jacobian, which is important for such a large system. As it turns out, Newton's method applied to the midpoint equations (8.3)-(8.4) is equivalent to the method of quasilinearization coupled with the midpoint discretization for linear problems. The latter approach has the attraction of being more modular, so we describe it next.

8.1.1 Solving Nonlinear Problems: Quasilinearization

Newton's method for algebraic equations is obtained by expanding in Taylor series and truncating the nonlinear terms at each iteration. The quasilinearization method does the same for the nonlinear differential system. Thus, let $y^0(t)$ be an initial solution profile (a guess), and write
\[ (y^{\nu+1})' = f(t, y^\nu) + \frac{\partial f}{\partial y}(t, y^\nu)\, (y^{\nu+1} - y^\nu) \]
\[ 0 = g(y^{\nu+1}(0), y^{\nu+1}(b)) = g + \frac{\partial g}{\partial u}\, (y^{\nu+1}(0) - y^\nu(0)) + \frac{\partial g}{\partial v}\, (y^{\nu+1}(b) - y^\nu(b)) \]
where $y^\nu = y^\nu(t)$ is a known function at the $\nu$th iteration (here and below we denote iteration number by a simple superscript, e.g. $y^\nu$ for the $\nu$th iterate; this should not be confused with the notation for the $\nu$th power), and $g$, $B_0 = \frac{\partial g}{\partial u}$ and $B_b = \frac{\partial g}{\partial v}$ are evaluated at the known iterate $(y^\nu(0), y^\nu(b))$ on the right hand side of the last expression. Letting also
\[ A(t) = \frac{\partial f}{\partial y}(t, y^\nu(t)) \]
we obtain at the $\nu$th iteration that the next iterate $y^{\nu+1} = y$ satisfies the linear BVP
\[ y' = A(t) y + q(t), \qquad 0 < t < b \]
\[ B_0 y(0) + B_b y(b) = b \qquad (8.5) \]
where
\[ q = f(t, y^\nu(t)) - A(t) y^\nu(t) \]
\[ b = -g(y^\nu(0), y^\nu(b)) + B_0 y^\nu(0) + B_b y^\nu(b) . \]
The coefficients in the linear problem (8.5) may all depend, in general, on the current iterate $y^\nu(t)$. The quasilinearization procedure therefore defines a sequence of linear BVPs whose solutions hopefully converge to that of the given nonlinear BVP. Thus, if we know how to discretize and solve linear BVPs then we obtain a method also for nonlinear BVPs.

We proceed by applying the midpoint method for the linear problem. Note that the iterates $y^\nu(t)$ are never really needed anywhere other than at mesh points. It is also easy to verify that the operations of linearization and discretization commute here: we obtain the same linear systems to solve as we would if we apply Newton's method directly to (8.3)-(8.4).

Example 8.2 Revisiting Example 6.2, we write the ODE in the first order form (8.1) for $y(t) = (u(t), u'(t))^T$,
\[ y' = \begin{pmatrix} y_2 \\ -e^{y_1 + 1} \end{pmatrix}, \qquad 0 < t < 1 . \]
The boundary conditions are linear and homogeneous. They can be written as $B_0 y(0) + B_b y(1) = 0$, with
\[ B_0 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B_b = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} . \]
The Jacobian matrix is apparently
\[ \frac{\partial f}{\partial y} = \begin{pmatrix} 0 & 1 \\ -e^{y_1 + 1} & 0 \end{pmatrix} \]
so at the $\nu$th quasilinearization iteration we define
\[ A(t) = \begin{pmatrix} 0 & 1 \\ -e^{y_1^\nu(t) + 1} & 0 \end{pmatrix}, \qquad q(t) = \begin{pmatrix} y_2^\nu(t) \\ -e^{y_1^\nu(t) + 1} \end{pmatrix} - A(t) y^\nu(t) \]
and solve the linear system (8.5) with $b = 0$ for $y = y^{\nu+1}(t)$.

Starting with the initial guess
\[ u^0(t) = c_2\, t(1 - t), \qquad 0 \le t \le 1 \]
and employing the midpoint method with a uniform mesh of size $N = 10$, we obtain convergence after 2 Newton iterations to good-quality approximations of each of the two solutions depicted in Fig. 6.1, upon setting $c_2 = 0.5$ and $c_2 = 10$, respectively (cf. Example 7.1). This problem is very easy to solve numerically, despite its nonunique solutions. $\Box$

Instead of solving in the $\nu$th quasilinearization iteration for the next iterate $y^{\nu+1}$, we can (and we prefer to) solve for the Newton direction at $y^\nu$,
\[ \delta(t) = y^{\nu+1}(t) - y^\nu(t) \]
and then let
\[ y^{\nu+1} = y^\nu + \delta . \]
For $\delta$ (which depends of course on $\nu$) we have the linear problem (Exercise 8.1)
\[ \delta' = A(t) \delta + q(t), \qquad 0 < t < b \]
\[ B_0 \delta(0) + B_b \delta(b) = b \qquad (8.6) \]
where $A$, $B_0$ and $B_b$ are as before, in (8.5), but the data simplifies to
\[ q(t) = f(t, y^\nu) - (y^\nu)' \]
\[ b = -g(y^\nu(0), y^\nu(b)) . \qquad (8.7) \]
Note that in Example 8.2 we may no longer automatically set $b = 0$ when solving for $\delta(t)$; this depends on the initial guess $y^0(t)$.

The midpoint method applied to the linear problem (8.5) yields the linear equations
\[ \frac{y_n - y_{n-1}}{h_n} = A(t_{n-1/2})\, \frac{y_n + y_{n-1}}{2} + q(t_{n-1/2}), \qquad n = 1, \dots, N \]
\[ B_0 y_0 + B_b y_N = b . \qquad (8.8) \]
This is a large, sparse linear system of $m(N+1)$ equations,
\[ A y_\pi = r \]
with
\[ A = \begin{pmatrix} S_1 & R_1 & & & \\ & S_2 & R_2 & & \\ & & \ddots & \ddots & \\ & & & S_N & R_N \\ B_0 & & & & B_b \end{pmatrix}, \qquad (8.9) \]
\[ y_\pi = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \\ y_N \end{pmatrix}, \qquad r = \begin{pmatrix} q(t_{1/2}) \\ q(t_{3/2}) \\ \vdots \\ q(t_{N-1/2}) \\ b \end{pmatrix} \]
where
\[ S_n = -\left( h_n^{-1} I + \tfrac{1}{2} A(t_{n-1/2}) \right), \qquad R_n = h_n^{-1} I - \tfrac{1}{2} A(t_{n-1/2}) . \]
We see that the structure of $A$ is the same as that of $A$ for the multiple shooting method. In fact, to make it even more similar, we can multiply the $n$th block row of $A$ by $R_n^{-1}$, obtaining block rows in the form
\[ \begin{pmatrix} \cdots & R_n^{-1} S_n & I & \cdots \end{pmatrix} \]
with $R_n^{-1} S_n$ presumably approximating the fundamental solution matrix value $-Y_n(t_n)$.

To summarize, here is the algorithm combining quasilinearization with the midpoint discretization for a nonlinear BVP (8.1)-(8.2).

Algorithm 8.1 Quasilinearization with Midpoint

- Given
  1. $f$, $\frac{\partial f}{\partial y}$ for each $t$ and $y$;
  2. $g(u, v)$, $\frac{\partial g}{\partial u}$, $\frac{\partial g}{\partial v}$ for each $u$ and $v$;
  3. a mesh $\pi : 0 = t_0 < \dots < t_N = b$;
  4. an initial guess $y^0(t)$, or just $y_n^0 = y^0(t_n)$, $n = 0, 1, \dots, N$; and
  5. a convergence tolerance NTOL for the nonlinear iteration.
- For $\nu = 0, 1, \dots$, until $\max_{0 \le n \le N} |y_n^{\nu+1} - y_n^\nu| <$ NTOL,
  1. For $n = 1, \dots, N$, form $S_n$, $R_n$ and $r_n = q(t_{n-1/2})$ using
\[ A(t_{n-1/2}) = \frac{\partial f}{\partial y}\left( t_{n-1/2}, \frac{y_n^\nu + y_{n-1}^\nu}{2} \right) \]
\[ q(t_{n-1/2}) = f\left( t_{n-1/2}, \frac{y_n^\nu + y_{n-1}^\nu}{2} \right) - \frac{y_n^\nu - y_{n-1}^\nu}{h_n} . \]
  2. Form $A$ and $r$ of (8.9) using
\[ B_0 = \frac{\partial g}{\partial u}(y_0^\nu, y_N^\nu), \qquad B_b = \frac{\partial g}{\partial v}(y_0^\nu, y_N^\nu), \qquad b = -g(y_0^\nu, y_N^\nu) . \]
  3. Solve the linear system of equations for $y_\pi = \delta_\pi$.
  4. Set
\[ y_\pi^{\nu+1} = y_\pi^\nu + \delta_\pi . \]
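The following Python fragment is a minimal sketch of Algorithm 8.1 applied to Example 8.2 (our own illustration; for simplicity the sparse matrix (8.9) is assembled densely, and the tolerance is our choice, while the mesh and starting profile are the ones quoted in the example).

import numpy as np

def f(t, y):
    return np.array([y[1], -np.exp(y[0] + 1.0)])

def dfdy(t, y):
    return np.array([[0.0, 1.0], [-np.exp(y[0] + 1.0), 0.0]])

B0 = np.array([[1.0, 0.0], [0.0, 0.0]])
Bb = np.array([[0.0, 0.0], [1.0, 0.0]])
g = lambda y0, yN: np.array([y0[0], yN[0]])   # u(0) = 0, u(1) = 0

N, m = 10, 2
t = np.linspace(0.0, 1.0, N + 1)
h = np.diff(t)
c2 = 0.5                                   # try 10.0 for the upper solution
Y = np.vstack([c2*t*(1-t), c2*(1-2*t)]).T  # initial profile y^0 at the mesh points

for nu in range(20):                       # quasilinearization (Newton) iterations
    M = np.zeros((m*(N+1), m*(N+1)))
    r = np.zeros(m*(N+1))
    for n in range(1, N + 1):
        tm, ym = 0.5*(t[n-1] + t[n]), 0.5*(Y[n-1] + Y[n])
        A = dfdy(tm, ym)
        row = m*(n-1)
        M[row:row+m, m*(n-1):m*n] = -np.eye(m)/h[n-1] - 0.5*A   # S_n
        M[row:row+m, m*n:m*(n+1)] =  np.eye(m)/h[n-1] - 0.5*A   # R_n
        r[row:row+m] = f(tm, ym) - (Y[n] - Y[n-1])/h[n-1]       # q(t_{n-1/2}), cf. (8.7)
    M[m*N:, :m] = B0
    M[m*N:, m*N:] = Bb
    r[m*N:] = -g(Y[0], Y[N])
    delta = np.linalg.solve(M, r).reshape(N + 1, m)             # Newton direction
    Y = Y + delta
    if np.max(np.abs(delta)) < 1e-10:
        break

print(nu, Y[:, 0])   # converges in a few iterations to one of the two solutions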


8.1.2 Consistency, 0-stability and Convergence

The local truncation error, consistency and accuracy of a difference method are defined as in §3.2. There is essentially no dependence in this regard on what type of side conditions are prescribed (be they initial or boundary conditions), so long as they are approximated well. The question is still by how much the exact solution fails to satisfy the difference equations.

For the midpoint method we define
\[
N_\pi u(t_n) \equiv \frac{u(t_n) - u(t_{n-1})}{h_n} - f\Big( t_{n-1/2}, \tfrac12\,(u(t_{n-1}) + u(t_n)) \Big)
\]
so the numerical method is given by
\[
N_\pi y_\pi(t_n) = 0
\]
(with $g(y_0, y_N) = 0$). By Taylor's expansion (see Exercise 3.4) we obtain that the local truncation error satisfies
\[
d_n = N_\pi y(t_n) = O(h_n^2)
\]
so this is a consistent, second order accurate method.

The definition of convergence is also exactly as in §3.2. Let
\[
h = \max_{1 \le n \le N} h_n.
\]
The method is convergent of order $p$ if
\[
e_n = O(h^p)
\]
for $n = 0, 1, 2, \dots, N$, where $e_n = y_n - y(t_n)$. We expect 2nd order convergence for the midpoint method.

The vehicle that carries accuracy results into convergence statements is 0-stability. For nonlinear problems we must confine ourselves to a vicinity of an exact, isolated solution (recall Chapter 6). Consider a "discrete tube" around such an exact solution $y(t)$,
\[
S_{\pi,\delta}(y) = \{\, u_\pi : |u_i - y(t_i)| \le \delta, \; 0 \le i \le N \,\} \tag{8.10}
\]
(the notation $\pi$ is for the particular mesh considered, and $\delta > 0$ is the radius of the tube around $y(t)$). The rest of the 0-stability definition is similar to the IVP case. The difference method is 0-stable if there are positive constants $h_0$, $\delta$ and $K$ such that for any mesh $\pi$ with $h \le h_0$ and any mesh functions $x_\pi$ and $z_\pi$ in $S_{\pi,\delta}(y)$,
\[
|x_n - z_n| \le K \Big\{ |g(x_0, x_N) - g(z_0, z_N)| + \max_{1 \le j \le N} |N_\pi x_\pi(t_j) - N_\pi z_\pi(t_j)| \Big\}, \quad 0 \le n \le N. \tag{8.11}
\]


Substituting $x_n \leftarrow y(t_n)$ and $z_n \leftarrow y_n$ into (8.11) we obtain an extension of the Fundamental Theorem 3.1 to the BVP case,
\[
|e_n| \le K \max_j |d_j| = O(h^p), \quad 0 \le n \le N. \tag{8.12}
\]
In particular, the midpoint method is second order convergent. Note also that, as in the IVP case, the bound (8.12) is useful only if $K$ is of the order of magnitude of the stability constant of the given differential problem.

How can we show 0-stability? Below we consider the linear case. For the nonlinear problem we consider a linearization, much in the spirit of the quasilinearization method and the variational problem (6.5). The difference operator must satisfy certain smoothness and boundedness requirements, and then the results extend.

For the linear BVP (8.5) the midpoint method (8.8) has been cast into matrix form in (8.9). Obviously, 0-stability is obtained if there is a constant $K$ such that for all meshes $\pi$ with $h$ small enough,
\[
\|A^{-1}\| \le K.
\]
Indeed, then we would have for the exact solution $y(t)$, written at mesh points as
\[
y_e = \big( y(0), y(t_1), \dots, y(t_{N-1}), y(b) \big)^T,
\]
the estimates
\[
A\, y_e = r + O(h^2), \qquad A\, (y_e - y_\pi) = O(h^2), \qquad |y_e - y_\pi| \le K\, O(h^2).
\]
To show 0-stability we call upon the closeness of $A$ to the multiple shooting matrix. It is not difficult to see that
\[
R_n^{-1} S_n = -Y_n(t_n) + O(h_n^2).
\]
Hence, denoting the multiple shooting matrix
\[
M = \begin{pmatrix} -Y_1(t_1) & I & & & \\ & -Y_2(t_2) & I & & \\ & & \ddots & \ddots & \\ & & & -Y_N(t_N) & I \\ B_0 & & & & B_b \end{pmatrix}
\]
(which is in a slightly, but not meaningfully, different form from $A$ of (7.10)) and defining the block-diagonal scaling matrix
\[
D = \operatorname{diag}\big( R_1^{-1}, R_2^{-1}, \dots, R_N^{-1}, I \big)
\]
we obtain
\[
D A = M + E
\]


where $E$ has the same zero-structure as $A$ and $\|E\| = O(h^2)$. From this we have
\[
A^{-1} = (M + E)^{-1} D.
\]
Taking norms, and capitalizing on our knowledge of the exact inverse of $M$ (recall (7.11)), we readily obtain
\[
\|A^{-1}\| \le \kappa + O(h) \le K \tag{8.13}
\]
where $\kappa$ is the stability constant of the problem, defined in (6.13). For $h$ small enough, the stability bound is therefore quantitative! If the BVP is stable and not stiff, and the local truncation error is small, then the global error is expected to have the order of the local truncation error times the stability constant of the given BVP.

It is important to understand that the closeness just discovered between the midpoint difference method and the multiple shooting method is mainly useful for theoretical purposes. The placement of shooting points in the latter method is done to reduce IVP instabilities, not to control truncation error (which is controlled by the initial value solver). Thus, the distance between shooting points is not necessarily small. If the number of shooting points needed is as large as what is typical for a difference method like midpoint, then the multiple shooting method becomes rather inefficient, because where one simple discretization step would do it fires up a whole IVP solver. Also, for stiff BVPs the midpoint method does not use steps so small that $R_n^{-1} S_n$ can be said to approximate $-Y_n(t_n)$ well (and wisely so). The interpretation of the above result is still valid as $h \to 0$, as the name '0-stability' indicates.


[Figure 8.2: Zero-structure of the matrix $A$ for $m = 3$, $N = 10$. The matrix size is $m(N+1) = 33$; nz = 198.]

8.2 Solving the Linear Equations

Having discretized a linear BVP using, say, the midpoint method, we obtain a large, sparse linear system of algebraic equations to solve,
\[
A\, y_\pi = r \tag{8.14}
\]
with $A$ having the sparsity structure depicted in (8.9). It is important that the reader imagine the structure of this matrix for, say, $m = 3$ and $N = 100$: it is large and rather sparse (only 1818 entries out of 91809 are possibly nonzero). In Fig. 8.2 we depict this structure for more modest dimensions, where zeros are blanked.

Of particular concern is the block $B_0$ at the lower left corner of $A$. If it were not there then we would have a banded system, i.e. all nonzero entries would be concentrated in a narrow band around the main diagonal. Fortunately, the situation for separated boundary conditions is much better than for the general case, just like in §6.2. If
\[
B_0 = \begin{pmatrix} B_{01} \\ 0 \end{pmatrix}, \qquad B_b = \begin{pmatrix} 0 \\ B_{b2} \end{pmatrix}
\]
where $B_{01}$ has $k$ rows and $B_{b2}$ has $m - k$ rows, then we can simply permute the matrix $A$, putting the rows of $B_{01}$ at the top. The right hand side $r$ is permuted accordingly as well. This also establishes a "time" direction: the lower a row is in the permuted $A$, the larger is the $t$ to which it refers.


[Figure 8.3: Zero-structure of the permuted matrix $A$ with separated boundary conditions, $m = 3$, $k = 2$, $N = 10$; nz = 189.]

In Fig. 8.3 we depict the permuted $A$ corresponding to Fig. 8.2, where two boundary conditions are prescribed at $t = 0$ and one at $t = b$.

A number of methods which require $O(Nm^3)$ flops (instead of the usual $O(N^3 m^3)$) to solve the linear system (8.14) in the case of separated boundary conditions have been proposed in the literature. Here we describe the simplest and crudest of these, and only comment on other methods.

Once permuted, the matrix $A$ can be considered as banded, with $m + k - 1$ diagonals below the main diagonal and $2m - k - 1$ diagonals above the main diagonal possibly having nonzero entries. Outside this total of $3m - 1$ diagonals, all entries of $A$ are 0. Gaussian elimination with partial pivoting extends to the banded case in a straightforward fashion. Simply, two of the three nested loops defining the elimination process are shortened in order not to eliminate elements known to be 0 at the start. The fill-in is only within the banded structure, with the addition of a few diagonals due to the partial pivoting. It is not difficult to write a program to carry out this algorithm. Also, there exists standard software to do this, e.g. in Linpack or Lapack.

If you look at the band containing all nonzero elements in the matrix depicted in Fig. 8.3 you will notice that there are triangles of zeros within the band for each interior mesh point. These zeros are not taken advantage of in the band method just described. Other, more sophisticated methods for solving (8.14) attempt to avoid, or at least minimize, fill-in of these triangles, thereby achieving an additional savings of up to 50% in both storage and computational efficiency.


The fact that all the nonzeros of $A$ densely populate a narrow band as in Fig. 8.3 is typical for boundary value ODEs, where neighboring elements (i.e. subintervals sharing unknowns) can be ordered consecutively. For boundary value PDEs, on the other hand, the band is necessarily much wider and the matrix is sparse inside the band as well. Variants of Gaussian elimination become less effective then, and iterative methods like preconditioned conjugate gradients and multigrid take center stage.

8.3 Higher Order Methods

The midpoint and trapezoidal methods may be considered as basic methods. The only problem in using them as they are for many applications is that they are only second order accurate. There are two types of higher order methods extending the basic methods: higher order Runge-Kutta methods and acceleration techniques. The overview picture is given in Fig. 8.4.

[Figure 8.4: Classes of higher order methods: the basic one-step methods (symmetric or one-sided) are extended either to higher order collocation or by acceleration techniques (deferred correction, extrapolation).]

8.3.1 Collocation

One class of extensions to higher order methods is simply higher order implicit Runge-Kutta methods. Continuing to prefer symmetric methods, this leads to collocation methods at Gauss or Lobatto points. We have already considered the basic properties of these methods in Chapter 4, and everything else pertaining to BVPs extends in a very similar way to the treatment in the previous two sections.


We summarize without repeating the proof:

• Collocation at $s$ Gaussian points is 0-stable with a numerical stability constant satisfying (8.13). It converges with the error bound
\[
|e_n| \le K \max_j |d_j| = O(h^{2s}), \quad 0 \le n \le N. \tag{8.15}
\]

Example 8.3  We repeat the computations of Example 8.1 using collocation at 3 Gaussian points per mesh element. The results are recorded in Tables 8.3 and 8.4.

          λ = 1                λ = 50               λ = 500
   N      error      rate      error      rate      error      rate
  10      .60e-8               .54e-1               .71
  20      .94e-10    6.0       .66e-2     3.0       .50        .50
  40      .15e-11    6.0       .32e-3     4.4       .27        .91
  80      .24e-14    5.9       .73e-5     5.5       .89e-1     1.6

Table 8.3: Maximum errors for Example 8.1 using collocation at 3 Gaussian points: uniform meshes.

          λ = 50               λ = 500
   N      error      rate      error      rate
  10      .25e-2               .54e-3
  20      .12e-3     4.4       .14e-3     1.9
  40      .27e-5     5.5       .75e-4     .90
  80      .40e-7     6.1       .33e-4     1.2

Table 8.4: Maximum errors for Example 8.1 using collocation at 3 Gaussian points: nonuniform meshes.

The errors for $\lambda = 1$ in Table 8.3 reflect the fact that this method is of order 6 and has a nice error constant to boot. For $\lambda = 50$ the errors are fairly good even on the uniform mesh, although they are better on the nonuniform meshes (whose construction is discussed in Example 8.1). For $\lambda = 500$ a nonuniform mesh is certainly needed, see Table 8.4. In fact, a better layer mesh can become useful as well in order to retrieve the full convergence order (which turns out to be 4) that this method has outside the layer regions. □
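Library collocation solvers for BVPs are readily available. As an aside, the following sketch solves the problem of Example 8.2 with scipy.integrate.solve_bvp, which implements a 4th order collocation scheme with automatic mesh refinement; it is not Gauss collocation of order 2s, but it belongs to the same family of methods.

```python
import numpy as np
from scipy.integrate import solve_bvp

def fun(t, y):
    # y has shape (2, npts): y[0] = u, y[1] = u'
    return np.vstack([y[1], -np.exp(y[0] + 1.0)])

def bc(ya, yb):
    return np.array([ya[0], yb[0]])     # u(0) = u(1) = 0

t = np.linspace(0.0, 1.0, 11)
y_guess = np.zeros((2, t.size))
y_guess[0] = 0.5 * t * (1 - t)          # c2 = 0.5 picks the stabler solution
sol = solve_bvp(fun, bc, t, y_guess)
print(sol.status, sol.x.size)           # 0 means converged
```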


Applying quasilinearization to nonlinear problems and considering collocation for the resulting linear ones, we obtain a linear system of equations for the mesh unknowns as well as the internal stages (see (4.8) or (4.10)). Since, unlike in the IVP case, the solution is not known anywhere in its entirety until it is known at all mesh points, one approach is to solve for all mesh values and internal stages simultaneously. This alters the structure of the linear system (8.14), but $A$ is still in a block form and is banded independently of $N$. Alternatively, we can eliminate the internal stages locally, in each mesh subinterval $n$, in terms of the mesh values $y_{n-1}$ and $y_n$. This IVP-style approach is called local elimination, or parameter condensation, in the finite element literature. The remaining global system to be solved, (8.14), has the almost-block-diagonal form (8.9) independently of $s$, which adds an attractive modularity to the process. However, the partial decompositions used for the local elimination stage have to be stored from one nonlinear iteration to the next, so the advantage here is in elegance, not in storage or computational efficiency.

8.3.2 Acceleration Techniques

The other possibility for extending the basic methods to higher order is to stay with the midpoint or the trapezoidal method as the basic discretization method and to accelerate its convergence by applying it more than once. One way of doing this is extrapolation, where the method is applied on more than one mesh and the results are combined to kill off the lower order terms in the error expansion. For instance, if the global error on a given mesh $\pi$ has the form
\[
e_n = y(t_n) - y_n = c\, h_n^2 + O(h^4)
\]
where $c$ may vary (slowly) in $t$ but is independent of $h$, then subdividing each mesh subinterval into two and applying the same method again yields for the solution $\tilde y_{2n}$
\[
y(t_n) - \tilde y_{2n} = \tfrac14 c\, h_n^2 + O(h^4),
\]
so $\frac{4 \tilde y_{2n} - y_n}{3}$ is a 4th order accurate approximate solution. This process can be repeated to obtain even higher order methods (Exercise 8.4).
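In outline, and assuming a routine solve(t) that returns the basic 2nd order (e.g. midpoint) solution on a given mesh, one extrapolation step looks as follows. Here solve is a placeholder for something like the quasilinearization sketch earlier, not a library routine.

```python
import numpy as np

def extrapolate_once(solve, t):
    """One Richardson extrapolation step for a 2nd order discretization:
    combine the solution on mesh t with that on the halved mesh to obtain
    a 4th order approximation at the points of t."""
    y_coarse = solve(t)                                   # values at t_0 .. t_N
    t_fine = np.sort(np.concatenate([t, 0.5 * (t[:-1] + t[1:])]))
    y_fine = solve(t_fine)                                # values on the doubled mesh
    return (4.0 * y_fine[::2] - y_coarse) / 3.0           # (4*ytilde_{2n} - y_n)/3
```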


Another possibility is deferred correction, where the discretization on the same mesh is applied a few times, at each instance using the previous approximation to correct the right hand side by better approximating the local truncation error.

Unlike extrapolation, which uses the expansion in powers of $h$ for the global error, deferred correction uses the corresponding expansion for the local truncation error. For instance, applying the trapezoidal method to (8.1) we obtain (Exercise 8.8)
\[
d_n = \sum_{j=1}^{r} h_n^{2j}\, T_j[y(t_{n-1/2})] + O(h_n^{2r+2}) \tag{8.16}
\]
where
\[
T_j[z(t)] = -\frac{j}{2^{2j-1}(2j+1)!}\, f^{(2j)}(t, z(t)) \tag{8.17}
\]
if $f$ has continuous partial derivatives up to order $2r + 2$ for some positive integer $r$. (For $j = 1$ this gives the familiar leading coefficient $-1/12$.) Now, let $y_\pi = \{y_n\}_{n=0}^N$ be the solution obtained on a given mesh using the trapezoidal method to discretize the stable BVP (8.1)-(8.2), and denote $f_n = f(t_n, y_n)$, as in Chapter 5. Then we can use these values to approximate $T_1$ up to $O(h_n^2)$, e.g.
\[
T_1[y(t_{n-1/2})] \approx T_{1,n-1/2} = \frac{1}{24 h_n^2} \big( -f_{n-2} + f_{n-1} + f_n - f_{n+1} \big), \quad 2 \le n \le N - 1. \tag{8.18}
\]
This can be added to the right hand side of the trapezoidal discretization, i.e. we solve
\[
\frac{\tilde y_n - \tilde y_{n-1}}{h_n} = \frac12 \big( f(t_n, \tilde y_n) + f(t_{n-1}, \tilde y_{n-1}) \big) + h_n^2\, T_{1,n-1/2}, \quad 1 \le n \le N, \qquad g(\tilde y_0, \tilde y_N) = 0.
\]
The local truncation error is now $O(h_n^4)$, because the sum in (8.16) now effectively starts from $j = 2$. Hence the global error is 4th order as well.

As in the case of extrapolation, the deferred correction process can be repeated to obtain higher order approximations. Moreover, all approximations are solved for on the same mesh. For a linearized problem, only one matrix $A$ must be decomposed. Then, in the ensuing iterations which gradually increase the accuracy, only the right hand side vectors are updated, and the corresponding solution iterates are each computed by a pair of forward-backward substitutions. It may look at this point as if we are getting something for nothing! The catch, though, is in having to use more and more cumbersome and accurate approximations to the $T_j$'s. The extrapolation method is more expensive but simpler.
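A sketch of one deferred correction pass on a uniform mesh follows, under the assumption that a trapezoidal BVP solver accepting an extra right-hand-side term is available; the solver itself and the near-boundary intervals, which need the one-sided formulas of Exercise 8.8(b), are left out, and all names are ours.

```python
import numpy as np

def t1_estimate(f_vals, h):
    """Estimate T_{1,n-1/2} from mesh values f_n = f(t_n, y_n) via (8.18)
    on a uniform mesh of width h.  Entry n corresponds to interval n;
    intervals 1 and N would need one-sided formulas and are left zero here."""
    N = len(f_vals) - 1
    T1 = np.zeros(N + 1)
    for n in range(2, N):
        T1[n] = (-f_vals[n - 2] + f_vals[n - 1] + f_vals[n] - f_vals[n + 1]) / (24.0 * h**2)
    return T1

# One correction pass would then read:
#   y  = solve_trapezoidal(correction=0)            # 2nd order solution
#   T1 = t1_estimate(f_at_mesh(y), h)
#   y4 = solve_trapezoidal(correction=h**2 * T1)    # now 4th order
```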


These acceleration methods are useful and important in practice. They have useful counterparts for IVPs and DAEs as well. Methods of both the collocation and the acceleration types just described have been implemented in general-purpose codes. In a broad-brush comparison of the methods, it seems that they share many attributes. Methods of the acceleration type seem to be faster for simple problems, while methods of higher order collocation at Gaussian points do a better job for stiff BVPs.

8.4 More on Solving Nonlinear Problems

Newton's method (or quasilinearization) converges very rapidly if the first iterate is already a sufficiently good approximation of the (isolated) solution. This is the typical case for IVPs, even for stiff problems, where the known value of $y_{n-1}$ is only $O(h_n)$ away from the sought value of $y_n$. But for BVPs no such high-quality initial iterate is generally available, and getting the nonlinear iteration to converge is a major practical challenge. This makes for one of the most important practical differences between general-purpose IVP and BVP solvers. Below we briefly discuss some useful approaches. (Throughout this section we consider a system of nonlinear algebraic equations, and call the vector of unknowns $y_\pi$. The somewhat cumbersome index $\pi$ is there merely to remind us that we seek a mesh function, approximating the solution of the BVP. However, no special properties of the mesh function as such are utilized.)

8.4.1 Damped Newton

For the nonlinear system
\[
h(y_\pi) = 0
\]
Newton's (or the quasilinearization) method at the $\nu$th iteration can be written as solving the linear system
\[
\Big( \frac{\partial h}{\partial y}(y_\pi^\nu) \Big)\, \delta_\pi = -h(y_\pi^\nu)
\]
and forming
\[
y_\pi^{\nu+1} = y_\pi^\nu + \delta_\pi
\]
(see (8.6) and Algorithm 8.1). This can be interpreted as taking a step of length 1 in the direction $\delta_\pi$. If the model on which Newton's method is based (which can be viewed as assuming a local quadratic behavior of an appropriate objective function) is too optimistic, then a smaller step in this direction may be called for. In the damped Newton method we then let
\[
y_\pi^{\nu+1} = y_\pi^\nu + \gamma\, \delta_\pi \tag{8.19}
\]
where the parameter $\gamma$, $0 < \gamma \le 1$, is chosen to ensure a decrease at each iteration in the objective function. For example, we can require
\[
|h(y_\pi^{\nu+1})|^2 \le (1 - \rho\gamma)\, |h(y_\pi^\nu)|^2 \tag{8.20}
\]
where $\rho$ ensures some minimum decrease, e.g. $\rho = 0.01$.


It can be shown theoretically that a sequence $\{\gamma_\nu\}$ can be found under certain conditions (which include nonsingularity of $\frac{\partial h}{\partial y}$ with a reasonable bound on the inverse) such that the damped Newton method converges globally, i.e. from any starting iterate $y_\pi^0$. No such theorem holds for Newton's method without damping, which is assured to converge only locally, i.e. with $y_\pi^0$ "close enough" to the sought solution $y_\pi$ (recall, e.g., Exercise 7.7).

In practice this technique is useful on some occasions, but it is not sufficient for really tough BVPs. Typically, in such tough problems the Newton direction $\delta_\pi$ is so polluted that it makes no sense to step in that direction for any step length. There seems to be no easy substitute for the remedy of finding better initial iterates.
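A minimal damped Newton iteration with the acceptance test (8.20) might look as follows; this is our own sketch, with the residual function h and its Jacobian supplied by the caller.

```python
import numpy as np

def damped_newton(h, jac, y, rho=0.01, tol=1e-10, maxit=50):
    """Damped Newton for h(y) = 0 with the monitor (8.20):
    accept y + gamma*delta only if the residual norm decreases enough."""
    for _ in range(maxit):
        r = h(y)
        if np.linalg.norm(r) < tol:
            break
        delta = np.linalg.solve(jac(y), -r)
        gamma = 1.0
        while gamma >= 2.0**-20:
            trial = y + gamma * delta
            if np.linalg.norm(h(trial))**2 <= (1.0 - rho * gamma) * np.linalg.norm(r)**2:
                break
            gamma *= 0.5            # halve the step and try again
        y = trial                   # last trial is taken even if the line search gave up
    return y
```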


8.4.2 Shooting for Initial Guesses

Often users feel more comfortable supplying only guesses for the initial values $y^0(0)$, rather than an entire solution profile $y^0(t)$. This is all that is required to fire up a simple shooting method. But if the stability of the BVP is such that a shooting method may indeed be used, then one can instead use an initial value code to solve the IVP once for the guessed initial values, obtaining an initial solution profile $y^0(t)$. Then a quasilinearization iteration for a finite difference method may be started.

This idea is a trick of convenience, and it has obvious limitations. A more powerful (and more expensive) approach is to develop an appropriate initial solution gradually, solving a sequence of BVPs. The latter idea is discussed next.

8.4.3 Continuation

This approach is powerful and general. We embed the given problem in a family of problems
\[
\Phi(y_\pi; \tau) = 0, \quad \tau_0 \le \tau \le \tau_1, \tag{8.21}
\]
where the problem $\Phi(y_\pi; \tau_0) = 0$ is easy to solve and $\Phi(y_\pi; \tau_1) = h(y_\pi)$. Under suitable conditions this defines a homotopy path from an easy problem to the given problem, which we traverse numerically. Thus, we solve at each continuation step the problem
\[
\Phi(y_\pi; \tau + \Delta\tau) = 0
\]
(call its solution $y_\pi(t; \tau + \Delta\tau)$), given the solution $y_\pi(t; \tau)$, where $\Delta\tau$ is a sufficiently small step-size in $\tau$. The simplest use of $y_\pi(t; \tau)$ is as a first iterate $y_\pi^0(t; \tau + \Delta\tau) = y_\pi(t; \tau)$, but it is possible to get fancier.

This approach can be very successful in practice, although it can become expensive and it seems hard to automate for really difficult problems. The big question is defining the family of problems (8.21). The homotopy path has to somehow parameterize the problem well, and automatic choices such as a simple interpolation between $\tau_0$ and $\tau_1$ typically do not work well in this sense. Fortunately, there often exists a natural parameterization and embedding of this sort in applications.

Example 8.4  Often a nonlinear BVP results from the need to find a steady state solution of a time-dependent partial differential equation in one space variable. Solving the PDE, starting from some initial solution profile, can then be considered as a continuation method for the steady state problem. For instance, consider the diffusion problem of Example 1.3,
\[
\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\Big( p \frac{\partial u}{\partial x} \Big) + g(x, u).
\]
For a steady state solution, setting $\frac{\partial u}{\partial t} = 0$ yields the ODE in $x$
\[
0 = (p u')' + g(x, u),
\]
where the prime ($'$) denotes differentiation with respect to the independent variable $x$. This ODE is typically equipped with one boundary condition at each end of the interval in $x$. Now, solving this nonlinear BVP numerically can be achieved by discretizing the space variable of the PDE while keeping $\frac{\partial u}{\partial t}$ in, and then applying the method of lines in $t$.

The time-embedding continuation method is very natural, but it can be very slow. One can often solve the steady state problem at a tiny fraction of the cost of solving the PDE. But this method has the attraction of generality: a straightforward numerical method is applied, regardless of how difficult the nonlinearity resolution is. Moreover, playing with different initial values may lead to different steady states (in cases where there is more than one such solution), perhaps in an intuitive way. □

The continuation technique opens the door to a variety of interesting topics such as path following and constructing bifurcation diagrams, but stepping through that door would lead us outside the scope of this book, so we merely give a simple example here.
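The basic continuation loop is simple; the entire difficulty hides in the choice of the family (8.21) and of the steps $\Delta\tau$. Here is a fixed-step sketch, with solve_at(tau, y_guess) standing for a nonlinear BVP solve at frozen tau (e.g. Algorithm 8.1); both names are placeholders of ours.

```python
def continuation(solve_at, y_easy, tau0, tau1, nsteps=50):
    """Traverse the homotopy path (8.21) with fixed steps in tau, feeding
    each converged solution in as the first iterate at the next step."""
    y = y_easy                          # solution of the easy problem at tau0
    dtau = (tau1 - tau0) / nsteps
    for i in range(1, nsteps + 1):
        y = solve_at(tau0 + i * dtau, y)
    return y
```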


Example 8.5  The problem considered in Examples 8.2 and 6.2 certainly requires no fancy means to find its two solutions, once we have an idea that there are two such solutions and roughly what they look like. But to find them, and much more, automatically, we embed the problem in the family of problems
\[
u'' + \lambda e^u = 0, \qquad u(0) = u(1) = 0, \tag{8.22}
\]
and consider choosing the continuation parameter $\tau = \lambda$. As it turns out, continuation in $\lambda$ starting from $\lambda = 0$ (where the solution is $u \equiv 0$) leads to the stabler of the two solutions for $\lambda = e$ (which is the value of $\lambda$ considered in Example 8.2). Continuing with $\lambda$ further, the problem becomes singular at some $\lambda = \lambda^* \approx 3.5$. What happens is that two solutions approach each other as $\lambda$ increases and then cease to be isolated.

A more general continuation procedure uses arclength along the homotopy path for the continuation parameter $\tau$. This is the preferred procedure for a general-purpose implementation, but it is again beyond the scope of our presentation. However, for this example there is also a simple trick: instead of doing the continuation with $\tau = \lambda$ of (8.22), use $\tau = \|u\|_2$. Thus, consider the embedding
\[
u'' + \lambda e^u = 0, \qquad \lambda' = 0, \qquad w' = u^2, \tag{8.23}
\]
\[
u(0) = u(1) = 0, \qquad w(0) = 0, \quad w(1) = \tau^2.
\]
Carrying out the continuation process for this system (8.23) from $\tau = 0$ to $\tau = 8$ yields the bifurcation diagram depicted in Fig. 8.5, where the computation for each $\tau$ was carried out using a standard BVP code. (Collocation at 3 Gaussian points was utilized and the problem was solved for 800 equidistant values of $\tau$. This does not take long: less than a minute in total on an SGI Indigo2 R4400.) Fig. 8.5 clearly suggests that for $\lambda < \lambda^*$ there are two solutions, for $\lambda = \lambda^*$ there is one, and for $\lambda > \lambda^*$ there are none. The type of singularity which occurs in this example at $\lambda = \lambda^*$ is called a fold. □

8.5 Error Estimation and Mesh Selection

Another key ingredient to the success of IVP solvers which is lost in BVP solvers is the ability to control local errors locally, as the solution process proceeds from $t = 0$ to $t = b$.


[Figure 8.5: Bifurcation diagram for Example 8.5: $\|u\|_2$ vs $\lambda$.]

To capitalize on this, IVP solvers often abandon controlling the global error $e_n$ (although this is usually what the user may want) and control the local truncation error $d_n$ instead.

In the BVP case there is no compelling reason, in general, not to estimate the global error $e_n$. Such an estimate is compared against user-specified tolerances or used to select a new mesh. The process in overview for a given BVP is to discretize and solve it on a sequence of meshes, where the error in the solution on the current mesh is estimated and this information is used to decide what the next mesh should be, in case there is a deemed need for a next mesh. The first mesh is a guess.

The error estimation can be achieved using a process similar to the one described for extrapolation methods in §8.3.2. For instance, given a midpoint or a trapezoidal solution $\{y_n\}$ on a mesh $\pi$ and another one $\{\tilde y_j\}$ on the mesh obtained by subdividing each element of $\pi$ into two halves, we have
\[
\tilde y_{2n} - y_n = \tfrac34 c\, h^2 + O(h^4).
\]
So
\[
e_n = y(t_n) - y_n \approx \tfrac43\, (\tilde y_{2n} - y_n)
\]
and
\[
\tilde e_{2n} = y(t_n) - \tilde y_{2n} \approx \tfrac13\, (\tilde y_{2n} - y_n).
\]
(Note that we do not get a good error estimate for the 4th order extrapolated solution $\frac13(4\tilde y_{2n} - y_n)$: the error estimates are tight only for the lower order approximations. Similarly, using midpoint solutions on 3 meshes it is possible to obtain a 4th order approximation with an error estimate, or a 6th order approximation without a tight error estimate.)


For the deferred correction approach the local truncation error is estimated as part of the algorithm. A global error estimate can be directly obtained as well, by solving with the truncation error as the right hand side.

It is also possible to construct some cruder indicators for the error $e_n$ on a given mesh without recomputing another solution. This can be done by taking advantage of the form of the error if it has a local leading term, or by considering arclength or other error-monitoring functions, e.g. $e_n = h_n^k\, |y^{(k)}(t_n)|$ for some $1 \le k \le p$, with $p$ the order of accuracy. Such monitor functions may be sufficient to select a mesh, even if they do not provide a genuine, reliable error estimate.

Given such an error estimate or indicator, the next mesh is selected based on the principle of error equidistribution, where one attempts to pick the mesh such that the resulting solution will satisfy
\[
|e_i| \approx |e_j|, \quad 1 \le i, j \le N.
\]
This essentially minimizes $\max_n |e_n|$ for a given mesh size. The mesh size $N$ is further selected so that
\[
\max_n |e_n| \le \text{ETOL}
\]
for a user-specified error tolerance ETOL.
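The selection of mesh points that equidistributes a positive error monitor can be written compactly: accumulate the monitor over the current mesh, then invert the accumulated function at equal increments. A sketch (all names ours):

```python
import numpy as np

def equidistribute(t, e, N_new):
    """t: current mesh (N+1,); e: positive per-interval error indicators (N,).
    Returns a new mesh of N_new subintervals, each carrying roughly an
    equal share of the accumulated monitor."""
    phi = np.concatenate([[0.0], np.cumsum(e)])        # monotone accumulation
    targets = np.linspace(0.0, phi[-1], N_new + 1)     # equal shares of the total
    return np.interp(targets, phi, t)                  # invert phi over the mesh
```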


Reader's advice: The technical level and expertise required of the reader for the next section is a touch higher than what has been required so far in this chapter, and it gets even higher in §8.7. But these sections are important and are worth the extra effort.

8.6 Very Stiff Problems

As in the IVP case we expect the midpoint (or trapezoidal) method to be robust when large eigenvalues with both negative and positive real parts are present, so long as the layers are resolved. We have already seen a demonstration in Examples 8.1 and 8.3. The properties of symmetric methods are essentially similar for the IVP and the BVP cases.

For stiff initial value problems we prefer methods with stiff decay, such as BDF or collocation at Radau points. Unfortunately, it is not possible to attain this property automatically in the BVP case, because methods with stiff decay cannot be symmetric.

For the IVP test equation
\[
y' = \lambda y, \quad \operatorname{Re}(\lambda) < 0,
\]
we prefer, when $h \operatorname{Re}(\lambda) \ll -1$, to use a method like backward Euler,
\[
y_n = y_{n-1} + h \lambda y_n.
\]
Similarly, changing the direction of integration for the unstable IVP $y' = -\lambda y$ to $\tau = t_N - t$, we get
\[
\frac{d\tilde y}{d\tau} = \lambda \tilde y
\]
and applying backward Euler to the equation in $\tilde y$ readily yields the forward Euler method for the original variable,
\[
y_n = y_{n-1} - h \lambda y_{n-1}.
\]
For the system
\[
y' = \begin{pmatrix} \lambda & 0 \\ 0 & -\lambda \end{pmatrix} y
\]
it then makes sense to use upwinding:
\[
y_{1,n} = y_{1,n-1} + h \lambda y_{1,n}, \qquad y_{2,n} = y_{2,n-1} - h \lambda y_{2,n-1}.
\]
(The term upwind originates from computational fluid dynamics, where the direction of stable integration corresponds to the upwind direction, i.e. against the wind direction of the flow. This type of discretization has also been called upstream, a name naturally arising from applications where what flows is liquid. See Exercise 8.11.)

For a general stiff problem, unfortunately, the increasing and decreasing modes are coupled together, and there are also slow solution modes for which a higher order symmetric discretization method is perfectly suitable. Consider the general linearized differential system related to (8.1),
\[
y' = A(t)\, y + q(t)
\]
where $A(t) = \frac{\partial f}{\partial y}$, and define the transformation, for some nonsingular, sufficiently smooth matrix function $T(t)$,
\[
w = T^{-1} y.
\]


Then $w(t)$ satisfies the ODE
\[
w' = (T^{-1} A T - T^{-1} T')\, w + T^{-1} q.
\]
Now, if $T$ is such that the transformed matrix can be written in the block form
\[
T^{-1} A T - T^{-1} T' = \begin{pmatrix} B_1 & 0 & 0 \\ 0 & B_2 & 0 \\ 0 & 0 & B_3 \end{pmatrix}
\]
where

• $B_1$ is dominated by eigenvalues with large negative real parts,
• $B_2$ is dominated by eigenvalues with large positive real parts,
• $\|B_3\|$ is not large,

then we can use backward Euler for $w_1$, forward Euler for $w_2$ and the trapezoidal method for $w_3$, where $w_i$ corresponds to the equations involving $B_i$ and $w^T = (w_1^T, w_2^T, w_3^T)$. The equations resulting from such a discretization need not be solved for $w$. Rather, the back-transformation from $w$ to $y$ is used to transform them into difference equations for $y$.

Example 8.6  The stable BVP
\[
y' = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} \begin{pmatrix} \lambda & 1 \\ -1 & -\lambda \end{pmatrix} \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} y, \qquad y_1(0) = 1, \quad y_1(1) = 2,
\]
is stiff when $\operatorname{Re}(\lambda) \ll -1$. Applying forward Euler or backward Euler to this ODE with a step size $h$ yields disastrous results when $h \operatorname{Re}(\lambda) < -2$. But for $w = T^{-1} y$, where
\[
T(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix},
\]
we obtain the decoupled system
\[
w' = \begin{pmatrix} \lambda & 0 \\ 0 & -\lambda \end{pmatrix} w
\]
and the upwind method described above can be applied, yielding a very stable discretization.


We write the upwind method in matrix form as
\[
\begin{pmatrix} 1 - h\lambda & 0 \\ 0 & 1 \end{pmatrix} w_n = \begin{pmatrix} 1 & 0 \\ 0 & 1 - h\lambda \end{pmatrix} w_{n-1}.
\]
Defining for each $n$
\[
y_n = T_n w_n = \begin{pmatrix} \cos t_n & \sin t_n \\ -\sin t_n & \cos t_n \end{pmatrix} w_n,
\]
the obtained upwind method for $y_\pi$ is
\[
\begin{pmatrix} 1 - h\lambda & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \cos t_n & -\sin t_n \\ \sin t_n & \cos t_n \end{pmatrix} y_n = \begin{pmatrix} 1 & 0 \\ 0 & 1 - h\lambda \end{pmatrix} \begin{pmatrix} \cos t_{n-1} & -\sin t_{n-1} \\ \sin t_{n-1} & \cos t_{n-1} \end{pmatrix} y_{n-1}, \quad 1 \le n \le N,
\]
with $y_{1,0} = 1$, $y_{1,N} = 2$.

In Fig. 8.6 we display the approximate solution using the upwind method just described and a uniform mesh, with $\lambda = -1000$, $h = 0.1$. We also display the "exact" solution, obtained using the code colnew, employing collocation at 3 Gaussian points per subinterval on a nonuniform mesh with 52 subintervals. (The code selects the mesh automatically, satisfying a global error tolerance of 1.e-5.) Note that despite the fact that the boundary layers are totally missed (i.e., skipped over), the solution values at mesh points are approximated well by the upwind method, in analogy to the IVP case depicted in Fig. 3.6. □

The upwind discretization method outlined above, and other similar methods of higher order, work very well for special classes of problems (e.g., gas dynamics in PDEs). Layer details can be skipped, as with the backward Euler method in Chapter 3. But finding the transformation $T$ in general and applying it in practice are major obstacles. For linear problems there are recipes for this, but the general case involves such an additional amount of work that straightforward collocation at Gaussian points is often more efficient. Things are worse for nonlinear problems, where the entire process is difficult because the linearization is based on unknowns, and a mistake in the sign of a fast mode is a disaster akin to simple shooting. (Alternatively, upon using quasilinearization, stable but entirely wrong linear problems are solved, so the nonlinear iteration may not converge.)
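For the record, the upwind scheme of Example 8.6 is itself a global boundary value computation: the two-point recurrences above, together with the separated boundary conditions, form one linear system. A sketch of its assembly and solution (names ours; a band solver would replace the dense solve in practice):

```python
import numpy as np

lam, N = -1000.0, 10
h = 1.0 / N
t = np.linspace(0.0, 1.0, N + 1)
Tinv = lambda s: np.array([[np.cos(s), -np.sin(s)], [np.sin(s), np.cos(s)]])

A = np.zeros((2 * (N + 1), 2 * (N + 1)))
r = np.zeros(2 * (N + 1))
for n in range(1, N + 1):
    L = np.diag([1.0 - h * lam, 1.0]) @ Tinv(t[n])        # multiplies y_n
    R = np.diag([1.0, 1.0 - h * lam]) @ Tinv(t[n - 1])    # multiplies y_{n-1}
    row = 2 * (n - 1)
    A[row:row + 2, 2 * (n - 1):2 * n] = -R
    A[row:row + 2, 2 * n:2 * n + 2] = L
# separated boundary conditions y_1(0) = 1, y_1(1) = 2
A[2 * N, 0] = 1.0;          r[2 * N] = 1.0
A[2 * N + 1, 2 * N] = 1.0;  r[2 * N + 1] = 2.0
y = np.linalg.solve(A, r).reshape(N + 1, 2)   # mesh values of (y_1, y_2)
```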


[Figure 8.6: Solution $y_1(t)$ for Example 8.6 with $\lambda = -1000$ using an upwind discretization with a uniform step size $h = 0.1$ (solid line). The "exact" solution is also displayed (dashed line).]

For symmetric difference methods no explicit decoupling transformation is usually needed. But, as mentioned earlier, the lack of stiff decay can be evident in computations. For the test equation
\[
y' = \lambda y
\]
the midpoint method yields, to recall,
\[
y_n = \frac{2 + h\lambda}{2 - h\lambda}\, y_{n-1}.
\]
So, although $|y_n| \le |y_{n-1}|$ precisely whenever the exact solution satisfies $|y(t_n)| \le |y(t_{n-1})|$, as $h|\lambda| \to \infty$ we get
\[
y_n \approx -y_{n-1}.
\]
Thus, if $y(0) = 1$ and $h \operatorname{Re}(\lambda) \ll -1$ then the exact solution satisfies $y(h) \approx 0$, yet for the numerical solution,
\[
y_n \approx (-1)^n, \quad n = 1, \dots, N.
\]
This not only necessitates covering layer regions (where the solution varies rapidly) with dense meshes, as we have already seen in Examples 3.2 and 8.1; it also means that in the very stiff case local errors propagate through smooth solution regions (where $h$ is relatively large) almost undamped, yielding nonlocal error effects.


There are some exotic examples where a symmetric method may even blow up when approximating a stable BVP or even a stable IVP (see Exercise 8.14). These seem to be rare in practice, though. Moreover, and perhaps more importantly, since there is almost no quadrature effect when $0 < |\lambda^{-1}| \ll 1$ in a stable problem of the form
\[
y' = \lambda (y - q(t))
\]
(note that the solution is $y(t) = q(t) + O(|\lambda^{-1}|)$, and the quadrature precision affects only the $O(|\lambda^{-1}|)$ term), methods which are based on high-precision quadrature may experience order reduction away from collocation points; see §4.7.3 and Exercise 8.13. For collocation at $s$ Gaussian points we can get, assuming no layer errors and $0 < |\operatorname{Re}(\lambda)|^{-1} \ll h \ll 1$, the error bound
\[
y(t_n) - y_n = O(h^s).
\]
This error estimate improves to
\[
y(t_n) - y_n = O(h^{s+1})
\]
if $s$ is odd and some mesh restrictions apply. But this still falls short of the usual nonstiff order $O(h^{2s})$ when $s > 1$. For Example 8.3, the effective order is $s + 1 = 4$ (instead of $2s = 6$) when $h\lambda$ is very large.

Despite their limitations, as candidates for constructing a general-purpose solver symmetric discretizations seem to win. It appears that upwinding techniques should be reserved for special problems where the explicit decoupling of modes can proceed with relative ease; see Exercises 8.11 and 8.12.

8.7 Decoupling

The concept of decoupling of modes of distinctly different types is fundamental to the understanding of numerical discretizations for differential equations. But it seems to be particularly important in the context of boundary value ODEs, so we briefly discuss it here. This section can be viewed as a road map for numerical methods for BVPs.

As we have seen in §6.2, a stable linear BVP must have a dichotomy, i.e. a certain number of its fundamental modes are nonincreasing and the rest are nondecreasing, throughout the interval $[0, b]$ on which the differential problem is defined.


Slow modes can be grouped either with the rapidly increasing or with the rapidly decreasing modes, but this is possible only locally, i.e. a mode can change from fast to slow and vice versa in different subintervals of $[0, b]$.

What must be avoided, then, is a numerical integration of fast modes in the direction of their increase. When the modes are decoupled, as e.g. in the system
\[
y' = \begin{pmatrix} \lambda & \beta \\ 0 & -\lambda \end{pmatrix} y \tag{8.24}
\]
where $\operatorname{Re}(\lambda) > 0$ and $|\beta|$ is not much larger than $|\lambda|$, then the modes can be integrated in suitable directions: for (8.24) the second ODE is integrated from 0 to $b$ and then the first is integrated from $b$ to 0. In the general case where the modes are not decoupled, some decoupling must be applied. The simple shooting method does not apply any decoupling, hence it is unstable in the presence of fast (not even very fast) increasing and decreasing modes.

For the multiple shooting method discussed in §7.2 and the finite difference methods discussed in the early sections of this chapter, stability is proved provided that sufficiently many shooting points or mesh points are used. This is the limit where these methods are all similar to one another and bounds like (8.13) apply, indicating that the discretization follows the continuous system closely. The decoupling of modes is then achieved implicitly, through the solution of the linear system of algebraic equations. Recall from §8.2 that this system has an almost block diagonal form (Fig. 8.3) corresponding to the sequential ordering of the mesh points. So, when performing $LU$-decomposition and forward and backward substitutions in order to solve a system like (8.14), we are in effect sweeping forward and then backward along the interval of integration. The $LU$-decomposition itself can be seen to correspond to a decoupling transformation along the lines given by the dichotomy bound (6.16).

The great robustness of symmetric difference methods arises from the possibility of achieving the decoupling effect implicitly, i.e. without an explicit transformation, even for stiff BVPs. But for very stiff problems a method like midpoint also tends to transform fast increasing and decreasing modes into slower ones, when the step size is not very small.

Some unfortunate effects may also result when fast and slow modes are not suitably decoupled by the numerical method. This may occur already for stable IVPs (Exercise 8.14 is a case in point), but such problems are rarer in practice and, moreover, the decoupling must be done locally, hence explicitly, as described in §8.6, which is not very practical for many applications. Trouble can happen also for DAEs when different solution components are not properly decoupled, as we will see in the next two chapters.


8.8 Software, Notes and References

8.8.1 Notes

Much of the early theoretical development of the theory of numerical methods for boundary value problems was done by H. Keller and appears in [59], as well as in the more modern reference book [8].

A lot of work was done on the numerical solution of second order two-point boundary value problems. Often a single ODE is considered in this context. Many papers on numerical methods for stiff BVPs of this sort have appeared, probably both because of the relevance to advection-diffusion PDEs where advection dominates and because of the relative tractability of these problems compared to the general stiff system case. We have devoted a series of exercises to this (Exercises 8.9-8.12), and refer for more to [8] and the references therein. But for our main exposition we consider the general ODE system case, which naturally extends our discussion in the previous IVP and shooting chapters. All of the material covered in this chapter, including proofs and references which we have omitted here (plus, be warned, much more!) can be found in [8].

The linear system solvers used in all the leading software are more sophisticated than the band solver that we have described. See Chapter 7 of [8] and the references therein. For the parallel solution of such systems, see Wright [97].

A thorough treatment of discretization methods and their asymptotic expansions can be found in [89]. See also the early book [41]. V. Pereyra made fundamental contributions to the theory of deferred corrections. An important work in the early development of collocation at Gaussian points is de Boor & Swartz [36], although the later treatment in [8] is cleaner.

The earliest uses of the principle of error equidistribution seem to have been made in de Boor [35]; see also [70]. (The apparent addition of the word 'equidistribution' to the English language is due to M. Lentini and V. Pereyra.)

Major contributions on decoupling in BVPs were made in the 1980's by R. Mattheij and appear in [8].

In Chapter 11 of [8] there is a brief description, plus relevant references, of a number of topics which we have omitted here, except in the occasional exercise. These include eigenvalue problems; singular BVPs; BVPs on infinite intervals; singular points, bifurcation and arclength continuation; and highly oscillatory BVPs.

Finally, while we have treated finite difference methods exclusively in this book, there has been much theoretical development on finite element methods as well (see, e.g., [90, 20]). The power of the latter methods, however, appears to be more pronounced in the PDE context.


8.8.2 Software

Most general-purpose codes for BVPs which are publicly available use the methods described in this chapter.

• The code colsys by Ascher, Christiansen & Russell [5] and its newer version colnew by Bader & Ascher [13] use collocation at Gaussian points. This code is available from netlib.
• Also available from netlib is the code twpbvp by Cash & Wright [30], which uses deferred correction in combination with certain non-collocation Runge-Kutta methods called mono-implicit, which we have not covered in this book.
• The NAG library contains the code pasvar by Lentini & Pereyra [63], which also uses deferred correction. This code has been influential for many years.
• The code auto, by Doedel & Kernevez [38], which does bifurcation analysis and finds periodic solutions, is based on Gauss collocation.

8.9 Exercises

1. Show that the formulation of the quasilinearization method using $\delta$ as defined in (8.6)-(8.7) is equivalent to the formulation using (8.5).

2. Carry out the development of theory and practice as in §8.1 for the trapezoidal method (3.32) instead of the midpoint method.

3. (a) Write down the quasilinearization (or the linearization) problem (8.6) for the BVP (8.23).
   (b) Show that this linearized problem is singular (i.e. it does not have a unique solution) when it is carried out about $u \equiv 0$. Conclude that starting the quasilinearization iteration with the initial guess $u^0 \equiv 0$ is unwise in this example.

4. It can be shown that the error when applying the trapezoidal method to a sufficiently smooth BVP (8.1)-(8.2) has the expansion
\[
e_n = y(t_n) - y_n = \sum_{j=1}^{l} c_j h_n^{2j} + O(h^{2l+1}) \tag{8.25}
\]
where $h = \max_n h_n$ on a mesh $\pi$ which satisfies
\[
h / \min_n h_n \le \text{constant}.
\]
The functions $c_j$ are independent of the mesh $\pi$. Just how large $l$ is depends on the smoothness of the problem, and we assume $l \ge 3$.


   (a) Construct a method of order 6 using extrapolation, based on the trapezoidal method.
   (b) Apply this extrapolation method to the problem of Examples 8.1 and 8.3, using the same parameter values and meshes. Compare with collocation at 3 Gaussian points (Example 8.3). What are your conclusions?

5. Use your code from the previous exercise, or any available software based on the methods discussed in this chapter, to solve the following problems to about 5-digit accuracy.
   (a) Find a nontrivial solution for the problem (7.13) of Exercise 7.4.
   (b) Find the attracting limit cycle and the period of the Van der Pol equation (7.15).
   (c) Solve (8.23) for $\tau = 1$. What is the corresponding value of $\lambda$?

6. The injected fluid flow through a long, vertical channel gives rise to the BVP
\[
u'''' = R\, (u' u'' - u\, u'''), \qquad u(0) = u'(0) = 0, \quad u(1) = 1, \quad u'(1) = 0,
\]
where $u$ is a potential function and $R$ is a given (constant) Reynolds number. Use your code from the previous exercise, or any available software (we suggest that it be based on the methods discussed in this chapter), to solve this problem for 4 values of $R$: $R = 10, 100, 1000$ and $R = 10{,}000$. Observe the increased difficulty, due to a boundary layer near the left boundary, as $R$ increases.

7. Consider the following particle diffusion and reaction system,
\[
T'' + \frac{2}{t}\, T' = -\beta \phi^2\, C\, e^{\gamma(1 - T^{-1})}, \qquad C'' + \frac{2}{t}\, C' = \phi^2\, C\, e^{\gamma(1 - T^{-1})},
\]
where $C(t)$ is the concentration and $T(t)$ is the temperature. Representative values for the constants are $\gamma = 20$, $\beta = 0.02$, $\phi = 14.44$. The boundary conditions at $t = 0$ are
\[
T'(0) = C'(0) = 0.
\]


Use any available software (we suggest that it be based on the methods discussed in this chapter) to solve this problem for the following sets of additional boundary conditions:
   (a) $T'(1) = C'(1) = 1$.
   (b) $\mu\, T'(1) = -(T(1) - 1)$, $\nu\, C'(1) = -(C(1) - 1)$, with $\mu = 5$, $\nu = 250$.
   [This case may cause you more grief. Note that there is a thin boundary layer near $t = 1$.]

8. (a) Show that the error expansion (8.16)-(8.17) holds for the trapezoidal method.
   (b) The centered approximation (8.18) is not good near the boundary, e.g. for $n = 1$. Construct one-sided, 2nd order accurate difference approximations for the near-boundary points.
   (c) To what order should $T_1$ and $T_2$ be approximated in order to achieve a 6th order deferred correction method? Explain why you need the 4th order $\tilde y_n$, not just the 2nd order $y_n$, to construct such higher order approximations for the truncation error terms.

9. Consider the scalar ODE of order 2,
\[
-(a(t) u')' + b(t) u = q(t), \qquad u(0) = b_1, \quad u(1) = b_2, \tag{8.26}
\]
where $a > 0$, $b \ge 0$ for all $t$. We convert this ODE into a first order system without differentiating $a$ by
\[
u' = a^{-1} v, \qquad v' = b u - q.
\]
   (a) Show that if we discretize the first order system by the midpoint method we obtain a 5-diagonal matrix $A$.
   (b) Consider instead a staggered midpoint method: on a uniform mesh, the equation for $u$ is centered at $t_{n-1/2}$ and the equation for $v$ is centered at $t_n$, with $u_\pi$ defined at mesh points and $v_\pi$ at midpoints:
\[
(u_n - u_{n-1})/h = a^{-1}(t_{n-1/2})\, v_{n-1/2}, \qquad (v_{n+1/2} - v_{n-1/2})/h = b(t_n)\, u_n - q(t_n).
\]
   Show that by eliminating the $v$-values we obtain for the mesh values in $u$ a tridiagonal matrix $A$. Under what condition are we assured that it is diagonally dominant?


   (c) The usual 3-point formula for discretizing (8.26) becomes first order if the mesh is no longer uniform. Generalize the staggered midpoint method developed above to obtain a 2nd order accurate, 3-point method for $u$ on an arbitrary mesh.
   [Hint: You can use quadratic interpolation of three adjacent mesh values of $u$ without changing the sparsity structure of $A$.]
   (d) Try your method on the problem given by $a = 1 + t^2$, $b = 1$, $u(t) = \sin(t)$ (calculate the appropriate $q(t)$ and boundary values required for this exact solution). Compute maximum errors on three meshes:
   • a uniform mesh with $h = .01$,
   • a uniform mesh with $h = .02$,
   • a nonuniform mesh with 100 subintervals, where the step sizes are chosen by a random number generator, scaled and translated to lie between $.01$ and $.02$.
   What are your observations?

10. For the second order ODE system
\[
y'' = f(t, y)
\]
we can consider the linear 3-point methods
\[
\alpha_0 y_{n+1} + \alpha_1 y_n + \alpha_2 y_{n-1} = h^2 (\beta_0 f_{n+1} + \beta_1 f_n + \beta_2 f_{n-1}) \tag{8.27}
\]
where we use the notational convention of Chapter 5 and set $\alpha_0 = 1$. (Note that these methods are compact: the order of the difference equation is 2, just like the order of the ODE, so there are no parasitic roots for the stability polynomial here.)
   (a) Derive order conditions (as in §5.2) for (8.27).
   (b) Show that to obtain a consistent method (with a constant $h$) we must set $\alpha_1 = -2$, $\alpha_2 = 1$, as in the usual discretization for $y''$, and
\[
\beta_0 + \beta_1 + \beta_2 = 1.
\]
   (c) Show that to obtain a second order method we must set in addition
\[
\beta_0 = \beta_2.
\]
   In particular, the usual formula with $\beta_0 = \beta_2 = 0$ and $\beta_1 = 1$ is second order accurate.


   (d) Show that Cowell's method
\[
y_{n+1} - 2 y_n + y_{n-1} = \frac{h^2}{12} (f_{n+1} + 10 f_n + f_{n-1}) \tag{8.28}
\]
   is 4th order accurate.
   (e) Describe in detail an implementation of the method (8.28) for the Dirichlet BVP, where $y(0)$ and $y(b)$ are given.

11. Consider the scalar Dirichlet problem
\[
-\varepsilon u'' + a u' = q(t), \qquad u(0) = b_1, \quad u(1) = b_2,
\]
where $a \ne 0$ is a real constant and $0 < \varepsilon \ll 1$. Assume for simplicity that there are no boundary layers (i.e. the values $b_1$ and $b_2$ agree with the reduced solution satisfying $u' = q/a$), and consider 3-point discretizations on a uniform mesh with step size $h = \frac{1}{N+1}$,
\[
\alpha_n u_n = \beta_n u_{n-1} + \gamma_n u_{n+1} + q_n, \quad 1 \le n \le N, \qquad u_0 = b_1, \quad u_{N+1} = b_2.
\]
The solution for $u_\pi = (u_1, \dots, u_N)^T$ requires solving a linear tridiagonal system with the matrix
\[
A = \begin{pmatrix} \alpha_1 & -\gamma_1 & & & \\ -\beta_2 & \alpha_2 & -\gamma_2 & & \\ & \ddots & \ddots & \ddots & \\ & & -\beta_{N-1} & \alpha_{N-1} & -\gamma_{N-1} \\ & & & -\beta_N & \alpha_N \end{pmatrix}.
\]
It is desirable that $A$ be diagonally dominant. A related, important requirement is that the method be positive:
\[
\alpha_n > 0, \quad \beta_n \ge 0, \quad \gamma_n \ge 0, \quad \forall n.
\]
(This implies a discrete maximum principle which yields stability.)
   (a) A symmetric, or centered, 2nd order discretization is given by
\[
\frac{\varepsilon}{h^2} (-u_{n-1} + 2 u_n - u_{n+1}) + \frac{a}{2h} (u_{n+1} - u_{n-1}) = q(t_n).
\]
   Show that $A$ is diagonally dominant and the method is positive if and only if
\[
R = \frac{|a| h}{\varepsilon} \le 2
\]
   ($R$ is called the mesh Reynolds number).


   (b) An upwind method is obtained by replacing the discretization of $u'$ with forward or backward Euler, depending on $\operatorname{sign}(a)$:
\[
\frac{\varepsilon}{h^2} (-u_{n-1} + 2 u_n - u_{n+1}) + \frac{a}{h}\, \delta_n = q(t_n), \qquad \delta_n = \begin{cases} u_{n+1} - u_n, & a < 0 \\ u_n - u_{n-1}, & a \ge 0. \end{cases}
\]
   Show that this method is positive and $A$ is diagonally dominant for all $R \ge 0$. It is also only first order accurate.
   (c) Show that
\[
\frac{a}{h}\, \delta_n = a\, \frac{u_{n+1} - u_{n-1}}{2h} + \eta_n
\]
   where $\eta_n$ is the 3-point discretization of $-\frac{h|a|}{2} u''$. The upwind method can therefore be viewed as adding an $O(h)$ artificial diffusion term to the centered discretization.

12. This exercise continues the previous one.
   (a) Extend the definitions of the centered and upwind 3-point discretizations to the ODE
\[
-\varepsilon u'' + a(t) u' + b(t) u = q(t), \quad 0 < t < 1,
\]
   where $a(t)$ varies smoothly and can even change sign on $(0, 1)$, and $b(t)$ is a smooth, bounded function. What happens when $a(t) = t - \frac12$ and $0 < \varepsilon \ll h \ll 1$? What happens when there are boundary or turning-point layers?
   (b) Extend the definitions of centered and upwind 3-point discretizations to the nonlinear problem
\[
-\varepsilon u'' + u u' + b(t) u = q(t), \quad 0 < t < 1, \qquad u(0) = -1, \quad u(1) = 1.
\]
   (c) When $R < 2$ the centered method is preferred because of its accuracy. When $R > 2$ the upwind method has superior stability properties. Design a method which mixes the two and gradually switches between them, adding at each mesh point just enough artificial diffusion to achieve positivity, for any values of $\varepsilon$, $h$, and $a$ or $u$.

13. (a) Write down the midpoint method, on an arbitrary mesh $\pi$, for the scalar ODE
\[
y' = \lambda (y - q(t)), \quad 0 < t < b,
\]
   where $q(t)$ is a smooth, bounded function and $\lambda \ll -1$. Consider the IVP case with $\lambda h_n \ll -1$, where $h$ is the maximum step size ($h = \max_{1 \le n \le N} h_n$). Assume no initial layer, so $|y''|$ and higher solution derivatives are bounded independently of $\lambda$.


   (b) Show that the local truncation error satisfies
\[
d_n = -\frac{z_n h_n}{8} \big( y''(t_n) + O(h_n) \big) + O(h_n^2)
\]
   where $z_n = \lambda h_n$, and that the global error $e_n = y(t_n) - y_n$ satisfies
\[
e_n = (1 - z_n/2)^{-1} (1 + z_n/2)\, e_{n-1} + (1 - z_n/2)^{-1} h_n d_n, \quad 1 \le n \le N.
\]
   (c) Letting $z_n \to -\infty$, $1 \le n \le N$, show that the global error satisfies
\[
e_n = \sum_{j=0}^{n} (-1)^j\, \frac{h_j^2}{4} \big( y''(t_j) + O(h_j) \big).
\]
   This is the leading error term for the case $z_n \ll -1$, $1 \le n \le N$.
   (d) Conclude that (for $b = O(1)$) the error for the midpoint method in the very stiff limit reduces to $O(h)$. However, if the mesh is locally almost uniform, i.e. the steps can be paired such that for each odd $j$
\[
h_{j+1} = h_j (1 + O(h_j)),
\]
   then the convergence order is restored to $O(h^2)$.
   [This mesh restriction is mild: take any mesh, and double it as for extrapolation by replacing each element with its two halves. The resulting mesh is locally almost uniform. Note, on the other hand, that even when the second order accuracy is thus restored, there is no error expansion of the type utilized in §8.3.2.]
   (e) Can you guess why we have included this exercise here, rather than in Chapters 3 or 4?

14. Consider the initial value problem
\[
\begin{pmatrix} 1 & -t \\ 0 & \varepsilon \end{pmatrix} y' = \begin{pmatrix} -1 & 1+t \\ \alpha & -(1 + \alpha t) \end{pmatrix} y + \begin{pmatrix} 0 \\ \sin t \end{pmatrix}, \qquad y(0) = (1, \alpha)^T, \tag{8.29}
\]
where the two parameters $\alpha$ and $\varepsilon$ are real and $0 < \varepsilon \ll 1$.


   (a) Apply the transformation
\[
\begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix} y = w
\]
   to show that this problem is stable and to find the exact solution.
   (b) Let $\varepsilon = 10^{-10}$. There is no initial layer, so consider applying the midpoint method with a uniform step size
\[
h = .1 / \max(|\alpha|, 1)
\]
   to (8.29). Calculate maximum errors in $y_1$ for $\alpha = 1, 100, -100$. What are your observations?
   (c) Attempt to explain the observed results. [This may not be easy.]


Chapter 9

More on Differential-Algebraic Equations

In this chapter and the next we study differential-algebraic equations (DAEs), already introduced in §1.3. Here we consider the mathematical structure of such systems and some essential analytical transformations. Numerical approaches and discretizations are discussed in the next chapter. But here, too, our motivation remains finding practical computer solutions. Compared to Chapters 2 and 6 this chapter is unusually long. One reason is that DAE theory is much more recent than ODE theory. As a result DAE theory is more in a state of flux, and good expositions are scarce. More importantly, understanding the principles highlighted here is both essential for, and will get you a long way towards, constructing good numerical algorithms.

To get a taste of the similarity and the difference between DAEs and ODEs, consider two functions $y(t)$ and $z(t)$ which are related on some interval $[0, b]$ by
\[
y'(t) = z(t), \quad 0 \le t \le b, \tag{9.1}
\]
and the task of recovering one of these functions from the other. To recover $z$ from $y$ one needs to differentiate $y(t)$: an automatic process familiar to us from a first calculus course. To recover $y$ from $z$ one needs to integrate $z(t)$: a less automatic process necessitating also an additional side condition (such as the value of $y(0)$).

This would suggest that differentiation is a simpler, more straightforward process than integration. On the other hand, though, note that $y(t)$ is generally a smoother function than $z(t)$. For instance, if $z(t)$ is bounded but has jump discontinuities then $y(t)$ is once differentiable; see Fig. 9.1.

Thus, integration is a smoothing process while differentiation is an anti-smoothing process. The differentiation process is in a sense unstable (if we add to $y(t)$ a small perturbation $\epsilon \cos \omega t$, where $|\epsilon| \ll 1$ and $\omega > |\epsilon^{-1}|$, then $z(t)$ is perturbed by the large amount $|\omega \epsilon|$), although it is often very simple to carry out analytically.


[Figure 9.1: A function and its less smooth derivative: (a) $y(t)$; (b) $z(t) = y'(t)$.]

A differential equation involves integration, hence smoothing: the solution $y(t)$ of the linear system $y' = Ay + q(t)$ is one derivative smoother than $q(t)$. A DAE, on the other hand, involves both differentiations and integrations. The class of DAEs contains all ODEs, as well as the problems in Example 9.1 below. But it also contains problems where both differentiations and integrations are intertwined in a complex manner, and that is when the fun really starts: simple differentiations may no longer be possible, but their effect complicates the numerical integration process, potentially well beyond what we have seen so far in this book.

9.1 Index and Mathematical Structure

Since a DAE involves a mixture of differentiations and integrations, one may hope that applying analytical differentiations to a given system and eliminating as needed, repeatedly if necessary, will yield an explicit ODE system for all the unknowns. This turns out to be true, unless the problem is singular. The number of differentiations needed for this transformation is called the index of the DAE. Thus, ODEs have index 0. We will refine this definition later, but first let us consider some simple examples.

Example 9.1  Let $q(t)$ be a given, smooth function, and consider the following problems for $y(t)$.


• The scalar equation
\[
y = q(t) \tag{9.2}
\]
is a (trivial) index-1 DAE, because it takes one differentiation to obtain an ODE for $y$.

• For the system
\[
y_1 = q(t), \qquad y_2 = y_1', \tag{9.3}
\]
we differentiate the first equation to get
\[
y_2 = y_1' = q'(t)
\]
and then
\[
y_2' = y_1'' = q''(t).
\]
The index is 2 because two differentiations of $q(t)$ were needed.

• A similar treatment for the system
\[
u = q(t), \qquad y_3 = u'' \tag{9.4}
\]
necessitates 3 differentiations to obtain an ODE for $y_3$, hence the index is 3. □

Note that whereas $m$ initial or boundary conditions must be given to specify the solution of an ODE of size $m$, for the simple DAEs of Example 9.1 the solution is completely determined by the right hand side. More complicated DAE systems will usually include also some ODE subsystems. Thus, the DAE system will in general have $l$ degrees of freedom, where $l$ is anywhere between 0 and $m$.

In general it may be difficult, or at least not immediately obvious, to determine which $l$ pieces of information are needed to determine the DAE solution. Often the entire initial solution vector is known. Initial or boundary conditions which are specified for the DAE must be consistent; in other words, they must satisfy the constraints of the system. For example, an initial condition on the index-1 system (9.2) (which is needed if we write it as an ODE) must satisfy $y(0) = q(0)$. For the index-2 system (9.3) the situation is somewhat more complicated. Not only must any solution satisfy the obvious constraint $y_1 = q(t)$, there is also a hidden constraint $y_2 = q'(t)$ which the solution must satisfy at any point $t$, so the only consistent initial conditions are $y_1(0) = q(0)$, $y_2(0) = q'(0)$.

Note that whereas m initial or boundary conditions must be given to specify the solution of an ODE of size m, for the simple DAEs of Example 9.1 the solution is completely determined by the right hand side. More complicated DAE systems will usually include also some ODE subsystems. Thus, the DAE system will in general have l degrees of freedom, where l is anywhere between 0 and m.

In general it may be difficult, or at least not immediately obvious, to determine which l pieces of information are needed to determine the DAE solution. Often the entire initial solution vector is known. Initial or boundary conditions which are specified for the DAE must be consistent. In other words, they must satisfy the constraints of the system. For example, an initial condition on the index-1 system (9.2) (which is needed if we write it as an ODE) must satisfy y(0) = q(0). For the index-2 system (9.3) the situation is somewhat more complicated. Not only must any solution satisfy the obvious constraint y1 = q(t); there is also a hidden constraint y2 = q'(t) which the solution must satisfy at any point t, so the only consistent initial conditions are y1(0) = q(0), y2(0) = q'(0). This is an important difference between index-1 and higher-index (index greater than 1) DAEs: higher-index DAEs include some hidden constraints. These hidden constraints are the derivatives of the explicitly-stated constraints in the system. Index-2 systems include hidden constraints which are the first derivative of explicitly-stated constraints. Higher-index systems include hidden constraints which correspond to higher-order derivatives; for example, solutions to the index-3 system (9.4) must satisfy the hidden constraints u' = q'(t) and y3 = q''(t).

The most general form of a DAE is given by

    F(t, y, y') = 0                                            (9.5)

where ∂F/∂y' may be singular. The rank and structure of this Jacobian matrix may depend, in general, on the solution y(t), and for simplicity we will always assume that it is independent of t. Recall also from §1.3 the important special case of a semi-explicit DAE, or an ODE with constraints,

    x' = f(t, x, z)                                            (9.6a)
    0 = g(t, x, z).                                            (9.6b)

This is a special case of (9.5). The index is one if ∂g/∂z is nonsingular, because then one differentiation of (9.6b) yields z' in principle.² For the semi-explicit index-1 DAE we can distinguish between differential variables x(t) and algebraic variables z(t). The algebraic variables may be less smooth than the differential variables by one derivative (e.g. the algebraic variables may be non-differentiable).

In the general case each component of y may contain a mix of differential and algebraic components, which makes the numerical solution of such high-index problems much harder and riskier. The semi-explicit form is decoupled in this sense. On the other hand, any DAE (9.5) can be written in the semi-explicit form (9.6), but with the index increased by one, upon defining y' = z, which gives

    y' = z                                                     (9.7a)
    0 = F(t, y, z).                                            (9.7b)

Needless to say, this rewriting alone does not make the problem easier to solve. The converse transformation is also possible: given a semi-explicit index-2 DAE system (9.6), let w' = z. It is easily shown that the system

    x' = f(t, x, w')                                           (9.8a)
    0 = g(t, x, w')                                            (9.8b)

is an index-1 DAE and yields exactly the same solution for x as (9.6). The classes of fully implicit index-1 DAEs of the form (9.5) and semi-explicit index-2 DAEs of the form (9.6) are therefore equivalent.

² Note that a differentiation of a vector function counts as one differentiation.
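The index-1 condition, ∂g/∂z nonsingular, can be probed numerically along a candidate trajectory. Here is a small sketch of ours (the function g below is a made-up two-dimensional example): approximate the Jacobian g_z by finite differences and check that it is well-conditioned.

```python
import numpy as np

def gz_jacobian(g, t, x, z, eps=1e-7):
    """Finite-difference approximation of the Jacobian dg/dz at (t, x, z)."""
    g0 = np.asarray(g(t, x, z))
    J = np.empty((g0.size, z.size))
    for j in range(z.size):
        zp = z.copy()
        zp[j] += eps
        J[:, j] = (np.asarray(g(t, x, zp)) - g0) / eps
    return J

# Hypothetical constraint g(t, x, z) with two algebraic variables;
# a moderate condition number of g_z indicates index 1 locally.
g = lambda t, x, z: np.array([x[0] + z[0]*z[1] - 2.0,
                              z[0] - z[1]**2 - 1.0])
J = gz_jacobian(g, 0.0, np.array([0.0]), np.array([2.0, 1.0]))
print("cond(g_z) =", np.linalg.cond(J))   # moderate => semi-explicit index-1 here
```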

It is important to note, as the following example illustrates, that in general the index depends on the solution and not only on the form of the DAE. This is because the local linearization, hence the partial derivative matrices, depend on the solution.

Example 9.2 Consider the DAE system for y = (y1, y2, y3)^T,

    y1' = y3
    0 = y2(1 − y2)
    0 = y1 y2 + y3(1 − y2) − t.

The second equation has two solutions, y2 = 0 and y2 = 1, and it is given that y2(t) does not switch arbitrarily between these two values (e.g. another equation involving y2' and y4' is prescribed with y4(0) given, implying continuity of y2(t)).

1. Setting y2 = 0, we get from the third equation y3 = t. Then from the first equation y1 = y1(0) + t²/2. The system has index 1 and the solution is

       y(t) = (y1(0) + t²/2, 0, t)^T.

   Note that this is an index-1 system in semi-explicit form.

2. Setting y2 = 1, the third equation reads y1 = t. Then, upon differentiating the first equation, y3 = 1. The system has index 2 and the solution is

       y(t) = (t, 1, 1)^T.

   Note that, unlike in the index-1 case, no initial value is required.

If we replace the algebraic equation involving y2 by its derivative and simplify, we obtain the DAE

    y1' = y3                                                   (9.9)
    y2' = 0
    0 = y1 y2 + y3(1 − y2) − t.

Now the index depends on the initial conditions. If y2(0) = 0 the index is 1, and if y2(0) = 1 the index equals 2.   □

We are ready to define the index of a DAE.

Definition 9.1 For general DAE systems (9.5), the index along a solution y(t) is the minimum number of differentiations of the system which would be required to solve for y' uniquely in terms of y and t (i.e. to define an ODE for y). Thus, the index is defined in terms of the overdetermined system

    F(t, y, y') = 0
    dF/dt (t, y, y', y'') = 0
    ...
    d^p F/dt^p (t, y, y', ..., y^(p+1)) = 0                    (9.10)

to be the smallest integer p so that y' in (9.10) can be solved for in terms of y and t.

We note that in practice, differentiation of the system as in (9.10) is rarely done in a computation. However, such a definition is very useful in understanding the underlying mathematical structure of the DAE system, and hence in selecting an appropriate numerical method.

Example 9.3 The computer-aided design of electrical networks involves simulations of the behavior of such networks in time. Electric circuits are assembled from basic elements such as resistors, diodes, inductors, capacitors and sources. Large circuits can lead to large DAE systems.

A circuit is characterized by the type of elements it has and by its network's topology. For each element there is a relationship of the voltage drop between the nodes of the element to the current. For instance, a linear resistor satisfies, by Ohm's law,

    U = RI

where U is the potential drop, I = Q' is the current (Q is the charge), and R is the resistance; for a linear inductor

    U = LI'

where L is the inductance; and for a linear capacitor

    I = CU'

where C is the capacitance. There are nonlinear versions of these too, e.g. L = L(I) for a current-controlled inductor or C = C(U) for a voltage-controlled capacitor.

The network consists of nodes and branches (it is a directed graph) and its topology can be encoded in an incidence matrix A. The (i, j)th entry of A is 1 if current flows from node i into branch j, −1 if current flows in branch j towards node i, and 0 if node i and branch j are not adjacent.

Thus, A is typically large and very sparse. Let u_N be the vector function of all node potentials, u_B the branch potentials and i_B the (branch) currents. Kirchhoff's current law states that

    A i_B = 0                                                  (9.11a)

and Kirchhoff's voltage law states that

    u_B = A^T u_N.                                             (9.11b)

Adding to this the characteristic element equations as described earlier,

    φ(i_B, u_B, i_B', u_B') = 0,                               (9.11c)

we obtain a typically very large, sparse DAE.

The sparse tableau approach leading to the DAE (9.11) is general, and software can be written to generate the equations from a given functional description of a circuit, but it is not favored in practice because it leads to too much redundancy in the unknowns. Instead, the modified nodal analysis eliminates u_B (via (9.11b)) and the currents i_B, except for those currents through voltage-controlled elements (inductors and voltage sources). This leads to a large, sparse, but smaller DAE of the form

    M(y) y' + f(y) = q(t)                                      (9.12)

where the possibly singular and still quite sparse M describes the dynamic elements, f corresponds to the other elements and q are the independent sources.

The index of (9.12) depends on the type of circuit considered. In practical applications it often equals 0 or 1, but it may be higher. This index is often lower than that of (9.11), because some constraints are eliminated. Standard software exists which generates (9.12) from a functional description. However, a further reduction to an explicit ODE in the case that M is singular is not a practical option for most large circuits, because the sparsity of M is destroyed by the necessary matrix decomposition (such as (9.27) below). A specific instance of a circuit is given in Example 10.3.   □

For initial value ODEs, Theorem 1.1 guarantees solution existence, uniqueness and continuous dependence on initial data for a large class of problems. No corresponding theorem holds in such generality for boundary value ODEs (see Chapter 1). No corresponding theorem holds for general DAEs either, although there are some weaker results of this type. Boundary value DAEs are of course no less complex than boundary value ODEs, and will not be considered further in this chapter.
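Returning for a moment to Example 9.3, the incidence-matrix bookkeeping there is easy to make concrete. Here is a small sketch of ours (the three-node loop and its numbering are hypothetical), verifying Kirchhoff's laws (9.11a)–(9.11b):

```python
import numpy as np

# Three nodes (0, 1, 2) joined in a loop by three branches:
# branch 0: node 0 -> node 1, branch 1: node 1 -> node 2,
# branch 2: node 2 -> node 0.  A[i, j] = +1 if branch j leaves
# node i, -1 if it enters node i, and 0 otherwise.
A = np.array([[ 1,  0, -1],
              [-1,  1,  0],
              [ 0, -1,  1]])

i_B = np.array([2.0, 2.0, 2.0])     # the same current around the loop
print(A @ i_B)                      # Kirchhoff's current law (9.11a): all zeros

u_N = np.array([5.0, 3.0, 0.0])     # node potentials
u_B = A.T @ u_N                     # Kirchhoff's voltage law (9.11b)
print(u_B, "sum around the loop:", u_B.sum())   # branch drops sum to zero
```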

9.1.1 Special DAE Forms

The general DAE system (9.5) can include problems which are not well-defined in a mathematical sense, as well as problems which will result in failure for any direct discretization method (i.e. a method based on discretization of y and y' without first reformulating the equations). Fortunately, most of the higher-index problems encountered in practice can be expressed as a combination of more restrictive structures of ODEs coupled with constraints. In such systems the algebraic and differential variables are explicitly identified also for higher index DAEs, and the algebraic variables may all be eliminated (in principle) using the same number of differentiations. These are called Hessenberg forms of the DAE and are given below.

Hessenberg index-1

    x' = f(t, x, z)                                            (9.13a)
    0 = g(t, x, z).                                            (9.13b)

Here the Jacobian matrix function g_z is assumed to be nonsingular for all t. This is also often referred to as a semi-explicit index-1 system. Semi-explicit index-1 DAEs are very closely related to implicit ODEs. Using the implicit function theorem, we can in principle solve for z in (9.13b). Substituting z into (9.13a) yields an ODE in x (although no uniqueness is guaranteed; see Exercise 9.5). For various reasons, this procedure is not always recommended for numerical solution.

Hessenberg index-2

    x' = f(t, x, z)                                            (9.14a)
    0 = g(t, x).                                               (9.14b)

Here the product of Jacobians g_x f_z is nonsingular for all t. Note the absence of the algebraic variables z from the constraints (9.14b). This is a pure index-2 DAE and all algebraic variables play the role of index-2 variables.³

Example 9.4 A practical example of a pure index-2 system arises from modeling the flow of an incompressible fluid by the Navier-Stokes equations

    u_t + u u_x + v u_y + p_x − ν(u_xx + u_yy) = 0             (9.15a)
    v_t + u v_x + v v_y + p_y − ν(v_xx + v_yy) = 0             (9.15b)
    u_x + v_y = 0                                              (9.15c)

³ Whether a DAE is Hessenberg index-1 or index-2 may depend on the solution (Example 9.2) but usually doesn't in practice.

where subscripts denote partial derivatives, x, y are spatial variables and t is time, u, v are the velocities in the x- and y-directions, respectively, p is the scalar pressure, and ν is the (known) kinematic viscosity. Equations (9.15a)–(9.15b) are the momentum equations, and (9.15c) is the incompressibility condition. The extension to three spatial variables is straightforward. After a careful spatial discretization of (9.15) with a finite-difference, finite-volume or finite-element method, the vectors u(t) and p(t) approximating (u(t, x, y), v(t, x, y)) and p(t, x, y) in the domain of interest satisfy

    M u' + (K + N(u)) u + C p = f                              (9.16a)
    C^T u = 0.                                                 (9.16b)

In this DAE the mass matrix M is symmetric positive definite. Skipping some nontrivial details of the spatial discretization, we assume not only that the same matrix C appears in (9.16a) and (9.16b) but also that C^T M⁻¹ C is a nonsingular matrix with a bounded inverse. This yields an index-2 DAE in Hessenberg form. The DAE could be made semi-explicit upon multiplying by M⁻¹, but the sparsity of the coefficient matrices of the DAE would be lost, unless M is block-diagonal. The forcing function f comes from the (spatial) boundary conditions.

It is well-known that obtaining an accurate solution for the pressure in (9.15) can be problematic. Often this variable is treated in a different way by discretization methods. For instance, a staggered grid may be used in space, where the pressure values are considered at mid-cells and the velocity values "live" on cell edges. Part of the reason for this is that the pressure in (9.15) is an index-two variable. It has the same order of (temporal) smoothness as the derivative of the velocity. The pressure in (9.16) is playing the role of the index-two variable z in (9.14).

One can consider differentiating (9.15c) with respect to time and substituting into (9.15a), (9.15b) to obtain a Poisson equation for p with the right hand side being a function of u and v. This is called the pressure-Poisson equation – the matrix C^T M⁻¹ C above can in fact be viewed as a discretization of the Laplace operator plus suitable boundary conditions – and the obtained system has index 1. For the index-1 system the discretization in space need no longer be staggered, but some difficulties with boundary conditions may arise.   □

Another way to look at index-2 variables like the pressure in (9.16) derives from the observation that these DAEs are closely related to constrained optimization problems. From this point of view, p in (9.16) plays the role of a Lagrange multiplier: it forces the velocity u to lie in the constraint manifold defined by (9.16b).

The relationship between higher-index DAEs and constrained optimization problems is no accident; many of these DAEs, including the incompressible Navier-Stokes equations, arise from constrained variational problems.

Example 9.5 Consider the DAE

    y1' = −y1 − y4                                             (9.17)
    y2' + y3' = (2α − sin² t)(y2 + y3) + (1/2)(y2 − y3)²
    0 = y2 − y3 − 2(sin t)(y1 − 1)
    0 = y2 + y3 − 2(y1 − 1)²

where α is a parameter and y1(0) = 2, y2(0) = 1 are prescribed.

This DAE is not in semi-explicit form. We can, however, easily convert it to that form by the constant, nonsingular transformation

    x1 = y1,   x2 = (1/2)(y2 + y3),   z1 = (1/2)(y2 − y3),   z2 = y4

yielding

    x1' = −x1 − z2                                             (9.18a)
    x2' = (2α − sin² t) x2 + z1²                                (9.18b)
    0 = z1 − (sin t)(x1 − 1)                                   (9.18c)
    0 = x2 − (x1 − 1)².                                        (9.18d)

The DAE is now in the semi-explicit form (9.6), but it is not in Hessenberg form. In particular, (9.18c) yields z1 = z1(x), so z1 is an index-1 algebraic variable, whereas z2 cannot be eliminated without differentiation. A differentiation of (9.18d) and a substitution into (9.18a) confirm that, for the given initial conditions, z2 can be subsequently eliminated. Hence the DAE is index-2 and z2 is an index-2 algebraic variable.

Note that if we further carry out the substitution for z1 then the resulting DAE

    x1' = −x1 − z2                                             (9.19)
    x2' = (2α − sin² t) x2 + (sin² t)(x1 − 1)²
    0 = x2 − (x1 − 1)²

is Hessenberg index-2.   □

Hessenberg index-3

    x' = f(t, x, y, z)                                         (9.20a)
    y' = g(t, x, y)                                            (9.20b)
    0 = h(t, y).                                               (9.20c)

Here the product of three matrix functions h_y g_x f_z is nonsingular.

Example 9.6 The mechanical systems with holonomic constraints described in Example 1.6 are Hessenberg index-3. This type of DAEs often arises from second-order ODEs subject to constraints.

Indeed, the ODEs describe Newton's second law of motion relating body accelerations to forces. Since accelerations are second derivatives of positions, constraints imposed on the positions imply that two differentiations must be buried in the system of ODEs with constraints.   □

The index of a Hessenberg DAE is found, as in the general case, by differentiation. However, here only the constraints need to be differentiated.

Example 9.7 To illustrate, we find the index of a simple mechanical system, the pendulum in Cartesian coordinates from Example 1.5. We use the notation q for the position coordinates and v = q' for the velocities. First, the DAE is written as a first-order system

    q1' = v1                                                   (9.21a)
    q2' = v2                                                   (9.21b)
    v1' = −λ q1                                                (9.21c)
    v2' = −λ q2 − g                                            (9.21d)
    0 = q1² + q2² − 1.                                         (9.21e)

(Note that λ = λ(t) is an unknown function and g is the known, scaled constant of gravity.) Then the position constraint (9.21e) is differentiated once, to obtain

    q1 q1' + q2 q2' = 0.

Substituting for q' from (9.21a) and (9.21b) yields the velocity constraint

    q^T v = q1 v1 + q2 v2 = 0.                                 (9.22)

Differentiating the velocity constraint (9.22) and substituting for q' yields

    q1 v1' + q2 v2' + v1² + v2² = 0.

Substituting for v' from (9.21c) and (9.21d), and simplifying using the position constraint, yields the acceleration constraint

    −λ − q2 g + v1² + v2² = 0.                                 (9.23)

This yields λ, which can be substituted into (9.21c) and (9.21d) to obtain an ODE for q and v. To obtain a differential equation for all the unknowns, however, we need to differentiate (9.23) one more time, obtaining an ODE for λ as well. In the process of getting to the explicit ODE system, the position constraints were differentiated three times. Hence, the index of this system is three.   □

The index has proven to be a useful concept for classifying DAEs, in order to construct and identify appropriate numerical methods. It is often not necessary to perform the differentiations in order to find the index, because most physical systems can be readily seen to result in systems of Hessenberg structure or in simple combinations of Hessenberg structures.
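The differentiation chain of Example 9.7 is mechanical enough to delegate to a computer algebra system. The following sketch of ours uses sympy to reproduce the hidden velocity and acceleration constraints and to solve for λ:

```python
import sympy as sp

t, g = sp.symbols('t g')
q1, q2, v1, v2, lam = [sp.Function(s)(t) for s in ('q1', 'q2', 'v1', 'v2', 'lam')]

# The ODE part (9.21a)-(9.21d), used to substitute for the derivatives:
ode = {q1.diff(t): v1, q2.diff(t): v2,
       v1.diff(t): -lam * q1, v2.diff(t): -lam * q2 - g}

pos = q1**2 + q2**2 - 1                  # position constraint (9.21e)
vel = pos.diff(t).subs(ode)              # hidden velocity constraint, cf. (9.22)
acc = vel.diff(t).subs(ode)              # hidden acceleration constraint, cf. (9.23)

print(sp.factor(vel))                    # 2*(q1*v1 + q2*v2)
lam_expr = sp.solve(sp.expand(acc), lam)[0]
print(lam_expr.subs(q1**2 + q2**2, 1))   # v1**2 + v2**2 - g*q2, on the circle
```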

Example 9.8 Consider a tiny ball of mass 1 attached to the end of a spring of length 1 at rest, with a spring constant ε⁻¹, ε > 0. At its other end the spring's position is fixed at the origin of a planar coordinate system (see Fig. 1.2 and imagine the rod in the simple pendulum being replaced by a spring). The sum of kinetic and potential energies in this system is

    e(q, v) = (1/2)[v^T v + ε⁻¹(r − 1)²] + g q2

where q = (q1, q2)^T are the Cartesian coordinates, v = (v1, v2)^T are the velocities (which equal the momenta p in our scaled, dimensionless notation), r = √(q1² + q2²) = |q|₂ is the length of the spring at any given time and g is the scaled constant of gravity. The equations of motion are (recall §2.5)

    q' = e_v = v
    v' = −e_q = −ε⁻¹ ((r − 1)/r) q − (0, g)^T.

This is an ODE. Let us next write the same system as a DAE. Defining λ = ε⁻¹(r − 1) we get

    q'' = −(λ/r) q − (0, g)^T
    ελ = r − 1.

This DAE is semi-explicit index-1. It is not really different from the ODE in a meaningful way (although it may suggest controlling the error also in λ in a numerical approximation).

Next, consider what happens when the spring is very stiff, almost rigid, i.e. ε ≪ 1. We then expect the radius r to oscillate rapidly about its rest value, while the angle θ varies slowly. This is depicted in Fig. 9.2.

Provided that the initial conditions yield

    r(t) = 1 + O(ε)

we have λ(t) = O(1) to balance the constraint equation in the index-1 formulation. The passage to the limit ε → 0 is simple then, and we obtain the DAE

    q'' = −λ q − (0, g)^T
    0 = r − 1

which gives the equations for the simple pendulum of Examples 1.5 and 9.7.⁴ This is an index-3 DAE in Hessenberg form. Unlike the ε-dependent ODE solution, the DAE solution varies slowly!   □

⁴ In Fig. 9.2 the initial conditions are such that |λ| ≫ 1, so that the oscillations in q can be seen by the naked eye, but the limit DAE turns out to be the same.

The simple example above leads to some important observations:

• One rich source of DAEs in practice is as the limit systems of singular perturbation ODE problems, when the small parameter tends to 0. The solution then is often referred to as the reduced solution for the singularly perturbed problem.

• A higher index DAE can often be simpler than, or result as a simplification of, an ODE or a lower index DAE. In Example 9.8 the index-3 DAE is much simpler to solve than the original ODE (or the index-1 DAE) for a small ε.

• A DAE can in a sense be very close to another DAE with a different index. Thus, a more quantitative stability theory involving not only the DAE index is necessary for a more complete picture.
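The behavior shown in Fig. 9.2 for Example 9.8 is easy to reproduce from the stiff ODE form. A minimal sketch of ours (the solver and tolerances are incidental choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

eps, g = 1e-3, 1.0                       # stiff spring, scaled gravity

def spring_pendulum(t, y):
    q1, q2, v1, v2 = y
    r = np.hypot(q1, q2)
    lam = (r - 1.0) / eps                # lambda = eps^{-1} (r - 1)
    return [v1, v2, -lam * q1 / r, -lam * q2 / r - g]

y0 = [1.0 - eps**0.25, 0.0, 0.0, 0.0]    # initial conditions as in Fig. 9.2
sol = solve_ivp(spring_pendulum, (0.0, 10.0), y0,
                rtol=1e-8, atol=1e-10, dense_output=True)

t = np.linspace(0.0, 10.0, 5)
r = np.hypot(sol.sol(t)[0], sol.sol(t)[1])
print("r(t) oscillates rapidly about 1:", np.round(r, 3))
```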

[Figure 9.2: Stiff spring pendulum, ε = 10⁻³, initial conditions q(0) = (1 − ε^{1/4}, 0)^T, v(0) = 0. Panels: (a) Cartesian coordinate q1; (b) Cartesian coordinate q2; (c) radius r; (d) angle θ.]

Reader's advice: Below we continue to discuss properties of the index of a DAE, and DAE stability. The conclusion at the end of §9.1.2 is practically important, but it is possible to skip the discussion, at least on first reading, and still understand the material in §9.2.

9.1.2 DAE Stability

Example 9.2 suggests that the index is a local quantity, to be measured about an isolated exact solution. Thus, we next consider perturbations in linear DAEs and their relationship to the index and to stability constants. For a nonlinear problem we form the variational problem about an exact solution and its perturbations, and define the index locally based on the index of this linear problem. As in §6.2 we note that for a linear problem the objective is to bound the solution in terms of the data (the inhomogeneities). The same bound then holds for a perturbation to the solution when the inhomogeneities are replaced by their perturbations. (See the exposition following (6.14).)

For the linear ODE system

    y' = A(t) y + q(t),   0 < t < b

subject to homogeneous initial or boundary conditions, we can transform the independent variable by τ = t/b for any large b. Let us assume that this has been done and take b = 1. We have seen in §2.3 and in §6.2 that the following stability bound holds:

    ‖y‖ = max_{0≤t≤1} |y(t)| ≤ κ ∫₀¹ |q(s)| ds = κ ‖q‖₁.        (9.24)⁵

For the trivial index-1 DAE

    y = q(t)

we have a slightly weaker bound than (9.24), namely

    ‖y‖ ≤ ‖q‖

⁵ In (9.24) we have defined the L¹ norm. Note that ‖q‖₁ = ∫₀¹ |q(s)| ds ≤ max_{0≤t≤1} |q(t)| = ‖q‖.

(weaker because the maximum norm, rather than the L¹-norm, must be used for q). For the semi-explicit index-1 DAE

    x' = Ax + Bz + q1(t)                                       (9.25a)
    0 = Cx + Dz + q2(t)                                        (9.25b)

where A, B, C, D are bounded functions of t and with D boundedly invertible, we get a similar result,

    ‖y‖ ≤ κ ‖q‖

where y^T = (x^T, z^T), q^T = (q1^T, q2^T). The generic stability constant κ involves bounds on D⁻¹, as well as the stability constant of the underlying ODE for x, once z given by (9.25b) has been substituted for in (9.25a). This bound can actually be refined to

    ‖z‖ ≤ κ ‖q‖,   ‖x‖ ≤ κ ‖q‖₁.

For the general index-1 linear DAE⁶

    E(t) y' = A(t) y + q(t)                                    (9.26)

still with homogeneous initial or boundary conditions, we can decompose E(t) into

    E(t) = S(t) [ I 0 ; 0 0 ] T⁻¹(t)                           (9.27)

where T and S are nonsingular matrix functions with uniformly bounded condition numbers. Then a change of variables

    (x; z) = T⁻¹ y,

where x has the dimension of the identity block in (9.27), yields a semi-explicit system (9.25). Hence we obtain again (assuming of course that the underlying ODE problem is stable) an estimate

    ‖y‖ ≤ κ ‖q‖

where now the condition numbers of the transformations are also lumped into the stability constant κ.

In short, for a linear index-1 problem, if

⁶ We assume here that the system is strictly index-one, i.e. not tending arbitrarily closely to a higher-index or singular system.

• it can be transformed (without differentiations) into a semi-explicit system, and from there to an ODE by eliminating the algebraic variables,
• the transformations are all suitably well-conditioned,
• the obtained ODE problem is stable,

then the index-1 DAE problem is also stable in the usual sense. Exercise 9.7 makes this statement precise.

For higher index problems we must differentiate at least some of the equations. For an index-p DAE we need p − 1 differentiations to obtain an index-1 DAE, hence all we can hope for is a "stability" bound of the form

    ‖y‖ ≤ κ Σ_{j=1}^{p} ‖q^{(j−1)}‖.                            (9.28)

Fortunately, for a DAE in Hessenberg form this can be somewhat improved upon. In particular, for an index-2 Hessenberg DAE of the form (9.25) with D ≡ 0 and CB nonsingular we have

    ‖x‖ ≤ κ ‖q‖                                                (9.29)
    ‖z‖ ≤ κ ‖q'‖.

All this suggests that a direct numerical discretization of nontrivial higher index DAEs other than Hessenberg index-2 may encounter serious difficulties. We will see in the next chapter that this is indeed true.

9.2 Index Reduction and Stabilization: ODE with Invariant

Often, the best way to solve a high index DAE problem is to first convert it to a lower index system by carrying out differentiations analytically. In this section we describe some of the techniques which are available for reformulation of a higher-index, semi-explicit DAE (9.6), where differentiations are applied to the constraint equations (9.6b). The essential concept here is that the DAE is equivalent to an ODE with an invariant. For an index-(p+1) DAE in Hessenberg form with m ODEs and l constraints, recall that we need p differentiations in order to eliminate the algebraic variables and obtain an ODE system of size m in closed form. The equations (9.6b), together with their first p − 1 derivatives (with z(t) eliminated), form an invariant set defined by pl algebraic constraints. One can consider using these algebraic constraints at each t in order to define a smaller set of m − pl unknowns.

The differential equations for the smaller set of unknowns then describe the dynamics while enforcing the constraints. This yields an ODE on a manifold and is further discussed in §9.2.3. Since the dimension of the constraint manifold is pl, the true dimension (i.e. the number of degrees of freedom) of the entire system is m − pl, as discussed in the previous section.

In the presentation that follows we use constrained mechanical systems as a case study for higher index DAEs in Hessenberg form. Problems from this important class are often solved in practice using the techniques of this section. The general principles of reformulation of DAE systems are also useful in a wide variety of other applications.

9.2.1 Reformulation of Higher-Index DAEs

Recall the mechanical systems from Example 1.6,

    q' = v                                                     (9.30a)
    M(q) v' = f(q, v) − G^T(q) λ                               (9.30b)
    0 = g(q)                                                   (9.30c)

where q are generalized body positions, v are generalized velocities, λ ∈ R^l are Lagrange multiplier functions, g(q) ∈ R^l defines the holonomic constraints, G = g_q is assumed to have full row rank at each t, M is a positive definite generalized mass matrix and f are the applied forces. Any explicit dependence on t is omitted for notational simplicity, but of course all the quantities above are functions of t. We also denote

    x = (q^T, v^T)^T ∈ R^m,

corresponding to the notation in (9.6).

We now apply two differentiations to the position constraints (9.30c). The first yields the constraints on the velocity level

    0 = Gv   (= g')                                            (9.31)

and the second differentiation yields the constraints on the acceleration level

    0 = Gv' + (∂(Gv)/∂q) v   (= g'').                          (9.32)

Next, multiply (9.30b) by G M⁻¹ and substitute from (9.32) to eliminate λ:

    λ(q, v) = (G M⁻¹ G^T)⁻¹ [G M⁻¹ f + (∂(Gv)/∂q) v].          (9.33)

Finally, λ from (9.33) can be substituted into (9.30b) to yield an ODE for x,

    q' = v                                                     (9.34a)
    M v' = f − G^T (G M⁻¹ G^T)⁻¹ [G M⁻¹ f + (∂(Gv)/∂q) v].     (9.34b)

In practice we may want to keep (9.34b) in the equivalent form (9.30b), (9.32) as long as possible and to never evaluate the matrix function ∂(Gv)/∂q (i.e., to evaluate only its product with v).

The ODE system (9.34) has dimension m and is the result of an unstabilized index reduction. The constraints on the position and the velocity levels, which are now additional to this ODE, define an invariant set of dimension 2l,

    h(x) ≡ [ g(q) ; G(q) v ] = 0.                              (9.35)

Thus, any solution of the larger ODE system (9.34) with consistent initial values, i.e. with initial values satisfying h(x(0)) = 0, satisfies h(x(t)) = 0 at all later times. We denote the constraint Jacobian matrix

    H = h_x                                                    (9.36)

and note that for the mechanical system (9.30),

    H = [ G, 0 ; ∂(Gv)/∂q, G ]                                 (9.37)

has full row rank 2l. Restricted to the constraint manifold, the ODE has dimension m − 2l, which is the correct dimension of the DAE (9.30).

Example 9.9 For the DAE (9.21) of Example 9.7 we substitute

    −λ = q2 g − v1² − v2²

to obtain the ODE corresponding to (9.34),

    q1' = v1
    q2' = v2
    v1' = −(v1² + v2² − q2 g) q1
    v2' = −(v1² + v2² − q2 g) q2 − g

and the invariant equations corresponding to (9.35),

    0 = q1² + q2² − 1
    0 = q1 v1 + q2 v2.   □
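A quick numerical experiment exposes the drift off the constraint manifold discussed in §9.2.2 below. The following sketch of ours (time span and tolerances are arbitrary choices) integrates the unstabilized ODE of Example 9.9 and monitors the invariant residuals (9.35):

```python
import numpy as np
from scipy.integrate import solve_ivp

def pendulum(t, x):
    """Unstabilized index-reduced pendulum ODE of Example 9.9 (g = 1)."""
    q1, q2, v1, v2 = x
    lam = v1**2 + v2**2 - q2          # from the acceleration constraint (9.23)
    return [v1, v2, -lam * q1, -lam * q2 - 1.0]

def h(x):
    """Invariant residuals (9.35): position and velocity constraints."""
    q1, q2, v1, v2 = x
    return np.array([q1**2 + q2**2 - 1.0, q1 * v1 + q2 * v2])

x0 = [1.0, 0.0, 0.0, -5.0]            # consistent initial values, h(x0) = 0
sol = solve_ivp(pendulum, (0.0, 100.0), x0, rtol=1e-6, atol=1e-6,
                dense_output=True)
for T in (0.0, 10.0, 100.0):
    print(f"t = {T:5.1f}   |h| = {np.linalg.norm(h(sol.sol(T))):.2e}")
# |h| grows with t: the accumulated truncation error drifts off the manifold.
```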

9.2.2 ODEs with Invariants

Differential systems with invariants arise frequently in various applications, not only as a result of index reduction in DAEs. The invariant might represent conservation of energy, momentum or mass in a physical system. The ODE system in Example 9.8 has the invariant that the energy is constant in t, as is typical for Hamiltonian systems. Recall also Exercises 4.16–4.17.

The relationship between DAEs and ODEs with invariants goes both ways. Not only does index reduction of a DAE lead to an ODE with an invariant; also, an ODE with an invariant

    x' = f̂(x)                                                  (9.38a)
    h(x) = 0                                                   (9.38b)

is equivalent to the Hessenberg index-2 DAE

    x' = f̂(x) − D(x) z                                         (9.39a)
    0 = h(x).                                                  (9.39b)

Here D(x) is any bounded matrix function such that HD, where H = h_x, is boundedly invertible for all t. The systems (9.38) and (9.39) have the same solutions for x(t). The exact solution of (9.39) gives z(t) ≡ 0, but this is no longer true in general for a numerical discretization of this system. Note that the DAE (9.39) is not the same as the original DAE (9.30) in case that the latter is the source of the system (9.38). The choice of the matrix function D in (9.39) defines the direction of the projection onto the constraint manifold. A common choice is D = H^T, which yields an orthogonal projection.⁷

Indeed, there are applications where simply integrating the ODE is a perfectly valid and useful approach. The numerical solution does not precisely satisfy the constraints then, but it is close to satisfying (9.38b) within the integration tolerance. But in other applications the invariant cannot simply be ignored. This is the case when there are special reasons for insisting that the error in (9.38b) be much smaller than the error in (9.38a), or when the problem is more stable on the manifold than off it.

The latter reason applies in the case of a DAE index reduction. To see this, imagine a nonsingular transformation of variables

    q → (φ; ψ) = (g(q); g̃(q))

such that g̃_q is orthogonal to G^T.

⁷ Note that in the case of mechanical systems (9.37) we would like to avoid the lower left block of H if at all possible; see Exercise 9.10.

Now, the differentiations of the constraints, i.e. (9.32), yield

    φ'' = 0

and this equation has a double eigenvalue 0. This indicates a mild instability: if the right hand side is perturbed to φ'' = δ, with φ(0) = φ'(0) = 0, then φ(t) = 0.5δt², i.e. perturbations grow quadratically in time. The instability, known as a drift off the constraint manifold, is a result of the differentiations (i.e., it is not present in the original DAE, hence not in the equivalent ODE restricted to the manifold).

Rather than converting the ODE to a DAE, which carries the penalty of having to solve the resulting DAE, we can consider stabilizing, or attenuating, the ODE (9.38a) with respect to the invariant set M = {x : h(x) = 0}. The ODE

    x' = f̂(x) − γ F(x) h(x)                                    (9.40)

obviously has the same solutions as (9.38a) on M (i.e. when h(x) = 0). It also has the desired stability behavior if HF is positive definite and the positive parameter γ is large enough. In fact, we can easily apply a Lyapunov-type argument (see Exercises 2.3–2.4) to obtain

    (1/2) d/dt (h^T h) = h^T h' = h^T H (f̂ − γFh) ≤ (γ₀ − γρ₀) h^T h

where γ₀ is a constant such that, using the Euclidean vector norm,

    |H f̂(x)| ≤ γ₀ |h(x)|                                       (9.41)

for all x near M, and ρ₀ is the smallest eigenvalue of the positive definite matrix function HF.

Thus, asymptotic stability of the constraint manifold results for any γ > γ₀/ρ₀. What this means is that any trajectory of (9.40) starting from some initial value near M will tend towards satisfying the constraints, i.e. towards the manifold. Moreover, this attenuation is monotonic:

    |h(x(t + τ))| ≤ |h(x(t))|                                  (9.42)

for any t, τ ≥ 0.

To get a grip on the values of γ₀ and ρ₀, note that often γ₀ = 0 in (9.41), in which case the invariant is called an integral invariant. (Because for any x(t) near M satisfying (9.38a) it transpires that dh/dt = 0, hence h(x(t)) is constant.) For the mechanical system (9.30) it can be shown that γ₀ = 1 (Exercise 9.8). Also, if we choose

    F(x) = D(HD)⁻¹

where D(x) is as before in (9.39), then HF = I, hence ρ₀ = 1.

If the system is not stiff then (9.40) can be integrated by an explicit method from the Runge-Kutta or Adams families, which is often faster than the implicit methods of §10.1.

Example 9.10 We consider again the simple pendulum in Cartesian coordinates and apply the Matlab standard IVP solver to the ODE of Example 9.9. Starting from q(0) = (1, 0)^T, v(0) = (0, −5)^T, the solver is accurate enough, and the problem simple enough, that the unit circle is obtained in the q-phase space to at least 4 significant digits. Then we repeat the calculations from the starting points q(0) = (1, ±0.5)^T and the same v(0). The resulting curves are depicted in Fig. 9.3(a).

[Figure 9.3: Perturbed (dashed lines) and unperturbed (solid line) solutions for Example 9.9. Panel (a): unstabilized pendulum equations; panel (b): stabilized pendulum equations, γ = 10.]

Next we modify the ODE according to (9.40), with

    D^T = H = [ 2q1 2q2 0 0 ; v1 v2 q1 q2 ]

and γ = 10, and repeat these integrations. The results are depicted in Fig. 9.3(b). Of course, for the starting values which do not satisfy |q(0)|₂ = 1, the exact solution of the stabilized ODE is different from the original, but the figure clearly indicates how the unit circle becomes attractive for the latter ODE system, even when the initial values are significantly perturbed.   □
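Example 9.10 is easily reproduced outside Matlab as well. Below is a minimal sketch of ours (solver and tolerances are incidental choices) of the stabilized ODE (9.40) with D = H^T, F = D(HD)⁻¹ and γ = 10:

```python
import numpy as np
from scipy.integrate import solve_ivp

GAMMA = 10.0                               # stabilization parameter in (9.40)

def h(x):                                  # invariants (9.35), with g = 1
    q1, q2, v1, v2 = x
    return np.array([q1**2 + q2**2 - 1.0, q1*v1 + q2*v2])

def H(x):                                  # constraint Jacobian (9.37)
    q1, q2, v1, v2 = x
    return np.array([[2*q1, 2*q2, 0.0, 0.0],
                     [v1,   v2,   q1,  q2 ]])

def stabilized(t, x):
    q1, q2, v1, v2 = x
    lam = v1**2 + v2**2 - q2               # as in Example 9.9 (g = 1)
    f = np.array([v1, v2, -lam*q1, -lam*q2 - 1.0])
    Hx = H(x)
    D = Hx.T                               # orthogonal projection choice
    F = D @ np.linalg.inv(Hx @ D)          # F = D (HD)^{-1}, so HF = I
    return f - GAMMA * (F @ h(x))

x0 = [1.0, -0.5, 0.0, -5.0]                # perturbed start, h(x0) != 0
sol = solve_ivp(stabilized, (0.0, 10.0), x0, rtol=1e-7, atol=1e-9)
print("|h| at start:", np.linalg.norm(h(np.array(x0))))
print("|h| at end:  ", np.linalg.norm(h(sol.y[:, -1])))   # attracted to the circle
```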

One of the earliest stabilization methods proposed in the literature was due to J. Baumgarte. In this method, the acceleration-level constraints are replaced by a linear combination of the constraints on the acceleration, velocity and position levels:

    0 = g'' + γ1 g' + γ2 g.                                    (9.43)

The parameters γ1 and γ2 are chosen such that the polynomial equation

    x² + γ1 x + γ2 = 0

has two negative roots; thus the ODE (9.43) for g is stable. This stabilizes the invariant set M. The system (9.30a), (9.30b) and (9.43) is a semi-explicit index-1 DAE. It can be made into an ODE upon elimination of λ, and be subsequently solved numerically by standard methods. But the choice of the parameters has proved tricky in practice. Exercise 9.11 and §10.2 elaborate more on this.

9.2.3 State Space Formulation

The differentiations of the constraints of the given high-index DAE (9.30) yield an ODE (9.34) with an inflated dimension, as we have seen. Even though the number of degrees of freedom of the system is m − 2l, we have in (9.34) m ODEs, and in (9.35) an additional 2l algebraic equations. Rather than stabilizing the invariant, another approach is to use these algebraic equations to define a reduced set of unknowns, obtaining an ODE system of the minimal size m − 2l. The main difficulty with this idea arises in the presence of highly nonlinear terms.

Suppose that R is a rectangular, constant matrix such that, together with the constraint Jacobian G, we obtain a nonsingular matrix with a bounded inverse:

    ‖ [ R ; G ]⁻¹ ‖ ≤ K.                                       (9.44)

Defining the change of variables

    u = Rq,   w = Gq                                           (9.45)

we get

    q = [ R ; G ]⁻¹ [ u ; w ].

We can now use the constraints defining the invariant set, i.e. g(q) = 0 and Gv = 0, to express w as a function of u, and hence q in terms of u. For u we then obtain, upon multiplying the equations of motion by R, an underlying ODE of size (when converted to a first order system) m − 2l,

    u'' = R M⁻¹ (f − G^T λ)                                    (9.46)

where λ is given by (9.33).

There are two popular choices for R. The first is such that the unknowns u form a subset of the original q, i.e., the columns of R are either unit vectors or 0. This has the advantage of simplicity. Note, however, that we cannot expect in general that one such choice of R will be good for all t, in the sense that (9.44) will remain valid with a moderate constant K. This coordinate partitioning has to be monitored and modified as necessary. The other choice is to make R orthogonal to M⁻¹G^T. This eliminates λ in (9.46), but introduces additional complications into the calculation due to the varying R.

The attraction of this approach is the small ODE system that is obtained and the elimination of any drift off the constraint manifold. On the negative side, this approach involves a somewhat messier algebra and is less transparent. The transformation non-singularity (9.44) must be monitored, and any wrinkle in the constraint manifold might have to be fully reflected here, even if it could be otherwise ignored.

9.3 Modeling with DAEs

The closing decades of the 20th century have seen many scientists recognize that their mathematical models are in fact instances of DAEs. Such a recognition has often carried with it the benefit of affording a new, sometimes revealing, computational look at the old problem.

Note, however, that whereas a sensible formulation of a mathematical model as an initial value ODE is typically followed simply by its numerical solution using some appropriate code, DAE formulations may require more user attention and intervention, combining the processes of problem formulation and numerical solution. Since high index DAEs are all unstable, we know already before advancing to Chapter 10 that attempting to discretize them directly may adversely affect the resulting numerical scheme. The reformulations of the problem discussed in the previous section are done with numerical implementations in mind. In the extreme, a DAE would be converted to an ODE, but bear in mind that this may be cumbersome to carry out and costly to work with.

Consider a DAE system and its various index reductions and reformulations studied in §9.2. The exact solution satisfies all such equations, but numerical discretizations generally result in nonzero residuals. When a semi-explicit DAE such as (9.6) is discretized and solved numerically, it is automatically assumed that the ODE (9.6a) will be solved approximately while the algebraic constraints will be satisfied (almost) exactly. The residual in (9.6b) is essentially set to 0, while that of (9.6a) is only kept small (at the level of the truncation error). The relative importance of these residuals changes when index reduction is applied prior to discretization.

The situation is similar for an ODE with an invariant (9.38). Once a particular formulation is discretized, a greater importance is placed on the constraints than on the ODE, in the sense described above.

Satisfying the constraints (including the hidden ones) exactly is in some instances precisely what one wants, and in most other cases it provides a helpful (e.g. stabilizing), or at least harmless, emphasis. State space methods (§9.2.3) tacitly assume that this constraint satisfaction is indeed desired, and they provide no alternative for when this is not the case.

Yet, there are also instances where such an emphasis is at odds with the natural flow of the ODE. In such cases one may be better off not to insist on satisfying constraints too accurately. Such examples arise when we apply the method of lines (Example 1.3) for a PDE, allowing the spatial mesh points to be functions of time and attempting to move them as the integration in time proceeds, as part of the solution process, in order to meet some error equidistribution criteria which are formulated as algebraic equations (this is called a moving mesh method). The emphasis may then be wrongly placed, because obtaining an accurate solution to the PDE is more important than satisfying a precise mesh distribution criterion. One is better off using the DAE to devise other, clever moving mesh schemes, instead of solving it directly. Rather than dwelling on this further, we give another such example.

Example 9.11 Recall from §2.5 that the Hamiltonian⁸ e(q, v) is constant in a Hamiltonian system given by

    q' = ∇_v e
    v' = −∇_q e

where e(q, v) does not depend explicitly on time t. So, this ODE system has the invariant

    e(q(t), v(t)) − e(q(0), v(0)) = 0   for all t.

The system is in the form (9.38). To enforce the preservation of the invariant (conservation of energy), we can write it as a Hessenberg index-2 DAE (9.39) with D = H^T.

⁸ In this chapter and the next we use e rather than H to denote the Hamiltonian, to avoid a notational clash with H = h_x.

This gives

    q' = ∇_v e − (∇_q e) z
    v' = −∇_q e − (∇_v e) z
    e(q, v) = e(q(0), v(0)).

Note that the DAE has one degree of freedom less than the original ODE. It is sometimes very helpful to stabilize the solution with respect to this invariant; see, e.g., Example 10.8.

But when the Hamiltonian system is highly oscillatory, e.g. in case of Example 9.8 with 0 < ε ≪ 1, the projected DAE is poorly balanced. (Roughly, large changes in z are required to produce a noticeable effect in the ODE for v, but they then strongly affect the ODE for q.) The observed numerical effect is that the best direct numerical discretizations of the DAE (which are necessarily implicit) require that the step size h satisfy (at best) h = O(√ε), or else the Newton iteration does not converge. With this step-size restriction, the explicit leapfrog discretization (Exercise 4.11) of the ODE is preferred.

A complete discussion of this example is beyond the scope of this presentation. Let us simply state that there are also other reasons why imposing energy conservation during a large-step integration of highly oscillatory problems is not necessarily a good idea.   □

Of course, we do not mean to discourage the reader from using DAE models and solvers, what with having spent a quarter of our book on them! Rather, we wish to encourage careful thought on the problem formulation, whether it is based on an ODE or a DAE model.

9.4 Notes and References

A more detailed development of the DAE theory contained in §9.1 can be found in the books by Brenan, Campbell & Petzold [19], Hairer & Wanner [52], and Griepentrog & März [46]. See also the survey paper [66]. However, unlike in the previous theory chapters, the material in this one is not a strict subset of any of these references.

There is an extensive theory for linear DAEs with constant coefficients which we have chosen not to develop here. For an introduction and further references, see [19]. Be careful not to confuse constant coefficient DAEs with more general, linear DAEs.

It is interesting that, in contrast to the situation with ODEs, theorems on existence and uniqueness of solutions of nonlinear DAEs did not appear until relatively recently.

Most of these results are due to Rabier & Rheinboldt [75, 76]. The theory is based on a differential geometric approach; see also [66] and references therein.

There have been many definitions of index in the literature, most of which have been shown to be equivalent, or at least closely related, for the classes of problems to which they apply. The concept which we have defined here is a refinement of the differential index. In [49] and [52], a related concept called the perturbation index was introduced, which is directly motivated by the loss of smoothness in solutions to higher-index DAEs, as discussed in §9.1.2. However, we chose to restrict the perturbation analysis to linear(ized) DAEs (see Exercise 9.3).

Underlying the index definition, and more generally our DAE discussion, is the assumption that whatever matrix function must eventually become nonsingular after certain manipulations is in fact nonsingular independently of t. For example, in the semi-explicit form (9.13) we have considered either the case that g_z is nonsingular for all t or that it is singular for all t. This fundamental assumption breaks down for singular DAEs, where this matrix becomes singular at some isolated points t. (For example, in (9.47a) consider the case where a(t) varies and changes sign at some points.) The situation can become much more complex, and a variety of phenomena may occur, for nonlinear, singular DAEs. The solution may remain continuous or it may not [74, 6]. See also Exercises 10.5 and 10.16.

Some of the material covered in §9.2 is curiously missing from the usual DAE books. We refer for more to [2, 31, 3, 15, 93, 79]. Generalized coordinate partitioning methods were introduced in [95], and tangent plane parameterization methods were implemented in [72].

9.5 Exercises

1. A square matrix is said to be in (block, upper-) Hessenberg form if it has the sparsity structure depicted in Fig. 9.4. Can you guess why "DAEs in Hessenberg form" have been endowed with this name?

2. Consider the two-point boundary value problem

       ε u'' = a u' + b(t) u + q(t)                            (9.47a)
       u(0) = b1,   u(1) = b2                                  (9.47b)

   where a ≠ 0 is a constant and b, q are continuous functions, all O(1) in magnitude.

[Figure 9.4: A matrix in Hessenberg form (nz = 103).]

   (a) Write the ODE in first order form for the variables y1 = u and y2 = ε u' − a u.
   (b) Letting ε → 0, show that the limit system is an index-1 DAE.
   (c) Show that only one of the boundary conditions in (9.47) is needed to determine the reduced solution (i.e. the solution of the DAE). Which one?

3. Consider the DAE [49]

       y1' = y3 y2' − y2 y3'
       0 = y2
       0 = y3

   with y1(0) = 1.

   (a) Show that this DAE has index 1.
   (b) Show that if we add to the right hand side the perturbation

           δ(t) = (0, ε sin ωt, ε cos ωt)^T,

       which is bounded in norm by a small ε, the perturbed solution y(t) satisfies y1' = ε²ω, which is unbounded as ω → ∞. The stability bound is seen to depend on δ', as is typical for index-2 rather than index-1 problems.
   (c) Show that if we add a similar perturbation to the linearization around the solution y(t) for z = (z1, z2, z3)^T:

           z1' = z3 y2' + y3 z2' − z2 y3' − y2 z3'
           0 = z2
           0 = z3

   then the perturbed z is bounded in terms of ‖δ‖, like an index-1 DAE solution should be.

4. Construct an example of a DAE which for some initial conditions has index 1 and for others index 3.

5. Consider the IVP for the implicit ODE

       (y')² = y²,   y(0) = 1.

   (a) Show that this problem has two solutions.
   (b) Write down a corresponding Hessenberg index-1 DAE with two solutions.

6. The following equations describe a chemical reaction [69, 19]

       C' = K1(C0 − C) − R
       T' = K1(T0 − T) + K2 R − K3(T − T_C)
       0 = R − K3 e^{−K4/T} C

   where the unknowns are the concentration C(t), the temperature T(t) and the reaction rate per unit volume R(t). The constants Ki and the functions C0 and T0 are given.

   (a) Assuming that the temperature of the cooling medium T_C(t) is also given, what is the index of this DAE? Is it in Hessenberg form?
   (b) Assuming that T_C(t) is an additional unknown, to be determined such that an additional equation specifying the desired product concentration

           C = u

       for a given u(t) be satisfied, what is the index of this DAE? Is it in Hessenberg form?

7. Given a general linear DAE (9.26) with E(t) decomposed as in (9.27), apply the decoupling transformation into semi-explicit form, give a condition for the DAE to have index 1 and formulate a precise stability condition.

8. (a) Writing the mechanical system (9.34)–(9.35) in the notation (9.38), find H f̂ and a bound on γ₀ in (9.41).
   (b) Show that the velocity constraints (9.31) alone define an invariant manifold for (9.34). What are h, H and H f̂ then?

   (c) Show that the position constraints (9.30c) alone do not define an invariant manifold for (9.34).

9. Let r = √(x1² + x2²) and consider the ODE [31]

       x1' = x2 + x1 (r² − 1)^{1/3} r⁻²
       x2' = −x1 + x2 (r² − 1)^{1/3} r⁻².

   (a) Show that

           h(x) = r² − 1 = 0

       defines an invariant set for this ODE.
   (b) Show that there is no finite γ₀ > 0 for which (9.41) holds.

10. Consider the mechanical system with holonomic constraints written as an ODE with invariant (9.34)–(9.35).

    (a) Write down the equivalent Hessenberg index-2 DAE (9.39) with

            D = [ G^T 0 ; 0 G^T ].

    (b) This D simplifies H^T of (9.37) in an obvious manner. Verify that HD is nonsingular.
    (c) Show that by redefining λ the system you obtained can be written as

            q' = v − G^T μ
            M v' = f − G^T λ
            0 = g(q)
            0 = Gv.

        This system is called the stabilized index-2 formulation [45].

11. (a) Write down the system resulting from Baumgarte's [15] stabilization (9.43) applied to the index-3 mechanical system (9.30).
    (b) Consider the index-2 mechanical system given by (9.30a), (9.30b), (9.31). This is representative of nonholonomic constraints, where velocity-level constraints are not integrable into a form like (9.30c). Write down an appropriate Baumgarte stabilization

            h' + γ h = 0

        for the index-2 mechanical system and show that it is equivalent to stabilization of the invariant (9.40) with

            F = [ 0 ; M⁻¹ G^T (G M⁻¹ G^T)⁻¹ ].

    (c) However, Baumgarte's technique (9.43) for the index-3 problem is not equivalent to the stabilization (9.40). Show that the monotonicity property (9.42) does not hold here.

Chapter 10

Numerical Methods for Differential-Algebraic Equations

Numerical approaches for the solution of differential-algebraic equations (DAEs) can be divided roughly into two classes: (i) direct discretizations of the given system and (ii) methods which involve a reformulation (e.g. index reduction), combined with a discretization.

The desire for as direct a discretization as possible arises because a reformulation may be costly, it may require more input from the user and it may involve more user intervention. The reason for the popularity of reformulation approaches is that, as it turns out, direct discretizations are limited in their utility essentially to index-1 and semi-explicit index-2 DAE systems.

Fortunately, most DAEs encountered in practical applications are either index-1 or, if higher-index, can be expressed as a simple combination of Hessenberg systems. The worst-case difficulties described in §10.1.1 below do not occur for these classes of problems. On the other hand, the most robust direct applications of numerical ODE methods do not always work as well as one might hope, even for these restricted classes of problems. We will outline some of the difficulties, as well as the success stories, in §10.1.

We will consider two classes of problems:

• Fully-implicit index-1 DAEs in the general form

      0 = F(t, y, y').                                         (10.1)

• Index-2 DAEs in pure, or Hessenberg, form

      x' = f(t, x, z)                                          (10.2a)
      0 = g(t, x).                                             (10.2b)

Recall that the class of semi-explicit index-2 DAEs

    x' = f(t, x, z)                                            (10.3a)
    0 = g(t, x, z)                                             (10.3b)

is equivalent to the class of fully implicit index-1 DAEs via the transformations (9.7) and (9.8), although the actual conversion of DAEs from one form to another may come with a price of an increased system size. For the DAE (10.3), z are algebraic variables which may be index-1 or index-2, whereas for the Hessenberg form the variables in z are all index-2 (which is why we say that the DAE is pure index-2).

Although there are in some cases convergence results available for numerical methods for Hessenberg DAEs of higher index, there are practical difficulties in the implementation which make it difficult to construct robust codes for such DAEs. For a DAE of index greater than two it is usually best to use one of the index-reduction techniques of the previous chapter to rewrite the problem in a lower-index form. The combination of this with a suitable discretization is discussed in §10.2.

10.1 Direct Discretization Methods

To motivate the methods in this section, consider the regularization of the DAE (10.3), where (10.3b) is replaced by the ODE

    ε z' = g(t, x, z),                                         (10.4)

which depends on a small parameter 0 < ε ≪ 1. Despite the promising name, we do not intend to actually carry out this regularization, unless special circumstances such as for a singular DAE (e.g. Exercise 10.16) require it, because the obtained very stiff ODE (10.3a), (10.4) is typically more cumbersome to solve than the DAE (recall, e.g., Example 9.8).¹ But this allows us to consider suitable ODE methods. Observe that:

• Since the regularized ODE is very stiff, it is natural to consider methods for stiff ODEs for the direct discretization of the limit DAE.

• ODE discretizations which have stiff decay are particularly attractive: to recall (§3.5), any effect of an artificial initial layer which the regularization introduces can be skipped, fast ODE modes are approximated well at mesh points, and so the passage to the limit of ε → 0 in (10.4) is smooth and yields a sensible discretization for the DAE (see the sketch below).

¹ For this reason we may also assume that the regularized ODE problem is stable under given initial or boundary conditions. Were the regularized problem to be actually solved, the term εz' might have to be replaced by a more general εBz', where e.g. B = −g_z.
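A tiny experiment of ours illustrates both bullets (the DAE, the value of ε and the solver are arbitrary choices): regularize a simple semi-explicit index-1 DAE as in (10.4), start from an inconsistent z(0), and integrate with a method that has stiff decay.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Semi-explicit index-1 DAE:  x' = -x + z,  0 = z - cos(t),
# with z = cos(t) on the constraint.  Regularize the constraint as
# eps * z' = cos(t) - z and solve the resulting very stiff ODE.
eps = 1e-6

def rhs(t, y):
    x, z = y
    return [-x + z, (np.cos(t) - z) / eps]

# Inconsistent start z(0) = 0 (the constraint wants z(0) = 1): the
# regularization introduces an artificial initial layer of width ~eps,
# which a stiff-decay method (Radau) damps out immediately.
sol = solve_ivp(rhs, (0.0, 2.0), [1.0, 0.0], method='Radau',
                rtol=1e-8, atol=1e-10)
print("z(2) computed  :", sol.y[1, -1])
print("z(2) constraint:", np.cos(2.0))   # layer forgotten, limit recovered
```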

The rest of this section is therefore devoted to the direct application of ODE methods to low-index DAEs. All the winning methods have stiff decay, but this property alone is not sufficient. For initial value DAEs which are cumbersome to transform, and especially for DAEs whose underlying ODE is stiff, the BDF and Radau collocation methods discussed in this section are the overall methods of choice. We thus begin with the simplest method of this kind, the backward Euler method, and then consider its extension to higher order via BDF or Radau methods; see Fig. 10.1.

[Figure 10.1: Methods for the direct discretization of DAEs in general form – backward Euler, and its higher-order extensions: BDF and Radau collocation.]

10.1.1 A Simple Method: Backward Euler

Consider the general DAE

    0 = F(t, y, y').

The idea of a direct discretization is simple: approximate y and y' by a discretization formula like multistep or Runge-Kutta. Applying the backward Euler method to this DAE, we obtain

    0 = F(t_n, y_n, (y_n − y_{n−1})/h_n).                      (10.5)

This gives, in general, a system of m nonlinear equations for y_n at each time step n.
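To make (10.5) concrete, here is a minimal sketch of ours of one backward Euler step for a general F(t, y, y') = 0. For simplicity it relies on a black-box nonlinear solver rather than the Newton iteration (10.11) discussed below, and the test problem is a made-up semi-explicit index-1 DAE:

```python
import numpy as np
from scipy.optimize import fsolve

def backward_euler_step(F, tn, y_prev, h):
    """One step of (10.5): solve F(tn, yn, (yn - y_prev)/h) = 0 for yn."""
    res = lambda yn: F(tn, yn, (yn - y_prev) / h)
    return fsolve(res, y_prev)           # y_prev serves as the initial guess

# Hypothetical semi-explicit index-1 test problem:  x' = z,  0 = z - x,
# whose solution through x(0) = 1 is x(t) = z(t) = exp(t).
def F(t, y, yp):
    x, z = y
    return [yp[0] - z, z - x]

h, y = 0.01, np.array([1.0, 1.0])
for n in range(100):                     # integrate to t = 1
    y = backward_euler_step(F, (n + 1) * h, y, h)
print(y, "vs exp(1) =", np.exp(1.0))     # first-order accurate
```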

Unfortunately, this simple method does not always work. In the worst case, there are simple higher-index DAE systems with well-defined solutions for which the backward Euler method, and in fact all other multistep and Runge-Kutta methods, are unstable or not even applicable.

Example 10.1 Consider the following linear index-2 DAE which depends on a parameter η,

    [ 0 0 ; 1 ηt ] y' + [ 1 ηt ; 0 1+η ] y = [ q(t) ; 0 ].     (10.6)

The exact solution is y1(t) = q(t) + ηt q'(t), y2(t) = −q'(t), which is well-defined for all values of η. The problem is stable for moderate values of η. Yet, if η = −1, we show below that there is no solution of the equations defining y_n using the backward Euler discretization. It can be shown (see Exercise 10.1) that the backward Euler method is unstable when η < −0.5.

Let us analyze this problem. To transform to semi-explicit form, define u = y1 + ηt y2, v = y2, hence

    y = [ 1 −ηt ; 0 1 ] (u; v).

We readily obtain

    u' + v = 0,   u = q(t)

for which the backward Euler method gives

    u_n = q(t_n),   v_n = −(q(t_n) − u_{n−1})/h

(note that a forward Euler method makes no sense here). Thus, provided that we start with a consistent initial value for u, i.e. u0 = q(0), we have

    v_n = −q'(t_n) + O(h)

which is all that one can expect from a first order method for ODEs.

This is in marked contrast to what happens when applying backward Euler directly to (10.6),

    [ 0 0 ; 1 ηt_n ] (y_n − y_{n−1})/h + [ 1 ηt_n ; 0 1+η ] y_n = [ q(t_n) ; 0 ].

Defining (u_n; v_n) = [ 1 ηt_n ; 0 1 ] y_n, we get from this latter discretization

    u_n = q(t_n),   (1 + η) v_n = −(q(t_n) − q(t_{n−1}))/h.

We see that, while u_n is reproduced exactly, v_n is undefined when η = −1 and has an O(1) error when η ≠ 0.

The transformation to semi-explicit form decouples the solution components y into differential and algebraic variables. The backward Euler discretization works well for the decoupled problem. But in general, a direct discretization of non-decoupled DAEs of index higher than one is not recommended.   □

For the remainder of §10.1 we thus consider only index-1 or semi-explicit index-2 DAEs.

For the simplest class of nonlinear DAEs, namely semi-explicit index-1,

    x' = f(t, x, z)                                            (10.7a)
    0 = g(t, x, z)                                             (10.7b)

where g_z is nonsingular, it is easy to see that the backward Euler method retains all of its properties (i.e. order, stability, and convergence) from the ODE case. First, we recall that by the implicit function theorem, there exists a function g̃ such that

    z = g̃(t, x).

(Let us assume, for simplicity, that there is only one such g̃, so what is depicted in Exercise 9.5 does not happen.) Thus the DAE (10.7) is equivalent to the ODE

    x' = f(t, x, g̃(t, x)).                                     (10.8)

Now, consider the backward Euler method applied to (10.7),

    (x_n − x_{n−1})/h_n = f(t_n, x_n, z_n)                     (10.9a)
    0 = g(t_n, x_n, z_n).                                      (10.9b)

Solving for z_n in (10.9b) and substituting into (10.9a) yields

    (x_n − x_{n−1})/h_n = f(t_n, x_n, g̃(t_n, x_n))             (10.10)

which is just the backward Euler discretization of the underlying ODE (10.8). Hence we can conclude from the analysis for the nonstiff case in §3.2 that the backward Euler method is first-order accurate, stable and convergent for semi-explicit index-1 DAEs.


For fully-implicit index-1 DAEs, the convergence analysis is a bit more complicated. It is possible to show that for an index-1 DAE, there exist time (and solution)-dependent transformation matrices in a neighborhood of the solution, which locally decouple the linearized system into differential and algebraic parts. Convergence and first-order accuracy of the method on the differential part can be shown via the techniques of §3.2. The backward Euler method is exact for the algebraic part. The complications arise mainly due to the time-dependence of the decoupling transformations, which enters into the stability analysis. (Recall that for fully-implicit higher-index DAEs, time-dependent coupling between the differential and algebraic parts of the system can ruin the method's stability, as demonstrated in Example 10.1. Fortunately, for index-1 systems it only complicates the convergence analysis; however, it may affect some stability properties of the method.) See §10.3.1 for pointers to further details.

The convergence result for backward Euler applied to fully-implicit index-1 DAEs extends to semi-explicit index-2 DAEs in an almost trivial way. Making use of the transformation (9.8), it is easy to see that solving the index-1 system (9.8) by the backward Euler method gives exactly the same solution for $x$ as solving the original semi-explicit index-2 system (10.3) by the same method. A separate argument must be made concerning the accuracy of the algebraic variables $z$. For starting values which are accurate to $O(h)$, it turns out that the solution for $z$ is accurate to $O(h)$, after 2 steps have been taken.

For nonlinear problems of the form (10.5) a Newton iteration for $y_n$, starting from an approximation $y_n^0$ based on information from previous steps, yields for the $(\nu+1)$st iterate

$$y_n^{\nu+1} = y_n^{\nu} - \left( \frac{1}{h_n}\frac{\partial F}{\partial y'} + \frac{\partial F}{\partial y} \right)^{-1} F\left(t_n, y_n^{\nu}, \frac{y_n^{\nu} - y_{n-1}}{h_n}\right) . \qquad (10.11)$$

Note that, in contrast to the ODE case, the iteration matrix is not simply dominated by an $h_n^{-1} I$ term. We discuss the implication of this in §10.1.4.

10.1.2 BDF and General Multistep Methods

The constant step-size BDF method applied to a general nonlinear DAE of the form (10.1) is given by

$$F\left(t_n, y_n, \frac{1}{\beta_0 h} \sum_{j=0}^{k} \alpha_j y_{n-j}\right) = 0 \qquad (10.12)$$

where $\beta_0$ and $\alpha_j$, $j = 0, 1, \ldots, k$, are the coefficients of the BDF method. Most of the available software based on BDF methods addresses the fully-implicit index-1 problem. Fortunately, many problems from applications naturally arise in this form. There exist convergence results underlying the methods used in these codes which are a straightforward extension of the results for backward Euler.
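For concreteness, here is a sketch (our own, not from the book) of one fixed-step BDF2 step in the form (10.12), with $\alpha = (1, -4/3, 1/3)$ and $\beta_0 = 2/3$, so that $y'$ is replaced by $(3 y_n - 4 y_{n-1} + y_{n-2})/(2h)$; the extrapolated starting guess is an assumed implementation detail.

```python
import numpy as np
from scipy.optimize import fsolve

def bdf2_step(F, t_n, y_nm1, y_nm2, h):
    """One step of the 2-step BDF discretization (10.12) of 0 = F(t, y, y')."""
    def residual(y):
        ydot = (3.0 * y - 4.0 * y_nm1 + y_nm2) / (2.0 * h)
        return F(t_n, y, ydot)
    # predictor by extrapolation from the history as the Newton starting guess
    return fsolve(residual, 2.0 * y_nm1 - y_nm2)
```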


In particular, the k-step BDF method of fixed step size $h$ for $k < 7$ converges to $O(h^k)$ if all initial values are correct to $O(h^k)$, and if the Newton iteration on each step is solved to accuracy $O(h^{k+1})$. This convergence result has also been extended to variable step-size BDF methods, provided that they are implemented in such a way that the method is stable for standard ODEs. See the discussion in §5.5. As with backward Euler, this convergence result extends to semi-explicit index-2 DAEs via the transformation (9.8). A separate argument must be made concerning the accuracy of the algebraic variable $z$. For starting values which are accurate to $O(h^k)$, it turns out that the solution for $z$ is accurate to $O(h^k)$ after $k + 1$ steps have been taken.

There has been much work on developing convergence results for general multistep methods. For general index-1 DAEs and for Hessenberg index-2 DAEs, the coefficients of the multistep methods must satisfy a set of order conditions additional to the order conditions for ODEs in order to attain order greater than 2. It turns out that these additional order conditions are satisfied by BDF methods.

You may wonder if all this additional complication is really necessary: why not simply write (10.1) as (10.3), then consider (10.3b) as the limit of (10.4)? Then apply the known theory for BDF from the ODE case!?

The answer is that there is no such a priori known convergence theory in the ODE case. The basic convergence, accuracy and stability theory of Chapters 3, 4 and 5 applies to the case $h \to 0$, whereas here we must always consider $\varepsilon \ll h$. Indeed, since any DAE of the form (10.3) can be "treated" this way, regardless of index, we cannot expect much in general in view of the negative results in Chapter 9 for higher index DAEs. For an ODE system (10.3a),(10.4) whose limit is an index-2 DAE (10.3), convergence results as stated above do apply. But these results are not easier to obtain for the ODE: on the contrary, the very stiff ODE case is generally more difficult.

Example 10.2 To check the convergence and accuracy of BDF methods, consider the simple linear example,

$$x_1' = \left(\alpha - \frac{1}{2-t}\right) x_1 + (2-t)\,\alpha z + \frac{3-t}{2-t}\, e^t$$
$$x_2' = \frac{1-\alpha}{t-2}\, x_1 - x_2 + (\alpha - 1) z + 2 e^t$$
$$0 = (t+2)\, x_1 + (t^2 - 4)\, x_2 - (t^2 + t - 2)\, e^t$$

where $\alpha$ is a parameter. This DAE is in a pure index-2 form (10.2). For the initial conditions $x_1(0) = x_2(0) = 1$ we have the exact solution

$$x_1 = x_2 = e^t, \qquad z = -\frac{e^t}{2-t} .$$

Recall that we can define $y' = z$ with some initial condition (say, $y(0) = 0$) to obtain a fully implicit index-1 DAE for $x = (x_1, x_2)^T$ and $y$.


The BDF discretization remains the same. We select $\alpha = 10$ and integrate this DAE from $t = 0$ to $t = 1$ using the first three BDF methods. In Fig. 10.2 we display maximum errors in $x_1$ and in $z$ for different values of $h$ ranging from $1/20$ to $1/2560$. We use a log-log scale, so the slopes of the curves indicate the orders of the methods. The results clearly indicate that the convergence order of the k-step BDF method is indeed k and that in absolute value the errors are pleasantly small. □

[Figure 10.2: Maximum errors for the first 3 BDF methods for Example 10.2. (a) Errors in $x_1(t)$; (b) errors in $z(t)$. Both panels plot max|error| against the step size $h$ on a log-log scale.]

10.1.3 Radau Collocation and Implicit Runge-Kutta Methods

Runge-Kutta Methods and Order Reduction

The s-stage implicit Runge-Kutta method applied to the general nonlinear DAE of the form (10.1) is defined by

$$0 = F(t_i, Y_i, K_i), \qquad (10.13a)$$
$$t_i = t_{n-1} + c_i h, \quad i = 1, 2, \ldots, s \qquad (10.13b)$$
$$Y_i = y_{n-1} + h \sum_{j=1}^{s} a_{ij} K_j \qquad (10.13c)$$

and

$$y_n = y_{n-1} + h \sum_{i=1}^{s} b_i K_i . \qquad (10.14)$$

We assume here that the coefficient matrix $A = (a_{ij})$ is nonsingular.
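To make (10.13)-(10.14) concrete, here is a minimal sketch (our own, not the book's) of one step for the semi-explicit DAE (10.3), using the 2-stage Radau IIA coefficients, for which $b^T$ equals the last row of $A$ so that $x_n = X_s$ (a property whose significance is discussed next); scalar $x$ and $z$ and a generic nonlinear solver are assumed for brevity.

```python
import numpy as np
from scipy.optimize import fsolve

A = np.array([[5/12, -1/12],      # 2-stage Radau IIA tableau
              [3/4,   1/4]])
c = np.array([1/3, 1.0])

def radau2_step(f, g, t0, x0, z0, h):
    def residual(w):
        X, Z = w[:2], w[2:]       # internal stage values X_i, Z_i
        K = np.array([f(t0 + c[i]*h, X[i], Z[i]) for i in range(2)])
        rx = X - (x0 + h * (A @ K))                        # (10.13c)
        rz = np.array([g(t0 + c[i]*h, X[i], Z[i]) for i in range(2)])
        return np.concatenate([rx, rz])
    w = fsolve(residual, np.array([x0, x0, z0, z0]))
    return w[1], w[3]             # x_n = X_s, z_n = Z_s (here c_s = 1)
```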


For the semi-explicit problem (10.3) the formula (10.13) for the internal stages reads

$$K_i = f(t_i, X_i, Z_i)$$
$$X_i = x_{n-1} + h \sum_{j=1}^{s} a_{ij} K_j$$
$$0 = g(t_i, X_i, Z_i) .$$

For the algebraic variables $z$ in this case it is often better to avoid the quadrature step implied by (10.14), because there is no corresponding integration in the DAE. This gives an advantage to stiffly accurate methods which satisfy $b_j = a_{sj}$, $j = 1, \ldots, s$, because for these methods the constraints are automatically satisfied at the final stage. Indeed, for such methods we have $y_n = Y_s$ in (10.13), and (10.14) is not used. For (10.3) we then simply set $x_n = X_s$.

As was the case for general multistep methods, there are additional order conditions which the method coefficients must satisfy for the method to attain order greater than 2, for general index-1 and Hessenberg index-2 DAEs. For Runge-Kutta methods, there is an additional set of order conditions even for semi-explicit index-1 DAEs. We are often faced with an order reduction, the causes of which are closely related to the causes of order reduction for Runge-Kutta methods applied to stiff ODEs (recall §4.7.3 and §8.6). This is not surprising, given the close relationship between very stiff ODEs and DAEs.

Reader's advice: It is possible to skip the remainder of this subsection if you are interested mainly in using the methods and do not require an understanding of the causes of order reduction.

To understand this order reduction, consider first the simple scalar ODE

$$\varepsilon z' = -z + q(t) \qquad (10.15a)$$

and its limit DAE

$$0 = -z + q(t) \qquad (10.15b)$$

to which we apply an s-stage Runge-Kutta method with a nonsingular coefficient matrix $A$. Using notation similar to Chapter 4, the internal stage solution values are

$$Z_i = z_{n-1} + \frac{h}{\varepsilon} \sum_{j=1}^{s} a_{ij}\,(q(t_j) - Z_j), \quad i = 1, \ldots, s .$$


So, with $Z = (Z_1, \ldots, Z_s)^T$, $Q = (q(t_1), \ldots, q(t_s))^T$, we have

$$Z = (\varepsilon h^{-1} I + A)^{-1} (\varepsilon h^{-1} \mathbf{1} z_{n-1} + A Q)$$

or

$$Z = \varepsilon h^{-1} A^{-1} \mathbf{1} z_{n-1} + (I - \varepsilon h^{-1} A^{-1}) Q + O(\varepsilon^2 h^{-2}) .$$

Letting $\varepsilon \to 0$ we get the exact DAE solution at the internal stages

$$Z_i = q(t_i), \quad i = 1, \ldots, s . \qquad (10.16)$$

At the end of the current step,

$$z_n = z_{n-1} - \frac{h}{\varepsilon}\, b^T (Z - Q) = z_{n-1} - b^T A^{-1} (\mathbf{1} z_{n-1} - Q) + O(\varepsilon h^{-1})$$

and for the DAE (10.15b) this gives

$$z_n = (1 - b^T A^{-1} \mathbf{1})\, z_{n-1} + b^T A^{-1} Q . \qquad (10.17)$$

The recursion (10.17) for $z_n$ converges if $|R(-\infty)| = |1 - b^T A^{-1} \mathbf{1}| \le 1$, but the order of approximation of the ODE, which involves quadrature precision, may be reduced. For instance, for an s-stage collocation method the approximate solution on the subinterval $[t_{n-1}, t_n]$ is a polynomial which interpolates $q(t)$ at the collocation points $t_i$ of (10.13b). The local error $z_n - q(t_n)$ (assuming $z_{n-1} = q(t_{n-1})$ for a moment) is therefore strictly an interpolation error, which is $O(h^{s+1})$.

The situation is much better if the method has stiff decay, which happens when $b^T$ coincides with the last row of $A$. In this case $c_s = 1$ necessarily, and

$$z_n = Z_s = q(t_n)$$

is exact. This can also be obtained from (10.17) upon noting that $b^T A^{-1} = (0, \ldots, 0, 1)$. Thus, while Gauss collocation yields a reduced local error order $O(h^{s+1})$, down from the usual order $O(h^{2s+1})$, Radau collocation yields the exact solution for (10.15b) at mesh points $t_n$.

Next, consider the system

$$x' = -x + q_1(t) \qquad (10.18a)$$
$$\varepsilon z' = -z + x + q_2(t) \qquad (10.18b)$$

and the corresponding index-1 DAE obtained with $\varepsilon = 0$. Applying the same Runge-Kutta discretization to this system and extending the notation in an obvious manner, e.g.

$$X_i = x_{n-1} + h \sum_{j=1}^{s} a_{ij} (-X_j + q_1(t_j))$$
$$Z_i = z_{n-1} + \frac{h}{\varepsilon} \sum_{j=1}^{s} a_{ij} (-Z_j + X_j + q_2(t_j)),$$
we obtain for (10.18b) as $\varepsilon \to 0$,

$$Z_i = X_i + q_2(t_i), \quad i = 1, \ldots, s$$
$$z_n = (1 - b^T A^{-1} \mathbf{1})\, z_{n-1} + b^T A^{-1} (Q_2 + X)$$

with an obvious extension of vector notation. Thus, the stage accuracy of the method, i.e. the local truncation error at each stage, enters the error in $z_n$, unless the method has stiff decay.

A Runge-Kutta method is said to have stage order $r$ if $r$ is the minimum order of the local truncation error over all internal stages. For an s-stage collocation method, the stage order is $s$. For an s-stage DIRK the stage order is 1. We see that for (10.18) the local error in $z_n$ has the reduced order $r + 1$, unless the method has stiff decay. For the latter there is no reduction in order.

This result can be extended to general semi-explicit index-1 DAEs (10.3). But it does not extend to fully implicit index-1 DAEs or to higher-index DAEs. In particular, DIRK methods experience a severe order reduction for semi-explicit index-2 DAEs and hence also for fully implicit index-1 problems. This is true even for DIRK methods which have stiff decay.

The rooted tree theory of Butcher has been extended to yield a complete set of necessary and sufficient order conditions for classes of DAEs such as semi-explicit index-1, index-1, and Hessenberg index-2 and index-3. We will not pursue this further here.

Collocation Methods

By their construction, Runge-Kutta methods which are collocation methods are not subject to such severe order limitations as DIRK methods in the DAE case. These methods were introduced in §4.7. For the semi-explicit DAE (10.3) we approximate $x$ by a continuous piecewise polynomial $x_\pi(t)$ of degree $< s + 1$ on each subinterval $[t_{n-1}, t_n]$, while $z$ is approximated by a piecewise polynomial which may be discontinuous at mesh points $t_n$ and has degree $< s$ on each subinterval (see Exercise 10.7). The convergence properties are summarized below.

Consider an s-stage collocation method of (ODE) order $p$, with all $c_i \ne 0$, approximating the fully-implicit index-1 DAE (10.1) which has sufficiently smooth coefficients in a neighborhood of an isolated solution. Let $\rho = 1 - b^T A^{-1} \mathbf{1}$ and assume $|\rho| \le 1$. This method converges and the order satisfies:

- The error in $y_n$ is at least $O(h^s)$.
- If $|\rho| < 1$ then the error in $y_n$ is $O(h^{s+1})$.
- If $\rho = -1$ and a mild mesh restriction applies then the error in $y_n$ is $O(h^{s+1})$.
- If $c_s = 1$ then the error in $y_n$ is $O(h^p)$.

For the semi-explicit index-2 DAE (10.3), the error results for the differential variable $x$ are the same as for the index-1 system reported above. For the algebraic variable $z$, the error satisfies:

- The error in $z_n$ is at least $O(h^{s-1})$.
- If $|\rho| < 1$ then the error in $z_n$ is $O(h^s)$.
- If $\rho = -1$ and a mild mesh restriction applies then the error in $z_n$ is $O(h^s)$.
- If $c_s = 1$ then the error in $z_n$ is $O(h^s)$.

In particular, collocation at Radau points retains the full order $p = 2s - 1$ for the differential solution components, and so this family of methods is recommended as the method of choice for general-purpose use among one-step methods for initial value DAEs and for very stiff ODEs.

Example 10.3 Fig. 10.3 is a schematic depiction of a simple circuit containing linear resistors, a capacitor, voltage sources (operating voltage $U_b$ and input signal $U_e$), and two npn-bipolar transistors. For the resistors and the capacitor the current relates directly to the voltage drop along the device (recall Example 9.3).

[Figure 10.3: A simple electric circuit, with nodes 1-5, resistors $R_1$-$R_5$, capacitor $C$, and voltage sources $U_e$ and $U_b$.]


For the transistors the relationship is nonlinear and is characterized by the voltage $U = U_B - U_E$ between the base and the emitter. (The third pole of the transistor is the collector C.) We use

$$I_E = f(U) = \beta [e^{U/U_F} - 1]$$
$$I_C = -\alpha I_E$$
$$I_B = (\alpha - 1) I_E$$

where $U_F = 0.026$, $\alpha = 0.99$, $\beta = 1.e{-}6$.

Applying Kirchhoff's current law at the 5 nodes in sequence, we get

$$0 = (U_1 - U_e)/R_1 - (\alpha - 1) f(U_1 - U_3)$$
$$C (U_2' - U_4') = (U_b - U_2)/R_2 + (U_4 - U_2)/R_4 - \alpha f(U_1 - U_3)$$
$$0 = (U_3 - U_0)/R_3 - f(U_1 - U_3) - f(U_4 - U_3)$$
$$C (U_4' - U_2') = (\alpha - 1) f(U_4 - U_3) - (U_4 - U_2)/R_4$$
$$0 = (U_5 - U_b)/R_5 + \alpha f(U_4 - U_3) .$$

We use the values $U_0 = 0$ (ground voltage), $U_b = 5$, $U_e = 5 \sin(2000 \pi t)$, $R_1 = 200$, $R_2 = 1600$, $R_3 = 100$, $R_4 = 3200$, $R_5 = 1600$, $C = 40.e{-}6$. (The potentials are in Volts, the resistances are in Ohms, $t$ is in seconds.)

This is a simple index-1 DAE which, however, has scaling and sensitivity difficulties due to the exponential in the definition of $f$. We can obviously make it semi-explicit for the differential variable $U_2 - U_4$, but we leave the system in the fully implicit form and apply the collocation code radau5. This code is based, to recall, on collocation at 3 Radau points. It applies to ODEs and DAEs of the form

$$M y' = \tilde{f}(t, y)$$

(see §10.3.2 and Exercise 10.15), and here we have such a form with the constant matrix

$$M = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & C & 0 & -C & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & -C & 0 & C & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} .$$

For consistent initial conditions, only $U_2(0) - U_4(0)$ is free. The rest are determined by the 4 algebraic equations. (How 4? Three are apparent; the fourth is obtained upon adding up the two equations containing derivatives, which cancels out the derivative term.) A consistent initial vector is given by

$$y(0) = (0, U_b, 0, 0, U_b)^T .$$
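The following sketch (our own, not from the book) evaluates the residual of this circuit DAE in the implicit form $0 = F(t, y, y')$, i.e. the kind of function one would hand to a general-purpose DAE code; the function and variable names are illustrative assumptions.

```python
import numpy as np

UF, alpha, beta = 0.026, 0.99, 1e-6
R1, R2, R3, R4, R5, C = 200., 1600., 100., 3200., 1600., 40e-6
U0, Ub = 0.0, 5.0
Ue = lambda t: 5.0 * np.sin(2000 * np.pi * t)   # input signal
f = lambda U: beta * (np.exp(U / UF) - 1.0)     # transistor characteristic

def residual(t, y, yp):
    """Residual of the circuit equations, y = (U1, ..., U5)."""
    U1, U2, U3, U4, U5 = y
    return np.array([
        (U1 - Ue(t)) / R1 - (alpha - 1) * f(U1 - U3),
        C * (yp[1] - yp[3]) - ((Ub - U2) / R2 + (U4 - U2) / R4
                               - alpha * f(U1 - U3)),
        (U3 - U0) / R3 - f(U1 - U3) - f(U4 - U3),
        C * (yp[3] - yp[1]) - ((alpha - 1) * f(U4 - U3) - (U4 - U2) / R4),
        (U5 - Ub) / R5 + alpha * f(U4 - U3)])

y0 = np.array([0.0, Ub, 0.0, 0.0, Ub])          # consistent initial vector
print(residual(0.0, y0, np.zeros(5)))           # ~0 in the algebraic rows
```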


[Figure 10.4: Results for a simple electric circuit: $U_2(t)$ (solid line) and the input $U_e(t)$ (dashed line); voltage (V) against time (sec), $0 \le t \le 5 \times 10^{-3}$.]

A plot of the resulting $U_2$ as a function of time, as well as the input signal $U_e$, is given in Fig. 10.4. It is seen that $U_2(t)$ has periods where it becomes large and negative. The solution is not very smooth. The code used 655 steps, of which 348 were accepted, for this simulation. The right hand side function was evaluated almost 7000 times, but the Jacobian only 342 times. □

10.1.4 Practical Difficulties

Even though there are order and convergence results for the backward Euler method (as well as for BDF and collocation at Radau points) applied to fully-implicit index-1 and semi-explicit index-2 DAEs, some practical difficulties persist. Fortunately, they are not insurmountable.

Obtaining a consistent set of initial conditions

A major difference in practice between the numerical solution of ODEs and DAEs is that the solution of a DAE system must be started with a consistent set of initial conditions. Recall from §9.1 that this means that the constraints, and possibly some hidden constraints, must be satisfied at the initial point. There are two basic types of initialization problems: when there is not enough information for a general-purpose code; and when there is too much information, or not the correct type of information, for the DAE to have a solution.


To understand the first of these better, consider the simplest instance of a semi-explicit, index-1 DAE (10.7). Suppose that

$$x(0) = x_0$$

is provided. This is precisely the information needed to specify a solution trajectory for this problem. For an ODE, i.e. (10.7) without $z$ and $g$, we can use the differential equation to obtain also $x'(0)$ (denote this by $x_0'$). This information is used by a general-purpose code to obtain an accurate initial guess for the Newton iteration and/or a reliable error estimate for the first time step. A general-purpose DAE solver may require² the value of $z_0$. This is for 3 reasons: to completely specify the solution at $t = 0$, to provide an initial guess for the variant of Newton's iteration used to find $z_1$, and to compute $x_0'$ from

$$x_0' = f(0, x_0, z_0) .$$

²Note that $z_0$ is not needed for the exact solution. Moreover, this value is never used in a simple calculation like for Example 10.2.

The solution process for $z_0$ consists in this case of solving the nonlinear equations

$$0 = g(0, x_0, z_0)$$

given $x_0$. Unlike in later steps, where we have $z_{n-1}$ to guess $z_n$ initially, here we must face a "cold start". This can be done with an off-the-shelf nonlinear equation solver. Also, some of the newer BDF software offers this as an initialization option. The implementation requires little in the way of additional information from the user, and can exploit structure in the iteration matrix, making use of the same linear algebra methods which are used in subsequent time-stepping.
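Such a cold start is a plain nonlinear-equation solve. A minimal sketch, with a toy constraint of our own (not from the book):

```python
import numpy as np
from scipy.optimize import fsolve

# "Cold start" for a semi-explicit index-1 DAE: given x0, solve the
# constraint 0 = g(0, x0, z0) for z0 with an off-the-shelf solver.
def g(t, x, z):
    return z**3 + z - x          # g_z = 3 z^2 + 1 > 0: nonsingular

x0 = 2.0
z0 = fsolve(lambda z: g(0.0, x0, z), 0.0)[0]
print(z0, g(0.0, x0, z0))        # consistent algebraic initial value
```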


Note that the above does not address the question of finding all initial values $z_0$, in case there is more than one isolated solution for these nonlinear algebraic equations. An extension of this procedure is given in Exercise 10.6. Another consistent initialization problem, that of finding initial values of the solution variables such that the system starts in a steady state, is present already for ODEs and discussed in Exercise 5.8.

Consistent initialization of more general index-1 DAEs involves more difficulties, because the differential and the algebraic variables are not separated. Thus, information that should be determined internally by the system may be specified externally (i.e., in addition to the DAE system).

Example 10.4 For the semi-explicit index-1 DAE

$$u' = -(u+v)/2 + q_1(t)$$
$$0 = (u-v)/2 - q_2(t)$$

it is clear that a prescribed $u_0$ determines $v_0 = u_0 - 2 q_2(0)$, and then $u_0' = -(u_0 + v_0)/2 + q_1(0)$.

But now, let $u = y_1 + y_2$, $v = y_1 - y_2$. This yields the DAE

$$y_1' + y_2' + y_1 = q_1(t)$$
$$y_2 = q_2(t) .$$

To get an isolated solution to the DAE, we need to specify $y_1(0) + y_2(0)$. But we cannot specify $y_1(0)$ and $y_2(0)$ arbitrarily, because $y_2(0) = q_2(0)$ is already determined. Specifying $y_1(0)$, it is not possible to solve directly for the remaining initial values as we did in the semi-explicit case. Instead, we can only find $y_1'(0) + y_2'(0) = q_1(0) - y_1(0)$. To find $y_1'(0)$ and $y_2'(0)$ individually, we need also the information from the derivative of the constraint, namely $y_2'(t_0) = q_2'(t_0)$. □

The situation gets more complex, of course, for higher index problems. Recall that consistent initial conditions for higher-index systems must satisfy the hidden constraints which are derivatives of the original constraints.

Example 10.5 Consider once more the simple pendulum in Cartesian coordinates. The equations (9.21) for this index-3 Hessenberg DAE are given in Example 9.7.

Note at first that $q(0)$ cannot be specified arbitrarily: given, e.g., $q_1(0)$, the value of $q_2(0) = \pm\sqrt{1 - q_1(0)^2}$ is determined up to a sign. Then, from the hidden constraint (9.22) the specification of one of the components of $v(0)$ also determines the other. In other words, the user's specification of $q(0)$ and $v(0)$ must satisfy the constraints (9.21e) and (9.22).

This then determines $q'(0)$ by (9.21a)-(9.21b) and $\lambda(0)$ according to (9.23). Finally, $v'(0)$ is determined by (9.21c)-(9.21d), although this may be considered less necessary. □

To make this task easier for non-Hessenberg DAEs (especially in large applications such as electric circuits, see Example 9.3), methods and software are available which use graph theoretic algorithms to determine the minimal set of equations to be differentiated in order to solve for the consistent initial values. Initialization for general index-1 systems and for higher-index systems is often handled on a case-by-case basis.

Ill-conditioning of the iteration matrix

Another difficulty, which shows up already in the solution of index-1 DAEs but is more serious for index-2 systems, concerns the linear system to be solved at each Newton iteration.


For explicit ODEs, as $h_n \to 0$ the iteration matrix tends to the identity.³ For index-1 and Hessenberg DAEs, the condition number of the iteration matrix is $O(h_n^{-p})$, where $p$ is the index. To illustrate, consider the backward Euler method applied to the semi-explicit index-1 DAE (10.7). The iteration matrix is

$$\begin{pmatrix} h_n^{-1} I - f_x & -f_z \\ -g_x & -g_z \end{pmatrix} .$$

It is easy to see that the condition number of this matrix is $O(h_n^{-1})$. For small $h_n$, this can lead to failure of the Newton iteration. However, scaling can improve the situation. In this case, multiplying the constraints by $h_n^{-1}$ yields an iteration matrix whose condition number no longer depends on $h_n^{-1}$ in this way. For Hessenberg index-2 systems the conditioning problem can be partially fixed by scaling both the algebraic variables and the constraints (see Exercise 9.2).

³Of course, for very stiff ODEs the term $h_n^{-1} I$ which appears in the iteration matrix does not help much, because there are larger terms which dominate. The situation for a very stiff ODE is similar to that of the limit DAE.

Error estimation for index-2 DAEs

Recall that in modern BDF codes, the errors at each step are estimated via a weighted norm of a divided difference of the solution variables. For ODEs, this norm is taken over all the solution variables. This type of error estimate still works for fully-implicit index-1 DAEs, but it is not appropriate for index-2 problems, as illustrated by the following example.

Example 10.6 Consider the simple index-2 DAE

$$y_1 = q(t)$$
$$y_2 = y_1'$$

solved by the backward Euler method to give

$$y_{1,n} = q(t_n)$$
$$y_{2,n} = \frac{y_{1,n} - y_{1,n-1}}{h_n} = \frac{q(t_n) - q(t_{n-1})}{h_n} .$$

The truncation error is estimated via the second divided difference of the numerical solution which, for the algebraic variable $y_2$, yields

$$\mathrm{EST} = h_n (h_n + h_{n-1})\, [y_{2,n}, y_{2,n-1}, y_{2,n-2}] = h_n (h_n + h_{n-1}) \left( \frac{ \frac{y_{2,n} - y_{2,n-1}}{h_n} - \frac{y_{2,n-1} - y_{2,n-2}}{h_{n-1}} }{ h_n + h_{n-1} } \right) \qquad (10.19)$$

$$= h_n \left( \frac{ \frac{q(t_n) - q(t_{n-1})}{h_n} - \frac{q(t_{n-1}) - q(t_{n-2})}{h_{n-1}} }{ h_n } - \frac{ \frac{q(t_{n-1}) - q(t_{n-2})}{h_{n-1}} - \frac{q(t_{n-2}) - q(t_{n-3})}{h_{n-2}} }{ h_{n-1} } \right) .$$


For an ODE, or even for the differential variable $y_1$ of this example, EST $\to 0$ as $h_n \to 0$ (all previous step sizes are fixed). However, (10.19) yields for the error estimate of the algebraic variable

$$\lim_{h_n \to 0} \mathrm{EST} = \lim_{h_n \to 0} \frac{q(t_n) - q(t_{n-1})}{h_n} - \frac{q(t_{n-1}) - q(t_{n-2})}{h_{n-1}} = q'(t_{n-1}) - \frac{q(t_{n-1}) - q(t_{n-2})}{h_{n-1}}$$

which in general is nonzero. Thus, the error estimate for this variable cannot be decreased to zero by reducing the step size. This can lead to repeated error test failures. The approximation to $y_2$ is actually much more accurate than the error estimate suggests. □
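The nonvanishing limit is easy to observe numerically. The following script is our own check of Example 10.6, with $q = \sin$ and the previous steps held fixed while $h_n$ is reduced:

```python
import numpy as np

q = np.sin
h_nm2 = h_nm1 = 0.1                      # fixed previous step sizes
t_nm3 = 0.0
t_nm2, t_nm1 = t_nm3 + h_nm2, t_nm3 + h_nm2 + h_nm1
y2 = lambda t, t_old, h: (q(t) - q(t_old)) / h    # backward Euler y2 values
for h_n in (1e-1, 1e-2, 1e-3, 1e-4):
    t_n = t_nm1 + h_n
    a = y2(t_n, t_nm1, h_n)
    b = y2(t_nm1, t_nm2, h_nm1)
    c = y2(t_nm2, t_nm3, h_nm2)
    est = h_n * ((a - b) / h_n - (b - c) / h_nm1)   # (10.19) rearranged
    print(f"h_n = {h_n:7.0e}   EST = {est: .6f}")
# EST tends to q'(t_{n-1}) - (q(t_{n-1}) - q(t_{n-2}))/h_{n-1}, not to 0.
```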


The problem can be fixed by eliminating the algebraic variables (and in particular the index-2 variables) from the error test. In fact, it has been shown that this strategy is safe in the sense that it does not sacrifice the accuracy of the lower-index (or differential) variables, which control the time-evolution of the system. We note that the algebraic variables should not be removed from the Newton convergence test.

Given the difficulties encountered for direct DAE discretization methods, and our recommendation not to apply such methods for DAEs beyond semi-explicit index-2, we must also emphasize again that, on the other hand, such direct discretization methods are important in practice. Index reduction may be necessary at times, but it is often not a desirable medicine! One class of applications where this is important is in large circuit simulation, as discussed in Example 9.3. We saw a small instance in Example 10.3. Other such examples often arise in chemical engineering and in a variety of applications involving the method of lines. For large problems, which arise routinely in practice, a conversion to explicit ODE form can be a disaster if as a result the sparsity structure of the matrices involved is lost.

10.1.5 Specialized Runge-Kutta Methods for Hessenberg Index-2 DAEs

The methods discussed in this section apply to Hessenberg index-2 problems (10.2) and not to the more general form of (10.3). The structure of the pure index-2 system is exploited to achieve gains which are not possible for the perturbed ODE (10.4).

Projected Runge-Kutta Methods

As we have seen in Chapter 8, one-sided formulas like Radau collocation without upwinding are not well-suited for the solution of general boundary value problems. Since a stable boundary value problem can have solution modes which decrease rapidly in both directions, a symmetric method is preferred, or else such modes must be explicitly decoupled. The Gauss collocation methods have been particularly successful for the solution of ODE boundary value problems. However, these methods do not have stiff decay, and when implemented in a straightforward manner as described in §10.1.3, they suffer a severe order reduction for Hessenberg index-2 DAEs. In general, the midpoint method is accurate only to $O(1)$ for the index-2 variable $z$ in (10.2). There are additional difficulties for these methods applied to Hessenberg index-2 DAEs, including potential instability and the lack of a nice, local error expansion. Fortunately, all of these problems can be eliminated by altering the method to include a projection onto the constraint manifold at the end of each step. Thus, not only $z_\pi(t)$ but also $x_\pi(t)$, the piecewise polynomial approximating $x(t)$, may become discontinuous at points $t_n$ (see Exercises 10.8 and 10.9).

Let $x_n$, $z_n$ be the result of one step, starting from $x_{n-1}, z_{n-1}$, of an implicit Runge-Kutta method (10.13) applied to the Hessenberg index-2 DAE (10.2). Rather than accepting $x_n$ as the starting value for the next step, the projected Runge-Kutta method modifies $x_n$ at the end of each step so as to satisfy

$$\hat{x}_n = x_n + f_z(t_n, x_n, z_n)\, \lambda_n \qquad (10.20a)$$
$$0 = g(t_n, \hat{x}_n) . \qquad (10.20b)$$

(The extra variables $\lambda_n$ are needed for the projection only. They are not saved.) Then set $x_n \leftarrow \hat{x}_n$ and advance to the next step.

Note that for a method with stiff decay, (10.20b) is already satisfied by $x_n$, so there is no need to project. For collocation the projection gives the methods essentially the same advantages that Radau collocation has without the extra projection. In particular, projected collocation methods achieve superconvergence order for $x$ at the mesh points. The solution for $z$ can be determined from the solution for $x$, and to the same order of accuracy, via a post-processing step.

Projected collocation at Gauss points has order 2s and is useful for boundary value DAEs.
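The end-of-step projection (10.20) is itself a small Newton iteration in $\lambda_n$. A sketch (our own), assuming the user supplies the Jacobians $f_z$ and $g_x$ evaluated at $(t_n, x_n, z_n)$ and that $g_x f_z$ is nonsingular (which holds for an index-2 Hessenberg system):

```python
import numpy as np

def project_step(g, t, x, fz, gx, tol=1e-12, max_iter=5):
    """Project x onto g(t, .) = 0 along the columns of fz, cf. (10.20)."""
    lam = np.zeros(fz.shape[1])
    for _ in range(max_iter):               # Newton on g(t, x + fz @ lam) = 0
        r = g(t, x + fz @ lam)
        if np.linalg.norm(r) < tol:
            break
        lam -= np.linalg.solve(gx @ fz, r)   # Jacobian frozen at (x, z)
    return x + fz @ lam
```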


Half-Explicit Runge-Kutta Methods

For many applications, a fully-implicit discretization method is not warranted. For example, many mechanical systems are essentially nonstiff and can, with the exception of the constraints, be handled via explicit methods. One way to accommodate this is via half-explicit Runge-Kutta methods. The methods obtained share many attributes with the methods to be described in the next section.

The half-explicit Runge-Kutta method is defined, for a semi-explicit DAE (10.3), by

$$X_i = x_{n-1} + h \sum_{j=1}^{i-1} a_{ij} f(t_j, X_j, Z_j)$$
$$0 = g(t_i, X_i, Z_i), \quad i = 1, \ldots, s$$
$$x_n = x_{n-1} + h \sum_{i=1}^{s} b_i f(t_i, X_i, Z_i)$$
$$0 = g(t_n, x_n, z_n) . \qquad (10.21)$$

Thus, at each stage $i$, $X_i$ is evaluated explicitly and a smaller nonlinear system is solved for $Z_i$.

For semi-explicit index-1 DAEs, the order of accuracy is the same as for ODEs. In fact, the method is not very different from the corresponding explicit Runge-Kutta method applied to the ODE $x' = f(t, x, z(x))$. For semi-explicit index-2 systems in Hessenberg form, there is in general order reduction, but higher-order methods of this type have been developed.

10.2 Methods for ODEs on Manifolds

The numerical solution of differential systems where the solution lies on a manifold defined explicitly by algebraic equations is a topic with interest in its own right. It also provides a useful approach for solving DAEs.

As in §9.2.2, consider the nonlinear differential system

$$x' = f(x) \qquad (10.22a)$$

and assume for simplicity that for each initial value vector $x(0) = x_0$ there is a unique $x(t)$ satisfying (10.22a). Suppose in addition that there is an invariant set $\mathcal{M}$ defined by the algebraic equations

$$0 = h(x) \qquad (10.22b)$$

such that if $h(x_0) = 0$ then $h(x(t)) = 0$ for all $t$. There are various approaches possible for the numerical solution of (10.22).


1. Solve the stabilized ODE (9.40) numerically, using one of the discretization methods described in earlier chapters. The question of choosing the stabilization parameter $\gamma$ arises. As it turns out, the best choice of $\gamma$ typically depends on the step size, though; see Exercise 10.11 and Example 10.7.

2. Rather than discretizing (9.40), it turns out to be cheaper and more effective to stabilize the discrete dynamical system, i.e. to apply the stabilization at the end of each step. Thus, an ODE method is applied at each step to (10.22a). This step is followed by a post-stabilization or a coordinate projection step to bring the numerical solution closer to satisfying (10.22b), not unlike the projected Runge-Kutta methods of the previous section.

3. The "automatic" approach attempts to find a discretization for (10.22a) which automatically satisfies also the equations (10.22b). This is possible when the constraints are at most quadratic; see Exercises 4.15-4.16.

Of these approaches, we now concentrate on post-stabilization and coordinate projection.

10.2.1 Stabilization of the Discrete Dynamical System

If the ODE is not stiff then it is desirable to use an explicit discretization method, but to apply stabilization at the end of the step. This is reminiscent of half-explicit Runge-Kutta methods (10.21). Suppose we use a one-step method of order $p$ with a step size $h$ for the given ODE (without a stabilization term). Thus, if at time $t_{n-1}$ the approximate solution is $x_{n-1}$, application of the method gives

$$\tilde{x}_n = \Phi_h^f(x_{n-1})$$

as the approximate solution at $t_n$ (e.g. forward Euler: $\Phi_h^f(x_{n-1}) = x_{n-1} + h f(x_{n-1})$).

The post-stabilization approach modifies $\tilde{x}_n$ at the end of the time step to produce $x_n$, which better approximates the invariant's equations:

$$\tilde{x}_n = \Phi_h^f(x_{n-1}) \qquad (10.23a)$$
$$x_n = \tilde{x}_n - F(\tilde{x}_n)\, h(\tilde{x}_n) . \qquad (10.23b)$$

The stabilization matrix function $F$ was mentioned already in (9.40) and its selection is further discussed in §10.2.2.
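A minimal sketch of (10.23) in code, using forward Euler for $\Phi$ and the choice $F = H^T (H H^T)^{-1}$ discussed in §10.2.2; the circle invariant used to exercise it is our own toy example, not from the book.

```python
import numpy as np

def post_stabilized_step(f, h, H, x, dt):
    x_t = x + dt * f(x)                     # (10.23a), forward Euler
    Hx = H(x_t)
    F = Hx.T @ np.linalg.inv(Hx @ Hx.T)     # stabilization matrix
    return x_t - F @ h(x_t)                 # (10.23b)

# Toy invariant: motion on the unit circle, h(x) = |x|^2 - 1.
f = lambda x: np.array([-x[1], x[0]])
h = lambda x: np.array([x @ x - 1.0])
H = lambda x: 2.0 * x.reshape(1, -1)        # Jacobian of h
x = np.array([1.0, 0.0])
for _ in range(1000):
    x = post_stabilized_step(f, h, H, x, dt=0.01)
print(x, h(x))                              # drift in h stays tiny
```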


Example 10.7 For the scalar ODE with invariant

$$x' = \phi'(t)$$
$$0 = x - \phi(t)$$

with $x(0) = \phi(0)$, where $\phi$ is a given, sufficiently differentiable function, the exact solution is $x = \phi(t)$.

The post-stabilization procedure based, e.g., on forward Euler,

$$\tilde{x}_n = x_{n-1} + h \phi'(t_{n-1})$$
$$x_n = \tilde{x}_n - (\tilde{x}_n - \phi(t_n))$$

produces the exact solution for this simple example.

Consider, on the other hand, the stabilization (9.40). Here it gives the stabilized differential equation

$$x' = \phi'(t) - \gamma (x - \phi(t)) .$$

For $\gamma = 0$ the invariant is stable but not asymptotically stable, while for $\gamma > 0$ $\mathcal{M}$ is asymptotically stable, with the monotonicity property (9.42) holding. But this asymptotic stability does not necessarily guarantee a vanishing drift: consider forward Euler with step size $h$ applied to the stabilized ODE

$$x_n = x_{n-1} + h [\phi'(t_{n-1}) - \gamma (x_{n-1} - \phi(t_{n-1}))] .$$

The best choice for $\gamma$ is the one which yields no error accumulation. This is obtained for $\gamma = 1/h$, giving

$$x_n = \phi(t_{n-1}) + h \phi'(t_{n-1}) .$$

(Note that this choice depends on the discretization step size.) So, the drift

$$x_n - \phi(t_n) = -\frac{h^2}{2} \phi''(t_{n-1}) + O(h^3),$$

although second order in $h$, may not decrease and may even grow arbitrarily with $h$ fixed, if $\phi''$ grows. Such is the case, for instance, for $\phi(t) = \sin t^2$ as $t$ grows. □
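The claims of Example 10.7 are easy to check numerically. The following script (our own) applies forward Euler to the stabilized ODE with $\phi(t) = \sin t^2$ and several values of $\gamma$, including $\gamma = 1/h$:

```python
import numpy as np

phi  = lambda t: np.sin(t * t)
dphi = lambda t: 2 * t * np.cos(t * t)
h = 0.01
for gamma in (0.0, 1.0, 1.0 / h):
    x, t = phi(0.0), 0.0
    for _ in range(int(10.0 / h)):            # integrate to t = 10
        x += h * (dphi(t) - gamma * (x - phi(t)))
        t += h
    print(f"gamma = {gamma:6.1f}   drift |x - phi| = {abs(x - phi(t)):.2e}")
# gamma = 1/h gives x_n = phi(t_{n-1}) + h*phi'(t_{n-1}); the drift is
# O(h^2) per step but need not shrink as t grows, since phi'' grows.
```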


For the post-stabilization to be effective, we must design $F$ such that

$$\| I - H F \| \le \rho < 1, \qquad (10.24)$$

where $H = h_x$. It has been shown, assuming (i) sufficient smoothness near the manifold $\mathcal{M}$ and (ii) that either $\rho = O(h)$ or (9.42) holds, that for an ODE method of (nonstiff) order $p$:

- The global error satisfies

$$x_n - x(t_n) = O(h^p) \qquad (10.25)$$

(i.e. the stabilization does not change the method's global order, in general).

- There is a constant $K$ that depends only on the local solution properties such that

$$|h(x_n)| \le K (\rho h^{p+1} + h^{2(p+1)}) . \qquad (10.26)$$

- If $HF = I$ then

$$|h(x_n)| = O(h^{2(p+1)}) . \qquad (10.27)$$

Example 10.8 Recall the modified Kepler problem of Exercise 4.19 (with the notation $e$ in place of $H$ for the Hamiltonian there). For the unmodified problem, $\delta = 0$ and the solution has period $2\pi$. Thus, the error in the solution can be simply measured at integer multiples of $2\pi$. In Table 10.1 we record results using the midpoint method and an explicit 2nd order Runge-Kutta method, each with and without post-stabilization using $F = H^T (H H^T)^{-1}$ ('pstab-midpt', 'midpt', 'pstab-eRK' and 'eRK', resp.), and the projected midpoint method of §10.1.5 and Exercise 10.9 ('proj-midpt') applied to the projected invariant formulation (9.39). All runs are with uniform time steps $h$ and $\varepsilon = 0.6$. Note that the projected midpoint method has better stability properties and preserves the invariant, but the symmetry of the original ODE is lost.

method      |   h    | |q2(2π)| | |q2(4π)| | |q2(20π)| | |q2(50π)| |  ‖h‖∞
midpt       |  .01π  |  .16     |  .30     |  .72      |  .10      |  .42e-2
eRK         |  .01π  |  .12     |  .18     |  .67      |  .52      |  .36e-1
pstab-midpt |  .01π  |  .54e-2  |  .11e-1  |  .54e-1   |  .13      |  .81e-7
pstab-eRK   |  .01π  |  .40e-2  |  .81e-2  |  .40e-1   |  .10      |  .15e-6
proj-midpt  |  .01π  |  .14e-2  |  .28e-2  |  .14e-1   |  .34e-1   |  0
midpt       |  .001π |  .16e-2  |  .32e-2  |  .16e-1   |  .40e-1   |  .42e-4
eRK         |  .001π |  .15e-2  |  .29e-2  |  .12e-1   |  .20e-1   |  .41e-4
pstab-midpt |  .001π |  .54e-4  |  .11e-3  |  .54e-3   |  .14e-2   |  .83e-13
pstab-eRK   |  .001π |  .40e-4  |  .81e-4  |  .40e-3   |  .10e-2   |  .86e-13
proj-midpt  |  .001π |  .14e-4  |  .29e-4  |  .14e-3   |  .36e-3   |  0

Table 10.1: Errors for Kepler's problem using various 2nd order methods.

We observe the second order accuracy of all methods considered and the invariant's accuracy order $2(p + 1) = 6$ of the post-stabilization methods.


The stabilization methods improve the constant of the global error, compared to their unstabilized counterparts, but not the order. The cheapest method here, for the given range of time integration and relative to the quality of the results, is the post-stabilized explicit method. The projected midpoint method is more expensive than the rest and is not worth its price, despite being most accurate for a given step size.

Note that the midpoint method loses all significant digits for $h = .01\pi$ before reaching $t = 50\pi$. The pointwise error does not explode, however, but remains $O(1)$. Also, the error in the Hamiltonian remains the same, depending only on the step size $h$, not on the interval length. Calculations with the post-stabilized midpoint method up to $t = 2000\pi$ yield similar conclusions regarding the invariant's error for it as well (but not for the post-stabilized explicit Runge-Kutta method, where a smaller step size is found necessary). □

A closely related stabilization method is the coordinate projection method. Here, following the same unstabilized ODE integration step as before

$$\tilde{x}_n = \Phi_h^f(x_{n-1})$$

we determine $x_n$ as the minimizer of $|x_n - \tilde{x}_n|_2$ such that

$$0 = h(x_n) .$$

There is a constrained least squares minimization problem to be solved for $x_n$ at each step $n$. As it turns out, the post-stabilization method (10.23) with $F = H^T (H H^T)^{-1}$ (for which obviously $HF = I$), coincides with one Newton step for this local minimization problem.⁴ An analogue of the relationship between these two stabilization methods would be using a PECE version of a predictor-corrector as in §5.4.2, compared to iterating the corrector to convergence using a functional iteration. In particular, the two methods almost coincide when the step size $h$ is very small.

For this reason, there has been a tendency in the trade to view the two methods of post-stabilization and coordinate projection as minor variants of each other. There is an advantage in efficiency for the post-stabilization method, though. Note that the choice of $F$ is more flexible for the post-stabilization method, that (10.27) implies that the first Newton iteration of the coordinate projection method is already accurate to $O(h^{2(p+1)})$, and that no additional iteration at the current time step is needed for maintaining this accuracy level of the invariant in later time steps.

⁴An energy norm $|x_n - \tilde{x}_n|_A^2 = (x_n - \tilde{x}_n)^T A (x_n - \tilde{x}_n)$ for a positive definite matrix $A$ can replace the 2-norm in this minimization, with a corresponding modification in $F$; see Exercise 10.13. The error bound (10.27) still holds for the outcome of one Newton step.


Exercise 4.17 provides another example of post-stabilization (which coincides with coordinate projection) in action.

10.2.2 Choosing the Stabilization Matrix F

The smaller $\|I - HF\|$ is, the more effective the post-stabilization step. The choice $F = H^T (H H^T)^{-1}$ which was used in Example 10.8 above, or more generally the choice corresponding to one Newton step of coordinate projection $F = D (H D)^{-1}$, achieves the minimum $HF = I$.

However, choices of $F$ satisfying $HF = I$ may be expensive to apply. In particular, for the Euler-Lagrange equations (9.30), it is desirable to avoid the complicated and expensive matrix $\frac{\partial (Gv)}{\partial q}$. Such considerations are application-dependent. To demonstrate possibilities, let us continue with the important class of Euler-Lagrange equations and set

$$B = M^{-1} G^T .$$

Note that inverting (or rather, decomposing) $GB$ is necessary already to obtain the ODE with invariant (10.22).

If we choose for the index-3 problem

$$F = \begin{pmatrix} B (G B)^{-1} & 0 \\ 0 & B (G B)^{-1} \end{pmatrix} \qquad (10.28)$$

(or the sometimes better choice

$$F = \begin{pmatrix} G^T (G G^T)^{-1} & 0 \\ 0 & G^T (G G^T)^{-1} \end{pmatrix}$$

which, however, requires an additional cost) then

$$HF = \begin{pmatrix} I & 0 \\ L & I \end{pmatrix}, \qquad L = \frac{\partial (Gv)}{\partial q}\, B (G B)^{-1}$$

so $HF \ne I$. Note, however, that

$$(I - HF)^2 = 0 .$$

The effect of $HF = I$ can therefore be achieved by applying post-stabilization with the cheap $F$ of (10.28) twice. The decomposition (or "inversion") needed for evaluating $F$ is performed once and this is frozen for further application at the same time step (possibly a few time steps).

The application to multibody systems with holonomic constraints then reads:


Algorithm 10.1 Post-stabilization for multibody systems

1. Starting with $(q_{n-1}, v_{n-1})$ at $t = t_{n-1}$, use a favorite ODE integration method $\Phi_h^f$ (e.g. Runge-Kutta or multistep) to advance the system

$$q' = v$$
$$M(q)\, v' = f(q, v) - G^T(q)\, \lambda$$
$$0 = G(q)\, v' + \frac{\partial (Gv)}{\partial q}\, v$$

by one step. Denote the resulting values at $t_n$ by $(\tilde{q}_n, \tilde{v}_n)$.

2. Post-stabilize using $F$ of (10.28):

$$\begin{pmatrix} \bar{q}_n \\ \bar{v}_n \end{pmatrix} = \begin{pmatrix} \tilde{q}_n \\ \tilde{v}_n \end{pmatrix} - F(\tilde{q}_n, \tilde{v}_n)\, h(\tilde{q}_n, \tilde{v}_n) .$$

3. Set

$$\begin{pmatrix} q_n \\ v_n \end{pmatrix} = \begin{pmatrix} \bar{q}_n \\ \bar{v}_n \end{pmatrix} - F(\tilde{q}_n, \tilde{v}_n)\, h(\bar{q}_n, \bar{v}_n) .$$

In case of nonholonomic constraints the DAE is index-2 and only one application of $F$ per step is needed.
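In code, Algorithm 10.1 reads roughly as follows. This is a sketch under the assumption that the user supplies the ODE step `phi`, the stacked constraint residual `h_fun(w)` returning $(g, Gv)$, and the frozen stabilization matrix (10.28) via `F_fun`; all names are our own.

```python
import numpy as np

def post_stabilized_multibody_step(phi, h_fun, F_fun, q, v, dt):
    qt, vt = phi(q, v, dt)              # 1. unstabilized ODE step
    F = F_fun(qt, vt)                   #    decompose once, then freeze
    w = np.concatenate([qt, vt])
    w = w - F @ h_fun(w)                # 2. first post-stabilization
    w = w - F @ h_fun(w)                # 3. second application: (I - HF)^2 = 0
    n = len(q)
    return w[:n], w[n:]
```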


Example 10.9 Consider a two-link planar robotic system with a prescribed path for its end effector (the "robot's hand"). Thus, one end of a rigid rod is fixed at the origin, and the other is connected to another rigid rod with rotations allowed in the x-y plane. Let $\theta_1$ be the angle that the first rod makes with the horizontal axis, and let $\theta_2$ be the angle that the second rod makes with respect to the first rod (see Fig. 10.5). The masses of the rods are denoted by $m_i$ and their lengths are denoted by $l_i$. The coordinates of the link between the rods are given by

$$x_1 = l_1 c_1, \qquad y_1 = l_1 s_1$$

and those of the "free" end are

$$x_2 = x_1 + l_2 c_{12}, \qquad y_2 = y_1 + l_2 s_{12}$$

where $c_i = \cos \theta_i$, $s_i = \sin \theta_i$, $c_{12} = \cos(\theta_1 + \theta_2)$, $s_{12} = \sin(\theta_1 + \theta_2)$.

[Figure 10.5: Two-link planar robotic system, with angles $\theta_1, \theta_2$, link point $(x_1, y_1)$ and end effector $(x_2, y_2)$.]

Referring to the notation of the Euler-Lagrange equations (9.30), we let $q = (\theta_1, \theta_2)^T$ and obtain

$$M = \begin{pmatrix} m_1 l_1^2/3 + m_2 (l_1^2 + l_2^2/3 + l_1 l_2 c_2) & m_2 (l_2^2/3 + l_1 l_2 c_2/2) \\ m_2 (l_2^2/3 + l_1 l_2 c_2/2) & m_2 l_2^2/3 \end{pmatrix}$$

$$f = \begin{pmatrix} -m_1 g l_1 c_1/2 - m_2 g (l_1 c_1 + l_2 c_{12}/2) \\ -m_2 g l_2 c_{12}/2 \end{pmatrix} + \begin{pmatrix} m_2 l_1 l_2 s_2 (2 \theta_1' \theta_2' + (\theta_2')^2)/2 \\ -m_2 l_1 l_2 s_2 (\theta_1')^2/2 \end{pmatrix} .$$

In the following simulation we use the data

$$m_1 = m_2 = 36\,kg, \quad l_1 = l_2 = 1\,m, \quad g = 9.81\,m/s^2$$
$$\theta_1(0) = 70°, \quad \theta_2(0) = -140°, \quad \theta_1'(0) = \theta_2'(0) = 0 .$$

So far we do not have constraints $g$. Indeed, for a double pendulum the equations of motion form an implicit ODE (or an index-0 DAE), because the topology of this simple mechanical system has no closed loops and we are using relative (minimal) coordinates to describe the system. But now we prescribe some path constraint on the position of $(x_2, y_2)$, and this yields, in turn, also a constraint force $G^T \lambda$. We choose the constraint

$$y_2(t) = \sin^2(t/2)$$

(for $y_2$ expressed in terms of $q$ as described above). The obtained constrained path for $(x_2, y_2)$ is depicted in Fig. 10.6. In this case the constraint forces become large at a few distinct times.

In Table 10.2 we record the measured drifts, i.e. the error in the path constraint ('drift-position') and in its derivative ('drift-velocity'), based on runs up to $b = 10s$ using an explicit Runge-Kutta scheme of order 2 with a constant step size $h$.


[Figure 10.6: Constraint path for $(x_2, y_2)$: y vs. x position motion.]

We record results using Baumgarte's technique (9.43), denoting it 'Baum($\gamma_1, \gamma_2$)', and also using various choices of $F$ for post-stabilization: 'S-pos' stands for stabilizing only with respect to the position constraints $g(t, q) = 0$; 'S-vel' stands for stabilizing only with respect to the velocity constraints $g' = 0$; 'S-both' stands for using $F$ of (10.28) once; 'S-both2' is the choice recommended in Algorithm 10.1; and finally 'S-full' uses $F = H^T (H H^T)^{-1}$.

Note that without stabilization the computation blows up for $h = 0.01$. The Baumgarte stabilization is not as effective as the S- stabilizations, especially for the case $h = .01$. Other parameters $(\gamma_1, \gamma_2)$ tried do not yield significantly better results. The choice of Algorithm 10.1 shows drift-convergence order $6 = 2(p+1)$ and, given that it is much cheaper than S-full and not much more expensive than the other choices for $F$, we conclude that S-both2 gives the most bang for the buck here. □


h    | stabilization | drift-velocity | drift-position
.01  | Baum(0,0)     | *              | *
.01  | Baum(12,70)   | .51            | .25e-1
.01  | S-full        | .72e-5         | .63e-6
.01  | S-both        | .31e-1         | .68e-5
.01  | S-both2       | .20e-3         | .68e-6
.01  | S-vel         | .60e-14        | .78e-2
.01  | S-pos         | .73            | .28e-2
.001 | Baum(0,0)     | .66e-4         | .56e-4
.001 | Baum(12,70)   | .60e-3         | .38e-4
.001 | S-full        | .39e-10        | .53e-11
.001 | S-both        | .41e-4         | .46e-11
.001 | S-both2       | .20e-9         | .78e-15
.001 | S-vel         | .43e-14        | .58e-4
.001 | S-pos         | .44e-2         | .88e-10

Table 10.2: Maximum drifts for the robot arm; * denotes an error overflow.

10.3 Software, Notes and References

10.3.1 Notes

Of course, scientists encountered the need to numerically solve mathematical models involving differential equations with constraints and implicit differential equations for many decades, if not centuries. But the recognition that DAE classes are worth being considered as such, in order to methodically derive good numerical methods and software, is relatively recent.

The original idea for discretizing DAEs directly with suitable ODE methods was described in the landmark 1971 paper of Gear [42]. He used BDF methods and applied them to problems of the type discussed in Example 9.3. This was followed in the 1980's by a deluge of efforts to design and analyze numerical methods and to write general-purpose software for DAEs. The direct discretization methods described in §10.1 are covered in more detail in Hairer & Wanner [52] and Brenan, Campbell & Petzold [19].

We have chosen not to discuss convergence results for numerical methods applied directly to index-3 DAEs. However, there are convergence results for some numerical methods (both BDF and Runge-Kutta) applied to Hessenberg DAEs of index greater than two; see [44, 52].


We have noted that direct discretization methods are not applicable to general, higher-index DAEs. Campbell [28, 29] has developed least-squares type methods for such problems which may be viewed as automatic index reduction. The methods require differentiation of the original DAE, which is accomplished by an automatic differentiation software package such as adifor [16]. Using these and similar ideas, initialization schemes for general DAE systems have been constructed.

The first results on order reduction for general multistep methods applied to higher-index DAEs were given by R. März; see [46, 52] for a summary and for further references.

More details and proofs for §10.2.1 and §10.2.2 can be found in Chin [31] and [2]. See also [3], which is the source for Examples 10.8 and 10.9.

Example 10.3 was taken from [58]. Other examples from various applications in the literature are formulated as exercises below.

10.3.2 Software

Excellent and widely-available software exists for the solution of initial-value problems and boundary-value problems in DAEs. Here we briefly outline some of the available codes. With the exception of software for mechanical systems, they all apply to stiff ODEs as well (and if you read the first few pages of this chapter carefully then you should be able to understand why this is natural).

Initial Value Problems

- The code dassl by Petzold uses the fixed-leading-coefficient form of the BDF formulas to solve general index-1 DAEs; see [19] for details. Versions for large scale problems (called daspk) and for sensitivity analysis are also available.

- The code radau5 by Hairer & Wanner [52] is based on the 3-stage Radau collocation method. It solves DAEs of the form

$$M y' = \tilde{f}(t, y) \qquad (10.29)$$

where $M$ is a constant, square matrix which may be singular; see Exercise 10.15. The code is applicable to problems of index 1, 2 or 3, but the user must specify which variables are higher-index (this implies a special structure).

- There are many codes, both commercial and publicly available, which are designed specifically for simulating constrained mechanical systems. They use many of the methods mentioned here, including Baumgarte stabilization, post-stabilization and coordinate projection, and various coordinate partitioning methods. The code mexx by Lubich et al. [64] is based on a half-explicit extrapolation method which we have not covered and implements fast linear algebra techniques for tree-structured mechanical systems.


Boundary Value Problems

- The code coldae by Ascher & Spiteri [11] uses projected Gauss collocation to extend colnew [13] for boundary value, semi-explicit index-2 DAEs in the form (10.3). An additional singular value decomposition decouples algebraic variables of different index if needed.

10.4 Exercises

1. Show that the implicit Euler method is unstable for the DAE (10.6) if $\eta < -.5$.

2. Consider the backward Euler method applied to the Hessenberg index-2 DAE (10.2).

(a) Show that the condition number of the iteration matrix is $O(h^{-2})$.

(b) How should the equations and variables be scaled to reduce the condition number to $O(1)$?

(c) What are the implications of scaling the variables on the accuracy one can expect in these variables from the linear system solver?

3. Set $\varepsilon = 0$ in Example 9.8 and solve the resulting DAE numerically. You may use any (justifiable) means you like, including index reduction and use of an appropriate software package. Plot the solution and compare with Fig. 9.2. Discuss.

4. Consider two linked bars of length $l_i$ and mass $m_i$, $i = 1, 2$. One end of one bar is fixed at the origin, allowing only rotational motion in the plane (as in Fig. 10.5). The other end of the other bar is constrained to slide along the x-axis.

The equations of motion form a nonlinear index-3 DAE of the form (9.30). Using redundant, absolute coordinates, let $u_i, v_i, \theta_i$ be the coordinates of the center of mass of the $i$-th bar. Then define

$$q = (u_1, v_1, \theta_1, u_2, v_2, \theta_2)^T$$
$$M = \mathrm{diag}\{m_1, m_1, m_1 l_1^2/3, m_2, m_2, m_2 l_2^2/3\}$$
$$f = (0, -9.81, 0, 0, -9.81, 0)^T$$


$$g = \begin{pmatrix} u_1 - (l_1/2) \cos\theta_1 \\ v_1 - (l_1/2) \sin\theta_1 \\ u_2 - 2 u_1 - (l_2/2) \cos\theta_2 \\ v_2 - 2 v_1 - (l_2/2) \sin\theta_2 \\ l_1 \sin\theta_1 + l_2 \sin\theta_2 \end{pmatrix}$$

$$G = g_q = \begin{pmatrix} 1 & 0 & (l_1/2) \sin\theta_1 & 0 & 0 & 0 \\ 0 & 1 & -(l_1/2) \cos\theta_1 & 0 & 0 & 0 \\ -2 & 0 & 0 & 1 & 0 & (l_2/2) \sin\theta_2 \\ 0 & -2 & 0 & 0 & 1 & -(l_2/2) \cos\theta_2 \\ 0 & 0 & l_1 \cos\theta_1 & 0 & 0 & l_2 \cos\theta_2 \end{pmatrix} .$$

(a) Following the lines of Example 10.9, derive a more compact formulation of the slider-crank mechanism in relative coordinates, leading to only two ODEs and one constraint. What are the advantages and disadvantages of each formulation?

(b) Set $m_1 = m_2 = 1$, $l_1 = 1$, $\theta_1(0) = 7\pi/4$ and $\theta_1'(0) = 0$. Compute and plot the solution for $b = 70$ and each of the two cases (i) $l_2 = 1.1$ and (ii) $l_2 = 0.9$. Your simulation method should use the formulation in absolute coordinates given above, and combine index reduction and some stabilization with an ODE solver or a lower index DAE solver.

Explain the qualitatively different behavior observed for the different values of $l_2$.

5. This exercise continues the previous one. Set the various parameters at the same values as above, except $l_2 = l_1 = 1$. Then the last row of $G$ vanishes, i.e. a singularity occurs, each time the periodic solution crosses a point where the two bars are upright, i.e., $(\theta_1, \theta_2) = (\pi/2, 3\pi/2)$.

(a) Use the same method you have used in the previous exercise to integrate this problem, despite the singularity. Explain your observed results. [What you obtain may depend on the numerical method you use and the error tolerance you prescribe, so you are on your own: make sure the program is debugged before attempting to explain the results.]

(b) Explain why a stabilization method which stabilizes only with respect to the velocity constraints $G q' = 0$ would do significantly worse here than a method which stabilizes also with respect to the position constraints $g = 0$. [Hint: you should have solved Exercise 10.4(b) before attempting this one.]

6. Consider a semi-explicit index-1 DAE of the form

$$f(t, x, z, x') = 0$$
$$g(t, x, z) = 0,$$
where the matrices $f_{x'}$ and $g_z$ are square and nonsingular.

(a) Show that to specify a solution trajectory the initial value information needed is $x(0) = x_0$.

(b) The initialization problem is to find $x_0' = x'(0)$ and $z_0 = z(0)$. Describe a solution algorithm.

7. Consider the index-1 DAE (10.3) and the two implicit midpoint methods

$$\frac{x_n - x_{n-1}}{h} = f\left(t_{n-1/2}, \frac{x_n + x_{n-1}}{2}, \frac{z_n + z_{n-1}}{2}\right) \qquad (10.30)$$
$$0 = g\left(t_{n-1/2}, \frac{x_n + x_{n-1}}{2}, \frac{z_n + z_{n-1}}{2}\right)$$

and

$$\frac{x_n - x_{n-1}}{h} = f\left(t_{n-1/2}, \frac{x_n + x_{n-1}}{2}, z_{n-1/2}\right) \qquad (10.31)$$
$$0 = g\left(t_{n-1/2}, \frac{x_n + x_{n-1}}{2}, z_{n-1/2}\right) .$$

In the second method $z(t)$ is approximated by a constant $z_{n-1/2}$ on each subinterval $[t_{n-1}, t_n]$, so the resulting approximate solution $z_\pi(t)$ is discontinuous at mesh points $t_n$.

(a) Find an example to show that (10.31) has better stability properties than (10.30). [This may be challenging.]

(b) Design an a posteriori process, i.e. a process that starts after the solution to (10.31) has been calculated, to improve the approximate values of $z$ to be second order accurate at mesh points. Test this on your example.

8. Consider the Hessenberg index-2 DAE (10.2) and the midpoint method (10.31) applied to it.

(a) Show that the global error is 2nd order on a uniform mesh (i.e. using a constant step size) but only 1st order on an arbitrary mesh.

(b) What is the condition on the mesh to achieve 2nd order accuracy?

9. (a) Describe the projected midpoint method based on (10.31) and show that the obtained solution $x_n$ is 2nd order accurate.

(b) Consider the following modification of (10.31),

$$\frac{x_n - x_{n-1}}{h} = f\left(t_{n-1/2}, \frac{x_n + x_{n-1}}{2}, z_{n-1/2}\right) \qquad (10.32)$$
$$0 = g(t_n, x_n)$$
where $g(0, x_0) = 0$ is assumed. Investigate the properties of this method and compare it to the projected midpoint method.

[This exercise is significantly more difficult than the previous two.]

10. (a) Apply the midpoint method, the projected midpoint method and the method given by (10.32) to the problem of Example 10.2 with the same data. Describe your observations.

(b) Attempt to explain your observations. [This may prove difficult; if in distress, see [9].]

11. Consider the Hessenberg index-3 DAE

$$y' = x$$
$$x' = -z + \phi(t)$$
$$0 = y - \psi(t)$$

where $\phi(t), \psi(t)$ are known, smooth functions.

(a) Formulate this DAE as an ODE with invariant.

(b) Discretize the stabilized ODE (9.40) with $F = H^T (H H^T)^{-1}$ using forward Euler. What is the best choice for $\gamma$?

(c) Formulate the Baumgarte stabilization (9.43) for this simple problem and discretize it using forward Euler. Try to figure out a best choice for the parameters $\gamma_1$ and $\gamma_2$ for this case. [This latter task should prove somewhat more difficult [3].]

12. Mechanical systems with holonomic constraints yield index-3 DAEs, as we have seen. Mechanical systems with nonholonomic constraints involve constraints "on the velocity level", such as $Gv = 0$, which cannot be integrated into constraints involving generalized positions $q$ alone. So, mechanical systems with nonholonomic constraints yield index-2 DAEs.

Now, every budding mechanical engineer doing robotics knows that systems with nonholonomic constraints are more complex and difficult than systems with holonomic constraints, whereas every budding numerical analyst knows that index-3 DAEs are harder than index-2 DAEs.

Who is right, the engineers or the numerical analysts? Explain.

13. The coordinate projection method for an ODE with invariant (10.22), using an energy norm based on a symmetric positive definite matrix $A$, is defined as follows:


- At time step $n$ we use an ODE method for (10.22a) to advance from $x_{n-1}$ to $\tilde{x}_n$.

- Then $x_n$ is determined as the solution of the constrained least squares problem

$$\min_{x_n} \tfrac{1}{2} (x_n - \tilde{x}_n)^T A (x_n - \tilde{x}_n) \quad \text{s.t.} \quad h(x_n) = 0 .$$

Consider one Newton step linearizing $h$ to solve this nonlinear system, starting from $x_n^0 = \tilde{x}_n$.

(a) Show that this step coincides with a post-stabilization step (10.23) with

$$F = A^{-1} H^T (H A^{-1} H^T)^{-1} .$$

(b) Assuming that the ODE discretization method has order $p$, show that one Newton step brings the solution to within $O(h^{2(p+1)})$ of satisfying the constraints (10.22b).

(c) For the Euler-Lagrange equations, explain why it may be advantageous to choose $A = M$, where $M$ is the mass matrix.

14. Consider the Hamiltonian system describing the motion of a "stiff reversed pendulum" [12],

$$q' = p$$
$$p' = -(r(q) - r_0)\, \nabla_q r - \varepsilon^{-2} (\theta(q) - \theta_0)\, \nabla_q \theta$$

where $q = (q_1, q_2)^T$, $p = (p_1, p_2)^T$, $r = |q|_2$, $\nabla_q r = r^{-1} q$, $\theta = \arccos(q_1/r)$, $\nabla_q \theta = r^{-2} (-q_2, q_1)^T$ (all functions of $t$ of course). Set $r_0 = 1$, $\theta_0 = \pi/4$, $q(0) = \frac{1}{\sqrt{2}} (1, 1)^T$, $p(0) = (1, 0)^T$. Let also $e_S(t) = \frac{1}{2} [(\nabla_q r^T p)^2 + (r - r_0)^2]$ and $\Delta e_S = \max_{t \in [0,5]} |e_S(0) - e_S(t)|$.

(a) Use an initial value ODE solver to integrate this highly oscillatory IVP for $\varepsilon = 10^{-1}, 10^{-2}$ and $10^{-3}$. (Try also $\varepsilon = 10^{-4}$ if this is not getting too expensive.) Record the values of $\Delta e_S$ and conjecture the value of this quantity in the limit $\varepsilon \to 0$.

(b) Consider the DAE obtained as in Example 9.8,

$$q' = p$$
$$p' = -(r(q) - r_0)\, \nabla_q r - (\nabla_q \theta)\, \lambda$$
$$0 = \theta(q) - \theta_0 .$$

Solve this DAE numerically subject to the same initial conditions as above and calculate $\Delta e_S$. Compare to the conjectured limit from (a). Conclude that this DAE is not the correct limit DAE of the highly oscillatory ODE problem! [80]


15. Many ODE problems arise in practice in the form (10.29) where $M$ is a constant, possibly singular matrix. Let

$$M = U \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} V^T$$

denote the singular value decomposition of $M$, where $U$ and $V$ are orthogonal matrices and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{m-l})$, with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_{m-l} > 0$.

(a) Show that (10.29) can be written in the semi-explicit form (10.3), where

$$\begin{pmatrix} x \\ z \end{pmatrix} = V^T y$$

and $f$ and $g$ are defined in terms of $\tilde{f}$, $U$, $V$ and $\Sigma$.

(b) Show that, since the transformation $V$ is constant and well-conditioned, any Runge-Kutta or multistep discretization applied to (10.29) corresponds to an equivalent method applied to (10.3).

16. [Parts (c) and (d) of this exercise are difficult and time consuming.]

The following equations describe a simple steady state, one-dimensional, unipolar hydrodynamic model for semiconductors in the isentropic case [7],

$$w' = \rho E - \nu J \qquad (10.33a)$$
$$E' = \rho - 1 \qquad (10.33b)$$
$$w = J^2/\rho + \rho \qquad (10.33c)$$
$$\rho(0) = \rho(b) = \bar{\rho} . \qquad (10.33d)$$

The constants $J, \nu, b$ and $\bar{\rho} > 1$ are given. Although you don't need to know the physical interpretation to solve the exercise, we note that $\rho(t)$ is the electron density and $E(t)$ is the (negative) electric field. The independent variable $t$ is a space variable. This model corresponds to a current-driven $n^+ n n^+$ device, and we concentrate on one $n$-region. The flow is subsonic where $\rho > J$ and supersonic where $\rho < J$. Setting $\bar{\rho} > J$, the question is if transonic flows are possible, i.e., is there a region $(a^*, b^*) \subset (0, b)$ where $\rho < J$?

(a) Show that for the subsonic or supersonic cases, (10.33) is an index-1, boundary value DAE.


16. [Parts (c) and (d) of this exercise are difficult and time consuming.]
    The following equations describe a simple steady state, one-dimensional, unipolar hydrodynamic model for semiconductors in the isentropic case [7],
    $$\phi' = \rho E - \nu J \qquad (10.33a)$$
    $$E' = \rho - 1 \qquad (10.33b)$$
    $$\phi = J^2/\rho + \rho \qquad (10.33c)$$
    $$\rho(0) = \rho(b) = \bar{\rho}. \qquad (10.33d)$$
    The constants $J$, $\nu$, $b$ and $\bar{\rho} > 1$ are given. Although you don't need to know the physical interpretation to solve the exercise, we note that $\rho(t)$ is the electron density and $E(t)$ is the (negative) electric field. The independent variable $t$ is a space variable. This model corresponds to a current-driven $n^+nn^+$ device, and we concentrate on one $n$-region. The flow is subsonic where $\rho > J$ and supersonic where $\rho < J$. Setting $\bar{\rho} > J$, the question is if transonic flows are possible, i.e., is there a region $(a^*, b^*) \subset (0, b)$ where $\rho < J$?
    (a) Show that for the subsonic or supersonic cases, (10.33) is an index-1, boundary value DAE.
    (b) Propose a computational approach for this boundary value DAE in the strictly subsonic or the supersonic case. Solve for the values $J = 1/2$, $\nu = 0$, $\bar{\rho} = 3$, $b = 10.3$. [You may either write your own program or use appropriate software.] (One possible approach is sketched after this exercise.)
    (c) For the transonic case, $\rho(t) - J$ crosses from positive to negative and back to positive (at least once), and the simple DAE model breaks down. The solution may be discontinuous at such crossing points. The (Rankine-Hugoniot) condition for a shock of this sort to occur at $t = t_0$ is that the jump in $\phi$ vanish across such a point. Using phase plane analysis, show that such transonic solutions exist for suitable parameter values!
    [This part of the exercise is suitable only for those who really like the challenge of analysis.]
    (d) For the numerical solution of the transonic case it seems best to abandon the DAE and regularize the problem: replace (10.33a) by
        $$\phi' = \rho E - \nu J - \varepsilon \rho''$$
        (a nontrivial, physically-based choice which allows $\phi$ to be simply evaluated, no longer one of the primary unknowns) and append the boundary condition
        $$\frac{\partial\phi}{\partial\rho}(\bar{\rho}, J)\,\rho'(0) = \bar{\rho}E(0) - \nu J.$$
        Solve the obtained boundary value ODE for $J = 2$, $\nu = 0$, $\bar{\rho} = 3$, $b = 10.3$ and $\varepsilon = 10^{-3}$. Plot the solution, experiment further with different values of $b$, and discuss.
    [Be warned that this may be challenging: expect interior sharp layers where the DAE solution jumps. We suggest to use some good software package and to employ a continuation method (§8.4), starting with $\varepsilon = 1$ and gradually reducing it.]
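For part (b), note that in the strictly subsonic case ($\rho > J$ throughout) the algebraic variable $\phi$ can be eliminated using (10.33c): $\phi' = (1 - J^2/\rho^2)\rho'$, so (10.33) reduces to the explicit two-point BVP $\rho' = (\rho E - \nu J)/(1 - J^2/\rho^2)$, $E' = \rho - 1$, $\rho(0) = \rho(b) = \bar{\rho}$. Below is a minimal Python sketch of this approach, assuming SciPy's collocation-based BVP solver; the mesh, tolerance and constant initial guess are our choices, not prescribed by the exercise.

```python
# Sketch for Exercise 16(b), strictly subsonic case: solve the reduced
# two-point BVP for (rho, E) after eliminating phi via (10.33c).
import numpy as np
from scipy.integrate import solve_bvp

J, nu, rho_bar, b = 0.5, 0.0, 3.0, 10.3

def rhs(t, w):
    rho, E = w
    dphi_drho = 1.0 - (J / rho)**2           # d phi / d rho, phi = J^2/rho + rho
    return np.vstack([(rho * E - nu * J) / dphi_drho, rho - 1.0])

def bc(wa, wb):
    return np.array([wa[0] - rho_bar, wb[0] - rho_bar])   # rho(0) = rho(b) = rho_bar

t = np.linspace(0.0, b, 101)
w_guess = np.vstack([np.full_like(t, rho_bar), np.zeros_like(t)])  # crude guess
sol = solve_bvp(rhs, bc, t, w_guess, tol=1e-8)
print(sol.message, "   min rho =", sol.y[0].min())   # check the flow stays subsonic
```

The printed minimum of $\rho$ verifies a posteriori that the computed solution indeed stays in the subsonic regime, so the elimination of $\phi$ was legitimate.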




Bibliography

[1] V.I. Arnold. Mathematical Methods of Classical Mechanics. Springer-Verlag, 1978.
[2] U. Ascher. Stabilization of invariants of discretized differential systems. Numerical Algorithms, 14:1–24, 1997.
[3] U. Ascher, H. Chin, L. Petzold, and S. Reich. Stabilization of constrained mechanical systems with DAEs and invariant manifolds. J. Mech. Struct. Machines, 23:135–158, 1995.
[4] U. Ascher, H. Chin, and S. Reich. Stabilization of DAEs and invariant manifolds. Numer. Math., 67:131–149, 1994.
[5] U. Ascher, J. Christiansen, and R. Russell. Collocation software for boundary value ODE's. ACM Trans. Math. Software, 7:209–222, 1981.
[6] U. Ascher and P. Lin. Sequential regularization methods for nonlinear higher index DAEs. SIAM J. Scient. Comput., 18:160–181, 1997.
[7] U. Ascher, P. Markowich, P. Pietra, and C. Schmeiser. A phase plane analysis of transonic solutions for the hydrodynamic semiconductor model. Mathematical Models and Methods in Applied Sciences, 1:347–376, 1991.
[8] U. Ascher, R. Mattheij, and R. Russell. Numerical Solution of Boundary Value Problems for Ordinary Differential Equations. SIAM, second edition, 1995.
[9] U. Ascher and L. Petzold. Projected implicit Runge-Kutta methods for differential-algebraic equations. SIAM J. Numer. Anal., 28:1097–1120, 1991.
[10] U. Ascher, S. Ruuth, and B. Wetton. Implicit-explicit methods for time-dependent PDE's. SIAM J. Numer. Anal., 32:797–823, 1995.
[11] U. Ascher and R. Spiteri. Collocation software for boundary value differential-algebraic equations. SIAM J. Scient. Comp., 15:938–952, 1994.
[12] U.M. Ascher and S. Reich. The midpoint scheme and variants for Hamiltonian systems: advantages and pitfalls. 1997. Manuscript.
[13] G. Bader and U. Ascher. A new basis implementation for a mixed order boundary value ODE solver. SIAM J. Scient. Comput., 8:483–500, 1987.
[14] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the Solution of Linear Systems. SIAM, 1996.
[15] J. Baumgarte. Stabilization of constraints and integrals of motion in dynamical systems. Comp. Methods Appl. Mech., 1:1–16, 1972.
[16] C. Bischof, A. Carle, G. Corliss, A. Griewank, and P. Hovland. ADIFOR: generating derivative codes from Fortran programs. Scientific Programming, 1:11–29, 1992.
[17] G. Bock. Recent advances in parameter identification techniques for ODE. In P. Deuflhard and E. Hairer, editors, Numerical Treatment of Inverse Problems, Boston, 1983. Birkhäuser.
[18] R.W. Brankin, I. Gladwell, and L.F. Shampine. RKSUITE: a suite of Runge-Kutta codes for the initial value problem for ODEs. Report 92-S1, Dept. Mathematics, SMU, Dallas, Texas, 1992.
[19] K. Brenan, S. Campbell, and L. Petzold. Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations. SIAM, second edition, 1996.
[20] F. Brezzi and M. Fortin. Mixed and Hybrid Finite Element Methods. Springer-Verlag, New York, 1991.
[21] P. N. Brown, G. D. Byrne, and A. C. Hindmarsh. VODE, a variable-coefficient ODE solver. SIAM J. Sci. Stat. Comput., 10:1038–1051, 1989.
[22] A. Bryson and Y.C. Ho. Applied Optimal Control. Ginn and Co., Waltham, MA, 1969.
[23] K. Burrage. Parallel and Sequential Methods for Ordinary Differential Equations. Oxford University Press, 1995.
[24] K. Burrage and J.C. Butcher. Stability criteria for implicit Runge-Kutta methods. SIAM J. Numer. Anal., 16:46–57, 1979.
[25] K. Burrage, J.C. Butcher, and F. Chipman. An implementation of singly-implicit Runge-Kutta methods. BIT, 20:452–465, 1980.
[26] J. C. Butcher. The Numerical Analysis of Ordinary Differential Equations. Wiley, 1987.
[27] M.P. Calvo, A. Iserles, and A. Zanna. Numerical solution of isospectral flows. Technical report, DAMTP, Cambridge, 1995.
[28] S. L. Campbell. Least squares completions of nonlinear differential-algebraic equations. Numer. Math., 65:77–94, 1993.
[29] S. L. Campbell. Numerical methods for unstructured higher-index DAEs. Annals of Numer. Math., 1:265–278, 1994.
[30] J.R. Cash and M.H. Wright. User's guide for TWPBVP: a code for solving two-point boundary value problems. Technical report, online in Netlib, 1996.
[31] H. Chin. Stabilization methods for simulations of constrained multibody dynamics. PhD thesis, Institute of Applied Mathematics, University of British Columbia, 1995.
[32] W.A. Coppel. Dichotomies in Stability Theory. Springer-Verlag, 1978. Lecture Notes in Math. Vol. 629.
[33] G. Dahlquist. A special stability problem for linear multistep methods. BIT, 3:27–43, 1963.
[34] G. Dahlquist. Error analysis for a class of methods for stiff nonlinear initial value problems. In Numerical Analysis, Dundee, pages 50–74. Springer, 1975.
[35] C. de Boor. Good approximation by splines with variable knots. II. In Springer Lecture Notes in Mathematics, 353, 1973.
[36] C. de Boor and B. Swartz. Collocation at Gaussian points. SIAM J. Numer. Anal., 10:582–606, 1973.
[37] L. Dieci, R.D. Russell, and E.S. Van Vleck. Unitary integrators and applications to continuous orthonormalization techniques. SIAM J. Numer. Anal., 31:261–281, 1994.
[38] E. Doedel and J. Kernevez. Software for continuation problems in ordinary differential equations. SIAM J. Numer. Anal., 25:91–111, 1988.
[39] J.R. Dormand and P.J. Prince. A family of embedded Runge-Kutta formulae. J. Comp. Appl. Math., 6:19–26, 1980.
[40] E. Fehlberg. Low order classical Runge-Kutta formulas with step size control and their application to some heat transfer problems. Computing, 6:61–71, 1970.
[41] L. Fox. The Numerical Solution of Two-Point Boundary Value Problems in Ordinary Differential Equations. Oxford University Press, 1957.
[42] C. W. Gear. The simultaneous numerical solution of differential-algebraic equations. IEEE Trans. Circuit Theory, CT-18:89–95, 1971.
[43] C. W. Gear. Numerical Initial Value Problems in Ordinary Differential Equations. Prentice-Hall, 1973.
[44] C. W. Gear and J. B. Keiper. The analysis of generalized BDF methods applied to Hessenberg form DAEs. SIAM J. Numer. Anal., 28:833–858, 1991.
[45] C.W. Gear, G. Gupta, and B. Leimkuhler. Automatic integration of the Euler-Lagrange equations with constraints. J. Comput. Appl. Math., pages 77–90, 1985.
[46] E. Griepentrog and R. März. Differential-Algebraic Equations and Their Numerical Treatment. Teubner, 1986.
[47] J. Guckenheimer and P. Holmes. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer-Verlag, New York, 1983.
[48] W. Hackbusch. Iterative Solution of Large Sparse Systems of Equations. Springer-Verlag, 1994.
[49] E. Hairer, Ch. Lubich, and M. Roche. The Numerical Solution of Differential-Algebraic Systems by Runge-Kutta Methods, volume 1409. Springer-Verlag, 1989.
[50] E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer-Verlag, second edition, 1993.
[51] E. Hairer and D. Stoffer. Reversible long term integration with variable step sizes. SIAM J. Scient. Comput., 18:257–269, 1997.
[52] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems. Springer-Verlag, 1991.
[53] N.A. Haskell. The dispersion of surface waves in multilayered media. Bull. Seis. Soc. Am., 43:17–34, 1953.
[54] P. Henrici. Discrete Variable Methods in Ordinary Differential Equations. John Wiley, 1962.
[55] M. Hirsch, C. Pugh, and M. Shub. Invariant Manifolds, volume 583. Springer-Verlag, 1976.
[56] T.E. Hull, W.H. Enright, and K.R. Jackson. User's guide for DVERK: a subroutine for solving non-stiff ODEs. Report 100, Dept. Computer Science, U. Toronto, 1975.
[57] A. Jameson. Computational transonics. Comm. Pure Appl. Math., XLI:507–549, 1988.
[58] W. Kampowsky, P. Rentrop, and W. Schmidt. Classification and numerical simulation of electric circuits. Surv. Math. Ind., 2:23–65, 1992.
[59] H. B. Keller. Numerical Solution of Two Point Boundary Value Problems. SIAM, 1976.
[60] B.L.N. Kennett. Seismic Wave Propagation in Stratified Media. Cambridge University Press, 1983.
[61] W. Kutta. Beitrag zur näherungsweisen Integration totaler Differentialgleichungen. Zeitschr. für Math. u. Phys., 46:435–453, 1901.
[62] J. D. Lambert. Numerical Methods for Ordinary Differential Systems. Wiley, 1991.
[63] M. Lentini and V. Pereyra. An adaptive finite difference solver for nonlinear two-point boundary value problems with mild boundary layers. SIAM J. Numer. Anal., 14:91–111, 1977.
[64] Ch. Lubich, U. Nowak, U. Pohle, and Ch. Engstler. MEXX: numerical software for the integration of constrained mechanical multibody systems. Preprint SC 92-12, ZIB Berlin, 1992.
[65] Jerry B. Marion and Stephen T. Thornton. Classical Dynamics of Particles and Systems. Harcourt Brace Jovanovich, third edition, 1988.
[66] R. März. Numerical methods for differential-algebraic equations. Acta Numerica, 1:141–198, 1992.
[67] R.M.M. Mattheij and J. Molnaar. Ordinary Differential Equations in Theory and Practice. Wiley, Chichester, 1996.
[68] R.M.M. Mattheij and G.W.M. Staarink. Implementing multiple shooting for nonlinear BVPs. RANA 87-14, EUT, 1987.
[69] C.C. Pantelides. The consistent initialization of differential-algebraic systems. SIAM J. Scient. Comput., 9:213–231, 1988.
[70] V. Pereyra and G. Sewell. Mesh selection for discrete solution of boundary value problems in ordinary differential equations. Numer. Math., 23:261–268, 1975.
[71] L.R. Petzold, L.O. Jay, and J. Yen. Numerical solution of highly oscillatory ordinary differential equations. Acta Numerica, pages 437–484, 1997.
[72] F. Potra and W. Rheinboldt. On the numerical solution of the Euler-Lagrange equations. Mech. Structures Mach., 1, 1991.
[73] A. Prothero and A. Robinson. On the stability and accuracy of one-step methods for solving stiff systems of ordinary differential equations. Math. Comp., 28:145–162, 1974.
[74] P. Rabier and W. Rheinboldt. On the computation of impasse points of quasilinear differential algebraic equations. Math. Comp., 62:133–154, 1994.
[75] P. J. Rabier and W. C. Rheinboldt. A general existence and uniqueness theorem for implicit differential algebraic equations. Diff. Int. Eqns., 4:563–582, 1991.
[76] P. J. Rabier and W. C. Rheinboldt. A geometric treatment of implicit differential-algebraic equations. J. Diff. Eqns., 109:110–146, 1994.
[77] M. Rao. Ordinary Differential Equations Theory and Applications. Edward Arnold, 1980.
[78] S. Reddy and N. Trefethen. Stability of the method of lines. Numer. Math., 62:235–267, 1992.
[79] W.C. Rheinboldt. Differential-algebraic systems as differential equations on manifolds. Math. Comp., 43:473–482, 1984.
[80] H. Rubin and P. Ungar. Motion under a strong constraining force. Comm. Pure Appl. Math., 10:65–87, 1957.
[81] C. Runge. Ueber die numerische Auflösung von Differentialgleichungen. Math. Ann., 46:167–178, 1895.
[82] J.M. Sanz-Serna and M.P. Calvo. Numerical Hamiltonian Problems. Chapman and Hall, 1994.
[83] T. Schlick, M. Mandziuk, R.D. Skeel, and K. Srinivas. Nonlinear resonance artifacts in molecular dynamics simulations. 1997. Manuscript.
[84] M.R. Scott and H.A. Watts. Computational solution of linear two-point boundary value problems. SIAM J. Numer. Anal., 14:40–70, 1977.
[85] L. F. Shampine. Numerical Solution of Ordinary Differential Equations. Chapman & Hall, 1994.
[86] L. F. Shampine and M. K. Gordon. Computer Solution of Ordinary Differential Equations. W. H. Freeman and Co., 1975.
[87] L.F. Shampine and H.A. Watts. The art of writing a Runge-Kutta code, part I. In J.R. Rice, editor, Mathematical Software III, pages 257–275. Academic Press, 1977.
[88] I. Stakgold. Green's Functions and Boundary Value Problems. Wiley, 1979.
[89] H. Stetter. Analysis of Discretization Methods for Ordinary Differential Equations. Springer, 1973.
[90] G. Strang and G. Fix. An Analysis of the Finite Element Method. Prentice-Hall, Englewood Cliffs, NJ, 1973.
[91] J.C. Strikwerda. Finite Difference Schemes and Partial Differential Equations. Wadsworth & Brooks/Cole, 1989.
[92] S.H. Strogatz. Nonlinear Dynamics and Chaos. Addison-Wesley, Reading, MA, 1994.
[93] A.M. Stuart and A.R. Humphries. Dynamical Systems and Numerical Analysis. Cambridge University Press, Cambridge, England, 1996.
[94] J.H. Verner. Explicit Runge-Kutta methods with estimates of the local truncation error. SIAM J. Numer. Anal., 15:772–790, 1978.
[95] R.A. Wehage and E.J. Haug. Generalized coordinate partitioning for dimension reduction in analysis of constrained dynamic systems. J. of Mechanical Design, 104:247–255, 1982.
[96] R. Weiss. The convergence of shooting methods. BIT, 13:470–475, 1973.
[97] S.J. Wright. Stable parallel algorithms for two-point boundary value problems. SIAM J. Scient. Comput., 13:742–764, 1992.
Index

Imsl, 66
Mathematica, 65
Matlab, 65, 99, 113
Nag, 66, 187
Netlib, 66, 112, 187
Absolute stability, 42–47
  implicit Runge-Kutta methods, 104
  plotting the region of, 89, 144
  region of, 43
    explicit Runge-Kutta methods, 89–91
    multistep methods, 143–145
Accuracy, order of, 38
Adams methods, 128–131
  0-stability, 143
  absolute stability, 144
  Adams-Bashforth (explicit) method, 129
  Adams-Moulton (implicit) method, 129
Algebraic variables (DAE), 234
Almost block diagonal, 208
Artificial diffusion, 228
Asymptotic stability, see Stability, asymptotic
Automatic differentiation, 65, 292
Autonomous, 3, 33, 83
B-convergence, 110
Backward differentiation formulae, see BDF
Backward Euler method, 47–56, 130
  DAE, 265–268
  region of absolute stability, 50
  solution of nonlinear system, 50
BDF methods, 131–132
  0-stability, 143
  DAE, 268–270
Bifurcation diagram, 212
Boundary conditions
  Dirichlet, 227
  non-separate, 174
  periodic, 163, 174, 192, 223
  separated, 163, 204
  two-point, 163
Boundary layer, see Layer
Boundary value problems (BVP), 8
  continuation, 211
  damped Newton method, 210
  decoupling, 186, 220
  deferred correction, 208
  error estimation, 213
  extrapolation, 208
  finite difference methods, 193–230
    0-stability, 201
    collocation, 206
    consistency, 201
    convergence, 201
    solving the linear equations, 204
    stiff problems, 215
  for PDEs, 206
  infinite interval, 188
  mesh selection, 213
  midpoint method, 194
  multiple shooting method, 183–186
  Newton's method, 197
  reduced superposition, 186
  Riccati method, 186
  simple shooting method, 177–182
  software, 223
  stabilized march method, 186
  superposition, 186
  trapezoid method, 223
BVP codes
  auto, 223
  colnew, 223
  colsys, 223
  mus, 187
  pasvar, 223
  suport, 187
  twpbvp, 223
Chaos, 158
Characteristic polynomial, 26
Chemical reaction
  BVP example, 178, 213, 224
Collocation methods
  basic idea, 102
  for BVPs, 206
  Gauss formulae, 101
  Lobatto formulae, 102
  order for DAEs, 272
  order of, 103
  projected, for DAE, 281
  Radau formulae, 101
  relation to implicit Runge-Kutta, 102
Compact finite difference methods, 226
Compactification
  multiple shooting method, 191
Condition number
  eigenvalue matrix, 46
  iteration matrix (DAE), 278
  orthogonal matrix, 56
Conservative system, 29
Consistency, 38
  BVPs, finite difference methods, 201
  multistep methods, 137
Constraint manifold, 240
Constraints (DAE), hidden, 234
Continuation methods, 90, 211–213
  arclength, 213
Continuous extension, 110
Contraction mapping, 51
Convection-diffusion equation (PDE), 161
Convergence, 38
  BDF methods for DAEs, 268
  BVPs, finite difference methods, 201
  calculated rate, 79
  multistep methods, 134
  of order p, 39
  Runge-Kutta methods, 83
Coordinate partitioning (DAE), 254
Corrector formula, 147
Crank-Nicolson method for PDEs, 69
DAE codes
  coldae, 293
  daspk, 293
  dassl, 293
  mexx, 293
  radau5, 293
Damped Newton method, 210
Decoupling, 172, 186, 222, 267
  BVP, 220–221
Decoupling methods (BVP), 172
Deferred correction method, 208
Degrees of freedom (DAE), 233
Delay differential equation, 111, 189
Dense output, 110
Diagonally implicit Runge-Kutta methods (DIRK), 106
Dichotomy, 170, 220
  exponential, 170
Difference equations, 137
Difference operator, 38
Differential variables (DAE), 234
Differential-algebraic equations (DAE), 10, 231
  index reduction and stabilization, 247
  algebraic variables, 234
  BDF methods, 268
  consistent initial conditions, 233, 276
  constraint stabilization, 253
  convergence of BDF methods, 268
  coordinate partitioning, 254
  differential geometric approach, 256
  differential variables, 234
  direct discretization methods, 264
  existence and uniqueness, 256
  fully-implicit index-1, 263
  Hessenberg form, 238, 257
  Hessenberg index-2, 239
  Hessenberg index-3, 240
  hidden constraints, 234
  higher-index, 234
  index reduction, unstabilized, 249
  index, definition, 235
  least squares methods, 291
  multistep methods, 269
  numerical methods, 263
  ODE with constraints, 10
  reformulation of higher-index DAEs, 248
  regularization, 264
  semi-explicit, 10, 234
  semi-explicit index-1, 238
  simple subsystems, 232
  singular, 257, 264
  stabilization of the constraint, 251
  stabilized index-2 formulation, 260
  state space formulation, 253
  underlying ODE, 246, 254
Differentiation
  automatic, 75
  symbolic, 75
Discontinuity
  discretization across, 61
  location of, 63
Dissipativity, 111
Divergence, 17, 30
Divided differences, 126
Drift off the constraint (DAE), 251
Dry friction, 63
Dynamical system, 15
  discrete, 112
Eigenvalue, 20
Eigenvector, 20
Error
  constant (multistep methods), 136
  equidistribution, 215, 222
  global, 39
  local, 41
  local truncation, 38
  tolerance
    absolute and relative, 91
Error estimation
  BVPs, 213
  embedded Runge-Kutta methods, 92
  global error, 95
  index-2 DAE, 279
  multistep methods, 152
  Runge-Kutta methods, 91
  step doubling, 94
Euler method
  backward (implicit), 35
  forward (explicit), 35
  symplectic, 116
  written as Runge-Kutta, 81
Event location, 63, 111
Explicit
  method, 37
  ODE, 9
Extraneous roots, 139
Extrapolation, 110, 208
Finite element method, 222
Fully-implicit index-1 DAEs, 263
Functional iteration, 50
  multistep methods, 146
Fundamental solution, 26, 166
  in shooting method, 178
Fundamental theorem, 6
  difference methods, 39
Gauss collocation, 101, 104, 115, 119, 121, 207, 210, 213, 220, 223, 293
Gaussian
  points, 76, 104, 120
  quadrature, 76
Global error, 39
  estimates of, 95, 213
Gradient, 17, 29
Green's function, 168, 185
Half-explicit Runge-Kutta methods (DAE), 281
Hamiltonian, 29
Hamiltonian systems, 29, 111, 116, 123
  invariants, 250
  preservation of the invariant, 255
Hermite interpolation, 111
Hessenberg form (DAE), 238, 257
Higher index DAEs, 234
Homotopy path, 211
Implicit
  method, 49
  ODE, 10
  Runge-Kutta methods, 101–109
    implementation, 105, 109
Implicit Euler method, see Backward Euler method
Implicit-explicit (IMEX) methods, 161
Incompressible Navier-Stokes equations, 239
Index, 232–247
  definition, 235
  differential, 257
  perturbation, 257
  reduction
    stabilized index-2 formulation, 260
    unstabilized, 249
Initial conditions, consistent (DAE), 233
Initial layer, see Layer
Initial value problem (IVP), 3
Instability
  DAE, drift off the constraint, 251
Interpolating polynomial and divided differences
  review, 126
Invariant
  integral, 120, 251
  ODE with, 120
Invariant set, 15, 32, 247, 249, 253, 282
Isolated solution (BVP), 165
Isolated solution (IVP), 159
Isospectral flow, 121
Iteration matrix, 53
Jacobian matrix, 7, 17
  difference approximation, 54
Kepler problem, modified, 123
Kronecker product
  review, 105
Krylov space methods, 156
Lagrange multiplier, 10
  DAEs and constrained optimization, 240
Layer
  boundary, 195, 207, 215, 218–220, 224
  initial, 47, 58, 60, 229
Leapfrog (Verlet) method, 116, 256
Limit cycle, 5, 67
Limit set, 15
Linearization, local, 28
Lipschitz
  constant, 7
  continuity, 6, 40
Lobatto collocation, 102, 104, 121, 207
Local elimination, 208
Local error, 41
  control of, in Runge-Kutta methods, 91
  estimation by step doubling, 94
  relationship to local truncation error, 41
Local extrapolation, 94
Local truncation error, 38, 64
  BVPs, finite difference methods, 201
  estimation of (multistep methods), 153
  multistep methods, 134
  principal term (multistep methods), 154
  relation to local error, 64
Long time integration, 111
Lyapunov function, 32
Matrix
  banded, 56
  sparse, 56
Matrix decompositions
  LU, 55
  QR, 56
  review, 54
Matrix eigenvalues
  review, 19
Matrix exponential
  review, 24
Mechanical systems, 11, 240
  generalized coordinate partitioning method, 257
  reformulation of higher-index DAEs, 248
Mesh, 35
  locally almost uniform, 229
Mesh function, 38
Mesh Reynolds number, 227
Mesh selection (BVP), 213
Method of lines, 6, 12, 64, 161, 212, 280
  heat equation stability restriction, 69
  transverse, 13
Midpoint method, 68, 194
  dynamic equivalence to trapezoid method, 68
  explicit, 78
  explicit, written as Runge-Kutta, 82
  staggered, 225
Milne's estimate
  local truncation error (predictor-corrector methods), 153
Milne's method (multistep method), 141
Mode
  solution, 27, 167
Model reduction, 99
Molecular dynamics, 116
Moving mesh method (PDEs), 255
Multiple shooting method, 183
  compactification, 191
  matrix, 185, 202
  on parallel processors, 185
  patching conditions, 183
Multiple time scales, 47
Multirate method, 112
Multistep codes
  daspk, 156
  dassl, 156
  difsub, 156
  ode, 156
  vode, 156
  vodpk, 156
Multistep methods, 125
  absolute stability, 143
  Adams methods, 128
  BDF, 131
  characteristic polynomials, 137
  consistency, 137
  DAE, 269
  error constant, 136
  implementation, 146
  initial values, 132
  local truncation error, 134
  order of accuracy, 134
  predictor-corrector, 146
  software design, 149
  variable step-size formulae, 150
Newton iteration
  backward Euler method, 51
  DAE, 268
  difference approximation, 54
  implicit Runge-Kutta methods, 105
  in shooting method, 178
Newton's method
  damped, 210
  modified, 148
  quasi-Newton, 180
  review, 53
Newton-Kantorovich Theorem, 190
Nonautonomous ODE
  transformation to autonomous form, 83
ODE
  explicit, 9
  implicit, 10, 72
  linear constant-coefficient system, 22
  on a manifold, 248
  with constraints, 234
  with invariant, 247, 250
Off-step points (multistep methods), 155
One-step methods, 73
Optimal control, 13
  adjoint variables, 14
  Hamiltonian function, 14
Order notation
  review, 36
Order of accuracy
  multistep methods, 134
  Runge-Kutta methods, 83, 86, 88, 103
  Runge-Kutta methods for DAEs, 291
Order reduction, 108–109
  DIRK, for DAEs, 273
  in BVPs, 220
  Runge-Kutta methods (DAE), 271
Order selection (multistep methods), 154
Order stars, 110
Oscillator, harmonic, 30, 63
Oscillatory system, 65, 242, 255
Parallel method
  Runge-Kutta, 112
Parallel shooting method, 185
Parameter condensation, 208
Parameter estimation, 15
Parasitic roots, 139
Partial differential equation (PDE), 12, 206, 223
Path following, 212
Pendulum, stiff spring, 242
Perturbations
  initial data, 21
  inhomogeneity, 27
Preconditioning, 156
Predator-prey model, 4
Predictor polynomial, 152
Predictor-corrector methods, 146
Principal error function, 95
Principal root, 139
Projected collocation methods, 281
Projected Runge-Kutta methods, 281
Projection matrix, orthogonal, 170
Quadrature rules
  review, 75
Quasilinearization, 197–200
  with midpoint method for BVPs, 200
Radau collocation, 101, 104, 119, 274, 292
Reduced solution, 57, 243
Reduced superposition, 186
Reformulation, boundary value problems, 172
Regularization (DAE), 264
Review
  Basic quadrature rules, 75
  Kronecker product, 105
  Matrix decompositions, 54
  Matrix eigenvalues, 19
  Matrix exponential, 24
  Newton's method, 52
  Order notation, 36
  Taylor's theorem for a function of several variables, 73
  The interpolating polynomial and divided differences, 126
Riccati method, 172, 186
Root condition, 140
Rough problems, 61
Runge-Kutta codes
  dopri5, 112
  dverk, 112
  ode45 (Matlab), 112
  radau5, 113
  rkf45, 112
  rksuite, 112
  stride, 113
Runge-Kutta methods
  absolute stability, 89, 104
  Butcher tree theory, 110
  DAE, 270–282
  diagonally implicit (DIRK), 106
  Dormand & Prince 4(5) embedded pair, 94
  embedded methods, 92
  explicit, 81
  Fehlberg 4(5) embedded pair, 93
  fourth order classical, 79, 82, 86
  general formulation, 81
  half-explicit, for DAEs, 281
  historical development, 110
  implicit, 101
  low order, 76
  mono-implicit, 223
  order barriers, 110
  order of accuracy by Butcher trees, 84
  order results for DAEs, 291
  projected, for DAEs, 281
  singly diagonally implicit (SDIRK), 107
  singly implicit (SIRK), 109
Semi-explicit index-1 DAE, 238
Sensitivity
  analysis, 96, 292
  boundary value problems, 176
  parameters, 96
Shooting method, 177
  algorithm description, 179
  difficulties, 180
  difficulties for nonlinear problems, 182
  multiple shooting method, 183
  simple shooting, 177
  single shooting, 177
  stability considerations, 180
Similarity transformation, 20, 24
Simple pendulum, 3, 10, 120, 241
Singly diagonally implicit (SDIRK) Runge-Kutta methods, 107
Singly implicit Runge-Kutta methods (SIRK), 109
Singular perturbation problems
  relation to DAEs, 243
Smoothing, 231
Sparse linear system, 199, 204
Spectral methods (PDEs), 161
Spurious solution, 112
Stability
  0-stability, 39, 42, 83, 139–143, 201–203
  A-stability, 56, 104, 144
  absolute stability, 42
  algebraic stability, 119
  AN-stability, 68
  asymptotic
    difference equations, 139
  asymptotic, of the constraint manifold, 251
  boundary value ODE, 168–171
  difference equations, 138
  initial value DAE, 245–247
  initial value ODE, 19–33
    asymptotic, 21, 25, 27
    nonlinear, 28
  relative stability, 143
  resonance instability, 117
  root condition (multistep methods), 140
  scaled stability region, 114
  strong stability (multistep methods), 141
  weak stability (multistep methods), 141
Stability constant
  boundary value problem, 169, 185, 203
  initial value problem, 27
Stabilization
  Baumgarte, 253, 260
  coordinate projection (DAE), 283, 286
  of the constraint (DAE), 251
  post-stabilization (DAE), 283
Stabilized index-2 formulation (DAE), 260
Stabilized march method, 186
Stage order, 273
State space formulation (DAE), 253
Steady state, 28, 212
Step size, 35
Step size selection
  multistep methods, 154
  Runge-Kutta methods, 91
Stiff boundary value problems
  finite difference methods, 215
Stiff decay, 57, 65, 102, 104, 114, 131
  DAE, 272
  ODE methods for DAEs, 264
Stiffly accurate, 102, 114
Stiffness, 47
  boundary value problems, 171
  definition, 48
  system eigenvalues, 48
  transient, 47
Strange attractor, 158
Superposition method, 186
Switching function, 63
Symmetric methods, 58–61
Symmetric Runge-Kutta methods, 119
Symplectic map, 30
Symplectic methods, 111
Taylor series method, 74
Taylor's theorem, several variables
  review, 73
Test equation, 21, 42
Theta method, 115
Transformation
  decoupling (BVP), 217
  decoupling (DAE), 267
Transformation, stretching, 70
Trapezoid method, 35, 130
  derivation, 58
  dynamic equivalence to midpoint method, 68
  explicit, 78
  explicit, written as Runge-Kutta, 82
Upstream difference, 216, 227
Upwind difference, 216, 227
Variable step size multistep methods
  fixed leading-coefficient strategy, 152
  variable-coefficient strategy, 150
Variational
  boundary value problem, 165
  equation, 28, 178
Vibrating spring, 9, 25
Waveform relaxation, 112
Well-posed problem, 7
  continuous dependence on the data, 7
  existence, 7
  uniqueness, 7