
MATHEMATICAL AND EXPERIMENTAL

MODELING OF PHYSICAL AND

BIOLOGICAL PROCESSES

TEXTBOOKS in MATHEMATICS

Series Editor: Denny Gulick

PUBLISHED TITLES

COMPLEX VARIABLES: A PHYSICAL APPROACH WITH APPLICATIONS AND MATLAB®

Steven G. Krantz

INTRODUCTION TO ABSTRACT ALGEBRA

Jonathan D. H. Smith

LINEAR ALGEBRA: A FIRST COURSE WITH APPLICATIONS

Larry E. Knop

MATHEMATICAL AND EXPERIMENTAL MODELING OF PHYSICAL AND BIOLOGICAL PROCESSES

H. T. Banks and H. T. Tran

FORTHCOMING TITLES

ENCOUNTERS WITH CHAOS AND FRACTALS

Denny Gulick

MATHEMATICAL AND EXPERIMENTAL

MODELING OF PHYSICAL AND

BIOLOGICAL PROCESSES

H. T. Banks
H. T. Tran

TEXTBOOKS in MATHEMATICS

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2009 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 20130920

International Standard Book Number-13: 978-1-4200-7338-6 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

and the CRC Press Web site at http://www.crcpress.com

Preface

For the past several years, the authors have developed and taught a two-semester modeling course sequence based on fundamental physical and biological processes: heat flow, wave propagation, fluid and structural dynamics, structured population dynamics, and electromagnetism. Among the specific topics covered in the courses were thermal imaging and detection, dynamic properties (stiffness, damping) of structures such as beams and plates, acoustics and fluid transport, size-structured population dynamics, and electromagnetic dispersion and optics.

One of the major difficulties (theoretical, computational, and technological) in mathematical model development is the process of comparing models to field data. Typically, mathematical models contain parameters and coefficients that are not directly measurable in experiments. Hence, experiments must be carefully designed to provide sufficient data for model parameters and/or coefficients to be determined accurately. In this context, a major innovative component of the course has been the exposure of students to specific laboratory experiments, data collection, and analysis. As usual in such modeling courses, the pedagogy involves beginning with first principles in a physical, chemical, or biological process and deriving quantitative models (partial differential equations with initial conditions, boundary conditions, etc.) in the context of a specific application. Each application has come from a “client discipline” (an academic, government laboratory, or industrial research group), for example thermal nondestructive damage detection in structures, active noise suppression in acoustic chambers, vibration suppression in smart material (piezoceramic sensing and actuation) structures, or optimizing the introduction of mosquitofish into rice fields for the control of mosquitoes. The students then use the models (with appropriate computational software, some from MATLAB and some from routines developed by the instructors specifically for the course) to carry out simulations and analyze experimental data. The students are exposed to experimental design and data collection through laboratory demos in certain experiments and through actual hands-on experience in others.

Our experience with this approach to teaching advanced mathematics with a strong laboratory component has been, not surprisingly, overwhelmingly positive. It is one thing to hear lectures on natural modes and frequencies (eigenfunctions and eigenvalues) or even to compute them, but quite another to go to the laboratory, excite the structure, see the modes, and take data to verify your theoretical and computational models.
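The correspondence between natural frequencies and eigenvalues can be checked in the simplest setting, the spring-mass system m y'' + c y' + k y = 0 (with c = 0 this reduces to simple harmonic motion). The short Python sketch below is our own illustration and not course software. It rewrites the equation as a first-order system x' = Ax with x = (y, y') and computes the eigenvalues of the 2 × 2 matrix A from its trace and determinant. The function name spring_mass_eigenvalues and the values chosen for m, c, and k are assumptions for illustration only.

```python
import cmath
import math


def spring_mass_eigenvalues(m, c, k):
    """Eigenvalues of A = [[0, 1], [-k/m, -c/m]] for the state x = (y, y').

    Hypothetical helper: uses lambda = tr/2 +/- sqrt(tr^2/4 - det) for a 2x2 matrix.
    """
    trace = -c / m
    det = k / m
    disc = cmath.sqrt(trace * trace / 4 - det)  # complex square root handles c^2 < 4km
    return trace / 2 + disc, trace / 2 - disc


# Undamped case (illustrative values): eigenvalues are +/- i*omega, omega = sqrt(k/m).
lam_plus, _ = spring_mass_eigenvalues(m=1.0, c=0.0, k=4.0)
print("natural frequency:", lam_plus.imag, "expected:", math.sqrt(4.0 / 1.0))

# Underdamped case (illustrative values, c^2 < 4km): eigenvalues are
# -c/(2m) +/- i*nu with nu = sqrt(4km - c^2) / (2m).
m, c, k = 1.0, 0.4, 4.0
lam_plus, _ = spring_mass_eigenvalues(m, c, k)
nu = math.sqrt(4 * k * m - c * c) / (2 * m)
print("decay rate:", -lam_plus.real, "frequency:", lam_plus.imag, "nu:", nu)
```

In the underdamped case (c² < 4km), the real part −c/(2m) of the eigenvalues gives the decay rate of the envelope Re^(−ct/2m). The imaginary part ν gives the frequency of the damped oscillation y(t) = Re^(−ct/2m) cos(νt − δ).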

Indeed, in writing this book, which is based on these experimentally oriented modeling courses, the authors aim to provide the reader with a fundamental understanding of how mathematics is applied to problems in science and engineering. Our approach will be through several “case study” problems that arise in industrial and scientific research laboratory applications. For each case study problem, we will discuss why a model is needed and what goals are sought. The modeling process begins with the examination of assumptions and their translation into mathematical models. An important component of the book is the design of appropriate experiments that are used to validate the mathematical models. In this regard, both the hardware and software tools used to design the experiments will be described in sufficient detail that the experiments can be duplicated by the interested reader. Several projects, developed by the authors in their own teaching of the above-mentioned modeling courses, will also be included.

The book is aimed at advanced undergraduate and/or first-year graduate students. The emphasis of the book is on the application as well as on what mathematics can tell us about it. The book should serve both to give the student an appreciation of the use of mathematics and to spark student interest in deeper study of some of the mathematical and/or applied topics involved.

The completion of this text involved considerable assistance from others. Foremost, we would like to express our gratitude to the many students, postdoctoral fellows, and colleagues (university and industrial/government laboratory-based scientists) over the past decades who generously contributed to the numerous research efforts on which our modules/projects are based. Specifically, we wish to thank Sarah Grove, Nathan Gibson, Scott Beeler, Brian Lewis, Cammey Cole, John David, Adam Attarian, Amanda Criner, Jimena Davis, Stacey Ernstberger, Sava Dediu, Clay Thompson, Zackary Kenz, Shuhua Hu, and Nate Wanner among our many young colleagues for their assistance in reading various drafts or portions of this book. (Of course, any remaining errors, poor explanations, etc., are solely the responsibility of the authors.) Finally, the authors wish to acknowledge the unwavering support of our families in our efforts in the development and completion of this manuscript as well as in other aspects of our professional activities. For their support, patience, and love, this book is dedicated to Susie, John, Jennifer, Thu, Huy, and Hoang.

H. T. Banks
H. T. Tran

List of Tables

3.1 Estimation using the OLS procedure with CV data for η = 5.
3.2 Estimation using the GLS procedure with CV data for η = 5.
3.3 Estimation using the OLS procedure with NCV data for η = 5.
3.4 Estimation using the GLS procedure with NCV data for η = 5.
3.5 $\chi^2(1)$ values.

5.1 Range of values of h in Newton cooling.
5.2 Type T thermocouples: Coefficients of the approximate inverse function giving temperature u as a function of the thermoelectric voltage E in the specified temperature and voltage ranges. The function is of the form $u = c_0 + c_1E + c_2E^2 + \cdots + c_6E^6$, where E is in microvolts and u is in degrees Celsius.
5.3 Type T thermocouples: Coefficients of the approximate function giving the thermoelectric voltage E as a function of temperature u in the specified temperature range. The function is of the form $E = c_0 + c_1u + c_2u^2 + \cdots + c_8u^8$, where E is in microvolts and u is in degrees Celsius.
5.4 Hardware equipment for thermal equipment.
5.5 Software tools for thermal equipment.

6.1 Values of E and G for various materials.
6.2 Hardware equipment for beam vibration experiment.
6.3 Software tools for beam vibration experiment.

7.1 Beam and patch parameters.

8.1 Viscosity values of some gases and liquids at atmospheric pressure.

9.1 Percent of total catch of selachians.

List of Figures

1.1 Schematic diagram of the iterative modeling process.

2.1 Spring-mass system (with the mass in equilibrium position).
2.2 Graph of the simple harmonic motion, $y(t) = R\cos(\omega t - \phi)$.
2.3 Spring-mass system (with “massless” paddles attached to the body).
2.4 Plot of $y(t) = Re^{-ct/(2m)}\cos(\nu t - \delta)$.

3.1 Plot of the pdf p(x) of a uniform distribution.
3.2 The pdf graph of a Gaussian distributed random variable.
3.3 The pdf graph of a chi-square distribution for various degrees of freedom k.
3.4 Original and truncated logistic curve with K = 17.5, r = .7 and $z_0 = .1$.
3.5 Residual vs. time plots: Original and truncated logistic curve for $q^{CV}_{OLS}$ with η = 5.
3.6 Residual vs. model plots: Original and truncated logistic curve for $q^{CV}_{OLS}$ with η = 5.
3.7 Residual vs. time plots: Original and truncated logistic curve for $q^{NCV}_{OLS}$ with η = 5.
3.8 Residual vs. model plots: Original and truncated logistic curve for $q^{NCV}_{OLS}$ with η = 5.
3.9 Residual vs. time plots: Original and truncated logistic curve for $q^{CV}_{GLS}$ with η = 5.
3.10 Modified residual vs. model plots: Original and truncated logistic curve for $q^{CV}_{GLS}$ with η = 5.
3.11 Modified residual vs. time plots: Original and truncated logistic curve for $q^{NCV}_{GLS}$ with η = 5.
3.12 Modified residual vs. model plots: Original and truncated logistic curve for $q^{NCV}_{GLS}$ with η = 5.
3.13 Example of $U \sim \chi^2(4)$ density.
3.14 Beam excitation.

4.1 Two chamber compartments separated by a membrane.
4.2 Binary molecules movement.
4.3 Moving fluid through a pipe.
4.4 Incremental volume element.
4.5 Plug flow model.

5.1 Diagram of SMC-adhesive-SMC joint.
5.2 A schematic diagram of the NDE method for the detection of structural flaws. The sensor measures the surface temperature, and the measured temperature is different for the smooth versus the corroded surface.
5.3 Transient conduction in one-dimensional cylindrical rod.
5.4 (a) A general three-dimensional region. (b) An infinitesimal volume.
5.5 Hardware connections used to validate the one-dimensional heat equation.
5.6 Heat experiment as set up in our own laboratory.

6.1 2-D fluid/structure interaction system.
6.2 Prismatic bar deformation due to tensile forces.
6.3 Normal stresses on the prismatic bar.
6.4 Stress-strain diagram for a typical structural steel in tension.
6.5 Necking of a prismatic bar in tension.
6.6 Bolt subjected to bearing stresses in a bolted connection.
6.7 Shearing stresses exerted on the bolt by the prismatic bar and the clevis.
6.8 Shear stress acts on a rectangular cube.
6.9 Shear stresses.
6.10 Shear strains on the front side of the rhombus.
6.11 A cantilever beam.
6.12 A simple beam.
6.13 A cantilever beam with a tip mass at the free end that is subjected to a distributed force f.
6.14 Shearing forces and moments on a cantilever beam with a tip mass at the free end.
6.15 Force balance on an incremental element of the beam.
6.16 Local deformation of a segment of the beam due to bending.
6.17 Stress and strain as functions of distances from the neutral axis at the point x (or e) on the neutral axis.
6.18 Segment of a beam with a rectangular cross-sectional area.
6.19 Pinned end support.
6.20 Frictionless roller end support.
6.21 Cantilever beam with a tip mass.
6.22 Local deformation of the cantilever beam with tip mass.
6.23 Force balance at the tip mass.
6.24 Deformation of the beam due to the rotation of the beam cross section.
6.25 Moment balance at the tip mass.
6.26 Hat basis functions.
6.27 Hardware used for modal analysis and model validation of the cantilever beam model.

7.1 A spring-mass-dashpot platform system.
7.2 State vector x(t) for $t_1 = 1$ second.
7.3 State vector x(t) for $t_1 = .5$ second.
7.4 Control u(t) for $t_1 = 1$ second (solid line) and $t_1 = .5$ second (dashed line).
7.5 A pendulum.
7.6 A closed-loop or feedback control system.
7.7 An open-loop control system.
7.8 Dynamic output compensator.
7.9 The uncontrolled system (u ≡ 0).
7.10 The state vector, x(t), of the closed-loop system with $K = (-2, -3, -3)$ and $G = (14, 8, -4)^T$.
7.11 The estimator error, e(t), of the closed-loop system with $K = (-2, -3, -3)$ and $G = (14, 8, -4)^T$.
7.12 The state estimator, $\hat{x}(t)$, with $K = (-2, -3, -3)$ and $G = (14, 8, -4)^T$. The label xe1(t) denotes $\hat{x}_1(t)$, etc.
7.13 The state vector, x(t), with $K = (-68, -48, -12)$ and $G = (110, 95, 20)^T$.
7.14 The estimator error, e(t), with $K = (-68, -48, -12)$ and $G = (110, 95, 20)^T$.
7.15 The state estimator, $\hat{x}(t)$, with $K = (-68, -48, -12)$ and $G = (110, 95, 20)^T$. The label xe1(t) denotes $\hat{x}_1(t)$, etc.
7.16 Cantilever beam with piezoceramic patches.
7.17 Experimental beam with piezoceramic patches.
7.18 Experimental setup and implementation of online component of the Real-Time Control Algorithm.
7.19 Uncontrolled and controlled displacements at $x_{ob} = 0.11075$ m.
7.20 Control voltages.
7.21 The inverted pendulum.
7.22 Free body diagram of the inverted pendulum.

8.1 A fluid initially at rest between two parallel plates.
8.2 Transient velocity profile of a fluid between two parallel plates.
8.3 Fluid shear in steady-state between two parallel plates.
8.4 A fluid element fixed in space through which a fluid is flowing.
8.5 A fluid element of volume ∆x∆y∆z fixed in space through which the x-component of the momentum is transported.
8.6 Hardware used for studying various types of boundary conditions associated with the one-dimensional wave equation.
8.7 Hewlett-Packard dynamic signal analyzer.

9.1 Graphs of the population p(t).
9.2 Graph of the solution to the logistic model.
9.3 Orbital solutions of the predator/prey model.
9.4 Total population from size a to b at time $t_0$.
9.5 Size trajectories.
9.6 Growth characteristic of the conservation equation.
9.7 Solution to equation (9.10) along the characteristic curve for g(t, x) ≡ a and µ = 0.
9.8 Characteristic curve.
9.9 Regions in the (t, x) plane defining the solution.
9.10 Mosquitofish data.

Contents

1 Introduction: The Iterative Modeling Process

2 Modeling and Inverse Problems
2.1 Mechanical Vibrations
2.2 Inverse Problems
References

3 Mathematical and Statistical Aspects of Inverse Problems
3.1 Probability and Statistics Overview
3.1.1 Probability
3.1.2 Random Variables
3.1.3 Statistical Averages of Random Variables
3.1.4 Special Probability Distributions
3.2 Parameter Estimation or Inverse Problems
3.2.1 The Mathematical Model
3.2.2 The Statistical Model
3.2.3 Known Error Processes: Maximum Likelihood Estimators
3.2.3.1 Normally Distributed Errors
3.2.4 Unspecified Error Distributions and Asymptotic Theory
3.2.5 Ordinary Least Squares (OLS)
3.2.6 Numerical Implementation of the Vector OLS Procedure
3.2.7 Generalized Least Squares (GLS)
3.2.8 GLS Motivation
3.2.9 Numerical Implementation of the GLS Procedure
3.3 Computation of Σn, Standard Errors and Confidence Intervals
3.4 Investigation of Statistical Assumptions
3.4.1 Residual Plots
3.4.2 An Example Using Residual Plots
3.5 Statistically Based Model Comparison Techniques
3.5.1 RSS Based Statistical Tests
3.5.1.1 P-Values
3.5.1.2 Alternative Statement
3.5.2 Application: Cat-Brain Diffusion/Convection Problem
References

4 Mass Balance and Mass Transport
4.1 Introduction
4.2 Compartmental Concepts
4.3 Compartment Modeling
4.4 General Mass Transport Equations
4.4.1 Mass Flux Law in a Stationary (Non-Moving) Fluid
4.4.2 Mass Flux in a Moving Fluid
References

5 Heat Conduction
5.1 Motivating Problems
5.1.1 Radio-Frequency Bonding of Adhesives
5.1.2 Thermal Testing of Structures
5.2 Mathematical Modeling of Heat Transfer
5.2.1 Introduction
5.2.2 Fourier’s Law of Heat Conduction
5.2.3 Heat Equation
5.2.4 Boundary Conditions and Initial Conditions
5.2.5 Properties of Solutions
5.3 Experimental Modeling of Heat Transfer
5.3.1 The Thermocouple as a Temperature Measuring Device
5.3.2 Detailed Hardware and Software Lists
References

6 Structural Modeling: Force/Moments Balance
6.1 Motivation: Control of Acoustics/Structural Interactions
6.2 Introduction to Mechanics of Elastic Solids
6.2.1 Normal Stress and Strain
6.2.2 Stress and Strain Relationship (Hooke’s Law)
6.2.3 Shear Stress and Strain
6.3 Deformations of Beams
6.3.1 Differential Equations of Thin Beam Deflections
6.3.1.1 Force Balance
6.3.1.2 Moment Balance
6.3.1.3 Moment Computation
6.3.1.4 Initial Conditions
6.3.1.5 Boundary Conditions
6.4 Separation of Variables: Modes and Mode Shapes
6.5 Numerical Approximations: Galerkin’s Method
6.6 Energy Functional Formulation
6.7 The Finite Element Method
6.8 Experimental Beam Vibration Analysis
References

7 Beam Vibrational Control and Real-Time Implementation
7.1 Introduction
7.2 Controllability and Observability of Linear Systems
7.2.1 Controllability
7.2.1.1 Time-Varying Case
7.2.1.2 Time-Invariant Case
7.2.2 Observability
7.2.2.1 Time-Varying Case
7.2.2.2 Time-Invariant Case
7.3 Design of State Feedback Control Systems and State Estimators
7.3.1 Effect of State Feedback on System Properties
7.3.1.1 Stability
7.3.1.2 Controllability
7.3.1.3 Observability
7.4 Pole Placement (Relocation) Problem
7.4.1 State Estimator (Luenberger Observer)
7.4.2 Dynamic Output Feedback Compensator
7.5 Linear Quadratic Regulator Theory
7.6 Beam Vibrational Control: Real-Time Feedback Control Implementation
References

8 Wave Propagation
8.1 Fluid Dynamics
8.1.1 Newton’s Law of Viscosity
8.1.2 Derivative in Fluid Flows
8.1.3 Equations of Fluid Motion
8.2 Fluid Waves
8.2.1 Terminology
8.2.2 Sound Waves
8.2.2.1 Euler’s Equation
8.2.2.2 Equation of Continuity
8.2.2.3 Equation of State
8.2.3 Wave Equations
8.3 Experimental Modeling of the Wave Equation
References

9 Size-Structured Population Models
9.1 Introduction: A Motivating Application
9.2 A Single Species Model (Malthusian Law)
9.3 The Logistic Model
9.4 A Predator/Prey Model
9.5 A Size-Structured Population Model
9.6 The Sinko-Streifer Model and Inverse Problems
9.7 Size Structure and Mosquitofish Populations
References

A An Introduction to Fourier Techniques
A.1 Fourier Series
A.2 Fourier Transforms

B Review of Vector Calculus

References

Index

Chapter 1

Introduction: The Iterative Modeling Process

We begin this monograph with a brief discussion of certain philosophical notions that are important in the modeling of physical and biological systems. Modeling, in our view, is simply a means of providing a conceptual framework in which real systems may be investigated. The modeling process itself is (or should be) most often an iterative process: one can distinguish in it a number of rather separate steps that usually must be repeated. This iterative modeling process is depicted schematically in Figure 1.1. One begins with the real system under investigation and pursues the following sequence of steps:

(i) empirical observations, experiments, and data collection;

(ii) formalization of properties, relationships, and mechanisms that result in a biological or physical model (e.g., stoichiometric relations detailing pathways, mechanisms, biochemical reactions, etc., in a metabolic pathway model; stress-strain and pressure-force relationships in mechanics and fluids);

(iii) abstraction or mathematization resulting in a mathematical model (e.g., algebraic and/or differential equations with constraints and initial and/or boundary conditions);

(iv) formalization of uncertainty/variability in model and data resulting in a statistical model (this usually involves basic assumptions about errors in modeling, the observation process/measurement, etc.);

(v) model analysis, which can consist of simulation studies, analytical and qualitative analysis including stability analysis, and the use of mathematical techniques such as perturbation studies, parameter estimation (inverse problems), data fitting, and statistical analysis;

(vi) interpretation and comparison (with the real system) of the conclusions, predictions, and conjectures obtained from step (v);

(vii) changes in “understanding” of mechanisms, pathways, etc., in the real system.


As one completes step (vii), one is led naturally to reformulate the physical or biological model by returning to either step (i) (if new experiments are indicated) or step (ii). In either case one then proceeds through the steps again, seeking to improve the findings of the previous transit through the sequence.
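The loop through steps (iii) to (vii) can be sketched in a few lines of Python. This is a toy illustration built on our own assumptions, not a procedure taken from the text. The candidate models are polynomials whose parameters are estimated by ordinary least squares (OLS), the "experimental" data are synthetic, and the residual check simply counts sign changes. That count is a crude stand-in for inspecting residual plots: few sign changes suggest a systematic trend the model has missed. The helper names (ols_polyfit, residuals, sign_changes, iterate_models) and the acceptance threshold are hypothetical.

```python
# Toy sketch of the iterative modeling loop: formulate a candidate model,
# estimate its parameters by OLS, compare with the data through residuals,
# and reformulate with a richer model if the residuals show structure.
# Assumptions (not from the text): polynomial models, synthetic data, and a
# simple sign-change count as the residual diagnostic.


def ols_polyfit(ts, ys, degree):
    """Return OLS coefficients q[0..degree] for y = sum_k q[k] * t**k."""
    n = degree + 1
    # Normal equations (X^T X) q = X^T y for the monomial design matrix X.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    q = [0.0] * n
    for i in reversed(range(n)):
        q[i] = (b[i] - sum(a[i][j] * q[j] for j in range(i + 1, n))) / a[i][i]
    return q


def residuals(ts, ys, q):
    """Observed minus predicted values, r_j = y_j - f(t_j; q)."""
    return [y - sum(qk * t ** k for k, qk in enumerate(q)) for t, y in zip(ts, ys)]


def sign_changes(rs):
    """Number of sign changes in the residual sequence (few => systematic trend)."""
    return sum(1 for r0, r1 in zip(rs, rs[1:]) if r0 * r1 < 0)


def iterate_models(ts, ys, max_degree=4):
    """Refine the model until the residuals look patternless (heuristic)."""
    for degree in range(1, max_degree + 1):
        q = ols_polyfit(ts, ys, degree)        # step (v): parameter estimation
        rs = residuals(ts, ys, q)              # step (vi): comparison with data
        if sign_changes(rs) >= (len(rs) - 1) // 2:
            return degree, q, rs               # residuals look random: accept
        # step (vii): understanding changed, so reformulate and repeat
    return degree, q, rs                       # no candidate accepted


# Synthetic "experiment": quadratic trend plus a small alternating error.
ts = [float(t) for t in range(11)]
ys = [2.0 + 0.5 * t + 0.3 * t * t + 0.05 * (-1) ** k for k, t in enumerate(ts)]

degree, q, rs = iterate_models(ts, ys)
print(degree, [round(c, 3) for c in q])
```

Here the straight-line model leaves U-shaped residuals with only two sign changes, so the loop rejects it. The quadratic model leaves residuals that alternate in sign, and the loop stops there. A real study would replace the sign-change count with the residual plots and statistical tests developed for the actual error model.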

Steps (i), (ii), (iii), and (iv) belong to what one might term the formulation stage of the modeling process, while step (v) is the solution stage and steps (vi) and (vii) constitute the interpretation stage. In practice, however, it is often (unfortunately) the case that investigators do not make a clear distinction among the steps outlined here. This can lead to confusion and, in some cases, to incorrect conclusions and gross misunderstanding of the real system.

Let us turn next to the reasons frequently given for modeling. Perhaps the one most often offered is simplification: the use of models makes possible the investigation of very complex systems in a systematic manner. A second rationale is ease of manipulation: investigations involving separation of subunits and hypothesis testing may often be facilitated through the use of simulations in place of experimentation. The suggestive features of modeling can also help in the formulation of hypotheses and in the design of critical experiments. The modeling process also demands precision in investigation, in that one must move from a general, verbal explanation of phenomena to a specific, quantitative one.

But a rationale perhaps more fundamental than any of these is that modeling leads to an organization of inquiry: it tends to focus one’s thinking and aids in posing basic questions concerning what one does and does not know for certain about the real system. Whatever reasons have been advanced to justify modeling attempts, it is perhaps sufficient to note that the primary goal must be enlightenment, that is, gaining a better understanding of the real system, and the success or failure of any modeling attempt must be appraised with this in mind.

One must recognize the various levels, or multi-scale aspects, of modeling in any attempt to compare or assess the validity of several models for a phenomenon. For example, consider the phenomena involved in the transmission of a nerve impulse along an axon: this process is likely to be described by the mathematician or biophysicist in terms of partial differential equations, wave phenomena, or transmission line analogies, whereas a neurophysiologist might speak in terms of local circuit analogies and changes in conductances. The cell physiologist might describe the phenomena in the context of transport properties of membranes and ion flow, while the molecular biochemist could insist that the real story lies in the theory of molecular binding.

A second example involves the physical motion (vibration) of a structure such as a plate or beam. Again, the mathematician might describe this in terms of a partial differential equation, whereas the mechanical engineer might use a modal analysis (in terms of natural frequencies of oscillation) based on internal stress-strain relationships.


FIGURE 1.1: Schematic diagram of the iterative modeling process.

In each of the examples cited above, the different modeling approaches move to an increasingly micro level. Each approach involves an attempt to explain a phenomenon that is not understood at one level by a description at a more micro level (in general) where understanding is more complete. This attempt to explain “unknowns” in terms of more basic “knowns” is clearly the foundation of most modeling investigations. Indeed, in addition to noting that nerve impulse phenomena are described in terms of membrane conductances, permeabilities, ion flow, etc., one might observe that blood circulation is studied in the context of elementary hydrostatics and fluid dynamics, while metabolic processes are usually investigated in the language of elementary chemical kinetics and thermodynamics.


The choice of the level (micro vs. macro) at which one models depends very much upon the training and background of the investigator. Furthermore, the perception of whether a model is a “good” one or not is also greatly influenced by this factor, and it is therefore not surprising that all of the approaches to the nerve impulse phenomena mentioned above (or indeed those for modeling any physical or biological phenomena) can be subjected to valid criticisms in any attempt to evaluate them.

Before discussing the criteria one might use in evaluating modeling investigations, let us list some of the common difficulties and limitations often encountered in the modeling of systems:

(a) Availability and accuracy of data;

(b) Analysis of the mathematical model;

(c) Use of local representations that are invalid for the overall system;

(d) Obsession with the solution stage;

(e) Assumption that the “model” is the real system;

(f) Communication in interdisciplinary efforts.

The first item in this list requires no further comment; the second includes both theoretical and computational difficulties in the mathematical treatment of a given set of equations. Although formidable obstacles can still arise, this is a much less critical problem in modeling today than it was, say, in the physical sciences in Newton’s time. This is due in large part to the great strides made in the last several decades with the advance of modern computing facilities and the concomitant development of rather sophisticated numerical procedures. We remark that (c) is especially prevalent in certain physiological modeling, where systems are not easily manipulated experimentally. In vitro data and parameter values (determined via experimentation in nonphysiological ranges) are often used to model, predict and draw conclusions about in vivo situations. While (d) is likely to be a problem for investigators with a mathematical or physics background (in their enthusiasm for finding solutions of their model equations and various generalizations, they tend to forget or ignore the fact that the model is only an approximation and that certain aspects of the physical or biological system on which it is based are very poorly understood), item (e) can be a problem for mathematical, physical and biological scientists alike. Even physicists and biologists sometimes have a penchant for disbelieving data that contradict model simulations and predictions. It can be very tempting to throw out “faulty” data rather than reformulate the basic model. Finally, because most serious physical and biological modeling projects involve an interdisciplinary effort, there is always the possibility of a serious lack of communication and cooperation due to differences in vocabulary, goals, and attitudes. Often mathematicians are only looking for a “problem” to which their already highly developed theories and techniques apply; i.e., they are in possession of a “solution” and in search of the “problem” they have solved! On the other hand, physicists and biologists can be too impatient with the mathematicians’ desire to hypothesize rather implausible mechanisms and relationships (which can sometimes lead to exciting new perspectives about a phenomenon!).

Finally, we turn to the question of how one appraises a specific modeling attempt. There are a number of criteria that one might use. Among those proposed by various authors are the suggestions that a good model should: fit data accurately; be theoretically consistent with the real system; have parameters with physical meaning that can be measured independently of each other; prove useful in prediction; not so much explain or predict as organize and economize thinking; pose new empirical questions and help answer them through the iterative process; help us understand the phenomena it represents and think comfortably about them; and point to inadequacies of some kind in the available data. It is clear, though, that for a modeling investigation to be deemed a success, it must have enhanced our overall knowledge and understanding of the phenomena in question. As one of our students (having been attacked by other students for a rather unorthodox and, at the time, unsupported hypothesis about mechanisms) noted in defending his efforts, “We learn little indeed if the models we build never stretch our understanding, but only tell us what we already feel is safely known.” We remind the reader of the often quoted truth, “all models are incorrect, but some are more useful than others.”

In concluding our philosophical remarks, we note that one can distinguish between at least two basic types of scientific models: descriptive and conceptual models. Descriptive models, those designed to explain observed phenomena, will be the focus of our attention here. Conceptual models, models constructed to elucidate delicate and difficult points in some scientific theory, are often used to help resolve apparent paradoxes involving two descriptive models. Conceptual models do not appear widely in the biological literature, since in many cases basic descriptive models are still under development.

Chapter 2

Modeling and Inverse Problems

In this chapter we present a simple application to illustrate the iterative modeling process described in Chapter 1. Using this illustrative example, we also introduce the notion of an inverse problem. The inverse or parameter estimation problem plays an indispensable role in developing mathematical models for biological and physical systems. As we shall see, because so many different mathematical models are plausible for a given system, model validation is an essential part of the modeling process. Indeed, as we formulate physical or biological problems mathematically, we find that the problem often amounts to determining one or more unknown parameters in the mathematical model from some (limited) knowledge about the behavior of the system. Problems of this type arise in many important applications, including geophysics, ecology, flexible structures, medical imaging and materials testing.

2.1 Mechanical Vibrations

To begin, we consider a spring of natural length $l$ attached to a rigid horizontal support (e.g., a ceiling), with a small object of mass $m$ hanging from the bottom of the spring (see Figure 2.1). If we pull down or push up on the body a distance $\Delta l$, the elastic spring will exert a restoring force to pull the object back up or to push it back down, respectively. If $\Delta l$ is small compared to the natural length $l$ of the spring, then the spring restoring force, denoted by $F_r$, can be described by Hooke’s law (see, e.g., [7, 11]). Mathematically, we write

$$F_r = -k\,\Delta l, \tag{2.1}$$

where $k$ is called the spring constant, which is a measure of the stiffness of the spring. Note that if $\Delta l$ is positive, then the restoring force is negative, whereas if $\Delta l$ is negative, then $F_r$ is positive.

FIGURE 2.1: Spring-mass system (with the mass in equilibrium position).

In modeling the motion of the mass $m$, it will be convenient to describe the position of the mass with respect to its equilibrium position. The equilibrium position of the mass is the point where the mass will hang at rest when no external forces (other than gravity) are being applied. We let $y = 0$ denote this equilibrium point and take the downward direction to be positive. Newton’s Second Law of Motion is fundamental to the description of the position of the mass at time $t$; it states

$$F = ma, \tag{2.2}$$

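where $F$ is the sum of all forces exerted on the mass, $m$ is the body’s mass, and $a$ is the acceleration of the body. Let $y(t)$ denote the position of the mass at time $t$. Since gravity is balanced by the static stretch of the spring at equilibrium, the net force on the mass is the restoring force (2.1) with $\Delta l = y$, and Newton’s Second Law of Motion gives (see, for instance, [4, 5])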

$$m\ddot{y} = -ky. \tag{2.3}$$

The differential equation (2.3) is a second-order, linear differential equation with constant coefficients. Its solution can be readily obtained as (see, e.g., [4, 10])

$$y(t) = A\cos\omega t + B\sin\omega t, \tag{2.4}$$

where $\omega = \sqrt{k/m}$. The constants of integration $A$ and $B$ are determined from the initial conditions $y(0) = y_0$ and $\dot{y}(0) = v_0$, and are given by

$$A = y_0, \tag{2.5}$$

$$B = \frac{v_0}{\omega}. \tag{2.6}$$

In order to analyze the solution (2.4), it is convenient to rewrite it as a single cosine function of the form

$$y(t) = R\cos(\omega t - \phi), \tag{2.7}$$

where $R = \sqrt{A^2 + B^2}$ and $\phi = \tan^{-1}(B/A)$, with the quadrant of $\phi$ chosen so that $R\cos\phi = A$ and $R\sin\phi = B$. This solution is depicted in Figure 2.2. Note that the solution $y(t)$ lies between $-R$ and $+R$, and that the motion of the body is periodic with period $2\pi/\omega$. This type of motion is called simple harmonic motion; $\omega = \sqrt{k/m}$ is called the natural frequency of the system, $R$ is the amplitude of the motion, and $\phi$ is called the phase angle of the motion.
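A quick numerical check of (2.4) through (2.7) makes the amplitude-phase rewrite concrete. The minimal Python sketch below (the function name `amplitude_phase` and the sample values are illustrative assumptions) computes $\omega$, $R$, and $\phi$ from $m$, $k$, $y_0$, $v_0$ and compares both forms of the solution. It uses `math.atan2(B, A)` rather than a plain $\tan^{-1}(B/A)$, because for $A < 0$ the plain arctangent puts $\phi$ in the wrong quadrant; the chosen $y_0 = -1$ exercises exactly that case.

```python
import math


def amplitude_phase(y0, v0, m, k):
    """Return (omega, R, phi) so that y(t) = R cos(omega t - phi), cf. (2.4)-(2.7)."""
    omega = math.sqrt(k / m)   # natural frequency
    A = y0                     # (2.5)
    B = v0 / omega             # (2.6)
    R = math.hypot(A, B)       # sqrt(A^2 + B^2)
    phi = math.atan2(B, A)     # quadrant-aware tan^{-1}(B/A): R cos(phi) = A, R sin(phi) = B
    return omega, R, phi


if __name__ == "__main__":
    m, k, y0, v0 = 2.0, 3.0, -1.0, 0.5   # illustrative values with A < 0
    omega, R, phi = amplitude_phase(y0, v0, m, k)
    for t in (0.0, 0.7, 1.9, 4.2):
        two_term = y0 * math.cos(omega * t) + (v0 / omega) * math.sin(omega * t)
        one_term = R * math.cos(omega * t - phi)
        print(f"t={t:.1f}  (2.4): {two_term:+.6f}  (2.7): {one_term:+.6f}")
    print(f"period 2*pi/omega = {2 * math.pi / omega:.4f}")
```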

FIGURE 2.2: Graph of the simple harmonic motion, $y(t) = R\cos(\omega t - \phi)$ (horizontal axis $t$, vertical axis $y(t)$).

In summary, our modeling process as discussed in Chapter 1 begins with the real physical model (a weight (mass) hanging from the bottom of an elastic spring) (step (i)) and proceeds with force balancing (Newton’s Second Law of Motion) (step (ii)) to derive a mathematical model in terms of a differential equation (step (iii)). We next obtain the analytical solution to our differential equation model (step (v)). However, in real physical systems of mechanical vibration (step (vi)), the oscillations do not persist over time but eventually die out. This leads us to step (vii), which requires a re-examination of our understanding of the mechanical vibration system. Perhaps we have oversimplified our assumptions. For example, are there other forces (in addition to the spring restoring force) being exerted on the body in Newton’s Second Law of Motion (2.2)? Specifically, consider a new experiment in which we add to the mass two light “massless” paddles (see Figure 2.3). As the body moves through the air, there is an apparent force resisting the motion (the paddles bend in the direction opposite to the motion). Furthermore, more bending occurs as the mass moves faster. Simply stated, this force opposes the motion, its magnitude is proportional to the speed $|\dot{y}|$, and it can be modeled by

$$F_d = -c\dot{y}, \tag{2.8}$$


where $c$ is the viscous damping coefficient. The resistive force $F_d$, which the medium exerts on the body $m$, is also called the damping, or drag, force. If we take this new force into consideration, our new mathematical model becomes

$$m\ddot{y} = -ky - c\dot{y}. \tag{2.9}$$

FIGURE 2.3: Spring-mass system (with “massless” paddles attached to the body).

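If we assume that $c^2 - 4km < 0$ (the underdamped case), every solution of (2.9) has the form

$$y(t) = e^{-ct/(2m)}\left[A\cos\nu t + B\sin\nu t\right], \tag{2.10}$$

where $\nu = \sqrt{4km - c^2}/(2m)$, and $A$ and $B$ are constants to be determined from the initial conditions as before; with $y(0) = y_0$ and $\dot{y}(0) = v_0$ one finds $A = y_0$ and $B = \left(v_0 + c\,y_0/(2m)\right)/\nu$. Using arguments similar to those in the undamped case, the damped solution (2.10) can also be rewritten in the form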

$$y(t) = Re^{-ct/(2m)}\cos(\nu t - \delta). \tag{2.11}$$

Observe that the solution oscillates between the curves $\pm Re^{-ct/(2m)}$. That is, the motion of the mass is oscillatory, with (quasi-)period $2\pi/\nu$ and exponentially decreasing amplitude, as depicted in Figure 2.4.

Thus, with damping present in the system, the motion of the body always dies out eventually. Engineers usually refer to such systems as spring-mass-dashpot systems. Spring-mass-dashpot systems are ubiquitous in engineering, science, and indeed in nature. For example, they are used as shock absorbers in vehicles to damp out bumps in the road as well as to minimize the recoil effect of a heavy gun barrel. They also find modeling applications in muscle mechanics and in molecular-level phenomena in materials (e.g., polarization and “electron cloud” models of the response to alternating electric fields).


FIGURE 2.4: Plot of $y(t) = Re^{-ct/(2m)}\cos(\nu t - \delta)$ (horizontal axis $t$, vertical axis $y(t)$).

2.2 Inverse Problems

Mathematical models as described by equations (2.3) and (2.9) above are of the “forward” type; that is, the parameters $m$, $c$, and $k$ are assumed to be known, as are the initial conditions. The mathematical model then predicts the resulting model behavior $y(t)$ at any time $t$ from the solution formulas (2.4) or (2.10). This is typically the approach taken in sensitivity investigations, which are quite useful and can reveal important features of the model as functions of its parameters (see [2, 6, 9, 12] and the references therein). In reality, however, not all parameters are directly measurable (e.g., most springs in mechanical devices come without a specification of the spring constant $k$). Instead, we may have sparse and noisy measurements of displacements (using proximity sensors) and/or accelerations (using accelerometers). From this information, we need to find the unknown parameters. Problems of this type are called inverse or parameter estimation problems and are ubiquitous in modeling. Finding solutions to an inverse problem is, in general, nontrivial because of the non-uniqueness difficulties that arise. This undesirable feature is often due to noisy data and an insufficient number of observations. For a discussion of non-uniqueness, as well as other issues such as stability in inverse problems, we refer the interested reader to [1, 3].

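To discuss the inverse problem formulation for the spring-mass-dashpot system, we assume that all three parameters $m$, $c$, and $k$ are unknown and that displacement observations $y^d_i$ at selected temporal points $t_i$ are available. With noise-free observations (which is never the case in practice), one might hope that three well-chosen points $t_i$ would give three equations for the three unknowns $m$, $c$, and $k$. Because (2.9) is unforced, however, multiplying $m$, $c$, and $k$ by the same positive constant leaves its solution unchanged, so free-vibration displacement data can determine only the ratios $c/m$ and $k/m$; one of the parameters (e.g., $m$) must be known independently, or a known forcing must be applied. Moreover, due to noise in the measurements, we usually take $n$ observations. Then a typical inverse or estimation problem involving (2.9) is to find $q \in Q_{AD} = \{(m, c, k) \mid 0 < m < M,\ 0 < c,\ 0 < k\}$ by minimizing the least squares criterion

$$J(q) = \sum_{i=1}^{n} \left| y_{\mathrm{mod}}(t_i; m, c, k) - y^d_i \right|^2. \tag{2.12}$$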

Here $y_{\mathrm{mod}}(t_i; m, c, k)$ is the solution to (2.9) corresponding to $m$, $c$, and $k$. The above procedure leads to a constrained optimization problem. We also remark that such problems require one to solve the differential equation model (2.9) many times, once for each trial value of the parameters.
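To make (2.12) concrete, here is a minimal Python sketch that evaluates $J$ using the closed-form underdamped solution (2.10), with $A = y_0$ and $B = (v_0 + c\,y_0/(2m))/\nu$ fitted to the initial conditions. The sampling times $t_i = 0.25\,i$, $i = 1, \ldots, 20$, the initial data $y_0 = 2$, $v_0 = 0$, the noise-free synthetic observations, and the helper names `y_model` and `J` are illustrative assumptions. Evaluating $J$ at $(m, c, k) = (2, 2, 3)$ and at $(4, 4, 6)$ gives the same value, because scaling all three parameters of the unforced equation by a common factor does not change $y(t)$.

```python
import math


def y_model(t, m, c, k, y0=2.0, v0=0.0):
    """Closed-form solution (2.10) of m y'' + c y' + k y = 0 (needs c^2 - 4km < 0)."""
    disc = 4.0 * k * m - c * c
    if disc <= 0.0:
        raise ValueError("(2.10) applies only when c^2 - 4km < 0")
    nu = math.sqrt(disc) / (2.0 * m)
    beta = c / (2.0 * m)            # decay rate in e^{-ct/(2m)}
    A = y0                          # from y(0) = y0
    B = (v0 + beta * y0) / nu       # from y'(0) = v0
    return math.exp(-beta * t) * (A * math.cos(nu * t) + B * math.sin(nu * t))


def J(q, times, data):
    """Least squares criterion (2.12) for q = (m, c, k)."""
    m, c, k = q
    return sum((y_model(t, m, c, k) - yd) ** 2 for t, yd in zip(times, data))


if __name__ == "__main__":
    times = [0.25 * i for i in range(1, 21)]
    data = [y_model(t, 2.0, 2.0, 3.0) for t in times]   # noise-free synthetic "data"
    for q in [(2.0, 2.0, 3.0), (4.0, 4.0, 6.0), (3.0, 1.0, 6.0)]:
        print(q, J(q, times, data))
```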

Project: Inverse Problem

The objective of this project is to help students familiarize themselves with the concepts involved in inverse problems. In addition, students will learn how to use MATLAB (see, e.g., [8]) to carry out many computations associated with inverse problems.

1.) Consider the following mathematical model for a spring-mass-dashpot system (using a linear spring assumption, Hooke’s law, and viscous air damping):

$$m\frac{d^2y(t)}{dt^2} + c\frac{dy(t)}{dt} + ky(t) = 0$$

with initial conditions

$$y(0) = 2, \qquad \frac{dy(0)}{dt} = 0,$$

where $m$ is the mass, $c$ is the damping coefficient, $k$ is the spring constant, and $y(t)$ is the vertical displacement of the mass from the equilibrium position.

The solution to the above second-order differential equation can be computed using the MATLAB routine ode23. To use ode23, one needs to rewrite the above equation as a system of first-order differential equations. That is, letting $z_1 = y$ and $z_2 = dy/dt$, we obtain

$$\frac{dz_1}{dt} = z_2, \qquad \frac{dz_2}{dt} = -\frac{k}{m}z_1 - \frac{c}{m}z_2.$$

Letting $m = 2$, $c = 2$, $k = 3$, compute the numerical solution $y(t)$ for $t \in [0, 5]$ and plot it on a graph. On your graph, you should label the horizontal axis as time, $t$, and the vertical axis as $y(t)$, and title the graph A Linear Spring Model Response. Also, place a text string on the graph showing the values of $m$, $c$, and $k$. The following MATLAB functions will be useful for this exercise: ode23, plot, xlabel, ylabel, title and gtext. You can see an explanation of a function by typing, for example, help ode23 on the command line.

2.) In general, the coefficients $m$, $c$, and $k$ are unknown parameters. These parameters can be estimated via a nonlinear least squares estimation problem. Specifically, one seeks $\vec{q} = (m, c, k)$ to minimize the cost function

$$J(\vec{q}) = \sum_{i=1}^{n} \left| y_m(t_i; \vec{q}) - y^d_i \right|^2,$$

where $y_m(t_i; \vec{q})$ is the solution of the spring-mass-dashpot model at time $t_i$, $i = 1, 2, \ldots, n$, for the parameter set $\vec{q}$, and $y^d_i$ is the (displacement) datum collected at time $t_i$. In this exercise, we will create “simulated” data to be used for estimating the unknown parameters $\vec{q} = (m, c, k)$. For this, we assume that the displacement is sampled at equally spaced time intervals: we subdivide the time interval $[0, 5]$ into $n$ equal subintervals of length $h = 5/n$ and let $y^d_i$ denote the displacement sampled at time $t_i = ih$, $i = 1, \ldots, n$. For these samples, use the solution $y(t_i)$ of the spring-mass-dashpot system corresponding to $m = 2$, $c = 2$, and $k = 3$ that you have already computed in part 1.) of this exercise. Using these “data,” implement an inverse problem for finding the parameters $m$, $c$, and $k$ using the least squares criterion above (take $n = 20$). This minimization problem can be solved using the MATLAB routine fminu or fminsearch (fminu has been superseded by fminunc in newer releases of the Optimization Toolbox). To use one of these routines you must give an initial guess for the parameters (try $\vec{q}_g = (m, c, k) = (3, 1, 6)$; then try several others). Create a table showing the initial guess $\vec{q}_g$, its cost function value $J(\vec{q}_g)$, the optimal parameter values $\vec{q}_{op}$, and their cost function value $J(\vec{q}_{op})$.

3.) In practice, the collected data are corrupted by noise (for example, errors in collecting data, instrumental errors, etc.). In the next part of the exercise, we wish to test the sensitivity of the inverse least squares method to errors in sampling the data. For this, we will add to each simulated data point an error term as follows:

$$y^d(t_i) = y(t_i) + nl \cdot \mathrm{rand}_i,$$

where $y(t_i)$ is the noise-free simulated value, the $\mathrm{rand}_i$ are normally distributed random numbers with zero mean and variance 1.0, and $nl$ is a noise level constant. Use the MATLAB routine randn to generate an $n$-vector with random entries.

For each of the values $nl = 0.01$, $0.02$, $0.05$, $0.1$, and $0.2$, estimate the parameters $m$, $c$, and $k$ using the inverse least squares method. Create a table listing the estimated values of the parameters and the values of the cost functional for each value of $nl$. Describe the sensitivity of the inverse least squares method with respect to the noise level $nl$.
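Parts 1.) to 3.) can also be prototyped outside MATLAB. The sketch below is a plain-Python (standard library only) stand-in, not the intended MATLAB solution: a fixed-step classical Runge-Kutta integrator replaces the adaptive ode23, and a basic Nelder-Mead simplex search replaces fminsearch (which is itself a Nelder-Mead method). Because the equation is unforced, multiplying $m$, $c$, and $k$ by a common factor leaves $y(t)$ unchanged, so the sketch assumes $m = 2$ is known and estimates only $(c, k)$; its initial guess $(c, k) = (2/3, 4)$ has the same ratios $c/m$ and $k/m$ as the suggested guess $(3, 1, 6)$. The names `simulate`, `cost`, `nelder_mead`, `M_KNOWN`, the step counts, and the random seed are all illustrative choices.

```python
import math
import random


def simulate(m, c, k, n=20, t_end=5.0, substeps=5, y0=2.0, v0=0.0):
    """y(t_i) at t_i = i * t_end / n (i = 1..n) for m y'' + c y' + k y = 0, integrating
    z1' = z2, z2' = -(k/m) z1 - (c/m) z2 with fixed-step classical RK4."""
    def f(z1, z2):
        return z2, -(k / m) * z1 - (c / m) * z2
    dt = t_end / (n * substeps)
    z1, z2, ys = y0, v0, []
    for _ in range(n):
        for _ in range(substeps):
            a1, b1 = f(z1, z2)
            a2, b2 = f(z1 + 0.5 * dt * a1, z2 + 0.5 * dt * b1)
            a3, b3 = f(z1 + 0.5 * dt * a2, z2 + 0.5 * dt * b2)
            a4, b4 = f(z1 + dt * a3, z2 + dt * b3)
            z1 += dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
            z2 += dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
        ys.append(z1)
    return ys


M_KNOWN = 2.0  # assumption: m is held fixed, since only c/m and k/m affect y(t)


def cost(q, data):
    """Least squares criterion for q = (c, k); +inf outside 0 < c, 0 < k."""
    c, k = q
    if c <= 0.0 or k <= 0.0:
        return math.inf
    total = 0.0
    for ym, yd in zip(simulate(M_KNOWN, c, k), data):
        d = ym - yd
        total += d * d
    return total if math.isfinite(total) else math.inf


def nelder_mead(f, x0, step=0.5, tol=1e-14, max_iter=1000):
    """Basic Nelder-Mead simplex search (reflect, expand, contract, shrink)."""
    n = len(x0)
    pts = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                        for i in range(n)]
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=vals.__getitem__)
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]
        worst = pts[-1]

        def along(t):  # the point cen + t * (cen - worst)
            return [cen[j] + t * (cen[j] - worst[j]) for j in range(n)]

        xr = along(1.0)
        fr = f(xr)
        if fr < vals[0]:  # reflection is the new best: try expanding further
            xe = along(2.0)
            fe = f(xe)
            pts[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < vals[-2]:  # accept the reflected point
            pts[-1], vals[-1] = xr, fr
        else:
            xc = along(-0.5)  # inside contraction
            fc = f(xc)
            if fc < vals[-1]:
                pts[-1], vals[-1] = xc, fc
            else:  # shrink every vertex toward the best one
                best = pts[0]
                pts = [best] + [[best[j] + 0.5 * (p[j] - best[j]) for j in range(n)]
                                for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    i = min(range(n + 1), key=vals.__getitem__)
    return pts[i], vals[i]


if __name__ == "__main__":
    clean = simulate(2.0, 2.0, 3.0)   # part 1.): m = 2, c = 2, k = 3, n = 20
    guess = (2.0 / 3.0, 4.0)          # same c/m, k/m as the guess (3, 1, 6)
    rng = random.Random(0)            # fixed seed for reproducibility
    print("  nl   J(guess)     c_est    k_est    J(q_opt)")
    for nl in (0.0, 0.01, 0.02, 0.05, 0.1, 0.2):   # nl = 0 is part 2.)
        data = [y + nl * rng.gauss(0.0, 1.0) for y in clean]   # part 3.)
        (c_est, k_est), j_opt = nelder_mead(lambda q: cost(q, data), guess)
        print(f"{nl:5.2f}  {cost(guess, data):.3e}  {c_est:7.4f}  {k_est:7.4f}  {j_opt:.3e}")
```

Each printed row corresponds to one noise level. For Gaussian noise the expected value of the cost at the true parameters is $n \cdot nl^2$, so one should expect $J(\vec{q}_{op})$ to grow roughly like $nl^2$ as the estimates drift away from $(c, k) = (2, 3)$. Plotting (part 1.)) is omitted; any plotting library can be applied to the output of `simulate`.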

References

[1] R. Aster, B. Borchers and C. Thurber, Parameter Estimation and Inverse Problems, Academic Press, New York, 2004.

[2] H.T. Banks, S. Dediu and S.E. Ernstberger, Sensitivity functions and their uses in inverse problems, J. Inverse and Ill-posed Problems, 15, 2007, pp. 683–708.

[3] H.T. Banks and K. Kunisch, Estimation Techniques for Distributed Parameter Systems, Birkhäuser, Boston, 1989.

[4] W.E. Boyce and R.C. DiPrima, Elementary Differential Equations and Boundary Value Problems, John Wiley & Sons, Inc., Hoboken, 8th ed., 2004.

[5] M. Braun, Differential Equations and Their Applications: An Introduction to Applied Mathematics, Springer, Berlin, 4th ed., 1992.

[6] J.A. David, Optimal Control, Estimation, and Shape Design: Analysis and Applications, Ph.D. Dissertation, North Carolina State University, Raleigh, 2007.

[7] J.M. Gere and S.P. Timoshenko, Mechanics of Materials, PWS Pub. Co., Boston, 4th ed., 1997.

[8] A. Gilat, MATLAB: An Introduction with Applications, John Wiley & Sons, Inc., Hoboken, 2nd ed., 2004.

[9] E. Laporte and P. Le Tallec, Numerical Methods in Sensitivity Analysis and Shape Optimization, Birkhäuser, Boston, 2002.

[10] R.K. Nagle, E.B. Saff and A.D. Snider, Fundamentals of Differential Equations and Boundary Value Problems, Pearson Education, Inc., Boston, 2004.

[11] S.S. Rao, Vibration of Continuous Systems, John Wiley & Sons, Inc., Hoboken, 2007.

[12] A. Saltelli, K. Chan and E.M. Scott, Sensitivity Analysis, John Wiley & Sons, Inc., Hoboken, 2000.


Chapter 3

Mathematical and Statistical Aspects of Inverse Problems

In inverse or parameter estimation problems as discussed in Chapter 2, an important practical question is how successful the mathematical model is in describing the physical or biological phenomena represented by the experimental data. In general, it is very unlikely that the residual sum of squares (RSS) in the inverse least squares formulation is zero. Indeed, due to modeling error, there may not even be a true set of parameters for which the mathematical model provides an exact fit to the experimental data.

Even if one begins with a deterministic model and has no initial interest in uncertainty or stochasticity, as soon as one employs experimental data in the investigation, one is led to uncertainty that should not be ignored. This is because all measurement procedures contain error or uncertainty in the data collection process, and hence statistical questions arise. To correctly formulate, implement and analyze the corresponding inverse problems, one requires a framework entailing a statistical model as well as a mathematical model.

In this chapter we discuss mathematical, statistical and computational aspects of inverse or parameter estimation problems for deterministic dynamical systems based on Maximum Likelihood Estimation (MLE), Ordinary Least Squares (OLS) and Generalized Least Squares (GLS) methods, with appropriate corresponding data noise assumptions of constant variance and nonconstant variance (relative error), respectively, in the latter two cases. Among the topics included here are the interplay between the mathematical model, the statistical model and observation or data assumptions, and some techniques (residual plots and model comparison tests) for analyzing the uncertainties associated with inverse problems employing experimental data. We also outline a standard theory underlying the construction of confidence intervals for parameter estimators. Finally, we illustrate a methodology for statistical “hypothesis” testing that can form the basis of a heuristic approach to the important problem of model improvement. This latter approach, along with a number of examples and its mathematical theory, can be found in the monograph by Banks and Kunisch [8], while the asymptotic theory for confidence intervals can be found in Seber and Wild [17]. A recent summary [2] contains more examples along with extensive references.

Before we begin the inverse problem discussions, we give a brief but useful review of certain basic probability and statistics concepts.

3.1 Probability and Statistics Overview

The theory of probability and statistics is an essential mathematical tool in the development of inverse problem formulations and subsequent analysis, as well as for approaches to statistical hypothesis testing. Our coverage of these fundamental and important topics is brief and limited in scope. Indeed, in this section we provide a few definitions and basic concepts in the theory of probability and statistics that are essential for understanding the estimators, confidence intervals and hypothesis tests formulated later in the chapter.

3.1.1 Probability

We adopt the standard practice of denoting events by capital letters and write the probability of event $A$ as $P(A)$. The set of all possible outcomes, the sample space, will be denoted by $S$. For example, consider the experiment of rolling a die, in which there are six possible outcomes. The sample space is

$$S = \{1, 2, 3, 4, 5, 6\} \tag{3.1}$$

and an event A might be defined as

$$A = \{1, 5\}, \tag{3.2}$$

which consists of the outcomes 1 and 5. Associated with each event $A$ contained in $S$ is its probability $P(A)$. In the case that the sample space $S$ is discrete (finite or countably infinite), probability satisfies the following postulates:

(i) $P(A) \ge 0$,

(ii) $P(S) = 1$,

(iii) If $A_1, A_2, A_3, \ldots$ is a finite or an infinite sequence of disjoint subsets of $S$, then

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots. \tag{3.3}$$

For example, in the experiment of rolling a fair die, each possible outcome has probability $1/6$. The event $A$ defined by (3.2) consists of two disjoint subevents, and hence $P(A) = 2/6 = 1/3$. Using the three postulates of probability, a number of immediate consequences with important applications can also be derived. For example, probabilities cannot exceed 1 ($P(A) \le 1$ for any event $A$), the empty set $\emptyset$ has probability 0 ($P(\emptyset) = 0$), and the probabilities that an event will occur and that it will not occur always add up to 1 ($P(A) + P(\bar{A}) = 1$, where $\bar{A}$ denotes the complement of the event $A$, which consists of all sample points in $S$ that are not in $A$). If $P(A) = 1$, then we say that “the event $A$ occurs with probability 1 or almost surely (a.s.).”
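For a fair die every outcome is equally likely, so $P(A)$ is simply the number of outcomes in $A$ divided by the number in $S$. The short sketch below checks $P(A) = 1/3$, the complement rule, and $P(\emptyset) = 0$ with exact rational arithmetic; representing events as Python `frozenset`s and the names `P`, `A_bar` are illustrative choices, not part of the text.

```python
from fractions import Fraction

S = frozenset(range(1, 7))        # sample space (3.1) for one roll of a die


def P(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))


A = frozenset({1, 5})             # the event (3.2)
A_bar = S - A                     # complement of A

if __name__ == "__main__":
    print("P(A) =", P(A))                          # 1/3
    print("P(A) + P(A_bar) =", P(A) + P(A_bar))    # 1
    print("P(empty set) =", P(frozenset()))        # 0
```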

Instead of considering a single experiment, let us perform two experiments and consider their outcomes. For example, the two experiments may be two separate tosses of a single die or a single toss of two dice. The sample space in this case consists of 36 pairs $(i, j)$, where $i, j = 1, 2, \ldots, 6$. Note that with fair dice, each point in the sample space has probability $1/36$. We now consider the probability of joint events, such as $\{i = 2,\ j \text{ odd}\}$. We begin by denoting the possible outcomes of one experiment by $A_i$, $i = 1, 2, \ldots, n$, and by $B_j$, $j = 1, 2, \ldots, m$, the possible outcomes of the second experiment. The combined experiment has the possible joint outcomes $(A_i, B_j)$, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$. The joint probability $P(A_i, B_j)$ satisfies the condition

$$0 \le P(A_i, B_j) \le 1. \tag{3.4}$$

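If the outcomes $B_j$, $j = 1, 2, \ldots, m$, are mutually exclusive (i.e., $B_i \cap B_j = \emptyset$ for $i \ne j$) and together exhaust all possible outcomes of the second experiment, then

$$\sum_{j=1}^{m} P(A_i, B_j) = P(A_i). \tag{3.5}$$

Furthermore, if the outcomes of each of the two experiments are mutually exclusive and exhaustive, then

$$\sum_{i=1}^{n} \sum_{j=1}^{m} P(A_i, B_j) = 1. \tag{3.6}$$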

The generalization of the above concepts to more than two experiments follows in a straightforward manner.
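The two-dice example can be checked by brute-force enumeration. In the sketch below (exact `fractions.Fraction` arithmetic; the dictionary `joint` and the helper `P` are illustrative choices), the joint event $\{i = 2,\ j \text{ odd}\}$ has probability $3/36 = 1/12$, summing $P(i = 2, j)$ over $j$ recovers the marginal probability $1/6$ as in (3.5), and all joint probabilities sum to 1 as in (3.6).

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint probabilities P(i, j) for two fair dice: all 36 pairs are equally likely.
joint = {(i, j): Fraction(1, 36) for i, j in product(faces, faces)}


def P(pred):
    """Probability of the joint event {(i, j) : pred(i, j) is true}."""
    return sum((p for (i, j), p in joint.items() if pred(i, j)), Fraction(0))


if __name__ == "__main__":
    print(P(lambda i, j: i == 2 and j % 2 == 1))   # 1/12
    print(sum(joint[(2, j)] for j in faces))       # marginal P(i = 2), as in (3.5): 1/6
    print(sum(joint.values()))                     # total probability, as in (3.6): 1
```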

Next, we consider a joint event with probability $P(A, B)$. Assuming that event $A$ has occurred, we wish to determine the probability of the event $B$. This is called the conditional probability of the event $B$ given the occurrence of the event $A$ and is given by

$$P(B \mid A) = \frac{P(A, B)}{P(A)}, \tag{3.7}$$

where $P(A) > 0$. A very useful relationship for conditional probabilities, known as Bayes’ theorem, states that if $A_i$, $i = 1, 2, \ldots, n$, are mutually exclusive events such that

$$\bigcup_{i=1}^{n} A_i = S \tag{3.8}$$

and $B$ is an arbitrary event with $P(B) > 0$, then

$$P(A_i \mid B) = \frac{P(A_i, B)}{P(B)} = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{l=1}^{n} P(B \mid A_l)\,P(A_l)}.$$
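As a small worked instance of Bayes’ theorem, take the two-dice experiment with the partition $A_i = \{\text{first die shows } i\}$, $i = 1, \ldots, 6$, and the event $B = \{\text{total is } 8\}$. The sketch below (an illustrative example, not one from the text; the helper names are hypothetical) computes $P(A_i \mid B)$ from the right-hand side of Bayes’ formula and compares it with the direct ratio $P(A_i, B)/P(B)$; both give $0$ for $i = 1$ and $1/5$ for $i = 2, \ldots, 6$.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, faces))   # 36 equally likely (first, second) pairs


def prob(event):
    """P(event) by counting equally likely outcomes; event is a predicate on pairs."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


def in_B(w):
    """B: the total shown by the two dice is 8."""
    return w[0] + w[1] == 8


def posterior():
    """P(A_i | B) from the right-hand side of Bayes' formula, A_i = {first die is i}."""
    prior = {i: prob(lambda w, i=i: w[0] == i) for i in faces}
    likelihood = {i: prob(lambda w, i=i: w[0] == i and in_B(w)) / prior[i] for i in faces}
    evidence = sum(likelihood[j] * prior[j] for j in faces)   # denominator, equals P(B)
    return {i: likelihood[i] * prior[i] / evidence for i in faces}


if __name__ == "__main__":
    post = posterior()
    for i in faces:
        direct = prob(lambda w: w[0] == i and in_B(w)) / prob(in_B)   # P(A_i, B) / P(B)
        print(i, post[i], direct)
```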

3.1.2 Random Variables

In most applications of probability theory, we are interested only in a particular aspect of the outcome of an experiment. For example, in the experiment of rolling a pair of dice, we are generally interested only in the total and not in the outcome for each die. In the language of probability and statistics, the total which we obtain with a pair of dice is called a random variable. More formally, the random variable $X(A)$ represents the functional relationship between a random event $A$ and a real number. For example, if we flip a coin, the possible outcomes are heads, $H$, or tails, $T$. We may define a random variable $X(A)$ by

$$X(A) = \begin{cases} 1, & A = H, \\ -1, & A = T. \end{cases} \tag{3.9}$$

We note that a random variable may be continuous or discrete. Associated with a random variable $X$, we consider the event $\{X \le x\}$, where $-\infty < x < \infty$. The probability of this event is defined by

$$F(x) = P(X \le x), \tag{3.10}$$

where the function $F(x)$ is called the probability distribution function of the random variable $X$. It is also called the cumulative distribution function (cdf). The distribution function is right continuous and has the following properties:

(i) $0 \le F(x) \le 1$,

(ii) $F(x_1) \le F(x_2)$ if $x_1 \le x_2$,

(iii) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$,

(iv) $F(\infty) = \lim_{x \to \infty} F(x) = 1$.

The derivative $p(x)$ (when it exists) of the distribution function $F(x)$, given by

$$p(x) = \frac{dF(x)}{dx}, \tag{3.11}$$

is called the probability density function (pdf). The name “density function” comes from the fact that the probability of the event $\{x_1 \le X \le x_2\}$ is given by

$$P(x_1 \le X \le x_2) = P(X \le x_2) - P(X \le x_1) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} p(x)\,dx.$$

The probability density function $p(x)$ satisfies the following properties:

(i) $p(x) \ge 0$,

(ii) $\int_{-\infty}^{\infty} p(x)\,dx = F(\infty) - F(-\infty) = 1$.

Moreover, it is common to denote random variables by capital letters $X, Y, Z$, etc., and particular realizations by the corresponding lowercase letters $x, y, z$, etc.

3.1.3 Statistical Averages of Random Variables

Of particular importance in the characterization of the outcomes of experiments and random variables are the concepts of first and second moments of a single random variable and the joint moments (correlation and covariance) between any pair of random variables in a multi-dimensional set of random variables.

We begin the discussion of these statistical averages by considering first a single random variable $X$ and its pdf $p(x)$. The mean value $\mu$ or expected value of the random variable $X$ is defined by

$$\mu = E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx, \tag{3.12}$$

where $E(\cdot)$ is called the expected value operator (or statistical averaging operator). This is the first moment of the random variable $X$. The $n$th moment of a probability distribution of a random variable $X$ is defined as

$$E(X^n) = \int_{-\infty}^{\infty} x^n p(x)\,dx. \tag{3.13}$$

We can also define the central moments, which are the moments of the difference between $X$ and $\mu$. The second central moment, which is called the variance of $X$, is defined by

$$\sigma^2 = \mathrm{var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 p(x)\,dx. \tag{3.14}$$

The square root $\sigma$ of the variance of $X$ is called the standard deviation of $X$. Variance is a measure of the “randomness” of the random variable $X$. It is related to the first and second moments through the relationship

$$\sigma^2 = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$
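For a discrete random variable the integrals in (3.12) to (3.14) become sums over the possible values. The sketch below (the helper `moment` is an illustrative name) applies them to the fair die of Section 3.1.1 with exact fractions: $\mu = 7/2$, $E(X^2) = 91/6$, and the variance computed directly as a second central moment, $35/12$, agrees with $E(X^2) - \mu^2$.

```python
from fractions import Fraction


def moment(pmf, n, about=0):
    """E[(X - about)^n] for a discrete distribution given as {value: probability}."""
    return sum(p * (x - about) ** n for x, p in pmf.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

if __name__ == "__main__":
    mu = moment(die, 1)               # first moment, cf. (3.12): 7/2
    second = moment(die, 2)           # second moment, cf. (3.13): 91/6
    var = moment(die, 2, about=mu)    # second central moment, cf. (3.14): 35/12
    print(mu, second, var, second - mu ** 2)
```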

In the important case of multi-dimensional or $\mathbb{R}^p$-valued vector random variables $X = (X_1, X_2, \ldots, X_p)^T$, we can define joint moments of any order. However, the joint moments that are most useful in practical applications are those defined by

$$E(X_i X_j) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_i x_j\,p(x_i, x_j)\,dx_i\,dx_j, \tag{3.15}$$

which are called the correlation (not to be confused with correlation coefficients) between the random variables $X_i$ and $X_j$. Here, $p(x_i, x_j)$ are the marginal densities defined by

$$p(x_i, x_j) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} p(x_1, \ldots, x_p)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_{j-1}\,dx_{j+1} \cdots dx_p.$$

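Also of particular importance is the joint central moment, which is also called the covariance of $X_i$ and $X_j$ and is given by

$$\begin{aligned} \mu_{ij} \equiv E[(X_i - \mu_i)(X_j - \mu_j)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j)\,p(x_i, x_j)\,dx_i\,dx_j \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_i x_j\,p(x_i, x_j)\,dx_i\,dx_j - \mu_i\mu_j \\ &= E(X_i X_j) - \mu_i\mu_j. \end{aligned} \tag{3.16}$$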

The $(p \times p)$ matrix with elements $\mu_{ij}$ is called the covariance matrix of the random variable $X = (X_1, \ldots, X_p)$. Two random variables are said to be uncorrelated if $E(X_i X_j) = E(X_i)E(X_j) = \mu_i\mu_j$; in that case, the covariance $\mu_{ij} = 0$. We also note that when $X_i$ and $X_j$ are statistically independent, they are uncorrelated. The converse, however, is not true: if $X_i$ and $X_j$ are uncorrelated, they are not necessarily statistically independent.
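A standard counterexample shows why uncorrelated does not imply independent. Take $X$ uniformly distributed on $\{-1, 0, 1\}$ and $Y = X^2$; then $E(XY) = E(X^3) = 0 = E(X)E(Y)$, so the covariance is zero, yet $Y$ is completely determined by $X$. The short check below (exact fractions; the helper names `E` and `P` are illustrative) computes the covariance and compares $P(X = 0, Y = 0) = 1/3$ with $P(X = 0)P(Y = 0) = 1/9$.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1} and Y = X^2, stored as a joint pmf {(x, y): probability}.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def P(pred):
    """Probability of the event {pred(X, Y) is true}."""
    return sum((p for (x, y), p in joint.items() if pred(x, y)), Fraction(0))


if __name__ == "__main__":
    mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
    cov = E(lambda x, y: x * y) - mu_x * mu_y       # covariance as in (3.16)
    print("cov(X, Y) =", cov)                        # 0: uncorrelated
    # Independence would need P(X=0, Y=0) = P(X=0) P(Y=0); here 1/3 versus 1/9.
    print(P(lambda x, y: x == 0 and y == 0),
          P(lambda x, y: x == 0) * P(lambda x, y: y == 0))
```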

3.1.4 Special Probability Distributions

We say a continuous random variable has a certain distribution (e.g., a Gaussian distribution) when it has the corresponding probability density. In this section we review several frequently encountered random variables, their pdfs, and their moments.

Uniform distribution. The pdf of a uniformly distributed random variable $X$ is given by

$$p(x) = \begin{cases} 1/(b - a), & a \le x \le b, \\ 0, & \text{otherwise}. \end{cases} \tag{3.17}$$

This is also called a rectangular distribution, and its graph is depicted in Figure 3.1. The first two moments of $X$ are

$$E(X) = \mu = \frac{a + b}{2}, \qquad E(X^2) = \frac{a^2 + ab + b^2}{3},$$

and the variance is

$$\mathrm{var}(X) = \sigma^2 = \frac{(b - a)^2}{12}. \tag{3.18}$$

FIGURE 3.1: Plot of the pdf $p(x)$ of a uniform distribution (constant height $1/(b - a)$ on $[a, b]$).
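The moments of the uniform distribution can be checked by integrating the pdf (3.17) numerically. The sketch below uses a composite midpoint rule (an illustrative choice of quadrature; `midpoint_integral` and `uniform_moments` are hypothetical helper names) on $[a, b] = [1, 4]$, where the formulas give $E(X) = 2.5$, $E(X^2) = 7$, and $\mathrm{var}(X) = 0.75$.

```python
def midpoint_integral(f, lo, hi, n=100_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def uniform_moments(a, b):
    """Numerical E(X), E(X^2), var(X) for X uniform on [a, b], using the pdf (3.17)."""
    p = 1.0 / (b - a)   # constant density on [a, b]
    mean = midpoint_integral(lambda x: x * p, a, b)
    second = midpoint_integral(lambda x: x * x * p, a, b)
    return mean, second, second - mean ** 2


if __name__ == "__main__":
    a, b = 1.0, 4.0
    mean, second, var = uniform_moments(a, b)
    print(mean, (a + b) / 2)
    print(second, (a * a + a * b + b * b) / 3)
    print(var, (b - a) ** 2 / 12)
```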

Gaussian (normal) distribution. The pdf of a Gaussian or normally distributed random variable is given by

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)}, \tag{3.19}$$

where $\mu$ is the mean and $\sigma^2$ is the variance of the random variable. The pdf of a Gaussian distributed random variable is illustrated in Figure 3.2. The probability distribution function $F(x)$ has the form

$$F(x) = \int_{-\infty}^{x} p(s)\,ds = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-(s - \mu)^2/(2\sigma^2)}\,ds = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sqrt{2}\,\sigma}\right)\right], \tag{3.20}$$

where erf denotes the error function, given by

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-s^2}\,ds. \tag{3.21}$$

FIGURE 3.2: The pdf graph of a Gaussian distributed random variable.
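In practice, (3.20) is how the Gaussian cdf is usually evaluated, since most numerical libraries provide the error function (3.21). The Python sketch below uses the standard library's `math.erf` and cross-checks (3.20) against a midpoint-rule integral of the pdf (3.19); the values $\mu = 1$, $\sigma = 2$, the truncation of the integral at $\mu - 10\sigma$, and the number of quadrature points are illustrative assumptions. At $x = \mu$ both give $0.5$, and at $x = \mu + 1.96\sigma$ both give approximately $0.975$.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density (3.19)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian distribution function (3.20), via the error function (3.21)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * sigma)))


def cdf_by_quadrature(x, mu=0.0, sigma=1.0, n=50_000):
    """Midpoint-rule integral of the pdf from mu - 10*sigma (tail neglected) to x."""
    lo = mu - 10.0 * sigma
    h = (x - lo) / n
    return h * sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(n))


if __name__ == "__main__":
    mu, sigma = 1.0, 2.0
    for x in (mu - sigma, mu, mu + 1.96 * sigma):
        print(f"x = {x:5.2f}   (3.20): {normal_cdf(x, mu, sigma):.6f}   "
              f"quadrature: {cdf_by_quadrature(x, mu, sigma):.6f}")
```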

The $k$th central moment of the random variable $X$ is given by the expression

$$E[(X - \mu)^k] \equiv m_k = \begin{cases} 1 \cdot 3 \cdots (k - 1)\,\sigma^k, & k \text{ even}, \\ 0, & k \text{ odd}, \end{cases} \tag{3.22}$$

and the $k$th moments are given in terms of the central moments by

$$E(X^k) = \sum_{i=0}^{k} \binom{k}{i} \mu^i m_{k-i}. \tag{3.23}$$

Finally, if $X$ is a random variable distributed normally with mean $\mu$ and variance $\sigma^2$, this is commonly denoted by $X\sim\mathcal N(\mu,\sigma^2)$.
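Formulas (3.22) and (3.23) translate directly into code. A minimal sketch follows (the function names are hypothetical); for example, it reproduces $E(X^2)=\mu^2+\sigma^2$ and $E(X^4)=\mu^4+6\mu^2\sigma^2+3\sigma^4$.

```python
from math import comb

def central_moment(k, sigma):
    """m_k = E[(X - mu)^k] for X ~ N(mu, sigma^2), eq. (3.22)."""
    if k % 2 == 1:
        return 0.0
    prod = 1
    for odd in range(1, k, 2):      # 1 * 3 * ... * (k - 1); empty product for k = 0
        prod *= odd
    return prod * sigma ** k

def raw_moment(k, mu, sigma):
    """E(X^k) = sum_i C(k, i) mu^i m_{k-i}, eq. (3.23)."""
    return sum(comb(k, i) * mu ** i * central_moment(k - i, sigma) for i in range(k + 1))

print(raw_moment(2, 1.5, 2.0))   # mu^2 + sigma^2 = 6.25
print(raw_moment(4, 1.0, 1.0))   # mu^4 + 6 mu^2 sigma^2 + 3 sigma^4 = 10
```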

Log-normal distribution. A widely employed model for biological (and other) phenomena where the random variable is only allowed positive values is the so-called log-normal distribution. If $\log X\sim\mathcal N(\mu,\sigma^2)$, then $X$ has a log-normal distribution with density given by

$$
p(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\,\frac1x\exp\left\{-\frac{(\log x-\mu)^2}{2\sigma^2}\right\},\qquad 0<x<\infty,
\tag{3.24}
$$

with mean and variance

$$
E(X)=e^{\mu+\sigma^2/2},
\tag{3.25}
$$

$$
\operatorname{var}(X)=\left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}.
\tag{3.26}
$$

We observe that $\operatorname{var}(X)$ is proportional to $E(X)^2$, so the coefficient of variation (CV), defined by $\sqrt{\operatorname{var}(X)}/E(X)$ and representing the "noise-to-signal" ratio, is constant: it equals $\sqrt{e^{\sigma^2}-1}$ and does not depend on $E(X)$. The density of this random variable is skewed (asymmetric) with a "long right tail," but it becomes more and more symmetric as $\sigma\to0$.
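The constancy of the CV can be seen numerically with the sketch below, which evaluates (3.25) and (3.26) for several values of $\mu$ at a fixed, illustrative $\sigma=0.3$.

```python
import math

def lognormal_mean_var(mu, sigma):
    """Mean and variance of X with log X ~ N(mu, sigma^2), eqs. (3.25)-(3.26)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    var = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, var

def coefficient_of_variation(mu, sigma):
    mean, var = lognormal_mean_var(mu, sigma)
    return math.sqrt(var) / mean

sigma = 0.3
cvs = [coefficient_of_variation(mu, sigma) for mu in (0.0, 1.0, 5.0)]
print(cvs)   # three equal values, sqrt(exp(0.09) - 1), about 0.3069
```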

Multivariate normal distribution. One of the most often encountered multivariate random variables is also (as we shall see below in discussing asymptotic theory for confidence intervals) extremely important in statistical modeling and inference; it is known as the multivariate normal or multinormal random variable. A random vector $X=(X_1,\dots,X_p)^T$ has a multivariate ($p$-variate) normal distribution (denoted by $X\sim\mathcal N_p(\mu,\Sigma)$) if $\alpha^TX$ is normal for all $\alpha\in\mathbb R^p$; when $\Sigma$ is nonsingular, its density is given by

$$
p(x)=(2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\left\{-(x-\mu)^T\Sigma^{-1}(x-\mu)/2\right\}
$$

for $x=(x_1,\dots,x_p)^T\in\mathbb R^p$, where the mean is

$$
\mu=E(X)=(\mu_1,\dots,\mu_p)^T=\big(E(X_1),\dots,E(X_p)\big)^T
$$

and the covariance matrix is

$$
\Sigma=E\big[(X-\mu)(X-\mu)^T\big].
$$

The $(p\times p)$ covariance matrix $\Sigma$ is such that

$$
\Sigma_{jj}=\operatorname{var}(X_j),\qquad \Sigma_{jk}=\Sigma_{kj}=\operatorname{cov}(X_j,X_k).
$$

Finally, we note that the marginal probability densities are univariate normal.
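One common way to generate draws from $\mathcal N_p(\mu,\Sigma)$, shown here as an illustrative sketch rather than a method from the text, uses a Cholesky factor $\Sigma=LL^T$: if $Z\sim\mathcal N_p(0,I)$ then $\mu+LZ$ has mean $\mu$ and covariance $LL^T=\Sigma$. The code also evaluates the density above; the particular $\mu$, $\Sigma$, seed, and sample size are arbitrary.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_p(mu, Sigma) at x; Sigma must be symmetric positive definite."""
    d = np.asarray(x, dtype=float) - mu
    quad = d @ np.linalg.solve(Sigma, d)          # (x - mu)^T Sigma^{-1} (x - mu)
    p = len(mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** p * np.linalg.det(Sigma))

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)                     # lower triangular, Sigma = L L^T
Z = rng.standard_normal((100_000, 2))             # rows are independent N_2(0, I) draws
X = mu + Z @ L.T                                  # each row is one draw of X^T

print(X.mean(axis=0))                             # close to mu
print(np.cov(X, rowvar=False))                    # close to Sigma
print(mvn_pdf(mu, mu, Sigma))                     # peak value (2 pi)^{-1} |Sigma|^{-1/2}
```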

Chi-square distribution. The chi-square distribution is important in statistical analysis of variance (ANOVA) and other statistical procedures [10, 14] based on normally distributed random variables. In particular, a chi-square distributed random variable is related to the normally distributed random variable through a transformation. That is, if $X$ is a normally distributed random variable, then $Y=X^2$ has a chi-square distribution. There are two types of chi-square distributions. A central chi-square distribution is obtained when $X$ has zero mean; otherwise, we call it a non-central chi-square distribution.

First, let us consider the central chi-square distribution. In this case, the pdf of $Y$ has the form

$$
p(y)=\frac{1}{\sqrt{2\pi y}\,\sigma}\,e^{-y/(2\sigma^2)},
\tag{3.27}
$$

where $y>0$. The corresponding probability distribution function $F(y)$ is given by

$$
F(y)=\frac{1}{\sqrt{2\pi}\,\sigma}\int_0^y\frac{1}{\sqrt s}\,e^{-s/(2\sigma^2)}\,ds.
\tag{3.28}
$$

More generally, suppose that the random variable $Y$ is defined as

$$
Y=\sum_{i=1}^{k}X_i^2,
\tag{3.29}
$$

where $X_i$, $i=1,2,\dots,k$, are statistically independent and identically distributed normal random variables with zero mean and variance $\sigma^2$. The pdf is then given by

$$
p(y)=\frac{1}{\sigma^k\,2^{k/2}\,\Gamma(k/2)}\,y^{k/2-1}e^{-y/(2\sigma^2)},\qquad y>0,
\tag{3.30}
$$

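where $\Gamma$ denotes the gamma function, defined for $\nu>0$ by

$$
\Gamma(\nu)=\int_0^\infty x^{\nu-1}e^{-x}\,dx;
\tag{3.31}
$$

particular values are $\Gamma(1/2)=\sqrt\pi$ and $\Gamma(3/2)=\sqrt\pi/2$. By integration by parts, it can be shown that

$$
\Gamma(\nu)=(\nu-1)\,\Gamma(\nu-1),\qquad \nu>1.
\tag{3.32}
$$

For positive integer $\nu$ we obtain

$$
\Gamma(\nu)=(\nu-1)!.
\tag{3.33}
$$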

This pdf, which is a generalization of (3.27), is called a chi-square (or gamma) pdf with $k$ degrees of freedom (denoted $Y\sim\chi^2(k)$ or $Y\sim\chi^2_k$). Its graphs for several values of $k$ are depicted in Figure 3.3. The first two moments of $Y$ are

$$
E(Y)=k\sigma^2,\qquad E(Y^2)=2k\sigma^4+k^2\sigma^4,
$$

and its variance is

$$
\operatorname{var}(Y)=2k\sigma^4.
\tag{3.34}
$$

FIGURE 3.3: The pdf graph of a chi-square distribution for various degrees of freedom k (curves for k = 1, 2, 4, 6, 10 with σ² = 1).
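The sketch below evaluates the pdf (3.30) with Python's `math.gamma` and recovers the normalization, $E(Y)=k\sigma^2$, and (3.34) by a midpoint-rule quadrature. The choice $k=4$, $\sigma=1.5$, the truncation at $y=200$, and the grid size are illustrative assumptions; the block also checks the gamma-function values quoted after (3.30).

```python
import math

def chi2_pdf(y, k, sigma=1.0):
    """Eq. (3.30): pdf of Y = X_1^2 + ... + X_k^2 with X_i i.i.d. N(0, sigma^2)."""
    if y <= 0.0:
        return 0.0
    c = 1.0 / (sigma ** k * 2.0 ** (k / 2.0) * math.gamma(k / 2.0))
    return c * y ** (k / 2.0 - 1.0) * math.exp(-y / (2.0 * sigma ** 2))

def chi2_moments_numeric(k, sigma, upper=200.0, n=200_000):
    """Midpoint-rule estimates of total mass, E(Y) and var(Y) on (0, upper)."""
    h = upper / n
    mass = m1 = m2 = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        w = chi2_pdf(y, k, sigma) * h
        mass += w
        m1 += y * w
        m2 += y * y * w
    return mass, m1, m2 - m1 ** 2

# Gamma-function facts quoted in the text, eqs. (3.31)-(3.33)
assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12
assert abs(math.gamma(1.5) - math.sqrt(math.pi) / 2.0) < 1e-12
assert math.gamma(5) == math.factorial(4)

k, sigma = 4, 1.5
mass, mean, var = chi2_moments_numeric(k, sigma)
print(mass, mean, var)   # close to 1, k*sigma^2 = 9, 2*k*sigma^4 = 40.5
```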

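We now turn to the non-central chi-square distribution. Here, let $X_i$, $i=1,2,\dots,k$, be statistically independent Gaussian distributed random variables with means $\mu_i$ and identical variances equal to $\sigma^2$. The random variable $Y=\sum_{i=1}^{k}X_i^2$ has the pdf

$$
p(y)=\frac{1}{2\sigma^2}\left(\frac{y}{s^2}\right)^{(k-2)/4}e^{-(s^2+y)/(2\sigma^2)}\,I_{k/2-1}\!\left(\frac{\sqrt{y}\,s}{\sigma^2}\right),\qquad y>0,
\tag{3.35}
$$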

where the parameter $s^2$, which is called the noncentrality parameter of the distribution, is given by

$$
s^2=\sum_{i=1}^{k}\mu_i^2.
$$

The function $I_\alpha(x)$ is the $\alpha$th-order modified Bessel function of the first kind and is given by

$$
I_\alpha(x)=\sum_{j=0}^{\infty}\frac{(x/2)^{\alpha+2j}}{j!\,\Gamma(\alpha+j+1)},\qquad x\ge0.
\tag{3.36}
$$


The pdf given by (3.35) is called the non-central chi-square pdf with $k$ degrees of freedom.

Finally, the first two moments of a non-central chi-square distributed random variable are

$$
E(Y)=k\sigma^2+s^2,\qquad E(Y^2)=2k\sigma^4+4\sigma^2s^2+(k\sigma^2+s^2)^2,
$$

and its variance is

$$
\operatorname{var}(Y)=2k\sigma^4+4\sigma^2s^2.
$$
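To check (3.35) and (3.36) together, the sketch below sums the Bessel series (3.36) with a term-ratio recursion (an implementation choice of ours that avoids overflowing $j!$ and $\Gamma(\alpha+j+1)$) and integrates the resulting pdf numerically. The values $k=4$, $s^2=3$, $\sigma=1$ and the quadrature grid are illustrative.

```python
import math

def bessel_i(alpha, x, max_terms=1000):
    """Modified Bessel function I_alpha(x), x > 0, from the series (3.36).
    Term j+1 = term j * (x/2)^2 / ((j+1)(alpha+j+1)), since Gamma(a+1) = a Gamma(a)."""
    term = (x / 2.0) ** alpha / math.gamma(alpha + 1.0)   # j = 0 term
    total = term
    for j in range(max_terms):
        term *= (x / 2.0) ** 2 / ((j + 1) * (alpha + j + 1.0))
        total += term
        if j > x and term < 1e-17 * total:                # terms are decreasing and negligible
            break
    return total

def ncx2_pdf(y, k, s2, sigma=1.0):
    """Non-central chi-square pdf (3.35) with noncentrality parameter s2 > 0."""
    if y <= 0.0:
        return 0.0
    s = math.sqrt(s2)
    return ((1.0 / (2.0 * sigma ** 2)) * (y / s2) ** ((k - 2) / 4.0)
            * math.exp(-(s2 + y) / (2.0 * sigma ** 2))
            * bessel_i(k / 2.0 - 1.0, math.sqrt(y) * s / sigma ** 2))

k, s2, sigma = 4, 3.0, 1.0
h, n = 0.004, 30_000                        # midpoint rule on (0, 120)
ys = [(i + 0.5) * h for i in range(n)]
ws = [ncx2_pdf(y, k, s2, sigma) * h for y in ys]
mass = sum(ws)
mean = sum(y * w for y, w in zip(ys, ws))
var = sum(y * y * w for y, w in zip(ys, ws)) - mean ** 2
print(mass, mean, var)   # close to 1, k*sigma^2 + s2 = 7, 2k*sigma^4 + 4*sigma^2*s2 = 20
```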

Rayleigh distribution. Another frequently encountered random variable, closely related to the central chi-square distribution, is the Rayleigh distribution. To begin the discussion, let us consider a central chi-square distribution with two degrees of freedom, $Y=X_1^2+X_2^2$, where the $X_i$ are zero mean, statistically independent Gaussian random variables with identical variances $\sigma^2$. The pdf of $Y$ is given by

$$
p(y)=\frac{1}{2\sigma^2}\,e^{-y/(2\sigma^2)}.
\tag{3.37}
$$

Define a new variable $Z$ as

$$
Z=\sqrt{X_1^2+X_2^2}=\sqrt{Y}.
$$

Then, after a change of variables in equation (3.37), we obtain the pdf of $Z$ as

$$
p(z)=\frac{z}{\sigma^2}\,e^{-z^2/(2\sigma^2)},\qquad z\ge0,
$$

which is known as the pdf of a Rayleigh distributed random variable. The corresponding probability distribution function is given by

$$
F(z)=\int_0^z\frac{s}{\sigma^2}\,e^{-s^2/(2\sigma^2)}\,ds=1-e^{-z^2/(2\sigma^2)},\qquad z\ge0.
$$

The moments of $Z$ are

$$
E(Z^k)=(2\sigma^2)^{k/2}\,\Gamma\!\left(1+\frac k2\right),
$$

and the variance is given by

$$
\operatorname{var}(Z)=\left(2-\frac{\pi}{2}\right)\sigma^2.
$$
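A short sketch evaluating the moment formula and comparing it with a Monte Carlo simulation of $Z=\sqrt{X_1^2+X_2^2}$; the value $\sigma=1.7$, the seed, and the sample size are illustrative.

```python
import math
import random

def rayleigh_moment(k, sigma):
    """E(Z^k) = (2 sigma^2)^(k/2) * Gamma(1 + k/2)."""
    return (2.0 * sigma ** 2) ** (k / 2.0) * math.gamma(1.0 + k / 2.0)

sigma = 1.7
mean = rayleigh_moment(1, sigma)                 # = sigma * sqrt(pi / 2)
var = rayleigh_moment(2, sigma) - mean ** 2      # = (2 - pi/2) * sigma^2

# Monte Carlo check: Z = sqrt(X1^2 + X2^2) with X1, X2 i.i.d. N(0, sigma^2)
rng = random.Random(0)
n = 200_000
zs = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]
mc_mean = sum(zs) / n
mc_var = sum((z - mc_mean) ** 2 for z in zs) / (n - 1)
print(mean, mc_mean, var, mc_var)
```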


Student's t distribution. If $U\sim\mathcal N(0,1)$ and $V\sim\chi^2(k)$ are independent, then $X=U/\sqrt{V/k}$ has a $t$ distribution with $k$ degrees of freedom (denoted by $X\sim t_k$) and density function

$$
p(x)=\frac{\Gamma\big((k+1)/2\big)}{\Gamma(k/2)}\,\frac{1}{\sqrt{k\pi}}\,\frac{1}{(1+x^2/k)^{(k+1)/2}},\qquad -\infty<x<\infty.
$$

The mean and variance are given by

$$
E(X)=0 \ \text{ if } k>1 \ \text{(otherwise undefined)}
$$

and

$$
\operatorname{var}(X)=k/(k-2) \ \text{ if } k>2 \ \text{(otherwise undefined)}.
$$

The corresponding density is symmetric like that of the normal, but with "heavier tails," and it approaches the normal density as $k\to\infty$. As we shall see below, the Student's t distribution is fundamental to the computation of confidence intervals for estimated parameters using experimental data in inverse problems (where typically $k\gg2$).
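The heavier tails and the normal limit can be seen numerically. The sketch below (illustrative; $k=5$ and the integration range are our choices) evaluates the density above through `math.lgamma`, which avoids overflow of $\Gamma$ for large $k$, and checks its normalization and the variance $k/(k-2)$.

```python
import math

def t_pdf(x, k):
    """Student's t density with k degrees of freedom, computed with log-gamma."""
    log_c = math.lgamma((k + 1) / 2.0) - math.lgamma(k / 2.0) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c) * (1.0 + x * x / k) ** (-(k + 1) / 2.0)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Normalization and variance by the midpoint rule on (-L, L)
k, L, n = 5, 200.0, 200_000
h = 2.0 * L / n
xs = [-L + (i + 0.5) * h for i in range(n)]
mass = sum(t_pdf(x, k) for x in xs) * h
var = sum(x * x * t_pdf(x, k) for x in xs) * h

print(mass, var)                              # about 1 and 5/3
print(t_pdf(4.0, 5), normal_pdf(4.0))         # heavier tail: the t density is much larger at x = 4
print(t_pdf(0.0, 10_000), normal_pdf(0.0))    # nearly equal for large k
```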

3.2 Parameter Estimation or Inverse Problems

3.2.1 The Mathematical Model

We consider inverse or parameter estimation problems in the context of a parameterized (with vector parameter $\vec q$) dynamical system or mathematical model

$$
\frac{d\vec z}{dt}(t)=\vec g(t,\vec z(t),\vec q)
\tag{3.38}
$$

with observation process

$$
\vec y(t)=C\vec z(t;\vec q).
\tag{3.39}
$$

The mathematical model is a deterministic system (here we treat ordinary differential equations, but our discussion is relevant to problems involving parameter-dependent partial differential equations, delay differential equations, etc., as long as the system is assumed to be well-posed, i.e., to possess unique solutions that depend smoothly on the parameters and initial data). Following the usual convention (which corresponds to the form of data usually available from experiments), we assume a discrete form of the observations in which one has $n$ longitudinal observations $\vec y_j$ corresponding to

$$
\vec y(t_j)=C\vec z(t_j;\vec q),\qquad j=1,\dots,n,
\tag{3.40}
$$

where $C$ is an observation operator that will be described below. In general, the corresponding observations or data $\vec y_j$ will not be exactly $\vec y(t_j)$. Because of the nature of the phenomena leading to this discrepancy, we treat this uncertainty pertaining to the observations with a statistical model for the observation process.

3.2.2 The Statistical Model

In our discussion here we consider a statistical model of the form

$$
\vec Y_j=\vec f(t_j,\vec q_0)+\vec{\mathcal E}_j,\qquad j=1,\dots,n,
\tag{3.41}
$$

where $\vec f(t_j,\vec q)=C\vec z(t_j;\vec q)$, $j=1,\dots,n$, corresponds to the observed part of the solution of the mathematical model (3.38) at the $j$th covariate or observation time for a particular vector of parameters $\vec q\in\mathbb R^p$; here $\vec z\in\mathbb R^N$, $\vec f\in\mathbb R^m$, and $C$ is an $m\times N$ matrix. The term $\vec q_0$ represents the "truth," i.e., the parameters that generate the observations $\{\vec Y_j\}_{j=1}^n$. (The existence of a truth parameter $\vec q_0$ is a standard assumption in statistical formulations, and this, along with the assumption that the means $E[\vec{\mathcal E}_j]$ are zero, implicitly yields that (3.38) is a correct description of the process being modeled.) The terms $\vec{\mathcal E}_j$ are random variables which can represent observation or measurement error, "system fluctuations," or other phenomena that cause observations not to fall exactly on the points $\vec f(t_j,\vec q)$ of the smooth path $\vec f(t,\vec q)$. Since these fluctuations are unknown to the modeler, we will assume that realizations $\vec\varepsilon_j$ of $\vec{\mathcal E}_j$ are generated from a probability distribution (with mean zero throughout our discussion) that reflects the assumptions regarding these phenomena. Thus specific data (realizations) corresponding to (3.41) will be represented by

$$
\vec y_j=\vec f(t_j,\vec q_0)+\vec\varepsilon_j,\qquad j=1,\dots,n.
\tag{3.42}
$$

Assumptions about the distribution for $\vec{\mathcal E}_j$ must be problem specific. For instance, in a statistical model for the pharmacokinetics of drugs in human blood samples, a natural distribution for $\vec{\mathcal E}=(\mathcal E_1,\dots,\mathcal E_n)^T$ might be a multivariate normal distribution. In other applications the distribution for $\vec{\mathcal E}$ might be much more complicated [11]. For example, in observing (counting) populations, the error may well depend on the size of the population itself (i.e., so-called relative error, to be discussed below).

To relate the notation and formulations of this chapter to those of the inverse problems introduced in the previous chapter, we observe that $y_j=y^d_j$ is the data and $f(t_j,\vec q)=y_{\rm mod}(t_j;m,c,k)=y_m(t_j;m,c,k)$ in (2.12). Moreover, in the computational project of Chapter 2, $\vec q=(m,c,k)$ and $\vec z=(y,\dot y)^T$, so that $f(t_j,\vec q)=C\vec z$ where $C=(1\ \ 0)$.

The purpose of our presentation is to discuss methodology related to the estimation of the true value of the parameters $\vec q_0$ from a set $Q$ of admissible parameters, and its dependence on what is assumed about the variance $\operatorname{var}(\vec{\mathcal E}_j)$ of the error $\vec{\mathcal E}_j$. We discuss three inverse problem methodologies that can be used to calculate estimates $\hat q$ for $\vec q_0$: the ordinary least-squares (OLS) and the generalized least-squares (GLS) formulations, as well as the popular maximum likelihood estimate (MLE) formulation in the case where one assumes the distributions of the error process $\vec{\mathcal E}_j$ are known.

3.2.3 Known Error Processes: Maximum Likelihood Estimators

In the introduction of the statistical model we initially made no mention of the probability distribution that generates the error realizations $\vec\varepsilon_j$. In many situations one readily assumes that the errors $\vec{\mathcal E}_j$, $j=1,\dots,n$, are independent and identically distributed (we make the standing assumption of independence across $j$ throughout our discussion in this chapter). We discuss a case where one is able to make further assumptions on the error, namely that its distribution is known. In this case, maximum likelihood techniques may be used. We first discuss one such case for a scalar observation system, i.e., $m=1$. If $\mathcal E_j$ is assumed to have a known density $p_{\vec\theta}(\varepsilon)=p(\varepsilon;\vec\theta)$ that depends on a parameter vector $\vec\theta$, then for the statistical model (3.41) with observations $\vec Y$, the associated likelihood function is defined by

$$
L(\vec\theta\,|\,\vec Y)=\prod_{j=1}^{n}p\big(Y_j-f(t_j;\vec q);\vec\theta\big).
\tag{3.43}
$$

In particular, one often assumes that $\vec\theta=(\mu,\sigma)$ and $p=p(\varepsilon;\mu,\sigma^2)$, so that the density is completely characterized by its mean and variance. If we further assume that $\mathcal E_j$ has known density $p(\varepsilon;0,\sigma_0^2)$, then from the statistical model (3.41) we have that $Y_j-f(t_j,\vec q_0)$ has density $p(\varepsilon;0,\sigma_0^2)$, or equivalently $Y_j$ has density $p(y;f(t_j,\vec q_0),\sigma_0^2)$. The corresponding likelihood function is

$$
L(\vec q,\sigma^2\,|\,\vec Y)=\prod_{j=1}^{n}p\big(Y_j-f(t_j;\vec q);0,\sigma^2\big).
\tag{3.44}
$$

3.2.3.1 Normally Distributed Errors

If, in addition, there is sufficient evidence to suspect that the error is generated by a normal distribution, then we may be willing to assume $\mathcal E_j\sim\mathcal N(0,\sigma_0^2)$, and hence $Y_j\sim\mathcal N(f(t_j,\vec q_0),\sigma_0^2)$. We can then obtain an expression for determining $\vec q_0$ and $\sigma_0$ by seeking the maximum over $(\vec q,\sigma^2)\in Q\times(0,\infty)$ of the likelihood function for $\mathcal E_j=Y_j-f(t_j,\vec q)$, which is defined by

$$
L(\vec q,\sigma^2\,|\,\vec Y)=\prod_{j=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2\sigma^2}[Y_j-f(t_j,\vec q)]^2\right\}.
\tag{3.45}
$$

The resulting solutions $q_{\rm MLE}$ and $\sigma^2_{\rm MLE}$ are the maximum likelihood estimators (MLEs) for $\vec q_0$ and $\sigma_0^2$, respectively. We point out that these solutions $q_{\rm MLE}=q^n_{\rm MLE}(\vec Y)$ and $\sigma^2_{\rm MLE}=\sigma^{2,n}_{\rm MLE}(\vec Y)$ are random variables by virtue of the fact that $\vec Y$ is a random variable. The corresponding maximum likelihood estimates are obtained by maximizing (3.45) with $\vec Y=(Y_1,\dots,Y_n)^T$ replaced by a given realization $\vec y=(y_1,\dots,y_n)^T$ and will be denoted by $\hat q_{\rm MLE}=\hat q^n_{\rm MLE}$ and $\hat\sigma_{\rm MLE}=\hat\sigma^n_{\rm MLE}$, respectively. In our discussion here and below, almost every quantity of interest depends on $n$, the size of the set of observations or the sample size. On occasion, we will express this dependence explicitly by use of superscripts or subscripts, especially when we wish to remind the reader of it. For notational convenience, however, we will often suppress explicit dependence on $n$.

Maximizing (3.45) is equivalent to maximizing the log likelihood

$$
\log L(\vec q,\sigma^2\,|\,\vec Y)=-\frac n2\log(2\pi)-\frac n2\log\sigma^2-\frac{1}{2\sigma^2}\sum_{j=1}^{n}[Y_j-f(t_j,\vec q)]^2.
\tag{3.46}
$$

We determine the maximum of (3.46) by differentiating with respect to $\vec q$ (with $\sigma^2$ fixed) and with respect to $\sigma^2$ (with $\vec q$ fixed), setting the resulting expressions equal to zero, and solving for $\vec q$ and $\sigma^2$. With $\sigma^2$ fixed we solve $\frac{\partial}{\partial\vec q}\log L(\vec q,\sigma^2\,|\,\vec Y)=0$, which is equivalent to

$$
\sum_{j=1}^{n}[Y_j-f(t_j,\vec q)]\,\nabla f(t_j,\vec q)=0,
\tag{3.47}
$$

where as usual $\nabla f=\frac{\partial f}{\partial\vec q}=f_{\vec q}$. We see that solving (3.47) is the same as solving the least squares optimization problem

$$
q_{\rm MLE}(\vec Y)=\arg\min_{\vec q\in Q}J(\vec Y,\vec q)\equiv\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[Y_j-f(t_j,\vec q)]^2.
\tag{3.48}
$$

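We next fix $\vec q$ to be $q_{\rm MLE}$ and solve $\frac{\partial}{\partial\sigma^2}\log L(q_{\rm MLE},\sigma^2\,|\,\vec Y)=0$, which yields

$$
\sigma^2_{\rm MLE}(\vec Y)=\frac1n\,J(\vec Y,q_{\rm MLE}).
\tag{3.49}
$$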

Note that we can solve for $q_{\rm MLE}$ and $\sigma^2_{\rm MLE}$ separately, a desirable feature but one that does not arise in the more complicated formulations discussed below. The second derivative test (the calculation is deferred to an exercise below) can be used to verify that the expressions above for $q_{\rm MLE}$ and $\sigma^2_{\rm MLE}$ do indeed maximize (3.46).
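For a model that is linear in $\vec q$, the least squares problem (3.48) has a closed-form solution, which makes the relation between (3.47), (3.48), and (3.49) easy to check. The sketch below assumes a hypothetical straight-line model $f(t,\vec q)=q_1+q_2t$ with simulated normal errors; all numerical values are illustrative. It also computes the variant with $1/(n-p)$ in place of $1/n$ that appears later in the OLS discussion.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 2
t = np.linspace(0.0, 10.0, n)
q0, sigma0 = np.array([1.0, 0.5]), 0.3                   # illustrative "true" values
Y = q0[0] + q0[1] * t + sigma0 * rng.standard_normal(n)  # one realization of (3.41), m = 1

X = np.column_stack([np.ones(n), t])            # rows are grad f(t_j, q)^T for f = q1 + q2 t
q_mle, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least squares problem (3.48)
J = np.sum((Y - X @ q_mle) ** 2)                # J(Y, q_MLE)
sigma2_mle = J / n                              # (3.49)
sigma2_adj = J / (n - p)                        # bias-adjusted variant, cf. (3.63)
print(q_mle, sigma2_mle, sigma2_adj)
```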

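If, however, we have a vector of observations for the $j$th covariate $t_j$, then the statistical model is reformulated as

$$
\vec Y_j=\vec f(t_j,\vec q_0)+\vec{\mathcal E}_j,
\tag{3.50}
$$

where $\vec f\in\mathbb R^m$ and

$$
V_0=\operatorname{var}(\vec{\mathcal E}_j)=\operatorname{diag}(\sigma^2_{0,1},\dots,\sigma^2_{0,m})
\tag{3.51}
$$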


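for $j=1,\dots,n$. In this setting, we have allowed for the possibility that the observation coordinates $Y^i_j$ may have different constant variances $\sigma^2_{0,i}$, i.e., $\sigma^2_{0,i}$ need not equal $\sigma^2_{0,k}$. If (again) there is sufficient evidence to claim the errors are independent and identically distributed and generated by a normal distribution, then $\vec{\mathcal E}_j\sim\mathcal N_m(0,V_0)$. We can thus obtain the maximum likelihood estimators $q_{\rm MLE}$ and $V_{\rm MLE}$ for $\vec q_0$ and $V_0$ by maximizing, over $\vec q$ and $V=\operatorname{diag}(\sigma_1^2,\dots,\sigma_m^2)$, the log of the likelihood function for $\vec{\mathcal E}_j=\vec Y_j-\vec f(t_j,\vec q)$, which, up to the additive constant $-\frac{nm}{2}\log(2\pi)$ that does not affect the maximizer, is

$$
\begin{aligned}
\log L(\vec q,V\,|\,\vec Y_1,\dots,\vec Y_n)
&=-\frac n2\sum_{i=1}^{m}\log\sigma_i^2-\frac12\sum_{i=1}^{m}\frac{1}{\sigma_i^2}\sum_{j=1}^{n}[Y^i_j-f^i(t_j,\vec q)]^2\\
&=-\frac n2\sum_{i=1}^{m}\log\sigma_i^2-\frac12\sum_{j=1}^{n}[\vec Y_j-\vec f(t_j,\vec q)]^TV^{-1}[\vec Y_j-\vec f(t_j,\vec q)].
\end{aligned}
$$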

Using arguments similar to those given for the scalar case, we determine the maximum likelihood estimators for $\vec q_0$ and $V_0$ to be

$$
q_{\rm MLE}=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[\vec Y_j-\vec f(t_j,\vec q)]^TV_{\rm MLE}^{-1}[\vec Y_j-\vec f(t_j,\vec q)],
\tag{3.52}
$$

$$
V_{\rm MLE}=\operatorname{diag}\left(\frac1n\sum_{j=1}^{n}[\vec Y_j-\vec f(t_j,q_{\rm MLE})][\vec Y_j-\vec f(t_j,q_{\rm MLE})]^T\right).
\tag{3.53}
$$

Unfortunately, this is a coupled system, which requires some care when solving numerically. We will discuss this issue further in Sections 3.2.6 and 3.2.9 below.

3.2.4 Unspecified Error Distributions and Asymptotic Theory

In Section 3.2.3 we examined the estimates of $\vec q_0$ and $V_0$ under the assumption that the error distribution is known, and in particular that the error is normally distributed, independent, and of constant variance longitudinally. But what if the error is suspected not to be normally distributed, or, more generally (as in most applications), the error distribution is unknown to the modeler beyond the assumptions on $E[\vec Y_j]$ embodied in the model and the assumptions made on $\operatorname{var}(\vec{\mathcal E}_j)$? How should we proceed in estimating $\vec q_0$ and $\sigma_0$ (or $V_0$) in these circumstances? In the next several sections we review two estimation procedures for such situations: ordinary least squares (OLS) and generalized least squares (GLS).


3.2.5 Ordinary Least Squares (OLS)

The statistical model in the scalar case takes the form

$$
Y_j=f(t_j,\vec q_0)+\mathcal E_j,
\tag{3.54}
$$

where the variance $\operatorname{var}(\mathcal E_j)=\sigma_0^2$ is assumed constant in longitudinal data (note that the error's distribution is not specified). We also note that the assumption that the observation errors are uncorrelated across $j$ (i.e., time) may be a reasonable one when the observations are taken with sufficient intermittency or when the primary source of error is measurement error. If we define

$$
q_{\rm OLS}(\vec Y)=q^n_{\rm OLS}(\vec Y)=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[Y_j-f(t_j,\vec q)]^2,
\tag{3.55}
$$

then $q_{\rm OLS}$ can be viewed as minimizing the distance between the data and the model, with all observations treated as being of equal importance. We note that minimizing the functional in (3.55) corresponds to solving for $\vec q$ in

$$
\sum_{j=1}^{n}[Y_j-f(t_j,\vec q)]\,\nabla f(t_j,\vec q)=0.
\tag{3.56}
$$

We point out that $q_{\rm OLS}$ is a random variable (because $\mathcal E_j=Y_j-f(t_j,\vec q_0)$ is a random variable); hence if $\{y_j\}_{j=1}^n$ is a realization of the random process $\{Y_j\}_{j=1}^n$, then solving

$$
\hat q_{\rm OLS}=\hat q^n_{\rm OLS}=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[y_j-f(t_j,\vec q)]^2
\tag{3.57}
$$

provides a realization for $q_{\rm OLS}$. (A remark on notation: for a random variable or estimator $q$, we will always denote a corresponding realization or estimate with an over hat; e.g., $\hat q$ is an estimate for $q$. We have thus abandoned the usual convention of capital letters for random variables and a lower case letter for a corresponding realization; this again follows convention in the statistical literature.)

Noting that

$$
\sigma_0^2=\frac1n\,E\left[\sum_{j=1}^{n}[Y_j-f(t_j,\vec q_0)]^2\right]
\tag{3.58}
$$

suggests that once we have solved for $q_{\rm OLS}$ in (3.55), we may readily obtain an estimate $\hat\sigma^2_{\rm OLS}$ ($=\hat\sigma^{2,n}_{\rm MLE}$; see (3.49)) for $\sigma_0^2$.

Even though the error's distribution is not specified, we can use asymptotic theory to approximate the mean and variance of the random variable $q_{\rm OLS}$ [17]. As will be explained in more detail below, as $n\to\infty$, we have that

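$$
q_{\rm OLS}=q^n_{\rm OLS}\sim\mathcal N_p(\vec q_0,\Sigma^n_0)\approx\mathcal N_p\big(\vec q_0,\ \sigma_0^2[\chi^{nT}(\vec q_0)\chi^n(\vec q_0)]^{-1}\big),
\tag{3.59}
$$

where the sensitivity matrix $\chi(\vec q)=\chi^n(\vec q)=\{\chi^n_{jk}\}$ is defined as

$$
\chi^n_{jk}(\vec q)=\frac{\partial f(t_j,\vec q)}{\partial q_k},\qquad j=1,\dots,n,\quad k=1,\dots,p,
$$

and

$$
\Sigma^n_0\equiv\sigma_0^2\,[n\Omega_0]^{-1}
\tag{3.60}
$$

with

$$
\Omega_0\equiv\lim_{n\to\infty}\frac1n\,\chi^{nT}(\vec q_0)\chi^n(\vec q_0),
\tag{3.61}
$$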

where the limit is assumed to exist (see [17]). However, $\vec q_0$ and $\sigma_0^2$ are generally unknown, so one usually uses instead the realization $\vec y=(y_1,\dots,y_n)^T$ of the random process $\vec Y$ to obtain the estimate

$$
\hat q_{\rm OLS}=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[y_j-f(t_j,\vec q)]^2
\tag{3.62}
$$

and the bias-adjusted estimate

$$
\hat\sigma^2_{\rm OLS}=\frac{1}{n-p}\sum_{j=1}^{n}[y_j-f(t_j,\hat q)]^2
\tag{3.63}
$$

to use as an approximation in (3.59). We note that (3.63) represents the estimate for $\sigma_0^2$ of (3.58) with the factor $\frac1n$ replaced by the factor $\frac{1}{n-p}$ (in the linear case the estimate with $\frac1n$ can be shown to be biased downward, and the same behavior can be observed in the general nonlinear case; see Chap. 12 of [17] and p. 28 of [11]). We remark that (3.58) is true even in the general nonlinear case (it does not rely on any asymptotic theory, although it does depend on the assumption of constant variance being correct).

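Both $\hat q=\hat q_{\rm OLS}$ and $\hat\sigma^2=\hat\sigma^2_{\rm OLS}$ will then be used to approximate the covariance matrix

$$
\Sigma^n_0\approx\hat\Sigma^n\equiv\hat\sigma^2\,[\chi^{nT}(\hat q)\chi^n(\hat q)]^{-1}.
\tag{3.64}
$$

We can obtain the standard errors $SE(\hat q_{{\rm OLS},k})$ (discussed in more detail in the next section) for the $k$th element of $\hat q_{\rm OLS}$ by calculating $SE(\hat q_{{\rm OLS},k})\approx\sqrt{\hat\Sigma^n_{kk}}$. Also note the similarity between the MLE equations (3.48) and (3.49) and the scalar OLS equations (3.62) and (3.63). That is, under a normality assumption for the error, the MLE and OLS formulations are equivalent: they give the same estimate of $\vec q_0$, and their estimates of $\sigma_0^2$ differ only in the factor $\frac1n$ versus $\frac{1}{n-p}$.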
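The following sketch puts (3.62)-(3.64) together for a hypothetical nonlinear model $f(t,\vec q)=q_1e^{-q_2t}$ (our choice, not from the text) using simulated data. The minimization in (3.62) is done with a plain Gauss-Newton iteration, one of several possible solvers; without a line search it relies on a reasonable starting guess.

```python
import numpy as np

def model(t, q):
    """Hypothetical scalar model f(t, q) = q1 * exp(-q2 * t)."""
    return q[0] * np.exp(-q[1] * t)

def sensitivity(t, q):
    """n x p sensitivity matrix chi_jk = d f(t_j, q) / d q_k (analytic for this model)."""
    e = np.exp(-q[1] * t)
    return np.column_stack([e, -q[0] * t * e])

def ols_fit(t, y, q_init, max_iter=100, tol=1e-12):
    """Plain Gauss-Newton for (3.62): no line search, so q_init must be reasonable."""
    q = np.asarray(q_init, dtype=float)
    for _ in range(max_iter):
        step, *_ = np.linalg.lstsq(sensitivity(t, q), y - model(t, q), rcond=None)
        q = q + step
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(q)):
            break
    return q

def ols_standard_errors(t, y, q_hat):
    """sigma^2 from (3.63), Sigma from (3.64), and SE_k = sqrt(Sigma_kk)."""
    n, p = len(t), len(q_hat)
    sigma2 = np.sum((y - model(t, q_hat)) ** 2) / (n - p)
    chi = sensitivity(t, q_hat)
    Sigma = sigma2 * np.linalg.inv(chi.T @ chi)
    return sigma2, Sigma, np.sqrt(np.diag(Sigma))

# Synthetic data following (3.42) with constant-variance normal errors (illustrative values)
rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 40)
q0, sigma0 = np.array([2.0, 0.8]), 0.05
y = model(t, q0) + sigma0 * rng.standard_normal(t.size)

q_hat = ols_fit(t, y, q_init=[1.8, 0.7])
sigma2_hat, Sigma_hat, se = ols_standard_errors(t, y, q_hat)
print(q_hat, se)
```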

If, however, we have a vector of observations for the $j$th covariate $t_j$ and we assume the variance is still constant in longitudinal data, then the statistical model is reformulated as

$$
\vec Y_j=\vec f(t_j,\vec q_0)+\vec{\mathcal E}_j,
\tag{3.65}
$$

where $\vec f\in\mathbb R^m$ and

$$
V_0=\operatorname{var}(\vec{\mathcal E}_j)=\operatorname{diag}(\sigma^2_{0,1},\dots,\sigma^2_{0,m})
\tag{3.66}
$$

for $j=1,\dots,n$. Just as in the MLE case, we have allowed for the possibility that the observation coordinates $Y^i_j$ may have different constant variances $\sigma^2_{0,i}$, i.e., $\sigma^2_{0,i}$ need not equal $\sigma^2_{0,k}$. We note that this formulation can also be used to treat the case where $V_0$ is used simply to scale the observations, i.e., $V_0=\operatorname{diag}(v_1,\dots,v_m)$ is known. In this case the formulation is simply a vector OLS (sometimes also called weighted least squares (WLS)). The problem consists of finding the minimizer of

$$
\sum_{j=1}^{n}[\vec Y_j-\vec f(t_j,\vec q)]^TV_0^{-1}[\vec Y_j-\vec f(t_j,\vec q)],
\tag{3.67}
$$

where the procedure weights elements of the vector $\vec Y_j-\vec f(t_j,\vec q)$ according to their variability. (Some authors refer to (3.67) as a generalized least squares (GLS) procedure, but we will use this terminology for a different formulation in subsequent discussions.) Just as in the scalar OLS case, $q_{\rm OLS}$ is a random variable (again because $\vec{\mathcal E}_j=\vec Y_j-\vec f(t_j,\vec q_0)$ is); hence if $\{\vec y_j\}_{j=1}^n$ is a realization of the random process $\{\vec Y_j\}_{j=1}^n$, then minimizing

$$
\sum_{j=1}^{n}[\vec y_j-\vec f(t_j,\vec q)]^TV_0^{-1}[\vec y_j-\vec f(t_j,\vec q)]
\tag{3.68}
$$

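provides an estimate (realization) $\hat q=\hat q_{\rm OLS}$ for $q_{\rm OLS}$. By the definition of variance,

$$
V_0=\operatorname{diag}E\left(\frac1n\sum_{j=1}^{n}[\vec Y_j-\vec f(t_j,\vec q_0)][\vec Y_j-\vec f(t_j,\vec q_0)]^T\right),
$$

so an (approximately) unbiased estimate of $V_0$ for the realization $\{\vec y_j\}_{j=1}^n$ is

$$
\hat V=\operatorname{diag}\left(\frac{1}{n-p}\sum_{j=1}^{n}[\vec y_j-\vec f(t_j,\hat q)][\vec y_j-\vec f(t_j,\hat q)]^T\right).
\tag{3.69}
$$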

However, the estimate $\hat q$ requires the (generally unknown) matrix $V_0$, and $V_0$ requires the unknown vector $\vec q_0$, so we will instead use the following expressions to calculate $\hat q$ and $\hat V$:

$$
\vec q_0\approx\hat q=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[\vec y_j-\vec f(t_j,\vec q)]^T\hat V^{-1}[\vec y_j-\vec f(t_j,\vec q)],
\tag{3.70}
$$

$$
V_0\approx\hat V=\operatorname{diag}\left(\frac{1}{n-p}\sum_{j=1}^{n}[\vec y_j-\vec f(t_j,\hat q)][\vec y_j-\vec f(t_j,\hat q)]^T\right).
\tag{3.71}
$$

Note that the expressions for $\hat q$ and $\hat V$ constitute a coupled system of equations that will require greater effort in implementing a numerical scheme.

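Just as in the scalar case, we can determine the asymptotic properties of the OLS estimator (3.67). As $n\to\infty$, $q_{\rm OLS}$ has the following asymptotic properties [11, 17]:

$$
q_{\rm OLS}\sim\mathcal N_p(\vec q_0,\Sigma^n_0),
\tag{3.72}
$$

where

$$
\Sigma^n_0\approx\left(\sum_{j=1}^{n}D_j^T(\vec q_0)\,V_0^{-1}D_j(\vec q_0)\right)^{-1},
\tag{3.73}
$$

and the $m\times p$ matrix $D_j(\vec q)=D^n_j(\vec q)$ is given by

$$
D_j(\vec q)=\begin{pmatrix}
\dfrac{\partial f^1(t_j,\vec q)}{\partial q_1} & \dfrac{\partial f^1(t_j,\vec q)}{\partial q_2} & \cdots & \dfrac{\partial f^1(t_j,\vec q)}{\partial q_p}\\
\vdots & \vdots & & \vdots\\
\dfrac{\partial f^m(t_j,\vec q)}{\partial q_1} & \dfrac{\partial f^m(t_j,\vec q)}{\partial q_2} & \cdots & \dfrac{\partial f^m(t_j,\vec q)}{\partial q_p}
\end{pmatrix}.
$$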

Since the true values of the parameters $\vec q_0$ and $V_0$ are unknown, their estimates $\hat q$ and $\hat V$ are used to approximate the asymptotic properties of the least squares estimator $q_{\rm OLS}$:

$$
q_{\rm OLS}\sim\mathcal N_p(\vec q_0,\Sigma^n_0)\approx\mathcal N_p(\hat q,\hat\Sigma^n),
\tag{3.74}
$$

where

$$
\Sigma^n_0\approx\hat\Sigma^n=\left(\sum_{j=1}^{n}D_j^T(\hat q)\,\hat V^{-1}D_j(\hat q)\right)^{-1}.
\tag{3.75}
$$

The standard errors $SE(\hat q_{{\rm OLS},k})$ can then be calculated for the $k$th element of $\hat q_{\rm OLS}$ by $SE(\hat q_{{\rm OLS},k})\approx\sqrt{\hat\Sigma^n_{kk}}$. Again, we point out the similarity between the MLE equations (3.52) and (3.53) and the OLS equations (3.70) and (3.71) for the vector statistical model (3.65).

3.2.6 Numerical Implementation of the Vector OLS Procedure

In the scalar statistical model (3.54), the estimates $\hat q$ and $\hat\sigma$ can be solved for separately (this is also true of the vector OLS in the case $V_0=\sigma_0^2I_m$, where $I_m$ is the $m\times m$ identity matrix), and thus the numerical implementation is straightforward: first determine $\hat q_{\rm OLS}$ according to (3.62) and then calculate $\hat\sigma^2_{\rm OLS}$ according to (3.63). However, the estimates $\hat q$ and $\hat V$ in the case of the vector statistical model (3.65) require more effort since they are coupled:

$$
\hat q=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}[\vec y_j-\vec f(t_j,\vec q)]^T\hat V^{-1}[\vec y_j-\vec f(t_j,\vec q)],
\tag{3.76}
$$

$$
\hat V=\operatorname{diag}\left(\frac{1}{n-p}\sum_{j=1}^{n}[\vec y_j-\vec f(t_j,\hat q)][\vec y_j-\vec f(t_j,\hat q)]^T\right).
\tag{3.77}
$$

To solve this coupled system the following iterative process will be followed:

1. Set $\hat V^{(0)}=I$ and solve for the initial estimate $\hat q^{(0)}$ using (3.76). Set $k=0$.

2. Use $\hat q^{(k)}$ to calculate $\hat V^{(k+1)}$ using (3.77).

3. Re-estimate $\vec q$ by solving (3.76) with $\hat V=\hat V^{(k+1)}$ to obtain $\hat q^{(k+1)}$.

4. If the successive estimates $\hat q^{(k)}$ and $\hat q^{(k+1)}$ are sufficiently close to one another, terminate the process and set $\hat q_{\rm OLS}=\hat q^{(k+1)}$; otherwise set $k=k+1$ and return to step 2.
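A minimal sketch of the iteration above, assuming a hypothetical vector model that is linear in $\vec q$, $\vec f(t,\vec q)=D(t)\vec q$ with $m=p=2$, so that each minimization (3.76) reduces to weighted normal equations. The data, noise levels, and tolerance are illustrative; for a nonlinear model the linear solve would be replaced by an iterative minimizer.

```python
import numpy as np

def D(t):
    """Hypothetical linear vector model f(t, q) = D(t) q with m = 2 outputs, p = 2 parameters."""
    return np.array([[1.0, t],
                     [1.0, -t]])

def vector_ols(ts, ys, tol=1e-10, max_iter=100):
    """Iteration of Section 3.2.6 for the coupled system (3.76)-(3.77)."""
    n, p = len(ts), 2
    Ds = [D(t) for t in ts]
    V = np.eye(2)                                   # step 1: V^(0) = I
    q_old = None
    for _ in range(max_iter):
        W = np.linalg.inv(V)
        A = sum(Dj.T @ W @ Dj for Dj in Ds)         # weighted normal equations for (3.76)
        b = sum(Dj.T @ W @ yj for Dj, yj in zip(Ds, ys))
        q = np.linalg.solve(A, b)                   # steps 1 and 3
        if q_old is not None and np.linalg.norm(q - q_old) < tol * (1.0 + np.linalg.norm(q)):
            break                                   # step 4: successive estimates agree
        q_old = q
        R = np.array([yj - Dj @ q for Dj, yj in zip(Ds, ys)])   # n x m residuals
        V = np.diag(np.sum(R ** 2, axis=0) / (n - p))          # step 2: (3.77)
    return q, V

rng = np.random.default_rng(4)
ts = np.linspace(0.0, 2.0, 200)
q0 = np.array([1.0, 0.5])
noise_sd = np.array([0.1, 1.0])                     # V0 = diag(0.01, 1.0), illustrative
ys = [D(t) @ q0 + noise_sd * rng.standard_normal(2) for t in ts]
q_hat, V_hat = vector_ols(ts, ys)
print(q_hat, np.diag(V_hat))
```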

3.2.7 Generalized Least Squares (GLS)

Although in Section 3.2.5 the error's distribution remained unspecified, we did require that the error have constant variance in longitudinal data. That assumption may not be appropriate for data sets whose error variance is not constant in a longitudinal sense. A common relative error model (i.e., one in which the size of the observation error is assumed proportional to the size of the observed quantity, an assumption which might be reasonable when counting individuals in a population) that experimentalists use in this instance for the scalar observation case [11] is

$$
Y_j=f(t_j,\vec q_0)\,(1+\mathcal E_j),
\tag{3.78}
$$

where $E(Y_j)=f(t_j,\vec q_0)$ and $\operatorname{var}(Y_j)=\sigma_0^2f^2(t_j,\vec q_0)$, which follows from the assumptions $E[\mathcal E_j]=0$ and $\operatorname{var}(\mathcal E_j)=\sigma_0^2$. We see that the variance generated in this fashion is model dependent and hence is generally not constant longitudinally. The method we will use to estimate $\vec q_0$ and $\sigma_0^2$ can be viewed as a particular form of the Generalized Least Squares (GLS) method.

To define the random variable $q_{\rm GLS}$, the following equation must be solved for the estimator $q_{\rm GLS}$:

$$
\sum_{j=1}^{n}w_j[Y_j-f(t_j,q_{\rm GLS})]\,\nabla f(t_j,q_{\rm GLS})=0,
\tag{3.79}
$$

where $Y_j$ obeys (3.78) and $w_j=f^{-2}(t_j,q_{\rm GLS})$. We note that these are the so-called normal equations (obtained by equating the gradient of the weighted least squares criterion to zero in the case where the weights $w_j$ are independent of $\vec q$). The quantity $q_{\rm GLS}$ is a random variable; hence if $\{y_j\}_{j=1}^n$ is a realization of the random process $\{Y_j\}$, then solving

$$
\sum_{j=1}^{n}f^{-2}(t_j,\hat q)[y_j-f(t_j,\hat q)]\,\nabla f(t_j,\hat q)=0
\tag{3.80}
$$

for $\hat q$, we obtain an estimate $\hat q_{\rm GLS}$ for $q_{\rm GLS}$.

The GLS estimator $q_{\rm GLS}=q^n_{\rm GLS}$ has the following asymptotic properties [11]:

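$$
q_{\rm GLS}\sim\mathcal N_p(\vec q_0,\Sigma^n_0),
\tag{3.81}
$$

where

$$
\Sigma^n_0\approx\sigma_0^2\left(F^T_{\vec q}(\vec q_0)\,W(\vec q_0)\,F_{\vec q}(\vec q_0)\right)^{-1},
\tag{3.82}
$$

$$
F_{\vec q}(\vec q)=F^n_{\vec q}(\vec q)=\begin{pmatrix}
\dfrac{\partial f(t_1,\vec q)}{\partial q_1} & \dfrac{\partial f(t_1,\vec q)}{\partial q_2} & \cdots & \dfrac{\partial f(t_1,\vec q)}{\partial q_p}\\
\vdots & \vdots & & \vdots\\
\dfrac{\partial f(t_n,\vec q)}{\partial q_1} & \dfrac{\partial f(t_n,\vec q)}{\partial q_2} & \cdots & \dfrac{\partial f(t_n,\vec q)}{\partial q_p}
\end{pmatrix}
=\begin{pmatrix}\nabla f(t_1,\vec q)^T\\ \vdots\\ \nabla f(t_n,\vec q)^T\end{pmatrix},
$$

and $W^{-1}(\vec q)=\operatorname{diag}\big(f^2(t_1,\vec q),\dots,f^2(t_n,\vec q)\big)$. Note that because $\vec q_0$ and $\sigma_0^2$ are unknown, the estimates $\hat q=\hat q_{\rm GLS}$ and $\hat\sigma^2=\hat\sigma^2_{\rm GLS}$ will be used in (3.82) to calculate

$$
\Sigma^n_0\approx\hat\Sigma^n=\hat\sigma^2\left(F^T_{\vec q}(\hat q)\,W(\hat q)\,F_{\vec q}(\hat q)\right)^{-1},
$$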

where [11] we take the approximation

$$
\sigma_0^2\approx\hat\sigma^2_{\rm GLS}=\frac{1}{n-p}\sum_{j=1}^{n}\frac{1}{f^2(t_j,\hat q)}[y_j-f(t_j,\hat q)]^2.
$$

We can then approximate the standard errors of $\hat q_{\rm GLS}$ by taking the square roots of the diagonal elements of $\hat\Sigma^n$. We also mention that the solutions to (3.70) and (3.80) depend upon the numerical method used to find the minimum or root, and since $\hat\Sigma^n$ depends upon the estimate for $\vec q_0$, the standard errors are therefore also affected by the numerical method chosen.

3.2.8 GLS Motivation

We note the similarity between (3.56) and (3.80). The GLS equation (3.80) can be motivated by examining the weighted least squares (WLS) estimator

$$
q_{\rm WLS}=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}w_j[Y_j-f(t_j,\vec q)]^2.
\tag{3.83}
$$


In many situations where the observation process is well understood, the weights $w_j$ may be known. The WLS estimate can be thought of as minimizing the distance between the data and the model while taking into account the unequal quality of the observations [11]. If we differentiate the sum of squares in (3.83) with respect to $\vec q$, treating the weights as fixed, and then choose $w_j=f^{-2}(t_j,\vec q)$, an estimate $\hat q_{\rm GLS}$ is obtained by solving

$$
\sum_{j=1}^{n}w_j[y_j-f(t_j,\vec q)]\,\nabla f(t_j,\vec q)=0
$$

for $\vec q$. However, we note that the GLS relationship (3.80) does not follow from minimizing the weighted least squares criterion with weights chosen as $w_j=f^{-2}(t_j,\vec q)$, because the weights then depend on $\vec q$ and their derivatives contribute additional terms to the gradient.
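The extra terms can be written out explicitly. Writing $f_j$ as a shorthand (introduced only for this calculation) for $f(t_j,\vec q)$ and differentiating one summand of the criterion with $q$-dependent weight $f_j^{-2}$:

$$
\begin{aligned}
\frac{\partial}{\partial\vec q}\Big(f_j^{-2}(y_j-f_j)^2\Big)
&= -2f_j^{-3}(y_j-f_j)^2\,\nabla f_j-2f_j^{-2}(y_j-f_j)\,\nabla f_j\\
&= -2f_j^{-2}(y_j-f_j)\,\nabla f_j\left(\frac{y_j-f_j}{f_j}+1\right)
= -2\,\frac{y_j}{f_j}\,f_j^{-2}(y_j-f_j)\,\nabla f_j .
\end{aligned}
$$

Setting the sum over $j$ to zero therefore gives $\sum_{j}(y_j/f_j)\,f_j^{-2}(y_j-f_j)\nabla f_j=0$, which differs from (3.80) by the data-dependent factors $y_j/f_j$; its root is in general not $\hat q_{\rm GLS}$.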

Another motivation for the GLS estimating equation (3.80) can be found in [9]. In that text, the authors claim that if the errors (and hence the data) are distributed according to the gamma distribution, then the maximum likelihood estimator for $\vec q$ is the solution to

$$
\sum_{j=1}^{n}f^{-2}(t_j,\vec q)[Y_j-f(t_j,\vec q)]\,\nabla f(t_j,\vec q)=0,
$$

which is equivalent to (3.80). The connection between the MLE and our GLS method is reassuring, but it also poses another interesting question: what if the variance of the data is assumed to be independent of the model output $f(t_j,\vec q)$ but to depend on some other function $g(t_j,\vec q)$ (i.e., $\operatorname{var}(Y_j)=\sigma_0^2g^2(t_j,\vec q)=\sigma_0^2/w_j$)? Is there a corresponding maximum likelihood estimator of $\vec q$ whose form is equivalent to the appropriate GLS estimating equation ($w_j=g^{-2}(t_j,\vec q)$)

$$
\sum_{j=1}^{n}g^{-2}(t_j,\vec q)[Y_j-f(t_j,\vec q)]\,\nabla f(t_j,\vec q)=0\ ?
\tag{3.84}
$$

In their text, Carroll and Ruppert [9] briefly describe how distributions belonging to the exponential family of distributions generate maximum likelihood estimating equations equivalent to (3.84).

3.2.9 Numerical Implementation of the GLS Procedure

Recall that an estimate $\hat q_{\rm GLS}$ can be obtained either directly from (3.80) or iteratively using the equations outlined in Section 3.2.7. The iterative procedure as described in [11] is summarized below:

1. Estimate $\hat q_{\rm GLS}$ by $\hat q^{(0)}$ using the OLS equation (3.55). Set $k=0$.

2. Form the weights $\hat w_j=f^{-2}(t_j,\hat q^{(k)})$.

3. Re-estimate $\hat q$ by solving
$$
\hat q^{(k+1)}=\arg\min_{\vec q\in Q}\sum_{j=1}^{n}\hat w_j\big(y_j-f(t_j,\vec q)\big)^2
$$
to obtain the $(k+1)$st estimate $\hat q^{(k+1)}$ for $\hat q_{\rm GLS}$.

4. Set $k=k+1$ and return to step 2. Terminate the process when two successive estimates for $\hat q_{\rm GLS}$ are sufficiently close.

We note that the above iterative procedure was formulated by minimizing (over $\vec q\in Q$)

$$
\sum_{j=1}^{n}f^{-2}(t_j,\hat q)[y_j-f(t_j,\vec q)]^2
$$

and then updating the weights $\hat w_j=f^{-2}(t_j,\hat q)$ after each iteration. One would hope that after a sufficient number of iterations $\hat w_j$ would converge to $f^{-2}(t_j,\hat q_{\rm GLS})$. Fortunately, under reasonable conditions, if the process enumerated above is continued a sufficient number of times [11], then $\hat w_j\to f^{-2}(t_j,\hat q_{\rm GLS})$.
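A sketch of the GLS iteration above for a hypothetical model $f(t,\vec q)=q_1e^{-q_2t}$ with data simulated from the relative error model (3.78); the parameter values, seed, and tolerances are illustrative. Each weighted minimization in step 3 uses a Gauss-Newton iteration with the weights held fixed (a solver choice of ours, not prescribed by the text); after convergence the code also forms $\hat\sigma^2_{\rm GLS}$ and the standard errors from Section 3.2.7.

```python
import numpy as np

def model(t, q):
    """Hypothetical scalar model f(t, q) = q1 * exp(-q2 * t)."""
    return q[0] * np.exp(-q[1] * t)

def jac(t, q):
    """Rows are grad f(t_j, q)^T, i.e. the matrix F_q(q)."""
    e = np.exp(-q[1] * t)
    return np.column_stack([e, -q[0] * t * e])

def weighted_fit(t, y, w, q, max_iter=100, tol=1e-12):
    """Gauss-Newton for min_q sum_j w_j (y_j - f(t_j, q))^2 with the weights held fixed."""
    sw = np.sqrt(w)
    for _ in range(max_iter):
        step, *_ = np.linalg.lstsq(sw[:, None] * jac(t, q), sw * (y - model(t, q)), rcond=None)
        q = q + step
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(q)):
            break
    return q

def gls(t, y, q_init, tol=1e-10, max_iter=50):
    n, p = len(t), len(q_init)
    q = weighted_fit(t, y, np.ones(n), np.asarray(q_init, dtype=float))   # step 1: OLS
    for _ in range(max_iter):
        w = model(t, q) ** -2.0                                            # step 2
        q_new = weighted_fit(t, y, w, q)                                   # step 3
        converged = np.linalg.norm(q_new - q) < tol * (1.0 + np.linalg.norm(q_new))
        q = q_new
        if converged:                                                      # step 4
            break
    w = model(t, q) ** -2.0
    sigma2 = np.sum(w * (y - model(t, q)) ** 2) / (n - p)                  # sigma^2_GLS
    F = jac(t, q)
    Sigma = sigma2 * np.linalg.inv(F.T @ (w[:, None] * F))                 # sigma^2 (F^T W F)^{-1}
    return q, sigma2, np.sqrt(np.diag(Sigma))

rng = np.random.default_rng(5)
t = np.linspace(0.0, 4.0, 60)
q0, sigma0 = np.array([3.0, 0.6]), 0.05
y = model(t, q0) * (1.0 + sigma0 * rng.standard_normal(t.size))   # relative error, (3.78)
q_gls, sigma2_gls, se_gls = gls(t, y, q_init=[2.5, 0.5])
print(q_gls, sigma2_gls, se_gls)
```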

3.3 Computation of $\hat\Sigma^n$, Standard Errors and Confidence Intervals

We return to the case of $n$ scalar longitudinal observations and consider the OLS case of Section 3.2.5 (the extension of these ideas to vectors is completely straightforward). These $n$ scalar observations are represented by the statistical model

$$
Y_j\equiv f(t_j,\vec q_0)+\mathcal E_j,\qquad j=1,2,\dots,n,
\tag{3.85}
$$

where $f(t_j,\vec q_0)$ is the model for the observations in terms of the state variables and $\vec q_0\in\mathbb R^p$ is a set of theoretical "true" parameter values (assumed to exist in a standard statistical approach). We further assume that the errors $\mathcal E_j$, $j=1,2,\dots,n$, are independent identically distributed (i.i.d.) random variables with mean $E[\mathcal E_j]=0$ and constant variance $\operatorname{var}(\mathcal E_j)=\sigma_0^2$, where $\sigma_0^2$ is unknown. The observations $Y_j$ are then independent (though not identically distributed, since their means differ) with mean $E[Y_j]=f(t_j,\vec q_0)$ and variance $\operatorname{var}(Y_j)=\sigma_0^2$.

Recall that in the ordinary least squares (OLS) approach, we seek to use a realization $\{y_j\}$ of the observation process $\{Y_j\}$ along with the model to determine a vector $\hat q^n_{\rm OLS}$, where

$$
\hat q^n_{\rm OLS}=\arg\min_{\vec q\in Q}J^n(\vec q),\qquad J^n(\vec q)=\sum_{j=1}^{n}[y_j-f(t_j,\vec q)]^2.
\tag{3.86}
$$


Since $Y_j$ is a random variable, the corresponding estimator $q^n=q^n_{\rm OLS}$ (here we wish to emphasize the dependence on the sample size $n$) is also a random variable, with a distribution called the sampling distribution. Knowledge of this sampling distribution provides uncertainty information (e.g., standard errors) for the numerical values of $\hat q^n$ obtained using a specific data set $\{y_j\}$. In particular, loosely speaking, the sampling distribution characterizes the distribution of possible values the estimator could take on across all possible realizations with data of size $n$ that could be collected. The standard errors thus approximate the extent of variability in possible values across all possible realizations, and hence provide a measure of the extent of uncertainty involved in estimating $\vec q$ using the specific estimator and sample size $n$ in actual data collection.

Under reasonable assumptions on smoothness and regularity (the smoothness requirements for model solutions are readily verified using continuous dependence results for differential equations in most examples; the regularity requirements include, among others, conditions on how the observations are taken as sample size increases, i.e., as $n\to\infty$), the standard nonlinear regression approximation theory ([11, 13, 15], and Chapter 12 of [17]) for asymptotic (as $n\to\infty$) distributions can be invoked. As stated above, this theory yields that the sampling distributions for the estimators $q^n(\vec Y)$, where $\vec Y = (Y_1,\ldots,Y_n)^T$, can be approximated by a $p$-multivariate Gaussian (i.e., the sequence of cumulative distribution functions converges as $n\to\infty$ at points of continuity of the limit cdf; this is called convergence in distribution) with mean $E[q^n(\vec Y)] \approx \vec q_0$ and covariance matrix

$$\mathrm{var}(q^n(\vec Y)) \approx \Sigma_0^n = \sigma_0^2[n\Omega_0]^{-1} \approx \sigma_0^2[\chi^n(\vec q_0)^T\chi^n(\vec q_0)]^{-1}.$$

Here $\chi^n(\vec q) = F_{\vec q}(\vec q)$ is the $n\times p$ sensitivity matrix with elements

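$$\chi_{jk}(\vec q) = \frac{\partial f(t_j,\vec q)}{\partial q_k}$$

and

$$F_{\vec q}(\vec q) \equiv \big(f_{1\vec q}(\vec q),\ldots,f_{n\vec q}(\vec q)\big)^T,$$

where $f_{j\vec q}(\vec q) = \frac{\partial f}{\partial \vec q}(t_j,\vec q)$. That is, for $n$ large, the sampling distribution approximately satisfies

$$q^n_{OLS}(\vec Y) \sim \mathcal{N}_p(\vec q_0,\Sigma_0^n) \approx \mathcal{N}_p\big(\vec q_0,\ \sigma_0^2[\chi^n(\vec q_0)^T\chi^n(\vec q_0)]^{-1}\big). \tag{3.87}$$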

There are typically several ways to compute the matrix $F_{\vec q}$, whose entries are the well-known sensitivity functions widely used in applied mathematics and engineering (see the discussions in [2] and the references therein). First, the elements of the matrix $\chi = (\chi_{jk})$ can always be estimated using the forward difference

$$\chi_{jk}(\vec q) = \frac{\partial f(t_j,\vec q)}{\partial q_k} \approx \frac{f(t_j,\vec q + \vec h_k) - f(t_j,\vec q)}{|\vec h_k|},$$

where $\vec h_k$ is a $p$-vector with a nonzero entry in only the $k$th component. But, of course, the choice of $\vec h_k$ can be problematic in practice: too large a step incurs truncation error, while too small a step amplifies round-off error.
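The forward-difference approximation above can be sketched in Python. As an assumption for illustration, we take $f$ to be the closed-form logistic solution (3.95) used later in this chapter, with $\vec q = (K, r, z_0)$, and choose each step as $h_k = 10^{-7}\max(|q_k|, 1)$, a common heuristic rather than a rule from the text. The exact derivative with respect to $K$ is included only to check the result.

```python
import math

def logistic(t, q):
    """Closed-form logistic solution (3.95) with q = (K, r, z0)."""
    K, r, z0 = q
    e = math.exp(r * t)
    return K * z0 * e / (K + z0 * (e - 1.0))

def forward_difference_sensitivity(f, times, q, rel_step=1e-7):
    """n x p matrix chi[j][k] ~ d f(t_j, q) / d q_k by forward differences.

    The step h_k = rel_step * max(|q_k|, 1) is a heuristic (assumption).
    """
    p = len(q)
    base = [f(t, q) for t in times]
    chi = [[0.0] * p for _ in times]
    for k in range(p):
        h = rel_step * max(abs(q[k]), 1.0)
        q_step = list(q)
        q_step[k] += h                      # perturb only the k-th component
        for j, t in enumerate(times):
            chi[j][k] = (f(t, q_step) - base[j]) / h
    return chi

def dz_dK_exact(t, q):
    """Exact dz/dK for (3.95), used only to check the approximation."""
    K, r, z0 = q
    e = math.exp(r * t)
    d = K + z0 * (e - 1.0)
    return z0 * z0 * e * (e - 1.0) / (d * d)

q = (17.5, 0.7, 0.1)
times = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
chi = forward_difference_sensitivity(logistic, times, q)
for t, row in zip(times, chi):
    print(f"t = {t:4.1f}: forward difference {row[0]:.6f}, exact {dz_dK_exact(t, q):.6f}")
```

With this step size the two columns of output should agree to about five or six decimal places; shrinking the step much further eventually lets round-off dominate.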


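Alternatively, if the $f(t_j,\vec q)$ correspond to longitudinal observations $\vec y(t_j) = C\vec z(t_j;\vec q)$ of solutions $\vec z\in\mathbb{R}^N$ to a parameterized $N$-vector differential equation system $\dot{\vec z} = \vec g(t,\vec z(t),\vec q)$ as in (3.38), then one can use the $N\times p$ matrix sensitivity equations (see [3, 4] and the references therein)

$$\frac{d}{dt}\left(\frac{\partial \vec z}{\partial \vec q}\right) = \frac{\partial \vec g}{\partial \vec z}\frac{\partial \vec z}{\partial \vec q} + \frac{\partial \vec g}{\partial \vec q} \tag{3.88}$$

to obtain

$$\frac{\partial f(t_j,\vec q)}{\partial q_k} = C\,\frac{\partial \vec z(t_j,\vec q)}{\partial q_k}.$$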

Finally, in some cases the function $f(t_j,\vec q)$ may be sufficiently simple to allow one to derive analytical expressions for the components of $F_{\vec q}$.

We remark that often one also wants to include initial conditions as part of the unknown vector $\vec q$ to be estimated. In this case, one can readily derive sensitivity equations for sensitivities with respect to initial conditions that are analogous to (3.88). See [2] for examples.
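For the scalar logistic equation $\dot z = g(z,\vec q) = rz(1 - z/K)$ of (3.94), chosen here as an example, the sensitivity equations (3.88), augmented with the sensitivity with respect to the initial condition $z_0$ as just remarked, can be integrated together with the state. The Python sketch below uses a hand-written fixed-step fourth-order Runge-Kutta scheme (an assumption; any accurate ODE solver would do) and compares the result with the closed form (3.95) and its exact derivatives. Note $\partial z/\partial z_0$ starts at 1, while the parameter sensitivities start at 0.

```python
import math

def augmented_rhs(state, K, r):
    """Logistic ODE plus its sensitivity equations (3.88).

    state = [z, dz/dK, dz/dr, dz/dz0]; g(z) = r z (1 - z/K).
    """
    z, s_K, s_r, s_z0 = state
    dg_dz = r * (1.0 - 2.0 * z / K)         # Jacobian dg/dz
    dg_dK = r * z * z / (K * K)             # explicit parameter derivatives
    dg_dr = z * (1.0 - z / K)
    return [r * z * (1.0 - z / K),
            dg_dz * s_K + dg_dK,
            dg_dz * s_r + dg_dr,
            dg_dz * s_z0]                   # g has no explicit z0 dependence

def solve_with_sensitivities(K, r, z0, t_end, steps=5000):
    """Classical fixed-step RK4 for the augmented system on [0, t_end]."""
    h = t_end / steps
    y = [z0, 0.0, 0.0, 1.0]                 # dz/dz0 = 1 at t = 0, others 0
    for _ in range(steps):
        k1 = augmented_rhs(y, K, r)
        k2 = augmented_rhs([a + 0.5 * h * b for a, b in zip(y, k1)], K, r)
        k3 = augmented_rhs([a + 0.5 * h * b for a, b in zip(y, k2)], K, r)
        k4 = augmented_rhs([a + h * b for a, b in zip(y, k3)], K, r)
        y = [a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y

def closed_form(t, K, r, z0):
    """Solution (3.95) and its exact sensitivities, for comparison."""
    e = math.exp(r * t)
    d = K + z0 * (e - 1.0)
    return [K * z0 * e / d,
            z0 * z0 * e * (e - 1.0) / (d * d),
            K * z0 * t * e * (K - z0) / (d * d),
            K * K * e / (d * d)]

K, r, z0, T = 17.5, 0.7, 0.1, 10.0
numeric = solve_with_sensitivities(K, r, z0, T)
exact = closed_form(T, K, r, z0)
for name, a, b in zip(["z", "dz/dK", "dz/dr", "dz/dz0"], numeric, exact):
    print(f"{name:7s} RK4 = {a: .6f}   exact = {b: .6f}")
```

For a model without a closed-form solution, only `augmented_rhs` would change; the same integrator then supplies every column of $\chi$ at the sampling times.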

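Since $\vec q_0$ and $\sigma_0$ are unknown, we will use their estimates to make the approximation

$$\Sigma_0^n \approx \sigma_0^2[\chi^n(\vec q_0)^T\chi^n(\vec q_0)]^{-1} \approx \hat\Sigma^n(\hat q^n_{OLS}) = \hat\sigma^2[\chi^n(\hat q^n_{OLS})^T\chi^n(\hat q^n_{OLS})]^{-1}, \tag{3.89}$$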

where the approximation $\hat\sigma^2$ to $\sigma_0^2$, as discussed earlier, is given by

$$\sigma_0^2 \approx \hat\sigma^2 = \frac{1}{n-p}\sum_{j=1}^{n}[y_j - f(t_j,\hat q^n_{OLS})]^2. \tag{3.90}$$

Standard errors to be used in the confidence interval calculations are thus given by $SE_k(\hat q^n) = \sqrt{\hat\Sigma_{kk}(\hat q^n)}$, $k = 1,2,\ldots,p$ (see [10]).
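Formulas (3.89) and (3.90) translate directly into a few lines of code. The dependency-free Python sketch below assumes the residuals and the sensitivity matrix $\chi$ have already been evaluated at $\hat q^n_{OLS}$; the small Gauss-Jordan inverse is our own helper and is only sensible for the modest $p$ typical of these problems.

```python
import math

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small p x p matrices)."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-300:
            raise ValueError("matrix is singular to working precision")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for i in range(n):
            if i != col:
                factor = a[i][col]
                a[i] = [x - factor * y for x, y in zip(a[i], a[col])]
    return [row[n:] for row in a]

def ols_standard_errors(residuals, chi):
    """Return (sigma2_hat, Sigma_hat, SE) following (3.89)-(3.90).

    residuals[j] = y_j - f(t_j, q_hat); chi is the n x p sensitivity
    matrix evaluated at q_hat.
    """
    n, p = len(chi), len(chi[0])
    sigma2 = sum(r * r for r in residuals) / (n - p)            # (3.90)
    chitchi = [[sum(chi[j][a] * chi[j][b] for j in range(n))
                for b in range(p)] for a in range(p)]           # chi^T chi
    cov = [[sigma2 * x for x in row] for row in invert(chitchi)]  # (3.89)
    se = [math.sqrt(cov[k][k]) for k in range(p)]
    return sigma2, cov, se

# Hypothetical toy check: f(t, q) = q1 + q2 t observed at t = 0, 1, 2, 3, with
# residuals orthogonal to the columns of chi (as OLS residuals are).
chi = [[1.0, t] for t in (0.0, 1.0, 2.0, 3.0)]
sigma2, cov, se = ols_standard_errors([1.0, -1.0, -1.0, 1.0], chi)
print(sigma2, se)
```

In this toy case $\chi^T\chi$ has inverse $\begin{pmatrix}0.7 & -0.3\\ -0.3 & 0.2\end{pmatrix}$ and $\hat\sigma^2 = 4/2 = 2$, so the standard errors are $\sqrt{1.4}\approx 1.183$ and $\sqrt{0.4}\approx 0.632$.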

In order to compute the confidence intervals (at the $100(1-\alpha)\%$ level) for the estimated parameters in our example, we define the confidence level parameters associated with the estimated parameters so that

$$P\left\{\hat q^n_k - t_{1-\alpha/2}SE_k(\hat q^n) < q_{0k} < \hat q^n_k + t_{1-\alpha/2}SE_k(\hat q^n)\right\} = 1-\alpha, \tag{3.91}$$

where $\alpha\in(0,1)$ and $t_{1-\alpha/2}\in\mathbb{R}^+$. Given a small $\alpha$ value (e.g., $\alpha = .05$ for 95% confidence intervals), the critical value $t_{1-\alpha/2}$ is computed from the Student's $t$ distribution $t_{n-p}$ with $n-p$ degrees of freedom: it is determined by $P\{T \ge t_{1-\alpha/2}\} = \alpha/2$, where $T\sim t_{n-p}$. In general, a confidence interval is constructed so that, if the confidence interval could be constructed for each possible realization of data of size $n$ that could have been collected, $100(1-\alpha)\%$ of the intervals so constructed would contain the true value $q_{0k}$. Thus, a confidence interval provides further information on the extent of uncertainty involved in estimating $\vec q_0$ using the given estimator and sample size $n$.
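Computing $t_{1-\alpha/2}$ normally means a table lookup or a library call (for example, SciPy's `scipy.stats.t.ppf(1 - alpha/2, n - p)`). To keep the sketch dependency-free, the Python code below instead integrates the Student's $t$ density numerically with Simpson's rule and solves $P\{T\ge t\} = \alpha/2$ by bisection; this numerical route is our choice, not the text's. The sample size in the usage line is hypothetical, since the number of observations behind Table 3.1 is not given in this section.

```python
import math

def t_cdf(x, nu, panels=2000):
    """P(T <= x) for T ~ t_nu, by composite Simpson's rule on [0, |x|]."""
    if x == 0:
        return 0.5
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(u):
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(u * u / nu))

    b = abs(x)
    h = b / panels                          # panels must be even for Simpson
    s = pdf(0.0) + pdf(b)
    for i in range(1, panels):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3                        # P(0 <= T <= |x|), by symmetry
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(alpha, nu, iterations=60):
    """t_{1-alpha/2}, i.e. the t with P(T >= t) = alpha/2, by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if 1.0 - t_cdf(mid, nu) > alpha / 2:
            lo = mid                        # tail still too heavy: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_interval(q_hat_k, se_k, alpha, n, p):
    """Interval (3.91) for one parameter, with n - p degrees of freedom."""
    t = t_critical(alpha, n - p)
    return q_hat_k - t * se_k, q_hat_k + t * se_k

# Hypothetical: K estimate and SE from Table 3.1 with an assumed n = 50, p = 3.
low, high = confidence_interval(17.5, 1.58e-3, alpha=0.05, n=50, p=3)
print(f"95% interval for K: ({low:.5f}, {high:.5f})")
print(f"t_0.975 with 10 degrees of freedom: {t_critical(0.05, 10):.4f}")
```

The second line should print the familiar tabulated value 2.2281; as $n - p$ grows, $t_{1-\alpha/2}$ approaches the normal quantile 1.96.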


When one is taking longitudinal samples corresponding to solutions of a dynamical system, the $n\times p$ sensitivity matrix depends explicitly on where in time the observations are taken when $f(t_j,\vec q) = C\vec z(t_j,\vec q)$, as mentioned above. That is, the sensitivity matrix

$$\chi(\vec q) = F_{\vec q}(\vec q) = \left(\frac{\partial f(t_j,\vec q)}{\partial \vec q}\right)$$

depends on the number $n$ and the nature (for example, how taken) of the sampling times $t_j$. Moreover, it is the matrix $[\chi^T\chi]^{-1}$ in (3.89) and the parameter $\hat\sigma^2$ in (3.90) that ultimately determine the standard errors and confidence intervals. At first investigation of (3.90), it appears that an increased number $n$ of samples might drive $\hat\sigma^2$ (and hence the SE) to zero as long as this is done in a way to maintain a bound on the residual sum of squares in (3.90). However, the condition number of the matrix $\chi^T\chi$ is also very important in these considerations, and increasing the sampling could potentially adversely affect the inversion of $\chi^T\chi$. In this regard, we note that among the important hypotheses in the asymptotic statistical theory (see p. 571 of [17]) is the existence of a matrix function $\Omega(\vec q)$ such that

$$\frac{1}{n}\chi^n(\vec q)^T\chi^n(\vec q) \to \Omega(\vec q)\quad\text{uniformly in } \vec q \text{ as } n\to\infty,$$

with $\Omega_0 = \Omega(\vec q_0)$ being a nonsingular matrix. It is this condition that is rather easily violated in practice when one is dealing with data from differential equation systems, especially near an equilibrium or steady state (see the examples of [4]).
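To see how sampling near a steady state can spoil $\chi^T\chi$, the Python sketch below compares the condition number of $\frac{1}{n}\chi^T\chi$ for the logistic model (3.95) when samples are spread over $(0, 25]$ versus crowded into $(20, 25]$, where the solution sits near $K$. For simplicity (our assumption) only $(K, r)$ are treated as unknown, with $z_0$ fixed, so the $2\times 2$ eigenvalues have a closed form; both sampling grids are hypothetical.

```python
import math

def logistic_sensitivities(t, K, r, z0):
    """Exact dz/dK and dz/dr for the closed-form logistic solution (3.95)."""
    e = math.exp(r * t)
    d = K + z0 * (e - 1.0)
    return (z0 * z0 * e * (e - 1.0) / (d * d),
            K * z0 * t * e * (K - z0) / (d * d))

def scaled_fisher_2x2(times, K, r, z0):
    """Entries (a, c, b) of (1/n) chi^T chi = [[a, c], [c, b]] for q = (K, r)."""
    n = len(times)
    a = b = c = 0.0
    for t in times:
        s_K, s_r = logistic_sensitivities(t, K, r, z0)
        a += s_K * s_K
        b += s_r * s_r
        c += s_K * s_r
    return a / n, c / n, b / n

def condition_number(a, c, b):
    """2-norm condition number of the symmetric matrix [[a, c], [c, b]]."""
    mean = 0.5 * (a + b)
    radius = math.sqrt(0.25 * (a - b) ** 2 + c * c)
    return (mean + radius) / (mean - radius)   # largest / smallest eigenvalue

K, r, z0 = 17.5, 0.7, 0.1
full = [25.0 * j / 100 for j in range(1, 101)]            # t in (0, 25]
near_eq = [20.0 + 5.0 * j / 100 for j in range(1, 101)]   # t in (20, 25]
c_full = condition_number(*scaled_fisher_2x2(full, K, r, z0))
c_eq = condition_number(*scaled_fisher_2x2(near_eq, K, r, z0))
print(f"condition number, spread-out sampling: {c_full:.1f}")
print(f"condition number, near equilibrium:    {c_eq:.1f}")
```

Near the equilibrium $\partial z/\partial K\approx 1$ while $\partial z/\partial r$ is small and varies little, so $\frac{1}{n}\chi^T\chi$ is close to singular and its condition number comes out much larger than for the spread-out grid.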

All of the above theory readily generalizes to vector systems with partial, non-scalar observations. Suppose now we have the vector system (3.38) with partial vector observations given by equation (3.40). That is, suppose we have $m$ coordinate observations where $m\le N$. In this case, we have

$$\frac{d\vec z}{dt}(t) = \vec g(t,\vec z(t),\vec q) \tag{3.92}$$

and

$$\vec y_j = \vec f(t_j,\vec q_0) + \vec\varepsilon_j = C\vec z(t_j,\vec q_0) + \vec\varepsilon_j, \tag{3.93}$$

where $C$ is an $m\times N$ matrix and $\vec f\in\mathbb{R}^m$, $\vec z\in\mathbb{R}^N$. As already explained in Section 3.2.5, if we assume that different observation coordinates $f_i$ may have different variances $\sigma_i^2$ associated with different coordinates of the errors $\vec{\mathcal{E}}_j$, then $\vec{\mathcal{E}}_j$ is an $m$-dimensional random vector with

$$E[\vec{\mathcal{E}}_j] = 0,\qquad \mathrm{var}(\vec{\mathcal{E}}_j) = V_0,$$

where $V_0 = \mathrm{diag}(\sigma_{0,1}^2,\ldots,\sigma_{0,m}^2)$, and we may follow a similar asymptotic theory to calculate approximate covariances, standard errors and confidence intervals for parameter estimates.


Since the computations for standard errors and confidence intervals (and also model comparison tests) depend on an asymptotic limit distribution theory, one should interpret the findings as sometimes crude indicators of uncertainty inherent in the inverse problem findings. Nonetheless, it is useful to consider the formal mathematical requirements underpinning these techniques. We offer the following summary of possibilities:

(1) Among the more readily checked hypotheses are those of the statistical model requiring that the errors $\mathcal{E}_j$, $j = 1,2,\ldots,n$, are independent and identically distributed (i.i.d.) random variables with mean $E[\mathcal{E}_j] = 0$ and constant variance $\mathrm{var}(\mathcal{E}_j) = \sigma_0^2$. After carrying out the estimation procedures, one can readily plot the residuals $r_j = y_j - f(t_j,\hat q^n_{OLS})$ vs. time $t_j$ and the residuals vs. the resulting estimated model/observation values $f(t_j,\hat q^n_{OLS})$. A random pattern for the first is strong support for the validity of the independence assumption; a random pattern for the latter, with no tendency for the spread to increase, suggests the assumption of constant variance may be reasonable.

(2) The underlying assumption that the sample size $n$ must be large (recall the theory is asymptotic in that it holds as $n\to\infty$) is not so readily “verified” and is often ignored (albeit at the user's peril in regard to the quality of the uncertainty findings). Often asymptotic results provide remarkably good approximations to the true sampling distributions for finite $n$. However, in practice there is no way to ascertain whether the theory holds for a specific example.

3.4 Investigation of Statistical Assumptions

The form of error in the data (which of course is rarely known) dictates which method from those discussed above one should choose. The OLS method is most appropriate for constant variance observations of the form $Y_j = f(t_j,\vec q_0) + \mathcal{E}_j$, whereas GLS should be used for problems in which we have nonconstant variance observations $Y_j = f(t_j,\vec q_0)(1 + \mathcal{E}_j)$.

We emphasize that to obtain the correct standard errors in an inverse problem calculation, the OLS method (and corresponding asymptotic formulas) must be used with constant variance generated data, while the GLS method (and corresponding asymptotic formulas) should be applied to nonconstant variance generated data.

Not doing so can lead to incorrect conclusions. In either case, the standard error calculations are not valid unless the correct formulas (which depend on the error structure) are employed. Unfortunately, it is very difficult to ascertain the structure of the error, and hence the correct method to use, without a priori information. Although the error structure cannot be determined definitively, the two residual tests can be performed after the estimation procedure has been completed to assist in concluding whether or not the correct asymptotic statistics were used.

3.4.1 Residual Plots

One can carry out simulation studies with a proposed mathematical model to assist in understanding the behavior of the model in inverse problems with different types of data with respect to mis-specification of the statistical model. For example, we consider a statistical model with constant variance (CV) noise

$$Y_j = f(t_j,\vec q_0) + \frac{\eta}{100}\mathcal{E}_j,\qquad \mathrm{var}(Y_j) = \frac{\eta^2}{10000}\sigma^2,$$

and another with nonconstant variance (NCV) noise

$$Y_j = f(t_j,\vec q_0)\left(1 + \frac{\eta}{100}\mathcal{E}_j\right),\qquad \mathrm{var}(Y_j) = \frac{\eta^2}{10000}\sigma^2 f^2(t_j,\vec q_0).$$

We obtain a data set by considering a realization $\{y_j\}_{j=1}^n$ of the random process $\{Y_j\}_{j=1}^n$ through a realization of $\{\mathcal{E}_j\}_{j=1}^n$, and then calculate an estimate $\hat q$ of $\vec q_0$ using the OLS or GLS procedure.
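A minimal Python sketch of generating one realization from each statistical model above, assuming the logistic solution (3.95) as $f$, $\mathcal{E}_j\sim\mathcal{N}(0,1)$ (so $\sigma = 1$, as in the example of Section 3.4.2) and an evenly spaced grid on $[0, 25]$; the grid is our assumption, since the text does not specify the sampling times.

```python
import math
import random

def logistic(t, K, r, z0):
    """Closed-form logistic solution (3.95)."""
    e = math.exp(r * t)
    return K * z0 * e / (K + z0 * (e - 1.0))

def simulate(times, q0, eta, noise="cv", seed=None):
    """One realization {y_j} of the CV or NCV statistical model above.

    E_j ~ N(0, 1), so sigma = 1; eta is the noise level in percent.
    """
    rng = random.Random(seed)
    data = []
    for t in times:
        f = logistic(t, *q0)
        e = rng.gauss(0.0, 1.0)
        if noise == "cv":
            data.append(f + eta / 100.0 * e)           # Y_j = f + (eta/100) E_j
        elif noise == "ncv":
            data.append(f * (1.0 + eta / 100.0 * e))   # Y_j = f (1 + (eta/100) E_j)
        else:
            raise ValueError("noise must be 'cv' or 'ncv'")
    return data

times = [0.25 * j for j in range(101)]                 # assumed sampling grid
q0 = (17.5, 0.7, 0.1)
y_cv = simulate(times, q0, eta=5.0, noise="cv", seed=1)
y_ncv = simulate(times, q0, eta=5.0, noise="ncv", seed=1)
```

Using the same seed for both calls reuses the same draws $\mathcal{E}_j$, which makes it easy to see that the CV perturbations have a fixed size while the NCV perturbations scale with $f(t_j,\vec q_0)$.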

We will then use the residuals $r_j = y_j - f(t_j,\hat q)$ to test whether the errors in the data set are i.i.d. and possess the assumed variance structure. If a data set has constant variance error, then

$$Y_j = f(t_j,\vec q_0) + \mathcal{E}_j\quad\text{or}\quad \mathcal{E}_j = Y_j - f(t_j,\vec q_0).$$

Since it is assumed that the errors $\mathcal{E}_j$ are i.i.d., a plot of the residuals $r_j = y_j - f(t_j,\hat q)$ vs. $t_j$ should be random. Also, the error in the constant variance case does not depend on $f(t_j,\vec q_0)$, and so a plot of the residuals $r_j$ vs. $f(t_j,\hat q)$ should also be random. Therefore, if the error has constant variance, then plots of the residuals $r_j$ against $t_j$ and against $f(t_j,\hat q)$ should both be random. If not, then the constant variance assumption is suspect.

We turn next to questions of what to expect if this residual test is applied to a data set that has nonconstant variance (NCV) generated error. That is, we wish to investigate what happens if the data are incorrectly assumed to have CV error when in fact they have NCV error. Since in the NCV example $R_j = Y_j - f(t_j,\vec q_0) = f(t_j,\vec q_0)\,\mathcal{E}_j$ depends upon the deterministic model $f(t_j,\vec q_0)$, we should expect that a plot of the residuals $r_j = y_j - f(t_j,\hat q)$ vs. $t_j$ should exhibit some type of pattern. Also, the residuals actually depend on $f(t_j,\hat q)$ in the NCV case, and so as $f(t_j,\hat q)$ increases the variation of the residuals $r_j$ should increase as well. Thus a plot of $r_j$ vs. $f(t_j,\hat q)$ should have a fan shape in the NCV case.


In summary, if a data set has nonconstant variance generated data, then

$$Y_j = f(t_j,\vec q_0) + f(t_j,\vec q_0)\,\mathcal{E}_j\quad\text{or}\quad \mathcal{E}_j = \frac{Y_j - f(t_j,\vec q_0)}{f(t_j,\vec q_0)}.$$

If the $\mathcal{E}_j$ are i.i.d., then a plot of the modified residuals $r^m_j = (y_j - f(t_j,\hat q))/f(t_j,\hat q)$ vs. $t_j$ should be random for nonconstant variance generated data. A plot of $r^m_j$ vs. $f(t_j,\hat q)$ should also be random.

Another question of interest concerns the case in which the data are incorrectly assumed to have nonconstant variance error when in fact they have constant variance error. Since $Y_j - f(t_j,\vec q_0) = \mathcal{E}_j$ in the constant variance case, we should expect that plots of $r^m_j = (y_j - f(t_j,\hat q))/f(t_j,\hat q)$ vs. $t_j$ as well as vs. $f(t_j,\hat q)$ will possess some distinct pattern.

There are two further issues regarding residual plots. As we shall see by examples, some data sets might have values that are repeated or nearly repeated a large number of times (for example, when sampling near an equilibrium for the mathematical model or when sampling a periodic system over many periods). If a certain value (say $f_{repeat}$) is repeated numerous times, then any plot with $f(t_j,\hat q)$ along the horizontal axis will have a cluster of values along the vertical line $x = f_{repeat}$. This feature can easily be removed by excluding the data points corresponding to these high frequency values (or simply excluding the corresponding points in the residual plots). Another common technique when plotting against model predictions is to plot against $\log f(t_j,\hat q)$ instead of $f(t_j,\hat q)$ itself, which has the effect of “stretching out” plots at the ends. Also, note that the model value $f(t_j,\hat q)$ could possibly be zero or very near zero, in which case the modified residuals $R^m_j = \frac{Y_j - f(t_j,\hat q)}{f(t_j,\hat q)}$ would be undefined or extremely large. To remedy this situation one might exclude values very close to zero (in either the plots or in the data themselves). We chose here to reduce the data sets (although this sometimes could lead to a deterioration in the estimation results obtained). In our examples below, estimates obtained using a truncated data set will be denoted by $\hat q^{tcv}_{OLS}$ for constant variance data and $\hat q^{tncv}_{OLS}$ for nonconstant variance data (with subscript GLS when the GLS procedure is used).
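The residual computations of this subsection can be sketched in Python. Assumptions: NCV data are simulated on a hypothetical grid, the true $\vec q_0$ stands in for the estimate $\hat q$ (so no optimizer is needed), and, since plotting is outside the scope of a short sketch, a crude numerical stand-in for the visual fan-shape check is used: the residual standard deviation in the lower and upper halves of the model values. That summary is our addition, not a method from the text.

```python
import math
import random
import statistics

def logistic(t, K, r, z0):
    """Closed-form logistic solution (3.95)."""
    e = math.exp(r * t)
    return K * z0 * e / (K + z0 * (e - 1.0))

def residuals(y, f_vals):
    """Ordinary residuals r_j = y_j - f(t_j, q_hat)."""
    return [yj - fj for yj, fj in zip(y, f_vals)]

def modified_residuals(y, f_vals, eps=1e-8):
    """Modified residuals (y_j - f_j) / f_j, dropping points with |f_j| <= eps.

    Returns the modified residuals and the model values that were kept.
    """
    kept = [(yj, fj) for yj, fj in zip(y, f_vals) if abs(fj) > eps]
    return [(yj - fj) / fj for yj, fj in kept], [fj for _, fj in kept]

def truncate(y, f_vals, lo=1.0, hi=17.0):
    """Keep only points with lo <= y_j <= hi (the window used in Section 3.4.2)."""
    kept = [(yj, fj) for yj, fj in zip(y, f_vals) if lo <= yj <= hi]
    return [yj for yj, _ in kept], [fj for _, fj in kept]

def spread_by_model_value(res, f_vals):
    """Residual standard deviation for the lower and upper halves of f values."""
    order = sorted(range(len(f_vals)), key=lambda j: f_vals[j])
    half = len(order) // 2
    low = [res[j] for j in order[:half]]
    high = [res[j] for j in order[half:]]
    return statistics.stdev(low), statistics.stdev(high)

# Hypothetical NCV data set (eta = 5) on an assumed grid of 501 times in [0, 25].
q0 = (17.5, 0.7, 0.1)
times = [0.05 * j for j in range(501)]
rng = random.Random(7)
f_vals = [logistic(t, *q0) for t in times]
y = [f * (1.0 + 0.05 * rng.gauss(0.0, 1.0)) for f in f_vals]

res = residuals(y, f_vals)
rm, f_kept = modified_residuals(y, f_vals)
print("ordinary residual spread (low f, high f):", spread_by_model_value(res, f_vals))
print("modified residual spread (low f, high f):", spread_by_model_value(rm, f_kept))
```

For NCV data the ordinary residuals should show a clearly larger spread for the larger model values (the fan shape), while the two spreads of the modified residuals should be similar.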

3.4.2 An Example Using Residual Plots

We illustrate residual plot techniques by exploring a widely studied model, the logistic population growth model of Verhulst/Pearl [16]

$$\dot z = rz\left(1 - \frac{z}{K}\right),\qquad z(0) = z_0. \tag{3.94}$$

Here $K$ is the population’s carrying capacity, $r$ is the intrinsic growth rate and $z_0$ is the initial population size. This well-known logistic model describes how populations grow when constrained by resources or competition. We shall discuss this model, its derivation and properties in more detail in Chapter 9 of this monograph. The closed form solution of this simple model is given by

$$z(t) = \frac{K z_0 e^{rt}}{K + z_0(e^{rt} - 1)}. \tag{3.95}$$

The left plot in Figure 3.4 depicts the solution of the logistic model for $K = 17.5$, $r = .7$ and $z_0 = 0.1$ for $0\le t\le 25$. If high frequency repeated or nearly repeated values (i.e., near the initial value $z_0$ or near the asymptote $z = K$) are removed from the original plot, the resulting truncated plot is given in the right panel of Figure 3.4 (there are no near zero values for this function).

[Figure panels: “True Solution vs. Time” (true model vs. time, $0\le t\le 25$) and “Truncated True Solution vs. Time” (truncated true model vs. time, $3\le t\le 13$).]

FIGURE 3.4: Original and truncated logistic curve with $K = 17.5$, $r = .7$ and $z_0 = .1$.

For this example we generated both CV and NCV noisy data (we sampled from $\mathcal{N}(0,1)$ random variables to obtain realizations of $\mathcal{E}_j$) and obtained estimates $\hat q$ of $\vec q_0 = (K, r, z_0)$ by applying either the OLS or GLS method to a realization $\{y_j\}_{j=1}^n$ of the random process $\{Y_j\}_{j=1}^n$. The initial guesses $\vec q_{init} = \hat q^{(0)}$, along with estimates for each method and error structure, are given in Tables 3.1–3.4. As expected, both methods do a good job of estimating $\vec q_0$; however, the error structure was not always correctly specified, and consequently incorrect asymptotic formulas were used in some cases.

When the OLS method was applied to nonconstant variance data and the GLS method was applied to constant variance data, the residual plots given below do reveal that the error structure was misspecified. For instance, the plots of the residuals for $\hat q^{ncv}_{OLS}$ given in Figures 3.7 and 3.8 reveal a fan shaped pattern, which indicates the constant variance assumption is suspect. In addition, the plots of the modified residuals for $\hat q^{cv}_{GLS}$ given in Figures 3.9 and 3.10 reveal an inverted fan shaped pattern, which indicates the nonconstant variance assumption is suspect. As expected, when the correct error structure is specified, the i.i.d. test and the model dependence test each display a random pattern (Figures 3.5, 3.6 and Figures 3.11, 3.12).

TABLE 3.1: Estimation using the OLS procedure with CV data for η = 5.

| $\vec q_{init}$ | $\vec q_0$ | $\hat q^{cv}_{OLS}$ | $SE(\hat q^{cv}_{OLS})$ | $\hat q^{tcv}_{OLS}$ | $SE(\hat q^{tcv}_{OLS})$ |
|---|---|---|---|---|---|
| 17 | 17.5 | 1.7500e+001 | 1.5800e-003 | 1.7494e+001 | 6.4215e-003 |
| .8 | .7 | 7.0018e-001 | 4.2841e-004 | 7.0062e-001 | 6.5796e-004 |
| 1.2 | .1 | 9.9958e-002 | 3.1483e-004 | 9.9702e-002 | 4.3898e-004 |

TABLE 3.2: Estimation using the GLS procedure with CV data for η = 5.

| $\vec q_{init}$ | $\vec q_0$ | $\hat q^{cv}_{GLS}$ | $SE(\hat q^{cv}_{GLS})$ | $\hat q^{tcv}_{GLS}$ | $SE(\hat q^{tcv}_{GLS})$ |
|---|---|---|---|---|---|
| 17 | 17.5 | 1.7500e+001 | 1.3824e-004 | 1.7494e+001 | 9.1213e-005 |
| .8 | .7 | 7.0021e-001 | 7.8139e-005 | 7.0060e-001 | 1.6009e-005 |
| 1.2 | .1 | 9.9938e-002 | 6.6068e-005 | 9.9718e-002 | 1.2130e-005 |

TABLE 3.3: Estimation using the OLS procedure with NCV data for η = 5.

| $\vec q_{init}$ | $\vec q_0$ | $\hat q^{ncv}_{OLS}$ | $SE(\hat q^{ncv}_{OLS})$ | $\hat q^{tncv}_{OLS}$ | $SE(\hat q^{tncv}_{OLS})$ |
|---|---|---|---|---|---|
| 17 | 17.5 | 1.7499e+001 | 2.2678e-002 | 1.7411e+001 | 7.1584e-002 |
| .8 | .7 | 7.0192e-001 | 6.1770e-003 | 7.0955e-001 | 7.6039e-003 |
| 1.2 | .1 | 9.9496e-002 | 4.5115e-003 | 9.4967e-002 | 4.8295e-003 |

TABLE 3.4: Estimation using the GLS procedure with NCV data for η = 5.

| $\vec q_{init}$ | $\vec q_0$ | $\hat q^{ncv}_{GLS}$ | $SE(\hat q^{ncv}_{GLS})$ | $\hat q^{tncv}_{GLS}$ | $SE(\hat q^{tncv}_{GLS})$ |
|---|---|---|---|---|---|
| 17 | 17.5 | 1.7498e+001 | 9.4366e-005 | 1.7411e+001 | 3.1271e-004 |
| .8 | .7 | 7.0217e-001 | 5.3616e-005 | 7.0959e-001 | 5.7181e-005 |
| 1.2 | .1 | 9.9314e-002 | 4.4976e-005 | 9.4944e-002 | 4.1205e-005 |

Also included, in the right panels of Figures 3.5–3.12, are the residual plots for the truncated data sets. In those plots only model values between one and seventeen were considered (i.e., $1\le y_j\le 17$). Doing so removed the dense vertical lines in the plots with $f(t_j,\hat q)$ along the $x$-axis. Nonetheless, the conclusions regarding the error structure remain the same.

In addition to the residual plots, we can also compare the standard errors obtained for each simulation. At a quick glance at Tables 3.1–3.4, the standard error of the parameter $K$ in the truncated data set is larger than the standard error of $K$ in the original data set (in Tables 3.1, 3.3 and 3.4; Table 3.2 is an exception). This behavior is expected. If we remove the “flat” region in the logistic curve, we actually discard measurements with high information content about the carrying capacity $K$ (see [4]). Doing so reduces the quality of the estimator for $K$. Another interesting observation is that the standard errors of the GLS estimate are more optimistic than those of the OLS estimate, even when the nonconstant variance assumption is wrong. This example further solidifies the conclusion that before one reports an estimate and corresponding standard errors, there needs to be some assurance that the proper error structure has been specified.

[Figure panels: “Residual vs. Time with OLS & CV Data” (residual vs. time) and “Residual vs. Time with OLS & Truncated CV Data” (truncated residual vs. time).]

FIGURE 3.5: Residual vs. time plots: Original and truncated logistic curve for $\hat q^{CV}_{OLS}$ with η = 5.

[Figure panels: “Residual vs. Model with OLS & CV Data” (residual vs. model) and “Residual vs. Model with OLS & Truncated CV Data” (truncated residual vs. truncated model).]

FIGURE 3.6: Residual vs. model plots: Original and truncated logistic curve for $\hat q^{CV}_{OLS}$ with η = 5.


[Figure panels: “Residual vs. Time with OLS & NCV Data” (residual vs. time) and “Residual vs. Time with OLS & Truncated NCV Data” (truncated residual vs. time).]

FIGURE 3.7: Residual vs. time plots: Original and truncated logistic curve for $\hat q^{NCV}_{OLS}$ with η = 5.

[Figure panels: “Residual vs. Model with OLS & NCV Data” (residual vs. model) and “Residual vs. Model with OLS & Truncated NCV Data” (truncated residual vs. truncated model).]

FIGURE 3.8: Residual vs. model plots: Original and truncated logistic curve for $\hat q^{NCV}_{OLS}$ with η = 5.

3.5 Statistically Based Model Comparison Techniques

In previous sections we have discussed techniques (e.g., residual plots) for investigating the correctness of the assumed statistical model underlying the estimation (OLS or GLS) procedures used in inverse problems. To this point we have not discussed correctness issues related to the choice of the mathematical model. However, there are a number of ways in which questions related to the mathematical model may arise. In general, modeling studies [7, 8] can raise questions as to whether a mathematical model can be improved by more detail and/or further refinement. For example, one might ask whether one can improve the mathematical model by assuming more detail in a given mechanism (constant rate vs. time or spatially dependent rate; e.g., see [1] for questions related to time dependent mortality rates during sub-lethal damage in insect


[Figure panels: “Residual/Model vs. Time with GLS & CV Data” (residual/model vs. time) and “Residual/Model vs. Time with GLS & Truncated CV Data” (truncated residual/model vs. time).]

FIGURE 3.9: Modified residual vs. time plots: Original and truncated logistic curve for $\hat q^{CV}_{GLS}$ with η = 5.

[Figure panels: “Residual/Model vs. Model with GLS & CV Data” (residual/model vs. model) and “Residual/Model vs. Model with GLS & Truncated CV Data” (vs. truncated model).]

FIGURE 3.10: Modified residual vs. model plots: Original and truncated logistic curve for $\hat q^{CV}_{GLS}$ with η = 5.

populations exposed to various levels of pesticides). Or one might question whether an additional mechanism in the model might produce a better fit to data; see [5, 6, 7] for diffusion alone or diffusion plus convection in cat brain transport in grey vs. white matter considerations.

Before continuing, an important point must be made: in the model comparison results outlined below, there are really two models being compared: the mathematical model and the statistical model. If one embeds the mathematical model in the wrong statistical model (for example, assuming constant variance when this really isn’t true), then the mathematical model comparison results using the techniques presented here will be invalid (i.e., worthless). An important remark in all this is that one must have the mathematical model one wants to simplify or improve (e.g., test whether $V = 0$ or not in the example below) embedded in the correct statistical model (determined in large part by the observation process), so that the comparison actually is only with regard to the mathematical model.


[Figure panels: “Residual/Model vs. Time with GLS & NCV Data” (residual/model vs. time) and “Residual/Model vs. Time with GLS & Truncated NCV Data” (vs. time).]

FIGURE 3.11: Modified residual vs. time plots: Original and truncated logistic curve for $\hat q^{NCV}_{GLS}$ with η = 5.

[Figure panels: “Residual/Model vs. Model with GLS & NCV Data” (residual/model vs. model) and “Residual/Model vs. Model with GLS & Truncated NCV Data” (vs. truncated model).]

FIGURE 3.12: Modified residual vs. model plots: Original and truncated logistic curve for $\hat q^{NCV}_{GLS}$ with η = 5.

To provide specific motivation, we illustrate the formulation of hypothesis testing by considering a mathematical model for a diffusion-convection process. This model was proposed for use with experiments designed to study substance (labelled sucrose) transport in cat brains, which are heterogeneous, containing grey and white matter [7]. In general, the transport of a substance in cat brains can be described by a PDE describing change in time and space. This convection/diffusion model, which is widely discussed in the applied mathematics and engineering literature, has the form

$$\frac{\partial u}{\partial t} + V\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2}. \tag{3.96}$$

Here, the parameter $\vec q = (D, V)$, which belongs to some admissible parameter set $Q$, denotes the diffusion coefficient $D$ and the bulk velocity $V$ of the fluid, respectively. Our problem: test whether the parameter $V$ plays a significant role in the mathematical model. That is, if the model (3.96) represents a diffusion-convection process, we seek to determine whether diffusion alone or diffusion plus convection best describes transport phenomena represented in cat brain data sets $\{y_{ij}\}$ for $u(t_i, x_j;\vec q)$, the concentration of labelled sucrose at times $t_i$ and location $x_j$. We thus might wish to test the null hypothesis $H_0$ that diffusion alone best describes the data versus the alternative hypothesis $H_A$ that convection is also needed. We then may take $H_0: V = 0$ and the alternative $H_A: V\ne 0$. Consequently, the restricted parameter set $Q_H\subset Q$ defined by

$$Q_H = \{\vec q\in Q : V = 0\}$$

will be important. To carry out these determinations, we will need some model comparison tests of analysis of variance (ANOVA) type [14] from statistics involving the residual sum of squares (RSS) in least squares problems.

3.5.1 RSS Based Statistical Tests

In general, we assume an inverse problem with mathematical model $f(t,\vec q)$ and $n$ observations $\vec Y = \{Y_j\}_{j=1}^n$. We define an OLS performance criterion

$$J_n(\vec q) = J_n(\vec Y,\vec q) = \frac{1}{n}\sum_{j=1}^{n}[Y_j - f(t_j,\vec q)]^2,$$

where our statistical model again has the form

$$Y_j = f(t_j,\vec q_0) + \mathcal{E}_j,\qquad j = 1,\ldots,n,$$

with $\{\mathcal{E}_j\}_{j=1}^n$ being independent and identically distributed, $E(\mathcal{E}_j) = 0$ and constant variance $\mathrm{var}(\mathcal{E}_j) = \sigma^2$. As usual, $\vec q_0$ is the “true” value of $\vec q$, which we assume to exist. As noted above, we use $Q$ to represent the set of all the admissible parameters $\vec q$ and assume that $Q$ is a compact subset of the Euclidean space $\mathbb{R}^p$ with $\vec q_0\in Q$.

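Let $q^n(\vec Y) = q^n_{OLS}(\vec Y)$ be the OLS estimator using $J_n$, with corresponding estimate $\hat q^n = q^n_{OLS}(\vec y)$ for a realization $\vec y = \{y_j\}$. That is,

$$q^n(\vec Y) = \arg\min_{\vec q\in Q} J_n(\vec Y,\vec q)\quad\text{and}\quad \hat q^n = \arg\min_{\vec q\in Q} J_n(\vec y,\vec q).$$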

We remark that in most calculations, one actually uses an approximation $f^N$ to $f$, often a numerical solution to the ODE or PDE modeling the dynamical system. Here we tacitly assume $f^N$ will converge to $f$ as the approximation improves. There are also questions related to approximations of the set $Q$ when it is infinite dimensional (e.g., in the case of function space parameters such as time or spatially dependent parameters) by finite dimensional discretizations $Q^M$. For extensive discussions related to these questions, see [8] as well as [6], where related assumptions on the convergences $f^N\to f$ and $Q^M\to Q$ are given. We shall ignore these issues in our presentations, keeping in mind that these approximations will also be of importance in the methodology discussed below in most practical uses.

In many instances, including the motivating example given above, one is interested in using data to address the question whether or not the “true” parameter $\vec q_0$ can be found in a subset $Q_H\subset Q$, which we assume for discussions here is defined by

$$Q_H = \{\vec q\in Q \mid H\vec q = c\}, \tag{3.97}$$

where $H$ is an $r\times p$ matrix of full rank, and $c$ is a known constant vector. In this case we want to test the null hypothesis $H_0: \vec q_0\in Q_H$. For the cat brain problem, with $\vec q = (D, V)$, one takes $H = (0\ \ 1)$ and $c = 0$, so that $r = 1$.

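Define then

$$q^n_H(\vec Y) = \arg\min_{\vec q\in Q_H} J_n(\vec Y,\vec q)\quad\text{and}\quad \hat q^n_H = \arg\min_{\vec q\in Q_H} J_n(\vec y,\vec q),$$

and observe that $J_n(\vec Y, q^n_H)\ge J_n(\vec Y, q^n)$, since $Q_H\subset Q$. We define the related non-negative test statistics and their realizations, respectively, by

$$T_n(\vec Y) = n\big(J_n(\vec Y, q^n_H) - J_n(\vec Y, q^n)\big)$$

and

$$\hat T_n = T_n(\vec y) = n\big(J_n(\vec y,\hat q^n_H) - J_n(\vec y,\hat q^n)\big).$$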

One can establish asymptotic convergence results for the test statistics $T_n(\vec Y)$, as given in detail in [6]. These results can, in turn, be used to establish a fundamental result about more useful statistics for model comparison. We define these statistics by

$$U_n(\vec Y) = \frac{T_n(\vec Y)}{J_n(\vec Y, q^n)}, \tag{3.98}$$

with corresponding realizations $\hat U_n = U_n(\vec y)$. We then have the asymptotic result that is the basis of our ANOVA-type tests.

Under reasonable assumptions (very similar to those required in the asymptotic sampling distribution theory discussed in previous sections; see [6, 8, 12, 17]) involving regularity and the manner in which samples are taken, one can prove a number of convergence results including:

(i) The estimators $q^n$ converge to $\vec q_0$ with probability one as $n\to\infty$;

(ii) If $H_0$ is true, $U_n$ converges in distribution to $U(r)$ as $n\to\infty$, where $U\sim\chi^2(r)$, a $\chi^2$ distribution with $r$ degrees of freedom, and $r$ is the number of constraints specified by the matrix $H$.

Recall that $H$ is the $r\times p$ matrix of full rank defining $Q_H$ and that random variables converge in distribution if their corresponding cumulative distribution functions converge pointwise at all points of continuity of the limit cdf.
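Once the two minimizations have been carried out, computing the realization $\hat U_n$ from (3.98) is simple arithmetic, since $\hat U_n = n\big(J_n(\hat q^n_H) - J_n(\hat q^n)\big)/J_n(\hat q^n)$. A minimal Python helper is sketched below; in the usage line we plug in the Data Set 2 values reported in Section 3.5.2, assuming $n = 8$ as stated there for Data Set 1.

```python
def anova_statistic(j_full, j_restricted, n):
    """Realization U_n = n (J_n(q_H) - J_n(q)) / J_n(q) of (3.98).

    j_full:       minimum of J_n over the full parameter set Q
    j_restricted: minimum of J_n over the restricted set Q_H
    """
    if j_full <= 0:
        raise ValueError("J_n at the unrestricted minimum must be positive")
    if j_restricted < j_full:
        raise ValueError("minimum over Q_H cannot be below the minimum over Q")
    return n * (j_restricted - j_full) / j_full

u_2 = anova_statistic(14.68, 15.35, n=8)   # Data Set 2, assuming n = 8
print(f"U_n = {u_2:.3f}")
```

This reproduces the value $\hat U_n = .365$ reported for Data Set 2. The guard against $J_n(\hat q^n_H) < J_n(\hat q^n)$ catches a restricted fit that stopped at a poor local minimum, since the true minimum over $Q_H\subset Q$ can never lie below the minimum over $Q$.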


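[Figure: the density $p(u)$ of $U$, with the threshold $\tau$ and the tail probability $\alpha$ marked.]

FIGURE 3.13: Example of $U\sim\chi^2(4)$ density.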

An example of the $\chi^2$ density is depicted in Figure 3.13, where the density for $\chi^2(4)$ ($\chi^2$ with $r = 4$ degrees of freedom) is graphed. In this figure two parameters $(\tau,\alpha)$ of interest are shown. For a given value $\tau$, the value $\alpha$ is simply the probability that the random variable $U$ will take on a value greater than $\tau$. That is, $P(U > \tau) = \alpha$, where in hypothesis testing $\alpha$ is the significance level and $\tau$ is the threshold.

We wish to use this distribution to test the null hypothesis $H_0$: under $H_0$ we approximate the distribution of $U_n$ by $\chi^2(r)$. If the test statistic $\hat U_n > \tau$, then we reject $H_0$ as false with confidence level $(1-\alpha)100\%$. Otherwise, we do not reject $H_0$. We emphasize that care should be taken in stating conclusions: we either reject or do not reject $H_0$ at the specified level of confidence; failing to reject $H_0$ does not establish that it is true. For the cat brain problem, we use a $\chi^2(1)$ table, which can be found in any elementary statistics text or online and is given here for illustrative purposes; see Table 3.5.

TABLE 3.5: $\chi^2(1)$ values.

| $\alpha$ | $\tau$ | confidence |
|---|---|---|
| .25 | 1.32 | 75% |
| .1 | 2.71 | 90% |
| .05 | 3.84 | 95% |
| .01 | 6.63 | 99% |
| .001 | 10.83 | 99.9% |
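The entries of Table 3.5 can be reproduced without tables. For an integer number of degrees of freedom $r$, the $\chi^2(r)$ tail probability has elementary closed forms (a finite sum times $e^{-x/2}$, plus an erfc term when $r$ is odd), and the threshold $\tau(\alpha)$ can then be found by bisection. The Python sketch below is one way to do this, under the assumption that $r$ is a positive integer.

```python
import math

def chi2_sf(x, r):
    """P(U > x) for U ~ chi^2(r), r a positive integer (closed forms)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if r % 2 == 0:
        # even r: e^{-y} * sum_{k=0}^{r/2-1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, r // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # odd r: erfc(sqrt(y)) + e^{-y} * sum_{k=0}^{(r-3)/2} y^(k+1/2) / Gamma(k+3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for k in range((r - 1) // 2):
        total += math.exp(-y) * term
        term *= y / (k + 1.5)
    return total

def chi2_threshold(alpha, r):
    """tau with P(chi^2(r) > tau) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, r) > alpha:          # expand until the tail is small enough
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, r) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha in (0.25, 0.1, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha:<6} tau = {chi2_threshold(alpha, 1):.2f}")
```

Rounded to two decimals these thresholds agree with Table 3.5; `chi2_threshold(0.05, 4)` gives the corresponding 95% threshold (about 9.49) for the $\chi^2(4)$ density of Figure 3.13.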

3.5.1.1 P-Values

The minimum value $\alpha^*$ of $\alpha$ at which $H_0$ can be rejected is called the p-value. Thus, the smaller the p-value, the stronger the evidence in the data in support of rejecting the null hypothesis and including the term in the model, i.e., the more likely the term should be in the model. We implement this as follows: once we compute $\hat U_n = \tau$, then $p = \alpha^*$ is the tail probability $P(\chi^2(r) > \tau)$ that corresponds to $\tau$ on a $\chi^2$ graph, and so we reject the null hypothesis at any confidence level $c$ such that $c < 1-\alpha^*$. For example, if for a computed $\tau$ we find $p = \alpha^* = .0182$, then we would reject $H_0$ at confidence level $(1-\alpha^*)100\% = 98.18\%$ or lower. For more information, the reader can consult ANOVA discussions in any good statistics book.
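For a single constraint ($r = 1$, as in the cat brain problem) the p-value has the closed form $p = P(\chi^2(1) > \hat U_n) = \operatorname{erfc}\big(\sqrt{\hat U_n/2}\big)$, because a $\chi^2(1)$ variable is the square of a standard normal one. The short Python sketch below uses this identity (our choice of method; a table or a statistics library would serve equally well) and applies it to the $\hat U_n$ values reported in Section 3.5.2.

```python
import math

def p_value_chi2_1(u):
    """p = P(chi^2(1) > u) = P(|Z| > sqrt(u)) = erfc(sqrt(u / 2))."""
    return math.erfc(math.sqrt(u / 2.0))

def reject_h0(u, alpha):
    """Reject H0 at significance level alpha exactly when p < alpha."""
    return p_value_chi2_1(u) < alpha

for label, u in (("Data Set 1", 5.579), ("Data Set 2", 0.365), ("Data Set 3", 15.28)):
    p = p_value_chi2_1(u)
    print(f"{label}: U_n = {u:6.3f}, p = {p:.3g}, reject at 5%: {reject_h0(u, 0.05)}")
```

The Data Set 1 entry recovers the p-value .0182 quoted above, and comparing $p$ with $\alpha$ is equivalent to comparing $\hat U_n$ with the threshold $\tau(\alpha)$.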

3.5.1.2 Alternative Statement

To test the null hypothesis $H_0$, we choose a significance level $\alpha$ and use $\chi^2$ tables to obtain the corresponding threshold $\tau = \tau(\alpha)$ so that $P(\chi^2(r) > \tau) = \alpha$. We next compute $\hat U_n$ and compare it to $\tau$. If $\hat U_n > \tau$, then we reject $H_0$ as false; otherwise, we do not reject the null hypothesis $H_0$.

3.5.2 Application: Cat-Brain Diffusion/Convection Problem

We summarize use of the model comparison techniques outlined above by returning to the cat brain example discussed in detail in [7, 8]. Three sets of experimental data were examined under the null hypothesis $H_0: V = 0$.

For Data Set 1, we found after carrying out the inverse problems over $Q$ and $Q_H$, respectively,

$$J_n(\hat q^n) = 106.15\quad\text{and}\quad J_n(\hat q^n_H) = 180.1.$$

In this case $\hat U_n = 5.579$ (note that $n = 8\ne\infty$), for which $p = \alpha^* = .0182$. Thus, we reject $H_0$ in this case at any confidence level less than 98.18%; that is, we should reject $V = 0$, which suggests convection is important in describing this data set.

For Data Set 2, we found

$$J_n(\hat q^n) = 14.68\quad\text{and}\quad J_n(\hat q^n_H) = 15.35,$$

and thus, in this case, we have $\hat U_n = .365$, which implies we do not reject $H_0$ at any reasonable level of confidence (the p-value is very high, about .55). This is consistent with $V = 0$, which is completely opposite to the findings for Data Set 1.

For the final set (Data Set 3) we found

$$J_n(\hat q^n) = 7.8\quad\text{and}\quad J_n(\hat q^n_H) = 146.71,$$

which yields in this case $\hat U_n = 15.28$. This, as in the case of the first data set, suggests (with $p < .001$) that $V\ne 0$ is important in modeling the data.

The difference in conclusions between the first and last sets and that of the second set is interesting and perhaps at first puzzling. However, when discussed with the doctors who provided the data, it was discovered that the first and last sets were taken from the white matter of the brain, while the other was taken from the grey matter. This latter finding was consistent with observed microscopic tests on the various matter (micro channels in white matter that promote convective “flow”). Thus, it can be suggested, with a reasonably high degree of confidence, that white matter exhibits convective transport, while grey matter does not.

Exercise: Solutions to the MLE

Use the second derivative test to verify that the expressions in equations (3.48) and (3.49) for $q_{MLE}$ and $\sigma^2_{MLE}$, respectively, do indeed maximize (3.46).

Project: Statistical Analysis in Inverse Problems Using Simulated Data

The aim of this project is to apply the statistical analysis for inverse problems to the exercise described in Chapter 2. In particular, we use the harmonic oscillator (mass-spring-dashpot) model given by

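$$m\frac{d^2y(t)}{dt^2} + c\frac{dy(t)}{dt} + ky(t) = 0, \qquad y(t_0) = y_0, \quad \frac{dy(t_0)}{dt} = v_0,$$

or

$$\frac{d^2y(t)}{dt^2} + C\frac{dy(t)}{dt} + Ky(t) = 0, \qquad y(t_0) = y_0, \quad \frac{dy(t_0)}{dt} = v_0.$$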

The above two models are equivalent when $C = c/m$ and $K = k/m$, provided $m \neq 0$. In general, the coefficients $C$ and $K$ are unknown parameters. These parameters can be estimated via a nonlinear least squares estimation problem. Specifically, one seeks $\vec{q} = (C, K)$ to minimize the cost function

$$J(\vec{q}) = \sum_{i=1}^{n} \left| y_m(t_i; \vec{q}) - y_i^d \right|^2,$$

where $y_m(t_i; \vec{q})$ is the solution of the spring-mass-dashpot model at time $t_i$, $i = 1, 2, \ldots, n$, for the parameter set $\vec{q}$, and $y_i^d$ is the data (displacement) collected at the same time $t_i$. In this exercise, we will create “simulated” data to be used for estimating the unknown parameters $\vec{q} = (C, K)$. For this, we assume that displacement is sampled at equally spaced time intervals. We subdivide the time interval $[0, 5]$ into $n$ equal subintervals of length $h = 5/n$. Let $y_i^d$ denote the displacement sampled at time $t_i = ih$, $i = 1, \ldots, n$. To generate these data, use the solution $y(t_i)$ of the spring-mass-dashpot system corresponding to $C = 1$ and $K = 1.5$, and add to each simulated data point an error term as follows:

$$y_i^d = y(t_i) + nl \cdot \mathrm{rand}_i,$$

where $\mathrm{rand}_i$ are normally distributed random numbers with zero mean and variance 1.0. Use the MATLAB routine randn to generate an $n$-vector with random entries. Here, $nl$ is a noise level constant.

For each of the values $nl = 0.01$, $0.02$, $0.05$, $0.1$, and $0.2$:

1). Estimate the parameters $C$ and $K$ using the ordinary least squares method.

2). Compute the standard error for each parameter.

3). Report the covariance matrix and discuss the off-diagonal elements.

4). Compare your computed values with the “true” values (which we can compute because we know the “true” standard deviation, $nl$).
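A minimal sketch of steps 1) through 3) in Python (standard library only) may help in checking results. Several choices here are assumptions, not part of the project statement: $t_0 = 0$, $y_0 = 2$, $v_0 = 0$, $n = 100$, the closed-form solution of the normalized model in place of a numerical ODE solver, a Gauss-Newton iteration with step halving in place of whatever optimizer one would use in MATLAB, and `random.gauss` in place of randn. Standard errors use the usual asymptotic OLS formulas (assumed here to match those developed earlier in the chapter): $\hat\sigma^2 = J(\hat q)/(n-2)$ and $\hat\Sigma = \hat\sigma^2(\chi^T\chi)^{-1}$, with $\chi$ the $n \times 2$ sensitivity matrix, approximated by finite differences.

```python
import math
import random

# Assumptions not fixed by the project statement: t0 = 0, y(0) = 2, y'(0) = 0.
Y0, V0 = 2.0, 0.0


def model(t, C, K, y0=Y0, v0=V0):
    """Closed-form solution of y'' + C y' + K y = 0 with y(0) = y0, y'(0) = v0."""
    a = C / 2.0
    disc = K - a * a
    b = v0 + a * y0
    if disc > 1e-12:                              # underdamped
        w = math.sqrt(disc)
        return math.exp(-a * t) * (y0 * math.cos(w * t) + b / w * math.sin(w * t))
    if disc < -1e-12:                             # overdamped
        mu = math.sqrt(-disc)
        return math.exp(-a * t) * (y0 * math.cosh(mu * t) + b / mu * math.sinh(mu * t))
    return math.exp(-a * t) * (y0 + b * t)        # critically damped


def cost(q, ts, yd):
    """OLS cost J(q) = sum_i |y_m(t_i; q) - y_i^d|^2."""
    return sum((model(t, *q) - y) ** 2 for t, y in zip(ts, yd))


def sensitivities(q, ts, h=1e-6):
    """n x 2 sensitivity matrix d y_m / d(C, K) by central differences."""
    rows = []
    for t in ts:
        row = []
        for j in range(2):
            qp, qm = list(q), list(q)
            qp[j] += h
            qm[j] -= h
            row.append((model(t, *qp) - model(t, *qm)) / (2 * h))
        rows.append(row)
    return rows


def inv_normal(X):
    """Inverse of the 2 x 2 matrix X^T X."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


def estimate(nl, n=100, q_true=(1.0, 1.5), q_init=(0.8, 1.2), seed=0):
    """Simulate noisy data at noise level nl; return (q_hat, cov, se, sigma2)."""
    rng = random.Random(seed)
    h = 5.0 / n
    ts = [i * h for i in range(1, n + 1)]         # t_i = i h, i = 1..n
    yd = [model(t, *q_true) + nl * rng.gauss(0.0, 1.0) for t in ts]

    q = list(q_init)
    J = cost(q, ts, yd)
    for _ in range(50):                           # Gauss-Newton with step halving
        X = sensitivities(q, ts)
        r = [y - model(t, *q) for t, y in zip(ts, yd)]
        g = [sum(X[k][j] * r[k] for k in range(n)) for j in range(2)]
        F_inv = inv_normal(X)
        step = [F_inv[i][0] * g[0] + F_inv[i][1] * g[1] for i in range(2)]
        lam = 1.0
        while lam > 1e-8:
            trial = [q[i] + lam * step[i] for i in range(2)]
            J_trial = cost(trial, ts, yd)
            if J_trial < J:
                break
            lam /= 2.0
        else:
            break                                 # no further decrease: stop
        q, J = trial, J_trial

    sigma2 = J / (n - 2)                          # estimate of the noise variance
    F_inv = inv_normal(sensitivities(q, ts))
    cov = [[sigma2 * F_inv[i][j] for j in range(2)] for i in range(2)]
    se = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
    return q, cov, se, sigma2


if __name__ == "__main__":
    for nl in (0.01, 0.02, 0.05, 0.1, 0.2):
        q, cov, se, s2 = estimate(nl)
        print(f"nl={nl}: C={q[0]:.4f}, K={q[1]:.4f}, SE=({se[0]:.4f}, {se[1]:.4f}), "
              f"cov(C,K)={cov[0][1]:.3e}, sigma_hat={math.sqrt(s2):.4f}")
```

The diagonal of `cov` holds the squared standard errors and the off-diagonal entry is the covariance between the estimates of $C$ and $K$. For item 4), the “true” standard errors follow from the same formula with $\hat\sigma^2$ replaced by $nl^2$.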

Project: Hypothesis Testing Using Experimental Data

The aim of this project is to use hypothesis testing to investigate phenomena represented in experimental data. In particular, we use the harmonic oscillator (mass-spring-dashpot) model to describe vibrational data from a cantilever beam. The mathematical model is given by

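$$m\frac{d^2y(t)}{dt^2} + c\frac{dy(t)}{dt} + ky(t) = 0, \qquad y(t_0) = y_0, \quad \frac{dy(t_0)}{dt} = v_0,$$

or

$$\frac{d^2y(t)}{dt^2} + C\frac{dy(t)}{dt} + Ky(t) = 0, \qquad y(t_0) = y_0, \quad \frac{dy(t_0)}{dt} = v_0.$$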

The above two models are equivalent when $C = c/m$ and $K = k/m$, provided $m \neq 0$.


1.) Data collection: Excite the cantilever beam with a sinusoidal input at the first fundamental frequency of the beam (approximately 6.5 Hz). Use the piezoceramic patches as the exciting actuator. If the measured data is the displacement (e.g., obtained with a proximity probe), you would terminate the exciting input to the beam at $t = t_0$ when $\frac{dy}{dt}(t_0) = 0$ (see Figure 3.14) and measure $y_0 = y(t_0)$ (so $y_0$ and $v_0$ are assumed to be given by observations).

FIGURE 3.14: Beam excitation (the input is terminated at time $t_0$).

However, for this exercise you will take data $a_i^d$ on $[t_0, t_1]$ using an accelerometer; these are observations of the acceleration $\ddot{y}_{mod}(t_i; q)$. Here $q$ denotes the unknown parameters in the model. Note that, since the measured data are accelerations, the initial displacement and velocity, $y_0$ and $v_0$, are also unknown (in addition to $C$ and $K$).

2.) Formulate and carry out the corresponding inverse problem for $q = (C, K, y_0, v_0)$ with

$$J_n(q) = \sum_{i=1}^{n} \left| a_i^d - \ddot{y}_{mod}(t_i; q) \right|^2.$$

(a) Estimate $q^* = (C^*, K^*, y_0^*, v_0^*)$ from the data, obtaining $q_n^* = (C_n^*, K_n^*, y_{0n}^*, v_{0n}^*)$ so that $J_n(q_n^*)$ is the residual.

(b) Estimate $q^{**} = (0, K^{**}, y_0^{**}, v_0^{**})$ (the undamped model), obtaining $q_n^{**} = (0, K_n^{**}, y_{0n}^{**}, v_{0n}^{**})$ so that $J_n(q_n^{**})$ is the residual.


(c) For parts (a) and (b), compute the covariance matrix, standard errors, and confidence intervals.

(d) Use the $\chi^2(1)$ test to test the significance of the improved fit to the data obtained by allowing nontrivial damping $C \neq 0$ in the model. Compute the associated p-value.

3.) Repeat 1.) and 2.) above by exciting the beam with a sinusoidal input at the second fundamental frequency of the beam (approximately 55 Hz). In this case, because the solution is highly oscillatory, it might be difficult to obtain good initial guesses for the unknown parameters to start the optimization process. One approach to overcome this difficulty is to consider the problem in the frequency domain, using the fast Fourier transform (fft) of the data and of the model solution. One first modifies the parameters $C$ and $K$ so that the frequencies of the solution and the data are close. Next, one modifies the initial conditions $y_0$ and $v_0$ so that the magnitudes of the fft of the solution and of the data are also similar. These values can then be used as initial guesses to carry out the inverse problem in the time domain.
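A rough sketch of the frequency-domain step, assuming uniformly sampled data: the dominant frequency of the record is read off the largest one-sided spectral magnitude, and an initial stiffness is obtained by inverting the damped natural frequency of the normalized model, $\omega_d = \sqrt{K - C^2/4}$ (valid in the underdamped case $K > C^2/4$). A plain $O(n^2)$ discrete Fourier transform stands in for an FFT here (same spectrum, only slower); the helper names `dominant_frequency` and `initial_stiffness` are hypothetical, and a guess for $C$ has to be supplied separately. For the 55 Hz mode the sampling rate must exceed 110 Hz.

```python
import cmath
import math


def dominant_frequency(samples, dt):
    """Frequency (Hz) of the largest-magnitude DFT bin, excluding the DC bin."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]               # remove any DC offset
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):                # one-sided spectrum
        X = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        if abs(X) > best_mag:
            best_k, best_mag = k, abs(X)
    return best_k / (n * dt)                      # bin spacing is 1 / (n dt)


def initial_stiffness(f_hz, C_guess):
    """Hypothetical helper: solve omega_d = sqrt(K - C^2/4) for K."""
    omega_d = 2.0 * math.pi * f_hz
    return omega_d ** 2 + C_guess ** 2 / 4.0


if __name__ == "__main__":
    dt = 1.0 / 200.0                              # assumed 200 Hz sampling
    ts = [m * dt for m in range(400)]
    C_true, f_true = 0.5, 5.0
    wd = 2.0 * math.pi * f_true
    data = [math.exp(-C_true * t / 2.0) * math.cos(wd * t) for t in ts]
    f_est = dominant_frequency(data, dt)
    print(f_est, initial_stiffness(f_est, C_true))
```

The frequency resolution is $1/(n\,\Delta t)$, so longer records give sharper initial guesses for $K$.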

References

[1] H. T. Banks, J. E. Banks, L. K. Dick and J. D. Stark, Estimation of dynamic rate parameters in insect populations undergoing sublethal exposure to pesticides, CRSC-TR05-22, May, 2005; Bulletin of Mathematical Biology, 69, 2007, pp. 2139–2180.

[2] H. T. Banks, M. Davidian, J. R. Samuels, Jr., and K. L. Sutton, An Inverse Problem Statistical Methodology Summary, CRSC-TR08-01, January, 2008; Chapter XX in Statistical Estimation Approaches in Epidemiology (edited by Gerardo Chowell, Mac Hyman, Nick Hengartner, Luis M. A. Bettencourt and Carlos Castillo-Chavez), Springer, Berlin Heidelberg New York, to appear.

[3] H. T. Banks, S. Dediu and S. E. Ernstberger, Sensitivity functions and their uses in inverse problems, J. Inverse and Ill-posed Problems, 15, 2007, pp. 683–708.

[4] H. T. Banks, S. L. Ernstberger and S. L. Grove, Standard errors and confidence intervals in inverse problems: Sensitivity and associated pitfalls, CRSC-TR06-10, March, 2006; J. Inverse and Ill-posed Problems, 15, 2006, pp. 1–18.

[5] H. T. Banks and B. G. Fitzpatrick, Inverse problems for distributed systems: statistical tests and ANOVA, LCDS/CCS Rep. 88-16, July, 1988, Brown University; Proc. International Symposium on Math. Approaches to Envir. and Ecol. Problems, Springer Lecture Note in Biomath., 81, 1989, pp. 262–273.

[6] H. T. Banks and B. G. Fitzpatrick, Statistical methods for model comparison in parameter estimation problems for distributed systems, CAMS Tech. Rep. 89-4, September, 1989, University of Southern California; J. Math. Biol., 28, 1990, pp. 501–527.

[7] H. T. Banks and P. Kareiva, Parameter estimation techniques for transport equations with application to population dispersal and tissue bulk flow models, J. Math. Biol., 17, 1983, pp. 253–272.

[8] H. T. Banks and K. Kunisch, Estimation Techniques for Distributed Parameter Systems, Birkhauser, Boston, 1989.

[9] R. J. Carroll and D. Ruppert, Transformation and Weighting in Regression, Chapman & Hall, New York, 1988.


[10] G. Casella and R. L. Berger, Statistical Inference, Duxbury, California, 2002.

[11] M. Davidian and D. Giltinan, Nonlinear Models for Repeated Measurement Data, Chapman & Hall, London, 1998.

[12] B. Fitzpatrick, Statistical Methods in Parameter Identification and Model Selection, Ph.D. Thesis, Division of Applied Mathematics, Brown University, Providence, RI, 1988.

[13] A. R. Gallant, Nonlinear Statistical Models, Wiley, New York, 1987.

[14] F. Graybill, Theory and Application of the Linear Model, Duxbury, North Scituate, MA, 1976.

[15] R. I. Jennrich, Asymptotic properties of non-linear least squares estimators, Ann. Math. Statist., 40, 1969, pp. 633–643.

[16] M. Kot, Elements of Mathematical Ecology, Cambridge University Press, Cambridge, 2001.

[17] G. A. F. Seber and C. J. Wild, Nonlinear Regression, J. Wiley & Sons, Hoboken, NJ, 2003.

Chapter 4

Mass Balance and Mass Transport

4.1 Introduction

Mass transfer is important in many areas of science and engineering. Many familiar phenomena involve mass transfer:

• The spreading of odorous gas in a room.

• Water in an open pail evaporating into the surrounding air.

• A piece of sugar added to a cup of coffee eventually dissolving by itself into the surrounding solution.

• Transport of chemical substances into red blood cells.

• Transport of O₂ throughout the human body, both systemic and cellular.

The most elementary approach to mass transport is compartmental analysis. Compartmental modeling has been and is being used widely in many branches of biology, in biomedicine, and in pharmacokinetics, as well as in physical modeling. Indeed, one can find examples of compartmental modeling in almost any issue of the major journals in physiology and pharmacology. In addition, there are several books that cover both the theory and applications of compartmental modeling (e.g., [3, 4]), while several other books have chapters giving introductions to compartmental analysis and its applications (see, for instance, [1, 5, 6, 7, 8]).

4.2 Compartmental Concepts

A compartment is an abstraction used often in biological (and other scientific) models. It may, of course, be a physical entity: a distinct space having discernible boundaries across which material (energy) moves at a measurable rate (and for which, as a rule, an “inside” and an “outside” are readily distinguishable). More generally, we might take as a compartment any anatomical, physiological, chemical, or physical subdivision of a system throughout which the behavior (e.g., concentration) of a given substance is uniform. It can also be useful to compartmentalize in terms of different types of molecules or chemical forms (e.g., hemoglobin, red blood cells, blood plasma). We might then make a formal definition of a compartment as follows: if a substance $S$ is present in a system in several distinguishable forms or locations, and if $S$ passes from one form or location to another at a measurable rate, then each form or location constitutes a separate compartment for $S$.

The compartment concept represents a system as a set of interconnecting components or subsystems. We further remark that the compartments (subsystems) do not always correspond to physically identifiable components. A couple of very simple examples serve to illustrate this concept. In studying certain diseases, it is convenient to regard each stage of the disease as a compartment and to construct a mathematical model based on the transfer between stages. Another common example, from tracer studies, involves red blood cells suspended in an isotonic (uniform tension or osmotic pressure) fluid. In this case one might be interested in the concentration of radioactive potassium ions in the millions of separate physical compartments (the individual cells). However, for modeling uptake phenomena, one would most likely consider the collection of red blood cells as a whole and formulate a two-compartment model consisting of a fluid compartment and a red blood cell “compartment.”

These examples illustrate the fact that it is the behavior of a substance $S$ in a system, and not necessarily the physical situation itself, that determines the compartmentalization of the system. Differences in how investigators perceive this “behavior” often lead to the dramatically different compartmentalizations of a given system found in the literature. For example, one might be surprised at the wide range of models used to describe the glucose homeostatic system in mammals.

In the modeling of mass transport between compartments, several assumptions are commonly made. Among these are:

(i) constant-volume compartments,

(ii) well-mixed compartments, and

(iii) for systems in which transport is across a membrane, constancy in time of the transport coefficient $K$ (discussed further below).

Whether any or all of these assumptions can be justified depends very much on the nature of the phenomena and systems being modeled. While the decision to posit (i) is usually rather straightforward, support for (ii) is often more difficult. There are a number of major contributors to rapid distribution within a compartment, including

(a) stirring or mixing by currents within the body of the solution,

(b) transportation (convection) by a flowing stream, and

(c) diffusion (thermal motion of solute molecules).

Mechanisms (a) and (b) can produce good mixing even when the distances (compartment sizes) are substantial, while (c) alone usually suffices only in the case of small-volume compartments.

The convenience of this type of decomposition (compartmentalization) is that it leads directly to a set of equations based on simple balance relations. This can be stated simply as:

$$\text{change in compartment } j = \left(\text{sum of all transfers into compartment } j\right) - \left(\text{sum of all transfers out of compartment } j\right) + \left(\text{creation within compartment } j\right) - \left(\text{destruction within compartment } j\right).$$

4.3 Compartment Modeling

To illustrate the ideas behind the concept of a compartment and how it is used, we discuss the simplest example, a two-compartment model. Consider two chambers separated by a membrane, with solute $S$ and water in each chamber (see Figure 4.1).

FIGURE 4.1: Two chamber compartments separated by a membrane (chamber 1: $V_1$, $S_1$; chamber 2: $V_2$, $S_2$).

Assume that each chamber is well-mixed (or well-stirred); that is, when the solute $S$ is added to the water, it is instantly distributed throughout the chamber. Natural mixing is slower in a liquid than in a gas and slowest in a solid; in practice, the well-mixed state can be achieved by stirring or by convection in a stream.

Naturally, one is faced with the following questions. What are the compartments? How many are there? The answers depend very much on how the solute behaves in the system.

• If the membrane is highly permeable (full of holes), one compartment is adequate to describe the concentration of solute. In this case, equilibration is essentially instantaneous.

• If the membrane is impermeable (no transport across the membrane occurs), only one compartment (the one to which solute is added) is needed to model the solute concentration.

• If the membrane is partially permeable, two compartments are needed, and transport of solute between the compartments must be modeled.

In addition, in the modeling of mass transport between compartments separated by a membrane, the above assumptions (i)-(iii) are usually made. The basic parameter involved in membrane-separated compartmental exchange is called the transport coefficient. It is usually denoted by $K$ and is proportional to physical properties of the membrane separating the compartments. In particular,

$$K \propto \frac{A}{\delta}, \qquad K = c\,\frac{A}{\delta},$$

where $c$, the proportionality constant, is called the membrane permeability coefficient and has units of $\mathrm{m^2/sec}$, $A$ is the cross-sectional area (in $\mathrm{m^2}$), and $\delta$ is the thickness of the membrane (in m). This implies that $K$ has units of

$$\frac{\mathrm{m^2}}{\mathrm{sec}} \cdot \frac{\mathrm{m^2}}{\mathrm{m}} = \frac{\mathrm{m^3}}{\mathrm{sec}},$$

which is the rate at which a volume of substance is transported across the membrane. For this reason $K$ is also sometimes called the volumetric rate.

Recall the definition of mass density given by

$$\rho = \text{mass density (or mass concentration)} = \frac{\text{mass of solute}}{\text{volume of solution}} \quad \text{in } \frac{\mathrm{kg}}{\mathrm{m^3}}.$$

By letting $V$ denote the volume of the compartment into or out of which we are modeling solute transport, we can now write

$$\left(\text{rate of change of mass in compartment}\right) = \left(\text{volumetric rate}\right) \times \left(\text{mass density}\right).$$

Then, we have

$$\frac{d}{dt}m = K\rho.$$

We next consider using these concepts in a two-compartment model such as that depicted in Figure 4.1. In this formulation we assume:

• A two-compartment system labeled 1 and 2 with constant volumes $V_1$ and $V_2$.

• A solute is present and is transported between compartments across the membrane with transport coefficients $K_{1,2}$ (from 1 to 2) and $K_{2,1}$ (from 2 to 1), which for the moment are not assumed to be equal. The masses of solute in compartments 1 and 2 are denoted by $m_1$ and $m_2$, respectively.

Simple mass balance considerations in compartment 1 lead to the following differential equation:

$$\frac{d}{dt}m_1 = (\text{rate of transfer into 1}) - (\text{rate of transfer out of 1}) = K_{2,1}\rho_2 - K_{1,2}\rho_1.$$

Similarly, for compartment 2 we obtain

$$\frac{d}{dt}m_2 = K_{1,2}\rho_1 - K_{2,1}\rho_2.$$

We may rewrite this in terms of concentrations (or densities) by using $\rho_i = m_i/V_i$ to obtain

$$\frac{d}{dt}\rho_1 = \frac{1}{V_1}\left[K_{2,1}\rho_2 - K_{1,2}\rho_1\right], \qquad \frac{d}{dt}\rho_2 = \frac{1}{V_2}\left[K_{1,2}\rho_1 - K_{2,1}\rho_2\right].$$

From the above calculations, we observe the following important consequences:

• If we assume $K_{1,2} = K_{2,1} = K$, then

$$\frac{dm_1}{dt} = K(\rho_2 - \rho_1), \qquad \frac{dm_2}{dt} = K(\rho_1 - \rho_2) = -\frac{dm_1}{dt},$$

as required by the law of conservation of mass. If $\rho_2 > \rho_1$, then $\frac{dm_1}{dt} > 0$; that is, the mass of solute in chamber 1 increases due to movement of solute from chamber 2 (high concentration) to chamber 1 (low concentration).

This type of mass transport is called passive transport or molecular (membrane) diffusion. (It is very similar to the manner in which heat is transported in a rod: if an observer holds one end of a rod and the other end is heated, the part that is held will become hotter even though it is not in direct contact with the heat source. Heat is thus said to be transported (conducted) from regions of high temperature to regions of low temperature.)

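• In terms of mass concentrations (or mass densities) we have

$$\frac{d\rho_1}{dt} = \frac{K}{V_1}(\rho_2 - \rho_1), \qquad \frac{d\rho_2}{dt} = \frac{K}{V_2}(\rho_1 - \rho_2).$$

Note that $\frac{d\rho_1}{dt} \neq -\frac{d\rho_2}{dt}$ unless $V_1 = V_2$. That is, in general, we do not have concentration balance.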

It is important to note that we have mass conservation and not concentration (or density) conservation.
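A minimal numerical check of these statements, in Python (standard library only): the concentration equations for the two-compartment model are integrated with the classical fourth-order Runge–Kutta method (a choice made here, not taken from the text), and the total mass $V_1\rho_1 + V_2\rho_2$ is tracked. The function names and the sample values ($V_1 = 1$, $V_2 = 2$, $K = 0.5$, $\rho_1(0) = 3$, $\rho_2(0) = 0$) are illustrative only.

```python
def two_compartment_rhs(rho1, rho2, K12, K21, V1, V2):
    """d(rho1)/dt and d(rho2)/dt for the two-compartment model."""
    flow = K12 * rho1 - K21 * rho2        # net mass transfer rate from 1 to 2
    return -flow / V1, flow / V2


def simulate(rho1, rho2, K12, K21, V1, V2, T, steps=2000):
    """Integrate to time T with the classical fourth-order Runge-Kutta method."""
    dt = T / steps

    def f(a, b):
        return two_compartment_rhs(a, b, K12, K21, V1, V2)

    for _ in range(steps):
        k1 = f(rho1, rho2)
        k2 = f(rho1 + dt / 2 * k1[0], rho2 + dt / 2 * k1[1])
        k3 = f(rho1 + dt / 2 * k2[0], rho2 + dt / 2 * k2[1])
        k4 = f(rho1 + dt * k3[0], rho2 + dt * k3[1])
        rho1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        rho2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return rho1, rho2


if __name__ == "__main__":
    V1, V2 = 1.0, 2.0
    r1, r2 = simulate(3.0, 0.0, 0.5, 0.5, V1, V2, T=20.0)
    print("densities:", r1, r2, " total mass:", V1 * r1 + V2 * r2)
```

With $K_{1,2} = K_{2,1}$ both densities approach the common value $(V_1\rho_1(0) + V_2\rho_2(0))/(V_1 + V_2) = 1$: $\rho_1$ falls by 2 while $\rho_2$ rises only by 1, so mass is conserved but concentration is not. With unequal coefficients the equilibrium instead satisfies $K_{1,2}\rho_1 = K_{2,1}\rho_2$.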

There are several advantages as well as disadvantages that arise when using simple compartmental models.

Advantages:

It is relatively straightforward to write down the mass balance equations (input-output relations). In addition, the resulting model is a set of ordinary differential equations, which are often rather easy to solve analytically or numerically.

Disadvantages:

The solution is assumed to be well-mixed. To see the inherent limitations, consider, for example, dropping a blue liquid dye into a bucket of water. The dye will diffuse slowly to other parts of the water; that is, the concentration of the dye differs from one part of the bucket to another. Often, one can satisfy the well-mixed assumption by considering very small volumes or by stirring the compartment. However, this is not always reasonable. For example, a drug transported through the liver will have different concentrations in different parts of the liver, and a drug injected into the systemic blood may have different concentrations in different parts of the circulatory system.


4.4 General Mass Transport Equations

Recall from compartmental analysis that

$$\frac{dm_1}{dt} \propto (\rho_2 - \rho_1);$$

that is, the rate of change of mass is proportional to the concentration difference. This type of transport process is known as molecular diffusion. To illustrate this concept, consider the movement of individual molecules, say A and B, in a fluid as depicted in Figure 4.2.

FIGURE 4.2: Binary molecules movement (molecules A and B distributed between regions (1) and (2)).

Suppose that there are more A molecules near region (1) than near region (2). Since molecules move randomly in both directions, more A molecules will move from (1) to (2) than from (2) to (1). The net transport of A is from a high concentration region to a low concentration region; this is molecular diffusion.

We further remark that:

• As molecules move, they change direction by bouncing off other molecules after collisions. Since they travel in a random path, molecular diffusion is also called a random walk process.

• To increase the rate of mixing of a substance in solution, the liquid can be mechanically agitated, and convective mass transfer will then occur (due to movement of the bulk liquid).

Let us now consider a mixture of several species (labeled with index $i$) in a fluid moving through a pipe, as depicted in Figure 4.3.

FIGURE 4.3: Moving fluid through a pipe (bulk velocity $v$ along the $x$-axis; volume element between $x$ and $x + \Delta x$).

We will formulate a mass balance relationship for species $i$ on a volume element of thickness $\Delta x$ as shown. The general mass balance on species $i$ is

$$\left(\begin{array}{c}\text{rate of accumulation}\\ \text{of mass } i \text{ in}\\ \text{volume element}\end{array}\right) = \left(\begin{array}{c}\text{rate of}\\ \text{mass } i \text{ entering}\\ \text{face } x\end{array}\right) - \left(\begin{array}{c}\text{rate of mass}\\ i \text{ leaving}\\ \text{face } x + \Delta x\end{array}\right) \pm \left(\begin{array}{c}\text{rate of generation (or consumption)}\\ \text{of mass } i \text{ (by metabolism or}\\ \text{chemical reaction)}\end{array}\right).$$

To write down the rates of mass entering and leaving, we need to discuss flux laws for mass transport. (Mass flux is defined as the mass that passes through a unit cross-sectional area per unit time.) We do this first in the case in which the carrier fluid itself is stationary; that is, the bulk fluid velocity $v$ is zero.


4.4.1 Mass Flux Law in a Stationary (Non-Moving) Fluid

Since we are dealing, in general, with multiple species, the “concentrations” of the various species may be expressed in numerous ways. We begin by defining mass density (or mass concentration) at a point $p = (x, y, z)$ by

$$\rho(t, x, y, z) = \frac{dm}{dV} = \lim_{\Delta V \to 0} \frac{1}{\Delta V} \int_{\Delta V} m(t, x, y, z)\, dV,$$

where $\Delta V$ is a small element of volume containing the point $p$, with $m(t, x, y, z)$ being the mass of the particle located at $(x, y, z) \in \Delta V$.

We make the following assumptions for our derivation.

(i) In the small volume element $\Delta V = A\,\Delta x$ (see Figure 4.4), we have good mixing, so that $\rho$ is constant in $\Delta V$.

(ii) Species are uniform in the $y$ and $z$ directions (that is, $\rho = \rho(t, x)$).

FIGURE 4.4: Incremental volume element between $x$ and $x + \Delta x$.

In a diffusing mixture involving multiple species, the various chemical species may be moving at different velocities. Let $v_i$ denote the velocity of species $i$ with respect to a stationary coordinate system. Then we may define the local “mass average velocity” by

$$v = \frac{\sum_{i=1}^{n} \rho_i v_i}{\sum_{i=1}^{n} \rho_i}.$$

In some cases one is interested in the velocities of a given species $i$ relative to $v$ (or perhaps some other velocity) rather than relative to the stationary coordinate system ($v_i$). This leads to the definition of the “relative diffusion velocities” $v_{ir}$ given by

$$v_{ir} = v_i - v = \text{diffusion velocity of } i \text{ relative to } v.$$
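A small sketch of these two definitions (Python; function names and numbers are illustrative). It also checks a consequence that follows directly from them: the mass-weighted relative diffusion velocities always sum to zero, since $\sum_i \rho_i v_{ir} = \sum_i \rho_i v_i - v\sum_i \rho_i = 0$.

```python
def mass_average_velocity(rho, vel):
    """v = sum(rho_i v_i) / sum(rho_i) for species densities rho and velocities vel."""
    return sum(r * v for r, v in zip(rho, vel)) / sum(rho)


def relative_diffusion_velocities(rho, vel):
    """v_ir = v_i - v: velocity of each species relative to the mass average velocity."""
    v = mass_average_velocity(rho, vel)
    return [vi - v for vi in vel]


if __name__ == "__main__":
    rho = [1.2, 0.3, 0.5]          # illustrative densities, kg/m^3
    vel = [0.10, 0.40, -0.05]      # illustrative species velocities, m/s
    v_r = relative_diffusion_velocities(rho, vel)
    print(mass_average_velocity(rho, vel), v_r)
    print(sum(r * w for r, w in zip(rho, v_r)))   # zero up to rounding
```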


We may use the mass balance for species $i$ in the element of volume $\Delta V = A\,\Delta x$ with cross-sectional area $A$ (which may depend on $t$ and/or $x$). If we assume no creation or destruction of mass for the present, we may define $q_i$ by

$$q_i = \text{rate of mass transport of species } i \text{ (with mass concentration } \rho_i).$$

We use the compartmental analysis techniques, treating the element of volume $\Delta V$ as a “thin membrane” between the adjacent “compartments” where the concentrations are $\rho_i(x)$ and $\rho_i(x + \Delta x)$, respectively. We find that $q_i$ is proportional to $A\,\Delta\rho_i/\Delta x$. Then we may write

$$q_i = A D_i \frac{\rho_i(x) - \rho_i(x + \Delta x)}{\Delta x},$$

where the constant of proportionality $D_i$ is called the mass diffusivity constant (in units $\mathrm{m^2/sec}$).

Note that here we have assumed that $A$ is approximately constant over the small volume. The above expression is the rate of mass transport in the incremental volume element. To find the rate of mass transport at an arbitrary point $x$, we let $\Delta x \to 0$ to obtain

$$q_i \to -A D_i \frac{\partial \rho_i}{\partial x}.$$

Recall that the mass flux for species $i$ is $j_i = \dfrac{\text{rate of mass transport}}{\text{cross-sectional area}}$. Hence, we have

$$j_i = -D_i \frac{\partial \rho_i}{\partial x} \qquad \text{with units } \frac{\mathrm{kg}}{\mathrm{m^2 \cdot sec}}. \tag{4.1}$$

The following remarks are in order:

1. This is known as Fick’s first law of diffusion [2], which says that mass flux is proportional to the mass concentration gradient. In general, temperature and pressure gradients and external forces also affect the flux, but their effects are usually minor and are either ignored or treated through dependence of the diffusion coefficient $D_i$ on them.

2. We will later see that Fickian diffusion is very similar to Fourier’s law of heat conduction and Newton’s law of viscosity for momentum transport (in one-dimensional problems).

3. The negative sign in (4.1) agrees with the observation that mass flows from high to low mass concentration. If $\rho_i(x) < \rho_i(x + \Delta x)$, we find that

$$\frac{\partial \rho_i}{\partial x} > 0,$$

and hence the net flow is in the negative $x$-direction.

4. In three-dimensional problems, these concepts all readily generalize, and for mass density $\rho_i(t, x, y, z)$, we find that the mass flux is given by

$$\vec{j}_i = -D_i \nabla \rho_i.$$

5. The mass flux with respect to the stationary coordinates is given by

$$j_i = \rho_i v_i,$$

and the mass flux with respect to the relative diffusion velocity, $j_{ir}$, is given by

$$j_{ir} = \rho_i v_{ir}.$$

4.4.2 Mass Flux in a Moving Fluid

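We assume that the bulk velocity is denoted by $v$, so the total velocity of species $i$ relative to the fixed coordinate system is $v_i = v_{i,\mathrm{diff}} + v$. Hence the total flux of species $i$ relative to a fixed point in the stationary coordinate system is

$$j_i^{\mathrm{tot}} = \rho_i v_i = \rho_i v_{i,\mathrm{diff}} + \rho_i v = j_i^{\mathrm{diff}} + j_i^{\mathrm{bulk}},$$

where we recall that the diffusive flux was given by

$$j_i^{\mathrm{diff}} = -D_i \frac{\partial \rho_i}{\partial x}.$$

Hence, $j_i^{\mathrm{tot}} = -D_i \dfrac{\partial \rho_i}{\partial x} + \rho_i v$.

Now write the mass balance on a small element:

$$\frac{\partial}{\partial t}\left[\rho_i \,\Delta x\, A(t, x)\right] = j_i^{\mathrm{tot}} A\big|_{x} - j_i^{\mathrm{tot}} A\big|_{x + \Delta x} + r_i\, \Delta x\, A,$$

where $r_i$ is the rate of production (destruction) of species $i$ per unit volume. Dividing by $\Delta x$ and taking the limit as $\Delta x \to 0$, we obtain

$$\frac{\partial}{\partial t}(\rho_i A) = -\frac{\partial}{\partial x}\left(j_i^{\mathrm{tot}} A\right) + r_i A = -\frac{\partial}{\partial x}\left(-A D_i \frac{\partial \rho_i}{\partial x} + A \rho_i v\right) + r_i A,$$

or

$$\frac{\partial}{\partial t}(\rho_i A) + \frac{\partial}{\partial x}(\rho_i v A) = \frac{\partial}{\partial x}\left(A D_i \frac{\partial \rho_i}{\partial x}\right) + r_i A.$$

If $A$ is constant, we obtain the usual mass transport equation

$$\frac{\partial}{\partial t}(\rho_i) + \frac{\partial}{\partial x}(\rho_i v) = \frac{\partial}{\partial x}\left(D_i \frac{\partial \rho_i}{\partial x}\right) + r_i,$$

where the second term is identified with convective (or advective) transport, and the third term with diffusive transport.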
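A minimal explicit finite-volume sketch (Python, standard library only) of the constant-area equation with $r_i = 0$ and constant $D$ and $v$. It mirrors the balance used in the derivation: each cell changes only by the difference of the total fluxes $j^{\mathrm{tot}} = \rho v - D\,\partial\rho/\partial x$ through its two faces, so the total mass is conserved exactly when the end faces carry no flux (a closed pipe, assumed here). The upwind choice for the convective flux and the time-step bound are standard numerical choices, not taken from the text, and the function name is illustrative; the sketch assumes $D > 0$ or $v \neq 0$.

```python
import math


def advect_diffuse(rho, dx, D, v, T):
    """Advance rho_t + (rho v)_x = (D rho_x)_x (constant D, v; r = 0) to time T.

    Explicit finite-volume scheme on a closed pipe: zero flux through both ends.
    """
    n = len(rho)
    # Positivity-preserving bound for upwind convection plus central diffusion.
    dt_max = 1.0 / (2.0 * D / dx**2 + abs(v) / dx)
    steps = math.ceil(T / (0.5 * dt_max))         # use half of the bound
    dt = T / steps
    rho = list(rho)
    for _ in range(steps):
        F = [0.0] * (n + 1)                       # F[k]: flux through face k; F[0] = F[n] = 0
        for i in range(n - 1):                    # interior face between cells i and i+1
            upwind = rho[i] if v >= 0 else rho[i + 1]
            F[i + 1] = v * upwind - D * (rho[i + 1] - rho[i]) / dx
        rho = [rho[i] + dt / dx * (F[i] - F[i + 1]) for i in range(n)]
    return rho


if __name__ == "__main__":
    dx, n = 0.05, 200                             # pipe of length 10, cell centres at (i + 1/2) dx
    xs = [(i + 0.5) * dx for i in range(n)]
    rho0 = [math.exp(-((x - 3.0) / 0.3) ** 2) for x in xs]
    rho1 = advect_diffuse(rho0, dx, D=0.01, v=0.5, T=4.0)
    centre = lambda r: sum(x * ri for x, ri in zip(xs, r)) / sum(r)
    print("mass:", sum(rho0) * dx, "->", sum(rho1) * dx)
    print("centre:", centre(rho0), "->", centre(rho1))
```

Because each interior face flux leaves one cell and enters its neighbor, the total mass $\sum_i \rho_i\,\Delta x$ can change only through the end faces. The pulse in the usage example drifts at speed $v$ while spreading under $D$ (plus some numerical diffusion introduced by the upwind choice).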


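The derivation of the single-species equation can be generalized to the multiple-species case to obtain

$$\frac{\partial}{\partial t}\left(\sum_i \rho_i A\right) = -\frac{\partial}{\partial x}\left(\sum_i j_i^{\mathrm{tot}} A\right) + \sum_i r_i A.$$

Since $\sum_i r_i = 0$ (total conservation of mass), $\sum_i j_i^{\mathrm{tot}} = \sum_i \rho_i v_i^{\mathrm{tot}} = \rho v$ (note that the bulk velocity thus agrees with the local mass average velocity $v = \sum_i \rho_i v_i / \sum_i \rho_i$), and $\sum_i \rho_i = \rho$, we have, for constant $A$,

$$\frac{\partial}{\partial t}(\rho) + \frac{\partial}{\partial x}(\rho v) = 0.$$

This is the well-known equation of continuity. All of the above generalizes to three-dimensional problems. In particular, the three-dimensional mass transport equation has the form

$$\frac{\partial}{\partial t}(\rho_i) + \nabla \cdot (\rho_i \vec{v}) = \nabla \cdot (D_i \nabla \rho_i) + r_i.$$

The equation of continuity in three dimensions is given by

$$\frac{\partial}{\partial t}(\rho) + \nabla \cdot (\rho \vec{v}) = 0. \tag{4.2}$$

If $\rho$ is constant, we obtain $\nabla \cdot \vec{v} = 0$. This condition expresses the incompressibility of the fluid in which the solute is contained.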

Special cases:

(1) When the bulk velocity $\vec{v}$ and the reaction rate $r_i$ are both zero, we obtain

$$\frac{\partial}{\partial t}(\rho_i) = \nabla \cdot (D_i \nabla \rho_i),$$

which is called Fick’s second law of diffusion or simply the diffusion equation.

(2) In the case where the bulk velocity $\vec{v}$ is zero, we have

$$\frac{\partial}{\partial t}(\rho_i) = \nabla \cdot (D_i \nabla \rho_i) + r_i,$$

which is known as the reaction-diffusion equation.

(3) When the diffusivity $D_i$ is zero, we have

$$\frac{\partial}{\partial t}(\rho_i) + \nabla \cdot (\rho_i \vec{v}) = r_i.$$

This is known as the plug-flow, ideal tubular, or unmixed flow model. Here the flow of the fluid is orderly, with no element of fluid mixing with or overtaking any other (see Figure 4.5). A necessary and sufficient condition for plug flow is that the residence time is the same for all elements of fluid.


FIGURE 4.5: Plug flow model.

Exercise: Transport Equations

In the literature one also often finds mass transport expressed in terms of the molar concentration $c_i$ and the mass fraction $\omega_i$. This exercise will provide experience in deriving mass transport equations in terms of these variables.

(i) Define the molar concentration $c_i$ of species $i$ by $c_i = \rho_i/M_i$, where $\rho_i$ is the mass concentration (in units $\mathrm{kg/m^3}$) of species $i$ and $M_i$ is the molecular weight (in units kg/mol), so that $c_i$ has units of moles of $i$ per $\mathrm{m^3}$. Define the mass fraction $\omega_i = \rho_i/\rho$, where $\rho$ is the total mass density $\rho = \sum_i \rho_i$.

Use compartmental analysis to argue that the rate of mass transport at a point $x$ is given by

$$q_i = -A \rho D_i \frac{\partial \omega_i}{\partial x}$$

and the mass flux of species $i$ is given by

$$j_i = -\rho D_i \frac{\partial \omega_i}{\partial x}.$$

Explain when this is equivalent to Fick’s first law of diffusion.

(ii) Now use this and mass balance principles to derive the general mass transport equations with diffusive and convective terms in terms of the variable $\omega_i$ (as opposed to in terms of $\rho_i$ as done earlier in this chapter).

References

[1] J. J. Batzel, F. Kappel, D. Schneditz and H. T. Tran, Cardiovascular and Respiratory Systems: Modeling, Analysis, and Control, SIAM, Philadelphia, 2006.

[2] C. J. Geankoplis, Transport Processes and Unit Operations, Prentice Hall, Englewood Cliffs, 1993.

[3] K. Godfrey, Compartmental Models and Their Applications, Academic Press, New York, 1983.

[4] J. A. Jacquez, Compartmental Analysis in Biology and Medicine, The University of Michigan Press, Ann Arbor, 1985.

[5] R. K. Nagle, E. B. Saff and A. D. Snider, Fundamentals of Differential Equations and Boundary Value Problems, Pearson Education, Inc., Boston, 2004.

[6] M. Reddy, R. S. Yang, M. E. Andersen and H. J. Clewell, III, Physiologically Based Pharmacokinetic Modeling: Science and Applications, Wiley-Interscience, Malden, 2005.

[7] S. Strauss and D. W. A. Bourne, Mathematical Modeling of Pharmacokinetic Data, CRC Press, Boca Raton, 1995.

[8] G. G. Walter and M. Contreras, Compartmental Modeling with Networks, Birkhauser, Boston, 1999.


Chapter 5

Heat Conduction

5.1 Motivating Problems

5.1.1 Radio-Frequency Bonding of Adhesives

Radio-frequency (RF) curing of adhesives is a commercially important process which is used in a number of applications. These include the fixation of prosthetic joints in some fields of medicine, the acceleration of adhesive setting in the woodworking industry, and the bonding of parts in the automotive industry. More specifically, in the automobile industry, the use of non-metallic automotive exterior body panels has grown significantly over the last decade. The most common of these materials is sheet molding compound (SMC), a glass-reinforced polyester which provides corrosion resistance, weight reduction, and complex shape molding capability. These parts are typically molded in two layers and adhesively bonded in sandwich fashion around their perimeters to form rigid structures.

The adhesive is commonly applied in a viscous liquid or paste form. Radio-frequency, or dielectric, heating is often used to accelerate the cure rate of the adhesive. In this application, the SMC/adhesive/SMC joint is placed between two electrodes (Figure 5.1). These electrodes then make contact with the joint, compressing it to the desired adhesive bonding thickness. A high-voltage electric field, oscillating at approximately 30 MHz, then passes through the joint for a predetermined period of time at preset power levels, exciting polar or ionic species in the adhesive materials and generating heat. In comparison to common adhesives, the SMC is dielectrically relatively inactive. Significant heat can nevertheless be generated within the adhesive, causing it to rapidly undergo a phase transition from liquid to solid (curing) and effectively bonding the two substrates to each other. This process, which can be closely simulated on a laboratory scale using a smaller version of the RF bonding equipment described above, provides us with a physically interesting problem. We must deal with thermally dependent nonlinearities arising from the radio-frequency field itself (i.e., temperature-dependent input terms as well as conductivities), and with complex internal phase transitions which are parametrized by the degree of cure. This process thus also provides us with a problem that is mathematically very interesting. It combines serious modeling issues, mathematical analysis, and computational methodology, while providing a foundation for the necessary parameter estimation problems and for nonlinear control methodology development.

FIGURE 5.1: Diagram of SMC-adhesive-SMC joint (layers from top to bottom: electrode, SMC, adhesive, SMC, electrode).

This industrial problem was a joint collaborative effort between scientists at Lord Corporation (Cary, North Carolina) and faculty and graduate students at North Carolina State University. The goal was to model the radio-frequency curing of epoxy adhesives in the bonding of composites. For a detailed development of the mathematical model for the heat transfer through the joint, we refer the reader to [1]. The model is a version of the “heat equation” of Fourier fame, plus terms that take into account the internal exothermic reaction (which is part of the curing process) as well as the heat generated by the conversion of electrical energy to molecular vibrational energy.

5.1.2 Thermal Testing of Structures

Recently, in connection with the use of fiber-reinforced composite materials as well as more traditional composite metal alloys in aerospace structures, there has been growing interest in the detection and characterization of structural flaws (e.g., cracks, delamination, and corrosion) that may not be detectable by visual inspection. An evaluation procedure for such damage detection is of paramount importance in the context of aging aircraft (both civilian and military). One recent effort has focused on nondestructive evaluation (NDE) methods based on the measurement of thermal diffusivity in composite materials (see, e.g., [6]). The idea of this approach is illustrated in Figure 5.2.

FIGURE 5.2: A schematic diagram of the NDE method for the detection of structural flaws (heat source and sensor shown for a smooth and for a corroded surface). The sensor measures the surface temperature, and the measured temperature is different for the smooth surface than for the corroded one.

In [2] the search for structural flaws in materials is formulated as an inverse problem for a heat diffusion system. From a physical point of view, the system state is the temperature distribution as a function of time and space. The boundary input represents the thermal source (for example, a laser beam). The output corresponds to the observation of the temperature distribution at the surface of the material (for example, by an infrared imager); see Figure 5.2 and [6] for more details. The problem is then to identify, from input and output data, the geometrical structure of the boundary (i.e., the corroded surface). The mathematical model, which relates the front surface temperature (the output data) to the back surface “geometry,” is described by the heat equation with appropriate initial and boundary conditions (see [2] for a detailed description).

5.2 Mathematical Modeling of Heat Transfer

5.2.1 Introduction

In addition to the two examples discussed in §5.1, the transfer of energy in the form of heat occurs in numerous industrial production problems, including those in the chemical and paper industries and many other production processes. For example, heat transfer occurs in the drying of lumber, the chilling of food and biological materials, combustion problems (burning of fuel), and evaporation processes.

In general, heat transfer is energy in transit due to temperature differences, and hence “energy balance” is the underlying conservation principle. This transit of energy can occur through conduction, convection, and/or radiation.


• Conduction. Conduction generally refers to heat transfer related to molecular activity. It may be viewed as the transfer of energy from the more energetic to the less energetic particles of a substance or material due to direct interaction between the particles. This type of transfer is present to some extent in all solids, gases, and liquids in which a temperature gradient exists. It is associated with an empirically based rate formulation known as Fourier’s law, discussed below. The conduction mode of heat transfer can be related to the random motion of molecules in a gas or substance undergoing no bulk (macroscopic) motion, and it is therefore termed diffusion of energy or heat diffusion.

• Convection. Heat transfer can also occur in a gas or fluid undergoing bulk or macroscopic motion. The molecules, or aggregates of molecules, move collectively and, in the presence of temperature differences, give rise to energy transfer. The molecules retain, of course, their random motion. The energy transferred is therefore a superposition of transfer due to the random motion of particles and transfer due to the bulk motion of the fluid. The cumulative transport is usually called convection, while the transfer due to bulk motion alone is called advection, although this distinction is not always made clearly. In modeling, one can distinguish forced convective heat transfer, where a fluid is forced to flow past a solid surface (by a pump, for example), from natural or free convection. Natural convection most often arises when a gas or fluid is in contact with a surface at a different temperature. The resulting temperature differences in the fluid produce density differences, which drive a circulation. In either case, the associated empirical rate “law” is called Newton’s law of cooling.

• Radiation. Thermal radiation refers to energy emitted by matter at a finite positive (absolute) temperature. This is usually attributed to changes in the electron configurations of atoms and molecules that result in the emission of energy via electromagnetic waves or photons, and it may occur in solids, fluids, or gases. The most important example of radiation is the transport of heat from the sun to the earth. The associated quantitative rate “law” is the Stefan-Boltzmann law.

5.2.2 Fourier’s Law of Heat Conduction

For general molecular transport, all three main types of rate transfer processes (momentum transfer, heat transfer, and mass transfer) are characterized by the same general type of equation:

$$\text{rate of a transfer process} = \frac{\text{driving force}}{\text{resistance}}. \tag{5.1}$$


Equation (5.1) simply states that in order to transfer a property (for example, heat), a driving force needs to overcome a resistance.

The transfer of heat by conduction also follows this basic principle and is known as Fourier’s law of heat conduction in fluids or solids. It is written mathematically as

$$q = -kA\frac{\partial u}{\partial x}, \tag{5.2}$$

where q is the rate of heat transfer and is given in units of power, i.e., watts (W), where 1 W = 1 J/sec = 0.23885 calories/sec. A is the cross-sectional area normal to the direction of heat flow in m², k is the thermal conductivity in W/(m·K), u is the temperature in K (or °C), and x is the distance in m. This “law” is based on phenomenological or empirical observations rather than on first principles. In that respect it resembles a constitutive assumption, or a “law” of particle mechanics such as Newton’s second “law” of motion, F = ma. It satisfies our intuition that the rate of heat transfer across a surface should be proportional to the surface area A and to the temperature gradient across it (i.e., the limit of $(u(x+\Delta x) - u(x))/\Delta x$). The minus sign in (5.2) indicates that heat will “flow” from regions of high temperature to regions of low temperature. In general, the thermal conductivity k (which is a measure of the material’s ability to transfer or “conduct” heat) may depend on t, on x, or even on the temperature u. Furthermore, the cross-sectional area A may depend on x (a nonuniform geometry) and/or on t (a “pulsating” solid or biological compartment). However, in our fundamental development of the heat equation in the next section, we shall assume that both k and A are constant (uniform in space and time).
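As a quick numerical illustration of (5.2), the sketch below evaluates $q = -kA\,\partial u/\partial x$ from sampled temperatures, estimating the gradient with `numpy.gradient`. The conductivity, area, and temperature profile are hypothetical values chosen only for illustration (k = 400 W/(m·K) is roughly copper-like). As in the text, k and A are taken to be constant.

```python
import numpy as np

def conduction_rate(k, area, u, x):
    """Fourier's law (5.2): q = -k A du/dx, in watts.

    k    -- thermal conductivity in W/(m K), assumed constant
    area -- cross-sectional area in m^2, assumed constant
    u, x -- temperatures (K) sampled at positions x (m)
    """
    dudx = np.gradient(u, x)   # finite-difference estimate of du/dx
    return -k * area * dudx

# Hypothetical bar: 1 cm^2 cross section, temperature falling linearly
# from 350 K to 300 K over 0.5 m, so du/dx = -100 K/m everywhere.
x = np.linspace(0.0, 0.5, 11)
u = 350.0 - 100.0 * x
q = conduction_rate(k=400.0, area=1e-4, u=u, x=x)
print(q)  # about 4 W at every point; positive means heat flows toward +x
```

The positive sign of q reflects the minus sign in (5.2): temperature decreases with x, so heat moves in the +x direction.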

5.2.3 Heat Equation

We begin by considering the unsteady heat transfer problem in one direction in a solid. To derive the conduction equation in one dimension, we refer to Figure 5.3, which depicts a small section of a one-dimensional cylindrical rod centered about an arbitrary point x.

FIGURE 5.3: Transient conduction in a one-dimensional cylindrical rod (element between x − Δx and x + Δx, centered at x).


We make the following assumptions:

(i) Heat transfer is by conduction;

(ii) Heat transfer is along the x-axis;

(iii) Temperature is uniform over a cross-section;

(iv) We have perfect insulation; hence no heat escapes from the sides of the cylindrical rod.

Let u(t, x) denote the temperature at x at a given time t, and let H denote the amount of heat (energy) in units of calories. A calorie is defined as the amount of energy required to raise the temperature of 1 g of water by 1 °C. Heat may also be given in units of joules (1 cal ≈ 4.19 J, or 1 J ≈ 0.23885 cal). We expect the amount of heat in an element of mass to be proportional to both the mass and the temperature. This motivates the quantitative expression for heat:

$$H = c_p m u, \tag{5.3}$$

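where $c_p$ is the specific heat, a constant of proportionality that depends on the material, and m is the mass. The subscript p indicates that the specific heat is measured at constant pressure; for solids this differs little from the constant-volume value. It has units J/(kg·K).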

We are now ready to turn to the energy balance in the small element of volume between x − Δx and x + Δx shown in Figure 5.3. The wall of the cylindrical rod is insulated. If we also assume that no heat is generated inside the cylinder, then we have

$$\text{net rate of heat accumulation} = \text{rate of heat input} - \text{rate of heat output}. \tag{5.4}$$

We assume without loss of generality that heat flow is from left to right (i.e., $\partial u/\partial x < 0$). The rate of heat input to the cylinder is

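$$\text{rate of heat input} = q\big|_{x-\Delta x} = -kA\frac{\partial u}{\partial x}(t, x-\Delta x). \tag{5.5}$$

Also,

$$\text{rate of heat output} = q\big|_{x+\Delta x} = -kA\frac{\partial u}{\partial x}(t, x+\Delta x). \tag{5.6}$$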

The rate of heat accumulation in the elemental volume 2AΔx is

$$\text{rate of heat accumulation} = \frac{\partial H}{\partial t}, \tag{5.7}$$


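and by using the expression (5.3) for heat, we obtain

$$\frac{\partial H}{\partial t} = \frac{\partial}{\partial t}(c_p m u) = \frac{\partial}{\partial t}\big(c_p(2\Delta x\,\rho A)\,u(t,x)\big) = 2\Delta x\,c_p\rho A\,\frac{\partial u(t,x)}{\partial t}, \tag{5.8}$$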

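where ρ denotes the mass density of the cylindrical rod. Substituting equations (5.5), (5.6), and (5.8) into (5.4) and dividing by 2ΔxA, we have

$$\rho c_p\frac{\partial u}{\partial t} = \frac{k\,\frac{\partial u}{\partial x}(t, x+\Delta x) - k\,\frac{\partial u}{\partial x}(t, x-\Delta x)}{2\Delta x}.$$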
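To see the limiting step numerically, the sketch below evaluates the right-hand side above for a hypothetical profile u(x) = sin x. Time dependence is suppressed, so ∂u/∂x = cos x and k ∂²u/∂x² = −k sin x. With an assumed k = 2, the difference quotient approaches k ∂²u/∂x² as Δx shrinks. The profile, the point x0, and k are illustrative choices only.

```python
import math

def balance_rhs(k, dudx, x, dx):
    """[k u_x(x + dx) - k u_x(x - dx)] / (2 dx): the right-hand side before dx -> 0."""
    return (k * dudx(x + dx) - k * dudx(x - dx)) / (2.0 * dx)

k = 2.0                       # hypothetical constant conductivity
x0 = 0.7                      # arbitrary interior point
exact = -k * math.sin(x0)     # k u_xx for u = sin x

for dx in (1e-1, 1e-2, 1e-3):
    err = abs(balance_rhs(k, math.cos, x0, dx) - exact)
    print(f"dx = {dx:g}: error = {err:.2e}")  # error falls roughly like dx**2
```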

Letting Δx → 0, we obtain

$$\rho c_p\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\Big(k\frac{\partial u}{\partial x}\Big),$$

or, since k is independent of x,

$$\frac{\partial u}{\partial t} = \Big(\frac{k}{\rho c_p}\Big)\frac{\partial^2 u}{\partial x^2}.$$

This can be written as

$$\frac{\partial u}{\partial t} = \alpha\frac{\partial^2 u}{\partial x^2}, \tag{5.9}$$

where $\alpha \equiv k/(\rho c_p)$ is the thermal diffusivity in m²/sec. Equation (5.9) is known as the one-dimensional heat equation. Since k is the material’s ability to conduct heat and $\rho c_p$ is the volumetric heat capacity (the ability of the material to store heat), the thermal diffusivity represents the ability of the material to conduct thermal energy relative to its ability to store it.
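The definition of α can be evaluated directly. The sketch below uses approximate handbook values for pure copper near room temperature (k ≈ 401 W/(m·K), ρ ≈ 8933 kg/m³, $c_p$ ≈ 385 J/(kg·K)). These are assumed for illustration and are not measurements from this chapter. It also forms the rough diffusion time scale L²/α for a 0.75 m bar, the bar length used in the experiment of §5.3.

```python
def thermal_diffusivity(k, rho, cp):
    """alpha = k / (rho * cp), in m^2/s."""
    return k / (rho * cp)

# Approximate handbook values for pure copper (assumed for illustration).
alpha_cu = thermal_diffusivity(401.0, 8933.0, 385.0)

# Rough diffusion time scale L^2 / alpha for a 0.75 m bar.
L = 0.75
t_diff = L**2 / alpha_cu
print(f"alpha = {alpha_cu:.3e} m^2/s, L^2/alpha = {t_diff:.0f} s")
```

The time scale, on the order of an hour, is only a heuristic for how long a copper bar of this length takes to approach a steady temperature profile.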

A similar derivation when the heat flow is from right to left (i.e., $\partial u/\partial x > 0$) gives the same partial differential equation (5.9) for heat conduction in a one-dimensional cylindrical rod.

Before turning to the three-dimensional version of the above quantitative description of heat transfer, we note that heat conduction and Fourier’s law are often discussed in terms of heat flux. Heat flux is the rate of heat transfer (in the direction x) per unit cross-sectional area and is given by

$$\Phi = \frac{q}{A} = -k\frac{\partial u}{\partial x}. \tag{5.10}$$

For heat flux through a general (smooth) surface in three dimensions, formula (5.10) is generalized to

$$\Phi = -k\nabla u\cdot\mathbf{n}, \tag{5.11}$$

where n is the unit outward normal vector to the surface (n = ±i in the one-dimensional case above).
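Formula (5.11) is a dot product. The sketch below evaluates it for a hypothetical linear temperature field u = 3x + 2y (so ∇u = (3, 2, 0) K/m) with an assumed k = 2 W/(m·K). With n along +x, the flux is negative: temperature rises outward, so heat enters the region through that face, as described in the following paragraphs. Reversing n flips the sign.

```python
import numpy as np

def heat_flux(k, grad_u, n):
    """Phi = -k grad(u) . n, eq. (5.11), for a constant conductivity k."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)   # make sure n is a unit vector
    return -k * float(np.dot(grad_u, n))

# Hypothetical field u(x, y, z) = 3x + 2y (K, lengths in m): grad u = (3, 2, 0) K/m.
grad_u = np.array([3.0, 2.0, 0.0])
phi_plus_x = heat_flux(2.0, grad_u, [1, 0, 0])
phi_minus_x = heat_flux(2.0, grad_u, [-1, 0, 0])
print(phi_plus_x, phi_minus_x)  # -6.0 and 6.0: reversing n flips the sign
```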

We now consider a general region V and an arbitrary infinitesimal volume ΔV enclosed by a surface ΔS (see Figure 5.4).

FIGURE 5.4: (a) A general three-dimensional region V. (b) An infinitesimal volume ΔV with boundary surface ΔS and outward normal n.

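We will formulate heat balance equations for the infinitesimal volume ΔV. First, we have

$$\text{rate of heat accumulation in } \Delta V = \frac{\partial H}{\partial t} = \frac{\partial}{\partial t}\left(\int_{\Delta V}\rho c_p u\,dV\right) = \int_{\Delta V}\rho c_p\frac{\partial u}{\partial t}\,dV. \tag{5.12}$$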

Also, from (5.11), the heat flux across the boundary ΔS of ΔV is given by

$$\Phi = -k\nabla u\cdot\mathbf{n}.$$

If ∇u · n is positive, heat flows into the infinitesimal element, and the rate of change of heat across a surface element dS is given by

$$-\Phi\,dS = k\nabla u\cdot\mathbf{n}\,dS,$$

which is positive. In this case the temperature u increases from the element outward along n, so heat flows inward. Φ is negative, i.e., heat is entering the region. If ∇u · n is negative, heat flows out of the element, and the rate of change is again given by

$$-\Phi\,dS = k\nabla u\cdot\mathbf{n}\,dS,$$

which is negative. In this case the temperature u decreases from the element outward along n. The flux Φ is positive, i.e., heat is leaving the region. In either case, the rate of change of heat in the volume ΔV is given by summing the rate (the negative of the flux) over the boundary surface:

$$\frac{\partial H}{\partial t} = \int_{\Delta S}-\Phi\,dS = \int_{\Delta S}k\nabla u\cdot\mathbf{n}\,dS. \tag{5.13}$$

By Gauss’ theorem (the divergence theorem) [8] (see also Appendix B), this last expression (5.13) can be rewritten as

$$\frac{\partial H}{\partial t} = \int_{\Delta V}\nabla\cdot(k\nabla u)\,dV. \tag{5.14}$$

Substituting equation (5.12) for the left side, we have

$$\int_{\Delta V}\left[\rho c_p\frac{\partial u}{\partial t} - \nabla\cdot(k\nabla u)\right]dV = 0$$

for any element ΔV in V. Since ΔV is arbitrary (and the integrand is assumed continuous), it follows that we must have

$$\rho c_p\frac{\partial u}{\partial t} = \nabla\cdot(k\nabla u)$$

in V. For constant thermal conductivity k, this equation simplifies to the following heat equation in three dimensions:

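$$\rho c_p\frac{\partial u}{\partial t} = k\nabla\cdot(\nabla u) = k\nabla^2 u, \tag{5.15}$$

where $\nabla^2 u = \dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} + \dfrac{\partial^2 u}{\partial z^2}$. For additional details on the development of the heat equation, the interested reader may consult [3] or [7].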
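In finite-difference codes, the Laplacian in (5.15) is commonly approximated with the second-order 7-point stencil. This is a standard technique, not one taken from this chapter. The sketch below applies the stencil on a uniform grid and checks it on the hypothetical field u = x² + y² + z², whose Laplacian is exactly 6. For quadratic functions the stencil is exact up to rounding.

```python
import numpy as np

def laplacian_7pt(u, h):
    """7-point finite-difference Laplacian on the interior nodes of a uniform
    3-D grid with spacing h; the result is 2 nodes smaller along each axis."""
    c = u[1:-1, 1:-1, 1:-1]
    return (
        u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
        + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
        + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]
        - 6.0 * c
    ) / h**2

g = np.linspace(0.0, 1.0, 11)           # 11 nodes per axis on [0, 1]
h = g[1] - g[0]
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
u = X**2 + Y**2 + Z**2                  # exact Laplacian is 6 everywhere
lap = laplacian_7pt(u, h)
print(lap.min(), lap.max())
```

Multiplying such a discrete Laplacian by k/(ρ c_p) gives the right-hand side of a semi-discrete version of (5.15).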

5.2.4 Boundary Conditions and Initial Conditions

Consider the simple ordinary differential equation

$$\frac{d^2u}{dx^2} = 0,$$

which is the steady-state case of equation (5.9). It has infinitely many solutions

$$u(x) = c_1 x + c_2,$$

where $c_1$ and $c_2$ are arbitrary constants. Thus, to find a unique solution even in this simple case, one must impose additional conditions on the problem. Likewise, to find a unique solution to the heat equation (5.9) or (5.15), we must impose auxiliary equations. We choose these auxiliary equations to describe the state of our heat flow at time “zero” (the beginning of the experiment, for example) and the state of the flow on the boundary of our region. These equations are called initial conditions and boundary conditions, respectively.

We consider the one-dimensional heat equation (5.9) on a region (0, L):

$$\frac{\partial u}{\partial t} = \alpha\frac{\partial^2 u}{\partial x^2}, \quad 0 < x < L,\ t > 0. \tag{5.16}$$

Equation (5.16) contains one time derivative, which implies that we need one initial condition specifying u for all x at a given time. Here we do this for t = 0 and hence have

$$u(0, x) = \psi(x), \quad 0 < x < L. \tag{5.17}$$

In addition, there are two spatial derivatives, which imply that we need two boundary conditions specifying u for all t at given values of x. For example, we might specify

$$u(t, 0) = u_1(t) \quad\text{and}\quad u(t, L) = u_2(t), \tag{5.18}$$

and these are known as Dirichlet boundary conditions.

In many cases of interest, boundary conditions might be related to heat flux rather than to temperature. That is, we might specify a fixed heat flux at one or both endpoints of the domain. For example, at x = L we might impose

$$\Phi(t, L) = -k\frac{\partial u}{\partial x}(t, L) = f(t). \tag{5.19}$$

The condition (5.19) is called a Neumann boundary condition. A special case of (5.19), with f(t) = 0, is

$$k\frac{\partial u}{\partial x}(t, L) = 0,$$

which means that the endpoint at x = L is insulated.

Finally, we might combine conditions of the types (5.18) and (5.19) to obtain a condition of the form

$$k\frac{\partial u}{\partial x}(t, L) + h\,u(t, L) = g(t), \tag{5.20}$$

which is known as a Robin boundary condition. Here, the parameter h has a physical meaning. It is well known that a hot piece of material cools faster when air is blown past it. When the fluid or gas (air) outside the solid surface is forced, or when we have natural convective flow, the rate of heat transfer from the solid to the fluid, or vice versa, is given by

$$q = hA(u_s - u_f), \tag{5.21}$$


where q is the heat transfer rate in W, A is the area in m², $u_s$ is the temperature of the solid surface in K, $u_f$ is the average or bulk temperature of the fluid flowing by in K, and h is the convective heat transfer coefficient, or Newton cooling constant, in W/(m²·K).

The relation (5.21) is referred to as “Newton’s law of cooling.” Like other “laws,” it is not actually a law; one may think of it as a definition of h based on empirical observations. When a fluid flows past a solid surface, a thin, almost stationary film adjacent to the solid wall presents most of the resistance to heat transfer. For this reason, the parameter h is also often called the film coefficient or film conductance. In general, h cannot be predicted theoretically. It is a function of the system geometry, the fluid properties, the flow velocity, and, in some cases, the temperature difference. Table 5.1 gives some values of h for different mechanisms of heat transfer and materials (see, for instance, [4, 5]).

TABLE 5.1: Range of values of h in Newton cooling.

| Mechanism | Range of values of h (W/(m²·K)) |
|---|---|
| Still air | 2.8-23 |
| Moving air | 11.3-55 |
| Moving water | 280-17,000 |
| Boiling liquids | 1,700-28,000 |
| Condensing steam | 5,700-28,000 |

If we next divide both sides of equation (5.21) by A, we obtain the convective heat flux

$$\Phi = \frac{q}{A} = h(u_s - u_f). \tag{5.22}$$
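Equation (5.22) is simple to evaluate. The sketch below computes the convective flux for a hypothetical surface at 350 K in still air at 300 K. It uses the two ends of the still-air range of h listed in Table 5.1 to show how widely the flux can vary with the choice of h alone.

```python
def convective_flux(h, u_s, u_f):
    """Phi = h (u_s - u_f) in W/m^2, eq. (5.22); positive when heat leaves the solid."""
    return h * (u_s - u_f)

# Hypothetical surface at 350 K in still air at 300 K, evaluated at both ends
# of the still-air range of h in Table 5.1 (2.8 to 23 W/(m^2 K)).
low = convective_flux(2.8, 350.0, 300.0)
high = convective_flux(23.0, 350.0, 300.0)
print(low, high)  # roughly 140 and 1150 W/m^2
```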

So, we might specify the heat flux at an interface between the solid and the fluid (or air) as

$$-\Phi(t, L) = h\,[u_f - u(t, L)],$$

which, after substitution of the form (5.10) for the heat flux, can be rewritten as

$$k\frac{\partial u}{\partial x}(t, L) + h\,u(t, L) = h\,u_f.$$

This equation has the same form as the Robin boundary condition (5.20), which explains the physical meaning of the constant h discussed above. We note that if $u_f > u(t, L)$, heat flows into the solid; otherwise, heat flows out of the solid (cooling).
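To see how initial and boundary conditions such as (5.17), (5.18), and the convective Robin condition above enter a computation, here is a minimal sketch of an explicit finite-difference scheme (forward time, centered space, or FTCS) for (5.16). It imposes a Dirichlet condition at x = 0 and the convective condition at x = L. FTCS is a standard textbook scheme chosen here for brevity; it is not a method prescribed in this chapter. All parameter values in the usage lines are hypothetical and nondimensional. The time step dt = 0.4 dx²/α stays below the usual stability limit dt ≤ dx²/(2α).

```python
import numpy as np

def ftcs_dirichlet_robin(alpha, k, h, u_left, u_f, psi, L, nx, t_final):
    """Explicit FTCS scheme for u_t = alpha u_xx on (0, L) with
    u(t, 0) = u_left (Dirichlet) and k u_x(t, L) + h u(t, L) = h u_f (Robin).

    psi is a callable giving the initial profile u(0, x).
    The Robin condition uses the one-sided difference
    k (u_N - u_{N-1}) / dx + h u_N = h u_f, solved for u_N.
    """
    x = np.linspace(0.0, L, nx)
    dx = x[1] - x[0]
    dt = 0.4 * dx**2 / alpha            # r = alpha dt / dx^2 = 0.4 <= 1/2
    r = alpha * dt / dx**2
    u = np.asarray(psi(x), dtype=float).copy()
    n_steps = int(np.ceil(t_final / dt))
    for _ in range(n_steps):
        # Interior update: the right-hand side is built from the old values.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
        u[0] = u_left                                       # Dirichlet end
        u[-1] = (k * u[-2] / dx + h * u_f) / (k / dx + h)   # Robin end
    return x, u

# Hypothetical, nondimensional run: rod initially at 20, left end held at 100,
# convective cooling to a fluid at 20 through the right end.
x, u = ftcs_dirichlet_robin(alpha=1.0, k=1.0, h=2.0, u_left=100.0, u_f=20.0,
                            psi=lambda s: np.full_like(s, 20.0),
                            L=1.0, nx=21, t_final=5.0)
slope = 2.0 * (20.0 - 100.0) / (1.0 + 2.0 * 1.0)   # h (u_f - u_left) / (k + h L)
print(np.max(np.abs(u - (100.0 + slope * x))))      # small once near steady state
```

Run long enough, the computed profile approaches the straight line $u_{left} + c\,x$ with $c = h(u_f - u_{left})/(k + hL)$, the steady state of this boundary value problem. The one-sided difference at x = L is only first-order accurate in general, but it reproduces linear profiles exactly.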

Before we conclude this section, we describe another type of boundary condition that occurs in some practical applications. This type of boundary condition is related to the third heat transfer mechanism, radiation. We recall that this is basically an electromagnetic mechanism that allows energy to be transported at the speed of light. Since it consists of energy in the form of electromagnetic waves, it obeys the same laws as light: it travels in straight lines and can be transmitted through a vacuum. The associated quantitative law is the Stefan-Boltzmann law,

$$q_{\max} = A\sigma u^4, \tag{5.23}$$

where $q_{\max}$ is the maximum rate of emitted heat in W. The parameter σ is the Stefan-Boltzmann constant (σ ≈ 5.670 × 10⁻⁸ W/(m²·K⁴)), and u is the (absolute) temperature (in K) of the emitting surface. A body that achieves this maximum rate of emission, and equally absorbs all incident radiation, is called a perfect radiator or black body. The actual emitted rate for a general surface is somewhat smaller:

$$q = \varepsilon A\sigma u^4, \tag{5.24}$$

where 0 ≤ ε ≤ 1, with ε called the emissivity of the surface or body (ε = 1 for a black body). When ε < 1, we have a gray body or gray surface. Bodies (surfaces) also absorb energy; for example, a black body (a perfect absorber) is defined as one that absorbs all radiant energy and reflects none. Let $q_{abs}$ and $q_{inc}$ denote the rates of energy absorbed and incident, respectively. The absorptive property of the surface is then characterized by a parameter α, called the absorptivity, defined by

$$q_{abs} = \alpha\,q_{inc}.$$

(Unfortunately, the same symbol α is often used in the literature for both the thermal diffusivity and the absorptivity.) For a gray surface, defined by α = ε, the net rate of heat exchange between a surface and its ambient surroundings is given by

$$q_{net} = \varepsilon A\sigma\,(u_{sur}^4 - u_{amb}^4),$$

where $u_{sur}$ and $u_{amb}$ are the surface and ambient temperatures, respectively. Sometimes this net rate of transfer is written as

$$q_{net} = h_r A\,(u_{sur} - u_{amb}),$$

where the radiative heat transfer coefficient $h_r$ is defined by

$$h_r \equiv \varepsilon\sigma\,(u_{sur} + u_{amb})(u_{sur}^2 + u_{amb}^2).$$

We note that treating $h_r$ as a constant essentially linearizes the radiation rate equation.
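The definition of $h_r$ rests on the factorization $u_{sur}^4 - u_{amb}^4 = (u_{sur}+u_{amb})(u_{sur}^2+u_{amb}^2)(u_{sur}-u_{amb})$. The sketch below checks it numerically. It then shows what the linearization costs when $h_r$ is frozen at one surface temperature and reused at another. The emissivity, area, and temperatures are hypothetical, and σ ≈ 5.670 × 10⁻⁸ W/(m²·K⁴) is used.

```python
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def q_net_radiation(eps, area, u_sur, u_amb):
    """Net gray-surface exchange eps * A * sigma * (u_sur^4 - u_amb^4), in W."""
    return eps * area * SIGMA * (u_sur**4 - u_amb**4)

def h_radiative(eps, u_sur, u_amb):
    """h_r = eps * sigma * (u_sur + u_amb) * (u_sur^2 + u_amb^2), in W/(m^2 K)."""
    return eps * SIGMA * (u_sur + u_amb) * (u_sur**2 + u_amb**2)

# Hypothetical gray surface: eps = 0.8, A = 0.01 m^2, ambient 300 K.
eps, area, u_amb = 0.8, 0.01, 300.0

hr = h_radiative(eps, 400.0, u_amb)
exact_400 = q_net_radiation(eps, area, 400.0, u_amb)
linear_400 = hr * area * (400.0 - u_amb)        # equals exact_400 up to rounding

# Reusing the 400 K value of h_r at 500 K (the "constant h_r" linearization):
exact_500 = q_net_radiation(eps, area, 500.0, u_amb)
linear_500 = hr * area * (500.0 - u_amb)        # underestimates exact_500 here
print(exact_400, linear_400, exact_500, linear_500)
```

At the temperature where $h_r$ was evaluated the two forms agree exactly. Away from it, the constant-$h_r$ form can be noticeably off, which is the price of linearizing.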

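Finally, when radiation heat transfer occurs from the surface of a solid, it is usually accompanied by convective heat transfer unless the solid is in a vacuum. The appropriate boundary condition is then

$$k\frac{\partial u}{\partial n}\Big|_{\partial\Omega} = h\,\big[u_{amb} - u|_{\partial\Omega}\big] + \varepsilon\sigma\,\big[u_{amb}^4 - u^4|_{\partial\Omega}\big],$$

where ∂Ω is the closed surface enclosing the solid and ∂u/∂n = ∇u · n is the derivative along the unit outward normal.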


5.2.5 Properties of Solutions

There is a great deal of mathematical as well as engineering literature related to the heat equation. Many tools for the analysis of the heat equation exist, and we remark briefly on several here.

(a) Uniqueness. Consider the classical one-dimensional heat equation

$$\frac{\partial u}{\partial t} = \alpha\frac{\partial^2 u}{\partial x^2}, \tag{5.25}$$

defined on the rectangular domain {(t, x) : 0 ≤ t ≤ T, 0 ≤ x ≤ L}. Then the solution u(t, x) achieves its maximum value either initially (t = 0) or along the sides x = 0 or x = L. This is known as the Maximum Principle. The minimum value has the same property (the Minimum Principle), and we refer the reader to [10] for the proof of this result.

Let u(t, x) denote a solution to the one-dimensional heat equation (5.25) with initial condition u(0, x) = f(x) and Dirichlet boundary conditions u(t, 0) = g(t) and u(t, L) = h(t). Let v(t, x) denote another solution of the same initial-boundary value problem. Then consider

$$w(t, x) = u(t, x) - v(t, x).$$

It follows that w(t, x) also satisfies the heat equation (5.25), but with zero initial condition and zero boundary conditions. By the Maximum Principle, w(t, x) ≤ 0 for 0 ≤ t ≤ T and 0 ≤ x ≤ L, where T > 0 is arbitrary. Similarly, by the Minimum Principle, w(t, x) ≥ 0. Therefore w(t, x) = 0, so that u(t, x) = v(t, x) for all t ≥ 0, and the solution of the initial-boundary value problem is unique.

(b) Heat kernel. Consider now the one-dimensional heat equation (5.25) defined on the whole line, −∞ < x < ∞, with the initial condition u(0, x) = f(x). The solution is given by

$$u(t, x) = \frac{1}{2\sqrt{\pi\alpha t}}\int_{-\infty}^{\infty} f(y)\,e^{-\frac{(x-y)^2}{4\alpha t}}\,dy.$$

This solution involves only the initial data f and the well-known “heat kernel”

$$K(t, z) = \frac{1}{2\sqrt{\pi\alpha t}}\,e^{-\frac{z^2}{4\alpha t}}.$$

A special case is when the initial function is the Dirac delta function, f(y) = δ(y), the “delta” pulse of heat. Then the solution becomes

$$u(t, x) = \frac{1}{2\sqrt{\pi\alpha t}}\,e^{-\frac{x^2}{4\alpha t}}.$$

This solution exhibits what is referred to as an “infinite speed of propagation.” For every t > 0, however small, u(t, x) > 0 at every x, however far from the origin, so the initial pulse is felt everywhere instantaneously. In actual fact, heat does not propagate down a uniform rod with infinite velocity. This is an exact solution to the one-dimensional heat equation. Its behavior is therefore an unmistakable manifestation of the fact that our carefully derived heat equation model is only an approximation to the physical diffusion or transport of heat in a rod.

(c) Next consider the one-dimensional heat equation (5.25) defined on the semi-infinite domain 0 < x < ∞ with the initial condition u(0, x) = f(x). This is called the semi-infinite slab problem and is an idealization that is sometimes useful in practical applications. Exact solutions are known for the following special cases of boundary conditions:

(i) Dirichlet boundary condition case:

$$u(t, 0) = 0.$$

By defining the odd extension of the function f(x) as

$$f_{odd}(x) = \begin{cases} f(x), & x > 0,\\ -f(-x), & x < 0,\\ 0, & x = 0,\end{cases}$$

and using the result from part (b), one obtains an explicit formula for the solution u(t, x) of the form

$$u(t, x) = \int_0^{\infty}\big(K(t, x-y) - K(t, x+y)\big)f(y)\,dy,$$

where K(t, z) is the heat kernel given in part (b) above.

(ii) Neumann boundary condition case:

$$\frac{\partial u}{\partial x}(t, 0) = 0.$$

The derivative of an even function is odd and hence vanishes at x = 0. We therefore use the even extension of f(x), $f_{even}(x) = f(|x|)$. Using the same reasoning as in part (i), we find the solution to be

$$u(t, x) = \int_0^{\infty}\big(K(t, x-y) + K(t, x+y)\big)f(y)\,dy.$$
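The formulas in parts (b) and (c) can be checked by quadrature. The sketch below assumes α = 1 and truncates the infinite integrals to |y| ≤ 20, where the kernel is negligible for the times used. It applies the trapezoid rule and compares against two closed forms. First, a Gaussian initial profile K(t0, ·) must evolve into K(t0 + t, ·). Second, for f ≡ 1, the Dirichlet image solution equals erf(x/(2√(αt))), while the Neumann one stays equal to 1. The truncation limits, grids, and sample points are illustrative choices.

```python
import math
import numpy as np

def heat_kernel(t, z, alpha):
    """K(t, z) = exp(-z^2 / (4 alpha t)) / (2 sqrt(pi alpha t)), for t > 0."""
    return np.exp(-z**2 / (4.0 * alpha * t)) / (2.0 * np.sqrt(np.pi * alpha * t))

def trapezoid(values, y):
    """Composite trapezoid rule on the grid y."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(y)))

def whole_line(t, x, f, alpha, y):
    """Part (b): integral of K(t, x - y) f(y) dy over the (truncated) grid y."""
    return trapezoid(heat_kernel(t, x - y, alpha) * f(y), y)

def half_line(t, x, f, alpha, y, bc):
    """Part (c): integral over y >= 0 of (K(t, x-y) -/+ K(t, x+y)) f(y) dy,
    '-' for bc='dirichlet' (odd extension), '+' for bc='neumann' (even extension)."""
    sign = -1.0 if bc == "dirichlet" else 1.0
    kern = heat_kernel(t, x - y, alpha) + sign * heat_kernel(t, x + y, alpha)
    return trapezoid(kern * f(y), y)

def ones(y):
    return np.ones_like(y)

alpha = 1.0
y_all = np.linspace(-20.0, 20.0, 40001)   # truncated stand-in for the whole line
y_pos = np.linspace(0.0, 20.0, 20001)     # truncated stand-in for y >= 0

# Whole line: starting from K(0.5, .), the solution at t = 0.25 should be K(0.75, .).
u_b = whole_line(0.25, 0.3, lambda y: heat_kernel(0.5, y, alpha), alpha, y_all)
print(u_b, heat_kernel(0.75, 0.3, alpha))

# Half line with f = 1.
t, x = 0.5, 0.4
u_dir = half_line(t, x, ones, alpha, y_pos, "dirichlet")
u_neu = half_line(t, x, ones, alpha, y_pos, "neumann")
print(u_dir, math.erf(x / (2.0 * math.sqrt(alpha * t))), u_neu)

# "Infinite speed of propagation": the kernel is tiny but positive far away.
print(heat_kernel(1e-3, 1.0, alpha) > 0.0)
```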

5.3 Experimental Modeling of Heat Transfer

In §5.2, we described the development of the mathematical model for heat conduction in a solid. As already mentioned in Chapter 1, one of the main difficulties in model development is validating the mathematical model by comparing the model predictions to field (or experimental) data. In addition, mathematical models contain parameters and coefficients that are not directly measurable in experiments (for example, the thermal diffusivity and the convective heat transfer coefficient). Hence, mathematical modelers and experimentalists must collaborate closely in order to develop effective quantitative models. That is, experiments must be carefully designed to provide sufficient data to accurately estimate model parameters and/or coefficients.

We now describe a physical experiment, relatively simple and cost effective to set up, that can be used to validate the model derived above for heat conduction in a rod. The general arrangement of equipment needed for this experiment is depicted in Figure 5.5. The actual heat experiment as set up in our own laboratory is shown in Figure 5.6. The experiment is carried out on a square metal bar about 75 cm long with a 1 cm² cross section. Holes, into which thermocouples can be inserted, are drilled about 4 cm apart along the bar. In our lab, we use both copper and aluminum bars to study how the properties of different metals affect heat conduction. The heating element is a 30 W soldering-iron-encapsulated cylindrical heater.

A wide variety of temperature measuring devices is available, such as thermocouples, solid state sensors, mercury-in-glass thermometers, and resistance temperature detectors. The device used in our experiment, and the one most suitable for this application, is the thermocouple. These thermocouples are mounted at multiple locations on the rod and can measure the temperature of flat and curved metal, plastic, or ceramic surfaces. Thermal equilibrium between the rod and each thermocouple should be established quickly, which also improves the accuracy of the measured temperature. To ensure this, the holes in the rod are drilled just large enough for the thermocouples to fit snugly.

The temperature measurements are recorded in real time on a personal computer (PC) using a front-end analog multiplexer that quadruples the number of analog input signals. This arrangement allows up to 64 thermocouples to be used for simultaneous temperature measurements. In all of our experiments, 15 thermocouples were used. The analog signals are digitized by an MIO series multifunction data acquisition (DAQ) board, which is plugged into one of the empty ISA slots of the PC.

5.3.1 The Thermocouple as a Temperature Measuring Device

The relationship between temperature and thermocouple output voltage is highly nonlinear. The Seebeck coefficient, or voltage change per degree of temperature change, can vary by a factor of three or more over the operating temperature range of some thermocouples. For this reason, temperatures are computed from thermocouple voltage readings using approximating polynomials.


FIGURE 5.5: Hardware connections used to validate the one-dimensional heat equation (heating element, square metal bar, analog multiplexer, DAQ).

The polynomials that can be used to convert voltage readings in microvolts to degrees Celsius, and vice versa, are given in Tables 5.2 and 5.3. These formulas are taken from [9]. The thermocouple used in this experiment is of type T. It has a fast response time and is one of the oldest and most popular thermocouples for determining temperatures in the range from about 370 °C down to the triple point of neon (−248.5939 °C). Its positive thermoelement, TP, is typically copper of high electrical conductivity and low oxygen content. This is 99.95% pure copper with an oxygen content varying from 0.02 to 0.07% (depending on sulfur content) and with other impurities totaling about 0.01%.

To compensate for the temperature difference between the measuring end and the cold junction (the AMUX-64T multiplexer screw terminal), the following procedure can be used:

(i) Translate the ambient temperature into the corresponding voltage using the polynomial in Table 5.3.

(ii) Add the voltages from the thermocouple readings to the voltage from step (i).

(iii) Translate the voltage results from step (ii) into temperatures using the polynomial from Table 5.2.


TABLE 5.2: Type T thermocouples: coefficients of the approximate inverse function giving temperature u as a function of the thermoelectric voltage E in the specified temperature and voltage ranges. The function is of the form $u = c_0 + c_1E + c_2E^2 + \cdots + c_6E^6$, where E is in microvolts and u is in degrees Celsius.

Temperature range: 0 °C to 400 °C
Voltage range: 0 µV to 20872 µV

| Coefficient | Value |
|---|---|
| $c_0$ | $0.000000$ |
| $c_1$ | $2.592800\times10^{-2}$ |
| $c_2$ | $-7.602961\times10^{-7}$ |
| $c_3$ | $4.637791\times10^{-11}$ |
| $c_4$ | $-2.165394\times10^{-15}$ |
| $c_5$ | $6.048144\times10^{-20}$ |
| $c_6$ | $-7.293422\times10^{-25}$ |

Error range: −0.03 °C to 0.03 °C

TABLE 5.3: Type T thermocouples: coefficients of the approximate function giving the thermoelectric voltage E as a function of temperature u in the specified temperature range. The function is of the form $E = c_0 + c_1u + c_2u^2 + \cdots + c_8u^8$, where E is in microvolts and u is in degrees Celsius.

Temperature range: 0 °C to 400 °C

| Coefficient | Value |
|---|---|
| $c_0$ | $0.000000$ |
| $c_1$ | $3.8748106364\times10^{1}$ |
| $c_2$ | $3.3292227880\times10^{-2}$ |
| $c_3$ | $2.0618243404\times10^{-4}$ |
| $c_4$ | $-2.1882256846\times10^{-6}$ |
| $c_5$ | $1.0996880928\times10^{-8}$ |
| $c_6$ | $-3.0815758772\times10^{-11}$ |
| $c_7$ | $4.5479135290\times10^{-14}$ |
| $c_8$ | $-2.7512901673\times10^{-17}$ |
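The three-step cold-junction compensation described above can be sketched in Python with the coefficients of Tables 5.2 and 5.3. These polynomials are valid only from 0 °C to 400 °C. The function names are illustrative; this is not the LabVIEW code used in the laboratory. The polynomials are evaluated with Horner's rule.

```python
# Table 5.3: E (microvolts) as a function of u (deg C), coefficients c0..c8.
E_COEFFS = [0.0, 3.8748106364e1, 3.3292227880e-2, 2.0618243404e-4,
            -2.1882256846e-6, 1.0996880928e-8, -3.0815758772e-11,
            4.5479135290e-14, -2.7512901673e-17]
# Table 5.2: u (deg C) as a function of E (microvolts), coefficients c0..c6.
U_COEFFS = [0.0, 2.592800e-2, -7.602961e-7, 4.637791e-11,
            -2.165394e-15, 6.048144e-20, -7.293422e-25]

def poly(coeffs, z):
    """Evaluate c0 + c1 z + ... + cn z^n by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def type_t_voltage(u_celsius):
    """Table 5.3: thermoelectric voltage (microvolts) at u_celsius (0 to 400 deg C)."""
    return poly(E_COEFFS, u_celsius)

def type_t_temperature(e_microvolts):
    """Table 5.2: temperature (deg C) for a voltage between 0 and 20872 microvolts."""
    return poly(U_COEFFS, e_microvolts)

def compensated_temperature(e_measured, u_cold_junction):
    """Steps (i)-(iii): convert the cold-junction (ambient) temperature to a
    voltage, add it to the measured voltage, and convert the sum back."""
    e_total = e_measured + type_t_voltage(u_cold_junction)
    return type_t_temperature(e_total)

# Hypothetical reading: a junction at 100 deg C read against a 25 deg C terminal.
e_reading = type_t_voltage(100.0) - type_t_voltage(25.0)
print(compensated_temperature(e_reading, 25.0))  # close to 100
```

Applying the forward and inverse polynomials in succession recovers the input temperature only to within the error range quoted in Table 5.2. The check above should therefore be read as consistency to a few hundredths of a degree, not as exact inversion.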


FIGURE 5.6: Heat experiment as set up in our own laboratory (labeled components: thermocouples, soldering iron, multiplexer, and GUI for data display).

TABLE 5.4: Hardware for the thermal experiment.

| Descriptive name | Probable brand (model) |
|---|---|
| Super fast response time thermocouples | Omega (C01-K) |
| PC data acquisition boards for ISA | National Instruments (AT-MIO-16E-10) |
| Analog multiplexer with temperature sensor | National Instruments (AMUX-64T) |
| PC computer | Pentium (or later processor) |

5.3.2 Detailed Hardware and Software Lists

To carry out the experiments outlined here, a minimum of the followinghardware and software is advisable (see Tables 5.4 and 5.5, respectively).

TABLE 5.5: Software tools for the thermal experiment.

| Descriptive name | Brand |
|---|---|
| Labsuite software for data acquisition and equipment control, including LabVIEW for Windows | National Instruments |
| MATLAB and Optimization Toolbox for data and model analysis | The MathWorks, Inc. |

Given a laboratory setup of the type described above, students can carry out a project to determine the thermal constants in the heat model. We outline such a project in some detail.

Project: Thermal Experiment

The aim of this project is to validate the mathematical model for heat transfer in a rod using data collected from the experiment described in §5.3. To this end, recall from §5.2 that under the following assumptions

(i) Heat is transferred along the x-axis only,

(ii) Temperature is uniform over a cross-section, and

(iii) The rod is perfectly insulated,

a one-dimensional heat equation describing the heat conduction in a rod was developed. This has the form

$$\rho c_p\frac{\partial u(t, x)}{\partial t} = k\frac{\partial^2 u(t, x)}{\partial x^2},$$

where $c_p$ is the specific heat, ρ is the mass density, and k is the thermal conductivity.

1. One possible set of boundary conditions for this experiment is

$$k u_x(t, 0) = Q, \qquad u(t, L) = u_{ambient},$$

where Q is an unknown constant determining the constant heat flux at the source, $u_{ambient}$ is the known ambient (room) temperature, and L is the length of the rod. The specified boundary condition at x = L implies that we have a constant-temperature end. Is this an accurate assumption? If not, what would be a reasonable alternative?

Formulate the least squares problem to estimate the parameters k and Q using the steady-state temperature values along the rod. The optimization problem associated with the inverse least squares technique can be solved numerically using the MATLAB routines fminsearch or fminu. Because steady-state temperature values are used as data, we only have to solve a steady-state one-dimensional heat equation. This is a linear two-point boundary value problem for which an exact solution can be derived. Derive this exact solution. Also, discuss all data or measurements that would be required in order to carry out the parameter estimation problem.

2. Collect steady-state values of the temperature distribution in a copper rod. Using this data set, estimate the unknown parameters k and Q. Note that both MATLAB routines fminu and fminsearch are iterative methods that require an initial guess for the parameter set. Are the estimated parameters k and Q unique with respect to different initial guesses? Explain the results. Plot the solution of the mathematical model against the data. How well does the model fit the data?

3. To compare the rates of heat flow in different materials under the same conditions, set up a second experiment involving an aluminum bar. Repeat part 2 above. Do the estimated values of k and Q change for the two rods? How do the thermal conductivities of the copper and aluminum bars affect their heat conduction? Compare the values of k you obtain with those given in the literature for copper and aluminum.

4. Instead of assuming a nonhomogeneous Dirichlet boundary condition atx = L as in part 1., consider a convective cooling boundary conditionof the form

$$k\,u_x(t,L) = h\,\bigl[u_{\text{ambient}} - u(t,L)\bigr],$$

where h is the convective heat transfer coefficient. Repeat part 2 (where you now estimate k, h, and Q) and compare the results.

5. In practice, it is very difficult to ensure perfect insulation. In fact, in the design of the experiment, we have an uninsulated metal bar which is heated at one end and allows heat to escape along its entire length.

Under this new assumption of no insulation, as well as assumptions (i) and (ii) above, derive the following equation for the conduction of heat in the rod:

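$$\rho c_p \frac{\partial u(t,x)}{\partial t} = k\,\frac{\partial^2 u(t,x)}{\partial x^2} - \frac{2(a+b)}{ab}\,h\,\bigl(u(t,x) - u_{\text{ambient}}\bigr), \qquad 0 \le x \le L,$$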

where a and b are the dimensions of the rectangular cross section of the rod, so that 2(a + b) is its perimeter and ab its cross-sectional area. The negative term on the right-hand side of the above equation comes from the heat loss along the length of the rod, which should be modeled by Newton's law of cooling.

Repeat parts 1-4 using this model.

6. You should obtain improved fits to your data. Give a comparison and discussion of your findings for the two different models.
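For parts 1, 4, and 5, the steady-state problems reduce to two-point boundary value problems with closed-form solutions, and the following Python sketch collects them together with one possible least squares cost. It assumes (this condition is not restated above) that the source end satisfies $-k\,u_x(t,0) = Q$ and, for part 5, that $u(t,L) = u_{\text{ambient}}$; with these, the solutions are $u = u_{\text{ambient}} + (Q/k)(L-x)$, $u = u_{\text{ambient}} + Q/h + (Q/k)(L-x)$, and $u = u_{\text{ambient}} + Q\sinh(m(L-x))/(k\,m\cosh(mL))$ with $m^2 = 2(a+b)h/(abk)$. The helper names, parameter values, and synthetic "data" are hypothetical, not measurements; in MATLAB one would pass a cost of this form to fminsearch.

```python
import math

# Steady-state rod profiles (a sketch under stated assumptions).
# ASSUMPTION: source end condition -k u_x(0) = Q (flux Q enters at x = 0).


def steady_dirichlet(x, k, Q, L, u_amb):
    """Parts 1-2: k u'' = 0, -k u'(0) = Q, u(L) = u_amb."""
    return u_amb + (Q / k) * (L - x)


def steady_convective(x, k, h, Q, L, u_amb):
    """Part 4: k u'' = 0, -k u'(0) = Q, k u'(L) = h (u_amb - u(L))."""
    return u_amb + Q / h + (Q / k) * (L - x)


def steady_fin(x, k, h, Q, L, u_amb, a, b):
    """Part 5 (uninsulated bar) at steady state:
    k u'' = (2(a+b)/(ab)) h (u - u_amb), -k u'(0) = Q, u(L) = u_amb."""
    m = math.sqrt(2.0 * (a + b) * h / (a * b * k))
    return u_amb + Q * math.sinh(m * (L - x)) / (k * m * math.cosh(m * L))


def lsq_cost(model, params, xs, data):
    """Ordinary least squares cost J = sum_i (model(x_i; params) - d_i)^2."""
    return sum((model(x, *params) - d) ** 2 for x, d in zip(xs, data))


if __name__ == "__main__":
    L, u_amb = 0.5, 20.0                 # hypothetical rod length (m), room temperature
    xs = [0.05 * i for i in range(11)]   # hypothetical thermocouple positions (m)
    # Synthetic "data" generated from the model itself, not measured values.
    data = [steady_dirichlet(x, 400.0, 2000.0, L, u_amb) for x in xs]
    # Doubling both k and Q leaves Q/k, and hence the cost, unchanged.
    print(lsq_cost(steady_dirichlet, (400.0, 2000.0, L, u_amb), xs, data))
    print(lsq_cost(steady_dirichlet, (800.0, 4000.0, L, u_amb), xs, data))
```

Both printed costs are zero: in this sketch the Dirichlet profile depends on k and Q only through the ratio Q/k, so an iterative minimizer started from different initial guesses may stop at different (k, Q) pairs with the same ratio. Similarly, scaling k, h, and Q by a common factor leaves the convective and fin profiles unchanged, so steady-state data alone would need to be supplemented (for example, by fixing one parameter) to separate them.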

References

[1] H.T. Banks, S.R. Durso, M.A. Goodhart and K. Ito, Nonlinear exothermic contributions to radio-frequency bonding of adhesives, Center for Research in Scientific Computation, North Carolina State University, Technical Report CRSC-TR98-24, 1998; Nonlinear Analysis: Theory Method and Applications; Series B, 2, 2001, pp. 257–286.

[2] H.T. Banks and F. Kojima, Boundary shape identification problems in two-dimensional domains related to thermal testing of materials, Quarterly of Applied Mathematics, Vol. XLVII, No. 2, 1989, pp. 273–293.

[3] W. Boyce and R. DiPrima, Elementary Differential Equations and Boundary Value Problems, John Wiley & Sons, Inc., Hoboken, 8th ed., 2004.

[4] R.B. Bird, W.E. Stewart and E.N. Lightfoot, Transport Phenomena, John Wiley & Sons, Inc., New York, 1960.

[5] C.J. Geankoplis, Transport Processes and Unit Operations, Prentice Hall, Englewood Cliffs, 1993.

[6] D.M. Heath, C.S. Welch and W.P. Winfree, Quantitative thermal diffusivity measurements of composites, in Review of Progress in Quantitative Nondestructive Evaluation, D.G. Thompson and D.E. Chimenti (eds.), Plenum Publ., 5B, 1986, pp. 1125–1132.

[7] F. Incropera and D. DeWitt, Fundamentals of Heat and Mass Transfer, John Wiley & Sons, New York, 1990.

[8] W. Kaplan, Advanced Calculus, Addison-Wesley Publishing Co., Inc., New York, 1991.

[9] NIST Monograph 175, Temperature-Electromotive Force Reference Functions and Tables for the Letter-Designated Thermocouple Types Based on the ITS-90, 1993.

[10] W.A. Strauss, Partial Differential Equations: An Introduction, John Wiley & Sons, New York, 1992.


Chapter 6

Structural Modeling: Force/Moments Balance

6.1 Motivation: Control of Acoustics/Structural Interactions

In many aircraft, a major component of the interior sound pressure field is due to structure-borne noise from the engines. For example, the rotating blades in turboprop engines generate low frequency, high displacement acoustic fields that couple (nonlinearly) with the fuselage dynamics. These mechanical vibrations produce, through interactions with the air inside the cabin, interior sound pressure oscillations. If these acoustic pressure fields are left uncontrolled, they can lead to undesirable conditions for the passengers.

One way to attenuate the interior acoustic pressure field is through the use of stiffer structures and acoustic damping material. However, this passive control technique increases the weight of the aircraft and reduces its fuel efficiency. Another approach is active or feedback control of noise in the interior acoustic cavity. There is a substantial literature on this subject, including a frequency domain approach (see, e.g., [5, 9]) as well as a time domain approach (e.g., [1, 3]). For example, one strategy for active control of noise is through the use of secondary source techniques, with the secondary noise based on feedback of noise levels in the acoustic cavity. In this approach, loudspeakers and microphones are strategically placed in the interior cavity where one can measure the pressure field. This information is used as feedback for the speakers, which generate an interfering secondary field to reduce the total noise levels (primary and secondary sources) in certain critical zones. This strategy is local in nature and requires a large array of external control hardware.

Another strategy for reducing sound pressure levels is through the use of smart materials technology such as piezoceramic or electrostrictive elements. These piezoceramic patches, when bonded to the fuselage, act as electromechanical transducers. That is, when excited by an electric field, the patch induces a strain in the material to which it is bonded and hence can be used as an actuator. On the other hand, if the bonded material undergoes a deformation, this produces a strain in the patch, which results in a voltage across the patch (proportional to the strain) and thereby permits the use of the patch as a mechanical sensor.

In [3] a model problem was considered that consists of an exterior noise source which is separated from an interior cavity by an active plate. As a 2-D analogue of the plate, the coupling boundary between the exterior noise source and the interior cavity is modeled by a fixed-end Euler-Bernoulli beam with Kelvin-Voigt damping (see Figure 6.1). The acoustic response inside the cavity is modeled by a linear wave equation.

FIGURE 6.1: 2-D fluid/structure interaction system (interior cavity separated by an elastic beam from the perturbing force due to the exterior noise).

In this chapter we will consider the development of the Euler-Bernoulli model for the transverse displacement of a beam structure.

6.2 Introduction to Mechanics of Elastic Solids

In this section we briefly review the behavior of elastic bodies subjected to various types of loading. This field of study is known in the literature by several names, including strength of materials, mechanics of materials, and mechanics of deformable bodies. Mechanics of solids is a fairly old and well-established subject. It dates back to the work of Galileo in the early part of the seventeenth century. Of course, much progress has been made since then, notably the development of the subject by French investigators such as Coulomb, Poisson, Navier, St. Venant, and Cauchy in the eighteenth and nineteenth centuries. Today there is a substantial literature on this subject; see, for example, [7, 10, 13, 14] and the references contained therein. A good understanding of mechanical behavior is essential for the development of the mathematical model describing the beam displacement to be discussed later in this chapter.

6.2.1 Normal Stress and Strain

The concepts of stress and strain can be illustrated using a prismatic bar (a bar having a uniform cross section throughout its length) with axial forces P applied at both ends. In Figure 6.2, the axial forces produce a uniform stretching of the bar, and the bar is said to be in tension.

FIGURE 6.2: Prismatic bar deformation due to tensile forces (bar of length L with cross-sectional cut AB, end forces P, elongation δ).

To consider stresses and strains in this bar, we make an artificial plane cut (AB) perpendicular to the longitudinal axis of the bar (see Figure 6.2). This plane divides the bar into two free bodies (the parts of the bar to the left and to the right of the cut). Consider the part of the bar to the right of the cut, as depicted in Figure 6.3. The tensile load P acts at the right end of this free body, while at the other end are forces representing the action of the removed part of the bar on the part that remains.

FIGURE 6.3: Normal stresses on the prismatic bar (free body to the right of the cut AB, load P).


These forces are continuously distributed over the cross section. The intensity of this force, the force per unit area, is called the stress, is denoted by σ, and is defined as
$$\sigma = \frac{\text{force}}{\text{area}} = \frac{F}{A}.$$

From the equilibrium of the body, the magnitude of F is equal to P, and F acts in the opposite direction. This is indeed a consequence of Newton's Third Law, which says that to every action there is always an equal and opposite reaction. Therefore,
$$\sigma = \frac{P}{A}.$$

When the bar is stretched, or in tension, the stress is called tensile stress. On the other hand, if the forces are reversed in direction, causing the bar to be compressed, we obtain compressive stresses. A necessary condition for the formula $\sigma = P/A$ to be valid is that the stress be uniform over the cross section of the bar. This is realized when P acts along the longitudinal axis of the bar, as shown in Figure 6.2. If the load P is not on this longitudinal axis, bending of the bar will occur and the stress is more complicated to derive.

The elongation of the bar due to the axial forces P is denoted by δ, and the elongation per unit length
$$\varepsilon = \frac{\delta}{L}$$
is called the strain, which is a dimensionless quantity. Just as in the case of stress, there are tensile and compressive strains.

6.2.2 Stress and Strain Relationship (Hooke’s Law)

The relationship between stress and strain in a particular material is determined by a tensile test. A specimen of the material, usually a prismatic bar, is placed in a testing machine and subjected to tension. The force on the bar and the elongation of the bar are measured as the load is increased. The stress in the bar is found by dividing the force by the cross-sectional area, and the strain is found by dividing the elongation by the original length of the bar. By determining the stress and strain for various magnitudes of the load, we can plot a curve showing the stress-strain relationship. Structural steel, one of the most widely used metals in buildings, bridges, cranes, etc., has the typical stress-strain diagram in tension depicted in Figure 6.4.

In the diagram, from the point 0 to A the stress-strain relationship is linear (i.e., the stress and strain are proportional). Beyond A, the stress-strain relationship is nonlinear. The point A is called the proportional limit. Beyond A, increasing the load increases the strain more rapidly than the stress until point B, where a considerable elongation begins to occur with a very small increase in the tensile force. This phenomenon is known as yielding of the material, and the stress at point B is known as the yield point (or yield stress). In the region BC, the material is said to become plastic. At point C, the material begins to strain harden, which resists further increases in loading. Thus, with further elongation, the stress increases until it reaches the ultimate stress at point D. Beyond D, further increase in the strain causes a reduction in the stress until the material breaks at point E.

FIGURE 6.4: Stress-strain diagram for a typical structural steel in tension (stress σ versus strain ε; labeled points 0, A, B, C, D, E, and E′).

In the vicinity of the ultimate stress, a reduction in the cross-sectional area of the bar may occur (see Figure 6.5); this is called necking, and it affects the calculation of the stress. If the actual (reduced) cross-sectional area is used to compute the stress, the stress-strain curve follows the dashed line CE′ depicted in Figure 6.4.

FIGURE 6.5: Necking of a prismatic bar in tension (end forces P).

We note that aluminum alloys have a more gradual transition from the linear region to the nonlinear region. In addition, both steel and aluminum undergo large strains before failure and are thus classified as ductile. An advantage of ductility is that visible distortions may occur if the load becomes excessively large, thus providing an opportunity for visual inspection and remedial action before an actual fracture occurs. Finally, stress-strain diagrams for compression are different from those for tension. For example, ductile metal alloys such as steel, aluminum, and copper have proportional limits in compression very close to those in tension; however, once yielding begins, the behavior is quite different.

So far we have discussed the behavior of elastic solids subjected to tensile or compressive forces. Now let us consider what happens when the material is unloaded. In this case, the elongation will either partially or completely disappear. If the bar completely recovers its original shape when the load is removed, it is said to be perfectly elastic. Otherwise, it is known as partially elastic. In the latter case, the elongation that remains when the load is removed is called the permanent set.

For a perfectly elastic material, the process of loading and unloading can be repeated for successively higher values of the loading force. Eventually, a stress will be reached for which a residual strain remains after unloading (i.e., the material becomes partially elastic). The stress that represents the upper limit of the elastic region is known as the elastic limit. For steel, as well as many other metals, the elastic limit and the proportional limit nearly coincide.

Many structural materials have an initial region on the stress-strain curve like the curve 0A in Figure 6.4. When a material behaves elastically and has a linear relationship between stress and strain, it is called a linear elastic material. Linear elasticity in the initial region of the stress-strain curve is a property of many materials, such as metals, plastics, wood, and ceramics.

The linear relationship between stress and strain can be described by
$$\sigma = E\varepsilon,$$
which is commonly known as Hooke's Law. Here, E is a constant of proportionality known as the modulus of elasticity or Young's modulus. It is given in units of N/m² (the same as stress). By the definition of stress, we have
$$\sigma = \frac{P}{A} = E\,\frac{\delta}{L}.$$
Solving for δ we obtain
$$\delta = \frac{PL}{EA}.$$

Thus, for a linear elastic bar, the elongation is directly proportional to L and P and inversely proportional to E and A. The product EA is known as the axial rigidity of the bar. In addition, the flexibility of the bar is defined as the deflection due to a unit value of the load and is given by $L/(EA)$. Similarly, the stiffness is defined to be the force required to produce a unit deflection and is given by $EA/L$, which is the reciprocal of the flexibility. These are important concepts in the analysis of structural materials.

When a prismatic bar is subjected to tensile loading, the axial elongation is accompanied by a lateral contraction, normal to the direction of the applied forces. Within the elastic region, the ratio
$$\nu = \frac{|\text{lateral strain}|}{|\text{axial strain}|}$$
is constant and is called Poisson's ratio, named after the famous French mathematician S. D. Poisson (1781-1840). For isotropic materials (having the same elastic properties in all directions), he obtained ν = 0.25 from a molecular theory of elasticity; measured values for most metals lie roughly between 0.25 and 0.35.
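The relations above (σ = P/A, Hooke's law, δ = PL/(EA), flexibility, stiffness, and Poisson's ratio) can be chained into a short calculation. The Python sketch below does this for a hypothetical steel bar; the function name and the numbers (E = 200 GPa, roughly the 29-30 × 10⁶ psi listed for steel in Table 6.1, and ν = 0.3) are illustrative assumptions, and the formulas hold only below the proportional limit.

```python
def axial_bar_response(P, L, E, A, nu):
    """Linear elastic response of a prismatic bar under axial load P (SI units).

    Returns stress, axial strain, elongation, flexibility, stiffness, and
    lateral strain. Valid only in the linear elastic range.
    """
    sigma = P / A                 # normal stress, sigma = P/A
    eps = sigma / E               # Hooke's law, sigma = E * eps
    delta = eps * L               # elongation, delta = eps * L = P L / (E A)
    flexibility = L / (E * A)     # deflection per unit load
    stiffness = E * A / L         # load per unit deflection (1 / flexibility)
    lateral = -nu * eps           # lateral contraction from Poisson's ratio
    return sigma, eps, delta, flexibility, stiffness, lateral


if __name__ == "__main__":
    # Hypothetical steel bar: 10 kN load, 2 m long, 1 cm^2 cross section.
    sigma, eps, delta, flex, stiff, lat = axial_bar_response(
        P=10e3, L=2.0, E=200e9, A=1e-4, nu=0.3)
    print(f"stress         = {sigma / 1e6:.1f} MPa")
    print(f"axial strain   = {eps:.2e}")
    print(f"elongation     = {delta * 1e3:.3f} mm")
    print(f"stiffness      = {stiff:.3e} N/m")
    print(f"lateral strain = {lat:.2e}")
```

For these values the stress is 100 MPa and the elongation 1 mm; the product of flexibility and stiffness is 1, as the two definitions require.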

6.2.3 Shear Stress and Strain

In the previous sections we described the behavior of a prismatic bar under normal stresses (i.e., the loading forces act normal to the surface of the material). We now consider another kind of stress, called shear stress, that acts parallel to the surface of the material. To illustrate the concept of shear stress, we consider a bolted connection as depicted in Figure 6.6.

FIGURE 6.6: Bolt subjected to bearing stresses in a bolted connection (bar, clevis, and bolt; sections AB and DC; applied forces P).

Under the influence of the tensile forces P, the prismatic bar and the clevis press against the bolt in bearing, resulting in bearing stresses being developed against the bolt (see Figure 6.7). In addition, the bar and the clevis tend to shear the bolt across the sections AB and DC, as also depicted in Figure 6.7. In this particular configuration there are two planes of shear (along AB and DC), and so the bolt is said to be in double shear. Each of the shear forces S is equal to P/2. The exact distribution of these shear stresses is not easily determined, but their average value is given by
$$\tau_{\text{AVG}} = \frac{S}{A}, \qquad (6.1)$$
where A is the cross-sectional area of the bolt. Since shear stresses act tangentially to the surface on which they act, they are also called tangential stresses. From equation (6.1), shear stresses, like normal stresses, represent the intensity of force per unit area, and thus they have the same units [force/area] as normal stresses.

FIGURE 6.7: Shearing stresses exerted on the bolt by the prismatic bar and the clevis (shear forces S on sections AB and DC).
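As a numerical illustration of equation (6.1), the Python sketch below computes the average shear stress in a bolt, assuming a solid circular bolt of diameter d (an assumption; the text only says that A is the bolt's cross-sectional area). The function name and the load and diameter values are hypothetical.

```python
import math


def average_bolt_shear_stress(P, d, shear_planes=2):
    """Average shear stress tau_avg = S / A, eq. (6.1).

    P: load carried by the connection (N); d: bolt diameter (m);
    shear_planes: 2 for double shear (S = P/2), 1 for single shear (S = P).
    """
    A = math.pi * d ** 2 / 4.0     # area of a circular bolt cross section
    S = P / shear_planes           # shear force on each shear plane
    return S / A


if __name__ == "__main__":
    P, d = 20e3, 0.02              # hypothetical 20 kN load, 20 mm bolt
    tau_double = average_bolt_shear_stress(P, d)       # clevis: double shear
    tau_single = average_bolt_shear_stress(P, d, 1)    # one shear plane
    print(f"double shear: {tau_double / 1e6:.1f} MPa")
    print(f"single shear: {tau_single / 1e6:.1f} MPa")
```

Because the load is shared by two shear planes, the average stress in double shear is half that of the same bolt in single shear (about 31.8 MPa versus 63.7 MPa here). This is only the average value; the actual distribution over the section is not uniform.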

To understand the deformation due to shear stresses, we next consider a cubical element with a shearing stress acting on the top plate, as shown in Figure 6.8.

If there are no normal stresses, equal and opposite shear stresses must act on the bottom plate (otherwise, the block would move horizontally). These two shear stresses produce a moment, which must be balanced by shearing stresses acting on the right and left plates (see Figure 6.9).


FIGURE 6.8: Shear stress τ acting on a rectangular cube (element abcd).

FIGURE 6.9: Shear stresses τ acting on the top, bottom, left, and right plates of the element abcd.

These shearing stresses transform the square abcd into a rhombus, as shown in Figure 6.10.

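The angle γ, a measure of the distortion of the element due to shear, is called the shearing strain and is given by
$$\tan\gamma = \frac{\beta}{|bc|},$$
where β is the lateral displacement indicated in Figure 6.10 and |bc| is the length of side bc.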

When the material behavior is in the linear elastic region, the shear stress-shear strain relationship, like the normal stress-strain relationship discussed earlier, is linear. In particular, we have Hooke's law for shear
$$\tau = G\gamma,$$
where G is the shear modulus of elasticity of the material. Note that E and G are not independent of each other. Indeed, for isotropic, linearly elastic (so-called Hookean) materials we have
$$G = \frac{E}{2(1+\nu)}.$$
Values of E and G are listed in Table 6.1 for various commonly used materials [10].

FIGURE 6.10: Shear strains on the front side of the rhombus (shear stresses τ on all sides; displacement β and shear angle γ).

TABLE 6.1: Values of E and G for various materials.

Material                       Density (lb/in³)   E (psi)             G (psi)
Aluminum                       0.097              10 × 10⁶            4 × 10⁶
Aluminum alloys                0.1–0.3            10 × 10⁶            4 × 10⁶
Steel (mild, high strength)    0.283              (29–30) × 10⁶       (11–12) × 10⁶
Copper                         0.32               15 × 10⁶            6 × 10⁶
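Since G = E/(2(1 + ν)), each row of Table 6.1 implicitly determines a Poisson's ratio. The short Python check below inverts the relation as ν = E/(2G) − 1; using the midpoints of the listed steel ranges is our own assumption, as are the names.

```python
# E and G in psi from Table 6.1 (steel: midpoints of the listed ranges).
TABLE_6_1 = {
    "Aluminum": (10e6, 4e6),
    "Aluminum alloys": (10e6, 4e6),
    "Steel (mild, high strength)": (29.5e6, 11.5e6),
    "Copper": (15e6, 6e6),
}


def poisson_ratio_from_moduli(E, G):
    """Invert G = E / (2 (1 + nu)) for an isotropic linear elastic material."""
    return E / (2.0 * G) - 1.0


if __name__ == "__main__":
    for material, (E, G) in TABLE_6_1.items():
        print(f"{material:28s} nu = {poisson_ratio_from_moduli(E, G):.3f}")
```

This gives ν = 0.25 for aluminum and copper and about 0.28 for the steel midpoints, close to the value ν = 0.25 mentioned earlier for isotropic materials.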

6.3 Deformations of Beams

By the word beam, we mean a bar that is subjected to forces acting transversely to its axis. Beams differ from prismatic bars subjected to tensile and compressive forces in the direction of the loads applied to them: the loads on a bar act along its longitudinal axis, whereas the loads on a beam act normal to its axis. Types of beams that are frequently studied include the cantilever beam and the simply supported (or simple) beam [10]. For a cantilever beam, one end of the beam is fixed and the other end is free (see Figure 6.11). At the fixed (or clamped) end, the beam can neither translate (horizontally or vertically) nor rotate. However, at the free end, it may do both.

FIGURE 6.11: A cantilever beam (concentrated forces P1, P2, and P3).

Another example, shown in Figure 6.12, is a simply supported beam. Here, one end of the beam is supported by a pin and the other end by a roller. A pinned support is capable of resisting horizontal as well as vertical forces, whereas a roller support resists vertical forces only.

FIGURE 6.12: A simple beam (distributed force q).

For beams, loads can be concentrated forces such as P1, P2, and P3 as depicted in Figure 6.11, or a distributed force such as q shown in Figure 6.12. Distributed loads are characterized by their intensity, which is the force per unit length (along the axis of the beam).


6.3.1 Differential Equations of Thin Beam Deflections

In this section we will develop general equations for the transverse deflection or displacement of the center line or axis of a thin cantilever beam. In particular, we consider a cantilever beam (depicted in Figure 6.13) with a tip mass at the free end that is subjected to a distributed force f(t, x) along its axis, in units of force per unit length (e.g., N/m).

FIGURE 6.13: A cantilever beam with a tip mass at the free end, subjected to a distributed force f(t, x) (transverse displacement y(t, x), 0 ≤ x ≤ L).

6.3.1.1 Force Balance

It is assumed that the beam is constructed and the in-plane forces are applied so that the beam bends only in the transverse direction (with no out-of-plane torsion or twisting about the axis of the beam). The beam is assumed thin and rigid so that its motion is completely characterized by the motion of its neutral axis, which in this case passes through the centroids of its cross sections.

Let y(t, x) denote the transverse displacement of the axis of the beam from its rest position. If we cut through the beam at ab (see Figure 6.14) and isolate the part of the beam to the right of the cut as a free body, we see that the action of the removed part (left side of the cut) upon the part of the beam to the right of the cut must be such as to hold the right side of the beam in force equilibrium. In Figure 6.14, M denotes the bending moments due to the applied load and S denotes the shear forces.

FIGURE 6.14: Shearing forces and moments on a cantilever beam with a tip mass at the free end (shear forces S and moments M on the cuts ab at x and cd at x + Δx).

Consider an incremental element abdc of the beam between x and x + Δx, as depicted in Figure 6.15. A general force balance yields
$$\int_x^{x+\Delta x} f(t,s)\,ds + S(t,x+\Delta x) - S(t,x) - f_I - \int_x^{x+\Delta x} \gamma_d\,\frac{\partial y}{\partial t}(t,s)\,ds = 0, \qquad (6.2)$$

where $f_I$ represents the inertia force due to the mass of the differential element abdc and is given by a sum (integral) of point inertia forces $f_i$:
$$f_I = \int_x^{x+\Delta x} f_i(t,s)\,ds = \int_x^{x+\Delta x} \rho\,\frac{\partial^2 y}{\partial t^2}(t,s)\,ds.$$

Here, ρ denotes the linear mass density in units of mass per unit length. The term
$$f_D = \int_x^{x+\Delta x} f_d(t,s)\,ds = \int_x^{x+\Delta x} \gamma_d\,\frac{\partial y}{\partial t}(t,s)\,ds$$
in equation (6.2) represents the sum of the viscous air damping forces $f_d(t,s) = \gamma_d\,\frac{\partial y}{\partial t}(t,s)$. After rearranging terms and dividing both sides of equation (6.2) by Δx, we obtain

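$$\frac{1}{\Delta x}\int_x^{x+\Delta x} f(t,s)\,ds + \frac{S(t,x+\Delta x) - S(t,x)}{\Delta x} - \frac{1}{\Delta x}\int_x^{x+\Delta x} \rho\,\frac{\partial^2 y}{\partial t^2}(t,s)\,ds - \frac{1}{\Delta x}\int_x^{x+\Delta x} \gamma_d\,\frac{\partial y}{\partial t}(t,s)\,ds = 0.$$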


Now taking the limit as Δx → 0, we find
$$f(t,x) + \frac{\partial}{\partial x}S(t,x) - \rho\,\frac{\partial^2 y}{\partial t^2} - \gamma_d\,\frac{\partial y}{\partial t} = 0. \qquad (6.3)$$

FIGURE 6.15: Force balance on an incremental element of the beam (element abdc between x and x + Δx; shear forces S(t, x) and S(t, x + Δx), moments M, applied load f(t, s), damping force f_d(t, s), and inertia force f_i(t, s)).

Equation (6.3) describes the transverse beam displacement y in terms of the loading force f and the shear force S. Next we derive an expression for the shear force S by satisfying another requirement for equilibrium of the beam segment abdc, namely, moment balance.

6.3.1.2 Moment Balance

In the previous section, force balance, $\sum f_k = 0$, assures that one of the two requirements for equilibrium of a beam segment is met. In particular, the equation $\sum f_k = 0$ is satisfied by relying on the existence of shear forces at a section of the beam. The remaining condition of static equilibrium is $\sum M_k = 0$. This, in general, can be satisfied by considering the internal resisting moment within the cross-sectional area of the cut to balance the external moment caused by the load (see Figure 6.15). The internal moment must act in a direction opposite to the external moments to satisfy the moment balance equation ($\sum M_k = 0$). These moments bend the beam in the plane of the loads and are generally referred to as bending moments about points on the neutral axis (or, more precisely, about axes perpendicular to the plane of loading passing through points on the neutral axis).

To maintain the equilibrium of the segment abdc, we consider moment balance about the plane ac (actually about the neutral axis point midway between a and c). Adopting the convention that positive (+) moment is counterclockwise, we obtain
$$-M(t,x) + M(t,x+\Delta x) + S(t,x+\Delta x)\,\Delta x + \int_x^{x+\Delta x} \bigl[f(t,s) - f_i(t,s) - f_d(t,s)\bigr](s-x)\,ds = 0. \qquad (6.4)$$

After rearranging terms and dividing both sides of equation (6.4) by Δx, we arrive at (taking $F \equiv f - f_i - f_d$)
$$\frac{M(t,x+\Delta x) - M(t,x)}{\Delta x} + S(t,x+\Delta x) + \frac{1}{\Delta x}\int_x^{x+\Delta x} F(t,s)(s-x)\,ds = 0.$$

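In the limit as Δx → 0 (the integral term is of order Δx², so after division by Δx it vanishes), we have
$$\frac{\partial}{\partial x}M(t,x) + S(t,x) = 0,$$
or
$$S(t,x) = -\frac{\partial}{\partial x}M(t,x).$$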

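Substituting the above expression for the shear force into equation (6.3), we obtain the expression describing the dynamics of the beam displacement
$$\rho\,\frac{\partial^2 y}{\partial t^2} + \gamma_d\,\frac{\partial y}{\partial t} + \frac{\partial^2}{\partial x^2}M(t,x) = f(t,x). \qquad (6.5)$$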

For the above equation to be useful, we need to derive an expression for thebending moment M in terms of the beam displacement y.

6.3.1.3 Moment Computation

To continue the theoretical development of in-plane beam deflections, we consider in this section the geometry of local beam deformation. That is, to describe the center line displacement, we need to understand the internal moments produced by local deformations (elongations and compressions) in the plane of displacement motion. We begin with the following fundamental assumptions (often called the Euler-Bernoulli assumptions):

(i) Plane sections remain planes during deformation;

(ii) Transverse displacement is small compared to the length of the beam;

(iii) Shear deformation is generally very small and will be neglected (no internal shear).

We note that for a slender beam with L = 10h, where h is the beam thickness, the deflection due to shear is less than 1% [10].


In Figure 6.16 an initially straight segment of the beam (top figure) is shown after it undergoes elongation (bottom edge) and compression (top edge). The neutral axis OO′ of the beam (the portion of the beam that undergoes neither compression nor elongation) is bent into a curve with radius R. The center of curvature can be found by extending any two adjoining sections such as a′e and b′f.

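FIGURE 6.16: Local deformation of a segment of the beam due to bending (segment of length Δx with neutral axis OO′ and sections a′e and b′f under moments M; fiber at distance −ξ elongated by Δu/2 at each end; angle Δθ and radius R to the center of curvature; arc length Δs).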

The arc length of ef is given by
$$\Delta s = R\,\Delta\theta,$$
where Δθ is the angle between the two adjoining sections a′e and b′f. From the above formula we obtain
$$\frac{\Delta\theta}{\Delta s} = \frac{1}{R} = \kappa,$$


where κ denotes the curvature of the center line or neutral axis of the beam. In addition, the strained fiber at distance −ξ from the neutral axis has elongation (half length)
$$\frac{\Delta u}{2} = -\xi\,\frac{\Delta\theta}{2}. \qquad (6.6)$$

We note that in the above formula the negative sign (−) of ξ is consistent with the elongation due to the deformation as depicted in Figure 6.16. Since the fiber ef is not strained, we obtain
$$\Delta s = \Delta x.$$
Substituting this expression into equation (6.6), we obtain
$$\frac{\Delta u}{\Delta x} = -\xi\,\frac{\Delta\theta}{\Delta s} = -\xi\kappa.$$

The above formula can then be used to compute the strain ε:
$$\varepsilon = \frac{\Delta u}{\Delta x} = -\xi\,\frac{\Delta\theta}{\Delta x} = -\xi\kappa, \qquad (6.7)$$
where Δu is the elongation due to deformation and Δx is the original length of the beam segment.

Next, by Hooke's law we obtain
$$\sigma = E\varepsilon = -E\xi\kappa = -\beta\xi, \qquad (6.8)$$
where β = Eκ. Equations (6.7) and (6.8) show that the normal strain and stress vary linearly with the distance ξ from the neutral axis (see also Figure 6.17). The absolute maximum stress and strain occur at the edges of the beam.

Balancing the moments about the point e at x, we obtain
$$M + \int_A \sigma\,\xi\,dA = 0,$$
where σ dA is the infinitesimal force acting on a cross-sectional element of area dA. Substituting expression (6.8) for the stress σ, we find
$$M = \int_A \beta\,\xi^2\,dA = \beta\int_A \xi^2\,dA, \qquad (6.9)$$
where $\int_A \xi^2\,dA$ is the moment of inertia I of the cross-sectional area about the centroidal axis, and ξ is measured from this axis. In fact, the neutral axis of the beam passes through the centroid of the cross-sectional area. This can be seen from the following observation: by force balance, we have
$$\sum F_k = 0, \quad\text{or}\quad \int_A \sigma\,dA = 0.$$
Using equation (6.8), we obtain
$$-\beta\int_A \xi\,dA = 0,$$
which (since β ≠ 0 for a bent beam) implies $\int_A \xi\,dA = 0$. Therefore, we have
$$\int_A \xi\,dA = \bar{\xi}A = 0, \quad\text{or}\quad \bar{\xi} = 0.$$
That is, the distance $\bar{\xi}$ from the neutral axis to the centroid is zero, or the neutral axis passes through the centroid of the cross-sectional area.

FIGURE 6.17: Stress and strain as functions of distance from the neutral axis at the point x (or e) on the neutral axis (linear stress profile σ; maximum strain ε_max at the edges a′a and c′c).

From equation (6.9) and the definition of the parameter β in equation (6.8), we obtain the expression for the bending moment
$$M = EI\kappa. \qquad (6.10)$$


Next, we recall from elementary calculus that for a curve (in our case the displacement of the center line of the beam) given by z = z(x), the curvature is given by the formula
$$\kappa = \frac{\dfrac{d^2 z}{dx^2}}{\left[1 + \left(\dfrac{dz}{dx}\right)^2\right]^{3/2}}.$$
If $|dz/dx| \ll 1$ (a condition justified by the small displacement assumption (ii) at the beginning of this section), we obtain
$$\kappa \approx \frac{d^2 z}{dx^2}. \qquad (6.11)$$
Hence the bending moment is expressed in terms of the beam displacement y(t, x) by the equation
$$M_{\text{int}} = M_{\text{bending}} = EI\,\frac{\partial^2 y}{\partial x^2}. \qquad (6.12)$$
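To gauge the linearization (6.11), note that the ratio of the linearized curvature to the exact one is [1 + (dz/dx)²]^{3/2}, independent of d²z/dx². The Python sketch below (the function names are ours) evaluates both formulas and the resulting relative error for a few slopes.

```python
def curvature_exact(dz, d2z):
    """Exact curvature of z = z(x): z'' / (1 + z'^2)^(3/2)."""
    return d2z / (1.0 + dz * dz) ** 1.5


def curvature_linearized(dz, d2z):
    """Small-slope approximation (6.11): kappa ~ z''. The slope dz is unused."""
    return d2z


def relative_error(dz, d2z=1.0):
    """Relative error of (6.11); equals (1 + dz^2)^(3/2) - 1 for any nonzero d2z."""
    exact = curvature_exact(dz, d2z)
    return abs(curvature_linearized(dz, d2z) - exact) / abs(exact)


if __name__ == "__main__":
    for slope in (1e-3, 1e-2, 1e-1, 0.5):
        print(f"slope {slope:>6}: relative error {relative_error(slope):.2e}")
```

For slopes of order 10⁻² the error is about 1.5 × 10⁻⁴ (roughly 1.5 times the squared slope), which is why (6.11) is adequate under assumption (ii); at a slope of 0.5 it exceeds 30%.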

Before closing this section, we note that for a beam with a rectangular cross-sectional area (thickness h and width b) as depicted in Figure 6.18, the moment of inertia I can be computed as a function of the cross-sectional width and height as
$$I = \int_A \xi^2\,dA = \int_{-h/2}^{h/2}\int_{-b/2}^{b/2} \xi^2\,dz\,d\xi = b\int_{-h/2}^{h/2} \xi^2\,d\xi = \frac{bh^3}{12}.$$
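The Python sketch below checks I = bh³/12 numerically, together with two facts derived above: the first moment ∫_A ξ dA vanishes about the centroidal axis, and integrating the linear stress (6.8) over the section gives M = EIκ. It uses a midpoint rule in ξ (the integrand does not depend on z, so the z-integral contributes the factor b); the grid size, section dimensions, modulus, and curvature are arbitrary hypothetical choices.

```python
def rectangle_section(b, h, n=2000):
    """Midpoint-rule values of int_A xi dA and int_A xi^2 dA for a b-by-h
    rectangle, with xi measured from the centroidal axis."""
    dxi = h / n
    first = second = 0.0
    for i in range(n):
        xi = -h / 2.0 + (i + 0.5) * dxi   # midpoint of the i-th strip
        dA = b * dxi                      # strip area
        first += xi * dA
        second += xi * xi * dA
    return first, second


def bending_moment(E, kappa, b, h, n=2000):
    """M = -int_A sigma xi dA with sigma = -E kappa xi, eq. (6.8)."""
    dxi = h / n
    M = 0.0
    for i in range(n):
        xi = -h / 2.0 + (i + 0.5) * dxi
        sigma = -E * kappa * xi
        M -= sigma * xi * b * dxi
    return M


if __name__ == "__main__":
    b, h = 0.05, 0.01                # hypothetical 50 mm x 10 mm section
    E, kappa = 70e9, 0.1             # hypothetical modulus (Pa), curvature (1/m)
    first, I_num = rectangle_section(b, h)
    I_exact = b * h ** 3 / 12.0
    print(f"first moment = {first:.1e}, I = {I_num:.6e} vs bh^3/12 = {I_exact:.6e}")
    print(f"M = {bending_moment(E, kappa, b, h):.4f} vs EI*kappa = {E * I_exact * kappa:.4f}")
    # Largest stress magnitude occurs at the edges xi = +/- h/2.
    print(f"max |sigma| = {E * kappa * h / 2 / 1e6:.1f} MPa")
```

The midpoint rule has relative error exactly 1/n² for the integral of ξ², so the numerical I agrees with bh³/12 to about 2.5 × 10⁻⁷ here. The last line uses |σ|max = Eκh/2, which by (6.10) equals M(h/2)/I.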

Finally, the formula (6.12) that we derived above did not take into account any resistance of the material (internal resistance) to bending. For a material known as a viscous material or viscoelastic material, the stress depends on the strain rate as well as the strain [10]. That is,
$$\sigma = E\varepsilon + c_D\,\dot{\varepsilon}.$$
From moment balance we have for such materials
$$M = -\int_A \sigma\,\xi\,dA = -\int_A \bigl(E\varepsilon + c_D\,\dot{\varepsilon}\bigr)\xi\,dA.$$


FIGURE 6.18: Segment of a beam with a rectangular cross-sectional area (thickness h, width b; coordinate axes x, y, z).

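Substituting equations (6.7) and (6.11), after some simple calculations we obtain
$$M = M_{\text{bending}} + M_{\text{damping}} \approx EI\,\frac{\partial^2 y}{\partial x^2} + c_D I\,\frac{\partial^3 y}{\partial x^2\,\partial t}. \qquad (6.13)$$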

It is seen from the above equation that for a viscoelastic material the moment is the sum of two terms: the first term is the standard internal (bending) moment, and the second term is due to structural damping (which is also referred to as Kelvin-Voigt damping). There are other types of damping models that one frequently encounters in practice. For example, for composite fibrous materials
$$M_{\text{damping}}(t,x) = \int_x^{l} d\xi \int_0^{l} b(\xi,\theta)\left[\frac{\partial^2 y}{\partial\xi\,\partial t}(t,\xi) - \frac{\partial^2 y}{\partial\theta\,\partial t}(t,\theta)\right]d\theta.$$

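This is known as spatial hysteresis damping. It follows from the above formula that the shear force is given by
$$S(t,x) = -\frac{\partial M}{\partial x}(t,x) = \int_0^{l} b(x,\theta)\left[\frac{\partial^2 y}{\partial x\,\partial t}(t,x) - \frac{\partial^2 y}{\partial\theta\,\partial t}(t,\theta)\right]d\theta.$$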

Another type of damping, known as time hysteresis damping, is used to model many viscoelastic materials such as rubbers and other polymers, electromagnetic materials, and "smart materials" such as ferroelectrics and ferromagnetics. These materials exhibit hysteresis in their stress-strain curves. In particular, the stress-strain relationship has the form
$$\sigma = E\varepsilon + E_1\varepsilon_1,$$
where $\varepsilon_1$ is called the "internal" strain variable. One simple model for the internal strain variable is
$$\dot{\varepsilon}_1 + c\,\varepsilon_1 = k\varepsilon, \qquad \varepsilon_1(0) = 0.$$

Expressing the solution to the above first-order linear differential equation by the variation of constants formula and substituting it into the stress-strain relationship, we obtain
$$\sigma(t) = E\varepsilon(t) + E_1\int_0^t e^{-c(t-s)}\,k\,\varepsilon(s)\,ds = E\varepsilon(t) + \int_0^t K(t-s)\,\varepsilon(s)\,ds, \qquad (6.14)$$

where K(·), the kernel function, is given by the exponential function $K(t) = E_1 k\,e^{-ct}$ for the simple internal strain model given above. Using the expression
$$M = -\int_A \sigma\,\xi\,dA$$
and equations (6.7) and (6.11), we obtain the following formula for the moment:
$$M(t,x) = EI\,\frac{\partial^2 y}{\partial x^2}(t,x) + I\int_0^t K(t-s)\,\frac{\partial^2 y}{\partial x^2}(s,x)\,ds = M_{\text{bending}} + M_{\text{hysteresis damping}}.$$
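The variation of constants step behind (6.14) can be checked numerically. The Python sketch below takes an arbitrarily chosen strain history ε(t) = sin t and hypothetical constants c, k, E, E₁; it integrates ε̇₁ + cε₁ = kε with the classical fourth-order Runge-Kutta method, evaluates the convolution ∫₀ᵗ k e^{−c(t−s)} ε(s) ds with the trapezoidal rule, and compares both with the closed form ε₁(t) = k(c sin t − cos t + e^{−ct})/(1 + c²), which holds for this particular input. All names are ours.

```python
import math


def internal_strain_rk4(eps, c, k, T, n):
    """Solve eps1' + c eps1 = k eps(t), eps1(0) = 0, on [0, T] with classical RK4."""

    def rhs(t, y):
        return k * eps(t) - c * y

    dt = T / n
    t, y = 0.0, 0.0
    for _ in range(n):
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1)
        k3 = rhs(t + dt / 2, y + dt / 2 * k2)
        k4 = rhs(t + dt, y + dt * k3)
        y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y


def internal_strain_convolution(eps, c, k, T, n):
    """eps1(T) = int_0^T k exp(-c (T - s)) eps(s) ds, by the trapezoidal rule."""
    ds = T / n
    g = [k * math.exp(-c * (T - i * ds)) * eps(i * ds) for i in range(n + 1)]
    return ds * (sum(g) - 0.5 * (g[0] + g[-1]))


def hysteresis_stress(eps_t, eps1_t, E, E1):
    """Stress-strain law sigma = E eps + E1 eps1."""
    return E * eps_t + E1 * eps1_t


if __name__ == "__main__":
    c, k, T, n = 2.0, 3.0, 5.0, 2000          # hypothetical constants
    E, E1 = 1.0, 0.5                          # hypothetical moduli
    exact = k * (c * math.sin(T) - math.cos(T) + math.exp(-c * T)) / (1 + c * c)
    ode = internal_strain_rk4(math.sin, c, k, T, n)
    conv = internal_strain_convolution(math.sin, c, k, T, n)
    print(f"closed form {exact:.8f}, RK4 {ode:.8f}, convolution {conv:.8f}")
    print(f"sigma(T) = {hysteresis_stress(math.sin(T), conv, E, E1):.6f}")
```

Agreement of the three values (the convolution to within the trapezoidal-rule error, of order ds²) is consistent with the kernel K(t) = E₁k e^{−ct} for this internal strain model.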

6.3.1.4 Initial Conditions

If one does not take into account any internal resistance of the material to bending (i.e., no structural damping), the moment is given by
$$M = M_{\text{int}} = M_{\text{bending}} = EI\,\frac{\partial^2 y}{\partial x^2}.$$

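Substituting this expression into the dynamical equation (6.5) for the transverse beam displacement, we have
$$\rho\,\frac{\partial^2 y}{\partial t^2} + \gamma_d\,\frac{\partial y}{\partial t} + \frac{\partial^2}{\partial x^2}\left(EI\,\frac{\partial^2 y}{\partial x^2}\right) = f(t,x). \qquad (6.15)$$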

This dynamical equation is second order in time and fourth order in space. Therefore, for well-posedness of solutions, two initial conditions are required. These are usually the initial displacement and initial velocity, which are given by
$$y(0,x) = y_1(x), \qquad \frac{\partial}{\partial t}y(0,x) = y_2(x),$$
where $y_1$ and $y_2$ are known functions.
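As an illustration of (6.15) in its simplest setting, take the undamped, unforced case (γ_d = 0, f = 0) with constant EI and ρ. Separating variables, y = Y(x) sin(ωt), gives EI Y'''' = ρω²Y. Assuming the cantilever conditions y = ∂y/∂x = 0 at x = 0 and zero moment and shear at the free end x = L (no tip mass; these conditions are introduced in the next subsection), nontrivial solutions exist only when cos(βL) cosh(βL) + 1 = 0, where β⁴ = ρω²/(EI). The Python sketch below finds the first roots by bisection and converts them to natural frequencies; the function names and beam data are hypothetical.

```python
import math


def char_eq(beta_L):
    """Clamped-free characteristic function cos(bL) cosh(bL) + 1."""
    return math.cos(beta_L) * math.cosh(beta_L) + 1.0


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) of opposite sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def cantilever_roots(n_modes):
    """The n-th root of cos x cosh x + 1 = 0 lies in ((n - 1) pi, n pi)."""
    return [bisect(char_eq, (n - 1) * math.pi, n * math.pi)
            for n in range(1, n_modes + 1)]


def natural_frequencies(EI, rho, L, n_modes=3):
    """omega_n = (beta_n L)^2 sqrt(EI / (rho L^4)) in rad/s; rho is mass per unit length."""
    scale = math.sqrt(EI / (rho * L ** 4))
    return [r * r * scale for r in cantilever_roots(n_modes)]


if __name__ == "__main__":
    # Hypothetical aluminum strip: E = 70 GPa, 50 mm x 5 mm section, 0.5 m long.
    b, h, L = 0.05, 0.005, 0.5
    EI = 70e9 * b * h ** 3 / 12.0
    rho = 2700.0 * b * h              # density (kg/m^3) times area = mass per unit length
    for n, w in enumerate(natural_frequencies(EI, rho, L), start=1):
        print(f"mode {n}: {w:8.2f} rad/s = {w / (2 * math.pi):7.2f} Hz")
```

The roots come out near 1.8751, 4.6941, and 7.8548, the standard values for a clamped-free Euler-Bernoulli beam; adding damping or the tip mass considered below would change the frequencies.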


6.3.1.5 Boundary Conditions

From the discussion above, the dynamical equation (6.15) for the cantilever beam displacement is fourth order in space. Therefore, four boundary conditions need to be specified. Two of these boundary conditions are naturally specified at the fixed (clamped) end. This type of support prevents displacement in any direction as well as rotation of the beam (so the slope is zero). Therefore, we have the following mathematical conditions:

• Fixed end support (x = 0):
$$y(t,0) = 0, \qquad \frac{\partial}{\partial x}y(t,0) = 0.$$

There are other types of supports used frequently for beams loaded withforces acting in the same plane. These include:

• Pinned end support (x = 0): Since a pinned (or simple) end support (shown in Figure 6.19) resists a force acting in any direction of the plane but offers no resistance to rotation, both the displacement and the moment are specified to be zero. That is,

$$y(t,0) = 0 = M(t,0).$$

FIGURE 6.19: Pinned end support (actual pinned support and its diagrammatic representation, resisting forces in all directions).

• Free end support (x = 0): For this type of support, the moment as well as the shear force are zero. Thus,

$$M(t,0) = \frac{\partial M}{\partial x}(t,0) = 0.$$


• Frictionless ring or roller end support (x = 0): This type of support (see Figure 6.20) offers no resistance to vertical forces and resists longitudinal forces only, while preventing rotation of the beam end. Therefore, both the slope and the shear force are zero, as specified by

$$\frac{\partial y}{\partial x}(t,0) = \frac{\partial M}{\partial x}(t,0) = 0.$$

FIGURE 6.20: Frictionless roller end support (resisting force in only one direction, the longitudinal direction).

• Tip mass (x = L): To allow for a more general (and often useful) formulation, we next consider a cantilevered beam with a tip mass at the free end, as depicted in Figure 6.21. Figure 6.22 shows a segment of the beam at the tip end after it undergoes deformation due to rotational inertia.

FIGURE 6.21: Cantilever beam with a tip mass (distributed load f(t,x) on 0 ≤ x ≤ L and external force g(t) acting on the tip body at x = L).


FIGURE 6.22: Local deformation of the cantilever beam with tip mass (tip displacement y(t,L), rotation angle θ, offset δ, and deformation η due to rotational inertia).

Since θ in Figure 6.22 is small by the Euler-Bernoulli assumption (ii), we have

$$\tan\theta = \frac{\sin\theta}{\cos\theta} \approx \sin\theta = \frac{\eta}{\delta},$$

which implies that η ≈ δ tan θ. Hence, the total transverse displacement of the center of mass of the tip body is

$$\eta + y(t,L).$$

Considering the dynamics at the tip mass and applying a force balance that includes the inertia force at the tip (see Figure 6.23), we find

$$-S(t,L) + g(t) - (f_{\text{tip}})_I = 0,$$

or,

$$-S(t,L) + g(t) - m\,\frac{\partial^2}{\partial t^2}\big[\eta + y(t,L)\big] = 0,$$

where m is the tip body mass and g is the resultant external force applied to the center of mass of the tip body. Thus, writing the shear force as S = −∂M/∂x and rearranging terms, we obtain

$$-\frac{\partial M}{\partial x}(t,L) + m\,\frac{\partial^2}{\partial t^2}\big[\delta\tan\theta + y(t,L)\big] = g(t).$$

FIGURE 6.23: Force balance at the tip mass (shear force S(t,L), external force g(t), and inertia force (f_tip)_I acting on the tip body).

It remains to determine tan θ. Consider a differential beam element abdc as before. The deformation of the beam due to the rotation of the beam cross section is illustrated in Figure 6.24. From Figure 6.24, it follows that

$$\theta \approx \tan\theta \approx \frac{y(t,x+\Delta x) - y(t,x)}{\Delta x} \approx \frac{\partial y}{\partial x},$$

where Δx is small. Here we use the assumptions that plane sections remain plane and that there is no shear deformation. Hence the boundary condition at x = L is given by

$$-\frac{\partial M}{\partial x}(t,L) + m\,\frac{\partial^2}{\partial t^2}\left[\delta\,\frac{\partial y}{\partial x}(t,L) + y(t,L)\right] = g(t).$$

It should be emphasized that if beam deformation due to shearing is also considered, the deformation dynamics can still be written down but are much more complicated.


FIGURE 6.24: Deformation of the beam due to the rotation of the beam cross section (element abdc deformed to a′b′d′c′, rotated through the angle θ).

To obtain the second boundary condition at x = L, we now apply Newton’s second law for rotation (the sum of all moments equals the product of the moment of inertia and the angular acceleration). Note that since the tip mass sits at a free end, it is free to rotate. Hence we have

$$\sum M_I = J\,\alpha(t). \qquad (6.16)$$

This expression is the counterpart of Newton’s second law for rectilinear motion, which states that the sum of all applied forces equals the product of mass and acceleration. In equation (6.16), J is the moment of inertia about the axis of rotation (which describes how the mass of the body is distributed about that axis) and is given by

$$J = \iiint_V \rho\, r^2\, dV,$$

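where ρ is the tip mass density and r is the distance from the axis of rotation. In addition, the angular acceleration α(t) can be computed from the following expressions:

$$\alpha(t) = \frac{d}{dt}\left(\frac{d\theta}{dt}\right) \approx \frac{d}{dt}\left(\frac{d}{dt}\,\frac{\partial y}{\partial x}(t,L)\right) = \frac{\partial^3 y}{\partial t^2\,\partial x}(t,L). \qquad (6.17)$$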

Finally, from Figure 6.25, we obtain ∑M_I = −M(t,L) + S(t,L)δ + h(t), where h is the external applied moment about the center of mass of the tip body. Substituting this expression for the sum of moments, as well as the formula (6.17) for the angular acceleration, into Newton’s second law for rotation (6.16), we obtain

$$J\,\frac{\partial^3 y}{\partial t^2\,\partial x}(t,L) = -M(t,L) + S(t,L)\,\delta + h(t)$$

or, with S = −∂M/∂x,

$$J\,\frac{\partial^3 y}{\partial t^2\,\partial x}(t,L) + \delta\,\frac{\partial M}{\partial x}(t,L) + M(t,L) = h(t),$$

which gives us the required second boundary condition at x = L.

FIGURE 6.25: Moment balance at the tip mass (moment M(t,L), shear force S(t,L), external force g(t), and external moment h(t)).

6.4 Separation of Variables: Modes and Mode Shapes

During the last two centuries, several methods have been developed for solving partial differential equations such as those describing transverse beam displacement (e.g., (6.15)). Among these, in this section we consider the method of separation of variables. This is perhaps the oldest systematic and relatively simple method for solving partial differential equations. It has been used since 1750 by D’Alembert, D. Bernoulli, and Euler in their studies of the wave equation (to be discussed later in this manuscript). In essence, the principal feature of this method is the replacement of the partial differential equation by a family of ordinary differential equations. The solution of the partial differential equation is then expressed, in general, as an infinite sum of solutions to these ordinary differential equations, usually in terms of trigonometric functions. Because of its relative simplicity and the many important and practical problems to which it applies, the method of separation of variables has become one of the classical techniques of mathematical physics.

We begin by considering a simply supported (pinned-pinned), undamped beam with no forcing function. Its transverse displacement satisfies

$$\rho\,\ddot{y} + (EI\,y'')'' = 0 \qquad (6.18)$$

with boundary conditions

y(t, 0) = y(t, L) = 0, (6.19)

M(t, 0) = M(t, L) = 0, (6.20)

where M(t,x) = EI ∂²y/∂x² and L is the length of the beam. Here ẏ = ∂y/∂t denotes the temporal derivative and y′ = ∂y/∂x the spatial derivative. The initial conditions are given by

$$y(0,x) = y_1(x), \qquad (6.21)$$

$$\dot{y}(0,x) = y_2(x), \qquad 0 \le x \le L. \qquad (6.22)$$

The simply supported beam problem (6.18)-(6.22) consists of a linear homogeneous differential equation with linear homogeneous boundary conditions. This suggests that one might seek solutions of the differential equation and boundary conditions, and then superpose them to satisfy the initial conditions. In particular, we will apply the method of separation of variables to find these solutions.

In the method of separation of variables, one seeks a solution to the partial differential equation of the form

y(t, x) = w(t)φ(x). (6.23)

Substituting equation (6.23) for y into the differential equation (6.18), we obtain

$$\rho\,\ddot{w}\,\varphi + w\,(EI\,\varphi'')'' = 0,$$

or, equivalently,

$$-\frac{\ddot{w}}{w} = \frac{(EI\,\varphi'')''}{\rho\,\varphi}. \qquad (6.24)$$

The variables in equation (6.24) are now separated provided ρ and EI depend only on x and not on t; that is, the left side depends only on t and the right side depends only on x. Since equation (6.24) holds for 0 < x < L and t > 0, both sides of equation (6.24) must equal the same constant, which we denote by β. Thus equation (6.24) becomes

$$-\frac{\ddot{w}}{w} = \frac{(EI\,\varphi'')''}{\rho\,\varphi} = \beta,$$


from which we obtain the following two ordinary differential equations for w(t) and φ(x):

$$\ddot{w} + \beta\,w = 0, \qquad (6.25)$$

$$\frac{(EI\,\varphi'')''}{\rho} - \beta\,\varphi = 0. \qquad (6.26)$$

Consequently, the partial differential equation (6.18) is replaced by two ordinary differential equations. If we assume that EI and ρ are constant, each of these equations can be readily solved for any constant β. However, we are only interested in those solutions of equation (6.18) that also satisfy the boundary conditions (6.19) and (6.20). This, in turn, restricts the possible values of β, as we shall see below.

Substituting the form (6.23) for y(t,x) into the boundary conditions at x = 0, we obtain

$$y(t,0) = w(t)\,\varphi(0) = 0, \qquad EI\,\frac{\partial^2 y}{\partial x^2}(t,0) = EI\,w(t)\,\varphi''(0) = 0. \qquad (6.27)$$

The conditions (6.27) are satisfied if w(t) = 0 for all t or if φ(0) = φ′′(0) = 0. However, the condition w(t) = 0 for all t would imply that y(t,x) = 0. This is not acceptable since it does not satisfy the initial conditions (6.21)-(6.22) except in the trivial case y1 = y2 = 0. Therefore, equations (6.27) are satisfied by requiring that

φ(0) = φ′′(0) = 0. (6.28)

Similarly, substituting (6.23) into the boundary conditions at x = L, we find

φ(L) = φ′′(L) = 0. (6.29)

Returning to (6.26), we have

$$\varphi'''' = \frac{\beta\rho}{EI}\,\varphi. \qquad (6.30)$$

Now, multiplying both sides of equation (6.30) by φ, integrating the resulting equation from 0 to L, and integrating the left side by parts twice, we obtain

$$\varphi'''(L)\varphi(L) - \varphi'''(0)\varphi(0) - \varphi''(L)\varphi'(L) + \varphi''(0)\varphi'(0) + \int_0^L (\varphi''(x))^2\,dx = \int_0^L \frac{\beta\rho}{EI}\,\varphi^2(x)\,dx. \qquad (6.31)$$

Substituting the boundary conditions (6.28)-(6.29) into equation (6.31), we obtain

$$\int_0^L (\varphi''(x))^2\,dx = \int_0^L \frac{\beta\rho}{EI}\,\varphi^2(x)\,dx. \qquad (6.32)$$
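To see the boundary terms of (6.31) drop out in a concrete case, one can test the identity ∫φ′′′′φ dx = ∫(φ′′)² dx on any smooth function satisfying (6.28)-(6.29). The polynomial φ(x) = x⁴ − 2Lx³ + L³x used in this Python sketch is a hypothetical choice that meets all four conditions; it is not an eigenfunction, so the check exercises only the integration-by-parts step, not equation (6.30). Both integrals equal 24L⁵/5 exactly.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

L = 2.0

def phi(x):      # satisfies phi(0) = phi(L) = 0
    return x**4 - 2 * L * x**3 + L**3 * x

def d2phi(x):    # phi'' = 12x^2 - 12Lx, so phi''(0) = phi''(L) = 0
    return 12 * x**2 - 12 * L * x

def d4phi(x):    # phi'''' is the constant 24
    return 24.0

lhs = simpson(lambda x: d4phi(x) * phi(x), 0.0, L)   # int_0^L phi'''' phi dx
rhs = simpson(lambda x: d2phi(x) ** 2, 0.0, L)       # int_0^L (phi'')^2 dx
print(lhs, rhs, 24 * L**5 / 5)                       # agree to quadrature accuracy
```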


Since EI and ρ are positive physical quantities, (6.32) forces β = 0 or β > 0. We note that if β = 0, then (6.32) gives φ′′ ≡ 0, so φ is linear, and the boundary conditions φ(0) = φ(L) = 0 force φ(x) = 0 for all 0 ≤ x ≤ L. This would also imply that y(t,x) = 0, which is unacceptable since this solution does not satisfy the initial conditions (6.21)-(6.22) except in the trivial case.

Summarizing our results to this point, we have shown that we can satisfy the boundary conditions (6.19)-(6.20) for nontrivial solutions only if the separation constant β is positive. The corresponding solution to the linear, constant-coefficient equation (6.30) can now be readily obtained by standard methods for ordinary differential equations and is given by

φ(x) = a cos(ξx) + b cosh(ξx) + c sin(ξx) + d sinh(ξx), (6.33)

where $\xi = (\beta\rho/EI)^{1/4}$. Next, applying the boundary conditions (6.28) at x = 0, we have

a = b = 0.

In addition, applying the boundary conditions (6.29) at x = L, we find

d = 0, c sin(ξL) = 0.

Hence, either c = 0 or sin(ξL) = 0. But c = 0 implies that φ(x) = 0 and consequently y(t,x) = 0, which is again unacceptable for nontrivial solutions. Hence, we must have sin(ξL) = 0, and from this condition we obtain

ξL = nπ, n = 1, 2, . . . . (6.34)
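The same conclusion can be reached by imposing (6.28)-(6.29) on the general solution (6.33) all at once: a nontrivial (a, b, c, d) exists only where the 4 × 4 boundary-condition matrix is singular. In this Python sketch (rows φ(0), φ′′(0)/ξ², φ(L), φ′′(L)/ξ²; function names and grid sizes are illustrative choices), the determinant works out to 4 sin(ξL) sinh(ξL), so its positive roots should sit at ξL = nπ; the code locates them by sign changes and bisection.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for j in range(col, n):
                a[r][j] -= f * a[col][j]
    return d

def bc_det(xi, L):
    """Determinant of (6.28)-(6.29) applied to (6.33); unknowns ordered (a, b, c, d)."""
    co, ch = math.cos(xi * L), math.cosh(xi * L)
    si, sh = math.sin(xi * L), math.sinh(xi * L)
    return det([
        [1.0, 1.0, 0.0, 0.0],    # phi(0)
        [-1.0, 1.0, 0.0, 0.0],   # phi''(0) / xi^2
        [co, ch, si, sh],        # phi(L)
        [-co, ch, -si, sh],      # phi''(L) / xi^2
    ])

def roots(L, xi_max, samples=2000):
    """Bracket sign changes of bc_det on (0, xi_max] and refine each by bisection."""
    xs = [xi_max * (i + 1) / samples for i in range(samples)]
    found = []
    for lo, hi in zip(xs, xs[1:]):
        flo = bc_det(lo, L)
        if flo * bc_det(hi, L) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fmid = bc_det(mid, L)
                if flo * fmid <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            found.append(0.5 * (lo + hi))
    return found

print([r / math.pi for r in roots(1.0, 10.0)])   # close to [1.0, 2.0, 3.0]
```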

We have thus shown that we can satisfy the boundary conditions (6.19)-(6.20) only if the separation constant β is positive and is given by

$$\beta_n = \frac{EI}{\rho}\left(\frac{n\pi}{L}\right)^4, \qquad n = 1, 2, \ldots, \qquad (6.35)$$

where we used $\xi = (\beta\rho/EI)^{1/4}$. In addition, the functions

$$\varphi_n(x) = \sin\left(\frac{n\pi x}{L}\right), \qquad n = 1, 2, \ldots, \qquad (6.36)$$

satisfy the boundary conditions (6.28)-(6.29) and the differential equation (6.30). These functions are called eigenfunctions or, in the engineering literature, mode shapes. Next, upon substituting the values of β given by equation (6.35) into the differential equation (6.25), we find that w(t) is given by

$$w_n(t) = A_n\cos\omega_n t + B_n\sin\omega_n t, \qquad (6.37)$$


where $\omega_n = \sqrt{\beta_n} = \sqrt{EI/\rho}\;n^2\pi^2/L^2$, and A_n and B_n are arbitrary constants for n = 1, 2, .... The constants ω_n are called modes or natural frequencies. Therefore, we conclude that the functions

$$y_n(t,x) = \left(A_n\cos\omega_n t + B_n\sin\omega_n t\right)\sin\left(\frac{n\pi x}{L}\right), \qquad n = 1, 2, \ldots, \qquad (6.38)$$

satisfy the differential equation (6.18) and the boundary conditions (6.19)-(6.20). Moreover, since the differential equation and the boundary conditions are linear and homogeneous, by the superposition principle any linear combination of the y_n(t,x) also satisfies the differential equation and boundary conditions. Hence, we have

$$y(t,x) = \sum_{n=1}^{\infty} c_n\,y_n(t,x) = \sum_{n=1}^{\infty}\left(A_n\cos\omega_n t + B_n\sin\omega_n t\right)\sin\left(\frac{n\pi x}{L}\right), \qquad (6.39)$$

where, for simplicity, we have absorbed the constants of proportionality c_n into A_n and B_n. It now only remains to satisfy the initial conditions (6.21)-(6.22).

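That is, upon substituting the initial conditions into equation (6.39) for y(t,x), we must have

$$y_1(x) = \sum_{n=1}^{\infty} A_n\sin\left(\frac{n\pi x}{L}\right) \qquad (6.40)$$

and

$$y_2(x) = \sum_{n=1}^{\infty} \omega_n B_n\sin\left(\frac{n\pi x}{L}\right). \qquad (6.41)$$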

These infinite series for y1 and y2 are the well-known Fourier sine series (see Appendix A). The coefficients A_n and B_n are readily obtained from the following Euler-Fourier formulas [4]:

$$A_n = \frac{2}{L}\int_0^L y_1(x)\sin\left(\frac{n\pi x}{L}\right)dx, \qquad B_n = \frac{2}{\omega_n L}\int_0^L y_2(x)\sin\left(\frac{n\pi x}{L}\right)dx. \qquad (6.42)$$

These formulas, as well as conditions on the functions y1, y2 that guarantee convergence of a Fourier series to the function from which its coefficients were computed, can be found in most books on advanced calculus or applied mathematics (e.g., [6, 8]); a brief introduction to the Fourier series and Fourier transform is given in Appendix A.
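Formulas (6.35), (6.37), and (6.42) combine into a short recipe for evaluating the truncated series (6.39). The Python sketch below approximates the integrals in (6.42) with composite Simpson's rule; the parameter values and the initial data (built from two modes so the recovered coefficients are easy to check) are hypothetical test inputs, and the function name `modal_solution` is illustrative.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def modal_solution(y1, y2, EI, rho, L, N):
    """First N terms of (6.39) with A_n, B_n from (6.42) and omega_n = sqrt(beta_n)."""
    omegas, A, B = [], [], []
    for n in range(1, N + 1):
        w = math.sqrt(EI / rho) * (n * math.pi / L) ** 2
        def mode(x, n=n):
            return math.sin(n * math.pi * x / L)
        A.append(2 / L * simpson(lambda x: y1(x) * mode(x), 0.0, L))
        B.append(2 / (w * L) * simpson(lambda x: y2(x) * mode(x), 0.0, L))
        omegas.append(w)

    def y(t, x):
        return sum((A[i] * math.cos(omegas[i] * t) + B[i] * math.sin(omegas[i] * t))
                   * math.sin((i + 1) * math.pi * x / L) for i in range(N))
    return y, A, B, omegas

# Hypothetical data: y1 = sin(pi x) + 0.5 sin(3 pi x), y2 = 0.2 sin(2 pi x), L = 1.
y, A, B, omegas = modal_solution(
    lambda x: math.sin(math.pi * x) + 0.5 * math.sin(3 * math.pi * x),
    lambda x: 0.2 * math.sin(2 * math.pi * x),
    EI=2.0, rho=0.5, L=1.0, N=4)
print(A)   # expect A_1 ≈ 1 and A_3 ≈ 0.5, the others ≈ 0
```

Only the initial data enter through (6.42); once A_n, B_n, and ω_n are known, evaluating y(t, x) at any later time is a finite sum.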

The method of separation of variables can also be used to solve damped beam vibration problems and problems with boundary conditions other than the pinned-pinned type given by equations (6.19)-(6.20). For example, in the case of internal damping such as Kelvin-Voigt damping (6.13), the equation for the transverse beam displacement is given by

$$\rho\,\ddot{y} + (EI\,y'')'' + (c_D I\,\dot{y}'')'' = 0. \qquad (6.43)$$

For simplicity, we assume that the boundary conditions and initial conditions are the same as in the undamped case and are given by equations (6.19)-(6.20) and (6.21)-(6.22), respectively (the project assignment at the end of this chapter deals with the boundary conditions associated with the cantilever beam). The resulting problem is solved by essentially the same procedure as the undamped problem considered above. That is, assuming y(t,x) = w(t)φ(x) and that EI and c_DI are constants, we obtain

$$\rho\,\ddot{w}\,\varphi + EI\,w\,\varphi'''' + c_D I\,\dot{w}\,\varphi'''' = 0.$$

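Further simplifications (dividing by ρwφ) yield

$$\frac{\ddot{w}}{w} + \frac{EI}{\rho}\,\frac{\varphi''''}{\varphi} + \frac{c_D I}{\rho}\,\frac{\dot{w}}{w}\,\frac{\varphi''''}{\varphi} = 0. \qquad (6.44)$$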

The above equation (6.44) is not readily separable. However, if we assume that

$$\frac{EI}{\rho}\,\frac{\varphi''''}{\varphi} = \beta$$

as in the undamped case above, where β is a positive constant, then we obtain the same mode shape functions as before,

$$\varphi_n(x) = \sin\left(\frac{n\pi x}{L}\right),$$

and the same separation constants $\beta_n = (EI/\rho)(n\pi/L)^4$, for n = 1, 2, .... Furthermore, equation (6.44) now becomes

$$\frac{\ddot{w}}{w} + \beta_n + \zeta_n\,\frac{\dot{w}}{w} = 0, \qquad (6.45)$$

where $\zeta_n = (c_D I/EI)\,\beta_n$. Multiplying through by w, equation (6.45) becomes ẅ + ζ_n ẇ + β_n w = 0, a simple linear, second-order ordinary differential equation with constant coefficients. In the underdamped case ζ_n² < 4β_n, its solution is given by

$$w_n(t) = e^{-\zeta_n t/2}\left(A_n\cos\tilde{\omega}_n t + B_n\sin\tilde{\omega}_n t\right),$$

where

$$\tilde{\omega}_n = \sqrt{\beta_n - \frac{\zeta_n^2}{4}} = \sqrt{\beta_n\left(1 - \frac{(c_D I)^2}{(EI)^2}\,\frac{\beta_n}{4}\right)}.$$

It is worth observing that the undamped frequencies are larger than the damped ones; that is, ω_n > ω̃_n for n = 1, 2, .... Consequently, since all structures and materials have damping, we should expect that the actual frequencies depend on this damping and are lower than the calculated so-called “natural” frequencies, which ignore damping in the model.
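The damped frequencies can be tabulated directly from ζ_n = (c_DI/EI)β_n and ω̃_n = (β_n − ζ_n²/4)^{1/2}. Because β_n grows like n⁴ while ζ_n²/4 grows like n⁸, the underdamped assumption eventually fails for high modes under Kelvin-Voigt damping; the Python sketch below flags such modes by returning None. The numerical values are placeholders, not material data, and `modal_frequencies` is an illustrative name.

```python
import math

def modal_frequencies(EI, cDI, rho, L, n_max):
    """Return (n, omega_n, omega_tilde_n) for n = 1..n_max.

    omega_tilde_n is None when zeta_n**2 >= 4*beta_n, i.e. the mode is
    critically damped or overdamped and has no oscillation frequency.
    """
    rows = []
    for n in range(1, n_max + 1):
        beta = EI / rho * (n * math.pi / L) ** 4     # beta_n from (6.35)
        zeta = cDI / EI * beta                       # zeta_n from (6.45)
        disc = beta - zeta ** 2 / 4
        damped = math.sqrt(disc) if disc > 0 else None
        rows.append((n, math.sqrt(beta), damped))
    return rows

for n, w, wd in modal_frequencies(EI=1.0, cDI=1e-3, rho=1.0, L=1.0, n_max=6):
    print(n, w, wd)   # wd < w for every underdamped mode
```

With these placeholder values the first overdamped mode is n = 15, since (nπ)⁴ must reach 4 × 10⁶ before ζ_n² ≥ 4β_n.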


In this section we have shown how the method of separation of variables can be used to obtain the solution to beam vibration problems. In general, the method can be extended to a larger class of problems, including problems described by more general differential equations, more general boundary conditions, or different geometrical regions. Because of its relative simplicity and wide applicability in many important physical applications, the method of separation of variables remains a method of great importance today. However, this method does have certain limitations. In the first place, the differential equation must be linear so that the principle of superposition can be applied to construct additional solutions by taking linear combinations of the fundamental solutions of the appropriate homogeneous problem. Secondly, in some problems to which the method of separation of variables can be applied in principle, the solvability of the ordinary differential equations is not trivial. For example, in general, EI and/or ρ are not constant but spatially dependent, which would render the method of separation of variables of very limited practical value due to the lack of information about the solutions of the ordinary differential equation (6.26). In the next two sections we will show how a large class of problems can be represented by truncated series similar in spirit to that in equation (6.39). Moreover, we will show that the coefficients in these representations can be determined in a very simple manner.

6.5 Numerical Approximations: Galerkin’s Method

In the last section we showed how to solve the simply supported damped beam vibration problem by the method of separation of variables. Specifically, we sought solutions of the form

y(t, x) = w(t)φ(x),

which led to the representation

$$y(t,x) = \sum_{n=1}^{\infty} w_n(t)\,\varphi_n(x). \qquad (6.46)$$

In that case, the eigenfunctions or mode shapes are given by

$$\varphi_n(x) = \sin\left(\frac{n\pi x}{L}\right)$$

and

$$w_n(t) = e^{-\zeta_n t/2}\left(A_n\cos\tilde{\omega}_n t + B_n\sin\tilde{\omega}_n t\right),$$

where the coefficients A_n and B_n are determined from the initial conditions (6.21)-(6.22).


It is clear that if we take only a finite number N of terms in the series (6.46), then we obtain only an approximation y^N of y:

$$y(t,x) \approx y^N(t,x) = \sum_{n=1}^{N} w_n(t)\,\varphi_n(x). \qquad (6.47)$$

The N basis functions φ1, φ2, ..., φN define an N-dimensional subspace, since each function y^N is determined by a linear combination of only the N functions φ1, ..., φN in equation (6.47). It should be emphasized that, in general, the basis functions φi(x) need not be trigonometric but may be less smooth functions. In fact, in finite element methods, the main idea is that the basis functions φi are defined piecewise over subregions of the domain called finite elements. In addition, over any subdomain, the φi can be chosen to be very simple functions such as polynomials of low degree.
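To see how the truncation (6.47) behaves, take the hypothetical initial shape y1(x) = x(L − x), whose sine coefficients from (6.42) are A_n = 8L²/(nπ)³ for odd n and 0 for even n (a standard Fourier-series computation). This Python sketch measures the maximum error of the N-term sum at t = 0 on a uniform grid; for this data the error shrinks roughly like 1/N², although the rate in general depends on the smoothness of the function being approximated. Function names are illustrative.

```python
import math

def coeff(n, L):
    """Sine coefficient A_n of y1(x) = x(L - x) on (0, L)."""
    return 8 * L**2 / (n * math.pi) ** 3 if n % 2 else 0.0

def max_error(N, L=1.0, grid=201):
    """Maximum of |y1(x) - y^N(0, x)| over a uniform grid on [0, L]."""
    err = 0.0
    for i in range(grid):
        x = L * i / (grid - 1)
        yN = sum(coeff(n, L) * math.sin(n * math.pi * x / L) for n in range(1, N + 1))
        err = max(err, abs(x * (L - x) - yN))
    return err

for N in (1, 3, 9, 27):
    print(N, max_error(N))
```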

We are now ready to discuss Galerkin’s method for constructing approximate solutions to a model problem (equation (6.5) with γ_d = 0):

$$\rho\,\ddot{y} + M'' = f \qquad (6.48)$$

with boundary conditions

y(t, 0) = y(t, L) = 0,

M(t, 0) = M(t, L) = 0, (6.49)

where M(t,x) = EIy′′ + c_DIẏ′′ and L is the length of the beam. The initial conditions are given by

$$y(0,x) = y_1(x), \qquad (6.50)$$

$$\dot{y}(0,x) = y_2(x). \qquad (6.51)$$

We begin by considering our model problem in a weak form given as follows: find the function y such that the differential equation, together with the boundary conditions, is satisfied in the sense of weighted averages. Precisely, by satisfaction of the differential equation in a weighted-average sense, we mean that

$$\int_0^L \left[\rho\,\ddot{y} + M''\right]\varphi\,dx = \int_0^L f\,\varphi\,dx \qquad (6.52)$$

for all members φ of a suitable class of functions. In equation (6.52) the function φ is called the weight function or test function, and it has to be sufficiently smooth so that the integrals make sense. In fact, the test functions in weak formulations such as (6.52) need not belong to the same class of functions as the solution y (as a function of x). The class of functions to which the solution y belongs as a function of x is called the class of trial functions. For example, y might be chosen to be in a class of functions whose fourth spatial derivatives, when multiplied by a test function φ, produce a function y′′′′φ that is integrable on the interval (0, L). However, in the weak form (6.52), the test function has no derivatives at all. Hence, although equation (6.52) is correctly formulated as a variational or weak form of the model problem (6.48), the spaces to which the solution and the test function belong are not the same. Consequently, the weak form (6.52) may not be suitable for easy theoretical or computational considerations.

Therefore, in order to overcome this lack of symmetry in the formulation, we now assume that the solution y and the test function φ are sufficiently smooth functions so that we can perform standard integration by parts twice on the moment term to obtain

$$\int_0^L \left[\rho\,\ddot{y}\,\varphi + M\varphi''\right]dx + M'\varphi\Big|_0^L - M\varphi'\Big|_0^L = \int_0^L f\,\varphi\,dx. \qquad (6.53)$$

To continue with the above formulation, it is important to identify two types of boundary conditions associated with any differential equation. These are called the natural and essential boundary conditions. After integration by parts, we examine all boundary terms of the variational formulation. From (6.53) we note that the boundary terms involve both the test function φ and the dependent variable y. Coefficients of the test function and its derivatives in the boundary expressions are called the secondary variables. Specification of secondary variables on the boundary yields the natural boundary conditions. Hence, from (6.53), the secondary variables are M′ and M, where M(t,x) = EIy′′ + c_DIẏ′′, and the natural boundary conditions involve specifying M′ and M at the boundary points. We also emphasize that secondary variables typically have physical meaning and are quantities of interest. In our model problem, the secondary variables M and M′ represent the bending moment and the shear force, respectively.

On the other hand, the dependent variable of the problem, when expressed in the same form as the test function appearing in the boundary terms, is called the primary variable. Specifying the primary variable at the boundary points constitutes the essential boundary conditions. For the case under consideration, the test function appears as φ and φ′. Therefore, the dependent variable y and its derivative y′ are the primary variables, and specifying these variables at the boundary points constitutes the essential boundary conditions.

With these definitions behind us, we now return to the weak form (6.53). If we assume that the test functions vanish at the boundary endpoints and apply the boundary conditions (6.49), we obtain

$$\int_0^L \left[\rho\,\ddot{y}\,\varphi + M\varphi''\right]dx = \int_0^L f\,\varphi\,dx, \qquad (6.54)$$

for all admissible test functions φ. Therefore, the statement (6.53) can be replaced by the following alternative weak or variational formulation: find $y(t,\cdot) \in V = H^2 \cap H^1_0$ such that

$$\int_0^L \left[\rho\,\ddot{y}\,\varphi + EI\,y''\varphi'' + c_D I\,\dot{y}''\varphi''\right]dx = \int_0^L f\,\varphi\,dx, \qquad (6.55)$$

for all φ ∈ V. Here, the class of test and trial functions is defined by $V = H^2 \cap H^1_0$, where $H^2(0,L) = \{\varphi \in L^2(0,L) : \varphi' \in L^2(0,L),\ \varphi'' \in L^2(0,L)\}$ and $H^1_0(0,L) = \{\varphi \in L^2(0,L) : \varphi' \in L^2(0,L),\ \varphi(0) = \varphi(L) = 0\}$. We note that there is a certain symmetry in the formulation (6.55): the same order of derivatives appears on both the test and the trial functions. In addition, as we pass from the weak formulation (6.52) to (6.55), we have progressively weakened the smoothness assumptions (in x) on our solution y and, consequently, enlarged the class of functions for which the weak formulation makes sense.

Galerkin’s method consists of seeking an approximate solution to the weak form (6.55) in a finite-dimensional subspace V^N of the space V. That is, we seek an approximate solution y^N in $V^N = \operatorname{span}\{B_0^N, B_1^N, \ldots, B_N^N\}$ of the form

$$y^N(t,x) = \sum_{j=0}^{N} w_j^N(t)\,B_j^N(x) \qquad (6.56)$$

such that

$$\int_0^L \left[\rho\,\ddot{y}^N\varphi^N + EI\,(y^N)''(\varphi^N)'' + c_D I\,(\dot{y}^N)''(\varphi^N)''\right]dx = \int_0^L f\,\varphi^N\,dx, \qquad (6.57)$$

for all φ^N ∈ V^N. Note that the $B_j^N$ are assumed to be known basis functions, and hence the approximate solution y^N is completely determined once the coefficient functions $w_j^N$ are found. Furthermore, in Galerkin’s method the test functions φ^N are taken from the same basis functions $B_j^N$. Hence, to determine the specific functions $w_j^N(t)$ that characterize the approximate solution y^N, we substitute $\varphi^N = B_k^N$, k = 0, 1, ..., N, and the approximation (6.56) into equation (6.57) to obtain

$$\int_0^L \Big[\rho\sum_j \ddot{w}_j^N(t)B_j^N(x)B_k^N(x) + EI\sum_j w_j^N(t)(B_j^N)''(x)(B_k^N)''(x) + c_D I\sum_j \dot{w}_j^N(t)(B_j^N)''(x)(B_k^N)''(x)\Big]dx = \int_0^L f(t,x)B_k^N(x)\,dx, \qquad (6.58)$$


for k = 0, 1, ..., N. Interchanging the summation and the integration, we find

$$\sum_{j=0}^N \ddot{w}_j^N(t)\int_0^L \rho\,B_j^N B_k^N\,dx + \sum_{j=0}^N \dot{w}_j^N(t)\int_0^L c_D I\,(B_j^N)''(B_k^N)''\,dx + \sum_{j=0}^N w_j^N(t)\int_0^L EI\,(B_j^N)''(B_k^N)''\,dx = \int_0^L f(t,x)B_k^N(x)\,dx, \qquad (6.59)$$

for k = 0, 1, ..., N. The structure of equation (6.59) is most easily seen by rewriting it in the following vector form:

$$\mathcal{M}\,\frac{d^2}{dt^2}\vec{w}^N(t) + \mathcal{C}\,\frac{d}{dt}\vec{w}^N(t) + \mathcal{K}\,\vec{w}^N(t) = \vec{F}^N(t), \qquad (6.60)$$

where the vector functions are defined by

$$\vec{w}^N(t) = \left(w_0^N(t), w_1^N(t), \ldots, w_N^N(t)\right)^T, \qquad \vec{F}^N(t) = \begin{pmatrix} \int_0^L f(t,x)B_0^N(x)\,dx \\ \int_0^L f(t,x)B_1^N(x)\,dx \\ \vdots \\ \int_0^L f(t,x)B_N^N(x)\,dx \end{pmatrix},$$

and the elements of the matrices M, C, and K are given by

$$(\mathcal{M})_{ij} = \int_0^L \rho\,B_i^N(x)B_j^N(x)\,dx, \qquad (\mathcal{C})_{ij} = \int_0^L c_D I\,(B_i^N)''(x)(B_j^N)''(x)\,dx, \qquad (\mathcal{K})_{ij} = \int_0^L EI\,(B_i^N)''(x)(B_j^N)''(x)\,dx,$$

for i, j = 0, 1, ..., N. These matrices M, C, and K are referred to as the mass matrix, the damping matrix, and the stiffness matrix, respectively. To solve the second-order system of ordinary differential equations (6.60), we need initial conditions. Substituting the approximation (6.56) into the initial condition (6.50), we have

$$y^N(0,x) = \sum_{j=0}^N w_j^N(0)\,B_j^N(x) \approx y_1(x). \qquad (6.61)$$

In general, we should not expect y1 to lie in V^N, and hence this initial condition can only be satisfied approximately. To do this, we proceed as follows. Multiplying both sides of the approximate equality (6.61) by $B_k^N(x)$ and integrating both sides from 0 to L, we obtain

$$\int_0^L y_1(x)B_k^N(x)\,dx = \sum_{j=0}^N w_j^N(0)\int_0^L B_j^N(x)B_k^N(x)\,dx, \qquad (6.62)$$

for k = 0, 1, ..., N. Equation (6.62) is a linear system of equations for the unknowns $(w_0^N(0), w_1^N(0), \ldots, w_N^N(0))$. Similarly, using the other initial condition ẏ(0,x) = y2(x), we obtain the initial condition for the time derivative $\frac{d}{dt}\vec{w}^N(t)$ at t = 0.

It is important to note that the quality of the approximation is completely determined by the choice of the basis functions $B_j^N$. Once these functions have been chosen, the determination of the coefficients $w_j^N$ reduces to a computational matter, namely solving the ordinary differential equation (6.60). For example, if the basis functions $B_j^N$ are chosen to be the trigonometric functions sin(jπx/L), as in the Fourier series representation, then (for constant ρ, EI, and c_DI) the mass, damping, and stiffness matrices become diagonal matrices, and the approximation (6.56) is known as the modal approximation. If one chooses the basis functions $B_j^N$ to be spline functions, the matrices M, C, and K become banded matrices. For example, let us partition the domain of our model problem into N finite elements of equal length h = h_N = L/N. One set of adequate elements such that $B_j^N \in H^2 \cap H^1_0$ is the standard cubic spline given by

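$$B_j^N(x) = \frac{1}{h^3}\begin{cases} (x - x_{j-2})^3, & x \in [x_{j-2}, x_{j-1}],\\ h^3 + 3h^2(x - x_{j-1}) + 3h(x - x_{j-1})^2 - 3(x - x_{j-1})^3, & x \in [x_{j-1}, x_j],\\ h^3 + 3h^2(x_{j+1} - x) + 3h(x_{j+1} - x)^2 - 3(x_{j+1} - x)^3, & x \in [x_j, x_{j+1}],\\ (x_{j+2} - x)^3, & x \in [x_{j+1}, x_{j+2}],\\ 0, & \text{otherwise}, \end{cases} \qquad (6.63)$$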

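where $x_j = x_j^N = jL/N$ and j = −1, 0, 1, ..., N, N + 1. However, to satisfy the essential boundary conditions $B_j^N(0) = B_j^N(L) = 0$, we further modify these N + 3 cubic splines. Since $B_j^N(x_j) = 4$ and $B_j^N(x_{j\pm 1}) = 1$, only $B_{-1}^N$, $B_0^N$, $B_1^N$ are nonzero at x = 0 (and only $B_{N-1}^N$, $B_N^N$, $B_{N+1}^N$ at x = L), so we redefine

$$B_j^N(x) := \begin{cases} B_0^N(x) - 4B_{-1}^N(x), & j = 0,\\ B_1^N(x) - B_{-1}^N(x), & j = 1,\\ B_j^N(x), & j = 2, \ldots, N-2,\\ B_{N-1}^N(x) - B_{N+1}^N(x), & j = N-1,\\ B_N^N(x) - 4B_{N+1}^N(x), & j = N. \end{cases}$$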

The resulting mass, damping, and stiffness matrices in the finite-dimensional system (6.60) are banded matrices of bandwidth 7, with upper and lower bandwidths equal to 3. The space $V^N = \operatorname{span}\{B_j^N\}$ is then of dimension N + 1.
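The structural claims above (diagonal matrices for a sine basis, banded matrices for cubic splines) can be checked numerically. This Python sketch evaluates the cubic spline (6.63), whose nodal values are B_j(x_j) = 4 and B_j(x_{j±1}) = 1, and assembles mass-type Gram matrices ∫_0^L b_i b_j dx (taking ρ = 1) for both bases. It deliberately uses the unmodified splines j = −1, ..., N + 1 restricted to [0, L], so it illustrates the band structure rather than the boundary-adapted basis; the quadrature (Simpson's rule on each mesh cell) and all names are illustrative choices.

```python
import math

def cubic_spline(j, x, L, N):
    """Cubic B-spline B_j^N of (6.63) on the uniform mesh x_j = j L / N."""
    h = L / N
    def node(i):
        return i * h
    if node(j - 2) <= x <= node(j - 1):
        return (x - node(j - 2)) ** 3 / h**3
    if node(j - 1) <= x <= node(j):
        u = x - node(j - 1)
        return (h**3 + 3 * h**2 * u + 3 * h * u**2 - 3 * u**3) / h**3
    if node(j) <= x <= node(j + 1):
        u = node(j + 1) - x
        return (h**3 + 3 * h**2 * u + 3 * h * u**2 - 3 * u**3) / h**3
    if node(j + 1) <= x <= node(j + 2):
        return (node(j + 2) - x) ** 3 / h**3
    return 0.0

def gram(basis, L, N, panels=40):
    """Matrix of int_0^L b_i b_j dx by Simpson's rule with `panels` (even) per mesh cell."""
    h = L / N
    step = h / panels
    pts, wts = [], []
    for cell in range(N):
        for k in range(panels + 1):
            pts.append(cell * h + k * step)
            wts.append(step / 3 * (1 if k in (0, panels) else 4 if k % 2 else 2))
    vals = [[b(x) for x in pts] for b in basis]
    return [[sum(w * p * q for w, p, q in zip(wts, vi, vj)) for vj in vals] for vi in vals]

L, N = 1.0, 6
splines = [lambda x, j=j: cubic_spline(j, x, L, N) for j in range(-1, N + 2)]  # j = -1..N+1
sines = [lambda x, j=j: math.sin(j * math.pi * x / L) for j in range(1, N + 2)]
M_spline = gram(splines, L, N)   # zero whenever |i - j| > 3: supports do not overlap
M_sine = gram(sines, L, N)       # diagonal, with L/2 on the diagonal
```

The zero pattern follows from the supports alone: each spline lives on four mesh cells, so two splines whose indices differ by more than 3 never overlap.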


6.6 Energy Functional Formulation

In the previous section, we referred to our weak form (6.54) as a variational form. This terminology arises from the fact that whenever the operators involved possess a certain symmetry, a weak form of the problem can be obtained from a standard problem in the calculus of variations. In such cases, the variational boundary-value problem characterizes the function y that minimizes (or, more generally, renders stationary) the energy functional of the problem.

To illustrate, we now recall the strong form of the undamped, simply supported beam vibration problem

$$\rho\,\ddot{y} + EI\,y'''' = 0 \qquad (6.64)$$

with boundary conditions

y(t, 0) = y(t, L) = 0,

EIy′′(t, 0) = EIy′′(t, L) = 0, (6.65)

and initial conditions

$$y(0,x) = y_1(x), \qquad (6.66)$$

$$\dot{y}(0,x) = y_2(x), \qquad (6.67)$$

where, for simplicity, we assume that EI is constant. The associated kinetic and potential energies are given by

$$E_k(t) = \frac{1}{2}\int_0^L \rho\,\dot{y}(t,x)^2\,dx, \qquad (6.68)$$

$$E_p(t) = \frac{1}{2}\int_0^L EI\,y''(t,x)^2\,dx, \qquad (6.69)$$

respectively. We also define the action to be the real-valued function given by

$$A[y](t_0,t_1) = \int_{t_0}^{t_1} (E_k - E_p)\,dt. \qquad (6.70)$$

Note that A is a “function of functions” and the values of A are real. Any function with these properties is termed a functional. Using Hamilton’s Principle of Least (Stationary) Action, which states that “On any interval [t0, t1], motion (the solution of the dynamical problem) provides a stationary value for A,” we can consider the classical minimization problem in the calculus of variations. Toward this end, we consider perturbations (also called variations) in the motion of the beam y such that

yε = y + εΦ (6.71)


is also an admissible motion. Here, the variations Φ(t,x) = η(t)φ(x) are such that y_ε satisfies the essential boundary conditions. Hence, φ(0) = φ(L) = 0. Moreover, for arbitrary t0 and t1, we must have

$$y_\varepsilon(t_0,x) = y(t_0,x), \qquad y_\varepsilon(t_1,x) = y(t_1,x),$$

which imply that η(t0) = η(t1) = 0. Finally, the functions η and φ must be sufficiently smooth so that the integral in the functional A makes sense. Consequently, we choose η ∈ C¹(t0, t1) and assume that φ belongs to our previously defined class $H^2 \cap H^1_0$.

Using Hamilton’s Principle of Least Action, we find that the motion y provides a stationary point of A[y + εΦ] at ε = 0. Hence, from the first-order necessary condition of optimality,

$$\frac{d}{d\varepsilon}A[y + \varepsilon\Phi]\Big|_{\varepsilon=0} = 0,$$

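where

$$A[y + \varepsilon\Phi] = \int_{t_0}^{t_1}\left[\frac{1}{2}\int_0^L \rho\,(\dot{y} + \varepsilon\dot{\Phi})^2\,dx - \frac{1}{2}\int_0^L EI\,(y'' + \varepsilon\Phi'')^2\,dx\right]dt.$$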

We thus obtain

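$$\frac{d}{d\varepsilon}A[y + \varepsilon\Phi]\Big|_{\varepsilon=0} = \int_{t_0}^{t_1}\int_0^L \left[\rho\,\dot{y}\,\dot{\Phi} - EI\,y''\Phi''\right]dx\,dt = 0, \qquad (6.72)$$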

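for all admissible variations Φ = ηφ. Introducing this substitution into equation (6.72) and integrating the first term by parts in time, we find

$$\begin{aligned} \int_{t_0}^{t_1}\int_0^L \left[\rho\,\dot{y}\,\dot{\Phi} - EI\,y''\Phi''\right]dx\,dt &= -\int_{t_0}^{t_1}\int_0^L \rho\,\ddot{y}\,\eta\,\varphi\,dx\,dt + \int_0^L \left[\rho\,\dot{y}\,\eta\,\varphi\right]dx\,\Big|_{t_0}^{t_1} - \int_{t_0}^{t_1}\int_0^L EI\,y''\,\eta\,\varphi''\,dx\,dt \\ &= -\int_{t_0}^{t_1}\int_0^L \left[\rho\,\ddot{y}\,\varphi + EI\,y''\varphi''\right]\eta\,dx\,dt = 0, \end{aligned}$$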

where we used η(t0) = η(t1) = 0. Since η is an arbitrary C¹ function, we must have, for all $\varphi \in H^2 \cap H^1_0$,

$$\int_0^L \left[\rho\,\ddot{y}\,\varphi + EI\,y''\varphi''\right]dx = 0, \qquad (6.73)$$

which is our previously considered weak formulation (6.55) with c_DI = 0 and f = 0. Hence, we see that the first-order necessary condition of variational theory is the same as the weak form of the beam equation. This observation provides the rationale for our use of the term “variational” formulation when we refer to the weak formulation (6.73) of the beam problem (6.64).
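A related consequence of (6.64)-(6.65), not derived in the text but standard for the undamped, unforced problem with these boundary conditions, is that the total energy E_k + E_p from (6.68)-(6.69) stays constant in time. This Python sketch checks it for a two-mode solution of the form (6.39), with arbitrary parameters and amplitudes; for such a solution the constant equals (ρL/4) Σ ω_n²(A_n² + B_n²).

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

EI, rho, L = 3.0, 1.2, 1.0
modes = {1: (0.01, 0.0), 3: (0.002, 0.001)}   # n -> (A_n, B_n), arbitrary amplitudes

def omega(n):
    return math.sqrt(EI / rho) * (n * math.pi / L) ** 2

def y_t(t, x):
    """Velocity dy/dt of the modal solution."""
    return sum(omega(n) * (-A * math.sin(omega(n) * t) + B * math.cos(omega(n) * t))
               * math.sin(n * math.pi * x / L) for n, (A, B) in modes.items())

def y_xx(t, x):
    """Curvature d^2y/dx^2 of the modal solution."""
    return sum(-(n * math.pi / L) ** 2 * (A * math.cos(omega(n) * t) + B * math.sin(omega(n) * t))
               * math.sin(n * math.pi * x / L) for n, (A, B) in modes.items())

def energies(t):
    Ek = 0.5 * simpson(lambda x: rho * y_t(t, x) ** 2, 0.0, L)    # (6.68)
    Ep = 0.5 * simpson(lambda x: EI * y_xx(t, x) ** 2, 0.0, L)    # (6.69)
    return Ek, Ep

for t in (0.0, 0.05, 0.11):
    Ek, Ep = energies(t)
    print(t, Ek, Ep, Ek + Ep)   # Ek and Ep trade off; their sum stays fixed
```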


6.7 The Finite Element Method

In this section, we formally give an introduction to the finite element method in one space dimension, which was discussed previously in the context of the beam problem. In essence, the finite element method is a general and systematic technique for constructing approximate solutions to boundary-value problems. The method involves the application of variational concepts to construct an approximation of the solution over the collection of finite elements. The method has been shown to be successful in solving a wide range of problems in engineering and science. Here we give an introduction in the context of second-order (in space) systems.

From a mathematical point of view, a convenient way to introduce the finite element method is through the method of weighted residuals (see, e.g., [11, 15]). To this end, we consider the following initial-boundary value model problem in one spatial dimension:

$$u_t = \mathcal{L}u(t,x), \qquad t > 0, \quad x \in [0, L], \qquad (6.74)$$

with initial condition u(0,x) = u0(x) and boundary conditions u(t,0) = f1(t), u(t,L) = f2(t).

In the model equation (6.74), 𝓛 denotes a second-order spatial differential operator. For example, in the case 𝓛u = ∂²u/∂x², the model equation (6.74) becomes the well-known one-dimensional heat equation (see Chapter 5).

To approximate the solution to the model equation (6.74), we begin by subdividing the one-dimensional spatial domain [0, L] into N equal subintervals, also called finite elements. Within each element, certain points are identified, called nodes or nodal points, which play an important role in finite element construction. In the case of equal-length finite elements, the nodes are defined by

$$x_i = \frac{(i-1)L}{N}, \qquad i = 1, 2, \ldots, N+1.$$

The collection of elements and nodes, which make up the domain of the approximate problem, is often referred to as a finite element mesh.

After constructing the finite element mesh for our model problem, we next construct a corresponding set of basis functions B_i. The basis functions are defined so that the approximate solution u_A(t,x) to our model equation is represented by

$$u_A(t,x) = \sum_{i=1}^{N+1} c_i(t)\,B_i(x).$$

In addition, these basis functions are usually required to satisfy the following constraints:

1. The basis functions are simple, piecewise functions defined over the finite element mesh;

2. The basis functions are smooth enough that the approximate solution makes sense in some appropriate space;

3. The basis functions are chosen so that the coefficient functions c_i(t) are precisely the values of u_A at the nodal points.

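One very simple set of basis functions, called hat functions, is depicted in Figure 6.26. These functions are given by

$$B_i(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}}, & x \in [x_{i-1}, x_i],\\[2mm] \dfrac{x_{i+1} - x}{x_{i+1} - x_i}, & x \in [x_i, x_{i+1}],\\[2mm] 0, & \text{otherwise},\end{cases}$$

where, for the end functions $B_1$ and $B_{N+1}$, only the piece lying in [0, L] is retained.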

FIGURE 6.26: Hat basis functions $B_1(x), \ldots, B_i(x), \ldots, B_{N+1}(x)$ on the mesh $0 = x_1 < x_2 < \cdots < x_{N+1} = L$, each with peak value 1.
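Property 3 above can be checked directly. The sketch below builds the equally spaced nodes, evaluates the hat functions with 0-based indices (so `hat(0, x, xs)` plays the role of $B_1$), and confirms that the expansion $\sum_i c_iB_i$ returns $c_i$ at node $x_i$ and varies linearly in between. The helper names, the mesh size, and the coefficient values are illustrative assumptions, not taken from the text.

```python
def nodes(L, N):
    """Equally spaced nodes x_i = (i - 1)L/N, i = 1..N+1 (stored 0-based)."""
    return [k * L / N for k in range(N + 1)]


def hat(i, x, xs):
    """Hat function B_i (0-based index i) on mesh xs; zero outside [x_{i-1}, x_{i+1}]."""
    if i > 0 and xs[i - 1] <= x <= xs[i]:
        return (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    if i < len(xs) - 1 and xs[i] <= x <= xs[i + 1]:
        return (xs[i + 1] - x) / (xs[i + 1] - xs[i])
    return 0.0


def u_approx(c, x, xs):
    """u_A(t, x) = sum_i c_i(t) B_i(x) at a fixed time t, given the coefficients c."""
    return sum(ci * hat(i, x, xs) for i, ci in enumerate(c))


xs = nodes(1.0, 4)                      # [0.0, 0.25, 0.5, 0.75, 1.0]
c = [0.0, 1.0, 2.0, 1.0, 0.0]           # illustrative coefficient values
print([u_approx(c, x, xs) for x in xs])  # [0.0, 1.0, 2.0, 1.0, 0.0]: property 3
print(u_approx(c, 0.375, xs))            # 1.5: linear interpolation between nodes
```

Because each $B_i$ is 1 at its own node and 0 at every other node, the coefficients are nodal values by construction; this is what makes hat functions convenient.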

More generally, one selects the basis functions $B_i(x)$ so that the approximate solution $u_A$ is of the form

$$u_A(t,x) = u_B(t,x) + \sum_{i=1}^{N+1} c_i(t)B_i(x),$$

where

$$u_B(t,0) = f_1(t) \quad\text{and}\quad u_B(t,L) = f_2(t),$$

and $B_i(x) = 0$ for $x = 0, L$. Therefore, $u_A(t,x)$ satisfies the boundary conditions but, in general, neither the initial condition nor the differential equation. This is called the interior method. On the other hand, if $u_A(t,x)$ is chosen to satisfy the differential equation but not the boundary conditions, this is known as the boundary method. For the mixed method, $u_A$ satisfies neither the differential equation nor the boundary conditions.

We next form the residuals

$$R_E(u_A) = \mathcal{L}u_A - (u_A)_t, \qquad R_I(u_A) = u_0(x) - u_A(0,x).$$

If $u_A$ is the exact solution, both residuals are zero. In the weighted residual method (WRM) [11, 15], the coefficient functions $c_i(t)$ are chosen in such a way that the residuals are zero in some "average sense." That is, we choose a weight function $\phi(x)$ so that

$$\int_0^L \phi(x)R_E(u_A(t,x))\,dx = 0.$$

However, this gives us only one equation for the N + 1 unknowns $c_i(t)$. To obtain N + 1 equations, we choose N + 1 different weighting functions $\phi_j(x)$, $j = 1, 2, \ldots, N+1$, so that

$$\int_0^L \phi_j(x)R_E(u_A(t,x))\,dx = 0, \qquad j = 1, 2, \ldots, N+1,$$

and

$$\int_0^L \phi_j(x)R_I(u_A(0,x))\,dx = 0.$$

Weighted residual methods differ from one another through the choice of $\phi_j(x)$.
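To make the weighted residual conditions concrete, the sketch below treats the heat-equation case $\mathcal{L}u = u_{xx}$ of (6.74) under assumptions of our own: L = 1, homogeneous boundary data $f_1 = f_2 = 0$, and $u_0(x) = \sin(\pi x)$, whose exact solution $e^{-\pi^2 t}\sin(\pi x)$ provides a check. The interior hat functions serve both as basis and as weights, which is the Galerkin choice $\phi_j = B_j$ listed below. Since hat functions are only piecewise linear, $\int_0^L \phi_j(u_A)_{xx}\,dx$ is integrated by parts, and the conditions $\int_0^L \phi_jR_E\,dx = 0$ become $M\dot{\vec c} = -K\vec c$ with mass matrix $M_{jk} = \int B_jB_k\,dx$ and stiffness matrix $K_{jk} = \int B_j'B_k'\,dx$. Backward Euler time stepping and nodal interpolation of $u_0$ (instead of the projection condition on $R_I$) are simplifications we chose; the function names are hypothetical.

```python
import math


def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system by elimination; sub[0] and sup[-1] are ignored."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def galerkin_heat(u0, L=1.0, N=20, dt=1e-3, steps=100):
    """Galerkin (phi_j = B_j) + backward Euler for u_t = u_xx, u(t,0) = u(t,L) = 0."""
    h = L / N
    xs = [k * h for k in range(N + 1)]
    n = N - 1                                   # unknowns c_j at the interior nodes
    c = [u0(xs[k]) for k in range(1, N)]        # nodal interpolation of u0 (simplification)
    m_off, m_diag = h / 6.0, 4.0 * h / 6.0      # M = (h/6) tridiag(1, 4, 1)
    k_off, k_diag = -1.0 / h, 2.0 / h           # K = (1/h) tridiag(-1, 2, -1)
    a_off, a_diag = m_off + dt * k_off, m_diag + dt * k_diag   # entries of M + dt*K
    for _ in range(steps):
        rhs = [m_diag * c[j]                    # right-hand side M c^n
               + (m_off * c[j - 1] if j > 0 else 0.0)
               + (m_off * c[j + 1] if j < n - 1 else 0.0)
               for j in range(n)]
        # (M + dt K) c^{n+1} = M c^n
        c = thomas([a_off] * n, [a_diag] * n, [a_off] * n, rhs)
    return xs, [0.0] + c + [0.0]


xs, u = galerkin_heat(lambda x: math.sin(math.pi * x))
mid = len(xs) // 2
exact = math.exp(-math.pi ** 2 * 0.1) * math.sin(math.pi * xs[mid])
print(u[mid], exact)   # Galerkin value vs. exact value at x = 0.5, t = 0.1
```

With 20 elements and $\Delta t = 10^{-3}$ the two printed values agree to roughly three decimal places; refining the mesh and the time step reduces the gap.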

• Galerkin Method

Perhaps the best known of these approximate methods is the Galerkin method discussed above (sometimes referred to as the Bubnov-Galerkin procedure). Here the weighting functions $\phi_j(x)$ are chosen to be the same as the basis functions $B_j(x)$,

$$\phi_j(x) = B_j(x).$$


• Least Squares Method

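Let

$$I(\vec c) = \int_0^L R_E^2\,dx, \qquad \text{where} \quad \vec c = \begin{pmatrix} c_1\\ c_2\\ \vdots\\ c_{N+1}\end{pmatrix}.$$

The idea behind the least squares method is to find a stationary point of $I(\vec c)$; that is, we seek $c_i$ so that $I(\vec c)$ is minimized. A necessary optimality condition is given by

$$\frac{\partial I}{\partial c_j} = 2\int_0^L R_E\frac{\partial R_E}{\partial c_j}\,dx = 0, \qquad j = 1, 2, \ldots, N+1,$$

where $\partial R_E/\partial c_j = \phi_j(x)$. These conditions provide N + 1 equations to solve for the N + 1 unknown functions $c_j$.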

• Collocation

In this method, we select N + 1 nodal points $x_j$, $j = 1, 2, \ldots, N+1$, and the corresponding weighting functions

$$\phi_j(x) = \delta(x - x_j),$$

where δ is the Dirac delta generalized function. Then

$$\int_0^L \phi_j(x)R_E(u_A(t,x))\,dx = \int_0^L \delta(x - x_j)R_E(u_A(t,x))\,dx = R_E(u_A(t,x_j)) = 0.$$

This equation specifies that the residual is zero at the N + 1 specified nodal points $x_j$. One possible choice would be to choose $B_j(x)$ to be the Chebyshev polynomials and to use the roots of a Chebyshev polynomial as the collocation points. The Chebyshev polynomials are defined recursively by

$$P_0 = 1, \quad P_1 = x, \quad P_2 = 2x^2 - 1, \qquad P_{r+1}(x) = 2xP_r(x) - P_{r-1}(x),$$

for $-1 \le x \le 1$. This procedure is sometimes referred to as orthogonal collocation in the literature.
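The recurrence and the collocation points are easy to generate. The sketch below evaluates $P_r$ by the three-term recurrence and uses the closed-form roots $\cos((2k-1)\pi/(2n))$, $k = 1, \ldots, n$, of $P_n$. Mapping those roots affinely from $[-1, 1]$ onto $[0, L]$ so that they can serve as the N + 1 collocation points $x_j$ for (6.74) is our assumption about how one would apply them; the helper names are hypothetical.

```python
import math


def chebyshev(r, x):
    """P_r(x) from P_0 = 1, P_1 = x, P_{r+1}(x) = 2x P_r(x) - P_{r-1}(x)."""
    if r == 0:
        return 1.0
    p_prev, p = 1.0, x
    for _ in range(r - 1):
        p_prev, p = p, 2.0 * x * p - p_prev
    return p


def chebyshev_roots(n):
    """The n roots of P_n in (-1, 1): cos((2k - 1) pi / (2n)), k = 1..n."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]


def to_domain(z, L):
    """Affine map from [-1, 1] onto the physical interval [0, L] (our assumption)."""
    return 0.5 * L * (z + 1.0)


N = 4                                    # N + 1 = 5 collocation points
points = [to_domain(z, 1.0) for z in chebyshev_roots(N + 1)]
print([round(p, 4) for p in points])
# residual of P_{N+1} at its own roots: rounding error only
print(max(abs(chebyshev(N + 1, z)) for z in chebyshev_roots(N + 1)))
```

The roots cluster toward the ends of the interval, which is one reason Chebyshev points are preferred over equally spaced ones for polynomial collocation.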


6.8 Experimental Beam Vibration Analysis

In this section we describe a physical experiment that is routinely carried out by students in our laboratory (see http://www.ncsu.edu/crsc/ilfum.html). It can be used to perform modal analysis as well as model validation for a cantilever beam. The general arrangement of the hardware needed to set up this experiment is depicted in Figure 6.27. In particular, our experimental setup involves a cantilever beam in a "smart material" paradigm. One end of the beam is clamped while the other end is free, and the beam is mounted with two self-sensing, self-actuating piezoceramic patches. Piezoceramic patches are made of lead zirconate titanate (PZT), a piezoelectric material. This type of material belongs to a class of dielectrics that exhibit significant material deformations in response to an applied electric field and produce dielectric polarization in response to mechanical strains. Therefore, piezoelectric materials have actuating as well as sensing capabilities. The beam in our laboratory can be excited by two sources: (a) an impulse excitation (by a hammer hit) (see Figure 6.27) and (b) a transient periodic excitation (through the piezoceramic actuators). The transverse acceleration of the beam is measured by accelerometers attached to the beam structure. In addition, the transverse displacement of the beam is measured by a proximity transducer system, a gap-to-voltage transducer system that provides accurate, noncontacting static as well as dynamic displacement measurements. Data are recorded and analyzed by a four-channel Hewlett-Packard (HP) dynamic signal analyzer, which allows for both time domain and frequency domain analysis.
Finally, in addition to providing data for model validation and modal analysis, this experimental setup has also been used by our students in mechanical vibration control studies (see Chapter 7). More specifically, for the control of transient vibrations we also added a rapid control prototyping (RCP) system. This system consists of a 450 MHz Pentium II PC, a digital signal processor (DSP) controller board, a multi-channel filter instrument, and a multi-channel amplifier. The DSP board, a DS1103 made by dSPACE, is equipped with a PowerPC 604e processor running at 333 MHz, 36 ADC channels, and 8 DAC channels. The DS1103 is supported by dSPACE's Total Development Environment (TDE). Within the TDE, programming is done easily via The MathWorks' Simulink, whose graphical user interface allows the user to design new controllers with graphical blocks instead of hard-coding them. The amplifier is needed to amplify the low output voltage from the DSP to drive the piezoceramic patches, and the low-pass filter is used to minimize the effects of aliasing. The interested reader is referred to [12] for further details regarding the experimental setup for beam vibration control.

To carry out the experiments outlined above, the hardware and software listed in Tables 6.2 and 6.3, respectively, are recommended.


TABLE 6.2: Hardware equipment for beam vibration experiment.

Descriptive name: Probable brand (model)
• Proximity Transducer System: Bently Nevada (7200)
• PiezoBeam Accelerometer: Kistler Instrument Corp. (8630C5)
• HP Dynamic Signal Analyzer: Hewlett-Packard (35670A)
• Impact Hammer Kit: W.A. Brown Instr. (GK291C03)
• 4-Channel Transducer Coupler: Kistler Instrument Corp. (5134)
• Piezoceramic Patches: EDO Corporation
• Wideband Amplifier: Krohn-Hite Corp. (7600)
• Low-Pass Filter: Frequency Devices (9002)
• DSP Board: dSPACE (1103)

TABLE 6.3: Software tools for beam vibration experiment.

Descriptive name: Brand
• Real-Time Workshop (for control): The MathWorks, Inc.
• Simulink (for control): The MathWorks, Inc.
• MATLAB and Optimization Toolbox (for data and model analysis): The MathWorks, Inc.


FIGURE 6.27: Hardware used for modal analysis and model validation of the cantilever beam model (labeled components: HP signal analyzer, proximity sensor, piezoceramic actuator (front) and sensor (back), impact hammer).

Project: Beam Vibration Analysis

The aim of this project is to carry out modal analysis of the mathematical model for the transverse displacement of a cantilever beam. The second part of the project involves a parameter estimation problem using data collected from the experiment described in §6.8. Readers without access to such experimental data can instead generate simulation data from a numerical (e.g., finite element) solution with added noise and use them in place of experimental observations.

A. Mode Shapes and Natural Frequencies

Consider a cantilever beam of length L = 1 with free end at x = L. Assume that ρ and EI are constants with values 1 and 5, respectively. The free-vibration equation of motion for this system is

$$\rho\frac{\partial^2 y(t,x)}{\partial t^2} + EI\frac{\partial^4 y(t,x)}{\partial x^4} = 0$$

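with initial conditions

$$y(0,x) = y_1(x), \qquad \frac{\partial y(0,x)}{\partial t} = y_2(x),$$

where $y_1$ and $y_2$ are given functions of x and $x \in [0, 1]$. The boundary conditions for this case are given by

$$y(t,0) = 0, \qquad \frac{\partial y(t,0)}{\partial x} = 0,$$

$$M(t,1) = EI\frac{\partial^2 y(t,1)}{\partial x^2} = 0, \qquad \frac{\partial M(t,1)}{\partial x} = EI\frac{\partial^3 y(t,1)}{\partial x^3} = 0.$$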

1.) Using separation of variables, that is, letting

$$y(t,x) = w(t)\phi(x),$$

show that

$$\phi(x) = A_1\cos(\xi x) + A_2\sin(\xi x) + A_3\cosh(\xi x) + A_4\sinh(\xi x),$$

where the $A_i$ are constants, $\xi = (\beta\rho/EI)^{1/4}$, and β is a constant to be determined.

2.) Applying the boundary conditions, show that

$$A_3 = -A_1, \qquad A_4 = -A_2,$$

and that ξ satisfies $\cos(\xi) = -1/\cosh(\xi)$.

3.) Plot the functions $\cos(\xi)$ and $-1/\cosh(\xi)$ on the same graph for $\xi \in [0, 50]$. Note that the intersections of the two plots give the admissible values of ξ (and hence of β), and that there are infinitely many such values.

4.) Estimate the first five values of ξ from the graph in part 3.) and use these values as initial guesses for the MATLAB routine fzero to compute the first five zeros of the function $\cos(\xi) + 1/\cosh(\xi)$. Show that the fourth and higher zeros can be approximated accurately (within four decimal places) by

$$\xi_i = \frac{\pi}{2}(2i - 1), \qquad i = 4, 5, 6, \ldots,$$

which are zeros of cos x.

5.) Using the values of ξ computed in part 4.), compute the first five natural frequencies $\omega_i$, $i = 1, \ldots, 5$, of the beam and plot the first five mode shape functions $\phi_i(x)$, $i = 1, \ldots, 5$.

6.) Now consider the same cantilever beam as above but with viscous damping of the form $\gamma\,\partial y/\partial t$, where γ is a constant equal to 2. How do the frequencies and mode shapes differ from the undamped case? If they differ, compute the first five frequencies and mode shape functions.
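Parts 4.) and 5.) can be checked with a few lines of code. The sketch below is written in Python rather than MATLAB and replaces fzero with plain bisection; bracketing each zero within ±1 of $(2i-1)\pi/2$ is our choice. The frequencies use $\omega_i = \sqrt{\beta_i} = \xi_i^2\sqrt{EI/\rho}$, which follows from part 1.), because with $\phi'''' = \xi^4\phi$ the time factor satisfies $\ddot w + \beta w = 0$. The mode shape uses $A_3 = -A_1$ and $A_4 = -A_2$ from part 2.), normalizes $A_1 = 1$, and fixes $A_2$ from $\phi''(1) = 0$. Function names are hypothetical.

```python
import math

EI, RHO = 5.0, 1.0   # values from Part A


def char_fn(xi):
    """cos(xi) + 1/cosh(xi); its zeros are the cantilever eigenvalues for L = 1."""
    return math.cos(xi) + 1.0 / math.cosh(xi)


def bisect(f, a, b, tol=1e-12):
    """Plain bisection on a bracket [a, b] with f(a) * f(b) <= 0."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0.0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)


def cantilever_roots(count):
    """First `count` zeros, each bracketed within +/-1 of (2i - 1)pi/2."""
    centers = [(2 * i - 1) * math.pi / 2 for i in range(1, count + 1)]
    return [bisect(char_fn, c - 1.0, c + 1.0) for c in centers]


def mode_shape(xi, x):
    """phi(x) with A1 = 1, A3 = -A1, A4 = -A2, and A2 chosen so that phi''(1) = 0."""
    sigma = (math.cos(xi) + math.cosh(xi)) / (math.sin(xi) + math.sinh(xi))
    return (math.cos(xi * x) - math.cosh(xi * x)
            - sigma * (math.sin(xi * x) - math.sinh(xi * x)))


xis = cantilever_roots(5)
omegas = [xi ** 2 * math.sqrt(EI / RHO) for xi in xis]   # omega_i = sqrt(beta_i)
for i, (xi, om) in enumerate(zip(xis, omegas), start=1):
    # root, natural frequency (rad/s), and the cos-zero approximation (2i - 1)pi/2
    print(i, round(xi, 4), round(om, 4), round((2 * i - 1) * math.pi / 2, 4))
```

Evaluating `mode_shape(xis[i], x)` on a grid of x values in [0, 1] gives the curves requested in part 5.).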

B. Experimental Data

Consider the same cantilever beam as in Part A, but with L equal to the length of the beam used in the experiments.


1.) Suppose that the parameters ρ and EI are unknown but the first few natural frequencies $\omega_i$ of the beam are known (for example, from the physical experiments that you performed). Discuss how one can use the natural frequencies to compute the unknown parameters ρ and EI.

2.) From the experiments performed in the laboratory (or from simulated data), you should have at least two data sets. One data set contains the displacement of the beam as a function of time. The other contains the power spectrum data, whose graph provides information on the natural frequencies of the beam (that is, the locations of the peaks in the plot indicate the frequency components present in the signal).

(a) Perform the discrete Fourier transform (DFT) on your time domain data set and verify that the locations of the peaks in the plot of the DFT correspond to those in the power spectrum data set. Recall that, for real data, the magnitude of the N-point DFT is symmetric about the N/2 point, so for plotting purposes it is sufficient to plot the first half of the DFT, which corresponds to positive frequencies.

(b) How many frequency components do you observe in the signal?What are their values?

(c) Using the frequency information, estimate the unknown parameters ρ and EI of the beam.

(d) Combining direct computation (for example, the moment of inertia I can be computed exactly from the known dimensions of the beam) with a literature search, find the "book" values of ρ and EI for the beam that you used in the experiments. How do these values differ from those that you computed in part (c)? Explain what you think are the reasons for the differences.
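Part (a) can be rehearsed on synthetic data before working with the laboratory files. The sketch below uses an assumed test signal (10 Hz and 40 Hz sinusoids sampled at 256 Hz for one second, plus Gaussian noise with standard deviation 0.1), computes the DFT directly from its definition (an O(N²) loop; MATLAB's fft, or any FFT routine, returns the same numbers much faster), keeps the first half of the spectrum, and reports local maxima above 20% of the largest magnitude. The sampling rate, the threshold, and the function names are illustrative choices, not properties of the experiment.

```python
import cmath
import math
import random


def dft(samples):
    """Direct O(N^2) DFT: X_k = sum_n x_n exp(-2 pi i k n / N)."""
    n_pts = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n, x in enumerate(samples))
            for k in range(n_pts)]


def one_sided_spectrum(samples, fs):
    """Frequencies k*fs/N and magnitudes |X_k| for k = 0..N/2 (enough for real data)."""
    n_pts = len(samples)
    mags = [abs(v) for v in dft(samples)[: n_pts // 2 + 1]]
    return [k * fs / n_pts for k in range(len(mags))], mags


def peak_frequencies(freqs, mags, rel_threshold=0.2):
    """Interior local maxima whose magnitude exceeds rel_threshold * max(mags)."""
    top = max(mags)
    return [freqs[k] for k in range(1, len(mags) - 1)
            if mags[k] > rel_threshold * top
            and mags[k] >= mags[k - 1] and mags[k] >= mags[k + 1]]


# Synthetic stand-in for a displacement record (assumed values, not lab data)
fs, n_pts = 256.0, 256
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 10 * n / fs) + 0.5 * math.sin(2 * math.pi * 40 * n / fs)
          + rng.gauss(0.0, 0.1) for n in range(n_pts)]
freqs, mags = one_sided_spectrum(signal, fs)
print(peak_frequencies(freqs, mags))   # [10.0, 40.0]
```

With measured data the peaks generally fall between DFT bins, so the estimated frequencies are only as fine as the bin spacing fs/N unless one interpolates or records longer.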

References

[1] H.T. Banks, W. Fang, R.J. Silcox and R.C. Smith, Approximation methods for control of acoustic/structure models with piezoceramic actuators, Journal of Intelligent Material Systems and Structures, 4(1), 1993, pp. 98–116.

[2] H.T. Banks, N.G. Medhin and G.A. Pinter, Multiscale considerations in modeling of nonlinear elastomers, CRSC-TR03-42, North Carolina State University, Raleigh, North Carolina, October, 2003; Journal of Computational Methods in Science and Engineering, 8, 2007, pp. 53–62.

[3] H.T. Banks, R.J. Silcox and R.C. Smith, The modeling and control of acoustic/structure interaction problems via piezoceramic actuators: 2-D numerical examples, ASME Journal of Vibration and Acoustics, 116(3), 1994, pp. 386–396.

[4] W.E. Boyce and R.C. DiPrima, Elementary Differential Equations and Boundary Value Problems, John Wiley & Sons, Inc., Hoboken, 8th ed., 2004.

[5] A.J. Bullmore, P.A. Nelson, A.R.D. Curtis and S.J. Elliott, The active minimization of harmonic enclosed sound fields, part II: A computer simulation, Journal of Sound and Vibration, 117(1), 1987, pp. 15–33.

[6] R. Courant and D. Hilbert, Methods of Mathematical Physics, vol. II, Wiley, New York, 1962.

[7] J.M. Gere and S.P. Timoshenko, Mechanics of Materials, PWS Pub. Co., Boston, 4th ed., 1997.

[9] H.C. Lester and C.R. Fuller, Active control of propeller induced noise fields inside a flexible cylinder, AIAA 10th Aeroacoustic Conference, Seattle, WA, 1986.

[10] E.P. Popov, Introduction to Mechanics of Solids, Prentice-Hall, Inc., Englewood Cliffs, 1968.

[11] J.N. Reddy, An Introduction to the Finite Element Method, McGraw-Hill Series in Mechanical Engineering, New York, 3rd ed., 2005.


[12] R.C.H. del Rosario, H.T. Tran and H.T. Banks, Proper orthogonal decomposition based control of transverse beam vibrations: Experimental implementation, CRSC-TR99-43, North Carolina State University, Raleigh, North Carolina, 1999; IEEE Trans. on Control Systems Technology, 10, 2002, pp. 717–726.

[13] K.R. Symon, Mechanics, Addison-Wesley Publishing Co., Reading, 1971.

[14] S.P. Timoshenko and J.N. Goodier, Theory of Elasticity, McGraw-Hill, Inc., New York, 1987.

[15] O.C. Zienkiewicz and R.C. Taylor, Finite Element Method: Volume 1, The Basis, Butterworth-Heinemann, Newbury (UK), 5th ed., 2000.

Chapter 7

Beam Vibrational Control and Real-Time Implementation

7.1 Introduction

In this chapter we focus on the real-time implementation of feedback controls for the attenuation of transverse beam vibrations due to transient pulsation. In particular, we will consider an aluminum cantilever beam to which two piezoceramic patches are mounted in a symmetric opposing fashion. The sensing device to be used for observation is a proximity probe, and thus the sensor loading effects on the beam (an extremely thin metallic surface mounted on the beam) are assumed to be negligible and are not taken into account in the modeling of the beam. Also, it is assumed that the beam vibration occurs transversely (with no out-of-plane torsion or twisting about the axis of the beam), a reasonable assumption for beams that have relatively small thickness when compared to width. Hence we can make use of the Euler-Bernoulli beam model that we developed earlier in Chapter 6. Most of the facts presented in this section are standard knowledge and can be found in the textbook literature (see, e.g., [1, 2, 4, 5, 6, 17]).

The control methodology to be discussed in this chapter is the well-known linear quadratic regulator (LQR) design method. We will discuss how to implement this control in real time when only partial state measurements are available (transverse beam displacement data at a single location on the beam). Such considerations require the use of an observer or state estimator coupled with full state feedback. To illustrate these control methodologies, we will begin by reviewing several important concepts from control theory.

7.2 Controllability and Observability of Linear Systems

In control theory the typical problem is to find the input (or control) that causes the state or the output to behave in a desired way. In particular, two basic questions in control theory are:



(a) Is it possible to find a suitable control input that can transform any initial state to any desired state in a finite length of time?

(b) Is it possible to identify or reconstruct the initial state by observing the output over a finite length of time?

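Consider a simple example where the state $(x_1(t), x_2(t))^T$ and the output $y(t)$ of a dynamical system are given by

$$\begin{pmatrix}\dot x_1(t)\\ \dot x_2(t)\end{pmatrix} = \begin{pmatrix}-1 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}x_1(t)\\ x_2(t)\end{pmatrix} + \begin{pmatrix}1\\ 0\end{pmatrix}u(t), \qquad (7.1)$$

$$y(t) = \begin{pmatrix}0 & 1\end{pmatrix}\begin{pmatrix}x_1(t)\\ x_2(t)\end{pmatrix}. \qquad (7.2)$$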

It is clear that no matter what input $u(t)$ is applied, the state variable $x_2(t)$ is not affected. Hence $x_2(t)$ is said to be not controllable by the input $u(t)$. On the other hand, the state variable $x_2(t)$ can be measured or observed, but $x_1(t)$ can be neither measured nor observed. Hence $x_1(t)$ is said to be not observable from the output $y(t)$. This example illustrates the concepts of controllability and observability, which we explain in more detail below.

Remark 7.2.1 Kalman introduced the ideas of controllability and observability in [11]. Another detailed exposition of these concepts can be found in [12].

7.2.1 Controllability

7.2.1.1 Time-Varying Case

Consider the n-dimensional linear state equation

$$\dot{\vec x}(t) = A(t)\vec x(t) + B(t)\vec u(t), \qquad (7.3)$$

where $\vec x(t) \in \mathbb{R}^n$, $\vec u(t) \in \mathbb{R}^m$, and $A(\cdot)$ and $B(\cdot)$ are $n \times n$ and $n \times m$ matrices whose elements are continuous functions on $(-\infty, \infty)$. Because the output does not play a role in controllability, we will not consider the output equation for now.

Definition 7.2.1 The state system (7.3) is said to be (completely) controllable at time $t_0$ if for any pair of states $\vec x_0$ and $\vec x_1 \in \mathbb{R}^n$ there is a finite time $t_1 > t_0$ and an input $\vec u(\cdot)$ on $[t_0, t_1]$ that transfers $\vec x_0$ to the state $\vec x_1$ at time $t_1$. That is,

$$\vec x_1 = \Phi(t_1,t_0)\vec x_0 + \int_{t_0}^{t_1}\Phi(t_1,s)B(s)\vec u(s)\,ds, \qquad (7.4)$$

where $\Phi(t,t_0)$ is the $n \times n$ state transition matrix satisfying

$$\dot\Phi(t,t_0) = A(t)\Phi(t,t_0), \qquad \Phi(t_0,t_0) = I.$$


Remark 7.2.2

(i) If $A(t)$ has the commutative property

$$A(t)\left(\int_{t_0}^{t}A(s)\,ds\right) = \left(\int_{t_0}^{t}A(s)\,ds\right)A(t),$$

then

$$\Phi(t,t_0) = e^{\int_{t_0}^{t}A(s)\,ds},$$

and the solution to

$$\dot{\vec x}(t) = A(t)\vec x(t), \qquad \vec x(t_0) = \vec x_0$$

is given by $\vec x(t) = \Phi(t,t_0)\vec x_0$, which is a transformation of the initial condition. For this reason, $\Phi(t,t_0)$ is called the state transition matrix. Note that the above commutative condition holds in particular if the elements of A are constants.

(ii) In this definition, the term "completely" means that the definition holds for all $\vec x_0$ and $\vec x_1$. The control $\vec u(\cdot)$ is assumed to be either piecewise continuous on $[t_0, t_1]$ or in $L^2[t_0, t_1]$.

(iii) Rewrite equation (7.4) as

$$\vec x_1 - \Phi(t_1,t_0)\vec x_0 = \int_{t_0}^{t_1}\Phi(t_1,s)B(s)\vec u(s)\,ds.$$

Controllability means that this equation is solvable for $\vec u(t)$ given arbitrary $\vec x_1$, $\vec x_0$, $t_0$, and $t_1$. In addition, if we let $\bar x_1 = \vec x_1 - \Phi(t_1,t_0)\vec x_0$, then

$$\bar x_1 = \int_{t_0}^{t_1}\Phi(t_1,s)B(s)\vec u(s)\,ds,$$

which shows that $\vec u(\cdot)$ also transfers the state from $\vec x_0 = \vec 0$ to $\bar x_1$ on $[t_0, t_1]$. Hence an equivalent definition of controllability is that every state can be reached from the origin in finite time. On the other hand, if we rewrite equation (7.4) as

$$\vec 0 = \Phi(t_1,t_0)\vec x_0 - \vec x_1 + \int_{t_0}^{t_1}\Phi(t_1,s)B(s)\vec u(s)\,ds = \Phi(t_1,t_0)\left[\vec x_0 - \Phi(t_0,t_1)\vec x_1\right] + \int_{t_0}^{t_1}\Phi(t_1,s)B(s)\vec u(s)\,ds,$$

then the same control $\vec u(\cdot)$ also transfers the state $\bar x_0 = \vec x_0 - \Phi(t_0,t_1)\vec x_1$ to $\vec 0$. That is, controllability also means that every state can be transferred to the origin in finite time.


(iv) This definition requires only that $\vec u(t)$ be capable of moving any state in the state space to any other state in finite time; the exact state trajectory is not specified.

(v) Controllability has nothing to do with the output, and no constraint is imposed on the input.

(vi) Controllability implies the existence of an open-loop control, but it does not tell us how to construct one.

It is well known that the solution to the state equation (7.3) is given by the variation of constants formula (see, e.g., [6])

$$\vec x(t) = \Phi(t,t_0)\vec x(t_0) + \int_{t_0}^{t}\Phi(t,s)B(s)\vec u(s)\,ds = \Phi(t,t_0)\left[\vec x(t_0) + \int_{t_0}^{t}\Phi(t_0,s)B(s)\vec u(s)\,ds\right]. \qquad (7.5)$$

Consider the $n \times n$ constant matrix

$$W(t_0,t_1) = \int_{t_0}^{t_1}\Phi(t_0,s)B(s)B^T(s)\Phi^T(t_0,s)\,ds.$$

Let $\vec x(t_0) = \vec x_0$ and $\vec x_1$ be arbitrary. Assume that $W(t_0,t_1)$ is nonsingular and consider

$$\vec u(t) = -B^T(t)\Phi^T(t_0,t)W^{-1}(t_0,t_1)\left[\vec x_0 - \Phi(t_0,t_1)\vec x_1\right].$$

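Substituting this expression for $\vec u(t)$ into equation (7.5), we have

$$\begin{aligned}\vec x(t_1) &= \Phi(t_1,t_0)\Big[\vec x_0 - \int_{t_0}^{t_1}\Phi(t_0,s)B(s)B^T(s)\Phi^T(t_0,s)W^{-1}(t_0,t_1)\left[\vec x_0 - \Phi(t_0,t_1)\vec x_1\right]ds\Big]\\ &= \Phi(t_1,t_0)\Big[\vec x_0 - W(t_0,t_1)W^{-1}(t_0,t_1)\left[\vec x_0 - \Phi(t_0,t_1)\vec x_1\right]\Big]\\ &= \Phi(t_1,t_0)\Phi(t_0,t_1)\vec x_1\\ &= \vec x_1.\end{aligned}$$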

Thus, the nonsingularity of $W(t_0,t_1)$ is a sufficient condition for the controllability of system (7.3). Equivalently [6], the linear independence of the rows of $\Phi(t_0,\cdot)B(\cdot)$ on $[t_0, t_1]$ is a sufficient condition for the controllability of (7.3). In fact, the linear independence of the rows of the $n \times m$ matrix $\Phi(t_0,\cdot)B(\cdot)$ on $[t_0, t_1]$ is a necessary and sufficient condition for the controllability of (7.3).

To see necessity, assume that (7.3) is controllable but that the rows of $\Phi(t_0,\cdot)B(\cdot)$ are linearly dependent on $[t_0, t_1]$ for all $t_1 > t_0$. Then there exists a nonzero vector $\vec c \in \mathbb{R}^n$ such that

$$\vec c^{\,T}\Phi(t_0,t)B(t) = 0$$

for all $t \in [t_0, t_1]$. Let us choose $\vec c = \vec x(t_0) = \vec x_0$. Then, multiplying equation (7.5) at $t = t_1$ on the left by $\vec c^{\,T}\Phi(t_0,t_1)$, we can rewrite it as

$$\vec c^{\,T}\Phi(t_0,t_1)\vec x(t_1) = \vec c^{\,T}\vec c + \vec c^{\,T}\int_{t_0}^{t_1}\Phi(t_0,s)B(s)\vec u(s)\,ds.$$

Since (7.3) is controllable at $t_0$, for any state $\vec x(t_1)$ (in particular for $\vec x(t_1) = \vec 0$) there exists a control $\vec u(t)$ that reaches it, and for that control

$$\vec c^{\,T}\vec c = 0,$$

where we used the fact that $\vec c^{\,T}\Phi(t_0,s)B(s) = 0$ for all $s \in [t_0, t_1]$. This implies that $\vec c = \vec 0$, which is a contradiction.

Remark 7.2.3

(a) The above discussion on controllability involves the computation of the state transition matrix $\Phi(t_0,\cdot)$, which is a very difficult task in general.

(b) The $n \times n$ constant matrix

$$W(t_0,t_1) = \int_{t_0}^{t_1}\Phi(t_0,s)B(s)B^T(s)\Phi^T(t_0,s)\,ds$$

is called the (controllability) Grammian matrix. From [6], it can be shown that an equivalent necessary and sufficient condition for (7.3) to be controllable at time $t_0$ is that $W(t_0,t_1)$ be nonsingular for some $t_1 > t_0$.

Recall that controllability means that there exists a control $\vec u(\cdot)$ capable of moving any state in the state space to any other state in finite time. Since the state trajectory is not specified (see Remark 7.2.2(iv)), there are, in general, many different controls $\vec u(\cdot)$ that achieve this task. One possible control $\vec u(\cdot)$ is given by the formula

$$\vec u(t) = -B^T(t)\Phi^T(t_0,t)W^{-1}(t_0,t_1)\left[\vec x_0 - \Phi(t_0,t_1)\vec x_1\right] \qquad (7.6)$$

as shown above. If we define the so-called total (control) energy E of (7.3) as

$$E = \int_{t_0}^{t_1}\|\vec u(t)\|^2\,dt,$$

where $\|\cdot\|$ denotes the Euclidean norm, then the control $\vec u(\cdot)$ in equation (7.6) is the one that minimizes this energy and is called the minimum-energy control. That is, if we let $\vec u(t)$ be given by (7.6), which transfers $\vec x_0$ to $\vec x_1$ at time $t_1$, and let $\bar u(t)$ be another control on $[t_0, t_1]$ that accomplishes the same task, then

$$\int_{t_0}^{t_1}\|\vec u(t)\|^2\,dt \le \int_{t_0}^{t_1}\|\bar u(t)\|^2\,dt.$$


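To see this, we recall that the solution to (7.3) with the initial condition $\vec x(t_0) = \vec x_0$ is given by

$$\vec x(t) = \Phi(t,t_0)\left[\vec x_0 + \int_{t_0}^{t}\Phi(t_0,s)B(s)\vec u(s)\,ds\right].$$

Letting $\vec x_1 = \vec x(t_1)$, we can rewrite the above equation as

$$\Phi(t_0,t_1)\vec x_1 - \vec x_0 = \int_{t_0}^{t_1}\Phi(t_0,s)B(s)\vec u(s)\,ds$$

or

$$\bar x_1 = \int_{t_0}^{t_1}\Phi(t_0,s)B(s)\vec u(s)\,ds,$$

where $\bar x_1 = \Phi(t_0,t_1)\vec x_1 - \vec x_0$. Since $\vec u(\cdot)$ and $\bar u(\cdot)$ both transfer $\vec x_0$ to $\vec x_1$ at $t_1$, we have

$$\int_{t_0}^{t_1}\Phi(t_0,s)B(s)\vec u(s)\,ds = \int_{t_0}^{t_1}\Phi(t_0,s)B(s)\bar u(s)\,ds$$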

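or

$$\int_{t_0}^{t_1}\Phi(t_0,s)B(s)\left[\vec u(s) - \bar u(s)\right]ds = \vec 0.$$

This implies that

$$\left\langle\int_{t_0}^{t_1}\Phi(t_0,s)B(s)\left[\vec u(s) - \bar u(s)\right]ds,\; W^{-1}(t_0,t_1)\bar x_1\right\rangle = 0,$$

or equivalently,

$$\int_{t_0}^{t_1}\left\langle\vec u(s) - \bar u(s),\; B^T(s)\Phi^T(t_0,s)W^{-1}(t_0,t_1)\bar x_1\right\rangle ds = 0.$$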

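Using (7.6) (note that $-\left[\vec x_0 - \Phi(t_0,t_1)\vec x_1\right] = \bar x_1$, so the second argument above is exactly $\vec u(s)$), we find from this equation

$$\int_{t_0}^{t_1}\left\langle\vec u(s) - \bar u(s), \vec u(s)\right\rangle ds = 0.$$

Then

$$\begin{aligned}\int_{t_0}^{t_1}\|\bar u(t)\|^2dt &= \int_{t_0}^{t_1}\|\bar u(t) - \vec u(t) + \vec u(t)\|^2dt\\ &= \int_{t_0}^{t_1}\|\bar u(t) - \vec u(t)\|^2dt + \int_{t_0}^{t_1}\|\vec u(t)\|^2dt + 2\int_{t_0}^{t_1}\left\langle\bar u(t) - \vec u(t), \vec u(t)\right\rangle dt\\ &= \int_{t_0}^{t_1}\|\bar u(t) - \vec u(t)\|^2dt + \int_{t_0}^{t_1}\|\vec u(t)\|^2dt.\end{aligned}$$

Hence

$$\int_{t_0}^{t_1}\|\bar u(t)\|^2dt - \int_{t_0}^{t_1}\|\vec u(t)\|^2dt = \int_{t_0}^{t_1}\|\bar u(t) - \vec u(t)\|^2dt \ge 0,$$

or

$$\int_{t_0}^{t_1}\|\vec u(t)\|^2dt \le \int_{t_0}^{t_1}\|\bar u(t)\|^2dt.$$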

We now give a controllability criterion based solely on the system matrices $A(\cdot)$ and $B(\cdot)$. To this end, we assume that $A(\cdot)$ and $B(\cdot)$ are $(n-1)$ times continuously differentiable and let $\Psi(t)$ denote a fundamental matrix of $\dot{\vec x}(t) = A(t)\vec x(t)$. The relationship between the state transition matrix Φ and the fundamental matrix Ψ is given by [6]

$$\Phi(t,t_0) = \Psi(t)\Psi^{-1}(t_0)$$

for all $t, t_0 \in (-\infty, \infty)$. Define a sequence of $n \times m$ matrices $M_k(t)$ by the equations

$$M_0(t) = B(t), \qquad M_{k+1}(t) = -A(t)M_k(t) + \frac{d}{dt}M_k(t),$$

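for $k = 0, 1, \ldots, n-1$. We next observe that

$$\Phi(t_0,t)B(t) = \Phi(t_0,t)M_0(t),$$

$$\begin{aligned}\frac{\partial}{\partial t}\Phi(t_0,t)B(t) &= \frac{d}{dt}\left[\Psi(t_0)\Psi^{-1}(t)B(t)\right]\\ &= \Psi(t_0)\frac{d}{dt}\left[\Psi^{-1}(t)\right]B(t) + \Psi(t_0)\Psi^{-1}(t)\frac{d}{dt}B(t)\\ &= \Psi(t_0)\Psi^{-1}(t)\left[-A(t)B(t) + \frac{d}{dt}B(t)\right],\end{aligned}$$

where we have used the identity

$$\frac{d}{dt}\Psi^{-1}(t) = -\Psi^{-1}(t)A(t),$$

which follows by differentiating both sides of $\Psi(t)\Psi^{-1}(t) = I$ and using $\dot\Psi(t) = A(t)\Psi(t)$. Therefore,

$$\frac{\partial}{\partial t}\Phi(t_0,t)B(t) = \Phi(t_0,t)M_1(t).$$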

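Similarly,

$$\begin{aligned}\frac{\partial^2}{\partial t^2}\Phi(t_0,t)B(t) &= \frac{\partial}{\partial t}\left[\Phi(t_0,t)\right]M_1(t) + \Phi(t_0,t)\frac{d}{dt}M_1(t)\\ &= \Phi(t_0,t)\left[-A(t)M_1(t) + \frac{d}{dt}M_1(t)\right] = \Phi(t_0,t)M_2(t).\end{aligned}$$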


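In general,

$$\frac{\partial^k}{\partial t^k}\Phi(t_0,t)B(t) = \Phi(t_0,t)M_k(t), \qquad k = 0, 1, \ldots, n-1.$$

Now consider the matrix

$$\left[\Phi(t_0,t_1)B(t_1)\;\Big|\;\frac{\partial}{\partial t}\Phi(t_0,t)B(t)\Big|_{t=t_1}\;\Big|\;\cdots\;\Big|\;\frac{\partial^{n-1}}{\partial t^{n-1}}\Phi(t_0,t)B(t)\Big|_{t=t_1}\right] = \Phi(t_0,t_1)\left[M_0(t_1)\,|\,M_1(t_1)\,|\,\cdots\,|\,M_{n-1}(t_1)\right],$$

where $t_1 > t_0$. Since $\Phi(t_0,t_1)$ is nonsingular, if

$$\rho\left(\left[M_0(t_1)\,|\,M_1(t_1)\,|\,\cdots\,|\,M_{n-1}(t_1)\right]\right) = n,$$

then

$$\rho\left(\left[\Phi(t_0,t_1)B(t_1)\;\Big|\;\frac{\partial}{\partial t}\Phi(t_0,t)B(t)\Big|_{t=t_1}\;\Big|\;\cdots\;\Big|\;\frac{\partial^{n-1}}{\partial t^{n-1}}\Phi(t_0,t)B(t)\Big|_{t=t_1}\right]\right) = n, \qquad (7.7)$$

where ρ denotes the rank of a matrix, and we used the following fact: if $F = GH$, then $\rho(F) \le \rho(H)$; if, in addition, G is nonsingular, then $H = G^{-1}F$ and $\rho(H) \le \rho(F)$, so that $\rho(F) = \rho(H)$. The rank condition (7.7) is equivalent to the condition that the rows of $\Phi(t_0,\cdot)B(\cdot)$ are linearly independent on $[t_0, t_1]$ [6]. Hence, a sufficient condition for the controllability of (7.3) at time $t_0$ is

$$\rho\left(\left[M_0(t_1)\,|\,M_1(t_1)\,|\,\cdots\,|\,M_{n-1}(t_1)\right]\right) = n$$

for some time $t_1 > t_0$.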

7.2.1.2 Time-Invariant Case

We now consider the controllability question for the time-invariant state equation

$$\dot{\vec x}(t) = A\vec x(t) + B\vec u(t), \qquad (7.8)$$

where $\vec x \in \mathbb{R}^n$, $\vec u \in \mathbb{R}^m$, and A and B are $n \times n$ and $n \times m$ constant matrices, respectively.

In the time-invariant case, $\Phi(t_0,t)B(t) = e^{A(t_0-t)}B$. The elements of the matrix function $e^{A(t_0-t)}$ are linear combinations of terms of the form $t^ke^{\lambda t}$; hence, the elements of $e^{A(t_0-t)}B$ are also linear combinations of such terms, which are analytic on $[0, \infty)$. Consequently, if the rows of $e^{A(t_0-t)}B$ are linearly independent on $[0, \infty)$, they are linearly independent on $[t_0, t_1]$ for any $t_0$ and $t_1 > t_0$. Hence if the time-invariant system (7.8) is controllable, it is controllable at any time $t_0 \ge 0$. For this reason the reference to $t_0$ in the definition of controllability will be dropped for time-invariant systems.

In a manner similar to that in the time-varying case discussed in the previous section, one can establish the following equivalent statements regarding the controllability of the time-invariant system (7.8).


(i) The system (7.8) is controllable;

(ii) The n rows of $e^{-At}B$ are linearly independent on $[0, \infty)$;

(iii) The controllability Grammian matrix

$$W(0,t) = \int_0^t e^{-As}BB^Te^{-A^Ts}\,ds \qquad (7.9)$$

is nonsingular for any $t > 0$. Furthermore, the control

$$\vec u(t) = -B^Te^{-A^Tt}W^{-1}(0,t_1)\left[\vec x_0 - e^{-At_1}\vec x_1\right] \qquad (7.10)$$

transfers $\vec x(0) = \vec x_0$ to $\vec x(t_1) = \vec x_1$;

(iv) $\rho(Q) = n$, where Q is the $n \times nm$ controllability matrix

$$Q = \left[B\,|\,AB\,|\,A^2B\,|\,\cdots\,|\,A^{n-1}B\right]. \qquad (7.11)$$
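Criterion (iv) is the one usually checked in practice. The sketch below builds Q from (7.11) with plain Python lists and computes its numerical rank by Gaussian elimination with partial pivoting. The pivot tolerance of $10^{-10}$ is our assumption (library routines such as MATLAB's rank use an SVD-based tolerance instead), and the helper names are hypothetical. It is applied to the example (7.1) and to the platform matrices of Example 7.2.1 below.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def controllability_matrix(A, B):
    """Q = [B | AB | ... | A^{n-1}B] from (7.11), as an n x nm list of rows."""
    n = len(A)
    blocks, block = [], B
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(A, block)
    return [[x for blk in blocks for x in blk[i]] for i in range(n)]


def numerical_rank(M, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting; pivots <= tol count as zero."""
    R = [list(row) for row in M]
    rows, cols = len(R), len(R[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda i: abs(R[i][c]))
        if abs(R[pivot][c]) <= tol:
            continue
        R[rank], R[pivot] = R[pivot], R[rank]
        for i in range(rank + 1, rows):
            factor = R[i][c] / R[rank][c]
            for j in range(c, cols):
                R[i][j] -= factor * R[rank][j]
        rank += 1
    return rank


# Example (7.1): A = diag(-1, 2), B = (1, 0)^T
Q71 = controllability_matrix([[-1.0, 0.0], [0.0, 2.0]], [[1.0], [0.0]])
# Example 7.2.1: A = diag(-1, -2), B = (1, 3)^T
Q_platform = controllability_matrix([[-1.0, 0.0], [0.0, -2.0]], [[1.0], [3.0]])
print(Q71, numerical_rank(Q71))                 # [[1.0, -1.0], [0.0, 0.0]] 1
print(Q_platform, numerical_rank(Q_platform))   # [[1.0, -1.0], [3.0, -6.0]] 2
```

The zero second row of Q for (7.1) shows directly that no combination of $B$ and $AB$ reaches the $x_2$ direction.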

Example 7.2.1 In this example, we consider a platform that is supported by springs and dampers as shown in Figure 7.1. This platform system can be used to study automotive suspension systems. To simplify the mathematical model, we assume that the mass of the platform is negligible. Thus the movements of the two spring systems can be regarded as independent, and the applied force u is assumed to be distributed to each spring system as shown. In addition, we assume that the damping forces are proportional to the velocity and that the springs obey Hooke's law. In particular, the viscous damping coefficients of both dampers are assumed to be 1 and the spring constants are assumed to be 1 and 2, as depicted in Figure 7.1. If the displacements of the two springs from equilibrium are denoted by $x_1$ and $x_2$, then (using the fact that the mass of the platform is very small compared to the spring and damping constants) we obtain from (2.9)

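$$\dot x_1 + x_1 = u,$$

$$\dot x_2 + 2x_2 = 3u,$$

or

$$\begin{pmatrix}\dot x_1\\ \dot x_2\end{pmatrix} = \begin{pmatrix}-1 & 0\\ 0 & -2\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} + \begin{pmatrix}1\\ 3\end{pmatrix}u.$$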

If the initial displacements of both ends are different from zero, the platform moves; without control, the displacements decay to zero only asymptotically as $t \to \infty$. Now we pose the following problem: Let $x_1(0) = 10$ and $x_2(0) = -2$. Is it possible to apply a control function $u(t)$ that will bring the system to rest at $t_1 = 1$ second? At $t_1 = 0.5$ second?

The controllability matrix Q for this platform problem is

$$Q = \begin{pmatrix}1 & -1\\ 3 & -6\end{pmatrix},$$


x1 x2

Spring constant = 1

Damping coeff. = 1

4u

3 1

Spring constant = 2

Damping coeff. = 1

FIGURE 7.1: A spring-mass-dashpot platform system.

which has rank 2. Hence the system is controllable, and the displacements can be brought to zero in an arbitrarily small time interval from any initial displacements. Using formulas (7.9) and (7.10), we can find the controllability Grammian W and hence the control $u(t)$ that will drive the system from the initial state $\vec x_0$ to the zero state in $t_1 = 1$ second.

Observe that the matrix A is in Jordan form [6]; therefore,

$$e^{-At} = \begin{pmatrix}e^t & 0\\ 0 & e^{2t}\end{pmatrix}.$$

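Then

$$W(0,1) = \int_0^1\begin{pmatrix}e^s & 0\\ 0 & e^{2s}\end{pmatrix}\begin{pmatrix}1\\ 3\end{pmatrix}\begin{pmatrix}1 & 3\end{pmatrix}\begin{pmatrix}e^s & 0\\ 0 & e^{2s}\end{pmatrix}ds = \int_0^1\begin{pmatrix}e^{2s} & 3e^{3s}\\ 3e^{3s} & 9e^{4s}\end{pmatrix}ds = \begin{pmatrix}3.1945 & 19.0855\\ 19.0855 & 120.5958\end{pmatrix},$$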

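and the control $u(t)$ is given by

$$u_{t_1=1}(t) = -\begin{pmatrix}1 & 3\end{pmatrix}\begin{pmatrix}e^t & 0\\ 0 & e^{2t}\end{pmatrix}W^{-1}(0,1)\begin{pmatrix}10\\ -2\end{pmatrix} = -59.2751e^t + 28.1925e^{2t}.$$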

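Similarly, we can find the control that will drive the system to zero in $t_1 = 0.5$ second. In this case, the controllability Grammian matrix is

$$W(0,0.5) = \int_0^{0.5}\begin{pmatrix}e^s & 0\\ 0 & e^{2s}\end{pmatrix}\begin{pmatrix}1\\ 3\end{pmatrix}\begin{pmatrix}1 & 3\end{pmatrix}\begin{pmatrix}e^s & 0\\ 0 & e^{2s}\end{pmatrix}ds = \begin{pmatrix}0.8591 & 3.4817\\ 3.4817 & 14.3754\end{pmatrix},$$

and the control $u(t)$ is

$$u_{t_1=0.5}(t) = -\begin{pmatrix}1 & 3\end{pmatrix}\begin{pmatrix}e^t & 0\\ 0 & e^{2t}\end{pmatrix}W^{-1}(0,0.5)\begin{pmatrix}10\\ -2\end{pmatrix} = -660.1278e^t + 480.0624e^{2t}.$$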

The state vector $\vec x(t)$ was found numerically using MATLAB. The plot of $\vec x(t)$ for $t_1 = 1$ is given in Figure 7.2, and for $t_1 = 0.5$ in Figure 7.3. The controls $u_{t_1=1}$ and $u_{t_1=0.5}$ are plotted on the same graph for comparison (see Figure 7.4); note that the amplitude of $u_{t_1=0.5}$ is greater than that of $u_{t_1=1}$ since it must drive the system to the desired final state in a shorter period of time.
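The numbers in Example 7.2.1 can be reproduced with a short script. This sketch is written in Python rather than the MATLAB used in the text, and its helper names are hypothetical. It codes the closed-form entries of $W(0,t_1)$ obtained by integrating $e^{2s}$, $3e^{3s}$ and $9e^{4s}$, forms the control (7.10) with $\vec x_1 = \vec 0$ as $u(t) = \alpha e^t + \beta e^{2t}$, and then integrates the platform equations with a hand-written fourth-order Runge-Kutta scheme (our choice of integrator and step count) to confirm that the state reaches the origin at $t_1$.

```python
import math

X0 = (10.0, -2.0)  # initial displacements x1(0), x2(0) from Example 7.2.1


def grammian(t1):
    """Entries of W(0, t1) for A = diag(-1, -2), B = (1, 3)^T (closed-form integrals)."""
    w11 = (math.exp(2 * t1) - 1.0) / 2.0
    w12 = math.exp(3 * t1) - 1.0
    w22 = 9.0 * (math.exp(4 * t1) - 1.0) / 4.0
    return w11, w12, w22


def control_coefficients(t1, x0=X0):
    """(alpha, beta) with u(t) = alpha*e^t + beta*e^{2t}: formula (7.10) with x1 = 0."""
    w11, w12, w22 = grammian(t1)
    det = w11 * w22 - w12 * w12
    v1 = (w22 * x0[0] - w12 * x0[1]) / det      # v = W^{-1}(0, t1) x0
    v2 = (-w12 * x0[0] + w11 * x0[1]) / det
    # u(t) = -B^T e^{-A^T t} v = -(e^t v1 + 3 e^{2t} v2)
    return -v1, -3.0 * v2


def simulate(t1, x0=X0, steps=4000):
    """Integrate x1' = -x1 + u, x2' = -2 x2 + 3u with classical RK4; return x(t1)."""
    alpha, beta = control_coefficients(t1, x0)

    def u(t):
        return alpha * math.exp(t) + beta * math.exp(2 * t)

    def f(t, x):
        return (-x[0] + u(t), -2.0 * x[1] + 3.0 * u(t))

    h, t, x = t1 / steps, 0.0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, (x[0] + h / 2 * k1[0], x[1] + h / 2 * k1[1]))
        k3 = f(t + h / 2, (x[0] + h / 2 * k2[0], x[1] + h / 2 * k2[1]))
        k4 = f(t + h, (x[0] + h * k3[0], x[1] + h * k3[1]))
        x = (x[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             x[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        t += h
    return x


for t1 in (1.0, 0.5):
    alpha, beta = control_coefficients(t1)
    print(t1, round(alpha, 4), round(beta, 4), simulate(t1))
```

Up to rounding, the printed coefficients should match $-59.2751, 28.1925$ and $-660.1278, 480.0624$, and the final states should be zero to within integration error, which illustrates how shrinking $t_1$ inflates the control amplitude.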

FIGURE 7.2: State vector $x(t)$ (components $x_1(t)$ and $x_2(t)$) for $t_1 = 1$ second.

For ease of referencing, we will denote the time-invariant system (7.8) simplyby the pair (A,B) when discussing controllability.


FIGURE 7.3: State vector $x(t)$ (components $x_1(t)$ and $x_2(t)$) for $t_1 = 0.5$ second.

FIGURE 7.4: Control $u(t)$ for $t_1 = 1$ second (solid line) and $t_1 = 0.5$ second (dashed line).

Consider next the example described by equations (7.1) and (7.2),

$$\begin{pmatrix}\dot x_1(t)\\ \dot x_2(t)\end{pmatrix} = \begin{pmatrix}-1 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}x_1(t)\\ x_2(t)\end{pmatrix} + \begin{pmatrix}1\\ 0\end{pmatrix}u(t),$$

$$y(t) = \begin{pmatrix}0 & 1\end{pmatrix}\begin{pmatrix}x_1(t)\\ x_2(t)\end{pmatrix}.$$

Simple calculations show that the controllability matrix is rank deficient (i.e., has less than full rank). We note, however, that the state variable $x_2$ is unaffected by the input u, and hence the state space $(x_1, x_2)^T$ can be thought of as being decomposed into two subsystems, one controllable ($x_1$) and the other uncontrollable ($x_2$). This is the well-known Kalman controllable canonical decomposition.

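We will now outline a procedure for computing the similarity transformation matrix P that transforms the time-invariant system (7.8), where $\rho(Q) = n_1$ with $n_1 < n$, into an equivalent system of the form

$$\begin{pmatrix}\dot{\bar x}_1\\ \dot{\bar x}_2\end{pmatrix} = \begin{pmatrix}A_1 & A_2\\ 0 & A_3\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}B_1\\ 0\end{pmatrix}u(t), \qquad (7.12)$$

where $\bar x_1 \in \mathbb{R}^{n_1}$, $\bar x_2 \in \mathbb{R}^{n-n_1}$, and $(A_1, B_1)$ is controllable. Let

$$\rho(Q) = \rho\left(\left[B\,|\,AB\,|\,\cdots\,|\,A^{n-1}B\right]\right) = n_1,$$

and

$$P = [P_1\,|\,P_2], \qquad (7.13)$$

where $P_1$ is an $n \times n_1$ matrix whose columns form an orthonormal basis for the column space of Q and $P_2$ is an $n \times (n - n_1)$ matrix whose columns, in conjunction with those of $P_1$, form an orthonormal basis for $\mathbb{R}^n$.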

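Consider the state variable transformation

$$P\bar x(t) = \vec x(t),$$

which (since $P^{-1} = P^T$) yields the equivalent system

$$\dot{\bar x} = P^TAP\bar x + P^TBu(t).$$

Partitioning this system according to (7.13), we obtain

$$\begin{pmatrix}\dot{\bar x}_1\\ \dot{\bar x}_2\end{pmatrix} = \begin{pmatrix}P_1^TAP_1 & P_1^TAP_2\\ P_2^TAP_1 & P_2^TAP_2\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}P_1^TB\\ P_2^TB\end{pmatrix}u(t). \qquad (7.14)$$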

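We next note that

(i) $P_1^TB = B_1$ for some appropriate $n_1 \times m$ matrix $B_1$. From the QR decomposition, the controllability matrix Q is

$$Q = [B|AB|\dots|A^{n-1}B] = PR = [P_1|P_2]\begin{pmatrix}R_{11} & R_{12} & \cdots & R_{1n}\\ R_{21} & R_{22} & \cdots & R_{2n}\end{pmatrix},$$

where R has been partitioned such that $R_{1i}$ is an $n_1 \times m$ matrix and $R_{2i}$ is an $(n - n_1) \times m$ matrix for $i = 1, \dots, n$. This implies that $B = P_1R_{11} + P_2R_{21}$. Therefore,

$$P_1^TB = P_1^T[P_1R_{11} + P_2R_{21}] = P_1^TP_1R_{11} + P_1^TP_2R_{21} = R_{11}$$

due to the orthonormality of the columns of $P_1$ and $P_2$ ($P_1^TP_1 = I$, $P_1^TP_2 = 0$).

(ii) $P_2^TB = 0$. Since $R = P^TQ$ and the columns of $P_2$ are orthogonal to the column space of Q, we have $R_{2i} = P_2^TA^{i-1}B = 0$ for all i; in particular $R_{21} = 0$, so $B = P_1R_{11}$ and $P_2^TB = P_2^TP_1R_{11} = 0$, again by the orthogonality of the columns of $P_1$ and $P_2$.

(iii) $P_2^TAP_1 = 0$. By the Cayley-Hamilton theorem, $A^nB$ is a linear combination of $B, AB, \dots, A^{n-1}B$, so the column space of Q is invariant under A. Hence $AP_1 = P_1M$ for some $n_1 \times n_1$ matrix M, and $P_2^TAP_1 = P_2^TP_1M = 0$.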

Remark 7.2.4

(a) From (7.12), it is easy to see that the state variable $\bar x_2(t)$ is completely unaffected by the input u(t). Thus the state space has been divided into two parts, one controllable and the other uncontrollable. This explains the term "controllable" used in the Kalman controllable canonical decomposition.

(b) A straightforward method of finding the orthonormal basis vectors in $P_1$ and $P_2$ is to calculate the QR decomposition of the controllability matrix (see Example 7.2.2). In addition, this method is a reliable technique for determining the rank of the controllability matrix. That is, one or more zeros on the diagonal of R imply that R and, consequently, the controllability matrix do not have full rank (in floating-point arithmetic, "zero" means negligibly small relative to the largest entries of R).

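Example 7.2.2 Consider the time-invariant system (7.8), where

$$A = \begin{pmatrix}3 & 6 & 4\\ 9 & 6 & 10\\ -7 & -7 & -9\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix}-0.666667 & 0.333333\\ 0.333333 & -0.666667\\ 0.333333 & 0.333333\end{pmatrix}.$$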

Let us first compute the controllability matrix, which in this example will be denoted by P (in order to avoid confusion with the notation Q used in the QR decomposition):

$$P = (B|AB|A^2B) = \begin{pmatrix}-0.6667 & 0.3333 & 1.3333 & -1.6667 & -2.6667 & 6.3333\\ 0.3333 & -0.6667 & -0.6667 & 2.3333 & 1.3333 & -7.6667\\ 0.3333 & 0.3333 & -0.6667 & -0.6667 & 1.3334 & 1.3334\end{pmatrix}.$$


The rank of P, computed with MATLAB, is given as $\rho(P) = 3$. The QR decomposition of P is $P = QR$, where

$$Q = \begin{pmatrix}-0.8165 & 0.0000 & 0.5773\\ 0.4082 & -0.7071 & 0.5773\\ 0.4082 & 0.7071 & 0.5774\end{pmatrix},$$

$$R = \begin{pmatrix}0.8165 & -0.4082 & -1.6330 & 2.0412 & 3.2660 & -7.7567\\ 0 & 0.7071 & 0.0000 & -2.1213 & 0.0000 & 6.3640\\ 0 & 0 & 0.0000 & 0.0000 & 0.0000 & 0.0000\end{pmatrix}.$$

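Note from the QR decomposition that $\rho(R) = 2$, and hence $\rho(P) = 2$, which conflicts with MATLAB's computation of the rank of P. The reason for this discrepancy is that MATLAB's rank routine is very sensitive to round-off error in the data: the entries of B were entered as six-digit decimals, so the computed P is a small perturbation of a rank-2 matrix, and rank, which counts the singular values above a tolerance close to machine precision, detects that perturbation. If we were to compute the rank of P using

$$B = \begin{pmatrix}-2/3 & 1/3\\ 1/3 & -2/3\\ 1/3 & 1/3\end{pmatrix}$$

in MATLAB, the rank would be calculated correctly as 2. The MATLAB routine to compute the QR decomposition of a matrix A is qr(A).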

We form the matrix P as in (7.13) by letting $P_1$ consist of the first two columns of Q and $P_2$ be the third column of Q. Then $P = [P_1|P_2]$ has the desired properties, and

$$P_1 = \begin{pmatrix}-0.8165 & 0.0000\\ 0.4082 & -0.7071\\ 0.4082 & 0.7071\end{pmatrix}, \qquad P_2 = \begin{pmatrix}0.5773\\ 0.5773\\ 0.5774\end{pmatrix}.$$

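Hence, using equation (7.14) to find $A_1$, $A_2$, $A_3$, and $B_1$, we have

$$\begin{pmatrix}\dot{\bar x}_1\\ \dot{\bar x}_2\end{pmatrix} = \begin{pmatrix}A_1 & A_2\\ 0 & A_3\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}B_1\\ 0\end{pmatrix}u(t) = \begin{pmatrix}-2.0000 & 1.7321 & -5.6569\\ 0.0000 & -3.0000 & -19.5959\\ 0.0000 & 0.0000 & 5.0000\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}0.8165 & -0.4082\\ 0.0000 & 0.7071\\ 0.0000 & 0.0000\end{pmatrix}u(t),$$

where $\bar x_1 \in \mathbb{R}^2$, $\bar x_2 \in \mathbb{R}$, and $(A_1, B_1)$ is controllable. To verify controllability, observe that the controllability matrix Q of the subsystem $(A_1, B_1)$ has full rank:

$$\rho(Q) = \rho([B_1|A_1B_1]) = 2.$$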
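The computations of Example 7.2.2 can be reproduced approximately with the following sketch. It assumes NumPy, uses `numpy.linalg.qr` in place of MATLAB's qr, and the helper names `ctrb` and `kalman_controllable_form` are hypothetical. It also assumes, as happens in this example, that the leading $n_1$ columns of the orthogonal factor span the column space of the controllability matrix; without column pivoting this is not guaranteed in general. The explicit rank tolerance `tol=1e-8` is our choice. Column signs may differ from the printed Q, but the zero blocks do not.

```python
import numpy as np

def ctrb(A, B):
    """Controllability matrix [B | AB | ... | A^(n-1) B]."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def kalman_controllable_form(A, B, tol=1e-8):
    """Return (P, n1, P^T A P, P^T B) for the transformation P x_bar = x.

    Assumes the first n1 columns of the orthogonal QR factor span the
    column space of the controllability matrix (true in Example 7.2.2).
    """
    Qc = ctrb(A, B)
    n1 = np.linalg.matrix_rank(Qc, tol=tol)
    P, _ = np.linalg.qr(Qc, mode="complete")   # P = [P1 | P2], orthogonal
    return P, n1, P.T @ A @ P, P.T @ B

A = np.array([[3.0, 6.0, 4.0],
              [9.0, 6.0, 10.0],
              [-7.0, -7.0, -9.0]])
B = np.array([[-2.0, 1.0],
              [1.0, -2.0],
              [1.0, 1.0]]) / 3.0     # exact thirds

P, n1, A_bar, B_bar = kalman_controllable_form(A, B)
print("n1 =", n1)                    # 2
print(np.round(A_bar, 4))            # last row ~ [0, 0, 5]
print(np.round(B_bar, 4))            # last row ~ [0, 0]

# The same matrix with B rounded to six decimals, as typed in the example:
B6 = np.array([[-0.666667, 0.333333],
               [0.333333, -0.666667],
               [0.333333, 0.333333]])
print("rank with 6-digit B:", np.linalg.matrix_rank(ctrb(A, B6)))  # 3
```

With exact thirds the rank is 2; with the six-digit entries the matrix genuinely has rank 3 (its smallest singular value is small but far above machine precision), which is the discrepancy discussed in the example.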


7.2.2 Observability

Closely linked to the idea of controllability is the concept of observability. In fact, these two concepts are dual. Loosely speaking, controllability concerns the possibility of steering the state from the input; observability concerns the possibility of determining the state of a system from the output. If a dynamical equation is controllable, all the modes of the equation can be excited from the input; if a dynamical equation is observable, all the modes of the equation can be observed from the output.

7.2.2.1 Time-Varying Case

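Consider the n-dimensional linear state and output equations

$$\dot{\vec x}(t) = A(t)\vec x(t) + B(t)\vec u(t), \qquad \vec y(t) = C(t)\vec x(t), \qquad (7.15)$$

where $\vec x(\cdot) \in \mathbb{R}^n$, $\vec u(\cdot) \in \mathbb{R}^m$, $\vec y(\cdot) \in \mathbb{R}^p$, and $A(\cdot)$, $B(\cdot)$, and $C(\cdot)$ are matrices of appropriate dimensions whose elements are continuous functions on $(-\infty, \infty)$.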

Definition 7.2.2 The dynamical system (7.15) is said to be (completely) observable at $t_0$ if there exists a finite time $t_1 > t_0$ such that, for any initial state $\vec x(t_0) = \vec x_0$, the knowledge of $\vec u(t)$ and $\vec y(t)$ for $t \in [t_0, t_1]$ suffices to determine the state $\vec x_0$ uniquely.

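Example 7.2.3 Consider the system described by

$$\begin{pmatrix}\dot x_1\\ \dot x_2\end{pmatrix} = \begin{pmatrix}a_1 & 0\\ 0 & a_2\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} + \begin{pmatrix}b_1\\ b_2\end{pmatrix}u(t), \qquad y(t) = x_2(t).$$

Because the system is decoupled and $y(t) = x_2(t)$, the state $x_1(t_0)$ cannot be determined by measuring $x_2(t)$ $(= y(t))$. Hence, the system is not observable.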

We now determine conditions that guarantee observability of (7.15). To this end, by the variation of constants formula we have

$$\vec y(t) = C(t)\Big[\Phi(t,t_0)\vec x_0 + \int_{t_0}^{t}\Phi(t,s)B(s)\vec u(s)\,ds\Big].$$

In the study of observability, $\vec y(t)$ and $\vec u(t)$ are known functions (or measurements). Hence, the above equation can be rewritten as

$$C(t)\Phi(t,t_0)\vec x_0 = \bar y(t), \qquad (7.16)$$

where $\bar y(t)$ is a known function on $[t_0, t_1]$ given by

$$\bar y(t) = \vec y(t) - C(t)\int_{t_0}^{t}\Phi(t,s)B(s)\vec u(s)\,ds.$$

Question: Can we determine $\vec x_0$ from (7.16)? Multiply both sides of (7.16) by $\Phi^T(t,t_0)C^T(t)$ and integrate from $t_0$ to $t_1$. This yields

$$\Big[\int_{t_0}^{t_1}\Phi^T(t,t_0)C^T(t)C(t)\Phi(t,t_0)\,dt\Big]\vec x_0 = \int_{t_0}^{t_1}\Phi^T(t,t_0)C^T(t)\bar y(t)\,dt.$$

Therefore, if the constant matrix

$$V(t_0,t_1) = \int_{t_0}^{t_1}\Phi^T(t,t_0)C^T(t)C(t)\Phi(t,t_0)\,dt$$

is nonsingular (or, equivalently [6], all columns of $C(t)\Phi(t,t_0)$ are linearly independent on $[t_0,t_1]$), then we can determine $\vec x_0$ uniquely. In fact, the linear independence of the columns of $C(t)\Phi(t,t_0)$ is also a necessary condition for observability. To see this, assume that the system (7.15) is observable at time $t_0$ but that there exists no time $t_1 > t_0$ such that the columns of $C(\cdot)\Phi(\cdot,t_0)$ are linearly independent on $[t_0,t_1]$. Then the equation

$$C(t)\Phi(t,t_0)\vec\alpha = 0$$

has a nonzero $n \times 1$ constant vector solution $\vec\alpha$ for all $t > t_0$. Now take $\vec u \equiv 0$, so that

$$\vec y(t) = C(t)\Phi(t,t_0)\vec x(t_0)$$

for $t > t_0$. Choosing $\vec x(t_0) = \vec\alpha$ gives

$$\vec y(t) = C(t)\Phi(t,t_0)\vec\alpha = 0$$

for all $t > t_0$, which is exactly the output produced by the zero initial state. Thus $\vec\alpha$ cannot be distinguished from $\vec 0$ at the output, which contradicts the assumption of observability.

Remark 7.2.5 The above result shows that observability depends only on the matrices $C(\cdot)$ and $\Phi(\cdot, t_0)$, or equivalently, only on $C(\cdot)$ and $A(\cdot)$. Hence, in studying observability, it is convenient to assume that $u(t) \equiv 0$ and to refer to (7.15) by the pair $(A, C)$.

The results on controllability and observability suggest that for controllability we study the rows of $\Phi(t_0,\cdot)B(\cdot)$, while for observability we consider the columns of $C(\cdot)\Phi(\cdot,t_0)$. These two concepts are in fact related by the well-known Kalman Duality Theorem [6]. That is, consider the system

$$\dot{\vec x}(t) = A(t)\vec x(t) + B(t)\vec u(t), \qquad \vec y(t) = C(t)\vec x(t) \qquad (7.17)$$

and the dual system

$$\dot{\vec z}(t) = -A^T(t)\vec z(t) + C^T(t)\vec v(t), \qquad \vec w(t) = B^T(t)\vec z(t). \qquad (7.18)$$


System (7.17) is controllable (observable) at time $t_0$ if and only if the dual system (7.18) is observable (controllable) at time $t_0$ [6].

The Kalman duality result is very useful. It allows us to deduce from a controllability result the corresponding one on observability, and vice versa. For example, assume that the system matrix $A(\cdot)$ and the output matrix $C(\cdot)$ are $(n-1)$ times continuously differentiable. Then (7.15) is observable at time $t_0$ if there exists a finite time $t_1 > t_0$ such that

$$\rho\begin{pmatrix}N_0(t_1)\\ N_1(t_1)\\ \vdots\\ N_{n-1}(t_1)\end{pmatrix} = n,$$

where

$$N_0(t) = C(t), \qquad N_{k+1}(t) = N_k(t)A(t) + \frac{d}{dt}N_k(t), \qquad k = 0, 1, \dots, n-2.$$

7.2.2.2 Time-Invariant Case

Consider the linear time-invariant dynamical equation

$$\dot{\vec x}(t) = A\vec x(t), \qquad \vec y(t) = C\vec x(t). \qquad (7.19)$$

As in the controllability case, if $(A, C)$ is observable, then it is observable at every $t_0 \ge 0$, and the determination of the initial state can be achieved in any finite time interval. Hence we drop the reference to $t_0$ and $t_1$ when we discuss observability of linear time-invariant systems.

From the Kalman duality theorem, the following equivalent statements can easily be obtained:

(i) The system (7.19) is observable;

(ii) The n columns of $Ce^{At}$ are linearly independent on $[0, \infty)$;

(iii) The observability Grammian matrix

$$V(0,t) = \int_0^t e^{A^Ts}C^TCe^{As}\,ds$$

is nonsingular for any $t > 0$. Furthermore, the initial state $\vec x(0) = \vec x_0$ can be determined from

$$\vec x_0 = V^{-1}(0,t_1)\int_0^{t_1}e^{A^Tt}C^T\vec y(t)\,dt;$$

(iv) $\rho(V) = n$, where V is the $pn \times n$ observability matrix

$$V = \begin{pmatrix}C\\ CA\\ \vdots\\ CA^{n-1}\end{pmatrix}. \qquad (7.20)$$
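Condition (iv) is the easiest to check numerically. A minimal NumPy sketch (the helper name `obsv` is ours), applied to Example 7.2.3 with illustrative values $a_1 = -1$, $a_2 = -2$ (our choice; the rank does not depend on them) and to the spring-mass system of Example 7.2.4:

```python
import numpy as np

def obsv(A, C):
    """Observability matrix (7.20): [C; CA; ...; CA^(n-1)], of size pn x n."""
    rows = [C]
    for _ in range(A.shape[0] - 1):
        rows.append(rows[-1] @ A)
    return np.vstack(rows)

# Example 7.2.3 with illustrative a1 = -1, a2 = -2; the output is y = x2.
A_dec = np.diag([-1.0, -2.0])
C_dec = np.array([[0.0, 1.0]])

# Example 7.2.4 (undamped spring-mass); the output is y = x1.
A_sm = np.array([[0.0, 1.0],
                 [-1.0, 0.0]])
C_sm = np.array([[1.0, 0.0]])

rank_dec = np.linalg.matrix_rank(obsv(A_dec, C_dec))
rank_sm = np.linalg.matrix_rank(obsv(A_sm, C_sm))
print(rank_dec, rank_sm)   # 1 2: not observable / observable
```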

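Example 7.2.4 Consider a spring-mass system with no damping, described by

$$\begin{pmatrix}\dot x_1\\ \dot x_2\end{pmatrix} = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}, \qquad y(t) = x_1(t).$$

Note that we observe the displacement. The observability matrix is

$$\begin{pmatrix}C\\ CA\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},$$

which has rank 2 and, consequently, the system is observable. Let $t_0 = -\pi$, $t_1 = 0$, and suppose that we measure y(t) to be

$$y(t) = \tfrac{1}{2}\cos t + \tfrac{1}{2}\sin t$$

on $[-\pi, 0]$. We seek to find $x_1(-\pi)$ and $x_2(-\pi)$. We first note that $x_1(t) = y(t)$, and hence $x_1(-\pi) = -\frac{1}{2}$. Since $x_2 = \dot x_1 = \dot y$, we also have $x_2(-\pi) = -\frac{1}{2}$. Here, to find $x_2$ we need to differentiate y, which is an unstable process (in practice, y(t) has errors which are magnified by differentiation). Instead, we use the observability Grammian. We first compute

$$\Phi(t,-\pi) = \begin{pmatrix}-\cos t & -\sin t\\ \sin t & -\cos t\end{pmatrix},$$

so

$$C\Phi(t,-\pi) = \begin{pmatrix}-\cos t & -\sin t\end{pmatrix},$$

and

$$V(-\pi,0) = \int_{-\pi}^{0}\begin{pmatrix}-\cos t\\ -\sin t\end{pmatrix}\begin{pmatrix}-\cos t & -\sin t\end{pmatrix}dt = \frac{\pi}{2}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}.$$

Therefore,

$$\begin{pmatrix}x_1(-\pi)\\ x_2(-\pi)\end{pmatrix} = V^{-1}(-\pi,0)\int_{-\pi}^{0}\Phi^T(t,-\pi)C^Ty(t)\,dt = \frac{2}{\pi}\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\int_{-\pi}^{0}\begin{pmatrix}-\cos t\\ -\sin t\end{pmatrix}\Big(\tfrac{1}{2}\cos t + \tfrac{1}{2}\sin t\Big)dt = \begin{pmatrix}-\frac{1}{2}\\ -\frac{1}{2}\end{pmatrix},$$

as we had computed earlier.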
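The Grammian computation above can be checked numerically without differentiating the measured output. This sketch assumes NumPy, uses the closed-form transition matrix of this example rather than a general matrix exponential, and approximates both integrals with a composite trapezoidal rule on 2001 points (our choice; the integrands here are trigonometric polynomials of period π, for which this rule is extremely accurate).

```python
import numpy as np

def phi(t, t0):
    """Transition matrix Phi(t, t0) = e^{A (t - t0)} for A = [[0, 1], [-1, 0]]."""
    s = t - t0
    return np.array([[np.cos(s), np.sin(s)],
                     [-np.sin(s), np.cos(s)]])

def y_meas(t):
    """Measured output of Example 7.2.4."""
    return 0.5 * np.cos(t) + 0.5 * np.sin(t)

C = np.array([[1.0, 0.0]])
t0, t1 = -np.pi, 0.0
ts = np.linspace(t0, t1, 2001)
h = ts[1] - ts[0]
weights = np.full(ts.size, h)        # composite trapezoidal weights
weights[0] = weights[-1] = h / 2

V = np.zeros((2, 2))                 # observability Grammian V(t0, t1)
rhs = np.zeros(2)                    # integral of Phi^T C^T y dt
for w, t in zip(weights, ts):
    row = (C @ phi(t, t0))[0]        # C Phi(t, t0) as a length-2 vector
    V += w * np.outer(row, row)
    rhs += w * row * y_meas(t)

x0 = np.linalg.solve(V, rhs)
print(V)    # approx (pi/2) I
print(x0)   # approx [-0.5, -0.5]
```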

As in the controllability case, if $\rho(V) = n_1 < n$, where V is the observability matrix given by (7.20), then the state space can be divided into two subsystems, one observable and one unobservable. This is analogously called the Kalman observable canonical decomposition. More precisely, consider the time-invariant system

$$\dot{\vec x}(t) = A\vec x(t) + B\vec u(t), \qquad \vec y(t) = C\vec x(t),$$

where $\rho(V) = n_1 < n$ and V is the observability matrix. Then there exists an equivalent system of the form

$$\begin{pmatrix}\dot{\bar x}_1\\ \dot{\bar x}_2\end{pmatrix} = \begin{pmatrix}A_1 & 0\\ A_2 & A_3\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}B_1\\ B_2\end{pmatrix}u, \qquad \vec y = C_1\bar x_1,$$

where $\bar x_1 \in \mathbb{R}^{n_1}$, $\bar x_2 \in \mathbb{R}^{n-n_1}$, and $(A_1, C_1)$ is observable.

Remark 7.2.6 Since $\rho(V) = \rho(V^T)$, the same QR decomposition procedure, applied to $V^T$, can be used to find the required transformation matrix.

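Example 7.2.5 Consider a linear time-invariant system where

$$A = \begin{pmatrix}3 & 6 & 4\\ 9 & 6 & 10\\ -7 & -7 & -9\end{pmatrix}, \qquad C = \begin{pmatrix}1 & 2 & 3\\ 3 & 3 & 6\end{pmatrix}$$

(the input matrix used in the computations below is $B = C^T$). The transpose of the observability matrix is

$$V^T = (C^T|A^TC^T|(A^T)^2C^T) = \begin{pmatrix}1 & 3 & 0 & -6 & -6 & 12\\ 2 & 3 & -3 & -6 & 3 & 12\\ 3 & 6 & -3 & -12 & -3 & 24\end{pmatrix},$$

which has rank 2. Hence the state space $\mathbb{R}^3$ of the system can be decomposed into two subsystems, where one state $\bar x_1 \in \mathbb{R}^2$ is observable and the other state $\bar x_2 \in \mathbb{R}$ is unobservable. The QR decomposition of $V^T$ is $V^T = QR$, where

$$Q = \begin{pmatrix}-0.2673 & 0.7715 & -0.5774\\ -0.5345 & -0.6172 & -0.5774\\ -0.8018 & 0.1543 & 0.5774\end{pmatrix},$$

$$R = \begin{pmatrix}-3.7417 & -7.2161 & 4.0089 & 14.4321 & 2.4054 & -28.8642\\ 0 & 1.3887 & 1.3887 & -2.7775 & -6.9437 & 5.5549\\ 0 & 0 & 0.0000 & 0 & 0.0000 & 0\end{pmatrix}.$$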


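We form the matrix $P = [P_1|P_2]$ as in Example 7.2.2, where $P_1$ consists of the first two columns of Q and $P_2$ is the third column of Q. Then the equivalent system under the state variable transformation $P\bar x(t) = x(t)$ is

$$\begin{pmatrix}\dot{\bar x}_1\\ \dot{\bar x}_2\end{pmatrix} = \begin{pmatrix}P_1^TAP_1 & P_1^TAP_2\\ P_2^TAP_1 & P_2^TAP_2\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}P_1^TB\\ P_2^TB\end{pmatrix}u(t) = \begin{pmatrix}-1.0714 & -0.3712 & 0.0000\\ 4.8250 & -3.9286 & 0.0000\\ 19.4422 & -3.7417 & 5.0000\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} + \begin{pmatrix}-3.7417 & -7.2161\\ 0.0000 & 1.3887\\ 0 & 0.0000\end{pmatrix}u(t)$$

and

$$\vec y = [C_1|C_2]\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} = \begin{pmatrix}-3.7417 & 0.0000 & 0\\ -7.2161 & 1.3887 & 0\end{pmatrix}\begin{pmatrix}\bar x_1\\ \bar x_2\end{pmatrix} = C_1\bar x_1,$$

where $C_1 = CP_1$. To verify observability, note that the observability matrix $V_1$ of the subsystem $(A_1, C_1)$ has full rank:

$$\rho(V_1) = \rho(V_1^T) = \rho([C_1^T|A_1^TC_1^T]) = 2.$$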
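A NumPy sketch of the same construction via the QR factorization of $V^T$ (assuming NumPy; the rank tolerance `tol=1e-8` is our choice, and signs of the computed columns may differ from the printed Q, while the zero blocks are the same):

```python
import numpy as np

A = np.array([[3.0, 6.0, 4.0],
              [9.0, 6.0, 10.0],
              [-7.0, -7.0, -9.0]])
C = np.array([[1.0, 2.0, 3.0],
              [3.0, 3.0, 6.0]])
n = A.shape[0]

# V^T = [C^T | A^T C^T | (A^T)^2 C^T]
blocks = [C.T]
for _ in range(n - 1):
    blocks.append(A.T @ blocks[-1])
VT = np.hstack(blocks)

n1 = np.linalg.matrix_rank(VT, tol=1e-8)   # 2
P, _ = np.linalg.qr(VT)                    # P = [P1 | P2], 3 x 3 orthogonal
A_bar = P.T @ A @ P                        # upper-right block P1^T A P2 ~ 0
C_bar = C @ P                              # last column C P2 ~ 0
A1, C1 = A_bar[:n1, :n1], C_bar[:, :n1]

# Observability of the reduced pair (A1, C1)
V1 = np.vstack([C1, C1 @ A1])
print(n1, np.linalg.matrix_rank(V1))       # 2 2
print(np.round(A_bar, 4))
```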

7.3 Design of State Feedback Control Systems and State Estimators

We begin this section by reviewing the concept of stability. Consider the linear time-invariant controlled system $\dot x(t) = Ax(t) + Bu(t)$ with initial condition $x(t_0) = x_0$. For $u(t) = 0$, the solution is given by

$$x(t) = \Phi(t; t_0, x_0) = \Phi(t, t_0)x_0.$$

Definition 7.3.1 A state $x_e$ of a dynamical equation is said to be an equilibrium state at $t_0$ if

$$x_e = \Phi(t; t_0, x_e)$$

for all $t \ge t_0$.

Therefore, if a trajectory reaches an equilibrium state and no input is applied, the trajectory will stay at the equilibrium state forever; that is, $x(t) = x_e$ (so $\dot x(t) = 0$) for all $t \ge t_0$. To find $x_e$, set the right side of the differential equation equal to zero. For example, to find $x_e$ for $\dot x(t) = A(t)x(t)$ we solve

$$A(t)x = 0.$$

Hence, $x = 0$ is always an equilibrium state of $\dot x(t) = A(t)x(t)$.

Definition 7.3.2

(a) An equilibrium state $x_e$ is said to be stable in the sense of Lyapunov (i.s.L.) at $t_0$ if and only if for every $\varepsilon > 0$ there exists a $\delta(\varepsilon, t_0) > 0$ (which depends on $\varepsilon$ and $t_0$) such that if $\|x_0 - x_e\| < \delta(\varepsilon, t_0)$, then $\|\Phi(t; t_0, x_0) - x_e\| < \varepsilon$ for all $t \ge t_0$.

(b) If $\delta$ depends only on $\varepsilon$ but not on $t_0$, then we say that $x_e$ is uniformly stable i.s.L.

Basically, $x_e$ is stable i.s.L. if the response due to any initial state that is sufficiently near $x_e$ does not move far away from $x_e$.

Remark 7.3.1 If the state $x_e$ is uniformly stable i.s.L. (u.s.i.s.L.), then it is stable i.s.L. (s.i.s.L.). The converse may not be true.

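Example 7.3.1 Consider a pendulum as depicted in Figure 7.5. Applying Newton's second law of motion yields

$$u(t)\cos\theta - mg\sin\theta = ml\ddot\theta.$$

Let $x_1 = \theta$ and $x_2 = \dot\theta$. Then

$$\frac{d}{dt}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}x_2\\ -\frac{g}{l}\sin x_1 + \frac{\cos x_1}{ml}u\end{pmatrix}.$$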

FIGURE 7.5: A pendulum (figure labels: length l, angle θ, applied force u(t), gravitational force mg).


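For the equilibrium state, we take u = 0 and set $\frac{d}{dt}(x_1, x_2)^T = \vec 0$: we find

$$\begin{pmatrix}x_2\\ -\frac{g}{l}\sin x_1\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix},$$

which implies

$$x = \begin{pmatrix}k\pi\\ 0\end{pmatrix}, \qquad k = 0, \pm1, \pm2, \dots$$

Note that the equilibrium states satisfy the following:

• $x_e = (k\pi, 0)^T$, $k = 0, \pm2, \dots$, are u.s.i.s.L.;

• $x_e = (k\pi, 0)^T$, $k = \pm1, \pm3, \dots$, are not s.i.s.L. (Why?)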

Definition 7.3.3

(a) An equilibrium state $x_e$ is said to be asymptotically stable at $t_0$ if it is stable i.s.L. at $t_0$ and if every motion starting sufficiently near $x_e$ converges to $x_e$ as $t \to \infty$; that is, there exists a $\gamma > 0$ such that if $\|x(t_1) - x_e\| \le \gamma$, then for any $\varepsilon > 0$ there exists $T(\gamma, \varepsilon, t_1) > 0$ (that depends on $\gamma$, $\varepsilon$, $t_1$) such that

$$\|\Phi(t; t_1, x(t_1)) - x_e\| \le \varepsilon$$

for all $t \ge t_1 + T(\gamma, \varepsilon, t_1)$.

(b) If an equilibrium state $x_e$ is u.s.i.s.L. and T can be chosen independently of $t_1$ in the definition of asymptotic stability, then we say that $x_e$ is uniformly asymptotically stable over $[t_0, \infty)$.

Remark 7.3.2 For the linear time-invariant system, it can be shown that [6]:

(a) Every equilibrium state of $\dot x(t) = Ax(t)$ is s.i.s.L. if and only if

– all eigenvalues of A have nonpositive real parts (negative or zero), and

– for every eigenvalue on the imaginary axis ($\mathrm{Re}(\lambda) = 0$) with multiplicity m there correspond exactly m linearly independent eigenvectors of A.

(b) The zero state of $\dot x(t) = Ax(t)$ is asymptotically stable if and only if all the eigenvalues of A have negative real parts.

Let us consider a linear time-invariant control system

$$\dot{\vec x}(t) = A\vec x(t) + B\vec u(t), \qquad (7.21)$$

where $x(\cdot) \in \mathbb{R}^n$, $u(\cdot) \in \mathbb{R}^m$, and A, B are matrices of appropriate dimensions. The system (7.21) often arises as the linearization of some nonlinear system about an equilibrium point or about the original system dynamics of interest. Now assume that the homogeneous system ($u \equiv 0$)

$$\dot{\vec x}(t) = A\vec x(t) \qquad (7.22)$$


is not asymptotically stable. In control theory, the aim is to compel or control a system to behave in some desired fashion. Thus for system (7.21), an objective would be to use the control $\vec u(\cdot)$ so that the system becomes asymptotically stable. The traditionally favored means of accomplishing this task is to use a feedback relation

$$\vec u = K\vec x,$$

where K is an $m \times n$ matrix. The problem is thus to find the gain or feedback matrix K so that

$$\dot{\vec x}(t) = A\vec x(t) + BK\vec x(t) = (A + BK)\vec x(t) \qquad (7.23)$$

is asymptotically stable (i.e., every eigenvalue of $A + BK$ has a negative real part). The system (7.23) is called a closed-loop or feedback control system. The main features of a feedback control system are represented in Figure 7.6.

FIGURE 7.6: A closed-loop or feedback control system. (Block diagram with blocks Goals, Control law, Controlled system with state variables $x_i$, i = 1, 2, ..., n, and Output monitors (sensors); signals: control variables (input) $u_i$, i = 1, 2, ..., m; output $y_i$, i = 1, 2, ..., p; disturbances; measurement errors; feedback.)

Another class of control systems, called open-loop control systems, is represented in Figure 7.7.

In an open-loop system, the control $\vec u(\cdot)$ is computed based on the goals for the system and all available a priori knowledge about the system. The input $\vec u(\cdot)$ is in no way influenced by the output $\vec y(\cdot)$, and thus if unexpected disturbances act upon the system or there are changes in operating conditions, the output $\vec y(\cdot)$ will not behave precisely as desired. On the other hand, in a closed-loop system there is a feedback of information concerning the outputs to the controller. Thus, a feedback system is better able to adapt to changes in the system parameters or to unexpected disturbances. However, if the measurement errors are large, closed-loop control performance might be inferior to open-loop control.

FIGURE 7.7: An open-loop control system. (Block diagram with blocks Goals, Control law, and Controlled system with state variables $x_i$, i = 1, 2, ..., n; signals: control variables (input) $u_i$, i = 1, 2, ..., m; output $y_i$, i = 1, 2, ..., p; disturbances.)

7.3.1 Effect of State Feedback on System Properties

In this section we discuss how state feedback affects system properties such as stability, controllability, and observability.

7.3.1.1 Stability

Stability of a linear time-invariant system depends entirely on the location of the eigenvalues of the system matrix. A feedback control law yields closed-loop eigenvalues that differ from the open-loop eigenvalues. In addition, small time delays in state feedback can destabilize a system that is asymptotically stable in the absence of such delays. Such time delays might occur in computing feedback controls.

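Example 7.3.2 Consider the linear, time-invariant system

$$\dot{\vec x}(t) = \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix}\vec x(t) + \begin{pmatrix}0\\ 1\end{pmatrix}u(t)$$

that has open-loop eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 2$ (unstable). Now consider a state feedback control law of the form

$$u = K\vec x,$$

where $K = (-9\ \ -5)$. The actual computation of K will be discussed later in this chapter. The closed-loop system is then

$$\dot{\vec x}(t) = \Big[\begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} + \begin{pmatrix}0\\ 1\end{pmatrix}\begin{pmatrix}-9 & -5\end{pmatrix}\Big]\vec x(t) = \begin{pmatrix}2 & 1\\ -9 & -4\end{pmatrix}\vec x(t),$$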

which has eigenvalues $\lambda_1 = \lambda_2 = -1$ and thus is asymptotically (exponentially) stable. Let us now assume that there is a small time delay in the feedback loop of the form

$$u(t) = K\vec x(t - h),$$

where $h > 0$. The closed-loop system then becomes

$$\dot{\vec x}(t) = \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix}\vec x(t) + \begin{pmatrix}0 & 0\\ -9 & -5\end{pmatrix}\vec x(t - h), \qquad (7.24)$$

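which is a delay differential equation. Taking the Laplace transform of (7.24) and assuming zero initial data ($\vec x(\theta) = \vec 0$ for $-h \le \theta \le 0$), we obtain

$$\Big[sI - \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} - \begin{pmatrix}0 & 0\\ -9 & -5\end{pmatrix}e^{-hs}\Big]X(s) = 0,$$

where $X(s) = \mathcal{L}\{x(t)\}$ is the Laplace transform of x(t). The eigenvalues of the closed-loop system are given by the roots of

$$\det\Delta(\lambda) = 0, \qquad (7.25)$$

where

$$\Delta(\lambda) = \lambda I - \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} - \begin{pmatrix}0 & 0\\ -9 & -5\end{pmatrix}e^{-h\lambda}$$

(see, e.g., [10]). We use a method described in [14] to compute the roots of (7.25). For $h = 0.1$, all roots of (7.25) have negative real parts. However, when $h = 0.22$, a pair of complex conjugate roots of (7.25) has positive real part,

$$\lambda = 0.03122 \pm 4.43793i.$$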

This example illustrates that a sufficiently large time delay, when introduced into the feedback law, can destabilize a system that is exponentially stable in the absence of delays. This destabilizing effect of delays in state feedback is also observed for certain infinite-dimensional systems (see [7, 8]).
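Expanding the determinant gives $\det\Delta(\lambda) = \lambda^2 - 3\lambda + 2 + (5\lambda - 1)e^{-h\lambda}$, which reduces to $(\lambda + 1)^2$ when $h = 0$. The sketch below refines the reported root for $h = 0.22$ with Newton's method, starting from the guess $4.4i$ (our choice). Newton's method is a stand-in here, not the method of [14], and it only finds one root near the starting guess; by itself it does not establish stability or instability for a given h.

```python
import cmath

def char_fn(lam, h):
    """det Delta(lambda) for (7.24): lambda^2 - 3 lambda + 2 + (5 lambda - 1) e^{-h lambda}."""
    e = cmath.exp(-h * lam)
    return lam**2 - 3 * lam + 2 + (5 * lam - 1) * e

def char_fn_prime(lam, h):
    """Derivative of char_fn with respect to lambda."""
    e = cmath.exp(-h * lam)
    return 2 * lam - 3 + (5 - h * (5 * lam - 1)) * e

def newton_root(lam0, h, tol=1e-12, max_iter=50):
    """Newton iteration for a root of char_fn near the guess lam0."""
    lam = complex(lam0)
    for _ in range(max_iter):
        step = char_fn(lam, h) / char_fn_prime(lam, h)
        lam -= step
        if abs(step) < tol:
            break
    return lam

# With h = 0 the characteristic function is (lambda + 1)^2, so lambda = -1 is a root.
print(abs(char_fn(-1.0, 0.0)))
root = newton_root(4.4j, h=0.22)
print(root)   # approx 0.03122 + 4.43793j, i.e. Re(lambda) > 0
```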

7.3.1.2 Controllability

Consider the linear, time-invariant system (7.21) and assume that $\vec u(t)$ is decomposed into two parts

$$\vec u(t) = K\vec x(t) + \vec u_r(t),$$

where $K\vec x(t)$ is the state feedback and $\vec u_r(t)$ is the reference input. The closed-loop system has the form

$$\dot{\vec x}(t) = (A + BK)\vec x(t) + B\vec u_r(t).$$

We now pose the following question: if $(A, B)$ is controllable, is $(A + BK, B)$ controllable? That is, is controllability invariant under state feedback?

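We note that the controllability matrix of the closed-loop system can be written as

$$\big(B\,|\,(A+BK)B\,|\,\cdots\,|\,(A+BK)^{n-1}B\big) = \big(B\,|\,AB\,|\,\cdots\,|\,A^{n-1}B\big)\underbrace{\begin{pmatrix}I & KB & KAB + (KB)^2 & \cdots & *\\ 0 & I & KB & \cdots & *\\ \vdots & & \ddots & \ddots & \vdots\\ \vdots & & & I & KB\\ 0 & \cdots & \cdots & 0 & I\end{pmatrix}}_{M}.$$

Since the matrix M on the right side of the equation is block upper triangular with identity blocks on its diagonal, it is nonsingular, and we have

$$\rho([B|(A+BK)B|\dots|(A+BK)^{n-1}B]) = \rho([B|AB|\dots|A^{n-1}B]).$$

This establishes that the pair $(A + BK, B)$ is controllable if and only if the pair $(A, B)$ is controllable.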

7.3.1.3 Observability

Analogously, we now want to know whether observability is invariant under state feedback. Let us consider a particular example.

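Example 7.3.3 Consider the following system

$$\dot{\vec x}(t) = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\vec x(t) + \begin{pmatrix}b_1\\ b_2\end{pmatrix}u(t), \qquad y(t) = \begin{pmatrix}c_1 & c_2\end{pmatrix}\vec x(t).$$

The observability matrix is

$$V = \begin{pmatrix}C\\ CA\end{pmatrix} = \begin{pmatrix}c_1 & c_2\\ c_1 & c_1 + c_2\end{pmatrix},$$

whose determinant is $c_1^2$, so it has rank 2 if and only if $c_1 \ne 0$ ($c_2$ is arbitrary). Hence, in particular, the system

$$\dot{\vec x}(t) = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\vec x(t) + \begin{pmatrix}b_1\\ b_2\end{pmatrix}u(t), \qquad y(t) = \begin{pmatrix}1 & 0\end{pmatrix}\vec x(t)$$

is observable. Now consider the feedback law $u(t) = K\vec x(t)$ with $K = (k_1\ \ k_2)$. Is the pair $(A + BK, C)$ observable? The new observability matrix is

$$V = \begin{pmatrix}C\\ C(A+BK)\end{pmatrix} = \begin{pmatrix}1 & 0\\ 1 + b_1k_1 & 1 + b_1k_2\end{pmatrix},$$

which has rank 2 if and only if $1 + b_1k_2 \ne 0$.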

Example 7.3.3 illustrates the general fact that observability is not invariant under state feedback.
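A numerical illustration of both results with NumPy, for the system of Example 7.3.3 with the illustrative choices $b_1 = b_2 = 1$ and $K = (0\ \ -1)$, so that $1 + b_1k_2 = 0$ (these values and the helper names are ours):

```python
import numpy as np

def ctrb_rank(A, B):
    n = A.shape[0]
    Q = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
    return np.linalg.matrix_rank(Q)

def obsv_rank(A, C):
    n = A.shape[0]
    V = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])
    return np.linalg.matrix_rank(V)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[1.0],
              [1.0]])          # illustrative choice b1 = b2 = 1
C = np.array([[1.0, 0.0]])
K = np.array([[0.0, -1.0]])    # k2 = -1/b1, so 1 + b1*k2 = 0
A_cl = A + B @ K

print(ctrb_rank(A, B), ctrb_rank(A_cl, B))   # 2 2: controllability kept
print(obsv_rank(A, C), obsv_rank(A_cl, C))   # 2 1: observability lost
```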

7.4 Pole Placement (Relocation) Problem

Equation (7.23) indicates that the eigenvalues of the closed-loop system using state feedback $\vec u = K\vec x$, where K is a constant $m \times n$ feedback matrix, are the roots of

$$|\lambda I - A - BK| = 0. \qquad (7.26)$$

We now consider the following problem: given a set of desired eigenvalues $\lambda_1^D, \lambda_2^D, \dots, \lambda_n^D$ in the complex plane, can we find the state variable feedback (gain) matrix K so that the poles of $A_C \equiv A + BK$ are the desired ones? That is, can we relocate the poles of the open-loop system, $\lambda_1, \lambda_2, \dots, \lambda_n$ (the roots of $|\lambda I - A| = 0$), to the desired locations $\lambda_1^D, \lambda_2^D, \dots, \lambda_n^D$ in the complex plane by state feedback? We note that since the elements of $A_C = A + BK$ are real, if any desired poles are complex, they must appear in complex conjugate pairs.

The answer to the above question is provided by the following result, which gives an important link between controllability and pole placement using state feedback. If the linear, time-invariant control system (7.21) is controllable, then by the state feedback $\vec u = K\vec x$, where K is an $m \times n$ constant, real matrix, the eigenvalues of $A_C = A + BK$ can be arbitrarily placed anywhere in the complex plane. The proof of this result can be found in [9] and [16]. In the following, we show how to construct the appropriate feedback gain K by two different methods.

(i) Direct Method

The goal is to have the following equality:

$$|\lambda I - (A + BK)| = \prod_{i=1}^{n}(\lambda - \lambda_i^D).$$

Expanding both sides of the above equation, we find

$$a_n^C + a_{n-1}^C\lambda + \dots + a_1^C\lambda^{n-1} + \lambda^n = a_n^D + a_{n-1}^D\lambda + \dots + a_1^D\lambda^{n-1} + \lambda^n. \qquad (7.27)$$


The coefficients $a_i^C$, for $i = 1, 2, \dots, n$, are functions of the elements $k_{ij}$, for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$, of the feedback gain matrix K. By equating the coefficients of like powers of λ in (7.27), we obtain n equations in mn unknowns. This shows that the feedback gain in pole placement is unique only when $m = 1$ (the single-input case). Otherwise, many gains give the same desired locations for the closed-loop eigenvalues. We will first restrict our attention to the single-input case.

Example 7.4.1 Consider the system (7.21), where

$$A = \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix}, \qquad B = \begin{pmatrix}0\\ 1\end{pmatrix}.$$

It is a simple exercise to show that $\rho([B|AB]) = 2$, which implies that $(A, B)$ is controllable. Find the state variable feedback gain $K = (k_1\ \ k_2)$ so that the eigenvalues can be relocated anywhere in the complex plane.

Let $\lambda_1^D$ and $\lambda_2^D$ be arbitrarily given in $\mathbb{C}$ (if they have nonzero imaginary parts, $\lambda_1^D = \overline{\lambda_2^D}$). The problem is to find $K = (k_1\ \ k_2)$ so that

$$|\lambda I - (A + BK)| = (\lambda - \lambda_1^D)(\lambda - \lambda_2^D).$$

Expanding both sides, we obtain the following equation:

$$\lambda^2 - \lambda(2 + 1 + k_2) + 2(1 + k_2) - k_1 = \lambda^2 - (\lambda_1^D + \lambda_2^D)\lambda + \lambda_1^D\lambda_2^D.$$

Simplifying and equating the coefficients of like powers of λ, we obtain the following two equations in the two unknowns $k_1$ and $k_2$:

$$\lambda_1^D + \lambda_2^D = 3 + k_2, \qquad \lambda_1^D\lambda_2^D = 2 + 2k_2 - k_1,$$

from which we obtain

$$k_1 = -\lambda_1^D\lambda_2^D + 2(1 + k_2), \qquad k_2 = (\lambda_1^D + \lambda_2^D) - 3.$$

For example, if $\lambda_1^D = -1$ and $\lambda_2^D = -1$, then $k_1 = -9$ and $k_2 = -5$, so

$$A + BK = \begin{pmatrix}2 & 1\\ -9 & -4\end{pmatrix}$$

with characteristic polynomial

$$|\lambda I - (A + BK)| = (\lambda - 2)(\lambda + 4) + 9 = (\lambda + 1)^2.$$

Hence, the feedback gain $K = (-9\ \ -5)$ gives the desired closed-loop eigenvalues.
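The formulas of Example 7.4.1 can be checked numerically. The sketch below assumes NumPy; `gain_741` is a hypothetical helper name, and the complex-conjugate pair $-1 \pm 2i$ is an additional desired-pole choice of ours.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

def gain_741(l1, l2):
    """K = (k1, k2) from the formulas derived in Example 7.4.1."""
    k2 = (l1 + l2) - 3
    k1 = -l1 * l2 + 2 * (1 + k2)
    # For a conjugate pair the imaginary parts cancel; keep the real part.
    return np.real(np.array([[k1, k2]]))

K = gain_741(-1.0, -1.0)
print(K)                         # K = (-9, -5)
print(np.poly(A + B @ K))        # ~[1, 2, 1], i.e. (lambda + 1)^2

K_c = gain_741(-1 + 2j, -1 - 2j)
print(K_c, np.linalg.eigvals(A + B @ K_c))   # eigenvalues ~ -1 +/- 2i
```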


It should be emphasized that some eigenvalue relocation may be achievable with state feedback even when $(A, B)$ is not controllable.

Example 7.4.2 Consider the system (7.21) with

$$A = \begin{pmatrix}1 & 1\\ 0 & -1\end{pmatrix}, \qquad B = \begin{pmatrix}1\\ 0\end{pmatrix}.$$

Here, the controllability matrix $[B|AB]$ has rank 1, and therefore $(A, B)$ is not controllable. Nevertheless, let us now consider the state feedback $\vec u = K\vec x$ with $K = (k_1\ \ k_2)$. Equating the coefficients of like powers of λ in

$$|\lambda I - (A + BK)| = (\lambda - \lambda_1^D)(\lambda - \lambda_2^D),$$

we get

$$k_1 = \lambda_1^D + \lambda_2^D, \qquad -(1 + k_1) = \lambda_1^D\lambda_2^D.$$

Hence, if $\lambda_1^D = -2$ and $\lambda_2^D = -1$, then $k_1 = -3$ with $k_2$ arbitrary gives the desired closed-loop eigenvalues. That is, just because $(A, B)$ is not controllable does not mean that we cannot achieve some desired pole relocations. However, if $\lambda_1^D = -2$ and $\lambda_2^D = -3$, then

$$k_1 = \lambda_1^D + \lambda_2^D = -5, \quad\text{but}\quad -(1 + k_1) = 4 \ne \lambda_1^D\lambda_2^D = 6.$$

That is, the closed-loop poles $\lambda_1^D = -2$ and $\lambda_2^D = -3$ cannot be achieved with any feedback gain $K = (k_1\ \ k_2)$.
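The obstruction in Example 7.4.2 is easy to see numerically: $A + BK = \begin{pmatrix}1 + k_1 & 1 + k_2\\ 0 & -1\end{pmatrix}$ stays upper triangular with (2,2) entry $-1$ for every K, so the uncontrollable eigenvalue $-1$ can never be moved. A small NumPy sketch (names are ours):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, -1.0]])
B = np.array([[1.0],
              [0.0]])

def closed_loop_poly(k1, k2):
    """Coefficients [1, c1, c0] of |lambda I - (A + BK)| for K = (k1, k2)."""
    return np.poly(A + B @ np.array([[k1, k2]]))

# k2 has no effect on the closed-loop characteristic polynomial:
print(closed_loop_poly(-3.0, 0.0), closed_loop_poly(-3.0, 10.0))  # both ~[1, 3, 2]
# Matching the lambda-coefficient of (lambda + 2)(lambda + 3) forces k1 = -5,
# which leaves the constant term 4 instead of 6:
print(closed_loop_poly(-5.0, 0.0))                                 # ~[1, 5, 4]
```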

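(ii) Use of Controllable Canonical Form

Recall that a scalar nth-order differential equation

$$\frac{d^ny}{dt^n} + a_1\frac{d^{n-1}y}{dt^{n-1}} + \dots + a_{n-1}\frac{dy}{dt} + a_ny = u(t)$$

can be written equivalently as a first-order system of differential equations as follows:

$$\begin{pmatrix}\dot x_1\\ \dot x_2\\ \dot x_3\\ \vdots\\ \dot x_n\end{pmatrix} = \begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\\ \vdots\\ x_n\end{pmatrix} + \begin{pmatrix}0\\ 0\\ 0\\ \vdots\\ 1\end{pmatrix}u(t), \qquad (7.28)$$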


where $x_1 = y$, $x_2 = \dot y$, ..., $x_n = y^{(n-1)}$. We will now establish that for linear, time-invariant, single-input control systems, if $(A, B)$ is controllable, then there exists a coordinate transformation such that the new equivalent dynamical equation has the form (7.28), which is called the controllable canonical form. The word "controllable" is used because (7.28) is indeed controllable. More precisely, consider the linear control system with single input

$$\dot{\vec x}(t) = A\vec x(t) + \vec b\,u(t) \qquad (7.29)$$

and suppose that $(A, \vec b)$ is controllable. Then there is a nonsingular coordinate transformation

$$\vec z(t) = P^{-1}\vec x(t)$$

such that the equivalent dynamical equation

$$\dot{\vec z}(t) = P^{-1}AP\vec z(t) + P^{-1}\vec b\,u(t) \qquad (7.30)$$

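has the form (7.28). To construct the transformation matrix $P^{-1}$, we denote $\vec p = (p_1\ p_2\ \dots\ p_n)^T \in \mathbb{R}^n$ and assume that the coordinate transformation $P^{-1}$ has the form

$$P^{-1} = \begin{pmatrix}\vec p^{\,T}\\ \vec p^{\,T}A\\ \vdots\\ \vec p^{\,T}A^{n-1}\end{pmatrix}$$

and that P is partitioned as

$$P = (\vec q_1\ \vec q_2\ \dots\ \vec q_n),$$

where $\vec q_i \in \mathbb{R}^n$. Since $P^{-1}P = I$, we have

$$\begin{pmatrix}\vec p^{\,T}\\ \vec p^{\,T}A\\ \vdots\\ \vec p^{\,T}A^{n-1}\end{pmatrix}(\vec q_1\ \vec q_2\ \dots\ \vec q_n) = \begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & 1\end{pmatrix}$$

or

$$\begin{pmatrix}\vec p^{\,T}\vec q_1 & \vec p^{\,T}\vec q_2 & \cdots & \vec p^{\,T}\vec q_n\\ \vec p^{\,T}A\vec q_1 & \vec p^{\,T}A\vec q_2 & \cdots & \vec p^{\,T}A\vec q_n\\ \vdots & \vdots & & \vdots\\ \vec p^{\,T}A^{n-1}\vec q_1 & \vec p^{\,T}A^{n-1}\vec q_2 & \cdots & \vec p^{\,T}A^{n-1}\vec q_n\end{pmatrix} = \begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & 1\end{pmatrix}. \qquad (7.31)$$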


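Also, we have

$$P^{-1}AP = \begin{pmatrix}\vec p^{\,T}\\ \vec p^{\,T}A\\ \vdots\\ \vec p^{\,T}A^{n-1}\end{pmatrix}(A\vec q_1\ A\vec q_2\ \cdots\ A\vec q_n) = \begin{pmatrix}\vec p^{\,T}A\vec q_1 & \vec p^{\,T}A\vec q_2 & \cdots & \vec p^{\,T}A\vec q_n\\ \vec p^{\,T}A^2\vec q_1 & \vec p^{\,T}A^2\vec q_2 & \cdots & \vec p^{\,T}A^2\vec q_n\\ \vdots & \vdots & & \vdots\\ \vec p^{\,T}A^n\vec q_1 & \vec p^{\,T}A^n\vec q_2 & \cdots & \vec p^{\,T}A^n\vec q_n\end{pmatrix}.$$

Comparing rows 1 through n − 1 of this matrix with rows 2 through n of (7.31), we obtain

$$P^{-1}AP = \begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ \vec p^{\,T}A^n\vec q_1 & \vec p^{\,T}A^n\vec q_2 & \cdots & \cdots & \vec p^{\,T}A^n\vec q_n\end{pmatrix}.$$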

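It therefore remains to enforce the condition

$$P^{-1}\vec b = \begin{pmatrix}\vec p^{\,T}\vec b\\ \vec p^{\,T}A\vec b\\ \vdots\\ \vec p^{\,T}A^{n-1}\vec b\end{pmatrix} = \begin{pmatrix}0\\ 0\\ \vdots\\ 1\end{pmatrix}$$

or, equivalently,

$$\vec p^{\,T}\big(\vec b\ \ A\vec b\ \ \cdots\ \ A^{n-1}\vec b\big) = \big(0\ \ 0\ \ \cdots\ \ 1\big).$$

This is a system of n equations in the n unknowns $\vec p = (p_1\ p_2\ \dots\ p_n)^T$, which has a unique solution since the matrix $(\vec b\ \ A\vec b\ \ \cdots\ \ A^{n-1}\vec b)$ has full rank, i.e., is nonsingular.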

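Example 7.4.3 Transform the following control system

$$\dot{\vec x}(t) = \begin{pmatrix}13 & 35\\ -6 & -16\end{pmatrix}\vec x(t) + \begin{pmatrix}-2\\ 1\end{pmatrix}u(t)$$

into controllable canonical form.

We first note that $\rho([B|AB]) = 2$ and, consequently, there exists a unique coordinate transformation matrix P such that the equivalent dynamical equation in the new coordinates is in controllable canonical form. Let $\varphi = (p_1\ \ p_2)^T$ and assume that

$$P^{-1} = \begin{pmatrix}\varphi^T\\ \varphi^TA\end{pmatrix}.$$

The vector φ is the unique solution of

$$\begin{pmatrix}\varphi^T\vec b\\ \varphi^TA\vec b\end{pmatrix} = \begin{pmatrix}0\\ 1\end{pmatrix},$$

that is, since $A\vec b = (9\ \ -4)^T$,

$$-2p_1 + p_2 = 0, \qquad 9p_1 - 4p_2 = 1,$$

which has the solution $p_1 = 1$ and $p_2 = 2$. Therefore,

$$P^{-1} = \begin{pmatrix}1 & 2\\ 1 & 3\end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix}3 & -2\\ -1 & 1\end{pmatrix}.$$

We check that

$$P^{-1}AP = \begin{pmatrix}0 & 1\\ -2 & -3\end{pmatrix} \quad\text{and}\quad P^{-1}\vec b = \begin{pmatrix}0\\ 1\end{pmatrix},$$

which is in controllable canonical form.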
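The construction used in Example 7.4.3 (solve $\vec p^{\,T}Q = (0\ \cdots\ 0\ 1)$, then stack the rows $\vec p^{\,T}A^i$) can be sketched in NumPy as follows; the variable names are ours.

```python
import numpy as np

A = np.array([[13.0, 35.0],
              [-6.0, -16.0]])
b = np.array([[-2.0],
              [1.0]])
n = A.shape[0]

# Controllability matrix Q = [b | Ab | ... | A^(n-1) b]
Q = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])

# p^T Q = (0, ..., 0, 1)  <=>  Q^T p = e_n
e_n = np.zeros(n)
e_n[-1] = 1.0
p = np.linalg.solve(Q.T, e_n)

# Rows p^T, p^T A, ..., p^T A^(n-1) form P^{-1}
P_inv = np.vstack([p @ np.linalg.matrix_power(A, i) for i in range(n)])
P = np.linalg.inv(P_inv)

print(p)                # approx [1, 2]
print(P_inv @ A @ P)    # approx [[0, 1], [-2, -3]]
print(P_inv @ b)        # approx [[0], [1]]
```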

We note that the elements $a_i$, for $i = 1, 2, \dots, n$, in the last row of $P^{-1}AP$ in (7.28) are also the coefficients of the characteristic polynomial of A; that is,

$$|\lambda I - A| = \lambda^n + a_1\lambda^{n-1} + a_2\lambda^{n-2} + \dots + a_{n-1}\lambda + a_n.$$

The controllability matrices of (7.29) and (7.30) are, respectively, given by

$$Q = [\vec b|A\vec b|\cdots|A^{n-1}\vec b],$$

$$\bar Q = [P^{-1}\vec b\,|\,P^{-1}AP\,P^{-1}\vec b\,|\cdots|\,P^{-1}A^{n-1}P\,P^{-1}\vec b] = P^{-1}[\vec b|A\vec b|\cdots|A^{n-1}\vec b] = P^{-1}Q. \qquad (7.32)$$

The above equation gives a link between the coordinate transformation and the controllability matrices. Hence, another way to compute $P^{-1}$ is

$$P^{-1} = \bar Q Q^{-1},$$

where Q is nonsingular by the controllability assumption on (7.29). Since (7.30) has the form (7.28), $\bar Q$ depends only on the coefficients $a_i$ of the characteristic polynomial of A.

Let the equivalent dynamical equation (7.30) be in the controllable canonical form (7.28) and let

$$\vec u(t) = K\vec z(t),$$

where $K = (k_n\ \ k_{n-1}\ \ \cdots\ \ k_1)$. The closed-loop system is then given by

$$\dot{\vec z}(t) = (P^{-1}AP + P^{-1}\vec bK)\vec z(t),$$


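where

$$A_C = P^{-1}AP + P^{-1}\vec bK = \begin{pmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ -\bar a_n & -\bar a_{n-1} & -\bar a_{n-2} & \cdots & -\bar a_1\end{pmatrix}$$

and

$$\bar a_i = a_i - k_i.$$

The characteristic polynomial of $A_C$ has the form

$$|\lambda I - A_C| = \lambda^n + \bar a_1\lambda^{n-1} + \cdots + \bar a_{n-1}\lambda + \bar a_n.$$

By equating the coefficients $\bar a_i$ to those of $\prod_{i=1}^{n}(\lambda - \lambda_i^D)$, we can solve for the elements $k_i$ of the feedback gain matrix K. We summarize the procedure for computing K in the following algorithm.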

Algorithm

1. Find the characteristic polynomial of A:
$$|\lambda I - A| = \lambda^n + a_1\lambda^{n-1} + \cdots + a_{n-1}\lambda + a_n.$$

2. Compute
$$\prod_{i=1}^{n}(\lambda - \lambda_i^D) = \lambda^n + a_1^D\lambda^{n-1} + \cdots + a_{n-1}^D\lambda + a_n^D.$$

3. Compute $k_i = a_i - a_i^D$ for $i = 1, 2, \dots, n$.

4. Compute the coordinate transformation $P^{-1}$ by (7.32).

5. The feedback law for the system in canonical form (7.30) is
$$\vec u(t) = K\vec z(t).$$

6. The feedback law for the original system (7.29) is
$$\vec u(t) = KP^{-1}\vec x(t).$$


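Example 7.4.4 Consider the previous example, where

$$\dot{\vec x}(t) = \begin{pmatrix}13 & 35\\ -6 & -16\end{pmatrix}\vec x(t) + \begin{pmatrix}-2\\ 1\end{pmatrix}u(t),$$

and the equivalent dynamical equation in controllable canonical form

$$\dot{\vec z}(t) = \begin{pmatrix}0 & 1\\ -2 & -3\end{pmatrix}\vec z(t) + \begin{pmatrix}0\\ 1\end{pmatrix}u(t),$$

where $\vec z(t) = P^{-1}\vec x(t)$ and

$$P^{-1} = \begin{pmatrix}1 & 2\\ 1 & 3\end{pmatrix}.$$

Here $|\lambda I - A| = \lambda^2 + 3\lambda + 2$, so $a_1 = 3$ and $a_2 = 2$. Let the desired closed-loop eigenvalues be $\lambda_1^D = -1 + i$ and $\lambda_2^D = -1 - i$. Then

$$(\lambda - \lambda_1^D)(\lambda - \lambda_2^D) = \lambda^2 + 2\lambda + 2,$$

and $k_1 = 3 - 2 = 1$ and $k_2 = 2 - 2 = 0$. The gain matrices for the system in controllable canonical form and for the original system are given by, respectively,

$$K = (k_2\ \ k_1) = \begin{pmatrix}0 & 1\end{pmatrix}, \qquad KP^{-1} = \begin{pmatrix}0 & 1\end{pmatrix}\begin{pmatrix}1 & 2\\ 1 & 3\end{pmatrix} = \begin{pmatrix}1 & 3\end{pmatrix}.$$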
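Steps 1 to 6 of the algorithm can be collected into one function. The sketch below assumes NumPy and a controllable single-input pair; the name `place_single_input` is hypothetical, and `numpy.poly` is used to obtain characteristic-polynomial coefficients (it computes them from eigenvalues, which is adequate for small, well-conditioned examples like these). It reproduces Example 7.4.4 and the gain of Example 7.4.1.

```python
import numpy as np

def place_single_input(A, b, desired_poles):
    """Gain K (1 x n) such that A + b K has the desired eigenvalues.

    Follows Steps 1-6 of the algorithm; assumes (A, b) is controllable.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[0]
    # Step 1: |lambda I - A| = lambda^n + a_1 lambda^(n-1) + ... + a_n
    a = np.real(np.poly(A))[1:]
    # Step 2: prod (lambda - lambda_i^D) = lambda^n + a_1^D lambda^(n-1) + ... + a_n^D
    aD = np.real(np.poly(desired_poles))[1:]
    # Step 3: k_i = a_i - a_i^D, arranged as K = (k_n, ..., k_1)
    K_bar = (a - aD)[::-1].reshape(1, -1)
    # Step 4: P^{-1} = Q_bar Q^{-1} from (7.32), with (A_bar, b_bar) in form (7.28)
    A_bar = np.zeros((n, n))
    A_bar[:-1, 1:] = np.eye(n - 1)
    A_bar[-1, :] = -a[::-1]
    b_bar = np.zeros((n, 1))
    b_bar[-1, 0] = 1.0
    Q = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
    Q_bar = np.hstack([np.linalg.matrix_power(A_bar, i) @ b_bar for i in range(n)])
    P_inv = Q_bar @ np.linalg.inv(Q)
    # Steps 5-6: u = K z = K P^{-1} x
    return K_bar @ P_inv

# Example 7.4.4
A = np.array([[13.0, 35.0],
              [-6.0, -16.0]])
b = np.array([[-2.0],
              [1.0]])
K = place_single_input(A, b, [-1 + 1j, -1 - 1j])
print(K)                    # approx [[1, 3]]
print(np.poly(A + b @ K))   # approx [1, 2, 2]
```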

Remark 7.4.1 (Multi-Input Case, $\vec u(\cdot) \in \mathbb{R}^m$, $m > 1$) One approach is to change the multi-input problem into a single-input problem and then apply the result above. To this end, let us consider

$$\dot{\vec x}(t) = A\vec x(t) + B\vec u(t)$$

with state feedback law $\vec u(t) = K\vec x(t)$, where K is a constant $m \times n$ matrix. Let $\vec d \in \mathbb{R}^m$ and $\vec k \in \mathbb{R}^n$, and assume that

$$K = \vec d\,\vec k^{\,T}.$$

The closed-loop system is then given by

$$\dot{\vec x}(t) = A\vec x(t) + B\vec d\,\vec k^{\,T}\vec x(t) = A\vec x(t) + \vec b\,\vec k^{\,T}\vec x(t),$$

where $\vec b = B\vec d \in \mathbb{R}^n$. We now note that if $(A, \vec b)$ is controllable, then the method for the single-input case can be applied to find the "gain" $\vec k^{\,T}$. The problem then becomes: given $(A, B)$ controllable, can we find a vector $\vec d \in \mathbb{R}^m$ such that $(A, \vec b)$, where $\vec b = B\vec d$, is controllable? The answer is positive if $(A, B)$ is controllable and A is cyclic.


We note that since many $\vec d$ exist, the gain $\vec d\,\vec k^{\,T}$ is not unique. The matrix A is said to be cyclic if and only if the Jordan canonical form of A has one and only one Jordan block associated with each distinct eigenvalue. For example, the matrix

$$A = \begin{pmatrix}3 & 1 & 0\\ 0 & 3 & 0\\ 0 & 0 & 3\end{pmatrix}$$

is not cyclic (it has two Jordan blocks for the eigenvalue 3). Clearly, if A has distinct eigenvalues, then A is cyclic. For a discussion of this idea and of other methods to construct the gain matrix K for the multi-input case, including a method using the controllable canonical form, see [6].
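The role of cyclicity can be seen numerically. In the sketch below (assuming NumPy; the matrices, the vectors $\vec d$, and the rank tolerance are our choices), B = I so $(A, B)$ is controllable, yet for the non-cyclic A above no choice of $\vec d$ makes $(A, B\vec d)$ controllable, while for a cyclic A with distinct eigenvalues the same vectors do.

```python
import numpy as np

def ctrb_rank(A, b, tol=1e-8):
    n = A.shape[0]
    Q = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
    return np.linalg.matrix_rank(Q, tol=tol)

B = np.eye(3)   # with B = I, (A, B) is controllable for every A
A_noncyclic = np.array([[3.0, 1.0, 0.0],
                        [0.0, 3.0, 0.0],
                        [0.0, 0.0, 3.0]])   # two Jordan blocks for eigenvalue 3
A_cyclic = np.diag([1.0, 2.0, 3.0])         # distinct eigenvalues

ranks = []
for d in ([1.0, 1.0, 1.0], [1.0, -2.0, 0.5], [0.3, 0.7, -1.1]):
    b = B @ np.array(d).reshape(3, 1)       # b = B d
    ranks.append((ctrb_rank(A_noncyclic, b), ctrb_rank(A_cyclic, b)))
print(ranks)   # [(2, 3), (2, 3), (2, 3)]
```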

7.4.1 State Estimator (Luenberger Observer)

When we introduced state feedback, we assumed that all the state variables were available to be fed back to the system. In practice this assumption is not always met because

(i) not all the state variables are accessible to direct measurement, or

(ii) the number of measurement devices is limited (due to cost).

Thus, in order to utilize state feedback, we must reconstruct or obtain a good estimate of the state vector $\vec x$. We will reconstruct the state variables by using the available inputs and outputs of the system dynamics.

Consider the open-loop, time-invariant system

$$\dot{\vec{x}}(t) = A\vec{x}(t) + B\vec{u}(t), \qquad \vec{y}(t) = C\vec{x}(t),$$

with the initial condition $\vec{x}(0) = \vec{x}_0$. We model the state estimator, $\hat{x}(t)$, with the same dynamics as the original system, so that

$$\dot{\hat{x}}(t) = A\hat{x}(t) + B\vec{u}(t) + G\big(\vec{y}(t) - \hat{y}(t)\big), \qquad \hat{y}(t) = C\hat{x}(t),$$

subject to $\hat{x}(0) = 0$, where $\vec{y}$ is the measured output and $\hat{y}(t)$ is the estimated output. The $n \times p$ matrix $G$ is called the observer gain matrix. For stochastic systems, the corresponding optimal estimator is the Kalman filter.

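We define the estimator error, $\vec{e}(t)$, as follows:

$$\vec{e}(t) \equiv \hat{x}(t) - \vec{x}(t).$$

Then,

$$\begin{aligned}
\dot{\vec{e}}(t) &= \dot{\hat{x}}(t) - \dot{\vec{x}}(t) \\
&= A\hat{x}(t) + B\vec{u}(t) + G\big(\vec{y}(t) - \hat{y}(t)\big) - A\vec{x}(t) - B\vec{u}(t) \\
&= A\vec{e}(t) + GC\big(\vec{x}(t) - \hat{x}(t)\big) \\
&= (A - GC)\,\vec{e}(t)
\end{aligned}$$

and $\vec{e}(0) = \hat{x}(0) - \vec{x}(0) = 0 - \vec{x}_0 = -\vec{x}_0$. The problem, then, is to choose $G$ such that $A - GC$ is asymptotically stable, so that $\lim_{t\to\infty}\vec{e}(t) = 0$ (i.e., $\lim_{t\to\infty}(\hat{x}(t) - \vec{x}(t)) = 0$). Therefore, the question is whether we can choose $G$ so that the poles of $A - GC$ are anywhere we like in $\mathbb{C}$ (in particular, in the left-half complex plane). The answer is yes if $(A, C)$ is observable.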

Let $\gamma^D_1, \ldots, \gamma^D_n$ be the desired poles of $A - GC$ in the left-half plane of $\mathbb{C}$. Recall that the poles of $A - GC$ are the poles of $(A - GC)^T$, and observe that

$$(A - GC)^T = A^T + C^T(-G^T) = \tilde{A} + \tilde{B}\tilde{K},$$

where $\tilde{A} = A^T$, $\tilde{B} = C^T$ and $\tilde{K} = -G^T$. We know that $(A, C)$ is observable if and only if $(\tilde{A}, \tilde{B})$ is controllable. Thus, we can use any pole placement scheme to find the state variable gain $\tilde{K}$ that causes $\tilde{A} + \tilde{B}\tilde{K}$ to have the desired poles $\gamma^D_1, \ldots, \gamma^D_n$. Then the state estimator gain matrix $G = -\tilde{K}^T$ will cause $A - GC$ to have the same desired poles.

Remark 7.4.2 The state estimator design is a pole placement design on $\tilde{A} = A^T$ and $\tilde{B} = C^T$.
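The duality argument translates directly into code: any single-input pole placement routine applied to $(A^T, C^T)$ gives $\tilde{K}$, and then $G = -\tilde{K}^T$. The Python sketch below uses Ackermann's formula, $K = -e_n^T\,\mathcal{C}^{-1}\varphi(A)$ with $\mathcal{C}$ the controllability matrix and $\varphi$ the desired characteristic polynomial (written for the convention $A + BK$ used here), as that routine. This is one standard choice, not necessarily the method intended in the text; the helper names are ours, and the data are those of Example 7.4.5.

```python
from fractions import Fraction


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def solve(M, b):
    """Solve M v = b exactly by Gauss-Jordan elimination (M nonsingular)."""
    n = len(M)
    aug = [[Fraction(x) for x in M[i]] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * r for a, r in zip(aug[i], aug[c])]
    return [aug[i][n] for i in range(n)]


def poly_of_matrix(A, coeffs):
    """phi(A) = A^n + c_{n-1} A^{n-1} + ... + c_0 I by Horner's rule,
    where coeffs = [1, c_{n-1}, ..., c_0]."""
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for c in coeffs[1:]:
        P = matmul(P, A)
        for i in range(n):
            P[i][i] += c
    return P


def ackermann(A, b, coeffs):
    """Row gain K such that A + b K has characteristic polynomial coeffs."""
    n = len(A)
    cols, v = [], [row[0] for row in b]
    for _ in range(n):                       # columns b, Ab, ..., A^{n-1} b
        cols.append(v)
        v = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    # w^T = e_n^T Ctrb^{-1}  <=>  Ctrb^T w = e_n; the rows of Ctrb^T are `cols`
    w = solve(cols, [0] * (n - 1) + [1])
    phiA = poly_of_matrix(A, coeffs)
    return [-sum(w[k] * phiA[k][j] for k in range(n)) for j in range(n)]


def observer_gain(A, C, coeffs):
    """G = -Kt^T, where Kt places the poles of A^T + C^T Kt (duality)."""
    Kt = ackermann(transpose(A), transpose(C), coeffs)
    return [-k for k in Kt]                  # entries of the column vector G


A = [[0, 1, 0], [0, 0, 1], [0, -1, 0]]
C = [[1, -1, 0]]
# gamma = -2, -2 + j, -2 - j  ->  (s + 2)(s^2 + 4s + 5) = s^3 + 6s^2 + 13s + 10
G = observer_gain(A, C, [1, 6, 13, 10])
print([int(g) for g in G])   # [14, 8, -4]
```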

7.4.2 Dynamic Output Feedback Compensator

Let $\hat{x}(t)$ be a state estimator. Consider the control law

$$\vec{u}(t) = K\hat{x}(t) + \vec{u}_r(t),$$

where $\vec{u}_r$ is the reference input, together with the state estimator

$$\dot{\hat{x}}(t) = A\hat{x}(t) + B\vec{u}(t) + G\big(\vec{y}(t) - \hat{y}(t)\big), \qquad \hat{y}(t) = C\hat{x}(t),$$

subject to the initial condition $\hat{x}(0) = 0$. This can be rewritten as

$$\vec{u}(t) = K\hat{x}(t) + \vec{u}_r(t), \qquad \dot{\hat{x}}(t) = L\hat{x}(t) + G\vec{y}(t) + B\vec{u}_r(t),$$

where $L = A + BK - GC$. We want to find the closed-loop poles of the system. We have

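$$\dot{\vec{x}}(t) = A\vec{x}(t) + B\vec{u}(t),$$

where

$$\vec{u}(t) = K\hat{x}(t) + \vec{u}_r(t) = K\big(\vec{x}(t) + \vec{e}(t)\big) + \vec{u}_r(t)$$

and $\dot{\vec{e}}(t) = (A - GC)\vec{e}(t)$. Therefore,

$$\dot{\vec{x}}(t) = A\vec{x}(t) + BK\vec{x}(t) + BK\vec{e}(t) + B\vec{u}_r(t) = (A + BK)\vec{x}(t) + BK\vec{e}(t) + B\vec{u}_r(t).$$

Consequently, the closed-loop system is given by the composite system

$$\begin{pmatrix}\dot{\vec{x}}(t) \\ \dot{\vec{e}}(t)\end{pmatrix} = \begin{pmatrix} A + BK & BK \\ 0 & A - GC\end{pmatrix}\begin{pmatrix}\vec{x}(t) \\ \vec{e}(t)\end{pmatrix} + \begin{pmatrix} B \\ 0\end{pmatrix}\vec{u}_r(t).$$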

Since the system matrix is block upper triangular, it then follows that the poles of the closed-loop system are those of $A + BK$ (full state feedback design) together with those of $A - GC$ (state estimator design). This is known as the deterministic separation principle. The dynamic output compensator is summarized in Figure 7.8.
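The separation principle can be checked on a concrete case. The Python sketch below builds the composite matrix above for the data of Example 7.4.5 ($K = (-2\;\; -3\;\; -3)$, $G = (14\;\; 8\;\; -4)^T$), computes characteristic polynomials exactly with the Faddeev-LeVerrier recursion (our choice of method, with helper names of our own), and confirms that the composite polynomial is the product of those of $A + BK$ and $A - GC$.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def charpoly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(sI - A) (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(1)]
    Mk = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, Mk)
        Mk = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
              for i in range(n)]                  # M_k = A M_{k-1} + c I
        AM = matmul(A, Mk)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


def polymul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


n = 3
A = [[0, 1, 0], [0, 0, 1], [0, -1, 0]]
B = [0, 0, 1]
C = [1, -1, 0]
K = [-2, -3, -3]
G = [14, 8, -4]
ABK = [[A[i][j] + B[i] * K[j] for j in range(n)] for i in range(n)]
AGC = [[A[i][j] - G[i] * C[j] for j in range(n)] for i in range(n)]
BKm = [[B[i] * K[j] for j in range(n)] for i in range(n)]
# composite matrix [[A + BK, BK], [0, A - GC]] acting on (x, e)
composite = ([ABK[i] + BKm[i] for i in range(n)]
             + [[0] * n + AGC[i] for i in range(n)])

print([int(c) for c in charpoly(composite)])                      # [1, 9, 35, 75, 94, 66, 20]
print([int(c) for c in polymul(charpoly(ABK), charpoly(AGC))])    # [1, 9, 35, 75, 94, 66, 20]
```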

FIGURE 7.8: Dynamic output compensator. [Block diagram: open-loop system $\dot{x} = Ax + Bu$, $y = Cx$; state estimator producing $\hat{x}(t)$; gain $K$; reference input $u_r(t)$; control $u(t)$.]

Given the desired poles $\lambda^D_1, \ldots, \lambda^D_n$ for the system, the dynamic output feedback compensator is designed using the following algorithm.

Algorithm

1. If $(A, B)$ is controllable, we can find a matrix $K$ such that $A + BK$ has the desired poles $\lambda^D_1, \ldots, \lambda^D_n$.

2. Use the control law $\vec{u}(t) = K\hat{x}(t) + \vec{u}_r(t)$. That is, use the estimated states as though they were the actual states.

3. The state estimator is given by

$$\dot{\hat{x}}(t) = A\hat{x}(t) + B\vec{u}(t) + G\big(\vec{y}(t) - \hat{y}(t)\big), \qquad \hat{y}(t) = C\hat{x}(t),$$

subject to the initial condition $\hat{x}(0) = 0$. The estimator error satisfies

$$\dot{\vec{e}}(t) = (A - GC)\vec{e}(t)$$

with $\vec{e}(0) = -\vec{x}_0$. If $(A, C)$ is observable, we can find a matrix $G$ such that $A - GC$ has the desired poles $\gamma^D_1, \ldots, \gamma^D_n$; these are chosen by the designer. The rule of thumb is to place the estimator error poles slightly to the left of the controlled system poles, i.e., $\mathrm{Re}(\gamma^D_i) < \mathrm{Re}(\lambda^D_i)$ for $i = 1, 2, \ldots, n$.

The closed-loop system will have the poles of $A + BK$ (i.e., the desired poles $\lambda^D_1, \ldots, \lambda^D_n$) plus the estimator error poles of $A - GC$ (i.e., $\gamma^D_1, \ldots, \gamma^D_n$). The designer can achieve any set of poles $\lambda^D_1, \ldots, \lambda^D_n$ and $\gamma^D_1, \ldots, \gamma^D_n$ in $\mathbb{C}$ (with complex poles in conjugate pairs, since $K$ and $G$ are real) by separately choosing the gains $K$ and $G$ (Separation Principle).

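Example 7.4.5 Let

$$\dot{\vec{x}}(t) = A\vec{x}(t) + B\vec{u}(t), \qquad \vec{y}(t) = C\vec{x}(t),$$

where

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 1 & -1 & 0 \end{pmatrix}.$$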

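We want to design a dynamic output feedback compensator with desired poles $\lambda^D_1 = -1$, $\lambda^D_2 = -1 + j$, $\lambda^D_3 = -1 - j$, and state estimator poles $\gamma^D_1 = -2$, $\gamma^D_2 = -2 + j$, and $\gamma^D_3 = -2 - j$. Note that the system is controllable, since

$$\rho\big([B \,|\, AB \,|\, A^2B]\big) = \rho\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix} = 3,$$

and the system is observable, since

$$\rho\begin{pmatrix} C \\ CA \\ CA^2 \end{pmatrix} = \rho\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{pmatrix} = 3.$$

The open-loop poles are $\lambda_1 = 0$, $\lambda_2 = j$, and $\lambda_3 = -j$. Note in Figure 7.9 that the system is not asymptotically stable.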

Using the separation principle, we design the dynamic output feedback compensator as follows:

(a) First, we want to design a gain matrix $K$ so that $A + BK$ has the desired poles $\lambda^D_1, \lambda^D_2, \lambda^D_3$. Such a matrix is

$$K = \begin{pmatrix} -2 & -3 & -3 \end{pmatrix}.$$

Note that this is a single-input system in controllable canonical form.
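Because $A$ is already in controllable canonical form (last row $(-a_0, -a_1, -a_2)$, so here $a_0 = 0$, $a_1 = 1$, $a_2 = 0$) and $B = (0, 0, 1)^T$, the gain follows by matching coefficients: with $\vec{u} = K\vec{x}$ the last row of $A + BK$ becomes $(-a_0 + k_1, -a_1 + k_2, -a_2 + k_3)$, so $k_{i+1} = a_i - d_i$, where $s^3 + d_2 s^2 + d_1 s + d_0$ is the desired polynomial. A minimal Python sketch (helper names are ours):

```python
def poly_from_roots(roots):
    """Monic polynomial [1, d_{n-1}, ..., d_0] with the given roots
    (complex roots are expected in conjugate pairs)."""
    coeffs = [1]
    for r in roots:
        # multiply by (s - r): shift up one degree, subtract r * coeffs
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return [c.real if isinstance(c, complex) else c for c in coeffs]


def canonical_form_gain(A, desired_roots):
    """K = (a_0 - d_0, ..., a_{n-1} - d_{n-1}) for A in controllable canonical
    form (last row -a_0, ..., -a_{n-1}), B = e_n and u = K x."""
    a = [-v for v in A[-1]]                        # a_0, ..., a_{n-1}
    d = poly_from_roots(desired_roots)[1:][::-1]   # d_0, ..., d_{n-1}
    return [ai - di for ai, di in zip(a, d)]


A = [[0, 1, 0], [0, 0, 1], [0, -1, 0]]
print(canonical_form_gain(A, [-1, -1 + 1j, -1 - 1j]))   # [-2.0, -3.0, -3.0]
print(canonical_form_gain(A, [-4, -4 + 1j, -4 - 1j]))   # [-68.0, -48.0, -12.0]
```

The second call anticipates the faster design discussed below and reproduces its gain as well.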


FIGURE 7.9: The uncontrolled system ($u \equiv 0$). [Plot: $x_1(t)$, $x_2(t)$, $x_3(t)$ versus $t$, seconds.]

(b) Next, we design a matrix $G$ such that $A - GC$ has poles $\gamma^D_1, \gamma^D_2, \gamma^D_3$. It is given by

$$G = \begin{pmatrix} 14 \\ 8 \\ -4 \end{pmatrix}.$$

Note that $(A^T, C^T)$ is not in controllable canonical form. In this case, we can either transform the system to canonical form or use a direct design method.
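One possible direct design method (a sketch of our own, not necessarily the one intended in the text) uses the fact that, with a single output, $GC$ has rank one, so by the matrix determinant lemma $\det(sI - A + GC) = \det(sI - A) + C\,\mathrm{adj}(sI - A)\,G$ is affine in $G$. Evaluating the characteristic polynomial at $G = 0$ and at the unit vectors therefore yields a linear system for $G$. The Python sketch below reproduces $G = (14, 8, -4)^T$.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def charpoly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(sI - A) (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(1)]
    Mk = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, Mk)
        Mk = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
              for i in range(n)]
        AM = matmul(A, Mk)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


def solve(M, b):
    """Solve M v = b exactly by Gauss-Jordan elimination (M nonsingular)."""
    n = len(M)
    aug = [[Fraction(x) for x in M[i]] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * r for a, r in zip(aug[i], aug[c])]
    return [aug[i][n] for i in range(n)]


def observer_gain_by_matching(A, C, desired):
    """G with det(sI - (A - G C)) = desired ([1, d_{n-1}, ..., d_0]),
    for a single output row C (so the coefficients are affine in G)."""
    n = len(A)

    def coeffs_for(G):
        AGC = [[A[i][j] - G[i] * C[j] for j in range(n)] for i in range(n)]
        return charpoly(AGC)[1:]

    base = coeffs_for([0] * n)
    # column i of the linear map: coefficient change caused by G = e_i
    cols = [[c - b for c, b in zip(coeffs_for([int(k == i) for k in range(n)]), base)]
            for i in range(n)]
    M = [[cols[i][r] for i in range(n)] for r in range(n)]
    rhs = [d - b for d, b in zip(desired[1:], base)]
    return solve(M, rhs)


A = [[0, 1, 0], [0, 0, 1], [0, -1, 0]]
C = [1, -1, 0]
G = observer_gain_by_matching(A, C, [1, 6, 13, 10])   # (s + 2)(s^2 + 4s + 5)
print([int(g) for g in G])                             # [14, 8, -4]
```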

The closed-loop system and the estimator errors are plotted in Figures 7.10 and 7.11, respectively. Note that the closed-loop system is asymptotically stable. The state estimator, $\hat{x}(t)$, is plotted in Figure 7.12.

Now, suppose we want to design a dynamic feedback compensator for this system with the desired poles shifted further to the left in the complex plane, say $\lambda^D_1 = -4$, $\lambda^D_2 = -4 + j$, $\lambda^D_3 = -4 - j$, and state estimator poles $\gamma^D_1 = -5$, $\gamma^D_2 = -5 + j$, and $\gamma^D_3 = -5 - j$. We design a gain matrix $K$ so that $A + BK$ has the desired poles $\lambda^D_1, \lambda^D_2, \lambda^D_3$:

$$K = \begin{pmatrix} -68 & -48 & -12 \end{pmatrix}.$$

Similarly, we design a matrix $G$ such that $A - GC$ has poles $\gamma^D_1, \gamma^D_2, \gamma^D_3$:

$$G = \begin{pmatrix} 110 \\ 95 \\ 20 \end{pmatrix}.$$
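To reproduce the qualitative behavior shown in Figures 7.10 to 7.15, the closed loop (with $\vec{u}_r \equiv 0$) can be integrated numerically for both designs. The Python sketch below uses a classical fourth-order Runge-Kutta step; the initial state $\vec{x}(0) = (1, 0, 0)^T$ is a hypothetical choice (the text does not state the one used for the figures), and $\hat{x}(0) = 0$ as in the text.

```python
def simulate(A, B, C, K, G, x0, t_end, dt=2e-3):
    """Classical RK4 integration of the closed loop with u_r = 0:
        x'  = A x + B K xh
        xh' = (A + B K) xh + G (C x - C xh),   xh(0) = 0.
    Returns x(t_end) and e(t_end) = xh(t_end) - x(t_end)."""
    n = len(A)
    BK = [[B[i] * K[j] for j in range(n)] for i in range(n)]

    def rhs(z):
        x, xh = z[:n], z[n:]
        innov = sum(C[j] * (x[j] - xh[j]) for j in range(n))      # y - y_hat
        dx = [sum(A[i][j] * x[j] + BK[i][j] * xh[j] for j in range(n))
              for i in range(n)]
        dxh = [sum((A[i][j] + BK[i][j]) * xh[j] for j in range(n)) + G[i] * innov
               for i in range(n)]
        return dx + dxh

    z = [float(v) for v in x0] + [0.0] * n
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(z)
        k2 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)])
        k3 = rhs([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)])
        k4 = rhs([zi + dt * ki for zi, ki in zip(z, k3)])
        z = [zi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
    x, xh = z[:n], z[n:]
    return x, [h - v for h, v in zip(xh, x)]


A = [[0, 1, 0], [0, 0, 1], [0, -1, 0]]
B = [0, 0, 1]
C = [1, -1, 0]
x0 = [1.0, 0.0, 0.0]   # hypothetical initial state (not given in the text)

x_slow, e_slow = simulate(A, B, C, [-2, -3, -3], [14, 8, -4], x0, t_end=10.0)
x_fast, e_fast = simulate(A, B, C, [-68, -48, -12], [110, 95, 20], x0, t_end=5.0)
print(max(map(abs, x_slow)), max(map(abs, e_slow)))
print(max(map(abs, x_fast)), max(map(abs, e_fast)))
```

Both the state and the estimator error are essentially zero at the end of each run; the second design reaches that point in half the time, consistent with its poles lying further to the left.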


FIGURE 7.10: The state vector, $x(t)$, of the closed-loop system with $K = (-2\;\; -3\;\; -3)$ and $G = (14\;\; 8\;\; -4)^T$. [Plot: $x_1(t)$, $x_2(t)$, $x_3(t)$ versus $t$, seconds.]

FIGURE 7.11: The estimator error, $e(t)$, of the closed-loop system with $K = (-2\;\; -3\;\; -3)$ and $G = (14\;\; 8\;\; -4)^T$. [Plot: $e_1(t)$, $e_2(t)$, $e_3(t)$ versus $t$, seconds.]

FIGURE 7.12: The state estimator, $\hat{x}(t)$, with $K = (-2\;\; -3\;\; -3)$ and $G = (14\;\; 8\;\; -4)^T$. The label xe1(t) denotes $\hat{x}_1(t)$, etc. [Plot: xe1(t), xe2(t), xe3(t) versus $t$, seconds.]

Figures 7.13 and 7.14 illustrate that both the state and the estimator error go to zero faster than in the previous case, and hence so does the state estimator (Figure 7.15).

FIGURE 7.13: The state vector, $x(t)$, with $K = (-68\;\; -48\;\; -12)$ and $G = (110\;\; 95\;\; 20)^T$. [Plot: $x_1(t)$, $x_2(t)$, $x_3(t)$ versus $t$, seconds.]


FIGURE 7.14: The estimator error, $e(t)$, with $K = (-68\;\; -48\;\; -12)$ and $G = (110\;\; 95\;\; 20)^T$. [Plot: $e_1(t)$, $e_2(t)$, $e_3(t)$ versus $t$, seconds.]

Note, too, that the matrices $K$ and $G$ are greater in norm than those of the previous case; that is, as we shift the desired poles farther to the left in the complex plane, we gain in the rate of decay but pay a penalty in the process (more control effort). In the next section, we will discuss methods to find the optimal control that minimizes a cost criterion.
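The growth in gain size can be quantified with the Euclidean norms of the four gains of Example 7.4.5; a short Python check:

```python
import math

gains = {
    "K, poles -1, -1 +/- j": [-2, -3, -3],
    "K, poles -4, -4 +/- j": [-68, -48, -12],
    "G, poles -2, -2 +/- j": [14, 8, -4],
    "G, poles -5, -5 +/- j": [110, 95, 20],
}
norms = {name: math.sqrt(sum(v * v for v in vec)) for name, vec in gains.items()}
for name, value in norms.items():
    print(f"{name}: {value:.2f}")
```

The norms are about 4.7 and 84.1 for $K$, and about 16.6 and 146.7 for $G$: shifting the poles three units to the left multiplies the gain norms by roughly 18 and 9, respectively.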

7.5 Linear Quadratic Regulator Theory

Consider the linear time-invariant control system

$$\dot{\vec{x}}(t) = A\vec{x}(t) + B\vec{u}(t) \tag{7.33}$$

with $\vec{u}(t) = K\vec{x}(t)$. Then the closed-loop eigenvalues of

$$\dot{\vec{x}}(t) = (A + BK)\vec{x}(t) \tag{7.34}$$

can be arbitrarily located in the complex plane if $(A, B)$ is controllable. However, Example 7.4.5 in the last section demonstrated that the faster we make the system converge to the zero state, the larger the magnitude of $K$ (and, therefore, the more input or control is required). This leads us to the following formulation of an optimal control problem: find a control $\vec{u} \in L^2(t_0, \infty; \mathbb{R}^m)$ to minimize

$$J(\vec{x}_0, \vec{u}) = \frac{1}{2}\int_{t_0}^{\infty}\left(\vec{x}^T Q\vec{x} + \vec{u}^T R\vec{u}\right) dt, \tag{7.35}$$

where $\vec{x} \in \mathbb{R}^n$, $\vec{u} \in \mathbb{R}^m$, $Q \in \mathbb{R}^{n \times n}$ is symmetric positive semidefinite (SPSD), and $R \in \mathbb{R}^{m \times m}$ is symmetric positive definite (SPD). Associated with the performance index (7.35) are the linear dynamics described by (7.33). This optimal control problem is known as the linear quadratic regulator problem [1].

FIGURE 7.15: The state estimator, $\hat{x}(t)$, with $K = (-68\;\; -48\;\; -12)$ and $G = (110\;\; 95\;\; 20)^T$. The label xe1(t) denotes $\hat{x}_1(t)$, etc. [Plot: xe1(t), xe2(t), xe3(t) versus $t$, seconds.]

It is not always possible to solve this problem as stated. For example, consider

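$$\dot{\vec{x}}(t) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\vec{x}(t) + \begin{pmatrix} 0 \\ 1 \end{pmatrix}u(t)$$

with initial condition $\vec{x}(0) = (1, 0)^T$ and

$$J(\vec{x}_0, u) = \int_0^{\infty}\left(x_1^2 + x_2^2 + u^2\right) dt.$$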

We note that $x_1(t) = e^t$ for any control function $u(t)$. Also, since $\dot{x}_2 = x_2 + u$ with $x_2(0) = 0$, it is easy to see that the optimal control is $u^\star = 0$, which then implies that the corresponding optimal trajectory is $x_2^\star = 0$. Hence,

$$J(\vec{x}_0, u^\star) = \int_0^{\infty} e^{2t}\, dt = \infty.$$


This example shows that the optimal cost is not finite, for the following three reasons:

(i) the state x1 is uncontrollable,

(ii) the uncontrollable state is unstable,

(iii) the unstable state is part of the cost functional.

This difficulty would not arise if we assumed that $(A, B)$ is controllable. In fact, if $(A, B)$ is controllable and $Q > 0$, it has been shown [1] that

(a) The unique optimal control is given by $\vec{u}^\star(t) = -K\vec{x}^\star(t)$, where $K = R^{-1}B^T\Pi$, $\Pi$ is the unique positive definite matrix solution to the algebraic Riccati equation

$$A^T\Pi + \Pi A - \Pi BR^{-1}B^T\Pi + Q = 0,$$

and $\vec{x}^\star(t)$ is the solution to the closed-loop system

$$\dot{\vec{x}}^\star(t) = \left(A - BR^{-1}B^T\Pi\right)\vec{x}^\star(t), \qquad \vec{x}^\star(t_0) = \vec{x}_0.$$

(b) Moreover, the matrix $A - BR^{-1}B^T\Pi$ has all eigenvalues with negative real parts (hence, the closed-loop system is asymptotically stable) and the optimal cost is given by

$$J^\star(\vec{x}_0, \vec{u}^\star) = \frac{1}{2}\vec{x}_0^T\Pi\vec{x}_0.$$
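The text does not say how the algebraic Riccati equation is solved. One standard option is the Newton-Kleinman iteration: starting from any $K_0$ that makes $A - BK_0$ asymptotically stable, solve the Lyapunov equation $(A - BK_k)^T\Pi_k + \Pi_k(A - BK_k) + Q + K_k^TRK_k = 0$ and set $K_{k+1} = R^{-1}B^T\Pi_k$. The Python sketch below applies it to a hypothetical double integrator ($A = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$, $B = (0, 1)^T$, $Q = I$, $R = 1$), for which $\Pi = \begin{pmatrix}\sqrt{3} & 1\\ 1 & \sqrt{3}\end{pmatrix}$. It is an illustration only; library routines such as MATLAB's `lqr` may use different algorithms.

```python
def gauss_solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def lyapunov(Ac, S):
    """Solve Ac^T P + P Ac + S = 0 through its n^2 x n^2 Kronecker form."""
    n = len(Ac)
    L = [[0.0] * (n * n) for _ in range(n * n)]
    rhs = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            r = i * n + j
            rhs[r] = -S[i][j]
            for k in range(n):
                L[r][k * n + j] += Ac[k][i]   # (Ac^T P)_{ij}
                L[r][i * n + k] += Ac[k][j]   # (P Ac)_{ij}
    p = gauss_solve(L, rhs)
    return [[p[i * n + j] for j in range(n)] for i in range(n)]


def lqr_newton_kleinman(A, B, Q, R, K0, iterations=30):
    """Return (Pi, K) with K = R^{-1} B^T Pi (feedback u = -K x).
    K0 must make A - B K0 asymptotically stable."""
    n, m = len(A), len(B[0])
    K = [row[:] for row in K0]
    for _ in range(iterations):
        Ac = [[A[i][j] - sum(B[i][a] * K[a][j] for a in range(m))
               for j in range(n)] for i in range(n)]
        S = [[Q[i][j] + sum(K[a][i] * R[a][b] * K[b][j]
                            for a in range(m) for b in range(m))
              for j in range(n)] for i in range(n)]
        P = lyapunov(Ac, S)
        BtP = [[sum(B[k][a] * P[k][j] for k in range(n)) for j in range(n)]
               for a in range(m)]
        cols = [gauss_solve(R, [BtP[a][j] for a in range(m)]) for j in range(n)]
        K = [[cols[j][a] for j in range(n)] for a in range(m)]
    return P, K


# Hypothetical example (not from the text): double integrator, Q = I, R = 1.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0], [1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0]]
Pi, K = lqr_newton_kleinman(A, B, Q, R, K0=[[1.0, 1.0]])
x0 = [1.0, 0.0]
J_opt = 0.5 * sum(x0[i] * Pi[i][j] * x0[j] for i in range(2) for j in range(2))
print(Pi, K, J_opt)
```

With $\vec{x}_0 = (1, 0)^T$ the optimal cost $\frac{1}{2}\vec{x}_0^T\Pi\vec{x}_0$ is $\sqrt{3}/2$.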

The above optimal control is in full state feedback form. That is, we assume that all state variables are available to be fed back. Since, in general, we do not have full state information, we will make use of a state estimator to formulate the controller. From [1], the optimal state estimator (observer) takes the form

$$\dot{\hat{x}}(t) = A\hat{x}(t) + B\vec{u}(t) + G\big(\vec{y}(t) - \hat{y}(t)\big), \tag{7.36}$$

$$\hat{y}(t) = C\hat{x}(t), \tag{7.37}$$

where $\vec{y}$ is the measured output. The optimal observer gain $G$ is given by $G = \Sigma C^T\tilde{R}^{-1}$, where $\Sigma$ solves the algebraic Riccati equation for the dual system

$$\Sigma A^T + A\Sigma - \Sigma C^T\tilde{R}^{-1}C\Sigma + \tilde{Q} = 0.$$

Playing a role similar to that of the matrices $Q$ and $R$ in the optimal state feedback problem, the symmetric positive semidefinite matrix $\tilde{Q}$ and the symmetric positive definite matrix $\tilde{R}$ are design criteria for the optimal state estimator.
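To see why this is called the Riccati equation for the dual system, replace $(A, B, Q, R)$ in the control Riccati equation of part (a) by $(A^T, C^T, \tilde{Q}, \tilde{R})$ (the pair used in Remark 7.4.2); term by term,

$$(A^T)^T\Sigma + \Sigma A^T - \Sigma\,(C^T)\,\tilde{R}^{-1}(C^T)^T\,\Sigma + \tilde{Q} = A\Sigma + \Sigma A^T - \Sigma C^T\tilde{R}^{-1}C\Sigma + \tilde{Q} = 0, \qquad \tilde{R}^{-1}(C^T)^T\Sigma = \tilde{R}^{-1}C\Sigma = G^T,$$

where the last identity uses the symmetry of $\Sigma$ and $\tilde{R}$. Thus any solver for the control Riccati equation, applied to $(A^T, C^T, \tilde{Q}, \tilde{R})$, also yields $\Sigma$ and the transpose of the observer gain $G$.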


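FIGURE 7.16: Cantilever beam with piezoceramic patches. [Sketch labels: width $b$, thickness $h$, axial coordinate $x$, transverse displacement $y$, patch edges $x_1$ and $x_2$.]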

7.6 Beam Vibrational Control: Real-Time Feedback Control Implementation

In this section we present the application of the control theory discussed in previous sections to the control of transverse beam vibrations. The beam structure is an aluminum cantilever beam in a “smart structure” paradigm (see Chapter 6). Using the same notation as in [15], we denote the beam length, width, thickness, density, Young’s modulus and Kelvin-Voigt damping by $\ell$, $b$, $h$, $\rho$, $E$ and $c_D$, respectively. The origin is taken to be at the clamped edge of the beam, and the axial direction is denoted by the $x$-axis (see Figure 7.16). A pair of identical piezoceramic patches are bonded on opposite sides of the beam with edges located at $x_1$ and $x_2$. Passive patch contributions arising from material changes due to the presence of the patches are included in the model. Patch parameters are denoted by the subscript $pe$; thus, the patch thickness, width, density, Young’s modulus and Kelvin-Voigt damping are given by $h_{pe}$, $b_{pe}$, $\rho_{pe}$, $E_{pe}$ and $c_{D_{pe}}$, respectively. We denote the piezoelectric constant relating mechanical strain and applied electric field by $d_{31}$. Finally, we denote the transverse displacement by $y$ and the voltages applied to the front and back patches by $V_1$ and $V_2$, respectively.

As derived in [3], the transverse (or bending) equation of the beam is given in terms of resultant moments $M_x$ by

$$\rho\frac{\partial^2 y}{\partial t^2} - \frac{\partial^2 M_x}{\partial x^2} + \frac{\partial^2 (M_x)_{pe}}{\partial x^2} = f. \tag{7.38}$$

Passive patch contributions are incorporated in the model above, and hence the linear mass density $\rho(x) = \rho hb + 2b\rho_{pe}h_{pe}\chi_{pe}(x)$ is piecewise constant. The characteristic function $\chi_{pe}(x)$ employed to isolate patch contributions is defined by

$$\chi_{pe}(x) = \begin{cases} 1, & x_1 \le x \le x_2, \\ 0, & \text{otherwise}. \end{cases} \tag{7.39}$$

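Incorporating both internal damping and material changes due to the presence of the patches, the internal moment resultant $M_x$ has the form

$$M_x(t, x) = -EI(x)\frac{\partial^2}{\partial x^2}y(t, x) - c_DI(x)\frac{\partial^3}{\partial x^2\,\partial t}y(t, x), \tag{7.40}$$

where

$$EI(x) = E\frac{h^3b}{12} + \frac{2b}{3}E_{pe}a_3\chi_{pe}(x), \qquad c_DI(x) = c_D\frac{h^3b}{12} + \frac{2b}{3}c_{D_{pe}}a_3\chi_{pe}(x), \tag{7.41}$$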

and $a_3 = (h/2 + h_{pe})^3 - h^3/8$ (we refer the reader to Chapter 3 of [3] for details regarding patch contributions to the internal moment resultant). When voltages are applied to the front and back patches, the induced external moment $(M_x)_{pe}$ is given by

$$(M_x)_{pe}(t, x) = \frac{1}{2}E_{pe}b\,d_{31}(h + h_{pe})\chi_{pe}(x)\big[V_1(t) - V_2(t)\big]. \tag{7.42}$$
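The coefficients in (7.39) to (7.41) are piecewise constant in $x$. The Python sketch below evaluates the linear mass density, $EI(x)$ and $c_DI(x)$ with the values of Table 7.1 (reading the patch thickness as $h_{pe} = 5.3 \times 10^{-4}$ m); it is an illustration of the formulas only, not the authors' code.

```python
# Table 7.1 values in SI units (h_pe read as 5.3e-4 m).
rho, h, b, E, c_D = 3.438e3, 1.0e-3, 0.2543, 7.062e10, 1.04e6
rho_pe, h_pe, E_pe, c_Dpe = 7.45e3, 5.3e-4, 6.4e10, 3.96e5
x1, x2 = 0.02041, 0.04592

a3 = (h / 2 + h_pe) ** 3 - h ** 3 / 8


def chi_pe(x):
    """Characteristic function (7.39) of the patch region [x1, x2]."""
    return 1.0 if x1 <= x <= x2 else 0.0


def linear_mass_density(x):
    """rho(x) = rho h b + 2 b rho_pe h_pe chi_pe(x), in kg/m."""
    return rho * h * b + 2 * b * rho_pe * h_pe * chi_pe(x)


def EI(x):
    """Bending stiffness EI(x) from (7.41), in N m^2."""
    return E * h ** 3 * b / 12 + 2 * b / 3 * E_pe * a3 * chi_pe(x)


def cDI(x):
    """Damping coefficient c_D I(x) from (7.41)."""
    return c_D * h ** 3 * b / 12 + 2 * b / 3 * c_Dpe * a3 * chi_pe(x)


for x in (0.01, 0.03, 0.10):   # before, inside, and after the patch region
    print(x, linear_mass_density(x), EI(x), cDI(x))
```

With these values the bare beam has $EI \approx 1.50$ N m$^2$, and the patches add roughly 10.5 N m$^2$ on $[x_1, x_2]$, so the patch contributions are far from negligible there.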

External transverse forces acting on the beam are modeled by the function $f(t, x)$. Cantilever boundary conditions are given by

$$y(t, 0) = \frac{\partial y}{\partial x}(t, 0) = 0, \qquad M_x(t, \ell) = \frac{\partial}{\partial x}M_x(t, \ell) = 0, \tag{7.43}$$

and initial conditions are denoted by

$$y(0, x) = y_0(x), \qquad \frac{\partial y}{\partial t}(0, x) = y_1(x). \tag{7.44}$$

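From previous discussions in Chapter 6, to approximate the solution to the strong form (7.38) we first write the weak formulation of (7.38) as

$$\begin{aligned}
&\int_0^\ell \rho hb\,\frac{\partial^2 y(t,x)}{\partial t^2}\phi(x)\,dx + \int_{x_1}^{x_2} 2b\rho_{pe}h_{pe}\frac{\partial^2 y(t,x)}{\partial t^2}\phi(x)\,dx \\
&\quad + \int_0^\ell EI\,\frac{\partial^2 y(t,x)}{\partial x^2}\frac{\partial^2\phi(x)}{\partial x^2}\,dx + \int_{x_1}^{x_2}\frac{2b}{3}E_{pe}a_3\frac{\partial^2 y(t,x)}{\partial x^2}\frac{\partial^2\phi(x)}{\partial x^2}\,dx \\
&\quad + \int_0^\ell c_DI\,\frac{\partial^3 y(t,x)}{\partial x^2\,\partial t}\frac{\partial^2\phi(x)}{\partial x^2}\,dx + \int_{x_1}^{x_2}\frac{2b}{3}c_{D_{pe}}a_3\frac{\partial^3 y(t,x)}{\partial x^2\,\partial t}\frac{\partial^2\phi(x)}{\partial x^2}\,dx \\
&= \int_{x_1}^{x_2}\frac{1}{2}E_{pe}b\,d_{31}(h + h_{pe})\big(V_1(t) - V_2(t)\big)\frac{\partial^2\phi(x)}{\partial x^2}\,dx + \int_0^\ell f(t,x)\phi(x)\,dx,
\end{aligned} \tag{7.45}$$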


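and approximate the solution to the weak form (7.45) by performing the Galerkin expansion

$$y^N(t, x) = \sum_{i=1}^{N} z_i(t)B_i(x), \tag{7.46}$$

where $B_i$ are the cubic spline functions (6.63). This yields the matrix system approximating (7.45) of the form

$$(M + M_{pe})\ddot{z}(t) + (D + D_{pe})\dot{z}(t) + (K_E + K_{E_{pe}})z(t) = \mathcal{F}(t) + \mathcal{B}u(t), \qquad z(0) = z_0, \quad \dot{z}(0) = z_1, \tag{7.47}$$

where $z(t) = [z_1(t), \ldots, z_N(t)]^T$ is the vector of coefficients. The matrices in (7.47) are defined by

$$\begin{aligned}
[M]_{k,l} &= \rho hb\int_0^\ell B_l(x)B_k(x)\,dx, & [M_{pe}]_{k,l} &= 2b\rho_{pe}h_{pe}\int_{x_1}^{x_2}B_l(x)B_k(x)\,dx, \\
[K_E]_{k,l} &= EI\int_0^\ell B_l''(x)B_k''(x)\,dx, & [K_{E_{pe}}]_{k,l} &= \frac{2b}{3}E_{pe}a_3\int_{x_1}^{x_2}B_l''(x)B_k''(x)\,dx, \\
[D]_{k,l} &= c_DI\int_0^\ell B_l''(x)B_k''(x)\,dx, & [D_{pe}]_{k,l} &= \frac{2b}{3}c_{D_{pe}}a_3\int_{x_1}^{x_2}B_l''(x)B_k''(x)\,dx, \\
[\mathcal{F}]_k(t) &= \int_0^\ell f(t, x)B_k(x)\,dx, & [\mathcal{B}]_{k,1} &= \frac{1}{2}E_{pe}b\,d_{31}(h + h_{pe})\int_{x_1}^{x_2}B_k''(x)\,dx, \\
u(t) &= [V_1(t),\ V_2(t)]^T, & [\mathcal{B}]_{k,2} &= -\frac{1}{2}E_{pe}b\,d_{31}(h + h_{pe})\int_{x_1}^{x_2}B_k''(x)\,dx.
\end{aligned} \tag{7.48}$$

In (7.45) and (7.48), $EI$ and $c_DI$ in the terms over $[0, \ell]$ denote the constant beam values $Eh^3b/12$ and $c_Dh^3b/12$ from (7.41); the patch contributions enter through the separate integrals over $[x_1, x_2]$.

The first order reformulation of (7.47) is given by

$$\dot{w}(t) = Aw(t) + Bu(t) + F(t), \qquad w(0) = w_0 = [z_0, z_1]^T, \tag{7.49}$$

where $w = (z, \dot{z})^T$,

$$A = \begin{bmatrix} I & 0 \\ 0 & M + M_{pe}\end{bmatrix}^{-1}\begin{bmatrix} 0 & I \\ -(K_E + K_{E_{pe}}) & -(D + D_{pe})\end{bmatrix}, \tag{7.50}$$

$$B = \begin{bmatrix} I & 0 \\ 0 & M + M_{pe}\end{bmatrix}^{-1}\begin{bmatrix} 0 \\ \mathcal{B}\end{bmatrix} \quad\text{and}\quad F(t) = \begin{bmatrix} I & 0 \\ 0 & M + M_{pe}\end{bmatrix}^{-1}\begin{bmatrix} 0 \\ \mathcal{F}(t)\end{bmatrix}. \tag{7.51}$$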

Detailed discussion of the Galerkin approximation for the beam model (7.38) and some theoretical considerations including convergence analysis can be found in [3, 15]. Finally, the structure of the observation matrix $C$ in the output equation $y(t) = Cw(t)$ depends on the sensor employed in the experiment. In our model, the sensor is a proximity probe located at the back of the beam sensing displacements at the point $x_{ob}$. The observation matrix $C$ is thus of the form

$$C = \big[\,B_1(x_{ob}), \ldots, B_N(x_{ob}), \underbrace{0, \ldots, 0}_{N}\,\big],$$

where the $B_i(x_{ob})$'s are the basis functions evaluated at $x_{ob}$.

In real-time implementation, the signals from the sensors are digitized and the real-time processor can only operate at a discrete sample interval $\Delta t$. Thus, the state estimator equation (in which the observer gain is now denoted by $F$)

$$\dot{\hat{w}}(t) = A\hat{w}(t) + Bu(t) + F\big(y(t) - \hat{y}(t)\big), \tag{7.52}$$

$$\hat{y}(t) = C\hat{w}(t) \tag{7.53}$$

can only be evolved in time in discrete time steps, and the control voltage can only be computed at this rate. The numerical ODE approximation method used to solve the state estimator equation (7.52) must satisfy the following criteria:

(i) the control $u(t_j)$ must be calculated before the arrival of the data at the next time step $t_{j+1} = t_j + \Delta t$,

(ii) the method must be sufficiently accurate to resolve system dynamics, and

(iii) since the ODE systems are often stiff, the method must be A-stable or α-stable (see, e.g., [13]).

We chose a modified backward Euler method given in Chapter 8.2.1 of [3]. The fast sample rate (and hence small $\Delta t$) at which we can carry out the experiment allows the use of this method. With the feedback $u = -K\hat{w}$, an A-stable modified backward Euler method integrating the state estimator at time $t_{j+1}$ is given by

$$\hat{w}_{j+1} = (I - \Delta tA_c)^{-1}\hat{w}_j + \Delta t(I - \Delta tA_c)^{-1}Fy(t_j) = R(A_c)\hat{w}_j + \Delta t\,R(A_c)Fy(t_j), \tag{7.54}$$

where $A_c = A - BK - FC$, $R(A_c) = (I - \Delta tA_c)^{-1}$, and the constant time step is $\Delta t = t_{j+1} - t_j$. Note that the method is modified from the standard backward Euler method because the observation $y(t_{j+1})$ at the future time step is not available, so the latest observation $y(t_j)$ is used instead. We now summarize the discrete-time algorithm in the following Real-Time Control Algorithm, which is essentially Algorithm 8.5 in [3].
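The role of A-stability can be seen on the scalar test equation $w' = \lambda w$ with $\mathrm{Re}\,\lambda < 0$: one step of (7.54) without the forcing term multiplies $w$ by $R(\lambda) = (1 - \Delta t\lambda)^{-1}$, whose magnitude is below one for every such $\lambda$, whereas the explicit (forward) Euler method multiplies by $1 + \Delta t\lambda$, which exceeds one in magnitude once $\Delta t|\lambda|$ is large. A minimal Python sketch with $\Delta t = 10^{-4}$ s, the step used in the experiment; the values of $\lambda$ are illustrative only.

```python
dt = 1.0e-4   # sample interval used in the experiment (s)


def backward_euler_factor(lam, dt):
    """Amplification factor R(lam) = (1 - dt*lam)^(-1) of backward Euler."""
    return 1.0 / (1.0 - dt * lam)


def forward_euler_factor(lam, dt):
    """Amplification factor 1 + dt*lam of explicit (forward) Euler."""
    return 1.0 + dt * lam


for lam in (-1.0e2, -1.0e5, -1.0e7, -10.0 + 1.0e5j):
    print(lam, abs(backward_euler_factor(lam, dt)), abs(forward_euler_factor(lam, dt)))
```

For the stiff modes ($|\lambda| \ge 10^5$) the explicit factor exceeds one, so forward Euler would blow up at this step size, while the backward Euler factor stays below one.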

TABLE 7.1: Beam and patch parameters.

Beam                                       Patch
$\ell = 0.286$ m                           $h_{pe} = 5.3 \times 10^{-4}$ m
$h = 0.001$ m                              $\rho_{pe} = 7.45 \times 10^3$ kg/m$^3$
$b = 0.2543$ m                             $E_{pe} = 6.4 \times 10^{10}$ N/m$^2$
$\rho = 3.438 \times 10^3$ kg/m$^3$        $c_{D_{pe}} = 3.96 \times 10^5$ N s/m$^2$
$E = 7.062 \times 10^{10}$ N/m$^2$         $d_{31} = 262 \times 10^{-12}$ m/V
$c_D = 1.04 \times 10^6$ N s/m$^2$         $x_1 = 0.02041$ m
$x_{ob} = 0.11076$ m                       $x_2 = 0.04592$ m

Real-Time Control Algorithm

(a) Offline

(i) Construct matrices $A, B, C, Q, R, \tilde{Q}, \tilde{R}$.

(ii) Solve the Riccati equations for $\Pi$ and $\Sigma$.

(iii) Construct $K = R^{-1}B^T\Pi$, $F = \Sigma C^T\tilde{R}^{-1}$, and $A_c = A - BK - FC$.

(iv) Construct $R(A_c) = (I - \Delta tA_c)^{-1}$ and $R(A_c)F = (I - \Delta tA_c)^{-1}F$.

(b) Online

(i) Collect observation $y(t_j)$.

(ii) Time step the discrete compensator system $\hat{w}_{j+1} = R(A_c)\hat{w}_j + \Delta t\,R(A_c)Fy(t_j)$.

(iii) Calculate the voltage $u(t_{j+1}) = -K\hat{w}_{j+1}$.
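The offline step (iv) and the online loop (b) can be sketched in a few lines of Python. The Riccati solutions are not computed here: `Ac`, `F` and `K` below are small hypothetical matrices standing in for the outputs of steps (i) to (iii), and the observation sequence is synthetic (all zeros), so the numbers carry no physical meaning.

```python
def mat_inv(M):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(i == j) for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * r for a, r in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def offline_step_iv(Ac, F, dt):
    """Step (a)(iv): R(Ac) = (I - dt Ac)^{-1} and R(Ac) F (one sensor: F is a vector)."""
    n = len(Ac)
    R_Ac = mat_inv([[float(i == j) - dt * Ac[i][j] for j in range(n)]
                    for i in range(n)])
    R_Ac_F = [sum(R_Ac[i][k] * F[k] for k in range(n)) for i in range(n)]
    return R_Ac, R_Ac_F


def online_loop(R_Ac, R_Ac_F, K, dt, observations, w0):
    """Steps (b)(i)-(iii) for each sample y(t_j): update w, return u(t_{j+1})."""
    w, voltages = list(w0), []
    for y in observations:                                            # (i)
        w = [sum(R_Ac[i][k] * w[k] for k in range(len(w))) + dt * R_Ac_F[i] * y
             for i in range(len(w))]                                  # (ii)
        voltages.append(-sum(Ki * wi for Ki, wi in zip(K, w)))        # (iii)
    return voltages, w


# Hypothetical stand-ins for the offline outputs (not from the experiment).
dt = 1.0e-4
Ac = [[-20.0, 1.0], [-200.0, -2.2]]
F = [20.0, 100.0]
K = [0.0, 2.0]
R_Ac, R_Ac_F = offline_step_iv(Ac, F, dt)
voltages, w_final = online_loop(R_Ac, R_Ac_F, K, dt, [0.0] * 20000, w0=[1.0, 0.0])
print(len(voltages), max(abs(v) for v in w_final))
```

Only matrix-vector products are performed online, which is why the inverse $R(A_c)$ and the product $R(A_c)F$ are formed offline.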

In Table 7.1, we report the dimensions and parameters of our experimental beam structure depicted in Figure 7.17. The aluminum beam parameters $\rho$, $E$, $c_D$ and the lead zirconate titanate piezoceramic patch parameters $\rho_{pe}$, $E_{pe}$, $c_{D_{pe}}$, $d_{31}$ were obtained from the manufacturers.

Numerical simulations were performed to obtain reasonable values of the control parameter matrices $Q$, $R$, $\tilde{Q}$ and $\tilde{R}$. We sought parameters leading to maximum control voltages within the $\pm 100$ V range of the patches while at the same time providing good attenuation. The matrices employed were of the form

$$Q = d_1\begin{bmatrix} K_E + K_{E_{pe}} & 0 \\ 0 & M + M_{pe}\end{bmatrix}, \quad R = r_1 I_{p \times p},\ p = 1, \qquad \tilde{Q} = \tilde{d}_1\begin{bmatrix} I_N & 0 \\ 0 & I_N\end{bmatrix}, \quad \tilde{R} = \tilde{r}_1 I_{s \times s},\ s = 1, \tag{7.55}$$

where $p$ is the number of actuators, $s$ is the number of sensors, $d_1 = 2 \times 10^8$, $r_1 = 0.98$, $\tilde{d}_1 = 1 \times 10^3$, and $\tilde{r}_1 = 1$.

In Figure 7.18, we present a diagram of the experimental setup and implementation of the online component of the Real-Time Control Algorithm. Voltage spikes to the back patch (to excite the beam) were generated by a DS1103 dSpace control system. The excitation signal was low pass filtered (i.e., only the low frequency signal is retained) and amplified before being applied to the back patch. The voltage spike was amplified so as to produce 90 volts at the peak. A proximity probe located at the back of the beam at $x = x_{ob} = 0.11076$ m was used to measure displacements, and the observation readings were digitized through one analog to digital channel of the dSpace hardware. This observation signal enters the online component of the Real-Time Control Algorithm as $y(t_j)$. By employing the discrete modified backward Euler method (7.54), the state estimator $\hat{w}(t_{j+1})$ was obtained and multiplied by the gain matrix $K$ to produce the control voltage. The control signal was then low pass filtered and amplified before being sent to the front patch. A constant time step of $\Delta t = 10^{-4}$ s was employed in running the real-time processor.

FIGURE 7.17: Experimental beam with piezoceramic patches.

In Figures 7.19 and 7.20, we report the uncontrolled and controlled displacements and the control voltage, respectively. Note that the control system has essentially attenuated the displacements after one second.

Project: Control Design

Consider the inverted pendulum system mounted on a motor-driven cart as shown in Figure 7.21. Here, we consider only the two-dimensional problem, where the pendulum moves only in the plane of the paper.

The inverted pendulum is unstable in that it may fall over at any time in any direction. It is desired to keep the pendulum upright in the presence of disturbances (such as a gust of wind acting on the mass or an unexpected force applied to the cart). The slanted pendulum can be brought back to the vertical position when an appropriate control force $u(t)$ is applied to the cart. At the end of each control process, it is also desired to bring the cart back to the origin position $x = 0$, the reference position.

FIGURE 7.18: Experimental setup and implementation of online component of the Real-Time Control Algorithm. [Block diagram labels: voltage spike; dSpace DAC channels 1 and 2; dSpace ADC channel 17; low pass filters and amplifiers; proximity probe, back patch and front patch on the beam; operations “multiply by $R(A_c)F$”, “multiply by $R(A_c)$”, “add”, “store in memory”, and “multiply by gain matrix $K$”; control voltage.]

Design a control system such that, given any initial conditions (caused by disturbances), the pendulum can be brought back to the vertical position and the cart can also be brought back to the reference position ($x = 0$). Thus, this is a regulator problem, and the controller can be designed using the linear quadratic regulator (LQR) technique. We assume the following numerical values for $M$, $m$ and $l$:

$$M = 2\ \text{kg}, \qquad m = 0.1\ \text{kg}, \qquad l = 0.5\ \text{m},$$

and that the pendulum mass is concentrated at the top of the rod and the rod itself is essentially massless.

FIGURE 7.19: Uncontrolled and controlled displacements at $x_{ob} = 0.11075$ m.

FIGURE 7.20: Control voltages.

(i) A mathematical model.


FIGURE 7.21: The inverted pendulum.

To derive the equations of motion for the system, consider the free body diagram of the inverted pendulum system as depicted in Figure 7.22.

FIGURE 7.22: Free body diagram of the inverted pendulum.

The rotational motion of the pendulum rod about the center of gravity of the mass $m$ is described by

$$I\ddot{\theta} = Vl\sin(\theta) - Hl\cos(\theta), \tag{7.56}$$

where $I$ is the moment of inertia of the pendulum rod about its center of gravity. The horizontal motion of the mass $m$ is given by

$$m\frac{d^2}{dt^2}\big(x + l\sin(\theta)\big) = H \tag{7.57}$$

and the vertical motion of the mass $m$ is given by

$$m\frac{d^2}{dt^2}\big(l\cos(\theta)\big) = V - mg. \tag{7.58}$$

Finally, the horizontal motion of the cart is described by

$$M\frac{d^2x}{dt^2} = u - H. \tag{7.59}$$

These equations are nonlinear because of the nonlinearities in (7.56), (7.57) and (7.58) and the coupling between $\theta$ and $x$. In order to apply the linear control theory presented in this chapter, we now linearize the above equations. We assume that the angle $\theta$ is small, so that $\sin(\theta) \approx \theta$ and $\cos(\theta) \approx 1$. This assumption reduces equations (7.56)-(7.59) to

$$I\ddot{\theta} = Vl\theta - Hl, \tag{7.60}$$

$$m(\ddot{x} + l\ddot{\theta}) = H, \tag{7.61}$$

$$0 = V - mg, \tag{7.62}$$

$$M\ddot{x} = u - H. \tag{7.63}$$

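From equations (7.61) and (7.63), we obtain

$$(M + m)\ddot{x} + ml\ddot{\theta} = u. \tag{7.64}$$

From equations (7.60) and (7.62), we have

$$I\ddot{\theta} = mgl\theta - Hl = mgl\theta - l(m\ddot{x} + ml\ddot{\theta})$$

or

$$(I + ml^2)\ddot{\theta} + ml\ddot{x} = mgl\theta. \tag{7.65}$$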

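For this project, we now assume that the moment of inertia $I = 0$. Then (7.65) reduces to $\ddot{x} = g\theta - l\ddot{\theta}$; substituting this into (7.64) gives (7.66), and substituting (7.66) back into $\ddot{x} = g\theta - l\ddot{\theta}$ gives (7.67). Therefore, equations (7.64) and (7.65) can be rewritten as

$$Ml\ddot{\theta} = (M + m)g\theta - u, \tag{7.66}$$

$$M\ddot{x} = u - mg\theta. \tag{7.67}$$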
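The elimination leading to (7.66) and (7.67) is easy to check numerically: for any $\theta$ and $u$, the accelerations they give must satisfy (7.64) and (7.65) with $I = 0$. A short Python check with the parameter values above; $g = 9.81$ m/s$^2$ is our assumption, consistent with the numerical entries of $A$ in (7.68).

```python
import random

M, m, l, g = 2.0, 0.1, 0.5, 9.81   # kg, kg, m, m/s^2 (g assumed)


def linearized_accelerations(theta, u):
    """theta'' and x'' from (7.66) and (7.67)."""
    theta_dd = ((M + m) * g * theta - u) / (M * l)
    x_dd = (u - m * g * theta) / M
    return theta_dd, x_dd


def residuals(theta, u):
    """Residuals of (7.64) and of (7.65) with I = 0; both should vanish."""
    theta_dd, x_dd = linearized_accelerations(theta, u)
    r64 = (M + m) * x_dd + m * l * theta_dd - u
    r65 = m * l ** 2 * theta_dd + m * l * x_dd - m * g * l * theta
    return r64, r65


random.seed(0)
worst = max(max(abs(r) for r in residuals(random.uniform(-0.2, 0.2),
                                          random.uniform(-10.0, 10.0)))
            for _ in range(1000))
print(worst)   # at the level of floating-point round-off
```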


We note that equations (7.66) and (7.67) describe the motion of the inverted-pendulum-on-the-cart system (under the assumption that the angle $\theta$ is small). They constitute a mathematical model of the system. In order to design the feedback control law, we now rewrite equations (7.66) and (7.67) as a system of first order differential equations. To begin, we define the state variables $x_1$, $x_2$, $x_3$ and $x_4$ by

$$x_1 = \theta, \qquad x_2 = \dot{\theta}, \qquad x_3 = x, \qquad x_4 = \dot{x}.$$

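Note that the angle $\theta$ indicates the rotation of the pendulum rod about the point $P$ and $x$ is the location of the cart. We consider $\theta$ and $x$ as the outputs of the system. That is,

$$\vec{y}(t) = \begin{bmatrix} x_1 \\ x_3 \end{bmatrix}.$$

In terms of vector-matrix equations, the state and output equations are described by

$$\begin{bmatrix}\dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4\end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ \frac{M+m}{Ml}g & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ -\frac{m}{M}g & 0 & 0 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{bmatrix} + \begin{bmatrix} 0 \\ -\frac{1}{Ml} \\ 0 \\ \frac{1}{M}\end{bmatrix}u,$$

$$\vec{y}(t) = \begin{bmatrix} y_1 \\ y_2\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{bmatrix}.$$

Using the numerical values for $M$, $m$ and $l$ given previously, we obtain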

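$$\dot{\vec{x}} = A\vec{x} + Bu, \tag{7.68}$$

$$\vec{y} = C\vec{x}, \tag{7.69}$$

where

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 20.601 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ -0.4905 & 0 & 0 & 0\end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ -1 \\ 0 \\ 0.5\end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}.$$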
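The numerical entries of (7.68) follow from the general matrices above with $M = 2$ kg, $m = 0.1$ kg, $l = 0.5$ m and $g = 9.81$ m/s$^2$ (the value of $g$ is inferred from the entries, not stated in the text). A Python sketch using exact fractions so that the entries can be compared exactly:

```python
from fractions import Fraction as Fr

M, m, l, g = Fr(2), Fr(1, 10), Fr(1, 2), Fr("9.81")

A = [[0, 1, 0, 0],
     [(M + m) * g / (M * l), 0, 0, 0],
     [0, 0, 0, 1],
     [-m * g / M, 0, 0, 0]]
B = [0, -1 / (M * l), 0, 1 / M]
C = [[1, 0, 0, 0],
     [0, 0, 1, 0]]

print(float(A[1][0]), float(A[3][0]), [float(b) for b in B])   # 20.601 -0.4905 [0.0, -1.0, 0.0, 0.5]
```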

(ii) Model Analysis and State Feedback Control Design.

1. When u = 0, show that the zero equilibrium state is unstable.

2. Show that the system is controllable and observable.


3. For $u = 0$ and $\vec{x}(0) = [0.1, 0, 0, 0]^T$, compute and plot the solution trajectories. Comment on your solution curves.

4. Use the LQR formulation (use MATLAB routine lqr) to determine a full state observer (estimator) and a stabilizing linear state feedback control law. For the full state observer, use $\tilde{Q} = 50I$ and $\tilde{R} = 2I$, where $I$ is the identity matrix. For the state feedback control law, use $Q = 10I$ and $R = 2I$. Plot the closed-loop system states, the feedback control, and the state estimator when $\vec{x}(0)$ is the same as in part 3 above.

5. Now, let $\tilde{Q} = 500I$ and $Q = 100I$, repeat part 4, and comment on the effects of the choices for the weights on the state trajectories of the closed-loop system and on the control.

References

[1] B.D.O. Anderson and J.B. Moore, Linear Optimal Control, Prentice Hall, Englewood Cliffs, 1971.

[2] P.J. Antsaklis and A.N. Michel, Linear Systems, The McGraw-Hill Companies, Inc., New York, 1997.

[3] H.T. Banks, R.C. Smith and Y. Wang, Smart Material Structures: Modeling, Estimation and Control, Masson/John Wiley, 1996.

[4] S. Barnett and R.G. Cameron, Introduction to Mathematical Control Theory, Oxford Applied Mathematics and Computing Science Series, 2nd ed., 1985.

[5] W.L. Brogan, Modern Control Theory, Prentice Hall, Englewood Cliffs, 3rd ed., 1991.

[6] Chi-Tsong Chen, Linear System Theory and Design, Oxford University Press, Inc., New York, 1999.

[7] R. Datko, Not all feedback stabilized hyperbolic systems are robust with respect to small time delays in their feedbacks, SIAM J. Control and Optimization, 26(3), 1988, pp. 697–713.

[8] R. Datko, J. Lagnese and M.P. Polis, An example on the effect of time delays in boundary feedback stabilization of wave equations, SIAM J. Control and Optimization, 24(1), 1986, pp. 152–156.

[9] E.J. Davison, On pole assignment in multivariable linear systems, IEEE Trans. on AC, AC-13(6), 1968, pp. 747–748.

[10] Jack K. Hale and Sjoerd M. Verduyn Lunel, Introduction to Functional Differential Equations, Applied Mathematical Sciences 99, Springer-Verlag, New York, 1993.

[11] R.E. Kalman, On the general theory of control systems, Proc. First Internl. Cong. IFAC, Moscow, 1960, Automatic and Remote Control, 1961, pp. 481–492.

[12] R.E. Kalman, Mathematical description of linear dynamical systems, SIAM J. on Control, Ser. A, 1(2), 1963, pp. 152–192.

[13] J.D. Lambert, Computational Methods in Ordinary Differential Equations, John Wiley & Sons, New York, 1973.


[14] A. Manitius, G. Payre, R. Roy and H.T. Tran, Computation of eigenvalues associated with functional differential equations, SIAM J. Sci. Stat. Comput., 8(3), 1987, pp. 222–247.

[15] R.C.H. del Rosario, H.T. Tran and H.T. Banks, Proper orthogonal decomposition based control of transverse beam vibrations: Experimental implementation, IEEE Trans. on Control Systems Technology, 10, 2002, pp. 717–726.

[16] W.M. Wonham, On pole assignment in multi-input, controllable linear systems, IEEE Trans. on AC, AC-12(6), 1967, pp. 660–665.

[17] J. Zabczyk, Mathematical Control Theory: An Introduction, Birkhäuser, Boston, 1992.

Chapter 8

Wave Propagation

An area of research in the structural acoustics community which attracted a great deal of interest in the early nineties is the problem of reducing structure-borne noise levels within an acoustic chamber. A specific example in the aerospace industry was motivated by the development of a class of turboprop engines that are very fuel efficient (see also Section 6.1). These engines, however, produce low-frequency but high-amplitude acoustic fields, which in turn cause vibrations in the fuselage, leading to unwanted interior noise through acoustic/structure interactions. As discussed earlier in Section 6.1, both passive and active control techniques were considered for this problem in frequency domain as well as time domain settings. In addition, mathematical models and approximation techniques were developed for both 2-D and 3-D coupled structural acoustics problems (see, e.g., [1, 3, 4]). In particular, the mathematical models consist of an exterior noise source which is separated from an interior cavity by a thin elastic structure (a beam, plate or shell). The dynamical equations for the shell, plate and beam models under appropriate assumptions are known as the Donnell-Mushtari shell, Love-Kirchhoff plate and Euler-Bernoulli beam equations, respectively [5]. In this chapter, the development of the wave equation for the interior acoustic pressure will be considered.

Mathematically we think of sound as perturbations of pressure and density from the “static state” of a fluid. Therefore, we will first consider the working tool of fluid mechanics, the Navier-Stokes equation, which is merely Newton’s Second Law of Motion applied to a fluid element.

8.1 Fluid Dynamics

We are all familiar with fluids: ocean waves, air, blood, and so on. The applications of fluid mechanics cover an incredibly broad range, from hydraulics, aerodynamics, physical oceanography, atmospheric dynamics, and wind engineering, to cardiovascular medicine and biofluids. Understanding of fluid dynamics has been one of the major advances of physics, applied mathematics and engineering over the last hundred years. Indeed, the modern design of aircraft, spacecraft, automobiles, and land and marine structures, as well as the study of effluent discharge into the sea and of motions of the atmosphere, depends on a clear understanding of the relevant fluid mechanics.

As we shall see, the study of fluids is not so simple, because a fluid is extended over space, and when a fluid moves (because of motion of its boundaries) forces are transmitted to the interior of the fluid by the fluid itself. However, underlying all of fluid dynamics are the empirically verified physical principles of conservation of mass, momentum, and energy, combined with the laws of thermodynamics. These principles have already been discussed earlier in other applications such as mass transport and heat conduction. We will see in this section how they are applied to obtain the equations of motion for a fluid.

We begin by introducing some of the terms and concepts which are often used when describing fluids and fluid motion mathematically.

8.1.1 Newton’s Law of Viscosity

A fluid differs from a solid in that a fluid will not come into equilibrium with a shearing force. That is, solids change their shape and deform until a balance is reached between the applied force and internal forces (otherwise the material breaks). Fluids, on the other hand, deform continuously (that is, flow) under the action of a shearing force. One way to imagine a shearing force is to think of a fluid as a stack of thin layers, like a fluid deck of cards. Now, set the cards on a table, rest your hand on top of the deck, then move the hand horizontally. The layers of cards slide over one another, with the top cards being displaced the most, and the bottom card not at all. Unlike a solid, if your hand continues to move horizontally, the top cards will continue moving. This describes how a fluid moves in a non-turbulent flow, one layer sliding over another, although different fluids oppose this motion by different amounts. This resistance of the fluid to shearing forces is called viscosity. Like the bottom card, the fluid layer in contact with a solid surface does not move relative to that surface; this is the “no-slip” boundary condition. One can see it by considering the motion of a fluid between two parallel plates of area $A$, which are separated by a distance $\Delta y$ (see Figure 8.1). At time $t = 0$ the lower plate starts moving to the right horizontally at a constant velocity $\Delta v_x$. As time increases, the fluid velocity profile is as depicted in Figure 8.2. In fact, for large $t$ the fluid velocity distribution is linear along the $y$-direction (see Figure 8.3). If one measures the force required to keep the lower plate moving with constant velocity $\Delta v_x$, one finds that it is proportional to the velocity $\Delta v_x$ and to the area $A$ of the plates, and inversely proportional to the distance $\Delta y$. That is,

$$F = -\mu A\frac{\Delta v_x}{\Delta y},$$

where the constant of proportionality $\mu$ is called the viscosity of the fluid. Now taking the limit as $\Delta y \to 0$, we have

$$F = -\mu A\frac{dv_x}{dy}. \tag{8.1}$$


It is customary to rewrite the above equation in a more explicit form. We define $\tau_{yx}$ as the shear stress in the x-direction on a fluid surface whose normal vector is parallel to the y-direction. Equation (8.1) can then be rewritten as

$$\tau_{yx} = -\mu\frac{dv_x}{dy}. \tag{8.2}$$

This simply states that the shear stress is proportional to the negative of the velocity gradient, which is the same behavior that one encounters in heat conduction as discussed earlier in Chapter 5 (where the heat flux is proportional to the negative of the temperature gradient). This is known as Newton's law of viscosity, and fluids or gases that behave in this fashion are called Newtonian fluids. There are, however, quite a few industrially important materials (pastes and highly polymeric materials) which do not obey the relation (8.2); that is, the relation between $\tau_{yx}$ and $dv_x/dy$ is not linear. These fluids are referred to as non-Newtonian fluids. For non-Newtonian fluids, the viscosity $\mu$ is not constant but a function of either the shear stress or the velocity gradient. The subject of non-Newtonian flow is beyond the scope of this book (we refer the interested reader to [6]).
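As a small numerical illustration of (8.2) in the steady state of Figure 8.3, where the velocity profile is linear, the sketch below evaluates the shear stress and plate force. The operating values are assumptions chosen for illustration and are not taken from the text: water at 20 °C ($\mu = 1.0019$ cp from Table 8.1), a lower plate moving at 0.1 m/s, an upper plate at rest, a 1 mm gap, and a plate area of 0.5 m². The function name `shear_stress_yx` is hypothetical.

```python
def shear_stress_yx(mu, v_top, v_bottom, gap):
    """Newton's law of viscosity, tau_yx = -mu * dv_x/dy, for a linear velocity profile.

    mu       : dynamic viscosity in Pa*s (1 cp = 1e-3 Pa*s)
    v_top    : x-velocity of the upper plate (m/s)
    v_bottom : x-velocity of the lower plate (m/s)
    gap      : plate separation Delta y (m)
    """
    dvx_dy = (v_top - v_bottom) / gap  # constant velocity gradient in steady state
    return -mu * dvx_dy


# Assumed example: water at 20 C, lower plate at 0.1 m/s, upper plate at rest,
# 1 mm gap, plate area 0.5 m^2.
mu_water = 1.0019e-3                   # Pa*s
tau = shear_stress_yx(mu_water, v_top=0.0, v_bottom=0.1, gap=1e-3)
force = tau * 0.5                      # force needed to keep the plate moving, N
print(f"tau_yx = {tau:.5f} Pa, F = {force:.6f} N")
```

The positive value of $\tau_{yx}$ is consistent with the sign convention in (8.2). The velocity decreases with $y$, so x-momentum flows in the $+y$ direction, from the moving lower plate toward the stationary upper plate.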

FIGURE 8.1: A fluid initially at rest between two parallel plates.

The shear stress in equation (8.2) can also be interpreted as a flux of x-directed momentum in the y direction, which is equivalent to a force per unit area. Recall that by flux is meant "rate of flow per unit area." Hence, momentum flux has units of momentum per unit area per unit time. The negative sign in equation (8.2) indicates that momentum tends to flow in the direction of decreasing velocity. It should be emphasized that the discussion above is limited to laminar flow. That is, at low velocities the fluid flows without lateral mixing, and adjacent layers slide past one another like a deck of playing cards. There are no eddies or swirls of fluid. But not all flow is laminar. At higher velocities the fluid swirls erratically; this is called turbulent flow. The onset of turbulent flow depends on the fluid's speed, its viscosity, its density, and the size of the obstacle it encounters. These variables are combined into the so-called Reynolds number, named after the Irish mathematician and physicist Osborne Reynolds (1842-1912). For flow through a straight circular pipe of diameter $D$, the Reynolds number, which is dimensionless, is defined by

$$\text{Reynolds number} = \frac{\text{density}\times D\times\text{flow speed}}{\text{viscosity}}.$$

For flow in a circular pipe, the flow is laminar when the Reynolds number is less than 2100 and turbulent when it is over 4000. In the transition region, where the Reynolds number lies between 2100 and 4000, the flow can be either laminar or turbulent [7].

FIGURE 8.2: Transient velocity profile of a fluid between two parallel plates. (At $t = 0$ the lower plate moves at the constant velocity $\Delta v_x$ because a steady force $F$ is applied. For small $t$, the layer immediately adjacent to the bottom plate is carried along at the velocity of the plate, the layer just above it moves at a slightly slower velocity, and so on.)
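The definition and the pipe-flow thresholds above can be turned into a short calculation. The sketch below assumes water at 20 °C with a density of about 998.2 kg/m³ (an assumed value not given in the text) and the viscosity from Table 8.1, flowing in a hypothetical 2 cm pipe. The function names are illustrative only, and the classification simply applies the 2100 and 4000 thresholds quoted in the text.

```python
def reynolds_number(density, diameter, speed, viscosity):
    """Re = density * D * flow speed / viscosity (dimensionless; consistent SI units)."""
    return density * diameter * speed / viscosity


def pipe_flow_regime(re):
    """Classify circular-pipe flow with the thresholds quoted in the text."""
    if re < 2100:
        return "laminar"
    if re > 4000:
        return "turbulent"
    return "transition"


# Assumed example: water at 20 C (rho ~ 998.2 kg/m^3, mu = 1.0019e-3 Pa*s)
# in a 2 cm pipe at two different mean speeds.
for speed in (0.1, 0.5):
    re = reynolds_number(998.2, 0.02, speed, 1.0019e-3)
    print(f"flow speed {speed} m/s: Re = {re:.0f} ({pipe_flow_regime(re)})")
```

At 0.1 m/s this gives a Reynolds number just under 2000, so the flow is laminar. Increasing the speed fivefold pushes the same pipe flow well into the turbulent range.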

The viscosity of gases at low density increases with increasing temperature (see, e.g., [6]). In liquids, however, the viscosity usually decreases with increasing temperature. For many applications viscosity can be ignored, and in this case the fluid is called inviscid. Another important distinction arises from whether the fluid is compressible or incompressible. This distinction, which will be made clear mathematically later in the chapter, expresses the elastic properties of the fluid. It describes how the density of the fluid changes in response to changes in pressure and temperature.

FIGURE 8.3: Fluid shear in steady state between two parallel plates. (For large $t$, the steady-state velocity distribution in the fluid is linear.)

In the cgs (centimeter-gram-second) system, in which the unit of force is the dyne, the unit of viscosity is the poise (1 poise = 1 dyne·s/cm² = 1 g/(cm·s)). The centipoise (cp), one hundredth of a poise, is also commonly used. The poise is named in honor of Jean Poiseuille (1797-1869), who developed an improved method for measuring blood pressure. In the International System of Units (SI), the unit of force is the newton (N) and the unit of viscosity is the newton-second per square meter (N·s/m²). One cp is $1\times10^{-3}$ N·s/m². Finally, viscosity is sometimes given as the ratio of the viscosity to the mass density of the fluid, and this ratio is called the kinematic viscosity. That is, the kinematic viscosity is given by

$$\nu = \mu/\rho,$$

where $\rho$ is the fluid mass density (mass per unit volume). The kinematic viscosity $\nu$ has units of m²/s (SI) or cm²/s (cgs). Table 8.1 gives some experimental viscosity data for gases and liquids at a pressure of one atm [6]. Note that at room temperature the viscosity is about one cp for water and about 0.02 cp for air.


TABLE 8.1: Viscosity values of some gases and liquids at atmospheric pressure.

Gas     Temp. (°C)   µ (cp)       Liquid     Temp. (°C)   µ (cp)
Air        0         0.01716      Water          0        1.787
Air       20         0.01813      Water         20        1.0019
Air      100         0.02173      Water        100        0.2821
CO2       20         0.0146       C2H5OH        20        1.194
N2        20         0.0175       Hg            20        1.547
CH4       20         0.0109       Glycerol      20        1069
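To illustrate $\nu = \mu/\rho$ and the cp to SI conversion, the sketch below computes the kinematic viscosities of water and air at 20 °C from the Table 8.1 values. The densities it uses, about 998.2 kg/m³ for water and 1.204 kg/m³ for air at 1 atm, are assumed reference values that do not appear in the text. The helper name is hypothetical.

```python
CP_TO_PA_S = 1.0e-3  # 1 centipoise = 1e-3 N*s/m^2


def kinematic_viscosity(mu_cp, density):
    """nu = mu / rho in m^2/s, with mu given in cp and density in kg/m^3."""
    return mu_cp * CP_TO_PA_S / density


# Viscosities at 20 C from Table 8.1; the densities are assumed values.
nu_water = kinematic_viscosity(1.0019, 998.2)   # water, rho ~ 998.2 kg/m^3
nu_air = kinematic_viscosity(0.01813, 1.204)    # air at 1 atm, rho ~ 1.204 kg/m^3
print(f"water: {nu_water:.3e} m^2/s, air: {nu_air:.3e} m^2/s")
```

Although air is about 55 times less viscous than water in the dynamic sense, its much lower density makes its kinematic viscosity about 15 times larger.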

8.1.2 Derivative in Fluid Flows

Consider a function $f(t, x, y, z)$ representing some fluid field variable (such as the velocity, the pressure, or the density). We want to know how to compute its rate of change with respect to time. Using the chain rule, we obtain

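$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}. \tag{8.3}$$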

In the above expression the term $\partial f/\partial t$ means the rate of change of $f$ with respect to $t$ at a fixed position $(x, y, z)$. This is called a partial derivative or Eulerian description, named after the Swiss mathematician Leonhard Euler (1707-1783).

We can also find the rate of change of $f$ as we follow the fluid or the material. In this case the extended derivative (8.3) must be used, with $(x, y, z)$ taken to be the position of a moving fluid element, so that $dx/dt = v_x$, $dy/dt = v_y$, and $dz/dt = v_z$ are the components of the fluid velocity. Such a form is called the convective derivative (or, in some textbooks, the material derivative or Lagrangian description, named in honor of the French mathematician and mathematical physicist Joseph-Louis Lagrange (1736-1813)). We denote this derivative by

$$\frac{Df}{Dt} = \frac{\partial f}{\partial t} + v_x\frac{\partial f}{\partial x} + v_y\frac{\partial f}{\partial y} + v_z\frac{\partial f}{\partial z} = \frac{\partial f}{\partial t} + (\vec v\cdot\nabla)f.$$

What this means is that the Lagrangian rate of change in time of any fluid quantity $f$ is made up of two parts: the rate of change in time of $f$ at the instantaneous spatial position of the fluid element, and the rate of change of $f$ due to the fact that the fluid element is moving from one place to another.
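The two-part decomposition can be checked numerically. The sketch below uses a hypothetical field $f(t,x,y) = tx + y^2$ and an assumed uniform velocity $(v_x, v_y) = (1, 2)$, neither of which comes from the text. It compares $\partial f/\partial t + v_x\,\partial f/\partial x + v_y\,\partial f/\partial y$ with the time derivative of $f$ recorded along the path of one fluid element. All derivatives are taken by central differences, which are exact up to rounding for this quadratic field.

```python
def f(t, x, y):
    """Hypothetical scalar field (for example a temperature) chosen for illustration."""
    return t * x + y**2


VX, VY = 1.0, 2.0  # assumed constant fluid velocity components


def material_derivative(t, x, y, h=1e-6):
    """Df/Dt = df/dt + vx df/dx + vy df/dy, each partial taken by central differences."""
    f_t = (f(t + h, x, y) - f(t - h, x, y)) / (2 * h)
    f_x = (f(t, x + h, y) - f(t, x - h, y)) / (2 * h)
    f_y = (f(t, x, y + h) - f(t, x, y - h)) / (2 * h)
    return f_t + VX * f_x + VY * f_y


def rate_along_path(t, x0, y0, h=1e-6):
    """d/dt of f(t, x(t), y(t)) for the element with x(t) = x0 + VX t, y(t) = y0 + VY t."""
    def g(s):
        return f(s, x0 + VX * s, y0 + VY * s)
    return (g(t + h) - g(t - h)) / (2 * h)


# The element that starts at (0.3, -1.0) is at (0.8, 0.0) when t = 0.5.
t = 0.5
print(material_derivative(t, 0.8, 0.0))  # about 1.3
print(rate_along_path(t, 0.3, -1.0))     # about 1.3 as well
print((f(t + 1e-6, 0.8, 0.0) - f(t - 1e-6, 0.8, 0.0)) / 2e-6)  # Eulerian df/dt alone: about 0.8
```

The Eulerian rate at the fixed point (0.8, 0) is only 0.8. The extra 0.5 comes from the element moving into regions where $f$ is larger.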

8.1.3 Equations of Fluid Motion

In this section we consider fluid motion as determined by equations of motion. By this we mean, in the traditional Newtonian sense, an equation which relates the acceleration of the fluid to the forces that generate the motion. When we consider the motion of a solid (baseballs, rockets, etc.), it is fairly easy to think of the solid as moving in response to the forces applied to it. Fluids, on the other hand, are slippery, and it is more difficult to think about what it is that is being pushed around. We will get around this obstacle by thinking about an infinitesimally small volume of fluid within the whole body of fluid. In the sequel we will refer to this small volume of fluid as the fluid element; it is taken to be small enough that we can consider it a single point or particle.

The equations which we will derive are known as the Navier-Stokes equations. These equations are coupled with the continuity equation and the equation of state to describe all problems of the viscous flow of a pure isothermal fluid. For nonisothermal fluids, and for multicomponent fluid mixtures, additional equations are needed to describe the conservation of energy and the conservation of individual chemical species. In this section, we will restrict our discussion to isothermal and Newtonian systems.

(a) Continuity Equation

Consider a stationary volume element $\Delta x\,\Delta y\,\Delta z$ which is fixed in space as depicted in Figure 8.4. The equation of continuity is developed by considering a mass balance for a fluid flowing through this stationary fluid element:

rate of mass accumulation = rate of mass in - rate of mass out.

Here, it is assumed that we are dealing with fluids which do not spontaneously generate or destroy material, and we are not concerned with any regions with sources or sinks of material (e.g., taps and plug-holes).

FIGURE 8.4: A fluid element fixed in space through which a fluid is flowing.

We begin by considering the rate of mass entering and leaving the faces perpendicular to the x-axis (see Figure 8.4). Since the product of the mass density with the velocity is the mass flux, the rate of mass entering the face at $x$ is $(\rho v_x)|_x\,\Delta y\,\Delta z$ and the rate leaving the face at $x + \Delta x$ is $(\rho v_x)|_{x+\Delta x}\,\Delta y\,\Delta z$. The rates of mass entering and leaving the faces perpendicular to the y- and z-axes are derived similarly and are omitted here. The rate of mass accumulation within the fluid element is $(\Delta x\,\Delta y\,\Delta z)(\partial\rho/\partial t)$. Substituting all these expressions into the mass balance equation, we obtain

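$$\Delta x\,\Delta y\,\Delta z\,\frac{\partial\rho}{\partial t} = \Delta y\,\Delta z\left[(\rho v_x)|_x - (\rho v_x)|_{x+\Delta x}\right] + \Delta x\,\Delta z\left[(\rho v_y)|_y - (\rho v_y)|_{y+\Delta y}\right] + \Delta x\,\Delta y\left[(\rho v_z)|_z - (\rho v_z)|_{z+\Delta z}\right].$$

Now, dividing both sides by $\Delta x\,\Delta y\,\Delta z$ and taking the limits as $\Delta x$, $\Delta y$, $\Delta z$ approach zero, we get

$$\frac{\partial\rho}{\partial t} = -\left[\frac{\partial(\rho v_x)}{\partial x} + \frac{\partial(\rho v_y)}{\partial y} + \frac{\partial(\rho v_z)}{\partial z}\right].$$

More conveniently, we may rewrite the above equation in vector form as follows:

$$\frac{\partial\rho}{\partial t} = -\nabla\cdot(\rho\vec v). \tag{8.4}$$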

This is called the equation of continuity. It describes the rate of change of density at a fixed point and is precisely the same equation that was derived in Chapter 4 in the context of mass balance (see equation (4.2)). We can convert equation (8.4) into another form by carrying out the partial differentiation:

$$\underbrace{\frac{\partial\rho}{\partial t} + v_x\frac{\partial\rho}{\partial x} + v_y\frac{\partial\rho}{\partial y} + v_z\frac{\partial\rho}{\partial z}}_{D\rho/Dt} = \underbrace{-\rho\left[\frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z}\right]}_{-\rho(\nabla\cdot\vec v)}.$$

Hence, equation (8.4) becomes

$$\frac{D\rho}{Dt} = -\rho(\nabla\cdot\vec v), \tag{8.5}$$

where the notation $D/Dt$ denotes the convective derivative as defined earlier. This equation describes the rate of change of density as seen by an observer floating along with the fluid.

In any case, equation (8.4) or (8.5) is simply a statement of conservation of mass. Furthermore, these equations can be derived for a fluid element of arbitrary shape instead of the rectangular fluid element used above.
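The box balance behind (8.4) is exactly what a finite-volume scheme discretizes: each cell gains what flows in through one face and loses what flows out through the other. The sketch below is a one-dimensional version under several assumptions not made in the text. It uses a periodic grid, a constant velocity $v > 0$, and a first-order upwind choice for the face fluxes. Because every face flux is added to one cell and subtracted from its neighbor, the total mass stays constant up to rounding.

```python
import numpy as np


def continuity_step(rho, v, dx, dt):
    """One explicit finite-volume step of d(rho)/dt = -d(rho v)/dx on a periodic 1D grid.

    The update is the box balance: accumulation in cell i equals the mass flux in
    through its left face minus the flux out through its right face. Assumes a
    constant velocity v > 0, so the upwind flux through face i+1/2 is rho_i * v.
    """
    flux_out = rho * v                 # flux through the right face of each cell
    flux_in = np.roll(flux_out, 1)     # flux through the left face (periodic wrap)
    return rho + dt / dx * (flux_in - flux_out)


# Assumed example: a density bump advected with v = 1 on 100 cells of width 0.01.
dx, dt, v = 0.01, 0.005, 1.0           # v*dt/dx = 0.5 keeps the upwind scheme stable
x = np.arange(100) * dx
rho = 1.0 + 0.1 * np.sin(2.0 * np.pi * x)
mass0 = rho.sum() * dx
for _ in range(200):
    rho = continuity_step(rho, v, dx, dt)
print(f"initial mass {mass0:.12f}, final mass {rho.sum() * dx:.12f}")
```

The upwind flux is only one possible choice. It keeps the sketch short and, for $v\,\Delta t/\Delta x \le 1$, keeps the density within its initial bounds, at the price of some numerical smoothing of the bump.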

Often in engineering applications with liquids, which are relatively incompressible, the density $\rho$ is essentially constant. In this case,

$$\nabla\cdot\vec v = 0.$$

This is the incompressibility condition, and it applies to both viscous and inviscid fluids. Note that for the above equation to be valid, $\rho$ must remain constant for a fluid element as it moves along a path following the fluid motion (that is, $D\rho/Dt = 0$ in the Lagrangian description).
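The incompressibility condition can be checked pointwise for a given velocity field. The sketch below estimates $\nabla\cdot\vec v$ by central differences for two hypothetical planar fields ($v_z = 0$) chosen for illustration. The first is a stagnation-point flow $\vec v = (x, -y)$, which satisfies the condition. The second is a uniformly expanding flow $\vec v = (x, y)$, for which (8.5) gives $D\rho/Dt = -2\rho$, so such a flow is possible only if the density changes.

```python
def divergence_2d(vx, vy, x, y, h=1e-5):
    """Central-difference estimate of dvx/dx + dvy/dy at the point (x, y)."""
    dvx_dx = (vx(x + h, y) - vx(x - h, y)) / (2.0 * h)
    dvy_dy = (vy(x, y + h) - vy(x, y - h)) / (2.0 * h)
    return dvx_dx + dvy_dy


# Hypothetical planar fields (v_z = 0):
# stagnation-point flow v = (x, -y) and a uniformly expanding flow v = (x, y).
div_stagnation = divergence_2d(lambda x, y: x, lambda x, y: -y, 0.3, -0.7)
div_expanding = divergence_2d(lambda x, y: x, lambda x, y: y, 0.3, -0.7)
print(div_stagnation, div_expanding)  # approximately 0 and 2
```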

(b) Momentum Equations

Fundamental to the derivation of the equations of motion is the idea of applying the physical principle of conservation of linear momentum to an arbitrary fluid element, which takes the form:

(rate of momentum accumulation) = (rate of momentum in) − (rate of momentum out) + (sum of forces acting on system).

As in the derivation of the continuity equation, we will apply the above momentum balance to a stationary fluid element of volume $\Delta x\,\Delta y\,\Delta z$ as shown in Figure 8.5. We begin by considering the rates at which the x component of the momentum enters through each of the surfaces. The y and z components of the momentum can be treated analogously.

FIGURE 8.5: A fluid element of volume $\Delta x\,\Delta y\,\Delta z$ fixed in space through which the x-component of the momentum is transported.

Momentum flows into and out of a fluid element by two mechanisms. One mechanism is the bulk fluid flow (convection). The second mechanism is the velocity gradient (molecular transfer). This can be seen by considering the interaction between two adjacent layers of a fluid which have different velocities (and hence different momenta). The random motions of the molecules in the faster moving layer send some of the molecules into the slower moving layer, where they collide with slower moving molecules and thus speed them up (increase their momentum). Similarly, molecules from the slower moving layer slow down those in the faster moving layer. This exchange of molecules between layers produces a transfer of momentum by virtue of the velocity gradient (from high velocity to low velocity layers).

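The rate at which the x component of momentum enters the face at $x$ by convection is

$$\big[(\text{rate of mass in})\,v_x\big]_x = (\rho v_x v_x)|_x\,\Delta y\,\Delta z.$$

The rate at which it leaves the face at $x + \Delta x$ is

$$\big[(\text{rate of mass out})\,v_x\big]_{x+\Delta x} = (\rho v_x v_x)|_{x+\Delta x}\,\Delta y\,\Delta z.$$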

Note that the product of $\rho v_x$ with $v_x$ gives the momentum flux (with units of momentum per unit area per unit time). The rate at which the x component of momentum enters the face at $y$ is $(\rho v_y v_x)|_y\,\Delta x\,\Delta z$. Similar expressions can be written for the remaining three faces. Combining these momentum flows, we obtain the following expression for the net convective x-momentum flow into the fluid element $\Delta x\,\Delta y\,\Delta z$:

$$\Delta y\,\Delta z\left[\rho v_x v_x|_x - \rho v_x v_x|_{x+\Delta x}\right] + \Delta x\,\Delta z\left[\rho v_y v_x|_y - \rho v_y v_x|_{y+\Delta y}\right] + \Delta x\,\Delta y\left[\rho v_z v_x|_z - \rho v_z v_x|_{z+\Delta z}\right].$$

Analogously, the net x component of momentum entering by molecular transfer is given by

$$\Delta y\,\Delta z\left[\tau_{xx}|_x - \tau_{xx}|_{x+\Delta x}\right] + \Delta x\,\Delta z\left[\tau_{yx}|_y - \tau_{yx}|_{y+\Delta y}\right] + \Delta x\,\Delta y\left[\tau_{zx}|_z - \tau_{zx}|_{z+\Delta z}\right].$$

These products of molecular momentum fluxes with areas may be considered as friction forces due to shearing. Here, $\tau_{xx}$ is the normal stress on the x face, $\tau_{zx}$ is the x-directed tangential (or shear) stress on the z face, and $\tau_{yx}$ is the x-directed shear stress on the y face.

There are two different kinds of forces acting on the fluid element. The first are called body forces, which act throughout the whole volume of the fluid element (not just on its surfaces). For fluids in a gravitational field (as on earth, where it is effectively constant), this body force is the gravitational force. The gravitational force $g_x$ per unit mass in the x direction is multiplied by the mass $\rho\,\Delta x\,\Delta y\,\Delta z$ of the element to give the body force

$$\rho g_x\,\Delta x\,\Delta y\,\Delta z.$$

The second kind of forces exist because the fluid element is surrounded by other fluid elements. For a fluid element in the body of a fluid, the rest of the fluid can only exert forces by contact, that is, only at the surface of the fluid element. These forces are called surface forces. Apart from the viscous stresses already accounted for above as molecular momentum transfer, they are the pressure forces. The net pressure force acting on the fluid element in the x direction is the difference between the force acting at the face $x$ and that at $x + \Delta x$. This is written as

$$\Delta y\,\Delta z\,(p|_x - p|_{x+\Delta x}).$$

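Finally, the rate of momentum accumulation in the fluid element of volume $\Delta x\,\Delta y\,\Delta z$ is given by

$$\Delta x\,\Delta y\,\Delta z\,\frac{\partial(\rho v_x)}{\partial t}.$$

Substituting the above expressions into the momentum balance equation, we obtain the following expression:

$$\begin{aligned}
\Delta x\,\Delta y\,\Delta z\,\frac{\partial(\rho v_x)}{\partial t} ={}& \Delta y\,\Delta z\left[\rho v_x v_x|_x - \rho v_x v_x|_{x+\Delta x}\right] + \Delta x\,\Delta z\left[\rho v_y v_x|_y - \rho v_y v_x|_{y+\Delta y}\right]\\
&+ \Delta x\,\Delta y\left[\rho v_z v_x|_z - \rho v_z v_x|_{z+\Delta z}\right] + \Delta y\,\Delta z\left[\tau_{xx}|_x - \tau_{xx}|_{x+\Delta x}\right]\\
&+ \Delta x\,\Delta z\left[\tau_{yx}|_y - \tau_{yx}|_{y+\Delta y}\right] + \Delta x\,\Delta y\left[\tau_{zx}|_z - \tau_{zx}|_{z+\Delta z}\right]\\
&+ \Delta y\,\Delta z\,(p|_x - p|_{x+\Delta x}) + \rho g_x\,\Delta x\,\Delta y\,\Delta z.
\end{aligned}$$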

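Dividing by $\Delta x\,\Delta y\,\Delta z$ and then taking the limits as $\Delta x$, $\Delta y$, $\Delta z$ approach zero, we obtain

$$\frac{\partial}{\partial t}(\rho v_x) = -\left(\frac{\partial}{\partial x}\rho v_x v_x + \frac{\partial}{\partial y}\rho v_y v_x + \frac{\partial}{\partial z}\rho v_z v_x\right) - \left(\frac{\partial}{\partial x}\tau_{xx} + \frac{\partial}{\partial y}\tau_{yx} + \frac{\partial}{\partial z}\tau_{zx}\right) - \frac{\partial p}{\partial x} + \rho g_x.$$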

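The y and z components of the equation of motion, which can be obtained analogously, are given by

$$\frac{\partial}{\partial t}(\rho v_y) = -\left(\frac{\partial}{\partial x}\rho v_x v_y + \frac{\partial}{\partial y}\rho v_y v_y + \frac{\partial}{\partial z}\rho v_z v_y\right) - \left(\frac{\partial}{\partial x}\tau_{xy} + \frac{\partial}{\partial y}\tau_{yy} + \frac{\partial}{\partial z}\tau_{zy}\right) - \frac{\partial p}{\partial y} + \rho g_y$$

and

$$\frac{\partial}{\partial t}(\rho v_z) = -\left(\frac{\partial}{\partial x}\rho v_x v_z + \frac{\partial}{\partial y}\rho v_y v_z + \frac{\partial}{\partial z}\rho v_z v_z\right) - \left(\frac{\partial}{\partial x}\tau_{xz} + \frac{\partial}{\partial y}\tau_{yz} + \frac{\partial}{\partial z}\tau_{zz}\right) - \frac{\partial p}{\partial z} + \rho g_z,$$

respectively. In vector form, these equations become

$$\frac{\partial}{\partial t}\rho\vec v = -\nabla\cdot\rho\vec v\vec v - \nabla\cdot\boldsymbol{\tau} - \nabla p + \rho\vec g, \tag{8.6}$$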

where

• $\frac{\partial}{\partial t}\rho\vec v$ is the rate of increase of momentum per unit volume;

• $\rho\vec v\vec v$ is the "dyadic product" of $\rho\vec v$ and $\vec v$, and $\nabla\cdot\rho\vec v\vec v$ represents the rate of momentum loss by convection per unit volume. It should be cautioned that $\nabla\cdot\rho\vec v\vec v$ is not the simple divergence of a vector field, because of the tensorial nature of $\rho\vec v\vec v$;

• $\nabla\cdot\boldsymbol{\tau}$ is the stress tensor term from viscous (molecular) transfer. Again, this is not the simple divergence of a vector field;

• $\nabla p$ is the pressure force per unit volume;

• $\rho\vec g$ is the gravitational force on the fluid element per unit volume.

Using the equation of continuity (8.4), we can rewrite equation (8.6) as

$$\rho\frac{D\vec v}{Dt} = -\nabla p - [\nabla\cdot\boldsymbol{\tau}] + \rho\vec g, \tag{8.7}$$

where

• $\rho\,D\vec v/Dt$ is the mass per unit volume times acceleration;

• $\nabla p$ is the pressure force on the fluid element per unit volume;

• $\nabla\cdot\boldsymbol{\tau}$ is the viscous force on the fluid element per unit volume;

• $\rho\vec g$ is the gravitational force on the fluid element per unit volume.

Hence, in this form the momentum balance equation is simply the statement

$$\text{mass}\times\text{acceleration} = \sum\text{forces},$$

which shows that the momentum balance is equivalent to Newton's second law of motion.

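In order to use these equations to determine the velocity profile of the fluid, we must now express the stresses in terms of the velocities and fluid properties. For Newtonian fluids, the stress components in rectangular coordinates are

$$\begin{aligned}
\tau_{xx} &= -2\mu\frac{\partial v_x}{\partial x} + \left(\tfrac{2}{3}\mu - \kappa\right)(\nabla\cdot\vec v),\\
\tau_{yy} &= -2\mu\frac{\partial v_y}{\partial y} + \left(\tfrac{2}{3}\mu - \kappa\right)(\nabla\cdot\vec v),\\
\tau_{zz} &= -2\mu\frac{\partial v_z}{\partial z} + \left(\tfrac{2}{3}\mu - \kappa\right)(\nabla\cdot\vec v),\\
\tau_{xy} = \tau_{yx} &= -\mu\left(\frac{\partial v_x}{\partial y} + \frac{\partial v_y}{\partial x}\right),\\
\tau_{yz} = \tau_{zy} &= -\mu\left(\frac{\partial v_y}{\partial z} + \frac{\partial v_z}{\partial y}\right),\\
\tau_{zx} = \tau_{xz} &= -\mu\left(\frac{\partial v_z}{\partial x} + \frac{\partial v_x}{\partial z}\right).
\end{aligned}$$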

Here, $\kappa$ is called the bulk viscosity. It is identically zero for low-density monatomic gases and is often neglected for dense gases and liquids. For the remainder of this section, $\kappa$ is assumed to be zero. These expressions for the stresses are more general statements of Newton's law of viscosity and describe flows in which the velocity varies in all three directions. When the fluid flows in the x direction between two parallel plates as depicted in Figure 8.1, so that $\vec v = (v_x(y), 0, 0)$, the above expressions reduce to equation (8.2). Since a detailed derivation of these general expressions for the stresses is beyond the scope of this book, we refer the interested reader to [9].
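The six component formulas can be packed into one matrix expression, $\boldsymbol\tau = -\mu(\nabla\vec v + \nabla\vec v^{T}) + (\tfrac23\mu - \kappa)(\nabla\cdot\vec v)\mathbf I$. The sketch below is a minimal NumPy version of that expression. The indexing convention `grad_v[i, j]` $= \partial v_j/\partial x_i$, the function name, and the numerical values (water viscosity, a shear rate of 100 s⁻¹) are assumptions made for illustration. The sketch checks two consequences of the formulas. For simple shear $v_x = Gy$ it recovers (8.2). With $\kappa = 0$ the viscous stress tensor is traceless for any velocity gradient.

```python
import numpy as np


def newtonian_stress(grad_v, mu, kappa=0.0):
    """Viscous stress tensor of a Newtonian fluid in the sign convention used above.

    grad_v[i, j] holds dv_j/dx_i with (0, 1, 2) = (x, y, z), so that
    tau[i, j] = -mu (dv_j/dx_i + dv_i/dx_j) + (2/3 mu - kappa) (div v) delta_ij.
    """
    div_v = np.trace(grad_v)
    return -mu * (grad_v + grad_v.T) + (2.0 / 3.0 * mu - kappa) * div_v * np.eye(3)


mu, G = 1.0019e-3, 100.0          # assumed: water at 20 C, shear rate 100 1/s
shear = np.zeros((3, 3))
shear[1, 0] = G                    # simple shear v_x = G y, so dv_x/dy = G
tau = newtonian_stress(shear, mu)
print(tau[1, 0], -mu * G)          # tau_yx agrees with equation (8.2)

rng = np.random.default_rng(0)
tau_general = newtonian_stress(rng.normal(size=(3, 3)), mu)
print(np.trace(tau_general))       # ~0: with kappa = 0 the viscous stress is traceless
```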

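Substituting the general stress components into the momentum balance equation (8.7), we obtain the general equations of motion for a Newtonian viscous fluid with varying density and viscosity:

$$\begin{aligned}
\rho\frac{Dv_x}{Dt} ={}& -\frac{\partial p}{\partial x} + \frac{\partial}{\partial x}\left[2\mu\frac{\partial v_x}{\partial x} - \tfrac{2}{3}\mu(\nabla\cdot\vec v)\right] + \frac{\partial}{\partial y}\left[\mu\left(\frac{\partial v_x}{\partial y} + \frac{\partial v_y}{\partial x}\right)\right] + \frac{\partial}{\partial z}\left[\mu\left(\frac{\partial v_z}{\partial x} + \frac{\partial v_x}{\partial z}\right)\right] + \rho g_x,\\
\rho\frac{Dv_y}{Dt} ={}& -\frac{\partial p}{\partial y} + \frac{\partial}{\partial y}\left[2\mu\frac{\partial v_y}{\partial y} - \tfrac{2}{3}\mu(\nabla\cdot\vec v)\right] + \frac{\partial}{\partial x}\left[\mu\left(\frac{\partial v_y}{\partial x} + \frac{\partial v_x}{\partial y}\right)\right] + \frac{\partial}{\partial z}\left[\mu\left(\frac{\partial v_z}{\partial y} + \frac{\partial v_y}{\partial z}\right)\right] + \rho g_y,\\
\rho\frac{Dv_z}{Dt} ={}& -\frac{\partial p}{\partial z} + \frac{\partial}{\partial z}\left[2\mu\frac{\partial v_z}{\partial z} - \tfrac{2}{3}\mu(\nabla\cdot\vec v)\right] + \frac{\partial}{\partial x}\left[\mu\left(\frac{\partial v_z}{\partial x} + \frac{\partial v_x}{\partial z}\right)\right] + \frac{\partial}{\partial y}\left[\mu\left(\frac{\partial v_z}{\partial y} + \frac{\partial v_y}{\partial z}\right)\right] + \rho g_z.
\end{aligned}$$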

These equations, along with the equation of continuity, the equation of state, and boundary and initial conditions, are used to determine the pressure, density, and velocity components in a flowing isothermal fluid.

The above equations are seldom used in their general forms. In particular, when the density and viscosity are both constant (so that $\nabla\cdot\vec v = 0$ by the equation of continuity), the equations simplify and we obtain the equations of motion for incompressible Newtonian fluids with constant viscosity. These equations are the celebrated Navier-Stokes equations, which Navier first derived in 1822 and Stokes obtained independently in 1845. In vector form they are given by

$$\rho\frac{D\vec v}{Dt} = -\nabla p + \mu\nabla^2\vec v + \rho\vec g.$$
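The start-up flow of Figures 8.1 to 8.3 is a simple case of these equations. Take $\vec v = (v_x(t,y), 0, 0)$ with no imposed pressure gradient and gravity neglected. Then $D v_x/Dt = \partial v_x/\partial t$, and the x component reduces to $\partial v_x/\partial t = \nu\,\partial^2 v_x/\partial y^2$, with $\nu = \mu/\rho$. The sketch below solves this reduced equation with explicit finite differences, in dimensionless units. The plate speed, gap, $\nu$, grid size and time step are all assumed values. The no-slip conditions fix the fluid velocity at each plate. The computed profile starts concentrated near the moving plate, as in Figure 8.2, and approaches the linear profile of Figure 8.3.

```python
import numpy as np


def couette_startup(t_end, V=1.0, gap=1.0, nu=1.0, ny=21):
    """Explicit finite differences for dv/dt = nu d^2v/dy^2 between two plates.

    The lower plate (y = 0) moves at speed V from t = 0, the upper plate (y = gap)
    is at rest, and the fluid is initially at rest. All parameters are assumed values.
    """
    dy = gap / (ny - 1)
    dt = 0.4 * dy**2 / nu            # explicit scheme needs nu*dt/dy^2 <= 1/2
    r = nu * dt / dy**2
    v = np.zeros(ny)                 # fluid at rest for t < 0
    v[0] = V                         # no-slip: fluid at y = 0 moves with the plate
    for _ in range(int(round(t_end / dt))):
        # the right-hand side is evaluated before assignment, so this is an explicit step;
        # v[0] = V and v[-1] = 0 are never updated (no slip at both plates)
        v[1:-1] = v[1:-1] + r * (v[2:] - 2.0 * v[1:-1] + v[:-2])
    return np.linspace(0.0, gap, ny), v


y, v_early = couette_startup(0.05)   # transient profile (compare Figure 8.2)
y, v_late = couette_startup(2.0)     # nearly steady profile (compare Figure 8.3)
print(v_early[10], v_late[10])       # mid-gap velocity: well below 0.5, then close to 0.5
```

An explicit scheme was chosen only for brevity. It needs $\nu\,\Delta t/\Delta y^2 \le 1/2$ for stability, which is why the time step is tied to the grid spacing.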

When $\nabla\cdot\boldsymbol{\tau} = 0$, equation (8.7) reduces to

$$\rho\frac{D\vec v}{Dt} = -\nabla p + \rho\vec g.$$

This is the famous Euler equation, which is used to study flows in which viscous effects are negligible. It was first derived in 1755 by the Swiss mathematician Leonhard Euler (1707-1783).

(c) Equation of State

So far, we have one equation for conservation of mass and three equations for momentum balance. This gives us a total of four equations for the five unknowns $v_x$, $v_y$, $v_z$, $\rho$, and $p$. Therefore, we need one more equation to determine the pressure, velocities, and density of a fluid flow. This equation, called the equation of state, describes the relationship between pressure and density. It is an empirical or constitutive relationship of the form $p = f(\rho)$. For example, in an adiabatic environment we have $p = C\rho^\gamma$ for some constants $C$ and $\gamma$.

For the special case of low-density fluids or gases, pressure and density are proportional:

$$p = A\rho.$$

This is basically Boyle's law, where the proportionality constant $A$ is a linear function of the temperature $\theta$, $A = R(\theta - \theta_0)$. Here, $\theta_0$ is a reference temperature which is the same for all gases at low densities. Denote

$$T = \theta - \theta_0.$$

If $\theta$ is in degrees Celsius, $\theta_0 = -273.15\,^\circ$C. Hence

$$T = \theta_{\text{Celsius}} + 273.15$$

is the absolute temperature in kelvins (K). We then obtain

$$p = RT\rho.$$

This equation of state says that the pressure is proportional to both the temperature and the density (equivalently, inversely proportional to the specific volume $1/\rho$). A gas satisfying this equation over an extended range of pressures and temperatures is known as a perfect (or ideal) gas. No real gas obeys this law exactly, but at ordinary temperatures and at pressures of no more than several atmospheres, the ideal gas law gives a very good approximation, to within a few percent of the actual values [7]. In the equation of state, $R$ is the gas law constant and is equal to $\mathcal{R}/M$, where $\mathcal{R}$ is the universal gas constant (8314.3 kg·m²/(kmol·s²·K)) and $M$ is the molecular weight.
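As a worked use of $p = RT\rho$ with $R = \mathcal{R}/M$, the sketch below computes the density of air at one atmosphere and 20 °C. It assumes a molecular weight of about 28.97 kg/kmol for dry air and a pressure of 101325 Pa; neither value is given in the text. The helper names are hypothetical.

```python
R_UNIVERSAL = 8314.3  # kg m^2 / (kmol s^2 K), the value quoted in the text


def gas_constant(molecular_weight):
    """Gas law constant R = R_universal / M, in J/(kg K) for M in kg/kmol."""
    return R_UNIVERSAL / molecular_weight


def ideal_gas_density(p, celsius, molecular_weight):
    """Solve p = R T rho for rho, with T the absolute temperature in kelvins."""
    T = celsius + 273.15
    return p / (gas_constant(molecular_weight) * T)


# Assumed example: dry air (M ~ 28.97 kg/kmol) at 1 atm = 101325 Pa and 20 C.
rho_air = ideal_gas_density(101325.0, 20.0, 28.97)
print(f"R_air = {gas_constant(28.97):.1f} J/(kg K), rho_air = {rho_air:.3f} kg/m^3")
```

The resulting density, about 1.20 kg/m³, is the familiar value for room air.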


8.2 Fluid Waves

In this section we will consider the phenomenon of waves. Our interest is not only in the particular example considered here, which is sound waves, but also in the fact that waves appear in many contexts throughout physics. Sound waves, visible light waves, radio waves, microwaves, electromagnetic waves, water waves, sine waves, cosine waves, earthquake waves, and waves on a string are just a few examples of the waves we encounter daily. In addition to waves, there are a variety of phenomena in our physical world which resemble waves so closely that we can describe them as being wavelike. These include the motion of a pendulum and the motion of a mass suspended by a spring (as described in Chapter 2), which can be thought of as wavelike phenomena.

In general, a wave can be described as a disturbance, usually oscillating, which travels through a medium, transporting energy from one location (its source) to another location without the transport (e.g., convection) of the material medium. Each individual particle of the medium is temporarily displaced and then returns to its original equilibrium position. There are three types of waves: mechanical waves, electromagnetic waves, and matter waves. Mechanical waves require a material medium (such as air, water, or a string) to travel. These waves are further divided into three different types. Transverse waves cause particles of the medium to move perpendicular to the direction of the wave. Longitudinal waves are waves in which particles of the medium move in a direction parallel to the direction of the wave (a sound wave is a classic example of a longitudinal wave). Surface waves cause particles of the medium to undergo a circular motion, so they are neither transverse nor longitudinal. The second type of waves, electromagnetic waves, do not require a medium to travel (light, radio). Finally, matter waves are associated with electrons and other particles.

8.2.1 Terminology

We begin by reviewing some basic terminology and concepts that are commonly used to describe waves.

(a) Travelling Wave

Travelling waves are waves that have both spatial and temporal variations. For example, a sinusoidal travelling wave is represented by

$$A\cos(kx - \omega t),$$

where $\omega$ is the angular frequency (or simply the frequency) of the wave. The angular frequency specifies how the wave oscillates in time. The SI unit of $\omega$ is rad/s. Another related measurement, called the frequency, specifies the number of vibrations per second and is given by

$$f = \frac{\omega}{2\pi}.$$

The SI unit of $f$ is 1/s, which has been given the name hertz (Hz) after the German physicist Heinrich Hertz (1857-1894). Knowing the wave frequency, we can determine how long it takes for the wave to execute one oscillation. This is called the period and is defined by

$$T = \frac{1}{f} = \frac{2\pi}{\omega}.$$

In addition, we can find the wave velocity from the frequency of a wave and its wavelength $\lambda$ by the relation

$$u = \lambda f = \frac{\omega}{k}.$$

Finally, it is important to point out that as time increases, the phase $(kx - \omega t)$ of the wave shifts to lower values, so for a point on the wave to keep a fixed phase, $x$ must increase (that is, the wave shifts to the right). Thus, the function $A\cos(kx - \omega t)$ represents a wave travelling in the direction of increasing $x$ for $\omega > 0$.
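The relations $f = \omega/2\pi$, $T = 2\pi/\omega$, $\lambda = 2\pi/k$ and $u = \omega/k$, together with the rightward drift of the phase, can be checked numerically. The sketch below uses an assumed 343 Hz wave with a 1 m wavelength. It locates a crest of $\cos(kx - \omega t)$ on a fine grid at two nearby times and compares the crest's speed with $\omega/k$. The function names are illustrative.

```python
import numpy as np


def wave_parameters(omega, k):
    """Frequency, period, wavelength and phase speed of A cos(kx - omega t)."""
    f = omega / (2.0 * np.pi)
    return {"f": f, "T": 1.0 / f, "wavelength": 2.0 * np.pi / k, "u": omega / k}


def crest_position(k, omega, t, x):
    """Grid point of x where cos(kx - omega t) is largest (first one if tied)."""
    return x[np.argmax(np.cos(k * x - omega * t))]


# Assumed example: a 343 Hz wave with a 1 m wavelength.
k = 2.0 * np.pi / 1.0
omega = 2.0 * np.pi * 343.0
params = wave_parameters(omega, k)
x = np.linspace(0.0, 1.0, 100001)
dt = 1.0e-3
speed = (crest_position(k, omega, dt, x) - crest_position(k, omega, 0.0, x)) / dt
print(params["u"], speed)   # both about 343 m/s; the crest moved toward larger x
```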

(b) Standing Wave

Standing waves, also known as stationary waves, are waves that have spatial variation but no temporal variation. For example, a sinusoidal stationary wave is described by

$$A\cos kx,$$

where $A$, the amplitude of the wave, is the distance from a crest to the equilibrium level of the wave. The amplitude is used as a measure of the energy transferred by the wave. The wave number $k$ specifies how the wave oscillates in space. It is related to the wavelength $\lambda$, which is the distance between successive peaks (the highest points) or successive troughs (the lowest points), by the relation

$$\lambda = \frac{2\pi}{k}.$$

A second type of standing wave is formed by the superposition of two travelling waves with the same amplitude travelling in opposite directions. Here, the standing wave oscillates in time and space, but the wave crests do not move. An example of a sinusoidal standing wave is given by

$$A\cos kx\cos\omega t = \frac{A}{2}\cos(kx - \omega t) + \frac{A}{2}\cos(kx + \omega t).$$
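The identity above follows from $\cos a\cos b = \tfrac12[\cos(a-b) + \cos(a+b)]$. The sketch below checks it numerically for arbitrary illustrative values of $A$, $k$ and $\omega$, which are not taken from the text. It also confirms that the nodes, where $\cos kx = 0$, stay at rest for all $t$. That is the sense in which the pattern stands still even though the two component waves travel.

```python
import numpy as np


def standing_wave(A, k, omega, x, t):
    """A cos(kx) cos(omega t)."""
    return A * np.cos(k * x) * np.cos(omega * t)


def two_travelling_waves(A, k, omega, x, t):
    """(A/2) cos(kx - omega t) + (A/2) cos(kx + omega t)."""
    return 0.5 * A * (np.cos(k * x - omega * t) + np.cos(k * x + omega * t))


# Arbitrary illustrative values (not from the text).
A, k, omega = 2.0, 3.0, 5.0
x = np.linspace(0.0, 4.0, 401)
for t in (0.0, 0.37, 1.2):
    assert np.allclose(standing_wave(A, k, omega, x, t), two_travelling_waves(A, k, omega, x, t))

# The nodes x = (pi/2 + n pi)/k never move.
nodes = (np.pi / 2 + np.pi * np.arange(4)) / k
print(np.max(np.abs(standing_wave(A, k, omega, nodes, 0.8))))  # close to 0
```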


8.2.2 Sound Waves

Sound is a wave which is created by vibrating objects and propagated through a medium from one location to another. In order to formulate the propagation of sound in space, it is essential to have some understanding of the basic mechanisms and phenomena. Fundamentally, the sound wave is transported from one location to another by means of particle interactions. If the sound wave is moving through the air, then as one air particle is displaced from its equilibrium position, it exerts a push or pull on its nearest neighbors, causing them to be displaced from their equilibrium positions. This particle interaction continues throughout the entire medium, with each particle interacting with and disturbing its nearest neighbors. Because a sound wave is a disturbance which is transported through a medium via the mechanism of particle interaction, it is characterized as a mechanical wave.

We now seek to formulate such a process mathematically. We begin by noting that sound waves occur in a medium when there are variations in the pressure. In addition, we would like to describe how the density of the medium changes as it is displaced. Then, of course, each medium particle is displaced and has a velocity, so we also have to describe the velocity of the medium particles. In summary, the physics of sound waves involves three features:

(i) The medium particles move and change the density;

(ii) The change in density corresponds to a change in pressure;

(iii) Pressure changes cause medium particle motion.

Let us first consider feature (ii). For a medium (gas, liquid, or solid), the pressure is some function of density. That is,

$$p = f(\rho). \tag{8.8}$$

Turning to feature (i), even with a change in density, individual fluid elements still conserve their mass, and so we also have the continuity equation

$$\frac{\partial\rho}{\partial t} = -\nabla\cdot(\rho\vec v). \tag{8.9}$$

We now consider the third feature, which is the equation of motion produced by pressure changes. Euler's equations for an inviscid flow are still valid here, since they were derived from the rate of change of linear momentum of a fluid element. Hence, when the dynamical effects of gravity can be neglected (as we shall assume), we have

$$\rho\frac{\partial\vec v}{\partial t} + \rho(\vec v\cdot\nabla)\vec v = -\nabla p. \tag{8.10}$$


As discussed earlier, mathematically we think of sound as perturbations of pressure and density from the "static state" of a fluid. Therefore, we begin by describing the static state. We let $\rho_0(r)$ be the static density, $p_0$ the static pressure, and $\vec v_0$ the velocity of the fluid, where $r = (x, y, z)$ represents a point in space. For our case we take $\vec v_0 = \vec 0$, so the fluid is unmoving in the silent (static) case. Substituting this into Euler's equation (8.10) gives $\nabla p_0 = \vec 0$, so $p_0$ is a constant. Note that $\rho_0(r)$ is not necessarily constant, which allows for damping material in the acoustic cavity.

To introduce sound into the system, we use small perturbations of the above quantities, denoted by $\rho(t, r)$, $p(t, r)$, and $\vec v(t, r)$ for the perturbations of density, pressure, and velocity, respectively. The total quantities (marked here with a tilde; they are the quantities written without a tilde in (8.8)-(8.10)) are then

$$\tilde\rho(t, r) = \rho_0(r) + \rho(t, r), \tag{8.11}$$

$$\tilde p(t, r) = p_0 + p(t, r), \tag{8.12}$$

$$\tilde{\vec v}(t, r) = \vec v_0(t, r) + \vec v(t, r).$$

Since $\vec v_0 = \vec 0$, the total velocity coincides with its perturbation, $\tilde{\vec v} = \vec v$, so for our discussion we drop the tilde and use $\vec v$ for the perturbation in the velocity. Note that in some textbooks the total density $\tilde\rho(t, r)$ is written as $\rho_0(r)[1 + \delta(t, r)]$, which in our formulation corresponds to $\rho(t, r) = \rho_0(r)\delta(t, r)$.

(a) Assumptions

For the derivation of the wave equations we make the following standing assumptions:

(i) In the absence of sound, the fluid is in static equilibrium, as previously described by the quantities $\rho_0(r)$, $p_0 = c_1$ for some constant $c_1$, and $\vec v_0 = \vec 0$;

(ii) We are considering a non-viscous fluid;

(iii) The only energy in acoustic motion is mechanical;

(iv) There is zero heat conductivity, which is related to (iii);

(v) The only forces affecting our system are compressive elastic forces.

(b) Linearization

We now linearize the three system equations (8.8), (8.9), and (8.10) about the steady, silent case. Our guiding principle is to retain the first-order terms in $\vec v$ and $\rho$ (or $\delta$) in order to obtain equations for the fluctuations $\rho$ and $p$ from the static case. We disregard the higher-order terms on the assumption that only small perturbations are considered.


8.2.2.1 Euler’s Equation

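Substituting equations (8.11) and (8.12) into Euler's equation (8.10), we obtain

$$(\rho_0 + \rho)\frac{\partial\vec v}{\partial t} + (\rho_0 + \rho)(\vec v\cdot\nabla)\vec v = -\nabla(p_0 + p).$$

Since $\nabla p_0 = \vec 0$ and the convective term $(\rho_0 + \rho)(\vec v\cdot\nabla)\vec v$ is at least quadratic in the small quantities, this becomes

$$\rho_0\frac{\partial\vec v}{\partial t} + \underline{\rho\frac{\partial\vec v}{\partial t}} = -\nabla p.$$

The underlined term is also a higher-order term (both $\rho$ and $\vec v$ are "small"), so we disregard it. Hence we obtain

$$\rho_0\frac{\partial\vec v}{\partial t} = -\nabla p.$$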

8.2.2.2 Equation of Continuity

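We repeat this process for the continuity equation (8.9) and use the fact that $\rho_0$ does not depend on $t$ to obtain

$$\frac{\partial}{\partial t}(\rho_0 + \rho) + \nabla\cdot\big((\rho_0 + \rho)\vec v\big) = 0,$$

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho_0\vec v) + \underline{\nabla\cdot(\rho\vec v)} = 0.$$

After disregarding the underlined higher-order term, the above equation reduces to

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho_0\vec v) = 0.$$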

8.2.2.3 Equation of State

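We use a first-order approximation of our function $f$, noting that $p_0 = f(\rho_0)$. Writing $\tilde p = p_0 + p$ and $\tilde\rho = \rho_0 + \rho$ for the total pressure and density, we have

$$\tilde p = f(\tilde\rho) = f(\rho_0 + \rho) \approx f(\rho_0) + f'(\rho_0)\rho = p_0 + f'(\rho_0)\rho.$$

Now, since $\tilde p - p_0 = p$, we obtain

$$p = f'(\rho_0)\rho,$$

or $p = c^2\rho$, where $c^2 \equiv f'(\rho_0) = \left.\dfrac{\partial p}{\partial\rho}\right|_{\rho_0}$ and $c$ is the speed of sound in the static material. An associated parameter, the compressibility $K \equiv \dfrac{1}{c^2\rho_0}$, often appears in the equation $K\dfrac{\partial p}{\partial t} = -\nabla\cdot\vec v$, which is the linearized equation of continuity when $\rho_0$ is constant.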
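To see what $c^2 = f'(\rho_0)$ gives in practice, the sketch below uses the adiabatic equation of state $p = C\rho^\gamma$ mentioned earlier in this chapter. It makes the following assumptions, none of which come from the text: air with $\gamma = 1.4$, a static pressure of 101325 Pa, and a static density of 1.204 kg/m³, with $C$ chosen so that $p_0 = C\rho_0^\gamma$. The derivative is taken by central differences. For this law it should agree with the closed form $c^2 = \gamma p_0/\rho_0$, which follows from $f'(\rho) = \gamma C\rho^{\gamma-1} = \gamma p/\rho$.

```python
GAMMA = 1.4          # assumed ratio of specific heats for air
P0 = 101325.0        # assumed static pressure, Pa (1 atm)
RHO0 = 1.204         # assumed static density of air at 20 C, kg/m^3
C_ADIABATIC = P0 / RHO0**GAMMA   # fixes C so that p0 = C * rho0**gamma


def f(rho):
    """Adiabatic equation of state p = C rho^gamma (an assumed example of p = f(rho))."""
    return C_ADIABATIC * rho**GAMMA


def sound_speed(rho0, h=1e-6):
    """c = sqrt(f'(rho0)), with f' approximated by a central difference."""
    c2 = (f(rho0 + h) - f(rho0 - h)) / (2.0 * h)
    return c2 ** 0.5


c = sound_speed(RHO0)
K = 1.0 / (c**2 * RHO0)   # compressibility K = 1/(c^2 rho0), in 1/Pa
print(f"c = {c:.1f} m/s, K = {K:.3e} 1/Pa")
```

The result, about 343 m/s, matches the familiar speed of sound in room air. This suggests that the adiabatic law, rather than the constant-temperature law $p = RT\rho$, is the appropriate choice of $f$ for sound in gases.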

In summary, the first order equations for sound are:


• $\rho_0\dfrac{\partial\vec v}{\partial t} = -\nabla p$ (Euler);

• $\dfrac{\partial\rho}{\partial t} + \nabla\cdot(\rho_0\vec v) = 0$ (Continuity);

• $p = c^2\rho$ (State).

8.2.3 Wave Equations

From our three linearized system equations we can derive three wave equations, one of which is the popular $\phi_{tt} = c^2\Delta\phi$ for the acoustic potential. In general we assume that $c^2 = c^2(r) = f'(\rho_0(r))$.

(a) Wave equation in pressure

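Using the linearized state equation $\rho = p/c^2$ in the linearized continuity equation, we obtain

$$\frac{\partial}{\partial t}\left(\frac{p}{c^2}\right) = -\nabla\cdot(\rho_0\vec v).$$

Taking $\partial/\partial t$ on both sides of the equation (recall that $c^2$ and $\rho_0$ do not depend on $t$) and using the linearized Euler equation yields

$$\frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = -\nabla\cdot\left(\rho_0\frac{\partial\vec v}{\partial t}\right) = -\nabla\cdot(-\nabla p) = \Delta p,$$

that is,

$$\frac{\partial^2 p}{\partial t^2} = c^2\Delta p.$$

Thus the pressure perturbation $p$ satisfies the classical wave equation.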
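The pressure wave equation can be integrated numerically in one space dimension. The sketch below uses the standard leapfrog (central-difference) scheme on a periodic interval. The following choices are all assumptions for illustration: the domain $[0,1)$, constant $c = 1$, the initial data $p(0,x) = \sin 2\pi x$ with $p_t(0,x) = 0$, the grid size, and the CFL number. For these initial data the exact solution is the standing wave $\sin(2\pi x)\cos(2\pi c t)$, which gives a direct accuracy check.

```python
import numpy as np


def solve_pressure_wave(c=1.0, n=200, cfl=0.5, t_end=1.0):
    """Leapfrog finite differences for p_tt = c^2 p_xx on the periodic interval [0, 1).

    Initial data p(0, x) = sin(2 pi x), p_t(0, x) = 0, whose exact solution is
    p(t, x) = sin(2 pi x) cos(2 pi c t), a standing wave.
    """
    dx = 1.0 / n
    x = np.arange(n) * dx
    dt = cfl * dx / c                       # stability requires c*dt/dx <= 1
    steps = int(round(t_end / dt))
    r2 = (c * dt / dx) ** 2

    def lap(u):                             # second difference u[i+1] - 2u[i] + u[i-1], periodic
        return np.roll(u, -1) - 2.0 * u + np.roll(u, 1)

    p_prev = np.sin(2.0 * np.pi * x)        # p at t = 0
    p = p_prev + 0.5 * r2 * lap(p_prev)     # p at t = dt from a Taylor step using p_t = 0
    for _ in range(steps - 1):
        p_prev, p = p, 2.0 * p - p_prev + r2 * lap(p)
    return x, p, steps * dt


x, p, t = solve_pressure_wave()
exact = np.sin(2.0 * np.pi * x) * np.cos(2.0 * np.pi * t)
print(f"t = {t:.3f}, max error = {np.max(np.abs(p - exact)):.2e}")
```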

(b) Wave equation in velocity

We can derive a wave equation in ~v, with some added restrictions, whichwill become evident. We begin by taking ∂

∂t of the linearized Euler equationto obtain

ρ0∂2~v

∂t2= −∇pt.

But from the continuity equation and the state equation we have

pt = c2ρt = c2[−∇ · (ρ0~v)]= −c2∇ · (ρ0~v).

Therefore,

ρ0∂2~v

∂t2= −∇[−c2∇ · (ρ0~v)]

ρ0∂2~v

∂t2= ∇[c2∇ · (ρ0~v)],


which is a wave equation in velocity with non-constant $\rho_0(\vec r)$. If we assume that $\rho_0$ is constant, then $c^2 = f'(\rho_0)$ is also constant, and dividing by $\rho_0$ gives

$$\frac{\partial^2\vec v}{\partial t^2} = c^2\nabla(\nabla\cdot\vec v).$$

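We next use the vector identity $\nabla\times(\nabla\times\vec w) = \nabla(\nabla\cdot\vec w) - \Delta\vec w$ (see Appendix B) to obtain

$$\frac{\partial^2\vec v}{\partial t^2} = c^2\Delta\vec v + c^2\nabla\times(\nabla\times\vec v).$$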

Thus, if the flow is irrotational ($\nabla\times\vec v = 0$), we obtain the standard wave equation for $\vec v$ in the case of constant-density irrotational flow:

$$\frac{\partial^2\vec v}{\partial t^2} = c^2\Delta\vec v.$$
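The identity $\nabla\times(\nabla\times\vec w) = \nabla(\nabla\cdot\vec w) - \Delta\vec w$ used above can be spot-checked numerically. In the Python sketch below, the differential operators are built from central differences (an illustrative choice), and both sides are evaluated at one point for an arbitrary polynomial field $\vec w$; the field, the point and the step `H` are assumptions. For this field both sides equal $(y - 6z,\ 3x - 2y,\ 2z)$, i.e. $(-7.3, 2.3, 2.2)$ at the chosen point.

```python
# Finite-difference check of curl(curl w) = grad(div w) - laplacian(w).
H = 1e-3  # step for central differences (assumed)


def partial(f, k):
    """Return a function approximating df/dx_k by a central difference."""
    def df(p):
        pp, pm = list(p), list(p)
        pp[k] += H
        pm[k] -= H
        return (f(pp) - f(pm)) / (2.0 * H)
    return df


def curl(w):
    """Curl of a vector field given as a list of three scalar functions."""
    return [
        lambda p: partial(w[2], 1)(p) - partial(w[1], 2)(p),
        lambda p: partial(w[0], 2)(p) - partial(w[2], 0)(p),
        lambda p: partial(w[1], 0)(p) - partial(w[0], 1)(p),
    ]


def div(w):
    return lambda p: sum(partial(w[k], k)(p) for k in range(3))


def grad(f):
    return [partial(f, k) for k in range(3)]


def laplacian(f):
    return lambda p: sum(partial(partial(f, k), k)(p) for k in range(3))


# An arbitrary smooth test field w = (x^2 y + z^3, y z^2 - x, x y z).
w = [
    lambda p: p[0] ** 2 * p[1] + p[2] ** 3,
    lambda p: p[1] * p[2] ** 2 - p[0],
    lambda p: p[0] * p[1] * p[2],
]

point = [0.3, -0.7, 1.1]
lhs = [comp(point) for comp in curl(curl(w))]
rhs = [g(point) - laplacian(wk)(point) for g, wk in zip(grad(div(w)), w)]
print("curl curl w          :", [round(v, 6) for v in lhs])
print("grad div w - lap w   :", [round(v, 6) for v in rhs])
```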

(c) Popular wave equation (potential)

To obtain the third wave equation we have to assume constant density and irrotational flow. But when can we expect the flow to be irrotational? We define $\vec\omega = \nabla\times\vec v$ as the vorticity and return to Euler’s equation

$$\rho_0\frac{\partial\vec v}{\partial t} = -\nabla p.$$

Now take the curl of both sides:

$$\nabla\times\left(\rho_0\frac{\partial\vec v}{\partial t}\right) = -\nabla\times(\nabla p) = 0.$$

If $\rho_0$ is constant, the above equation yields $\frac{\partial}{\partial t}(\nabla\times\vec v) = 0$. Hence, if we assume that $\vec\omega|_{t=0} = 0$ (no initial vorticity), integrating $\frac{\partial}{\partial t}(\nabla\times\vec v) = 0$ in time gives $\nabla\times\vec v = 0$ for all $t$, which is irrotational flow. So the assumptions of constant $\rho_0$ and no initial vorticity imply that $\vec\omega = \nabla\times\vec v = 0$.

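Moreover, recall that if $\operatorname{curl}\vec v = \nabla\times\vec v = 0$ (on a simply connected region), then there exists a scalar function $\phi$ such that $\vec v = -\nabla\phi$; $\phi$ is called the velocity potential. Note that $\phi$ is not unique: adding any function of $t$ alone to $\phi$ leaves $\nabla\phi$ unchanged. Now return to the linearized Euler equation and recall that we assumed $\rho_0$ is constant. We have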

$$\rho_0\frac{\partial\vec v}{\partial t} = -\nabla p, \qquad -\rho_0\nabla\phi_t = -\nabla p, \qquad \nabla(\rho_0\phi_t - p) = 0.$$

This implies that $\rho_0\phi_t - p$ must be constant with respect to the spatial variables. Hence,

$$\rho_0\phi_t - p = -k(t),$$

or $p = \rho_0\phi_t + k(t)$.

We claim that without loss of generality we can take $k \equiv 0$. To see this, define another potential by

$$\tilde\phi = \phi + \frac{1}{\rho_0}\int_0^t k(s)\,ds.$$

Then $\nabla\tilde\phi = \nabla\phi = -\vec v$ still holds, while $\rho_0\tilde\phi_t = \rho_0\phi_t + k(t) = p$; that is, we can use $\tilde\phi$ as the velocity potential. Therefore, without loss of generality (dropping the tilde),

$$p = \rho_0\phi_t. \qquad (8.13)$$

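Combining the continuity equation, the state equation and (8.13) with constant $\rho_0$, we obtain

$$\begin{aligned}
\frac{\partial\rho}{\partial t} &= -\nabla\cdot(\rho_0\vec v),\\
\frac{\partial}{\partial t}\left(\frac{p}{c^2}\right) &= \nabla\cdot(\rho_0\nabla\phi),\\
\frac{\partial}{\partial t}\left(\frac{\rho_0\phi_t}{c^2}\right) &= \rho_0\Delta\phi,\\
\frac{\rho_0\phi_{tt}}{c^2} &= \rho_0\Delta\phi,\\
\phi_{tt} &= c^2\Delta\phi,
\end{aligned}$$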

which is the usual wave equation for the velocity potential.
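As a concrete check of this chain of relations, consider the plane-wave potential $\phi = \cos(\vec k\cdot\vec r - \omega t)$ with $\omega = c|\vec k|$, an assumed example with illustrative values of $\rho_0$, $c$ and $\vec k$. The Python sketch evaluates, by finite differences at one space-time point, the normalized residual of $\phi_{tt} = c^2\Delta\phi$, and then forms $p = \rho_0\phi_t$ and $\vec v = -\nabla\phi$. For a plane wave the ratio of $p$ to the velocity component along $\vec k$ is $\rho_0\omega/|\vec k| = \rho_0 c$ (the characteristic impedance of the medium), which the sketch also confirms.

```python
import math

RHO0 = 1.204                 # equilibrium density in kg/m^3 (assumed)
C = 343.0                    # sound speed in m/s (assumed, constant)
K_VEC = (10.0, -4.0, 7.0)    # wave vector in 1/m (assumed)
K_NORM = math.sqrt(sum(k * k for k in K_VEC))
OMEGA = C * K_NORM           # dispersion relation omega = c |k|


def phi(t, r):
    """Plane-wave velocity potential phi(t, r) = cos(k . r - omega t)."""
    return math.cos(sum(k * x for k, x in zip(K_VEC, r)) - OMEGA * t)


def shifted(r, k, h):
    """Copy of the point r with coordinate k shifted by h."""
    s = list(r)
    s[k] += h
    return s


t0, r0 = 1.3e-3, [0.2, 0.5, -0.1]   # evaluation point (assumed)
ht, hx = 1e-7, 1e-4                # finite-difference steps in t and x

# Residual of phi_tt = c^2 * Laplacian(phi), via second differences.
phi_tt = (phi(t0 + ht, r0) - 2 * phi(t0, r0) + phi(t0 - ht, r0)) / ht**2
lap = sum(
    (phi(t0, shifted(r0, k, hx)) - 2 * phi(t0, r0) + phi(t0, shifted(r0, k, -hx))) / hx**2
    for k in range(3)
)
wave_residual = (phi_tt - C**2 * lap) / OMEGA**2    # normalized; should be ~0

# Pressure p = rho0 * phi_t and velocity v = -grad(phi), via central differences.
p = RHO0 * (phi(t0 + ht, r0) - phi(t0 - ht, r0)) / (2 * ht)
v = [-(phi(t0, shifted(r0, k, hx)) - phi(t0, shifted(r0, k, -hx))) / (2 * hx) for k in range(3)]
v_along_k = sum(vk * kk for vk, kk in zip(v, K_VEC)) / K_NORM

print(f"normalized wave-equation residual: {wave_residual:.1e}")
print(f"p / (v . k_hat) = {p / v_along_k:.2f}, rho0 * c = {RHO0 * C:.2f}")
```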

(d) Summary

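No assumptions beyond linearization were needed to obtain $p_{tt} = c^2\Delta p$, where we have used the linearized state equation $p = c^2\rho$. With the additional assumptions that $\rho_0$ is constant and that there is no initial vorticity, we also have (i) $\partial^2\vec v/\partial t^2 = c^2\Delta\vec v$, where $c^2$ is constant, and (ii) there exists a scalar function $\phi$ such that $\vec v = -\nabla\phi$ and $p = \rho_0\phi_t$, which implies that $\phi_{tt} = c^2\Delta\phi$.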

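Now consider what happens when $\rho_0$ is not constant in space. Since $\rho_0$ does not depend on $t$, we can rewrite the linearized Euler equation

$$\rho_0\frac{\partial\vec v}{\partial t} = -\nabla p$$

as

$$\frac{\partial}{\partial t}(\rho_0\vec v) = -\nabla p,$$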


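and define the “vorticity” as $\vec\omega \equiv \nabla\times(\rho_0\vec v)$. We again assume that the initial vorticity is zero ($\vec\omega(0) = 0$) and take the curl of Euler’s equation to obtain

$$\nabla\times\frac{\partial}{\partial t}(\rho_0\vec v) = \nabla\times(-\nabla p), \qquad\text{i.e.,}\qquad \frac{\partial}{\partial t}\big(\nabla\times(\rho_0\vec v)\big) = 0.$$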

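Hence, with $\vec\omega(0) = 0$, we have $\nabla\times(\rho_0\vec v) = 0$. Therefore, there exists a scalar function $\Phi$ such that $\rho_0\vec v = -\nabla\Phi$, i.e., $\vec v = -\frac{1}{\rho_0}\nabla\Phi$. Returning to Euler’s equation,

$$\frac{\partial}{\partial t}(-\nabla\Phi) = -\nabla p, \qquad\text{so}\qquad \nabla(p - \Phi_t) = 0.$$

Then, by an argument similar to the one above, we can without loss of generality take $p = \Phi_t$.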

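Finally, we repeat the computation that led from (8.13) to $\phi_{tt} = c^2\Delta\phi$, starting with the linearized continuity equation but without assuming that $\rho_0$ is constant. Thus

$$\begin{aligned}
\frac{\partial\rho}{\partial t} &= -\nabla\cdot(\rho_0\vec v),\\
\frac{\partial}{\partial t}\left(\frac{p}{c^2}\right) &= -\nabla\cdot(\rho_0\vec v),\\
\frac{\partial}{\partial t}\left(\frac{\Phi_t}{c^2}\right) &= -\nabla\cdot(-\nabla\Phi),\\
\Phi_{tt} &= c^2\Delta\Phi.
\end{aligned}$$

Hence, we obtain the same form of wave equation without assuming that $\rho_0$ is constant. In this case $c^2 = f'(\rho_0)$ may depend on the spatial variable.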

Finally, in the case of plane-symmetric pressure waves, i.e., when the physical properties of a wave are constant along the directions tangent to a family of parallel planes (so that the waves are effectively one-dimensional), the wave equation for the pressure (and similarly the other two wave equations) becomes

$$\frac{\partial^2 p}{\partial t^2} = c^2\frac{\partial^2 p}{\partial x^2}.$$

When $c$ is constant, the classical d’Alembert solution of this wave equation, named in honor of the French mathematician Jean le Rond d’Alembert (1717–1783), is given by [8]

$$p(t,x) = F(t - x/c) + G(t + x/c), \qquad (8.14)$$

where $F$ and $G$ are arbitrary twice continuously differentiable functions of a single variable; $F(t - x/c)$ is a disturbance propagating in the $+x$ direction and $G(t + x/c)$ one propagating in the $-x$ direction, both with speed $c$. It is worth emphasizing that, whatever the initial profiles of $F$ and $G$, those profiles are maintained during propagation. Thus, sound waves in the one-dimensional problem propagate essentially without distortion.
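The following Python sketch illustrates (8.14). With illustrative profiles $F$ (a narrow Gaussian) and $G$ (a wider, smaller one) and $c = 343$ m/s, all of which are assumptions, it checks by second differences that $p = F(t - x/c) + G(t + x/c)$ satisfies $p_{tt} = c^2 p_{xx}$, and that the right-moving part simply translates: its value at $(t + \tau, x + c\tau)$ equals its value at $(t, x)$.

```python
import math

C = 343.0   # sound speed in m/s (assumed)


def F(s):
    """Right-moving profile (assumed): a Gaussian pulse in s = t - x/c."""
    return math.exp(-((s - 0.002) / 0.0005) ** 2)


def G(s):
    """Left-moving profile (assumed): a smaller, wider pulse in s = t + x/c."""
    return 0.5 * math.exp(-((s - 0.004) / 0.001) ** 2)


def p(t, x):
    """d'Alembert solution p(t, x) = F(t - x/c) + G(t + x/c)."""
    return F(t - x / C) + G(t + x / C)


t, x = 0.0025, 0.2            # evaluation point (assumed)
ht, hx = 1e-6, 1e-3           # finite-difference steps
p_tt = (p(t + ht, x) - 2 * p(t, x) + p(t - ht, x)) / ht**2
p_xx = (p(t, x + hx) - 2 * p(t, x) + p(t, x - hx)) / hx**2
rel_residual = abs(p_tt - C**2 * p_xx) / abs(p_tt)

# The right-moving part keeps its shape: F at (t + tau, x + c tau) equals F at (t, x).
tau = 0.001
shift_err = abs(F((t + tau) - (x + C * tau) / C) - F(t - x / C))

print(f"relative wave-equation residual: {rel_residual:.1e}")
print(f"profile shift error after tau = {tau} s: {shift_err:.1e}")
```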

8.3 Experimental Modeling of the Wave Equation

In specific applications we have to adapt the functions $F$ and $G$ in the d’Alembert solution of the one-dimensional wave equation to the given initial and boundary conditions. In this section, we describe a cost-effective physical experiment that can be used to study various types of boundary conditions for acoustic wave propagation in a wave duct. This experiment was motivated by an earlier investigation [2] in which the authors considered several types of boundary conditions in the context of time domain models for acoustic waves. They carried out experiments with four different duct terminations (hardwall, free radiation, foam, wedge) to measure the reflection of harmonic waves by the terminations over the frequency range considered. These reflection coefficients were, in turn, used to estimate the parameters in the mathematical models for time domain boundary conditions. The efforts reported there were the first steps in the development of state space/time domain models for use in control design problems related to acoustic control of noise in a closed cylinder. The ultimate intent was to model the frequency-dependent impedance of a treated aircraft interior so that the time domain interior pressure response to transient excitation could be predicted. In such applications, fluid damping of the acoustic pressure field is negligible. Since the major dissipative mechanism is the partial absorption/partial reflection that occurs at the fluid/wall interface, it is important in the control of the interior acoustic pressure to model this dissipation accurately.

The general hardware needed to set up this experiment is depicted in Figure 8.6. In this experiment a PVC pipe (readily available at any hardware store) is used to study the effects of different boundary conditions on the acoustic response of an enclosed sound field. Sound waves in the pipe are created with a speaker (which can be bought from RadioShack) mounted at one end of the pipe and connected to a function generator (we use a four-channel Hewlett-Packard dynamic signal analyzer, model 35670A; see Figure 8.7). The function generator sends an oscillating current signal to the speaker, which causes the speaker’s diaphragm to vibrate. As the diaphragm moves outward, the air near the speaker is compressed, creating a small volume at relatively high pressure, which propagates away from the speaker. As the diaphragm moves inward, a low-pressure region is created, which also propagates away from the speaker. This cycle of compressions and rarefactions continues at the frequency of the input signal; a higher input frequency means that more compression/rarefaction cycles occur per second. As the sound waves propagate away from the speaker, they are detected and measured by electret condenser microphones (we used the Panasonic Omnidirectional Electret Condenser Microphone Cartridge, model WM-034) mounted at various locations along the acoustic pipe. We used the HP signal analyzer to monitor both the input signal and the signal recorded by each microphone. For this configuration, two termination conditions are investigated. The first is a near-hardwall condition obtained by terminating the PVC pipe with a reinforced aluminum plate. The second is a foam termination, whose exact behavior is hard to anticipate. We leave its modeling as an exercise for the reader (see the project description at the end of this chapter).

FIGURE 8.6: Hardware used for studying various types of boundary conditions associated with the one-dimensional wave equation.

FIGURE 8.7: Hewlett-Packard dynamic signal analyzer.


Project: Sound Wave Propagation in a PVC Pipe

The objective of this project is to study two types of boundary conditions for acoustic wave propagation in a PVC pipe. Experiments with two different terminations (hardwall and foam) are considered, with harmonic excitation at the other end of the pipe. The collected data are then used to obtain reflection coefficients over a wide range of frequencies. The reflection coefficients, in turn, are used to estimate unknown parameters in the models used for the boundary conditions.

The acoustic wave motion in a fluid is described by either the acoustic pressure, $p$, or the velocity potential, $\phi$. These two quantities are related by $p(t,x) = \rho\phi_t$, where $\rho$ is the equilibrium density of the fluid (in this case, air at room temperature). When the wavelength of the disturbance is large compared with the transverse dimension of the pipe, the wave motion is predominantly parallel to the pipe axis and is very nearly one-dimensional. That is, the velocity potential $\phi$ satisfies the one-dimensional wave equation

$$\frac{\partial^2\phi}{\partial t^2} = c^2\frac{\partial^2\phi}{\partial x^2}, \qquad 0 < x < l, \qquad (8.15)$$

where $c$ is the speed of sound and $l$ is the length of the pipe. Two types of boundary conditions will be considered:

(a) Oscillating boundaries. The interaction between the boundary at $x = l$ and the interior pressure is modeled by a damped harmonic oscillator and is described by

$$m\delta_{tt} + d\delta_t + k\delta = -\rho\phi_t(t,l). \qquad (8.16)$$

Here, $\delta$ is the normal displacement of the boundary in the direction interior to the fluid. The coefficients $m$, $d$ and $k$ are the effective mass, resistance, and stiffness per unit area of the boundary surface and are assumed to be unknown. In addition, the boundary surface is assumed to be impenetrable to the fluid, that is (recalling that the fluid velocity is $-\phi_x$, while $\delta$ is measured in the $-x$ direction at $x = l$),

$$\delta_t(t) = \phi_x(t,l). \qquad (8.17)$$

Recall that the d’Alembert solution of the wave equation (8.15) has the form

$$\phi(t,x) = F(t - x/c) + G(t + x/c), \qquad (8.18)$$

where the first term on the right side of (8.18) describes a wave propagating to the right and the second term a wave propagating to the left. Writing $\tilde F(t) = F(t - l/c)$ and $\tilde G(t) = G(t + l/c)$, we have $\phi_x(t,l) = -\frac{1}{c}\big(\tilde F'(t) - \tilde G'(t)\big)$, so integrating (8.17) gives

$$\delta(t) = -\frac{1}{c}\big(\tilde F(t) - \tilde G(t)\big), \qquad (8.19)$$

where, without loss of generality, the constant of integration is set to zero. Substituting (8.19) into (8.16), and using $\phi_t(t,l) = \tilde F'(t) + \tilde G'(t)$, yields

$$m\tilde G_{tt} + (d + \rho c)\tilde G_t + k\tilde G = m\tilde F_{tt} + (d - \rho c)\tilde F_t + k\tilde F. \qquad (8.20)$$

Now assume that the incident wave $\tilde F$ at the boundary $x = l$ (which is generated by a harmonic input at $x = 0$) is a simple harmonic of frequency $\omega/2\pi$, where $\omega$ is the angular frequency. That is,

$$\tilde F(t) = A_0 e^{i\omega t}, \qquad (8.21)$$

so that the right side of (8.20) is a harmonic forcing function. It follows that the steady-state solution of (8.20) is also harmonic with the same frequency,

$$\tilde G(t) = R(\omega)A_0 e^{i\omega t}, \qquad (8.22)$$

where the complex coefficient $R(\omega)$ is called the reflection coefficient. Substituting (8.21) and (8.22) into (8.20), we obtain the reflection coefficient for the oscillating boundary condition model:

$$R(\omega) = \frac{m\omega^2 - i(d - \rho c)\omega - k}{m\omega^2 - i(d + \rho c)\omega - k}. \qquad (8.23)$$
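Formula (8.23) is straightforward to explore numerically. The Python sketch below codes it directly and checks three consequences of its form: with $d = 0$ the numerator and denominator are complex conjugates, so $|R(\omega)| = 1$ and no energy is absorbed; with $d > 0$, $|R(\omega)| < 1$; and a very stiff boundary ($k \to \infty$) gives $R \to 1$, a hard wall. The parameter values are hypothetical, chosen only for illustration.

```python
import math


def reflection_oscillating(omega, m, d, k, rho, c):
    """Reflection coefficient (8.23) of the damped-oscillator boundary model."""
    num = m * omega**2 - 1j * (d - rho * c) * omega - k
    den = m * omega**2 - 1j * (d + rho * c) * omega - k
    return num / den


rho, c = 1.2, 343.0        # air density and sound speed (assumed)
m, k = 0.05, 2.0e5         # effective mass and stiffness per unit area (hypothetical)
for f_hz in (100.0, 300.0, 500.0):
    omega = 2 * math.pi * f_hz
    lossless = reflection_oscillating(omega, m, 0.0, k, rho, c)
    damped = reflection_oscillating(omega, m, 300.0, k, rho, c)
    print(f"{f_hz:5.0f} Hz: |R| with d = 0: {abs(lossless):.6f}, with d = 300: {abs(damped):.3f}")

stiff = reflection_oscillating(2 * math.pi * 300.0, m, 300.0, 1e12, rho, c)
print(f"very stiff wall: R = {stiff:.6f}")   # close to 1, i.e. a hard wall
```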

(b) Damped elastic boundaries. For $d = 0$ and $k = 0$, the models (8.16) and (8.17), together with the relation $p = \rho\phi_t$, result in the boundary condition

$$m p_x(t,l) + \rho p(t,l) = 0$$

for the acoustic pressure. This is called a Robin or elastic boundary condition. To include dissipation, it is extended by adding a damping term in $p_t$, which gives the following boundary condition for the acoustic pressure (the elastic condition is the special case $\alpha = \rho c/m$, $\beta = 0$):

$$\alpha p(t,l) + \beta p_t(t,l) + c p_x(t,l) = 0.$$

Assuming a harmonic incident wave as before, show that the damped elastic reflection coefficient is given by

$$R(\omega) = \frac{i\omega(1 - \beta) - \alpha}{i\omega(1 + \beta) + \alpha}. \qquad (8.24)$$
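A numerical spot-check of (8.24), in Python: the sketch takes a unit-amplitude incident wave with zero phase at $x = l$ (as in (8.21)), adds the reflected wave with amplitude $R$ from (8.24), and evaluates $\alpha p + \beta p_t + c p_x$ at the boundary, which should vanish. Two special cases are worth noting: $\beta = 0$ gives $|R| = 1$ (an undamped elastic boundary), and $\alpha = 0$, $\beta = 1$ gives $R = 0$, because $p_t + c p_x = 0$ is exactly the condition satisfied by a purely right-moving wave. The values of $\alpha$ and $\beta$ are hypothetical.

```python
import cmath
import math


def reflection_damped_elastic(omega, alpha, beta):
    """Reflection coefficient (8.24) of the damped elastic boundary model."""
    return (1j * omega * (1 - beta) - alpha) / (1j * omega * (1 + beta) + alpha)


def boundary_residual(omega, alpha, beta, c=343.0, t=0.0):
    """Evaluate alpha*p + beta*p_t + c*p_x at x = l for a unit incident wave
    (zero phase at x = l) plus the reflected wave R e^{i w t} given by (8.24)."""
    R = reflection_damped_elastic(omega, alpha, beta)
    e = cmath.exp(1j * omega * t)
    p = (1 + R) * e
    p_t = 1j * omega * (1 + R) * e
    # Incident part varies like e^{-i w x/c}, reflected part like e^{+i w x/c}.
    p_x = (1j * omega / c) * (R - 1) * e
    return alpha * p + beta * p_t + c * p_x


alpha, beta = 400.0, 0.35                    # hypothetical boundary parameters
for f_hz in (100.0, 300.0, 500.0):
    w = 2 * math.pi * f_hz
    R = reflection_damped_elastic(w, alpha, beta)
    res = abs(boundary_residual(w, alpha, beta))
    print(f"{f_hz:5.0f} Hz: |R| = {abs(R):.3f}, boundary residual = {res:.1e}")
```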

This project involves the following steps:

1. The acoustic pressure anywhere in the pipe for planar wave propagation is given by

$$p(t,x) = A(\omega)e^{i\omega(t - x/c)} + A(\omega)R(\omega)e^{i\omega(t + x/c)}.$$

By measuring the pressure $p(t,x)$ at a number of axial locations $x_j$ for a specific angular frequency $\omega$, one can formulate an inverse least squares problem to estimate both complex coefficients $A(\omega)$ and $R(\omega)$. Considering both the physical hardwall and foam terminations at $x = l$ and collecting the two corresponding sets of experimental data, one can use these to estimate the reflection coefficient $R(\omega)$ over the frequency range from 100 Hz to 500 Hz. These estimates will be denoted by $R_d(\omega)$.

2. In this step, we evaluate how well the oscillating boundary and damped elastic boundary models, (8.23) and (8.24), fit the experimental data $R_d(\omega_j)$. One approach is to determine the sets of parameters $(m, d, k, \rho)$ and $(\alpha, \beta)$ that minimize the functional

$$\sum_{j=1}^{N} \left|R_d(\omega_j) - R(\omega_j)\right|^2.$$

Here, $N$ is the number of measurements $R_d$, taken at frequencies $f_j = \omega_j/2\pi$. In your report, discuss which model, (8.23) or (8.24) (or both), best describes the hardwall and the foam types of boundary conditions.
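Both steps reduce to small least squares problems, sketched below in Python with synthetic data. The sketch assumes (this is not stated in the text) that the signal analyzer provides, for each microphone, the complex amplitude $P_j = A e^{-i\omega x_j/c} + AR\,e^{i\omega x_j/c}$, i.e. $p(t, x_j)$ with the common factor $e^{i\omega t}$ removed. Step 1 solves the $2\times 2$ normal equations for $A$ and $B = AR$. Because $R$ in step 1 is referenced to $x = 0$, while (8.23) and (8.24) are referenced to the termination, it is multiplied by $e^{2i\omega l/c}$ before fitting. For step 2 the sketch does not minimize the functional above directly; instead it uses the fact that (8.24) can be rearranged as $\alpha(1 + R) + \beta\,i\omega(1 + R) = i\omega(1 - R)$, which is linear in $(\alpha, \beta)$, and solves that equation-error problem by real least squares. This linearization is a substitute technique, useful mainly for starting values. The pipe length, microphone positions, frequencies and parameter values are all hypothetical.

```python
import cmath
import math

C = 343.0                               # sound speed in m/s (assumed)
L_PIPE = 1.5                            # pipe length l in m (hypothetical)
MICS = [0.10, 0.35, 0.55, 0.80, 1.05]   # microphone positions x_j in m (hypothetical)


def solve2(m11, m12, m21, m22, r1, r2):
    """Solve a 2x2 linear system by Cramer's rule (entries may be complex)."""
    det = m11 * m22 - m12 * m21
    return (r1 * m22 - m12 * r2) / det, (m11 * r2 - m21 * r1) / det


def estimate_A_R(omega, xs, P):
    """Step 1: least squares for P_j = A e^{-i w x_j/c} + B e^{i w x_j/c}; returns A and R = B/A."""
    em = [cmath.exp(-1j * omega * x / C) for x in xs]
    ep = [cmath.exp(1j * omega * x / C) for x in xs]
    m11 = sum(abs(a) ** 2 for a in em)
    m12 = sum(a.conjugate() * b for a, b in zip(em, ep))
    m22 = sum(abs(b) ** 2 for b in ep)
    r1 = sum(a.conjugate() * p for a, p in zip(em, P))
    r2 = sum(b.conjugate() * p for b, p in zip(ep, P))
    A, B = solve2(m11, m12, m12.conjugate(), m22, r1, r2)
    return A, B / A


def fit_alpha_beta(omegas, R_l):
    """Step 2, equation-error variant: real least squares for
    alpha (1 + R) + beta * i w (1 + R) = i w (1 - R), with R referenced to x = l."""
    a = [1 + R for R in R_l]
    b = [1j * w * (1 + R) for w, R in zip(omegas, R_l)]
    y = [1j * w * (1 - R) for w, R in zip(omegas, R_l)]
    m11 = sum(abs(u) ** 2 for u in a)
    m12 = sum((u.conjugate() * v).real for u, v in zip(a, b))
    m22 = sum(abs(v) ** 2 for v in b)
    r1 = sum((u.conjugate() * v).real for u, v in zip(a, y))
    r2 = sum((u.conjugate() * v).real for u, v in zip(b, y))
    return solve2(m11, m12, m12, m22, r1, r2)


def reflection_damped_elastic(omega, alpha, beta):
    """Model (8.24), referenced to the termination x = l."""
    return (1j * omega * (1 - beta) - alpha) / (1j * omega * (1 + beta) + alpha)


# Synthetic experiment; the "true" alpha, beta and amplitude A are hypothetical.
alpha_true, beta_true, A_true = 400.0, 0.35, 0.8 - 0.3j
omegas, R_l_est = [], []
for f_hz in range(100, 501, 50):
    w = 2 * math.pi * f_hz
    R0 = reflection_damped_elastic(w, alpha_true, beta_true) * cmath.exp(-2j * w * L_PIPE / C)
    P = [A_true * (cmath.exp(-1j * w * x / C) + R0 * cmath.exp(1j * w * x / C)) for x in MICS]
    _, R_hat = estimate_A_R(w, MICS, P)                          # step 1 at this frequency
    omegas.append(w)
    R_l_est.append(R_hat * cmath.exp(2j * w * L_PIPE / C))       # shift reference to x = l

alpha_hat, beta_hat = fit_alpha_beta(omegas, R_l_est)            # step 2
print(f"alpha ~ {alpha_hat:.3f}, beta ~ {beta_hat:.5f}")
```

With exact synthetic data both steps recover the assumed $\alpha$ and $\beta$; with noisy measurements, the equation-error estimates are best treated as starting values for a nonlinear minimization of the functional in step 2.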

References

[1] H.T. Banks, W. Fang, R.J. Silcox and R.C. Smith, Approximation methods for control of acoustic/structure models with piezoceramic actuators, Journal of Intelligent Material Systems and Structures, 4(1), 1993, pp. 98–116.

[2] H.T. Banks, G. Propst and R.J. Silcox, A comparison of time domain boundary conditions for acoustic waves in wave guides, Quarterly of Applied Mathematics, LIV(2), 1996, pp. 249–265.

[3] H.T. Banks, R.J. Silcox and R.C. Smith, The modeling and control of acoustic/structure interaction problems via piezoceramic actuators: 2-D numerical examples, ASME Journal of Vibration and Acoustics, 116(3), 1994, pp. 386–396.

[4] H.T. Banks and R.C. Smith, Modeling and approximation of a coupled 3-D structural acoustics problem, in Progress in Systems and Control Theory, K.L. Bowers and J. Lund, eds., Birkhäuser, Boston, 1993, pp. 29–48.

[5] H.T. Banks, R.C. Smith and Y. Wang, Smart Material Structures: Modeling, Estimation and Control, John Wiley & Sons, Inc., 1996.

[6] R.B. Bird, W.E. Stewart and E.N. Lightfoot, Transport Phenomena, John Wiley & Sons, Inc., New York, 1960.

[7] C.J. Geankoplis, Transport Processes and Unit Operations, Prentice-Hall, Inc., 1993.

[8] K.F. Graff, Wave Motion in Elastic Solids, Dover Publications, Inc., 1991.

[9] H. Schlichting, Boundary-Layer Theory, McGraw-Hill, New York, 1979.


Chapter 9

Size-Structured Population Models

9.1 Introduction: A Motivating Application

The mosquitofish, Gambusia affinis, is used throughout the world to control mosquito populations. Indigenous to the southeastern United States and northeastern Mexico, it is one of the most widely distributed of all freshwater fish. When introduced into a rice field, mosquitofish eat the waterborne mosquito larvae. Consequently, the mosquitofish is thought to be the most widely disseminated natural predator, as well as the most popular form of mosquito control.

In spite of their widespread use, the mechanisms underlying the growth of Gambusia populations (and, consequently, mosquito control) are not well understood. For example, studies have shown that stocking Gambusia early in the rice season leads to fewer mosquito larvae on average over several fields. However, there is considerable variability among rice fields, with some unstocked fields having fewer larvae than stocked fields.

In the early 1980s a research group from UC Davis [12] carried out experiments to better understand how Gambusia populations develop in rice fields. Their goal was to achieve better mosquito management through more detailed knowledge of Gambusia population and predation dynamics. Even though the economic implications were substantial, no one really knew how many mosquitofish should be used to stock a rice paddy. In addition, stocking methods differ significantly, raising many questions. For example, should all the mosquitofish be added initially, or should they be introduced into the rice paddy periodically or according to some other time-dependent schedule?

There are a number of avenues that can be taken to investigate these questions. A control theorist might try to use a general system of ordinary differential equations such as

$$\dot x = Ax + Bu$$

and choose a control $u$ (perhaps the stocking rate) to improve system behavior (see Chapter 7 for an introduction to control theory). However, this requires knowing the matrices $A$ and $B$. At one time control theorists thought biologists might be able to provide $A$ and $B$, but unfortunately biologists were not able to do this with any degree of certainty.



Another avenue is to perform many experiments in the hope of finding some empirical relationship. The approach that we pursue here is to adopt a reasonable mathematical model to understand the basic dynamics of growth and decline in the mosquitofish population. Several types of population models have been developed over the years to describe population dynamics. These include single species models, logistic models, predator/prey models and structured models, each of which is discussed in the following sections.

9.2 A Single Species Model (Malthusian Law)

The simplest population models are the single species models. Let $p(t)$ denote the population (number of individuals) of a given species at time $t$. Assuming that this population is isolated (that is, there is no net immigration or emigration), the rate of change of the population is simply the difference between the birth rate and the death rate:

$$\frac{dp}{dt} = \text{birth rate} - \text{death rate}.$$

We further assume that the more individuals there are, the more births and deaths occur. That is, both the birth rate and the death rate are proportional to the number of individuals in the population. Consequently, the birth rate is given by $\beta p$ and the death (mortality) rate by $\mu p$. In this case, the model becomes

$$\frac{dp}{dt} = \beta p - \mu p = \alpha p, \qquad (9.1)$$

where $\alpha = \beta - \mu$ represents the net birth/death rate per individual in the population. Equation (9.1) is a linear first-order differential equation and is known as the Malthusian law of population growth. If the population of a given species is $p_0$ at time $t = t_0$, then the solution to the initial value problem is $p(t) = e^{\alpha(t - t_0)}p_0$. Depending on the value of $\alpha$, the solution $p(t)$ has one of the following three characteristics: (i) when $\alpha > 0$ (more births than deaths) the population grows exponentially with time, (ii) when $\alpha < 0$ the population dies out, and (iii) when $\alpha = 0$ the population remains constant, equal to the initial number of individuals $p_0$ (see Figure 9.1).
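The closed form $p(t) = e^{\alpha(t - t_0)}p_0$ can be compared with a direct numerical integration of (9.1). The Python sketch below uses the classical fourth-order Runge–Kutta method (an illustrative choice, not a method from the text) with hypothetical values of $\beta$, $\mu$ and $p_0$; the three sign cases of $\alpha$ reproduce the three behaviors described above.

```python
import math


def rk4(f, t0, p0, t_end, n):
    """Integrate dp/dt = f(t, p) from t0 to t_end with n classical RK4 steps."""
    h = (t_end - t0) / n
    t, p = t0, p0
    for _ in range(n):
        k1 = f(t, p)
        k2 = f(t + h / 2, p + h * k1 / 2)
        k3 = f(t + h / 2, p + h * k2 / 2)
        k4 = f(t + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return p


p0, t0, t_end = 100.0, 0.0, 10.0
results = []
for beta, mu in [(0.30, 0.20), (0.20, 0.30), (0.25, 0.25)]:   # hypothetical birth/death rates
    alpha = beta - mu
    numeric = rk4(lambda t, p, b=beta, m=mu: b * p - m * p, t0, p0, t_end, 200)
    exact = math.exp(alpha * (t_end - t0)) * p0
    results.append((alpha, numeric, exact))
    print(f"alpha = {alpha:+.2f}: RK4 = {numeric:.6f}, exact = {exact:.6f}")
```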

The single species model is so simple that it predicts population outcomes that are clearly unreasonable (for $\alpha > 0$, unbounded exponential growth). Note that the deaths in this model are from “natural causes” or old age; no predatory or otherwise harmful activities are represented in this model. Moreover, when the number of individuals $p$ becomes very large, the single species model cannot be very accurate, since it does not reflect the fact that individual members are now competing with each other for limited living space, natural resources, and food.

FIGURE 9.1: Graphs of the population $p(t)$, starting from $p_0$ at $t_0$, for $\alpha > 0$ and $\alpha < 0$.

9.3 The Logistic Model

Clearly, overcrowding will reduce the amount of food, as well as tax other resources such as oxygen levels. In the single species model we can add a crowding term, which results in more deaths when there are more individuals. A simple first assumption might be that the death rate per individual, $\mu$, is a function of the population $p$; that is, we might take $\mu = \mu(p)$. The simplest form of such a function is linear, $\mu(p) = \mu p$, so that the model becomes

$$\frac{dp}{dt} = \beta p - (\mu p)p. \qquad (9.2)$$

This equation was first introduced by the Belgian mathematician Verhulst in 1837 and has subsequently become known as the logistic equation. The term $\mu p$ in equation (9.2) simply translates to more deaths occurring when $p$ is large; this is the competition or crowding term.

We observe that if $p$ is small, $-\mu p^2$ is negligible and the model reduces to the Malthusian law. On the other hand, if $p$ is large, $-\mu p^2$ serves to slow down the rapid rate of increase. In either case, for $\mu \neq 0$, the equation is readily solved analytically by standard techniques. Using the method of separation of variables, we rewrite the differential equation

$$\frac{dp}{dt} = \beta p - \mu p^2, \qquad p(t_0) = p_0,$$

as

$$\frac{dp}{\beta p - \mu p^2} = dt.$$

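Integrating both sides (using the partial fractions $\frac{1}{\beta p - \mu p^2} = \frac{1}{\beta}\left(\frac{1}{p} + \frac{\mu}{\beta - \mu p}\right)$), we find

$$p(t) = \frac{\beta p_0}{\mu p_0 + (\beta - \mu p_0)e^{-\beta(t - t_0)}},$$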

the graph of which is depicted in Figure 9.2. This solution is often written as

$$p(t) = \frac{Kp_0}{p_0 + (K - p_0)e^{-\beta(t - t_0)}},$$

corresponding to writing the equation as

$$\frac{dp}{dt} = \beta p\left(1 - \frac{p}{K}\right),$$

where $K = \beta/\mu$ is the population’s carrying capacity and $\beta$ is called the intrinsic growth rate.
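As a check on the closed-form logistic solution, the Python sketch below integrates $dp/dt = \beta p - \mu p^2$ numerically (classical RK4, an illustrative choice) and compares the result with $p(t) = Kp_0/\big(p_0 + (K - p_0)e^{-\beta(t - t_0)}\big)$. The values of $\beta$, $\mu$ and $p_0$ are hypothetical.

```python
import math


def logistic_exact(t, t0, p0, beta, mu):
    """Closed-form logistic solution with carrying capacity K = beta / mu."""
    K = beta / mu
    return K * p0 / (p0 + (K - p0) * math.exp(-beta * (t - t0)))


def logistic_rk4(t0, p0, t_end, beta, mu, n=1000):
    """Classical RK4 integration of dp/dt = beta p - mu p^2 from t0 to t_end."""
    def f(p):
        return beta * p - mu * p * p

    h = (t_end - t0) / n
    p = p0
    for _ in range(n):
        k1 = f(p)
        k2 = f(p + h * k1 / 2)
        k3 = f(p + h * k2 / 2)
        k4 = f(p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p


beta, mu, p0 = 0.8, 0.002, 10.0          # hypothetical values; K = beta / mu = 400
for t_end in (2.0, 5.0, 10.0, 30.0):
    numeric = logistic_rk4(0.0, p0, t_end, beta, mu)
    exact = logistic_exact(t_end, 0.0, p0, beta, mu)
    print(f"t = {t_end:5.1f}: RK4 = {numeric:10.6f}, closed form = {exact:10.6f}")
```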

FIGURE 9.2: Graph of the solution to the logistic model, showing a period of accelerated growth up to $p = \beta/(2\mu)$, followed by a period of diminishing growth as $p$ approaches $K = \beta/\mu$.

We remark that, regardless of the initial population $p_0 > 0$, the number of individuals always approaches the limiting value $K = \beta/\mu$ as $t \to \infty$. Furthermore, since

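$$\frac{d^2p}{dt^2} = \frac{d}{dt}\left(\frac{dp}{dt}\right) = \frac{d}{dt}\left(\beta p - \mu p^2\right) = \beta\frac{dp}{dt} - 2\mu p\frac{dp}{dt} = (\beta - 2\mu p)(\beta - \mu p)p,$$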

it follows that if $p < \beta/(2\mu)$, then $d^2p/dt^2 > 0$ and $p$ is concave up. On the other hand, if $\beta/(2\mu) < p < \beta/\mu$, then $d^2p/dt^2 < 0$ and $p$ is concave down. Hence, for $p_0 < \beta/(2\mu)$, the graph of $p$ has the form depicted in Figure 9.2. Such a curve is called a logistic, or S-shaped, curve. Because of its shape, the time period before the population reaches $\beta/(2\mu)$ is known as the period of accelerated growth. After this period, the rate of growth decreases and asymptotically approaches zero; this is the period of diminishing growth.
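Setting $p(t) = \beta/(2\mu) = K/2$ in the closed-form solution gives the time at which accelerated growth ends: $e^{-\beta(t^* - t_0)} = p_0/(K - p_0)$, i.e. $t^* = t_0 + \frac{1}{\beta}\ln\frac{K - p_0}{p_0}$, valid when $0 < p_0 < K/2$. The Python sketch below (hypothetical parameters) confirms that the growth rate $\beta p - \mu p^2$ peaks at this time.

```python
import math

beta, mu, p0, t0 = 0.8, 0.002, 10.0, 0.0     # hypothetical parameters, K = 400
K = beta / mu


def p(t):
    """Closed-form logistic solution."""
    return K * p0 / (p0 + (K - p0) * math.exp(-beta * (t - t0)))


t_star = t0 + math.log((K - p0) / p0) / beta   # inflection time (requires p0 < K/2)

# Scan the growth rate beta*p - mu*p^2 on a fine time grid and locate its maximum.
ts = [t0 + 0.001 * i for i in range(20001)]
t_max_rate = max(ts, key=lambda t: beta * p(t) - mu * p(t) ** 2)

print(f"t* = {t_star:.4f}, p(t*) = {p(t_star):.4f} (K/2 = {K / 2:.1f})")
print(f"grid maximum of dp/dt at t = {t_max_rate:.3f}")
```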

The logistic model is sometimes also called the Verhulst–Pearl model (it was developed by Verhulst [21] and later rediscovered and popularized by Pearl [19]). It has been widely used [17] for many years in certain applications. Its primary feature, population saturation, is biologically realistic if nothing else is preying on the population. However, this model is not adequate in a predator/prey situation.

9.4 A Predator/Prey Model

In the mid-1920s, the Italian biologist Umberto D’Ancona studied the percentage of the total catch made up of selachians (a group of fish comprising the sharks, skates, and rays) in the Mediterranean port of Fiume (now Rijeka, Croatia). The data are tabulated in Table 9.1 for the period from 1914 to 1923 [13].

He was puzzled by the very large increase in the proportion of selachians during World War I (1914–1918). He reasoned that the selachians increased because of the reduced level of fishing during the war: there were more fish available as food for the selachians, and hence the selachian population multiplied. However, this explanation was not satisfactory, since there were not correspondingly more food fish (the presumed prey of the sharks) during this period.

After exhausting all biological explanations, in 1926 [22] he turned to his colleague, the famous Italian mathematician Vito Volterra, for help. Volterra formulated a mathematical model for the growth of the selachians and their prey, the food fish, by placing all food fish in the prey population and all selachians in the predator population.


TABLE 9.1: Percent of total catch of selachians.

Year:      1914   1915   1916   1917   1918   1919   1920   1921   1922   1923
Percent:   11.9   21.4   22.1   21.2   36.4   27.3   16.0   15.9   14.8   10.7

Let the numbers of predators and prey at time $t$ be $N(t)$ and $E(t)$ (the edibles), respectively. A simple assumption is that the population of edibles would grow exponentially in the absence of predators. In addition, the prey death rate depends on both $E$ and $N$ (since prey are eaten by predators). Similarly, since the predators need the edibles to live, their birth rate depends on $E$ and $N$ as well. Finally, with no edibles, the predators are assumed to die out exponentially. We can then write the following system of differential equations for the predator/prey model:

$$\frac{dN}{dt} = (\beta_N E)N - \mu_N N, \qquad \frac{dE}{dt} = \beta_E E - (\mu_E N)E. \qquad (9.3)$$

The system of equations (9.3), also called the Lotka–Volterra model, has two equilibrium solutions:

$$N_e = E_e = 0 \qquad\text{and}\qquad N_e = \frac{\beta_E}{\mu_E}, \quad E_e = \frac{\mu_N}{\beta_N}.$$

Moreover, it has the following families of solutions:

(i) $E(t) = E_0 e^{\beta_E t}$, $N(t) = 0$;

(ii) $N(t) = N_0 e^{-\mu_N t}$, $E(t) = 0$.

Hence, both the $E$ and $N$ axes consist of orbits of (9.3). Since orbits cannot cross (by the uniqueness of solutions of (9.3)), every solution $(E, N)$ of (9.3) that starts in the first quadrant, $E > 0$ and $N > 0$, remains there for all $t \ge t_0$. Furthermore, the orbits for $E, N \neq 0$ can be found by solving the equation

$$\frac{dN}{dE} = \frac{-\mu_N N + \beta_N E N}{\beta_E E - \mu_E N E},$$

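which, after separating variables as $\frac{\beta_E - \mu_E N}{N}\,dN = \frac{\beta_N E - \mu_N}{E}\,dE$ and integrating both sides, yields

$$\frac{N^{\beta_E}}{e^{\mu_E N}}\cdot\frac{E^{\mu_N}}{e^{\beta_N E}} = k_1. \qquad (9.4)$$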


FIGURE 9.3: Orbital solutions of the predator/prey model: closed orbits in the $(E, N)$ plane around the equilibrium $(E_e, N_e) = (\mu_N/\beta_N,\ \beta_E/\mu_E)$.

Equation (9.4) defines a family of closed curves for $E, N > 0$, which are depicted in Figure 9.3.
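The invariant (9.4) gives a useful numerical check: taking logarithms, $H(E, N) = \beta_E\ln N - \mu_E N + \mu_N\ln E - \beta_N E$ is constant along every orbit in the open first quadrant. The Python sketch below integrates (9.3) with classical RK4 (an illustrative choice) for hypothetical rate constants and a hypothetical initial state, and reports the drift in $H$, which should be at the level of the integration error, together with the range over which the prey oscillates.

```python
import math

# Hypothetical rate constants for (9.3).
BETA_N, MU_N = 0.02, 0.8     # predator: birth per prey encounter, death rate
BETA_E, MU_E = 1.0, 0.05     # prey: birth rate, death per predator encounter


def rhs(N, E):
    """Right-hand side of the Lotka-Volterra system (9.3)."""
    dN = BETA_N * E * N - MU_N * N
    dE = BETA_E * E - MU_E * N * E
    return dN, dE


def H(N, E):
    """Logarithm of the left side of (9.4); constant along orbits."""
    return BETA_E * math.log(N) - MU_E * N + MU_N * math.log(E) - BETA_N * E


def rk4_step(N, E, h):
    """One classical RK4 step for the pair (N, E)."""
    k1 = rhs(N, E)
    k2 = rhs(N + h * k1[0] / 2, E + h * k1[1] / 2)
    k3 = rhs(N + h * k2[0] / 2, E + h * k2[1] / 2)
    k4 = rhs(N + h * k3[0], E + h * k3[1])
    N += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    E += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return N, E


N, E, h = 10.0, 60.0, 0.001     # initial predators, prey, and step size (hypothetical)
H0 = H(N, E)
E_min, E_max = E, E
for _ in range(50000):          # integrate to t = 50
    N, E = rk4_step(N, E, h)
    E_min, E_max = min(E_min, E), max(E_max, E)

print(f"equilibrium: N_e = {BETA_E / MU_E:.1f}, E_e = {MU_N / BETA_N:.1f}")
print(f"drift in H over t in [0, 50]: {abs(H(N, E) - H0):.2e}")
print(f"prey oscillates between {E_min:.1f} and {E_max:.1f}")
```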

As shown in Figure 9.3, the nonequilibrium solutions of the predator/prey model in the first quadrant are periodic functions. The Lotka–Volterra model forms the basis of many models used today in the analysis of population dynamics. However, in its original form it has some significant problems. First, neither equilibrium point is asymptotically stable: the origin is a saddle point, and the positive equilibrium is a center surrounded by the closed orbits of Figure 9.3, so a perturbed population keeps oscillating instead of returning to equilibrium. In addition, many ecologists and biologists refused to accept Volterra’s model. They cited the experiments of G.F. Gause (1934) with two species of protozoa, one of which feeds on the other. In all experiments, the predator, Didinium, quickly destroyed the prey, Paramecium, and then died of starvation. In this case, the numbers of individuals of both species decay to zero and clearly do not oscillate indefinitely. Moreover, the Volterra model does not take into account that bigger fish eat more and that size varies greatly within the population. One way to introduce these factors into a model is to consider size-structured modeling.

9.5 A Size-Structured Population Model

The logistic and Lotka–Volterra models are both aggregate models. That is, they assume that all individuals are identical in characteristics and behavior (fish are all of the same size, for example). Gause’s predator/prey experiments indicate that this assumption is not very realistic.

We can attempt to model the individuals or members of a system by behavior or characteristic. This might produce a more realistic model than the aggregate model, but it is also much more complicated. In 1967 Sinko and Streifer [20] balanced this trade-off by letting all individuals share some common traits but permitting variation in size. Their formulation and its generalizations have subsequently been used widely in biological modeling [18].

Let $u(t,x)$ be the number of individuals of size $x$ at time $t$. If we assume that the species has $M$ distinct size classes $x_1, x_2, \ldots, x_M$, the total population $N(t)$ at time $t$ is given by

$$N(t) = \sum_{i=1}^{M} u(t, x_i).$$

This is size-discrete modeling. Here, growth is a jump from one size class to the next. For growth to be continuous, we let $x = x(t)$ be a continuous function of $t$. Now we cannot determine how many individuals have one specific size; instead, we calculate the number in an interval of sizes. We use $u(t,x)$ to denote the size density (in numbers per unit size) and calculate the number of individuals between sizes $a$ and $b$ at time $t$ by

$$N_{ab}(t) = \int_a^b u(t,\xi)\,d\xi.$$

It is important to note that $x$ is not a spatial variable and has nothing to do with the location of an individual in the medium; it denotes size. Since $x(t)$ is size, the flux here describes growth from size $x$ to size $x + \Delta x$. Also, the size density $u(t,x)$ has units of individuals/size, which is very different from a spatial density, which might have units of individuals/length$^3$.

As already mentioned, to balance an aggregate model against an individual model, Sinko and Streifer grouped together individuals sharing common traits. Specifically, they made the following assumptions:

1. The growth rate, $g > 0$, is the same for all individuals of the same size. That is,

$$\frac{dx}{dt} = g(t,x).$$

The simplifying effect of this equation is that the growth of individuals of all sizes is governed by this one equation. Moreover, it is assumed that $g$ is a continuous function.

2. Individuals of the same size have the same likelihood of death. In a simple version, all sizes have the same death rate. This gives the following basic equation for “simple” death:

$$\frac{dN}{dt} = -\mu N(t),$$

where $\mu$ is the constant of proportionality of mortality. A more complicated model will have $\mu = \mu(x)$, so that mortality is a function of size (e.g., a large individual might be more likely to die than a small individual).

3. The population is sufficiently large to be treated with a continuum model.

4. There is a “smallest” and a “largest” size ($x_0 \le x \le x_1$).

5. The birth (also called recruitment) rate is a linear function of the population size density and is given by

$$R(t) = \int_{x_0}^{x_1} k(t,\xi)u(t,\xi)\,d\xi,$$

where $k(t,\xi)$ is the size-dependent fertility term, also called the fecundity function.

We begin the model derivation by first considering the simple case. Here, we assume that there are no births and no deaths; thus, the population size is constant. That is,

$$\int_{x_0}^{x_1} u(t,x)\,dx = C,$$

where the constant $C$ is the total population. Now consider the population with sizes between $a$ and $b$ at time $t_0$:

$$N_{ab}(t_0) = \int_a^b u(t_0,\xi)\,d\xi,$$

which is the shaded area depicted in Figure 9.4.

FIGURE 9.4: Total population from size $a$ to $b$ at time $t_0$.

Next, we consider the same distribution at some later time $t_1 > t_0$. In that time, the fish that were of size $a$ grow by $\Delta a$ and the fish of size $b$ grow by $\Delta b$. However, the number of fish between sizes $a + \Delta a$ and $b + \Delta b$ at time $t_1$ should be the same as the number of fish between sizes $a$ and $b$ at time $t_0$. That is,

$$\int_a^b u(t_0,\xi)\,d\xi = \int_{a+\Delta a}^{b+\Delta b} u(t_1,\xi)\,d\xi.$$

This is certainly not true if fish from other size classes enter this size interval. We can easily show, and therefore subsequently assume, that this cannot happen.

Let $t = t_0$ and assume that $x^{(1)}(t_0) < x^{(2)}(t_0)$ for two size classes $x^{(1)}$, $x^{(2)}$. We will show that $x^{(1)}(t) < x^{(2)}(t)$ for all $t > t_0$. Consider the growth equations with initial conditions

$$\dot x^{(1)} = g(t, x^{(1)}),\quad x^{(1)}(t_0) = x^1_0, \qquad \dot x^{(2)} = g(t, x^{(2)}),\quad x^{(2)}(t_0) = x^2_0.$$

We prove the assertion by contradiction. Suppose that $x^1_0 < x^2_0$ but $x^{(1)}(t) \ge x^{(2)}(t)$ for some $t > t_0$. Since both trajectories are continuous, there is a time $t_{new} \in (t_0, t]$ at which the two sizes $x^{(1)}$ and $x^{(2)}$ coincide (see Figure 9.5). However, the two individuals then follow different paths from $t_{new}$ to $t_{later}$, which contradicts our assumption that individuals of the same size grow at the same rate. This is indeed a consequence of the uniqueness of solutions to the ordinary differential equation for $x(t)$ (guaranteed, for example, when $g$ is Lipschitz continuous in $x$).

FIGURE 9.5: Size trajectories $x^{(1)}(t)$ and $x^{(2)}(t)$ starting from $x^1_0 < x^2_0$, with the times $t_{new}$ and $t_{later}$ marked.

Now let $a, b \in (x_0, x_1)$ with $a < b$, and let $x(t; t_0, \eta)$ denote the unique solution to

$$\dot x = g(t,x), \qquad x(t_0) = \eta.$$


Since there are no births or deaths and individuals cannot “jump” into a different size interval, we have

$$\int_a^b u(t_0,\xi)\,d\xi = \int_{x(t;t_0,a)}^{x(t;t_0,b)} u(t,\xi)\,d\xi. \qquad (9.5)$$

To obtain the differential version of the conservation formula (9.5), we differentiate both sides of the equation with respect to $t$:

$$\frac{d}{dt}\int_a^b u(t_0,\xi)\,d\xi = \frac{d}{dt}\int_{x(t;t_0,a)}^{x(t;t_0,b)} u(t,\xi)\,d\xi.$$

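The left side is zero, since it does not depend on $t$. Using Leibniz’s rule [16] on the right side, writing $\dot x$ for the time derivative of $x(t;t_0,\cdot)$, and then applying the fundamental theorem of calculus, we have

$$\begin{aligned}
0 &= \int_{x(t;t_0,a)}^{x(t;t_0,b)} \frac{\partial}{\partial t}u(t,\xi)\,d\xi + u\big(t,x(t;t_0,b)\big)\dot x(t;t_0,b) - u\big(t,x(t;t_0,a)\big)\dot x(t;t_0,a)\\
&= \int_{x(t;t_0,a)}^{x(t;t_0,b)} \frac{\partial}{\partial t}u(t,\xi)\,d\xi + u\big(t,x(t;t_0,b)\big)g\big(t,x(t;t_0,b)\big) - u\big(t,x(t;t_0,a)\big)g\big(t,x(t;t_0,a)\big)\\
&= \int_{x(t;t_0,a)}^{x(t;t_0,b)} \frac{\partial}{\partial t}u(t,\xi)\,d\xi + \int_{x(t;t_0,a)}^{x(t;t_0,b)} \frac{\partial}{\partial \xi}\big(u(t,\xi)g(t,\xi)\big)\,d\xi\\
&= \int_{x(t;t_0,a)}^{x(t;t_0,b)} \left[\frac{\partial}{\partial t}u(t,\xi) + \frac{\partial}{\partial \xi}\big(u(t,\xi)g(t,\xi)\big)\right]d\xi.
\end{aligned}$$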

Since $\big(x(t;t_0,a), x(t;t_0,b)\big)$ is an arbitrary interval of sizes, the (continuous) integrand must be zero, and we obtain the equation of conservation

$$\frac{\partial}{\partial t}u(t,x) + \frac{\partial}{\partial x}\big(u(t,x)g(t,x)\big) = 0. \qquad (9.6)$$

We now present another way to derive the conservation equation (9.6), by balancing fluxes. That is,

rate of change of population in the size interval $(a,b)$ = rate of individuals entering $(a,b)$ − rate of individuals leaving $(a,b)$.

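Let the interval be $[x, x+\Delta x]$, and let $N_{x,x+\Delta x}(t) = \int_x^{x+\Delta x} u(t,\xi)\,d\xi$ denote the number of individuals in it. Then

$$\frac{d}{dt}N_{x,x+\Delta x}(t) = g(t,x)u(t,x) - g(t,x+\Delta x)u(t,x+\Delta x), \tag{9.7}$$

that is,

$$\frac{d}{dt}\int_x^{x+\Delta x} u(t,\xi)\,d\xi = g(t,x)u(t,x) - g(t,x+\Delta x)u(t,x+\Delta x),$$

or, after dividing by $\Delta x$,

$$\frac{d}{dt}\,\frac{1}{\Delta x}\int_x^{x+\Delta x} u(t,\xi)\,d\xi = \frac{g(t,x)u(t,x) - g(t,x+\Delta x)u(t,x+\Delta x)}{\Delta x}.$$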


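Taking the limit as $\Delta x$ approaches zero, we obtain

$$\frac{\partial}{\partial t}u(t,x) = -\frac{\partial}{\partial x}\big(g(t,x)\,u(t,x)\big),$$

or

$$\frac{\partial}{\partial t}u(t,x) + \frac{\partial}{\partial x}\big(g(t,x)\,u(t,x)\big) = 0.$$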

This is a hyperbolic partial differential equation. Hence growth follows the "characteristics" (in this case, the solutions $(t, x(t))$ of the equation $dx/dt = g(t,x)$). Let us consider what happens to the individuals that start at size $x_0$ at time $t_0$. At any later time $t_1 > t_0$, all individuals in our system have sizes at least $x(t_1; t_0, x_0) > x_0$. In addition, since $x_1$ is the largest size, $x(t) < x_1$ for all $t$. Therefore, as $t$ becomes large, all individuals will be at a size close to the maximum size $x_1$; that is, the population "bunches up" near $x_1$, and there are no individuals in the shaded region depicted in Figure 9.6. This is a major drawback of the pure conservation model. To overcome this undesirable feature, we now add births and deaths.
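To see the "bunching up" concretely, the short sketch below follows several characteristics of the pure conservation model. It assumes, for illustration only, the growth rate $g(x) = b(1 - x)$ with maximum size $x_1 = 1$ and $b = 4.5$ (the form used in the distributed growth rate project later in this chapter), for which $dx/dt = g(x)$ has the closed-form solution $x(t) = 1 - (1 - x_{start})e^{-bt}$.

```python
import numpy as np

def characteristic(t, x_start, b=4.5, x_max=1.0):
    """Size at time t of an individual that had size x_start at time 0.

    Assumes g(x) = b * (x_max - x), so dx/dt = g(x) has the exact solution
    x(t) = x_max - (x_max - x_start) * exp(-b t)."""
    return x_max - (x_max - x_start) * np.exp(-b * t)

# Initial sizes spread over [0, 0.5]
starts = np.linspace(0.0, 0.5, 6)

for t in (0.0, 0.5, 1.0, 2.0):
    sizes = characteristic(t, starts)
    spread = sizes.max() - sizes.min()
    # Every trajectory stays below x_max, keeps its order, and the spread shrinks
    print(f"t = {t:3.1f}: smallest size {sizes.min():.4f}, spread {spread:.6f}")
```

The smallest size at each time is the characteristic that started at $x_0 = 0$, and the spread decays like $e^{-bt}$, so the whole cohort piles up just below $x_1$.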

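FIGURE 9.6: Growth characteristic of the conservation equation (the characteristic $x(t; t_0, x_0)$ in the $(x, t)$ plane, running from $x_0$ toward $x_1$, with times $t_0$ and $t_1$ marked).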

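Recalling the flux balance equation (9.7), we now add a death rate term to the right side:

$$\frac{\partial}{\partial t}\int_x^{x+\Delta x} u(t,\xi)\,d\xi = g(t,x)u(t,x) - g(t,x+\Delta x)u(t,x+\Delta x) - \text{death rate term}. \tag{9.8}$$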

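The death term depends on a mortality factor $\mu(t,x)$, as well as on $u(t,x)$ and $\Delta x$. Therefore, the flux balance equation (9.8) becomes

$$\frac{\partial}{\partial t}\int_x^{x+\Delta x} u(t,\xi)\,d\xi = g(t,x)u(t,x) - g(t,x+\Delta x)u(t,x+\Delta x) - \int_x^{x+\Delta x}\mu(t,\xi)\,u(t,\xi)\,d\xi. \tag{9.9}$$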

Dividing both sides of equation (9.9) by $\Delta x$ and letting $\Delta x$ go to zero, we obtain

$$\frac{\partial}{\partial t}u(t,x) = -\frac{\partial}{\partial x}\big(g(t,x)u(t,x)\big) - \mu(t,x)u(t,x). \tag{9.10}$$

Equation (9.10) is known as the McKendrick-Von Foerster equation or the Sinko-Streifer equation [20]. The functions $g(t,x)$ and $\mu(t,x)$ are, respectively, the growth rate of an individual of size $x$ at time $t$ and the mortality rate, i.e., the fraction of individuals of size $x$ dying per unit time at time $t$. To complete the description of this mathematical model, we must specify an initial condition

$$u(0,x) = \Phi(x)$$

and a boundary condition. For simplicity, we assume that all newborns enter the population at the smallest size $x_0$. More specifically, we have

rate of population entering at $x_0$ = birth rate,

$$g(t,x_0)u(t,x_0) = \int_{x_0}^{x_1} k(t,\xi)u(t,\xi)\,d\xi,$$

or

$$R(t) = g(t,x)u(t,x)\big|_{x=x_0} = \int_{x_0}^{x_1} k(t,\xi)u(t,\xi)\,d\xi, \tag{9.11}$$

where $k(t,\xi)$ is the birth rate (fecundity) of individuals of size $\xi$ at time $t$.

Here $R$ is known as the recruitment rate. Note that once newborns enter the system, they follow the characteristic growth curves just like all other individuals. The addition of (9.11) essentially completes the specification of the mathematical model (9.10). However, since $x_1$ is the maximum attainable size, we also impose the physical condition

$$g(t,x)u(t,x)\big|_{x=x_1} = 0.$$

If the functions $g(t,x)$, $\mu(t,x)$, $R(t)$ and $\Phi(x)$ are known explicitly, this system can be solved using the method of characteristics [14]. To explain the method, we begin with the simpler case where the growth rate is assumed to be constant, $g(t,x) = a$, and the mortality factor is $\mu(t,x) = 0$. That is,

$$u_t + a u_x = 0, \qquad u(0,x) = \Phi(x).$$


Note that the total (or directional) derivative of $u$ is given by

$$du = u_t\,dt + u_x\,dx = \left(u_t + u_x\frac{dx}{dt}\right)dt.$$

Hence, in the direction $dx/dt = a$ we have

$$du = (u_t + a u_x)\,dt = 0.$$

That is, $u$ is constant along the line given by $dx/dt = a$. This line is called the characteristic of the partial differential equation. (If $a$ is not constant, then the characteristic is a curve rather than a straight line.)

We next consider the characteristic equation

$$\frac{dx}{dt} = a, \qquad x(0) = x_0,$$

whose solution is $x(t) = at + x_0$. Since $u(t,x)$ must be constant on this curve, we have

$$u(t,x) = u(0,x_0) = \Phi(x_0) = \Phi(x - at),$$

where the initial population density is $\Phi(x)$. The solution is thus determined by the initial condition, whose profile moves to the right with velocity $a$ (the slope $dx/dt$ of the characteristic curves) as $t$ increases (see Figure 9.7).
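A quick numerical check of $u(t,x) = \Phi(x - at)$: the sketch below assumes a hypothetical Gaussian initial density and confirms, with central finite differences, that the translated profile satisfies $u_t + a u_x = 0$, and that its value does not change along the characteristic $x = x_0 + at$.

```python
import numpy as np

def phi(x):
    # Hypothetical initial density: a Gaussian bump centred at x = 0.2
    return np.exp(-((x - 0.2) / 0.05) ** 2)

def u_exact(t, x, a):
    # Solution of u_t + a u_x = 0 with u(0, x) = phi(x)
    return phi(x - a * t)

def pde_residual(t, x, a, h=1e-5):
    # Central-difference approximation of u_t + a u_x (should be close to 0)
    u_t = (u_exact(t + h, x, a) - u_exact(t - h, x, a)) / (2 * h)
    u_x = (u_exact(t, x + h, a) - u_exact(t, x - h, a)) / (2 * h)
    return u_t + a * u_x

a = 0.5
x = np.linspace(0.0, 1.0, 101)
print("max |u_t + a u_x| at t = 0.6:", np.abs(pde_residual(0.6, x, a)).max())

# The value carried along the characteristic x = x0 + a t stays phi(x0)
x0 = 0.15
print([float(u_exact(t, x0 + a * t, a)) for t in (0.0, 0.4, 1.0)])
```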

FIGURE 9.7: Solution to equation (9.10) along the characteristic curves for $g(t,x) \equiv a$ and $\mu = 0$ (the initial profile $\Phi(x)$ is carried along characteristic curves of slope $dt/dx = 1/a$ corresponding to different initial sizes).


We now extend the above ideas to find the solution to the Sinko-Streifer model given by the following initial-boundary value problem:

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\big(g(t,x)u(t,x)\big) = -\mu(t,x)u(t,x), \tag{9.12}$$

with boundary condition

$$g(t,x_0)u(t,x_0) = R(t) \tag{9.13}$$

and initial condition

$$u(0,x) = \Phi(x). \tag{9.14}$$

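The total derivative of $u$ along the characteristic curve $dx/dt = g$ is given by

$$\frac{du}{dt} = \frac{\partial u}{\partial x}\frac{dx}{dt} + \frac{\partial u}{\partial t} = \frac{\partial u}{\partial x}g(t,x) + \frac{\partial u}{\partial t}.$$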

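Since $\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\big(g(t,x)u(t,x)\big) = -\mu(t,x)u(t,x)$, expanding the derivative of the product gives

$$\frac{\partial u}{\partial t} + g(t,x)\frac{\partial u}{\partial x} = -u(t,x)\frac{\partial}{\partial x}g(t,x) - \mu(t,x)u(t,x),$$

which implies

$$\frac{du}{dt} = -u(t,x)\frac{\partial}{\partial x}g(t,x) - \mu(t,x)u(t,x).$$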

That is, along the characteristic curve $dx/dt = g(t,x)$, the solution of the Sinko-Streifer model satisfies the ordinary differential equation

$$\frac{du}{dt} = -u g_x - \mu u = -(g_x + \mu)u,$$

which, after separation of variables, yields

$$u = v_0\, e^{-\int (g_x + \mu)\,dt}. \tag{9.15}$$

Here $v_0$ is a constant of integration yet to be determined. We emphasize again that the solution $u = v_0 e^{-\int (g_x+\mu)\,dt}$ is valid only for $(t,x)$ satisfying $dx/dt = g(t,x)$, that is, along the characteristic curve. Let $(t, X(t;\bar t,\bar x))$ denote the characteristic curve passing through $(\bar t,\bar x)$ in the $(t,x)$ plane, as depicted in Figure 9.8, where $X$ satisfies

$$\frac{d}{dt}X(t;\bar t,\bar x) = g(t, X(t;\bar t,\bar x)), \qquad X(\bar t;\bar t,\bar x) = \bar x.$$


FIGURE 9.8: Characteristic curve $X(t;\bar t,\bar x)$ through the point $(\bar t,\bar x)$, with sizes between $x_0$ and $x_1$.

The function $X(t;\bar t,\bar x): t \mapsto x$ maps $t$ (time) to $x$ (size). Since $g$ is assumed to be positive, $dX/dt > 0$. Therefore $X$ is a strictly increasing function and hence has an inverse $T(x;\bar t,\bar x): x \mapsto t$. The characteristic curve passing through $(\bar t,\bar x)$ is thus also given by $(T(x;\bar t,\bar x), x)$.

Now let $G(x) = T(x; 0, x_0)$ denote the curve passing through $(0, x_0)$. This curve divides the $(t,x)$ plane into two parts, as depicted in Figure 9.9. We therefore divide our considerations into two separate cases, corresponding to the two regions of the $(t,x)$ plane separated by the curve $(G(x), x) = (T(x;0,x_0), x) = (t, X(t;0,x_0))$.

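1. For $t \le G(x)$ we obtain

$$u(t,x) = v_0\, e^{-\int (g_x+\mu)\,dt} = v_0\, e^{-\int_0^t [g_x(\xi,x) + \mu(\xi,x)]\,d\xi}, \tag{9.16}$$

where the size $x$ in the integrand is understood to move along the characteristic.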

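Evaluating equation (9.16) at $t = 0$ we find

$$\Phi(\bar x) = u(0,\bar x) = v_0\, e^{-\int_0^0 (g_x+\mu)\,d\xi} = v_0,$$

which implies $v_0 = \Phi(\bar x)$, where $\bar x = X(0;t,x)$ is the size at time $0$ on the characteristic through $(t,x)$. Substituting this back into equation (9.16), we have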

$$u(t,x) = \Phi(\bar x)\, e^{-\int_0^t [g_x(\xi,x)+\mu(\xi,x)]\,d\xi}.$$

This expression for $u(t,x)$ holds only for values of $x$ and $t$ that lie on the characteristic curves. That is,

$$u(t,x) = \Phi(X(0;t,x))\, e^{-\int_0^t [g_x(\xi,X(\xi;t,x)) + \mu(\xi,X(\xi;t,x))]\,d\xi}. \tag{9.17}$$


FIGURE 9.9: Regions in the $(t,x)$ plane defining the solution: the curve $t = G(x)$, which starts at $x_0$, separates the region $t < G(x)$ from the region $t > G(x)$.

2. Next, in the region where $t > G(x)$, we have

$$u(t,x) = v_0\, e^{-\int_{T(x_0;t,x)}^{t} [g_x(\xi,x)+\mu(\xi,x)]\,d\xi}, \tag{9.18}$$

where $T(x_0;t,x)$ is the time at which the characteristic through $(t,x)$ passes through the minimum size $x_0$. Hence at the point $(t,x_0)$, $T(x_0;t,x_0) = t$. Therefore,

$$u(t,x_0) = v_0\, e^{-\int_t^t [g_x(\xi,x)+\mu(\xi,x)]\,d\xi} = v_0.$$

We thus obtain from the boundary condition (9.13)

$$v_0 = u(t,x_0) = \frac{R(t)}{g(t,x_0)}.$$

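Since $v_0$ is constant along each characteristic, the boundary condition must be evaluated at the time $T(x_0;t,x)$ at which the characteristic through $(t,x)$ enters the domain at $x_0$. Substituting $v_0 = R(T(x_0;t,x))/g(T(x_0;t,x),x_0)$ into equation (9.18), and recalling that the expression is valid only for $(t,x)$ lying on the characteristic curves, we obtain

$$u(t,x) = \frac{R(T(x_0;t,x))}{g(T(x_0;t,x),x_0)}\, e^{-\int_{T(x_0;t,x)}^{t} [g_x(\xi,X(\xi;t,x)) + \mu(\xi,X(\xi;t,x))]\,d\xi}. \tag{9.19}$$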


Equations (9.17) and (9.19) define the complete analytical solution to the Sinko-Streifer model. In summary, this solution consists of two parts:

1. The first part, given by (9.17), describes the part of the population whose members are survivors of the initial population density $u(0,x) = \Phi(x)$. This is called the initial condition driven solution.

2. The second part, given by (9.19), describes the part of the population whose members were born after time $t = 0$ and entered the population through the boundary. This is called the recruitment driven solution.
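Formula (9.17) can also be evaluated numerically when no closed form is available. The sketch below is one possible approach, not the method of the text: it integrates the characteristic backward from $(t, x)$ to time 0 with SciPy's `solve_ivp`, accumulating the exponent $-\int_0^t (g_x + \mu)\,d\xi$ along the way. The growth rate, its derivative $g_x$, the mortality and $\Phi$ are user-supplied callables (assumptions of the sketch), and points whose characteristic started below $x_0$ (the recruitment region $t > G(x)$) return NaN.

```python
import numpy as np
from scipy.integrate import solve_ivp

def u_initial_driven(t, x, g, gx, mu, phi, x0=0.0):
    """Evaluate the initial condition driven solution (9.17) at (t, x).

    g, gx, mu: callables of (time, size); phi: initial density.
    The state y = [X, I] holds the size X on the characteristic through (t, x)
    and I(s) = -integral_s^t (g_x + mu) dxi, which is zero at s = t."""
    if t == 0.0:
        return phi(x)

    def rhs(s, y):
        X = y[0]
        return [g(s, X), gx(s, X) + mu(s, X)]

    # Integrate backward in time, from s = t down to s = 0
    sol = solve_ivp(rhs, (t, 0.0), [x, 0.0], rtol=1e-10, atol=1e-12)
    X0, I0 = sol.y[:, -1]
    if X0 < x0:
        return np.nan  # characteristic enters through x0: use (9.19) instead
    return phi(X0) * np.exp(I0)

phi = lambda x: np.exp(-((x - 0.3) / 0.1) ** 2)

# Constant coefficients: compare with Phi(x - g0 t) e^{-mu0 t}
g0, mu0 = 0.5, 1.0
u_num = u_initial_driven(0.2, 0.5, lambda s, X: g0, lambda s, X: 0.0,
                         lambda s, X: mu0, phi)
print(u_num, phi(0.5 - g0 * 0.2) * np.exp(-mu0 * 0.2))
```

For time-independent $g$ the closed-form route through $G$ and $G^{-1}$ derived below is cheaper; the ODE route is mainly useful when $g$ or $\mu$ depends on time.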

We note that if the recruitment rate $R(t)$ is known, expressions (9.17) and (9.19) completely decouple the solution to the Sinko-Streifer equation. On the other hand, if the recruitment rate is given by (9.11), we have

$$R(T(x_0;t,x)) = \int_{x_0}^{x_1} k(T(x_0;t,x),\xi)\,u(T(x_0;t,x),\xi)\,d\xi, \tag{9.20}$$

and the solution does not decouple, since values of $u$ in the region $t < G(x)$ are needed to compute the recruitment rate. In the following two examples, we consider two special cases and show how to apply formulas (9.17) and (9.19) to write explicit solutions of the Sinko-Streifer model (9.10).

Example: Constant Growth Rate and Mortality

In this example, we consider the simple case where

$$g = g_0 \ \text{(constant)}, \qquad \mu = \mu_0 \ \text{(constant)}.$$

Hence, the model is given by

$$\frac{\partial u}{\partial t} + g_0\frac{\partial u}{\partial x} = -\mu_0 u$$

with boundary condition $g_0 u(t,x_0) = R(t)$ and initial condition $u(0,x) = \Phi(x)$.

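In this simple case, the characteristic curve passing through $(\bar t,\bar x)$ is given by the initial value problem

$$\frac{dx}{dt} = g_0, \qquad x(\bar t) = \bar x,$$

which can be solved to obtain

$$x - \bar x = (t - \bar t)g_0.$$

Thus, we obtain

$$X(t;\bar t,\bar x) = \bar x + g_0(t - \bar t), \qquad T(x;\bar t,\bar x) = \bar t + \frac{1}{g_0}(x - \bar x).$$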

From these equations we find that the curve passing through $(0,x_0)$ is

$$G(x) = T(x;0,x_0) = \frac{1}{g_0}(x - x_0).$$

Therefore, since $g_x = 0$, the initial condition driven solution, defined for $t \le G(x) = (x - x_0)/g_0$, is

$$u(t,x) = \Phi(x - g_0 t)\, e^{-\int_0^t \mu_0\,ds} = \Phi(x - g_0 t)\, e^{-\mu_0 t}.$$

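The recruitment driven solution, defined for $t > G(x) = (x - x_0)/g_0$, has the form

$$
\begin{aligned}
u(t,x) &= \frac{R(T(x_0;t,x))}{g_0}\, e^{-\int_{T(x_0;t,x)}^{t}\mu_0\,ds} \\
&= \frac{1}{g_0}R\!\left(t + \tfrac{1}{g_0}(x_0 - x)\right) e^{-\int_{t+\frac{1}{g_0}(x_0-x)}^{t}\mu_0\,ds} \\
&= \frac{1}{g_0}R\!\left(t + \tfrac{1}{g_0}(x_0 - x)\right) e^{\frac{\mu_0}{g_0}(x_0 - x)}.
\end{aligned}
$$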
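The two formulas of this example combine into a single vectorized function. The sketch below assumes NumPy and vectorized callables `phi` and `R` that return 0 outside their supports (in particular `R` must return 0 for negative arguments, since both branches are evaluated before `np.where` selects one); the test functions are hypothetical. It checks the initial condition and the boundary condition $g_0 u(t,x_0) = R(t)$.

```python
import numpy as np

def u_constant(t, x, g0, mu0, phi, R, x0=0.0):
    """Exact solution of u_t + g0 u_x = -mu0 u, g0 u(t, x0) = R(t), u(0, x) = phi(x)."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    G = (x - x0) / g0                              # G(x): curve through (0, x0)
    init = phi(x - g0 * t) * np.exp(-mu0 * t)      # t <= G(x): initial condition driven
    recr = R(t - G) / g0 * np.exp(-mu0 * G)        # t >  G(x): recruitment driven
    return np.where(t <= G, init, recr)

# Hypothetical data for a quick check
phi = lambda x: np.where((x >= 0.0) & (x <= 0.2), np.sin(5 * np.pi * x) ** 2, 0.0)
R = lambda t: np.where(t >= 0.0, 1.0 + np.sin(t), 0.0)
g0, mu0 = 0.185, 1.9

xs = np.linspace(0.0, 0.5, 51)
ts = np.linspace(0.1, 5.0, 50)
print(np.allclose(u_constant(0.0, xs, g0, mu0, phi, R), phi(xs)))       # initial condition
print(np.allclose(g0 * u_constant(ts, 0.0, g0, mu0, phi, R), R(ts)))    # boundary condition
```

Note that $R(t - G(x))$ is the same as $R(t + (x_0 - x)/g_0)$ and $e^{-\mu_0 G(x)}$ the same as $e^{\mu_0 (x_0 - x)/g_0}$ in the formula above.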

In the next example, we consider a more general case where the growth rate and the mortality are both size dependent.

Example: Size-Dependent Growth Rate and Mortality

In this example, we assume that

$$g = g(x) \quad\text{and}\quad \mu = \mu(x).$$

Solving the characteristic equation

$$\frac{dx}{dt} = g(x), \qquad x(0) = x_0,$$

by separation of variables and integration, we obtain

$$\int_{x_0}^{x}\frac{ds}{g(s)} = \int_0^t d\tau = t.$$

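Now define the function

$$H(x) \equiv \int_{x_0}^{x}\frac{ds}{g(s)}.$$

Observing that for $x_0 \le \bar x \le x$,

$$\int_{x_0}^{x}\frac{ds}{g(s)} = \int_{x_0}^{\bar x}\frac{ds}{g(s)} + \int_{\bar x}^{x}\frac{ds}{g(s)},$$

we find that

$$\int_{\bar x}^{x}\frac{ds}{g(s)} = \int_{x_0}^{x}\frac{ds}{g(s)} - \int_{x_0}^{\bar x}\frac{ds}{g(s)}.$$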

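Hence, for the characteristic passing through $(\bar t,\bar x)$, we obtain

$$t = \bar t + \int_{\bar x}^{x}\frac{ds}{g(s)} = \bar t + \int_{x_0}^{x}\frac{ds}{g(s)} - \int_{x_0}^{\bar x}\frac{ds}{g(s)}.$$

From the above equation, it follows that

$$T(x;\bar t,\bar x) = t = \bar t + H(x) - H(\bar x)$$

and

$$x = H^{-1}(t - \bar t + H(\bar x)) = X(t;\bar t,\bar x).$$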

In particular, recalling the earlier definition $G(x) = T(x;0,x_0)$, i.e., $(G(x),x)$ is the characteristic passing through $(0,x_0)$, we see that $T(x;0,x_0) = 0 + H(x) - H(x_0) = H(x)$, so that in the case of a time independent growth rate we have $G(x) = \int_{x_0}^{x}\frac{d\xi}{g(\xi)}$. That is, $G(x) \equiv H(x)$. Thus, we have

$$T(x;\bar t,\bar x) = t = \bar t + G(x) - G(\bar x)$$

and

$$x = G^{-1}(t - \bar t + G(\bar x)) = X(t;\bar t,\bar x).$$
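When $G(x) = \int_{x_0}^x d\xi/g(\xi)$ has no convenient closed form, $T$ and $X$ can be evaluated numerically. The sketch below is an illustration under stated assumptions: it uses SciPy's `quad` for the integral and `brentq` for $G^{-1}$, requires $g > 0$ on $[x_0, x_{hi}]$, where `x_hi` is a hypothetical bracketing size below the maximum size, and is checked against $g(x) = b(1-x)$, for which $G(x) = -\ln(1-x)/b$ when $x_0 = 0$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def make_G(g, x0):
    """Return G(x) = integral from x0 to x of ds / g(s) (g positive, time independent)."""
    return lambda x: quad(lambda s: 1.0 / g(s), x0, x)[0]

def T(x, tbar, xbar, G):
    """Time at which the characteristic through (tbar, xbar) reaches size x."""
    return tbar + G(x) - G(xbar)

def X(t, tbar, xbar, G, x0, x_hi):
    """Size at time t on the characteristic through (tbar, xbar): G^{-1}(t - tbar + G(xbar)).

    G is strictly increasing, so the root is bracketed by [x0, x_hi]
    whenever 0 <= t - tbar + G(xbar) <= G(x_hi)."""
    target = t - tbar + G(xbar)
    return brentq(lambda x: G(x) - target, x0, x_hi)

b = 4.5
G = make_G(lambda s: b * (1.0 - s), 0.0)
print(G(0.5), -np.log(0.5) / b)                                   # numerical vs closed form
print(X(0.1, 0.0, 0.2, G, 0.0, 0.999), 1 - 0.8 * np.exp(-b * 0.1))
```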


9.6 The Sinko-Streifer Model and Inverse Problems

So far, we have derived the analytical solution to the Sinko-Streifer model (9.12)-(9.14), given $g$, $\mu$, $R$, and $\Phi$. This is called the forward problem. We now consider the inverse problem: given $u(t,x)$, how do we find $g$, $\mu$, $R$, and/or $\Phi$?

One analytical method is due to Hackney and Webb [15], who developed a method for estimating growth and mortality rates from observed size distributions of larval fish and demonstrated the method using data from larval crappie. We now investigate the foundations of the Hackney-Webb method and identify any possible limitations.

Assume that we have observations $u(t_i, x_j)$ at times $t_i$ for sizes $x_j$. In essence, when applying the Hackney-Webb method, one computes for each $j$ the sums

$$n_j \equiv \sum_i u(t_i,x_j) \quad\text{and}\quad m_j \equiv \sum_i t_i\,u(t_i,x_j),$$

and then plots $x_j$ versus the quotient $p_j = m_j/n_j$. From this plot the growth rate $g$, which can depend on size, can be estimated. Similarly, a plot of $n_j$ versus $x_j$ yields an estimate of the mortality rate function $\mu$. To explain this methodology, we introduce the functions $n(x)$, $m(x)$, $p(x)$ and an auxiliary function $h(x)$ given by

$$h(x) = \int_0^\infty (t - G(x))\,u(t,x)\,dt,$$

$$n(x) = \int_0^\infty u(t,x)\,dt,$$

$$m(x) = \int_0^\infty t\,u(t,x)\,dt,$$

$$p(x) = \frac{m(x)}{n(x)}.$$

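We note that the quantities $n_j$, $m_j$ and $p_j$ may be viewed as discretizations of the integrals defining $n$, $m$ and $p$, respectively (for equally spaced sampling times, $n_j$ and $m_j$ approximate $n(x_j)$ and $m(x_j)$ up to a factor equal to the time step, which cancels in $p_j$). To understand the foundations of the Hackney-Webb method, we need to develop some relationships between $n$, $m$, $g$, and $\mu$.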

We begin by noting that

$$h(x) = m(x) - G(x)n(x),$$

or

$$G(x) = \frac{m(x) - h(x)}{n(x)} = p(x) - \frac{h(x)}{n(x)}.$$

If $h(x)/n(x)$ is constant, we obtain

$$G'(x) = \frac{1}{g(x)} = p'(x),$$

and hence a plot of $x$ versus $p(x)$ determines $g(x)$. We next show that $h(x)/n(x)$ is constant in certain situations. Consider $n(x)$, where

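$$
\begin{aligned}
n(x) &= \int_0^\infty u(t,x)\,dt = \int_0^{G(x)} u(t,x)\,dt + \int_{G(x)}^\infty u(t,x)\,dt \\
&= \int_0^{G(x)} u_{init}(t,x)\,dt + \int_{G(x)}^\infty u_{recr}(t,x)\,dt \\
&= n_{init}(x) + n_{recr}(x),
\end{aligned}
$$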

where $u_{init}$ and $u_{recr}$ are the initial condition driven and recruitment driven solutions derived earlier, given by equations (9.17) and (9.19), respectively. That is, for $t \le G(x)$,

$$u_{init}(t,x) = \Phi(X(0;t,x))\,e^{-\int_0^t [g_x(X(\xi;t,x)) + \mu(X(\xi;t,x))]\,d\xi},$$

and for $t > G(x)$,

$$u_{recr}(t,x) = \frac{R(T(x_0;t,x))}{g(x_0)}\,e^{-\int_{T(x_0;t,x)}^{t} [g_x(X(\xi;t,x)) + \mu(X(\xi;t,x))]\,d\xi},$$

where

$$T(x;\bar t,\bar x) = \bar t + G(x) - G(\bar x)$$

and

$$X(t;\bar t,\bar x) = G^{-1}(G(\bar x) + t - \bar t). \tag{9.21}$$

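Using the above equations for the initial condition driven and recruitment driven solutions, we now obtain

$$
\begin{aligned}
n_{init}(x) &= \int_0^{G(x)} u_{init}(t,x)\,dt \\
&= \int_0^{G(x)} \Phi(G^{-1}(G(x) - t))\,e^{-\int_0^t [g_x(X(\xi;t,x)) + \mu(X(\xi;t,x))]\,d\xi}\,dt.
\end{aligned}
$$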


Using the substitutions

$$\eta = G^{-1}(G(x) - t) \quad\text{and}\quad s = X(\xi;t,x) = G^{-1}(G(x) + \xi - t),$$

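we obtain

$$
\begin{aligned}
n_{init}(x) &= -\int_x^{x_0}\Phi(\eta)\,e^{-\int_\eta^x [g_x(s)+\mu(s)]\,G'(s)\,ds}\,G'(\eta)\,d\eta \\
&= \int_{x_0}^{x}\frac{\Phi(\eta)}{g(\eta)}\,e^{-\int_\eta^x \frac{1}{g(s)}[g_x(s)+\mu(s)]\,ds}\,d\eta,
\end{aligned}
$$

where we have used the identity $G(x) = \int_{x_0}^{x}\frac{ds}{g(s)}$ (so that $G' = 1/g$) and equation (9.21). Similarly, for the recruitment driven term,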

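$$
\begin{aligned}
n_{recr}(x) &= \int_{G(x)}^\infty u_{recr}(t,x)\,dt \\
&= \int_{G(x)}^\infty \frac{R(T(x_0;t,x))}{g(x_0)}\,e^{-\int_{T(x_0;t,x)}^{t}[g_x(X(\xi;t,x)) + \mu(X(\xi;t,x))]\,d\xi}\,dt;
\end{aligned}
$$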

if we let $\sigma = T(x_0;t,x) = t + G(x_0) - G(x) = t - G(x)$, we obtain

$$n_{recr}(x) = \int_0^\infty \frac{1}{g(x_0)}\,R(\sigma)\,e^{-\int_{x_0}^{x}\frac{1}{g(s)}[g_x(s)+\mu(s)]\,ds}\,d\sigma.$$

Now letting

$$D(x) = e^{-\int_{x_0}^{x}\frac{1}{g(s)}[g_x(s)+\mu(s)]\,ds},$$

we find

$$n_{recr}(x) = D(x)\int_0^\infty \frac{R(\sigma)}{g(x_0)}\,d\sigma = c_1 D(x),$$

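where $c_1$ is a constant. Similarly, it can be shown that

$$n_{init}(x) = D(x)\int_{x_0}^{x}\frac{\Phi(\eta)}{g(\eta)}\,e^{+\int_{x_0}^{\eta}\frac{1}{g(s)}[g_x(s)+\mu(s)]\,ds}\,d\eta,$$

$$
\begin{aligned}
h_{init}(x) &= -\int_{x_0}^{x}\frac{G(\eta)\Phi(\eta)}{g(\eta)}\,e^{-\int_\eta^x\frac{1}{g(s)}[g_x(s)+\mu(s)]\,ds}\,d\eta \\
&= -D(x)\int_{x_0}^{x}\frac{G(\eta)\Phi(\eta)}{g(\eta)}\,e^{+\int_{x_0}^{\eta}\frac{1}{g(s)}[g_x(s)+\mu(s)]\,ds}\,d\eta,
\end{aligned}
$$

$$h_{recr}(x) = D(x)\int_0^\infty \frac{\sigma R(\sigma)}{g(x_0)}\,d\sigma = c_2 D(x),$$

where $c_2$ is another constant.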


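Combining the above results, we obtain

$$
\begin{aligned}
n(x) &= n_{init}(x) + n_{recr}(x) \\
&= D(x)\int_{x_0}^{x}\frac{\Phi(\eta)}{g(\eta)}\,e^{\int_{x_0}^{\eta}\frac{1}{g}[g_x+\mu]\,ds}\,d\eta + D(x)c_1 \\
&= D(x)\left[c_1 + \int_{x_0}^{x}\frac{\Phi(\eta)}{g(\eta)}\,e^{\int_{x_0}^{\eta}\frac{1}{g}[g_x+\mu]\,ds}\,d\eta\right],
\end{aligned}
$$

while

$$
\begin{aligned}
h(x) &= h_{init}(x) + h_{recr}(x) \\
&= D(x)\left[c_2 - \int_{x_0}^{x}\frac{G(\eta)\Phi(\eta)}{g(\eta)}\,e^{\int_{x_0}^{\eta}\frac{1}{g}[g_x+\mu]\,ds}\,d\eta\right].
\end{aligned}
$$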

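Hence, if $\Phi$ vanishes outside $[x_0, \bar x]$, then for $x > \bar x$ the two integrals above no longer depend on $x$, and we have

$$n(x) = D(x)\,\tilde c_1, \qquad h(x) = D(x)\,\tilde c_2,$$

where $\tilde c_1 = c_1 + \int_{x_0}^{\bar x}\frac{\Phi(\eta)}{g(\eta)}e^{\int_{x_0}^{\eta}\frac{1}{g}[g_x+\mu]\,ds}\,d\eta$ and $\tilde c_2 = c_2 - \int_{x_0}^{\bar x}\frac{G(\eta)\Phi(\eta)}{g(\eta)}e^{\int_{x_0}^{\eta}\frac{1}{g}[g_x+\mu]\,ds}\,d\eta$. So $h(x)/n(x) = \tilde c_2/\tilde c_1$, which is constant for $x \ge \bar x$. In particular, for zero initial conditions ($\Phi \equiv 0$) we have $n(x) = c_1 D(x)$ and $h(x) = c_2 D(x)$, so that $h(x)/n(x) = c_2/c_1$ for all $x$.

Hence, the Hackney-Webb method can be expected to give estimates for zero initial conditions in the case of time independent growth and mortality!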
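The following sketch applies the Hackney-Webb sums to synthetic data from the constant-coefficient example with $\Phi = 0$, using the recruitment pulse $R(t)$ and the values $x_0 = 0$, $g_0 = 0.185$, $\mu_0 = 1.9$ from the project below. For constant $g$, the analysis above gives $p'(x) = 1/g$ and, since $n(x) = c_1 D(x)$, $(\ln n)'(x) = -(g' + \mu)/g = -\mu/g$; the sketch fits straight lines to $p_j$ and $\ln n_j$ against $x_j$ with `numpy.polyfit`. It deliberately uses a fine time grid (an assumption, not the project's $N_t = 20$) so that the sums approximate the integrals $n$ and $m$ closely.

```python
import numpy as np

ALPHA, BETA = 15.0, 1.0 / 15.0

def recruitment(t):
    """Recruitment pulse R(t) from the project below (alpha = 15, beta = 1/alpha)."""
    t = np.asarray(t, dtype=float)
    a, b = ALPHA, BETA
    rise = 0.75 * ((a * t - 1) - (a * t - 1) ** 3 / 3 + 2 / 3)
    s = t - 2 / a - b
    fall = -0.75 * ((a * s - 1) - (a * s - 1) ** 3 / 3 - 2 / 3)
    return np.select(
        [(t >= 0) & (t <= 2 / a),
         (t > 2 / a) & (t <= 2 / a + b),
         (t > 2 / a + b) & (t <= 4 / a + b)],
        [rise, np.ones_like(t), fall],
        default=0.0)

def hackney_webb(t, x, U):
    """Estimate constant g and mu from samples U[i, j] = u(t[i], x[j])."""
    n = U.sum(axis=0)                                   # n_j = sum_i u(t_i, x_j)
    m = (t[:, None] * U).sum(axis=0)                    # m_j = sum_i t_i u(t_i, x_j)
    p = m / n                                           # p_j = m_j / n_j
    g_est = 1.0 / np.polyfit(x, p, 1)[0]                # slope of p versus x is 1/g
    mu_est = -g_est * np.polyfit(x, np.log(n), 1)[0]    # slope of ln n is -mu/g
    return g_est, mu_est

# Synthetic, noise-free data: Phi = 0, so only the recruitment driven solution is present
g0, mu0 = 0.185, 1.9
t = np.linspace(0.0, 5.0, 5001)
x = np.linspace(0.025, 0.5, 20)
T, Xs = np.meshgrid(t, x, indexing="ij")
U = recruitment(T - Xs / g0) / g0 * np.exp(-mu0 * Xs / g0)

print(hackney_webb(t, x, U))
```

With a nonzero $\Phi$ the ratio $h/n$ is constant only beyond the support of $\Phi$, so the straight-line fits would have to be restricted to those sizes.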

A more complete comparison of the method of Hackney and Webb with the inverse least squares method introduced in the project below can be found in [8].

9.7 Size Structure and Mosquitofish Populations

We return now to the mosquitofish populations that we introduced as motivation at the beginning of this chapter. In previous sections we have discussed the Sinko-Streifer model and methods for its solution in both forward problem and inverse problem settings. While the Sinko-Streifer model is widely (and successfully) used in the literature on biological populations (see [1, 2, 17]), it has some rather serious shortcomings. These are readily seen by considering the mosquitofish data [12] depicted in Figure 9.10. In these data, we see that a pulse of population (23 July) exhibits in time both dispersion (6 August) and bifurcation (25 August). That is, a unimodal density disperses and becomes bimodal. Recalling the solution of the Sinko-Streifer equation (in particular, the initial condition driven solution (9.17)), we see that a pulse propagates without dispersion or bifurcation: the initial data $\Phi$ propagate along characteristics emanating from their region of nonzero support, with amplitude increasing or decreasing in time depending on the values of $\partial g/\partial x$ and $\mu$.


In fact, one can argue from the Sinko-Streifer equation itself and the method of characteristics that dispersion cannot occur unless $\partial g/\partial x > 0$; this is the condition for the characteristic curves defined by $dx/dt = g$ to spread apart. Such an assumption is inherently unreasonable in many biological applications: in our example it is equivalent to assuming that individual growth rates increase as size increases! Indeed, it is counter-intuitive that the larger one is, the faster one grows.

FIGURE 9.10: Mosquitofish data.


Thus, while there is no hope that the Sinko-Streifer model as developed in the previous sections can describe the mosquitofish data, one might be reluctant to abandon such a popular and reasonable growth model. One might instead turn to a more careful analysis of the assumptions underlying the Sinko-Streifer equation. Further investigation of the mosquitofish populations and their biological properties yields the additional information that males and females reach different maximum sizes (30 mm and 60 mm, respectively). Hence, we conclude that males and females in the size range 28-30 mm must grow at different rates. This immediately violates one of the underlying assumptions of the Sinko-Streifer model, namely that individuals of the same size grow at the same rate; i.e., the assumption $dx/dt = g(t,x)$ cannot be reasonable for mosquitofish (and possibly other) data. We next describe an idea first introduced in [9], later developed theoretically in [4, 5, 6, 10], and more recently used in modeling the early growth of shrimp [3, 7].

To generalize the Sinko-Streifer equation, we allow individuals of the same size to possess different individual growth rates. This can be accomplished by assuming the existence of "intrinsic" parameters $\gamma$ (which in general we cannot observe and hence cannot use to physically distinguish individuals in the data) on which the growth rates depend. Thus we assume

$$\frac{dx}{dt} = g(t,x;\gamma) \tag{9.22}$$

as a parameter-dependent individual growth rate. The parameter values may range over a set $\Gamma$ of admissible parameters, and the total population is composed of subgroups, grouped together in population substructures characterized by common $\gamma$ values. For example, if $\Gamma = \{\gamma_1, \gamma_2\}$ (think of males and females, with each sex possessing a different $\gamma$ value), and $p_i$ is the proportion of individuals with intrinsic parameter $\gamma_i$, then the total population density $v(t,x)$ is given by

$$v(t,x) = p_1 u(t,x;\gamma_1) + p_2 u(t,x;\gamma_2), \tag{9.23}$$

where $u(t,x;\gamma_i)$ is the solution to the Sinko-Streifer equation using (9.22) with $\gamma = \gamma_i$.

Of course, as soon as one admits this generalization, it is quite reasonable to assume multiple subclasses corresponding to a finite (or even infinite) family of $\gamma$ values (again, think here that not all males of the same size have the same $\gamma$ value), leading to a distribution of growth rates within the population. For $\Gamma = \{\gamma_1, \ldots, \gamma_M\}$ with corresponding proportions (probabilities) $p_i$, with $\sum_i p_i = 1$, the expression (9.23) for the total population density generalizes to

$$v(t,x) = \sum_{i=1}^{M} p_i\,u(t,x;\gamma_i). \tag{9.24}$$

In the case of an (infinite) continuum $\Gamma$ of intrinsic parameter values, the above ideas generalize to a probability measure or distribution $P$ characterizing the distribution of the $\gamma$'s in $\Gamma$. Equation (9.24) becomes

$$v(t,x) = \int_\Gamma u(t,x;\gamma)\,dP(\gamma). \tag{9.25}$$

If the distribution $P$ is (absolutely) continuous, i.e., possesses a corresponding density $p = dP/d\gamma$, then one has

$$v(t,x) = \int_\Gamma u(t,x;\gamma)\,p(\gamma)\,d\gamma.$$

In [11] it is shown that these generalizations of the Sinko-Streifer equation to include distributed growth rates do indeed allow the required dispersion and bifurcation, and so describe the mosquitofish data depicted in Figure 9.10 well. In the project on distributed growth rates below, the requested simulations allow the reader to demonstrate these features of the generalized Sinko-Streifer model. More recently [3, 7], the Sinko-Streifer system with growth rate distributions has been used successfully to model the variability in the early growth of shrimp. A mathematical and stochastic theoretical foundation, as well as computational ideas for this formulation, can be found in [3, 4, 5, 6, 10].

Project: Size-Structured Population Model Inverse Problem

Consider the following Sinko-Streifer model for a larval fish population:

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}(gu) = -\mu u, \qquad t \in (0, t_f],\ x \in (x_0, x_1),$$

$$g(t,x)u(t,x)\big|_{x=x_0} = R(t).$$


1. In the case of constant growth and death rates ($g(t,x) = g_0$ and $\mu(t,x) = \mu_0$), derive the exact solution $u(t,x)$ by the method of characteristics.

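2. Define the recruitment function $R(t)$ as follows:

$$
R(t) = \begin{cases}
\frac{3}{4}\left((\alpha t - 1) - \frac{(\alpha t - 1)^3}{3} + \frac{2}{3}\right), & t \in [0, 2/\alpha], \\
1, & t \in [2/\alpha, 2/\alpha + \beta], \\
-\frac{3}{4}\left((\alpha s - 1) - \frac{(\alpha s - 1)^3}{3} - \frac{2}{3}\right),\ \ s = t - 2/\alpha - \beta, & t \in [2/\alpha + \beta, 4/\alpha + \beta], \\
0, & \text{otherwise},
\end{cases}
$$

where $\alpha = 15$ and $\beta = 1/\alpha$.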


(i) Plot the recruitment function $R(t)$ as a function of $t$ for $t \in [0,5]$.

(ii) Assume that sampling of the population starts before the beginning of the reproductive season, so that the initial population distribution is zero ($\Phi(x) = 0$). (This also allows us to use the method of Hackney and Webb to determine the growth rate $g$ and the death rate $\mu$.) Furthermore, assume that $x_0 = 0$, $x_1 = 0.5$, growth rate $g_0 = 0.185$ and death rate $\mu_0 = 1.9$. Plot the exact solution $u(t,x)$ of the Sinko-Streifer model as a function of $t \in [0,5]$ and $x$ (three-dimensional plot). Describe the dynamics of $u(t,x)$.

3. We now create "simulated" data to be used for estimating growth and mortality rates. For this, we assume that the population is sampled at equally spaced times and sizes. We subdivide the size interval $[0, 0.5]$ and the time interval $[0, 5]$ into $N_x$ and $N_t$ equal subintervals of length $h_x = 0.5/N_x$ and $h_t = 5/N_t$, respectively. Let $u_s(t_i,x_j)$ denote the number of larval fish sampled at size $x_j = j\cdot h_x$, $j = 1,\ldots,N_x$, and at time $t_i = i\cdot h_t$, $i = 1,\ldots,N_t$. For the following problems take $N_x = N_t = 20$.

(i) Use the Hackney and Webb method to estimate the growth and death rates $g_0$ and $\mu_0$, respectively.

(ii) Now repeat the above question using the inverse least squares technique. For this question we use the following least squares criterion: minimize

$$J(g,\mu) = \sum_{i=1}^{N_t}\sum_{j=1}^{N_x}\left|u_c(t_i,x_j;g,\mu) - u_s(t_i,x_j)\right|^2,$$

where $u_c(t_i,x_j;g,\mu)$ is the characteristic solution evaluated at size $x_j$ and time $t_i$ for given values of $g$ and $\mu$. To compute the values $g_0^*$ and $\mu_0^*$ that minimize this least squares criterion, use the MATLAB routine fminu or the Nelder-Mead algorithm fminsearch.

4. In practice, collected data are corrupted by noise (for example, errors in collecting data, instrumental errors, etc.). In the following exercise, we test the sensitivity of the method of Hackney and Webb and of the inverse least squares method to errors in sampling the data. For this, we add an error term to each simulated datum as follows:

$$u_s(t_i,x_j) = u_s(t_i,x_j) + nl\cdot rand_{ij},$$

where $rand_{ij}$ are normally distributed random numbers with zero mean and variance 1.0. Use the MATLAB routine randn(m,n) to generate an m-by-n matrix with random entries. Finally, $nl$ is a noise level constant.


(i) For $nl = 0.01, 0.02, 0.05, 0.1, 0.2$, estimate the growth and death rates using both the Hackney and Webb method and the inverse least squares method. Create a table listing the estimated values of the growth and death rates and the values of the cost functionals for each value of $nl$. Describe the sensitivity of each method with respect to the noise level $nl$.

(ii) Plot on the same graph the solution $u(t,x)$ computed using the exact values of $g_0$ and $\mu_0$ and the solution computed using the estimated values of $g_0$ and $\mu_0$ for $nl = 0.01$.
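One possible Python analogue of parts 3(ii) and 4 uses `scipy.optimize.minimize` with `method="Nelder-Mead"` in place of MATLAB's fminsearch, and NumPy's `default_rng().standard_normal` in place of randn. The sketch assumes $\Phi = 0$, $x_0 = 0$, the $20 \times 20$ sampling grid of part 3, and a starting guess close to the true values; the recruitment function repeats the one of part 2 so that the block runs on its own. Starting much farther away may require a coarse search first, since the cost functional need not be convex.

```python
import numpy as np
from scipy.optimize import minimize

ALPHA, BETA = 15.0, 1.0 / 15.0

def recruitment(t):
    """Recruitment pulse R(t) of part 2 (alpha = 15, beta = 1/alpha)."""
    a, b = ALPHA, BETA
    rise = 0.75 * ((a * t - 1) - (a * t - 1) ** 3 / 3 + 2 / 3)
    s = t - 2 / a - b
    fall = -0.75 * ((a * s - 1) - (a * s - 1) ** 3 / 3 - 2 / 3)
    return np.select([(t >= 0) & (t <= 2 / a), (t > 2 / a) & (t <= 2 / a + b),
                      (t > 2 / a + b) & (t <= 4 / a + b)],
                     [rise, np.ones_like(t), fall], default=0.0)

def u_c(T, X, g, mu):
    """Characteristic solution with Phi = 0 and x0 = 0 (recruitment driven part only)."""
    return recruitment(T - X / g) / g * np.exp(-mu * X / g)

# Sampling grid of part 3: t_i = i h_t, x_j = j h_x with N_t = N_x = 20
Nt = Nx = 20
t = 5.0 / Nt * np.arange(1, Nt + 1)
x = 0.5 / Nx * np.arange(1, Nx + 1)
T, X = np.meshgrid(t, x, indexing="ij")
u_s = u_c(T, X, 0.185, 1.9)            # simulated data from the true g0, mu0

def J(params, data):
    g, mu = params
    if g <= 0:
        return np.inf                   # keep the search in the admissible region
    return np.sum((u_c(T, X, g, mu) - data) ** 2)

def fit(data, guess=(0.19, 1.8)):
    res = minimize(J, guess, args=(data,), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-14, "maxiter": 10000})
    return res.x, res.fun

def add_noise(data, nl, seed=0):
    """Part 4: add nl times standard normal noise to every sample."""
    rng = np.random.default_rng(seed)
    return data + nl * rng.standard_normal(data.shape)

(g_star, mu_star), cost = fit(u_s)
print("noise-free:", g_star, mu_star, cost)
for nl in (0.01, 0.02, 0.05, 0.1, 0.2):
    (g_n, mu_n), cost_n = fit(add_noise(u_s, nl))
    print(f"nl = {nl}: g = {g_n:.4f}, mu = {mu_n:.4f}, J = {cost_n:.3e}")
```

The fixed seed makes the noisy runs reproducible; repeating each fit over several seeds gives a rough picture of how the estimates scatter at each noise level.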

Project: Distributed Growth Rate Population Model

Consider the following Sinko-Streifer model for a size-structured mosquitofish population:

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}(gu) = -\mu u, \qquad t \in [0, t_f],\ x \in [x_0, x_1],$$

$$g(t,x)u(t,x)\big|_{x=x_0} = R(t).$$

Use the following initial and terminal size classes, initial density and recruitment rate throughout:

• $x_0 = 0$, $x_1 = 1$,

• $\Phi(x) = \begin{cases} \sin^2(10\pi x), & 0 \le x \le 0.1, \\ 0, & 0.1 < x \le 1, \end{cases}$

• $R(t) = 0$.

1. In the case of a size-dependent growth rate and constant death rate (i.e., $g(t,x) = g(x)$ and $\mu(t,x) = \mu_0$), derive the exact solution $u(t,x)$ using the method of characteristics. Consider the case $g(x) = b(1-x)$, where $b$ is the intrinsic growth rate of the mosquitofish. Furthermore, assume $b = 4.5$ and consider three different cases: $\mu_0 = 0$, $\mu_0 = 4.5$, and $\mu_0 = 7.5$.

(a) Plot (3D plot) the exact solution $u(t,x)$ as a function of $t \in [0, 0.5]$ and $x \in [0, 1]$.

(b) Plot (2D plot) the exact solution $u(t_i,x)$ versus $x$ for several different moments in time $t_i$.

Do this for all three values of $\mu_0$.

2. As discussed previously, assuming that individuals of the same size grow at the same rate is biologically unreasonable. Indeed, one would expect individuals in a population to have intrinsic parameters that affect their growth rates. Included among these parameters is the intrinsic growth rate $b$ of the mosquitofish in the above example. Consider populations with a Gaussian distribution on $b$ with mean 4.5 and variance 0.0816, i.e., $b \sim \mathcal{N}(\bar b, \sigma^2)$ with $\bar b = 4.5$ and $\sigma^2 = 0.0816$, and again carry out the computations below for each of the three values of $\mu_0$ given above.

(a) Plot (3D plot) the exact solution $u(t,x)$ as a function of $t \in [0, 0.5]$ and $x \in [0, 1]$.

(b) Plot (2D plots) the exact solution $u(t_i,x)$ versus $x \in [0, 1]$ for several different moments in time $t_i$.

(c) Do these solutions differ from those in part 1? If so, how?

3. Also as discussed previously, the mosquitofish population data appear to consist of at least two subclasses with differing growth characteristics. Thus, in order to simulate a similar population, consider the case where $g_1(x) = b_1(1-x)$ and $g_2(x) = b_2(1-x)$ for the two distinct subclasses.

(a) Assume a bi-Gaussian distribution on $b$ with an overall mean of 4.5, with subpopulation means $b_1 = 3.3$ and $b_2 = 5.7$ and variance $\sigma^2 = 0.492$ each. As above, carry out the following for each of the three values of $\mu_0$ given in part 1.

i. Plot the exact solution $u(t,x)$ as a function of $t \in [0, 0.5]$ and $x \in [0, 1]$.

ii. Plot the exact solution $u(t_i,x)$ versus $x$ for several different moments in time $t_i$.

iii. Do these solutions differ from those in questions 1 and 2? If so, how?

(b) In the previous questions, we assumed a constant maximum size with no dependence on the intrinsic growth rates. However, it is more reasonable to assume that the maximum size depends on the intrinsic growth rate of the mosquitofish. Letting $S$ denote the maximum size, we let $S(b) = 0.6 + \frac{4}{11}(b - 3.3)$. Then consider a bi-Gaussian distribution on $b$ with overall mean $\bar b = 4.4$, subpopulation means $b_1 = 3.8$ and $b_2 = 5.0$, and variance $\sigma^2 = 0.123$ each. Again do this for all three values of $\mu_0$.


i. Plot the exact solution $u(t,x)$ as a function of $t \in [0, 0.5]$ and $x$.

ii. Plot the exact solution $u(t_i,x)$ versus $x$ for several different moments in time $t_i$.

iii. Discuss the results (e.g., compare with the results of 1, 2, and 3(a)).
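The exact solution asked for in part 1 follows from the size-dependent example: for $g(x) = b(1-x)$ and $x_0 = 0$, $G(x) = -\ln(1-x)/b$, the characteristic through $(t,x)$ starts at $\eta = 1 - (1-x)e^{bt}$, and $g_x + \mu = \mu_0 - b$, so $u(t,x) = \Phi(\eta)e^{(b-\mu_0)t}$ for $\eta \ge 0$ and $u = 0$ otherwise (because $R = 0$). The sketch below codes this and then approximates (9.25) for parts 2 and 3(a) by a normalized weighted sum over a grid of $b$ values, i.e., a discrete version of (9.24). The $b$-grid ranges and the equal weighting of the two subpopulations in the bi-Gaussian case are assumptions of the sketch, and part 3(b) is not covered.

```python
import numpy as np

def phi(x):
    """Initial density: sin^2(10 pi x) on [0, 0.1], zero elsewhere."""
    return np.where((x >= 0.0) & (x <= 0.1), np.sin(10 * np.pi * x) ** 2, 0.0)

def u_exact(t, x, b, mu0):
    """Exact solution for g(x) = b (1 - x), constant mu0, x0 = 0, R = 0."""
    eta = 1.0 - (1.0 - x) * np.exp(b * t)   # size at time 0 on the characteristic
    # eta < 0 marks the recruitment region, where u = 0 because R = 0;
    # phi already returns 0 there.
    return phi(eta) * np.exp((b - mu0) * t)

def gaussian(mean, var):
    # Unnormalized Gaussian density; weights are normalized in v_distributed
    return lambda b: np.exp(-(b - mean) ** 2 / (2 * var))

def v_distributed(t, x, mu0, b_grid, density):
    """Discrete approximation of v(t, x) = integral of u(t, x; b) p(b) db."""
    w = density(b_grid)
    w = w / w.sum()                           # proportions p_i summing to 1, as in (9.24)
    return sum(wi * u_exact(t, x, bi, mu0) for bi, wi in zip(b_grid, w))

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]

# Part 2: b ~ N(4.5, 0.0816)
b_grid = np.linspace(3.0, 6.0, 121)
v2 = v_distributed(0.2, x, 0.0, b_grid, gaussian(4.5, 0.0816))

# Part 3(a): mixture of N(3.3, 0.492) and N(5.7, 0.492) with equal weights (assumed)
bi_gauss = lambda b: gaussian(3.3, 0.492)(b) + gaussian(5.7, 0.492)(b)
b_grid3 = np.linspace(0.5, 9.0, 341)
v3 = v_distributed(0.2, x, 0.0, b_grid3, bi_gauss)

# With mu0 = 0 and R = 0 the total population is conserved (initial mass 0.05)
print(dx * v2.sum(), dx * v3.sum())
```

The arrays `u_exact(t, x, 4.5, mu0)`, `v2` and `v3`, evaluated on a grid of times, are what the requested 2D and 3D plots would display; a plotting library such as Matplotlib can be used for that step.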

References

[1] H.T. Banks, J.E. Banks, L.K. Dick and J.D. Stark, Estimation of dynamic rate parameters in insect populations undergoing sublethal exposure to pesticides, CRSC-TR05-22, NCSU, May, 2005; Bull. Math. Biol., 69 (2007), pp. 2139–2180.

[2] H.T. Banks, J.E. Banks, L.K. Dick and J.D. Stark, Time-varying vital rates in ecotoxicology: selective pesticides and aphid population dynamics, Ecological Modelling, 210, 2008, pp. 155–160.

[3] H.T. Banks, V.A. Bokil, S. Hu, F.C.T. Allnutt, R. Bullis, A.K. Dhar and C.L. Browdy, Shrimp biomass and viral infection for production of biological countermeasures, CRSC-TR05-45, NCSU, December, 2005; Mathematical Biosciences and Engineering, 3, 2006, pp. 635–660.

[4] H.T. Banks, D.M. Bortz, G.A. Pinter and L.K. Potter, Modeling and imaging techniques with potential for application in bioterrorism, CRSC-TR03-02, NCSU, January, 2003; Chapter 6 in Bioterrorism: Mathematical Modeling Applications in Homeland Security (H.T. Banks and C. Castillo-Chavez, eds.), Frontiers in Applied Math, FR28, SIAM, Philadelphia, PA, 2003, pp. 129–154.

[5] H.T. Banks and J.L. Davis, A comparison of approximation methods for the estimation of probability distributions on parameters, CRSC-TR05-38, NCSU, October, 2005; Applied Numerical Mathematics, 57, 2007, pp. 753–777.

[6] H.T. Banks and J.L. Davis, Quantifying uncertainty in the estimation of probability distributions with confidence bands, CRSC-TR07-21, NCSU, December, 2007; Math. Biosci. Engr., 5, 2008, pp. 647–667.

[7] H.T. Banks, J.L. Davis, S.L. Ernstberger, S. Hu, E. Artimovich, A.K. Dhar and C.L. Browdy, Comparison of probabilistic and stochastic formulations in modeling growth uncertainty and variability, CRSC-TR08-03, NCSU, February, 2008; J. Biological Dynamics, to appear.

[8] H.T. Banks, L.W. Botsford, F. Kappel and C. Wang, Estimation of growth and survival in size-structured cohort data: An application to larval striped bass (Morone saxatilis), J. Math. Biol., 30, 1991, pp. 125–150.


[9] H.T. Banks, L.W. Botsford, F. Kappel and C. Wang, Modeling and estimation in size-structured population models, in Proceedings 2nd Course on Mathematical Ecology, Trieste, 1986, World Press, Singapore, 1988, pp. 521–541.

[10] H.T. Banks and B.G. Fitzpatrick, Estimation of growth rate distributions in size-structured population models, Quart. Appl. Math., 49, 1991, pp. 215–235.

[11] H.T. Banks, B.G. Fitzpatrick, L.K. Potter and Y. Zhang, Estimation of probability distributions for individual parameters using aggregate population observations, in Stochastic Analysis, Control, Optimization and Applications (W. McEneaney, G. Yin, Q. Zhang, eds.), Birkhäuser, 1998, pp. 353–371.

[12] L.W. Botsford, B. Vondracek, T. Wainwright, A. Linden, R. Kope, D. Reed and J.J. Cech, Population development of the mosquitofish, Gambusia affinis, in rice fields, Environ. Biol. Fishes, 20, 1987, pp. 143–154.

[13] M. Braun, Differential Equations and Their Applications: An Introduction to Applied Mathematics, Springer, Berlin, 4th ed., 1992.

[15] P.A. Hackney and J.C. Webb, A method for determining growth and mortality rates of ichthyoplankton, in Proc. Fourth National Workshop on Entrainment and Impingement (L.D. Jenson, ed.), Ecological Analysts Inc., Melville, New York, 1978, pp. 115–124.

[17] M. Kot, Elements of Mathematical Ecology, Cambridge University Press, Cambridge, 2001.

[18] J.A.J. Metz and O. Diekmann (eds.), The Dynamics of Physiologically Structured Populations, Lecture Notes in Biomathematics, 68, Springer, 1986.

[19] R. Pearl and L.J. Reed, On the rate of growth of the population of the United States since 1790 and its mathematical representation, Proceedings of the National Academy of Sciences of the United States of America, 6, pp. 275–288.

[20] J.W. Sinko and W. Streifer, A new model for age-size structure of a population, Ecology, 48, 1967, pp. 910–918.

[21] P.F. Verhulst, Recherches mathématiques sur la loi d'accroissement de la population, Nouveaux Mémoires de l'Académie Royale des Sciences et Belles-Lettres de Bruxelles, 18, pp. 3–38.


[22] V. Volterra, Fluctuations in the abundance of a species considered mathematically, Nature, 118, pp. 558–560.

Appendix A

An Introduction to Fourier Techniques

In this appendix we review the basic tools and techniques from linear system theory used in the analysis of periodic and non-periodic waveforms. In particular, the development of Fourier methods has had a major impact on the analysis of linear systems: it allows complex waveforms to be analyzed by considering their sinusoidal components (see, for instance, [3, 4]).

A.1 Fourier Series

Functions that are periodic with finite energy within each period can be represented by a Fourier series. That is, a real function $x(t)$ of period $2\pi$ can be represented (in the mean-square sense) as an infinite sum of sine and cosine components of increasing harmonic order:

$$x(t) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\left(a_n\cos(nt) + b_n\sin(nt)\right). \tag{A.1}$$

Here, the terms $\cos(t)$ and $\sin(t)$ are called the fundamental terms. The higher-order terms $\cos(nt)$ and $\sin(nt)$, for integers $n > 1$, are called harmonic terms.

The calculation of the coefficients $a_n$ and $b_n$ is facilitated by the following orthogonality properties of sines, cosines, and their products.

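For any nonnegative integers $m$ and $n$ (with $m \neq 0$ in the second integral, whose value is $2\pi$ when $m = 0$),

$$\int_{-\pi}^{\pi}\sin mt\,dt = 0, \qquad \int_{-\pi}^{\pi}\cos mt\,dt = 0, \qquad \int_{-\pi}^{\pi}\sin mt\cos nt\,dt = 0. \tag{A.2}$$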



For nonnegative integers $m \neq n$,

$$\int_{-\pi}^{\pi}\sin mt\sin nt\,dt = 0, \qquad \int_{-\pi}^{\pi}\cos mt\cos nt\,dt = 0. \tag{A.3}$$

For positive integers $m = n$,

$$\int_{-\pi}^{\pi}(\sin mt)^2\,dt = \pi, \qquad \int_{-\pi}^{\pi}(\cos mt)^2\,dt = \pi. \tag{A.4}$$
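As a quick numerical sanity check of (A.2)–(A.4), the sketch below (our own illustration, not taken from the text) approximates each integral with the equally spaced rectangle rule, which is exact up to rounding for trigonometric polynomials of low degree. The helper names `integrate_periodic` and `check_orthogonality` and the grid size are illustrative assumptions.

```python
import math


def integrate_periodic(g, n_points=64):
    """Approximate the integral of a 2*pi-periodic function g over [-pi, pi].

    The equally spaced rectangle rule is exact (up to rounding) for
    trigonometric polynomials whose degree is less than n_points.
    """
    h = 2 * math.pi / n_points
    return h * sum(g(-math.pi + k * h) for k in range(n_points))


def check_orthogonality(max_order=5, tol=1e-9):
    """Check (A.2)-(A.4) for all nonnegative integers m, n <= max_order."""
    for m in range(max_order + 1):
        for n in range(max_order + 1):
            # (A.2)
            assert abs(integrate_periodic(lambda t: math.sin(m * t))) < tol
            if m > 0:  # the cosine integral equals 2*pi when m = 0
                assert abs(integrate_periodic(lambda t: math.cos(m * t))) < tol
            assert abs(integrate_periodic(lambda t: math.sin(m * t) * math.cos(n * t))) < tol
            ss = integrate_periodic(lambda t: math.sin(m * t) * math.sin(n * t))
            cc = integrate_periodic(lambda t: math.cos(m * t) * math.cos(n * t))
            if m != n:  # (A.3)
                assert abs(ss) < tol and abs(cc) < tol
            elif m > 0:  # (A.4)
                assert abs(ss - math.pi) < tol and abs(cc - math.pi) < tol
    return True


print(check_orthogonality())  # True
```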

Using formulas (A.2)–(A.4), we obtain the coefficient $a_0$ by integrating both sides of equation (A.1) from $-\pi$ to $\pi$:

$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi}x(t)\,dt. \tag{A.5}$$

Similarly, multiplying both sides of equation (A.1) by $\cos nt$ and by $\sin nt$ and integrating from $-\pi$ to $\pi$, we obtain, respectively,

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi}x(t)\cos(nt)\,dt, \tag{A.6}$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi}x(t)\sin(nt)\,dt. \tag{A.7}$$

The type of Fourier series expressed by equation (A.1) is known as a trigonometric Fourier series and is used here for real, periodic functions. Another form of Fourier series, known as the exponential Fourier series, can be applied to both real-valued and complex-valued periodic functions $x(t)$. This form of Fourier series makes use of the identities

$$\cos t = \frac{e^{jt} + e^{-jt}}{2}, \tag{A.8}$$

$$\sin t = \frac{e^{jt} - e^{-jt}}{2j}, \tag{A.9}$$

where $j = \sqrt{-1}$.
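The coefficient formulas (A.5)–(A.7) can be tried on a concrete signal. The sketch below approximates the coefficients of an odd square wave ($+1$ on $(0, \pi)$, $-1$ on $(-\pi, 0)$), an example chosen here for illustration, for which $a_n = 0$ for all $n$, while $b_n = 4/(n\pi)$ for odd $n$ and $b_n = 0$ for even $n$. The midpoint rule and the names `square_wave` and `trig_coefficients` are our assumptions, not the text's.

```python
import math


def square_wave(t):
    """Odd square wave on (-pi, pi): +1 for t > 0, -1 for t < 0."""
    return 1.0 if t > 0 else -1.0


def trig_coefficients(x, n_max, n_points=20000):
    """Approximate a_n and b_n, n = 0..n_max, from (A.5)-(A.7) with the
    midpoint rule on [-pi, pi]; the midpoints avoid the jumps at 0 and +-pi."""
    h = 2 * math.pi / n_points
    ts = [-math.pi + (k + 0.5) * h for k in range(n_points)]
    xs = [x(t) for t in ts]
    a = [h / math.pi * sum(xv * math.cos(n * t) for xv, t in zip(xs, ts))
         for n in range(n_max + 1)]
    b = [h / math.pi * sum(xv * math.sin(n * t) for xv, t in zip(xs, ts))
         for n in range(n_max + 1)]
    return a, b


a, b = trig_coefficients(square_wave, 5)
for n in range(1, 6):
    exact = 4 / (n * math.pi) if n % 2 == 1 else 0.0
    print(n, round(a[n], 6), round(b[n], 6), round(exact, 6))
```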

In addition, for a periodic function with period $T_0$, the frequency $f_0 = 1/T_0$ is called the fundamental frequency. The higher frequency $f_n = nf_0$, for integers $n > 1$, is called the $n$th harmonic. The corresponding quantity $\omega_0 = 2\pi/T_0 = 2\pi f_0$ is called the fundamental radian frequency. Both $f$ and $\omega$ are used to denote frequency: when $f$ is used, frequency is measured in hertz (Hz); when $\omega$ is used, it is measured in radians per second.


To treat a general period $T_0$, we replace the $nt$ term in equation (A.1) with $2\pi nf_0t = 2\pi nt/T_0$ and use the identities (A.8) and (A.9); this lets us express $x(t)$ in exponential form as follows:

$$x(t) = \frac{1}{2}a_0 + \frac{1}{2}\sum_{n=1}^{\infty}\left[(a_n - jb_n)e^{j2\pi nf_0t} + (a_n + jb_n)e^{-j2\pi nf_0t}\right]. \tag{A.10}$$

The above expression can be rewritten in the following form

$$x(t) = \sum_{n=-\infty}^{\infty}c_n e^{j2\pi nf_0t}, \tag{A.11}$$

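where the complex coefficients $c_n$ are defined by

$$c_n = \begin{cases} \frac{1}{2}a_0, & n = 0,\\ \frac{1}{2}(a_n - jb_n), & n > 0,\\ \frac{1}{2}(a_{-n} + jb_{-n}), & n < 0, \end{cases}$$

and can be computed from the equation

$$c_n = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2}x(t)e^{-j2\pi nf_0t}\,dt.$$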

For $n \geq 1$, the complex Fourier coefficients can be written in polar form as

$$c_n = |c_n|e^{j\theta_n}, \tag{A.12}$$

$$c_{-n} = |c_n|e^{-j\theta_n}, \tag{A.13}$$

where

$$|c_n| = \frac{1}{2}\sqrt{a_n^2 + b_n^2}, \tag{A.14}$$

$$\theta_n = \tan^{-1}\left(-\frac{b_n}{a_n}\right), \tag{A.15}$$

with the quadrant of $\theta_n$ determined by the signs of $a_n$ and $-b_n$; moreover $b_0 = 0$ and $c_0 = a_0/2$. Plots of $|c_n|$ and $\theta_n$ versus $n$ or $nf_0$ are called the discrete spectra of $x(t)$. The plot of $|c_n|$ is usually called the magnitude spectrum, and the plot of $\theta_n$ is referred to as the phase spectrum. Furthermore, if $x(t)$ is a real-valued periodic function, so that $a_n$ and $b_n$ are real, we have

$$c_{-n} = c_n^*,$$

which implies that the magnitude spectrum is an even function of frequency (since $|c_n| = |c_{-n}|$). Similarly, from equation (A.13), $\theta_{-n} = -\theta_n$, so the phase spectrum is an odd function of frequency.
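These relations between $c_n$ and the trigonometric coefficients can be checked numerically. The sketch below (our illustration; the period $T_0 = 2$, the test signal, and all function names are assumptions) evaluates the integral formula for $c_n$, compares it with $(a_n - jb_n)/2$, confirms $c_{-n} = c_n^*$ for a real signal, and evaluates (A.14)–(A.15). It uses `atan2` rather than a plain arctangent so that $\theta_n$ lands in the correct quadrant.

```python
import cmath
import math

T0 = 2.0        # illustrative period (an assumption)
F0 = 1.0 / T0   # fundamental frequency f0


def signal(t):
    """Test signal with a0/2 = 1, a1 = 3, b1 = 4, a2 = 2 (chosen for illustration)."""
    w = 2 * math.pi * F0
    return 1 + 3 * math.cos(w * t) + 4 * math.sin(w * t) + 2 * math.cos(2 * w * t)


def complex_coefficient(x, n, period=T0, n_points=256):
    """Approximate c_n = (1/T0) * integral over [-T0/2, T0/2] of
    x(t) exp(-j 2 pi n f0 t) dt with the equally spaced rectangle rule,
    which is exact (up to rounding) for this band-limited periodic signal."""
    f0 = 1.0 / period
    h = period / n_points
    total = 0j
    for k in range(n_points):
        t = -period / 2 + k * h
        total += x(t) * cmath.exp(-2j * math.pi * n * f0 * t)
    return total * h / period


def magnitude_phase(a_n, b_n):
    """|c_n| and theta_n from (A.14)-(A.15); atan2 picks the correct quadrant."""
    return 0.5 * math.hypot(a_n, b_n), math.atan2(-b_n, a_n)


c1 = complex_coefficient(signal, 1)
c_minus1 = complex_coefficient(signal, -1)
print(c1, c_minus1)               # approximately (1.5-2j) and (1.5+2j)
print(magnitude_phase(3.0, 4.0))  # |c_1| = 2.5 and theta_1 = atan2(-4, 3)
```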


A.2 Fourier Transforms

The Fourier series is very useful in characterizing periodic functions or waveforms. For nonperiodic signals, however, a frequency-domain approach based on the Fourier transform provides a more convenient representation.

Let $x(t)$ be an absolutely integrable function, that is,

$$\int_{-\infty}^{\infty}|x(t)|\,dt < \infty.$$

The Fourier transform of $x(t)$, denoted by $X(f)$ or, equivalently, $\mathcal{F}[x(t)]$, is defined by

$$\mathcal{F}[x(t)] = X(f) = \int_{-\infty}^{\infty}x(t)e^{-j2\pi ft}\,dt. \tag{A.16}$$

The inverse Fourier transform of $X(f)$ is $x(t)$ and is given by

$$\mathcal{F}^{-1}[X(f)] = x(t) = \int_{-\infty}^{\infty}X(f)e^{j2\pi ft}\,df.$$
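Definition (A.16) can be approximated numerically for a rapidly decaying function. The sketch below (our illustration; the truncation interval, the grid size, and the function names are assumptions) uses the Gaussian $x(t) = e^{-\pi t^2}$, a standard test case because, under the convention of (A.16), its Fourier transform is $X(f) = e^{-\pi f^2}$.

```python
import cmath
import math


def fourier_transform(x, f, t_max=8.0, n_points=4000):
    """Approximate X(f) = integral of x(t) exp(-j 2 pi f t) dt, as in (A.16),
    by the midpoint rule on [-t_max, t_max]. Assumes x(t) is negligible
    outside that interval."""
    h = 2 * t_max / n_points
    total = 0j
    for k in range(n_points):
        t = -t_max + (k + 0.5) * h
        total += x(t) * cmath.exp(-2j * math.pi * f * t)
    return total * h


def gaussian(t):
    return math.exp(-math.pi * t * t)


for f in (0.0, 0.5, 1.0):
    approx = fourier_transform(gaussian, f)
    print(f, approx.real, math.exp(-math.pi * f * f))
```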

Note that the Fourier transform depends on the values of $x(t)$ over the entire real line $(-\infty, \infty)$; consequently, initial conditions play no special role. This differs from the Laplace transform, defined by

$$\mathcal{L}[x(t)] = \int_{0}^{\infty}e^{-st}x(t)\,dt,$$

which involves initial conditions in a nontrivial way; for example, the derivative rule $\mathcal{L}[x'(t)] = s\,\mathcal{L}[x(t)] - x(0)$ contains the initial value $x(0)$ explicitly.

We now summarize the most important properties of the Fourier transform.

1. Differentiation: Differentiation in the time domain corresponds to multiplication by $j2\pi f$ in the frequency domain. That is,

$$\mathcal{F}[x'(t)] = j2\pi f X(f), \qquad \mathcal{F}[x^{(n)}(t)] = (j2\pi f)^n X(f).$$

2. Modulation: Multiplication by $e^{j2\pi f_0t}$ in the time domain is equivalent to a frequency shift in the frequency domain:

$$\mathcal{F}[e^{j2\pi f_0t}x(t)] = X(f - f_0).$$

3. Time shift: A time shift in the time domain results in a phase shift in the frequency domain:

$$\mathcal{F}[x(t - t_0)] = e^{-j2\pi ft_0}X(f).$$


4. Duality: $\mathcal{F}[X(t)] = x(-f)$.

5. Hermitian symmetry: If $x(t)$ is a real-valued function, then

$$X(-f) = X^*(f).$$

6. Convolution: Convolution in the time domain corresponds to multiplication in the frequency domain, and vice versa. That is,

$$\mathcal{F}[x(t) * y(t)] = X(f)Y(f), \qquad \mathcal{F}[x(t)y(t)] = X(f) * Y(f),$$

where $Y(f) = \mathcal{F}[y(t)]$ and $x(t) * y(t) = \int_{-\infty}^{\infty}x(t - \tau)y(\tau)\,d\tau$.

7. Parseval’s identity:

$$\int_{-\infty}^{\infty}x(t)y^*(t)\,dt = \int_{-\infty}^{\infty}X(f)Y^*(f)\,df.$$

In the case that $y(t) = x(t)$, we obtain Rayleigh’s relation

$$\int_{-\infty}^{\infty}|x(t)|^2\,dt = \int_{-\infty}^{\infty}|X(f)|^2\,df.$$
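The continuous properties cannot be checked directly on a computer, but their discrete analogues for the discrete Fourier transform (DFT) of $N$ samples can. This sketch uses a naive DFT and circular convolution, which are our choice of discrete stand-ins rather than anything from the text, and checks the convolution property, Hermitian symmetry, and Parseval/Rayleigh (which picks up a factor $1/N$ in the DFT normalization). The test data and function names are illustrative assumptions.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
            for k in range(size)]


def circular_convolution(x, y):
    """Periodic (circular) analogue of x(t) * y(t): sum_m x[(n - m) mod N] y[m]."""
    size = len(x)
    return [sum(x[(n - m) % size] * y[m] for m in range(size)) for n in range(size)]


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]   # arbitrary real test data
y = [0.5, -1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0]
X, Y = dft(x), dft(y)
N = len(x)

# Convolution property: DFT(x conv y) = X * Y, pointwise.
Z = dft(circular_convolution(x, y))
conv_err = max(abs(Z[k] - X[k] * Y[k]) for k in range(N))

# Hermitian symmetry for real data: X[N - k] = conj(X[k]).
herm_err = max(abs(X[(N - k) % N] - X[k].conjugate()) for k in range(N))

# Parseval/Rayleigh: sum |x|^2 = (1/N) sum |X|^2 in this DFT normalization.
parseval_err = abs(sum(v * v for v in x) - sum(abs(v) ** 2 for v in X) / N)

print(conv_err, herm_err, parseval_err)  # each is near zero (rounding error only)
```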

As mentioned above, the Fourier transform extends the Fourier series concept to nonperiodic functions. It can nevertheless also be applied to a periodic function $x(t)$ with period $T_0$, provided impulses (Dirac delta functions) are admitted in the transform. Such a function has the exponential Fourier series expansion

$$x(t) = \sum_{n=-\infty}^{\infty}c_n e^{j2\pi nt/T_0}. \tag{A.17}$$

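Now, applying the Fourier transform to both sides of the above equation and using $\mathcal{F}[e^{j2\pi f_0t}] = \delta(f - f_0)$, which follows from the modulation property with $x(t) = 1$ and $\mathcal{F}[1] = \delta(f)$, we obtain

$$X(f) = \mathcal{F}\left[\sum_{n=-\infty}^{\infty}c_n e^{j2\pi nt/T_0}\right] = \sum_{n=-\infty}^{\infty}c_n\,\mathcal{F}\left[e^{j2\pi nt/T_0}\right] = \sum_{n=-\infty}^{\infty}c_n\,\delta\left(f - \frac{n}{T_0}\right).$$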

In other words, the Fourier transform of a periodic function consists of impulses located at multiples of the fundamental frequency (the harmonics) of the original signal, with strengths equal to the Fourier coefficients $c_n$. Because this result connects the Fourier series and the Fourier transform, it is used widely in signal processing.
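A sampled counterpart of this result can be observed with the discrete Fourier transform (DFT). In the sketch below (our illustration; the signal, $N = 32$, and the naive `dft` helper are assumptions), one period of a band-limited periodic signal is sampled. The DFT is then nonzero only at the harmonic indices and their mirror images, and $X[k]/N$ reproduces $c_n$ as long as all harmonics stay below $N/2$, so that no aliasing occurs.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
            for k in range(size)]


N = 32  # samples per period (illustrative)
# One period of x(t) = 2 + cos(3 w0 t) + 0.5 sin(5 w0 t), sampled at t = k T0 / N.
samples = [2.0 + math.cos(2 * math.pi * 3 * k / N) + 0.5 * math.sin(2 * math.pi * 5 * k / N)
           for k in range(N)]
X = dft(samples)

# Indices where the spectrum is numerically nonzero: the harmonics 0, 3, 5
# and their mirror images N - 3, N - 5 (the negative frequencies).
peaks = [k for k in range(N) if abs(X[k]) > 1e-9]
print(peaks)                          # [0, 3, 5, 27, 29]
print(X[0] / N, X[3] / N, X[5] / N)   # approximately c_0 = 2, c_3 = 0.5, c_5 = -0.25j
```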

Appendix B

Review of Vector Calculus

This appendix is devoted to a brief presentation of the essential aspects of vector calculus. In particular, we review results that relate line integrals to double integrals and triple integrals to surface integrals. The corresponding theorems of Green, Stokes, and Gauss are of great importance in many physical and engineering problems and are presented here without proofs. For an in-depth treatment of this subject, we refer the interested reader to the many excellent textbooks on the subject (see, e.g., [1, 2, 4]).

We begin by recalling some identities involving the differential operators grad, div, and curl and their combinations. These identities arise frequently in both theory and applications.

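1. Curl of a gradient: For a scalar function $f$ with continuous second partial derivatives, one has

$$\operatorname{curl}\operatorname{grad} f = \vec{0},$$

or $\nabla \times (\nabla f) = \vec{0}$. Conversely, if $\operatorname{curl}\vec{u} = \vec{0}$ in a simply connected domain, then $\vec{u} = \operatorname{grad} f$ for some scalar function $f$ (see Theorem B.5 below). A vector field $\vec{u}$ such that $\operatorname{curl}\vec{u} = \vec{0}$ is said to be irrotational.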

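2. Divergence of a curl: One also has the rule

$$\operatorname{div}\operatorname{curl}\vec{u} = 0.$$

Similarly, there is an important converse: if $\operatorname{div}\vec{u} = 0$ in a suitable domain (for example, all of $\mathbb{R}^3$), then $\vec{u} = \operatorname{curl}\vec{v}$ for some vector field $\vec{v}$. A vector field $\vec{u}$ such that $\operatorname{div}\vec{u} = 0$ is often termed solenoidal.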

3. Divergence of a gradient: Here one has the expression

$$\operatorname{div}\operatorname{grad} f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$

The expression on the right side of the above equation is known as the Laplacian of $f$ and is denoted by $\Delta f$ or $\nabla^2 f$. A function $f$ with continuous second partial derivatives such that $\nabla^2 f = 0$ in a domain is called harmonic in that domain, and the equation $\nabla^2 f = 0$ is called Laplace’s equation.



4. Divergence of a vector product:

$$\operatorname{div}(\vec{u} \times \vec{v}) = \vec{v} \cdot \operatorname{curl}\vec{u} - \vec{u} \cdot \operatorname{curl}\vec{v}.$$

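5. Curl of a vector product:

$$\operatorname{curl}(\vec{u} \times \vec{v}) = (\operatorname{div}\vec{v})\vec{u} - (\operatorname{div}\vec{u})\vec{v} + (\vec{v} \cdot \nabla)\vec{u} - (\vec{u} \cdot \nabla)\vec{v}.$$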

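6. Curl of a curl: Another important property is the relation

$$\operatorname{curl}\operatorname{curl}\vec{u} = \operatorname{grad}\operatorname{div}\vec{u} - \nabla^2\vec{u},$$

where the Laplacian of a vector field $\vec{u} = (u_x, u_y, u_z)$ is defined to be

$$\nabla^2\vec{u} = \nabla^2 u_x\,\vec{i} + \nabla^2 u_y\,\vec{j} + \nabla^2 u_z\,\vec{k}.$$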
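Identities 1 and 2 can be spot-checked numerically. The sketch below (our illustration; the scalar field $f$, the vector field $\vec{u}$, the evaluation point, and the step size are all assumptions) builds grad, curl, and div from central differences. Because central-difference operators along different axes commute, $\operatorname{curl}\operatorname{grad} f$ and $\operatorname{div}\operatorname{curl}\vec{u}$ vanish up to rounding, mirroring the equality of mixed partial derivatives behind the exact identities.

```python
import math

STEP = 1e-3  # finite-difference step size (an illustrative choice)


def partial(func, point, axis, h=STEP):
    """Central-difference approximation of the partial derivative of a scalar
    function func along coordinate `axis` (0, 1, 2 for x, y, z) at `point`."""
    forward, backward = list(point), list(point)
    forward[axis] += h
    backward[axis] -= h
    return (func(forward) - func(backward)) / (2 * h)


def gradient(f):
    """grad f as a function returning [f_x, f_y, f_z]."""
    return lambda p: [partial(f, p, i) for i in range(3)]


def component(u, i):
    """Scalar function giving the i-th component of the vector field u."""
    return lambda p: u(p)[i]


def curl(u):
    """curl u = (w_y - v_z, u_z - w_x, v_x - u_y) for u = (u, v, w)."""
    ux, uy, uz = component(u, 0), component(u, 1), component(u, 2)
    return lambda p: [partial(uz, p, 1) - partial(uy, p, 2),
                      partial(ux, p, 2) - partial(uz, p, 0),
                      partial(uy, p, 0) - partial(ux, p, 1)]


def divergence(u):
    return lambda p: sum(partial(component(u, i), p, i) for i in range(3))


def f(p):
    x, y, z = p
    return x * x * y + math.sin(y * z) + math.exp(x * z)


def u(p):
    x, y, z = p
    return [y * z * z, x * math.sin(z), math.exp(x) * y]


POINT = [0.3, -0.7, 0.5]
print(curl(gradient(f))(POINT))    # each component near zero
print(divergence(curl(u))(POINT))  # near zero
```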

We now review some important topics of vector integral calculus. First, some definitions are in order. A curve $C$ in space is said to be smooth if it has a parametric representation

$$\vec{r}(s) = x(s)\,\vec{i} + y(s)\,\vec{j} + z(s)\,\vec{k}, \qquad a \le s \le b,$$

where $x$, $y$, and $z$ are continuous and have continuous derivatives for $a \le s \le b$. We assign a direction to $C$ by choosing one of the two directions along $C$ to be the positive direction (usually that of increasing $s$). Let $f(x, y, z)$, $g(x, y, z)$, and $h(x, y, z)$ be functions that are defined and continuous in a domain $D$ of $\mathbb{R}^3$. Then the line integral

$$\int_C f\,dx + g\,dy + h\,dz \tag{B.1}$$

is said to be independent of path in $D$ if, for every pair of endpoints $A$ and $B$ in $D$, the value of the line integral (B.1) is the same for all paths $C$ from $A$ to $B$ in $D$. In other words, the value of the line integral depends only on the endpoints $A$ and $B$, not on the choice of the path joining them.
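Path independence can be illustrated numerically. The sketch below (our own example; the potential $\Phi = xyz + x^2$, the field $(-y, x, 0)$, the two paths, and all function names are assumptions) approximates (B.1) along a straight path and a twisted path joining $A = (0, 0, 0)$ to $B = (1, 1, 1)$. For the gradient field both values agree with $\Phi(B) - \Phi(A) = 2$, while for the rotating field they differ.

```python
def line_integral(field, path, velocity, n=2000):
    """Midpoint-rule approximation of the line integral (B.1):
    integral over 0 <= s <= 1 of field(r(s)) . r'(s) ds."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += sum(fc * vc for fc, vc in zip(field(path(s)), velocity(s)))
    return total * h


def gradient_field(p):
    """(f, g, h) = grad Phi for the illustrative potential Phi = x*y*z + x**2."""
    x, y, z = p
    return (y * z + 2 * x, x * z, x * y)


def rotating_field(p):
    """(f, g, h) = (-y, x, 0), which is not a gradient field."""
    x, y, z = p
    return (-y, x, 0.0)


# Two paths from A = (0, 0, 0) to B = (1, 1, 1) and their derivatives r'(s).
def straight(s):
    return (s, s, s)


def straight_velocity(s):
    return (1.0, 1.0, 1.0)


def twisted(s):
    return (s, s ** 2, s ** 3)


def twisted_velocity(s):
    return (1.0, 2 * s, 3 * s ** 2)


print(line_integral(gradient_field, straight, straight_velocity),
      line_integral(gradient_field, twisted, twisted_velocity))   # both close to 2
print(line_integral(rotating_field, straight, straight_velocity),
      line_integral(rotating_field, twisted, twisted_velocity))   # 0 and about 1/3
```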

The following theorem, which states that a double integral over a plane region can be replaced by a line integral over the boundary of the region, is known as Green’s theorem and is fundamental in the theory of line integrals.

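THEOREM B.1
Let $R$ denote a closed bounded region in the $xy$-plane whose boundary $C$ consists of finitely many smooth curves, oriented so that $R$ lies to the left as $C$ is traversed. Let $P(x, y)$ and $Q(x, y)$ be functions that are continuous and have continuous first partial derivatives in $R$. Then

$$\oint_C P\,dx + Q\,dy = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dx\,dy. \tag{B.2}$$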
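Green’s theorem can be checked on a simple case. In the sketch below (our illustration; the choice $P = -x^2y$, $Q = xy^2$, the unit disk, and the quadrature sizes are assumptions), both sides of (B.2) equal $\pi/2$, because $\partial Q/\partial x - \partial P/\partial y = x^2 + y^2$ and the integral of $x^2 + y^2$ over the unit disk is $\pi/2$. The boundary is traversed counterclockwise, so the disk lies to the left.

```python
import math


def P(x, y):
    return -x * x * y


def Q(x, y):
    return x * y * y


def integrand(x, y):
    """dQ/dx - dP/dy, worked out by hand for this P and Q: y**2 + x**2."""
    return y * y + x * x


def boundary_integral(n=512):
    """Counterclockwise line integral of P dx + Q dy over the unit circle."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)  # derivative of the parametrization
        total += P(x, y) * dx + Q(x, y) * dy
    return total * h


def area_integral(n_r=400, n_t=256):
    """Double integral over the unit disk in polar coordinates (dx dy = r dr dt)."""
    hr, ht = 1.0 / n_r, 2 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for k in range(n_t):
            t = k * ht
            total += integrand(r * math.cos(t), r * math.sin(t)) * r
    return total * hr * ht


print(boundary_integral(), area_integral(), math.pi / 2)
```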


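If the line integral $\int_C P\,dx + Q\,dy$ is independent of path in $R$, then its value around every closed curve in $R$ is zero; for the boundary $C$ of $R$, (B.2) then gives

$$\oint_C P\,dx + Q\,dy = 0 = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dx\,dy. \tag{B.3}$$

Since the same reasoning applies to every subregion of $R$ bounded by a closed curve, and the integrand is continuous, it follows that $\partial Q/\partial x = \partial P/\partial y$ throughout $R$. Moreover, path independence allows us to define a function $\varphi(x, y)$ in $R$, namely the line integral from a fixed point $(x_0, y_0)$ to $(x, y)$, such that

$$\frac{\partial \varphi}{\partial x} = P(x, y), \qquad \frac{\partial \varphi}{\partial y} = Q(x, y). \tag{B.4}$$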

The converse, namely that $\partial Q/\partial x = \partial P/\partial y$ implies independence of path, does not hold without a further restriction: the domain $R$ must be simply connected [4]. In plain terms, a domain is simply connected if it has no “holes.”
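A standard counterexample (a well-known textbook example, not taken from this appendix) shows why simple connectivity matters: $P = -y/(x^2 + y^2)$ and $Q = x/(x^2 + y^2)$ satisfy $\partial Q/\partial x = \partial P/\partial y$ everywhere except at the origin, yet the integral around the unit circle is $2\pi$ rather than $0$. The sketch below checks this numerically; the function names and quadrature sizes are our assumptions.

```python
import math


def P(x, y):
    return -y / (x * x + y * y)


def Q(x, y):
    return x / (x * x + y * y)


def circulation(cx, cy, radius=1.0, n=1000):
    """Counterclockwise line integral of P dx + Q dy around a circle."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        total += P(x, y) * dx + Q(x, y) * dy
    return total * h


def mixed_partials_gap(x, y, step=1e-5):
    """Central-difference estimate of dQ/dx - dP/dy at (x, y)."""
    dq_dx = (Q(x + step, y) - Q(x - step, y)) / (2 * step)
    dp_dy = (P(x, y + step) - P(x, y - step)) / (2 * step)
    return dq_dx - dp_dy


print(mixed_partials_gap(0.6, -0.8))  # close to 0
print(circulation(0.0, 0.0))          # close to 2*pi: the circle encloses the hole
print(circulation(3.0, 0.0))          # close to 0: the circle avoids the origin
```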

We now generalize Green’s theorem from double integrals over plane regions to surface integrals. This extension requires some basic facts about surfaces. A surface $S$ in space is said to be a smooth surface if its normal vector depends continuously on the points of $S$. If $S$ is not smooth but consists of finitely many smooth portions, then $S$ is said to be a piecewise smooth surface. A domain $D$ in $\mathbb{R}^3$ is said to be simply connected if every simple closed curve in $D$ forms the boundary of a smooth oriented surface in $D$.

One generalization of Green’s theorem to $\mathbb{R}^3$ is Stokes’s theorem.

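THEOREM B.2
Let $S$ be a piecewise smooth oriented surface in space, and let the boundary of $S$ be a piecewise smooth simple closed curve $C$. Let $\vec{u}(x, y, z)$ be a continuous vector field with continuous first partial derivatives in a domain that contains $S$. Then

$$\oint_C \vec{u} \cdot d\vec{r} = \iint_S (\operatorname{curl}\vec{u} \cdot \vec{n})\,dA, \tag{B.5}$$

where $\vec{n}$ is the unit normal vector of $S$, and $C$ is traversed in the direction given by the right-hand rule with respect to $\vec{n}$.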
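Stokes’s theorem can be checked on the upper unit hemisphere, whose boundary is the unit circle in the $xy$-plane. In the sketch below (our illustration; the field $\vec{u} = (z - y,\ x + z^2,\ xy)$, its hand-computed curl $(x - 2z,\ 1 - y,\ 2)$, and the grid sizes are assumptions), the outward normal points upward, so by the right-hand rule the circle is traversed counterclockwise when seen from above; both sides of (B.5) then equal $2\pi$.

```python
import math


def u(x, y, z):
    return (z - y, x + z * z, x * y)


def curl_u(x, y, z):
    """curl u, worked out by hand for this u."""
    return (x - 2 * z, 1 - y, 2.0)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def boundary_circulation(n=512):
    """Line integral of u . dr around the unit circle z = 0, counterclockwise from above."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        point = (math.cos(t), math.sin(t), 0.0)
        tangent = (-math.sin(t), math.cos(t), 0.0)
        total += dot(u(*point), tangent)
    return total * h


def curl_flux(n_phi=200, n_theta=128):
    """Integral of curl u . n dA over the upper unit hemisphere with outward n.
    With polar angle phi and azimuth theta, dA = sin(phi) dphi dtheta, and on
    the unit sphere the outward normal n equals the position vector."""
    hp, ht = (math.pi / 2) / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * hp
        for k in range(n_theta):
            theta = k * ht
            n_vec = (math.sin(phi) * math.cos(theta),
                     math.sin(phi) * math.sin(theta),
                     math.cos(phi))
            total += dot(curl_u(*n_vec), n_vec) * math.sin(phi)
    return total * hp * ht


print(boundary_circulation(), curl_flux(), 2 * math.pi)
```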

In addition, the above discussion of path independence carries over from two to three dimensions; line integrals independent of path in $\mathbb{R}^3$ are defined exactly as for (B.1).

THEOREM B.3
Let $\vec{u}(x, y, z) = f(x, y, z)\,\vec{i} + g(x, y, z)\,\vec{j} + h(x, y, z)\,\vec{k}$ be a continuous vector field defined in a domain $D$ of $\mathbb{R}^3$. The line integral

$$\int_C f\,dx + g\,dy + h\,dz$$

is independent of path in $D$ if and only if there exists a scalar function $\Phi(x, y, z)$ in $D$ such that $\vec{u} = \nabla\Phi$.

The above theorem immediately implies the following important result.

THEOREM B.4
Let $f(x, y, z)$, $g(x, y, z)$, and $h(x, y, z)$ be continuous functions in a domain $D$ of $\mathbb{R}^3$. The line integral

$$\int_C f\,dx + g\,dy + h\,dz$$

is independent of path in $D$ if and only if

$$\oint_C f\,dx + g\,dy + h\,dz = 0$$

for every simple closed curve $C$ in $D$.

From these theorems we may derive the following criterion.

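THEOREM B.5
Let $\vec{u}(x, y, z) = f(x, y, z)\,\vec{i} + g(x, y, z)\,\vec{j} + h(x, y, z)\,\vec{k}$ be a continuous vector field with continuous first partial derivatives in a domain $D$ of $\mathbb{R}^3$. If the line integral

$$\int_C f\,dx + g\,dy + h\,dz$$

is independent of path in $D$, then

$$\operatorname{curl}\vec{u} = \vec{0}$$

everywhere in $D$. Conversely, if $D$ is simply connected and $\operatorname{curl}\vec{u} = \vec{0}$ in $D$, then

$$\vec{u} = \nabla\Phi$$

for some scalar function $\Phi$, so the line integral is independent of path in $D$.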

We remark that a vector field $\vec{u}$ satisfying the condition $\operatorname{curl}\vec{u} = \vec{0}$ in a domain $D$ is termed vortex free or irrotational in $D$.

Finally, a second generalization of Green’s theorem is the divergence theorem, or Gauss’ theorem, which also plays an important role in many theoretical and practical considerations.

THEOREM B.6
Let $\vec{u}(x, y, z)$ be a vector field that is continuous and has continuous first partial derivatives in a domain $D$ of $\mathbb{R}^3$. Let $S$ be a piecewise smooth oriented surface in $D$ that forms the complete boundary of a closed bounded region $R$ in $D$. Then

$$\iint_S \vec{u} \cdot \vec{n}\,dA = \iiint_R \operatorname{div}\vec{u}\,dx\,dy\,dz,$$

where $\vec{n}$ is the outer unit normal vector of $S$.
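As a final illustration (ours, with the field $\vec{u} = (x^3, y^3, z^3)$, the unit ball, and the grid sizes as assumptions), both sides of the divergence theorem can be evaluated in spherical coordinates. Here $\operatorname{div}\vec{u} = 3(x^2 + y^2 + z^2)$, and both integrals equal $12\pi/5$.

```python
import math


def u(x, y, z):
    return (x ** 3, y ** 3, z ** 3)


def div_u(x, y, z):
    """div u, worked out by hand for this u."""
    return 3 * (x * x + y * y + z * z)


def sphere_point(r, phi, theta):
    """Cartesian coordinates for radius r, polar angle phi, azimuth theta."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))


def surface_flux(n_phi=200, n_theta=128):
    """Integral of u . n dA over the unit sphere; the outer unit normal n
    equals the position vector, and dA = sin(phi) dphi dtheta."""
    hp, ht = math.pi / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * hp
        for k in range(n_theta):
            n_vec = sphere_point(1.0, phi, k * ht)
            total += sum(a * b for a, b in zip(u(*n_vec), n_vec)) * math.sin(phi)
    return total * hp * ht


def volume_integral(n_r=80, n_phi=60, n_theta=32):
    """Triple integral of div u over the unit ball; dV = r^2 sin(phi) dr dphi dtheta."""
    hr, hp, ht = 1.0 / n_r, math.pi / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_phi):
            phi = (j + 0.5) * hp
            for k in range(n_theta):
                total += div_u(*sphere_point(r, phi, k * ht)) * r * r * math.sin(phi)
    return total * hr * hp * ht


print(surface_flux(), volume_integral(), 12 * math.pi / 5)
```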

References

[1] R. Buck, Advanced Calculus, McGraw-Hill, New York, 1978.

[2] R. Courant, Differential and Integral Calculus, translated by E.J. McShane, Interscience, New York, 1937.


