
Numerical Methods for Stochastic Computations

A Spectral Method Approach

Dongbin Xiu

Princeton University Press
Princeton and Oxford


Copyright © 2010 by Princeton University Press

Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540

In the United Kingdom: Princeton University Press, 6 Oxford Street, Woodstock, Oxfordshire OX20 1TW

press.princeton.edu

All Rights Reserved

Library of Congress Cataloging-in-Publication Data
Xiu, Dongbin, 1971–
Numerical methods for stochastic computations : a spectral method approach / Dongbin Xiu.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-691-14212-8 (cloth : alk. paper)
1. Stochastic differential equations—Numerical solutions. 2. Stochastic processes. 3. Spectral theory (Mathematics). 4. Approximation theory. 5. Probabilities. I. Title.
QA274.23.X58 2010
519.2—dc22
2010014244

British Library Cataloging-in-Publication Data is available

This book has been composed in Times

Printed on acid-free paper. ∞

Typeset by S R Nova Pvt Ltd, Bangalore, India

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1


To Yvette, our parents, and Isaac.


Contents

Preface

Chapter 1  Introduction
  1.1  Stochastic Modeling and Uncertainty Quantification
       1.1.1  Burgers' Equation: An Illustrative Example
       1.1.2  Overview of Techniques
       1.1.3  Burgers' Equation Revisited
  1.2  Scope and Audience
  1.3  A Short Review of the Literature

Chapter 2  Basic Concepts of Probability Theory
  2.1  Random Variables
  2.2  Probability and Distribution
       2.2.1  Discrete Distribution
       2.2.2  Continuous Distribution
       2.2.3  Expectations and Moments
       2.2.4  Moment-Generating Function
       2.2.5  Random Number Generation
  2.3  Random Vectors
  2.4  Dependence and Conditional Expectation
  2.5  Stochastic Processes
  2.6  Modes of Convergence
  2.7  Central Limit Theorem

Chapter 3  Survey of Orthogonal Polynomials and Approximation Theory
  3.1  Orthogonal Polynomials
       3.1.1  Orthogonality Relations
       3.1.2  Three-Term Recurrence Relation
       3.1.3  Hypergeometric Series and the Askey Scheme
       3.1.4  Examples of Orthogonal Polynomials
  3.2  Fundamental Results of Polynomial Approximation
  3.3  Polynomial Projection
       3.3.1  Orthogonal Projection
       3.3.2  Spectral Convergence
       3.3.3  Gibbs Phenomenon
  3.4  Polynomial Interpolation
       3.4.1  Existence
       3.4.2  Interpolation Error
  3.5  Zeros of Orthogonal Polynomials and Quadrature
  3.6  Discrete Projection

Chapter 4  Formulation of Stochastic Systems
  4.1  Input Parameterization: Random Parameters
       4.1.1  Gaussian Parameters
       4.1.2  Non-Gaussian Parameters
  4.2  Input Parameterization: Random Processes and Dimension Reduction
       4.2.1  Karhunen-Loeve Expansion
       4.2.2  Gaussian Processes
       4.2.3  Non-Gaussian Processes
  4.3  Formulation of Stochastic Systems
  4.4  Traditional Numerical Methods
       4.4.1  Monte Carlo Sampling
       4.4.2  Moment Equation Approach
       4.4.3  Perturbation Method

Chapter 5  Generalized Polynomial Chaos
  5.1  Definition in Single Random Variables
       5.1.1  Strong Approximation
       5.1.2  Weak Approximation
  5.2  Definition in Multiple Random Variables
  5.3  Statistics

Chapter 6  Stochastic Galerkin Method
  6.1  General Procedure
  6.2  Ordinary Differential Equations
  6.3  Hyperbolic Equations
  6.4  Diffusion Equations
  6.5  Nonlinear Problems

Chapter 7  Stochastic Collocation Method
  7.1  Definition and General Procedure
  7.2  Interpolation Approach
       7.2.1  Tensor Product Collocation
       7.2.2  Sparse Grid Collocation
  7.3  Discrete Projection: Pseudospectral Approach
       7.3.1  Structured Nodes: Tensor and Sparse Tensor Constructions
       7.3.2  Nonstructured Nodes: Cubature
  7.4  Discussion: Galerkin versus Collocation

Chapter 8  Miscellaneous Topics and Applications
  8.1  Random Domain Problem
  8.2  Bayesian Inverse Approach for Parameter Estimation
  8.3  Data Assimilation by the Ensemble Kalman Filter
       8.3.1  The Kalman Filter and the Ensemble Kalman Filter
       8.3.2  Error Bound of the EnKF
       8.3.3  Improved EnKF via gPC Methods

Appendix A  Some Important Orthogonal Polynomials in the Askey Scheme
  A.1  Continuous Polynomials
       A.1.1  Hermite Polynomial H_n(x) and Gaussian Distribution
       A.1.2  Laguerre Polynomial L_n^{(α)}(x) and Gamma Distribution
       A.1.3  Jacobi Polynomial P_n^{(α,β)}(x) and Beta Distribution
  A.2  Discrete Polynomials
       A.2.1  Charlier Polynomial C_n(x; a) and Poisson Distribution
       A.2.2  Krawtchouk Polynomial K_n(x; p, N) and Binomial Distribution
       A.2.3  Meixner Polynomial M_n(x; β, c) and Negative Binomial Distribution
       A.2.4  Hahn Polynomial Q_n(x; α, β, N) and Hypergeometric Distribution

Appendix B  The Truncated Gaussian Model G(α, β)

References

Index


Preface

The field of stochastic computations, in the context of understanding the impact of uncertainty on simulation results, is relatively new. However, over the past few years, the field has undergone tremendous growth and rapid development. This was driven by the pressing need to conduct verification and validation (V&V) and uncertainty quantification (UQ) for practical systems and to produce predictions for physical systems with high fidelity. More and more researchers with diverse backgrounds, ranging from applied engineering to computer science to computational mathematics, are stepping into the field because of the relevance of stochastic computing to their own research. Consequently there is a growing need for an entry-level textbook focusing on the fundamental aspects of this kind of stochastic computation. And this is precisely what this book does.

This book is a result of several years of studying stochastic computation and the valuable experience of teaching the topic to a group of talented graduate students with diverse backgrounds at Purdue University. The purpose of this book is to present in a systematic and coherent way numerical strategies for uncertainty quantification and stochastic computing, with a focus on the methods based on generalized polynomial chaos (gPC) methodology. The gPC method, an extension of the classical polynomial chaos (PC) method developed by Roger Ghanem [45] in the 1990s, has become one of the most widely adopted methods, and in many cases arguably the only feasible method, for stochastic simulations of complex systems. This book intends to examine thoroughly the fundamental aspects of these methods and their connections to classical approximation theory and numerical analysis.

The goal of this book is to collect, in one volume, all the basic ingredients necessary for the understanding of stochastic methods based on gPC methodology. It is intended as an entry-level graduate text, covering the basic concepts from the computational mathematics point of view. This book is unique in the fact that it is the first book to present, in a thorough and systematic manner, the fundamentals of gPC-based numerical methods and their connections to classical numerical methods, particularly spectral methods. The book is designed as a one-semester teaching text. Therefore, the material is self-contained, compact, and focused only on the fundamentals. Furthermore, the book does not utilize difficult, complicated mathematics, such as measure theory in probability and Sobolev spaces in numerical analysis. The material is presented with a minimal amount of mathematical rigor so that it is accessible to researchers and students in engineering who are interested in learning and applying the methods. It is the author's hope that after going through this text, readers will feel comfortable with the basics of stochastic computation and go on to apply the methods to their own problems and pursue more advanced topics in this perpetually evolving field.

Dongbin Xiu
West Lafayette, Indiana, USA
March 2010


Chapter One

Introduction

The goal of this chapter is to introduce the idea behind stochastic computing in the context of uncertainty quantification (UQ). Without using extensive discussions (of which there are many), we will use a simple example of a viscous Burgers' equation to illustrate the impact of input uncertainty on the behavior of a physical system and the need to incorporate uncertainty from the beginning of the simulation and not as an afterthought.

1.1 STOCHASTIC MODELING AND UNCERTAINTY QUANTIFICATION

Scientific computing has become the main tool in many fields for understanding the physics of complex systems when experimental studies can be lengthy, expensive, inflexible, and difficult to repeat. The ultimate goal of numerical simulations is to predict physical events or the behaviors of engineered systems. To this end, extensive efforts have been devoted to the development of efficient algorithms whose numerical errors are under control and understood. This has been the primary goal of numerical analysis, which remains an active research branch. What has been considered much less in classical numerical analysis is understanding the impact of errors, or uncertainty, in data such as parameter values and initial and boundary conditions.

The goal of UQ is to investigate the impact of such errors in data and subsequently to provide more reliable predictions for practical problems. This topic has received an increasing amount of attention in past years, especially in the context of complex systems where mathematical models can serve only as simplified and reduced representations of the true physics. Although many models have been successful in revealing quantitative connections between predictions and observations, their usage is constrained by our ability to assign accurate numerical values to various parameters in the governing equations. Uncertainty represents such variability in data and is ubiquitous because of our incomplete knowledge of the underlying physics and/or inevitable measurement errors. Hence in order to fully understand simulation results and subsequently to predict the true physics, it is imperative to incorporate uncertainty from the beginning of the simulations and not as an afterthought.

1.1.1 Burgers’ Equation: An Illustrative Example

Let us consider a viscous Burgers' equation,

    u_t + u u_x = ν u_{xx},   x ∈ (−1, 1),
    u(−1) = 1,   u(1) = −1,                    (1.1)


[Figure 1.1: Stochastic solutions of Burgers' equation (1.1) with u(−1, t) = 1 + δ, where δ is a uniformly distributed random variable in (0, 0.1) and ν = 0.05. The solid line is the average steady-state solution, with the dotted lines denoting the bounds of the random solutions. The dashed line is the standard deviation of the solution. (Details are in [123].)]

where u is the solution field and ν > 0 is the viscosity. This is a well-known nonlinear partial differential equation (PDE) for which extensive results exist. The presence of viscosity smooths out the shock discontinuity that would otherwise develop. Thus, the solution has a transition layer, a region of rapid variation that extends over a distance of O(ν) as ν ↓ 0. The location of the transition layer z, defined as the zero of the solution profile, u(t, z) = 0, is at zero when the solution reaches steady state. If a small amount of (positive) uncertainty exists in the value of the left boundary condition (possibly due to some measurement or estimation bias), i.e., u(−1) = 1 + δ, where 0 < δ ≪ 1, then the location of the transition layer can change significantly. For example, if δ is a uniformly distributed random variable in the range (0, 0.1), then the average steady-state solution with ν = 0.05 is the solid line in figure 1.1. It is clear that a small uncertainty of 10 percent can cause significant changes in the final steady-state solution, whose average location is approximately z ≈ 0.8, resulting in an O(1) difference from the solution with an idealized boundary condition containing no uncertainty. (Details of the computations can be found in [123].)

The Burgers' equation example demonstrates that for some problems, especially nonlinear ones, a small uncertainty in data may cause nonnegligible changes in the system output. Such changes cannot be captured by increasing the resolution of classical numerical algorithms if the uncertainty is not incorporated at the beginning of the computations.
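To make the supersensitivity concrete, here is a minimal Monte Carlo sketch of the transition-layer statistics. It assumes (this is not stated in the text) that the steady state of (1.1) takes the classical form u(x) = −a tanh(a(x − z)/(2ν)), so each realization of δ reduces to a small nonlinear system for (a, z); the sample size, random seed, and initial guess are illustrative choices.

```python
# Monte Carlo study of the transition-layer location z for the steady state of the
# viscous Burgers' problem (1.1) with perturbed left boundary u(-1) = 1 + delta.
# Assumption (not from the text): the steady solution has the classical form
#   u(x) = -a * tanh(a * (x - z) / (2 * nu)),
# so the two boundary conditions give a 2x2 nonlinear system for (a, z).
import numpy as np
from scipy.optimize import fsolve

nu = 0.05                        # viscosity, as in figure 1.1
rng = np.random.default_rng(0)   # arbitrary seed for reproducibility

def layer_location(delta):
    """Solve u(-1) = 1 + delta and u(1) = -1 for the layer location z."""
    def residual(p):
        a, z = p
        return [a * np.tanh(a * (1.0 + z) / (2.0 * nu)) - (1.0 + delta),
                a * np.tanh(a * (1.0 - z) / (2.0 * nu)) - 1.0]
    a, z = fsolve(residual, x0=[1.0, 0.8])
    return z

# delta ~ U(0, 0.1): a perturbation of at most 10 percent of the boundary value.
z_samples = np.array([layer_location(d) for d in rng.uniform(0.0, 0.1, size=2000)])
print(f"mean of z: {z_samples.mean():.3f}")
print(f"std  of z: {z_samples.std(ddof=1):.3f}")
```

Even a perturbation of at most 10 percent moves the layer location by an O(1) amount; the printed statistics can be compared against the Monte Carlo results discussed in section 1.1.3.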


1.1.2 Overview of Techniques

The importance of understanding uncertainty has been realized by many for a long time in disciplines such as civil engineering, hydrology, control, etc. Consequently, many methods have been devised to tackle this issue. Because of the "uncertain" nature of the uncertainty, the most dominant approach is to treat data uncertainty as random variables or random processes and recast the original deterministic systems as stochastic systems.

We remark that these types of stochastic systems are different from classical stochastic differential equations (SDEs), where the random inputs are idealized processes such as Wiener processes, Poisson processes, etc., and tools such as stochastic calculus have been developed extensively and are still under active research. (See, for example, [36, 55, 57, 85].)

1.1.2.1 Monte Carlo– and Sampling-Based Methods

One of the most commonly used methods is Monte Carlo sampling (MCS) or one of its variants. In MCS, one generates (independent) realizations of random inputs based on their prescribed probability distribution. For each realization the data are fixed and the problem becomes deterministic. Upon solving the deterministic realizations of the problem, one collects an ensemble of solutions, i.e., realizations of the random solutions. From this ensemble, statistical information can be extracted, e.g., mean and variance. Although MCS is straightforward to apply, as it only requires repetitive executions of deterministic simulations, typically a large number of executions are needed, for the solution statistics converge relatively slowly. For example, the mean value typically converges as 1/√K, where K is the number of realizations (see, for example, [30]). The need for a large number of realizations for accurate results can incur an excessive computational burden, especially for systems that are already computationally intensive in their deterministic settings.
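The 1/√K rate is easy to observe on a toy problem. The sketch below, which is purely illustrative and not from the text, estimates E[Z^2] = 1 for a standard normal Z and prints the error as the number of realizations grows.

```python
# Illustration of the O(1/sqrt(K)) convergence of plain Monte Carlo sampling.
# The quantity of interest is E[Z^2] = 1 for Z ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(1)
for K in [10**2, 10**3, 10**4, 10**5, 10**6]:
    estimate = np.mean(rng.standard_normal(K) ** 2)
    print(f"K = {K:>8d}   error = {abs(estimate - 1.0):.2e}")
# On average, each tenfold increase in K reduces the error by roughly sqrt(10).
```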

Techniques have been developed to accelerate convergence of the brute-force MCS, e.g., Latin hypercube sampling (cf. [74, 98]) and quasi-Monte Carlo sampling (cf. [32, 79, 80]), to name a few. However, additional restrictions are imposed by the design of these methods, and their applicability is often limited.

1.1.2.2 Perturbation Methods

The most popular nonsampling methods were perturbation methods, where random fields are expanded via Taylor series around their mean and truncated at a certain order. Typically, at most second-order expansion is employed because the resulting system of equations becomes extremely complicated beyond the second order. This approach has been used extensively in various engineering fields [56, 71, 72]. An inherent limitation of perturbation methods is that the magnitude of the uncertainties, at both the inputs and outputs, cannot be too large (typically less than 10 percent), and the methods do not perform well otherwise.


1.1.2.3 Moment Equations

In this approach one attempts to compute the moments of the random solution directly. The unknowns are the moments of the solution, and their equations are derived by taking averages of the original stochastic governing equations. For example, the mean field is determined by the mean of the governing equations. The difficulty lies in the fact that the derivation of a moment almost always, except on some rare occasions, requires information about higher moments. This brings out the closure problem, which is often dealt with by utilizing some ad hoc arguments about the properties of the higher moments.

1.1.2.4 Operator-Based Methods

These kinds of approaches are based on manipulation of the stochastic operators in the governing equations. They include the Neumann expansion, which expresses the inverse of the stochastic operator in a Neumann series [95, 131], and the weighted integral method [23, 24]. Similar to perturbation methods, these operator-based methods are also restricted to small uncertainties. Their applicability is often strongly dependent on the underlying operator and is typically limited to static problems.

1.1.2.5 Generalized Polynomial Chaos

A recently developed method, generalized polynomial chaos (gPC) [120], a generalization of classical polynomial chaos [45], has become one of the most widely used methods. With gPC, stochastic solutions are expressed as orthogonal polynomials of the input random parameters, and different types of orthogonal polynomials can be chosen to achieve better convergence. It is essentially a spectral representation in random space and exhibits fast convergence when the solution depends smoothly on the random parameters. gPC-based methods will be the focus of this book.

1.1.3 Burgers’ Equation Revisited

Let us return to the viscous Burgers' example (1.1), with the same parameter settings that produced figure 1.1. Let us examine the location of the averaged transition layer and the standard deviation of the solution at this location as obtained by different methods. Table 1.1 shows the results by Monte Carlo simulations, and table 1.2 those obtained by a perturbation method at different orders. The converged solutions by gPC (up to three significant digits) are obtained by a fourth-order expansion and are tabulated for comparison. It can be seen that MCS achieves the same accuracy with O(10^4) realizations. On the other hand, the computational cost of the fourth-order gPC is approximately equivalent to that of five deterministic simulations. The perturbation methods have a low computational cost similar to that of gPC. However, the accuracy of perturbation methods is much less desirable, as shown in table 1.2. In fact, by increasing the perturbation orders, no clear convergence can be observed. This is caused by the relatively large uncertainty at the output, which can be as high as 40 percent, even though the input uncertainty is small.


Table 1.1 Mean Location of the Transition Layer (z) and Its Standard Deviation (σ_z) by Monte Carlo Simulations^a

           n = 100   n = 1,000   n = 2,000   n = 5,000   n = 10,000   gPC
    z       0.819      0.814       0.815       0.814       0.814      0.814
    σ_z     0.387      0.418       0.417       0.417       0.414      0.414

^a n is the number of realizations, δ ∼ U(0, 0.1), and ν = 0.05. Also shown are the converged gPC solutions.

Table 1.2 Mean Location of the Transition Layer (z) and Its Standard Deviation (σ_z) Obtained by Perturbation Methods^a

           k = 1    k = 2    k = 3    k = 4    gPC
    z       0.823    0.824    0.824    0.824    0.814
    σ_z     0.349    0.349    0.328    0.328    0.414

^a k is the order of the perturbation expansion, δ ∼ U(0, 0.1), and ν = 0.05. Also shown are the converged gPC solutions.

This example demonstrates the accuracy and efficiency of the gPC method. It should be remarked that although gPC shows a significant advantage here, the conclusion cannot be trivially generalized to other problems, as the strengths and weaknesses of gPC, or of any method for that matter, are problem-dependent.

1.2 SCOPE AND AUDIENCE

As a graduate-level text, this book focuses exclusively on the fundamental aspects of gPC-based numerical methods, with a detailed exposition of their formulations, basic properties, and connections to classical numerical methods. No research topics are discussed in this book. Although this leaves out many exciting new developments in stochastic computing, it helps to keep the book self-contained, compact, and more accessible to students who want to learn the basics. The material is also chosen and organized in such a way that the book can be finished in a one-semester course. Also, the book is not intended to contain a thorough and exhaustive literature review. References are limited to those that are more accessible to graduate students.

In chapter 2, we briefly review the basic concepts of probability theory. This is followed by a brief review of approximation theory in chapter 3. The material in these two chapters is kept at almost an absolute minimum, with only the very basic concepts included. The goal of these two chapters is to prepare students for the more advanced material in the following chapters. An interesting question is how much time the instructor should dedicate to these two chapters. Students taking the course usually have some background knowledge of either numerical analysis (which gives them some preparation in approximation theory) or probability theory (or statistics), but rarely do students have both. And a comprehensive coverage of both topics can easily consume a large portion of class time and leave no time for other material. From the author's personal teaching experience, it is better to go through probability theory rather fast, covering only the basic concepts and leaving other concepts as reading assignments. This is reflected in the writing of this book, as chapter 2 is quite concise. The approximation theory in chapter 3 deserves more time, as it is closely related to many concepts of gPC in the ensuing chapters.

In chapter 4, the procedure for formulating stochastic systems is presented, and an important step, parameterization of random inputs, is discussed in detail. A formal and systematic exposition of gPC is given in chapter 5, where some of the important properties of gPC expansion are presented. Two major numerical approaches, stochastic Galerkin and stochastic collocation, are covered in chapters 6 and 7, respectively. The algorithms are discussed in detail, along with some examples for better understanding. Again, only the basics of the algorithms are covered. More advanced aspects of the techniques, such as adaptive methods, are left as research topics.

The last chapter, chapter 8, is a slight deviation from the theme of the book because the content here is closer to research topics. The topics here, problems in random domains, inverse parameter estimation, and "correcting" simulation results using data, are important and have been studied extensively. The purpose of this chapter is to demonstrate the applicability of gPC methods to these problems and to present unique and efficient algorithms constructed by using gPC. Nevertheless, this chapter is not required when teaching the course, and readers are advised to read it based on their own interests.

1.3 A SHORT REVIEW OF THE LITERATURE

Though the focus of this book is on the fundamentals of gPC-based numerical methods, it is worthwhile to present a concise review of the notable literature in this field. The goal is to give readers a general sense of what the active research directions are. Since the field is undergoing rapid development, by no means does this section serve as a comprehensive review. Only the notable and earlier work in each subfield will be mentioned. Readers, after learning the basics, should devote themselves to a more in-depth literature search.

The term polynomial chaos was coined by Norbert Wiener in 1938 in his work studying the decomposition of Gaussian stochastic processes [115]. This was long before the phenomenon of chaos in dynamical systems was known. In Wiener's work, Hermite polynomials serve as an orthogonal basis, and the validity of the approach was proved in [12]. Beyond the use of Hermite polynomials, the work on polynomial chaos referred to in this book bears no other resemblance to Wiener's work. In the stochastic computations considered here, the problems we face involve practical systems (usually described by partial differential equations) with random inputs. The random inputs are usually characterized by a set of random parameters. As a result, many of the elegant mathematical tools of classical stochastic analysis, e.g., stochastic calculus, are not directly applicable, and we need to design new algorithms that are suitable for such practical systems.


The original PC work was started by R. Ghanem and coworkers. Inspired by the theory of Wiener-Hermite polynomial chaos, Ghanem employed Hermite polynomials as an orthogonal basis to represent random processes and applied the technique to many practical engineering problems with success (cf. [41, 42, 43, 97]). An overview can be found in [45].

The use of Hermite polynomials, albeit mathematically sound, presents difficulties in some applications, particularly in terms of convergence and probability approximations for non-Gaussian problems [20, 86]. Consequently, generalized polynomial chaos was proposed in [120] to alleviate the difficulty. In gPC, different kinds of orthogonal polynomials are chosen as a basis depending on the probability distribution of the random inputs. Optimal convergence can be achieved by choosing the proper basis. In a series of papers, the strength of gPC was demonstrated for a variety of PDEs [119, 121].

The work on gPC was further generalized by not requiring the basis polynomials to be globally smooth. In fact, in principle any set of complete bases is a viable choice. Such generalizations include the piecewise polynomial basis [8, 92], the wavelet basis [62, 63], and multielement gPC [110, 111].

Upon choosing a proper basis, a numerical technique is needed to solve the problem. The early work was mostly based on the Galerkin method, which minimizes the error of a finite-order gPC expansion by Galerkin projection. This is the stochastic Galerkin (SG) approach; it has been applied since the early work on PC and has proved to be effective. The Galerkin procedure usually results in a set of coupled deterministic equations and requires additional effort to solve. Also, the derivation of the resulting equations can be challenging when the governing stochastic equations take complicated forms.

Another numerical approach is the stochastic collocation (SC) method, where one repetitively executes an established deterministic code at prescribed nodes in the random space defined by the random inputs. Upon completing the simulations, one conducts postprocessing to obtain the desired solution properties from the solution ensemble. The idea, primarily based on an old technique, the "deterministic sampling method," can be found in early works such as [78, 103]. These works mostly employed tensor products of one-dimensional nodes (e.g., Gauss quadrature). Although tensor product construction makes mathematical analysis more accessible (cf. [7]), the total number of nodes grows exponentially fast as the number of random parameters grows, the so-called curse of dimensionality. Since each node requires a full-scale underlying deterministic simulation, the tensor product approach is practical only for low random dimensions, e.g., when the number of random parameters is less than 5.

More recently, there has been a surge of interest in the high-order stochastic collocation approach following [118]. A distinct feature of the work in [118] is the use of sparse grids from multivariate interpolation analysis. A sparse grid is a subset of the full tensor grid and can retain many of the accuracy properties of the tensor grid. It can significantly reduce the number of nodes in higher random dimensions while keeping high-order accuracy. Hence the sparse grid collocation method becomes a viable choice in practical simulations. Much more work has since followed, with most focusing on further reduction in the number of nodes (cf. [3, 75, 81, 104]).

While most sparse grid collocation methods utilize interpolation theory, another practical collocation method is the pseudospectral approach, a term coined in [116]. This approach employs a discrete version of the gPC orthogonal projection operator and relies heavily on integration theory. One should keep in mind that in multidimensional spaces, especially high-dimensional ones, both interpolation and integration are challenging tasks.

The major challenge in stochastic computations is high dimensionality, i.e., how to deal with a large number of random variables. One approach to alleviating the computational cost is to use adaptivity. Current work includes adaptive choice of the polynomial basis [33, 107], adaptive element selection in multielement gPC [31, 110], and adaptive sparse grid collocation [35, 75, 104].

Applications of these numerical methods cover a wide range, a manifestation of the relevance of stochastic simulation and uncertainty quantification. Here we mention some of the more representative (and published) work. It includes Burgers' equation [53, 123], fluid dynamics [58, 61, 64, 70, 121], flow-structure interactions [125], hyperbolic problems [17, 48, 68], material deformation [1, 2], natural convection [35], Bayesian analysis for inverse problems [76, 77, 112], multibody dynamics [89, 90], biological problems [37, 128], acoustic and electromagnetic scattering [15, 16, 126], multiscale computations [5, 94, 117, 124, 129], model construction and reduction [26, 40, 44], random domains with rough boundaries [14, 69, 102, 130], etc.


Chapter Two

Basic Concepts of Probability Theory

In this chapter we collect some basic facts needed for stochastic computations. Most parts of this chapter can be skipped if one has some basic knowledge of probability theory and stochastic processes.

2.1 RANDOM VARIABLES

The outcome of an experiment, a game, or an event is random. A simple example is coin tossing: the possible outcomes, heads or tails, are not predictable in the sense that they appear according to a random mechanism that is too complex to be understood. A more complicated experiment is the stock market. There the random outcomes of brokers' activities, which in fact represent the economic environment, political interests, market sentiment, etc., are reflected by share prices and exchange rates.

The mathematical treatment of such kinds of random experiments requires that we assign a number to each random outcome. For example, when tossing a coin, we can assign 1 for heads and 0 for tails. Thus, we obtain a random variable X = X(ω) ∈ {0, 1}, where ω belongs to the outcome space Ω = {heads, tails}. In the example of the stock market, the value of a share price of a stock is already a random variable. The numbers X(ω) provide us with information about the experiment even if we do not know precisely what mechanism drives the experiment.

More precisely, let Ω be an abstract space containing all possible outcomes ω of the underlying experiment; then the random variable X = X(ω) is a real-valued function defined on Ω. Note here that for the abstract space Ω, it does not really matter what the ω are.

To study problems associated with the random variable X, one first collects relevant subsets of Ω, the events, in a class F called a σ-field or σ-algebra. In order for F to contain all relevant events, it is natural to include all the ω in the event space Ω and also the union, difference, and intersection of any events in F, the set Ω, and its complement, the empty set ∅.

If we consider a share price X of a stock, not only should the events {ω : X(ω) = c} belong to F but also

    {ω : a < X(ω) < b},   {ω : b < X(ω)},   {ω : X(ω) ≤ a},

and many more events that can be relevant. And it is natural to require that elementary operations such as ∩, ∪, and complementation on the events of F will not land outside the class F. This is the intuitive meaning of a σ-field F.


Definition 2.1. A σ-field F (on Ω) is a collection of subsets of Ω satisfying the following conditions:

• It is not empty: ∅ ∈ F and Ω ∈ F.
• If A ∈ F, then A^c ∈ F.
• If A_1, A_2, ... ∈ F, then

    ⋃_{i=1}^∞ A_i ∈ F   and   ⋂_{i=1}^∞ A_i ∈ F.

Example 2.2 (Some elementary σ-fields). The following collections of subsets of Ω are σ-fields:

    F_1 = {∅, Ω},
    F_2 = {∅, Ω, A, A^c},  where A ≠ ∅, A ≠ Ω,
    F_3 = 2^Ω ≜ {A : A ⊂ Ω}.

F_1 is the smallest σ-field on Ω, and F_3 is the biggest one, containing all possible subsets of Ω; it is called the power set of Ω.

In practice, the power set is in general too big. One can prove that, for a given collection C of subsets of Ω, there exists a smallest σ-field σ(C) on Ω containing C. We call σ(C) the σ-field generated by C.

Example 2.3. In example 2.2, one can prove that

    F_1 = σ({∅}),   F_2 = σ({A}),   F_3 = σ(F_3).

2.2 PROBABILITY AND DISTRIBUTION

The concept of probability is used to measure the likelihood of the occurrence of certain events. For example, for the fair coin toss described in the previous section, we assign the probability 0.5 to both events, heads and tails. That is, P({ω : X(ω) = 0}) = P({ω : X(ω) = 1}) = 0.5. This assignment is based on empirical evidence: if we flip a fair coin a large number of times, we expect that about 50 percent of the outcomes will be heads and about 50 percent will be tails. In probability theory, the law of large numbers gives the theoretical justification for this empirical observation.
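A quick simulation (illustrative only, not from the text) shows the empirical frequency of heads settling near 0.5 as the number of tosses grows, which is the law-of-large-numbers justification mentioned above.

```python
# Empirical frequency of heads for a fair coin, X in {0, 1} with P(X = 1) = 0.5.
import numpy as np

rng = np.random.default_rng(2)
for n in [10, 100, 1_000, 10_000, 100_000]:
    tosses = rng.integers(0, 2, size=n)   # 1 = heads, 0 = tails
    print(f"n = {n:>6d}   frequency of heads = {tosses.mean():.4f}")
```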

Some elementary properties of probability measures are easily summarized. For events A, B ∈ F,

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B),

and if A and B are disjoint,

    P(A ∪ B) = P(A) + P(B).

Moreover,

    P(A^c) = 1 − P(A),   P(Ω) = 1,   P(∅) = 0.


Definition 2.4 (Probability space). A probability space is a triplet (Ω, F, P), where Ω is a countable event space, F ⊂ 2^Ω is the σ-field of Ω, and P is a probability measure such that

1. 0 ≤ P(A) ≤ 1, ∀A ∈ F.
2. P(Ω) = 1.
3. For A_1, A_2, ... ∈ F and A_i ∩ A_j = ∅, ∀i ≠ j,

    P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Definition 2.5 (Distribution function). The collection of the probabilities

    F_X(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}),  x ∈ R,    (2.1)

is the distribution function F_X of X.

It yields the probability that X belongs to an interval (a, b]. That is,

    P({ω : a < X(ω) ≤ b}) = F_X(b) − F_X(a),  a < b.

Moreover, we obtain the probability that X is equal to a number:

    P(X = x) = F_X(x) − lim_{ε→0} F_X(x − ε).

With these probabilities we can approximate the probability of the event {ω : X(ω) ∈ B} for very complicated subsets B of R.

Definition 2.6 (Distribution). The collection of the probabilities

    P_X(B) = P(X ∈ B) = P({ω : X(ω) ∈ B})

for suitable subsets B ⊂ R is the distribution of X.

The suitable subsets of R are called Borel sets. They are sets from B = σ({(a, b] : −∞ < a < b < ∞}), the Borel σ-field.

The distribution P_X and the distribution function F_X are equivalent notions in the sense that both of them can be used to calculate the probability of any event {X ∈ B}.

2.2.1 Discrete Distribution

A distribution function can have jumps. That is,

    F_X(x) = Σ_{k: x_k ≤ x} p_k,  x ∈ R,    (2.2)

where

    0 ≤ p_k ≤ 1, ∀k,   Σ_{k=1}^∞ p_k = 1.

The distribution function (2.2) and the corresponding distribution are discrete. A random variable with such a distribution function is a discrete random variable.

A discrete random variable assumes only a finite or countably infinite number of values x_1, x_2, ..., taking the value x_k with probability p_k = P(X = x_k).


[Figure 2.1: Probability distribution functions. Left: binomial distribution with n = 10, p = 0.5. Right: Poisson distribution with λ = 3.]

Example 2.7 (Two important discrete distributions). Important discrete distributions include the binomial distribution B(n, p) with parameters n ∈ N_0 = {0, 1, ...} and p ∈ (0, 1):

    P(X = k) = (n choose k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n,

and the Poisson distribution P(λ) with parameter λ > 0:

    P(X = k) = e^{−λ} λ^k / k!,  k = 0, 1, ....

Graphical illustrations of these two probability distributions are shown in figure 2.1.
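The two probability mass functions of example 2.7 can be evaluated directly. The short sketch below does so for the same parameters as figure 2.1 (n = 10, p = 0.5 and λ = 3); it is a plain illustration of the formulas, not code from the book.

```python
# Binomial B(n, p) and Poisson P(lambda) probability mass functions from example 2.7.
from math import comb, exp, factorial

n, p = 10, 0.5
lam = 3.0

binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
poisson = [exp(-lam) * lam**k / factorial(k) for k in range(16)]

print("binomial B(10, 0.5):", [f"{q:.3f}" for q in binom])
print("Poisson  P(3):      ", [f"{q:.3f}" for q in poisson])
```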

2.2.2 Continuous Distribution

In contrast to discrete distributions and random variables, the distribution function of a continuous random variable does not have jumps; hence

    P(X = x) = 0,  ∀x ∈ R,

or, equivalently,

    lim_{ε→0} F_X(x + ε) = F_X(x),  ∀x;    (2.3)

i.e., a continuous random variable assumes any particular value with probability 0. Most continuous distributions have a density f_X:

    F_X(x) = ∫_{−∞}^{x} f_X(y) dy,  x ∈ R,    (2.4)

where

    f_X(x) ≥ 0, ∀x ∈ R,   ∫_{−∞}^{∞} f_X(y) dy = 1.

Example 2.8 (Normal and uniform distributions). An important continuous distribution is the normal or Gaussian distribution N(µ, σ^2) with parameters µ ∈ R, σ^2 > 0. Its density is

    f_X(x) = (1 / √(2πσ^2)) exp[−(x − µ)^2 / (2σ^2)],  x ∈ R.    (2.5)

The density of N(0, 1) is shown in figure 2.2.

[Figure 2.2: Normal distribution function with µ = 0, σ^2 = 1.]

The uniform distribution U(a, b) on (a, b) has density

    f_X(x) = 1/(b − a) for x ∈ (a, b), and 0 otherwise,

which is constant inside (a, b).

2.2.3 Expectations and Moments

Important characteristics of a random variable X include its expectation, variance, and moments. The expectation or mean value of a random variable X with density f_X is

    µ_X = E[X] = ∫_{−∞}^{∞} x f_X(x) dx.

The variance of X is defined as

    σ_X^2 = var(X) = ∫_{−∞}^{∞} (x − µ_X)^2 f_X(x) dx.

The mth moment of X for m ∈ N is

    E[X^m] = ∫_{−∞}^{∞} x^m f_X(x) dx.

For a real-valued function g, the expectation of g(X) is

    E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx.


Similarly, for a discrete random variable X with probabilities p_k = P(X = x_k), we have

    µ_X = E[X] = Σ_{k=1}^∞ x_k p_k,

    σ_X^2 = var(X) = Σ_{k=1}^∞ (x_k − µ_X)^2 p_k,

    E[X^m] = Σ_{k=1}^∞ x_k^m p_k,

    E[g(X)] = Σ_{k=1}^∞ g(x_k) p_k.

Often we study E[(X − µ_X)^m], the centered moments. The expectation or mean µ_X is often regarded as the "center" of the random variable X or the most likely value of X. The variance σ_X^2, or more precisely the standard deviation σ_X, describes the spread or dispersion of the random variable X around its mean µ_X. It is straightforward to show that

    σ_X^2 = E[(X − µ_X)^2]
          = E[X^2 − 2µ_X X + µ_X^2] = E[X^2] − 2µ_X^2 + µ_X^2
          = E[X^2] − µ_X^2.

Often this is memorized as "variance equals the mean of the square minus the square of the mean."
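As a sanity check of the identity σ_X^2 = E[X^2] − µ_X^2, the sketch below evaluates both sides by numerical quadrature for a uniform density on (a, b); the parameters are arbitrary illustrative choices.

```python
# Numerical check of var(X) = E[X^2] - (E[X])^2 for X ~ U(a, b).
import numpy as np
from scipy.integrate import quad

a, b = 1.0, 4.0
f = lambda x: 1.0 / (b - a)                       # uniform density on (a, b)

mean, _ = quad(lambda x: x * f(x), a, b)
m2, _ = quad(lambda x: x**2 * f(x), a, b)
var, _ = quad(lambda x: (x - mean)**2 * f(x), a, b)

print(f"E[X]              = {mean:.6f}")          # (a + b) / 2 = 2.5
print(f"var(X)            = {var:.6f}")           # (b - a)^2 / 12 = 0.75
print(f"E[X^2] - (E[X])^2 = {m2 - mean**2:.6f}")  # same value as var(X)
```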

Example 2.9 (Moments of a Gaussian random variable). If a random variable X has a normal distribution with density (2.5), then we have

    µ_X = E[X] = µ,
    σ_X^2 = var(X) = σ^2,
    E[(X − µ)^{2n−1}] = 0,  n = 1, 2, ...,
    E[(X − µ)^{2n}] = 1 · 3 · 5 ⋯ (2n − 1) · σ^{2n},  n = 1, 2, ....

Therefore, the mean and variance of a normal distribution completely characterize all of its moments.
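The central-moment formulas of example 2.9 are easy to probe by sampling. The sketch below compares Monte Carlo estimates of E[(X − µ)^m] with 0 for odd m and with 1 · 3 · 5 ⋯ (m − 1) · σ^m for even m; the parameters and sample size are illustrative.

```python
# Monte Carlo check of the central moments of N(mu, sigma^2) from example 2.9:
# odd central moments vanish; even ones equal 1*3*5*...*(2n-1) * sigma^(2n).
import numpy as np

mu, sigma = 1.0, 2.0
rng = np.random.default_rng(3)
x = mu + sigma * rng.standard_normal(2_000_000)

def double_factorial_odd(m):
    """Product 1 * 3 * 5 * ... * (m - 1), used for even m."""
    return np.prod(np.arange(1, m, 2))

for m in range(1, 7):
    estimate = np.mean((x - mu) ** m)
    exact = 0.0 if m % 2 else double_factorial_odd(m) * sigma**m
    print(f"m = {m}: estimate = {estimate:10.4f}, exact = {exact:10.4f}")
```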

2.2.4 Moment-Generating Function

Definition 2.10. The moment-generating function for a random variable X(ω) is defined as m_X(t) ≜ E[e^{tX}]. It exists if there exists b > 0 such that m_X(t) is finite for |t| ≤ b. The reason m_X(t) is called a moment-generating function is that

    µ_k = d^k m_X(t) / dt^k |_{t=0},  k = 0, 1, ...,

where µ_k = E[X^k] is the kth moment of X. The relationship can be seen as follows:

    m_X(t) = E[e^{tX}] = ∫ e^{tx} p_X(x) dx
           = ∫ Σ_{k=0}^∞ ((tx)^k / k!) p_X(x) dx = Σ_{k=0}^∞ (1/k!) ∫ t^k x^k p_X(x) dx
           = Σ_{k=0}^∞ t^k µ_k / k! = µ_0 + t µ_1 + (t^2/2) µ_2 + ⋯.    (2.6)

If X ∼ N(0, σ^2) is a Gaussian random variable, then m_X(t) = e^{σ^2 t^2/2}.

2.2.5 Random Number Generation

One of the basic tasks in stochastic simulations is to generate a sequence of random numbers satisfying a desired probability distribution. To this end, one first seeks to generate a random sequence with a common uniform distribution in (0, 1). There are many available algorithms that have been well studied. The algorithms in practical implementations are all deterministic (typically using recursion) and can therefore at best mimic properties of uniform random variables. For this reason, the sequence of outputs is called a sequence of pseudorandom numbers. Despite some defects in the early work, the algorithms for generating pseudorandom numbers have been much improved. The readers of this book will do well with existing software, which is fast, certainly faster than self-made high-level-language routines, and which they will seldom be able to improve substantially. Therefore, we will not spend time on this subject, and we refer interested readers to references such as [38, 59, 65, 87].

For nonuniform random variables, the most straightforward technique is via the inversion of a distribution function. Let F_X(x) = P(X ≤ x) be the distribution function of X. In the simple case where F_X is strictly increasing and continuous, x = F_X^{−1}(u) is the unique solution of F_X(x) = u, 0 < u < 1. For distributions with nonconnected support or jumps, F_X is not strictly increasing, and more care is needed to find its inverse. We choose the left-continuous version

    F_X^{−1}(u) ≜ inf{x : F_X(x) ≥ u}.    (2.7)

We state here the following results, which justify the inversion method for generating nonuniform random numbers.

Proposition 2.11. Let F_X(x) = P(X ≤ x) be the distribution function of X. Then the following results hold.

• u ≤ F_X(x) ⟺ F_X^{−1}(u) ≤ x.
• If U is uniform in (0, 1), then F_X^{−1}(U) has distribution function F_X.
• If F_X is continuous, then F_X(X) is uniform in (0, 1).

The part that is mainly used in simulation is the second statement, which allows us to generate X as F_X^{−1}(U). The most common case is an F_X that is continuous and strictly increasing on an interval.


Example 2.12 (Exponential random variable). Let X be a random variable with an exponential distribution whose probability density is f_X(x) = a e^{−ax}, x ≥ 0. Here a > 0 is a parameter that is often called the rate. It is easy to see that E[X] = 1/a, the inverse of the rate. The inverse of F_X is F_X^{−1}(u) = −log(1 − u)/a. So we can generate X by inversion: X = −log(1 − U)/a. In practice, one often uses the equivalent X = −log(U)/a.
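The inversion recipe of example 2.12 is one line of code. The sketch below, with an arbitrary rate a, draws exponential samples as X = −log(U)/a and compares sample statistics with their exact values.

```python
# Inverse-transform sampling of an exponential random variable with rate a,
# using X = -log(U)/a for U ~ U(0, 1), as in example 2.12.
import numpy as np

a = 2.0
rng = np.random.default_rng(4)
u = rng.uniform(size=100_000)
x = -np.log(u) / a   # equivalent to -log(1 - U)/a since U and 1 - U are both uniform

print(f"sample mean     = {x.mean():.4f}   (exact E[X] = 1/a = {1.0 / a:.4f})")
print(f"sample P(X > 1) = {(x > 1.0).mean():.4f}   (exact = exp(-a) = {np.exp(-a):.4f})")
```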

A main limitation of inversion is that quite often F_X^{−1} is not available in explicit form, for example, when X is a normal random variable. Sometimes approximations are used.

Example 2.13 (Approximate inversion of normal distribution). Let X be a Gaussian random variable with zero mean and unit variance; i.e., X ∼ N(0, 1). Its probability density function is f_X(x) = (1/√(2π)) e^{−x^2/2}, and there is no explicit formula for F_X(x). The following two approximations to F_X^{−1} are quite simple and accurate.

    F_X^{−1}(u) ≈ sign(u − 1/2) (t − (c_0 + c_1 t + c_2 t^2) / (1 + d_1 t + d_2 t^2 + d_3 t^3)),    (2.8)

where t = (−ln[min(u, 1 − u)]^2)^{1/2} and c_0 = 2.515517, c_1 = 0.802853, c_2 = 0.010328, d_1 = 1.432788, d_2 = 0.189269, d_3 = 0.001308. The formula has absolute error less than 4.5 × 10^{−4} ([50]). Or,

    F_X^{−1}(u) ≈ y + (p_0 + p_1 y + p_2 y^2 + p_3 y^3 + p_4 y^4) / (q_0 + q_1 y + q_2 y^2 + q_3 y^3 + q_4 y^4),  0.5 < u < 1,    (2.9)

where y = √(−2 log(1 − u)), the case 0 < u < 0.5 is handled by symmetry, and p_k, q_k are given in the accompanying table.

    k    p_k                      q_k
    0    −0.322232431088          0.099348462606
    1    −1                       0.588581570495
    2    −0.342242088547          0.531103462366
    3    −0.0204231210245         0.10353775285
    4    −0.0000453642210148      0.0038560700634
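A direct implementation of formula (2.8) is sketched below. The constant c_2 is garbled in this transcript, so the code uses 0.010328, the standard value for this classical rational approximation; treat that value as an assumption. The test points are arbitrary.

```python
# Approximate inverse of the standard normal CDF, following formula (2.8).
# Note: c2 is dropped/garbled in the transcript; 0.010328 is the standard value
# of this classical rational approximation (absolute error < 4.5e-4).
import math

C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308

def inv_norm_cdf(u):
    """Return an approximation of F_X^{-1}(u) for X ~ N(0, 1), with 0 < u < 1."""
    p = min(u, 1.0 - u)
    t = math.sqrt(-math.log(p * p))
    x = t - (C0 + C1 * t + C2 * t * t) / (1.0 + D1 * t + D2 * t * t + D3 * t**3)
    return math.copysign(x, u - 0.5)

for u in (0.025, 0.5, 0.841, 0.975):
    print(f"u = {u:5.3f}   F^-1(u) ~ {inv_norm_cdf(u):+.4f}")
```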

Other techniques exist for the generation of nonuniform random variables, most notably acceptance-rejection algorithms. We will not engage in more in-depth discussions here and refer interested readers to comprehensive references such as [25, 38, 52].

2.3 RANDOM VECTORS

We say X = (X_1, ..., X_n) is an n-dimensional random vector if its components X_1, ..., X_n are one-dimensional real-valued random variables. Therefore, a random vector is nothing but a collection of a finite number of random variables. Similarly, we can also define concepts such as distribution function, moments, etc.

Definition 2.14. The collection of the probabilities

    F_X(x) = P(X_1 ≤ x_1, ..., X_n ≤ x_n),  x = (x_1, ..., x_n) ∈ R^n,    (2.10)

is the distribution function F_X of X.

If the distribution of a random vector X has a density f_X, we can represent the distribution function F_X as

    F_X(x_1, ..., x_n) = ∫_{−∞}^{x_1} ⋯ ∫_{−∞}^{x_n} f_X(y_1, ..., y_n) dy_1 ⋯ dy_n,

where the density is a function satisfying

    f_X(x) ≥ 0,  ∀x ∈ R^n,

and

    ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_X(y_1, ..., y_n) dy_1 ⋯ dy_n = 1.

If a vector X has density f_X, then all of its components X_i, the vectors of the pairs (X_i, X_j), triples (X_i, X_j, X_k), etc., have a density. They are called marginal densities. For example,

    f_{X_i}(x_i) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_X(y_1, ..., y_{i−1}, x_i, y_{i+1}, ..., y_n) dy_1 ⋯ dy_{i−1} dy_{i+1} ⋯ dy_n.

The important statistical quantities of a random vector include its expectation, variance, and covariance. The expectation or mean value of a random vector X is given by

    µ_X = E[X] = (E[X_1], ..., E[X_n]).

The covariance matrix of X is defined as

    C_X = (cov(X_i, X_j))_{i,j=1}^{n},    (2.11)

where

    cov(X_i, X_j) = E[(X_i − µ_{X_i})(X_j − µ_{X_j})] = E[X_i X_j] − µ_{X_i} µ_{X_j}    (2.12)

is the covariance of X_i and X_j. Note that cov(X_i, X_i) = σ_{X_i}^2.

It is also convenient to standardize covariances by dividing the random variables by their standard deviations. The resulting quantity

    corr(X_1, X_2) = cov(X_1, X_2) / (σ_{X_1} σ_{X_2})    (2.13)

is the correlation coefficient. An immediate fact following from the Cauchy-Schwarz inequality is

    −1 ≤ corr(X_1, X_2) ≤ 1.    (2.14)

We say the two random variables are uncorrelated if corr(X_1, X_2) = 0, and strongly correlated if |corr(X_1, X_2)| ≈ 1.


Example 2.15 (Gaussian random vector). A Gaussian or normal random vector has a Gaussian or normal distribution. The n-dimensional Gaussian distribution is defined by its density

    f_X(x) = (1 / ((2π)^{n/2} (det C_X)^{1/2})) exp{ −(1/2)(x − µ_X) C_X^{−1} (x − µ_X)^T },    (2.15)

where µ_X ∈ R^n is the expectation of X and C_X is the covariance matrix. Thus, the density of a Gaussian vector (hence its distribution) is completely determined by its expectation and covariance matrix. If C_X = I_n, the n-dimensional identity matrix, the components are called uncorrelated and the density becomes the product of n normal densities:

    f_X(x) = f_{X_1}(x_1) ⋯ f_{X_n}(x_n),

where f_{X_i}(x_i) is the normal density of N(µ_{X_i}, σ_{X_i}^2). An important and appealing property of Gaussian random vectors is that they remain Gaussian under linear transformation.

Theorem 2.16. Let X = (X_1, ..., X_n) be a Gaussian random vector with distribution N(µ, C) and let A be an m × n matrix. Then AX^T has an N(Aµ^T, ACA^T) distribution.

The proof is left as an exercise.
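Although the proof is analytical, theorem 2.16 is easy to check numerically. The sketch below, with arbitrary illustrative choices of µ, C, and A, draws samples of X ∼ N(µ, C), applies A, and compares the sample mean and covariance of AX with Aµ and ACA^T.

```python
# Numerical check of theorem 2.16: if X ~ N(mu, C) and A is an m x n matrix,
# then A X ~ N(A mu, A C A^T).  The values of mu, C, A are illustrative only.
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0, 0.5])
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])

x = rng.multivariate_normal(mu, C, size=500_000)   # samples of X, shape (K, 3)
y = x @ A.T                                        # samples of A X, shape (K, 2)

print("sample mean of AX:\n", y.mean(axis=0))
print("A mu:\n", A @ mu)
print("sample covariance of AX:\n", np.cov(y, rowvar=False))
print("A C A^T:\n", A @ C @ A.T)
```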

2.4 DEPENDENCE AND CONDITIONAL EXPECTATION

Intuitively, two random events are called independent if the outcome of one event does not influence the outcome of the other. More precisely, we state the following.

Definition 2.17. Two events A_1 and A_2 are independent if

    P(A_1 ∩ A_2) = P(A_1) P(A_2).

Definition 2.18. Two random variables X_1 and X_2 are independent if

    P(X_1 ∈ B_1, X_2 ∈ B_2) = P(X_1 ∈ B_1) P(X_2 ∈ B_2)

for all suitable subsets B_1 and B_2 of R. This means that the events {X_1 ∈ B_1} and {X_2 ∈ B_2} are independent.

Alternatively, one can define independence via distribution functions and densities. The random variables X_1, ..., X_n are independent if and only if their joint distribution function can be written as

    F_{X_1,...,X_n}(x_1, ..., x_n) = F_{X_1}(x_1) ⋯ F_{X_n}(x_n),  (x_1, ..., x_n) ∈ R^n.

If the random vector X = (X_1, ..., X_n) has density f_X, then X_1, ..., X_n are independent if and only if

    f_{X_1,...,X_n}(x_1, ..., x_n) = f_{X_1}(x_1) ⋯ f_{X_n}(x_n),  (x_1, ..., x_n) ∈ R^n.    (2.16)


It follows immediately that if X_1, ..., X_n are independent, then for any real-valued functions g_1, ..., g_n,

    E[g_1(X_1) ⋯ g_n(X_n)] = E[g_1(X_1)] ⋯ E[g_n(X_n)],

provided the considered expectations are all well defined. Hence, if X_1 and X_2 are independent, then

    corr(X_1, X_2) = cov(X_1, X_2) = 0.

This implies that independent random variables are uncorrelated. The converse is in general not true.

Example 2.19. Let X be a standard normal random variable. Since X is symmetric (its density is an even function), so is X^3. Therefore, both X and X^3 have expectation zero. Thus,

    cov(X, X^2) = E[X^3] − E[X] E[X^2] = 0,

which implies that X and X^2 are uncorrelated. However, they are clearly dependent, in the sense that knowing X determines X^2 completely. For example, since {X ∈ [−1, 1]} = {X^2 ∈ [0, 1]}, we have

    P(X ∈ [−1, 1], X^2 ∈ [0, 1]) = P(X ∈ [−1, 1])
                                 > P(X ∈ [−1, 1]) P(X^2 ∈ [0, 1])
                                 = (P(X ∈ [−1, 1]))^2.

Example 2.20. Let X be an n-dimensional Gaussian random vector with density (2.15). The components are uncorrelated when corr(X_i, X_j) = cov(X_i, X_j) = 0 for i ≠ j. This means that the correlation matrix is diagonal. In this case, the density of X can be written in the product form of (2.16), and therefore the components are independent. Thus, uncorrelatedness and independence are equivalent notions for Gaussian distributions.
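A small sampling experiment makes example 2.19 concrete: the sample correlation between X and X^2 is near zero, while the joint probability P(X ∈ [−1, 1], X^2 ∈ [0, 1]) clearly exceeds the product of the marginal probabilities. The sample size is illustrative.

```python
# X ~ N(0, 1): X and X^2 are uncorrelated but not independent (example 2.19).
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(1_000_000)
x2 = x**2

print(f"corr(X, X^2)                 ~ {np.corrcoef(x, x2)[0, 1]:+.4f}")   # near 0
p_joint = np.mean((np.abs(x) <= 1.0) & (x2 <= 1.0))                        # same event twice
p_prod = np.mean(np.abs(x) <= 1.0) * np.mean(x2 <= 1.0)
print(f"P(X in [-1,1], X^2 in [0,1]) ~ {p_joint:.4f}")
print(f"product of marginals         ~ {p_prod:.4f}   (smaller: not independent)")
```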

Another concept describing relations among random events or random variables is the conditional expectation. This is an extremely important subject in probability theory, though it is not of as much significance in this book.

From elementary probability theory, the conditional probability of A given B is

    P(A|B) = P(A ∩ B) / P(B).

Clearly,

    P(A|B) = P(A) if and only if A and B are independent.

Given that P(B) > 0, we can define the conditional distribution function of a random variable X given B,

    F_X(x|B) = P(X ≤ x, B) / P(B),  x ∈ R,

and also the conditional expectation of X given B,

    E[X|B] = E[X I_B] / P(B),    (2.17)


where

    I_B(ω) = 1 if ω ∈ B, and 0 if ω ∉ B,

denotes the indicator function of the event B. For the moment let us assume that Ω = R. If X is a discrete random variable, then (2.17) becomes

    E[X|B] = Σ_{k=1}^∞ x_k P({ω : X(ω) = x_k} ∩ B) / P(B) = Σ_{k=1}^∞ x_k P(X = x_k|B).

If X has density f_X, then (2.17) becomes

    E[X|B] = (1/P(B)) ∫ x I_B(x) f_X(x) dx = (1/P(B)) ∫_B x f_X(x) dx.

2.5 STOCHASTIC PROCESSES

In many physical systems, randomness varies either continuously or discretely over physical space and/or time. Therefore, it is necessary to study the distribution and evolution of the random variables that describe the randomness as functions of space and/or time. A mathematical model for describing this is called a stochastic process or a random process.

A stochastic process is a collection of random variables

    (X_t, t ∈ T) = (X_t(ω), t ∈ T, ω ∈ Ω)

defined on some space Ω. Here t is the index of the random variable X. The index set T can be an interval, e.g., T = [a, b], [a, b), or [a, ∞) for a < b. Then X is a continuous process. If T is a finite or countably infinite set, then X is a discrete process. Very often the index t of X_t is referred to as time. However, one should keep in mind that it is merely an index and can be a space location as well.

A stochastic process X can be considered as a function of two variables.

• For a fixed index t, it is a random variable:

    X_t = X_t(ω),  ω ∈ Ω.

• For a fixed random outcome ω ∈ Ω, it is a function of the index (time):

    X_t = X_t(ω),  t ∈ T.

  This is called a realization, a trajectory, or a sample path of the process X_t.

It is then natural to seek the statistics of the stochastic process. The task is more complicated than that for a random vector. For example, a process X_t with an infinite index set T is an infinite-dimensional object. Mathematical care is needed for such objects. A simpler approach, which suits practical needs well, is to interpret the process as a collection of random vectors. In this way, we study the finite-dimensional distributions of the stochastic process X_t, which are defined as the distributions of the finite-dimensional vectors

    (X_{t_1}, ..., X_{t_n}),  t_1, ..., t_n ∈ T,

for all possible choices of the indices t_1, ..., t_n ∈ T and every n ≥ 1. These are easier to study and indeed determine the distribution of X_t. In this sense, we refer to a collection of finite-dimensional distributions as the distribution of the stochastic process.

A stochastic process X = (X_t, t ∈ T) can be considered a collection of random vectors (X_{t_1}, ..., X_{t_n}) for t_1, ..., t_n ∈ T and n ≥ 1. We can then extend the definitions of the expectation and covariance matrices for a random vector to the process and consider these quantities as functions of t ∈ T. The expectation function of X is given by

    µ_X(t) = µ_{X_t} = E[X_t],  t ∈ T.

The covariance function of X is given by

    C_X(t, s) = cov(X_t, X_s) = E[(X_t − µ_X(t))(X_s − µ_X(s))],  t, s ∈ T.

The variance of X is given by

    σ_X^2(t) = C_X(t, t) = var(X_t),  t ∈ T.

These are obviously deterministic quantities. The expectation function µ_X(t) is a deterministic path around which the sample paths of X are concentrated. Note that in many cases µ_X(t) may not be a realizable sample path. The variance function can be considered a measure of the spread of the sample paths around the expectation. The covariance function is a measure of dependence in the process X.

The process X = (X_t, t ∈ T) is called strictly stationary if the finite-dimensional distributions of the process are invariant under shifts of the index t:

    (X_{t_1}, ..., X_{t_n}) =^d (X_{t_1+h}, ..., X_{t_n+h})

for all possible choices of indices t_1, ..., t_n ∈ T, n ≥ 1, and h such that t_1 + h, ..., t_n + h ∈ T. Here =^d stands for identity of the distributions; see the following section for the definition. In practice, a weaker version of stationarity is often adopted. A process X is called stationary in the wide sense, or second-order stationary, if its expectation is a constant and its covariance function C_X(t, s) depends only on the distance |t − s|. For a Gaussian process, since its mean and covariance function fully characterize the distribution of the process, the two concepts of stationarity become equivalent.

A large class of (extremely) useful processes can be constructed by imposing a stationarity condition (strict or in the wide sense) on the increments of the process. Examples include the homogeneous Poisson process and Brownian motion. Extensive mathematical analysis has been conducted on such processes by using nonelementary facts from measure theory and functional analysis. This, however, is not the focus of this book. We refer interested readers to the many excellent books such as [36, 55].
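To visualize these definitions, the sketch below generates sample paths of a zero-mean Gaussian process that is stationary in the wide sense. The exponential covariance C_X(t, s) = exp(−|t − s|/ℓ), the grid, and the correlation length are illustrative choices (not from the text); the paths are drawn by factoring the covariance matrix on a discrete grid.

```python
# Sample paths of a zero-mean, wide-sense stationary Gaussian process on a grid.
# Covariance model (an illustrative assumption): C(t, s) = exp(-|t - s| / ell).
import numpy as np

ell = 0.2                                            # correlation length (illustrative)
t = np.linspace(0.0, 1.0, 201)
C = np.exp(-np.abs(t[:, None] - t[None, :]) / ell)   # covariance matrix C(t_i, t_j)
L = np.linalg.cholesky(C + 1e-10 * np.eye(t.size))   # small jitter for numerical stability

rng = np.random.default_rng(7)
paths = L @ rng.standard_normal((t.size, 10_000))    # each column is one sample path X_t(omega)

# Empirical covariance between two fixed times versus the exact model value.
i, j = 50, 90                                        # t[i] = 0.25, t[j] = 0.45
print("empirical cov:", np.cov(paths[i], paths[j])[0, 1])
print("exact cov    :", np.exp(-abs(t[i] - t[j]) / ell))
```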


2.6 MODES OF CONVERGENCE

We now introduce the main modes of convergence for a sequence of random variables X_1, X_2, . . . .

Definition 2.21 (Convergence in distribution). The sequence {X_n} converges in distribution, or converges weakly, to the random variable X, written as X_n →d X, if for all bounded and continuous functions f,

E[f(X_n)] \to E[f(X)], \quad n \to \infty.

Note that X_n →d X holds if and only if, at every continuity point x of the distribution function F_X,

F_{X_n}(x) \to F_X(x), \quad n \to \infty.

If F_X is continuous, this can be strengthened to uniform convergence:

\sup_x |F_{X_n}(x) - F_X(x)| \to 0, \quad n \to \infty.

We state the following useful theorem without proving it.

Theorem 2.22. Let X_n and X be random variables with moment-generating functions m_{X_n}(t) and m_X(t), respectively. If

\lim_{n\to\infty} m_{X_n}(t) = m_X(t), \quad \forall t,

then X_n →d X as n → ∞.

Definition 2.23 (Convergence in probability). The sequence {X_n} converges in probability to X, written as X_n →P X, if for all positive ε,

P(|X_n - X| > \varepsilon) \to 0, \quad n \to \infty.

Convergence in probability implies convergence in distribution. The converse is true if and only if X = x for some constant x.

Definition 2.24 (Almost sure convergence). The sequence {X_n} converges almost surely (a.s.), or with probability 1, to the random variable X, written as X_n →a.s. X, if the set of ω with

X_n(\omega) \to X(\omega), \quad n \to \infty,

has probability 1. This implies that

P(X_n \to X) = P(\{\omega : X_n(\omega) \to X(\omega)\}) = 1.

Convergence with probability 1 implies convergence in probability, hence convergence in distribution. Convergence in probability does not imply a.s. convergence. However, X_n →P X implies that X_{n_k} →a.s. X for a suitable subsequence {X_{n_k}}.


Definition 2.25 (Lp convergence). Let p > 0. The sequence {X_n} converges in L^p, or in the pth mean, to X, written as X_n →Lp X, if E[|X_n|^p + |X|^p] < ∞ for all n and

E[|X_n - X|^p] \to 0, \quad n \to \infty.

The well-known Markov inequality ensures that P(|X_n − X| > ε) ≤ ε^{−p} E[|X_n − X|^p] for positive p and ε. Thus, X_n →Lp X implies X_n →P X. The converse is in general not true.

For p = 2, we say that X_n converges to X in mean square. Mean-square convergence is convergence in the Hilbert space

L^2 = L^2(\Omega, \mathcal{F}, P) = \{X : E[X^2] < \infty\}

endowed with the inner product ⟨X, Y⟩ = E[XY] and the norm ‖X‖ = √⟨X, X⟩.

Convergence in distribution and convergence in probability are often referred to as weak convergence, whereas a.s. convergence and L^p convergence are often referred to as strong convergence.

2.7 CENTRAL LIMIT THEOREM

The celebrated central limit theorem (CLT) plays a central role in many aspects of probability analysis. Here we state a version of the CLT that suits the exposition of this book.

Theorem 2.26. Let X_1, X_2, ..., X_n be independent and identically distributed (i.i.d.) random variables with E[X_i] = µ and var(X_i) = σ² < ∞. Let

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

and let

U_n = \sqrt{n}\, \frac{\bar{X} - \mu}{\sigma}.

Then the distribution function of U_n converges to the N(0, 1) distribution function as n → ∞.

Proof. For all i, let Z_i = (X_i − µ)/σ; then E[Z_i] = 0 and var(Z_i) = 1. Also, let µ_3 = E[Z_i^3] for all i. Then, by the property (2.6) of the moment-generating function, we have

m_{Z_i}(t) = 1 + \frac{t^2}{2} + \frac{t^3}{3!}\mu_3 + \cdots.

By definition,

U_n = \sqrt{n}\, \frac{\bar{X} - \mu}{\sigma} = \frac{1}{\sqrt{n}}\, \frac{\sum_i X_i - n\mu}{\sigma} = \frac{1}{\sqrt{n}} \sum_i Z_i.


We immediately have

m_{U_n}(t) = \prod_i m_{Z_i}\!\left(\frac{t}{\sqrt{n}}\right) = \left( m_{Z_i}\!\left(\frac{t}{\sqrt{n}}\right) \right)^n = \left( 1 + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}\mu_3 + \cdots \right)^n

and

\ln(m_{U_n}(t)) = n \ln\!\left( 1 + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}\mu_3 + \cdots \right).

Letting z = \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}\mu_3 + \cdots and using \ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots, we have

\ln(m_{U_n}(t)) = n\left( z - \frac{z^2}{2} + \cdots \right) = n\left( \frac{t^2}{2n} + \frac{t^3 \mu_3}{3!\,n^{3/2}} + \cdots \right).

It follows that

\lim_{n\to\infty} \ln(m_{U_n}(t)) = \frac{t^2}{2}, \qquad \text{i.e.,} \quad \lim_{n\to\infty} m_{U_n}(t) = e^{t^2/2},

which is the moment-generating function of a unit Gaussian random variable N(0, 1). The theorem is then proved by virtue of theorem 2.22.

This immediately implies that the sample average of a set of i.i.d. random variables {X_i}, i = 1, ..., n, is, for large n, approximately Gaussian with distribution N(µ, σ²/n), where µ and σ² are the mean and variance of the i.i.d. random variables.
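A quick numerical check of this statement (a minimal sketch, not from the text; the exponential distribution is chosen arbitrarily as a non-Gaussian example with µ = σ² = 1):

    import numpy as np
    from scipy import stats

    # Empirical CLT check: standardized averages of i.i.d. exponential variables
    # should be approximately N(0, 1) for large n.
    rng = np.random.default_rng(1)
    n, M = 200, 50000                       # sample size per average, number of averages
    X = rng.exponential(scale=1.0, size=(M, n))
    Xbar = X.mean(axis=1)

    U = np.sqrt(n) * (Xbar - 1.0) / 1.0     # U_n as in theorem 2.26
    print("KS distance to N(0,1) :", stats.kstest(U, "norm").statistic)
    print("mean, var of averages :", Xbar.mean(), Xbar.var(), " (expect ~1 and ~1/n =", 1.0/n, ")")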


Chapter Three

Survey of Orthogonal Polynomials and Approximation Theory

In this chapter we review the basic aspects of orthogonal polynomials and approximation theory. We focus exclusively on the univariate case, that is, polynomials and approximations on the real line, to establish the fundamentals of the theories.

3.1 ORTHOGONAL POLYNOMIALS

We first review the basics of orthogonal polynomials, which play a central role in modern approximation theory. The material is kept to a minimum to satisfy the needs of this book. More in-depth discussions of the properties of orthogonal polynomials can be found in many standard books such as [10, 19, 100].

From here on we adopt the standard notation of letting \mathbb{N} be the set of positive integers and \mathbb{N}_0 the set of nonnegative integers. We also let \mathcal{N} denote an index set, where either \mathcal{N} = \mathbb{N}_0 = \{0, 1, \ldots\} or \mathcal{N} = \{0, 1, \ldots, N\} for a finite nonnegative integer N.

3.1.1 Orthogonality Relations

A general polynomial of degree n takes the form

Q_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \quad a_n \neq 0, \qquad (3.1)

where a_n is the leading coefficient of the polynomial. We denote by

P_n(x) = \frac{Q_n(x)}{a_n} = x^n + \frac{a_{n-1}}{a_n} x^{n-1} + \cdots + \frac{a_1}{a_n} x + \frac{a_0}{a_n}

the monic version of this polynomial, i.e., the one with leading coefficient equal to 1.

A system of polynomials {Q_n(x), n ∈ \mathcal{N}} is an orthogonal system of polynomials with respect to some real positive measure α if the following orthogonality relations hold:

\int_S Q_n(x) Q_m(x)\, d\alpha(x) = \gamma_n \delta_{mn}, \quad m, n \in \mathcal{N}, \qquad (3.2)

where δ_{mn} (equal to 1 if m = n and 0 otherwise) is the Kronecker delta, S is the support of the measure α, and the γ_n are positive constants often termed normalization constants. Obviously,

\gamma_n = \int_S Q_n^2(x)\, d\alpha(x), \quad n \in \mathcal{N}.


If γ_n = 1, the system is orthonormal. Note that by defining \tilde{Q}_n(x) = Q_n(x)/\sqrt{\gamma_n}, the system {\tilde{Q}_n} is orthonormal.

The measure α usually either has a density w(x) or is a discrete measure with weights w_i at points x_i. The relations (3.2) then become

\int_S Q_n(x) Q_m(x) w(x)\, dx = \gamma_n \delta_{mn}, \quad m, n \in \mathcal{N}, \qquad (3.3)

in the former case and

\sum_i Q_n(x_i) Q_m(x_i) w_i = \gamma_n \delta_{mn}, \quad m, n \in \mathcal{N}, \qquad (3.4)

in the latter case, where it is possible that the summation is infinite. If we define a weighted inner product

(u, v)_{d\alpha} = \int_S u(x) v(x)\, d\alpha(x), \qquad (3.5)

which in the continuous case takes the form

(u, v)_w = \int_S u(x) v(x) w(x)\, dx \qquad (3.6)

and in the discrete case takes the form

(u, v)_w = \sum_i u(x_i) v(x_i) w_i, \qquad (3.7)

then the orthogonality relations can be written as

(Q_m, Q_n)_w = \gamma_n \delta_{mn}, \quad m, n \in \mathcal{N}, \qquad (3.8)

where

\gamma_n = (Q_n, Q_n)_w = \|Q_n\|_w^2, \quad n \in \mathcal{N}. \qquad (3.9)

3.1.2 Three-Term Recurrence Relation

It is well known that all orthogonal polynomials {Q_n(x)} on the real line satisfy a three-term recurrence relation

x Q_n(x) = b_n Q_{n+1}(x) + a_n Q_n(x) + c_n Q_{n-1}(x), \quad n \ge 1, \qquad (3.10)

where b_n, c_n ≠ 0 and c_n/b_{n-1} > 0. Along with Q_{-1}(x) = 0 and Q_0(x) = 1, the three-term recurrence defines the polynomial system completely. Often the relation is written in a different form,

Q_{n+1}(x) = (A_n x + B_n) Q_n(x) - C_n Q_{n-1}(x), \quad n \ge 0,

and Favard proved the following converse result ([19]).

Theorem 3.1 (Favard's theorem). Let A_n, B_n, and C_n be arbitrary sequences of real numbers and let {Q_n(x)} be defined by the recurrence relation

Q_{n+1}(x) = (A_n x + B_n) Q_n(x) - C_n Q_{n-1}(x), \quad n \ge 0,

together with Q_0(x) = 1 and Q_{-1}(x) = 0. Then the {Q_n(x)} are a system of orthogonal polynomials if and only if A_n ≠ 0, C_n ≠ 0, and C_n A_n A_{n-1} > 0 for all n.


3.1.3 Hypergeometric Series and the Askey Scheme

Most orthogonal polynomials can be expressed in a unified way by using hypergeometric series and incorporated in the Askey scheme. To this end, we first define the Pochhammer symbol (a)_n as

(a)_n = \begin{cases} 1, & n = 0, \\ a(a+1)\cdots(a+n-1), & n = 1, 2, \ldots. \end{cases} \qquad (3.11)

If a ∈ \mathbb{N} is an integer, then

(a)_n = \frac{(a+n-1)!}{(a-1)!}, \quad n > 0,

and for general a ∈ \mathbb{R},

(a)_n = \frac{\Gamma(a+n)}{\Gamma(a)}, \quad n > 0.

The generalized hypergeometric series {}_rF_s is defined by

{}_rF_s(a_1, \ldots, a_r; b_1, \ldots, b_s; z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_r)_k}{(b_1)_k \cdots (b_s)_k} \frac{z^k}{k!}, \qquad (3.12)

where b_i ≠ 0, −1, −2, ..., for all i = 1, ..., s. There are r parameters in the numerator and s parameters in the denominator; clearly, the order of the parameters is immaterial. For example,

{}_0F_0(\,;\,; z) = \sum_{k=0}^{\infty} \frac{z^k}{k!}

is the power series of the exponential function.

When the series is infinite, its radius of convergence ρ is

\rho = \begin{cases} \infty, & r < s + 1, \\ 1, & r = s + 1, \\ 0, & r > s + 1. \end{cases}

If one of the numerator parameters a_i, i = 1, ..., r, is a negative integer, say a_1 = −n, the series terminates because (a_1)_k = (−n)_k = 0 for k = n + 1, n + 2, ..., and becomes

{}_rF_s(a_1, \ldots, a_r; b_1, \ldots, b_s; z) = \sum_{k=0}^{n} \frac{(-n)_k \cdots (a_r)_k}{(b_1)_k \cdots (b_s)_k} \frac{z^k}{k!}. \qquad (3.13)

This is a polynomial of degree n.

The Askey scheme, which can be represented as a tree structure as shown in figure 3.1, classifies the hypergeometric orthogonal polynomials and indicates the limit relations between them. The tree starts with the Wilson and Racah polynomials at the top. They both belong to the class {}_4F_3 of hypergeometric orthogonal polynomials; Wilson polynomials are continuous and Racah polynomials are discrete. The lines connecting different polynomials denote limit transition relationships between them, which means that the polynomials at the lower ends of the lines can be obtained by taking the limit of some parameters in the polynomials at the upper ends.


[Figure 3.1. The Askey scheme of hypergeometric orthogonal polynomials: Wilson and Racah (4F3) at the top; continuous dual Hahn, continuous Hahn, Hahn, and dual Hahn (3F2); Meixner-Pollaczek, Jacobi, Meixner, and Krawtchouk (2F1); Laguerre (1F1) and Charlier (2F0); and Hermite (2F0) at the bottom, connected by lines indicating limit relations.]

For example, the limit relation between the Jacobi polynomials P_n^{(\alpha,\beta)}(x) and the Hermite polynomials H_n(x) is

\lim_{\alpha\to\infty} \alpha^{-\frac{1}{2}n}\, P_n^{(\alpha,\alpha)}\!\left(\frac{x}{\sqrt{\alpha}}\right) = \frac{H_n(x)}{2^n n!},

and that between the Meixner polynomials M_n(x; \beta, c) and the Charlier polynomials C_n(x; a) is

\lim_{\beta\to\infty} M_n\!\left(x; \beta, \frac{a}{a+\beta}\right) = C_n(x; a).

For a detailed account of hypergeometric polynomials and the Askey scheme, the interested reader should consult [60] and [91].

3.1.4 Examples of Orthogonal Polynomials

Here we present several orthogonal polynomials that will be used extensively in this book. The focus is on continuous polynomials. More specifically, we discuss Legendre polynomials defined on [−1, 1], Hermite polynomials defined on R, and Laguerre polynomials defined on [0, ∞). These correspond to polynomials with support on a bounded interval (with proper scaling), the entire real line, and the half real line, respectively.


3.1.4.1 Legendre Polynomials

Legendre polynomials

P_n(x) = {}_2F_1\!\left(-n, n+1; 1; \frac{1-x}{2}\right) \qquad (3.14)

satisfy

P_{n+1}(x) = \frac{2n+1}{n+1}\, x P_n(x) - \frac{n}{n+1}\, P_{n-1}(x), \quad n > 0, \qquad (3.15)

and

\int_{-1}^{1} P_n(x) P_m(x)\, dx = \frac{2}{2n+1}\, \delta_{mn}. \qquad (3.16)

Obviously the weight function in the orthogonality relation is a constant, i.e., w(x) = 1. The first few Legendre polynomials are

P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \frac{3}{2}x^2 - \frac{1}{2}, \quad \ldots.

Legendre polynomials are a special case of the Jacobi polynomials P_n^{(\alpha,\beta)}(x) with parameters α = β = 0. The details for Jacobi polynomials can be found in appendix A.

3.1.4.2 Hermite Polynomials

Hermite polynomials

H_n(x) = x^n\, {}_2F_0\!\left(-\frac{n}{2}, -\frac{n-1}{2};\; ; -\frac{2}{x^2}\right) \qquad (3.17)

satisfy

H_{n+1}(x) = x H_n(x) - n H_{n-1}(x), \quad n > 0, \qquad (3.18)

and

\int_{-\infty}^{\infty} H_m(x) H_n(x) w(x)\, dx = n!\, \delta_{mn}, \qquad (3.19)

where

w(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.

The first few Hermite polynomials are

H_0(x) = 1, \quad H_1(x) = x, \quad H_2(x) = x^2 - 1, \quad H_3(x) = x^3 - 3x, \quad \ldots.

Note that the definition of H_n(x) here is slightly different from the classical one used in the literature. The classical Hermite polynomials H_n(x) are often defined by

H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x), \quad n > 0,

and

\int_{-\infty}^{\infty} H_n(x) H_m(x) w(x)\, dx = 2^n n!\, \delta_{mn},

where w(x) = \frac{1}{\sqrt{\pi}} e^{-x^2}. The two expressions differ only by a scaling of the argument and of the normalization. We employ the present H_n(x) to facilitate the discussions associated with probability theory.

3.1.4.3 Laguerre Polynomials

Laguerre polynomials

L_n^{(\alpha)}(x) = \frac{(\alpha+1)_n}{n!}\, {}_1F_1(-n; \alpha+1; x), \quad \alpha > -1, \qquad (3.20)

satisfy

(n+1) L_{n+1}^{(\alpha)}(x) = (-x + 2n + \alpha + 1) L_n^{(\alpha)}(x) - (n+\alpha) L_{n-1}^{(\alpha)}(x), \quad n > 0, \qquad (3.21)

and

\int_0^{\infty} L_m^{(\alpha)}(x) L_n^{(\alpha)}(x) w(x)\, dx = \frac{\Gamma(n+\alpha+1)}{n!}\, \delta_{mn}, \qquad (3.22)

where

w(x) = e^{-x} x^{\alpha}.

Note that the sign of the leading coefficient of the Laguerre polynomials alternates with increasing degree.

3.2 FUNDAMENTAL RESULTS OF POLYNOMIAL APPROXIMATION

Let P_n be the linear space of polynomials of degree at most n, i.e.,

P_n = \mathrm{span}\{x^k : k = 0, 1, \ldots, n\}. \qquad (3.23)

We begin with a classical theorem by Weierstrass in approximation theory.

Theorem 3.2 (Weierstrass). Let I be a bounded interval and let f ∈ C^0(I). Then, for any ε > 0, we can find n ∈ \mathbb{N} and p ∈ P_n such that

|f(x) - p(x)| < \varepsilon, \quad \forall x \in I.

We skip the proof here; interested readers can find it in various books on approximation theory, for example, [18, 105, 106]. This celebrated theorem states that any continuous function on a bounded closed interval can be uniformly approximated by polynomials. From this theorem, a large variety of sophisticated results have emerged. A natural question to ask is whether, among all the polynomials of degree less than or equal to a fixed integer n, it is possible to find one that best approximates a given continuous function f uniformly in I. In other words, we would like to study the existence of φ_n(f) ∈ P_n such that

\|f - \varphi_n(f)\|_\infty = \inf_{\psi \in P_n} \|f - \psi\|_\infty. \qquad (3.24)


This problem admits a unique solution, though the proof is very involved. An extensive and general treatise on this subject can be found in [105]. The nth-degree polynomial φ_n(f) is called the polynomial of best uniform approximation of f in I. Following theorem 3.2, one immediately obtains

\lim_{n\to\infty} \|f - \varphi_n(f)\|_\infty = 0.

3.3 POLYNOMIAL PROJECTION

From here on we do not restrict ourselves to bounded intervals and consider the general cases of I, where I = [−1, 1], I = [0, ∞), or I = R.

Another best approximation problem can be formulated in terms of norms other than the infinity norm used in (3.24). To this end, we define, for a positive weight function w(x), x ∈ I, the weighted L^2 space

L^2_w(I) := \left\{ v : I \to \mathbb{R} \;\Big|\; \int_I v^2(x) w(x)\, dx < \infty \right\} \qquad (3.25)

with the inner product

(u, v)_{L^2_w(I)} = \int_I u(x) v(x) w(x)\, dx, \quad \forall u, v \in L^2_w(I), \qquad (3.26)

and the norm

\|u\|_{L^2_w(I)} = \left( \int_I u^2(x) w(x)\, dx \right)^{1/2}. \qquad (3.27)

Throughout this book, we will often use the simplified notation (u, v)_w and ‖u‖_w to stand for (u, v)_{L^2_w(I)} and ‖u‖_{L^2_w(I)}, respectively, unless confusion would arise.

3.3.1 Orthogonal Projection

Let N be a fixed nonnegative integer and let {φ_k(x)}_{k=0}^N ⊂ P_N be orthogonal polynomials of degree at most N with respect to the positive weight w(x); i.e.,

(\varphi_m(x), \varphi_n(x))_{L^2_w(I)} = \|\varphi_m\|^2_{L^2_w(I)}\, \delta_{mn}, \quad 0 \le m, n \le N. \qquad (3.28)

We introduce the projection operator P_N : L^2_w(I) → P_N such that, for any function f ∈ L^2_w(I),

P_N f := \sum_{k=0}^{N} \hat{f}_k \varphi_k(x), \qquad (3.29)

where

\hat{f}_k := \frac{1}{\|\varphi_k\|^2_{L^2_w}} (f, \varphi_k)_{L^2_w}, \quad 0 \le k \le N. \qquad (3.30)

Obviously, P_N f ∈ P_N. It is called the orthogonal projection of f onto P_N via the inner product (·, ·)_{L^2_w}, and the {\hat{f}_k} are the (generalized) Fourier coefficients.


The following trivial facts hold:

P_N f = f, \quad \forall f \in P_N; \qquad P_N \varphi_k = 0, \quad \forall k > N.

Moreover, we have the following theorem.

Theorem 3.3. For any f ∈ L^2_w(I) and any N ∈ \mathbb{N}_0, P_N f is the best approximation in the weighted L^2 norm (3.27) in the sense that

\|f - P_N f\|_{L^2_w} = \inf_{\psi \in P_N} \|f - \psi\|_{L^2_w}. \qquad (3.31)

Proof. Any polynomial ψ ∈ P_N can be written in the form ψ = \sum_{k=0}^N c_k \varphi_k for some real coefficients c_k, 0 ≤ k ≤ N. Minimizing ‖f − ψ‖_{L^2_w} is equivalent to minimizing ‖f − ψ‖²_{L^2_w}, whose derivatives are

\frac{\partial}{\partial c_j} \|f - \psi\|^2_{L^2_w} = \frac{\partial}{\partial c_j} \left( \|f\|^2_{L^2_w} - 2 \sum_{k=0}^{N} c_k (f, \varphi_k)_{L^2_w} + \sum_{k=0}^{N} c_k^2 \|\varphi_k\|^2_{L^2_w} \right) = -2 (f, \varphi_j)_{L^2_w} + 2 c_j \|\varphi_j\|^2_{L^2_w}, \quad 0 \le j \le N.

By setting the derivatives to zero, the unique minimum is attained when c_j = \hat{f}_j, 0 ≤ j ≤ N, where the \hat{f}_j are the Fourier coefficients of f in (3.30). This completes the proof.

The projection operator also takes the name orthogonal projector in the sense that the error f − P_N f is orthogonal to the polynomial space P_N.

Theorem 3.4. For any f ∈ L^2_w(I) and N ∈ \mathbb{N}_0,

\int_I (f - P_N f)\, \varphi\, w\, dx = (f - P_N f, \varphi)_{L^2_w} = 0, \quad \forall \varphi \in P_N. \qquad (3.32)

Proof. Let φ ∈ P_N and define G : \mathbb{R} → \mathbb{R} by

G(\nu) := \|f - P_N f + \nu \varphi\|^2_{L^2_w}, \quad \nu \in \mathbb{R}.

From theorem 3.3, ν = 0 is a minimum of G. Therefore,

G'(\nu) = 2 \int_I (f - P_N f)\, \varphi\, w\, dx + 2 \nu \|\varphi\|^2_{L^2_w}

should satisfy G′(0) = 0, and (3.32) follows directly.

From (3.32) we immediately obtain the Schwarz inequality

\|P_N f\|_{L^2_w} \le \|f\|_{L^2_w} \qquad (3.33)

and the Parseval identity

\|f\|^2_{L^2_w} = \sum_{k=0}^{\infty} \hat{f}_k^2\, \|\varphi_k\|^2_{L^2_w}. \qquad (3.34)


3.3.2 Spectral Convergence

The convergence of the orthogonal projection can be stated as follows.

Theorem 3.5. For any f ∈ L^2_w(I),

\lim_{N\to\infty} \|f - P_N f\|_{L^2_w} = 0. \qquad (3.35)

We skip the proof here. When I is bounded, the proof is straightforward; see, for example, [34]. When I is unbounded, the proof is more delicate, and we refer readers to [22] for details.

The rate of convergence depends on the regularity of f and the type of orthogonal polynomials {φ_k}. There is a large amount of literature devoted to this subject. As a demonstration we present here a result for Legendre polynomials.

Define a weighted Sobolev space H^k_w(I), for k = 0, 1, 2, ..., by

H^k_w(I) := \left\{ v : I \to \mathbb{R} \;\Big|\; \frac{d^m v}{dx^m} \in L^2_w(I), \; 0 \le m \le k \right\}, \qquad (3.36)

equipped with the inner product

(u, v)_{H^k_w} := \sum_{m=0}^{k} \left( \frac{d^m u}{dx^m}, \frac{d^m v}{dx^m} \right)_{L^2_w} \qquad (3.37)

and the norm ‖u‖_{H^k_w} = (u, u)^{1/2}_{H^k_w}.

Let us consider the case of I = [−1, 1] with weight function w(x) = 1 and the Legendre polynomials {P_n(x)} (section 3.1.4). The orthogonal projection of any f(x) ∈ L^2_w(I) is

P_N f(x) = \sum_{k=0}^{N} \hat{f}_k P_k(x), \qquad \hat{f}_k = \frac{1}{\|P_k\|^2_{L^2_w}} (f, P_k)_{L^2_w}. \qquad (3.38)

The following result holds.

Theorem 3.6. For any f(x) ∈ H^p_w[−1, 1], p ≥ 0, there exists a constant C, independent of N, such that

\|f - P_N f\|_{L^2_w[-1,1]} \le C N^{-p} \|f\|_{H^p_w[-1,1]}. \qquad (3.39)

Proof. The Legendre polynomials satisfy (see (A.25) in appendix A)

Q[P_k] = \lambda_k P_k,

where

Q = \frac{d}{dx}\left( (1-x^2) \frac{d}{dx} \right) = (1-x^2)\frac{d^2}{dx^2} - 2x \frac{d}{dx}

and λ_k = −k(k + 1). We then have

(f, P_k)_{L^2_w} = \frac{1}{\lambda_k} \int_{-1}^{1} Q[P_k]\, f(x)\, dx = \frac{1}{\lambda_k} \int_{-1}^{1} \left( (1-x^2) P_k'' f - 2x P_k' f \right) dx = -\frac{1}{\lambda_k} \int_{-1}^{1} \left[ \left( (1-x^2) f \right)' P_k' + 2x P_k' f \right] dx,

where integration by parts has been applied to the first term of the integrand to derive the last equality. Upon simplifying the last expression, we obtain

(f, P_k)_{L^2_w} = -\frac{1}{\lambda_k} \int_{-1}^{1} (1-x^2) f' P_k'\, dx = \frac{1}{\lambda_k} \int_{-1}^{1} \left( (1-x^2) f' \right)' P_k\, dx,

where integration by parts is again utilized. This implies

(f, P_k)_{L^2_w} = \frac{1}{\lambda_k} \left( Q[f], P_k \right)_{L^2_w}.

By applying the procedure repeatedly, m times, we obtain

(f, P_k)_{L^2_w} = \frac{1}{\lambda_k^m} \left( Q^m[f], P_k \right)_{L^2_w}.

The projection error can then be estimated as

\|f - P_N f\|^2_{L^2_w} = \sum_{k=N+1}^{\infty} \hat{f}_k^2 \|P_k\|^2_{L^2_w}
= \sum_{k=N+1}^{\infty} \frac{1}{\|P_k\|^2_{L^2_w}} (f, P_k)^2_{L^2_w}
= \sum_{k=N+1}^{\infty} \frac{1}{\lambda_k^{2m} \|P_k\|^2_{L^2_w}} (Q^m[f], P_k)^2_{L^2_w}
\le \lambda_N^{-2m} \sum_{k=0}^{\infty} \frac{1}{\|P_k\|^2_{L^2_w}} (Q^m[f], P_k)^2_{L^2_w}
\le N^{-4m} \|Q^m[f]\|^2_{L^2_w} \le C N^{-4m} \|f\|^2_{H^{2m}_w},

where the last inequality relies on ‖Q^m[f]‖_{L^2_w} ≤ C‖f‖_{H^{2m}_w}, a direct consequence of the definitions of Q^m[f] and the norms. Taking p = 2m establishes the theorem.

Therefore, the rate of convergence of the Legendre approximation relies on the smoothness of the function f, measured by its differentiability. For a fixed approximation order N, the smoother the function f, the larger the value of p and the smaller the approximation error. This kind of convergence rate is referred to in the literature as spectral convergence. It is in contrast to traditional finite difference or finite element approximations, where the rate of convergence is fixed regardless of the smoothness of the function. An example of spectral convergence is shown in figure 3.2, where the error convergence of the Legendre projections of |sin(πx)|³ and |x| is given. Both functions have finite, but different, smoothness. The convergence rates of the two functions are clearly different in this log-log figure, with |sin(πx)|³ having a faster rate because of its higher differentiability.

If f(x) is analytic, i.e., infinitely smooth, the convergence rate is faster than any algebraic order and we expect

\|f - P_N f\|_{L^2_w} \sim C e^{-\alpha N} \|f\|_{L^2_w},

where C and α are generic positive constants. Thus, for an analytic function, spectral convergence becomes exponential convergence.


[Figure 3.2. Spectral convergence: projection error of |sin(πx)|³ and |x| by Legendre polynomials in x ∈ [−1, 1], plotted against the order of expansion N on a log-log scale.]

[Figure 3.3. Exponential convergence: projection error of cos(πx) by Legendre polynomials in x ∈ [−1, 1], plotted against the order of expansion N on a semilog scale.]

An example of exponential convergence is shown in figure 3.3, where the projection error of cos(πx) by Legendre polynomials is plotted. Exponential convergence is visible in this kind of semilog plot.
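A small numerical experiment along the lines of figures 3.2 and 3.3 (an illustrative sketch, not the code used for the book's figures; the projection coefficients are computed with a high-order Gauss-Legendre rule as a stand-in for the exact integrals):

    import numpy as np

    def legendre_proj_error(f, N, n_quad=400):
        """L2 error ||f - P_N f|| on [-1, 1] with w = 1, via Gauss-Legendre quadrature."""
        x, w = np.polynomial.legendre.leggauss(n_quad)
        V = np.polynomial.legendre.legvander(x, N)        # P_0..P_N at the nodes
        norms = 2.0 / (2*np.arange(N + 1) + 1)
        fhat = (V.T * w) @ f(x) / norms                   # Fourier coefficients (3.30)
        resid = f(x) - V @ fhat                           # f - P_N f at the nodes
        return np.sqrt(np.sum(w * resid**2))

    funcs = {
        "|sin(pi x)|^3 (finite smoothness)": lambda x: np.abs(np.sin(np.pi*x))**3,
        "|x| (less smooth)":                 lambda x: np.abs(x),
        "cos(pi x) (analytic)":              lambda x: np.cos(np.pi*x),
    }
    for name, f in funcs.items():
        errs = [legendre_proj_error(f, N) for N in (4, 8, 16, 32)]
        print(name, ["%.2e" % e for e in errs])

The first two errors decay algebraically at different rates, while the error for the analytic function decays exponentially, mirroring the two figures.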

3.3.3 Gibbs Phenomenon

[Figure 3.4. Orthogonal series expansion of the sign function by Legendre polynomials for expansion orders N = 5, 9, 15, 19.]

When the function f is not analytic, the rate of convergence of the polynomial projection is no longer faster than an algebraic rate. In the case of discontinuous functions, the convergence rate deteriorates significantly. For example, consider the sign function in (−1, 1):

\mathrm{sgn}(x) = \begin{cases} 1, & x > 0, \\ -1, & x < 0. \end{cases}

Its Legendre series expansion is

\mathrm{sgn}(x) = \sum_{n=0}^{\infty} \frac{(-1)^n (4n+3)(2n)!}{2^{2n+1}(n+1)!\, n!}\, P_{2n+1}(x).

The partial sums of the series are plotted in figure 3.4 for several values of N. We observe oscillations near the discontinuity, and they do not disappear as N is increased. This is referred to as the Gibbs phenomenon. It is a numerical artifact of using globally smooth polynomial basis functions to approximate a discontinuous function. In fact, for this Legendre series expansion, the Gibbs phenomenon has a long-range effect in the sense that it seriously affects the rate of convergence at the endpoints x = ±1 of the interval. For a more detailed discussion of the Gibbs phenomenon, see [47].
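The partial sums are easy to reproduce (a small sketch added here, assuming only the series above; the overshoot near x = 0 remains of roughly the same height as N grows, which is the hallmark of the Gibbs phenomenon):

    import numpy as np
    from math import factorial

    def sgn_legendre_partial_sum(N, x):
        """Partial sum of the Legendre expansion of sgn(x), odd modes up to degree N."""
        coeffs = np.zeros(N + 1)
        for n in range((N - 1) // 2 + 1):                 # contributes P_{2n+1}
            coeffs[2*n + 1] = ((-1)**n * (4*n + 3) * factorial(2*n)
                               / (2**(2*n + 1) * factorial(n + 1) * factorial(n)))
        return np.polynomial.legendre.legval(x, coeffs)

    x = np.linspace(-1, 1, 2001)
    for N in (5, 9, 15, 19):
        overshoot = sgn_legendre_partial_sum(N, x).max()
        print(f"N = {N:2d}: max of partial sum = {overshoot:.3f} (the exact maximum of sgn is 1)")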

3.4 POLYNOMIAL INTERPOLATION

The goal of polynomial interpolation is to construct a polynomial approximation to a function whose values are known at some discrete points. More precisely, given m + 1 pairs (x_i, y_i), the problem consists of finding a function G = G(x) such that G(x_i) = y_i for i = 0, ..., m, where the y_i are given values; G is said to interpolate {y_i} at the nodes {x_i}. In polynomial interpolation, G can be an algebraic polynomial, a trigonometric polynomial, a piecewise polynomial (that is, a local polynomial), a rational polynomial, etc. In the brief introduction here, we focus on global polynomials of algebraic form.

3.4.1 Existence

Let us consider N + 1 pairs (x_i, y_i). The problem is to find a polynomial Q_M ∈ P_M, called an interpolating polynomial, such that

Q_M(x_i) = a_M x_i^M + \cdots + a_1 x_i + a_0 = y_i, \quad i = 0, \ldots, N. \qquad (3.40)

The points {x_i} are called interpolation nodes. If N ≠ M, the problem is over- or underdetermined. If N = M, the following results hold.

Theorem 3.7. Given N + 1 distinct points x_0, ..., x_N and N + 1 corresponding values y_0, ..., y_N, there exists a unique polynomial Q_N ∈ P_N such that Q_N(x_i) = y_i for i = 0, ..., N.

Proof. To prove existence, let us use a constructive approach that provides an expression for Q_N. Denote by {l_i}_{i=0}^N a basis for P_N; then Q_N(x) = \sum_{j=0}^N b_j l_j(x) with the property that

Q_N(x_i) = \sum_{j=0}^{N} b_j l_j(x_i) = y_i, \quad i = 0, \ldots, N.

Let us define

l_i \in P_N: \quad l_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{N} \frac{x - x_j}{x_i - x_j}, \quad i = 0, \ldots, N; \qquad (3.41)

then l_i(x_j) = δ_{ij}, and we obtain b_i = y_i. It is easy to verify that {l_i, i = 0, ..., N} form a basis for P_N (left as an exercise). Consequently, the interpolating polynomial exists and has the following form, called the Lagrange form:

Q_N(x) = \sum_{i=0}^{N} y_i l_i(x). \qquad (3.42)

To prove uniqueness, suppose that another interpolating polynomial Q_M(x) of degree M ≤ N exists such that Q_M(x_i) = y_i, i = 0, ..., N. Then the difference polynomial Q_N − Q_M vanishes at the N + 1 distinct points x_i and thus coincides with the null polynomial. Therefore, Q_M = Q_N.

Another approach also provides a way of constructing the interpolating polynomial. Let

Q_N(x_i) = a_N x_i^N + \cdots + a_1 x_i + a_0 = y_i, \quad i = 0, \ldots, N. \qquad (3.43)

This is a system of N + 1 equations for the N + 1 unknown coefficients a_0, ..., a_N. By letting a = (a_0, ..., a_N)^T, y = (y_0, ..., y_N)^T, and A = (a_{ij}) = (x_i^j), system (3.43) can be written as

A a = y.


[Figure 3.5. Polynomial interpolation of f(x) = 1/(1 + 25x²) on [−1, 1], the rescaled Runge function. Left: interpolation on uniformly distributed nodes. Right: interpolation on nonuniform nodes (the zeros of Chebyshev polynomials).]

The matrix A is called a Vandermonde matrix, which can be shown to be nonsingular (left as an exercise). Therefore, a unique set of coefficients a exists.

3.4.2 Interpolation Error

Let f(x), x ∈ I, be a given function and let Π_N f(x) be its interpolating polynomial of degree N constructed by using the values of f(x) at N + 1 distinct points. Then the following result holds.

Theorem 3.8. Let x_0, ..., x_N be N + 1 distinct nodes and let x be a point inside the interval I. Assume that f ∈ C^{N+1}(I_x), where I_x is the smallest interval containing the nodes x_0, ..., x_N and x. Then the interpolation error at the point x is

E_N(x) = f(x) - \Pi_N f(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!}\, q_{N+1}(x), \qquad (3.44)

where ξ ∈ I_x and q_{N+1}(x) = \prod_{i=0}^{N} (x - x_i) is the nodal polynomial of degree N + 1.

The proof for this standard result is skipped here (see, for example, [6]).

We should note that high-degree polynomial interpolation on a set of uniformly distributed nodes is likely to lead to problems, with q_{N+1}(x) behaving rather wildly near the endpoint nodes. This leads to Π_N f(x) failing to converge even for simple functions such as f(x) = (1 + x²)^{−1} on [−5, 5], a famous example due to Carl Runge and termed the Runge phenomenon. To circumvent the difficulty, it is essential to utilize either piecewise low-degree polynomial interpolation or high-degree interpolation on nonuniform nodes. In the latter approach, the zeros of orthogonal polynomials provide an excellent choice. This can be seen in figure 3.5, where the Runge function is rescaled to the domain [−1, 1] and interpolated at 26 points (N = 25). The interpolation on uniformly distributed nodes (on the left in figure 3.5) becomes ill-conditioned and incurs large errors, whereas the interpolation on nonuniformly distributed nodes (on the right in figure 3.5) is stable and accurate.
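This behavior is easy to reproduce (a minimal sketch, not the book's code): interpolate the rescaled Runge function at 26 uniform nodes and at 26 Chebyshev nodes and compare the maximum errors on a fine grid.

    import numpy as np
    from scipy.interpolate import BarycentricInterpolator

    f = lambda x: 1.0 / (1.0 + 25.0 * x**2)      # rescaled Runge function on [-1, 1]
    N = 25                                        # polynomial degree (26 nodes)

    x_uni = np.linspace(-1.0, 1.0, N + 1)                              # uniform nodes
    x_cheb = np.cos((2*np.arange(N + 1) + 1) * np.pi / (2*(N + 1)))    # Chebyshev zeros

    x_fine = np.linspace(-1.0, 1.0, 5001)
    for name, nodes in [("uniform", x_uni), ("Chebyshev", x_cheb)]:
        p = BarycentricInterpolator(nodes, f(nodes))   # stable form of Lagrange interpolation
        err = np.max(np.abs(p(x_fine) - f(x_fine)))
        print(f"{name:9s} nodes: max interpolation error = {err:.2e}")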

3.5 ZEROS OF ORTHOGONAL POLYNOMIALS AND QUADRATURE

It is well known that a polynomial of degree N has at most N distinct complex zeros. Although very little can be said about real zeros in general, the following result holds.

Theorem 3.9. Let {Q_n(x)}_{n∈\mathcal{N}}, x ∈ I, be orthogonal polynomials satisfying the orthogonality relation (3.8). Then, for any n ≥ 1, Q_n has exactly n real distinct zeros in I.

Proof. First note that (Q_n, 1)_w = 0. Therefore Q_n changes sign in I and hence has at least one real zero z_1 ∈ I, which proves the case n = 1. For n > 1, we can find another zero z_2 ∈ I with z_2 ≠ z_1, since if Q_n vanished only at z_1, the polynomial (x − z_1)Q_n would not change sign in I, in contradiction to the relation ((x − z_1), Q_n)_w = 0 obtained from orthogonality. In a similar fashion, we consider the polynomial (x − z_1)(x − z_2)Q_n and, if n > 2, deduce the existence of a third zero, and so on. The procedure ends when all n zeros are obtained.

There are many properties of the zeros of orthogonal polynomials. For example, one can prove that Q_n and Q_{n−1} have no common zeros, and that between any two neighboring zeros of Q_{n−1} there is one and only one zero of Q_n. Let us recall the following statement.

Theorem 3.10. Let {Q_n}_{n∈\mathcal{N}} be a sequence of orthogonal polynomials in I. Then, for any interval [a, b] ⊂ I, a < b, it is possible to find m ∈ \mathbb{N} such that Q_m has at least one zero in [a, b].

In other words, this theorem states that the set J = \bigcup_{n \ge 1} \bigcup_{k=1}^{n} \{z_k^{(n)}\} is dense in I, where the {z_k^{(n)}} are the zeros of the orthogonal polynomial Q_n. The proof can be found in classical texts on polynomials such as [100].

For example, the zeros of the Legendre polynomials P_n(x), x ∈ [−1, 1], satisfy

-1 \le -\cos\frac{\left(k - \frac{1}{2}\right)\pi}{n + \frac{1}{2}} \le z_k^{(n)} \le -\cos\frac{k\pi}{n + \frac{1}{2}} \le 1, \quad 1 \le k \le n.

The length of the interval between two consecutive zeros is

L = -\cos\frac{k\pi}{n + \frac{1}{2}} + \cos\frac{\left(k - \frac{1}{2}\right)\pi}{n + \frac{1}{2}} = 2 \sin\frac{\left(2k - \frac{1}{2}\right)\pi}{2n + 1}\, \sin\frac{\frac{1}{2}\pi}{2n + 1}.

For large n, L ∝ n^{−2} for k ≈ 1 or k ≈ n, and L ∝ n^{−1} for moderate values of k. This indicates that the zeros of Legendre polynomials cluster toward the endpoints of the interval [−1, 1], a feature shared by many orthogonal polynomials.

The nonuniform distribution of the zeros of orthogonal polynomials makes them excellent candidates for polynomial interpolation (see figure 3.5). In addition, they are also excellent for numerical integration.


Let

I[f] := \int_I f(x) w(x)\, dx \qquad (3.45)

and define an integration formula with q ≥ 1 points,

U^q[f] := \sum_{j=1}^{q} f(x^{(j)}) w^{(j)}, \qquad (3.46)

where the x^{(j)} are a set of nodes and the w^{(j)} are integration weights, j = 1, ..., q. The objective is to find a set {x^{(j)}, w^{(j)}} such that U^q[f] ≈ I[f] and, hopefully, lim_{q→∞} U^q[f] = I[f]. For example, the well-known trapezoidal rule approximates (3.45) on an interval [a, b] by

I[f] \approx \frac{b-a}{2} \left[ f(a) + f(b) \right],

which corresponds, in (3.46), to q = 2, {x^{(j)}} = {a, b}, and w^{(j)} = (b − a)/2, j = 1, 2.

Highly accurate integration formulas can be constructed by using orthogonal polynomials. Let {Q_n}_{n∈\mathcal{N}} be orthogonal polynomials satisfying (3.8) and let {z_k^{(N)}}_{k=1}^N be the zeros of Q_N. Let l_j^{(N-1)} be the (N − 1)th-degree Lagrange polynomials through the nodes z_j^{(N)}, 1 ≤ j ≤ N, and let \Pi_{N-1} f(x) = \sum_{j=1}^N f(z_j^{(N)}) l_j^{(N-1)}(x) be the (N − 1)th-degree interpolation of f(x). Then the integral (3.45) can be approximated by integrating \Pi_{N-1} f(x),

\int_I f(x) w(x)\, dx \approx \sum_{j=1}^{N} f(z_j^{(N)})\, w_j^{(N)}, \qquad (3.47)

where

w_j^{(N)} = \int_I l_j^{(N-1)}(x)\, w(x)\, dx, \quad 1 \le j \le N,

are the weights. This approximation is obviously exact if f ∈ P_{N−1}. However, the following result indicates that it is even more accurate.

Theorem 3.11. Formula (3.47) is exact, i.e., it becomes an equality, if f(x) is any polynomial of degree less than or equal to 2N − 1 on I.

Proof. For any f ∈ P_{2N−1}, let q = \Pi_{N-1} f ∈ P_{N−1} be its (N − 1)th-degree interpolation using {z_j^{(N)}}_{j=1}^N. Then f − q vanishes at {z_j^{(N)}}_{j=1}^N and can be expressed as f − q = Q_N(x) r(x), where r(x) is a polynomial of degree at most N − 1. By orthogonality, and using the exactness of (3.47) for polynomials of degree up to N − 1, we have

\int_I f w\, dx = \int_I q w\, dx + \int_I Q_N r\, w\, dx = \int_I q w\, dx = \sum_{j=1}^{N} q(z_j^{(N)}) w_j^{(N)} = \sum_{j=1}^{N} f(z_j^{(N)}) w_j^{(N)}.


The converse of the theorem is also true; i.e., if formula (3.47) holds for any f ∈ P_{2N−1}, then the nodes are the zeros of Q_N. The degree of exactness of the integration formula cannot be improved further. In fact, if one takes f = Q_N^2 ∈ P_{2N}, the right-hand side of (3.47) vanishes because Q_N(z_j^{(N)}) = 0, j = 1, ..., N, but the left-hand side does not.
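The degree-of-exactness statement of theorem 3.11 is easy to verify numerically for the Gauss-Legendre rule (a small sketch added here, assuming w(x) = 1 on [−1, 1]):

    import numpy as np

    # N-point Gauss-Legendre rule: exact for polynomials of degree <= 2N - 1.
    N = 5
    z, w = np.polynomial.legendre.leggauss(N)

    def exact_monomial_integral(k):
        """int_{-1}^{1} x^k dx."""
        return 0.0 if k % 2 == 1 else 2.0 / (k + 1)

    for k in range(2 * N + 1):                     # degrees 0 .. 2N
        quad = np.sum(w * z**k)
        err = abs(quad - exact_monomial_integral(k))
        tag = "  <- degree 2N, beyond the exactness limit" if k > 2*N - 1 else ""
        print(f"degree {k:2d}: |error| = {err:.2e}" + tag)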

3.6 DISCRETE PROJECTION

Here we define a discrete projection of a given function f ∈ L^2_w(I) as

I_N f(x) := \sum_{n=0}^{N} \tilde{f}_n \varphi_n(x), \qquad (3.48)

where the expansion coefficients are

\tilde{f}_n = \frac{1}{\|\varphi_n\|^2_{L^2_w}} U^q[f(x)\varphi_n(x)] = \frac{1}{\|\varphi_n\|^2_{L^2_w}} \sum_{j=1}^{q} f(x^{(j)}) \varphi_n(x^{(j)}) w^{(j)}, \quad 0 \le n \le N. \qquad (3.49)

When an integration formula U^q is used, the coefficients \tilde{f}_n are approximations of the coefficients \hat{f}_n in (3.30) of the continuous orthogonal projection (3.29); that is,

\tilde{f}_n \approx \hat{f}_n = \frac{1}{\|\varphi_n\|^2_{L^2_w}} \int_I f \varphi_n w\, dx, \quad 0 \le n \le N.

Note that this definition is slightly more general than what is often used in the literature, in the sense that we do not specify the type and number of nodes used in the integration formula. The only requirement is that the integration formula approximate the corresponding continuous integrals.

In classical spectral methods analysis, the integration formula is typically chosen to be the Gauss quadrature corresponding to the orthogonal polynomials {φ_n(x)} satisfying (3.28). More specifically, let {z_j^{(N)}}_{j=1}^{N+1} be the zeros of φ_{N+1}(x) in I, and let Π_N f be the Lagrange interpolation of f(x) in the form (3.42),

\Pi_N f(x) = \sum_{j=1}^{N+1} f(z_j^{(N)})\, l_j(x). \qquad (3.50)

Let us now use the (N + 1)-point Gauss quadrature to evaluate the coefficients in the discrete expansion (3.48); that is,

\tilde{f}_n = \frac{1}{\|\varphi_n\|^2_{L^2_w}} \sum_{j=1}^{N+1} f(z_j^{(N)}) \varphi_n(z_j^{(N)}) w_j^{(N)}, \quad 0 \le n \le N. \qquad (3.51)

Then the following result holds.

Theorem 3.12. Let I_N f be defined by (3.48), where the coefficients {\tilde{f}_n} are evaluated by the (N + 1)-point Gauss quadrature based on the orthogonal polynomials {φ_n}, as in (3.51). Let Π_N f be the Lagrange interpolation of f through the same set of Gauss nodes, as in (3.50). Then, for any f ∈ P_N, Π_N f = I_N f.


Proof. For f ∈ P_N, Π_N f = f because of the uniqueness of Lagrange interpolation. The coefficients of its discrete expansion are

\tilde{f}_n = \frac{1}{\|\varphi_n\|^2_{L^2_w}} \sum_{j=1}^{N+1} f(z_j^{(N)}) \varphi_n(z_j^{(N)}) w_j^{(N)} = \frac{1}{\|\varphi_n\|^2_{L^2_w}} \int_I f(x) \varphi_n(x) w(x)\, dx = \hat{f}_n, \quad 0 \le n \le N,

because the (N + 1)-point Gauss quadrature is exact for integrating polynomials of degree up to 2N + 1 and the integrand is in P_{2N}. Therefore I_N f = P_N f, the continuous orthogonal projection of f, which in turn equals f. The proof is established.

The above equivalence of the discrete projection, the continuous projection, and the Lagrange interpolation does not hold for a general function f because the Gauss quadrature is no longer exact. In such a case, \tilde{f}_n \neq \hat{f}_n and hence P_N f ≠ I_N f. The difference between the continuous orthogonal projection and the discrete projection is often termed the aliasing error,

A_N f := P_N f - I_N f. \qquad (3.52)

By letting γ_n = ‖φ_n‖²_{L^2_w} and noting that the summation in (3.51) defines a discrete inner product [f, φ_n]_w, we obtain, for all 0 ≤ n ≤ N,

\tilde{f}_n = \frac{1}{\gamma_n}[f, \varphi_n]_w = \frac{1}{\gamma_n}\Big[\sum_{j=0}^{\infty} \hat{f}_j \varphi_j,\; \varphi_n\Big]_w = \frac{1}{\gamma_n} \sum_{j=0}^{\infty} \hat{f}_j [\varphi_j, \varphi_n]_w
= \frac{1}{\gamma_n}\Big( \sum_{j \le N} (\varphi_j, \varphi_n)_{L^2_w} \hat{f}_j + \sum_{j > N} [\varphi_j, \varphi_n]_w \hat{f}_j \Big)
= \hat{f}_n + \frac{1}{\gamma_n} \sum_{j > N} \hat{f}_j [\varphi_j, \varphi_n]_w.

Therefore,

A_N f = \sum_{n=0}^{N} (\hat{f}_n - \tilde{f}_n)\, \varphi_n = -\sum_{n=0}^{N} \frac{1}{\gamma_n} \sum_{j > N} \hat{f}_j [\varphi_j, \varphi_n]_w\, \varphi_n = -\sum_{j > N} \Big( \sum_{n=0}^{N} \frac{1}{\gamma_n}\, \varphi_n [\varphi_j, \varphi_n]_w \Big) \hat{f}_j = -\sum_{j > N} (I_N \varphi_j)\, \hat{f}_j.

Thus, the aliasing error can be seen as the error introduced by using the interpolation of the basis, I_N φ_j, rather than the basis itself to represent the higher expansion modes (j > N). The aliasing error stems from the fact that one cannot distinguish between lower and higher basis modes on a finite grid. A general result holds that the aliasing error induced by Gauss points is usually of the same order as the projection error. Hence, for well-resolved smooth functions, the qualitative behavior of the continuous and the discrete expansions is similar for all practical purposes. We will not pursue a further in-depth discussion of this and instead refer interested readers to the literature, for example, [51].
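The equivalence of theorem 3.12, and the smallness of the aliasing error for a smooth function, can be checked with a few lines (an illustrative sketch for the Legendre case; the discrete coefficients use the (N+1)-point Gauss-Legendre rule, and the "continuous" coefficients are approximated with a much finer rule as a stand-in for the exact integrals):

    import numpy as np

    def legendre_coeffs(f, N, n_quad):
        """Legendre expansion coefficients of f on [-1, 1] via an n_quad-point Gauss rule."""
        x, w = np.polynomial.legendre.leggauss(n_quad)
        V = np.polynomial.legendre.legvander(x, N)
        norms = 2.0 / (2*np.arange(N + 1) + 1)
        return (V.T * w) @ f(x) / norms

    N = 8
    # A polynomial of degree N: discrete ((N+1)-point) and continuous coefficients agree.
    p = lambda x: 3*x**8 - x**3 + 0.5
    print("polynomial, max |tilde f - hat f|:",
          np.abs(legendre_coeffs(p, N, N + 1) - legendre_coeffs(p, N, 200)).max())

    # A general smooth function: the small difference is the aliasing error of (3.52).
    g = lambda x: np.exp(np.sin(2*x))
    print("smooth f,   max |tilde f - hat f|:",
          np.abs(legendre_coeffs(g, N, N + 1) - legendre_coeffs(g, N, 200)).max())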


Chapter Four

Formulation of Stochastic Systems

This chapter is devoted to the general aspects of formulating stochastic equations, i.e., given an established deterministic model for a physical system, how to properly set up a stochastic model to study the effect of uncertainty in the inputs to the system. Prior to any simulation, the key step is to properly characterize the random inputs. More specifically, the goal is to reduce the infinite-dimensional probability space to a finite-dimensional space that is amenable to computing. This is accomplished by parameterizing the probability space by a finite set of random variables. More importantly, it is desirable to require the set of random variables to be mutually independent. We remark that the independence requirement is very much a concern from a practical point of view, for most, if not all, available numerical techniques require independence; it is not as strong a requirement from a theoretical point of view. Although there exist some techniques to loosen it, in this book we shall continue to employ this widely adopted requirement.

To summarize, the critical step in formulating a stochastic system is to properly characterize the probability space defined by the random inputs by a finite set of mutually independent random variables. In many cases such a characterization procedure cannot be done exactly and will induce approximation errors.

4.1 INPUT PARAMETERIZATION: RANDOM PARAMETERS

When the random inputs to a system are the system parameters, the parameterization procedure is straightforward, for the inputs are already in the form of parameters. The more important issue is then to identify the independent parameters in the set. The problem can be stated as follows.

Let Y = (Y_1, ..., Y_n), n > 1, be the system parameters with a prescribed distribution function F_Y(y) = P(Y ≤ y), y ∈ R^n. Find a set of mutually independent random variables Z = (Z_1, ..., Z_d) ∈ R^d, where 1 ≤ d ≤ n, such that Y = T(Z) for a suitable transformation function T.

Let us use a simple example to illustrate the idea. Consider an ordinary differential equation with two random parameters,

\frac{du}{dt}(t, \omega) = -\alpha(\omega) u, \qquad u(0, \omega) = \beta(\omega), \qquad (4.1)


where the rate constant α and the initial condition β are assumed to be random. Thus the input random variables are Y(ω) = (α, β) ∈ R².

If α and β are mutually independent, then we simply let Z(ω) = Y(ω). The solution

u(t, \omega): [0, T] \times \Omega \to \mathbb{R}

can now be expressed as

u(t, Z): [0, T] \times \mathbb{R}^2 \to \mathbb{R},

which has one time dimension and two random dimensions.

If α and β are not independent of each other, then there exists a function f such that

f(\alpha, \beta) = 0.

It is then possible to find a random variable Z(ω) that parameterizes this relation such that

\alpha(\omega) = a(Z(\omega)), \qquad \beta(\omega) = b(Z(\omega)),

and f(a, b) = 0. Equivalently, the dependence between α and β implies that there exists a function g such that

\beta = g(\alpha).

Then we can let Z(ω) = α(ω) and β(ω) = g(Z(ω)). Which approach is more convenient in practice is problem-dependent. Nevertheless, the random inputs α and β can now be expressed via a single random variable Z, and the solution becomes

u(t, Z): [0, T] \times \mathbb{R} \to \mathbb{R},

which has only one random dimension.

In practice, when there are many parameters in a system, finding the exact form of the functional dependence among all the parameters can be a challenging (and unnecessary) task. This is especially true when the only available information is the (joint) probability distribution of the parameters. The goal now is to transform the parameters into a set of independent random parameters by using their distribution functions.

4.1.1 Gaussian Parameters

Since the first two moments, the mean and the covariance, completely characterize a Gaussian distribution, the parameterization problem can be solved in a straightforward manner.

Let Y = (Y_1, ..., Y_n) be a random vector with Gaussian distribution N(0, C), where C ∈ R^{n×n} is the covariance matrix and the expectation is assumed to be zero (without loss of generality). Let Z ~ N(0, I), where I is the n × n identity matrix, be an uncorrelated Gaussian vector of size n; thus the components of Z are mutually independent. Let A be an n × n matrix; then, by theorem 2.16, AZ ~ N(0, AA^T). Therefore, if one finds a matrix A such that AA^T = C, then Y = AZ has the given distribution N(0, C). This result is a special case of a more general theorem by Anderson [4].

Since C is real and symmetric, the problem AA^T = C can readily be solved via, for example, the Cholesky decomposition, where A takes the form of a lower triangular matrix:

a_{i1} = c_{i1}/\sqrt{c_{11}}, \quad 1 \le i \le n,
a_{ii} = \Big( c_{ii} - \sum_{k=1}^{i-1} a_{ik}^2 \Big)^{1/2}, \quad 1 < i \le n,
a_{ij} = \Big( c_{ij} - \sum_{k=1}^{j-1} a_{ik} a_{jk} \Big) \Big/ a_{jj}, \quad 1 < j < i \le n,
a_{ij} = 0, \quad i < j \le n, \qquad (4.2)

where a_{ij} and c_{ij}, 1 ≤ i, j ≤ n, are the entries of the matrices A and C, respectively.
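A minimal sketch of this construction (using numpy's built-in Cholesky factorization rather than spelling out (4.2); the particular covariance matrix below is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(2)

    # Target covariance C (symmetric positive definite) for Y ~ N(0, C).
    C = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.5],
                  [0.3, 0.5, 1.0]])

    A = np.linalg.cholesky(C)            # lower triangular, A @ A.T == C
    M = 200000
    Z = rng.standard_normal((3, M))      # independent N(0, 1) components
    Y = A @ Z                            # correlated Gaussian samples, Y ~ N(0, C)

    print("factorization error:", np.abs(A @ A.T - C).max())
    print("sample covariance:\n", np.round(np.cov(Y), 3))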

4.1.2 Non-Gaussian Parameters

When the system parameters have a non-Gaussian distribution, the parameterization problem is distinctly more difficult. However, there exists a remarkably simple transformation due to Rosenblatt [88] that can accomplish the goal. Let Y = (Y_1, ..., Y_n) be a random vector with a (non-Gaussian) distribution function F_Y(y) = P(Y ≤ y) and let z = (z_1, ..., z_n) = Ty = T(y_1, ..., y_n) be the transformation defined by

z_1 = P(Y_1 \le y_1) = F_1(y_1),
z_2 = P(Y_2 \le y_2 \mid Y_1 = y_1) = F_2(y_2 \mid y_1),
\;\;\vdots
z_n = P(Y_n \le y_n \mid Y_{n-1} = y_{n-1}, \ldots, Y_1 = y_1) = F_n(y_n \mid y_{n-1}, \ldots, y_1). \qquad (4.3)

It can then be shown that

P(Z_i \le z_i,\; i = 1, \ldots, n) = \int_{\{Z \mid Z_i \le z_i\}} \!\!\cdots\! \int d_{y_n} F_n(y_n \mid y_{n-1}, \ldots, y_1) \cdots d_{y_1} F_1(y_1) = \int_0^{z_n} \cdots \int_0^{z_1} dz_1 \cdots dz_n = \prod_{i=1}^{n} z_i,

where 0 ≤ z_i ≤ 1, i = 1, ..., n. Hence Z = (Z_1, ..., Z_n) are independent and identically distributed (i.i.d.) random variables with uniform distribution in [0, 1]^n.

Though mathematically simple and powerful, the Rosenblatt transformation is not easy to carry out in practice, for it relies on the conditional probability distributions among the random parameters. Such information is rarely known completely, and even when it is known, it is rarely given in explicit form. In practice, some kind of numerical approximation of the Rosenblatt transformation is required. This remains an understudied topic and is beyond the scope of this book.
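For a concrete case in which the conditional distributions are available in closed form, consider a bivariate Gaussian vector with correlation ρ. The sketch below (an illustration constructed for this purpose, not from the text) applies (4.3) and checks that the transformed variables are close to independent uniforms.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    rho, M = 0.7, 100000

    # Samples of a correlated bivariate standard Gaussian (Y1, Y2) with corr = rho.
    Y1 = rng.standard_normal(M)
    Y2 = rho * Y1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(M)

    # Rosenblatt transformation (4.3): z1 = F1(y1), z2 = F2(y2 | y1).
    Z1 = norm.cdf(Y1)
    Z2 = norm.cdf((Y2 - rho * Y1) / np.sqrt(1.0 - rho**2))   # conditional CDF of Y2 given Y1

    # Z1, Z2 should be (nearly) independent and uniform on [0, 1].
    print("means (expect 0.5)     :", Z1.mean(), Z2.mean())
    print("correlation (expect ~0):", np.corrcoef(Z1, Z2)[0, 1])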


4.2 INPUT PARAMETERIZATION: RANDOM PROCESSES AND DIMENSION REDUCTION

In many cases the random inputs are stochastic processes. For example, the input could be a time-dependent random forcing term, which is a stochastic process in time, or an uncertain material property, e.g., conductivity, which is a stochastic process in space. The parameterization problem can be stated as follows.

Let (Y_t, t ∈ T) be a stochastic process that models the random inputs, where t is the index belonging to an index set T. Find a suitable transformation function R such that Y_t = R(Z), where Z = (Z_1, ..., Z_d), d ≥ 1, are mutually independent.

Note that the index set T can be in either the space domain or the time domain and is usually an infinite-dimensional object. Since we require d to be a finite integer, the transformation cannot be exact. Therefore Y_t ≈ R(Z) in a proper norm or metric, and the accuracy of the approximation will be problem-dependent.

A straightforward approach is to consider a finite-dimensional version of Y_t instead of Y_t directly. This requires one first to discretize the index domain T into a set of finitely many indices and then to study the random vector

(Y_{t_1}, \ldots, Y_{t_n}), \quad t_1, \ldots, t_n \in T,

which is finite-dimensional. The parameterization techniques for random parameters from the previous section, e.g., the Rosenblatt transformation, can then be readily applied.

The discretization of Y_t into a finite-dimensional version is obviously an approximation: the finer the discretization, the better the approximation. A fine discretization, however, is not desirable, because it leads to a large dimension n and can significantly increase the computational burden. Some kind of dimension reduction technique is required to keep the dimension as low as possible while maintaining satisfactory approximation accuracy.

4.2.1 Karhunen-Loeve Expansion

The Karhunen-Loeve (KL) expansion (cf. [73], for example) is one of the most widely used techniques for dimension reduction in representing random processes.

Let µ_Y(t) be the mean of the input process Y_t and let C(t, s) = cov(Y_t, Y_s) be its covariance function. The Karhunen-Loeve expansion of Y_t is

Y_t(\omega) = \mu_Y(t) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, \psi_i(t)\, Y_i(\omega), \qquad (4.4)

where the ψ_i are the orthogonal eigenfunctions and the λ_i the corresponding eigenvalues of the eigenvalue problem

\int_T C(t, s)\, \psi_i(s)\, ds = \lambda_i \psi_i(t), \quad t \in T, \qquad (4.5)

and {Y_i(ω)} are mutually uncorrelated random variables satisfying

E[Y_i] = 0, \qquad E[Y_i Y_j] = \delta_{ij}, \qquad (4.6)


and defined by

Y_i(\omega) = \frac{1}{\sqrt{\lambda_i}} \int_T \left( Y_t(\omega) - \mu_Y(t) \right) \psi_i(t)\, dt, \quad \forall i. \qquad (4.7)

The Karhunen-Loeve expansion in the form of the equality (4.4) is of little direct use because it is an infinite series. In practice, one adopts a finite series expansion, e.g.,

Y_t(\omega) \approx \mu_Y(t) + \sum_{i=1}^{d} \sqrt{\lambda_i}\, \psi_i(t)\, Y_i(\omega), \quad d \ge 1. \qquad (4.8)

The natural question is when to truncate the series, that is, how to choose d so that the approximation accuracy is satisfactory. The answer is closely related to an important property of the Karhunen-Loeve expansion: the decay of the eigenvalues λ_i as the index i increases. Here we illustrate this property with the following examples.

Example 4.1 (Exponential covariance function). Let C(t, s) = exp(−|t − s|/a), where a > 0 is the correlation length, and let t ∈ T = [−b, b] be in a bounded domain of length 2b. Then the eigenvalue problem (4.5) can be solved analytically [109]. The eigenvalues are

\lambda_i = \begin{cases} \dfrac{2a}{1 + a^2 w_i^2}, & i \text{ even}, \\[1mm] \dfrac{2a}{1 + a^2 v_i^2}, & i \text{ odd}, \end{cases} \qquad (4.9)

and the corresponding eigenfunctions are

\psi_i(t) = \begin{cases} \sin(w_i t) \Big/ \sqrt{\,b - \dfrac{\sin(2 w_i b)}{2 w_i}\,}, & i \text{ even}, \\[1mm] \cos(v_i t) \Big/ \sqrt{\,b + \dfrac{\sin(2 v_i b)}{2 v_i}\,}, & i \text{ odd}, \end{cases} \qquad (4.10)

where w_i and v_i are the solutions of the transcendental equations

a w + \tan(w b) = 0 \quad \text{for even } i, \qquad 1 - a v \tan(v b) = 0 \quad \text{for odd } i.

In figure 4.1, the first four eigenfunctions are shown for the exponential covariance function on [−1, 1]. The higher modes (the eigenfunctions with larger index i) clearly have a finer structure than the lower modes. The eigenvalues are shown in figure 4.2 for several different correlation lengths a. The eigenvalues decay, and the decay rate is larger when the correlation length is longer. When the correlation length is very small, e.g., a = 0.01, the decay of the eigenvalues is barely visible.

Example 4.2 (Uncorrelated process). The limit of diminishing correlation length is the zero-correlation case, when the covariance function takes the form of a delta function, C(t, s) = δ(t − s). It is easy to see from (4.5) that now any set of orthogonal functions can serve as the eigenfunctions and the eigenvalues are all equal to a constant, λ_i = 1 for all i. In this case there is no decay of the eigenvalues at all.


[Figure 4.1. The first four eigenfunctions (i = 1, 2, 3, 4) of the exponential covariance function on [−1, 1].]

[Figure 4.2. The first 20 eigenvalues of the exponential covariance function for correlation lengths a = 10, 1, 0.1, 0.01.]

Example 4.3 (Fully correlated process). The other limit is C(t, s) = 1, which corresponds to an infinite correlation length: the process Y_t is fully correlated. This is the rather trivial case in which the process depends on just one random variable. It is straightforward to show from (4.5) that there exists one nonzero eigenvalue, corresponding to a constant eigenfunction, and that the rest of the eigenvalues are zero.


The aforementioned examples illustrate an important property of the Karhunen-Loeve expansion: for a given covariance function, the decay rate of the eigenvalues depends inversely on the correlation length. A long correlation length implies that the process is strongly correlated and results in a fast decay of the eigenvalues; the limit of infinitely long correlation length is the fully correlated case, where the eigenvalues decay to zero immediately. Conversely, a weakly correlated process has a short correlation length and results in a slow decay of the eigenvalues; the limit of this, the uncorrelated process with zero correlation length, has no eigenvalue decay at all.

The decay rate of the eigenvalues provides a guideline for truncating the infinite KL series (4.4) into the finite KL series (4.8). The common approach is to examine the decay of λ_i and keep the first d eigenvalues so that the contribution of the remaining eigenvalues is negligible. How much is considered negligible, usually given in terms of a small percentage as a cutoff criterion, is specified on a problem-dependent basis. Naturally, for a given cutoff criterion, a strongly correlated process allows a finite KL expansion with fewer terms than a weakly correlated process does.

There are many more properties of the KL expansion. For example, the error of a finite-term expansion is optimal in the mean-square sense. We will not devote further discussion to this and refer interested readers to references such as [93].
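In practice the eigenvalue problem (4.5) is often solved numerically. The following sketch (illustrative only, not the book's code) discretizes the exponential covariance kernel on a uniform grid over [−1, 1], solves the resulting matrix eigenvalue problem with a simple Nystrom-type quadrature scaling, and prints the leading eigenvalues for several correlation lengths, mirroring the trend seen in figure 4.2.

    import numpy as np

    def kl_eigenvalues(a, n_grid=400, b=1.0, n_eigs=8):
        """Leading KL eigenvalues of C(t,s) = exp(-|t-s|/a) on [-b, b] (Nystrom method)."""
        t = np.linspace(-b, b, n_grid)
        h = t[1] - t[0]                              # uniform quadrature weight
        C = np.exp(-np.abs(t[:, None] - t[None, :]) / a)
        # Discretize int C(t,s) psi(s) ds ~ (C * h) @ psi; symmetric eigenproblem.
        eigvals = np.linalg.eigvalsh(C * h)
        return np.sort(eigvals)[::-1][:n_eigs]

    for a in (10.0, 1.0, 0.1, 0.01):
        lam = kl_eigenvalues(a)
        print(f"a = {a:5.2f}:", np.array2string(lam, precision=3))

The fast decay for large a and the nearly flat spectrum for a = 0.01 follow the behavior described above.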

4.2.2 Gaussian Processes

The truncated Karhunen-Loeve series (4.8) provides a way to approximate a random process by a function (a series) involving a finite number of random variables. The random variables Y_i(ω) are uncorrelated, as in (4.6). For Gaussian random variables, uncorrelatedness and independence are equivalent. Furthermore, linear combinations of Gaussian random variables remain Gaussian. Therefore, if Y_t(ω) is a Gaussian process, the random variables Y_i in (4.4) and (4.8) are independent Gaussian random variables. Hence (4.8) provides a natural way to parameterize a Gaussian process by a finite number of independent Gaussian random variables.

4.2.3 Non-Gaussian Processes

When the input random processes are non-Gaussian, their parameterization and dimension reduction are significantly more challenging. The main difficulty is that, for non-Gaussian distributions, uncorrelatedness of the random variables Y_i in (4.4) does not imply independence. Hence the Karhunen-Loeve expansion does not by itself provide a parameterization in terms of independent variables. In many practical computations, one still uses the KL expansion for an input process and then further assumes that the Y_i are independent. Though this is not a rigorous approach, at the moment there are not many practical alternatives for achieving the parameterization. We will not engage in further discussion of this because it remains an active (and open) research topic.


4.3 FORMULATION OF STOCHASTIC SYSTEMS

We now illustrate the main steps in formulating a stochastic system by taking into account random inputs to a well-established deterministic system. We choose partial differential equations (PDEs) as the basic model, although the concepts and procedure are not restricted to PDEs.

Let us consider a system of PDEs defined in a spatial domain D ⊂ R^ℓ, ℓ = 1, 2, 3, and a time domain [0, T] with T > 0,

u_t(x, t, \omega) = \mathcal{L}(u), \quad D \times (0, T] \times \Omega,
\mathcal{B}(u) = 0, \quad \partial D \times [0, T] \times \Omega,
u = u_0, \quad D \times \{t = 0\} \times \Omega, \qquad (4.11)

where \mathcal{L} is a (nonlinear) differential operator, \mathcal{B} is the boundary condition operator, u_0 is the initial condition, and ω ∈ Ω denotes the random inputs of the system in a properly defined probability space (Ω, \mathcal{F}, P). Note that in general it is neither important nor relevant to identify the probability space precisely. The solution is therefore a random quantity,

u(x, t, \omega): D \times [0, T] \times \Omega \to \mathbb{R}^{n_u}, \qquad (4.12)

where n_u ≥ 1 is the dimension of u.

The random inputs to (4.11) can take the form of random parameters and random processes. Let us assume that they can all be properly parameterized by a set of independent random variables using the techniques discussed in the previous two sections. Let Z = (Z_1, ..., Z_d) ∈ R^d, d ≥ 1, be the set of independent random variables characterizing the random inputs. We can then rewrite system (4.11) as

u_t(x, t, Z) = \mathcal{L}(u), \quad D \times (0, T] \times \mathbb{R}^d,
\mathcal{B}(u) = 0, \quad \partial D \times [0, T] \times \mathbb{R}^d,
u = u_0, \quad D \times \{t = 0\} \times \mathbb{R}^d. \qquad (4.13)

The solution is now

u(x, t, Z): D \times [0, T] \times \mathbb{R}^d \to \mathbb{R}^{n_u}. \qquad (4.14)

The fundamental assumption we make is that (4.11) is a well-posed system P-almost surely in Ω. Loosely and intuitively speaking, this means that if one generates an ensemble of (4.13) by generating a collection of realizations of the random variables Z, then each realization is well posed in its corresponding deterministic sense.

Example 4.4. Consider again the ordinary differential equation (ODE) (4.1),

\frac{du}{dt}(t, \omega) = -\alpha(\omega) u, \qquad u(0, \omega) = \beta(\omega). \qquad (4.15)


If the input random variables α and β are independent, then we let Z = (Z_1, Z_2) = (α, β) and rewrite the problem as

\frac{du}{dt}(t, Z) = -Z_1 u, \qquad u(0, Z) = Z_2. \qquad (4.16)

If, however, α and β are dependent, then, as discussed in section 4.1, we can let Z = α, and there exists a function g such that β = g(α). The system can then be rewritten as

\frac{du}{dt}(t, Z) = -Z u, \qquad u(0, Z) = g(Z). \qquad (4.17)

Example 4.5 (Stochastic diffusion equation). Consider a one-dimensional stochastic elliptic equation

\nabla \cdot (\kappa(x, \omega) \nabla u) = f(x, \omega), \quad x \in (-1, 1),
u(-1, \omega) = u_\ell(\omega), \qquad u(1, \omega) = u_r(\omega), \qquad (4.18)

where the diffusivity field κ and the source term f are assumed to be random fields and u_ℓ and u_r are random variables. For simplicity of exposition, only Dirichlet boundary conditions are considered. Let us assume that the diffusivity field κ can be parameterized by a truncated KL expansion (4.8) with d_κ terms, i.e.,

\kappa(x, \omega) \approx \kappa(x, Z^\kappa) = \mu_\kappa(x) + \sum_{i=1}^{d_\kappa} \kappa_i(x)\, Z^\kappa_i(\omega),

where the functions κ_i(x) are determined by the eigenvalues and eigenfunctions of the covariance function of κ(x, ω) and the Z^κ_i(ω) are mutually independent. Similarly, let f(x, ω) be parameterized as

f(x, \omega) \approx f(x, Z^f) = \mu_f(x) + \sum_{i=1}^{d_f} f_i(x)\, Z^f_i(\omega),

with d_f terms and mutually independent Z^f_i(ω). Let us assume that κ and f are independent of each other and also independent of u_ℓ and u_r. Then let

Z = (Z_1, \ldots, Z_d) = (Z^\kappa_1, \ldots, Z^\kappa_{d_\kappa}, Z^f_1, \ldots, Z^f_{d_f}, u_\ell, u_r),

where d = d_κ + d_f + 2, and we can write the elliptic problem as

\nabla \cdot (\kappa(x, Z) \nabla u) = f(x, Z), \quad x \in (-1, 1),
u(-1, Z) = Z_{d-1}, \qquad u(1, Z) = Z_d. \qquad (4.19)

The solution is u(x, Z): [−1, 1] × R^d → R.

4.4 TRADITIONAL NUMERICAL METHODS

Here we briefly review traditional methods for solving practical systems with random inputs. For the purpose of illustration, we use the simple stochastic ODE of example 4.4.


The exact solution to (4.15) is

u(t, ω) = β(ω) e^{−α(ω)t}.    (4.20)

When the distribution function of α and β is known, i.e., F_{αβ}(a, b) = P(α ≤ a, β ≤ b), the statistical moments of the solution can be evaluated. If α(ω) and β(ω) are independent, i.e., F_{αβ}(a, b) = F_α(a)F_β(b), then

E[u(t, ω)] = E[β] E[e^{−αt}].

For example, if we further assume β(ω) ≡ 1, that is, the initial condition is not random, and α(ω) ∼ N(0, 1) is a standard Gaussian random variable with zero mean and unit variance, then

E[u] = (1/√(2π)) ∫ e^{−at} e^{−a²/2} da = e^{t²/2}

and

σ_u² = E[u²] − (E[u])² = e^{2t²} − e^{t²}.

4.4.1 Monte Carlo Sampling

Monte Carlo sampling (MCS) is a statistical sampling method that was popularized by physicists at Los Alamos National Laboratory in the United States in the 1940s. The general procedure for (4.15) is as follows.

1. Generate independently and identically distributed random numbers Z^(i) = (α^(i), β^(i)), i = 1, . . . , M, according to the distribution F_{αβ}(a, b). Note once again that the dependence structure of α and β is required to be known.

2. For each i = 1, . . . , M, solve the governing equation (4.15) and obtain u^(i)(t) ≜ u(t, Z^(i)).

3. Estimate the required solution statistics. For example, the solution mean can be estimated as

   ū(t) = (1/M) Σ_{i=1}^{M} u(t, Z^(i)) ≈ E[u].    (4.21)

Other solution statistics can be estimated via proper schemes from the solution ensemble {u^(i)}. It is obvious that steps 1 and 3 are preprocessing and postprocessing steps, respectively. Only step 2 requires solution of the original problem, and it involves repetitive simulations of the deterministic counterpart of the problem.
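As an illustration (not part of the original text), the following minimal Python sketch carries out steps 1–3 for the ODE (4.15) with the choice α ∼ N(0, 1), β ≡ 1 used earlier in this section, so that the exact mean e^{t²/2} is available for comparison. Only numpy is assumed, and the known per-sample solution βe^{−αt} stands in for a deterministic ODE solver.

    import numpy as np

    rng = np.random.default_rng(0)

    def mc_mean_estimate(t, M):
        """Steps 1-3 of the MCS procedure for du/dt = -alpha*u, u(0) = beta = 1."""
        alpha = rng.standard_normal(M)      # step 1: sample the random input alpha ~ N(0,1)
        u_samples = np.exp(-alpha * t)      # step 2: exact per-sample solution (stand-in for a solver)
        return u_samples.mean()             # step 3: postprocess, the sample mean (4.21)

    t = 1.0
    exact = np.exp(t**2 / 2)
    for M in (100, 10_000, 1_000_000):
        print(M, abs(mc_mean_estimate(t, M) - exact))   # error decays roughly like M**(-1/2)

The printed errors decrease roughly by one digit for every hundredfold increase in M, in line with the convergence rate discussed next.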

The error estimate of MCS follows immediately from the Central Limit Theorem (theorem 2.26). Since {u(t, Z^(i))} are independent and identically distributed random variables, the distribution function of ū(t) converges, in the limit M → ∞, to a Gaussian distribution N(E[u](t), σ_u²(t)/M), whose standard deviation is M^{−1/2} σ_u(t), where σ_u is the standard deviation of the exact solution. Hence the widely adopted statement that the error convergence rate of MCS is inversely proportional to the square root of the number of realizations.

It is obvious that the MCS procedure can be easily extended to a general system such as (4.13). The only requirement is a well-established solver for the corresponding deterministic system. Although very powerful and flexible, the convergence rate of MCS, O(M^{−1/2}), is relatively slow. Generally speaking, if a one-digit increase in the accuracy of the solution statistics is required, one needs to run roughly 100 times more simulations and thus increase the computational effort by a factor of 100. For large and complex systems where the solution of a single deterministic realization is time-consuming, this poses a tremendous numerical challenge. On the other hand, a remarkable advantage of MCS lies in the fact that the O(M^{−1/2}) convergence rate is independent of the total number of input random variables. That is, the convergence rate is independent of the dimension of the random space. This turns out to be an extremely useful property that virtually no other method possesses.

4.4.2 Moment Equation Approach

The objective of the moment equation approach is to compute the moments of the solution directly, because in many cases the moments of the solution are what one needs.

Let µ(t) = E[u]; then by taking the expectation of both sides of (4.15) we obtain

dµ/dt (t) = −E[αu],    µ(0) = E[β].

This requires knowledge of E[αu], which is not known. We then attempt to derive an equation for this new quantity. This is done by multiplying (4.15) by α and then taking the expectation,

d/dt E[αu](t) = −E[α²u],    E[αu](0) = E[αβ].

A new quantity E[α²u] appears. If we attempt a similar approach by multiplying the original system by α² and then taking the expectation, the equation for the new quantity is

d/dt E[α²u](t) = −E[α³u],    E[α²u](0) = E[α²β],

which now requires yet another new quantity, E[α³u]. In general, if we define µ_k(t) = E[α^k u] for k = 0, 1, . . . , then we obtain

d/dt µ_k(t) = −µ_{k+1},    µ_k(0) = E[α^k β].

The system of equations thus cannot be closed, as it keeps introducing new variables that are not included in the existing system. This is the well-known closure problem. The typical approach to remedy the difficulty is to assume, for a fixed k, that the higher-order term is a function of the lower-order ones, that is,

µ_{k+1} = g(µ_0, . . . , µ_k),

where the form of the function g is determined on a problem-dependent basis with (hopefully) sufficient justification. There exists, to date, no effective general strategy for solving the closure problem. Also, the error caused by most, if not all, closure techniques is not easy to quantify.
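To make the closure issue concrete, the sketch below (an illustration, not a method advocated in the text) closes the hierarchy for the ODE example with α ∼ N(0, 1), β = 1 by the crude assumption µ_{K+1} ≈ 0 and integrates the resulting finite system with a standard RK4 stepper; the closed solution can only reproduce a truncated Taylor expansion of the exact mean e^{t²/2}.

    import numpy as np
    from math import factorial

    K = 8                               # keep mu_0, ..., mu_K; close the hierarchy with mu_{K+1} ~ 0
    # initial data mu_k(0) = E[alpha^k * beta], with alpha ~ N(0,1) and beta = 1
    mu = np.array([0.0 if k % 2 else factorial(k) / (2**(k // 2) * factorial(k // 2))
                   for k in range(K + 1)])

    def rhs(m):                         # d mu_k/dt = -mu_{k+1}, with the closure mu_{K+1} := 0
        out = np.zeros_like(m)
        out[:-1] = -m[1:]
        return out

    t, dt = 0.0, 1e-3
    while t < 1.0 - 1e-12:              # RK4 integration of the closed hierarchy up to t = 1
        k1 = rhs(mu); k2 = rhs(mu + 0.5*dt*k1); k3 = rhs(mu + 0.5*dt*k2); k4 = rhs(mu + dt*k3)
        mu = mu + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
        t += dt

    print("closed hierarchy:", mu[0], "  exact E[u](1):", np.exp(0.5))

The agreement is good here only because the exact mean happens to be smooth in time; the naive closure offers no error control in general, which is exactly the difficulty described above.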


4.4.3 Perturbation Method

In the perturbation method, the random part of the solution is treated as a perturbation. The fundamental assumption is that such a perturbation is small, which is typically achieved by requiring the standard deviation of the solution to be small. To illustrate the idea, let us again use the simple ODE example (4.15). For ease of exposition, let us further assume that the mean value of α is zero, i.e., E[α] = 0, and that the initial condition is a fixed value. That is,

du/dt (t, ω) = −α(ω)u,    u(0, ω) = β.

The perturbation method can be applied when the variation of α is small, that is, ε = O(α(ω)) ∼ σ_α ≪ 1. If this is the case, we seek to expand the solution as a power series of α,

u(t, ω) = u_0(t) + α(ω)u_1(t) + α²(ω)u_2(t) + · · · ,    (4.22)

where the coefficients u_0, u_1, . . . are supposed to be of the same order of magnitude. After substituting the expression into the governing equation (4.1), we obtain

du_0/dt + α du_1/dt + α² du_2/dt + · · · = −α(u_0 + αu_1 + α²u_2 + · · · ).

Since O(α^k) = ε^k becomes increasingly small as k increases, we match the terms in the equation according to the power of α:

O(1):     du_0/dt = 0,
O(ε):     du_1/dt = −u_0,
O(ε²):    du_2/dt = −u_1,
  ⋮
O(ε^k):   du_k/dt = −u_{k−1}.

Similar expansion and term matching of the initial condition result in the initial conditions for the coefficients,

u_0(0) = β,    u_k(0) = 0,    k ≥ 1.

It is then easy to solve the system recursively and obtain

u_0(t) = β,    u_1(t) = −βt,    . . . ,    u_k(t) = β(−1)^k t^k / k!.

The power series then gives us the solution

u(t, ω) = Σ_{k=0}^{∞} β ((−t)^k / k!) α^k(ω),    (4.23)

which is the infinite power series of the exact solution u_exact(t, ω) = β exp(−αt).


Even though it seems that the perturbation solution can recover the exact solution in terms of its power series, the requirement that α be small is still needed. This is because in practice one can use only a finite-term series, which provides a good approximation only when α is small. (Note from (4.23) that when α is O(1) or bigger, the remainder of a finite-term series is always dominant.)

Derivation of the equations for each term is done by matching the orders. The procedure cannot be easily automated, except for very simple problems such as the example here. For practical systems, one usually stops the procedure at the second-order terms because of the complexity of the derivation. For first- or second-order approximations to be satisfactory, an even stronger requirement of smallness is needed. However, a distinct feature of perturbation methods is that, once derived, the systems of equations are almost always decoupled and can be solved recursively.
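The following short sketch (illustrative only; numpy assumed) compares the second-order truncation of (4.22)–(4.23) with the exact solution βe^{−αt}, β = 1, for increasing σ_α, showing how quickly a fixed-order perturbation expansion degrades once α is no longer small.

    import numpy as np
    from math import factorial

    def u_perturb(t, alpha, K):
        """K-term truncation of (4.22)-(4.23) with beta = 1: sum_k (-t)^k/k! * alpha^k."""
        return sum((-t)**k / factorial(k) * alpha**k for k in range(K + 1))

    rng = np.random.default_rng(1)
    t, K = 1.0, 2                        # second-order perturbation approximation
    for sigma in (0.01, 0.1, 1.0):       # alpha ~ N(0, sigma^2), so E[alpha] = 0
        alpha = sigma * rng.standard_normal(100_000)
        err = np.max(np.abs(u_perturb(t, alpha, K) - np.exp(-alpha * t)))
        print(f"sigma = {sigma:5.2f}   max pathwise error = {err:.2e}")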


Chapter Five

Generalized Polynomial Chaos

This chapter is devoted to the fundamental aspects of generalized polynomial chaos (gPC). The material is largely based on the work described in [120]. However, the exposition here is quite different from the original one in [120], for better understanding of the material. It should also be noted that here we focus on the gPC expansion based on globally smooth orthogonal polynomials, in an effort to illustrate the basic ideas, and leave other types of gPC expansion (e.g., those based on piecewise polynomials) as research topics outside the scope of this book.

5.1 DEFINITION IN SINGLE RANDOM VARIABLES

Let Z be a random variable with a distribution function F_Z(z) = P(Z ≤ z) and finite moments

E(|Z|^{2m}) = ∫ |z|^{2m} dF_Z(z) < ∞,    m ∈ N,    (5.1)

where N = N_0 = {0, 1, . . .} or N = {0, 1, . . . , N} for a finite nonnegative integer N is an index set. The generalized polynomial chaos basis functions are the orthogonal polynomial functions satisfying

E[Φ_m(Z)Φ_n(Z)] = γ_n δ_{mn},    m, n ∈ N,    (5.2)

where

γ_n = E[Φ_n²(Z)],    n ∈ N,    (5.3)

are the normalization factors.

If Z is continuous, then its probability density function (PDF) exists such that dF_Z(z) = ρ(z)dz, and the orthogonality can be written as

E[Φ_m(Z)Φ_n(Z)] = ∫ Φ_m(z)Φ_n(z)ρ(z)dz = γ_n δ_{mn},    m, n ∈ N.    (5.4)

Similarly, when Z is discrete, the orthogonality can be written as

E[Φ_m(Z)Φ_n(Z)] = Σ_i Φ_m(z_i)Φ_n(z_i)ρ_i = γ_n δ_{mn},    m, n ∈ N.    (5.5)

With a slight abuse of notation, hereafter we will use

E[f(Z)] = ∫ f(z) dF_Z(z)

to include both the continuous case and the discrete case.


Obviously, {Φ_m(z)} are orthogonal polynomials of z ∈ R with the weight function ρ(z), which is the probability function of the random variable Z. This establishes a correspondence between the distribution of the random variable Z and the type of orthogonal polynomials of its gPC basis.

Example 5.1 (Hermite polynomial chaos). Let Z ∼ N(0, 1) be a standard Gaussian random variable with zero mean and unit variance. Its PDF is

ρ(z) = (1/√(2π)) e^{−z²/2}.

The orthogonality (5.2) then defines the Hermite orthogonal polynomials {H_m(Z)} as in (3.19). Therefore, we employ the Hermite polynomials as the basis functions,

H_0(Z) = 1,    H_1(Z) = Z,    H_2(Z) = Z² − 1,    H_3(Z) = Z³ − 3Z,    . . . .

This is the classical Wiener-Hermite polynomial chaos basis ([45]).

Example 5.2 (Legendre polynomial chaos). Let Z ∼ U(−1, 1) be a random variable uniformly distributed in (−1, 1). Its PDF is the constant ρ(z) = 1/2. The orthogonality (5.2) then defines the Legendre orthogonal polynomials (3.16), with

L_0(Z) = 1,    L_1(Z) = Z,    L_2(Z) = (3/2)Z² − 1/2,    . . . .

Example 5.3 (Jacobi polynomial chaos). Let Z be a random variable of beta distribution in (−1, 1) with PDF

ρ(z) ∝ (1 − z)^α (1 + z)^β,    α, β > −1,

whose precise definition is in (A.21). The orthogonality (5.2) then defines the Jacobi orthogonal polynomials (A.20) with the parameters α and β, where

J_0^{(α,β)}(Z) = 1,    J_1^{(α,β)}(Z) = (1/2)[α − β + (α + β + 2)Z],    . . . .

The Legendre polynomial chaos becomes a special case of the Jacobi polynomial chaos with α = β = 0.

In table 5.1, some of the well-known correspondences between the probability distribution of Z and its gPC basis polynomials are listed.

5.1.1 Strong Approximation

The orthogonality (5.2) ensures that the polynomials can be used as basis functions to approximate functions in terms of the random variable Z.

Definition 5.4 (Strong gPC approximation). Let f(Z) be a function of a random variable Z whose probability distribution is F_Z(z) = P(Z ≤ z) and whose support is I_Z. A generalized polynomial chaos approximation in a strong sense is f_N(Z) ∈ P_N(Z), where P_N(Z) is the space of polynomials of Z of degree up to N ≥ 0, such that ‖f(Z) − f_N(Z)‖ → 0 as N → ∞ in a proper norm defined on I_Z.


Table 5.1 Correspondence between the type of generalized polynomial chaos and their underlying random variables (N ≥ 0 is a finite integer).

             Distribution of Z       gPC basis polynomials    Support
Continuous   Gaussian                Hermite                  (−∞, ∞)
             Gamma                   Laguerre                 [0, ∞)
             Beta                    Jacobi                   [a, b]
             Uniform                 Legendre                 [a, b]
Discrete     Poisson                 Charlier                 {0, 1, 2, . . .}
             Binomial                Krawtchouk               {0, 1, . . . , N}
             Negative binomial       Meixner                  {0, 1, 2, . . .}
             Hypergeometric          Hahn                     {0, 1, . . . , N}

One obvious strong approximation is the orthogonal projection. Let

L²_{dF_Z}(I_Z) = {f : I_Z → R | E[f²] < ∞}    (5.6)

be the space of all mean-square integrable functions with norm ‖f‖_{L²_{dF_Z}} = (E[f²])^{1/2}. Then, for any function f ∈ L²_{dF_Z}(I_Z), we define its Nth-degree gPC orthogonal projection as

P_N f = Σ_{k=0}^{N} f_k Φ_k(Z),    f_k = (1/γ_k) E[f(Z)Φ_k(Z)].    (5.7)

The existence and convergence of the projection follow directly from classical approximation theory; i.e.,

‖f − P_N f‖_{L²_{dF_Z}} → 0,    N → ∞,    (5.8)

which is also often referred to as mean-square convergence. Let P_N(Z) be the linear space of all polynomials of Z of degree up to N; then the following optimality holds:

‖f − P_N f‖_{L²_{dF_Z}} = inf_{g ∈ P_N(Z)} ‖f − g‖_{L²_{dF_Z}}.    (5.9)

Though the requirement for convergence (L²-integrability) is rather mild, the rate of convergence will depend on the smoothness of the function f in terms of Z. The smoother f is, the faster the convergence. These results follow immediately from the classical results reviewed in chapter 3.

When a gPC expansion f_N(Z) of a function f(Z) converges to f(Z) in a strong norm, such as the mean-square norm of (5.8), it implies that f_N(Z) converges to f(Z) in probability, i.e., f_N →^P f, which further implies convergence in distribution, i.e., f_N →^d f, as N → ∞. (See the discussion of the modes of convergence in section 2.6.)


Example 5.5 (Lognormal random variable). Let Y = e^X, where X ∼ N(µ, σ²) is a Gaussian random variable. The distribution of Y is a lognormal distribution whose support is the nonnegative axis; it is widely used in practice to model random variables not allowed to take negative values. Its probability density function is

ρ_Y(y) = (1/(yσ√(2π))) e^{−(ln y − µ)²/(2σ²)}.    (5.10)

To obtain the gPC projection of Y, let Z ∼ N(0, 1) be the standard Gaussian random variable. Then X = µ + σZ and Y = f(Z) = e^µ e^{σZ}. The Hermite polynomials should be used because of the Gaussian distribution of Z. By following (5.7), we obtain

Y_N(Z) = e^{µ+σ²/2} Σ_{k=0}^{N} (σ^k / k!) H_k(Z).    (5.11)
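As a numerical check (not from the text), the sketch below recomputes the projection coefficients of Y = e^{µ+σZ} by applying (5.7) with Gauss quadrature for the standard normal weight and compares them with the closed-form coefficients in (5.11). It assumes numpy's probabilists' Hermite routines, which match the H_k used here.

    import numpy as np
    from math import factorial
    from numpy.polynomial.hermite_e import hermegauss, hermeval

    mu, sigma, N = 0.3, 0.5, 6
    z, w = hermegauss(40)                   # Gauss nodes/weights for the weight exp(-z^2/2)
    w = w / np.sqrt(2 * np.pi)              # rescale so that sum(w * g(z)) ~ E[g(Z)], Z ~ N(0,1)

    for k in range(N + 1):
        Hk = hermeval(z, np.eye(N + 1)[k])                                  # probabilists' Hermite H_k
        f_k = np.sum(w * np.exp(mu + sigma * z) * Hk) / factorial(k)        # projection (5.7), gamma_k = k!
        print(k, f_k, np.exp(mu + sigma**2 / 2) * sigma**k / factorial(k))  # coefficient in (5.11)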

5.1.2 Weak Approximation

When approximating a function f(Z) with a gPC expansion that converges strongly, e.g., in a mean-square sense, it is necessary to have knowledge of f, that is, the explicit form of f in terms of Z. In practice, however, sometimes only the probability distribution of f is known. In this case, a gPC expansion in terms of Z that converges strongly cannot be constructed because of the lack of information about the dependence of f on Z. However, the approximation can still be made to converge in a weak sense, e.g., in probability. To be precise, the problem can be stated as follows.

Definition 5.6 (Weak gPC approximation). Let Y be a random variable with distribution function F_Y(y) = P(Y ≤ y), and let Z be a (standard) random variable in a set of gPC basis functions. A weak gPC approximation is Y_N ∈ P_N(Z), where P_N(Z) is the linear space of polynomials of Z of degree up to N ≥ 0, such that Y_N converges to Y in a weak sense, e.g., in probability.

Obviously, a strong gPC approximation in definition 5.4 implies a weak approximation, but not vice versa. We first illustrate the weak approximation via a trivial example and demonstrate that the gPC weak approximation is not unique. Let Y ∼ N(µ, σ²) be a random variable with normal distribution. Naturally we choose Z ∼ N(0, 1), a standard Gaussian random variable, and the corresponding Hermite polynomials as the gPC basis. Then a first-order gPC Hermite expansion

Y_1(Z) = µH_0 + σH_1(Z) = µ + σZ    (5.12)

will have precisely the distribution N(µ, σ²). Therefore, Y_1(Z) can approximate the distribution of Y exactly. However, if all that is known is the distribution of Y, then one cannot reproduce pathwise realizations of Y via Y_1(Z). In fact, Ỹ_1(Z) = µH_0 − σH_1(Z) has the same N(µ, σ²) distribution but entirely different pathwise realizations from Y_1.


When Y is an arbitrary random variable with only its probability distribution known, a direct gPC projection in the form of (5.7) is not possible. More specifically, if one seeks an Nth-degree gPC expansion in the form of

Y_N = Σ_{k=0}^{N} a_k Φ_k(Z),    (5.13)

with

a_k = E[Y Φ_k(Z)]/γ_k,    0 ≤ k ≤ N,    (5.14)

where γ_k = E[Φ_k²] are the normalization factors, then the expectation in the coefficient evaluation is not properly defined and cannot be carried out, as the dependence between Y and Z is not known. This was first discussed in [120], where a strategy to circumvent the difficulty by using the distribution function F_Y(y) was proposed. The resulting gPC expansion turns out to be the weak approximation defined here.

By definition, F_Y : I_Y → [0, 1], where I_Y is the support of Y. Similarly, F_Z(z) = P(Z ≤ z) : I_Z → [0, 1]. Since F_Y and F_Z map Y and Z, respectively, to a uniform distribution in [0, 1], we rewrite the expectation in (5.14) in terms of a uniformly distributed random variable in [0, 1]. Let U = F_Y(Y) = F_Z(Z) ∼ U(0, 1); then Y = F_Y^{−1}(U) and Z = F_Z^{−1}(U). (The definition of F^{−1} is (2.7).) Now (5.14) can be rewritten as

a_k = (1/γ_k) E_U[F_Y^{−1}(U) Φ_k(F_Z^{−1}(U))] = (1/γ_k) ∫_0^1 F_Y^{−1}(u) Φ_k(F_Z^{−1}(u)) du.    (5.15)

This is a properly defined finite integral in [0, 1] and can be evaluated via traditional methods (e.g., Gauss quadrature). Here we use the subscript U in E_U to make clear that the expectation is over the random variable U.

Alternatively, one can choose to transform the expectation in (5.14) into an expectation in terms of Z by utilizing the fact that Y = F_Y^{−1}(F_Z(Z)). Then the expectation in (5.14) can be rewritten as

a_k = (1/γ_k) E_Z[F_Y^{−1}(F_Z(Z)) Φ_k(Z)] = (1/γ_k) ∫_{I_Z} F_Y^{−1}(F_Z(z)) Φ_k(z) dF_Z(z).    (5.16)

Though (5.15) and (5.16) take different forms, they are mathematically equivalent.
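A minimal sketch of this coefficient evaluation is given below, using the form (5.16) with Gauss quadrature for Z ∼ N(0, 1) and a Hermite basis; the exponential target of example 5.9, for which F_Y^{−1}(u) = −ln(1 − u), is used as the test case. The routine, its quadrature size, and the tail guard are illustrative assumptions, not part of the text.

    import numpy as np
    from math import factorial, erf
    from numpy.polynomial.hermite_e import hermegauss, hermeval

    def weak_hermite_coeffs(FY_inv, N, nquad=60):
        """Coefficients a_k of (5.16) for a weak gPC Hermite approximation of Y, Z ~ N(0,1)."""
        z, w = hermegauss(nquad)
        w = w / np.sqrt(2 * np.pi)                                       # quadrature for E_Z[.]
        Fz = 0.5 * (1.0 + np.array([erf(zi / np.sqrt(2)) for zi in z]))  # standard normal CDF
        Fz = np.clip(Fz, 1e-16, 1.0 - 1e-16)                             # guard the extreme tail nodes
        y = FY_inv(Fz)                                                   # Y = F_Y^{-1}(F_Z(Z))
        return np.array([np.sum(w * y * hermeval(z, np.eye(N + 1)[k])) / factorial(k)
                         for k in range(N + 1)])

    # example 5.9: exponential target, F_Y^{-1}(u) = -log(1 - u); a_0 should be close to E[Y] = 1
    print(weak_hermite_coeffs(lambda u: -np.log1p(-u), N=5))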

The weak convergence of Y_N is established in the following result.

Theorem 5.7. Let Y be a random variable with distribution F_Y(y) = P(Y ≤ y) and E(Y²) < ∞. Let Z be a random variable with distribution F_Z(z) = P(Z ≤ z), and let E(|Z|^{2m}) < ∞ for all m ∈ N such that its generalized polynomial chaos basis functions exist with E_Z[Φ_m(Z)Φ_n(Z)] = δ_{mn} γ_n for all m, n ∈ N. Let

Y_N = Σ_{k=0}^{N} a_k Φ_k(Z),    (5.17)

where

a_k = (1/γ_k) E_Z[F_Y^{−1}(F_Z(Z)) Φ_k(Z)],    0 ≤ k ≤ N.    (5.18)


Then Y_N converges to Y in probability; i.e.,

Y_N →^P Y,    N → ∞.    (5.19)

Also, Y_N →^d Y in distribution.

Proof. Let

Ỹ ≜ G(Z) = F_Y^{−1}(F_Z(Z)),

where G ≜ F_Y^{−1} ∘ F_Z : I_Z → I_Y. Obviously, Ỹ has the same probability distribution as that of Y, i.e., F_Ỹ = F_Y, and we have Ỹ =^P Y and E[Ỹ²] < ∞. This immediately implies

E[Ỹ²] = ∫_{I_Y} y² dF_Y(y) = ∫_0^1 (F_Y^{−1}(u))² du = ∫_{I_Z} (F_Y^{−1}(F_Z(z)))² dF_Z(z) < ∞.

Therefore, Ỹ = G(Z) ∈ L²_{dF_Z}(I_Z). Since (5.17) and (5.18) define the orthogonal projection of Ỹ onto the Nth-degree gPC basis, the strong convergence of Y_N to Ỹ in mean square implies convergence in probability, i.e., Y_N →^P Ỹ as N → ∞. Since Ỹ =^P Y, the main conclusion follows. Since convergence in probability implies convergence in distribution, Y_N →^d Y. This completes the proof.

Example 5.8 (Approximating a beta distribution by gPC Hermite expansion). Let the probability distribution of Y be a beta distribution with probability density function ρ(y) ∝ (1 − y)^α (1 + y)^β. In this case, if one chooses the corresponding Jacobi polynomials as the gPC basis functions, then the first-order gPC expansion can match the distribution exactly. However, suppose one chooses to employ the gPC Hermite expansion in terms of Z ∼ N(0, 1); then a weak approximation can still be obtained via the procedure discussed here. All that is needed is a numerical approximation of the integral (5.15) or (5.16). In figure 5.1, the convergence in PDF is shown for different orders of the gPC Hermite expansion. Numerical oscillations near the corners of the distributions can be clearly seen, resembling Gibbs oscillations. Note that the support of Y is [−1, 1] and is quite different from the support of Z (which is R).

Example 5.9 (Approximating an exponential distribution by gPC Hermite expansion). Now let us assume that the distribution of Y is an exponential distribution with PDF ρ(y) ∝ e^{−y}. Figure 5.2 shows the convergence of the PDF of the gPC Hermite expansions. Note that the first-order expansion, which results in a Gaussian distribution, is entirely different from the target exponential distribution. As the order of expansion is increased, the approximation improves. In this case, if one chooses the corresponding gPC basis, i.e., the Laguerre polynomials, then the first-order expansion can produce the exponential distribution exactly.


Figure 5.1 Approximating beta distributions by gPC Hermite expansions: convergence of probability density functions with increasing order of expansion. Left: approximation of the uniform distribution, α = β = 0. Right: approximation of the beta distribution with α = 2, β = 0. (More details are in [120].)

Figure 5.2 Approximating an exponential distribution by gPC Hermite expansions: convergence of the probability density function with increasing order of expansion.

Example 5.10 (Approximating a Gaussian distribution by gPC Jacobi expansion). Let us assume that the distribution of Y is the standard Gaussian N(0, 1) and use the gPC Jacobi expansion to approximate the distribution. The convergence in PDF is shown in figure 5.3, where both the Legendre polynomials and the Jacobi polynomials with α = β = 10 are used. We observe some numerical oscillations when using the gPC Legendre expansion. Again, if we use the corresponding gPC basis for the Gaussian distribution, the Hermite polynomials, then the first-order expansion Y_1 = H_1(Z) = Z will have precisely the desired N(0, 1) distribution.

It is also worth noting that the approximations by gPC Jacobi chaos with α = β = 10 are quite good, even at first order. This implies that the beta distribution with α = β = 10 is very close to the Gaussian distribution N(0, 1).

Figure 5.3 Approximations of a Gaussian distribution by gPC Jacobi expansions: convergence of probability density functions with increasing order of expansion. Left: approximation by gPC Jacobi polynomials with α = β = 0 (Legendre polynomials). Right: approximation by gPC Jacobi polynomials with α = β = 10.

However, a distinct feature of the beta distribution is that it is strictly bounded in a closed interval. This suggests that in practice, when one needs a distribution that is close to Gaussian but with strict bounds, mostly because of concerns from a physical or mathematical point of view, the beta distribution can be a good candidate. More details on this approximation can be found in appendix B.

From these examples it is clear that when the corresponding gPC polynomials for a given distribution function can be constructed, particularly for the well-known cases listed in table 5.1, it is best to use these basis polynomials, because a proper first-order expansion can produce the given distribution exactly. Using other types of polynomials can still result in a convergent series, at the cost of inducing approximation errors and a more complex gPC representation.

5.2 DEFINITION IN MULTIPLE RANDOM VARIABLES

When more than one independent random variable is involved, a multivariate gPC expansion is required. Let Z = (Z_1, . . . , Z_d) be a random vector with mutually independent components and distribution F_Z(z_1, . . . , z_d) = P(Z_1 ≤ z_1, . . . , Z_d ≤ z_d). For each i = 1, . . . , d, let F_{Z_i}(z_i) = P(Z_i ≤ z_i) be the marginal distribution of Z_i, whose support is I_{Z_i}. Mutual independence among all Z_i implies that F_Z(z) = Π_{i=1}^{d} F_{Z_i}(z_i) and I_Z = I_{Z_1} × · · · × I_{Z_d}. Also, let {φ_k(Z_i)}_{k=0}^{N} ∈ P_N(Z_i) be the univariate gPC basis functions in Z_i of degree up to N. That is,

E[φ_m(Z_i)φ_n(Z_i)] = ∫ φ_m(z)φ_n(z) dF_{Z_i}(z) = δ_{mn} γ_m,    0 ≤ m, n ≤ N.    (5.20)

Let i = (i_1, . . . , i_d) ∈ N_0^d be a multi-index with |i| = i_1 + · · · + i_d. Then the d-variate Nth-degree gPC basis functions are the products of the univariate gPC polynomials (5.20) of total degree less than or equal to N; i.e.,

Φ_i(Z) = φ_{i_1}(Z_1) · · · φ_{i_d}(Z_d),    0 ≤ |i| ≤ N.    (5.21)


It follows immediately from (5.20) that

E[Φ_i(Z)Φ_j(Z)] = ∫ Φ_i(z)Φ_j(z) dF_Z(z) = γ_i δ_{ij},    (5.22)

where γ_i = E[Φ_i²] = γ_{i_1} · · · γ_{i_d} are the normalization factors and δ_{ij} = δ_{i_1 j_1} · · · δ_{i_d j_d} is the d-variate Kronecker delta function. It is obvious that the span of the polynomials is P_N^d, the linear space of all polynomials of degree at most N in d variables,

P_N^d(Z) = { p : I_Z → R | p(Z) = Σ_{|i|≤N} c_i Φ_i(Z) },    (5.23)

whose dimension is

dim P_N^d = \binom{N+d}{N}.    (5.24)

The space of homogeneous gPC, following Wiener's notion of homogeneous chaos, is the space spanned by the gPC polynomials in (5.21) of degree precisely N; that is,

P̃_N^d(Z) = { p : I_Z → R | p(Z) = Σ_{|i|=N} c_i Φ_i(Z) }    (5.25)

and

dim P̃_N^d = \binom{N+d−1}{N}.    (5.26)

The d-variate gPC projection follows the univariate projection in a direct manner. Let L²_{dF_Z}(I_Z) be the space of all mean-square integrable functions of Z with respect to the measure dF_Z; that is,

L²_{dF_Z}(I_Z) = { f : I_Z → R | E[f²(Z)] = ∫_{I_Z} f²(z) dF_Z(z) < ∞ }.    (5.27)

Then for f ∈ L²_{dF_Z}(I_Z), its Nth-degree gPC orthogonal projection is defined as

P_N f = Σ_{|i|≤N} f_i Φ_i(Z),    (5.28)

where

f_i = (1/γ_i) E[f Φ_i] = (1/γ_i) ∫ f(z) Φ_i(z) dF_Z(z),    ∀ |i| ≤ N.    (5.29)

Classical approximation theory can be readily applied to obtain

‖f − P_N f‖_{L²_{dF_Z}} → 0,    N → ∞,    (5.30)

and

‖f − P_N f‖_{L²_{dF_Z}} = inf_{g ∈ P_N^d} ‖f − g‖_{L²_{dF_Z}}.    (5.31)


Table 5.2 An example of graded lexicographic ordering of the multi-index i in d = 4 dimensions.

|i|   Multi-index i    Single index k
0     (0 0 0 0)        1
1     (1 0 0 0)        2
      (0 1 0 0)        3
      (0 0 1 0)        4
      (0 0 0 1)        5
2     (2 0 0 0)        6
      (1 1 0 0)        7
      (1 0 1 0)        8
      (1 0 0 1)        9
      (0 2 0 0)        10
      (0 1 1 0)        11
      (0 1 0 1)        12
      (0 0 2 0)        13
      (0 0 1 1)        14
      (0 0 0 2)        15
3     (3 0 0 0)        16
      (2 1 0 0)        17
      (2 0 1 0)        18
      · · ·            · · ·

Up to this point, the mutual independence among the components of Z has not been used explicitly, in the sense that all the expositions are made on dF_Z(z) and I_Z in a general manner and do not require the properties I_Z = I_{Z_1} × · · · × I_{Z_d} and dF_Z(z) = dF_{Z_1}(z_1) · · · dF_{Z_d}(z_d), which are direct consequences of independence. This implies that the above presentation of gPC is applicable to more general cases.

Although clear for the formulation, the multi-index notation employed here is cumbersome to manipulate in practice. It is therefore preferable to use a single index to express the gPC expansion. A popular choice is the graded lexicographic order, where i > j if and only if |i| ≥ |j| and the first nonzero entry in the difference, i − j, is positive. Though other choices exist, the graded lexicographic order is the most widely adopted one in practice. The multi-indices can then be ordered in ascending order following a single index. For example, for a (d = 4)-dimensional case, the graded lexicographic order is shown in table 5.2.
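The sketch below (illustrative; Python standard library only, with a hypothetical routine name) generates the multi-indices with |i| ≤ N in the graded lexicographic order of table 5.2 and checks that their number equals the dimension (5.24).

    from itertools import product
    from math import comb

    def graded_lex_indices(d, N):
        """Multi-indices i in N_0^d with |i| <= N, ordered as in table 5.2
        (by degree |i| first, then lexicographically with the leading entry decreasing)."""
        out = []
        for deg in range(N + 1):
            out += [i for i in product(range(deg, -1, -1), repeat=d) if sum(i) == deg]
        return out

    idx = graded_lex_indices(d=4, N=3)
    assert len(idx) == comb(3 + 4, 4)          # dim P^d_N, eq. (5.24)
    for k, i in enumerate(idx[:18], start=1):  # reproduces the first rows of table 5.2
        print(k, i)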

Let us also remark that the polynomial space (5.23) is not the only choice. Another option is to keep the polynomial degree up to N in each direction; that is,

P̂_N^d(Z) = { p : I_Z → R | p(Z) = Σ_{|i|_0 ≤ N} c_i Φ_i(Z) },    (5.32)

where |i|_0 = max_{1≤j≤d} i_j. This kind of space is friendly to theoretical analysis (e.g., [8]), as properties of one dimension can be more easily extended. On the other hand, dim P̂_N^d = (N + 1)^d, and for large dimensions the number of basis functions grows too fast. Therefore, this space is usually not adopted in practical computations.

5.3 STATISTICS

When a sufficiently accurate gPC expansion of a given function f(Z) is available, one has in fact an analytical representation of f in terms of Z. Therefore, practically all statistical information can be retrieved from the gPC expansion in a straightforward manner, either analytically or with minimal computational effort.

Let us use a random process to illustrate the idea. Consider a process f(t, Z), Z ∈ R^d and t ∈ T, where T is an index set. For any fixed t ∈ T, let

f_N(t, Z) = Σ_{|i|≤N} f_i(t) Φ_i(Z) ∈ P_N^d

be an Nth-degree gPC approximation of f(t, Z); i.e., f_N ≈ f in a proper sense (e.g., mean square) for any t ∈ T. Then the mean of f can be approximated as

µ_f(t) ≜ E[f(t, Z)] ≈ E[f_N(t, Z)] = ∫ ( Σ_{|i|≤N} f_i(t) Φ_i(z) ) dF_Z(z) = f_0(t),    (5.33)

following the orthogonality of the gPC basis functions (5.22). The second moments, e.g., the covariance function, can be approximated by, for any t_1, t_2 ∈ T,

C_f(t_1, t_2) ≜ E[(f(t_1, Z) − µ_f(t_1))(f(t_2, Z) − µ_f(t_2))]
             ≈ E[(f_N(t_1, Z) − f_0(t_1))(f_N(t_2, Z) − f_0(t_2))]
             = Σ_{0<|i|≤N} γ_i f_i(t_1) f_i(t_2).    (5.34)

The variance of f can obviously be approximated by, for any t ∈ T,

var(f(t, Z)) = E[(f(t, Z) − µ_f(t))²] ≈ Σ_{0<|i|≤N} γ_i f_i²(t).    (5.35)

Other statistical quantities of f can also be readily approximated by applying their definitions directly to the gPC approximation f_N.
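For instance, a small routine of the following kind (an illustration assuming numpy; the names are hypothetical) evaluates (5.33)–(5.35) directly from a table of gPC coefficients; it is checked here on the single-variable Hermite expansion of f(t, Z) = e^{tZ}, whose exact coefficients and moments follow from example 5.5 with µ = 0, σ = t.

    import numpy as np
    from math import factorial

    def gpc_statistics(coeffs, gamma):
        """Mean, variance, and covariance from gPC coefficients via (5.33)-(5.35).
        coeffs has shape (n_times, n_basis) with the constant mode in column 0;
        gamma[i] = E[Phi_i^2] are the normalization factors."""
        mean = coeffs[:, 0]
        var = np.sum(gamma[1:] * coeffs[:, 1:]**2, axis=1)
        cov = (coeffs[:, 1:] * gamma[1:]) @ coeffs[:, 1:].T
        return mean, var, cov

    # check on f(t, Z) = exp(t*Z), Z ~ N(0,1): exact Hermite coefficients are t^k exp(t^2/2)/k!
    t = np.array([0.5, 1.0]); K = 20
    gamma = np.array([float(factorial(k)) for k in range(K + 1)])
    coeffs = np.exp(t[:, None]**2 / 2) * t[:, None]**np.arange(K + 1) / gamma
    mean, var, cov = gpc_statistics(coeffs, gamma)
    print(mean, np.exp(t**2 / 2))                       # means agree
    print(var, np.exp(2 * t**2) - np.exp(t**2))         # variances agree (up to truncation)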


Chapter Six

Stochastic Galerkin Method

In this chapter we discuss the generalized polynomial chaos (gPC) Galerkin method for solving stochastic systems. We first introduce the main idea via a general stochastic partial differential equation (PDE) and then illustrate more detailed properties of the method by applying it to several representative problems.

6.1 GENERAL PROCEDURE

Again, without loss of generality, we utilize the stochastic PDE system (4.13). For a physical domain D ⊂ R^ℓ, ℓ = 1, 2, 3, and T > 0, consider

u_t(x, t, Z) = L(u),    D × (0, T] × R^d,
B(u) = 0,               ∂D × [0, T] × R^d,            (6.1)
u = u_0,                D × {t = 0} × R^d,

where again L is the differential operator, B is the boundary condition operator, u_0 is the initial condition, and Z = (Z_1, . . . , Z_d) ∈ R^d, d ≥ 1, is a set of mutually independent random variables characterizing the random inputs to the governing equation. For ease of presentation, let us consider a scalar equation where

u(x, t, Z) : D × [0, T] × R^d → R.

For a system of equations, the gPC expansion will be applied to each component of u individually.

Let {Φ_k(Z)} be the gPC basis functions satisfying

E[Φ_i(Z)Φ_j(Z)] = δ_{ij} γ_i    (6.2)

and let P_N^d(Z) be the space of all polynomials of Z ∈ R^d of degree up to N. Then the gPC projection of the solution is, for any fixed (x, t),

u_N(x, t, Z) = Σ_{|i|=0}^{N} u_i(x, t) Φ_i(Z),    u_i(x, t) = (1/γ_i) E[u(x, t, Z) Φ_i(Z)].    (6.3)

Though this is the optimal (in the L²_{dF_Z} sense) approximation in P_N^d, it is not of practical use, since the projection requires knowledge of the unknown solution.

The stochastic Galerkin procedure is a straightforward extension of the classical Galerkin approach for deterministic equations. That is, we seek a solution in P_N^d such that the residue of (6.1) is orthogonal to the space P_N^d. By utilizing the gPC orthogonal basis functions (6.2), we obtain the following procedure: for any x and t, we seek v_N ∈ P_N^d in the form of

v_N(x, t, Z) = Σ_{|i|=0}^{N} v_i(x, t) Φ_i(Z),    (6.4)

such that for all k satisfying |k| ≤ N,

E[∂_t v_N(x, t, Z) Φ_k(Z)] = E[L(v_N) Φ_k],    D × (0, T],
E[B(v_N) Φ_k] = 0,                             ∂D × [0, T],        (6.5)
v_k = u_{0,k},                                 D × {t = 0},

where u_{0,k} = E[u_0 Φ_k]/γ_k are the gPC projection coefficients of the initial condition. Upon evaluating the expectations in (6.5), the dependence on Z disappears. The result is a system of (usually coupled) deterministic equations. The size of the system is dim P_N^d = \binom{N+d}{N}.

6.2 ORDINARY DIFFERENTIAL EQUATIONS

Let us use the ordinary differential equation in example 4.4 to illustrate the main steps of the gPC Galerkin method,

du/dt (t, Z) = −α(Z)u,    u(t = 0, Z) = β,

where the initial condition is assumed to be deterministic (for simplicity). We also assume that the random rate constant follows a normal distribution; i.e., α ∼ N(µ, σ²). The corresponding gPC basis will be the Hermite polynomials. Since α is the only random input, we need only the univariate gPC Hermite expansion {H_k(Z)}_{k=0}^{N}, N > 0, where Z ∼ N(0, 1) is the standard normal random variable with zero mean and unit variance. The constant α can be expressed as α = µ + σZ, or, in a more general form,

α_N(Z) = Σ_{i=0}^{N} a_i H_i(Z),

where

a_0 = µ,    a_1 = σ,    a_i = 0,    i > 1.

Usually α_N is an approximation of α. However, in this case it is an exact expression as long as N ≥ 1. Similarly, the initial condition has a trivial gPC projection,

β_N = Σ_{i=0}^{N} b_i H_i(Z),

where

b_0 = β,    b_i = 0,    i > 0,

which is exact for N > 0.


Let

v_N(t, Z) = Σ_{i=0}^{N} v_i(t) Φ_i(Z)

be the Nth-degree gPC approximation we seek. The gPC Galerkin procedure results in

E[ (dv_N/dt) H_k ] = E[−α_N v_N H_k],    ∀ k = 0, . . . , N.

Upon substituting in the gPC expressions for α_N and v_N, we obtain

dv_k/dt = −(1/γ_k) Σ_{i=0}^{N} Σ_{j=0}^{N} a_i v_j e_{ijk},    ∀ k = 0, . . . , N,    (6.6)

where

e_{ijk} = E[H_i(Z) H_j(Z) H_k(Z)],    0 ≤ i, j, k ≤ N,    (6.7)

are constants. Like the normalization factors γ_k, these constants can be evaluated prior to any computations. In fact, for Hermite polynomials these constants can be evaluated analytically:

γ_k = k!,    k ≥ 0,    (6.8)

e_{ijk} = i! j! k! / [(s − i)! (s − j)! (s − k)!],    if s ≥ i, j, k and 2s = i + j + k is even,    (6.9)

and e_{ijk} = 0 otherwise. For other types of gPC basis functions, analytical expressions for the constants may not exist. In such cases, one can use numerical quadrature rules with a sufficient number of points to compute the constants numerically but exactly, since the integrands are of polynomial form.

System (6.6) is thus a system of deterministic ordinary differential equations for the coefficients {v_k(t)} with initial conditions

v_k(0) = b_k,    0 ≤ k ≤ N.    (6.10)

The size of the system is N + 1, and the equations are coupled. Classical numerical methods, e.g., Runge-Kutta methods, can be applied, and usually the coupling of the system does not pose serious numerical challenges. (More details on numerical studies of this problem can be found in [120].)

We can also rewrite the system in a compact form by using vector notation. By taking the summation over i in (6.6) and defining

A_{jk} = −(1/γ_k) Σ_{i=0}^{N} a_i e_{ijk},

we let A = (A_{jk})_{0≤j,k≤N} be an (N + 1) × (N + 1) matrix. Denote v(t) = (v_0, . . . , v_N)^T; then (6.6) can be written as

dv/dt (t) = A^T v,    v(0) = b,    (6.11)

where b = (b_0, . . . , b_N)^T.
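A compact numerical illustration of the whole procedure is sketched below (assuming numpy and a standard RK4 time integrator, which the text does not prescribe): the constants (6.8)–(6.9) are tabulated, the matrix A of (6.11) is assembled, and the computed mean v_0(t) is compared with the exact mean E[βe^{−αt}] = βe^{−µt+σ²t²/2}.

    import numpy as np
    from math import factorial

    def e_ijk(i, j, k):
        """E[H_i H_j H_k] for probabilists' Hermite polynomials, formula (6.9)."""
        s2 = i + j + k
        if s2 % 2 or s2 // 2 < max(i, j, k):
            return 0.0
        s = s2 // 2
        return factorial(i) * factorial(j) * factorial(k) / (
            factorial(s - i) * factorial(s - j) * factorial(s - k))

    def galerkin_mean(mu_a, sigma_a, beta, t_end, N, dt=1e-3):
        """Integrate the gPC Galerkin system (6.11) for du/dt = -alpha*u, alpha ~ N(mu_a, sigma_a^2)."""
        a = np.zeros(N + 1); a[0], a[1] = mu_a, sigma_a            # alpha = mu + sigma*Z, exact for N >= 1
        gamma = np.array([float(factorial(k)) for k in range(N + 1)])
        A = np.array([[-sum(a[i] * e_ijk(i, j, k) for i in range(N + 1)) / gamma[k]
                       for k in range(N + 1)] for j in range(N + 1)])
        v = np.zeros(N + 1); v[0] = beta                           # deterministic initial condition (6.10)
        f = lambda y: A.T @ y
        for _ in range(int(round(t_end / dt))):                    # standard RK4 time stepping
            k1 = f(v); k2 = f(v + 0.5*dt*k1); k3 = f(v + 0.5*dt*k2); k4 = f(v + dt*k3)
            v = v + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
        return v[0]                                                # coefficient of H_0 = gPC mean

    t, mu_a, sigma_a = 1.0, 0.0, 0.5
    exact = np.exp(-mu_a * t + sigma_a**2 * t**2 / 2)              # E[exp(-alpha*t)], beta = 1
    for N in (2, 4, 6):
        print(N, abs(galerkin_mean(mu_a, sigma_a, 1.0, t, N) - exact))

The error in the mean decreases rapidly as N grows, reflecting the spectral accuracy of the expansion in Z.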


6.3 HYPERBOLIC EQUATIONS

Let us now consider a simple linear wave equation,

∂u(x, t, Z)/∂t = c(Z) ∂u(x, t, Z)/∂x,    x ∈ (−1, 1),  t > 0,    (6.12)

where c(Z) is a random transport velocity that is a function of a random variable Z ∈ R. For now we will leave the distribution of Z unspecified and study the general properties of the resulting gPC Galerkin system. The initial condition is given by

u(x, 0, Z) = u_0(x, Z).    (6.13)

The boundary conditions are more complicated, as they depend on the sign of the random transport velocity c(Z). A well-posed set of boundary conditions is given by

u(1, t, Z) = u_R(t, Z),    c(Z) > 0,
u(−1, t, Z) = u_L(t, Z),   c(Z) < 0.        (6.14)

The interesting issue to understand is how to properly pose the boundary conditions for the gPC Galerkin system.

Again a univariate gPC expansion is sufficient. For ease of analysis, let us use the normalized gPC basis functions,

E[Φ_i(Z)Φ_j(Z)] = δ_{ij},    0 ≤ i, j ≤ N.

Note that the normalization only requires dividing the nonnormalized basis by the square root of the normalization constants; it facilitates the theoretical analysis only. In practical implementations, one does not need to normalize the basis. With the gPC Galerkin method, we seek, for any (x, t),

v_N(x, t, Z) = Σ_{i=0}^{N} v_i(x, t) Φ_i(Z)    (6.15)

and conduct the projection

E[ (∂v_N(x, t, Z)/∂t) Φ_k(Z) ] = E[ c(Z) (∂v_N(x, t, Z)/∂x) Φ_k(Z) ]

for each of the first N + 1 gPC basis functions, k = 0, . . . , N. We obtain

∂v_k(x, t)/∂t = Σ_{i=0}^{N} a_{ik} ∂v_i(x, t)/∂x,    k = 0, . . . , N,    (6.16)

where

a_{ik} = E[c(Z)Φ_i(Z)Φ_k(Z)],    0 ≤ i, k ≤ N.    (6.17)

This is now a coupled system of wave equations of size N + 1, where the coupling is through the random wave speed. If we denote by A the (N + 1) × (N + 1) matrix whose entries are {a_{ik}}_{0≤i,k≤N}, then by definition a_{ik} = a_{ki} and A = A^T is symmetric. Let v = (v_0, . . . , v_N)^T be a vector of length N + 1; then system (6.16) can be written as

∂v(x, t)/∂t = A ∂v(x, t)/∂x.    (6.18)

It is now clear that system (6.18) is symmetric hyperbolic. Therefore, a complete set of real eigenvalues and eigenvectors exists. Moreover, we can understand the signs of the eigenvalues, which indicate the direction of wave propagation in the gPC Galerkin system (6.18), based on the sign of the wave speed in the original system (6.12).

Theorem 6.1. Consider the gPC Galerkin system (6.18) derived from the original system (6.12). If c(Z) ≥ 0 (respectively, c(Z) ≤ 0) for all Z, then the eigenvalues of A are all nonnegative (respectively, nonpositive); if c(Z) changes sign, i.e., c(Z) > 0 for some Z and c(Z) < 0 for other Z, then A has both positive and negative eigenvalues for sufficiently high gPC expansion order N.
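As a quick numerical illustration of Theorem 6.1 (not part of the text), the sketch below assembles A for a normalized Legendre basis with Z ∼ U(−1, 1) and two choices of c(Z), one sign-changing and one strictly positive, and inspects the signs of the eigenvalues; numpy is assumed.

    import numpy as np
    from numpy.polynomial.legendre import leggauss, legval

    def wave_matrix(c, N, nquad=64):
        """A_{ik} = E[c(Z) Phi_i Phi_k] for Z ~ U(-1,1) and the normalized Legendre basis."""
        z, w = leggauss(nquad)
        w = w / 2.0                                         # weights for the density 1/2 on (-1, 1)
        Phi = np.array([np.sqrt(2*n + 1) * legval(z, np.eye(N + 1)[n]) for n in range(N + 1)])
        return (Phi * (c(z) * w)) @ Phi.T                   # symmetric (N+1) x (N+1) matrix

    for label, c in (("c(Z) = Z (sign change)", lambda z: z),
                     ("c(Z) = 2 + Z (positive)", lambda z: 2.0 + z)):
        eigs = np.linalg.eigvalsh(wave_matrix(c, N=6))
        print(label, " eigenvalues in [%+.3f, %+.3f]" % (eigs.min(), eigs.max()))

For the sign-changing speed the eigenvalues straddle zero, while for the strictly positive speed they are all positive, as the theorem predicts.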

The proof can be found in [48].

A less trivial issue is how to impose the inflow-outflow boundary conditions for the hyperbolic system (6.18), especially when the wave speed changes sign in the original system (6.12). Note that the explicit information about the sign of the wave speed disappears in the gPC Galerkin system (6.18). Because (6.18) is symmetric hyperbolic, we can diagonalize the system and then impose boundary conditions based on the signs of the eigenvalues.

Since A is symmetric, there exists an orthogonal matrix S, with S^T = S^{−1}, such that S^T A S = Λ, where Λ is a diagonal matrix whose entries are the eigenvalues of A; i.e.,

Λ = diag(λ_0, . . . , λ_{j_+}, . . . , λ_{j_−}, . . . , λ_N).

Here the positive eigenvalues occupy indices j = 0, . . . , j_+, the negative ones occupy indices j = j_−, . . . , N, and the rest, if they exist, are zeros. Obviously, j_+, j_− ≤ N.

Denote q = (q_0, . . . , q_N)^T = S^T v, i.e., q_j(x, t) = Σ_{k=0}^{N} s_{kj} v_k(x, t), where s_{jk} are the entries of S; then we obtain

∂q(x, t)/∂t = Λ ∂q(x, t)/∂x.    (6.19)

The boundary conditions of this diagonal system are determined by the signs of the eigenvalues; i.e., we need to specify

q_j(1, t) = Σ_{k=0}^{N} s_{kj} u_k(1, t),    j = 0, . . . , j_+,
q_j(−1, t) = Σ_{k=0}^{N} s_{kj} u_k(−1, t),    j = j_−, . . . , N.


Figure 6.1 Convergence of the gPC Galerkin solution to the wave problem (6.12). Left: error convergence with respect to the order of gPC expansion at different times. Right: evolution of the solution in the mean-square norm in time at different gPC orders.

Here the coefficients u_k at the boundaries are determined by the exact gPC projection of the boundary conditions of u, i.e., u_R and u_L. Subsequently, the boundary conditions for the gPC Galerkin system of equations (6.16) are specified as

v(1, t) = S q(1, t),    v(−1, t) = S q(−1, t).

For vanishing eigenvalues, if they exist, no boundary conditions are required.

It can be shown that the solution of the gPC Galerkin system (6.18) converges to the exact solution. In fact, the following error bound was established (see theorem 2.2 in [48]):

E[‖u − v_N‖_2²] ≤ (C / N^{2m−1}) t,    (6.20)

where ‖ · ‖_2 is the standard L² norm in the physical domain (−1, 1), C is a constant independent of N, t is time, and m > 0 is a real constant depending on the smoothness of u in terms of Z.

A notable feature of the error bound is that the error depends on time in a linear manner. This implies that for any fixed gPC expansion order N, the error will grow linearly in time. This can be seen in figure 6.1, where the gPC Galerkin method is applied to a simpler version of (6.12) with c(Z) = Z, a uniform random variable in (−1, 1), periodic boundary conditions in space in the domain 0 ≤ x ≤ 2π, and initial condition u(x, 0, Z) = cos(x). The exact solution is u_ex = cos(x − Zt). On the left side of figure 6.1, while we again see the exponential error convergence as the gPC order is increased, the error is bigger at larger times and requires higher expansion orders to reach the converging regime. The time dependence becomes more evident on the right side of figure 6.1, where the evolution of the mean-square norm of the solutions is plotted. We observe that the gPC Galerkin solutions deviate from the exact solution after a certain time. The time at which accuracy is lost, i.e., the errors become O(1), is roughly proportional to the order of the gPC expansion.


It is important to realize the following facts.

• The linear growth of error is not a result of the boundary condition treatment. This is obvious from the results in figure 6.1, because they are obtained from a problem with periodic boundary conditions (which require no special treatment).

• Furthermore, the linear growth of error is not a result of the Galerkin approach. In fact, a direct gPC orthogonal projection of the exact solution u_ex = cos(x − Zt) would require more and more basis functions in order to keep a fixed projection error. This is because, as far as the expansion in Z is concerned, the time t behaves as a wave number; a larger time t thus requires a finer representation. This is a fundamental property of approximation theory.

6.4 DIFFUSION EQUATIONS

Let us consider a time-dependent stochastic diffusion equation,

∂u/∂t (x, t, Z) = ∇_x · (κ(x, Z)∇_x u) + f(x, t),    x ∈ D,  t ∈ (0, T],
u(x, 0, Z) = u_0(x),    u|_∂D = 0.                                      (6.21)

Here we use ∇_x to explicitly specify that the differentiation is in the physical coordinates x. We assume that the only source of uncertainty is the diffusivity field κ, which causes coupling of the resulting gPC Galerkin system of equations. Uncertainty in the source term f and the initial condition u_0 does not cause coupling and can be dealt with easily. We assume the diffusivity field takes the form

κ(x, Z) = κ_0(x) + Σ_{i=1}^{d} κ_i(x) Z_i,    (6.22)

where the κ_i(x) are deterministic functions obtained by applying a parameterization procedure (e.g., the Karhunen-Loeve expansion) to the diffusivity field, and Z = (Z_1, . . . , Z_d) are mutually independent random variables with specified probability distributions. Alternatively, we can write (6.22) as

κ(x, Z) = Σ_{i=0}^{d} κ_i(x) Z_i,    (6.23)

where Z_0 ≡ 1 is fixed. For the problem to be well posed, we require

κ(x, Z) ≥ κ_min > 0,    ∀ x, Z.    (6.24)

Such a requirement obviously excludes probability distributions of Z that take negative values with nonzero probability, e.g., Gaussian distributions.

Again we seek an Nth-degree gPC approximation

v_N(t, x, Z) = Σ_{|k|=0}^{N} v_k(t, x) Φ_k(Z),


where E[Φ_i(Z)Φ_j(Z)] = δ_{ij}. Here we again choose to normalize the gPC basis first. For ease of exposition, let us adopt the single-index notation discussed in section 5.2 and rewrite the gPC expansion as

v_N(t, x, Z) = Σ_{i=1}^{M} v_i(t, x) Φ_i(Z),    M = \binom{N+d}{N},    (6.25)

where the index i is determined by a proper ordering, e.g., the graded lexicographic ordering, of the multi-index i ∈ N_0^d.

tion (6.21) and projecting the resulting equation onto the subspace spanned by thefirst M gPC basis polynomials, we obtain, for all k = 1, . . . , M ,

∂vk

∂t(t, x) =

d∑i=0

M∑j=1

∇x · (κi(x)∇x vj )eijk + fk(t, x)

=M∑

j=1

∇x · (ajk(x)∇x vj

) + fk(t, x), (6.26)

where

eijk = E[Zi�j �k] =∫

zi�j (z)�k(z)dFZ(z), 0 ≤ i ≤ d, 1 ≤ j, k ≤ M,

ajk(x) =d∑

i=0

κi (x)eijk, 1 ≤ j, k ≤ M, (6.27)

and fk(t, x) are the gPC projection coefficients for the source term f (x, t). (Forthe simple case of deterministic f considered here, f1 = f and fk = 0 for k > 1.)

The gPC Galerkin system (6.26) is a coupled system of diffusion equations. Itcan be put into a compact form by using vector matrix notation. Let us denotev = (v1, . . . , vM)T , f = (f1, . . . , fM)T , and A(x) = (ajk)1≤j,k≤M . By definition,A = AT is symmetric. The system (6.26) can be written as

∂v∂t

(t, x) = ∇x · [A(x)∇xv] + f, (t, x) ∈ (0, T ] × D,

v(0, x) = v0(x), v|∂D = 0,

(6.28)

where v0(x) = (u0,1, . . . , u0,M)T is the gPC projection coefficient vector of theinitial condition u0(x) in (6.21). For the deterministic initial condition consideredhere, u0,1 = u0(x) and u0,k = 0 for k > 1.

The coupling of the diffusion terms in (6.28) does not pose a serious problem if the system is solved explicitly in time. However, an explicit time integration usually imposes a severe restriction on the size of the time step because of concerns about numerical stability. To circumvent this difficulty, one can employ a semi-implicit scheme where the diagonal terms of A are treated implicitly and the off-diagonal terms of A are treated explicitly. This results in a naturally uncoupled system to solve, with no loss of accuracy in time integration. For example, a first-order Euler forward-backward semi-implicit scheme takes the form

(v^{n+1} − v^n)/Δt − ∇_x · [D(x)∇_x v^{n+1}] = ∇_x · [S(x)∇_x v^n] + f^{n+1},    (6.29)

where the superscript n denotes numerical solutions at time level t_n, Δt is the time step, and

D = diag(A),    A = D + S.
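The sketch below illustrates one step of this splitting on a toy 1D finite-difference discretization with an x-independent, symmetric stand-in for the matrix (a_{jk}) (an assumption for illustration only; in an actual computation A would be assembled from (6.27)). Each gPC mode then requires only an independent implicit solve per time step.

    import numpy as np

    def semi_implicit_step(v, A, L, f, dt):
        """One step of (6.29) for dv/dt = (A kron L) v + f with A = D + S:
        diagonal part D implicit, off-diagonal part S explicit, so the M gPC modes decouple."""
        M, nx = v.shape
        D = np.diag(A); S = A - np.diag(D)
        rhs = v + dt * (S @ v) @ L.T + dt * f               # explicit off-diagonal coupling and source
        v_new = np.empty_like(v)
        for k in range(M):                                  # one independent implicit solve per mode
            v_new[k] = np.linalg.solve(np.eye(nx) - dt * D[k] * L, rhs[k])
        return v_new

    # toy setup: 1D Laplacian (homogeneous Dirichlet) and an x-independent coupling matrix
    nx, M, dt = 50, 4, 1e-3
    h = 1.0 / (nx + 1)
    L = (np.diag(-2.0 * np.ones(nx)) + np.diag(np.ones(nx - 1), 1) + np.diag(np.ones(nx - 1), -1)) / h**2
    A = np.eye(M) + 0.1 * (np.ones((M, M)) - np.eye(M))     # symmetric stand-in for (a_jk)
    f = np.zeros((M, nx)); f[0] = 1.0                       # deterministic source: f_1 = f, others 0
    v = np.zeros((M, nx))
    for _ in range(200):
        v = semi_implicit_step(v, A, L, f, dt)
    print("mean mode after 200 steps, max value:", v[0].max())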

Similarly, if we consider the steady-state counterpart of (6.21),

−∇_x · (κ(x, Z)∇_x u(x, Z)) = f(x),    x ∈ D;    u(x, Z)|_∂D = 0,    (6.30)

we find that the gPC Galerkin system is

−∇_x · [A(x)∇_x v] = f,    x ∈ D;    v|_∂D = 0.    (6.31)

This is a coupled system of elliptic equations. By using the separation of diagonal and off-diagonal terms of A, an efficient iterative scheme can be designed to solve the system as an uncoupled set of equations. These algorithms were first proposed in [119, 122] and later analyzed in [128].

6.5 NONLINEAR PROBLEMS

The above examples all involve linear problems. This does not imply that the gPC Galerkin method can be applied only to linear problems. (In fact, as far as the random space is concerned, none of the examples is linear, because the randomness enters the equations in a multiplicative manner.)

Let us consider the Burgers' equation from the supersensitivity example in section 1.1.1 to illustrate the application of the gPC Galerkin method to nonlinear problems:

u_t + u u_x = ν u_xx,    x ∈ [−1, 1],
u(−1) = 1 + δ(Z),    u(1) = −1,            (6.32)

where δ(Z) > 0 is a random perturbation to the left boundary condition at x = −1 and ν > 0 is the viscosity. Again this requires only a one-dimensional gPC expansion. We seek

v_N(x, t, Z) = Σ_{i=0}^{N} v_i(x, t) Φ_i(Z)

such that

E[ (∂v_N/∂t) Φ_k ] + E[ v_N (∂v_N/∂x) Φ_k ] = ν E[ (∂²v_N/∂x²) Φ_k ],    k = 0, . . . , N.

By substituting v_N into the equation and using the orthogonality relation of the basis functions, we obtain

∂v_k/∂t + (1/γ_k) Σ_{i=0}^{N} Σ_{j=0}^{N} v_i (∂v_j/∂x) e_{ijk} = ν ∂²v_k/∂x²,    k = 0, . . . , N,    (6.33)


where e_{ijk} = E[Φ_i Φ_j Φ_k] are constants and γ_k = E[Φ_k²] are the normalization constants (which will be 1 if the basis functions are normalized).

This is a coupled system of equations where each equation resembles the original Burgers' equation and the coupling is through the nonlinear term. The classical semi-implicit scheme can be applied to solve the system in time, where the nonlinear coupling terms are treated explicitly and the diffusion terms implicitly. For more details, see [123].

The nonlinear term u u_x in the Burgers' equation is in quadratic form and results in a gPC projection

E[ v_N (∂v_N/∂x) Φ_k ] = Σ_{i=0}^{N} Σ_{j=0}^{N} v_i (∂v_j/∂x) E[Φ_i Φ_j Φ_k]

that can be easily evaluated as long as the term is treated explicitly in time. In many cases, however, nonlinear terms in a system do not take polynomial form, and a direct gPC projection is not straightforward. For example, let us consider the projection of a nonlinear term e^u, where u is the unknown solution. A gPC Galerkin projection requires us to evaluate

E[e^{v_N} Φ_k] = ∫ e^{Σ_i v_i Φ_i(z)} Φ_k(z) dF_Z(z),    (6.34)

where v_N is the Nth-degree gPC approximation of u. It is clear that the integral over z cannot be separated from the summation over i, as is possible for polynomial-type nonlinearities.

A feasible treatment for such kinds of nonlinearity is to approximate the integral (6.34) numerically. To this end, one can employ a quadrature rule, or a cubature rule in multivariate cases, with sufficient accuracy. That is,

E[e^{v_N} Φ_k] ≈ Σ_{j=1}^{Q} e^{v_N(z^{(j)})} Φ_k(z^{(j)}) w^{(j)},    (6.35)

where z^{(j)} and w^{(j)} are the nodes and weights of the integration rule in the domain defined by the integral. Note that since v_N(Z) takes a known polynomial form, the evaluation of e^{v_N} at any node is a simple exercise in polynomial evaluation.
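A univariate Hermite version of this quadrature treatment is sketched below (illustrative; numpy assumed). It is checked against the case v_N(Z) = µ + σZ, for which E[e^{v_N}H_k] = e^{µ+σ²/2}σ^k is known in closed form (cf. example 5.5).

    import numpy as np
    from numpy.polynomial.hermite_e import hermegauss, hermeval

    def exp_projection(v_coeffs, N, nquad=60):
        """Approximate E[exp(v_N(Z)) H_k(Z)], k = 0..N, by the quadrature rule (6.35),
        for a univariate Hermite expansion v_N(Z) = sum_i v_i H_i(Z) with Z ~ N(0,1)."""
        z, w = hermegauss(nquad)
        w = w / np.sqrt(2 * np.pi)                          # weights for the standard normal density
        vN = hermeval(z, v_coeffs)                          # polynomial evaluation at the nodes
        return np.array([np.sum(w * np.exp(vN) * hermeval(z, np.eye(N + 1)[k]))
                         for k in range(N + 1)])

    mu, sigma, N = 0.2, 0.4, 4
    print(exp_projection(np.array([mu, sigma]), N))                       # quadrature values
    print([np.exp(mu + sigma**2 / 2) * sigma**k for k in range(N + 1)])   # closed-form reference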


Chapter Seven

Stochastic Collocation Method

In this chapter we discuss the basic ideas behind the stochastic collocation (SC) method, also referred to as the probabilistic collocation method (PCM). Collocation methods are a popular choice for complex systems where well-established deterministic codes exist. We first clarify the notion of stochastic collocation, for the purposes of this book, and then discuss the major numerical approaches. As in the rest of the book, we discuss only the fundamental aspects of SC and leave the more advanced research issues untouched. This is particularly true for this chapter because SC has undergone rapid development after its systematic introduction in [118].

7.1 DEFINITION AND GENERAL PROCEDURE

In deterministic numerical analysis, collocation methods are those that require the residue of the governing equations to be zero at discrete nodes in the computational domain. The nodes are called collocation points. The same definition can be extended to stochastic simulations. Let us use the stochastic partial differential equation (PDE) system (4.13) again to explain the idea,

u_t(x, t, Z) = L(u),    D × (0, T] × I_Z,
B(u) = 0,               ∂D × [0, T] × I_Z,            (7.1)
u = u_0,                D × {t = 0} × I_Z,

where I_Z ⊂ R^d, d ≥ 1, is the support of Z. For any given x and t, let w(·, Z) be a numerical approximation of u. In general, w(·, Z) ≈ u(·, Z) in a proper sense in I_Z, and the system (7.1) cannot be satisfied for all Z after substituting w for u.

Let Θ_M = {Z^{(j)}}_{j=1}^{M} ⊂ I_Z be a set of (prescribed) nodes in the random space, where M ≥ 1 is the number of nodes. Then in the collocation method, for all j = 1, . . . , M, we enforce (7.1) at the node Z^{(j)} by solving

u_t(x, t, Z^{(j)}) = L(u),    D × (0, T],
B(u) = 0,                     ∂D × [0, T],            (7.2)
u = u_0,                      D × {t = 0}.

It is easy to see that for each j, (7.2) is a deterministic problem, because the value of the random parameter Z is fixed. Therefore, solving the system poses no difficulty, provided one has a well-established deterministic algorithm. Let u^{(j)} = u(·, Z^{(j)}), j = 1, . . . , M, be the solution of the above problem. The result of solving (7.2) is


an ensemble of deterministic solutions {u^{(j)}}_{j=1}^{M}, and one can apply various postprocessing operations to the ensemble to extract useful information about u(Z).

From this point of view, all classical sampling methods belong to the class of collocation methods. For example, in Monte Carlo sampling, the nodal set Θ_M is generated randomly according to the distribution of Z, and the ensemble average is used to estimate the solution statistics, e.g., mean and variance. In deterministic sampling methods, the nodal set is typically the set of nodes of a cubature rule (i.e., a quadrature rule in multidimensional space) defined on I_Z, so that one can use the integration rule defined by the cubature to estimate the solution statistics. Convergence of these classical sampling methods is then based on the convergence of solution statistics, e.g., moments, resulting in convergence in a weak measure such as convergence in distribution.

In this book we do not label the classical sampling methods as stochastic collocation. Instead we reserve the term "stochastic collocation" for the type of collocation methods that result in strong convergence, e.g., mean-square convergence, to the true solution. This is typically achieved by utilizing classical multivariate approximation theory to strategically locate the collocation nodes and construct a polynomial approximation to the solution.

Definition 7.1 (Stochastic collocation). Let Θ_M = {Z^{(j)}}_{j=1}^{M} ⊂ I_Z be a set of (prescribed) nodes in the random space, where M ≥ 1 is the number of nodes, and let {u^{(j)}}_{j=1}^{M} be the solutions of the governing equation (7.2). Then find w(Z) ∈ Π(Z) in a proper polynomial space Π(Z) such that w(Z) is an approximation to the true solution u(Z) in the sense that ‖w(Z) − u(Z)‖ is sufficiently small in a strong norm defined on I_Z.

Convergence of stochastic collocation thus requires

‖w(Z) − u(Z)‖ → 0,    M → ∞,

where the norm is to be determined and is typically an L^p norm.

As of the writing of this book, there exist two major approaches to high-order stochastic collocation: the interpolation approach and the discrete projection approach (the pseudospectral approach).

7.2 INTERPOLATION APPROACH

Interpolation is a natural approach to the stochastic collocation problem defined in definition 7.1. The problem can now be posed as follows: given the nodal set Θ_M ⊂ I_Z and {u^{(j)}}_{j=1}^{M}, find a polynomial w(Z) ∈ Π(Z) such that w(Z^{(j)}) = u^{(j)} for all j = 1, . . . , M.

The goal can be easily accomplished, at least in principle. One way is to use a Lagrange interpolation approach. That is, let

w(Z) = Σ_{j=1}^{M} u(Z^{(j)}) L_j(Z),    (7.3)


where

L_j(Z^{(i)}) = δ_{ij},    1 ≤ i, j ≤ M,    (7.4)

are the Lagrange interpolating polynomials. The approach, albeit straightforward in formulation, can become nontrivial in practice. This is mostly due to the fact that, unlike the situation in univariate interpolation, where ample mathematical theory exists, many fundamental issues of multivariate Lagrange interpolation (when d > 1) are not clear. Issues such as the existence of Lagrange interpolating polynomials for an arbitrary given set of nodes are not well understood.

The other way is a matrix inversion approach, where we prescribe the polynomial interpolating basis first. For example, let us use a set of gPC polynomial bases Φ_k(Z) and construct

w_N(Z) = ∑_{|k|=0}^{N} ŵ_k Φ_k(Z)

as the gPC approximation of u(Z). The interpolation condition w(Z^{(j)}) = u^{(j)} results in the following problem for the unknown expansion coefficients:

A^T ŵ = u,

where

A = (Φ_k(Z^{(j)})),    0 ≤ |k| ≤ N,  1 ≤ j ≤ M,

is the Vandermonde-like coefficient matrix, ŵ is the vector of the expansion coefficients, and u = (u(Z^{(1)}), . . . , u(Z^{(M)}))^T. To prevent the problem from becoming underdetermined, we require the number of collocation points not to be smaller than the number of gPC expansion terms, i.e., M ≥ \binom{N+d}{N}. The advantage of the matrix inversion approach is that the interpolating polynomials are prescribed and well defined. Once the nodal set is given, the existence of the interpolation can always be determined in the spirit of determining whether the determinant of A is zero. However, an important and very practical concern is the accuracy of the interpolation. Even though the interpolation has no error at the nodal points, the error can become wild between the nodes. This is particularly true in high-dimensional spaces. Here again, we find rigorous analysis lacking and no general (and sound) guideline for choosing the location of the nodes. Many ad hoc choices do exist, for example, those based on design of experiments (DoE) principles. However, none has become satisfactory for general purposes.
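As an illustration of the matrix inversion approach, the following sketch builds the Vandermonde-like matrix A for a total-degree Legendre gPC basis in d = 2 and determines the expansion coefficients from nodal data. The model u(Z), the random choice of nodes, and the use of a least-squares solve when M exceeds the number of basis terms are assumptions made here for illustration only; they are not prescribed by the text.

```python
# Sketch of the Vandermonde-like (matrix inversion) approach for a
# hypothetical model u(Z) with d = 2 uniform inputs and total degree N.
import numpy as np
from numpy.polynomial.legendre import legval
from itertools import product

def multi_indices(N, d):
    """All multi-indices k with |k| = k1 + ... + kd <= N."""
    return [k for k in product(range(N + 1), repeat=d) if sum(k) <= N]

def gpc_basis_matrix(nodes, idx):
    """A[j, m] = Phi_{k_m}(Z^{(j)}), products of 1D Legendre polynomials."""
    M, d = nodes.shape
    A = np.ones((M, len(idx)))
    for m, k in enumerate(idx):
        for i in range(d):
            c = np.zeros(k[i] + 1); c[-1] = 1.0      # coefficients of P_{k_i}
            A[:, m] *= legval(nodes[:, i], c)
    return A

def u_exact(z):                                       # hypothetical model
    return np.exp(z[:, 0]) * np.cos(np.pi * z[:, 1])

N, d = 4, 2
idx = multi_indices(N, d)                             # dim P_N^d = C(N+d, N) = 15
rng = np.random.default_rng(0)
M = 2 * len(idx)                                      # require M >= number of terms
nodes = rng.uniform(-1.0, 1.0, size=(M, d))           # ad hoc random nodes
A = gpc_basis_matrix(nodes, idx)
w, *_ = np.linalg.lstsq(A, u_exact(nodes), rcond=None)  # least-squares variant

# evaluate the resulting gPC approximation at new points
test = rng.uniform(-1.0, 1.0, size=(1000, d))
err = np.abs(gpc_basis_matrix(test, idx) @ w - u_exact(test)).max()
print(len(idx), M, err)
```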

Since univariate interpolation is a well-studied topic, one solution to multivariate interpolation is to employ a univariate interpolation and then fill up the entire space dimension by dimension. By doing so the properties and error estimates of univariate interpolation can be retained as much as possible. In fact, the aforementioned two approaches, the Lagrange interpolation and matrix inversion approaches, are direct conceptual extensions of the univariate interpolation techniques in section 3.4. Let us recall that in the univariate case d = 1, i.e., Z ∈ R. Let (Z^{(1)}, . . . , Z^{(N+1)}) be a set of distinct nodes and let (u^{(1)}, . . . , u^{(N+1)}) be the solution at the nodes. Then an interpolation polynomial Π_N f(Z) that interpolates a given function f(Z) can be constructed either in the Lagrange form (3.42) or by inverting the Vandermonde matrix to obtain (3.43). Note that the two are equivalent because of the uniqueness of univariate interpolation. It is also understood that the interpolating nodes offering high accuracy are the zeros of the orthogonal polynomials {Φ_k(Z)}.

7.2.1 Tensor Product Collocation

For multivariate cases with d > 1, for any 1 ≤ i ≤ d, let Q^{m_i} be an interpolating operator such that

Q^{m_i}[f] = Π_{m_i} f(Z_i) ∈ P_{m_i}(Z_i)

is an interpolating polynomial for a given function f in the Z_i variable, constructed by using the m_i distinct nodes in the set Θ_1^{m_i} = {Z_i^{(1)}, . . . , Z_i^{(m_i)}}. Then the most straightforward approach to interpolating f in the entire space I_Z ⊂ R^d is to use a tensor product approach. That is,

Q_M = Q^{m_1} ⊗ · · · ⊗ Q^{m_d},    (7.5)

and the nodal set is

Θ_M = Θ_1^{m_1} × · · · × Θ_1^{m_d},    (7.6)

where the total number of nodes is M = m_1 × · · · × m_d.

By using the tensor product construction, all the properties of the underlying one-dimensional interpolation scheme can be retained, and an error estimate in the entire space can be easily derived. For example, let us assume that the number of points in each dimension is a constant, i.e., m_1 = · · · = m_d = m, and that the one-dimensional interpolation error in each dimension 1 ≤ i ≤ d follows

(I − Q^{m_i})[f] ∝ m^{−α},

where the constant α > 0 depends on the smoothness of the function f. Then the overall interpolation error also follows the same convergence rate,

(I − Q_M)[f] ∝ m^{−α}.

However, if we measure the convergence in terms of the total number of points, M = m^d in this case, then

(I − Q_M)[f] ∝ M^{−α/d},    d ≥ 1.

For large dimensions d ≫ 1, the rate of convergence deteriorates drastically and we observe very slow convergence, if any, in terms of the total number of collocation points. Moreover, the total number of points,

M = m^d,

grows very fast for large d. This poses a numerical challenge because each collocation point requires a simulation of the full-scale underlying deterministic system, which can be time-consuming. This is the well-known curse of dimensionality. For this reason, tensor product construction is mostly used for low-dimensional problems with d typically less than 5. A detailed theoretical analysis for stochastic diffusion equations can be found in [7].
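The following sketch implements the tensor product construction (7.5)-(7.6) for d = 2 with Gauss-Legendre nodes in each direction; the smooth test function f is a hypothetical stand-in for the solution values u^{(j)} that would come from deterministic solves at the nodes.

```python
# Tensor-product Lagrange interpolation, (7.5)-(7.6), in d = 2.
import numpy as np
from numpy.polynomial.legendre import leggauss

def lagrange_basis(nodes, j, x):
    """L_j(x) for the 1D node set `nodes`: L_j(nodes[i]) = delta_ij."""
    L = np.ones_like(x, dtype=float)
    for i, zi in enumerate(nodes):
        if i != j:
            L *= (x - zi) / (nodes[j] - zi)
    return L

m1, m2 = 8, 8
z1, _ = leggauss(m1)                       # m1 nodes in the Z1 direction
z2, _ = leggauss(m2)                       # m2 nodes in the Z2 direction
f = lambda a, b: np.exp(a) * np.sin(np.pi * b)   # hypothetical smooth "solution"
F = f(z1[:, None], z2[None, :])            # values on the M = m1*m2 tensor grid

def interp(x, y):
    """w(x, y) = sum_j sum_k F[j, k] L_j(x) L_k(y)."""
    Lx = np.array([lagrange_basis(z1, j, np.atleast_1d(x)) for j in range(m1)])
    Ly = np.array([lagrange_basis(z2, k, np.atleast_1d(y)) for k in range(m2)])
    return np.einsum('jk,jn,kn->n', F, Lx, Ly)

xt = np.random.default_rng(1).uniform(-1, 1, size=(200, 2))
err = np.abs(interp(xt[:, 0], xt[:, 1]) - f(xt[:, 0], xt[:, 1])).max()
print("total nodes M =", m1 * m2, " max error =", err)
```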


7.2.2 Sparse Grid Collocation

An alternative approach is Smolyak sparse grids. A detailed presentation of the Smolyak construction, originally proposed in [96], is beyond the scope of this entry-level textbook, and we refer interested readers to the many more recent studies. It is sufficient, for the purposes of this book, to know that the Smolyak sparse grids are still based on tensor product construction but are only a subset of the full tensor grids. The construction takes the following form ([114]):

Q_N = ∑_{N−d+1 ≤ |i| ≤ N} (−1)^{N−|i|} · \binom{d−1}{N−|i|} · (Q^{i_1} ⊗ · · · ⊗ Q^{i_d}),    (7.7)

where N ≥ d is an integer denoting the level of the construction. Though the expression is rather complex, (7.7) is nevertheless a combination of subsets of the full tensor grids. The nodal set, the sparse grids, is

Θ_M = ⋃_{N−d+1 ≤ |i| ≤ N} (Θ_1^{i_1} × · · · × Θ_1^{i_d}).    (7.8)

Again it is clear that this is the union of a collection of subsets of the full tensor grids. Unfortunately, there is usually no explicit formula to determine the total number of nodes M in terms of d and N.

Because the construction in (7.7) employs one-dimensional interpolations with various numbers of nodes, it is preferable that the one-dimensional nodal sets be nested. That is, the one-dimensional nodal sets satisfy

Θ_1^i ⊂ Θ_1^j,    i < j.    (7.9)

If this condition is met, then the total number of nodes in (7.8) can reach a minimum. However, in practice, since one-dimensional nodes are typically the zeros of orthogonal polynomials, the nested condition (7.9) is usually not satisfied.

One popular choice of nested grids is Clenshaw-Curtis nodes, which are the extrema of Chebyshev polynomials and are defined as, for any 1 ≤ i ≤ d,

Z_i^{(j)} = −cos( π(j − 1) / (m_i^k − 1) ),    j = 1, . . . , m_i^k,    (7.10)

where an additional index is introduced via the superscript k. With a slight abuse of notation, we will use k to index the point sets instead of using the total number of points m_i. Let the point sets double with an increasing index k > 1, i.e., m_i^k = 2^{k−1} + 1, and define m_i^1 = 1 and Z_i^{(1)} = 0. It is easy to see that because of the doubling of the nodes we have Θ_1^k ⊂ Θ_1^{k+1} and that the sets are nested. The additional index k here is often referred to as the level of the Clenshaw-Curtis grids. The higher the level, the finer the grids. For a more detailed discussion of Clenshaw-Curtis nodes, see [27].

By using Clenshaw-Curtis grids as the one-dimensional nodes, the Smolyak construction (7.7) can be expressed in terms of the level k as well. Let N = d + k, where k ≥ 0; then the "nestedness" of the base one-dimensional nodes can be retained, Θ^k ⊂ Θ^{k+1}. Again, here we do not use the total number of nodes M, which does not have an explicit, closed-form expression in terms of d and k, to index the sets.


Figure 7.1 Two-dimensional (d = 2) nodes based on the same one-dimensional extrema of Chebyshev polynomials at level k = 5. Left: tensor grids. The total number of points is 1089. Right: Smolyak sparse grids. The total number of nodes is 145.

In the more interesting case of high-dimensional spaces, the total number of points satisfies the following estimate:

M = #Θ_k ∼ 2^k d^k / k!,    d ≫ 1.    (7.11)

It has been shown ([9]) that interpolation through the Clenshaw-Curtis sparse grids is exact if the function is in P_k^d. (In fact, the polynomial space for which the interpolation is exact is slightly bigger than P_k^d.) For large dimensions d ≫ 1, dim P_k^d = \binom{d+k}{d} ∼ d^k / k!. Therefore, the number of points from (7.11) is larger by about a factor of 2^k, and this factor is independent of the dimension d. For this reason, the Clenshaw-Curtis-based sparse grid construction is sometimes regarded as optimal in high dimensions. There have been extensive studies on the approximation properties of sparse grids, particularly those based on Clenshaw-Curtis nodes. Here we cite one of the early results from [9]. For functions in the space F_d^ℓ = {f : [−1, 1]^d → R | ∂^{|i|} f continuous, i_j ≤ ℓ, ∀j}, the interpolation error follows

‖I − Q_M‖_∞ ≤ C_{d,ℓ} M^{−ℓ} (log M)^{(ℓ+2)(d−1)+1},

where M is the total number of nodes. Compared to that for tensor grids, the curse of dimensionality, albeit still present, is lessened.

An example of two-dimensional sparse grids is shown in figure 7.1, where weobserve a significant reduction in the number of nodes.
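A minimal sketch of the one-dimensional Clenshaw-Curtis sets (7.10), assuming the level-to-size map m^k = 2^{k−1} + 1 described above; it also checks the nestedness that makes these nodes attractive for the Smolyak construction.

```python
# Nested Clenshaw-Curtis node sets, eq. (7.10): level k = 1 gives the single
# node {0}; level k > 1 gives m_k = 2**(k-1) + 1 Chebyshev extrema.
import numpy as np

def clenshaw_curtis(k):
    if k == 1:
        return np.array([0.0])
    m = 2 ** (k - 1) + 1
    j = np.arange(1, m + 1)
    return -np.cos(np.pi * (j - 1) / (m - 1))

# check the nestedness Theta_1^k subset of Theta_1^{k+1}
for k in range(1, 5):
    coarse, fine = clenshaw_curtis(k), clenshaw_curtis(k + 1)
    nested = all(np.isclose(fine, z).any() for z in coarse)
    print(f"level {k}: {coarse.size} nodes, nested in level {k+1}: {nested}")
```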

7.3 DISCRETE PROJECTION: PSEUDOSPECTRAL APPROACH

Another approach to achieving the goal of stochastic collocation, as defined in definition 7.1, is to conduct discrete projection, or the pseudospectral approach (as it is termed in [116]). To this end, let us first recall the notion of the quadrature rule in section 3.5 and extend it to multivariate space. Often termed a cubature rule, it is an integration rule that seeks to approximate an integral

∫ f(z) dF_Z(z),    z ∈ R^d,  d > 1,    (7.12)

by

U_Q[f] ≜ ∑_{j=1}^{Q} f(z^{(j)}) α^{(j)},    Q ≥ 1,    (7.13)

where (z^{(j)}, α^{(j)}), j = 1, . . . , Q, are the nodes and their corresponding weights. The integration rule is convergent if it converges to the integral (7.12) as Q → ∞. Typically, the accuracy of an integration rule is measured by its polynomial exactness. An integration rule of degree m implies that the approximation (7.13) is exact for any integrand f that is a polynomial of degree up to m and is not exact for at least one polynomial of degree m + 1. Hereafter we will use the terms integration rule and cubature rule interchangeably, with the understanding that in univariate cases it reduces to the quadrature rule.

To conduct discrete gPC projection, we recall the exact orthogonal gPC projection of u(Z), the solution of (7.1),

u_N(Z) = P_N u = ∑_{|k|=0}^{N} û_k Φ_k(Z),    (7.14)

where the expansion coefficients are obtained as

û_k = (1/γ_k) E[u(Z) Φ_k(Z)] = (1/γ_k) ∫ u(z) Φ_k(z) dF_Z(z),    ∀ |k| ≤ N,    (7.15)

where γ_k = E[Φ_k^2] are the normalization constants of the basis.

The idea of discrete projection is to approximate the integrals in the expansion coefficients (7.15) of the continuous generalized polynomial chaos (gPC) projection (7.14) by an integration rule. The discrete projection of the solution of (7.1) is

w_N(Z) = ∑_{|k|=0}^{N} ŵ_k Φ_k(Z),    (7.16)

where the expansion coefficients are

ŵ_k = (1/γ_k) U_Q[u(Z) Φ_k(Z)] = (1/γ_k) ∑_{j=1}^{Q} u(z^{(j)}) Φ_k(z^{(j)}) α^{(j)}.    (7.17)

It is clear that by using the cubature rule U_Q the coefficients {ŵ_k} are approximations to the exact projection coefficients {û_k} in (7.15). Subsequently, the discrete projection w_N(Z) approximates the continuous projection u_N(Z) of (7.14). Moreover, if the cubature rule is convergent, then ŵ_k converges to û_k as Q → ∞, and w_N and u_N become identical. The following result is then straightforward.


Proposition 7.2. For u(Z) ∈ L^2_{dF_Z}(I_Z), let u_N(Z) be the gPC projection defined in (7.14) and (7.15) and let w_N(Z) be the discrete gPC projection defined in (7.16) and (7.17). Assume that the cubature rule U_Q used in (7.17) is convergent; then as Q → ∞, ŵ_k → û_k for all |k| ≤ N, and

w_N(Z) → u_N(Z),    ∀Z.    (7.18)

The error induced by w_N can be easily separated by the triangle inequality.

Proposition 7.3. For u(Z) ∈ L^2_{dF_Z}(I_Z), let u_N(Z) be the gPC projection defined in (7.14) and (7.15) and let w_N(Z) be the discrete gPC projection defined in (7.16) and (7.17). Then,

‖w_N(Z) − u(Z)‖_{L^2_{dF_Z}} ≤ ‖u_N(Z) − u(Z)‖_{L^2_{dF_Z}} + ‖w_N(Z) − u_N(Z)‖_{L^2_{dF_Z}}.    (7.19)

The first term of the error is the gPC projection error induced by using finite-order (Nth-order) polynomials. The second term of the error is the difference between the continuous gPC projection and the discrete projection and is caused by using a cubature rule with finite accuracy. It can be expressed as

ε_{QN} ≜ ‖w_N(Z) − u_N(Z)‖_{L^2_{dF_Z}} = ( ∑_{|k|=0}^{N} (ŵ_k − û_k)^2 γ_k )^{1/2},    (7.20)

and is termed the "aliasing error" in [116], following similar nomenclature from classical deterministic spectral methods (cf. [13, 46]).

Similar to the interpolation approach, the construction of the gPC expansion via (7.16) and (7.17) can also be considered a postprocessing step after all the computations are finished at the cubature nodes. A distinct feature of the discrete projection approach is that one can compute only the coefficients that are important for a given problem without evaluating the rest of the expansion coefficients. This may happen, for example, when global sensitivity is required for some input random variables Z. This is in contrast to the gPC Galerkin method, where all the gPC coefficients are coupled and solved simultaneously.
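The following sketch illustrates the discrete projection (7.16)-(7.17) in a single dimension, assuming a standard Gaussian input Z, a probabilists' Hermite basis, and a hypothetical "solution" u(Z) = exp(Z) whose exact gPC coefficients are known; in a real computation each nodal value u(z^{(j)}) would require a full deterministic solve.

```python
# Discrete gPC projection (7.16)-(7.17) in one dimension: Z ~ N(0, 1),
# probabilists' Hermite basis, coefficients computed by Gauss quadrature.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

N = 8                                   # gPC order
Q = 20                                  # number of quadrature nodes
z, a = hermegauss(Q)                    # nodes/weights for weight exp(-z^2/2)
a = a / np.sqrt(2.0 * np.pi)            # normalize so the weights sum to one

u = np.exp(z)                           # u(z^{(j)}) at the nodes (hypothetical)

w_hat = np.zeros(N + 1)
for k in range(N + 1):
    c = np.zeros(k + 1); c[-1] = 1.0    # coefficients of He_k
    phi_k = hermeval(z, c)
    gamma_k = float(factorial(k))       # gamma_k = E[He_k^2] = k!
    w_hat[k] = np.sum(u * phi_k * a) / gamma_k    # eq. (7.17)

# exact coefficients of u = exp(Z): u_k = exp(1/2)/k!
exact = np.exp(0.5) / np.array([factorial(k) for k in range(N + 1)], dtype=float)
print(np.max(np.abs(w_hat - exact)))
```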

Since the main task in the discrete gPC projection approach is to approximate the integrals in (7.15), the problem of multivariate polynomial approximation is transformed into a problem of multivariate integration, where the accuracy of the chosen integration rule is critical. Compared to multivariate interpolation, which is used by the stochastic interpolation collocation approach, there exist, relatively speaking, more results on multivariate integration, which is nevertheless a challenging and very active research topic.

7.3.1 Structured Nodes: Tensor and Sparse Tensor Constructions

Since Gauss quadrature rules offer high accuracy for univariate integration, it is natural to extend them to multivariate integration. The most straightforward way of constructing high-order integration rules is to extend quadrature rules (in univariate cases) to high-dimensional spaces by using a tensor construction. Let U^{m_i} be a Gauss quadrature rule in the Z_i direction of Z = (Z_1, . . . , Z_d), d > 1, with a nodal set Θ_1^{m_i} consisting of m_i ≥ 1 nodes. Let us assume that it is exact for all polynomials in P_{2m_i−1}(Z_i). Then a tensor construction is

U_Q = U^{m_1} ⊗ · · · ⊗ U^{m_d}.

Obviously, the nodal set is

Θ_Q = Θ_1^{m_1} × · · · × Θ_1^{m_d},

whose total number of nodes is Q = m_1 × · · · × m_d. This integration rule is exact for all polynomials in

P_{2m_1−1}(Z_1) ⊗ · · · ⊗ P_{2m_d−1}(Z_d).

Though easy to construct and of high accuracy, the problem is again the rapid growth of the total number of points in high-dimensional random spaces. If we use an equal number of nodes in all directions, m_1 = · · · = m_d = m, then the total number of nodes is Q = m^d. For d ≫ 1, this can be a staggeringly large number. (Again let us keep in mind that at each node the full-scale deterministic problem needs to be solved.) Consequently, the tensor product approach is mostly used in lower dimensions, e.g., d ≤ 5.
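A small sketch of the full tensor cubature described above, assuming Z uniform on [−1, 1]^d and a polynomial integrand chosen so the exact expectation is known; note how the node count Q = m^d grows with d.

```python
# Full tensor Gauss-Legendre cubature U_Q = U^m x ... x U^m for Z uniform
# on [-1, 1]^d; the integrand is a hypothetical test function.
import numpy as np
from itertools import product
from numpy.polynomial.legendre import leggauss

def tensor_rule(m, d):
    z1, a1 = leggauss(m)            # 1D rule, exact for degree 2m - 1
    a1 = a1 / 2.0                   # uniform density on [-1, 1] is 1/2
    nodes = np.array(list(product(z1, repeat=d)))
    weights = np.prod(np.array(list(product(a1, repeat=d))), axis=1)
    return nodes, weights

f = lambda z: np.prod(z ** 4, axis=1)     # E[Z_i^4] = 1/5, so E[f] = 5**(-d)

for d in (1, 2, 3, 4):
    nodes, w = tensor_rule(3, d)          # m = 3 points per direction
    approx = np.dot(w, f(nodes))
    print(f"d = {d}: Q = {len(w):4d} nodes, error = {abs(approx - 5.0**(-d)):.1e}")
```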

To reduce the total number of nodes while keeping most of the high accuracy offered by Gauss quadrature, the Smolyak sparse grid construction can be employed, similarly to the sparse interpolation discussed in section 7.2.2,

U_Q = ∑_{N−d+1 ≤ |m| ≤ N} (−1)^{N−|m|} · \binom{d−1}{N−|m|} · (U^{m_1} ⊗ · · · ⊗ U^{m_d}),    (7.21)

where N ≥ d is an integer denoting the level of the construction. The grid set, the sparse grids, is

Θ_Q = ⋃_{N−d+1 ≤ |m| ≤ N} (Θ_1^{m_1} × · · · × Θ_1^{m_d}).    (7.22)

Again it is clear that this is the union of a collection of subsets of the full tensor grids. Usually there is no closed-form explicit formula for the total number of nodes Q.
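The following sketch enumerates the combination coefficients and assembles the sparse node set of (7.21)-(7.22) for d = 2. The mapping m_i = i from the index to the one-dimensional rule size, and the use of (non-nested) Gauss-Legendre rules, are simplifying assumptions made only for illustration.

```python
# Enumerating the Smolyak combination (7.21) and the sparse node set (7.22)
# for d = 2 and level N, with one-dimensional Gauss-Legendre rules of size
# m_i.  Practical codes typically use nested 1D rules instead.
import numpy as np
from itertools import product
from math import comb
from numpy.polynomial.legendre import leggauss

def smolyak_nodes(d, N):
    nodes = set()
    for m in product(range(1, N + 1), repeat=d):      # multi-index m
        if N - d + 1 <= sum(m) <= N:
            coeff = (-1) ** (N - sum(m)) * comb(d - 1, N - sum(m))
            grids = [leggauss(mi)[0] for mi in m]      # Theta_1^{m_i}
            for pt in product(*grids):
                nodes.add(tuple(round(x, 12) for x in pt))
            print(f"  |m| = {sum(m)}, m = {m}, combination coefficient = {coeff}")
    return nodes

for N in (3, 4, 5):
    print(f"level N = {N} (d = 2):")
    Theta = smolyak_nodes(2, N)
    print(f"  total number of sparse-grid nodes Q = {len(Theta)}")
```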

Depending on the choice of the Gauss quadrature in one dimension, there are a variety of sparse grid constructions, and they offer different accuracy. Many of the constructions are based on the Clenshaw-Curtis rule in one dimension, whose properties are closely examined in [108]. Here we will not engage in further in-depth discussion of the technical details. Interested readers should see, for example, [9, 39, 83, 84].

7.3.2 Nonstructured Nodes: Cubature

The study of cubature rules is a relatively old topic but is still actively pursued. The goal is to construct a cubature rule with a high degree of polynomial exactness and as few nodes as possible. To achieve this, structured nodes are usually not considered, and many studies are based on geometric considerations. Most of the rules are given by explicit formulas for the node locations and their corresponding weights.


Depending on the required accuracy for the discrete gPC projection, one can choosea proper cubature with an affordable number of nodes for simulations. For extensivereviews and collections of available cubature rules, see, for example, [21, 49, 99].

It is worth pointing out that classical error estimation, e.g., in terms of error bounds, is nontrivial to carry out for cubature rules. Consequently, the accuracy of cubature rules is almost always classified by their polynomial exactness.

7.4 DISCUSSION: GALERKIN VERSUS COLLOCATION

While the gPC expansion provides a solid framework for stochastic computations,a natural question to ask is, For a given practical stochastic system, should one usethe Galerkin method or the collocation method?

The advantage of stochastic collocation is clear: ease of implementation. The algorithms are straightforward: (1) choose a set of nodes according to either multivariate interpolation theory or integration theory; (2) run the deterministic code at each node; and (3) postprocess to construct the gPC polynomials, via either the interpolation approach (section 7.2) or the discrete projection approach (section 7.3). The applicability of stochastic collocation is not affected by the complexity or nonlinearity of the original problem so long as one can develop a reliable deterministic solver. The executions of the deterministic algorithm at the nodes are completely independent of each other and embarrassingly parallel. For these reasons, stochastic collocation methods have become very popular.

The stochastic Galerkin method, on the other hand, is more cumbersome to implement. The Galerkin system (6.5) needs to be derived, and the resulting equations for the expansion coefficients are almost always coupled. Hence new codes need to be developed to deal with the larger, coupled system of equations. When the original problem (6.1) takes a highly complex and nonlinear form, the explicit derivation of the gPC equations can be nontrivial, and sometimes impossible.

However, an important issue to keep in mind is the accuracy of the methods. The stochastic Galerkin approach ensures that the residue of the stochastic governing equations is orthogonal to the linear space spanned by the gPC polynomials, as in (6.5). In this sense, the accuracy is optimal. On the other hand, stochastic collocation approaches, with no error at the nodes, introduce errors either because of the interpolation scheme (if interpolation collocation is used) or because of the integration rule (if discrete projection collocation is used). Both errors are caused by the introduction of the nodal sets and can be classified as aliasing errors. Though in one dimension the aliasing error can be kept at the same order as the error of the finite-order Galerkin method, in multidimensional spaces the aliasing errors can be much more significant. Roughly speaking, at a fixed accuracy, which is usually measured in terms of the polynomial exactness of the approximation, all of the existing collocation methods require the solution of a (much) larger number of equations than that required by the gPC Galerkin method, especially for higher-dimensional random spaces. This suggests that the gPC Galerkin method offers the most accurate solutions involving the least number of equations in multidimensional random spaces, even though the equations are coupled.


The exact cost comparison between the Galerkin and the collocation methods depends on many factors, including error analysis, which is largely unknown, and even on the coding effort involved in developing a stochastic Galerkin code. However, it is safe to state that for large-scale simulations where a single deterministic computation is time-consuming, the stochastic Galerkin method should be preferred (because of the smaller number of equations) whenever (1) the coupling of the gPC Galerkin equations does not incur much additional computational and coding effort (for example, for the Navier-Stokes equations with random boundary/initial conditions the evaluations of the coupling terms are trivial ([121])), or (2) efficient solvers can be developed to effectively decouple the gPC Galerkin system (for example, for stochastic diffusion equations decoupling can be achieved in the manner of (6.29)).

Another factor that should be taken into account as part of the effort of implementing the stochastic Galerkin method is that the properties of the Galerkin system may not be clear, even when the baseline deterministic system is well understood. This may affect the design of numerical algorithms for the Galerkin system. A simple example is the linear wave equation with random wave speed in section 6.3.


Chapter Eight

Miscellaneous Topics and Applications

The discussions up to this point have involved numerical algorithms, mostly based on generalized polynomial chaos (gPC), for propagating uncertainty from inputs to outputs. Here we will discuss several related topics regarding variations and applications of gPC methods. More specifically, we will consider efficient algorithms for

• random geometry, where the uncertain input is in the specification of the computational domain,

• parameter estimation, i.e., how to estimate the probability distribution of the input parameters, and

• uncertainty in models and how to "correct" it by using available measurement data.

For the second and third topics, the help of experimental measurements is required. gPC-based stochastic methods can improve the accuracy of existing methods at virtually no simulation cost.

Unlike those in the previous chapters of this book, the topics in this chapter arecloser to research issues. However, here we will only briefly discuss these topicswith a focus on their direct connections with gPC methods. More in-depth generaldiscussions and literature reviews can be found in the references.

8.1 RANDOM DOMAIN PROBLEM

Throughout this book, we have always assumed that the computational domain is fixed and contains no uncertainty. In practice, however, it can be a major source of uncertainty, as in many applications the physical domain cannot be determined precisely. The problem with uncertain geometry, e.g., a rough boundary, has been studied in areas such as wave scattering with many specially designed techniques. (See, for example, a review in [113].) For general-purpose partial differential equations (PDEs), however, numerical techniques in uncertain domains are less developed. Here we discuss general numerical approaches by following one of the earlier systematic studies in [130].

Let D(ω), ω ∈ Ω, be a random domain whose boundary ∂D(ω) is the source of randomness. Since ∂D(ω) is a random process, we first seek to parameterize it by a function of a finite number of independent random variables. This parameterization procedure is required for the ensuing stochastic simulations. Its implementation, though more or less straightforward on a conceptual level, can be nontrivial in practice. (Detailed discussions are in section 4.2.) Let ∂D(Z), Z ∈ R^d, d ≥ 1, be the parameterization of the random boundary. A partial differential equation defined on this domain can be written as

u_t(x, t) = L(x; u),    D(Z) × (0, T],
B(u) = 0,    ∂D(Z) × [0, T],
u = u_0,    D(Z) × {t = 0},    (8.1)

where x = (x_1, . . . , x_ℓ), ℓ = 1, 2, 3, is the coordinate in the random domain D(Z). For simplicity, here the only source of uncertainty is assumed to be from ∂D(Z). Note that even though the form of the governing equations is deterministic (it does not need to be), the solution still depends on the random variables Z and is a stochastic quantity. That is,

u(x, t) : D(Z) × [0, T] → R^{n_u}

depends implicitly on the random variables Z ∈ R^d.

The key to solving (8.1) is to define the problem on a fixed domain where the operations for statistical averaging become meaningful. A general approach is to use a one-to-one mapping ([130]). Let y = (y_1, . . . , y_ℓ), ℓ = 1, 2, 3, be the coordinate in a fixed domain E ⊂ R^ℓ and let

y = y(x, Z),    x = x(y, Z),    ∀Z ∈ R^d,    (8.2)

be a one-to-one mapping and its inverse such that the random domain D(Z) can be uniquely transformed to the deterministic domain E. Then (8.1) can be transformed to the following: for all Z ∈ R^d, find v = v(y, Z) : E × R^d → R^{n_u} such that

v_t(y, t, Z) = L(y, Z; v),    E × (0, T] × R^d,
B(v) = 0,    ∂E × [0, T] × R^d,
v = v_0,    E × {t = 0} × R^d,    (8.3)

where the operators L and B in (8.3) are transformed from L and B in (8.1), respectively, and v_0 is transformed from u_0 because of the random mapping (8.2). The transformed problem (8.3) is a stochastic PDE in a fixed domain, and the standard numerical techniques, including those based on the gPC methodology, can be readily applied.

The mapping technique seeks to transform a problem defined in a random domain into a stochastic problem defined in a fixed domain. The randomness in the domain specification is absorbed into the mapping and further into the definition of the transformed equation. Thus, it is crucial to construct a unique and invertible mapping that is also robust and efficient in practice. For some domains, this can be achieved analytically ([102]).

Example 8.1 (Mapping for a random channel domain). Consider a straight channel in two dimensions, with length L and height H. Let us assume that the bottom boundary is a random process with zero mean value and other known distribution functions. That is, the channel is defined as

(x_1, x_2) ∈ D(ω) = [0, L] × [h(x_1, ω), H],    (8.4)


where E[h(x_1, ω)] = 0. It is easy to see that a simple mapping

y_1 = x_1,    y_2 = H (x_2 − h(x_1, ω)) / (H − h(x_1, ω))

can map the domain into

(y_1, y_2) ∈ E = [0, L] × [0, H].
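A minimal sketch of the mapping in example 8.1 for one hypothetical realization of the bottom boundary h(x_1, ω); the sinusoidal h used here is only a placeholder for a sample of the random process.

```python
# The mapping of example 8.1: a sample bottom boundary h(x1) is mapped to
# the fixed rectangle [0, L] x [0, H].
import numpy as np

L, H = 5.0, 1.0
h = lambda x1: 0.1 * np.sin(2.0 * np.pi * x1 / L)   # placeholder sample of h(x1, omega)

def to_fixed(x1, x2):
    """(x1, x2) in the random channel -> (y1, y2) in [0, L] x [0, H]."""
    return x1, H * (x2 - h(x1)) / (H - h(x1))

def to_random(y1, y2):
    """Inverse map: (y1, y2) -> (x1, x2)."""
    return y1, h(y1) + y2 * (H - h(y1)) / H

x1, x2 = 2.0, 0.3
y1, y2 = to_fixed(x1, x2)
print(y1, y2, to_random(y1, y2))    # the round trip recovers (x1, x2)
```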

Example 8.2 (Mapping of a diffusion equation). Consider a deterministic Poisson's equation with homogeneous Dirichlet boundary conditions in a random domain D(Z(ω)), Z ∈ I_Z,

∇ · [c(x) ∇u(x, Z)] = a(x)  in D(Z),
u(x, Z) = 0  on ∂D(Z),    (8.5)

where no randomness exists in the diffusivity field c(x) and the source field a(x). The stochastic mapping (8.2) transforms (8.5) into a stochastic Poisson's equation in a deterministic domain E:

∑_{i=1}^{ℓ} ∂/∂y_i [ κ(y, Z) ∑_{j=1}^{ℓ} α_{ij}(y, Z) ∂v/∂y_j ] = J^{−1} f(y, Z)  in E × I_Z,
v(y, Z) = 0  on ∂E × I_Z,    (8.6)

where the random fields κ and f are the transformations of c and a, respectively, J is the transformation Jacobian

J(y, Z) = ∂(y_1, . . . , y_ℓ) / ∂(x_1, . . . , x_ℓ),

and

α_{ij}(y, Z) = J^{−1} ∇y_i · ∇y_j,    1 ≤ i, j ≤ ℓ.    (8.7)

Though (8.6) is more complex, it is a stochastic diffusion problem in a fixed domain. The existing methods, such as those based on gPC, can be readily applied.

Example 8.3 (Diffusion in a random channel domain). Now let us combine the aforementioned two examples and consider a diffusion problem in a random channel domain. This is the same example as that used in [130].

Consider the steady-state diffusion (8.5) with a = 0 and constant diffusivity c(x) in the two-dimensional channel (8.4). To be specific, we set L = 5, H = 1, and the random bottom boundary as a random field with zero mean and an exponential two-point covariance function

C_hh(r, s) = E[h(r, ω) h(s, ω)] = σ^2 exp( −|r − s| / b ),    0 ≤ r, s ≤ L,    (8.8)

where b > 0 is the correlation length. In the computational examples below, b = L/5 = 1, which corresponds to a boundary of moderate roughness. Finally, we prescribe Dirichlet boundary conditions u = 1 at x_2 = H and u = 0 elsewhere.


Figure 8.1 Channels with a rough wall generated with the 10-term (d = 10) KL expansion (8.9). (a) Four sample realizations of the bottom boundary h(x_1, ω_j) (j = 1, . . . , 4). (b) A sample realization of the channel in the physical domain (x_1, x_2) and in the mapped domain (ξ_1, ξ_2). Chebyshev meshes are used in both domains. (More details are in [130].)

We employ the finite-term Karhunen-Loève (KL)-type expansion (4.8) to decompose the boundary process. That is,

h(x_1, ω) ≈ σ ∑_{k=1}^{d} √λ_k ψ_k(x_1) Z_k(ω),    (8.9)

where {λ_k, ψ_k(x_1)} are the eigenvalues and eigenfunctions of the integral equations

∫_0^L C_hh(r, s) ψ_k(r) dr = λ_k ψ_k(s),    k = 1, . . . , d.    (8.10)

We further set {Z_i(ω)} ∼ U(−1, 1) to be independent uniform random variables in (−1, 1) and use the parameter 0 < σ < 1 to control the maximum deviation of the randomness. (In the computational examples in this section, we set σ = 0.1.) We employ Legendre polynomials as the gPC basis functions.
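The following sketch discretizes the integral equation (8.10) for the exponential covariance (8.8) on a uniform grid (a simple Nyström approach) and draws realizations of the truncated expansion (8.9). Pulling the σ² factor of (8.8) out front, as in (8.9), is a scaling assumption of this sketch, and the midpoint-rule discretization is only one of several possible choices.

```python
# Numerical KL expansion of the exponential covariance: discretize the
# integral equation (8.10), then sample the truncated expansion (8.9).
import numpy as np

L, b, sigma, d = 5.0, 1.0, 0.1, 10
n = 400
r = (np.arange(n) + 0.5) * (L / n)                     # midpoint quadrature nodes
K = np.exp(-np.abs(r[:, None] - r[None, :]) / b)       # unit-variance kernel

lam, psi = np.linalg.eigh(K * (L / n))                 # discretized eigenproblem
lam, psi = lam[::-1], psi[:, ::-1]                     # sort descending
psi = psi / np.sqrt(L / n)                             # so that int_0^L psi_k^2 dr ~ 1

print(f"first {d} eigenvalues capture "
      f"{100 * lam[:d].sum() / lam.sum():.1f}% of the spectrum")

# a few realizations of the truncated boundary process (8.9)
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(4, d))                # Z_k ~ U(-1, 1), independent
h_samples = sigma * Z @ (np.sqrt(lam[:d]) * psi[:, :d]).T   # shape (4, n)
```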

It is worthwhile to stress again that the expansion (8.9) introduces two sources of errors: errors due to the finite d-term truncation and errors due to the assumption of independence of {Z_k(ω)}. The truncation error is typically controlled by selecting the value of d to ensure that the eigenvalues {λ_k} with k > d are sufficiently small. In this example, the expansion with d = 10 captures 95 percent of the total spectrum (i.e., eigenvalues). A few realizations of the bottom boundary, obtained by the 10-term KL expansion, are shown in figure 8.1a. In figure 8.1b, one realization of the channel domain is mapped onto the corresponding rectangular domain E = [0, L] × [0, H]. Also shown here are the Chebyshev collocation mesh points that are used to solve the mapped stochastic diffusion problem (8.6).

Figure 8.2 shows the first two moments of the solution, i.e., its mean (top) andstandard deviation (STD) (bottom). We observe that the STD reaches its maximumclose to, but not at, the random bottom boundary.


Figure 8.2 The mean and the STD of the dependent variable u computed with the stochastic Galerkin method.

To ascertain the convergence of the polynomial chaos expansion, we examine the STD profile along the cross section y = 0.25, where the STD is close to its maximum. Figure 8.3 shows the STD profiles obtained with different orders of Legendre expansions. One can see that the second order is sufficient for the Legendre expansion to converge. Although not shown here, the convergence of the mean solution is similar to that of the STD.

Monte Carlo simulations (MCSs) are also conducted to verify the results obtained by the stochastic Galerkin method. Figure 8.4 compares the STD profile along the cross section y = 0.25 computed via the second-order Legendre expansion with those obtained from MCS. We observe that as the number of realizations increases, the MCS results converge to the converged SG results. With about 2,000 realizations, the MCS results agree well with the SG results. In this case, at second order (N = 2) and 10 random dimensions (d = 10), the gPC stochastic Galerkin method requires (N + d)!/(N! d!) = 66 basis functions and is computationally more efficient than Monte Carlo simulation.

Often an analytical mapping is not available; then a numerical technique can be employed to determine the mapping ([130]). This involves solving a set of boundary value problems for the mapping. Other techniques for casting random domain problems into deterministic problems include the boundary perturbation method [126], isoparametric mapping [15], the fictitious domain method [14], the eXtended finite element method [82], and a Lagrangian approach that works well for solid deformation [2]. Interested readers are strongly encouraged to consult the references.

Figure 8.3 The STD profiles along the cross section y = 0.25 computed with the first-, second-, and third-degree Legendre polynomials.

Figure 8.4 The STD profiles along the cross section y = 0.25 computed with the SG method (second-order Legendre chaos) and MCS consisting of 100, 500, 1000, and 2000 realizations.

8.2 BAYESIAN INVERSE APPROACH FOR PARAMETER ESTIMATION

When solving a stochastic system, e.g., (4.13), it is important that the probabilitydistribution functions of the input random variables Z be available. This, alongwith the independence condition, allows us to sample the inputs or to build basisfunctions (such as gPC) to solve the system. A question naturally arises: What ifthere is not enough available information to determine and specify the parameterdistributions? This is a problem of practical concern because in many cases thereare not enough measurement data on the parameters—some parameters cannot evenbe measured. However, sometimes other measurement data are available—data noton the parameters but on some other quantities that can be computed. In this case,an inverse parameter estimation can be carried out to estimate the true distributionsof the input random parameters.

The field of parameter estimation is not new; much research has gone into it for decades. Here, however, we will present an approach using Bayesian inference. More importantly, the approach is built upon the gPC algorithms in such a way that it does not incur any simulation effort in addition to a one-time gPC simulation. In other words, for the inverse problem here, a forward gPC simulation is required only once, and the rest becomes off-line postprocessing. The strong approximation property of the gPC expansion plays a vital role here.

Let us assume that each random variable Z_i has a prior distribution F_i(z_i) = P(Z_i ≤ z_i) ∈ [0, 1]. The distribution can be based upon assumption, intuition, or even speculation when there are not sufficient data. Here we focus on continuous random variables. Subsequently, each Z_i has a probability density function π_i(z_i) = dF_i(z_i)/dz_i. We also assume the variables are mutually independent, another assumption made when not enough data are available to suggest otherwise. Thus, the joint prior density function for Z is

π_Z(z) = ∏_{i=1}^{n_z} π_i(z_i).    (8.11)

Whenever possible, we will neglect the subscript of each probability density anduse π(z) to denote the probability density function of the random variable Z, πZ(z),unless confusion would arise.

Let

d_t = g(u) ∈ R^{n_d}    (8.12)

be a set of variables that one observes, where g : R^{n_u} → R^{n_d} is a function relating the solution u to the true observable d_t. We then define a forward model G : R^{n_z} → R^{n_d} to describe the relation between the random parameters Z and the observable d_t:

d_t = G(Z) ≜ g ∘ u(Z).    (8.13)


In practice, measurement error is inevitable and the observed data d may not match the true value d_t. Assuming additive observational errors, we have

d = d_t + e = G(Z) + e,    (8.14)

where e ∈ R^{n_d} are mutually independent random variables with probability density functions π(e) = ∏_{i=1}^{n_d} π(e_i). We make the usual assumption that e are also independent of Z.

The Bayesian approach seeks to estimate the parameters Z when given a set of observations d. To this end, Bayes' rule takes the form

π(z | d) = π(d | z) π(z) / ∫ π(d | z) π(z) dz,    (8.15)

where π(z) is the prior probability density of Z; π(d | z) is the likelihood function; and π(z | d), the density of Z conditioned on the data d, is the posterior probability density of Z. For notational convenience, we will use π^d(z) to denote the posterior density π(z | d) and L(z) to denote the likelihood function π(d | z). That is, (8.15) can be written as

π^d(z) = L(z) π(z) / ∫ L(z) π(z) dz.    (8.16)

Following the independence assumption on the measurement noise e, the likelihood function is

L(z) ≜ π(d | z) = ∏_{i=1}^{n_d} π_{e_i}(d_i − G_i(z)).    (8.17)

The formulation, albeit simple, poses a challenge in practice. This is largely because the posterior distribution π^d does not have a closed and explicit form, thus preventing one from sampling it directly. A large amount of literature has been devoted to this challenge, with one of the most widely used approaches being the Markov chain Monte Carlo (MCMC) method. For an extensive review, see [101]. Since most of the approaches are based on sampling, the main concern is to improve efficiency, because each sampling point requires a solution of the underlying forward problem G(Z) and can be time-consuming.

The gPC-based numerical method, in addition to its efficiency in solving the forward problem, provides another remarkable advantage here. For all 1 ≤ i ≤ n_d, let G_{N,i}(Z) be a gPC approximation of the ith component of G(Z),

G_{N,i}(Z) = ∑_{|k|=0}^{N} a_{k,i} Φ_k(Z),    (8.18)

where the expansion coefficients {a_{k,i}} are obtained by either a stochastic Galerkin or a stochastic collocation method. Then we effectively have an analytical representation of the forward problem in terms of Z, which can be sampled at an arbitrarily large number of nodes by simply evaluating the polynomial expression at the nodes. Thus, if we substitute the gPC approximation into the likelihood function, we obtain an approximate posterior density that can be easily sampled without any simulation effort beyond the stochastic forward problem, which needs to be solved only once.

The gPC approximation of the posterior probability is

π_N^d(z) = L_N(z) π(z) / ∫ L_N(z) π(z) dz,    (8.19)

where π(z) is again the prior density of Z and L_N is the approximate likelihood function defined as

L_N(z) ≜ π_N(d | z) = ∏_{i=1}^{n_d} π_{e_i}(d_i − G_{N,i}(z)).    (8.20)
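A sketch of the gPC-surrogate posterior (8.19)-(8.20) for a single parameter Z ∼ U(−1, 1): the forward model G, the noise level, and the single observation are hypothetical, and the surrogate here is built by a simple weighted Legendre fit rather than by solving a physical forward problem. Once the surrogate is available, evaluating and normalizing the posterior requires no further forward solves.

```python
# Approximate posterior pi_N^d via a gPC surrogate, eqs. (8.19)-(8.20).
import numpy as np
from numpy.polynomial.legendre import leggauss, legval, legfit

G = lambda z: np.exp(z) * np.sin(np.pi * z)        # hypothetical forward model
sigma_e = 0.1                                      # observational noise std
z_true = 0.3
rng = np.random.default_rng(2)
data = G(z_true) + sigma_e * rng.standard_normal() # one noisy observation d

# Legendre gPC surrogate G_N of order N from Q forward evaluations
N, Q = 8, 20
zq, wq = leggauss(Q)
coef = legfit(zq, G(zq), N, w=np.sqrt(wq))         # weighted least-squares fit
G_N = lambda z: legval(z, coef)

# approximate posterior on a grid; prior is uniform on (-1, 1)
z = np.linspace(-1.0, 1.0, 2001)
dz = z[1] - z[0]
likelihood = np.exp(-0.5 * ((data - G_N(z)) / sigma_e) ** 2)   # Gaussian noise
post = likelihood * 0.5                            # prior density = 1/2
post /= post.sum() * dz                            # normalization as in (8.19)
print("posterior mean of Z:", (z * post).sum() * dz)
```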

The error of the approximation can be quantified by using the Kullback-Leibler divergence (KLD), which measures the difference between probability distributions and is defined, for probability density functions π_1(z) and π_2(z), as

D(π_1 ‖ π_2) ≜ ∫ π_1(z) log( π_1(z) / π_2(z) ) dz.    (8.21)

It is always nonnegative, and D(π_1 ‖ π_2) = 0 when π_1 = π_2.

Under the common assumption that the observational error in (8.14) is independently and identically distributed (i.i.d.) Gaussian, e.g.,

e ∼ N(0, σ^2 I),    (8.22)

where σ > 0 is the standard deviation and I is an identity matrix of size n_d × n_d, the following results can be established.

Lemma 8.4. Assume that the observational error in (8.14) has an i.i.d. Gaussian distribution (8.22). If the gPC expansion G_{N,i} in (8.18) of the forward model converges to G_i,

‖G_i(Z) − G_{N,i}(Z)‖_{L^2_{π_Z}} → 0,    1 ≤ i ≤ n_d,  N → ∞,

then the posterior probability π_N^d in (8.19) converges to the true posterior probability π^d in (8.16) in the sense that the Kullback-Leibler divergence (8.21) converges to zero; i.e.,

D(π_N^d ‖ π^d) → 0,    N → ∞.    (8.23)

Theorem 8.5. Assume that the observational error in (8.14) has an i.i.d. Gaussian distribution (8.22) and that the gPC expansion G_{N,i} in (8.18) of the forward model converges to G_i in the form of

‖G_i(Z) − G_{N,i}(Z)‖_{L^2_{π_Z}} ≤ C N^{−α},    1 ≤ i ≤ n_d,

where C is a constant independent of N and α > 0. Then for sufficiently large N,

D(π_N^d ‖ π^d) ≲ N^{−α}.    (8.24)

Proofs of these results can be found in [76].


Figure 8.5 The posterior distribution of the boundary condition uncertainty δ (denoted as p(δ | data) here) of the Burgers' equation. The exact posterior from Bayes' rule is shown along with the numerical results from the gPC collocation-based algorithms with order N = 4 and N = 8. Also shown is the true (and unknown) perturbation δ_t.

Example 8.6 (Burgers’ equation). Let us return to the Burgers’ equation exam-ple in section 1.1.1. As shown in figure 1.1, a small amount of uncertainty withδ ∼ U(0, 0.1) can produce a large response in the location of the transition layerbecause of the supersensitive nature of the problem. On the other hand, the pre-diction produces the distribution range of the output, albeit correctly, too big to beuseful. This suggests that the assumption about the uncertainty of δ is too wide andshould be refined both in its range and distribution. This can be achieved using theBayesian inverse estimation when some observation data are available.

Without actual experimental data, we generate "data" numerically. This is accomplished by fixing a "true" perturbation δ_t, conducting a high-order deterministic simulation to compute the true location of the transition layer d_t at steady state, and then adding Gaussian noise e to produce the data d = d_t + e. The data d are then used for the Bayesian inverse estimate of the posterior distribution of δ.

In figure 8.5, the numerical results for the posterior density are shown with gPC orders of N = 4 and N = 8. Compared to the exact posterior density obtained from Bayes' rule directly (in this case the exact posterior can be computed), we notice that the gPC results converge. For reference, the true and yet unknown perturbation δ_t is also plotted. We observe that the posterior density clusters around the truth, as expected. The convergence of the gPC-based Bayesian algorithm is examined in more detail in figure 8.6 for orders as high as N = 200.


Figure 8.6 Convergence of the numerical posterior distribution by gPC-based Bayesian algorithms at increasing orders, along with the convergence of the gPC forward model prediction.

We observe the familiar exponential convergence. The convergence of the gPC forward problem solver is also plotted. It is clear that the posterior density converges at least as fast as (in fact, even faster than) the forward problem. This is consistent with the theoretical result. For more details on the analysis and numerical results, see [76].

8.3 DATA ASSIMILATION BY THE ENSEMBLE KALMAN FILTER

Any mathematical or numerical model, deterministic or stochastic, no matter how sophisticated, is an approximation to the true physics. Though many models are accurate for a wide range of spatial and temporal domains, many can deviate from the "truth" quickly (for example, in weather forecasting). In addition to improving the models, another way to improve the prediction is to take advantage of measurement data, which reflect the physical truth, sometimes partially (scarce and indirect measurement) and approximately (inaccurate measurement).

In data assimilation, data arrive sequentially in time, and one seeks to incorporate both the data and the prediction of the mathematical/numerical models to produce better predictions. There has been an extremely large amount of literature on various methods of data assimilation, with the most popular ones based on either a Kalman filter (KF) [54] or a particle filter. Here we discuss only the ensemble Kalman filter (EnKF) [28], a variant of the Kalman filter, and explain how gPC-based stochastic methods can be used to significantly improve the performance of the EnKF. The notation here is somewhat different from that in the rest of the book, particularly the use of bold letters, as we try to follow the notation commonly used in the data assimilation literature.

Let u^f ∈ R^m, m ≥ 1, be a vector of forecast state variables (denoted by the superscript f) that are modeled by the following system:

du^f/dt = f(t, u^f),    t ∈ (0, T],    (8.25)
u^f(0) = u_0(Z),    (8.26)

where T > 0 and Z ∈ R^d, d ≥ 1, is a set of random variables parameterizing the random initial condition. The model (8.25) and (8.26) is obviously not a perfect model for the true physics, and the forecast may not represent the true state variables, u^t ∈ R^m, sufficiently well. Suppose a set of measurements d ∈ R^ℓ, ℓ ≥ 1, is available as

d = H u^t + ε,    (8.27)

where H : R^m → R^ℓ is a measurement operator relating the true state variables u^t and the observation vector d ∈ R^ℓ, and ε ∈ R^ℓ is the measurement error. Note that the measurement operator can be nonlinear, although it is written here in a linear fashion by following the traditional exposition of the (ensemble) Kalman filter. Also, characterization of the true state variables u^t can be highly nontrivial in practice. Here we assume that they are well-defined variables with dimension m.

8.3.1 The Kalman Filter and the Ensemble Kalman Filter

The Kalman filter is a sequential data assimilation method that consists of two stages at each time level when data are available: a forecast stage, where the system (8.25) and (8.26) is solved, and an analysis stage, where the analyzed state u^a is obtained. Let P^f ∈ R^{m×m} be the covariance matrix of the forecast solution u^f. The analyzed solution u^a in the standard KF is determined as a combination of the forecast solution u^f and the measurement d in the following manner:

u^a = u^f + K(d − H u^f),    (8.28)

where K is the Kalman gain matrix defined as

K = P^f H^T (H P^f H^T + R)^{−1}.    (8.29)

Here the superscript T denotes matrix transpose, and R ∈ R^{ℓ×ℓ} is the covariance of the measurement error ε. The covariance of the analyzed state u^a, P^a ∈ R^{m×m}, is then obtained by

P^a = (I − KH) P^f (I − KH)^T + K R K^T = (I − KH) P^f,    (8.30)


where I is the identity matrix. When the system (8.25) is linear, the KF can be applied in a straightforward manner, as equations for the evolution of the solution covariance can be derived. For nonlinear systems, explicit derivation of the equations for the covariance function is not possible. Subsequently, various approximations such as the extended Kalman filter (EKF) were developed. Their applicability is limited to a certain degree depending on the approximation procedure. Furthermore, in practical applications, forwarding the covariance functions (8.30) in time requires explicit storage and computation of P^f, which scales as O(m^2) and can be inefficient when the dimension of the model states, m, is large.

The ensemble Kalman filter (EnKF) ([11, 28]) overcomes the limitations of the Kalman filter by using an ensemble approximation of the random state solutions. Let

(u^f)_i,    i = 1, . . . , M,  M > 1,    (8.31)

be an ensemble of the forecast state variables u^f, where each ensemble member is indexed by the subscript i = 1, . . . , M and is obtained by solving the full nonlinear system (8.25). The analysis step of the EnKF consists of the following update performed on each of the model state ensemble members:

(u^a)_i = (u^f)_i + K_e((d)_i − H(u^f)_i),    i = 1, . . . , M,    (8.32)

where

K_e = P_e^f H^T (H P_e^f H^T + R_e)^{−1}    (8.33)

is the ensemble Kalman gain matrix. Here

P_e^f ≜ \overline{(u^f − \overline{u^f})(u^f − \overline{u^f})^T} ≈ P^f,
P_e^a ≜ \overline{(u^a − \overline{u^a})(u^a − \overline{u^a})^T} ≈ P^a,    (8.34)

are the approximate forecast covariance and analysis covariance, respectively, obtained by using statistical averages of the solution ensemble (denoted by the overbar), and R_e = \overline{ε ε^T} ≈ R is the approximate observation error covariance. Therefore, the covariance functions are approximated by ensemble averages and do not need to be forwarded in time explicitly. An extensive review of the EnKF can be found in [29].
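The following sketch performs one EnKF analysis step, (8.32)-(8.34), on a small artificial example; the forecast ensemble, the observation operator H, the covariance R, and the data d are arbitrary placeholders, and the sample covariance here uses the usual 1/(M − 1) normalization.

```python
# One EnKF analysis step, eqs. (8.32)-(8.34), on a toy example.
import numpy as np

rng = np.random.default_rng(3)
m, l, M = 4, 2, 50                       # state dim, data dim, ensemble size
uf = rng.normal(size=(m, M)) + np.arange(m)[:, None]   # forecast ensemble (u^f)_i
H = np.zeros((l, m)); H[0, 0] = H[1, 2] = 1.0          # observe components 1 and 3
R = 0.05 * np.eye(l)                                    # observation error covariance
d = np.array([0.7, 2.3])                                # measured data (placeholder)

# ensemble (sample) forecast covariance, eq. (8.34)
anom = uf - uf.mean(axis=1, keepdims=True)
Pf_e = anom @ anom.T / (M - 1)

# ensemble Kalman gain, eq. (8.33), and perturbed-observation update, eq. (8.32)
Ke = Pf_e @ H.T @ np.linalg.inv(H @ Pf_e @ H.T + R)
d_i = d[:, None] + rng.multivariate_normal(np.zeros(l), R, size=M).T   # (d)_i
ua = uf + Ke @ (d_i - H @ uf)

print("forecast mean:", uf.mean(axis=1))
print("analysis mean:", ua.mean(axis=1))
```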

8.3.2 Error Bound of the EnKF

As an approximation to the Kalman filter, one obvious source of error for the EnKF is the sampling. Note that here we define error as numerical error, i.e., the difference between the result obtained by the KF and that obtained by the EnKF. This is different from the modeling error between the result of the KF and the physical truth. For the EnKF, the overall error consists of both the numerical error and the modeling error. Here we study only the former.

To understand the impact of numerical error more precisely, we cite here an error bound of the EnKF derived in [66]. Let t_1 < t_2 < · · · be discrete time instances at which data arrive sequentially and assimilation is made. Without loss of generality, let us assume that they are uniformly distributed with a constant step size ΔT = t_k − t_{k−1}, ∀k > 1. Let E_n be the numerical error of the EnKF, that is, the difference between the EnKF results and the exact KF results measured in a proper norm at time level t_n, n ≥ 1; then the following bound holds:

E_n ≤ ( E_0 + ∑_{k=1}^{n} e_k ) exp(Λ · t_n),    (8.35)

where E_0 is the error of sampling the initial state, e_k is the local error at time level t_k, 1 ≤ k ≤ n, and Λ > 0 is a constant. The local error scales as

e_k ∼ O(Δt^p, σ M^{−α}),    Δt → 0,  M → ∞,    (8.36)

where O(Δt^p) denotes the numerical integration error in time induced by solving (8.25) and (8.26) with a time step Δt and a temporal integration order p ≥ 1, σ > 0 is the noise level of the measurement data and scales with the standard deviation of the measurement noise, M is the size of the ensemble, and α > 0 is the convergence rate of the sampling scheme. For Monte Carlo sampling, α = 1/2. In most cases, this sampling error dominates. A notable result is that the constant Λ depends on the size of the assimilation step in an inverse manner, i.e., Λ ∝ ΔT^{−1}. This implies that more frequent data assimilation by the EnKF can magnify the numerical errors. Since more frequent assimilation is always desirable (whenever data are available) for a better estimate of the true state, it is imperative to keep the numerical errors, particularly the sampling errors, of the EnKF under control. Although the sampling errors can be easily reduced by increasing the ensemble size, in practice this can significantly increase the computational burden, especially for large-scale problems.

8.3.3 Improved EnKF via gPC Methods

Here we demonstrate that one can yet again take advantage of a highly accurate gPC approximation to construct an improved EnKF scheme with much reduced numerical error. Let

u_N^f(t, Z) = ∑_{|i|=0}^{N} û_i^f(t) Φ_i(Z)    (8.37)

be the gPC solution to the forecast equations (8.25) and (8.26) with sufficiently high accuracy, where the expansion coefficients û_i^f(t) can be obtained by either the gPC Galerkin procedure or the gPC collocation procedure.

In addition to offering efficiency for the forecast solution, another (often overlooked) advantage of the gPC expansion is that it provides an analytical representation of the solution in terms of the random inputs. All statistical information about u_N^f can be obtained analytically or with minimal computational effort. For example, the mean and covariance are

\bar{u}_N^f = û_0^f,    P_N^f = ∑_{0 < |i| ≤ N} [ û_i^f (û_i^f)^T γ_i ],    (8.38)

respectively, where γ_i = E[Φ_i^2]. They can be used as accurate approximations of the exact mean and covariance of the forecast solution u^f. Furthermore, one can generate an ensemble of solution realizations by sampling the random variables Z in (8.37). This procedure involves nothing but polynomial evaluations, and thus generating an ensemble with an arbitrarily large number of samples does not require any computation of the original governing equations (8.25) and (8.26). Let

(u_N^f)_i = ∑_{|k|=0}^{N} û_k^f(t) Φ_k((Z)_i),    i = 1, . . . , M,  M ≫ 1,    (8.39)

be an ensemble of forecast solution realizations of size M, where (Z)_i, i = 1, . . . , M, are Monte Carlo samples of the random vector Z. Equipped with knowledge of the solution statistics, particularly the mean and covariance from (8.38), we can apply the EnKF scheme (8.32) to obtain the analyzed states,

(u_N^a)_i = (u_N^f)_i + K_N(d_i − H(u_N^f)_i),    i = 1, . . . , M,    (8.40)

where K_N is the gPC Kalman gain matrix defined as

K_N = P_N^f H^T (H P_N^f H^T + R)^{−1},    (8.41)

which approximates the Kalman gain matrix (8.29).

The key ingredients and advantages of the gPC-based EnKF are

• Solution of the forecast problem by a gPC-based method, either a Galerkin ora collocation method. This, in many cases, is (much) more efficient than thetraditional Monte Carlo sampling employed in the EnKF.

• At the update stage, one can use the gPC forward problem solution to generate an arbitrarily large number of samples and update them individually by (8.40), similar to the traditional EnKF. The ability to update a large ensemble of solutions results in a significant reduction in sampling errors. Moreover, this step is virtually "free," in the sense that generating the large solution ensemble via gPC is nothing but evaluation of the gPC polynomial expression and induces no simulation cost.

• For a linear system of equations (8.25) with Gaussian noise, the gPC-basedEnKF becomes equivalent to the standard Kalman filter.

More details about these methods can be found in [67].
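A sketch of the gPC-based EnKF update (8.38)-(8.41) for a scalar Gaussian input Z with a probabilists' Hermite basis; the gPC coefficients of the forecast are random placeholders standing in for the output of a gPC Galerkin or collocation forecast solve, and H, R, and d are arbitrary.

```python
# gPC-based EnKF update, eqs. (8.38)-(8.41), for a scalar random input Z.
import numpy as np
from numpy.polynomial.hermite_e import hermeval
from math import factorial

m, N, M = 3, 4, 10_000                    # state dim, gPC order, ensemble size
rng = np.random.default_rng(4)
u_hat = rng.normal(size=(m, N + 1))       # gPC coefficients of u^f (placeholder)

# mean and covariance from the gPC expansion, eq. (8.38)
gamma = np.array([factorial(k) for k in range(N + 1)], dtype=float)
mean_f = u_hat[:, 0]
Pf_N = (u_hat[:, 1:] * gamma[1:]) @ u_hat[:, 1:].T

# sample a large forecast ensemble by evaluating the polynomials, eq. (8.39)
Z = rng.standard_normal(M)
Phi = np.vstack([hermeval(Z, np.eye(N + 1)[k]) for k in range(N + 1)])  # (N+1, M)
uf = u_hat @ Phi                          # (m, M) ensemble of realizations

# gPC Kalman gain (8.41) and the update (8.40)
H = np.array([[1.0, 0.0, 0.0]])           # observe the first component only
R = np.array([[0.01]])
d = np.array([0.5])
K_N = Pf_N @ H.T @ np.linalg.inv(H @ Pf_N @ H.T + R)
d_i = d[:, None] + np.sqrt(R[0, 0]) * rng.standard_normal((1, M))
ua = uf + K_N @ (d_i - H @ uf)
print("gPC forecast mean:", mean_f)
print("analysis mean:", ua.mean(axis=1))
```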


Appendix A

Some Important Orthogonal Polynomials in the Askey Scheme

Here we summarize the definitions and properties of some important orthogonal polynomials from the Askey scheme. Denote by {Q_n(x)} an orthogonal polynomial system with the orthogonality relation

∫_S Q_n(x) Q_m(x) w(x) dx = h_n^2 δ_mn

for continuous x, or, in the discrete case,

∑_x Q_n(x) Q_m(x) w(x) = h_n^2 δ_mn,

where S is the support of w(x). The three-term recurrence relation takes the form

−x Q_n(x) = b_n Q_{n+1}(x) + γ_n Q_n(x) + c_n Q_{n−1}(x),    n ≥ 0,

with initial conditions Q_{−1}(x) = 0 and Q_0(x) = 1. Another way of expressing the recurrence relation is

Q_{n+1}(x) = (A_n x + B_n) Q_n(x) − C_n Q_{n−1}(x),    n ≥ 0,    (A.1)

where A_n, C_n ≠ 0 and C_n A_n A_{n−1} > 0. It is straightforward to show that, if we scale the variable x by denoting y = αx for α > 0, then the recurrence relation takes the form

S_{n+1}(y) = (A_n y + αB_n) S_n(y) − α^2 C_n S_{n−1}(y).    (A.2)

Another important property is that these orthogonal polynomials are solutions of a differential equation

s(x) y″ + τ(x) y′ + λ y = 0    (A.3)

in continuous cases, and of a difference equation

s(x) Δ∇y(x) + τ(x) Δy(x) + λ y(x) = 0    (A.4)

in discrete cases, where s(x) and τ(x) are polynomials of at most second and first degree, respectively, and λ is a constant. The notations for the discrete cases are

Δf(x) = f(x + 1) − f(x),    ∇f(x) = f(x) − f(x − 1).

When

λ = λ_n = −n τ′ − (1/2) n(n − 1) s″,

the equations have a particular solution of the form y(x) = Q_n(x), which is a polynomial of degree n.


A.1 CONTINUOUS POLYNOMIALS

A.1.1 Hermite Polynomial Hn(x) and Gaussian Distribution

Definition:

Hn(x) = (2x)n2F0

(−n

2, −n − 1

2; ; − 2

x2

). (A.5)

Orthogonality: ∫ ∞

−∞Hm(x)Hn(x)w(x)dx = n!δmn, (A.6)

where

w(x) = 1√2π

e− x2/2. (A.7)

Recurrence relation:

Hn+1(x) = xHn(x) − nHn−1(x). (A.8)

Rodriguez formula:

e−x2/2Hn(x) = (−1)n dn

dxn

(e−x2/2

). (A.9)

Differential equation:

y′′(x) − x y′(x) + n y(x) = 0,   y(x) = H_n(x).   (A.10)
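As a quick numerical sanity check of (A.6)–(A.8), the short sketch below evaluates H_n with NumPy's probabilists' Hermite routines (hermegauss, hermeval), which use the same normalization as here, and verifies the orthogonality relation by Gauss quadrature; the quadrature weight is e^{−x^2/2}, so dividing by √(2π) recovers the Gaussian density (A.7). This is only an illustrative verification, not part of the original text.

    import numpy as np
    from math import factorial
    from numpy.polynomial.hermite_e import hermegauss, hermeval

    # Gauss-Hermite(e) nodes/weights for the weight exp(-x^2/2) on the real line.
    x, w = hermegauss(30)

    def He(n, x):
        # Probabilists' Hermite polynomial H_n(x), matching the recurrence (A.8).
        c = np.zeros(n + 1)
        c[n] = 1.0
        return hermeval(x, c)

    # Check (A.6): the integral of H_m H_n w dx equals n! delta_mn.
    for m in range(5):
        for n in range(5):
            val = np.sum(w * He(m, x) * He(n, x)) / np.sqrt(2 * np.pi)
            expected = factorial(n) if m == n else 0.0
            assert abs(val - expected) < 1e-8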

A.1.2 Laguerre Polynomial L_n^{(α)}(x) and Gamma Distribution

Definition:

L_n^{(α)}(x) = ((α + 1)_n / n!) · 1F1(−n; α + 1; x).   (A.11)

Orthogonality:

∫_0^∞ L_m^{(α)}(x) L_n^{(α)}(x) w(x) dx = ((α + 1)_n / n!) δ_mn,   α > −1,   (A.12)

where

w(x) = x^α e^{−x} / Γ(α + 1).   (A.13)

Recurrence relation:

(n + 1) L_{n+1}^{(α)}(x) − (2n + α + 1 − x) L_n^{(α)}(x) + (n + α) L_{n−1}^{(α)}(x) = 0.   (A.14)

Normalized recurrence relation:

x q_n(x) = q_{n+1}(x) + (2n + α + 1) q_n(x) + n(n + α) q_{n−1}(x),   (A.15)

where

L_n^{(α)}(x) = ((−1)^n / n!) q_n(x).


Rodrigues formula:

e^{−x} x^α L_n^{(α)}(x) = (1/n!) (d^n/dx^n) (e^{−x} x^{n+α}).   (A.16)

Differential equation:

x y′′(x) + (α + 1 − x) y′(x) + n y(x) = 0,   y(x) = L_n^{(α)}(x).   (A.17)

Recall that the gamma distribution has the probability density function

f(x) = x^α e^{−x/β} / (β^{α+1} Γ(α + 1)),   α > −1,  β > 0.   (A.18)

The weighting function (A.13) of the Laguerre polynomials is the same as that of the gamma distribution with scale parameter β = 1.

A.1.3 Jacobi Polynomial P_n^{(α,β)}(x) and Beta Distribution

Definition:

P_n^{(α,β)}(x) = ((α + 1)_n / n!) · 2F1(−n, n + α + β + 1; α + 1; (1 − x)/2).   (A.19)

Orthogonality:

∫_{−1}^{1} P_m^{(α,β)}(x) P_n^{(α,β)}(x) w(x) dx = h_n^2 δ_mn,   α, β > −1,   (A.20)

where

h_n^2 = (α + 1)_n (β + 1)_n / [n! (2n + α + β + 1) (α + β + 2)_{n−1}],

w(x) = [Γ(α + β + 2) / (2^{α+β+1} Γ(α + 1) Γ(β + 1))] (1 − x)^α (1 + x)^β.   (A.21)

Recurrence relation:

x P_n^{(α,β)}(x) = [2(n + 1)(n + α + β + 1) / ((2n + α + β + 1)(2n + α + β + 2))] P_{n+1}^{(α,β)}(x)
    + [(β^2 − α^2) / ((2n + α + β)(2n + α + β + 2))] P_n^{(α,β)}(x)
    + [2(n + α)(n + β) / ((2n + α + β)(2n + α + β + 1))] P_{n−1}^{(α,β)}(x).   (A.22)

Normalized recurrence relation:

x p_n(x) = p_{n+1}(x) + [(β^2 − α^2) / ((2n + α + β)(2n + α + β + 2))] p_n(x)
    + [4n(n + α)(n + β)(n + α + β) / ((2n + α + β − 1)(2n + α + β)^2 (2n + α + β + 1))] p_{n−1}(x),   (A.23)

where

P_n^{(α,β)}(x) = [(n + α + β + 1)_n / (2^n n!)] p_n(x).


Rodrigues formula:

(1 − x)^α (1 + x)^β P_n^{(α,β)}(x) = [(−1)^n / (2^n n!)] (d^n/dx^n) [(1 − x)^{n+α} (1 + x)^{n+β}].   (A.24)

Differential equation:

(1 − x^2) y′′(x) + [β − α − (α + β + 2)x] y′(x) + n(n + α + β + 1) y(x) = 0,   (A.25)

where y(x) = P_n^{(α,β)}(x).
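Since the weight (A.21) is a beta probability density on [−1, 1], the orthogonality (A.20) and the normalization h_n^2 are easy to verify numerically. The sketch below does so with SciPy's eval_jacobi and roots_jacobi; note that SciPy's Gauss–Jacobi weights correspond to (1 − x)^α (1 + x)^β, so the constant in (A.21) is applied explicitly. This is an illustrative check, not part of the original text.

    import numpy as np
    from math import gamma, factorial
    from scipy.special import eval_jacobi, roots_jacobi, poch

    alpha, beta = 2.0, 3.0

    # Gauss-Jacobi nodes/weights for the weight (1 - x)^alpha * (1 + x)^beta on [-1, 1].
    x, w = roots_jacobi(30, alpha, beta)

    # Constant turning the Jacobi weight into the beta probability density of (A.21).
    c = gamma(alpha + beta + 2) / (2 ** (alpha + beta + 1) * gamma(alpha + 1) * gamma(beta + 1))

    def h2(n):
        # Normalization h_n^2 of (A.20); (alpha+beta+2)_{n-1} is written as a gamma ratio.
        return (poch(alpha + 1, n) * poch(beta + 1, n) * gamma(alpha + beta + 2)
                / (factorial(n) * (2 * n + alpha + beta + 1) * gamma(n + alpha + beta + 1)))

    # Check (A.20) for a few degrees.
    for m in range(4):
        for n in range(4):
            val = c * np.sum(w * eval_jacobi(m, alpha, beta, x) * eval_jacobi(n, alpha, beta, x))
            expected = h2(n) if m == n else 0.0
            assert abs(val - expected) < 1e-10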

A.2 DISCRETE POLYNOMIALS

A.2.1 Charlier Polynomial C_n(x; a) and Poisson Distribution

Definition:

C_n(x; a) = 2F0(−n, −x; ; −1/a).   (A.26)

Orthogonality:

∑_{x=0}^{∞} (a^x / x!) C_m(x; a) C_n(x; a) = a^{−n} e^a n! δ_mn,   a > 0.   (A.27)

Recurrence relation:

−x C_n(x; a) = a C_{n+1}(x; a) − (n + a) C_n(x; a) + n C_{n−1}(x; a).   (A.28)

Rodrigues formula:

(a^x / x!) C_n(x; a) = ∇^n (a^x / x!).   (A.29)

Difference equation:

−n y(x) = a y(x + 1) − (x + a) y(x) + x y(x − 1),   y(x) = C_n(x; a).   (A.30)

The probability function of the Poisson distribution is

f(x; a) = e^{−a} (a^x / x!),   x = 0, 1, 2, . . . .   (A.31)

Apart from the constant factor e^{−a}, it is the same as the weighting function of the Charlier polynomials.

A.2.2 Krawtchouk Polynomial K_n(x; p, N) and Binomial Distribution

Definition:

K_n(x; p, N) = 2F1(−n, −x; −N; 1/p),   n = 0, 1, . . . , N.   (A.32)


Orthogonality:

∑_{x=0}^{N} \binom{N}{x} p^x (1 − p)^{N−x} K_m(x; p, N) K_n(x; p, N) = [(−1)^n n! / (−N)_n] ((1 − p)/p)^n δ_mn,   0 < p < 1.   (A.33)

Recurrence relation:

−x K_n(x; p, N) = p(N − n) K_{n+1}(x; p, N) − [p(N − n) + n(1 − p)] K_n(x; p, N) + n(1 − p) K_{n−1}(x; p, N).   (A.34)

Rodrigues formula:

\binom{N}{x} (p/(1 − p))^x K_n(x; p, N) = ∇^n [\binom{N − n}{x} (p/(1 − p))^x].   (A.35)

Difference equation:

−n y(x) = p(N − x) y(x + 1) − [p(N − x) + xq] y(x) + xq y(x − 1),   (A.36)

where y(x) = K_n(x; p, N) and q = 1 − p.

Clearly, the weighting function in (A.33) is the probability function of the binomial distribution.
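Because the weight in (A.33) is a finite binomial probability mass function, the orthogonality relation is a finite sum and can be checked directly. The sketch below evaluates K_n through the terminating hypergeometric series of (A.32), coded by hand rather than relying on a library routine, and verifies (A.33); the norm (−1)^n n!/(−N)_n ((1 − p)/p)^n simplifies to ((1 − p)/p)^n / \binom{N}{n}. This is an illustrative check only.

    import numpy as np
    from math import comb

    def krawtchouk(n, x, p, N):
        # K_n(x; p, N) via the terminating 2F1 series of (A.32).
        total, term = 1.0, 1.0
        for k in range(1, n + 1):
            # term_k / term_{k-1} = (-n+k-1)(-x+k-1) / ((-N+k-1) k p)
            term *= (-n + k - 1) * (-x + k - 1) / ((-N + k - 1) * k * p)
            total += term
        return total

    p, N = 0.3, 10
    weight = np.array([comb(N, x) * p**x * (1 - p) ** (N - x) for x in range(N + 1)])

    # Check (A.33): the norm equals ((1 - p)/p)^n / binom(N, n).
    for m in range(4):
        for n in range(4):
            s = sum(weight[x] * krawtchouk(m, x, p, N) * krawtchouk(n, x, p, N)
                    for x in range(N + 1))
            expected = ((1 - p) / p) ** n / comb(N, n) if m == n else 0.0
            assert abs(s - expected) < 1e-10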

A.2.3 Meixner Polynomial M_n(x; β, c) and Negative Binomial Distribution

Definition:

M_n(x; β, c) = 2F1(−n, −x; β; 1 − 1/c).   (A.37)

Orthogonality:

∑_{x=0}^{∞} ((β)_x / x!) c^x M_m(x; β, c) M_n(x; β, c) = [c^{−n} n! / ((β)_n (1 − c)^β)] δ_mn,   β > 0,  0 < c < 1.   (A.38)

Recurrence relation:

(c − 1) x M_n(x; β, c) = c(n + β) M_{n+1}(x; β, c) − [n + (n + β)c] M_n(x; β, c) + n M_{n−1}(x; β, c).   (A.39)

Rodrigues formula:

((β)_x c^x / x!) M_n(x; β, c) = ∇^n [((β + n)_x c^x) / x!].   (A.40)

Difference equation:

n(c − 1) y(x) = c(x + β) y(x + 1) − [x + (x + β)c] y(x) + x y(x − 1),   (A.41)

where y(x) = M_n(x; β, c).


The weighting function is

f(x) = ((β)_x / x!) (1 − c)^β c^x,   0 < c < 1,  β > 0,  x = 0, 1, 2, . . . .   (A.42)

It can be verified that this is the probability function of a negative binomial distribution. In the case where β is an integer, it is often called the Pascal distribution.

A.2.4 Hahn Polynomial Q_n(x; α, β, N) and Hypergeometric Distribution

Definition:

Q_n(x; α, β, N) = 3F2(−n, n + α + β + 1, −x; α + 1, −N; 1),   n = 0, 1, . . . , N.   (A.43)

Orthogonality: For α > −1 and β > −1, or for α < −N and β < −N,

∑_{x=0}^{N} \binom{α + x}{x} \binom{β + N − x}{N − x} Q_m(x; α, β, N) Q_n(x; α, β, N) = h_n^2 δ_mn,   (A.44)

where

h_n^2 = [(−1)^n (n + α + β + 1)_{N+1} (β + 1)_n n!] / [(2n + α + β + 1)(α + 1)_n (−N)_n N!].

Recurrence relation:

−x Q_n(x) = A_n Q_{n+1}(x) − (A_n + C_n) Q_n(x) + C_n Q_{n−1}(x),   (A.45)

where

Q_n(x) := Q_n(x; α, β, N)

and

A_n = [(n + α + β + 1)(n + α + 1)(N − n)] / [(2n + α + β + 1)(2n + α + β + 2)],

C_n = [n(n + α + β + N + 1)(n + β)] / [(2n + α + β)(2n + α + β + 1)].

Rodrigues formula:

w(x; α, β, N) Q_n(x; α, β, N) = [(−1)^n (β + 1)_n / (−N)_n] ∇^n [w(x; α + n, β + n, N − n)],   (A.46)

where

w(x; α, β, N) = \binom{α + x}{x} \binom{β + N − x}{N − x}.

Difference equation:

n(n + α + β + 1) y(x) = B(x) y(x + 1) − [B(x) + D(x)] y(x) + D(x) y(x − 1),   (A.47)

where y(x) = Q_n(x; α, β, N), B(x) = (x + α + 1)(x − N), and D(x) = x(x − β − N − 1).


If we replace α by −α − 1 and β by −β − 1, we obtain

w(x) = [1 / \binom{N − α − β − 1}{N}] · [\binom{α}{x} \binom{β}{N − x} / \binom{α + β}{N}].

Apart from the constant factor 1/\binom{N − α − β − 1}{N}, this is the definition of the hypergeometric distribution.


Appendix B

The Truncated Gaussian Model G(α, β)

The truncated Gaussian model was developed in [123] in order to circumvent the mathematical difficulty resulting from the tails of the Gaussian distribution. It is an approximation of the Gaussian distribution by a generalized polynomial chaos (gPC) Jacobi expansion. The approximation can be improved either by increasing the order of the gPC expansion or by adjusting the parameters in the Jacobi polynomials. The important property of the model is that it has bounded support, i.e., no tails. It can therefore be used as an alternative in practical applications where the random inputs resemble a Gaussian distribution and boundedness of the support is critical to the solution procedure.

While the procedure of approximating (weakly) a Gaussian distribution by Jacobi polynomials was explained in section 5.1.2, here we tabulate the results for future reference. The gPC Jacobi approximation of the Gaussian distribution is denoted as G(α, β) with α, β > −1. Because of the symmetry of the Gaussian distribution, we set α = β in the Jacobi polynomials.

In figures B.1–B.3, the probability density functions (PDFs) of the gPC Jacobi chaos approximations are plotted for values of α = β = 0 to 10. For α = β = 0, the Jacobi chaos becomes the Legendre chaos, and the first-order expansion is simply a uniform random variable. In this case, Gibbs' oscillations are observed. As the values of (α, β) increase, the approximations improve. The expansion coefficients at different orders are tabulated in table B.1, together with the errors in variance and kurtosis compared with the exact Gaussian distribution.

[Figure B.1: Gaussian random variables approximated by Jacobi chaos (exact PDF versus first-, third-, and fifth-order approximations). Left: α = β = 0. Right: α = β = 2.]


[Figure B.2: Gaussian random variables approximated by Jacobi chaos (exact PDF versus first-, third-, and fifth-order approximations). Left: α = β = 4. Right: α = β = 6.]

[Figure B.3: Gaussian random variables approximated by Jacobi chaos (exact PDF versus first-, third-, and fifth-order approximations). Left: α = β = 8. Right: α = β = 10.]

It is seen that, with α = β = 10, even the first-order approximation, which is simply a beta random variable, has an error in variance of as little as 0.1 percent. The errors in kurtosis are larger because the Jacobi chaos approximations do not possess tails. This, however, is exactly our objective.


Table B.1 Approximating Gaussian Random Variables via Jacobi Chaos: Expansion Coefficients y_k and Errors

         α=β=0         α=β=2         α=β=4         α=β=6         α=β=8         α=β=10
 y_1     1.69248       8.7827(−1)    6.6218(−1)    5.5273(−1)    4.8399(−1)    4.3575(−1)
 ε_2     4.51704(−2)   8.25346(−3)   3.46301(−3)   2.00729(−3)   1.38842(−3)   1.07231(−3)
 ε_4     1.35894       7.05024(−1)   4.79089(−1)   3.63557(−1)   2.93246(−1)   2.45916(−1)
 y_3     4.8399(−1)    7.5493(−2)    2.6011(−2)    1.2216(−2)    6.77970(−3)   4.17792(−3)
 ε_2     1.17071(−2)   8.51816(−4)   4.49245(−4)   4.23983(−4)   4.33894(−4)   4.45282(−4)
 ε_4     5.02097(−1)   7.97474(−2)   3.33201(−2)   2.40064(−2)   2.21484(−2)   2.22539(−2)
 y_5     2.7064(−1)    1.9959(−2)    2.9936(−3)    2.3531(−4)    −3.30888(−4)  −4.19539(−4)
 ε_2     5.04838(−3)   3.97059(−4)   3.96880(−4)   4.22903(−4)   4.28283(−4)   4.25043(−4)
 ε_4     2.55526(−1)   2.29373(−2)   1.92101(−2)   2.15095(−2)   2.06846(−2)   2.08317(−2)

Note: a number in parentheses denotes the power of 10, e.g., 8.7827(−1) = 8.7827 × 10^{−1}. ε_2 is the error in variance; ε_4 is the error in kurtosis. There is no error in the mean, and y_k = 0 when k is even.
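The coefficients in Table B.1 can be reproduced, at least approximately, by orthogonal projection of the inverse-CDF map onto the Jacobi polynomials, y_k = E[g(Z) P_k(Z)] / E[P_k(Z)^2] with g(z) = Φ^{−1}(F_Z(z)), consistent with the procedure described in section 5.1.2. The sketch below is our own illustration of this projection using SciPy; it assumes the same polynomial normalization as the table, which may differ in detail from [123], and it then samples the resulting bounded approximation.

    import numpy as np
    from scipy.special import eval_jacobi, roots_jacobi
    from scipy.stats import beta as beta_dist, norm

    def jacobi_chaos_gaussian(a, order=5, nquad=200):
        # Coefficients y_k of the Jacobi-chaos (alpha = beta = a) approximation of N(0, 1),
        # by projecting g(z) = Phi^{-1}(F_Z(z)) onto P_k^{(a,a)} under the beta weight.
        z, w = roots_jacobi(nquad, a, a)              # Gauss-Jacobi nodes/weights
        g = norm.ppf(beta_dist.cdf((1 + z) / 2, a + 1, a + 1))
        y = []
        for k in range(order + 1):
            Pk = eval_jacobi(k, a, a, z)
            y.append(np.sum(w * g * Pk) / np.sum(w * Pk * Pk))
        return np.array(y)

    y = jacobi_chaos_gaussian(10.0)
    print(y[1], y[3], y[5])   # should roughly match the alpha = beta = 10 column of Table B.1

    # Sample the truncated model: bounded support, approximately standard normal.
    rng = np.random.default_rng(0)
    Z = 2 * rng.beta(11.0, 11.0, size=200_000) - 1
    Y = sum(y[k] * eval_jacobi(k, 10.0, 10.0, Z) for k in range(6))
    print(Y.var())            # close to 1; the small deficit reflects the missing tails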
