forward and adjoint sensitivity computation of chaotic dynamical systems

Journal of Computational Physics 235 (2013) 1–13

Contents lists available at SciVerse ScienceDirect

Journal of Computational Physics

journal homepage: www.elsevier .com/locate / jcp

Forward and adjoint sensitivity computation of chaotic dynamical systems

Qiqi Wang ⇑Department of Aeronautics and Astronautics, MIT, 77 Mass Ave., Cambridge, MA 02139, USA

a r t i c l e i n f o

Article history:Received 23 February 2012Accepted 5 September 2012Available online 27 October 2012

Keywords:Sensitivity analysisLinear responseAdjoint equationUnsteady adjointChaosStatistical averageLyapunov exponentLyapunov covariant vectorLorenz attractor

0021-9991/$ - see front matter � 2012 Elsevier Inchttp://dx.doi.org/10.1016/j.jcp.2012.09.007

⇑ Tel.: +1 617 253 0921.E-mail address: [email protected]

a b s t r a c t

This paper describes a forward algorithm and an adjoint algorithm for computing sensitiv-ity derivatives in chaotic dynamical systems, such as the Lorenz attractor. The algorithmscompute the derivative of long time averaged ‘‘statistical’’ quantities to infinitesimal per-turbations of the system parameters. The algorithms are demonstrated on the Lorenzattractor. We show that sensitivity derivatives of statistical quantities can be accuratelyestimated using a single, short trajectory (over a time interval of 20) on the Lorenzattractor.

� 2012 Elsevier Inc. All rights reserved.

1. Introduction

Computational methods for sensitivity analysis is a powerful tool in modern computational science and engineering.These methods calculate the derivatives of output quantities with respect to input parameters in computational simulations.There are two types of algorithms for computing sensitivity derivatives: the forward algorithms and the adjoint algorithms.The forward algorithms are more efficient for computing sensitivity derivatives of many output quantities to a few inputparameters; the adjoint algorithms are more efficient for computing sensitivity derivatives of a few output quantities tomany input parameters. Key application of computational methods for sensitivity analysis include aerodynamic shape opti-mization [3], adaptive grid refinement [9], and data assimilation for weather forecasting [8].

In simulations of chaotic dynamical systems, such as turbulent flows and the climate system, many output quantities ofinterest are ‘‘statistical averages’’. Denote the state of the dynamical system as xðtÞ; for a function of the state JðxÞ, the cor-responding statistical quantity hJi is defined as an average of JðxðtÞÞ over an infinitely long time interval:

hJi ¼ limT!1

1T

Z T

0JðxðtÞÞdt: ð1Þ

For ergodic dynamical systems, a statistical average only depends on the governing dynamical system, and does not dependon the particular choice of trajectory xðtÞ.

Many statistical averages, such as the mean aerodynamic forces in turbulent flow simulations, and the mean global tem-perature in climate simulations, are of great scientific and engineering interest. Computing sensitivities of these statisticalquantities to input parameters can be useful in many applications.

. All rights reserved.

http://dx.doi.org/10.1016/j.jcp.2012.09.007

mailto:[email protected]

http://dx.doi.org/10.1016/j.jcp.2012.09.007

http://www.sciencedirect.com/science/journal/00219991

http://www.elsevier.com/locate/jcp

2 Q. Wang / Journal of Computational Physics 235 (2013) 1–13

The differentiability of these statistical averages to parameters of interest as been established through the recentdevelopments in the Linear Response Theory for dissipative chaos [6,7]. A class of chaotic dynamical systems, known as‘‘quasi-hyperbolic’’ systems, has been proven to have statistical quantities that are differentiable with respect to smallperturbations. These systems include the Lorenz attractor, and possibly many systems of engineering interest, such as tur-bulent flows.

Despite recent advances both in Linear Response Theory [7] and in numerical methods for sensitivity computation of un-steady systems [10,4], sensitivity computation of statistical quantities in chaotic dynamical systems remains difficult. A ma-jor challenge in computing sensitivities in chaotic dynamical systems is their sensitivity to the initial condition, commonlyknown as the ‘‘butterfly effect’’. The linearized equations, used both in forward and adjoint sensitivity computations, giverise to solutions that blow up exponentially. When a statistical quantity is approximated by a finite time average, the com-puted sensitivity derivative of the finite time average diverges to infinity, instead of converging to the sensitivity derivativeof the statistical quantity [5]. Existing methods for computing correct sensitivity derivatives of statistical quantities usuallyinvolve averaging over a large number of ensemble calculations [5,1]. The resulting high computation cost makes thesemethods not attractive in many applications.

This paper outlines a computational method for efficiently estimating the sensitivity derivative of time averaged statis-tical quantities, relying on a single trajectory over a small time interval. The key idea of our method, inversion of the ‘‘sha-dow’’ operator, is already used as a tool for proving structural stability of strange attractors [6]. The key strategy of ourmethod, divide and conquer of the shadow operator, is inspired by recent advances in numerical computation of the Lyapu-nov covariant vectors [2,11].

In the rest of this paper, Section 2 describes the ‘‘shadow’’ operator, on which our method is based. Section 3 derives thesensitivity analysis algorithm by inverting the shadow operator. Section 4 introduces a fix to the singularity of the shadowoperator. Section 5 summarizes the forward sensitivity analysis algorithm. Section 6 derives the corresponding adjoint ver-sion of the sensitivity analysis algorithm. Section 7 demonstrates both the forward and adjoint algorithms on the Lorenzattractor. Section 8 concludes this paper.

The paper uses the following mathematical notation: Vector fields in the state space (e.g. f ðxÞ;/iðxÞ) are column vectors;gradient of scalar fields (e.g.

@axi

@x ) are row vectors; gradient of vector fields (e.g. @f@x) are matrices with each row being a dimen-

sion of f, and each column being a dimension of x. The (�) sign is used to identify matrix–vector products or vector-vectorinner products. For a trajectory xðtÞ satisfying dx

dt ¼ f ðxÞ and a scalar or vector field aðxÞ in the state space, we often use dadt

to denote daðxðtÞÞdt . The chain rule da

dt ¼ dadx � dx

dt ¼ dadx � f is often used without explanation.

2. The ‘‘shadow operator’’

For a smooth, uniformly bounded n dimensional vector field dxðxÞ, defined on the n dimensional state space of x. The fol-lowing transform defines a slightly ‘‘distorted’’ coordinates of the state space:

x0ðxÞ ¼ xþ �dxðxÞ; ð2Þ

where � is a small real number. Note that for an infinitesimal �, the following relation holds:

x0ðxÞ � x ¼ �dxðxÞ ¼ �dxðx0Þ þ Oð�2Þ: ð3Þ

We call the transform from x to x0 as a ‘‘shadow coordinate transform’’. In particular, consider a trajectory xðtÞ and the cor-responding transformed trajectory x0ðtÞ ¼ x0ðxðtÞÞ. For a small �, the transformed trajectory x0ðtÞwould ‘‘shadow’’ the originaltrajectory xðtÞ, i.e., it stays uniformly close to xðtÞ forever. Fig. 1 shows an example of a trajectory and its shadow.

Now consider a trajectory xðtÞ satisfying an ordinary differential equation

_x ¼ f ðxÞ ð4Þ

with a smooth vector field f ðxÞ as a function of x. The same trajectory in the transformed ‘‘shadow’’ coordinates x0ðtÞ do notsatisfy the same differential equation. Instead, from Eq. (3), we obtain

_x0 ¼ f ðxÞ þ � @dx@x� f ðxÞ ¼ f ðx0Þ � � @x

@x� dxðx0Þ þ � @dx

@x� f ðx0Þ þ Oð�2Þ ð5Þ

In other words, the shadow trajectory x0ðtÞ satisfies a slightly perturbed equation

_x0 ¼ f ðx0Þ þ �df ðx0Þ þ Oð�2Þ; ð6Þ

where the perturbation df is

df ðxÞ ¼ � @f@x� dxðxÞ þ @dx

@x� f ðxÞ ¼ � @f

@x� dxðxÞ þ ddx

dt:¼ ðSf dxÞðxÞ: ð7Þ

For a given differential equation _x ¼ f ðxÞ, Eq. (7) defines a linear operator Sf : dx) df . We call Sf the ‘‘shadow operator’’ of f.For any smooth vector field dxðxÞ that defines a slightly distorted ‘‘shadow’’ coordinate system in the state space, Sf deter-

Fig. 1. A trajectory of the Lorenz attractor under a shadow coordinate transform. The black trajectory shows xðtÞ, and the red trajectory shows xprimeðtÞ. Theperturbation �dx shown corresponds to an infinitesimal change in the parameter r, and is explained in detail in Section 7. (For interpretation of thereferences to color in this figure legend, the reader is referred to the web version of this article.)

Q. Wang / Journal of Computational Physics 235 (2013) 1–13 3

mines a unique smooth vector field df ðxÞ that defines a perturbation to the differential equation. Any trajectory of the ori-ginal differential equation would satisfy the perturbed equation in the distorted coordinates.

Given an ergodic dynamical system _x ¼ f ðxÞ, and a pair ðdx; df Þ that satisfies df ¼ Sf dx; dx determines the sensitivity ofstatistical quantities of the dynamical system to an infinitesimal perturbation �df . Let JðxÞ be a smooth scalar function ofthe state, consider the statistical average hJi as defined in Eq. (1). The sensitivity derivative of hJi to the infinitesimal pertur-bation �df is by definition

dhJid�¼ lim

�!0

1�

limT!1

1T

Z T

0Jðx0ðtÞÞdt � lim

T!1

1T

Z T

0JðxðtÞÞdt

� �; ð8Þ

where by the ergodicity assumption, the statistical average of the perturbed system can be computed by averaging over x0ðtÞ,which satisfies the perturbed governing differential equation. Continuing from Eq. (8),

dhJid�¼ lim

�!0limT!1

1T

Z T

0

Jðx0ðtÞÞ � JðxðtÞÞ�

dt ¼ limT!1

lim�!0

1T

Z T

0


dt ¼ limT!1

1T

Z T

0

@J@x� dx dt ¼ @J

@x� dx

� �: ð9Þ

Eq. (9) represents the sensitivity derivative of a statistical quantity hJi to the size of a perturbation �df . There are two subtlepoints here:

� The two limits lim�!0 and limT!1 can commute with each other for the following reason: The two trajectories x0ðtÞ andxðtÞ stay uniformly close to each other forever; therefore,


�!�!0 @J@x� dx; ð10Þ

uniformly for all t. Consequently,

1T

Z T

0


dt!�!0 1T

Z T

0

@J@x� dxdt; ð11Þ

uniformly for all T. Thus the two limits commute.� The two trajectories x0ðtÞ and xðtÞ start at two specially positioned pair of initial conditions x0ð0Þ ¼ xð0Þ þ �dxðxð0ÞÞ.

Almost any other pair of initial conditions (e.g. x0ð0Þ ¼ xð0Þ) would make the two trajectories diverge as a result of the‘‘butterfly effect’’. They would not stay uniformly close to each other, and the limits lim�!0 and limT!1 would notcommute.


Eq. (9) represents the sensitivity derivative of the statistical quantity hJi to the infinitesimal perturbation �df as another sta-tistical quantity h@J

@x � dxi. We can compute it by averaging @J@x � dx over a sufficiently long trajectory, provided that dx ¼ S�1df is

known along the trajectory. The next section describes how to numerically compute dx ¼ S�1df for a given df .

3. Inverting the shadow operator

Perturbations to input parameters can often be represented as perturbations to the dynamics. Consider a differentialequation _x ¼ f ðx; s1; s2; . . . ; smÞ parameterized by m input variables, an infinitesimal perturbation in a input parametersj ! sj þ � can be represented as a perturbation to the dynamics �df ¼ � df

dsj.

Eq. (9) defines the sensitivity derivative of the statistical quantity hJi to an infinitesimal perturbation �df , provided that adx can be found satisfying df ¼ Sf dx, where Sf is the shadow operator. To compute the sensitivity by evaluating Eq. (9), onemust first numerically invert Sf for a given df to find the corresponding dx.

The key ingredient of numerical inversion of Sf is the Lyapunov spectrum decomposition. This decomposition can be effi-ciently computed numerically [11,2]. In particular, we focus on the case when the system _x ¼ f ðxÞ has distinct Lyapunovexponents. Denote the Lyapunov covariant vectors as /1ðxÞ;/2ðxÞ; . . . ;/nðxÞ. Each /i is a vector field in the state spacesatisfying

ddt

/iðxðtÞÞ ¼@f@x� /iðxðtÞÞ � ki/iðxðtÞÞ; ð12Þ

where k1; k2; . . . ; kn are the Lyapunov exponents in decreasing order.The Lyapunov spectrum decomposition enables a divide and conquer strategy for computing dx ¼ S�1

f df . For any df ðxÞ andevery point x on the attractor, both dxðxÞ and df ðxÞ can be decomposed into the Lyapunov covariant vector directions almosteverywhere, i.e.

dxðxÞ ¼Xn

i¼1

axi ðxÞ/iðxÞ; ð13Þ

df ðxÞ ¼Xn

i¼1

afi ðxÞ/iðxÞ; ð14Þ

where axi and af

i are scalar fields in the state space. From the form of Sf in Eq. (7), we obtain

Sf ðaxi /iÞ ¼ �

@f@x� ðax

i ðxÞ/iðxÞÞ þddtðax

i ðxÞ/iðxÞÞ ¼ �axi ðxÞ

@f@x� /iðxÞ þ

daxi ðxÞ

dt/iðxÞ þ ax

i ðxÞd/iðxÞ

dt: ð15Þ

By substituting Eq. (12) into the last term of Eq. (15), we obtain

Sf ðaxi /iÞ ¼

daxi ðxÞdt

� ki axi ðxÞ

� �/iðxÞ: ð16Þ

By combining Eq. (16) with Eqs. (13) and (14) and the linear relation df ¼ Sf dx, we finally obtain

df ¼Xn

i¼1

Sf ðaxi /iÞ ¼

Xn

i¼1

daxi

dt� ki ax

i

� �|fflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflffl}

afi

/i: ð17Þ

Eqs. (16) and (17) are useful for two reasons:

1. They indicate that the shadow operator Sf , applied to a scalar field axi ðxÞ multiple of /iðxÞ, generates another scalar field

afi ðxÞ multiple of the same vector field /iðxÞ. Therefore, one can compute S�1

f df by first decomposing df as in Eq. (14) toobtain the af

i . If each axi can be calculated from the corresponding af

i , then dx can be computed with Eq. (13), completingthe inversion.

2. It defines a scalar ordinary differential equation that governs the relation between axi and af

i along a trajectory xðtÞ:

daxi ðxÞdt

¼ afi ðxÞ þ ki ax

i ðxÞ: ð18Þ

This equation can be used to obtain axi from af

i along a trajectory, thereby filling the gap in the inversion procedure of Sf out-lined above. For each positive Lyapunov exponent ki, one can integrate the ordinary differential equation

d�axi

dt¼ �af

i þ ki �axi ; ð19Þ

backwards in time from an arbitrary terminal condition, and the difference between �axi ðtÞ and the desired ax

i ðxÞ will decreaseexponentially. For each negative Lyapunov exponent ki, Eq. (19) can be integrated forward in time from an arbitrary initial


condition, and �axi ðtÞ will converge exponentially to the desired ax

i ðxÞ. For a zero Lyapunov exponent ki ¼ 0, Section 4 intro-duces a solution.

4. Time dilation and compression

There is a fundamental problem in the inversion method derived in Section 3: Sf is not invertible for certain df . This can beshown with the following analysis: Any continuous time dynamical system with a non-trivial attractor must have a zeroLyapunov exponent kn0 ¼ 0. The corresponding Lyapunov covariant vector is /n0

ðxÞ ¼ f ðxÞ. This can be verified by substitut-ing ki ¼ 0 and /i ¼ f into Eq. (12). For this i ¼ n0, Eq. (19) becomes

afn0ðxÞ ¼

daxn0ðxÞ

dt: ð20Þ

By taking an infinitely long time average on both sides of Eq. (20), we obtain

afn0ðxÞ

D E¼ lim

T!1

axn0ðxðTÞÞ � ax

n0ðxð0ÞÞ

T¼ 0: ð21Þ

Eq. (21) implies that for any df ¼ Sf dx, the i ¼ n0 component of its Lyapunov decomposition (as in Eq. (14)) must satisfyhaf

n0ðxÞi ¼ 0. Any df that do not satisfy this linear relation, e.g. df � f , would not be in the range space of Sf . Thus the corre-

sponding dx ¼ S�1f df does not exist.

Our solution to the problem is complementing Sf with a ‘‘global time dilation and compression’’ constant g, whose effectproduces a df that is outside the range space of Sf . We call g a time dilation constant for short. The combined effect of a timedilation constant and a shadow transform could produce all smooth perturbations df .

The idea comes from the fact that for a constant g, the time dilated or compressed system _x ¼ ð1þ �gÞf ðxÞ has exactly thesame statistics hJi, as defined in Eq. (1), as the original system _x ¼ f ðxÞ. Therefore, the perturbation in any hJi due to any �df isequal to the perturbation in hJi due to � ðgf ðxÞ þ df ðxÞÞ. Therefore, the sensitivity derivative to df can be computed if we canfind a dx that satisfies Sf dx ¼ gf ðxÞ þ df ðxÞ for some g.

We use the ‘‘free’’ constant g to put gf ðxÞ þ df ðxÞ into the range space of Sf . By substituting gf ðxÞ þ df ðxÞ into the con-straint Eq. (21) that identifies the range space of Sf , the appropriate g must satisfy the following equation

gþ hafn0i ¼ 0; ð22Þ

which we use to numerically compute g.Once the appropriate time dilation constant g is computed, gf ðxÞ þ df ðxÞ is in the range space of Sf . We use the procedure

in Section 3 to compute dx ¼ S�1f ðgf þ df Þ, then use Eq. (9) to compute the desired sensitivity derivative dhJi=d�. The addition

of gf to df affects Eq. (19) only for i ¼ n0, making it

daxn0ðxÞ

dt¼ af

n0ðxÞ þ g: ð23Þ

Eq. (23) indicates that axn0

can be computed by integrating the right hand side along the trajectory.The solution to Eq. (23) admits an arbitrary additive constant. The effect of this arbitrary constant is the following: By

substituting Eqs. (13) into Eq. (9), the contribution from the i ¼ n0 term of dx to dhJi=d� is

limT!1

1T

Z T

0af

n0

dJdt

dt: ð24Þ

Therefore, any constant addition to afn0

vanishes as T !1. Computationally, however, Eq. (9) must be approximated by afinite time average. We find it beneficial to adjust the level of af

n0 to approximately hafn0 i ¼ 0, in order to control the error

due to finite time averaging.

5. The forward sensitivity analysis algorithms

For a given _x ¼ f ðxÞ; df and JðxÞ, the mathematical developments in Sections 3 and 4 are summarized into Algorithm 1 forcomputing the sensitivity derivative ddhJi=d� as in Eq. (9).

The preparation phase of the algorithm (Steps 1–3) computes a trajectory and the Lyapunov spectrum decompositionalong the trajectory. The algorithm then starts by decomposing df (Step 4), followed by computing dx (Steps 5–7), and finallycomputing dhJi=d� (Step 8). The sensitivity derivative of many different statistical quantities hJ1i; hJ2i; . . . to a single df can becomputed by only repeating the last step of the algorithm. Therefore, this is a ‘‘forward’’ algorithm in the sense that it effi-ciently computes sensitivity of multiple output quantities to a single input parameter (the size of perturbation �df ). We willsee that this is in sharp contrast to the ‘‘adjoint’’ algorithm described in Section 6, which efficiently computes the sensitivityderivative of one output statistical quantity hJi to many perturbations df1; df2; . . ..

It is worth noting that the dx computed using Algorithm 1 satisfies the forward tangent equation


_dx ¼ @f@x� dxþ g f þ df : ð25Þ

This can be verified by taking derivative of Eq. (13), substituting Eqs. (19) and (23), then using Eq. (14). However, dx mustsatisfy both an initial condition and a terminal condition, making it difficult to solve with conventional time integrationmethods. In fact, Algorithm 1 is equivalent to splitting dx into stable, neutral and unstable components, corresponding topositive, zero and negative Lyapunov exponents; then solving Eq. (25) separately for each component in different time direc-tions. This alternative version of the forward sensitivity computation algorithm could be useful for large systems to avoidcomputation of all the Lyapunov covariant vectors.

Algorithm 1. The forward sensitivity analysis algorithm

(1) Choose a ‘‘spin-up buffer time’’ TB, and an ‘‘statistical averaging time’’ TA. TB should be much longer than 1=jkij for
all nonzero Lyapunov exponent ki, so that the solutions of Eq. (19) can reach ax
i over a time span of TB. TA should bemuch longer than the decorrelation time of the dynamics, so that one can accurately approximate a statistical quan-tity by averaging over ½0; TA�.(2) Obtain an initial condition on the attractor at t ¼ �TB, e.g., by solving _x ¼ f ðxÞ for a sufficiently long time span,starting from an arbitrary initial condition.(3) Solve _x ¼ f ðxÞ to obtain a trajectory xðtÞ; t 2 ½�TB; TA þ TB�; compute the Lyapunov exponents ki and the Lyapunovcovariant vectors /iðxðtÞÞ along the trajectory, e.g., using algorithms in [11,2].(4) Perform the Lyapunov spectrum decomposition of df ðxÞ along the trajectory xðtÞ to obtain af

i ðxÞ; i ¼ 1; . . . ;n as inEq. (14).(5) Compute the global time dilation constant g using Eq. (22).(6) Solve the differential Eqs. (19) to obtain ax

i over the time interval ½0; TA�. The equations with positive ki are solvedbackward in time from t ¼ TA þ TB to t ¼ 0; the ones with negative ki are solved forward in time from t ¼ �TB tot ¼ TA. For kn0 ¼ 0, Eq. (23) is integrated, and the mean of ax

n0is set to zero.

(7) Compute dx along the trajectory xðtÞ; t 2 ½0; TA� with Eq. (13).(8) Compute dhJi=d� using Eq.(1) by averaging over the time interval ½0; TA�.

6. The adjoint sensitivity analysis algorithm

The adjoint algorithm starts by trying to find an adjoint vector field f ðxÞ, such that the sensitivity derivative of the givenstatistical quantity hJi to any infinitesimal perturbation �df can be represented as

dhJi�¼ f T � dfD E

ð26Þ

Both f in Eq. (26) and @J@x in Eq. (9) can be decomposed into linear combinations of the adjoint Lyapunov covariant vectors al-

most everywhere on the attractor:

f ðxÞ ¼Xn

i¼1

afi ðxÞwiðxÞ; ð27Þ

@J@x

T

¼Xn

i¼1

axi ðxÞwiðxÞ; ð28Þ

where the adjoint Lyapunov covariant vectors wi satisfy

� ddt

wiðxðtÞÞ ¼@f@x

T

� wiðxðtÞÞ � kiwiðxðtÞÞ ð29Þ

with proper normalization, the (primal) Lyapunov covariant vectors /i and the adjoint Lyapunov covariant vectors wi havethe following conjugate relation:

wiðxÞT � /jðxÞ �

0; i – j;1; i ¼ j;

�ð30Þ

i.e., the n� n matrix formed by all the /i and the n� n matrix formed by all the wi are the transposed inverse of each other atevery point x in the state space.

By substituting Eqs. (13) and (28) into Eq. (9), and using the conjugate relation in Eq. (30), we obtain

dhJid�¼Xn

i¼1

axi ax

i

� : ð31Þ


Similarly, by combining Eqs. (26), (14), (27) and (30), it can be shown that f satisfies Eq. (26) if and only if

dhJid�¼Xn

i¼1

afi af

i

D E: ð32Þ

Comparing Eqs. (31) and (32) leads to the following conclusion: Eq. (26) can be satisfied by finding afi that satisfy

afi a

fi

D E¼ ax

i axi

� ; i ¼ 1; . . . ;n: ð33Þ

The afi that satisfies Eq. (33) can be found using the relation between af

i and axi in Eq. (18). By multiplying af

i on both sides ofEq. (18) and integrate by parts in time, we obtain

1T

Z T

0af

i afi dt ¼ af

i axi

T

T

0

� 1T

Z T

0

dafi

dtþ ki af

i

!ax

i dt ð34Þ

for i – n0. Through apply the same technique to Eq. (23), we obtain for i ¼ n0

1T

Z T

0af

n0af

n0dt ¼

afn0

axn0

T

T

0

� 1T

Z T

0

dafn0

dtax

n0dt þ 1

T

Z T

0g af

n0dt: ð35Þ

If we set afi to satisfy the following relations

� dafi ðxÞdt

¼ axi ðxÞ þ ki af

i ðxÞ; i – n0;

� dafi ðxÞdt

¼ axi ðxÞ; haf

i i ¼ 0; i ¼ n0;

ð36Þ

then Eqs. (34) and (35) become

1T

Z T

0af

i afi dt ¼ af

i axi

T

T

0

þ 1T

Z T

0ax

i axi dt; i – n0;

1T

Z T

0af

i afi dt ¼ af

i axi

T

T

0

þ 1T

Z T

0ax

i axi dt þ g

1T

Z T

0af

n0dt � haf

n0i

� �; i ¼ n0:

ð37Þ

As T !1, both equations reduces to Eq. (33).In summary, if the scalar fields af

i satisfy Eq. (36), then they also satisfy Eq. (37) and thus Eq. (33); as a result, the f formedby these af through Eq. (27) satisfies Eq. (26), thus is the desired adjoint vector field.

For each i – n0, the scalar field afi satisfying Eq. (36) can be computed by solving an ordinary differential equations

� d�afi

dt¼ �ax

i þ ki�af

i : ð38Þ

Contrary to computation of axi through solving Eq. (19), the time integration should be forward in time for positive ki, and

backward in time for negative ki, in order for the difference between �afi ðtÞ and af

i ðxðtÞÞ to diminish exponentially.The i ¼ n0 equation in Eq. (36) can be directly integrated to obtain af

n0ðxÞ. The equation is well defined because the right

hand side is mean zero:

1T

Z T

0af

n0ðxðtÞÞdt ¼ 1

T

Z T

0

@J@x� /n0

dt ¼ 1T

Z T

0

dJdt

dt�!T!10: ð39Þ

Therefore, the integral of axn0ðxÞ over time, subtracted by its mean, is the solution af

n0ðxÞ to the i ¼ n0 case of Eq. (36).

Algorithm 2. The adjoint sensitivity analysis algorithm

(1) Choose a ‘‘spin-up buffer time’’ TB, and an ‘‘statistical averaging time’’ TA. TB should be much longer than 1=jkij for
all nonzero Lyapunov exponent ki, so that the solutions of Eq. (19) can reach ax
i over a time span of TB. TA should bemuch longer than the decorrelation time of the dynamics, so that one can accurately approximate a statistical quan-tity by averaging over ½0; TA�.(2) Obtain an initial condition on the attractor at t ¼ �TB, e.g., by solving _x ¼ f ðxÞ for a sufficiently long time span,starting from an arbitrary initial condition.(3) Solve _x ¼ f ðxÞ to obtain a trajectory xðtÞ; t 2 ½�TB; TA þ TB�; compute the Lyapunov exponents ki and the Lyapunovcovariant vectors /iðxðtÞÞ along the trajectory, e.g., using algorithms in [11,2].

(continued on next page)


(continued)

Algorithm 2. The adjoint sensitivity analysis algorithm

(4) Perform the Lyapunov spectrum decomposition of ð@J=@xÞT along the trajectory xðtÞ to obtain axi ðxðtÞÞ; i ¼ 1; . . . ;n

as in Eq. (28).(5) Solve the differential Eqs. (38) to obtain af

i ðxðtÞÞ over the time interval ½0; TA�. The equations with negative ki aresolved backward in time from t ¼ TA þ TB to t ¼ 0; the ones with positive ki are solved forward in time from t ¼ �TB

to t ¼ TA. For i ¼ n0, the scalar �axn0

is integrated along the trajectory; the mean of the integral is subtracted from theintegral itself to obtain af

n0.

(6) Compute f along the trajectory xðtÞ; t 2 ½0; TA� with Eq. (27).(7) Compute dhJi=d� using Eq. (26) by averaging over the time interval ½0; TA�.

The above analysis summarizes to Algorithm 2 for computing the sensitivity derivative derivative of the statistical averagehJi to an infinitesimal perturbations �df . The preparation phase of the algorithm (Steps 1–3) is exactly the same as in Algo-rithm 1. These steps compute a trajectory xðtÞ and the Lyapunov spectrum decomposition along the trajectory. The adjointalgorithm then starts by decomposing the derivative vector ð@J=@xÞT (Step 4), followed by computing the adjoint vector df(Steps 5–6), and finally computing dhJi=d� for a particular df . Note that the sensitivity of the same hJi to many different per-turbations df1; df2; . . . can be computed by repeating only the last step of the algorithm. Therefore, this is an ‘‘adjoint’’ algo-rithm, in the sense that it efficiently computes the sensitivity derivatives of a single output quantity to many inputperturbation.

It is worth noting that f computed using Algorithm 2 satisfies the adjoint equation

� _f ¼ @f

@x

T

� f � @J@x: ð40Þ

This can be verified by taking derivative of Eq. (27), substituting Eq. (36), then using Eq. (28). However, f must satisfy both aninitial condition and a terminal condition, making it difficult to solve with conventional time integration methods. In fact,Algorithm 2 is equivalent to splitting f into stable, neutral and unstable components, corresponding to positive, zero andnegative Lyapunov exponents; then solving Eq. (40) separately for each component in different time directions. This alter-native version of the adjoint sensitivity computation algorithm could be useful for large systems, to avoid computation of allthe Lyapunov covariant vectors.

7. An example: the Lorenz attractor

We consider the Lorenz attractor _x ¼ f ðxÞ, where x ¼ ðx1; x2; x3ÞT , and

f ðxÞ ¼rðx2 � x1Þ

x1ðr � x3Þ � x2

x1x2 � bx3

0B@

1CA: ð41Þ

The ‘‘classic’’ parameter values r ¼ 10; r ¼ 28; b ¼ 8=3 are used. Both the forward sensitivity analysis algorithm(Algorithm 1) and and the adjoint sensitivity analysis algorithm (Algorithm 2) are performed on this system.

We want to demonstrate the computational efficiency of our algorithm; therefore, we choose a relatively short statisticalaveraging interval of TA ¼ 10, and a spin up buffer period of TB ¼ 5. Only a single trajectory of length TA þ 2TB on the attractoris required in our algorithm. Considering that the oscillation period of the Lorenz attractor is around 1, the combined trajec-tory length of 20 is a reasonable time integration length for most simulations of chaotic dynamical systems. In our example,we start the time integration from t ¼ �10 at x ¼ ð�8:67139571762;4:98065219709;25Þ, and integrate the equation tot ¼ �5, to ensure that the entire trajectory from �TB to TA þ TB is roughly on the attractor. The rest of the discussion in thissection are focused on the trajectory xðtÞ for t 2 ½�TB; TA þ TB�.

7.1. Lyapunov covariant vectors

The Lyapunov covariant vectors are computed in Step 3 of both Algorithms 1 and 2, over the time interval ½�TB; TA þ TB�.These vectors, along with the trajectory xðtÞ, are shown in Fig. 2.

The three dimensional Lorenz attractor has three pairs of Lyapunov exponents and Lyapunov covariant vectors. k1 is theonly positive Lyapunov exponent, and /1 is computed by integrating the tangent linear equation

_~x ¼ @f@x� ~x; ð42Þ

forward in time from an arbitrary initial condition at t ¼ �TB. The first Lyapunov exponent is estimated to be k1 � 0:95through a linear regression of ~x in the log space. The first Lyapunov vector is then obtained as /1 ¼ ~xe�k1t .

Fig. 2. The Lyapunov covariant vectors of the Lorenz attractor along the trajectory xðtÞ for t 2 ½0;10�. The x-axes are t; the blue, green and red linescorrespond to the x1; x2 and x3 coordinates in the state space, respectively. (For interpretation of the references to color in this figure legend, the reader isreferred to the web version of this article.)


k2 ¼ 0 is the vanishing Lyapunov exponent; therefore, /2 ¼ h f ðxÞ, where h ¼ 1=ffiffiffiffiffiffiffiffiffiffiffiffiffihkfk2

2iq

is a normalizing constant thatmake the mean magnitude of /2 equal to 1.

The third Lyapunov exponent k3 is negative. So /3 is computed by integrating the tangent linear Eq. (42) backwards intime from an arbitrary initial condition at t ¼ TA þ TB. The third Lyapunov exponent is estimated to be k3 � �14:6 througha linear regression of the backward solution ~x in the log space. The third Lyapunov vector is then obtained as /3 ¼ ~xe�k3t .

7.2. Forward sensitivity analysis

We demonstrate our forward sensitivity analysis algorithm by computing the sensitivity derivative of three statisticalquantities hx2

1i, hx22i and, hx3i to a small perturbation in the system parameter r in the Lorenz attractor Eq. (41). The infini-

tesimal perturbation r ! r þ � is equivalent to the perturbation

Fig. 3.definedreferen

�df ¼ � @f@r¼ � ð0; x1;0ÞT : ð43Þ

The forcing term defined in Eq. (43) is plotted in Fig. 3(a). Fig. 3(b) plots the decomposition coefficients afi , computed by solv-

ing a 3� 3 linear system defined in Eq. (14) at every point on the trajectory.

Lyapunov vector decomposition of df . The x-axes are t; the blue, green and red lines on the left are the first, second and third component of df asin Eq. (43); the blue, green and red lines on the right are af

1; af2 and af

3 in the decomposition of df (Eq. (14)), respectively. (For interpretation of theces to color in this figure legend, the reader is referred to the web version of this article.)


For each afi obtained through the decomposition, Eq. (19) or (23) is solved to obtain ax

i . For i ¼ 1, Eq. (19) is solved back-wards in time from t ¼ TA þ TB to t ¼ 0. For i ¼ n0 ¼ 2, the time compression constant is estimated to be g � �2:78, and Eq.(23) is integrated to obtain ax

2. For i ¼ 3, Eq. (19) is solved forward in time from t ¼ �TB to t ¼ TA.The resulting values of ax

i ; i ¼ 1;2;3 are plotted in Fig. 4(a). These values are then substituted into Eq. (13) to obtain dx, asplotted in Fig. 4(b). The ‘‘shadow’’ trajectory defined as x0 ¼ xþ �dx is also plotted in Fig. 1 as the red lines, for an � ¼ 1=3. Thisdx ¼ S�1

f df is approximately the shadow coordinate perturbation ‘‘induced’’ by a 1=3 increase in the input parameter r, a.k.a.the Rayleigh number in the Lorenz attractor.

The last step of the forward sensitivity analysis algorithm is computing the sensitivity derivatives of the output statisticalquantities using Eq. (9). We found that using a windowed time averaging [4] yields more accurate sensitivities. Here our esti-mates over the time interval ½0; TA� are

Fig. 4.the righis refer

dhx21i

dr� 2:64;

dhx22i

dr� 3:99:

dhx3idr� 1:01 ð44Þ

These sensitivity values compare well to results obtained through finite difference, as shown in Section 7.4.

7.3. Adjoint sensitivity analysis

We demonstrate our adjoint sensitivity analysis algorithm by computing the sensitivity derivatives of the statisticalquantity hx3i to small perturbations in the three system parameters s; r and b in the Lorenz attractor Eq. (41).

The first three steps of Algorithm 2 is the same as in Algorithm 1, and has been demonstrated in Section 7.1. Step 4 in-volves decomposing ð@J=@xÞT into three adjoint Lyapunov covariant vectors. In our case, JðxÞ ¼ x3, therefore @J=@x � ð0;0;1Þ,as plotted in Fig. 5(a). The adjoint Lyapunov covariant vectors wi can be computed using Eq. (30) by inverting the 3� 3 matrixformed by the (primal) Lyapunov covariant vectors /i at every point on the trajectory. The coefficients ax

i ; i ¼ 1;2;3 can thenbe computed by solving Eq. (28). These scalar quantities along the trajectory are plotted in Fig. 5(b) for t 2 ½0; TA�.

Once we obtain axi ; a

fi can be computed by solving Eq. (38). The solution is plotted in Fig. 6(a). Eq. (27) can then be used to

combine the afi into the adjoint vector f . The computed f along the trajectory is plotted both in Fig. 6(b) as a function of t, and

also in Fig. 7 as arrows on the trajectory in the state space.The last step of the adjoint sensitivity analysis algorithm is computing the sensitivity derivatives of hJi to the perturba-

tions dfs ¼ dfds, dfr ¼ df

dr and dfb ¼ dfdb using Eq. (26). Here our estimates over the time interval ½0; TA� are computed as

dhx3ids� 0:21;

dhx3idr� 0:97;

dhx3idb� �1:74: ð45Þ

Note that dhx3idr estimated using adjoint method differs from the same value estimated using forward method (44). This dis-

crepancy can be caused by the different numerical treatments to the time dilation term in the two methods. The forwardmethod numerically estimates the time dilation constant g through Eq. (22); while the adjoint method sets the mean ofaf

i to zero (36), so that the computation is independent to the value of g. This difference could cause apparent discrepancyin the estimated sensitivity derivatives.

The next section compares these sensitivity estimates, together with the sensitivity estimates computed in Section 7.2,, toa finite difference study.

7.4. Comparison with the finite difference method

To reduce the noise in the computed statistical quantities in the finite difference study, a very long time integration lengthof T ¼ 100;000 is used for each simulation. Despite this long time averaging, the quantities computed contain statisticalnoise of the order 0:01. The noise limits the step size of the finite difference sensitivity study. Fortunately all the output sta-

Inversion of Sf for dx ¼ S�1f df . The x-axes are t; the blue, green and red lines on the left are ax

1; ax2 and ax

3, respectively; the blue, green and red lines ont are the first, second and third component of dx, computed via Eq. (13). (For interpretation of the references to color in this figure legend, the reader

red to the web version of this article.)

Fig. 5. Adjoint Lyapunov vector decomposition of @J=@x. The x-axes are t; the blue, green and red lines on the left are the first, second and third componentof @J=@x; the blue, green and red lines on the right are ax

1; ax2 and ax

3 in the decomposition of @J=@x (Eq. (28)), respectively. (For interpretation of the referencesto color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Computation of the adjoint solution f for the Lorenz attractor. The x-axes are t; the blue, green and red lines on the left are af1; a

f2 and af

3, respectively;the blue, green and red lines on the right are the first, second and third component of f , computed via Eq. (27). (For interpretation of the references to colorin this figure legend, the reader is referred to the web version of this article.)

Fig. 7. The adjoint sensitivity derivative f as in Eq. (26), represented by arrows on the trajectory.


Fig. 8. Histogram of sensitivities computed using Algorithm 1 (forward sensitivity analysis) starting from 200 random initial conditions. TA ¼ 10; TB ¼ 5.The red region identifies the 3r confidence interval estimated using finite difference regression. (For interpretation of the references to color in this figurelegend, the reader is referred to the web version of this article.)

Fig. 9. Histogram of sensitivities computed using Algorithm 2 (adjoint sensitivity analysis) starting from 200 random initial conditions. TA ¼ 10; TB ¼ 5.The red region identifies the 3r confidence interval estimated using finite difference regression. (For interpretation of the references to color in this figurelegend, the reader is referred to the web version of this article.)


tistical quantities seem fairly linear with respect to the input parameters, and a moderately large step size of the order 0:1can be used. To further reduce the effect of statistical noise, we perform linear regressions through 10 simulations of theLorenz attractor, with r equally spaced between 27:9 and 28:1. The total time integration length (excluding spin up time)is 1;000;000. The resulting computation cost is in sharp contrast to our method, which involves a trajectory of only length20.

Similar analysis is performed for the parameters s and b, where 10 values of s equally spaced between 9:8 and 10:2 areused, and 10 values of b equally spaced between 8=3� 0:02 and 8=3þ 0:02 are used. The slopes estimated from the linearregressions, together with 3r confidence intervals (where r is the standard error of the linear regression) is listed below:

dhx21i

dr¼ 2:70 0:10;

dhx22i

dr¼ 3:87 0:18;

dhx3idr¼ 1:01 0:04;

dhx3ids¼ 0:16 0:02;

dhx3idb¼ �1:68 0:15:

ð46Þ

To further assess the accuracy of our algorithm, which involves finite time approximations to Eqs. (9) and (26), we repeatedboth Algorithms 1 and 2 for 200 times, starting from random initial conditions at T ¼ �10. We keep the statistical averagingtime TA ¼ 10 and the spin up buffer time TB ¼ 5. The resulting histogram of sensitivities computed with Algorithm 1 isshown in Fig. 8; the histogram of sensitivities computed with Algorithm 2 is shown in Fig. 9. The finite difference estimatesare also indicated in these plots.

We observe that our algorithms compute accurate sensitivities most of the time. However, some of the computed sensi-tivities seems to have heavy tails in their distribution. This may be due to behavior of the Lorenz attractor near the unstablefixed point ð0;0;0Þ. Similar heavy tailed distribution has been observed in other studies of the Lorenz attractor [1]. Theyfound that certain quantities computed on Lorenz attractor can have unbounded second moment. This could be the casein our sensitivity estimates. Despite this minor drawback, the sensitivities computed using our algorithm have good quality.Our algorithms are much more efficient than existing sensitivity computation methods using ensemble averages.

8. Conclusion

This paper derived a forward algorithm and an adjoint algorithm for computing sensitivity derivatives in chaotic dynam-ical systems. Both algorithms efficiently compute the derivative of statistical quantities hJi to infinitesimal perturbations �dfto the dynamics.


The forward algorithm starts from a given perturbation df , and computes a perturbed ‘‘shadow’’ coordinate system dx, e.g.as shown in Fig. 1. The sensitivity derivatives of multiple statistical quantities to the given df can be computed from dx. Theadjoint algorithm starts from a statistical quantity hJi, and computes an adjoint vector f , e.g. as shown in Fig. 7. The sensi-tivity derivative of the given hJi to multiple input perturbations can be computed from f .

We demonstrated both the forward and adjoint algorithms on the Lorenz attractor at standard parameter values. The for-ward sensitivity analysis algorithm is used to simultaneously compute @hx2

1i@r , @hx

22i

@r , and @hx3i@r ; the adjoint sensitivity analysis algo-

rithm is used to simultaneously compute @hx3i@s , @hx3i

@r , and @hx3i@b . We show that using a single trajectory of length about 20, both

algorithms can efficiently compute accurate estimates of all the sensitivity derivatives.

References

[1] G. Eyink, T. Haine, D. Lea, Ruelle’s linear response formula, ensemble adjoint schemes and Lévy flights, Nonlinearity 17 (2004) 1867–1889.[2] F. Ginelli, P. Poggi, A. Turchi, H. Chaté, R. Livi, A. Politi, Characterizing dynamics with covariant Lyapunov vectors, Physical Review Letters 99 (2007)

130601.[3] A. Jameson, Aerodynamic design via control theory, Journal of Scientific Computing 3 (1988) 233–260.[4] J. Krakos, Q. Wang, S. Hall, D. Darmofal, Sensitivity analysis of limit cycle oscillations, Journal of Computational Physics 231 (8) (2012) 3228–3245.[5] D. Lea, M. Allen, T. Haine, Sensitivity analysis of the climate of a chaotic system, Tellus 52A (2000) 523–532.[6] D. Ruelle, Differentiation of SRB states, Communications in Mathematical Physics 187 (1997) 227–241.[7] D. Ruelle, A review of linear response theory for general differentiable dynamical systems, Nonlinearity 22 (4) (2009) 855.[8] J.-N. Thepaut, P. Courtier, Four-dimensional variational data assimilation using the adjoint of a multilevel primitive-equation model, Quarterly Journal

of the Royal Meteorological Society 117 (502) (1991) 1225–1254.[9] D. Venditti, D. Darmofal, Grid adaptation for functional outputs: Application to two-dimensional inviscid flow, Journal of Computational Physics 176

(2002) 4069.[10] Q. Wang, P. Moin, G. Iaccarino, Minimal repetition dynamic checkpointing algorithm for unsteady adjoint calculation, SIAM Journal on Scientific

Computing 31 (4) (2009) 2549–2567.[11] C. Wolfe, R. Samelson, An efficient method for recovering Lyapunov vectors from singular vectors, Tellus A 59 (3) (2007) 355–366.

forward and adjoint sensitivity computation of chaotic dynamical systems

Documents