The Japan Society for Industrial and Applied Mathematics
Vol.1 (2009) pp.1-79
Editorial Board
Chief Editor: Yoshimasa Nakamura (Kyoto University)
Vice-Chief Editor: Kazuo Kishimoto (Tsukuba University)
Associate Editors: Reiji Suda (University of Tokyo), Satoshi Tsujimoto (Kyoto University), Masashi Iwasaki (Kyoto Prefectural University), Norikazu Saito (University of Tokyo), Koh-ichi Nagao (Kanto Gakuin University), Koichi Kato (Japan Institute for Pacific Studies), Saburo Kakei (Rikkyo University), Atsushi Nagai (Nihon University), Takeshi Mandai (Osaka Electro-Communication University), Ryuichi Ashino (Osaka Kyoiku University), Ken Umeno (NiCT), Yuzuru Sato (Hokkaido University), Daisuke Takahashi (Waseda University), Katsuhiro Nishinari (University of Tokyo), Hitoshi Imai (University of Tokushima), Nobito Yamamoto (University of Electro-Communications), Takahiro Katagiri (University of Tokyo), Tetsuya Sakurai (Tsukuba University), Yoshitaka Watanabe (Kyushu University), Takeshi Ogita (Tokyo Woman's Christian University), Takashi Suzuki (Osaka University), Yoshihiro Shikata, Tatsuo Oyama (National Graduate Institute for Policy Studies), Tetsuo Ichimori (Osaka Institute of Technology), Masami Hagiya (University of Tokyo), Yasuyuki Tsukada (NTT Communication Science Laboratories), Hideyuki Azegami (Nagoya University), Kenji Shirota (Ibaraki University), Naoyuki Ishimura (Hitotsubashi University), Jiro Akahori (Ritsumeikan University), Ken Nakamula (Tokyo Metropolitan University), Miho Aoki (Okayama University of Science), Keiko Imai (Chuo University), Ichiro Kataoka (HITACHI), Shin-Ichi Nakano (Gunma University), Maiko Shigeno (Tsukuba University), Ichiro Hagiwara (Tokyo Institute of Technology), Fumiko Sugiyama (Kyoto University)
Contents
On a discrete optimal velocity model and its continuous and ultradiscrete relatives ・・・ 1-4 Daisuke Takahashi and Junta Matsukidaira
Numerical Inclusion of Optimum Point for Linear Programming ・・・ 5-8 Shin'ichi Oishi and Kunio Tanabe
2D tight framelets with orientation selectivity suggested by vision science ・・・ 9-12 Hitoshi Arai and Shinobu Arai
Analysis of Neuronal Dendrite Patterns Using Eigenvalues of Graph Laplacians ・・・ 13-16 Naoki Saito and Ernest Woei
The Gateau derivative of cost functions in the optimal shape problems and the existence of the shape derivatives of solutions of the Stokes problems ・・・ 17-20 Satoshi Kaizu
On very accurate verification of solutions for boundary value problems by using spectral methods ・・・ 21-24 Mitsuhiro T. Nakao and Takehiko Kinoshita
On oscillatory solutions of the ultradiscrete Sine-Gordon equation ・・・ 25-27 Shin Isojima and Junkichi Satsuma
Computational and Symbolic Anonymity in an Unbounded Network ・・・ 28-31 Hubert Comon-Lundh, Yusuke Kawamoto and Hideki Sakurada
Reformulation of the Anderson method using singular value decomposition for stable convergence in self-consistent calculations ・・・ 32-35 Akitaka Sawamura
On the qd-type discrete hungry Lotka-Volterra system and its application to the matrix eigenvalue algorithm ・・・ 36-39 Akiko Fukuda, Emiko Ishiwata, Masashi Iwasaki and Yoshimasa Nakamura
Eigendecomposition algorithms solving sequentially quadratic systems by Newton method ・・・ 40-43 Koichi Kondo, Shinji Yasukouchi and Masashi Iwasaki
Block BiCGGR: a new Block Krylov subspace method for computing high accuracy solutions ・・・ 44-47 Hiroto Tadano, Tetsuya Sakurai and Yoshinobu Kuramashi
On parallelism of the I-SVD algorithm with a multi-core processor ・・・ 48-51 Hiroki Toyokawa, Kinji Kimura, Masami Takata and Yoshimasa Nakamura
A numerical method for nonlinear eigenvalue problems using contour integrals ・・・ 52-55 Junko Asakura, Tetsuya Sakurai, Hiroto Tadano, Tsutomu Ikegami and Kinji Kimura
Differential qd algorithm for totally nonnegative band matrices: convergence properties and error analysis ・・・ 56-59 Yusaku Yamamoto and Takeshi Fukaya
Algorithm for computing Kronecker basis ・・・ 60-63 Yoshiaki Kakinuma, Kazuyuki Hiraoka, Hiroki Hashiguchi, Yutaka Kuwajima and Takaomi Shigehara
Robust exponential hedging in a Brownian setting ・・・ 64-67 Keita Owari
A hybrid of the optimal velocity and the slow-to-start models and its ultradiscretization ・・・ 68-71 Kazuhito Oguma and Hideaki Ujino
A new compressible fluid model for traffic flow with density-dependent reaction time of drivers ・・・ 72-75 Akiyasu Tomoeda, Daisuke Shamoto, Ryosuke Nishi, Kazumichi Ohtsuka and Katsuhiro Nishinari
Error analysis for a matrix pencil of Hankel matrices with perturbed complex moments ・・・ 76-79 Tetsuya Sakurai, Junko Asakura, Hiroto Tadano and Tsutomu Ikegami
JSIAM Letters Vol.1 (2009) pp.1–4 c©2009 Japan Society for Industrial and Applied Mathematics
On a discrete optimal velocity model and its continuous and ultradiscrete relatives
Daisuke Takahashi1 and Junta Matsukidaira2
Department of Applied Mathematics, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan1
Department of Applied Mathematics and Informatics, Ryukoku University, Seta, Ohtsu, Shiga 520-2194, Japan2
E-mail: [email protected], [email protected]
Received August 29, 2008, Accepted October 5, 2008 (INVITED PAPER)
Abstract
We propose a discrete traffic flow model with discrete time. The continuum limit of this model is equivalent to the optimal velocity model. It also has an ultradiscrete limit, from which a piecewise-linear traffic flow model is obtained. Both models show a phase transition from free flow to jam in the fundamental diagram. Moreover, the ultradiscrete model includes the Fukui–Ishibashi model as a special case.
Keywords optimal velocity model, discrete model, ultradiscretization
Research Activity Group Applied Integrable Systems
1. Introduction
There are various models of different levels of discreteness used to analyze traffic congestion [1]. A macroscopic model is defined by a partial differential equation based on fluid dynamics and describes a traffic flow by the motion of a continuous medium. For example, Musha and Higuchi used the Burgers equation to describe fluctuations of traffic flow [2].
Systems of ordinary differential equations (ODEs), coupled map lattices (CML) and cellular automata (CA) are often used as microscopic models to describe each vehicle's motion directly. In ODE models, time t and vehicle position x are continuous, and the vehicle number k is discrete. CML is similar to ODE but time is discretized [3]. All dependent and independent variables are discrete in CA models. For example, the Nagel–Schreckenberg model [4], the elementary CA of rule number 184 (ECA184) [5], the Fukui–Ishibashi (FI) model [6] and the slow-start model [7] are known as effective traffic models. Though the evolution rule of a CA model is simple due to its discreteness, the mechanism of congestion formation is presented sharply.
Bando et al. proposed a noticeable ODE model [8]. The model is now called the 'optimal velocity model' (OV model) and is defined as follows. Assume a finite number of vehicles moving on a one-way single-lane circuit as shown in Fig. 1. The length of the circuit is L and the total number of vehicles is K. Introduce a one-dimensional coordinate along the circuit with an appropriate origin. Define x_k(t) as the position of the vehicle with vehicle number k (k = 1, 2, ..., K) at time t. The vehicle numbers are given sequentially so that a preceding vehicle has the larger number. Note that the preceding vehicle of k = K is k = 1. Then the evolution equation for x_k(t) is
\ddot{x}_k = A ( V(x_{k+1} − x_k) − \dot{x}_k ),   (1)
Fig. 1. Circuit and vehicles (positions x_1, x_2, ..., x_K along the circuit).
where A is a constant representing a driver's sensitivity and V(∆x) is an optimal velocity function representing the desired velocity of a driver at distance ∆x between his vehicle and the vehicle ahead. The acceleration of the k-th vehicle is determined by (1) and is proportional to the difference between its optimal velocity and its current real velocity \dot{x}_k.
The typical profile of the optimal velocity is shown in Fig. 2. This profile reflects a driver's behavior: if the distance to the vehicle ahead is short (long), he wants to keep a low (high) speed. When the distance becomes long enough, he wants to keep the speed limit of the road. The results obtained by the optimal velocity model agree well with real traffic data.
Nishinari and Takahashi reported an interesting relation between the Burgers equation and ECA184 [9]. They proposed a difference equation called the 'discrete Burgers equation' and showed that the Burgers equation and ECA184 are obtained from the discrete Burgers equation by a continuum limit and an ultradiscrete limit, respectively.

Fig. 2. Typical profile of optimal velocity.

Ultradiscretization is a method utilizing a non-analytic limit defined by the following formula [10]:

lim_{ε→+0} ε log(e^{A/ε} + e^{B/ε} + ···) = max(A, B, ···).   (2)
We obtain an equation of piecewise-linear type, called an 'ultradiscrete equation', by ultradiscretizing a difference equation. There is a correspondence between the basic operations of a difference equation and those of an ultradiscrete one: the usual operations +, × and / of the difference equation correspond to max, + and − of the ultradiscrete one, respectively. Thus we can carry out 'analytic evaluation' for an ultradiscrete equation as we do for a difference equation.
Moreover, the dependent variables can be discretized by using appropriate initial data and constants of the ultradiscrete equation. Therefore an ultradiscrete equation is a completely discretized equation in this sense. Utilizing this feature, we can show that ECA184, originally defined by a binary rule table, is equivalent to the ultradiscrete Burgers equation. Thus the asymptotic behavior of solutions to ECA184 can be proved by the analytic evaluation, reflecting that of the Burgers equation. As seen from this example, ultradiscretization gives a direct relation between CA and differential equations via difference equations, and offers a new perspective on CA which cannot be obtained by a closed analysis.
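The non-analytic limit (2) is easy to observe numerically. The sketch below is illustrative only (the helper `ud_limit` and the sample values are our own, not from the paper); it evaluates ε log(e^{A/ε} + e^{B/ε} + ···) for decreasing ε via a log-sum-exp rearrangement to avoid overflow, and the result approaches max(A, B, ···):

```python
import math

def ud_limit(terms, eps):
    # eps * log(sum of e^{t/eps}) computed stably: factor out the largest exponent
    m = max(t / eps for t in terms)
    return eps * (m + math.log(sum(math.exp(t / eps - m) for t in terms)))

for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, ud_limit([2.0, 5.0, 3.0], eps))  # approaches max = 5.0 as eps -> +0
```

For ε = 1 the value still carries contributions from the smaller terms; by ε = 0.001 it is indistinguishable from the max, which is exactly the + → max correspondence described above.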
In this letter, we propose a difference equation relevant to the OV model and call it the 'discrete OV (dOV) equation'. If we take a continuum limit of this equation, we obtain (1) with a specific V(∆x). If we take an ultradiscrete limit, we obtain the 'ultradiscrete OV (uOV) equation', which includes ECA184 or the FI model as a special case. Since the uOV equation is of second order in the time difference, it can express an acceleration effect. Both the dOV and uOV equations show a phase transition from free flow to jam.
2. Discrete Optimal Velocity Model
Let us assume the same situation as for the OV model (1). The only difference is that the time variable is discrete. Denote the time step by n (n = 0, 1, ...) and the interval of a time step by δ (> 0). Using these notations, the dOV equation is defined by
x^{n+1}_k − 2x^n_k + x^{n−1}_k = A [ log(1 + δ² V(x^n_{k+1} − x^n_k)) − log(1 + δ(e^{x^n_k − x^{n−1}_k} − 1)) ].   (3)

If 1 + δ²V(x^n_{k+1} − x^n_k) or 1 + δ(e^{x^n_k − x^{n−1}_k} − 1) in the logarithmic terms is 0 or negative, (3) is not well-defined. However, if δ is small enough and if V(∆x) and the initial data are appropriately chosen, we can easily exclude this problem.
Replacing x^n_k by x_k(nδ) and assuming δ ∼ 0, we obtain the following expansion:

\ddot{x}_k = A ( V(x_{k+1} − x_k) − \dot{x}_k ) + (A/2) ( \ddot{x}_k − (\dot{x}_k)² ) δ + O(δ²).   (4)
Thus (1) is derived from (3) by the continuum limit δ → 0, and (3) is a discrete analogue of (1). Considering this relation, V(∆x) is required to have roughly the profile shown in Fig. 2. Moreover, if we assume that (3) can be ultradiscretized, V(∆x) is required to have a more specific form. To realize both the continuum and the ultradiscrete limit, we fix the following form for V(∆x):
V(∆x) = a ( 1/(1 + e^{−b(∆x−c)}) − 1/(1 + e^{bc}) ),   (5)
where a, b and c are positive constants. Fig. 3 shows an example of the profile of V(∆x).

Fig. 3. Profile of V(∆x) defined by (5) with a = 2, b = 4, c = 2.

Fig. 4 shows an example of orbits of vehicles. The initial positions of the vehicles are set at nearly regular intervals with small disturbances. Coalescence of jams occurs at earlier times, and three major jams survive in this figure. Though not shown in the figure, more coalescences occur after a long time passes.
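For readers who want to reproduce orbits like those in Fig. 4, the following is a minimal simulation sketch of the dOV update (3) with the optimal velocity (5) on a circular road. The function names, the initial disturbance, and the step count are our own choices, not from the paper, and the grouping of the right-hand side follows our reading of (3):

```python
import math

def V(dx, a=2.0, b=4.0, c=2.0):
    # sigmoid optimal velocity (5)
    return a * (1.0 / (1.0 + math.exp(-b * (dx - c))) - 1.0 / (1.0 + math.exp(b * c)))

def dov_step(x_prev, x_curr, L, A, delta):
    # one update of the dOV equation (3) on a circular road of length L
    K = len(x_curr)
    nxt = []
    for k in range(K):
        # headway to the vehicle ahead; vehicle K-1 sees vehicle 0 shifted by L
        headway = x_curr[(k + 1) % K] + (L if k == K - 1 else 0.0) - x_curr[k]
        rhs = A * (math.log(1.0 + delta ** 2 * V(headway))
                   - math.log(1.0 + delta * (math.exp(x_curr[k] - x_prev[k]) - 1.0)))
        nxt.append(2.0 * x_curr[k] - x_prev[k] + rhs)
    return nxt

# nearly regular initial spacing with small disturbances (L = 50, K = 25, delta = 0.1, A = 1)
L, K, delta, A = 50.0, 25, 0.1, 1.0
x0 = [2.0 * k + 0.01 * math.sin(k) for k in range(K)]
x_prev, x_curr = x0, x0[:]          # start from rest
for n in range(1000):
    x_prev, x_curr = x_curr, dov_step(x_prev, x_curr, L, A, delta)
```

Plotting k against x^n_k mod L over n would give an orbit diagram in the style of Fig. 4; with these constants the logarithm arguments stay positive, so the well-definedness caveat above does not bite.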
The fundamental diagram is shown in Fig. 5 [1]. This diagram shows the dependence of the flow Q on the density ρ. The density ρ is the number of vehicles per unit length, and the flow Q is equivalent to the total momentum of vehicles per unit length. Both are defined by

ρ = (1/L) (number of vehicles),
Q = 1/((n_1 − n_0 + 1) L δ) · Σ_{n=n_0}^{n_1} Σ_{k=1}^{K} (x^n_k − x^{n−1}_k).   (6)

We can observe three phases, that is, (a) a free flow phase in the low density region, (b) a jam phase in the medium density region and (c) a tight jam phase in the high density region. Since these phases can also be observed for the OV model (1), we can consider that the discretization of the time variable in the dOV model (3) works well.
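The quantities in (6) are straightforward to compute from a stored trajectory. Below is a small helper of our own (`traj[n][k]` holding the position of vehicle k at step n is our convention, not the paper's):

```python
def density_and_flow(traj, L, delta, n0, n1):
    # rho and Q as in (6); traj[n][k] is the position of vehicle k at time step n
    K = len(traj[0])
    rho = K / L
    Q = sum(traj[n][k] - traj[n - 1][k]
            for n in range(n0, n1 + 1)
            for k in range(K)) / ((n1 - n0 + 1) * L * delta)
    return rho, Q
```

As a sanity check, for a uniform flow in which every vehicle advances vδ per step, Q evaluates to ρv, the familiar hydrodynamic relation.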
Fig. 4. Example of orbits of vehicles for L = 50, K = 25, δ = 0.1, A = 1, a = 2, b = 4 and c = 2.
Fig. 5. Fundamental diagram with L = 50, δ = 0.1, A = 1, a = 2, b = 4, c = 2, n_0 = 90000 and n_1 = 100000. Plotted points are obtained for 1 ≤ K ≤ 50.
3. Ultradiscrete Optimal Velocity Model
The dOV equation (3) with the optimal velocity (5) can be ultradiscretized. Let us introduce the following transformation of the variable and constants, including a new parameter ε:

x^n_k → (x^n_k + nδ)/ε,  δ → e^{−δ/ε},  a → e^{(a+2δ)/ε},  c → c/ε.   (7)
Substituting this transformation into (3) and (5), we obtain

x^{n+1}_k − 2x^n_k + x^{n−1}_k = A [ ε log( 1 + e^{a/ε}/(1 + e^{−b(x^n_{k+1} − x^n_k − c)/ε}) − e^{a/ε}/(1 + e^{bc/ε}) ) − ε log( 1 + e^{(x^n_k − x^{n−1}_k)/ε} − e^{−δ/ε} ) ].   (8)
If a, b, c, δ are positive and a < bc, we obtain the following ultradiscrete equation by taking the limit ε → +0:

x^{n+1}_k − 2x^n_k + x^{n−1}_k = A [ max(0, a − max(0, −b(x^n_{k+1} − x^n_k − c))) − max(0, x^n_k − x^{n−1}_k) ].   (9)
Moreover, this equation is equivalent to

x^{n+1}_k − 2x^n_k + x^{n−1}_k = A [ V(x^n_{k+1} − x^n_k) − max(0, x^n_k − x^{n−1}_k) ],   (10)

where

V(∆x) = max(0, b(∆x−c)+a) − max(0, b(∆x−c)).   (11)
Note that (10) does not include the parameter δ of (3): δ cannot be an arbitrary independent parameter when we take the ultradiscrete limit, and it can be excluded by introducing a background speed δ into x^n_k and replacing δ by e^{−δ/ε}, as shown in (7). We also comment that the condition a < bc is necessary to keep the lower-speed part of the profile of V(∆x) for small ε in the region ∆x > 0.
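The piecewise-linear update is only a few lines of max operations in code. Below is a minimal sketch of our own (not from the paper); the grouping of terms follows our reading of (10)–(11), and the periodic-boundary handling on a circuit of length L is our assumption:

```python
def V_pl(dx, a, b, c):
    # piecewise-linear optimal velocity (11)
    return max(0.0, b * (dx - c) + a) - max(0.0, b * (dx - c))

def uov_step(x_prev, x_curr, L, A, a, b, c):
    # one update of the uOV equation (10) on a circular road of length L
    K = len(x_curr)
    nxt = []
    for k in range(K):
        # headway to the vehicle ahead; vehicle K-1 sees vehicle 0 shifted by L
        ahead = x_curr[(k + 1) % K] + (L if k == K - 1 else 0.0)
        accel = A * (V_pl(ahead - x_curr[k], a, b, c)
                     - max(0.0, x_curr[k] - x_prev[k]))
        nxt.append(2.0 * x_curr[k] - x_prev[k] + accel)
    return nxt
```

For instance, with the constants of Fig. 7 (A = 0.5, a = 1.9, b = 4, c = 3) a uniform configuration at rest with headway 2 is stationary, since V(2) = 0 there.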
We show a typical profile of V(∆x) in Fig. 6.

Fig. 6. Profile of V(∆x) in (11).

Fig. 7 shows an example of orbits of vehicles. The positions of the vehicles are random integers at the initial time step; however, x^n_k is generally non-integer since A and a are not integers in this example. A fundamental diagram using the same constants except for K is shown in Fig. 8. Surprisingly, the three phases clearly exist as in Fig. 5. Note that the numerical experiments were executed in double-precision arithmetic by a C program.
Fig. 7. Example of orbits of vehicles for (10) with L = 50, K =25, A = 0.5, a = 1.9, b = 4, c = 3.
Fig. 8. Fundamental diagram for (10) with L = 100, A = 0.5, a = 1.9, b = 4, c = 3, n_0 = 1000, n_1 = 2000. Plotted points are obtained for 1 ≤ K ≤ 100 and 50 trials are executed for every K.
4. Special Case of Ultradiscrete Optimal Velocity Model
In this section, we discuss two special cases of the uOV model.
4.1 No Overtaking
Assume that A, a, b, c in (10) and (11) are positive. Then, if A V(∆x) ≤ ∆x, we can derive

x^{n+1}_k ≤ x^n_{k+1} + \underbrace{x^n_k − x^{n−1}_k − A max(0, x^n_k − x^{n−1}_k)}_{(a)}.   (12)

Moreover, if A ≥ 1, the term (a) is always 0 or negative. Therefore we get x^{n+1}_k ≤ x^n_{k+1} under these assumptions. Furthermore, if the velocity of every vehicle is non-negative, overtaking does not occur. When we use the OV (dOV, uOV) model as a numerical simulator of concrete traffic flow, overtaking cannot occur on a one-way single-lane circuit. Though we can avoid overtaking by choosing appropriate constants and initial data, an assurance of no overtaking is important for a real application.
4.2 Cellular Automaton
If the constants A, a, b, c and the initial positions x^0_k are integers, any x^n_k calculated by (10) is also an integer. Therefore all the dependent and independent variables in (10) are discrete in this case. Moreover, if we set A = 1, (10) reduces to

x^{n+1}_k = x^n_k + V(x^n_{k+1} − x^n_k) + min(0, x^n_k − x^{n−1}_k).   (13)

Let us assume V(∆x) ≥ 0 as in Fig. 6. Moreover, if x^n_k − x^{n−1}_k ≥ 0 for every k at a certain n, the last term min(0, x^n_k − x^{n−1}_k) in (13) becomes 0 and x^{n+1}_k − x^n_k = V(x^n_{k+1} − x^n_k) ≥ 0 for every k. Therefore no vehicle goes backward if the initial velocity of every vehicle is non-negative. Under this condition, (13) again reduces to the first-order equation

x^{n+1}_k = x^n_k + V(x^n_{k+1} − x^n_k).   (14)

Moreover, let us consider the case A = 1, a = v_max, b = 1, c = v_max + 1, where v_max is a positive integer. Then V(∆x) in (14) becomes

V(∆x) = max(0, ∆x − 1) − max(0, ∆x − v_max − 1).   (15)

Assuming the size of a vehicle is one unit cell, x^n_{k+1} − x^n_k − 1 is the distance between the k-th and (k+1)-th vehicles at time step n. Therefore every vehicle moves forward by its distance, up to v_max. This model is nothing but the FI model, and it reduces to ECA184 for v_max = 1. We note that an analogy between the OV model and some CA models is commented on in the references [11] and [12].
5. Concluding Remarks
We propose a new discrete OV model with discrete time. The continuum limit of this model is equivalent to the OV model. We show that the orbits of vehicles and the fundamental diagram agree qualitatively with those of the OV model. Moreover, this model has an ultradiscrete limit, from which a piecewise-linear evolution equation is obtained. We show by numerical calculation that the ultradiscrete OV model also exhibits a phase transition in its fundamental diagram. It includes the FI model as a special case.
We have only shown a definition, a few features and some numerical results for the dOV and uOV models in this letter. Detailed analysis using various combinations of constants is necessary to understand the dynamics of the models fully. Comparison with other models and with real data is also necessary. These points are future problems to be solved.
References
[1] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199–329.
[2] T. Musha and H. Higuchi, Traffic current fluctuation and the Burgers equation, Jpn. J. Appl. Phys., 17 (1978), 811–816.
[3] S. Tadaki, M. Kikuchi, Y. Sugiyama and S. Yukawa, Coupled map traffic flow simulator based on optimal velocity functions, J. Phys. Soc. Jpn., 67 (1998), 2270–2276.
[4] K. Nagel and M. Schreckenberg, A cellular automaton model for freeway traffic, J. Physique I, 2 (1992), 2221–2229.
[5] S. Wolfram, Theory and Applications of Cellular Automata, World Scientific, Singapore, 1986.
[6] M. Fukui and Y. Ishibashi, Traffic flow in 1D cellular automaton model including cars moving with high speed, J. Phys. Soc. Jpn., 65 (1996), 1868–1870.
[7] M. Takayasu and H. Takayasu, 1/f noise in a traffic model, Fractals, 1 (1993), 860–866.
[8] M. Bando, K. Hasebe, A. Nakayama, A. Shibata and Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Phys. Rev. E, 51 (1995), 1035–1042.
[9] K. Nishinari and D. Takahashi, Analytical properties of ultradiscrete Burgers equation and rule-184 cellular automaton, J. Phys. A, 31 (1998), 5439–5450.
[10] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.
[11] D. Helbing and M. Schreckenberg, Cellular automata simulating experimental properties of traffic flow, Phys. Rev. E, 59 (1999), R2505–R2508.
[12] K. Nishinari, A Lagrange representation of cellular automaton traffic-flow models, J. Phys. A, 34 (2001), 10727–10736.
JSIAM Letters Vol.1 (2009) pp.5–8 c©2009 Japan Society for Industrial and Applied Mathematics
Numerical Inclusion of Optimum Point for Linear Programming

Shin'ichi Oishi1,2 and Kunio Tanabe1

Department of Applied Mathematics, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan1 and CREST, JST, Japan2
E-mail: [email protected], [email protected]
Received August 31, 2008, Accepted October 6, 2008 (INVITED PAPER)
Abstract
This paper is concerned with the following linear programming problem:

Maximize c^t x, subject to Ax ≦ b and x ≧ 0,

where A ∈ F^{m×n}, b ∈ F^m and c, x ∈ F^n. Here, F is a set of floating point numbers. The aim of this paper is to propose a numerical method of including an optimum point of this linear programming problem, provided that a good approximation of an optimum point is given. The proposed method is based on Kantorovich's theorem and the continuous Newton method. Kantorovich's theorem is used for proving the existence of a solution of the complementarity equation, and the continuous Newton method is used to prove the feasibility of that solution. Numerical examples show that the computational cost of including an optimum point is about four times that of computing an approximate optimum solution.
Keywords numerical verification, continuous Newton method, Kantorovich’s theorem
Research Activity Group Quality of Computations
1. Introduction
In this paper, we are concerned with the following linear programming problem:

Maximize c^t x, subject to Ax ≦ b and x ≧ 0,   (1)

where A ∈ F^{m×n}, b ∈ F^m and c, x ∈ F^n. Here, F is a set of floating point numbers, and the superscript t indicates transposition. The aim of this paper is to propose a numerical method of including an optimum point of this linear programming problem, provided that a good approximation of an optimum point is given.
Let x_f be a feasible point of the primal problem (1), i.e., a point satisfying

A x_f ≦ b and x_f ≧ 0.   (2)

It is clear that c^t x_f is a lower bound of the optimum value. A dual problem for (1) is given by

Minimize b^t y, subject to A^t y ≧ c and y ≧ 0,   (3)

where y ∈ F^m. Let y_f be a feasible point of the dual problem (3), i.e., a point satisfying

A^t y_f ≧ c and y_f ≧ 0.   (4)

It is clear that b^t y_f is an upper bound of the optimum value. Thus, an inclusion of the optimum value v* is given by

v* ∈ [c^t x_f, b^t y_f]   (5)

provided that feasible points x_f and y_f can be found. The duality theorem asserts that, in principle, the width of the interval [c^t x_f, b^t y_f] can be made as small as desired. This rather well-known argument (cf. Ref. [1]) gives a method of numerical inclusion of the optimum value.
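The enclosure (5) is simple to exercise numerically. The sketch below is our own code; it checks (2) and (4) and forms [c^t x_f, b^t y_f] for the small LP of Example 1 below, with a hand-picked dual point y_f that is an assumption of this sketch (not the one computed in the examples):

```python
c = [3.0, 2.0]
A = [[-1.0, 3.0], [1.0, 1.0], [2.0, -1.0]]
b = [12.0, 8.0, 10.0]

def dot(u, v): return sum(ui * vi for ui, vi in zip(u, v))
def matvec(M, v): return [dot(row, v) for row in M]
def mat_t_vec(M, v): return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

xf = [6.0, 2.0]          # primal feasible point
yf = [0.0, 2.5, 0.25]    # dual feasible point, chosen by hand for illustration

assert all(r <= bi for r, bi in zip(matvec(A, xf), b)) and all(v >= 0 for v in xf)   # (2)
assert all(r >= ci for r, ci in zip(mat_t_vec(A, yf), c)) and all(v >= 0 for v in yf)  # (4)
lower, upper = dot(c, xf), dot(b, yf)    # v* lies in [lower, upper] by (5)
```

Here the enclosure is [22, 22.5]; a better dual point would shrink the interval, which is exactly the duality-gap statement above.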
This paper is concerned with the problem of numerically including an optimum point of (1). For this purpose, we consider the following complementarity problem:

f(z) = ( x(A^t y − c) ; y(b − Ax) ) = 0 ∈ R^{n+m}   (6)

subject to

x ≧ 0, y ≧ 0, b − Ax ≧ 0 and A^t y − c ≧ 0,   (7)

where z = (x^t, y^t)^t. Here, for two vectors u, v of the same dimension, uv denotes the vector of the same dimension with components u_i v_i. Solving (6) subject to (7) is equivalent to solving the primal and dual problems (1) and (3). Tanabe [2] applied Kantorovich's theorem to (6) to estimate the error in an approximate solution. However, the solution proved to be included by this approach is not guaranteed to satisfy the feasibility condition (7). This paper resolves this difficulty by introducing a continuous Newton method. Namely, this paper proposes a method in which Kantorovich's theorem is used for proving the existence of a solution of the complementarity equation, and the continuous Newton method is exploited to prove the feasibility of that solution. Since Kantorovich's theorem for the Newton method is used, non-degeneracy of the optimum solution is required for our analysis. Degenerate cases will be considered in a separate paper.
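For concreteness, the componentwise residual f(z) of (6) can be coded as follows (a plain-Python sketch of our own; the index conventions are ours):

```python
def comp_residual(x, y, A, b, c):
    # f(z) of (6): componentwise products x*(A^t y - c) and y*(b - A x)
    at_y_minus_c = [sum(A[i][j] * y[i] for i in range(len(A))) - c[j]
                    for j in range(len(c))]
    b_minus_ax = [b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
                  for i in range(len(b))]
    return ([xi * s for xi, s in zip(x, at_y_minus_c)]
            + [yi * s for yi, s in zip(y, b_minus_ax)])
```

At an exact primal–dual optimum, every component vanishes: either a variable is zero or its complementary slack is, which is precisely why solving (6)–(7) is equivalent to solving (1) and (3).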
2. Verification Method
The center path of (6) (cf. for instance Refs. [3–5]) is defined by

f(z) = ( x(A^t y − c) ; y(b − Ax) ) = γe,   (8)

where e ∈ R^{m+n} has all elements equal to 1. The constant γ is defined by

γ = ‖f(z)‖_1 / (m + n) = (b^t y − c^t x) / (m + n)   (9)

provided that z is a feasible point. Namely, if z is a feasible point, γ is the duality gap of the problem divided by m + n. This fact is pointed out in Refs. [4, 5].
The Fréchet derivative f′(z) is given by

f′(z) = ( [A^t y − c]   [x]A^t
          −[y]A         [b − Ax] ),   (10)

where, for a vector x = (x_1, x_2, ..., x_n)^t, [x] denotes diag(x_1, x_2, ..., x_n). At a given approximate optimum point z, the Newton direction d_n and the centered direction d_c are defined by

f′(z) d_n = − ( x(A^t y − c) ; y(b − Ax) )   (11)

and

f′(z) d_c = − ( x(A^t y − c) ; y(b − Ax) ) + γe,   (12)
respectively.

In the method, we propose the following verification procedure. First, an interior point of (6) is searched along a search direction, which is a linear combination of d_n and d_c, based on the guiding cone method or the penalized norm method [4, 5]. Here, we assume that we can find an interior point z which is a good approximation of an optimum point. The second step of our method is to check the conditions of the following theorem at the point z:
Theorem 1 Let z ∈ R^{m+n} be an interior point, namely a point satisfying (7) with strict inequalities:

x > 0, y > 0, b − Ax > 0 and A^t y − c > 0.   (13)

Let further the constants α and ω be defined by the inequalities α ≧ ‖f′(z)^{−1}‖_∞ ‖f(z)‖_∞ and ω ≧ 2 max(‖A‖_∞, ‖A‖_1) ‖f′(z)^{−1}‖_∞, respectively. If

αω ≦ 1/4,   (14)

there exists an optimal point z* = (x*^t, y*^t)^t ∈ B(z, ρ) = {z′ ∈ R^{m+n} : ‖z′ − z‖_∞ ≦ ρ}, which is a point satisfying (6) and (7), where

ρ = (1 − √(1 − 3αω)) / ω.   (15)

The optimum point z* is unique in B(z, ρ).
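As a non-rigorous illustration of how the quantities in Theorem 1 are assembled, the sketch below builds f′(z) as in (10) and estimates α, ω and ρ in ordinary floating point. This is our own code, not the paper's verification function: a real verification bounds ‖f′(z)^{−1}‖_∞ with controlled rounding, which this sketch does not attempt.

```python
import numpy as np

def check_theorem1(A, b, c, x, y):
    # Assemble f(z) of (6) and f'(z) of (10), then estimate alpha, omega, rho.
    # Plain floating point only -- NOT a verified bound.
    A = np.asarray(A, float); b = np.asarray(b, float)
    c = np.asarray(c, float); x = np.asarray(x, float); y = np.asarray(y, float)
    s = A.T @ y - c                      # dual slacks  A^t y - c
    r = b - A @ x                        # primal slacks b - A x
    J = np.block([[np.diag(s), np.diag(x) @ A.T],
                  [-np.diag(y) @ A, np.diag(r)]])
    f = np.concatenate([x * s, y * r])
    inv_norm = np.linalg.norm(np.linalg.inv(J), np.inf)
    alpha = inv_norm * np.linalg.norm(f, np.inf)
    omega = 2.0 * max(np.linalg.norm(A, np.inf), np.linalg.norm(A, 1)) * inv_norm
    aw = alpha * omega
    rho = (1.0 - np.sqrt(1.0 - 3.0 * aw)) / omega if aw <= 0.25 else None
    return aw, rho
```

Feeding in the interior point of Example 1 below gives a tiny αω, consistent with condition (14), and a radius ρ on the order of the rounding error of the approximate solution.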
We note that half of the assertion of Theorem 1 can be derived from the following Kantorovich theorem applied to the nonlinear equation (6):

Theorem 2 (Kantorovich's theorem for (6)) Let f be defined by (6). We assume that the Fréchet derivative f′(z) is nonsingular and satisfies the inequality

α′ ≧ ‖f′(z)^{−1} f(z)‖_∞   (16)

for a certain positive α′. Furthermore, we assume that f satisfies

‖f′(z)^{−1}(f′(z′) − f′(z″))‖_∞ ≦ ω′ ‖z′ − z″‖_∞ for all z′, z″ ∈ R^{m+n}   (17)

with a certain positive constant ω′. If

α′ω′ ≦ 1/2   (18)

and

ρ′ = (1 − √(1 − 2α′ω′)) / ω′,   (19)

there exists a point z* = (x*^t, y*^t)^t ∈ B(z, ρ′) satisfying (6). The solution z* of (6) is unique in B(z, ρ′).
Proof of Theorem 1 We assume that the conditions of Theorem 1 are satisfied.

In the first place, we show that the conditions of Theorem 2 are satisfied. We note that f is defined on R^{m+n}. If we put α′ = 1.5α, then

‖f′(z)^{−1} f(z)‖_∞ ≦ ‖f′(z)^{−1}‖_∞ ‖f(z)‖_∞ ≦ α < α′.   (20)

It is further noted that for any z′, z″ ∈ R^{m+n},

f′(z′) − f′(z″) = ( [A^t(y′ − y″)]   [x′ − x″]A^t
                    −[y′ − y″]A      [−A(x′ − x″)] ).   (21)

Let e_k = (1, 1, ..., 1)^t ∈ R^k and let I_k be the identity matrix in R^k. Then, from (21), we have the following elementwise inequality:

|f′(z′) − f′(z″)| ≦ ‖z′ − z″‖_∞ ( [|A|^t e_m]   [e_n]|A|^t
                                   [e_m]|A|      [|A| e_n] )
                 ≦ ‖z′ − z″‖_∞ ( ‖A‖_1 I_n   |A|^t
                                  |A|         ‖A‖_∞ I_m ),   (22)

which implies

‖f′(z′) − f′(z″)‖_∞ ≦ 2 max(‖A‖_∞, ‖A‖_1) ‖z′ − z″‖_∞.   (23)

Here, for x = (x_1, x_2, ..., x_k)^t, y = (y_1, y_2, ..., y_k)^t ∈ R^k,

|x| = (|x_1|, |x_2|, ..., |x_k|)^t   (24)

and

x ≦ y ⇐⇒ x_i ≦ y_i (i = 1, 2, ..., k).   (25)

Hence, we can use ω of Theorem 1 as ω′ in Theorem 2. If we put α′ = 1.5α and ω′ = ω, we have

α′ω′ = 1.5αω ≦ 3/8 < 1/2.   (26)

Further, ρ′ coincides with ρ. Thus, from Kantorovich's theorem (Theorem 2), it is seen that there exists a solution z* = (x*^t, y*^t)^t ∈ B(z, ρ) satisfying (6). Kantorovich's theorem also states that z* is the unique solution of (6) in the closed ball B(z, ρ).
Next, we show that z* is feasible, i.e., that it satisfies the inequality conditions (7). Let us consider the solution curve of the following continuous Newton method starting from a given interior point z:

dz(t)/dt = −f′(z(t))^{−1} f(z(t)) with z(0) = z.   (27)

The elementary theory of differential equations, such as the Picard–Lindelöf theorem (see, for example, [6]), states that the solution curve z(t) exists for t ∈ [0, M) for a certain positive constant M.
Suppose T < M is the smallest value such that z(T) is on the boundary of the closed ball B(z, ρ). Then

‖z − z(T)‖_∞ ≦ ∫_0^T ‖dz(t)/dt‖_∞ dt < k ‖f(z)‖_∞,   (28)

where k is defined by

k = max_{z′∈B} ‖f′(z′)^{−1}‖_∞.   (29)
This result was used in Refs. [3, 7]. In fact, z(t) satisfies

df(z(t))/dt = −f(z(t)) with z(0) = z.   (30)

Thus,

f(z(t)) = f(z) e^{−t}   (31)

holds. Hence, we have

‖dz(t)/dt‖_∞ ≦ ‖f′(z(t))^{−1}‖_∞ ‖f(z(t))‖_∞ ≦ k ‖f(z)‖_∞ e^{−t},   (32)

which gives

∫_0^T ‖dz(t)/dt‖_∞ dt ≦ k ‖f(z)‖_∞ (1 − e^{−T}) < k ‖f(z)‖_∞.   (33)

Furthermore, f(z(t)) = f(z) e^{−t} implies that z(t), starting from an interior point, remains an interior point for t ∈ [0, M).
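The exponential decay (31) is what makes the feasibility argument work, and it is easy to observe numerically. A scalar toy model, entirely our own (f(u) = u² − 2 has nothing to do with the LP), integrated by explicit Euler shows the residual tracking f(u(0))e^{−t}:

```python
import math

# Continuous Newton flow du/dt = -f(u)/f'(u) for the toy scalar f(u) = u^2 - 2:
# along the exact flow, f(u(t)) = f(u(0)) e^{-t}, cf. (30)-(31).
f = lambda u: u * u - 2.0
fp = lambda u: 2.0 * u
u, dt, T = 3.0, 1e-4, 5.0
for n in range(int(T / dt)):                # explicit Euler up to t = T
    u -= dt * f(u) / fp(u)
residual, predicted = f(u), f(3.0) * math.exp(-T)
print(residual, predicted)                  # the two values nearly agree
```

Note also that the sign of f(u(t)) never changes along the flow, the one-dimensional analogue of the solution curve staying inside the feasible region.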
We note that for z′ ∈ B(z, ρ),

‖f′(z)^{−1}(f′(z) − f′(z′))‖_∞ ≦ ω ‖z − z′‖_∞   (34)

holds. We note also that (14) implies

ωρ < 1.   (35)

Thus, from (34), it follows that

k = max_{z′∈B} ‖f′(z′)^{−1}‖_∞
  ≦ max_{z′∈B} ‖f′(z)^{−1}‖_∞ / (1 − ‖I − f′(z)^{−1} f′(z′)‖_∞)
  = max_{z′∈B} ‖f′(z)^{−1}‖_∞ / (1 − ‖f′(z)^{−1}(f′(z) − f′(z′))‖_∞)
  ≦ max_{z′∈B} ‖f′(z)^{−1}‖_∞ / (1 − ω ‖z − z′‖_∞)
  ≦ ‖f′(z)^{−1}‖_∞ / (1 − ωρ).   (36)

Therefore, we have

k ‖f(z)‖_∞ ≦ ‖f′(z)^{−1}‖_∞ ‖f(z)‖_∞ / (1 − ωρ) ≦ α / (1 − ωρ).   (37)
On the other hand, we can show the following inequality:

α / (1 − ωρ) ≦ ρ.   (38)

In fact, since 4αω ≦ 1, we have the inequality

1 − 2αω ≦ √(1 − 3αω),   (39)

which implies the inequality

α / √(1 − 3αω) ≦ (1 − √(1 − 3αω)) / ω,   (40)

which is equivalent to the inequality (38).

The inequalities (28), (37) and (38) imply

‖z − z(T)‖_∞ < ρ,   (41)

which contradicts the fact that z(T) is on the boundary of B(z, ρ). Therefore, no such T exists and the solution curve is contained in the interior of the ball B(z, ρ). There is no singularity of the right-hand side of (27) in B(z, ρ). By the elementary theory of differential equations on extending solutions (see, for instance, [8]), the solution can be prolonged to the interval [0, ∞), i.e., M = ∞, and it converges to z* as t tends to ∞. In fact, let z** be a point in the limit set of the solution curve, which is obviously contained in the closed ball B(z, ρ). Then z** is a solution of (6) by (31). By the uniqueness of the solution of (6) in the closed ball B(z, ρ), it is identical to z*. Therefore, the solution curve converges to z* as t tends to ∞.

Since the solution curve is contained in the feasible set, the limit point z* is also a feasible point.
(QED)
3. Numerical Examples
In this section, we present numerical examples. For executing the verified computation, we used MATLAB on Windows XP on a personal computer with a 1.2 GHz Intel Core 2 Duo processor.
A verification function is programmed based on the rounding-mode-controlled numerical verification method proposed in Ref. [9].
Example 1 Let us consider the following problem:

Maximize c^t x, subject to Ax ≦ b and x ≧ 0,   (42)

where c^t = (3, 2),

A = ( −1  3
       1  1
       2 −1 )   (43)

and b^t = (12, 8, 10). It is known that the optimal solution is x = (6, 2)^t. In this case, we have an interior point

x = (5.999999999999999, 2.000000000000000)^t,
y = (4.166666666666667 × 10^{−17}, 2.333333333333334, 3.333333333333334 × 10^{−1})^t.

For this point, we have

αω < 7.93 × 10^{−14}.   (44)

Thus, there exists an optimum solution of (42) in the ball centered at z = (x^t, y^t)^t with radius

ρ = 5.14 × 10^{−15}.   (45)

Further, the objective value is included in [22.00000000000000, 22.00000000000001]. These results are consistent with the exact solution x = (6, 2)^t.
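As an independent, unverified cross-check of Example 1, one can enumerate the vertices of the feasible polygon by brute force. The throwaway code below is our own and works only for this 2-variable LP; it intersects every pair of constraint lines, keeps the feasible intersection points, and picks the one maximizing c^t x:

```python
from itertools import combinations

c = [3.0, 2.0]
rows = [[-1.0, 3.0, 12.0], [1.0, 1.0, 8.0], [2.0, -1.0, 10.0],
        [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]   # Ax <= b plus -x <= 0

def feasible(p):
    return all(a * p[0] + b_ * p[1] <= rhs + 1e-9 for a, b_, rhs in rows)

best = None
for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                              # parallel lines: no vertex
    p = ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)  # Cramer's rule
    if feasible(p):
        val = c[0] * p[0] + c[1] * p[1]
        if best is None or val > best[0]:
            best = (val, p)
print(best)   # -> (22.0, (6.0, 2.0))
```

The maximizing vertex (6, 2) with objective value 22 agrees with the verified enclosure above, though of course this enumeration carries no rigorous error bound.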
Example 2 Next, let us consider the following simple linear programming problem:

Maximize c^t x, subject to Ax ≦ b and x ≧ 0,   (46)

where c^t = (300, 300, 500),

A = ( 150 100 100
        1   2   1
        0   0 150 )   (47)

and b^t = (3000, 40, 1200). In this case, we have a feasible solution

x = (5.9999999999999973, 13.000000000000004, 8.0000000000000000)^t,
y = (1.5000000000000000, 75.000000000000000, 1.8333333333333335)^t.   (48)

For this feasible point, we have

αω < 1.64 × 10^{−11}.   (49)

Thus, there exists an optimum solution of (46) in the ball centered at z = (x^t, y^t)^t with radius

ρ = 1.45 × 10^{−13}.   (50)
Example 3 In this example, we shall consider the fol-
lowing problem with n = 2m.
Maximize ctx, subject to Ax ≦ b and x ≧ 0, (51)
where
c = 10cr, A = 10E + 5Ar, b = 100br (52)
Here, cr ∈ F^n is a pseudo-random vector whose elements are distributed uniformly in [0, 1], E ∈ F^{m×n} is a constant matrix whose elements are all one, Ar ∈ F^{m×n} is a pseudo-random matrix whose elements are distributed according to the normal distribution with mean zero and variance one, and br ∈ F^m is a pseudo-random vector whose elements are distributed uniformly in [0, 1]. We examine the cases of m = 100, 200, 300, 400, 500, 600, 700, 800 and m = 900. We solve the problem on MATLAB. The results are summarized in Table 1. In this table, ta is the time for computing an approximate optimum solution and tv that for numerical verification.
Table 1. Results of verification.

m + n    αω              ρ               ta [sec]   tv [sec]
 300     3.78 × 10^−6    3.4 × 10^−12     0.30       0.29
 600     1.69 × 10^−5    5.0 × 10^−12     0.79       2.17
 900     4.22 × 10^−4    2.9 × 10^−11     1.9        7.4
1200     0.0049          7.3 × 10^−11     3.9       17.4
1500     0.083           3.8 × 10^−10     7.2       33.4
1800     2.91 × 10^−4    1.8 × 10^−11    13         59
2100     0.189           4.7 × 10^−10    21         91
2400     0.051           2.1 × 10^−10    37        137
2700     0.029           1.4 × 10^−10    54        184
3000     0.011           8.3 × 10^−11    72        270
In this numerical example, the computational cost to include an optimum point is about 4 times that of computing an approximate optimum solution.
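The random family (52) is easy to regenerate. A sketch in Python/SciPy (the paper uses MATLAB; only the approximate solve measured by ta is reproduced here, not the verification step of Ref. [9], and m is kept small for illustration):

```python
import numpy as np
from scipy.optimize import linprog

def random_lp(m, seed=0):
    # The family (52): c = 10 cr, A = 10 E + 5 Ar, b = 100 br, with n = 2m.
    rng = np.random.default_rng(seed)
    n = 2 * m
    c = 10 * rng.uniform(0.0, 1.0, n)                            # cr uniform on [0, 1]
    A = 10 * np.ones((m, n)) + 5 * rng.standard_normal((m, n))   # E all ones, Ar ~ N(0, 1)
    b = 100 * rng.uniform(0.0, 1.0, m)                           # br uniform on [0, 1]
    return c, A, b

c, A, b = random_lp(m=50)
# linprog minimizes, so negate c; x >= 0 is the default bound.
res = linprog(-c, A_ub=A, b_ub=b)
assert res.status == 0
print("approximate optimal value:", -res.fun)
```

Since b ≥ 0, the origin is always feasible, and the strongly positive mean of A makes the instances bounded in practice, matching the behavior reported in Table 1.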
Acknowledgments
This research is supported by a Grant-in-Aid for Specially Promoted Research from MEXT, Japan: "Establishment of Verified Numerical Computation" (No. 17002012). The authors express their sincere thanks to the referees for their valuable comments on this article.
References
[1] C. Jansson, Rigorous lower and upper bounds in linear programming, SIAM J. Optim., 14 (3) (2004), 914–935.
[2] K. Tanabe, A posteriori error estimate for an approximate solution of a general linear programming, in: New Methods for Linear Programming 2, The Institute of Statistical Mathematics Cooperative Research Report 10, pp. 118–120, 1988.
[3] K. Tanabe, A geometric method in nonlinear programming, J. Optim. Theory Appl., 30 (1980), 181–210.
[4] K. Tanabe, Complementarity-enforcing centered Newton method for mathematical programming: global method, in: New Methods for Linear Programming, The Institute of Statistical Mathematics Cooperative Research Report 5, pp. 118–144, 1987.
[5] K. Tanabe, Centered Newton method for mathematical programming, in: System Modeling and Optimization, M. Iri and K. Yajima eds., pp. 197–208, Springer-Verlag, Berlin, 1988.
[6] E. Zeidler, Nonlinear Functional Analysis and Its Applications: Part I, Fixed Point Theorems, Springer-Verlag, New York, 1986, p. 78.
[7] K. Tanabe, Continuous Newton-Raphson method for solving an underdetermined system of nonlinear equations, Nonlinear Anal. T.M.A., 3 (1979), 495–503.
[8] M. W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, London, 1974, p. 171.
[9] S. Oishi and S. M. Rump, Fast verification of solutions of matrix equations, Numer. Math., 90 (4) (2002), 755–773.
JSIAM Letters Vol.1 (2009) pp.9–12 ©2009 Japan Society for Industrial and Applied Mathematics
2D tight framelets with orientation selectivity suggested
by vision science
Hitoshi Arai1 and Shinobu Arai

1 Graduate School of Mathematical Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
E-mail [email protected]
Received August 19, 2008, Accepted October 22, 2008 (INVITED PAPER)
Abstract
In this paper we construct compactly supported tight framelets with orientation selectivity and Gaussian-derivative-like filters. These features are similar to those of simple cells in V1 revealed by recent vision science. To see the orientation selectivity, we also give a simple example of image processing of a test image.
Keywords wavelet frame, framelet, visual cortex, simple cell, orientation selectivity
Research Activity Group Wavelet Analysis
1. Introduction
Simple cells in V1 of the brain cortex play important roles in visual information processing, and some mathematical models of such cells have been studied using the Gabor function or the DOG function. However, R. Young established that Gaussian derivative models are suitable for studying simple cells (see [1]). In this paper we construct compactly supported framelets which have graphs similar to Gaussian derivatives and good orientation selectivity. See [2] for the definition of framelets. In [3], B. Escalante-Ramírez and J. Silván-Cárdenas constructed a multi-channel model with orientation selectivity by means of Gaussian derivatives. However, the Gaussian function is not compactly supported. In the previous paper [4], we presented new wavelet frames with orientation selectivity and Gaussian-derivative-like shape. Those frames are defined only on the product of two finite abelian groups, Z/N1Z × Z/N2Z, where Z is the additive abelian group consisting of all integers, and N1 and N2 are positive integers. However, the framelet filters presented in this paper define not only tight frames of l²(Z/N1Z × Z/N2Z), but also ones of l²(Z × Z) and of L²(R²).
To describe our construction we fix our terminology. Let Z² = Z × Z. For a matrix A we denote by A^T the transpose of A. For a 2 × 2 matrix M, let N(M) be the set {(x1, x2)M : x1, x2 ∈ [0, 1)} ∩ Z². In this paper we will be concerned with the following matrices:

Mr = ( 2  0 )      Mq = (  1  1 )      Mh = ( 1   1 )
     ( 0  2 ),          ( −1  1 ),          ( 2  −2 ).

These matrices are related to decimation of 2D signals: Mr defines a rectangular decimation, Mq a quincunx decimation, and Mh a hexagonal decimation. Suppose M = Mr, Mq, or Mh. For a stable filter h = (h[m1, m2])_{m1,m2∈Z}, let H(ω1, ω2) be its frequency response function

H(ω1, ω2) = Σ_{m1,m2=−∞}^{∞} h[m1, m2] e^{−2πi m1 ω1} e^{−2πi m2 ω2}.
The purpose of this paper is to give a simple construction of a finite number of FIR filters hs = (hs[n])_{n∈Z²}, s ∈ S, having the above-mentioned properties and satisfying the following conditions: for all ω ∈ R² and for all r ∈ N(M) with r ≠ 0,

Σ_{s∈S} |Hs(ω)|² = |det M|, (1)

Σ_{s∈S} Hs(ω) \overline{Hs(ω + rM^{−1})} = 0, (2)

where the bar denotes complex conjugation. For u ∈ Z², let hs,u[v] = hs[v − uM^T], v ∈ Z², and hs,u = (hs,u[v])_{v∈Z²}. Obviously hs,u ∈ l²(Z²). Since the filters hs, s ∈ S, satisfy the conditions (1) and (2), {hs,u}_{s∈S, u∈Z²} is a tight frame of l²(Z²). Moreover, as we will show later, when M = Mr, a constant multiple of our filters satisfies the unitary extension principle, and therefore we also gain a tight frame of L²(R²).
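Conditions (1) and (2) are finitely checkable for any concrete filter bank. As a sanity check of what they assert, the separable 2D Haar filters, used here only as a stand-in since the SP framelet is constructed in the next section, satisfy them for M = Mr (where |det Mr| = 4):

```python
import numpy as np

# Separable 2D Haar filters: tensor products of (1,1)/sqrt(2) and (1,-1)/sqrt(2).
a = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]

def H(s1, s2, w1, w2):
    # Frequency response: sum_m h[m1, m2] e^{-2 pi i m1 w1} e^{-2 pi i m2 w2}.
    e1 = np.exp(-2j * np.pi * np.arange(2) * w1)
    e2 = np.exp(-2j * np.pi * np.arange(2) * w2)
    return (a[s1] @ e1) * (a[s2] @ e2)

rng = np.random.default_rng(1)
w1, w2 = rng.uniform(0.0, 1.0, 2)
S = [(s1, s2) for s1 in (0, 1) for s2 in (0, 1)]

# Condition (1): the squared responses sum to |det Mr| = 4 at every frequency.
total = sum(abs(H(s1, s2, w1, w2)) ** 2 for s1, s2 in S)
assert np.isclose(total, 4.0)

# Condition (2): each r in N(Mr)\{0} gives a half-integer shift r Mr^{-1},
# and the shifted cross terms cancel.
for p, q in [(0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]:
    cross = sum(H(s1, s2, w1, w2) * np.conj(H(s1, s2, w1 + p, w2 + q))
                for s1, s2 in S)
    assert np.isclose(cross, 0.0)
print("2D Haar filter bank satisfies (1) and (2) for Mr")
```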
We note that several wavelet frames with orientation selectivity had been constructed: for example, curvelets [5], contourlets [6], complex wavelets [7], the wavelet frames in [4], and so on. However, our framelet is completely different from them, and satisfies several properties similar to those of simple cells.
2. Construction
Suppose n is an integer with n ≥ 2. Let rn = 1 if n is odd, and let rn = 0 if n is even. Then there is a unique positive integer r such that n = 2r + rn. Let

Λf = {(0, 0), (0, n), (n, 0), (n, n)},
Λg = {(k, l)}_{k=0,n; l=1,2,...,n−1} ∪ {(k, l)}_{l=0,n; k=1,2,...,n−1},
Λa = {(k, l)}_{k=1,2,...,n−1; l=1,2,...,n−1}.
We abbreviate c(x) = cos(πx) and s(x) = sin(πx). We denote by \binom{n}{k} the binomial coefficient. Let αk = \binom{n}{k}, βk = \binom{n−2}{k−1}, and cM = |det M|^{1/2}. For (k, l) ∈ Λf, let

F_{k,l}(x, y) = cM i^{k+l} e^{−rn πi(x+y)} c(x)^{n−k} s(x)^k c(y)^{n−l} s(y)^l.
For (k, l) ∈ Λg, let

G_{k,l}(x, y) = 2^{−1/2} cM i^{k+l} √(αk αl) e^{−rn πi(x+y)} c(x)^{n−k} s(x)^k c(y)^{n−l} s(y)^l.

For (k, l) ∈ Λa, let

A^κ_{k,l}(x, y) = 2^{−1} cM i^{k+l+1} e^{−rn πi(x+y)} c(x)^{n−k−1} s(x)^{k−1} c(y)^{n−l−1} s(y)^{l−1} [ (−1)^κ √(αk βl) c(x)s(x) + √(αl βk) c(y)s(y) ],
where κ = 1, 2. Let

δ^{n,k}_ν = Σ_{μ=max(ν+r−k, 0)}^{min(ν+r, n−k)} (−1)^{ν+μ} \binom{n−k}{μ} \binom{k}{ν+r−μ},

γ^{n,l}_ν = Σ_{μ=max(ν+r−l, 0)}^{min(ν+r−1, n−l−1)} (−1)^{ν+μ} \binom{n−l−1}{μ} \binom{l−1}{ν+r−1−μ}.
By calculation, we have the following lemma.

Lemma 1 F_{k,l}, G_{k,l}, A¹_{k,l} and A²_{k,l} are frequency response functions of real-valued FIR filters, as follows:

F_{k,l}(x, y) = (cM / 2^{2n}) Σ_{ν=−r}^{r+rn} Σ_{λ=−r}^{r+rn} δ^{n,k}_ν δ^{n,l}_λ e^{−2πi(νx+λy)},

G_{k,l}(x, y) = (cM √(αk αl) / 2^{2n+1/2}) Σ_{ν=−r}^{r+rn} Σ_{λ=−r}^{r+rn} δ^{n,k}_ν δ^{n,l}_λ e^{−2πi(νx+λy)},

A^κ_{k,l}(x, y) = (cM / 2^{2n−1}) [ √(αl βk) Σ_{ν=−r+1}^{r−1+rn} Σ_{λ=−r}^{r+rn} γ^{n,k}_ν δ^{n,l}_λ e^{−2πi(νx+λy)} + (−1)^κ √(αk βl) Σ_{ν=−r}^{r+rn} Σ_{λ=−r+1}^{r−1+rn} δ^{n,k}_ν γ^{n,l}_λ e^{−2πi(νx+λy)} ].

Let f_{k,l}, g_{k,l}, a¹_{k,l} and a²_{k,l} be the filters whose frequency response functions are F_{k,l}, G_{k,l}, A¹_{k,l} and A²_{k,l}, respectively. Let

H = {f_{k,l}}_{(k,l)∈Λf} ∪ {g_{k,l}}_{(k,l)∈Λg} ∪ {a^κ_{k,l}}_{(k,l)∈Λa, κ∈{1,2}}.

We call this family of filters H the simple pinwheel framelet of degree n (abbr. SP framelet of degree n). This name comes from the famous "pinwheel structure" of simple cells. Our main theorem is the following:
Theorem 1 (i) If n is odd, H satisfies the conditions (1) and (2) for Mr, Mq and Mh.
(ii) If n is even and n ≥ 4, H satisfies the conditions (1) and (2) for Mr and Mq, but does not satisfy (2) for Mh. If n = 2, H satisfies the condition (1), but does not satisfy (2) for Mr, Mq and Mh.
Sketch of the proof For real numbers p and q, let

Φ(x, y; p, q) = Σ_{(k,l)∈Λf} F_{k,l}(x, y) \overline{F_{k,l}(x + p, y + q)} + Σ_{(k,l)∈Λg} G_{k,l}(x, y) \overline{G_{k,l}(x + p, y + q)} + Σ_{κ=1}^{2} Σ_{(k,l)∈Λa} A^κ_{k,l}(x, y) \overline{A^κ_{k,l}(x + p, y + q)}.

Suppose n is odd. By calculation we have that Φ(x, y; p, q) = 0 if p = 1/2 + m (m ∈ Z) or q = 1/2 + m′ (m′ ∈ Z), and that Φ(x, y; 0, 0) = |det M|. In particular,

Φ(x, y; 1/2, 0) = Φ(x, y; 0, 1/2) = Φ(x, y; 1/2, 1/2) = Φ(x, y; 1/2, 1/4) = Φ(x, y; 1/2, 3/4) = 0.
This implies (i). If n is even and n ≥ 4, then we have Φ(x, y; 0, 0) = |det M|, and

Φ(x, y; 1/2, 0) = Φ(x, y; 0, 1/2) = Φ(x, y; 1/2, 1/2) = 0.

However, Φ(x, y; 1/2, 1/4) and Φ(x, y; 1/2, 3/4) are not identically zero. Suppose n = 2. Then it is easy to show that Φ(x, y; 0, 0) = |det M|, and that Φ(x, y; 1/2, 1/2) is not identically zero.
(QED)
Suppose M = Mr and n is a positive integer with n ≥ 3. Let B1(x) be the characteristic function of the interval [−1/2, 1/2) on R, and Bm+1(x) = (Bm ∗ B1)(x), m = 1, 2, · · · , where ∗ is the convolution on R. We consider the SP framelet of degree n. Let

f_{0,0}(x, y) = Bn(x − 1/2) Bn(y − 1/2).

Then the Fourier transform of f_{0,0} is as follows:

\hat{f}_{0,0}(ξ1, ξ2) = e^{−πi(ξ1+ξ2)} ( (s(ξ1)/(πξ1)) (s(ξ2)/(πξ2)) )^n.

Hence we have

\hat{f}_{0,0}(2ξ1, 2ξ2) = cM^{−1} F_{0,0}(ξ1, ξ2) \hat{f}_{0,0}(ξ1, ξ2).

Define

\hat{f}_{k,l}(ξ1, ξ2) = cM^{−1} F_{k,l}(ξ1/2, ξ2/2) \hat{f}_{0,0}(ξ1/2, ξ2/2),
\hat{g}_{k,l}(ξ1, ξ2) = cM^{−1} G_{k,l}(ξ1/2, ξ2/2) \hat{f}_{0,0}(ξ1/2, ξ2/2),
\hat{a}^κ_{k,l}(ξ1, ξ2) = cM^{−1} A^κ_{k,l}(ξ1/2, ξ2/2) \hat{f}_{0,0}(ξ1/2, ξ2/2).

By the unitary extension principle [8] we have that {f_{k,l}}_{(k,l)∈Λf\{(0,0)}} ∪ {g_{k,l}}_{(k,l)∈Λg} ∪ {a^κ_{k,l}}_{(k,l)∈Λa, κ∈{1,2}} is a tight frame of L²(R²).
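Factor by factor, the two-scale relation above rests on the elementary identity sin(2πξ)/(2πξ) = cos(πξ) · sin(πξ)/(πξ): each dyadic dilation of a B-spline factor pulls out one cosine factor, which is what lets F_{0,0} act as the refinement mask. A quick numerical check (Python, purely illustrative):

```python
import numpy as np

xi = np.linspace(0.01, 3.0, 200)          # avoid the removable singularity at 0
sinc = lambda t: np.sin(np.pi * t) / (np.pi * t)

# sinc(2 xi) = cos(pi xi) * sinc(xi): the scalar version of the refinement relation.
assert np.allclose(sinc(2 * xi), np.cos(np.pi * xi) * sinc(xi))
print("two-scale identity verified on a sample grid")
```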
Fig. 1. Filters of MOGMRA (level 2, n = 5).
3. Discussion related to vision science
and image processing
To apply our framelets to computational experiments for studying vision, we discuss here a maximal overlap version of the generalized multiresolution analysis (MOGMRA) defined on Z/N1Z × Z/N2Z, where N1 and N2 are positive even integers. We refer to our paper [9] for why MOGMRA is suitable for studying mathematical models of visual information processing. See [10] and [11] for the maximal overlap multiresolution analysis for wavelets. We begin by describing MOGMRA based on our SP framelet of degree n. Suppose M = Mr. For a positive integer N, let ZN = {0, 1, · · · , N − 1}. For y = (y[m])_{m∈Z_{N1/2}×Z_{N2/2}}, we denote by y^M = (y^M[m])_{m∈Z_{N1}×Z_{N2}} the upsampling of y by the sampling matrix M, that is, y^M[mM^T] = y[m] for m ∈ Z_{N1/2} × Z_{N2/2}, and otherwise y^M[m] = 0. For x = (x[m])_{m∈Z_{N1}×Z_{N2}}, let x̃[m] = x[m] + x[m1 + N1/2, m2 + N2/2], m = (m1, m2) ∈ Z_{N1/2} × Z_{N2/2}, and let S(x) = (x̃)^M. Let S⁰(x) = x, and S^μ(x) = S(S^{μ−1}(x)) for μ = 1, 2, · · · . For a stable filter h ∈ l¹(Z²), let p(h) be its periodization, that is, p(h) = (p(h)[m])_{m∈Z_{N1}×Z_{N2}}, where for m = (m1, m2) ∈ Z_{N1} × Z_{N2},

p(h)[m] = Σ_{k1,k2=−∞}^{∞} h[m1 + N1k1, m2 + N2k2].
Let T^j(h) = S^{j−1}(p(h)), j = 1, 2, · · · . For x = (x[m])_{m∈Z_{N1}×Z_{N2}}, (k1, k2) ∈ Z², and μ = (μ1, μ2) ∈ Z_{N1} × Z_{N2}, let x^{per}[μ1 + k1N1, μ2 + k2N2] = x[μ]. Then x^{per} is identified with a signal defined on Z/N1Z × Z/N2Z. Let x^∨[m] = x^{per}[−m1, −m2]. For x = (x[m])_{m∈Z_{N1}×Z_{N2}} and y = (y[m])_{m∈Z_{N1}×Z_{N2}}, we denote by x ∗ y the cyclic convolution, that is, (x ∗ y)[m] = Σ_{k∈Z_{N1}×Z_{N2}} x^{per}[k] y^{per}[m − k], m ∈ Z_{N1} × Z_{N2}. For x = (x[m])_{m∈Z_{N1}×Z_{N2}}, the first stage of the decomposition of MOGMRA is defined by F^1_{k,l}(x) = T^1(f_{k,l})^∨ ∗ x, G^1_{k,l}(x) = T^1(g_{k,l})^∨ ∗ x, and A^{κ,1}_{k,l}(x) = T^1(a^κ_{k,l})^∨ ∗ x. The second stage is defined by F^2_{k,l}(x) = T^2(f_{k,l})^∨ ∗ F^1_{0,0}(x), G^2_{k,l}(x) = T^2(g_{k,l})^∨ ∗ F^1_{0,0}(x), and A^{κ,2}_{k,l}(x) = T^2(a^κ_{k,l})^∨ ∗ F^1_{0,0}(x). In general, the j-th stage is as follows: F^j_{k,l}(x) = T^j(f_{k,l})^∨ ∗ F^{j−1}_{0,0}(x), G^j_{k,l}(x) = T^j(g_{k,l})^∨ ∗ F^{j−1}_{0,0}(x), and A^{κ,j}_{k,l}(x) = T^j(a^κ_{k,l})^∨ ∗ F^{j−1}_{0,0}(x). These constitute the decomposition phase. Let F^j = T^j(f_{0,0}). For a positive integer J, the synthesis phase is defined as follows:

F̃^J_{k,l}(x) = 4^{−J} F^1 ∗ · · · ∗ F^{J−1} ∗ T^J(f_{k,l}) ∗ F^J_{k,l}(x),
F̃^{J−1}_{k,l}(x) = 4^{−J+1} F^1 ∗ · · · ∗ F^{J−2} ∗ T^{J−1}(f_{k,l}) ∗ F^{J−1}_{k,l}(x),
...
F̃^1_{k,l}(x) = 4^{−1} T^1(f_{k,l}) ∗ F^1_{k,l}(x),
Fig. 2. A test image.
Fig. 3. MOGMRA decomposition of the test image (level 2).
G̃^J_{k,l}(x) = 4^{−J} F^1 ∗ · · · ∗ F^{J−1} ∗ T^J(g_{k,l}) ∗ G^J_{k,l}(x),
...
G̃^1_{k,l}(x) = 4^{−1} T^1(g_{k,l}) ∗ G^1_{k,l}(x),

and the Ã^{κ,j}_{k,l} are defined in a similar way. Then we obtain

x = F̃^J_{0,0}(x) + Σ_{j=1}^{J} [ Σ_{(k,l)∈Λf\{(0,0)}} F̃^j_{k,l}(x) + Σ_{(k,l)∈Λg} G̃^j_{k,l}(x) + Σ_{(k,l)∈Λa} Σ_{κ=1,2} Ã^{κ,j}_{k,l}(x) ].
We call this decomposition the MOGMRA decomposition of x at level J. In the same way as in [4], we can define MOGMRA when N1 and N2 are not even.
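The decomposition/synthesis pattern above (convolve with the time-reversed periodized filter, convolve back with the filter itself, and rescale) can be illustrated at level 1, with the separable 2D Haar filters standing in for the SP framelet filters (an assumption made for brevity; the cyclic convolutions are computed via the DFT):

```python
import numpy as np

N1 = N2 = 8
rng = np.random.default_rng(0)
x = rng.standard_normal((N1, N2))

# Four separable Haar filters, zero-padded to the signal size (the periodization p(h)).
a = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]

def pad(h2):
    out = np.zeros((N1, N2))
    out[:2, :2] = h2
    return out

filters = [pad(np.outer(a[s1], a[s2])) for s1 in (0, 1) for s2 in (0, 1)]

# Analysis: y_s = h_s^v * x, whose DFT multiplier is conj(H_s);
# synthesis: x = 4^{-1} sum_s h_s * y_s, since sum_s |H_s|^2 = |det Mr| = 4.
X = np.fft.fft2(x)
Ys = [np.conj(np.fft.fft2(h)) * X for h in filters]
recon = np.real(np.fft.ifft2(sum(np.fft.fft2(h) * Y
                                 for h, Y in zip(filters, Ys)))) / 4.0
assert np.allclose(recon, x)
print("level-1 undecimated tight-frame reconstruction holds")
```

This is exactly the tight-frame identity that the J-level formula above iterates, with the 4^{-j} factors absorbing the redundancy of the maximal-overlap (undecimated) transform.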
Let δ be the 2D unit impulse supported at (N1/2 + 1, N2/2 + 1), and let δ′ = p(δ). Suppose n = 5. Fig. 1 depicts the plots of the outputs of δ′ by F^2_{k,l}, G^2_{k,l}, and A^{κ,2}_{k,l}, arranged by the following rule:

F^2_{5,0}, G^2_{4,0}, G^2_{3,0}, G^2_{2,0}, G^2_{1,0}, F^2_{0,0},
G^2_{5,1}, A^{1,2}_{4,1}, A^{1,2}_{3,1}, A^{1,2}_{2,1}, A^{1,2}_{1,1}, G^2_{0,1}, A^{2,2}_{4,1}, A^{2,2}_{3,1}, A^{2,2}_{2,1}, A^{2,2}_{1,1},
G^2_{5,2}, A^{1,2}_{4,2}, A^{1,2}_{3,2}, A^{1,2}_{2,2}, A^{1,2}_{1,2}, G^2_{0,2}, A^{2,2}_{4,2}, A^{2,2}_{3,2}, A^{2,2}_{2,2}, A^{2,2}_{1,2},
G^2_{5,3}, A^{1,2}_{4,3}, A^{1,2}_{3,3}, A^{1,2}_{2,3}, A^{1,2}_{1,3}, G^2_{0,3}, A^{2,2}_{4,3}, A^{2,2}_{3,3}, A^{2,2}_{2,3}, A^{2,2}_{1,3},
G^2_{5,4}, A^{1,2}_{4,4}, A^{1,2}_{3,4}, A^{1,2}_{2,4}, A^{1,2}_{1,4}, G^2_{0,4}, A^{2,2}_{4,4}, A^{2,2}_{3,4}, A^{2,2}_{2,4}, A^{2,2}_{1,4},
F^2_{5,5}, G^2_{4,5}, G^2_{3,5}, G^2_{2,5}, G^2_{1,5}, F^2_{0,5}.
Next we consider a test image (Fig. 2). Fig. 3 shows the MOGMRA decomposition of the test image at level 2. From this result of image processing we can conclude that our framelet has good orientation selectivity.
References
[1] R. Young, Oh say, can you see? The physiology of vision,
SPIE, 1453 (1991), 92–123.[2] I. Daubechies, B. Han, A. Ron and Z. Shen, Framelets: MRA-
based construction of wavelet frames, Appl.Comput.Harmon.Anal., 14 (2003), 1–46.
[3] B. Escalante-Ramırez and J. Silvan-Cardenas, Advancedmodeling of visual information processing: A multi-resolutiondirectional-oriented image transform based on Gaussian
derivatives, Signal Processing:Image Comm. 20 (2005), 801–812.
[4] H.Arai and S.Arai, Finite discrete, shift-invariant, directionalfilterbanks for visual information processing, I: construction,
Interdisciplinary Information Sciences, 13 (2007), 255–273.[5] E.J. Candes and D. Donoho, New tight frames of curvelets
and optimal representations of objects with piecewise C2 sin-
gularities, Comm. Pure and Appl. Math., 57 (2004), 219–266.[6] M. N. Do and M. Vetterli, The contourlet transform: An effi-
cient directional multiresolution image representation, IEEETrans. Image Processing, 14 (2005), 2091–2106.
[7] N. G. Kingsbury, Image processing with complex wavelets,Phil. Trans. Roy. Soc. London, A357 (1999), 2543–2560.
[8] A. Ron and Z. Shen, Affine systems in L2(Rd): the analysis ofthe analysis operator, J. Funct. Anal., 148 (1997), 408–447.
[9] H. Arai, A nonlinear model of visual information processingbased on discrete maximal overlap wavelets, InterdisciplinaryInformation Sciences, 11 (2005), 177–190.
[10] G. P. Nason and B. W. Silverman, The stationary wavelettransform and some statistical applications, Lect.Notes Stat.,Vol.103, pp.288–299, Springer-Verlag, 1995.
[11] D. B. Percival and A. T. Walden, Wavelet Methods for Time
Series Analysis, Cambridge Univ. Press, 2000.
JSIAM Letters Vol.1 (2009) pp.13–16 ©2009 Japan Society for Industrial and Applied Mathematics
Analysis of Neuronal Dendrite Patterns Using Eigenvalues
of Graph Laplacians
Naoki Saito1 and Ernest Woei1

1 Department of Mathematics, University of California, Davis, CA 95616, USA
E-mail [email protected], [email protected]
Received September 29, 2008, Accepted October 16, 2008 (INVITED PAPER)
Abstract
We report our current effort on extracting morphological features from neuronal dendrite patterns using the eigenvalues of their graph Laplacians, and on clustering neurons into different functional cell types using those features. Our preliminary results indicate the potential usefulness of such eigenvalue-based features, which we hope will replace the morphological features extracted by methods that require extensive human interaction.
Keywords pattern analysis, graph Laplacian, eigenvalues of Laplacian matrices
Research Activity Group Wavelet Analysis
1. Introduction
In recent years, the advent of new sensors and techniques has allowed one to image complicated interconnected structures in biology, such as dendrites connected to a single neuron, neuronal axon/fiber tracts in a human brain, and networks of blood vessels in the human body. Neuroscientists hope to gain insight into modeling and understanding brain functions by analyzing images of such network structures. The actual analysis of them, however, remains elusive. For example, vision scientists want to understand how the morphological properties of dendrite patterns of retinal ganglion cells (RGCs), such as those shown in Figure 1, relate to the functional types of these cells. Although such classification of neurons should ultimately be done on the basis of molecular or genetic markers of neuronal types, it has not been forthcoming. Hence, neuronal morphology has often been used as a neuronal signature that allows one to classify a neuron such as an RGC into different functional cell types [1]. The state-of-the-art procedure is still quite labor intensive and costly: automatic segmentation algorithms to trace dendrites in a given 3D image obtained by a confocal microscope generate only imperfect results due to occlusions and noise; moreover, one has to painstakingly extract many morphological and geometrical parameters (e.g., somal size, dendritic field size, total dendrite length, the number of branches, branch angle, etc.) with the help of an interactive software system. In fact, 14 morphological and geometric parameters were extracted from each cell in [1]. It takes roughly half a day to process a single cell from segmentation to parameter extraction!
In this paper, we examine how to analyze and characterize such neuronal dendrite structures automatically using computational harmonic analysis techniques, so that we can reduce the human interaction cost in this dendrite pattern analysis.
2. Analysis of Dendrite Structures via
Graph Laplacian Eigenvalues
The segmentation and tracing software system used by our collaborator, Prof. Leo Chalupa and his group (Dept. of Neurobiology, Physiology & Behavior, UC Davis), provides us with a sequence of 3D coordinates that represent points sampled along dendrite arbors (or paths) of RGCs, with the branching information [1]. One of the most natural and simplest ways to model such a
Fig. 1. Dendrites of various types of retinal ganglion cells of a mouse; reprinted from [1] with permission from Elsevier.
network-like structure is to construct a graph. Hence, our first task is to convert such a sequence of 3D points to a connected graph G consisting of the vertex set V and edge set E. To fix our notation, let G be a graph representing the dendrite patterns of an RGC, V = V(G) = {v1, v2, . . . , vn} where each vk ∈ R³ is a 3D sample point along the dendrite arbors of this RGC, and E = E(G) = {e1, e2, . . . , em} where ek connects two vertices vi, vj for some 1 ≤ i, j ≤ n; we write ek = (vi, vj). Let d_{vk} be the degree (or valency) of the vertex vk. In fact, the dendrite pattern of each RGC in our dataset can be converted to a tree rather than a general graph, since it is connected and contains no cycles. We also note that we only deal with unweighted graphs in this paper. In other words, we essentially examine the connectivities and complexity of the dendrite graphs, which may not reflect the physical lengths of the dendrite arbors. We defer our investigation of models that reflect such physical realities to a future project, which includes weighted graphs where each edge e ∈ E has weight w_e := ‖vi − vj‖^{−1}, i.e., the inverse of the physical distance between the two vertices of e.
Once we construct a graph per RGC, we proceed as follows:

Step 1: Construct the Laplacian matrix (often called the combinatorial Laplacian matrix) L(G) := D(G) − A(G), where D(G) := diag(d_{v1}, . . . , d_{vn}) is the diagonal matrix of vertex degrees and A(G) = (a_{i,j}) is the adjacency matrix of G, i.e., a_{i,j} = 1 if vi and vj are adjacent; otherwise it is 0.

Step 2: Compute the eigenvalues of L(G);

Step 3: Construct features using these eigenvalues;

Step 4: Repeat the above steps for all the RGCs and feed these feature vectors to clustering algorithms.
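Steps 1 and 2 amount to a few lines of linear algebra in any numerical environment. A sketch on a toy 4-vertex path graph (the dendrite trees are handled identically, only larger; NumPy is our choice here, not necessarily the authors'):

```python
import numpy as np

# A 4-vertex path v1 - v2 - v3 - v4: a minimal "dendrite" tree.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))            # Step 1: degree matrix ...
L = D - A                             # ... and combinatorial Laplacian L = D - A
lam = np.linalg.eigvalsh(L)           # Step 2: eigenvalues, ascending

# Known closed form for the path P_n: 2 - 2 cos(pi k / n), k = 0, ..., n-1.
expected = np.sort(2 - 2 * np.cos(np.pi * np.arange(n) / n))
assert np.allclose(lam, expected)
assert np.isclose(lam[0], 0.0)        # m_G(0) = 1: the graph is connected
print(lam)
```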
Our rationale behind using the Laplacian eigenvalues is the following: they reflect various intrinsic geometric information about the graph, e.g., connectivity (or the number of separated components), diameter (the maximum distance over all pairs of vertices), mean distance, etc.; see, e.g., [2, 3] for the details on the graph Laplacian eigenvalues. In fact, we view the dendrites connected to a neuron as a musical instrument, try to "listen" to its sounds, and check if those can be used to characterize the dendrite patterns. We know that it is not possible to uniquely identify a graph from its Laplacian eigenvalues in general. In particular, "almost all trees are cospectral"; see, e.g., [3]. In practice, however, it is often possible to obtain a good approximation of a graph from them. Hence, we believe that features based on the Laplacian eigenvalues of a graph will be useful for various recognition and clustering purposes.
Before stating the facts or theorems in [3, 4] (see also [2]) that are used to construct our features, let us fix our notation and define several key quantities. Let | · | denote the size of a set. Let |V| = n, and let 0 = λ0 ≤ λ1 ≤ · · · ≤ λ_{n−1} be the sorted eigenvalues of L(G). Let m_G(λ) denote the multiplicity of λ as an eigenvalue of L(G), and let m_G(I) be the number of eigenvalues of L(G), multiplicities included, that belong to I, an interval of the real line. A vertex of degree 1 is called a pendant vertex, and a vertex adjacent to a pendant vertex is called a pendant neighbor. Let p(G) and q(G) be the number of pendant vertices and the number of pendant neighbors of G, respectively. For a nonempty subset of vertices S ⊂ V(G), let ∂S be the boundary of S defined as ∂S := {e = (u, v) ∈ E(G) | u ∈ S, v ∉ S}. Let i(G) be the isoperimetric number of G:

i(G) := inf { |∂S| / |S| : ∅ ≠ S ⊂ V, |S| ≤ n/2 }. (1)

The isoperimetric number is closely related to the conductance of a graph, i.e., how fast a random walk on G converges to a stationary distribution. The Wiener index W(G) of a graph G is the sum of the entries in the upper triangular part of the distance matrix ∆(G) of G, where (∆(G))_{i,j} is the number of edges in a shortest path from vertex vi to vertex vj. The Wiener index of a molecular graph has been used in chemical applications because it may exhibit a good correlation with physical and chemical properties of the corresponding molecule.
We now list several theorems we use in this paper.

• m_G(0) is equal to the number of connected components of G.

• The number of pendant neighbors of G is bounded as:

p(G) − m_G(1) ≤ q(G) ≤ m_G((2, n]), (2)

where the second inequality holds if G is connected and satisfies 2q(G) < n.

• For n ≥ 4, the isoperimetric number i(G) satisfies

i(G) < sqrt( (2 max_{v∈V(G)} d_v − λ1(G)) λ1(G) ). (3)

• Let G be a tree. Then

W(G) = Σ_{k=1}^{n−1} n / λk. (4)
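Both (3) and (4) are easy to sanity-check on a small tree, for instance the star K_{1,3} (an illustrative stand-in, not a graph from the dataset):

```python
import numpy as np
from itertools import combinations

# Star K_{1,3}: center 0 joined to leaves 1, 2, 3; a tree, so (4) applies.
n = 4
edges = [(0, 1), (0, 2), (0, 3)]
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
L = np.diag(deg) - A
lam = np.linalg.eigvalsh(L)                    # eigenvalues 0, 1, 1, 4

# Wiener index two ways: direct distance count vs the spectral formula (4).
W_direct = 3 * 1 + 3 * 2                       # 3 center-leaf pairs, 3 leaf-leaf pairs
W_spectral = sum(n / l for l in lam[1:])       # skip lambda_0 = 0
assert np.isclose(W_direct, W_spectral)

# Bound (3) vs the exact isoperimetric number, brute-forced over |S| <= n/2.
bound = np.sqrt((2 * deg.max() - lam[1]) * lam[1])
cut = lambda S: sum((i in S) != (j in S) for i, j in edges)
iG = min(cut(set(S)) / len(S) for k in (1, 2) for S in combinations(range(n), k))
assert iG < bound
print(W_direct, iG, bound)
```

The brute-force enumeration is exponential in n, which is exactly why the spectral upper bound (3) matters for the dendrite graphs later in the paper.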
3. Numerical Experiments and Preliminary Results

In this section, we report our preliminary results obtained very recently. We only use the dendrite patterns categorized into the so-called "monostratified" RGCs, meaning the dendrites of those RGCs are confined to either the On or the Off sublaminae of the inner plexiform layer (a layer immediately below the RGCs toward the rods and cones) [1]; this should be contrasted with "bistratified" RGCs, whose dendrites span both the On and Off sublaminae.
The following features were used to characterize the dendrite patterns of 130 monostratified RGCs.
Feature 1: (p(G) − m_G(1))/|V(G)|, a lower bound of the number of pendant neighbors q(G) as shown in (2), normalized by |V(G)|;

Feature 2: The normalized Wiener index W(G)/|V(G)| via (4);

Feature 3: m_G((4, ∞))/|V(G)|, i.e., the number of eigenvalues of L(G) larger than 4 (normalized);
[Fig. 2 appears here: two panels, (a) RGC #60 and (b) RGC #100, plotted over X (µm) and Y (µm).]
Fig. 2. Close-up of a part of two RGCs belonging to Cluster 1 (a) and Cluster 6 (b). One can see some "spines" in (a).
Feature 4: The upper bound of the isoperimetric number i(G) shown in (3).

We normalized Features 1, 2, 3 by the number of vertices in the graph because we wanted to make the features less dependent on the number of samples, or on how the dendrite arbors are sampled. Of course, the number of vertices itself could be a feature, although it may not be a decisive one. On the other hand, Feature 4 was not explicitly normalized because the isoperimetric number (1) itself is a normalized quantity in terms of the number of vertices.
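Feature 1 and the chain of inequalities (2) behind it can be computed directly from the spectrum; on the star K_{1,3} (a hypothetical stand-in for a dendrite tree) every inequality in (2) is tight:

```python
import numpy as np

# Star K_{1,3}: three pendant vertices attached to a single center.
n = 4
A = np.zeros((n, n))
for j in (1, 2, 3):
    A[0, j] = A[j, 0] = 1.0
deg = A.sum(axis=1)
lam = np.linalg.eigvalsh(np.diag(deg) - A)        # eigenvalues 0, 1, 1, 4

p = int((deg == 1).sum())                         # pendant vertices: 3
q = 1                                             # pendant neighbors: the center only
m1 = int(np.isclose(lam, 1.0).sum())              # m_G(1) = 2
m2n = int(((lam > 2) & (lam <= n + 1e-9)).sum())  # m_G((2, n]) = 1

# (2): p(G) - m_G(1) <= q(G) <= m_G((2, n]); here 2 q(G) = 2 < n = 4 as required.
assert p - m1 <= q <= m2n
print("Feature 1 =", (p - m1) / n)                # 0.25 for this toy graph
```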
Feature 1 was used because the number of pendant neighbors seems to be strongly related to the so-called "spines," short protrusions from the dendrite arbors. Figure 2(a) shows several spines as edges of length 1, each of which is attached to a terminal vertex of degree 1. Hence, we expect that the larger this lower bound p(G) − m_G(1) is, the more likely the RGC is to have spines. In contrast to Figure 2(a), Figure 2(b) shows an example of an RGC whose Feature 1 value is small. Apparently, there is no spine in this figure, and each of the pendant neighbors has exactly one pendant (or terminal) vertex. The reason why we used Feature 3, the normalized version of m_G((4, ∞)), is based on the following observations. The Laplacian eigenvalue distribution of each RGC dendrite graph typically looks like that in Figure 3. It consists of a smooth bell-shaped curve that ranges over the interval [0, 4] and a sudden burst above the value 4. We have observed that this value 4 is critical, since the eigenfunctions corresponding to the eigenvalues below 4 are semi-global oscillations (like Fourier cosines/sines) over the entire dendrites or one of the dendrite arbors, whereas those corresponding to the eigenvalues above 4 are much more localized (like wavelets) in branching regions. Figures 4 and 5 demonstrate our observation.
Finally, Figures 6 and 7 show scatter plots of these four features of the 130 RGCs (we show only two of the six possible scatter plots here). The numbers in the plots are the cluster numbers obtained by Coombs et al. [1] using a hierarchical clustering algorithm on the 14 morphological features. From these figures, we can observe that the Cluster 6 RGCs separate themselves quite well from the other RGC clusters. In fact, the sparse and distributed dendrite patterns such as those in Clusters 6 and 10 are located below the major axis of the point clouds in Figure 6 and above the major axis of the point clouds in Figure 7. These imply that the dendrite patterns belonging to Clusters 6 and 10 have a smaller number of spines and smaller Wiener indices compared to the other, denser dendrite patterns such as those in Clusters 1 to 5. Also, we observe that the feature variability of the RGCs in Clusters 7 and 8 is higher than in the other clusters.

[Fig. 3 appears here: a plot of λk versus the index k, with the eigenvalues ranging over roughly [0, 5].]

Fig. 3. A typical distribution of the Laplacian eigenvalues. RGC #100 in Cluster 6 was used for this figure.

[Fig. 4 appears here: an eigenfunction plotted over the dendrite pattern of RGC #100, axes X (µm) and Y (µm).]

Fig. 4. The Laplacian eigenfunction of RGC #100 corresponding to the eigenvalue λ1141 = 3.9994, immediately below the value 4. Note that the support of this eigenfunction is semi-global, i.e., covers one whole dendrite arbor.
4. Discussion
Our results reported here are still preliminary. There are many things to be done. Among them, the most urgent is to answer the following natural questions: 1) Among the features derivable by directly analyzing a graph (e.g., those 14 features used in [1]), which ones can be derived from the Laplacian eigenvalues and which ones cannot? 2) Among the features derivable by both methods, which ones can be derived more easily using the Laplacian eigenvalues than by direct graph analysis? For example, computing the isoperimetric number
[Fig. 5 appears here: an eigenfunction plotted over the dendrite pattern of RGC #100, axes X (µm) and Y (µm).]
Fig. 5. The Laplacian eigenfunction of RGC #100 corresponding to the eigenvalue λ1142 = 4.3829, immediately above the value 4. Note that the support of this eigenfunction is localized around the branching point.
[Fig. 6 appears here: a scatter plot with horizontal axis p(G) − m_G(1) (normalized by the number of vertices, log scale) and vertical axis the Wiener index (normalized by the number of vertices, log scale); points are labeled by cluster numbers 1–10.]
Fig. 6. A scatter plot of the normalized lower bounds of the number of pendant neighbors vs the normalized Wiener indices.
[Fig. 7 appears here: a scatter plot with horizontal axis m_G((4, ∞)) (normalized by the number of vertices, log scale) and vertical axis the upper bound of the isoperimetric number (log scale); points are labeled by cluster numbers 1–10.]
Fig. 7. A scatter plot of the normalized number of eigenvalues larger than 4 vs the upper bounds of the isoperimetric numbers.
i(G) of a given graph G is NP-hard in terms of the number of vertices [4], and yet we can estimate its upper bound easily using the Laplacian eigenvalue, as shown in (3).
Next, we also need to deepen our theoretical understanding of the sudden behavior change (like a phase transition) of the Laplacian eigenfunctions corresponding to the eigenvalues below and above 4, as demonstrated in Figures 3, 4, and 5. Note that this phenomenon occurs in each cell.
Another interesting thing we need to investigate is to "resample" the dendrite patterns so that each tree has the same number of vertices. If we can do so, then there is no need to normalize the above features by |V(G)|, and we can really examine whether those features reflect topological information of the dendrite patterns rather than the number of vertices. This resampling, however, must be done very carefully (e.g., not skipping vertices with degree other than 2) so that we do not change the topology of the patterns.
Yet another investigation should be to consider the Dirichlet-Laplacian eigenvalue problems by explicitly imposing the Dirichlet boundary condition on the terminal nodes of the trees, and then compare the eigenvalues with those of the combinatorial Laplacians; see [2, 4] for more about the Dirichlet-Laplacian eigenvalues.
Finally, analysis using weighted graphs, as briefly mentioned at the beginning of Section 2, should be carefully done. On one hand, weighted graphs reflect more of the physical reality of the RGCs, so we can expect more accurate results. On the other hand, the analysis of such graphs is expected to be tougher than with the combinatorial Laplacian used in this paper because, for example, m_G(1) no longer has the same meaning among the different RGCs.
This is quite an interdisciplinary research project that taps into extremely rich mathematical ideas. We hope to report more results in the near future.
Acknowledgments
The authors would like to thank Prof. Leo Chalupa and Dr. Julie Coombs of UC Davis for providing us with the dendrite datasets and answering many questions. This research was partially supported by the US National Science Foundation grant DMS-0410406 and the US Office of Naval Research grant N00014-07-1-0166.
References
[1] J. Coombs, D. van der List, G.-Y. Wang and L. M. Chalupa, Morphological properties of mouse retinal ganglion cells, Neuroscience, 140 (2006), 123–136.
[2] F. R. K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, No. 92, Amer. Math. Soc., Providence, RI, 1997.
[3] R. Merris, Laplacian matrices of graphs: A survey, Lin. Alg. Appl., 197-198 (1994), 143–176.
[4] H. Urakawa, Spectral geometry and graph theory, Bull. Japan SIAM, 12 (2002), 29–45 (in Japanese).
– 16 –
JSIAM Letters Vol.1 (2009) pp.17–20 c©2009 Japan Society for Industrial and Applied Mathematics
The Gateau derivative of cost functions in the optimal
shape problems and the existence of the shape derivatives
of solutions of the Stokes problems
Satoshi Kaizu
Received November 14, 2008, Accepted December 16, 2008 (INVITED PAPER)
Abstract
In optimal shape problems the derivatives of costs with respect to shapes are important, because they give a direction of lower cost from an initial shape. The differentiability of costs strongly depends on the shape derivatives of the solutions of mechanical problems, here the stationary linearized flow problems, i.e., the Stokes problems. The shape derivatives are usually obtained automatically from the associated material derivatives. We show the convergence of shape difference quotients under sufficient conditions. These conditions are applied to prove the existence of the shape derivatives of the velocity and the pressure in the Stokes problems.
Keywords material derivative, shape derivative, Stokes problem
Research Activity Group Mathematical Design
1. Introduction
Let $D$ be a bounded domain of $\mathbf{R}^d$, $d = 2, 3$, having Lipschitz boundary $\Gamma$, let $S$ be a proper subdomain of $D$ such that $\Sigma = \partial S$ and the closure $\overline{S}$ is also a proper subset of $D$, and let $\Omega = D \setminus \overline{S}$ be the concerned domain. Some liquid with velocity $u = (u_i)_{1\le i\le d}$ and pressure $p$ fills $\Omega$. Below we write $U_{i,j} = \partial U_i/\partial x_j$. Let $U : D \ni x \mapsto U(x) \in \mathbf{R}^d$ be given in advance with
$$U_{i,i} = 0 \ \text{in } \Omega, \qquad U \not\equiv 0 \ \text{on } \Sigma, \qquad U \equiv 0 \ \text{on } \Gamma. \quad (1)$$
Let $X = H^1(\Omega)^d$, $X_0 = H^1_0(\Omega)^d$, $M = \{ q \in L^2(\Omega) \mid \int_\Omega q\,dx = 0 \}$. We assume that this flow is determined by the problem $(\mathrm{P})_\Omega$: find $(u, p)$ in $X \times M$ such that
$$-\Delta u_i + p_{,i} = 0 \ \text{in } \Omega, \qquad u_{i,i} = 0 \ \text{in } \Omega, \qquad u - U \in X_0. \quad (2)$$
Using the solution $(u, p)$ of (2), together with another function $g : D \times \mathbf{R}^d \times \mathbf{R}^d \ni (x, \xi, \eta) \mapsto g(x, \xi, \eta) \in \mathbf{R}$, we define a cost function $J(\Omega) = J(\Omega, u, p)$ by
$$J(\Omega) = \int_\Omega g(x, u, \nabla p)\,dx, \quad (3)$$
where sufficient regularity of $g$ is assumed. Examples of such $g$ are $g = -u_i p_{,i}(x)$ and $g = f_i u_i(x)$, $x \in \Omega$, called the pressure loss and the compliance, respectively; in the latter, $f = (f_i)_i$ is an outer force. The aim is to give sufficient conditions on the regularity of domains and domain perturbations which guarantee the existence of the shape derivatives of the velocity and the pressure.
2. Domain perturbation
We differentiate the function $J(\Omega)$ in the sense of Gateau at the initial domain $\Omega = \Omega_0$. We look for a family $\{\Omega_\epsilon\}_\epsilon$, $\Omega_\epsilon = \Omega_{\epsilon\rho}$, having cost $j(\epsilon) = J(\Omega_\epsilon)$ lower than the initial cost $j(0)$. Here let $\rho\ (= (\rho_i)_{1\le i\le d}) \in C^{k,1}(\overline{D})^d$. The space $C^{k,1}(\overline{D})^d$ is the totality of $\rho$ having Lipschitz continuous $k$-th partial derivatives $\partial^\alpha \rho_i$, where $\alpha$ denotes a multi-index with $|\alpha| = k$, $k = 0, 1, 2$, and with support $\mathrm{supp}(\rho)$, the closure of $\{x \in D \mid \rho(x) \ne 0\}$. We notice $C^{k,1}(\overline{D})^d \subset X_{k+1}\ (= H^{k+1}(D)^d)$ with dual $X'_{k+1}$. Let
$$\Omega_\epsilon = \Omega_{\epsilon\rho} = \{x^\epsilon \mid x^\epsilon = x + \epsilon\rho(x),\ \forall x \in \Omega_0\}.$$
The Gateau variation $j'(0)$ is defined by
$$\langle j'(0), \rho \rangle = \lim_{\epsilon\to+0} \frac{\delta j(\epsilon)}{\epsilon},$$
if the limit on the right hand side exists for some $\rho \in X_{k+1}$ with a certain $k$. If the Gateau variation $j'$ can be regarded as $j' \in X'_{k+1}$, then $j'$ is called the Gateau derivative. The condition $j'(0) \in X'_{k+1}$ is important for the traction method, which determines an element $\rho_0 \in X_{k+1}$ giving the direction of lowest cost uniquely as an element of $X_{k+1}$. The traction method was initiated by H. Azegami [1] and has been applied to various optimal shape problems (see [2] and [3]).
From here on, the velocity and the pressure in (2) with $\Omega = \Omega_\epsilon$ are denoted by $u^\epsilon$ and $p^\epsilon$, respectively. In the next section let $\Sigma_0 = \partial S_0\ (= \partial\Omega_0 \setminus \partial D)$. We denote by $\nu$ the unit normal vector on $\partial\Omega_0$.
3. The Gateau derivative of costs
Let $g_u = (\partial g/\partial u_i)_{1\le i\le d}$ and $g_{\nabla p} = (\partial g/\partial p_{,i})_{1\le i\le d}$. Under sufficient regularity of $g$, $\rho \in C^{k,1}(\overline{D})^d$ and $\Omega_0$, we see
$$\langle j'(0), \rho \rangle = \int_{\Sigma_0} g(x, u^0, \nabla p^0)\,\rho\cdot\nu\,d\Sigma_0 + \int_{\Omega_0} \bigl( g_u(x, u^0, \nabla p^0)\cdot u' + g_{\nabla p}(x, u^0, \nabla p^0)\cdot\nabla p' \bigr)\,dx, \quad (4)$$
where $u'$ and $p'$ denote the shape derivatives of the velocity $u^\epsilon$ and the pressure $p^\epsilon$, defined by
$$F'(x) = \lim_{\epsilon\to+0} \frac{\delta F^\epsilon(x)}{\epsilon}. \quad (5)$$
Here $F^\epsilon(x) = u^\epsilon_i(x)$ or $p^\epsilon(x)$, and $\delta F^\epsilon(x) = F^\epsilon(x) - F^0(x)$.
The formula (4) is derived by the process below:
$$\delta j(\epsilon) = \int_{\Omega_\epsilon\setminus\Omega_0} g(x, u^\epsilon, \nabla p^\epsilon)\,dx - \int_{\Omega_0\setminus\Omega_\epsilon} g(x, u^0, \nabla p^0)\,dx + \int_{\omega_\epsilon} \bigl( g(x, u^\epsilon, \nabla p^\epsilon) - g(x, u^0, \nabla p^0) \bigr)\,dx, \quad (6)$$
where $\omega_\epsilon$ denotes $\Omega_0 \cap \Omega_\epsilon$. The first term on the right hand side of (4) is obtained directly by applying the well-known limit formula to the sum of the first and second terms on the right hand side of (6). The remaining term of (6) yields the last term on the right hand side of (4) through the following equality:
$$\int_{\omega_\epsilon} \bigl( g(x, u^\epsilon, \nabla p^\epsilon) - g(x, u^0, \nabla p^0) \bigr)\,dx = \epsilon \int_{\omega_\epsilon} \int_0^1 g_u\Bigl(x,\ u^0 + t\epsilon\,\frac{\delta u^\epsilon}{\epsilon},\ \nabla p^\epsilon\Bigr)\,dt \cdot \frac{\delta u^\epsilon}{\epsilon}\,dx + \epsilon \int_{\omega_\epsilon} \int_0^1 g_{\nabla p}\Bigl(x,\ u^0,\ \nabla p^0 + t\epsilon\,\nabla\frac{\delta p^\epsilon}{\epsilon}\Bigr)\,dt \cdot \nabla\frac{\delta p^\epsilon}{\epsilon}\,dx.$$
The above formula implies the identity (4), essentially under some regularity of $g$ and the convergence of both $\delta u^\epsilon/\epsilon$ and $\nabla(\delta p^\epsilon)/\epsilon$.
4. Sufficient conditions for the existence
of the shape derivatives
The existence of the shape derivatives is strongly connected to the existence of the material derivatives in general. The existence of the material derivatives and of the shape derivatives is derived from the convergence of the material difference quotients and the shape difference quotients, respectively. Let $F^\epsilon : D \ni x \mapsto F^\epsilon(x) \in \mathbf{R}$, for example $F^\epsilon = u^\epsilon_i(x)$ or $p^\epsilon(x)$. The term $\delta F^\epsilon(x)/\epsilon$ in (7) is called the shape difference quotient of $F^\epsilon$ at $x$. Let
$$\overline{F}^\epsilon(x) = F^\epsilon \circ T^{\epsilon\rho}(x) = F^\epsilon(x + \epsilon\rho(x)).$$
We define the material difference quotient as $\delta\overline{F}^\epsilon(x)/\epsilon$, where $\delta\overline{F}^\epsilon(x) = \overline{F}^\epsilon(x) - F^0(x)$. The definitions of the material difference quotient and the shape difference quotient directly imply
$$\frac{\delta\overline{F}^\epsilon}{\epsilon}(x) = \frac{\delta F^\epsilon}{\epsilon}(x) + \rho_j F^0_{,j}(x) + \rho_j \overline{G}^\epsilon_j(x), \qquad G^\epsilon_j(x, t) = F^\epsilon_{,j}(x + t\epsilon\rho(x)) - F^0_{,j}(x), \qquad \overline{G}^\epsilon_j(x) = \int_0^1 G^\epsilon_j(x, t)\,dt. \quad (7)$$
The lemma below is shown using basically the Lebesgue convergence theory, with tedious computations.
Lemma 1 Let $k = 0, 1, 2$. We assume
$$\rho \in C^{k,1}(\overline{D})^d, \qquad F^\epsilon(x) \xrightarrow{\text{strongly}} F^0(x) \ \text{in } H^{k+1}(\Omega_0). \quad (8)$$
Let $\omega$ be any domain such that the closure $\overline{\omega} \subset \Omega_0$. Then
$$\rho_j \overline{G}^\epsilon_j \xrightarrow{\text{strongly}} 0 \ \text{in } H^k(\omega) \ \text{as } \epsilon \to 0.$$
Corollary 2 In Lemma 1 we further assume
$$\exists \dot{F} \in H^k(\Omega_0), \qquad \frac{\delta\overline{F}^\epsilon}{\epsilon} \xrightarrow{\text{weakly}} \dot{F} \ \text{in } H^k(\Omega_0). \quad (9)$$
Then
$$\exists F' \in H^k(\Omega_0), \qquad \frac{\delta F^\epsilon}{\epsilon} \xrightarrow{\text{weakly}} F' \ \text{in } H^k(\omega). \quad (10)$$
Thus the condition for the convergence of the shape difference quotients requires the strong convergence of $\delta F^\epsilon$ to $0$ in $H^{k+1}(\Omega_0)$ and the weak convergence of the material difference quotients in $H^k(\Omega_0)$.
5. The material derivatives of the velocity and the pressure
We assume
$$U \in H^{k+1}(D)^d \ \text{for some } k = 0, 1, 2, \quad (11)$$
$$\rho \in C^{k,1}(\overline{D})^d, \quad (12)$$
$$\Omega_0 \ \text{is of class } C^{k,1}. \quad (13)$$
Let $X^\epsilon_{k+1} = H^{k+1}(\Omega_\epsilon)^d$, $X^\epsilon_0 = H^1_0(\Omega_\epsilon)^d$, $M^\epsilon = \{ q^\epsilon \in L^2(\Omega_\epsilon) \mid \int_{\Omega_\epsilon} q^\epsilon\,dx = 0 \}$ and $M^\epsilon_k = M^\epsilon \cap H^k(\Omega_\epsilon)$. We also write $(v^\epsilon_1, v^\epsilon_2)_\epsilon = \int_{\Omega_\epsilon} v^\epsilon_1 v^\epsilon_2\,dx$ and $(v_1, v_2) = (v_1, v_2)_0$. Let $(u^\epsilon, p^\epsilon) \in X^\epsilon_1 \times M^\epsilon$ be the pair of solutions of the Stokes problem $(\mathrm{P})_{\Omega_\epsilon}$:
$$u^\epsilon - U \in X^\epsilon_0, \qquad (\nabla(u^\epsilon - U)_i, \nabla v^\epsilon_i)_\epsilon - (v^\epsilon_{i,i}, p^\epsilon)_\epsilon = 0, \ \forall v^\epsilon \in X^\epsilon_0, \qquad ((u^\epsilon - U)_{i,i}, q^\epsilon)_\epsilon = 0, \ \forall q^\epsilon \in M^\epsilon. \quad (14)$$
Under (11), (12) and (13) (see Temam [4]), the problem (14) admits a unique $(u^\epsilon, p^\epsilon) \in X^\epsilon_{k+1} \times H^k(\Omega_\epsilon)$ such that
$$\|u^\epsilon\|_{X^\epsilon_{k+1}} + \|p^\epsilon\|_{M^\epsilon_k} \le C_{\epsilon,k}\,\|U\|_{H^{k+1}(D)^d}. \quad (15)$$
A general method on the regularity of solutions of elliptic systems, using an open covering $\{W_i\}_i$ of $\overline{\Omega_0}$ introduced in the paragraph below (28), is applied to (14) with $\epsilon = 0$. Since both $T^\epsilon : X^\epsilon_{k+1} \ni v^\epsilon(x^\epsilon) \mapsto v^\epsilon(x) \in X^0_{k+1}$ and its inverse $(T^\epsilon)^{-1}$ are continuous between $X^0_{k+1}$ and $X^\epsilon_{k+1}$, we see the existence of $\epsilon_1$, depending on the constants $C_{0,k}$, $C_3$ and $C_4$ in (28) (see [5]), such that
$$C_k = \sup_{0\le\epsilon\le\epsilon_1} C_{\epsilon,k} < \infty. \quad (16)$$
Let $\overline{u}^\epsilon(x) = u^\epsilon(x^\epsilon)$, $\overline{p}^\epsilon(x) = p^\epsilon(x^\epsilon)$, and let $b^\epsilon_{jk}$, $b^\epsilon_0$, $R^\epsilon_{jk}$ be certain functions such that
$$b^\epsilon_{jk}(x) \xrightarrow{\text{strongly}} -(\rho_{j,k} + \rho_{k,j})(x), \qquad b^\epsilon_0(x) \xrightarrow{\text{strongly}} \rho_{i,i}(x), \qquad R^\epsilon_{jk}(x) \xrightarrow{\text{strongly}} -\rho_{j,k}(x), \quad \text{all in } L^\infty(\Omega_0). \quad (17)$$
Let $\overline{M}^\epsilon = \{ q \in L^2(\Omega_0) \mid \int_{\Omega_0} q(x)(1 + \epsilon b^\epsilon_0(x))\,dx = 0 \}$ and also $U^\epsilon(x) = U(x^\epsilon)$. Then the pair $(\overline{u}^\epsilon, \overline{p}^\epsilon)$ satisfies
$$\begin{aligned} &(\overline{u}^\epsilon - U^\epsilon, \overline{p}^\epsilon) \in X^0_0 \times \overline{M}^\epsilon, \\ &(\nabla(\overline{u}^\epsilon - U^\epsilon)_i, \nabla v_i) + \epsilon(\nabla(\overline{u}^\epsilon - U^\epsilon)_i \cdot \nabla v_i, b^\epsilon_0) + \epsilon((\overline{u}^\epsilon - U^\epsilon)_{i,j} v_{i,k}, b^\epsilon_{jk}) - (v_{i,i}, \overline{p}^\epsilon(1 + \epsilon b^\epsilon_0)) - \epsilon(v_{i,j} R^\epsilon_{ji}, \overline{p}^\epsilon(1 + \epsilon b^\epsilon_0)) = 0, \ \forall v \in X_0, \\ &((\overline{u}^\epsilon - U^\epsilon)_{i,i}, q) + \epsilon((\overline{u}^\epsilon - U^\epsilon)_{i,i}, q b^\epsilon_0) + \epsilon((\overline{u}^\epsilon - U^\epsilon)_{i,j} R^\epsilon_{ji}, q(1 + \epsilon b^\epsilon_0)) = 0, \ \forall q \in \overline{M}^\epsilon. \end{aligned} \quad (18)$$
Then the estimates (15) and (16) imply that there exists a constant $C_1$, independent of $\epsilon \in (0, \epsilon_1]$, such that
$$\|\overline{u}^\epsilon\|_{X_{k+1}} + \|\overline{p}^\epsilon\|_{M_k} \le C_1. \quad (19)$$
Let $X_0 = X^0_0$, $M = M^0$. Then $(u^0, p^0)$ satisfies
$$(u^0 - U, p^0) \in X_0 \times M, \qquad (\nabla(u^0 - U)_i, \nabla v_i) - (v_{i,i}, p^0) = 0, \ \forall v \in X_0, \qquad ((u^0 - U)_{i,i}, q) = 0, \ \forall q \in M. \quad (20)$$
Let $\delta\overline{u}^\epsilon = \overline{u}^\epsilon - u^0$, $\delta U^\epsilon = U^\epsilon - U$ and $\delta\overline{p}^\epsilon = \overline{p}^\epsilon - p^0$. We subtract (20) from (18) and obtain
$$\delta(\overline{u}^\epsilon - U^\epsilon) \in X_0, \quad (21)$$
$$\int_{\Omega_0} \delta\overline{p}^\epsilon\,dx + \epsilon\int_{\Omega_0} \overline{p}^\epsilon b^\epsilon_0\,dx = 0, \quad (22)$$
$$(\nabla\delta(\overline{u}^\epsilon - U^\epsilon)_i, \nabla v_i) + \epsilon(\nabla(\overline{u}^\epsilon - U^\epsilon)_i \cdot \nabla v_i, b^\epsilon_0) + \epsilon((\overline{u}^\epsilon - U^\epsilon)_{i,j} v_{i,k}, b^\epsilon_{jk}) - (v_{i,i}, \delta\overline{p}^\epsilon) - \epsilon(v_{i,i}, \overline{p}^\epsilon b^\epsilon_0) - \epsilon(v_{i,j} R^\epsilon_{ji}, \overline{p}^\epsilon(1 + \epsilon b^\epsilon_0)) = 0, \ \forall v \in X_0, \quad (23)$$
$$(\delta(\overline{u}^\epsilon - U^\epsilon)_{i,i}, q) + \epsilon((\overline{u}^\epsilon - U^\epsilon)_{i,i}, b^\epsilon_0 q) + \epsilon((\overline{u}^\epsilon - U^\epsilon)_{i,j} R^\epsilon_{ji}, q(1 + \epsilon b^\epsilon_0)) = 0, \ \forall q \in \overline{M}^\epsilon. \quad (24)$$
The estimate (26) below is implied by the following lemma.
Lemma 3 There exists a constant $C_0$ such that, for all $q \in M$,
$$\|q\|_M \le C_0 \sup_{v\ne 0,\,v\in X_0} \frac{(v_{i,i}, q)}{\|\nabla v\|_{L^2(\Omega_0)}}. \quad (25)$$
The relation (22) shows $\delta\overline{p}^\epsilon + \epsilon b^\epsilon_0\overline{p}^\epsilon \in M$. The estimate of $\|\delta\overline{p}^\epsilon + \epsilon b^\epsilon_0\overline{p}^\epsilon\|_M$ follows from (23) through (25), just as in the proof of Theorem 4.1 of [6]. So the estimate of $\|\delta\overline{p}^\epsilon\|_M$ reduces to the estimate of $\|\nabla\delta(\overline{u}^\epsilon - U^\epsilon)\|_{L^2(\Omega_0)}$. Putting $v = \delta(\overline{u}^\epsilon - U^\epsilon)$ into (23), with the help of $u^0_{i,i} = 0$ and (24) (for an exact description see [5]), the estimate (19) implies
$$\|\delta(\overline{u}^\epsilon - U^\epsilon)\|_{X_1} + \|\delta\overline{p}^\epsilon\|_{L^2(\Omega_0)} = O(\epsilon). \quad (26)$$
This is an estimate for $k = 0$. For the other cases $k = 1, 2$, under the assumptions (11), (12) and (13), we obtain further estimates of $\|\delta(\overline{u}^\epsilon - U^\epsilon)\|_{X_{k+1}}$ and $\|\delta\overline{p}^\epsilon\|_{H^k(\Omega_0)}$. A general estimate of the solutions of the Stokes problem (21) to (24) (see, for example, Proposition 2.3, p. 25 of [4], or [5]) is described as
$$\|\delta(\overline{u}^\epsilon - U^\epsilon)\|_{X_{k+1}} + \|\delta\overline{p}^\epsilon\|_{H^k(\Omega_0)} \le C_2\,\epsilon, \quad (27)$$
where $C_2$ depends on $C_1$ in (19) and on $C_3$ and $C_4$ below.
$$\sup_{1\le i\le d,\ |\alpha|\le k+1,\ x\in D} |\partial^\alpha \rho_i(x)| = C_3 < \infty, \qquad \sup_{1\le i\le N}\ \sup_{|\alpha|\le k+1,\ \xi\in Z_i} |\partial^\alpha \phi_i(\xi)| = C_4 < \infty. \quad (28)$$
Here, let $\{W_i\}_{0\le i\le N}$ and $\{Z_i\}_{1\le i\le N}$ be two families, of local coordinate open neighbourhoods and of open sets in $\mathbf{R}^{d-1}$, such that $\overline{\Omega_0} \subset \bigcup_{i=0}^N W_i$, $\overline{\Omega_0} \setminus \bigcup_{i=1}^N W_i \subset W_0$, and $W_i \cap \partial\Omega_0 = \{(\xi, \phi_i(\xi)) \in \mathbf{R}^d \mid Z_i \ni \xi \mapsto \phi_i(\xi) \in \mathbf{R}\}$ with some functions $\{\phi_i(\xi)\}_{1\le i\le N}$. An exact description of the regularity of $(u^\epsilon, p^\epsilon)$ using the above $W_i$, $Z_i$ is given in [5]. By (27), for any sequence $\{\epsilon_m\}$ decreasing to $0$ we have a subsequence $\{\epsilon_{m_n}\}$ such that, with $\overline{u}^n = \overline{u}^{\epsilon_{m_n}}$ and $\overline{p}^n = \overline{p}^{\epsilon_{m_n}}$,
$$\Bigl( \frac{\delta\overline{u}^n}{\epsilon_{m_n}}, \frac{\delta\overline{p}^n}{\epsilon_{m_n}} \Bigr) \xrightarrow{\text{weakly}} (w, r) \ \text{in } X_{k+1} \times H^k(\Omega_0).$$
Putting $(\delta(\overline{u}^n - U^{\epsilon_{m_n}}), \delta\overline{p}^n)$ and $v = \zeta \in C^\infty_0(\Omega_0)^d$ into the relations (21) to (24) and dividing them by $\epsilon_{m_n}$, we get
$$\begin{aligned} &\frac{\delta(\overline{u}^n - U^{\epsilon_{m_n}})}{\epsilon_{m_n}} \in X_0, \qquad \int_{\Omega_0} \frac{\delta\overline{p}^n}{\epsilon_{m_n}}\,dx + \int_{\Omega_0} \overline{p}^n b^{\epsilon_{m_n}}_0\,dx = 0, \\ &\Bigl(\nabla\frac{\delta(\overline{u}^n - U^{\epsilon_{m_n}})_i}{\epsilon_{m_n}}, \nabla v_i\Bigr) + (\nabla(\overline{u}^n - U^{\epsilon_{m_n}})_i \cdot \nabla v_i, b^{\epsilon_{m_n}}_0) + ((\overline{u}^n - U^{\epsilon_{m_n}})_{i,j} v_{i,k}, b^{\epsilon_{m_n}}_{jk}) - \Bigl(v_{i,i}, \frac{\delta\overline{p}^n}{\epsilon_{m_n}}\Bigr) - (v_{i,i}, b^{\epsilon_{m_n}}_0 \overline{p}^n) - (v_{i,j} R^{\epsilon_{m_n}}_{ji}, \overline{p}^n(1 + \epsilon_{m_n} b^{\epsilon_{m_n}}_0)) = 0, \ \forall v \in X_0, \\ &\Bigl(\frac{\delta(\overline{u}^n - U^{\epsilon_{m_n}})_{i,i}}{\epsilon_{m_n}}, q\Bigr) + ((\overline{u}^n - U^{\epsilon_{m_n}})_{i,i}, b^{\epsilon_{m_n}}_0 q) + ((\overline{u}^n - U^{\epsilon_{m_n}})_{i,j} R^{\epsilon_{m_n}}_{ji}, q(1 + \epsilon_{m_n} b^{\epsilon_{m_n}}_0)) = 0, \ \forall q \in \overline{M}^{\epsilon_{m_n}}. \end{aligned} \quad (29)$$
Letting $n \to \infty$ in (29), with the help of (17), implies
$$\begin{aligned} &w - \dot{U} \in X_0, \qquad \int_{\Omega_0} r\,dx + \int_{\Omega_0} p^0 \rho_{i,i}\,dx = 0, \\ &(\nabla(w - \dot{U})_i, \nabla\zeta_i) + (\nabla(u^0 - U)_i \cdot \nabla\zeta_i, \rho_{i,i}) - ((u^0 - U)_{i,j}\zeta_{i,k}, \rho_{j,k} + \rho_{k,j}) - (\zeta_{i,i}, r) - (\zeta_{i,i}, \rho_{i,i} p^0) + (\zeta_{i,j}\rho_{j,i}, p^0) = 0, \ \forall \zeta \in C^\infty_0(\Omega_0)^d, \\ &((w - \dot{U})_{i,i}, \eta) + ((u^0 - U)_{i,i}, \rho_{i,i}\eta) - ((u^0 - U)_{i,j}\rho_{j,i}, \eta) = 0, \ \forall \eta \in C^\infty_0(\Omega_0), \end{aligned} \quad (30)$$
where $\dot{U} = \rho_j U_{,j}$ is the material derivative of $U$. Since the variational equalities (30) determine the pair $(w, r)$ uniquely, the weak limit of the pairs of any subsequence $(\delta\overline{u}^n/\epsilon_{m_n}, \delta\overline{p}^n/\epsilon_{m_n})$ is the same one. This means that $(\delta\overline{u}^\epsilon/\epsilon, \delta\overline{p}^\epsilon/\epsilon)$ itself converges weakly in $X_{k+1} \times H^k(\Omega_0)$. Hence we write $(w, r) = (\dot{u}, \dot{p})$, the limit of the material difference quotients $(\delta\overline{u}^\epsilon/\epsilon, \delta\overline{p}^\epsilon/\epsilon)$.
Theorem 4 We assume (11), (12) and (13) with $k = 0, 1, 2$. Let $(u^\epsilon, p^\epsilon)$ be the pair uniquely determined by the problem $(\mathrm{P})_{\Omega_\epsilon}$, and set $(\overline{u}^\epsilon, \overline{p}^\epsilon)(x) = (u^\epsilon, p^\epsilon)(x^\epsilon)$, $x \in \Omega_0$. Then we have
$$\Bigl( \frac{\delta\overline{u}^\epsilon}{\epsilon}, \frac{\delta\overline{p}^\epsilon}{\epsilon} \Bigr) \xrightarrow{\text{weakly}} (\dot{u}, \dot{p}) \ \text{in } X_{k+1} \times H^k(\Omega_0). \quad (31)$$
The pair $(\dot{u}, \dot{p})\ (\in X_{k+1} \times H^k(\Omega_0))$ is uniquely determined by the problem: find $(\dot{u}, \dot{p}) \in X_1 \times L^2(\Omega_0)$ such that
$$\dot{u} - \dot{U} \in X_0, \qquad \int_{\Omega_0} \dot{p}\,dx + \int_{\Omega_0} p^0\rho_{i,i}\,dx = 0, \quad (32)$$
$$(\nabla(\dot{u} - \dot{U})_i, \nabla\zeta_i) + (\nabla(u^0 - U)_i \cdot \nabla\zeta_i, \rho_{i,i}) - ((u^0 - U)_{i,j}\zeta_{i,k}, \rho_{j,k} + \rho_{k,j}) - (\zeta_{i,i}, \dot{p}) - (\zeta_{i,i}, \rho_{i,i}p^0) + (\zeta_{i,j}\rho_{j,i}, p^0) = 0, \ \forall \zeta \in C^\infty_0(\Omega_0)^d, \quad (33)$$
$$((\dot{u} - \dot{U})_{i,i}, \eta) + ((u^0 - U)_{i,i}, \rho_{i,i}\eta) - ((u^0 - U)_{i,j}\rho_{j,i}, \eta) = 0, \ \forall \eta \in C^\infty_0(\Omega_0). \quad (34)$$
6. The shape derivatives of the velocity
and the pressure
Let $k = 1, 2$. All the preparation for the convergence of the shape difference quotients of the velocity and the pressure is now done, and we notice that
$$(\delta\overline{u}^\epsilon, \delta\overline{p}^\epsilon) \xrightarrow{\text{strongly}} 0 \ \text{in } X_{k+1} \times H^k(\Omega_0), \quad (35)$$
$$\Bigl( \frac{\delta\overline{u}^\epsilon}{\epsilon}, \frac{\delta\overline{p}^\epsilon}{\epsilon} \Bigr) \xrightarrow{\text{strongly}} (\dot{u}, \dot{p}) \ \text{in } X_k \times H^{k-1}(\Omega_0), \quad (36)$$
as $\epsilon \to +0$, if all the assumptions in Theorem 4 are satisfied. The shape difference quotients $(\delta u^\epsilon/\epsilon, \delta p^\epsilon/\epsilon)$ and the material difference quotients $(\delta\overline{u}^\epsilon/\epsilon, \delta\overline{p}^\epsilon/\epsilon)$ are related by the equality (7). Let $\omega$ be any subdomain of $\Omega_0$ as in Lemma 1, and write $X_k(\omega) = \{v|_\omega \mid v \in X_k\}$. Applying Corollary 2 with $F^\epsilon = u^\epsilon_i, p^\epsilon$, together with (35) and (36), we get
$$\Bigl( \frac{\delta u^\epsilon}{\epsilon}, \frac{\delta p^\epsilon}{\epsilon} \Bigr) \xrightarrow{\text{strongly}} (u', p') \ \text{in } X_k(\omega) \times H^{k-1}(\omega). \quad (37)$$
In our setting we have
$$U' \equiv 0 \ \text{in } \Omega. \quad (38)$$
In the theorem below, in spite of (38), the term $U'$ is described explicitly, because the general form can then be seen in this case.
Theorem 5 Let $k = 1, 2$. We assume (11), (12) and (13). Let $\omega$ be any domain such that $\overline{\omega} \subset \Omega_0$. Then
$$\Bigl( \frac{\delta u^\epsilon}{\epsilon}, \frac{\delta p^\epsilon}{\epsilon} \Bigr) \xrightarrow{\text{strongly}} (u', p') \ \text{in } X_k(\omega) \times H^{k-1}(\omega), \quad (39)$$
as $\epsilon \to +0$. The shape derivative $(u', p')$ is uniquely determined as the solution of the problem $(\mathrm{Q})_{\Omega_0}$: find $(u', p') \in X_1 \times L^2(\Omega_0)$ such that
$$\begin{aligned} &u' - U' + \rho_j(u^0 - U)_{,j} \in X_0, \qquad \int_{\Omega_0} (p' + \rho_j p^0_{,j} + p^0\rho_{i,i})\,dx = 0, \\ &(\nabla(u' - U')_i, \nabla\zeta_i) - (\zeta_{j,j}, p') = 0, \ \forall \zeta \in C^\infty_0(\Omega_0)^d, \\ &((u' - U')_{i,i}, \eta) = 0, \ \forall \eta \in C^\infty_0(\Omega_0). \end{aligned} \quad (40)$$
Remark 6 The variational equalities in (40) are written as
$$\begin{aligned} &u' - U' + \rho_j(u^0 - U)_{,j} \in X_0, \qquad \int_{\Omega_0} (p' + \rho_j p^0_{,j} + p^0\rho_{i,i})\,dx = 0, \\ &-\Delta(u' - U') + \nabla p' = 0 \ \text{in } \Omega, \qquad (u' - U')_{i,i} = 0 \ \text{in } \Omega, \end{aligned}$$
for $(u', p') \in X_2 \times H^1(\Omega_0)$.
Proof Under the assumptions in Theorem 5 we have
$$\Bigl( \frac{\delta\overline{u}^\epsilon_i}{\epsilon}, \frac{\delta\overline{p}^\epsilon}{\epsilon} \Bigr) = \Bigl( \frac{\delta u^\epsilon_i}{\epsilon}, \frac{\delta p^\epsilon}{\epsilon} \Bigr) + \rho_j(u^0_{i,j}, p^0_{,j}) + (r^\epsilon(u^\epsilon_i), r^\epsilon(p^\epsilon)), \quad (41)$$
$$\frac{\delta U^\epsilon_i}{\epsilon} = \rho_j U_{i,j} + r^\epsilon(U^\epsilon_i), \quad (42)$$
$$(r^\epsilon(u^\epsilon_i), r^\epsilon(p^\epsilon)) \xrightarrow{\text{strongly}} 0 \ \text{in } X_k(\omega) \times H^{k-1}(\omega), \quad (43)$$
$$r^\epsilon(U^\epsilon_i) \xrightarrow{\text{strongly}} 0 \ \text{in } X_k(\omega). \quad (44)$$
First we put $v = \zeta \in C^\infty_0(\Omega_0)^d$ into (29), with $\epsilon_2$ small enough that $\mathrm{supp}(\zeta) \subset \Omega_\epsilon \cap \Omega_0$ for all $\epsilon \in (0, \epsilon_2]$. After substituting (41) and (42) into (29), applying (43) and (44), and using integration by parts to move each factor of partial derivatives $\partial^\alpha\zeta_i$ onto the factor $\zeta_i$, a lot of tedious computation yields the simple form
$$-(\Delta(u' - U')_i, \zeta_i) + (\zeta_i, p'_{,i}) = 0.$$
The relations (41) to (43) imply $(\dot{u} - \dot{U})_{i,i} = (\rho_j(u_{i,j} - U_{i,j}))_{,i} + (u' - U')_{i,i}$, and then we obtain the last equality of (40) with the help of $(\dot{u} - \dot{U})_{i,i} = 0$. The first relation of (40) is given by the definition $u' = \dot{u} - \rho_j u_{,j}$ on $\Sigma_0$ (just as (2.163) or (2.169) of [7]), justified by the trace of $u'$. The second equality of (40) follows directly.
(QED)
A general description of the material and shape derivatives is given in [7] and [8]. In the latter, the convergence of the material difference quotients is implied by the regularity of the associated functions, assured by the implicit function theorem (the proof of Proposition 2.82 of [7], for example). In this note, estimates on solutions of elliptic systems are instead applied directly and fully to the convergence of the material and shape difference quotients; hence the results differ slightly from each other. The method of this paper can also be applied to the behaviour of the solutions of the Poisson equation $-\Delta u^\epsilon = f$ in $\Omega_\epsilon$, $u^\epsilon = 0$ on $\partial\Omega_\epsilon$. Propositions 2.82 and 2.83 in [7] show that $\delta\overline{u}^\epsilon/\epsilon \xrightarrow{\text{strongly}} \dot{u}$ in $H^k(\Omega_0) \cap H^1_0(\Omega_0)$, provided $\Omega_0$ is of class $C^k$ and $\rho \in C^k(\overline{D})$ together with $f \in H^1(D)$, for $k = 1, 2$ (rewritten in our notation). On the other hand, by our method, if $\Omega_0$ is of class $C^k$ and $\rho \in C^k(\overline{D})$ together with $f \in L^2(D)$, then $\delta\overline{u}^\epsilon/\epsilon \xrightarrow{\text{weakly}} \dot{u}$ in $H^k(\Omega_0) \cap H^1_0(\Omega_0)$.
References
[1] H. Azegami, A solution to domain optimization problems, Trans. Japan Soc. Mech. Engs., Ser. A, 60 (1994), 1479–1486 (in Japanese).
[2] H. Azegami and K. Takeuchi, A smoothing method for shape optimization: traction method using the Robin condition, Int. J. Comput. Meth., 31(1) (2006), 21–33.
[3] S. Kaizu and H. Azegami, Optimal shape problems and the traction method, Trans. Japan Soc. Indust. Appl. Math., 16(3) (2006), 277–290 (in Japanese).
[4] R. Temam, Navier-Stokes Equations: Theory and Numerical Analysis, AMS Chelsea Pub., 2001.
[5] S. Kaizu, Sensitivity analysis of costs including both of the velocity and the pressure and a finite element method for the Stokes problems, in preparation.
[6] V. Girault and P.-A. Raviart, Finite Element Approximation of the Navier-Stokes Equations, Lect. Notes Math., Vol. 749, Springer-Verlag, Berlin, Heidelberg, New York, 1979.
[7] J. Sokolowski and J.-P. Zolesio, Introduction to Shape Optimization: Shape Sensitivity Analysis, Springer-Verlag, New York, 1991.
[8] J. Haslinger and R. A. E. Makinen, Introduction to Shape Optimization: Theory, Approximation, and Computation, SIAM, Philadelphia, 2003.
JSIAM Letters Vol.1 (2009) pp.21–24 c©2009 Japan Society for Industrial and Applied Mathematics
On very accurate verification of solutions for boundary
value problems by using spectral methods
Mitsuhiro T. Nakao1 and Takehiko Kinoshita2
Faculty of Mathematics, Kyushu University, 33, Fukuoka 812-8581, Japan1
Graduate School of Mathematics, Kyushu University, 33, Fukuoka 812-8581, Japan2
E-mail [email protected]
Received October 31, 2008, Accepted December 13, 2008 (INVITED PAPER)
Abstract
In this paper, we consider a numerical verification method for solutions of nonlinear elliptic boundary value problems with very high accuracy. We derive constructive error estimates for the $H^1_0$-projection into polynomial spaces by using properties of the Legendre polynomials. On the other hand, the Galerkin approximation with higher-degree polynomials enables us to get very small residual errors. Combining these results with existing verification procedures, several verification examples which confirm the actual effectiveness of the method are presented.
Keywords numerical verification, guaranteed error bound, spectral method
Research Activity Group Quality of Computations
1. Introduction
Spectral methods are well-known approximation techniques which achieve an arbitrary degree of accuracy, in contrast to other methods such as finite difference or finite element methods. On the other hand, in numerical verification methods for solutions of boundary value problems, e.g., [1, 2] etc., the smaller the residual error, the finer the enclosure of exact solutions. Therefore, in the present paper, we formulate a method using the spectral technique with Legendre polynomials to get a highly accurate verification of solutions for nonlinear elliptic equations with Dirichlet boundary conditions. First, we derive some constructive a priori error estimates for the $H^1_0$-projection into polynomial spaces by using properties of the Legendre polynomials, which play an essential role in our verification method. Next, after briefly describing the verification method for solutions of nonlinear boundary value problems, we present some verification results on the existence and local uniqueness of solutions of Emden's equation. These results show that the present method enables us to verify solutions with very high accuracy, which has not been attained up to now by other methods (e.g., [2, 3]).
2. Basis of $H^1_0$ by Legendre polynomials
As is well known, the Legendre polynomials on $\Lambda = (a, b) \subset \mathbf{R}$ are defined, for an arbitrary non-negative integer $n$, by
$$P_n(x) := \frac{(-1)^n}{n!\,|\Lambda|^n} \Bigl(\frac{d}{dx}\Bigr)^n (b - x)^n(x - a)^n, \quad (1)$$
where $|\Lambda| := b - a$. Let $\mathcal{P}_n(\Lambda)$ denote the set of polynomials on $\Lambda$ with degree $\le n$. We define the set of homogeneous polynomials by $\mathcal{P}^{1,0}_N(\Lambda) := \{ u_N \in \mathcal{P}_N(\Lambda)\ ;\ u_N(a) = u_N(b) = 0 \}$, which is a subspace of $H^1_0(\Lambda)$. Moreover, for each $n \ge 2$, $\phi_n \in \mathcal{P}^{1,0}_n(\Lambda)$ is defined by
$$\phi_n(x) := \frac{\sqrt{2n-1}}{n(n-1)\,|\Lambda|^{1/2}}\,(b - x)(x - a)\,P'_{n-1}(x) \quad (2)$$
or, equivalently, by (1),
$$\phi_n(x) = \frac{(-1)^n\sqrt{2n-1}}{(n-1)!\,|\Lambda|^{n-1/2}} \Bigl(\frac{d}{dx}\Bigr)^{n-2} (b - x)^{n-1}(x - a)^{n-1}.$$
Then we have the following property.
Theorem 1 $\{\phi_n\}_{n\ge 2} \subset H^1_0(\Lambda)$ is a complete orthonormal system in $H^1_0(\Lambda)$, i.e.
$$(\phi_m, \phi_n)_{H^1_0(\Lambda)} := (\phi'_m, \phi'_n)_{L^2(\Lambda)} = \delta_{m,n}, \quad \forall m, n \ge 2.$$
Proof (Orthogonality) For arbitrary $m, n \ge 2$, by the well-known properties of $P_n$ we have
$$(\phi_m, \phi_n)_{H^1_0(\Lambda)} = c_{m,n}\,\bigl(m(m-1)P_{m-1},\ n(n-1)P_{n-1}\bigr)_{L^2(\Lambda)},$$
where $c_{m,n} := (-1)^{m+n}\sqrt{(2m-1)(2n-1)}/(m(m-1)n(n-1)|\Lambda|)$. Moreover, from the orthogonality of $\{P_n\}_{n\ge 0}$, we have
$$(\phi_m, \phi_n)_{H^1_0(\Lambda)} = \delta_{m,n}.$$
(Completeness) It suffices to show that, for arbitrary $u \in H^1_0(\Lambda)$,
$$(u, \phi_n)_{H^1_0(\Lambda)} = 0,\ \forall n \ge 2 \ \Longrightarrow\ u = 0 \ \text{in } H^1_0(\Lambda).$$
From the definition, we have
$$(u, \phi_n)_{H^1_0(\Lambda)} = -\frac{\sqrt{2n-1}}{|\Lambda|^{1/2}}\,(u', P_{n-1})_{L^2(\Lambda)}, \quad \forall n \ge 2.$$
Therefore
$$(u', P_n)_{L^2(\Lambda)} = 0, \quad \forall n \ge 1.$$
It also holds that $(u', P_0)_{L^2(\Lambda)} = 0$. Since $\{P_n\}_{n\ge 0}$ is a complete orthogonal system in $L^2(\Lambda)$, we have $u' = 0$ in $L^2(\Lambda)$, which implies $u = 0$ in $H^1_0(\Lambda)$. (QED)
Now we define the $H^1_0$-projection $\pi^{1,0}_N : H^1_0(\Lambda) \to \mathcal{P}^{1,0}_N(\Lambda)$ by
$$\bigl( u - \pi^{1,0}_N u,\ v_N \bigr)_{H^1_0(\Lambda)} = 0, \quad \forall v_N \in \mathcal{P}^{1,0}_N(\Lambda).$$
Owing to the complete orthonormality of $\{\phi_n\}$, the operator $\pi^{1,0}_N$ coincides with the truncation operator. Namely, for arbitrary $u \in H^1_0(\Lambda)$,
$$u = \sum_{n=2}^\infty a_n\phi_n \ \Longrightarrow\ \pi^{1,0}_N u = \sum_{n=2}^N a_n\phi_n.$$
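The orthonormality claimed in Theorem 1 is easy to confirm symbolically. The sketch below builds $\phi_n$ from (2) on $\Lambda = (0,1)$ (where the $P_n$ of (1) are the shifted Legendre polynomials $P_n(2x-1)$) and checks $(\phi'_m, \phi'_n)_{L^2} = \delta_{m,n}$ exactly for a few indices:

```python
import sympy as sp

x = sp.symbols('x')
a, b = 0, 1                      # Lambda = (0, 1), so |Lambda| = 1

def P(n):
    # The polynomials of (1) on (0, 1) are the shifted Legendre P_n(2x - 1).
    return sp.legendre(n, 2 * x - 1)

def phi(n):
    # Definition (2) with |Lambda| = 1.
    return sp.sqrt(2 * n - 1) / (n * (n - 1)) * (b - x) * (x - a) * sp.diff(P(n - 1), x)

def h10_inner(m, n):
    return sp.integrate(sp.diff(phi(m), x) * sp.diff(phi(n), x), (x, a, b))

gram = [[sp.simplify(h10_inner(m, n)) for n in range(2, 6)] for m in range(2, 6)]
print(gram)  # expected: the 4x4 identity matrix
```

The computation is exact (rational arithmetic), so the Gram matrix comes out as the identity with no rounding.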
3. Constructive error estimates for the $H^1_0$-projection
Theorem 2 For arbitrary $u \in H^1_0(\Lambda) \cap H^2(\Lambda)$, we have
$$\bigl\| u - \pi^{1,0}_N u \bigr\|_{H^1_0(\Lambda)} \le C(N)\,|u|_{H^2(\Lambda)}, \quad (3)$$
where the constant $C(N)$ is defined as
$$C(N) = \begin{cases} |\Lambda| \big/ \sqrt{2(2N-1)(2N+1)}, & \text{if } N = 2, 3, \\ |\Lambda| \big/ \sqrt{(2N+1)(2N+5)}, & \text{if } N \ge 4. \end{cases}$$
Proof For each $u \in H^1_0(\Lambda) \cap H^2(\Lambda)$, we have the expansion
$$u = \sum_{n=2}^\infty a_n\phi_n, \qquad a_n = (u, \phi_n)_{H^1_0(\Lambda)}. \quad (4)$$
Here the truncation operator $\pi^{1,0}_N$ satisfies
$$\pi^{1,0}_N u = \sum_{n=2}^N a_n\phi_n.$$
By the Parseval equality, we have
$$\bigl\| u - \pi^{1,0}_N u \bigr\|^2_{H^1_0(\Lambda)} = \Bigl\| \sum_{n=N+1}^\infty a_n\phi_n \Bigr\|^2_{H^1_0(\Lambda)} = \sum_{n=N+1}^\infty a_n^2. \quad (5)$$
Next, $u'' \in L^2(\Lambda)$ can be expanded in the $P_n$ as follows:
$$u'' = \sum_{n=0}^\infty b_n \frac{P_n}{\|P_n\|_{L^2(\Lambda)}}, \qquad b_n = \Bigl( u'', \frac{P_n}{\|P_n\|_{L^2(\Lambda)}} \Bigr)_{L^2(\Lambda)}. \quad (6)$$
Therefore, the Parseval equality implies
$$|u|^2_{H^2(\Lambda)} = \|u''\|^2_{L^2(\Lambda)} = \sum_{n=0}^\infty b_n^2. \quad (7)$$
From the fact that $\phi'_n = -\sqrt{2n-1}\,|\Lambda|^{-1/2} P_{n-1}$, we have, for each $n \ge 2$, by using well-known properties of $P_n$,
$$a_n = (u', \phi'_n)_{L^2(\Lambda)} = -\frac{\sqrt{2n-1}}{|\Lambda|^{1/2}} (u', P_{n-1})_{L^2(\Lambda)} = -\frac{|\Lambda|^{1/2}}{2\sqrt{2n-1}} \bigl( u', P'_n - P'_{n-2} \bigr)_{L^2(\Lambda)} = \frac{|\Lambda|^{1/2}}{2\sqrt{2n-1}} (u'', P_n - P_{n-2})_{L^2(\Lambda)} = \frac{|\Lambda|^{1/2}}{2\sqrt{2n-1}} \bigl( \|P_n\|_{L^2(\Lambda)} b_n - \|P_{n-2}\|_{L^2(\Lambda)} b_{n-2} \bigr) =: \frac{1}{\sqrt 2}\alpha_n b_n - \frac{1}{\sqrt 2}\beta_{n-2} b_{n-2}.$$
Here we define the constants $\alpha_n$, $\beta_n$ by
$$\alpha_n = \frac{|\Lambda|^{1/2}\,\|P_n\|_{L^2(\Lambda)}}{\sqrt{2(2n-1)}} = \frac{|\Lambda|}{\sqrt{2(2n-1)(2n+1)}}, \qquad \beta_n = \frac{|\Lambda|^{1/2}\,\|P_n\|_{L^2(\Lambda)}}{\sqrt{2(2n+3)}} = \frac{|\Lambda|}{\sqrt{2(2n+1)(2n+3)}}.$$
Then each term in (5) is estimated as follows:
$$a_n^2 = \tfrac12 \bigl( \alpha_n b_n - \beta_{n-2} b_{n-2} \bigr)^2 = \tfrac12 \bigl( \alpha_n^2 b_n^2 - 2\alpha_n b_n \beta_{n-2} b_{n-2} + \beta_{n-2}^2 b_{n-2}^2 \bigr) \le \alpha_n^2 b_n^2 + \beta_{n-2}^2 b_{n-2}^2.$$
From the above estimates and (5), we have the error estimates
$$\bigl\| u - \pi^{1,0}_N u \bigr\|^2_{H^1_0(\Lambda)} = \sum_{n=N+1}^\infty a_n^2 \le \sum_{n=N+1}^\infty \bigl( \alpha_n^2 b_n^2 + \beta_{n-2}^2 b_{n-2}^2 \bigr) = \beta_{N-1}^2 b_{N-1}^2 + \beta_N^2 b_N^2 + \sum_{n=N+1}^\infty \bigl( \alpha_n^2 + \beta_n^2 \bigr) b_n^2 \le \max_{N+1\le n<\infty}\bigl\{ \beta_{N-1}^2,\ \beta_N^2,\ \alpha_n^2 + \beta_n^2 \bigr\} \sum_{n=N-1}^\infty b_n^2 \le \max\bigl\{ \beta_{N-1}^2,\ \alpha_{N+1}^2 + \beta_{N+1}^2 \bigr\}\,|u|^2_{H^2(\Lambda)}.$$
If $N \le 7/2$ then $\beta_{N-1}^2 \ge \alpha_{N+1}^2 + \beta_{N+1}^2$, and therefore $C(N) = \beta_{N-1} = |\Lambda|/\sqrt{2(2N-1)(2N+1)}$. If $N \ge 7/2$ then $\beta_{N-1}^2 \le \alpha_{N+1}^2 + \beta_{N+1}^2$, and therefore $C(N) = \sqrt{\alpha_{N+1}^2 + \beta_{N+1}^2} = |\Lambda|/\sqrt{(2N+1)(2N+5)}$.
(QED)
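The constant C(N) of Theorem 2 is straightforward to tabulate. A quick numerical sketch (taking |Λ| = 1) confirms that C(N) is monotonically decreasing and decays like O(1/N):

```python
import math

def C(N, length=1.0):
    """Constant C(N) of Theorem 2 for |Lambda| = length."""
    if N in (2, 3):
        return length / math.sqrt(2 * (2 * N - 1) * (2 * N + 1))
    return length / math.sqrt((2 * N + 1) * (2 * N + 5))

values = {N: C(N) for N in (2, 3, 4, 8, 16, 32)}
print(values)
```

Since C(N) ≈ 1/(2N) for large N, the projection error in (3) decreases only algebraically for fixed H² regularity; the exponential accuracy seen later comes from the smoothness of the solutions, not from (3) alone.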
Theorem 3 For arbitrary $u \in H^1_0(\Lambda)$, we have
$$\bigl\| u - \pi^{1,0}_N u \bigr\|_{L^2(\Lambda)} \le C(N)\,\bigl\| u - \pi^{1,0}_N u \bigr\|_{H^1_0(\Lambda)}. \quad (8)$$
Here $C(N)$ is the same constant as in Theorem 2.
We omit the proof of (8), because it is almost the same as the usual Aubin-Nitsche trick. For two- or three-dimensional domains of the form $\Lambda_1 \times \cdots \times \Lambda_d$, $d = 2, 3$, by using the $d$-fold tensor product of the one-dimensional basis $\{\phi_n\}$, the problem reduces to the one-dimensional case. Namely, we obtain the same results as in Theorems 2 and 3, with the same constant $C(N)$, for those domains.
4. Verification for elliptic boundary value
problems
In the following, we briefly describe the verification condition for nonlinear elliptic boundary value problems based on [2], which we applied for the actual verification in the present paper. Let $\Omega \subset \mathbf{R}^d$ be a polygonal (polyhedral) domain, and let $f : H^1_0(\Omega) \to L^2(\Omega)$ be a Frechet differentiable map. Consider the boundary value problem:
$$-\Delta u = f(u) \ \text{in } \Omega, \quad (9)$$
$$u = 0 \ \text{on } \partial\Omega. \quad (10)$$
Let $S_N$ be an $m$-dimensional subspace of $H^1_0 \cap H^2(\Omega)$ and let $u_N \in S_N$ be an appropriate approximate solution of (9), (10). We set $u = w + u_N$. Then the residual equation of Newton type is given by
$$\mathcal{L}w := -\Delta w - f'(u_N)w = g(w) \ \text{in } \Omega, \quad (11)$$
$$w = 0 \ \text{on } \partial\Omega, \quad (12)$$
where $g(w) = f(w + u_N) + \Delta u_N - f'(u_N)w$. If the operator $\mathcal{L}$ is invertible, then (11), (12) can be rewritten as the fixed point equation $w = \mathcal{L}^{-1}g(w) =: F(w)$ for a compact map $F$ on $H^1_0(\Omega)$. For an $\alpha > 0$, define the set $W_\alpha \subset H^1_0(\Omega)$ by
$$W_\alpha := \bigl\{ w \in H^1_0(\Omega)\ ;\ \|w\|_{H^1_0(\Omega)} \le \alpha \bigr\}.$$
If $F(W_\alpha) \subset W_\alpha$ then, by Schauder's fixed point theorem, there exists a fixed point $w$ in the set $W_\alpha$, which we call a candidate set. Furthermore, the local uniqueness condition for solutions on $W_\gamma$, for a $\gamma > 0$, is presented by
$$\|F(w_1) - F(w_2)\|_{H^1_0} \le k\,\|w_1 - w_2\|_{H^1_0}, \quad \forall w_1, w_2 \in W_\gamma \quad (13)$$
for some $0 < k < 1$.
We now give an invertibility condition for the linear operator $\mathcal{L}$. Suppose that $\mathcal{L}$ can be represented as
$$\mathcal{L}w \equiv -\Delta w - f'(u_N)w = -\Delta w + b\cdot\nabla w + cw,$$
where $b \in W^{1,\infty}(\Omega)^d$, $c \in L^\infty(\Omega)$. Then we have the next theorem.
Theorem 4 ([2]) If the inequality
$$\kappa := C(N)\bigl( C_1 M(N) K(N) + C_2 \bigr) < 1$$
holds, then $\mathcal{L}$ is invertible. In the above expression, we have
$$C_1 = \|b\|_{L^\infty(\Omega)^d} + C_p\,\|c\|_{L^\infty(\Omega)}, \qquad C_2 = \|b\|_{L^\infty(\Omega)^d} + C(N)\,\|c\|_{L^\infty(\Omega)},$$
$$M(N) = \bigl\| D^{T/2} G^{-1} D^{1/2} \bigr\|_E, \qquad K(N) = C(N)\bigl( C_p\,\|\mathrm{div}\,b\|_{L^\infty} + \|b\|_{L^\infty} + C_p\,\|c\|_{L^\infty} \bigr),$$
$$G_{i,j} = (\nabla\phi_j, \nabla\phi_i)_{L^2} + (b\cdot\nabla\phi_j, \phi_i)_{L^2} + (c\phi_j, \phi_i)_{L^2}, \qquad D_{i,j} = (\nabla\phi_j, \nabla\phi_i)_{L^2},$$
where $G := (G_{i,j})$, $D := (D_{i,j})$ are $m \times m$ matrices and $\|\cdot\|_E$ stands for the Euclidean norm of a matrix. Here $C_p$ is a Poincare constant. The following estimate, which yields the norm of the inverse operator $\mathcal{L}^{-1}$, also holds: for arbitrary $g \in L^2(\Omega)$,
$$\bigl\| \mathcal{L}^{-1}g \bigr\|_{H^1_0(\Omega)} \le C_p\,\|R\|_E^{1/2}\,\|g\|_{L^2(\Omega)}.$$
Here the matrix $R \in \mathbf{R}^{2\times 2}$ is defined as
$$R = \tau \begin{pmatrix} M(N)^2\bigl( C_1^2 C(N)^2 + (1 - C_2 C(N))^2 \bigr) & \text{symm.} \\ M(N)\bigl( C_1 C(N) + (1 - C_2 C(N)) M(N) K(N) \bigr) & 1 + M(N)^2 K(N)^2 \end{pmatrix},$$
where $\tau := 1/(1-\kappa)^2$.
In general, by Theorem 4, the verification condition $F(W_\alpha) \subset W_\alpha$ reduces to a nonlinear inequality with respect to the real parameter $\alpha$. Furthermore, the local uniqueness condition is also represented by another kind of inequality in $\gamma$.
5. Numerical Example
We consider the following Emden equation:
$$-\Delta u = u^2 \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega,$$
where $\Omega$ is the one-dimensional interval $(0, 1)$ or the rectangle $(0, 1) \times (0, 1)$ in two dimensions. We define the finite-dimensional space $S_N$ as $\mathcal{P}^{1,0}_N(0, 1)$, or its tensor product space $\mathcal{P}^{1,0}_N(0, 1)^2$ in $H^1_0(\Omega)$. Let $\{\phi_i\}$ be the basis of $S_N$. Below we use the same symbols as before. First, we compute a numerical solution $u_N \in S_N$ satisfying
$$(\nabla u_N, \nabla v_N)_{L^2(\Omega)^d} = \bigl( u_N^2, v_N \bigr)_{L^2(\Omega)}, \quad \forall v_N \in S_N, \quad (14)$$
by using the usual Newton method with some appropriate initial value. Note that, in the present case, it is sufficient to compute the solution of the above nonlinear equation in ordinary floating-point arithmetic; it is not necessary to get a verified solution of (14). The linearized operator $\mathcal{L}$ is defined by $\mathcal{L}w := -\Delta w - 2u_N w$. Then we compute each constant in Theorem 4 by guaranteed computation based on interval arithmetic, with $C_p = 1/\pi$. In this case, the verification condition for the existence of a solution in a candidate set $W_\alpha$ can be represented as the following quadratic inequality in $\alpha$:
$$C_p\,\|R\|_E^{1/2} \bigl( C_4^2\alpha^2 + \bigl\| \Delta u_N + u_N^2 \bigr\|_{L^2} \bigr) \le \alpha, \quad (15)$$
where $C_4$ is an embedding constant in Sobolev's inequality satisfying
$$\|u\|_{L^4(\Omega)} \le C_4\,\|u\|_{H^1_0(\Omega)}, \quad \forall u \in H^1_0(\Omega).$$
From the inequality (15), one finds that the order of magnitude of $\alpha$ is almost the same as that of the residual norm.
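For intuition about the approximation step (14), a plain finite-difference Newton iteration for the one-dimensional Emden problem −u″ = u² already reproduces the qualitative picture (this is only a sketch in ordinary floating-point arithmetic; the paper uses a Legendre Galerkin basis and interval arithmetic, and reports ‖u_N‖_{L∞} ≈ 11.7967 in Table 1):

```python
import numpy as np

def solve_emden_1d(n=400, iters=30):
    """Newton's method for -u'' = u^2 on (0,1), u(0)=u(1)=0, by finite differences."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # Tridiagonal discretization of -d^2/dx^2
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    u = 12.0 * np.sin(np.pi * x)     # initial guess near the nontrivial solution
    for _ in range(iters):
        F = A @ u - u**2             # residual of A u = u^2
        J = A - 2.0 * np.diag(u)     # Jacobian of the discrete system
        u -= np.linalg.solve(J, F)
    return x, u, np.linalg.norm(A @ u - u**2, np.inf)

x, u, res = solve_emden_1d()
print(u.max(), res)
```

The maximum of the computed solution lands near 11.8, consistent with the ‖u_N‖_{L∞} column of Table 1; of course this sketch carries no guaranteed error bound, which is exactly what the verification machinery of Section 4 provides.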
On the other hand, the verification condition (13) for the uniqueness in a set $W_\gamma$ can be given by the following inequality in $\gamma$:
$$\gamma \le \frac{1}{2 C_p C_4^2\,\|R\|_E^{1/2}}. \quad (16)$$
Here the matrix $R$ is the same as in Theorem 4. After enclosing a solution in $W_\alpha$, we can also obtain $L^\infty$ a posteriori error estimates by using explicit Sobolev inequalities, as follows. Namely, for the error $w := u - u_N$, in the one-dimensional case we have
$$\|u - u_N\|_{L^\infty(\Omega)} \le \frac{1}{\sqrt 2}\,\|u - u_N\|_{H^1_0(\Omega)} \le \frac{\alpha}{\sqrt 2}.$$
In the two-dimensional case, it holds by Plum's estimate ([3]) that
$$\|w\|_{L^\infty} \le C^*_1\,\|\nabla w\|_{L^2} + C^*_2\,|w|_{H^2} \le C^*_1\alpha + C^*_2\,\|\Delta w\|_{L^2}.$$
Here, in the present case, the constants $C^*_1$ and $C^*_2$ satisfy $C^*_1 \le 2/\sqrt 3$ and $C^*_2 \le \sqrt{14/5}/6$. Further, we have
$$\|\Delta w\|_{L^2} = \bigl\| 2u_N w + w^2 + \Delta u_N + u_N^2 \bigr\|_{L^2} \le 2C_p\alpha\,\|u_N\|_{L^\infty} + C_4^2\alpha^2 + \bigl\| \Delta u_N + u_N^2 \bigr\|_{L^2}.$$
From the above estimates, it is seen that the $L^\infty$ error is also of the same order as the residual error. Table 1 shows the
Table 1. Verification results: one dimension

N  | ‖uN‖_{L∞} | ‖uN‖_{H¹₀} | ‖u″_N + u²_N‖_{L²} | M(N)
2  | 11.6667   | 26.9431    | 5.18519E+01        | 1.00001
4  | 11.7178   | 25.6680    | 1.90865E+01        | 1.50865
8  | 11.7959   | 25.6254    | 6.13925E−01        | 1.66595
16 | 11.7967   | 25.6254    | 1.31699E−04        | 1.66667
24 | 11.7967   | 25.6254    | 1.28287E−08        | 1.66667
32 | 11.7967   | 25.6254    | 3.45912E−11        | 1.66669

N  | κ       | Existence   | Uniqueness | ‖u − uN‖_{L∞}
2  | 2.61658 | ——          | ——         | ——
4  | 0.91785 | Failed      | ——         | ——
8  | 0.32925 | Failed      | ——         | ——
16 | 0.09631 | 8.33030E−05 | 2.73837    | 5.89042E−05
24 | 0.04529 | 7.41547E−09 | 2.99641    | 5.24353E−09
32 | 0.02622 | 1.93076E−11 | 3.10356    | 1.36525E−11
one-dimensional verification results. These results were computed using interval arithmetic in double precision, coded with INTLIB [4]. In the table, “——” means no calculation, due to the failure of the invertibility condition in Theorem 4, and “Failed” means that the verification condition (15) failed. The column N in Table 1 stands for the degree of the polynomials. It turns out that the residual norm decays exponentially in N. If $\mathcal{L}$ is invertible, M(N) should converge to a certain constant; the slight increase of M(N) with N in the table would come from the influence of the interval arithmetic computations. “Existence” means the smallest α which satisfies the quadratic inequality (15), and “Uniqueness” the largest γ satisfying (16). The $L^\infty$ error ‖u − uN‖_{L∞} is of almost the same order as “Existence”. Table 2 shows the results for the two-dimensional case using bi-N-degree polynomials. In our numerical observation using floating-point arithmetic in double precision, M(N) converged to 2.746811···. However, in Table 2, this value tends to increase with N. Actually, in the computational process, due to the accumulation of rounding-error enclosures, some unexpected enlargement of the interval widths is caused, which brings about the failure of the verification, e.g., for N = 40.
Quadrature rule. In the actual numerical computations, in order to avoid the loss of significant digits
Table 2. Verification results: two dimensions

N  | ‖uN‖_{L∞} | ‖uN‖_{H¹₀} | ‖ΔuN + u²_N‖_{L²} | M(N)
2  | 27.2223   | 64.9288    | 1.61353E+02       | 1.00001
4  | 28.3239   | 59.2653    | 8.75145E+01       | 2.15455
8  | 29.2334   | 58.8264    | 6.49785E+00       | 2.74009
16 | 29.2571   | 58.8259    | 7.61524E−03       | 2.74682
24 | 29.2571   | 58.8259    | 4.17816E−06       | 2.74719
32 | 29.2571   | 58.8259    | 1.70506E−08       | 3.19814

N  | κ       | Existence   | Uniqueness | ‖u − uN‖_{L∞}
2  | 11.8261 | ——          | ——         | ——
4  | 6.47153 | ——          | ——         | ——
8  | 2.82214 | ——          | ——         | ——
16 | 0.82836 | 7.43965E−02 | 0.154966   | 1.39722E−00
24 | 0.38951 | 6.86979E−06 | 0.608198   | 1.32133E−04
32 | 0.26043 | 2.55080E−08 | 0.668438   | 4.92153E−07
due to the integration of higher degree polynomials, weeffectively used the Gauss-Legendre quadrature formulaon the interval Λ, satisfying, for each integer m ≥ 1,
∫
Λ
p(x) dx =
m∑
n=1
p(xn)wn, ∀p ∈ P2m−1(Λ). (17)
Here x_n is a zero of P_m and w_n is the weight at x_n, both of which are computed with guaranteed accuracy.

Computer environment. CPU: Intel Core2 Quad Q6700; Memory: DDR2 8GB; OS: Ubuntu Linux 7.10 AMD64; Compiler: Intel Fortran 10.1; LAPACK: version 3.1.1; BLAS: GotoBLAS 1.26; Interval arithmetic: INTLIB [4].
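The exactness property (17), an m-point rule integrating every polynomial of degree up to 2m−1 without error, can be illustrated in ordinary floating-point arithmetic. The sketch below uses NumPy's `leggauss` nodes and weights; it is only an illustration, since the paper computes x_n and w_n with guaranteed (interval) accuracy.

```python
import numpy as np

def gauss_legendre_check(m):
    """Max quadrature error of the m-point Gauss-Legendre rule over the
    monomials x^0, ..., x^{2m-1} on [-1, 1]; should be ~rounding error."""
    x, w = np.polynomial.legendre.leggauss(m)        # zeros of P_m and weights
    errs = []
    for deg in range(2 * m):
        quad = np.sum(w * x**deg)                    # sum_n p(x_n) w_n
        exact = 0.0 if deg % 2 else 2.0 / (deg + 1)  # exact integral of x^deg
        errs.append(abs(quad - exact))
    return max(errs)

print(gauss_legendre_check(8))   # close to machine epsilon
```

Degree 2m already breaks exactness, which is the sharpness of (17).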
6. Conclusion
There are some existing verification results for the same problem. In [2], the corresponding H^1_0 error was 4.1569×10−2 for piecewise bi-quadratic C0 functions with 400 elements and, in [3], the error bound in the L∞ sense was 8.460×10−4 for piecewise bi-quintic polynomials of C1-class with 64 elements. Therefore, our computational results confirm that spectral methods yield highly precise approximations with guaranteed accuracy for Dirichlet problems at reasonable computational cost. For the present, however, we could not completely overcome the error propagation in the computations of polynomials of higher degree. It seems necessary to use more precise interval techniques based on multiple-precision arithmetic or other efficient approaches such as [5].
References
[1] M.T. Nakao and Y. Watanabe, An efficient approach to the numerical verification for solutions of elliptic differential equations, Numerical Algorithms, 37, Special issue for Proc. of SCAN2002 (2004), 311–323.
[2] M.T. Nakao, K. Hashimoto and Y. Watanabe, A numerical method to verify the invertibility of linear elliptic operators with applications to nonlinear problems, Computing, 75 (2005), 1–14.
[3] M. Plum, Numerical existence proofs and explicit bounds for solutions of nonlinear elliptic boundary value problems, Computing, 49 (1992), 25–44.
[4] R. Baker Kearfott, Algorithm 763: INTERVAL ARITHMETIC: A Fortran 90 module for an interval data type, ACM Trans. Math. Software, 22(4) (1996), 385–392.
[5] T. Ogita, S.M. Rump and S. Oishi, Accurate sum and dot product, SIAM J. Sci. Comput., 26 (2005), 1955–1988.
– 24 –
JSIAM Letters Vol.1 (2009) pp.25–27 ©2009 Japan Society for Industrial and Applied Mathematics
On oscillatory solutions of the ultradiscrete Sine-Gordon equation
Shin Isojima1 and Junkichi Satsuma1
Department of Physics and Mathematics, College of Science and Engineering, Aoyama Gakuin University, 5-10-1 Fuchinobe, Sagamihara, Kanagawa, 229-8558 Japan1
E-mail [email protected], [email protected]
Received December 6, 2008, Accepted February 21, 2009
Abstract
Exact solutions of the ultradiscrete Sine-Gordon equation which have an oscillating structure are constructed. They are considered to be a counterpart of the breather solution of the Sine-Gordon equation. They are obtained by setting specific parameters in the discrete soliton solutions and ultradiscretizing the resulting solutions.
Keywords soliton, cellular automaton, Sine-Gordon equation, ultradiscrete system, breather solution
Research Activity Group Applied Integrable Systems
1. Introduction
A cellular automaton (CA) is a discrete dynamical system which consists of a regular array of cells. Each cell takes a finite number of states updated by a given rule in discrete time steps. Although the updating rule is usually simple, CAs may give very complex evolution patterns (see for example [1]). Moreover, CAs are suitable for computer experiments since all variables take discrete values. Hence CAs may be good models which capture the essential mechanisms of physical, social or biological phenomena through simple rules.
Ultradiscretization [2] is a procedure transforming a given difference equation into a CA or an ultradiscrete system. In general, to apply this procedure, we first replace a dependent variable x_n in a given equation with a new variable X_n by
xn = eXn/ε (1)
upon introducing a parameter ε > 0. Then, in the limit ε ↓ 0, addition, multiplication and division of the original variables are replaced with max, addition and subtraction for the new ones, respectively. Note that x_n should be positive definite for (1), and that there is no general way to handle subtraction in a discrete equation. Besides overcoming these difficulties, it is also an open problem how to capture oscillatory phenomena in ultradiscrete systems. A partial answer is given in [3] and [4], in which the ultradiscretization of the elliptic functions is discussed. The authors and coworkers reported an ultradiscrete analogue of the Airy function as the solution of an initial value problem in [5].
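The limiting behaviour just described is easy to observe numerically. In the sketch below, `ud_add` (our own helper, not from the paper) evaluates ε log(e^{X/ε} + e^{Y/ε}), which tends to max(X, Y) as ε ↓ 0; multiplication x y = e^{(X+Y)/ε} turns into addition of the new variables directly.

```python
import math

def ud_add(X, Y, eps):
    """eps * log(e^{X/eps} + e^{Y/eps}), computed stably via the usual
    log-sum-exp shift; tends to max(X, Y) as eps -> 0."""
    m = max(X, Y)
    return m + eps * math.log(math.exp((X - m) / eps) + math.exp((Y - m) / eps))

for eps in (1.0, 0.1, 0.01):
    print(eps, ud_add(2.0, 5.0, eps))   # approaches max(2, 5) = 5
```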
It has already been reported that some ultradiscrete systems constructed from discrete soliton equations possess soliton solutions similar to those of the discrete or corresponding continuous systems (see for example [2,6,7]). However, an ultradiscrete solution propagating with oscillation, like the breather solution of the Sine-Gordon (SG) equation, has not been reported. In this letter, we propose solutions of an ultradiscrete analogue of the SG (udSG) equation [8] which have an oscillating structure. They are considered to be a counterpart of the breather solution. They are constructed by a proper setting of parameters in the known discrete soliton solutions and by ultradiscretizing the resulting solutions.
2. Ultradiscrete Sine-Gordon Equation
The SG equation, one of the well-known soliton equations,

∂²ϕ/(∂x∂t) = sin ϕ  (2)
is famous for possessing the breather solution, which describes oscillatory phenomena and is given as a special case of the 2-soliton solution. Hirota proposed an integrable discrete analogue of the SG equation [9]
sin( (φ^{m+1}_{n+1} + φ^{m−1}_{n−1} − φ^{m−1}_{n+1} − φ^{m+1}_{n−1}) / 4 )
  = δ² sin( (φ^{m+1}_{n+1} + φ^{m−1}_{n−1} + φ^{m−1}_{n+1} + φ^{m+1}_{n−1}) / 4 )  (3)
through the bilinearizing technique. Note that this equation also has the breather solution.
For the purpose of constructing an udSG equation, the authors and coworkers proposed another discrete SG equation [8]

| (1 − δ²) u^{m−1}_{n−1} − 1    (1 + δ²) u^{m+1}_{n−1} − 1 |
| (1 + δ²) u^{m−1}_{n+1} − 1    (1 − δ²) u^{m+1}_{n+1} − 1 | = 0.  (4)
This equation is reduced to the trilinear form

| (1 − δ²) τ^{m−2}_{n−2}    τ^{m}_{n−2}    (1 + δ²) τ^{m+2}_{n−2} |
| τ^{m−2}_{n}               τ^{m}_{n}      τ^{m+2}_{n}            |
| (1 + δ²) τ^{m−2}_{n+2}    τ^{m}_{n+2}    (1 − δ²) τ^{m+2}_{n+2} | = 0  (5)
through the variable transformation

u^m_n = (τ^{m+1}_{n+1} τ^{m−1}_{n−1}) / (τ^{m−1}_{n+1} τ^{m+1}_{n−1}).  (6)
If we set

δ = tanh(L/(2ε)),  τ^m_n = e^{T^m_n/ε},  u^m_n = e^{U^m_n/ε}  (7)

and take the limit ε ↓ 0, we have the udSG equation for U^m_n,

max[ −|L| + U^{m+1}_{n+1} + U^{m−1}_{n−1},  |L| − U^{m−1}_{n+1},  |L| − U^{m+1}_{n−1} ]
  = max[ |L| − U^{m−1}_{n+1} − U^{m+1}_{n−1},  U^{m+1}_{n+1},  U^{m−1}_{n−1} ]  (8)
from (4) and, for T^m_n,

max[ −|L| + T^{m+2}_{n+2} + T^{m}_{n} + T^{m−2}_{n−2},
     |L| + T^{m−2}_{n+2} + T^{m+2}_{n} + T^{m}_{n−2},
     |L| + T^{m}_{n+2} + T^{m−2}_{n} + T^{m+2}_{n−2} ]
  = max[ |L| + T^{m−2}_{n+2} + T^{m}_{n} + T^{m+2}_{n−2},
         T^{m+2}_{n+2} + T^{m−2}_{n} + T^{m}_{n−2},
         T^{m}_{n+2} + T^{m+2}_{n} + T^{m−2}_{n−2} ]  (9)

from (5), and the relation between T^m_n and U^m_n,

U^m_n = T^{m+1}_{n+1} + T^{m−1}_{n−1} − T^{m−1}_{n+1} − T^{m+1}_{n−1}  (10)

from (6). Refer to [8] for more details about the udSG equation and its soliton solutions.
3. Oscillatory Solution
For the purpose of our discussion, we give the 2-soliton solution of (5). Let p_j, q_j be parameters satisfying the dispersion relation

δ² (p_j² + 1)(q_j² + 1) = (p_j² − 1)(q_j² − 1)  (11)

and a_j be arbitrary phase constants. Phases x_j and interaction factors b_{jk} are defined by

x_j = p_j^n q_j^m,  (12)

b_{jk} = (p_j² − p_k²)² / ((p_j p_k)² − 1)²,  (13)

respectively. In terms of these notations, the 2-soliton solution is written as

τ^m_n = 1 + a_1 x_1 + a_2 x_2 + a_1 a_2 b_{12} x_1 x_2.  (14)
Now, we construct the 2-periodic solution by a specific setting of parameters in (14). Let us set

p_2 = −p_1,  q_2 = q_1,  a_1 = α_1 + α_2,  a_2 = α_2.  (15)

Then (14) is reduced to

τ^m_n = 1 + (α_1 + 2α_2) x_1  (n: even),
τ^m_n = 1 + α_1 x_1  (n: odd).  (16)

The phase constant in (16) depends on whether n is an even number or an odd number. This structure plays a crucial role in the 2-periodic behaviour of the solution.
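The reduction from (14) to (16) is purely algebraic: with p_2 = −p_1 we get x_2 = (−1)^n x_1, and the interaction factor b_{12} vanishes because p_2² = p_1². The sketch below (ours, for illustration) checks this numerically; the sample values of p_1, q_1, α_1, α_2 are chosen freely, since the identity does not rely on the dispersion relation (11).

```python
def tau_two_soliton(n, m, p1, q1, alpha1, alpha2):
    """2-soliton tau function (14) evaluated under the parameter choice (15)."""
    p2, q2 = -p1, q1
    a1, a2 = alpha1 + alpha2, alpha2
    x1, x2 = p1**n * q1**m, p2**n * q2**m             # x2 = (-1)^n * x1
    b12 = (p1**2 - p2**2)**2 / ((p1 * p2)**2 - 1)**2  # = 0 since p2^2 = p1^2
    return 1 + a1 * x1 + a2 * x2 + a1 * a2 * b12 * x1 * x2

# reduction (16): the phase constant alternates with the parity of n
p1, q1, al1, al2 = 2.0, 3.0, 0.5, 0.25
for n, m in [(2, 1), (3, 1), (4, 2), (5, 2)]:
    x1 = p1**n * q1**m
    expected = 1 + (al1 + 2 * al2) * x1 if n % 2 == 0 else 1 + al1 * x1
    assert abs(tau_two_soliton(n, m, p1, q1, al1, al2) - expected) < 1e-9
print("reduction (16) verified")
```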
Let us ultradiscretize (16). First, we put

p_1 = e^{P_1/ε},  q_1 = e^{Q_1/ε},  α_1 = e^{A_1/ε},  α_2 = e^{A_2/ε}  (A_1 < A_2),  (17)

and take the limit ε ↓ 0. Then we have the ultradiscrete analogue of (16),

T^m_n = max(0, P_1 n + Q_1 m + A_2)  (n: even),
T^m_n = max(0, P_1 n + Q_1 m + A_1)  (n: odd).  (18)

Note that P_1 and Q_1 should satisfy the dispersion relation

|P_1 + Q_1| = |L| + |P_1 − Q_1|,  (19)

which is obtained by ultradiscretizing (11). Substituting (18) into (10), we obtain U^m_n solving (8). For general parameters, the solution describes a travelling pulse with oscillation. In order to emphasize its periodic behaviour, we set P_1 = Q_1 = |L|/2, which satisfies (19), and introduce new independent variables (k, l) by

n = k − l,  m = k + l.  (20)
Figs. 1–3 show the behaviour of U^m_n for various values of the parameters A_1, A_2. In all cases, the solution gives a localized pulse for fixed time l. Each pulse is almost stable, and its shape changes in l with period 2. Hence, this solution clearly describes oscillatory phenomena. Furthermore, its behaviour is similar to that of the breather solution.
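As an independent consistency check (ours, not part of the paper), one can evaluate T^m_n from (18) with the parameters of Fig. 1 (L = 2, P_1 = Q_1 = 1, A_1 = 1, A_2 = 2), build U^m_n through (10), and confirm that both sides of the udSG equation (8) agree on a grid:

```python
def T(n, m, P1=1, Q1=1, A1=1, A2=2):
    """2-periodic ultradiscrete tau (18): phase constant A2 for even n, A1 for odd n."""
    return max(0, P1 * n + Q1 * m + (A2 if n % 2 == 0 else A1))

def U(n, m):
    """U^m_n from T^m_n via (10)."""
    return T(n + 1, m + 1) + T(n - 1, m - 1) - T(n - 1, m + 1) - T(n + 1, m - 1)

def udsg_residual(n, m, L=2):
    """Difference of the two sides of the udSG equation (8); zero when (8) holds."""
    lhs = max(-abs(L) + U(n + 1, m + 1) + U(n - 1, m - 1),
              abs(L) - U(n - 1, m + 1),
              abs(L) - U(n + 1, m - 1))
    rhs = max(abs(L) - U(n - 1, m + 1) - U(n + 1, m - 1),
              U(n + 1, m + 1),
              U(n - 1, m - 1))
    return lhs - rhs

# the residual vanishes on the grid; the pulse values oscillate between 1 and 2
print(max(abs(udsg_residual(n, m)) for n in range(-8, 8) for m in range(-8, 8)))
```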
Fig. 1. An example of an oscillatory solution (plots of U^{k+l}_{k−l} against k for even and odd l). L = 2, P1 = Q1 = 1, A1 = 1, A2 = 2.
Fig. 2. An example of an oscillatory solution (plots of U^{k+l}_{k−l} against k for even and odd l). L = 2, P1 = Q1 = 1, A1 = 1, A2 = 5.
Fig. 3. An example of an oscillatory solution (plots of U^{k+l}_{k−l} against k for even and odd l). L = 2, P1 = Q1 = 1, A1 = 1, A2 = 10.
For the sake of constructing solutions with richer structure, we consider the 4-soliton solution

τ^m_n = 1 + Σ_{j=1}^{4} a_j x_j + Σ_{1≤j<k≤4} a_j a_k b_{jk} x_j x_k
        + Σ_{1≤j<k<l≤4} a_j a_k a_l b_{jk} b_{jl} b_{kl} x_j x_k x_l
        + a_1 a_2 a_3 a_4 b_{12} b_{13} b_{14} b_{23} b_{24} b_{34} x_1 x_2 x_3 x_4.  (21)
If we impose (15) and

p_4 = −p_3,  q_4 = q_3,  a_3 = α_3 + α_4,  a_4 = α_4,  (22)

then we have

τ^m_n = 1 + (α_1 + 2α_2) x_1 + (α_3 + 2α_4) x_3
          + (α_1 + 2α_2)(α_3 + 2α_4) b_{13} x_1 x_3  (n: even),
τ^m_n = 1 + α_1 x_1 + α_3 x_3 + α_1 α_3 b_{13} x_1 x_3  (n: odd).  (23)
Moreover, setting

p_j = e^{P_j/ε},  q_j = e^{Q_j/ε},  α_j = e^{A_j/ε}  (A_1 < A_2, A_3 < A_4)  (24)

and taking the limit ε ↓ 0, we have

T^m_n = max[ 0, X_1 + A_2, X_3 + A_4,
             X_1 + X_3 + A_2 + A_4 + 2(|P_1 − P_3| − |P_1 + P_3|) ]  (n: even),
T^m_n = max[ 0, X_1 + A_1, X_3 + A_3,
             X_1 + X_3 + A_1 + A_3 + 2(|P_1 − P_3| − |P_1 + P_3|) ]  (n: odd),  (25)

where X_j = P_j n + Q_j m is the ultradiscrete limit of x_j.
The solution U^m_n constructed from (25) and (10) describes the interaction of oscillating pulses. We consider the specific case P_1 = Q_1 = |L|/2, P_3 = Q_3 = −|L|/2 and introduce the independent variables (k, l) defined by (20). We observe pulses which are almost stable and change their shape with period 2 (see Fig. 4).

We can obtain solutions describing larger numbers of oscillating pulses by starting from the (2N)-soliton solution. We note, however, that there are only two choices of P, Q such that P = Q and (19) holds, namely P = Q = ±|L|/2. Hence, the oscillatory solutions constructed from the (2N)-soliton solution may be understood as nonlinear superpositions of the solutions given in this section.
Fig. 4. An example of an oscillatory solution with richer structure (plots of U^{k+l}_{k−l} against k for even and odd l). L = 2, P1 = Q1 = 1, P3 = Q3 = −1, A1 = 1, A2 = 5, A3 = 1, A4 = 10.
4. Concluding Remarks
We have given exact solutions of the udSG equation which describe oscillatory phenomena. They are considered to be a counterpart of the breather solution. It is an interesting problem to construct oscillatory solutions for other ultradiscrete systems by applying the procedure developed in Section 3. We also remark that the period of oscillation of our solutions is essentially 2 by construction. It is a future problem to find ultradiscrete systems having solutions with arbitrary periods.
References
[1] S. Wolfram, A New Kind of Science, Wolfram Media, Inc., Champaign, 2002.
[2] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.
[3] D. Takahashi, T. Tokihiro, B. Grammaticos, Y. Ohta and A. Ramani, Constructing solutions to the ultradiscrete Painlevé equations, J. Phys. A: Math. Gen., 30 (1997), 7953–7966.
[4] A. Nobe, Ultradiscretization of elliptic functions and its applications to integrable systems, J. Phys. A: Math. Gen., 39 (2006), L335–L342.
[5] S. Isojima, B. Grammaticos, A. Ramani and J. Satsuma, Ultradiscretization without positivity, J. Phys. A: Math. Gen., 39 (2006), 3663–3672.
[6] J. Matsukidaira, J. Satsuma, D. Takahashi, T. Tokihiro and M. Torii, Toda-type cellular automaton and its N-soliton solution, Phys. Lett. A, 225 (1997), 287–295.
[7] M. Murata, S. Isojima, A. Nobe and J. Satsuma, Exact solutions for discrete and ultradiscrete modified KdV equations and their relation to box-ball systems, J. Phys. A: Math. Gen., 39 (2006), L27–L34.
[8] S. Isojima, M. Murata, A. Nobe and J. Satsuma, An ultradiscretization of the sine-Gordon equation, Phys. Lett. A, 331 (2004), 378–386.
[9] R. Hirota, Nonlinear partial difference equations III; discrete sine-Gordon equation, J. Phys. Soc. Japan, 43 (1977), 2079–2086.
JSIAM Letters Vol.1 (2009) pp.28–31 ©2009 Japan Society for Industrial and Applied Mathematics
Computational and Symbolic Anonymity in an Unbounded Network
Hubert Comon-Lundh1,2, Yusuke Kawamoto3 and Hideki Sakurada4
Ecole Normale Superieure de Cachan, 1
Research Center for Information Security, National Institute of Advanced Industrial Science and Technology, Akihabara-Daibiru Room 1102, 1–18–13, Sotokanda, Chiyoda-ku, Tokyo 101–0021, Japan2
Graduate School of Information Science and Technology, The University of Tokyo, 7–3–1 Hongo, Bunkyo-ku, Tokyo 113–8656, Japan3
NTT Communication Science Laboratories, NTT Corporation, 3–1 Morinosato-Wakamiya,Atsugi, Kanagawa 243–0198, Japan4
E-mail h.comon-lundh at aist.go.jp
Received October 17, 2008, Accepted February 24, 2009 (INVITED PAPER)
Abstract
We provide a formal model for protocols using ring signatures and prove that this model is computationally sound: if there is an attack in the computational world, then there is an attack in the formal (abstract) model. Our original contribution is that we consider security properties, such as anonymity, which are not properties of a single execution trace, while considering an unbounded number of sessions of the protocol.
Keywords computational soundness, security protocols, communicating processes, ring signatures
Research Activity Group Formal Approach to Information Security
1. Introduction
There are two main approaches to protocol security. The first approach considers an attacker modeled as a probabilistic polynomial time interactive Turing machine (PPT), and the protocol is an unbounded number of copies of PPTs. The attacker is assumed to control the network and can schedule the communications and send fake messages. The security property is defined as an indistinguishability game: the protocol is secure if, for any attacker A, the probability that A gets an advantage in this game is negligible. A typical example is the anonymity property, by which an attacker should not be able to distinguish between two networks in one of which identities have been switched. The problem with such computational security notions is the difficulty of obtaining detailed proofs: they are in general unmanageable, and cannot be verified by automatic tools.
The second approach relies on a formal model: bitstrings are abstracted by formal expressions (terms), the attacker is any formal process, and security properties, such as anonymity, can be expressed by the observational equivalence of processes. This model is much simpler: there is no coin tossing, no complexity bounds, and the attacker is given only a fixed set of primitive operations (the function symbols in the term algebra). Therefore it is not surprising that security proofs become much simpler and can sometimes be automated. However, the drawback is that we might miss some attacks because the model might be too rough.
Starting with the seminal work of Abadi and Rogaway [1], there have been several results showing the computational soundness of the formal models: we do not miss any attacks when considering the abstract model, provided that the security primitives satisfy certain properties, for instance IND-CPA or IND-CCA in the case of encryption. Such results allow one to perform formal symbolic proofs, while yielding computational security guarantees. It is therefore an approach that is relevant, in principle, to all protocol security proofs.

The present paper is a contribution to this line of research. Until recently, only a few security properties were considered in the soundness results. Roughly speaking, only passive attackers or properties of execution traces were considered. However, several properties, such as anonymity, cannot be expressed as a property satisfied by all execution traces. Here we consider an active attacker and indistinguishability properties. In [1], the authors only consider a passive attacker and encryption schemes, while we are considering ring signatures and active intruders: we cannot rely on their results in the present paper.
This problem has been discussed in two recent papers. In [2], we reported a soundness result for the anonymity of ring signatures. However, we assumed only a fixed number of instances of the protocol, which is a strong simplification. Furthermore, the symbolic model gave quite a lot of power to the attacker, and the soundness proof was dedicated to anonymity. In [3], there are no such restrictions; however, the results are limited to symmetric encryption, which does not provide any hint as regards an adequate formal model for ring signatures.

The current paper bridges these two recent studies: we
consider a formal model for ring signatures and prove the soundness of observational equivalence for an unbounded number of sessions.
2. Ring Signatures
The aim of a ring signature is to enable verification of a signature without revealing which member of a group of signers produced it.

A ring signature scheme RS = (G, S, V) consists of two probabilistic algorithms G and S, and a deterministic algorithm V:

• The key-generation algorithm G, given a security parameter 1^η, outputs a private signing key and a public verification key.
• The signing algorithm S, given a signing key, a set of verification keys and a message, outputs a signature for the message.
• The verification algorithm V, given a set of verification keys, a message, and a signature, outputs 0 or 1.

If a signature is produced by S with keys generated by G, then the verification of the signature always succeeds.
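To fix notation, the sketch below mocks the (G, S, V) interface in Python and checks the correctness condition above. It is a deliberately insecure placeholder of our own: the "signature" embeds the signing key, so it has neither anonymity nor unforgeability; it only illustrates the shapes of the three algorithms.

```python
import hashlib
import os

def G():
    """Key generation: a random signing key and a hash-derived verification key."""
    sk = os.urandom(16)
    return sk, hashlib.sha256(sk).hexdigest()

def _tag(sk, VK, msg):
    # bind the tag to the signing key, the whole ring and the message
    ring_bytes = b"".join(sorted(v.encode() for v in VK))
    return hashlib.sha256(sk + ring_bytes + msg).hexdigest()

def S(sk, VK, msg):
    """'Sign' msg for the ring VK (toy: the signature leaks sk)."""
    return (sk, _tag(sk, VK, msg))

def V(VK, msg, sigma):
    """Verify: the embedded key must hash into the ring and the tag must match."""
    sk, tag = sigma
    ok = hashlib.sha256(sk).hexdigest() in VK and tag == _tag(sk, VK, msg)
    return 1 if ok else 0

sk1, vk1 = G(); sk2, vk2 = G()
ring = {vk1, vk2}
print(V(ring, b"hello", S(sk1, ring, b"hello")))   # 1: correctness always holds
print(V(ring, b"bye", S(sk1, ring, b"hello")))     # 0: wrong message rejected
```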
We consider two security notions for ring signature schemes: existential unforgeability and basic anonymity [4]. A ring signature scheme RS is existentially unforgeable if a signature cannot be forged without knowing the signing key: for any PPT attacker A having access to an oracle O, the following probability is negligible in η:

Pr[ (sk_1, vk_1) ← G(1^η); ··· ; (sk_n, vk_n) ← G(1^η);
    L_leg := {vk_1, ..., vk_n};
    (L, m, σ) ← A^O(1^η, L_leg) :
    L ⊆ L_leg and V(L, m, σ) = 1 and,
    for any i with vk_i ∈ L, neither sign(i, L, m)
    nor corrupt(i) has been queried to O ],

where the oracle O returns σ ← S(sk_i, L, m) when queried with sign(i, L, m), and returns sk_i when queried with corrupt(i).

RS is basically anonymous if the signer of a message cannot be inferred: for any PPT attacker A having access to an oracle O (as above), the following probability is negligible in η:

Pr[ (sk_1, vk_1) ← G(1^η); ··· ; (sk_n, vk_n) ← G(1^η);
    L_leg := {vk_1, ..., vk_n};
    (i_0, i_1, L, m, ω) ← A^O(1^η, L_leg);
    b ←$ {0, 1};
    σ ← S(sk_{i_b}, L, m);
    b′ ← A^O(ω, σ);
    neither corrupt(i_0) nor corrupt(i_1)
    has been queried to O :
    b = b′ ] − 1/2.
In addition, we assume unpredictability, which means that no PPT attacker, even with the signing keys, can predict the output of the signing algorithm. Unpredictability is also assumed in the soundness of symbolic zero-knowledge proofs [5]. It is easily obtained by adding extra random bits to signatures.
3. Symbolic Model
We use a fragment [3] of the applied pi-calculus [6]. Below, we only give the definitions related to ring signatures; for other constructions, refer to [3].
3.1 Terms, predicates and equational theory
The names are split into several disjoint sets:

• identities K: we confuse the identities and the private signing keys held by those identities,
• random symbols R,
• nonces N.

The set T of ground terms is obtained from the names by applying the following function symbols, with some restrictions on the types of their arguments.
• vk(k) constructs a verification key from a signing key k ∈ K,
• ⟨u, v⟩ is a pair consisting of two terms u, v,
• check(u, VK) checks the validity of a signature u w.r.t. a set of verification keys VK = {vk(k_1), ..., vk(k_n)},
• [u]^r_{k,VK} constructs a signature for u ∈ T with a signing key k ∈ K, verification keys VK = {vk(k_1), ..., vk(k_n)} and randomness r ∈ R; two signature terms with the same random symbol r must be identical,
• RR(u, r) modifies the random number used in a signature u, replacing it with r ∈ R,
• π_1(u), π_2(u) retrieve the components of a pair.
These function symbols satisfy certain equations, which we turn into rewrite rules:

π_1(⟨x, y⟩) → x
π_2(⟨x, y⟩) → y
check([x]^r_{y,Z}, Z) → x  if vk(y) ∈ Z
RR([x]^r_{y,Z}, r′) → [x]^{r′}_{y,Z}.
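These four rules can be run mechanically. The sketch below is an illustration of ours: terms are encoded as nested Python tuples, a ring as a frozenset of vk-terms, and terms are normalized by innermost rewriting. Note how `check` fires only when the ring attached to the signature matches the one supplied, with vk(y) in it.

```python
# Term encoding (ours, for illustration): ("pair", u, v), ("pi1", u), ("pi2", u),
# ("vk", k), ("sig", body, key, rand, ring), ("check", u, ring), ("rr", u, rand);
# names are plain strings and a ring is a frozenset of ("vk", k) terms.

def step(t):
    """Apply one rewrite rule at the root, or return None if none applies."""
    if t[0] == "pi1" and t[1][0] == "pair":
        return t[1][1]                                   # pi1(<x, y>) -> x
    if t[0] == "pi2" and t[1][0] == "pair":
        return t[1][2]                                   # pi2(<x, y>) -> y
    if (t[0] == "check" and t[1][0] == "sig"
            and t[1][4] == t[2] and ("vk", t[1][2]) in t[2]):
        return t[1][1]                                   # check([x]^r_{y,Z}, Z) -> x
    if t[0] == "rr" and t[1][0] == "sig":
        _, body, key, _, ring = t[1]
        return ("sig", body, key, t[2], ring)            # RR swaps the randomness
    return None

def normalize(t):
    """Innermost normalization: rewrite the subterms first, then the root."""
    if isinstance(t, (str, frozenset)):
        return t
    t = (t[0],) + tuple(normalize(a) for a in t[1:])
    r = step(t)
    return normalize(r) if r is not None else t

ring = frozenset({("vk", "k"), ("vk", "k2")})
sig = ("sig", ("pair", "m1", "m2"), "k", "r", ring)
print(normalize(("pi1", ("check", sig, ring))))   # 'm1'
print(normalize(("rr", sig, "r2"))[3])            # 'r2'
```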
This defines an (infinite) convergent term rewriting system on terms. The normal form of u is written as u↓.
We also introduce predicate symbols that reflect the (maximal) distinguishing capabilities of an attacker:

• M is the well-formedness predicate on ground terms: M(u) is true if u is in normal form and u does not contain the symbols π_1, π_2, check, RR.
• EQ is the strict equality predicate: EQ(u, v) holds if u = v and both terms are well-formed.
• SK is true on pairs of well-formed terms (k, [s]^r_{k,V}): an attacker who knows a signing key can check whether that key is used for signing a given message.
3.2 Frames and static equivalence
A frame is a sequence of ground terms in which some names (typically secret keys) n are hidden: φ = νn.u_1, ..., u_k. We let bn(φ) be n. The frames record the sequences of messages sent over the network. With each frame φ = νn.s_1, ..., s_m, we associate a substitution σ_φ that replaces the variable x_i with s_i.
A term s is deducible from a frame φ, which we write as φ ⊢ s, if there is a term u with m variables, not using the names hidden in φ, such that uσ_φ↓ = s. This captures the possible attacker's computations on a sequence of messages.
Two frames φ_1, φ_2 are equivalent, written φ_1 ∼ φ_2, if, for any terms u, v (with m variables and not using the names hidden by the frames), M(uσ_{φ_1}↓) holds iff M(uσ_{φ_2}↓) holds and, for P ∈ {EQ, SK}, P(uσ_{φ_1}↓, vσ_{φ_1}↓) holds iff P(uσ_{φ_2}↓, vσ_{φ_2}↓) holds. In words: when we apply any combination of functions to the two frames, the results always look similar.
Examples

νn, k, r, k′, r′. [n]^{r′}_{k′,V}, [n]^r_{k,V}  ∼  νn′, k, r, k′, r′. [n′]^r_{k,V}, [n′]^{r′}_{k′,V}

since the attacker can only observe an equality between the two signed messages.

[n]^{r′}_{k,V}  ≁  [n′]^r_{k,V}

as soon as n ≠ n′ since, unlike the previous example, n, n′ are not hidden, and so can be used by the attacker: EQ(check(x, V), n) holds on the first message and not on the second.

νk, νr. [u]^r_{k,V}, k  ≁  νk, νk′, νr. [u]^r_{k,V}, k′

since SK is true on the first sequence and not on the second.
3.3 Computation trees, symbolic equivalence
If φ is a frame, we let K(φ) be the set of keys deducible from φ.
A computation tree is a tree whose nodes are labeled with states (out of a set Q) and frames, and whose edges are labeled with terms. We write t −u→ t′ if there is an edge labeled with u departing from the root of t and yielding the subtree t′. φ(t) is the frame labeling the root of t and q(t) is the state labeling the root of t.

∼ is extended to computation trees: ∼ is the largest equivalence relation on trees such that, if t_1 ∼ t_2, then

• φ(t_1) ∼ φ(t_2),
• if t_1 −u_1→ t′_1, then there exist u_2, t′_2 such that t_2 −u_2→ t′_2 and t′_1 ∼ t′_2, and
• if t_2 −u_2→ t′_2, then there exist u_1, t′_1 such that t_1 −u_1→ t′_1 and t′_1 ∼ t′_2.
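On finite trees, the coinductive definition above collapses to a recursive check: the root frames must be equivalent, and every move of one tree must be matched by some move of the other. The sketch below is an illustration of ours; `frames_eq` stands in for the frame equivalence ∼, which is mocked here by a trivial predicate.

```python
# Trees encoded (for illustration) as (frame, {label: subtree}) pairs; the
# labels themselves need not match, only the subtrees they lead to.

def tree_eq(t1, t2, frames_eq):
    """Recursive check of the conditions for t1 ~ t2 on finite trees."""
    f1, kids1 = t1
    f2, kids2 = t2
    if not frames_eq(f1, f2):
        return False
    # every edge of one tree is matched by some edge of the other, and conversely
    return (all(any(tree_eq(s1, s2, frames_eq) for s2 in kids2.values())
                for s1 in kids1.values())
            and all(any(tree_eq(s1, s2, frames_eq) for s1 in kids1.values())
                    for s2 in kids2.values()))

# toy frames are tuples of atoms; mock ~ on frames by comparing lengths only
frames_eq = lambda a, b: len(a) == len(b)
t1 = (("n",), {"u": (("n", "s"), {})})
t2 = (("n'",), {"v": (("n'", "s'"), {})})
print(tree_eq(t1, t2, frames_eq))   # True: one matching move each
```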
3.4 Symbolic equivalence of reduced trees
For each sequence of verification keys, we let the first non-compromised key be its representative. When all subterms [u]^r_{k,VK} of a frame φ are such that k is the representative of the keys in VK, we say that φ is reduced. A computation tree is reduced if all the frames labeling its nodes are reduced.
Let ≃ be the equivalence relation on frames defined by: νn_1.u_1 ≃ νn_2.u_2 iff there are renamings ρ_1 of n_1 and ρ_2 of n_2 such that ρ_1(u_1) = ρ_2(u_2). ≃ is extended to computation trees in the same way as ∼ was extended.
Lemma 1 Let t_1, t_2 be two reduced computation trees. Then t_1 ∼ t_2 iff t_1 ≃ t_2.
3.5 Processes
A protocol is specified as a simple process, which is a parallel composition of processes that repeatedly receive a message, test it, and send messages. Each test is specified by a conjunction of atomic predicates. Each message is assumed to include its intended recipient.
Each process P in the calculus can be associated with a computation tree t_P that records all possible interactions with the network: labels of edges are messages from the attacker, and nodes are labeled with the state of the network and the record of messages that have already been sent.
4. Computational Interpretation
4.1 Computational interpretation of terms
Given a security parameter η and an interpretation τ of names as bitstrings, a computational interpretation [[t]]^τ_η of each term t is defined as in [3]. We assume that the interpretation of a ring signature [u]^r_{k,VK} contains the interpretations of u and VK in addition to the signature bitstring ms: [[[u]^r_{k,VK}]]^τ_η = ([[u]]^τ_η, ms, [[VK]]^τ_η). We also assume that verification keys come with a certificate: the attacker cannot generate such keys itself and must get them from an authority.
4.2 Computational indistinguishability of computation trees
Given a security parameter η and an interpretation τ of names as bitstrings, we assume that there is a total injective parsing function κ^τ_η from bitstrings to terms. By injectivity, for every m, [[κ^τ_η(m)]]^τ_η = m.

Given a computation tree t and an assignment τ of names to bitstrings, the oracle O_{t,τ} is defined as follows:

• When queried for the first time with a bitstring m, it returns [[φ(t′)]]^τ_η if t −u→ t′ and u = κ^τ_η(m).
• If there is no edge labeled with κ^τ_η(m) departing from the root of t, it returns an error message.
• After the first query, it behaves as O_{t′,τ}.
t_1 and t_2 are computationally indistinguishable, written t_1 ≈ t_2, if, for any PPT A^O,

| Pr[ τ : A^{O_{t_1,τ}}(0^η) = 1 ] − Pr[ τ : A^{O_{t_2,τ}}(0^η) = 1 ] |

is negligible in the security parameter η.
4.3 Tree soundness
We consider trees without dynamic corruption. In such a tree t, if ψ labels any node of t, we assume that K(ψ) = K(φ(t)): corrupted keys are identical along all branches of the tree.
Given a frame φ and a term u, Ψ_{VK,φ}(u) is the term obtained by replacing the signatures [s]^r_{k,VK} occurring in u with [s]^r_{k′,VK}, where k′ is a minimal element in {k_1 ∈ bn(φ) \ K(φ) | vk(k_1) ∈ VK}. Ψ_{VK} is the function that maps each frame φ to the frame in which all subterms u of φ are replaced with Ψ_{VK,φ}(u).

Ψ_{VK} is extended to computation trees as follows: φ(Ψ_{VK}(t)) = Ψ_{VK}(φ(t)), q(Ψ_{VK}(t)) = q(t) and, if t −u→ t′, then Ψ_{VK}(t) −Ψ_{VK,φ(t)}(u)→ Ψ_{VK}(t′).

Note that all labels of edges departing from a node in Ψ_{VK}(t) are distinct as soon as this is the case for t,
because different random symbols must be used for different signatures.
Lemma 2 For any computation tree t without dynamic corruption, and any set of verification keys VK, Ψ_{VK}(t) ∼ t.

Lemma 3 If t_1 ≃ t_2, then t_1 ≈ t_2.

Lemma 4 Assuming basic anonymity, t ≈ Ψ_{VK}(t).

For this crucial lemma, we need to build a machine B, which breaks basic anonymity, from a machine A that distinguishes t and Ψ_{VK}(t). Roughly speaking, B simulates the network, keeping its state in memory, and behaves as A: when A sends a query m, B parses m, computes the next state and obtains the symbolic reply u. Then B computes [[u]]^τ_η, possibly sending requests to the signing oracle. When such a request would yield different answers depending on whether A interacts with t or Ψ_{VK}(t), then B requests a signed message and guesses the signer according to the guess of A.

Lemma 5 (Tree soundness for ring signatures) Assuming basic anonymity, if t, t′ are computation trees without dynamic corruption such that t ∼ t′, then t ≈ t′.

Proof sketch We successively apply Ψ_{VK} to all sets of verification keys VK occurring in the tree and apply Lemmas 1–4. (QED)
4.4 Trace mapping
We assume here that there is no occurrence of RR or SK in the protocol and that M can be implemented in PTIME. For any simple process P, security parameter η and random tape τ, [[P]]^τ_η is the computational interpretation of P. It behaves as P, except that it sends, receives and compares bitstrings instead of terms. Given a PPT attacker A, a sample τ and the network [[P]]^τ_η, the execution of A‖[[P]]^τ_η yields a (computational) message sequence. Mesg(P, η, τ, A) is the set of messages that are produced by either the agents or the attacker along the execution of A‖[[P]]^τ_η. The execution of A‖[[P]]^τ_η is fully abstracted by a path p of a computation tree t if the message sequence is the computational interpretation of the sequence of symbolic messages in p.

We show that the message sequences of A‖[[P]]^τ_η are fully abstracted by some path of t_P, with an overwhelming probability. First we identify the cases in which a computational trace cannot be fully abstracted:
Lemma 6 Let P be a simple process, A a PPT attacker, η a security parameter and τ a random tape. Then one of the following conditions holds:

• The execution of A‖[[P]]^τ_η is fully abstracted by some path p in t.
• In the execution of A‖[[P]]^τ_η, the attacker A sends a message m after receiving the messages in [[φ]]^τ_η, and there is a message m_1, polynomially computable from m, that is neither a pair nor a signature of a group with a corrupted member and such that φ ⊬ κ^τ_η(m_1).
• There are a subterm v of κ^τ_η(Mesg(P, η, τ, A)) and a verification key VK such that check(v, VK) is in normal form, while V([[v]]^τ_η) = 1.
• There are a name k occurring in P and a term u occurring as a subterm in κ^τ_η(Mesg(P, η, τ, A)) such that τ(k) = [[u]]^τ_η and k ≠ u.
• There are a name k occurring in P and a term vk(u) occurring as a subterm in κ^τ_η(Mesg(P, η, τ, A)) such that [[vk(k)]]^τ_η = [[vk(u)]]^τ_η and k ≠ u.
• There are a term [u]^r_{k,Z} occurring as a subterm in κ^τ_η(Mesg(P, η, τ, A)) and a term [u]^{r′}_{k′,Z} such that [[[u]^r_{k,Z}]]^τ_η = [[[u]^{r′}_{k′,Z}]]^τ_η and [u]^r_{k,Z} ≠ [u]^{r′}_{k′,Z}.
Lemma 7 Assume unforgeability and unpredictability. Let P be a simple process, t its process computation tree, and A a PPT attacker. With an overwhelming probability over all samples τ, there is a path p in t that fully abstracts the computational message sequence of A‖[[P]]^τ_η.

Proof sketch Assume that the probability is not overwhelming. Then one of the cases in the previous lemma, except the first, holds with a non-negligible probability. For each of these cases, we can construct a PPT attacker that breaks either unforgeability or unpredictability by simulating [[P]]^τ_η and calling A as a subroutine. (QED)
5. Soundness of Observational Equivalence

The anonymity of a protocol is specified by the equivalence between, for example, two simple processes P_0(k_0)‖P_1(k_1) and P_0(k_1)‖P_1(k_0), where k_0 and k_1 are the identities (signing keys) of two agents. (We omit the details of how we publish vk(k_0) and vk(k_1).) The symbolic anonymity P_0(k_0)‖P_1(k_1) ∼ P_0(k_1)‖P_1(k_0) implies the computational anonymity [[P_0(k_0)‖P_1(k_1)]] ≈ [[P_0(k_1)‖P_1(k_0)]] thanks to the soundness theorem below.
Theorem 1 Assume basic anonymity, unforgeability and unpredictability. Let P and Q be simple processes and A be a PPT attacker. If P ∼ Q, then [[P]] ≈ [[Q]].

Proof sketch As shown in [3], P ∼ Q implies t_P ∼ t_Q. Then t_P ≈ t_Q follows from Lemma 5, and [[P]] ≈ [[Q]] follows from Lemma 7. (QED)
References
[1] M. Abadi and P. Rogaway, Reconciling two views of cryptography (the computational soundness of formal encryption), J. Cryptology, 15(2) (2002), 103–127.
[2] Y. Kawamoto, H. Sakurada and M. Hagiya, Computationally sound symbolic anonymity of a ring signature, in: Proc. FCS-ARSPA-WITS'08, pp. 161–175, 2008.
[3] H. Comon-Lundh and V. Cortier, Computational soundness of observational equivalence, in: Proc. CCS'08, pp. 109–118, ACM, 2008.
[4] A. Bender, J. Katz and R. Morselli, Ring signatures: Stronger definitions, and constructions without random oracles, in: Theory of Cryptography, Proc. TCC'06, Lect. Notes Comp. Sci., pp. 60–79, Springer-Verlag, 2006.
[5] M. Backes and D. Unruh, Computational soundness of symbolic zero-knowledge proofs against active attackers, in: Proc. CSF'08, pp. 255–269, IEEE Computer Society, 2008.
[6] M. Abadi and C. Fournet, Mobile values, new names, and secure communication, in: Proc. POPL'01, pp. 104–115, ACM, 2001.
JSIAM Letters Vol.1 (2009) pp.32–35 ©2009 Japan Society for Industrial and Applied Mathematics
Reformulation of the Anderson method
using singular value decomposition
for stable convergence in self-consistent calculations
Akitaka Sawamura1
Sumitomo Electric Industries, Ltd., 1-1-3, Shimaya, Konohana-ku, Osaka 554-0024, Japan1
E-mail [email protected]
Received March 18, 2009, Accepted April 20, 2009
Abstract
The Anderson method provides a significant acceleration of convergence in solving nonlinear simultaneous equations by trying to minimize the residual norm in a least-square sense at each iteration step. In the present study I use singular value decomposition to reformulate the Anderson method. The proposed version contains only a single parameter which should be determined in a trial-and-error way, whereas the original one contains two. This reduction leads to stable convergence in real-world self-consistent electronic structure calculations.
Keywords nonlinear simultaneous equations, least-square method, the Broyden method, the Pulay method, electronic-structure calculations
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
In the past few years, first-principles calculations based on density functional theory [1] have gained enormous interest among solid-state physicists, materials scientists, and quantum chemists. The Kohn-Sham equation [2], which plays a vital role within density functional theory, is not only an eigenvalue problem but also an implicitly defined, nonlinear fixed-point problem for the interelectron potential [3–5], at least when the local density approximation [2] is introduced. In other words, the Kohn-Sham equation is solved when the self-consistent interelectron potential is found. The Anderson method [6] is frequently employed for this purpose. It should be noted that the Pulay method [7] and limited-memory modifications [8–11] of the second Broyden method [12] are essentially equivalent to the Anderson method [13], while the first Broyden method can also be cast into the limited-memory form [14, 15].
Suppose that for a system of nonlinear equations ~F (~x) = ~0, there are independent variable column vectors,
~xn, ~xn−1, . . . , ~xn−k ,
which are hopefully approaching a solution, and accompanying residual column vectors,
~yn, ~yn−1, . . . , ~yn−k ,
where subscripts denote iteration steps. In a simple iteration method, the independent vector at the (n+1)th iteration step is given by
~xn+1 = ~xn + α~yn, (1)
where α is a mixing factor ranging from a scalar to a preconditioning matrix [16, 17] to a nonlinear procedure [18, 19]. In the Anderson method, however, a virtual residual vector,

~y⋆n = ~yn + Σ1≤ν≤k γν (~yn−ν+1 − ~yn−ν)/‖~yn−ν+1 − ~yn−ν‖, (2)

is introduced. Here the γν are parameters determined so that the virtual residual norm ‖~y⋆n‖ is minimized in a least-square sense. Then an accompanying virtual independent vector,
~x⋆n = ~xn + Σ1≤ν≤k γν (~xn−ν+1 − ~xn−ν)/‖~yn−ν+1 − ~yn−ν‖, (3)

is defined on the assumption of linearity. ~x⋆n is expected to be a minimizer of ‖~F‖ within the available subspace ~xn, ~xn−1, . . . , ~xn−k. Last, the independent variable vector for the next step is predicted by applying the simple iteration method to ~x⋆n and ~y⋆n as
~xn+1 = ~x⋆n + α~y⋆n. (4)
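The update (1)–(4) can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the author's code; the function name `anderson_step` and the toy linear residual F(x) = b − Ax are my own assumptions for the demonstration.

```python
import numpy as np

def anderson_step(xs, ys, alpha):
    """One Anderson update, eqs. (1)-(4).  xs = [x_n, ..., x_{n-k}] are the
    stored iterates and ys = [y_n, ..., y_{n-k}] the residuals F(x)."""
    x_n, y_n = xs[0], ys[0]
    if len(xs) == 1:                      # no history yet: simple mixing (1)
        return x_n + alpha * y_n
    norms = [np.linalg.norm(ys[i] - ys[i + 1]) for i in range(len(ys) - 1)]
    # columns of normalized residual/iterate differences, cf. (2), (3)
    Y = np.column_stack([(ys[i] - ys[i + 1]) / norms[i] for i in range(len(ys) - 1)])
    X = np.column_stack([(xs[i] - xs[i + 1]) / norms[i] for i in range(len(xs) - 1)])
    # gamma minimizing ||y_n + Y gamma|| in the least-square sense
    gamma = np.linalg.lstsq(Y, -y_n, rcond=None)[0]
    y_star = y_n + Y @ gamma              # virtual residual (2)
    x_star = x_n + X @ gamma              # virtual iterate (3)
    return x_star + alpha * y_star        # prediction (4)

# toy usage: fixed point of F(x) = b - A x, i.e. the solution of A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
F = lambda x: b - A @ x
xs, ys = [np.zeros(2)], [F(np.zeros(2))]
for _ in range(30):
    if np.linalg.norm(ys[0]) < 1e-12:
        break
    x = anderson_step(xs[:6], ys[:6], alpha=0.1)   # keep at most k = 5 differences
    xs.insert(0, x)
    ys.insert(0, F(x))
```

For a linear residual the least-square step is exact once the difference vectors span the space, so the toy problem converges in a handful of iterations.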
In practice, a specialized linear solver should be used to determine the parameters γν reliably without encountering numerical instability. This means that a maximum condition number must be set for the linear solver beforehand. Moreover, a limit on the number of previous independent and residual vectors considered must also be set beforehand. Since the two parameters cannot be obtained a priori, they are determined in an ad hoc way. In the present study I eliminate the latter by reformulating the Anderson method based on singular value decomposition (SVD) [20]. This makes application of the Anderson method a little easier. Furthermore, stable convergence is achieved in the sense that the numbers of iteration steps required to reach self-consistency are
less sensitive to the remaining parameters, as confirmed by test calculations.
2. Conventional Method
For simplicity I define a rectangular matrix as

Yn = ((~yn − ~yn−1)/‖~yn − ~yn−1‖, (~yn−1 − ~yn−2)/‖~yn−1 − ~yn−2‖, . . . , (~yn−k+1 − ~yn−k)/‖~yn−k+1 − ~yn−k‖), (5)
and a column vector containing the γν as

Γ = (γ1, γ2, . . . , γk)⊤. (6)

I omit the right-pointing arrow above Γ to emphasize that in general Γ differs from the ~xν and ~yν in the number of rows. Using Yn and Γ, (2) is rewritten as

~y⋆n = ~yn + YnΓ. (7)
The formal solution of Γ which minimizes ‖~y⋆n‖ is given by

Γ = −(Yn⊤Yn)⁻¹Yn⊤~yn. (8)
Determining Γ using (8) literally should be discouraged because of the potentially large condition number of Yn⊤Yn. Instead, Γ is computed in the following way. First, by the SVD, Yn is factorized into

Yn = WnΣnVn⊤, (9)

where Wn and Vn are matrices containing the left and right singular vectors of Yn, respectively, while Σn is a diagonal matrix of the singular values. Then a corresponding truncated factorization,

Yn ≈ Ȳn = W̄nΣ̄nV̄n⊤, (10)

is considered. Here Σ̄n is a diagonal matrix of the l largest singular values of Σn, while W̄n and V̄n⊤ contain the l column vectors of Wn and the l row vectors of Vn⊤ corresponding to the l largest singular values, respectively. l, the effective rank of Yn, is the largest integer such that the condition number of Σ̄n does not exceed the first predetermined limit smax. Of course, l can be equal to k. Last, Γ is given by

Γ = −V̄nΣ̄n⁻¹W̄n⊤~yn. (11)
At the next iteration step, Yn+1 may be set to be

Yn+1 = ((~yn+1 − ~yn)/‖~yn+1 − ~yn‖, Yn)
     = ((~yn+1 − ~yn)/‖~yn+1 − ~yn‖, (~yn − ~yn−1)/‖~yn − ~yn−1‖, . . . , (~yn−k+1 − ~yn−k)/‖~yn−k+1 − ~yn−k‖). (12)
The usual practice is, however, that if k has reached the second predetermined limit kmax, the rightmost (oldest) column of the right-hand side of (12) is removed as

Yn+1 = ((~yn+1 − ~yn)/‖~yn+1 − ~yn‖, (~yn − ~yn−1)/‖~yn − ~yn−1‖, . . . , (~yn−k+2 − ~yn−k+1)/‖~yn−k+2 − ~yn−k+1‖), (13)

to avoid excessive growth.
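Steps (9)–(11) amount to a least-square solve with a condition-number cutoff. A minimal NumPy sketch follows; the function name and the random test data are mine, not from the paper.

```python
import numpy as np

def gamma_truncated_svd(Y, y_n, s_max):
    """Solve min ||y_n + Y @ g|| via the truncated SVD, cf. (9)-(11):
    keep the l largest singular values so that cond(Sigma_l) <= s_max."""
    W, s, Vt = np.linalg.svd(Y, full_matrices=False)   # (9): Y = W diag(s) Vt
    l = int(np.sum(s >= s[0] / s_max))                 # effective rank of Y
    # (11): g = -V_l Sigma_l^{-1} W_l^T y_n
    return -(Vt[:l].T / s[:l]) @ (W[:, :l].T @ y_n)

# usage: for a well-conditioned Y and a large s_max this reproduces the
# plain least-square solution (8)
rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 3))
y = rng.standard_normal(6)
g_svd = gamma_truncated_svd(Y, y, 1e12)
g_ls = np.linalg.lstsq(Y, -y, rcond=None)[0]
```

Lowering `s_max` simply drops the smallest singular directions, which is exactly how the maximum condition number enters the conventional method.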
3. Proposed Method
Along with (5), I define a rectangular matrix containing the independent variable vectors as

Xn = ((~xn − ~xn−1)/‖~yn − ~yn−1‖, (~xn−1 − ~xn−2)/‖~yn−1 − ~yn−2‖, . . . , (~xn−k+1 − ~xn−k)/‖~yn−k+1 − ~yn−k‖). (14)
Since the matrix W̄n of the l retained left singular vectors satisfies W̄n = YnV̄nΣ̄n⁻¹, a similar quantity,

X̄n = XnV̄nΣ̄n⁻¹, (15)

is introduced. ~x⋆n and ~y⋆n are computed by working with X̄n and W̄n as

~x⋆n = ~xn + X̄nΓ′ (16)

and

~y⋆n = ~yn + W̄nΓ′, (17)

respectively, where Γ′ is obtained by

Γ′ = −W̄n⊤~yn. (18)
At the next iteration step, Xn+1 and Yn+1 are updated by

Xn+1 = ((~xn+1 − ~xn)/‖~yn+1 − ~yn‖, X̄n) (19)

and

Yn+1 = ((~yn+1 − ~yn)/‖~yn+1 − ~yn‖, W̄n), (20)

respectively, where X̄n and W̄n are the truncated factors from (15) and (10).
respectively. Xn and Yn consist of the l column vectorswhile by construction l ≤ k holds. Therefore Xn+1 andYn+1 are unlikely to fatten endlessly even if no limit isimposed. No column in Xn+1 and Yn+1 has to be dis-carded artificially.
Since W̄n represents the numerically effective subspace spanned by Yn, replacing (12) with (20) loses little of the information carried in Yn+1, even in the case of l < k. This no longer holds when the oldest column is discarded as in (13). Furthermore, if kmax is set too large while smax is kept moderate in the conventional method, the predicted ~xn+1 may be contaminated by excessively old ~xν and ~yν, because Γ in (11) is a minimum-norm least-square solution. Therefore the proposed method is expected to outperform the conventional one.
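Eqs. (15)–(20) can be sketched as one reusable step plus a driver that prepends the newest normalized difference columns to the compressed pair. Again a hedged NumPy illustration on a toy linear residual, not the author's implementation; all names are mine.

```python
import numpy as np

def svd_anderson_step(x_n, y_n, X, Y, alpha, s_max):
    """One step of the SVD-reformulated Anderson method.
    X, Y: histories of normalized differences as in (14) and (5), newest
    column first.  Returns x_{n+1} and the compressed pair of (15), (10)."""
    W, s, Vt = np.linalg.svd(Y, full_matrices=False)
    l = int(np.sum(s >= s[0] / s_max))     # effective rank, cond <= s_max
    Ybar = W[:, :l]                        # truncated left singular vectors
    Xbar = X @ (Vt[:l].T / s[:l])          # eq. (15)
    g = -Ybar.T @ y_n                      # eq. (18)
    x_star = x_n + Xbar @ g                # eq. (16)
    y_star = y_n + Ybar @ g                # eq. (17)
    return x_star + alpha * y_star, Xbar, Ybar

# driver on the toy residual F(x) = b - A x; the compressed pair is
# prepended with the newest column as in (19), (20)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
F = lambda x: b - A @ x
x_prev, y_prev = np.zeros(2), F(np.zeros(2))
x = x_prev + 0.1 * y_prev                  # one simple-mixing step to start
y = F(x)
d0 = np.linalg.norm(y - y_prev)
X = ((x - x_prev) / d0)[:, None]
Y = ((y - y_prev) / d0)[:, None]
for _ in range(20):
    if np.linalg.norm(y) < 1e-12:
        break
    x_new, Xbar, Ybar = svd_anderson_step(x, y, X, Y, alpha=0.1, s_max=1e8)
    y_new = F(x_new)
    d = np.linalg.norm(y_new - y)
    X = np.column_stack([(x_new - x) / d, Xbar])   # eq. (19)
    Y = np.column_stack([(y_new - y) / d, Ybar])   # eq. (20)
    x, y = x_new, y_new
```

Note that the history never needs an explicit kmax: the truncation inside `svd_anderson_step` bounds the number of stored columns by the effective rank l.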
4. Test Calculations
The conventional and proposed methods are compared by applying them to first-principles calculations for wurtzite ZnO based on the plane-wave pseudopotential
Table 1. Iterations required to reach self-consistency for wurtzite ZnO with various maximum condition numbers smax and history data limits kmax. The maximum numbers of history data reached in the proposed method are shown in parentheses. Note that lattice parameters and atomic positions in the unit cell are optimized.

α = 0.2
1/smax                  3×10−1  1×10−1  3×10−2  1×10−2  3×10−3  1×10−3  3×10−4  1×10−4
Conventional, kmax = 5      74      67      65      49      51      48      50      50
Conventional, kmax = 10     74      54      56      46      47      42      42      45
Conventional, kmax = 20     55      62      51      47      47      42      47      43
Conventional, kmax = 40     56      62      54      65      56      48      46      44
Proposed                43(14)  42(14)  42(15)  45(20)  52(27)  42(28)  42(28)  42(28)

α = 0.4
1/smax                  3×10−1  1×10−1  3×10−2  1×10−2  3×10−3  1×10−3  3×10−4  1×10−4
Conventional, kmax = 5      53      52      46      45      45      45      44      44
Conventional, kmax = 10     50      45      44      42      39      38      37      38
Conventional, kmax = 20     58      42      44      47      42      41      40      39
Conventional, kmax = 40     46      58      46      40      43      39      41      41
Proposed                38(12)  40(13)  36(15)  40(21)  40(23)  42(26)  41(27)  41(27)

α = 0.8
1/smax                  3×10−1  1×10−1  3×10−2  1×10−2  3×10−3  1×10−3  3×10−4  1×10−4
Conventional, kmax = 5      39      33      33      31      32      31      32      32
Conventional, kmax = 10     36      34      34      31      30      31      31      31
Conventional, kmax = 20     35      35      37      35      33      32      34      34
Conventional, kmax = 40     37      35      35      34      32      33      34      34
Proposed                 32(6)  31(11)  32(16)  33(16)  34(20)  34(20)  34(20)  34(20)

α = 1.6
1/smax                  3×10−1  1×10−1  3×10−2  1×10−2  3×10−3  1×10−3  3×10−4  1×10−4
Conventional, kmax = 5     100      59      50      46      45      45      45      45
Conventional, kmax = 10    139      59      42      40      40      37      39      39
Conventional, kmax = 20    167      63      45      40      41      37      38      37
Conventional, kmax = 40     72      68      71      47      45      37      38      40
Proposed                41(13)  37(13)  38(18)  38(20)  39(23)  40(25)  40(26)  40(26)
approach [21, 22]. Lattice parameters and atomic positions in the unit cell are also optimized. The remaining technical details are explained elsewhere [23]. The mixing factor α is chosen to be a scalar parameter.
The parameters and results are shown in Table 1. For α = 0.8, both methods achieve fast convergence, with iteration counts below 40. Clearly, however, the proposed method is less sensitive to the selection of α and smax: self-consistency is almost always reached within about 40 steps. In contrast, when the parameters are chosen poorly, for example at α = 1.6 and 1/smax = 3 × 10−1, the conventional method requires more than 100 iteration steps depending on kmax. More importantly, finding the optimal kmax seems to be difficult: although for α = 1.6 the iteration counts increase with kmax up to kmax = 20, the fastest convergence is achieved at kmax = 40. On the whole, a larger smax is desirable for the conventional method, but a guiding principle for choosing kmax is unclear. Table 1 also shows the maximum l reached within the proposed method. These values might be taken as the best kmax for the conventional method. At kmax near these values, however, the conventional method does not necessarily show performance comparable to that of the proposed one. This is likely because discarding the oldest column is not the best strategy to keep Xn and Yn from excessive growth, as pointed out in the previous section.
5. Conclusion
Reformulation of the Anderson method for a system of nonlinear equations has been described. In practice, the Anderson method requires two empirical parameters: one commanding how stably the least-square problem appearing at each iteration step is solved, and one commanding how many vectors containing the convergence-history information are retained. In the proposed method the SVD is used to extract the effective information from the history vectors, rather than as a black-box tool for solving the least-square problem, and the extracted vectors play the role of storage space for the history information. Thereby the latter empirical parameter is no longer needed. This makes the proposed method less sensitive to the selection of the remaining parameter and the mixing factor, and more efficient because of a smarter way of discarding the redundant part of the history information, as supported by the stable convergence in the test calculations.
References
[1] P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Phys. Rev., 136 (1964), B864–B871.
[2] W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Phys. Rev., 140 (1965), A1133–A1138.
[3] P. Bendt and A. Zunger, New approach for solving the density-functional self-consistent-field problem, Phys. Rev. B, 26 (1982), 3114–3137.
[4] J. F. Annett, Efficiency of algorithms for Kohn-Sham density functional theory, Comput. Mater. Sci., 4 (1995), 23–42.
[5] X. Gonze, Toward a potential-based conjugate gradient algorithm for order-N self-consistent total energy calculations, Phys. Rev. B, 54 (1996), 4383–4386.
[6] D. G. Anderson, Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Mach., 12 (1965), 547–560.
[7] P. Pulay, Convergence acceleration of iterative sequence: the case of SCF iteration, Chem. Phys. Lett., 73 (1980), 393–398.
[8] G. P. Srivastava, Broyden's method for self-consistent field convergence acceleration, J. Phys. A: Math. Gen., 17 (1984), L317–L321.
[9] D. Vanderbilt and S. G. Louie, Total energies of diamond (111) surface reconstructions by a linear combination of atomic orbitals method, Phys. Rev. B, 30 (1984), 6118–6130.
[10] D. D. Johnson, Modified Broyden's method for accelerating convergence in self-consistent calculations, Phys. Rev. B, 38 (1988), 12807–12813.
[11] A. Sawamura, M. Kohyama, T. Keishi and M. Kaji, Acceleration of self-consistent electronic-structure calculations: Storage-saving and multiple-secant implementation of the Broyden method, Mater. Trans., JIM, 40 (1999), 1186–1192.
[12] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Math. Comput., 19 (1965), 577–593.
[13] V. Eyert, A comparative study on methods for convergence acceleration of iterative vector sequences, J. Comput. Phys., 124 (1996), 271–285.
[14] M. S. Engelman, G. Strang and K.-J. Bathe, The application of quasi-Newton method in fluid mechanics, Intern. J. Numer. Meth. Eng., 17 (1981), 707–718.
[15] R. H. Byrd, J. Nocedal and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Math. Progm., 63 (1994), 129–156.
[16] G. P. Kerker, Efficient iteration scheme for self-consistent pseudopotential calculations, Phys. Rev. B, 23 (1981), 3082–3084.
[17] K.-M. Ho, J. Ihm and J. D. Joannopoulos, Dielectric matrix scheme for fast convergence in self-consistent electronic-structure calculations, Phys. Rev. B, 25 (1982), 4260–4262.
[18] D. Raczkowski, A. Canning and L. W. Wang, Thomas-Fermi charge mixing for obtaining self-consistency in density functional calculations, Phys. Rev. B, 64 (2001), 121101-1–121101-4.
[19] A. Sawamura and M. Kohyama, A second-variational prediction operator for fast convergence in self-consistent electronic structure calculations, Mater. Trans., JIM, 45 (2004), 1422–1428.
[20] G. H. Golub and C. F. van Loan, Matrix Computations, Second edition, Johns Hopkins Univ. Press, London, 1989.
[21] J. Ihm, A. Zunger and M. L. Cohen, Momentum-space formalism for the total energy of solids, J. Phys. C: Solid State Phys., 12 (1979), 4409–4422.
[22] W. E. Pickett, Pseudopotential methods in condensed matter applications, Comput. Phys. Rep., 9 (1989), 115–197.
[23] A. Sawamura, M. Kohyama and T. Keishi, An efficient preconditioning scheme for plane-wave-based electronic structure calculations, Comput. Mater. Sci., 14 (1999), 4–7.
JSIAM Letters Vol.1 (2009) pp.36–39 ©2009 Japan Society for Industrial and Applied Mathematics
On the qd-type discrete hungry Lotka-Volterra system
and its application to the matrix eigenvalue algorithm
Akiko Fukuda1, Emiko Ishiwata2, Masashi Iwasaki3 and Yoshimasa Nakamura4

Graduate School of Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan1
Department of Mathematical Information Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan2
Faculty of Life and Environmental Sciences, Kyoto Prefectural University, 1-5 Nagaragi-cho, Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan3
Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan4
E-mail [email protected]
Received December 22, 2008, Accepted May 25, 2009
Abstract
The discrete hungry Lotka-Volterra (dhLV) system has already been shown to be applicable to a matrix eigenvalue algorithm. In this paper, we discuss a form of the dhLV system named the qd-type dhLV system and associate it with a matrix eigenvalue computation. In a way similar to the dqd algorithm, we also design a new algorithm without cancellation in terms of the qd-type dhLV system.
Keywords discrete hungry Lotka-Volterra system, dqd algorithm, matrix eigenvalue
Research Activity Group Applied Integrable Systems
1. Introduction
Integrable systems have some relationships to numerical algorithms. For example, the continuous-time Toda equation corresponds to one step of the QR algorithm [1] for computing eigenvalues of a symmetric tridiagonal matrix. A discretization of the Toda equation is just the quotient difference (qd) algorithm [2]. The discrete Toda (dToda) equation also leads to a new algorithm for the Laplace transformation [3]. The discrete relativistic Toda equation is applicable to continued fraction expansion [4].
Some of the authors designed a new algorithm, named the dLV algorithm, for computing singular values of a bidiagonal matrix in terms of the integrable discrete Lotka-Volterra (dLV) system [5]. For k = 1, 2, . . . , 2m − 1 and n = 0, 1, . . . ,

u_k^(n+1) (1 + δ^(n+1) u_{k−1}^(n+1)) = u_k^(n) (1 + δ^(n) u_{k+1}^(n)), (1)
u_0^(n) ≡ 0, u_{2m}^(n) ≡ 0,

where δ^(n) is the n-th discrete step-size and u_k^(n) denotes the number of the k-th species at the discrete time Σ_{j=0}^{n−1} δ^(j). It is shown in [6] that u_{2k−1}^(n) and u_{2k}^(n) converge to a certain positive constant and to zero, respectively, as n → ∞. The dLV algorithm is also surveyed in a recent review paper [7].
Now we introduce the new variables

q_k^(n) := (1/δ^(n)) (1 + δ^(n) u_{2k−2}^(n))(1 + δ^(n) u_{2k−1}^(n)), (2)
e_k^(n) := δ^(n) u_{2k−1}^(n) u_{2k}^(n). (3)
Then the dLV system (1) yields the recursion formula of the qd algorithm

q_{k+1}^(n+1) = q_{k+1}^(n) − e_k^(n+1) + e_{k+1}^(n),
e_k^(n+1) = e_k^(n) q_{k+1}^(n) / q_k^(n+1). (4)

As mentioned above, this recursion formula is equivalent to the dToda equation; namely, the dLV system has a relationship to the dToda equation. Rutishauser introduced a modified version, named the dqd (differential qd) algorithm [2], for the purpose of avoiding numerical instability of the qd algorithm.
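The change of variables (2), (3) and the recursion (4) can be checked numerically: the qd variables built from u^(n+1) coincide with those produced by applying (4) to the variables built from u^(n). A sketch under the assumption of a constant step size δ (helper names are mine):

```python
import numpy as np

def dlv_step(u, delta):
    # one step of the dLV system (1); boundary u_0 = u_{2m} = 0
    u_new = np.empty_like(u)
    for k in range(len(u)):
        left = u_new[k - 1] if k > 0 else 0.0
        right = u[k + 1] if k + 1 < len(u) else 0.0
        u_new[k] = u[k] * (1 + delta * right) / (1 + delta * left)
    return u_new

def qd_vars(u, delta):
    # q_k^(n), e_k^(n) of (2), (3) for k = 1..m, from u = (u_1, ..., u_{2m-1})
    m = (len(u) + 1) // 2
    ue = np.concatenate([[0.0], u, [0.0]])   # ue[j] = u_j for j = 0..2m
    q = np.array([(1 + delta * ue[2 * k - 2]) * (1 + delta * ue[2 * k - 1]) / delta
                  for k in range(1, m + 1)])
    e = np.array([delta * ue[2 * k - 1] * ue[2 * k] for k in range(1, m + 1)])
    return q, e

delta, m = 1.0, 3
rng = np.random.default_rng(1)
u = rng.uniform(0.5, 2.0, size=2 * m - 1)
q, e = qd_vars(u, delta)
u_next = dlv_step(u, delta)
q1, e1 = qd_vars(u_next, delta)
# the qd recursion (4), with e_0^(n+1) := 0 at the boundary:
ok_first = np.isclose(q1[0], q[0] + e[0])    # q_1^(n+1) = q_1^(n) + e_1^(n)
ok_e = all(np.isclose(e1[k], e[k] * q[k + 1] / q1[k]) for k in range(m - 1))
ok_q = all(np.isclose(q1[k + 1], q[k + 1] - e1[k] + e[k + 1]) for k in range(m - 1))
```

All three checks hold for any positive starting data, which is the content of the dLV-to-qd correspondence stated above.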
Recently, in [8, 9], we designed a new algorithm, named the dhLV algorithm, for computing complex eigenvalues of a certain band matrix. The dhLV algorithm is derived from the integrable discrete hungry Lotka-Volterra (dhLV) system [10]. For k = 1, 2, . . . , M_m and n = 0, 1, . . . ,

u_k^(n+1) ∏_{j=1}^{M} (1 + δ^(n+1) u_{k−j}^(n+1)) = u_k^(n) ∏_{j=1}^{M} (1 + δ^(n) u_{k+j}^(n)), (5)
u_{1−M}^(n) ≡ 0, . . . , u_0^(n) ≡ 0, u_{M_m+1}^(n) ≡ 0, . . . , u_{M_m+M}^(n) ≡ 0,

where M_k := (k − 1)M + k, and the meaning of u_k^(n) is the same as in the dLV system. The dLV system (1) is a prey-predator model in which the k-th species is the predator of the (k + 1)-th species. On the other hand, the dhLV system (5) is derived by considering the case where the k-th species is the predator of the (k + 1)-th, (k + 2)-th, . . . , (k + M)-th species. Of course, if M = 1, then (5)
coincides with (1).
In this paper, we discuss a new algorithm for computing matrix eigenvalues from the viewpoint of the qd-type dhLV system based on (5); see Section 3 for the qd-type dhLV system. In a way similar to the dqd algorithm, we derive a recursion formula without subtraction.
This paper is organized as follows. In Section 2, we describe some properties of the dhLV system. In Section 3, we show two invariants of the qd-type dhLV system. In Section 4, we clarify the relationship between the qd-type dhLV system and the matrix eigenvalue algorithm, design an algorithm for computing eigenvalues without cancellation, and demonstrate a numerical example. In the final section, we give concluding remarks.
2. Some properties of the dhLV system
In this section, we briefly explain some properties of the dhLV system. The matrix representation of (5) is given as

R^(n) L^(n+1) = L^(n) R^(n), (6)
L^(n) := (e_2, . . . , e_{M+1}, U_1^(n) e_1 + e_{M+2}, . . . , U_{M_m−1}^(n) e_{M_m−1} + e_{M_m+M}, U_{M_m}^(n) e_{M_m}), (7)
R^(n) := (V_1^(n) e_1 + δ^(n) e_{M+2}, . . . , V_{M_m−1}^(n) e_{M_m−1} + δ^(n) e_{M_m+M}, V_{M_m}^(n) e_{M_m}, . . . , V_{M_m+M}^(n) e_{M_m+M}), (8)
e_k := (0, . . . , 0, 1, 0, . . . , 0)^⊤ (with the 1 in the k-th entry), (9)
U_k^(n) := u_k^(n) ∏_{j=1}^{M} (1 + δ^(n) u_{k−j}^(n)), (10)
V_k^(n) := ∏_{j=0}^{M} (1 + δ^(n) u_{k−j}^(n)). (11)
Eq. (6) is called the Lax form of the dhLV system (5), cf. [11, 12]. Assume that 0 < u_k^(0) < K_0 for k = 1, 2, . . . , M_m; then 0 < u_k^(n) < K, as shown in [8, 9], where K_0 is arbitrary and K is a related positive constant. By (11), if δ^(n) > 0 holds for n = 0, 1, . . . , then V_k^(n) ≥ 1 holds for k = 1, 2, . . . , M_m + M in the Lax matrix (8). Hence the inverse matrix of R^(n) exists, and (6) can be rewritten as

L^(n+1) = (R^(n))⁻¹ L^(n) R^(n). (12)

This is a similarity transformation from L^(n) to L^(n+1); namely, the eigenvalues of L^(n) are invariant under the evolution from n to n + 1. Moreover, the eigenvalues of L^(n) + dI are invariant for any n, where I is the unit matrix and d is an arbitrary constant.
The asymptotic behavior of the dhLV system is as follows:

lim_{n→∞} u_{M_k}^(n) = c_k, k = 1, 2, . . . , m, (13)
lim_{n→∞} u_{M_k+p}^(n) = 0, p = 1, 2, . . . , M. (14)

See [9] for the proof of (13) and (14). By combining (10) and (11) with (13) and (14), it is obvious that the limits of U_k^(n) and V_k^(n) also exist. As n → ∞, the Lax matrix L^(n) + dI converges to

L(d) := lim_{n→∞} (L^(n) + dI), (15)

a block lower bidiagonal matrix with diagonal blocks L_1(d), L_2(d), . . . , L_m(d) and subdiagonal blocks E_M. Here L_k(d) and E_M are (M+1) × (M+1) block matrices: L_k(d) has d on the diagonal, 1 on the subdiagonal and c_k in its top-right corner, while E_M has a single 1 in its top-right corner and zeros elsewhere.
It is of significance to note that L^(n) + dI can be divided into several block matrices. The characteristic polynomial of L(d) is given as

det(λI − L(d)) = ∏_{k=1}^{m} ((λ − d)^{M+1} − c_k).

Therefore, we obtain the eigenvalues λ_{k,l} of L^(0) + dI as follows:

λ_{k,l} = c_k^{1/(M+1)} (cos(2lπ/(M+1)) + i sin(2lπ/(M+1))) + d,
l = 1, 2, . . . , M + 1, k = 1, 2, . . . , m,

where i = √−1. For a sufficiently large n, the λ_{k,l} become approximate values of the eigenvalues of L^(0) + dI. As a result, the dhLV algorithm was designed in [8, 9] for computing the eigenvalues of L^(0) + dI.
3. Invariants of the qd-type dhLV system
In this section, we investigate some properties of a recursion formula derived from the Lax form (6).
By comparing both sides of (6), the variables U_k^(n) in (7) and V_k^(n) in (8) are seen to satisfy the relations

δ^(n) U_k^(n+1) + V_{k+M+1}^(n) = δ^(n) U_{k+M+1}^(n) + V_{k+M}^(n), (16)
V_k^(n) U_k^(n+1) = U_k^(n) V_{k+M}^(n), k = 1, 2, . . . , M_m. (17)

We call (16) and (17) the qd-type dhLV system. Let us here impose the boundary conditions

V_{k−M}^(n) ≡ 1, U_{k−M}^(n) ≡ 0, k = 0, 1, . . . , M,
V_{M_m+M+k}^(n) ≡ 1, U_{M_m+k}^(n) ≡ 0, k = 1, 2, . . . , M.
The existence of invariants is one of the characteristic properties of integrable systems. We now give two propositions concerning invariants independent of the discrete variable n.
Proposition 1 The variables U_k^(n) satisfy

Σ_{k=1}^{M_m} U_k^(n+1) = Σ_{k=1}^{M_m} U_k^(n). (18)
Proof Take the sum of both sides of (16) over k = −M, −M + 1, . . . , M_m:

Σ_{k=−M}^{M_m} (δ^(n) U_k^(n+1) + V_{k+M+1}^(n)) = Σ_{k=−M}^{M_m} (δ^(n) U_{k+M+1}^(n) + V_{k+M}^(n)).

Expanding the above equation and substituting the boundary conditions, we obtain (18).
(QED)
Proposition 2 The variables U_k^(n) satisfy

∏_{k=1}^{m} U_{M_k}^(n+1) = ∏_{k=1}^{m} U_{M_k}^(n). (19)
Proof Recall that, by (12), L^(n+1) has the same eigenvalues as L^(n) for n = 0, 1, . . . . Then it is obvious that

det(L^(n+1)) = det(L^(n)), n = 0, 1, . . . . (20)

By cofactor expansion, the determinants of L^(n) and L^(n+1) are given as

det(L^(n)) = (−1)^{mM} U_{M_1}^(n) U_{M_2}^(n) · · · U_{M_m}^(n),
det(L^(n+1)) = (−1)^{mM} U_{M_1}^(n+1) U_{M_2}^(n+1) · · · U_{M_m}^(n+1),

respectively. Substituting these expressions into (20), we have

(−1)^{mM} U_{M_1}^(n+1) U_{M_2}^(n+1) · · · U_{M_m}^(n+1) = (−1)^{mM} U_{M_1}^(n) U_{M_2}^(n) · · · U_{M_m}^(n).

This leads to (19).
(QED)
Let us assume that 0 < U_k^(0) < K_0 for k = 1, 2, . . . , M_m, where K_0 is an arbitrary positive constant. Then 0 < Σ_{k=1}^{M_m} U_k^(0) < K_1 and 0 < ∏_{k=1}^{m} U_{M_k}^(0) < K_2, where K_1, K_2 are positive constants related to K_0. Propositions 1 and 2 then imply that 0 < Σ_{k=1}^{M_m} U_k^(n) < K_1 and 0 < ∏_{k=1}^{m} U_{M_k}^(n) < K_2. Under the assumption 0 < u_k^(0) < K_0, it is concluded that 0 < U_k^(n) < K_3 for n = 0, 1, . . . , where K_3 is a positive constant related to K_0. Note that the time evolution is performed in arithmetic that assures the positivity of the variables. This property is important for designing numerical algorithms.
4. The qd-type dhLV system and matrix eigenvalues
In this section, we propose an application of the qd-type dhLV system to matrix eigenvalue computation. Assume that the limit of δ^(n) as n → ∞ exists, and let δ∗ := lim_{n→∞} δ^(n). Taking account of (10) and (11), the limits of U_k^(n) and V_k^(n) also exist as n → ∞. Namely,

lim_{n→∞} U_{M_k}^(n) = c_k, k = 1, 2, . . . , m,
lim_{n→∞} U_{M_k+p}^(n) = 0, p = 1, 2, . . . , M,
lim_{n→∞} V_{M_k+p}^(n) = δ∗ c_k + 1, p = 0, 1, . . . , M.
We simply rewrite the qd-type dhLV system (16) and (17) as the following recursion formulas:

U_k^(n+1) = U_k^(n) V_{k+M}^(n) / V_k^(n), (21)
V_k^(n) = δ^(n) U_k^(n) + V_{k−1}^(n) − δ^(n) U_{k−M−1}^(n) V_{k−1}^(n) / V_{k−M−1}^(n). (22)
The time evolution from n to n + 1 in (21) with (22) is applicable to computing the eigenvalues of L^(0) + dI. For U_k^(0) > 0, the time evolution in (21) with (22) generates the same matrix as (15), where V_k^(0) is calculated once U_k^(0) is given. In other words, the eigenvalues computed by (21) with (22) are theoretically equal to those computed by the dhLV algorithm.
In finite-precision arithmetic, however, it is doubtful whether the time evolution in (21) with (22) is performed with high accuracy, because cancellation by subtraction may occur. Subtraction also appears in the recursion formula of the qd algorithm.
Rutishauser [2] recognized some numerical instability of the qd algorithm (4), in which the variables q_k^(n) and e_k^(n) are not related to the dLV variables u_k^(n). He therefore introduced a modified version, named the dqd (differential qd) algorithm [2], for the purpose of avoiding numerical instability. In a way similar to the dqd algorithm, we derive a recursion formula without subtraction. Let us introduce the new variable

P_k^(n) := V_{k−1}^(n) − δ^(n) U_{k−M−1}^(n) V_{k−1}^(n) / V_{k−M−1}^(n). (23)

Then P_k^(n) satisfies the recursion formula

P_k^(n) = (V_{k−1}^(n) / V_{k−M−1}^(n)) P_{k−M−1}^(n), (24)

where we set P_k^(n) = 1 for k = −M, −M + 1, . . . , 0. By using P_k^(n), (22) is rewritten as

V_k^(n) = δ^(n) U_k^(n) + P_k^(n). (25)
Obviously, (24) and (25) contain no subtraction, so cancellation does not occur. The recursion formula (21) with (24) and (25) is essentially equivalent to the qd-type dhLV system (16) and (17). Note that the ratio V_k^(n)/V_{k−M}^(n) appears in both (21) and (24). Let Q_k^(n) := V_k^(n)/V_{k−M}^(n), and set Q_0^(n) = 1 for n = 0, 1, . . . . Then the time evolution of the qd-type dhLV system is performed by the following Procedure 1, in which U_k^(0) is given by the entries of L^(0) + dI and δ^(n) for n = 0, 1, . . . is an optional parameter. The time evolution requires fewer operations in Procedure 1 than in the original (21) with (22). As shown in [8, 9], the limits c_k = lim_{n→∞} U_{M_k}^(n) determine the eigenvalues of L^(0) + dI. We call this algorithm the
Table 1. Computed eigenvalues of the Toeplitz matrix T by the dhLV and the qd-type dhLV algorithms.

by the dhLV algorithm                      by the qd-type dhLV algorithm
2.00000000000000 + i 1.788617417884120     2.00000000000000 + i 1.788617417884119
0.211382582115879                          0.211382582115880
2.00000000000000 − i 1.788617417884120     2.00000000000000 − i 1.788617417884119
3.78861741788412                           3.78861741788412
2.00000000000000 + i 1.333397829783662     2.00000000000000 + i 1.333397829783662
0.666602170216338                          0.666602170216338
2.00000000000000 − i 1.333397829783662     2.00000000000000 − i 1.333397829783662
3.33339782978366                           3.33339782978366
2.00000000000000 + i 0.5683177818055106    2.00000000000000 + i 0.5683177818055107
1.43168221819449                           1.43168221819448
2.00000000000000 − i 0.5683177818055106    2.00000000000000 − i 0.5683177818055107
2.56831778180551                           2.56831778180551
qd-type dhLV algorithm.
Procedure 1
set the boundary conditions of U_k^(n), V_k^(n), P_k^(n), Q_k^(n)
for n := 0, 1, . . . , nmax do
  for k := 1, 2, . . . , M_m + M do
    P_k^(n) = Q_{k−1}^(n) P_{k−M−1}^(n)
    V_k^(n) = δ^(n) U_k^(n) + P_k^(n)
    Q_k^(n) = V_k^(n) / V_{k−M}^(n)
  end for
  for k := 1, 2, . . . , M_m do
    U_k^(n+1) = Q_{k+M}^(n) U_k^(n)
  end for
end for
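Procedure 1 translates almost line by line into NumPy once an index offset realizes the boundary values U_k = 0 and V_k = P_k = 1 for k ≤ 0. The following is my own sketch, not the authors' implementation, run with the parameters of the Toeplitz example of this section (M = 3, m = 3, U_k^(0) = 1.5, δ = 1); the checks use the invariants of Propositions 1 and 2.

```python
import numpy as np

def qd_type_dhlv(U0, M, delta=1.0, nmax=200):
    """Procedure 1: evolve U_k^(n), k = 1..Mm, by the subtraction-free
    recursion; the offset `off` realizes the negative boundary indices."""
    Mm = len(U0)
    off = M + 1
    U = np.zeros(off + Mm + M + 1)
    U[off + 1 : off + Mm + 1] = U0           # U_k = 0 outside k = 1..Mm
    for _ in range(nmax):
        V = np.ones_like(U)                  # boundary V_k = 1 for k <= 0
        P = np.ones_like(U)                  # boundary P_k = 1 for k <= 0
        Q = np.ones_like(U)                  # boundary Q_0 = 1
        for k in range(1, Mm + M + 1):
            P[off + k] = Q[off + k - 1] * P[off + k - M - 1]
            V[off + k] = delta * U[off + k] + P[off + k]
            Q[off + k] = V[off + k] / V[off + k - M]
        for k in range(1, Mm + 1):
            U[off + k] = Q[off + k + M] * U[off + k]
    return U[off + 1 : off + Mm + 1]

M, m = 3, 3
Mm = (m - 1) * M + m                         # Mm = 9
U = qd_type_dhlv(np.full(Mm, 1.5), M)
c = U[[0, 4, 8]]                             # U at indices M_k = 1, 5, 9 -> c_k
```

The variables at the indices M_k converge to the c_k while the others vanish, and both the sum of all U_k (Proposition 1) and the product of the U_{M_k} (Proposition 2) stay at their initial values 13.5 and 1.5³ throughout the evolution.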
Now we present a numerical experiment carried out on our computer with OS: Windows XP, CPU: Genuine Intel(R) 1.66 GHz, RAM: 1.99 GB, using Wolfram Mathematica 6.0 with double-precision floating-point arithmetic. As a numerical example, we consider a 12 × 12 Toeplitz matrix T as L^(0) + dI with M = 3, m = 3, d = 2 and U_k^(0) = 1.5 for k = 1, 2, . . . , 9. Let δ^(n) = 1.0 for n = 0, 1, . . . .
Table 1 shows the eigenvalues computed by the dhLV algorithm [8, 9] and by the qd-type dhLV algorithm. We see from Table 1 that both algorithms compute the same eigenvalues with almost the same accuracy. The operation counts of the dhLV algorithm and of the qd-type dhLV algorithm are 6M and 5, respectively, for the evolution of one variable from n to n + 1. From the viewpoint of the operation count, the qd-type dhLV algorithm is therefore better than the dhLV algorithm.
5. Concluding remarks
In this paper, we discuss some properties of the qd-type dhLV system. Based on the qd-type dhLV system and its properties, we design a new algorithm for computing complex eigenvalues of a certain band matrix, similar to the dhLV algorithm. In a way similar to the dqd algorithm, we design the qd-type dhLV algorithm so that it is free from subtraction. We also confirm through a numerical example that the new algorithm computes the same eigenvalues as the dhLV algorithm. In order to compare the numerical accuracy and running time of the qd-type dhLV algorithm, with or without subtraction, with those of the dhLV algorithm, more numerical experiments are necessary.
Acknowledgments
The authors thank the reviewer for his careful reading and helpful suggestions. The authors would also like to thank Dr. S. Tsujimoto and Dr. A. Nagai for many fruitful discussions and helpful advice on this work. This work was partially supported by Grants-in-Aid for Young Scientists (B) No. 20740064 and Scientific Research (C) No. 20540137 of the Japan Society for the Promotion of Science.
References
[1] W. W. Symes, The QR algorithm and scattering for the finite nonperiodic Toda lattice, Physica D, 4 (1982), 275–280.
[2] H. Rutishauser, Lectures on Numerical Mathematics, Birkhauser, Boston, 1990.
[3] Y. Nakamura, Calculating Laplace transforms in terms of the Toda molecule, SIAM J. Sci. Comput., 20 (1999), 306–317.
[4] Y. Minesaki and Y. Nakamura, The discrete relativistic Toda molecule equation and a Pade approximation algorithm, Numer. Algorithms, 27 (2001), 219–235.
[5] R. Hirota, Conserved quantities of a "random-time Toda equation", J. Phys. Soc. Japan, 66 (1997), 283–284.
[6] M. Iwasaki and Y. Nakamura, An application of the discrete Lotka-Volterra system with variable step-size to singular value computation, Inverse Problems, 20 (2004), 553–563.
[7] M. T. Chu, Linear algebra algorithms as dynamical systems, Acta Numerica, 17 (2008), 1–86.
[8] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, A numerical factorization of characteristic polynomial by the discrete hungry Lotka-Volterra system (in Japanese), Trans. Japan Soc. Indust. Appl. Math., 18 (2008), 409–425.
[9] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, The discrete hungry Lotka-Volterra system and a new algorithm for computing matrix eigenvalues, Inverse Problems, 25 (2009), 015007.
[10] Y. Nakamura (Ed.), Applied Integrable Systems (in Japanese), Shokabo, Tokyo, 2000.
[11] S. Tsujimoto, R. Hirota and S. Oishi, An extension and discretization of Volterra equation I (in Japanese), IEICE Tech. Rep., NLP 92-90 (1993), 1–3.
[12] S. Tsujimoto, On the discrete Toda lattice hierarchy and orthogonal polynomials (in Japanese), RIMS Kokyuroku, 1280 (2002), 11–18.
JSIAM Letters Vol.1 (2009) pp.40–43 ©2009 Japan Society for Industrial and Applied Mathematics
Eigendecomposition algorithms solving sequentially quadratic systems by Newton method
Koichi Kondo¹, Shinji Yasukouchi¹ and Masashi Iwasaki²

¹ Faculty of Science and Engineering, Doshisha University, 1-3 Tatara Miyakodani, Kyotanabe City 610-0394, Japan
² Faculty of Life and Environmental Sciences, Kyoto Prefectural University, 1-5 Nagaragi-cho, Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan
E-mail [email protected]
Received March 17, 2009, Accepted June 20, 2009
Abstract
In this paper, we design new algorithms for eigendecomposition. With the help of the Newton iterative method, we solve a nonlinear quadratic system whose solution is equal to an eigenvector on a hyperplane. By choosing the normal vector of the hyperplane in the orthogonal complement of the space spanned by already obtained eigenvectors, all eigenpairs are sequentially obtained by solving the quadratic systems.
Keywords eigendecomposition, the Newton method, quadratic method
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
The quadratic method is known as one of the methods for computing all eigenpairs [1]. In this method, the eigenvalue problem is replaced with nonlinear quadratic systems. For an eigenpair, the solution of the quadratic system is computed by using the Newton iterative method. For all eigenpairs, the continuation method is proposed in [1]. The continuation method requires solving not only a quadratic system for the original eigenvalue problem but also many perturbed ones. Furthermore, it often fails to find the desired eigenpairs. Even if it succeeds, the obtained eigenpairs are not always computed with high accuracy. In this paper, we design new eigendecomposition algorithms, different from the continuation method, through solving the quadratic systems with the help of the Newton method. Our algorithms are also not equivalent to the standard inverse iteration method. In some numerical experiments, we show that all eigenvectors are computable by our algorithms.
2. Quadratic method
In this paper, we consider the eigenvalue problem
Ax = λx,  A ∈ C^{n×n},  (1)

where λ ∈ C and x ∈ C^n denote an eigenvalue and the corresponding eigenvector of A, respectively.

Let z be an n-dimensional vector and let (z, x) = z^H x = C for some nonzero constant C, where (·, ·) and the superscript H denote the inner product of two vectors and the complex conjugate transpose, respectively. The case where z = e_k is discussed in [1], where e_k is a unit vector whose kth entry is unity. Noting that λ = λ(x) = (A^H z, x)/C for suitable z, the eigenvector x is given by solving the nonlinear quadratic system

F(x) := Ax − ((w, x)/C) x = 0,  w = A^H z.  (2)
With the help of the Newton iterative method, the solution x is computable by the recurrence formula

x^{(ℓ+1)} = C x̃^{(ℓ+1)} / (z, x̃^{(ℓ+1)}),  ℓ = 0, 1, . . . , ℓmax,
x̃^{(ℓ+1)} = x^{(ℓ)} − J(x^{(ℓ)})^{−1} F(x^{(ℓ)}),
J(x^{(ℓ)}) = A − λ(x^{(ℓ)}) I − x^{(ℓ)} w^H / C,
λ(x^{(ℓ)}) = (w, x^{(ℓ)}) / C,  (3)
where I is the n-dimensional identity matrix and x^{(0)} is an initial vector. See Section 3 for the setting of x^{(0)}. Let ℓ* be the number in (3) such that

‖A x^{(ℓ*)} − λ(x^{(ℓ*)}) x^{(ℓ*)}‖∞ < ǫitr ‖x^{(ℓ*)}‖2  (4)

for small ǫitr. Then x^{(ℓ*)} becomes a good approximation of x in (2). By the normalization x^{(ℓ)} → x^{(ℓ)}/‖x^{(ℓ)}‖2 for each ℓ in (3), the inequality (4) becomes

‖A x^{(ℓ*)} − λ(x^{(ℓ*)}) x^{(ℓ*)}‖∞ < ǫitr.  (5)
Note here that ‖x^{(ℓ)}‖2 = 1 for each ℓ. Moreover, by replacing C with C^{(ℓ)} = (z, x^{(ℓ)}) in (2) and (3), it follows that

x^{(ℓ+1)} = x̃^{(ℓ+1)} / ‖x̃^{(ℓ+1)}‖2,  ℓ = 0, 1, . . . , ℓmax,
x̃^{(ℓ+1)} = x^{(ℓ)} − J(x^{(ℓ)})^{−1} F(x^{(ℓ)}),
J(x^{(ℓ)}) = A − λ(x^{(ℓ)}) I − x^{(ℓ)} w^H / (z, x^{(ℓ)}),
λ(x^{(ℓ)}) = (w, x^{(ℓ)}) / (z, x^{(ℓ)}).  (6)
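As a concrete illustration, iteration (6) can be sketched in NumPy as follows. This is only a sketch: the function name `neig_j`, the defaults, and the convergence tolerance are ours, and safeguards (e.g. against a singular Jacobian) are omitted.

```python
import numpy as np

def neig_j(A, x0, z, lmax=50, eps=1e-12):
    # Newton iteration (6): the hyperplane constant is C^(l) = (z, x^(l)),
    # so lambda(x) = (w, x)/(z, x) with w = A^H z, and the iterate is
    # normalized to unit 2-norm at every step.
    n = A.shape[0]
    w = A.conj().T @ z
    x = x0 / np.linalg.norm(x0)
    lam = np.vdot(w, x) / np.vdot(z, x)
    for _ in range(lmax):
        F = A @ x - lam * x                      # F(x) of Eq. (2) with C = (z, x)
        if np.linalg.norm(F, np.inf) < eps:      # stopping test (5)
            break
        J = A - lam * np.eye(n) - np.outer(x, w.conj()) / np.vdot(z, x)
        x = x - np.linalg.solve(J, F)            # Newton step
        x = x / np.linalg.norm(x)                # normalization
        lam = np.vdot(w, x) / np.vdot(z, x)      # generalized Rayleigh quotient
    return lam, x
```

Each step solves one dense n × n linear system with the Jacobian J; no explicit inverse is formed.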
JSIAM Letters Vol. 1 (2009) pp.40–43 Koichi Kondo et al.
At each ℓ, the hyperplane (z, x) = C^{(ℓ)} is translated without changing its normal vector. We call the algorithm for an eigenpair based on (6) the neig_J algorithm. By applying the Sherman–Morrison formula

(M + u v^H)^{−1} = ( I − M^{−1} u v^H / (1 + (v, M^{−1} u)) ) M^{−1}  (7)

to the inverse J(x^{(ℓ)})^{−1} in (6), we have

x̃^{(ℓ+1)} = [ λ(x^{(ℓ)}) / ( (w, x̂^{(ℓ)})/(z, x̂^{(ℓ)}) − 1 ) ] x̂^{(ℓ)},
x̂^{(ℓ)} = (A − λ(x^{(ℓ)}) I)^{−1} x^{(ℓ)}.  (8)

Hence the following recurrence formula also generates the evolution from ℓ to ℓ + 1 of x^{(ℓ)}:

x^{(ℓ+1)} = x̃^{(ℓ+1)} / ‖x̃^{(ℓ+1)}‖2,  ℓ = 0, 1, . . . , ℓmax,
x̃^{(ℓ+1)} = (A − λ(x^{(ℓ)}) I)^{−1} x^{(ℓ)},
λ(x^{(ℓ)}) = (w, x^{(ℓ)}) / (z, x^{(ℓ)}).  (9)
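The Sherman–Morrison formula (7) is easy to sanity-check numerically; the helper name `sherman_morrison_inv` is ours:

```python
import numpy as np

def sherman_morrison_inv(Minv, u, v):
    # Right-hand side of (7): the inverse of (M + u v^H), given M^{-1}.
    denom = 1.0 + np.vdot(v, Minv @ u)          # 1 + (v, M^{-1} u)
    return (np.eye(len(u)) - (Minv @ np.outer(u, v.conj())) / denom) @ Minv
```

Comparing against a direct inverse of the rank-one update confirms the identity on random data.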
In [2, p. 194], (9) is called a generalized Rayleigh quotient iteration. If λ(x^{(ℓ)}) = λ is given, then the iteration (9) becomes

x^{(ℓ+1)} = x̃^{(ℓ+1)} / ‖x̃^{(ℓ+1)}‖2,  x̃^{(ℓ+1)} = (A − λ I)^{−1} x^{(ℓ)}.  (10)

This is well known as the inverse iteration for computing an eigenvector. The iteration (9) may be regarded as an inverse iteration in which λ(x^{(ℓ)}) is updated at each ℓ by the generalized Rayleigh quotient λ(x^{(ℓ)}) = (z, A x^{(ℓ)})/(z, x^{(ℓ)}). We call the algorithm based on (9) the neig_I algorithm. Though the eigenpair computed by the neig_I algorithm is theoretically the same as that by the neig_J algorithm, the two algorithms obviously differ in numerical accuracy. See Section 4 for numerical accuracy.
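Iteration (9) can likewise be sketched in a few lines; `neig_i` is a hypothetical name, and no breakdown checks (e.g. a nearly singular shifted matrix) are included:

```python
import numpy as np

def neig_i(A, x0, z, lmax=50, eps=1e-12):
    # Iteration (9): inverse iteration with the shift updated at each step
    # by the generalized Rayleigh quotient lambda = (w, x)/(z, x), w = A^H z.
    n = A.shape[0]
    w = A.conj().T @ z
    x = x0 / np.linalg.norm(x0)
    lam = np.vdot(w, x) / np.vdot(z, x)
    for _ in range(lmax):
        if np.linalg.norm(A @ x - lam * x, np.inf) < eps:
            break
        x = np.linalg.solve(A - lam * np.eye(n), x)  # shifted solve of (9)
        x = x / np.linalg.norm(x)                    # normalization
        lam = np.vdot(w, x) / np.vdot(z, x)
    return lam, x
```

Compared with `neig_J` above, only the linear system being solved differs; the two are equivalent in exact arithmetic by (8).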
3. Eigendecomposition algorithm
An eigenpair (λ, x) is computable if a suitable initial vector x^{(0)} is given in (6), (9). The other eigenpairs are also computed by changing x^{(0)} in (6), (9). Namely, we can theoretically compute all eigenpairs by using the neig_∗ algorithm. It is, however, not easy to compute all eigenpairs if x^{(0)} is randomly given. It is well known that a fractal graph arises from the relationship between the initial vector x^{(0)} and the limit lim_{ℓ→∞} x^{(ℓ)} in the Newton iteration method (cf. [3, pp. 237–242]). Namely, we cannot expect to choose x^{(0)} so as to compute a desired eigenpair in the neig_∗ algorithm.

Let x1, . . . , xk be the already obtained eigenvectors, where k < n. We here consider the subspace Wk := ⟨x1, . . . , xk⟩_C and its orthogonal complement Wk^⊥. Since the normal vector z of the hyperplane (z, x^{(ℓ)}) = C^{(ℓ)} is changeable, we may adopt a vector in Wk^⊥ as z. It is remarkable that Wk^⊥ does not include x1, . . . , xk. Let us assume that x^{(ℓ)} converges as ℓ → ∞. Then it is obvious that C^{(ℓ)} ≠ 0 for ℓ = 1, 2, . . . and lim_{ℓ→∞} C^{(ℓ)} ≠ 0. This implies that lim_{ℓ→∞} x^{(ℓ)} ∉ Wk. Hence x^{(ℓ)} → x_{k+1} and λ(x^{(ℓ)}) → λ_{k+1} as ℓ → ∞. Namely, the eigenpair (λ_{k+1}, x_{k+1}) is computable by the neig_∗ algorithm. Similarly, the others are obtained provided that x^{(ℓ)} converges as ℓ → ∞ for each k. Therefore, all eigenpairs are sequentially computed by the following algorithm.
Algorithm 1
01 function [X, D] = sneig_∗(A)
02   t := 0
03   Q = (q1 · · · qn) := I
04   for k = 1, 2, . . . , n
05     z := qk ∈ W⊥_{k−1}
06     f := 0
07     do
08       x^{(0)} := random_vec(n)
09       [xk, λk, Ek] := neig_∗(A, x^{(0)}, z, ℓmax)
10       θ := min_{j=1,...,k−1} angle(xk, xj)
11       t := t + 1; f := f + 1
12       if f ≥ fmax then stop % failed
13     while (Ek ≥ ǫgood or θ ≤ θsame)
14     rk := xk
15     for j = 1, . . . , k − 1
16       rk := rk − αj (hj, rk) hj
17     end
18     [hk, αk] := householder_vec(rk)
19     Q := Q − αk (Q hk) hk^H
20   end
21 X := (x1 · · · xn); D := diag(λ1, . . . , λn)
Here we call Algorithm 1 the sneig_∗ algorithm. The sneig_J and sneig_I algorithms employ the neig_J and neig_I algorithms, respectively.

In the 8th line of Algorithm 1, we choose the initial complex vector x^{(0)} randomly. In the 9th line, by the neig_∗ algorithm, we compute the kth eigenvalue λk, the corresponding eigenvector xk and the residual norm Ek := ‖A xk − λk xk‖∞. As discussed above, the neig_∗ algorithm does not converge for an unsuitable x^{(0)}. We regard the neig_∗ algorithm as not converging if Ek ≥ ǫgood for small ǫgood, and then we perform the neig_∗ algorithm again after changing x^{(0)}. The operations from the 7th line to the 13th line are repeated until Ek < ǫgood. Theoretically, xk is not equal to any of x1, . . . , x_{k−1}. This property is not always guaranteed in double precision arithmetic. In the 10th line, we compute the minimal angle θ := min_{j=1,...,k−1} angle(xk, xj), where

angle(xk, xj) := (180/π) cos^{−1}( |(xk, xj)| / (‖xk‖2 ‖xj‖2) ).  (11)

We regard xk as equal to one of x1, . . . , x_{k−1} if θ ≤ θsame for small θsame, and then we perform the neig_∗ algorithm again after changing x^{(0)}. Let f be the iteration number of the neig_∗ algorithm for an eigenpair. We regard that only a part of the eigenpairs is computed by the sneig_∗ algorithm if f ≥ fmax for the maximal iteration number fmax. In this case, the sneig_∗ algorithm is forcibly stopped in the 12th line.
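The angle test (11) is a one-liner in NumPy; `angle_deg` is a hypothetical name:

```python
import numpy as np

def angle_deg(x, y):
    # Eq. (11): acute angle in degrees between complex vectors x and y.
    c = abs(np.vdot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    return (180.0 / np.pi) * np.arccos(min(c, 1.0))  # clamp guards rounding
```

The absolute value makes the test insensitive to the arbitrary phase (sign) of a computed eigenvector.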
In the 5th line of Algorithm 1, we choose z in the orthogonal complement W⊥_{k−1}. In this paper, for the choice of z we use the QR decomposition based on the Householder transformation. Let X_{k−1} = Q_{k−1} R_{k−1} be the QR decomposition of X_{k−1} = (x1 · · · x_{k−1}), where Q_{k−1} = (q1 · · · qn) ∈ C^{n×n} and R_{k−1} = (r1 · · · r_{k−1}) ∈ C^{n×(k−1)} are the unitary and upper triangular matrices, respectively. Let W_{k−1} = ⟨q1, . . . , q_{k−1}⟩_C. Then it is obvious that W⊥_{k−1} = ⟨qk, . . . , qn⟩_C. This implies that z should be a linear combination of the basis qk, . . . , qn. In Algorithm 1, we set z = qk. From the viewpoint of running time, it is not desirable to compute the QR decomposition of Xk for each k. It is of significance to note here that the columns from the 1st to the (k − 1)th of Rk, Qk are equal to those of R_{k−1}, Q_{k−1}, respectively. Hence, in the kth Householder transformation, we compute only the kth column of Rk. In the lines from the 14th to the 17th, we compute the kth column rk = (r_{1,k} · · · r_{k−1,k} r_{k,k} · · · r_{n,k})^T of Q_{k−1}^H Xk = (r1 · · · r_{k−1} rk) from xk. In the 18th line, we derive hk and αk from rk for computing the Householder matrix Hk := I − αk hk hk^H as follows:

hk = (0 · · · 0  −ζξ  r_{k+1,k} · · · r_{n,k})^T,  (12)
ζ := r_{k,k} / |r_{k,k}|,  η := ( |r_{k,k}|^2 + · · · + |r_{n,k}|^2 )^{1/2},  (13)
ξ := ( |r_{k+1,k}|^2 + · · · + |r_{n,k}|^2 ) / ( |r_{k,k}| + η ),  αk = 1/(ξη),  (14)

where Hk : rk ↦ (r_{1,k} · · · r_{k−1,k}  ζη  0 · · · 0)^T and Hk^H Q_{k−1}^H Xk = Rk = (r1 · · · r_{k−1} rk). In the 19th line, we compute Qk as Qk = Q_{k−1} Hk = Q_{k−1} − αk (Q_{k−1} hk) hk^H. It is remarkable that q1, . . . , q_{k−1} are not changed in the 19th line since hk has 0 from the 1st entry to the (k − 1)th entry. As a result, the sneig_∗ algorithm requires only the operations for one QR decomposition. The Lanczos method with vector orthonormalization by the Householder transformation, without saving the upper triangular matrix, is also shown in [4].
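Eqs. (12)–(14) can be checked with a small NumPy sketch; `householder_vec` is a hypothetical name, and the degenerate cases r_{k,k} = 0 or an all-zero tail are not guarded:

```python
import numpy as np

def householder_vec(r, k):
    # Build (h, alpha) of Eqs. (12)-(14): H = I - alpha h h^H maps r to
    # (r_1, ..., r_{k-1}, zeta*eta, 0, ..., 0)^T  (index k is 0-based here).
    zeta = r[k] / abs(r[k])
    eta = np.linalg.norm(r[k:])
    xi = np.sum(np.abs(r[k + 1:]) ** 2) / (abs(r[k]) + eta)
    alpha = 1.0 / (xi * eta)
    h = np.zeros_like(r)
    h[k] = -zeta * xi            # equals r_k - zeta*eta, written without cancellation
    h[k + 1:] = r[k + 1:]
    return h, alpha
```

The form of ξ avoids the subtraction η − |r_{k,k}|, which is the usual cancellation-free way of building a Householder vector.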
4. Numerical experiments
In this section, we show some numerical experiments with respect to the sneig_∗ algorithms and the inverse iteration based on (10). Let us call the inverse iteration based on (10) the sii algorithm for simplicity. Numerical experiments have been carried out on our computer with OS: Linux 2.6.26, CPU: Intel Core i7, RAM: 2 GB. We also use GNU C Compiler 4.3.2 and LAPACK 3.1.1 [5]. As the test matrix, we adopt the Toeplitz matrix

        ( 2  1               )
        ( 0  2  1            )
    A = ( γ  0  2  1         ),  (15)
        (    γ  .  .  .      )
        (       .  0  2  1   )
        (          γ  0  2   )

i.e., the pentadiagonal Toeplitz matrix with 2 on the diagonal, 1 on the first superdiagonal, 0 on the first subdiagonal, and γ on the second subdiagonal.
In [6], the Toeplitz matrix (15) appears in a numerical test for solvers of linear equations. In the sneig_∗ algorithms, we set ǫitr = 10^{-13}, ℓmax = 50, ǫgood = 5 × 10^{-13}, θsame = 0.3, fmax = 2n.

Fig. 1. Maximal residual norm Emax in the case of test matrix (15) with γ = 1.6. © : sneig_J, : sneig_I, × : sii.

Fig. 2. Ratio of the iteration number t to the matrix size n in the case of test matrix (15) with γ = 1.6. © : sneig_J, : sneig_I, × : sii.

Fig. 3. Minimal angle θmin among the eigenvectors in the case of test matrix (15) with γ = 1.6. © : sneig_J, : sneig_I, × : sii, – : Maple.

The inverse matrices appearing in (6), (9), (10) are computed by using the solver of linear equations with the help of the LAPACK routine zgesv. In the sii algorithm, an eigenvalue and its corresponding eigenvector are computed by the LAPACK routine zgeev and the inverse iteration based on (10), respectively. The initial vector x^{(0)} in (10) is changed if Ek ≥ ǫgood or θ ≤ θsame. Let t be the iteration number of (6), (9), (10) for computing all eigenpairs.
Figs. 1–3 describe the numerical properties in the case where γ = 1.6. No plotted points exist for the cases where the sneig_∗ algorithms stop without computing all eigenpairs. Fig. 1 shows the maximal residual norm

Emax = max_{k=1,...,n} Ek = max_{k=1,...,n} ‖A xk − λk xk‖∞.  (16)
Fig. 4. Computable matrix size nmax. © : sneig_J, : sneig_I, × : sii.

Fig. 5. Minimal angle θmin in the case of n = nmax. © : sneig_J, : sneig_I, × : sii.
By using the sneig_∗ and sii algorithms, Emax becomes O(10^{-13}). Though the eigenvectors seem to be computed with high accuracy by the sneig_∗ and sii algorithms, it is necessary to investigate the angles among the computed eigenvectors; this is shown in the later discussion. Fig. 2 shows the ratio of t to the matrix size n for several n. For n ≤ 40, t increases slightly in the sneig_∗ algorithms. For n ≥ 60, we observe that, by both the neig_∗ algorithm and the inverse iteration for an eigenpair, the computed eigenvector is either not of high accuracy or almost equal to one of the already obtained ones, and the sneig_∗ and sii algorithms then require the change of the initial vector x^{(0)}. This behavior surely depends on the angles among the eigenvectors. Let θmin := min_{1≤i<j≤n} angle(xi, xj) be the minimal angle among the eigenvectors. Fig. 3 shows the relationship between n and θmin. Fig. 3 also includes the numerical results by Maple, where 100-digit arithmetic is performed. For n ≥ 60, θmin is about 1 degree; a part of the eigenvectors are nearly parallel. As the matrix size n increases, θmin by Maple becomes smaller. All eigenvectors computed by the sneig_∗ algorithms are near to those by Maple. The minimal angle θmin by the sii algorithm differs from that by Maple. Let θ*min be the minimal angle among the eigenvectors by Maple. In the sii algorithm, for n ≥ 90, θmin does not satisfy |θmin − θ*min| < 0.03.

Next we investigate the computable maximal matrix size nmax as the entry γ in (15) becomes larger. We regard that the algorithms fail if |θmin − θ*min| ≥ 0.03 as the matrix size n grows. Fig. 4 shows the relationship between γ and nmax. For γ > 1.2, nmax in the sneig_J algorithm is much larger than that in the sneig_I algorithm, and nmax in the sii algorithm is about 0.79 times that in the sneig_J algorithm. Fig. 5 shows the relationship between γ and θmin in the case where the matrix size is equal to nmax. For all γ, θmin in the sneig_J algorithm is almost 0.46. For γ > 1.2, θmin in the sneig_I algorithm is larger than that in the sneig_J algorithm, and θmin in the sii algorithm is slightly larger than that in the sneig_J algorithm. Compared with the results by Maple, it is obvious that the sii algorithm is not of high accuracy. Consequently, the sneig_J algorithm generates the most accurate eigendecomposition among the three algorithms.
5. Conclusion
In this paper, we design new eigendecomposition algorithms based on solving nonlinear quadratic systems. In our algorithms, the existence space of eigenvectors is restricted to a suitable hyperplane, and the eigenvalue problem is replaced with solving the quadratic systems. An eigenpair is computed through solving the quadratic systems with the help of the Newton iterative method. The normal vector of the hyperplane is taken from the orthogonal complement of the space spanned by the already obtained eigenvectors, so the solutions of the quadratic systems are not equal to the already obtained eigenvectors; for any initial vector, the vector computed by the Newton iterative method does not become one of the already obtained eigenvectors. Consequently, all eigenpairs are sequentially computable. Our algorithms are of two types, named the sneig_J algorithm with the Newton iterative method and the sneig_I algorithm with a modified inverse iteration. Our algorithms are compared with the standard inverse iteration from the viewpoint of numerical accuracy. It is shown that the sneig_J algorithm is the best algorithm for computing all eigenvectors with high accuracy in the case where the minimal angle among the eigenvectors is small.
Acknowledgments
The authors thank the reviewer for his careful reading and helpful suggestions.
References
[1] M. B. Elgindi and A. Kharab, The quadratic method for computing the eigenpairs of a matrix, Int. J. Comput. Math., 73 (2000), 517–530.
[2] F. Chatelin, Eigenvalues of Matrices (in Japanese), Springer-Verlag, Tokyo, 2003.
[3] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, Second Edition, John Wiley & Sons, England, 2003.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations, Third Edition, The Johns Hopkins Univ. Press, Baltimore and London, 1996.
[5] LAPACK, http://www.netlib.org/lapack/.
[6] M. H. Gutknecht, Variants of BiCGSTAB for matrices with complex spectrum, SIAM J. Sci. Comput., 14 (1993), 1020–1033.
JSIAM Letters Vol.1 (2009) pp.44–47 ©2009 Japan Society for Industrial and Applied Mathematics
Block BiCGGR: a new Block Krylov subspace method for computing high accuracy solutions
Hiroto Tadano¹, Tetsuya Sakurai¹ and Yoshinobu Kuramashi²

¹ Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
² Graduate School of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
E-mail tadano, [email protected], [email protected]
Received March 19, 2009, Accepted June 15, 2009
Abstract
In this paper, the influence of errors which arise in matrix multiplications on the accuracy of approximate solutions generated by the Block BiCGSTAB method is analyzed. In order to generate high accuracy solutions, a new Block Krylov subspace method named "Block BiCGGR" is also proposed. Some numerical experiments illustrate that the Block BiCGGR method can generate higher accuracy solutions than the Block BiCGSTAB method.
Keywords Block Krylov subspace methods, Block BiCGSTAB, linear systems with multipleright hand sides, high accuracy solutions
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
Linear systems with multiple right hand sides
AX = B, (1)
where A ∈ C^{n×n} and B, X ∈ C^{n×L}, appear in many scientific applications, such as lattice quantum chromodynamics (lattice QCD) calculations of physical quantities [1] and an eigensolver using contour integration [2]. To solve these linear systems for X, some Block Krylov subspace methods (e.g., Block BiCG [3], Block BiCGSTAB [4], Block QMR [5]) have been proposed.

Block Krylov subspace methods can compute approximate solutions of linear systems with multiple right-hand sides efficiently compared with Krylov subspace methods for a single right-hand side [5]. However, a gap may arise between the residual generated by the recursion of the Block BiCGSTAB method and the true residual. In this paper, the gap which arises in the Block BiCGSTAB method is analyzed. Then, a new Block Krylov subspace method named "Block BiCGGR" is proposed for reducing the gap.

This paper is organized as follows. In Section 2, a matrix-valued polynomial and an operation are defined. The Block BiCGSTAB method is briefly described in Section 3. In Section 4, the influence of errors which arise in matrix multiplications on the accuracy of approximate solutions of the Block BiCGSTAB method is analyzed. In Section 5, the Block BiCGGR method is proposed for reducing the gap between the residual generated by the recursion and the true residual. The true residual of the Block BiCGGR method is also evaluated. In Section 6, the accuracy of approximate solutions generated by both methods is verified by numerical experiments. The paper is concluded in Section 7.
2. Matrix-valued polynomial
Let M_k(z) be a matrix-valued polynomial of degree k defined by

M_k(z) ≡ Σ_{j=0}^{k} z^j M_j,

where M_j ∈ C^{L×L} and z ∈ C. The operation ∘ is used in this paper for the multiplication

M_k(A) ∘ V ≡ Σ_{j=0}^{k} A^j V M_j,

where V ∈ C^{n×L}. This operation satisfies the following properties [4].

Proposition 1  Let M(z) and N(z) be matrix-valued polynomials of degree k, and let V and ξ be an n × L matrix and an L × L matrix, respectively. Then the following properties are satisfied:

(i) (M(A) ∘ V) ξ = (Mξ)(A) ∘ V,
(ii) (M + N)(A) ∘ V = M(A) ∘ V + N(A) ∘ V.
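The operation ∘ can be transcribed directly from its definition; the function name `circ` is ours, and the coefficients M_j are passed as a Python list:

```python
import numpy as np

def circ(M_coeffs, A, V):
    # M(A) ∘ V = sum_j A^j V M_j, with M_coeffs = [M_0, M_1, ..., M_k],
    # each M_j an L x L matrix, V an n x L matrix.
    out = np.zeros_like(V)
    AjV = V.copy()                 # holds A^j V
    for Mj in M_coeffs:
        out = out + AjV @ Mj
        AjV = A @ AjV
    return out
```

Properties (i) and (ii) of Proposition 1 follow immediately: the sum is linear in the coefficients, and right-multiplying the result by ξ is the same as right-multiplying every M_j by ξ.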
3. The Block BiCGSTAB method
The (k+1)th residual R_{k+1} ∈ C^{n×L} of the Block BiCGSTAB method is defined by

R_{k+1} = B − A X_{k+1} ≡ (Q_{k+1} R_{k+1})(A) ∘ R_0,  (2)

where R_0 = B − A X_0 is an initial residual. The matrix-valued polynomial R_{k+1}(z) of degree (k+1) which appears in (2) can be computed by the following recursions:

R_0(z) = P_0(z) = I_L,
R_{k+1}(z) = R_k(z) − z P_k(z) α_k,
P_{k+1}(z) = R_{k+1}(z) + P_k(z) β_k,
JSIAM Letters Vol. 1 (2009) pp.44–47 Hiroto Tadano et al.
X_0 ∈ C^{n×L} is an initial guess,
Compute R_0 = B − A X_0,
Set P_0 = R_0,
Choose R̃_0 ∈ C^{n×L},
For k = 0, 1, . . . , until ‖R_k‖_F ≤ ε‖B‖_F do:
  V_k = A P_k,
  Solve (R̃_0^H V_k) α_k = R̃_0^H R_k for α_k,
  T_k = R_k − V_k α_k,
  Z_k = A T_k,
  ζ_k = Tr[Z_k^H T_k] / Tr[Z_k^H Z_k],
  X_{k+1} = X_k + P_k α_k + ζ_k T_k,
  R_{k+1} = T_k − ζ_k Z_k,
  Solve (R̃_0^H V_k) β_k = −R̃_0^H Z_k for β_k,
  P_{k+1} = R_{k+1} + (P_k − ζ_k V_k) β_k,
End

Fig. 1. Algorithm of the Block BiCGSTAB method.
where P_{k+1}(z) is an auxiliary matrix-valued polynomial of degree (k+1), I_L is the L×L identity matrix, and α_k and β_k are L×L complex matrices. The polynomial Q_{k+1}(z) of degree (k+1) is defined as follows:

Q_0(z) = 1,
Q_{k+1}(z) = (1 − ζ_k z) Q_k(z),

where ζ_k ∈ C. The residual R_{k+1} can be computed by the following recursions,

R_{k+1} = T_k − ζ_k A T_k,  (3)
P_{k+1} = R_{k+1} + (P_k − ζ_k A P_k) β_k,
T_k = R_k − A P_k α_k,  (4)

where the matrices P_{k+1} and T_k are defined by P_{k+1} ≡ (Q_{k+1} P_{k+1})(A) ∘ R_0 and T_k ≡ (Q_k R_{k+1})(A) ∘ R_0, respectively. Proposition 1 is used to derive the above recursions. From Eqs. (2), (3), and (4), the recursion for the approximate solution X_{k+1} is obtained as

X_{k+1} = X_k + P_k α_k + ζ_k T_k.  (5)

The L×L matrices α_k and β_k are determined so that the bi-orthogonality conditions

R̃_0^H A^j (R_k(A) ∘ R_0) = O_L,  j = 0, 1, . . . , k−1,  (6)
R̃_0^H A^{j+1} (P_k(A) ∘ R_0) = O_L,  j = 0, 1, . . . , k−1,  (7)

are satisfied. Here, R̃_0 is an arbitrary nonzero n × L matrix, O_L is the L × L zero matrix, and ‖·‖_F denotes the Frobenius norm of a matrix. Typically, R̃_0 is set to R_0, or given by random numbers. The scalar parameter ζ_k is determined so that ‖R_{k+1}‖_F is minimized. Fig. 1 shows the algorithm of the Block BiCGSTAB method. Here, Tr[·] denotes the trace of a matrix, and ε > 0 is a sufficiently small value for the stopping criterion.
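The algorithm of Fig. 1 transcribes almost line by line to NumPy. The sketch below uses dense matrices, fixes the shadow residual to R_0, and omits breakdown safeguards; the function name and defaults are ours.

```python
import numpy as np

def block_bicgstab(A, B, tol=1e-14, maxiter=1000):
    # Fig. 1: Block BiCGSTAB for AX = B with L right-hand sides.
    X = np.zeros_like(B)
    R = B - A @ X
    P = R.copy()
    Rt = R.copy()                        # shadow residual, here set to R_0
    normB = np.linalg.norm(B)
    while np.linalg.norm(R) > tol * normB and maxiter > 0:
        maxiter -= 1
        V = A @ P
        alpha = np.linalg.solve(Rt.conj().T @ V, Rt.conj().T @ R)
        T = R - V @ alpha
        Z = A @ T
        zeta = np.trace(Z.conj().T @ T) / np.trace(Z.conj().T @ Z)
        X = X + P @ alpha + zeta * T
        R = T - zeta * Z                 # recursive residual; the true residual
        beta = np.linalg.solve(Rt.conj().T @ V, -(Rt.conj().T @ Z))
        P = R + (P - zeta * V) @ beta    # B - A X may differ, per Section 4
    return X
```

For L > 1, α_k and β_k are L×L matrices obtained from small linear solves, not scalars as in the single-vector BiCGSTAB.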
4. Evaluation of the true residual of the Block BiCGSTAB method

In this section, it is assumed that computation errors arise in the multiplications with α_0, α_1, . . . , α_k which appear in the Block BiCGSTAB method. The influence of these errors on the true residual of the Block BiCGSTAB method is considered. A matrix enclosed by the symbol ⟨·⟩ denotes the perturbed matrix. Throughout this section, it is assumed that no calculation errors arise except for the multiplications with α_0, α_1, . . . , α_k.
The perturbed matrices ⟨P_j α_j⟩ and ⟨(A P_j) α_j⟩ are required for the computation of X_{j+1} and R_{j+1}, respectively. These matrices can be written as follows:

⟨P_j α_j⟩ = P_j α_j + F_j,  (8)
⟨(A P_j) α_j⟩ = A P_j α_j + G_j,  (9)

where F_j and G_j denote error matrices. From Eqs. (5) and (8), X_{k+1} is written as

X_{k+1} = X_k + ⟨P_k α_k⟩ + ζ_k T_k
        = X_0 + Σ_{j=0}^{k} (P_j α_j + ζ_j T_j) + Σ_{j=0}^{k} F_j.  (10)

By using Eq. (9), the residual R_{k+1} generated by the recursion (3) is also written as

R_{k+1} = R_k − ⟨(A P_k) α_k⟩ − ζ_k A T_k
        = R_0 − Σ_{j=0}^{k} (A P_j α_j + ζ_j A T_j) − Σ_{j=0}^{k} G_j.  (11)

By using Eqs. (10) and (11), the true residual B − A X_{k+1} of the Block BiCGSTAB method is given by

B − A X_{k+1} = R_0 − Σ_{j=0}^{k} (A P_j α_j + ζ_j A T_j) − Σ_{j=0}^{k} A F_j
             = R_{k+1} + Σ_{j=0}^{k} E_j,  (12)

where the matrix E_j is defined by E_j ≡ G_j − A F_j. From Eqs. (8) and (9), the matrix E_j can be written as

E_j = ⟨(A P_j) α_j⟩ − A ⟨P_j α_j⟩.

The error matrices E_0, E_1, . . . , E_k appear in (12) when computation errors arise in the multiplications with α_0, α_1, . . . , α_k. Eq. (12) implies that the true residual B − A X_{k+1} of the Block BiCGSTAB method approaches Σ_{j=0}^{k} E_j when the residual norm ‖R_{k+1}‖_F is sufficiently small.
5. The Block BiCGGR method
The error matrices F_j and G_j generate the gap between the residual and the true residual. To negate the influence of these matrices, the condition G_j = A F_j should be satisfied. In this section, a new Block Krylov subspace method is proposed to reduce the gap.

5.1 Construction of an algorithm

There are two ways of constructing the recursion for the residual R_{k+1} = (Q_{k+1} R_{k+1})(A) ∘ R_0. In the Block BiCGSTAB method, the polynomial Q_{k+1} is expanded first. In this case, as shown in Eq. (12), the true residual B − A X_{k+1} is not equal to the residual R_{k+1} generated by the recursion. In the proposed method, the polynomial R_{k+1} is expanded first for computing Q_{k+1} R_{k+1}. The recursion of this polynomial is given by

Q_{k+1} R_{k+1} = Q_k R_k − ζ_k z Q_k R_k − z Q_{k+1} P_k α_k.

The polynomials Q_{k+1} P_k and Q_{k+1} P_{k+1} are computed by the following recursions:

Q_{k+1} P_k = Q_k P_k − ζ_k z Q_k P_k,
Q_{k+1} P_{k+1} = Q_{k+1} R_{k+1} + Q_{k+1} P_k β_k.

From the above recursions, the residual R_{k+1} and the auxiliary matrices can be computed by

R_{k+1} = R_k − ζ_k A R_k − A U_k,  (13)
P_{k+1} = R_{k+1} + U_k α_k^{−1} β_k,  (14)
S_k = P_k − ζ_k A P_k,

where S_k ≡ (Q_{k+1} P_k)(A) ∘ R_0 and U_k ≡ S_k α_k. From Eqs. (2) and (13), X_{k+1} can be computed by

X_{k+1} = X_k + ζ_k R_k + U_k.  (15)

In the proposed method, the generation of the gap between the residual and the true residual can be avoided by computing the multiplication of S_k and α_k before the computation of X_{k+1} and R_{k+1}.

The matrices α_k and β_k are determined so that the bi-orthogonality conditions (6) and (7) are satisfied. From Eq. (6), the matrix α_k can be computed by

α_k = (R̃_0^H A P_k)^{−1} R̃_0^H R_k.  (16)

By the bi-orthogonality condition (7) and the relation

R̃_0^H R_{k+1} = −ζ_k R̃_0^H A T_k,

the matrix β_k can be obtained by

β_k = (R̃_0^H A P_k)^{−1} R̃_0^H R_{k+1} / ζ_k.  (17)

The matrix γ_k ≡ α_k^{−1} β_k appears in Eq. (14). By using Eqs. (16) and (17), γ_k can be obtained by

γ_k = (R̃_0^H R_k)^{−1} R̃_0^H R_{k+1} / ζ_k.

If the parameter ζ_k is determined so that ‖R_{k+1}‖_F is minimized, then extra multiplications with A are required in the proposed method. To avoid these multiplications, the parameter ζ_k is computed by

ζ_k = Tr[(A R_k)^H R_k] / Tr[(A R_k)^H A R_k].

In the proposed method, three multiplications with A are required in each iteration. To reduce the number of multiplications with A, the matrix A P_{k+1} is computed by the following recursion:

A P_{k+1} = A R_{k+1} + A U_k γ_k.
5.2 Evaluation of the true residual

Similarly to the previous section, assume that no calculation errors arise except for the multiplications with α_0, α_1, . . . , α_k. The multiplication with α_j appears in the computation of U_j = S_j α_j. By using the symbol ⟨·⟩, the perturbed matrix ⟨S_j α_j⟩ is represented as

⟨S_j α_j⟩ = S_j α_j + H_j,  (18)
X_0 ∈ C^{n×L} is an initial guess,
Compute R_0 = B − A X_0,
Set P_0 = R_0 and V_0 = W_0 = A R_0,
Choose R̃_0 ∈ C^{n×L},
For k = 0, 1, . . . , until ‖R_k‖_F ≤ ε‖B‖_F do:
  Solve (R̃_0^H V_k) α_k = R̃_0^H R_k for α_k,
  ζ_k = Tr[W_k^H R_k] / Tr[W_k^H W_k],
  S_k = P_k − ζ_k V_k,
  U_k = S_k α_k,
  Y_k = A U_k,
  X_{k+1} = X_k + ζ_k R_k + U_k,
  R_{k+1} = R_k − ζ_k W_k − Y_k,
  W_{k+1} = A R_{k+1},
  Solve (R̃_0^H R_k) γ_k = R̃_0^H R_{k+1} / ζ_k for γ_k,
  P_{k+1} = R_{k+1} + U_k γ_k,
  V_{k+1} = W_{k+1} + Y_k γ_k,
End

Fig. 2. Algorithm of the Block BiCGGR method.
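Fig. 2 likewise transcribes to NumPy; the sketch below uses dense matrices, fixes the shadow residual to R_0, and has no breakdown safeguards. The function name and defaults are ours.

```python
import numpy as np

def block_bicggr(A, B, tol=1e-14, maxiter=1000):
    # Fig. 2: Block BiCGGR; W_k = A R_k and V_k = A P_k are carried along,
    # so there are two products with A per iteration (Y_k and W_{k+1}),
    # and V_{k+1} is obtained by recursion rather than a third product.
    X = np.zeros_like(B)
    R = B - A @ X
    P = R.copy()
    V = A @ R
    W = V.copy()
    Rt = R.copy()                        # shadow residual, here set to R_0
    normB = np.linalg.norm(B)
    while np.linalg.norm(R) > tol * normB and maxiter > 0:
        maxiter -= 1
        alpha = np.linalg.solve(Rt.conj().T @ V, Rt.conj().T @ R)
        zeta = np.trace(W.conj().T @ R) / np.trace(W.conj().T @ W)
        S = P - zeta * V
        U = S @ alpha                    # the multiplication done *before* X, R
        Y = A @ U
        X = X + zeta * R + U
        R_new = R - zeta * W - Y
        W_new = A @ R_new
        gamma = np.linalg.solve(Rt.conj().T @ R, Rt.conj().T @ R_new) / zeta
        P = R_new + U @ gamma
        V = W_new + Y @ gamma
        R, W = R_new, W_new
    return X
```

Because the perturbed product U_k = ⟨S_k α_k⟩ enters X_{k+1} and (through Y_k = A U_k) R_{k+1} consistently, the recursive residual stays equal to the true residual, which is the point of Section 5.2.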
where H_j is an error matrix. From Eqs. (15) and (18), the approximate solution X_{k+1} is written as

X_{k+1} = X_k + ζ_k R_k + ⟨S_k α_k⟩
        = X_0 + Σ_{j=0}^{k} (ζ_j R_j + S_j α_j) + Σ_{j=0}^{k} H_j.  (19)

By using Eqs. (13) and (19), R_{k+1} is represented as

R_{k+1} = R_k − ζ_k A R_k − A ⟨S_k α_k⟩
        = R_0 − Σ_{j=0}^{k} (ζ_j A R_j + A S_j α_j) − Σ_{j=0}^{k} A H_j
        = B − A ( X_0 + Σ_{j=0}^{k} (ζ_j R_j + S_j α_j) + Σ_{j=0}^{k} H_j )
        = B − A X_{k+1}.

By regarding the matrices H_j and A H_j as F_j and G_j, it is confirmed that the proposed method satisfies E_j = G_j − A F_j = O. Since the proposed method can reduce the gap between the residual and the true residual, this method is named "Block Bi-Conjugate Gradient Gap-Reducing (Block BiCGGR)". The algorithm of the Block BiCGGR method is shown in Fig. 2.
6. Numerical experiments
Test matrices used in the numerical experiments were PDE900, JPWH991, and CONF5.4-00L8X8-1000 [6]. The size and the number of nonzero elements of these matrices are shown in Table 1. The coefficient matrix of CONF5.4-00L8X8-1000 is constructed as I_n − κD, where D is an n × n non-Hermitian matrix and κ is a real-valued parameter. This parameter was set to 0.1782.

The initial solution X_0 was set to the zero matrix. The shadow residual R̃_0 was given by a random number generator. The right-hand side B of (1) was given by B = [e_1, e_2, . . . , e_L], where e_j is the jth unit vector. The value ε for the stopping criterion was set to 1.0 × 10^{-14}.

All experiments were carried out in double precision arithmetic on CPU: Intel Core 2 Duo 2.4 GHz, Memory: 4 GBytes, Compiler: Intel Fortran ver. 10.1, Compile options: -O3 -xT -openmp. The multiplication with the coefficient matrix was parallelized by OpenMP.

Table 1. The size and the number of nonzero elements of test matrices. NNZ denotes the number of nonzero elements.

Matrix name             Size     NNZ
PDE900                  900      4,380
JPWH991                 991      6,027
CONF5.4-00L8X8-1000     49,152   1,916,928

The results of the Block BiCGSTAB method are shown in Table 2. In the table, #Iter., Res., and True Res. denote the number of iterations, the relative residual norm ‖R_k‖_F/‖B‖_F, and the true relative residual norm ‖B − A X_k‖_F/‖B‖_F, respectively.

Table 2. Results of the Block BiCGSTAB method.

PDE900
L   #Iter.   Time/L [s]   Res.             True Res.
1   53       0.0096       4.8 × 10^{-15}   4.8 × 10^{-15}
2   46       0.0067       1.1 × 10^{-15}   2.0 × 10^{-13}
4   41       0.0031       4.8 × 10^{-15}   1.8 × 10^{-12}

JPWH991
L   #Iter.   Time/L [s]   Res.             True Res.
1   56       0.0159       5.7 × 10^{-15}   1.2 × 10^{-14}
2   49       0.0083       8.3 × 10^{-15}   4.1 × 10^{-13}
4   43       0.0034       6.3 × 10^{-15}   5.9 × 10^{-12}

CONF5.4-00L8X8-1000
L   #Iter.   Time/L [s]   Res.             True Res.
1   555      13.9408      8.9 × 10^{-15}   9.5 × 10^{-15}
2   452      7.5609       7.3 × 10^{-15}   2.5 × 10^{-13}
4   406      6.1544       8.7 × 10^{-15}   2.8 × 10^{-13}

As shown in Table 2, the relative residual norms of the Block BiCGSTAB method satisfied the convergence criterion. However, the true residual norms did not reach 10^{-14} when L = 2, 4.

The relation between the true relative residual norm and ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F for JPWH991 with L = 4 is shown in Fig. 3. The true relative residual norm became almost equal to the value ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F. Eq. (12) was verified through this numerical example.

The results of the Block BiCGGR method are shown in Table 3. The true relative residual norms reached 10^{-14} except for JPWH991 with L = 1. By using the Block BiCGGR method, the gap between the residual and the true residual can be reduced compared with the Block BiCGSTAB method.
Table 3. Results of the Block BiCGGR method.

PDE900
L   #Iter.   Time/L [s]   Res.             True Res.
1   53       0.0107       3.2 × 10^{-15}   3.3 × 10^{-15}
2   46       0.0051       1.1 × 10^{-15}   1.4 × 10^{-15}
4   45       0.0031       5.5 × 10^{-15}   5.6 × 10^{-15}

JPWH991
L   #Iter.   Time/L [s]   Res.             True Res.
1   52       0.0134       8.4 × 10^{-15}   1.3 × 10^{-14}
2   51       0.0082       3.7 × 10^{-15}   6.1 × 10^{-15}
4   44       0.0035       1.5 × 10^{-15}   2.3 × 10^{-15}

CONF5.4-00L8X8-1000
L   #Iter.   Time/L [s]   Res.             True Res.
1   555      14.2714      7.4 × 10^{-15}   8.5 × 10^{-15}
2   456      8.1093       5.6 × 10^{-15}   6.7 × 10^{-15}
4   386      6.0348       7.4 × 10^{-15}   8.6 × 10^{-15}

Fig. 3. Relation between the true relative residual norm of Block BiCGSTAB and ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F (JPWH991, L = 4). — : true relative residual norm ‖B − A X_k‖_F/‖B‖_F, — : relative residual norm ‖R_k‖_F/‖B‖_F, — : ‖Σ_{j=0}^{k} E_j‖_F/‖B‖_F.

7. Conclusions

In this paper, we have evaluated the true residual of the Block BiCGSTAB method when computation errors arise in the multiplications with α_0, α_1, . . . , α_k. We have shown that the true residual of this method approaches the sum of the error matrices when the residual norm is sufficiently small. We have then proposed the Block BiCGGR method for reducing the gap between the residual and the true residual. Through numerical experiments, we have verified that the Block BiCGGR method generates higher accuracy solutions than the Block BiCGSTAB method.
Acknowledgments
This work was supported by a Grant-in-Aid for Young Scientists (Start-up) (No. 20800009).
References
[1] PACS-CS Collaboration, S. Aoki et al., 2 + 1 flavor lattice QCD toward the physical point, arXiv:0807.1661v1 [hep-lat], 2008.
[2] T. Sakurai, H. Tadano, T. Ikegami and U. Nagashima, A parallel eigensolver using contour integration for generalized eigenvalue problems in molecular simulation, Tech. Rep. CS-TR-08-14, Univ. of Tsukuba, 2008.
[3] D. P. O'Leary, The block conjugate gradient algorithm and related methods, Lin. Alg. Appl., 29 (1980), 293–322.
[4] A. El Guennouni, K. Jbilou and H. Sadok, A block version of BiCGSTAB for linear systems with multiple right-hand sides, Elec. Trans. Numer. Anal., 16 (2003), 129–142.
[5] R. W. Freund and M. Malhotra, A block QMR algorithm for non-Hermitian linear systems with multiple right-hand sides, Lin. Alg. Appl., 254 (1997), 119–157.
[6] Matrix Market, http://math.nist.gov/MatrixMarket/
JSIAM Letters Vol. 1 (2009) pp.48–51 © 2009 Japan Society for Industrial and Applied Mathematics
On parallelism of the I-SVD algorithm
with a multi-core processor
Hiroki Toyokawa1,2, Kinji Kimura1, Masami Takata3 and Yoshimasa Nakamura1,2
Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan1
SORST, JST, Japan2
Graduate School of Humanities and Sciences, Nara Women’s University, Kitauoyanishi-machi, Nara, 630-8506, Japan3
E-mail [email protected]
Received February 23, 2009, Accepted June 4, 2009
Abstract
The I-SVD algorithm is a singular value decomposition algorithm consisting of the mdLVs scheme and the dLV twisted factorization. By assigning each piece of the computation to a core of a multi-core processor, the I-SVD algorithm is partly parallelized. The basic idea is a use of splitting and deflation in the mdLVs. The splitting divides a bidiagonal matrix into two smaller matrices. The deflation gives one of the singular values, and then the corresponding singular vector becomes computable by the dLV. Numerical experiments are done on a multi-core processor, and the algorithm can be about 5 times faster with 8 cores.
Keywords singular value decomposition, I-SVD, multi-core processor, parallelism
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
Singular value decomposition (SVD) is one of the most important matrix operations in numerical linear algebra. It is applied to many fields in engineering such as image processing and data search.
Several algorithms have been developed for SVD. The QR method, Divide and Conquer (DC) and a combination of the bisection and the inverse iteration have been the standard algorithms [1]. However, these algorithms leave something to be improved, for example, in computational cost, relative error of singular values, orthogonality of singular vectors, or memory usage.
Recently a new SVD algorithm called the Integrable SVD (I-SVD) algorithm was designed in [2–5]. To solve a large-scale SVD problem, the I-SVD should be parallelized. However, it is difficult to parallelize the whole I-SVD algorithm efficiently [6], because the mdLVs scheme [2] is a serial algorithm. In [7, 8], the double DC algorithm is proposed, which is a combination of a simplified DC and the dLV twisted factorization and is therefore suitable for parallel SVD. A parallelization of the I-SVD algorithm itself is still an important open problem.
These days multi-core processors are widely used. Therefore, a parallelization of the I-SVD algorithm on multi-core processors is desirable.
In this article, a new parallelization of the I-SVD algorithm is established by regarding the I-SVD as a sequence of jobs and by assigning each job to a core. In addition, numerical experiments are done in order to evaluate this new parallelization.
2. Singular value decomposition and the I-SVD algorithm
2.1 Singular value decomposition
Singular value decomposition for a square matrix M ∈ R^{m×m} is described as follows:

M = UΣV^T,

where U, V ∈ R^{m×m} are orthogonal matrices and Σ ∈ R^{m×m} is a diagonal matrix whose elements are non-negative. It is known that any square matrix admits an SVD [9]. SVD itself can also be carried out for m × n rectangular matrices, but in this article we treat the case of m × m square matrices. The kth diagonal element σk, in descending order, of Σ is the kth singular value of M. The kth columns of U and V are the kth left singular vector uk and the kth right singular vector vk, respectively.
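As a quick sanity check of this definition, NumPy's reference SVD routine (not the I-SVD algorithm of this paper) reproduces the stated properties for a small random matrix:

```python
import numpy as np

# Check M = U Sigma V^T with NumPy's reference SVD (not the I-SVD algorithm).
rng = np.random.default_rng(1)
m = 4
M = rng.standard_normal((m, m))
U, sigma, Vt = np.linalg.svd(M)                 # sigma: singular values, descending
reconstructed = U @ np.diag(sigma) @ Vt
print(np.allclose(reconstructed, M))            # True: M = U Sigma V^T
print(np.all(np.diff(sigma) <= 0), np.all(sigma >= 0))   # descending, non-negative
```

Here `np.linalg.svd` returns V^T directly; its kth row is the kth right singular vector vk.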
2.2 mdLVs scheme and the dLV twisted factorization
The mdLVs scheme is one of the efficient algorithms for computing singular values of an upper bidiagonal matrix B = (b_{i,j}) ∈ R^{m×m}. As the procedure continues, the matrix converges: the superdiagonal elements tend to 0, so that B^(n) approaches the diagonal matrix diag(b^(n)_{1,1}, b^(n)_{2,2}, ..., b^(n)_{m,m}), with
Fig. 1. Process flow of the I-SVD: the input matrix M is bidiagonalized to B by the Householder transformation; the mdLVs scheme computes Σ from B; the dLV twisted factorization then yields U and V.
lim_{n→∞} b^(n)_{k,k} = √(σ_k² − ∑_{l=1}^{∞} θ(l)²),

where θ(n) is a non-negative value called the shift. Compared to other algorithms, the mdLVs scheme converges faster than QR and bisection, and the resulting singular values have a better relative accuracy than those of QR and dqds. The fast convergence is due to the shift, but it is also the cause of the seriality of the mdLVs scheme.
An arbitrary m × m matrix can be transformed into an upper bidiagonal matrix B by the Householder transformation [9]. This is the preconditioning process. Therefore the mdLVs scheme can be applied to an arbitrary dense matrix M.
The dLV twisted factorization is an algorithm for computing singular vectors of an upper bidiagonal matrix B for given singular values. Compared to other algorithms such as the inverse iteration, the singular vectors can be computed faster, without iteration [3].
2.3 Integrable singular value decomposition (I-SVD)
The I-SVD algorithm is accomplished by using the mdLVs scheme and the dLV twisted factorization [4, 5]. First the I-SVD algorithm computes the singular values σk of B by the mdLVs scheme. Second it computes the corresponding singular vectors (uk, vk) by the dLV twisted factorization (Fig. 1). It is possible to parallelize the computation of singular vectors because the dLV twisted factorization is parallel executable. However, the mdLVs scheme is difficult to parallelize. Thus, the total parallelism is practically limited by the seriality of the mdLVs scheme [6].
3. How to parallelize
In this section, we explain how to parallelize the I-SVD algorithm.
3.1 Parallelization of the mdLVs scheme by splitting of matrices
During the iteration of the mdLVs scheme, a subdiagonal element, say b^(n)_{k,k+1}, may become less than a small positive value εc ≈ 0. Then this element can be regarded as 0, and the matrix B^(n) ∈ R^{m×m} can be separated into two smaller bidiagonal matrices B^(n)_1 ∈ R^{m1×m1} and B^(n)_2 ∈ R^{m2×m2}, where m2 = m − m1. We call this division a splitting of B^(n). The splitting is illustrated as follows.
B^(n) —splitting (ε ≈ 0)→ diag(B^(n)_1, B^(n)_2):

when a subdiagonal element ε of the upper bidiagonal matrix B^(n) is negligibly small, setting it to 0 turns B^(n) into a block diagonal matrix whose blocks B^(n)_1 and B^(n)_2 are again upper bidiagonal.
The singular values of B^(n)_1 and B^(n)_2 are computable by the mdLVs scheme individually. Therefore, the mdLVs scheme can be partly parallelized by assigning the computations of the singular values of B^(n)_1 and B^(n)_2 to separate cores.
3.2 Parallel execution of the dLV twisted factorization based on a deflation
We call the splitting for m1 = m − 1, m2 = 1 a deflation. The deflation is expressed as
B^(n) —deflation (ε ≈ 0)→ diag(B^(n)_1, b^(n)_{m,m}):

when the last subdiagonal element ε is negligibly small, the bottom-right element b^(n)_{m,m} separates from the remaining (m − 1) × (m − 1) upper bidiagonal block B^(n)_1.
In this case the bottom row and the right column are eliminated, and the (m,m) element b^(n)_{m,m} converges to √(σ_m² − ∑_{l=1}^{n} θ(l)²). Therefore the deflation gives rise to the singular value σm.
In the I-SVD algorithm, when a singular value σk is computed, the computation of the corresponding singular vectors uk, vk can be started, independently of the computation of the other singular vectors. Thus, when a deflation occurs, the computation of the corresponding singular vectors can be started by an idle core.
3.3 Multi-core processor
A multi-core processor is a processor which is composed of several computation cores in one package. Compared to a traditional multi-processor system, a multi-core processor has several advantages. For example, its shared cache memory contributes to efficient memory access. Another point is that a multi-core processor consumes less electric power than other multi-processor systems such as PC clusters.
3.4 Assigning jobs to each core
In order to parallelize both the mdLVs scheme and the dLV twisted factorization, we regard the SVD process as the following sequence of jobs.
(A) Continue the mdLVs scheme until the next splittingoccurs.
(B) Compute the singular vectors corresponding to the computed singular value with the twisted factorization.

Fig. 2. Assigning jobs to each core: as splittings and deflations occur over time, jobs of type (A) (computing singular values by the mdLVs scheme) and type (B) (computing singular vectors by the dLV twisted factorization) are added to the list of unprocessed jobs and taken up by idle cores (#1–#3 in the figure).
When a job appears, it is added to the list of unprocessed jobs immediately. An idle core takes a job from the list and processes it. If a core takes a job of type (A), the core executes the mdLVs scheme until a splitting occurs; it then makes two jobs of type (A), one for each of the divided matrices. When a deflation occurs during the procedure, the core creates a job of type (B) for the computed singular value. If a core takes a job of type (B), the core executes the twisted factorization and computes the left and right singular vectors for a singular value. In this way, these jobs are processed in parallel and the I-SVD algorithm can be partly parallelized (Fig. 2).
The program is made to execute as follows (Fig. 3):
(1) The main process creates working threads, allocates them to each of the cores, and prepares the list of unprocessed jobs (pre-processing).
(2) The working threads execute jobs (compute the singular values with the mdLVs scheme and the singular vectors with the dLV twisted factorization).
(3) When all the singular values and vectors are computed, the working threads terminate and the main process prepares the result of the SVD (post-processing).
In the pre-processing, the upper limit of the number of working threads which the main process creates is the same as the number of cores of the computer. The working threads take jobs from the list of unprocessed jobs, execute the jobs autonomously, and exchange results with other working threads by using shared memory. Therefore, some jobs wait when the number of jobs is greater than that of cores.
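The job-queue scheme described above can be sketched with a thread pool. The following Python toy is only a stand-in for the authors' C++/pthread implementation: the mdLVs and dLV kernels are replaced by a placeholder that "deflates" one value per call, so it shows how type (A) jobs spawn type (B) jobs on a shared list of unprocessed jobs, not the actual numerics.

```python
import queue
import threading

# Toy stand-in for the numerical kernel: each call "deflates" one value.
def mdlvs_until_deflation(size):
    return float(size), size - 1   # (a 'singular value', size of remaining matrix)

jobs = queue.Queue()               # the list of unprocessed jobs
results = []
lock = threading.Lock()

def worker():
    while True:
        kind, payload = jobs.get()
        if kind == "stop":
            jobs.task_done()
            break
        if kind == "A":            # type (A): iterate mdLVs until a deflation occurs
            sigma, remaining = mdlvs_until_deflation(payload)
            jobs.put(("B", sigma))             # deflation -> a type (B) job appears
            if remaining > 0:
                jobs.put(("A", remaining))     # continue on the smaller matrix
        else:                      # type (B): compute the singular vectors
            with lock:
                results.append(payload)        # stand-in for the dLV factorization
        jobs.task_done()

n_cores, m = 4, 8
jobs.put(("A", m))                 # the whole bidiagonal matrix is the first job
threads = [threading.Thread(target=worker) for _ in range(n_cores)]
for t in threads:
    t.start()
jobs.join()                        # wait until every job has been processed
for _ in threads:
    jobs.put(("stop", None))
for t in threads:
    t.join()
print(sorted(results))             # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

As in the paper, type (B) jobs become available as soon as a deflation occurs, so idle workers can start on singular vectors while another worker continues the (serial) singular-value iteration.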
4. Numerical experiments
Tests have been carried out on our computer described in Table 1. As numerical examples, we consider two types of matrices as in Table 2. We set the step-size δ(n) ≡ 1 in the mdLVs scheme; the shift θ(n)² is calculated by using the generalized Newton bound
Fig. 3. Process flow: after the pre-process, working threads repeatedly take a job from the list of unprocessed jobs and execute it until the SVD of the bidiagonal matrix B is completed; the post-process then assembles the singular values and vectors of B.
Table 1. Specification of the test computer.
CPU: Intel Xeon E5430 2.66 GHz Quad × 2 (4 cores × 2 processors)
Memory: 64 GBytes
OS: Linux (Fedora 7)
Compiler: g++ and gfortran 4.1.2-27 (-O3 option)
Table 2. Two types of upper bidiagonal matrices.
Matrix | Diagonal b_{k,k} | Subdiagonal b_{k,k+1} | Distribution of σk
B1 | 2.001 | 2 | separated
B2 | random | random | clustered
(without subtraction, p = 2) [10]. The test program is written in C++ and parallelized with pthread (libraries implementing the POSIX Threads standard), using neither MPI nor OpenMP.
4.1 Results for B1
Table 3 and Fig. 4 show the computation time and relative speed for the SVD of B1. According to them, the computation time for B1 does not decrease efficiently as the number of cores increases. Fig. 5 shows the number of working cores as the SVD of the 8000 × 8000 matrix B1 proceeds (the time for pre-processing and post-processing is excluded). It is to be noted that no splitting occurs during the computation for B1. Hence, not sufficiently many jobs of type (B) appear and some of the cores are idle: only several cores work at the same time, while the remaining cores have nothing to process.
4.2 Results for B2
Table 4 and Figs. 6 and 7 show the results for B2. Fig. 7 shows the number of working cores during the SVD of
Table 3. Computation time for B1 (in seconds [s]).
#Cores: 1, 2, 4, 8
m = 4000: 9.28, 4.74, 2.51, 1.87 (speed-up: 1.00, 1.96, 3.70, 4.96)
m = 8000: 36.96, 18.87, 9.98, 7.33 (speed-up: 1.00, 1.96, 3.70, 5.04)
m = 12000: 83.08, 42.60, 23.03, 16.45 (speed-up: 1.00, 1.95, 3.61, 5.05)
m = 16000: 154.12, 78.47, 44.54, 30.19 (speed-up: 1.00, 1.96, 3.46, 5.11)
Fig. 4. Relative speed for B1 (m = 4000, 8000, 12000, 16000; horizontal axis: #Cores, vertical axis: relative speed).

Fig. 5. Changes of the number of working cores for B1 (horizontal axis: elapsed time [sec], vertical axis: #working cores).
the 8000 × 8000 matrix B2 (the time for pre-processing and post-processing is excluded). Differently from the case of B1, splittings occur successively. Hence, jobs of types (A) and (B) appear frequently and the cores are used somewhat more efficiently than in the case of B1.
5. Conclusion
It has been difficult to parallelize the mdLVs scheme [6]. In this paper, a new scheme is designed to parallelize the I-SVD algorithm on a multi-core processor. We achieve the parallelism by assigning the jobs of computing singular values and singular vectors to each of the cores.
References
[1] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[2] M. Iwasaki and Y. Nakamura, Accurate computation of singular values in terms of shifted integrable schemes, Japan J. Indust. Appl. Math., 23 (2006), 239–259.
[3] M. Iwasaki, S. Sakano and Y. Nakamura, Accurate twisted factorization of real symmetric tridiagonal matrices and its application to singular value decomposition (in Japanese), Trans. Japan Soc. Indust. Appl. Math., 15 (2005), 461–481.
Table 4. Computation time for B2 (in seconds [s]).
#Cores: 1, 2, 4, 8
m = 4000: 9.96, 5.23, 2.82, 1.75 (speed-up: 1.00, 1.90, 3.53, 5.69)
m = 8000: 38.07, 19.93, 10.92, 6.57 (speed-up: 1.00, 1.91, 3.49, 5.79)
m = 12000: 87.67, 46.30, 27.21, 16.48 (speed-up: 1.00, 1.89, 3.22, 5.32)
m = 16000: 162.18, 87.83, 54.50, 33.79 (speed-up: 1.00, 1.85, 2.98, 4.80)
Fig. 6. Relative speed for B2 (m = 4000, 8000, 12000, 16000; horizontal axis: #Cores, vertical axis: relative speed).

Fig. 7. Changes of the number of working cores for B2 (horizontal axis: elapsed time [sec], vertical axis: #working cores).
[4] Y. Nakamura, Functionality of Integrable System (in Japanese), Kyoritsu Pub., Tokyo, 2006.
[5] M. Takata, K. Kimura, M. Iwasaki and Y. Nakamura, Performance of a new scheme for bidiagonal singular value decomposition of large scale, Proc. of 24th IASTED Int. Conf. on Parallel and Distributed Computing and Networks (PDCN2006), pp. 304–309, 2006.
[6] T. Konda, M. Takata, M. Iwasaki, S. Tsujimoto and Y. Nakamura, A parallelization of singular value computation algorithm by the Lotka-Volterra system (in Japanese), IPSJ SIG Tech. Rep., HPC-100 (2004), 13–18.
[7] T. Konda and Y. Nakamura, A new algorithm for singular value decomposition and its parallelization, Parallel Computing, 35 (2009), 331–344.
[8] T. Konda, M. Takata, M. Iwasaki and Y. Nakamura, A new singular value decomposition algorithm suited to parallelization and preliminary results, Proc. of 2nd IASTED Int. Conf. on Advances in Computer Science and Technology (ACST2006), pp. 79–85, 2006.
[9] G. Golub and C. Van Loan, Matrix Computations, Third Edition, Johns Hopkins Univ. Press, Baltimore, 1996.
[10] T. Yamashita, K. Kimura and Y. Nakamura, On subtraction free formula for the diagonal elements of inverse powers of symmetric positive definite tridiagonal matrices, in preparation, 2009.
JSIAM Letters Vol. 1 (2009) pp.52–55 © 2009 Japan Society for Industrial and Applied Mathematics
A numerical method for nonlinear eigenvalue problems
using contour integrals
Junko Asakura1, Tetsuya Sakurai2, Hiroto Tadano2, Tsutomu Ikegami3 and Kinji Kimura4
Research and Development Division, Square Enix Co. Ltd., Shinjuku Bunka Quint Bldg. 3-22-7 Yoyogi, Shibuya-ku, Tokyo 151-8544, Japan1
Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki 305-8573, Japan2
Information Technology Research Institute, AIST, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568,Japan3
Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan4
E-mail [email protected]
Received February 14, 2009, Accepted May 1, 2009
Abstract
A contour integral method is proposed to solve nonlinear eigenvalue problems numerically. The target equation is F(λ)x = 0, where the matrix F(λ) is an analytic matrix function of λ. The method can extract only the eigenvalues λ in a domain defined by the integral path, by reducing the original problem to a linear eigenvalue problem that has identical eigenvalues in the domain. Theoretical aspects of the method are discussed, and we illustrate how to apply the method with some numerical examples.
Keywords nonlinear eigenvalue problem, contour integral, analytic matrix function
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
We consider a numerical method using contour integrals to solve nonlinear eigenvalue problems. The nonlinear eigenvalue problem (NEP) involves finding eigenpairs (λ, x) that satisfy F(λ)x = 0, where the matrix F(λ) is an analytic matrix function of λ. NEPs appear in a variety of problems in science and engineering, such as accelerator designs [1] and delay differential equations [2].
We herein propose a numerical method using contour integrals to solve eigenvalue problems for analytic matrix functions. The method is closely related to the Sakurai-Sugiura (SS) method for generalized eigenvalue problems [3], and inherits many of its strong points, including suitability for execution on modern distributed parallel computers. We have already extended the SS method to polynomial eigenvalue problems [4]. In this paper, we further generalize the SS method to eigenvalue problems of analytic matrix functions. In the SS method, the original problem is converted to a generalized eigenvalue problem whose dimension is smaller than that of the original one. The converted problem is obtained numerically by solving a set of linear equations. These linear equations are derived from the original problem and can form a large system, but they are independent and can be solved in parallel. Moreover, the proposed method is free from the fixed point iterations required in Newton's method. In this paper, the extension of the SS method for NEPs is discussed from a theoretical point of view. Some numerical examples are also reported, with results that are consistent with the theory.
The remainder of the present paper is organized as follows. In the next section, we introduce the Smith form for analytic matrix functions, which is a natural extension of the Smith form for matrix polynomials [5]. In Section 3, we present the numerical method for solving NEPs by means of the SS method and discuss theoretical results related to the proposed method. In Section 4, we present the algorithm of the SS method for the case where the integral path is given by a circle and numerical integration is performed using the trapezoidal rule. Some numerical examples are shown in Section 5. Finally, conclusions and suggestions for future research are presented in Section 6.
2. Canonical form for matrix analytic functions
Let F(z) be an analytic matrix function defined in a simply connected region in C. The matrix F(z) is called regular if the determinant of F(z) is not identically zero in a domain Ω.
We introduce the Smith form for analytic matrix functions [5].
Theorem 1 Let F(z) be an n × n regular analytic matrix function. Then, F(z) admits the representation

P(z)F(z)Q(z) = D(z), (1)

where D(z) = diag(d1(z), ..., dn(z)) is a diagonal matrix of analytic functions dj(z) for j = 1, 2, ..., n such that dj(z)/dj−1(z) are analytic functions for j = 2, 3, ..., n. In addition, P(z) and Q(z) are n × n regular analytic matrix functions with constant nonzero determinants.
The eigenpairs of the NEP are formally derived from the Smith form. Let qj(z) be the column vectors of Q(z):

Q(z) = (q1(z) ... qn(z)), (2)

and let pj(z) be defined by

P(z)^H = (p1(z) ... pn(z)). (3)
Let λ1, ..., λs be the distinct zeros of dn(z) in Ω. Because dj(z)/dj−1(z) is an analytic function, dj(z) can be represented in terms of the λi as

dj(z) = hj(z) · ∏_{i=1}^{s} (z − λi)^{αji}, j = 1, 2, ..., n,

where the hj(z) are analytic functions and hj(z) ≠ 0 for z ∈ Ω. In addition, αji ∈ Z+ (non-negative integers) and αji ≤ αj′i for j < j′.
The eigenpairs of the NEP are related to the λi and the qj(λi) above as follows.

Lemma 2 Let qj(z) be the vector in (2), and let λi be a zero of dj(z). Then, the eigenpair (λi, qj(λi)) is a solution of the NEP F(λ)x = 0.

Proof Because P(z) and Q(z) are invertible,

F(λi)qj(λi) = P(λi)^{−1} D(λi) Q(λi)^{−1} (Q(λi)ej) = dj(λi) P(λi)^{−1} ej.

Since dj(λi) = 0, we have the result of the lemma. (QED)

Note that if the eigenvalue λi is simple and not degenerate, i.e., λi is a simple zero of det F(z), we have αji = 0 for j ≠ n and αni = 1.
3. An eigensolver using contour integrals
In this section, we propose a numerical method using contour integrals for eigenvalue problems of analytic matrix functions.
Let F(z) be an n × n regular analytic matrix function. For nonzero vectors u and v ∈ C^n, we define

f(z) := u^H F(z)^{−1} v

for z ∈ Ω such that det F(z) ≠ 0, namely dn(z) ≠ 0. The existence of the Smith form allows us to prove the following theorem.
Theorem 3 Let D(z) = diag(d1(z), ..., dn(z)) be the Smith form for F(z), and let P(z) and Q(z) be defined by (1). Then, f(z) admits the representation

f(z) = ∑_{j=1}^{n} χj(z)/dj(z), (4)

where the χj(z) are analytic functions in Ω.

Proof By Theorem 1, we obtain

f(z) = u^H Q(z) D(z)^{−1} P(z) v = ∑_{j=1}^{n} (u^H qj(z))(pj(z)^H v)/dj(z) = ∑_{j=1}^{n} χj(z)/dj(z),

where χj(z) := (u^H qj(z))(pj(z)^H v). (QED)
Let Γ be a positively oriented closed Jordan curve in Ω. Without loss of generality, we may assume that λ1, ..., λm (m ≤ s) are the distinct eigenvalues in the interior of Γ ⊂ Ω. Assume that these eigenvalues are simple and not degenerate. Then we can suppose that αji = 0 for j ≠ n and αni = 1.
Definition 4 For a non-negative integer k, the moment μk is defined as

μk := (1/(2πi)) ∫_Γ z^k f(z) dz, k = 0, 1, .... (5)

Definition 5 Two m × m Hankel matrices Hm and H<m are defined as

Hm := [μ_{i+j−2}]_{i,j=1}^{m}, H<m := [μ_{i+j−1}]_{i,j=1}^{m}.
The following theorem is one of the main results of the present paper.
Theorem 6 If χn(λl) ≠ 0 for 1 ≤ l ≤ m, then the eigenvalues of the pencil H<m − λHm are given by λ1, ..., λm.

Proof By Theorem 3 and (5), we obtain

μk = (1/(2πi)) ∫_Γ z^k f(z) dz = ∑_{j=1}^{n} (1/(2πi)) ∫_Γ (χj(z)/dj(z)) z^k dz = ∑_{l=1}^{m} νl λl^k,

where νl := χn(λl)/d′n(λl). Let Vm be the Vandermonde matrix whose (i, j) entry is λj^{i−1}:

Vm := [λj^{i−1}]_{i,j=1}^{m}.

Let Dm := diag(ν1, ..., νm) and Λm := diag(λ1, ..., λm). One can easily verify that

H<m − λl Hm = Vm Dm (Λm − λl I) Vm^T. (6)

If χn(λl) ≠ 0 for 1 ≤ l ≤ m, then νl ≠ 0 for 1 ≤ l ≤ m. Therefore, λ1, ..., λm are the eigenvalues of the pencil H<m − λHm. (QED)
Therefore, we can obtain the eigenvalues λ1, ..., λm of the analytic matrix function F(z) by solving the generalized eigenvalue problem H<m w = λ Hm w. The proof of the above theorem for generalized eigenvalue problems is given in [3].
Now, we evaluate eigenvectors. Let

sk := (1/(2πi)) ∫_Γ z^k F(z)^{−1} v dz, k = 0, 1, ..., m − 1, (7)

and let S := (s0 ... s_{m−1}). We obtain the following relationship between S and qn(z) of (2).

Lemma 7 Let qn(z) be the vector in (2) and let (λl, wl) (1 ≤ l ≤ m) be the eigenpairs of the pencil H<m − λHm. Then,

qn(λl) = cl S wl, cl ∈ C\{0},

for l = 1, 2, ..., m.
Proof From (6), we have

0 = (H<m − λl Hm) wl = Vm Dm (Λm − λl I) Vm^T wl.

Since Vm and Dm are nonsingular, and Λm el = λl el, Vm^T wl admits the following representation:

Vm^T wl = αl el, αl ∈ C\{0}.

Here, el is the lth unit vector. Let pn(z) be the vector in (3) and let

βl := (pn(λl)^H v)/d′n(λl)

for l = 1, ..., m. Note that βl ≠ 0 if χn(λl) ≠ 0. As in the proof of Theorem 3, we can derive the following equation:

S = (s0 ... s_{m−1}) = (β1 qn(λ1) ... βm qn(λm)) Vm^T.

Therefore,

qn(λl) = (1/βl) S Vm^{−T} el = (1/βl) S (1/αl) wl = cl S wl,

with cl = 1/(αl βl) for l = 1, 2, ..., m. (QED)
From Lemma 2 and Lemma 7, we have the following theorem.

Theorem 8 Let (λj, wj) (j = 1, ..., m) be the eigenpairs of the pencil H<m − λHm. Then, (λj, xj) (j = 1, ..., m) are the eigenpairs of the NEP F(λ)x = 0, where

xj = S wj, j = 1, ..., m.
4. A case where Γ is given by a circle
Let Γ = γ + ρe^{iθ} (0 ≤ θ < 2π) be a circle in Ω with center γ and radius ρ. To retain numerical accuracy, we use the shifted and scaled moments

μk := (1/(2πi)) ∫_Γ ((z − γ)/ρ)^k f(z) dz, k = 0, 1, ... (8)
instead of (5). We evaluated the integral using the N-point trapezoidal rule, leading to the approximations for μk,

μk ≈ μ̂k := (1/N) ∑_{j=0}^{N−1} ((ωj − γ)/ρ)^{k+1} f(ωj),

where ωj = γ + ρe^{2πi(j+1/2)/N} for j = 0, 1, ..., N − 1. Note that, due to the shift and scaling, the eigenvalues λl (l = 1, ..., m) are also shifted and scaled. The eigenvalues of the original NEP can be recovered as γ + ρλl.
The block version of the SS method for generalized eigenvalue problems was proposed in [6]. The numerical examples in [6] indicate that the block SS method has the potential to achieve greater accuracy.
Let U and V be n × L matrices, the column vectors of which are linearly independent. The block SS method is defined by replacing f(z) in (5) with the matrix U^H F(z)^{−1} V. Accordingly, the kth moment μk in (5), the Hankel matrices Hm, H<m, the vector sk in (7) and the matrix S = (s0 ... s_{m−1}) are replaced by the corresponding block versions:

Mk := (1/(2πi)) ∫_Γ z^k U^H F(z)^{−1} V dz, k = 0, 1, ..., (9)

HmL := [M_{i+j−2}]_{i,j=1}^{m}, H<mL := [M_{i+j−1}]_{i,j=1}^{m},

Sk := (1/(2πi)) ∫_Γ z^k F(z)^{−1} V dz, k = 0, 1, ...,

and S = (S0 ... S_{m−1}), respectively. Here m is a positive integer such that mL ≥ m. Note that Mk = U^H Sk by definition. Using the N-point trapezoidal rule, we obtain the following approximation for Sk:

Ŝk := (1/N) ∑_{j=0}^{N−1} ((ωj − γ)/ρ)^{k+1} F(ωj)^{−1} V, k = 0, 1, .... (10)
The algorithm for the block SS method is shown below.

Algorithm of the block SS method
Input: U, V ∈ C^{n×L}, N, K, L, δ, γ, ρ
Output: λ1, ..., λm′, x1, ..., xm′
1. Set ωj ← γ + ρ exp(2πi(j + 1/2)/N), j = 0, 1, ..., N − 1
2. Compute F(ωj)^{−1}V, j = 0, 1, ..., N − 1
3. Compute Ŝk, k = 0, ..., 2K − 1 by (10)
4. Form Mk = U^H Ŝk, k = 0, 1, ..., 2K − 1
5. Construct HKL and H<KL ∈ C^{KL×KL}
6. Perform a singular value decomposition of HKL
7. Omit small singular value components σj < δ · max_i σi so that Hm′ = HKL(1 : m′, 1 : m′), H<m′ = H<KL(1 : m′, 1 : m′), where m′ ≤ KL
8. Compute the eigenpairs (ζ1, w1), ..., (ζm′, wm′) of the pencil H<m′ − λHm′
9. Construct S = (Ŝ0 ... Ŝm′−1)
10. Compute xj = S wj, j = 1, 2, ..., m′
11. Set λj ← γ + ρζj, j = 1, 2, ..., m′
In practice, we assign random matrices to U and V. The proposed method can obtain the eigenvectors corresponding to eigenvalues whose algebraic multiplicity is less than L.
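A minimal non-block sketch of the method is easy to run. In the NumPy code below the test problem F(z) = zI − A with a diagonal A is made up so that the answer is known, and γ, ρ, N, m are illustrative parameters; the scaled moments and the Hankel pencil then recover the eigenvalues of A inside the circle, as Theorem 6 predicts.

```python
import numpy as np

# Non-block SS sketch on a toy NEP F(z) = zI - A (A diagonal, made up for
# illustration): recover the eigenvalues of A inside the contour circle.
rng = np.random.default_rng(0)
n = 3
A = np.diag([1.0, 2.0, 10.0])          # eigenvalues 1 and 2 lie inside the contour
F = lambda z: z * np.eye(n) - A

gamma, rho, N, m = 0.0, 3.0, 64, 2     # circle center/radius, quadrature points, #eigenvalues
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# Shifted and scaled moments via the N-point trapezoidal rule, as in (8)
omega = gamma + rho * np.exp(2j * np.pi * (np.arange(N) + 0.5) / N)
f = np.array([u @ np.linalg.solve(F(w), v) for w in omega])
mu = [np.mean(((omega - gamma) / rho) ** (k + 1) * f) for k in range(2 * m)]

# Hankel pencil H< - lambda*H (Definition 5); its eigenvalues are the scaled
# eigenvalues, recovered afterwards as gamma + rho*zeta (Theorem 6)
H = np.array([[mu[i + j] for j in range(m)] for i in range(m)])
Hs = np.array([[mu[i + j + 1] for j in range(m)] for i in range(m)])
zeta = np.linalg.eigvals(np.linalg.solve(H, Hs))
lam = np.sort((gamma + rho * zeta).real)
print(lam)                             # approximately [1. 2.]
```

The eigenvalue 10 outside Γ is filtered out by the quadrature; for nontrivial F(z) only the linear solves F(ωj)x = v change, which is what makes the method parallel over the quadrature nodes.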
5. Numerical Examples
In this section, we confirm the validity of the proposed method using some nonlinear eigenvalue problems. The algorithm was implemented in MATLAB 7.4. We generated a matrix V := (v1 ... vL) using the MATLAB function rand and set U = V. The MATLAB command mldivide was used to evaluate F(z)^{−1}V numerically. The evaluated eigenvectors are normalized so that ‖x‖2 = 1.
Example 1 We consider the NEP with F (z) that
Table 1. Relative errors and residuals for Example 1.
k | λ̂k | |λk − λ̂k| | ‖F(λ̂k)x̂k‖2 / (‖F(λ̂k)‖2 ‖x̂k‖2)
1 | −3.141592653589789 | 4.00 × 10−15 | 2.58 × 10−12
2 | −1.570796326794277 | 6.20 × 10−13 | 1.67 × 10−12
3 | 0.000000000000661 | 6.61 × 10−13 | 1.52 × 10−11
4 | 1.570796326761298 | 3.36 × 10−11 | 1.11 × 10−10
5 | 1.945910151338245 | 2.28 × 10−9 | 3.11 × 10−8
6 | 3.141592653589055 | 7.39 × 10−13 | 3.57 × 10−11
Table 2. Residuals for Example 2.
k | λ̂k^{1/2} | ‖F(λ̂k)x̂k‖F / (‖F(λ̂k)‖F ‖x̂k‖F)
1 | 0.059793132432759 + 0.000000862974322i | 1.41 × 10−15
2 | 0.083768827897551 + 0.000019602073839i | 6.38 × 10−17
3 | 0.084151690319656 + 0.000003399562592i | 1.25 × 10−16
4 | 0.087765211962668 + 0.000038185170188i | 3.47 × 10−17
5 | 0.088352686155210 + 0.000005726087041i | 3.13 × 10−17
6 | 0.093424713463988 + 0.000393486671297i | 5.55 × 10−17
was transformed using elementary transformations from diag(cos(z), sin(z), e^z − 7). The following list shows the elements of F(z):
(1,1): 2e^z + cos(z) − 14
(1,2): (z² − 1) sin(z) + (2e^z + 14) cos(z)
(1,3): 2e^z − 14
(2,1): (z + 3)(e^z − 7)
(2,2): sin(z) + (z + 3)(e^z − 7) cos(z)
(2,3): (z + 3)(e^z − 7)
(3,1): e^z − 7
(3,2): (e^z − 7) cos(z)
(3,3): e^z − 7
The integral path Γ taken was as follows:
Γ = γ + ρeiθ (γ = 0, ρ = 3.2).
There are six eigenvalues λ1 = −π, λ2 = −π/2, λ3 = 0, λ4 = π/2, λ5 = log 7 (≈ 1.9459), λ6 = π in Γ. We took N = 64, K = 8, L = 2, and δ = 10−12.
The numerical results are shown in Table 1. We compared the eigenvalues λ̂j obtained by the block SS method to the exact eigenvalues λj. As shown in Table 1, we obtained all of the eigenvalues in Γ.
Example 2 We consider the problem that models a radio-frequency gun cavity given in [1] with

F(λ) = A0 − λA1 + i√(λ − σ1²) W1 + i√(λ − σ2²) W2,

where A0, A1, W1, W2 ∈ R^{9956×9956}. We took σ1 = 0 and σ2 = 0.043551. The integral path Γ taken was as follows:

Γ = γ + ρe^{iθ} (γ = 0.00625, ρ = 0.00375).

We took N = 64, K = 8, L = 24, and δ = 10−12.
The numerical results are shown in Table 2. We used the Frobenius norm instead of the 2-norm. Table 2 shows that the proposed method found six eigenvalues in Γ. The largest residual of the computed eigenpairs was 1.41 × 10−15.
Example 3 Lastly, we consider the problem derived from the delay-differential equation with a single delay
Table 3. Residuals for Example 3.
k | λ̂k | ‖F(λ̂k)x̂k‖2 / (‖F(λ̂k)‖2 ‖x̂k‖2)
1 | 17.773906360548423 | 2.41 × 10−16
2 | 14.471490519110109 | 2.14 × 10−16
3 | 8.961335387916407 | 3.43 × 10−16
4 | 0.941336550782964 | 1.43 × 10−15
5 | −10.407305274429442 | 1.08 × 10−15
6 | −31.755615500815374 | 9.43 × 10−16
given in [2]:

F(λ) = −λI + A0 + A1 e^{−τλ},

where A0, A1 ∈ R^{1000×1000} are tridiagonal matrices and I is the identity matrix. We took τ = 0.05. The integral path Γ taken was as follows:

Γ = γ + ρe^{iθ} (γ = −10, ρ = 30).

We took N = 48, K = 16, L = 4 and δ = 10−12. It is known that a total of six real eigenvalues lie in [−40, 20].
The numerical results are shown in Table 3. As shown in Table 3, the proposed method found all eigenvalues in the specified domain. The largest residual of the computed eigenpairs was 1.43 × 10−15.
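A scalar analogue of this delay problem can be treated with the same moment machinery. In the sketch below the coefficients a0, a1 and the contour are made up (the 1000 × 1000 matrices of [2] are not reproduced), and with a single eigenvalue inside the circle (m = 1) the Hankel pencil reduces to a ratio of two scaled moments.

```python
import numpy as np

# Scalar delay NEP F(lambda) = -lambda + a0 + a1*exp(-tau*lambda) with
# made-up coefficients; one root is expected inside |z - 1.5| < 1.
a0, a1, tau = 1.0, 0.5, 0.05
F = lambda z: -z + a0 + a1 * np.exp(-tau * z)

gamma, rho, N = 1.5, 1.0, 64
omega = gamma + rho * np.exp(2j * np.pi * (np.arange(N) + 0.5) / N)
f = 1.0 / F(omega)                        # scalar f(z) = F(z)^{-1}
mu0 = np.mean(((omega - gamma) / rho) * f)        # scaled moment, k = 0
mu1 = np.mean(((omega - gamma) / rho) ** 2 * f)   # scaled moment, k = 1
root = (gamma + rho * mu1 / mu0).real     # for m = 1 the pencil is mu1 / mu0
print(root, abs(F(root)))                 # the residual |F(root)| is tiny
```

The complex roots of the delay equation have imaginary parts of order 2π/τ and lie far outside the contour, so the single real root dominates the moments.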
6. Conclusion
In the present paper, we have proposed a numerical method using contour integrals for nonlinear eigenvalue problems of analytic matrix functions. The method can be considered as an extension of the numerical method for polynomial eigenvalue problems proposed in [4]. The method enables us to obtain the eigenpairs of analytic matrix functions by solving a generalized eigenvalue problem, which is derived by solving systems of linear equations. Since these linear systems are independent of each other, they can be solved in parallel. In addition, the proposed method does not need fixed point iterations such as Newton's iteration. Error analysis for the proposed method and the estimation of suitable parameters remain as topics for future research.
References
[1] B. Liao, Subspace projection methods for model order reduction and nonlinear eigenvalue computation, PhD thesis, Department of Mathematics, Univ. of California at Davis, 2007.
[2] E. Jarlebring, The spectrum of delay-differential equations: numerical methods, stability and perturbation, PhD thesis, Inst. Comp. Math., TU Braunschweig, 2008.
[3] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems, J. Comput. Appl. Math., 159 (2003), 119–128.
[4] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for polynomial eigenvalue problems using contour integral, submitted.
[5] I. Gohberg and L. Rodman, Analytic matrix functions with prescribed local data, J. d'Analyse Mathématique, 40 (1981), 90–128.
[6] T. Ikegami, T. Sakurai, and U. Nagashima, A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura method, Tech. Rep. CS-TR-08-13, Univ. of Tsukuba, 2008.
JSIAM Letters Vol.1 (2009) pp.56–59 ©2009 Japan Society for Industrial and Applied Mathematics
Differential qd algorithm for totally nonnegative band
matrices: convergence properties and error analysis
Yusaku Yamamoto¹ and Takeshi Fukaya¹
¹Department of Computational Science and Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
E-mail [email protected]
Received May 27, 2009, Accepted August 13, 2009
Abstract
We analyze convergence properties and numerical properties of the differential qd algorithm generalized for totally nonnegative band matrices. In particular, we show that the algorithm is globally convergent and can compute all eigenvalues to high relative accuracy.
Keywords eigenvalue, totally nonnegative, band matrix, qd algorithm, error analysis
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
The dqds (differential quotient-difference with shift) algorithm [1] is one of the most successful algorithms for computing the eigenvalues of a symmetric positive-definite tridiagonal matrix. It is mathematically equivalent to the LR algorithm, but instead of working on the tridiagonal matrix T, it uses the elements of its bidiagonal factor B, where T = B^T B, as basic variables and performs the LR step

  (B^(n+1))^T B^(n+1) = B^(n) (B^(n))^T

without forming B^(n) (B^(n))^T explicitly. The dqds algorithm has the unique feature that it can compute all the eigenvalues to high relative accuracy, and it is used as one of the key ingredients in the MR³ algorithm for the symmetric tridiagonal eigenproblem [2].
The high relative accuracy of the dqds algorithm is made possible thanks to the following two properties:

(i) The algorithm involves only positive variables and uses no subtractions except in the introduction of origin shifts. Thus the rounding errors arising in the computation of B^(n) → B^(n+1) are kept small in the sense of relative error.

(ii) Small relative errors in the elements of B^(n) cause small relative perturbations in the eigenvalues.

It is therefore interesting to investigate whether the dqds algorithm can be extended to other types of matrices while retaining these useful properties.
In this paper, we consider a class of totally nonnegative band matrices. A matrix is called totally nonnegative (TN) if all of its minors are nonnegative [3]. TN band matrices can be regarded as a generalization of the symmetric positive-definite tridiagonal matrices, and the dqd (differential qd) algorithm, a shiftless version of the dqds, can be naturally extended to deal with this type of matrix. We study the convergence properties and numerical properties of the dqd algorithm applied to TN band matrices. In particular, we prove its global convergence and show that it shares the properties (i) and (ii) of the dqds algorithm. Using these facts, we show that the algorithm can compute all eigenvalues of a TN band matrix to high relative accuracy.

Recently, Fukuda et al. formulated a new algorithm for eigenvalue computation based on an integrable system called the discrete hungry Lotka-Volterra (dhLV) system [4]. We also point out that there is a close relationship between this algorithm and the dqd algorithm for TN band matrices.
2. The differential qd algorithm for a totally nonnegative band matrix
Let L_i (i = 1, ..., m_L) and R_i (i = 1, ..., m_R) be m × m lower and upper bidiagonal matrices defined by

  L_i = ⎡ q_{i1}               ⎤      R_i = ⎡ 1  e_{i1}               ⎤
        ⎢ 1   q_{i2}           ⎥            ⎢    1   e_{i2}           ⎥
        ⎢     1   q_{i3}       ⎥   and      ⎢        1   ⋱           ⎥ ,   (1)
        ⎢         ⋱   ⋱       ⎥            ⎢            ⋱  e_{i,m−1} ⎥
        ⎣             1  q_{im}⎦            ⎣                1        ⎦

respectively, where q_{ik} (i = 1, ..., m_L, k = 1, ..., m) and e_{ik} (i = 1, ..., m_R, k = 1, ..., m − 1) are some positive numbers. In this paper, we consider the problem of computing the eigenvalues of a matrix defined as the product of these bidiagonal factors:

  A = L_1 L_2 ··· L_{m_L} R_1 R_2 ··· R_{m_R}.   (2)
Clearly, A is a nonsingular band matrix with lower bandwidth m_L and upper bandwidth m_R. Moreover, A is totally nonnegative, since it is a product of positive bidiagonal matrices [5]. From this fact, it can also be concluded that all the eigenvalues of A are simple, real and positive. When m_L = m_R = 1, A is similar to some symmetric positive-definite tridiagonal matrix. In this sense, the matrix in (2) can be regarded as a generalization of the symmetric positive-definite tridiagonal matrix.
Now, consider computing the eigenvalues of A with the LR algorithm [6]. In the LR algorithm, we first decompose the input matrix A into the product of lower and upper triangular matrices as A = L^(0) R^(0). Then we reverse the order of the triangular factors to obtain the next iterate A^(1) = R^(0) L^(0). This iterate is again decomposed as A^(1) = L^(1) R^(1), and this process is continued until convergence.
In our case, because the original A is already defined as a product of multiple lower and upper triangular matrices, we can omit the first decomposition and write L^(0) and R^(0) as

  L^(0) = L_1 ··· L_{m_L},  R^(0) = R_1 ··· R_{m_R}.   (3)
Furthermore, the decomposition A^(1) = L^(1) R^(1) can be performed stepwise, as the following example shows:

  A^(1) = R_1^(0) R_2^(0) L_1^(0) L_2^(0)
        = R_1^(0) L_1^(0,1) R_2^(0,1) L_2^(0)
        = L_1^(0,2) R_1^(0,1) R_2^(0,1) L_2^(0)
        = L_1^(0,2) R_1^(0,1) L_2^(0,1) R_2^(0,2)
        = L_1^(0,2) L_2^(0,2) R_1^(0,2) R_2^(0,2) ≡ L_1^(1) L_2^(1) R_1^(1) R_2^(1).
In summary, when the input matrix A is defined as a product of bidiagonal factors as in (2), one step of the LR algorithm can be performed as a sequence of m_L m_R LR transformations R_i L_j = L_j R_i for pairs of bidiagonal matrices, without forming A^(1) explicitly. Since each LR transformation can be done without subtractions by using the differential qd algorithm [1], one step of the LR algorithm can be carried out without subtractions. We call this the multiple dqd algorithm. The outline of this algorithm is shown below.
[Algorithm 1: The multiple dqd algorithm]
Initialization:
    q_{j,k}^(0,0) = q_{j,k} (1 ≤ j ≤ m_L, 1 ≤ k ≤ m)
    e_{i,k}^(0,0) = e_{i,k} (1 ≤ i ≤ m_R, 1 ≤ k ≤ m − 1)
 1: for n = 0, 1, ... do
 2:   for j = 1, ..., m_L do
 3:     for i = m_R, ..., 1 do
 4:       d_{j,1} = q_{j,1}^(n, m_R−i)
 5:       for k = 1, ..., m − 1 do
 6:         q_{j,k}^(n, m_R−i+1) = d_{j,k} + e_{i,k}^(n, j−1)
 7:         e_{i,k}^(n,j) = e_{i,k}^(n,j−1) q_{j,k+1}^(n, m_R−i) / q_{j,k}^(n, m_R−i+1)
 8:         d_{j,k+1} = d_{j,k} q_{j,k+1}^(n, m_R−i) / q_{j,k}^(n, m_R−i+1)
 9:       end for
10:       q_{j,m}^(n, m_R−i+1) = d_{j,m}
11:     end for
12:   end for
13:   q_{j,k}^(n+1,0) = q_{j,k}^(n, m_R) (1 ≤ j ≤ m_L, 1 ≤ k ≤ m)
14:   e_{i,k}^(n+1,0) = e_{i,k}^(n, m_L) (1 ≤ i ≤ m_R, 1 ≤ k ≤ m − 1)
15: end for
3. Global convergence properties
In this section, we establish a theorem that guarantees global convergence of the multiple dqd algorithm. It is proved by extending a technique used in the proof of global convergence of the dqds algorithm [7].
Theorem 1 Let the eigenvalues of a TN matrix defined by (2) be λ_1 > λ_2 > ··· > λ_m > 0. When the multiple dqd algorithm is applied to this matrix, the following equalities hold:

  lim_{n→∞} ∏_{j=1}^{m_L} q_{j,k}^(n,0) = λ_k  (1 ≤ k ≤ m),   (4)

  lim_{n→∞} e_{i,k}^(n,0) = 0  (1 ≤ i ≤ m_R, 1 ≤ k ≤ m − 1),   (5)

  lim_{n→∞} e_{i,k}^(n+1,0) / e_{i,k}^(n,0) = λ_{k+1} / λ_k  (1 ≤ k ≤ m − 1).   (6)

That is, each superdiagonal element e_{i,k}^(n,0) of R_i^(n) converges to zero linearly at a rate that depends only on k and not on i, and the product of the kth diagonal elements of L_1^(n), ..., L_{m_L}^(n) converges to the eigenvalue λ_k.
Proof  Algorithm 1 uses the differential qd transformation (lines 4 to 10) to implement the LR transformation

  R_i^(n,j−1) L_j^(n, m_R−i) = L_j^(n, m_R−i+1) R_i^(n,j).   (7)

However, for the purpose of the proof, it is more convenient to go back to the original equation (7). By comparing the diagonal elements of (7), we have

  q_{j,k}^(n, m_R−i+1) = q_{j,k}^(n, m_R−i) − e_{i,k−1}^(n,j) + e_{i,k}^(n,j−1)   (8)

for 1 ≤ k ≤ m, with the boundary condition e_{i,0}^(n,j) = e_{i,m}^(n,j) = 0. By summing up (8) over 1 ≤ i ≤ m_R and 0 ≤ l ≤ n, and noting that q_{j,k}^(n+1,0) > 0, we have

  0 < Σ_{l=0}^{n} Σ_{i=1}^{m_R} e_{i,k−1}^(l,j) < q_{j,k}^(0,0) + Σ_{l=0}^{n} Σ_{i=1}^{m_R} e_{i,k}^(l,j−1).   (9)

Using (9) repeatedly, noting that e_{i,m}^(l,j) = 0 and taking the limit n → ∞ [7], we have Σ_{n=0}^{∞} Σ_{i=1}^{m_R} e_{i,k}^(n,j) < ∞ for 1 ≤ k ≤ m − 1. Hence e_{i,k}^(n,j) → 0 and (5) is proved. Thus R_1^(n), ..., R_{m_R}^(n) tend to identity matrices. From (8) and e_{i,k}^(n,j) → 0 we know that each q_{j,k}^(n,i) converges to a constant that depends neither on n nor on i. Consequently, A^(n) converges to a lower triangular matrix whose kth diagonal element is ∏_{j=1}^{m_L} q_{j,k}^(n,0); these diagonal elements are in turn the eigenvalues of A. Using the same argument as in [7], we know that these eigenvalues appear in descending order of magnitude, as claimed by (4). Eq. (6) follows by multiplying line 7 of Algorithm 1 side by side from j = 1 to j = m_L and using (4).
4. Accuracy of the computed eigenvalues
Our next objective is to study the accuracy of the multiple dqd algorithm in finite precision arithmetic. To this end, we need to combine rounding error analysis with perturbation theory, as in the case of the dqds algorithm [1]. First, we quote a lemma concerning rounding errors resulting from a single LR transformation [1].
Lemma 2 Assume that {q̂_k}_{k=1}^m and {ê_k}_{k=1}^{m−1} are computed from {q_k}_{k=1}^m and {e_k}_{k=1}^{m−1} by the LR transformation of differential qd type using finite precision arithmetic. Then there exist {q̃_k}_{k=1}^m, {ẽ_k}_{k=1}^{m−1}, {q̌_k}_{k=1}^m and {ě_k}_{k=1}^{m−1} such that

• each q̃_k and ẽ_k differs from q_k and e_k by at most 3 ulps (units in the last place) and 1 ulp, respectively,

• each q̌_k and ě_k differs from q̂_k and ê_k by at most 2 ulps, respectively, and

• {q̌_k}_{k=1}^m and {ě_k}_{k=1}^{m−1} are computed from {q̃_k}_{k=1}^m and {ẽ_k}_{k=1}^{m−1} by an LR transformation in exact arithmetic.

As for the relative perturbation of the eigenvalues, Koev proves the following lemma [5].
Lemma 3 Let Ã be a matrix obtained by multiplying an arbitrary element of one of the bidiagonal factors of the right-hand side of (2) by 1 + ε, where |ε| ≪ 1. Denote the eigenvalue of Ã corresponding to λ_k by λ̃_k. Then the following bound holds:

  |λ̃_k − λ_k| ≤ (2|ε| / (1 − 2|ε|)) λ_k  (1 ≤ k ≤ m).   (10)
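Lemma 3 is easy to probe numerically. The check below is ours, not from the paper: it perturbs a single entry of a bidiagonal factor of a small A = L1 R1 and compares the resulting relative eigenvalue shifts against the bound (10).

```python
import numpy as np

# Perturb one factor entry of A = L1 R1 by 1 + eps and compare the relative
# eigenvalue changes with the Lemma 3 bound 2|eps| / (1 - 2|eps|).
eps = 1e-6
L1 = np.array([[3.0, 0, 0], [1, 2.0, 0], [0, 1, 0.5]])    # lower bidiagonal
R1 = np.array([[1.0, 0.4, 0], [0, 1.0, 0.7], [0, 0, 1.0]])  # upper bidiagonal
lam = np.sort(np.linalg.eigvals(L1 @ R1).real)

L1p = L1.copy()
L1p[1, 1] *= 1 + eps                   # perturb a factor entry, not A itself
lam_p = np.sort(np.linalg.eigvals(L1p @ R1).real)

bound = 2 * eps / (1 - 2 * eps)        # relative bound of (10)
shifts = np.abs(lam_p - lam) / lam     # observed relative shifts
```

Every entry of `shifts` stays below `bound`; this entrywise perturbation of the factored form is exactly what property (ii) of Section 1 refers to.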
By combining Lemmas 2 and 3, we can prove the following theorem.
Theorem 4 Assume that the multiple dqd algorithm is executed in finite precision arithmetic. Denote the matrix (defined implicitly as a product of bidiagonal factors) at the nth step by A^(n) and its eigenvalues by λ_1^(n), ..., λ_m^(n). Then for 1 ≤ k ≤ m,

  |λ_k^(n+1) − λ_k^(n)| ≤ (16 m m_L m_R u / (1 − 16 m m_L m_R u)) λ_k^(n),   (11)

where u denotes the machine epsilon.

Proof  From Lemma 2, we can decompose each LR transformation (7) in one step of the multiple dqd algorithm into three (virtual) steps:

(a) Multiply each diagonal element of the matrix L_j^(n, m_R−i) by 1 + ε_k and each superdiagonal element of R_i^(n,j−1) by 1 + δ_k, where |ε_k| ≤ 3u and |δ_k| ≤ u, to obtain L̃_j and R̃_i.

(b) Perform the exact LR transformation to get Ľ_j and Ř_i from L̃_j and R̃_i.

(c) Multiply each diagonal element of Ľ_j by 1 + ε̌_k and each superdiagonal element of Ř_i by 1 + δ̌_k, where |ε̌_k| ≤ 2u and |δ̌_k| ≤ 2u, to obtain L_j^(n, m_R−i+1) and R_i^(n,j).

We also recall a lemma from [8]: for positive integers m_1, m_2 and a sufficiently small positive number δ, if |δ_1| ≤ m_1δ/(1 − m_1δ) and |δ_2| ≤ m_2δ/(1 − m_2δ), then

  |(1 + δ_1)(1 + δ_2) − 1| ≤ (m_1 + m_2)δ / (1 − (m_1 + m_2)δ).   (12)
Fig. 1. Convergence of e_{i,k}^(n,0). (Line plot of e_{i,k}^(n,0) versus iterations 1-1000 on a logarithmic scale from 1 down to 10^{−100}; the labelled curves are e_{1,1}^(n,0), e_{2,1}^(n,0) and e_{3,1}^(n,0).)
Using Lemma 3 and (12) repeatedly, we know that the eigenvalues before and after step (a), which we denote by λ_k and λ̃_k, respectively, are related as follows:

  |λ̃_k − λ_k| ≤ (8mu / (1 − 8mu)) λ_k.   (13)

Clearly, step (b) does not change the eigenvalues. The eigenvalues after step (c), which we denote by λ̌_k, are related with λ̃_k as follows:

  |λ̌_k − λ̃_k| ≤ (8mu / (1 − 8mu)) λ̃_k.   (14)

Combining (13) and (14) using (12) again, and noting that m_L m_R LR transformations are needed to complete one multiple dqd step, we arrive at (11).
(QED)
Eq. (11) means that only a small relative error is introduced into each eigenvalue by one multiple dqd step. Consequently, by iterating the step until A^(n) becomes sufficiently close to a lower triangular matrix, all eigenvalues can be computed to high relative accuracy.
5. Numerical results
To confirm our analysis in the preceding sections, we performed preliminary numerical experiments. We set m = 10, m_L = 1 and m_R = 3 and set the values of e_{ik} and q_{ik} using random numbers in [0, 1]. The computation was done using Fortran in double-precision arithmetic. To check the accuracy of the computed eigenvalues, we also used Mathematica with 100 decimal digits, formed A = L_1 R_1 R_2 R_3 explicitly and computed its eigenvalues.
Fig. 1 shows e_{i,k}^(n,0) as a function of n. Clearly, all of them converge to zero linearly. Though there are 27 of them, only 9 lines can be seen in Fig. 1. This is because the convergence rate of e_{i,k}^(n,0) depends only on k, and therefore the lines for e_{1,1}^(n,0), e_{2,1}^(n,0) and e_{3,1}^(n,0), for example, overlap. This is in accordance with Theorem 1. The eigenvalues computed by the multiple dqd algorithm, as well as those computed by Mathematica, are shown in Table 1. It is clear that all the eigenvalues are computed to high relative accuracy. This confirms the analysis in the previous section.
Table 1. Accuracy of the computed eigenvalues.

k    Eigenvalue (multiple dqd)    Eigenvalue (Mathematica)
1    4.91157038895942             4.91157038895942
2    4.39994910448026             4.39994910448027
3    2.78858778700010             2.78858778700009
4    1.70154451982839             1.70154451982839
5    1.54018002523324             1.54018002523323
6    7.87221915454847 × 10^{−1}   7.87221915454843 × 10^{−1}
7    5.05289068997342 × 10^{−1}   5.05289068997343 × 10^{−1}
8    2.99791761043647 × 10^{−1}   2.99791761043646 × 10^{−1}
9    2.55626925974812 × 10^{−2}   2.55626925974812 × 10^{−2}
10   1.24375120694785 × 10^{−4}   1.24375120694785 × 10^{−4}
6. Relationship with the dhLV algorithm
Let us consider the case of m_L = 1 and m_R = M, where M is some positive integer. In this case it is well known [9] that the eigenvalue problem of A is equivalent to the eigenvalue problem of the (M + 1)m × (M + 1)m matrix C defined by

  C = ⎡              L_1 ⎤
      ⎢ R_M              ⎥
      ⎢      ⋱           ⎥   (15)
      ⎢        R_2       ⎥
      ⎣            R_1   ⎦

(all blank blocks are zero). More precisely, if λ_k is an eigenvalue of A, then (λ_k)^{1/(M+1)} exp(2πip/(M+1)) (0 ≤ p ≤ M) are eigenvalues of C, and vice versa. Furthermore, express the row index i of C as i = (l − 1)m + b (1 ≤ l ≤ M + 1, 1 ≤ b ≤ m) and permute the ith row to the i′th row, where i′ = (b − 1)(M + 1) + l. Apply the same permutation also to the columns. This amounts to replacing C with F = PCP^T, where P is a permutation matrix, and is called shuffling [9]. It is easy to see that F is a matrix with only two nonzero diagonals:

  F_{i+1,i} = 1  (1 ≤ i ≤ (M + 1)m − 1),
  F_{i,i+M} = F_{(b−1)(M+1)+l, b(M+1)+l−1}
            = q_{1,b}       (l = 1, 1 ≤ b ≤ m),
            = e_{M+2−l,b}   (2 ≤ l ≤ M + 1, 1 ≤ b ≤ m − 1).
By rewriting F_{i,i+M} as U_i (1 ≤ i ≤ (m − 1)M + m), it can be seen that F is exactly the type of matrix for which the dhLV algorithm [4] has been designed. Thus we can say that the class of matrices whose eigenvalues can be computed by the dhLV algorithm is a subset of the class of matrices whose eigenvalues can be computed accurately by the multiple dqd algorithm.
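The embedding (15) and the shuffling can be checked numerically. The sketch below builds C and F with NumPy; the exact block layout (L_1 in the top-right block, R_M, ..., R_1 on the block subdiagonal) is our reading of the display above, so treat it as an illustration rather than the authors' construction.

```python
import numpy as np

def cyclic_embedding(L1, Rs):
    """Block-cyclic matrix C of (15) for A = L1 R1 ... RM (mL = 1, mR = M):
    L1 in the top-right block, R_M, ..., R_1 on the block subdiagonal."""
    m, M = L1.shape[0], len(Rs)
    C = np.zeros(((M + 1) * m, (M + 1) * m))
    C[:m, M * m:] = L1                                  # top-right block
    for p in range(M):                                  # R_M, ..., R_1 below
        C[(p + 1) * m:(p + 2) * m, p * m:(p + 1) * m] = Rs[M - 1 - p]
    return C

def shuffle(C, m, M):
    """F = P C P^T with the permutation i = (l-1)m + b -> i' = (b-1)(M+1) + l."""
    n = (M + 1) * m
    perm = np.empty(n, dtype=int)
    for l in range(1, M + 2):
        for b in range(1, m + 1):
            perm[(b - 1) * (M + 1) + l - 1] = (l - 1) * m + b - 1
    return C[np.ix_(perm, perm)]
```

For such C, the eigenvalues raised to the (M+1)th power reproduce those of A = L_1 R_1 ··· R_M (each M+1 times), and F has ones on its first subdiagonal with the q's and e's on the Mth superdiagonal, as stated above.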
7. Related work
It is widely recognized that by representing a TN matrix as a product of positive bidiagonal factors, various linear algebra operations can be performed without subtractions [10, 11]. Using this fact, several highly accurate algorithms have been proposed for linear simultaneous equations [11], eigenvalue problems [5] and singular value problems [12] with TN coefficient matrices.

Among them, Koev's algorithm [5] can compute all the eigenvalues of a general TN matrix to high relative accuracy. It first reduces a TN matrix in factored form to a product of a lower bidiagonal matrix and an upper bidiagonal matrix using subtraction-free operations, and then computes the eigenvalues of the resulting matrix with the dqds algorithm. In this approach, the reduction phase requires O(m³) flops. In contrast, the multiple dqd algorithm applies the LR transformation directly to a matrix represented by (2). The computational work of one LR transformation is O(m m_L m_R). Hence the latter approach may be advantageous when m_L, m_R ≪ m and only a small number of eigenvalues of the smallest magnitude are required.
8. Conclusion
In this paper, we studied convergence properties and numerical properties of the differential qd algorithm for totally nonnegative band matrices. Our analysis shows that the algorithm is globally convergent and can compute all eigenvalues to high relative accuracy. These properties were confirmed by numerical experiments. Our future work includes introducing origin shifts into this algorithm to speed up the convergence. It is also a subject of our future research to further investigate the relationship between the multiple dqd algorithm and the dhLV algorithm.
Acknowledgements
The authors would like to thank Prof. Masashi Iwasaki, Prof. Satoshi Tsujimoto, Prof. Yoshimasa Nakamura, Ms. Akiko Fukuda and Mr. Kensuke Aishima for valuable discussions. This work is partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research.
References
[1] K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), 191–229.
[2] I. S. Dhillon and B. N. Parlett, Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, Lin. Alg. Appl., 387 (2004), 1–28.
[3] T. Ando, Totally positive matrices, Lin. Alg. Appl., 90 (1987), 165–219.
[4] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, The discrete hungry Lotka-Volterra system and a new algorithm for computing matrix eigenvalues, Inverse Problems, 25 (2009), 015007.
[5] P. Koev, Accurate eigenvalues and SVDs of totally nonnegative matrices, SIAM J. Matrix Anal. Appl., 27 (2005), 1–23.
[6] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.
[7] K. Aishima, T. Matsuo, K. Murota and M. Sugihara, On convergence of the dqds algorithm for singular value computation, SIAM J. Matrix Anal. Appl., 30 (2008), 522–537.
[8] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.
[9] D. S. Watkins, Product eigenvalue problems, SIAM Review, 47 (2005), 3–40.
[10] A. Berenstein, S. Fomin and A. Zelevinsky, Parametrizations of canonical bases and totally positive matrices, Adv. Math., 122 (1996), 49–149.
[11] M. Gasca and J. M. Peña, Total positivity and Neville elimination, Lin. Alg. Appl., 165 (1992), 25–44.
[12] J. Demmel, Accurate singular value decomposition of structured matrices, SIAM J. Matrix Anal. Appl., 21 (1999), 562–580.
JSIAM Letters Vol.1 (2009) pp.60–63 ©2009 Japan Society for Industrial and Applied Mathematics
Algorithm for computing Kronecker basis
Yoshiaki Kakinuma¹, Kazuyuki Hiraoka¹, Hiroki Hashiguchi¹, Yutaka Kuwajima¹ and Takaomi Shigehara¹
¹Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan
E-mail [email protected]
Received March 18, 2009, Accepted September 16, 2009
Abstract
To make clear the geometrical structure of an arbitrarily given pencil, it is crucial to understand the Kronecker structure of the pencil. For this purpose, GUPTRI is the only practical numerical algorithm at present. However, although GUPTRI determines the Kronecker canonical form (KCF), it does not give any direct information on Kronecker bases (KB). In this paper, we propose a numerical algorithm which gives full information on the Kronecker structure, including a KB as well as the KCF. The main ingredient of the algorithm is singular value decompositions, which guarantee the backward stability of the algorithm.
Keywords pencil, Kronecker canonical form, Kronecker basis
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
Let (f, g)_{V,W} be a pencil, namely a pair of linear mappings f, g between two finite-dimensional linear spaces V and W over C. It is known [1] that for an arbitrary (f, g)_{V,W}, there exists a Kronecker basis (KB), namely a pair of bases 〈x_j〉, 〈y_j〉 of V and W composed of sequences, each of which satisfies one of the five diagrams

(R)  0 ←(f−µg)− x_1 −(g)→ y_1 ←(f−µg)− x_2 −(g)→ ··· ←(f−µg)− x_l −(g)→ y_l,
(S1) 0 ←(f)− x_1 −(g)→ y_1 ←(f)− x_2 −(g)→ ··· ←(f)− x_l −(g)→ 0,
(S2) 0 ←(f)− x_1 −(g)→ y_1 ←(f)− x_2 −(g)→ ··· ←(f)− x_l −(g)→ y_l,
(S3) y_0 ←(f)− x_1 −(g)→ y_1 ←(f)− x_2 −(g)→ ··· ←(f)− x_l −(g)→ 0,
(S4) y_0 ←(f)− x_1 −(g)→ y_1 ←(f)− x_2 −(g)→ ··· ←(f)− x_{l−1} −(g)→ y_{l−1},

where µ is a nonzero constant associated to the R sequence and l ≥ 1 is the length of each sequence. Matrix representations of R, S1, S2, S3 and S4 sequences of length l lead to the Kronecker blocks J_l(µ), L_{l−1}, J_l(0), N_l and L^T_{l−1}, respectively, in the Kronecker canonical form (KCF) for (f, g)_{V,W} in the standard notation. If V = W and g = i (the identity transformation), a KB is just a Jordan basis (JB) of V, composed of only R and S2 sequences. In this special case, the constant µ corresponds to a nonzero eigenvalue of f. For the general case, we will show later in this paper that µ corresponds to a nonzero eigenvalue associated to a regular linear transformation g_s^{−1} f_s naturally induced from the original pencil (f, g)_{V,W}.

At present, the most reliable and the only practical numerical algorithm to compute the KCF for an arbitrary pencil is GUPTRI [2, 3]. However, it cannot give any direct information on KBs for the pencil. In this paper, we propose a novel numerical algorithm to compute a KB as well as the KCF for an arbitrary pencil, under the premise that the eigenvalues of the linear transformation g_s^{−1} f_s are separately computed. The algorithm is based on a recently found constructive proof for the existence of a KB which reveals a multilayered geometrical structure inherent in the pencils [4]. After outlining theoretical issues, we describe the algorithm in detail. Numerical examples to test the numerical accuracy of the algorithm are also reported.
The paper is organized as follows. In Section 2, we illustrate the essentials of [4] through a simple but generic example, which serves to understand the subsequent sections. On the basis of Section 2, the algorithm for computing a KB is presented in Section 3 without relying on matrix representations, so that it is described in a basis-independent, unique form. After a short discussion on a possible matrix representation in Section 4, numerical examples are shown to confirm the numerical accuracy of the algorithm in Section 5.
2. Sketch of theoretical aspects
Hereafter, we assume that V and W are unitary spaces over C. For a linear mapping h, in general, denote the kernel, the image and the adjoint mapping of h by N(h), R(h) and h*, respectively.
Definition 1 For a pencil (f, g)_{V,W}, define a pencil (f′, g′)_{V′,W′} by

  V′ ≡ R(f) ∩ R(g) ⊂ W,
  W′ ≡ R(f*) ∩ R(g*) ⊂ V,
  f′ ≡ i*_{R(g*)←W′} d_g^{−1} i_{R(g)←V′},
  g′ ≡ i*_{R(f*)←W′} d_f^{−1} i_{R(f)←V′},   (1)

where d_h : R(h*) → R(h) is the restriction of h to R(h*) for each h = f, g, and i_{U_1←U_2} is, in general, the inclusion from a subspace U_2 of U_1 into U_1.

Note that the operation of the adjoint mapping i*_{U_1←U_2} on U_1 is the orthogonal projection from U_1 to U_2. The assertion below represents the importance of the pencil (f′, g′)_{V′,W′}.
Assertion 2 Every R sequence in a KB for (f, g)_{V,W} is obtained by lifting a one-to-one corresponding R sequence with the same µ and l in a KB for (f′, g′)_{V′,W′}, while every Si sequence with length l ≥ 2 in a KB for (f, g)_{V,W} is obtained by lifting a one-to-one corresponding Si sequence with length l − 1 in a KB for (f′, g′)_{V′,W′} (i = 1, ..., 4). Supplying Si sequences of length 1 (i = 1, ..., 4), we can construct a KB for (f, g)_{V,W} from a KB for (f′, g′)_{V′,W′}.
To confirm this, Theorem 3 plays a crucial role.
Theorem 3 (i) Let x ∈ V and let p be the orthogonal projection from V to W′. Then we have g′(f(x)) = p(x) if f(x) ∈ V′. Similarly, f′(g(x)) = p(x) if g(x) ∈ V′.
(ii) The following (a) and (b) are equivalent for y_1, y_2 ∈ V′: (a) there exists x ∈ V such that y_1 = f(x) and y_2 = g(x); (b) f′(y_2) = g′(y_1).
To illustrate Assertion 2, consider a simple but generic example. Suppose that dim V = 9, dim W = 8, dim V′ = 5, dim N(f) = dim N(g) = 3, dim(N(f) + N(g)) = 5, and that (f′, g′)_{V′,W′} has a KB composed of three sequences:

1′) 0 ←(f′−µg′)− y_{1;1} −(g′)→ z_{1;1} ←(f′−µg′)− y_{1;2} −(g′)→ z_{1;2} (µ ≠ 0),
2′) 0 ←(f′)− y_{2;1} −(g′)→ z_{2;1},
3′) 0 ←(f′)− y_{3;1} −(g′)→ z_{3;1} ←(f′)− y_{3;2} −(g′)→ 0.

Note that the assumption leads to dim W′ = 4, dim(N(f) ∩ N(g)) = 1, dim R(f) = dim R(g) = 6 and dim(R(f) + R(g)) = 7. In this setting, we can find a KB for (f, g)_{V,W} composed of six sequences:

1) 0 ←(f−µg)− x_{1;1} −(g)→ y_{1;1} ←(f−µg)− x_{1;2} −(g)→ y_{1;2},
2) 0 ←(f)− x_{2;1} −(g)→ y_{2;1} ←(f)− x_{2;2} −(g)→ y_{2;2},
3) 0 ←(f)− x_{3;1} −(g)→ y_{3;1} ←(f)− x_{3;2} −(g)→ y_{3;2} ←(f)− x_{3;3} −(g)→ 0,
4) 0 ←(f)− x_{4;1} −(g)→ 0,
5) y_{5;0} ←(f)− x_{5;1} −(g)→ 0,
6) y_{6;0} ∈ N(f*) ∩ N(g*).
To see this, we need three steps.
(I) Existence of x_{1;1}, x_{1;2}, x_{2;1}, x_{2;2}, x_{3;1}, x_{3;2}, x_{3;3} ∈ V, y_{2;2} ∈ W in 1)–3): By Theorem 3 (ii), there exist x_{2;1} in 2) and x_{3;1}, x_{3;2}, x_{3;3} in 3). Since y_{2;1} ∈ V′ in 2), there exists x_{2;2} such that y_{2;1} = f(x_{2;2}). We set y_{2;2} = g(x_{2;2}). Now we show the existence of x_{1;1}, x_{1;2} in 1). The sequence 1′) indicates

  f′(y_{1;1}) = µz_{1;1} = g′(µy_{1;1}),
  f′(y_{1;2}) = µz_{1;2} + z_{1;1} = g′(µy_{1;2} + y_{1;1}).

Hence, by Theorem 3 (ii), there exist x_{1;1}, x_{1;2} such that

  f(x_{1;1}) = µy_{1;1},  g(x_{1;1}) = y_{1;1},
  f(x_{1;2}) = µy_{1;2} + y_{1;1},  g(x_{1;2}) = y_{1;2},

and these vectors satisfy the diagram 1).
(II) Construction of the basis of V: By the construction of x_{1;1}, x_{1;2}, x_{2;2}, x_{3;2} and by Theorem 3 (i), we have

  p(x_{1;1}) = µz_{1;1},  p(x_{1;2}) = µz_{1;2} + z_{1;1},
  p(x_{2;2}) = z_{2;1},  p(x_{3;2}) = z_{3;1}.

By the assumption, the four vectors on the right-hand side are a basis of W′ since µ ≠ 0. Since W′⊥ = N(f) + N(g), we confirm that x_{1;1}, x_{1;2}, x_{2;2}, x_{3;2} are a basis of a complementary space of N(f) + N(g) in V. Thus we can construct a basis of V by appending a basis of N(f) + N(g) to this. The vectors x_{2;1} in 2) and x_{3;1} in 3) belong to N(f) by construction. Furthermore, y_{2;1} = g(x_{2;1}), y_{3;1} = g(x_{3;1}) are a basis of N(f′). Thus we confirm that x_{2;1}, x_{3;1} are a basis of a complementary space of N(f) ∩ N(g) in N(f), since dim N(f) = 3 and dim(N(f) ∩ N(g)) = 1. Similarly, we confirm x_{3;3} ∈ N(g) − N(f) ∩ N(g). By taking x_{4;1} ∈ N(f) ∩ N(g) (x_{4;1} ≠ 0), x_{3;3}, x_{4;1} are a basis of a two-dimensional subspace of N(g). Furthermore, the subspace includes N(f) ∩ N(g). Hence, since dim N(g) = 3, we have a basis x_{3;3}, x_{4;1}, x_{5;1} of N(g) by appending x_{5;1} ∈ N(g) − N(f) ∩ N(g). Now we have a basis x_{1;1}, x_{1;2}, x_{2;1}, x_{2;2}, x_{3;1}, x_{3;2}, x_{3;3}, x_{4;1}, x_{5;1} of V which satisfies 1)–5) and the upper table in Table 1.

Table 1. Properties of the KB for (f, g)_{V,W}.

            ∈ N(g)              ∉ N(g)
  ∈ N(f)    x_{4;1}             x_{2;1}, x_{3;1}
  ∉ N(f)    x_{3;3}, x_{5;1}    x_{1;1}, x_{1;2}, x_{2;2}, x_{3;2}

            ∈ R(g)                                      ∉ R(g)
  ∈ R(f)    y_{1;1}, y_{1;2}, y_{2;1}, y_{3;1}, y_{3;2}  y_{5;0}
  ∉ R(f)    y_{2;2}                                     y_{6;0}
(III) Construction of the basis of W: Set y_{5;0} = f(x_{5;1}). By the construction of the basis of V in (II), the images of the six vectors x_{1;1}, x_{1;2}, x_{2;2}, x_{3;2}, x_{3;3}, x_{5;1} ∈ V − N(f) by f, namely

  f(x_{1;1}) = µy_{1;1},  f(x_{1;2}) = µy_{1;2} + y_{1;1},
  f(x_{2;2}) = y_{2;1},  f(x_{3;2}) = y_{3;1},
  f(x_{3;3}) = y_{3;2},  f(x_{5;1}) = y_{5;0},

are a basis of R(f). Similarly, the images of the six vectors x_{1;1}, x_{1;2}, x_{2;1}, x_{2;2}, x_{3;1}, x_{3;2} ∈ V − N(g) by g, namely

  g(x_{1;1}) = y_{1;1},  g(x_{1;2}) = y_{1;2},  g(x_{2;1}) = y_{2;1},
  g(x_{2;2}) = y_{2;2},  g(x_{3;1}) = y_{3;1},  g(x_{3;2}) = y_{3;2},

are a basis of R(g). Recalling that the five vectors y_{1;1}, y_{1;2}, y_{2;1}, y_{3;1}, y_{3;2} are a basis of V′ = R(f) ∩ R(g), we confirm that y_{5;0} ∈ R(f) − R(f) ∩ R(g), y_{2;2} ∈ R(g) − R(f) ∩ R(g), and that the seven vectors y_{1;1}, y_{1;2}, y_{2;1}, y_{2;2}, y_{3;1}, y_{3;2}, y_{5;0} are a basis of R(f) + R(g). Since dim W = 8, we have a basis of W by appending y_{6;0} ∈ (R(f) + R(g))⊥ = N(f*) ∩ N(g*) to this. Now we have a basis y_{1;1}, y_{1;2}, y_{2;1}, y_{2;2}, y_{3;1}, y_{3;2}, y_{5;0}, y_{6;0} of W which satisfies 1)–6) and the lower table in Table 1.
By the definition of (f′, g′)_{V′,W′}, if and only if both of f, g are bijective, we have V′ = W, W′ = V, leading to f′ = g^{−1}, g′ = f^{−1}. Otherwise, we have either dim V′ < dim W or dim W′ < dim V. Since V, W are finite-dimensional, by iterating the procedure to construct (f_j, g_j)_{V_j,W_j} ≡ (f′_{j−1}, g′_{j−1})_{V′_{j−1},W′_{j−1}} (j = 1, 2, ...) with the initial pencil (f_0, g_0)_{V_0,W_0} ≡ (f, g)_{V,W} several times (say s times), we reach a pencil (f_s, g_s)_{V_s,W_s} where both of f_s, g_s are bijective. For this pencil, g_s^{−1} f_s is a regular linear transformation on V_s, which has a JB of V_s. The JB immediately gives a KB for (f_s, g_s)_{V_s,W_s}, composed of only R sequences. By applying the above process to this successively, we can construct a KB for the original pencil (f, g)_{V,W}. Note that, if we know all the eigenvalues µ of g_s^{−1} f_s separately, a JB for g_s^{−1} f_s is obtained within the present framework, by finding S2 sequences for the pencil (f_s − µg_s, g_s)_{V_s,W_s} for each eigenvalue µ of g_s^{−1} f_s.
3. KB algorithm
The algorithm below computes a KB for (f_j, g_j)_{V_j,W_j} (j = s, s − 1, ..., 1, 0) successively; the sequences in the five sets R_j, S_{1;j}, S_{2;j}, S_{3;j} and S_{4;j} supply the R, S1, S2, S3 and S4 sequences in the KB for (f_j, g_j)_{V_j,W_j}, respectively. Hereafter, denote a sequence with the property

  y_0 ←(f_j−µg_j)− x_1 −(g_j)→ y_1 ←(f_j−µg_j)− x_2 −(g_j)→ ··· ←(f_j−µg_j)− x_l −(g_j)→ y_l

by c_j(µ) ≡ (y_0, x_1, y_1, ..., x_l, y_l). For µ = 0, c_j(0) is simply abbreviated to c_j.
KB Algorithm (KBA)
1. Define a series of pencils (f_j, g_j)_{V_j,W_j} (j = 0, 1, ..., s) recursively by (f_j, g_j)_{V_j,W_j} ≡ (f′_{j−1}, g′_{j−1})_{V′_{j−1},W′_{j−1}} (j = 1, 2, ..., s) from (f_0, g_0)_{V_0,W_0} ≡ (f, g)_{V,W}, where s is the minimum integer such that both of f_s, g_s are bijective.

2. If dim V_s = 0, set R_s = ∅. Otherwise, define the set R_s of R sequences for (f_s, g_s)_{V_s,W_s} such that the sequences in R_s give a KB for (f_s, g_s)_{V_s,W_s}. (See the last part of Section 2.) Set S_{1;s} = S_{2;s} = S_{3;s} = S_{4;s} = ∅.

3. Repeat the steps (a)–(f) for j = s, ..., 1:

(a) If R_j = ∅, set R_{j−1} = ∅. Otherwise, find the set R_{j−1} from R_j as follows. For each c_j(µ) = (z_0 = 0, y_1, z_1, ..., y_l, z_l) ∈ R_j, find a solution x_k of each linear system

  f_{j−1}(x_k) = µy_k + y_{k−1},  g_{j−1}(x_k) = y_k  (k = 1, ..., l),  y_0 ≡ 0.

Define LiftR(c_j(µ)) ≡ (y_0, x_1, y_1, ..., x_l, y_l) and set R_{j−1} = {LiftR(c_j(µ)) | c_j(µ) ∈ R_j}.

(b) Repeat the following procedure for i = 1, ..., 4. If S_{i;j} = ∅, set S_{i;j−1} = ∅. Otherwise, find the set S_{i;j−1} from S_{i;j} as follows. For each c_j = (z_0, y_1, z_1, ..., y_l, z_l), except c_j = (z) ∈ S_{4;j} (see Table 2 for z_0, z_l), find a solution x_k of each linear system

  f_{j−1}(x_k) = y_{k−1},  g_{j−1}(x_k) = y_k  (k = 1, ..., l + 1)

with y_0, y_{l+1} as in Table 2, and define LiftS(c_j) ≡ (y_0, x_1, y_1, ..., x_l, y_l, x_{l+1}, y_{l+1}). For c_j = (z) ∈ S_{4;j}, define LiftS(c_j) ≡ (f_{j−1}(z), z, g_{j−1}(z)). Finally, set S_{i;j−1} = {LiftS(c_j) | c_j ∈ S_{i;j}}.

(c) Set S_{1;j−1} = S_{1;j−1} ∪ {c_{j−1} = (0, x_t, 0) | t = 1, ..., q_1} with a basis x_1, ..., x_{q_1} of N(f_{j−1}) ∩ N(g_{j−1}).

(d) Set S_{2;j−1} = S_{2;j−1} ∪ {c_{j−1} = (0, u_t, g_{j−1}(u_t)) | t = 1, ..., q_2}, where u_1, ..., u_{q_2} are chosen such that the set {u_1, ..., u_{q_2}} ∪ {x_1 | c_{j−1} = (0, x_1, y_1, ..., x_l, y_l) ∈ S_{1;j−1} ∪ S_{2;j−1}} is a basis of N(f_{j−1}).

(e) Set S_{3;j−1} = S_{3;j−1} ∪ {c_{j−1} = (f_{j−1}(u_t), u_t, 0) | t = 1, ..., q_3}, where u_1, ..., u_{q_3} are chosen such that the set {u_1, ..., u_{q_3}} ∪ {x_l | c_{j−1} = (y_0, x_1, y_1, ..., x_l, 0) ∈ S_{1;j−1} ∪ S_{3;j−1}} is a basis of N(g_{j−1}).

(f) Set S_{4;j−1} = S_{4;j−1} ∪ {(y_t) | t = 1, ..., q_4} with a basis y_1, ..., y_{q_4} of N(f*_{j−1}) ∩ N(g*_{j−1}).

Table 2. Property of z_0, z_l and definition of y_0, y_{l+1}.

                 z_0        z_l        y_0            y_{l+1}
  c_j ∈ S_{1;j}  0          0          0              0
  c_j ∈ S_{2;j}  0          g_j(y_l)   0              g_{j−1}(x_{l+1})
  c_j ∈ S_{3;j}  f_j(y_1)   0          f_{j−1}(x_1)   0
  c_j ∈ S_{4;j}  f_j(y_1)   g_j(y_l)   f_{j−1}(x_1)   g_{j−1}(x_{l+1})
4. Matrix representation
We consider a possible matrix representation of the two core procedures required in KBA. One is to compute (f′, g′)_{V′,W′} from (f, g)_{V,W} in step 1. The other is to solve the linear systems in steps (a) and (b) of step 3.

For an m × n matrix pencil (F, G)_{V,W} with V = C^n, W = C^m, we can construct the m′ × n′ pencil (F′, G′)_{V′,W′} in three steps, where V′ = R(F) ∩ R(G) (n′ = dim V′) and W′ = R(F*) ∩ R(G*) (m′ = dim W′). Set r_H = rank H for each H = F, G.
1. Calculate the singular value decomposition (SVD) for each H = F, G:

H = I_{W←R(H)} D_H I*_{V←R(H*)},

where D_H ∈ C^{r_H×r_H} is a diagonal matrix with the nonzero singular values of H as diagonal entries, and I_{V←R(H*)} ∈ C^{n×r_H} and I_{W←R(H)} ∈ C^{m×r_H} are column-orthogonal matrices. Note that the column vectors of I_{V←R(H*)} and I_{W←R(H)} are right and left singular vectors associated to nonzero singular values of H, respectively.
2. For each (F̃, G̃, m̃, W̃, Ṽ′) = (F, G, m, W, V′), (F*, G*, n, V, W′), calculate a basis c_1, . . . , c_r of the kernel of (I_{W̃←R(F̃)}, −I_{W̃←R(G̃)}) ∈ C^{m̃×(r_F+r_G)}, and define I_{R(F̃)←Ṽ′} ∈ C^{r_F×r} and I_{R(G̃)←Ṽ′} ∈ C^{r_G×r} by setting

( I_{R(F̃)←Ṽ′} )
( I_{R(G̃)←Ṽ′} ) = (c_1, . . . , c_r) ∈ C^{(r_F+r_G)×r}.
3. Finally, calculate the matrix products

F′ = I*_{R(G*)←W′} D_G^{−1} I_{R(G)←V′} ∈ C^{m′×n′},
G′ = I*_{R(F*)←W′} D_F^{−1} I_{R(F)←V′} ∈ C^{m′×n′}.
Note that since

I_{W←V′} ≡ I_{W←R(F)} I_{R(F)←V′} = I_{W←R(G)} I_{R(G)←V′} ∈ C^{m×r}

by construction in step 2, the column vectors of I_{W←V′} are a basis of V′ ⊂ W and, in particular, we confirm r = dim V′. The computation of c_1, . . . , c_r is also carried out by SVD.
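The three-step construction above can be sketched with NumPy's SVD routines. This is a minimal illustration under generic rank assumptions, not the authors' implementation; the helper names `_colspace`, `_nullspace` and `reduced_pencil` are ours:

```python
import numpy as np

def _colspace(M, tol=1e-10):
    # SVD M = U * diag(s) * Vh; return the left singular vectors,
    # nonzero singular values, and right singular vectors, i.e.
    # I_{W<-R(M)}, D_M, I_{V<-R(M*)} in the notation of the text.
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return U[:, :r], s[:r], Vh[:r].T.conj()

def _nullspace(M, tol=1e-10):
    # Orthonormal basis of ker(M), also via SVD.
    U, s, Vh = np.linalg.svd(M)
    r = int(np.sum(s > tol * (s[0] if s.size else 1.0)))
    return Vh[r:].T.conj()

def reduced_pencil(F, G, tol=1e-10):
    """One reduction step (F, G) -> (F', G') following steps 1-3."""
    UF, sF, VF = _colspace(F, tol)   # step 1 for F
    UG, sG, VG = _colspace(G, tol)   # step 1 for G
    rF, rG = len(sF), len(sG)
    # Step 2 applied to (F, G, m, W, V'): the kernel of
    # (I_{W<-R(F)}, -I_{W<-R(G)}) parametrizes V' = R(F) n R(G).
    C = _nullspace(np.hstack([UF, -UG]), tol)
    IRF_Vp, IRG_Vp = C[:rF], C[rF:]
    # Step 2 applied to (F*, G*, n, V, W'): same construction for W'.
    D = _nullspace(np.hstack([VF, -VG]), tol)
    IRFs_Wp, IRGs_Wp = D[:rF], D[rF:]
    # Step 3: the reduced m' x n' pencil.
    Fp = IRGs_Wp.conj().T @ np.diag(1.0 / sG) @ IRG_Vp
    Gp = IRFs_Wp.conj().T @ np.diag(1.0 / sF) @ IRF_Vp
    return Fp, Gp
```

For a random real 4 × 3 pencil the generic dimensions are n′ = 2 (two 3-dimensional ranges intersecting in C^4) and m′ = 3.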
Let y_f, y_g ∈ V′. The linear system for x ∈ V,

( F )     ( y_f )
( G ) x = ( y_g ) ,    (2)
JSIAM Letters Vol. 1 (2009) pp.60–63 Yoshiaki Kakinuma et al.
is written as

( I_{W←R(F)} D_F I*_{V←R(F*)} )     ( I_{W←V′} y′_f )
( I_{W←R(G)} D_G I*_{V←R(G*)} ) x = ( I_{W←V′} y′_g ) ,
where y′_f, y′_g ∈ C^{n′} are the coordinate vectors of y_f, y_g with respect to the basis of V′ determined by the column vectors of I_{W←V′}. Thus the linear system in (2) is equivalent to

( D_F I*_{V←R(F*)} )     ( I_{R(F)←V′} y′_f )
( D_G I*_{V←R(G*)} ) x = ( I_{R(G)←V′} y′_g ) .
Within a finite-precision computation, the equation might be overdetermined in general. A possible solver in numerics is the least squares method, where the SVD (Moore-Penrose inverse) plays a crucial role.
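The least-squares solve of the stacked system can be sketched as follows; `np.linalg.lstsq` applies exactly the SVD-based (Moore-Penrose) machinery mentioned above, with `rcond` playing the role of a relative cut-off. The function name and interface are ours:

```python
import numpy as np

def solve_stacked(F, G, yf, yg, rcond=1e-8):
    # Solve (F; G) x = (yf; yg) in the least-squares sense via the
    # Moore-Penrose pseudoinverse; singular values below rcond relative
    # to the largest are discarded.
    A = np.vstack([F, G])
    b = np.concatenate([yf, yg])
    x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=rcond)
    return x
```

When the stacked system is consistent, the least-squares solution reproduces the exact one.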
5. Numerical experiment
Numerical computation is carried out in double-precision arithmetic. As described in the previous section, the main ingredient of KBA is the SVD. To keep numerical accuracy, a cut-off parameter ε is required for removing small singular values whose size relative to the maximum singular value is less than ε. At each stage involved in (F_j, G_j)_{V_j,W_j} (j = 1, . . . , s), we introduce two parameters: ε_{j;1} for computing the SVD of F_j, G_j and the kernels, and ε_{j;2} to solve the linear systems. For the moment, we use a common cut-off parameter ε_{j;1} = ε_{j;2} = 10^{−8} (j = 1, . . . , s). This value is adopted as the default value of the cut-off parameter EPSU for the SVD in the double-precision GUPTRI routine in LAPACK.
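The cut-off rule can be stated as a tiny helper (a sketch; the function name is ours, and NumPy returns singular values in descending order):

```python
import numpy as np

def numerical_rank(A, eps=1e-8):
    # Rank of A with singular values below eps * sigma_max treated as
    # zero, mirroring the relative cut-off used for each SVD in KBA.
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > eps * s[0]))
```

With the default ε = 10^{−8}, a singular value of relative size 10^{−12} is discarded, while 10^{−4} is kept.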
To confirm the numerical accuracy, we examine the maximum relative error involved in the sequences in the KB:

E_K ≡ max_{c∈K} max( ‖FX_c − µY_{c;g} − Y_{c;f}‖_∞ / ‖FX_c‖_∞ , ‖GX_c − Y_{c;g}‖_∞ / ‖GX_c‖_∞ ).

Here K ≡ R_0 ∪ (∪_{i=1}^{4} S_{i;0}) is the output of KBA, namely the set of the sequences giving rise to a KB for the input pencil (F, G)_{V,W}. For each c ≡ c_0(µ) = (y_0, x_1, y_1, . . . , x_l, y_l) ∈ K, we set X_c = (x_1, . . . , x_l), Y_{c;f} = (y_0, . . . , y_{l−1}) and Y_{c;g} = (y_1, . . . , y_l). Note that FX_c − µY_{c;g} − Y_{c;f} = GX_c − Y_{c;g} = O in infinite-precision computation.
We examine two types of test matrices:

Type-A (generic): m × n matrices F, G with random integers m, n ∈ [100, 110] (m ≠ n) and random numbers in the range [−1, 1] for the elements.

Type-B (non-generic): F − λG = PK(λ)Q^{−1}, where P, Q are invertible matrices with random numbers in the range [−1, 1] for the elements, and
K(λ) = (⊕_{k_1=1}^{n_1} J_{l_{k_1}}(µ_{k_1})) ⊕ (⊕_{k_2=1}^{n_2} L_{l_{k_2}}) ⊕ (⊕_{k_3=1}^{n_3} J_{l_{k_3}}(0)) ⊕ (⊕_{k_4=1}^{n_4} N_{l_{k_4}}) ⊕ (⊕_{k_5=1}^{n_5} L^T_{l_{k_5}})

is a KCF with random integers n_j ∈ [1, 5] (j = 1, . . . , 5), random integers l_{k_j} ∈ [1, 5] (k_j = 1, . . . , n_j; j = 1, . . . , 5) and random numbers µ_{k_1} ∈ (0, 10] (k_1 = 1, . . . , n_1).
Table 3. Distribution of E_K.

relative error                 frequency
                               Type-A    Type-B
10^{−2}  < E_K                      0        14
10^{−4}  < E_K ≤ 10^{−2}            0         0
10^{−6}  < E_K ≤ 10^{−4}            0         6
10^{−8}  < E_K ≤ 10^{−6}            0         3
10^{−10} < E_K ≤ 10^{−8}            0        33
10^{−12} < E_K ≤ 10^{−10}           0       411
10^{−14} < E_K ≤ 10^{−12}          72       533
           E_K ≤ 10^{−14}         928         0
As is known for non-square m × n generic pencils (d ≡ |n − m| ≠ 0), we have, for n − m > 0, K(λ) = (⊕_{k=1}^{s} L_l) ⊕ (⊕_{k′=1}^{s′} L_{l+1}) with l = [m/d], s′ = m − dl, s = d − s′, while for m − n > 0, K(λ) = (⊕_{k=1}^{s} L^T_l) ⊕ (⊕_{k′=1}^{s′} L^T_{l+1}) with l = [n/d], s′ = n − dl, s = d − s′. Type A is expected to simulate a generic case. Meanwhile, Type B has a non-trivial general Kronecker structure by construction. The middle (right) column in Table 3 shows the distribution of E_K for 1000 samples of Type-A (Type-B) matrix pencils. Note for Type-B that, at the final stage involved in the regular pencil (F_s, G_s)_{V_s,W_s}, we use the exact eigenvalues determined from K(λ) as input for computing a KB for (F_s, G_s)_{V_s,W_s}. Thus the experimental results of E_K for Type-B below directly show the numerical error caused by KBA.
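The generic block counts can be computed directly from m and n; the following sketch (name ours) mirrors the formulas above:

```python
def generic_kcf_blocks(m, n):
    # Block sizes of the generic KCF of an m x n pencil (m != n):
    # d = |n - m| blocks in total, of sizes l and l + 1, all of type L
    # (when n > m) or L^T (when m > n).
    d = abs(n - m)
    p = min(m, n)
    l = p // d          # l = [min(m, n) / d]
    s_prime = p - d * l  # number of blocks of size l + 1
    s = d - s_prime      # number of blocks of size l
    return l, s, s_prime
```

For instance, a 100 × 110 generic pencil decomposes into ten L_10 blocks (l = 10, s = 10, s′ = 0), which is consistent with L_l being of size l × (l + 1).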
For Type-A, we can confirm E_K ≃ 10^{−12} even in the worst case. We also numerically confirmed that the KCF is of generic type in all cases, as expected.

For Type-B, we can confirm E_K ≤ 10^{−8} (the value of the common cut-off parameter) in 977 cases. Though E_K > 10^{−8} in the remaining 23 cases, we confirmed in all of them that E_K is made less than 10^{−8} if the two cut-off parameters ε_{j;1}, ε_{j;2} are appropriately adjusted in the range [10^{−15}, 10^{−7}] at each iterative step (j = 1, . . . , s). In addition, we observed that KBA works well even with the eigenvalues numerically computed for (F_s, G_s)_{V_s,W_s}, if we use an average for closely-spaced eigenvalues.

As is well known, the determination of the Kronecker structure is an essentially ill-conditioned problem in general. In particular, round-off error in numerics might reduce non-generic Kronecker structures to generic ones. In the present implementation, we numerically confirmed for Type-B matrix pencils that KBA succeeds in reproducing the original KCF in 97% of all cases. An extensive analysis of the numerical stability of KBA is one of the main issues for future work.
JSIAM Letters Vol.1 (2009) pp.64–67 ©2009 Japan Society for Industrial and Applied Mathematics
Robust exponential hedging in a Brownian setting
Keita Owari1
Graduate School of Economics, Hitotsubashi University, 2-1, Naka, Kunitachi, 186-8601, Japan1
E-mail [email protected]
Received September 29, 2009, Accepted October 15, 2009
Abstract
This paper studies the robust exponential hedging in a Brownian factor model, giving a solvable example using a PDE argument. The dual problem is reduced to a standard stochastic control problem, of which the HJB equation admits a classical solution. The optimal strategy will be expressed in terms of the solution to the HJB equation.
Keywords robust utility maximization, stochastic control, duality
Research Activity Group Mathematical Finance
1. Introduction
This paper aims to provide a solvable example for the robust exponential hedging problem studied by [1]:

minimize sup_{P∈P} E_P[e^{−α(θ·S_T − H)}] over θ ∈ Θ. (1)

Here S is a d-dim. càdlàg locally bounded semimartingale on a filtered probability space (Ω, F, (F_t)_{t∈[0,T]}, R), P is a convex set of probability measures absolutely continuous w.r.t. R, H is a random variable and Θ is a set of admissible integrands for S. The set P is a mathematical expression of model uncertainty, and (1) is equivalent to the maximization of the robust exponential utility from the net terminal wealth for the seller of the claim H.
The problem (1) is solved via its dual:

minimize H(Q|P) − αE_Q[H] over (Q, P) ∈ Q_f × P, (2)

where H(·|·) denotes the relative entropy, and Q_f is the set of R-absolutely continuous local martingale measures for S having finite relative entropy with some P ∈ P.
Assume:

(A1) {dP/dR : P ∈ P} is weakly compact in L^1(R).

(A2) Q^e_f(S) := {Q ∈ Q_f : Q ∼ R} ≠ ∅.

(A3) {e^{α|H|} dP/dR : P ∈ P} is uniformly integrable and sup_{P∈P} E_P[e^{(α+ε)|H|}] < ∞ for some ε > 0.
Under (A1)–(A3), [1] shows that the dual problem (2) of (1) admits a solution (Q̂_H, P̂_H) ∈ Q_f × P which is maximal in that if (Q̃, P̃) ∈ Q_f × P is another solution, then P̃ ≪ P̂_H and dQ̃/dP̃ = dQ̂_H/dP̂_H, P̃-a.s. This solution has a kind of martingale representation:

dQ̂_H/dP̂_H = c · e^{−α(θ̂·S_T − H)}, Q̂_H-a.s., (3)

where c is a constant, and θ̂ is a predictable (S, Q̂_H)-integrable process such that θ̂ · S is a Q̂_H-martingale. Finally, if we assume additionally:

(A4) Q̂_H ∼ R,

the strategy θ̂ is shown to be optimal for (1) with the admissible class Θ_H defined as the set of all (S, R)-integrable predictable processes θ such that θ · S is a martingale under all Q ∈ Q_f with H(Q|P̂_H) < ∞.
In the sequel, we investigate this problem in a specific setting for which the optimal strategy θ̂ is explicitly represented, using a standard stochastic control technique.
2. Main results
This section states the main results of this paper. All proofs are collected in Section 4.
2.1 Setup
Let W = (W^1, W^2) be a 2-dimensional R-Brownian motion and (F_t)_{t∈[0,T]} be its augmented natural filtration. Suppose that the price process S is given by the SDE:

dS_t = S_t(b(Y_t)dt + σ(Y_t)dW^1_t),
dY_t = g(Y_t)dt + ρ dW^1_t + ρ̄ dW^2_t, (4)

where ρ ∈ [−1, 1] and ρ̄ = √(1 − ρ^2). The set P of candidate models is given as follows. Let C be a convex compact subset of R^2 containing the origin, and I_P be the set of 2-dimensional predictable C-valued processes. Then we set

P := { P^ν ∼ R : dP^ν/dR = E_T(−ν · W), ν ∈ I_P }, (5)

where E(M) := exp(M − ⟨M⟩/2) denotes the Doléans-Dade exponential of a continuous local martingale M. Finally, the claim H is assumed to be of the form H = h(Y_T) for some measurable function h.
Remark 1 A typical situation underlying our setup is as follows. A financial institution sells an option written on an untradable index Y, and wants to maximize her utility by trading an asset S which is correlated to Y. However, the probabilistic model of the assets (S, Y) is uncertain in its expected rate of return (drift, in mathematical language). Actually, the dynamics under the probability P^ν is:

dS_t = S_t((b(Y_t) − ν^1_t σ(Y_t))dt + σ(Y_t)dW^{1,ν}_t),
dY_t = (g(Y_t) − ρν^1_t − ρ̄ν^2_t)dt + ρ dW^{1,ν}_t + ρ̄ dW^{2,ν}_t.

In this context, we can know only the range of the drift through the set C appearing in the definition of P.
In what follows, we assume:

(B1) b, σ, g ∈ C^2_b(R), where C^2_b(R) = {f ∈ C^2(R) : f, f′, f″ are bounded}.

(B2) For some k > 0, σ(y) ≥ k for all y.

(B3) h ∈ C^2(R), h′ is bounded and h″ has polynomial growth.
Our first task is to check:

Lemma 2 Under (B1)–(B3), the conditions (A1)–(A4) of [1] are satisfied.

Once this lemma is established, an optimal strategy θ̂ will be derived via (i) solving the dual problem (2), and (ii) finding θ̂ satisfying (3).
Remark 3

(I) In this setting, we can show that

H(Q|P) < ∞ for some P ∈ P ⇔ H(Q|R) < ∞, (6)

for all local martingale measures Q. In particular, Θ_H is characterized as the class of predictable (S, R)-integrable processes θ such that θ · S is a martingale under all absolutely continuous local martingale measures Q with H(Q|R) < ∞. This condition is further reduced to "all equivalent martingale measures with...". Therefore, the class Θ_H is actually independent of P̂_H, hence of H. This point is conceptually important, since a dependence of Θ on P̂_H, which is a part of the solution to the dual problem, would imply that we cannot specify the admissible class for the primal problem until we solve the dual problem.

(II) For our purpose, it suffices to consider Q^e_f as the domain of the dual problem, since we already know that a solution to the dual problem is obtained in Q^e_f × P.
Let I_M be the set of predictable processes η with E_R[∫_0^T η_t^2 dt] < ∞ and E_R[E_T(−(λ(Y), η) · W)] = 1, where λ := b/σ, and

dQ^η/dR := E_T(−(λ(Y), η) · W), η ∈ I_M. (7)

Then Q^e_f = {Q^η : η ∈ I_M}.
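As a quick sanity check on density processes of the form (7) and (5), one can sample the stochastic exponential for a constant ν and verify that it averages to one; this is an illustrative simulation (names and parameters ours), not part of the paper's argument:

```python
import numpy as np

def doleans_dade_samples(nu, T=1.0, n_paths=200_000, seed=1):
    # Samples of E_T(-nu . W) = exp(-nu . W_T - |nu|^2 T / 2) for a
    # constant two-dimensional nu; for constant nu only W_T matters.
    # The stochastic exponential is a positive martingale with mean 1,
    # as required of a change-of-measure density.
    rng = np.random.default_rng(seed)
    WT = rng.normal(0.0, np.sqrt(T), size=(n_paths, 2))
    nu = np.asarray(nu, dtype=float)
    return np.exp(-(WT @ nu) - 0.5 * float(nu @ nu) * T)
```

The sample mean should be close to 1 up to Monte Carlo error.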
2.2 Dual problem

Let

J^{η,ν}_t := E^η[ αh(Y_T) − (1/2) ∫_t^T ‖ν_s − (λ(Y_s), η_s)′‖^2 ds | F_t ],

where E^η[·] denotes the expectation under Q^η, "′" is the transpose, and ‖·‖ is the Euclidean norm of R^2. The dual problem (2) is now reduced to the following stochastic control problem:

maximize J^{η,ν}_0 among (η, ν) ∈ I_M × I_P. (8)
For each constant η ∈ R, set

A^η := (g − ρλ − ρ̄η)∂_y + (1/2)∂_{yy} = A^0 − ρ̄η∂_y, (9)

where ∂_y := ∂/∂y, ∂_{yy} := ∂^2/∂y^2, etc. Then the HJB equation for (8) is formally given by

v_t + sup_{(η,ν)∈R×C} ( A^η v − (1/2)‖ν − (λ, η)′‖^2 ) = 0,
v(T, y) = αh(y). (10)
Theorem 4 The HJB equation (10) admits a unique classical solution v ∈ C^{1,2}((0, T) × R) ∩ C([0, T] × R) such that v_y := ∂_y v is bounded. Then we can choose measurable functions ν̂ : [0, T] × R → C and η̂ : [0, T] × R → R so that

ν̂(t, y) ∈ arg inf_{ν∈C} ( (1/2)(ν^1 − λ(y))^2 + ν^2 ρ̄ v_y(t, y) ),
η̂(t, y) = ν̂^2(t, y) − ρ̄ v_y(t, y),

and (ν̂_·, η̂_·) := (ν̂(·, Y_·), η̂(·, Y_·)) is an optimal control for (8). In particular, (Q^{η̂}, P^{ν̂}) is a solution to (2).
2.3 Optimal strategy

We now give a representation of an optimal strategy θ̂ via Theorem 4 and the duality result of [1].

Theorem 5 An optimal strategy for the problem (1) is given by

θ̂_t = ( ρ v_y(t, Y_t) + λ(Y_t) − ν̂^1(t, Y_t) ) / ( ασ(Y_t)S_t ). (11)
Remark 6 Here we give a brief review of related literature. In the case without uncertainty, i.e., P = {R} (⇔ C = {(0, 0)} in our setup), explicit solutions to exponential hedging through duality are studied by [2] using BSDE arguments with the help of Malliavin calculus, and by [3] using PDE arguments close to ours.

There are also a few recent works deriving explicit forms of optimal strategies for robust utility maximization. Our setup and the idea for the proof of Theorem 4 are due to [4], where robust power utility maximization is considered. See also [5] for the case of logarithmic utility.
3. Explicit examples

This section provides two explicit examples which may be reduced to linear PDEs, hence can be computed via both elementary numerical schemes and the Feynman-Kac formula. Recall that our model is characterized by the compact set C, and the HJB equation takes the form:

v_t + A^0 v + ρ̄^2 v_y^2 / 2 − l(y, v_y) = 0,
v(T, y) = αh(y),

where

l(y, p) := inf_{ν∈C} ( (1/2)(ν^1 − λ(y))^2 + ρ̄ ν^2 p ).

Thus, if l(y, p) can be explicitly calculated, then we can expect an explicit solution.
3.1 The case of a disk

We first consider the case where the set C is a disk in R^2 with radius r:

C = {x ∈ R^2 : ‖x‖ ≤ r}. (12)

But due to a technical difficulty, we assume that the drift b of S under R is identically zero, or equivalently, λ is identically zero. In this case,

l(y, p) = inf_{‖ν‖≤r} ( ν_1^2/2 + ρ̄ ν_2 p ) = −rρ̄|p|,

and ν̂(y, p) = (0, −r · sgn(p)) is a minimizer. Then the HJB equation is written as:

v_t + A^0 v + ρ̄^2 v_y^2 / 2 + rρ̄|v_y| = 0.

Now suppose that the payoff function h is non-increasing. Then, noting that the 1-dimensional stochastic flow associated to Y is order-preserving under (B1) and (B2), the value function is also decreasing in the y variable, hence v_y ≤ 0. Therefore the term rρ̄|v_y| in the equation is replaced by −rρ̄v_y. Moreover, changing the drift, the equation becomes:

v_t + A^{rρ̄} v + ρ̄^2 v_y^2 / 2 = 0.

Here A^{rρ̄} is the generator of Y under Q^{rρ̄}. Note that a simple calculation using the Itô formula yields:

d e^{ρ̄^2 v(t,Y_t)} = ρ̄^2 e^{ρ̄^2 v(t,Y_t)} v_y(t, Y_t) dW^{rρ̄}_t.

Thus e^{ρ̄^2 v(t,Y_t)} is a martingale, and since v(T, y) = αh(y),

v(t, y) = (1/ρ̄^2) log E^{rρ̄}[ e^{αρ̄^2 h(Y_T)} | Y_t = y ] =: (1/ρ̄^2) log v̄(t, y).

Now the Feynman-Kac formula yields:

Corollary 7 Suppose that C is given by (12), λ ≡ 0 and h is non-increasing. Then the value function is represented as

v(t, y) = (1/ρ̄^2) log v̄(t, y),

where v̄ is the solution to the Cauchy problem:

v̄_t + A^{rρ̄} v̄ = 0,
v̄(T, y) = e^{αρ̄^2 h(y)}. (13)

Furthermore, (η̂, ν̂) = (r − (1/ρ̄)(v̄_y/v̄)(·, Y), (0, r)) is an optimal control, and an optimal portfolio strategy is given by

θ̂_t = ( ρ / (αρ̄^2) ) · v̄_y(t, Y_t) / ( v̄(t, Y_t)σ(Y_t)S_t ). (14)

Remark 8 The case of non-decreasing h can be treated in a symmetric way.
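For concreteness, the conditional expectation in Corollary 7 can be evaluated by a Feynman-Kac Monte Carlo sketch: simulate Y under the drift-shifted measure and average. The functions g and h and all numerical parameters below are illustrative assumptions, and the drift shift g(y) − rρ̄ encodes our reading of the generator A^{rρ̄} (unit diffusion, since ρ^2 + ρ̄^2 = 1):

```python
import numpy as np

def disk_value_mc(t, y, T, g, h, alpha, rho, r, n_paths=50_000, n_steps=100, seed=0):
    # Monte Carlo evaluation of
    #   v(t, y) = (1 / rhobar^2) * log E[ exp(alpha * rhobar^2 * h(Y_T)) | Y_t = y ]
    # for the disk case of Corollary 7 (lambda = 0), using an Euler scheme
    # for Y with the assumed shifted drift g(y) - r * rhobar.
    rhobar = np.sqrt(1.0 - rho**2)
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    Y = np.full(n_paths, float(y))
    for _ in range(n_steps):
        Y += (g(Y) - r * rhobar) * dt + np.sqrt(dt) * rng.normal(size=n_paths)
    vbar = np.mean(np.exp(alpha * rhobar**2 * h(Y)))
    return np.log(vbar) / rhobar**2
```

A simple consistency check: for a constant payoff h ≡ c the formula collapses to v = αc, independently of the dynamics.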
3.2 The case of a rectangle

Let C be a rectangle in R^2, that is:

C = {x ∈ R^2 : |x_1| ≤ m_1, |x_2| ≤ m_2}. (15)

In this case,

l(y, p) = (1/2)(ν̂_1(y) − λ(y))^2 + ρ̄ ν̂_2(p) p = k(y; m_1)/2 − ρ̄ m_2 |p|,

where

ν̂_1(y) = sgn(λ(y))(|λ(y)| ∧ m_1), ν̂_2(p) = −m_2 sgn(p), k(y; m_1) := ((|λ(y)| − m_1)^+)^2.

Therefore, the HJB equation is written as:

v_t + A^0 v + ρ̄^2 v_y^2 / 2 + ρ̄ m_2 |v_y| − k(y; m_1)/2 = 0. (16)

As in the case of the disk, if the value function is monotone (e.g., h is non-increasing and λ is constant), the linearization procedure of the previous subsection yields a linear PDE and a Feynman-Kac representation.
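The function l(y, p) for the rectangle can be evaluated directly from the clipping formulas above; a small sketch with hypothetical inputs `lam`, `m1`, `m2`, `rhobar` (names ours):

```python
def l_rect(y, p, lam, m1, m2, rhobar):
    # l(y, p) for the rectangular C of (15): the optimal nu1 clips
    # lambda(y) to [-m1, m1], giving k(y; m1) = ((|lambda(y)| - m1)^+)^2,
    # and nu2 = -m2 * sgn(p) produces the term -rhobar * m2 * |p|.
    k = max(abs(lam(y)) - m1, 0.0) ** 2
    return 0.5 * k - rhobar * m2 * abs(p)
```

For example, with λ ≡ 2, m_1 = 1, m_2 = 0.5, ρ̄ = 0.8 and p = −2, one gets k = 1 and l = 0.5 − 0.8 = −0.3.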
4. Proofs
Proof of Lemma 2 (A1) is guaranteed by [4, Lemma 3.1] and [6, Lemma 3.2]. The function b/σ =: λ is bounded by the assumptions (B1) and (B2). Therefore dQ^0/dR := E_T(−(λ(Y), 0) · W) defines an equivalent local martingale measure. Since R ∈ P and H(Q^0|R) = E_R[∫_0^T λ(Y_s)^2 ds]/2 < ∞, (A2) is satisfied. Also, (B3) implies that h is globally Lipschitz continuous, hence admits a constant K_h such that |h(y)| ≤ K_h(1 + |y|) for all y ∈ R. Then (A3) will be verified by checking that {e^{γ|h(Y_T)|} E_T(−ν · W) : ν ∈ I_P} is bounded in L^2(R) for any γ > α. By the Cauchy-Schwarz inequality,

E_R[ ( e^{γ|h(Y_T)|} E_T(−ν · W) )^2 ] ≤ E_R[ e^{4γ|h(Y_T)|} ]^{1/2} E_R[ e^{−4ν·W_T} ]^{1/2}. (17)

Introducing another R-Brownian motion W̄ = ρW^1 + ρ̄W^2,

e^{4γ|h(Y_T)|} ≤ e^{4γK_h(1+|Y_T|)} ≤ e^{4γK_h(1+|Y_0|+‖g‖_∞ T+|W̄_T|)}.

Therefore, the first factor in the RHS of (17) is bounded by √2 e^{2γK_h(1+|Y_0|+(‖g‖_∞+2γK_h)T)}. For the second, we can apply [7, Th. III 39] to get an upper bound e^{8T(diam C)^2}. Thus (A3) is verified, and the dual problem admits a maximal solution (Q̂_H, P̂_H). Finally, (A4) is trivially satisfied since all P ∈ P are equivalent. (QED)
For the proof of Theorem 4, we first consider a family of auxiliary control problems, restricting the domain of η. For each closed interval I ⊂ R, set I^I_M := {η ∈ I_M : η_t ∈ I ∀t, a.s.}, and consider the equation:

∂_t v^I + sup_{η∈I, ν∈C} { A^η v^I − (1/2)‖ν − (λ(y), η)′‖^2 } = 0,
v^I(T, y) = αh(y). (18)

If I is compact, then so is I × C, hence we can apply Theorems VI.4.1 and VI.6.2 of [8] to get:
Lemma 9 For each compact I ⊂ R, (18) admits a unique classical solution v^I ∈ C^{1,2}_p((0, T) × R) ∩ C([0, T] × R). Then, taking (η^I(t, y), ν^I(t, y)) ∈ arg sup_{η∈I, ν∈C} { A^η v^I − ‖ν − (λ(y), η)′‖^2/2 }, we have

v^I(t, Y_t) = ess sup_{η∈I^I_M, ν∈I_P} J^{η,ν}_t = J^{η^I(·,Y), ν^I(·,Y)}_t.
Lemma 10 There exists a constant K_v such that |v^I_y| ≤ K_v for all compact I.

Proof Let J^{η,ν}_t(y) := E^η[ αh(Y_{t,T}(y)) − (1/2) ∫_t^T ‖ν_s − (λ(Y_{t,s}(y)), η_s)′‖^2 ds ], where Y_{t,·}(y) denotes the stochastic flow associated to Y. Then, noting that |sup_x f(x) − sup_x g(x)| ≤ sup_x |f(x) − g(x)|, it suffices to show the existence of a constant K_v such that |J^{η,ν}_t(y) − J^{η,ν}_t(y′)| ≤ K_v|y − y′| for all t ∈ [0, T], y, y′ ∈ R and (η, ν) ∈ I_M × I_P. Since h, g, λ ∈ C^2_b, a simple computation yields

|J^{η,ν}_t(y) − J^{η,ν}_t(y′)| ≤ αK_h E^η[|Y_{t,T}(y) − Y_{t,T}(y′)|] + K̃K_λ ∫_t^T E^η[|Y_{t,s}(y) − Y_{t,s}(y′)|] ds,

where K_h, K_λ are Lipschitz constants for h, λ, respectively, and K̃ = diam(C) + max λ. Also, for all s ∈ [t, T],

E^η[|Y_{t,s}(y) − Y_{t,s}(y′)|] ≤ |y − y′| + E^η[ ∫_t^s |g(Y_{t,u}(y)) − g(Y_{t,u}(y′))| du ]
≤ |y − y′| + K_g ∫_t^s E^η[|Y_{t,u}(y) − Y_{t,u}(y′)|] du,

where K_g is a Lipschitz constant for g. Then the Gronwall inequality shows that E^η[|Y_{t,s}(y) − Y_{t,s}(y′)|] ≤ e^{K_g(s−t)}|y − y′| ≤ e^{K_g T}|y − y′| for any t ≤ s ≤ T. Hence we get the result with K_v = e^{K_g T}(αK_h + K̃K_λT). (QED)
Proof of Theorem 4 The inside of the bracket in (18) is written as:

A^0 v^I + ρ̄^2 (v^I_y)^2/2 − (1/2){ η − (ν^2 − ρ̄v^I_y) }^2 − { (1/2)(λ(y) − ν^1)^2 + ν^2 ρ̄ v^I_y }.

Here the third term attains the global maximum at η^I = ν^2 − ρ̄v^I_y, which is bounded by diam(C) + K_v independently of I. Thus, taking I_0 := [−diam(C) − K_v, diam(C) + K_v], we have

−∂_t v^{I_0} = sup_{η∈I_0, ν∈C} { A^η v^{I_0} − (1/2)‖ν − (λ(y), η)′‖^2 }
= sup_{η∈R, ν∈C} { A^η v^{I_0} − (1/2)‖ν − (λ(y), η)′‖^2 }.

Hence v := v^{I_0} is a desired classical solution to (10). The rest of the proof is a standard verification argument, which we omit. (QED)
Proof of Theorem 5 By the duality, it suffices to show that θ̂ ∈ Θ and

dQ^{η̂}/dP^{ν̂} = e^{−α(θ̂·S_T − h(Y_T))} / E_{P^{ν̂}}[ e^{−α(θ̂·S_T − h(Y_T))} ].

Since v satisfies the HJB equation (10), the Itô formula yields:

αh(Y_T) = v(0, Y_0) + ∫_0^T (∂_t + A^{η̂}) v(s, Y_s) ds + ∫_0^T v_y(s, Y_s) dW^{η̂}_s
= v(0, Y_0) + log(dQ^{η̂}/dP^{ν̂}) + ∫_0^T (ρv_y + λ − ν̂^1)(s, Y_s) dW^{1,η̂}_s
= v(0, Y_0) + log(dQ^{η̂}/dP^{ν̂}) + αθ̂ · S_T.

Therefore we get dQ^{η̂}/dP^{ν̂} = e^{v(0,Y_0)} e^{−α(θ̂·S_T − h(Y_T))}. Finally,

∫_0^T θ̂_s^2 d⟨S⟩_s = (1/α^2) ∫_0^T (ρv_y + λ − ν̂^1)(s, Y_s)^2 ds

is bounded, hence θ̂ · S is a martingale under every Q ∈ Q^e_f. This concludes the proof. (QED)
Acknowledgments

The author is grateful for the financial support from the Global Center of Excellence (COE) program "the Research Unit for Statistical and Empirical Analysis in Social Sciences (G-COE Hi-Stat)" of Hitotsubashi University.
References

[1] K. Owari, Robust exponential hedging and indifference valuation, Discussion paper No. 2008-09, Hitotsubashi Univ., 2008.
[2] J. Sekine, On exponential hedging and related quadratic backward stochastic differential equations, Appl. Math. Optim., 54 (2006), 131–158.
[3] M. H. A. Davis, Optimal hedging with basis risk, in: From Stochastic Calculus to Mathematical Finance, Y. Kabanov, R. Liptser and J. Stoyanov, eds., pp. 169–187, Springer-Verlag, Berlin, 2006.
[4] D. Hernandez-Hernandez and A. Schied, Robust utility maximization in a stochastic factor model, Statist. Decisions, 24 (2006), 109–125.
[5] D. Hernandez-Hernandez and A. Schied, A control approach to robust utility maximization with logarithmic utility and time-consistent penalties, Stochastic Process. Appl., 117 (2007), 980–1000.
[6] A. Schied and C.-T. Wu, Duality theory for optimal investment under model uncertainty, Statist. Decisions, 23 (2005), 199–217.
[7] P. E. Protter, Stochastic Integration and Differential Equations, Second Edition, Springer-Verlag, Berlin, 2004.
[8] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Springer-Verlag, Berlin, 1975.
JSIAM Letters Vol.1 (2009) pp.68–71 ©2009 Japan Society for Industrial and Applied Mathematics
A hybrid of the optimal velocity and the slow-to-start
models and its ultradiscretization
Kazuhito Oguma1 and Hideaki Ujino2

Department of Mathematical Engineering and Information Physics, Faculty of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan1
Gunma National College of Technology, 580 Toriba, Maebashi, Gunma 371-8530, Japan2
E-mail [email protected]
Received August 24, 2009, Accepted October 6, 2009
Abstract
Through an extension of an ultradiscrete optimal velocity (OV) model, we introduce an ultradiscretizable traffic flow model, which is a hybrid of the OV and the slow-to-start (s2s) models. Its ultradiscrete limit gives a generalization of a special case of the ultradiscrete OV (uOV) model recently proposed by Takahashi and Matsukidaira. A phase transition from free to jam phases, as well as the existence of multiple metastable states, are observed in numerically obtained fundamental diagrams for cellular automata (CA), which are special cases of the ultradiscrete limit of the hybrid model.
Keywords optimal velocity (OV) model, slow-to-start (s2s) effect, ultradiscretization
Research Activity Group Applied Integrable Systems
1. Introduction
Studies on microscopic models for vehicle traffic provided a good point of view on the phase transition from free to congested traffic flow. Related self-driven many-particle systems have attracted considerable interest not only from engineers but also from physicists [1, 2]. Among such models, the optimal velocity (OV) model [3], which successfully shows a formation of "phantom traffic jams" in the high-density regime, is a car-following model describing an adaptation to the optimal velocity that depends on the distance from the vehicle ahead.

Whereas the OV model consists of ordinary differential equations (ODE), cellular automata (CA) such as the Nagel-Schreckenberg model [4], the elementary CA of Rule 184 (ECA184) [5], the Fukui-Ishibashi (FI) model [6] and the slow-to-start (s2s) model [7] are extensively used in analyses of traffic flow. Recently, Takahashi and Matsukidaira proposed a discrete OV (dOV) model, which enables an ultradiscretization of the OV model [8]. The resultant ultradiscrete OV (uOV) model includes both the ECA184 and the FI model as its special cases. However, the s2s effect remains to be included in their ultradiscretization. The aim of this letter is to present an ultradiscretizable hybrid of the OV and the s2s models.
2. The OV model and the s2s effect
Imagine many cars running in one direction on a single-lane highway. Let x_k(t) denote the position of the k-th car at time t. No overtaking is assumed, so that x_k(t) ≤ x_{k+1}(t) holds for arbitrary time t. The time-evolution of the OV model [3] is given by

dv_k(t)/dt = (1/t_0) [ v_opt(∆x_k(t)) − v_k(t) ], (1)

where v_k := dx_k/dt and ∆x_k := x_{k+1} − x_k are the velocity of the k-th car and the interval between the cars k and k+1, respectively. A function v_opt and a constant t_0 represent an optimal velocity and the sensitivity of drivers, or, in other words, the delay of drivers' response.

Since the current velocity and the current interval to the car ahead determine the acceleration through the time-evolution and the optimal velocity, we classify the OV model (1) as the acceleration-control type (aOV). On the other hand, the OV model of the velocity-control type (vOV) was proposed in earlier studies of car-following models [9]:

v_k(t) = v_opt(∆x_k(t − t_0)). (2)
Replacement of t in the above equation (2) with t + t_0 and the Taylor expansion of v_k(t + t_0) yield

v_opt(∆x_k(t)) = v_k(t + t_0) = v_k(t) + (dv_k(t)/dt) t_0 + (1/2)(d^2v_k(t)/dt^2) t_0^2 + · · · ,

which is rewritten as

dv_k(t)/dt + (1/2)(d^2v_k(t)/dt^2) t_0 + · · · = (1/t_0) [ v_opt(∆x_k(t)) − v_k(t) ].

Thus the aOV model (1) is given by neglecting the higher-order terms in the Taylor series of (2). Though the aOV model is more common in studies on vehicle traffic, we shall concentrate on an ultradiscretizable hybrid of the vOV and the s2s models. Thus we simply call the vOV model (2) the OV model hereafter.
Note that the input to the OV function v_opt(x) in the OV model (2) is the headway at a single point of time t − t_0 that is prior to the present time t. Thus we may say that the OV model describes, in a sense, "reckless" drivers, since the model pays no attention to the headway between the time t − t_0 and the present time t. On the other hand, "cautious" drivers governed by the s2s model [7] keep watching and require enough length of headway to go on for a certain period of time before they restart their cars. The contrast between the two models suggests the idea that the s2s effect and the OV model can be brought together by appropriately choosing an effective distance ∆_eff x_k(t), containing information on the headway for a certain period of time going back from the present, as an input to the OV function v_opt(x). We shall see that this idea works in what follows.
What is crucial in the ultradiscretization of the aOV model [8] is the choice of the OV function

v_opt(x) := v_0 ( 1/(1 + e^{−(x−x_0)/δx}) − 1/(1 + e^{x_0/δx}) ), (3)

where v_0, x_0 and δx are positive constants. In terms of the auxiliary functions

v̄_opt(x) := v_0 dx_opt(x)/dx, (4)
x_opt(x) := δx log( 1 + e^{(x−x_0)/δx} ), (5)

the OV function (3) is expressed as

v_opt(x) = v̄_opt(x) − v̄_opt(x = 0).
A naive discretization of the auxiliary function (4),

v̄^d_opt(x) := ( x_opt(x) − x_opt(x − v_0δt) ) / δt,

introduces the OV function for the discrete OV (dOV) model,

v^d_opt(x) = v̄^d_opt(x) − v̄^d_opt(x = 0)
= (δx/δt) log( [ (1 + e^{(x−x_0)/δx}) / (1 + e^{−x_0/δx}) ] / [ (1 + e^{(x−x_0−v_0δt)/δx}) / (1 + e^{−(x_0+v_0δt)/δx}) ] ), (6)
which is found to be ultradiscretizable [8].

Let x^n_k := x_k(t = nδt) and v^n_k := (x^{n+1}_k − x^n_k)/δt, where n (= 0, 1, 2, · · · ) and δt (> 0) are the integral time and the discrete time-step, respectively. Employing the effective distance

∆^d_eff x^n_k := δx log( [ (1/(n_0+1)) Σ_{n′=0}^{n_0} e^{−∆x^{n−n′}_k/δx} ]^{−1} ), (7)

where n_0 := t_0/δt, we extend the OV model (2) in a time-discretized form as

v^n_k = v^d_opt( ∆^d_eff x^n_k ), (8)
which is equivalent to

x^{n+1}_k = x^n_k + δx [ log( 1 + ( (1/(n_0+1)) Σ_{n′=0}^{n_0} e^{−(∆x^{n−n′}_k−x_0)/δx} )^{−1} )
− log( 1 + ( (1/(n_0+1)) Σ_{n′=0}^{n_0} e^{−(∆x^{n−n′}_k−x_0−v_0δt)/δx} )^{−1} )
− log( 1 + e^{−x_0/δx} ) + log( 1 + e^{−(x_0+v_0δt)/δx} ) ].
It is straightforward to confirm that the continuum limit δt → 0 of the above discrete s2s–OV (ds2s–OV) model (8) reduces to the integro-differential equation which we call the s2s–OV model,

dx_k(t)/dt = v_opt( ∆_eff x_k(t) )
= v_0 ( 1 + (1/t_0) ∫_0^{t_0} e^{−(∆x_k(t−t′)−x_0)/δx} dt′ )^{−1} − v_0 ( 1 + e^{x_0/δx} )^{−1}, (9)

where the corresponding effective distance is given by

∆_eff x_k := δx log( ( (1/t_0) ∫_0^{t_0} e^{−∆x_k(t−t′)/δx} dt′ )^{−1} ).

We shall see that the s2s effect is indeed built into the OV model in the ultradiscrete limit of the ds2s–OV model.
3. Ultradiscretization
Ultradiscretization [10] is a scheme for getting a piecewise-linear equation from a difference equation via the limit formula

lim_{δx→+0} δx log( e^{A/δx} + e^{B/δx} + · · · ) = max(A, B, · · · ).
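The limit formula can be checked numerically at small δx; factoring out the maximum keeps the evaluation stable (helper name ours):

```python
import math

def ud_limit(A, B, dx):
    # Finite-dx approximation of the ultradiscretization formula:
    # dx * log(exp(A/dx) + exp(B/dx)) -> max(A, B) as dx -> +0.
    m = max(A, B)  # factor out the maximum for numerical stability
    return m + dx * math.log(math.exp((A - m) / dx) + math.exp((B - m) / dx))
```

For A = 1, B = 3 and δx = 10^{-3} the result is already indistinguishable from max(A, B) = 3.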
In order to go forward to the ultradiscretization of the ds2s–OV model (8), it will be a good choice for us to begin with the ultradiscrete limit δx → +0 of the auxiliary function (5):

x^u_opt(x) := lim_{δx→+0} x_opt(x) = max(0, x − x_0). (10)
In the same way that the OV function for the dOV model (6) is obtained from the auxiliary function (5), we obtain the OV function for the uOV model [8] as

v^u_opt(x) = v̄^u_opt(x) − v̄^u_opt(x = 0)
= max( 0, (x − x_0)/δt ) − max( 0, (x − x_0)/δt − v_0 ), (11)

where v̄^u_opt(x) := ( x^u_opt(x) − x^u_opt(x − v_0δt) ) / δt. The effective distance (7), on the other hand, is ultradiscretized in the same manner:
∆^u_eff x^n_k := lim_{δx→+0} ∆^d_eff x^n_k = − max_{n′=0,...,n_0} ( −∆x^{n−n′}_k ) = min_{n′=0,...,n_0} ( ∆x^{n−n′}_k ). (12)
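The piecewise-linear OV function (11) translates directly into code; the sketch below (name ours) shows the ramp that rises from 0 at x = x_0 with slope 1/δt and saturates at v_0:

```python
def v_u_opt(x, x0, v0, dt):
    # Ultradiscrete OV function (11): zero below the safety distance x0,
    # linear in the headway above it, capped at the maximum velocity v0.
    return max(0.0, (x - x0) / dt) - max(0.0, (x - x0) / dt - v0)
```

With x_0 = 1, v_0 = 2, δt = 1, a headway of 0.5 gives velocity 0, a headway of 2 gives 1, and any large headway gives the cap 2.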
Thus we obtain an ultradiscrete equation

v^n_k = v^u_opt( ∆^u_eff x^n_k ), (13)

which is equivalent to

x^{n+1}_k = x^n_k + max[ 0, min_{n′=0,...,n_0}( ∆x^{n−n′}_k ) − x_0 ] − max[ 0, min_{n′=0,...,n_0}( ∆x^{n−n′}_k ) − x_0 − v_0δt ],
as the ultradiscrete limit of the ds2s–OV model (8). We name it the ultradiscrete s2s–OV (us2s–OV) model. When the monitoring period n_0 is fixed at zero, the us2s–OV model reduces to a special case of the uOV model [8]. As we can see from (11), (12) and (13), the velocity v^n_k is determined by the optimal velocity for the minimum headway in the period between n − n_0 and n. Thus cars will not restart nor accelerate unless enough clearance persists for a certain period of time. On the other hand, cars immediately stop or slow down when their headways become too small to keep their velocities. The s2s effect and a "cautious" manner of driving are built into the uOV model in this way.
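The claim that (12) is the δx → +0 limit of the effective distance (7) can also be checked numerically: (7) is a "soft minimum" of the monitored headways that tends to the plain minimum as δx shrinks (helper name ours):

```python
import math

def d_eff_soft(gaps, dx):
    # Discrete effective distance (7), written with the minimum factored
    # out for numerical stability:
    #   -dx * log( (1/(n0+1)) * sum_j exp(-gap_j / dx) )
    # As dx -> +0 this converges to min(gaps), i.e. to (12).
    m = min(gaps)
    s = sum(math.exp(-(g - m) / dx) for g in gaps) / len(gaps)
    return m - dx * math.log(s)
```

For the monitored headways [3, 1, 2] and δx = 10^{-3}, the result is within about 10^{-3} of the minimum 1.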
Now let us see how a CA comes out from the us2s–OV model. Let x_0 be the discretization step of the headway ∆x^n_k, or equivalently, the size of the unit cell of the CA. Then, with no loss of generality, we may set x_0 = 1. Assume that the number of vacant cells between the cars k and k+1, ∆̃x^n_k := ∆x^n_k − x_0, must be non-negative, ∆̃x^n_k ≥ 0, which prohibits car crashes. Then the us2s–OV model (13) reduces to

x^{n+1}_k = x^n_k + min[ min_{n′=0,...,n_0}( ∆̃x^{n−n′}_k ), v_0δt ]. (14)

Fixing v_0δt at an integer, we call this model the s2s–OV cellular automaton (CA). The s2s–OV CA reduces to the FI model [6] when n_0 = 0, and to the ECA184 [5] when n_0 = 0 and v_0δt = 1 (= x_0). The s2s model [7] also comes out from the s2s–OV CA by choosing n_0 = 1 and v_0δt = 1 (= x_0). Thus the s2s–OV CA is regarded as a hybrid of the FI model and an extended s2s model.
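A single update of the CA (14) can be sketched as follows; the ring geometry and array layout are our assumptions for illustration:

```python
import numpy as np

def s2s_ov_ca_step(history, v_max, L):
    # One update x^n -> x^{n+1} of the s2s-OV CA (14) on a ring of
    # length L with cell size x0 = 1 and integer v_max = v0*dt.
    # `history` holds the position arrays x^{n-n0}, ..., x^n (most
    # recent last); the monitoring period n0 is implied by its length.
    vacant = []
    for x in history:
        dx = (np.roll(x, -1) - x) % L  # interval to the car ahead (mod L)
        vacant.append(dx - 1)          # vacant cells: Delta~x = Delta x - 1
    min_gap = np.min(vacant, axis=0)   # minimum headway over the period
    v = np.minimum(min_gap, v_max)     # velocity of each car
    return history[-1] + v
```

With a one-step history (n_0 = 0) and v_max = 1 this reproduces ECA184-like behavior: a blocked car stays put, a free car advances one cell.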
4. Numerical experiments
We shall numerically investigate the s2s–OV CA (14). Throughout this section, the length of the circuit L is fixed at L = 100, and the periodic boundary condition is assumed, so that x^n_k + L is identified with x^n_k.
Spatio-temporal patterns showing the trajectories of each vehicle are given in Fig. 1. We choose the parameters and initial conditions so that jams appear in the trajectories. The two figures at the top share the same monitoring period n_0 = 2, but their maximum velocities are different. The top left trajectories show that the velocities of the vehicles are zero or one, which is less than or equal to the maximum velocity v_0δt = 1. In the top right trajectories, whose maximum velocity is v_0δt = 3, on the other hand, the velocities of the vehicles read zero, one, two and three. Thus we notice that the vehicles driven by the s2s–OV CA can run at any allowed integral velocity less than or equal to the maximum velocity v_0δt.
The other two figures at the bottom of Fig. 1 share the same maximum velocity v_0δt = 2, but their monitoring periods are different. As observed in the bottom two figures, the longer the monitoring period, the longer it takes for the cars to get out of the traffic jam. The jam front is observed to propagate against the stream of vehicles at the constant velocity x_0/((n_0 + 1)δt), since cars have to wait n_0 + 1 time steps to restart after their preceding cars have restarted, as depicted in Fig. 2.
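The restart-delay argument can be checked directly. In this sketch (ours; parameter values are our choices) the cars start bumper to bumper with a large gap ahead of the leader, and we record the first time step at which each car moves; consecutive cars should restart n_0 + 1 steps apart, giving the jam-front speed x_0/((n_0 + 1)δt):

```python
from collections import deque

def first_moves(K=10, L=100, v_max=2, n0=3, steps=40):
    """Return, per car, the first step at which it moves (s2s-OV CA, x0 = dt = 1)."""
    x = list(range(K))                  # bumper-to-bumper jam; big gap ahead of car K-1
    history = deque(maxlen=n0 + 1)
    moved_at = {}
    for n in range(1, steps + 1):
        gaps = [(x[(k + 1) % K] - x[k] - 1) % L for k in range(K)]
        history.append(gaps)
        v = [min(min(h[k] for h in history), v_max) for k in range(K)]
        x = [(x[k] + v[k]) % L for k in range(K)]
        for k in range(K):
            if v[k] > 0 and k not in moved_at:
                moved_at[k] = n
    return moved_at

m = first_moves()
# with n0 = 3, consecutive cars restart 4 steps apart -> front speed 1/4 cell per step
```

The leading car moves at the first step, while each follower must wait until its whole n_0 + 1 step window shows positive gaps, reproducing the x_0/((n_0 + 1)δt) backward propagation of the jam front.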
Fig. 3 shows fundamental diagrams giving the relation
[Fig. 1 panels: four spatio-temporal plots, position of vehicles (0–100) versus time (0–100).]
Fig. 1. The spatio-temporal patterns of the s2s–OV CA. For all four patterns, the number of cars K is fixed at K = 30. The maximum velocities v_0δt and the monitoring periods n_0 for these patterns are (top left) v_0δt = 1, n_0 = 2, (top right) v_0δt = 3, n_0 = 2, (bottom left) v_0δt = 2, n_0 = 1 and (bottom right) v_0δt = 2, n_0 = 3, respectively.
[Fig. 2 panel: schematic of the jam front receding against the stream; a car restarts (n_0 (= 3) + 1)δt (= 4) time steps after its predecessor, the front advancing by x_0 (= 1) per restart.]
Fig. 2. Backward propagation of the jam front at the constant velocity x_0/((n_0 + 1)δt) = 1/4 for the case v_0δt = 2, n_0 = 3 and x_0 = 1.
between the vehicle flow

  Q := (1/((n_1 − n_0 + 1)L)) Σ_{k=1}^{K} Σ_{n=n_0}^{n_1} (x^{n+1}_k − x^n_k)/δt,
which is equivalent to the total momentum of vehicles per unit length, and the vehicle density ρ := K/L, where K is the number of vehicles. The fundamental diagrams clearly show phase transitions from the free to the jam phase, as well as metastable states, which are also observed in empirical flow–density relations [1, 2]. It is remarkable that the fundamental diagrams have multiple metastable branches. This feature is similar to that reported by Nishinari et al. [11]. We observe that each fundamental diagram has v_0δt metastable branches and a jamming line. The branches and the jamming line correspond to the integral velocities that are less than or equal to the maximum velocity v_0δt. Let us confirm this with Fig. 3. The top two figures share the same monitoring period n_0 = 3, but their maximum velocities are different. The top left diagram, corresponding to v_0δt = 2, has three branches. This number equals that of all the allowed integral velocities, two, one and zero, as depicted in the diagram. The numbers of metastable branches in the top right diagram, as well as in the bottom two, are explained in the same manner. This observation also suggests that the monitoring period is irrelevant to the number of metastable branches.

[Fig. 3 panels: four flow–density diagrams, flow (0–1) versus density (0–1); branch labels 0–4 indicate the integral velocities.]

Fig. 3. The fundamental diagrams of the s2s–OV CA. The flows Q are computed by averaging over the time period 800 ≤ n ≤ 1000. The maximum velocities v_0δt and the monitoring periods n_0 for these patterns are (top left) v_0δt = 2, n_0 = 3, (top right) v_0δt = 4, n_0 = 3, (bottom left) v_0δt = 3, n_0 = 2 and (bottom right) v_0δt = 3, n_0 = 4, respectively. The inclination of the free line equals the maximum velocity v_0δt. The jamming line has a negative inclination.
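The flow Q is straightforward to measure from a simulation. The following self-contained sketch (ours, not the authors' code; the random initial condition, seed and window lengths are our choices, with x_0 = δt = 1) averages the hop lengths over a measurement window after a transient:

```python
from collections import deque
import random

def flow(K, L=100, v_max=2, n0=1, transient=800, measure=200, seed=1):
    """Average flow Q = (1/(T*L)) * total distance moved over T measured steps."""
    random.seed(seed)
    x = sorted(random.sample(range(L), K))   # K distinct cells on the ring
    history = deque(maxlen=n0 + 1)
    moved = 0
    for n in range(transient + measure):
        gaps = [(x[(k + 1) % K] - x[k] - 1) % L for k in range(K)]
        history.append(gaps)
        v = [min(min(h[k] for h in history), v_max) for k in range(K)]
        x = [(x[k] + v[k]) % L for k in range(K)]
        if n >= transient:
            moved += sum(v)
    return moved / (measure * L)

q_low = flow(K=10)   # low density: jams dissolve, so Q is expected near rho * v0dt
```

Sweeping K from 1 to L and plotting (K/L, flow(K)) reproduces a fundamental diagram of the kind shown in Fig. 3; note that Q can never exceed ρ·v_0δt, nor the vacant-cell density (L − K)/L.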
All the end points of the branches, as well as of the jamming line, are on the line ρ + Q (= ρ + Q(δt/x_0)) = 1. This is because the density at the end point is the maximum density ρ_max(v) that allows the velocity of the slowest car to be vδt. The maximum density ρ_max(v) is determined by

  ρ_max(v) = x_0/(vδt + x_0).

Since all the cars flow at the velocity vδt when ρ = ρ_max(v), the corresponding flow is given by Q(ρ_max) = vρ_max. Thus the relation ρ_max + Q(ρ_max)(δt/x_0) = 1 holds.
The free line is the branch whose inclination equals the maximum velocity v_0δt. Every other metastable branch, as well as the jamming line, branches off from the free line. We observe that the density at the branch point of the branch corresponding to the velocity vδt reads

  ρ_b = x_0/((v_0δt − vδt)n_0 + v_0δt + x_0).

This observation is explained as follows. Suppose one car, say the car k, runs at the velocity v and the other K − 1 cars run at the maximum velocity v_0. At the moment the k-th car slows down to v, the headway between the cars k and k + 1 is vδt + x_0. Since it takes at least n_0 + 1 time steps for the car k to speed up to v_0, the headway between the cars k and k + 1 expands up to H = (v_0δt − vδt)(n_0 + 1) + vδt + x_0 = x_0/ρ_b ≥ v_0δt by the time the k-th car speeds up to v_0. If all the cars can obtain the headway H, slow cars running at the velocity v disappear in the end. Thus the density at the branch point of the branch corresponding to the velocity vδt is given by ρ_b = x_0/H. Note that the density at the branch point becomes smaller as the monitoring period becomes larger.
5. Concluding remarks
Through an extension of the ultradiscrete OV model [8], we introduced the ds2s–OV (8) and s2s–OV (9) models as ultradiscretizable traffic flow models. The model is a hybrid of the OV [3] and the s2s [7] models, whose ultradiscrete limit gives a generalization of a special case of the uOV model by Takahashi and Matsukidaira [8]. The phase transition from the free to the jam phase, as well as the existence of multiple metastable states, was observed in the numerically obtained fundamental diagrams for the s2s–OV CA (14), which are special cases of the us2s–OV model (13).
Detailed studies on the properties of the hybrid models (8), (9), (13) and (14), such as exact solutions and comparison with other traffic flow models as well as with empirical data, remain to be carried out.
Acknowledgments
The authors are grateful to D. Takahashi, J. Matsukidaira, A. Tomoeda, D. Yanagisawa and R. Nishi for their valuable comments at the spring meeting of JSIAM in March 2009.
References
[1] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199–329.
[2] D. Helbing, Traffic and related self-driven many-particle systems, Rev. Mod. Phys., 73 (2001), 1067–1141.
[3] M. Bando, K. Hasebe, A. Nakayama, A. Shibata and Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Phys. Rev. E, 51 (1995), 1035–1042.
[4] K. Nagel and M. Schreckenberg, A cellular automaton model for freeway traffic, J. Physique I, 2 (1992), 2221–2229.
[5] S. Wolfram, Theory and Applications of Cellular Automata, World Scientific, Singapore, 1986.
[6] M. Fukui and Y. Ishibashi, Traffic flow in 1D cellular automaton model including cars moving with high speed, J. Phys. Soc. Jpn., 65 (1996), 1868–1870.
[7] M. Takayasu and H. Takayasu, 1/f noise in a traffic model, Fractals, 1 (1993), 860–866.
[8] D. Takahashi and J. Matsukidaira, On a discrete optimal velocity model and its continuous and ultradiscrete relatives, JSIAM Letters, 1 (2009), 1–4.
[9] G. F. Newell, Nonlinear effects in the dynamics of car following, Oper. Res., 9 (1961), 209–229.
[10] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.
[11] K. Nishinari, M. Fukui and A. Schadschneider, A stochastic cellular automaton model for traffic flow with multiple metastable states, J. Phys. A: Math. Gen., 37 (2004), 3101–3110.
JSIAM Letters Vol.1 (2009) pp.72–75 ©2009 Japan Society for Industrial and Applied Mathematics
A new compressible fluid model for traffic flow
with density-dependent reaction time of drivers
Akiyasu Tomoeda1,2, Daisuke Shamoto3, Ryosuke Nishi3, Kazumichi Ohtsuka2 and Katsuhiro Nishinari2,4
1 Meiji Institute for Advanced Study of Mathematical Sciences, Meiji University, 1-1-1 Higashi-mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
2 Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8904, Japan
3 Department of Aeronautics and Astronautics, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
4 PRESTO, JST
E-mail [email protected]
Received September 30, 2009, Accepted November 8, 2009
Abstract
In this paper, we propose a new compressible fluid model for one-dimensional traffic flow that takes into account the variation of the reaction time of drivers, based on actual measurements. The model is a generalization of the Payne model obtained by introducing a density-dependent function for the reaction time. The linear stability analysis of this new model shows the instability of homogeneous flow around a critical density of vehicles, which is observed in real traffic flow. Moreover, the condition for the nonlinear saturation of density against small perturbations is theoretically derived by the reductive perturbation method.
Keywords jamology, traffic flow, compressible fluid, stability analysis, reaction time
Research Activity Group Applied Integrable Systems
1. Introduction
Among various kinds of jamming phenomena, a traffic jam of vehicles is a very familiar phenomenon and causes several losses in our daily life, such as decreased efficiency of transportation, waste of energy, serious environmental degradation, etc. In particular, highway traffic dynamics has attracted many researchers and has been investigated as a nonequilibrium system of interacting particles over the last few decades [1]. A lot of mathematical models for one-dimensional traffic flow have been proposed [2–8], and these models are classified into microscopic and macroscopic models in terms of the treatment of particles. In microscopic models, e.g. the car-following models [2, 3] and cellular automaton models [4, 5], the dynamics of traffic flow is described by the movement of individual vehicles. In macroscopic models, on the other hand, the dynamics is treated as an effectively one-dimensional compressible fluid by focusing on the collective behavior of vehicles [6–8]. Moreover, it is widely known that some of these mathematical models are related to each other, which is shown by using mathematical methods such as the ultradiscretization method [9] or the Euler-Lagrange transformation [10]. That is, the rule-184 elementary cellular automaton model [4] is derived from the Burgers equation [6] by the ultradiscretization method. A specific case of the optimal velocity (OV) model [3] is formally derived from the rule-184 cellular automaton model via the Euler-Lagrange transformation. Quite recently, and more noteworthy, ultradiscrete versions of the OV model have been presented by Takahashi et al. [11] and Kanai et al. [12].
In contrast to the practically reasonable microscopic models, only a small number of macroscopic models with a reasonable expression have been proposed, even among the various traffic models based on the hydrodynamic theory of fluids [6–8]. In previous fluid models, one has no choice but to introduce a diffusion term into the model, as in the Kerner and Konhauser model [8], in order to represent the stabilized density wave, which indicates the formation of a traffic jam. However, a serious problem emerges in that some vehicles move backward even under heavy traffic, since the diffusion term has a spatial isotropy. As mentioned by Daganzo in [13], the most essential difference between traffic and fluids is as follows: "A fluid particle responds to the stimulus from the front and from behind; however, a vehicle is an anisotropic particle that mostly responds to frontal stimulus." That is, traffic vehicles exhibit anisotropic behavior, although the behavior of fluid particles with simple diffusion is isotropic. Therefore, unfortunately, we have to conclude that traffic models which include the diffusion term are not reasonable for a realistic expression of traffic flow. Given these factors, we suppose that a traffic jam forms as a result of the plateaued growth of a small perturbation by the nonlinear saturation effect.
Now let us return to the Payne model [7], which is one of the most fundamental and significant fluid models of traffic flow without a diffusion term. The Payne model is given
[Fig. 1 panel: stability S versus density (0–1) for τ = 1.0, 2.0, 3.0.]
Fig. 1. The plots of stability in the case of (3) with ρ_max = 1.0 and V_0 = 1.0.
by

  ∂ρ/∂t + ∂(ρv)/∂x = 0,   (1)

  ∂v/∂t + v ∂v/∂x = (1/τ)(V_opt(ρ) − v) + (1/(2τρ)) (dV_opt(ρ)/dρ) ∂ρ/∂x,   (2)
where ρ(x, t) and v(x, t) correspond to the spatial vehicle density and the average velocity at position x and time t, respectively. τ is the reaction time of drivers, which is a positive constant, and V_opt(ρ) is the optimal velocity function, which represents the desired velocity of drivers under the density ρ.
As the optimal velocity function, Payne employs

  V_opt(ρ) = V_0 (1 − ρ/ρ_max),   (3)
where V_0 is the maximum velocity in the free flow phase and ρ_max corresponds to the maximum density, at which all cars are completely still.
The linear stability analysis for the Payne model gives the dispersion relation as follows [7]:

  ω = −kV_opt(ρ_0) + (i/(2τ)) [1 ± √(1 + 4a_0²τ²(2kρ_0 i − k²))],   (4)

where

  a_0² = −(1/(2τ)) dV_opt(ρ)/dρ |_{ρ=ρ_0} > 0.   (5)
Furthermore, the linear stability condition is calculated from the dispersion relation (4):

  1/(2τ) > −ρ_0² dV_opt(ρ)/dρ |_{ρ=ρ_0}.   (6)

If one applies the velocity–density relation (3) to this stability condition, the following linear stability condition is obtained:

  ρ_max/(2τV_0) > ρ_0².   (7)
Here, let us define the stability function S(ρ_0) by

  S(ρ_0) = ρ_max/(2τV_0) − ρ_0².   (8)

In this function, the condition S(ρ_0) > 0 (S(ρ_0) < 0) corresponds to the stable (unstable) state.
Fig. 1 shows the stability plots for several constant values of τ. From this figure, we can observe that the instability of homogeneous flow occurs beyond a critical density of vehicles.
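The stability function (8) and its sign change are trivial to evaluate; the following minimal sketch (ours, with the parameter values ρ_max = V_0 = 1.0 of Fig. 1) also isolates the critical density at which S changes sign:

```python
import math

def stability(rho0, tau, rho_max=1.0, v0=1.0):
    """Stability function S(rho0) of (8); S > 0 stable, S < 0 unstable."""
    return rho_max / (2.0 * tau * v0) - rho0 ** 2

def critical_density(tau, rho_max=1.0, v0=1.0):
    # S changes sign at rho_c = sqrt(rho_max / (2 tau V0))
    return math.sqrt(rho_max / (2.0 * tau * v0))

# e.g. tau = 2.0 gives rho_c = 0.5: homogeneous flow is stable below
# that density and unstable above it, as in Fig. 1
```

Larger reaction times τ shift the critical density downward, matching the ordering of the curves in Fig. 1.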
However, the Payne model shows the condensation of vehicles due to the momentum equation (2). That is, as the density increases, the value of the optimal velocity in the first term of the right-hand side becomes zero, and the value of the second term also becomes zero. Thus, the vehicles gather in one place due to the nonlinear effect vv_x, and the small perturbation blows up without stabilization. Therefore, the Payne model is also incomplete as a description of the realistic dynamics of traffic flow.

[Fig. 2 panels: velocity [m/sec] versus time [sec] for Car 1 and Car 2, with the delay τ between the two curves indicated in each panel.]

Fig. 2. The superimposed time-series data in two phases. The left figure corresponds to the free flow phase (from 430 sec to 560 sec) and the right one corresponds to the jam phase (from 600 sec to 720 sec).
Thus, in this paper, we propose a new compressible fluid model by improving the Payne model in terms of the reaction time of drivers, based on the following actual measurements.
2. New compressible fluid model based on experimental data
We have performed a car-following experiment on a highway. The leading vehicle cruises at the legal velocity and the following vehicle pursues the front one. The time-series data of the velocity and position (latitude and longitude) of each vehicle are recorded every 0.2 seconds (5 Hz) by a global positioning system (GPS) receiver on board with high precision (< 60 centimeters).
By dividing the time-series data into two phases, i.e. the free-flow phase and the jam phase, based on the velocity, we obtained the superimposed time-series data shown in Fig. 2, which show that drivers obviously react to the front car with a slight delay in both phases. Here, assuming that the reaction time of drivers appears as a slight delay of behavior, we calculate the correlation coefficient, which is denoted by

  r_{i+1}(τ) = ⟨v_i(t) v_{i+1}(t + τ)⟩_t   (9)
    = Σ_k (v_i(t^{(k)}) − v̄_i)(v_{i+1}(t^{(k)} + τ) − v̄_{i+1}) / [ √(Σ_k (v_i(t^{(k)}) − v̄_i)²) √(Σ_k (v_{i+1}(t^{(k)} + τ) − v̄_{i+1})²) ],   (10)
where v_i(t) denotes the velocity of the i-th car at time t. Note that the i-th car drives in front of the (i + 1)-th car. The symbol ⟨·⟩ and the bar indicate an ensemble average and a time average, respectively. Finally, we obtained the correlation coefficient for each given τ, as shown in Fig. 3. From this figure, we found that the peak of the correlation coefficient shifts according to the situation of the road. Here, since the reaction time of a driver is considered as τ, the reaction time is not constant, but obviously changes according to the situation of the road. That is, if the traffic state is free (jammed), the reaction time of a driver is longer (shorter).

[Fig. 3 panel: correlation coefficient r(τ) (0–1) versus the reaction time of drivers τ (0–15) for the free and jam phases.]

Fig. 3. The plots of the correlation coefficient (9) for each given reaction time τ.
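The delay-estimation procedure behind (9) and (10) amounts to picking the lag that maximizes the correlation coefficient. The following sketch is ours, on synthetic data (the time series, function names and the 5-sample delay are all invented for illustration), not the authors' measurement pipeline:

```python
import math

def corr(u, w):
    """Sample correlation coefficient of two equal-length series, as in (10)."""
    n = len(u)
    mu, mw = sum(u) / n, sum(w) / n
    num = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) *
                    sum((b - mw) ** 2 for b in w))
    return num / den

def reaction_time(v_lead, v_follow, max_lag):
    """Lag (in samples) maximizing the leader-follower correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: corr(v_lead[:len(v_lead) - lag],
                                    v_follow[lag:]))

# synthetic example: the follower repeats the leader's speed 5 samples later
v1 = [20 + 3 * math.sin(0.1 * t) for t in range(200)]
v2 = [v1[max(t - 5, 0)] for t in range(200)]
```

At the 5 Hz sampling rate described above, a peak at 5 samples would correspond to a reaction time of 1 second.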
As a reasonable assumption based on this result, the reaction time of drivers depends on the density on the road. Under this assumption, we have extended the Payne model and propose a new compressible fluid model as follows:
  ∂ρ/∂t + ∂(ρv)/∂x = 0,   (11)

  ∂v/∂t + v ∂v/∂x = (1/τ(ρ))(V_opt(ρ) − v) + (1/(2ρτ(ρ))) (dV_opt(ρ)/dρ) ∂ρ/∂x.   (12)
The difference between this model and the Payne model is that the reaction time of drivers is changed from a constant value to the density-dependent function τ(ρ).
2.1 Linear stability analysis

Now let us perform the linear stability analysis for our new dynamical model to investigate the instability of homogeneous flow. The homogeneous flow and a small perturbation are given by

  ρ = ρ_0 + ερ_1,  v = V_opt(ρ_0) + εv_1.   (13)
One obtains the dispersion relation in the form

  ω = −kV_opt + i[1 ± √(1 + 4a_0²τ(ρ_0)²(2kρ_0 i − k²))]/(2τ(ρ_0)),   (14)

where

  a_0² = −(1/(2τ(ρ_0))) dV_opt(ρ)/dρ |_{ρ=ρ_0} > 0.   (15)
Hence, the stability conditions

  1/(2τ(ρ_0)) > −ρ_0² dV_opt(ρ)/dρ |_{ρ=ρ_0},   (16)
are obtained. The difference between (6) and (16) comes from the reaction time, which changes from τ to τ(ρ_0), determined only by the initial density of the homogeneous flow. Therefore, this stability condition (16) and the stability condition of the Payne model (6) are essentially equivalent; that is, our new model also shows the instability of homogeneous flow. Substituting (3) into (16), the stability condition leads to

  ρ_max/(2τ(ρ_0)V_0) > ρ_0².   (17)
The most important point of our new model is that it is possible to stabilize the perturbation by the nonlinear effect created by the function τ(ρ), though this stabilizing mechanism fails in the Payne model. In order to show this nonlinear effect, the evolution equation of the small perturbation is derived in the next subsection.
2.2 Reductive perturbation analysis
Let us define the slowly-varying variables X and T byGalilei transformation as
X = ε(x − cgt), T = ε2t, (18)
where cg = dω/dk is the group velocity. Next, we as-sume that ρ(x, t), v(x, t) can be expressed in terms ofthe power series of ε, i.e.,
ρ ∼ ρ0 + ερ1 + ε2ρ2 + ε3ρ3 + · · · , (19)
v ∼ v0 + εv1 + ε2v2 + ε3v3 + · · · . (20)
Substituting (19) and (20) into (11) and (12), we have, for each order term in ε, respectively,

  ε³: ∂ρ_1/∂T + ∂/∂X (ρ_2(v_0 − c_g) + ρ_1 v_1 + ρ_0 v_2) = 0,   (21)

  ε⁴: ∂ρ_2/∂T + ∂/∂X (ρ_3(v_0 − c_g) + ρ_2 v_1 + ρ_1 v_2 + ρ_0 v_3) = 0,   (22)

and

  ε²: (v_0 − c_g) ∂v_1/∂X = (V″_opt ρ_1² + 2V′_opt ρ_2 − 2v_2)/(2τ(ρ_0)) + (V′_opt/(2τ(ρ_0)ρ_0)) ∂ρ_1/∂X,   (23)

  ε³: ∂v_1/∂T + (v_0 − c_g) ∂v_2/∂X + v_1 ∂v_1/∂X
    = (1/τ(ρ_0)) (V‴_opt ρ_1³/6 + V″_opt ρ_1 ρ_2 + V′_opt ρ_3 − v_3)
    − (ρ_1 τ′(ρ_0)/τ(ρ_0)²) (V′_opt ρ_2 + V″_opt ρ_1²/2 − v_2)
    + (1/(2τ(ρ_0))) ( (V′_opt/ρ_0) ∂ρ_2/∂X + (V″_opt ρ_1/ρ_0) ∂ρ_1/∂X − (V′_opt ρ_1/ρ_0²) ∂ρ_1/∂X − (V′_opt ρ_1 τ′(ρ_0)/(ρ_0 τ(ρ_0))) ∂ρ_1/∂X ).   (24)
Note that the primes denote derivatives with respect to ρ.
Putting φ_1 = ρ_1 as the first-order perturbation quantity and eliminating the second-order quantities (ρ_2, v_2) in (21) and (23), we obtain the Burgers equation

  ∂φ_1/∂T = [2(v_0 − c_g)/ρ_0 − V″_opt ρ_0] φ_1 ∂φ_1/∂X + [(v_0 − c_g)/(2ρ_0) − τ(ρ_0)(v_0 − c_g)²] ∂²φ_1/∂X²   (25)
as the evolution equation of the first-order quantity. Moreover, eliminating the third-order quantities (ρ_3, v_3) in (22) and

Table 1. Classification based on the sign of the diffusion coefficient.

  P                          Q      P − εQΦ       Time evolution
  P < 0 (linearly unstable)  Q < 0  P − εQΦ > 0   Saturation
                             Q < 0  P − εQΦ < 0   Amplification
                             Q > 0  P − εQΦ < 0   Amplification
  P > 0 (linearly stable)    Q < 0  P − εQΦ > 0   Damping
                             Q > 0  P − εQΦ > 0   Damping
                             Q > 0  P − εQΦ < 0   Amplification
(24) and defining the perturbation Φ, which includes the first- and second-order perturbations, as Φ = φ_1 + εφ_2, the higher-order Burgers equation

  ∂Φ/∂T = (2(v_0 − c_g)/ρ_0) Φ ∂Φ/∂X + [(v_0 − c_g)/(2ρ_0) − (v_0 − c_g)²τ(ρ_0)] ∂²Φ/∂X²
    − ε(v_0 − c_g)² [2τ(ρ_0)/ρ_0 + τ′(ρ_0)] ∂/∂X (Φ ∂Φ/∂X)
    + [τ(ρ_0)/ρ_0 − 2(v_0 − c_g)τ(ρ_0)²] ∂³Φ/∂X³   (26)

is obtained. Note that, in this derivation, we put V″_opt = V‴_opt = 0 due to the relation (3).

Although the first-order equation (25) of our model is essentially equivalent to that of the Payne model, the second-order equation (26) differs from that of the Payne model in the coefficient of the third term on the right-hand side.
In order to analyze the nonlinear effect of our model, let us consider the coefficient of the diffusion term of the second-order equation. Let us put the coefficient of the second term of the right-hand side in (26) as

  P = (v_0 − c_g)/(2ρ_0) − τ(ρ_0)(v_0 − c_g)²,   (27)

and also put the coefficient of the third term as

  Q = 2(v_0 − c_g)²τ(ρ_0)/ρ_0 + (v_0 − c_g)²τ′(ρ_0).   (28)

Thus, the diffusion term of (26) is given by

  (P − εQΦ) ∂²Φ/∂X².   (29)
Since P = 0 corresponds to the neutrally stable condition, we assume that the value of P is negative, which corresponds to the linearly unstable case. In the case of the Payne model, Q is always positive because τ is constant, i.e. τ′ is always zero. Therefore, the diffusion coefficient (29) is always negative under the linearly unstable condition of the Payne model, which makes the model difficult to treat numerically. However, in the case of our model, τ′(ρ) is always negative, since τ(ρ) is considered to be a monotonically decreasing function. If Q is negative, the diffusion coefficient becomes positive as Φ increases, even under the linearly unstable condition. In this situation, the small perturbation will be saturated by the nonlinear effect created by the density-dependent function of the reaction time of drivers. The conditions for nonlinear saturation correspond to P < 0 and Q < 0, which are transformed into the following expressions:

  τ(ρ_0) > 1/(2ρ_0(v_0 − c_g)),  τ′(ρ_0) < −2τ(ρ_0)/ρ_0.   (30)

All conditions, including the other cases, are summarized in Table 1.
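The case analysis of Table 1 can be written as a small decision function. This is our restatement of the table (function name and the sample values are ours), with D = P − εQΦ from (29):

```python
def classify(P, Q, eps, Phi):
    """Time evolution of the perturbation per Table 1 (our restatement)."""
    D = P - eps * Q * Phi          # effective diffusion coefficient (29)
    if P < 0:                      # linearly unstable
        if Q < 0 and D > 0:
            return "saturation"    # nonlinear saturation, conditions (30)
        return "amplification"     # D < 0
    # linearly stable (P > 0)
    return "damping" if D > 0 else "amplification"
```

For instance, P = −0.1, Q = −1.0, ε = 0.1 and Φ = 2.0 give D = 0.1 > 0: the linearly unstable perturbation saturates, which is exactly the mechanism the density-dependent τ(ρ) makes possible.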
3. Conclusion
A new compressible fluid model for one-dimensional traffic flow has been proposed by introducing the density-dependent function τ(ρ) for the reaction time of drivers, based on actual measurements. Our new model does not include the diffusion term, which exhibits unrealistic isotropic behavior of vehicles, since vehicles mostly respond to the stimulus from the front. The linear stability analysis for our new model establishes the instability of homogeneous flow. We have found that the stability condition is essentially equivalent to that of the Payne model. Moreover, the behavior of a small density perturbation is classified according to the diffusion coefficient of the higher-order Burgers equation, which is derived from our new model by the reductive perturbation method. From this classification, we have obtained the special condition under which the small perturbation is saturated by the nonlinear effect.
References
[1] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199–329.
[2] G. F. Newell, Nonlinear effects in the dynamics of car following, Oper. Res., 9 (1961), 209–229.
[3] M. Bando, K. Hasebe, A. Nakayama, A. Shibata and Y. Sugiyama, Dynamical model of traffic congestion and numerical simulation, Phys. Rev. E, 51 (1995), 1035–1042.
[4] K. Nishinari and D. Takahashi, Analytical properties of ultradiscrete Burgers equation and rule-184 cellular automaton, J. Phys. A: Math. Gen., 31 (1998), 5439–5450.
[5] M. Kanai, K. Nishinari and T. Tokihiro, Stochastic optimal velocity model and its long-lived metastability, Phys. Rev. E, 72 (2005), 035102.
[6] G. B. Whitham, Linear and Nonlinear Waves, Wiley-Interscience, New York, 1974.
[7] H. J. Payne, Models of freeway traffic and control, in: Simulation Council Proc., G. A. Bekey ed., Mathematical Models of Public Systems, 1 (1971), 51–61.
[8] B. S. Kerner and P. Konhauser, Cluster effect in initially homogeneous traffic flow, Phys. Rev. E, 48 (1993), R2335–R2338.
[9] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.
[10] J. Matsukidaira and K. Nishinari, Euler-Lagrange correspondence of cellular automaton for traffic-flow models, Phys. Rev. Lett., 90 (2003), 088701.
[11] D. Takahashi and J. Matsukidaira, On a discrete optimal velocity model and its continuous and ultradiscrete relatives, JSIAM Letters, 1 (2009), 1–4.
[12] M. Kanai, S. Isojima, K. Nishinari and T. Tokihiro, Ultradiscrete optimal velocity model: A cellular-automaton model for traffic flow and linear instability of high-flux traffic, Phys. Rev. E, 79 (2009), 056108.
[13] C. F. Daganzo, Requiem for second-order fluid approximations of traffic flow, Trans. Res. B, 29 (1995), 277–286.
JSIAM Letters Vol.1 (2009) pp.76–79 ©2009 Japan Society for Industrial and Applied Mathematics
Error analysis for a matrix pencil of Hankel matrices with perturbed complex moments

Tetsuya Sakurai1, Junko Asakura2, Hiroto Tadano1 and Tsutomu Ikegami3

1 Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki 305-8573, Japan
2 Research and Development Division, Square Enix Co. Ltd., Shinjuku Bunka Quint Bldg. 3-22-7 Yoyogi, Shibuya-ku, Tokyo 151-8544, Japan
3 Information Technology Research Institute, AIST, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

E-mail [email protected]
Received September 30, 2009, Accepted December 6, 2009
Abstract
In this paper, we present perturbation results for eigenvalues of a matrix pencil of Hankelmatrices for which the elements are given by complex moments. These results are extendedto the case that matrices have a block Hankel structure. The influence of quadrature erroron eigenvalues that lie inside a given integral path can be reduced by using Hankel matricesof an appropriate size. These results are useful for discussing the numerical behavior of rootfinding methods and eigenvalue solvers which make use of contour integrals. Results fromsome numerical experiments are consistent with the theoretical results.
Keywords perturbation results, eigenvalues, block Hankel matrix, matrix-valued moments
Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications
1. Introduction
We consider the problem of determining poles and their respective residues from a sequence of complex moments. This problem appears in methods for finding roots of analytic functions [1–3] and in eigenvalue solvers [4–8] using contour integrals. In these methods, the problem of determining the zeros or eigenvalues inside a given circle is reduced to an eigenvalue problem for a matrix pencil of Hankel matrices.
In this paper, we present perturbation results for the eigenvalues of a matrix pencil of Hankel matrices associated with complex moments. We extend these results to the case where the matrices have a block Hankel structure with matrix-valued moments. These results are useful in discussing the numerical behavior of moment-based methods, and they can be used to determine parameters for these methods.
Our results suggest that the use of Hankel matrices of an appropriate size reduces the influence of quadrature error on the eigenvalues that lie inside a given integral path. Hankel matrices are known to be very ill-conditioned [9]. Indeed, the condition number of Hankel matrices often increases exponentially. However, our element-wise error analysis shows that the eigenvalues inside the unit circle of a matrix pencil of Hankel matrices can be obtained accurately. This result can be generalized to an arbitrary circle by a shift and scale transformation.
The rest of this paper is organized as follows. In Section 2, we present perturbation results for a matrix pencil of Hankel matrices. In Section 3, we extend the results to the case where the matrix pencil consists of block Hankel matrices. Some numerical experiments, whose results are consistent with the theoretical results, are reported in Section 4.
2. Perturbation results for a matrix pencil of Hankel matrices

Let f(z) be a rational function with n simple poles η_i ∈ C for 1 ≤ i ≤ n, and let ν_i ∈ C for 1 ≤ i ≤ n be their residues, where C denotes the set of complex numbers. Throughout this paper, we assume that η_1, ..., η_n are mutually distinct and that ν_i ≠ 0 for 1 ≤ i ≤ n.
µk =1
2πi
∫
T
zkf(z)dz, k = 0, 1, . . . , (1)
where T is the unit circle. Let m poles η1, . . . , ηm be lo-cated inside the unit circle, with the rest located outsidethe unit circle. Then, from the residue theorem, µk isgiven by
µk =
m∑
i=1
νiηki , k = 0, 1, . . . .
Let the Hankel matrix H_m ∈ C^{m×m} associated with {μ_k}_{k=0}^{2m−2} and the shifted Hankel matrix H^<_m ∈ C^{m×m} associated with {μ_k}_{k=1}^{2m−1} be

  H_m = [μ_{i+j−2}]_{i,j=1}^{m},   H^<_m = [μ_{i+j−1}]_{i,j=1}^{m},

respectively. Let V_m ∈ C^{m×m} be the Vandermonde matrix V_m = [η_i^{j−1}]_{i,j=1}^{m}. The eigenvalues and eigenvectors of the matrix pencil H^<_m − λH_m can be expressed as follows:
Theorem 1 The eigenvalues of the matrix pencil H^<_m − λH_m are given by η_1, ..., η_m. The right eigenvector x_i with respect to η_i is given by

  x_i = (1/√ν_i) V_m^{−1} e_i,

and the left eigenvector y_i is given by y_i = x̄_i, with y_i^* H_m x_i = 1. Here e_i is the i-th unit vector.
Proof Let u_m ∈ C^m be

  u_m = [ν_1^{1/2}, ..., ν_m^{1/2}]^T,

and let Δ_m = diag(η_1, ..., η_m). It can easily be seen that

  μ_k = u_m^T Δ_m^k u_m,  k = 0, 1, ....

It follows that the Hankel matrices can be factorized as follows:

  H_m = φ_m^T φ_m,   H^<_m = φ_m^T Δ_m φ_m,

where

  φ_m = [u_m  Δ_m u_m  ···  Δ_m^{m−1} u_m] ∈ C^{m×m}.

This implies that

  H^<_m − λH_m = φ_m^T (Δ_m − λI_m) φ_m,

where I_m is the m × m identity matrix. The matrix φ_m is nonsingular, because φ_m can be expressed as

  φ_m = diag(ν_1^{1/2}, ..., ν_m^{1/2}) V_m,

where η_1, ..., η_m are all distinct and ν_1, ..., ν_m are not zero. Thus, the eigenvalues of the matrix pencil H^<_m − λH_m are given by η_1, ..., η_m.

Since Δ_m e_i = η_i e_i, it can be verified that

  x_i = φ_m^{−1} e_i = V_m^{−1} diag(ν_1^{1/2}, ..., ν_m^{1/2})^{−1} e_i = (1/√ν_i) V_m^{−1} e_i.

We can also verify that y_i = conj(φ_m^{−1} e_i) = x̄_i. From these results, we have

  y_i^* H_m x_i = e_i^T (φ_m^T)^{−1} φ_m^T φ_m φ_m^{−1} e_i = 1.

This proves the theorem. (QED)
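Theorem 1 is easy to confirm numerically. The sketch below is ours (using NumPy; the pole and residue values are arbitrary choices): it builds the Hankel pencil from exact moments and recovers the poles as the generalized eigenvalues.

```python
# Numerical check of Theorem 1: with exact moments, the eigenvalues
# of the pencil H<_m - lambda * H_m are exactly the poles eta_i.
import numpy as np

eta = np.array([0.3, -0.5, 0.2 + 0.4j])   # assumed poles (mutually distinct)
nu = np.array([1.0, 0.7, 0.5 - 0.2j])     # assumed nonzero residues
m = len(eta)

# exact moments mu_k = sum_i nu_i * eta_i^k for k = 0, ..., 2m-1
mu = np.array([np.sum(nu * eta ** k) for k in range(2 * m)])

H = np.array([[mu[i + j] for j in range(m)] for i in range(m)])        # H_m
Hs = np.array([[mu[i + j + 1] for j in range(m)] for i in range(m)])   # H<_m

# H_m is nonsingular here, so the pencil eigenvalues are those of H_m^{-1} H<_m
lam = np.linalg.eigvals(np.linalg.solve(H, Hs))
```

Each entry of `lam` matches one of the chosen poles to machine precision, illustrating why moment-based root finders recover the η_i from the pencil.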
An error estimation for the eigenvalues of a perturbed matrix pencil, when all the eigenvalues are simple, is given in [2]. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of the matrix pencil $A - \lambda B$, and let $x_i$ and $y_i$ be the right and left eigenvectors with respect to $\lambda_i$, respectively. Then the eigenvalue $\hat{\lambda}_i$ of the perturbed matrix pencil $(A + \Delta A) - \lambda (B + \Delta B)$, where $\|\Delta A\|_2 \le \delta$ and $\|\Delta B\|_2 \le \delta$ for sufficiently small $\delta > 0$, satisfies the following relation:
$$|\hat{\lambda}_i - \lambda_i| \le \delta \, \frac{(1 + |\lambda_i|)\, \|x_i\|_2 \|y_i\|_2}{|y_i^* B x_i|} + O(\delta^2). \quad (2)$$
Define
$$\tau_i(A, B) = \frac{(1 + |\lambda_i|)\, \|x_i\|_2 \|y_i\|_2}{|y_i^* B x_i|};$$
then $\tau_i(A, B)$ represents the condition number of the $i$-th eigenvalue of the matrix pencil $A - \lambda B$. From Theorem 1, we have the following expression.
Lemma 2
$$\tau_i(H_m^<, H_m) = \frac{1 + |\eta_i|}{|\nu_i|} \, \|V_m^{-1} e_i\|_2^2. \quad (3)$$
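Lemma 2 can likewise be checked numerically. The sketch below (our own verification, with illustrative values; the $\nu_i$ are taken real and positive so that $\sqrt{\nu_i}$ is unambiguous) forms $x_i$ and $y_i$ as in Theorem 1 and compares $\tau_i$ computed from its definition with the closed form (3).

```python
import numpy as np

eta = np.array([0.4, -0.3 + 0.5j, 1.5])    # distinct poles (illustrative)
nu = np.array([1.0, 0.5, 1e-3])            # positive weights (illustrative)
m = len(eta)

mu = np.array([np.sum(nu * eta**k) for k in range(2 * m)])
H = np.array([[mu[i + j] for j in range(m)] for i in range(m)])

V = np.vander(eta, m, increasing=True)     # V[i, j] = eta_i^j (Vandermonde)
Vinv = np.linalg.inv(V)

for i in range(m):
    x = Vinv[:, i] / np.sqrt(nu[i])        # right eigenvector from Theorem 1
    y = np.conj(x)                         # left eigenvector y_i = conj(x_i)
    # tau_i from its definition ...
    tau_def = (1 + abs(eta[i])) * np.linalg.norm(x) * np.linalg.norm(y) \
              / abs(np.conj(y) @ H @ x)
    # ... against the closed form (3)
    tau_lem = (1 + abs(eta[i])) / abs(nu[i]) * np.linalg.norm(Vinv[:, i])**2
    assert np.isclose(tau_def, tau_lem)
```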
Suppose that the contour integral (1) is approximated using the $N$-point trapezoidal rule:
$$\hat{\mu}_k = \frac{1}{N} \sum_{j=0}^{N-1} \theta_j^{k+1} f(\theta_j), \quad k = 0, 1, \dots,$$
with the equi-distributed points on the unit circle:
$$\theta_j = e^{\frac{2\pi i}{N}\left(j + \frac{1}{2}\right)}, \quad j = 0, 1, \dots, N-1.$$
The approximate moments $\hat{\mu}_k$ suffer from quadrature error. For the error analysis of the trapezoidal rule, we use the following estimation.
Lemma 3 Let $\eta$ be a complex number with $|\eta| \ne 1$. For any integer $k$ with $0 \le k < N$, the following holds:
$$\frac{1}{N} \sum_{j=0}^{N-1} \frac{\theta_j^{k+1}}{\theta_j - \eta} = \frac{\eta^k}{1 + \eta^N}. \quad (4)$$
Proof If $|\eta| < 1$, we have
$$\frac{1}{N} \sum_{j=0}^{N-1} \frac{\theta_j^{k+1}}{\theta_j - \eta} = \frac{1}{N} \sum_{j=0}^{N-1} \frac{\theta_j^{k}}{1 - \eta/\theta_j} = \sum_{p=0}^{\infty} \eta^p \, \frac{1}{N} \sum_{j=0}^{N-1} \theta_j^{k-p} = \sum_{q=0}^{\infty} (-1)^q \eta^{Nq+k}. \quad (5)$$
The last step follows from the fact that
$$\frac{1}{N} \sum_{j=0}^{N-1} \theta_j^{p} = \begin{cases} (-1)^q & \text{if } p = qN \text{ for } q \in \mathbb{Z}, \\ 0 & \text{otherwise}, \end{cases}$$
where $\mathbb{Z}$ denotes the set of integers. Similarly, for the case in which $|\eta| > 1$, we have
$$\frac{1}{N} \sum_{j=0}^{N-1} \frac{\theta_j^{k+1}}{\theta_j - \eta} = \frac{1}{N} \sum_{j=0}^{N-1} \left(-\frac{1}{\eta}\right) \frac{\theta_j^{k+1}}{1 - \theta_j/\eta} = \sum_{p=0}^{\infty} \left(-\frac{1}{\eta^{p+1}}\right) \frac{1}{N} \sum_{j=0}^{N-1} \theta_j^{p+k+1} = \sum_{q=1}^{\infty} (-1)^{q-1} \eta^{-Nq+k}. \quad (6)$$
From (5) and (6), we have (4). (QED)
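Identity (4) is easy to confirm numerically. The following sketch (our own check, with illustrative values) verifies it for one pole inside and one pole outside the unit circle, for every admissible $k$.

```python
import numpy as np

N = 16
jj = np.arange(N)
theta = np.exp(2 * np.pi * 1j * (jj + 0.5) / N)   # nodes with theta_j^N = -1

for eta in (0.3 + 0.4j, 2.0 - 1.0j):              # |eta| < 1 and |eta| > 1
    for k in range(N):                            # identity requires 0 <= k < N
        lhs = np.mean(theta**(k + 1) / (theta - eta))
        rhs = eta**k / (1 + eta**N)
        assert np.isclose(lhs, rhs)
print("identity (4) verified for k = 0, ...,", N - 1)
```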
From this lemma, we derive the following equation:
$$\hat{\mu}_k = \sum_{i=1}^{n} \frac{\nu_i}{1 + \eta_i^N} \, \eta_i^k, \quad k = 0, 1, \dots, N-1. \quad (7)$$
This equation implies that $\hat{\mu}_k$ is a moment with a new weight $\hat{\nu}_i = \nu_i/(1 + \eta_i^N)$ instead of $\nu_i$. Therefore, we see that the quadrature error affects the weights; however, the poles $\eta_1, \dots, \eta_n$ are unchanged if the computations are performed without any numerical error.
For $\eta_i$ such that $|\eta_i^N| \gg 1$, the weight $\hat{\nu}_i = \nu_i/(1 + \eta_i^N)$ is close to zero. Suppose that $\eta_1, \dots, \eta_n$ are ordered such that $|\eta_1| \le \cdots \le |\eta_n|$. Let $m'$ be an integer such that $\hat{\nu}_i = O(\varepsilon)$ for any $i$ with $m' < i \le n$ for sufficiently small $\varepsilon > 0$. Then (7) can be expressed as
$$\hat{\mu}_k = \sum_{i=1}^{m'} \hat{\nu}_i \eta_i^k + O(\varepsilon).$$
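Equation (7) and the truncation to $m'$ poles can be illustrated with a small sketch (our own, with illustrative poles and weights, including a pole on the unit circle and one far outside; not the paper's data). Note that a pole with $|\eta| = 1$ causes no difficulty as long as it does not coincide with a quadrature node.

```python
import numpy as np

N = 32
theta = np.exp(2 * np.pi * 1j * (np.arange(N) + 0.5) / N)

eta = np.array([0.2, 0.6, 1.0, 2.5], dtype=complex)   # poles (illustrative)
nu = np.array([1.0, 0.8, 0.5, 0.3])                   # weights (illustrative)

# Trapezoidal-rule moments mu_hat_k of f(z) = sum_i nu_i / (z - eta_i)
f_vals = np.array([np.sum(nu / (t - eta)) for t in theta])
mu_hat = np.array([np.mean(theta**(k + 1) * f_vals) for k in range(8)])

# Equation (7): same poles, damped weights nu_i / (1 + eta_i^N)
nu_hat = nu / (1 + eta**N)
mu_damped = np.array([np.sum(nu_hat * eta**k) for k in range(8)])
assert np.allclose(mu_hat, mu_damped)

# The damped weight of the outside pole eta = 2.5 is O(eps),
# so m' = 3 poles already reproduce the quadrature moments
mu_trunc = np.array([np.sum(nu_hat[:3] * eta[:3]**k) for k in range(8)])
print(np.max(np.abs(mu_hat - mu_trunc)))   # tiny truncation error
```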
Let $\tilde{\mu}_k = \sum_{i=1}^{m'} \hat{\nu}_i \eta_i^k$; then $\hat{\mu}_k$ can be regarded as a perturbation of the moment $\tilde{\mu}_k$, which is obtained from the $m'$ poles $\eta_1, \dots, \eta_{m'}$ with weights $\hat{\nu}_1, \dots, \hat{\nu}_{m'}$. Let
$$F_{m'} = \mathrm{diag}\big((1 + \eta_1^N)^{-1/2}, \dots, (1 + \eta_{m'}^N)^{-1/2}\big);$$
then we have
$$\tilde{\mu}_k = (F_{m'} u_{m'})^T \Delta_{m'}^k (F_{m'} u_{m'}).$$
Therefore $\tilde{H}_{m'}$ and $\tilde{H}_{m'}^<$, the $m' \times m'$ Hankel matrices associated with $\tilde{\mu}_k$, can be factorized as follows:
$$\tilde{H}_{m'} = (F_{m'} \varphi_{m'})^T (F_{m'} \varphi_{m'}), \quad (8)$$
$$\tilde{H}_{m'}^< = (F_{m'} \varphi_{m'})^T \Delta_{m'} (F_{m'} \varphi_{m'}). \quad (9)$$
The right eigenvector of $\tilde{H}_{m'}^< - \lambda \tilde{H}_{m'}$ with respect to $\eta_i$ is given by
$$\tilde{x}_i = \frac{1}{\sqrt{\hat{\nu}_i}} V_{m'}^{-1} e_i = \frac{\sqrt{1 + \eta_i^N}}{\sqrt{\nu_i}} V_{m'}^{-1} e_i,$$
and the left eigenvector is given by $\tilde{y}_i = \bar{\tilde{x}}_i$. From these results and Lemma 2, the following relation is obtained.
Theorem 4 Let $\hat{H}_{m'} = \tilde{H}_{m'} + O(\varepsilon)$ and $\hat{H}_{m'}^< = \tilde{H}_{m'}^< + O(\varepsilon)$ with sufficiently small $\varepsilon > 0$. Then
$$\tau_i(\hat{H}_{m'}^<, \hat{H}_{m'}) = |1 + \eta_i^N| \times \tau_i(H_{m'}^<, H_{m'}) + O(\varepsilon).$$
This theorem shows that the condition number of the $i$-th eigenvalue of the matrix pencil $\hat{H}_{m'}^< - \lambda \hat{H}_{m'}$, which is constructed from the moments calculated by numerical integration, is magnified by a factor $|1 + \eta_i^N| > 1$. However, the influence of the quadrature error on our target eigenvalues, which lie inside the unit circle, is small, since $|1 + \eta_i^N| \approx 1$ when $\nu_i = O(1)$. We should take $m'$ large enough that $\nu_i/(1 + \eta_i^N) = O(\varepsilon)$ for $i > m'$. This condition can be assessed from the near-singularity of $\hat{H}_{m'}$.
3. Extension to block Hankel matrices with matrix-valued moments
Now we extend the results of the previous section to the case of matrix-valued moments. Let $L$ be a positive integer with $L \le n$, and let $N_i \in \mathbb{C}^{L \times L}$, $1 \le i \le n$, be given by $N_i = d_i c_i^T$, $i = 1, 2, \dots, n$, with vectors $c_i, d_i \in \mathbb{C}^L$, $1 \le i \le n$. Define the matrix-valued moments $M_k \in \mathbb{C}^{L \times L}$ by
$$M_k = \frac{1}{2\pi i} \int_{T} z^k F(z)\, dz, \quad k = 0, 1, \dots,$$
where $F(z) \in \mathbb{C}^{L \times L}$ is the matrix-valued function defined by $F(z) = \sum_{i=1}^{n} N_i/(z - \eta_i)$. This function appears in the block Sakurai-Sugiura method for both generalized and nonlinear eigenvalue problems [4–6].
It can be verified that
$$M_k = \sum_{i=1}^{m} N_i \eta_i^k = D_m^T \Delta_m^k C_m, \quad k = 0, 1, \dots,$$
where
$$C_m = [c_1\ c_2\ \cdots\ c_m]^T, \qquad D_m = [d_1\ d_2\ \cdots\ d_m]^T.$$
Here, we assume that the column vectors of $C_m$ and those of $D_m$ are each linearly independent.
Let $K$ be an integer such that $m \le KL \le n$. Define the block Hankel matrices $H_{KL}, H_{KL}^< \in \mathbb{C}^{KL \times KL}$ with elements $M_k$ by $H_{KL} = [M_{i+j-2}]_{i,j=1}^{K}$ and $H_{KL}^< = [M_{i+j-1}]_{i,j=1}^{K}$. Let $\Phi_{m,KL}, \Psi_{m,KL} \in \mathbb{C}^{m \times KL}$ be
$$\Phi_{m,KL} = [C_m\ \ \Delta_m C_m\ \ \dots\ \ \Delta_m^{K-1} C_m],$$
$$\Psi_{m,KL} = [D_m\ \ \Delta_m D_m\ \ \dots\ \ \Delta_m^{K-1} D_m].$$
We define the $m \times m$ leading submatrices as follows:
$$H_m = H_{KL}(1:m, 1:m), \qquad H_m^< = H_{KL}^<(1:m, 1:m),$$
and also
$$\Phi_m = \Phi_{m,KL}(1:m, 1:m), \qquad \Psi_m = \Psi_{m,KL}(1:m, 1:m).$$
Then $H_m$ and $H_m^<$, the $m \times m$ block Hankel matrices corresponding to $M_k$, can be factorized as follows:
$$H_m = \Psi_m^T \Phi_m, \qquad H_m^< = \Psi_m^T \Delta_m \Phi_m.$$
These relations lead to the following theorem.
Theorem 5 The eigenvalues of $H_m^< - \lambda H_m$ are given by $\eta_1, \dots, \eta_m$. The right and left eigenvectors $x_i$ and $y_i$ with respect to $\eta_i$ are given by $x_i = \Phi_m^{-1} e_i$ and $y_i = (\bar{\Psi}_m)^{-1} e_i$, respectively, and $y_i^* H_m x_i = 1$.
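Theorem 5 can be checked with a small experiment in the rank-one block setting (our sketch, not the paper's code: random real $c_i$, $d_i$ and illustrative sizes $L = 2$, $K = 3$, $m = KL = 6$, so that $\Phi_m$ is generically nonsingular).

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 2, 3
m = K * L
eta = np.linspace(0.2, 1.2, m)          # six distinct real poles (illustrative)
C = rng.standard_normal((m, L))         # rows are c_i^T
D = rng.standard_normal((m, L))         # rows are d_i^T

def M(k):
    """Matrix moment M_k = sum_i eta_i^k d_i c_i^T = D^T Delta^k C."""
    return D.T @ np.diag(eta**k) @ C

# Block Hankel matrices [M_{i+j-2}] and [M_{i+j-1}], each KL x KL
H = np.block([[M(i + j) for j in range(K)] for i in range(K)])
Hs = np.block([[M(i + j + 1) for j in range(K)] for i in range(K)])

lam = np.linalg.eigvals(np.linalg.solve(H, Hs))
print(np.sort(lam.real))   # should recover eta up to rounding
```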
The approximations for $M_k$ are calculated by
$$\hat{M}_k = \frac{1}{N} \sum_{j=0}^{N-1} \theta_j^{k+1} F(\theta_j), \quad k = 0, 1, \dots. \quad (10)$$
Similar to the case of $\mu_k$, we have
$$\hat{M}_k = \sum_{i=1}^{m'} \frac{N_i}{1 + \eta_i^N} \, \eta_i^k + O(\varepsilon).$$
Therefore, we can see that $\hat{M}_k$ approximately consists of the $m'$ poles $\eta_1, \dots, \eta_{m'}$ with the matrix-valued weights $N_i/(1 + \eta_i^N)$, $i = 1, \dots, m'$. In this case, the quadrature error is $O(\varepsilon)$, which is small enough.
Setting $\tilde{M}_k = \sum_{i=1}^{m'} N_i/(1 + \eta_i^N) \, \eta_i^k$, we have the following theorem.
Theorem 6 Let $\hat{H}_{m'} = \tilde{H}_{m'} + O(\varepsilon)$ and $\hat{H}_{m'}^< = \tilde{H}_{m'}^< + O(\varepsilon)$ for sufficiently small $\varepsilon > 0$. Then
$$\tau_i(\hat{H}_{m'}^<, \hat{H}_{m'}) = |1 + \eta_i^N| \times \tau_i(H_{m'}^<, H_{m'}) + O(\varepsilon).$$
Thus, we obtain a result similar to that of the scalar-moment case. The influence of the quadrature error on the matrix-valued moments depends on the location of each eigenvalue $\eta_i$. For eigenvalues that lie outside the unit circle, the influence of the quadrature error is magnified by $|1 + \eta_i^N| > 1$. However, the perturbation resulting from quadrature error is not large for eigenvalues inside the unit circle.
4. Numerical examples
In this section, some numerical experiments are considered. The computations are performed in MATLAB in double-precision arithmetic. The matrix pencil is solved by the MATLAB function eig, and the systems of linear equations are solved by mldivide.
Example 1 The first example simply verifies the error estimation in (3). Let $n = m = 5$. Let $\eta_1, \dots, \eta_m$
Table 1. Results of Example 1. Underlines indicate the incorrect digits.

 i   Real(η̂_i)              |η̂_i − η_i|    ν_i
 1   −1.012073233465553      1.2 × 10^−2    10^−14
 2    0.499999980832560      4.5 × 10^−8    10^−8
 3    0.500000000000000      5.5 × 10^−16   1.0
 4    1.000000000000000      8.5 × 10^−16   1.0
 5    1.999999999978390      4.7 × 10^−11   10^−6
Table 2. Results for the case of m′ = 12 in Example 2. Parameters are set as N = 32 and L = 5.

  i   Real(η̂_i)             |η̂_i − η_i|     τ̂_i
  1   0.199999999999951      4.9 × 10^−14    1.2 × 10^−13
  2   0.399999999999759      2.4 × 10^−13    7.1 × 10^−13
  3   0.600000000000011      2.0 × 10^−14    1.1 × 10^−12
  4   0.799999999999916      9.4 × 10^−14    3.6 × 10^−12
  5   1.000000000000462      4.6 × 10^−13    4.6 × 10^−12
  6   1.200000000176180      1.8 × 10^−10    2.8 × 10^−9
  7   1.400000019943357      2.0 × 10^−8     2.8 × 10^−7
  8   1.600000150508887      1.6 × 10^−7     4.3 × 10^−6
  9   1.799956725555216      4.3 × 10^−5     1.8 × 10^−4
 10   2.000315575880587      3.2 × 10^−4     2.7 × 10^−3
 11   2.207688176008069      7.9 × 10^−3     1.9 × 10^−1
 12   2.442759435060872      5.0 × 10^−2     5.0 × 10^0
and $\nu_1, \dots, \nu_m$ be $-1.0$, $0.5 + i$, $0.5 - i$, $1.0$, $2.0$ and $10^{-14}$, $10^{-8}$, $1.0$, $1.0$, $10^{-6}$, respectively.
The values $\hat{\eta}_1, \dots, \hat{\eta}_m$ are obtained by solving the generalized eigenvalue problem $H_m^< x = \lambda H_m x$. The moments are calculated by $\mu_k = \sum_{i=1}^{n} \nu_i \eta_i^k$. In Table 1, we show $\hat{\eta}_i$ and $|\hat{\eta}_i - \eta_i|$ for each $i$. The condition number of $H_m$ is $\mathrm{cond}(H_m) = 1.9 \times 10^{14}$; however, $\eta_3$ and $\eta_4$ are calculated numerically with sufficient accuracy from the matrix pencil. The other poles suffer from numerical error whose magnitude is proportional to $1/\nu_i$.
Example 2 Let $n = 20$ and $\eta_i = 0.2 \times i$ for $1 \le i \le n$. Here we set $m = 5$. The elements of $c_1, \dots, c_n$ and $d_1, \dots, d_n$ are set by a random number generator from a uniform distribution over the interval $[0, 1]$. $\hat{M}_k$, $k = 0, 1, \dots$, are calculated by the $N$-point trapezoidal rule (10). The parameters are set as $N = 32$ and $L = 5$.
For the $\eta_i$ with $1 \le i \le m$, the error is evaluated by $\max_{1 \le i \le m} |\hat{\eta}_i - \eta_i|$. To estimate the perturbation in the Hankel matrices of size $m'$, we computed $\sigma_{m'}/\sigma_1$ for various $m'$, where $\sigma_1, \dots, \sigma_{m'}$ are the singular values of $\hat{H}_{m'}$.
In Table 2, we present the results for the case of $m' = 12$. Instead of calculating $\tau_i(H_{m'}^<, H_{m'})$, we calculated $\hat{\tau}_i$ by using the eigenvectors of $\hat{H}_{m'}^< - \lambda \hat{H}_{m'}$. Note that $\eta_5 = 1$ is located on the unit circle; however, it can be obtained because it does not coincide with any quadrature node. The condition number of $\hat{H}_{m'}$ is $\mathrm{cond}(\hat{H}_{m'}) = 1.1 \times 10^{17}$.
The results for various $m'$ are shown in Table 3. The maximum error for the eigenvalues in the unit circle decreases as the matrix size $m'$ increases. The ratio of the singular values $\sigma_{m'}/\sigma_1$ gives a good estimate of the perturbation of the coefficients of $\hat{H}_{m'}$.
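The selection of $m'$ via the singular-value ratio can be sketched as follows (our own simplified version of Example 2: scalar weights $\nu_i = 1$ instead of the paper's random block data, so the numbers differ from Table 3; only the qualitative decay of $\sigma_{m'}/\sigma_1$ is intended to carry over).

```python
import numpy as np

N = 32
theta = np.exp(2 * np.pi * 1j * (np.arange(N) + 0.5) / N)
eta = 0.2 * np.arange(1, 21)            # n = 20 poles as in Example 2
nu = np.ones(20)                        # simplified scalar weights (assumption)

# Quadrature moments of f(z) = sum_i nu_i / (z - eta_i)
f_vals = np.array([np.sum(nu / (t - eta)) for t in theta])
mu_hat = np.array([np.mean(theta**(k + 1) * f_vals) for k in range(28)])

# Grow the Hankel matrix and watch sigma_{m'} / sigma_1
ratios = {}
for mp in range(5, 15):
    Hk = np.array([[mu_hat[i + j] for j in range(mp)] for i in range(mp)])
    s = np.linalg.svd(Hk, compute_uv=False)
    ratios[mp] = s[-1] / s[0]
    print(mp, ratios[mp])               # ratio drops toward the noise level
```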
Table 3. Maximum error of η̂_i for 1 ≤ i ≤ 5 for various m′ in Example 2. Parameters are set as N = 32 and L = 5.

 m′   max|η̂_i − η_i|   σ_{m′}/σ_1     (σ_{m′}/σ_1) × max(τ̂_i)
  5   3.5 × 10^−3       1.3 × 10^−2    1.3 × 10^−1
  6   4.8 × 10^−4       9.7 × 10^−4    5.2 × 10^−2
  7   6.9 × 10^−7       1.1 × 10^−5    4.1 × 10^−4
  8   9.9 × 10^−9       9.5 × 10^−8    1.7 × 10^−6
  9   5.5 × 10^−9       3.2 × 10^−9    1.4 × 10^−8
 10   3.8 × 10^−11      3.3 × 10^−10   1.7 × 10^−9
 11   3.0 × 10^−12      7.2 × 10^−12   4.1 × 10^−11
 12   2.4 × 10^−13      1.4 × 10^−13   3.6 × 10^−12
 13   2.0 × 10^−13      9.0 × 10^−15   1.4 × 10^−13
 14   2.2 × 10^−13      5.0 × 10^−15   1.4 × 10^−13

5. Conclusions
Perturbation results for the eigenvalues of a matrix pencil of Hankel matrices associated with complex moments have been given. We extended these results to the case where the matrices have a block Hankel structure.
From these results, we ascertain that the use of Hankel matrices of an appropriate size reduces the influence of quadrature error for eigenvalues that lie inside a given integral path. In this case the Hankel matrices are ill-conditioned; however, element-wise error analysis shows that the target eigenvalues can be obtained accurately. The singular values of the Hankel matrix give good information about the quadrature errors, from which we can estimate an appropriate size of the Hankel matrix.
The numerical examples are consistent with the theoretical results. More detailed error estimations and applications to practical problems are subjects for future study.
Acknowledgments
This research was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology, Japan (Grant numbers 21246018, 21105502 and 19300001).
References
[1] P. Kravanja, T. Sakurai and M. Van Barel, On locating clusters of zeros of analytic functions, BIT, 39 (1999), 646–682.
[2] P. Kravanja, T. Sakurai, H. Sugiura and M. Van Barel, A perturbation result for generalized eigenvalue problems and its application to error estimation in a quadrature method for computing zeros of analytic functions, J. Comput. Appl. Math., 161 (2003), 339–347.
[3] T. Sakurai, P. Kravanja, H. Sugiura and M. Van Barel, An error analysis of two related quadrature methods for computing zeros of analytic functions, J. Comput. Appl. Math., 152 (2003), 467–480.
[4] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for polynomial eigenvalue problems using contour integral, Japan J. Indust. Appl. Math., to appear.
[5] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for nonlinear eigenvalue problems using contour integral, JSIAM Letters, 1 (2009), 52–55.
[6] T. Ikegami, T. Sakurai and U. Nagashima, A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, J. Comput. Appl. Math., to appear.
[7] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration, J. Comput. Appl. Math., 159 (2003), 119–128.
[8] T. Sakurai and H. Tadano, CIRR: a Rayleigh-Ritz type method with contour integral for generalized eigenvalue problems, Hokkaido Math. J., 36 (2007), 745–757.
[9] E. E. Tyrtyshnikov, How bad are Hankel matrices?, Numer. Math., 67 (1994), 261–269.
JSIAM Letters Vol.1 (2009)
ISBN : 978-4-9905076-0-2
ISSN : 1883-0617
©2009 The Japan Society for Industrial and Applied Mathematics
Publisher :
The Japan Society for Industrial and Applied Mathematics
4F, Nihon Gakkai Center Building
2-4-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan
tel. +81-3-5684-8649 / fax. +81-3-5684-8663