arxiv:1803.10270v1 [math.na] 27 mar 2018 · parallel numerical tensor methods for high-dimensional...

Parallel Numerical Tensor Methods for High-Dimensional PDEs

A. M. P. Boelensa, D. Venturib, D. M. Tartakovskya

aDepartment of Energy Resources Engineering, Stanford University, Palo Alto, CA 94305bDepartment of Applied Mathematics and Statistics, UC Santa Cruz, Santa Cruz, CA 95064

Abstract

High-dimensional partial-differential equations (PDEs) arise in a number of fields of science andengineering, where they are used to describe the evolution of joint probability functions. Theirexamples include the Boltzmann and Fokker-Planck equations. We develop new parallel algorithmsto solve high-dimensional PDEs. The algorithms are based on canonical and hierarchical numericaltensor methods combined with alternating least squares and hierarchical singular value decomposi-tion. Both implicit and explicit integration schemes are presented and discussed. We demonstratethe accuracy and efficiency of the proposed new algorithms in computing the numerical solutionto both an advection equation in six variables plus time and a linearized version of the Boltzmannequation.

Keywords:2010 MSC: 00-01, 99-00

1. Introduction

High-dimensional partial-differential equations (PDEs) arise in a number of fields of scienceand engineering. For example, they play an important role in modeling rarefied gas dynamics [1],stochastic dynamical systems [2, 3, 4], structural dynamics [5, 6], turbulence [7, 8, 9, 10], biologicalnetworks [11], and quantum systems [12, 13]. In the context of kinetic theory and stochasticdynamics, high-dimensional PDEs typically describe the evolution of a (joint) probability densityfunction (PDF) of system states, providing macroscopic, continuum descriptions of Langevin-typestochastic systems. Classical examples of such PDEs include the Boltzmann equation [1] and theFokker-Planck equation [14]. More recently, PDF equations have been used to quantify uncertaintyin model predictions [15].

High dimensionality and resulting computational complexity of PDF equations can be mitigatedby using particle-based methods [8, 16]. Well known examples are direct simulation Monte-Carlo(DSMC) [17] and the Nambu-Babovsky method [18]. Such methods preserve physical propertiesof a system and exhibit high computational efficiency (they scale linearly with the number ofparticles), in particular in simulations far from statistical equilibrium [19, 20]. Moreover, thesemethods have relatively low memory requirements since the particles tend to concentrate wherethe distribution function is not small. However, the accuracy of particle methods may be poor and

Email addresses: [email protected] (A. M. P. Boelens), [email protected] (D. Venturi),[email protected] (D. M. Tartakovsky)

Preprint submitted to ArXiv March 29, 2018

arX

iv:1

803.

1027

0v1

[m

ath.

NA

] 2

7 M

ar 2

018

their predictions are subject to significant statistical fluctuations [20, 21, 22]. Such fluctuationsneed to be post-processed appropriately, for example by using variance reduction techniques. Also,relevance and applicability of these numerical strategies to PDEs other than kinetic equations arenot clear. To overcome these difficulties, several general-purpose algorithms have been recentlyproposed to compute the numerical solution to rather general high-dimensional linear PDEs. Themost efficient techniques, however, are problem specific [23, 24, 25, 16]. For example, in the contextof kinetic methods, there is an extensive literature concerning the (six-dimensional) Boltzmannequation and its solutions close to statistical equilibrium. Deterministic methods for solving suchproblems include semi-Lagrangian schemes and discrete velocity models. The former employ a fixedcomputational grid, account for transport features of the Boltzmann equation in a fully Lagrangianframework, and usually adapt operator splitting. The latter employ a regular grid in velocity anda discrete collision operator on the points of the grid that preserves the main physical properties(see §3 and §4 in Di Marco and Pareschi [16] for an in-depth review of semi-Lagrangian methodsand discrete velocity models for kinetic transport equations, respectively).

We develop new parallel algorithms to solve high-dimensional partial differential equations basedon numerical tensor methods [26]. The algorithms we propose are based on canonical and hierar-chical tensor expansions combined with alternating least squares and hierarchical singular valuedecomposition. The key element that opens the possibility to solve high-dimensional PDEs nu-merically with tensor methods is that tensor approximations proved to be capable of representingfunction-related N -dimensional data arrays of size QN with log-volume complexity O(N log(Q)).Combined with traditional deterministic numerical schemes (e.g., spectral collocation method [27],these novel representations allow one to compute the solution to high-dimensional PDEs usinglow-parametric rank-structured tensor formats.

This paper is organized as follows. In Section 2 we discuss tensor representations of N -dimensional functions. In particular, we discuss canonical tensor decomposition and hierarchicaltensor networks, including hierarchical Tucker and tensor train expansions. We also address theproblem of computing tensor expansion via alternating least squares. In Section 3 we developseveral algorithms that rely on numerical tensor methods to solve high-dimensional PDEs. Specifi-cally, we discuss schemes based on implicit and explicit time integration, combined with dimensionalsplitting, alternating least squares and hierarchical singular value decomposition. In Section 4 wedemonstrate the effectiveness and computational efficiency of the proposed new algorithms in solv-ing both an advection equation in six variables plus time and a linearized version of the Boltzmannequation.

2. Tensor Decomposition of High-Dimensional Functions

Consider a multivariate scalar field f(z) : D ⊆ RN → R. In this section we briefly revieweffective representations of f based on tensor methods. In particular, we discuss the canonicaltensor decomposition and hierarchical tensor methods, e.g., tensor train and hierarchical Tuckerexpansions.

2.1. Canonical Tensor Decomposition

The canonical tensor decomposition of the multivariate function f(z) : D ⊆ RN → R is a seriesexpansion of the form

f(z) 'r∑l=1

N∏k=1

Glj(zj), (1)

2

where Gli(zi) : R → R are one-dimensional functions usually represented relative to a known basis{φ1, ..., φQ}, i.e.,

Gli(zi) =

Q∑k=1

βlikφk(zi). (2)

The quantity r in (1) is called separation rank. Although general/computable theorems relatinga prescribed accuracy of the representation of f(z) to the value of the separation rank r are stilllacking, there are cases where the expansion (1) is exponentially more efficient than one wouldexpect a priori [28].

Alternating Least Squares (ALS) Formulation. Development of robust and efficient algorithms tocompute (1) to any desired accuracy is still a relatively open question (see [29, 30, 31, 32, 23] for re-cent progresses). Computing the tensor components Glk(zk) usually relies on (greedy) optimizationtechniques, such as alternating least squares (ALS) [33, 34, 29, 28] or regularized Newton methods[30], which are only locally convergent [35], i.e., the final result may depend on the initial conditionof the algorithm.

In the least-squares setting, the tensor components Glj(zj) are computed by minimizing a normof the residual,

R(z) = f(z)−r∑l=1

N∏k=1

Glj(zj), (3)

with respect to the rNQ degrees of freedom βnhj i.e.,

minβnhj

‖R(z)‖ . (4)

Assuming f(z) to be periodic in the hyper-cube D = [−b, b]N (b > 0), we define the norm ‖ · ‖ asa standard L2 norm

‖R(z)‖2L2=

∫ b

−b· · ·∫ b

−bR(z)2dz,

=

∫ b

−b· · ·∫ b

−b

[f(z)−

r∑n=1

N∏k=1

(Q∑s=1

βnksφs(zk)

)]2dz. (5)

In the alternating least-squares (ALS) paradigm, we compute the minimizer of the residual (3) bysplitting the non-convex optimization problem (4) into a sequence of convex low-dimensional convexoptimization problems. To illustrate the method, let us define vectors

β1 = (β111, ..., β

11Q, ..., β

r11, ..., β

r1Q)T , · · · , βN = (β1

N1, ..., β1NQ, ..., β

rN1, ..., β

rNQ)T . (6)

Each βi collects the degrees of freedom of all functions {G1i (zi), ..., G

ri (zi)} depending on zi. Next,

the optimization problem (4) is split into a sequence of convex optimization problems,

minβ1

‖R‖2L2, · · · , min

βN

‖R‖2L2. (7)

This sequence is not equivalent to the full problem (4) because, in general, it does not allow oneto compute the global minimizer of (4) [35, 36, 37, 38]. The Euler-Lagrange equations associated

3

with (7) are of the form1

Ajβj = gj j = 1, . . . , N, (9)

where

Aj =

A11j11 · · ·A11

j1Q A12j11 · · ·A12

j1Q · · · A1rj11 · · ·A1r

j1Q...

......

...A11jQ1 · · ·A11

jQQ A12jQ1 · · ·A12

jQQ · · · A1rjQ1 · · ·A1r

jQQ...

......

...Ar1j11 · · ·Ar1j1Q Ar2j11 · · ·Ar2j1Q · · · Arrj11 · · ·Arrj1Q

......

......

Ar1jQ1 · · ·Ar1jQQ Ar2jQ1 · · ·Ar2jQQ · · · ArrjQ1 · · ·ArrjQQ

, gj =

g1j1...g1jQ

...grj1...grjQ

, (10)

and

Alnjhs =

∫ b

−bφh(zj)φs(zj)dzj

N∏k=1k 6=j

∫ b

−bGlk(zk)Gnk (zk)dzk, (11)

gnjh =

∫ b

−b· · ·∫ b

−bf(x)φh(xj)

N∏k=1k 6=j

Gnk (xk)dz1 · · · dzN . (12)

The matrices Aj are symmetric, positive definite and of size rQ× rQ.

Convergence of the ALS Algorithm. The ALS algorithm described above is an alternating optimiza-tion scheme, i.e., a nonlinear block Gauss–Seidel method ([39], §7.4). There is a well–developedlocal convergence theory for this type of methods [39, 37]. In particular, it can be shown thatALS is locally equivalent to the linear block Gauss–Seidel iteration applied to the Hessian matrix.This implies that ALS is linearly convergent in the iteration number [35], provided that the Hes-sian of the residual is positive definite (except on a trivial null space associated with the scalingnon-uniqueness of the canonical tensor decomposition). The last assumption may not be alwayssatisfied. Therefore, convergence of the ALS algorithm cannot be granted in general. Anotherpotential issue of the ALS algorithm is the poor conditioning of the matrices Aj in (9), which canaddressed by regularization [33, 34]. The canonical tensor decomposition (1) in N dimensions hasrelatively small memory requirements. In fact, the number of degrees of freedom that we need tostore is rNQ, where r is the separation rank, and Q is the number of degrees of freedom employedin each tensor component Glk(zk). Despite the relatively low-memory requirements, it is often de-sirable to employ scalable parallel versions the ALS algorithm [31, 40] to compute the canonicaltensor expansion (1) (see Appendix B)

1 Recall that minimizing the residual (3) with respect to βnhj is equivalent to impose orthogonality relative to the

space spanned by the functions

φh(zj)

N∏k=1k 6=j

Gnk (zk). (8)

4

2.2. Hierarchical Tensor Methods

Hierarchical Tensor methods [41, 42] were introduced to mitigate the dimensionality problem inthe core tensor of the classical Tucker decomposition [43]. A key idea is to perform a sequence ofSchmidt decompositions (or multivariate SVDs [44, 45]) until the approximation problem is reducedto a product of one-dimensional functions/vectors. To illustrate the method in a simple way,consider a six-dimensional function f(z) = f(z1, . . . , z6). We first split the variables as (z1, z2, z3)and (z4, z5, z6) through one Schmidt decomposition [46] as

f(z) =

r∑i7,i8=1

A{1}i7i8

T{1,2,3}i7

(z1, z2, z3)T{4,5,6}i8

(z4, z5, z6). (13)

Then we decompose T{1,2,3}i7

(z1, z2, z3) and T{4,5,6}i8

(z4, z5, z6) further by additional Schmidt expan-sions to obtain

f(z) =

r∑i7,i8=1

A{1}i7i8

r∑i1,i9=1

A{2}i7i1i9

T{1}i1

(z1)T{2,3}i9

(z2, z3)

r∑i4,i10=1

A{3}i8i4i10

T{4}i4

(z4)T{5,6}i10

(z5, z6),

=

r∑i1,··· ,i6=1

Ci1···i6T{1}i1

(z1)T{2}i2

(z2)T{3}i3

(z3)T{4}i4

(z4)T{5}i5

(z5)T{6}i6

(z6). (14)

The 6-dimensional core tensor2 is explicitly obtained as

Ci1···i6 =

r∑i7,i8=1

A{1}i7i8

r∑i9=1

A{2}i7i1i9

A{4}i9i2i3

r∑i10=1

A{3}i8i4i10

A{5}i10i5i6

. (16)

The procedure just described forms the foundation of hierarchical tensor methods. The key elementis that the core tensor Ci1···i6 resulting from this procedure is always factored as a product of atmost three-dimensional matrices. This is also true in arbitrary dimensions. The tensor components

T{k}ij

and the factors of the core tensor (16) can be effectively computed by employing hierarchical

singular value decomposition [44, 51, 45]. Alternatively, one can use an optimization frameworkthat leverages recursive subspace factorizations [52]. If we single out one variable at the time andperform a sequential Schmidt decomposition of the remaining variables we obtain the so-calledtensor-train (TT) decomposition [53, 38]. Both tensor train and hierarchical tensor expansions canbe conveniently visualized by graphs (see Figure 1). This is done by adopting the following standardrules: i) a node in a graph represents a tensor in as many variables as the number of the edgesconnected to it, ii) connecting two tensors by an edge represents a tensor contraction over the indexassociated with a certain variable.

2A diagonalization of the core tensor in (14) would minimize the number of terms in the series expansion. Un-fortunately, this is impossible for a tensor with dimension larger than 2 (see, e.g., [47, 48, 43, 49]) and, for complextensors, [50]). A closer look at the canonical tensor decomposition (1) reveals that such an expansion is in the formof a fully diagonal high-order Schmidt decomposition, i.e.,

f(x1, . . . , xN ) =

r∑l=1

Cl···lGl1(x1) · · ·Gl

N (xN ). (15)

The fact that diagonalization of Cl1···l6 is impossible in dimension larger than 2 implies that it is impossible tocompute the canonical tensor decomposition of f by standard linear algebra techniques.

5

Hierarchical Tucker Tensor Train

Figure 1: Graph representation of the hierarchical Tucker (HT) and tensor train (TT) decomposition of a six-dimensional function f(z1, . . . , z6).

Efficient algorithms to perform basic operations between hierarchical tensors, such as addition,orthogonalization, rank reduction, scalar products, multiplication, and linear transformations, arediscussed in [54, 43, 51, 55]. Parallel implementations of such algorithms were recently proposed in[56, 57]. Methods for reducing the computational cost of tensor trains are discussed in [58, 59, 55].Applications to the Vlasov kinetic equation can be found in [60, 61, 62].

3. Numerical Approximation of High-Dimensional PDEs

In this section we develop efficient numerical methods to integrate high-dimensional linear PDEsof the form

∂f(z, t)

∂t= L(z)f(z, t), (17)

where L(z) is a linear operator. The algorithms we propose are based on numerical tensor methodsand appropriate rank-reduction techniques, such as hierarchical singular value decomposition [44,57] and alternating least squares [31, 30]. To begin with, we assume that L(z) is a separable linearoperator with separation rank rL, i.e., an operator of the form

L(z) =

rL∑l=1

αlLl1(z1) · · ·LlN (zN ). (18)

For each i and j, Lji (zi) is a linear operator acting only on variable zi. As an example, the operator

L(z1, z2) = z21∂

∂z1− sin(z2)

∂2

∂z1∂z2. (19)

is separable with separation rank rL = 2 in dimension N = 2. It can be written in the form (18) ifwe set α1 = α2 = 1 and

L11(z1) = z21

∂

∂z1, L1

2(z2) = 1, L21(z1) =

∂

∂z1, L2

2(z2) = − sin(z2)∂

∂z2. (20)

6

3.1. Tensor Methods with Implicit Time Stepping

Let us discretize the PDE in (17) in time by using the Crank-Nicolson method. To this end,consider an evenly-spaced grid tn = n∆t (n = 0, 1, ...) with time step ∆t. This yields[

I − ∆t

2L(z)

]fn+1(z) =

[I +

∆t

2L(z)

]fn(z) + ∆tτn+1(z), (21)

where I is the identity matrix, fn(z) = f(z, tn), and τn+1(z) is the local truncation error of theCrank-Nicolson method at time tn+1 [63]. Rewriting this in a more compact notation yields [23]

A(z)fn+1(z) = B(z)fn(z) + ∆tτn+1(z), (22)

where

A = I − ∆t

2L(z), and B = I +

∆t

2L(z). (23)

It follows from the definition of L(z) in (18) that both A(z) and B(z) are separable operators ofthe form

A =

rL∑q=0

ηqEq1(z1) · · ·EqN (zN ), B =

rL∑q=0

ζqEq1(z1) · · ·EqN (aN ), (24)

where η0 = ζ0 = 1, E0j (zj) = 1 (j = 1, ..., N),

ηj = −∆t

2, ζj = −ηj , Eqj (zj) = Lqj(zj), q = 1, ..., rL, j = 1, ..., N. (25)

A substitution of the canonical tensor decomposition3

fn+1(z) =

r∑l=1

N∏k=1

Glk(zk, tn+1) (27)

into (21) yields the residual

R(z, tn+1) = A(z)fn+1(z)−B(z)fn(z), (28)

in which the local truncation error ∆tτn+1 is embedded into R(z, tn+1). Next, we minimize theresidual using the alternating least squares algorithm, described in Section 3.1, to obtain the solutionat time tn+1. In particular, we look for a minimizer of (28) computed in a parsimonious way. Thekey idea again is to split the optimization problem

minβlks(tn+1)

‖R(z, tn+1)‖2L2(29)

3Recall that the functions Glk(zk, tn) are in the form

Glk(zk, tn) =

Q∑s=1

βlks(tn)φs(zk), (26)

where βlks(tn) (l = 1, ..., r, k = 1, ..., N , s = 1, ..., Q) are the degrees of freedom.

7

Figure 2: Sketch of the alternating least squares (ALS) algorithm for the minimization of the norm of the residualR(z, tn+1) defined in equation (28).

into a sequence of optimization problems of smaller dimension, which are solved sequentially andin parallel [31] (see Appendix B). To this end, we define

βk(tn) = [β1k1(tn), ..., β1

kQ(tn), ..., βrk1(tn), ..., βrkQ(tn)]T k = 1, ..., N. (30)

The vector βk(tn) collects the degrees of freedom representing the solution functional along zk attime tn, i.e., the set of functions {G1

k(zk, tn), ..., Grk(zk, tn)}. Minimization of (29) with respect toindependent variations of βk(tn+1) yields a sequence of convex optimization problems (see Figure 2)

minβ1(tn+1)

‖R(z, tn+1)‖2L2, min

β2(tn+1)‖R(z, tn+1)‖2L2

, · · · , minβN (tn+1)

‖R(z, tn+1)‖2L2. (31)

This set of equations defines the alternating least-squares (ALS) method. The Euler-Lagrangeequations, which identify stationary points of (31), are linear systems in the form

MLq βq(tn+1) = MR

q βq(tn) q = 1, ..., N (32)

where

MLq =

rL∑e,z=0

KLez

[N©k=1k 6=qβk(tn+1)TEez

k βk(tn+1)

]T⊗[Eezq

]T, (33)

MRq =

rL∑e,z=0

KRez

[N©k=1k 6=qβk(tn)TEez

k βk(tn+1)

]T⊗[Eezq

]T, (34)

and

[Eezq

]sh

=

∫ b

−bEeq (a)φs(a)Ezq (a)φh(a)da. (35)

8

In (33)–(34), © denotes the Hadamard matrix product, ⊗ is the Kronecker matrix product, βk isthe matricization of βk, i.e.,

βk(tn) =

β1k1(tn) · · · βrk1(tn)

.... . .

...β1kQ(tn) · · · βrkQ(tn)

, (36)

Eezq are Q×Q matrices, and KL

ez and KRez are entries of the matrices

KL = ηηT , and KR = ηζT , (37)

where η and ζ are column vectors with entries ηq and ζq defined in (24). The ALS algorithm,combined with canonical tensor representations, effectively reduces evaluation of N -dimensionalintegrals to evaluation of the sum of products of one-dimensional integrals.

Summary of the Algorithm. Often many of the integrals in (35) only differ by a prefactor and inthat case the number of unique integrals can be reduced to a very small number (4 in the case of theBoltzmann-BGK equation discussed in Section 4.2). Thus, it is convenient to precompute a mapbetween such integrals and any entry in (35). Such a map allows us to rapidly compute each matrixEezq in (33) and (34), and therefore to efficiently build the matrix system. Next, we compute the

canonical tensor decomposition of the initial condition f(z, 0) by applying the methods describedin Section 2.1. This yields the set of vectors {β1(t0), ...,βN (t0)}, from which the matrices ML

1

and MR1 in (33) and (34) are built. To this end, we need an initial guess for {β1(t1), ...,βN (t1)}

given, e.g., by {β1(t0), ...,βN (t0)} or its small random perturbation. With ML1 , MR

1 in place, wesolve the linear system in (32) and update β1(t1). Finally, we recompute ML

2 and MR2 (with the

updated β1(t1)) and solve for β2(t1). This process is repeated for q = 3, . . . , N and iterated amongall the variables until convergence (see Figure 2). The ALS iterations are said to converge if∥∥∥β(j)

k (tn)− β(j−1)k (tn)

∥∥∥22≤ εTol, j = 1, . . . , Nmax, (38)

where j here denotes the iteration number whose maximum value is Nmax, and εTol is a prescribedtolerance on the increment of such iterations. Parallel versions of the ALS algorithm associatedwith canonical tensor decompositions were recently proposed in [31, 40] (see also Appendix B).We emphasize that the use of a hierarchical tensor series instead of (27) (see Section (2.2)) alsoallows one to compute the unknown degrees of the expansion by using parallel ALS algorithms.This question was recently addressed in [56].

3.2. Tensor Methods with Explicit Time Stepping

Let us discretize PDE (17) in time by using any explicit time stepping scheme, for example thesecond-order Adams-Bashforth scheme [63]

fn+2(z) = fn+1(z) +∆t

2L(z) (3fn+1(z)− fn(z)) + ∆tτn+2(z). (39)

In this setting, the only operations needed to compute fn+2 with tensor methods are: i) addition, ii)application of a (separable) linear operator L, and iii) rank reduction4, the last operation being the

4On the other hand, the use of implicit time-discretization schemes requires one to solve linear systems with tensormethods [57, 51, 33].

9

most important among the three [43, 51, 33]. Rank reduction can be achieved, e.g., by hierarchicalsingular value decomposition [44, 57] or by alternating least squares [31].

From a computational perspective, it is useful to split the sequence of tensor operations yieldingfn+2 in (39) into simple tensor operations followed by rank reduction. For example, L(3fn+1− fn)might be evaluated as follows: i) compute wn+1 = fn+1 − fn; ii) perform rank reduction onwn+1; iii) compute Lwn+1 and perform rank reduction again. This allows one to minimize thememory requirements and overall computational cost.5 Unfortunately, splitting tensor operationsinto sequences of tensor operations followed by rank reduction can produce severe cancellationerrors. In some cases this problem can be mitigated. For example, an efficient and robust algorithm,which allows us to split sums and rank reduction operations, was recently proposed in [51] in thecontext of hierarchical Tucker formats. The algorithm leverages the block diagonal structure arisingfrom addition of hierarchical Tucker tensors.

4. Numerical Results

In this section, we demonstrate the accuracy and effectiveness of the numerical tensor methodsdiscussed above. Two case studies are considered: an advection equation in six dimensions plustime and the Boltzmann-BGK equation.

4.1. Advection Equation

Consider an initial-value problem

∂f

∂t+

N∑j=1

(N∑k=1

Cjkzk

)∂f

∂zj= 0, f(z1, . . . , zN , 0) = f0(z1, . . . , zN ), (40)

where N is the number of independent variables forming the coordinate vector z = (z1, . . . , zN )>,and Cjk are components of a given matrix of coefficients C. The analytical solution to this initial-value problem is computed with the method of characteristics as [64]

f(z, t) = f0(e−tCz

). (41)

We choose the initial condition f0 to be a fully separable product of exponentials,

f0(z) = e−‖z‖22 . (42)

The time dynamics of the solution (41)-(42) is entirely determined by the matrix C. In particular,if C has eigenvalues with positive real part then the matrix exponential exp(−tC) is a contractionmap. In this case, f(z, t) → 1 as t → ∞, at each point z ∈ RN . Figure 3 exhibits the analyticalsolution (41) generated by a 2×2 matrix C, whose eigenvalues are complex conjugate with positivereal part. The side length of the hypercube cube that encloses any level set of the solution at timet depends on the number of dimensions N . In particular, the side-length of the hypercube thatencloses the set {z ∈ RN | f(z, t) ≥ ε} is given by

1

2

√− log(ε)

Nλmin≤ b ≤ 1

2

√− log(ε)

λmin. (43)

5Recall that adding two tensors with rank r1 and r2 usually yields a tensor of rank r1 + r2.

10

t = 0 t = 3 t = 6

-30 -15 0 15 30

-30

-15

0

15

30

0

0.2

0.4

0.6

0.8

1

-30 -15 0 15 30

-30

-15

0

15

30

0

0.2

0.4

0.6

0.8

1

-30 -15 0 15 30

-30

-15

0

15

30

0

0.2

0.4

0.6

0.8

1

Figure 3: Solution to the multivariate PDE (40) in N = 2 dimensions. The stable spiral at the origin of thecharacteristic system attracts every curve in the phase space and ultimately yields f = 1 everywhere after a transient.

where λmin is the smallest eigenvalue of the matrix exp(tC)> exp(tC). Equation (40) can be writtenin the form (17), with separable operator L (separation rank N2). Specifically, the one-dimensionaloperators Llj(zj) appearing in (18) are explicitly defined in Table 1.

q αq Lq1 Lq2 · · · LqN

1 −C11 z1∂z1 1 · · · 12 −C12 ∂z1 z2 · · · 1...

......

......

...N −C1N ∂z1 1 · · · zN

N + 1 −C21 z1 ∂z2 · · · 1N + 2 −C22 1 z2∂z2 · · · 1

......

......

......

2N −C2N 1 ∂z2 · · · zN...

......

......

...N2 −N + 1 −CN1 z1 1 · · · ∂zNN2 −N + 2 −CN2 1 z2 · · · ∂zN

......

......

......

N2 −CNN 1 1 · · · zN∂zN

Table 1: Advection equation (40). Ordering of the linear operators defined in (18).

We used tensor methods, both canonical tensor decomposition and hierarchical tensor meth-ods, and explicit time stepping (Section 3.2) to solve numerically (40) in the periodic hypercube6

[−60, 60]N . Each tensor component was discretized by using a Fourier spectral expansion with 600nodes in each variable (e.g., basis functions in (2) or (14)). The accuracy of the numerical solution

6Such domain is chosen large enough to accommodate periodic (zero) boundary conditions in the integrationperiod of interest.

11

was quantified in terms of the time-dependent relative error

εm(t) =

∣∣∣∣∣f(z∗, t)− f(z∗, t)f(z∗, t)

∣∣∣∣∣ , (44)

where f is the analytical solution (41), f is the numerical solution obtained by using the canonicalor hierarchical tensor methods with separation rank r.

Figure 4 shows the relative pointwise error (44), computed at z∗ = (h, h, . . . , h) with h =0.698835274542439, for different separation ranks r and different number of independent variablesN . As expected, the accuracy of the numerical solution increases with the separation rank r. Also,the relative error increases with the number of dimensions N . It is worthwhile emphasizing thatthe rank reduction in canonical tensor decomposition at each time step is based on a randomizedalgorithm that requires initialization at each time-step. This means that results of simulations withthe same nominal parameters may be different. On the other hand, tensor methods based on ahierarchical singular value decomposition, such as the hierarchical Tucker decomposition [44, 57], donot suffer from this issue. The separation rank of both canonical and hierarchical tensor methodsis computed adaptively up to the maximum value rmax specified in the legend of Figure 4. In twodimensions, the CP-ALS algorithm yields the similar error plots for rmax = 8 and rmax = 12. Thisis because r < 8 in both cases, and throughout the simulation up to t = 1. The difference betweenthese results is due to the random initialization required by the ALS algorithm at each time step.

4.2. Boltzmann-BGK Equation

In this section we develop an accurate ALS-based algorithm to solve a linearized BGK equation(see Appendix A for details of its derivation),

∂f

∂t+ v · ∇xf = νI, I = feq(v)− f. (45)

Here, f(z, t) is a probability density function is six phase variables plus time; the coordinate vectorz = (x,v)> is composed of three spatial dimensions x = (x1, x2, x3)> and three components of thevelocity vector v = (v1, v2, v3)>; feq denotes a locally Maxwellian distribution,

feq(v) =ρ

(2πRT )3/2exp

(−‖v‖

22

2RT

), (46)

with uniform gas density ρ, temperature T , and velocity v; R is the gas constant; and ν = κρT 1−µ

with positive constants κ and µ. The solution to the linearized Boltzmann-BGK equation (47) isalso computable analytically, which provides us with a benchmark solution to check the accuracyof our algorithms.

As before we start by rewriting this equation in an operator form,

∂f

∂t= Lf + C (47)

where

L(x,v) = −v · ∇x − νI, C(v) = νfeq(v). (48)

12

Canonical Tensor Method Hierarchical Tensor Method

two

dim

ensi

ons

0 0.2 0.4 0.6 0.8 110

-9

10-7

10-5

10-3

10-1

0 0.2 0.4 0.6 0.8 110

-9

10-7

10-5

10-3

10-1

thre

ed

imen

sion

s

0 0.2 0.4 0.6 0.8 110

-7

10-6

10-5

10-4

10-3

10-2

10-1

0 0.2 0.4 0.6 0.8 110

-9

10-7

10-5

10-3

10-1

six

dim

ensi

ons

0 0.1 0.2 0.3 0.4 0.510

-6

10-5

10-4

10-3

10-2

10-1

100

0 0.1 0.2 0.3 0.4 0.510

-8

10-6

10-4

10-2

100

−− rmax = 4 — rmax = 8 − · − rmax = 12

Figure 4: Advection equation (40). Relative pointwise error (44) for several values of the separation ranks r andthe number of independent variables N . These results are obtained by using the canonical and hierarchical tensormethods with explicit time stepping (see Section 3.2). The separation rank is computed adaptively at each time stepup to the maximum value rmax.

13

Note that L is a separable linear operator with separation rank rL = 4, which can be rewritten inthe form

L(z) =

4∑q=1

αqLq1(z1) · · ·Lq6(z6), (49)

for suitable one-dimensional linear operators Lqj(aj) defined in Table 2. Similarly, C(v) in (48) isseparated as

C(v) = C1(v1)C2(v2)C3(v3), where Ci(vi) =(νρ)1/3

(2πRT )1/2exp

(− v2i

2RT

). (50)

q αq Lq1 Lq2 Lq3 Lq4 Lq5 Lq61 −ν 1 1 1 1 1 12 −1 ∂/∂x1 1 1 v1 1 13 −1 1 ∂/∂x2 1 1 v2 14 −1 1 1 ∂/∂x3 1 1 v3

Table 2: Ordering of the linear operators defined in equation (49).

From a numerical viewpoint, the solution to (47) can be represented by using any of the tensorseries expansion we discussed in Section 2. In particular, hereafter we develop an algorithm basedon canonical tensor tensor decomposition, alternating least squares, and implicit time stepping (seeSection (3.1)). To this end, we discretize (47) in time with the Crank-Nicolson method to obtain[

I − ∆t

2L

]fn+1 =

[I +

∆t

2L

]fn + ∆tC + ∆tτn, (51)

where fn(z) = f(z, tn) and τn it the local truncation error of the Crank-Nicholson method at t = tn([63], p. 499). This equation can be written in a compact notation as

Afn+1 = Bfn + ∆tC + ∆tτn, (52)

where A and B are separable operators in the form (24), where all quantities are defined in Table3.

q ηq ζq Eq1 Eq2 Eq3 Eq4 Eq5 Eq6

1 ν(1 + ∆t/2) ν(1−∆t/2) 1 1 1 1 1 11 ∆t/2 −∆t/2 ∂/∂x1 1 1 v1 1 11 ∆t/2 −∆t/2 1 ∂/∂x2 1 1 v2 11 ∆t/2 −∆t/2 1 1 ∂/∂x3 1 1 v3

Table 3: Ordering of the linear operators A and B defined in (24).

14

A substitution of the canonical tensor decomposition7

fn+1 =

r∑l=1

6∏k=1

Glk(zk, tn+1) (54)

into equation (21) yields the residual

R(z, tn+1) = A(z)fn+1(z)−B(z)fn(z) + ∆tC(z4, z5, z6), (55)

which can be minimized by using the alternating least squares method, as we described in Section3.1 to obtain the solution at time tn+1.

Nonlinear Boltzmann-BGK Model. The numerical tensor methods we discussed in Section 3.1 andSection 3.2 can be extended to compute the numerical solution of the fully nonlinear Boltzmann-BGK model

∂f

∂t+ v · ∇xf = ν(x, t) (feq(x,v, t)− f(x,v, t)) . (56)

Here, feq(x,v, t) is the equilibrium distribution (A.8), while the collision frequency ν(x, t) cancan be expressed, e.g., as in (A.14). From a mathematical viewpoint both quantities feq and νare nonlinear functionals of the PDF f(x,v, t) (see equations (A.9)-(A.11)). Therefore (56) is anadvection equation driven driven by a nonlinear functional of f(x,v, t). The evaluation of theintegrals appearing in (A.9)-(A.11) poses no great challenge if we represent f as a canonical tensor(1) or as a hierarchical Tucker tensor (14). Indeed, such integrals can be factored as sums ofproduct of one-dimensional integrals in a tensor representation. Moreover, computing the inverseof ρ(x, t) in (A.10) and (A.11) is relatively simple in a collocation setting. In fact, we just need toevaluate the full tensor ρ(x, t), on a three-dimensional tensor product grid and compute 1/ρ(x, t)pointwise. If needed, we could then compute the tensor decomposition of 1/ρ(x, t), e.g., by usingthe canonical tensor representations and the ALS algorithm we discussed in Section 2.1. This allowsus to evaluate the BGK collision operator and represent it in a tensor series expansion. The solutionto the Boltzmann BGK equation can be then computed by operator splitting methods [65, 66].

Initial and Boundary conditions. To validate the proposed alternating least squares algorithm weset the initial condition to be either coincident with the homogeneous Maxwell-Boltzmann distri-bution (46), i.e.,

f (x,v, 0) =ρ

(2πRT )3/2

exp

(−‖v‖

22

2RT

)(57)

or a slight perturbation of it. The PDF (57) is obviously separable with separation rank one. More-over, we set the computational domain to be a the periodic hyperrectangle [−bx, bx]3 × [−bv, bv]3.It is essential that such domain is chosen large enough to accommodate the support of the PDF

7Recall that the functions Glk(zk, tn) are in the form

Glk(zk, tn) =

Q∑s=1

βlks(tn)φs(zk), (53)

βlks(tn) (l = 1, ..., r, k = 1, ..., 6, s = 1, ..., Q) being the degrees of freedom.

15

Parameter Symbol valueTemperature T 300 KNumber density n 2.4143 · 1025 m−3

Specific gas constant R 208 J kg−1 K−1

Relaxation time τR 0.40034 sCollision frequency ν 1/τRPosition domain size bx 500 mVelocity domain size bv 5

√RST

Time step ∆t 0.01τRNumber of iterations NIter 1000ALS tolerance εTol 10−8

Series truncation Q 11

Table 4: Simulation parameters we employed in the numerical approximation of the Boltzmann-BGK model. Theseparameters correspond to a kinetic simulation of dynamics of Argon in the periodic hyperrectangle [−bx, bx]3 ×[−bv , bv ]3. The time step in the implicit Crank-Nicolson method is ∆t, while the tolerance in the ALS iterations ateach time step is εTol.

at all times, thus preventing physically unrealistic correlations between the distribution function.More generally, it is possible to enforce other types of boundary conditions by using appropriatetrial functions, or by a mixed approach [67, 68, 69] in the case of Maxwell boundaries.

Unless otherwise stated, all simulation parameters are set as in Table 4. Such setting correspondsto the problem of computing the dynamics of Argon within the periodic hyperrectangle [−bx, bx]3×[−bv, bv]3. We are interested in testing our algorithms in two different regimes: i) steady state, andii) relaxation to statistical equilibrium.

4.2.1. Canonical Tensor Decomposition of the Maxwell-Boltzmann Distribution

We begin with a preliminary convergence study of the canonical tensor decomposition (1) appliedto the one-particle Maxwell distribution feq(v). Such study allows us to identify the size of thehyperrectangle that includes the support of the Maxwellian equilibrium distribution. This is done inFigure 5, where we plot the tensor expansion of feq(v) as function of the non-dimensional molecularspeed ‖v‖2 /

√RST , for different number of basis functions Q in (2), and for various values of the

dimensionless velocity domain size bv/√RT . It is seen that for small values of bv/

√RT , a small

number of basis functions Q is sufficient to capture the exact equilibrium distribution. At the samehowever, small values of bv/

√RT result in distribution functions with truncated tails, as can be

seen in Figures 5 (a), 5 (b), and 5 (e). Figures 5 (c) and 5 (g) demonstrate that the dimensionlessvelocity domain size bv/

√RT = 5 is sufficient to avoid truncation of the tail of the distribution,

and that a series expansion with Q = 11 in each variable is sufficient to accurately capture theequilibrium distribution. This justifies the choice of parameters in Table (4).

A more detailed analysis of the effects of the truncation order Q and the dimensionless velocitydomain size bv/

√RT on the approximation error in feq is done in Figure 6(a), where we plot the

Normalized Mean Absolute Error (NMAE) versus bv/√RT and Q. The NMAE between two vectors

X and Y of size N is defined as:

NMAE =1

N

‖X − Y ‖1max (X)−min (Y )

, (58)

16

0 100 200

0

2

4·10−3

(a) 2bv/√RST = 2

f(v)[m

−1s]

0 100 200

0

2

4·10−3

(b) 2bv/√RST = 3

0 100 200

0

2

4·10−3

(c) 2bv/√RST = 5

‖v‖2 /√RST

f(v)[m

−1s]

0 100 200

0

2

4·10−3

(d) 2bv/√RST = 10

‖v‖2 /√RST

−200 0 200

0

2

4

6·10−9

(e) 2bv/√RST = 2

f(x,v

)[m

−3s3]

−200 0 200

0

2

4

6·10−9

(f) 2bv/√RST = 3

−200 0 200

0

2

4

6·10−9

(g) 2bv/√RST = 5

vk/√RST

f(x,v

)[m

−3s3]

−200 0 200

0

2

4

6·10−9

(h) 2bv/√RST = 10

vk/√RST

Figure 5: One-particle Maxwell-Boltzmann distribution feq(v) ( ) as a function of the dimensionless molecular

speed ‖v‖2 /√RT for different dimensionless domain sizes bv/

√RT . In each case, we demonstrate convergence of

the canonical tensor decomposition (1) of feq as we increase the number of basis functions Q. Specifically, we plotthe cases: Q = 3 ( ), Q = 5 ( ), Q = 11 ( ) and Q = 19 ( ).

where ‖·‖1 is the standard vector 1-norm. In Figure 6(b) we plot the NMAE as a function of

Q√RT/bv. This results in a collapse of the data which suggests that if one wants to double the

non-dimensional velocity domain size while maintaining the same accuracy, the series truncationorder Q needs to be doubled as well. In addition, it can be seen that starting at Q

√RT/bv ≈ 1 the

error decays rapidly with increasing Q√RT/bv.

4.2.2. Steady State Simulations

We first test the the Boltzmann-BGK solver we developed in Section (3.1) on an initial valueproblem where the initial condition is set to be the Maxwell-Boltzmann distribution feq(v) (seeequation (46)). A properly working algorithm should keep such equilibrium distribution unchangedas the simulation progresses. In Figure 7 we plot the distribution of molecular velocities f(v, t)as function of the molecular speed ‖v‖2 |ξ| at various dimensionless times t/τR, where τR is therelaxation time in Table 4. It is seen that the alternating least squares algorithm we developedin Section (3.1) has a stable fixed point at the Maxwell-Boltzmann equilibrium distribution feq.This is an important test, as convergence of alternating least square is, in general, not granted forarbitrary residuals (see, e.g., [35, 70]).

We also computed the moments of the distribution function f(x,v, t) at different times to verifywhether ALS iterations preserve (density), momentum (velocity), and kinetic energy (temperature).

17

5 10 15 204

6

8

10

Q

2b v/√

RST

0

4

8

12

16

NMAE[%

]

(a)

10−0.6 10−0.4 10−0.2 100 100.2 100.4 100.610−3

10−2

10−1

100

101

102

103

Q√RST/ (2bv)

NMAE[%

]

(b)(b)

Figure 6: (a) Normalized Mean Absolute Error (NMAE) (58) between the analytical and the canonical tensor seriesexpansion of the Maxwell-Boltzmann equilibrium distribution versus versus Q and the dimensionless velocity domainsize bv/

√RT . In Figure (b) we plot the NMAE in a log-log scale as function of Q

√RT/bv . We emphasize that the

range of Q and bv/√RT is the same in both Figures.

In particular, we computed

〈ρ〉 (t) =1

b3x

∫ bx

−bx· · ·∫ bv

−bvf (x,v, t) dxdv (59)

〈u〉 (t) =1

〈ρ〉 b3r

∫ bx

−bx· · ·∫ bv

−bvvf (x,v, t) dxdv, and (60)

〈T 〉 (t) =1

〈ρ〉 b3xRS

∫ bx

−bx· · ·∫ bv

−bv‖v − 〈u〉‖22 f (x,v, t) dxdv (61)

In Figure 8 we plot(59)-(61) versus time. It is seen that such quantities are indeed constants, i.e.,ALS iterations preserve the average density, velocity and temperature.

4.2.3. Relaxation to Statistical Equilibrium

In this section we consider relaxation statistical equilibrium in the linearized Boltzmann-BGKmodel (45). This allows us to study the accuracy and computational efficiency of the proposed ALSalgorithm in transient dynamics. To this end, we consider the initial condition

f (x,v, 0) = feq

(1 + ε

3∏k=1

cos

(2πxkbx

))(62)

with ε = 0.3. Such initial condition is a slight perturbation of the Maxwell-Boltzmann equilibriumdistribution (see Figure 9(a)). The simulation results we obtain are shown in Figures 9(b)-(d),where we plot the time evolution of the marginal PDF f(v, t) versus the dimensionless time t/τR.It is seen that the initial condition (62) is in the basin of attraction of feq, i.e., the ALS algorithmsends f (x,v, 0) into feq (v) after a transient. Moreover, f(v, t) is attracted by feq(v) exponentiallyfast in time at a rate τR = 1/ν (see Figure 11), consistently with results of perturbation analysis.In Figure 10 we demonstrate that the parallel ALS algorithm we developed conserves the average

18

0 200 400 600 800 1,000 1,2000

2

4

6·1022

(a) t/τr = 0.00

f(v)[m

−1s]

0 200 400 600 800 1,000 1,2000

2

4

6·1022

(b) t/τr = 0.33

0 200 400 600 800 1,000 1,2000

2

4

6·1022

(c) t/τr = 0.67

‖v‖2 [m s−1]

f(v)[m

−1s]

0 200 400 600 800 1,000 1,2000

2

4

6·1022

(d) t/τr = 1.00

‖v‖2 [m s−1]

Figure 7: Solution to the Boltzmann-BGK equation (45) at different dimensionless times t/τR (τR is the relaxationtime in Table 4). Specifically, we plot the Maxwell-Boltzmann equilibrium distribution ( ), its canonical tensordecomposition with Q = 11 modes, and its the time evolution with the ALS algorithm we described in Section 3.1( ). It is seen that ALS has a stable fixed point at feq(v). This is an important test as, in general, ALS iterationsare not granted to converge (e.g., [35]).

density of particle, velocity (momentum) and temperature throughout the transient that yieldsstatistical equilibrium.

In Figure 11 we plot the the Normalized Mean Absolute Error (NMAE) between the Maxwell-Boltzmann distribution feq(v) and the distribution we computed numerically by using the proposedALS algorithm. Specifically we plot NMAE versus the dimensionless time t/τR for different time-steps ∆t we employed in our simulations. The goal of such study is to assess the sensitivity ofparallel ALS iterations on the time step ∆t, and in turn on the total number of steps for a fixedintegration time T = 5τR. Results of Figure 11 demonstrate that the decay of the distributionfunction f(x,v, t) to the its equilibrium state is indeed exponential. Moreover, we see that theALS algorithm is robust to ∆t, which implies that the such algorithm is not sensitive to the totalnumber of time steps we consider within a fixed integration time.

Finally, we’d like to address scalability of the parallel ALS algorithm we developed in ( B)., i.e.,performance relative to the number of degrees of freedom and number of CPUs. To this end, weanalyze here a few test cases we ran on a small workstation with 4 CPUs. The initial conditionis set as feq in all cases and we integrated the Boltzmann-BGK model (45) for 1000 steps, with∆t = 0.01τR. The separation rank of the solution PDF is set to r = 1 (see equation (1)). InFigure 12 (a) we plot the wall time twall versus the total number of degrees of freedom of the ALSalgorithm, i.e. 6Qr. The plot shows that a typical simulation of the Boltzmann-BGK model in

19

0 0.2 0.4 0.6 0.8 12.2

2.4

2.6

·1025

(a)

t/τr

〈ρ〉[m

−3]

0 0.2 0.4 0.6 0.8 1−1

−0.5

0

0.5

1·10−3

(b)

t/τr

‖〈v〉‖

[ms−

1]

0 0.2 0.4 0.6 0.8 1

280

300

320 (c)

t/τr

〈T〉[K]

Figure 8: Average density 〈ρ〉 (t), absolute average velocity ‖〈u〉‖2, and average temperature 〈T 〉 as function ofthe dimensionless time t/τR: simulation results ( ); results at equilibrium ( ). The simulation results areobtained by setting the initial condition in (45) equal to the Maxwell-Boltzmann equilibrium distribution feq andsolving the Boltzmann-BGK model with the ALS algorithm described in Section 3.1. Clearly, ALS iterations preservethe average density, velocity and temperature.

6 dimensions with r = 1 and Q = 11 takes about twall ≈ 80min on a single core to complete asimulation up t = 10 s. With such value of Q there is an error of only 0.5% compared to theanalytical Maxwell-Boltzmann distribution function. The total wall time scales with the power 5/2relative to number of degrees of freedom 6Qr. In Figure 12 we show that the total wall time scaleswe obtain with the proposed parallel ALS algorithm scales almost linearly with the number of CPUs(power −4/5). The leveling off of performance for 4 cores can be attributed to the fact that thedesktop computer we employed for our simulations is optimized for inter-core communication. Weexpect better scaling performance of our ALS algorithm in computer clusters with infinity bandinter-core communication.

5. Summary

In this paper we presented new parallel algorithms to solve high-dimensional partial differentialequations based on numerical tensor methods. In particular, we studied canonical and hierarchicaltensor expansions combined with alternating least squares and high-order singular value decom-position. Both implicit and explicit time integration methods were presented and discussed. Wedemonstrated the accuracy and computational efficiency of the proposed methods in simulating thetransient dynamics generated by a linear advection equation in six dimensions plus time, and thewell-known Boltzmann-BGK equation. We found that the algorithms we developed are extremelyfast and accurate, even in a naive Matlab implementation. Unlike direct simulation Monte Carlo(DSMC), the high accuracy of the numerical tensor methods we propose makes it suitable for study-ing transient problems. On the other hand, given the nature of the numerical tensor discretization,implementation of the proposed algorithms to complex domains is not straightforward.

Acknowledgements

We would like to thank N. Urien for useful discussions. D. Venturi was supported by the AFOSRgrant FA9550-16-1-0092.

20

0 200 400 600 800 1,000 1,2000

2

4

6

8·1022

(a) t/τr = 0.00

f(v)[m

−1s]

0 200 400 600 800 1,000 1,2000

2

4

6

8·1022

(b) t/τr = 1.67

0 200 400 600 800 1,000 1,2000

2

4

6

8·1022

(c) t/τr = 3.33

‖v‖2 [m s−1]

f(v)[m

−1s]

0 200 400 600 800 1,000 1,2000

2

4

6

8·1022

(d) t/τr = 5.00

‖v‖2 [m s−1]

Figure 9: Relaxation to statistical equilibrium in the linearized Boltzmann-BGK model (45). Shown are timesnapshots of the marginal PDF p(v, t) we obtain by using the parallel ALS algorithm we proposed in Section3.1 ( ). Note that the initial condition f(v, 0) is a slight perturbation of the Maxwell-Boltzmann distribution( ). Note that f(v, t) converges to feq(v) exponentially fast in time (see Figure 11), consistently with results ofperturbation analysis.

A. Boltzmann Equation and its Approximations

In the classical kinetic theory of rarefied gas dynamics, the gas flow is described in terms of anon-negative density function f(x,v, t) which provides the number of gas particles at time t withvelocity v ∈ R3 at position x ∈ R3. The density function f satisfies the Boltzmann equation[71, 72, 65]. In the absence of external forces, such equation can be written as

∂f

∂t+ v · ∇xf = Q(f, f), (A.1)

where Q(f, f) is the collision integral describing the effects of internal forces due to particle inter-actions. From a mathematical viewpoint the collision integral is a functional of the particle density.Its form depends on the microscopic dynamics. For example, for classical rarefied gas flows [71, 73],

Q(f, f)(x,v, t) =

∫R3

∫S2B(v,v1,ω) |f(x,v′, t)f(x,v′1, t)− f(x,v, t)f(x,v1, t)|dωdv1. (A.2)

In this expression,

v′ =1

2(v + v1 + ‖v − v1‖2w) , v′1 =

1

2(v + v1 − ‖v − v1‖2w) (A.3)

21

0 1 2 3 4 52.2

2.4

2.6

·1025

(a)

t/τr

〈ρ〉[m

−3]

0 1 2 3 4 5−1

−0.5

0

0.5

1·10−3

(b)

t/τr

‖〈v〉‖

[ms−

1]

0 1 2 3 4 5

280

300

320 (c)

t/τr

〈T〉[K]

Figure 10: Conservation of the average density 〈ρ〉, absolute average velocity ‖〈v〉‖2, and average temperature 〈T 〉during relaxation to statistical equilibrium. Shown are results obtained by using the parallel ALS algorithm wedeveloped in Section 3.1 ( ), and the equilibrium initial values ( ).

0 1 2 3 4 5

0

10

20

30(a) ∆t/τr = 0.1

t/τr

NM

AE

[%]

0 1 2 3 4 5

0

10

20

30(b) ∆t/τr = 0.01

t/τr

0 1 2 3 4 5

0

10

20

30(c) ∆t/τr = 0.001

t/τr

Figure 11: Relaxation to statistical equilibrium. Normalized Mean Absolute Error (NMAE) between the Maxwell-Boltzmann distribution feq(v) and the numerical solution we obtain with the parallel ALS algorithm we proposed inSection 3.1 ( ). We also plot the theoretically predicted exponential decay ( ), and the best approximationerror that we obtained based on approximating the equilibrium distribution feq with a canonical tensor decomposition(1) with Q = 11 modes ( ), Q = 11 being the number of modes employed in the transient simulation.

and {v,v1} represent, respectively, the velocities of two particles before and after the collision, ω isa vector on the unit sphere S2, and B(v,v1,ω) is the collision kernel. Such kernel is a non-negativefunction depending on the (Euclidean) norm ‖v − v1‖2 and on the scattering angle θ between therelative velocities before and after the collision

cos(θ) =(v − v1) · ω‖v − v1‖2

. (A.4)

Thus, B(v,v1,ω) becomes

B(v,v1,ω) = ‖v − v1‖2 σ (‖v − v1‖2 , cos(θ)) , (A.5)

σ being the scattering cross section function [73]. The collision operator (A.2) satisfies a system ofconservation laws in the form ∫

R3

Q(f, f)(x,v, t)ψ(v)dv = 0, (A.6)

22

25 50 100 200

103

104

105(a)

Q ·R ·K

t wall[s]

twall ∝ (Q ·R ·K)5/2

1 2 3 4

103.2

103.4

103.6

(b)

#CPU’s

t wall[s]

twall ∝ (#CPU’s)−4/5

Figure 12: Wall time twall versus the number of degrees of freedom of the ALS simulation (a), and versus the numberCPUs (b).

where ψ(v) is either {1,v, ‖v‖22}. This yields, respectively, conservation of mass, momentum andenergy. Moreover, Q(f, f) satisfies the Boltzmann H-theorem∫

R3

Q(f, f)(x,v, t) log (f(x,v, t)) dv ≤ 0 (A.7)

Such theorem implies that any equilibrium distribution function, i.e. any function f for whichQ(f, f) = 0, has the form of a locally Maxwellian distribution

feq(x,v, t) =ρ(x, t)

(2πRT (x, t))3/2exp

(−‖u(x, t)− v‖22

2RT (x, t)

), (A.8)

where R is the gas constant, and ρ(x, t), u(x, t) and T (x, t) are the density, mean velocity andtemperature of the gas, defined as

ρ(x, t) =

∫R3

f(x,v, t)dv, (A.9)

u(x, t) =1

ρ(x, t)

∫R3

vf(x,v, t)dv, (A.10)

T (x, t) =1

3ρ(x, t)

∫R3

‖u(x, t)− v‖22 f(x,v, t)dv. (A.11)

From a mathematical viewpoint, the Boltzmann equation (A.1) is a nonlinear integro-differentialequation for a positive scalar field in six dimensions plus time. By taking suitable averages oversmall volumes in position space, it can be shown that the Boltzmann equation is consistent withthe compressible Euler’s equations [74, 75], and the Navier-Stokes equations [76, 77, 78].

BGK Approximation. The simplest collision operator satisfying the conservation law (A.6) and thecondition

Q(f, f) = 0⇔ f(x,v, t) = feq(x,v, t) (A.12)

23

(see equation (A.8)) was proposed by Bhatnagar, Gross and Krook in [79]. The correspondingmodel, which is known as the BGK model, is defined by the linear relaxation operator

Q(f, f) = ν(x, t) [feq(x,v, t)− f(x,v, t)] ν(x, t) > 0. (A.13)

The field ν is usually set to be proportional to the density and the temperature of the gas [80]

ν(x, t) = κρ(x, t)T (x, t)1−µ. (A.14)

It can be shown that the Boltzmann-BGK model (A.13) converges to the correct Euler equationsof incompressible fluid dynamics with the scaling x′ = εx, t′ = εt, and in the limit ε→ 0. However,the model does not converge to the correct Navier-Stokes limit. The main reason is that it predictsan unphysical Prandtl number [81], which is larger that one obtained with the full collision operator(A.2). The correct Navier-Stokes limit can be recovered by more sophisticated BGK-type models,e.g., the ellipsoidal statistical BGK model [82].

Following [83, 84, 85, 86], we introduce additional simplifications, namely we assume thatν(x, t) = ν is constant and assume the equilibrium density, temperature and velocity to be ho-mogeneous across the spatial domain. Assuming u(x, t) = 0, this yields an equilibrium distributionin (46). The last assumption effectively decouples the BGK collision operator from the prob-ability density function f(x,v, t). This, in turn, makes the BGK approximation linear, i.e., asix-dimensional PDE (45). Using both the isothermal assumption T (x, t) = T [79], and constantdensity assumption ρ(x, t) = ρ [84, 85, 86] means that our code can only be used to obtain so-lutions which are small perturbations from the equilibrium distribution. The term ν [feq(v)− f ]in (45) represents the relaxation time approximation of the Boltzmann equation, ν is the collisionfrequency, here assumed to be homogeneous.

B. Parallelization of the ALS Algorithm for the Boltzmann-BGK Equation

In this appendix we describe the parallel alternating least squares algorithms we developed tosolve the linearized Boltzmann-BGK equation (45). Algorithm 1 shows the main Alternating LeastSquare (ALS) routine while algorithm 2, 3, 4, and 5 show the different subroutines. Each timestep, n, the value of βOld is updated and then βNew is determined in an iterative manner. Both thecomputations of the matrices Mk and the vectors γk, as well as the calculation of the vectors βNew,k

can be performed independently of each other, and thus can be easily parallelized. The currentimplementation assigns one core per dimension, but further optimization could be possible by imple-menting a parallel method to determine βNew [87]. At each time step the CreateMatrixM subroutine(algorithm 2) updates the different elements of matrix Mk for every iteration until convergence hasbeen reached. The algorithm iterates over every element of the matrix and performs the multipli-cation and summation of the elements of M . All the integrals were performed analytically and theresults are stored in the map M (k, l, z, s, q). Using this approach delivers significant savings in thecomputational cost of updating the matrix every iteration. The subroutine CreateVectorGamma

(algorithm 3) updates the vector γk. Similarly to the subroutine for the creation of matrix Mk thisalgorithm iterates over all elements of the vector, and fills each of them using the values of βNew

and βOld, and pre-calculated maps.With the completion of the calculation ofMk and γk, βNew can be updated using the ComputeBeta

subroutine (algorithm 4). Matrix Mk, depending on the rank rG, and series truncation Q, can be-come very large. In addition, it is not known whether the matrix is invertible. This is why the

24

Algorithm 1 Parallel Alternating Least Squares (ALS) algorithm

1: procedure Main2: Initialization . Load variables and allocate matrices3: βNew = β0 . Set initial condition4: for N ← 1 : NIter do5: εConv ← εTol + 1 . Reset convergence condition6: βOld ← βNew

7: while εConv > εTol do8: βInt ← βNew

9: parfor k ← 1 : K do . This for loop is executed in parallel10: Mk ← CreateMatrixM (βNew,k, k)11: γk ← CreateVectorGamma(βNew,k, βOld,k, k)

12: parfor k ← 1 : K do . This for loop is executed in parallel13: βNew,k ← ComputeBeta(Mk, βNew,k, γk) . Compute updated βNew

14: εConv ← ComputeConvergence(βInt, βNew) . Update convergence condition15: βNew ← βInt + (βNew − βInt) /4 . Update βNew

16: end procedure

system Mkβk = γk is solved using the iterative least square method, LSQR [88, 89]. For everyiteration the value for βNew,k which was determined in the previous iteration is used as the initialguess for the algorithm.

Convergence of the system is calculated using the ComputeConvergence subroutine (algorithm 5).Because it is very computationally expensive to calculate the full residual every iteration, insteadthe convergence is determined using:

εConv = max

(‖βNew,k − βInt,k‖‖βInt,k‖

)(B.1)

and when εConv < εTol, the while loop in algorithm 1 is allowed to exit. In its currently implemen-tation the algorithm is able to reach convergence using a fixed Rank G, rG = 2. However, a futureimplementation could implement a dynamic Rank G for improved accuracy. Every iteration thenew value of βNew is calculated according to:

βNew ← βInt +βNew − βInt

∆β(B.2)

with ∆β = 4. To find the location where the residual reaches a minimum, it is important togradually approach this location and not update βNew too aggressively by taking a smaller valueof ∆β. This causes overshooting of the minimum and the prevents convergence of βNew and thusminimization of the residual.

References

References

[1] C. Cercignani, The Boltzmann Equation and Its Applications, Springer New York, 1988.doi:10.1007/978-1-4612-1039-9.

25

http://dx.doi.org/10.1007/978-1-4612-1039-9

Algorithm 2 ALS Subroutines: CreateMatrixM

1: function CreateMatrixM(βNew,k, k)2: for l← 1 : RG do . Loop over all elements of Mk

3: for z ← 1 : RG do4: for s← −|Q| /2 : |Q| /2 do5: for q ← −|Q| /2 : |Q| /2 do6: for k′ ← 1 : K do . Loop over all elements of M for each element of Mk

7: for η ← 1 : rA do8: for υ ← 1 : rA do9: j ← (η − 1) rA + υ

10: if k’ = k then11: Mk′,j ←M(k′, η, υ, s, q)12: else13: Mk′,j ← 014: for s′ ← −|Q| /2 : |Q| /2 do15: for q′ ← −|Q| /2 : |Q| /2 do16: Mk′,j ← Mk′,j+βNew,k′,(l−1)Q+s′βNew,k′,(z−1)Q+q′M(k′, η, υ, s′, q′)

17: Mk,(z−1)Q+q,(l−1)Q+s ←∑r2Aj=1

∏Kk′=1 Mk′,j . Fill matrix Mk

18: return Mk

19: end function

[2] F. Moss, P. V. E. McClintock (Eds.), Noise in nonlinear dynamical systems, CambridgeUniversity Press, 1989. doi:10.1017/cbo9780511897832.

[3] K. Sobczyk, Stochastic Differential Equations, Springer Netherlands, 1991. doi:10.1007/

978-94-011-3712-6.

[4] D. Venturi, T. P. Sapsis, H. Cho, G. E. Karniadakis, A computable evolution equation for thejoint response-excitation probability density function of stochastic dynamical systems, Pro-ceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 468 (2139)(2011) 759–783. doi:10.1098/rspa.2011.0186.

[5] J. Li, J. Chen, Stochastic Dynamics of Structures, John Wiley & Sons, Ltd, 2009. doi:

10.1002/9780470824269.

[6] M. F. Shlesinger, T. Swean, Stochastically Excited Nonlinear Ocean Structures, WORLDSCIENTIFIC, 1998. doi:10.1142/3717.

[7] A. S. Monin, A. M. Yaglom, Statistical Fluid Mechanics, Volume II: Mechanics of Turbulence,Dover, 2007.

[8] S. B. Pope, Lagrangian PDF methods for turbulent flows, Annu. Rev. Fluid Mech. 26 (1)(1994) 23–63. doi:10.1146/annurev.fl.26.010194.000323.

[9] U. Frisch, Turbulence: The legacy of A. N. Kolmogorov, Cambridge University Press, 1995.

[10] D. Venturi, The numerical approximation of nonlinear functionals and functional differentialequations, Phys. Rep. 732 (2018) 1–102. doi:10.1016/j.physrep.2017.12.003.

26

http://dx.doi.org/10.1017/cbo9780511897832

http://dx.doi.org/10.1007/978-94-011-3712-6

http://dx.doi.org/10.1007/978-94-011-3712-6

http://dx.doi.org/10.1098/rspa.2011.0186

http://dx.doi.org/10.1002/9780470824269

http://dx.doi.org/10.1002/9780470824269

http://dx.doi.org/10.1142/3717

http://dx.doi.org/10.1146/annurev.fl.26.010194.000323

http://dx.doi.org/10.1016/j.physrep.2017.12.003

[11] L. Cowen, T. Ideker, B. J. Raphael, R. Sharan, Network propagation: A universal amplifierof genetic associations, Nat Rev Genet 18 (9) (2017) 551–562. doi:10.1038/nrg.2017.38.

[12] J. Zinn-Justin, Quantum Field Theory and Critical Phenomena, 4th Edition, Oxford Univer-sity Press, 2002. doi:10.1093/acprof:oso/9780198509233.001.0001.

[13] D. J. Amit, V. Martin-Mayor, Field Theory, the Renormalization Group, and Critical Phe-nomena, WORLD SCIENTIFIC, 2005. doi:10.1142/5715.

[14] H. Risken, The Fokker-Planck Equation, 2nd Edition, Springer Berlin Heidelberg, 1989. doi:10.1007/978-3-642-61544-3.

[15] D. M. Tartakovsky, P. A. Gremaud, Method of distributions for uncertainty quantifica-tion, in: R. Ghanem, D. Higdon, H. Owhadi (Eds.), Handbook of Uncertainty Quantifi-cation, Springer International Publishing, Switzerland, 2017, pp. 763–783. doi:10.1007/

978-3-319-12385-1\_27.

[16] G. Dimarco, L. Pareschi, Numerical methods for kinetic equations, Acta Numerica 23 (2014)369–520. doi:10.1017/s0962492914000063.

[17] S. Rjasanow, W. Wagner, Stochastic numerics for the Boltzmann equation, Springer, 2004.

[18] H. Babovsky, H. Neunzert, On a simulation scheme for the Boltzmann equation, Math. Meth.Appl. Sci. 8 (1) (1986) 223–233. doi:10.1002/mma.1670080114.

[19] S. Ansumali, Mean-field model beyond Boltzmann-enskog picture for dense gases, Commun.Commut. Phys. 9 (05) (2011) 1106–1116. doi:10.4208/cicp.301009.240910s.

[20] S. Ramanathan, D. L. Koch, An efficient direct simulation Monte Carlo method for low machnumber noncontinuum gas flows based on the bhatnagar–gross–krook model, Phys. Fluids21 (3) (2009) 033103. doi:10.1063/1.3081562.

[21] J.-P. M. Peraud, C. D. Landon, N. G. Hadjiconstantinou, Monte Carlo methods for solvingthe Boltzmann transport equation, Annual Rev Heat Transfer 17 (N/A) (2014) 205–265.doi:10.1615/annualrevheattransfer.2014007381.

[22] F. Rogier, J. Schneider, A direct method for solving the Boltzmann equation, Transp. TheoryStat. Phys. 23 (1-3) (1994) 313–338. doi:10.1080/00411459408203868.

[23] H. Cho, D. Venturi, G. Karniadakis, Numerical methods for high-dimensional probabilitydensity function equations, J. Comput. Phys. 305 (2016) 817–837. doi:10.1016/j.jcp.

2015.10.030.

[24] Z. Zhang, G. E. Karniadakis, Numerical Methods for Stochastic Partial DifferentialEquations with White Noise, Springer International Publishing, 2017. doi:10.1007/

978-3-319-57511-7.

[25] W. E, J. Han, A. Jentzen, Deep learning-based numerical methods for high-dimensionalparabolic partial differential equations and backward stochastic differential equations, Com-mun. Math. Stat. 5 (4) (2017) 349–380. doi:10.1007/s40304-017-0117-6.

27

http://dx.doi.org/10.1038/nrg.2017.38

http://dx.doi.org/10.1093/acprof:oso/9780198509233.001.0001

http://dx.doi.org/10.1142/5715

http://dx.doi.org/10.1007/978-3-642-61544-3

http://dx.doi.org/10.1007/978-3-642-61544-3

http://dx.doi.org/10.1007/978-3-319-12385-1_27

http://dx.doi.org/10.1007/978-3-319-12385-1_27

http://dx.doi.org/10.1017/s0962492914000063

http://dx.doi.org/10.1002/mma.1670080114

http://dx.doi.org/10.4208/cicp.301009.240910s

http://dx.doi.org/10.1063/1.3081562

http://dx.doi.org/10.1615/annualrevheattransfer.2014007381

http://dx.doi.org/10.1080/00411459408203868

http://dx.doi.org/10.1016/j.jcp.2015.10.030


http://dx.doi.org/10.1007/978-3-319-57511-7

http://dx.doi.org/10.1007/978-3-319-57511-7

http://dx.doi.org/10.1007/s40304-017-0117-6

[26] M. Bachmayr, R. Schneider, A. Uschmajew, Tensor networks and hierarchical tensors for thesolution of high-dimensional partial differential equations, Found Comput Math 16 (6) (2016)1423–1472. doi:10.1007/s10208-016-9317-9.

[27] J. S. Hesthaven, S. Gottlieb, D. Gottlieb, Spectral Methods for Time-Dependent Problems,Vol. 21, Cambridge University Press, 2007. doi:10.1017/cbo9780511618352.

[28] G. Beylkin, J. Garcke, M. J. Mohlenkamp, Multivariate regression and machine learningwith sums of separable functions, SIAM J. Sci. Comput. 31 (3) (2009) 1840–1857. doi:

10.1137/070710524.

[29] E. Acar, D. M. Dunlavy, T. G. Kolda, A scalable optimization approach for fitting canonicaltensor decompositions, J. Chemometrics 25 (2) (2011) 67–86. doi:10.1002/cem.1335.

[30] M. Espig, W. Hackbusch, A regularized newton method for the efficient approximation oftensors represented in the canonical tensor format, Numer. Math. 122 (3) (2012) 489–525.doi:10.1007/s00211-012-0465-9.

[31] L. Karlsson, D. Kressner, A. Uschmajew, Parallel algorithms for tensor completion in the CPformat, Parallel Comput. 57 (2016) 222–234. doi:10.1016/j.parco.2015.10.002.

[32] A. Doostan, A. Validi, G. Iaccarino, Non-intrusive low-rank separated approximation of high-dimensional stochastic models, Comput. Methods Appl. Mech. Eng. 263 (2013) 42–55. doi:

10.1016/j.cma.2013.04.003.

[33] M. J. Reynolds, A. Doostan, G. Beylkin, Randomized alternating least squares for canonicaltensor decompositions: Application to a PDE with random data, SIAM J. Sci. Comput. 38 (5)(2016) A2634–A2664. doi:10.1137/15m1042802.

[34] C. Battaglino, G. Ballard, T. G. Kolda, A practical randomized CP tensor decomposition(2017). arXiv:1701.06600.

[35] A. Uschmajew, Local convergence of the alternating least squares algorithm for canonicaltensor approximation, SIAM J. Matrix Anal. & Appl. 33 (2) (2012) 639–652. doi:10.1137/110843587.

[36] M. Espig, W. Hackbusch, A. Khachatryan, On the convergence of alternating least squaresoptimisation in tensor format representations (2015). arXiv:1506.00062.

[37] J. C. Bezdek, R. J. Hathaway, Convergence of alternating optimization, Neural Parallel Sci.Comput. 11 (2003) 351–368.

[38] T. Rohwedder, A. Uschmajew, On local convergence of alternating schemes for optimizationof convex problems in the tensor train format, SIAM J. Numer. Anal. 51 (2) (2013) 1134–1162.doi:10.1137/110857520.

[39] J. M. Ortega, W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Sev-eral Variables, Society for Industrial and Applied Mathematics, 2000. doi:10.1137/1.

9780898719468.

[40] O. Kaya, B. Ucar, Parallel Candecomp/Parafac decomposition of sparse tensors using dimen-sion trees, SIAM J. Sci. Comput. 40 (1) (2018) C99–C130. doi:10.1137/16m1102744.

28

http://dx.doi.org/10.1007/s10208-016-9317-9

http://dx.doi.org/10.1017/cbo9780511618352

http://dx.doi.org/10.1137/070710524

http://dx.doi.org/10.1137/070710524

http://dx.doi.org/10.1002/cem.1335

http://dx.doi.org/10.1007/s00211-012-0465-9

http://dx.doi.org/10.1016/j.parco.2015.10.002

http://dx.doi.org/10.1016/j.cma.2013.04.003

http://dx.doi.org/10.1016/j.cma.2013.04.003

http://dx.doi.org/10.1137/15m1042802

http://arxiv.org/abs/1701.06600

http://dx.doi.org/10.1137/110843587

http://dx.doi.org/10.1137/110843587


http://dx.doi.org/10.1137/110857520

http://dx.doi.org/10.1137/1.9780898719468

http://dx.doi.org/10.1137/1.9780898719468

http://dx.doi.org/10.1137/16m1102744

[41] W. Hackbusch, S. Kuhn, A new scheme for the tensor representation, J Fourier Anal Appl15 (5) (2009) 706–722. doi:10.1007/s00041-009-9094-9.

[42] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Berlin Heidelberg,2012. doi:10.1007/978-3-642-28027-6.

[43] T. G. Kolda, B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (3) (2009)455–500. doi:10.1137/07070111x.

[44] L. Grasedyck, Hierarchical singular value decomposition of tensors, SIAM J. Matrix Anal. &Appl. 31 (4) (2010) 2029–2054. doi:10.1137/090764189.

[45] L. De Lathauwer, B. De Moor, J. Vandewalle, A multilinear singular value decomposition,SIAM J. Matrix Anal. & Appl. 21 (4) (2000) 1253–1278. doi:10.1137/s0895479896305696.

[46] D. Venturi, A fully symmetric nonlinear biorthogonal decomposition theory for random fields,Physica D 240 (4-5) (2011) 415–425. doi:10.1016/j.physd.2010.10.005.

[47] A. Peres, Higher order Schmidt decompositions, Phys. Lett. A 202 (1) (1995) 16–17. doi:

10.1016/0375-9601(95)00315-t.

[48] V. de Silva, L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximationproblem, SIAM J. Matrix Anal. & Appl. 30 (3) (2008) 1084–1127. doi:10.1137/06066518x.

[49] C. J. Hillar, L.-H. Lim, Most tensor problems are NP-hard, JACM 60 (6) (2013) 1–39. doi:10.1145/2512329.

[50] N. Vannieuwenhoven, J. Nicaise, R. Vandebril, K. Meerbergen, On generic nonexistence ofthe Schmidt–eckart–young decomposition for complex tensors, SIAM J. Matrix Anal. & Appl.35 (3) (2014) 886–903. doi:10.1137/130926171.

[51] D. Kressner, C. Tobler, Algorithm 941, ACM Trans. Math. Softw. 40 (3) (2014) 1–22. doi:

10.1145/2538688.

[52] C. Da Silva, F. J. Herrmann, Optimization on the hierarchical tucker manifold – applicationsto tensor completion, Linear Algebra and its Applications 481 (2015) 131–173. doi:10.1016/j.laa.2015.04.015.

[53] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (5) (2011) 2295–2317.doi:10.1137/090752286.

[54] L. Grasedyck, R. Kriemann, C. Lobbert, A. Nagel, G. Wittum, K. Xylouris, Parallel tensorsampling in the hierarchical tucker format, Comput. Visual Sci. 17 (2) (2015) 67–78. doi:

10.1007/s00791-015-0247-x.

[55] A. Nouy, Higher-order principal component analysis for the approximation of tensors in tree-based low-rank formats (2017). arXiv:1705.00880.

[56] S. Etter, Parallel ALS algorithm for solving linear systems in the hierarchical tucker repre-sentation, SIAM J. Sci. Comput. 38 (4) (2016) A2585–A2609. doi:10.1137/15m1038852.

29

http://dx.doi.org/10.1007/s00041-009-9094-9

http://dx.doi.org/10.1007/978-3-642-28027-6

http://dx.doi.org/10.1137/07070111x

http://dx.doi.org/10.1137/090764189

http://dx.doi.org/10.1137/s0895479896305696

http://dx.doi.org/10.1016/j.physd.2010.10.005

http://dx.doi.org/10.1016/0375-9601(95)00315-t

http://dx.doi.org/10.1016/0375-9601(95)00315-t

http://dx.doi.org/10.1137/06066518x

http://dx.doi.org/10.1145/2512329

http://dx.doi.org/10.1145/2512329

http://dx.doi.org/10.1137/130926171

http://dx.doi.org/10.1145/2538688

http://dx.doi.org/10.1145/2538688

http://dx.doi.org/10.1016/j.laa.2015.04.015

http://dx.doi.org/10.1016/j.laa.2015.04.015

http://dx.doi.org/10.1137/090752286

http://dx.doi.org/10.1007/s00791-015-0247-x

http://dx.doi.org/10.1007/s00791-015-0247-x


http://dx.doi.org/10.1137/15m1038852

[57] L. Grasedyck, C. Lobbert, Distributed Hierarchical SVD in the Hierarchical Tucker Format(2017). arXiv:1708.03340.

[58] A.-H. Phan, P. Tichavsky, A. Cichocki, CANDECOMP/PARAFAC decomposition of high-order tensors through tensor reshaping, IEEE Trans. Signal Process. 61 (19) (2013) 4847–4860.doi:10.1109/tsp.2013.2269046.

[59] G. Zhou, A. Cichocki, S. Xie, Accelerated canonical polyadic decomposition using modereduction, IEEE Trans. Neural Netw. Learning Syst. 24 (12) (2013) 2051–2062. doi:10.

1109/tnnls.2013.2271507.

[60] D. Hatch, D. del Castillo-Negrete, P. Terry, Analysis and compression of six-dimensional gy-rokinetic datasets using higher order singular value decomposition, J. Comput. Phys. 231 (11)(2012) 4234–4256. doi:10.1016/j.jcp.2012.02.007.

[61] K. Kormann, A semi-Lagrangian vlasov solver in tensor train format, SIAM J. Sci. Comput.37 (4) (2015) B613–B632. doi:10.1137/140971270.

[62] S. Dolgov, A. Smirnov, E. Tyrtyshnikov, Low-rank approximation in the numerical modelingof the Farley–buneman instability in ionospheric plasma, J. Comput. Phys. 263 (2014) 268–282. doi:10.1016/j.jcp.2014.01.029.

[63] A. Quarteroni, R. Sacco, F. Saleri, Numerical Mathematics, 2nd Edition, Springer New York,2007. doi:10.1007/978-0-387-22750-4.

[64] H.-K. Rhee, R. Aris, N. R. Amundson, First-order partial differential equations, volume 1:Theory and applications of single equations, Dover, 2001.

[65] G. Dimarco, R. Loubere, J. Narski, T. Rey, An efficient numerical method for solving theBoltzmann equation in multidimensions, J. Comput. Phys. 353 (2018) 46–81. doi:10.1016/j.jcp.2017.10.010.

[66] L. Desvillettes, S. Mischler, About the splitting algorithm for Boltzmann and B.G.K.equations, Math. Models Methods Appl. Sci. 06 (08) (1996) 1079–1101. doi:10.1142/

s0218202596000444.

[67] P. Shuleshko, A new method of solving boundary-value problems of mathematical physics,Aust. J. Appl. Sci 10 (1959) 1–7.

[68] L. J. Snyder, T. W. Spriggs, W. E. Stewart, Solution of the equations of change by galerkin’smethod, AIChE J. 10 (4) (1964) 535–540. doi:10.1002/aic.690100423.

[69] B. Zinn, E. Powell, Application of the Galerkin method in the solution of combustion-instability problems, in: XlXth International Astronautical Congress, Vol. 3, 1968, pp. 59–73.

[70] P. Comon, X. Luciani, A. L. F. de Almeida, Tensor decompositions, alternating least squaresand other tales, J. Chemometrics 23 (7-8) (2009) 393–405. doi:10.1002/cem.1236.

[71] C. Cercignani, V. I. Gerasimenko, D. Y. Petrina, Many-Particle Dynamics and Kinetic Equa-tions, 1st Edition, Springer Netherlands, 1997. doi:10.1007/978-94-011-5558-8.

30


http://dx.doi.org/10.1109/tsp.2013.2269046

http://dx.doi.org/10.1109/tnnls.2013.2271507

http://dx.doi.org/10.1109/tnnls.2013.2271507


http://dx.doi.org/10.1137/140971270


http://dx.doi.org/10.1007/978-0-387-22750-4



http://dx.doi.org/10.1142/s0218202596000444

http://dx.doi.org/10.1142/s0218202596000444

http://dx.doi.org/10.1002/aic.690100423

http://dx.doi.org/10.1002/cem.1236

http://dx.doi.org/10.1007/978-94-011-5558-8

[72] C. Villani, A review of mathematical topics in collisional kinetic theory, in: S. Friedlander,D. Serre (Eds.), Handbook of mathematical fluid mechanics, Vol. 1, North-Holland, 2002, pp.71–305.

[73] C. Cercignani, R. Illner, M. Pulvirenti, The Mathematical Theory of Dilute Gases, SpringerNew York, 1994. doi:10.1007/978-1-4419-8524-8.

[74] T. Nishida, Fluid dynamical limit of the nonlinear Boltzmann equation to the level of thecompressible Euler equation, Commun.Math. Phys. 61 (2) (1978) 119–148. doi:10.1007/

bf01609490.

[75] R. E. Caflisch, The fluid dynamic limit of the nonlinear boltzmann equation, Comm. PureAppl. Math. 33 (5) (1980) 651–666. doi:10.1002/cpa.3160330506.

[76] H. Struchtrup, Macroscopic Transport Equations for Rarefied Gas Flows, Springer BerlinHeidelberg, 2005. doi:10.1007/3-540-32386-4.

[77] C. D. Levermore, Moment closure hierarchies for kinetic theories, J Stat Phys 83 (5-6) (1996)1021–1065. doi:10.1007/bf02179552.

[78] I. Muller, T. Ruggeri, Rational Extended Thermodynamics, Springer New York, 1998. doi:

10.1007/978-1-4612-2210-1.

[79] P. L. Bhatnagar, E. P. Gross, M. Krook, A model for collision processes in gases. i. smallamplitude processes in charged and neutral one-component systems, Phys. Rev. 94 (3) (1954)511–525. doi:10.1103/physrev.94.511.

[80] L. Mieussens, Discrete-velocity models and numerical schemes for the Boltzmann-BGK equa-tion in plane and axisymmetric geometries, J. Comput. Phys. 162 (2) (2000) 429–466.doi:10.1006/jcph.2000.6548.

[81] J. Nassios, J. E. Sader, High frequency oscillatory flows in a slightly˜rarefied gas accordingto the Boltzmann–BGK˜equation, J. Fluid Mech. 729 (2013) 1–46. doi:10.1017/jfm.2013.281.

[82] P. Andries, P. Le Tallec, J.-P. Perlat, B. Perthame, The Gaussian-BGK model of Boltzmannequation with small prandtl number, Eur. J. Mech. B. Fluids 19 (6) (2000) 813–830. doi:

10.1016/s0997-7546(00)01103-1.

[83] N. Borghini, Topics in Nonequilibrium Physics (Sep. 2016).

[84] L. Barichello, C. Siewert, Some comments on modeling the linearized Boltzmann equation,J. Quant. Spectrosc. Radiat. Transfer 77 (1) (2003) 43–59. doi:10.1016/s0022-4073(02)

00074-2.

[85] C. L. Pekeris, Z. Alterman, Solution of the Boltzmann-Hilbert integral equation ii. the coef-ficients of viscosity and heat conduction, Proceedings of the National Academy of Sciences43 (11) (1957) 998–1007. doi:10.1073/pnas.43.11.998.

[86] S. K. Loyalka, Model dependence of the slip coefficient, Phys. Fluids 10 (8) (1967) 1833.doi:10.1063/1.1762366.

31

http://dx.doi.org/10.1007/978-1-4419-8524-8

http://dx.doi.org/10.1007/bf01609490

http://dx.doi.org/10.1007/bf01609490

http://dx.doi.org/10.1002/cpa.3160330506

http://dx.doi.org/10.1007/3-540-32386-4

http://dx.doi.org/10.1007/bf02179552

http://dx.doi.org/10.1007/978-1-4612-2210-1

http://dx.doi.org/10.1007/978-1-4612-2210-1

http://dx.doi.org/10.1103/physrev.94.511

http://dx.doi.org/10.1006/jcph.2000.6548

http://dx.doi.org/10.1017/jfm.2013.281

http://dx.doi.org/10.1017/jfm.2013.281

http://dx.doi.org/10.1016/s0997-7546(00)01103-1

http://dx.doi.org/10.1016/s0997-7546(00)01103-1

http://dx.doi.org/10.1016/s0022-4073(02)00074-2

http://dx.doi.org/10.1016/s0022-4073(02)00074-2

http://dx.doi.org/10.1073/pnas.43.11.998

http://dx.doi.org/10.1063/1.1762366

[87] H. Huang, J. M. Dennis, L. Wang, P. Chen, A scalable parallel LSQR algorithm for solvinglarge-scale linear system for tomographic problems: A case study in seismic tomography,Procedia Comput. Sci. 18 (2013) 581–590. doi:10.1016/j.procs.2013.05.222.

[88] C. C. Paige, M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse leastsquares, ACM Trans. Math. Softw. 8 (1) (1982) 43–71. doi:10.1145/355984.355989.

[89] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo,C. Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocksfor Iterative Methods, Society for Industrial and Applied Mathematics, 1994. doi:10.1137/1.9781611971538.

32

http://dx.doi.org/10.1016/j.procs.2013.05.222

http://dx.doi.org/10.1145/355984.355989

http://dx.doi.org/10.1137/1.9781611971538

http://dx.doi.org/10.1137/1.9781611971538

Algorithm 3 ALS Subroutines: CreateVectorGamma

1: function CreateVectorGamma(βNew,k, βOld,k, k)2: for z ← 1 : RG do . Loop over all elements of γk3: for q ← −|Q| /2 : |Q| /2 do4: for k′ ← 1 : K do . Loop over all elements of N for each element of γk5: for l′ ← 1 : RG do6: for η ← 1 : rA do7: for υ ← 1 : rA do8: j ← (η − 1) rA + υ9: if k’ = k then

10: Nk′,l′,j ← 011: for s′ ← −|Q| /2 : |Q| /2Q do12: Nk′,l′,j ← Nk′,l′,j + βOld,k′,(l′−1)Q+s′N(k′, η, υ, s′, q)

13: else14: Nk′,l′,j ← 015: for s′ ← −|Q| /2 : |Q| /2Q do16: for q′ ← −|Q| /2 : |Q| /2 do17: Nk′,l′,j ← Nk′,l′,j+βOld,k′,(l′−1)Q+s′βNew,k′,(z−1)Q+q′N(k′, η, υ, s′, q′)

18: Nk,(z−1)Q+q ←∑RG

l′=1

∑r2Aj=1

∏Kk′=1 Nk′,l′,j . Fill matrix Nk

19: for k′ ← 1 : K do . Loop over all elements of O for each element of γk20: for η ← 1 : rA do21: if k’ = k then22: Ok′,η ← O(k′, η, q)23: else24: Ok′,η ← 025: for q′ ← −|Q| /2 : |Q| /2 do26: Ok′,η ← Ok′,η + βNew,k′,(z−1)Q+q′O(k′, η, q′)

27: Ok,(z−1)Q+q ←∑rAη=1

∏Kk′=1 Ok′,η . Fill matrix Ok

28: γk = Nk + ∆t ν Ok29: return γk30: end function

Algorithm 4 ALS Subroutines: ComputeBeta

1: function ComputeBeta(Mk, βNew,k, γk)2: βNew,k ← Lsqr(Mk, βNew,k, γk) . The system γk −MkβNew,k = Res is solved using the

least squares method. βNew,k is used as an initial guess before being updated3: return βNew,k

4: end function

33

Algorithm 5 ALS Subroutines: ComputeConvergence

1: function ComputeConvergence(βInt, βNew)2: for k ← 1 : K do3: βNorm,k ← ‖βNew,k − βInt,k‖/‖βInt,k‖4: return max (βNorm)5: end function

34

arxiv:1803.10270v1 [math.na] 27 mar 2018 · parallel numerical tensor methods for high-dimensional...

Documents