
Fast Fluid-Structure Interaction Using Lattice Boltzmann and Immersed Boundary Methods

Mark Mawson1, Pedro Valero Lara2, Julien Favier3, Alfredo Pinelli2, Alistair Revell1

1. The University of Manchester 2. Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas 3. Aix Marseille Université

Contents

1. The Lattice Boltzmann Method

2. The Immersed Boundary Method

3. Demonstrations in 2D and 3D

4. Implementation and Optimisation

5. Summary

Demo GPU Hardware

• GK104-based K5000M:

• Portable (in a laptop)

• High peak performance

• Low DRAM bandwidth, but we can still solve fluid problems interactively.

CUDA cores: 1344

DRAM: 4 GB

Compute capability: 3.0

Peak performance (single precision): 1.6 TFLOPS

DRAM bandwidth: 96 GB/s (theoretical), 66 GB/s (measured)

The Lattice Boltzmann Method

• Continuum methods (macro-scale)

Based on the Navier-Stokes equations.

Conservation of mass/momentum/energy on an infinitesimal volume.

Discretised with finite volume, finite element or finite difference methods.

• Molecular dynamics (micro-scale)

Small particles that collide with each other.

Inter-particle forces govern the interactions.

At each time t we must find the trajectory of every particle.

Very computationally expensive.

• Meso-scale

Based on kinetic theory; fits somewhere in the middle.

LBM falls within this category.

Instead of a single particle we consider a distribution function, which represents a collection of particles.

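The Boltzmann Equation

$$\frac{\partial f}{\partial t} + \mathbf{e}\cdot\nabla_{\mathbf{x}} f + \mathbf{f}\cdot\nabla_{\mathbf{e}} f = \Omega, \qquad \Omega = -\frac{1}{\tau}\left(f - f^{(eq)}\right)$$

• $f$ is a probability distribution function.

• $\mathbf{e}$ is the velocity vector associated with $f$.

• $\mathbf{f}$ (bold) accounts for body forces applied to the distribution (more on this later).

• $\Omega$ is an operator to account for particle collisions (LBGK in this case).

• $f^{(eq)}$ is a Maxwellian used to “emit” particles into a new component of $\mathbf{e}$ (see Bhatnagar & Gross, 1954).

• $\tau$ is a relaxation term used to describe the amount of collision taking place.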

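Discretisation to Lattice Boltzmann

Populations (lattice units, $\Delta t = 1$):

$$f_i(\mathbf{x} + \mathbf{e}_i, t + 1) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{(eq)}(\mathbf{x}, t)\right) + F_i$$

$$F_i = \left(1 - \frac{1}{2\tau}\right)\omega_i\left[\frac{\mathbf{e}_i - \mathbf{u}}{c_s^2} + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^4}\,\mathbf{e}_i\right]\cdot\mathbf{f}$$

$$f_i^{(eq)} = \rho\,\omega_i\left[1 + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_i\cdot\mathbf{u})^2}{2c_s^4} - \frac{\mathbf{u}^2}{2c_s^2}\right]$$

$\omega_i$ is a weighting function and $c_s = 1/\sqrt{3}$ is the speed of sound in the lattice.

Macroscopic variables:

$$\rho = \sum_i f_i, \qquad \rho\mathbf{u} = \sum_i \mathbf{e}_i f_i + \frac{1}{2}\mathbf{f}$$

• Multi-scale expansion of the Lattice Boltzmann equation up to and including 2nd order terms allows the Navier-Stokes equations to be recovered.

• See Guo et al., 2002 for more details.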
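The equilibrium distribution and the macroscopic sums can be checked against each other numerically. The following is a minimal sketch for D2Q9 without a body force; the ordering of the velocities and the function names `equilibrium` and `moments` are assumptions made for illustration, not taken from the slides.

```python
# D2Q9 velocities and weights; this ordering is an arbitrary choice for the sketch.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0  # c_s^2 in lattice units


def equilibrium(rho, ux, uy):
    """Return the nine equilibrium populations f_i^(eq) for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(rho * w * (1.0 + eu / CS2 + eu * eu / (2.0 * CS2 * CS2) - usq / (2.0 * CS2)))
    return feq


def moments(f):
    """Recover rho and u from nine populations (no body force term)."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    return rho, ux, uy
```

Taking the moments of `equilibrium(rho, ux, uy)` should return the same `rho`, `ux` and `uy` to rounding error, which is the property the collision operator relies on to conserve mass and momentum.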

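LBM – Discretisation

2D – D2Q9:

$$\mathbf{e}_i \in \{(0,0),\ (\pm1,0),\ (0,\pm1),\ (\pm1,\pm1)\}$$

3D – D3Q19:

$$\mathbf{e}_i \in \{(0,0,0),\ (\pm1,0,0),\ (0,\pm1,0),\ (0,0,\pm1),\ (\pm1,\pm1,0),\ (\pm1,0,\pm1),\ (0,\pm1,\pm1)\}$$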
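Both velocity sets can be generated rather than typed by hand. The sketch below is an illustration only: it enumerates the vectors in lexicographic order (not necessarily the column order used on the slide) and uses the standard weights for these lattices, which the slides do not list.

```python
from itertools import product


def velocity_set(dim, max_speed_sq=2):
    """All vectors in {-1, 0, 1}^dim whose squared length is at most max_speed_sq."""
    return [e for e in product((-1, 0, 1), repeat=dim) if sum(c * c for c in e) <= max_speed_sq]


D2Q9 = velocity_set(2)
D3Q19 = velocity_set(3)

# Standard weights, keyed by squared speed |e_i|^2 (assumed, not given on the slides).
W_D2Q9 = {0: 4 / 9, 1: 1 / 9, 2: 1 / 36}
W_D3Q19 = {0: 1 / 3, 1: 1 / 18, 2: 1 / 36}


def weight(e, table):
    """Look up the weight of velocity e from its squared length."""
    return table[sum(c * c for c in e)]


def second_moment(vectors, table, a, b):
    """Sum of w_i * e_ia * e_ib; equals c_s^2 when a == b and 0 otherwise."""
    return sum(weight(e, table) * e[a] * e[b] for e in vectors)
```

The second moment returning 1/3 on the diagonal is where the lattice speed of sound $c_s = 1/\sqrt{3}$ comes from.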

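LBM – As an Algorithm

Collision step ($f_i^{\mathrm{post}}$ denotes the post-collision population):

$$f_i^{\mathrm{post}}(\mathbf{x}, t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{(eq)}(\mathbf{x}, t)\right) + F_i$$

• This is an entirely local operation (think independent threads in CUDA).

Streaming step:

$$f_i(\mathbf{x} + \mathbf{e}_i, t + 1) = f_i^{\mathrm{post}}(\mathbf{x}, t)$$

• Nearest neighbour interaction.

LBM – As an Algorithm

• Re-ordering of the LBM algorithm to increase locality:

1. Stream in the appropriate direction: $f_i(\mathbf{x}, t) = f_i^{\mathrm{post}}(\mathbf{x} - \mathbf{e}_i, t - 1)$.

2. Apply boundary conditions.

3. Calculate ρ and u.

4. Calculate $f_i^{(eq)}$.

5. Apply the collision operator: $f_i^{\mathrm{post}}(\mathbf{x}, t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{(eq)}(\mathbf{x}, t)\right) + F_i$.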
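The re-ordered loop (stream by pulling from neighbours, then compute moments, equilibrium and collision in one pass) can be sketched on a small periodic D2Q9 grid. This is a plain-Python illustration with no body force and no wall boundary conditions; the names `step` and `total_mass` and the `f[i][y][x]` storage are assumptions, and a GPU version would run the body of the two loops as one thread per node.

```python
# D2Q9 velocities and weights; ordering chosen arbitrarily for this sketch.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Equilibrium populations with c_s^2 = 1/3."""
    usq = ux * ux + uy * uy
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        out.append(rho * w * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return out


def step(f, nx, ny, tau):
    """One pull-stream + collide update; f[i][y][x] is population i at node (x, y)."""
    new = [[[0.0] * nx for _ in range(ny)] for _ in E]
    for y in range(ny):
        for x in range(nx):
            # 1. Stream (pull): read population i from the node it travels from.
            local = [f[i][(y - ey) % ny][(x - ex) % nx] for i, (ex, ey) in enumerate(E)]
            # 2. Boundary conditions would go here (only periodic wrap in this sketch).
            # 3. Macroscopic variables.
            rho = sum(local)
            ux = sum(fi * ex for fi, (ex, _) in zip(local, E)) / rho
            uy = sum(fi * ey for fi, (_, ey) in zip(local, E)) / rho
            # 4. Equilibrium.
            feq = equilibrium(rho, ux, uy)
            # 5. BGK collision, written straight to the output array.
            for i in range(9):
                new[i][y][x] = local[i] - (local[i] - feq[i]) / tau
    return new


def total_mass(f):
    """Sum of every population on the grid."""
    return sum(v for plane in f for row in plane for v in row)
```

Because each node reads its neighbours once and writes only to itself, total mass stays constant on a periodic grid, which makes a convenient sanity check.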

LBM Validation

Validation 1: Lid Driven Cavity

[Figure: 2D case. Centreline u profiles (u against Y) for Re = 100, 400 and 1000, compared with Ghia et al. Grids per panel: 33x33, 65x65, 129x129; 65x65, 129x129, 257x257; 129x129, 257x257, 513x513.]

[Figure: 3D case. Centreline u profiles (u against Y) for Re = 100, 400 and 1000, compared with Jiang & Lin. Grids per panel: 33x33x33, 65x65x65, 129x129x129; 65x65x65, 129x129x129, 257x257x257; 129x129x129, 257x257x257.]
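To set up a cavity case at a given Reynolds number, the relaxation time has to be chosen from the lattice viscosity. The sketch below assumes the usual BGK relation $\nu = c_s^2(\tau - 1/2)$ in lattice units; the slides do not state the lid velocity or the reference length used, so the values in the usage note are hypothetical.

```python
CS2 = 1.0 / 3.0  # c_s^2 in lattice units


def relaxation_time(reynolds, lid_velocity, length_nodes):
    """Relaxation time tau for Re = U * L / nu, with nu = c_s^2 * (tau - 1/2)."""
    nu = lid_velocity * length_nodes / reynolds
    return nu / CS2 + 0.5
```

For example, `relaxation_time(100, 0.1, 128)` gives τ ≈ 0.884. Higher Reynolds numbers on the same grid push τ towards 0.5, which is one reason the finer grids appear in the Re = 1000 panels.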

Validation 2: Poiseuille Flow

• D3Q19 is more memory intensive, which leads to smaller domains.

• Boundary conditions become more important.

• 2nd order convergence verified up to floating-point precision in 3D.

[Figure: u velocity against Y coordinate, analytical solution and 33x33x33 grid.]

[Figure: Log10 of error (L₂ norm) against Log10 Δx, with Δx and Δx² reference slopes.]

• In double precision the K5000M runs out of memory before floating point error becomes dominant.

[Figure: u velocity against Y coordinate, analytical solution and 33x33x33 grid.]

[Figure: Log10 of error (L₂ norm) against Log10 Δx, with Δx and Δx² reference slopes.]
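The convergence plots compare an L₂ error norm against the grid spacing. A minimal sketch of that measurement follows, assuming a parabolic profile between two walls and a relative L₂ norm; the function names and the sample numbers in the usage note are hypothetical and are not data from the slides.

```python
import math


def poiseuille_profile(y, height, u_max):
    """Analytical velocity at distance y from the lower wall of a channel of given height."""
    return 4.0 * u_max * y * (height - y) / (height * height)


def l2_error(numerical, analytical):
    """Relative L2 norm of the difference between two sampled profiles."""
    num = sum((a - b) ** 2 for a, b in zip(numerical, analytical))
    den = sum(b * b for b in analytical)
    return math.sqrt(num / den)


def observed_order(err_coarse, err_fine, dx_coarse, dx_fine):
    """Slope of log(error) against log(dx) between two resolutions."""
    return math.log(err_coarse / err_fine) / math.log(dx_coarse / dx_fine)
```

If halving Δx reduces the error by a factor of four, `observed_order` returns 2, which corresponds to the Δx² reference slope.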

Immersed Boundary Method

• Allows moving and complex boundaries to be created arbitrarily within a Lagrangian space.

• No need for unstructured, body-fitting domains.

• We use the method found in Pinelli, A., Naqavi, I., Piomelli, U., Favier, J., 2010. Immersed-boundary methods for general finite-difference and finite-volume Navier-Stokes solvers.

Pictures from http://www.math.vt.edu/people/xuz/research.html

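Immersed Boundary Method – With LBM

1. Perform collision and stream (without forces):

$$f_i(\mathbf{x} + \mathbf{e}_i, t + 1) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{(eq)}(\mathbf{x}, t)\right)$$

2. Apply boundary conditions.

3. Calculate u*:

$$\rho = \sum_i f_i, \qquad \rho\mathbf{u}^* = \sum_i \mathbf{e}_i f_i$$

4. Interpolate (integrate) velocities to Lagrange space:

$$\mathbf{U}(s) = \int \mathbf{u}^*(\mathbf{x})\,\tilde{\delta}(\mathbf{x} - \mathbf{X}(s))\,d\mathbf{x}$$

5. Calculate the corrective force, where $\hat{\mathbf{U}}(s)$ is the desired boundary velocity:

$$\mathbf{F}(s) = \frac{\hat{\mathbf{U}}(s) - \mathbf{U}(s)}{\Delta t}$$

6. Spread (integrate) the corrective force back to Eulerian space:

$$\mathbf{f}(\mathbf{x}) = \int \mathbf{F}(s)\,\tilde{\delta}(\mathbf{x} - \mathbf{X}(s))\,ds$$

7. Perform collision and stream operations with forces:

$$f_i(\mathbf{x} + \mathbf{e}_i, t + 1) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{(eq)}(\mathbf{x}, t)\right) + F_i$$

8. Calculate ρ and u with forces ($\Delta t = 1$ in lattice units):

$$\rho = \sum_i f_i, \qquad \rho\mathbf{u} = \sum_i \mathbf{e}_i f_i + \frac{\Delta t}{2}\mathbf{f}$$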
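The last two steps depend on the discrete forcing term $F_i$ of Guo et al. (2002). The sketch below evaluates it for D2Q9 under the assumption that the force density is uniform at a node; the velocity ordering and the names `guo_forcing` and `velocity_with_force` are illustrative only.

```python
# D2Q9 velocities and weights; ordering chosen arbitrarily for this sketch.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0  # c_s^2 in lattice units


def guo_forcing(tau, ux, uy, fx, fy):
    """Forcing terms F_i for fluid velocity (ux, uy) and force density (fx, fy)."""
    prefactor = 1.0 - 1.0 / (2.0 * tau)
    terms = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        gx = (ex - ux) / CS2 + eu * ex / (CS2 * CS2)
        gy = (ey - uy) / CS2 + eu * ey / (CS2 * CS2)
        terms.append(prefactor * w * (gx * fx + gy * fy))
    return terms


def velocity_with_force(f, fx, fy, dt=1.0):
    """Density and velocity including the half-step force correction."""
    rho = sum(f)
    ux = (sum(fi * ex for fi, (ex, _) in zip(f, E)) + 0.5 * dt * fx) / rho
    uy = (sum(fi * ey for fi, (_, ey) in zip(f, E)) + 0.5 * dt * fy) / rho
    return rho, ux, uy
```

The terms sum to zero, so the force adds no mass, and their first moment is $(1 - 1/(2\tau))\,\mathbf{f}$; the remaining half of the momentum is supplied by the $\frac{\Delta t}{2}\mathbf{f}$ correction in the velocity.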

Immersed Boundary Method - Interpolation

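• $\tilde{\delta}$ is a mollifier kernel with compact support of size 3 (Roma et al., 1999):

$$\tilde{\delta}(r) = \begin{cases} \dfrac{1}{6}\left(5 - 3|r| - \sqrt{-3(1 - |r|)^2 + 1}\right) & 0.5 \le |r| \le 1.5 \\[2mm] \dfrac{1}{3}\left(1 + \sqrt{-3r^2 + 1}\right) & |r| \le 0.5 \\[2mm] 0 & \text{otherwise} \end{cases}$$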
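The kernel and the interpolate, force, spread sequence can be tried out in one dimension. The following is a sketch under simplifying assumptions: unit grid spacing, a single Lagrangian marker, and a marker weight `ds` of 1 (the method of Pinelli et al. computes this weight, which is not reproduced here). The name `ib_correction` is hypothetical.

```python
import math


def roma_delta(r):
    """Three-point kernel of Roma et al. (1999) for a distance r in grid units."""
    a = abs(r)
    if a <= 0.5:
        return (1.0 + math.sqrt(-3.0 * r * r + 1.0)) / 3.0
    if a <= 1.5:
        return (5.0 - 3.0 * a - math.sqrt(-3.0 * (1.0 - a) ** 2 + 1.0)) / 6.0
    return 0.0


def ib_correction(u_star, marker_x, desired_u, dt=1.0, ds=1.0):
    """Interpolate u* to the marker, form the corrective force, spread it to the grid.

    u_star[j] is the predicted velocity at grid node j; marker_x is the marker position.
    Returns (U, F, f) where f[j] is the force density spread to node j.
    """
    weights = [roma_delta(j - marker_x) for j in range(len(u_star))]
    U = sum(w * u for w, u in zip(weights, u_star))  # interpolation to Lagrange space
    F = (desired_u - U) / dt                         # corrective force at the marker
    f = [F * w * ds for w in weights]                # spreading back to Eulerian space
    return U, F, f
```

The kernel weights of the three nearest nodes sum to one, so a uniform velocity field is interpolated exactly and the spread force sums to `F * ds`.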

Demonstrations – Flow Over a Sphere

Demonstration – Flexible Filaments

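• We must update the position of our immersed boundary points.

• A Lagrange-Euler system is solved iteratively for the tension $T$ between points and for the position $\mathbf{X}$ of the points:

$$\frac{\partial^2 \mathbf{X}}{\partial t^2} = \frac{\partial}{\partial s}\left(T\frac{\partial \mathbf{X}}{\partial s}\right) - \frac{\partial^2}{\partial s^2}\left(K_B\frac{\partial^2 \mathbf{X}}{\partial s^2}\right) + \mathbf{g} - \mathbf{F}$$

• This is currently (unfortunately!) performed on the host CPU.

• The cost is partially hidden by executing it concurrently with the LBM.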

Implementation & Performance – LBM

• Operating on all indices of $f_i$ in one thread helps to hide the latency through ILP: lower occupancy but higher performance (Volkov, 2010).

• Unrolling the streaming operation loop allows 19 requests to DRAM to be made with only register-level stalls.

Implementation & Performance – LBM

• Use “Struct of Arrays” access to $f_i$. Coalesced access therefore only depends on the x component of e (fully coalesced during collision).

• Cache hit rate is low as we don’t have repeat accesses; 0% in L1 and
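The effect of the “Struct of Arrays” layout on coalescing can be seen from the index arithmetic alone. The sketch below compares flat offsets for the two layouts, assuming x is the fastest-varying index and that neighbouring threads handle neighbouring x; the function names are hypothetical and the slides do not give the actual indexing used.

```python
def soa_index(i, x, y, z, nx, ny, nz):
    """Flat offset of population i at node (x, y, z): one contiguous array per direction."""
    return ((i * nz + z) * ny + y) * nx + x


def aos_index(i, x, y, z, nx, ny, nz, q=19):
    """Flat offset when all q populations of a node are stored together."""
    return ((z * ny + y) * nx + x) * q + i


def warp_stride(index_fn, i, nx, ny, nz):
    """Offset difference between two threads handling adjacent x nodes."""
    return index_fn(i, 1, 0, 0, nx, ny, nz) - index_fn(i, 0, 0, 0, nx, ny, nz)
```

With the struct-of-arrays layout the stride between adjacent threads is one element, so a warp reads one contiguous segment per direction; with array-of-structs the stride is 19 elements. A streaming shift in y or z moves the whole segment, while a shift in x only offsets it by one element, which is why only the x component of e affects coalescing.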

Implementation & Performance – LBM

• 2D LBM

[Figure: MLUPS against number of lattice points (256^2, 512^2, 1024^2, 2048^2).]

• 3D LBM

[Figure: MLUPS against number of lattice points (64^3, 96^3, 128^3). Present work compared with Asinari et al. (2011), Obrecht et al. (2010) and Rinaldi et al. (2012).]

Implementation & Performance – LBM

3D LBM: if we scale for the bandwidth of the GPU

[Figure: MLUPS scaled for bandwidth against number of lattice points (64^3, 96^3, 128^3). Present work.]

[Figure: MLUPS against number of lattice points (64^3, 96^3, 128^3). Present work compared with Astorino et al. (2011).]
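Because the solver is limited by DRAM bandwidth, a rough ceiling on MLUPS follows from the bytes moved per lattice update. The estimate below is a sketch under stated assumptions: each population is read once and written once per update, values are single precision (4 bytes), 1 GB is 10^9 bytes, and no other traffic (boundary data, immersed boundary forces) is counted.

```python
def bandwidth_bound_mlups(bandwidth_gb_s, q, bytes_per_value=4):
    """Upper bound on million lattice updates per second for a q-velocity lattice."""
    bytes_per_update = 2 * q * bytes_per_value  # one read and one write per population
    return bandwidth_gb_s * 1e9 / bytes_per_update / 1e6
```

With the measured 66 GB/s of the K5000M this gives roughly 434 MLUPS for D3Q19 and 917 MLUPS for D2Q9. Dividing a measured MLUPS figure by such a bound is one way to compare results across GPUs with different bandwidth.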



Implementation & Performance – IB

• Transactions involving information for each boundary point are coalesced: each point only needs information about itself.

• Transactions for moving data between fluid and boundary are random, with much higher cache use (≈40%).

[Diagram: Lagrange information and fluid information.]

Implementation & Performance – IB

• In 2D we only use a few hundred Lagrange markers per object.

• We can assign one block of threads per object (1024 points max).

• In 3D several thousand Lagrange markers are needed (4000 for the sphere demonstration).

• We need to launch one kernel per object.

• Launching kernels in different streams can improve the utilisation of the GPU, if the objects are small.

Performance – Immersed Boundary

• 3.4ms in serial

Summary

• Lattice Boltzmann-Immersed Boundary solvers presented in 2D and 3D. Relatively simple alternative to unstructured domains.

Both methods suit parallelisation.

• Real-time simulations possible thanks to GPU acceleration.

• Don’t always need high occupancy and use of shared/cache memory to achieve high performance.

Future work

“Interactive in-silico Platform for Optimising Surgical Procedure of Abdominal

Aortic Aneurysm Repair and Evaluation of Stent Performance.”

• Personalised surgery simulation for stent implants – i.e. Real-time with user interaction.

• Medical images converted into CAD designs and imported into fluids solver as an immersed boundary.

• A few years away, but this work lays the foundations for such a project.

Any Questions?

www.mark.j.mawson.blogspot.com

http://www.youtube.com/user/mjmawson

http://www.youtube.com/mcji8ar2


References

• Bhatnagar, P. & Gross, E., 1954. A model for collision processes in gases. I. Small amplitude processes in charged and neutral one-component systems.

• Guo, Z., Zheng, C. & Shi, B., 2002. Discrete lattice effects on the forcing term in the lattice Boltzmann method.

• Roma, A. M., Peskin, C. S. & Berger, M. J., 1999. An adaptive version of the immersed boundary method. Journal of Computational Physics 153, 509–534.

• Volkov, V., 2010. Better Performance at Lower Occupancy. GTC 2010.
