Fast Fluid-structure Interaction Using Lattice Boltzmann and Immersed Boundary Methods

Mark Mawson¹, Pedro Valero Lara², Julien Favier³, Alfredo Pinelli², Alistair Revell¹

¹ The University of Manchester
² Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas
³ Aix Marseille Université



  • Contents

    1. The Lattice Boltzmann Method

    2. The Immersed Boundary Method

    3. Demonstrations in 2D and 3D

    4. Implementation and Optimisation

    5. Summary

  • Demo GPU Hardware

    • GK104-based K5000M:
      • Portable (in a laptop)
      • High peak performance
      • Low DRAM bandwidth, but we can still solve fluid problems interactively.

    CUDA cores: 1344
    DRAM: 4 GB
    Compute capability: 3.0
    Peak performance (single precision): 1.6 TFLOPS
    DRAM bandwidth: 96 GB/s (theoretical), 66 GB/s (measured)
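The low bandwidth matters because an LBM update does very little arithmetic per byte moved, so DRAM bandwidth caps the achievable throughput. The sketch below estimates that cap in million lattice updates per second (MLUPS), under the simplifying assumption that each node update reads and writes every population exactly once in single precision, with no other DRAM traffic.

```python
def mlups_ceiling(bandwidth_gb_s: float, q: int, bytes_per_value: int = 4) -> float:
    """Upper bound on MLUPS for a purely bandwidth-bound LBM step.

    Assumes each node update reads q populations and writes q populations,
    each `bytes_per_value` bytes wide, and that nothing else touches DRAM.
    """
    bytes_per_update = 2 * q * bytes_per_value
    updates_per_s = bandwidth_gb_s * 1e9 / bytes_per_update
    return updates_per_s / 1e6


if __name__ == "__main__":
    measured_bw = 66.0  # GB/s, the measured K5000M figure above
    for name, q in (("D2Q9", 9), ("D3Q19", 19)):
        print(f"{name}: <= {mlups_ceiling(measured_bw, q):.0f} MLUPS")
```

D3Q19 moves more than twice as many bytes per node as D2Q9, so its ceiling is correspondingly lower on the same card.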

  • The Lattice Boltzmann Method

    • Continuum methods (macro-scale)
      Based on the Navier-Stokes equations.
      Conservation of mass/momentum/energy on an infinitesimal volume.
      Finite volume/element/difference methods.

    • Molecular dynamics (micro-scale)
      Small particles that collide with each other.
      Inter-particle forces govern the interactions.
      At each time step we must find the trajectory of every particle.
      Very computationally expensive.

    • Meso-scale
      Based on kinetic theory; fits somewhere in the middle.
      LBM falls within this category.
      Instead of a single particle, we consider a distribution function,
      which represents a collection of particles.

  • The Boltzmann Equation

    $$\frac{\partial f}{\partial t} + \mathbf{e}\cdot\nabla_{\mathbf{x}} f = \Omega(f) + F, \qquad \Omega(f) = -\frac{1}{\tau}\left(f - f^{eq}\right)$$

    $f$ is a probability distribution function.
    $\mathbf{e}$ is the velocity vector associated with $f$.
    $F$ accounts for body forces applied to the distribution (more on this later).
    $\Omega$ is an operator to account for particle collisions (LBGK in this case).
    $f^{eq}$ is a Maxwellian used to "emit" particles into a new component of $\mathbf{e}$ (see Bhatnagar, Gross & Krook, 1954).
    $\tau$ is a relaxation term used to describe the amount of collision taking place.
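To see what τ controls, the sketch below applies only the BGK collision term to a single population while holding its equilibrium value fixed (an assumption: in a real step f^eq changes as ρ and u change). The deviation from equilibrium is multiplied by (1 - 1/τ) every step, so τ = 1 relaxes fully in one step and larger τ means less collision per step.

```python
def bgk_relax(f: float, f_eq: float, tau: float, steps: int) -> float:
    """Apply f <- f - (f - f_eq) / tau repeatedly with a frozen f_eq."""
    for _ in range(steps):
        f = f - (f - f_eq) / tau
    return f


if __name__ == "__main__":
    f0, f_eq = 1.2, 1.0
    for tau in (1.0, 0.8, 2.0):
        f_n = bgk_relax(f0, f_eq, tau, steps=3)
        # Closed form of the same recurrence: f_eq + deviation * (1 - 1/tau)**n
        expected = f_eq + (f0 - f_eq) * (1.0 - 1.0 / tau) ** 3
        print(tau, f_n, expected)
```

For 0.5 < τ < 1 the factor 1 - 1/τ is negative, so the population over-relaxes and the sign of the deviation flips every step.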

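  • Discretisation to Lattice Boltzmann

    Populations:

    $$f_i(\mathbf{x}+\mathbf{e}_i, t+1) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{eq}(\mathbf{x},t)\right] + F_i$$

    $$F_i = \left(1 - \frac{1}{2\tau}\right) w_i \left[\frac{\mathbf{e}_i - \mathbf{u}}{c_s^2} + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^4}\,\mathbf{e}_i\right]\cdot\mathbf{f}$$

    $$f_i^{eq} = w_i\,\rho\left[1 + \frac{\mathbf{e}_i\cdot\mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_i\cdot\mathbf{u})^2}{2c_s^4} - \frac{\mathbf{u}^2}{2c_s^2}\right], \qquad c_s = \frac{1}{\sqrt{3}}$$

    $w_i$ is a weighting function, $c_s$ is the speed of sound in the lattice and $\mathbf{f}$ is the body force.

    Macroscopic variables:

    $$\rho = \sum_i f_i, \qquad \rho\mathbf{u} = \sum_i \mathbf{e}_i f_i + \frac{1}{2}\mathbf{f}$$

    • Multi-scale expansion of the Lattice Boltzmann equation up to and including 2nd-order terms allows the Navier-Stokes equations to be recovered.

    • See Guo et al., 2002 for more details.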
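A minimal sketch of the equilibrium and forcing terms, written for D2Q9 (an assumption; the slides also use D3Q19) with the standard D2Q9 weights w_0 = 4/9, w_1..4 = 1/9, w_5..8 = 1/36 and c_s² = 1/3. It checks numerically that Σ_i f_i^eq = ρ and Σ_i e_i f_i^eq = ρu, that the forcing term adds no mass, and that one forced collision raises the momentum by exactly the body force f. That last property relies on the half-force term in ρu: the collision returns f/(2τ) and F_i supplies the remaining (1 - 1/(2τ)) f.

```python
# D2Q9 velocities and weights (standard values, assumed for this sketch)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0  # c_s^2


def equilibrium(rho, ux, uy):
    """f_i^eq = w_i rho [1 + e.u/cs^2 + (e.u)^2/(2 cs^4) - u^2/(2 cs^2)]."""
    usq = ux * ux + uy * uy
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        out.append(w * rho * (1 + eu / CS2 + eu * eu / (2 * CS2 ** 2) - usq / (2 * CS2)))
    return out


def guo_forcing(ux, uy, fx, fy, tau):
    """F_i = (1 - 1/(2 tau)) w_i [(e_i - u)/cs^2 + (e_i.u) e_i / cs^4] . f"""
    pref = 1 - 1 / (2 * tau)
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        gx = (ex - ux) / CS2 + eu * ex / CS2 ** 2
        gy = (ey - uy) / CS2 + eu * ey / CS2 ** 2
        out.append(pref * w * (gx * fx + gy * fy))
    return out


def momentum(f):
    """Sum_i e_i f_i."""
    return (sum(ex * v for (ex, _), v in zip(E, f)),
            sum(ey * v for (_, ey), v in zip(E, f)))


def macroscopic(f, fx, fy):
    """rho = sum f_i,  rho u = sum e_i f_i + f/2 (half-force correction)."""
    rho = sum(f)
    mx, my = momentum(f)
    return rho, (mx + 0.5 * fx) / rho, (my + 0.5 * fy) / rho


def collide_with_force(f, fx, fy, tau):
    """One local BGK collision at a single node, including the forcing term."""
    rho, ux, uy = macroscopic(f, fx, fy)
    feq = equilibrium(rho, ux, uy)
    force = guo_forcing(ux, uy, fx, fy, tau)
    return [v - (v - ve) / tau + Fi for v, ve, Fi in zip(f, feq, force)]


if __name__ == "__main__":
    f = equilibrium(1.0, 0.05, -0.02)
    f[1] += 0.01  # push the node away from equilibrium
    post = collide_with_force(f, 1e-3, 2e-3, tau=0.8)
    print(sum(f), sum(post))            # mass unchanged
    print(momentum(f), momentum(post))  # momentum gains (1e-3, 2e-3)
```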

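  • LBM - Discretisation: D2Q9 (2D), D3Q19 (3D)

    D2Q9: $\mathbf{e}_i \in \{(0,0),\ (\pm1,0),\ (0,\pm1),\ (\pm1,\pm1)\}$ (1 rest, 4 axis and 4 diagonal velocities)

    D3Q19: $\mathbf{e}_i \in \{(0,0,0),\ (\pm1,0,0),\ (0,\pm1,0),\ (0,0,\pm1),\ (\pm1,\pm1,0),\ (\pm1,0,\pm1),\ (0,\pm1,\pm1)\}$ (1 rest, 6 face and 12 edge velocities)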
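The D3Q19 set can be generated rather than typed in. This sketch builds it from the rule "every integer vector with components in {-1, 0, 1} and at most two non-zero components" and uses the standard D3Q19 weights (1/3 rest, 1/18 face, 1/36 edge; assumed here, since the slide does not list them) to check the lattice symmetry conditions Σ w_i = 1, Σ w_i e_i = 0 and Σ w_i e_ia e_ib = c_s² δ_ab with c_s² = 1/3. Exact fractions avoid any round-off in the check.

```python
from fractions import Fraction
from itertools import product


def d3q19():
    """Return (velocities, weights) for D3Q19 with exact fractional weights."""
    weight_by_norm = {0: Fraction(1, 3), 1: Fraction(1, 18), 2: Fraction(1, 36)}
    vels, weights = [], []
    for e in product((-1, 0, 1), repeat=3):
        n = sum(c * c for c in e)  # squared length: 0 rest, 1 face, 2 edge, 3 corner
        if n <= 2:                 # D3Q19 drops the 8 corner (body-diagonal) vectors
            vels.append(e)
            weights.append(weight_by_norm[n])
    return vels, weights


def second_moment(vels, weights):
    """Tensor sum_i w_i e_ia e_ib as a 3x3 nested list."""
    return [[sum(w * e[a] * e[b] for e, w in zip(vels, weights)) for b in range(3)]
            for a in range(3)]


if __name__ == "__main__":
    vels, weights = d3q19()
    print(len(vels), sum(weights))  # 19 1
    print(second_moment(vels, weights))
```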

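  • LBM - As an Algorithm

    • Collision step:

    $$\tilde{f}_i(\mathbf{x},t) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{eq}(\mathbf{x},t)\right]$$

      • This is an entirely local operation (think independent threads in CUDA).

    • Streaming step:

    $$f_i(\mathbf{x}+\mathbf{e}_i, t+1) = \tilde{f}_i(\mathbf{x},t)$$

      • Nearest-neighbour interaction.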
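The two steps can be sketched on a small periodic D2Q9 grid in pure Python. This is an illustration under stated assumptions (standard D2Q9 weights, periodic boundaries, no forcing, a hypothetical 8×6 grid), not the authors' CUDA code: `collide` touches each node in isolation, while `stream` only moves each population to its nearest neighbour x + e_i.

```python
# D2Q9 velocities and weights (standard values, assumed for this sketch)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium with c_s^2 = 1/3 (so 1/cs^2 = 3)."""
    usq = ux * ux + uy * uy
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        out.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return out


def collide(f, tau):
    """Local BGK collision: every node is updated independently."""
    out = []
    for row in f:
        new_row = []
        for fi in row:
            rho = sum(fi)
            ux = sum(ex * v for (ex, _), v in zip(E, fi)) / rho
            uy = sum(ey * v for (_, ey), v in zip(E, fi)) / rho
            feq = equilibrium(rho, ux, uy)
            new_row.append([v - (v - ve) / tau for v, ve in zip(fi, feq)])
        out.append(new_row)
    return out


def stream(f):
    """Push each population to its nearest neighbour x + e_i (periodic)."""
    ny, nx = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (ex, ey) in enumerate(E):
                out[(y + ey) % ny][(x + ex) % nx][i] = f[y][x][i]
    return out


def total_mass(f):
    return sum(sum(fi) for row in f for fi in row)


if __name__ == "__main__":
    nx, ny, tau = 8, 6, 0.9
    # Start at rest with a small density bump in one cell
    f = [[equilibrium(1.0 + (0.01 if (x, y) == (3, 2) else 0.0), 0.0, 0.0)
          for x in range(nx)] for y in range(ny)]
    m0 = total_mass(f)
    for _ in range(10):
        f = stream(collide(f, tau))
    print(abs(total_mass(f) - m0))  # round-off level: mass is conserved
```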

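  • LBM - As an Algorithm

    • Re-ordering of the LBM algorithm to increase locality:

      Stream in the appropriate direction: $f_i(\mathbf{x},t) = \tilde{f}_i(\mathbf{x}-\mathbf{e}_i, t-1)$
      Apply boundary conditions.
      Calculate ρ and u.
      Calculate $f_i^{eq}$.
      Apply the collision operator: $\tilde{f}_i(\mathbf{x},t) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{eq}(\mathbf{x},t)\right]$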
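The re-ordered step can be sketched as a single per-node function: each node pulls its incoming populations from x - e_i, then computes ρ, u and f_i^eq and collides, all on values it already holds locally. On a GPU each (x, y) iteration would map to one thread with the populations kept in registers; here it is plain Python. Assumptions: D2Q9 with standard weights, a periodic box (so the boundary-condition step is empty), and a second output array so no node reads a value already updated in the same step.

```python
# D2Q9 velocities and weights (standard values, assumed for this sketch)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium with c_s^2 = 1/3."""
    usq = ux * ux + uy * uy
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        out.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return out


def pull_stream_collide(f_src, tau):
    """One re-ordered step: pull, (boundaries), moments, f_eq, collide.

    f_src holds post-collision populations from the previous step.
    """
    ny, nx = len(f_src), len(f_src[0])
    f_dst = [[None] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):  # one thread per (x, y) on a GPU
            # stream: gather f_i from the upstream node x - e_i (periodic)
            fi = [f_src[(y - ey) % ny][(x - ex) % nx][i]
                  for i, (ex, ey) in enumerate(E)]
            # boundary conditions would be applied here (none: periodic box)
            rho = sum(fi)
            ux = sum(ex * v for (ex, _), v in zip(E, fi)) / rho
            uy = sum(ey * v for (_, ey), v in zip(E, fi)) / rho
            feq = equilibrium(rho, ux, uy)
            # BGK collision on the local copy; written once to the new array
            f_dst[y][x] = [v - (v - ve) / tau for v, ve in zip(fi, feq)]
    return f_dst


def total_mass(f):
    return sum(sum(fi) for row in f for fi in row)


if __name__ == "__main__":
    nx, ny, tau = 8, 6, 0.9
    f = [[equilibrium(1.0 + (0.01 if (x, y) == (3, 2) else 0.0), 0.02, 0.0)
          for x in range(nx)] for y in range(ny)]
    m0 = total_mass(f)
    for _ in range(10):
        f = pull_stream_collide(f, tau)
    print(abs(total_mass(f) - m0))  # round-off level: mass is conserved
```

Compared with separate collide and stream passes, each population is read once and written once per step, which is where the locality gain comes from.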

  • LBM Validation

  • Validation 1: Lid Driven Cavity

    [Figure, 2D case: centreline u profiles (u vs. Y) for Re = 100, 400 and 1000, compared with Ghia et al. on 33x33, 65x65, 129x129 (Re = 100), 65x65, 129x129, 257x257 (Re = 400) and 129x129, 257x257, 513x513 (Re = 1000) grids]

    [Figure, 3D case: centreline u profiles (u vs. Y) for Re = 100, 400 and 1000, compared with Jiang & Lin on 33x33x33, 65x65x65, 129x129x129 (Re = 100), 65x65x65, 129x129x129, 257x257x257 (Re = 400) and 129x129x129, 257x257x257 (Re = 1000) grids]
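For cavity runs like these, τ follows from the target Reynolds number. A minimal sketch, assuming the standard LBGK relation ν = c_s²(τ - 1/2) in lattice units (c_s² = 1/3, Δx = Δt = 1), a cavity length L equal to the number of nodes across it, and a hypothetical lid speed of 0.1 lattice units; none of these settings are stated in the slides.

```python
def tau_for_reynolds(reynolds: float, n_nodes: int, u_lid: float) -> float:
    """Relaxation time giving Re = u_lid * L / nu in lattice units.

    Assumes L = n_nodes and nu = (tau - 0.5) / 3.
    """
    nu = u_lid * n_nodes / reynolds
    return 3.0 * nu + 0.5


if __name__ == "__main__":
    u_lid = 0.1  # hypothetical lid speed, kept well below c_s ~ 0.577
    for reynolds, n in ((100, 129), (400, 257), (1000, 513)):
        tau = tau_for_reynolds(reynolds, n, u_lid)
        print(f"Re={reynolds:5d}  N={n}  tau={tau:.4f}")
```

As Re rises at a fixed grid and lid speed, τ moves towards 1/2, where LBGK is known to lose stability; refining the grid moves τ back away from that limit.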

  • Validation 2: Poiseuille Flow

    • D3Q19 is more memory intensive, which leads to smaller domains.
    • Boundary conditions become more important.
    • 2nd-order convergence verified in 3D until floating-point error becomes dominant.

    [Figure: velocity vs. Y coordinate compared with the analytical solution (33x33x33); log10 of error (L₂ norm) vs. log10 Δx, with Δx and Δx² reference slopes]

  • Validation 2: Poiseuille Flow

    • In double precision, the K5000M runs out of memory before floating-point error becomes dominant.

    [Figure: velocity vs. Y coordinate compared with the analytical solution (33x33x33); log10 of error (L₂ norm) vs. log10 Δx, with Δx and Δx² reference slopes]
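The Δx and Δx² reference slopes correspond to the observed order p between two grids, p = log(e₁/e₂) / log(Δx₁/Δx₂). A minimal sketch, assuming a discrete relative L₂ norm and the parabolic profile u(y) = 4 U_max y (H - y) / H² as the analytical solution; the exact norm and normalisation used for the plots are not given in the slides. The demonstration feeds in synthetic errors, not results from the solver.

```python
import math


def poiseuille(y: float, height: float, u_max: float) -> float:
    """Analytical parabolic profile: zero at the walls, u_max at mid-channel."""
    return 4.0 * u_max * y * (height - y) / height ** 2


def l2_relative_error(numeric, exact):
    """Discrete relative L2 norm: sqrt(sum (n - e)^2 / sum e^2)."""
    num = sum((n - e) ** 2 for n, e in zip(numeric, exact))
    den = sum(e * e for e in exact)
    return math.sqrt(num / den)


def observed_order(err_coarse, err_fine, dx_coarse, dx_fine):
    """Order p such that err ~ C * dx**p between two resolutions."""
    return math.log(err_coarse / err_fine) / math.log(dx_coarse / dx_fine)


if __name__ == "__main__":
    # Synthetic errors only: pretend the error is exactly 0.5 * dx^2
    dxs = [1 / 16, 1 / 32, 1 / 64]
    errs = [0.5 * dx ** 2 for dx in dxs]
    for (e1, e2), (h1, h2) in zip(zip(errs, errs[1:]), zip(dxs, dxs[1:])):
        print(f"p = {observed_order(e1, e2, h1, h2):.3f}")
    ys = [(j + 0.5) / 8 for j in range(8)]
    exact = [poiseuille(y, 1.0, 0.1) for y in ys]
    print(l2_relative_error(exact, exact))  # 0.0 for identical profiles
```

Once round-off dominates, the error stops falling with Δx and the observed order drops below 2, which is the floor referred to for the single-precision runs.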

  • Immersed Boundary Method

    • Allows moving and complex boundaries to be created arbitrarily within a Lagrangian space.

    • No need for unstructured, body-fitted domains.

    • We use the method found in Pinelli, A., Naqavi, I., Piomelli, U., Favier, J., 2010. Immersed-boundary methods for general finite-difference and finite-volume Navier-Stokes solvers.

    Pictures from http://www.math.vt.edu/people/xuz/research.html

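  • Immersed Boundary Method – With LBM

    • Perform collision and stream:
      $$f_i(\mathbf{x}+\mathbf{e}_i, t+1) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{eq}(\mathbf{x},t)\right]$$
    • Apply boundary conditions.
    • Calculate u*: $\rho\mathbf{u}^* = \sum_i \mathbf{e}_i f_i$
    • Integrate velocities to Lagrange space: $\mathbf{U}^*(s) = \int \mathbf{u}^*(\mathbf{x})\,\tilde{\delta}(\mathbf{x} - \mathbf{X}(s))\,d\mathbf{x}$
    • Calculate the corrective force: $\mathbf{F}(s) = \frac{\hat{\mathbf{U}}(s) - \mathbf{U}^*(s)}{\Delta t}$, where $\hat{\mathbf{U}}(s)$ is the desired boundary velocity.
    • Integrate the forces back to Eulerian space: $\mathbf{f}(\mathbf{x}) = \int \mathbf{F}(s)\,\tilde{\delta}(\mathbf{x} - \mathbf{X}(s))\,ds$
    • Perform collision and stream operations with forces:
      $$f_i(\mathbf{x}+\mathbf{e}_i, t+1) = f_i(\mathbf{x},t) - \frac{1}{\tau}\left[f_i(\mathbf{x},t) - f_i^{eq}(\mathbf{x},t)\right] + F_i$$
    • Calculate ρ and u with forces: $\rho\mathbf{u} = \sum_i \mathbf{e}_i f_i + \frac{1}{2}\mathbf{f}$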
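A minimal 2D sketch of the interpolate, corrective-force and spread steps of this loop, on a unit-spaced grid with Δt = 1 (assumed). It uses the Roma et al. (1999) kernel as a tensor product in x and y and a hypothetical arc-length weight `ds` per marker; `ib_step` and `support` are illustrative names, and collision and streaming are left out. Markers must stay at least two cells from the grid edges, since there is no periodic wrap.

```python
import math


def roma(r: float) -> float:
    """1D Roma et al. (1999) kernel in grid units (support |r| <= 1.5)."""
    a = abs(r)
    if a <= 0.5:
        return (1.0 + math.sqrt(1.0 - 3.0 * a * a)) / 3.0
    if a <= 1.5:
        return (5.0 - 3.0 * a - math.sqrt(1.0 - 3.0 * (1.0 - a) ** 2)) / 6.0
    return 0.0


def support(X: float, Y: float):
    """Grid nodes (i, j) that can carry non-zero weight for a marker at (X, Y)."""
    i0, j0 = math.floor(X), math.floor(Y)
    return [(i, j) for j in range(j0 - 1, j0 + 3) for i in range(i0 - 1, i0 + 3)]


def ib_step(ux, uy, markers, u_hat, ds, dt=1.0):
    """Interpolate u*, compute F = (U_hat - U*) / dt, spread f = sum F delta ds.

    ux, uy: nested lists [j][i] holding u* on the grid.
    markers: list of (X, Y); u_hat: desired (Ux, Uy) for each marker.
    """
    ny, nx = len(ux), len(ux[0])
    fx = [[0.0] * nx for _ in range(ny)]
    fy = [[0.0] * nx for _ in range(ny)]
    forces = []
    for (X, Y), (uhx, uhy) in zip(markers, u_hat):
        nodes = support(X, Y)
        weights = [roma(i - X) * roma(j - Y) for i, j in nodes]
        # U*(s) = sum over nodes of u*(x) delta(x - X(s)), with dx = 1
        Ux = sum(w * ux[j][i] for w, (i, j) in zip(weights, nodes))
        Uy = sum(w * uy[j][i] for w, (i, j) in zip(weights, nodes))
        # Corrective force F(s) = (U_hat(s) - U*(s)) / dt
        Fx, Fy = (uhx - Ux) / dt, (uhy - Uy) / dt
        forces.append((Fx, Fy))
        # Spread: f(x) += F(s) delta(x - X(s)) ds
        for w, (i, j) in zip(weights, nodes):
            fx[j][i] += Fx * w * ds
            fy[j][i] += Fy * w * ds
    return fx, fy, forces


if __name__ == "__main__":
    n, radius, m = 16, 3.0, 20
    ux = [[0.1] * n for _ in range(n)]   # uniform u* = (0.1, 0)
    uy = [[0.0] * n for _ in range(n)]
    markers = [(8 + radius * math.cos(2 * math.pi * k / m),
                8 + radius * math.sin(2 * math.pi * k / m)) for k in range(m)]
    u_hat = [(0.0, 0.0)] * m             # stationary (no-slip) cylinder
    ds = 2 * math.pi * radius / m
    fx, fy, forces = ib_step(ux, uy, markers, u_hat, ds)
    print(forces[0], sum(map(sum, fx)))
```

Because the kernel weights sum to one, a uniform u* is interpolated exactly and the spread Eulerian force has the same total as the Lagrangian forces.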

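  • Immersed Boundary Method - Interpolation

    • $\tilde{\delta}$ is a mollifier kernel with compact support of size 3 (Roma et al., 1999), with $r$ measured in grid spacings:

    $$\tilde{\delta}(r) = \begin{cases} \dfrac{1}{3}\left(1 + \sqrt{1 - 3r^2}\right), & |r| \le 0.5 \\[4pt] \dfrac{1}{6}\left(5 - 3|r| - \sqrt{1 - 3(1 - |r|)^2}\right), & 0.5 \le |r| \le 1.5 \\[4pt] 0, & \text{otherwise} \end{cases}$$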
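A short numerical check of the kernel: for any marker position X on a unit grid, the weights sum to one and their first moment vanishes. The sketch below simply evaluates the piecewise formula; the helper names are illustrative.

```python
import math


def roma_kernel(r: float) -> float:
    """1D kernel of Roma et al. (1999); r in grid spacings, support |r| <= 1.5."""
    a = abs(r)
    if a <= 0.5:
        return (1.0 + math.sqrt(1.0 - 3.0 * a * a)) / 3.0
    if a <= 1.5:
        return (5.0 - 3.0 * a - math.sqrt(1.0 - 3.0 * (1.0 - a) ** 2)) / 6.0
    return 0.0


def kernel_moments(X: float):
    """Zeroth and first discrete moments of the kernel centred at X on an integer grid."""
    nodes = range(math.floor(X) - 2, math.floor(X) + 4)
    m0 = sum(roma_kernel(j - X) for j in nodes)
    m1 = sum((j - X) * roma_kernel(j - X) for j in nodes)
    return m0, m1


if __name__ == "__main__":
    for X in (0.0, 0.3, 0.5, 2.75):
        m0, m1 = kernel_moments(X)
        print(f"X={X}: sum={m0:.12f}  first moment={m1:.1e}")
```

The unit sum makes interpolation of a uniform field exact and makes spreading conserve the total force; the zero first moment means a linear field is also interpolated exactly.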

  • Demonstrations – Flow Over a Sphere

  • Demonstration – Flexible Filaments

    • We must update the position of our immersed boundary points.

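    • A Lagrange-Euler system is solved iteratively for the tension between points and the position of the points.

    • This is currently (unfortunately!) performed on the host CPU.

    • We partially hide this cost by executing it concurrently with the LBM.

    $$\frac{\partial^2 \mathbf{X}}{\partial t^2} = \frac{\partial}{\partial s}\left(T\frac{\partial \mathbf{X}}{\partial s}\right) - \frac{\partial^2}{\partial s^2}\left(K_B\frac{\partial^2 \mathbf{X}}{\partial s^2}\right) + \mathbf{g} - \mathbf{F}$$

    where $T$ is the tension and $K_B$ the bending rigidity.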

  • Implementation & Performance - LBM

    • Operating on all indices of $f_i$ in one thread helps to hide latency through instruction-level parallelism (ILP): lower occupancy, but higher performance (Volkov, 2010).

    • Unrolling the streaming loop allows 19 requests to DRAM to be made with only register-level stalls.
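Volkov's argument can be made concrete with Little's law: to saturate DRAM, bandwidth × latency bytes must be in flight at once. Only the 66 GB/s bandwidth comes from the slides; the 500 ns latency is a hypothetical round number, loads are taken as 4 bytes, and "occupancy" here means the fraction of the K5000M's 7 SMX × 2048 resident threads (compute capability 3.0).

```python
def threads_needed(bandwidth_bytes_per_ns: int, latency_ns: int,
                   loads_in_flight_per_thread: int, bytes_per_load: int = 4) -> int:
    """Little's law: required concurrency = throughput x latency (integer ceiling)."""
    bytes_in_flight = bandwidth_bytes_per_ns * latency_ns
    per_thread = loads_in_flight_per_thread * bytes_per_load
    return -(-bytes_in_flight // per_thread)  # ceiling division


if __name__ == "__main__":
    bw = 66                  # 66 GB/s is 66 bytes per ns
    latency = 500            # ns, hypothetical DRAM latency
    max_resident = 7 * 2048  # 7 SMX x 2048 threads
    for ilp in (1, 19):      # one outstanding load vs. all 19 D3Q19 loads unrolled
        n = threads_needed(bw, latency, ilp)
        print(f"ILP={ilp:2d}: {n} threads, {100 * n / max_resident:.1f}% occupancy")
```

With 19 independent loads per thread, roughly 19 times fewer threads are needed to keep the same number of bytes in flight, which is why lower occupancy does not have to cost bandwidth.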

  • Implementation & Performance - LBM

    Use "Struct of Arrays" access to $f_i$. Coalesced access therefore only depends on the x component of $\mathbf{e}_i$ (fully coalesced during collision).

    Cache hit rate is low as we don’t have repeat accesses; 0% in L1 and
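The claim that coalescing depends only on the x component of e_i can be checked by counting the memory segments one warp touches. This sketch assumes an SoA index i·NX·NY + y·NX + x, 4-byte floats, 128-byte segments, a 32-thread warp reading 32 consecutive x values in a pull-style stream from (x - e_x, y - e_y), and NX a multiple of 32; real transaction rules (for example 32-byte L2 sectors) are more detailed.

```python
def soa_index(i, x, y, nx, ny):
    """Struct-of-arrays layout: all values of population i are contiguous."""
    return i * nx * ny + y * nx + x


def segments_touched(e, i, x0, y, nx, ny, warp=32, bytes_per_value=4, segment=128):
    """Distinct `segment`-byte blocks read when threads x0..x0+warp-1 pull
    population i from (x - e_x, y - e_y). Keep x0 interior: no periodic wrap."""
    ex, ey = e
    addrs = [soa_index(i, x - ex, y - ey, nx, ny) * bytes_per_value
             for x in range(x0, x0 + warp)]
    return len({a // segment for a in addrs})


if __name__ == "__main__":
    nx, ny = 128, 64
    for e in [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0), (1, 1)]:
        print(e, segments_touched(e, i=2, x0=32, y=10, nx=nx, ny=ny))
```

A shift in y moves the whole warp by a full row, so the access stays aligned; a shift in x offsets it by one element and splits it across two segments.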

  • Implementation & Performance - LBM

    • 2D LBM

    [Figure: MLUPS vs. number of lattice points, 256^2 to 2048^2]

    • 3D LBM

    [Figure: MLUPS vs. number of lattice points, 64^3 to 128^3; present work compared with Asinari et al. (2011), Obrecht et al. (2010) and Rinaldi et al. (2012)]

  • Implementation & Performance - LBM

    • 3D LBM, if we scale for the bandwidth of the GPU:

    [Figure: MLUPS scaled for bandwidth vs. number of lattice points, 64^3 to 128^3; present work compared with Obrecht et al. (2010) and Rinaldi et al. (2012)]

    [Figure: MLUPS vs. number of lattice points, 64^3 to 128^3; present work compared with Astorino et al. (2011), Obrecht et al. (2010) and Rinaldi et al. (2012)]
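One reading of "scaled for the bandwidth of the GPU" is to multiply each published MLUPS figure by the ratio of the K5000M's bandwidth to that of the card it was measured on; this interpretation is an assumption. The sketch also converts an MLUPS figure into the fraction of DRAM bandwidth it implies, assuming 2·q·4 bytes per node update. All numbers in the example are hypothetical, not read from the plots.

```python
def scale_to_bandwidth(mlups: float, bw_source_gb_s: float, bw_target_gb_s: float) -> float:
    """Rescale a bandwidth-bound result measured on one GPU to another GPU."""
    return mlups * bw_target_gb_s / bw_source_gb_s


def bandwidth_fraction(mlups: float, q: int, bandwidth_gb_s: float,
                       bytes_per_value: int = 4) -> float:
    """Fraction of DRAM bandwidth implied by an MLUPS figure,
    assuming each update reads and writes q values once."""
    bytes_per_s = mlups * 1e6 * 2 * q * bytes_per_value
    return bytes_per_s / (bandwidth_gb_s * 1e9)


if __name__ == "__main__":
    # Hypothetical: a 600 MLUPS result from a 150 GB/s card, viewed at 66 GB/s
    print(scale_to_bandwidth(600.0, 150.0, 66.0))
    # Hypothetical: 300 MLUPS with D3Q19 on a 66 GB/s card
    print(bandwidth_fraction(300.0, 19, 66.0))
```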

  • Implementation & Performance – IB

    • Transactions involving information for each boundary point are coalesced: each point only needs information about itself.

    • Transactions for moving data between fluid and boundary are random: much higher cache use (≈40%).

    [Figure: Lagrange information and fluid information]

  • Implementation & Performance – IB

    • In 2D we only use a few hundred Lagrange markers per object.

    • We can assign one block of threads per object (1024 points max).

    • In 3D several thousand Lagrange markers are needed (4000 for the sphere demonstration).

    • We need to launch one kernel per object.

    • Launching kernels in different streams can improve the utilisation of the GPU, if the objects are small.

  • Performance – Immersed Boundary

    • 3.4ms in serial

  • Summary

    • Lattice Boltzmann-Immersed Boundary solvers presented in 2D and 3D: a relatively simple alternative to unstructured domains. Both methods suit parallelisation.

    • Real-time simulations are possible thanks to GPU acceleration.

    • We don't always need high occupancy or the use of shared/cache memory to achieve high performance.

  • Future work

    “Interactive in-silico Platform for Optimising Surgical Procedure of Abdominal

    Aortic Aneurysm Repair and Evaluation of Stent Performance.”

    • Personalised surgery simulation for stent implants, i.e. real-time with user interaction.

    • Medical images converted into CAD designs and imported into the fluid solver as an immersed boundary.

    • A few years away, but this work lays the foundations for such a project.

  • Any Questions?

    www.mark.j.mawson.blogspot.com

    http://www.youtube.com/user/mjmawson

    http://www.youtube.com/mcji8ar2


  • References

    • Bhatnagar, P. L., Gross, E. P. & Krook, M., 1954. A model for collision processes in gases. I. Small amplitude processes in charged and neutral one-component systems. Physical Review 94, 511-525.

    • Guo, Z., Zheng, C. & Shi, B., 2002. Discrete lattice effects on the forcing term in the lattice Boltzmann method. Physical Review E 65, 046308.

    • Roma, A. M., Peskin, C. S. & Berger, M. J., 1999. An adaptive version of the immersed boundary method. Journal of Computational Physics 153, 509-534.

    • Volkov, V., 2010. Better Performance at Lower Occupancy. GTC 2010.