
DRAFT

Paper 1
© Civil-Comp Press, 2017
Proceedings of the Fifth International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering,
P. Iványi, B.H.V Topping and G. Várady (Editors),
Civil-Comp Press, Stirlingshire, Scotland

A Massively-Parallel Multicore Acceleration of a Point Contact Solid Mechanics Simulation

M. Kolman, G. Kosec
Parallel and Distributed Systems Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia

Abstract

This paper deals with the numerical determination of the stress and displacement distribution in a solid body subjected to an applied external force. The tackled solid mechanics problem is governed by the Navier-Cauchy equation, which describes the deformation within the solid body through the displacement vector field. To obtain the solution, a coupled system of non-linear Partial Differential Equations (PDE) of second order has to be solved. In this paper, the problem is approached by a strong form Moving Least Squares (MLS) based numerical discretization, also referred to as a Meshless Local Strong Form Method (MLSM). A generic C++ implementation of the MLSM is used to demonstrate a parallel solution of a Point Contact problem on an Intel® Xeon Phi™ multicore accelerator. All tests are executed either on the host machine with two Intel® Xeon® E5-2620 v3 6 core processors or offloaded to its 60 core Intel® Xeon Phi™ SE10/7120 series coprocessor. The shared memory parallelization is implemented through the OpenMP API.

Keywords: MLSM, meshless, OpenMP, Intel Xeon Phi coprocessor, parallel implementation

1 Introduction

In the majority of numerical simulations, solid mechanics problems are tackled with the Finite Element Method (FEM) [1]. However, as an alternative to the mesh based FEM, a new class of numerical methods, referred to as Meshless methods, has emerged to alleviate the meshing complexity, which is in many cases, especially in 3D, still the most cumbersome part of the solution procedure. Both the weak and strong form variants of Meshless methods have already been applied to solid mechanics problems [2, 3].


In this paper we consider a Strong Form Meshless method that is based on the MLS approximation [4]. The most important features of the employed numerical method are its locality and generality. The locality is manifested by the fact that the evaluation of partial differential operators, which is the main part of the solution process, relies only on a small number of surrounding computational nodes. This is important from the computational point of view, since it reduces the inter-processor communication, which is often the bottleneck of parallel algorithms [5]. The generality, on the other hand, arises from the fact that all the building blocks of the method depend only on the distance between the computational nodes. This is a very useful feature, especially when dealing with problems in multidimensional spaces, complex geometries, and moving boundaries. The generality of the method also allows for an elegant computer implementation. All these facts make the MLSM an attractive alternative to classical approaches such as the Finite Difference Method, which suffers from geometrical limitations, and the more general weak form Finite Element Method, which requires meshing.

2 Point contact problem

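The Navier-Cauchy equations describe the dynamics of a solid through the displacement vector field $\mathbf{u}$, expressed concisely in vector form as

$$
\rho \frac{\partial^2 \mathbf{u}}{\partial t^2} = (\lambda + \mu)\nabla(\nabla \cdot \mathbf{u}) + \mu \nabla^2 \mathbf{u} - D \frac{\partial \mathbf{u}}{\partial t}, \tag{1}
$$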

where $\mu$ and $\lambda$ stand for the Lamé constants, $\rho$ is the density, and $D$ is the damping coefficient. The point contact problem on the lower half-plane, i.e. a point load applied to the surface of the body, is characterised by the traction boundary conditions

$$
\sigma_{xy}(x, 0) = 0, \quad \sigma_{yy}(x, y) = -P\delta(x, y), \tag{2}
$$

where $\sigma_{xy}$ and $\sigma_{yy}$ are the shear and normal stress tensor components, and $\delta(x, y)$ is the Dirac delta function. These boundary conditions state that there is a singular normal force $P$ applied at $(x, y) = (0, 0)$ and that there are no shear stresses on the surface of the elastic half-plane. The problem's solution can be expressed in a closed form and is therefore ideal for testing purposes. The solution is presented as

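$$
\sigma_{xx} = -\frac{2P}{\pi} \frac{x^2 y}{(x^2 + y^2)^2}, \quad
\sigma_{yy} = -\frac{2P}{\pi} \frac{y^3}{(x^2 + y^2)^2}, \quad
\sigma_{xy} = -\frac{2P}{\pi} \frac{x y^2}{(x^2 + y^2)^2}, \tag{3}
$$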

for a point $(x, y)$ in the half-plane. For our purposes a solution in terms of the displacement vector $\mathbf{u} = (u_x, u_y)$ is more convenient. It can be obtained by integrating Eq. (3), which yields

$$
u_x = -\frac{P}{4\pi\mu} \left( (\kappa - 1)\theta - \frac{2xy}{r^2} \right), \tag{4}
$$

$$
u_y = -\frac{P}{4\pi\mu} \left( (\kappa + 1)\log r + \frac{2x^2}{r^2} \right), \tag{5}
$$


where $r = \sqrt{x^2 + y^2}$, $\tan\theta = x/y$, and $\kappa = 3 - 4\nu$, with $\nu$ standing for the Poisson ratio.
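The closed-form displacement of Eqs. (4) and (5) can be evaluated directly, for example to prescribe boundary values or to check a numerical result. The following is a minimal sketch, not part of the MLSM library: the function name `point_contact_displacement` is hypothetical, and the angle is taken on the principal branch $\theta = \arctan(x/y)$, which is an assumption that keeps $\theta$ continuous across $x = 0$ for points with $y < 0$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Closed-form displacement (u_x, u_y) of Eqs. (4)-(5) at a point (x, y)
// strictly below the surface (y != 0). Hypothetical helper, not library code.
std::array<double, 2> point_contact_displacement(
        double x, double y, double P, double mu, double nu) {
    assert(y != 0.0);  // theta = atan(x / y) is undefined on the surface y = 0
    const double pi = std::acos(-1.0);
    const double r2 = x * x + y * y;
    const double kappa = 3.0 - 4.0 * nu;
    const double theta = std::atan(x / y);  // assumption: principal branch
    const double c = -P / (4.0 * pi * mu);
    const double ux = c * ((kappa - 1.0) * theta - 2.0 * x * y / r2);
    const double uy = c * ((kappa + 1.0) * std::log(std::sqrt(r2)) + 2.0 * x * x / r2);
    return {ux, uy};
}
```

With this branch choice $u_x$ is odd and $u_y$ is even in $x$, as expected for a load applied at $x = 0$.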

3 Meshless Local Strong Form numerical approach

To numerically solve the problem at hand, we discretize the domain into a finite set of nodes, at which the partial differential operators occurring in Eq. (1) are evaluated. The core concept of the spatial discretization used here, namely the MLSM, is a local Moving Least Squares (MLS) approximation of a considered field over overlapping local support domains. In each node we create an approximation function over a small local subset of $n$ neighbouring nodes, acting as a trial function

$$
\hat{u}(\mathbf{p}) = \sum_{i=1}^{m} \alpha_i b_i(\mathbf{p}) = \mathbf{b}(\mathbf{p})^{T} \boldsymbol{\alpha}, \tag{6}
$$

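with $m$, $\boldsymbol{\alpha}$, $\mathbf{b}$ and $\mathbf{p} = (x, y)$ standing for the number of basis functions, approximation coefficients, basis functions and the position vector, respectively. The problem can be written in matrix form as

$$
\boldsymbol{\alpha} = \left( \mathbf{W}^{0.5} \mathbf{B} \right)^{+} \mathbf{W}^{0.5} \mathbf{u}, \tag{7}
$$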

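where $\mathbf{W}$ is the diagonal matrix of weights at the support nodes, $\mathbf{B}$ is the matrix of basis functions evaluated at the support nodes, $\mathbf{u}$ is the vector of field values in the support, and $(\mathbf{W}^{0.5}\mathbf{B})^{+}$ stands for the Moore–Penrose pseudoinverse. By explicitly inserting the coefficients $\boldsymbol{\alpha}$ into the trial function one gets

$$
\hat{u}(\mathbf{p}) = \mathbf{b}(\mathbf{p})^{T} \left( \mathbf{W}^{0.5}(\mathbf{p}) \mathbf{B} \right)^{+} \mathbf{W}^{0.5}(\mathbf{p}) \mathbf{u} = \boldsymbol{\chi}(\mathbf{p}) \mathbf{u}, \tag{8}
$$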

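where $\boldsymbol{\chi}$ stands for the shape functions. Now we can apply any partial differential operator $\mathcal{L}$, as is our goal, to the trial function

$$
\mathcal{L}\hat{u}(\mathbf{p}) = \mathcal{L}\boldsymbol{\chi}(\mathbf{p}) \mathbf{u}. \tag{9}
$$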

The presented formulation is convenient for implementation since most of the complex operations, i.e. finding support nodes and building shape functions, are performed only when the nodal topology changes. During computation the pre-computed shape functions are convolved with the vector of field values in the support to evaluate the desired operator.
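Evaluating an operator with pre-computed shape functions therefore reduces to one short dot product per node. The sketch below illustrates this under simple assumptions: the function name `apply_operator` and the nested-vector storage of support indices and shape values are hypothetical choices for illustration, not the data layout of the MLSM library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates (L u)(p_i) = sum_k shape[i][k] * u[support[i][k]] for every node i.
// support[i] lists the indices of the support nodes of node i and shape[i]
// holds the matching pre-computed shape function values (hypothetical layout).
std::vector<double> apply_operator(
        const std::vector<std::vector<std::size_t>>& support,
        const std::vector<std::vector<double>>& shape,
        const std::vector<double>& u) {
    assert(support.size() == shape.size());
    std::vector<double> result(support.size(), 0.0);
    for (std::size_t i = 0; i < support.size(); ++i) {
        assert(support[i].size() == shape[i].size());
        double sum = 0.0;
        for (std::size_t k = 0; k < support[i].size(); ++k) {
            sum += shape[i][k] * u[support[i][k]];
        }
        result[i] = sum;
    }
    return result;
}
```

Each iteration of the outer loop reads only the values in its own support and writes only its own entry of the result, which is the locality that makes the evaluation straightforward to parallelize.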

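With the explicit temporal discretization, i.e. a central difference for the second time derivative and a backward difference for the damping term, the solution of Eq. (1) is formulated as

$$
\mathbf{u}_3 = \frac{\Delta t^2}{\rho} \left( (\lambda + \mu)\nabla(\nabla \cdot \mathbf{u}_2) + \mu \nabla^2 \mathbf{u}_2 - D \frac{\mathbf{u}_2 - \mathbf{u}_1}{\Delta t} \right) + 2\mathbf{u}_2 - \mathbf{u}_1, \tag{10}
$$

where $\mathbf{u}_1$, $\mathbf{u}_2$ and $\mathbf{u}_3$ stand for the displacement vector at the two previous time steps and at the one currently computed, respectively. The operators $\mathcal{L} = \nabla\cdot, \nabla, \nabla^2$ are computed with Equations (8) and (9) [4].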
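For reference, the explicit update can be derived from Eq. (1) in three lines. This is a sketch that assumes a central difference for the second time derivative and a backward difference for the damping term, which is the choice consistent with the update implemented in Section 4.2; the symbols are those of Eqs. (1) and (10).

```latex
\begin{align*}
\frac{\partial^2 \mathbf{u}}{\partial t^2} &\approx
  \frac{\mathbf{u}_3 - 2\mathbf{u}_2 + \mathbf{u}_1}{\Delta t^2},
\qquad
\frac{\partial \mathbf{u}}{\partial t} \approx
  \frac{\mathbf{u}_2 - \mathbf{u}_1}{\Delta t}, \\
\rho\,\frac{\mathbf{u}_3 - 2\mathbf{u}_2 + \mathbf{u}_1}{\Delta t^2} &=
  (\lambda + \mu)\nabla(\nabla \cdot \mathbf{u}_2) + \mu \nabla^2 \mathbf{u}_2
  - D\,\frac{\mathbf{u}_2 - \mathbf{u}_1}{\Delta t}, \\
\mathbf{u}_3 &= \frac{\Delta t^2}{\rho}
  \left( (\lambda + \mu)\nabla(\nabla \cdot \mathbf{u}_2) + \mu \nabla^2 \mathbf{u}_2
  - D\,\frac{\mathbf{u}_2 - \mathbf{u}_1}{\Delta t} \right)
  + 2\mathbf{u}_2 - \mathbf{u}_1 .
\end{align*}
```

All spatial operators act on the known field $\mathbf{u}_2$, so each node can be updated independently of the others.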


4 Implementation

The MLSM solution for the point contact problem is implemented in C++ and compiled either with the GNU C++ compiler g++ 5.3.0 or with Intel's C++ compiler ICPC version 16.0.1 (which has GCC version 4.8.0 compatibility), both with compiler flags -O3 -std=c++14 -fopenmp. The code is developed within an open source project and is freely available at [6]. It implements different domain classes, a Nearest Neighbour (kNN) search based on the kD-tree algorithm, a general Moving Least Squares (MLS) engine and its extension to a full MLSM engine.
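To make the role of the nearest neighbour search concrete, the sketch below finds the $n$ closest nodes by brute force. It is a stand-in for illustration only: the library uses a kD-tree, while this version sorts all distances, and the name `find_support` as a free function, the `Vec2` alias and the convention that a node belongs to its own support are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Vec2 = std::array<double, 2>;

// Returns the indices of the n nodes closest to positions[centre], closest
// first. For distinct node positions the centre itself comes first (distance 0).
// Brute force over all nodes; a kD-tree avoids scanning every node.
std::vector<std::size_t> find_support(
        const std::vector<Vec2>& positions, std::size_t centre, std::size_t n) {
    assert(centre < positions.size() && n <= positions.size());
    std::vector<std::size_t> idx(positions.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    auto dist2 = [&](std::size_t k) {
        const double dx = positions[k][0] - positions[centre][0];
        const double dy = positions[k][1] - positions[centre][1];
        return dx * dx + dy * dy;
    };
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(n),
                      idx.end(),
                      [&](std::size_t a, std::size_t b) { return dist2(a) < dist2(b); });
    idx.resize(n);
    return idx;
}
```

Because the support of each node is found once and then reused, the cost of the search matters only when the nodal topology changes.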

4.1 Building the shape functions

In the first major step, all the shape functions needed to calculate the required differential operator $\mathcal{L}\hat{u}(\mathbf{p})$ are prepared. This set-up procedure comprises positioning of the nodes, finding support nodes, defining the MLS approximation, and finally creating the operators class. The implementation concept is schematically presented in Figure 1, and written in C++ with the MLSM library [6] as

    // Rectangular domain spanned by its lower and upper corners.
    RectangleDomain<Vec2d> domain(domain_lo, domain_hi);
    // Uniformly distributed interior and boundary nodes with spacing d_space.
    domain.fillUniformInteriorWithStep(d_space);
    domain.fillUniformBoundaryWithStep(d_space);
    // For every node, find its n closest nodes.
    domain.findSupport(n);
    // Positions of the support nodes of the first node.
    support = domain.positions[domain.support[0]];
    // MLS approximation with Gaussian basis and Gaussian weight functions.
    EngineMLS<Vec2d, Gaussians, Gaussians> mls(
        sigmaB, m, support, sigmaW);
    // Operator object holding the shape functions for the whole domain.
    auto mlsm = make_mlsm(domain, mls);

The domain is an object containing all information about the nodes, including the support domains of all nodes. It is formed by the parameters domain_lo, domain_hi and d_space, which define its boundaries and node density. The variable mlsm is an instance of the MLSM class that is capable of calculating certain differential operators (e.g. Laplacian, grad, div, etc.) of fields defined on the domain. The variables sigmaB and sigmaW are the standard deviations of the basis functions and the weight function, respectively, both Gaussians in the present case, defined as

$$
g(\mathbf{p}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{|\mathbf{p}|^2}{2\sigma^2} \right). \tag{11}
$$

Here $m$ defines the number of basis functions and $n$ the number of support nodes. In this paper the same number of basis functions and support nodes is assumed. The $n = m = 9$ assumption effectively reduces the approximation to collocation, which is a popular set-up in the meshless community [7, 8], especially when regular nodal distributions are used. However, in cases where irregular distributions are needed, the overdetermined MLS is preferred [4].
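In the collocation case $n = m$ the matrix $\mathbf{B}$ is square, so for an invertible $\mathbf{B}$ the weights cancel, $(\mathbf{W}^{0.5}\mathbf{B})^{+}\mathbf{W}^{0.5} = \mathbf{B}^{-1}$, and the shape function of an operator follows from one small linear solve, $\mathbf{B}^{T}\boldsymbol{\chi}^{T} = (\mathcal{L}\mathbf{b})(\mathbf{p})$. The sketch below illustrates this on a deliberately simplified case that differs from the paper's set-up: one dimension, three support nodes, and monomials $\{1, x, x^2\}$ instead of Gaussians. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Solves A x = rhs by Gaussian elimination with partial pivoting.
Vec3 solve3(Mat3 A, Vec3 rhs) {
    for (std::size_t col = 0; col < 3; ++col) {
        std::size_t pivot = col;
        for (std::size_t row = col + 1; row < 3; ++row)
            if (std::abs(A[row][col]) > std::abs(A[pivot][col])) pivot = row;
        assert(std::abs(A[pivot][col]) > 0.0);  // B must be invertible
        std::swap(A[col], A[pivot]);
        std::swap(rhs[col], rhs[pivot]);
        for (std::size_t row = col + 1; row < 3; ++row) {
            const double f = A[row][col] / A[col][col];
            for (std::size_t k = col; k < 3; ++k) A[row][k] -= f * A[col][k];
            rhs[row] -= f * rhs[col];
        }
    }
    Vec3 x{};
    for (std::size_t i = 3; i-- > 0;) {  // back substitution
        double s = rhs[i];
        for (std::size_t k = i + 1; k < 3; ++k) s -= A[i][k] * x[k];
        x[i] = s / A[i][i];
    }
    return x;
}

// Shape function of d^2/dx^2 at the centre of the support {-h, 0, h} for the
// monomial basis {1, x, x^2}: solves B^T chi = (L b)(0).
Vec3 second_derivative_shape(double h) {
    const Vec3 nodes = {-h, 0.0, h};
    Mat3 Bt{};  // Bt[i][j] = b_i(nodes[j]), i.e. the transpose of B
    for (std::size_t j = 0; j < 3; ++j) {
        Bt[0][j] = 1.0;
        Bt[1][j] = nodes[j];
        Bt[2][j] = nodes[j] * nodes[j];
    }
    const Vec3 Lb = {0.0, 0.0, 2.0};  // second derivatives of 1, x, x^2 at 0
    return solve3(Bt, Lb);
}
```

For this regular support the result is the familiar finite difference stencil $(1, -2, 1)/h^2$, which shows how collocation on regular nodes reproduces classical finite differences.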


Figure 1: A MLSM implementation diagram.

4.2 Time simulation

Before the time stepping, the boundary conditions and the initial state are set. The boundaries are assumed to be of Dirichlet type with values obtained from the closed form solution, while the initial state of the displacement is set to zero throughout the whole domain, implemented as

    // Displacement fields at three consecutive time steps, initially zero.
    Range<vec_t> u_1(domain.size(), 0),
                 u_2(domain.size(), 0),
                 u_3(domain.size(), 0);

    // Dirichlet boundary values taken from the closed form solution.
    u_3[boundary] = u_analytical(boundary);

Note that boundary is a vector of indices of the boundary nodes.

With the prepared operators, the two known consecutive previous time steps and the boundary conditions, Eq. (10) can be numerically solved as

    // One explicit time step; iterations over nodes are independent.
    #pragma omp parallel for private(i) schedule(static)
    for (i = 0; i < N; ++i)
        u_3[i] = dt * dt / rho * (
            mu * mlsm.lap(u_2, i) +
            E / (2 - 2 * nu) * mlsm.graddiv(u_2, i) -
            D * (u_2[i] - u_1[i]) / dt)
            + 2 * u_2[i] - u_1[i];

Here N is the number of nodes in the domain and dt is the size of the time step. The variables mu, rho, E and nu are physical constants of the material, corresponding to the Lamé constant µ (the shear modulus), density ρ, Young's modulus E and Poisson's ratio ν, respectively. The variable mlsm is an operator object holding the shape functions and the procedures for computing the partial differential operators. Since the loop iterations are independent, the whole process can be easily parallelized with the OpenMP API by using the #pragma omp compiler directive.

Once $\mathbf{u}_3$ is computed for all nodes in the domain, the step forward can be performed simply by $\mathbf{u}_1 = \mathbf{u}_2$ and $\mathbf{u}_2 = \mathbf{u}_3$. This iterative process takes place until the steady state is achieved.
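The complete stepping loop, including the step forward and a steady-state check, can be sketched on a much simpler model problem. The example below is an illustration only and not the paper's solver: it uses a one-dimensional damped wave equation $\rho u_{tt} = \mu u_{xx} - D u_t$ on $[0, 1]$ with $u(0) = 0$ and $u(1) = 1$, a finite difference Laplacian in place of the MLSM operators, and a stopping criterion (largest change per step below a tolerance) that is an assumption, since the paper does not state its criterion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Marches rho u_tt = mu u_xx - D u_t on n uniform nodes in [0, 1] to steady
// state with the three-level explicit scheme; returns the final field.
std::vector<double> march_to_steady_state(
        std::size_t n, double rho, double mu, double D, double dt,
        double tol, std::size_t max_steps) {
    assert(n >= 3);
    const double h = 1.0 / static_cast<double>(n - 1);
    std::vector<double> u1(n, 0.0), u2(n, 0.0), u3(n, 0.0);
    u1.back() = u2.back() = u3.back() = 1.0;  // Dirichlet value, never updated
    for (std::size_t step = 0; step < max_steps; ++step) {
        double change = 0.0;
        for (std::size_t i = 1; i + 1 < n; ++i) {
            const double lap = (u2[i - 1] - 2.0 * u2[i] + u2[i + 1]) / (h * h);
            u3[i] = dt * dt / rho * (mu * lap - D * (u2[i] - u1[i]) / dt)
                    + 2.0 * u2[i] - u1[i];
            change = std::max(change, std::abs(u3[i] - u2[i]));
        }
        // Step forward: u1 <- u2, u2 <- u3. Swapping reuses the storage of
        // the oldest field for the next u3 instead of copying.
        std::swap(u1, u2);
        std::swap(u2, u3);
        if (change < tol) break;
    }
    return u2;
}
```

For this model problem the steady state is the straight line $u = x$. Swapping the buffers rather than copying them keeps the cost of the step forward independent of the number of nodes.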

5 Results

5.1 Solution of a point contact problem

The point contact problem is solved on a domain $(x, y) \in \Omega = [-1, 1] \times [-1, -0.01]$. The standard deviation of the basis functions is 70 times the domain characteristic distance, i.e. the average distance to the closest neighbour. The physical constants are set to $\rho = 7874$ [kg/m$^3$], $\nu = 0.25$, $E = 210 \cdot 10^{9}$ [Pa], $D = 10^{9}$ and the applied force is $P = -1000$ [Pa]. All calculations are done with $dt = 10^{-7}$ [s].

In Figure 2 a MLSM solution $\hat{\mathbf{u}}(\mathbf{p})$ computed with $N = 80601$ regularly distributed nodes is presented and compared against the known analytical solution $\mathbf{u}(\mathbf{p})$, with the relative displacement error computed as

$$
E = \frac{|\hat{\mathbf{u}}(\mathbf{p}) - \mathbf{u}(\mathbf{p})|}{|\mathbf{u}(\mathbf{p})| + 10^{-10}} \tag{12}
$$

and visualized through the color map. The displacement field has been, for the sake of visibility, multiplied by a factor of $5 \cdot 10^{6}$.

Figure 2: Relative displacement error $E$ displayed on displaced nodes, with displacement magnified by a factor of $5 \cdot 10^{6}$.
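The error measure of Eq. (12) is evaluated node by node. A minimal sketch follows; the function names are hypothetical, and it assumes that the difference is normalised by the magnitude of the analytical displacement, with the constant $10^{-10}$ guarding against division by zero.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;

// Euclidean length of a 2D vector.
double length(const Vec2& v) { return std::hypot(v[0], v[1]); }

// Relative displacement error of Eq. (12) at a single node.
// Assumption: normalisation by the analytical displacement magnitude.
double relative_error(const Vec2& numerical, const Vec2& analytical) {
    const Vec2 diff = {numerical[0] - analytical[0], numerical[1] - analytical[1]};
    return length(diff) / (length(analytical) + 1e-10);
}
```

Where both displacements vanish, the guarded denominator makes the error zero instead of undefined.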

The next comparison focuses on the displacement over two horizontal cross sections, at $y = -0.015$ and $y = -0.5$. The numerical and analytical solutions are presented in Figure 3, and it is evident that the MLSM agrees well with the known solution.


Figure 3: Two horizontal cross sections of the analytical and numerical solution for the point contact problem. (a) Horizontal displacement cross section at the top of the domain, $y = -0.015$. (b) Horizontal displacement cross section in the middle of the domain, $y = -0.5$. Both panels show $u_x$ and $u_y$ (in units of $10^{-9}$) against $x \in [-1, 1]$ for the MLSM and the analytical solution.

5.2 Execution performance

We tested the execution on a 60-core computing machine, the Intel® Xeon Phi™ Coprocessor SE10/7120 series, and its server host with two Intel® Xeon® CPU E5-2620 v3 processors. The two machines are compared in Table 1.

Name               Cores   Clock [GHz]   L2 Cache [MB]      Memory [GB]
Intel® Xeon Phi™   61      1.24          30.5 (0.5/core)    16
Intel® Xeon® CPU   2×6     2.40          2×15 (2.5/core)    64

Table 1: Specification comparison between Intel® Xeon Phi™ and Intel® Xeon® CPU.

In this preliminary study we are interested in the scalability of the execution performance. In Figures 4 and 5 a speed-up, defined as

$$
S = \frac{T(1)}{T(N_t)}, \tag{13}
$$

with $T(N_t)$ standing for the execution time on $N_t$ threads, is presented for different domain sizes and numbers of utilized threads. The domain size ranges from $N = 741$ up to $N = 7994001$ nodes. For both computers all available threads, including hyper-threading, are utilized. On the host machine the 12 physical cores in total support up to 24 threads, and on the coprocessor 120 threads can be used on its 60 physical cores.


Figure 4: Shared memory parallelization speedup with respect to the number of threads for different problem sizes. (a) Execution speedup on the Intel® Xeon® host. (b) Execution speedup on the Intel® Xeon Phi™ coprocessor. Both panels show the speedup $S$ against the number of threads $N_t$, with one curve per problem size from $N = 741$ to $N = 7994001$.

Increasing the number of threads on the host is beneficial only up to the number of physical cores, while on the coprocessor using two threads per core further improves the results. It can also be seen that the coprocessor requires much bigger problems to show its full potential in terms of scalability. The maximal efficiency ($S/N_t$) of 0.6 is achieved on the host already on relatively small systems ($N = 10^{5}$), while the same efficiency is achieved on the coprocessor only with systems consisting of $N = 8 \cdot 10^{6}$ nodes.
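Both measures used above are simple ratios of measured wall-clock times. A minimal sketch with hypothetical function names follows; the timings passed in are whatever the user has measured, and no values from the paper's experiments are assumed.

```cpp
#include <cassert>

// Speed-up S = T(1) / T(N_t) of Eq. (13).
double speedup(double t_single, double t_parallel) {
    assert(t_single > 0.0 && t_parallel > 0.0);
    return t_single / t_parallel;
}

// Parallel efficiency S / N_t.
double efficiency(double t_single, double t_parallel, int n_threads) {
    assert(n_threads > 0);
    return speedup(t_single, t_parallel) / n_threads;
}
```

An efficiency of 1 corresponds to ideal scaling, so a run that is six times faster on 12 threads has an efficiency of 0.5.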

6 Conclusions

In this paper we demonstrated the application of the MLSM discretization technique to the solution of a coupled system of second order partial differential equations. Namely, we solved a Navier-Cauchy equation that describes displacements in a solid body subjected to an external force.

The code has been written in C++ and executed on two different computing architectures, i.e. the Intel® Xeon® server class CPU and the Intel® Xeon Phi™ coprocessor. On both architectures a relatively good scalability has been achieved, with a maximal parallel efficiency of 0.67 for 12 cores on the host machine and 60 cores on the coprocessor.

Although the parallel efficiency on the coprocessor is good, the overall performance is not satisfactory. Our main focus in future work is to improve the performance of the MLSM on the coprocessor by improving the utilization of vectorization in the lowest level operations, namely the convolution of the shape functions and the field values in the support nodes.

Figure 5: Shared memory parallelization speedup with respect to the problem size for different numbers of utilized threads. (a) Execution speedup on the Intel® Xeon® host, for $N_t$ = 1, 2, 4, 8, 12, 16 and 24. (b) Execution speedup on the Intel® Xeon Phi™ coprocessor, for $N_t$ = 1, 2, 5, 10, 30, 60 and 120. Both panels show the speedup $S$ against the number of nodes $N$ from $10^{3}$ to $10^{7}$.

Acknowledgement

The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0095).

References

[1] O.C. Zienkiewicz, R.L. Taylor, The Finite Element Method: Solid Mechanics, Butterworth-Heinemann, 2000.

[2] Y. Chen, J.D. Lee, A. Eskandarian, Meshless methods in solid mechanics, Springer, New York, NY, 2006.

[3] B. Mavric, B. Šarler, “Local radial basis function collocation method for linear thermoelasticity in two dimensions”, International Journal of Numerical Methods for Heat and Fluid Flow, 25: 1488–1510, 2015.

[4] G. Kosec, “A local numerical solution of a fluid-flow problem on an irregular domain”, Advances in Engineering Software, in press, 2016.


[5] R. Trobec, M. Šterk, B. Robic, “Computational complexity and parallelization of the meshless local Petrov-Galerkin method”, Computers and Structures, 87(1-2): 81–90, 2009.

[6] G. Kosec, M. Kolman, J. Slak, “Utilities for solving PDEs with meshless methods”, 2016, URL https://gitlab.com/e62Lab/e62numcodes.git.

[7] M. Zerroukat, H. Power, C.S. Chen, “A numerical method for heat transfer problems using collocation and radial basis functions”, International Journal of Numerical Methods in Engineering, 42: 1263–1278, 1998.

[8] S. Chantasiriwan, “Performance of multiquadric collocation method in solving lid-driven cavity flow problem with low reynolds number”, CMES: Computer Modeling in Engineering and Sciences, 15: 137–146, 2006.
