a comparative study of conforming and nonconforming high ... · rotated bilinear ~q 1 fe nodal...
TRANSCRIPT
![Page 1: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/1.jpg)
A Comparative Study of Conformingand Nonconforming High-Resolution
Finite Element SchemesMatthias Möller
Institute of Applied Mathematics (LS3)TU Dortmund, Germany
European Seminar on ComputingPilsen, Czech Republic
June 25, 2012
Montag, 25. Juni 12
![Page 2: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/2.jpg)
Family of AFC schemes Kuzmin et al.
MLu = [K+D]u+ F(u,u)
artificial diffusion operator
antidiffusive correctionlumped mass matrix
Montag, 25. Juni 12
![Page 3: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/3.jpg)
linearized flux-correction
nonlinear flux-correction
Family of AFC schemes Kuzmin et al.
NL-GP NL-FCT NL-TVD Low-order
MLu = MLuL + F (uL, uL)Lin-FCT
F (u) = �DuF (u, u) = [ML �MC ]u�Du F ⌘ 0
MLu = [K+D]u+ F(u,u)
Montag, 25. Juni 12
![Page 4: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/4.jpg)
linearized flux-correction
nonlinear flux-correction
Family of AFC schemes Kuzmin et al.
AFC-LPT. . .
NL-GP NL-FCT NL-TVD Low-order
MLu = MLuL + F (uL, uL)Lin-FCT
F (u) = �DuF (u, u) = [ML �MC ]u�Du F ⌘ 0
MLu = [K+D]u+ F(u,u)
Montag, 25. Juni 12
![Page 5: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/5.jpg)
Review of design principles
■ Jameson‘s Local Extremum Diminishing criterion
IF
THEN local solution maxima/minima do not increase/decrease
positive not negative
miui =X
j 6=i
�ij(uj � ui)
Montag, 25. Juni 12
![Page 6: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/6.jpg)
Review of design principles
■ Jameson‘s Local Extremum Diminishing criterion
IF
THEN local solution maxima/minima do not increase/decrease
■ Semi-discrete high-resolution scheme
positive not negative
miui =X
j 6=i
�ij(uj � ui)
miui =X
j 6=i
(kij + dij)(uj � ui) + �iui +X
j 6=i
↵ijfij
mi =X
jmij
not negative by construction of
diffusion coefficient
controlled byflux limiter
Montag, 25. Juni 12
![Page 7: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/7.jpg)
Review of design principles
■ Jameson‘s Local Extremum Diminishing criterion
IF
THEN local solution maxima/minima do not increase/decrease
■ Semi-discrete high-resolution scheme
■ Edge-based assembly of operators/vectors is feasible and efficient!
positive not negative
miui =X
j 6=i
�ij(uj � ui)
miui =X
j 6=i
(kij + dij)(uj � ui) + �iui +X
j 6=i
↵ijfij
mi =X
jmij
not negative by construction of
diffusion coefficient
controlled byflux limiter
Montag, 25. Juni 12
![Page 8: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/8.jpg)
Bilinear Q1 FE
■ nodal values at cell vertices
Rotated bilinear ~Q1 FE
■ nodal values at cell midpoints
■ integral mean-values at edges
Finite Elements
Montag, 25. Juni 12
![Page 9: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/9.jpg)
■ Uniform 128x128 grid of unit square with 5% stochastic disturbance
Finite Element properties
△(A) MC ML NEQ NA
Q1 8 ✓ ✓ 16,641 82,680
~Q1par 6 ✓ ✓ 33,024 140,483
~Q1np 6 ✓ ✓ 33,024 140,478
~Q1par 6 -7.2E-07 (✓) 33,024 138,204
~Q1np 6 -7.2E-07 (✓) 33,024 138,576
meanvalue
pointvalue
Montag, 25. Juni 12
![Page 10: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/10.jpg)
■ Uniform 128x128 grid of unit square with 5% stochastic disturbance
Finite Element properties
△(A) MC ML NEQ NA
Q1 8 ✓ ✓ 16,641 82,680
~Q1par 6 ✓ ✓ 33,024 140,483
~Q1np 6 ✓ ✓ 33,024 140,478
~Q1par 6 -7.2E-07 (✓) 33,024 138,204
~Q1np 6 -7.2E-07 (✓) 33,024 138,576
meanvalue
pointvalue
Montag, 25. Juni 12
![Page 11: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/11.jpg)
Finite Element properties
Montag, 25. Juni 12
![Page 12: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/12.jpg)
Finite Element properties
0 10 20 30 40 50
0
5
10
15
20
25
30
35
40
45
50
nz = 369
Q1 FE
Montag, 25. Juni 12
![Page 13: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/13.jpg)
Finite Element properties
0 10 20 30 40 50
0
5
10
15
20
25
30
35
40
45
50
nz = 369
Q1 FE
0 10 20 30 40 50 60 70 80
0
10
20
30
40
50
60
70
80
nz = 530
~Q1 FE
Montag, 25. Juni 12
![Page 14: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/14.jpg)
Finite Element properties
0 10 20 30 40 50
0
5
10
15
20
25
30
35
40
45
50
nz = 369
Q1 FE
0 10 20 30 40 50 60 70 80
0
10
20
30
40
50
60
70
80
nz = 530
~Q1 FE
3456789
101112
0 10 20 30 40 50 60 70 80
Number of non-zero entries per matrix row
Number of matrix row
Q1~Q1
Storage format:■ CRS, CCS
Storage format:■ ELLPACK
Montag, 25. Juni 12
![Page 15: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/15.jpg)
Solid body rotation
Pure convection problem
u+r · (vu) = 0 in ⌦ = (0, 1)2
u = 0 on �
inflow
■ Velocity field
■ Grid size
■ Stochastic grid disturbance
■ Time step in Crank-Nicolson
■ Initial = exact solution at
v(x, y) = (0.5� y, x� 0.5)
�t = 1.28 · h
t = 2⇡k, k 2 N
~Q1 FE NL-FCTsim. time t=2π
� 2 {0%, 1%, 5%}
h = 1/2l, l = 5, 6, . . .
Montag, 25. Juni 12
![Page 16: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/16.jpg)
SBR: L2-error (0% mesh disturbance)
0,01
0,1
1
5 6 7 8
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,01
0,1
1
5 6 7 8
~Q1 FE
Refinement level
Montag, 25. Juni 12
![Page 17: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/17.jpg)
SBR: L2-error (5% mesh disturbance)
0,01
0,1
1
5 6 7 8
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,01
0,1
1
5 6 7 8
~Q1 FE
Refinement level
Montag, 25. Juni 12
![Page 18: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/18.jpg)
Rotation of a Gaussian hill
■ Convection-diffusion equation
u+r · (vu� ✏ru) = 0
in ⌦ = (�1, 1)2
■ Velocity field, diffusion coefficient
■ Analytical solution
■ Position of the peak
■ Other parameters as before
v(x, y) = (�y, x), ✏ = 0.001
x(t) = �0.5 sin t
y(t) = 0.5 cos(t)
~Q1 FE NL-FCTtime t=5/2π
u(x, y, t) =1
4⇡✏te
� r2
4✏t
r
2 = (x� x)2 + (y � y)2
Montag, 25. Juni 12
![Page 19: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/19.jpg)
RGH: L2-error (0% mesh disturbance)
0,001
0,01
0,1
1
10
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,001
0,01
0,1
1
10
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
![Page 20: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/20.jpg)
RGH: dispersion-error (0% mesh disturbance)
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
![Page 21: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/21.jpg)
RGH: L2-error (5% mesh disturbance)
0,001
0,01
0,1
1
10
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,001
0,01
0,1
1
10
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
![Page 22: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/22.jpg)
RGH: dispersion-error (5% mesh disturbance)
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
![Page 23: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/23.jpg)
3:3conforming
non
conforming
good accuracy good
small numerical diffusion small
smaller #DOFs, #edges larger
irregular sparsity pattern regular
Taxonomy of finite elements
Montag, 25. Juni 12
![Page 24: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/24.jpg)
Can nonconforming finite elements help to improve performance on many-core hardware ?
NVIDIA Kepler GK110 Die Shot (taken from: www.gpgpu.org)Montag, 25. Juni 12
![Page 25: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/25.jpg)
Example: edge-based flux-assembly with Q1 FE
0
10
20
30
40
50
0K 1K 10K 100K 1.000K
Band
wid
th (
GB/
s)
#edges per color group
1 OMP 2 OMP 4 OMP8 OMP 12 OMP Xeon X5680(2x6) CUDA C2070
best GPU performancefor large problem sizes
best CPU performanceif data fits into cache
Montag, 25. Juni 12
![Page 26: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/26.jpg)
Example: edge-based flux assembly with Q1 FE
0
25
50
75
100
125
150
CUDA OMP 6
Acc
umul
ated
tim
e (m
s)
w/ transf.„55 ms“
132 ms
23 ms
5,7x
0
10
20
30
40
50
0K
125K
250K
375K
500K
Band
wid
th (
GB/
s)
Color groups 1-14
OMP 6 on Core i7 EPCUDA on C2070#edges per color group
6-7 GB/s
Fi =N
colorsX
k=1
X
ij2CGk
�ij(Uj � Ui)
Montag, 25. Juni 12
![Page 27: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/27.jpg)
0
25
50
75
100
125
150
175
200
CUDA OMP 6
Acc
umul
ated
tim
e (m
s)
w/ transf.„95 ms“
191 ms
32 ms
5,9x
Example: edge-based flux assembly with ~Q1 FE
0
10
20
30
40
50
60
0K
375K
750K
1.125K
1.500K
Band
wid
th (
GB/
s)
Color groups 1-9
OMP 6 on Core i7 EPCUDA on C2070#edges per color group
6-7 GB/s
Montag, 25. Juni 12
![Page 28: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/28.jpg)
Summary
Nonconforming ~Q1 finite elements
■ can be used within the algebraic flux correction framework
■ are comparable to conforming FEs (accuracy/numerical diffusion)
■ increase number of DOFs as compared to conforming Q1 FEs
■ lead to system matrices with regular structure favorable for HPC
Future plans
■ Apply nonconforming AFC schemes to systems of equations
■ Exploit benefits of nonconforming FEs for many-core architectures: speed up parallel (edge-based) assembly loops, implement more efficient matrix structures (ELLPACK), reduce communication costs in parallelized code
Montag, 25. Juni 12
![Page 29: A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal values at cell midpoints integral mean-values at edges Finite Elements Montag, 25. Juni](https://reader034.vdocuments.site/reader034/viewer/2022042209/5eac9a69c970ee129109cc78/html5/thumbnails/29.jpg)
Acknowledgements
AFC schemes
■ D. Kuzmin (University of Erlangen-Nuremberg)
CUDA/GPU programming
■ D. Göddeke, D. Ribbrock, M. Geveler (TU Dortmund)
Featflow2
■ M. Köster, P. Zajac (TU Dortmund)
Source code freely available at:
http://www.featflow.de/en/software/featflow2.html
Montag, 25. Juni 12