This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-652254
Challenges Simulating Real Fuel Combustion Kinetics: The Role of GPUs
M. J. McNenly and R. A. Whitesides
GPU Technology Conference March 27, 2014 San Jose, CA
Lawrence Livermore National Laboratory
2 Lawrence Livermore National Laboratory
McNenly & Whitesides, LLNL-PRES-652254!
Enhanced understanding of high efficiency clean combustion requires expensive models that fully couple detailed kinetics with CFD
Objective: Create faster and more accurate combustion solvers.
We used to aim for detailed chemistry in highly resolved 3D simulations.
Ex. Diesel component C20H42 (LLNL): 7.2K species, 53K reaction steps
Ex. SI/HCCI transition, ~30M cells, for Bosch in LLNL’s hpc4energy incubator
§ Accelerates R&D on three major challenges identified in the DOE VTP multi-year program plan:
A. Lack of fundamental knowledge of advanced engine combustion regimes
C. Lack of modeling capability for combustion and emission control
D. Lack of effective engine controls
Now we want to use detailed chemistry in highly resolved 3D simulations.
Ex. 9-component diesel surrogate (AVFL18), C. Mueller et al., Energy Fuels, 2012: +10K species, +75K reaction steps
Long-term challenges require new algorithms and lower-cost computing architecture (flops/$)
Current industrial research achieves 30M cells with LES/flamelet models or 1.4B species-cells with RANS/detailed finite-rate kinetics
Pick one: Turbulence or Chemistry
The “John Deur [Cummins] Challenge” – all of the above with 150M cells and 10K species
• Bare memory requirements: 12TB for a snapshot
• 400 nodes of cab (2x 8 core Intel, 2GB per core)
• 2000 NVIDIA K20X (4 Indiana University Big Red II)
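The 12TB figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes one 8-byte double per species per cell (the slide does not state the storage format), 6 GB of memory per K20X card, and decimal terabytes; all variable names are illustrative.

```python
# Back-of-the-envelope check of the snapshot memory estimate.
# Assumptions (not stated on the slide): one 8-byte double per species per
# cell, 6 GB per K20X card, and 1 TB = 1e12 bytes.
CELLS = 150e6
SPECIES = 10e3
BYTES_PER_VALUE = 8

snapshot_tb = CELLS * SPECIES * BYTES_PER_VALUE / 1e12

# Aggregate memory of the two machine configurations quoted on the slide.
cab_tb = 400 * (2 * 8) * 2e9 / 1e12   # 400 nodes, 2 x 8 cores, 2 GB per core
k20x_tb = 2000 * 6e9 / 1e12           # 2000 GPUs, 6 GB each (assumed)

print(f"snapshot: {snapshot_tb:.1f} TB")
print(f"cab:      {cab_tb:.1f} TB")
print(f"K20X:     {k20x_tb:.1f} TB")
```

Under these assumptions both hardware options hold roughly one snapshot of the species field and little else, which is why the slide calls 12TB the bare requirement.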
HCCI relies on the chemical kinetics of the charge mixture, not mixing, to determine ignition timing and burn duration
Videos courtesy of Y. Ikeda (U of Kobe)
[Videos: HCCI; Spark-Ignition (SI)]
Predictive HCCI models require accurate chemistry more than detailed fluid dynamic transport
Aceves et al. SAE-2005-01-2134:!
The cylinder pressure was measured with a Kistler® 7061B pressure transducer via a Kistler® 5011 charge amplifier. For each test condition, the pressure was recorded for 100 cycles, every 0.2 crank angle degrees. The pressure data was analyzed using a single zone heat release model [19].
Figure 4. Sketch of the flat piston crown.
Figure 5. Flat piston crown used in the disc geometry.
The experiments also included in-cylinder flow measurements to determine turbulence levels. The velocity measurements were performed with a two component DANTEC® fiber-flow system illuminated by an Ar-ion laser. The measuring volume length was about 0.7 mm and the diameter about 0.05 mm. More on the LDV system specifications can be found in [17].
Optical access to the combustion chamber was through a quartz window with a 10 mm diameter placed in the location where the diesel fuel injector is normally mounted. The LDV measurement volume was located 5 mm below the cylinder head and had a 20 degree angle from the vertical axis. Figure 6 shows the measurement position and the obtained velocity components.
The seeding used was a polystyrene-latex dispersion in water. The mean polystyrene particle size was 0.28 µm and the mean droplet size from the liquid atomizers was 3 – 4 µm. The dry weight of the dispersion was less than 1 %, which means that the resulting dry particle size was below 1 µm. The obtained data rate was approximately 5 – 7 bursts per crank angle.
Figure 6. Picture showing the LDV measurement position for determining turbulence (seen from above) and the velocity components. The swirling flow in the engine is counterclockwise.
THE MODEL
Figure 7 illustrates the overall sequence of calculations required for the multi-zone model. The procedure is started by making a fluid mechanics run (KIVA3V, [20]) considering motored (no ignition) conditions. In this paper, the KIVA3V run considers only the trapped cycle, starting at intake valve closing (IVC). For the KIVA3V simulation, turbulent kinetic energy is initialized at IVC as 10% of the kinetic energy calculated by assuming that all the mass in the cylinder is moving at the mean piston speed. The turbulence length scale is initialized as the maximum intake valve lift of 1.19 cm. The dissipation of turbulence kinetic energy at IVC is calculated as a function of the initial turbulence kinetic energy and the initial turbulence length scale. All of these are standard modeling assumptions for KIVA3V. The same initial turbulence values are used for the two piston geometries. According to Hessel [21], the pre-IVC in-cylinder turbulence field is generated early in the intake stroke from the intake jets that form in the valve curtain region and from complex structures that form near the head during the piston's downward acceleration. Since the valves, head and engine speed are the same for both cases, assigning the same turbulence values at IVC seems justified. Late in the intake stroke as the piston decelerates and during early compression, the source of turbulence generation (high speed flow through the intake valves) diminishes and turbulence steadily decays. The turbulence fields generated by the different bowls play an important role later in compression where the source of turbulence is not the intake event, but instead the squish/swirl interaction.
KIVA3V calculates statistical information about the fluctuating turbulent quantities. However, only the statistical ensemble average time histories are used in the multi-zone model. Thus simulation results can be compared to experimental ensemble averages of many cycles, and cycle-by-cycle variations are not accounted for in the multi-zone model.
experimental results with a high resolution grid and a detailed chemical kinetic model. On the down side, this multi-zone model implementation does not consider the effect of mixing after ignition, which plays a minor role on combustion, but has been shown to have an effect on HC and CO emissions [16]. The purpose of this paper is to test whether the effect of turbulence on HCCI combustion can be explained exclusively in terms of temperature distribution inside the cylinder.
THE EXPERIMENT
The two cylinder geometries used in this study are produced with interchangeable piston crowns, which make it possible to use the same piston body [17, 18]. Figure 2 shows a sketch and Figure 3 a photograph of the square geometry. In the square bowl the swirling flow is believed to break up into smaller eddies in the corners. This eddy break-up generates high amounts of small scale turbulence. The squish distance with the square bowl combustion chamber is only 1 mm, giving a strong squish motion. Figures 4 and 5 show the disc geometry. For the disc combustion chamber the swirling flow is expected to be more or less undisturbed during the compression and expansion stroke. The squish distance is 12 mm with the disc geometry. Table 1 lists the main dimensions for the two crowns.
Figure 2. Sketch of the piston crown with square bowl.
Table 1. Piston Crown Specifications.
Specification | Square | Disc
Squish Distance | 1 mm | 12 mm
Topland height | 32 mm | 21 mm
Cylinder wall area at TDC | 323 cm² | 278 cm²
The experiment [15] included runs for naturally aspirated and supercharged conditions. In this paper we only analyze the supercharged cases, which used 2 bar absolute intake pressure, iso-octane fuel, 0.4 equivalence ratio and 11.2:1 compression ratio. Since the in-cylinder turbulence and mean velocities are functions of crank angle, the combustion phasing was varied from early timing with steep pressure rise to very late timing with noticeable cycle to cycle variations.
Combustion phasing was varied by changing the intake temperature with an electric heater. The engine speed was set to 1200 rpm at all conditions.
The single cylinder test engine is based on an in-line 6 cylinder Volvo® TD100 truck engine. The major engine specifications are shown in Table 2. The port fuel injector was placed approximately 300 millimeters upstream of the inlet valve. The inlet air temperature was adjusted with an electric heater mounted upstream of the fuel injector.
Figure 3. Piston crown with square bowl combustion chamber.
Table 2. Engine specifications and operating conditions.
Displaced Volume | 1600 cm³
Bore | 120.65 mm
Stroke | 140 mm
Connecting Rod Length | 260 mm
Number of Valves | 2
Inlet Valve Diameter | 50 mm
Exhaust Valve Diameter | 46 mm
Exhaust Valve Open | 39° BBDC (at 1 mm lift)
Exhaust Valve Close | 10° BTDC (at 1 mm lift)
Inlet Valve Open | 5° ATDC (at 1 mm lift)
Inlet Valve Close | 13° ABDC (at 1 mm lift)
Valve Lift Exhaust | 13.4 mm
Valve Lift Inlet | 11.9 mm
Compression Ratio | 11.2:1
Coolant Temperature | 88 °C
Oil Temperature | 90 °C
Engine Speed | 1200 rpm
Fuel | Iso-octane
Equivalence Ratio | 0.4
Intake pressure | 2 bar absolute
IMEP | 6.5-7.3 bar
Exhaust Pressure | 2.3 bar
Figure 9. Three-dimensional 90° sector mesh for the square geometry, with 0.5X resolution and 410,000 elements.
[Plot for Figure 10: turbulence intensity, m/s (0.0 to 5.0) versus crank angle, degrees (-50 to 50); curves for the disc and square geometries; solid lines: experimental, dotted lines: numerical]
Figure 10. Comparison between experimental and numerical results for turbulence intensity as a function of crank angle, for the square and the disc geometries. The turbulence intensity is measured and calculated at the point indicated in Figure 6, 5 mm off the cylinder axis. Experimental results are shown by solid lines and numerical results by dotted lines.
RESULTS
Figure 10 shows a comparison between experimental and numerical results for turbulence intensity as a function of crank angle, for the square and the disc geometries. The turbulence intensity is measured and calculated at the point indicated in Figure 6, 5 mm off the cylinder axis and 5 mm below the cylinder head. Experimental results are shown by solid lines and numerical results by dotted lines. The figure shows that KIVA3V accurately predicts turbulence intensity for the disc geometry. The predicted turbulence intensity for the square bowl is earlier in phase and higher in magnitude than the experimental results. This difference may be explained by considering that the model predicts turbulence in 3D, where the measurements were done in 2D. Visualization of the
computational results for the square bowl shows that strong tumble vortices form in the bowl as a result of swirl and squish flow interaction. These flow structures set up large transient velocity gradients (sources of turbulence) near the measurement volume, much larger in the non-measured component direction than in the two measured component directions. Thus, the measurements do not capture the relatively large third component contribution to turbulence. These flow structures did not form with the flat piston geometry, where the modeling results are more representative of the experimental results. For a better comparison, it would be desirable to disregard the third (non-measured) component of turbulence intensity from the numerical results. However, this is not possible, because KIVA3V only calculates overall turbulence intensity and not individual components.
Figures 11, 12 and 13 show a comparison between experimental and numerical results for the disc geometry. Figure 11 shows experimental pressure traces (solid lines) and numerical pressure traces (dotted lines). The figure shows good agreement in all cases. Heat release rates are also well predicted (Figure 12). Burn duration is quite well predicted, although the maximum rate of heat release is underpredicted by 10-20% in all cases. Previous work [27] has shown that cylinder surface temperature has an effect on heat release rate, and it may be the main reason for the disagreement, since surface temperature was not measured in the experiment. Still, the numerical results are considered quite accurate, because heat release rate is obtained by analyzing pressure traces according to the first law of thermodynamics [28]. Small inaccuracies in pressure can therefore result in large differences in apparent heat release rates.
[Plot for Figure 11: pressure, bar (0 to 100) versus crank angle, degrees (-5 to 20); disc geometry, P = 2 bar, φ = 0.40; Tin = 451 K, 443 K, 436 K, 433 K, 431 K, 429 K; solid lines: experimental, dotted lines: numerical]
Figure 11. Comparison between experimental and numerical pressure traces for the disc geometry. The figure shows experimental pressure traces with solid lines and numerical pressure traces with dotted lines.
What role do the in-cylinder transport processes play in HCCI performance?
Approach – Accelerate research in advanced combustion regimes by developing faster and more predictive engine models
1. Better algorithms and applied mathematics – same solution only faster
2. New computing architecture – more flops per second, per dollar, per watt
3. Improved physical models – more accuracy, better error control
General Purpose Graphics Processing Units (GPGPUs) bring Tflop/s computing power to the desktop
Originally used for graphics-intensive applications:
- video games
CPU: Intel/AMD | Cores: up to 12 | Mem: up to 256 GB | Tflop/s: up to 0.13 | Price: up to $1300
GPU: NVIDIA GTX 280 | Cores: 240 | Mem: 1.0 GB | Tflop/s: 0.93 | Price: <$300
GPU: NVIDIA GTX 480 | Cores: 480 | Mem: 1.5 GB | Tflop/s: 1.35 | Price: $500
Our GPU research into combustion algorithms had very modest roots (from an Aug. 2010 presentation to program partners)
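The flops/$ argument can be made concrete from the figures quoted on this slide. The sketch below is a rough estimate only: it treats the slide's price ceilings ("<$300", "up to $1300") as actual prices and uses peak rather than sustained throughput.

```python
# Peak throughput per dollar from the figures quoted on the slide.
# Prices are the slide's upper bounds, so the ratios are rough estimates.
hardware = {
    "CPU (Intel/AMD)": (0.13, 1300.0),  # (peak Tflop/s, price in USD)
    "NVIDIA GTX 280": (0.93, 300.0),
    "NVIDIA GTX 480": (1.35, 500.0),
}

gflops_per_dollar = {
    name: 1000.0 * tflops / price
    for name, (tflops, price) in hardware.items()
}

for name, value in gflops_per_dollar.items():
    print(f"{name}: {value:.2f} Gflop/s per dollar")
```

On these numbers the GPUs offer roughly 27 to 31 times the peak throughput per dollar of the CPU, before accounting for how much of that peak a given algorithm can reach.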
Enthalpy and entropy calculations have greater speedup – benefit from more arithmetic operations per memory access
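A minimal sketch of why these kernels have higher arithmetic intensity, assuming the thermodynamic properties come from NASA 7-coefficient polynomials (a common choice in Chemkin-format mechanisms; the slide does not say which fit is used). Each species reads seven coefficients and then performs a chain of multiply-adds, plus a logarithm for the entropy. The coefficient values below are hypothetical and describe a constant-cp species, not a real one.

```python
import math

def h_over_rt(a, T):
    """Nondimensional enthalpy H/(R*T) from seven NASA polynomial coefficients."""
    poly = a[0] + T * (a[1] / 2 + T * (a[2] / 3 + T * (a[3] / 4 + T * a[4] / 5)))
    return poly + a[5] / T

def s_over_r(a, T):
    """Nondimensional entropy S/R from the same seven coefficients."""
    poly = T * (a[1] + T * (a[2] / 2 + T * (a[3] / 3 + T * a[4] / 4)))
    return a[0] * math.log(T) + poly + a[6]

# Hypothetical coefficients: constant cp/R = 2.5, illustrative offsets only.
coeffs = [2.5, 0.0, 0.0, 0.0, 0.0, -745.375, 4.0]
h = h_over_rt(coeffs, 1000.0)
s = s_over_r(coeffs, 1000.0)
print(h, s)
```

Seven memory reads feed both evaluations, so the ratio of arithmetic to memory traffic is far higher than in a scatter-add that does one addition per read-modify-write.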
Better Algorithms: adaptive preconditioner using on-the-fly reduction produces the same solution significantly faster
Two approaches to faster chemistry solutions
Ex. iso-octane: 874 species, 3796 reactions
[Figure: Jacobian matrix (species coupling freq.), color scale from slower to faster]
1. Classic mechanism reduction (Ex. 197 species):
• Smaller ODE size
• Smaller Jacobian matrix
Solution is faster but is not accurate over the entire operating range
2. LLNL’s adaptive preconditioner:
• Identical ODE
• Reduced mech only in preconditioner
Filter out 50-75% of the least important reactions
Our solver is as fast as the reduced mechanism without any loss of accuracy: more than 10x speedup
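The idea that a reduced system used only in the preconditioner still yields the full-system solution can be sketched on a toy linear problem. This is not LLNL's solver: the 4x4 matrix, the magnitude threshold, and the plain Richardson iteration (standing in for whatever iterative method the real code uses) are all assumptions made for illustration.

```python
def solve_dense(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def filter_small(M, threshold):
    """Zero the off-diagonal entries whose magnitude is below threshold."""
    return [[m if (i == j or abs(m) >= threshold) else 0.0
             for j, m in enumerate(row)] for i, row in enumerate(M)]

def preconditioned_solve(A, b, P, tol=1e-12, max_iter=50):
    """Richardson iteration x <- x + P^-1 (b - A x) on the FULL system A."""
    x = [0.0] * len(b)
    for done in range(max_iter):
        residual = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        if max(abs(r) for r in residual) < tol:
            return x, done
        # A real solver would factor P once and reuse it; here it is
        # re-solved on every pass for brevity.
        dx = solve_dense(P, residual)
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_iter

# Toy "Jacobian": strong couplings (2.0) and weak couplings (0.01).
A = [[10.0, 2.0, 0.01, 0.0],
     [2.0, 10.0, 0.0, 0.01],
     [0.01, 0.0, 10.0, 2.0],
     [0.0, 0.01, 2.0, 10.0]]
b = [1.0, 2.0, 3.0, 4.0]

P = filter_small(A, threshold=0.1)   # sparser matrix, preconditioner only
x_pre, iterations = preconditioned_solve(A, b, P)
x_ref = solve_dense(A, b)            # direct solve of the full system
print(iterations, x_pre)
```

Because the residual is always computed with the full matrix, dropping weak couplings from P changes only how many iterations are needed, not the converged answer. That is the distinction the slide draws against classic mechanism reduction, where the ODE itself is altered.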
LLNL’s solver brings well-resolved chemistry and 3D CFD to 1-day engine design iterations using iso-octane (874 species)
Simulation time (chemistry-only) for 10^6 cells on 32 processors, labeled for 2-methylnonadecane (7172 species, 52980 steps):
- Traditional dense matrix ODE solvers still found in KIVA and OpenFOAM: 90 years
- New commercial solvers using sparse systems: 150 days
- New LLNL solvers created for VTP program: 11 days
874 species iso-octane now only 1 day!
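The relative speedups implied by the three quoted simulation times follow from simple arithmetic. The sketch assumes all three times refer to the same case and uses 365.25 days per year for the conversion.

```python
# Relative speedups implied by the quoted chemistry-only simulation times.
DAYS_PER_YEAR = 365.25  # assumed conversion

dense_days = 90 * DAYS_PER_YEAR   # traditional dense matrix ODE solvers
sparse_days = 150.0               # commercial sparse solvers
llnl_days = 11.0                  # LLNL solvers

speedup_vs_dense = dense_days / llnl_days
speedup_vs_sparse = sparse_days / llnl_days

print(f"vs dense:  {speedup_vs_dense:.0f}x")
print(f"vs sparse: {speedup_vs_sparse:.1f}x")
```

That is roughly three orders of magnitude over the dense solvers and about one order of magnitude over the sparse commercial solvers.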
The calculation of the species production rate serves as a practical case study to illustrate GPU algorithm development
Creation rate of species i
Summation over all reactions containing species i as a product
Rate of progress of reaction step j
Example: hydrogen mechanism!
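In words, the creation rate of species i is the sum, over every reaction step j that lists species i as a product, of the product stoichiometric coefficient times the rate of progress of step j. A minimal sketch of that scatter-add follows, using three well-known H2/O2 chain steps; the rate-of-progress values are made up, and the data layout is an assumption rather than the authors' implementation.

```python
# Each reaction step lists its product species with stoichiometric coefficients.
# The rate-of-progress values are hypothetical numbers for illustration only.
SPECIES = ["H2", "O2", "H", "O", "OH", "H2O"]

REACTION_PRODUCTS = [
    {"O": 1, "OH": 1},    # H + O2  -> O + OH
    {"H": 1, "OH": 1},    # O + H2  -> H + OH
    {"H": 1, "H2O": 1},   # OH + H2 -> H + H2O
]

rate_of_progress = [2.0, 0.5, 1.5]  # hypothetical q_j for the three steps

def creation_rates(products, q):
    """Scatter-add each step's rate of progress into its product species."""
    rates = {name: 0.0 for name in SPECIES}
    for step, step_products in enumerate(products):
        for name, nu in step_products.items():
            rates[name] += nu * q[step]
    return rates

rates = creation_rates(REACTION_PRODUCTS, rate_of_progress)
print(rates)
```

Each update is a single multiply-add on a location determined by the mechanism's connectivity, which is the access pattern the following slides try to restructure for the GPU.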
Low arithmetic intensity and unstructured memory access make the species production rate algorithm ill-suited for the GPU
Scatter-add for the species production rates
iso-octane (874 species): GPU 0.13 GB/s, CPU 6.6 GB/s
Unstructured read or write operations are very costly on the GPU
Processing multiple instructions that update the same species within the same thread block increases throughput
GPU Thread Block 1
Process 4 instructions per block:
iso-octane (874 species) GPU: 14 GB/s CPU: 17 GB/s
Shared Memory Bank
iso-octane (874 species) GPU: 85 GB/s (2n instr) CPU: 17 GB/s
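A CPU-side sketch of the grouping idea, assuming the scatter-add instructions can be reordered so that all updates to one species sit next to each other. The local accumulator stands in for a thread block's shared memory, and the instruction list and rate values are hypothetical; this is an illustration, not the authors' kernel.

```python
from itertools import groupby

# (destination species index, reaction step index) pairs: a flattened
# scatter-add instruction list. Values are hypothetical.
instructions = [(3, 0), (4, 0), (2, 1), (4, 1), (2, 2), (5, 2)]
rate_of_progress = [2.0, 0.5, 1.5]
NUM_SPECIES = 6

def scatter_add(instr, q, n):
    """Baseline: one read-modify-write to the output per instruction."""
    out = [0.0] * n
    for dest, step in instr:
        out[dest] += q[step]
    return out

def grouped_add(instr, q, n):
    """Group instructions by destination, accumulate locally, write once."""
    out = [0.0] * n
    ordered = sorted(instr)  # same-destination instructions become adjacent
    for dest, group in groupby(ordered, key=lambda pair: pair[0]):
        local = 0.0          # stand-in for a shared-memory accumulator
        for _, step in group:
            local += q[step]
        out[dest] = local    # single write per destination species
    return out

baseline = scatter_add(instructions, rate_of_progress, NUM_SPECIES)
grouped = grouped_add(instructions, rate_of_progress, NUM_SPECIES)
print(grouped)
```

The grouped version produces the same result with one output write per species instead of one per instruction, which is the kind of restructuring that trades unstructured global writes for local accumulation.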
Similar algorithm improvements allow for an order of magnitude speedup in the calculation of the system derivatives on the GPU
The system derivatives for chemistry are only half the CPU cost – the matrix operations must also be accelerated
[Bar chart: Total Simulation Time (sec) for cases A: CPU Only (1T); D: Vector Ops on GPU; E: Matrix skipping (1T); F: Matrix CPU threads (8T)]
NVIDIA has provided us with a new sparse matrix software library
GLU sparse matrix package
§ Developed internally at NVIDIA
- lead developers Maxim Naumov & Sharan Chetlur
§ Key application: integrated circuits (SPICE)
§ Non-symmetric matrices
§ Beta software
§ LLNL has been provided early access
http://on-demand.gputechconf.com/gtc/2013/presentations/S3364-SPICE-Acceleration-on-GPUs.pdf
First GLU results promising – it is faster than the best work avoidance CPU-heuristic
[Bar chart: Total Simulation Time (sec) for cases A: CPU Only (1T); D: Vector Ops on GPU; E: Matrix skipping (1T); F: Matrix CPU threads (8T); G: GLU Matrix Ops]
Second GLU results even better – now comparable speed to multi-threaded CPU but offers greater future growth
[Bar chart: Total Simulation Time (sec) for cases A: CPU Only (1T); D: Vector Ops on GPU; E: Matrix skipping (1T); F: Matrix CPU threads (8T); G: GLU Matrix Ops; H: GLU Matrix Ops #2]
Further improvements are expected with new “mix & match” solver capability added to the next build
[Bar chart: Total Simulation Time (sec) for cases A: CPU Only (1T); D: Vector Ops on GPU; E: Matrix skipping (1T); F: Matrix CPU threads (8T); G: GLU Matrix Ops; H: GLU Matrix Ops #2; legend: CPU matrix, GPU matrix]
Chemistry integration over a fluid dynamic time step
[Timeline for a single chemical reactor: t0, t1, t2, ..., tN-1, tN, with substeps dt0, dt1, ...]
• Integrator decides how long a step to take and when to perform a new factorization
• Black lines indicate derivative evaluation and backsolve
• Red lines indicate matrix factorization (traditionally expensive)
Reactor coupling: Ideal Case
[Diagram: step timelines for cell 1, cell 2, cell 3, cell 4, and the combined system]
Exactly the same initial conditions mean exactly the same combined time steps and matrix factorizations.
Reactor coupling: Poor Case
[Diagram: step timelines for cell 1, cell 2, cell 3, cell 4, and the combined system]
Closely related initial conditions can lead to a very large coupling penalty.
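A toy model of the coupling penalty. The slides do not give the coupling rule, so this sketch assumes that a lock-step group refactors every member's matrix whenever any one member needs a factorization, so the combined schedule is the union of the individual schedules. The factorization times are made up.

```python
def coupling_penalty(factor_times_per_cell):
    """Ratio of coupled to independent factorization work (toy model).

    Assumes the combined system refactors all cells whenever any single
    cell would have needed a factorization on its own.
    """
    combined = sorted(set().union(*map(set, factor_times_per_cell)))
    independent_work = sum(len(times) for times in factor_times_per_cell)
    coupled_work = len(combined) * len(factor_times_per_cell)
    return coupled_work / independent_work

# Ideal case: four cells with identical schedules.
ideal = [[0.0, 1.0, 3.0]] * 4

# Poor case: similar cells whose factorizations drift apart in time.
poor = [[0.0, 1.0, 3.0],
        [0.0, 1.2, 3.4],
        [0.0, 0.8, 2.6],
        [0.0, 1.5, 3.9]]

ideal_penalty = coupling_penalty(ideal)
poor_penalty = coupling_penalty(poor)
print(ideal_penalty, poor_penalty)
```

In this model identical cells incur no penalty (ratio 1.0), while slightly shifted schedules triple the factorization work, which mirrors the contrast between the ideal and poor cases.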
Reactor coupling: Acceptable Case
[Diagram: step timelines for cell 1, cell 2, cell 3, cell 4, and the combined system]
Under what conditions can GPU-friendly multireactor groups be solved efficiently?
[Chart: Speedup (CPU Time/GPU Time) at Ti = 800, Φi = 0.2, for ΔTi = 0, 25, 50 and ΔΦi = 0, 0.1, 0.2]
Initial study reveals many operating conditions where the multireactor approach is near-ideal
[Chart: Speedup (CPU Time/GPU Time) for Ti = 800, 1000, 1200, 1400, 1600, 1800, 2000 and Φi = 0.2, 0.4, 0.6, 0.8, 1.0, with ΔTi = 0, 25, 50 and ΔΦi = 0, 0.1, 0.2]
Future research directions to improve combustion algorithms on the GPU
Near term (FY14):
1. Evaluate new metrics for organizing GPU-friendly multireactor groups
2. Take full advantage of the hybrid CPU/GPU architecture – using the CPU to avoid poorly coupled multireactors
Long term:
3. Sensitivity calculations and other mechanism development tools used by fuel chemists
4. Lagrangian spray models
5. New control strategies for stiff ODE integrators
6. Many-species transport
Summary: GPU-accelerated multireactor solver will enable detailed fuel chemistry to be used in IC engine design
[Bar chart: Total Simulation Time (sec) for cases A: CPU Only (1T); D: Vector Ops on GPU; E: Matrix skipping (1T); F: Matrix CPU threads (8T); G: GLU Matrix Ops; H: GLU Matrix Ops #2]
Acknowledgements
We gratefully acknowledge the support of Gurpreet Singh, the Advanced Combustion Engines Program Leader for the US Department of Energy Vehicle Technologies Office. We would like to thank Stan Posey of NVIDIA for providing us with new K20 Tesla GPUs (2496 CUDA cores) for testing, and Maxim Naumov for adding new features to the GLU library and providing us with support. We would also like to thank Prof. Bill Green and his research group at MIT for providing large mechanisms to test from their Reaction Mechanism Generator (RMG) package (http://rmg.sourceforge.net/).