accelerating forward modeling by cpu,gpu and mic · accelerating forward modeling by cpu,gpu and...

17
Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan [email protected] Department of Computer Science & Technology Tsinghua University, Beijing, China

Upload: others

Post on 10-Oct-2019

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Accelerating Forward Modeling

by CPU,GPU and MIC

You, Yang and Fu, Haohuan

[email protected]

Department of Computer Science & Technology

Tsinghua University, Beijing, China

Page 2: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Outline

Stencil

Architectures

Optimization Techniques

Experimental results

Page 3: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Introduction

• Forward Modeling

– Oil and Gas Exploration

• Iterative Stencil Loops

– 99% of the time

• Lax-Wendroff Correction (LWC)

– Dablain, 1986

Page 4: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

LWC stencil

• Memory Access

– (36+1+1)*3=114

• Operations

– 228

• Flop to Byte Ratio

– 0.25 ~ 9.5 (double)

– 0.5 ~ 19 (single)

Page 5: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Outline

Stencil

Architectures

Optimization Techniques

Experimental results

Page 6: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Architectures

Architectures Peak Performance

(Double Precision)

Flop to Byte Ratio

(Double Precision)

Intel Sandy Bridge

CPU

256 Gflops

3.76

Intel Knights Corner

MIC

1010 Gflops 6.35

Nvidia Fermi C2070

GPU

515 Gflops 5.31

Nvidia Kepler K20x

GPU

1320 Gflops 7.02

Page 7: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Outline

Stencil

Architectures

Optimization Techniques

Experimental results

Page 8: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

General techniques

• On-chip data reuse

– algorithmic FBR < architectural FBR

– 0.50 (stencil) < 21 (Kepler)

• Remove high-latency operations

– e.g. divisions

• Get rid of branches

– avoid discontinuities in SIMD/SIMT

• Continuous memory access

– AOS to SOA

Page 9: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

GPU: General Strategies

• One thread One point – High parallelism

• One thread One line – on-chip data reuse

• Common tricks – best L1/smem configuration

– minimize PCIe communication

– right block size • warp size

• shared memory bank size

Page 10: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

CPU/MIC: General Strategies

• Task parallelism

– Inherent parallelism

• Data parallelism

– SIMD

• On-chip data reuse

– Cache blocking

• Offload and Native

Page 11: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

CPU/MIC: Additional techniques

• Prefetch

– memory to cache

• Schedule

– thread and iteration

• Cilk array notation

– replace compiler hints

• Cilk Plus

– replace OpenMP

Page 12: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Outline

Stencil

Architectures

Optimization Techniques

Experimental results

Page 13: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Experimental results

Page 14: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Performance Efficiency

Page 15: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Knights Corner vs. Kepler

• User-controlled SM > Compiler-controlled L1

• Programmabilty – CUDA vs. Knights Corner Instructions

– OpenACC vs. OpenMP/Cilk

• Performance/Effort (double precision) – Hard to evaluate

• Power Efficiency (whole workstation) – MIC: 0.2770 Gflops/watt (single precision)

– GPU: 0.4133 Gflops/watt (single precision)

Page 16: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Future work

• Support Vector Machine

– data mining

• Many-core and Multi-core architectures

• Welcome for discussion

Page 17: Accelerating Forward Modeling by CPU,GPU and MIC · Accelerating Forward Modeling by CPU,GPU and MIC You, Yang and Fu, Haohuan you-y12@mails.tsinghua.edu.cn Department of Computer

Thank You

You, Yang and Fu, Haohuan

[email protected]

Department of Computer Science& Technology

Tsinghua University, Beijing, China