Exploiting OpenCL for heterogeneous computing:
a case study
Simon McIntosh-Smith [email protected] Head of Microelectronics Research
! Collaborators
• Richard B. Sessions, Amaurys Avila Ibarra
  • University of Bristol, Biochemistry
• James Price
  • University of Bristol, Computer Science
• Tsuyoshi Hamada, Felipe Cruz
  • University of Nagasaki, Japan
! Molecular docking
• Proteins: typically O(1000) atoms
• Ligands: typically O(100) atoms
! BUDE: Bristol University Docking Engine
                           Typical docking      Empirical Free Energy   Free Energy calculations
                           scoring functions    Forcefield (BUDE)       (MM [1,2], QM/MM [3])
Entropy: solvation         No                   Yes                     Yes
Entropy: configurational   Approx               Approx                  Yes
Electrostatics             ?                    Approx                  Yes
All atom                   No                   Yes                     Yes
Explicit solvent           No                   No                      Yes

Trade-off arrows on the slide: Accuracy, Speed
1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)
! How BUDE’s EMC works
! Experimental results
! Redocking into Xray Structure
1CIL (Human carbonic anhydrase II)
! Another example
1EZQ (Human Factor Xa)
! OpenCL for heterogeneous computing
A modern computer includes:
– One or more CPUs
– One or more GPUs
OpenCL (Open Computing Language) lets programmers write a single portable program that uses ALL resources in the heterogeneous platform
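As a rough illustration of what "all resources in the heterogeneous platform" means in practice, the sketch below lists every OpenCL device visible on a machine. It assumes the third-party pyopencl bindings are installed, which the talk does not mention; the function name list_opencl_devices is hypothetical. If pyopencl or an OpenCL driver is missing, it simply returns an empty list.

```python
def list_opencl_devices():
    """Return (platform name, device name, kind) for each visible OpenCL device.

    Assumption: the third-party pyopencl package is available. When it is not,
    or when no OpenCL platform is installed, an empty list is returned.
    """
    try:
        import pyopencl as cl
    except ImportError:
        return []

    found = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                # device.type is a bitfield of cl.device_type flags.
                if device.type & cl.device_type.GPU:
                    kind = "GPU"
                elif device.type & cl.device_type.CPU:
                    kind = "CPU"
                else:
                    kind = "other"
                found.append((platform.name, device.name, kind))
    except cl.Error:
        # Raised, for example, when no OpenCL platform can be found.
        return found
    return found


if __name__ == "__main__":
    for platform_name, device_name, kind in list_opencl_devices():
        print(f"{kind:5s} {device_name} ({platform_name})")
```

On a machine with one CPU runtime and one GPU driver installed, this would typically print one line per device, covering both kinds.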
! BUDE’s heterogeneous approach
1. Discover all OpenCL platforms/devices, inc. both CPUs and GPUs
2. Run a micro benchmark on each device, ideally a short piece of real work
3. Load balance using micro benchmark results
4. Re-run micro benchmark at regular intervals in case load changes
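Steps 2 and 3 above can be sketched in a few lines. This is a minimal illustration, not BUDE's actual implementation: the helper names measure_rate and split_work are hypothetical, and the split is a simple proportional share of work items based on each device's measured throughput. Step 4 would amount to calling measure_rate again at intervals and recomputing the split.

```python
import time
from typing import Callable, Dict


def measure_rate(run_items: Callable[[int], object], n_items: int) -> float:
    """Micro benchmark: run n_items of work and return items per second."""
    start = time.perf_counter()
    run_items(n_items)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return n_items / elapsed


def split_work(total_items: int, rates: Dict[str, float]) -> Dict[str, int]:
    """Divide total_items among devices in proportion to their measured rates."""
    total_rate = sum(rates.values())
    if total_rate <= 0:
        raise ValueError("at least one device must have a positive rate")
    # Round each share down, then hand the leftover items to the fastest devices.
    shares = {name: int(total_items * rate / total_rate)
              for name, rate in rates.items()}
    leftover = total_items - sum(shares.values())
    fastest_first = sorted(rates, key=rates.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Illustrative rates only; real values would come from measure_rate().
    print(split_work(100, {"gpu": 5.3, "cpu": 0.7}))
```

With these illustrative rates the faster device receives roughly 88% of the work items, so both devices should finish at about the same time.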
! Benchmark results
[Figure: bar chart of speedup (higher is better) across all benchmarked configurations; bar values range from 0.3 to 13.6]
! Selected performance results
GPUs
Speedup (higher is better):
• M2050 (x2): 13.4
• GTX-580: 11.7
• M2070: 8.1
• M2050: 7.3
• GTX-285: 6.5
• HD5870: 5.3
! Selected performance results
CPUs
Speedup (higher is better):
• E5462 (Fortran) x2: 1.0
• E5620 x2: 0.9
• i7-2600K: 0.9
• i5-2500T: 0.7
• A8-3850 CPU: 0.3
! Selected performance results
Heterogeneous (CPUs+GPUs)
Speedup (higher is better):
• M2050 (x2) & E5620 (x2): 13.6
• GTX-580 & E8500: 11.7
• HD5870 & i5-2500T: 5.9
• A8-3850 GPU & CPU: 1.2
! Selected performance results
Speedup (higher is better):
• M2050 x2: 13.4
• GTX-580: 11.7
• E5462 (Fortran) x2: 1.0
Slide annotations: “Where we started”, “Where we finished”, “Where we could go”, “£340”
! Relative energy and run-time
Measurements are for a constant amount of work. Energy measurements are “at the wall” and include any idle components.
• 88% reduction in energy
• 93% reduction in time
Relative performance per watt / relative performance:
• GTX-580: 16.2 / 13.2
• M2050 (x2): 11.4 / 15.2
• M2050 (x2) + E5620 (x2): 10.6 / 15.4
• HD5870: 9.1 / 5.9
• HD5870 + i5-2500T: 9.3 / 6.6
• i5-2500T: 2.4 / 0.7
• E5620 (x2): 1.0 / 1.0
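For a constant amount of work, relative run-time is the reciprocal of relative performance, and relative energy is the reciprocal of relative performance per watt. The sketch below applies that relationship to two illustrative chart values (15.2 and 11.4); the function name reductions is hypothetical. The transcript does not say which configuration or baseline the headline 88% and 93% figures refer to, so this is not guaranteed to reproduce them exactly.

```python
def reductions(relative_performance: float,
               relative_performance_per_watt: float) -> tuple:
    """Return (time reduction, energy reduction) as fractions of the baseline.

    Assumes a constant amount of work, so that
      time   is proportional to 1 / performance, and
      energy is proportional to 1 / (performance per watt).
    """
    time_reduction = 1.0 - 1.0 / relative_performance
    energy_reduction = 1.0 - 1.0 / relative_performance_per_watt
    return time_reduction, energy_reduction


if __name__ == "__main__":
    # Illustrative inputs taken from the chart values on this slide.
    t, e = reductions(15.2, 11.4)
    print(f"time reduction:   {t:.1%}")
    print(f"energy reduction: {e:.1%}")
```

A relative performance of about 15 corresponds to a run-time reduction of roughly 93%, which is consistent with the figure quoted on the slide.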
! What does this let us do?
! Potentially save lives
NDM-1 is responsible for antibiotic resistance, giving rise to “superbugs”
! GPU-system DEGIMA
• Used 144 GPUs in parallel for drug docking simulations
• ATI Radeon HD5870 & Intel i5-2500T
• ~300 TFLOPS single precision
• Courtesy of Tsuyoshi Hamada and Felipe Cruz, Nagasaki
! NDM-1 experiment
• 1 million candidate drug molecules times 20 conformers each → 20M dockings
• 1.23 x 10^17 atom-atom energies calculated
• 267 days of GPU compute time and 224 days of CPU compute time
• ~55 hours actual wall-time
• A second run with 8 million molecules, 160M conformers on >200 GPUs is running right now!
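The figures on this slide can be cross-checked with some back-of-envelope arithmetic. The sketch below uses only the numbers quoted above; the derived quantities (energies per docking, average number of GPUs kept busy) are estimates computed here, not values reported in the talk, and the function name ndm1_estimates is hypothetical.

```python
def ndm1_estimates():
    """Derive rough per-docking and parallelism figures from the slide's numbers."""
    molecules = 1_000_000
    conformers_per_molecule = 20
    dockings = molecules * conformers_per_molecule      # 20 million dockings

    atom_atom_energies = 1.23e17
    energies_per_docking = atom_atom_energies / dockings  # about 6.15e9

    gpu_compute_days = 267
    wall_time_hours = 55                                  # quoted as "~55 hours"
    # GPU-hours divided by wall-clock hours gives the average number of GPUs busy.
    average_gpus_busy = gpu_compute_days * 24 / wall_time_hours

    return dockings, energies_per_docking, average_gpus_busy


if __name__ == "__main__":
    dockings, per_docking, gpus = ndm1_estimates()
    print(f"dockings:             {dockings:,}")
    print(f"energies per docking: {per_docking:.3g}")
    print(f"average GPUs busy:    {gpus:.1f}")
```

Because the wall-time is approximate, the average GPU count is only indicative, but it stays below the 144 GPUs available in the system, so the quoted figures are mutually consistent.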
! Conclusions
• OpenCL enables truly heterogeneous computing, harnessing all hardware resources in a system
• GPUs can yield significant savings in energy costs (and equipment costs)
• OpenCL can work just as well for multi-core CPUs as it does for GPUs
It’s possible to screen libraries of millions of molecules against complex targets using highly accurate, computationally-expensive methods in one weekend using equipment costing O(£100K)
! For an introduction to GPUs
The GPU Computing Revolution – a Knowledge Transfer Report from the London Mathematical Society and the KTN for Industrial Mathematics
• https://ktn.innovateuk.org/web/mathsktn/articles/-/blogs/the-gpu-computing-revolution
! References
• S. McIntosh-Smith, T. Wilson, A.A. Ibarra, J. Crisp and R.B. Sessions, "Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems", The Computer Journal, September 12th 2011. DOI: 10.1093/comjnl/bxr091
• N. Gibbs, A.R. Clarke & R.B. Sessions, "Ab-initio Protein Folding using Physicochemical Potentials and a Simplified Off-Lattice Model", Proteins 43:186-202,200