Computer Science X - System Simulation Group
Harald Köstler (harald.koestler@fau.de)
Platform-independent description of imaging algorithms
H. Köstler
Salt Lake City, 17.3.2015
ExaStencils Project
Christian Lengauer
Armin Größlinger
Stefan Kronawitter
Sven Apel
Alexander Grebhahn
Matthias Bolten
Hannah Rittich
Ulrich Rüde
Harald Köstler
Sebastian Kuckuk
Jürgen Teich
Frank Hannig
Christian Schmitt
A unique, tool-assisted, domain-specific co-design approach for the class of stencil codes
http://www.exastencils.org/
Problems in High Performance Computing
Hardware: Modern HPC platforms are massively parallel
  Parallelism at the intra-core, intra-node, and inter-node level
Software: Imaging applications become more complex with increasing computational power
  More complex models
  Code development in interdisciplinary teams
Algorithm: The class of available algorithms grows, and many of them are only a general idea (like multigrid)
  Components and parameters depend on the image, the type of problem, …
High Performance Computing: Applications
Real-time imaging, e.g. medicine
Large-scale simulation, e.g. multi-physics
Variational Imaging
General problem: Find a mapping $u$ such that the energy functional $E(u)$ is minimized.
State of the Art: Application-driven Projects
Roles: user from application field, mathematician, software specialist, hardware specialist
Steps: description of application → solution method → parallel implementation and framework → efficient implementation on specific hardware
Proposed: Domain-driven Projects
Roles: users from different application fields, domain expert, mathematician, software specialist, hardware specialist
Steps: description of application in a domain-specific language → automatic selection of algorithmic components → code generation for the specific application → automatic tuning on specific hardware
Domain knowledge: feature model
Example (PDE): Operators::Laplacian(Data::solution) = Data::rhs
Aspects
Code generation for imaging applications
High dynamic range compression
Image registration / Optical Flow
Image segmentation
Image denoising
Domain-specific language design
Domain-specific knowledge representation and optimization
Efficient Algorithms
Parallelization
Performance Tuning
ExaStencils – Overview
• DSL as an intuitive interface to the user
• Automatic deduction of the configuration if desired
• Prediction and optimization of the configuration’s performance using software product line (SPL) techniques and local Fourier analysis (LFA)
• Code generation in Scala
• Automatic hardware-specific optimizations
ExaStencils – Layers
DSL Scope
dimension: 2D, 3D
Domain: UnitSquare, UnitCube
Solution: Scalar, Vector
Operator: Stencil, weak form, nonlinear
Boundary: Dirichlet, Neumann, periodic

Architecture: CPU, GPU
Languages: C++, Scala, CUDA, OpenCL
Parallelization: MPI, OpenMP

discrete domain: regularGrid, nodebased, cellbased
operator discretization: FD, FE, FV
data types: float, double, complex
discretization accuracy: 2, 4
DSL EXAMPLE
High Dynamic Range compression
Application: High Dynamic Range Compression
Data: Siemens AG, Healthcare Sector
Sequences of 2D x-ray images
Computational steering
Visualization with OpenGL
Köstler et al. Performance engineering to achieve real-time high dynamic range imaging. J Real-Time Image Processing, 2013.
Optimized HDR Compression (size 2048x2048)
[Figure: bar chart of frame rate (fps, axis from 0 to 140) for GTX 295/2, GTX 480, and GTX 480 (wavefront)]
GTX 295/2: half of an NVIDIA GTX 295, 112 GB/s peak bandwidth, compute capability 1.3
GTX 480: NVIDIA GTX 480, 177 GB/s peak bandwidth, compute capability 2.0 (Fermi)
Selection of Multigrid Components: LFA Toolbox
Goal: minimize time T to reach prescribed accuracy κ
Asymptotic convergence rates are estimated via LFA
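To illustrate the kind of quantity LFA provides, the following sketch estimates the smoothing factor of damped Jacobi for the 5-point Laplacian by sampling its Fourier symbol over the high frequencies. It is a minimal illustration and not the LFA toolbox itself: the function name `jacobiSmoothingFactor`, the choice of stencil, and the sampling resolution are assumptions made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Smoothing factor of damped Jacobi for the 2D 5-point Laplacian, estimated by
// sampling the Fourier symbol
//   S(t1, t2) = 1 - omega * (1 - (cos t1 + cos t2) / 2)
// on an n x n grid of frequencies t = -pi + 2*pi*k/n in [-pi, pi)^2.
// Hypothetical helper for illustration, not part of any LFA toolbox.
double jacobiSmoothingFactor(double omega, int n) {
    assert(n > 0 && n % 4 == 0);  // so that -pi/2 and pi/2 are sample points
    const double pi = std::acos(-1.0);
    double mu = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            // Low frequencies lie in [-pi/2, pi/2)^2, i.e. indices n/4 .. 3n/4 - 1.
            const bool lowI = (i >= n / 4) && (i < 3 * n / 4);
            const bool lowJ = (j >= n / 4) && (j < 3 * n / 4);
            if (lowI && lowJ) continue;  // only high frequencies count
            const double t1 = -pi + 2.0 * pi * i / n;
            const double t2 = -pi + 2.0 * pi * j / n;
            const double s = 1.0 - omega * (1.0 - 0.5 * (std::cos(t1) + std::cos(t2)));
            mu = std::max(mu, std::fabs(s));
        }
    }
    return mu;
}
```

Under these assumptions, `jacobiSmoothingFactor(0.8, 64)` evaluates to approximately 0.6, meaning each sweep damps every high-frequency error component by at least that factor. Estimates of this kind allow convergence rates to be compared for different components without running the solver.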
Hardware description (DSL level Hardware)
Hardware cpu
  bandwidth = 40
  peak = 30
  cores = 4
Node n
  sockets = 2
HDR Compression
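Idea: Modify the magnitude of the image gradient by a position-dependent attenuating function
Energy functional:
$E(u) = \int_\Omega \lVert \nabla u(x) - C \rVert^2 \, dx \to \min$
where $C$ denotes the attenuated gradient field
Solve the Euler-Lagrange equation by multigrid:
$\Delta u = f$ in $\Omega$, $u = 0$ on $\partial\Omega$, with $f = \operatorname{div} C$
Fattal/Lischinski/Werman. Gradient Domain High Dynamic Range Compression. SIGGRAPH, 2002.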
What is Multigrid?
Goal: Solve the partial differential equation
$\Delta u = f$ in $\Omega$, $u = 0$ on $\partial\Omega$
After discretization, one requires an efficient iterative solver for the sparse linear system
$A_h u_h = f_h$
A multigrid solver has complexity $O(N)$ in the number of unknowns $N$
Problem Description (continuous)
Domain omega = UnitSquare
f : omega -> R^1
u : omega -> R^1
Laplacian : ( omega -> R^1 ) -> ( omega -> R^1 )
Laplacian = dx^2 + dy^2
pde : Laplacian [ u ] = f in omega
bc : u = 0 in partial_omega
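The continuous description leaves the discretization open. As a minimal sketch of one possible choice, the code below applies the second-order finite-difference Laplacian (the 5-point stencil) to node-based values on the unit square. The function name `applyLaplacian5pt` and the row-major storage are assumptions for illustration and are not taken from the ExaStencils code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply the second-order finite-difference Laplacian (5-point stencil) to a
// node-based grid function u on the unit square.
// u holds n x n nodal values in row-major order; the mesh width is h = 1/(n-1).
// Boundary entries of the result are left at zero.
// Hypothetical helper for illustration only.
std::vector<double> applyLaplacian5pt(const std::vector<double>& u, std::size_t n) {
    assert(n >= 3 && u.size() == n * n);
    const double h = 1.0 / static_cast<double>(n - 1);
    std::vector<double> result(n * n, 0.0);
    for (std::size_t y = 1; y + 1 < n; ++y) {
        for (std::size_t x = 1; x + 1 < n; ++x) {
            const std::size_t c = y * n + x;  // index of the center node
            result[c] = (u[c - 1] + u[c + 1] + u[c - n] + u[c + n] - 4.0 * u[c]) / (h * h);
        }
    }
    return result;
}
```

For a quadratic such as u(x, y) = x^2 + y^2 this stencil reproduces the exact Laplacian, 4, at every interior node, which makes it a convenient sanity check.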
Generation of discrete problem
Domain-specific knowledge
Discretization methods: finite differences (FD), finite volumes (FV), finite elements (FE)
The supported types of operators result in sparse matrices
Domain-specific optimization chooses the type of discretization and, e.g., concrete data types
The description is parsed, an abstract syntax tree is constructed, and the tree is then transformed into a discrete representation of the problem
The code generation framework is implemented in Scala
Problem Description (discrete)
Fragments f1 = Regular_Square
Discrete_Domain omega levels 8
xsize [0] = 1024 // from Image
ysize [0] = 1024 // from Image
xsize [l+1] = xsize [l] / 2
ysize [l+1] = ysize [l] / 2
Field<Double,1>@nodes f
Field<Double,1>@nodes u
StencilMatrix<Double,1,1, FD, 2>@nodes Laplacian
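The recurrence for the grid sizes can be unrolled into a small sketch. The helper name `levelSizes` is hypothetical; the sketch assumes, as in the listing above, that level 0 is the finest grid and that each coarser level halves the size by integer division.

```cpp
#include <cassert>
#include <vector>

// Grid sizes per level, following xsize[l+1] = xsize[l] / 2.
// Level 0 is the finest grid (here: the image resolution).
// Hypothetical helper for illustration only.
std::vector<int> levelSizes(int finestSize, int levels) {
    assert(finestSize > 0 && levels > 0);
    std::vector<int> sizes;
    sizes.reserve(static_cast<std::size_t>(levels));
    int size = finestSize;
    for (int l = 0; l < levels; ++l) {
        sizes.push_back(size);
        size /= 2;  // integer division, as in the recurrence
    }
    return sizes;
}
```

With the values from the listing, `levelSizes(1024, 8)` yields 1024, 512, 256, 128, 64, 32, 16, 8, so the coarsest of the 8 levels still has 8 points per direction.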
Multigrid Algorithm
Algorithmic Parameters and Components
mgcomponents
  solver = multigrid
  cycletype = Vcycle
mgparameter
  nprae = 2
  npost = 1
  iters = 10
Layer 4 Functions – Example
function VCycle@coarsest ( ) : Unit { /* coarse grid solver */ }
function VCycle@((coarsest + 1) to finest) ( ) : Unit {
  repeat up 1 {
    Smoother@current ( )
  }
  UpResidual@current ( )
  Restriction@current ( )
  SetSolution@coarser ( 0 )
  VCycle@coarser ( )
  Correction@current ( )
  repeat up 2 {
    Smoother@current ( )
  }
}
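The same sequence of steps can be sketched in plain C++ for a 1D model problem, u'' = f on the unit interval with zero boundary values. This is a hand-written illustration under several assumptions, not generated code: the helper names (`residual`, `smooth`, `vCycle`), the damped Jacobi smoother with factor 0.8, full-weighting restriction, and linear interpolation are choices made here, and the grid is assumed to have 2^k + 1 nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Residual r = f - A u for (A u)_i = (u[i-1] - 2 u[i] + u[i+1]) / h^2, h = 1/(n-1).
Vec residual(const Vec& u, const Vec& f) {
    const std::size_t n = u.size();
    const double invH2 = static_cast<double>((n - 1) * (n - 1));  // 1 / h^2
    Vec r(n, 0.0);
    for (std::size_t i = 1; i + 1 < n; ++i)
        r[i] = f[i] - (u[i - 1] - 2.0 * u[i] + u[i + 1]) * invH2;
    return r;
}

// Damped Jacobi sweeps (damping factor 0.8 is an assumption for this sketch).
void smooth(Vec& u, const Vec& f, int sweeps) {
    const std::size_t n = u.size();
    const double diag = -2.0 * static_cast<double>((n - 1) * (n - 1));  // -2 / h^2
    for (int s = 0; s < sweeps; ++s) {
        const Vec r = residual(u, f);
        for (std::size_t i = 1; i + 1 < n; ++i)
            u[i] += 0.8 * r[i] / diag;
    }
}

// One V-cycle; the comments name the corresponding steps of the DSL listing.
void vCycle(Vec& u, const Vec& f) {
    const std::size_t n = u.size();
    assert(n >= 3 && n % 2 == 1 && f.size() == n);
    if (n == 3) {  // coarsest level: one unknown, solved exactly (h^2 / 2 = 1/8)
        u[1] = 0.5 * (u[0] + u[2]) - f[1] / 8.0;
        return;
    }
    smooth(u, f, 1);                          // repeat up 1 { Smoother }
    const Vec r = residual(u, f);             // UpResidual
    const std::size_t nc = (n - 1) / 2 + 1;
    Vec rc(nc, 0.0), ec(nc, 0.0);             // SetSolution@coarser ( 0 )
    for (std::size_t i = 1; i + 1 < nc; ++i)  // Restriction (full weighting)
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1];
    vCycle(ec, rc);                           // VCycle@coarser
    for (std::size_t i = 0; i + 1 < nc; ++i) {  // Correction (linear interpolation)
        u[2 * i] += ec[i];
        u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1]);
    }
    smooth(u, f, 2);                          // repeat up 2 { Smoother }
}
```

A typical use is to start from a zero vector of 65 entries, fill `f`, and call `vCycle(u, f)` repeatedly until the residual is small enough. The recursion terminates at three nodes, where the single unknown is solved directly.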
Example: Jacobi Smoother
function Smoother@((coarsest + 1) to finest) ( ) : Unit {
  communicate Solution[0]@(current)
  loop over inner on Solution@(current) {
    Solution[1]@current =
      Solution[0]@current
      + ( ( ( 1.0 / diag ( Lapl@(current) ) ) * 0.8 )
        * ( RHS@current - Lapl@(current) * Solution[0]@current ) )
  }
  …
}
Jacobi Smoother – Resulting Code (w/o basic Opt)
#include "MultiGrid/MultiGrid.h"
void Smoother_4() {
  // exchange ghost-layer data of solution slot 0
  exchsolutionData_4(0);
  #pragma omp parallel for schedule(static) num_threads(8)
  for (int fragmentIdx = 0; fragmentIdx < 8; ++fragmentIdx) {
    if (isValidForSubdomain[fragmentIdx][0]) {
      for (int y = iterationOffsetBegin[fragmentIdx][0][1];
           y < (iterationOffsetEnd[fragmentIdx][0][1]+17); y +=1) {
        for (int x = iterationOffsetBegin[fragmentIdx][0][0];
             x < (iterationOffsetEnd[fragmentIdx][0][0]+17); x +=1) {
          // damped Jacobi update: slot 1 = slot 0 + (0.8 / diag) * (rhs - stencil * slot 0)
          slottedFieldData_Solution[1][fragmentIdx][4][(((y*19)+19)+(x+1))] =
            (slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+(x+1))]
            +(((1.0e+00/fieldData_LaplCoeff[fragmentIdx][4][((y*17)+x)])*8.0e-01)
            *(fieldData_RHS[fragmentIdx][4][((y*17)+x)]
            -(((((fieldData_LaplCoeff[fragmentIdx][4][((y*17)+x)]
            *slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+(x+1))])
            +(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+289)+x)]
            *slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+(x+2))]))
            +(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+578)+x)]
            *slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+x)]))
            +(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+867)+x)]
            *slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+38)+(x+1))]))
            +(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+1156)+x)]
            *slottedFieldData_Solution[0][fragmentIdx][4][((y*19)+(x+1))])))));
        }
      }
    }
  }
  …
}
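The index arithmetic in the generated code is easier to follow in a hand-written counterpart. The sketch below assumes a layout inferred from the constants in the listing and not stated in the slides: 17 x 17 inner points per fragment, a solution array of 19 x 19 values with one ghost layer per side, and five coefficient planes of 289 = 17 x 17 entries each. The function name `jacobiSweep` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Readable counterpart of one damped Jacobi sweep on a single fragment.
// Assumed layout (inferred from the generated index arithmetic):
//   solIn, solOut: (n+2) x (n+2) values including one ghost layer per side
//   rhs:           n x n values
//   coeff:         5 planes of n x n stencil coefficients, ordered
//                  center, x+1, x-1, y+1, y-1
void jacobiSweep(const std::vector<double>& solIn, std::vector<double>& solOut,
                 const std::vector<double>& rhs, const std::vector<double>& coeff,
                 std::size_t n, double omega) {
    const std::size_t w = n + 2;      // row width including ghost layers (19 for n = 17)
    const std::size_t plane = n * n;  // size of one coefficient plane (289 for n = 17)
    assert(solIn.size() == w * w && solOut.size() == w * w);
    assert(rhs.size() == plane && coeff.size() == 5 * plane);
    for (std::size_t y = 0; y < n; ++y) {
        for (std::size_t x = 0; x < n; ++x) {
            const std::size_t c = y * n + x;              // index without ghost layers
            const std::size_t s = (y + 1) * w + (x + 1);  // index with ghost layers
            // Stencil applied to the old solution slot.
            const double Au = coeff[c] * solIn[s]
                            + coeff[c + plane] * solIn[s + 1]
                            + coeff[c + 2 * plane] * solIn[s - 1]
                            + coeff[c + 3 * plane] * solIn[s + w]
                            + coeff[c + 4 * plane] * solIn[s - w];
            // New slot = old slot + (omega / diag) * (rhs - stencil * old slot).
            solOut[s] = solIn[s] + (omega / coeff[c]) * (rhs[c] - Au);
        }
    }
}
```

Calling it with `n = 17` and `omega = 0.8` mirrors the constants of the listing. Writing to a second array corresponds to the two solution slots, so every update reads only values from the previous sweep.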
Jacobi Smoother – Resulting Code (w/ basic Opt)
#include "MultiGrid/MultiGrid.h"
void Smoother_4() {
  // exchange ghost-layer data of solution slot 0
  exchsolutionData_4(0);
  #pragma omp parallel for schedule(static) num_threads(8)
  for (int fragmentIdx = 0; fragmentIdx < 8; ++fragmentIdx) {
    if (isValidForSubdomain[fragmentIdx][0]) {
      for (int c0 = iterationOffsetBegin[fragmentIdx][0][1];
           (c0<=(iterationOffsetEnd[fragmentIdx][0][1]+16)); c0 = (c0+1)) {
        // row base pointers, hoisted out of the inner loop
        double* slottedFieldData_Solution_1_fragmentIdx_4_p1 =
          &(slottedFieldData_Solution[1][fragmentIdx][4][(19*c0)]);
        double* fieldData_RHS_fragmentIdx_4_p1 = &(fieldData_RHS[fragmentIdx][4][(17*c0)]);
        double* slottedFieldData_Solution_0_fragmentIdx_4_p1 = …
        double* fieldData_LaplCoeff_fragmentIdx_4_p1 = …
        for (int c1 = iterationOffsetBegin[fragmentIdx][0][0];
             (c1<=(iterationOffsetEnd[fragmentIdx][0][0]+16)); c1 = (c1+1)) {
          slottedFieldData_Solution_1_fragmentIdx_4_p1[(c1+20)] =
            (slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+20)]
            +(((1.0e+00/fieldData_LaplCoeff_fragmentIdx_4_p1[c1])*8.0e-01)*(fieldData_RHS_fragmentIdx_4_p1[c1]
            -(((((fieldData_LaplCoeff_fragmentIdx_4_p1[c1]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+20)])
            +(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+289)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+21)]))
            +(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+578)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+19)]))
            +(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+867)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+39)]))
            +(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+1156)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+1)])))));
        }
      }
    }
  }
  …
}
Denoising by Diffusion
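Idea: Use a nonlinear anisotropic diffusion process to denoise the image $u_0$ in the domain $\Omega$, i.e. solve the time-dependent PDE
$\partial_t u = \operatorname{div}(g \nabla u)$ in $\Omega \times (0,T)$
$\langle g \nabla u, n \rangle = 0$ on $\partial\Omega \times (0,T)$
$u(x,0) = u_0(x)$ in $\Omega$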
Runtime Results for Different Problems
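| Problem | Dim. | Smoother | cpu, 1: Double (ms) | cpu, 1: Float (speedup) | cpu, 8: Double (speedup) | cpu, 8: Float (speedup) | gpu, 256: Double (speedup) | gpu, 256: Float (speedup) |
|---|---|---|---|---|---|---|---|---|
| Laplace | 2D | Jacobi | 546 | 1.17 | 3.50 | 3.50 | 6.42 | 9.10 |
| Laplace | 2D | GaussSeidel | 453 | 1.16 | 2.42 | 3.24 | 3.41 | 4.72 |
| Laplace | 3D | Jacobi | 608 | 1.08 | 2.60 | 3.00 | 6.83 | 10.31 |
| Laplace | 3D | GaussSeidel | 608 | 1.08 | 2.60 | 3.01 | 4.22 | 5.96 |
| Complex Diffusion | 2D | Jacobi | 9235 | 1.02 | 2.81 | 2.87 | 23.99 | 33.58 |
| Complex Diffusion | 2D | GaussSeidel | 7504 | 1.11 | 2.18 | 2.29 | 11.62 | 16.42 |
| Complex Diffusion | 3D | Jacobi | 8799 | 1.15 | 2.77 | 2.91 | 15.94 | 39.28 |
| Complex Diffusion | 3D | GaussSeidel | 9048 | 1.80 | 2.60 | 2.87 | 10.28 | 24.19 |

Image sizes: 4096x4096 (2D) and 256x256x256 (3D)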
Future Work
Imaging applications (tested)
High dynamic range compression
Image denoising
Optical Flow
Imaging applications (next)
Image registration
Image segmentation
Acknowledgements
• Funded by
  • Bundesministerium für Bildung und Forschung (German Federal Ministry of Education and Research)
  • KONWIHR, Bavarian project
  • DFG SPP 1648/1 – Software for Exascale Computing
http://www.exastencils.org/