Powered by
Intel® Math Kernel Library (Intel® MKL)
Shane Story
Intel MKL Technology Strategist,
Victor Kostin
Intel MKL Dense Solvers team manager
"MKL is the best math library in the world…
robust, accurate, and vastly faster than the competition."
Robert Harrison, Director, Oak Ridge National Labs
Copyright© 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
All you ever wanted to know about Intel MKL ...
• What is the Intel Math Kernel Library (Intel MKL)?
• Who uses Intel MKL?
• Why is Intel MKL relevant to the world’s fastest
computer systems?
• What is the most widely used math library in the world?
• How do I acquire Intel MKL?
You will be an Intel MKL evangelist 40 minutes from
now!
Agenda
• Introduction
• Performance
– Benchmarks
– Top500
• Customers
• Looking forward
What is a math library?
• Start with your problem in a scientific discipline
– Geosciences & Geo-engineering
– Financial Analytics
– Science & Research
– Engineering Design
– Signal Processing
– Digital Content Creation
• These problems might involve mathematics
– Differential equations, for example $-\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} + q\,u = f(x, y)$
– Linear algebra
– Fourier transforms
– Statistics
Solving these problems can be tough
• These problems often
– are impossible to solve by hand (no closed form solution)
– are computationally intensive
– involve complex algorithms
• Math libraries can help ease the burden
– Pick your favorite programming language: C/C++/C#/Fortran/Java/Python
– IEEE floating point: single (32-bit) and double (64-bit), value = (+/-1) * 2^exponent * 1.fraction
– Translate your mathematics into a SW program
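To make the (+/-1) * 2^exponent * 1.fraction form concrete, here is a minimal Python sketch that splits a 64-bit double into its three fields and rebuilds it. The function names are illustrative only, and the sketch assumes a normal (not zero, subnormal, infinite, or NaN) value.

```python
import struct

def decompose_double(value):
    """Split a 64-bit IEEE 754 double into (sign, unbiased exponent, fraction bits).

    For normal numbers: value = sign * 2**exponent * (1 + fraction / 2**52).
    """
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = -1 if bits >> 63 else 1
    biased_exponent = (bits >> 52) & 0x7FF   # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)        # 52 fraction bits
    return sign, biased_exponent - 1023, fraction

def recompose_double(sign, exponent, fraction):
    """Rebuild a normal double from the three fields (hypothetical helper)."""
    return sign * 2.0 ** exponent * (1 + fraction / 2 ** 52)

sign, exponent, fraction = decompose_double(-6.5)
print(sign, exponent, fraction / 2 ** 52)  # -6.5 = -1 * 2**2 * 1.625
```

Single precision follows the same pattern with 8 exponent bits and 23 fraction bits.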
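Matrix Multiplication Example
A * B = C
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} * \begin{pmatrix} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{pmatrix} = \begin{pmatrix} ? & ? & ? & ? \\ ? & ? & ? & ? \end{pmatrix}$$
Dimensions m,n,k: A is m=2 by k=3, B is k=3 by n=4, C is m=2 by n=4
Match the general Intel MKL format
alpha * op( A ) * op( B ) + beta * C = C
Make an Intel MKL call
DGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)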
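As a reference for what the call computes, the following Python sketch evaluates alpha * A * B + beta * C for the 2x3 and 3x4 matrices of this example. It is a plain triple loop, not Intel MKL; the function name gemm and the list-of-lists layout are assumptions for illustration, and the transpose options op( ) are left out.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * A * B + beta * C for row-major lists of lists (no transposes)."""
    m, k, n = len(a), len(b), len(b[0])
    # Start from beta * C, then accumulate the alpha * A * B contributions.
    result = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            for l in range(k):
                result[i][j] += alpha * a[i][l] * b[l][j]
    return result

A = [[1, 2, 3], [4, 5, 6]]                           # m=2 by k=3
B = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]    # k=3 by n=4
C = [[0] * 4 for _ in range(2)]                      # m=2 by n=4

product = gemm(1, A, B, 0, C)
for row in product:
    print(row)
# [14, 32, 50, 68]
# [32, 77, 122, 167]
```

With alpha = 1 and beta = 0 the general form reduces to the plain product A * B, which fills in the question marks of the example.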
No need to reinvent the wheel - math libraries
improve developer productivity
Program Sample
...
do i=1,m
  do j=1,n
    do l=1,k
      c(i,j)=c(i,j)+a(i,l)*b(l,j)
    end do
  end do
end do
...
or
...
call DGEMM(transa,transb,m,n,k,alpha,a,lda,b,ldb,beta,c,ldc)
...
stop
end
Compile and link against Intel MKL
On Windows, using the Intel Fortran or C/C++ compiler, simply
ifort prog.f /Qmkl
or
icl prog.c /Qmkl
For other compilers linking to MKL may be tricky – see Intel® Math Kernel Library Link Line Advisor at http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
Intel MKL supports a wide variety of mathematical computations
Linear Algebra
• BLAS
• LAPACK
• Sparse Solvers
• ScaLAPACK
Fast Fourier Transforms
• Multidimensional FFT
• FFTW interfaces
• Cluster FFT
Vector Math
• Trigonometric
• Hyperbolic
• Exponential, Logarithmic
• Power / Root
Vector Random Number Generators
• Congruential
• Wichmann-Hill
• Mersenne Twister
• Sobol
• Niederreiter
• Non-deterministic
Summary Statistics
• Kurtosis
• Variation coefficient
• Quantiles
• Order statistics
• Min/max
• Variance-covariance
Data Fitting
• Splines
• Interpolation
• Cell search
Let’s review these
domains in more detail
BLAS – Basic Linear Algebra Subprograms
De facto standard APIs since the 1980s (FORTRAN 77)
Level 1 – vector-vector operations
Level 2 – matrix-vector operations
Level 3 – matrix-matrix operations
Precisions: single, double, single complex, double complex
Matrix Types: “dense” general, packed, triangular, banded
Operation        Intel MKL Routine (D is for double)    Example            Computational complexity (work)
Vector-Vector    DAXPY                                  y = y + αx         O(N)
Matrix-Vector    DGEMV                                  y = αAx + βy       O(N²)
Matrix-Matrix    DGEMM                                  C = αA * B + βC    O(N³)
Original BLAS available at http://netlib.org/blas/
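The Level 1 and Level 2 operations in the table can be sketched in a few lines of Python. These are illustrative reference loops, not the Intel MKL routines: the names axpy, gemv, and leading_order_work are assumptions, and the operation counts are the usual leading-order estimates for N-element vectors and N-by-N matrices.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): return y + alpha * x."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]

def gemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): return alpha * A x + beta * y for a row-major matrix."""
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(a, y)
    ]

def leading_order_work(n):
    """Approximate floating point operation counts for size n (assumed estimates)."""
    return {"axpy": 2 * n, "gemv": 2 * n * n, "gemm": 2 * n ** 3}

print(axpy(2, [1, 2, 3], [10, 20, 30]))              # [12, 24, 36]
print(gemv(1, [[1, 2], [3, 4]], [1, 1], 0, [5, 5]))  # [3, 7]
print(leading_order_work(1000))
```

The counts show why Level 3 routines matter most for performance: doubling N doubles the work for AXPY, quadruples it for GEMV, and multiplies it by eight for GEMM.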
LAPACK – Linear Algebra PACKage
De facto standard API since early 1990s
Thousands of linear algebra functions
Four floating point precisions supported
Breadth of coverage:
• Matrix factorizations: the 3 Amigos – LU, Cholesky, QR
• Solving systems of linear equations
• Condition number estimates
• Singular value decomposition
• Symmetric and non-symmetric eigenvalue problems
• And much, much more
Original LAPACK is available at:
http://netlib.org/lapack/
VML - Vector Math Library
Vector-based elementary functions allow developers to balance accuracy with performance
• Arithmetic
‒ add/sub/sqrt/ ...
• Exponential and log
‒ exp/pow/log/log10
• Trigonometric and hyperbolic
‒ sin/cos/sincos/tan(h)
‒ asin/acos/atan(h)
• Rounding
‒ ceil, floor, round ...
• And many more ...
• Real and complex
• Single/double precision
• 3 accuracy modes
‒ High accuracy (almost correctly rounded)
‒ Low accuracy (2 lowest bits in error)
‒ Enhanced performance (1/2 the bits correct)
Example: y(i) = e^x(i) for i = 1 to n
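The trade-off between accuracy and work can be illustrated with a small Python sketch of the y(i) = e^x(i) example. This is not how VML is implemented and does not reproduce its accuracy modes: the truncated Taylor series is a stand-in chosen only because fewer terms mean less work and a larger error. All function names are hypothetical.

```python
import math

def exp_vector(xs):
    """Reference result: y[i] = e**x[i] using the standard library."""
    return [math.exp(x) for x in xs]

def exp_vector_series(xs, terms):
    """Cheaper stand-in: truncated Taylor series with the given number of terms."""
    out = []
    for x in xs:
        term, total = 1.0, 1.0
        for k in range(1, terms):
            term *= x / k        # term is now x**k / k!
            total += term
        out.append(total)
    return out

def max_rel_error(approx, exact):
    """Largest relative error over the whole vector."""
    return max(abs(a - e) / abs(e) for a, e in zip(approx, exact))

xs = [0.0, 0.25, 0.5, 1.0]
exact = exp_vector(xs)
for terms in (8, 16):
    print(terms, max_rel_error(exp_vector_series(xs, terms), exact))
```

On this input the 8-term version does half the work of the 16-term version and has a much larger error, which is the kind of choice the accuracy modes expose.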
Intel Instruction Set Architectures
• Intel® Streaming SIMD Extensions (Intel® SSE)
• Intel® Streaming SIMD Extensions 2 (Intel® SSE2)
• Intel® Streaming SIMD Extensions 3 (Intel® SSE3)
• Intel® Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3)
• Intel® Streaming SIMD Extensions 4 (Intel® SSE4)
• Intel® Advanced Vector Extensions (Intel® AVX)
• Intel® Advanced Vector Extensions 2 (Intel® AVX2)
The evolution of Intel® MKL
Over 18 years of strategic investment
[Timeline figure. Years marked: 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2002, 2003, 2005, 2006, 2007, 2008, 2010, 2011, 2012]
Releases and added functionality:
• Intel BLAS Library
• Intel Math Kernel Library
• 2.0 BLAS 3 threaded, 2D FFT
• 2.1 Sparse BLAS
• 3.0 LAPACK
• 4.0 Vector Math
• 6.0 DFTI & Vector Statistics
• 7.0 PARDISO* & ScaLAPACK
• 8.0 CDFT, ISS, F95
• 8.1, 9.0 Trig Transforms, Poisson
• 9.1 Trust Region, LINPACK
• 10.0/1 OOC PARDISO
• 10.2 LAPACK 3.2
• 10.3 Data Fitting
• 11.0 CNR
Platforms and instruction sets shown along the timeline: Pentium®, Itanium®, Intel® SSE, Intel® SSE2, Intel® SSE3, Intel® SSSE3, Intel® SSE4, Intel® AVX, Intel® AVX2, Intel Xeon Phi™
Job #1 is optimal performance
• We go to extremes to get the most
“differentiated” performance out of the
processor and system resources available
‒ Core: think register usage, prefetching, caches
‒ Multicore (processor/socket) level parallelization
‒ Multi-socket (node) level parallelization
‒ Clusters
‒ Data locality is key: keep your friends close but
your data closer ...
Let’s visualize this!
Intel MKL is optimized for performance
Automatic scaling from the core, to multicore,
to many core and beyond!
[Diagram labels: Sequential Intel MKL runs on single core; Intel MKL + OpenMP; Intel MKL + MPI; Intel Xeon Phi Coprocessor (PCIe)]
How we measure performance
• FLOPS – floating point operations per second
‒ 1 FLOP is one floating point operation
‒ Addition +, subtraction -, multiplication *, division /
• An “old” 2.0 GHz Intel SSE2-based Xeon quad core can do:
‒ 2 additions + 2 multiplications per cycle (via SIMD instructions)
‒ (4 cores) * (2 GHz) * (4 FLOP/cycle) = 32 GigaFLOPS (Double)
• Intel AVX-based (2010) systems double these rates (2x)
‒ Double precision: 4 additions + 4 multiplications per cycle
‒ (4 cores) * (2 GHz) * (8 FLOP/cycle) = 64 GigaFLOPS (Double)
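The peak-rate arithmetic above is a product of three factors, sketched here in Python. The function name and parameter names are illustrative; the inputs are the two configurations quoted on this slide.

```python
def peak_gflops(cores, clock_ghz, flop_per_cycle):
    """Theoretical peak in GigaFLOPS: cores * clock rate (GHz) * FLOP per cycle per core."""
    return cores * clock_ghz * flop_per_cycle

# SSE2-based quad core: 2 additions + 2 multiplications per cycle.
print(peak_gflops(4, 2.0, 4))   # 32.0
# AVX-based quad core: 4 additions + 4 multiplications per cycle.
print(peak_gflops(4, 2.0, 8))   # 64.0
```

This is an upper bound on double-precision throughput; measured performance is usually reported as a fraction of this peak.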
DGEMM performance (Intel MKL vs. ATLAS*)
Intel MKL strives to provide competitive performance
[Chart: Intel® Math Kernel Library versus ATLAS*, DGEMM on Intel® Xeon® E5-2600 Family Processor. X axis: Matrix Size (M=N=K=64, 112, 128, .., 6000, 7000, 8000). Y axis: Performance (GFlops), 0 to 350. Series: Intel MKL and ATLAS, each with 1, 8, and 16 threads.]
Intel® MKL offers significant performance boost over ATLAS*
Performance scales as number of CPU cores grows
Performance up to 87% of CPU peak!
Why does performance matter?
• Grand Challenge problems (1980s/90s)
‒ Human genome
‒ Simulating nuclear explosions
‒ Speech and vision recognition
• Beyond GigaFLOPS (GFLOPS), we have
‒ TeraFLOPS (TFLOPS): GFLOPS x 1000 = 10^12
‒ PetaFLOPS (PFLOPS): TFLOPS x 1000 = 10^15
‒ ExaFLOPS (EFLOPS): PFLOPS x 1000 = 10^18 (ETA ~2018)
‒ ZettaFLOPS (ZFLOPS): EFLOPS x 1000 = 10^21
• More FLOPS (+ technology) will enable us to
solve currently unsolvable problems
An insatiable need for FLOPS
Exascale and beyond will help address humanity’s challenges
[Chart: performance from 100 MFlops to 1 ZFlops versus year (1993, 1999, 2005, 2012, 2017, 2023, 2029), including a forecast region. Application labels: Weather Prediction, Medical Imaging, Cellular Research. Source: www.top500.org]
The big iron
• The Top500 list - the 500 most powerful commercially
available computer systems known
• Entrants are showcased (measured) using the High Performance Linpack (HPL) Benchmark
• List updated twice yearly at supercomputing conferences
• Intel MKL is key when it comes to extracting optimal
performance on Intel based systems
"The Intel® Math Kernel Library is indispensable ... a rich, highly optimized collection of math routines …
Outstanding performance is achieved on both multicore and multiprocessor systems."
Jack Dongarra (Father of Linpack Benchmark), Univ. of Tennessee
http://www.top500.org/
High Performance Linpack (HPL)
5x + 3y – 2z = 23
7x + 9y + 3z = 102
8x + 8y – 8z = 8
Generalize to Ax = b where A is huge!
Distribute blocks of A among cluster nodes
http://www.netlib.org/benchmark/hpl/
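The small system on this slide can be solved with the same basic idea that underlies the benchmark, elimination with row pivoting. The Python sketch below is a single-process illustration using exact rational arithmetic; HPL itself works in double precision on blocks of A distributed across nodes, which this sketch does not attempt. The function name solve is an assumption for illustration.

```python
from fractions import Fraction

def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (exact rationals)."""
    n = len(b)
    # Augmented matrix [A | b] with exact rational entries.
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if rows[pivot][col] == 0:
            raise ValueError("matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution on the resulting upper triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - known) / rows[i][i]
    return x

A = [[5, 3, -2], [7, 9, 3], [8, 8, -8]]
b = [23, 102, 8]
x, y, z = solve(A, b)
print(x, y, z)  # 147/26 105/26 113/13
```

Substituting back confirms the result: 5x + 3y - 2z = 598/26 = 23.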
Current Top500 list
http://www.top500.org/
[Chart: Top500 performance over time, with axis markers at TFLOP, PFLOP, and EFLOP. Annotations: Titan: #1 17.6 PFLOPS!; Lomonosov: #26 0.9 PFLOPS!; 162 PFlops]
“My current Core i5 laptop would have been the world’s fastest computer when I started at Intel in 1991”
Shane Story (Intel MKL Historian)
Intel MKL Customers
Direct Sales
Strategic End User Accounts from segments:
• Academic/Research • Animation • Life Sciences
• Financial Segment • Energy • Manufacturing
Major ISVs
Library Vendors
Open Source Projects
• NumPy
• PETSc
• Trilinos
• Eigen
• Gromacs
• Octave
• WRF
Intel MKL Poisson solver
“We want solid building blocks that we know will be robust and have
optimal performance. Intel MKL provides that. ...”
Ron Henderson, Senior R&D Manager, DreamWorks Animation*
*Quote from "A Very Good Kitty, Indeed", Intel Visual Adrenaline 2012
Intel MKL is a key ingredient of the technical
computing SW product offerings
It is also available standalone: http://software.intel.com/en-us/intel-math-kernel-library-purchase
Q & A
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Core, Intel Inside, the Intel Inside logo, Itanium, Itanium Inside, Pentium, Pentium Inside, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimer & Optimization Notice
Intel’s share of the Top500 systems
IA displaces RISC
http://www.top500.org/