Optimize Financial Applications using Intel® Math Kernel Library
Industry-Leading High Performance Math Library
Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
Intel, Core, Xeon, VTune, Cilk, Intel and Intel Sponsors of Tomorrow. and Intel Sponsors of Tomorrow. logo, and the Intel logo are trademarks of Intel
Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright ©2011 Intel Corporation.
Hyper-Threading Technology: Requires an Intel® HT Technology enabled system, check with your PC manufacturer. Performance will vary depending on the specific hardware and software used. Not available on all Intel® Core™ processors. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading
Intel® 64 architecture: Requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more information. For more information, visit http://www.intel.com/info/em64t
Intel® Turbo Boost Technology: Requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Agenda
Performance optimization using Intel software tools.
Common tasks in computational finance, and how Intel MKL may help.
– Examples in option pricing
– Example in time series modeling
– STAC* A2 financial analytics benchmark suite
Intel MKL overview:
– Components
– Top new features
More information:
– Online resources
A Systematic Approach of Performance Optimization on Intel Architectures
– Baseline.
– Intel MKL enabled: boost computational task performance with Intel MKL.
– Vectorized: compile with ICC; enhance the source code for effective use of vectorization.
– Threaded: use Intel Cilk Plus, OpenMP* or TBB.
– Scaled to many-core architectures and clusters.
ICC – Intel® C/C++ and Fortran Compilers
TBB – Intel® Threading Building Blocks
Intel software tools deliver top application performance while minimizing development, tuning and testing cost.
Common tasks in computational finance and how Intel MKL can help
Using Intel® MKL for Financial Mathematics
Financial mathematics:
– Analytical pricing models
– Stochastic approach and simulation
– FFT methods for option pricing
– Statistical inference in financial models
MKL components:
– Vector math functions
– Random number generators
– Summary statistics
– Fast Fourier Transforms
– LAPACK, BLAS
Black-Scholes and Analytical Models
Task: Option pricing using analytical models with closed-form solutions.
• The classical Black-Scholes formula.
• Challenges: very compute intensive; needs optimized transcendental math functions: erf, exp, ln, sqrt, …
How can Intel MKL help?
• Vector Math functions: vectorized, threaded.
• Supports single and double precision data types.
• Flexibility in balancing accuracy and performance.
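Example: European Option Pricing using Black-Scholes
Embarrassingly parallel
– Millions of options can be priced simultaneously.
$$C = S_0\,\Phi\!\left(\frac{\ln(S_0/K) + (r + 0.5\sigma^2)T}{\sigma\sqrt{T}}\right) - K e^{-rT}\,\Phi\!\left(\frac{\ln(S_0/K) + (r - 0.5\sigma^2)T}{\sigma\sqrt{T}}\right)$$
where $\Phi$ is the standard normal cumulative distribution function, $\Phi(x) = 0.5 + 0.5\,\mathrm{erf}(x/\sqrt{2})$.
Each option i = 0, 1, …, n has its own inputs S0[i], K[i], T[i], R[i], Sig[i]; SIMD instructions evaluate the formula for several options at once, producing C[0], C[1], …, C[n].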
European Option Pricing using Black-Scholes: Code Snippet
void BlackScholesFormula( int nopt, float r, float sig, float s0[], float x[],
                          float t[], float vcall[], float vput[] )
{
    …
    /* Loop over nb blocks of NBUF options; OpenMP distributes the blocks across threads */
    #pragma omp parallel for shared( … ) private( … )
    for ( i = 0; i < nb; i++ ) {
        …
        vmlSetMode( VML_EP );  /* VML accuracy mode: Enhanced Performance */
        …
        vsDiv( NBUF, s0, x, Div ); vsLn( NBUF, Div, Log );  /* Log = ln(S0/K) */
        for ( j = 0; j < NBUF; j++ ) {
            …
        }
        /* Exp = exp(mtr), InvSqrt = 1/sqrt(tss); presumably mtr = -r*T and tss = sig^2*T */
        vsExp( NBUF, mtr, Exp ); vsInvSqrt( NBUF, tss, InvSqrt );
        for ( j = 0; j < NBUF; j++ ) {
            /* w1 = d1/sqrt(2), w2 = d2/sqrt(2) */
            w1[j] = (Log[j] + tr[j] + tss05[j]) * InvSqrt[j] * INV_SQRT2;
            w2[j] = (Log[j] + tr[j] - tss05[j]) * InvSqrt[j] * INV_SQRT2;
        }
        vsErf( NBUF, w1, w1 ); vsErf( NBUF, w2, w2 );
        for ( j = 0; j < NBUF; j++ ) {
            /* Phi(d) = 0.5 + 0.5*erf(d/sqrt(2)) */
            w1[j] = HALF + HALF * w1[j]; w2[j] = HALF + HALF * w2[j];
            /* call price: S0*Phi(d1) - K*exp(-rT)*Phi(d2) */
            vcall[j] = s0[j] * w1[j] - _x[j] * Exp[j] * w2[j];
            …
        }
    }
    …
}
European Option Pricing using Black-Scholes: Performance
Speed-up on an Intel® Core™ i7-2600 CPU (4 cores):
Configuration | Double precision | Single precision
Baseline (libc math functions, sequential code) | 1x | 1x
Intel® MKL | 4.82x | 10.57x
Intel® MKL + OpenMP | 18.48x | 39.98x
Configuration Info - Versions: Intel® Math Kernel Library (Intel® MKL) 11.0; Hardware: Intel® Core™ i7-2600 Processor (8 MB LLC, 3.40 GHz), 4 GB of RAM; Operating System: Fedora 16 x86_64; Benchmark Source: Intel Corporation.
Stochastic Approaches and Monte Carlo
Task: Option pricing by simulating stochastic differential equations with Monte Carlo algorithms.
• Requires high-quality random number generators.
• Parallelized Monte Carlo: multiple independent random streams.
• Low-discrepancy Monte Carlo: quasi-random numbers.
How can Intel MKL help?
• A large selection of basic random number generators.
• Pseudo-random, quasi-random and non-deterministic random number generators.
• Many types of continuous and discrete distributions.
• Flexible usage models for independent streams.
Intel MKL RNG Usage Model
Common usage model:
– Initialization
status = vslNewStream(&stream, VSL_BRNG_MT19937, 7777777)
status = vslNewStream(&stream, VSL_BRNG_SOBOL, 10)
status = vslNewStreamEx(&stream, VSL_BRNG_SOBOL, nparams, params)
– Generating random numbers
status = vdRngUniform(VSL_METHOD_DUNIFORM_STD, stream, n, r, 0.0, 1.0)
status = vsRngGaussian(VSL_METHOD_SGAUSSIAN_ICDF, stream, n, r, 0.0, 1.0)
– De-initialization
status = vslDeleteStream(&stream)
Example (RNG examples can be found in Intel MKL packages)
#include "mkl_vsl.h"
#define N 1000 /* Vector size */
#define SEED 777 /* Seed for BRNG */
#define BRNG VSL_BRNG_MT19937 /* VSL BRNG */
#define METHOD VSL_METHOD_DGAUSSIAN_ICDF /* Generation method */
int main( void )
{
    double r[N], a = 0.0, sigma = 1.0;
    VSLStreamStatePtr stream;
    int errcode;
    /* Each call returns VSL_STATUS_OK on success; check errcode in production code */
    errcode = vslNewStream( &stream, BRNG, SEED ); /* Initialize random stream */
    errcode = vdRngGaussian( METHOD, stream, N, r, a, sigma ); /* Call Gaussian generator */
    errcode = vslDeleteStream( &stream ); /* De-initialize random stream */
    …
    return 0;
}
Random Streams in Parallel Computing
Generate multiple independent streams using an RNG set:
– Wichmann-Hill (WH): a set of 273 independent RNGs.
– MT2203: a set of 6024 independent RNGs.
Split a single random stream among multiple threads:
– Block-splitting: non-overlapping blocks of consecutive elements.
– Supported by MCG31m1, MRG32k3a, MCG59, WH, SOBOL, NIEDERREITER.
– Leapfrogging: disjoint interleaved subsequences, for k streams:
– 1st stream: x_1, x_{k+1}, x_{2k+1}, x_{3k+1}, …
– 2nd stream: x_2, x_{k+2}, x_{2k+2}, x_{3k+2}, …, and so on.
– Supported by MCG31m1, MCG59, WH, SOBOL, NIEDERREITER.
– Leapfrogging is recommended only when the number of streams is small.
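Both splitting schemes are cheap for a multiplicative congruential generator x_{n+1} = a·x_n mod m, because jumping ahead j steps is a single multiplication by a^j mod m. The sketch below demonstrates both on a toy MCG; its constants are assumed to match MKL's MCG31m1 (a = 1132489760, m = 2^31 − 1), and all names are hypothetical. In Intel MKL itself the corresponding operations are vslSkipAheadStream and vslLeapfrogStream applied to copies of one stream.

```c
#include <stdint.h>

/* Toy MCG x_{n+1} = a*x_n mod m; constants assumed to match MCG31m1. */
#define MCG_A 1132489760ULL
#define MCG_M 2147483647ULL   /* 2^31 - 1 */

typedef struct { uint64_t x, mult; } mcg_stream;  /* next output, step multiplier */

/* a^e mod m by square-and-multiply; all products stay below 2^62. */
static uint64_t pow_mod(uint64_t a, uint64_t e)
{
    uint64_t r = 1;
    a %= MCG_M;
    while (e) {
        if (e & 1) r = r * a % MCG_M;
        a = a * a % MCG_M;
        e >>= 1;
    }
    return r;
}

/* Return the current element and advance the stream by its multiplier. */
static uint64_t mcg_next(mcg_stream *s)
{
    uint64_t out = s->x;
    s->x = s->x * s->mult % MCG_M;
    return out;
}

/* Base sequence: x_n = seed * a^n mod m (seed must be nonzero mod m).
   Block-splitting: block b of length len starts at x_{b*len+1}, step a. */
static mcg_stream mcg_block(uint64_t seed, uint64_t b, uint64_t len)
{
    mcg_stream s = { seed % MCG_M * pow_mod(MCG_A, b * len + 1) % MCG_M, MCG_A };
    return s;
}

/* Leapfrogging: stream s of k (0-based) yields x_{s+1}, x_{s+1+k}, ...,
   i.e. it starts at x_{s+1} and steps with multiplier a^k. */
static mcg_stream mcg_leapfrog(uint64_t seed, uint64_t s, uint64_t k)
{
    mcg_stream st = { seed % MCG_M * pow_mod(MCG_A, s + 1) % MCG_M, pow_mod(MCG_A, k) };
    return st;
}
```

A leapfrog stream is itself an MCG with multiplier a^k; one common explanation for the small-k recommendation above is that the statistical quality of that derived generator is not guaranteed as k grows.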
Example: European Option Pricing Using Monte Carlo Method
Options-per-second: Higher is better.
Efficient Option Pricing using FFT
Task: Achieve real-time or near-real-time option pricing using models driven by Lévy processes.
• Obtain option prices across the whole spectrum of strikes with one set of FFT calculations.
• Involves a forward transform (density function → characteristic function) and a backward transform.
How can Intel MKL help?
• Highly optimized FFT functions.
• Single and multidimensional transforms.
• Cluster support.
Intel MKL Fast Fourier Transform (FFT)
Transform sizes: powers of 2, mixed radix, and prime sizes.
• Transforms make efficient use of memory. A transform of any size can be specified, but not all transform sizes run equally fast.
Multiple transforms in a single call.
Supports strides in data.
Cluster FFT works with various MPI implementations.
Integrated FFTW interfaces.
• FFTW3 wrappers are also built into the library.
Multivariate Time Series
Task: Modeling multivariate financial time series.
• An integral part of portfolio optimization, for example VaR (value-at-risk) and MVP (minimum variance portfolio).
• Challenges: quantifying correlations and dependencies in high-dimensional data.
• Dimension reduction methods are usually based on PCA, i.e. on the eigenvectors of sample covariance matrices.
How can Intel MKL help?
• Routines to calculate variance-covariance matrix and correlation matrix.
• Rich functionality in Eigen-solvers and matrix decomposition (via LAPACK).
Intel MKL Routines for Computing Variance-Covariance/Correlation Matrix
Part of the Summary Statistics component.
• Handles missing values and outliers.
• Supports parameterization of correlation matrices.
• Supports streaming data.
Example (more examples can be found in Intel MKL packages)
#include "mkl.h"
#define DIM 3 /* Task dimension */
#define N 10000 /* Number of observations */
int main( void )
{
    VSLSSTaskPtr task;
    MKL_INT dim, n, x_storage, cov_storage, cor_storage;
    double x[N*DIM], cov[DIM*DIM], cor[DIM*DIM], mean[DIM];
    …
    vsldSSNewTask( &task, &dim, &n, &x_storage, x, 0, 0 ); /* Create a task */
    vsldSSEditCovCor( task, mean, cov, &cov_storage, cor, &cor_storage ); /* Modify the task parameters */
    vsldSSCompute( task, VSL_SS_COR, VSL_SS_METHOD_FAST ); /* Compute statistical estimates */
    vslSSDeleteTask( &task ); /* Destroy the task */
    return 0;
}
Functions for Basic Statistical Analysis of Big Datasets
Intel MKL Summary Statistics also offers functions for:
Estimates
Raw and central moments up to the fourth order
Excess kurtosis, skewness and variation
Minimum, maximum, quantiles/streaming quantiles, and order statistics
Variance-covariance/correlation matrix
Pooled/group variance-covariance matrix and mean
Partial variance-covariance/correlation matrix
Robust estimators for variance-covariance matrix and mean in presence of outliers
Example: Online Noise Filtration
[Figure: data chunks D1, D2, …, Dk, each of size p x m(t_k), arrive at times t1, t2, …, tk and pass through the filter, which separates a signal component from a noise component; the output goes on to further analysis.]
Data arrive in chunks; each chunk D_k, received at time t_k, is a matrix of size p x m(t_k).
Major blocks of the filter:
• Update the correlation matrix using the latest data chunk D_k.
• Apply PCA(*): compute eigenvalues/eigenvectors of the correlation matrix.
• Split the eigenvalues into two sets(**): the 1st set represents signal, the 2nd set noise.
• Assemble the signal and noise correlations from the 2 sets of eigenvalues/eigenvectors.
* PCA – Principal Component Analysis.
** The split is based on random matrix theory and the distribution of eigenvalues: H. Kargupta, K. Sivakumar, and S. Ghosh. Dependency Detection in MobiMine and Random Matrices. In Proceedings of PKDD'2002, pp. 250–262. Springer-Verlag Berlin Heidelberg, 2002.
Online Noise Filtration: Code Snippets
#define P 450 /* number of stocks */
#define M 1000 /* number of observations in a block */
…
VSLSSTaskPtr task;
double x[P*M], cor[P*P], mean[P], W[2] = { 0.0, 0.0 };
MKL_INT p, m, x_storage, cor_storage;
/* Initialize VSL Summary Stats task */
p = P; m = M;
x_storage = VSL_SS_MATRIX_STORAGE_COLS;
cor_storage = VSL_SS_MATRIX_STORAGE_FULL;
vsldSSNewTask( &task, &p, &m, &x_storage, x, 0, 0 );
…
/* Set up parameters of the task: no covariance output, full-storage correlation */
vsldSSEditCovCor( task, mean, 0, 0, cor, &cor_storage );
/* Array of 2 accumulated weights, updated as blocks are processed */
vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
…
/* loop over data blocks */
for ( nblock = 0; ; nblock++ )
{
    /* Update correlation estimate in cor using the latest block in x */
    vsldSSCompute( task, VSL_SS_COR, VSL_SS_METHOD_FAST );
    /* Apply PCA: eigenvalues/eigenvectors of cor */
    dsyevr( …, l1, l2, … );
    /* Assemble correlation matrix of noise */
    ...
    dsyrk( evect_n, ..., cor_n, ... );
    ...
}
Online Noise Filtration: Performance
Online noise filtration performance (S&P 500 historical data, block size 450 x 1000):
Implementation | Seconds per block | Speedup vs. baseline
Baseline implementation (using Netlib and glibc rand) | 0.883 | 1x
Optimized implementation (using Intel® MKL) | 0.031 | 28.9x
Configuration Info - Versions: Intel® Math Kernel Library (Intel® MKL) 11.0; Hardware: Intel® Xeon® E5-2690 Processor, 2 Eight-Core CPUs (20MB LLC, 2.9GHz), 32 GB of RAM; Operating System: Fedora 16 x86_64; Benchmark Source: Intel Corporation.
Article published in the December 2012 issue of Intel Parallel Universe magazine.
Linear Equation Systems, Eigen Problems, Least Squares Problems
Tasks in computational finance frequently depend on solving these fundamental problems. For example:
• Model calibration – Least squares problems.
• Portfolio optimization – eigenvalue problems.
• Correlation-based multivariate models – LU, QR, Cholesky, SVD.
How can Intel MKL help?
• Linear Algebra PACKage (LAPACK)
• De facto industry standard interface.
• Highly optimized Intel MKL implementation.
Intel MKL LAPACK
Linear Equation Solvers: $Ax = b$
• Solving many types of systems: general, band, tridiagonal, symmetric positive definite, etc.
Least Squares Problem Solvers: $\min \|Ax - b\|_2$
• Solving linear least squares problems and generalized linear least squares problems.
Singular Value Decomposition: $A = U \Sigma V^H$
• A set of SVD algorithms for general real or complex rectangular matrices.
Eigen Solvers: $Az = \lambda z$, $z^H A = \lambda z^H$
• Solving symmetric, non-symmetric and generalized symmetric-definite eigenvalue problems.
Vectorized and threaded.
Uses Intel MKL BLAS for efficient matrix and vector operations.
Cluster support: ScaLAPACK.
Fully supports the LAPACK 3.1 specification.
A Good Example That Uses Many Intel MKL Components: STAC* A2 Benchmark Suite
A vendor-neutral benchmark suite that focuses on:
– Market risk management.
– Real-time, near-real-time models.
– Strategic backtesting, …
Intel Implementation of STAC* A2
Intel provides the first implementation
– Seamlessly scales STAC* A2 from generation to generation of Intel architectures.
– An SC’12 paper: http://www.stacresearch.com/SC12_submission_stac.pdf
Source code and performance data:
– STAC* website: http://www.stacresearch.com/a2
– Available to registered users.
12/12/2012
Intel® Parallel Studio XE – Tools for development
Intel® C++ Compiler XE
Intel® MKL (BLAS, LAPACK, RNG, summary stats, data fitting functions)
Intel® VTune™ Amplifier XE
A quick overview of Intel MKL
Broad OS and Language Support
Static and dynamic runtime libraries, 64-bit and 32-bit
OS | Windows* | Linux* | Mac OS*
Compilers | Intel, CVF, Microsoft, PGI | Intel, GNU, PGI | Intel, GNU
Libraries | .lib, .dll | .a, .so | .a, .dylib
Language Support
Domain Fortran 77 Fortran 95/99 C/C++
BLAS X X Via CBLAS
Sparse BLAS Level 1 X X Via CBLAS
Sparse BLAS level 2&3 X X X
LAPACK X X X
ScaLAPACK X
PARDISO X X X
DSS & ISS X X X
VML/VSL/DF X X X
FFT/Cluster FFT X X
Fast Poisson, Laplace, Helmholtz X X
Optimization (TR) Solvers X X X
What About Third Party Math Libraries and Tools?
Many of them are already powered by Intel MKL:
– MATLAB*, Mathematica*, NumPy*/SciPy*, NAG*, IMSL*, …
– Can be manually built/updated with the latest MKL release.
– http://software.intel.com/en-us/articles/intel-mkl-and-third-party-applications-how-to-use-them-together
But there are major benefits to using MKL directly:
– No delay in unlocking performance features of new Intel architectures.
– Access to the entire MKL functionality.
Top New Features in Intel MKL 11.0
Conditional Numerical Reproducibility (CNR)
• Run-to-run and processor-to-processor reproducibility.
• A popular request from many FSI customers.
• http://software.intel.com/en-us/articles/conditional-numerical-reproducibility-cnr-in-intel-mkl-110
Support for Intel® Xeon Phi™ coprocessors
• http://software.intel.com/en-us/articles/intel-mkl-on-the-intel-xeon-phi-coprocessors
Optimizations for Intel® AVX2 including FMA3
• http://software.intel.com/en-us/articles/haswell-support-in-intel-mkl/
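Why results can differ from run to run: floating-point addition is not associative, so a reduction split across a different number of threads, SIMD lanes, or a different code path on another processor can round differently. The toy example below, with hypothetical function names, sums the same four values sequentially and as two halves, the way a two-thread reduction might. In Intel MKL 11.0, CNR is requested through mkl_cbwr_set() or the MKL_CBWR environment variable; see the CNR article linked above for the supported settings.

```c
/* Plain left-to-right sum. */
static double sum_sequential(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}

/* Sum each half separately, then combine: a different evaluation order. */
static double sum_two_halves(const double *v, int n)
{
    double lo = 0.0, hi = 0.0;
    for (int i = 0; i < n / 2; i++) lo += v[i];
    for (int i = n / 2; i < n; i++) hi += v[i];
    return lo + hi;
}
```

For v = {1e100, 1.0, −1e100, 1.0}, the sequential sum is 1.0 while the two-halves sum is 0.0, because 1.0 is lost when it is added to ±1e100 but survives when it is added after the large terms cancel.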
Online Resources
Intel® MKL product web site:
– http://software.intel.com/en-us/articles/intel-mkl/
– Performance data and comparison
– Documentation
– User forum
Intel software development products:
– Intel® Parallel Studio XE
– Intel® Cluster Studio XE
Financial Services Industry Community:
– http://software.intel.com/en-us/financial-services
Backup Slides
MKL RNGs Performance Summary
[Chart: cycles per element (CPE; lower is better) for uniform distribution generation with Intel® MKL 11.0 on an Intel® Core™ i7-2600, in single and double precision, for the basic RNGs MCG31, MCG59, MRG32K3A, MT19937, MT2203, NIEDERR, R250, SFMT19937, SOBOL and WH. The CPE axis spans 0 to 18.]