Optimize Financial Applications using Intel® Math Kernel Library

Industry-Leading High Performance Math Library

Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

Intel, Core, Xeon, VTune, Cilk, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright ©2011 Intel Corporation.

Hyper-Threading Technology: Requires an Intel® HT Technology enabled system, check with your PC manufacturer. Performance will vary depending on the specific hardware and software used. Not available on all Intel® Core™ processors. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading

Intel® 64 architecture: Requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more information. For more information, visit http://www.intel.com/info/em64t

Intel® Turbo Boost Technology: Requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost


Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804


Agenda

Performance optimization using Intel software tools.

Common tasks in computational finance, and how Intel MKL may help.

– Examples in option pricing

– Example in time series modeling

– STAC* A2 financial analytics benchmark suite

Intel MKL overview:

– Components

– Top new features

More information:

– Online resources


A Systematic Approach to Performance Optimization on Intel Architectures

• Baseline.
• Intel MKL enabled: boost computational task performance with Intel MKL.
• Vectorized: compile with ICC; enhance source code for effective use of vectorization.
• Threaded: use Intel Cilk Plus, OpenMP* or TBB.
• Scaled to many-core architectures and clusters.

ICC: Intel® C/C++ and Fortran Compilers
TBB: Intel® Threading Building Blocks

Intel software tools deliver top application performance while minimizing development, tuning and testing cost.

Common tasks in computational finance and how Intel MKL can help


Using Intel® MKL for Financial Mathematics

Financial mathematics

Analytical pricing models

Stochastic approach and simulation

FFT methods for option pricing

Statistical inference in financial models

MKL components

Vector math functions

Random number generators

Summary statistics

Fast Fourier Transforms

LAPACK, BLAS


Black-Scholes and Analytical Models

Task: Option pricing using analytical models with closed-form solutions.

• The classical Black-Scholes formula.

• Challenges: Very compute intensive. Needs optimized transcendental math functions: erf, exp, ln, sqrt, …

How can Intel MKL help?

• Vector Math functions: vectorized, threaded.

• Supports single and double precision data types.

• Flexibility in balancing accuracy and performance.


Example: European Option Pricing using Black-Scholes

Embarrassingly parallel

– Millions of options can be priced simultaneously

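$$C = S_0\,\Phi\!\left(\frac{\ln(S_0/K) + (r + 0.5\sigma^2)T}{\sigma\sqrt{T}}\right) - K\exp(-rT)\,\Phi\!\left(\frac{\ln(S_0/K) + (r - 0.5\sigma^2)T}{\sigma\sqrt{T}}\right)$$

where $\Phi$ is the standard normal cumulative distribution function.

SIMD: each lane takes one set of inputs S0[i], K[i], T[i], R[i], Sig[i], i = 0, …, n, and produces one call price C[i].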


European Option Pricing using Black-Scholes: Code Snippet

void BlackScholesFormula( int nopt, float r, float sig, float s0[], float x[],
                          float t[], float vcall[], float vput[] )
{
    …
    /* Blocks of NBUF options are distributed across OpenMP threads */
    #pragma omp parallel for shared( … ) private( … )
    for ( i = 0; i < nb; i++ ) {
        …
        vmlSetMode( VML_EP );   /* Enhanced-performance accuracy mode */
        …
        vsDiv(NBUF, s0, x, Div);   /* Div = s0 / x */
        vsLn(NBUF, Div, Log);      /* Log = ln(Div) */
        for ( j = 0; j < NBUF; j++ ) {
            …
        }
        vsExp(NBUF, mtr, Exp);           /* Exp = exp(mtr) */
        vsInvSqrt(NBUF, tss, InvSqrt);   /* InvSqrt = 1 / sqrt(tss) */
        for ( j = 0; j < NBUF; j++ ) {
            w1[j] = (Log[j] + tr[j] + tss05[j]) * InvSqrt[j] * INV_SQRT2;
            w2[j] = (Log[j] + tr[j] - tss05[j]) * InvSqrt[j] * INV_SQRT2;
        }
        vsErf(NBUF, w1, w1);   /* In-place erf */
        vsErf(NBUF, w2, w2);
        for ( j = 0; j < NBUF; j++ ) {
            /* Normal CDF from erf: 0.5 + 0.5 * erf(d / sqrt(2)) */
            w1[j] = HALF + HALF * w1[j];
            w2[j] = HALF + HALF * w2[j];
            vcall[j] = s0[j] * w1[j] - x[j] * Exp[j] * w2[j];
            …
        }
    }
    …
}


European Option Pricing using Black-Scholes: Performance

Speed-up on Intel® Core™ i7-2600 CPU (4 cores):

Configuration                                     Double precision   Single precision
Baseline (libc math functions, sequential code)   1x                 1x
Intel® MKL                                        4.82x              10.57x
Intel® MKL + OpenMP                               18.48x             39.98x

Configuration Info - Versions: Intel® Math Kernel Library (Intel® MKL) 11.0; Hardware: Intel® Core™ i7-2600 Processor (8 MB LLC, 3.40 GHz), 4 GB of RAM; Operating System: Fedora 16 x86_64; Benchmark Source: Intel Corporation.


Stochastic Approaches and Monte Carlo

Task: Option pricing by simulating stochastic differential equations with Monte Carlo algorithms.

• Requires high-quality random number generators.
• Parallelized Monte Carlo: multiple independent random streams.
• Low-discrepancy Monte Carlo: quasi-random numbers.

How can Intel MKL help?

• A large selection of basic random number generators.
• Pseudo-random, quasi-random, and non-deterministic random generators.
• Many types of continuous and discrete distributions.
• Flexible usage models for independent streams.


Intel MKL RNG Usage Model

Common usage model

– Initialization
    status = vslNewStream(&stream, VSL_BRNG_MT19937, 7777777)
    status = vslNewStream(&stream, VSL_BRNG_SOBOL, 10)
    status = vslNewStreamEx(&stream, VSL_BRNG_SOBOL, nparams, params)

– Generating random numbers
    status = vdRngUniform(VSL_METHOD_DUNIFORM_STD, stream, n, r, 0.0, 1.0)
    status = vsRngGaussian(VSL_METHOD_SGAUSSIAN_ICDF, stream, n, r, 0.0, 1.0)

– De-initialization
    status = vslDeleteStream(&stream)

Example (RNG examples can be found in Intel MKL packages)

#include "mkl_vsl.h"

#define N 1000                            /* Vector size */
#define SEED 777                          /* Seed for BRNG */
#define BRNG VSL_BRNG_MT19937             /* VSL BRNG */
#define METHOD VSL_METHOD_DGAUSSIAN_ICDF  /* Generation method */

int main(void)
{
    double r[N], a = 0.0, sigma = 1.0;
    VSLStreamStatePtr stream;
    int errcode;

    errcode = vslNewStream( &stream, BRNG, SEED );              /* Initialize random stream */
    errcode = vdRngGaussian( METHOD, stream, N, r, a, sigma );  /* Call Gaussian generator */
    errcode = vslDeleteStream( &stream );                       /* De-initialize random stream */
    …
}


Random Streams in Parallel Computing

Generate multiple independent streams using an RNG set.

– Wichmann-Hill: A set of 273 independent RNGs.
– MT2203: A set of 6024 independent RNGs.

Split a single random stream among multiple threads.

– Block-splitting: non-overlapping blocks.
    Supported by MCG31m1, MRG32k3a, MCG59, WH, SOBOL, NIEDERREITER.
– Leapfrogging: disjoint sequences.
    1st stream: x_1, x_{k+1}, x_{2k+1}, x_{3k+1}, ...
    2nd stream: x_2, x_{k+2}, x_{2k+2}, x_{3k+2}, ..., and so on.
    Supported by MCG31m1, MCG59, WH, SOBOL, NIEDERREITER.
    Leapfrogging is good only when the number of streams is small.
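The leapfrog idea can be illustrated with a small multiplicative congruential generator in plain C. This is a sketch only: it does not call Intel MKL, the generator uses the Park-Miller "minstd" constants as a stand-in for the MKL BRNGs, and the type and function names (`lcg_stream`, `lcg_new`, `lcg_next`, `lcg_leapfrog`) are hypothetical. Stream number idx out of k starts idx steps into the base sequence and then advances k steps at a time by using the multiplier A^k mod M.

```c
#include <stdint.h>

#define LCG_M 2147483647ULL   /* modulus 2^31 - 1 */
#define LCG_A 48271ULL        /* Park-Miller "minstd" multiplier (stand-in) */

typedef struct {
    uint64_t x;     /* current element of the sequence */
    uint64_t mult;  /* multiplier applied on each step */
} lcg_stream;

/* Sequential stream: x_{n+1} = A * x_n mod M. */
lcg_stream lcg_new(uint64_t seed)
{
    lcg_stream s = { seed % LCG_M, LCG_A };
    if (s.x == 0) s.x = 1;   /* zero is a fixed point of a multiplicative LCG */
    return s;
}

/* Return the current element and advance the stream by one step. */
uint64_t lcg_next(lcg_stream *s)
{
    uint64_t out = s->x;
    s->x = (s->x * s->mult) % LCG_M;   /* operands < 2^31, product fits in 64 bits */
    return out;
}

/* Leapfrog stream idx of k (0-based): yields x_idx, x_{idx+k}, x_{idx+2k}, ... */
lcg_stream lcg_leapfrog(uint64_t seed, unsigned idx, unsigned k)
{
    lcg_stream s = lcg_new(seed);
    for (unsigned i = 0; i < idx; i++) lcg_next(&s);   /* skip to x_idx */
    uint64_t ak = 1;
    for (unsigned i = 0; i < k; i++) ak = (ak * LCG_A) % LCG_M;   /* A^k mod M */
    s.mult = ak;
    return s;
}
```

With k threads, thread t would call `lcg_leapfrog(seed, t, k)`; interleaving the k outputs reproduces the single base sequence, so no two threads ever see the same element.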


Example: European Option Pricing Using Monte Carlo Method

Options-per-second: Higher is better.


Efficient Option Pricing using FFT

Task: Achieve real-time or near-real-time option pricing using Lévy process models.

• Obtain option prices across the whole spectrum of strikes with one set of FFT calculations.
• Involves a forward transform (density function to characteristic function) and a backward transform.

How can Intel MKL help?

• Highly optimized FFT functions.
• Single and multidimensional transforms.
• Cluster support.


Intel MKL Fast Fourier Transform (FFT)

Transform sizes: powers of 2, mixed radix, prime sizes.

• Transforms provide for efficient use of memory. Any size transform can be specified, but not all transform sizes run equally fast.

Multiple transforms in a single call.

Supports strides in data.

Cluster FFT works with various MPI implementations.

Integrated FFTW interfaces.

• FFTW3 wrappers are also built into the library.


Multivariate Time Series

Task: Modeling multivariate financial time series.

• An integral part in portfolio optimization, for example, VaR (value-at-risk) and MVP (minimum variance portfolio).

• Challenges: Quantifying correlations and dependences for high dimensional data.

• Dimension reduction methods are usually based on PCA and eigenvectors of sample covariance matrices.

How can Intel MKL help?

• Routines to calculate the variance-covariance matrix and the correlation matrix.
• Rich functionality in eigensolvers and matrix decomposition (via LAPACK).


Intel MKL Routines for Computing Variance-Covariance/Correlation Matrix

Part of the Summary Statistics component.

• Handles missing values and outliers.

• Supports parameterization of correlation matrices.

• Supports streaming data.

Example (more examples can be found in Intel MKL packages)

#include "mkl.h"

#define DIM 3      /* Task dimension */
#define N   10000  /* Number of observations */

int main()
{
    VSLSSTaskPtr task;
    MKL_INT dim, n, x_storage, cov_storage, cor_storage;
    double x[N*DIM], cov[DIM*DIM], cor[DIM*DIM], mean[DIM];
    …
    vsldSSNewTask( &task, &dim, &n, &x_storage, x, 0, 0 );                  /* Create a task */
    vsldSSEditCovCor( task, mean, cov, &cov_storage, cor, &cor_storage );   /* Modify the task parameters */
    vsldSSCompute( task, VSL_SS_COR, VSL_SS_METHOD_FAST );                  /* Compute statistical estimates */
    vslSSDeleteTask( &task );                                               /* Destroy the task */
    return 0;
}


Functions for Basic Statistical Analysis of Big Datasets

Intel MKL Summary Statistics also offers functions for the following estimates:

• Raw and central moments up to the fourth order
• Excess kurtosis, skewness and variation
• Minimum, maximum, quantiles/streaming quantiles, and order statistics
• Variance-covariance/correlation matrix
• Pooled/group variance-covariance matrix and mean
• Partial variance-covariance/correlation matrix
• Robust estimators for variance-covariance matrix and mean in presence of outliers


Example: Online Noise Filtration

Data arrive in chunks D1, D2, …, Dk at times t1, t2, …, tk; each chunk Di is a matrix of size p x m(ti). The filter splits the data into a signal component and a noise component, which are passed on to further analysis.

Major blocks of the filter:

• Update the correlation matrix using the latest data chunk Dk.
• Apply PCA(*): compute the eigenvalues and eigenvectors of the correlation matrix.
• Split the eigenvalues into two sets(**): the first set represents signal, the second noise.
• Assemble the signal and noise correlation matrices from the two sets of eigenvalues and eigenvectors.

* PCA: Principal Component Analysis.

** The split is based on random matrix theory and the distribution of eigenvalues: H. Kargupta, K. Sivakumar, and S. Ghosh. Dependency Detection in MobiMine and Random Matrices. In Proceedings of PKDD'2002, pp. 250–262. Springer-Verlag Berlin Heidelberg, 2002.


Online Noise Filtration: Code Snippets

#define P 450    /* Number of stocks */
#define M 1000   /* Number of observations in a block */
…
VSLSSTaskPtr task;
double x[P*M], cor[P*P], mean[P], W[2];
MKL_INT p, m, x_storage, cor_storage;

/* Initialize VSL Summary Stats task */
p = P; m = M;
x_storage = VSL_SS_MATRIX_STORAGE_COLS;
vsldSSNewTask( &task, &p, &m, &x_storage, x, 0, 0 );
…
/* Set up parameters of the task */
cor_storage = VSL_SS_MATRIX_STORAGE_FULL;
vsldSSEditCovCor( task, mean, 0, 0, cor, &cor_storage );
vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
…
/* Loop over data blocks */
for ( nblock = 0; ; nblock++ )
{
    /* Update correlation estimate in cor */
    vsldSSCompute( task, VSL_SS_COR, VSL_SS_METHOD_FAST );

    /* Apply PCA */
    dsyevr(…,l1, l2, …);

    /* Assemble correlation matrix of noise */
    ...
    dsyrk( evect_n, ..., cor_n,... );
    ...
}


Online Noise Filtration: Performance

S&P500 historic data, block size 450 x 1000:

Implementation                                          Seconds per block   Speedup vs. baseline
Baseline implementation (using Netlib and glibc rand)   0.883               1x
Optimized implementation (using Intel® MKL)             0.031               28.9x

Configuration Info - Versions: Intel® Math Kernel Library (Intel® MKL) 11.0; Hardware: Intel® Xeon® E5-2690 Processor, 2 Eight-Core CPUs (20MB LLC, 2.9GHz), 32 GB of RAM; Operating System: Fedora 16 x86_64; Benchmark Source: Intel Corporation.

Article published in the December 2012 issue of Intel Parallel Universe magazine.

http://software.intel.com/en-us/intel-parallel-universe-magazine


Linear Equation Systems, Eigen Problems, Least Squares Problems

Tasks in computational finance frequently depend on solving these fundamental problems. For example:

• Model calibration – Least squares problems.

• Portfolio optimization – Eigen problems.

• Correlation-based multivariate models – LU, QR, Cholesky, SVD.

How can Intel MKL help?

• Linear Algebra PACKage (LAPACK)
• De facto industry standard interface.
• Highly optimized Intel MKL implementation.


Intel MKL LAPACK

Linear Equations Solvers: $Ax = b$

• Solves many types of systems: general, band, tridiagonal, symmetric positive definite, etc.

Least Squares Problem Solvers: $\min_x \|Ax - b\|_2$

• Solves linear least squares problems and generalized linear least squares problems.

Singular Value Decomposition: $A = U \Sigma V^H$

• A set of SVD algorithms for general real or complex rectangular matrices.

Eigen Solvers: $Az = \lambda z$, $z^H A = \lambda z^H$

• Solves symmetric, non-symmetric, and generalized symmetric-definite eigenvalue problems.

Vectorized and threaded.

Uses Intel MKL BLAS for efficient matrix and vector operations.

Cluster support: ScaLAPACK.

Fully supports the LAPACK 3.1 specification.


A Good Example That Uses Many Intel MKL Components: STAC* A2 Benchmark Suite

A vendor-neutral benchmark suite that focuses on:

– Market risk management.
– Real-time and near-real-time models.
– Strategic backtesting, …


Intel Implementation of STAC* A2

Intel provides the first implementation:

– Seamlessly scales STAC* A2 from generation to generation of Intel architectures.
– An SC’12 paper: http://www.stacresearch.com/SC12_submission_stac.pdf

Source code and performance data:

– STAC* website: http://www.stacresearch.com/a2

– Available to registered users.

12/12/2012

Intel® Parallel Studio XE – Tools for development

• Intel® C++ Compiler XE
• Intel® MKL (BLAS, LAPACK, RNG, summary stats, data fitting functions)
• Intel® VTune™ Amplifier XE

A quick overview of Intel MKL


Broad OS and Language Support

Static and dynamic runtime libraries, 64-bit and 32-bit

            Windows*                     Linux*            Mac OS*
Compiler    Intel, CVF, Microsoft, PGI   Intel, GNU, PGI   Intel, GNU
Libraries   .lib, .dll                   .a, .so           .a, .dylib

Language Support

Domain                             Fortran 77   Fortran 95/99   C/C++
BLAS                               X            X               Via CBLAS
Sparse BLAS Level 1                X            X               Via CBLAS
Sparse BLAS Level 2&3              X            X               X
LAPACK                             X            X               X
ScaLAPACK                          X
PARDISO                            X            X               X
DSS & ISS                          X            X               X
VML/VSL/DF                         X            X               X
FFT/Cluster FFT                                 X               X
Fast Poisson, Laplace, Helmholtz                X               X
Optimization (TR) Solvers          X            X               X


What About Third Party Math Libraries and Tools?

Many of them are already powered by Intel MKL:

– MATLAB*, Mathematica*, NumPy*/SciPy*, NAG*, IMSL*, …

– Can be manually built/updated with the latest MKL release.

– http://software.intel.com/en-us/articles/intel-mkl-and-third-party-applications-how-to-use-them-together

But there are major benefits to using MKL directly:

– No delay in unlocking performance features of new Intel architectures.

– Access to the entire MKL functionality.


Top New Features in Intel MKL 11.0

Conditional Numerical Reproducibility (CNR)

• Run-to-run and processor-to-processor reproducibility.

• A popular request by many FSI customers.

• http://software.intel.com/en-us/articles/conditional-numerical-reproducibility-cnr-in-intel-mkl-110

Support for Intel® Xeon Phi™ coprocessors

• http://software.intel.com/en-us/articles/intel-mkl-on-the-intel-xeon-phi-coprocessors

Optimizations for Intel® AVX2 including FMA3

• http://software.intel.com/en-us/articles/haswell-support-in-intel-mkl/


Online Resources

Intel® MKL product web site:

– http://software.intel.com/en-us/articles/intel-mkl/
– Performance data and comparison
– Documentation
– User forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/

Intel software development products:

– Intel® Parallel Studio XE: http://software.intel.com/en-us/intel-parallel-studio-xe
– Intel® Cluster Studio XE: http://software.intel.com/en-us/intel-cluster-studio-xe

Financial Services Industry Community:

– http://software.intel.com/en-us/financial-services

Backup Slides


MKL RNGs Performance Summary

Cycles-per-element (CPE): Lower is better.

[Chart: Intel® MKL 11.0 uniform distribution generation on Intel® Core™ i7-2600. CPE (axis range 0 to 18) in single and double precision for the basic RNGs MCG31, MCG59, MRG32K3A, MT19937, MT2203, NIEDERR, R250, SFMT19937, SOBOL, and WH.]

