Post on 30-May-2018
SIMULATION IN A BIG DATA WORLD
Bojan Nikolic
BN Algorithms Ltd
London 6th July 2017
London -- 06/07/2017 BOJAN NIKOLIC -- BN ALGORITHMS LTD
LEGAL DISCLAIMER
1. No warranty, express or implied, to the fitness of the presented architecture or code for any purpose
2. Use entirely at your own risk – BN Algorithms or Bojan Nikolic shall not be responsible for any damages direct or
consequential
3. © Bojan Nikolic 2017, All Rights Reserved. Code sections present in this presentation are licensed to the reader
under the QuantLib open source license.
4. Apache Spark and Apache Zeppelin are trademarks of the Apache Software Foundation
5. AWS EMR is a product of Amazon Inc
Using “Big Data” technologies to scale up simulation workloads
Concrete technologies: Apache Spark + Zeppelin stack
Concrete example workload: valuation of financial derivatives using QuantLib
THIS TALK IS ABOUT…
• Reuse the investment in Big Data technologies in a different, large field:
  • Training
  • User interfaces
  • Cloud infrastructures / own data centre deployments
• Opportunity to combine big data analysis and simulations
• Opportunity for technology transfer from simulations into big data
THIS IS (HOPEFULLY!) OF INTEREST BECAUSE…
Career in Finance & Science connected by computing:
Design of PetaFLOPS+/PetaByte+ computing systems to process radio astronomy data
Grid/Cloud Risk Management systems for financial derivatives
Technologies enabling novel large radio astronomy telescopes
ABOUT ME…
Why derivatives? Simplify business, diversify risk
(But can equally be used to amplify risk)
FINANCIAL DERIVATIVES
Derivative = A security (contract) whose value depends in a non-linear way on
a price in an open market
Example: European Call Option at maturity: $P(S) = \begin{cases} 0 & S \le K \\ S - K & S > K \end{cases}$
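The payoff above can be written as a one-line function. This is a minimal sketch, assuming S is the market price of the underlying at maturity and K is the strike; the names `call_payoff`, `spot` and `strike` are illustrative and do not come from the talk.

```python
def call_payoff(spot: float, strike: float) -> float:
    """Payoff at maturity of a European call: zero for S <= K, S - K for S > K."""
    return max(spot - strike, 0.0)


# Illustrative values only: a strike of 100 and three possible prices at maturity.
strike = 100.0
payoffs = [call_payoff(s, strike) for s in (80.0, 100.0, 120.0)]
print(payoffs)  # [0.0, 0.0, 20.0]
```

The kink at S = K is the non-linearity referred to in the definition of a derivative.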
What does the business want?
1. Short Time-To-Solution, Reliably (Low Max Power)
2. Low Capital Expenditure
3. Low Total Cost of ownership
4. High Degree of flexibility
ARCHITECTURAL DRIVERS
ARCHITECTURE
[Diagram: components Boost C++ Libraries, QuantLib, QuantLib-SWIG, Apache Spark, Apache Zeppelin. Key: non-domain-specific module; finance-specific module; an arrow from X to Y means “X” uses “Y”.]
1. Project started in 2000 – now on 17 years of active development
2. C++ / Object Oriented Architecture, 2000s vintage
3. BSD-like license
4. In commercial use at a number of institutions (generally they are shy of publicising their use)
5. Designed primarily for single threaded use
ABOUT QUANTLIB
OO design patterns enforce a distribution topology and organisation
1. Usually very inefficient to scale-up in this enforced topology/organisation
2. Difficult to ensure reliability
OBJECT ORIENTATION VS BIG DATA TECHNOLOGIES
Object Orientation == “Dataless Programming”
Not a good match for “Big Data” Technologies!
TASK GRANULARITY-COUPLING PLANE
[Figure: plane with Task Granularity (Fine Grained to Coarse Grained) on the vertical axis and Task Coupling (Loosely Coupled to Strongly Coupled) on the horizontal axis. Workloads shown: Derivatives Risk Analysis, MCMC Optimisation, Hydrodynamic Simulation, Deterministic Convex Problem Optimisation.]
1. Manually aggregate a fixed number of valuations into a single task
2. Each of these tasks recreates all the necessary objects
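The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the names `make_tasks` and `run_task`, the batch size of 10 and the toy payoff used as the "valuation" are all hypothetical, and the talk itself uses QuantLib objects inside Spark tasks rather than this stand-in.

```python
from typing import Iterator, List, Sequence


def make_tasks(scenarios: Sequence[float], batch_size: int) -> Iterator[List[float]]:
    """Group scenarios into fixed-size batches; each batch is one coarse-grained task."""
    for start in range(0, len(scenarios), batch_size):
        yield list(scenarios[start:start + batch_size])


def run_task(batch: List[float]) -> List[float]:
    """Run one task: rebuild what it needs, then value every scenario in the batch."""
    # Hypothetical stand-in for recreating curves, models and instruments in the task.
    strike = 100.0

    def value(spot: float) -> float:
        return max(spot - strike, 0.0)

    return [value(spot) for spot in batch]


scenarios = [90.0 + i for i in range(25)]  # 25 illustrative spot prices
tasks = list(make_tasks(scenarios, batch_size=10))
results = [v for task in tasks for v in run_task(task)]
```

Because every task rebuilds its own objects, tasks share nothing and can be scheduled on any worker; the cost of the rebuild is spread over all the valuations in the batch.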
TASK STRATEGY
Sample Application:
Calculate model value for a set of 100 swaptions for a wide range of model market conditions
QUANTLIB SWIG BINDING ARCHITECTURE
[Diagram: components QuantLib (C++ API), QuantLib-SWIG interface definitions, C++ adapters, C / JNI, Java classes (JVM bytecode), Apache Spark. Key: non-domain-specific module; finance-specific module; one arrow type means “X” uses “Y”; the other means “Y” is auto-generated from “X” using SWIG.]
SPARK ZEPPELIN ARCHITECTURE – COMPONENT & CONNECTOR VIEW
[Diagram: run-time components HTML/JS Client, Apache Zeppelin, Driver, Cluster Scheduler and three Workers. Key: run-time component; an arrow from X to Y means “X” sends requests to “Y”, and “Y” replies asynchronously.]
Build QuantLib & Java bindings for the AWS EMR runtime environment
Set up & spin up an EMR cluster
Construct a Zeppelin notebook with Scala/Spark QuantLib simulation/valuation
Explore/visualize the simulation results
STRATEGY
HOW TO BUILD QUANTLIB FOR AMAZON EMR:
1. Single standalone shared (*.so) library, compatible with GCC 4.8 used by AWS EMR
2. SWIG Java bindings (for use via Scala/Spark) -> Single .JAR
3. The Nix system is the recommended way to build!
4. Pre-built example used here:
   1. Shared Lib: https://s3.amazonaws.com/bnalgo-ql-emr-77x45/libQuantLibJNI.so
   2. JAR: https://s3.amazonaws.com/bnalgo-ql-emr-77x45/QuantLib.jar
1. Standard AWS EMR Zeppelin + Spark (here using EMR AMI V5.7)
2. Bootstrap actions to add QuantLib: s3://bnalgo-ql-emr-77x45/qlemrbootstrap.sh
3. Spin-up & all ready-to-go (in 5 mins!)
CLUSTER AMAZON EMR START-UP
Scala/Spark QuantLib valuation function
• Spark does not know how to distribute QuantLib values
• It does know how to distribute Scala closures, including ones that use QuantLib types
  -> Write functions which close over an environment without any QuantLib values
QuantLib Global State
• Spark does not distribute or synchronise the QuantLib global state
• Write in a functional style: set global state in each valuation function
• Use executor-cores = 1 to separate global state between tasks
Spark distributed Map operation
• The input RDD is the set of simulation input parameters (i.e., scenarios)
• The map function is the QuantLib valuation function
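The shape of such a valuation function can be sketched in plain Python. This is a sketch under stated assumptions and not the code from the talk: the `Scenario` fields are hypothetical, a Black-Scholes call price stands in for a QuantLib valuation, and Python's built-in `map` stands in for the Spark map over the RDD of scenarios.

```python
import math
from typing import NamedTuple


class Scenario(NamedTuple):
    """Plain-data description of one simulated market condition (hypothetical fields)."""
    spot: float
    rate: float
    vol: float


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def value_call(s: Scenario, strike: float = 100.0, maturity: float = 1.0) -> float:
    """Black-Scholes value of a European call, standing in for a QuantLib valuation.

    Everything the valuation needs is built inside the function from plain
    numbers, so nothing library-specific has to be shipped with the closure.
    Any global setting (for example an evaluation date) would also be set here.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s.spot / strike) + (s.rate + 0.5 * s.vol ** 2) * maturity) / (s.vol * sqrt_t)
    d2 = d1 - s.vol * sqrt_t
    return s.spot * norm_cdf(d1) - strike * math.exp(-s.rate * maturity) * norm_cdf(d2)


# The input collection holds only plain scenario parameters.
scenarios = [Scenario(spot=sp, rate=0.01, vol=v) for sp in (90.0, 100.0, 110.0) for v in (0.1, 0.2)]
values = list(map(value_call, scenarios))
```

Keeping the scenario as plain numbers is what makes the function easy to distribute: the only things that cross the network are the parameters going in and the values coming back.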
SCALA/SPARK APPLICATION: THE THREE TRICKY BITS
1. For many traditional “Big-Data” applications $\rho \sim 1$ to $10$
2. In this example: $\rho \gg 10^{6}$
SIMULATION OR BIG DATA ?
$\rho = \dfrac{\text{Number of executed CPU operations}}{\text{Number of bytes of input from storage or network}}$
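As a rough illustration of how the ratio ρ separates the two regimes, the sketch below evaluates it for two sets of made-up figures. The operation and byte counts are hypothetical, chosen only to land in the ranges quoted in the talk; they are not measurements.

```python
def compute_intensity(cpu_operations: float, input_bytes: float) -> float:
    """Ratio rho: executed CPU operations per byte of input from storage or network."""
    return cpu_operations / input_bytes


# Hypothetical figures, for illustration only.
# A data scan doing a few operations per byte read:
scan_rho = compute_intensity(cpu_operations=5e9, input_bytes=1e9)
# A valuation driven by a scenario record of about a kilobyte:
simulation_rho = compute_intensity(cpu_operations=1e12, input_bytes=1e3)

print(scan_rho, simulation_rho)
```

With figures like these the scan is limited by how fast data can be read, while the simulation is limited almost entirely by CPU time.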
1. Scale-up
2. Resilience
3. Use existing internal infrastructure or public cloud
4. Load balance against other analytics work
5. Results are stored in an environment ideally suited for further analysis and visualisation
6. Reproducibility
7. Collaboration with remote colleagues
WHAT HAVE WE ACHIEVED?