TRANSCRIPT
Big Data & Rocket Fuel
Dr Raj Subramani, HSBC
Reza Rokni, Google Cloud, Solutions Architect
Adrian Poole, Google Cloud
Eight cloud products with one billion users
Organize the world’s information and make it universally accessible and useful
Google’s Mission
18 years of Google R&D / Investment
Chart: marginal cost of change ($) against increasing complexity of systems and processes
Curves: Traditional Architectures; Google Cloud Native Architectures (GCP)
Labels: Prohibitively Expensive; Increasing Marginal Cost of Change
Containers at Google
Chart: Core Ops Team and number of running jobs, 2004 to 2016
Enabled Google to grow our fleet over 10x faster than we grew our ops team
Google’s innovation in data
Timeline (axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013, 2016): GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, Pub/Sub, F1, Dataflow, TensorFlow
Proprietary + Confidential
Google’s innovation in data
Timeline (axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013, 2016): GCS, Dataproc, Bigtable, BigQuery, Dataflow, Datastore, Pub/Sub, NoSQL, Spanner, Cloud ML
Now available on Google Cloud Platform
Compute: Compute Engine, App Engine, Container Engine
Storage & Databases: Storage, Cloud SQL, Bigtable, Spanner, Datastore
Big Data: BigQuery, Pub/Sub, Dataflow, Dataproc, Datalab
Machine Learning: Machine Learning, Speech API, Translate API, Vision API
● Democratise ML
● Big datasets beat fancy algorithms
● Good Models
● Lots of compute
Lesson of the last 10 years...
Google BigQuery
BigQuery is Google's fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics. BigQuery is serverless. There is no infrastructure to manage and you don't need a database administrator, so you can focus on analyzing data to find meaningful insights using familiar SQL. BigQuery is a powerful Big Data analytics platform used by all types of organizations, from startups to Fortune 500 companies.
Simple: Fully Managed and Serverless
Convenient: MB to PB scale, fast, with the convenience of SQL
Secure: Encrypted, Durable and Highly Available
What is Cloud Dataflow?
Intelligently scales to millions of QPS
Open source programming model
Unified batch and streaming processing
Fully managed, no-ops data processing
Google Cloud Dataflow
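The "unified batch and streaming" point can be illustrated with a minimal sketch in plain Python. It does not use the Dataflow SDK; the function name `windowed_sums`, the helper `replay`, the event layout and the 60-second window are all assumptions made for illustration. The idea it shows is only that the same windowing logic can be written once and fed either a bounded collection or an iterator that delivers events one at a time.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event layout: (event_time_seconds, key, value)
Event = Tuple[float, str, float]


def windowed_sums(events: Iterable[Event],
                  window_seconds: float = 60.0) -> Dict[Tuple[int, str], float]:
    """Sum values per (fixed window index, key).

    Accepts any iterable, so the caller can pass a list (bounded data)
    or a generator that yields events as they arrive.
    """
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for event_time, key, value in events:
        window_index = int(event_time // window_seconds)
        totals[(window_index, key)] += value
    return dict(totals)


def replay(events: List[Event]) -> Iterator[Event]:
    """Stand-in for a stream: yields the events one by one."""
    for event in events:
        yield event


batch = [(5.0, "USD", 1.0), (30.0, "USD", 2.0), (65.0, "USD", 4.0), (70.0, "EUR", 8.0)]
batch_result = windowed_sums(batch)            # bounded input
stream_result = windowed_sums(replay(batch))   # same logic, streamed input
```

A real streaming runner would emit each window's result as watermarks advance rather than waiting for the input to end; the sketch leaves that out and only shows the shared logic.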
Big Data at HSBC Scale
Dr Raj Subramani, HSBC
Fundamental Review of the Trading Book (FRTB)
● The Basel Committee on Banking Supervision (BCBS) conducted two assessments (the Regulatory Consistency Assessment Programme, February and December 2013) of capital charges for market risk in the trading books of institutions with approved internal models
● The significant differences in capital charges confirmed that the market risk framework was in need of reform
● The regulations, in their final form, were published in January 2016
● National supervisors are expected to finalize implementation by January 2019
● Banks are expected to report under the new standards by end of 2019
Fundamental Review of the Trading Book
FRTB
● Trading Book and Banking Book boundary
● Treatment of credit (securitised vs non-securitised)
● Approach to risk management (VaR to Expected Shortfall)
● Incorporation of liquidity horizons
● Treatment of hedging and diversification
● Relationship between Internal Model (IM) and Standardized Approach (SA)
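The move from VaR to Expected Shortfall can be made concrete with a small historical-simulation sketch. This is one simple convention among several, not the regulatory calculation: the function name `var_and_es` and the sample P&L values are made up for illustration, and the 97.5% default follows the confidence level FRTB uses for Expected Shortfall. VaR is taken as the smallest loss inside the tail, and Expected Shortfall as the average of all losses in that tail.

```python
import math
from typing import Sequence, Tuple


def var_and_es(pnl: Sequence[float], confidence: float = 0.975) -> Tuple[float, float]:
    """Return (VaR, Expected Shortfall) as positive loss numbers.

    pnl holds one profit-and-loss figure per scenario (negative = loss).
    """
    if not pnl:
        raise ValueError("pnl must contain at least one scenario")
    # Convert P&L to losses and put the worst scenario first.
    losses = sorted((-x for x in pnl), reverse=True)
    # Number of scenarios in the tail; round() guards against float noise
    # such as 200 * 0.025 evaluating to slightly more than 5.
    n_tail = max(1, math.ceil(round(len(losses) * (1.0 - confidence), 9)))
    tail = losses[:n_tail]
    value_at_risk = tail[-1]                    # threshold loss of the tail
    expected_shortfall = sum(tail) / len(tail)  # average loss beyond the threshold
    return value_at_risk, expected_shortfall


# Made-up scenarios: P&L of -100, -99, ..., 99
sample_pnl = [float(x) for x in range(-100, 100)]
sample_var, sample_es = var_and_es(sample_pnl)
```

Because Expected Shortfall averages the whole tail, it is never smaller than VaR at the same confidence level, which is why it is more sensitive to extreme losses.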
Working in the Cloud – the tradeoffs
Technology outcomes
● Business-focused IT solution
● Access to latest technology
● Rapid prototyping
● Quicker time to market
Cost outcomes
● Reduced capacity lag
● Scalability and performance
● Reduced total cost of ownership
Governance risks
● Internal security clearance
● Regulatory approval
● Data sharing across borders
● Geo-political issues
Public Cloud risks
● Data security risks
● Lock-in risks
● Third party dependency risks
Cloud Dataflow
Diagram: data moves from SOURCE to SINK through Cloud Dataflow. Labels on the diagram:
● Compute and storage
● Unbounded
● Bounded
● Resource management
● Resource auto-scaler
● Dynamic work rebalancer
● Work scheduler
● Monitoring
● Log collection
● Graph optimization
● Auto-healing
● Intelligent watermarking
Diagram: risk pipeline on Google Cloud
● Trade & Market Data: transferred to the Cloud (batch or stream)
● Storage and Pub/Sub: Market Data and Trade Data, unbounded and bounded
● Dataflow: analytics
● BigQuery: store results
● Post Processing: post-process
The Anatomy of a Risk Engine
Data distribution and workflow across the analytics
● 2 million (dummy) plain vanilla mono-currency interest rate swaps in 12 currencies
● Dummy interest rate market data built from Bonds, Futures and Swaps
● Analytics was open source QuantLib (C++ compiled on Linux)
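To give a feel for the per-trade calculation behind these numbers, here is a minimal sketch of pricing a plain vanilla payer swap. It is not QuantLib and not the engine described in the talk: it assumes a single flat, continuously compounded zero rate, annual fixed payments and a whole number of years to maturity, and every function name is hypothetical.

```python
import math


def discount_factor(zero_rate: float, t_years: float) -> float:
    """Discount factor under a flat, continuously compounded zero rate."""
    return math.exp(-zero_rate * t_years)


def annuity(zero_rate: float, maturity_years: int) -> float:
    """Sum of discount factors on the annual fixed payment dates."""
    return sum(discount_factor(zero_rate, t) for t in range(1, maturity_years + 1))


def par_rate(zero_rate: float, maturity_years: int) -> float:
    """Fixed rate that makes the swap worth zero at inception."""
    return (1.0 - discount_factor(zero_rate, maturity_years)) / annuity(zero_rate, maturity_years)


def payer_swap_npv(notional: float, fixed_rate: float,
                   zero_rate: float, maturity_years: int) -> float:
    """NPV to the fixed-rate payer: floating leg PV minus fixed leg PV."""
    # On a single curve the floating leg is worth notional * (1 - D(T)).
    floating_pv = notional * (1.0 - discount_factor(zero_rate, maturity_years))
    fixed_pv = notional * fixed_rate * annuity(zero_rate, maturity_years)
    return floating_pv - fixed_pv


fair_rate = par_rate(0.02, 5)                        # flat 2% curve, 5-year swap
npv_at_par = payer_swap_npv(1_000_000, fair_rate, 0.02, 5)
npv_cheap_fixed = payer_swap_npv(1_000_000, 0.01, 0.02, 5)
```

A production engine would bootstrap the curve from market instruments and handle day counts, calendars and payment schedules; the sketch omits all of that and keeps only the discounting logic.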
Dataflow as Risk Engine - Scale and Performance
JVM running C++
● Performance gains are not always obtained straight out of the box
● Applying domain knowledge and expertise helps tease out the best performance
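As one hypothetical illustration of domain knowledge improving performance (the talk does not say which optimisations were applied), trades can be grouped by currency so that an expensive step such as curve construction runs once per currency instead of once per trade. The names `build_curve` and `price_all`, the trade layout and the flat-rate "curve" below are all assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_curve(currency: str, market_data: Dict[str, float]) -> float:
    """Stand-in for an expensive curve build; counts how often it runs."""
    build_curve.calls += 1
    return market_data[currency]  # hypothetical: one flat rate per currency


build_curve.calls = 0


def price_all(trades: List[dict], market_data: Dict[str, float],
              price_fn: Callable[[dict, float], float]) -> Dict[str, float]:
    """Price every trade, building each currency's curve only once."""
    by_currency: Dict[str, List[dict]] = defaultdict(list)
    for trade in trades:
        by_currency[trade["currency"]].append(trade)

    results: Dict[str, float] = {}
    for currency, group in by_currency.items():
        curve = build_curve(currency, market_data)  # one build per currency
        for trade in group:
            results[trade["id"]] = price_fn(trade, curve)
    return results


# Made-up trades and rates, with a toy pricing function.
demo_trades = [{"id": f"T{i}", "currency": "USD" if i % 2 else "EUR", "notional": 100.0}
               for i in range(6)]
demo_rates = {"USD": 0.02, "EUR": 0.01}
demo_results = price_all(demo_trades, demo_rates, lambda t, rate: t["notional"] * rate)
```

With six trades in two currencies the curve build runs twice rather than six times; at the scale of millions of trades in 12 currencies the same grouping idea removes almost all of the repeated work.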
Dataflow as Risk Engine - Stateful Analytics
The Cloud Journey
• Bring the business problem, not a technical solution
• Beware the frog in the well
• At Google, Big Data is just data; separating the data from the processing allows for clever combinations that address both scenarios
What next?
• Sign up for a Google Cloud account - first $300 free!
• Google Cloud courses @ https://www.coursera.org/ including Qwiklabs
• Contact Ian O’Shea ( [email protected] ) for further info.
Thank you
Dr Raj Subramani, HSBC
Reza Rokni, Google Cloud, Solutions Architect
Adrian Poole, Google Cloud, Financial Services