rapid application design in financial services
TRANSCRIPT
Rapid Application Design Patterns for Financial Services Applications
—Brian Bulkowski, CTO and FounderSeptember 13, 2016
2CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
What is Aerospike ?
Large-scale DHT Database ( 10B ++ objects, 100T++, O(1) get / put )… with queries, data structures, UDF, fast clients ...... On Linux ...
High availability clustering & rebalancing ( proven 5 9’s, no load balancer )
Very high performance C code – reads and writes( 2M++ TPS from Flash, 4M++ TPS from DRAM PER SERVER )
KVS++ provides query, UDF, table/columns, aggregations, SQL
Direct attach storage; persistence through replication and Flash
Cloud-savvy – runs with EC2, GCE others; Docker, more …
Dual License: Open Source for devs, Enterprise for deployment
4CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
Identified where scale is mission critical – the AdTech Industry
1 Expanding into other use casesin BFSI, Telco, Retail, IoT
4
Built database that can scale for RTB use case for these customers and more….
2
The Scalable Operational Database Problem
3
Deployed @ Digital Enterprises that need consumer scale databases
5CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
The only way for traditional enterprises to easily build Digital business is adapting to 2-Speed IT decoupling Systems of Record and Systems of Engagement
Back office legacy Enterprise Scale Applications that move at a slower pace and act as Systems of Record
Front office Consumer Scale Digital Applications thatmove at a Faster pace and act as Systems of Engagement
Enterprise Requirement: 2-Speed IT
6CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
Architecture Overview
LEGACY DATABASE(Mainframe)
XDR
DATA WAREHOUSE/DATA LAKE
LEGACY RDBMSHDFS BASED
BUSINESSTRANSACTIONS
Web views
( Payments ) ( Mobile Queries )
( Recommendation ) ( And More )
High Performance NoSQL
“REAL-TIME BIG DATA”“DECISIONING”
YYYOUR APPLICATION FRAMEWORK
YY
Decisioning Engine
7CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
Architecture Overview
LEGACY DATABASE(Mainframe)
XDR
Decisioning Engine
DATA WAREHOUSE/DATA LAKE
LEGACY RDBMSHDFS BASED
BUSINESSTRANSACTIONS
Web views
( Payments ) ( Mobile Queries )
( Recommendation ) ( And More )
High Performance NoSQL
“REAL-TIME BIG DATA”“DECISIONING”
500Business Trans per sec
5000 Calculations per sec
X =
2.5 M Database Transactions per sec
8CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
■Retail brokerage: risk recalculation in minutes, not hours■Internet cache replacement / session management■Telecom integrated real-time billing and routing
■Account status determines routing, traffic shaping
■Retail predictive analytics■Machine learning
■Online learning, Spark accesses / updates shared models
■Gaming and gambling■Social messaging
Similar use cases
9CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
Architecture
1) No Hotspots – Distributed Hash Table simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers
3) Shared Nothing Architecture, – every node is identical
4) Smart Cluster, Zero Touch – auto-failover, rebalancing, rack aware, rolling upgrades
5) Transactions and long-running tasks prioritized in real-time
6) XDR – sync replication across data centers ensures
– Zero Downtime
10CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
OTHER DATABASE
OS FILE SYSTEM
PAGE CACHE
BLOCK INTERFACE
SSD HDD
BLOCK INTERFACE
SSD SSD
OPEN NVM
SSD
OTHER DATABASES
AEROSPIKE FLASH OPTIMIZED DATABASE
AEROSPIKE
HYBRID MEMORY SYSTEM™
• Direct device access• Large Block Writes• Indexes in DRAM• Highly Parallelized• Log-structured FS “copy-on-
write”• Fast restart with shared
memory
Flash Optimized High Performance
12CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
■3 node cluster, Intel S3700 SSDs■Followed religiously all DataStax recommendations■Standard YCSB, includes instructions to reproduce for your workload■http://www.aerospike.com/blog/comparing-nosql-databases-aerospike-and-
cassandra/
Aerospike vs Cassandra ( 2016 )
15CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
■ Redis 154K TPS, Aerospike 1.15M TPS ( non persistent, DRAM )■ Redis 138K TPS, Aerospike 1.11M TPS ( persistent, DRAM )
Redis is single core
Redis requires client sharding or RedisCluster
■PowerEdge R730xd, Xeon E5-2695 2.3Ghz, 256GB memory, 10GB net
■Redis 3.2.1 , Aerospike 3.8.4 , Centos 7.2.1 ( RHEL 7 compatible ).
■50 / 50 read write. 100B object size. 1 hour test. 100M objects.
Redis vs Aerospike : YCSB ( 2016 )
PRELIMINARY RESULTS NOT PUBLISHED
17Proprietary & Confidential | © 2015 Aerospike Inc. All rights reserved. [ ]
Challenge■DB2 stores positions for 10 Million customers■Must show balances on 300 positions, process
250M transactions, 2 M updates/day■Application design with true current transaction
status■Support new applications ( fraud, risk )
Need to scale reliably ■ 3 13 TB■ 100 400 Million objects■ 200k I Million TPS
Selected NoSQL■Predictable Low latency at High Throughput ■ Immediate consistency, no data loss■Hot standby replication ( XDR )■10 Server Cluster
IBM DB2(MAINFRAME)
Read/Write
Start of Day Data Loading
End of DayReconciliation
QueryREAL-TIME DATA FEED
ACCOUNTPOSITIONS
XDR
Fin Serv – Positions System of Record
19CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
Business Challenge• Meet payment SLA of 750 ms• Differentiate between fraudulent and legitimate
orders in real-time• Stop loss of business due to latency• Support thousands of DB reads/writes per credit
card transaction• Bring new algorithms and data
online within weeks
Prevent Only Fraudulent Transactions
20CONFIDENTIAL © 2016 Aerospike Inc. All rights reserved. [ ]
Bespoke Algorithms• Track every IP address• Recent fraudulent cart items• Machine learning generated filters• Flexibly add new “features”
Not business process modeling
Easier to code and validatethan SQL queries
Credit Card Processing System
Fraud Detection & Protection App
RulesRule 1Rule 2Rule 3
Historical Data
Rule 1-PassedRule 2-PassedRule 3-Failed
Account BehaviorStatic Data
AccountStatistics
Prevent Only Fraudulent Transactions
10TB and 1M TPS per server means
10’s of servers, not 1000’s
© 2016 Aerospike. All rights reserved. 21
Fraud with machine learning – “features”
■Anything can be a “feature”■POS type, item sold, price of item, time of day for given items■Browser type, screen resolution, item + browser
■But you’ll need recent data
■And cutting edge libraries
( Aerospike-sponsored car accident predictionin real time using Dato )
© 2016 Aerospike. All rights reserved. 22
Example – IP addresses
■2B to 4B addresses
■Unknown number of networks
■Different risk models for different network ranges
■Recent behavior required
© 2016 Aerospike. All rights reserved. 23
Example – IP based scoring
ID START_IP END_IP Score1 192.168.4.1 192.168.4.26 202 192.168.4.7 192.168.17.0 2
SELECT id, score WHERE START_IP >= this_ip AND END_IP <= this_ip
Not so bad! Too slow, and
too simple
© 2016 Aerospike. All rights reserved. 24
Using SQL
ID IP TIMESTAMP BEHAVIOR1 192.168.4.1 2015-06-02 12:00 transaction2 192.168.4.7 2015-06-02 12:01 tweet
Add the most recent 5 minutes of traffic for each IP addressand lookup by previous network
INSERT ip, time, behaviorFOR ( network ) SELECT behavior WHERE ip <= network.start_id AND
ip >= network.end_id AND
timestamp > now - 600 You can do it in
SQL ! Too slow.Call the DBAs.
© 2016 Aerospike. All rights reserved. 25
With a NoSQL KVS
key Network list192.168.4.0 [ net1, net2, net3 ]192.168.5.0 null
Key on Class C address List of networks
Batch lookup networks filter on client
That’s messy !
Two network roundtrips,it’ll go fast
ID START_IP END_IP Scorenet1 192.168.4.1 192.168.4.26 20net2 192.168.4.7 192.168.17.0 2
© 2016 Aerospike. All rights reserved. 26
With a NoSQL KVS – add behavior
key Network list192.168.4.0 [ net1, net2, net3 ]192.168.5.0 null
Already have filtered list of networksFor each network,
insert into capped or time limited collectionThat’s really
messy
It’s fast and easy to explain
ID START_IP END_IP Score behaviornet1 192.168.4.1 192.168.4.26 20 { collection }
net2 192.168.4.7 192.168.17.0 2 { collection }
© 2016 Aerospike. All rights reserved. 27
Use your favorite environment
Use cutting edge librariesin your favorite app
environment
© 2016 Aerospike. All rights reserved. 29
Multiple engagements
■10B to 50B objects
■300 T before HA replication
■2 M TPS
■50 milliseconds per 5,000 queries
■Local and remote DR
What’s an Exadata license cost ?
© 2016 Aerospike. All rights reserved. 30
1 Million TPS on 1 Aerospike Server (in Flash)
Options for storage on a database before Aerospike:
RAM, which was fast, but allowed very limited storage Disk, which allowed for a lot of storage, but was limited in
speed
Intel achieved 1M TPS using 4 Intel P3700 SDs with 1.6 TB capacity on a single Aerospike server. The cost per GB is a fraction of the cost of RAM, while still having very high performance.
© 2016 Aerospike. All rights reserved. 32
■ Linux, Windows drivers achieve performance■ U.2 and M.2 form factors available
■ Intel P3700, P3600, P3500 available – ■250k IOPs per card
■ Samsung PM1735 available – ■120k IOPs per card
■ Micron 9100
■ HGST, Toshiba – 30k to 50k per card
■ SAS / SATA lingers■Samsung SM1635, PM1633; Intel S3700; Micron S600 still shipping
NVMe Arrives – 2015 to present
© 2016 Aerospike. All rights reserved. 33
■High, predictable performance■High transfer speed reduces jitter
■2.8uS NVMe vs 5.0uS SATA
■Better controllers
■Mature and tuned Linux driver
■U.2 front panel hot swap available ( and 24-wide )
■NVMe arrays available■Apeiron ADS1000 “direct scale-out flash”
■EMC D5, Mangstore NX63020
■All new Aerospike deployments on NVMe
NVMe’s crushing superiority
© 2016 Aerospike. All rights reserved. 34
■Every public cloud provider has Flash■AWS / EC2 has sophisticated offerings
■Google Compute is high performance
■Softlayer allows own-hardware
■Private clouds manage Flash■Docker offers storage metadata
■Pivotal manages Flash and traditional storage
Flash in the Public Cloud
© 2016 Aerospike. All rights reserved. 35
■ Database knows best, not storage■Database should manage
consistency vs availability
■Database should manage views and snapshots
■ Array vendors have started making “databases”■“Object stores”
■ High Density “flash aggregation”■Sandisk Infiniflash – SATA
■ High-read, or write-once, applications
■Apeiron ASD1000 – NVMe■ Read and write applications
■Vexata, others
“All Flash arrays”
© 2016 Aerospike. All rights reserved. 37
■ Persistent storage using chips■No power while idle
■ Does not use transistors■Resistor / phase change “but different”
■ Chips almost as fast as DRAM ( 100 ns, not 10ns )■NAND is 10 ms to 5 ms
■ Higher density than DRAM, lower than NAND
■ “Infinite” write durability
■ 128B read and write granularity■NAND write granularity --- 16 MB
■DRAM write granularity --- 64 B
What is 3D Xpoint?
© 2016 Aerospike. All rights reserved. 38
■ Optane this year■3D Xpoint in 2.5” NVMe package
■“7x faster” – limited by NVMe !
■Very high write durability
■Replaces SLC for some use cases
■Unknown pricing
■ NVDIMM ( on memory bus )■Removes NVMe limit
■Intel cagy on delivery – “uncommitted”
■Competes with DRAM
■Hard to program to■Really changes the world
Intel’s 3D Xpoint roadmap ( public info )
© 2016 Aerospike. All rights reserved. 39
Aerospike has the experience
■ Flash optimized from the start
■ 5 9’s reliability( longest running cluster: 5 minutes of outage in 4.5 years )
■ Multi-million TPS clustersin production today
■ Enterprise support of your POC
■ Used in financial services, telecom, gaming( and adtech of course )