PayPal's usage of HPC and InfiniBand as presented at ISC Big Data 2013
TRANSCRIPT
LEVERAGING HPC AND
ENTERPRISE ARCHITECTURES
FOR LARGE SCALE INLINE
TRANSACTIONAL ANALYTICS IN
FRAUD DETECTION AT PAYPAL
Image from Boris Müller’s “Visual Poetry 6” (http://www.esono.com/boris/projects/poetry06/visualpoetry06/)
Arno Kolster, Sr. Database Architect
(Advanced Technology Group – Site Operations Infrastructure)
September 26, 2013
THE PROBLEM
Detecting fraud in 'real time' as millions of transactions are
processed between disparate systems at volume.
Ability to create and deploy new fraud models into event
flows quickly and with minimal effort.
Provide environment for fraud modeling, analytics,
visualization, M/R, dimensioning and further processing.
Finding suspicious patterns that we don’t even know exist in
related data sets.
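Finding unknown suspicious patterns in related data sets often reduces to linking entities that share attributes. A toy sketch (an illustration only, not PayPal's implementation) using union-find to cluster accounts that share an IP or card:

```python
# Hypothetical illustration: cluster accounts that share attributes
# (IPs, cards) with union-find; unusually large clusters may merit review.
from collections import defaultdict

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def link_accounts(events):
    """events: iterable of (account, attribute) pairs."""
    parent = {}
    by_attr = defaultdict(list)
    for acct, attr in events:
        parent.setdefault(acct, acct)
        by_attr[attr].append(acct)
    # Accounts sharing any attribute end up in the same component.
    for accts in by_attr.values():
        for other in accts[1:]:
            union(parent, accts[0], other)
    clusters = defaultdict(set)
    for acct in parent:
        clusters[find(parent, acct)].add(acct)
    return list(clusters.values())

events = [("a1", "ip:1.2.3.4"), ("a2", "ip:1.2.3.4"),
          ("a2", "card:9999"), ("a3", "card:9999"),
          ("a4", "ip:5.6.7.8")]
print(link_accounts(events))  # a1, a2, a3 linked; a4 alone
```

The interesting link (a1 to a3) is indirect, via a2 — exactly the kind of relationship that is invisible when each data set is examined alone.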
THE CHALLENGES
5 9s availability, scalability and reliability in a 24x7x365
environment. “PayPal is always open” *.
Maintaining a graph of identities, transactions, IPs, etc. to
support the models.
Keep Operations simple. Small team of SAs and DBAs.
How to keep fraud models current and ensure integrity of
incoming events and data.
Educate peers and higher-ups about new technology and
concepts so they 'get it'. "HPC what?"
VOLUME FOR THIS HYBRID SYSTEM?
11 million+ PayPal logins / day.
13 million+ financial transactions / day.
~4 billion inserts / day.
~8 billion selects / day.
500 variables calculated per event for some models.
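Converting the daily figures above into sustained per-second rates (a uniform-load assumption; real traffic is peaky, so peaks run higher):

```python
# Back-of-envelope: daily volumes from the slide converted to
# sustained per-second rates, assuming uniform load over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily = {
    "logins": 11e6,
    "financial transactions": 13e6,
    "inserts": 4e9,
    "selects": 8e9,
}
for name, per_day in daily.items():
    print(f"{name}: ~{per_day / SECONDS_PER_DAY:,.0f}/s")
# inserts ~46,296/s and selects ~92,593/s, sustained all day
```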
OUR SOLUTION - TRINITY
Highly distributed open source databases for OLTP storage
of nodes and edges. Architected for scale out, up and HA.
Intelligent gateways, message routing & delivery to
heterogeneous systems. Event everything.
Inline stream analytics using CEP and ESP.
Leveraged HPC architecture and hardware where it made
sense.
Downstream analytics environments for further processing.
Real time linking platform for identities from various source
systems. Built a giant graph.
Standardized operations – h/w, s/w deployment, monitoring,
command & control processes, etc.
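The system overview diagram labels database shards S0…S95 / M0…M95. A minimal sketch of the kind of deterministic key-to-shard routing an intelligent gateway might perform — an assumed scheme for illustration, not PayPal's actual code (the shard count and naming are taken from the diagram labels; the hash choice is mine):

```python
# Assumed gateway-style shard routing, for illustration only:
# hash an identity key to one of 96 shards; writes go to the shard
# master (M*), reads to one of its read-only copies.
import hashlib
import random

NUM_SHARDS = 96  # S0..S95 / M0..M95 in the system overview

def shard_for(key: str) -> int:
    # Stable hash so every gateway maps a given key to the same shard.
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def route(key: str, op: str) -> str:
    n = shard_for(key)
    if op == "write":
        return f"M{n}"                       # shard master
    return random.choice([f"S{n}a", f"S{n}b"])  # a read-only replica

print(shard_for("identity:12345"), route("identity:12345", "write"))
```

Deterministic routing means any gateway can serve any request without coordination, which is what lets the gateway tier scale horizontally.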
HIGH LEVEL SYSTEM OVERVIEW
[Architecture diagram. Components shown: incoming events; service gateways (SGW); application pools behind a data access layer (DAL); sharded database masters and slaves M0…M95 / S0…S95; MySQL replication feeding downstream analytics (OLAP/MapReduce); the Trinity Identification Service (TIS); Extensible Financial Linking (EFL); and the fraud models.]
TRINITY DB PROLIFERATION

2007: 36 instances (12 masters, 24 ROs). Services: TIS, TAS. 3 DBAs / 4 SAs.

2013: 1800 instances (600 masters, 1200 ROs). 8+ billion selects/day. 5 billion nodes, billions of edges. Services: TIS, TAS, ARS, NEO, UVS, NA, EFL (12 shards). Scale up & scale out. Still 3 DBAs / 4 SAs.
DERIVATION OF MODELS
Metrics, variables and summaries generated from inline
processing of events using CEP (>500 metrics / event).
Scoring of events based on historical and current metrics.
Scores sent on to PayPal flows for further RISK modeling or
transaction blocking.
Different Fraud Models generate different types of scores.
New Fraud Models created based on success ratio of
previous ones or reaction to change in data and usage
patterns. (R, SAS, M/R, vectoring)
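A toy sketch of inline metric derivation and scoring in the CEP style described above — an assumption for illustration, not the production models. It keeps a sliding window of recent events per account and scores each new event against recent history:

```python
# Toy CEP-style inline scoring (illustrative only): maintain a
# sliding window of recent events per account and derive two of
# the ">500 variables" a real model might use.
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 300
history = defaultdict(deque)  # account -> deque of (ts, amount)

def score_event(account: str, amount: float, ts: float) -> float:
    window = history[account]
    # Evict events that have aged out of the sliding window.
    while window and ts - window[0][0] > WINDOW_SECONDS:
        window.popleft()
    velocity = len(window)                      # events in window
    total = sum(a for _, a in window) + amount  # amount in window
    window.append((ts, amount))
    # Trivial linear score; real models are far richer and the
    # weights here are made up.
    return 0.1 * velocity + 0.001 * total

now = time.time()
for i in range(5):
    s = score_event("acct-1", 200.0, now + i)
print(round(s, 2))  # score rises as velocity and totals accumulate
```

The score would then be sent downstream for further risk modeling or transaction blocking, as the slide describes.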
STREAMING ANALYTICS FLOW (FRAUD)
[Flow diagram, steps 1–7. Pools shown: Azure (app, SGW, DB), Indigo (app, SGW, DB), Identity SGW, TIS (pool and DB), Cerulean (pool and SGW), BES/RES, Cobalt (SFS), RES, ATE, and EFL — connected over a message bus, with REST/SOAP interfaces and two CEP stages.]
WHERE ARE WE USING HPC?
InfiniBand on the entire internal Trinity network (Mellanox
QDR 40Gb dual plane).
3 SGI Altix ICE 8200/8400 clusters for all 120+ EFL memory-based
apps – no disk I/O overhead.
MPI-like apps. MPP features with scale out and affinity
processing.
SGI InfiniteStorage IS4600 for EFL databases.
Lustre on Hadoop cluster.
SGI ALTIX 8200/8400 ICE CLUSTERS
78 nodes
156 sockets
1872 cores
7.5 TB RAM
Intel Xeon X5650 @ 2.67 GHz
TRINITY STATISTICS
HAPPY HOLIDAYS 2012

Per Day
  DB ops:   SELECT 404,920    INSERT 3.27 B     UPDATE 4.3 B
  Rows:     Read 8.89 B       Inserted 3.47 B   Updated 3.35 B
  Bytes:    Sent 882.4 G      Rcvd 1,801 G

Per Second
  Messages: Sent 40,091       Rcvd 17,235
  Events:   Sent 380          Rcvd 1,250
  DB ops:   SELECT 4,687      INSERT 37,855     UPDATE 49,719
  Rows:     Read 102,934      Inserted 40,156   Updated 38,762
  Bytes:    Sent 10.2 G       Rcvd 20.1 G

Totals per Second
  Messages 57,326    Events 1,630    DB ops 92,261    Rows 181,851    Bytes 30.3 G
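A quick cross-check of the per-second totals in the statistics above: each total should be the sum of its component columns.

```python
# Cross-check of the per-second totals: each "Totals" figure should
# equal the sum of its per-second components from the table.
msgs = 40_091 + 17_235              # messages sent + received
evts = 380 + 1_250                  # events sent + received
db_ops = 4_687 + 37_855 + 49_719    # SELECT + INSERT + UPDATE
rows = 102_934 + 40_156 + 38_762    # read + inserted + updated
print(msgs, evts, db_ops, rows)
# -> 57326 1630 92261 181852 (the slide's rows total, 181,851,
#    is off by one, presumably from rounding in the source data)
```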
WHAT’S NEXT?
Addition of new link types into the network graph. Will add
60+ DBs.
Co-sponsoring a BOF at SC13 with ORNL called “Big Data:
Industry views on real-time data, analytics, and HPC
technologies to bring them together.”
Change key/value type data store from structured to
semi-structured.
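To illustrate that structured-to-semi-structured shift (hypothetical record shapes, not PayPal's actual schema): a fixed-position value forces a schema change for every new field, while a self-describing document lets models add variables without migrations.

```python
# Illustration of structured vs. semi-structured values in a
# key/value store (made-up record shapes for this example).
import json

# Structured: position defines meaning; a new field means a migration.
structured = ("acct-1", "1.2.3.4", 200.0)

# Semi-structured: self-describing; new keys can appear as fraud
# models evolve, and old readers simply ignore them.
semi = json.dumps({"account": "acct-1", "ip": "1.2.3.4",
                   "amount": 200.0, "new_metric": 0.42})
print(json.loads(semi)["new_metric"])  # -> 0.42
```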
Keep educating peers and higher-ups so they 'get it'. "HPC?
Yes. We want more!"
New project involving enterprise integration with HPC
technology called ‘Systems Intelligence’ for PayPal
ecosystem management.