Splice Machine | Proprietary & Confidential
OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks
Monte Zweben, Co-Founder and Chief Executive Officer
John Leach, Co-Founder and Chief Technology Officer
September 30, 2015
The Traditional Database Market
Operational - $24B
Analytical - $11B
Origins of Hadoop
Big Batch Processing Jobs to Batch Analytics
Batch Processing for Web Search Index
• 2003: Google File System (GFS) paper published
• 2004: Google MapReduce paper published
• 2005: Hadoop created based on GFS and MapReduce
Batch Analytics
• 2008: Hive created by Facebook for analytics
Moving Hadoop Beyond Batch Analytics to Power Real-Time Apps
Hadoop – Not Just for Data Scientists Anymore
• Distributed File System → Distributed RDBMS
• Java MapReduce Programs → SQL-99 Queries
• Read-Only → Real-Time Updates with ACID Transactions
• Batch Analytics → Real-Time Concurrent Apps and Analytics
SQL-on-Hadoop Market
OLTP OLAP
OLTP Requirements
• ACID Transactions
• High concurrency
• Secondary indexes
• Joins
• Stored procedures
• Triggers
• Constraints
• Foreign keys
• Sub-queries
• Views
ACID – What is it? Why is it important?
Reliable updates across multiple rows and tables
• Atomicity: Transactions are all or nothing
• Consistency: Only valid data is saved
• Isolation: Transactions do not affect each other
• Durability: Written data will not be lost
Use Cases
• Update a value and a secondary index
• Recover from batch update error without reload
• Transfer $ between bank accounts
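The bank-transfer use case can be sketched in a few lines. This is a minimal illustration of atomicity using Python's built-in sqlite3 module rather than Splice Machine; the `accounts` table, the account names, and the `transfer` helper are hypothetical and exist only for this example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so that a failing debit has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    ok = transfer(conn, "alice", "bob", 30)       # valid transfer
    failed = transfer(conn, "alice", "bob", 500)  # overdraft, must leave no trace
    balances = dict(conn.execute("SELECT id, balance FROM accounts"))
    return ok, failed, balances
```

In the failing call the credit to the destination account succeeds before the debit violates the constraint, so the unchanged balances afterwards show that the partial work was undone.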
Doesn’t Hive Have ACID Transactions?
Hive Can’t Power an App: High Concurrency vs Batch Transactions
Hive
• ~10 concurrent users
• Batch updates
• Table locking
Operational RDBMSs
• Up to 100,000 concurrent users
• Multi-Version Concurrency Control
• Power applications
“…designed for an analytic workload…low concurrency – …10 – 50 users updating …Multiple writers … wait behind each other…Hive is not meant for low latency updates and deletes…” Hortonworks blog, Apache Hive ACID Transactions in HDP 2.2
Snapshot Isolation: High-Concurrency MVCC Transactions
• Leverages Multi-Version Concurrency Control (MVCC)
• Each update creates a new version with a new timestamp
• Each transaction can see its own “virtual” snapshot of database
• Writers don’t block readers
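The bullets above can be made concrete with a toy in-memory store. This is a sketch under stated assumptions, not Splice Machine's implementation: the `MVCCStore` and `Transaction` classes are hypothetical, timestamps come from a simple counter, and the first-committer-wins conflict check is a common snapshot-isolation rule that the slide does not spell out.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: each committed write appends (commit_ts, value)."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # monotonically increasing timestamps

    def begin(self):
        """Start a transaction whose snapshot is everything committed so far."""
        return Transaction(self, next(self._clock))

    def _read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def _commit(self, txn):
        # Assumed rule: abort if a key we wrote was committed by someone else
        # after our snapshot was taken (first committer wins).
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.snapshot_ts:
                raise RuntimeError(f"write-write conflict on {key!r}")
        commit_ts = next(self._clock)
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts

class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # buffered writes, visible only to this transaction

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store._commit(self)

def demo():
    store = MVCCStore()
    setup = store.begin()
    setup.put("stock", 10)
    setup.commit()
    reader = store.begin()      # snapshot taken before the writer commits
    writer = store.begin()
    writer.put("stock", 9)
    writer.commit()             # does not wait for, or disturb, the reader
    return reader.get("stock"), store.begin().get("stock")
```

The reader keeps seeing the value from its own snapshot while a transaction started after the writer's commit sees the new version, which is the sense in which writers do not block readers.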
Splice Machine
The RDBMS on Hadoop
Replace Oracle to scale out your applications
• Affordable, Scale-Out – Commodity hardware
• Elastic – Easy to expand or scale back
• Transactional – Real-time updates & ACID Transactions
• ANSI SQL – Leverage existing SQL code, tools, & skills
• Flexible – Support operational and analytical workloads
Compelling TCO: Sample Oracle Replacement

Oracle RAC Costs                                   List Price   Unit   3 Year Cost
Oracle Database Enterprise Edition with RAC        $37,750      64     $2,416,000
3 years DB Maintenance (22% list price/yr)         $24,915      64     $1,594,560
3 years Operating System Support (Oracle Linux)    $6,897       4      $27,588
Server Costs (mid-range, Intel Xeon-based)         $16,000      4      $64,000
Primary Storage                                    $143,360     1      $143,360
TOTAL                                              $228,922            $4,245,508
Assumes Oracle Enterprise Edition ($47.5K/CPU) and RAC ($23K/CPU)

Splice Machine Costs                               List Price   Unit   3 Year Cost
Splice Machine Annual Subscription                 $10,000      7      $210,000
Cloudera Enterprise Edition Annual Subscription    $7,500       8      $180,000
3 years Operating System Support (Oracle Linux)    $6,897       4      $27,588
Server Costs with Storage                          $5,000       8      $40,000
TOTAL                                              $22,500             $457,588

• 90% TCO Reduction ($3.8M)
• 3-7x faster
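The headline reduction can be checked from the line items. The sketch below copies the prices and unit counts from the two cost tables; treating the two annual subscriptions as billed for three years and every other line as a single three-year amount is an assumption, chosen because it reproduces the totals shown on the slide.

```python
# Each item is (list price, units, years billed). The years column is an
# assumption: annual subscriptions are multiplied by 3, other lines by 1.
ORACLE_RAC = [
    (37_750, 64, 1),   # Oracle Database Enterprise Edition with RAC
    (24_915, 64, 1),   # 3 years DB maintenance
    (6_897, 4, 1),     # 3 years operating system support
    (16_000, 4, 1),    # servers
    (143_360, 1, 1),   # primary storage
]
SPLICE_MACHINE = [
    (10_000, 7, 3),    # Splice Machine annual subscription
    (7_500, 8, 3),     # Cloudera Enterprise Edition annual subscription
    (6_897, 4, 1),     # 3 years operating system support
    (5_000, 8, 1),     # servers with storage
]

def three_year_cost(items):
    """Sum price * units * years over all line items."""
    return sum(price * units * years for price, units, years in items)

oracle_total = three_year_cost(ORACLE_RAC)
splice_total = three_year_cost(SPLICE_MACHINE)
savings = oracle_total - splice_total
reduction_pct = 100 * savings / oracle_total

print(f"Oracle RAC:     ${oracle_total:,}")
print(f"Splice Machine: ${splice_total:,}")
print(f"Savings:        ${savings:,} ({reduction_pct:.1f}%)")
```

The computed savings of about $3.79M and a reduction of roughly 89% are consistent with the rounded 90% and $3.8M figures on the slide.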
TPC-C Benchmark
• “Gold standard” for OLTP
• Requires high concurrency transactions
• 5 very complex queries
• Models ERP order-entry
Initial TPC-C Results* on Commodity Hardware
Linear scalability for transactional workload on Hadoop
[Chart: transactions per minute (tpmC), y-axis from 0 to 70,000, for clusters of 4, 8, and 16 nodes]
* Unaudited
Experimental TPC-C Results: Splice Machine HBase Fork
Linear scalability for transactional workload on Hadoop
[Chart: transactions per minute (tpmC), y-axis from 0 to 160,000, for clusters of 4, 8, and 16 nodes; series: tpmC and tpmC (HBase Patch)]
* Unaudited
3rd-Party Applications vs Ad-Hoc Queries
Far more difficult to accommodate than analytic queries
Dynamically Generated SQL
• No workarounds since generated code
• Not human-friendly
  • Machine-generated symbols not meant to be interpreted
  • Code not indented or styled
  • No comments
Complex Sub-Queries
• Object–Relational Mappings create many levels of sub-queries
• Applications: object oriented to achieve code efficiency, reuse, and understandability
• Databases: relational to achieve performance, ACID properties, and minimal storage
High Concurrency
• Must support 1,000s to 10,000s of concurrent users
OLTP Application: Unica
Splice Machine powers the Unica Application
Campaign Definition
• Customer Segmentation by Household
• Output of 8 Segments
  • By Country
  • By Previous 12 Months
• Use of Select, Merge and Audience Processes
Data Flow Process
• Select, Merge & Audience Processes
• Selection Rules
  • Households That Have Opted In
  • Grouped by Loyalty and Non-Loyalty
SQL for Complex Selection Rule
INSERT INTO UAC_639_1c
SELECT A.CUSTOMER_MASTER_ID
FROM UAC_639_v A
WHERE A.CUSTOMER_MASTER_ID NOT IN (
    SELECT UAC_639_14.CUSTOMER_MASTER_ID FROM UAC_639_14
    UNION
    SELECT UAC_639_11.CUSTOMER_MASTER_ID FROM UAC_639_11
    UNION
    SELECT UAC_639_12.CUSTOMER_MASTER_ID FROM UAC_639_12
    UNION
    SELECT UAC_639_13.CUSTOMER_MASTER_ID FROM UAC_639_13);
OLTP Application: Unica
Splice Machine powers the Unica Application
Architecture
• Real-Time Personalization
• Real-Time Actions
• Consumers
• Cross Channel Campaigns
Initial Results vs. Oracle RAC
• ¼ cost with commodity scale out
• 3-7x faster through parallelized queries
OLTP Application: RedPoint
Splice Machine powers the RedPoint Convergent Marketing Application
Campaign Definition
• Audience selection
• Dataflow
• Offers
Data Flow Process
• Suppression
• Split rules
OLTP Application: RedPoint
Splice Machine powers the RedPoint Convergent Marketing Application
SQL for Complex Selection Rule
INSERT INTO "AMEX"."RP_BC_69_2"
SELECT a4."PID" AS "PID",
       a4."HHID" AS "HHID",
       (CASE WHEN a5."PID" IS NULL THEN 'N' ELSE 'Y' END) AS "Standard_Suppression_2044",
       (CASE WHEN a10."PID" IS NULL THEN 'N' ELSE 'Y' END) AS "Low_Value_Customer_1729",
       (CASE WHEN a13."PID" IS NULL THEN 'N' ELSE 'Y' END) AS "Not_Mailable_1128"
FROM "AMEX"."RP_BC_69_1" a4
LEFT OUTER JOIN
    (SELECT a6."PID"
     FROM "AMEX"."RP_BC_69_1" a6
     INNER JOIN "AMEX"."PERSON" a7 ON a6."PID" = a7."PID"
     WHERE EXISTS
         (SELECT a8."PID"
          FROM "AMEX"."PERSON_ACCOUNT_MASTER_FINANCE" a8
          INNER JOIN "AMEX"."UD_ACCOUNT_DETAILS_FINANCE" a9 ON a8."RP_ACCOUNT_ID" = a9."RP_ACCOUNT_ID"
          WHERE a7."PID" = a8."PID" AND a9."ACCOUNT_STATUS" = 'O')) AS a5
    ON a4."PID" = a5."PID"
LEFT OUTER JOIN
    (SELECT a11."PID"
     FROM "AMEX"."RP_BC_69_1" a11
     INNER JOIN "AMEX"."PERSON" a12 ON a11."PID" = a12."PID"
     WHERE a12."CUSTOMER_SEGMENT" = 1) AS a10
    ON a4."PID" = a10."PID"
LEFT OUTER JOIN
    (SELECT a14."PID"
     FROM "AMEX"."RP_BC_69_1" a14
     INNER JOIN "AMEX"."PERSON" a15 ON a14."PID" = a15."PID"
     WHERE EXISTS
         (SELECT a16."PID"
          FROM "AMEX"."PERSON_ADDRESS" a16
          INNER JOIN "AMEX"."ADDRESS" a17 ON a16."ADDR_ID" = a17."ADDR_ID"
          WHERE a15."PID" = a16."PID" AND a17."STD_STATUS_CODE" IN ('M', 'X', '7'))) AS a13
    ON a4."PID" = a13."PID";
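The generated statement repeats one pattern three times: left-join the audience to a filtered subquery and turn a missing match into an 'N' flag. A reduced version of that pattern is sketched below with Python's sqlite3 module; the `audience` and `person` tables, their columns, and the `flag_low_value` helper are hypothetical stand-ins, not the application's schema.

```python
import sqlite3

def flag_low_value(audience_pids, person_rows):
    """Return (pid, 'Y'/'N') pairs marking audience members in segment 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audience (pid INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE person (pid INTEGER PRIMARY KEY, customer_segment INTEGER)")
    conn.executemany("INSERT INTO audience VALUES (?)",
                     [(pid,) for pid in audience_pids])
    conn.executemany("INSERT INTO person VALUES (?, ?)", person_rows)
    query = """
        SELECT a.pid,
               CASE WHEN low.pid IS NULL THEN 'N' ELSE 'Y' END AS low_value_customer
        FROM audience a
        LEFT OUTER JOIN (
            SELECT p.pid FROM person p WHERE p.customer_segment = 1
        ) AS low ON a.pid = low.pid
        ORDER BY a.pid
    """
    # Rows with no match in the subquery keep NULL for low.pid, hence 'N'.
    return conn.execute(query).fetchall()

flags = flag_low_value([1, 2, 3], [(1, 1), (2, 5)])
```

Each additional suppression rule in the full statement adds one more outer join and one more CASE column of this shape.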
OLTP Application: RedPoint
Splice Machine powers the RedPoint Convergent Marketing Application
Architecture
• Real-Time Offers
• Real-Time Data
• Consumers
• Stream or Batch Updates
Initial Results vs. Oracle RAC
• ¼ cost with commodity scale out
• 5-10x faster through parallelized queries
Use Cases
• Internet of Things
• ETL/Operational Data Lake
• Digital Marketing
• Precision Medicine
• Fraud Detection
Sneak Peek: Splice Machine 2.0
First Hybrid, In-Memory RDBMS Powered by Hadoop and Spark
Advantages
• OLAP + OLTP
• Massive scalability
• Spark in-memory computing engine
• High-concurrency ACID transactions
• ANSI SQL
• Seamless integration
• Isolated resource management
Benchmarks
• Simultaneous TPC-C and TPC-DS
• Never done before
Summary
Power OLTP Apps on Hadoop
• First TPC-C benchmark run on Hadoop
• Leverage Hadoop for both OLTP and OLAP
Stay Tuned!
• Hybrid, in-memory RDBMS powered by Hadoop and Spark
• Support mixed OLTP & OLAP workloads
• Look for simultaneous TPC-C & TPC-DS benchmark results
Questions?
John Leach, CTO, Splice Machine
Monte Zweben, CEO, Splice Machine
Appendix
Unica Customer Marketing Service Provider
Pilot
• Original number of records in each table:
  • Household Master - 107 million
  • Customer Preference - 600 million
  • Household Computed Value - 107 million
  • Customer Address Quality - 71 million
• Unica timings
  • Strategic Segments – EM (36 processes) (Oracle: 5 hours - Splice: 2 hours, 10 minutes)
  • Strategic Segments – DM (23 processes) (Oracle: 3 hours - Splice: 2 hours, 15 minutes)
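The pilot timings can be turned into speedup ratios with a short calculation. The durations below are copied from the slide; the dictionary layout and the `speedups` helper are only illustrative.

```python
from datetime import timedelta

# (Oracle duration, Splice Machine duration) per Unica flowchart group
TIMINGS = {
    "Strategic Segments - EM": (timedelta(hours=5), timedelta(hours=2, minutes=10)),
    "Strategic Segments - DM": (timedelta(hours=3), timedelta(hours=2, minutes=15)),
}

def speedups(timings):
    """Divide the Oracle run time by the Splice Machine run time for each group."""
    return {name: oracle / splice for name, (oracle, splice) in timings.items()}

for name, ratio in speedups(TIMINGS).items():
    print(f"{name}: {ratio:.2f}x")
```

Dividing one `timedelta` by another yields a plain float, so no manual conversion to minutes is needed.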
Unica Demo Campaign
Flowchart 1
• 8 segments of household IDs created
• Preference indicated to receive direct mail
• Identified as valid (vs. ghost) household in advance
• Segments used as inputs for other Flowcharts
• Valid name and address
• Loyalty customers converted to unique list
• Household ID identifier by audience
• Additional Extract of Canadian household details
• Real-time updates to Splice DB table
Unica Demo Campaign (cont.)
Household Segments:
• a list of US based households (US DM HH)
• the list of US based households just mentioned but where there has been at least one transaction in the last 12 months (US DM HH 12M)
• a list of US based loyalty households (US PP DM HH)
• a list of US based loyalty households with at least one transaction in the last 12 months (US PP DM HH 12M)
• a list of Canadian households (Canada DM)
• a list of Canadian households with at least one transaction in the last 12 months (Canada DM PP)
• a list of Puerto Rico based households (Puerto Rico DM)
• a list of Puerto Rico based loyalty households with at least one transaction in the last 12 months (PR PP DM)