SQL in Hadoop: Big Data Innovation Without the Risk
TRANSCRIPT
Twitter Tag: #briefr The Briefing Room
Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today’s innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
Topics
July: SQL INNOVATION
August: REAL-TIME DATA
September: HADOOP 2.0
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
[email protected] @robinbloor
Actian
Actian offers a variety of analytics, data management and integration solutions
The Actian Analytics Platform includes Vortex, a SQL-in-Hadoop solution for big data analytics
Actian Vortex leverages a vector-based columnar analytics engine that is YARN-compliant
Guest: Todd Untrecht
Todd Untrecht joined Actian in 2013, where he is currently Vice President of Global Product Management and Strategy. Todd brings more than 20 years of experience in both large company and startup environments. He specializes in product management, engineering management, and business transformation with particular expertise leading and aligning global engineering and product organizations, driving new products into new markets, and accelerating cross-company innovation.
Confidential © 2014 Actian Corporation
SQL in Hadoop: Big Data Innovation Without Risk
Todd Untrecht, Vice President, Product Management and Strategy
Emma McGrattan, Sr. Vice President, Engineering
Actian Corporation
Bloor Group Briefing Room, July 2015
Who is Actian?
$100M+ Revenues & Profitable
10,000+ Customers
Global Presence: 8 worldwide offices, 24x7 multinational support model
“Fast becoming a big data powerhouse to challenge the market.” Forrester
“Actian is now very powerfully positioned in the big data and analytics markets.” Bloor
Actian has invested hundreds of millions into next-generation technology architected to meet future demands
Modernizing BI & Analytic Workloads
[Figure: Big Data SQL Analytics Market, spanning small data to big data and operational to analytic workloads, with the Actian sweet spot highlighted]
Catalysts driving Big Data SQL Analytics:
• Performance ceiling
• Analyze more (and different) data
• Reduce costs
• Modern, massively distributed compute infrastructure w/ commodity HW
• Well-recognized business value, but existing legacy SQL systems are failing
Your BI and Analytic Systems are Under Pressure
Legacy HW & SW Platforms
INDUSTRIAL SQL
ISVs
Custom Apps
Increasing pressures on legacy infrastructure are causing analytic workloads to break
$$
How to Innovate and Modernize Without Risk…
ISVs
Custom Apps
Legacy HW & SW Platforms
INDUSTRIAL SQL
Modern, massively distributed compute infrastructure w/ commodity HW
Keep Existing Apps and People
Grow Data with No Change in Performance and No Extra Budget
Leverage Advances in Open Source and Avoid Vendor Lock-In
Actian Vortex: Modern, Super Scaling Columnar SQL Analytic Engine
Enterprise Grade, Fast, Open
INDUSTRIAL SQL
Actian Vortex™: Highest Performance Analytics at Scale in Hadoop
DATA: Enterprise, Social, Internet of Things, SaaS
VALUE: Delight Customers, Improve Competitive Edge, Reduce Risk and Cost, Innovate
Users: The Wiz (Data Scientist), IT Sophisticate (CIO), Maestro (Business Analyst), Speed Demon (Impatient Business User)
Capabilities: Elastic Data Preparation, SQL Analytics, Predictive Analytics
Vortex components: Elastic Data Preparation (DataFlow), SQL Analytics (Actian Vector in Hadoop), Predictive Analytics (DataFlow)
Vortex – Elastic Data Preparation
[Figure: high-volume data pipes carry data inflow from remote, local (LAN), cloud, and streaming sources into the Vortex Hadoop cluster, where DataFlow elastic data ingestion loads it into Vector in Hadoop on HDFS]
• New edge-to-engine, high-speed parallel ingestion, no matter where the source data resides
• Highly parallel and elastic data ingestion
• New Vector in Hadoop Writer
• Secure, compressed, binary-to-binary ingestion (no intermediate HDFS files needed)
The fastest way to get big data into Actian
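The slide does not say how the Vector in Hadoop Writer is implemented. The Python sketch below is only a rough illustration of what "parallel, compressed, binary-to-binary ingestion" means. It packs records directly into a binary layout, compresses each chunk, and encodes the chunks concurrently, without writing any intermediate text files. The record layout, the chunk size, and the function names (`encode_chunk`, `decode_chunk`, `parallel_ingest`) are hypothetical assumptions, not Actian APIs.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed-width record layout: (id: int64, amount: float64), little-endian.
RECORD = struct.Struct("<qd")


def encode_chunk(records):
    """Pack one chunk of (id, amount) records into binary and compress it."""
    raw = b"".join(RECORD.pack(rid, amount) for rid, amount in records)
    return zlib.compress(raw)


def decode_chunk(blob):
    """Decompress a chunk and unpack it back into (id, amount) tuples."""
    raw = zlib.decompress(blob)
    return [RECORD.unpack_from(raw, off) for off in range(0, len(raw), RECORD.size)]


def parallel_ingest(records, chunk_size=1000, workers=4):
    """Split the input into chunks and encode them concurrently.

    Data stays binary from source to target; no CSV or other text staging
    file is produced. Output chunks keep the input order.
    """
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_chunk, chunks))
```

For example, `parallel_ingest([(i, i * 0.5) for i in range(2500)])` yields three compressed chunks that `decode_chunk` turns back into the original rows. Threads are used only to keep the sketch short. A real loader would spread the work across processes or cluster nodes.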
Vortex – Open Architecture
[Figure: remote, local (LAN), cloud, and streaming data sources feed the Vortex Hadoop cluster through DataFlow elastic ingestion, while Parquet files in HDFS are queried directly by Vector in Hadoop]
• Query native Hadoop file formats (e.g., Parquet) without ingestion…
• External table support
• Highly parallel and elastic data ingestion
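An external table is queried where its data already lives. The engine stores only a schema and a way to reach the file, and it reads the rows at query time rather than loading them first. The toy Python sketch below illustrates that idea. CSV stands in for Parquet only because the Python standard library has no Parquet reader. The `ExternalTable` class and its methods are hypothetical and say nothing about Vortex's actual external-table syntax.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Sequence


class ExternalTable:
    """Toy external table: holds a schema plus a way to open the source.

    Rows are streamed from the source on every scan and never copied into
    engine-managed storage (no ingestion step).
    """

    def __init__(self, opener: Callable[[], io.TextIOBase], columns: Sequence[str]):
        self.opener = opener
        self.columns = list(columns)

    def scan(self) -> Iterator[Dict[str, str]]:
        # Re-open the source each time, so queries always see the file as it is now.
        with self.opener() as f:
            for row in csv.reader(f):
                yield dict(zip(self.columns, row))

    def select(self, predicate: Callable[[Dict[str, str]], bool]) -> List[Dict[str, str]]:
        # Equivalent in spirit to: SELECT * FROM ext WHERE <predicate>
        return [row for row in self.scan() if predicate(row)]
```

Usage: `ExternalTable(lambda: io.StringIO("1,east,10\n2,west,20\n3,east,30\n"), ["id", "region", "amount"]).select(lambda r: r["region"] == "east")` returns the rows with ids "1" and "3". The trade-off is that every query pays the cost of reading and parsing the external format. Ingesting data into an engine's native columnar format avoids that cost.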
Vortex – Open Architecture
• Open up the Vector in Hadoop file format for lightning-fast external consumption
• Open APIs / Java reference implementation
The Basics: Actian Vector
• Pioneered high-speed columnar, vector processing architecture
• Over 10 years in development
• 5 years in production
• Supports standard SQL interfaces
• Mature SQL processing front-end
• Supports advanced analytic capabilities, e.g. CUBE, ROLLUP, GROUPING SETS, windowing functions
• Unique trickle update capabilities
• Designed to leverage modern hardware
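CUBE and ROLLUP are shorthand for particular GROUPING SETS. `GROUP BY ROLLUP (region, product)` means `GROUPING SETS ((region, product), (region), ())`, and CUBE expands to every subset of its columns. The Python sketch below spells out those SQL semantics. Rolled-up columns appear as `None`, mirroring the NULLs that SQL emits. It illustrates what the query returns, not how Vector evaluates it, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(a, b, c) == GROUPING SETS ((a, b, c), (a, b), (a), ())."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(a, b) == GROUPING SETS ((a, b), (a), (b), ())."""
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]


def group_sum(rows, group_cols, grouping_sets, measure):
    """Compute SUM(measure) once per grouping set.

    Columns not in the current grouping set are reported as None, the way
    SQL reports NULL for rolled-up columns.
    """
    out = []
    for gs in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gs else None for c in group_cols)
            totals[key] += row[measure]
        out.extend((*key, total) for key, total in totals.items())
    return out
```

With rows (east, a, 10), (east, b, 5), and (west, a, 7), the ROLLUP over (region, product) returns the three detail rows, subtotals of 15 for east and 7 for west, and a grand total of 22. A real engine computes these levels in far fewer passes than one scan per grouping set.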
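[Figure: Actian Vector query processing. The client application sends a SQL query to SQL processing, where the SQL parser produces a parsed tree, the optimizer produces a query plan, and the cross compiler translates it into X100 algebra. In X100, the X100 rewriter produces an annotated query tree, the builder turns it into an operator tree, and the execution engine runs it. The engine requests data through the buffer manager, which performs I/O against compressed storage. The result is returned to the client application.]
Foundation for Enterprise Grade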
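The X100 execution engine in the figure is described as columnar and vector-based. The Python sketch below shows what vectorized (batch-at-a-time) execution means in general terms. It is an illustration of the technique, not Actian's code. The query `SELECT SUM(value) FROM t WHERE filter > threshold` is evaluated one vector (a batch of column values) at a time. Small primitive functions first produce a selection vector and then sum over it. `VECTOR_SIZE` and every function name are illustrative assumptions.

```python
from array import array

VECTOR_SIZE = 1024  # assumed batch size, chosen only for illustration


def select_gt(col, threshold, start, end):
    """Primitive: positions in col[start:end] whose value exceeds threshold
    (a 'selection vector')."""
    return [i for i in range(start, end) if col[i] > threshold]


def sum_at(col, positions):
    """Primitive: sum of col at the selected positions."""
    return sum(col[i] for i in positions)


def vectorized_sum_where(filter_col, value_col, threshold):
    """SELECT SUM(value) FROM t WHERE filter > threshold, processed one
    vector of column values at a time instead of one row at a time."""
    total = 0
    n = len(filter_col)
    for start in range(0, n, VECTOR_SIZE):
        end = min(start + VECTOR_SIZE, n)
        sel = select_gt(filter_col, threshold, start, end)
        total += sum_at(value_col, sel)
    return total
```

In a compiled engine, each primitive is a tight loop over contiguous column data. That means per-call overhead is paid once per vector instead of once per row, and the working set can stay in CPU cache. Python will not show that speedup; the sketch shows only the structure.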
Actian Vector in Hadoop – Distributed X100 “Secret Sauce”
[Figure: distributed query processing. On the master node (alongside the HDFS namenode), the SQL query from the client application passes through the SQL parser, optimizer, and cross compiler into X100 algebra. A distributed rewriter produces an annotated query tree, which the builder turns into an operator tree for the execution engine. Annotated trees are sent over MPI to X100 worker nodes [1..n], which are the HDFS datanodes. Each worker runs its own rewriter, builder, execution engine (on a partial operator tree), and buffer manager, reading local data through HDFS I/O. Each worker returns a partial result set to the master over MPI inter-node communication, and the master returns the final result to the client.]
Actian Vector extended and optimized to run inside a Hadoop cluster
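In the distributed design, each worker returns a partial result set that the master merges. The Python sketch below shows that scatter/gather pattern for `SELECT key, AVG(value) ... GROUP BY key`. It is a generic sketch, not Actian's implementation. Threads stand in for worker nodes, and plain return values stand in for MPI messages. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_partial_agg(partition):
    """Runs on one worker: aggregate only its locally stored (key, value) rows
    and return a partial result set of key -> (sum, count)."""
    partial = {}
    for key, value in partition:
        s, c = partial.get(key, (0, 0))
        partial[key] = (s + value, c + 1)
    return partial


def master_merge(partials):
    """Runs on the master: combine the partial (sum, count) pairs, then
    compute the final AVG per key."""
    merged = {}
    for partial in partials:
        for key, (s, c) in partial.items():
            ms, mc = merged.get(key, (0, 0))
            merged[key] = (ms + s, mc + c)
    return {key: s / c for key, (s, c) in merged.items()}


def distributed_avg(partitions):
    """Scatter the partial aggregation to one 'worker' per partition, then
    gather and merge on the 'master'."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(worker_partial_agg, partitions))
    return master_merge(partials)
```

Workers ship (sum, count) rather than per-partition averages because averages of averages are wrong. Take partitions `[("a", 1), ("a", 2), ("b", 4)]`, `[("a", 6)]`, and `[("b", 6)]`. The correct AVG for "a" is 9 / 3 = 3.0, but averaging the worker averages would give (1.5 + 6) / 2 = 3.75.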
• Enterprise ready, industrial-strength SQL in Hadoop
• HDFS for storage scalability and redundancy
• ACID compliant with SQL update capability
• YARN certified for cluster and resource management
• Vector performance on every node
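HDFS files are write-once and append-only. An engine that offers SQL UPDATE and DELETE on HDFS therefore cannot simply overwrite column data in place. One common approach keeps the written column immutable and records changes in a small in-memory delta. The delta is merged into every scan, and a periodic checkpoint writes a new base. In general terms, this is also how small "trickle" updates can be absorbed without rewriting whole files.

The sketch below shows that generic technique only; it is not Actian's actual mechanism. The `DeltaColumn` class is hypothetical. Transactions, logging, and concurrency control (the hard parts of ACID) are omitted.

```python
class DeltaColumn:
    """Immutable base column (think: data already written to HDFS) plus an
    in-memory delta of updates, deletes, and appends, merged at scan time.

    For simplicity, update/delete positions refer to base rows only.
    """

    def __init__(self, base):
        self.base = tuple(base)  # never modified in place
        self.updates = {}        # base position -> new value
        self.deleted = set()     # base positions removed
        self.appended = []       # rows added after the base was written

    def update(self, pos, value):
        self.updates[pos] = value

    def delete(self, pos):
        self.deleted.add(pos)

    def append(self, value):
        self.appended.append(value)

    def scan(self):
        """Yield the current column contents: base rows with the delta applied,
        followed by appended rows."""
        for pos, value in enumerate(self.base):
            if pos in self.deleted:
                continue
            yield self.updates.get(pos, value)
        yield from self.appended

    def checkpoint(self):
        """Fold the delta into a new immutable base with an empty delta."""
        return DeltaColumn(self.scan())
```

Scans stay correct after every change, while the expensive rewrite of the base happens only at checkpoint time.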
Results: Highest Performing SQL in Hadoop
[Figure: “Impala Subset” of TPC-DS at Scale Factor 3000 (3TB), Actian + HDP 2.1 vs. Cloudera Impala. Bars compare Impala and Actian for queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, and Q98; y-axis: number of times faster than Impala (0 to 35).]
16x faster on average; up to 30x faster
Both were executed on the same hardware and software environment: a 5-node cluster with 64GB of RAM per node and 24x1TB hard disks.
Background to the “Impala Subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
Note the use of partition keys.
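The deck reports "16x faster average" and "up to 30x faster". It does not give per-query timings, and it does not say whether the average is arithmetic or geometric. The Python sketch below shows how such summary figures are derived from per-query times and why the choice of mean matters. The function name is hypothetical, and any timings passed to it in the example are made up, not the benchmark's data.

```python
import statistics


def summarize_speedups(baseline_secs, candidate_secs):
    """Per-query speedup = baseline time / candidate time.

    Returns the arithmetic mean, geometric mean, and maximum of those ratios.
    """
    if len(baseline_secs) != len(candidate_secs) or not baseline_secs:
        raise ValueError("need one baseline and one candidate time per query")
    ratios = [b / c for b, c in zip(baseline_secs, candidate_secs)]
    return {
        "arithmetic_mean": statistics.mean(ratios),
        "geometric_mean": statistics.geometric_mean(ratios),  # Python 3.8+
        "max": max(ratios),
    }
```

With made-up timings giving 2x and 8x on two queries, the arithmetic mean is 5x while the geometric mean is about 4x. The arithmetic mean is never smaller than the geometric mean and is pulled up by a few large ratios, so a headline "average" speedup is clearer when it names the mean used.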
Vortex – Summary
Open
What we provide: • Collaborative architecture • Open access to Actian formats • Support for non-Actian formats
Customer benefit: You’re NOT locked in, and you can benefit from all the advances and innovation in open source.
Fast
What we provide: • Fastest data prep and ingestion • Fastest SQL analytic engines • Unbridled processing power on data nodes in a Hadoop cluster
Customer benefit: You get the results you need when you need them as your data volumes grow.
Enterprise Grade
What we provide: • Full SQL support • Extreme scalability • Full security • High Availability & Disaster Recovery
Customer benefit: You get all the advantages of proven technology in an immature space.
…and it’s very easy to get started
• Pick the analytic workload that is causing you the most pain
• Seamlessly run it on our modern SQL analytics platform
• Benefit from our open architecture
• Enjoy flexible deployment options (on-prem or in the cloud)
• Get up and running in 30 minutes
• Easily migrate workloads
Innovate and modernize now without risk…
Thank You!
Questions?
Clearly Differentiated
[Figure: Big Data Analytics Market positioning chart. Axes: performance (slow to fast) and enterprise readiness (immature, good enough, production ready, industrial strength), with level of openness as a further dimension. Segments shown: Open Source Up-Starts; Legacy Operational; Modern, Super Scaling Columnar Analytic Engines. Actian is placed as open, fast, enterprise-grade SQL.]
Johnny-Come-Lately
Data science/analytics is not an application; it is a workflow environment involving many applications
The Data Science Latencies
1 Data access
2 Data preparation
3 Model development
4 Execution
5 Implementation
6 Model audit & update
This is where the rubber meets the road: Speed = Value
• Given that analytics is a complex application, what is the process of implementing Actian’s technology?
• How many of your customers (roughly) are building predictive analytics apps?
• Does your technology have application for real-time streaming?
• Is Vortex also appropriate for BI applications?
• What is the largest amount of data currently under management with any of your customers?
• Which companies/technologies do you compete with directly?
Upcoming Topics
www.insideanalysis.com
July: SQL INNOVATION
August: REAL-TIME DATA
September: HADOOP 2.0