Level Up – How to Achieve Hadoop Acceleration
DESCRIPTION
The Briefing Room with Robin Bloor and HP Vertica
Live Webcast on August 26, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=3dd6d1b068fe395f665c75adb682ac41
Hadoop has long passed the point of being a nascent technology, but many users have found that, when left to its own devices, Hadoop can be a one-trick pony. To get the most out of Hadoop, organizations need a flexible platform that empowers analysts and data managers with a complete set of information lifecycle management and analytics tools without a performance tradeoff. Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he outlines Hadoop’s role in a big data architecture. He’ll be briefed by Walt Maguire of HP Vertica, who will showcase his company’s big data solutions, including HAVEn and the HP Big Data Platform. He will demonstrate how HP Vertica acts as a complement to Hadoop, and how the combination of the two provides a versatile and highly performant solution. Visit InsideAnalysis.com for more information.
TRANSCRIPT
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Level Up – How to Achieve Hadoop Acceleration
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA ECOSYSTEM
September: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
Executive Summary
• Yes, you still need to PLAN
• File formats and partitioning MATTER
• Pay attention to TCO
• Be willing to fail fast, OFTEN
LOOK BEFORE YOU LEAP INTO HADOOP!
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
@robinbloor
HP Vertica
• HP Vertica offers a range of enterprise software and database management solutions
• The column-oriented Vertica Analytics Platform leverages a standard SQL interface that now integrates with Hadoop
• HP’s big data initiative includes HAVEn (Hadoop, Autonomy IDOL, Vertica, Enterprise Security and n Apps), a platform designed to analyze and manage petabytes of data
Guest: Walter Maguire
Walter Maguire has twenty-seven years of experience in analytics and data technologies. He practiced data science before it had a name, worked with big data when "big" meant a megabyte, and supported the movement which brought data management and analytic technologies from back-office to the front. In October of 2010, Walt became the first hire west of Denver for Vertica, makers of the Vertica Analytics software platform for real-time analytics of structured and unstructured data. Since then, he has helped build the HP Vertica customer base and team in the western USA. Now as Chief Field Technologist with HP Vertica, Walt addresses customer needs with the continuing evolution of Vertica and HAVEn, the HP Big Data strategy that links hardware, software, services, and business transformation consulting for successful execution.
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Big Data & SQL: Hadoop Convergence
With the HP Vertica Analytics Platform
Walt Maguire, Chief Field Technologist, HP Vertica
Chris Selland, VP Business Development, HP Vertica
Essential Requirements of an Analytics Platform
Introducing HP Vertica Dragline
Faster answers from Big Data at a fraction of the cost of traditional data warehouses
• Store all your data in any format cost-effectively across Vertica + Hadoop
• Explore all your data directly in Hadoop without moving or changing it
• Serve all of your data consumers without compromise, from individualized queries to large complex reports
HP Vertica
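Cost-Optimized Storage - ILM
[Diagram: data tiers with Location and Format columns]
• Serve: Interactive Data. Location: frequently queried Vertica data cache. Format: convert data to Vertica storage format
• Explore: Batch Data. Format: any format
• Store: Archive Data. Format: any format
Data temperature labels: Hot, Cool, Cold
Other diagram labels: Tier-off older data; Value Discovery; Dark Data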
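The idea of tiering off older data can be sketched in a few lines of Python. This is only an illustration of the concept, not anything taken from HP Vertica: the thresholds, the function names `assign_tier` and `plan_tier_off`, and the use of "days since last queried" as the temperature signal are all assumptions, since the slide gives no numbers or rules.

```python
from datetime import date

# Hypothetical thresholds in days; the slide does not specify any.
HOT_MAX_AGE_DAYS = 30
COOL_MAX_AGE_DAYS = 365


def assign_tier(last_queried: date, today: date) -> str:
    """Return 'hot', 'cool', or 'cold' from the days since the data was last queried."""
    age_days = (today - last_queried).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= COOL_MAX_AGE_DAYS:
        return "cool"
    return "cold"


def plan_tier_off(partitions: dict, today: date) -> dict:
    """Group partition names by the tier each one would be assigned to."""
    plan = {"hot": [], "cool": [], "cold": []}
    for name, last_queried in sorted(partitions.items()):
        plan[assign_tier(last_queried, today)].append(name)
    return plan
```

In a real deployment the signal would more likely come from query statistics and business rules than from a single date, so treat the thresholds as placeholders.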
The Richest, Most Open SQL on Hadoop
Challenge: Extracting data from Hadoop requires complex and brittle ETL processes
Solution: Hadoop Navigation and Analytics
Benefits:
• Navigate Hadoop data using its native catalog
• Quickly and easily load native data types from Hadoop to Vertica
• Avoid creating and maintaining time-consuming schemas
• Use the full power of HP Vertica SQL and analytics
• Choose your own Hadoop distribution
HP Vertica Flex Zone
Avoid creating and maintaining time-consuming schemas
Load, manage, and explore semi-structured data
• Faster SQL querying on semi-structured data
• Auto-schematization: semi-structured data loading
• Flexible parsers for JSON and delimited data
• One-step schema for blazing-fast performance
Exploring Data with FlexZone
Create Flex Table (with or without any columns)
Load Flex Tables (using parsers; data format independent)
Explore Data (using map functions)
Materialize Flex Table (compute keys, build views, materialize flextable columns)
Manage Flex Table (alter, config, etc.)
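The load, explore, and materialize steps above can be mimicked conceptually in plain Python. This is a rough sketch of the schema-on-read idea only; it does not use or reproduce Vertica's actual functions, and the names `load_flex`, `compute_keys`, and `materialize`, along with the sample records, are hypothetical.

```python
import json
from collections import Counter


def load_flex(lines):
    """Load JSON records as plain dicts; no schema is declared up front."""
    return [json.loads(line) for line in lines if line.strip()]


def compute_keys(rows):
    """Count how often each key appears across all loaded records."""
    counts = Counter()
    for row in rows:
        counts.update(row.keys())
    return counts


def materialize(rows, columns):
    """Project the chosen keys into fixed columns; missing keys become None."""
    return [tuple(row.get(col) for col in columns) for row in rows]


# Hypothetical machine-log records with differing fields.
raw = [
    '{"host": "web1", "latency_ms": 120}',
    '{"host": "web2", "latency_ms": 95, "status": 500}',
]
rows = load_flex(raw)
keys = compute_keys(rows)
table = materialize(rows, ["host", "latency_ms", "status"])
```

The point of the sketch is the ordering: data is loaded first, the keys are discovered afterwards, and only the keys worth querying often are promoted to real columns.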
Monitor Customer Experience by Joining Machine Logs to Tweets
Create Flex Table with Machine Log Data
Explore Data with Map Functions
Create Flex Table with Twitter Data
Sentiment Score Tweets with Pulse
Join Machine Log Data to Tweets with Time Series Event Join
Associate customer sentiment with application response times
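Joining tweets to machine logs by time can be illustrated with a small "as-of" join in Python: each tweet is paired with the latest log reading at or before its timestamp. This is an assumption about what the slide's Time Series Event Join amounts to, not a reproduction of Vertica's implementation; the function name `event_join` and the tuple layouts are hypothetical.

```python
from bisect import bisect_right


def event_join(logs, tweets):
    """Pair each tweet with the latest log reading at or before its timestamp.

    logs:   list of (timestamp, response_ms), sorted by timestamp
    tweets: list of (timestamp, sentiment)
    Returns a list of (timestamp, sentiment, response_ms); response_ms is
    None when no log reading precedes the tweet.
    """
    log_times = [ts for ts, _ in logs]
    joined = []
    for tweet_ts, sentiment in tweets:
        # Index of the last log entry whose timestamp is <= tweet_ts.
        idx = bisect_right(log_times, tweet_ts) - 1
        response_ms = logs[idx][1] if idx >= 0 else None
        joined.append((tweet_ts, sentiment, response_ms))
    return joined
```

With the two series aligned this way, sentiment scores and response times sit in the same row and can be compared directly.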
HP Vertica Ecosystem
Hadoop Partnership Momentum and Milestones
• Integrated SQL-on-Hadoop solution with MapR live as of May 2014
• Reseller relationship with Hortonworks announced July 2014
• Significant joint customer base with Cloudera
HP Vertica Marketplace
Thank you!
Walt Maguire, Chief Field Technologist, HP Vertica
Chris Selland, VP Business Development, HP Vertica
Perceptions & Questions
Analyst: Robin Bloor
Robin Bloor, PhD
THE NEXT PHASE OF HADOOP’S EVOLUTION?
Hadoop Evolution
FROM
• Serial batch workloads
• MapReduce
• Versatile data storage
• Key-value access only
• Island of processing
TO
• Multiple concurrent workloads
• Multiple algorithms
• “Optimized” data storage
• SQL, JSON & even SPARQL access
• Integrated processing
The Data Warehouse: From/To
Data & Data Lifecycle Management
The Major Workload Will Be Analytics
The consequences are that:
1. DATA ACCESS will need to be more versatile
2. WORKLOAD MANAGEMENT will need to be more versatile
3. Hadoop will need to “shake hands” with ONE OR MORE database engines
• In general, what is the DBA overhead for the combination of Hadoop, Flex Zone and Vertica?
• How does data lifecycle management work in practice?
• Analytics can be done on Hadoop, in Flex Zone and in Vertica. How are these choices normally handled in a Flex Zone/Vertica environment?
• What analytics components can be used with Flex Zone and Vertica?
• What do you see as the sweet spot for this architecture (by sector & company size)? Where might it be overkill?
• With respect to scale, what is your largest implementation of Hadoop/Flex Zone/Vertica by data volume?
• Who do you see as closest in direct competition?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA ECOSYSTEM
September: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
THANK YOU for your ATTENTION!
Opening slide image courtesy of Wikimedia Commons