Pentaho Big Data Analytics with Vertica and Hadoop
DESCRIPTION
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data TechCon 2014 in Boston, for the session "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho".
TRANSCRIPT
The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho
© 2014, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Pentaho Big Data Analytics
Mark Kromer, Pentaho Big Data Analytics Product Manager
Pentaho Business Analytics Platform
Users: DBA; ETL/BI Developer; Business Users & Executives; Analysts & Data Scientists
Pentaho Business Analytics: Enterprise & Interactive Reporting; Interactive Analysis; Dashboards; Predictive Analytics
Data Integration: Instaview | Visual MapReduce
Direct Access
Data sources: Operational Data; Big Data; Data Stream; Public/Private Clouds
Product Components
Pentaho Data Integration
• Visual development for big data
• Broad connectivity
• Data quality & enrichment
• Integrated scheduling
• Security integration
Pentaho Analyzer
• Visual data exploration
• Ad hoc analysis
• Interactive charts & visualizations
Pentaho Dashboards
• Self-service dashboard builder
• Content linking & drill through
• Highly customized mash-ups
Pentaho Data Mining & Predictive Analytics
• Model construction & evaluation
• Learning schemes
• Integration with 3rd-party models using PMML
Pentaho Enterprise & Interactive Reports
• Both ad hoc & distributed reporting
• Drag & drop interactive reporting
• Pixel-perfect enterprise reports
Pentaho for Big Data MapReduce & Instaview
• Visual interface for developing MapReduce
• Self-service big data discovery
• Big data access for data analysts
Pentaho Interactive Analysis & Data Discovery
Highly Flexible Advanced Visualizations
❯ Simple, easy-to-use visual data exploration
❯ Web-based thin client; in-memory caching
❯ Rich library of interactive visualizations
• Geo-mapping, heat grids, scatter plots, bubble charts, line over bar and more
• Pluggable visualizations
❯ Java ROLAP engine to analyze structured and unstructured data, with SQL dialects for querying data from RDBMSs
❯ Pluggable cache integrating with leading caching architectures: Infinispan (JBoss Data Grid) & Memcached
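To illustrate the ROLAP idea named on this slide (aggregates are answered by generating SQL against a relational store rather than by reading a pre-built cube), here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in RDBMS. The table `sales`, its columns, and the helpers `rolap_query` and `build_demo_db` are hypothetical names chosen for illustration; this is not the Pentaho Analyzer engine or its API.

```python
import sqlite3


def rolap_query(conn, dimensions, measure, table="sales"):
    """Translate a (dimensions, measure) request into a SQL GROUP BY query.

    Hypothetical helper for illustration only. Identifiers are trusted
    here; real code would validate them against known metadata.
    """
    dims = ", ".join(dimensions)
    sql = (
        f"SELECT {dims}, SUM({measure}) AS total "
        f"FROM {table} GROUP BY {dims} ORDER BY {dims}"
    )
    return conn.execute(sql).fetchall()


def build_demo_db():
    """Create an in-memory fact table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("East", "A", 10.0),
            ("East", "B", 5.0),
            ("West", "A", 7.0),
            ("East", "A", 2.5),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # Slice by one dimension, then drill down by adding a second one.
    print(rolap_query(demo, ["region"], "amount"))
    print(rolap_query(demo, ["region", "product"], "amount"))
```

Each change of dimensions in the front end maps to a new GROUP BY query, which is why a result cache in front of the database matters for interactive use.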
Pentaho Data Integration
Easy to Use, Highly Scalable
❯ Graphical ETL designer
❯ Data agnostic
• Structured, unstructured, web services, packaged apps (Google, SAS, SFDC, etc.), big data sources, traditional sources, JSON, XML, HL7, etc.
❯ Batch, low-latency & real-time processing
❯ Scale-out architecture, deployable to PDI clusters and Hadoop clusters
❯ 100% Java engine; plug-in architecture for extensibility
❯ Workflow, alerting, monitoring
Integration, Manipulation & Enrichment
Use Cases:
Classic ETL – data warehouse creation, population & maintenance
Information Delivery – extraction from multiple data sources, transformation and streaming to a report
MapReduce Applications – implementing “code-free” transformation pipelines within Hadoop
Extensibility – adding 3rd-party functionality that automatically works within any of the above use cases
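For readers unfamiliar with the "Classic ETL" use case above, the following Python sketch shows the three stages as hand-written code, roughly the kind of logic a graphical ETL designer expresses as connected steps. It is an illustration under assumptions: the sample CSV, the `fact_orders` table, and the function names are made up, and the code uses only the standard library rather than any Pentaho Data Integration API.

```python
import csv
import io
import sqlite3

# Made-up source data with typical quality problems:
# stray whitespace, a missing amount, inconsistent casing.
RAW_CSV = """customer,amount,currency
alice, 10.50 ,usd
bob,,usd
carol,7,USD
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim fields, drop rows with no amount, normalize types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # data-quality rule: skip incomplete records
        cleaned.append(
            (
                row["customer"].strip().title(),
                float(amount),
                row["currency"].strip().upper(),
            )
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse-style fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three stages against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```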
Pentaho Big Data Analytics
Accelerate the time to big data value
• Full continuity from data access to decisions – complete data integration & analytics for any big data store
• Faster development, faster runtime – visual development, distributed execution
• Instant and interactive analysis – no coding and no ETL required
Pentaho Visual Development
Eliminates the Need for Complex Coding
Would you rather do this?
… or this?
[Figure labels: Scheduling; Modeling; Ingestion / Manipulation / Integration]
Pentaho Visual MapReduce
Drag & Drop, Then Run in the Cluster
Parallel Execution as MapReduce in the Hadoop Cluster
As Much as 15x Faster Than Hand-Written Code
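As background for the comparison with hand-written code, this is a minimal single-process Python sketch of the map, shuffle, and reduce phases that a MapReduce job performs in parallel across a cluster. It is a teaching aid under assumptions: the word-count task and all function names are chosen for illustration, and it uses neither the Hadoop API nor anything Pentaho generates.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big analytics"]))
```

In a real cluster the mappers and reducers run on different nodes and the shuffle moves data over the network; the structure of the three functions stays the same.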
Pentaho Predictive Analytics
Full Predictive Analytics Lifecycle Support
• Major sponsor of the open source project Weka
• Data exploration/visualization, model construction and export, preliminary evaluation
• Numerous classification/regression and clustering algorithms
• Integration with Pentaho Data Integration
❯ Import 3rd-party models using Predictive Model Markup Language (PMML)
❯ Operationalize models inside or outside of a Hadoop cluster
❯ Incorporate algorithms into Pentaho visual interface; store and version models using the Pentaho repository
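To make the PMML import point concrete, the sketch below reads the coefficients of a linear regression from a PMML fragment and scores one record with them. It is a hedged illustration, not how Pentaho or Weka load models: the fragment is deliberately stripped down (a valid PMML file also carries elements such as Header, DataDictionary, and MiningSchema), the field names `x1` and `x2` are made up, and only the standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Stripped-down, made-up PMML fragment for a linear regression model.
DEMO_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def local_name(element):
    """Return the tag name without its XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def load_regression(pmml_text):
    """Read the intercept and coefficients from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = next(e for e in root.iter() if local_name(e) == "RegressionTable")
    intercept = float(table.get("intercept", "0"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table
        if local_name(p) == "NumericPredictor"
    }
    return intercept, coefficients


def score(model, features):
    """Apply the linear model: intercept + sum(coefficient * feature)."""
    intercept, coefficients = model
    return intercept + sum(c * features[name] for name, c in coefficients.items())


if __name__ == "__main__":
    model = load_regression(DEMO_PMML)
    print(score(model, {"x1": 2.0, "x2": 4.0}))
```

Because the model travels as a document rather than as code, the same file can be scored by any tool that understands the format, which is the portability that importing 3rd-party models relies on.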
Streamlined Data Refinery
Drive a Sustainable Analytics Strategy with Big Data Orchestration at Scale
Transactions – Batch & Real-time
Enrollments & Redemptions
Location, Email, Other Data
Hadoop Cluster
Analyzer
Reports
Data Orchestration
JOIN THE CONVERSATION. YOU CAN FIND US ON:
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics