hardware & software acquisition costs · 2017-05-23 · •scalable: performance dynamically...
www.pervasivedatarush.com
Pervasive DataRush™
Parallel Data Analysis with KNIME
SCALE YOUR BIG DATA APPLICATION ACROSS MANY CORES AND MULTIPLE NODES
Company Overview
Global Software Company
• Tens of thousands of users across the globe
• Americas, EMEA, Asia
• ~230 employees worldwide
Strong Financials
• $46 million revenue (trailing 12 months)
• 40 consecutive quarters of profitability
• $36 million in the bank
• NASDAQ:PVSW since 1997
Leader in Data Innovation
• Cloud-Based and On-Premises Data Integration
• Data Management
• Web-based Business-to-Business Data Interchange
• Highly Parallel Data-Intensive and Analytic Applications
The Challenge of Big Data
[Figure: complexity vs. data size (GB to PB)]
• HPC: climate modeling, seismic analysis, fluid dynamics
• Internet scale: web indexing, web search
• Enterprise data: custom solutions, data quality, data analytics
Need to deal with increased data and complexity.
Pervasive DataRush™
• Scalable: Performance scales dynamically as core and node counts increase.
• High Throughput: Fast, deep analysis of large data sets with no limit on input data size.
• Cost Efficient: Maximum performance from commodity multicore servers, SMP systems, and clusters.
• Easy to Implement: No complex parallel-processing issues to manage; visual and API-level interfaces.
• Extensible: An extensible platform that keeps you in control of development.
… a parallel dataflow platform that eliminates performance
bottlenecks in your data-intensive applications
DataRush Apps Scale Up and Out
[Figure: an analytics and Big Data application running on multicore, SMP, cluster, and Hadoop cluster platforms]
Pervasive DataRush Architecture
[Figure: layered architecture diagram; labels include "Large volumes of data", "Quality data", and "Actionable analytics"]
• High Performance Data-intensive Application: KNIME, …
• PDR Modules: DR DataMatcher, DR Recommender, DR Profiler
• User-defined Modules: data preparation, data analytics
• DR Core Data Prep Lib, DR Core Analytics Lib, User-defined Libraries
• DataRush SDK, Dynamic Processing Graph
• PDR Parallel Dataflow Engine
• JVM: Java, Python, JRuby, Scala…
DataRush & KNIME Integration
• Desktop plug-in for DataRush usage
– Nodes for data preparation and manipulation
– Base set of parallelized data mining functionality
– Highly efficient & parallelized data staging
– Parallel execution extension
• SDK plug-in for DataRush node development
– Create your own DataRush-based nodes
– Access to the full DataRush APIs
– Wizard for creating DataRush-based nodes
Normal Execution
[Figure: animated build showing normal workflow execution; the DataRush Engine spawns at successive steps]
Normal Execution - Complete
Parallel DataRush Executor
• Capabilities
– Supports parallel execution of DataRush-based nodes without intermediate staging
– Automatically splits workflows into executable graphs at staging boundaries
– Executes non-DataRush nodes, including meta-nodes, loops, and branches
– Usable within desktop, command-line, and server environments
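The idea of splitting a workflow into executable graphs at staging boundaries can be pictured with a minimal Java sketch (Java 16 or later). It assumes a simplified, linear workflow and a hypothetical `Node` record that records only whether a node is DataRush-based: consecutive DataRush nodes are grouped into one graph that runs without intermediate staging, and every other node is treated as a staging boundary. This illustrates the concept only; it is not the DataRush executor or its API.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkflowSplitter {
    // Hypothetical node model (not a DataRush or KNIME type): a name and
    // whether the node is DataRush-based.
    record Node(String name, boolean dataRush) {}

    // Groups consecutive DataRush nodes into one segment that can run as a single
    // graph without intermediate staging; any other node is treated as a staging
    // boundary and becomes a segment of its own.
    static List<List<Node>> split(List<Node> workflow) {
        List<List<Node>> segments = new ArrayList<>();
        List<Node> current = new ArrayList<>();
        for (Node node : workflow) {
            if (node.dataRush()) {
                current.add(node);
            } else {
                if (!current.isEmpty()) {
                    segments.add(current);      // close the open DataRush graph
                    current = new ArrayList<>();
                }
                segments.add(List.of(node));    // non-DataRush node runs on staged data
            }
        }
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Node> workflow = List.of(
                new Node("Reader", true),
                new Node("Filter", true),
                new Node("Scatter Plot", false),
                new Node("Aggregate", true),
                new Node("Writer", true));
        for (List<Node> segment : split(workflow)) {
            System.out.println(segment.stream().map(Node::name).toList());
        }
        // prints:
        // [Reader, Filter]
        // [Scatter Plot]
        // [Aggregate, Writer]
    }
}
```

A real executor would work on a branching graph rather than a list, but the same rule applies: staging is needed only where data leaves a DataRush graph.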
Parallel Execution
[Figure: animated build showing the workflow executing in the DataRush Engine]
Parallel Execution - Complete
Parallel Execution – Details
[Figure: four parallel partitions, each running Parse → Replace → Aggregate → Format, feeding a single Write]
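The partitioned layout in the figure can be approximated in plain Java with `java.util.concurrent`. In this sketch, which is an assumption and not DataRush code, the input is hash-partitioned by key into four partitions; each partition runs Parse → Replace → Aggregate → Format on its own thread, and a single Write step gathers the results. The `key,value` record format, the rule that replaces a missing value with 0, and the class and method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedPipeline {
    // Hypothetical partition count, matching the four lanes in the figure.
    static final int PARTITIONS = 4;

    // Work done by one partition: Parse -> Replace -> Aggregate -> Format.
    static List<String> runPartition(List<String> lines) {
        Map<String, Long> sums = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);                  // Parse "key,value"
            String key = fields[0].trim();
            String raw = fields.length > 1 ? fields[1].trim() : "";
            long value = raw.isEmpty() ? 0L : Long.parseLong(raw);   // Replace missing value with 0
            sums.merge(key, value, Long::sum);                       // Aggregate: sum per key
        }
        List<String> formatted = new ArrayList<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            formatted.add(e.getKey() + "=" + e.getValue());          // Format
        }
        return formatted;
    }

    static List<String> run(List<String> input) {
        // Hash-partition by key so every record for a given key lands in the same partition.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < PARTITIONS; i++) {
            parts.add(new ArrayList<>());
        }
        for (String line : input) {
            String key = line.split(",", -1)[0].trim();
            parts.get(Math.floorMod(key.hashCode(), PARTITIONS)).add(line);
        }
        ExecutorService pool = Executors.newFixedThreadPool(PARTITIONS);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> part : parts) {
                futures.add(pool.submit(() -> runPartition(part)));  // one task per partition
            }
            // Write: a single sink gathers the formatted records from every partition.
            List<String> written = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                written.addAll(f.get());
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> result = run(List.of("a,1", "b,2", "a,3", "c,", "b,5"));
        result.sort(null); // partition order is not key order in general, so sort for display
        System.out.println(result); // [a=4, b=7, c=0]
    }
}
```

Partitioning by key means each partition owns a disjoint set of keys, so its aggregates are already final and Write only has to concatenate them; a round-robin split would instead need an extra step to combine partial aggregates.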
Demo
Vision for Levels of Usage
• Level 0
– No code changes on your part
– Install the DataRush plug-in and most nodes will see a performance benefit
• Level 1
– Some code changes required
– Use DataRush to access its parallelized data staging capability, bypassing the BufferedDataTable (BDT) API
• Level 2
– Use the DataRush SDK to build nodes that exploit the full parallel dataflow capability of DataRush
– Available today
Demo
DataRush Benefits
• High Throughput
– Process data quickly and efficiently
– Accomplish complex processing in a single pass
• Scalable
– Takes advantage of multicore processors
– Runs faster as more cores are added
– Scales with the amount of data
• Easy to use and extend
– Dataflow abstraction hides parallelism details
– SDK to ease development
Summary
• Scale performance on commodity multicore systems
– Massive performance is available on a single server
– Core counts grow with Moore’s Law
• Scale up and scale out
– Economical, environmentally friendly, and manageable
• Scale to big data
– Handle diverse, complex, massive data sets
• Scale development
– Easy for your existing team to implement parallel applications
– Extensible platform keeps you in control
Simplify how you develop Big Data applications
Questions?