hardware & software acquisition costs · 2017-05-23 · •scalable: performance dynamically...



Post on 01-Aug-2020



www.pervasivedatarush.com

Pervasive DataRush™

Parallel Data Analysis with KNIME


SCALE YOUR BIG DATA APPLICATION ACROSS MANY CORES AND MULTIPLE NODES

Company Overview

Global Software Company

• Tens of thousands of users across the globe

• Americas, EMEA, Asia

• ~230 employees worldwide

Strong Financials

• $46 million revenue (Trailing 12-month)

• 40 consecutive quarters of profitability

• $36 million in the bank

• NASDAQ:PVSW since 1997

Leader in Data Innovation

• Cloud-Based and On-Premises Data Integration

• Data Management

• Web-based Business-to-Business Data Interchange

• Highly Parallel Data-Intensive and Analytic Applications


The Challenge of Big Data

[Figure: application domains plotted by data size (GB to PB) against complexity.]

HPC

• climate modeling

• seismic analysis

• fluid dynamics

Internet scale

• web indexing

• web search

Enterprise data

• custom solutions

• data quality

• data analytics

Need to deal with increased data and complexity


Pervasive DataRush™

• Scalable: Performance dynamically scales with increased core counts and increased nodes.

• High Throughput: Fast, deep analysis of large data sets with no limit on input data size.

• Cost Efficient: Maximum performance from commodity multicore servers, SMP systems and clusters.

• Easy to Implement: No complex parallel processing issues; visual and API level interfaces.

• Extensible: Extensible platform so you remain in control of development.

… a parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications

[Figure: DataRush Apps Scale Up and Out. An "Analytics and Big Data Application" shown across Multicore, SMP, Cluster, and Hadoop Cluster.]


Pervasive DataRush Architecture

[Figure: architecture diagram of a high performance data-intensive application. Labels recovered from the diagram:]

• PDR Modules: DR DataMatcher, DR Recommender, DR Profiler

• User-defined Modules

• Data preparation

• Data analytics

• DR Core Data Prep Lib

• DR Core Analytics Lib

• User-defined Libraries

• Dynamic Processing Graph

• DataRush SDK

• PDR Parallel Dataflow Engine

• KNIME

• JVM: Java, Python, JRuby, Scala…

• Large volumes of data

• Quality data

• Actionable analytics


DataRush & KNIME Integration

• Desktop plug-in for DataRush usage

– Nodes for data preparation and manipulation

– Base set of parallelized data mining functionality

– Highly efficient & parallelized data staging

– Parallel execution extension

• SDK plug-in for DataRush node development

– Create your own DataRush based nodes

– Access to the full DataRush APIs

– Wizard for creating DataRush based nodes


Normal Execution

[Figure sequence: animation frames of a workflow under normal execution. The intermediate frames carry the label "DataRush Engine spawns"; the final frame is titled "Normal Execution - Complete".]


Parallel DataRush Executor

• Capabilities

– Supports parallel execution of DataRush based nodes without intermediate staging

– Automatically splits workflows into executable graphs at staging boundaries

– Executes non-DataRush nodes including meta-nodes, for loops and branches

– Usable within desktop, command line and server environments
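
Splitting a workflow into executable graphs at staging boundaries can be illustrated with a minimal Python sketch. This is not the DataRush executor or the KNIME API: the `Node` class, the `is_datarush` flag, the node names, and the `split_at_staging_boundaries` helper are all hypothetical. The workflow is simplified to a linear chain, and the sketch assumes that every switch between DataRush and non-DataRush nodes is a staging boundary.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Node:
    """A workflow node (hypothetical model, not a KNIME or DataRush class)."""
    name: str
    is_datarush: bool


def split_at_staging_boundaries(workflow):
    """Group consecutive nodes of the same kind into segments.

    Assumption: in a linear workflow, data is staged wherever a DataRush
    node is adjacent to a non-DataRush node, so each maximal run of
    DataRush nodes can execute as one graph without intermediate staging.
    """
    segments = []
    for is_dr, run in groupby(workflow, key=lambda n: n.is_datarush):
        kind = "datarush-graph" if is_dr else "regular-nodes"
        segments.append((kind, [n.name for n in run]))
    return segments


workflow = [
    Node("Read", True),
    Node("Parse", True),
    Node("Aggregate", True),
    Node("Loop over groups", False),  # a non-DataRush node
    Node("Format", True),
    Node("Write", True),
]

for kind, names in split_at_staging_boundaries(workflow):
    print(kind, names)
```

In this toy workflow the single non-DataRush node yields three segments: a DataRush graph, the regular node, and a second DataRush graph.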


Parallel Execution

[Figure sequence: animation frames of a workflow under parallel execution. The intermediate frames carry the label "DataRush Engine"; the final frame is titled "Parallel Execution - Complete".]

Parallel Execution – Details

[Figure: four parallel copies each of Parse, Replace, Aggregate, and Format, followed by a single Write.]
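
The layout on this slide, four copies each of Parse, Replace, Aggregate, and Format feeding one Write, suggests data-parallel execution over partitions. A minimal Python sketch under that assumption follows. The stage functions, the sample records, the "replace missing values with 0" rule, and the key-based partitioning are illustrative only and do not use the DataRush API; a thread pool stands in for the engine's workers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = 4  # the slide shows four copies of each stage


def partition_by_key(lines, n):
    """Route each "key,value" line to a partition chosen by its key, so
    that all records for one key are aggregated in the same stream."""
    parts = [[] for _ in range(n)]
    for line in lines:
        key = line.split(",", 1)[0]
        parts[zlib.crc32(key.encode()) % n].append(line)
    return parts


def parse(lines):
    # "key,value" -> (key, int), or (key, None) when the value is missing
    out = []
    for line in lines:
        key, value = line.split(",", 1)
        out.append((key, int(value) if value.strip() else None))
    return out


def replace(records):
    # Replace missing values with 0 (an illustrative rule)
    return [(k, 0 if v is None else v) for k, v in records]


def aggregate(records):
    # Sum the values per key within one partition
    totals = {}
    for k, v in records:
        totals[k] = totals.get(k, 0) + v
    return totals


def format_rows(totals):
    return [f"{k}\t{v}" for k, v in sorted(totals.items())]


def run_partition(lines):
    # One stream: Parse -> Replace -> Aggregate -> Format
    return format_rows(aggregate(replace(parse(lines))))


def run(lines):
    parts = partition_by_key(lines, PARTITIONS)
    with ThreadPoolExecutor(max_workers=PARTITIONS) as pool:
        results = list(pool.map(run_partition, parts))
    # Write: a single stage gathers the formatted rows from all streams
    return [row for rows in results for row in rows]


rows = run(["a,1", "b,2", "a,3", "c,", "b,5"])
print(sorted(rows))
```

Because of CPython's global interpreter lock, the thread pool here shows the structure of the computation rather than a speedup for CPU-bound stages.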


Demo


Vision for Levels of Usage

• Level 0

– No code changes on your part

– Install the DataRush plug-in and most nodes will see a performance benefit

• Level 1

– Some code changes required

– Utilize DataRush to access the parallelized data staging capability, bypassing the BDT API

• Level 2

– Utilize the DataRush SDK to build nodes using the full parallelized flow capability of DataRush

– Available today



Demo


DataRush Benefits

• High Throughput

– Process data quickly and efficiently

– Accomplish complex processing in a single pass

• Scalable

– Takes advantage of multicore processors

– Runs faster as more cores are added

– Scales with the amount of data

• Easy to use and extend

– Dataflow abstraction hides parallelism details

– SDK to ease development
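
The "single pass" point can be illustrated with a small Python sketch in which chained generators stand in for dataflow operators: each record flows through every step before the next one is read, so no intermediate result is materialized. The operator names and the sample data are hypothetical, and the sketch is sequential; it does not show how DataRush schedules operators in parallel.

```python
def read(lines):
    # Source operator: emit one raw record at a time
    for line in lines:
        yield line


def parse(records):
    # "name, amount" text -> (name, float)
    for line in records:
        name, amount = line.split(",")
        yield name.strip(), float(amount)


def keep_positive(records):
    # Filter operator: drop records with non-positive amounts
    for name, amount in records:
        if amount > 0:
            yield name, amount


def summarize(records):
    """Consume the stream once, computing count, total and maximum together."""
    count, total, largest = 0, 0.0, None
    for _, amount in records:
        count += 1
        total += amount
        largest = amount if largest is None else max(largest, amount)
    return {"count": count, "total": total, "max": largest}


# A one-shot iterator: it can only be read once, which forces a single pass
source = iter(["a, 10.0", "b, -3.5", "c, 2.5"])
result = summarize(keep_positive(parse(read(source))))
print(result)
```

The pipeline is written as a plain composition of steps; how those steps are executed is left to whatever runs them, which is the sense in which a dataflow abstraction can hide execution details from the author.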


Summary

• Scale performance on commodity multicore systems

– Massive performance exists on a single server

– Core counts growing with Moore’s Law

• Scale up and scale out

– Economical, environmental, and manageable

• Scale to big data

– Handle diverse, complex, massive data sets

• Scale development

– Easy for existing team to implement parallel applications

– Extensible platform keeps you in control

Simplify how you develop Big Data applications



Questions?