sparkflows.io

Reducing cost and time-to-market for Big Data Analytics & Applications by 10X. Self-Service Big Data Analytics & Applications: cut down from months to hours.

Upload: sparkflows

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Sparkflows.io

Reducing cost and time-to-market for Big Data Analytics & Applications by 10X

Self-Service Big Data Analytics & Applications

Cut down from months to hours

Page 2: Sparkflows.io

Agenda

Problem

Sparkflows Solution

Differentiators

Page 3: Sparkflows.io

Data Analysts

Data Engineers

Data Scientists

It's challenging for users to get value out of the Data Lake

Data Lake

● Data Analytics, Data Preparation & Blending

● Machine Learning

● Streaming Applications

● Batch Applications

● Dashboards & Visualization

Needs a lot of coding on Big Data

Page 4: Sparkflows.io

Machine Learning

Classification, Regression, Clustering, Collaborative Filtering, Save/Load Model, Predict, Cross-Validator

NLP

CoreNLP StanfordNLP

OCR

Tesseract

File Formats

CSV/TSV, Parquet, JSON, Avro, PDF, Images, Whole Files

Feature Generation

Tokenization, TF, IDF, OneHotEncoder, StringIndexer, Imputer, Scaler

Data Sources/Sinks

HDFS, S3, Kafka, Flume, Twitter, HBase, Solr

ETL

Joins, Unions, Filter, SQL, Scala, Python, GeoIP, ConcatColumns, Column Filter, Dedup
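
As an illustration of what the Tokenization, TF, and IDF entries under Feature Generation compute, here is a minimal plain-Python sketch. It is not Sparkflows code and does not use Spark. The function names (tokenize, term_frequencies, inverse_document_frequencies, tf_idf) are hypothetical, and the smoothed IDF formula log((N + 1) / (df + 1)) is one common variant, chosen here as an assumption.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def term_frequencies(tokens):
    """TF: raw count of each token within a single document."""
    return Counter(tokens)


def inverse_document_frequencies(tokenized_docs):
    """Smoothed IDF (assumed variant): log((N + 1) / (df + 1)).

    N is the number of documents and df is the number of documents
    that contain the term at least once.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}


def tf_idf(tokenized_docs):
    """Return one {term: tf * idf} dictionary per document."""
    idf = inverse_document_frequencies(tokenized_docs)
    return [
        {term: count * idf[term] for term, count in term_frequencies(tokens).items()}
        for tokens in tokenized_docs
    ]
```

With two short documents such as "Spark is fast" and "Spark is easy", terms that appear in every document get a weight of zero, while terms unique to one document get a positive weight.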

Page 5: Sparkflows.io

Long time to Production & Value

Hard to maintain and extend the pipelines/applications

Very Hard to Collaborate

Business Data Scientist

Data Engineer IT

Very Complex Deployment

Hard to handover code

Results In

Page 6: Sparkflows.io

Data Analysts

Data Engineers

Data Scientists

Spark

Relational

Batch + Streaming

Hadoop

Workflow / Application Repository

Nodes Repository

Future

● 100+ Nodes
● Entity Resolution
● Machine Learning
● Data Wrangling / ETL / Drools

● Sentiment Analysis
● Recommendations
● Churn Prediction
● Log Analytics

● Workflow Designer
● Preview Mode
● Execution Engine
● Visualization

+ SQL / Scala / Python

Page 7: Sparkflows.io

Sparkflows Solution

Page 8: Sparkflows.io

Workflow Editor

How Sparkflows Works

Rich Visualizations & Dashboards

100s of Nodes

Batch & Streaming Engine

Interactive Execution

Easy Deployment & Configuration

Pre-built Workflows

Telco Churn Pred

Housing Price Pred

Bike Sharing Analysis

NY Taxi Data Analysis

Movie Lens Recommendations

Page 9: Sparkflows.io

Confidential Property of Sparkflows.io

Sparkflows Product Stack

Streaming Data

Kafka

Flume

Data Sources

HIVE/HBase

HDFS/S3

Solr

RDBMS

Apache Spark Cluster

Databricks

AWS

IBM Bluemix

On Prem

Azure

Visualizations

ETL/NLP/OCR

Model Building

Workflow Execution

Scala/SQL/Python

Data Wrangling

Data Analysis

Data Pipelines

Big Data Analytics / Applications

Visualization

Data Sinks

HIVE/HBase

HDFS/S3

Solr

RDBMS

Page 10: Sparkflows.io

Business Analyst

Data Scientist

Data Engineer IT

Data Analytics for Business Use Cases by dragging and dropping nodes and using various datasets.

Visualization and deep understanding of the data

Build predictive models and apply predictions

Do predictive and analytical modeling with the drag-and-drop capabilities

Write custom SQL, Scala, Python to close the gaps

Blend static and real-time streams to build complex data pipelines

Build and deploy complex pipelines in minutes.

Connect to various sources and sinks including Kafka, HDFS, S3, HBase, Solr.

Build and expose custom nodes in Sparkflows for others to use

Embed SQL, Scala, Python within the workflow.

Easily configure multi-tenancy and security for Sparkflows users

Connect workflow results to platform of choice for visualization

Provision Hadoop infrastructure, monitor workflow jobs, and tune performance

Page 11: Sparkflows.io

Why Now?

Big Trend towards building with Templates

StreamSets

iPhone Apps

Building Website

NiFi

StreamAnalytix Impetus

Alteryx

Page 12: Sparkflows.io

Dashboards

Combine output of various Workflows into Dashboards

Page 13: Sparkflows.io

Core Differentiators

Easy & Natural to use and Deploy

Deep Integration with Hadoop - Security/Impersonation/HIVE/HBase/Solr

Custom Nodes - Users can write their own Nodes and plug into the UI

Schema Propagation

Interactive Execution at Design Time

Rich Application Dashboards

Growing Repository of Workflows for various Solutions

Complex Nodes built out by Sparkflows - Dedup, Drools, OpenNLP, StanfordNLP, Tesseract, etc.

Batch & Streaming - Nodes support both Batch & Streaming workloads

Support for SQL, Scala, Jython as Nodes of the workflow
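
Two of the differentiators above, Custom Nodes and Schema Propagation, can be illustrated with a small plain-Python sketch. Everything in it is hypothetical: the Node class, the concat_columns and dedup node factories (named after the ConcatColumns and Dedup nodes listed earlier), and the propagate and run helpers are assumptions made for illustration, not the Sparkflows API. The idea shown is that each node declares how it changes the schema, so a workflow can be checked at design time before any data is processed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Schema = Dict[str, str]   # column name -> type name
Row = Dict[str, object]   # one record, keyed by column name


@dataclass
class Node:
    """A hypothetical workflow node: one schema rule plus one data transform."""
    name: str
    propagate_schema: Callable[[Schema], Schema]
    transform: Callable[[List[Row]], List[Row]]


def concat_columns(left: str, right: str, output: str, sep: str = " ") -> Node:
    """Custom node that joins two string columns into a new column."""
    def schema_fn(schema: Schema) -> Schema:
        missing = [c for c in (left, right) if c not in schema]
        if missing:
            raise ValueError(f"missing input columns: {missing}")
        return {**schema, output: "string"}

    def transform_fn(rows: List[Row]) -> List[Row]:
        return [{**row, output: f"{row[left]}{sep}{row[right]}"} for row in rows]

    return Node(f"concat({left},{right})", schema_fn, transform_fn)


def dedup(key: str) -> Node:
    """Custom node that keeps the first row seen for each value of `key`."""
    def schema_fn(schema: Schema) -> Schema:
        if key not in schema:
            raise ValueError(f"missing key column: {key}")
        return dict(schema)  # dedup leaves the schema unchanged

    def transform_fn(rows: List[Row]) -> List[Row]:
        seen = set()
        kept = []
        for row in rows:
            if row[key] not in seen:
                seen.add(row[key])
                kept.append(row)
        return kept

    return Node(f"dedup({key})", schema_fn, transform_fn)


def propagate(schema: Schema, nodes: List[Node]) -> List[Schema]:
    """Design-time check: compute the schema after each node without touching data."""
    schemas = []
    for node in nodes:
        schema = node.propagate_schema(schema)
        schemas.append(schema)
    return schemas


def run(rows: List[Row], nodes: List[Node]) -> List[Row]:
    """Execute the nodes in order on in-memory rows."""
    for node in nodes:
        rows = node.transform(rows)
    return rows
```

In this sketch, chaining concat_columns("first", "last", "full_name") with dedup("full_name") passes the design-time check only if the input schema contains both "first" and "last"; otherwise propagate raises a ValueError before run is ever called.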

Page 14: Sparkflows.io

Line of Products

Data Analytics (Analytics / Wrangling / Machine Learning)

Streaming Analytics Applications

Page 15: Sparkflows.io

THANK YOU

Page 16: Sparkflows.io

Building Big Data Analytics & Applications is very costly & time consuming

Customer 360

Fraud Detection

Operations Analytics

Cyber Security

IoT Analytics

Analytics Applications

Not enough users are able to extract great value from the Data Lake

Page 17: Sparkflows.io

Needs a lot of coding on Big Data

Data Analytics, Data Preparation & Blending

Machine Learning

Streaming Applications

Batch Applications

Visualizations