Sparkflows Use Cases

Use Cases to Build & Deploy in < 30 min: Self-Serve Big Data Analytics & Applications

Upload: jayant-shekhar

Post on 15-Apr-2017


Category: Engineering



TRANSCRIPT

Page 1: Sparkflows Use Cases

Use Cases to Build & Deploy in < 30 min

Self-Serve Big Data Analytics & Applications

Page 2: Sparkflows Use Cases

Agenda

Introduction

Sparkflows Solution

Use Cases

Page 3: Sparkflows Use Cases

Introduction

100+ Building Blocks: ETL, ML, OCR, NLP, connectors to various Sources/Sinks

Workflow Editor: Powerful Schema Inference, Schema Propagation, Interactive Execution

Visualization & Dashboards

Prebuilt Workflows

Page 4: Sparkflows Use Cases

Sparkflows Solution

Workflow Editor

Rich Visualizations & Dashboards

Hundreds of Pre-built Nodes

Batch & Streaming Engine

Interactive Execution

Easy Deployment & Configuration

Pre-built Workflows: Telco Churn Prediction, Housing Price Prediction, Bike Sharing Analysis, NY Taxi Data Analysis, MovieLens Recommendations

Page 5: Sparkflows Use Cases

Sparkflows Product Stack

Streaming Data: Kafka, Flume

Data Sources: HIVE/HBase, HDFS/S3, Solr, RDBMS

Apache Spark Cluster: Databricks, AWS, IBM Bluemix, On Prem, Azure

Data Sinks: HIVE/HBase, HDFS/S3, Solr, RDBMS

Visualizations/Dashboards

Page 6: Sparkflows Use Cases

Some of the Building Blocks / Nodes

Machine Learning: Classification, Regression, Clustering, Collaborative Filtering, Save/Load Model, Predict, Cross-Validator

NLP: NER, Sentiment

OCR: Tesseract

Visualization: Line Chart, Bar Chart, Pie Chart, Updating Dashboards

File Formats: CSV/TSV, Parquet, JSON, Avro, PDF, Images, Whole Files

Feature Generation: Tokenization, TF, IDF, OneHotEncoder, StringIndexer, Imputer, Scaler

Data Sources/Sinks: HDFS, S3, Kafka, Flume, Twitter, HBase, Solr, Elastic Search

ETL: Joins, Unions, Filter, SQL, Scala, Python, GeoIP, ConcatColumns, Column Filter, Dedup

Languages: SQL, Scala, Jython, Java

Page 7: Sparkflows Use Cases

Use Cases in < 30 minutes

Self-Serve Big Data Analytics: Do Big Data analytics by drag and drop with 100+ building blocks.

ETL Pipelines: Build ETL pipelines with ease, and incorporate SQL, Scala, and Jython in them.

NLP: Perform NLP on Big Data with OpenNLP and Stanford CoreNLP.

OCR: Perform OCR on millions of images with Tesseract.

Streaming Analytics: Read from Kafka, perform complex transforms, generate graphs, and write out to Solr, HBase, etc.

Page 8: Sparkflows Use Cases

Use Cases in < 30 minutes

Machine Learning: Perform Machine Learning on huge datasets with drag and drop.

Entity Resolution: Perform large-scale Entity Resolution on data from multiple channels.

Log Analytics: Build a Log Analytics platform with Kafka, Spark, Solr/Elastic Search, and Hue.

Format Conversion: Convert Big Data from one format to another.

Load data into Solr, ES, HBase: Easily load data into Solr, Elastic Search, HBase, etc.

Page 9: Sparkflows Use Cases

Use Cases in < 30 minutes

Custom Nodes: Create custom nodes and drop them into the Library/Workflow Editor.

Dashboards: Combine various outputs of workflows into a Dashboard.

Page 10: Sparkflows Use Cases

Self-Serve Data Analytics

Diagram labels: Read (CSV, AVRO, JSON, Parquet); Spark (Row Filter / Rename Col, JOIN, SQL / Scala / Jython, Random Forest); Save (Solr, HBase, Elastic Search, HIVE); Graph, Model, Dashboard.

Page 11: Sparkflows Use Cases

ETL – Build ETL pipelines with ease

Diagram labels: ReadCSV, ReadHIVE (CSV, HIVE); Spark (Filter, Filter, JOIN, SQL); LoadSolr, LoadES, LoadHBase, LoadHIVE (Solr, ES, HBase, HIVE).

Page 12: Sparkflows Use Cases

ETL – Connect various SQL for powerful pipelines

Diagram labels: ReadCSV, ReadHIVE (CSV, HIVE); Spark (SQL, SQL, SQL, SQL); LoadSolr, LoadES, LoadHBase, LoadHIVE (Solr, ES, HBase, HIVE).
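As an illustration of connecting several SQL steps into one pipeline, here is a minimal sketch in plain Python. It uses the standard-library sqlite3 module as a stand-in for Spark SQL and is not Sparkflows code. The orders table, its columns, and the view names large_orders and customer_totals are hypothetical.

```python
import sqlite3

# In-memory database standing in for the Spark SQL engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES
    (1, 'acme', 120.0), (2, 'acme', 30.0),
    (3, 'globex', 75.0), (4, 'globex', 5.0);

-- SQL step 1: keep only orders of at least 25.
CREATE VIEW large_orders AS
    SELECT * FROM orders WHERE amount >= 25;

-- SQL step 2: aggregate the output of step 1 per customer.
CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM large_orders GROUP BY customer;
""")

# The final step reads the result of the chained SQL steps.
totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
conn.close()
```

Each view consumes the output of the previous one, which mirrors how one SQL node feeds the next in the slide.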

Page 13: Sparkflows Use Cases

NLP – Perform distributed NLP on Big Data

Diagram labels: ReadPDF, ReadCSV (PDF, CSV); Spark (NLP, NLP, JOIN); LoadSolr, LoadES, LoadHBase, LoadHIVE (Solr, ES, HBase, HIVE).

Page 14: Sparkflows Use Cases

OCR – Perform distributed OCR on Big Data

Diagram labels: ReadPDF (PDF); Spark (OCR, plus extract images); LoadSolr, LoadES, LoadHBase, LoadHIVE (Solr, ES, HBase, HIVE).

Page 15: Sparkflows Use Cases

Streaming Analytics – With Kafka & Spark Streaming

Diagram labels: Kafka, ReadKafka; Spark (Transform: apply various transforms; Graph); LoadSolr, LoadES, LoadHBase, LoadHIVE (Solr, ES, HBase, HIVE).

Page 16: Sparkflows Use Cases

Machine Learning – With Spark ML

Diagram labels: HIVE; Spark (Transform: apply various transforms; Split; Logistic Regression; Score; Evaluate).

Page 17: Sparkflows Use Cases

Entity Resolution – Applying various distance algorithms & scoring

Diagram labels: DataSet 1, DataSet 2; Spark (Join & Transform, Dedup, Filter low scores); HIVE.
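The slide does not say which distance algorithms are applied. As a hedged illustration of the join, score, and filter-low-scores idea, the sketch below uses Python's standard-library difflib.SequenceMatcher as a stand-in similarity measure. The sample records, the similarity helper, and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a, b):
    """Return a 0..1 similarity score for two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two hypothetical datasets from different channels.
dataset_1 = [
    {"id": "a1", "name": "Jon Smith"},
    {"id": "a2", "name": "Maria Garcia"},
]
dataset_2 = [
    {"id": "b1", "name": "John Smith"},
    {"id": "b2", "name": "Peter Chang"},
]

THRESHOLD = 0.85  # assumed cut-off; pairs scoring below it are dropped

# Join: compare every record of dataset 1 with every record of dataset 2.
scored = [
    (left["id"], right["id"], similarity(left["name"], right["name"]))
    for left, right in product(dataset_1, dataset_2)
]

# Filter low scores: keep only likely matches.
matches = [pair for pair in scored if pair[2] >= THRESHOLD]
```

A full cross join grows quadratically with input size, so a large-scale version would normally group candidates by a blocking key before scoring.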

Page 18: Sparkflows Use Cases

Log Analytics

Diagram labels: Apache Logs, Kafka, ReadKafka; Spark (Parse Apache Logs, IP2Geo, Graph); Save (Solr, HBase, Elastic Search, HIVE); SQL; HUE.
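The "Parse Apache Logs" step can be illustrated with a minimal plain-Python sketch. It assumes the logs use the Apache Common Log Format; the regular expression, the parse_line helper, and the sample lines are illustrative and are not taken from Sparkflows.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample input, including one malformed line.
SAMPLE_LINES = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:55:40 -0700] "GET /missing HTTP/1.0" 404 -',
    'not a log line',
]

records = [r for r in map(parse_line, SAMPLE_LINES) if r is not None]
status_counts = Counter(r["status"] for r in records)
```

The parsed records are plain dicts, so a later step could enrich them (for example with a geo lookup on host) before saving.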

Page 19: Sparkflows Use Cases

Small Files Problem

Diagram labels: Read (CSV, HIVE); Spark (Coalesce); Save (CSV, HIVE).
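To illustrate the idea behind the Coalesce step, the sketch below merges many small CSV files into one larger file using only the Python standard library. It is a local stand-in for what a Spark coalesce does across partitions, not Sparkflows code. The coalesce_csv helper and the part-N.csv file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def coalesce_csv(input_dir, output_file):
    """Merge every *.csv in input_dir into output_file; return data rows written."""
    rows_written = 0
    header = None
    with open(output_file, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(Path(input_dir).glob("*.csv")):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"schema mismatch in {path.name}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demo: create five tiny files, then merge them into one.
with tempfile.TemporaryDirectory() as tmp:
    parts = Path(tmp) / "parts"
    parts.mkdir()
    for i in range(5):
        (parts / f"part-{i}.csv").write_text(f"id,value\n{i},{i * 10}\n")
    merged = Path(tmp) / "merged.csv"
    total_rows = coalesce_csv(parts, merged)
    merged_lines = merged.read_text().splitlines()
```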

Page 20: Sparkflows Use Cases

Format Conversion

Diagram labels: Read (CSV, AVRO, JSON, Parquet); Spark; Save (CSV, AVRO, JSON, Parquet).
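One of the conversions on this slide, CSV to JSON, can be sketched in plain Python with the standard library. It shows the read-then-save pattern on a small in-memory sample and is not Sparkflows code. The csv_to_json_lines helper and the sample data are assumptions, and AVRO and Parquet are left out because they need third-party libraries.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text with a header row into JSON Lines (one object per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(dict(row)) for row in reader)


# Hypothetical sample input.
sample_csv = "id,city\n1,Austin\n2,Boston\n"
json_lines = csv_to_json_lines(sample_csv)
```

All values stay strings here because CSV carries no type information; a real pipeline would apply a schema before writing.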

Page 21: Sparkflows Use Cases

Loading Data into Solr, Elastic Search, HBase, HIVE

Diagram labels: Read (CSV, AVRO, JSON, Parquet); Spark; Save (Solr, HBase, Elastic Search, HIVE).

Page 22: Sparkflows Use Cases

Custom Nodes – Create & Use Custom Nodes which add custom features

Diagram labels: DataSet 1, DataSet 2; Spark (Join & Transform, Custom Node, Custom Node); HIVE.

Page 23: Sparkflows Use Cases

Dashboards – Combine output of various Workflows/Nodes into a Dashboard

Page 24: Sparkflows Use Cases


THANK YOU