Summer Shorts: Big Data Integration

Big Data Integration. Marcelo Litovsky, National Solutions Architect – Information Builders


Post on 15-Apr-2017


TRANSCRIPT

Page 1: Summer Shorts: Big Data Integration

Big Data Integration

Marcelo Litovsky, National Solutions Architect – Information Builders

Page 2: Summer Shorts: Big Data Integration

Why are people buying Apache Hadoop?

Data Warehousing – Paradigm Shift from ETL to ELT

• Load, Transform, Syndicate – Use the power of Apache Hadoop to pre-process large amounts of data at a low cost, then transform it into what is needed on the warehouse
• Archive/Offload – Do not discard any data. Use Apache Hadoop to archive/offload useful data. Whether driven by government regulations or to add business value, the information is readily available in Apache Hadoop.

External Data Integration

• Load data from external sources (social media, machine data…)
• Conform datasets to enterprise standards
• Integrate the disparate data sources to extract value from the incoming data
• Relate streaming, unstructured, and social data with transactional and traditional operational data sources
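The shift from ETL to ELT on this slide (land the raw data first, then transform it where it is stored) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a small stand-in for the cluster. The table names (raw_events, events_by_user) and the sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical raw records from an external source (stand-in data).
RAW_EVENTS = [
    ("u1", "click", "3"),
    ("u1", "click", "2"),
    ("u2", "view", "7"),
]

def run_elt(rows):
    """Load raw rows untouched, then transform them inside the store (ELT)."""
    con = sqlite3.connect(":memory:")
    # Load: land the data as-is, with every field kept as text.
    con.execute("CREATE TABLE raw_events (user_id TEXT, action TEXT, amount TEXT)")
    con.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)
    # Transform: conform types and aggregate where the data already lives.
    con.execute(
        """
        CREATE TABLE events_by_user AS
        SELECT user_id, SUM(CAST(amount AS INTEGER)) AS total_amount
        FROM raw_events
        GROUP BY user_id
        """
    )
    result = con.execute(
        "SELECT user_id, total_amount FROM events_by_user ORDER BY user_id"
    ).fetchall()
    con.close()
    return result

summary = run_elt(RAW_EVENTS)
```

In an ETL flow the type conversion and aggregation would happen before the load; here the raw table is kept, so the original records remain available for other uses.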

Page 3: Summer Shorts: Big Data Integration

The Evolution of Integration

• Hand-Coded Integration
• ETL
• Messaging Bus
• ESB
• EAI
• Apache Hadoop-Based Integration

Page 4: Summer Shorts: Big Data Integration

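Traditional in Transition to Modern

[Diagram: technologies arranged from Traditional to Modern, with axis labels "Fewer use cases" and "More use cases"]

Modern

• Apache Hadoop
• IoT
• Streaming
• Virtual DW
• Data Lake

Traditional

• OLTP
• OLAP
• Data warehouses
• Data marts
• Point-to-point integration
• EII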

Page 5: Summer Shorts: Big Data Integration

We Have Some Pretty Simple Problems…

According to a May 2015 Gartner survey…

• 26% are deploying Apache Hadoop, 11% plan to within 12 months, and 7% within 24 months
• 49% cite trying to find value as their biggest problem
• 57% cite the Apache Hadoop skills gap as their biggest problem

To summarize…

• Companies are investing in Apache Hadoop, but are not sure why
• Companies are investing in Apache Hadoop, but don’t know how to use it

Page 6: Summer Shorts: Big Data Integration

Information Builders Big Data Architecture

Use Case for Apache Hadoop

• Sqoop, Flume…
• Avro, JSON
• Traditional applications and data stores
• iWay Big Data Integrator: Simplified, modern, native Apache Hadoop integration
• Big Data Apache Hadoop: Any distribution, any data
• BI & Analytics: WebFOCUS BI and analytics platform
• Self-service for Everyone: WebFOCUS access, ETL, metadata

• Data Ingestion – Enterprise Data Hub
• ETL / ELT
• Predictive Analytics – RStat
• Business Intelligence – WebFOCUS
• Low-cost storage of large data volumes

Page 7: Summer Shorts: Big Data Integration

iWay Big Data Integrator

100% Run “in” Apache Hadoop architecture

• Simplified interface
• Native Apache Hadoop script generation
• Process mgmt. & governance

• Simplified, easy-to-use interface to integrate in Apache Hadoop
• Marshals Apache Hadoop resources and standards
• Takes advantage of performance and resource negotiation
• Includes sophisticated process management and governance

• Sqoop, Flume…
• Avro, JSON…
• Traditional applications and data stores
• iWay Big Data Integrator: Simplified, modern, native Apache Hadoop integration
• Big Data Apache Hadoop: Any distribution, any data

Page 8: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Key Features

Eclipse-based User Friendly Interface

Data ingestion using abstraction above Sqoop®, Flume®, Spark®, and proprietary streaming channel content

Transformation & Mapping

Publish to non-Apache Hadoop data sources

Auto-generated scripts/jobs based on configuration

Page 9: Summer Shorts: Big Data Integration

iWay Big Data Integrator


Notable Features in 2016

Data Governance

• Data Profiling, Data Preparation, Master Data Management
• Analyze patterns, data types, sparsity, cardinality of Apache Hadoop datasets
• Generation of data cleansing rules based on pattern analysis
• Auto generation of remediation tickets for non-cleansable records
• Ability to transpose (wide to deep, deep to wide) data in parallel
• Missing value imputation, data scaling, data categorization
• Streaming and in-process predictive model scoring (PMML and native code)
• “Natively” Match and Merge
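The profiling metrics named on this slide (patterns, data types, sparsity, cardinality) can be made concrete with a small single-machine sketch. This is only an illustration of what the terms mean, not how Big Data Integrator computes them on a cluster. The function names and the sample column are hypothetical.

```python
from collections import Counter

def value_pattern(value):
    """Map digits to 9 and letters to A so that similar values share a pattern."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in str(value)
    )

def profile_column(values):
    """Return basic profile metrics for one column of values."""
    total = len(values)
    present = [v for v in values if v is not None and v != ""]
    return {
        # Sparsity: share of missing (None or empty) entries.
        "sparsity": (total - len(present)) / total if total else 0.0,
        # Cardinality: number of distinct non-missing values.
        "cardinality": len(set(present)),
        # Observed Python type names with counts, as a rough type check.
        "types": dict(Counter(type(v).__name__ for v in present)),
        # Pattern frequencies, a possible starting point for cleansing rules.
        "patterns": dict(Counter(value_pattern(v) for v in present)),
    }

# Hypothetical sample column.
zip_codes = ["10001", "10001", None, "94105", ""]
profile = profile_column(zip_codes)
```

A value whose pattern is rare compared with the dominant one (for example "9999A" in a column of "99999") is a natural candidate for a cleansing rule or a remediation ticket.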

Page 10: Summer Shorts: Big Data Integration

iWay Big Data Integrator


Notable Features in 2016

Data Lineage

• Full capture of data lineage for BDI ingestion, transform, data prep, cleansing
• Integration with Cloudera Navigator, to give holistic data lineage view for non-BDI sources
• User interface to interactively display information
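Lineage capture across ingestion, cleansing, and transformation can be pictured as recording, for each step, which datasets it read and which one it wrote. The sketch below is a minimal illustration of that idea. The LineageGraph class, its methods, and the dataset names are all hypothetical and are not the product's or Cloudera Navigator's API.

```python
from dataclasses import dataclass, field

@dataclass
class LineageGraph:
    """Records which datasets each processing step read and wrote."""
    edges: list = field(default_factory=list)  # (source, step, target) triples

    def record(self, step, sources, target):
        """Register one step that derived `target` from `sources`."""
        for source in sources:
            self.edges.append((source, step, target))

    def upstream(self, dataset):
        """Return every dataset that `dataset` was directly or indirectly derived from."""
        found, stack = set(), [dataset]
        while stack:
            current = stack.pop()
            for source, _step, target in self.edges:
                if target == current and source not in found:
                    found.add(source)
                    stack.append(source)
        return found

lineage = LineageGraph()
lineage.record("ingest", ["crm.customers"], "raw.customers")
lineage.record("cleanse", ["raw.customers"], "clean.customers")
lineage.record("transform", ["clean.customers", "raw.orders"], "mart.customer_orders")
origins = lineage.upstream("mart.customer_orders")
```

Walking the graph backwards from a published table answers the usual lineage question: which sources would be affected if this table turned out to be wrong.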

Page 11: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Data Ingestion

Graphical Sqoop and Flume configuration

Sqoop

• Replace
• Change Data Capture
• Native “Roll your own”

Flume

• Flume editor with validation
• Graphical wizard in the works
• Templates

Proprietary “channel” ingestion – iWay Service Manager

• Legacy formats (Streaming channel, MUMPS, etc.)

Structured data standardized on Avro format

Late-binding data “wrangler” for unstructured content
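To give a sense of what a graphical Sqoop configuration hides, the sketch below builds the argument list for a `sqoop import` run in two modes: a full replace and an incremental pull. The transcript does not show which options Big Data Integrator actually generates, so this is an assumption-laden illustration. The flags are standard Sqoop 1 import options; the function name, JDBC URL, table, and paths are hypothetical, and Sqoop's incremental append mode is used only as a simple stand-in for change data capture.

```python
def sqoop_import_args(jdbc_url, table, target_dir, mode="replace",
                      check_column=None, last_value=None):
    """Build an argument list for `sqoop import` (nothing is executed here)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--as-avrodatafile",  # land structured data as Avro files
    ]
    if mode == "replace":
        # Full reload: clear the target directory before importing.
        args.append("--delete-target-dir")
    elif mode == "incremental":
        # Pull only rows whose check column exceeds the last imported value.
        if check_column is None or last_value is None:
            raise ValueError("incremental mode needs check_column and last_value")
        args += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", str(last_value),
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return args

# Hypothetical connection details.
replace_cmd = sqoop_import_args(
    "jdbc:mysql://db.example.com/sales", "orders", "/data/raw/orders"
)
incremental_cmd = sqoop_import_args(
    "jdbc:mysql://db.example.com/sales", "orders", "/data/raw/orders",
    mode="incremental", check_column="order_id", last_value=1000,
)
```

The resulting list could be passed to `subprocess.run` on a node where Sqoop is installed; building it separately keeps the configuration easy to inspect before anything runs.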

Page 12: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Data Ingestion

Graphical Sqoop and Flume configuration

Page 13: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Transformation

Drag and drop data transformation designer

• Join (inner, left, right, full outer)
• Group by
• Aggregate functions as defined by cluster

Any data on cluster can be transformed, provided it is described in Hive metastore

Logic preview

Transformations performed 100% in Apache Hadoop

Kerberos compliant

Page 14: Summer Shorts: Big Data Integration


iWay Big Data Integrator

Mapping

• Relational targets on remote RDBMS
• XML definitions
• Custom-defined on design canvas

Publish

• Publish to any JDBC-compliant MPP or RDBMS
• Staging table or direct-to-target load
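The difference between a staging-table load and a direct-to-target load can be shown with a short sketch. It uses Python's built-in sqlite3 module in place of a remote JDBC target, and the table names (target, target_staging) and the publish function are hypothetical. The presentation does not describe how the product implements either option.

```python
import sqlite3

def publish(con, rows, use_staging=True):
    """Publish rows to the `target` table, optionally via a staging table."""
    con.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, name TEXT)")
    if not use_staging:
        # Direct-to-target: rows are written straight into the target table.
        with con:
            con.executemany("INSERT OR REPLACE INTO target VALUES (?, ?)", rows)
        return
    # Staging: load everything into a side table first...
    con.execute("DROP TABLE IF EXISTS target_staging")
    con.execute("CREATE TABLE target_staging (id INTEGER, name TEXT)")
    with con:
        con.executemany("INSERT INTO target_staging VALUES (?, ?)", rows)
    # ...then move it into the target in a single transaction.
    with con:
        con.execute("INSERT OR REPLACE INTO target SELECT id, name FROM target_staging")
        con.execute("DROP TABLE target_staging")

con = sqlite3.connect(":memory:")
publish(con, [(1, "alpha"), (2, "beta")], use_staging=True)
publish(con, [(2, "gamma")], use_staging=False)
published = con.execute("SELECT id, name FROM target ORDER BY id").fetchall()
```

Staging costs an extra copy but keeps a failed or partial load away from the target table; a direct load is simpler and faster when that protection is not needed.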

Page 15: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Transformation

Drag and drop data transformation designer

Page 16: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Transformation – underlying script

Underlying script generation view
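The idea of generating a script from a visual configuration can be sketched as a function that renders a join and group-by specification into SQL text. The generated scripts themselves are not shown in this transcript, so the specification format, the render_query function, and the table and column names below are all hypothetical. The sketch also skips validation and identifier quoting that a real generator would need.

```python
def render_query(spec):
    """Render a join / group-by specification as a SQL string."""
    select_cols = list(spec["group_by"]) + [
        f"{func}({col}) AS {alias}" for func, col, alias in spec["aggregates"]
    ]
    lines = [
        "SELECT " + ", ".join(select_cols),
        f"FROM {spec['left']}",
        f"{spec['join_type'].upper()} JOIN {spec['right']} ON {spec['on']}",
        "GROUP BY " + ", ".join(spec["group_by"]),
    ]
    return "\n".join(lines)

# Hypothetical configuration, as a designer canvas might capture it.
spec = {
    "left": "customers c",
    "right": "orders o",
    "join_type": "left",
    "on": "c.id = o.customer_id",
    "group_by": ["c.region"],
    "aggregates": [("SUM", "o.amount", "total_amount")],
}
script = render_query(spec)
```

Keeping the specification as data and the rendering as a separate step is what makes a "logic preview" possible: the same configuration can be displayed, checked, or regenerated without hand-editing the script.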

Page 17: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Job Execution

Multiple job executions in a defined order
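Running several jobs in a defined order usually also means deciding what happens when one of them fails. The sketch below runs jobs in list order and stops at the first failure, on the assumption that later jobs depend on earlier ones. The run_jobs function and the job names are hypothetical and are not taken from the product.

```python
def run_jobs(jobs):
    """Run (name, callable) pairs in list order; stop at the first failure."""
    completed = []
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            # Later jobs are assumed to depend on earlier ones, so stop here.
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None

log = []

def failing_publish():
    """Stand-in for a job that cannot reach its target."""
    raise RuntimeError("target unavailable")

pipeline = [
    ("ingest", lambda: log.append("ingest")),
    ("transform", lambda: log.append("transform")),
    ("publish", failing_publish),
    ("notify", lambda: log.append("notify")),
]
completed, error = run_jobs(pipeline)
```

Here the ingest and transform jobs complete, the publish job fails, and the notify job is never started.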

Page 18: Summer Shorts: Big Data Integration

Real-World Strategies for Deploying Big Data

Data Quality and MDM – iWay Big Data Integrator

Edge Node Deployment of DQ Services

Page 19: Summer Shorts: Big Data Integration

Real-World Strategies for Deploying Big Data

Data Quality and MDM – iWay Big Data Integrator

Native Spark Interface to DQ

Page 20: Summer Shorts: Big Data Integration

Real-World Strategies for Deploying Big Data

Spark Integration – iWay Big Data Integrator

Full Integration of Apache® Spark Stack

• Spark Streaming
• Spark SQL
• SparkR
• MLlib

Fully Automated project setup, dependency management, Scala version detection

Code, build, test, deploy – all from within Big Data Integrator

Page 21: Summer Shorts: Big Data Integration

Real-World Strategies for Deploying Big Data

Spark Integration – iWay Big Data Integrator

Predictive Model Development and Deployment

Page 22: Summer Shorts: Big Data Integration

iWay Big Data Integrator

Cloudera Certified

• Easy-to-use interface for deploying and integrating data on Apache Hadoop distributions of all flavors, ensuring portability.

• Ingests, transforms, and cleanses traditional RDBMS, mobile, social media, sensor, and other data in batch or streams, using native Apache Hadoop facilities.

• 100% YARN compliant, taking advantage of native Apache Hadoop performance and resource negotiation.

• Simplifies the use of Apache Hadoop ecosystem technologies such as: MapReduce, Sqoop, Flume, Hive®, and Spark®.

iWay Big Data Integrator is CLOUDERA CERTIFIED!!