How to Gain Maximum Business Value Out of Your (BIG) Data Ed Wrazen, VP Product Management: Big Data

Upload: bigdataexpo

Post on 14-Apr-2017


TRANSCRIPT

How to Gain Maximum Business Value Out of Your (BIG) Data

Ed Wrazen, VP Product Management: Big Data

TRILLIUM SOFTWARE, A Harte Hanks Company 2

Agenda

Common Data Challenges

Traditional Data Integration

Data Preparation & Data Quality

Trillium Refine

Benefits to your Business

Common Data Challenge #1: How do you combine your complex universe of data?

Access and join data from any source (JSON, HDFS, S3, etc.) at scale without moving the data

The Classic Problem

ANALYST: "I need this in 2 weeks."

Sources: Salesforce.com (Cloud), Omniture (JSON), ServiceNow (RDBMS)

IT PROFESSIONAL: Code, Code, Code

Operational Data Store

Current ETL Approach: very costly and slow

Step 1: Data Discovery ('N' days). Discovery process

Step 2: Data Definition (2 days). Fill out data request & discuss with Engineers

Step 3: Extract Queue (3-5 days). Data staging & Data modeling

Step 4: Quality Engineering (2 days). Validate data

Step 5: Iteration (repeat 1-4). Make changes & fix errors

Step 6: End User Access (in Weeks). End user access to information

Raw Data / ETL & Data Engineering

Analyst / IT Work Queue

Massive cost $$$ per usable byte

Common Data Challenge #2: How accurate and complete is your data?

How accurate are these numbers?

How current is this data? When was it last updated?

How valid are these data sources?

Do these data points link to the correct customers?

Common Data Challenge #2: How do you cleanse and enrich customer data?

Integrating data from multiple data sources presents differences in completeness, consistency and quality

The Challenges Facing A Business Analyst

Many more data sources are now part of business operations

These new data sources are often unstructured, causing more delay in ETL

Real-time data sources are coming online: IoT data, sensor data, social media data etc.

Time to insight now needs to be measured in minutes, not weeks

Many more analysts across the organization with BI tools need ETL services

Data access policies, security and governance add barriers to self-service integration platforms

What if you could…

Access, combine, and standardize all enterprise data with a single web interface

Compare unique disparate data sets for critical new insight, operational efficiencies, and compliance

Enable IT to give the line of business access to the data they need to make better business decisions

Increase the ROI of your analytics and Big Data investments in 30 days or less

Utilize your existing infrastructure and Hadoop environment for native processing

The Six Degrees of Data Integration

To truly be considered a self-service data integration platform, the solution must provide ALL of the following elements:

ACQUIRE: Source data from first or third-party data sources

DISCOVER: Search all data sources to select which are required for the desired analysis

CLEANSE OR ENRICH: Remove delimiters, spurious fields, unwanted values etc. and/or add values to enrich analysis

NORMALIZE: Combine or modify two or more data sources based on objects of interest

TRANSFORM: Provide structure to the data so it can be combined with other data sources or viewed individually

FORMAT: Format the resulting transform to be viewed by a visualization tool
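The six elements above can be read as stages of a small pipeline. The following is a minimal Python sketch of that idea, not how Trillium Refine is implemented: the source names, the sample records, and every function name are hypothetical and exist only to illustrate the flow from ACQUIRE to FORMAT.

```python
import json

# Hypothetical in-memory "sources"; in practice these would be files, APIs, or tables.
SOURCES = {
    "web_orders": '[{"customer": " alice ", "amount": "10.50"}, {"customer": "BOB", "amount": "n/a"}]',
    "store_orders": '[{"customer": "Alice", "amount": "4.50"}]',
    "hr_records": '[{"employee": "Carol"}]',
}

def acquire(sources):
    """ACQUIRE: load raw records from every available source."""
    return {name: json.loads(raw) for name, raw in sources.items()}

def discover(datasets, required_field):
    """DISCOVER: keep only the sources that carry the field the analysis needs."""
    return {name: rows for name, rows in datasets.items()
            if rows and all(required_field in row for row in rows)}

def cleanse(rows):
    """CLEANSE OR ENRICH: drop rows with unwanted values, trim stray whitespace."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unwanted value such as "n/a"
        cleaned.append({"customer": row["customer"].strip(), "amount": amount})
    return cleaned

def normalize(rows):
    """NORMALIZE: bring the object of interest (the customer) to one spelling."""
    return [{**row, "customer": row["customer"].title()} for row in rows]

def transform(datasets):
    """TRANSFORM: give the combined data a structure (total amount per customer)."""
    totals = {}
    for rows in datasets.values():
        for row in rows:
            totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

def format_output(totals):
    """FORMAT: emit CSV text that a visualization tool could read."""
    lines = ["customer,total"]
    lines += [f"{name},{total:.2f}" for name, total in sorted(totals.items())]
    return "\n".join(lines)

def run_pipeline(sources):
    selected = discover(acquire(sources), required_field="amount")
    prepared = {name: normalize(cleanse(rows)) for name, rows in selected.items()}
    return format_output(transform(prepared))

print(run_pipeline(SOURCES))
```

Keeping each stage as its own function means a stage can be swapped (for example, a different cleansing rule) without touching the others.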


Introducing Trillium Refine™

Universal Connectivity: Native connectivity to any data source or type, including JSON, XML, Avro, Parquet, Web & App Logs, Emails, etc.

Innovative Recommendation Engine: Thoughtful data preparation automation. Recommend JOIN Keys, Identify Context of Data Entity/Data Types

Elegant User Experience: Intuitive learning and high rate of adaptability. True Data Exploration and Data Preparation for the Business

Hadoop Eco-System Extensions: Versatile integrations. Hive, Sqoop, File Browser, and many more…

Industry-leading Data Quality: Innovative data quality technology and content. Cleanse, match, and enrich global data

The first integrated data preparation and data quality solution with worldwide support in the market today

*Powered with UNIFi Software

How we automate data refinement at scale

STEP 1: Consolidate & Flag DQ Issues

STEP 2: Cleanse & Identify Duplicates

STEP 3: Enrichment & Re-validate DQ Issues

STEP 4: Operationalize SCV + Enrichment

STEP 5: Analyze & Report
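As an illustration of what "flag DQ issues" in Step 1 could mean in practice, here is a minimal Python sketch that checks a consolidated customer record for missing fields and a malformed postal code. The field names, the `flag_dq_issues` helper, the sample record, and the simplified UK postcode pattern are all assumptions made for this example; they are not taken from Trillium Refine.

```python
import re

# Assumed field set for a consolidated customer record.
REQUIRED_FIELDS = ("name", "address", "city", "postal_code", "phone", "email")

# Simplified UK postcode shape (an assumption, not the full official rule):
# outward code, optional space, inward code.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

def flag_dq_issues(record):
    """Return a list of data quality flags for one record."""
    # Completeness: every required field must be present and non-empty.
    issues = [f"missing:{field}" for field in REQUIRED_FIELDS if not record.get(field)]
    # Validity: the postal code, if present, must match the expected shape.
    postcode = record.get("postal_code")
    if postcode and not UK_POSTCODE.match(postcode.upper()):
        issues.append("invalid:postal_code")
    return issues

# Hypothetical record with no city or phone; the email is a placeholder.
warranty = {
    "name": "Bob Smith DR",
    "address": "3 Davy Dr",
    "postal_code": "S66 7EN",
    "email": "bob@example.com",
}
print(flag_dq_issues(warranty))  # ['missing:city', 'missing:phone']
```

Flags like these can be stored alongside the record so that later steps (cleansing, enrichment) can be re-validated against the same checks.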


Trillium Refine™ Architecture Overview

[Architecture diagram, layers from top to bottom]

Browser, Visualization, Predictive

Trillium Refine™: Execution Service, Data Discovery, Meta-store, Other services

Hadoop: Sqoop, Hive, MapReduce, HDFS

Databases, Cloud, File-systems


Trillium Refine: Enables self-service access

Step 1: Discovery (1/4 Day). Browser based Discovery process

Step 2: Refinement (1/4 Day). Analyst Friendly Modeling

Step 3: Insight / Consumption. Analysis

Raw Data / Modeling / Analysis

Real cost and time savings

Trillium Refine - Cleansing and Matching

E-commerce
Name: Dr B. Smith
Address: 3 Davy Dryve
City: MALtby
Postal Code: S66 7EN
Phone: 1189407600
Email: [email protected]

Warranty
Name: Bob Smith DR
Address: 3 Davy Dr
City: (blank)
Postal Code: S66 7EN
Phone: (blank)
Email: [email protected]

Product
Name: Mr Robert Smith
Address: 3 Davey Drive
City: Rotherham
Postal Code: S667EN
Phone: 01189 407 600
Email: (blank)

Customer Service
Name: DR ROB SMITH
Address: 3 DAVY DRIVE
City: (blank)
Postal Code: S66 7EN
Phone: 01189407600
Email: [email protected]

Web
Name: Dr Bob Smith
Address: (blank)
City: (blank)
Postal Code: (blank)
Phone: (blank)
Email: [email protected]

Customer Info, Transactional Data, Product Data, Warranty Data, Customer Service Data, Web Data

Campaigns, Customer Experience, Reporting, Analytics, Channel Strategies

Single Customer View
Name: Dr Robert Smith
Address: 3 Davy Drive
City: Rotherham
Postal Code: S66 7EN
Phone: +44(0)1189 407 600
Email: [email protected]
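To make the matching idea on this slide concrete, the following Python sketch standardizes four of the records and groups them under one match key. It is an illustration under stated assumptions, not the Trillium Refine matching algorithm: the match key (postcode, house number, surname), the title list, and all function names are hypothetical. The email field is left out because it is obscured in this transcript, and the Web record is left out because it has no address to key on.

```python
import re
from collections import defaultdict

# Records transcribed from the slide (email omitted; it is obscured in the transcript).
RECORDS = [
    {"source": "E-commerce", "name": "Dr B. Smith", "address": "3 Davy Dryve",
     "city": "MALtby", "postal_code": "S66 7EN", "phone": "1189407600"},
    {"source": "Warranty", "name": "Bob Smith DR", "address": "3 Davy Dr",
     "city": "", "postal_code": "S66 7EN", "phone": ""},
    {"source": "Product", "name": "Mr Robert Smith", "address": "3 Davey Drive",
     "city": "Rotherham", "postal_code": "S667EN", "phone": "01189 407 600"},
    {"source": "Customer Service", "name": "DR ROB SMITH", "address": "3 DAVY DRIVE",
     "city": "", "postal_code": "S66 7EN", "phone": "01189407600"},
]

TITLES = {"dr", "mr", "mrs", "ms"}  # assumed, deliberately short list

def surname(name):
    """Last name token that is not a title, lower-cased."""
    tokens = [token.strip(".").lower() for token in name.split()]
    return [token for token in tokens if token not in TITLES][-1]

def match_key(record):
    """Hypothetical match key: postcode without spaces, house number, surname."""
    postcode = re.sub(r"\s+", "", record["postal_code"]).upper()
    house_number = re.match(r"\d+", record["address"]).group()
    return (postcode, house_number, surname(record["name"]))

def normalize_phone(phone):
    """Digits only, with a leading 0 restored when it was dropped."""
    digits = re.sub(r"\D", "", phone)
    return digits if not digits or digits.startswith("0") else "0" + digits

def build_single_views(records):
    """Group records by match key and merge each group into one view."""
    clusters = defaultdict(list)
    for record in records:
        clusters[match_key(record)].append(record)
    views = []
    for key, members in clusters.items():
        phones = {normalize_phone(m["phone"]) for m in members} - {""}
        views.append({
            "postal_code": key[0][:-3] + " " + key[0][-3:],
            "phones": sorted(phones),
            "sources": [m["source"] for m in members],
        })
    return views

for view in build_single_views(RECORDS):
    print(view)
```

Despite the spelling differences in name and street, all four records share one key, so they collapse into a single view with one standardized postal code and one phone number. Choosing the surviving name, street, and city (for example "Rotherham" over "MALtby") needs reference data and survivorship rules that this sketch does not attempt.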


Benefits to Your Business

ACCURATE, TIMELY ANALYTICS

Analyst control of the entire data set used for analysis.

Accelerate speed of delivery of insights to visualization tools like Tableau

TARGETED MARKETING & REVENUE GROWTH

Gain the most accurate, in-depth view of your customers to prevent churn

Monitor and respond to customer activity in real time to adjust pricing and increase consumer spend

ADHERENCE TO COMPLIANCE INITIATIVES

Transparency and comprehensive coverage ensure confidence in regulatory reporting

Identify and manage risk more quickly and completely

OPERATIONAL EFFICIENCY & COST REDUCTION

Eliminate time spent on manual data preparation

Ensure accuracy of global operations and supply chain


Contact Information

email: [email protected], tel: +44 118 940 7634, web: www.trilliumsoftware.com

email: [email protected], tel: +31 (0)297 254 390, web: www.intodq.com

Thank You!