satyam open analytics nyc

1 BIG DATA ANALYTICS & PITFALLS TO AVOID © Dr. Satyam Priyadarshy BIG DATA ANALYTICS & PITFALLS TO AVOID Dr. Satyam Priyadarshy June 17, 2013 – New York City

Upload: open-analytics

Post on 12-May-2015

TRANSCRIPT

Page 1: Satyam open analytics nyc

1 BIG DATA ANALYTICS & PITFALLS TO AVOID © Dr. Satyam Priyadarshy

BIG DATA ANALYTICS & PITFALLS TO AVOID

Dr. Satyam Priyadarshy

June 17, 2013 – New York City

Page 2: Satyam open analytics nyc


Agenda

The Big Data buzzword is creating a lot of confusion for companies. One needs to understand Big Data within one's own context, along with the 7 V’s of Big Data and the KARMA score, to avoid some of the serious pitfalls in leveraging Big Data. A case study will be presented on how to drive value out of Big Data in a meaningful manner.

Page 3: Satyam open analytics nyc


BIG DATA Buzz - Should Business Care?

The future of Big Data is bright. Organizations that can effectively leverage Big Data without sinking into the Big Data Hole will realize additional business value, a loyal customer base and increased profits.

2.5 exabytes of new data generated per day

What do we know?

A top business priority

Big opportunities available

Everyone is talking about it

But...

Emerging technology helps

Adds value definitely

Definition and leverage are not clear

Big challenges for companies

The path to execute is less understood

Realization is complex but getting easier

Expertise is in demand but supply is short

Page 4: Satyam open analytics nyc


BIG DATA - the 7 V’s that describe it

VELOCITY: Moving away from batch processing to real-time addition of massive data for near real-time analysis

VARIETY: Structured and unstructured data, e.g. POS data, sensor data, transaction data, call center data, supply chain data, new media data, etc.

VERACITY: Reliability and predictability of ‘not so’ precise data types, e.g. sentiment data, weather data and its impact on business.

VOLUME: The ever-growing data, from terabytes to petabytes to zettabytes

The definition of Big Data is evolving. The origin of the term dates back to 1990. Typically 4 V’s have defined Big Data, but I strongly recommend the 7 V’s to describe it.

(Source: chiefknowledgeguru.com)

80% of data generated is unstructured

VALUE: Unless value is realized, Big Data is just a Big Hole

VIRTUAL: Data resides in virtual environments, e.g. POS, private and public clouds, geo-located, inside and outside firewalls

VARIATION: No single configuration of the other 6 V’s fits everyone. There is variation for each business.

Page 5: Satyam open analytics nyc


KARMA matters

Knowledge
• Business, Technology, People Strategy
• Big Data Sources, Lifecycle
• Re-invest based on actions

Action
• Scalable Architecture, Infrastructure, Tools & Technology, Resources
• Mining the Big Data with a targeted and open mind to find Gold and other items

Recognition
• Revenue by selling new insights
• Increase Profit Margins
• Add new features to products & services

Market
• Grow Share
• Customer Centricity

Advance
• Innovate with the help of Big Analytics
• Gather even more Big Data and keep going through this cycle

Page 6: Satyam open analytics nyc


The KARMA SCORE is calculated using the maturity level of these capabilities

Big Math and Big Analytics
• Parallel Processing, API, Query, Reporting
• Data Mining, Analytics, Pattern, Statistics
• Machine Learning, Inference predictions
• Tools, Technologies, Human Resources

Big Value, Big Actions
• Service to support business – Data, Information, Knowledge, Process
• Presentation – Visualization, Mobility, Collaboration, Exploration
• Actions – Improve Product/Services, Grow Revenue/Profits, Agility

Big Data
• Collection of Raw Data, Structured & Unstructured, Discovery, Staging
• Extract, Load, Transform
• Data Connectors, Access, Use, Move
• Data Storage: Hadoop, NoSQL, Key-value, MPP, In-memory, blobs, etc.

Data Governance and Management
• Policy, Privacy, Security, Metadata, Risk, Total cost of ownership, Access control
• Data Lifecycle, Data Assets, SLA, ROI, ROA, Data Quality
• Physical Store, Virtual Storage, Encryption, Masking, Archive, Disaster Recovery
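
The slide says the KARMA score is derived from the maturity level of these capabilities but does not give the formula. The following is only an illustrative sketch under stated assumptions: each of the four capability layers named on the slide is rated on a hypothetical 1 to 5 maturity scale, and the score is the unweighted average rescaled to 0 to 100. The function name `karma_score`, the scale, and the equal weighting are all assumptions, not the author's method.

```python
from statistics import mean

# Capability layers named on the slide. The 1-5 maturity scale and the
# unweighted average below are assumptions, not the author's formula.
LAYERS = (
    "Data Governance and Management",
    "Big Data",
    "Big Math and Big Analytics",
    "Big Value, Big Actions",
)


def karma_score(maturity: dict, scale_max: int = 5) -> float:
    """Return a hypothetical 0-100 score from per-layer maturity levels."""
    missing = [layer for layer in LAYERS if layer not in maturity]
    if missing:
        raise ValueError(f"missing maturity levels for: {missing}")
    for layer in LAYERS:
        level = maturity[layer]
        if not 1 <= level <= scale_max:
            raise ValueError(f"{layer}: level {level} outside 1..{scale_max}")
    # Average the four layers, then rescale so scale_max maps to 100.
    return 100 * mean(maturity[layer] for layer in LAYERS) / scale_max


# Made-up self-assessment for illustration only.
example = {
    "Data Governance and Management": 2,
    "Big Data": 3,
    "Big Math and Big Analytics": 1,
    "Big Value, Big Actions": 2,
}
print(karma_score(example))
```

A real assessment would likely weight the layers differently per business, in line with the VARIATION point made earlier in the deck.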

Page 7: Satyam open analytics nyc


Whatever your KARMA score is, one can leverage Big Data eventually

The great enabler is the OPEN SOURCE revolution of the last decade or so.

Page 8: Satyam open analytics nyc


In a Zoo vs. In an Open Environment

OPEN SOURCE Creates a HAPPY, FLOURISHING Environment

Page 9: Satyam open analytics nyc


Open Source – Key Characteristics

FREE (*)

NOT CAGED, NOT BLACK BOX

MODIFICATIONS ALLOWED

MODIFIED VERSIONS REDISTRIBUTABLE

LIVES IN HARMONY WITH OTHERS

Page 10: Satyam open analytics nyc


Open Source – BIG DATA PLAYERS

THESE TOOLS ENABLE YOU TO DIG THE GOLD IN BIG DATA (This is not a comprehensive list of tools/technologies)

Page 11: Satyam open analytics nyc


ACTION for finding the GOLD

PROBLEM SOLVING

OPERATIONAL

STRATEGIC – FUTURISTIC

Basic Analytics
Advanced Analytics
Holistic Analytics

GO FOR THE GOLD

ADDRESSES
Current Concerns
Reduce Costs
Eliminate Issues

ADDRESSES GROWTH
Customer Centric
Easily Incorporate New Data
Innovation Related
Emerging Trends Adoption

BIG DATA, BIG MATH, BIG ANALYTICS

Descriptive Statistics

Inferential Statistics

Page 12: Satyam open analytics nyc


THAT’S A GOLD MINE

Page 13: Satyam open analytics nyc


WHAT’S IN A GOLD MINE?

Gold Suite: Gold, Arsenic, Mercury, Tungsten, Silver

BASE Suite: Copper, Lead, Zinc, Bismuth, Cadmium, Molybdenum, Silver

Iron-Manganese Suite: Iron, Manganese, Cobalt, Nickel, Yttrium

TO GET GOLD ONE HAS TO DIG DEEPER

IF YOU FOUND SILVER WHILE DIGGING FOR GOLD, WHAT WOULD YOU DO?

Page 14: Satyam open analytics nyc


CASE STUDY – DDoS Attack

PROBLEM

BIG ANALYTICS

THE GOLD

KNOWLEDGE

ACTIONS

RECOGNITION

Source of attacks identified
• After integrating
• Distributed targets
• Multiple attack types

• Slow performance over binary data sets

• A step closer to solution, but requires more work to get it near real-time for actionable insights.

• Feedback loop to known datasets to enhance the predictability and performance

45 days later: It’s Science, not BI

DNS servers are persistently attacked to create DDoS attacks. Can we predict?

CHALLENGES:
• 7+ TB / Day
• Varied formats based on request and type of attacks

Hadoop based data storage

APPROACH
• Hive / MapR queries and R for statistical analysis
• Interconnection of data with known “data” sources for identification
• Tableau and (open source D3.js and Ploticus) for visualization
• Iteratively optimized queries for speed
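
The slide does not say which statistics were used to surface outlier events. As a minimal sketch of one common option, the code below flags outliers with a median/MAD based modified z-score, applied to per-zone counts of distinct source addresses. The zone names, the counts, the 3.5 threshold, and the function name `robust_outliers` are all hypothetical, and Python is used here for illustration although the case study used R.

```python
from statistics import median


def robust_outliers(counts: dict, threshold: float = 3.5) -> list:
    """Return keys whose modified z-score (median/MAD based) exceeds threshold."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be flagged with this method
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return sorted(
        zone for zone, v in counts.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Hypothetical data: distinct source addresses seen per DNS zone in one day.
sources_per_zone = {
    "a.example": 12, "b.example": 15, "c.example": 11,
    "d.example": 14, "e.example": 13, "f.example": 640,
}
print(robust_outliers(sources_per_zone))  # ['f.example']
```

A median-based score is used rather than mean and standard deviation because a single large attack inflates the standard deviation and can hide itself.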

Page 15: Satyam open analytics nyc


CASE STUDY- DDoS Attack – Pattern Based Study

[Figure: two plots with axes Traffic Volume and Unique ZRatio. Panel titles: "Single Day - Outlier Events - 10K Size :: Zones Hit from Multiple Sources" and, AFTER DIGGING FURTHER, "Single Day - Outlier Events - 2K Size :: Zones Hit from Multiple Sources". Point labels: ABC.TLD, SB, GOLD.TLD.]

Page 16: Satyam open analytics nyc


PITFALLS…

Big Data can help MOST BUSINESSES

• BIG DATA PITFALL: Lack of knowledge – Tools, Data Science
  HOW TO OVERCOME: Expert, Education, Execution
• BIG DATA PITFALL: Too much data. Initially most of it was discarded
  HOW TO OVERCOME: Deploy Hadoop clusters with cheap storage and store with the best possible compression
• BIG DATA PITFALL: Executives not sure
  HOW TO OVERCOME: Education, best practices, and insights after mining and finding useful patterns initially
• BIG DATA PITFALL: Belief that Big Data has all the answers
  HOW TO OVERCOME: The whole mine is NOT GOLD. Show insights and coach
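
To make the "cheap storage with the best possible compression" advice concrete, here is a rough sizing sketch. The 7 TB/day ingest rate and the 45-day span come from the DDoS case study; the 5:1 compression ratio is an assumed value for illustration, and the replication factor of 3 is the HDFS default. The function name `stored_tb` is hypothetical.

```python
def stored_tb(raw_tb_per_day: float, days: int,
              compression_ratio: float, replication: int) -> float:
    """Physical terabytes needed after compression and replication."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_tb_per_day * days / compression_ratio * replication


# 7 TB/day and 45 days are taken from the case study slide.
# The 5:1 compression ratio is an assumption; 3 is the HDFS default replication.
print(stored_tb(7, 45, compression_ratio=5, replication=3))  # 189.0
```

Without compression the same data would occupy 945 TB at three replicas, which is why the compression ratio dominates the storage bill.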

Page 17: Satyam open analytics nyc


PITFALLS…

Big Data can help MOST BUSINESSES

• BIG DATA PITFALL: Silo culture
  HOW TO OVERCOME: Devastating for companies. A single source of truth is key to success
• BIG DATA PITFALL: Multiple copies of ‘same’ data in different formats
  HOW TO OVERCOME: Keep raw data (along with a DR site), transform during analysis
• BIG DATA PITFALL: Well-established Enterprise Data Warehouse
  HOW TO OVERCOME: Keep it for simple, operational analytics; augment with Big Data for innovation and future growth
• BIG DATA PITFALL: Intuition-based culture
  HOW TO OVERCOME: Can only focus on Gold; if you find Silver and other precious metals, you miss the mark. Show insights and move on to Gold
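
The "keep raw data, transform during analysis" remedy can be illustrated with a small sketch: one untouched raw copy is stored, and each analysis parses it at read time instead of maintaining its own reformatted copy. The record layout, field names, and sample values below are hypothetical.

```python
import json
from collections import Counter

# Hypothetical raw records, kept exactly as received (one JSON object per line).
RAW_LINES = [
    '{"zone": "a.example", "qtype": "A", "src": "192.0.2.1"}',
    '{"zone": "a.example", "qtype": "ANY", "src": "192.0.2.7"}',
    '{"zone": "b.example", "qtype": "ANY", "src": "192.0.2.7"}',
]


def parse(lines):
    """Transform raw lines into dicts at read time; the stored copy is untouched."""
    for line in lines:
        yield json.loads(line)


def count_by(lines, field):
    """One analysis-specific view derived from the single raw copy."""
    return Counter(record[field] for record in parse(lines))


print(count_by(RAW_LINES, "zone"))
print(count_by(RAW_LINES, "qtype"))
```

Both views come from the same raw lines, so there is no second copy in a different format that can drift out of sync.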

Page 18: Satyam open analytics nyc


Simple way to see some Big Data Challenges

1st
• Data acquisition
• Storage
• Processing

2nd
• Data transport & dissemination
• Data management & curation
• Big Analytics – Tools, Technology, Know-How

3rd
• Privacy, Security and Disaster Recovery
• Technical/Scientific Talent
• Cost of all of the above

Page 19: Satyam open analytics nyc


KARMA matters

Knowledge
• Business, Technology, People Strategy
• Big Data Sources, Lifecycle
• Re-invest based on actions

Action
• Scalable Architecture, Infrastructure, Tools & Technology, Resources
• Mining the Big Data with a targeted and open mind to find Gold and other items

Recognition
• Revenue by selling new insights
• Increase Profit Margins
• Add new features to products & services

Market
• Grow Share
• Customer Centricity

Advance
• Innovate with the help of Big Analytics
• Gather even more Big Data and keep going through this cycle

Page 20: Satyam open analytics nyc


THANK YOU

UNDERSTAND YOUR BIG DATA KARMA SCORE, understand the big picture and the direction, and LEAD

Helps Build Strong Foundation

Focus on OUR MOST VALUED CUSTOMERS

INCREASE PROFITABILITY

Innovate & Develop for Future Differentiation & Advantage

Advance the Brand and Grow

Page 21: Satyam open analytics nyc


Appendix

Page 22: Satyam open analytics nyc


The Pitfalls for Adopting Big Data

The Big Data Definition of 4 V’s – Velocity, Volume, Variety, Veracity is incomplete.

The Belief that Big Data solves everything for Everyone.

Big Data abounds, but its dimensions need to be understood

The Loudest Often Wins (LOW) or the highest paid person’s opinion (HIPPO) prevails

Getting a data-driven approach to trump intuition is a hard nut to crack. Really!!

Data for Data’s Sake; Talent Gap; Data, Data Everywhere; Infighting; Aiming Too High

Reference: Wall Street Journal, March 11, 2013, page R4

Page 23: Satyam open analytics nyc


[Figure: Brief History of Analytics, a timeline with axis marks at 1890, 1920, 1950, 1980 and 2010]

Copyright 2012 Dr. Priyadarshy

Page 24: Satyam open analytics nyc


DEFINITIONS of Analytics for Business

• ANALYTICS – Any data-driven process that provides insights

• ADVANCED ANALYTICS – Helps in understanding cause-effect relationships, predicting future events, and choosing the best possible action

• BIG ANALYTICS FOR BUSINESS – Relevant for the business; provides actionable insights for increasing revenue/profit and for value measurement; leverages “Big Data”.