satyam open analytics nyc
TRANSCRIPT
1BIG DATA ANALYTICS & PITFALLS TO AVOID© Dr. Satyam Priyadarshy
BIG DATA ANALYTICS &
PITFALLS TO AVOID
Dr. Satyam Priyadarshy
June 17, 2013 – New York City
Agenda
The Big Data buzzword is creating a lot of confusion for
companies. One needs to understand Big Data within one's own
context, and the 7 V’s of Big Data along with the KARMA score,
to avoid some of the serious pitfalls in leveraging Big Data.
A case study will show how to drive value out of Big
Data in a meaningful manner.
BIG DATA Buzz - Should Business Care?
The future of Big Data is bright. Organizations that can effectively leverage Big Data without sinking into the Big Data Hole will realize additional business value, a loyal customer base and increased profits.
2.5 exabytes of new data generated per day
What do we know?
A top business priority
Big opportunities available
Everyone is talking about it
But...
Emerging technology helps
Definitely adds value
Definition and leverage are not clear
Big challenges for companies
The path to execute is less understood
Realization is complex but getting easier
Expertise is in demand but supply is short
BIG DATA - 7 V’s that describe
VELOCITY: Moving away from batch processing to real-time addition of massive data for near real-time analysis
VARIETY: Structured and unstructured data, e.g. POS data, sensor data, transaction data, call center data, supply chain data, new media data, etc.
VERACITY: Reliability and predictability of ‘not so’ precise data types, e.g. sentiment data, weather data and its impact on business.
VOLUME: The ever-growing data, from terabytes to petabytes to zettabytes
Big Data definition is evolving. The origin of the term dates back to 1990. Typically 4 V’s defined Big Data, but I strongly recommend the 7 V’s that describe Big Data.
(Source: chiefknowledgeguru.com)
80% of data generated is unstructured
VALUE: Unless value is realized, Big Data is just a Big Hole
VIRTUAL: Data resides in virtual environments, e.g. POS, private and public clouds, geo-located, inside and outside firewalls
VARIATION: No single configuration of the other 6 V’s fits everyone. There is variation for each business.
KARMA matters
Knowledge
• Business, Technology, People Strategy
• Big Data Sources, Lifecycle
• Re-invest based on actions
Action
• Scalable Architecture, Infrastructure, Tools & Technology, Resources
• Mining the Big Data with a targeted and open mind to find Gold and other items
Recognition
• Revenue by selling new insights
• Increase Profit Margins
• Add new features to products & services
Market
• Grow Share
• Customer Centricity
Advance
• Innovate with help of Big Analytics
• Gather even more Big Data and keep going through this cycle
KARMA SCORE is calculated using maturity level of these capabilities
• Parallel Processing, API, Query, Reporting
• Data Mining, Analytics, Pattern, Statistics
• Machine Learning, Inference predictions
• Tools, Technologies, Human Resources
• Service to support business – Data, Information, Knowledge, Process
• Presentation – Visualization, Mobility, Collaboration, Exploration
• Actions – Improve Product/Services, Grow Revenue/Profits, Agility
• Collection of Raw Data, Structured & Unstructured, Discovery, Staging
• Extract, Load, Transform
• Data Connectors, Access, Use, Move
• Data Storage: Hadoop, NoSQL, Key-value, MPP, In-memory, blobs, etc.
• Policy, Privacy, Security, Metadata, Risk, Total cost of ownership, Access control
• Data Lifecycle, Data Assets, SLA, ROI, ROA, Data Quality
• Physical Store, Virtual Storage, Encryption, Masking, Archive, Disaster Recovery
Data Governance and Management
Big Data
Big Math and Big Analytics
Big Value, Big Actions
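The slide states that the KARMA score is calculated from the maturity level of these capabilities but gives no formula. As a rough illustration only, the sketch below assumes a 1 to 5 maturity rating per capability group and an unweighted mean per layer and overall. The scale, the equal weighting, the sample ratings, the grouping of capabilities under each layer (inferred from the slide order), and the names `maturity` and `karma_score` are all assumptions, not the author's method.

```python
from statistics import mean

# Hypothetical self-assessment on an assumed scale of 1 (ad hoc) to 5 (optimized).
# Layer names come from the slide; keys, grouping, and ratings are illustrative.
maturity = {
    "Data Governance and Management": {
        "policy_privacy_security": 3,
        "lifecycle_sla_quality": 2,
        "storage_encryption_dr": 4,
    },
    "Big Data": {
        "raw_data_collection": 4,
        "extract_load_transform": 3,
        "connectors_access": 3,
        "data_storage": 2,
    },
    "Big Math and Big Analytics": {
        "parallel_processing_query": 2,
        "data_mining_statistics": 2,
        "machine_learning": 1,
        "tools_people": 3,
    },
    "Big Value, Big Actions": {
        "business_services": 1,
        "presentation": 2,
        "actions": 3,
    },
}


def karma_score(ratings):
    """Return (mean maturity per layer, unweighted mean of the layer means)."""
    per_layer = {layer: mean(caps.values()) for layer, caps in ratings.items()}
    overall = mean(per_layer.values())
    return per_layer, overall


per_layer, overall = karma_score(maturity)
for layer, score in per_layer.items():
    print(f"{layer}: {score:.2f}")
print(f"Overall (assumed unweighted mean): {overall:.2f}")
```

A real assessment would likely weight layers differently for each business, which is in line with the VARIATION point made earlier in the deck.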
Whatever your KARMA score is, you can leverage Big Data eventually.
The Great Enabler is the OPEN SOURCE Revolution of the last decade or so.
In a Zoo / In an Open Environment
OPEN SOURCE Creates a HAPPY, FLOURISHING Environment
Open Source – Key Characteristics
FREE (*)
NOT CAGED, NOT BLACK BOX
MODIFICATIONS ALLOWED
MODIFIED VERSIONS REDISTRIBUTABLE
LIVES IN HARMONY WITH OTHERS
Open Source – BIG DATA PLAYERS
THESE TOOLS ENABLE YOU TO DIG THE GOLD IN BIG DATA
(This is not a comprehensive list of tools/technologies)
ACTION for finding the GOLD
PROBLEM SOLVING: Basic Analytics
• Addresses current concerns
• Reduce costs
• Eliminate issues
OPERATIONAL: Advanced Analytics
• Addresses growth
• Customer centric
• Easily incorporate new data
STRATEGIC – FUTURISTIC: Holistic Analytics
• Innovation related
• Emerging trends adoption
GO FOR THE GOLD
BIG DATA, BIG MATH, BIG ANALYTICS
Descriptive Statistics
Inferential Statistics
THAT’S A GOLD MINE
WHAT’S IN A GOLD MINE?
Gold Suite: Gold, Arsenic, Mercury, Tungsten, Silver
BASE Suite: Copper, Lead, Zinc, Bismuth, Cadmium, Molybdenum, Silver
Iron-Manganese Suite: Iron, Manganese, Cobalt, Nickel, Yttrium
TO GET GOLD ONE HAS TO DIG DEEPER
IF YOU FOUND SILVER WHILE DIGGING FOR GOLD, WHAT WOULD YOU DO?
CASE STUDY – DDoS Attack
PROBLEM
DNS servers are persistently attacked to create DDoS attacks. Can we predict?
CHALLENGES:
• 7+ TB / day
• Varied formats based on request and type of attacks
BIG ANALYTICS
Hadoop-based data storage
APPROACH:
• Hive / MapR queries and R for statistical analysis
• Interconnection of data with known “data” sources for identification
• Tableau and (open source D3.js and Ploticus) for visualization
• Iteratively optimized queries for speed
THE GOLD
45 days later: It’s Science, not BI
KNOWLEDGE
Source of attacks identified
• After integrating
• Distributed targets
• Multiple attack types
ACTIONS
• Slow performance over binary data sets
• A step closer to a solution, but requires more work to get it near real-time for actionable insights.
• Feedback loop to known datasets to enhance the predictability and performance
RECOGNITION
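The slide names Hive queries and R for the statistical analysis but does not show them. As a hedged illustration in Python (not the author's pipeline), the sketch below counts how many distinct sources hit each DNS zone and flags zones whose count is far above the median, using a modified z-score based on the median absolute deviation. The record layout, the sample data, the zone names, the function name `zones_hit_from_many_sources`, and the threshold of 3.5 are all assumptions.

```python
from collections import defaultdict
from statistics import median


def zones_hit_from_many_sources(records, threshold=3.5):
    """Flag zones whose distinct-source count is a robust outlier.

    records: iterable of (source_ip, zone) pairs (hypothetical layout).
    Uses the modified z-score 0.6745 * (x - median) / MAD.
    """
    sources = defaultdict(set)
    for source_ip, zone in records:
        sources[zone].add(source_ip)  # sets ignore repeat queries from one source
    counts = {zone: len(ips) for zone, ips in sources.items()}
    med = median(counts.values())
    mad = median(abs(c - med) for c in counts.values())
    if mad == 0:
        return {}  # no spread, so nothing can be called an outlier
    return {
        zone: count
        for zone, count in counts.items()
        if 0.6745 * (count - med) / mad > threshold
    }


# Hypothetical one-day sample: five quiet zones and one hit from 50 sources.
records = []
for zone, n in [("z1.tld", 2), ("z2.tld", 3), ("z3.tld", 3),
                ("z4.tld", 4), ("z5.tld", 3), ("gold.tld", 50)]:
    records += [(f"10.0.0.{i}", zone) for i in range(n)]
records += records[:5]  # repeated queries must not inflate the counts

flagged = zones_hit_from_many_sources(records)
print(flagged)
```

At 7+ TB per day the per-zone counting step would run inside the cluster (for example as a grouped query), with only the small table of counts pulled out for the statistical test.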
CASE STUDY- DDoS Attack – Pattern Based Study
[Figure: two scatter plots of Traffic Volume (y-axis) against Unique ZRatio (x-axis). Panel titles: "Single Day - Outlier Events - 10K Size :: Zones Hit from Multiple Sources" and "Single Day - Outlier Events - 2K Size :: Zones Hit from Multiple Sources". Labeled points: ABC.TLD, SB, GOLD.TLD. Annotation: AFTER DIGGING FURTHER.]
PITFALLS…
Big Data can help MOST BUSINESSES
BIG DATA PITFALL: Lack of knowledge – Tools, Data Science
HOW TO OVERCOME: Expert, Education, Execution
BIG DATA PITFALL: Too much data. Initially most of it was discarded
HOW TO OVERCOME: Deploy Hadoop clusters with cheap storage and store with best possible compression
BIG DATA PITFALL: Executives not sure
HOW TO OVERCOME: Education, best practices, and insights after mining and finding useful patterns initially
BIG DATA PITFALL: Belief that Big Data has all the answers
HOW TO OVERCOME: The whole mine is NOT GOLD. Show insights and coach
PITFALLS…
Big Data can help MOST BUSINESSES
BIG DATA PITFALL: Silo culture
HOW TO OVERCOME: Devastating for companies. A single source of truth is key to success
BIG DATA PITFALL: Multiple copies of ‘same’ data in different formats
HOW TO OVERCOME: Keep raw data (along with DR site), transform during analysis
BIG DATA PITFALL: Well-established Enterprise Data Warehouse
HOW TO OVERCOME: Keep it for simple, operational analytics; augment with Big Data for innovation and future growth
BIG DATA PITFALL: Intuition-based culture
HOW TO OVERCOME: If you can only focus on Gold and you find Silver and other precious metals, you miss the mark. Show insights and move on to Gold
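One way to read "keep raw data, transform during analysis" is a schema-on-read pattern: records are stored compressed exactly as received, and parsing happens only when a question is asked. The sketch below is a minimal in-memory illustration of that idea, not a Hadoop deployment; the record format, field names, sample values, and the helper `parse` are hypothetical.

```python
import gzip
import io
import json

# Hypothetical raw events, stored as received. Malformed records are kept
# too, so a later and better parser can still recover them.
raw_lines = [
    '{"ts": "2013-06-17T10:00:00", "zone": "abc.tld", "bytes": 512}',
    "not json at all",
    '{"ts": "2013-06-17T10:00:01", "zone": "gold.tld", "bytes": 2048}',
]

# Ingest: write the untouched lines with compression (an in-memory buffer
# stands in for cheap cluster storage).
buffer = io.BytesIO()
with gzip.open(buffer, "wt", encoding="utf-8") as fh:
    fh.write("\n".join(raw_lines))


def parse(line):
    """Transform applied at analysis time; returns None for unparseable lines."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


# Analysis: read the raw copy back and transform it on the fly.
buffer.seek(0)
with gzip.open(buffer, "rt", encoding="utf-8") as fh:
    events = [e for e in map(parse, fh.read().splitlines()) if e is not None]

total_bytes = sum(e["bytes"] for e in events)
print(len(events), total_bytes)
```

Because the stored copy is never rewritten, every team reads the same source and applies its own transform, which avoids the multiple divergent copies named as a pitfall on this slide.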
Simple way to see some Big Data Challenges
1st
• Data acquisition
• Storage
• Processing
2nd
• Data transport & dissemination
• Data management & curation
• Big Analytics – Tools, Technology, Know-How
3rd
• Privacy, Security and Disaster Recovery
• Technical/Scientific Talent
• Cost of all of the above
KARMA matters
Knowledge
• Business, Technology, People Strategy
• Big Data Sources, Lifecycle
• Re-invest based on actions
Action
• Scalable Architecture, Infrastructure, Tools & Technology, Resources
• Mining the Big Data with a targeted and open mind to find Gold and other items
Recognition
• Revenue by selling new insights
• Increase Profit Margins
• Add new features to products & services
Market
• Grow Share
• Customer Centricity
Advance
• Innovate with help of Big Analytics
• Gather even more Big Data and keep going through this cycle
THANK YOU
UNDERSTAND YOUR BIG DATA KARMA SCORE, Understand the Big Picture and the Direction, and LEAD
Helps Build a Strong Foundation
Focus on OUR MOST VALUED CUSTOMERS
INCREASE PROFITABILITY
Innovate & Develop for Future Differentiation & Advantage
Advance the Brand and Grow
Appendix
The Pitfalls for Adopting Big Data
The Big Data Definition of 4 V’s – Velocity, Volume, Variety, Veracity is incomplete.
The Belief that Big Data solves everything for Everyone.
Big Data abounds, but its dimensions need to be understood.
The Loudest Often Wins (LOW) or the highest paid person’s opinion (HIPPO) prevails.
Getting a data-driven approach to trump intuition is a hard nut to crack. Really!!
Data for Data’s Sake; Talent Gap; Data, Data Everywhere; Infighting; Aiming Too High
Reference: Wall Street Journal March 11, 2013 on page R4
Brief History of Analytics
[Figure: timeline with markers at 1890, 1920, 1950, 1980, 2010]
Copyright 2012 Dr. Priyadarshy
DEFINITIONS of Analytics for Business
ANALYTICS – Any data-driven process that provides insights
ADVANCED ANALYTICS – Helps in understanding cause-effect relationships, predicting future events, and choosing the best possible action
BIG ANALYTICS FOR BUSINESS – Relevant for the business; provides actionable insights for increasing revenue/profit and value measurement, and leverages “Big Data”.