Big Data and Analytics: A Technical Perspective
Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain
November 2012
2 © COPYRIGHT 2012 SAPIENT CORPORATION | CONFIDENTIAL
Between the dawn of civilization and 2003, the human race created 5 exabytes of data.
Now we generate that every 2 days.
The total amount of global data is expected to grow to 2,700 exabytes during 2012, up 48% from 2011.
1 Exabyte = 1,000,000 TB
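As a quick check on the figures above, here is a minimal sketch. It assumes decimal (SI) units and reads "up 48%" as 2012 = 1.48 x 2011; the function names are illustrative and not from the slides.

```python
# Assumption: decimal (SI) units, so 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_EB = 10**18


def exabytes_to_terabytes(exabytes):
    """Convert exabytes to terabytes using decimal units."""
    return exabytes * BYTES_PER_EB // BYTES_PER_TB


def implied_previous_total(current_total, growth_pct):
    """Back out last year's total from this year's total and the growth rate."""
    return current_total / (1 + growth_pct / 100)


tb_per_eb = exabytes_to_terabytes(1)
total_2011_eb = implied_previous_total(2700, 48)
print(f"1 EB = {tb_per_eb:,} TB")
print(f"Implied 2011 total: about {total_2011_eb:.0f} EB")
```

Under these assumptions the 2011 total works out to roughly 1,800 exabytes.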
Big Data Defined
Techniques and technologies that make handling data at extreme scale affordable.
Source: Forrester Research, ctoforum.org
VARIETY
Structured -> Semi-structured -> Unstructured
VOLUME
Terabytes -> Exabytes
VELOCITY
Batch -> Streaming Data
Evolution of Analytics
Timeline: 1990s -> 2000s -> Late 2000s -> 2010s
Stages: Descriptive -> Predictive -> Prescriptive
What happened? Standard Reporting
Why did it happen? Query / Drill down
What could happen? Simulation
What should I be doing? Optimization
How is Big Data Analytics Different?
Dimension | TRADITIONAL BI | BIG DATA ANALYTICS
Volumes | GBs to 10s of TBs | 10s of TBs to 100s of PBs
Sources | Operational | External + Operational
Variety | Structured | Mostly Semi-Structured
Workload | Repetitive | Experimental, Ad Hoc
Mathematics | Addition (Aggregation) | Complex Algorithms / Linear Programming
The Big Data Lifecycle
Manage
Enrich
Insight
Source: hadoop.apache.org; Microsoft.com; ibm.com
Manage Data
ANY DATA, ANYWHERE, ANY SIZE
Non-Relational Relational Streaming
Data Movement
Source: hadoop.apache.org; Microsoft.com; ibm.com
ENRICH by Combining and Refining!
Discover
Combine
Refine
Source: Microsoft.com, oracle.com, ibm.com
Insight | Anywhere, Any Device, Any User
ANY DATA, ANYWHERE (DEVICES), ALL USERS
Source: Microsoft.com, oracle.com, ibm.com
BIG DATA REQUIRES AN END-TO-END APPROACH
INSIGHT: Self-Service, Collaboration, Corporate Apps, Devices
ENRICH: Discover, Combine, Refine
MANAGE: Relational, Non-relational, Streaming, Analytical
Source: Microsoft.com, ibm.com
We are spoilt for choice in the marketplace
Product Proliferation
Source: Product Logos of Big Data Companies
Enterprise Data Warehouse
Hadoop
Aggregate Oriented DB
In-Memory Stores
Source: Product Logos of Big Data Companies
ENTERPRISE DATA WAREHOUSES
• Requires referential integrity and structured data, at the cost of flexibility and agility
• Analytics and aggregation using OLAP
• “Shared-nothing” MPP architecture enables massive scale-out
• Best suited for analytics using structured data
• Key considerations include data quality/governance, structuring data, and segmenting analytics workloads
[Radar chart axes: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity]
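The referential integrity and OLAP-style aggregation noted for enterprise data warehouses can be sketched on a small scale. The example below uses Python's built-in sqlite3 module as a stand-in; it is not an MPP warehouse, and the table names (dim_store, fact_sales) and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce referential integrity
conn.executescript("""
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY,
        region   TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        sale_id  INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES dim_store(store_id),
        amount   REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(1, "North"), (2, "North"), (3, "South")])
conn.executemany("INSERT INTO fact_sales (store_id, amount) VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (3, 70.0), (3, 30.0)])


def sales_by_region(connection):
    """Roll fact rows up to the region level (an OLAP-style aggregation)."""
    rows = connection.execute("""
        SELECT d.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_store AS d ON d.store_id = f.store_id
        GROUP BY d.region
        ORDER BY d.region
    """)
    return rows.fetchall()


def try_orphan_insert(connection):
    """Return True if a fact row pointing at an unknown store is rejected."""
    try:
        connection.execute(
            "INSERT INTO fact_sales (store_id, amount) VALUES (?, ?)", (99, 10.0))
    except sqlite3.IntegrityError:
        return True
    return False


print(sales_by_region(conn))
print(try_orphan_insert(conn))
```

The schema has to be declared before any data is loaded, which is the flexibility trade-off the slide describes.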
HADOOP
• Java-based open-source framework
• Hadoop Core – MapReduce and HDFS
• Structuring delayed until analytics is performed
• Flexibility as business grows/evolves
• Flexibility to build complex algorithms/models for analytics purposes
• Only option for petabyte range
• Best suited for batch-oriented analytics
• Works best when it’s possible to design analytics algorithms as “scatter-gather”
• Key considerations: HDFS file size, MapReduce algorithm, sequential file processing, data distribution
[Radar chart axes: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity]
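The “scatter-gather” shape that MapReduce expects can be sketched in a few lines. This is a single-process Python illustration of the map, shuffle and reduce steps, not Hadoop's Java API; the word-count job and the two input splits are assumptions chosen for brevity.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(mapped_pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def run_job(splits):
    """Scatter the map over input splits, then gather with the reduce."""
    mapped = [pair
              for split in splits
              for record in split
              for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two hypothetical input splits, standing in for HDFS blocks.
splits = [["big data big insight"], ["data at scale"]]
counts = run_job(splits)
print(counts)
```

Each map call sees only its own record and each reduce call sees only one key, which is what lets a real cluster spread the work across nodes.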
IN-MEMORY STORES
• Maintains data in memory and on SSD
• Leverages shared-nothing architecture to provide scalability
• In-memory databases (IMDB) – row- or column-oriented schema
• In-memory data grids (IMDG) – key-value and de-normalized
IMDB: Best suited for real-time analytics on structured data. Used for specialized data marts as well as for OLTP needs.
Key considerations: data organization, parallel query
IMDG: Suited for fast key-based data access patterns or processing.
Key considerations: data distribution, key definition, data-process co-location
[Radar chart axes: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity]
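The IMDG considerations (data distribution, key definition, data-process co-location) can be illustrated with a toy grid. The class below is a hypothetical sketch, not the API of any data grid product: partitions are plain dictionaries in one process, and a CRC32 hash of the key stands in for a real partitioning scheme.

```python
import zlib


class InMemoryGrid:
    """Toy data grid: each partition is a plain dict held in memory."""

    def __init__(self, partition_count):
        self.partitions = [{} for _ in range(partition_count)]

    def partition_for(self, key):
        # A stable hash of the key decides which partition owns the entry.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self.partition_for(key)][key] = value

    def get(self, key):
        return self.partitions[self.partition_for(key)].get(key)

    def execute_on_key(self, key, func):
        """Run func against the owning partition and store the result in place."""
        partition = self.partitions[self.partition_for(key)]
        partition[key] = func(partition.get(key))
        return partition[key]


grid = InMemoryGrid(partition_count=4)
grid.put("customer:42", {"visits": 3})
grid.execute_on_key("customer:42", lambda v: {**v, "visits": v["visits"] + 1})
print(grid.get("customer:42"))
```

Sending the function to the partition that owns the key, rather than pulling the value out, is the co-location idea: in a distributed grid this avoids moving data across the network.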
AGGREGATE-ORIENTED DB
• Highly scalable and available distributed data stores
• De-normalized data structures, with data organized as aggregates. Data saved as key-value pairs, documents or columns
• Enables faster reads/writes on aggregates
• Best suited for analytics on semi-structured data where access patterns can be bound to a single key
• Key considerations: data distribution, aggregate structure, key definition, data-process co-location
[Radar chart axes: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity]
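To make "data organized as aggregates" concrete, here is a minimal sketch in which a whole order, with its customer and line items, is stored as one document under one key. The dictionary-backed store, the key format and the sample order are all hypothetical; a real document or key-value database would replace the dict.

```python
import json

# Hypothetical key-value store: a dict mapping a key to a serialized aggregate.
store = {}


def save_order(order):
    """Write the whole aggregate under one key derived from its identity."""
    key = f"order:{order['order_id']}"
    store[key] = json.dumps(order)
    return key


def load_order(order_id):
    """Read the whole aggregate back with a single key lookup, no joins."""
    return json.loads(store[f"order:{order_id}"])


def order_total(order):
    """Compute on the aggregate without touching any other record."""
    return sum(line["qty"] * line["unit_price"] for line in order["lines"])


order = {
    "order_id": 1001,
    "customer": {"id": 42, "name": "A. Shopper"},
    "lines": [
        {"sku": "COLA-330", "qty": 2, "unit_price": 1.5},
        {"sku": "CHIPS-90", "qty": 1, "unit_price": 2.0},
    ],
}
save_order(order)
print(order_total(load_order(1001)))
```

The design choice is that everything needed to answer a question about one order travels together, so the access pattern is bound to a single key; questions that cut across orders are harder and depend on how the aggregate was shaped.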
Product Category Comparison
[Comparison chart. Rows: Enterprise Data Warehouse, Hadoop, In-Memory Stores, Aggregate-Oriented DB. Columns: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity]
Specific product selection will depend on an assessment of data and analytics requirements
ADVANCED PHYSICAL PORTFOLIO OPTIMIZATION
in·te·gra·tion • cov·er·age • pre·vis·i·bil·i·ty
Aditya Gandhi
CHALLENGE
• Making the next buck is harder
• Constantly changing environment
• Decisions are narrow or historical
CHALLENGE
• Vast but un-captured information
• Increasing volume / complexity
• Coarse-grained operations
CONCEPT
• Toolset like a chess simulator
• Takes in current state of the board
• Provides best actions to take
Markets
Price forecasts
Forward Curves
Volatilities
Costs and Tariffs
Asset Characteristics
Commodity In
Commodity Out
Transport
Storage
Processing
Plants
Beginning positions
Storage Inventory
In-transit Inventory
Exchange Imbalance
Framework
Optimization User Actions
TARGET TRANSACTIONS:
Market optimization formulates the optimal shape of transactions based on the target portfolio and beginning positions
EXECUTED TRANSACTIONS:
Exogenous and endogenous constraints and factors cause deviation from the plan
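The idea of turning market inputs, asset characteristics and beginning positions into target transactions can be sketched for a single storage asset. This is a heavily simplified illustration and not the presenters' model: it assumes one commodity, a known forward price per period, whole-unit volumes, and no tariffs, transport or volatility. It uses a small dynamic program rather than a linear programming solver, and every name and number is hypothetical.

```python
from functools import lru_cache


def optimize_storage(forward_prices, beginning_inventory, capacity, max_move=1):
    """Pick a buy/sell volume per period that maximizes total cash flow.

    Positive volume means buy (inject); negative means sell (withdraw).
    Returns (best_cash_flow, plan), where plan holds one volume per period.
    """
    periods = len(forward_prices)

    @lru_cache(maxsize=None)
    def best(period, inventory):
        if period == periods:
            return 0.0, ()
        best_value, best_plan = float("-inf"), ()
        for volume in range(-max_move, max_move + 1):
            new_inventory = inventory + volume
            if not 0 <= new_inventory <= capacity:
                continue  # storage constraint: stay within the tank
            future_value, future_plan = best(period + 1, new_inventory)
            value = -volume * forward_prices[period] + future_value
            if value > best_value:
                best_value, best_plan = value, (volume,) + future_plan
        return best_value, best_plan

    return best(0, beginning_inventory)


# Hypothetical inputs: a forward curve, one unit already in storage, room for two.
value, plan = optimize_storage(
    forward_prices=[3.0, 2.0, 4.0, 5.0],
    beginning_inventory=1,
    capacity=2,
)
print(value, plan)
```

With these inputs the plan is to hold in the first period, buy when the price dips, and sell into the two highest-priced periods. Executed transactions would then deviate from this target as real-world constraints intervene.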
Retail Analytics
Pankaj Jain
Aspects of Retail Analytics
Market Basket Analytics: Category Segmentation, Product Affinity, Brand Knowledge
Credit and Loyalty Card Analytics: Customer Segmentation, Loyalty
Shopper Insight: Lifestyle and Life Stage Segmentation, Brand Awareness, Impulse Shopping
Store Location Data: Store Location, Store Size, Store Format, Competitive Analysis
Geo Demographics: Sociology, Income/Education, Infrastructure
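Product affinity in market basket analytics is commonly measured with support, confidence and lift. The sketch below computes these for item pairs; the slides do not say which measures were used, so treat this as one standard approach, and the baskets are made-up sample data.

```python
from collections import Counter
from itertools import combinations


def pair_affinity(baskets):
    """Return (support, confidence, lift) for each ordered item pair (a -> b)."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    stats = {}
    for (a, b), together in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = together / n                      # share of baskets with both
            confidence = together / item_counts[lhs]    # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. rhs's base rate
            stats[(lhs, rhs)] = (support, confidence, lift)
    return stats


# Hypothetical point-of-sale baskets.
baskets = [
    ["cola", "chips"],
    ["cola", "chips", "salsa"],
    ["cola", "bread"],
    ["bread", "butter"],
]
stats = pair_affinity(baskets)
support, confidence, lift = stats[("chips", "cola")]
print(f"chips -> cola: support={support:.2f}, "
      f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two products are bought together more often than their individual popularity would predict.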
Retail Analytics Business Problems
How much money will a customer spend during the next visit?
When will a customer visit the store next?
How many customers are price sensitive?
How do I balance my product range across store formats?
How can I find gaps in the product range?
What should be delisted to introduce a new product?
Do my shoppers buy across the range?
Analytics Lifecycle
POS & Other Data
• Poor Structure
• Volume
• Inconsistent
Organized Data
• Volume
• Segmented
• Continuously Improved
Summarized Data
• Template Reports
• Rapid Analysis
Processed Data
• Segmentation
• Complex Algorithm
Attributes
Insight
Enrich
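The progression from raw POS data to organized, summarized and processed data can be sketched as a small pipeline. Everything here is an assumption for illustration: the record layout, the sample lines, and especially the spend threshold, which stands in for the more complex segmentation algorithm the slide refers to.

```python
from collections import defaultdict

# Hypothetical raw POS lines with inconsistent casing, spacing and a bad row.
RAW_POS = [
    "c001, COLA , 2, 1.50",
    "C001,chips,1,2.00",
    "c002,Bread,1,3.00",
    "bad line without fields",
    "C003,cola,10,1.50",
]


def organize(raw_lines):
    """POS & Other Data -> Organized Data: parse, normalize, drop bad rows."""
    records = []
    for line in raw_lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 4:
            continue
        customer, product, qty, price = parts
        try:
            records.append((customer.upper(), product.lower(), int(qty), float(price)))
        except ValueError:
            continue
    return records


def summarize(records):
    """Organized Data -> Summarized Data: total spend per customer."""
    spend = defaultdict(float)
    for customer, _product, qty, price in records:
        spend[customer] += qty * price
    return dict(spend)


def segment(spend_by_customer, high_value_threshold=10.0):
    """Summarized Data -> Processed Data: a simple threshold segmentation."""
    return {customer: ("high" if total >= high_value_threshold else "standard")
            for customer, total in spend_by_customer.items()}


summary = summarize(organize(RAW_POS))
segments = segment(summary)
print(summary)
print(segments)
```

Each stage consumes only the output of the previous one, so a stage can be improved (for example, a better segmentation model) without reworking the earlier steps.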
Business Outcome
• Effective Promotions and Communication
o Over 8% increase in Steadfast customers and 5% more sales
o Over 80% acceptance of offers
o Over $1 million growth in the category
• Over 60% growth in the range, with higher repeat sales and new customers, due to range analysis.
• Addition of three new aerated drinks increased the sales of that category by 12%.
• Overall higher and more consistent business growth.
Big Data Small Insights
Conclusion
• Big data has more dimensions than just "Big"
• Lifecycle is critical
• Choose your product and platform wisely
• Big data analytics is a lot more insightful than just analytics
o Big Data Small Insights
o Ask the right question
• Ramp up your college statistics and mathematics!
Thank You!