big data analytics summit - april, 2014
DESCRIPTION
Presentation on Big Data Analytics at Vellore Institute of Technology, Chennai
TRANSCRIPT
Big Data Analytics
Industry Perspective
Shankar Radhakrishnan
Topics
• Market Research
• Market Trends
• Big Data Analytics in
  • Banking and Financial Services
  • Insurance
  • Travel and Hospitality
  • Retail
  • Life Sciences
  • Manufacturing
  • Telecommunications
• Challenges vs. Opportunities
• Q & A
Market Research
Key Trends driving Big Data Analytics, by Industry
Financial Services
▪ Customer Insights – Integrating transactional data (CRM/Payments) and unstructured social feeds
▪ Regulatory Compliance – Risk exposures across asset classes, LOBs and firms
▪ Fraud Detection in Credit Cards & Financial Crimes (AML) in Banks
Travel, Hospitality & Retail
▪ Customer centricity – Customer behavior analysis from omni-channel retailing & social feeds
▪ Markdown Optimization – Improve markdowns based on actual customer buying patterns
▪ Market basket analysis – Narrow down market basket analysis by demographics
Life Sciences
▪ Improve targeting & predictions – Automatic detection of Adverse Drug Effects (ADEs)
▪ Patient data analysis – Longitudinal Patient Data (LPD) analysis
▪ Predictive Sciences – Analyze preclinical side effect profiles of marketed drugs
Healthcare (Payers & Providers)
▪ Cost of Care – Drug effectiveness & cost of care analysis based on electronic health records (EHR/EMR)
▪ Self Service Healthcare – Increase in mHealth & eHealth to allow consumer access to health information
▪ Claims Analytics – Analyze insurance claims data for fraud detection & preferred treatment plans
Communication, Media & Entertainment
▪ Discover churn patterns based on Call Data Records (CDRs) and activity in subscribers’ networks
▪ Digital Asset Management (DAM) – Analyze & capitalize on digital data assets
Manufacturing
▪ Proactive Maintenance & Recommendation – Sensor monitoring for automobiles, buildings & machinery
▪ Energy Efficiency – Leveraging smart meters for utility energy consumption
▪ Location or Proximity Tracking – Location based analytics using GPS data
Hi-Tech
▪ Extend and complement the conventional information supply chain with a big data path
▪ Predictive analysis and real time decision support
Trends
John calls a customer care executive at the bank. He is irritated with the services offered to him and is showing signs of making a switch.
The executive validates the customer’s identity and pulls up an application powered by Big Data that presents all the information relevant to making a decision. The Big Data application converts his speech to text in real time and identifies his propensity to churn.
Based on John’s tonal sentiment, the application immediately pulls up the top 5 offers or decisions to take, based on the Customer Persona information, which contains likes/dislikes, past experiences, the channels he prefers, CLV (Customer Lifetime Value), etc.
Well Informed Customer Service Executive
Social media
Depositions
Complaints
Voice Data
Unstructured Data Speech to Text Conversion
Decision Engine
Analytical System
Customer Persona
• Customer Persona
• Demographics
• Top interactions
• Channel Preferences
• Dis-satisfiers
• Customer Lifetime Value
• Recent Contact History
• Customer Sentiment
• Trend during the call
Customer’s state of mind
Sentiment Analysis
Other Channel information (ATM, Branch)
Big Data Warehouse
Traditional Warehouse
• Customer Executive Dashboard presents all the intelligence required to make a decision
• The decision engine also presents important decisions to be taken for the particular customer issue
Well Informed Customer Service Executive
Fraud Pattern Analysis & Detection
Envisaged Benefits
▪ New fraud patterns can be identified by building ‘analytical models’ and running them against X years of history data
▪ ‘Web crawling’, ‘contextual text analysis’ and ‘Natural Language Processing’ allow fraudulent behavior to be identified from social media, which may increase the fraud detection success rate
▪ ‘Real time’ models capture behavioral patterns and compare them against history data to evaluate the validity of a fraud case. The model is self-learning and updates the ‘fraud pattern master sets’, which brings ‘artificially intelligent’ fraud pattern detection and analysis
▪ ‘Real time’ alerts (refresh rate in the order of 0.5-1 minute) to fraud analysts about ‘self learned’ fraud patterns based on new customer behavior patterns
Process
▪ Formation of key value groups in the order of C(X, Y), where X is the number of attributes relevant to fraud and Y is the number of attributes that should be combined to identify patterns
▪ High speed loading of history data from source systems
▪ Efficient real time fraud detection by identifying patterns in customer behavioral events and processing them against X years of history data
Scenario
▪ Formation of fraud patterns using
  • Real time data coming from different departments like IVR, Web, Customer profile, Transactions, etc.
  • Real time mining and analysis of history data to form prior patterns
Fraud Pattern Analysis & Detection
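The "key value groups" step in the Process list can be read as taking every combination of Y attributes out of the X fraud-relevant ones and counting how often each value combination occurs in history. The sketch below is one possible reading, not the method used in the presentation; the attribute names, the sample events and the helper names `key_value_groups` and `count_patterns` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical fraud-relevant attributes (X = 4) and group size (Y = 2).
ATTRIBUTES = ["channel", "country", "merchant_type", "hour_band"]
GROUP_SIZE = 2


def key_value_groups(event, attributes=ATTRIBUTES, group_size=GROUP_SIZE):
    """Return one key per combination of `group_size` attributes of the event."""
    keys = []
    for combo in combinations(sorted(attributes), group_size):
        keys.append(tuple((name, event[name]) for name in combo))
    return keys


def count_patterns(events):
    """Count how often each attribute-value group occurs across the events."""
    counts = Counter()
    for event in events:
        counts.update(key_value_groups(event))
    return counts


# Made-up history of confirmed-fraud events, for illustration only.
fraud_history = [
    {"channel": "WEB", "country": "XX", "merchant_type": "electronics", "hour_band": "night"},
    {"channel": "WEB", "country": "XX", "merchant_type": "travel", "hour_band": "night"},
    {"channel": "IVR", "country": "YY", "merchant_type": "electronics", "hour_band": "day"},
]

pattern_counts = count_patterns(fraud_history)
print("groups per event:", comb(len(ATTRIBUTES), GROUP_SIZE))
print(pattern_counts.most_common(3))
```

The number of groups grows as C(X, Y), which is why the slide pairs this step with high speed loading: even a modest number of attributes produces many keys per event.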
Legacy Fraud Data
Customer Profile Data
Social Media Data
Card Transaction Data
Decision Engine
Approval/Denial Decision
History Data Processing to find Fraud Patterns over years
Real-time Customer Behavior Analysis for Fraud Detection
Real time Analysis of behavior patterns
Real time update to Decision Engine
Self Learning Fraud Detection
Cross Channel Analytics
John exhibits a specific pattern when he uses the bank’s services. He always visits the bank when he wants to deposit a check. He prefers to do most other operations online. He has recently started paying his utility bills through mobile.
• The analytical solution integrates customer transactions across different channels and reveals insights into the customer’s channel preferences and activities.
• It also integrates data from call centers, surveys and complaints, and measures customer experience.
• It reveals customer activity across channels, which is normally not available to a single customer touch-point, so that the touch-point can deliver superior service.
• It reveals opportunities to consolidate channels and optimize the cost of operations by incentivizing customers to choose one medium over another.
Analytical Solution produces
• Dominant Path Analysis specifying which channel John uses for which events
• Service Behavior Segmentation
• Customer Journey analysis
• Root Cause & Repeat Issue analysis
• Longitudinal analysis of changes in customer preference
Helps the bank deliver superior service and also optimize cost on specific channels
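A minimal sketch of the dominant path idea for John's example: for each customer and event type, find the channel used most often and its share of uses. The transaction rows and the function name `dominant_channels` are made up for illustration; a real solution would read these rows from the integrated channel data.

```python
from collections import Counter, defaultdict

# Made-up cross-channel transactions: (customer, event, channel).
transactions = [
    ("John", "check_deposit", "branch"),
    ("John", "check_deposit", "branch"),
    ("John", "funds_transfer", "online"),
    ("John", "utility_bill", "online"),
    ("John", "utility_bill", "mobile"),
    ("John", "utility_bill", "mobile"),
]


def dominant_channels(rows):
    """Map each (customer, event) to its most used channel and that channel's share."""
    usage = defaultdict(Counter)
    for customer, event, channel in rows:
        usage[(customer, event)][channel] += 1
    result = {}
    for key, counts in usage.items():
        channel, hits = counts.most_common(1)[0]
        result[key] = (channel, hits / sum(counts.values()))
    return result


for (customer, event), (channel, share) in dominant_channels(transactions).items():
    print(f"{customer}: {event} -> {channel} ({share:.0%} of uses)")
```

Tracking the share over time, rather than a single snapshot, is what would surface a shift such as John moving his utility payments to mobile.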
Analytics
Cross Channel Analytics
Big Data Warehouse
Dominant Path Analysis- channel usage info
Service Behavior Segmentation
Repeat Issue analysis
Query Drill-down
Ad hoc Reports
Predictive Modeling
Statistical Analysis & Text Mining
Optimization
Root cause analysis
Call Reasons analysis
Customer Journey analysis
Structured data
Web & Mobile
ATM / Branch
IVR, Call Records, Notes
CRM Data
ACH / Wire Transfer / Other channels
Unstructured Content / Logs
DW
Transactions
Mailings, Offers, Lists
Other Channels
Survey
Complaints
Analytics Data Mart
Member profiling based on profile, demographic, social media and history data. Identification of key predictor variables for customer churn
Member profiling and variable identification
Termination prediction modeling
Termination prediction modeling engine to determine “probability of termination” at each member level
Member Prioritization Matrix
List of members with high likelihood of termination
Retention Target list generation
Alternate product recommendation engine
List of suitable products for each customer
Analyze profitability of each of the recommended products
Create optimal and effective Retention campaigns
Personalized Retention Plan
Churn and Retention Analytics
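The termination prediction and retention target steps can be sketched as a logistic score per member followed by a threshold and a sort. This is only an illustration: the predictor names, weights, intercept and threshold below are hypothetical stand-ins for whatever a fitted model would provide.

```python
from math import exp

# Hypothetical weights, as if taken from an already fitted logistic model.
WEIGHTS = {"complaints_last_90d": 0.8, "months_since_last_use": 0.5, "products_held": -0.6}
INTERCEPT = -2.0


def termination_probability(member):
    """Logistic score: probability of termination for one member."""
    z = INTERCEPT + sum(WEIGHTS[name] * member[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def retention_targets(members, threshold=0.5):
    """Return (member_id, probability) pairs at or above the threshold, highest first."""
    scored = [(m["member_id"], termination_probability(m)) for m in members]
    targets = [pair for pair in scored if pair[1] >= threshold]
    return sorted(targets, key=lambda pair: pair[1], reverse=True)


# Made-up members for illustration.
members = [
    {"member_id": "M1", "complaints_last_90d": 3, "months_since_last_use": 4, "products_held": 1},
    {"member_id": "M2", "complaints_last_90d": 0, "months_since_last_use": 0, "products_held": 3},
    {"member_id": "M3", "complaints_last_90d": 1, "months_since_last_use": 3, "products_held": 1},
]

for member_id, probability in retention_targets(members):
    print(f"{member_id}: probability of termination {probability:.2f}")
```

The resulting list corresponds to the "members with high likelihood of termination"; the product recommendation and profitability steps would then run only for those members.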
Analyze the customer’s search pattern through weblog analysis using big data, e.g. the rate or amenity that the customer prefers.
Step 1: Customer starts the search on the website.
Drill down into specific search patterns and analyze the customer’s rate preference or amenity expectation at a particular rate, e.g.
Step 2: Customer selects a destination. The search displays all the hotels; the customer refines the search by selecting a price range, or sorts the results by price, and then leaves without booking. The inference is that the customer didn’t find a hotel at his/her expected rate.
Step 3: Customer selects a destination. The search displays all the hotels; the customer refines the search by selecting a preferred amenity (e.g. swimming pool, Wi-Fi) and then leaves without booking. The inference is that the customer didn’t find a hotel with the expected amenities.
Pop up the right offers when customers search, which in turn increases customer attraction and sales. The revenue management team can use this data to come up with the ideal rate. The search pattern can also be used to improve amenities at individual properties.
Step 4: This data can be forwarded to the revenue management team to set up the right/competitive rates in the right geography.
Step 5: This data can be forwarded to the properties as well, for amenity improvement.
Look to Book Ratio Analytics
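Steps 1 to 3 can be sketched as a classification of weblog sessions followed by the ratio itself, taken here as searches ("looks") per booking. The session format, the action names and the functions `classify` and `look_to_book` are assumptions made for this example, not something the presentation specifies.

```python
from collections import Counter

# Made-up web sessions, reduced to the ordered actions seen in the weblog.
sessions = [
    {"id": "s1", "actions": ["search", "filter_price", "exit"]},
    {"id": "s2", "actions": ["search", "filter_amenity", "exit"]},
    {"id": "s3", "actions": ["search", "sort_price", "book"]},
    {"id": "s4", "actions": ["search", "exit"]},
]

PRICE_ACTIONS = {"filter_price", "sort_price"}
AMENITY_ACTIONS = {"filter_amenity"}


def classify(session):
    """Label a session as booked, or by the likely reason it was abandoned."""
    actions = set(session["actions"])
    if "book" in actions:
        return "booked"
    if actions & PRICE_ACTIONS:
        return "abandoned_rate"
    if actions & AMENITY_ACTIONS:
        return "abandoned_amenity"
    return "abandoned_other"


def look_to_book(all_sessions):
    """Return (looks per booking, counts per label); the ratio is None without bookings."""
    labels = Counter(classify(s) for s in all_sessions)
    looks = sum(1 for s in all_sessions if "search" in s["actions"])
    ratio = looks / labels["booked"] if labels["booked"] else None
    return ratio, labels


ratio, labels = look_to_book(sessions)
print("look-to-book ratio:", ratio)
print(dict(labels))
```

The per-label counts are what Steps 4 and 5 would forward: rate-driven abandonment to revenue management, amenity-driven abandonment to the properties.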
Planogram – created by planners and buyers
Actual view of the shelf arranged by store associates
Compliance dashboard as well as compliance score by Dept./Category/Subclass
▪ Planogram compliance is the process of verifying whether the arrangement of products, and the manner in which they are displayed on the shelf in each store, match the planogram that is strategically created and collaboratively developed by planners and trading partners
▪ This verification is usually a time-consuming process done on a sample basis. When the execution of the planogram is compromised, or if there are assortment voids, it is a lost opportunity
▪ To accelerate this compliance check, take a picture of the actual shelf and systematically compare product facing and position for compliance
The Need
▪ Storing the planograms created at the corporate location for each store/dept./category combination
▪ Storing the actual photo of the shelf
▪ Comparing this unstructured data for matching
▪ Integrating this matching score with planogram planning data in the data warehouse to produce various dashboards and metrics that will influence sell-thru, profitability and customer satisfaction
Big Data Analytics
Planogram Compliance
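Once the shelf photo has been turned into product positions, the matching score can be as simple as the fraction of planned facings that hold the planned product. The sketch below assumes that recognition step has already happened (it is not shown) and that both the planogram and the observed shelf are available as `{(shelf, position): product_id}` mappings; the SKU names and function names are hypothetical.

```python
def compliance_score(planogram, observed):
    """Fraction of planned facings where the observed product matches the plan."""
    if not planogram:
        return 1.0
    matches = sum(1 for slot, sku in planogram.items() if observed.get(slot) == sku)
    return matches / len(planogram)


def voids(planogram, observed):
    """Planned slots with nothing detected on the shelf (assortment voids)."""
    return sorted(slot for slot in planogram if slot not in observed)


# Made-up example: four planned facings on one shelf.
planogram = {(1, 1): "SKU-A", (1, 2): "SKU-B", (1, 3): "SKU-C", (1, 4): "SKU-D"}
observed = {(1, 1): "SKU-A", (1, 2): "SKU-C", (1, 3): "SKU-C"}  # slot (1, 4) is empty

print(f"compliance: {compliance_score(planogram, observed):.0%}")
print("voids:", voids(planogram, observed))
```

Scores computed per store, department and category in this way are the values that would be joined with planning data for the compliance dashboard.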
▪ eCommerce retailer needs to analyze graphic images depicting items for sale over the Internet
▪ When a consumer wants to buy a red dress, their search may not match the tags used to identify each item’s search terms.
▪ Manufacturers do not always label their goods clearly for the distributors or identify keywords with which users are likely to search.
The Need
▪ Analyze thousands of dress images, detecting the red prominence of the primary object in the graphic (JPGs, GIFs and PNGs)
▪ This requires enormously complex logic for the computer to “see” the dress and its primary colors as humans do.
▪ Millions of images are tagged with additional information to assist consumers with their search
▪ Increases the chances that they find the item they were looking for and make a purchase
Big Data Analytics
Intelligent Item Search
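As a deliberately simplified illustration of "detecting the red prominence of the primary object", the sketch below measures the share of pixels whose red channel clearly dominates. It assumes the image has already been decoded into RGB tuples and cropped to the primary object, both of which are the hard parts the slide alludes to; the `margin` and `threshold` values and the function names are arbitrary choices for the example.

```python
def is_red(pixel, margin=60):
    """True when the red channel exceeds both other channels by at least `margin`."""
    r, g, b = pixel
    return r - max(g, b) >= margin


def red_prominence(pixels):
    """Share of pixels classified as red, between 0.0 and 1.0."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_red(p) for p in pixels) / len(pixels)


def suggest_tags(pixels, threshold=0.4):
    """Suggest the extra search tag 'red' when enough of the object is red."""
    return ["red"] if red_prominence(pixels) >= threshold else []


# Made-up object pixels: six reddish, four near-white.
dress_pixels = [(200, 30, 40)] * 6 + [(240, 240, 240)] * 4

print(red_prominence(dress_pixels), suggest_tags(dress_pixels))
```

A production system would need object segmentation and a perceptual color space rather than a raw RGB margin, which is the "enormously complex logic" the slide refers to.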
Predictive Biology
Business Goal: Rapid extraction of key information from literature sources to collect evidence on biological processes
Input: External and internal literature sources
Process: Text mining is used to link molecules with metabolic processes such as glucose uptake, fatty acid synthesis, metabolic stress, etc. Manual curation can be done to extract assertions and relationships with respect to effect, drug treatment, experiment type, etc.
Benefits: Vital evidence collected on the effect and relationships on species, tissues and linkages to canonical pathways and RNA expression data
Predictive Pre-Clinical Safety
Business Goal: Assessing the incidence of nausea in development compounds by analyzing the preclinical side effect profiles of marketed drugs
Input: Gastrointestinal preclinical findings of marketed drugs
Process: Statistical models are created to find the relation between various preclinical observations and the occurrence of nausea. The model shows clustering of compounds associated with nausea that have higher gastrointestinal preclinical observations
Benefits: The model helps in identifying the risk of nausea early in development. Running this model during compound selection can minimize the risk of seeing nausea in follow-up compounds
Predictive Sciences - Predictive Biology & Predictive Pre-Clinical Safety
Business Goal: Linkage of EMR/Prescriber/Promotion data in order to understand the relationship between prescriber promotions and treatment patterns
Input: EMR data; prescription data; promotion data (email, sales calls, etc.)
Data Processing Steps: Identify key themes in the EMR data for a particular disease type. Use the prescription data to validate the patterns/themes. Merge the findings with the promotion data to uncover any relationships between promotion and treatment
Benefits: Refining targeting/promotional strategies; cost reductions; uncovering potential reasons for choosing a particular therapy
Improving Diagnosis by EHR/EMR Data Analysis
Challenge: Wastage of energy and resources; under-utilized rooms’ temperature and lighting settings; huge energy bills
Analytics: Historic sensor data; blueprints of the building and room layouts; real-time sensor data; temperature settings in the room and building; usage patterns of the room
Benefits: Optimize consumption of energy in business environments. Networked sensors and a new generation of analytics tools play a huge role in gaining insights and implementing the most efficient and sustained energy strategies.
Green Energy, Smart Energy Management
Case
Customer call-center Text Mining
• Contact center channel data is typically analyzed from an SLA perspective: TAT, average wait time.
• However, the actual transcript of the conversation can yield powerful insights regarding telecom infrastructure usage
Collocation Analysis from Cell Phone Towers
• Collocation analysis by an investigation team finds out whether multiple phones were with the same person.
• The examination involves terabytes of CDR/tower records from the switch, from which one can triangulate on a few collocation events
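One simple way to approximate collocation, shown here only as a sketch, is to bucket tower records by tower and time window and count how often each pair of phones shares a bucket. The record layout `(phone, tower, epoch_seconds)`, the 300 second window, the `min_events` cut-off and the function names are assumptions for the example, not details from the presentation.

```python
from collections import defaultdict
from itertools import combinations

WINDOW_SECONDS = 300  # assumed bucket size


def collocation_counts(records, window=WINDOW_SECONDS):
    """Count, per phone pair, the (tower, time bucket) slots in which both were seen."""
    phones_by_slot = defaultdict(set)
    for phone, tower, timestamp in records:
        phones_by_slot[(tower, timestamp // window)].add(phone)
    counts = defaultdict(int)
    for phones in phones_by_slot.values():
        for pair in combinations(sorted(phones), 2):
            counts[pair] += 1
    return dict(counts)


def likely_same_person(records, min_events=3):
    """Phone pairs collocated in at least `min_events` slots."""
    return sorted(pair for pair, n in collocation_counts(records).items() if n >= min_events)


# Made-up tower records: phones A and B travel together, C appears once near them.
records = [
    ("A", "T1", 0), ("B", "T1", 10), ("C", "T1", 20),
    ("A", "T2", 1000), ("B", "T2", 1020),
    ("A", "T3", 5000), ("B", "T3", 5050),
]

print(collocation_counts(records))
print(likely_same_person(records))
```

Fixed buckets can split two sightings that fall on either side of a boundary; at terabyte scale the same grouping would run on a distributed engine, with the bucketing key doing the partitioning.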
Multi-device Event Stream Analysis correlating Firewall, IDS & Switch activity
• In most telecom infrastructures, the IDS (intrusion detection system) sits at the periphery, with network monitoring, firewall and application logs captured in silos
• Deploy a central log file repository, with events streaming from multiple devices ingested and collated centrally
• Channels into intelligence on the network infrastructure and security of the telecom assets
• Significantly improves detection of everything from malware to spear phishing attempts to breach security
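The central repository idea can be sketched as merging the per-device logs into one time-ordered stream and flagging sources that several device types report within a short window. The log tuples, IP addresses, window size and function names below are invented for illustration; `heapq.merge` is used because it lazily merges inputs that are already sorted.

```python
import heapq
from collections import defaultdict

# Each device log is already sorted by time: (epoch_seconds, device, source_ip, message).
firewall = [(100, "firewall", "10.0.0.5", "blocked outbound"), (400, "firewall", "10.0.0.9", "allowed")]
ids = [(110, "ids", "10.0.0.5", "signature match")]
switch = [(130, "switch", "10.0.0.5", "port scan"), (900, "switch", "10.0.0.7", "link up")]


def merged_stream(*logs):
    """Lazily merge sorted device logs into one stream ordered by timestamp."""
    return heapq.merge(*logs)


def correlated_sources(events, window=60, min_devices=2):
    """Source IPs reported by at least `min_devices` device types within `window` seconds."""
    recent = defaultdict(list)  # source_ip -> [(timestamp, device)] still inside the window
    flagged = set()
    for timestamp, device, source_ip, _message in events:
        kept = [(t, d) for t, d in recent[source_ip] if timestamp - t <= window]
        recent[source_ip] = kept + [(timestamp, device)]
        if len({d for _t, d in recent[source_ip]}) >= min_devices:
            flagged.add(source_ip)
    return sorted(flagged)


print(correlated_sources(merged_stream(firewall, ids, switch)))
```

Looked at in isolation, each of the three logs shows a single unremarkable event for the flagged source; only the collated stream shows them occurring together.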
Optimizing cost of Telecom Tower Maintenance
• Big Data platform manages fuel consumption data in the telecom tower business
• Each telecom tower has a generator, and one of the biggest components of cost is diesel
• Sensors/energy meters constantly emit large streams of operational data
• Machine learning algorithms crawl through operational data stored over years to predict and optimize cost and revenue
User Behavior Analysis
• Operational systems at each telecom service provider generate huge data volumes in the form of Call Data Records (CDR) for each call/SMS handled
• Signaling data between various switches, nodes, and terminals within the network
• Mining of this data leads to insights for improving marketing operations and for network and service optimization
Planning Sales Approach
• Large-scale data analysis boosts the ability to pinpoint exactly where the ongoing sales approach could make further gains
• Study the behavior of customers to see what factors motivated them to choose one brand or product over another.
• This involves analyzing online search data and real-time information about the company’s products and services, shared by consumers across social networks and other Web-based channels
• Brand affinity and customer sentiment are measured using sentiment analysis algorithms
Big Data Analytics In Telecom
Big Data Analytics
✓ Hype, Buzz & Myth
✓ “How?” vs. “Why?”
✓ Big Data Analytics for Business, rather than just for IT
✓ Business Case Justification
✓ Right Partners For Your Big Data Analytics Journey
✓ Evangelization and Alignment
✓ Business Onboarding
✓ Execution Plan And Course Corrections
✓ Talent and Knowledge Management
✓ Right math for ROI
Big Data Analytics : Challenges vs. Opportunities
Source: The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics , By Ralph Kimball
Use case | Vector, matrix, or complex structure | Free text | Image or binary data | Data “bags” | Iterative logic or complex branching | Advanced analytic routines | Rapidly repeated measurements | Extreme low latency | Access to all data required
Search Ranking X X X X X X
Ad Tracking X X X X X X X X
Location or Proximity Tracking X X X X X
Social CRM X X X X X X X
Document Similarity Testing X X X X X X X X
Genomic Analysis X X X X X
Customer Cohort groups X X X X X X
Fraud Detection X X X X X X X X X
Smart Utility Metering X X X X X X
Churn Analysis X X X X X X X
Satellite Image Analysis X X X X
Game Gesture Analysis X X X X X X X X
Data Bag Exploration X X X X X X
Ad Tracking / Click stream analytics
Location or Proximity Tracking
Social Media Analytics / Social CRM
Document Similarity Testing / Match Making
Customer Cohort Groups
Sensor Monitoring (Flights / Building)
Smart Utility Metering
Call Center Voice Analytics
Log Analytics
Satellite / CAT Image Comparisons
Fraud Detection
Game Online Gesture Analysis
Big Science (Astronomy, weather, atom smashers, Genome decoding)
Search Ranking
Risk Management
Churn Analysis
Data “Bag” Exploration / Causal Factor Analysis
Design Challenge
Thanks Much!