Using Big Data to Your Advantage


Upload: john-repko

Post on 26-Jan-2015


DESCRIPTION

Presentation by John Repko to the Colorado Society for Information Management (http://www.sim-colorado.org/), March 19, 2013. It talks about big data "killer apps," and the two kinds of innovation ("Hindsight" and "Foresight") that big data can bring to any business.

TRANSCRIPT

Page 1: Using big data_to_your_advantage

John Repko -- Pikasoft LLC

Using Big Data to Your Advantage

It’s not just about toy elephants anymore…

March 19, 2013

John Repko – [email protected]

Source: http://blog.questionpro.com/2012/12/24/market-research-trends-2013-big-data/

Page 2: Using big data_to_your_advantage

Big Data Is Not Just About “Big” Data … It’s About FAST Data! (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)

Source: http://www.startribune.com/sports/164830346.html

Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg

So How Did We Get to Big Data Anyway?

Page 3: Using big data_to_your_advantage

There Are Big Data Breakthroughs Everywhere…

I’ve Heard About Big Data Successes…

• “Watson” Wins on Jeopardy: Beat the best Jeopardy players ever
• Google Wins the Search Market: Massively parallel web searches with results back in a tenth of a second
• Progressive’s Instant “Overnight” rate quotes: Progressive creates an insurance quote for every car and truck in the US – every night

Page 4: Using big data_to_your_advantage

How Can I Determine If These Big Data Wins Apply to My Business?

• Where do I put the data?
• How do I load the system?
• How do I find the value in the data?
• How do I present it?
• How long is this going to take?
• How much is this going to cost?

You Need A Proven Approach to Finding the Value in Your Data

Source: http://www.beingjavaguys.com/2013/01/what-is-big-data-introduction-and.html

Page 5: Using big data_to_your_advantage

The Key is to Recognize That There IS a Pattern to Big Data Wins

• Foresight
– We are presented a pattern. What has the outcome been when we’ve seen similar patterns in the past?
• Hindsight
– We are presented an outcome. What pattern of events anticipated the outcome in the past?

The Variety Of Big Data Wins In The Press Fall Into Just Two Solution Patterns

We Don’t Need Dozens Of Solution Approaches For Big Data – Just Two
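The two solution patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HISTORY` records, the event names, and the `foresight` and `hindsight` function names are hypothetical and do not come from the presentation. Foresight looks up what followed a given pattern; hindsight looks up what preceded a given outcome.

```python
from collections import Counter

# Hypothetical history: each record pairs an observed pattern of events
# with the outcome that followed it.
HISTORY = [
    (("late_payment", "address_change"), "default"),
    (("late_payment", "address_change"), "default"),
    (("late_payment", "address_change"), "repaid"),
    (("on_time", "limit_increase"), "repaid"),
    (("on_time", "address_change"), "repaid"),
]


def foresight(pattern, history):
    """Given a pattern, count the outcomes that followed it in the past."""
    return Counter(outcome for past, outcome in history if past == pattern)


def hindsight(outcome, history):
    """Given an outcome, count the patterns that preceded it in the past."""
    return Counter(past for past, result in history if result == outcome)


if __name__ == "__main__":
    print(foresight(("late_payment", "address_change"), HISTORY))
    print(hindsight("repaid", HISTORY))
```

Both functions scan the same history; only the direction of the question changes, which is the point of reducing many use cases to two patterns.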

Page 6: Using big data_to_your_advantage

Big Data Wins – Not “10 Problems” But Only 2

Summary – 10 Common Hadoop-able Problems*

1. Modeling True Risk
• What past patterns led to success or default?
2. Customer Churn Analysis
• What do customer churn patterns predict about our products and markets?
3. Recommendation Engine
• We have search terms – what have the results been from similar searches in the past?
4. Ad Targeting
• We have profile information – what offers have led to sales for similar profiles in the past?
5. PoS Transaction Analysis
• We have your purchase history – what deals might we offer in the future?

Legend: Foresight | Hindsight

In This Light, Let’s Take A Look At The “10 Hadoop-able Problems”

* http://info.cloudera.com/TenCommonHadoopableProblemsWhitePaper.html

Page 7: Using big data_to_your_advantage

Big Data Wins – Not “10 Problems” But Only 2

6. Analyzing Data Logs to Forecast Events
• We have your logs – what patterns of events have anticipated failures before?
7. Threat Analysis
• We have a specific event – what results have we seen from similar threats in the past?
8. Trade Surveillance
• Does this parcel raise any alarms, based on our history of past parcel-tracking?
9. Search Quality
• We have a set of search terms – what have similar searches succeeded in finding in the past?
10. Data “Sandbox”
• We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?

These Two Solution Types Apply Generally To The Hadoop-able Problems

Page 8: Using big data_to_your_advantage

Data Warehouse Advanced Analytics Is Expensive and Generally Restricted To Structured Data

• According to Gartner, enterprise data will grow 650% by 2014. 85% of this data will be “unstructured data”, with a CAGR of 62%, far higher than that of transactional data
• Growth is taking place in areas not well served by RDBMSs and DWs

Source: http://www.vertica.com/writable/knowledge_articles/file/bi_vertica.pdf

Source: http://thecloudtutorial.com/hadoop-tutorial.html

Structured: Managed by RDBMS & DW

Unstructured: Growth Areas Not Managed Well by RDBMS or DW

Page 9: Using big data_to_your_advantage

The Tremendous Growth Of Data Is In Unstructured Data That Is Best Managed Outside The RDBMS

Structured: Managed by RDBMS or DW

Unstructured: Not Managed by RDBMS or DW

Page 10: Using big data_to_your_advantage

The New Areas Of Non-RDBMS Managed Data Are Rich In Business Value And Are Ripe For Analysis

Structured: Managed by RDBMS

Unstructured: Not Managed by RDBMS or DW

Page 11: Using big data_to_your_advantage

Big Data Stores Are Increasingly Architected With Open-Source Tools

• Data Integration: Tools which extract, transform, and load data between Relational and Non-Relational datasets.
• NoSQL Data Store: Datasets structured as columnar, key-value, or document-based in order to overcome limitations in traditional relational modeling for ‘Big’ datasets.
• MapReduce Languages: Higher-level wrapper languages which simplify MapReduce development efforts.
• MapReduce Engine (CloudMapReduce): Processes (‘Map’ and ‘Reduce’ functions) which analyze very large datasets across distributed systems.
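To make the ‘Map’ and ‘Reduce’ functions concrete, here is a minimal single-process sketch of the classic word count. It is an illustration only and assumes nothing about any particular engine: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real engine such as Hadoop would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the engine does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over an iterable of text documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "fast data"]))
```

Because each call to `map_phase` sees one document and each call to `reduce_phase` sees one key, both steps can be spread across machines without coordination, which is what lets the approach scale to very large datasets.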

Page 12: Using big data_to_your_advantage

You Have Data. Here’s What You Need to Unlock It

Needs

• Load the data in a system equipped with the tools to analyze it
– Via a standard interface, or
– Programmatically
• Determine valid relationships in the data
• Analyze the data for these common patterns
• Tune the analytics
• Visualize the results
• Pursue the patterns that emerge

Requirements

• The system has to live where the data lives (otherwise transmission costs become prohibitive)
• REST and SOAP are the most common interfaces
• Bloom Filters can provide set operations in large data sets
• ORM (Object-Relational Mapping) simplifies data access
• Hadoop provides parallelized analysis for unstructured data
• Starfish provides automatic analytics tuning for Hadoop
• Structured data can be analyzed via statistical analysis (for numbers) or free-text search (for text)
• Solution patterns can be applied automatically once the data is sandboxed
• Visualization can help to grasp the key patterns and results

The Right Platform Can Meet All Of These Requirements
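As one example of the requirements above, a Bloom filter answers “have we seen this item?” in a small, fixed amount of memory. The sketch below is a minimal illustration, not a production implementation: the `BloomFilter` class, its default sizes, and the use of SHA-256 with a seed prefix are all assumptions made here. A Bloom filter can report false positives but never false negatives, and two filters built with the same parameters can be merged to represent the union of their sets.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

    def union(self, other):
        """Return a filter representing the union of both sets."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("customer-42")
    print("customer-42" in seen)
```

The memory cost is fixed by `num_bits` regardless of how many items are added, which is why the structure suits large data sets; the trade-off is a false-positive rate that rises as the filter fills.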

Page 13: Using big data_to_your_advantage

Additional Tools: With a Platform for Big Data, We Can Expand Our Analysis with Rich Analytics Tools

Key Big Data Analytics Solution Patterns

1. Predictive Modeling
2. Data Visualization
3. Cluster Partitioning
4. Outlier Analysis
5. A/B Testing
6. Markov Chains

These Patterns Provide a Straightforward Way to Find Big Data Wins – Here’s How

Source: http://www.cognizant.com/InsightsCognizantiarticles/Cognizanti_Sow'sEar_Analytics.pdf
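As a small illustration of one of these patterns, Markov chains, the sketch below estimates first-order transition probabilities from observed state sequences. The function name `transition_probabilities`, the state names, and the sample customer journeys are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current state, next state) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical month-by-month customer states.
JOURNEYS = [
    ["active", "active", "churned"],
    ["active", "at_risk", "churned"],
    ["active", "active", "active"],
]

if __name__ == "__main__":
    for state, followers in transition_probabilities(JOURNEYS).items():
        print(state, followers)
```

With estimates like these, a foresight question such as “what usually follows an at-risk month?” becomes a lookup in the resulting table.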

Page 14: Using big data_to_your_advantage

Big Data And Classic Analysis Patterns Are Creating A New Class Of Enterprise Applications

Data Sources | Data Processing | Data Presentation

• Data Sources: Public Data Sets on AWS
• Data Presentation: Google Chart Tools

These Offerings Emerged In The Consumer Domain And Enterprise Users Are Coming To Have Similar Expectations

Page 15: Using big data_to_your_advantage

But New Applications Will Remain Just Curiosities, “One-Offs” Unless The Underlying Patterns Are Drawn Out

• There’s Nothing New Here: Hadoop is Turing-complete, as are most general-purpose processing and analytics packages

• To provide richer insights, tools like Hadoop need more advanced processing patterns:

Basic Patterns: Filtering | Parsing | Counting/Summing | Collating | Sorting | Distributed Tasks | Chained Jobs

Advanced Patterns: Distinct | Group By | Secondary Sorts | Joins | Distributed Sorting

Leading-Edge Work: Classification | Clustering | Regression | Dimension Reduction | Evolutionary Code

To See More Advanced Patterns and Richer Presentation, The Basic Patterns Must First Become Routine
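A few of the basic and advanced patterns named above can be expressed as small mapper and reducer pairs. The sketch below is a single-process illustration under stated assumptions: the `map_reduce` helper, the `SALES` records, and the three pattern functions are hypothetical names introduced here, not part of Hadoop or of the presentation.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical sales records: (region, customer, amount)
SALES = [
    ("west", "acme", 120.0),
    ("west", "acme", 80.0),
    ("east", "globex", 50.0),
    ("west", "initech", 40.0),
]


def sum_by_region(records):
    """Group By + Summing: total amount per region."""
    def mapper(record):
        region, _customer, amount = record
        yield region, amount

    def reducer(_region, amounts):
        return sum(amounts)

    return map_reduce(records, mapper, reducer)


def distinct_customers(records):
    """Distinct: emit each customer as a key so duplicates collapse."""
    def mapper(record):
        yield record[1], None

    def reducer(_customer, _values):
        return None

    return sorted(map_reduce(records, mapper, reducer))


def large_sales_by_region(records, threshold):
    """Filtering + Counting: the mapper drops small sales, the reducer counts."""
    def mapper(record):
        region, _customer, amount = record
        if amount >= threshold:
            yield region, 1

    def reducer(_region, ones):
        return sum(ones)

    return map_reduce(records, mapper, reducer)


if __name__ == "__main__":
    print(sum_by_region(SALES))
    print(distinct_customers(SALES))
    print(large_sales_by_region(SALES, 60.0))
```

Each pattern differs only in which key the mapper emits and how the reducer combines values, which is why these patterns can become routine building blocks.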

Page 16: Using big data_to_your_advantage

Software Will Capture the Value of Intellectual Property

2012 Internet Company Valuations as % of Revenue

• Pure services companies generally yield a company valuation of 0.5 to 1.0x Annual Revenue

• Recurring revenue businesses (hosting, support) typically generate 2.5 – 4.0x Revenue

• Product businesses derive their multiples from growth, product margin, network effects, customer lock-in, and ecosystem effects. With a good product, valuations of > 5x Revenue are possible

http://abovethecrowd.com/wp-content/uploads/2011/05/pr_mults.png

Page 17: Using big data_to_your_advantage

Capturing Trends – Where Is the IT Industry Headed?

IT Product Breakthroughs Happen When Technology Advances Invalidate “Old” Product Assumptions. Here Are The Principal Areas Where Old Assumptions Will Be Obsoleted.

• 5 major trends
– Big Data: Big Data Just Beginning to Explode
– Cloud: Cloud Computing Market Size – Facts and Trends
– In-Memory: The Coming In-Memory Database Tipping Point
– Handheld: Five Emerging Trends in Analytics
– Real-time: Using Analytics to Create a Sense-and-Respond Organization

Page 18: Using big data_to_your_advantage

Capturing Trends – Why Bother? Who Cares?

• Big Data:
– According to Michael Stonebraker and Jeremy Kepner, the future of Hadoop is doomed
– According to Mike Miller of Cloudant, the days are numbered for Hadoop as we know it
• Cloud:
– Even PCI and HIPAA data is evolving into cloud-hosted models
• In-Memory:
– Spinning disk is "the new tape" (overflow, recovery)
• Handheld:
– Mobile Internet devices will outnumber humans this year, Cisco predicts
• Real-time:
– Future of computing technology belongs to handheld devices

“You can’t just ask customers what they want and then try to give that to them. By the time you get it built, they’ll want something new. It took us three years to build the NeXT computer. If we’d given customers what they said they wanted, we’d have built a computer they’d have been happy with a year after we spoke to them — not something they’d want now.”

~ Steve Jobs

Page 19: Using big data_to_your_advantage


The Cloud Provides a Platform For Do It Yourself Analytics

• Why the cloud matters

– Analytics cannot be “do it yourself” until everyone has access to a platform suitable for holding and processing Big Data.

– Only the cloud has the scale, speed, and availability to process Big Data universally

• What it gives us that is unique and differentiating

– Big Data projects today are 1) expensive, 2) long lead-time, and 3) run on masses of local hardware. With inevitable commoditization this has to change.

– The trend is to “do it yourself” analytics – if we build the ability to give do it yourself analytics, applications will appear that were inconceivable before the environment was created

• What we need to make happen

– Robustness – at least 3-nines of availability and zero data loss

– Security – starting with things like 5 Ways Amazon Web Services Protects Cloud Data

– Privacy – where it begins: Complying to the Higher Standard
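To put the “3-nines” availability target in concrete terms, the arithmetic can be worked through in a few lines. This is a simple illustration: the function name `allowed_downtime_hours` is hypothetical, and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(nines):
    """Maximum yearly downtime for an availability with `nines` nines."""
    unavailability = 10 ** -nines  # 3 nines -> 99.9% up, 0.1% down
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (2, 3, 4):
        print(f"{nines} nines: {allowed_downtime_hours(nines):.2f} hours/year")
```

Three nines (99.9%) therefore allows roughly 8.76 hours of downtime per year, and each additional nine cuts that budget by a factor of ten.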


Page 20: Using big data_to_your_advantage


Handhelds Make Analytics Available Everywhere

• Why handheld client delivery matters

– There are now more smartphones than client PCs

– More than 25% of users use smartphones for their primary web access

– The future of internet computing is mobile

• What it gives us that is unique and differentiating

– Hadoop is dreadfully mismatched with handheld access (batch, no standard client or reporting interface)

– Upcoming in-memory databases (HANA, Vertica, VoltDB) will mesh much better with handheld access

• What we need to make happen

– Make handheld our primary target UI (design for thumbs, not mice … and more)

– Target do-it-yourself analytics use cases


Page 21: Using big data_to_your_advantage


Real-time Makes Previously Unthinkable Apps Possible

• Why real-time matters

– Users increasingly expect real-time analytics

– The first wave of real-time analytics tools is becoming available

• What it gives us that is unique and differentiating

– "Self-service" analytics

– Intuitive and unconstrained data exploration

– Instant visualization of complex datasets

– Viable plays for a variety of asset types

• Credit card debt, student loan debt, properties, insurance, etc.

• What we need to make happen

– If Hadoop: we must evolve to interactive batch execution (or overnight batch, like Progressive Insurance)

– If in-memory DB: we need to select and groom a handheld interface and design for sub-100ms response times


Page 22: Using big data_to_your_advantage

Beyond Big Data – The Emerging Big Data Tech Platform

Here’s Where Our World Is Headed

Reports | Data Warehouses | Big Data | DIY Analytics

For what? | By whom? | What? | With what? | Stored where? | Processed where? | How? | When?

RDBMS | In-Memory RDBMS

On-Premise | Distributed | Cloud

Structured Data | DWs | Big Data | Universal Data

Batch | Hadoop Batch | Always

Hindsight | Foresight

Lumpenprogramming | Today | Tomorrow

Report Specialists | Data Scientists | Everyone

What Happened? | Why Did That Happen? | What’s Next?

Page 23: Using big data_to_your_advantage

The Future: Here’s What The Evolution Looks Like

Trend | Development Initiatives | Who’s Doing It

Big Data
• Development Initiatives:
– APIs. No one is likely to reach a market with Big Data analytics fronted by their own UI. Success will come from API links to:
– Level 1: REST Access API
– Level 2: Plug-in API
– Level 3: Runtime environment
• Who’s Doing It: Open territory! Infochimps has Level 1, Amazon (Elastic MapReduce) has Levels 2 and 3. Who else will play?

Cloud
• Development Initiatives:
– All of the Cloud players are investigating DB-rich offerings
– VoltDB options with AWS High IO option
– “38% of all companies are planning a BI SaaS project before the end of 2013.”
• Who’s Doing It: Everybody: Amazon, Rackspace, Heroku ... Accenture

In-Memory
• Development Initiatives:
– Move demo to DAHANA architecture (not hand-coded)
– Select non-HANA in-memory DB (probably VoltDB) as secondary platform
– Hadoop evolves from a processing platform to an ETL gateway from unstructured to structured data
• Who’s Doing It: SAP / HANA, HP / Vertica, other NewSQL players

Handheld
• Development Initiatives:
– Evolving UIs with HTML5 + jQuery Mobile
– Reporting platforms increasingly offer mobile interfaces
– Review Big Data interfaces to iPad and Android devices
• Who’s Doing It: Two principal camps -- Apple iOS and Android

Real-Time
• Development Initiatives:
– Investigate CDN options for Big Data deployment
– Confirm DB performance on buffer pool, locking, latching, recovery
– Design for sub-100ms delivery
• Who’s Doing It: Just getting started...

Page 24: Using big data_to_your_advantage

Today’s Tools: The Killer Apps

Today’s Killer Apps: Recommendation Engine For Enhanced Retail Marketing

• Vision:
– Target Audience: Product Executives
– Anticipated Benefit: Keep up with market leader Amazon, build up-sell and cross-sell revenue
– Delivered Benefit: Better market segmentation, enhanced revenue through “customers who bought xxx also bought...” recommendations.
– Alternatives: CRM recommendations do not draw on deep sense of customer intent
– Why It Kills: Provable revenue growth through A-B testing

• How to Implement It:
– Proof of Concept: Small cloud-based recognition engine, based on readily-available (customer profile, purchase history) data stores
– Initial Rollout: Still cloud-based, but with broader streams (e.g. search histories) and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into retailing process and systems
– Full Rollout: Could be cloud or in-house, but moving to richer streams and real-time (i.e. in-memory) data access
– Maintenance: Tools updates, streams updates, transition to real-time data access
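A proof of concept for “customers who bought xxx also bought...” recommendations can start from simple co-purchase counts over purchase history. The sketch below is a minimal illustration and not the approach of any particular retailer: the function names `build_co_purchases` and `also_bought`, and the sample baskets, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchases(baskets):
    """Count how often each ordered pair of items shares a basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores repeated items within a single basket.
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def also_bought(item, co_counts, top_n=3):
    """Return the items most often bought together with `item`."""
    followers = co_counts.get(item, Counter())
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase history, one list per order.
BASKETS = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "case"],
    ["laptop", "case"],
]

if __name__ == "__main__":
    counts = build_co_purchases(BASKETS)
    print(also_bought("camera", counts, top_n=2))
```

The counting step is a natural fit for a batch job over purchase history, while `also_bought` is a cheap lookup that could later move to real-time data access as the rollout stages describe.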

Page 25: Using big data_to_your_advantage

Today’s Tools: The Killer Apps

Today’s Killer Apps: Analysis and Prediction Engine

• Vision:
– Target Audience: High-end retailers with profitable service contracts (e.g. computers, cameras, sound systems)
– Anticipated Benefit: Increase penetration rate of service contracts by pre-calculating terms in advance of sale or service renewal
– Delivered Benefit: Reward customers who have historically low service costs, and increase penetration of profitable service deals by pre-calculation of ideal rates
– Alternatives: Consumers generally know one-size-fits-all service contracts are overpriced. If you can’t fit the terms to the customer then you can’t complete the service contract
– Why It Kills: Big data approach pre-calculates appropriate terms for all customers in advance of a sales or renewal transaction

• How to Implement It:
– Proof of Concept: Small cloud-based run with limited data sets to confirm data adoption approaches and identify most profitable segments in that sub-population
– Initial Rollout: Still cloud-based, but with larger data sets and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into promotions and targeted marketing
– Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications across the full customer set
– Maintenance: Tools updates, stores updates, transition to real-time data access and notifications

Page 26: Using big data_to_your_advantage

Today’s Tools: The Killer Apps

Today’s Killer Apps: Log Analysis Engine

• Vision:
– Target Audience: Utilities executives
– Anticipated Benefit: Sell an energy or utilities package that better fits customer interests and reduces customer costs while increasing energy/utility margins
– Delivered Benefit: Customer gets a package that better fits their specific interests (e.g. “green”) and exec sells higher-margin offerings
– Alternatives: A one-size-fits-all plan does not capture customer interests or deliver high-margin offerings well
– Why It Kills: More customized packages better fit customer needs while reducing capital expenses and increasing margins for the utility

• How to Implement It:
– Proof of Concept: Small cloud-based run with limited data sets to capture basic patterns and confirm data adoption approaches
– Initial Rollout: Still cloud-based, but with larger data stores and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into production logs with reporting
– Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications
– Maintenance: Tools updates, stores updates, transition to real-time data access and notifications

Page 27: Using big data_to_your_advantage

Summary

This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine

The Solution Tools (Slide 13) Become Straightforward if We Run Them on a Standard Architecture

“One man’s noise is another man’s data.”

~ Bill Stensrud - InstantEncore

Page 28: Using big data_to_your_advantage

Contacts

• John Repko: [email protected] - (720) 624-6025

https://pikasoft.s3.amazonaws.com/Using_Big_Data_To_Your_Advantage.ppt