Transcript
Page 1: Real-Time Analytics at Salesforce.com

Real-Time Analytics at Salesforce.com

Donovan Schneider, Principal Architect

SDForum, May 2010

Page 2: Real-Time Analytics at Salesforce.com

Agenda

Motivation
Our approach
Making it work
Conclusions and future directions

Page 3: Real-Time Analytics at Salesforce.com

Evolution of Business Intelligence: Canned reports ➜ Ad-hoc query ➜ DW ➜ Real-Time Cloud Analytics

More than 50 percent of data warehouse projects will have limited acceptance, or be an outright failure

Page 4: Real-Time Analytics at Salesforce.com

Real-time. Always.

Accessible By Mere Mortals

Flexible. In Sync. Reportable.

Our Vision for CRM Analytics

Deliver Insight That is Accessible, Real-time, and Trustworthy

Page 5: Real-Time Analytics at Salesforce.com

What Drives Actionable Insight?

Reporting and Analytics that are…

Relevant: Powerful capabilities to answer real-world business questions
Easy to Use: Business user friendly
Responsive: Fast performance, timely insight when needed
Actionable: Integrated into the CRM to enable actions from insight
Reliable: Accurate & consistent results

Increased User Adoption

Real-Time Visibility

User Adoption

Actionable Insight

Page 6: Real-Time Analytics at Salesforce.com

And Our Customers Use It. A Lot!

12M+ reports, 2.5M+ run per day

750K dashboards, 700K views per day

Page 7: Real-Time Analytics at Salesforce.com

Agenda

Motivation
Our approach
Making it work
Conclusions and future directions

Page 8: Real-Time Analytics at Salesforce.com

We take a fundamentally different approach than most

Complicated, Out of Date, & Rigid
- Painfully slow unless against DW
- Data is never fresh against DW
- Changes to CRM propagate slowly
- ETL process is complicated and expensive

Easy, Real-Time and Flexible
- Usable by all, from rep to SVP
- Real-time, all the time
- Flexible & customizable
- Powerful w/o complicated DW
- One sharing model

[Diagram labels: Single tenant DW; CRM; Other Systems; ERP; Other Clouds; 72,000+ Companies; HR; Other Systems; ETL Processes; Real-Time Reporting; DW Reporting; 1x / day]

Page 9: Real-Time Analytics at Salesforce.com

Is a Data Warehouse Needed for CRM Analytics?

Why people think they need a DW

Performance
- Pre-aggregation was the only way to get decent reporting/analytics performance out of OLTP/CRM

Requirement to combine multiple data sources in 1 report/dashboard
- CRM systems were hard to integrate with external data sources
- And then, they were not built at all for BI

Business View of the Data
- Corporate-wide ontology
- Single view of the customer
- Historical data capture

Why we don’t need one

Cloud Computing Scale & Multi-tenant Optimization Engine

Force.com API
- 200M+ API calls/day
- 10M records/hour
- Consumes External Web Services

Entire System is Business Driven
- Business people configure the system with their business terms, there is no IT translation req’d
- Sales, Service, Analytics Clouds are all on same platform
- History Tables and Analytical snapshots

And DW based architecture makes the system out of date, rigid and expensive

Page 10: Real-Time Analytics at Salesforce.com

Analytics

Dashboards Reports List Views Search

Page 11: Real-Time Analytics at Salesforce.com

Agenda

Motivation
Our approach
Making it work
Conclusions and future directions

Page 12: Real-Time Analytics at Salesforce.com

Building a Multi-tenant Cloud Platform is Hard!

Lots of Pieces to Assemble!
- Relational / Text / Non-relational
- Application Services / Lifecycles
- Caching and Performance
- Scalability and Reliability
- Infrastructure and Backups
- Release Processes

Development Lifecycle

Page 13: Real-Time Analytics at Salesforce.com

Brief Review of Force.com Multi-tenancy

Real-time App Composition

Massive Shared Database

Shared General Purpose Kernel

Page 14: Real-Time Analytics at Salesforce.com

True Multi-tenancy: Why Share Everything?

~15 Databases ~100 Servers

2 Mirrors

100,000’s of Unique Applications

1 Code Base

Page 15: Real-Time Analytics at Salesforce.com

Force.com Data Architecture

Shared Metadata Cache

Bulk Processing Engine

Multi-Tenant-Aware Query Optimizer

Runtime Application Generator

Full-Text Search Engine

Real-time App Composition

Page 16: Real-Time Analytics at Salesforce.com

Sharing Relational Data Structures is Hard

Your Definitions

Your Data

Your Optimizations

Indexes: Pivot table for non-unique indexes
UniqueFields: Pivot table for unique indexes
Relationships: Pivot table for foreign keys
MRUIndex: Pivot table for most-recently-used
FallBackIndex: Pivot table for Name field index
…others…

Harrah’s Data

Dell’s Products

Your Rep’s Data
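To make the pivot-table idea concrete, here is a minimal sketch of how a unique-index pivot table could enforce uniqueness per tenant while every tenant shares the same structure. The class name, the field name SerialNumber, and the in-memory dictionary are all assumptions for illustration; the slides do not describe the actual implementation.

```python
class UniqueFieldsPivot:
    """Hypothetical in-memory stand-in for a shared unique-index pivot table."""

    def __init__(self):
        # Key: (tenant, field, value). Value: id of the record holding that value.
        self._rows = {}

    def claim(self, tenant, field, value, record_id):
        """Register a value for a record, rejecting duplicates within one tenant."""
        key = (tenant, field, value)
        owner = self._rows.get(key)
        if owner is not None and owner != record_id:
            raise ValueError(f"{field}={value!r} already used by record {owner}")
        self._rows[key] = record_id


pivot = UniqueFieldsPivot()
pivot.claim("Dell", "SerialNumber", "SN-1", 1000008)
# The same value under a different tenant is allowed: the tenant is part of the key.
pivot.claim("Harrah's", "SerialNumber", "SN-1", 1000004)
```

Because the tenant is part of the key, one shared structure can serve every tenant without one tenant's values colliding with another's.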

Page 17: Real-Time Analytics at Salesforce.com

Flex Schema on Steroids: Everyone’s Data

Flex Column: Multiple Data Types

ID       Tenant    Data 2
1000001  Harrah’s  $190
1000002  Harrah’s  $250
1000003  Harrah’s  $680
1000004  Harrah’s  Poker
1000005  Harrah’s  Black Jack
1000006  Harrah’s  Craps
1000007  Dell      Display
1000008  Dell      Laptop
1000009  Dell      Server
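The flex column above holds amounts and names in the same physical slot. One way this can work, sketched here under assumptions, is to store every value as text and let per-tenant metadata say how to decode it. The object names (Wager, Game, Product), the SLOT_TYPES mapping, and the helper functions are hypothetical and not taken from the talk.

```python
from decimal import Decimal

# Shared table rows: (id, tenant, object_name, data_2). Flex values are stored as text.
ROWS = [
    (1000001, "Harrah's", "Wager", "190"),
    (1000004, "Harrah's", "Game", "Poker"),
    (1000007, "Dell", "Product", "Display"),
]

# Hypothetical metadata: the logical type of the "Data 2" slot per (tenant, object).
SLOT_TYPES = {
    ("Harrah's", "Wager"): "currency",
    ("Harrah's", "Game"): "text",
    ("Dell", "Product"): "text",
}


def decode(tenant, object_name, raw):
    """Convert a stored text value to its logical type using the metadata."""
    if SLOT_TYPES[(tenant, object_name)] == "currency":
        return Decimal(raw)
    return raw


def rows_for(tenant):
    """Return (id, decoded value) pairs for one tenant only."""
    return [
        (rid, decode(t, obj, raw))
        for rid, t, obj, raw in ROWS
        if t == tenant
    ]
```

Every read is scoped by tenant first, then interpreted through that tenant's metadata, so the shared table never needs a per-tenant schema change.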

Page 18: Real-Time Analytics at Salesforce.com

Flex Schema: Everyone’s Optimizations

Multi-Tenant Table

ID       Tenant    Data 2
1000001  Harrah’s  $190
1000002  Harrah’s  $250
1000003  Harrah’s  $680
1000004  Harrah’s  Poker
1000005  Harrah’s  Black Jack
1000006  Harrah’s  Craps
1000007  Dell      Display
1000008  Dell      Laptop
1000009  Dell      Server

Multi-tenant Index

Tenant    Text        Number
Harrah’s              $190
Harrah’s              $250
Harrah’s              $680
Harrah’s  Poker
Harrah’s  Black Jack
Harrah’s  Craps
Dell      Display
Dell      Laptop
Dell      Server

Sync Copy
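The sync copy into a typed index can be sketched as follows. This is an illustration under assumptions, not the platform's code: the TABLE layout, the "kind" tag, and the function names are hypothetical. The point it shows is that copying flex values into typed Text and Number columns lets range lookups compare numbers as numbers rather than as strings.

```python
from bisect import bisect_left, bisect_right

# Hypothetical shared table rows: (id, tenant, raw flex value, logical kind).
TABLE = [
    (1000001, "Harrah's", "190", "number"),
    (1000002, "Harrah's", "250", "number"),
    (1000003, "Harrah's", "680", "number"),
    (1000004, "Harrah's", "Poker", "text"),
    (1000007, "Dell", "Display", "text"),
    (1000008, "Dell", "Laptop", "text"),
]


def sync_copy(table):
    """Copy flex values into two typed, sorted index lists keyed by tenant."""
    numbers, texts = [], []
    for rid, tenant, raw, kind in table:
        if kind == "number":
            numbers.append((tenant, float(raw), rid))
        else:
            texts.append((tenant, raw, rid))
    return sorted(numbers), sorted(texts)


def number_range(numbers, tenant, low, high):
    """Return ids for one tenant whose numeric value lies in [low, high]."""
    start = bisect_left(numbers, (tenant, low, -1))
    stop = bisect_right(numbers, (tenant, high, float("inf")))
    return [rid for _, _, rid in numbers[start:stop]]


NUMBERS, TEXTS = sync_copy(TABLE)
```

Leading each index entry with the tenant keeps one tenant's entries contiguous, so a lookup touches only that tenant's slice of the shared index.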

Page 19: Real-Time Analytics at Salesforce.com

Reporting Index Optimization

Multi-Tenant Table

ID       Tenant    Data 2
1000001  Harrah’s  $190
1000002  Harrah’s  $250
1000003  Harrah’s  $680
1000004  Harrah’s  Poker
1000005  Harrah’s  Black Jack
1000006  Harrah’s  Craps
1000007  Dell      Display
1000008  Dell      Laptop
1000009  Dell      Server

Reporting Index

Tenant  Data 2   Data 7  …  Data k
Dell    Display
Dell    Laptop
Dell    Server

Sync Copy

Page 20: Real-Time Analytics at Salesforce.com

But How Do You Make the Queries Fast?

Real-time App Composition

Shared Metadata Cache

Bulk Processing Engine

Multi-Tenant-Aware Query Optimizer

Runtime Application Generator

Full-Text Search Engine

Page 21: Real-Time Analytics at Salesforce.com

A Real World Question

Michael Dell wants to know if Servers are selling well in the West.

How will Force.com answer this question quickly?

Page 22: Real-Time Analytics at Salesforce.com

[Graphic: data table with columns ID, Data 1, Data 2 and placeholder rows]

Visibility

Indexes

Millions of Sales Line Items

The fastest path to the answer

[Graphic: data table with columns ID, Data 1, Data 2 and placeholder rows]

M. Dell

Servers

West

Multi-tenant Query Optimizer

Page 23: Real-Time Analytics at Salesforce.com

Run pre-queries
- Check user visibility
- Check filter selectivity

Build query based on results of pre-queries

Execute query

User Visibility = # of rows that the user can access

Filter Selectivity = How specific is this filter?

Multi-tenant Query Optimizer

Shared Visibility

Shared Indexes

[Graphic: two copies of a data table with columns ID, Data 1, Data 2 and placeholder rows]

Stop

Go

Multi-tenant Optimizer Statistics
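The two pre-query measurements on this slide can be sketched in a few lines. Everything concrete here is an assumption for illustration: the sample SALES rows, the VISIBLE sharing map, the user names, and both cutoff values are hypothetical, and the talk does not say how the real optimizer sets its thresholds.

```python
# Hypothetical sales line items for one tenant.
SALES = [
    {"id": 1, "product": "Server", "region": "West"},
    {"id": 2, "product": "Server", "region": "East"},
    {"id": 3, "product": "Laptop", "region": "West"},
    {"id": 4, "product": "Laptop", "region": "East"},
    {"id": 5, "product": "Display", "region": "West"},
    {"id": 6, "product": "Display", "region": "East"},
]

# Hypothetical sharing data: which row ids each user may see.
VISIBLE = {"m.dell": {1, 2, 3, 4, 5, 6}, "rep": {1}}


def user_visibility(user):
    """Number of rows that the user can access."""
    return len(VISIBLE.get(user, set()))


def filter_match_fraction(rows, predicate):
    """Fraction of rows matching the filter; a small fraction is a specific filter."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)


def classify(user, rows, predicate, visibility_cutoff=3, fraction_cutoff=0.25):
    """Label both measurements High or Low using assumed cutoffs."""
    user_level = "High" if user_visibility(user) > visibility_cutoff else "Low"
    fraction = filter_match_fraction(rows, predicate)
    filter_level = "High" if fraction <= fraction_cutoff else "Low"
    return user_level, filter_level


def servers_in_west(row):
    return row["product"] == "Server" and row["region"] == "West"
```

In this sketch a user who sees all six rows and asks about Servers in the West is labeled High for both measurements, since the filter matches one row in six.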

Page 24: Real-Time Analytics at Salesforce.com

Acting on pre-queries

Pre-query selectivity measurements, and how the final database query is constructed:

User   Filter   Construct final database query, forcing…
Low    Low      …nested loops join; drive using view of rows that the user can see
Low    High     …use of index related to filter
High   Low      …ordered hash join; drive using data table
High   High     …use of index related to filter
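The decision table above maps directly onto a lookup. The sketch below restates the four rows from the slide; the function and dictionary names are hypothetical, and the returned strings describe the strategy rather than emit real database hints.

```python
# Keys are (user measurement, filter measurement) as labeled on the slide.
PLAN_BY_MEASUREMENT = {
    ("Low", "Low"): "nested loops join; drive using view of rows that the user can see",
    ("Low", "High"): "use of index related to filter",
    ("High", "Low"): "ordered hash join; drive using data table",
    ("High", "High"): "use of index related to filter",
}


def choose_plan(user_level, filter_level):
    """Return the strategy to force for the final query."""
    try:
        return PLAN_BY_MEASUREMENT[(user_level, filter_level)]
    except KeyError:
        raise ValueError(
            f"expected 'Low' or 'High', got {user_level!r}, {filter_level!r}"
        ) from None
```

Note that a High filter measurement leads to the filter's index regardless of the user measurement; the user measurement only changes the plan when the filter measurement is Low.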

Page 25: Real-Time Analytics at Salesforce.com

Report Execution

[Diagram labels: Joins, Filters, Hints; Aggregations; Sorts, Aggregations; Filters; Application Server (three instances); rows; cache; Pre-queries; SQL]

Page 26: Real-Time Analytics at Salesforce.com

Agenda

Motivation
Our approach
Making it work
Conclusions and future directions

Page 27: Real-Time Analytics at Salesforce.com

Conclusions

Real-time BI is possible
A data warehouse is not required
Interesting technical challenges
– Cannot rely on database’s cost-based optimizer
– Sophisticated sharing models great for customers but technically challenging
– Real-time data limits caching applicability
– Protect tenants from each other

We need help

Page 28: Real-Time Analytics at Salesforce.com

Technical Direction

Increased focus on usability
Expanded analytical capabilities
Collaboration
Scalability

