
Real-Time Analytics at Salesforce.com

Donovan Schneider, Principal Architect

SDForum, May 2010


Agenda
- Motivation
- Our approach
- Making it work
- Conclusions and future directions


Evolution of Business Intelligence: canned reports, ad-hoc query, data warehouse (DW), real-time cloud analytics

More than 50 percent of data warehouse projects will have limited acceptance, or be an outright failure

Real-time. Always. Accessible by mere mortals. Flexible. In sync. Reportable.

Our Vision for CRM Analytics: Deliver Insight That Is Accessible, Real-Time, and Trustworthy

What Drives Actionable Insight?

Reporting and analytics that are:
- Easy to use: business-user friendly
- Relevant: powerful capabilities to answer real-world business questions
- Responsive: fast performance, timely insight when needed
- Actionable: integrated into the CRM to enable actions from insight
- Reliable: accurate and consistent results

Result: increased user adoption, real-time visibility, and actionable insight.

And Our Customers Use It. A Lot!
- 12M+ reports, 2.5M+ run per day
- 750K dashboards, 700K views per day


We take a fundamentally different approach than most.

Typical DW-based approach (complicated, out of date, and rigid):
- Painfully slow unless run against a DW
- Data is never fresh when reporting against the DW
- Changes to the CRM propagate slowly
- The ETL process is complicated and expensive

Our approach (easy, real-time, and flexible):
- Usable by all, from rep to SVP
- Real-time, all the time
- Flexible and customizable
- Powerful without a complicated DW
- One sharing model

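[Diagram: in the traditional setup, ETL processes load CRM, ERP, HR, and other systems into a single-tenant DW, and DW reporting is refreshed 1x/day; in the cloud, 72,000+ companies get real-time reporting alongside other clouds and systems.]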


Is a Data Warehouse Needed for CRM Analytics?

Why people think they need a DW:
- Performance: pre-aggregation was the only way to get decent reporting/analytics performance out of OLTP/CRM systems.
- Requirement to combine multiple data sources in one report/dashboard: CRM systems were hard to integrate with external data sources, and they were not built for BI at all.
- Business view of the data: a corporate-wide ontology, a single view of the customer, historical data capture.

Why we don't need one:
- Cloud computing scale and a multi-tenant optimization engine.
- Force.com API: 200M+ API calls/day, 10M records/hour; consumes external web services.
- The entire system is business-driven: business people configure the system in their own business terms, with no IT translation required. Sales, Service, and Analytics Clouds all run on the same platform, with history tables and analytical snapshots.

And a DW-based architecture makes the system out of date, rigid, and expensive.


Analytics, Dashboards, Reports, List Views, Search



Building a Multi-tenant Cloud Platform is Hard! Lots of Pieces to Assemble!
- Relational / text / non-relational
- Application services / lifecycles
- Caching and performance
- Scalability and reliability
- Infrastructure and backups
- Release processes
- Development lifecycle


Brief Review of Force.com Multi-tenancy

Real-time App Composition

Massive shared database; shared general-purpose kernel

True Multi-tenancy: Why Share Everything?
- ~15 databases
- ~100 servers
- 2 mirrors
- 100,000s of unique applications
- 1 code base

Force.com Data Architecture
- Shared metadata cache
- Bulk processing engine
- Multi-tenant-aware query optimizer
- Runtime application generator
- Full-text search engine
- Real-time app composition

Sharing Relational Data Structures is Hard
Your definitions, your data, and your optimizations all live in shared structures.
Shared data: Harrah's data, Dell's products, your reps' data.
Optimizations are stored as pivot tables:
- Indexes: pivot table for non-unique indexes
- UniqueFields: pivot table for unique indexes
- Relationships: pivot table for foreign keys
- MRUIndex: pivot table for most-recently-used records
- FallBackIndex: pivot table for the Name field index
- others
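A minimal sketch of the pivot-table idea, using SQLite as a stand-in database: because the flex columns are shared by every tenant, a native UNIQUE constraint on them cannot express "unique within one tenant's field", so a separate pivot table carries that constraint and is written in the same transaction as the data row. The table and column names (`mt_data`, `unique_fields`, `field_num`) and the helper functions are hypothetical, not Force.com's actual schema.

```python
import sqlite3
from typing import Sequence

# Hypothetical schema for illustration only; not Force.com's actual tables.
SCHEMA = """
CREATE TABLE mt_data (
    id        INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    val0      TEXT,                 -- shared flex columns, untyped
    val1      TEXT
);
CREATE TABLE unique_fields (        -- pivot table for unique fields
    tenant_id TEXT    NOT NULL,
    field_num INTEGER NOT NULL,
    value     TEXT    NOT NULL,
    record_id INTEGER NOT NULL,
    UNIQUE (tenant_id, field_num, value)   -- uniqueness scoped per tenant and field
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_record(conn: sqlite3.Connection, record_id: int, tenant_id: str,
                  values: Sequence[str], unique_field_nums: Sequence[int]) -> None:
    """Insert a record plus its pivot rows atomically.

    Raises sqlite3.IntegrityError, and rolls back the data row as well,
    if a unique field value already exists for this tenant.
    """
    with conn:  # commit on success, roll back on any exception
        conn.execute(
            "INSERT INTO mt_data (id, tenant_id, val0, val1) VALUES (?, ?, ?, ?)",
            (record_id, tenant_id, values[0], values[1]),
        )
        for field_num in unique_field_nums:
            conn.execute(
                "INSERT INTO unique_fields VALUES (?, ?, ?, ?)",
                (tenant_id, field_num, values[field_num], record_id),
            )


if __name__ == "__main__":
    conn = make_db()
    insert_record(conn, 1, "Dell", ["SKU-1", "Laptop"], [0])
    insert_record(conn, 2, "Harrah's", ["SKU-1", "Poker"], [0])  # other tenant: allowed
    try:
        insert_record(conn, 3, "Dell", ["SKU-1", "Server"], [0])
    except sqlite3.IntegrityError:
        print("duplicate SKU-1 rejected for Dell")
```

The same value may appear once per tenant, and a rejected duplicate leaves no orphaned data row behind, because both writes share one transaction.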

Flex Schema on Steroids: Everyone's Data
Flex column: multiple data types

    ID        Tenant    Data 2
    1000001   Harrah's  $190
    1000002   Harrah's  $250
    1000003   Harrah's  $680
    1000004   Harrah's  Poker
    1000005   Harrah's  Black Jack
    1000006   Harrah's  Craps
    1000007   Dell      Display
    1000008   Dell      Laptop
    1000009   Dell      Server

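The flex-column idea can be sketched in a few lines of Python, assuming a per-tenant metadata map that records which flex column holds which field and with what declared type. `METADATA`, `FlexRow`, `read_field`, and `select` are hypothetical names, the field names "Amount" and "Product" are made up, and the rows reuse values from the table above with the "$" stripped.

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple, Tuple, Union


class FlexRow(NamedTuple):
    id: int
    tenant: str
    data: Tuple[str, ...]  # flex columns Data 0..Data 2, every value stored as a string


# Hypothetical metadata: tenant -> field name -> (flex column index, declared type).
# The same physical column (Data 2) holds a number for one tenant and text for another.
METADATA: Dict[str, Dict[str, Tuple[int, str]]] = {
    "Harrah's": {"Amount": (2, "number")},
    "Dell": {"Product": (2, "text")},
}

# One physical table for every tenant.
TABLE: List[FlexRow] = [
    FlexRow(1000001, "Harrah's", ("", "", "190")),
    FlexRow(1000002, "Harrah's", ("", "", "250")),
    FlexRow(1000003, "Harrah's", ("", "", "680")),
    FlexRow(1000007, "Dell", ("", "", "Display")),
    FlexRow(1000008, "Dell", ("", "", "Laptop")),
    FlexRow(1000009, "Dell", ("", "", "Server")),
]


def read_field(row: FlexRow, field: str) -> Union[Decimal, str]:
    """Decode one logical field using the metadata of the row's own tenant."""
    col, declared_type = METADATA[row.tenant][field]
    raw = row.data[col]
    return Decimal(raw) if declared_type == "number" else raw


def select(tenant: str, field: str) -> List[Union[Decimal, str]]:
    """Scope to one tenant first, then decode the requested field."""
    return [read_field(r, field) for r in TABLE if r.tenant == tenant]


if __name__ == "__main__":
    print(select("Harrah's", "Amount"))
    print(select("Dell", "Product"))
```

Because every flex value is stored as a string, comparing raw values orders them as text (`"1000" < "190"` is true in Python), which is one reason the flex columns cannot simply be indexed as they are.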

Flex Schema: Everyone's Optimizations
The multi-tenant table above is copied synchronously into a multi-tenant index:

    Tenant    Text        Number
    Harrah's              $190
    Harrah's              $250
    Harrah's              $680
    Harrah's  Poker
    Harrah's  Black Jack
    Harrah's  Craps
    Dell      Display
    Dell      Laptop
    Dell      Server

Indexing a multi-tenant table is non-trivial because (1) every flex column is a varchar, (2) every flex column is shared by multiple tenants, and (3) each tenant may store a different data type in the same flex column.
Solution:
- Create a multi-tenant index table.
- Define some real (native) data types, such as Text, Number, and Date.
- Synchronously (within the same transaction) propagate changes from the multi-tenant table into the multi-tenant index.
- Create normal, non-unique indexes on the multi-tenant index table (TenantId + column# + native data type).
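The solution above can be sketched with SQLite standing in for the database: each save writes the varchar row and its typed index entries inside one transaction, and the index table carries ordinary non-unique indexes on tenant + column number + typed value. The table names (`mt_data`, `mt_index`), the `FIELD_TYPES` metadata map, and the helper functions are assumptions for illustration, not Force.com's implementation.

```python
import sqlite3
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical metadata: (tenant, flex column number) -> native type.
FIELD_TYPES: Dict[Tuple[str, int], str] = {
    ("Harrah's", 2): "number",
    ("Dell", 2): "text",
}

SCHEMA = """
CREATE TABLE mt_data (
    id        INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    val0 TEXT, val1 TEXT, val2 TEXT     -- shared varchar flex columns
);
CREATE TABLE mt_index (
    tenant_id  TEXT    NOT NULL,
    field_num  INTEGER NOT NULL,
    text_value TEXT,                    -- native typed copies: exactly one is set
    num_value  REAL,
    date_value TEXT,                    -- ISO-8601, so text order equals date order
    record_id  INTEGER NOT NULL
);
-- Ordinary non-unique indexes: TenantId + column# + native-typed value.
CREATE INDEX mt_index_text ON mt_index (tenant_id, field_num, text_value);
CREATE INDEX mt_index_num  ON mt_index (tenant_id, field_num, num_value);
CREATE INDEX mt_index_date ON mt_index (tenant_id, field_num, date_value);
"""


def typed_columns(native_type: str, raw: str) -> Tuple[Optional[str], Optional[float], Optional[str]]:
    """Map a varchar flex value to (text_value, num_value, date_value)."""
    if native_type == "number":
        return None, float(raw), None
    if native_type == "date":
        return None, None, date.fromisoformat(raw).isoformat()
    return raw, None, None


def save(conn: sqlite3.Connection, record_id: int, tenant: str, values: Sequence[str]) -> None:
    """Write the data row and its index entries in the same transaction."""
    with conn:  # a failed index write also rolls back the data write
        conn.execute("INSERT OR REPLACE INTO mt_data VALUES (?, ?, ?, ?, ?)",
                     (record_id, tenant, *values))
        conn.execute("DELETE FROM mt_index WHERE record_id = ?", (record_id,))
        for field_num, raw in enumerate(values):
            native_type = FIELD_TYPES.get((tenant, field_num))
            if native_type is None:
                continue  # this flex column is not indexed for this tenant
            conn.execute("INSERT INTO mt_index VALUES (?, ?, ?, ?, ?, ?)",
                         (tenant, field_num, *typed_columns(native_type, raw), record_id))


def ids_with_number_at_least(conn: sqlite3.Connection, tenant: str,
                             field_num: int, low: float) -> List[int]:
    """Numeric range query served by the (tenant, column#, num_value) index."""
    rows = conn.execute(
        "SELECT record_id FROM mt_index "
        "WHERE tenant_id = ? AND field_num = ? AND num_value >= ? ORDER BY num_value",
        (tenant, field_num, low))
    return [r[0] for r in rows]
```

Keeping one column per native type means a predicate such as `num_value >= 200` compares numbers rather than strings, while the tenant and column number at the front of each index keep one tenant's entries contiguous.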

Reporting Index Optimization
The multi-tenant table above is copied synchronously into a reporting index:

    Tenant   Data 2    Data 7   ...   Data k
    Dell     Display
    Dell     Laptop
    Dell     Server

But How Do You Make the Queries Fast?


A Real-World Question
Michael Dell wants to know whether servers are selling well in the West.

How will Force.com answer this question quickly?

[Diagram: the multi-tenant query optimizer takes M. Dell's question (Servers, West) and uses the shared visibility and index structures to find the fastest path to the answer across millions of sales line items.]

Multi-tenant Query Optimizer
1. Run pre-queries:
   - Check user visibility: the number of rows the user can access
   - Check filter selectivity: how specific is this filter?
2. Build the query based on the results of the pre-queries.
3. Execute the query.
The pre-queries run against shared visibility data and shared indexes.
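The pre-query steps can be sketched as a planner that compares the two kinds of counts and drives the query from the smaller set. The decision rule (pick the smallest count, fall back to a full scan when even that exceeds a fixed share of the tenant's rows), the 10% cut-off, and the names `choose_plan` and `Plan` are assumptions for illustration; the talk does not say how Force.com weighs the counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Plan:
    driver: str          # access path the query starts from
    estimated_rows: int  # rows that path is expected to touch


def choose_plan(visible_rows: int, filter_rows: Dict[str, int], total_rows: int,
                selective_fraction: float = 0.1) -> Plan:
    """Choose a driving access path from pre-query counts (hypothetical rule).

    visible_rows: result of the visibility pre-query (rows this user can access)
    filter_rows:  result of each filter-selectivity pre-query, keyed by filter
    total_rows:   rows of this object for the tenant
    """
    candidates = {"visibility": visible_rows}
    for name, count in filter_rows.items():
        candidates[f"index:{name}"] = count
    driver, rows = min(candidates.items(), key=lambda item: item[1])
    # If even the most selective path touches a large share of the data,
    # an index-driven plan is assumed to be no better than a scan.
    if rows > selective_fraction * total_rows:
        return Plan("full_scan", total_rows)
    return Plan(driver, rows)


if __name__ == "__main__":
    filters = {"product=Servers": 400_000, "region=West": 1_200_000}  # made-up counts
    print(choose_plan(5_000_000, filters, 5_000_000))  # Michael Dell sees every row
    print(choose_plan(2_000, filters, 5_000_000))      # a rep who sees 2,000 rows
```

With these made-up counts, Michael Dell's query would be driven from the Servers index, while the same filters for a sales rep who can see only 2,000 rows would be driven from visibility instead.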


Stop / Go
Multi-tenant Optimizer Statistics

Multi-tenant optimization is based on our flex schema, with knowledge of the denormalized pivot tables and the sharing model.

Data sharing is scalable, robust, and tightly integrated into the optimizer: another high-level application feature pushed into the DB layer.
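The slides do not spell out what the statistics contain. One plausible reading, sketched here as an assumption, is that statistics are kept per tenant and per flex column rather than per physical column, since a histogram over a shared column would mix Dell's products with Harrah's amounts. `TenantStats` and its methods are hypothetical, and the sample rows are made up.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple

Key = Tuple[str, int]  # (tenant, flex column number)


class TenantStats:
    """Hypothetical optimizer statistics kept per tenant and per flex column."""

    def __init__(self) -> None:
        self.rows: Counter = Counter()  # tenant -> row count
        self.values: DefaultDict[Key, Counter] = defaultdict(Counter)  # key -> value frequencies

    def gather(self, rows: Iterable[Tuple[str, Tuple[str, ...]]]) -> None:
        """Scan (tenant, flex values) rows and update the per-tenant counts."""
        for tenant, data in rows:
            self.rows[tenant] += 1
            for field_num, value in enumerate(data):
                self.values[(tenant, field_num)][value] += 1

    def eq_selectivity(self, tenant: str, field_num: int, value: str) -> float:
        """Estimated fraction of the tenant's rows where field == value."""
        total = self.rows[tenant]
        if total == 0:
            return 0.0
        return self.values[(tenant, field_num)][value] / total


if __name__ == "__main__":
    stats = TenantStats()
    stats.gather([
        ("Dell", ("West", "Server")),
        ("Dell", ("East", "Server")),
        ("Dell", ("West", "Laptop")),
        ("Dell", ("West", "Display")),
        ("Harrah's", ("West", "Poker")),
    ])
    print(stats.eq_selectivity("Dell", 1, "Server"))  # share of Dell rows that are servers
```

An estimate like this could feed the filter-selectivity step without running a pre-query, at the cost of going stale between statistics refreshes.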

Statistics ke