data done right
DESCRIPTION
You probably already know that managing data in Salesforce can be a formidable task. But you might not know that it doesn't have to be! In this session, we'll focus on strategies to help you with key data tasks such as data migration, managing large data volumes, org merges, and data consolidations.
TRANSCRIPT
Data Done Right
Administrators
Brian Wiebe: Technical Engagement Manager, salesforce.com
Ezra Kenigsberg: Data Architect, salesforce.com
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2010. These documents and others are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
TinyURL.com/SalesforceSafeHarbor
Purpose
To walk through three big data issues that can help
make you an even-better administrator.
This Session…
Practical demos: things you can do TODAY
Required
– Data Loader
– Microsoft Excel
– A decent text editor (I use Notepad++)
Optional
– Cloud Converter
– Synchronizer (requires Microsoft Access)
Other Sessions…
Bigger-picture data strategy
Professional third-party tools
Overview
1. Introduction (5 min)
2. Moving Data (15 min)
3. Cleaning Data (15 min)
4. Working with Large Data Volumes (15 min)
5. Q&A (until they kick us out)
Prior to making any major changes to your org:
BACK UP!
Ezra Kenigsberg
salesforce.com
Moving Data
The Scenario
The scenario we’re walking through:
Gotta import new records by tomorrow
We’re creating a repeatable, documented process
– “Just load it”...
• …fails the hit-by-a-bus test
• …is difficult to audit after the fact
• …may not be reversible if I’ve made a mistake
Links and Tools
Useful links:
developer.force.com/consultants
EzraKenigsberg.com
Our tools:
Required
– Data Loader
– Microsoft Excel
– A decent text editor (I use Notepad++)
Optional
– Cloud Converter
– Synchronizer (requires Microsoft Access)
Useful Links: developer.force.com/consultants
Dedicated pages for
• Data Migration
• Large Data Volumes
• many others
Useful Links: EzraKenigsberg.com
Dedicated sections for
• Handy Tools
• Reference Links
• Presentations
• Requests for Salesforce & Data Loader Improvements
Three Key Steps
Three steps:
1. How should I map my data?
2. How can I automate the generating of CSVs?
3. How can I load data in an auditable way?
HOW do I map? generate? load?
Moving Data #1: How Do I Map?
Create a mapping file:
1. Create list of source fields in source file/s
2. Create list of API field names (not the UI labels!)
– Get them with Data Loader or Cloud Converter
3. Match source fields to API field names
Grab the Source Field Names
Transposed in Excel
Excel 2003: Edit | Paste Special | Transpose
Excel 2007 or 2010: Home | Paste | Paste Special | Transpose
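If you'd rather skip the copy-and-transpose step, the same list of source field names can be pulled straight from the file's header row. This is only a rough sketch: the sample column names are made up for illustration, and it assumes the source file is a plain CSV with one header row.

```python
import csv
import io

def source_field_names(csv_text):
    """Return the header row of a CSV as a list, one field name per entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name.strip() for name in header]

# Hypothetical legacy export, for illustration only.
sample = "Cust Name, Cust Phone,Legacy Key\nAcme,555-0100,A-1\n"
names = source_field_names(sample)

# One name per line, ready to paste into the first column of a mapping file.
print("\n".join(names))
```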
Grab the API Field Names (slide 1 of 2)
Model Metrics’ free utility “Cloud Converter” is a straightforward way to export metadata
Grab the API Field Names (slide 2 of 2)
“Cloud Converter” exports a tab for every object...
...and a row for every field
Build Out Mapping File
Index Column
An index column facilitates later sorting
A Column of Arrows
Arrows remind people of the DIRECTION of data
Not Every Field Has to Map
Not every column has to map
Keep it Simple!
The most common cause of bad data maps?
Too much stuff!
Put in only the columns you’ll UPDATE and USE
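The mapping-file slides above (index column, a column of arrows, unmapped fields left blank) can be sketched in a few lines. Treat this as an illustration under assumptions, not the way any particular tool does it: the function names, the legacy column names, and the API field names are all hypothetical.

```python
import csv
import io

def build_mapping_rows(source_fields, api_fields, matches):
    """Return mapping rows: index, source field, arrow, API field name.

    `matches` maps a source field to an API field name. Source fields
    with no match keep a blank target, since not every field has to map.
    """
    unknown = set(matches.values()) - set(api_fields)
    if unknown:
        raise ValueError(f"Not API field names: {sorted(unknown)}")
    rows = []
    for index, source in enumerate(source_fields, start=1):
        target = matches.get(source, "")
        arrow = "->" if target else ""
        rows.append([index, source, arrow, target])
    return rows

def mapping_to_csv(rows):
    """Serialize the mapping rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Index", "Source Field", "", "API Field Name"])
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical legacy columns and API names, for illustration only.
source_fields = ["Cust Name", "Cust Phone", "Legacy Notes"]
api_fields = ["Name", "Phone", "Legacy_Id__c"]
matches = {"Cust Name": "Name", "Cust Phone": "Phone"}

rows = build_mapping_rows(source_fields, api_fields, matches)
print(mapping_to_csv(rows))
```

The index column means the file can always be sorted back into source order after someone re-sorts it by target field.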
Moving Data #2: How Do I Generate CSVs?
Import legacy data into a tool that enables REUSE
– Raw data comes in, ready-to-load data comes out
– Minimize manual steps. Tools I’m not so thrilled with:
• Microsoft Excel
• Import Wizard
– Each subsequent load becomes a straightforward process
• Test-then-Production
• Follow-up loads
Which tool should I use?
– Good question! Can you wait eight slides?
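To make "raw data comes in, ready-to-load data comes out" concrete, here is a minimal sketch of a reusable transform. It is not the tool the session goes on to demo; the field map, column names, and function name are assumptions made up for the example.

```python
import csv
import io

# Hypothetical mapping from legacy column names to API field names.
FIELD_MAP = {"Cust Name": "Name", "Cust Phone": "Phone", "Legacy Key": "Legacy_Id__c"}

def to_load_ready_csv(raw_csv_text, field_map):
    """Keep only mapped columns, rename them to API names, trim whitespace."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(field_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({api: (row.get(src) or "").strip() for src, api in field_map.items()})
    return out.getvalue()

raw = "Cust Name,Cust Phone,Legacy Key,Notes\n Acme Inc ,555-0100,A-1,ignore me\n"
ready = to_load_ready_csv(raw, FIELD_MAP)
print(ready)
```

Because the transform is a function of the raw file and the map, the test load and the production load run the exact same steps.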
Moving Data #3: How Do I Load?
Loading Best Practices:
1. Group work by folder
2. Keep all files together
3. Auto-Match
4. Upsert!
Folder Naming
YYYY-MM-DD#NN forces folders to sort in chronological order
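A small helper can hand out the next folder name in this scheme. This is a sketch only; the function name and the sample folder names are invented for illustration.

```python
import re
from datetime import date

FOLDER_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})#(\d{2})$")

def next_folder_name(existing_names, today):
    """Return the next YYYY-MM-DD#NN name for `today`, given existing names."""
    prefix = today.isoformat()
    highest = 0
    for name in existing_names:
        match = FOLDER_PATTERN.match(name)
        if match and match.group(1) == prefix:
            highest = max(highest, int(match.group(2)))
    return f"{prefix}#{highest + 1:02d}"

# Hypothetical folder listing; "notes" is ignored because it doesn't fit the pattern.
existing = ["2010-12-06#01", "2010-12-07#01", "2010-12-07#02", "notes"]
print(next_folder_name(existing, date(2010, 12, 7)))
```

Zero-padding the date parts and the sequence number is what makes a plain alphabetical sort match chronological order.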
Keep All Files Together
Folder Contents (1 of 3)
Save source file, success files, and error files all in same directory
Folder Contents (2 of 3)
When a load produces errors:
• Copy the error file and
• Use it as the source for the next load
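Turning an error file into the next source file might look like the sketch below. It assumes the error file repeats the source columns plus one extra column of error messages named `ERROR`; check that against your own error files before relying on it. The function name and sample row are hypothetical.

```python
import csv
import io

def error_file_to_next_source(error_csv_text, error_column="ERROR"):
    """Drop the error-message column so the rows can be fixed and reloaded.

    Returns the cleaned CSV text and the list of error messages, so the
    reasons can be reviewed before the next attempt.
    """
    reader = csv.DictReader(io.StringIO(error_csv_text))
    fields = [name for name in reader.fieldnames if name != error_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    reasons = []
    for row in reader:
        reasons.append(row.get(error_column, ""))
        writer.writerow(row)
    return out.getvalue(), reasons

# Hypothetical error file contents, for illustration only.
errors = "Name,Phone,ERROR\nAcme Inc,,Required field missing: Phone\n"
next_source, reasons = error_file_to_next_source(errors)
print(next_source)
print(reasons)
```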
Folder Contents (3 of 3)
Dummy-Proof the Auto-Match
Perfect Auto-Match
With a file created this way, Auto-Match gets every field mapped
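A quick pre-flight check can confirm that every CSV header is an API field name before you open the loader. This sketch assumes a name-based, case-insensitive match, which is a simplification of whatever Auto-Match really does; the function and field names are hypothetical.

```python
def unmatched_headers(csv_headers, api_field_names):
    """Return headers that a name-based auto-match would leave unmapped."""
    known = {name.lower() for name in api_field_names}
    return [h for h in csv_headers if h.strip().lower() not in known]

# Hypothetical API field names for the target object.
api_names = ["Name", "Phone", "Legacy_Id__c"]

print(unmatched_headers(["Name", "Phone", "Cust Notes"], api_names))
print(unmatched_headers(["Name", "Legacy_Id__c"], api_names))
```

An empty list is the goal: it means the file was built from the mapping file and nothing needs to be dragged into place by hand.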
The Smarter Way to Load Data: Upsert!
Why UPSERT and EXTERNAL IDs are Great
Without Upsert & External IDs: export, map, update, insert, map, map, update, insert
With Upsert & External IDs: export, UPSERT, UPSERT
[Diagram: records labeled "a" and "c" shown at each step]
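The idea behind upsert with an External ID can be shown with a toy in-memory model. This is not Salesforce's implementation, just a sketch of the behavior: a dict stands in for an object's records, and `Legacy_Id__c` is a hypothetical External ID field.

```python
def upsert(table, records, external_id_field):
    """Insert new records and update existing ones, keyed on an external ID.

    `table` is a dict standing in for an object's records, keyed by the
    external ID value. Returns (created, updated) counts.
    """
    created, updated = 0, 0
    for record in records:
        key = record[external_id_field]
        if key in table:
            table[key].update(record)
            updated += 1
        else:
            table[key] = dict(record)
            created += 1
    return created, updated

accounts = {"A-1": {"Legacy_Id__c": "A-1", "Name": "Acme"}}
batch = [
    {"Legacy_Id__c": "A-1", "Name": "Acme Inc"},  # already loaded, so updated
    {"Legacy_Id__c": "A-2", "Name": "Globex"},    # new, so inserted
]

first_run = upsert(accounts, batch, "Legacy_Id__c")
second_run = upsert(accounts, batch, "Legacy_Id__c")
print(first_run, second_run)
```

Running the same batch twice creates no duplicates, and no export-and-match step is needed to find Salesforce IDs first. That is why the with-upsert path is so much shorter.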
Synchronizer is an OPEN SOURCE application that is NOT SUPPORTED by Salesforce
Synchronizer has functions that facilitate all these tasks:
– Mapping
– Generating CSVs
– Loading data
HOW do I
Synchronize?
How Synchronizer Automates All This
Synchronizer Walkthrough
In this walkthrough, we’ll:
1. Import legacy data into Synchronizer
2. Map the data
3. Document the data map
4. Migrate the data into Salesforce
5. Review the files created
Synchronizer Review: Import and Map
One-button CSV importer (not perfect, but fast and simple):
Grab API fields, then tie legacy fields to API fields:
Synchronizer Review: Create Data Map
One-click data map generator:
Synchronizer Review: Migrate
One-screen UI for loading data
Synchronizer Review: Organize Files
Folder and file discipline:
Reimporting of success and error files for use in future loads
Synchronizer Review: Other Goodies
Some other Synchronizer functions not covered:
Ability to run multiple steps in sequence
Scheduling
Mass-create tasks
– Creating Users
– Assigning Users to Groups
Custom reports
– Storage usage by User, by object
Best Practices
1. Build a mapping file
2. Leverage a tool to generate CSVs
3. Use loading Best Practices
Get Synchronizer and help make it better!
Brian Wiebe
salesforce.com
Cleaning Data
Key Data Quality Concepts
Defining a Broad Topic
What is Data Quality?
– Combination of Processes, Policies and Tools
– Involves Governance, Enforcement, Prevention
– Goal is not perfection
What are the typical Issues?
– Duplicates (Account, Contact), Incomplete information, Stale or Untouched data, Inconsistent values, Incorrect linkages
What are the typical causes?
– Not part of Budget, Unmeasurable problem, No Action Plan, No Ownership, Lack of Training, Non-optimized salesforce.com
The Full Data Quality Lifecycle
Our 3-step, iterative process quickly identifies problems, fixes them and helps you maintain high data quality over time
Data Quality Assessment
• Profile data
• Analyze results
• Identify problems and next steps
Data Cleanse
• Standardize & Cleanse
• Supplement & Enrich
• Test & Load
Data Protect
• Train users
• Enforce processes
• Monitor on-going quality
Data Quality Assessment
Project Planning
Strong Sponsorship
– Committed Involvement & Availability (The DQ Assessment helps justify this)
– Appreciation, Awareness & Understanding of Data complexities
Limiting Scope / Phased Approach
– ACHIEVABLE Goals
– Define critical quick-win items for Phase 1 (focus on biggest issues for end-users)
Test, Test, Test
– Leverage your Sandbox Environment
– Data Quality cleansing is a “destructive” process
Plan for End-user involvement
– Data Quality is an iterative process – and MUST involve end-user buy-in and input
If the foundation is off..
Data Quality Solution Considerations
Begin Governance & Stewardship
– Involve IT and Business users
– Monitoring Data Quality Dashboards – report back monthly
– Use Salesforce features (e.g., Data Validation Rules, Conditional Workflow field updates, Analytic Snapshots for trending)
Archive un-used Data
– Data must be USEFUL to the Business and must be justifiable
– Candidates for archiving: last updated > 1 year ago, no child records, Missing Core Required Fields
Correct existing data
– Users who have left company and STILL own records, Find/Replace picklist values, Apply Naming Standards
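The archiving candidates listed on this slide (last updated more than a year ago, no child records, missing core required fields) can be flagged with a simple check over exported records. A sketch only: the record layout, the field names, and the choice to report every matching reason rather than require all three are assumptions.

```python
from datetime import date, timedelta

def archive_reasons(record, today, required_fields):
    """Return the reasons a record looks like an archiving candidate."""
    reasons = []
    if today - record["last_modified"] > timedelta(days=365):
        reasons.append("last updated > 1 year ago")
    if record["child_count"] == 0:
        reasons.append("no child records")
    if any(not record.get(field) for field in required_fields):
        reasons.append("missing core required fields")
    return reasons

# Hypothetical exported records, for illustration only.
today = date(2010, 12, 7)
stale = {"Name": "Old Co", "Phone": "", "last_modified": date(2009, 1, 15), "child_count": 0}
fresh = {"Name": "New Co", "Phone": "555-0100", "last_modified": date(2010, 11, 1), "child_count": 3}

print(archive_reasons(stale, today, ["Name", "Phone"]))
print(archive_reasons(fresh, today, ["Name", "Phone"]))
```

Listing the reasons, instead of returning a bare yes/no, gives business users something to review before anything is actually archived.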
Data Quality Solution Considerations
Identify and remove Dupes
– Low hanging fruit:
• Simple dupes: e.g., match on a unique key like email address
• Flag dupes for merging in Salesforce
– Leverage available de-dupe tools
• Complex definition of dupe: e.g., fuzzy matching on name+address
– Define your rules (matching rules, merging rules)
Enrich & Append
– Enrich your existing data
– Add NEW data for known companies
– 3rd party data vendors: helpful in creating Account hierarchies, helpful for accurate Contact Info, especially at various levels in the Company
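Both kinds of dupe check mentioned above can be sketched with the standard library. The exact-match part groups contacts by normalized email; the fuzzy part uses `difflib.SequenceMatcher` as a stand-in for whatever scoring a real de-dupe tool applies. Record Ids and field names here are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def email_dupes(contacts):
    """Group contact Ids by lowercased, trimmed email; keep groups of 2+."""
    groups = defaultdict(list)
    for contact in contacts:
        email = (contact.get("Email") or "").strip().lower()
        if email:
            groups[email].append(contact["Id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}

def similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical contacts, for illustration only.
contacts = [
    {"Id": "1", "Name": "J. Smith", "Email": "J.Smith@example.com "},
    {"Id": "2", "Name": "John Smith", "Email": "j.smith@example.com"},
    {"Id": "3", "Name": "Ann Lee", "Email": "ann.lee@example.com"},
]

print(email_dupes(contacts))
print(similarity("J. Smith", "John Smith"))
```

The exact match only flags pairs for review; the matching threshold and the merge rules still have to be defined by the business.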
Data Quality Solution Considerations
Limit points of Entry
– List Imports restricted to certain profiles
– Control data being entered, without overwhelming users
– Leverage Sales Intelligence Tools, Dupe Prevention, Address Validation
Automation / Integration
– Integrations, Master Data Management
– Nightly Batch Updates
Cleansing Process
Cleansing Environment: Staging → Production
• Transform & Re-model
• Cleanse & Standardize
• Enrich & De-dupe
• Iterate
• Validate with Business Users
1. Standardize
• Names, Addresses, Postal Standards
• Examples: US, U.S., U.S.A → USA; acme incorp → Acme Inc
2. Cleanse
• Find & Replace (Hot → High, Cold → Low)
• Data Transformation
• Naming Conventions (Acme-Widgets-453)
• Mergers, acquisitions, spin-offs
• Archiving & Filtering
3. Enrich (Optional)
• Company Name & Address
• Hierarchy Data (Acme Inc HQ, Acme UK)
• Demographics
4. De-dupe
• Identify, Match & Score (J. Smith, John Smith: 80%)
• Re-parent Child Records (Account: Division, Opportunity, Contact)
• Merge (J. Smith, John Smith → John Smith)
5. Validate
• Load to Sandbox
• Validate & Modify
• Load to Production
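The standardize and find-and-replace steps boil down to lookup tables applied in staging. Here is a rough sketch using the slide's own examples (country variants to "USA", Hot to High, Cold to Low); the function names are made up, and treating the Hot/Cold values as a rating picklist is an assumption.

```python
# Lookup tables built from the examples on the slide.
COUNTRY_VARIANTS = {"us": "USA", "usa": "USA"}
RATING_REPLACEMENTS = {"Hot": "High", "Cold": "Low"}

def standardize_country(value):
    """Map known country variants to one spelling; leave others trimmed."""
    key = value.replace(".", "").strip().lower()
    return COUNTRY_VARIANTS.get(key, value.strip())

def replace_rating(value):
    """Find-and-replace for picklist values; unknown values pass through."""
    return RATING_REPLACEMENTS.get(value, value)

print([standardize_country(v) for v in ["US", "U.S.", "U.S.A", "USA", "Canada"]])
print([replace_rating(v) for v in ["Hot", "Cold", "Warm"]])
```

Values that aren't in the table pass through unchanged, so the output can be reviewed with business users before anything is loaded to Production.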
Protect Your Data
Safeguard your cleansed data and prevent future deterioration.
Train
• User Training
• Naming Conventions
• Address Conventions
• Dupe Prevention Process
• Data Importing Policies
Enforce
• Required Fields
• Default Values
• Data Validation Rules
• Workflow Field Updates
• Web-to-Lead Restrictions
Monitor
• Data Quality Dashboards
• Data Quality Reassessment
• AppExchange Tools
What tools do I use?
The AppExchange: The Trusted Cloud Computing Marketplace
1000+ Pre-Integrated Apps
300+ Services
4000+ Customer Reviews
200+ Free Apps to Get You Started
• Reports & dashboards to end-to-end templates
• Fully customizable
AppExchange Tools Worth Checking Out!
Cloud Converter (Free)
Synchronizer (Free)
Jigsaw for Salesforce (Paid)
CRM Fusion (Paid)
Data Quality Dashboard (EE edition)
Data Quality Analysis Dashboard (EE Edition)
All reports pull from just TWO formula fields.
The Formula Field
You can EXPAND these formulas to include YOUR custom fields.
Brian Wiebe
Ezra Kenigsberg
salesforce.com
Managing Large Data Volumes (LDV)
What Do We Mean by Large Data Volumes?
You know you’ve got scale when …
– 1,000s of Users
– 1,000,000s of records for a single Object
– Role or Territory hierarchy > dozens of levels
– Public Groups nested > 5 levels deep
– A single User, Queue, Role, Public Group, or Territory:
• Owning 10,000s of records
• Seeing 10,000s of records as a result of sharing
– 10,000s of Public Groups
– 1,000s of Territories
These are NOT hard limits—only useful guides to proceed carefully!
Talk to your Account Executive
Where Would We Proceed Carefully?
User Interface
– Reports, Dashboards, List Views
– Searches
API
– Queries
– Integration
Synchronizing with end-user apps (Outlook, Mobile)
What Options are Available?
1. Segment
2. Optimize sharing
3. Leverage indexes & skinny tables
4. Move data asynchronously
1) Segmenting with Divisions
Think millions
Use data access patterns
– Does everyone really look at everything?
Acts like DB partitions
– Breaks up big objects
– Aim for <1M / division
Used for performance of search, reports, dashboards, and list views
Not a security measure!
[Diagram: Global Division split into Divisions, By Geography or By Responsibility (… SOHO, SMB, MSB, Enterprise, Gov …)]
Example: Financial Services Customer
13M Clients / ~500 Branches = 26,000 records per Division
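The sizing arithmetic in the example is worth keeping as a one-liner you can rerun with your own numbers. A sketch only: the function name is made up, and the 1M ceiling is the "aim for" guide from the slide, not a hard limit.

```python
def records_per_division(total_records, divisions):
    """Average number of records per division if data splits evenly."""
    return total_records / divisions

# Numbers from the Financial Services example above.
per_branch = records_per_division(13_000_000, 500)
print(per_branch)

# The slide's guide is to aim for under 1M records per division.
print(per_branch < 1_000_000)
```

Real data rarely splits evenly, so it is worth checking the largest division as well as the average.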
1) Segmenting with Tiered Data
Think tens of millions
Focus users on active data
– Open cases
– Warm leads
– Recent history
Segregate inactive data
Can be used with Divisions
The Archive Data table...
– ...doesn’t have to be custom
– ...doesn’t have to be Salesforce
Can use Analytic Snapshots
Example: Financial Services Customer
145M Activities - 90M Legacy = 55M
55M / ~500 Branches = 110,000
[Diagram: Active Data, Archive (Batch Apex, Scheduled Apex), Archive Data]
Active Data: Standard or Custom Object, standard functionality
Archive Data: Standard or Custom Object, subset of columns, report focused
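Here is a toy version of the tiering idea: open records stay in the active tier, closed ones move to an archive tier that keeps only a subset of columns. In an org this would be done with Batch Apex or Scheduled Apex as the diagram suggests; the Python below only illustrates the split, and the field names and archive column list are hypothetical.

```python
# Hypothetical subset of columns kept in the archive tier.
ARCHIVE_COLUMNS = ("Id", "Subject", "ClosedDate")

def tier(records):
    """Split records into (active, archive); archive rows keep fewer columns."""
    active = [r for r in records if not r["IsClosed"]]
    archive = [{c: r[c] for c in ARCHIVE_COLUMNS} for r in records if r["IsClosed"]]
    return active, archive

cases = [
    {"Id": "1", "Subject": "Login issue", "IsClosed": False, "ClosedDate": None, "Notes": "..."},
    {"Id": "2", "Subject": "Refund", "IsClosed": True, "ClosedDate": "2009-03-01", "Notes": "..."},
]
active, archive = tier(cases)
print(len(active), archive)

# The example's arithmetic: legacy activities are tiered out first.
remaining = 145_000_000 - 90_000_000
print(remaining, remaining // 500)
```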
2) Optimize Sharing
Use private sharing strategically
Enforce ownership to prevent data concentration
– “Super-owner” individuals
Streamline hierarchies
– Limit depth of nested groups
– Roles, Groups, Territories, etc.
Leverage all capabilities
– Apex Managed Sharing for custom objects
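Spotting "super-owner" concentration is a counting exercise over an export of record owners. A minimal sketch, assuming each exported row carries an `OwnerId` column; the default threshold echoes the "10,000s of records" guide from earlier in the deck and is not a hard limit.

```python
from collections import Counter

def super_owners(records, threshold=10_000):
    """Return {owner: record count} for owners at or above the threshold."""
    counts = Counter(r["OwnerId"] for r in records)
    return {owner: n for owner, n in counts.items() if n >= threshold}

# Tiny hypothetical export; a low threshold keeps the demo readable.
records = [{"OwnerId": "u1"}, {"OwnerId": "u1"}, {"OwnerId": "u1"}, {"OwnerId": "u2"}]
print(super_owners(records, threshold=3))
```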
3) Leverage Custom Indexes
Standard Indexes: Created Date, Last Modified Date, Division, Record Type
Administrators can index fields by designating them as External IDs
Custom Indexes are available through Support
– Multi-column custom indexes also supported
– Can be applied based on use case, impact and priority
– Salesforce.com working on automatically detecting the need
Example: Large Japanese Insurance Co.
Custom object with 10M records queried regularly by 50,000+ users
80x boost in query perf. due to multi-column Custom Index
3) A Word About Skinny Tables
Database Innovation
– No work required
– Managed by salesforce.com
2-10x performance for some analytics
Also available through Support
[Diagram: Base Table vs. Skinny Table]
Base Table (fewer rows per fetch)
Name | Address 1 | ST | Comments | SUM
John | 1 Terracotta Ln | CA | Need follow up here. | 500
Skinny Table (more rows per fetch)
Name | ST | SUM
John | CA | 500
Faster Reports (more rows fit in memory)
4) Bulk Up: Asynchronously
[Diagram: Bulk API flow]
Client: Send all data, Check Status, Retrieve Results
Data streamed to temporary storage
Job: Data Batches
Processing Servers: each Processing Thread will Dequeue batch, Insert/update, Save results
Dataset processed in parallel
Job updated in Admin Setup
Results
Bulk API
The “go-to” option for tens of thousands of records and up
Up to 10,000 records in a batch file
Asynchronous loading, tracked in Salesforce’s
Walkthrough time!
Upsert legacy data into Salesforce, FAST
Example: American Insurance Co.
230 million records processed in 33 hours, 14 hours ahead of schedule
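The batching rule and the example's throughput are easy to sanity-check. This sketch does not call the Bulk API; it only shows how a dataset splits into batches of up to 10,000 records and what the quoted numbers work out to. Function names are hypothetical.

```python
def batches(records, batch_size=10_000):
    """Yield consecutive slices of at most `batch_size` records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def batch_count(total_records, batch_size=10_000):
    """Number of batches needed for `total_records` (ceiling division)."""
    return (total_records + batch_size - 1) // batch_size

sizes = [len(b) for b in batches(list(range(25_000)))]
print(sizes)

# Numbers from the insurance example: 230 million records in 33 hours.
total = 230_000_000
print(batch_count(total))
records_per_second = total / (33 * 3600)
print(round(records_per_second))
```

Each batch is an independent unit of work, which is what lets the processing servers pick them up in parallel.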
Q&A
Links:
developer.force.com/consultants
ezrakenigsberg.com
Post-Session Questions?
• Brian Wiebe, Technical Engagement Manager (West)
• Ezra Kenigsberg, Data Architect (Midwest)