PK Chunking presentation from Tahoe Dreamin' 2016
TRANSCRIPT
PK Chunking
Divide and conquer massive objects in Salesforce
Daniel Peter
• Lead Applications Engineer, Kenandy Inc.
• Co-organizer of the Bay Area Salesforce Developer User Group
• @danieljpeter
Why shouldn’t I leave?
Because you need to learn how to avoid these errors!
Query not “selective” enough:
• Non-selective query against large object type (more than 100000 rows).
Query takes too long:
• No response from the server
• Time limit exceeded
• Your request exceeded the time limit for processing.
Too much data returned in query:
• Too many query rows: 50001
• Remoting response size exceeded maximum of 15 MB.
Sounds great. How?
Not so fast… first we need some prerequisite knowledge!
• Database Indexes
• Salesforce Ids
Database indexes (prereq)
“Allow us to quickly locate rows without having to scan every row in the database”
(paraphrased from wikipedia)
Salesforce Ids (prereq)
• Composite key containing multiple pieces of data.
• Uses base 62 numbering instead of the more common base 10.
• Fastest way to find a database row.
Is it time to go skiing yet?
Salesforce Ids (prereq)
Base 10 vs Base 62: how many values fit in a given number of digits

Digits  Base 10                    Base 62
1       10                         62
2       100                        3,844
3       1,000                      238,328
4       10,000                     14,776,336 (million)
5       100,000                    916,132,832 (million)
6       1,000,000 (million)        56,800,235,584 (billion)
7       10,000,000 (million)       3,521,614,606,208 (trillion)
8       100,000,000 (million)      218,340,105,584,896 (trillion)
9       1,000,000,000 (billion)    13,537,086,546,263,552 (quadrillion)

(sorry for covering you, logo)

Base 10 digits: 0123456789
Base 62 digits: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
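The values-per-digit figures are simply the base raised to the number of digits. A minimal sketch to check them follows; the function name valuesForDigits is made up for illustration. BigInt is used because 62^9 is larger than JavaScript's Number.MAX_SAFE_INTEGER, so an ordinary Number would round it.

```javascript
// Hypothetical helper: how many distinct values fit in `digits` digits of `base`.
// BigInt keeps the result exact; 62^9 does not fit in a safe Number.
function valuesForDigits(base, digits) {
  return BigInt(base) ** BigInt(digits);
}

// Print the base 10 and base 62 columns side by side for 1 to 9 digits.
for (let d = 1; d <= 9; d++) {
  console.log(d, valuesForDigits(10, d).toString(), valuesForDigits(62, d).toString());
}
```

Nine base 62 digits cover about 13.5 quadrillion values, which is why the record portion of an Id can stay so short.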
Fetching people in a city: problems
Non-selective
Request: “get me all the people who are female”
Response: “yer trippin’!”
Fetching people in a city: problems
Timeout
Request: “find me a 7 foot tall person in a pink tuxedo in Beijing”
Response: (after searching all day) “I can’t find any! I give up!”
Finding people in a city: problems
Too many people found
Request: “find me all the men in San Francisco with beards”
Response: (after searching for 10 mins) “The bus is full!”
Fetching people in a city: solutions
Non-selective
Request: “get me all the people who are female, in your small search area”
Response: “¡Con mucho gusto!”
Fetching people in a city: solutions
Timeout
Request: “find me a 7 foot tall person in a pink tuxedo in Beijing, in your small search area”
Response:
SP1: “Didn’t find any, sorry!”
SP2: “Didn’t find any, sorry!”
SP3: “Found one!”
SP4: “Didn’t find any, sorry!”
Finding people in a city: solutions
Too many people found
Request: “find me all the men in San Francisco with beards, in your small search area”
Response:
SP1: 30 people in our bus
SP2: Didn’t find any
SP3: 50 people in our bus
QLPK
Salesforce SOAP or REST API – AJAX toolkit works great.
Create and leverage a server-side cursor. Similar to an Apex query locator.
Analogy: Print me a phone book of everyone in the city so I can flip through it.
QLPK – AJAX Toolkit Response
Chunk the database, in size of your choice, by offsetting the queryLocator:
01gJ000000KnRpDIAV-50000
01gJ000000KnRpDIAV-100000
…
01gJ000000KnRpDIAV-39950000
01gJ000000KnRpDIAV-40000000
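A minimal sketch of how that list of offset locators could be generated, assuming the locator has the "locatorId-offset" shape shown above. The helper name buildLocatorOffsets and its parameters are made up for illustration. It only mirrors the list on the slide; how each offset is then passed to the API is not covered here.

```javascript
// Hypothetical helper: build one "locatorId-offset" string per chunk boundary.
// queryLocator is any locator returned by the initial query, e.g. "01gJ000000KnRpDIAV-500".
function buildLocatorOffsets(queryLocator, totalSize, chunkSize) {
  // Keep the part before the dash and swap in our own offsets.
  const locatorId = queryLocator.split("-")[0];
  const offsets = [];
  for (let offset = chunkSize; offset < totalSize; offset += chunkSize) {
    offsets.push(locatorId + "-" + offset);
  }
  // Mirror the slide: the final entry sits at the total record count.
  offsets.push(locatorId + "-" + totalSize);
  return offsets;
}

const locators = buildLocatorOffsets("01gJ000000KnRpDIAV-500", 40000000, 50000);
console.log(locators.length, locators[0], locators[locators.length - 1]);
```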
QLPK – The Chunks
800 chunks x 50,000 records = 40,000,000 total records
Analogy: we have exact addresses for clusters of 50k people to give to 800 different search parties.
QLPK – How to use in a query?
Perform 800 queries with the Id ranges in the where clause:
SELECT Id, Autonumber__c, Some_Number__c
FROM Large_Object__c
WHERE Some_Number__c > 10 AND Some_Number__c < 20
AND Id >= 'a00J000000BWNYk' AND Id <= 'a00J000000BWO4z'
QLPK – Parallelism
Yeah it’s 800 queries, but…
They all went out at once, and they might all come back at once.
Analogy: We hired 800 search parties and unleashed them on the city at the same time.
Base62PK
Get the first and last Id of the database and extrapolate the ranges in between.
Analogy: Give me the highest and lowest address of everyone in the city and I will make a phonebook with every possible address in it. Then we will break that into chunks.
Base62PK – first and last Id
Get the first Id
SELECT Id FROM Large_Object__c ORDER BY Id ASC LIMIT 1
Get the last Id
SELECT Id FROM Large_Object__c ORDER BY Id DESC LIMIT 1
Even on H-U-G-E databases these return F-A-S-T. No problem.
Base62PK – extrapolate
1. Chop off the last 9 digits of the 15 digit first/last Ids. Decompose.
2. Convert the 9 digit base 62 numbers into a Long Integer.
3. Add the chunk size to the first number until you hit or exceed the last number.
4. Last chunk may be smaller.
5. Convert those Long Integers back to base 62 and re-compose the 15 digit Ids
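The five steps above can be sketched in page-side JavaScript as follows. This is an illustration under stated assumptions, not the presenter's implementation: the function names are made up, the first 6 characters of the Id are treated as a fixed prefix, and BigInt stands in for the Long Integer because a 9 digit base 62 number can exceed Number.MAX_SAFE_INTEGER.

```javascript
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Step 2: convert a base 62 string to an integer.
function base62ToBigInt(s) {
  let n = 0n;
  for (const ch of s) {
    const v = ALPHABET.indexOf(ch);
    if (v < 0) throw new Error("Not a base 62 digit: " + ch);
    n = n * 62n + BigInt(v);
  }
  return n;
}

// Step 5: convert an integer back to base 62, left-padded with zeros to `width`.
function bigIntToBase62(n, width) {
  let s = "";
  while (n > 0n) {
    s = ALPHABET[Number(n % 62n)] + s;
    n /= 62n;
  }
  return s.padStart(width, "0");
}

// Steps 1 to 5: split [firstId, lastId] into inclusive ranges of chunkSize Ids.
function chunkIdRange(firstId, lastId, chunkSize) {
  const prefix = firstId.slice(0, 6); // everything before the last 9 digits
  if (lastId.slice(0, 6) !== prefix) {
    throw new Error("Ids do not share a prefix; this simple sketch cannot chunk them");
  }
  const last = base62ToBigInt(lastId.slice(6, 15));
  const step = BigInt(chunkSize);
  const chunks = [];
  for (let lo = base62ToBigInt(firstId.slice(6, 15)); lo <= last; lo += step) {
    // The last chunk may be smaller, so cap it at the last Id.
    const hi = lo + step - 1n < last ? lo + step - 1n : last;
    chunks.push({
      first: prefix + bigIntToBase62(lo, 9),
      last: prefix + bigIntToBase62(hi, 9)
    });
  }
  return chunks;
}
```

Each returned object has a first and a last property, the two Id bounds that go into the Id >= and Id <= filters of a chunk query.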
Base62PK – issues
• Digits 4 and 5 of the Salesforce Id are the pod identifier. If the Ids in your org have different pod Ids this technique will break, unless enhanced.
• Fragmented Ids lead to sparsely populated ranges. You will search entire ranges of Ids which have no records.
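Both issues can be checked up front from the first Id, the last Id, and a record count. The sketch below is an assumption-laden illustration: the function names are made up, and "fragmentation" is taken here to mean the number of Id slots between the first and last Id divided by the number of records that actually exist, which is one plausible reading of the issue described above.

```javascript
const BASE62_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Decode the last 9 digits of a 15 digit Id into an integer.
function recordNumber(id) {
  let n = 0n;
  for (const ch of id.slice(6, 15)) {
    const v = BASE62_DIGITS.indexOf(ch);
    if (v < 0) throw new Error("Not a base 62 digit: " + ch);
    n = n * 62n + BigInt(v);
  }
  return n;
}

// Digits 4 and 5 (string indexes 3 and 4) are the pod identifier.
function samePod(firstId, lastId) {
  return firstId.slice(3, 5) === lastId.slice(3, 5);
}

// Assumed measure: Id slots in [firstId, lastId] per existing record.
// 1 means densely packed; larger values mean sparser ranges.
function fragmentationRatio(firstId, lastId, recordCount) {
  const span = recordNumber(lastId) - recordNumber(firstId) + 1n;
  return Number(span) / recordCount;
}
```

Note that comparing only the first and last Id does not prove every Id in between shares the same pod identifier.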
So which do I pick?
Heterogeneous Pod Ids | Homogeneous Pod Ids
Low Id Fragmentation (<1.5x) | Medium Id Fragmentation (1.5x - 3x) | High Id Fragmentation (>3x)
QLPK X X X
Base62PK X X
How do I implement?
• Needs to be orchestrated via JS in your page.
• Doesn’t work on Lightning Component Framework. No support for real parallel controller actions. (boxcar’ed)
• Has to be Visualforce or Lightning / Visualforce hybrid.
How do I implement?
• Use RemoteActions to get the chunk queries back into your page.
• Can be granular or aggregate queries!
• Process each chunk query appropriately when it comes back. EX: update totals on a master object or push into a master array.
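function queryChunks() {
    // Fire every chunk query at once; the callbacks can return in any order.
    for (var i = 0; i < chunkList.length; i++) {
        queryChunk(i);
    }
}

function queryChunk(chunkIndex) {
    var chunk = chunkList[chunkIndex];
    Visualforce.remoting.Manager.invokeAction(
        '{!$RemoteAction.Base62PKext.queryChunk}',
        chunk.first, chunk.last,
        function (result, event) {
            // Push the rows from this chunk into the master array.
            for (var i = 0; i < result.length; i++) {
                objectAnums.push(result[i].Autonumber__c);
            }
            // Once every chunk has reported back, move on.
            queryChunkCount++;
            if (queryChunkCount == chunkList.length) {
                allQueryChunksComplete();
            }
        },
        // buffer: false sends each request on its own instead of grouping them.
        {escape: false, buffer: false}
    );
}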
@RemoteAction
public static List<Large_Object__c> queryChunk(String firstId, String lastId) {
    // The Id bounds are concatenated into dynamic SOQL, so escape them first.
    String SOQL = 'SELECT Id, Autonumber__c, Some_Number__c ' +
        'FROM Large_Object__c ' +
        'WHERE Some_Number__c > 10 AND Some_Number__c < 20 ' +
        'AND Id >= \'' + String.escapeSingleQuotes(firstId) + '\' ' +
        'AND Id <= \'' + String.escapeSingleQuotes(lastId) + '\'';
    return Database.query(SOQL);
}
Landmines
• Timeouts – retries
  • Cache warming means if at first you fail, try and try again!
• Concurrency
  • Beware: ConcurrentPerOrgApex Limit exceeded
  • Keep your individual chunk queries lean. < 5 secs.
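A minimal sketch of the retry idea, assuming a callback-style action like the remoting call used in the page. The helper withRetries and its callback shape are hypothetical. In the page, the action would wrap the remoting call for one chunk and report success from event.status.

```javascript
// Hypothetical helper: run a callback-style action, retrying on failure.
// action(cb) must call cb(result, ok); onDone receives (error, result, attempts).
function withRetries(action, maxAttempts, onDone) {
  let attempt = 0;
  function tryOnce() {
    attempt++;
    action(function (result, ok) {
      if (ok) {
        onDone(null, result, attempt);
      } else if (attempt < maxAttempts) {
        // An earlier timed-out attempt may have warmed the cache, so try again.
        tryOnce();
      } else {
        onDone(new Error("Failed after " + attempt + " attempts"), null, attempt);
      }
    });
  }
  tryOnce();
}
```

Capping the number of attempts keeps a chunk that can never succeed from retrying forever and adding to the concurrency pressure noted above.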
Demos
Harrah’s internet doesn’t like 800 parallel http connections.
Video:
https://www.youtube.com/watch?v=KqHOStka0eg
How did you figure this out?
Had to meet requirements for Kenandy’s largest customer. $2.5B / yr manufacturer.
High visibility project.
Necessity is the mother of invention!
How did you figure this out?
QLPK
Ran into an org that had a mixture of sandbox and production IDs. Base62PK broke!
Why doesn’t Salesforce do this?
They do! (kinda)
The Bulk API uses a similar technique, but it is more asynchronous and wrapped in a message container to track progress.
Thank you!
More info:
Article on Salesforce Developers Blog:
https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-orgs.html
Github repo:
https://github.com/danieljpeter/pkChunking
Bulk API documentation:
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm