MySQL Users' Conference April 2009
Falcon - built for speed
Ann Harrison
Kevin Lewis
If it's so fast, why isn't it done yet?
Talk overview
  Falcon at a glance
  Project history
  Multi-threading for the database developer
  Cycle locking
Falcon at a glance – read first record
[Diagram: MySQL Server, Record Cache, Page Cache, Serial Log Windows, Database Tablespaces, Serial Log Files]
Falcon at a glance – read complete
Falcon at a glance – read again
Falcon at a glance – write new record
Falcon at a glance – commit
Falcon at a glance – write complete
Falcon history
Origin
Transactional SQL Engine for Web App Environment
Bought by MySQL in 2006
MVCC
Consistent Read
Versions control write access
Memory only – no steal
Indexes and data separate
Data encoded on disk and in memory
Fine grained multi-threading
Falcon Goals circa 2006
Exploit large memory for more than just a bigger cache
Use threads and processors for data migration
Eliminate tradeoffs, minimize tuning
Scale gracefully to very heavy loads
Support web applications
Web application characteristics
  Large archive of data
  Smaller active set
  High read:write ratio
  Uneven, bursty activity
What we did instead
Enforce limit on record cache size
Respond to simple atypical loads
  Autocommit single record access
  Repeat “insert ... select”
  Single pass read of large data set
Challenge InnoDB on DBT2
  Large working set
  Continuous heavy load
Hired the world's most vicious test designer
Record Cache
Record Cache contains:
  Committed records with no versions
  New, uncommitted records
  Records with multiple versions
Record Cache cleanup – step 1
Clean up old committed single version records
Scavenger
  Runs on schedule or demand
  Removes oldest mature records
  Settable limits – start and stop
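The start and stop limits can be pictured as a high and low water mark. The sketch below is a toy illustration, not Falcon's code: the class `RecordCache`, the parameters `scavenge_start` and `scavenge_stop`, and the use of a `committed` flag as a stand-in for "mature" are all assumptions made for the example.

```python
from collections import OrderedDict


class RecordCache:
    """Toy record cache that scavenges between a start and a stop limit."""

    def __init__(self, scavenge_start, scavenge_stop):
        # Hypothetical limits in bytes: scavenging starts above
        # scavenge_start and stops once usage is at or below scavenge_stop.
        assert scavenge_stop < scavenge_start
        self.scavenge_start = scavenge_start
        self.scavenge_stop = scavenge_stop
        self.records = OrderedDict()  # record id -> (size, committed), oldest first
        self.used = 0

    def add(self, record_id, size, committed=True):
        old = self.records.pop(record_id, None)
        if old is not None:
            self.used -= old[0]
        self.records[record_id] = (size, committed)
        self.used += size
        if self.used > self.scavenge_start:  # scavenge on demand
            self.scavenge()

    def scavenge(self):
        """Drop the oldest committed records until usage reaches the stop limit."""
        removed = []
        for record_id, (size, committed) in list(self.records.items()):
            if self.used <= self.scavenge_stop:
                break
            if not committed:
                continue  # new, uncommitted records are left for other cleanup steps
            del self.records[record_id]
            self.used -= size
            removed.append(record_id)
        return removed
```

A scheduled scavenger would call `scavenge()` from a timer thread as well; only the on-demand path is shown here.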
Record Cache Cleanup – step 2
Clean out record versions too old to be useful
Prune
  Remove old, unneeded versions
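One way to read "too old to be useful" is: no active transaction can still see the version. The sketch below assumes a simplified visibility rule that is not taken from the slides: transaction ids increase over time, and a version committed by id `c` is visible to every transaction whose id is greater than `c`. The function `prune` and the parameter `oldest_active` are hypothetical names.

```python
def prune(chain, oldest_active):
    """Return the versions of one record that are still needed.

    chain: commit ids of the record's versions, newest first.
    oldest_active: id of the oldest transaction still running (assumed known).
    """
    kept = []
    for commit_id in chain:
        kept.append(commit_id)
        if commit_id < oldest_active:
            # Every active transaction sees this version or a newer one,
            # so nothing older can be reached any more.
            break
    return kept
```

For example, with versions committed by transactions 50, 40, 30, 20 and 10 and an oldest active transaction of 35, the versions from 20 and 10 can go.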
Record Cache Cleanup – step 3
Clean up a cache full of new records
Chill
  Copy new record data to log
  Done by transaction thread
  Settable start size
Record Cache Cleanup – step 4
Clean up multiple versions of a single record created by a single transaction
Remove intermediate versions
  Created by a single transaction
  Rolled back to save point
  Repeated updates
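As a rough illustration of removing intermediate versions, the sketch below collapses each run of consecutive versions written by the same transaction down to the newest one. It is a simplification under stated assumptions: the chain is a plain list, the name `remove_intermediate` is hypothetical, and save point bookkeeping (which may require keeping some intermediate versions until the save point is released) is ignored.

```python
def remove_intermediate(chain):
    """Keep only the newest version from each run written by one transaction.

    chain: list of (txn_id, value) pairs for one record, newest first.
    """
    kept = []
    for txn_id, value in chain:
        if kept and kept[-1][0] == txn_id:
            continue  # older version from the same transaction, superseded
        kept.append((txn_id, value))
    return kept
```

In a real engine this rewrite touches a shared record chain, which is why the later slides classify it as a locking operation.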
Record Cache Cleanup – step 5
Clean up records with multiple versions, still potentially visible
Backlog
  Copy entire record tree to disk
  Expensive
  Not yet working
Simple, atypical loads
Challenge:
  Autocommit single record access
  Record cache is useless
  Record encoding is useless
  Transaction creation / destruction is too expensive
Response:
  Reuse read only transactions
Result:
  Multi-threaded bookkeeping nightmare
Simple, atypical loads
Challenge:
  Repeat “insert ... select...”
  Fill cache with old and new records
First solution
  Scavenge old records
  Chill new record data
Second solution
  Move the record headers out
  Also helps index creation
Simple, atypical loads
Single pass read of large data set
  Read more records than
  Read them over and over
  Caches are useless
  Encoding is overhead
Response:
  Make encoding optional?
Challenge InnoDB on DBT2
Initial results were not encouraging (2007)
[Chart: transactions (0 to 30000) against connections (10, 20, 50, 100, 150, 200); series: Falcon 2007, InnoDB 2007]
Challenge InnoDB on DBT2
But Falcon has improved a lot since April 2007
[Chart: transactions (0 to 30000) against connections (10, 20, 50, 100, 150, 200); series: Falcon 2007, InnoDB 2007, Falcon 2009]
Challenge InnoDB on DBT2
So did InnoDB
[Chart: transactions (0 to 30000) against connections (10, 20, 50, 100, 150, 200); series: Falcon 2007, InnoDB 2007, Falcon 2009, InnoDB 2009]
Bug trends
Multi-threading
Databases are a natural fit for multi-threading
  Connections
  Gophers
  Scavenger
  Disk reader/writer
Except for shared structures
Locking blocks parallel operations
Challenge – sharing without locking
Multi-threading
Non-locking operation
  Purge old record versions
Multi-threading
Locking operation
  Remove intermediate versions
What granularity of lock?
Multi-threading – Lock granularity
One per record: Too many interlocked instructions
One per record group: Thread reading one record prevents scavenge of another
No answer is right – more options?
Cycle locking – read record chain
Before starting to read a record chain, get a shared lock on a “cycle”
Transaction A
Transaction B
Transaction C
Cycle 1 = 3 shared
Cycle 2 inactive
Cycle locking – clean a record chain
Before starting to read a record chain, get a shared lock on a “cycle”
Transaction A active in Cycle 1
Transaction B active in Cycle 1
Transaction C active in Cycle 1
Scavenger unlinks versions from record chain and links them to a “to be deleted” list.
Cycle 1 = 4 shared
Cycle 2 inactive
Cycle locking – records relinked
Transaction A releases lock
Transaction B releases lock
Transaction C still active
Scavenger releases lock
Cycle 1 = 1 shared
Cycle 2 inactive
Cycle locking – swap cycles
New access locks cycle 2
Transaction C holds Cycle 1 lock
Cycle Manager requests exclusive on Cycle 1 (pumps cycle)
Transaction A acquires Cycle 2 lock
Cycle 1 = 1 shared
Cycle 2 = 1 shared
Cycle locking – cleanup phase
Transaction C releases lock
Transaction B acquires Cycle 2 lock
Cycle manager exclusive Cycle 1
Cycle 1 = 0 shared, exclusive
Cycle 2 = 2 shared
Cycle locking – cleanup complete
Transaction C acquires Cycle 2 lock
Cycle manager exclusive Cycle 1
Remove unlinked, unloved, old versions
When cleanup is done, Cycle manager releases cycle 1
Cycle 1 exclusive
Cycle 2 = 2 shared
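The cycle locking sequence on these slides can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Falcon's implementation: `SharedExclusiveLock`, `CycleManager`, `enter`, `leave`, `retire` and `pump` are hypothetical names, cycles are numbered 0 and 1 rather than 1 and 2, a single cycle manager thread is assumed to call `pump`, and "removing" a version is represented by returning it.

```python
import threading


class SharedExclusiveLock:
    """Minimal shared/exclusive lock with no fairness guarantees."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared = 0
        self._exclusive = False

    def acquire_shared(self):
        with self._cond:
            while self._exclusive:
                self._cond.wait()
            self._shared += 1

    def release_shared(self):
        with self._cond:
            self._shared -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._exclusive or self._shared:
                self._cond.wait()
            self._exclusive = True

    def release_exclusive(self):
        with self._cond:
            self._exclusive = False
            self._cond.notify_all()


class CycleManager:
    """Two cycles; readers and the scavenger share the current one."""

    def __init__(self):
        self._cycles = [SharedExclusiveLock(), SharedExclusiveLock()]
        self._current = 0
        self._pending = []                # the "to be deleted" list
        self._switch = threading.Lock()   # guards _current and _pending

    def enter(self):
        """Take a shared lock on the current cycle and return its index."""
        with self._switch:
            index = self._current
            # The current cycle is never held exclusively, so this does not block.
            self._cycles[index].acquire_shared()
        return index

    def leave(self, index):
        self._cycles[index].release_shared()

    def retire(self, version):
        """Queue an unlinked version; call between enter() and leave()."""
        with self._switch:
            self._pending.append(version)

    def pump(self):
        """Swap cycles, wait for the old cycle to drain, then clean up."""
        with self._switch:
            batch, self._pending = self._pending, []
            old = self._current
            self._current = 1 - old       # new access locks the other cycle
        self._cycles[old].acquire_exclusive()  # waits for old-cycle holders
        try:
            removed = list(batch)         # stand-in for deleting the versions
        finally:
            self._cycles[old].release_exclusive()
        return removed
```

The design point is that readers pay only for one shared lock per record chain walk, rather than one per record or record group. Anyone who might still hold a pointer to a retired version entered the old cycle before the swap, so once the exclusive lock on that cycle is granted the versions are safe to remove.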
Questions