Transactions over Apache HBase
TRANSACTIONS OVER HBASE
Alex Baranau @abaranau
Gary Helmling @gario
Continuuity
WHO WE ARE
• We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop
• Fast, easy development, deployment and management of Hadoop and HBase apps
• The Continuuity team has years of experience in using and contributing to open source, and we intend to continue doing so
AGENDA
• Transactions in stream processing: Why? What?
• Implementation: How?
  • Omid-style transactions explained
  • Transaction Manager
• What’s next?
THE REACTOR
• Continuuity Reactor is an app platform built on Hadoop and HBase
• Collect, Process, Store, and Query data
• A Flow is a real-time processor with an exactly-once guarantee
• A Flow is composed of flowlets, connected via queues
• All processing happens with ACID guarantees in transactions
PROCESSING IN A FLOW
[Diagram, three animation frames: Queues, a Flowlet, and an HBase Table.]
TRANSACTIONS: WHAT?
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible after commit
• Durable - Once committed, data is persisted reliably
WHAT ABOUT HBASE?
• Atomic operations on cell value: checkAndPut, checkAndDelete, increment, append
• Atomic batch of operations on rows within a region
• No cross-region atomic operations support
• No cross-table atomic operations support
• No multi-RPC atomic operations support
IMPLEMENTATION OVERVIEW
OMID-STYLE TRANSACTIONS
• Multi-Version Concurrency Control
  • Cell version (timestamp) = transaction ID
  • All writes in the same transaction use the transaction ID as timestamp
  • Reads exclude other, uncommitted transactions (for isolation)
• Optimistic Concurrency Control
  • Conflict detection at commit of transaction
  • Write Conflict: two overlapping transactions write the same row
  • Rollback of one transaction in case of conflict (whichever commits later)
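The read rule in the MVCC bullets can be sketched in a few lines of Python. This is a minimal illustration, not the Reactor implementation: the function name `visible_value` and the list-of-tuples cell layout are assumptions made for the example, and the read pointer and exclude list are the per-transaction values described on the Transaction Manager slides.

```python
from typing import Iterable, Optional, Set, Tuple

# One version of a cell: (timestamp, value). The timestamp is the ID of the
# transaction that wrote the version.
Version = Tuple[int, int]


def visible_value(versions: Iterable[Version],
                  read_pointer: int,
                  excludes: Set[int]) -> Optional[int]:
    """Return the newest value a transaction may see, or None.

    A version is visible when its timestamp is at or below the read pointer
    and it was not written by an excluded transaction.
    """
    candidates = [
        (ts, value) for ts, value in versions
        if ts <= read_pointer and ts not in excludes
    ]
    if not candidates:
        return None
    # The version with the highest timestamp wins.
    return max(candidates, key=lambda version: version[0])[1]


# Hypothetical cell: 1002 was written by a transaction that must be excluded.
cell = [(1001, 10), (1002, 11)]
print(visible_value(cell, read_pointer=1003, excludes={1002}))  # prints 10
```

Keeping the exclude list on the client means a plain timestamp filter is enough to get isolation; no extra lookup per cell is needed in this sketch.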
OPTIMISTIC CONCURRENCY CONTROL
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback is higher
• Good if conflicts are rare: short transactions, disjoint partitioning of work
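Conflict detection at commit time can be sketched as follows. This is an illustration under stated assumptions, not the actual Tx Manager: the class `ConflictDetector`, its commit counter, and the row-name change sets are hypothetical. Two transactions overlap when one commits after the other has started, and the one that commits later loses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ConflictDetector:
    """Hypothetical helper: remembers which rows each committed tx wrote."""
    commit_seq: int = 0  # number of commits so far
    start_seq: Dict[int, int] = field(default_factory=dict)
    committed: List[Tuple[int, Set[str]]] = field(default_factory=list)

    def start(self, tx_id: int) -> None:
        # Remember how many commits had happened when this tx began.
        self.start_seq[tx_id] = self.commit_seq

    def try_commit(self, tx_id: int, rows: Set[str]) -> bool:
        """Record the commit and return True, or return False on a conflict."""
        began_at = self.start_seq.pop(tx_id)
        for seq, other_rows in self.committed:
            # Only transactions that committed after we started overlap us.
            if seq > began_at and rows & other_rows:
                return False
        self.commit_seq += 1
        self.committed.append((self.commit_seq, set(rows)))
        return True


detector = ConflictDetector()
detector.start(1002)
detector.start(1003)
first = detector.try_commit(1003, {"row1"})   # commits first: succeeds
second = detector.try_commit(1002, {"row1"})  # same row, commits later: conflict
print(first, second)  # prints True False
```

No row is locked while the transactions run; the only cost is the check at commit, which is why this approach pays off when conflicts are rare.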
TRANSACTIONS IN CONTEXT
[Diagram: Clients 1 to N, the active Tx Manager and a standby Tx Manager, ZooKeeper, and HBase with Master 1, Master 2, and region servers RS 1 to RS 4.]
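TRANSACTION LIFE CYCLE
[State diagram: the Client and the Tx Manager communicate through an RPC API.]
• Client: start tx, do work (write to HBase), try commit; if the commit fails, roll back in HBase and try abort; if that fails, invalidate
• Tx Manager states: none, in progress, complete, invalid
• Tx Manager transitions: start, commit (check conflicts; succeeded or failed), abort, invalidate, time out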
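The client side of this life cycle can be sketched as a small control flow. Everything here is a stand-in for illustration: `FakeTxManager`, `FakeStore`, `RollbackError`, and `run_transaction` are hypothetical names, the returned strings are labels chosen for the example, and the time-out path is left out.

```python
class RollbackError(Exception):
    """Raised by the stand-in store when undoing writes fails."""


class FakeTxManager:
    """Stand-in for the Tx Manager RPC API; records the calls it receives."""

    def __init__(self, conflict):
        self.conflict = conflict
        self.calls = []

    def start(self):
        self.calls.append("start")
        return 1002  # the write pointer doubles as the transaction ID

    def try_commit(self, tx):
        self.calls.append("commit")
        return not self.conflict

    def abort(self, tx):
        self.calls.append("abort")

    def invalidate(self, tx):
        self.calls.append("invalidate")


class FakeStore:
    """Stand-in for HBase; can be told to fail on rollback."""

    def __init__(self, fail_rollback=False):
        self.fail_rollback = fail_rollback

    def rollback(self, tx):
        if self.fail_rollback:
            raise RollbackError(tx)


def run_transaction(tx_manager, store, work):
    """Run `work` in a transaction and return a label for the outcome."""
    tx = tx_manager.start()
    work(store, tx)                    # do work: write to HBase
    if tx_manager.try_commit(tx):      # the Tx Manager checks for conflicts
        return "complete"
    try:
        store.rollback(tx)             # undo this transaction's writes
    except RollbackError:
        tx_manager.invalidate(tx)      # writes remain, so readers must skip them
        return "invalid"
    tx_manager.abort(tx)
    return "aborted"


def no_op(store, tx):
    pass


print(run_transaction(FakeTxManager(conflict=False), FakeStore(), no_op))
print(run_transaction(FakeTxManager(conflict=True), FakeStore(), no_op))
print(run_transaction(FakeTxManager(conflict=True), FakeStore(fail_rollback=True), no_op))
```

The three calls print complete, aborted, and invalid, one for each path through the diagram.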
CLIENT SIDE: TX AWARE
[Animation, shown here as steps. HBase versions of row1:col1 are listed as (timestamp, value).]
• Initial state. HBase: (1001, 10). Tx Manager: write = 1002, read = 1001
• Client 1 starts a transaction and receives write = 1002, read = 1001. Tx Manager: write = 1003, read = 1001, inprogress=[1002]
• Client 1 increments row1:col1. HBase: (1001, 10), (1002, 11)
• Client 2 starts a transaction and receives write = 1003, read = 1001, excluded=[1002]. Tx Manager: write = 1004, read = 1001, inprogress=[1002, 1003]
• Client 2 increments row1:col1. Because 1002 is excluded, it reads 10 and writes 11. HBase: (1001, 10), (1002, 11), (1003, 11)
• Client 2 commits. Tx Manager: write = 1004, read = 1001, inprogress=[1002]
• Client 1 commits. The Tx Manager detects a conflict!
• Client 1 rolls back its write in HBase. HBase: (1001, 10), (1003, 11)
• Client 1 aborts. Tx Manager: write = 1004, read = 1003, inprogress=[]
• Client 2 starts a new transaction and receives write = 1004, read = 1003. Tx Manager: write = 1005, read = 1003, inprogress=[]
• Client 2 reads row1:col1 and sees the committed version (1003, 11)
[Alternative ending of the animation: Client 1's rollback fails.]
• Client 1 commits and the Tx Manager detects a conflict! HBase: (1001, 10), (1002, 11), (1003, 11). Tx Manager: write = 1004, read = 1001, inprogress=[1002]
• Client 1 tries to roll back its write in HBase, but the rollback fails. The version (1002, 11) stays in HBase
• Client 1 invalidates the transaction. Tx Manager: write = 1004, read = 1003, inprogress=[], invalid=[1002]
• Client 2 starts a new transaction and receives write = 1004, read = 1003, exclude=[1002]. Tx Manager: write = 1005, read = 1003, inprogress=[], invalid=[1002]
• Client 2 reads row1:col1. The version (1002, 11) is invisible!
TRANSACTION MANAGER
• Creates new transactions
• Provides monotonically increasing write pointers
• Maintains all in-progress, committed, and invalid transactions
• Detects conflicts
• Transaction =
  • Write pointer: timestamp for HBase writes
  • Read pointer: upper bound timestamp for reads
  • Excludes: list of timestamps to exclude from reads
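The transaction triple and the start operation can be sketched like this. The class and field names are assumptions for the example. Commit, abort, and the rule for advancing the read pointer are deliberately left out, and building the exclude list from the in-progress and invalid sets is inferred from the client-side walkthrough.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Transaction:
    write_pointer: int        # timestamp for HBase writes
    read_pointer: int         # upper bound timestamp for reads
    excludes: FrozenSet[int]  # timestamps to exclude from reads


class TxManager:
    """Hypothetical in-memory manager that only knows how to start transactions."""

    def __init__(self, read_pointer: int):
        self.read_pointer = read_pointer
        self.write_pointer = read_pointer + 1
        self.in_progress: Set[int] = set()
        self.invalid: Set[int] = set()

    def start(self) -> Transaction:
        # Writes of in-progress and invalid transactions must stay hidden.
        excludes = frozenset(self.in_progress | self.invalid)
        tx = Transaction(self.write_pointer, self.read_pointer, excludes)
        self.in_progress.add(tx.write_pointer)
        self.write_pointer += 1  # write pointers increase monotonically
        return tx


manager = TxManager(read_pointer=1001)
tx1 = manager.start()
tx2 = manager.start()
print(tx1.write_pointer, tx1.read_pointer, sorted(tx1.excludes))  # 1002 1001 []
print(tx2.write_pointer, tx2.read_pointer, sorted(tx2.excludes))  # 1003 1001 [1002]
```

The two starts reproduce the first steps of the walkthrough: the second transaction is told to skip anything written at timestamp 1002.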
TRANSACTION MANAGER
• Simple & Fast
• All required state is in-memory
• Single point of failure?
  • Persist all state to a write-ahead log
  • Secondary Tx Manager watches for failure of Primary
  • Failover can happen quickly
[Diagram, three animation frames: the Tx Manager's current state (in progress, committed, invalid, read point, write point) and the Tx Log in HDFS.]
• start(): the transaction is added to in progress, the write point is incremented, and "started, <write pt>" is appended to the Tx Log
• commit(): the transaction is removed from in progress and added to committed, and "commit, <write pt>" is appended to the Tx Log
TRANSACTION SNAPSHOTS
• Write-ahead log provides persistence
  • Guarantees point-in-time recovery
  • The longer the log grows, the longer recovery takes
• Periodically write snapshot of full transaction state
  • Snapshot + all new logs provides full state
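Recovery from a snapshot plus the newer log entries can be sketched as a replay loop. The log line format follows the "started, <write pt>" and "commit, <write pt>" entries shown on the Transaction Manager slides; the JSON snapshot layout and the function names are assumptions for the example, and only these two entry types are handled.

```python
import json
from typing import Dict, Iterable, List


def apply_entry(state: Dict[str, List[int]], entry: str) -> None:
    """Apply one log line of the form '<op>, <write pt>' to the state."""
    op, write_pt = (part.strip() for part in entry.split(","))
    tx = int(write_pt)
    if op == "started":
        state["in_progress"].append(tx)
    elif op == "commit":
        state["in_progress"].remove(tx)
        state["committed"].append(tx)
    else:
        raise ValueError(f"unknown log entry: {entry!r}")


def recover(snapshot_json: str, log_entries: Iterable[str]) -> Dict[str, List[int]]:
    """Rebuild the state from the latest snapshot plus the log written after it."""
    state = json.loads(snapshot_json)
    for entry in log_entries:
        apply_entry(state, entry)
    return state


# Hypothetical snapshot taken while transaction 1004 was in progress.
snapshot = json.dumps({"in_progress": [1004], "committed": []})
newer_log = ["started, 1005", "commit, 1004"]
print(recover(snapshot, newer_log))
# prints {'in_progress': [1005], 'committed': [1004]}
```

Only the log written after the snapshot has to be replayed, which keeps recovery time bounded no matter how long the manager has been running.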
[Diagram in numbered steps. Start: the Tx Manager's current state (in progress, committed, invalid, read point, write point) and Tx Log A in HDFS. Step 1 adds Tx Log B in HDFS. Step 2 adds a State Snapshot with the same fields in the Tx Manager. Steps 3 and 4 add a Tx Snapshot with the same fields in HDFS, next to Tx Log A and Tx Log B.]
TRANSACTION CLEANUP
[Diagram: the state after Client 1's failed rollback. HBase: row1:col1 has versions (1001, 10), (1002, 11), (1003, 11). Client 1: write = 1002, read = 1001. Tx Manager: write = 1004, read = 1001, inprogress=[1002].]
TRANSACTION CLEANUP: DATA JANITOR
• RegionObserver coprocessor
• Maintains in-memory snapshot of recent invalid & in-progress sets
• Periodically updates from transaction snapshot in HDFS
• Purges data from invalid transactions and older versions on flush & compaction
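The purge rule can be sketched as a filter over one cell's versions. This is a simplified illustration, not the coprocessor code: `versions_to_keep` is a hypothetical name, and it assumes that no running transaction still needs a version older than the newest one at or below the read point. The version (1001, 10) is added to the example data to show an older version being dropped.

```python
from typing import Iterable, List, Set, Tuple

Version = Tuple[int, int]  # (timestamp, value)


def versions_to_keep(versions: Iterable[Version],
                     read_point: int,
                     invalid: Set[int]) -> List[Version]:
    """Filter one cell's versions the way a flush-time janitor might.

    Versions written by invalid transactions are dropped. Versions above
    the read point may belong to in-progress transactions and are kept.
    Of the versions at or below the read point, only the newest is kept.
    """
    kept: List[Version] = []
    seen_visible = False
    for ts, value in sorted(versions, reverse=True):  # newest first
        if ts in invalid:
            continue
        if ts > read_point:
            kept.append((ts, value))
        elif not seen_visible:
            kept.append((ts, value))
            seen_visible = True
    return kept


memstore_cell = [(1004, 12), (1003, 11), (1002, 11), (1001, 10)]
print(versions_to_keep(memstore_cell, read_point=1003, invalid={1002}))
# prints [(1004, 12), (1003, 11)]
```

Doing this work during flush and compaction means the cleanup rides along with I/O that HBase performs anyway.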
[Animation, shown here as steps.]
• The Data Janitor (RegionObserver) refreshes its state from the Tx Snapshot: read point = 1003, write point = 1005, in progress = [1004], committed = [], invalid = [1002]
• The MemStore holds row1:col1 with versions (1004, 12), (1003, 11), (1002, 11)
• On preFlush(), a custom RegionScanner filters the MemStore contents on their way to the HFile
• (1004, 12) is written to the HFile
• (1003, 11) is written to the HFile
• (1002, 11) belongs to an invalid transaction and is not written. The HFile holds (1004, 12) and (1003, 11)
WHAT’S NEXT?
• Open Source
• Continue Scaling Tx Manager
  • Transaction Groups?
• Integration across other transactional stores
QS?
Looking for the chance to work with a team that is defining a new category within Big Data?
We are hiring! http://continuuity.com/careers