omid: efficient transaction management and incremental processing for hbase (hadoop summit europe...

Efficient Transaction Management and Incremental Processing for HBase

Daniel Gómez Ferro 21/03/2013

About me

§ Research engineer at Yahoo!

§ Main Omid developer

§ Apache S4 committer

§ Contributor to: • ZooKeeper • HBase

Omid - Yahoo! Research 2

Outline

§ Motivation

§ Goal: transactions on HBase

§ Architecture

§ Examples

§ Future work

3 Omid - Yahoo! Research

Motivation

§ Traditional DBMS: • SQL • Transactions • Scalable

§ NoSQL stores: • SQL • Transactions • Scalable

§ Some use cases require • Scalability • Consistent updates

Motivation

§ Requested feature • Support for transactions in HBase is a common request

§ Percolator • Google implemented transactions on BigTable

§ Incremental Processing • MapReduce latencies measured in hours • Reduce latency to seconds • Requires transactions

Incremental Processing

State0 State1 Shared State

Offline MapReduce

Online Incremental Processing

Fault tolerance Scalability

Shared state

Using Percolator, Google dramatically reduced crawl-to-index time

Transactions on HBase

§ ‘Hope’ in Persian

§ Optimistic Concurrency Control system • Lock free • Aborts transactions at commit time

§ Implements Snapshot Isolation

§ Low overhead

§ http://yahoo.github.com/omid Omid - Yahoo! Research 8

Snapshot Isolation

§ Transactions read from their own snapshot

§ Snapshot: •  Immutable •  Identified by creation time • Values committed before creation time

§ Transactions conflict if two conditions apply: A)  Overlap in time B)  Write to the same row

Transaction conflicts

r1 r2 r3 r4 Time

§ Transaction B overlaps with A and C • B ∩ A = ∅ • B ∩ C = { r4 } • B and C conflict

§ Transaction D doesn’t conflict Omid - Yahoo! Research 10

Write set:

Time overlap:

Omid Architecture

Omid Clients Omid

Clients

Architecture

1.  Centralized server 2.  Transactional metadata replicated to clients 3.  Store transaction ids in HBase timestamps

HBase Region Servers

Status Oracle (SO)

(1) (2)

Omid Clients

Status Oracle

§ Limited memory • Evict old metadata • … maintaining consistency guarantees!

§ Keep LowWatermark • Updated on metadata eviction • Abort transactions older than LowWatermark • Necessary for Garbage Collection

Transactional Metadata

Status Oracle

Row LastCommitTS r1 T20 … …

StartTS CommitTS T10 T20 … …

Aborted T2 T18 …

LowWatermark T4

Metadata Replication

§ Transactional metadata is replicated to clients

§ Status Oracle not in critical path • Minimize overhead

§ Clients guarantee Snapshot Isolation •  Ignore data not in their snapshot • Talk directly to HBase • Conflicts resolved by the Status Oracle

§ Expensive, but scalable up to 1000 clients

Omid Clients Omid

Clients

Architecture recap

Omid Clients

Row LastCommitTS r1 T20 … …

StartTS CommitTS T10 T20 … …

Aborted T2 T18 …

LowWatermark T4

Status Oracle

Architecture + Metadata

Status Oracle

R LC r1 20

ST CT 10 20

Abort …

LowW …

ST CT 10 20

Abort …

LowW …

Client

Conflict detection (not replicated)

row TS val r1 10 hi

HBase values tagged with Start TimeStamp

Comparison with Percolator

Clients HBase

Status Oracle

Percolator

Clients BigTable

TS Server

Simple API

§ TransactionManager: Transaction begin(); void commit(Transaction t) throws RollbackException; void rollback(Transaction t);

§ TTable:

Result get(Transaction t, Get g); void put(Transaction t, Put p); ResultScanner getScanner(Transaction t, Scan s);

Example Transaction

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

Status Oracle

R LC r1 20

ST CT 10 20

Abort …

LowW …

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >>

t ST: 25

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >>

t ST: 25

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >>

t ST: 25 R: {r1}

row TS val r1 10 hi r1 25 bye

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 30

ST CT 25 30

Abort …

LowW 20

>> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> TM.commit(t) Committed{ CT: 30 } >>

t ST: 25 R: {r1}

row TS val r1 10 hi r1 25 bye

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

Abort 15

>> t_old Transaction{ ST: 15 } >> TT.put(t_old, ‘r1’, ‘yo’) >> TM.commit(t_old) Aborted >>

t_old ST: 15 R: {r1}

row TS val r1 15 yo r1 25 bye

R LC r1 30

ST CT 25 30

LowW …

Centralized Approach

Omid Performance

§ Centralized server = bottleneck? • Focus on good performance •  In our experiments, it never became the bottleneck

1K 10K 100K

Throughput in TPS

2 rows4 rows8 rows

16 rows32 rows

128 rows512 rows

512 rows/t 1K TPS 10 ms

2 rows/t 100K TPS

Fault Tolerance

§ Omid uses BookKeeper for fault tolerance • BookKeeper is a distributed Write Ahead Log

§ Recovery: reads log’s tail (bounded time)

5900 5950 6000 6050 6100

Throughput in TPS

Time in s

Downtime < 25 s

Example Use Case News Recommendation System

News Recommendation System

§ Model-based recommendations • Similar users belong to a cluster • Each cluster has a trained model

§ New article • Evaluate against clusters • Recommend to users

§ User actions • Train model • Split cluster

Transactions Advantages

§ All operations are consistent • Updates and queries

§ Needed for a recommendation system? • We avoid corner cases •  Implementing existing algorithms is straightforward

§ Problems we solve • Two operations concurrently splitting a cluster • Query while cluster splits • …

Example overview

Articles & User actions Queries

Index Data Store Wor

Updates

Other use cases

§ Update indexes incrementally • Original Percolator use case

§ Generic graph processing

§ Adapt existing transactional solutions to HBase

Future Work

Current Challenges

§ Load isn’t distributed automatically

§ Complex logic requires big transactions • More likely to conflict • More expensive to retry

§ API is low level for data processing

Future work

§ Framework for Incremental Processing

• Simpler API

• Trigger-based, easy to decompose operations

• Auto scalable

§ Improve user friendliness • Debug tools, metrics, etc

§ Hot failover

Questions?

All time contributors Ben Reed

Flavio Junqueira Francisco Pérez-Sorrosal

Ivan Kelly Matthieu Morel

Maysam Yabandeh

Partially supported by CumuloNimbo project (ICT-257993)

funded by the European Commission

Try it! yahoo.github.com/omid

Backup

Commit Algorithm

§ Receive commit request for txn_i

§ Set CommitTimestamp(txn_i) <= new timestamp

§ For each modified row: •  if LastCommit(row) > StartTimestamp(txn_i) => abort

§ For each modified row: • Set LastCommit(row) <= CommitTimestamp(txn_i)

omid: efficient transaction management and incremental processing for hbase (hadoop summit europe...

Technology

formal powerpoint...

university of tehran 1 microprocessor system design...

university of tehran 1 interface design compute memory...

taking omid to the clouds: fast, scalable transactions for...

university of tehran 1 microprocessor system design omid...

omid: a transactional framework for hbase

omid tahernia

building a linq provider for hbase mapreduce ·...

omid efficient transaction management and incremental...

university of tehran 1 microprocessor system design omid...

cb omid-roozmand

dr. omid shojaei, ceo

taking omid to the clouds: fast, scalable transactions for...

university of tehran 1 microprocessor system design timers...

university of tehran 1 microprocessor system design omid...

omid abdi - cv-senior sa

big data hadoop fulllbulk loading in hbase lcreate, insert,...

omid thesis

intro to hbase internals & schema design (for hbase users)

university of tehran 1 interface design memory modules omid...