omid: efficient transaction management and incremental processing for hbase (hadoop summit europe...

Post on 02-Dec-2014

1.366 Views

Category:

Technology

3 Downloads

Preview:

Click to see full reader

DESCRIPTION

http://yahoo.github.com/omid Many data processing tasks can be thought as small mutations to a large database triggered by events. Contrary to batch processing, the incremental processing model achieves very low delay from the reception of an event to the application of the mutation. In the absence of transactions support, developers have to use ad-hoc mechanisms to ensure atomic execution of mutations despite failures and concurrent accesses to the database by other clients. Most NoSQL data stores, like HBase, BigTable and Cassandra, lack support of transactions, which makes them unsuitable for a whole range of applications. In this talk we present Omid, an open source tool for transactional support and incremental processing on top of HBase (https://github.com/yahoo/omid). Due to the centralized nature of its client-replicated status oracle, Omid (i) avoids distributed locks, (ii) scales up to 60,000 TPS and a thousand clients, (iii) requires no changes to HBase, and (vi) adds a negligible overhead to data servers.

TRANSCRIPT

Omid

Efficient Transaction Management and Incremental Processing for HBase

Copyright © 2013 Yahoo! All rights reserved. No reproduction or distribution allowed without express written permission.

Daniel Gómez Ferro 21/03/2013

About me

§ Research engineer at Yahoo!

§ Main Omid developer

§ Apache S4 committer

§ Contributor to: • ZooKeeper • HBase

Omid - Yahoo! Research 2

Outline

§ Motivation

§ Goal: transactions on HBase

§ Architecture

§ Examples

§ Future work

3 Omid - Yahoo! Research

Motivation

§ Traditional DBMS: • SQL • Transactions • Scalable

§ NoSQL stores: • SQL • Transactions • Scalable

§ Some use cases require • Scalability • Consistent updates

Omid - Yahoo! Research 4

Motivation

§ Requested feature • Support for transactions in HBase is a common request

§ Percolator • Google implemented transactions on BigTable

§ Incremental Processing • MapReduce latencies measured in hours • Reduce latency to seconds • Requires transactions

Omid - Yahoo! Research 5

Incremental Processing

State0 State1 Shared State

Offline MapReduce

Online Incremental Processing

Fault tolerance Scalability

Fault tolerance Scalability

Shared state

Using Percolator, Google dramatically reduced crawl-to-index time

Omid - Yahoo! Research 6

Transactions on HBase

Omid

§ ‘Hope’ in Persian

§ Optimistic Concurrency Control system • Lock free • Aborts transactions at commit time

§ Implements Snapshot Isolation

§ Low overhead

§ http://yahoo.github.com/omid Omid - Yahoo! Research 8

Snapshot Isolation

§ Transactions read from their own snapshot

§ Snapshot: •  Immutable •  Identified by creation time • Values committed before creation time

§ Transactions conflict if two conditions apply: A)  Overlap in time B)  Write to the same row

Omid - Yahoo! Research 9

Transaction conflicts

r1 r2 r3 r4 Time

§ Transaction B overlaps with A and C • B ∩ A = ∅ • B ∩ C = { r4 } • B and C conflict

§ Transaction D doesn’t conflict Omid - Yahoo! Research 10

Write set:

r3 r4

r2 r4

r1 r3

Id:

A

B

C

D

Time overlap:

Omid Architecture

Omid Clients Omid

Clients

Architecture

1.  Centralized server 2.  Transactional metadata replicated to clients 3.  Store transaction ids in HBase timestamps

HBase Region Servers

Status Oracle (SO)

HBase Region Servers

HBase Region Servers

HBase Region Servers

(1) (2)

(3)

Omid - Yahoo! Research 12

Omid Clients

SO

Status Oracle

§ Limited memory • Evict old metadata • … maintaining consistency guarantees!

§ Keep LowWatermark • Updated on metadata eviction • Abort transactions older than LowWatermark • Necessary for Garbage Collection

Omid - Yahoo! Research 13

Transactional Metadata

Status Oracle

Row LastCommitTS r1 T20 … …

StartTS CommitTS T10 T20 … …

Aborted T2 T18 …

LowWatermark T4

Omid - Yahoo! Research 14

Metadata Replication

§ Transactional metadata is replicated to clients

§ Status Oracle not in critical path • Minimize overhead

§ Clients guarantee Snapshot Isolation •  Ignore data not in their snapshot • Talk directly to HBase • Conflicts resolved by the Status Oracle

§ Expensive, but scalable up to 1000 clients

Omid - Yahoo! Research 15

Omid Clients Omid

Clients

Architecture recap

HBase Region Servers

HBase Region Servers

HBase Region Servers

HBase Region Servers

Omid - Yahoo! Research 16

Omid Clients

SO

Row LastCommitTS r1 T20 … …

StartTS CommitTS T10 T20 … …

Aborted T2 T18 …

LowWatermark T4

Status Oracle

Architecture + Metadata

Status Oracle

R LC r1 20

ST CT 10 20

Abort …

LowW …

ST CT 10 20

Abort …

LowW …

Client

Omid - Yahoo! Research 17

Conflict detection (not replicated)

HBase

row TS val r1 10 hi

HBase values tagged with Start TimeStamp

Comparison with Percolator

18 Omid - Yahoo! Research

Omid

Clients HBase

Status Oracle

Percolator

Clients BigTable

TS Server

Simple API

§ TransactionManager: Transaction begin(); void commit(Transaction t) throws RollbackException; void rollback(Transaction t);

§ TTable:

Result get(Transaction t, Get g); void put(Transaction t, Put p); ResultScanner getScanner(Transaction t, Scan s);

Omid - Yahoo! Research 19

Example Transaction

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

Status Oracle

R LC r1 20

ST CT 10 20

Abort …

LowW …

HBase

row TS val r1 10 hi

>> |

Omid - Yahoo! Research 21

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >>

t ST: 25

|

Omid - Yahoo! Research 22

HBase

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >>

t ST: 25

|

Omid - Yahoo! Research 23

HBase

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >>

t ST: 25 R: {r1}

|

Omid - Yahoo! Research 24

HBase

row TS val r1 10 hi r1 25 bye

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 30

ST CT 25 30

Abort …

LowW 20

>> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> TM.commit(t) Committed{ CT: 30 } >>

t ST: 25 R: {r1}

|

Omid - Yahoo! Research 25

HBase

row TS val r1 10 hi r1 25 bye

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

Abort 15

>> t_old Transaction{ ST: 15 } >> TT.put(t_old, ‘r1’, ‘yo’) >> TM.commit(t_old) Aborted >>

t_old ST: 15 R: {r1}

Omid - Yahoo! Research 26

HBase

row TS val r1 15 yo r1 25 bye

R LC r1 30

ST CT 25 30

LowW …

|

Centralized Approach

27 Omid - Yahoo! Research

Omid Performance

§ Centralized server = bottleneck? • Focus on good performance •  In our experiments, it never became the bottleneck

0

5

10

15

20

25

30

1K 10K 100K

Late

ncy

in m

s

Throughput in TPS

2 rows4 rows8 rows

16 rows32 rows

128 rows512 rows

Omid - Yahoo! Research 28

512 rows/t 1K TPS 10 ms

2 rows/t 100K TPS

5 ms

Fault Tolerance

§ Omid uses BookKeeper for fault tolerance • BookKeeper is a distributed Write Ahead Log

§ Recovery: reads log’s tail (bounded time)

0

3000

6000

9000

12000

15000

18000

5900 5950 6000 6050 6100

Throughput in TPS

Time in s

Downtime < 25 s

Omid - Yahoo! Research 29

Example Use Case News Recommendation System

News Recommendation System

§ Model-based recommendations • Similar users belong to a cluster • Each cluster has a trained model

§ New article • Evaluate against clusters • Recommend to users

§ User actions • Train model • Split cluster

Omid - Yahoo! Research 31

Transactions Advantages

§ All operations are consistent • Updates and queries

§ Needed for a recommendation system? • We avoid corner cases •  Implementing existing algorithms is straightforward

§ Problems we solve • Two operations concurrently splitting a cluster • Query while cluster splits • …

32 Omid - Yahoo! Research

Example overview

33 Omid - Yahoo! Research

Index

Articles & User actions Queries

Index Data Store Wor

kers

Updates

Other use cases

§ Update indexes incrementally • Original Percolator use case

§ Generic graph processing

§ Adapt existing transactional solutions to HBase

34 Omid - Yahoo! Research

Future Work

Current Challenges

§ Load isn’t distributed automatically

§ Complex logic requires big transactions • More likely to conflict • More expensive to retry

§ API is low level for data processing

Omid - Yahoo! Research 36

Future work

§ Framework for Incremental Processing

• Simpler API

• Trigger-based, easy to decompose operations

• Auto scalable

§ Improve user friendliness • Debug tools, metrics, etc

§ Hot failover

Omid - Yahoo! Research 37

Questions?

All time contributors Ben Reed

Flavio Junqueira Francisco Pérez-Sorrosal

Ivan Kelly Matthieu Morel

Maysam Yabandeh

Partially supported by CumuloNimbo project (ICT-257993)

funded by the European Commission

Try it! yahoo.github.com/omid

Backup

39 Omid - Yahoo! Research

Commit Algorithm

§ Receive commit request for txn_i

§ Set CommitTimestamp(txn_i) <= new timestamp

§ For each modified row: •  if LastCommit(row) > StartTimestamp(txn_i) => abort

§ For each modified row: • Set LastCommit(row) <= CommitTimestamp(txn_i)

Omid - Yahoo! Research 40

top related