HBase Basics

Nghia Bui
April 12, 2015

Faculty of Business Information Systems
Vietnamese-German University


Agenda

• Introduction
• Table design
• Architecture


Introduction


What is a column-oriented DB?

A case: Facebook Messenger

m_id  user_id  msg           time
1     Luyen    thank you     1234
2     Luyen    thank god     1235
3     Luyen    thank heaven  1236
4     Apple    no thanks     1237
5     Luyen    still thank   1238

Row-oriented:
1,Luyen,thank you,1234,2,Luyen,thank god,1235,3,Luyen,thank heaven,1236,4,Apple,no thanks,1237,5,Luyen,still thank,1238,

Column-oriented:
1:Luyen,2:Luyen,3:Luyen,4:Apple,5:Luyen,1:thank you,2:thank god,3:thank heaven,4:no thanks,5:still thank,1:1234,2:1235,3:1236,4:1237,5:1238,

Which one is better?

A common analytic query: Which is the top word of Luyen?

Most rows are read!

With the column-oriented layout:
• Only the user_id and msg columns are read
• Chance for a high cache hit rate
• Chance for compression, e.g. 1-2-3:Luyen (so-called RLE) …
• Updating a whole column is welcome
• No storage penalty for unused cells, e.g. 1:1234;3:1236;4:1237
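The two layouts and the RLE idea can be sketched in a few lines of Python. This is only a toy illustration of the slide's example: the function names are made up here, and real column stores use far more elaborate on-disk formats.

```python
from itertools import groupby

# The Messenger table from the slide: (m_id, user_id, msg, time)
ROWS = [
    (1, "Luyen", "thank you", 1234),
    (2, "Luyen", "thank god", 1235),
    (3, "Luyen", "thank heaven", 1236),
    (4, "Apple", "no thanks", 1237),
    (5, "Luyen", "still thank", 1238),
]
COLUMNS = ["m_id", "user_id", "msg", "time"]


def row_oriented(rows):
    """Store all cells of row 1, then all cells of row 2, and so on."""
    return [cell for row in rows for cell in row]


def column_oriented(rows):
    """Store each column separately as a list of (m_id, value) pairs."""
    return {
        name: [(row[0], row[i]) for row in rows]
        for i, name in enumerate(COLUMNS)
        if name != "m_id"
    }


def run_length_encode(pairs):
    """Group consecutive equal values: [(value, [m_ids...]), ...]."""
    return [
        (value, [m_id for m_id, _ in group])
        for value, group in groupby(pairs, key=lambda pair: pair[1])
    ]


def top_word(columns, user):
    """Answer the analytic query by touching only user_id and msg."""
    ids = {m_id for m_id, u in columns["user_id"] if u == user}
    counts = {}
    for m_id, msg in columns["msg"]:
        if m_id in ids:
            for word in msg.split():
                counts[word] = counts.get(word, 0) + 1
    return max(counts, key=counts.get)
```

Note that `top_word` never looks at the time column, which is the point of the column-oriented layout for this query.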


HBase: a column-oriented DB

• github.com/apache/hbase
• NoSQL means no SQL
• CP in terms of the CAP theorem
• Distributed system
• Written in Java
• Built on top of Hadoop, for storing files


Table Design


Rowkey is the key for success

• HBase shares similar concepts with RDBMS, but
• Designing tables in HBase is very different
• Let’s forget Entity, Relationship, Normalization …
• Example:
  • Design a table to store population info of cities and districts
  • Queries required:
    1. Which is the biggest city?
    2. Which is the biggest district in hanoi?
  • The first and most important thing is the rowkey: “hanoi” or “hanoi#hoankiem”?
  • To minimize the number of accessed rows, “hanoi” is suitable


Design of the population table

rowkey   population (column family)
hanoi    badinh = 2000
         hoangmai = 5000
         hoankiem = 1000
         …
         total = 7,000,000
saigon   govap = 10000
         phunhuan = 7000
         tanbinh = 9000
         …
         total = 10,000,000

In each cell, e.g. badinh = 2000, badinh is the column qualifier and 2000 is the value.

The queries are now easy:
1. Which is the biggest city?
2. Which is the biggest district in hanoi?
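To see why the two queries become easy with this design, the table can be modelled as a nested dictionary. This is a minimal sketch, not HBase code: the dictionary and both function names are assumptions made for illustration, and only the numbers shown on the slide are included.

```python
# Toy model of the population table: rowkey -> column qualifier -> value.
# Districts elided on the slide ("…") are simply left out.
population = {
    "hanoi": {"badinh": 2000, "hoangmai": 5000, "hoankiem": 1000, "total": 7_000_000},
    "saigon": {"govap": 10000, "phunhuan": 7000, "tanbinh": 9000, "total": 10_000_000},
}


def biggest_city(table):
    """Query 1: compare the 'total' cell of every row."""
    return max(table, key=lambda rowkey: table[rowkey]["total"])


def biggest_district(table, city):
    """Query 2: read a single row and compare its district cells."""
    districts = {q: v for q, v in table[city].items() if q != "total"}
    return max(districts, key=districts.get)
```

Query 2 touches exactly one row, which is what choosing “hanoi” over “hanoi#hoankiem” as the rowkey buys.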


Back to Facebook Messenger

m_id  user_id  msg           time
1     Luyen    thank you     1234
2     Luyen    thank god     1235
3     Luyen    thank heaven  1236
4     Apple    no thanks     1237
5     Luyen    still thank   1238

A common analytic query: Which is the top word of Luyen?

Still problems:
• We have to split the msg every time we query
• Inserting a new message will create a new row, which should be restricted in a column-oriented DB

Solution:

Rowkey (user_id)  word (CF)    msg (CF)
Apple             no = 1       content:1237 = no thanks
                  thanks = 1
Luyen             god = 1      content:1238 = still thank
                  heaven = 1   content:1236 = thank heaven
                  still = 1    content:1235 = thank god
                  thank = 4    content:1234 = thank you
                  you = 1

In a qualifier such as content:1238, the part after the colon is the timestamp.
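The solution keeps the word counters up to date at insert time, so the analytic query becomes a lookup in a single row. The sketch below models this with plain Python dictionaries; the function names are hypothetical, and the real HBase client API is not used.

```python
from collections import defaultdict


def new_table():
    """rowkey (user_id) -> {'word': {word: count}, 'msg': {'content:<ts>': text}}."""
    return defaultdict(lambda: {"word": defaultdict(int), "msg": {}})


def insert_message(table, user_id, text, timestamp):
    """Store the message and update the per-word counters in the same row."""
    row = table[user_id]
    row["msg"][f"content:{timestamp}"] = text
    for word in text.split():
        row["word"][word] += 1


def top_word(table, user_id):
    """The analytic query reads one row and one column family only."""
    words = table[user_id]["word"]
    return max(words, key=words.get)
```

Inserting a message adds cells to an existing row instead of creating a new row, and the message is split only once, at write time.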


Architecture


A big picture

[Figure: Client, Master, ZooKeeper and RegionServer 1 … RegionServer n, communicating via protobuf; files are stored on hdfs://]


How can clients access HBase?

[Figure: Client, ZooKeeper, Master and RegionServer 1, communicating via protobuf]

• The client MUST know the address/port of ZooKeeper (ZK).
• If the request changes the schema, ZK will return the Master address/port [WHY?]
• Otherwise, it returns the address/port of the RegionServer in charge.
• The underlying protocol is “protobuf”, but HBase provides 2 high-level ways for easier communication:
  • Shell commands via $HBASE_HOME/bin/hbase shell
  • Java API in $HBASE_HOME/lib/*.jar

A special table, META, records which regions belong to which RegionServer; ZK keeps track of where META is located.
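The routing rule can be sketched as a small lookup function. This is an illustration under stated assumptions, not the actual client logic: it assumes each region covers a contiguous range of rowkeys identified by its start key, and the META contents, addresses and the function name `locate` are all hypothetical.

```python
import bisect

# Hypothetical META contents: (region start rowkey, RegionServer address).
# Entries are sorted by start key; "" means "from the very first rowkey".
META = [
    ("", "regionserver1:16020"),
    ("h", "regionserver2:16020"),
    ("s", "regionserver3:16020"),
]
MASTER = "master:16000"  # hypothetical address


def locate(request_kind, rowkey=None):
    """Return the address a client should talk to for a request."""
    if request_kind == "schema":
        # Schema changes are handled by the Master.
        return MASTER
    # Find the last region whose start key is <= rowkey.
    start_keys = [start for start, _ in META]
    index = bisect.bisect_right(start_keys, rowkey) - 1
    return META[index][1]
```

For example, `locate("data", "hanoi")` picks the region starting at "h", while `locate("schema")` always points to the Master.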



Master is just master

• Handles requests that change the schema
• Balances and (re)assigns regions to RegionServers based on workload, then updates the META table

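[Figure: Master and ZooKeeper]

RegionServers: we are hard workers

[Figure: inside a RegionServer. An HRegion contains one HStore per column family. Each HStore has a MemStore in memory (holding Put items and Del markers) and HFiles on disk (holding Put and Del items). The RegionServer also has a BlockCache and a Write Ahead Log (WAL).]

READ PATH INPUT: key
• Key is tbl:rk:cf:cq[:ts]
• Read#1: BlockCache (an LRU queue, for faster reading)
• Read#2: MemStore
• Read#3: HFiles
• Note: Read#4: write
• Problem in Read#3 ???

WRITE PATH INPUT: Put(key : value), Del(key)
• Write#1: Write Ahead Log (WAL), to avoid losing data on a crash
• Write#2: MemStore; a Del is stored as a marker
• Flush: when the MemStore is full, it is flushed to an HFile on disk

* HFiles are compacted at certain events
* Compaction: minor, major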
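The write and read paths can be imitated with a toy in-memory store. This is a simplified sketch under several assumptions: a single column family, no BlockCache, no compaction, Python lists standing in for files on disk, and a class and method names invented for this example.

```python
class ToyStore:
    """A toy store for one column family: WAL + MemStore + flushed HFiles."""

    TOMBSTONE = object()  # stands for the Del marker

    def __init__(self, memstore_limit=2):
        self.wal = []        # append-only log, kept in arrival order
        self.memstore = {}   # in memory
        self.hfiles = []     # "on disk": sorted (key, value) lists, oldest first
        self.memstore_limit = memstore_limit

    def put(self, key, value):
        self.wal.append(("Put", key, value))  # Write#1: log first
        self.memstore[key] = value            # Write#2: then memory
        self._flush_if_full()

    def delete(self, key):
        self.wal.append(("Del", key))
        self.memstore[key] = self.TOMBSTONE   # a marker, not a removal
        self._flush_if_full()

    def _flush_if_full(self):
        if len(self.memstore) >= self.memstore_limit:
            # Items are written out sorted by key, into a new HFile.
            self.hfiles.append(sorted(self.memstore.items(), key=lambda kv: kv[0]))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:              # newest data first
            return self._visible(self.memstore[key])
        for hfile in reversed(self.hfiles):   # then newest HFile to oldest
            for k, v in hfile:                # linear scan: the slow part
                if k == key:
                    return self._visible(v)
        return None

    def _visible(self, value):
        return None if value is self.TOMBSTONE else value
```

Because a Del is only a marker, an older HFile may still contain the deleted value; the newer marker shadows it on reads until compaction rewrites the files.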


Problem in Read#3: very slow

To find a key in an HFile, we have to scan from the beginning until it is found. → Items (Put, Del …) need to be sorted by key!

Hadoop only allows appending to files. → Items must already be sorted in the MemStore before flushing (under the hood: a concurrent skip list is used).

So far, still a problem: we still have to scan from the beginning until the key is found. → HBase uses the MapFile technique: an HFile has 2 parts, index and data.

[Figure: HFile (Read#3), holding Del, Put, … items, stored as index and data]

index              data
key1:offset0x1     value1
key2:offset0x6     value2
key3:offset0x8     value3
key4:offset0x10    value4
key5:offset0x20    value5
key6:offset0x50    value6
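The index/data split can be sketched as follows. This is a minimal illustration, not the real HFile format: the functions `write_hfile` and `read_hfile` are hypothetical, and storing a length next to each offset is an assumption added so that the example can slice values out of the data.

```python
import bisect


def write_hfile(sorted_items):
    """Build (index, data): the index maps each key to the offset of its value."""
    index, data = [], bytearray()
    for key, value in sorted_items:
        encoded = value.encode("utf-8")
        index.append((key, len(data), len(encoded)))  # (key, offset, length)
        data += encoded
    return index, bytes(data)


def read_hfile(index, data, key):
    """Binary-search the sorted index, then jump straight to the offset."""
    keys = [k for k, _, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # key is not in this file
    _, offset, length = index[i]
    return data[offset:offset + length].decode("utf-8")
```

The binary search only works because the items were already sorted by key when the MemStore was flushed.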

[Figure: WAL, holding Del, Put, … items]

How about the WAL? Do we need to sort its items?

NO!!!

Further optimization: Bloom Filter

• Checking a given key in the index is still slow!
• In which case is it slowest? When the key does not exist.
• Hey Bloom Filter, does the key exist or not?
  • answer is YES → NOT SURE, need to check the index
  • answer is NO → FOR SURE
• The array of bits is also stored in the index.

An array of bits, initially zero:
0 0 0 0 0

When a key “hello” is inserted: hash(“hello”) → 10101
Update the array (OR):
1 0 1 0 1

Key “world” is inserted: hash(“world”) → 11100
Update the array (OR):
1 1 1 0 1

To check a given key “test”: hash(“test”) = 00111. Among positions 3, 4, 5 the array has a 0 bit (position 4) → answer NO
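The worked example can be replayed in Python. This is a toy sketch: the 5-bit "hashes" are the illustrative values from the slide, kept in a lookup table, whereas a real Bloom filter computes several hash functions over a much larger bit array. The class name is made up for this example.

```python
# Illustrative 5-bit "hashes" taken from the slide.
SLIDE_HASHES = {"hello": 0b10101, "world": 0b11100, "test": 0b00111}


class ToyBloomFilter:
    def __init__(self, hash_fn):
        self.bits = 0           # the array of bits, initially all zero
        self.hash_fn = hash_fn

    def add(self, key):
        self.bits |= self.hash_fn(key)  # update the array with OR

    def might_contain(self, key):
        h = self.hash_fn(key)
        # False means "NO, for sure"; True means "YES, but not sure".
        return (self.bits & h) == h
```

A key is reported as possibly present only if every bit of its hash is already set, which is why a NO answer is always reliable.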


Hadoop has 2 components

HDFS: files are distributed (redundant, reliable)

HBase stores files in HDFS

MapReduce: computations are distributed (fast, parallel)

• Client sends { java bytecode + input file paths } to nodes.

• Nodes read input files from HDFS, assign & process independently.

• Nodes collect & write result to output files in HDFS.

HBase tables can be input & output of MapReduce
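The MapReduce flow can be imitated in a single Python process. This is only a sketch of the idea: there are no real nodes or HDFS files here, each inner list stands for the input split one node would process, and all function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (word, 1) for every word of one input message."""
    return [(word, 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key."""
    return key, sum(values)


def word_count(splits):
    """Run map over every record of every split, then shuffle and reduce."""
    pairs = [p for split in splits for record in split for p in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Applied to Luyen's messages, this is a batch way of answering the earlier "top word" query.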


Summary

• Apache HBase is a column-oriented DB: open source, written in Java, a distributed system built on top of Hadoop (HDFS & MapReduce)
• Table design is mainly concerned with the rowkey
• Architecture: ZooKeeper, Master, RegionServer
