HBase Basics
Nghia Bui
April 12, 2015
Faculty of Business Information Systems
Vietnamese-German University
Agenda
• Introduction
• Table design
• Architecture
Introduction
What is a column-oriented DB?
A case: Facebook Messenger

m_id  user_id  msg           time
1     Luyen    thank you     1234
2     Luyen    thank god     1235
3     Luyen    thank heaven  1236
4     Apple    no thanks     1237
5     Luyen    still thank   1238

Row-oriented storage:
1,Luyen,thank you,1234,2,Luyen,thank god,1235,3,Luyen,thank heaven,1236,4,Apple,no thanks,1237,5,Luyen,still thank,1238,

Column-oriented storage:
1:Luyen,2:Luyen,3:Luyen,4:Apple,5:Luyen,
1:thank you,2:thank god,3:thank heaven,4:no thanks,5:still thank,
1:1234,2:1235,3:1236,4:1237,5:1238,

A common analytic query: Which is the top word of Luyen?
Which one is better?

Row-oriented:
• Most rows are read!

Column-oriented:
• Only the user_id and msg columns are read
• Good chance of a high cache hit rate
• Good chance for compression, e.g. 1-2-3:Luyen (so-called RLE, run-length encoding) …
• Updating a whole column is welcome
• No storage penalty for unused cells, e.g. 1:1234;3:1236;4:1237
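The comparison can be tried out in a few lines. This is only a sketch in plain Python, not HBase code: the names ROWS, COLUMNS, top_word and rle are made up for illustration, and the data is the five messages from the table above.

```python
from collections import Counter

# The five messages from the slide, as (m_id, user_id, msg, time) rows.
ROWS = [
    (1, "Luyen", "thank you", 1234),
    (2, "Luyen", "thank god", 1235),
    (3, "Luyen", "thank heaven", 1236),
    (4, "Apple", "no thanks", 1237),
    (5, "Luyen", "still thank", 1238),
]

# Column-oriented layout: one list per column, aligned by position.
COLUMNS = {
    "m_id": [r[0] for r in ROWS],
    "user_id": [r[1] for r in ROWS],
    "msg": [r[2] for r in ROWS],
    "time": [r[3] for r in ROWS],
}


def top_word(columns, user):
    """Most frequent word of `user`; only user_id and msg are touched."""
    counts = Counter()
    for uid, msg in zip(columns["user_id"], columns["msg"]):
        if uid == user:
            counts.update(msg.split())
    return counts.most_common(1)[0][0]


def rle(values):
    """Run-length encode a column as [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


print(top_word(COLUMNS, "Luyen"))
print(rle(COLUMNS["user_id"]))
```

The query never looks at m_id or time, and the user_id column collapses into three runs because equal values sit next to each other.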
HBase
• A column-oriented DB
• github.com/apache/hbase
• NoSQL means no SQL
• Distributed system, CP in terms of the CAP theorem
• Written in Java
• Built on top of Hadoop for storing files
Table Design
Rowkey is the key for success
• HBase shares similar concepts with RDBMS, but
• Designing tables in HBase is very different
• Let’s forget Entity, Relationship, Normalization …
• Example:
  • Design a table to store population info of cities and districts
  • Queries required:
    1. Which is the biggest city?
    2. Which is the biggest district in Hanoi?
  • The first and most important thing is the rowkey: “hanoi” or “hanoi#hoankiem”?
  • To minimize the number of accessed rows, “hanoi” is suitable: one row then holds all districts of a city
Design of the population table
rowkey   population (column family)
         column qualifier = value
hanoi    badinh = 2000
         hoangmai = 5000
         hoankiem = 1000
         …
         total = 7,000,000
saigon   govap = 10000
         phunhuan = 7000
         tanbinh = 9000
         …
         total = 10,000,000

The queries are now easy:
1. Which is the biggest city?
2. Which is the biggest district in Hanoi?
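A rough sketch of why both queries become easy with this design, using a nested dict as a stand-in for the table (rowkey, then column qualifier, then value). This is plain Python, not the HBase API; the function names are hypothetical, and only the districts visible on the slide are included (the elided ones stay out).

```python
# Toy model of the population table: rowkey -> {column qualifier: value}.
# Only the districts shown on the slide are listed; the elided ones are omitted.
population = {
    "hanoi": {"badinh": 2000, "hoangmai": 5000, "hoankiem": 1000,
              "total": 7_000_000},
    "saigon": {"govap": 10000, "phunhuan": 7000, "tanbinh": 9000,
               "total": 10_000_000},
}


def biggest_city(table):
    """Query 1: compare the 'total' qualifier of every row."""
    return max(table, key=lambda rowkey: table[rowkey]["total"])


def biggest_district(table, city):
    """Query 2: a single row access, then compare its district qualifiers."""
    row = table[city]
    districts = {q: v for q, v in row.items() if q != "total"}
    return max(districts, key=districts.get)


print(biggest_city(population))
print(biggest_district(population, "hanoi"))
```

With “hanoi#hoankiem” as the rowkey, query 2 would have to visit one row per district instead of one row per city.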
Back to Facebook Messenger
m_id  user_id  msg           time
1     Luyen    thank you     1234
2     Luyen    thank god     1235
3     Luyen    thank heaven  1236
4     Apple    no thanks     1237
5     Luyen    still thank   1238

A common analytic query: Which is the top word of Luyen?

Still problems:
• We have to split the msg every time we query
• Inserting a new message creates a new row, which should be restricted in a column-oriented DB

Solution:

Rowkey (user_id)  word (CF)    msg (CF)
Apple             no = 1       content:1237 = no thanks
                  thanks = 1
Luyen             god = 1      content:1238 = still thank
                  heaven = 1   content:1236 = thank heaven
                  still = 1    content:1235 = thank god
                  thank = 4    content:1234 = thank you
                  you = 1

In content:1237, the number 1237 is the timestamp.
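A minimal sketch of how such a row could be maintained, assuming the word counts are updated once at insert time so that the query no longer splits any message. It is plain Python with nested dicts; insert_message and top_word are hypothetical names, not HBase calls.

```python
# The messages from the slide as (user_id, msg, time).
MESSAGES = [
    ("Luyen", "thank you", 1234),
    ("Luyen", "thank god", 1235),
    ("Luyen", "thank heaven", 1236),
    ("Apple", "no thanks", 1237),
    ("Luyen", "still thank", 1238),
]


def insert_message(table, user_id, msg, ts):
    """Update the user's single row: bump word counters, store the text."""
    row = table.setdefault(user_id, {"word": {}, "msg": {}})
    for word in msg.split():            # the split happens once, on write
        row["word"][word] = row["word"].get(word, 0) + 1
    row["msg"][("content", ts)] = msg   # qualifier 'content', versioned by ts


def top_word(table, user_id):
    """The analytic query: read one row, one column family."""
    words = table[user_id]["word"]
    return max(words, key=words.get)


table = {}
for user_id, msg, ts in MESSAGES:
    insert_message(table, user_id, msg, ts)

print(top_word(table, "Luyen"))
print(table["Apple"])
```

A new message only adds or updates cells inside an existing row, so the number of rows grows with the number of users, not with the number of messages.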
Architecture
A big picture
[Diagram: Client, Master, ZooKeeper, and RegionServer 1, RegionServer 2, RegionServer 3, … RegionServer n, connected via protobuf; files are stored on hdfs://]
How can clients access HBase?
[Diagram: Client, ZooKeeper, Master, RegionServer 1, connected via protobuf]
• Client MUST know the address/port of ZooKeeper (ZK).
• If the request changes schema, ZK will return the Master address/port [WHY?]
• Otherwise, it returns the address/port of the RegionServer in charge.
• The underlying protocol is “protobuf”, but HBase provides 2 high-level ways for easier communication:
  • Shell commands via $HBASE_HOME/bin/hbase shell
  • Java API in $HBASE_HOME/lib/*.jar
ZK keeps the location of a special table, META, which records which regions belong to which RegionServer.
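The routing rule above can be sketched as a lookup. This follows the slide's simplified picture only: ToyZooKeeper, its locate method, the region start keys and all addresses are made up for illustration and are not the real ZooKeeper or HBase client API.

```python
import bisect


class ToyZooKeeper:
    """Hypothetical stand-in for the lookup described on the slide."""

    def __init__(self, master, regions):
        # regions: (start_rowkey, regionserver_address) pairs, one per region.
        self.master = master
        self.regions = sorted(regions)
        self.start_keys = [start for start, _ in self.regions]

    def locate(self, changes_schema, rowkey=None):
        if changes_schema:
            return self.master          # schema changes go to the Master
        # The region in charge is the last one whose start key is <= rowkey.
        i = bisect.bisect_right(self.start_keys, rowkey) - 1
        return self.regions[i][1]


# Made-up addresses and region boundaries, for illustration only.
zk = ToyZooKeeper(
    master="master.example:16000",
    regions=[("", "rs1.example:16020"),
             ("h", "rs2.example:16020"),
             ("s", "rs3.example:16020")],
)

print(zk.locate(changes_schema=True))
print(zk.locate(changes_schema=False, rowkey="hanoi"))
```

Because each region covers a contiguous range of sorted rowkeys, a binary search over the region start keys is enough to find the server in charge.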
Master is just master
• Handles requests that change schema
• Balances and (re)assigns regions to RegionServers based on workload, then records the change in the META table tracked through ZooKeeper
[Diagram: Master, ZooKeeper]
RegionServers: we are hard workers

[Diagram: inside a RegionServer. Labels recovered from the figure:]
• READ PATH input: a key of the form tbl:rk:cf:cq[:ts]
• WRITE PATH input: Put(key : value) or Del(key)
• RegionServer contains HRegion, which contains HStore (per column family), which contains a MemStore (Mem) and HFiles (Disk)
• Write Ahead Log (WAL): avoids losing data when a crash happens
• BlockCache: an LRU queue, for faster reading
• MemStore: Read#2, Write#2; Full? Then Flush to an HFile
• HFile: Read#3
• Other step labels in the figure: Read#1, Write#1, Read#4: write
• Note: a Del is stored as a marker
* HFiles are compacted at certain events
* Compaction: minor, major

Problem in Read#3 ???
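A toy model of the write and read paths can make the moving parts concrete. This is a sketch under stated assumptions, not HBase's implementation: ToyStore and TOMBSTONE are hypothetical names, the MemStore limit counts keys instead of bytes, the WAL is assumed to be written before the MemStore, and the BlockCache and compaction are left out.

```python
TOMBSTONE = object()  # stands in for the Del marker


class ToyStore:
    """Hypothetical, single-threaded model of one HStore plus its WAL."""

    def __init__(self, memstore_limit=2):
        self.wal = []         # append-only log, kept in arrival order
        self.memstore = {}    # in-memory edits
        self.hfiles = []      # flushed, immutable files, oldest first
        self.memstore_limit = memstore_limit

    def _write(self, key, value):
        self.wal.append((key, value))   # assumed first step: log the edit
        self.memstore[key] = value      # then apply it in memory
        if len(self.memstore) >= self.memstore_limit:   # Full? -> Flush
            items = sorted(self.memstore.items(), key=lambda kv: kv[0])
            self.hfiles.append(items)
            self.memstore = {}

    def put(self, key, value):
        self._write(key, value)

    def delete(self, key):
        self._write(key, TOMBSTONE)     # a Del only adds a marker

    def get(self, key):
        # Newest data wins: MemStore first, then HFiles from newest to oldest.
        sources = [self.memstore] + [dict(h) for h in reversed(self.hfiles)]
        for source in sources:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None


store = ToyStore(memstore_limit=2)
store.put("k1", "v1")
store.put("k2", "v2")      # MemStore is full, so it is flushed to an HFile
store.delete("k1")         # the old value stays on disk, hidden by the marker
print(store.get("k1"), store.get("k2"), len(store.hfiles))
```

The deleted value is still inside the first HFile; only the newer marker hides it, which is one reason HFiles get compacted later.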
Problem in Read#3: very slow
To find a key in an HFile, we have to scan from the beginning until it is found. So items (Put, Del …) need to be sorted by key!

Hadoop allows only appending to files. So items must already be sorted in the MemStore before flushing (under the hood, a concurrent skip list is used).

So far, there is still a problem: we still have to scan from the beginning until the key is found. So HBase uses the MapFile technique: an HFile has 2 parts, index and data (in a classic MapFile these are 2 files in one folder).

HFile (Read#3):

index               data
key1: offset 0x1    value1
key2: offset 0x6    value2
key3: offset 0x8    value3
key4: offset 0x10   value4
key5: offset 0x20   value5
key6: offset 0x50   value6

How about the WAL (Del, Put, …)? Do we need to sort its items?
NO!!!
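The index-plus-data idea can be sketched as follows. This is an illustration of the technique only, not the real HFile or MapFile format: flush, lookup and the 4-byte length prefix are assumptions made for the example.

```python
import bisect


def flush(memstore):
    """Write items in key order into a data blob plus a (key, offset) index."""
    data = bytearray()
    index = []
    for key in sorted(memstore):            # items must be sorted by key
        index.append((key, len(data)))      # where this value starts
        value = memstore[key].encode()
        data += len(value).to_bytes(4, "big") + value   # length prefix + bytes
    return index, bytes(data)


def lookup(index, data, key):
    """Binary-search the small index, then jump straight to the offset."""
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    offset = index[i][1]
    size = int.from_bytes(data[offset:offset + 4], "big")
    return data[offset + 4:offset + 4 + size].decode()


index, data = flush({"key2": "value2", "key1": "value1", "key3": "value3"})
print(index)
print(lookup(index, data, "key2"))
```

Only the small index is searched; the data part is read at a single offset instead of being scanned from the beginning.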
Further optimization: Bloom Filter
• Checking a given key in the index file is still slow!
• In which case is it slowest? When the key does not exist
• Hey Bloom Filter, does the key exist or not?
  • If the answer is YES: NOT SURE, need to check the index file
  • If the answer is NO: FOR SURE
• The array of bits is also stored in the index file.

An array of bits, initially zero: 0 0 0 0 0
When a key “hello” is inserted: hash(“hello”) = 10101. Update the array (OR): 1 0 1 0 1
Key “world” is inserted: hash(“world”) = 11100. Update the array (OR): 1 1 1 0 1
To check a given key “test”: hash(“test”) = 00111 sets positions 3, 4, 5, and the array has bit 0 at position 4, so the answer is NO
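The bit arithmetic of this example can be replayed directly. A minimal sketch: the 5-bit hash values are copied from the slide into a lookup table (a real filter computes them with hash functions), and ToyBloomFilter is a hypothetical name.

```python
# Toy 5-bit hashes copied from the slide; a real filter would compute them.
SLIDE_HASHES = {"hello": 0b10101, "world": 0b11100, "test": 0b00111}


class ToyBloomFilter:
    def __init__(self):
        self.bits = 0b00000                 # the array of bits, initially zero

    def add(self, key):
        self.bits |= SLIDE_HASHES[key]      # OR the key's hash into the array

    def might_contain(self, key):
        h = SLIDE_HASHES[key]
        # Every bit set in the hash must also be set in the array.
        return (self.bits & h) == h


bf = ToyBloomFilter()
bf.add("hello")
bf.add("world")
print(format(bf.bits, "05b"))
print(bf.might_contain("test"))
```

A False answer is certain because a stored key would have set all of its bits. A True answer can be a false positive, since the bits may have been set by other keys.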
Hadoop has 2 components
HDFS
• Files are distributed
• Redundant, reliable
• HBase stores files in HDFS

MapReduce
• Computations are distributed
• Fast, parallel
• Client sends { java bytecode + input file paths } to nodes.
• Nodes read input files from HDFS, assign & process independently.
• Nodes collect & write results to output files in HDFS.
• HBase tables can be input & output of MapReduce
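The map, group and reduce steps can be imitated in a single process to show the shape of such a job. This is a sketch only: it runs on one machine, uses no Hadoop API, and map_phase, reduce_phase and run_job are hypothetical names.

```python
from collections import defaultdict


def map_phase(record):
    """Runs independently per record: emit ((user_id, word), 1) pairs."""
    user_id, msg = record
    for word in msg.split():
        yield (user_id, word), 1


def reduce_phase(key, values):
    """Runs once per key: combine all values emitted for that key."""
    return key, sum(values)


def run_job(records):
    groups = defaultdict(list)
    for record in records:          # on a cluster, nodes share this work
        for key, value in map_phase(record):
            groups[key].append(value)       # group the values by key
    return dict(reduce_phase(k, v) for k, v in groups.items())


counts = run_job([
    ("Luyen", "thank you"),
    ("Luyen", "thank god"),
    ("Luyen", "thank heaven"),
    ("Apple", "no thanks"),
    ("Luyen", "still thank"),
])
print(counts[("Luyen", "thank")])
```

Each record is mapped without looking at any other record, which is what lets the work be spread over many nodes.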
Summary
• Apache HBase is a column-oriented DB: open source, written in Java, a distributed system built on top of Hadoop (HDFS & MapReduce)
• Table design is mainly about the rowkey
• Architecture: ZooKeeper, Master, RegionServer