Introduction of HBase
Reporter: Hu Yi
2009-3-11
Overview
HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment.
Data is logically organized into tables, rows and columns.
Outline
Data Model
Architecture and Implementation
Examples & Tests
Conceptual View
A data row has a sortable row key and an arbitrary number of columns.
A timestamp is assigned automatically unless the client supplies one explicitly.
Column names have the form <family>:<label>.
Row key            Time Stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com” “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com” “CNN”
                   t13                               “anchor:my.look.ca” “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
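The conceptual view can be read as a nested sorted map: row key, then column, then timestamp, then value. The following is a minimal sketch of that idea in plain Java; ConceptualTable and its methods are hypothetical names for illustration and are not part of the HBase API, and the cell values are placeholders.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model of the conceptual view: row key -> column -> timestamp -> value.
// ConceptualTable is a hypothetical name, not an HBase class.
final class ConceptualTable {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    // Stores one version of a cell; versions are kept newest first.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell is empty.
    String getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        TreeMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ConceptualTable t = new ConceptualTable();
        t.put("com.apache.www", "contents:", 11, "<html>v11");
        t.put("com.apache.www", "contents:", 12, "<html>v12");
        t.put("com.apache.www", "anchor:apache.com", 10, "APACHE");
        System.out.println(t.getLatest("com.apache.www", "contents:"));
        System.out.println(t.getLatest("com.apache.www", "anchor:apache.com"));
    }
}
```

Empty cells simply have no entry in the inner maps, which is why a sparse table costs nothing for the columns a row does not use.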
Physical Storage View
Physically, tables are stored on a per-column family basis.
Because the storage format is column-oriented, empty cells are not stored.
Each column family is managed by an HStore.
Row key            Time Stamp   Column “contents:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
“com.cnn.www”      t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”

Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t10          “anchor:apache.com” “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” “CNN”
                   t8           “anchor:my.look.ca” “CNN.com”
[Figure: structure of an HStore: a Memcache plus MapFiles, where a data MapFile holds key/value pairs and an index MapFile holds index keys.]
Row Ranges: Regions
Cells are sorted by row key and column in ascending order and by timestamp in descending order.
Physically, tables are broken into row ranges called regions; each region contains the rows from a start key to an end key.
Row key   Time Stamp   Column “contents:”   Column “anchor:”
aaaa      t15                               anchor:cc value
          t13          ba
          t12          bb
          t11                               anchor:cd value
          t10          bc
aaab      t14
aaac                                        anchor:be value
aaad                                        anchor:ad value
aaae      t5           ae
          t3           af
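The sort order stated on this slide (row key and column ascending, timestamp descending) can be written as a comparator. This is a sketch under stated assumptions: CellKey is a hypothetical class made up for the illustration, not an HBase type, and timestamps are plain long values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// CellKey is a hypothetical class for this illustration, not an HBase type.
final class CellKey {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    // Row key ascending, then column ascending, then timestamp descending.
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp); // newest version first
    };

    @Override
    public String toString() {
        return row + " " + column + " t" + timestamp;
    }

    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>();
        keys.add(new CellKey("aaab", "contents:", 14));
        keys.add(new CellKey("aaaa", "contents:", 12));
        keys.add(new CellKey("aaaa", "contents:", 13));
        keys.add(new CellKey("aaaa", "anchor:cc", 15));
        keys.sort(ORDER);
        for (CellKey k : keys) {
            System.out.println(k);
        }
    }
}
```

Putting the newest timestamp first means the most recent version of a cell is the first one met when reading in key order.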
Outline
Data Model
Architecture and Implementation
Examples & Tests
Three major components
The HBaseMaster
The HRegionServer
The HBase client
HBaseMaster
Assign regions to HRegionServers.
1. ROOT region locates all the META regions.
2. META region maps a number of user regions.
3. Assign user regions to the HRegionServers.
Enable/Disable table and change table schema
Monitor the health of each HRegionServer.
[Figure: region hierarchy: the ROOT region points to the META regions, and each META region points to USER regions.]
ROOT/META Table
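Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one region holds 256 MB / 1 KB = 2^18 rows, so:
1 ROOT table -> 2^18 META regions -> 2^18 x 2^18 = 2^36 USER regions
2^36 USER regions x 256 MB = 2^54 KB = 2^64 bytes = 2^24 TB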
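The arithmetic on this slide can be checked with a few lines of Java. This is only a worked calculation; CatalogCapacity is a hypothetical class name, and the 1 KB row size and 256 MB region size are the approximate figures quoted above.

```java
import java.math.BigInteger;

// Worked version of the ROOT/META capacity estimate (hypothetical helper class).
final class CatalogCapacity {
    static final long ROW_SIZE = 1L << 10;       // about 1 KB per catalog row
    static final long REGION_SIZE = 256L << 20;  // default region size, 256 MB

    // Rows that fit in one catalog region: 2^28 / 2^10 = 2^18.
    static long rowsPerRegion() {
        return REGION_SIZE / ROW_SIZE;
    }

    // The single ROOT region maps one META region per row.
    static long metaRegions() {
        return rowsPerRegion();
    }

    // Each META region maps one user region per row: 2^18 * 2^18 = 2^36.
    static long userRegions() {
        return metaRegions() * rowsPerRegion();
    }

    // Total addressable user data in bytes: 2^36 * 2^28 = 2^64 (needs BigInteger).
    static BigInteger userBytes() {
        return BigInteger.valueOf(userRegions()).multiply(BigInteger.valueOf(REGION_SIZE));
    }

    public static void main(String[] args) {
        System.out.println("META regions: " + metaRegions());
        System.out.println("USER regions: " + userRegions());
        System.out.println("User bytes:   " + userBytes());
        System.out.println("User TB:      " + userBytes().shiftRight(40));
    }
}
```

BigInteger is used for the final product because 2^64 does not fit in a signed 64-bit long.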
HRegionServer
Operations: Write Requests, Read Requests, Cache Flushes, Compactions, Region Splits
Write Requests
[Figure: write path in an HRegionServer, showing the HLog, Hstore1 (Memcache1, Mapfile1.1, Mapfile1.2) and Hstore2 (Memcache2) for the example table with columns “contents:” and “anchor:”.]
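A write request can be sketched as two steps: the edit is appended to a log, then applied to a sorted in-memory buffer. The toy model below assumes exactly that and nothing more; WritePathSketch and its fields are hypothetical names standing in for the HLog and the Memcache, and column families and timestamps are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy write path; the class and field names are hypothetical, not HBase classes.
final class WritePathSketch {
    final List<String> hlog = new ArrayList<>();              // stands in for the HLog
    final TreeMap<String, String> memcache = new TreeMap<>(); // sorted in-memory buffer

    void write(String key, String value) {
        hlog.add(key + "=" + value); // 1. record the edit in the log first
        memcache.put(key, value);    // 2. then apply it in memory
    }

    public static void main(String[] args) {
        WritePathSketch region = new WritePathSketch();
        region.write("com.apache.www/anchor:apache.com", "APACHE");
        region.write("com.cnn.www/anchor:cnnsi.com", "CNN");
        System.out.println(region.hlog);
        System.out.println(region.memcache);
    }
}
```

Logging before buffering is what allows edits that are only in memory to be replayed after a server failure.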
HRegionServer
Read Requests
[Figure: read path in an HRegionServer, showing Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2 for the example table.]
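A read can be sketched as a lookup that tries the in-memory buffer first and then the on-disk files from newest to oldest. The sketch below assumes that order; ReadPathSketch is a hypothetical name, and each MapFile is modelled as a plain sorted map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy read path; names are hypothetical and MapFiles are modelled as sorted maps.
final class ReadPathSketch {
    final TreeMap<String, String> memcache = new TreeMap<>();
    final List<TreeMap<String, String>> mapFiles = new ArrayList<>(); // oldest first

    String read(String key) {
        String value = memcache.get(key); // newest data lives in memory
        if (value != null) return value;
        for (int i = mapFiles.size() - 1; i >= 0; i--) { // newest file first
            value = mapFiles.get(i).get(key);
            if (value != null) return value;
        }
        return null; // not found anywhere
    }

    public static void main(String[] args) {
        ReadPathSketch store = new ReadPathSketch();
        TreeMap<String, String> older = new TreeMap<>();
        older.put("com.cnn.www", "old page");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("com.cnn.www", "new page");
        store.mapFiles.add(older);
        store.mapFiles.add(newer);
        System.out.println(store.read("com.cnn.www"));
    }
}
```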
HRegionServer
Cache Flushes
[Figure: cache flush in an HRegionServer, showing the HLog and Hstore1 with Memcache1; the MapFiles go from Mapfile1.1 and Mapfile1.2 to Mapfile1.1, Mapfile1.2 and Mapfile1.3.]
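A cache flush can be sketched as writing the in-memory buffer out as one more immutable file and then emptying the buffer. The toy model below assumes a threshold counted in entries, which is a simplification chosen for the example; FlushSketch and flushThreshold are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy cache flush; the entry-count threshold is an assumption made for brevity.
final class FlushSketch {
    final int flushThreshold;
    final TreeMap<String, String> memcache = new TreeMap<>();
    final List<TreeMap<String, String>> mapFiles = new ArrayList<>(); // oldest first

    FlushSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        memcache.put(key, value);
        if (memcache.size() >= flushThreshold) {
            flush();
        }
    }

    // Writes the buffer as a new file (a copy here) and clears the buffer.
    void flush() {
        if (memcache.isEmpty()) return;
        mapFiles.add(new TreeMap<>(memcache));
        memcache.clear();
    }

    public static void main(String[] args) {
        FlushSketch store = new FlushSketch(2);
        store.put("aaaa", "1");
        store.put("aaab", "2"); // reaches the threshold and triggers a flush
        store.put("aaac", "3");
        System.out.println("files=" + store.mapFiles.size() + " buffered=" + store.memcache.size());
    }
}
```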
HRegionServer
Compactions
[Figure: compaction in Hstore1 (with Memcache1): Mapfile1.1 and Mapfile1.2 are merged into Mapfile1.]
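A compaction can be sketched as merging several sorted files into one, with newer entries replacing older ones for the same key. The sketch below assumes the files are given oldest first; CompactionSketch is a hypothetical name and each MapFile is again a plain sorted map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy compaction: merge sorted files, letting newer files overwrite older ones.
final class CompactionSketch {
    // mapFiles must be ordered oldest first.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> mapFiles) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file); // later (newer) files win on duplicate keys
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> file1 = new TreeMap<>();
        file1.put("aaaa", "old");
        file1.put("aaab", "kept");
        TreeMap<String, String> file2 = new TreeMap<>();
        file2.put("aaaa", "new");
        List<TreeMap<String, String>> files = new ArrayList<>();
        files.add(file1);
        files.add(file2);
        System.out.println(compact(files));
    }
}
```

Fewer files means fewer places a read has to look, which is the point of compacting.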
HRegionServer
Region Splits
[Figure: region split in an HRegionServer, showing Hstore1 with Memcache1 and Mapfile1 for the example table.]
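A region split can be sketched as cutting a sorted row range in two at a middle row key. The sketch below copies the rows into two new maps, which is a simplification for illustration only; SplitSketch, midKey and split are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Toy region split: divide a sorted row range at its middle row key.
final class SplitSketch {
    // Picks the middle row key of a non-empty region as the split point.
    static String midKey(TreeMap<String, String> region) {
        if (region.isEmpty()) throw new IllegalArgumentException("empty region");
        int target = region.size() / 2;
        int i = 0;
        for (String key : region.keySet()) {
            if (i++ == target) return key;
        }
        throw new IllegalStateException("unreachable");
    }

    // Lower half holds keys before the split point; upper half starts at it.
    static List<TreeMap<String, String>> split(TreeMap<String, String> region) {
        String mid = midKey(region);
        TreeMap<String, String> lower = new TreeMap<>(region.headMap(mid));
        TreeMap<String, String> upper = new TreeMap<>(region.tailMap(mid));
        return Arrays.asList(lower, upper);
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        for (String row : new String[] {"aaaa", "aaab", "aaac", "aaad", "aaae"}) {
            region.put(row, "value");
        }
        List<TreeMap<String, String>> halves = split(region);
        System.out.println(halves.get(0).keySet());
        System.out.println(halves.get(1).keySet());
    }
}
```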
HBase Client
HBase Client -> ROOT Region
HBase Client -> META Region
HBase Client -> User Region
Location information is cached.
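The effect of caching location information can be sketched with a small model: the first lookup for a row goes to the catalog, and later lookups that fall in an already known region are answered from the cache. RegionLocation and LocationCacheSketch are hypothetical names, and the ROOT and META steps are collapsed into a single catalog list to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one region: [startKey, endKey) served by a server.
final class RegionLocation {
    final String startKey;
    final String endKey; // exclusive; null means "no upper bound"
    final String server;

    RegionLocation(String startKey, String endKey, String server) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.server = server;
    }

    boolean contains(String row) {
        return row.compareTo(startKey) >= 0 && (endKey == null || row.compareTo(endKey) < 0);
    }
}

// Toy client-side cache in front of a catalog that stands in for ROOT and META.
final class LocationCacheSketch {
    private final List<RegionLocation> catalog;
    private final List<RegionLocation> cache = new ArrayList<>();
    int catalogLookups = 0;

    LocationCacheSketch(List<RegionLocation> catalog) {
        this.catalog = catalog;
    }

    RegionLocation locate(String row) {
        for (RegionLocation r : cache) {
            if (r.contains(row)) return r; // answered from the cache
        }
        catalogLookups++; // cache miss: consult the catalog
        for (RegionLocation r : catalog) {
            if (r.contains(row)) {
                cache.add(r);
                return r;
            }
        }
        throw new IllegalArgumentException("no region for row " + row);
    }

    public static void main(String[] args) {
        List<RegionLocation> catalog = new ArrayList<>();
        catalog.add(new RegionLocation("", "m", "server1"));
        catalog.add(new RegionLocation("m", null, "server2"));
        LocationCacheSketch client = new LocationCacheSketch(catalog);
        System.out.println(client.locate("com.apache.www").server);
        System.out.println(client.locate("com.cnn.www").server);
        System.out.println("catalog lookups: " + client.catalogLookups);
    }
}
```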
Outline
Data Model
Architecture and Implementation
Examples & Tests
Create MyTable
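import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;
// config is assumed to be an HBaseConfiguration that points at the cluster.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
// Describe the table, attach both families and create it.
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);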
Row Key   Timestamp   columnFamily1:   columnFamily2:
Insert Values
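// table is assumed to be an HTable opened on "MyTable"; timestamp is a long chosen by the caller.
HTable table = new HTable(config, "MyTable");
// Group both cell updates for row "myRow" and commit them together.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);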
Row Key   Timestamp   columnFamily1:
myRow     ts1         labela   labela value
          ts2         labelb   labelb value
Insert
[Figure: HBase insert performance; vertical axis from 0 to 160000, horizontal axis from 1 to 100000.]
Insert
[Figure: insert time in ms (logarithmic axis, 1 to 1000000) for HBase and MySQL; horizontal axis labelled “Row*10 Column=1”, from 10 to 100000.]
Search
Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” “CNN”
                   t8           “anchor:my.look.ca” “CNN.com”
                   t6
                   t5
                   t3
Select value from table where key='com.apache.www' AND label='anchor:apache.com'
Search: Scanner
Select value from table where anchor='cnnsi.com'
Row key            Time Stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” “CNN”
                   t8           “anchor:my.look.ca” “CNN.com”
                   t6
                   t5
                   t3
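The difference between the two searches can be sketched in plain Java: when the row key is known the lookup goes straight to that row, while a search by label alone has to visit every row, as a scanner does. SearchSketch and its methods are hypothetical names, the table is an in-memory stand-in, and timestamps are omitted to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy comparison of a keyed lookup and a scanner-style search (hypothetical names).
final class SearchSketch {
    // Point lookup: row key and column label are both known.
    static String get(TreeMap<String, TreeMap<String, String>> table, String row, String column) {
        TreeMap<String, String> columns = table.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Scanner-style search: every row is visited and filtered by column label.
    static List<String> scan(TreeMap<String, TreeMap<String, String>> table, String column) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) hits.add(value);
        }
        return hits;
    }

    // The "anchor:" cells from the slide, without timestamps.
    static TreeMap<String, TreeMap<String, String>> exampleTable() {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        table.computeIfAbsent("com.apache.www", k -> new TreeMap<>()).put("anchor:apache.com", "APACHE");
        table.computeIfAbsent("com.cnn.www", k -> new TreeMap<>()).put("anchor:cnnsi.com", "CNN");
        table.computeIfAbsent("com.cnn.www", k -> new TreeMap<>()).put("anchor:my.look.ca", "CNN.com");
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = exampleTable();
        System.out.println(get(table, "com.apache.www", "anchor:apache.com"));
        System.out.println(scan(table, "anchor:cnnsi.com"));
    }
}
```

The keyed lookup costs one sorted-map search, while the scan grows with the number of rows.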
Summary
Column-oriented storage makes modification more flexible.
Higher performance on clustered row keys.
Future work
More test work
Optimization on search
Thank you