Introduction to HBase
![Page 2: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/2.jpg)
HBase Key Points
- Clustered, commodity(-ish) hardware
- Mostly schema-less
- Dynamic distribution
- Spread writes out over the cluster
![Page 3: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/3.jpg)
HBase
- Distributed database modeled on Bigtable
  - "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.
- Runs on top of Hadoop Core
  - Layers on HDFS for storage
  - Native connections to MapReduce
- Distributed, High Availability, High Performance, Strong Consistency
![Page 4: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/4.jpg)
HBase (cont.)
- Column-oriented store
  - Wide table costs only the data stored
  - NULLs in row are ‘free’
  - Good compression: columns of similar type
  - Column name is arbitrary
- Rows stored in sorted order
- Can random read and write
- Goal of billions of rows X millions of cells
  - Petabytes of data across thousands of servers
![Page 5: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/5.jpg)
Column Oriented Storage
![Page 6: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/6.jpg)
!HBase
- “NoSQL” Database
  - No joins
  - No sophisticated query engine
  - No transactions (sort of)
  - No column typing
  - No SQL, no ODBC/JDBC, etc.
- Not a replacement for RDBMS
- Matching Impedance
![Page 7: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/7.jpg)
Why HBase?
- Datasets are reaching petabytes
- Traditional databases are expensive to scale and difficult to distribute
- Commodity hardware is cheap and powerful
- Need for random access as well as batch processing (Hadoop alone offers only the latter)
![Page 8: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/8.jpg)
Tables
- Table is split into roughly equal sized “regions”
- Each region is a contiguous range of keys
- Regions split as they grow, thus dynamically adjusting to your data set
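The splitting behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyRegionedTable` and its `max_region_rows` threshold are hypothetical, and the split is triggered by row count purely for simplicity, which is not necessarily how HBase decides when to split.

```python
import bisect


class ToyRegionedTable:
    """Toy table whose sorted row keys are partitioned into contiguous regions."""

    def __init__(self, max_region_rows=4):
        # Hypothetical threshold: a region splits once it holds more rows than this.
        self.max_region_rows = max_region_rows
        # Each region is a sorted list of row keys; together they cover the key space.
        self.regions = [[]]

    def _locate(self, row_key):
        # The start key of every region after the first is its smallest row key.
        start_keys = [region[0] for region in self.regions[1:]]
        return bisect.bisect_right(start_keys, row_key)

    def put(self, row_key):
        idx = self._locate(row_key)
        region = self.regions[idx]
        pos = bisect.bisect_left(region, row_key)
        if pos < len(region) and region[pos] == row_key:
            return  # row already present, nothing to do in this toy model
        region.insert(pos, row_key)
        if len(region) > self.max_region_rows:
            mid = len(region) // 2
            # Replace the oversized region with two contiguous halves.
            self.regions[idx:idx + 1] = [region[:mid], region[mid:]]


table = ToyRegionedTable(max_region_rows=4)
for n in range(10):
    table.put(f"row{n:02d}".encode())
```

After the ten inserts the table holds several regions, each still covering a contiguous, non-overlapping key range.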
![Page 9: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/9.jpg)
Table (cont.)
- Tables are sorted by Row
- Table schema defines column families
  - Families consist of any number of columns
  - Columns consist of any number of versions
- Everything except table name is byte[]
- (Table, Row, Family:Column, Timestamp) -> Value
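Because row keys are plain bytes, "sorted by Row" means byte-wise lexicographic order, not numeric order. The short sketch below shows the consequence; the key formats (`user1`, zero-padded `user0001`, big-endian integers) are illustrative assumptions, not conventions taken from the slides.

```python
import struct

ids = (1, 2, 10)

# Naive string keys: byte order puts user10 before user2.
naive_keys = sorted(f"user{n}".encode() for n in ids)

# Zero-padded keys: byte order now matches numeric order.
padded_keys = sorted(f"user{n:04d}".encode() for n in ids)

# Fixed-width big-endian integers also sort numerically as bytes.
binary_keys = sorted(struct.pack(">I", n) for n in ids)
decoded = [struct.unpack(">I", k)[0] for k in binary_keys]
```

Here `naive_keys` comes out as `user1`, `user10`, `user2`, while `padded_keys` and `decoded` follow numeric order.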
![Page 10: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/10.jpg)
Table (cont.)
As a data structure:

    SortedMap(RowKey, List(
        SortedMap(Column, List(
            Value, Timestamp)
        ))
    )
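The nested-map view can be sketched in a few lines of Python. This is an in-memory illustration only: `ToyTable` and its method names are hypothetical, and plain dictionaries sorted at read time stand in for a real sorted map.

```python
from collections import defaultdict


class ToyTable:
    """Row key -> column -> versions, mimicking the nested sorted-map model."""

    def __init__(self):
        # row key -> "family:column" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self._rows.get(row, {}).get(column, [])
        for ts, value in versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row=b"", stop_row=None):
        """Yield (row, {column: latest value}) in row-key order."""
        for row in sorted(self._rows):
            if row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            columns = self._rows[row]
            yield row, {col: columns[col][0][1] for col in sorted(columns)}


t = ToyTable()
t.put(b"row2", b"info:name", b"bob", timestamp=1)
t.put(b"row1", b"info:name", b"alice", timestamp=1)
t.put(b"row1", b"info:name", b"alicia", timestamp=2)
```

A `get` without a timestamp returns the newest version, while passing an older timestamp reads the value as it was at that point.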
![Page 11: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/11.jpg)
HBase Open Source Stack
- ZooKeeper: Small Data Coordination Service
- HBase: Database Storage Engine
- HDFS: Distributed File System
- Hadoop: Asynchronous Map-Reduce Jobs
![Page 12: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/12.jpg)
Server Architecture
- Similar to HDFS
  - Master == Namenode
  - Regionserver == Datanode
- Often run these alongside each other!
- Difference: HBase stores state in HDFS
- HDFS provides robust data storage across machines, insulating against failure
- Master and Regionserver fairly stateless and machine independent
![Page 13: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/13.jpg)
Region Assignment
- Each region from every table is assigned to a Regionserver
- Master Duties:
  - Responsible for assignment and handling regionserver problems (if any!)
  - When machines fail, move regions
  - When regions split, move regions to balance
  - Could move regions to respond to load
  - Can run multiple backup masters
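The assignment duties listed above can be pictured with a small sketch. It assumes a simple "give the region to the least-loaded live server" policy; the function names are hypothetical and the real balancing logic in HBase is more involved.

```python
from collections import Counter


def least_loaded(assignment, servers):
    """Pick the live server currently holding the fewest regions (ties by name)."""
    load = Counter({server: 0 for server in servers})
    load.update(s for s in assignment.values() if s in load)
    return min(servers, key=lambda s: (load[s], s))


def assign_regions(regions, servers):
    """Assign every region to some server, spreading them evenly."""
    assignment = {}
    for region in regions:
        assignment[region] = least_loaded(assignment, servers)
    return assignment


def handle_server_failure(assignment, failed, servers):
    """Move the failed server's regions onto the remaining live servers."""
    live = [s for s in servers if s != failed]
    result = {r: s for r, s in assignment.items() if s != failed}
    for region in sorted(r for r, s in assignment.items() if s == failed):
        result[region] = least_loaded(result, live)
    return result


servers = ["rs1", "rs2", "rs3"]
regions = [f"region{n}" for n in range(1, 7)]
before = assign_regions(regions, servers)
after = handle_server_failure(before, failed="rs2", servers=servers)
```

In this run each server starts with two regions, and after `rs2` fails its two regions are spread over `rs1` and `rs3`.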
![Page 14: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/14.jpg)
Master
- The master does NOT
  - Handle any write request (not a DB master!)
  - Handle location finding requests
- Not involved in the read/write path
- Generally does very little most of the time
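Since each region is a contiguous key range, a client that knows the region start keys can work out which Regionserver owns a row without involving the master. The sketch below assumes the client already holds such a map; `RegionLocator`, the start keys, and the server names are hypothetical, and how a real client obtains and caches this information is outside the scope of the slides.

```python
import bisect


class RegionLocator:
    """Resolve a row key to a server from a sorted list of region start keys."""

    def __init__(self, region_map):
        # region_map: (start_key, server) pairs; the first start key must be b"".
        ordered = sorted(region_map)
        self._start_keys = [start for start, _ in ordered]
        self._servers = [server for _, server in ordered]

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        idx = bisect.bisect_right(self._start_keys, row_key) - 1
        return self._servers[idx]


locator = RegionLocator([(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")])
```

With this map, `locator.locate(b"apple")` resolves to `rs1`, `b"grape"` to `rs2`, and `b"zebra"` to `rs3`.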
![Page 15: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/15.jpg)
Distributed Coordination
- ZooKeeper is used to manage master election and server availability
- Set up as a cluster, provides distributed coordination primitives
- An excellent tool for building cluster management systems
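To make "master election" concrete, here is an in-memory sketch of one common election recipe: every candidate registers under an increasing sequence number, and the live candidate with the lowest number is the active master. `ToyCoordinator` is a hypothetical stand-in and not the ZooKeeper client API; the slides do not say which recipe HBase itself uses.

```python
import itertools


class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._sequence = itertools.count()
        self._candidates = {}  # sequence number -> server name

    def register(self, server):
        """Register a candidate; the entry lives until its session expires."""
        seq = next(self._sequence)
        self._candidates[seq] = server
        return seq

    def session_expired(self, seq):
        """Simulate a crashed server: its registration disappears."""
        self._candidates.pop(seq, None)

    def active_master(self):
        """The live candidate with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


zk = ToyCoordinator()
a = zk.register("master-a")
b = zk.register("master-b")
first = zk.active_master()
zk.session_expired(a)          # master-a crashes
second = zk.active_master()    # the backup master takes over
```

This mirrors the idea of running backup masters: when the active master's registration vanishes, the next candidate becomes active without any manual step.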
![Page 16: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/16.jpg)
HBase Architecture
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
![Page 17: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/17.jpg)
How data is actually stored
![Page 18: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/18.jpg)
Write-Ahead Log
http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
![Page 19: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/19.jpg)
HLog
![Page 20: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/20.jpg)
Demo
![Page 21: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/21.jpg)
HBase - Roadmap
- HBase 0.92.0
  - Coprocessors
  - Distributed Log Splitting
  - Running Tasks in UI
  - Performance Improvements
- HBase 0.94.0
  - Security
  - Secondary Indexes
  - Search Integration
  - HFile v2
![Page 22: Introduction to HBase](https://reader036.vdocuments.site/reader036/viewer/2022062405/554f72cab4c9058a148b541a/html5/thumbnails/22.jpg)
Reference
http://ofps.oreilly.com/titles/9781449396107/index.html
http://hbase.apache.org/book.html#quickstart
http://www.larsgeorge.com/2010/02/fosdem-2010-nosql-talk.html