Introduction to HBase

Byeongweon Moon / REDDUCK
byeongweon.moon@reddduck.com

HBase Key Point

Clustered, commodity(-ish) hardware
Mostly schema-less
Dynamic distribution
Spread writes out over the cluster

Distributed database modeled on Bigtable
  Bigtable: A Distributed Storage System for Structured Data, by Chang et al.
Runs on top of Hadoop Core
  Layers on HDFS for storage
  Native connections to MapReduce
Distributed, High Availability, High Performance, Strong Consistency

HBase (cont.)

Column-oriented store
  Wide table costs only the data stored
  NULLs in row are ‘free’
  Good compression: columns of similar type
  Column name is arbitrary
Rows stored in sorted order
Can random read and write
Goal of billions of rows X millions of cells
  Petabytes of data across thousands of servers
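
To make the point about NULLs being free concrete, here is a minimal Python sketch. It assumes a toy store keyed by (row, column); the names `cells` and `put_row` are hypothetical and are not part of any HBase API. Only cells that actually hold a value are stored, so a sparse row costs only what it contains.

```python
# Toy model of sparse storage: one entry per non-empty cell.
# `cells` and `put_row` are hypothetical names for illustration, not HBase API.
cells = {}  # (row_key, column_name) -> value


def put_row(row_key, columns):
    """Store only the columns that carry a value; absent columns cost nothing."""
    for name, value in columns.items():
        if value is not None:
            cells[(row_key, name)] = value


put_row("user1", {"info:name": "Kim", "info:email": "kim@example.com"})
put_row("user2", {"info:name": "Lee", "info:phone": None})  # None is not stored

# Cells actually stored for user2: only the one column that has a value.
stored_for_user2 = [key for key in cells if key[0] == "user2"]
```

A fixed-width relational row would reserve space for every declared column; in this model the missing phone number simply never appears.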

Column Oriented Storage

!HBase (What HBase Is Not)

“NoSQL” Database
  No joins
  No sophisticated query engine
  No transactions (sort of)
  No column typing
  No SQL, no ODBC/JDBC, etc.
Not a replacement for RDBMS
  Matching Impedance

Why HBase?

Datasets are reaching petabytes
Traditional databases are expensive to scale and difficult to distribute
Commodity hardware is cheap and powerful
Need for both random access and batch processing (Hadoop alone offers batch processing but not random access)

Tables

Table is split into roughly equal sized “regions”

Each region is a contiguous range of keys

Regions split as they grow, thus dynamically adjusting to your data set
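
The region idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HBase is implemented: the `Table` class, its method names, and the row-count split threshold are all hypothetical (a real cluster would split on stored size, not on a row count). It shows how a key maps to the region whose contiguous range contains it, and how a region that grows too large splits in two.

```python
import bisect


class Table:
    """Toy table split into regions, each covering a contiguous key range."""

    def __init__(self, max_region_size=4):
        self.max_region_size = max_region_size  # hypothetical threshold, in rows
        self.start_keys = [b""]                 # sorted start key of each region
        self.regions = [[]]                     # sorted row keys held by each region

    def region_index(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self.region_index(row_key)
        region = self.regions[i]
        if row_key not in region:
            bisect.insort(region, row_key)
        if len(region) > self.max_region_size:
            self._split(i)

    def _split(self, i):
        # Split at the middle key so both halves are roughly equal in size.
        region = self.regions[i]
        mid = len(region) // 2
        self.regions[i:i + 1] = [region[:mid], region[mid:]]
        self.start_keys.insert(i + 1, region[mid])


table = Table()
for n in range(10):
    table.put(b"row%02d" % n)
```

After the ten inserts the single initial region has split three times, and `table.region_index(b"row05")` finds the owning region with a binary search over the start keys.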

Table (cont.)

Tables are sorted by Row
Table schema defines column families
  Families consist of any number of columns
  Columns consist of any number of versions
  Everything except table name is byte[]
(Table, Row, Family:Column, Timestamp) -> Value
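
Because row keys are plain byte arrays and tables are sorted by row, the sort order is byte-wise, not numeric. The short Python sketch below illustrates the consequence with made-up keys; the zero-padding width of 4 is an arbitrary choice for the example.

```python
# Row keys are raw bytes and compare byte by byte, not numerically.
naive_keys = [str(n).encode() for n in (1, 2, 10, 9)]
padded_keys = [b"%04d" % n for n in (1, 2, 10, 9)]

naive_order = sorted(naive_keys)    # b"10" sorts before b"2"
padded_order = sorted(padded_keys)  # zero-padding restores numeric order
```

Fixed-width or zero-padded keys are one common way to make the byte order match the order a scan is expected to return.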

Table (cont.)

As a data structure

SortedMap(
  RowKey, List(
    SortedMap(
      Column, List(
        Value, Timestamp
      )
    )
  )
)
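
The nested-map view can be tried out directly. The following Python sketch is an in-memory illustration only: `ToyTable` and its methods are hypothetical names, plain dicts stand in for the sorted maps (sorting happens at read time), and nothing here calls HBase. Each row maps to columns, and each column holds a list of timestamped versions, newest first.

```python
import time


class ToyTable:
    """row key -> column -> list of (timestamp, value), newest first."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value, timestamp=None):
        # Default to the current time in milliseconds when no timestamp is given.
        ts = timestamp if timestamp is not None else int(time.time() * 1000)
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first

    def get(self, row, column, max_versions=1):
        versions = self.rows.get(row, {}).get(column, [])
        return versions[:max_versions]

    def scan(self):
        # Yield rows in sorted row-key order, with columns sorted by name.
        for row in sorted(self.rows):
            yield row, sorted(self.rows[row])


t = ToyTable()
t.put(b"row2", b"cf:a", b"x", timestamp=5)
t.put(b"row1", b"cf:b", b"v1", timestamp=1)
t.put(b"row1", b"cf:b", b"v2", timestamp=2)
t.put(b"row1", b"cf:a", b"y", timestamp=3)

latest = t.get(b"row1", b"cf:b")
history = t.get(b"row1", b"cf:b", max_versions=2)
scanned = list(t.scan())
```

Reading a cell returns the newest version by default, and asking for more versions walks back through the timestamps.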

HBase Open Source Stack

ZooKeeper: Small Data Coordination Service
HBase: Database Storage Engine
HDFS: Distributed File System
Hadoop: Asynchronous Map-Reduce

Server Architecture

Similar to HDFS
  Master == Namenode
  Regionserver == Datanode
Often run these alongside each other!
Difference: HBase stores state in HDFS
HDFS provides robust data storage across machines, insulating against failure
Master and Regionserver fairly stateless and machine independent

Region Assignment

Each region from every table is assigned to a Regionserver
Master Duties:
  Responsible for assignment and handling regionserver problems (if any!)
  When machines fail, move regions
  When regions split, move regions to balance
  Could move regions to respond to load
  Can run multiple backup masters
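
The master's assignment duty can be illustrated with a small Python sketch. It is a simplification under assumptions: the round-robin placement, the least-loaded reassignment rule, and the names `assign` and `handle_failure` are all hypothetical and do not describe the actual HBase balancer.

```python
def assign(regions, servers):
    """Spread regions over servers round-robin; returns {server: [regions]}."""
    assignment = {s: [] for s in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def handle_failure(assignment, dead_server):
    """Move the dead server's regions to the least-loaded surviving servers."""
    orphaned = assignment.pop(dead_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


assignment = assign(["r1", "r2", "r3", "r4", "r5", "r6"], ["rs1", "rs2", "rs3"])
assignment = handle_failure(assignment, "rs2")
```

Since region data lives in HDFS rather than on the failed machine, moving a region in this model only changes who serves it.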

Master

The master does NOT
  Handle any write request (not a DB master!)
  Handle location finding requests
Not involved in the read/write path
Generally does very little most of the time

Distributed Coordination

ZooKeeper is used to manage master election and server availability

Set up as a cluster, provides distributed coordination primitives

An excellent tool for building cluster management systems
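
Master election on top of a coordination service can be sketched as "the first to create an ephemeral node wins". The Python below is an in-memory stand-in written for illustration: `ToyCoordinator`, `elect`, and the path `/toy/master` are hypothetical and are not the ZooKeeper client API.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # path -> owner session

    def create_ephemeral(self, path, owner):
        # Succeeds only if nobody holds the path; the first caller wins.
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def session_expired(self, owner):
        # Ephemeral nodes vanish when their owner's session ends.
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def elect(coordinator, candidates, path="/toy/master"):
    """Every candidate tries to create the node; return the current holder."""
    for candidate in candidates:
        coordinator.create_ephemeral(path, candidate)
    return coordinator.nodes[path]


zk = ToyCoordinator()
first = elect(zk, ["master-a", "master-b"])   # master-a creates the node first
zk.session_expired("master-a")                # active master dies
second = elect(zk, ["master-b"])              # backup master takes over
```

The same ephemeral-node mechanism also covers server availability: a server whose node disappears is treated as gone.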

HBase Architecture

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

How data is actually stored

Write-ahead-Log

http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
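
The general write-ahead-log pattern can be sketched as follows. This is a generic illustration of the technique, not a description of the code discussed in the linked article: `ToyRegionServer` and its methods are hypothetical, and a Python list stands in for a durable log file.

```python
class ToyRegionServer:
    """Append each edit to a log before applying it to the in-memory store."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # stands in for a durable log
        self.memstore = {}

    def put(self, row, column, value):
        self.log.append((row, column, value))  # 1. record the edit first
        self.memstore[(row, column)] = value   # 2. then apply it in memory

    @classmethod
    def recover(cls, log):
        # Rebuild in-memory state by replaying the surviving log in order.
        server = cls(log=list(log))
        for row, column, value in log:
            server.memstore[(row, column)] = value
        return server


server = ToyRegionServer()
server.put(b"row1", b"cf:a", b"v1")
server.put(b"row1", b"cf:a", b"v2")
server.put(b"row2", b"cf:b", b"v3")

surviving_log = list(server.log)  # the in-memory store is lost in a crash
recovered = ToyRegionServer.recover(surviving_log)
```

Because every edit reaches the log before memory, replaying the log in order reproduces the state that was lost.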

HBase - Roadmap

HBase 0.92.0
  Coprocessors
  Distributed Log Splitting
  Running Tasks in UI
  Performance Improvements
HBase 0.94.0
  Security
  Secondary Indexes
  Search Integration
  HFile v2

Reference

http://ofps.oreilly.com/titles/9781449396107/index.html

http://hbase.apache.org/book.html#quickstart

http://www.larsgeorge.com/2010/02/fosdem-2010-nosql-talk.html
