HBase - AIOUG

Posted on 08-Sep-2019


HBase

V.Hariharaputhran

o Fourteen years in Oracle Development / DBA / Big Data / Cloud Technologies

o All India Oracle Users Group (AIOUG) Evangelist

o Passion to learn and share

o Blog: www.puthranv.com

Harish P

o Eight-plus years as an Oracle DBA

o Big Data / Cloud Technologies / RAC Specialist

o All India Oracle Users Group (AIOUG) Evangelist

o Passion to learn and share

Agenda

• Big Data Introduction

• Hadoop Components

• HBase Overview

• HBase in Hadoop

• Why HBase

• HBase Architecture

• HBase Read and Write

Data Data Data… Lots of Data

Twitter

Facebook

Google keeps track of you

World Population

Banking/Telecom/Energy… every industry contributes

No Data Archiving Logic

I am always online

Internet of People to Internet of Things

• Quality & consistency

• Maintain & repair

• Smart shopping

• Monitor pollution levels

• Wildlife protection

• Farming

• Energy

Devices TALK to each other as they become SMART & generate DATA

Hadoop Components


HDFS – Distributed File System

MapReduce – Distributed Data Processing Model

Hive – Provides an SQL-Based Query Language

HBase – Distributed Column-Based Database

Pig – Data Flow Execution
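To make the MapReduce model concrete, here is a toy, single-process word count. It only mimics the three stages (map, shuffle/group by key, reduce); the function names are hypothetical and this is not the Hadoop API, which runs the same steps in parallel on the cluster nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine all values collected for one key, here by summing them."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Big data big", "data lots of data"]))
    # {'big': 2, 'data': 3, 'lots': 1, 'of': 1}
```

The map and reduce functions never see each other's data directly; only the grouped key/value pairs pass between them, which is what lets Hadoop distribute each stage.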

HDFS - Daemon / Background Process

• Name Node (NN)

• Secondary Name Node (SNN)

• Data Node (DN)

[Figure: NN and SNN with Data Nodes DN1 to DN4]

MapReduce - Daemon / Background Process

• Job Tracker

• Task Tracker

[Figure: Job Tracker alongside NN and SNN, with Task Trackers on Data Nodes DN1 to DN3]

HBase – Daemon / Background Process

• HBase Master (HM)

• Region Server (RS)

[Figure: HBase Master (HM) with Region Servers RS1 to RS3]

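SQL vs NoSQL

A relational table stores each employee as one row:

EMPID  NAME      SALARY
100    Karthick  50000
101    Shiva     40000

HBase stores the same data as one cell per row and column, each with a timestamp:

Row  Column     Timestamp  Value
100  CF:Name    ts         Karthick
100  CF:Salary  ts         50000
101  CF:Name    ts         Shiva
101  CF:Salary  ts         40000

Adding a CITY value for employee 100 only leaves an empty (NULL) slot for 101 in the relational table, while HBase stores just one new cell:

EMPID  NAME      SALARY  CITY
100    Karthick  50000   CHENNAI
101    Shiva     40000

100  CF:City  ts  Chennai

Changing the city to Delhi updates the relational row, while HBase writes a new version of the cell with a newer timestamp:

EMPID  NAME      SALARY  CITY
100    Karthick  50000   DELHI
101    Shiva     40000

100  CF:City  ts  Delhi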
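The cell-per-column layout above can be sketched with a small in-memory model, assuming each cell is addressed by (rowkey, "family:qualifier") and keeps a list of timestamped versions. CellStore and its methods are hypothetical illustrations, not the HBase client API, and the timestamps are made up.

```python
from collections import defaultdict
from typing import Optional


class CellStore:
    """Toy model of HBase-style storage: one entry per (rowkey, column) cell,
    each holding timestamped versions. Illustrative only, not the HBase API."""

    def __init__(self) -> None:
        self._cells: dict[tuple[str, str], list[tuple[int, str]]] = defaultdict(list)

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        # A put never rewrites a whole row; it adds one new version of one cell.
        self._cells[(row, column)].append((ts, value))

    def get(self, row: str, column: str) -> Optional[str]:
        # A read returns the version with the highest timestamp, or None if the
        # cell was never written (sparse: missing cells take no space).
        versions = self._cells.get((row, column))
        return max(versions)[1] if versions else None

    def versions(self, row: str, column: str) -> list[tuple[int, str]]:
        # All stored versions, newest first.
        return sorted(self._cells.get((row, column), []), reverse=True)

    def cell_count(self) -> int:
        return sum(len(v) for v in self._cells.values())


if __name__ == "__main__":
    t = CellStore()
    # The two relational rows become four cells (timestamps are made up).
    t.put("100", "CF:Name", "Karthick", ts=1)
    t.put("100", "CF:Salary", "50000", ts=1)
    t.put("101", "CF:Name", "Shiva", ts=1)
    t.put("101", "CF:Salary", "40000", ts=1)
    # City for employee 100 only: one extra cell, nothing stored for 101.
    t.put("100", "CF:City", "Chennai", ts=2)
    # Moving to Delhi writes a newer version; this model keeps Chennai too.
    t.put("100", "CF:City", "Delhi", ts=3)
    print(t.get("100", "CF:City"), t.get("101", "CF:City"), t.cell_count())
    # Delhi None 6
```

In real HBase the number of versions retained per cell is limited by the column family's VERSIONS setting; this sketch simply keeps them all.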

NoSQL Databases

• Document databases

• Key-value stores

• Wide-column stores

HBase Keys & Column Families

Rowkey   Personal Data       Demographic
         Name   Address      DOB         Gender
100      Tom    SFO          01-01-1960  M
101      Mike   SFO          01-01-1970  M

• Each row has a key

• Each record is divided into column families

• Each column family consists of one or more columns

HBase Overview

• Scalable, distributed data store

• Open-source implementation modeled on Google's Bigtable

• Sparse

• Tightly integrated with Hadoop

• Not an RDBMS

HBase is

• A column-family-oriented database

  • Tables consisting of rows and columns

• A persisted map

  • Sparse

  • Multi-dimensional

  • Sorted

  • Indexed by rowkey, column and timestamp

• A key-value store

  • [rowkey, col family, col qualifier, timestamp] -> cell value
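A minimal sketch of the "sorted, multi-dimensional persisted map" view, assuming keys of (rowkey, family, qualifier, timestamp) compared byte-wise, with newer timestamps sorting first. SortedCellMap is a hypothetical name; it sorts on every scan for brevity, whereas a real store keeps its data sorted as it is written.

```python
from typing import Iterator

# (rowkey, column family, column qualifier, timestamp)
Key = tuple[bytes, bytes, bytes, int]


def sort_key(key: Key) -> tuple[bytes, bytes, bytes, int]:
    # Row, family and qualifier sort byte-wise ascending; negating the
    # timestamp makes the newest version of a cell come first.
    row, family, qualifier, ts = key
    return (row, family, qualifier, -ts)


class SortedCellMap:
    """Hypothetical sketch of HBase's logical model: a sparse, sorted map
    from (rowkey, family, qualifier, timestamp) to a cell value."""

    def __init__(self) -> None:
        self._data: dict[Key, bytes] = {}

    def put(self, row: bytes, family: bytes, qualifier: bytes, ts: int, value: bytes) -> None:
        self._data[(row, family, qualifier, ts)] = value

    def scan(self) -> Iterator[tuple[Key, bytes]]:
        # Sorting on every scan keeps the sketch short.
        for key in sorted(self._data, key=sort_key):
            yield key, self._data[key]

    def get(self, row: bytes) -> dict[tuple[bytes, bytes], bytes]:
        """Latest value of every column in one row."""
        result: dict[tuple[bytes, bytes], bytes] = {}
        for (r, family, qualifier, _ts), value in self.scan():
            # Newest versions sort first, so keep only the first value seen.
            if r == row and (family, qualifier) not in result:
                result[(family, qualifier)] = value
        return result


if __name__ == "__main__":
    m = SortedCellMap()
    m.put(b"row2", b"cf1", b"cq1", 1, b"val2")
    m.put(b"row10", b"cf1", b"cq1", 1, b"val10")
    m.put(b"row1", b"cf1", b"cq1", 1, b"val1")
    m.put(b"row1", b"cf1", b"cq1", 2, b"val1-new")
    print([key[0] for key, _ in m.scan()])  # [b'row1', b'row1', b'row10', b'row2']
    print(m.get(b"row1"))                   # {(b'cf1', b'cq1'): b'val1-new'}
```

Because row keys compare as bytes, row10 sorts before row2; zero-padding numeric keys is a common way to get numeric order.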

HBase is not...

• A relational database:

  • No SQL query language

  • No joins

  • No secondary indexing

  • No multi-row transactions (updates are atomic only within a single row)

When to Use HBase

• Data volume

• Application types

• Hardware environment

• No requirement for relational features

• Quick access to data

HBase Features

• Scalability

• Sharding

• Distributed storage

• Failover support

• API support

• MapReduce support

• Backup support

HBase vs RDBMS

HBase Shell

bin/hbase shell

• Create table: create 'mytable', 'cf1'

• List tables: list

• Describe table: describe 'mytable'

HBase Shell (continued)

• Put a row: put 'mytable', 'row1', 'cf1:cq1', 'val1'

• Get a row: get 'mytable', 'row1'

• Put more:

  put 'mytable', 'row2', 'cf1:cq1', 'val2'

  put 'mytable', 'row1', 'cf1:cq2', 'val3'

• Get the row again: get 'mytable', 'row1'

• Scan table: scan 'mytable'

Demo


HBase – Column Families (continued)

Key-value pair format (one entry per cell version):

Rowkey  ColumnFamily  Column  Timestamp  Value
1       CF1           COL1    123        INDIA
1       CF1           COL1    124        27
1       CF1           COL2    126        AIOUG
1       CF1           COL2    127        NI
1       CF2           COL3    123        12.6
1       CF2           COL3    128        ORACLE

Row format:

Row Key  CF1:COL1  CF1:COL2  CF2:COL3  Timestamp
1        INDIA               12.6      123
1        27                            124
1                  AIOUG               126
1                  NI                  127
1                            ORACLE    128
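Reading the key-value table above, the current value of each column is the version with the highest timestamp, and a read can also be bounded to an earlier point in time. The hypothetical helper row_as_of applies that rule to the six cells of row key 1; it is an illustration, not an HBase API.

```python
from typing import Optional

# Cells of row key 1, copied from the key-value table above:
# (column family, column, timestamp, value)
CELLS = [
    ("CF1", "COL1", 123, "INDIA"),
    ("CF1", "COL1", 124, "27"),
    ("CF1", "COL2", 126, "AIOUG"),
    ("CF1", "COL2", 127, "NI"),
    ("CF2", "COL3", 123, "12.6"),
    ("CF2", "COL3", 128, "ORACLE"),
]


def row_as_of(cells: list[tuple[str, str, int, str]],
              max_ts: Optional[int] = None) -> dict[str, str]:
    """Hypothetical helper: newest value per 'family:column', ignoring any
    version whose timestamp is greater than max_ts (if given)."""
    best: dict[str, tuple[int, str]] = {}
    for family, column, ts, value in cells:
        if max_ts is not None and ts > max_ts:
            continue  # version is newer than the requested point in time
        name = f"{family}:{column}"
        if name not in best or ts > best[name][0]:
            best[name] = (ts, value)
    return {name: value for name, (_ts, value) in sorted(best.items())}


if __name__ == "__main__":
    print(row_as_of(CELLS))              # {'CF1:COL1': '27', 'CF1:COL2': 'NI', 'CF2:COL3': 'ORACLE'}
    print(row_as_of(CELLS, max_ts=125))  # {'CF1:COL1': '27', 'CF2:COL3': '12.6'}
```

As of timestamp 125, COL2 has no value yet, which is how the row format above leaves those slots empty.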

HBase Read and Write

HBase Catalog Tables

• Keeps track of where the .META. table is located

• .META.: keeps track of all tables and the regions that are present

Meta Table


HBase – Regions and Region Servers

Table - TBL

[Figure: the rows of table TBL (row keys a to p) are split into Region1 to Region4; regions of TBL and of other tables (Table T, Region 240; Table A, Region 500) are hosted on Region Servers RS1210, RS1230 and RS1260]

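HBase Region

A table is divided horizontally into one or more regions. A region contains a contiguous, sorted range of rows, from its start key (inclusive) up to its end key (exclusive).

Regions do not have a fixed size: when a region grows past a configurable threshold (hbase.hregion.max.filesize), it is split into two regions.

Each region of a table is served to clients by a RegionServer; one RegionServer typically hosts many regions.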
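Because every region covers a contiguous [start key, end key) range, finding the region for a row key is a binary search over the sorted start keys. The split points below (empty key, e, i, m for table TBL) are assumptions for illustration; the slide does not state where TBL is split.

```python
import bisect

# Hypothetical split points for table TBL (row keys a..p in four regions).
# The first region starts at the empty key; the last region has no end key.
REGION_START_KEYS = [b"", b"e", b"i", b"m"]
REGION_NAMES = ["Region1", "Region2", "Region3", "Region4"]


def region_for(rowkey: bytes) -> str:
    """Return the region whose [start_key, end_key) range contains rowkey."""
    # bisect_right gives the position of the first start key greater than
    # rowkey; the region just before that position covers the key. A key equal
    # to a start key therefore lands in that region (start key is inclusive).
    index = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    return REGION_NAMES[index]


if __name__ == "__main__":
    for key in [b"a", b"d", b"e", b"k", b"p"]:
        print(key.decode(), region_for(key))
```

The empty first start key guarantees that every possible row key falls into some region.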

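HBase Client – Locate Data

[Figure: the Client, ZooKeeper, and two Region Servers, one serving the META table and one serving the DATA, both storing their files on HDFS Data Nodes]

HBase Client – Read / Locate Data

[Figure: the client gets the META location from ZooKeeper, looks up the row's region in META, and keeps the result in its cache before reading the data from the Region Server]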
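The lookup sequence in the figures (ZooKeeper for the META location, then META for the region, then the client-side cache) can be sketched as follows. Everything here is hypothetical and simplified to a single table: Meta, Client and the region-to-server assignment are illustrative stand-ins rather than HBase classes, the server names are borrowed from the region slide, and plain method calls replace network RPCs.

```python
import bisect

# A region location: (start_key, end_key, region_server). An empty end_key
# means the region runs to the end of the table.
Region = tuple[bytes, bytes, str]


class Meta:
    """Hypothetical stand-in for the META table of a single user table."""

    def __init__(self, regions: list[Region]) -> None:
        self.regions = sorted(regions)
        self.lookups = 0  # how many times clients had to consult META

    def locate(self, row: bytes) -> Region:
        self.lookups += 1
        starts = [start for start, _end, _server in self.regions]
        return self.regions[bisect.bisect_right(starts, row) - 1]


def covers(region: Region, row: bytes) -> bool:
    start, end, _server = region
    return start <= row and (end == b"" or row < end)


class Client:
    """Hypothetical client: ZooKeeper -> META -> cache -> Region Server."""

    def __init__(self, zookeeper: dict[str, str], meta_hosts: dict[str, Meta]) -> None:
        self.zookeeper = zookeeper     # knows which server hosts META
        self.meta_hosts = meta_hosts   # server name -> META it serves (stands in for an RPC)
        self.cache: list[Region] = []  # region locations learned so far

    def server_for(self, row: bytes) -> str:
        for region in self.cache:
            if covers(region, row):
                return region[2]       # cache hit: no ZooKeeper or META round trip
        meta = self.meta_hosts[self.zookeeper["meta_location"]]
        region = meta.locate(row)
        self.cache.append(region)
        return region[2]


if __name__ == "__main__":
    meta = Meta([(b"", b"i", "RS1210"), (b"i", b"", "RS1230")])
    client = Client({"meta_location": "RS1260"}, {"RS1260": meta})
    print(client.server_for(b"c"), client.server_for(b"d"), client.server_for(b"k"))
    print(meta.lookups)  # 2: the lookup for b"d" was served from the cache
```

In real HBase a cached location can go stale when a region moves or splits, and the client then has to consult META again; the sketch leaves that out.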

Where does your data reside?

HBase Region Server Components

HBase Write

[Figure: Client, HMaster and Region Server 102; rows with keys 100, 1 and 50 are written to the WAL and the MemStore, an ACK is returned to the client, and the MemStore is later flushed to an HFile]
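The write path in the figure can be sketched as a toy Region Server, assuming a tiny flush threshold of three cells so that a flush happens right away. The class and field names are hypothetical; the point is the order of steps (WAL, then MemStore, then ACK) and that a flush writes rows out sorted by key.

```python
from dataclasses import dataclass, field


@dataclass
class RegionServer:
    """Toy write path: WAL -> MemStore -> ACK, with flushes to sorted files.
    Hypothetical names; flush_threshold counts cells, not bytes."""
    flush_threshold: int = 3
    wal: list[tuple[bytes, bytes]] = field(default_factory=list)
    memstore: dict[bytes, bytes] = field(default_factory=dict)
    hfiles: list[list[tuple[bytes, bytes]]] = field(default_factory=list)

    def put(self, row: bytes, value: bytes) -> str:
        self.wal.append((row, value))   # 1. append to the write-ahead log for recovery
        self.memstore[row] = value      # 2. apply the edit to the in-memory MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                # inline here for simplicity; HBase flushes in the background
        return "ACK"                    # 3. acknowledge the client

    def flush(self) -> None:
        # Write the MemStore out as a new immutable file, sorted by row key.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()


if __name__ == "__main__":
    rs = RegionServer()
    for key in [b"100", b"1", b"50"]:
        rs.put(key, b"v")
    print([row for row, _ in rs.hfiles[0]])  # [b'1', b'100', b'50']
```

With the slide's keys 100, 1 and 50, the flushed file holds them in byte order (1, 100, 50), not numeric order.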

How Data Is Stored in an HFile

Demo


HBase Delete

When a Delete command is issued, the data is not removed immediately. Instead, a tombstone marker is written, and reads skip the cells it covers.

HBase physically removes the deleted cells later, during major compactions.

Tombstone Markers

-> Version delete marker: marks a single version of a column for deletion

-> Column delete marker: marks all versions of a column for deletion

-> Family delete marker: marks all versions of all columns in a column family for deletion
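The three tombstone types can be modeled with a small store, assuming simplified rules: a version marker hides exactly one timestamp, while column and family markers hide every version at or below their own timestamp. Store, Cell and Tombstone are hypothetical names, and real major compactions have extra rules (for example the KEEP_DELETED_CELLS column family setting) that are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    row: str
    family: str
    qualifier: str
    ts: int
    value: str


@dataclass(frozen=True)
class Tombstone:
    kind: str            # "version", "column" or "family"
    row: str
    family: str
    qualifier: str = ""  # not used by family markers
    ts: int = 0

    def masks(self, cell: Cell) -> bool:
        if cell.row != self.row or cell.family != self.family:
            return False
        if self.kind == "family":
            return cell.ts <= self.ts        # all columns of the family, up to ts
        if cell.qualifier != self.qualifier:
            return False
        if self.kind == "column":
            return cell.ts <= self.ts        # all versions of the column, up to ts
        return cell.ts == self.ts            # "version": exactly one version


@dataclass
class Store:
    cells: list[Cell] = field(default_factory=list)
    tombstones: list[Tombstone] = field(default_factory=list)

    def delete(self, marker: Tombstone) -> None:
        # A delete only records a marker; the cell data is left in place.
        self.tombstones.append(marker)

    def read(self) -> list[Cell]:
        # Reads hide every cell covered by some marker.
        return [c for c in self.cells if not any(t.masks(c) for t in self.tombstones)]

    def major_compact(self) -> None:
        # Simplified compaction: rewrite without masked cells, then drop markers.
        self.cells = self.read()
        self.tombstones.clear()


if __name__ == "__main__":
    store = Store([
        Cell("r1", "cf1", "a", 1, "x"),
        Cell("r1", "cf1", "a", 2, "y"),
        Cell("r1", "cf1", "b", 1, "z"),
        Cell("r1", "cf2", "c", 1, "w"),
    ])
    store.delete(Tombstone("version", "r1", "cf1", "a", ts=2))
    store.delete(Tombstone("family", "r1", "cf2", ts=5))
    print(len(store.read()), len(store.cells))      # 2 4: hidden, not yet removed
    store.major_compact()
    print(len(store.cells), len(store.tombstones))  # 2 0
```

Until the compaction runs, the masked cells still occupy storage, which is why heavy delete workloads depend on compactions to reclaim space.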
