Apache HBase Internals you Hoped you Never Needed to Understand
![Page 1: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/1.jpg)
Apache HBase Internals you Hoped you Never Needed to Understand
Josh Elser
Future of Data, NYC
2016/10/11
![Page 2: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/2.jpg)
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Engineer at Hortonworks, Member of the Apache Software Foundation
Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™
ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™
These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.
![Page 3: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/3.jpg)
Apache HBase for storing your data!
CC BY 3.0 US: http://hbase.apache.org/
![Page 4: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/4.jpg)
What happens when things go wrong?
CC BY-ND 2.0: https://www.flickr.com/photos/widnr/6588151679
![Page 5: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/5.jpg)
The BigTable Architecture
BigTable’s architecture is simple
Debugging a distributed system is not simple
How can we break down a complex system?
How do we write resilient software?
• Log-Structured Merge Tree
• Write-Ahead Logs
• Distributed Coordination
• Row-based, Auto-Sharding
• Strong Consistency
• Read Isolation
• Coprocessors
• Security (AuthN/AuthZ)
• Backups
![Page 6: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/6.jpg)
Naming Conventions
Servers
– Hostname, Port, and Timestamp
– RegionServer: r01n01.domain.com,16201,1475691463147
– Master: r02n01.domain.com,16000,1475691462616
Regions
– Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
– T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
– Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
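When reading logs it helps to pull these names apart mechanically. The following is a minimal sketch in Python, not HBase's own parser: the function and field names are made up for illustration, and it only handles region names without a Replica ID, like the example on this slide.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    hostname: str
    port: int
    start_code: int  # timestamp component of the server name


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int  # timestamp component of the region name
    encoded_name: str


def parse_server_name(name: str) -> ServerName:
    """Split 'hostname,port,timestamp' into its three parts."""
    hostname, port, start_code = name.split(",")
    return ServerName(hostname, int(port), int(start_code))


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startkey,regionid.encodedname.' into its parts.

    The start key may itself contain commas, so the table is taken up to
    the first comma and the region ID after the last one.
    Assumption: no Replica ID is present in the name.
    """
    table, rest = name.split(",", 1)
    start_key, tail = rest.rsplit(",", 1)
    region_id, encoded_name = tail.rstrip(".").split(".", 1)
    return RegionName(table, start_key, int(region_id), encoded_name)


server = parse_server_name("r01n01.domain.com,16201,1475691463147")
region = parse_region_name(
    r"T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c."
)
```

The encoded name is the short form that shows up in directory names and most log messages, so it is usually the most useful field to search for.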
![Page 7: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/7.jpg)
Regions
A sorted “shard” of a table
At least one “column family”
– Physical partitions
Each family can have zero to many files
Hosted by at most one RegionServer
– Can have many hosting RegionServers for reads (read replicas)
In-memory locks for certain intra-row operations
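Because Regions are sorted, non-overlapping shards, finding the Region for a row key is a binary search over the Regions' start keys. A minimal sketch, assuming the start keys are already available as a sorted list (the function name and sample keys are hypothetical, not an HBase API):

```python
import bisect
from typing import Sequence


def find_region(start_keys: Sequence[bytes], row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key.

    start_keys must be sorted and begin with b"" because the first region
    of a table has no lower bound. A region covers keys from its start key
    (inclusive) up to the next region's start key (exclusive).
    """
    return bisect.bisect_right(start_keys, row_key) - 1


# Four hypothetical regions: [b"", b"g"), [b"g", b"n"), [b"n", b"t"), [b"t", end)
start_keys = [b"", b"g", b"n", b"t"]
first = find_region(start_keys, b"apple")
boundary = find_region(start_keys, b"g")
last = find_region(start_keys, b"zebra")
```

A run of consecutive row keys always lands in the same Region, which is why monotonically increasing keys concentrate load on one RegionServer.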
![Page 8: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/8.jpg)
Region Assignment
Coordinated by the HBase Master
A Region must only be hosted by one RegionServer
State tracked in hbase:meta
– hbck to fix issues
Region splits/merges make a hard problem even harder
Moving towards ProcedureV2
[Diagram: Normal Region Assignment States (Closed, Offline, Pending Open, Opening, Open)]
![Page 9: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/9.jpg)
The File System
HDFS “Compatible”
– Distributed, durable, “write leases”
Physical storage of HBase Tables (HFiles)
Write-ahead logs
A parent directory in that FileSystem (hbase.rootdir)
![Page 10: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/10.jpg)
The File System
Physical Separation by HBase Namespace
/hbase/data/
/hbase/data/default/<table1>
/hbase/data/default/.tabledesc/.tableinfo…
/hbase/data/default/<table2>/<region_id1>
/hbase/data/default/<table2>/<region_id2>
/hbase/data/my_custom_ns/<table3>/…
/hbase/data/hbase/meta/…
/hbase/archive/…
/hbase/WALs/<regionserver_name>/…
/hbase/oldWALs/…
/hbase/corrupt/…
![Page 11: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/11.jpg)
The File System for one Region
/hbase/data/default/<table2>/<region_id1>
…/.regioninfo
…/.tmp
…/<family1>/<hfile>
…/<family1>/<hfile>
…/<family2>/<hfile>
…/<family3>/<hfile>
…/recovered.edits/<number>.seqid
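The layout is regular enough that a path can be built from, or broken back into, its namespace, table, Region, and family. A minimal sketch of that mapping, following only the directory structure shown on these slides; the helper names and the file name `hfile-0001` are hypothetical.

```python
import posixpath
from typing import Dict


def region_dir(rootdir: str, namespace: str, table: str, region: str) -> str:
    """Build <rootdir>/data/<namespace>/<table>/<region>."""
    return posixpath.join(rootdir, "data", namespace, table, region)


def describe_store_file(rootdir: str, path: str) -> Dict[str, str]:
    """Break an HFile path under <rootdir>/data back into its components."""
    relative = posixpath.relpath(path, posixpath.join(rootdir, "data"))
    namespace, table, region, family, hfile = relative.split("/")
    return {
        "namespace": namespace,
        "table": table,
        "region": region,
        "family": family,
        "hfile": hfile,
    }


example_dir = region_dir(
    "/hbase", "default", "table2", "c04d94cd4ee9797da2fb906b4dcd2e3c"
)
example_file = posixpath.join(example_dir, "family1", "hfile-0001")
info = describe_store_file("/hbase", example_file)
```

Going from a file path in an error message back to the table and Region it belongs to is often the first step in narrowing down which RegionServer logs to read.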
![Page 12: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/12.jpg)
Writes into HBase
Mutations inserted into sorted in-memory structure and WAL
– Fast lookups of recent data
– Append-only log for durability and speed
Mutations are collected by destination Region
Beware of hot-spotting
Data in memory eventually flushed into sorted (H)files
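The write path above can be sketched in a few lines. This is a toy model, not HBase code: the class name, the flush threshold, and the use of Python lists in place of the WAL and HFiles are all assumptions made for illustration, and WAL cleanup after a flush is not modeled.

```python
import bisect
from typing import Dict, List, Tuple


class RegionSketch:
    """Toy write path: append to a log, insert into sorted memory, flush."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[bytes, bytes]] = []  # append-only log
        self._keys: List[bytes] = []  # in-memory keys, kept sorted
        self._values: Dict[bytes, bytes] = {}  # latest value per key
        self.hfiles: List[List[Tuple[bytes, bytes]]] = []  # oldest first
        self.flush_threshold = flush_threshold

    def put(self, key: bytes, value: bytes) -> None:
        self.wal.append((key, value))  # record the mutation for durability
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep the memory structure sorted
        self._values[key] = value
        if len(self._keys) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the in-memory data out as one sorted, immutable file."""
        if self._keys:
            self.hfiles.append([(k, self._values[k]) for k in self._keys])
            self._keys, self._values = [], {}


sketch = RegionSketch(flush_threshold=2)
sketch.put(b"b", b"1")
sketch.put(b"a", b"2")  # second distinct key reaches the threshold and flushes
```

Every flush adds one more sorted file, which is what makes compactions necessary later.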
![Page 13: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/13.jpg)
Compactions and Flushes
Flush: Taking Key-Values from the In-Memory map and creating an HFile
Minor Compaction: Rewriting a subset of HFiles for a Region into one HFile
Major Compaction: Rewriting all HFiles for a Region into one HFile
Compactions balance improved query performance with cost of rewriting data
– Compactions are good!
– Must understand SLAs to properly tune compactions
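The difference between minor and major compactions is only how many files get rewritten. A minimal sketch, assuming files are lists of sorted (key, value) pairs ordered oldest to newest and that a minor compaction simply takes the oldest files; HBase's real file-selection policy is more involved and is not modeled here.

```python
from typing import Dict, List, Tuple

HFile = List[Tuple[bytes, bytes]]  # sorted (key, value) pairs


def compact(hfiles: List[HFile], count: int) -> List[HFile]:
    """Rewrite the `count` oldest files into one file.

    hfiles is ordered oldest to newest. Passing len(hfiles) as count
    corresponds to a major compaction; a smaller count to a minor one.
    """
    if count < 2:
        return list(hfiles)  # nothing to merge
    selected, untouched = hfiles[:count], hfiles[count:]
    merged: Dict[bytes, bytes] = {}
    for hfile in selected:  # oldest first, so newer values overwrite older
        merged.update(hfile)
    return [sorted(merged.items())] + untouched


files = [
    [(b"a", b"1"), (b"c", b"1")],
    [(b"a", b"2"), (b"b", b"2")],
    [(b"c", b"3")],
]
minor = compact(files, 2)
major = compact(files, len(files))
```

Fewer files means fewer streams to merge on every read, at the cost of rewriting data that was already on disk.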
![Page 14: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/14.jpg)
Reads into HBase
Merge-Sort over multiple streams of data
– Memory
– Disk (many files)
hbase:meta is the definitive source of where to find Regions
[Diagram: locating the Region for a RowKey (RowKey, Region, hbase:meta, RegionServer, ZooKeeper)]
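The merge-sort over memory and files can be sketched with a heap-based merge. This is an illustration only: the function names are hypothetical, each source is assumed to hold one value per key, and timestamps, versions, and deletes are left out.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Entry = Tuple[bytes, bytes]  # (key, value)


def _tagged(source: Iterable[Entry], age: int) -> Iterator[Tuple[bytes, int, bytes]]:
    """Attach the source's age so equal keys sort newest-first."""
    for key, value in source:
        yield key, age, value


def scan(memstore: List[Entry], hfiles: List[List[Entry]]) -> Iterator[Entry]:
    """Yield (key, value) in key order; the newest source wins per key.

    memstore is the sorted in-memory data; hfiles are sorted files ordered
    oldest to newest.
    """
    sources = [memstore] + list(reversed(hfiles))  # age 0 is the newest
    streams = [_tagged(src, age) for age, src in enumerate(sources)]
    last_key = None
    for key, _age, value in heapq.merge(*streams):
        if key != last_key:  # first entry seen for a key is the newest one
            yield key, value
            last_key = key


memstore = [(b"b", b"mem")]
hfiles = [
    [(b"a", b"old"), (b"b", b"old")],
    [(b"a", b"new"), (b"c", b"new")],
]
rows = list(scan(memstore, hfiles))
```

Each additional file is one more stream in the merge, which is the read-side cost that compactions reduce.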
![Page 15: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/15.jpg)
Apache ZooKeeper™
Distributed coordination is really hard
Obvious use cases
– Service Discovery
– Cluster Membership
– “Root Table”
Non-obvious use cases
– Assignment (sometimes)
– Region Recovery
– WAL Splitting
– Cluster Replication
– Distributed Procedures
– HBase Snapshots
Apache ZooKeeper is a trademark of the Apache Software Foundation
![Page 16: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/16.jpg)
Apache ZooKeeper™
Discovery/Leader ZNodes
– /hbase/rs/…
– /hbase/master/…
– /hbase/backup-masters/…
Consensus
– /hbase/splitWAL/…
– /hbase/flush-table-proc/...
– /hbase/table-lock/...
– /hbase/region-in-transition/...
– /hbase/recovering-regions/...
![Page 17: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/17.jpg)
Distributed Procedures
Resiliency in an unreliable system
– How do we create a table?
“Procedure V2”
– Resilient, finite state machine
HBase operations represented as “procedures”
Clients are agnostic of Master state
– Clients track procedure state
https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf
![Page 18: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/18.jpg)
Distributed Procedures
Procedures are durable via Write-Ahead Log
– /hbase/MasterProcWALs/…
Procedures only executed by the active HBase Master
Reusable framework for the future
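The idea of a durable state machine can be sketched as "run a step, then log it, and on restart skip whatever is already logged". The sketch below is not the Procedure V2 API: the step names, the in-memory list standing in for the procedure WAL, and the simulated crash are all hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


class MasterCrash(Exception):
    """Simulates the active Master dying in the middle of a procedure."""


def run_procedure(proc_id: int, steps: List[Step],
                  wal: List[Tuple[int, str]], crash_before: str = "") -> None:
    """Run steps in order, logging each completed step to the WAL.

    Steps already logged for proc_id are skipped, so a new active Master
    can resume where the previous one stopped. A crash between running a
    step and logging it would re-run that step, so steps must be idempotent.
    """
    done = {name for pid, name in wal if pid == proc_id}
    for name, action in steps:
        if name in done:
            continue
        if name == crash_before:
            raise MasterCrash(name)
        action()
        wal.append((proc_id, name))


executed: List[str] = []
names = ["PRE_CHECK", "WRITE_FS_LAYOUT", "ADD_TO_META", "ASSIGN_REGIONS"]
steps = [(n, (lambda n=n: executed.append(n))) for n in names]
proc_wal: List[Tuple[int, str]] = []

try:
    run_procedure(1, steps, proc_wal, crash_before="ADD_TO_META")
except MasterCrash:
    pass  # a backup Master takes over
run_procedure(1, steps, proc_wal)  # resumes from the logged state
```

Because the log, not the Master's memory, holds the procedure state, a client only needs the procedure ID to keep tracking progress after a failover.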
![Page 19: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/19.jpg)
HBase RPCs
Internal and External HBase Communication
Overview
– Half-Sync/Half-Async Model
– Many knobs to tweak
Components
– Listener
– Readers
– Scheduler
– Call Queues
– Call Runners/Handlers
![Page 20: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/20.jpg)
HBase RPCs
[Diagram: Request to Execution. Listener → Readers (four shown) → Scheduler → Call Queues (Priority, Read, Write, Replication) → Handlers]
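The scheduler's job in this pipeline is to decide which call queue a parsed request lands in, so that one kind of traffic cannot starve another. A minimal single-threaded sketch of that routing follows; the class, the method names, and the `high_priority` flag are hypothetical and do not reflect how HBase's scheduler actually classifies calls.

```python
from collections import deque
from typing import Deque, Dict, List

QUEUE_NAMES = ("priority", "read", "write", "replication")
READ_METHODS = {"get", "scan"}  # hypothetical method names


class SchedulerSketch:
    """Routes calls into separate queues, one per kind of work."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[dict]] = {
            name: deque() for name in QUEUE_NAMES
        }

    def dispatch(self, call: dict) -> str:
        """Place a call on a queue and return the queue's name."""
        if call.get("high_priority"):
            name = "priority"
        elif call["method"] == "replicate":
            name = "replication"
        elif call["method"] in READ_METHODS:
            name = "read"
        else:
            name = "write"
        self.queues[name].append(call)
        return name

    def run_handler(self, name: str) -> List[str]:
        """Drain one queue in FIFO order, as a handler thread would."""
        handled = []
        while self.queues[name]:
            handled.append(self.queues[name].popleft()["method"])
        return handled


scheduler = SchedulerSketch()
calls = [
    {"method": "put"},
    {"method": "get"},
    {"method": "scan"},
    {"method": "replicate"},
    {"method": "get", "high_priority": True},
]
routed = [scheduler.dispatch(call) for call in calls]
reads = scheduler.run_handler("read")
```

In a real server each queue is drained by its own pool of handler threads, and the sizes of those queues and pools are among the "many knobs to tweak".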
![Page 21: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/21.jpg)
Disaster Recovery
Multiple tools to ensure copies of data in the face of catastrophic failure
CopyTable
– MapReduce job which reads all data from a source, writing to destination
Snapshots
– A collection of Regions, their HFiles, and metadata
Backup & Restore
– HBASE-7912, currently targeted for HBase-2.0.0
– Incremental and full backup/restore
![Page 22: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/22.jpg)
Kerberos
Strong authentication for untrusted networks
“Standard” across Apache Hadoop and friends
Requirements:
– Forward/Reverse DNS
– Unlimited Strength Java Cryptography Extension
SASL used to build RPC systems
“Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO
![Page 23: Apache HBase Internals you hoped you Never Needed to Understand](https://reader035.vdocuments.site/reader035/viewer/2022062905/586f74911a28ab10258b5c35/html5/thumbnails/23.jpg)
Finding a Hypothesis
Logs logs logs
Application and System
Metrics exposed by JMX
Graphing solutions
– Ambari Metrics Server + Grafana