New features and improvements in the upcoming Apache HBase 0.98 release.
Apache HBase 0.98
Andrew Purtell Committer, Apache HBase, Apache Software Foundation
Big Data US Research And Development, Intel
Who am I?
• Committer on the Apache HBase project
• Member of the Big Data Research And Development
Group at Intel
• Release manager for Apache HBase 0.98
What’s In Apache HBase 0.98?
• 212 resolved JIRAs
• New features
– Reverse scans (HBASE-4811)
– EXEC access checks for Endpoints (HBASE-6104)
– Transparent server side encryption (HBASE-7544)
– Per-cell ACLs (HBASE-7662)
– Visibility labels (HBASE-7663)
– Stripe compactions (HBASE-7667)
– MapReduce over snapshots (HBASE-8369)
– REST streaming scans (HBASE-9343)
• Performance improvements
– Improved WAL write threading model (HBASE-8755)
• API cleanups and many bug fixes
Branch Release Criteria
• Wire compatibility with HBase 0.96
– Mixed client↔server and server↔server operation with 0.96 is
possible as long as no 0.98-specific features are enabled
• Compatible with earlier on-disk data formats
• Direct upgrade possible from 0.94 → 0.98 using the
same offline data migration procedure necessary for
0.94 → 0.96
• No significant performance regression from 0.96 using
defaults
• Binary API compatibility with versions < 0.98 is not
guaranteed; code that directly references HBase JARs
may need to be recompiled
Reverse Scans (HBASE-4811)
• Introduces a new internal scanner type that seeks to
the end of a range and then steps backwards
• No longer necessary to maintain tables of keys in
reverse sort order for scanning
• Exposed at the client with a new Scan method
Scan#setReversed(boolean reversed)
• A few percent slower than forward scanning in CPU-bound
tests (server side, filters)
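The seek-then-step-backwards behavior described above can be illustrated without a cluster. The following is a minimal sketch, not HBase code: it models sorted rows with a `TreeMap`, and the class name, the method name, and the choice of an inclusive start row with an exclusive stop row are assumptions made for illustration. In a real client, the switch named on the slide is `Scan#setReversed(true)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a reverse scan over sorted row keys (not the HBase implementation).
final class ReverseScanSketch {

    // Returns row keys from startRow (inclusive) down to stopRow (exclusive).
    static List<String> reverseScan(NavigableMap<String, String> rows,
                                    String startRow, String stopRow) {
        List<String> out = new ArrayList<>();
        // Seek to the end of the range: the greatest key <= startRow.
        Map.Entry<String, String> entry = rows.floorEntry(startRow);
        // Step backwards until the stop row is reached or the data runs out.
        while (entry != null && entry.getKey().compareTo(stopRow) > 0) {
            out.add(entry.getKey());
            entry = rows.lowerEntry(entry.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d"}) {
            rows.put(k, "value-" + k);
        }
        System.out.println(reverseScan(rows, "c", "a"));
    }
}
```

The point of the sketch is that the data stays in its natural sort order; only the direction of iteration changes, so no second table of reversed keys is needed.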
Endpoint EXEC Grants (HBASE-6104)
• HBase ACLs can grant a familiar set of privileges to
users (and groups):
– (R)ead
– (W)rite
– E(X)ecute
– (C)reate
– (A)dmin
• AccessController versions prior to 0.98 ignored X
• Now access to coprocessor Endpoint invocations can
be controlled on a global, per-table, or per-CF basis
– Enable the AccessController
– Set hbase.security.exec.permission.checks to “true”
– Grant or revoke permissions as appropriate
– Deploy the coprocessor application
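To make the three scopes concrete, here is a minimal sketch of how an EXEC check could fall through global, per-table, and per-CF grants. It is a toy model under stated assumptions, not the AccessController: the class, the scope-string encoding, and the `execChecksEnabled` flag (standing in for `hbase.security.exec.permission.checks`) are all hypothetical.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of scoped EXEC checks (not the AccessController implementation).
final class ExecGrantSketch {
    enum Action { READ, WRITE, EXEC, CREATE, ADMIN }

    private final Map<String, Set<Action>> grants = new HashMap<>();
    // Stands in for the hbase.security.exec.permission.checks setting.
    private final boolean execChecksEnabled;

    ExecGrantSketch(boolean execChecksEnabled) {
        this.execChecksEnabled = execChecksEnabled;
    }

    // Scope is "" for global, "table" for a table, or "table:family" for a column family.
    void grant(String user, String scope, Action... actions) {
        grants.computeIfAbsent(user + "@" + scope, k -> EnumSet.noneOf(Action.class))
              .addAll(Arrays.asList(actions));
    }

    boolean mayInvokeEndpoint(String user, String table, String family) {
        if (!execChecksEnabled) {
            return true; // X is not enforced, as in versions prior to 0.98
        }
        return hasExec(user, "")
            || hasExec(user, table)
            || hasExec(user, table + ":" + family);
    }

    private boolean hasExec(String user, String scope) {
        Set<Action> actions = grants.get(user + "@" + scope);
        return actions != null && actions.contains(Action.EXEC);
    }

    public static void main(String[] args) {
        ExecGrantSketch acl = new ExecGrantSketch(true);
        acl.grant("alice", "orders", Action.READ, Action.EXEC);
        System.out.println(acl.mayInvokeEndpoint("alice", "orders", "cf1"));
        System.out.println(acl.mayInvokeEndpoint("bob", "orders", "cf1"));
    }
}
```

With the flag off, the sketch lets every caller through, which mirrors the slide's note that X was ignored before 0.98.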
Cell Tags
• All values written to HBase are stored into cells
– Cell is used interchangeably with “key-value” or “KeyValue” for
legacy reasons
• Cells can now also carry an arbitrary number of tags
– Metadata, considered distinct from the key and the value
– Optional dictionary compression for tags in HFiles and WALs
– Only available server side
– Coprocessors can manage their own user defined tags
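As a rough mental model of the points above, a cell can be pictured as a key and a value plus a separate list of typed tags that never reaches the client. The sketch below is illustrative only; the class names, the one-byte tag type, and the `clientView` method are assumptions and do not reflect HBase's actual classes or serialization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a cell carrying tags (not HBase's Cell or Tag types).
final class TaggedCellSketch {

    // A tag is metadata: a type code plus an opaque payload.
    static final class Tag {
        final byte type;
        final byte[] payload;

        Tag(byte type, byte[] payload) {
            this.type = type;
            this.payload = payload.clone();
        }
    }

    final String key;
    final String value;
    final List<Tag> tags;

    TaggedCellSketch(String key, String value, List<Tag> tags) {
        this.key = key;
        this.value = value;
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }

    // Returns a copy with one more tag; key and value are untouched.
    TaggedCellSketch withTag(byte type, byte[] payload) {
        List<Tag> next = new ArrayList<>(tags);
        next.add(new Tag(type, payload));
        return new TaggedCellSketch(key, value, next);
    }

    // Models "only available server side": the client sees key and value, no tags.
    TaggedCellSketch clientView() {
        return new TaggedCellSketch(key, value, Collections.emptyList());
    }

    public static void main(String[] args) {
        TaggedCellSketch cell = new TaggedCellSketch("row1/cf:q", "v", Collections.emptyList())
            .withTag((byte) 1, new byte[] {42});
        System.out.println(cell.tags.size() + " tag(s) server side, "
            + cell.clientView().tags.size() + " client side");
    }
}
```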
HFile Version 3
• HFile version 2 plus
– The ability to persist cell tags
– Support for optional file block encryption
• Enabled via a site file change
– hfile.format.version -> 3
• Once enabled, all data is transparently migrated over
time as new files are written by flushes and
compactions
• Required for:
– Transparent Encryption (HBASE-7544)
– Per-cell ACLs (HBASE-7662)
– Visibility labels (HBASE-7663)
• Considered experimental, but proven stable under load
Transparent Encryption (HBASE-7544)
• Introduces a new generic cryptographic codec and key
management framework into hbase-common
• Provides transparent encryption of HBase on-disk data
– Optional per-file HFile block encryption (requires HFile v3)
– Optional secure WAL reader and writer
• Provides simple key management
– Flexible and non-intrusive key rotation
– Two-tier key architecture for consistency with best practices
– Key provider supports secure local key storage or any network
or hardware key storage with Java KeyStore support
• Shell support
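The two-tier key idea and why it makes rotation non-intrusive can be sketched with the JDK's own `javax.crypto` classes. This is a minimal illustration under stated assumptions, not the HBase codec: a master key wraps a data key, the data key encrypts the payload, and rotating the master key only re-wraps the small data key. The class and method names are hypothetical, and AES in CTR mode with a fixed IV is chosen only to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical two-tier key sketch using only JDK crypto (not the HBase framework).
final class TwoTierKeySketch {

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tier 1: the master key protects the data key, never the data itself.
    static byte[] wrap(SecretKey master, SecretKey dataKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, master);
            return cipher.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey unwrap(SecretKey master, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, master);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tier 2: the data key encrypts or decrypts the payload.
    static byte[] crypt(int mode, SecretKey dataKey, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, dataKey, new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rotation re-wraps the data key; the encrypted payload is not rewritten.
    static byte[] rotate(SecretKey oldMaster, SecretKey newMaster, byte[] wrapped) {
        return wrap(newMaster, unwrap(oldMaster, wrapped));
    }

    public static void main(String[] args) {
        SecretKey oldMaster = newAesKey();
        SecretKey newMaster = newAesKey();
        SecretKey dataKey = newAesKey();
        byte[] iv = new byte[16]; // fixed IV for the demo only; never reuse an IV in practice
        byte[] plain = "block contents".getBytes(StandardCharsets.UTF_8);

        byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, dataKey, iv, plain);
        byte[] rewrapped = rotate(oldMaster, newMaster, wrap(oldMaster, dataKey));
        byte[] decrypted = crypt(Cipher.DECRYPT_MODE, unwrap(newMaster, rewrapped), iv, encrypted);
        System.out.println(Arrays.equals(plain, decrypted));
    }
}
```

The design point is that the bulk data is tied only to the data key, so changing the master key costs one small re-wrap per data key rather than a rewrite of the files.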
Per-Cell ACLs (HBASE-7662)
• Extends the AccessController with support for
persisting and checking ACL data in cell tags
• Uses existing API facilities to transmit per cell ACLs
• Backward compatible with existing installs and code
• We treat ACLs on a cell as scoped only to the cell for
straightforward policy evolution
• All mutations must have covering permission in a
dominating grant
Visibility Labels (HBASE-7663)
• Introduces a new VisibilityController coprocessor
• Introduces per-cell visibility expressions, client API
extensions for setting visibility and authorizations, and new
shell commands for label management
• The maximal set of labels for a user is defined with the new
shell command ‘setauths’ or equivalent admin API
• Users specify visibility expressions on cells
• Users submit authorizations on Gets and Scans
• The effective label set for the request is built in the RPC
context from authorizations; those not in the maximal set
are dropped
– How this is done is pluggable, e.g. integration with enterprise
identity management solutions
• Scan results are filtered with (label) set membership tests
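The rule that requested authorizations outside the user's maximal set are dropped amounts to a set intersection. The following is a minimal sketch of that one step, with a hypothetical class and method name; it does not show how the VisibilityController actually builds the set in the RPC context.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of building the effective label set for one request.
final class EffectiveAuthsSketch {

    // Keeps only the requested labels that the user is allowed to hold.
    static Set<String> effective(Set<String> maximal, Set<String> requested) {
        Set<String> result = new TreeSet<>(requested);
        result.retainAll(maximal); // labels not in the maximal set are dropped
        return result;
    }

    public static void main(String[] args) {
        Set<String> maximal = new TreeSet<>(Arrays.asList("secret", "topsecret"));
        Set<String> requested = new TreeSet<>(Arrays.asList("secret", "admin"));
        System.out.println(effective(maximal, requested));
    }
}
```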
Visibility Labels (HBASE-7663)
• Visibility expressions
– Labels: arbitrary strings (converted into ordinals with an
internal dictionary)
– Expressions: labels joined in boolean expressions
– Operators: &, |, !
– Parentheses for precedence
• Examples
– secret
– secret | topsecret
– ( secret | topsecret ) & !probationary
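To show how such expressions can be tested against a set of authorizations, here is a small recursive-descent evaluator. It is a sketch under stated assumptions, not HBase's parser: the class name is hypothetical, the set of characters allowed in a label is a guess, and the precedence used here (`!` binds tightest, then `&`, then `|`) is the conventional boolean ordering rather than something the slides specify.

```java
import java.util.Set;

// Hypothetical evaluator for visibility expressions using &, |, ! and parentheses.
final class VisibilityExpressionSketch {
    private final String src;
    private final Set<String> auths;
    private int pos;

    private VisibilityExpressionSketch(String src, Set<String> auths) {
        this.src = src;
        this.auths = auths;
    }

    // Returns true if a cell with this expression is visible to the given authorizations.
    static boolean isVisible(String expression, Set<String> auths) {
        VisibilityExpressionSketch parser = new VisibilityExpressionSketch(expression, auths);
        boolean result = parser.parseOr();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek('|')) {
            pos++;
            boolean rhs = parseAnd(); // always parse the right side, then combine
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseNot();
        while (peek('&')) {
            pos++;
            boolean rhs = parseNot();
            value = value && rhs;
        }
        return value;
    }

    private boolean parseNot() {
        if (peek('!')) {
            pos++;
            return !parseNot();
        }
        return parseAtom();
    }

    private boolean parseAtom() {
        if (peek('(')) {
            pos++;
            boolean value = parseOr();
            if (!peek(')')) {
                throw new IllegalArgumentException("Missing ')' at position " + pos);
            }
            pos++;
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && isLabelChar(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a label at position " + pos);
        }
        // Set membership test: the label is satisfied if the caller holds it.
        return auths.contains(src.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':';
    }

    private boolean peek(char expected) {
        skipSpaces();
        return pos < src.length() && src.charAt(pos) == expected;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        String expression = "( secret | topsecret ) & !probationary";
        System.out.println(isVisible(expression, Set.of("topsecret")));
        System.out.println(isVisible(expression, Set.of("secret", "probationary")));
    }
}
```

The slides mention that labels are converted to ordinals internally; the sketch compares strings directly to stay short.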
Improved WAL Write Throughput (HBASE-8755)
• Introduces a new threading model for WAL writes that
reduces lock contention
• Provides better write throughput when under load
– A ~15% improvement in write ops/sec at high write
concurrency
• Lays groundwork for multiple WALs
– Will provide further write throughput increase
– Also important for limiting the impact of encrypting WAL
entries
Stripe Compactions (HBASE-7667)
• Stripe compactions split the data inside the region by
row key and create sub-ranges of data
• The sub-ranges are compacted independently
• Depending on ingest and access patterns, using stripe
compactions can reduce read latency variability and
reduce compaction data volume (write amplification)
• Two use cases in particular may benefit
1. Approximately uniform keys and large regions
2. Non-uniform data with sequential row keys (e.g. log data)
• Can be complex to configure and tune; consult the
documentation for details
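The core idea, splitting a region's key space into sub-ranges that are compacted on their own, can be sketched with a sorted map of stripe start keys. This is a toy model, not the stripe compaction policy: the class name, the fixed boundaries, and the use of a row count as a stand-in for compaction volume are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a region split into stripes by row key.
final class StripeSketch {
    // Maps each stripe's start key to the rows it currently holds.
    private final NavigableMap<String, List<String>> stripes = new TreeMap<>();

    StripeSketch(String... boundaries) {
        stripes.put("", new ArrayList<>()); // the first stripe starts at the empty key
        for (String boundary : boundaries) {
            stripes.put(boundary, new ArrayList<>());
        }
    }

    // The owning stripe is the one with the greatest start key <= rowKey.
    String stripeStartFor(String rowKey) {
        return stripes.floorKey(rowKey);
    }

    void put(String rowKey) {
        stripes.get(stripeStartFor(rowKey)).add(rowKey);
    }

    // Compacting one stripe rewrites only that stripe's rows, not the whole region.
    int rowsRewrittenByCompacting(String rowKey) {
        return stripes.get(stripeStartFor(rowKey)).size();
    }

    public static void main(String[] args) {
        StripeSketch region = new StripeSketch("g", "n");
        for (String row : new String[] {"apple", "grape", "melon", "nut", "zebra"}) {
            region.put(row);
        }
        System.out.println("melon lives in the stripe starting at '"
            + region.stripeStartFor("melon") + "'");
    }
}
```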
MapReduce Over Snapshots (HBASE-8369)
• Introduces MapReduce utilities supporting MR jobs
over snapshots of table data
• Similar to TableInputFormat, but instead of running over
an online table using the HBase API, it runs directly
over HFiles on disk collected from a table snapshot
• For performance-dominant use cases where the
HBase API cannot provide sufficient throughput
– Can increase throughput of bulk scanning ~5x by streaming
HDFS reads directly to the client
• Caveat: Not recommended from a security perspective
– Built in access control is completely bypassed
– It is a risk to open direct access to HFile data in HDFS
REST Streaming Scans (HBASE-9343)
• The REST gateway provides stateful scanners to be
consistent with the HBase API, but this is not RESTful
– Scanner state is not shared across multiple gateways
– Scanner state will be lost if the gateway fails
• Introduces a new scanning mode to the REST API for
stateless scanning
• The client manages paging and limits
• Instead of forcing a batching up of results as they
come back from the RegionServers into multiple HTTP
transactions, the stateless scanner can stream all
results back to the client over one HTTP connection
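Because the scan is stateless, everything the gateway needs must travel in the request itself. The sketch below builds such a request URL so that the client, not the gateway, carries the range and the limit. The gateway address and table name are hypothetical, and the `/*` path suffix and the `startrow`, `endrow`, and `limit` parameter names are assumptions that should be checked against the HBase REST documentation for the version in use.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical builder for a stateless scan request (parameter names are assumptions).
final class StatelessScanUrlSketch {

    static String build(String gateway, String table,
                        String startRow, String endRow, int limit) {
        // The client carries the whole scan definition in every request.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("startrow", startRow);
        params.put("endrow", endRow);
        params.put("limit", Integer.toString(limit));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            query.add(param.getKey() + "="
                + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return gateway + "/" + table + "/*?" + query;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080", "logs", "row-100", "row-200", 50));
    }
}
```

Since no scanner state lives on the gateway, a follow-up page is just another URL with a later start row, and it can be sent to any gateway instance.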
End
Questions?