Apache HBase 0.98

Andrew Purtell

Committer, Apache HBase, Apache Software Foundation

Big Data US Research And Development, Intel

Who am I?

• Committer on the Apache HBase project

• Member of the Big Data Research And Development Group at Intel

• Release manager for Apache HBase 0.98

What’s In Apache HBase 0.98?

• ~230 resolved JIRAs

• New features

– Reverse scans (HBASE-4811)

– EXEC access checks for Endpoints (HBASE-6104)

– Transparent server side encryption (HBASE-7544)

– Per-cell ACLs (HBASE-7662)

– Visibility labels (HBASE-7663)

– Stripe compactions (HBASE-7667)

– MapReduce over snapshots (HBASE-8369)

– REST streaming scans (HBASE-9343)

• Performance improvements

– Improved WAL write threading model (HBASE-8755)

• API cleanups and many bug fixes

Branch Release Criteria

• Wire compatibility with HBase 0.96

– Mixed client↔server and server↔server operation with 0.96 is possible as long as no 0.98-specific features are enabled

• Compatible with earlier on-disk data formats

• Direct upgrade possible from 0.94 → 0.98 using the same offline data migration procedure required for 0.94 → 0.96

• No significant performance regression from 0.96 using defaults

• Binary API compatibility with versions < 0.98 is not guaranteed; code that directly references HBase JARs may need to be recompiled

Reverse Scans (HBASE-4811)

• Introduces a new internal scanner type that seeks to the end of a range and then steps backwards

• No longer necessary to maintain tables of keys in reverse sort order for scanning

• Exposed at the client with a new Scan method: Scan#setReversed(boolean reversed)

• A few % slower than forward scanning in CPU-bound tests (server side, filters)
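
To make the seek-then-step-backwards idea concrete, here is a minimal Java sketch over an in-memory sorted map. It is a conceptual model, not HBase's internal scanner; the class and method names are hypothetical, and it assumes the reversed range runs from a start row (inclusive) down to a stop row (exclusive), with the start row sorting after the stop row. Against a real cluster the client would instead call Scan#setReversed(true) and iterate the returned scanner as usual.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Conceptual model of a reverse scan over sorted rows; not HBase's scanner implementation. */
final class ReverseScanSketch {
    /**
     * Returns row keys from startRow (inclusive) down to stopRow (exclusive), largest first.
     * Assumes startRow sorts after stopRow (an assumption mirroring reversed Scan usage).
     */
    static List<String> reverseScan(NavigableMap<String, String> rows, String startRow, String stopRow) {
        List<String> result = new ArrayList<>();
        // Seek: position on the last row at or before the end of the range.
        Map.Entry<String, String> current = rows.floorEntry(startRow);
        // Step backwards one row at a time until the stop row is reached.
        while (current != null && current.getKey().compareTo(stopRow) > 0) {
            result.add(current.getKey());
            current = rows.lowerEntry(current.getKey());
        }
        return result;
    }
}
```

For rows a through e, reverseScan(rows, "d", "a") yields d, c, b. The seek lands on the largest row not after the start row, and each step uses lowerEntry to move back one row.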

Endpoint EXEC Grants (HBASE-6104)

• HBase ACLs can grant a familiar set of privileges to users (and groups):

– (R)ead

– (W)rite

– E(X)ecute

– (C)reate

– (A)dmin

• AccessController versions prior to 0.98 ignored X

• Now access to coprocessor Endpoint invocations can be controlled on a global, per-table, or per-CF basis

– Enable the AccessController

– Set hbase.security.exec.permission.checks to “true”

– Grant or revoke permissions as appropriate

– Deploy the coprocessor application
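
The scope lookup behind an EXEC check can be sketched in Java as follows. This is a hypothetical model, not the AccessController. The scope encoding ("" for global, "table" for a table, "table:cf" for a column family) and the class and method names are all assumptions, and the constructor flag only stands in for hbase.security.exec.permission.checks. In this model, a grant at any enclosing scope is sufficient.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of grants at global, table, and column-family scope; not the AccessController. */
final class ExecGrantSketch {
    enum Action { READ, WRITE, EXEC, CREATE, ADMIN } // R, W, X, C, A

    // Key is user + "@" + scope, where scope is "" (global), "table", or "table:cf" (assumed encoding).
    private final Map<String, Set<Action>> grants = new HashMap<>();
    private final boolean execChecksEnabled; // stands in for hbase.security.exec.permission.checks

    ExecGrantSketch(boolean execChecksEnabled) {
        this.execChecksEnabled = execChecksEnabled;
    }

    void grant(String user, String scope, Action... actions) {
        grants.computeIfAbsent(user + "@" + scope, k -> EnumSet.noneOf(Action.class))
              .addAll(Arrays.asList(actions));
    }

    void revoke(String user, String scope, Action action) {
        Set<Action> held = grants.get(user + "@" + scope);
        if (held != null) {
            held.remove(action);
        }
    }

    /** Returns true if the user may invoke an Endpoint on this table and column family. */
    boolean mayExec(String user, String table, String family) {
        if (!execChecksEnabled) {
            return true; // models the pre-0.98 behavior of ignoring X
        }
        // A grant at any enclosing scope (global, table, or family) is sufficient in this model.
        for (String scope : new String[] {"", table, table + ":" + family}) {
            if (grants.getOrDefault(user + "@" + scope, Collections.emptySet()).contains(Action.EXEC)) {
                return true;
            }
        }
        return false;
    }
}
```

Granting EXEC on a table covers Endpoint calls against any of its column families. Revoking it, or checking a different table, returns false unless a global grant applies.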

Cell Tags

• All values written to HBase are stored into cells

– Cell is used interchangeably with “key-value” or “KeyValue” for legacy reasons

• Cells can now also carry an arbitrary number of tags

– Metadata, considered distinct from the key and the value

– Only available server side

– Coprocessors can manage their own user-defined tags
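
As a rough mental model of tags, the Java sketch below keeps them in a separate list alongside a cell's key and value. The classes, the tag type code, and the forClient helper are hypothetical and do not reflect HBase's Cell or Tag API. forClient only mimics the point that tags stay on the server side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of a cell carrying tags; not HBase's KeyValue or Tag classes. */
final class TaggedCellSketch {
    static final class Tag {
        final byte type;     // hypothetical tag type code chosen by a coprocessor
        final String value;
        Tag(byte type, String value) { this.type = type; this.value = value; }
    }

    final String row, family, qualifier; // key parts (timestamp omitted for brevity)
    final String value;
    final List<Tag> tags;                // metadata, kept apart from key and value

    TaggedCellSketch(String row, String family, String qualifier, String value, List<Tag> tags) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }

    /** Server-side helper: a copy with one more tag; key and value are unchanged. */
    TaggedCellSketch withTag(Tag tag) {
        List<Tag> more = new ArrayList<>(tags);
        more.add(tag);
        return new TaggedCellSketch(row, family, qualifier, value, more);
    }

    /** Models tags being server-side only: the copy handed to a client carries none. */
    TaggedCellSketch forClient() {
        return new TaggedCellSketch(row, family, qualifier, value, Collections.emptyList());
    }

    boolean sameKeyAndValue(TaggedCellSketch o) {
        return row.equals(o.row) && family.equals(o.family)
            && qualifier.equals(o.qualifier) && value.equals(o.value);
    }
}
```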

HFile Version 3

• HFile version 2 plus

– The ability to persist cell tags

– Support for optional file block encryption

• Enabled via a site file change

– hfile.format.version -> 3

• Once enabled, all data is transparently migrated over time as new files are written by flushes and compactions

• Required for:

– Transparent Encryption (HBASE-7544)

– Per-cell ACLs (HBASE-7662)

– Visibility labels (HBASE-7663)
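
The gradual migration to HFile v3 can be modeled with a simple Java class. This hypothetical sketch tracks only the format version of each file in one store. It assumes the new hfile.format.version setting has taken effect, and that files written earlier keep their old format until a compaction rewrites them. It does not read or write real HFiles.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one store's files, tracking only each file's HFile format version. */
final class HFileMigrationSketch {
    private final List<Integer> fileVersions = new ArrayList<>();
    private int configuredVersion;

    HFileMigrationSketch(int configuredVersion) {
        this.configuredVersion = configuredVersion;
    }

    /** Stands in for the hfile.format.version site setting taking effect. */
    void setConfiguredVersion(int version) {
        configuredVersion = version;
    }

    /** A flush writes one new file in the currently configured format. */
    void flush() {
        fileVersions.add(configuredVersion);
    }

    /** A major compaction rewrites all existing files into one new file in the configured format. */
    void majorCompact() {
        if (fileVersions.isEmpty()) {
            return; // nothing to rewrite
        }
        fileVersions.clear();
        fileVersions.add(configuredVersion);
    }

    List<Integer> versions() {
        return new ArrayList<>(fileVersions);
    }
}
```

After the switch, old files remain version 2 and new flushes produce version 3. Once a major compaction runs, only version 3 is left.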

Transparent Encryption (HBASE-7544)

• Introduces a new generic cryptographic codec and key management framework into hbase-common

• Provides transparent encryption of HBase on-disk data

– Optional per-file HFile block encryption (requires HFile v3)

– Optional secure WAL reader and writer

• Block encryption is enabled on a per-CF basis

– Supports schema designs that place sensitive information in only a subset of column families

• Provides simple key management

– Flexible and non-intrusive key rotation

– Key provider supports secure local key storage, or any network or hardware key storage with Java KeyStore support

• Simple shell support for testing
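
Envelope encryption is a common way to make key rotation non-intrusive. Data blocks are encrypted with a data key, and that data key is stored only in wrapped form, encrypted under a master key. The Java sketch below shows the pattern using the JDK's AES/GCM cipher. It is a generic illustration under that assumption. It does not reflect HBase's codec, cipher choice, key provider, or exact rotation procedure, and the class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Generic envelope-encryption sketch with JDK AES/GCM; not HBase's codec or key provider. */
final class EnvelopeEncryptionSketch {
    private static final SecureRandom RNG = new SecureRandom();
    private static final int IV_LEN = 12;    // 96-bit IV, the usual choice for GCM
    private static final int TAG_BITS = 128; // GCM authentication tag length

    static SecretKey newAesKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts a block; the output layout is IV followed by ciphertext-with-tag. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            RNG.nextBytes(iv); // fresh IV per encryption
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plaintext);
            byte[] out = new byte[IV_LEN + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts IV-prefixed data; a wrong key fails the GCM tag check. */
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_LEN));
            return c.doFinal(ivAndCiphertext, IV_LEN, ivAndCiphertext.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] wrap(SecretKey master, SecretKey dataKey) {
        return encrypt(master, dataKey.getEncoded());
    }

    static SecretKey unwrap(SecretKey master, byte[] wrapped) {
        return new SecretKeySpec(decrypt(master, wrapped), "AES");
    }

    /** Rotation: re-wrap the data key under a new master key; encrypted blocks are not rewritten. */
    static byte[] rewrap(SecretKey oldMaster, SecretKey newMaster, byte[] wrapped) {
        return wrap(newMaster, unwrap(oldMaster, wrapped));
    }
}
```

Rotating the master key with rewrap changes only the small wrapped key. The encrypted blocks are left as they are and still decrypt with the recovered data key.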

Per-Cell ACLs (HBASE-7662)

• Extends the AccessController with support for persisting and checking ACL data in cell tags

• Uses existing API facilities to transmit per-cell ACLs

• Backward compatible with existing installs and code

• We treat ACLs on a cell as scoped only to that cell, for straightforward policy evolution

• All mutations must have covering permission in a dominating grant

Visibility Labels (HBASE-7663)

• Introduces a new VisibilityController coprocessor

• Introduces per-cell visibility expressions, client API extensions for setting visibility and authorizations, and new shell commands for label management

• The maximal set of labels for a user is defined with the new shell command ‘set_auths’ or the equivalent admin API

• Users specify visibility expressions on cells

• Users submit authorizations on Gets and Scans

• The effective label set for the request is built in the RPC context from the submitted authorizations; those not in the maximal set are dropped

– How this is done is pluggable, e.g. integration with enterprise identity management solutions

• Scan results are filtered with (label) set membership tests

• Visibility expressions

– Labels: arbitrary strings (converted into ordinals with an internal dictionary)

– Expressions: labels joined in boolean expressions

– Operators: &, |, !

– Parentheses for precedence

– Examples:

secret

secret | topsecret

( secret | topsecret ) & !probationary
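
A small recursive-descent evaluator shows how these expression rules work. In this Java sketch, the effective label set is the user's requested authorizations intersected with the maximal set. An expression is visible when it evaluates to true against that set. The class and method names and the allowed label characters are assumptions. Giving & higher precedence than | follows ordinary boolean convention rather than anything stated on the slide, which parenthesizes explicitly. This is not the VisibilityController's implementation, which works on label ordinals.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical evaluator for visibility expressions over &, |, ! and parentheses. */
final class VisibilityExpressionSketch {
    private final String src;
    private final Set<String> auths;
    private int pos;

    private VisibilityExpressionSketch(String src, Set<String> auths) {
        this.src = src;
        this.auths = auths;
    }

    /** Effective labels: requested authorizations not in the user's maximal set are dropped. */
    static Set<String> effectiveAuths(Set<String> requested, Set<String> maximal) {
        Set<String> effective = new HashSet<>(requested);
        effective.retainAll(maximal);
        return effective;
    }

    /** True if a cell with this expression is visible under the given effective labels. */
    static boolean isVisible(String expression, Set<String> auths) {
        VisibilityExpressionSketch p = new VisibilityExpressionSketch(expression, auths);
        boolean result = p.parseOr();
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return result;
    }

    // or := and ('|' and)*
    private boolean parseOr() {
        boolean value = parseAnd();
        while (accept('|')) {
            boolean rhs = parseAnd(); // always consume the right operand
            value = value || rhs;
        }
        return value;
    }

    // and := not ('&' not)*
    private boolean parseAnd() {
        boolean value = parseNot();
        while (accept('&')) {
            boolean rhs = parseNot(); // always consume the right operand
            value = value && rhs;
        }
        return value;
    }

    // not := '!' not | '(' or ')' | label
    private boolean parseNot() {
        if (accept('!')) {
            return !parseNot();
        }
        if (accept('(')) {
            boolean value = parseOr();
            if (!accept(')')) {
                throw new IllegalArgumentException("Expected ')' at " + pos);
            }
            return value;
        }
        return auths.contains(parseLabel()); // set membership test
    }

    private String parseLabel() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && isLabelChar(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected label at " + pos);
        }
        return src.substring(start, pos);
    }

    // Assumed label character set for this sketch.
    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':';
    }

    private boolean accept(char expected) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }
}
```

Suppose a user requests {secret, topsecret} and the maximal set is {secret, probationary}. The effective set is then {secret}, so all three example expressions are visible. With probationary also in the effective set, the third expression would be hidden.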

Improved WAL Write Throughput (HBASE-8755)

• Introduces a new threading model for WAL writes that reduces lock contention

• Provides better write throughput under load

– A ~15% improvement in write ops/sec at high write concurrency

• Lays groundwork for multiple WALs

– Will provide further write throughput increases

– Also important for limiting the impact of encrypting WAL entries
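
One general way to take lock contention off the write path is group commit. Handler threads only enqueue edits, and a single background thread appends and syncs whatever has accumulated, so many edits share one sync. The Java sketch below is a simplified, hypothetical illustration of that pattern. It is not HBase's FSHLog code, the sync is simulated with a counter, and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical group-commit sketch; not HBase's WAL implementation. */
final class GroupCommitWalSketch implements AutoCloseable {
    private static final class Pending {
        final String edit;
        final CompletableFuture<Void> synced = new CompletableFuture<>();
        Pending(String edit) { this.edit = edit; }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final List<String> log = new ArrayList<>();      // stands in for the WAL file; syncer thread only
    private final AtomicInteger syncs = new AtomicInteger(); // number of simulated sync calls
    private final Thread syncer = new Thread(this::runSyncer, "wal-syncer");
    private volatile boolean running = true;

    GroupCommitWalSketch() {
        syncer.start();
    }

    /** Called concurrently by handler threads; the future completes once the edit is "synced". */
    CompletableFuture<Void> append(String edit) {
        Pending p = new Pending(edit);
        queue.add(p); // no shared write lock: handlers only enqueue
        return p.synced;
    }

    private void runSyncer() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (running || !queue.isEmpty()) {
                Pending first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue;
                }
                batch.add(first);
                queue.drainTo(batch);                    // grab everything queued meanwhile
                for (Pending p : batch) log.add(p.edit); // append the whole batch
                syncs.incrementAndGet();                 // one simulated sync covers the batch
                for (Pending p : batch) p.synced.complete(null);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int syncCount() {
        return syncs.get();
    }

    /** Safe to call after close(), once the syncer thread has been joined. */
    int loggedEdits() {
        return log.size();
    }

    @Override
    public void close() {
        running = false;
        try {
            syncer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

When several appends arrive during one simulated sync, syncCount() ends up below the number of edits. This sharing of syncs is the point of the sketch, not any particular queue implementation.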

Stripe Compactions (HBASE-7667)

• Stripe compactions split the data inside the region by row key and create sub-ranges of data

• The sub-ranges are compacted independently

• Depending on ingest and access patterns, using stripe compactions can reduce read latency variability and reduce compaction data volume (write amplification)

• Two use cases in particular may benefit:

1. Approximately uniform keys and large regions

2. Non-uniform data with sequential row keys (e.g. log data)

• Can be complex to configure and tune; consult the documentation for details
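
Partitioning rows into stripes can be sketched as a binary search over sorted boundaries. In this hypothetical Java model, stripe 0 holds rows below the first boundary, each later stripe starts at its boundary, and the last stripe is open-ended. This is not HBase's stripe store code. In the model, each group of rows could be compacted without touching the others.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Conceptual model of stripe boundaries inside one region; not HBase's stripe store code. */
final class StripeSketch {
    /**
     * Boundaries are sorted ascending: stripe 0 covers rows below boundaries[0],
     * stripe i covers [boundaries[i-1], boundaries[i]), and the last stripe is open-ended.
     */
    static int stripeIndex(String rowKey, List<String> boundaries) {
        int idx = Collections.binarySearch(boundaries, rowKey);
        return idx >= 0 ? idx + 1 : -(idx + 1); // an exact match starts the next stripe
    }

    /** Groups rows by stripe; each group could be compacted independently. */
    static Map<Integer, List<String>> groupByStripe(List<String> rows, List<String> boundaries) {
        Map<Integer, List<String>> stripes = new TreeMap<>();
        for (String row : rows) {
            stripes.computeIfAbsent(stripeIndex(row, boundaries), k -> new ArrayList<>()).add(row);
        }
        return stripes;
    }
}
```

With boundaries g and n, the rows apple, kiwi, zebra, and banana fall into stripes 0, 1, 2, and 0. For log-style data with sequential keys, new writes keep landing in the last stripe. That is one intuition for why the older stripes rarely need to be rewritten.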

MapReduce Over Snapshots (HBASE-8369)

• Introduces MapReduce utilities supporting MR jobs over snapshots of table data

• Similar to TableInputFormat, but instead of running over an online table using the HBase API, it runs directly over HFiles on disk collected from a table snapshot

• For performance-dominant use cases where the HBase API cannot provide sufficient throughput

– Can increase throughput of bulk scanning ~5x by streaming HDFS reads directly to the client

• Caveat: Not recommended from a security perspective

– Built in access control is completely bypassed

– It is a risk to open direct access to HFile data in HDFS

REST Streaming Scans (HBASE-9343)

• The REST gateway provides stateful scanners to be consistent with the HBase API, but this is not RESTful

– Scanner state is not shared across multiple gateways

– Scanner state will be lost if the gateway fails

• Introduces a new scanning mode to the REST API for stateless scanning

• The client manages paging and limits

• Instead of forcing results to be batched into multiple HTTP transactions as they come back from the RegionServers, the stateless scanner can stream all results back to the client over one HTTP connection
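
Client-managed paging can be modeled in a few lines of Java. The fetchPage method stands in for one stateless request. The start position and limit arrive as parameters and nothing is remembered between calls, so any gateway could serve any page. This is a hypothetical in-memory model. It does not use the REST gateway's actual URL parameters, and it illustrates paging rather than single-connection streaming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical model of client-managed paging against a stateless scan endpoint. */
final class StatelessPagingSketch {
    /**
     * Stands in for one stateless request: all state arrives as parameters.
     * afterRow == null means start at the first row; otherwise start just after afterRow.
     */
    static List<String> fetchPage(NavigableMap<String, String> table, String afterRow, int limit) {
        List<String> page = new ArrayList<>(limit);
        NavigableMap<String, String> view = afterRow == null ? table : table.tailMap(afterRow, false);
        for (String row : view.keySet()) {
            if (page.size() == limit) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    /** Client-side loop: the client, not the gateway, remembers where the next page starts. */
    static List<List<String>> scanAll(NavigableMap<String, String> table, int limit) {
        List<List<String>> pages = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = fetchPage(table, cursor, limit);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            cursor = page.get(page.size() - 1);
            if (page.size() < limit) {
                break; // a short page means the range is exhausted
            }
        }
        return pages;
    }
}
```

With 10 rows and a limit of 3, scanAll issues four requests and returns pages of 3, 3, 3, and 1 rows.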

Going Forward with branch-0.98

• Bug fixes

• Performance improvements

• Further deprecations and API changes on the way to HBase 1.0

– No more breaking binary API changes allowed

• Tag compression in HFile (HBASE-10451)

• Performance improvements for encryption

– Per family WAL encryption (HBASE-10077)

– Optional native accelerated cryptographic functions

End

Questions?