Overview of HDFS Transparent Encryption

Post on 15-Jul-2015

Page 1: Overview of HDFS Transparent Encryption

© Cloudera, Inc. All rights reserved.

Charles Lamb

HDFS Transparent Encryption

SFHUG

Overview

• Done under open source (HDFS-6134)

• Data read from and written to certain directories is transparently encrypted

• No changes to user code

• Encryption/decryption always done by client

• HDFS never handles unencrypted data or unencrypted keys

• Helps applications be regulation-compliant (HIPAA, PCI DSS, FISMA, etc.)

Background

• Encryption can happen at any of several levels:

• Application: most secure and flexible, but hardest to do

• Adding encryption to legacy applications may be difficult

• Database: most DBMSs have this, but may incur performance penalties

• Secondary indices cannot generally be encrypted

• Filesystem: high performance, transparent, but may not be flexible enough

• Multi-tenancy vs per-user encryption policies

• Disk: high performance but only really protects against physical theft

• HDFS encryption is somewhere between Filesystem and Database level

Design Goals

• Performance and scalability

• Transparent to applications, including legacy apps

• End-to-end

• Data should be encrypted on the network and ‘at-rest’

• Compartmentalization

• Key management independent of HDFS management

• Includes preventing HDFS admins and root users from accessing sensitive data

• Compatibility with HDFS access methods: WebHDFS, HttpFS, FUSE, NFS, hftp, har, etc.

Architectural Concepts

• Key Management Server

• Encryption Zones

• Keys

Key Management Server

Key Management Server (KMS)

• KMS sits between client and key server

• E.g. Cloudera Navigator Key Trustee

• Provides a unified API and scalability

• REST API

• Does not actually store keys (backend does that), but does cache them

• ACLs on per-key basis

Encryption Zones

• An HDFS directory in which the contents (including subdirs) are encrypted on write and decrypted on read.

• An EZ begins life as an empty directory

• Renames in/out of an EZ are prohibited

• Encryption is transparent to application with no code changes

Keys

• Every Encryption Zone has a key (“EZ Key”)

• Every file in an Encryption Zone has a unique key (“Data Encryption Key” or “DEK”)

• The HDFS NameNode stores the name of the EZ Key in an Xattr of the EZ Dir

• The actual EZ Key is stored in the Key Server

• The NameNode stores the DEK in an Xattr of the file, but only in encrypted form

• Encrypted Data Encryption Key, or “EDEK”

• The NameNode never touches decrypted data or decrypted keys
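The key hierarchy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KeyServer`, `NameNode`, `client_crypt`, and the key and path names are hypothetical, the toy SHA-256 keystream stands in for AES-CTR and is not secure, and having the NameNode request the EDEK at file creation is an assumption about the flow rather than something stated on the slide.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy counter-mode cipher (stand-in for AES-CTR, NOT secure).

    XORs data with SHA-256(key || iv || counter) blocks, so applying it
    twice with the same key and IV returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class KeyServer:
    """Hypothetical stand-in for the KMS and its backing key store."""

    def __init__(self):
        self._ez_keys = {}  # EZ key name -> EZ key material (never leaves here)

    def create_key(self, name):
        self._ez_keys[name] = os.urandom(16)

    def generate_edek(self, name):
        # Create a fresh DEK and return it only in encrypted form (EDEK).
        dek, iv = os.urandom(16), os.urandom(16)
        return iv, keystream_xor(self._ez_keys[name], iv, dek)

    def decrypt_edek(self, name, iv, edek):
        return keystream_xor(self._ez_keys[name], iv, edek)


class NameNode:
    """Stores only EZ key names and EDEKs, mirroring the xattrs above."""

    def __init__(self):
        self.zone_key_name = {}  # EZ directory -> EZ key name
        self.file_edek = {}      # file path -> (EZ key name, IV, EDEK)

    def create_file(self, kms, zone, path):
        key_name = self.zone_key_name[zone]
        self.file_edek[path] = (key_name, *kms.generate_edek(key_name))


def client_crypt(kms, namenode, path, data):
    """Client side: fetch the EDEK, have the key server decrypt it, apply the DEK."""
    key_name, iv, edek = namenode.file_edek[path]
    dek = kms.decrypt_edek(key_name, iv, edek)
    return keystream_xor(dek, iv, data)  # reusing the IV is a sketch simplification


kms = KeyServer()
kms.create_key("ez-key")
nn = NameNode()
nn.zone_key_name["/secure"] = "ez-key"
nn.create_file(kms, "/secure", "/secure/report.csv")
ciphertext = client_crypt(kms, nn, "/secure/report.csv", b"patient,diagnosis")
plaintext = client_crypt(kms, nn, "/secure/report.csv", ciphertext)
```

In this sketch the `NameNode` object never holds a plaintext DEK or the EZ key; only the client and the key server do.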

EZ Keys, Data Encryption Keys, and Encrypted Data Encryption Keys

Key Handling

Design

• End-to-end encryption

• Encryption occurs on the client and decrypted data is never touched by HDFS

• Protects against network sniffing, evil HDFS admins, and hard drive theft

• HDFS never touches plaintext key material (DEKs or EZ keys)

• Compromising an HDFS daemon is not a viable attack vector

• HDFS handles encrypted Keys (EDEKs), but never in decrypted form (DEKs)

• Key permissions are handled by the KMS ACLs

• Each file is encrypted with a unique DEK

HDFS Encryption Configuration

• hadoop key create <keyname>

• hdfs dfs -mkdir <path>

• hdfs crypto -createZone -keyName <keyname> -path <path>
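The three steps above can be scripted. The following is a minimal sketch that only builds and prints the command lines; the function name `encryption_zone_commands`, the key name `mykey`, and the path `/secure` are hypothetical placeholders, and the subcommands and flags are the ones listed on the slide.

```python
import shlex


def encryption_zone_commands(key_name, path):
    """Return the argv lists for creating a key, a directory, and an encryption zone."""
    return [
        # 1. Create the EZ key in the key server (via the KMS).
        ["hadoop", "key", "create", key_name],
        # 2. Create the (empty) directory that will become the zone.
        ["hdfs", "dfs", "-mkdir", path],
        # 3. Turn the empty directory into an encryption zone using that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
    ]


# Print the commands instead of running them; on a real cluster each argv
# list could be passed to subprocess.run(argv, check=True).
for argv in encryption_zone_commands("mykey", "/secure"):
    print(shlex.join(argv))
```

The order matters in this sketch because the key must exist, and the directory must be empty, before the zone is created.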

KMS Per-User ACL Configuration

• White lists (check for inclusion) and black lists (check for exclusion)

• etc/hadoop/kms-acls.xml

• hadoop.kms.acl.CREATE

• hadoop.kms.blacklist.CREATE

• … DELETE, ROLLOVER, GET, GET_KEYS, GET_METADATA, GENERATE_EEK, DECRYPT_EEK
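The interaction between a white list and a black list can be sketched as follows. This is not the KMS implementation; it assumes the usual Hadoop ACL string format (comma-separated users, a space, then comma-separated groups, with `*` matching everyone), and the function names, user names, and group names are hypothetical.

```python
def parse_acl(value):
    """Parse 'user1,user2 group1,group2'; return None for the '*' wildcard."""
    if value.strip() == "*":
        return None
    users_part, _, groups_part = value.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups


def matches(acl, user, groups):
    """True if the user, or any of the user's groups, appears in the ACL."""
    if acl is None:
        return True
    acl_users, acl_groups = acl
    return user in acl_users or bool(acl_groups & set(groups))


def is_allowed(user, groups, whitelist, blacklist=""):
    """Allow only if the white list includes the caller and the black list does not."""
    if matches(parse_acl(blacklist), user, groups):
        return False
    return matches(parse_acl(whitelist), user, groups)


# Example values as they might appear for hadoop.kms.acl.CREATE and
# hadoop.kms.blacklist.CREATE (hypothetical users and groups).
create_acl = "alice,bob keyadmins"
create_blacklist = "bob"
print(is_allowed("alice", [], create_acl, create_blacklist))
print(is_allowed("bob", [], create_acl, create_blacklist))
```

In this sketch the black list always wins, so a user named in both lists is denied.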

KMS Per-Key ACL Configuration

• etc/hadoop/kms-acls.xml

• key.acl.<keyname>.<operation>

• MANAGEMENT – createKey, deleteKey, rolloverNewVersion

• GENERATE_EEK – generateEncryptedKey, warmUpEncryptedKeys

• DECRYPT_EEK – decryptEncryptedKey

• READ – getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata, getCurrentKey

• ALL – all of the above
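The grouping of KMS calls into per-key ACL types can be written down as a lookup table. The sketch below copies the groups from the list above; the names `KEY_ACL_OPERATIONS` and `acl_types_for` are hypothetical and are not part of any Hadoop API.

```python
# Per-key ACL types and the KMS calls each one covers, as listed above.
KEY_ACL_OPERATIONS = {
    "MANAGEMENT": {"createKey", "deleteKey", "rolloverNewVersion"},
    "GENERATE_EEK": {"generateEncryptedKey", "warmUpEncryptedKeys"},
    "DECRYPT_EEK": {"decryptEncryptedKey"},
    "READ": {
        "getKeyVersion",
        "getKeyVersions",
        "getMetadata",
        "getKeysMetadata",
        "getCurrentKey",
    },
}


def acl_types_for(call):
    """Return the per-key ACL types that would authorize the given KMS call."""
    types = [t for t, calls in KEY_ACL_OPERATIONS.items() if call in calls]
    if not types:
        raise ValueError(f"unknown KMS call: {call}")
    # ALL covers every call in the groups above.
    return types + ["ALL"]


print(acl_types_for("decryptEncryptedKey"))
```

A client that only reads encrypted files would, under this grouping, need DECRYPT_EEK on the zone key but not MANAGEMENT.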

Performance

• AES-CTR, 128 or 256 (with unlimited strength JCE installed)

• AES-NI available

• Negligible overhead on writes and 7.5% impact on reads for datasets larger than memory

DistCp

• Encryption Zone to Encryption Zone

• use -update -skipcrccheck

• Admins use special /.reserved/raw path prefix

• /.reserved/raw is only available to the HDFS superuser and provides the encrypted contents
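The two DistCp cases above can be sketched as command builders. This is only an illustration: the function names and the example paths are hypothetical, and the flags are limited to the ones named on the slide.

```python
def distcp_between_zones(src, dst):
    """Copy through the normal paths: data is decrypted on read and re-encrypted
    on write, so the slide's flags skip the checksum comparison."""
    return ["hadoop", "distcp", "-update", "-skipcrccheck", src, dst]


def raw_path(path):
    """Map an absolute HDFS path to its /.reserved/raw form, which exposes the
    stored (encrypted) bytes."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return "/.reserved/raw" + path


print(" ".join(distcp_between_zones("/zone_a/data", "/zone_b/data")))
print(raw_path("/zone_a/data"))
```

The raw form is the one an administrator would pass to DistCp when the goal is to move encrypted bytes without ever decrypting them.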

Exceptions

• Hive: may not be able to do a query that combines data from more than one encryption zone

HDFS Encryption - Summary

• Good performance (4-10% hit)

• No mods to existing applications

• Prevents attacks at the filesystem and below

• OS and filesystem only see encrypted bytes

• Data is encrypted all the way to the client

• Secure ‘at rest’ and in transit

• Key management is independent of HDFS

• Key admin != HDFS admin

• Can prevent HDFS admin from accessing secure data

Questions