Hardening Hadoop for Healthcare with Project Rhino

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Secure Hadoop as a Service Vin Sharma, Intel March 26, 2014


DESCRIPTION

Big data analytics is estimated to save over $450B in healthcare costs, and adoption of big data platforms among healthcare payers and providers is accelerating. Hadoop on cloud infrastructure, delivered as Hadoop as a service, has emerged as one of the most promising ways to run healthcare workloads at scale in production. Common concerns in the healthcare industry include privacy, data security, and the challenges of regulatory compliance with HIPAA and HITECH. Intel is contributing a common security framework for Apache Hadoop, Project Rhino, which enables enterprises to deploy big data analytics without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze healthcare data with technical safeguards that help you remain in compliance.

TRANSCRIPT

Page 1: Hardening Hadoop for Healthcare with Project Rhino

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Secure Hadoop as a Service

Vin Sharma, Intel

March 26, 2014

Page 2: Hardening Hadoop for Healthcare with Project Rhino

Who needs Hadoop security?

Page 3: Hardening Hadoop for Healthcare with Project Rhino

Big Data Analytics in Health and Life Sciences

Now: disparate streams of data. Next: integrated computing and data.

[Diagram: data streams (genomics; clinical; claims & transactions; meds & labs; patient experience; personal data) feeding Clinical Analysis and Genomic Analysis]

Better decisions and outcomes at reduced cost. From population-based to person-based treatment.

Page 4: Hardening Hadoop for Healthcare with Project Rhino

Cost Savings via Big Data Analytics

[Diagram: stakeholders: Provider, Patient, Payer, Producer, Regulator]

Savings levers:

• Personalized medicine
• Data-driven adherence
• Proven pathways of care
• Coordinated care across providers
• Shifting volume to the right setting
• Reducing ER (re)admit rates
• Provider performance transparency and payment innovation
• Accelerated approval
• Accelerated discovery

[Chart: estimated savings of $180B, $100B, $100B, and $70B]

Page 5: Hardening Hadoop for Healthcare with Project Rhino

Compliance Requirements

• HIPAA
  – Privacy Rule
  – Security Rule
    • Administrative Safeguards
    • Physical Safeguards
    • Technical Safeguards
• Others…

Page 6: Hardening Hadoop for Healthcare with Project Rhino

Technical Safeguards

Access Control: A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI).

Audit Controls: A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI.

Integrity Controls: A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed.

Transmission Security: A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.

Page 7: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Page 8: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components of a typical Hadoop stack:

[Diagram: Sqoop, Flume, ZooKeeper, Pig, Hive (HiveQL), HBase with coprocessors, Mahout, Oozie, Giraph, HCatalog, and R connectors, running on YARN (MRv2) over HDFS 2.0]

Page 9: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components sharing an authentication framework:

[Diagram: the same Hadoop stack, with metadata (HCatalog) and data flow paths annotated]

Page 10: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components capable of access control:

[Diagram: the same Hadoop stack with the relevant components highlighted]

Page 11: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components capable of admission control:

[Diagram: the same Hadoop stack with the relevant components highlighted]

Page 12: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components capable of (transparent) encryption:

[Diagram: the same Hadoop stack with the relevant components highlighted]

Page 13: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components sharing a common policy engine:

[Diagram: the same Hadoop stack with the relevant components highlighted]

Page 14: Hardening Hadoop for Healthcare with Project Rhino

Hadoop Security Challenges

Components sharing a common audit log format:

[Diagram: the same Hadoop stack, with metadata (HCatalog) and data mining (Mahout) annotated]

Page 15: Hardening Hadoop for Healthcare with Project Rhino

Hardening Hadoop from within

Page 16: Hardening Hadoop for Healthcare with Project Rhino

Project Rhino

• Encryption and key management
• Role-based access control
• Common authorization
• Consistent auditing

Page 17: Hardening Hadoop for Healthcare with Project Rhino

Deliver defense in depth

[Diagram: layered defenses: firewall, gateway, authentication (AuthN), authorization (AuthZ), encryption, audit & alerts, isolation]

Page 18: Hardening Hadoop for Healthcare with Project Rhino

Protect Hadoop APIs

• Enforces consistent security policies across all Hadoop services
• Serves as a trusted proxy to the Hadoop, HBase, and WebHDFS APIs
• Common Criteria EAL4+, HSM, and FIPS 140-2 certified
• Deploys as software, a virtual appliance, or a hardware appliance
• Available on AWS Marketplace

[Diagram: gateway fronting the HCatalog, Stargate, and WebHDFS APIs]

Page 19: Hardening Hadoop for Healthcare with Project Rhino

Provide role-based access control

• File-, table-, and cell-level access control in HBase (AuthZ)
• JIRA HBASE-6222: add per-KeyValue security

[Diagram: HBase _acl_ table]
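The per-cell model above can be sketched in plain Java: a minimal, illustrative ACL check with in-memory maps standing in for HBase's ACL metadata. None of these class or method names are HBase API; this only shows the idea of attaching a reader list to each cell.

```java
import java.util.*;

// Illustrative per-cell access control, loosely analogous to HBase
// cell-level ACLs (HBASE-6222). Not HBase API; a stand-alone sketch.
public class CellAclDemo {
    // cell key -> users allowed to read that cell
    static Map<String, Set<String>> acl = new HashMap<>();
    // cell key -> stored value
    static Map<String, String> cells = new HashMap<>();

    static void put(String cell, String value, String... readers) {
        cells.put(cell, value);
        acl.put(cell, new HashSet<>(Arrays.asList(readers)));
    }

    // A read succeeds only if the user appears in the cell's ACL
    static Optional<String> get(String user, String cell) {
        return acl.getOrDefault(cell, Set.of()).contains(user)
                ? Optional.ofNullable(cells.get(cell))
                : Optional.empty();
    }

    public static void main(String[] args) {
        put("patient1:labs", "HbA1c=6.1", "dr_smith");
        System.out.println("dr_smith sees: " + get("dr_smith", "patient1:labs").orElse("DENIED"));
        System.out.println("billing sees: " + get("billing", "patient1:labs").orElse("DENIED"));
    }
}
```

The point of pushing the check down to the cell, rather than the file or table, is that a single HBase row can mix data with different sensitivity (for example labs vs. billing fields) under one key.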

Page 20: Hardening Hadoop for Healthcare with Project Rhino

Provide encryption for data at rest

[Diagram: MapReduce pipeline over HDFS: RecordReader (decrypt) → Map → Combiner → Partitioner → local merge & sort (encrypt/decrypt derivatives) → Reduce → RecordWriter (encrypt)]

• Extends the compression codec into a crypto codec
• Provides an abstract API for general use
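The codec approach on this slide, encryption applied where a stream is written or read so the surrounding job logic stays unchanged, can be sketched with stock javax.crypto stream wrappers. This is plain JCE, not the Hadoop crypto codec API, but it shows the same shape: the writer and reader only ever see an OutputStream/InputStream.

```java
import javax.crypto.*;
import javax.crypto.spec.IvParameterSpec;
import java.io.*;
import java.security.SecureRandom;
import java.util.Arrays;

// Encryption as a stream wrapper: callers write and read plaintext,
// the wrapper handles AES-CTR transparently (mirroring the codec idea).
public class StreamCryptoSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        byte[] record = "e-PHI record: glucose=98".getBytes("UTF-8");

        // Encrypt on write: the writer only sees an OutputStream
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(sink, enc)) {
            out.write(record);
        }

        // Decrypt on read: the reader only sees an InputStream
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] back;
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(sink.toByteArray()), dec)) {
            back = in.readAllBytes();
        }
        System.out.println("round-trip ok: " + Arrays.equals(record, back));
    }
}
```

Because the wrapper composes like any other stream filter, the same pattern slots in at RecordReader/RecordWriter boundaries without touching map or reduce code.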

Page 21: Hardening Hadoop for Healthcare with Project Rhino

Provide encryption for data at rest

• HBase: transparent table/CF encryption (HBASE-7544)

Page 22: Hardening Hadoop for Healthcare with Project Rhino

Pig & Hive Encryption

• Pig Encryption Capabilities– Support of text file and Avro* file format

– Intermediate job output file protection

– Pluggable key retrieving and key resolving

– Protection of key distribution in cluster

• Hive Encryption Capabilities– Support of RC file and Avro file format

– Intermediate and final output data encryption

– Encryption is transparent to end user without changing existing SQL

Page 23: Hardening Hadoop for Healthcare with Project Rhino

Crypto Codec Framework

• Extends the compression codec
• Establishes a common API-level abstraction that can be shared by all crypto codec implementations

  CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
  CryptoContext cryptoContext = new CryptoContext();
  ...
  cryptoCodec.setCryptoContext(cryptoContext);
  CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
  ...

• Provides a foundation for other components in Hadoop* such as MapReduce or HBase* to support encryption features

Page 24: Hardening Hadoop for Healthcare with Project Rhino

Key Distribution

• Enabling the crypto codec in a MapReduce job
• Enabling different key storage or management systems
• Allowing different stages and files to use different keys
• An API to integrate with external key management systems
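The pluggable key retrieval described above can be sketched as a small interface with an in-memory stand-in for an external key management system. The interface and names here are illustrative, not Project Rhino's actual API; the point is that jobs depend only on the interface, so the backing store (KMS, Java keystore, HSM) can vary, and different files or stages can resolve different keys.

```java
import java.util.*;

// Sketch of pluggable key resolution: one interface, swappable backends.
public class KeyResolverSketch {
    interface KeyProvider {
        byte[] keyFor(String keyName);   // e.g. one key per file or job stage
    }

    // In-memory stand-in for an external key management system
    static class InMemoryProvider implements KeyProvider {
        private final Map<String, byte[]> keys = new HashMap<>();
        void register(String name, byte[] key) { keys.put(name, key); }
        public byte[] keyFor(String name) {
            byte[] k = keys.get(name);
            if (k == null) throw new NoSuchElementException(name);
            return k;
        }
    }

    public static void main(String[] args) {
        InMemoryProvider kms = new InMemoryProvider();
        kms.register("claims-2014.enc", new byte[]{1, 2, 3});
        kms.register("labs-2014.enc",   new byte[]{4, 5, 6});

        KeyProvider provider = kms;  // jobs see only the interface
        System.out.println("claims key bytes: " + provider.keyFor("claims-2014.enc").length);
        System.out.println("labs key bytes: " + provider.keyFor("labs-2014.enc").length);
    }
}
```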

Page 25: Hardening Hadoop for Healthcare with Project Rhino

Crypto Software Optimization

Multi-buffer crypto:

• Processes multiple independent data buffers in parallel
• Improves cryptographic performance by up to 2-9x
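The multi-buffer idea, many independent buffers in flight at once, can be approximated at the thread level. Real multi-buffer implementations interleave work inside the CPU's SIMD units; this sketch only parallelizes across cores with standard JCE, as a rough analogy (ECB per buffer is used purely to keep the demo self-contained, not as a recommended mode).

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Independent buffers encrypted in parallel: a thread-level analogy
// for instruction-level multi-buffer crypto.
public class MultiBufferSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        List<byte[]> buffers = new ArrayList<>();
        for (int i = 0; i < 8; i++) buffers.add(("record-" + i).getBytes());

        // Each buffer is independent, so the work parallelizes cleanly
        List<byte[]> encrypted = buffers.parallelStream().map(b -> {
            try {
                Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // demo only
                c.init(Cipher.ENCRYPT_MODE, key);
                return c.doFinal(b);
            } catch (Exception e) { throw new RuntimeException(e); }
        }).collect(Collectors.toList());

        System.out.println("encrypted buffers: " + encrypted.size());
    }
}
```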

Page 26: Hardening Hadoop for Healthcare with Project Rhino

Intel® Data Protection Technology

AES-NI

• Processor assistance for performing AES encryption
• Makes enabled encryption software faster and stronger

Secure Key (DRNG)

• Processor-based true random number generator
• More secure, standards-compliant, high performance

Data in motion: secure transactions used pervasively in e-commerce, banking, etc.

Data in process: most enterprise and cloud applications offer encryption options to secure information and protect confidentiality.

Data at rest: full-disk encryption software protects data while it is saved to disk.

AES-NI: Advanced Encryption Standard New Instructions. Secure Key: previously known as the Intel Digital Random Number Generator (DRNG).
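From Java, both features are reachable indirectly: on supporting CPUs the HotSpot JIT can compile javax.crypto AES to AES-NI instructions, and SecureRandom can draw on hardware-seeded entropy through the OS. A minimal sketch of the two building blocks:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

// The two hardware features in software terms: AES encryption (which
// HotSpot may accelerate via AES-NI intrinsics) and OS/hardware-backed
// randomness via SecureRandom.
public class HwCryptoSketch {
    public static void main(String[] args) throws Exception {
        SecureRandom rng = new SecureRandom();  // may be seeded from RDRAND/RDSEED via the OS
        byte[] keyBytes = new byte[16];
        rng.nextBytes(keyBytes);

        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
        byte[] block = new byte[16];            // AES operates on 16-byte blocks
        byte[] ct = aes.doFinal(block);
        System.out.println("ciphertext blocks: " + ct.length / 16);
    }
}
```

Whether the AES intrinsics actually engage depends on the CPU and JVM flags; the application code is identical either way, which is the point of hardware-assisted crypto.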

Page 27: Hardening Hadoop for Healthcare with Project Rhino

Intel® AES-NI Accelerated Encryption

[Chart: relative speed of crypto functions (higher is better; based on Intel tests). Encryption/decryption without Intel® AES-NI vs. with Intel® AES-NI: 18.2x/19.8x; with Intel® AES-NI multi-buffer: 5.3x/19.8x. Up to 20x faster crypto.]

AES-NI: Advanced Encryption Standard New Instructions

Page 28: Hardening Hadoop for Healthcare with Project Rhino

Cloud Platform for Secure Hadoop

Intel® Xeon® processors:

• E7 family
• E5 family
• E3 family

Amazon:

• EC2 Reserved Instances
• EC2 Dedicated Instances

Page 29: Hardening Hadoop for Healthcare with Project Rhino

Amazon EC2 Instances with AES-NI

[Table of instance types; 20 more at aws.amazon.com/ec2/instance-types]

Page 30: Hardening Hadoop for Healthcare with Project Rhino

Resources

Page 31: Hardening Hadoop for Healthcare with Project Rhino

For more information

• intel.com/bigdata

• intel.com/healthcare/bigdata

• github.com/intel-hadoop/project-rhino/

• aws.amazon.com/compliance/

• aws.amazon.com/ec2/instance-types/

Page 32: Hardening Hadoop for Healthcare with Project Rhino


Secure Hadoop as a Service

Vin Sharma, Intel

March 26, 2014

Thank you!