Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Upload: dataworks-summit

Post on 21-Jan-2018

TRANSCRIPT

Page 1: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hortonworks Inc.: Ali Bajwa, Partner Solutions; Srikanth Venkat, Product Management
Talend Inc.: Laurent Bride, CTO
Arcadia Data: Shant Hovsepian, CTO
Protegrity: Sunil Sabat, Director, Partner Solutions

DataWorks Summit - San Jose June 2017

Partner Ecosystem Showcase For Apache Ranger And Apache Atlas

Page 2: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Agenda

Apache Ranger & Apache Atlas

Journey, Ecosystem & Partners

Hortonworks Partner Certification Program

SEC Ready & GOV Ready program

Partner Technology Showcase

Page 3: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Apache Ranger Community Snapshot

May 2014: XASecure acquisition
July 2014: Enters Apache incubation
Nov 2014: Ranger 0.4.0 release
July 2015: Ranger 0.5 / HDP 2.3
Aug 2016: Ranger 0.6 / HDP 2.5
Nov 2016: Ranger 0.6.2 / HDP 2.5.3
Jan 2017: Ranger TLP graduation!
Apr 2017: Ranger 0.7 / HDP 2.6
TBD: 1.0.0 target release date

• Committers: 22

• Contributors from: eBay, MSFT, Huawei, Pandora, Accenture, ING, Talend

Ranger 0.7/HDP 2.6

• Export/import of Policies

• $User and macros

• Plugin status tab

• Support for “show columns” and “describe extended”

• Incremental LDAP Sync

• SmartSense Metrics

Ranger 0.6/HDP2.5

• Classification (tag) based security (ABAC)

• Dynamic Column Masking & Row Filtering

• KMS HSM Integration (Safenet)

• Dynamic Policies & Deny Conditions

• LDAP Improvements & Audit Scalability

Jun 2017: Ranger 0.7.1 / HDP 2.6.1

Page 4: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Apache Ranger: Ecosystem

[Diagram labels: Apache Ranger; Partner Integrations; Apache Kafka; Native Hadoop Service Authorizers; Authorizer Extensions for Non-Hadoop Filesystems & Stores; Azure Data Lake Store (ADLS)* (Future)]

Page 5: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Background: DGI Community becomes Apache Atlas

Dec 2014: DGI* group kickoff
May 2015: Apache Atlas incubation
Aug 2016: HDP 2.5 / Apache 0.7 foundation release
Apr 2017: HDP 2.6 / Apache 0.8 release

Global Financial Company

* DGI: Data Governance Initiative

Apache 0.8/HDP 2.6

• Simplified Search UI

• Simplified APIs

• Classification-based security for HDFS, Kafka, HBase

• Knox SSO

• Performance/scalability improvements

Apache 0.7.1/HDP 2.5.3

• High availability support

• LDAP Authentication/Authorization

• Classification based security for Hive

• UI Redesign

• Committers: 35

• Code contributors from: IBM, Aetna, Merck, Target, JPMC

Page 6: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Apache Atlas: Ecosystem

[Diagram labels: Apache Atlas; Custom Integration; RDBMS; Apache Kafka; Partner; Pending: Partner]

Page 7: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

© 2017 Talend Inc

Talend Studio Jobs Lineage with Apache Atlas

Laurent Bride, CTO, Talend

Page 8: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Agenda

Integration Goals

Design

Technical Details

Demo

Page 9: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Integration Goals

Support lineage of Talend Studio jobs on Apache Atlas / Hortonworks HDP

Similar (or improved) functionality to what we offer for other lineage providers.

Lineage for Talend Big Data jobs on both Spark and Hadoop.

Authentication with Lineage Backend.

Die-on-error: Lineage failure does not affect job execution.

Page 10: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Design

Goal: Support a similar generic lineage model.

Solution:

Send the transformation graph representation with each node as a HashMap of properties.

Translate the graph into the given model in an integration layer.

For the Atlas case it uses the Atlas REST API via atlas-client JAR.

Leave provider-specific lineage functionality open for advanced use

• Future Roadmap items
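The design described on this slide can be sketched roughly as follows. This is a minimal illustration, not Talend's implementation: the node property names (`id`, `name`), the function name and the output shape are assumptions, and the entity dictionaries only imitate an Atlas-style entity rather than calling the atlas-client JAR.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical generic job graph: each node is a plain map of properties,
# and edges are (source_id, target_id) pairs.
Node = Dict[str, Any]
Edge = Tuple[str, str]


def to_atlas_like_entities(nodes: List[Node], edges: List[Edge]) -> List[Dict[str, Any]]:
    """Translate the generic graph into provider-shaped entity dicts.

    The output shape (typeName / attributes) only imitates an Atlas-style
    entity; it is an illustration, not the atlas-client API.
    """
    upstream: Dict[str, List[str]] = {n["id"]: [] for n in nodes}
    downstream: Dict[str, List[str]] = {n["id"]: [] for n in nodes}
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)

    entities = []
    for n in nodes:
        entities.append({
            "typeName": "tComponent",  # type name used in the talk's Atlas model
            "attributes": {
                "qualifiedName": n["id"],
                "name": n.get("name", n["id"]),
                "upstream": upstream[n["id"]],
                "downstream": downstream[n["id"]],
            },
        })
    return entities


# Hypothetical two-component job.
job_nodes = [
    {"id": "job1.tFileInput_1", "name": "tFileInput_1"},
    {"id": "job1.tMap_1", "name": "tMap_1"},
]
job_edges = [("job1.tFileInput_1", "job1.tMap_1")]
entities = to_atlas_like_entities(job_nodes, job_edges)
```

Keeping the job graph as plain property maps means only the translation function has to change when a different lineage provider is targeted.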

Page 11: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Technical Details - Talend Model for Atlas

Note that Lineage view only shows Entities that are in the “DataSet – Process – DataSet” form.

So we had to represent every Component as a DataSet (tComponent) and create artificial components (tArtificialComponent) as a Process so we can show them in the Lineage view.
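A rough sketch of that workaround, under stated assumptions: every Talend component becomes a DataSet-like entity (tComponent), and each connection between two components gets an artificial Process-like entity (tArtificialComponent) so that the chain alternates DataSet, Process, DataSet. Creating one artificial process per connection, and the `inputs`/`outputs` field names, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def to_dataset_process_form(edges: List[Tuple[str, str]]) -> Dict[str, list]:
    """Insert an artificial process between every pair of connected components."""
    dataset_names: List[str] = []
    processes: List[dict] = []
    for src, dst in edges:
        for name in (src, dst):
            if name not in dataset_names:
                dataset_names.append(name)
        processes.append({
            "typeName": "tArtificialComponent",  # plays the Process role
            "name": f"{src}->{dst}",
            "inputs": [src],
            "outputs": [dst],
        })
    datasets = [{"typeName": "tComponent", "name": n} for n in dataset_names]
    return {"datasets": datasets, "processes": processes}


# Hypothetical three-component job: input -> map -> output.
model = to_dataset_process_form([
    ("tFileInput_1", "tMap_1"),
    ("tMap_1", "tFileOutput_1"),
])
```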

Page 12: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Technical Details – Open Issues

The entity connection constraint is our biggest issue.

Breaking changes in the API (atlas-client 0.8, but compatible with 0.7 through a redirect).

Inherited properties are shown even if not assigned. This is not an issue in itself, but because we reuse DataSet we run into cases like this: DataSet has an owner, but an owner does not make sense for a Talend transform.

The Atlas model is flexible but strict at the same time: data is constrained to evolve with the metadata. If we pass new arguments that are not defined in the metadata model, they are ignored.
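The last point can be illustrated with a small sketch. It does not call Atlas; it only mimics the observed behavior under the assumption that a type declares a fixed set of attribute names and anything else is dropped. The attribute names and the `talendVersion` field are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical type definition: the attribute names the metadata model declares.
DECLARED_ATTRIBUTES: Set[str] = {"qualifiedName", "name", "owner"}


def split_attributes(
    attrs: Dict[str, Any], declared: Set[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (attributes that would be stored, names that would be ignored)."""
    kept = {k: v for k, v in attrs.items() if k in declared}
    ignored = sorted(k for k in attrs if k not in declared)
    return kept, ignored


kept, ignored = split_attributes(
    {"qualifiedName": "job1.tMap_1", "name": "tMap_1", "talendVersion": "6.4"},
    DECLARED_ATTRIBUTES,
)
```

Under this reading, a new attribute only survives once the type definition itself has been extended to declare it.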

Page 13: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Demo / Talend Studio side

Page 14: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Demo / How it looks in Apache Atlas

Page 15: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Arcadia Data. Proprietary and Confidential

Securing Visual Analytics for Big Data

with Apache Ranger

Shant Hovsepian, CTO & Co-Founder

@superdupershant

June 14, 2017

Page 16: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Arcadia Visualization Engine

The First Native Visual Analytics Platform for Big Data

WEB-BASED INTERFACE: Drag & drop interface for visual analytics & app workflow
IN-CLUSTER ANALYTICS ENGINE: Scales linearly with cluster for speed and easier management
BIG DATA OS: Distributed execution, data storage, metadata, security

[Diagram labels: Arcadia Analytic Platform (Smart Acceleration™); Drag-and-drop Visual Analytics & Dashboards; Custom Data Applications; On-Premises; Hybrid Cloud; Data Platform; …]

Page 17: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

The Challenge

Page 18: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

What is Apache Ranger?

• Centralized authorization and auditing across Hadoop components

• Access authorization based on resources

• Policy based behavior such as column masking

• Extensible Architecture

Page 19: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

The Value of a Robust Policy Engine

• It’s complicated code to get right

• I am lazy; I don’t want to implement it

• Zero Knowledge Proofs

Page 20: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Native Security Integration

[Diagram labels: Arcadia analytics platform; HDFS]

SINGLE COPY OF DATA TO SECURE
• Reduces footprint of data copies with the same or summarized information
• Single policy definition for access control
• Easier compliance

ENTERPRISE GRADE
• Kerberos, LDAPS/AD, PAM and SAML
• Single sign on for business users
• Role-based access control with delegation

INTEGRATED ROLE-BASED ACCESS
• Use role definitions from Ranger for access at BI tier
• No risk of mismatching policies between data management tier and BI tier

Page 21: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Configuration

• Tight integration with Ranger + Ambari makes installation and configuration very easy!

Page 22: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Arcadia Data OLAP Engine

• In order to accelerate data access and reporting we have an on-cluster engine
• Cubes are pre-computed and stored in memory and in HDFS via HCatalog.
• We had to make sure all Hive catalog accesses were first authorized through Ranger
• Simple implementation just requires an Authorizer class with isAccessAllowed()
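To make the last bullet concrete, here is a minimal sketch of what an authorizer with a single access check might look like. Real Ranger plugins are Java and delegate to Ranger's policy engine; this Python version only mirrors the shape of the call. The `Policy` fields, the `database.table` resource naming and the default-deny behavior are assumptions for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, List


@dataclass(frozen=True)
class Policy:
    resource_pattern: str         # hypothetical naming, e.g. "sales.*" for database.table
    users: FrozenSet[str]
    access_types: FrozenSet[str]  # e.g. {"select"}


class Authorizer:
    """Toy stand-in for an authorizer that consults centrally defined policies."""

    def __init__(self, policies: List[Policy]) -> None:
        self._policies = policies

    def is_access_allowed(self, user: str, resource: str, access_type: str) -> bool:
        # Default deny: access is granted only if some policy matches all three parts.
        return any(
            user in p.users
            and access_type in p.access_types
            and fnmatchcase(resource, p.resource_pattern)
            for p in self._policies
        )


authorizer = Authorizer([
    Policy("sales.*", frozenset({"analyst"}), frozenset({"select"})),
])
```

An engine would call `is_access_allowed` before every catalog access and refuse to serve a cube when it returns False.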

Page 23: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Arcadia Data Visualization Server (BETA)

• While table-level privileges like SELECT/INSERT make sense for tables, visuals tend to have a richer set of verbs
• Need to define custom “resources” in Ranger
• Define custom “privileges”: Edit / Clone / Export / Interact
• A little tricky to do if you are not Java based
• Wildcard support is awesome!
• See yesterday’s talk on Ranger + HAWQ for more details (EXTENDING APACHE RANGER AUTHORIZATION BEYOND HADOOP)
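As a hedged illustration of custom resources and privileges, the sketch below builds a service-definition-like document in Python. The field names (`resources`, `accessTypes`, `matcherOptions`, `wildCard`) are modeled on Ranger's service definition JSON as I understand it and should be checked against the Ranger version in use; the service name `arcadia-viz` and the resource names `dataset` and `visual` are hypothetical. Only the four privileges come from the slide.

```python
import json

# Hypothetical service definition for a visualization server.
service_def = {
    "name": "arcadia-viz",  # hypothetical service name
    "resources": [
        {
            "itemId": 1,
            "name": "dataset",  # hypothetical resource
            "level": 10,
            "matcherOptions": {"wildCard": "true", "ignoreCase": "true"},
        },
        {
            "itemId": 2,
            "name": "visual",  # hypothetical resource
            "level": 20,
            "parent": "dataset",
            "matcherOptions": {"wildCard": "true", "ignoreCase": "true"},
        },
    ],
    # Custom privileges named on the slide.
    "accessTypes": [
        {"itemId": 1, "name": "edit", "label": "Edit"},
        {"itemId": 2, "name": "clone", "label": "Clone"},
        {"itemId": 3, "name": "export", "label": "Export"},
        {"itemId": 4, "name": "interact", "label": "Interact"},
    ],
}

access_names = [a["name"] for a in service_def["accessTypes"]]
payload = json.dumps(service_def, indent=2)  # body one would register with Ranger
```

Because the definition is plain JSON, a non-Java product can generate and register it without linking against Ranger's Java classes.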

Page 24: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Policy Page

• Arcadia policy shows up alongside the others

Page 25: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Admin Level Access

Page 26: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Restricted Access For The Public

Page 27: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

In Conclusion

Page 28: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Thank you.

Visit us at Booth 606

Page 29: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Protegrity Big Data Protector and Apache Ranger

Ranger Integration

By Sunil Sabat

Copyright – Protegrity Inc.

Page 30: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

WHAT DO WE DO?

• Deliver centralized policy enforcement across enterprise
• Apply security as close to the data as possible
• Protect the entire data flow: at rest, in transit, in use

Page 31: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

HOW WE DO IT

ASSOCIATED DATA: Spending, Healthcare, Financial

IDENTIFIED DATA (identity is known; to authorized users):
SSN (023-45-1288), Name (Jane Doe), Email ([email protected])

DE-IDENTIFIED DATA (identity is not known; to unauthorized users):
SSN (153-51-4363), Name (Hfhe Jes), Email ([email protected])

Page 32: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

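ACROSS THE ENTERPRISE

[Diagram labels: ESA; Central Management; Policy Enforced Technology; Consistent Protection]

Sample clear values: Joe Smith, 12/25/1966, 076-39-2778
Sample protected values: 1/02/1966, xxxx2278, ysieondusbak
Protection states shown: Tokenized, In the clear, Masked, De-identified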

Page 33: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Protegrity’s Big Data Protector for Hadoop

[Diagram labels: Hadoop Cluster (Name Node, Data Nodes, Edge Nodes); Hadoop Node (Hive, Pig, MapReduce, YARN, HDFS, Spark (Java and Scala), Other, OS File System); Policy; Audit; Coarse Grained Encryption; Fine Grained Encryption]

Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.

All nodes are managed by the Enterprise Security Administrator that delivers policy and accepts audit logs.

Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.

Policy is enforced at different levels of protection in Hadoop.

Page 34: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Perfect data security and governance

• Combine the best of two products: Apache Ranger and Protegrity ESA (Enterprise Security Administrator)

• Apache Ranger controls access and authorization

• Protegrity protects data at a fine-grained level using tokenization

• Modern data lakes benefit from both products

• The data lake is protected according to enterprise security policy while Hadoop access and authorization is in the hands of Ranger

Page 35: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Process Flow

• Protegrity coexists with Apache Ranger policies
• Ranger controls column access policy
• Ranger KMS coexists along with Protegrity KMS
• Protegrity protects column data based on ESA policy
• Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits
• Ranger custom masking function can be a Protegrity UDF

Page 36: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Protegrity and Ranger Integration

Protegrity coexists with Apache Ranger policies

• Ranger controls column access policy

• Ranger KMS coexists along with Protegrity KMS

• Protegrity protects column data based on ESA policy

• Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits

• Ranger custom masking function can be a Protegrity UDF

Future Exploration

• Embed access policy in Ranger with Protegrity Data Element protection policy for better alert and management

• Inherit access policies from Ranger into ESA policy design

• Single KMS - Best

Page 37: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Use Cases

• Data protection is provided by Protegrity across the enterprise while Hadoop authorization and access are controlled by Ranger

• Enhance Apache Ranger column masking using custom functions in the form of Protegrity UDFs.

• Result: Ranger is in control of data access and protection
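A sketch of how such a custom masking policy might be expressed, under several assumptions. The policy layout (`policyType` 1 for data masking, `dataMaskPolicyItems`, `dataMaskType` of `CUSTOM`, a `valueExpr` containing `{col}`) follows Ranger's Hive masking policy model as I understand it. The service name, policy name, user, and especially the UDF name `pty_protect` and its `'ccn_token'` argument are hypothetical placeholders, not Protegrity's actual function names.

```python
# Hypothetical Ranger data-masking policy for the ccn column of clear_table.
mask_policy = {
    "service": "cl1_hive",               # hypothetical Ranger service name
    "name": "mask-ccn-with-custom-udf",  # hypothetical policy name
    "policyType": 1,                     # assumed: 1 denotes a data-masking policy
    "resources": {
        "database": {"values": ["default"]},
        "table": {"values": ["clear_table"]},
        "column": {"values": ["ccn"]},
    },
    "dataMaskPolicyItems": [{
        "users": ["analyst"],            # hypothetical user
        "accesses": [{"type": "select", "isAllowed": True}],
        "dataMaskInfo": {
            "dataMaskType": "CUSTOM",
            # Hypothetical UDF call; {col} stands for the masked column.
            "valueExpr": "pty_protect({col}, 'ccn_token')",
        },
    }],
}


def render_mask_expression(policy: dict, column: str) -> str:
    """Show the expression that would replace the column in a query."""
    expr = policy["dataMaskPolicyItems"][0]["dataMaskInfo"]["valueExpr"]
    return expr.replace("{col}", column)


rendered = render_mask_expression(mask_policy, "ccn")
```

With a policy of this kind, a plain `select ccn from clear_table` issued by the listed user would be rewritten to call the UDF on the column.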

Page 38: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Clear Data in Hive table

• Original Data present in table “clear_table”

• select * from clear_table;

+-------------------+--+
| clear_table.ccn   |
+-------------------+--+
| 5539455602750205  |
| 5464987835837424  |
| 6226540862865375  |
| 6226600538383292  |
| 376235139103947   |
+-------------------+--+

Page 39: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Custom masking function - Protect

Page 40: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Custom masking function - Unprotect

Page 41: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Summary of Demo

Original Data      Protected Data     Unprotected Data
5539455602750200   8295281832577430   5539455602750200
5464987835837420   8437400318738670   5464987835837420
6226540862865370   9683356798323010   6226540862865370
6226600538383290   9885536985189730   6226600538383290
376235139103947    222096775455034    376235139103947
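The protect/unprotect round trip in the table preserves length and keeps every value numeric. The toy below reproduces only those two properties with a keyed per-position digit shift. It is not Protegrity's tokenization algorithm, it is not secure, and the function names and key are made up for illustration.

```python
import hashlib
import hmac


def _shift(key: bytes, position: int) -> int:
    """Derive a per-position shift (0-9) from the key. Toy construction only."""
    digest = hmac.new(key, str(position).encode(), hashlib.sha256).digest()
    return digest[0] % 10


def protect(digits: str, key: bytes) -> str:
    """Replace each digit with a shifted digit, keeping the length unchanged."""
    return "".join(str((int(d) + _shift(key, i)) % 10) for i, d in enumerate(digits))


def unprotect(token: str, key: bytes) -> str:
    """Reverse protect() by subtracting the same per-position shifts."""
    return "".join(str((int(d) - _shift(key, i)) % 10) for i, d in enumerate(token))


demo_key = b"demo-key-not-for-production"
token = protect("5539455602750205", demo_key)
restored = unprotect(token, demo_key)
```

Because the output has the same length and character class as the input, it can be stored in the same Hive column type as the clear value.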

Page 42: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

THANK YOU

www.protegrity.com

Page 43: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

HDP SEC READY & GOV READY Programs

✔ Choice: Customers choose features that they want to deploy—a la carte

✔ Curated & Fast: Partners provide rich, complementary and complete features ready to deploy

✔ Agile: Faster deployment and accelerated innovation

✔ Centralized: Open metadata/governance and security infrastructure

✔ Flexibility: Portfolio of partner reference architectures and integration patterns

✔ Safe: HDP at core to provide stability and interoperability

Page 44: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Hortonworks Certified Technology Program

HDP YARN Ready: Integrates with YARN (native, Tez, Slider) or uses/runs on a YARN Ready engine
HDP Operations Ready: Integrates with Ambari APIs, Stacks, Blueprints, or Views
HDP Governance Ready: Integrates with Atlas
HDP Security Ready: Integrates with Ranger, Knox, or other security features

Sign up to be a partner and request certification kit!
http://hortonworks.com/partners/product-integration-certification/

Page 45: Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

Questions