Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hortonworks Inc.: Ali Bajwa, Partner Solutions; Srikanth Venkat, Product Management
Talend Inc.: Laurent Bride, CTO
Arcadia Data: Shant Hovsepian, CTO
Protegrity: Sunil Sabat, Director, Partner Solutions
DataWorks Summit - San Jose June 2017
Agenda
Apache Ranger & Apache Atlas
Journey, Ecosystem & Partners
Hortonworks Partner Certification Program
SEC Ready & GOV Ready program
Partner Technology Showcase
Apache Ranger Community Snapshot
May 2014: XA Secure acquisition
July 2014: Enters Apache Incubation
Nov 2014: Ranger 0.4.0 release
July 2015: Ranger 0.5 / HDP 2.3
Aug 2016: Ranger 0.6 / HDP 2.5
Nov 2016: Ranger 0.6.2 / HDP 2.5.3
Jan 2017: Ranger TLP graduation!
Apr 2017: Ranger 0.7 / HDP 2.6
TBD: 1.0.0 target release date
• Committers: 22
• Contributors from: eBay, Microsoft, Huawei, Pandora, Accenture, ING, Talend
Ranger 0.7/HDP 2.6
• Export/import of Policies
• $User and macros
• Plugin status tab
• Support for “show columns” and “describe extended”
• Incremental LDAP Sync
• SmartSense Metrics
Ranger 0.6/HDP 2.5
• Classification (tag) based security (ABAC)
• Dynamic Column Masking & Row Filtering
• KMS HSM Integration (Safenet)
• Dynamic Policies & Deny Conditions
• LDAP Improvements & Audit Scalability
Jun 2017: Ranger 0.7.1 / HDP 2.6.1
Apache Ranger: Ecosystem
Partner Integrations
Apache Ranger
Apache Kafka
Native Hadoop Service Authorizers
Azure Data Lake Store (ADLS)* (*Future)
Authorizer Extensions for Non-Hadoop Filesystems & Stores
Background: DGI Community becomes Apache Atlas
Dec 2014: DGI group kickoff
May 2015: Apache Atlas incubation
Aug 2016: HDP 2.5 / Apache 0.7 foundation release
Apr 2017: HDP 2.6 / Apache 0.8 release
Global Financial Company
* DGI: Data Governance Initiative
Apache 0.8/HDP 2.6
• Simplified Search UI
• Simplified APIs
• Classification-based security for HDFS, Kafka, HBase
• Knox SSO
• Performance/scalability improvements
Apache 0.7.1/HDP 2.5.3
• High availability support
• LDAP Authentication/Authorization
• Classification based security for Hive
• UI Redesign
• Committers: 35
• Code contributors from: IBM, Aetna, Merck, Target, JPMC
Apache Atlas: Ecosystem
Custom Integration
Apache Atlas
RDBMS
Apache Kafka
Pending: Partner
© 2017 Talend Inc
Talend Studio Jobs Lineage with Apache Atlas
Laurent Bride, CTO, Talend
Agenda
Integration Goals
Design
Technical Details
Demo
Integration Goals
Support lineage of Talend Studio jobs on Apache Atlas / Hortonworks HDP
Similar (or improved) functionality to what we offer for other lineage providers.
Lineage for Talend Big Data jobs on both Spark and Hadoop.
Authentication with the lineage backend.
Die-on-error: a lineage failure does not affect job execution.
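The last goal can be sketched as a best-effort wrapper around lineage publication. This is only an illustration in Python, not Talend's implementation: `run_job`, `publish_lineage`, and the `die_on_error` flag are hypothetical names, and the flag's meaning (re-raise lineage errors only when explicitly requested) is an assumption.

```python
import logging

logger = logging.getLogger("lineage")


def run_job_with_lineage(run_job, publish_lineage, die_on_error=False):
    """Run the job, then try to publish lineage without endangering the result.

    run_job and publish_lineage are hypothetical callables supplied by the caller.
    """
    result = run_job()  # errors in the job itself still propagate as usual
    try:
        publish_lineage(result)
    except Exception:
        # Lineage is best effort: record the failure and keep the job result.
        logger.exception("lineage publication failed")
        if die_on_error:  # assumed opt-in behaviour, off by default
            raise
    return result
```

With the default setting, a lineage backend that is down only produces a log entry; the job result is returned unchanged.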
Design
Goal: Support a similar generic lineage model.
Solution:
Send the transformation graph representation with each node as a HashMap of properties.
Translate the graph into the given model in an integration layer.
In the Atlas case, the integration layer uses the Atlas REST API via the atlas-client JAR.
Leave provider-specific lineage functionality open for advanced use
• Future Roadmap items
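A minimal sketch of the design above, assuming a generic graph whose nodes are plain property maps and a pluggable, provider-specific translation function. The function names, the sample component names, and the entity fields are illustrative assumptions; they are not the atlas-client API or the Atlas schema.

```python
def translate_graph(nodes, edges, translate_node):
    """Translate a generic job graph into a provider-specific model.

    nodes: dict mapping node id -> dict of properties (the generic form).
    edges: list of (source_id, target_id) pairs.
    translate_node: provider-specific function, property dict -> entity dict.
    """
    entities = {node_id: translate_node(props) for node_id, props in nodes.items()}
    links = [(entities[src]["name"], entities[dst]["name"]) for src, dst in edges]
    return {"entities": list(entities.values()), "links": links}


def to_atlas_like(props):
    # Hypothetical mapping: field names are illustrative, not the Atlas schema.
    return {
        "typeName": "tComponent",
        "name": props["name"],
        "attributes": {k: v for k, v in props.items() if k != "name"},
    }


nodes = {
    "n1": {"name": "tFileInput_1", "path": "/data/in.csv"},
    "n2": {"name": "tMap_1"},
}
model = translate_graph(nodes, [("n1", "n2")], to_atlas_like)
```

Keeping the graph generic means another lineage provider would only need its own `translate_node` function.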
Technical Details - Talend Model for Atlas
Note that the Lineage view only shows entities that are in the “DataSet – Process – DataSet” form.
So we had to represent every component as a DataSet (tComponent) and create artificial components (tArtificialComponent) as Processes so that they appear in the Lineage view.
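The workaround can be sketched as follows, assuming each connection between two components becomes one artificial process whose input and output are the two component datasets. The dictionary layout and component names are illustrative assumptions; only the type names tComponent and tArtificialComponent come from the slide.

```python
def to_lineage_chain(component_edges):
    """Expand component-to-component edges into DataSet-Process-DataSet triples."""
    datasets = {}
    processes = []
    for source, target in component_edges:
        # Every component is modelled as a DataSet, created once.
        for name in (source, target):
            datasets.setdefault(name, {"typeName": "tComponent", "name": name})
        # An artificial Process sits between each connected pair of components.
        processes.append({
            "typeName": "tArtificialComponent",
            "name": f"{source}->{target}",
            "inputs": [source],
            "outputs": [target],
        })
    return list(datasets.values()), processes


datasets, processes = to_lineage_chain(
    [("tFileInput_1", "tMap_1"), ("tMap_1", "tHiveOutput_1")]
)
```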
Technical Details – Open Issues
The entity connection constraint is our biggest issue.
Breaking changes in the API (atlas-client 0.8, but compatible with 0.7 through redirect).
Inherited properties are shown even if not assigned. This is not an issue in itself, but because we reuse DataSet we run into cases like this:
DataSet has an owner, but an owner does not make sense for a Talend transform.
The Atlas model is flexible but strict at the same time: data is constrained to evolve with the metadata. If we pass new arguments that are not defined in the metadata model, they are ignored.
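The last point can be modelled in a few lines. This sketch only imitates the behaviour described on the slide (undeclared attributes are dropped); it is not Atlas code, and the attribute names and function name are assumptions.

```python
def apply_type_definition(declared_attributes, payload):
    """Split a payload into attributes the type declares and ones it does not."""
    kept = {k: v for k, v in payload.items() if k in declared_attributes}
    ignored = sorted(k for k in payload if k not in declared_attributes)
    return kept, ignored


# Hypothetical type definition and payload.
declared = {"name", "owner", "description"}
kept, ignored = apply_type_definition(declared, {"name": "tMap_1", "rowCount": 1200})
```

Here `rowCount` would be lost unless the type definition is extended first, which is what "data is constrained to evolve with the metadata" means in practice.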
Demo / Talend Studio side
Demo / How it looks in Apache Atlas
Arcadia Data. Proprietary and Confidential
Securing Visual Analytics for Big Data
with Apache Ranger
Shant Hovsepian – CTO & co-Founder
@superdupershant
June 14, 2017
Arcadia Visualization Engine
The First Native Visual Analytics Platform for Big Data
Arcadia Analytic Platform (Smart Acceleration™)
On-Premises
Drag-and-drop Visual Analytics & Dashboards
Hybrid Cloud
Custom Data Applications
…
BIG DATA OS: Distributed execution, data storage, metadata, security
IN-CLUSTER ANALYTICS ENGINE: Scales linearly with cluster for speed and easier management
WEB-BASED INTERFACE: Drag & drop interface for visual analytics & app workflow
Data Platform
The Challenge
What is Apache Ranger?
• Centralized authorization and auditing across Hadoop components
• Access authorization based on resources
• Policy based behavior such as column masking
• Extensible Architecture
The Value of a Robust Policy Engine
• It’s complicated code to get right
• I am lazy; I don’t want to implement it
• Zero Knowledge Proofs
Native Security Integration
Arcadia analytics platform
HDFS
SINGLE COPY OF DATA TO SECURE
Reduces footprint of data copies with the same or summarized information
Single policy definition for access control
Easier compliance
ENTERPRISE GRADE
Kerberos, LDAPS/AD, PAM and SAML
Single sign on for business users
Role-based access control with delegation
INTEGRATED ROLE-BASED ACCESS
Use role definitions from Ranger for access at BI tier
No risk of mismatching policies between data management tier and BI tier
Configuration
• Tight integration with Ranger + Ambari makes installation and configuration very easy!
Arcadia Data OLAP Engine
• To accelerate data access and reporting, we have an on-cluster engine
• Cubes are pre-computed and stored in memory and in HDFS via HCatalog
• We had to make sure all Hive catalog accesses were first authorized through Ranger
• A simple implementation just requires an Authorizer class with isAccessAllowed()
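The pattern in the last two bullets, checking every catalog access with an authorizer before serving it, might look roughly like this. It is a Python sketch rather than the Java class the slide refers to; the class names, the policy structure, and the table and user names are all hypothetical, and only the idea of an `isAccessAllowed()` check comes from the slide.

```python
class Authorizer:
    """Hypothetical authorizer: policies map (user, resource) to allowed access types."""

    def __init__(self, policies):
        self._policies = policies

    def is_access_allowed(self, user, resource, access_type):
        return access_type in self._policies.get((user, resource), set())


class GuardedCatalog:
    """Wraps catalog lookups so that each one is authorized first."""

    def __init__(self, tables, authorizer):
        self._tables = tables
        self._authorizer = authorizer

    def read_table(self, user, table):
        # Deny by default: anything not explicitly allowed is rejected.
        if not self._authorizer.is_access_allowed(user, table, "select"):
            raise PermissionError(f"{user} may not select from {table}")
        return self._tables[table]


authorizer = Authorizer({("alice", "sales.orders"): {"select"}})
catalog = GuardedCatalog({"sales.orders": ["order_id", "amount"]}, authorizer)
```

Routing all reads through one guarded entry point is what keeps pre-computed cubes from bypassing the policies defined for the underlying tables.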
Arcadia Data Visualization Server (BETA)
• While table-level privileges like SELECT/INSERT make sense for tables, visuals tend to have a richer set of verbs
• Need to define custom “resources” in Ranger
• Define custom “privileges”: Edit / Clone / Export / Interact
• A little tricky to do if you are not Java based
• Wildcard support is awesome!
• See yesterday’s talk on Ranger + HAWQ for more details (“Extending Apache Ranger Authorization Beyond Hadoop”)
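To illustrate custom resources, custom privileges, and wildcards together, here is a small sketch. It uses Python's shell-style `fnmatchcase` as a stand-in for wildcard matching, which may differ from Ranger's own matching rules; the resource paths, user names, and policy layout are assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical policies over a custom "dashboard" resource with custom privileges.
POLICIES = [
    {"resource": "dashboards/sales/*", "users": {"analyst"},
     "privileges": {"interact", "export"}},
    {"resource": "dashboards/*", "users": {"admin"},
     "privileges": {"edit", "clone", "export", "interact"}},
]


def is_allowed(user, resource, privilege, policies=POLICIES):
    """Return True if any policy grants the privilege on a matching resource."""
    return any(
        user in policy["users"]
        and privilege in policy["privileges"]
        and fnmatchcase(resource, policy["resource"])
        for policy in policies
    )
```

A single wildcard policy covers every dashboard under a prefix, so new visuals inherit access rules without a new policy per object.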
Policy Page
• The Arcadia policy shows up alongside the others
Admin Level Access
Restricted Access For The Public
In Conclusion
Thank you.
Visit us at Booth 606
Protegrity Big Data Protector and Apache Ranger
Ranger Integration
By
Sunil Sabat
Copyright – Protegrity Inc.
WHAT DO WE DO?
Deliver centralized policy enforcement across enterprise
Apply security as close to the data as possible
Protect the entire data flow: at rest, in transit, in use
HOW WE DO IT
ASSOCIATED DATA: Spending, Healthcare, Financial
IDENTIFIED DATA (identity is known to authorized users)
SSN (023-45-1288)
Name (Jane Doe)
Email ([email protected])
DE-IDENTIFIED DATA (identity is not known to unauthorized users)
SSN (153-51-4363)
Name (Hfhe Jes)
Email ([email protected])
ACROSS THE ENTERPRISE
ESA
1/02/1966 / xxxx2278 / ysieondusbak
Tokenized / In the clear / Masked / De-identified
Joe Smith / 12/25/1966 / 076-39-2778
CENTRAL MANAGEMENT
POLICY ENFORCED TECHNOLOGY
CONSISTENT PROTECTION
Protegrity’s Big Data Protector for Hadoop
[Diagram: a Hadoop cluster made up of a Name Node, Data Nodes, and Edge Nodes. Each Hadoop node is shown with Hive, Pig, MapReduce, YARN, HDFS, other components, and the OS file system, together with Policy and Audit.]
Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.
All nodes are managed by the Enterprise Security Administrator, which delivers policy and accepts audit logs.
Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.
Policy is enforced at different levels of protection in Hadoop.
Coarse Grained Encryption
Fine Grained Encryption
Spark (Java and Scala)
Perfect data security and governance
• Combine the best of two products: Apache Ranger and Protegrity ESA (Enterprise Security Administrator)
• Apache Ranger controls access and authorization
• Protegrity protects data at a fine-grained level using tokenization
• Modern data lakes benefit from both products
• The data lake is protected according to enterprise security policy, while Hadoop access and authorization are in the hands of Ranger
Process Flow
Protegrity coexists with Apache Ranger policies
Ranger controls column access policy
Ranger KMS coexists along with Protegrity KMS
Protegrity protects column data based on ESA policy
Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits
Ranger custom masking function can be a Protegrity UDF
Protegrity and Ranger Integration
• Protegrity coexists with Apache Ranger policies
• Ranger controls column access policy
• Ranger KMS coexists along with Protegrity KMS
• Protegrity protects column data based on ESA policy
• Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits
• Ranger custom masking function can be a Protegrity UDF
Future Exploration
• Embed access policy in Ranger with Protegrity Data Element protection policy for better alert and management
• Inherit access policies from Ranger into ESA policy design
• Single KMS - Best
Use Cases
• Data Protection is provided by Protegrity across the enterprise while Hadoop authorization and access is controlled by Ranger
• Enhance Apache Ranger column masking using a custom function in the form of Protegrity UDFs
• Result is Ranger in control of data access and protection
Clear Data in Hive table
• Original data present in table “clear_table”
select * from clear_table;
+-------------------+--+
| clear_table.ccn   |
+-------------------+--+
| 5539455602750205  |
| 5464987835837424  |
| 6226540862865375  |
| 6226600538383292  |
| 376235139103947   |
+-------------------+--+
Custom masking function - Protect
Custom masking function - Unprotect
Summary of Demo
Original Data      Protected Data     Unprotected Data
5539455602750200   8295281832577430   5539455602750200
5464987835837420   8437400318738670   5464987835837420
6226540862865370   9683356798323010   6226540862865370
6226600538383290   9885536985189730   6226600538383290
376235139103947    222096775455034    376235139103947
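The round trip in the demo (protect, then unprotect back to the original digits) can be illustrated with a toy, length-preserving digit transform. This is emphatically not Protegrity's tokenization algorithm and is not secure for real use; the key-derived per-position offsets, the function names, and the key are assumptions chosen only to show a reversible protect/unprotect pair of the kind a masking UDF would expose.

```python
import hashlib
import hmac


def _offsets(key: bytes, length: int):
    """Derive one digit offset (0-9) per position from the key."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return [b % 10 for b in stream[:length]]


def protect(value: str, key: bytes) -> str:
    """Shift each digit by its key-derived offset, modulo 10 (toy example)."""
    return "".join(
        str((int(c) + o) % 10) for c, o in zip(value, _offsets(key, len(value)))
    )


def unprotect(token: str, key: bytes) -> str:
    """Reverse protect() by subtracting the same offsets."""
    return "".join(
        str((int(c) - o) % 10) for c, o in zip(token, _offsets(key, len(token)))
    )
```

As in the demo table, the protected value keeps the length and all-digit format of the original, and only a caller holding the key can recover the clear value.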
THANK YOU
www.protegrity.com
HDP SEC READY & GOV READY Programs
✔ Choice: Customers choose the features they want to deploy, à la carte
✔ Curated & Fast: Partners provide rich, complementary and complete features ready to deploy
✔ Agile: Faster deployment and accelerated innovation
✔ Centralized: Open metadata/governance and security infrastructure
✔ Flexibility: Portfolio of partner reference architectures and integration patterns
✔ Safe: HDP at core to provide stability and interoperability
Hortonworks Certified Technology Program
HDP YARN Ready: Integrates with YARN (native, Tez, Slider) or uses/runs on a YARN Ready engine
HDP Operations Ready: Integrates with Ambari APIs, Stacks, Blueprints, or Views
HDP Governance Ready: Integrates with Atlas
HDP Security Ready: Integrates with Ranger, Knox, or other security features
Sign up to be a partner and request a certification kit!
http://hortonworks.com/partners/product-integration-certification/
Questions