
H18420

Implementation Guide

Dell EMC ECS IAM and Hadoop S3A Implementation

Dell EMC ECS Identity and Access Management (IAM) feature to control access to S3A Hadoop data

Abstract

This paper provides basic information on the IAM features of Dell EMC ECS and a step-by-step process for configuring ECS with AD FS to demonstrate the supported SAML features, which allow the Hadoop administrator to set up policies that control access to S3A Hadoop data.

June 2020


Revisions

Date Description

June 2020 Initial release

Acknowledgements

Author: Kirankumar Bhusanurmath, Analytics Solutions Architect

Support: Chip Maurer, Seema Tahaliyani

Other:

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this

publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any software described in this publication requires an applicable software license.

Copyright © 2020 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell

Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [7/13/2020] [Implementation Guide] [H18420]

Table of contents

Revisions
Acknowledgements
Table of contents
Overview
1 Dell EMC IAM implementation with Hadoop S3A
1.1 Policies for Hadoop users
1.2 Create policies on Dell EMC ECS
1.2.1 Customer managed policy “HadoopFullControl”
1.2.2 Customer managed policy “HadoopNoDelete”
1.2.3 Customer managed policy “HadoopReadOnly”
1.3 SAML assertions/federated users access configurations
1.3.1 Set up TrustRelationship between ADFS as identity provider and ECS
1.3.2 Demo environment details for using SAML assertions
1.3.3 Create ECS IAM roles
1.3.4 Validate the SAML assertions
1.4 Encrypting IAM credentials using JCEKS
1.5 Bucket owner permissions change
1.6 Restricting access via policy management
1.7 ECS IAM users and groups access configurations
1.7.1 Create IAM group
1.7.2 Create IAM user
1.7.3 Validate the IAM User
2 SAML assertions vs IAM users/groups
2.1 SAML assertions
2.2 IAM users and groups
3 Object tagging to protect individual objects
3.1 Object tagging
3.2 Example object tagging to protect individual objects
3.2.1 Example overview
3.2.2 Step 1: Create tag xml file
3.2.3 Step 2: Add Tag to file and confirm
3.2.4 Step 3: Create payroll authorized user
3.2.5 Step 4: Bucket policy to enforce access control
3.2.6 Step 5: Testing object access
4 Tips
A Technical support and resources
A.1 Related resources

Overview

S3A is an open source connector for Hadoop based on the official Amazon Web Services SDK. It was created to address the storage scaling and cost problems that many Hadoop users were having with HDFS. Hadoop S3A connects a Hadoop cluster to the Dell EMC ECS object store, which creates a second tier of storage for offloading data at a lower cost per terabyte. Dell EMC ECS 3.5 now supports Identity and Access Management (IAM) features, which address the security limitations of connecting a Hadoop cluster to ECS storage.

This implementation guide provides information on the IAM features of Dell EMC ECS and a step-by-step process for configuring Dell EMC ECS with Microsoft Active Directory Federation Services (AD FS). It demonstrates the Security Assertion Markup Language (SAML) features that Hadoop administrators can use to set up policies that control access, through the S3A protocol, to Hadoop data stored on the ECS object store.

ECS 3.5.0.1 includes support for an IAM simulator, so IAM functionality can first be tested and verified using the simulator. Refer to AWS Simulate Principal Policy for more information.


1 Dell EMC IAM implementation with Hadoop S3A

ECS IAM security services can be implemented on a Hadoop cluster to provide granular security for S3A. ECS IAM credentials are required to securely access storage through Hadoop S3A.

Prior to ECS IAM, Hadoop access to ECS object storage using S3A required an ECS S3 object username and a secret key, and ACL-level security was not possible with S3A. With ECS IAM and Secure Token Service (STS) features, an administrator has several more secure options for controlling access to the S3A storage.

One option is to create IAM policies that define permissions appropriate for the customer business case. Once the policies are in place, IAM groups can be created and attached to the policies. Individual IAM users can then be created as members of the IAM groups. IAM users are assigned S3 access keys and secret keys that can be used to access the S3A data, and each request is evaluated against the IAM policies that apply to the IAM user.

Another option for administrators is to use STS and SAML assertions to allow federated users to obtain temporary credentials. In this use case, a cross trust relationship must be established between ECS and the Identity Provider (Windows ADFS). As in the previous option, IAM policies must first be created. Once the policies are defined, the administrator creates IAM roles that are attached to the IAM policies. Federated users can then authenticate and obtain a SAML assertion from the Identity Provider. The assertion is used to assume one of the IAM roles that are permissible for the user. Once the role has been assumed, the Hadoop user is provided with a temporary access key, a temporary secret key, and a temporary token. The Hadoop user uses these temporary credentials to access the S3A data until the credentials expire. The temporary credentials carry the permissions of the configured policies, which enforce security controls on the S3 object store.

The following sections describe how to set up ECS IAM security services on a Hadoop cluster for S3A security. The two user access options below are explained with an example.

1. Option 1: IAM Users/Groups
a. Create IAM groups that attach to policies
b. Create IAM users that are members of an IAM group
2. Option 2: SAML Assertions (Federated Users)
a. Create IAM roles that attach to policies
b. Configure a TrustRelationship between the Identity Provider (AD FS) and ECS that maps AD groups to IAM roles

1.1 Policies for Hadoop users

To walk through the implementation process, this guide considers the three fictional types of Hadoop user below and creates a policy for each. Each Hadoop environment will have different policy requirements that need to be planned, designed, and implemented.

• Hadoop Administrator: This user can perform all operations except create bucket and delete bucket.
• Hadoop Power User: This user can perform all operations except create bucket, delete bucket, and delete object.
• Hadoop Read Only User: This user can only list and read objects.


1.2 Create policies on Dell EMC ECS

This section shows how to create customer managed IAM policies that define a set of custom S3 permissions. IAM identities to which these policies are attached can then perform the permitted S3 operations.

1.2.1 Customer managed policy “HadoopFullControl”

The customer managed policy HadoopFullControl provides ALL S3 service permissions EXCEPT CreateBucket and DeleteBucket. To create the policy, follow the steps below.

1. Log in to ECS as a native system administrator, i.e., root, and select “Identity and Access(S3)”.

Dell EMC IAM page

2. Create New Policy
a. Name: HadoopFullControl
b. Select Visual Editor
i. Service: S3
ii. Expand Actions
iii. Select or deselect actions as appropriate so that all actions except CreateBucket and DeleteBucket are allowed
iv. Resources: All resources


Dell EMC IAM page to create new policy
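The selections made in the Visual Editor can also be pictured as a policy document. The following Python sketch is illustrative only. It assumes that ECS IAM uses AWS-style policy JSON (Version, Statement, Effect, Action, Resource) and that "all S3 actions except CreateBucket and DeleteBucket" may be expressed as a broad Allow plus an explicit Deny; the Visual Editor may instead emit an explicit list of allowed actions. The function name and the Sid values are hypothetical.

```python
import json


def hadoop_full_control_policy():
    """Return an AWS-style policy document (assumed format) that allows
    every S3 action except creating and deleting buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Broad allow for the S3 service on all resources.
                "Sid": "AllowAllS3",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": "*",
            },
            {
                # Explicit deny for the two excluded bucket operations.
                "Sid": "DenyBucketCreateDelete",
                "Effect": "Deny",
                "Action": ["s3:CreateBucket", "s3:DeleteBucket"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(hadoop_full_control_policy(), indent=2))
```

Compare the printed document with the policy summary shown by ECS after saving, and treat the ECS output as authoritative.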

1.2.2 Customer managed policy “HadoopNoDelete”

The customer managed policy HadoopNoDelete provides ALL S3 service permissions EXCEPT Delete actions.

1. Create New Policy
a. Name: HadoopNoDelete
b. Select Visual Editor
i. Service: S3
ii. Expand Actions
iii. Deselect the Delete actions
iv. Resources: All resources


Dell EMC IAM page to create new policy

Note: Even for this policy, Delete Bucket and Create Bucket are not enabled.

1.2.3 Customer managed policy “HadoopReadOnly”

The customer managed policy HadoopReadOnly provides only S3 List and Read actions.

1. Create New Policy
a. Name: HadoopReadOnly
b. Select Visual Editor
i. Service: S3
ii. Expand Actions and select only the List and Read actions
iii. Resources: All resources


Dell EMC IAM page to create new policy

Note: Only List and Read/Get actions are enabled.
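To compare the effect of the three policies side by side, the following Python sketch models them with a toy evaluator. It is a simplification under stated assumptions: the action names follow AWS S3 naming, the wildcard patterns only approximate the individual actions selected in the Visual Editor, and the evaluation order (explicit deny first, then allow, otherwise deny) is the usual IAM convention rather than something taken from the ECS documentation. The names POLICIES and is_allowed are hypothetical.

```python
from fnmatch import fnmatchcase

# Simplified stand-ins for the three customer managed policies.
# Each policy is a list of (effect, action_pattern) pairs (assumed patterns).
POLICIES = {
    "HadoopFullControl": [
        ("Allow", "s3:*"),
        ("Deny", "s3:CreateBucket"),
        ("Deny", "s3:DeleteBucket"),
    ],
    "HadoopNoDelete": [
        ("Allow", "s3:*"),
        ("Deny", "s3:CreateBucket"),
        ("Deny", "s3:Delete*"),
    ],
    "HadoopReadOnly": [
        ("Allow", "s3:Get*"),
        ("Allow", "s3:List*"),
    ],
}


def is_allowed(policy_name, action):
    """Explicit deny wins, then any matching allow, otherwise implicit deny."""
    statements = POLICIES[policy_name]
    if any(effect == "Deny" and fnmatchcase(action, pattern)
           for effect, pattern in statements):
        return False
    return any(effect == "Allow" and fnmatchcase(action, pattern)
               for effect, pattern in statements)


if __name__ == "__main__":
    actions = ["s3:ListBucket", "s3:GetObject", "s3:PutObject",
               "s3:DeleteObject", "s3:CreateBucket", "s3:DeleteBucket"]
    for name in POLICIES:
        allowed = [a for a in actions if is_allowed(name, a)]
        print(f"{name}: {', '.join(allowed)}")
```

A table like the one this prints can serve as a checklist when verifying the real policies with the IAM simulator mentioned in the Overview.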

1.3 SAML assertions/federated users access configurations

This section shows how to use Secure Token Service (STS) and Security Assertion Markup Language (SAML) assertions to allow federated users to obtain temporary credentials. In this use case, a cross trust relationship must be established between ECS and the Identity Provider (Windows ADFS). As described in the section above, IAM policies must first be created. Once the policies are defined, the administrator creates IAM roles that are attached to the IAM policies. Federated users can then authenticate and obtain a SAML assertion from the Identity Provider. The assertion is used to assume one of the IAM roles that are permissible for the user. Once the role has been assumed, the Hadoop user is provided with a temporary access key, a temporary secret key, and a temporary token. The Hadoop user uses these temporary credentials to access the S3A data until the credentials expire. The temporary credentials carry the permissions of the configured policies, which enforce security controls on the S3 object store.

The process to set up SAML assertion user access is as follows:

1. Configure a TrustRelationship between the Identity Provider (AD FS) and ECS that maps AD groups to IAM roles
2. Create ECS IAM roles that attach to policies

1.3.1 Set up TrustRelationship between ADFS as identity provider and ECS

This section describes the steps to configure a TrustRelationship between Microsoft ADFS as the identity provider and Dell EMC ECS, which maps AD groups to IAM roles.


1.3.1.1 Step 1: Create new identity provider in ECS

Create a new identity provider in ECS and add the ADFS details.

1. Download the ADFS metadata file from the URL https://[ADFS-SERVER-NAME]/FederationMetadata/2007-06/FederationMetadata.xml

Download Microsoft ADFS FederationMetadata file

2. Log in to ECS ➔ Manage ➔ Identity and Access(S3) ➔ Identity Provider ➔ select Namespace ➔ click “NEW IDENTITY PROVIDER” and provide the following details. For more information, refer to the Identity provider section of the ECS admin guide.
a. Namespace: HadoopProviderNamespaceX
b. Name: HadoopProvider
c. Type: SAML
d. Metadata provider: Click “Choose” and upload the FederationMetadata.xml file downloaded from ADFS.

The scope of an identity provider is limited to its namespace. In a multi-namespace environment, identity providers must be set up for each namespace accordingly.
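Before uploading the metadata file, it can be useful to confirm that the downloaded file really is a SAML 2.0 metadata document and to note its entityID. The Python sketch below builds the download URL from the pattern given in step 1 and reads the entityID from a metadata string. The host name adfs.example.com, the sample XML, and the function names are hypothetical placeholders; the actual download (for example with urllib.request) is left out because certificate handling differs between environments.

```python
import xml.etree.ElementTree as ET

# XML namespace used by SAML 2.0 metadata documents.
SAML_MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def federation_metadata_url(adfs_host):
    """Build the ADFS federation metadata URL for a given ADFS server name."""
    return f"https://{adfs_host}/FederationMetadata/2007-06/FederationMetadata.xml"


def read_entity_id(metadata_xml):
    """Return the entityID of a SAML 2.0 EntityDescriptor document."""
    root = ET.fromstring(metadata_xml)
    if root.tag != f"{{{SAML_MD_NS}}}EntityDescriptor":
        raise ValueError("not a SAML 2.0 EntityDescriptor document")
    return root.attrib["entityID"]


# Minimal stand-in for a real FederationMetadata.xml (hypothetical content).
SAMPLE_METADATA = (
    '<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" '
    'entityID="http://adfs.example.com/adfs/services/trust"/>'
)

if __name__ == "__main__":
    print(federation_metadata_url("adfs.example.com"))
    print(read_entity_id(SAMPLE_METADATA))
```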


Dell EMC ECS UI create new identity provider

1.3.1.2 Step 2: Create ECS SAML service provider metadata

Refer to the SAML service provider metadata section of the ECS admin guide to generate the SAML service provider metadata file. The following information is needed from the user to generate the file.

1. A Base64-encoded Java keystore, key alias, and key password.
2. A DNS domain name, which is used as the entityBaseURL to set the location in the AssertionConsumerService.
3. The entity ID is set to urn:ecs:[uuid]:webservices and is unique for each customer.
4. The ECS service provider metadata is generated from the above information and can then be used to set up the relying party trust with ADFS.
5. Once the ECS metadata is downloaded, ADFS must be configured to trust ECS as a relying party, and claims need to be added.
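As a small illustration of the inputs listed above, the Python sketch below Base64-encodes the bytes of a keystore file and formats an entity ID following the urn:ecs:[uuid]:webservices pattern. It is only a sketch: the function names and the placeholder values are hypothetical, and how the encoded keystore is submitted to ECS is not shown here and should be taken from the ECS admin guide.

```python
import base64


def encode_keystore(keystore_bytes):
    """Base64-encode the raw bytes of a Java keystore file."""
    return base64.b64encode(keystore_bytes).decode("ascii")


def ecs_entity_id(customer_uuid):
    """Format an entity ID following the urn:ecs:[uuid]:webservices pattern."""
    return f"urn:ecs:{customer_uuid}:webservices"


if __name__ == "__main__":
    # In practice the bytes would come from open("<keystore file>", "rb").read();
    # a short placeholder byte string is used here instead.
    print(encode_keystore(b"placeholder keystore bytes"))
    print(ecs_entity_id("<uuid assigned by ECS>"))
```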

1.3.1.3 Step 3: Set up relying party trust with ADFS

Follow the steps below to set up the relying party trust with ADFS.

1. In ADFS ➔ right-click “Relying Party Trust” ➔ Add Relying Party Trust ➔ click Next.
2. Select Data Source ➔ check “Import data about the relying party from a file” ➔ click Browse and upload the ECS SAML service provider metadata file generated in the previous section ➔ click Next.


MS ADFS add relying party trust and select data source

3. Specify Display Name: enter the display name and any optional notes for this relying party ➔ click Next.

Specify display name

4. Keep default configuration for Configure multi-factor authentication ➔ click Next


Configure multi-factor authentication

5. Keep default “Choose issuance authorization rules” ➔ click Next.

Choose issuance authorization rules

6. Keep default “Ready to Add trust” ➔ click Next ➔ click Finish.

Ready to add trust

1.3.1.4 Step 4: Configure claim rules for the ECS relying party

1. Select the relying party trust that was added and click Edit claim rules.

Edit claim rules

2. Configuring claim rules for the ECS relying party

ECS relying party dialog box

3. Adding NameId Claim

a. In the Edit Claim Rules for <relying party> dialog box, click Add Rule.

b. Select Transform an Incoming Claim and then click Next.

c. Use the following settings:

i. Claim rule name: NameId

ii. Incoming claim type: Windows Account Name


iii. Outgoing claim type: Name ID

iv. Outgoing name ID format: Persistent Identifier

v. Pass through all claim values: checked

Adding claim rule

4. Adding a RoleSessionName

a. Click Add Rule

b. In the Claim rule template list, select Send LDAP Attributes as Claims.

c. Use the following settings:

i. Claim rule name: RoleSessionName

ii. Attribute store: Active Directory

iii. LDAP Attribute: E-Mail-Addresses

iv. Outgoing Claim Type : https://aws.amazon.com/SAML/Attributes/RoleSessionName

d. Click Finish


Adding a RoleSessionName

IMPORTANT: Steps 5 and 6 below are critical; follow them carefully.

5. Custom claim rule to get AD groups

Unlike the two previous claims, a custom rule is used here to send role attributes. This is done by retrieving all of the authenticated user’s AD groups and then matching the groups that start with AWS- to IAM roles of a similar name. The names of these groups (that is, those that start with AWS-) are used to create the ECS resource names (ARNs) of IAM roles in the ECS namespace.

Sending role attributes requires two custom rules. The first rule retrieves all of the authenticated user’s AD group memberships, and the second rule performs the transformation to the roles claim.

a. Click Add Rule.
b. In the Claim rule template list, select Send Claims Using a Custom Rule and then click Next.
c. For Claim Rule Name, enter Get AD Groups, and then in Custom rule, enter the following:

c:[Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/windowsaccountname", Issuer == "AD AUTHORITY"]
=> add(store = "Active Directory", types = ("http://temp/variable"), query = ";tokenGroups;{0}", param = c.Value);

This custom rule uses a script in the claim rule language that retrieves all the groups the authenticated user is a member of and places them into a temporary claim named http://temp/variable. (Think of this as a variable you can access later.) The next rule uses this claim to transform the groups into IAM role ARNs.

d. Click OK.

Send Claims using a custom rule part_1

Send Claims using a custom rule part_1


6. Adding role attributes
a. Click Add Rule.
b. Repeat the preceding steps, but this time enter Roles for Claim rule name and use the following script:

c:[Type == "http://temp/variable", Value =~ "(?i)^AWS-"]
=> issue(Type = "https://aws.amazon.com/SAML/Attributes/Role", Value = RegExReplace(c.Value, "AWS-", "urn:ecs:iam::s3:saml-provider/provider1,urn:ecs:iam::s3:role/ADFS-"));

c. Note: s3 is the name of the namespace that contains the provider and role in ECS. If your namespace name is "iamns", use "urn:ecs:iam::iamns:saml-provider/provider1,urn:ecs:iam::iamns:role/ADFS-".
d. Click OK.

Adding role attributes
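To see what the Roles rule produces for a given set of AD groups, the transformation can be approximated outside ADFS. The Python sketch below is not claim rule language; it only mimics the two parts of the rule, the case-insensitive prefix filter and the RegExReplace substitution, under the assumption that RegExReplace is case-sensitive and replaces every match. The function name roles_claim and its parameters are hypothetical, and retrieving the groups from Active Directory is not modelled.

```python
import re


def roles_claim(ad_groups, namespace="s3", provider="provider1",
                group_prefix="AWS-", role_prefix="ADFS-"):
    """Approximate the Roles claim rule: keep groups that start with the
    prefix and rewrite the prefix into '<provider URN>,<role URN prefix>'."""
    replacement = (f"urn:ecs:iam::{namespace}:saml-provider/{provider},"
                   f"urn:ecs:iam::{namespace}:role/{role_prefix}")
    # Filter part of the rule: Value =~ "(?i)^AWS-"
    selector = re.compile("^" + re.escape(group_prefix), re.IGNORECASE)
    # Rewrite part of the rule: RegExReplace(c.Value, "AWS-", "...")
    rewriter = re.compile(re.escape(group_prefix))
    return [rewriter.sub(lambda match: replacement, group)
            for group in ad_groups if selector.search(group)]


if __name__ == "__main__":
    for value in roles_claim(["AWS-Dev", "Domain Users"]):
        print(value)
```

For the group AWS-Dev this yields the provider URN and the role URN ending in ADFS-Dev, joined by a comma, which is the value carried in the Role attribute. Groups that do not start with the prefix are dropped.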

NOTE: Section 1.3.2 below describes a demo environment used to demonstrate SAML assertions. In it, the process in section 1.3.1 is followed to set up the TrustRelationship between ADFS as the identity provider (10.247.179/hdfs.emc.com) and the ECS cluster (10.247.179.112). Revisit section 1.3.1 to correlate the entities.


1.3.2 Demo environment details for using SAML assertions

This section describes the demo environment that is used to demonstrate SAML assertions with an example.

1. First, create a cross trust relationship between ECS and ADFS. This is covered in the section above. You can also refer to the ECS IAM documentation to create the appropriate cross trust relationship.
2. For ECS, you can use the IP address of one of the ECS nodes or of a load balancer. This example uses 10.247.179.112.
3. For AD and ADFS, the host used in this example is 10.247.179.26 and the domain is hdfs.emc.com.

1.3.2.1 Demo environment notes about TrustRelationship and Claims

1. All AD users should have an email address, even a fake one, for the RoleSessionName claim.
2. The Roles claim defines the mapping between AD user groups and the corresponding ECS roles that are created later.

c:[Type == "http://temp/variable", Value =~ "(?i)^Hadoop"]
=> issue(Type = "https://aws.amazon.com/SAML/Attributes/Role", Value = RegExReplace(c.Value, "Hadoop", "urn:ecs:iam::s3:saml-provider/HadoopProvider,urn:ecs:iam::s3:role/ADFS-"));

3. An AD user who is a member of an AD group whose name starts with Hadoop is eligible to assume the corresponding ECS IAM role whose name starts with ADFS-.
a. A user in AD group HadoopAdminUser maps to the role ADFS-AdminUser under the ECS provider HadoopProvider.
b. The ADFS-AdminUser role must be defined in ECS for ‘HadoopProvider’; this is covered in the section Create ECS IAM roles.


1.3.2.2 SAML assertion system flow

SAML assertion system flow

1.3.2.3 SAML assertions AD Groups

Assume that AD Hadoop users of the hdfs.emc.com domain are members of one of the following AD groups:

1. HadoopPowerUser – Has general read/write/list access to a bucket, but no delete

User: [email protected]

2. HadoopAdminUser – Like PowerUser but also has delete access

User: [email protected]

3. HadoopReadOnlyUser – Can only read/list contents of buckets (no write)

User: [email protected]

4. All other users in the domain have no access to Hadoop buckets

Note: A person can be a member of more than one of these AD groups, but for simplicity's sake, we assume that each user is a member of only one of the groups above. If a user has more than one role option to assume, the user selects the role at run time.

1.3.3 Create ECS IAM roles

Here we create a new IAM role that defines a trust relationship with a specific user in the identity provider (ADFS). The role also grants S3 permissions to this user when the user “assumes” the role.

1.3.3.1 Create ECS role “ADFS-PowerUser”

1. Log in to ECS as a native system administrator, i.e., root.
2. Identity and Access -> Roles


a. Create New Role

b. Name: ADFS-PowerUser

c. Change Maximum CLI/API session to 12 hours (or as needed)

d. SAML 2.0 Federation

i. SAML provider: provider created earlier (HadoopProvider in our case)

ii. Attribute: saml:aud

iii. Value: https://10.247.179.112/saml

1. One node of the ECS (or Load Balancer IP)

e. Permissions -> Customer Managed

i. Select HadoopNoDelete

f. Save

Create ECS file ADFS-AdminPowerUser
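Conceptually, the SAML 2.0 Federation settings entered above describe who may assume the role. The Python sketch below shows how such a trust relationship might look as a policy document. It is an assumption-laden illustration: the AWS-style trust policy layout, the sts:AssumeRoleWithSAML action name, and the reuse of the urn:ecs:iam::<namespace>:saml-provider/<name> form from the claim rule are all assumed rather than taken from ECS documentation, and the function name is hypothetical. The document that ECS itself generates for the role is authoritative.

```python
import json


def saml_trust_policy(namespace, provider, audience):
    """Sketch of a trust policy (assumed AWS-style layout) that lets users
    federated through the given SAML provider assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Provider URN in the form used by the Roles claim rule.
                "Principal": {
                    "Federated": f"urn:ecs:iam::{namespace}:saml-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithSAML",
                # Attribute and value as entered in the role creation steps.
                "Condition": {"StringEquals": {"saml:aud": audience}},
            }
        ],
    }


if __name__ == "__main__":
    policy = saml_trust_policy("s3", "HadoopProvider", "https://10.247.179.112/saml")
    print(json.dumps(policy, indent=2))
```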

1.3.3.2 Create ECS role “ADFS-AdminUser”

1. Identity and Access -> Roles

a. Create New Role

b. Name: ADFS-AdminUser

c. Change Maximum CLI/API session to 12 hours (or as needed)

d. SAML 2.0 Federation

i. SAML provider: provider created earlier (HadoopProvider in our case)

ii. Attribute: saml:aud

iii. Value: https://10.247.179.112/saml

1. One node of the ECS (or Load Balancer IP)


e. Permissions -> Customer Managed

i. Select HadoopFullControl

f. Save

1.3.3.3 Create ECS role “ADFS-ReadOnlyUser”

1. Identity and Access -> Roles

a. Create New Role

b. Name: ADFS-ReadOnlyUser

c. Change Maximum CLI/API session to 12 hours (or as needed)

d. SAML 2.0 Federation

i. SAML provider: provider created earlier (HadoopProvider in our case)

ii. Attribute: saml:aud

iii. Value: https://10.247.179.112/saml

1. One node of the ECS (or Load Balancer IP)

e. Permissions -> Customer Managed

i. Select HadoopReadOnly

f. Save
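The SAML 2.0 Federation settings entered for each role above (provider, attribute saml:aud, value) correspond to what AWS-style IAM calls a role trust policy. The following is a minimal sketch of such a document, assuming ECS follows the AWS trust-policy layout; the helper name saml_trust_policy is hypothetical, and the JSON that the ECS UI actually generates may differ.

```python
import json


def saml_trust_policy(provider_urn, audience):
    """Build an AWS-style trust policy for a SAML-federated role.

    Assumption: ECS accepts the same document layout as AWS IAM.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The SAML provider created earlier (HadoopProvider in this guide).
                "Principal": {"Federated": provider_urn},
                "Action": "sts:AssumeRoleWithSAML",
                # Matches the Attribute / Value pair entered in the role dialog.
                "Condition": {"StringEquals": {"saml:aud": audience}},
            }
        ],
    }


policy = saml_trust_policy(
    "urn:ecs:iam::s3:saml-provider/HadoopProvider",
    "https://10.247.179.112/saml",
)
print(json.dumps(policy, indent=2))
```

The same trust document would apply to all three roles; only the attached permission policy differs between them.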

1.3.4 Validate the SAML assertions

In this section we validate the STS and SAML assertions that allow federated users to obtain temporary credentials to access the ECS object storage.

1.3.4.1 Object user access

Verify the contents of the ECS bucket as the bucket owner “hdfs” by supplying fs.s3a.access.key and fs.s3a.secret.key on the command line from the Hadoop edge node.

1. The access key and secret key are not defined in core-site.xml for security reasons

2. Recall that POSIX-style ACLs (file and directory owner and permissions) are not valid on S3A

File owner and group are shown as the current user

3. The Hadoop administrator should guard the access key and secret key tightly

Hadoop allows these values to be encrypted and stored in JCEKS files for additional security

[hdfs@lrmk025 ~]$ hdfs dfs -D fs.s3a.access.key=hdfs -D fs.s3a.secret.key='PwVFc9O7zuTy+FN6/5MdvmruFeqF9CzHZ+N2rF66' -ls -R s3a://hdfsBucket-s3a/

drwxrwxrwx - hdfs hdfs 0 2020-03-12 18:30 s3a://hdfsBucket-s3a/tmp

-rw-rw-rw- 1 hdfs hdfs 8765 2020-03-12 18:31 s3a://hdfsBucket-s3a/tmp/file1

drwxrwxrwx - hdfs hdfs 0 2020-03-12 18:30 s3a://hdfsBucket-s3a/tmp/dir1

-rw-rw-rw- 1 hdfs hdfs 8764 2020-03-12 18:31 s3a://hdfsBucket-s3a/tmp/dir1/file2

1.3.4.2 SAML assertion validation overview

To get a SAML assertion from ADFS:

1. Make sure the user for whom you are requesting the SAML assertion is in the AD group that maps to a role in ECS.

- For example, if the user Bob is in the group AWS-Dev, there should be a corresponding role ADFS-Dev in ECS, in namespace S3, with the URN urn:ecs:iam::s3:role/ADFS-Dev

2. Use the s3kinit.py or s3kinit3.py script, depending on your Python version. Please contact Dell EMC customer service for access to the s3kinit scripts. The temporary credentials can also be used to access ECS storage with the AWS CLI.

- Provide the AD username (Note: this user should have an email address set, which is used in the RoleSessionName claim)

- Provide the AD user password

- Provide the identity provider URL with the relying party for which you want the SAML assertion

- Once you have the SAML assertion, URL-encode it, for example using https://www.url-encode-decode.com/

3. s3kinit.py (Python 2) and s3kinit3.py (Python 3) are helper scripts that authenticate with AD/ADFS, obtain the SAML assertion, and then use the assertion to get temporary credentials for the user.

- Python packages for s3kinit that need to be installed via pip: boto, requests, bs4

- The administrator hardcodes the idpentryurl and the ECS or load balancer IP into the script, since these are not likely to change and the Hadoop user then does not need to enter them every time

> Sample idpentryurl (LoginToRp is the entityID from the ECS provider metadata):

https://ad.hdfs.emc.com/adfs/ls/idpinitiatedsignon.aspx?LoginToRp=urn:ecs:15a50b40-9713-483f-a975-560ebd83449e:webservices

4. The s3kinit.py script displays the roles that the user can assume, and the user selects the appropriate role
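The role listing and URL-encoding steps described above can be sketched with the Python standard library. This is not the s3kinit source, which is available only from Dell EMC; it is an illustration under stated assumptions. The function names are hypothetical, and the role attribute name is passed in as a parameter because the exact name ADFS sends for ECS is not given in this guide (the value used below is a placeholder).

```python
import base64
import urllib.parse
import xml.etree.ElementTree as ET

SAML_ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"


def extract_role_values(assertion_b64, role_attribute_name):
    """Return the comma-separated parts of each role attribute value."""
    root = ET.fromstring(base64.b64decode(assertion_b64))
    found = []
    for attr in root.iter("{%s}Attribute" % SAML_ASSERTION_NS):
        if attr.get("Name") != role_attribute_name:
            continue
        for value in attr.findall("{%s}AttributeValue" % SAML_ASSERTION_NS):
            found.append(tuple(part.strip() for part in value.text.split(",")))
    return found


def url_encode_assertion(assertion_b64):
    """Percent-encode the base64 assertion so it can be sent as a form value."""
    return urllib.parse.quote(assertion_b64, safe="")


# Placeholder attribute name and a hand-built response, for illustration only.
ROLE_ATTRIBUTE = "https://example.invalid/SAML/Attributes/Role"
sample_xml = (
    '<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
    'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">'
    "<saml:Assertion><saml:AttributeStatement>"
    '<saml:Attribute Name="%s">'
    "<saml:AttributeValue>urn:ecs:iam::s3:role/ADFS-AdminUser,"
    "urn:ecs:iam::s3:saml-provider/HadoopProvider</saml:AttributeValue>"
    "</saml:Attribute></saml:AttributeStatement></saml:Assertion>"
    "</samlp:Response>"
) % ROLE_ATTRIBUTE
sample_b64 = base64.b64encode(sample_xml.encode("utf-8")).decode("ascii")

roles = extract_role_values(sample_b64, ROLE_ATTRIBUTE)
for index, parts in enumerate(roles):
    print("[ %d ]: %s" % (index, " ".join(parts)))
encoded = url_encode_assertion(sample_b64)
```

Doing the encoding locally also avoids pasting an assertion, which is a bearer credential while it is valid, into a third-party website.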

1.3.4.3 The s3kinit script usage to get temporary credentials

1. The user chip is a member of the HadoopAdminUser AD group (ADFS-AdminUser role)

- Ask for credentials to last for 12 hours; the default is 1 hour

[chip@lrmk025 ~]$ s3kinit.py -H 12

Username: [email protected]

Password: <PASSWORD>

Following provider role combination can be used with assertion provided with ECS assumerolewithsaml api

[ 0 ]: urn:ecs:iam::s3:role/ADFS-AdminUser urn:ecs:iam::s3:saml-provider/HadoopProvider

Enter the number for the role to assume (between 0 and 0): 0

2. Temporary Credentials Generated by s3kinit

Access Key: ASIA5FDA91A88B02BD80

Secret Key: 6zsZwdxZMnJkr1Np-lrkwxLheguviDtOh2LNHFDaFuM

Session Token:

CgJzMxoUQVJPQTNCRkQ3NzI3OEM2Rjg4ODEiI3VybjplY3M6aWFtOjpzMzpyb2xlL0FERlMtQW

RtaW5Vc2VyKhRBU0lBNUZEQTkxQTg4QjAyQkQ4MDJQTWFzdGVyS2V5UmVjb3JkLTNkYTRlMmU2

YzIwY2IzODY0NWVlMmViOWQ1ZTFjNTE4MmJhMGFiNDc1YjEwODhhYTk0MGYzMjJlMDI1YTNjZD

U474yH_o4uUhFjaGlwQGRldi5udWxsLmNvbVonCghzYW1sOmF1ZBIbaHR0cHM6Ly8xMC4yNDcu

MTc5LjExMi9zYW1sWhUKCHNhbWw6c3ViEglIREZTXGNoaXBaGwoNc2FtbDpzdWJfdHlwZRIKcG

Vyc2lzdGVudFo7ChJzYW1sOm5hbWVxdWFsaWZpZXISJVsidFcrR3FFVW5ZUmxnUDZHTTJMVkFB

ZytwVTB3XHUwMDNkIl1aNgoIc2FtbDppc3MSKmh0dHA6Ly9BRC5oZGZzLmVtYy5jb20vYWRmcy

9zZXJ2aWNlcy90cnVzdGIsdXJuOmVjczppYW06OnMzOnNhbWwtcHJvdmlkZXIvSGFkb29wUHJv

dmlkZXI

Expiration Date (UTC): 2020-04-10T06:44:16Z

3. Use these Hadoop settings:

-D fs.s3a.secret.key=6zsZwdxZMnJkr1Np-lrkwxLheguviDtOh2LNHFDaFuM

-D fs.s3a.access.key=ASIA5FDA91A88B02BD80

-D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider

-D

fs.s3a.session.token=CgJzMxoUQVJPQTNCRkQ3NzI3OEM2Rjg4ODEiI3VybjplY3M6aWFtO

jpzMzpyb2xlL0FERlMtQWRtaW5Vc2VyKhRBU0lBNUZEQTkxQTg4QjAyQkQ4MDJQTWFzdGVyS2V

5UmVjb3JkLTNkYTRlMmU2YzIwY2IzODY0NWVlMmViOWQ1ZTFjNTE4MmJhMGFiNDc1YjEwODhhY

Tk0MGYzMjJlMDI1YTNjZDU474yH_o4uUhFjaGlwQGRldi5udWxsLmNvbVonCghzYW1sOmF1ZBI

baHR0cHM6Ly8xMC4yNDcuMTc5LjExMi9zYW1sWhUKCHNhbWw6c3ViEglIREZTXGNoaXBaGwoNc

2FtbDpzdWJfdHlwZRIKcGVyc2lzdGVudFo7ChJzYW1sOm5hbWVxdWFsaWZpZXISJVsidFcrR3F

FVW5ZUmxnUDZHTTJMVkFBZytwVTB3XHUwMDNkIl1aNgoIc2FtbDppc3MSKmh0dHA6Ly9BRC5oZ

GZzLmVtYy5jb20vYWRmcy9zZXJ2aWNlcy90cnVzdGIsdXJuOmVjczppYW06OnMzOnNhbWwtcHJ

vdmlkZXIvSGFkb29wUHJvdmlkZXI
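Because the four -D settings above must be repeated on every Hadoop command, it can help to build them programmatically. The following is a minimal sketch; the function name s3a_temp_credential_args and the placeholder credential values are hypothetical, while the property names and the provider class are the ones shown above.

```python
import shlex


def s3a_temp_credential_args(access_key, secret_key, session_token):
    """Return the -D arguments that pass temporary credentials to S3A."""
    settings = [
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
        ("fs.s3a.session.token", session_token),
        (
            "fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        ),
    ]
    args = []
    for name, value in settings:
        args += ["-D", "%s=%s" % (name, value)]
    return args


# Placeholder values; substitute the output of s3kinit.
args = s3a_temp_credential_args(
    "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY", "EXAMPLE_SESSION_TOKEN"
)
cmd = ["hdfs", "dfs"] + args + ["-ls", "s3a://hdfsBucket-s3a/"]
# shlex.join (Python 3.8+) quotes each argument for safe pasting into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string means it can also be passed directly to subprocess.run without shell quoting concerns.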

1.3.4.4 SAML assertion validation with an example

Here we use the s3kinit script for an hdfs.emc.com domain user to get temporary credentials for ECS S3A, and then use those credentials to run Hadoop commands from a Hadoop client.

1. The user chip uses the output of s3kinit for various Hadoop commands

[chip@lrmk025 ~]$ hdfs dfs -D

fs.s3a.secret.key=7ZHagYY93_OAceczZ80J14lMnVh9qpwrj2-J5ua7C6E -D

fs.s3a.access.key=ASIA80797F10CD1D0F85 -D

fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider -D

fs.s3a.session.token=CgJzMxoUQVJPQTNCRkQ3NzI3OEM2Rjg4ODEiI3VybjplY3M6aWFtO

jpzMzpyb2xlL0FERlMtQWRtaW5Vc2VyKhRBU0lBODA3OTdGMTBDRDFEMEY4NTJQTWFzdGVyS2V

5UmVjb3JkLTNkYTRlMmU2YzIwY2IzODY0NWVlMmViOWQ1ZTFjNTE4MmJhMGFiNDc1YjEwODhhY

Tk0MGYzMjJlMDI1YTNjZDU41oWoio8uUhFjaGlwQGRldi5udWxsLmNvbVonCghzYW1sOmF1ZBI

baHR0cHM6Ly8xMC4yNDcuMTc5LjExMi9zYW1sWhUKCHNhbWw6c3ViEglIREZTXGNoaXBaGwoNc

2FtbDpzdWJfdHlwZRIKcGVyc2lzdGVudFo7ChJzYW1sOm5hbWVxdWFsaWZpZXISJVsidFcrR3F

FVW5ZUmxnUDZHTTJMVkFBZytwVTB3XHUwMDNkIl1aNgoIc2FtbDppc3MSKmh0dHA6Ly9BRC5oZ

GZzLmVtYy5jb20vYWRmcy9zZXJ2aWNlcy90cnVzdGIsdXJuOmVjczppYW06OnMzOnNhbWwtcHJ

vdmlkZXIvSGFkb29wUHJvdmlkZXI -ls s3a://hdfsBucket-s3a/tmp/file1

-rw-rw-rw- 1 chip chip 8765 2020-03-12 18:31 s3a://hdfsBucket-s3a/tmp/file1

2. The user sahil is a member of the HadoopPowerUser AD group (ADFS-PowerUser role, no delete privileges). Here <TEMP CREDS> stands for the Hadoop S3A parameters with the temporary credentials generated by the s3kinit script.

[sahil@lrmk025 ~]$ hdfs dfs <TEMP CREDS> -mkdir s3a://hdfsBucket-s3a/tmp/dir2

[sahil@lrmk025 ~]$ hdfs dfs <TEMP CREDS> -rmdir s3a://hdfsBucket-s3a/tmp/dir2

rmdir: s3a://hdfsBucket-s3a/tmp/dir2: delete on s3a://hdfsBucket-s3a/tmp/dir2: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 0af7b370:170cb443416:50cf:1f; S3 Extended Request ID: null), S3 Extended Request ID: null:AccessDenied

3. The user tom is a member of the HadoopReadOnlyUser AD group (ADFS-ReadOnlyUser role, no create or delete privileges)

[tom@lrmk025 ~]$ hdfs dfs <TEMP CREDS> -mkdir s3a://hdfsBucket-s3a/tmp/dir3

mkdir: tmp/dir3/: PUT 0-byte object on tmp/dir3/: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 0af7b370:170cb443416:50cf:3e; S3 Extended Request ID: null), S3 Extended Request ID: null:AccessDenied

1.3.4.5 Access restriction to non-member of any Hadoop AD group

A user who is not a member of any Hadoop-specific AD group cannot access S3A data unless they have access key and secret key credentials.

1. The user khlebp is not a member of any Hadoop-specific AD group, so they cannot access S3A data unless someone provides them with access key and secret key credentials

[khlebp@lrmk025 ~]$ s3kinit.py

Username@Domain: [email protected]

Password:

This user is not configured for any ADFS/ECS roles

1.4 Encrypting IAM credentials using JCEKS

IAM credentials for an IAM user or a SAML-asserted role user can be encrypted and placed in a JCEKS credential file.

1. For IAM users, add the access key and secret key.

- The JCEKS file can be placed on the local file system or on HDFS; it should be accessible only by the current user

[chip@lrmk025 ~]$ hadoop credential create fs.s3a.access.key -value AKIAB015FD8EC3DC6BCB -provider localjceks://file/home/chip/s3A.jceks

[chip@lrmk025 ~]$ hadoop credential create fs.s3a.secret.key -value 'JuXl8trShLj09q3rpgak826JqPzOlhamFSP0SSC8' -provider localjceks://file/home/chip/s3A.jceks

[chip@lrmk025 ~]$ hdfs dfs -D hadoop.security.credential.provider.path=localjceks://file/home/chip/s3A.jceks -ls s3a://hdfsBucket-s3a/

2. For SAML users, the fs.s3a.session.token also should be added to the JCEKS file

- The fs.s3a.aws.credentials.provider keyword should be placed on the command line

or in core-site
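For SAML users, the three credential entries can be generated in one pass. The sketch below only builds the hadoop credential create command lines in the form shown above and does not execute them; the function name and the placeholder values are hypothetical.

```python
def credential_create_commands(provider_path, access_key, secret_key,
                               session_token=None):
    """Build one 'hadoop credential create' command per S3A credential.

    The session token entry is added only for SAML (temporary) credentials.
    """
    entries = [
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
    ]
    if session_token is not None:
        entries.append(("fs.s3a.session.token", session_token))
    return [
        ["hadoop", "credential", "create", alias,
         "-value", value, "-provider", provider_path]
        for alias, value in entries
    ]


# Placeholder values; the provider path follows the example above.
commands = credential_create_commands(
    "localjceks://file/home/chip/s3A.jceks",
    "EXAMPLE_ACCESS_KEY",
    "EXAMPLE_SECRET_KEY",
    session_token="EXAMPLE_SESSION_TOKEN",
)
for command in commands:
    print(" ".join(command))
```

Because temporary credentials expire, a JCEKS file built this way has to be regenerated after each s3kinit run.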

1.5 Bucket owner permissions change

1. Prior to IAM support, the bucket owner, which can be an object user, had superuser privileges for the bucket

- Using their access key and secret key, the bucket owner could access everything in the bucket

2. With IAM support, objects created by IAM users are not accessible by the bucket owner

- Adding a bucket policy for the bucket owner can address this new behavior

- Note that dir2, previously created by federated user sahil, is not accessible by hdfs, the bucket owner

[hdfs@lrmk025 ~]$ hdfs dfs -D fs.s3a.access.key=hdfs -D fs.s3a.secret.key='PwVFc9O7zuTy+FN6/5MdvmruFeqF9CzHZ+N2rF66' -ls -R s3a://hdfsBucket-s3a/tmp

drwxrwxrwx - hdfs hdfs 0 2020-03-18 13:42 s3a://hdfsBucket-s3a/tmp/dir2

ls: tmp/dir2/: getFileStatus on tmp/dir2/: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 0af7b370:170cb443416:547f:b; S3 Extended Request ID: ), S3 Extended Request ID: :403 Forbidden

3. Adding a bucket policy like the one below allows the bucket owner (hdfs) to retain the same control over the bucket

{ "Version": "2012-10-17",

"Id": "Policy on bucket",

"Statement": [ {

"Action": [ "s3:*" ],

"Resource": "hdfsBucket-s3a/*",

"Effect": "Allow",

"Principal": "hdfs",

"Sid": "allow hdfs root access" } ]}

1.6 Restricting access via policy management

1. Suppose that, in the previous examples, /users/payrolluser is a directory containing files with sensitive information that the Hadoop administrator wants to keep from being viewed by HadoopReadOnly policy users

- Since POSIX-style ACL management is not possible for S3A storage, the Hadoop administrator has to use IAM policies and/or object tagging to restrict access

> Object tagging is covered later

2. Use a Deny statement in the HadoopReadOnly policy

- Restricting S3 access using resource wildcards is an easy way to enforce the security control

- Edit the policy permissions and add the statement using the JSON editor

- Summary

> We want to deny access to anything in the bucket with the prefix ‘users/payrolluser’ for those using the HadoopReadOnly policy

3. Add the following JSON block after the “Allow” block in the Permissions dialog for the HadoopReadOnly policy

{ "Action": [ "s3:*" ],

"Resource": "arn:aws:s3:::hdfsBucket-s3a/users/payrolluser/*",

"Effect": "Deny",

"Sid": "VisualEditor1" }

4. Now, the HadoopReadOnly user tom has no access

[tom@lrmk025 ~]$ hdfs dfs <TEMP CREDS> -cat s3a://hdfsBucket-s3a/users/payrolluser/employees.csv

cat: s3a://hdfsBucket-s3a/users/payrolluser/employees.csv: getFileStatus on s3a://hdfsBucket-s3a/users/payrolluser/employees.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 0af7b370:1715a3392ab:259:df; S3 Extended Request ID: ), S3 Extended Request ID: :403 Forbidden
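The result above follows from the usual IAM evaluation rule that an explicit Deny overrides any Allow. The following is a simplified model of that rule, not the ECS policy engine: it only matches actions and resources with shell-style wildcards and ignores principals and conditions. The Allow statement is an assumption made for illustration, since the full HadoopReadOnly policy is not reproduced in this section; the Deny statement is the one added above.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value):
    """True if value matches any pattern; '*' matches any run of characters."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def is_allowed(statements, action, resource):
    """Explicit Deny wins; otherwise at least one Allow must match."""
    allowed = False
    for statement in statements:
        if not _matches(statement["Action"], action):
            continue
        if not _matches(statement["Resource"], resource):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


read_only_policy = [
    {   # Assumed Allow block, for illustration only.
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::hdfsBucket-s3a",
                     "arn:aws:s3:::hdfsBucket-s3a/*"],
    },
    {   # Deny block from this section.
        "Effect": "Deny",
        "Action": ["s3:*"],
        "Resource": "arn:aws:s3:::hdfsBucket-s3a/users/payrolluser/*",
    },
]

bucket = "arn:aws:s3:::hdfsBucket-s3a"
print(is_allowed(read_only_policy, "s3:GetObject", bucket + "/tmp/file1"))
print(is_allowed(read_only_policy, "s3:GetObject",
                 bucket + "/users/payrolluser/employees.csv"))
```

A quick model like this can be useful for reasoning about a policy change before testing it against the real bucket.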

1.7 ECS IAM users and groups access configurations

ECS IAM users and groups can be created and used as a secure means to control access to the S3A storage. The approach is to create IAM policies that define permissions appropriate for the customer business case. Once the policies are in place, IAM groups can be created and the policies attached to them. Individual IAM users can then be created and made members of the IAM groups. IAM users are assigned S3 access keys and secret keys that can be used to access the S3A data, subject to the IAM policy that applies to the IAM user.

For demonstration purposes, building on the previous examples, we use IAM groups and users to create a user that is governed by the HadoopReadOnly policy.

1.7.1 Create IAM group

1. Log into ECS as a native system administrator (for example, root)

2. Navigate to Identity and Access (S3), select Groups, and then New Group

a. Group Name: IAMReadOnlyGroup

b. Attach HadoopReadOnly policy

c. Save

Dell EMC ECS IAM UI create IAM Group

1.7.2 Create IAM user

1. Log into ECS as a native system administrator (for example, root)

2. Navigate to Identity and Access (S3), select Users, and then New User

a. Name: IAMReadOnlyUser

b. Group: IAMReadOnlyGroup

c. Create User

d. In the New User dialog, expand the Secret Key

e. Record Access Key and Secret Key

f. Download CSV

Create IAM User

Like AWS, the Secret Key is only displayed once when the user is created

1.7.3 Validate the IAM User

Using the credentials for the “IAMReadOnlyUser” user created above, run the Hadoop cat command. The user should not be allowed to open the file.

[tom@lrmk025 ~]$ hdfs dfs <IAM CREDS> -cat s3a://hdfsBucket-s3a/users/payrolluser/employees.csv

cat: s3a://hdfsBucket-s3a/users/payrolluser/employees.csv: getFileStatus on s3a://hdfsBucket-s3a/users/payrolluser/employees.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 0af7b370:1715a3392ab:259:df; S3 Extended Request ID: ), S3 Extended Request ID: :403 Forbidden

2 SAML assertions vs IAM users/groups

This section compares SAML assertions with ECS IAM users and groups as options for accessing the S3A storage, and indicates which option to choose in different scenarios.

2.1 SAML assertions

1. Use when the number of users who need access is large and users are authenticated via Active Directory Federation Services

2. More work upfront to create the cross-trust relationship between ECS and ADFS (somewhat complicated)

3. More secure

a. Credentials are temporary

b. Credentials allocated per-assertion

2.2 IAM users and groups

1. Use when there are a small number of users

2. Use when IAM access is needed but the user is not interactive, for example automated analytics scripts that need pre-generated credentials and cannot run the interactive s3kinit script

3. Less upfront work to create groups and users (easier)

4. Use when users are not federated via Active Directory Federation Services

a. Kerberos, OpenLDAP, etc.

5. Less secure

a. Credentials are “long-term”

b. Users might be careless and expose their credentials

3 Object tagging to protect individual objects

3.1 Object tagging

Use object tagging to categorize storage; each tag is a key-value pair. Object tags enable fine-grained access control. For example, you could grant an IAM user permission to read only objects with specific tags.

3.2 Example object tagging to protect individual objects

3.2.1 Example overview

1. In this example, we use object tagging to restrict an IAM user's access to any file that is tagged with a “Department” tag of “finance”

2. Only IAM/SAML users with ‘PutObjectTagging’ privileges can tag an object

3. Querying object tags imposes a minor latency similar to reading system metadata

4. IAM users and IAM roles can also have tags assigned to them

5. Example

a. Let’s assume there is a file in /users/chip/private called chip.private.txt

b. We will create an IAM user (IAMChipAdmin) who is a member of the HadoopAdmins group, which is attached to the HadoopFullControl policy

c. We will update all policies to deny access (GetObject) to any file that has an object tag of Department = “finance”

- Federated user tom, who assumes the ADFS-ReadOnlyUser role, should not have read access to this file, even though the HadoopReadOnly policy grants read access to all files in the bucket

d. Only IAM user ‘IAMChipAdmin’ should be able to access (read) this file

e. We will create a key/value tag to assign to employees.csv

- <Key>Department</Key>

- <Value>finance</Value>

3.2.2 Step 1: Create tag XML file

1. This does not necessarily have to be done via a file; the tag can also be supplied on the command line

2. The s3curl.pl script is the preferred tool for object tagging for IAM users

a. SAML users need to build a curl command that includes the session token

[chip@lrmk025 ~]$ cat payroll_object_tag.xml

<Tagging xmlns="http://s3.amazonaws.com/doc/2006-03-01/">

<TagSet>

<Tag>

<Key>Department</Key>

<Value>finance</Value>

</Tag>

</TagSet>

</Tagging>
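When several objects or tag sets are involved, the tagging document can be generated rather than edited by hand. The following is a minimal sketch using only the Python standard library; the function name build_tagging_xml is hypothetical, and the output is written without the indentation shown above, which does not change its meaning.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_tagging_xml(tags):
    """Serialize a dict of tag key/value pairs as an S3 Tagging document."""
    # Emit the S3 namespace as the default namespace (no prefix).
    ET.register_namespace("", S3_NS)
    root = ET.Element("{%s}Tagging" % S3_NS)
    tag_set = ET.SubElement(root, "{%s}TagSet" % S3_NS)
    for key, value in tags.items():
        tag = ET.SubElement(tag_set, "{%s}Tag" % S3_NS)
        ET.SubElement(tag, "{%s}Key" % S3_NS).text = key
        ET.SubElement(tag, "{%s}Value" % S3_NS).text = value
    return ET.tostring(root, encoding="unicode")


tagging_xml = build_tagging_xml({"Department": "finance"})
with open("payroll_object_tag.xml", "w") as handle:
    handle.write(tagging_xml)
print(tagging_xml)
```

The resulting file can then be uploaded with the same s3curl.pl --put command used for a hand-written file.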

3.2.3 Step 2: Add Tag to file and confirm

3.2.3.1 Object tag using non secure port

[chip@lrmk025 ~]$ s3curl.pl --id=AKIA431FB00150EC3F79 --key='mJ+8mcE6IJhK7KM9sF6fUgF0/fwHIEI798+MI6z1' --ord --put ./payroll_object_tag.xml http://10.247.179.112:9020/hdfsBucket-s3a/tmp/employees.csv?tagging

3.2.3.2 Object tag using secure port

To use the secure S3 port, use port 9021 and https

- You may also have to pass the -k argument to ignore certificate warnings (add -- before it to pass it through to the curl command)

[chip@lrmk025 ~]$ s3curl.pl --debug --id=AKIA431FB00150EC3F79 --key='mJ+8mcE6IJhK7KM9sF6fUgF0/fwHIEI798+MI6z1' --ord https://10.247.179.112:9021/hdfsBucket-s3a/tmp/employees.csv?tagging -- -k | xmllint --format -

3.2.3.3 Confirm object tag

[chip@lrmk025 ~]$ s3curl.pl --id=AKIA431FB00150EC3F79 --key='mJ+8mcE6IJhK7KM9sF6fUgF0/fwHIEI798+MI6z1' --ord http://10.247.179.112:9020/hdfsBucket-s3a/tmp/employees.csv?tagging | xmllint --format -

<Tagging xmlns="http://s3.amazonaws.com/doc/2006-03-01/">

<TagSet>

<Tag>

<Key>Department</Key>

<Value>finance</Value>

</Tag>

</TagSet>

</Tagging>

3.2.4 Step 3: Create payroll authorized user

The IAMChipAdmin IAM user has the Department tag set to finance

Dell EMC ECS IAM user tag page

3.2.5 Step 4: Bucket policy to enforce access control

We add to each policy a statement that denies s3:GetObject actions for all users if the object's Department tag is “finance”

{

"Condition": {

"StringEquals": { "s3:ExistingObjectTag/Department": "finance" } },

"Action": [ "s3:GetObject" ],

"Resource": "arn:aws:s3:::hdfsBucket-s3a/*",

"Effect": "Deny"

}
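How the tag condition in this statement takes effect can be illustrated with a small model. This is a simplified sketch, not the ECS policy engine: it handles only the StringEquals operator on s3:ExistingObjectTag/ keys, skips resource matching, and the function names are hypothetical.

```python
TAG_KEY_PREFIX = "s3:ExistingObjectTag/"


def tag_condition_matches(condition, object_tags):
    """True if every StringEquals tag condition holds for the object's tags."""
    for key, expected in condition.get("StringEquals", {}).items():
        if not key.startswith(TAG_KEY_PREFIX):
            return False  # condition keys outside the scope of this sketch
        tag_name = key[len(TAG_KEY_PREFIX):]
        if object_tags.get(tag_name) != expected:
            return False
    return True


def denied_by_statement(statement, action, object_tags):
    """True if a Deny statement applies to this action on a tagged object."""
    return (
        statement["Effect"] == "Deny"
        and action in statement["Action"]
        and tag_condition_matches(statement.get("Condition", {}), object_tags)
    )


finance_deny = {
    "Condition": {
        "StringEquals": {"s3:ExistingObjectTag/Department": "finance"}
    },
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::hdfsBucket-s3a/*",
    "Effect": "Deny",
}

print(denied_by_statement(finance_deny, "s3:GetObject",
                          {"Department": "finance"}))
print(denied_by_statement(finance_deny, "s3:GetObject", {}))
```

Under this model an untagged object, or one tagged with a different department, is unaffected by the statement and remains governed by the rest of the policy.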

3.2.6 Step 5: Testing object access

1. First, we try to read the file using the credentials for IAM user ‘IAMChipAdmin’

- This user has no policy ‘Deny’ rule covering files with a Department tag of ‘finance’

[chip@lrmk025 ~]$ hdfs dfs -D hadoop.security.credential.provider.path=localjceks://file/home/chip/s3A.jceks -cat s3a://hdfsBucket-s3a/users/chip/private/chip.private.txt

This file is only visible to chip

2. Now we try as SAML user tom

[sahil@lrmk025 ~]$ hdfs dfs <TEMP CREDENTIALS> -cat s3a://hdfsBucket-s3a/users/chip/private/chip.private.txt

cat: s3a://hdfsBucket-s3a/users/chip/private/chip.private.txt: getFileStatus on s3a://hdfsBucket-s3a/users/chip/private/chip.private.txt: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 0af7b370:1715a3392ab:4c93:57; S3 Extended Request ID: ), S3 Extended Request ID: :403 Forbidden

4 Tips

1. Hadoop administrators and storage administrators should work together to plan the policies that will be implemented.

2. When you run s3kinit to get temporary credentials, you might want to store the credentials in a secure file location for future reference

- Pay attention to the credential expiration time/date

- If you lose or forget the credentials, re-run the s3kinit script

3. When defining policies, simple mistakes can cause unexpected results.

- Test often

4. When temporary credentials expire, Hadoop tools report the same error that you would see if you did not have access

- This is consistent with AWS

[chip@lrmk025 ~]$ hdfs dfs -ls s3a://hdfsBucket-s3a/

ls: s3a://hdfsBucket-s3a/: getFileStatus on s3a://hdfsBucket-s3a/: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 0af7b370:1715a3392ab:23e:1; S3 Extended Request ID: null), S3 Extended Request ID: null:AccessDenied
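Since an expired credential is indistinguishable from a permissions problem in the error output, it can help to check the expiration before running a job. The following is a minimal sketch that assumes the expiration string has the same format as the “Expiration Date (UTC)” line printed by s3kinit; the function names and the five-minute safety margin are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def parse_expiration(text):
    """Parse a timestamp such as 2020-04-10T06:44:16Z as an aware UTC datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(expiration_text, now=None, margin=timedelta(minutes=5)):
    """True if the credentials have expired or will expire within the margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + margin >= parse_expiration(expiration_text)


expiration = "2020-04-10T06:44:16Z"
early = datetime(2020, 4, 10, 6, 0, 0, tzinfo=timezone.utc)
late = datetime(2020, 4, 10, 6, 40, 0, tzinfo=timezone.utc)
print(is_expired(expiration, now=early))
print(is_expired(expiration, now=late))
```

If the check reports that the credentials are expired or close to expiring, re-run s3kinit before starting the job.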

A Technical support and resources

Dell.com/support is focused on meeting customer needs with proven services and support.

A.1 Related resources

1. Dell EMC ECS IAM Overview

2. ECS Identity and Access Management (IAM)

3. ECS Security Assertion Markup Language (SAML)

4. ECS Identity Provider

5. Cloudera Security Best practices

6. Cloudera creating AWS IAM Policies

7. AWS IAM Roles

8. Enabling Federation to AWS Using Windows Active Directory, ADFS, and SAML 2.0

9. MS Create Relying Party trust

10. Configure SAML Assertions for the authentication response