
Page 1: Hadoop security

By: Shrey Mehrotra

Page 2: Hadoop security

A form of protection where a separation is created between the assets and the threat.

Security in IT realm:

Application security

Computing security

Data security

Information security

Network security

Page 3: Hadoop security

Data : We have critical data in HDFS that must be protected.

Resources : Each node of the Hadoop cluster has the resources required for executing applications.

Applications : Web applications and REST APIs provide access to cluster details.

Services : HDFS, YARN and other services running on the cluster nodes.

Network Security : Services and applications communicate over the network.

Page 4: Hadoop security

Configuration

Data confidentiality

Service Level Authorization

Encryption

Authentication for Hadoop HTTP web-consoles

Delegation Tokens

Kerberos

Page 5: Hadoop security

core-site.xml

hadoop.security.authentication = kerberos
  simple : no authentication (default)
  kerberos : enable authentication via Kerberos

hadoop.security.authorization = true
  Enable RPC service-level authorization.

hadoop.rpc.protection = authentication
  authentication : authentication only (default)
  integrity : integrity check in addition to authentication
  privacy : data encryption in addition to integrity

hadoop.proxyuser.<superuser>.hosts
  Comma-separated list of hosts from which the superuser is allowed to connect for impersonation; * means any host.

hadoop.proxyuser.<superuser>.groups
  Comma-separated list of groups to which users impersonated by the superuser must belong; * means any group.

hdfs-site.xml

dfs.block.access.token.enable = true
  Enable HDFS block access tokens for secure operations.

dfs.https.enable = true
  Deprecated; use dfs.http.policy instead.

dfs.namenode.https-address = nn_host_fqdn:50470

dfs.https.port = 50470

dfs.namenode.keytab.file = /etc/security/keytab/nn.service.keytab
  Kerberos keytab file for the NameNode.

dfs.namenode.kerberos.principal = nn/_HOST@REALM.TLD
  Kerberos principal name for the NameNode.

dfs.namenode.kerberos.internal.spnego.principal = HTTP/_HOST@REALM.TLD
  HTTP Kerberos principal name for the NameNode.
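Since dfs.https.enable is deprecated, a minimal sketch of the replacement setting in hdfs-site.xml (the other accepted values are HTTP_ONLY and HTTP_AND_HTTPS):

<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>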

Page 6: Hadoop security

A superuser can submit jobs or access HDFS on behalf of another user in a secure way.

The superuser must have Kerberos credentials to be able to impersonate another user.

Example: a superuser "bob" wants to submit a job or access the HDFS cluster as "alice":

// Create a proxy-user ugi for alice. The login user is the superuser "bob".
UserGroupInformation ugi =
    UserGroupInformation.createProxyUser("alice", UserGroupInformation.getLoginUser());

ugi.doAs(new PrivilegedExceptionAction<Void>() {
  public Void run() throws Exception {
    // Submit a job as alice
    JobClient jc = new JobClient(conf);
    jc.submitJob(conf);
    // OR access HDFS as alice
    FileSystem fs = FileSystem.get(conf);
    fs.mkdirs(someFilePath);
    return null;
  }
});

Page 7: Hadoop security

The superuser must be configured on the NameNode and ResourceManager to be allowed to impersonate another user. The following configuration is required:

<property>

<name>hadoop.proxyuser.super.groups</name>

<value>group1,group2</value>

<description>Allow the superuser super to impersonate any member of the groups group1 and group2</description>

</property>

<property>

<name>hadoop.proxyuser.super.hosts</name>

<value>host1,host2</value>

<description>The superuser can connect only from host1 and host2 to impersonate a user</description>

</property>
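
Assuming a Hadoop release that ships these admin commands, the proxy-user settings can then be reloaded on a running cluster without a restart:

hdfs dfsadmin -refreshSuperUserGroupsConfiguration

yarn rmadmin -refreshSuperUserGroupsConfiguration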

Page 8: Hadoop security

Service-level authorization is the initial authorization mechanism that ensures clients connecting to a particular Hadoop service have the necessary, pre-configured permissions and are authorized to access the given service.

For example, a MapReduce cluster can use this mechanism to allow a configured list of users/groups to submit jobs.

By default, service-level authorization is disabled for Hadoop.

To enable it, set the following configuration property in core-site.xml:

<property>

<name>hadoop.security.authorization</name>

<value>true</value>

</property>

Page 9: Hadoop security

hadoop-policy.xml defines an access control list for each Hadoop service.

Every ACL has a simple format: a comma-separated list of users and a comma-separated list of groups, separated by a space. Example: user1,user2 group1,group2.

Blocked Access Control Lists

A blocked ACL lists users and groups who may NOT access a service; its name is the service ACL name with ".blocked" appended. Example: the blocked ACL for security.client.protocol.acl is security.client.protocol.acl.blocked.
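
A minimal sketch of a blocked ACL in hadoop-policy.xml (the user and group names here are hypothetical):

<property>
  <name>security.client.protocol.acl.blocked</name>
  <value>mallory contractors</value>
</property>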

Refreshing Service Level Authorization Configuration

hadoop dfsadmin -refreshServiceAcl

Page 10: Hadoop security

Allow only users alice, bob and users in the mapreduce group to submit jobs to the MapReduce cluster:

<property>

<name>security.job.submission.protocol.acl</name>

<value>alice,bob mapreduce</value>

</property>

Allow only DataNodes running as the users who belong to the group datanodes to communicate with the NameNode:

<property>

<name>security.datanode.protocol.acl</name>

<value>datanodes</value>

</property>

Allow any user to talk to the HDFS cluster as a DFSClient:

<property>

<name>security.client.protocol.acl</name>

<value>*</value>

</property>

Page 11: Hadoop security

Data Encryption on RPC

• The data transferred between Hadoop services and clients.

• Setting hadoop.rpc.protection to "privacy" in core-site.xml activates data encryption.

Data Encryption on Block Data Transfer

• Set dfs.encrypt.data.transfer to "true" in hdfs-site.xml.

• Set dfs.encrypt.data.transfer.algorithm to either "3des" or "rc4" to choose the specific encryption algorithm.

• By default, 3DES is used.

Data Encryption on HTTP

• Data transferred between the web console and clients is protected using SSL (HTTPS).
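
For example, the block-data-transfer settings above would look like this in hdfs-site.xml:

<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>rc4</value>
</property>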

Page 12: Hadoop security

HDFS implements a permissions model for files and directories that shares much of the POSIX model.

User Identity

simple : In this mode of operation, the identity of a client process is determined by the host operating system.

kerberos : In Kerberized operation, the identity of a client process is determined by its Kerberos credentials.

Group Mapping

org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback

org.apache.hadoop.security.ShellBasedUnixGroupsMapping

HDFS stores the user and group of a file or directory as strings; there is no conversion from user and group identity numbers as is conventional in Unix.
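
A minimal sketch of selecting one of these group mapping classes in core-site.xml (the property is hadoop.security.group.mapping; the JNI-based class with shell fallback is the usual default):

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>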

Page 13: Hadoop security

Shell Operations

• hadoop fs -chmod [-R] mode file

• hadoop fs -chgrp [-R] group file

• hadoop fs -chown [-R] [owner][:[group]] file

The Super-User

The super-user is the user with the same identity as the NameNode process itself. Permissions checks never fail for the super-user.

There is no persistent notion of who the super-user was; when the NameNode is started, the process identity determines who the super-user is for now.

WebHDFS

Uses Kerberos (SPNEGO) and Hadoop delegation tokens for authentication.

Page 14: Hadoop security

An ACL provides a way to set different permissions for specific named users or named groups, not only the file's owner and the file's group.

By default, support for ACLs is disabled.

Enable ACLs by adding the following configuration property to hdfs-site.xml and restarting the NameNode

<property>

<name>dfs.namenode.acls.enabled</name>

<value>true</value>

</property>

Page 15: Hadoop security

ACLs Shell Commands

hdfs dfs -getfacl [-R] <path>

hdfs dfs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>] | [--set <acl_spec> <path>]

-R : Recursive.

-m : Modify ACL entries.

-b : Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility with permission bits.

-k : Remove the default ACL.

-x : Remove specified ACL entries.

<acl_spec> : Comma-separated list of ACL entries.

--set : Fully replace the ACL, discarding all existing entries.

hdfs dfs -ls <args>

ls will append a '+' character to the permissions string of any file or directory that has an ACL.
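
As a sketch (the path, user and permissions here are hypothetical):

hdfs dfs -setfacl -m user:alice:rw- /data/reports

hdfs dfs -getfacl /data/reports

hdfs dfs -ls /data

After the setfacl, ls shows a '+' after the permission bits of /data/reports.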

Page 17: Hadoop security

Tokens are generated for applications and containers.

An HMAC algorithm (HMAC_ALGORITHM) is used to generate the passwords for tokens.

YARN interfaces for token secret managers:

BaseNMTokenSecretManager

AMRMTokenSecretManager

BaseClientToAMTokenSecretManager

BaseContainerTokenSecretManager

Source : Hortonworks

Page 18: Hadoop security

Enable ACL checks in YARN:

<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>

Queue ACLs

QueueACLsManager checks each user's access against the ACL defined on the queue.

The following would restrict submission to the "support" queue to the user "shrey" and members of the "sales" group, and restrict administration of the queue to the "sales" group (replace <queue-path> with the queue's path):

<property>
  <name>yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications</name>
  <value>shrey sales</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue</name>
  <value>sales</value>
</property>
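
After editing queue ACLs in capacity-scheduler.xml, they can typically be applied to a running ResourceManager without a restart:

yarn rmadmin -refreshQueues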

Page 19: Hadoop security
Page 20: Hadoop security

[Diagram: a client authenticating to services by presenting a password, sent either in plain text or encrypted]

Page 21: Hadoop security

Kerberos is a network authentication protocol.

It is used to authenticate the identity of services running on different nodes (machines) that communicate over a non-secure network.

It uses "tickets" as the basic unit of authentication.

Page 22: Hadoop security

Authentication Server

A service used to authenticate or verify clients. It usually checks for the username of the requesting client in the system.

Ticket Granting Server

It generates Ticket Granting Tickets (TGTs) based on the target service name, the initial ticket (if any) and the authenticator.

Principals

A principal is the unique identity to which Kerberos can assign the tickets provided by the Ticket Granting Server.
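
For example, a user obtains an initial ticket with kinit and inspects it with klist (the principal and realm here are hypothetical):

kinit alice@EXAMPLE.COM

klist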

Page 23: Hadoop security
Page 24: Hadoop security
Page 25: Hadoop security
Page 26: Hadoop security
Page 27: Hadoop security
Page 28: Hadoop security
Page 29: Hadoop security

To enable Kerberos authentication in Hadoop, we need to configure the following properties in core-site.xml:

<property>

<name>hadoop.security.authentication</name>

<value>kerberos</value>

<!-- Giving value as "simple" disables security.-->

</property>

<property>

<name>hadoop.security.authorization</name>

<value>true</value>

</property>

Page 30: Hadoop security

A keytab is a file containing Kerberos principals and encrypted keys. These files are used to log in to Kerberos directly, without being prompted for a password.

Enabling Kerberos for HDFS services:

A. Generating the keytab

Create the hdfs keytab file that will contain the hdfs principal and the HTTP principal. This keytab file is used by the NameNode and the DataNode.

B. Extracting the principals and installing the keytab

kadmin: xst -norandkey -k hdfs.keytab hdfs/fully.qualified.domain.name HTTP/fully.qualified.domain.name

sudo mv hdfs.keytab /etc/hadoop/conf/
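
The contents of the keytab can be verified with klist (-k lists the keys in a keytab, -t shows timestamps, -e shows encryption types):

klist -k -t -e /etc/hadoop/conf/hdfs.keytab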

Page 31: Hadoop security

Add the following properties to the hdfs-site.xml file.

<!-- NameNode security configs -->

<property>
  <name>dfs.namenode.keytab.file</name>
  <!-- path to the HDFS keytab -->
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>

<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>

<!-- DataNode security configs -->

<property>
  <name>dfs.datanode.keytab.file</name>
  <!-- path to the HDFS keytab -->
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>

<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@YOUR-REALM.COM</value>
</property>
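
Once these are in place, a service or client can also log in from the keytab programmatically. A minimal sketch using Hadoop's UserGroupInformation API (the principal and paths below are the hypothetical values from above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLogin {
  public static void main(String[] args) throws Exception {
    // Expect hadoop.security.authentication=kerberos, as configured above.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);

    // Authenticate from the keytab instead of prompting for a password.
    UserGroupInformation.loginUserFromKeytab(
        "hdfs/host1.example.com@YOUR-REALM.COM",  // hypothetical principal
        "/etc/hadoop/conf/hdfs.keytab");

    // Subsequent HDFS access runs as the keytab principal.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
  }
}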