

How to Set Up a Hadoop Cluster Using Oracle Solaris

Hands-On Labs of the System Admin and Developer Community of OTN

by Orgad Kimchi with contributions from Jeff Taylor

How to set up a Hadoop cluster using the Oracle Solaris Zones, ZFS, and network virtualization technologies.

Lab Introduction

This hands-on lab presents exercises that demonstrate how to set up an Apache Hadoop cluster using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, and network virtualization. Key topics include the Hadoop Distributed File System (HDFS) and the Hadoop MapReduce programming model.

We will also cover the Hadoop installation process and the cluster building blocks: a NameNode, a secondary NameNode, and DataNodes. In addition, you will see how you can combine the Oracle Solaris 11 technologies for better scalability and data security, and you will learn how to load data into the Hadoop cluster and run a MapReduce job.

Prerequisites

This hands-on lab is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster in production or development environments. Basic Linux or Oracle Solaris system administration experience is a prerequisite. Prior knowledge of Hadoop is not required.

System Requirements

This hands-on lab is run on Oracle Solaris 11 in Oracle VM VirtualBox. The lab is self-contained; all you need is in the Oracle VM VirtualBox instance.

For those attending the lab at Oracle OpenWorld, your laptops are already preloaded with the correct Oracle VM VirtualBox image.

If you want to try this lab outside of Oracle OpenWorld, you will need an Oracle Solaris 11 system. Do the following to set up your machine:

1. If you do not have Oracle Solaris 11, download it here.
2. Download the Oracle Solaris 11.1 VirtualBox Template (file size 1.7 GB).
3. Install the template as described here. (Note: On step 4 of Exercise 2 for installing the template, set the RAM size to 4 GB in order to get good performance.)

Page 2: How to Set Up a Hadoop Cluster Using Oracle Solarisdocs.huihoo.com/oracle/openworld/2014/HOL2086-Set-Up-a-Hadoop-2... · How to Set Up a Hadoop Cluster Using Oracle Solaris Hands-On

Notes for Oracle Open World Attendees

Each attendee will have his or her own laptop for the lab.

In this lab, we are going to use the "welcome1" password for all the user accounts.

Oracle Solaris 11 uses the GNOME desktop. If you have used the desktops on Linux or other UNIX operating systems, the interface should be familiar. Here are some quick basics in case the interface is new for you:

- To open a terminal window in the GNOME desktop, right-click the background of the desktop and select Open Terminal in the pop-up menu.
- The following source code editors are provided on the lab machines: vi (type vi in a terminal window) and emacs (type emacs in a terminal window).

Summary of Lab Exercises

This hands-on lab consists of the following exercises covering various Oracle Solaris and Apache Hadoop technologies:

Download and Install Hadoop

Configure the Network Time Protocol

Create the Scripts

Create the NameNodes, DataNodes, and ResourceManager Zones

Configure the Active NameNode

Set Up SSH

Set Up the Standby NameNode and the ResourceManager

Set Up the DataNode Zones

Verify the SSH Setup

Verify Name Resolution

Format the Hadoop File System

Start the Hadoop Cluster

About Hadoop High Availability

Configure Manual Failover

About Apache ZooKeeper and Automatic Failover

Configure Automatic Failover

Conclusion

The Case for Hadoop

The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data and is suitable for applications that have large data sets.

For more information about Hadoop and HDFS, see http://hadoop.apache.org/.


The Hadoop cluster building blocks are as follows:

- Active NameNode: The centerpiece of HDFS, which stores file system metadata and is responsible for all client operations.
- Standby NameNode: A secondary NameNode that synchronizes its state with the active NameNode in order to provide fast failover if the active NameNode goes down.
- ResourceManager: The global resource scheduler, which directs the slave NodeManager daemons to perform the low-level I/O tasks.
- DataNodes: Nodes that store the data in the HDFS file system and are also known as slaves; these nodes run the NodeManager process that communicates with the ResourceManager.
- History Server: Provides information about finished jobs and exposes REST APIs through which users can query the status of finished applications.

In previous Hadoop versions, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Hadoop version 2.2 provides the ability to build an HDFS cluster with high availability (HA), and this article describes the steps involved in building such a configuration.

In the example presented in this article, all the Hadoop cluster building blocks are installed using Oracle Solaris Zones, ZFS, and Unified Archive. Figure 1 shows the architecture:

Figure 1

Exercise 1: Install Hadoop

1. In Oracle VM VirtualBox, enable a bidirectional "shared clipboard" between the host and the guest in order to enable copying and pasting text from this file.


Figure 2

In this lab, we will use the Apache Hadoop "15 October, 2013: Release 2.2.0" release.

Note: Oracle OpenWorld attendees can skip the following step (because the preloaded Oracle VM VirtualBox image already provides the Hadoop image).

Download the Hadoop binary file using a web browser: open the Firefox web browser from the desktop and download the file.


Figure 3

Open a terminal window by right-clicking any point in the background of the desktop and selecting Open Terminal in the pop-up menu.


Figure 4

Important: In the examples presented in this article, the command prompt indicates which user needs to run each command, in addition to indicating the environment where the command should be run. For example, the command prompt root@global_zone:~# indicates that user root needs to run the command from the global zone.

Note: For Oracle OpenWorld attendees, the root password has been provided in the one-pager associated with this lab. For those running this lab outside of Oracle OpenWorld, enter the root password you entered when you followed the steps in the "System Requirements" section.

oracle@global_zone:~$ su -
Password:
Oracle Corporation SunOS 5.11 11.1 September 2012

Set up the virtual network interface card (VNIC) in order to enable network access to the global zone from the non-global zones:

root@global_zone:~# dladm create-vnic -l net0 vnic0
root@global_zone:~# ipadm create-ip vnic0
root@global_zone:~# ipadm create-addr -T static -a local=192.168.1.100/24 vnic0/addr

Verify the VNIC creation:

root@global_zone:~# ipadm show-addr vnic0
ADDROBJ           TYPE     STATE        ADDR
vnic0/addr        static   ok           192.168.1.100/24


In the global zone, create the /usr/local directory if it doesn't exist.

Note: The cluster configuration will share the Hadoop directory structure (/usr/local/hadoop) across the zones as a read-only file system. Every Hadoop cluster node needs to be able to write its logs to an individual directory, and /var/log is a best-practice location for this in every Oracle Solaris Zone.

root@global_zone:~# mkdir -p /usr/local

1. Copy the Hadoop tarball to /usr/local:

root@global_zone:~# cp /export/home/oracle/hadoop-2.2.0.tar.gz /usr/local

Unpack the tarball:

root@global_zone:~# cd /usr/local
root@global_zone:~# tar -xzf /usr/local/hadoop-2.2.0.tar.gz

2. Create the hadoop group:

root@global_zone:~# groupadd -g 200 hadoop

3. Create a symlink for the Hadoop binaries:

root@global_zone:~# ln -s /usr/local/hadoop-2.2.0 /usr/local/hadoop

4. Give ownership to the hadoop group:

root@global_zone:~# chown -R root:hadoop /usr/local/hadoop-2.2.0

5. Change the permissions:

root@global_zone:~# chmod -R 755 /usr/local/hadoop-2.2.0

6. Edit the Hadoop configuration files, which are shown in Table 1.

Table 1. Hadoop Configuration Files

File Name        Description
hadoop-env.sh    Specifies environment variable settings used by Hadoop
yarn-env.sh      Specifies environment variable settings used by YARN
mapred-env.sh    Specifies environment variable settings used by MapReduce
slaves           Contains a list of machine names that run the DataNode and NodeManager pair of daemons
core-site.xml    Specifies parameters relevant to all Hadoop daemons and clients
hdfs-site.xml    Specifies parameters used by the HDFS daemons and clients
mapred-site.xml  Specifies parameters used by the MapReduce daemons and clients
yarn-site.xml    Specifies the configurations for the ResourceManager and NodeManager

7. Run the following commands to change the hadoop-env.sh script:

root@global_zone:~# export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
root@global_zone:~# cd $HADOOP_CONF_DIR

Append the following lines to the hadoop-env.sh script:

root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> hadoop-env.sh
root@global_zone:~# echo "export HADOOP_LOG_DIR=/var/log/hadoop/hdfs" >> hadoop-env.sh

Append the following lines to the yarn-env.sh script:

root@global_zone:~# vi yarn-env.sh

export JAVA_HOME=/usr/java

export YARN_LOG_DIR=/var/log/hadoop/yarn

export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Append the following lines to the mapred-env.sh script:

root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> mapred-env.sh
root@global_zone:~# echo "export HADOOP_MAPRED_LOG_DIR=/var/log/hadoop/mapred" >> mapred-env.sh
root@global_zone:~# echo "export HADOOP_MAPRED_IDENT_STRING=mapred" >> mapred-env.sh

Edit the slaves file to replace the localhost entry with the following lines:


root@global_zone:~# vi slaves

data-node1

data-node2

data-node3

Edit the core-site.xml file so it looks like the following.

Note: fs.defaultFS is the URI that describes the NameNode address (protocol specifier, host name, and port) for the cluster. Each DataNode instance will register with this NameNode and make its data available through it. In addition, the DataNodes send heartbeats to the NameNode to confirm that each DataNode is operating and the block replicas it hosts are available.

root@global_zone:~# vi core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://name-node1</value>
  </property>
</configuration>
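Because fs.defaultFS supplies the default file system URI, relative HDFS paths resolve against it. As a small illustration (these commands assume the cluster is already formatted and running, which happens in later exercises), the following two commands list the same directory:

hdfs@name-node1:~$ hadoop fs -ls /user
hdfs@name-node1:~$ hadoop fs -ls hdfs://name-node1/user

Both forms are equivalent once this core-site.xml is in place; the shorter first form is what the rest of this lab uses.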

Edit the hdfs-site.xml file so it looks like the following.

Notes:

- dfs.datanode.data.dir: The path on the local file system in which the DataNode instance should store its data.
- dfs.namenode.name.dir: The path on the local file system where the NameNode instance stores its metadata. It is used only by the NameNode instance to find its information.
- dfs.replication: The default replication factor for each block of data in the file system. (For a production cluster, this should usually be left at its default value of 3.)
- dfs.permission.supergroup: Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of hadoop or pick your own group, depending on the security policies at your site.

root@global_zone:~# vi hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/data/1/dfs/dn</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/data/1/dfs/nn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permission.supergroup</name>
    <value>hadoop</value>
  </property>
</configuration>
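If you want to confirm which values the Hadoop tools will actually pick up from this file, the hdfs getconf utility prints the effective configuration. A quick sanity check (a sketch, assuming Java is available in the global zone at /usr/java and HADOOP_CONF_DIR points at the directory edited above):

root@global_zone:~# export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
root@global_zone:~# /usr/local/hadoop/bin/hdfs getconf -confKey dfs.replication
3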

Create and then edit the mapred-site.xml file so it looks like the following.

Notes:

- mapreduce.framework.name: Sets the execution framework to Hadoop YARN.
- mapreduce.jobhistory.address: Specifies the MapReduce History Server's host:port.
- mapreduce.jobhistory.webapp.address: Specifies the MapReduce History Server's web UI host:port.
- yarn.app.mapreduce.am.staging-dir: Specifies a staging directory, which YARN requires for temporary files created by running jobs.

root@global_zone:~# cp mapred-site.xml.template mapred-site.xml
root@global_zone:~# vi mapred-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>resource-manager:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>resource-manager:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
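The History Server ports configured above become reachable once the server is started in a later exercise, and the REST APIs mentioned in the building-blocks overview answer on the web UI port. A hedged sketch, assuming the server is up and that curl is installed (any HTTP client will do):

bob@name-node1:~$ curl http://resource-manager:19888/ws/v1/history/info

This probes the History Server's standard REST entry point and returns basic server information as JSON.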

Edit the yarn-site.xml file so it looks like the following.

Notes:

- yarn.nodemanager.aux-services: Specifies the shuffle service that needs to be set for MapReduce applications.
- yarn.nodemanager.aux-services.mapreduce.shuffle.class: Specifies the exact name of the class for the shuffle service.
- yarn.resourcemanager.hostname: Specifies the ResourceManager's host name.
- yarn.nodemanager.local-dirs: A comma-separated list of paths on the local file system where intermediate data is written.
- yarn.nodemanager.log-dirs: A comma-separated list of paths on the local file system where the NodeManager stores container log files.
- yarn.log-aggregation-enable: Enables or disables log aggregation.

root@global_zone:~# vi yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resource-manager</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///var/data/1/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///var/data/1/yarn/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://name-node1/var/log/hadoop-yarn/apps</value>
  </property>
</configuration>
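With yarn.log-aggregation-enable set to true, the NodeManagers copy finished containers' logs into HDFS under yarn.nodemanager.remote-app-log-dir, and the yarn CLI can fetch them for a completed job. A sketch for later use (the application ID below is hypothetical; take the real one from the job output or the ResourceManager UI):

bob@name-node1:~$ yarn logs -applicationId application_1393417455773_0001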


Configure the Network Time Protocol

We should ensure that the system clocks on the Hadoop zones are synchronized, by using the Network Time Protocol (NTP).

Note: It is best to select an NTP server that can be a dedicated time synchronization source, so that other services are not negatively affected if the node is brought down for planned maintenance.

In the following example, the global zone is configured as an NTP server.

1. Configure an NTP server:

root@global_zone:~# cp /etc/inet/ntp.server /etc/inet/ntp.conf

root@global_zone:~# chmod +w /etc/inet/ntp.conf

root@global_zone:~# touch /var/ntp/ntp.drift

2. Append the following lines to the NTP server configuration file:

root@global_zone:~# vi /etc/inet/ntp.conf

server 127.127.1.0 prefer

broadcast 224.0.1.1 ttl 4

enable auth monitor

driftfile /var/ntp/ntp.drift

statsdir /var/ntp/ntpstats/

filegen peerstats file peerstats type day enable

filegen loopstats file loopstats type day enable

filegen clockstats file clockstats type day enable

keys /etc/inet/ntp.keys

trustedkey 0

requestkey 0

controlkey 0

3. Enable the NTP server service:

root@global_zone:~# svcadm enable ntp

4. Verify that the NTP server is online by using the following command:

root@global_zone:~# svcs ntp
STATE          STIME    FMRI
online         15:27:55 svc:/network/ntp:default

Create the Scripts

In the following steps, you will create utility scripts that will be used to simplify repetitive processes. First, create a directory to hold them:

root@global_zone:~# mkdir /usr/local/Scripts

Table 2 lists the utility scripts provided for this lab exercise.

Table 2. Utility Scripts


File Name      Description
createzone     Used for initial zone creation
buildprofile   Used to create profiles that specify details such as host names and IP addresses during the initial zone creation
verifycluster  Used to verify the Hadoop cluster setup
testssh        Used to verify that password-less SSH login is enabled between zones
startcluster   Used to start all the Hadoop services in the cluster zones
stopcluster    Used to stop all the Hadoop services in the cluster zones

1. Create the createzone script using your favorite editor, as shown in Listing 1. We will use this script to set up the Oracle Solaris Zones.

root@global_zone:~# vi /usr/local/Scripts/createzone

#!/bin/ksh
# FILENAME: createzone
# Create a zone
# Usage:
# createzone <zone name>

if [ $# != 1 ]
then
    echo "Usage: createzone <zone name>"
    exit 1
fi

ZONENAME=$1

zonecfg -z $ZONENAME > /dev/null 2>&1 << EOF
create
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user,sys_time
set zonepath=/zones/$ZONENAME
add fs
set dir=/usr/local
set special=/usr/local
set type=lofs
set options=[ro,nodevices]
end
verify
exit
EOF

if [ $? == 0 ] ; then
    echo "Successfully created the $ZONENAME zone"
else
    echo "Error: unable to create the $ZONENAME zone"
    exit 1
fi

Listing 1. createzone script
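After the script runs, you can check what it actually configured: zonecfg prints a zone's configuration back, including the read-only lofs mount of /usr/local that the script adds. An inspection sketch (it assumes the name-node1 zone has already been created, which happens in a later exercise):

root@global_zone:~# zonecfg -z name-node1 info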

2. Create the buildprofile script using your favorite editor, as shown in Listing 2. We will use this script to set up the Oracle Solaris Zones.

root@global_zone:~# vi /usr/local/Scripts/buildprofile

#!/bin/ksh
#
# Copyright 2006-2011 Oracle Corporation. All rights reserved.
# Use is subject to license terms.
#
# This script serves as an example of how to instantiate several zones
# with no administrative interaction. Run the script with no arguments
# to get a usage message.

export PATH=/usr/bin:/usr/sbin
me=$(basename $0)

function fail_usage {
    print -u2 "Usage: $me <sysconfig.xml> <zone> <ipaddr>"
    exit 2
}

function error {
    print -u2 "$me: ERROR: $@"
}

# Parse and check arguments
(( $# != 3 )) && fail_usage

# Be sure the sysconfig profile is readable and ends in .xml
sysconfig=$1
zone=$2
ipaddr=$3
if [[ ! -f $sysconfig || ! -r $sysconfig || $sysconfig != *.xml ]] ;
then
    error "sysconfig profile missing, unreadable, or not *.xml"
    fail_usage
fi

#
# Create a temporary directory for all temp files
#
export TMPDIR=$(mktemp -d /tmp/$me.XXXXXX)
if [[ -z $TMPDIR ]]; then
    error "Could not create temporary directory"
    exit 1
fi
trap 'rm -rf $TMPDIR' EXIT

# Customize the nodename and IP address in the sysconfig profile
z_sysconfig=$TMPDIR/${zone}.xml
z_sysconfig2=$TMPDIR/${zone}2.xml

search="<propval type=\"astring\" name=\"nodename\" value=\"name-node1\"/>"
replace="<propval type=\"astring\" name=\"nodename\" value=\"$zone\"/>"
sed "s|$search|$replace|" $sysconfig > $z_sysconfig

search="<propval type=\"net_address_v4\" name=\"static_address\" value=\"192.168.1.1/24\"/>"
replace="<propval type=\"net_address_v4\" name=\"static_address\" value=\"$ipaddr\"/>"
sed "s|$search|$replace|" $z_sysconfig > $z_sysconfig2

cp $z_sysconfig2 ./$zone-template.xml
rm -rf $TMPDIR
exit 0

Listing 2. buildprofile script

3. Create the verifycluster script using your favorite editor, as shown in Listing 3. We will use this script to verify the Hadoop cluster setup.

root@global_zone:~# vi /usr/local/Scripts/verifycluster

#!/bin/ksh
# FILENAME: verifycluster
# Verify the hadoop cluster configuration
# Usage:
# verifycluster

RET=1

for transaction in _; do

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ls /usr/local > /dev/null 2>&1 "
   eval $cmd || break 2
done

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ping name-node1 > /dev/null 2>&1"
   eval $cmd || break 2
done

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ping name-node2 > /dev/null 2>&1"
   eval $cmd || break 2
done

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ping resource-manager > /dev/null 2>&1"
   eval $cmd || break 2
done

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ping data-node1 > /dev/null 2>&1"
   eval $cmd || break 2
done

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ping data-node2 > /dev/null 2>&1"
   eval $cmd || break 2
done

for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   cmd="zlogin $i ping data-node3 > /dev/null 2>&1"
   eval $cmd || break 2
done

RET=0

done

if [ $RET == 0 ] ; then
   echo "The cluster is verified"
else
   echo "Error: unable to verify the cluster"
fi

exit $RET

Listing 3. verifycluster script

4. Create the testssh script, as shown in Listing 4. We will use this script to verify the SSH setup.


root@global_zone:~# vi /usr/local/Scripts/testssh

#!/bin/ksh
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
   ssh $i exit
done

Listing 4. testssh script

5. Create the startcluster script, as shown in Listing 5. We will use this script to start all the services on the Hadoop cluster.

root@global_zone:~# vi /usr/local/Scripts/startcluster

#!/bin/ksh
zlogin -l hdfs name-node1 'hadoop-daemon.sh start namenode'
zlogin -l hdfs data-node1 'hadoop-daemon.sh start datanode'
zlogin -l hdfs data-node2 'hadoop-daemon.sh start datanode'
zlogin -l hdfs data-node3 'hadoop-daemon.sh start datanode'
zlogin -l yarn resource-manager 'yarn-daemon.sh start resourcemanager'
zlogin -l yarn data-node1 'yarn-daemon.sh start nodemanager'
zlogin -l yarn data-node2 'yarn-daemon.sh start nodemanager'
zlogin -l yarn data-node3 'yarn-daemon.sh start nodemanager'
zlogin -l mapred resource-manager 'mr-jobhistory-daemon.sh start historyserver'

Listing 5. startcluster script

6. Create the stopcluster script, as shown in Listing 6. We will use this script to stop all the services on the Hadoop cluster.

root@global_zone:~# vi /usr/local/Scripts/stopcluster

#!/bin/ksh
zlogin -l hdfs name-node1 'hadoop-daemon.sh stop namenode'
zlogin -l hdfs data-node1 'hadoop-daemon.sh stop datanode'
zlogin -l hdfs data-node2 'hadoop-daemon.sh stop datanode'
zlogin -l hdfs data-node3 'hadoop-daemon.sh stop datanode'
zlogin -l yarn resource-manager 'yarn-daemon.sh stop resourcemanager'
zlogin -l yarn data-node1 'yarn-daemon.sh stop nodemanager'
zlogin -l yarn data-node2 'yarn-daemon.sh stop nodemanager'
zlogin -l yarn data-node3 'yarn-daemon.sh stop nodemanager'
zlogin -l mapred resource-manager 'mr-jobhistory-daemon.sh stop historyserver'

Listing 6. stopcluster script


7. The Solaris command "wc -l" displays the number of lines in files. You can use this as a sanity check to verify that your scripts are about the right size:

root@global_zone:~# wc -l /usr/local/Scripts/*
      64 /usr/local/Scripts/buildprofile
      36 /usr/local/Scripts/createzone
      12 /usr/local/Scripts/startcluster
      10 /usr/local/Scripts/stopcluster
       9 /usr/local/Scripts/testssh
      67 /usr/local/Scripts/verifycluster
     198 total

8. Change the scripts' permissions:

root@global_zone:~# chmod +x /usr/local/Scripts/*

Create the NameNodes, DataNodes, and ResourceManager Zones

We will leverage the integration between Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

Table 3 shows a summary of the Hadoop zones we will create:

Table 3. Zone Summary

Function          Zone Name         ZFS Mount Point          IP Address
Active NameNode   name-node1        /zones/name-node1        192.168.1.1/24
Standby NameNode  name-node2        /zones/name-node2        192.168.1.2/24
ResourceManager   resource-manager  /zones/resource-manager  192.168.1.3/24
DataNode          data-node1        /zones/data-node1        192.168.1.4/24
DataNode          data-node2        /zones/data-node2        192.168.1.5/24
DataNode          data-node3        /zones/data-node3        192.168.1.6/24


1. Create the name-node1 zone using the createzone script, which will create the zone configuration file. For the argument, the script needs the zone's name, for example, createzone <zone name>.

root@global_zone:~# /usr/local/Scripts/createzone name-node1

Successfully created the name-node1 zone

2. Create the name-node2 zone using the createzone script:

root@global_zone:~# /usr/local/Scripts/createzone name-node2

Successfully created the name-node2 zone

3. Create the resource-manager zone using the createzone script:

root@global_zone:~# /usr/local/Scripts/createzone resource-manager

Successfully created the resource-manager zone

4. Create the three DataNode zones using the createzone scripts:

root@global_zone:~# /usr/local/Scripts/createzone data-node1

Successfully created the data-node1 zone

root@global_zone:~# /usr/local/Scripts/createzone data-node2

Successfully created the data-node2 zone

root@global_zone:~# /usr/local/Scripts/createzone data-node3

Successfully created the data-node3 zone

Configure the Active NameNode

Let's create a system configuration profile template for the name-node1 zone. The system configuration profile will include the host information, such as the host name, IP address, and name services.

1. Run the sysconfig command, which will start the System Configuration Tool (see Figure 2):

root@global_zone:~# sysconfig create-profile


Figure 2. System Configuration Tool

2. Press Esc-2 to start the wizard.

3. Provide the zone's host information by using the following configuration for the name-node1 zone:

a. For the host name, use name-node1.
b. Select manual network configuration.
c. Ensure the network interface net0 has an IP address of 192.168.1.1 and a netmask of 255.255.255.0. Leave the router field blank.
d. Ensure the name service is based on your network configuration. In this article, we will use /etc/hosts for name resolution, so we won't set up DNS for host name resolution: select Do not configure DNS.
e. For Alternate Name Service, select None.
f. For Time Zone: Regions, select Americas.
g. For Time Zone: Locations, select United States.
h. For Time Zone, select Pacific Time.
i. For Locale: Language, select English.
j. For Locale: Territory, select United States (en_US.UTF-8).
k. For Keyboard, select US-English.
l. Enter your root password, but leave the optional user account blank.
m. For Support - Registration, provide your My Oracle Support credentials.
n. For Support - Network Configuration, select an internet access method for Oracle Configuration Manager and Oracle Auto Service Request.
o. Review the settings, and then press Esc-2 to apply them. The changes are not applied to the running system; instead, they are written to a file named /system/volatile/profile/sc_profile.xml.

4. Copy the profile to /root/name-node1-template.xml:

root@global_zone:~# cp /system/volatile/profile/sc_profile.xml /root/name-node1-template.xml

5. Now, install the name-node1 zone. Installing the first zone will take a couple of minutes. Later, we will clone this zone in order to accelerate the creation of the other zones:

root@global_zone:~# zoneadm -z name-node1 install -c /root/name-node1-template.xml
The following ZFS file system(s) have been created:
        rpool/zones/name-node1
Progress being logged to /var/log/zones/zoneadm.20140225T111519Z.name-node1.install
       Image: Preparing at /zones/name-node1/root.
[...]

6. Boot the name-node1 zone:

root@global_zone:~# zoneadm -z name-node1 boot

7. Check the status of the zones we've created:

root@global_zone:~# zoneadm list -cv

ID NAME STATUS PATH BRAND IP

0 global running / solaris shared

1 name-node1 running /zones/name-node1 solaris excl

- name-node2 configured /zones/name-node2 solaris excl

- resource-manager configured /zones/resource-manager solaris excl

- data-node1 configured /zones/data-node1 solaris excl

- data-node2 configured /zones/data-node2 solaris excl

- data-node3 configured /zones/data-node3 solaris excl

We can see the six zones that we have created.

8. zlogin is a utility that is used to enter a non-global zone from the global zone. zlogin has three modes: interactive, non-interactive, and console. For our first login to the newly created zone, we will use the console (-C) mode. When you log in to the console of the name-node1 zone, you will see the progress of the initial boot. Subsequent boots will be much faster.

root@global_zone:~# zlogin -C name-node1
[Connected to zone 'name-node1' console]
134/134
Hostname: name-node1
. . .
login: root
Password: ********

Page 23: How to Set Up a Hadoop Cluster Using Oracle Solarisdocs.huihoo.com/oracle/openworld/2014/HOL2086-Set-Up-a-Hadoop-2... · How to Set Up a Hadoop Cluster Using Oracle Solaris Hands-On

9. Verify that all the services are up and running:

root@name-node1:~# svcs -xv

10. If all the services are up and running without any issues, the command will return to the system prompt without any error message. To disconnect from a zone virtual console, use the tilde (~) character followed by a period:

root@name-node1:~# ~.
[Connection to zone 'name-node1' console closed]
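The non-interactive mode mentioned in step 8 runs a single command inside the zone and prints its output in the global zone; the utility scripts and later exercises rely on it heavily. A minimal sketch:

root@global_zone:~# zlogin name-node1 zonename
name-node1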

11. Re-enter the zone in interactive mode.

root@global_zone:~# zlogin name-node1

[Connected to zone 'name-node1' pts/4]

Oracle Corporation SunOS 5.11 11.2 June 2014

root@name-node1:~#

12. Developing for Hadoop requires a Java programming environment. You can install Java Development Kit (JDK) 7 using the following command:

root@name-node1:~# pkg install --accept jdk-7

13. Verify the Java installation:

root@name-node1:~# java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) Server VM (build 24.55-b03, mixed mode)

14. Create the hadoop group:

root@name-node1:~# groupadd -g 200 hadoop

For the Hadoop cluster, create the four users shown in Table 4.

Table 4. Hadoop Users Summary

User:Group     Description
hdfs:hadoop    The NameNodes and DataNodes run as this user.
yarn:hadoop    The ResourceManager and NodeManager services run as this user.
mapred:hadoop  The History Server runs as this user.
bob:staff      This user will run the MapReduce jobs.

15. Add the hdfs user:

root@name-node1:~# useradd -u 200 -m -g hadoop hdfs

Set the hdfs user's password. In this lab, we are going to use the "welcome1" password for all the accounts.

root@name-node1:~# passwd hdfs
New Password: <enter hdfs password>
Re-enter new Password: <re-enter hdfs password>
passwd: password successfully changed for hdfs

Add the yarn user:

root@name-node1:~# useradd -u 201 -m -g hadoop yarn

root@name-node1:~# passwd yarn

New Password: <enter yarn password>

Re-enter new Password: <re-enter yarn password>

passwd: password successfully changed for yarn

Add the mapred user:

root@name-node1:~# useradd -u 202 -m -g hadoop mapred

root@name-node1:~# passwd mapred

New Password: <enter mapred password>

Re-enter new Password: <re-enter mapred password>

passwd: password successfully changed for mapred

Create a directory for the YARN log files:

root@name-node1:~# mkdir -p /var/log/hadoop/yarn

root@name-node1:~# chown yarn:hadoop /var/log/hadoop/yarn

Create a directory for the HDFS log files:

root@name-node1:~# mkdir -p /var/log/hadoop/hdfs

root@name-node1:~# chown hdfs:hadoop /var/log/hadoop/hdfs

Create a directory for the mapred log files:

root@name-node1:~# mkdir -p /var/log/hadoop/mapred

root@name-node1:~# chown mapred:hadoop /var/log/hadoop/mapred

Create a directory for the HDFS metadata:


root@name-node1:~# mkdir -p /var/data/1/dfs/nn

root@name-node1:~# chmod 700 /var/data/1/dfs/nn

root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/nn

Create a Hadoop data directory to store the HDFS blocks:

root@name-node1:~# mkdir -p /var/data/1/dfs/dn

root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/dn

Configure local storage directories for use by YARN:

root@name-node1:~# mkdir -p /var/data/1/yarn/local

root@name-node1:~# mkdir -p /var/data/1/yarn/logs

root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/local

root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/logs

Create the runtime directories:

root@name-node1:~# mkdir -p /var/run/hadoop/yarn

root@name-node1:~# chown yarn:hadoop /var/run/hadoop/yarn

root@name-node1:~# mkdir -p /var/run/hadoop/hdfs

root@name-node1:~# chown hdfs:hadoop /var/run/hadoop/hdfs

root@name-node1:~# mkdir -p /var/run/hadoop/mapred

root@name-node1:~# chown mapred:hadoop /var/run/hadoop/mapred

Add the user bob (later this user will run the MapReduce jobs):

root@name-node1:~# useradd -m -u 1000 bob

root@name-node1:~# passwd bob

New Password: <enter bob password>

Re-enter new Password: <re-enter bob password>

passwd: password successfully changed for bob

16. Switch to user bob

root@name-node1:~# su - bob

17. Using your favorite editor, append the following lines to .profile:

bob@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME

export JAVA_HOME=/usr/java

# Add Hadoop bin/ directory to PATH

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

18. Log out using the exit command:

bob@name-node1:~$ exit
logout

19. Configure an NTP client, as shown in the following example:

Install the NTP package:

root@name-node1:~# pkg install ntp

Create the NTP client configuration files:

root@name-node1:~# cp /etc/inet/ntp.client /etc/inet/ntp.conf

root@name-node1:~# chmod +w /etc/inet/ntp.conf

root@name-node1:~# touch /var/ntp/ntp.drift

a. Edit the NTP client configuration file:

Note: In this setup, we are using the global zone as a time server, so we add its name (for example, global-zone) to /etc/inet/ntp.conf.

root@name-node1:~# vi /etc/inet/ntp.conf

Append these lines to the bottom of the file:

server global-zone prefer

driftfile /var/ntp/ntp.drift

statsdir /var/ntp/ntpstats/

filegen peerstats file peerstats type day enable

filegen loopstats file loopstats type day enable

20. Add the Hadoop cluster members' host names and IP addresses to /etc/hosts:

root@name-node1:~# vi /etc/hosts

::1 localhost

127.0.0.1 localhost loghost

192.168.1.1 name-node1

192.168.1.2 name-node2

192.168.1.3 resource-manager

192.168.1.4 data-node1

192.168.1.5 data-node2

192.168.1.6 data-node3

192.168.1.100 global-zone

21. Enable the NTP client service:

root@name-node1:~# svcadm enable ntp

22. Verify the NTP client status:

root@name-node1:~# svcs ntp
STATE          STIME    FMRI
online          1:04:35 svc:/network/ntp:default


Check whether the NTP client can synchronize its clock with the NTP server:

root@name-node1:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 global-zone     LOCAL(0)         6 u   19   64    1    0.374    0.119   0.000

You can see that global-zone is the NTP server.

Set Up SSH

Set up SSH key-based authentication for the Hadoop users on the name-node1 zone in order to enable password-less login to other zones in the Hadoop cluster.

First, switch to the user hdfs, generate an SSH key pair, and copy the public key into the ~/.ssh/authorized_keys file:

root@name-node1:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012
hdfs@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
hdfs@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Edit $HOME/.profile and append to the end of the file the following lines:

hdfs@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME

export JAVA_HOME=/usr/java

# Add Hadoop bin/ directory to PATH

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

1. Switch to user yarn and edit $HOME/.profile to append to the end of the file the following lines:

hdfs@name-node1:~$ su - yarn

Password: <provide yarn password>

Oracle Corporation SunOS 5.11 11.1 September 2012

yarn@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME

export JAVA_HOME=/usr/java

# Add Hadoop bin/ directory to PATH

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop


2. Copy the SSH public key into the ~/.ssh/authorized_keys file:

yarn@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa

yarn@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

3. Switch to user mapred and edit $HOME/.profile to append to the end of the file the following lines:

yarn@name-node1:~$ su - mapred

Password: <provide mapred password>

Oracle Corporation SunOS 5.11 11.1 September 2012

mapred@name-node1:~$ vi $HOME/.profile

# Set JAVA_HOME

export JAVA_HOME=/usr/java

# Add Hadoop bin/ directory to PATH

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

4. Copy the SSH public key into the ~/.ssh/authorized_keys file:

mapred@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
mapred@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Set Up the Standby NameNode and the ResourceManager

1. Run the following command to execute the .profile script:

mapred@name-node1:~$ source $HOME/.profile

2. Check that Hadoop runs by running the following command:

mapred@name-node1:~$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

Note: Press Ctrl-D several times until you exit from the name-node1 console and return to the global zone. You can verify that you are in the global zone by using the zonename command:


root@global_zone:~# zonename

global

3. Create a profile for the name-node2 zone using the name-node1 profile as a template and using the buildprofile script. In a later step, we will use this profile in order to create the name-node2 zone.

Note: For arguments, the script needs the template profile's name (/root/name-node1-template.xml, which we created in a previous step), the zone's name (name-node2), and the zone's IP address (192.168.1.2, as shown in Table 3).

Change to the /root directory and create the zone profile there:

root@global_zone:~# cd /root
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml name-node2 192.168.1.2/24

Verify the profile's creation:

root@global_zone:~# ls -l /root/name-node2-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 05:59 /root/name-node2-template.xml

4. From the global zone, run the following commands to create the name-node2 zone as a clone of name-node1.

Shut down the name-node1 zone (we can clone only halted zones):

root@global_zone:~# zoneadm -z name-node1 shutdown

Then clone the zone using the profile we created for name-node2:

root@global_zone:~# zoneadm -z name-node2 clone -c /root/name-node2-template.xml name-node1

5. Boot the name-node2 zone:

root@global_zone:~# zoneadm -z name-node2 boot

6. Log in to the name-node2 zone:

root@global_zone:~# zlogin name-node2

7. Wait two minutes and verify that all the services are up and running:

root@name-node2:~# svcs -xv


If all the services are up and running without any issues, the command will return to the system prompt without any error message.

8. Exit from the name-node2 zone by pressing Ctrl-D.

9. Create the resource-manager profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml resource-manager 192.168.1.3/24

10. Create the data-node1 profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node1 192.168.1.4/24

11. Create the data-node2 profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node2 192.168.1.5/24

12. Create the data-node3 profile using the name-node1 profile as a template:

root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node3 192.168.1.6/24

13. Verify the creation of the profiles:

root@global_zone:~# ls -l /root/*.xml
-rw-r--r--   1 root     root        3715 Feb 25 08:05 /root/data-node1-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 08:05 /root/data-node2-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 08:05 /root/data-node3-template.xml
-r--------   1 root     root        3715 Feb 25 03:11 /root/name-node1-template.xml
-rw-r--r--   1 root     root        3715 Feb 25 07:57 /root/name-node2-template.xml
-rw-r--r--   1 root     root        3735 Feb 25 08:04 /root/resource-manager-template.xml

14. From the global zone, run the following command to create the resource-manager zone as a clone of name-node1:

root@global_zone:~# zoneadm -z resource-manager clone -c /root/resource-manager-template.xml name-node1

15. Boot the resource-manager zone:

root@global_zone:~# zoneadm -z resource-manager boot


Set Up the DataNode Zones

In this section, we again leverage the integration between Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

1. Run the following commands to create the three DataNode zones as clones of the name-node1 zone, and then boot the new zones:

root@global_zone:~# zoneadm -z data-node1 clone -c /root/data-node1-template.xml name-node1
root@global_zone:~# zoneadm -z data-node1 boot
root@global_zone:~# zoneadm -z data-node2 clone -c /root/data-node2-template.xml name-node1
root@global_zone:~# zoneadm -z data-node2 boot
root@global_zone:~# zoneadm -z data-node3 clone -c /root/data-node3-template.xml name-node1
root@global_zone:~# zoneadm -z data-node3 boot

2. Boot the name-node1 zone:

root@global_zone:~# zoneadm -z name-node1 boot

3. Check the status of the zones we've created:

root@global_zone:~# zoneadm list -cv

ID NAME STATUS PATH BRAND IP

0 global running / solaris shared

6 name-node1 running /zones/name-node1 solaris excl

10 name-node2 running /zones/name-node2 solaris excl

11 resource-manager running /zones/resource-manager solaris excl

12 data-node1 running /zones/data-node1 solaris excl

13 data-node2 running /zones/data-node2 solaris excl

14 data-node3 running /zones/data-node3 solaris excl

We can see that all the zones are running now.

Verify the SSH Setup

1. Log in to the name-node1 zone:

root@global_zone:~# zlogin name-node1

[Connected to zone 'name-node1' pts/1]

Oracle Corporation SunOS 5.11 11.1 September 2012

root@name-node1:~# su - hdfs

Oracle Corporation SunOS 5.11 11.1 September 2012

2. Run the testssh script to log in to the cluster nodes using the ssh command:

Note: You will need to enter yes at the "Are you sure you want to continue connecting (yes/no)?" prompt once for each of the six zones.

hdfs@name-node1:~$ /usr/local/Scripts/testssh
The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
Are you sure you want to continue connecting (yes/no)? yes

3. Switch to user yarn and run the testssh script again:

root@name-node1:~# su - yarn

Password: <enter yarn password>

yarn@name-node1:~$ /usr/local/Scripts/testssh

4. Switch to user mapred and run the testssh script again:

yarn@name-node1:~$ su - mapred

Password: <enter mapred password>

mapred@name-node1:~$ /usr/local/Scripts/testssh

5. Press Ctrl-D four times to return to the global zone, and repeat similar steps for name-node2.

Edit the /etc/hosts file inside name-node2 in order to add the name-node1 entry:

root@global_zone:~# zlogin name-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'

Log in to the name-node2 zone:

root@global_zone:~# zlogin name-node2
[Connected to zone 'name-node2' pts/1]
Oracle Corporation SunOS 5.11 11.1 September 2012
root@name-node2:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012

a. Run the testssh script in order to log in to the cluster nodes using the ssh command.

Note: Enter yes at the command prompt for the "Are you sure you want to continue connecting (yes/no)?" question.

hdfs@name-node2:~$ /usr/local/Scripts/testssh
The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
Are you sure you want to continue connecting (yes/no)? yes

Switch to user yarn:


root@name-node2:~# su - yarn

Password: <enter yarn password>

Run the testssh script:

yarn@name-node2:~$ /usr/local/Scripts/testssh

Switch to user mapred:

yarn@name-node2:~$ su - mapred

Password: <enter mapred password>

Run the testssh script:

mapred@name-node2:~$ /usr/local/Scripts/testssh

Verify Name Resolution

1. From the global zone, edit the /etc/hosts files inside resource-manager and the DataNodes in order to add the name-node1 entry:

root@global_zone:~# zlogin resource-manager 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node1 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node3 'echo "192.168.1.1 name-node1" >> /etc/hosts'

2. Verify name resolution by ensuring that the /etc/hosts files for the global zone and all the Hadoop zones have the host entries shown below:

root@global_zone:~# for zone in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3; do echo "============== $zone ============"; zlogin $zone cat /etc/hosts; done

============== name-node1 ============

::1 localhost

127.0.0.1 localhost loghost

192.168.1.1 name-node1

192.168.1.2 name-node2

192.168.1.3 resource-manager

192.168.1.4 data-node1

192.168.1.5 data-node2

192.168.1.6 data-node3

192.168.1.100 global-zone


Note: If you are using the global zone as an NTP server, you must also add its host name and IP address to /etc/hosts.

3. Verify the cluster using the verifycluster script:

root@global_zone:~# /usr/local/Scripts/verifycluster

If the cluster setup is correct, you will get a "The cluster is verified" message.

Note: If the verifycluster script fails with an error message, check that the /etc/hosts file in every zone includes all the zone names, as described in Step 1, and then rerun the verifycluster script.

Format the Hadoop File System

1. To format HDFS, run the following commands:

root@global_zone:~# zlogin -l hdfs name-node1
hdfs@name-node1:~$ hdfs namenode -format

2. Look for the following message, which indicates HDFS has been set up:

...
INFO common.Storage: Storage directory /var/data/1/dfs/nn has been successfully formatted.
...
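If you want to see what the format step wrote, the NameNode metadata now lives under the dfs.namenode.name.dir path from hdfs-site.xml. A quick look (the exact file names may differ slightly between Hadoop releases):

hdfs@name-node1:~$ ls /var/data/1/dfs/nn/current
VERSION fsimage_0000000000000000000 fsimage_0000000000000000000.md5 seen_txid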

Start the Hadoop Cluster

Table 5 describes the startup scripts.

Table 5. Startup Scripts

User    Command                                       Description
hdfs    hadoop-daemon.sh start namenode               Starts the HDFS daemon (NameNode process)
hdfs    hadoop-daemon.sh start datanode               Starts the DataNode process on all DataNodes
yarn    yarn-daemon.sh start resourcemanager          Starts YARN on the ResourceManager
yarn    yarn-daemon.sh start nodemanager              Starts the NodeManager process on all DataNodes
mapred  mr-jobhistory-daemon.sh start historyserver   Starts the MapReduce History Server

1. Start HDFS by running the following command:

hdfs@name-node1:~$ hadoop-daemon.sh start namenode
starting namenode, logging to /var/log/hadoop/hdfs/hadoop--namenode-name-node1.out

2. Run the jps command to verify that the NameNode process has been started:

hdfs@name-node1:~$ /usr/jdk/latest/bin/jps | grep NameNode
4223 NameNode

You should see the NameNode process ID (for example, 4223). If the process did not start, look at the log file /var/log/hadoop/hdfs/hadoop--namenode-name-node1.log to find the reason.

3. Exit from the name-node1 zone by pressing Ctrl-D.

4. Start the DataNodes on all the slaves (data-node1, data-node2, and data-node3).

Run the following commands for data-node1:

root@global_zone:~# zlogin -l hdfs data-node1
hdfs@data-node1:~$ hadoop-daemon.sh start datanode
hdfs@data-node1:~$ /usr/jdk/latest/bin/jps | grep DataNode
19762 DataNode

Exit from the data-node1 zone by pressing Ctrl-D.

Run the following commands for data-node2:

root@global_zone:~# zlogin -l hdfs data-node2
hdfs@data-node2:~$ hadoop-daemon.sh start datanode
hdfs@data-node2:~$ /usr/jdk/latest/bin/jps | grep DataNode
21525 DataNode

Exit from the data-node2 zone by pressing Ctrl-D.

Run the following commands for data-node3:

root@global_zone:~# zlogin -l hdfs data-node3
hdfs@data-node3:~$ hadoop-daemon.sh start datanode
hdfs@data-node3:~$ /usr/jdk/latest/bin/jps | grep DataNode
29699 DataNode

Exit from the data-node3 zone by pressing Ctrl-D.

5. Create a /tmp directory in HDFS and set its permissions to 1777 (drwxrwxrwt) using the hadoop fs command:

root@global_zone:~# zlogin -l hdfs name-node1
hdfs@name-node1:~$ hadoop fs -mkdir /tmp

Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Hadoop isn't able to use the native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Oracle Solaris Hadoop 2.x native libraries is a work in progress.

hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /tmp

6. Create a history directory and set permissions and ownership:

hdfs@name-node1:~$ hadoop fs -mkdir /user

hdfs@name-node1:~$ hadoop fs -mkdir /user/history

hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /user/history

hdfs@name-node1:~$ hadoop fs -chown yarn /user/history

7. Create the log directories:

hdfs@name-node1:~$ hadoop fs -mkdir /var

hdfs@name-node1:~$ hadoop fs -mkdir /var/log

hdfs@name-node1:~$ hadoop fs -mkdir /var/log/hadoop-yarn

hdfs@name-node1:~$ hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

8. Create a directory for user bob and set ownership:

hdfs@name-node1:~$ hadoop fs -mkdir /user/bob

hdfs@name-node1:~$ hadoop fs -chown bob /user/bob

9. Verify the HDFS file structure:

hdfs@name-node1:~$ hadoop fs -ls -R /
drwxrwxrwt   - hdfs supergroup          0 2014-02-26 10:43 /tmp
drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:58 /user
drwxr-xr-x   - bob  supergroup          0 2014-02-26 10:58 /user/bob
drwxrwxrwt   - yarn supergroup          0 2014-02-26 10:50 /user/history
drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:53 /var
drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:53 /var/log
drwxr-xr-x   - yarn mapred              0 2014-02-26 10:53 /var/log/hadoop-yarn


10. Exit from the name-node1 zone by pressing Ctrl-D.

11. Start the YARN resource-manager service using the following commands:

root@global_zone:~# zlogin -l yarn resource-manager
yarn@resource-manager:~$ yarn-daemon.sh start resourcemanager
yarn@resource-manager:~$ /usr/jdk/latest/bin/jps | grep ResourceManager
29776 ResourceManager

12. Start the NodeManager process on all DataNodes and verify the status:

root@global_zone:~# zlogin -l yarn data-node1 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node1 /usr/jdk/latest/bin/jps | grep NodeManager
29920 NodeManager
root@global_zone:~# zlogin -l yarn data-node2 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node2 /usr/jdk/latest/bin/jps | grep NodeManager
29930 NodeManager
root@global_zone:~# zlogin -l yarn data-node3 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node3 /usr/jdk/latest/bin/jps | grep NodeManager
29982 NodeManager

13. Start the MapReduce History Server and verify its status:

root@global_zone:~# zlogin -l mapred resource-manager
mapred@resource-manager:~$ mr-jobhistory-daemon.sh start historyserver
mapred@resource-manager:~$ /usr/jdk/latest/bin/jps | grep JobHistoryServer
654 JobHistoryServer

Exit the resource-manager zone by pressing Ctrl-D.

14. Log in to name-node1:

root@global_zone:~# zlogin -l hdfs name-node1

15. Use the following command to show basic HDFS statistics for the cluster:

hdfs@name-node1:~$ hdfs dfsadmin -report
13/11/26 05:16:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1077762507264 (1003.74 GB)
Present Capacity: 1075847407736 (1001.96 GB)
DFS Remaining: 1075845337088 (1001.96 GB)
DFS Used: 2070648 (1.97 MB)
DFS Used%: 0.00%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0


-------------------------------------------------

Datanodes available: 3 (3 total, 0 dead)

16. Use the following command to show the cluster topology:

hdfs@name-node1:~$ hdfs dfsadmin -printTopology
13/11/26 05:19:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Rack: /default-rack
   10.153.111.222:50010 (data-node1)
   10.153.111.223:50010 (data-node2)
   10.153.111.224:50010 (data-node3)

Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Hadoop isn't able to use the native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Oracle Solaris Hadoop 2.x native libraries is a work in progress.

17. Run a simple MapReduce job:

root@global_zone:~# zlogin -l bob name-node1 hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20

where:

- zlogin -l bob name-node1 specifies that the command be run as user bob on the name-node1 zone.
- hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi specifies the Hadoop .jar file and the pi example program.
- 10 specifies the number of maps.
- 20 specifies the number of samples.
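As a further optional exercise, you can run the bundled wordcount example as user bob; the input text and file names below are made up for illustration. The job reads /user/bob/input.txt from HDFS and writes one output file per reducer under /user/bob/wc-out:

bob@name-node1:~$ echo "hello hadoop hello solaris" | hadoop fs -put - /user/bob/input.txt
bob@name-node1:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/bob/input.txt /user/bob/wc-out
bob@name-node1:~$ hadoop fs -cat /user/bob/wc-out/part-r-00000
hadoop  1
hello   2
solaris 1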