How to Set Up a Hadoop Cluster Using
Oracle Solaris
Hands-On Labs of the System Admin and Developer Community of OTN
by Orgad Kimchi with contributions from Jeff Taylor
How to set up a Hadoop cluster using the Oracle Solaris Zones, ZFS, and network virtualization
technologies.
Lab Introduction
This hands-on lab presents exercises that demonstrate how to set up an Apache Hadoop cluster
using Oracle Solaris 11 technologies such as Oracle Solaris Zones, ZFS, and network
virtualization. Key topics include the Hadoop Distributed File System (HDFS) and the Hadoop
MapReduce programming model.
We will also cover the Hadoop installation process and the cluster building blocks: NameNode, a
secondary NameNode, and DataNodes. In addition, you will see how you can combine the
Oracle Solaris 11 technologies for better scalability and data security, and you will learn how to
load data into the Hadoop cluster and run a MapReduce job.
Prerequisites
This hands-on lab is appropriate for system administrators who will be setting up or maintaining
a Hadoop cluster in production or development environments. Basic Linux or Oracle Solaris
system administration experience is a prerequisite. Prior knowledge of Hadoop is not required.
System Requirements
This hands-on lab is run on Oracle Solaris 11 in Oracle VM VirtualBox. The lab is self-contained: everything you need is in the Oracle VM VirtualBox instance.
For those attending the lab at Oracle OpenWorld, your laptops are already preloaded with the
correct Oracle VM VirtualBox image.
If you want to try this lab outside of Oracle OpenWorld, you will need an Oracle Solaris 11
system. Do the following to set up your machine:
1. If you do not have Oracle Solaris 11, download it here.
2. Download the Oracle Solaris 11.1 VirtualBox Template (file size 1.7 GB).
3. Install the template as described here. (Note: On step 4 of Exercise 2 for installing the template, set the RAM size to 4 GB in order to get good performance.)
Notes for Oracle OpenWorld Attendees
Each attendee will have his or her own laptop for the lab. In this lab, we are going to use the "welcome1" password for all the user accounts.
Oracle Solaris 11 uses the GNOME desktop. If you have used the desktops on Linux or other UNIX operating systems, the interface should be familiar. Here are some quick basics in case the interface is new for you.
o In order to open a terminal window in the GNOME desktop system, right-click the background of the desktop, and select Open Terminal in the pop-up menu.
o The following source code editors are provided on the lab machines: vi (type vi in a terminal window) and emacs (type emacs in a terminal window).
Summary of Lab Exercises
This hands-on lab consists of the following exercises covering various Oracle Solaris and Apache
Hadoop technologies:
Download and Install Hadoop
Configure the Network Time Protocol
Create the Scripts
Create the NameNodes, DataNodes, and ResourceManager Zones
Configure the Active NameNode
Set Up SSH
Set Up the Standby NameNode and the ResourceManager
Set Up the DataNode Zones
Verify the SSH Setup
Verify Name Resolution
Format the Hadoop File System
Start the Hadoop Cluster
About Hadoop High Availability
Configure Manual Failover
About Apache ZooKeeper and Automatic Failover
Configure Automatic Failover
Conclusion
The Case for Hadoop
The Apache Hadoop software is a framework that allows for the distributed processing of large
data sets across clusters of computers using simple programming models.
To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-
throughput access to application data and is suitable for applications that have large data sets.
For more information about Hadoop and HDFS, see http://hadoop.apache.org/.
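To give a sense of how applications interact with HDFS, the following is a minimal sketch of the kind of file operations you will run near the end of this lab. The paths are illustrative, and the commands assume a cluster that is already up and running:
$ hadoop fs -put /tmp/sample.txt /user/bob/sample.txt    # copy a local file into HDFS
$ hadoop fs -ls /user/bob                                # list the HDFS directory
$ hadoop fs -cat /user/bob/sample.txt                    # read the file back from HDFS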
The Hadoop cluster building blocks are as follows:
Active NameNode: The centerpiece of HDFS, which stores file system metadata and is
responsible for all client operations
Standby NameNode: A secondary NameNode that synchronizes its state with the active
NameNode in order to provide fast failover if the active NameNode goes down
ResourceManager: The global resource scheduler, which directs the slave NodeManager
daemons to perform the low-level I/O tasks
DataNodes: Nodes that store the data in the HDFS file system and are also known as
slaves; these nodes run the NodeManager process that communicates with the
ResourceManager
History Server: Provides REST APIs in order to allow the user to get the status of
finished applications and provides information about finished jobs
In previous Hadoop versions, the NameNode was a single point of failure (SPOF) in an HDFS
cluster. Hadoop version 2.2 provides the ability to build an HDFS cluster with high availability
(HA), and this article describes the steps involved in building such a configuration.
In the example presented in this article, all the Hadoop cluster building blocks are installed using
Oracle Solaris Zones, ZFS, and Unified Archive. Figure 1 shows the architecture:
Figure 1
Exercise 1: Install Hadoop
1. In Oracle VM VirtualBox, enable a bidirectional "shared clipboard" between the host and the guest in order to enable copying and pasting text from this file.
Figure 2
In this lab, we will use the Apache Hadoop "15 October, 2013: Release 2.2.0" release.
Note: Oracle OpenWorld attendees can skip the following step (because the preloaded Oracle
VM VirtualBox image already provides the Hadoop image).
Download the Hadoop binary file using a web browser. Open the Firefox web browser
from the desktop and download the file.
Figure 3
Open a terminal window by right-clicking any point in the background of the desktop and
selecting Open Terminal in the pop-up menu.
Figure 4
Important: In the examples presented in this article, the command prompt indicates which user
needs to run each command in addition to indicating the environment where the command
should be run. For example, the command prompt root@global_zone:~# indicates that user root
needs to run the command from the global zone.
Note: For Oracle OpenWorld attendees, the root password has been provided in the one-pager
associated with this lab. For those running this lab outside of Oracle OpenWorld, enter the root
password you entered when you followed the steps in the "System Requirements" section.
oracle@global_zone:~$ su -
Password:
Oracle Corporation SunOS 5.11 11.1 September 2012
Set up the virtual network interface card (VNIC) in order to enable network access to the
global zone from the non-global zones.
root@global_zone:~# dladm create-vnic -l net0 vnic0
root@global_zone:~# ipadm create-ip vnic0
root@global_zone:~# ipadm create-addr -T static -a local=192.168.1.100/24
vnic0/addr
Verify the VNIC creation:
root@global_zone:~# ipadm show-addr vnic0
ADDROBJ TYPE STATE ADDR
vnic0/addr static ok 192.168.1.100/24
In the global zone, create the /usr/local directory if it doesn't exist.
Note: The cluster configuration will share the Hadoop directory structure (/usr/local/hadoop)
across the zones as a read-only file system. Every Hadoop cluster node needs to be able to write
its logs to an individual directory; a best practice is to use /var/log inside each Oracle Solaris Zone for this purpose.
root@global_zone:~# mkdir -p /usr/local
1. Copy the Hadoop tarball to /usr/local:
root@global_zone:~# cp /export/home/oracle/hadoop-2.2.0.tar.gz /usr/local
Unpack the tarball:
root@global_zone:~# cd /usr/local
root@global_zone:~# tar -xzf /usr/local/hadoop-2.2.0.tar.gz
2. Create the hadoop group:
root@global_zone:~# groupadd -g 200 hadoop
3. Create a symlink for the Hadoop binaries:
root@global_zone:~# ln -s /usr/local/hadoop-2.2.0 /usr/local/hadoop
4. Give ownership to the hadoop group:
root@global_zone:~# chown -R root:hadoop /usr/local/hadoop-2.2.0
5. Change the permissions:
root@global_zone:~# chmod -R 755 /usr/local/hadoop-2.2.0
6. Edit the Hadoop configuration files, which are shown in Table 1: Table 1. Hadoop Configuration Files
File Name Description
hadoop-env.sh Specifies environment variable settings used by Hadoop
yarn-env.sh Specifies environment variable settings used by YARN
mapred-env.sh Specifies environment variable settings used by MapReduce
slaves Contains a list of machine names that run the DataNode and
NodeManager pair of daemons
core-site.xml Specifies parameters relevant to all Hadoop daemons and clients
hdfs-site.xml Specifies parameters used by the HDFS daemons and clients
mapred-site.xml Specifies parameters used by the MapReduce daemons and clients
yarn-site.xml Specifies the configurations for the ResourceManager and
NodeManager
7. Run the following commands to change the hadoop-env.sh script:
root@global_zone:~# export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
root@global_zone:~# cd $HADOOP_CONF_DIR
Append the following lines to the hadoop-env.sh script:
root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> hadoop-env.sh
root@global_zone:~# echo "export HADOOP_LOG_DIR=/var/log/hadoop/hdfs" >>
hadoop-env.sh
Append the following lines to the yarn-env.sh script:
root@global_zone:~# vi yarn-env.sh
export JAVA_HOME=/usr/java
export YARN_LOG_DIR=/var/log/hadoop/yarn
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Append the following lines to the mapred-env.sh script:
root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> mapred-env.sh
root@global_zone:~# echo "export
HADOOP_MAPRED_LOG_DIR=/var/log/hadoop/mapred" >> mapred-env.sh
root@global_zone:~# echo "export HADOOP_MAPRED_IDENT_STRING=mapred" >>
mapred-env.sh
Edit the slaves file to replace the localhost entry with the following lines:
root@global_zone:~# vi slaves
data-node1
data-node2
data-node3
Edit the core-site.xml file so it looks like the following:
Note: fs.defaultFS is the URI that describes the NameNode address (protocol
specifier, hostname, and port) for the cluster. Each DataNode instance will register with
this NameNode and make its data available through it. In addition, the DataNodes send
heartbeats to the NameNode to confirm that each DataNode is operating and the block
replicas it hosts are available.
root@global_zone:~# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://name-node1</value>
</property>
</configuration>
Edit the hdfs-site.xml file so it looks like the following.
Notes:
o dfs.datanode.data.dir: The path on the local file system where the DataNode instance stores its data.
o dfs.namenode.name.dir: The path on the local file system where the NameNode instance stores its metadata. It is used only by the NameNode instance to find its information.
o dfs.replication: The default replication factor for each block of data in the file system. (For a production cluster, this should usually be left at its default value of 3.)
o dfs.permission.supergroup: Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of hadoop or pick your own group, depending on the security policies at your site.
root@global_zone:~# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/var/data/1/dfs/dn</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/var/data/1/dfs/nn</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permission.supergroup</name>
<value>hadoop</value>
</property>
</configuration>
Create and then edit the mapred-site.xml file so it looks like the following:
Notes:
o mapreduce.framework.name: Sets the execution framework to Hadoop YARN.
o mapreduce.jobhistory.address: Specifies the MapReduce History Server's host:port.
o mapreduce.jobhistory.webapp.address: Specifies the MapReduce History Server's web UI host:port.
o yarn.app.mapreduce.am.staging-dir: Specifies a staging directory, which YARN requires for temporary files created by running jobs.
root@global_zone:~# cp mapred-site.xml.template mapred-site.xml
root@global_zone:~# vi mapred-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>resource-manager:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>resource-manager:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
</configuration>
Edit the yarn-site.xml file so it looks like the following:
Notes:
o yarn.nodemanager.aux-services: Specifies the shuffle service that needs to be set for MapReduce applications.
o yarn.nodemanager.aux-services.mapreduce.shuffle.class: Specifies the exact name of the class for the shuffle service.
o yarn.resourcemanager.hostname: Specifies the ResourceManager's host name.
o yarn.nodemanager.local-dirs: A comma-separated list of paths on the local file system where intermediate data is written.
o yarn.nodemanager.log-dirs: A comma-separated list of paths on the local file system where the NodeManager stores container log files.
o yarn.log-aggregation-enable: Enables or disables log aggregation.
o yarn.nodemanager.remote-app-log-dir: Specifies where aggregated logs are stored.
root@global_zone:~# vi yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resource-manager</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///var/data/1/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///var/data/1/yarn/logs</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://var/log/hadoop-yarn/apps</value>
</property>
</configuration>
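Optionally, you can check that the edited configuration files are well-formed XML before continuing. This is a minimal sketch that assumes the xmllint utility (part of libxml2) is available in the global zone; it prints nothing and returns 0 if the files parse cleanly:
root@global_zone:~# cd $HADOOP_CONF_DIR
root@global_zone:~# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml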
Configure the Network Time Protocol
We should ensure that the system clock on the Hadoop zones is synchronized by using the
Network Time Protocol (NTP).
Note: It is best to select an NTP server that can be a dedicated time synchronization source so
that other services are not negatively affected if the node is brought down for planned
maintenance.
In the following example, the global zone is configured as an NTP server.
1. Configure an NTP server:
root@global_zone:~# cp /etc/inet/ntp.server /etc/inet/ntp.conf
root@global_zone:~# chmod +w /etc/inet/ntp.conf
root@global_zone:~# touch /var/ntp/ntp.drift
2. Append the following lines to the NTP server configuration file:
root@global_zone:~# vi /etc/inet/ntp.conf
server 127.127.1.0 prefer
broadcast 224.0.1.1 ttl 4
enable auth monitor
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0
3. Enable the NTP server service:
root@global_zone:~# svcadm enable ntp
4. Verify that the NTP server is online by using the following command:
root@global_zone:~# svcs ntp
STATE STIME FMRI
online 15:27:55 svc:/network/ntp:default
Create the Scripts
In the following steps, you will create utility scripts that will be used to simplify repetitive processes. First, create a directory to hold them:
root@global_zone:~# mkdir /usr/local/Scripts
The following table lists the utility scripts provided for this lab exercise.
File Name Description
createzone Used for initial zone creation
buildprofile Used to create profiles that specify details such as host names and IP addresses during the initial zone creation
verifycluster Used to verify the Hadoop cluster setup
testssh Used to verify that password-less SSH is enabled between zones
startcluster Used to start every Oracle Solaris Zone in the Hadoop cluster
stopcluster Used to stop every Oracle Solaris Zone in the Hadoop cluster
1. Create the createzone script using your favorite editor, as shown in Listing 1. We will use this script to set up the Oracle Solaris Zones.
root@global_zone:~# vi /usr/local/Scripts/createzone
#!/bin/ksh
# FILENAME: createzone
# Create a zone
# Usage:
# createzone <zone name>
if [ $# != 1 ]
then
echo "Usage: createzone <zone name> "
exit 1
fi
ZONENAME=$1
zonecfg -z $ZONENAME > /dev/null 2>&1 << EOF
create
set autoboot=true
set limitpriv=default,dtrace_proc,dtrace_user,sys_time
set zonepath=/zones/$ZONENAME
add fs
set dir=/usr/local
set special=/usr/local
set type=lofs
set options=[ro,nodevices]
end
verify
exit
EOF
if [ $? == 0 ] ; then
echo "Successfully created the $ZONENAME zone"
else
echo "Error: unable to create the $ZONENAME zone"
exit 1
fi
Listing 1. createzone script
2. Create the buildprofile script using your favorite editor, as shown in Listing 2. We will use this script to set up the Oracle Solaris Zones.
root@global_zone:~# vi /usr/local/Scripts/buildprofile
#!/bin/ksh
#
# Copyright 2006-2011 Oracle Corporation. All rights reserved.
# Use is subject to license terms.
#
# This script serves as an example of how to instantiate several zones
# with no administrative interaction. Run the script with no arguments
# to get a usage message.
export PATH=/usr/bin:/usr/sbin
me=$(basename $0)
function fail_usage {
print -u2 "Usage:
$me <sysconfig.xml>
<zone> <ipaddr>"
exit 2
}
function error {
print -u2 "$me: ERROR: $@"
}
# Parse and check arguments
(( $# != 3 )) && fail_usage
# Be sure the sysconfig profile is readable and ends in .xml
sysconfig=$1
zone=$2
ipaddr=$3
if [[ ! -f $sysconfig || ! -r $sysconfig || $sysconfig != *.xml ]] ;
then
error "sysconfig profile missing, unreadable, or not *.xml"
fail_usage
fi
#
# Create a temporary directory for all temp files
#
export TMPDIR=$(mktemp -d /tmp/$me.XXXXXX)
if [[ -z $TMPDIR ]]; then
error "Could not create temporary directory"
exit 1
fi
trap 'rm -rf $TMPDIR' EXIT
# Customize the nodename in the sysconfig profile
z_sysconfig=$TMPDIR/${zone}.xml
z_sysconfig2=$TMPDIR/${zone}2.xml
search="<propval type=\"astring\" name=\"nodename\" value=\"name-
node1\"/>"
replace="<propval type=\"astring\" name=\"nodename\"
value=\"$zone\"/>"
sed "s|$search|$replace|" $sysconfig > $z_sysconfig
search="<propval type=\"net_address_v4\" name=\"static_address\"
value=\"192.168.1.1/24\"/>"
replace="<propval type=\"net_address_v4\" name=\"static_address\"
value=\"$ipaddr\"/>"
sed "s|$search|$replace|" $z_sysconfig > $z_sysconfig2
cp $z_sysconfig2 ./$zone-template.xml
rm -rf $TMPDIR
exit 0
Listing 2. buildprofile script
3. Create the verifycluster script using your favorite editor, as shown in Listing 3. We will use this script to verify the Hadoop cluster setup.
root@global_zone:~# vi /usr/local/Scripts/verifycluster
#!/bin/ksh
# FILENAME: verifycluster
# Verify the hadoop cluster configuration
# Usage:
# verifycluster
RET=1
for transaction in _; do
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ls /usr/local > /dev/null 2>&1 "
eval $cmd || break 2
done
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ping name-node1 > /dev/null 2>&1"
eval $cmd || break 2
done
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ping name-node2 > /dev/null 2>&1"
eval $cmd || break 2
done
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ping resource-manager > /dev/null 2>&1"
eval $cmd || break 2
done
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ping data-node1 > /dev/null 2>&1"
eval $cmd || break 2
done
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ping data-node2 > /dev/null 2>&1"
eval $cmd || break 2
done
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
cmd="zlogin $i ping data-node3 > /dev/null 2>&1"
eval $cmd || break 2
done
RET=0
done
if [ $RET == 0 ] ; then
echo "The cluster is verified"
else
echo "Error: unable to verify the cluster"
fi
exit $RET
Listing 3. verifycluster script
4. Create the testssh script, as shown in Listing 4. We will use this script to verify the SSH setup.
root@global_zone:~# vi /usr/local/Scripts/testssh
#!/bin/ksh
for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
do
ssh $i exit
done
Listing 4. testssh script
5. Create the startcluster script, as shown in Listing 5. We will use this script to start all the services on the Hadoop cluster.
root@global_zone:~# vi /usr/local/Scripts/startcluster
#!/bin/ksh
zlogin -l hdfs name-node1 'hadoop-daemon.sh start namenode'
zlogin -l hdfs data-node1 'hadoop-daemon.sh start datanode'
zlogin -l hdfs data-node2 'hadoop-daemon.sh start datanode'
zlogin -l hdfs data-node3 'hadoop-daemon.sh start datanode'
zlogin -l yarn resource-manager 'yarn-daemon.sh start resourcemanager'
zlogin -l yarn data-node1 'yarn-daemon.sh start nodemanager'
zlogin -l yarn data-node2 'yarn-daemon.sh start nodemanager'
zlogin -l yarn data-node3 'yarn-daemon.sh start nodemanager'
zlogin -l mapred resource-manager 'mr-jobhistory-daemon.sh start historyserver'
Listing 5. startcluster script
6. Create the stopcluster script, as shown in Listing 6. We will use this script to stop all the services on the Hadoop cluster.
root@global_zone:~# vi /usr/local/Scripts/stopcluster
#!/bin/ksh
zlogin -l hdfs name-node1 'hadoop-daemon.sh stop namenode'
zlogin -l hdfs data-node1 'hadoop-daemon.sh stop datanode'
zlogin -l hdfs data-node2 'hadoop-daemon.sh stop datanode'
zlogin -l hdfs data-node3 'hadoop-daemon.sh stop datanode'
zlogin -l yarn resource-manager 'yarn-daemon.sh stop resourcemanager'
zlogin -l yarn data-node1 'yarn-daemon.sh stop nodemanager'
zlogin -l yarn data-node2 'yarn-daemon.sh stop nodemanager'
zlogin -l yarn data-node3 'yarn-daemon.sh stop nodemanager'
zlogin -l mapred resource-manager 'mr-jobhistory-daemon.sh stop historyserver'
Listing 6. stopcluster script
7. The Solaris command "wc -l" displays the number of lines in files. You can use this as a sanity check to verify that your scripts are about the right size:
root@global_zone:~# wc -l /usr/local/Scripts/*
64 /usr/local/Scripts/buildprofile
36 /usr/local/Scripts/createzone
12 /usr/local/Scripts/startcluster
10 /usr/local/Scripts/stopcluster
9 /usr/local/Scripts/testssh
67 /usr/local/Scripts/verifycluster
198 total
8. Change the scripts' permissions:
root@global_zone:~# chmod +x /usr/local/Scripts/*
Create the NameNodes, DataNodes, and ResourceManager Zones
We will leverage the integration between Oracle Solaris Zones virtualization technology and the
ZFS file system that is built into Oracle Solaris.
Table 2 shows a summary of the Hadoop zones we will create:
Table 2. Zone Summary
Function Zone Name ZFS Mount Point IP Address
Active NameNode name-node1 /zones/name-node1 192.168.1.1/24
Standby NameNode name-node2 /zones/name-node2 192.168.1.2/24
ResourceManager resource-manager /zones/resource-manager 192.168.1.3/24
DataNode data-node1 /zones/data-node1 192.168.1.4/24
DataNode data-node2 /zones/data-node2 192.168.1.5/24
DataNode data-node3 /zones/data-node3 192.168.1.6/24
1. Create the name-node1 zone using the createzone script, which will create the zone configuration file. For the argument, the script needs the zone's name, for example, createzone <zone name>.
root@global_zone:~# /usr/local/Scripts/createzone name-node1
Successfully created the name-node1 zone
2. Create the name-node2 zone using the createzone script:
root@global_zone:~# /usr/local/Scripts/createzone name-node2
Successfully created the name-node2 zone
3. Create the resource-manager zone using the createzone script:
root@global_zone:~# /usr/local/Scripts/createzone resource-manager
Successfully created the resource-manager zone
4. Create the three DataNode zones using the createzone script:
root@global_zone:~# /usr/local/Scripts/createzone data-node1
Successfully created the data-node1 zone
root@global_zone:~# /usr/local/Scripts/createzone data-node2
Successfully created the data-node2 zone
root@global_zone:~# /usr/local/Scripts/createzone data-node3
Successfully created the data-node3 zone
Configure the Active NameNode
Let's create a system configuration profile template for the name-node1 zone. The system
configuration profile will include the host information, such as the host name, IP address, and
name services.
1. Run the sysconfig command, which will start the System Configuration Tool (see Figure 2):
root@global_zone:~# sysconfig create-profile
Figure 2. System Configuration Tool
2. Press Esc-2 to start the wizard.
3. Provide the zone's host information by using the following configuration for the name-node1 zone:
a. For the host name, use name-node1.
b. Select manual network configuration.
c. Ensure the network interface net0 has an IP address of 192.168.1.1 and a netmask of 255.255.255.0. Leave the router field blank.
d. Ensure the name service is based on your network configuration. In this article, we will use /etc/hosts for name resolution, so we won't set up DNS for host name resolution. Select Do not configure DNS.
e. For Alternate Name Service, select None.
f. For Time Zone Regions, select Americas.
g. For Time Zone Locations, select United States.
h. For Time Zone, select Pacific Time.
i. For Locale: Language, select English.
j. For Locale: Territory, select United States (en_US.UTF-8).
k. For Keyboard, select US-English.
l. Enter your root password, but leave the optional user account blank.
m. For Support - Registration, provide your My Oracle Support credentials.
n. For Support - Network Configuration, select an internet access method for Oracle Configuration Manager and Oracle Auto Service Request.
o. Review the settings, and then press Esc-2 to apply. The changes are not applied to the running system; instead, they are written to a file named /system/volatile/profile/sc_profile.xml.
4. Copy the profile to /root/name-node1-template.xml:
root@global_zone:~# cp /system/volatile/profile/sc_profile.xml
/root/name-node1-template.xml
5. Now, install the name-node1 zone. Installing the first zone will take a couple of minutes. Later we will clone this zone in order to accelerate the creation of the other zones:
root@global_zone:~# zoneadm -z name-node1 install -c /root/name-node1-template.xml
The following ZFS file system(s) have been created:
rpool/zones/name-node1
Progress being logged to /var/log/zones/zoneadm.20140225T111519Z.name-node1.install
Image: Preparing at /zones/name-node1/root.
[...]
6. Boot the name-node1 zone:
root@global_zone:~# zoneadm -z name-node1 boot
7. Check the status of the zones we've created:
root@global_zone:~# zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / solaris shared
1 name-node1 running /zones/name-node1 solaris excl
- name-node2 configured /zones/name-node2 solaris excl
- resource-manager configured /zones/resource-manager solaris excl
- data-node1 configured /zones/data-node1 solaris excl
- data-node2 configured /zones/data-node2 solaris excl
- data-node3 configured /zones/data-node3 solaris excl
We can see the six zones that we have created.
8. zlogin is a utility that is used to enter a non-global zone from the global zone. zlogin has three modes: interactive, non-interactive, and console (a short non-interactive example appears after the console session below). For our first login to the newly created zone, we will use the console (-C) mode. When you log in to the console of the name-node1 zone, you will see the progress of the initial boot. Subsequent boots will be much faster.
root@global_zone:~# zlogin -C name-node1
[Connected to zone 'name-node1' console]
134/134
Hostname: name-node1
. . .
login: root
Password: ********
9. Verify that all the services are up and running:
root@name-node1:~# svcs -xv
10. If all the services are up and running without any issues, the command will return to the system prompt without any error message. To disconnect from a zone virtual console, use the tilde (~) character and a period:
root@name-node1:~ # ~.
[Connection to zone 'name-node1' console closed]
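As mentioned in step 8, zlogin can also run a single command in a zone non-interactively and return immediately; we will use that mode heavily later in the lab. A quick illustrative example from the global zone:
root@global_zone:~# zlogin name-node1 zonename
name-node1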
11. Re-enter the zone in interactive mode.
root@global_zone:~# zlogin name-node1
[Connected to zone 'name-node1' pts/4]
Oracle Corporation SunOS 5.11 11.2 June 2014
root@name-node1:~#
12. Developing for Hadoop requires a Java programming environment. You can install Java Development Kit (JDK) 7 using the following command:
root@name-node1:~# pkg install --accept jdk-7
13. Verify the Java installation:
root@name-node1:~# java -version
java version "1.7.0_55”
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) Server VM (build 24.55-b03, mixed mode
14. Create the hadoop group:
root@name-node1:~# groupadd -g 200 hadoop
For the Hadoop cluster, create the four users shown in Table 3.
Table 3. Hadoop Users Summary
User:Group Description
hdfs:hadoop The NameNodes and DataNodes run as this user.
yarn:hadoop The ResourceManager and NodeManager services run as this user.
mapred:hadoop The History Server runs as this user.
bob:staff This user will run the MapReduce jobs.
15. Add the hdfs user:
root@name-node1:~# useradd -u 200 -m -g hadoop hdfs
Set the hdfs user's password. In this lab, we are going to use the "welcome1" password for all the accounts.
root@name-node1:~# passwd hdfs
New Password:<enter hdfs password>
Re-enter new Password: <re-enter hdfs password>
passwd: password successfully changed for hdfs
Add the yarn user:
root@name-node1:~# useradd -u 201 -m -g hadoop yarn
root@name-node1:~# passwd yarn
New Password: <enter yarn password>
Re-enter new Password: <re-enter yarn password>
passwd: password successfully changed for yarn
Add the mapred user:
root@name-node1:~# useradd -u 202 -m -g hadoop mapred
root@name-node1:~# passwd mapred
New Password: <enter mapred password>
Re-enter new Password: <re-enter mapred password>
passwd: password successfully changed for mapred
Create a directory for the YARN log files:
root@name-node1:~# mkdir -p /var/log/hadoop/yarn
root@name-node1:~# chown yarn:hadoop /var/log/hadoop/yarn
Create a directory for the HDFS log files:
root@name-node1:~# mkdir -p /var/log/hadoop/hdfs
root@name-node1:~# chown hdfs:hadoop /var/log/hadoop/hdfs
Create a directory for the mapred log files:
root@name-node1:~# mkdir -p /var/log/hadoop/mapred
root@name-node1:~# chown mapred:hadoop /var/log/hadoop/mapred
Create a directory for the HDFS metadata:
root@name-node1:~# mkdir -p /var/data/1/dfs/nn
root@name-node1:~# chmod 700 /var/data/1/dfs/nn
root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/nn
Create a Hadoop data directory to store the HDFS blocks:
root@name-node1:~# mkdir -p /var/data/1/dfs/dn
root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/dn
Configure local storage directories for use by YARN:
root@name-node1:~# mkdir -p /var/data/1/yarn/local
root@name-node1:~# mkdir -p /var/data/1/yarn/logs
root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/local
root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/logs
Create the runtime directories:
root@name-node1:~# mkdir -p /var/run/hadoop/yarn
root@name-node1:~# chown yarn:hadoop /var/run/hadoop/yarn
root@name-node1:~# mkdir -p /var/run/hadoop/hdfs
root@name-node1:~# chown hdfs:hadoop /var/run/hadoop/hdfs
root@name-node1:~# mkdir -p /var/run/hadoop/mapred
root@name-node1:~# chown mapred:hadoop /var/run/hadoop/mapred
Add the user bob (later this user will run the MapReduce jobs):
root@name-node1:~# useradd -m -u 1000 bob
root@name-node1:~# passwd bob
New Password: <enter bob password>
Re-enter new Password: <re-enter bob password>
passwd: password successfully changed for bob
16. Switch to user bob:
root@name-node1:~# su - bob
17. Using your favorite editor, append the following lines to .profile:
bob@name-node1:~$ vi $HOME/.profile
# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
18. Log out using the exit command:
bob@name-node1:~$ exit
logout
19. Configure an NTP client, as shown in the following example:
Install the NTP package:
root@name-node1:~# pkg install ntp
Create the NTP client configuration files:
root@name-node1:~# cp /etc/inet/ntp.client /etc/inet/ntp.conf
root@name-node1:~# chmod +w /etc/inet/ntp.conf
root@name-node1:~# touch /var/ntp/ntp.drift
a. Edit the NTP client configuration file:
Note: In this setup, we are using the global zone as a time server so we add its
name (for example, global-zone) to /etc/inet/ntp.conf.
root@name-node1:~# vi /etc/inet/ntp.conf
Append these lines to the bottom of the file:
server global-zone prefer
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
20. Add the Hadoop cluster members' host names and IP addresses to /etc/hosts:
root@name-node1:~# vi /etc/hosts
::1 localhost
127.0.0.1 localhost loghost
192.168.1.1 name-node1
192.168.1.2 name-node2
192.168.1.3 resource-manager
192.168.1.4 data-node1
192.168.1.5 data-node2
192.168.1.6 data-node3
192.168.1.100 global-zone
21. Enable the NTP client service:
root@name-node1:~# svcadm enable ntp
22. Verify the NTP client status:
root@name-node1:~# svcs ntp
STATE STIME FMRI
online 1:04:35 svc:/network/ntp:default
Check whether the NTP client can synchronize its clock with the NTP server:
root@name-node1:~# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
global-zone LOCAL(0) 6 u 19 64 1 0.374 0.119 0.000
You can see that global-zone is the NTP server.
Set Up SSH
Set up SSH key-based authentication for the Hadoop users on the name-node1 zone in order to
enable password-less login to other zones in the Hadoop cluster:
First, switch to the user hdfs and copy the SSH public key into the ~/.ssh/authorized_keys file:
root@name-node1:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012
hdfs@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
hdfs@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Edit $HOME/.profile and append to the end of the file the following lines:
hdfs@name-node1:~$ vi $HOME/.profile
# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
1. Switch to user yarn and edit $HOME/.profile to append to the end of the file the following lines:
hdfs@name-node1:~$ su - yarn
Password: <provide yarn password>
Oracle Corporation SunOS 5.11 11.1 September 2012
yarn@name-node1:~$ vi $HOME/.profile
# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
2. Copy the SSH public key into the ~/.ssh/authorized_keys file:
yarn@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
yarn@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
3. Switch to user mapred and edit $HOME/.profile to append to the end of the file the following lines:
yarn@name-node1:~$ su - mapred
Password: <provide mapred password>
Oracle Corporation SunOS 5.11 11.1 September 2012
mapred@name-node1:~$ vi $HOME/.profile
# Set JAVA_HOME
export JAVA_HOME=/usr/java
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
4. Copy the SSH public key into the ~/.ssh/authorized_keys file:
mapred@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
mapred@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Set Up the Standby NameNode and the ResourceManager
1. Run the following command to execute the .profile script:
mapred@name-node1:~$ source $HOME/.profile
2. Check that Hadoop runs by running the following command:
mapred@name-node1:~$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
Note: Press Ctrl-D several times until you exit from the name-node1 console and return
to the global zone. You can verify that you are in the global zone by using the zonename
command:
root@global_zone:~# zonename
global
3. Create a profile for the name-node2 zone using the name-node1 profile as a template and using the buildprofile script. In a later step, we will use this profile in order to create the name-node2 zone.
Note: For arguments, the script needs the template profile's name (/root/name-node1-
template.xml, which we created in a previous step), the zone's name (name-node2), and
the zone's IP address (192.168.1.2, as shown in Table 2).
Change to the /root directory and create the zone profile there:
root@global_zone:~# cd /root
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml name-node2 192.168.1.2/24
Verify the profile's creation:
root@global_zone:~# ls -l /root/name-node2-template.xml
-rw-r--r-- 1 root root 3715 Feb 25 05:59 /root/name-node2-template.xml
4. From the global zone, run the following command to create the name-node2 zone as a clone of the name-node1:
Shut down the name-node1 zone (we can clone only halted zones):
root@global_zone:~# zoneadm -z name-node1 shutdown
Then clone the zone using the profile we created for name-node2:
root@global_zone:~# zoneadm -z name-node2 clone -c /root/name-node2-template.xml name-node1
5. Boot the name-node2 zone:
root@global_zone:~# zoneadm -z name-node2 boot
6. Log in to the name-node2 zone:
root@global_zone:~# zlogin name-node2
7. Wait two minutes and verify that all the services are up and running:
root@name-node2:~# svcs -xv
If all the services are up and running without any issues, the command will return to the
system prompt without any error message.
8. Exit from the name-node2 zone by pressing Ctrl- D.
9. Create the resource-manager profile using the name-node1 profile as a template:
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml resource-manager 192.168.1.3/24
10. Create the data-node1 profile using the name-node1 profile as a template:
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node1 192.168.1.4/24
11. Create the data-node2 profile using the name-node1 profile as a template:
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node2 192.168.1.5/24
12. Create the data-node3 profile using the name-node1 profile as a template:
root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node3 192.168.1.6/24
13. Verify the creation of the profiles:
root@global_zone:~# ls -l /root/*.xml
-rw-r--r-- 1 root root 3715 Feb 25 08:05 /root/data-node1-template.xml
-rw-r--r-- 1 root root 3715 Feb 25 08:05 /root/data-node2-template.xml
-rw-r--r-- 1 root root 3715 Feb 25 08:05 /root/data-node3-template.xml
-r-------- 1 root root 3715 Feb 25 03:11 /root/name-node1-template.xml
-rw-r--r-- 1 root root 3715 Feb 25 07:57 /root/name-node2-template.xml
-rw-r--r-- 1 root root 3735 Feb 25 08:04 /root/resource-manager-template.xml
14. From the global zone, run the following command to create the resource-manager zone as a clone of name-node1:
root@global_zone:~# zoneadm -z resource-manager clone -c /root/resource-manager-template.xml name-node1
15. Boot the resource-manager zone:
root@global_zone:~# zoneadm -z resource-manager boot
Set Up the DataNode Zones
In this section, we can leverage the integration between Oracle Solaris Zones virtualization
technology and the ZFS file system that is built into Oracle Solaris.
6. Run the following commands to create the three DataNode zones as clones of the name-node1 zone, and then boot the new zones:
root@global_zone:~# zoneadm -z data-node1 clone -c /root/data-node1-template.xml name-node1
root@global_zone:~# zoneadm -z data-node1 boot
root@global_zone:~# zoneadm -z data-node2 clone -c /root/data-node2-template.xml name-node1
root@global_zone:~# zoneadm -z data-node2 boot
root@global_zone:~# zoneadm -z data-node3 clone -c /root/data-node3-template.xml name-node1
root@global_zone:~# zoneadm -z data-node3 boot
7. Boot the name-node1 zone:
root@global_zone:~# zoneadm -z name-node1 boot
8. Check the status of the zones we've created:
root@global_zone:~# zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / solaris shared
6 name-node1 running /zones/name-node1 solaris excl
10 name-node2 running /zones/name-node2 solaris excl
11 resource-manager running /zones/resource-manager solaris excl
12 data-node1 running /zones/data-node1 solaris excl
13 data-node2 running /zones/data-node2 solaris excl
14 data-node3 running /zones/data-node3 solaris excl
We can see that all the zones are running now.
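Because zoneadm clone uses ZFS cloning under the covers, the cloned zones initially consume very little additional disk space. If you are curious, you can inspect the zone datasets from the global zone; this is an optional check, and the exact dataset names and sizes will vary with your pool layout:
root@global_zone:~# zfs list -o name,used,origin -r rpool/zones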
Verify the SSH Setup
1. Log in to the name-node1 zone:
root@global_zone:~# zlogin name-node1
[Connected to zone 'name-node1' pts/1]
Oracle Corporation SunOS 5.11 11.1 September 2012
root@name-node1:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012
2. Run the testssh script to log in to the cluster nodes using the ssh command:
Note: You will need to enter yes at the "Are you sure you want to continue connecting (yes/no)?" prompt six times, once for each zone in the cluster.
hdfs@name-node1:~$ /usr/local/Scripts/testssh
The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
Are you sure you want to continue connecting (yes/no)? yes
3. Switch to user yarn and run the testssh script again:
root@name-node1:~# su - yarn
Password: <enter yarn password>
yarn@name-node1:~$ /usr/local/Scripts/testssh
4. Switch to user mapred and run the testssh script again:
yarn@name-node1:~$ su - mapred
Password: <enter mapred password>
mapred@name-node1:~$ /usr/local/Scripts/testssh
5. Press Control-D four times to return to the global zone and repeat similar steps for name-node2:
Edit the /etc/hosts file inside name-node2 in order to add the name-node1 entry:
root@global_zone:~# zlogin name-node2 'echo "192.168.1.1 name-
node1" >> /etc/hosts'
Log in to the name-node2 zone:
root@global_zone:~# zlogin name-node2
[Connected to zone 'name-node2' pts/1]
Oracle Corporation SunOS 5.11 11.1 September 2012
root@name-node2:~# su - hdfs
Oracle Corporation SunOS 5.11 11.1 September 2012
a. Run the testssh script in order to log in to the cluster nodes using the ssh command.
Note: Enter yes at the command prompt for the "Are you sure you want to
continue connecting (yes/no)?" question.
hdfs@name-node2:~$ /usr/local/Scripts/testssh
The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
Are you sure you want to continue connecting (yes/no)? yes
Switch to user yarn:
root@name-node2:~# su - yarn
Password: <enter yarn password>
Run the testssh script:
yarn@name-node2:~$ /usr/local/Scripts/testssh
Switch to user mapred:
yarn@name-node2:~$ su - mapred
Password: <enter mapred password>
Run the testssh script:
mapred@name-node2:~$ /usr/local/Scripts/testssh
Verify Name Resolution
1. From the global zone, edit the /etc/hosts files inside resource-manager and the DataNodes in order to add the name-node1 entry:
root@global_zone:~# zlogin name-node2 'echo "192.168.1.1 name-node1" >>
/etc/hosts'
root@global_zone:~# zlogin resource-manager 'echo "192.168.1.1 name-
node1" >> /etc/hosts'
root@global_zone:~# zlogin data-node1 'echo "192.168.1.1 name-node1" >>
/etc/hosts'
root@global_zone:~# zlogin data-node2 'echo "192.168.1.1 name-node1" >>
/etc/hosts'
root@global_zone:~# zlogin data-node3 'echo "192.168.1.1 name-node1" >>
/etc/hosts'
2. Verify name resolution by ensuring that the /etc/hosts files for the global zone and all the Hadoop zones have the host entries shown below:
root@global_zone:~# for zone in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3; do echo "============== $zone ============"; zlogin $zone cat /etc/hosts; done
============== name-node1 ============
::1 localhost
127.0.0.1 localhost loghost
192.168.1.1 name-node1
192.168.1.2 name-node2
192.168.1.3 resource-manager
192.168.1.4 data-node1
192.168.1.5 data-node2
192.168.1.6 data-node3
192.168.1.100 global-zone
Note: If you are using the global zone as an NTP server, you must also add its host name
and IP address to /etc/hosts.
3. Verify the cluster using the verifycluster script:
root@global_zone:~# /usr/local/Scripts/verifycluster
If the cluster setup is correct, you will get a "The cluster is verified" message.
Note: If the verifycluster script fails with an error message, check that the /etc/hosts file in every zone includes all the zone names, as described in Step 1, and then rerun the verifycluster script.
Format the Hadoop File System
1. To format HDFS, run the following commands:
root@global_zone:~# zlogin -l hdfs name-node1
hdfs@name-node1:~$ hdfs namenode -format
2. Look for the following message, which indicates HDFS has been set up:
...
INFO common.Storage: Storage directory /var/data/1/dfs/nn has been successfully formatted.
...
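As an additional check, you can confirm that the format operation populated the NameNode metadata directory. The exact file names can vary between Hadoop releases, but you should see a VERSION file and an initial fsimage checkpoint:
hdfs@name-node1:~$ ls /var/data/1/dfs/nn/current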
Start the Hadoop Cluster
Table 4 describes the startup scripts.
Table 4. Startup Scripts
User Command Command Description
hdfs hadoop-daemon.sh start namenode Starts the HDFS daemon (NameNode process)
hdfs hadoop-daemon.sh start datanode Starts the DataNode process on all DataNodes
yarn yarn-daemon.sh start resourcemanager Starts YARN on the ResourceManager
yarn yarn-daemon.sh start nodemanager Starts the NodeManager process on all DataNodes
mapred mr-jobhistory-daemon.sh start historyserver Starts the MapReduce History Server
1. Start HDFS by running the following command:
hdfs@name-node1:~$ hadoop-daemon.sh start namenode
starting namenode, logging to /var/log/hadoop/hdfs/hadoop--namenode-name-node1.out
2. Run the jps command to verify that the NameNode process has been started:
hdfs@name-node1:~$ /usr/jdk/latest/bin/jps | grep NameNode
4223 NameNode
You should see the NameNode process ID (for example, 4223). If the process did not
start, look at the log file /var/log/hadoop/hdfs/hadoop--namenode-name-node1.log
to find the reason.
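For example, a quick way to inspect the most recent log entries (the log file name follows the pattern shown above):
hdfs@name-node1:~$ tail -20 /var/log/hadoop/hdfs/hadoop--namenode-name-node1.log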
3. Exit from the name-node1 zone by pressing Ctrl-D.
4. Start the DataNodes on all the slaves (data-node1, data-node2, and data-node3):
Run the following commands for data-node1:
root@global_zone:~# zlogin -l hdfs data-node1
hdfs@data-node1:~$ hadoop-daemon.sh start datanode
hdfs@data-node1:~$ /usr/jdk/latest/bin/jps | grep DataNode
19762 DataNode
Exit from the data-node1 zone by pressing Ctrl-D.
Run the following commands for data-node2:
root@global_zone:~# zlogin -l hdfs data-node2
hdfs@data-node2:~$ hadoop-daemon.sh start datanode
hdfs@data-node2:~$ /usr/jdk/latest/bin/jps | grep DataNode
21525 DataNode
Exit from the data-node2 zone by pressing Ctrl-D.
Run the following commands for data-node3:
root@global_zone:~# zlogin -l hdfs data-node3
hdfs@data-node3:~$ hadoop-daemon.sh start datanode
hdfs@data-node3:~$ /usr/jdk/latest/bin/jps | grep DataNode
29699 DataNode
Exit from the data-node3 zone by pressing Ctrl-D.
5. Create a /tmp directory in the HDFS file system and set its permissions to 1777 (drwxrwxrwt) using the hadoop fs command:
root@global_zone:~# zlogin -l hdfs name-node1
hdfs@name-node1:~$ hadoop fs -mkdir /tmp
Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform...using builtin-java classes where applicable. Hadoop isn’t able to use native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Oracle Solaris hadoop 2.x native libraries is a work in progress.
hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /tmp
6. Create a history directory and set permissions and ownership:
hdfs@name-node1:~$ hadoop fs -mkdir /user
hdfs@name-node1:~$ hadoop fs -mkdir /user/history
hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /user/history
hdfs@name-node1:~$ hadoop fs -chown yarn /user/history
7. Create the log directories:
hdfs@name-node1:~$ hadoop fs -mkdir /var
hdfs@name-node1:~$ hadoop fs -mkdir /var/log
hdfs@name-node1:~$ hadoop fs -mkdir /var/log/hadoop-yarn
hdfs@name-node1:~$ hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
8. Create a directory for user bob and set ownership:
hdfs@name-node1:~$ hadoop fs -mkdir /user/bob
hdfs@name-node1:~$ hadoop fs -chown bob /user/bob
9. Verify the HDFS file structure:
hdfs@name-node1:~$ hadoop fs -ls -R /
drwxrwxrwt - hdfs supergroup 0 2014-02-26 10:43 /tmp
drwxr-xr-x - hdfs supergroup 0 2014-02-26 10:58 /user
drwxr-xr-x - bob supergroup 0 2014-02-26 10:58 /user/bob
drwxrwxrwt - yarn supergroup 0 2014-02-26 10:50 /user/history
drwxr-xr-x - hdfs supergroup 0 2014-02-26 10:53 /var
drwxr-xr-x - hdfs supergroup 0 2014-02-26 10:53 /var/log
drwxr-xr-x - yarn mapred 0 2014-02-26 10:53 /var/log/hadoop-yarn
10. Exit from the name-node1 zone by pressing Ctrl-D.
11. Start the YARN resource-manager service using the following commands:
root@global_zone:~# zlogin -l yarn resource-manager
yarn@resource-manager:~$ yarn-daemon.sh start resourcemanager
yarn@resource-manager:~$ /usr/jdk/latest/bin/jps | grep ResourceManager
29776 ResourceManager
12. Start the NodeManager process on all DataNodes and verify the status:
root@global_zone:~# zlogin -l yarn data-node1 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node1 /usr/jdk/latest/bin/jps | grep NodeManager
29920 NodeManager
root@global_zone:~# zlogin -l yarn data-node2 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node2 /usr/jdk/latest/bin/jps | grep NodeManager
29930 NodeManager
root@global_zone:~# zlogin -l yarn data-node3 yarn-daemon.sh start nodemanager
root@global_zone:~# zlogin -l yarn data-node3 /usr/jdk/latest/bin/jps | grep NodeManager
29982 NodeManager
13. Start the MapReduce History Server and verify its status:
root@global_zone:~# zlogin -l mapred resource-manager
mapred@resource-manager:~$ mr-jobhistory-daemon.sh start historyserver
mapred@resource-manager:~$ /usr/jdk/latest/bin/jps | grep JobHistoryServer
654 JobHistoryServer
Exit the resource-manager zone by pressing Ctrl-D.
14. Log in to name-node1:
root@global_zone:~# zlogin -l hdfs name-node1
15. Use the following command to show basic HDFS statistics for the cluster:
hdfs@name-node1:~$ hdfs dfsadmin -report
13/11/26 05:16:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 1077762507264 (1003.74 GB)
Present Capacity: 1075847407736 (1001.96 GB)
DFS Remaining: 1075845337088 (1001.96 GB)
DFS Used: 2070648 (1.97 MB)
DFS Used%: 0.00%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
16. Use the following command to show the cluster topology:
hdfs@name-node1:~$ hdfs dfsadmin -printTopology
13/11/26 05:19:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Rack: /default-rack
10.153.111.222:50010 (data-node1)
10.153.111.223:50010 (data-node2)
10.153.111.224:50010 (data-node3)
Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Hadoop isn't able to use native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Oracle Solaris Hadoop 2.x native libraries is a work in progress.
17. Run a simple MapReduce job:
root@global_zone:~# zlogin -l bob name-node1 hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20
where:
o zlogin -l bob name-node1 specifies that the command be run as user bob on the name-node1 zone.
o hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi specifies the Hadoop .jar file.
o 10 specifies the number of maps.
o 20 specifies the number of samples.
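If you also want to exercise the "load data into the cluster and run a MapReduce job" workflow mentioned in the lab introduction, the following is a minimal sketch using the wordcount example from the same examples JAR. The file names are illustrative:
root@global_zone:~# zlogin -l bob name-node1
bob@name-node1:~$ echo "hello hadoop hello solaris" > /tmp/words.txt
bob@name-node1:~$ hadoop fs -put /tmp/words.txt /user/bob/words.txt
bob@name-node1:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/bob/words.txt /user/bob/words-out
bob@name-node1:~$ hadoop fs -cat /user/bob/words-out/part-r-00000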