Hadoop Installation
Xuhong Zhang, Jiangling Yin
Advisor: Dr. Jun Wang
Documentation
Install Hadoop in a Cluster
UCF CASS:
http://cass.eecs.ucf.edu/ganglia/?p=2&c=CASS
Prerequisites
• Several machines
• Linux as the production platform (CentOS in this example)
• Java installed (Version 6 or later)
• SSH installed
Install Hadoop
• Two steps:
1. Download Hadoop from release page
http://hadoop.apache.org/releases.html#Download
In this example, we use Hadoop 2.2.0
2. Edit the configuration files
Download and unpack Hadoop
• Unpack:
$ tar xzf hadoop-2.2.0.tar.gz
• ‘cd’ into the hadoop directory:
$ cd hadoop-2.2.0/
• Inside hadoop-2.2.0 directory
Configuration (1)
• All configuration files are under hadoop-2.2.0/etc/hadoop directory:
Configuration (2) –Environment variables
• Java
Set JAVA_HOME to the location of your JDK, for example:
$ export JAVA_HOME=/home/ji453898/jan/jdk1.7.0_03
• Hadoop
Set HADOOP_HOME to the location of your hadoop folder:
export HADOOP_HOME=/home/xzhang/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
Configuration (3) –core-site.xml
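The XML from the slide is not reproduced in the transcript; a minimal core-site.xml sketch for a cluster like this one, assuming the namenode runs on a host named master and listens on port 9000 (both are assumptions, not from the slides):

```xml
<configuration>
  <!-- URI of the default file system; points HDFS clients at the namenode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```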
Configuration (4) –hdfs-site.xml
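Again the slide XML is missing from the transcript; a minimal hdfs-site.xml sketch, with hypothetical local storage paths (adjust them to real directories on your machines):

```xml
<configuration>
  <!-- Number of block replicas; 3 is the HDFS default -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Where the namenode stores its metadata (hypothetical path) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/xzhang/hdfs/name</value>
  </property>
  <!-- Where each datanode stores blocks (hypothetical path) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/xzhang/hdfs/data</value>
  </property>
</configuration>
```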
Configuration (5) –mapred-site.xml
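A minimal mapred-site.xml sketch; in Hadoop 2.x this is the property that tells MapReduce to run on YARN:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```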
Configuration (6) –yarn-site.xml
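A minimal yarn-site.xml sketch, again assuming the resourcemanager runs on a host named master (an assumption):

```xml
<configuration>
  <!-- Host running the resourcemanager (hypothetical hostname) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Auxiliary service needed for the MapReduce shuffle phase -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```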
Configuration (7) –slaves
A list of machines (one per line), each of which runs a datanode and a nodemanager.
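For example, with three worker machines the etc/hadoop/slaves file would contain just their hostnames, one per line (the names below are hypothetical):

```
slave1
slave2
slave3
```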
Configuration-SSH passwordless login
SSH passwordless login from master to slaves
• Generate SSH key pairs (public and private)
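A key pair can be generated on the master like this (pressing Enter at the passphrase prompts, or passing -N "" as shown, leaves the key unencrypted so ssh never asks for a password):

```shell
# Generate an RSA key pair; the keys land in ~/.ssh/id_rsa (private)
# and ~/.ssh/id_rsa.pub (public)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
```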
• Append public key into authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
• Copy the id_rsa.pub file into every slave’s ~/.ssh/ folder and append it to authorized_keys there as well
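One way to sketch this step, using ssh-copy-id (which appends the public key to each slave's ~/.ssh/authorized_keys in one command; the hostnames are hypothetical):

```shell
# Push the master's public key to each slave listed in the slaves file
for host in slave1 slave2 slave3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done
```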
Run Hadoop
• Format the namenode
• Start DFS (HDFS)
• Start YARN (resourcemanager, nodemanager)
• Check whether everything started successfully
Format Namenode
• Inside bin directory
$ ./hadoop namenode -format
Start DFS and Yarn
• Start hadoop
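In Hadoop 2.x the start scripts live under $HADOOP_HOME/sbin; running them on the master should look roughly like this (they use the slaves file and the passwordless SSH set up earlier to start the daemons on every machine):

```shell
cd $HADOOP_HOME/sbin
./start-dfs.sh    # starts the namenode, secondary namenode, and datanodes
./start-yarn.sh   # starts the resourcemanager and nodemanagers
```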
Check
• On namenode
• On datanode
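A quick way to check is the JDK's jps tool, which lists running Java processes; the exact set of daemons depends on how roles are distributed across machines:

```shell
# On the master, expect to see something like:
#   NameNode, SecondaryNameNode, ResourceManager
# On each slave, expect:
#   DataNode, NodeManager
jps
```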
Run a sample MapReduce (1)
• Upload one file into dfs:
Link to the Hadoop file system shell documentation:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
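The upload itself can be sketched with the file system shell; the local file name here is hypothetical, and the target directory matches the input path used by the wordcount job below:

```shell
# Create the input directory in HDFS and upload a local text file into it
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put localfile.txt /wordcount/input/
```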
Run a sample MapReduce (2)
• Run a MapReduce job:
$ hadoop jar hadoop-mapreduce-examples-2.4.2-SNAPSHOT.jar wordcount /wordcount/input /wordcount/output
• The examples jar can be found under ……./hadoop/share/hadoop/mapreduce/
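Once the job finishes, the result can be inspected with the same file system shell; MapReduce writes reducer output as part-r-NNNNN files in the output directory:

```shell
# List the output directory and print the first reducer's output
hadoop fs -ls /wordcount/output
hadoop fs -cat /wordcount/output/part-r-00000
```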
Compiling
• Compile WordCount.java:
$ javac -classpath hadoop-core-0.20.203.0.jar -d wordcount WordCount.java
• Create a jar:
$ jar -cvf ./word.jar -C wordcount .
• List the classes:
$ jar tf word.jar
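The freshly built jar can then be submitted the same way as the bundled examples; the main class name below is an assumption (check the output of jar tf for the fully qualified class name, which may include a package prefix), and a new output directory is used because HDFS refuses to overwrite an existing one:

```shell
# Run the custom WordCount jar; "WordCount" is the assumed main class name
hadoop jar word.jar WordCount /wordcount/input /wordcount/output2
```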