Hadoop Installation
Xuhong Zhang, Jiangling Yin
Advisor: Dr. Jun Wang
Documentation
Install Hadoop in a Cluster
UCF CASS:
http://cass.eecs.ucf.edu/ganglia/?p=2&c=CASS
Prerequisites
• Several machines
• Linux as the production platform (CentOS in this example)
• Java installed (Version 6 or later)
• SSH installed
Install Hadoop
• Two steps:
1. Download Hadoop from release page
http://hadoop.apache.org/releases.html#Download
In this example, we use Hadoop 2.2.0
2. Edit the configuration files
Download and unpack Hadoop
• Unpack:
$ tar xzf hadoop-2.2.0.tar.gz
• ‘cd’ into the hadoop directory:
$ cd hadoop-2.2.0/
• Inside hadoop-2.2.0 directory
Configuration (1)
• All configuration files are under hadoop-2.2.0/etc/hadoop directory:
Configuration (2) –Environment variables
• Java
Set JAVA_HOME to the location of your JDK, for example:
$ export JAVA_HOME=/home/ji453898/jan/jdk1.7.0_03
• Hadoop
Set HADOOP_HOME to the location of your hadoop folder:
export HADOOP_HOME=/home/xzhang/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
Configuration (3) –core-site.xml
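The XML from the slide is not reproduced in the transcript; a minimal core-site.xml sketch for a cluster like this one, assuming the namenode runs on a host named master and listens on port 9000 (both are assumptions, not from the slides):

```xml
<configuration>
  <!-- URI of the default file system; points HDFS clients at the namenode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```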
Configuration (4) –hdfs-site.xml
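Again the slide XML is missing from the transcript; a minimal hdfs-site.xml sketch, with hypothetical local storage paths (adjust them to real directories on your machines):

```xml
<configuration>
  <!-- Number of block replicas; 3 is the HDFS default -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Where the namenode stores its metadata (hypothetical path) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/xzhang/hdfs/name</value>
  </property>
  <!-- Where each datanode stores blocks (hypothetical path) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/xzhang/hdfs/data</value>
  </property>
</configuration>
```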
Configuration (5) –mapred-site.xml
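A minimal mapred-site.xml sketch; in Hadoop 2.x this is the property that tells MapReduce to run on YARN:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```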
Configuration (6) –yarn-site.xml
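A minimal yarn-site.xml sketch, again assuming the resourcemanager runs on a host named master (an assumption):

```xml
<configuration>
  <!-- Host running the resourcemanager (hypothetical hostname) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Auxiliary service needed for the MapReduce shuffle phase -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```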
Configuration (7) –slaves
A list of machines (one per line), each of which runs a datanode and a nodemanager.
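For example, with three worker machines the etc/hadoop/slaves file would contain just their hostnames, one per line (the names below are hypothetical):

```
slave1
slave2
slave3
```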
Configuration-SSH passwordless login
SSH passwordless login from master to slaves
• Generate SSH key pairs (public and private)
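A key pair can be generated on the master like this (pressing Enter at the passphrase prompts, or passing -N "" as shown, leaves the key unencrypted so ssh never asks for a password):

```shell
# Generate an RSA key pair; the keys land in ~/.ssh/id_rsa (private)
# and ~/.ssh/id_rsa.pub (public)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
```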
• Append public key into authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
• Copy the id_rsa.pub file into every slave’s ~/.ssh/ folder and append it to authorized_keys there as well
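One way to sketch this step, using ssh-copy-id (which appends the public key to each slave's ~/.ssh/authorized_keys in one command; the hostnames are hypothetical):

```shell
# Push the master's public key to each slave listed in the slaves file
for host in slave1 slave2 slave3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done
```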
Run Hadoop
• Format the namenode
• Start DFS (HDFS)
• Start YARN (resourcemanager, nodemanager)
• Check whether everything started successfully
Format Namenode
• Inside bin directory
$ ./hadoop namenode -format
Start DFS and Yarn
• Start hadoop
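In Hadoop 2.x the start scripts live under $HADOOP_HOME/sbin; running them on the master should look roughly like this (they use the slaves file and the passwordless SSH set up earlier to start the daemons on every machine):

```shell
cd $HADOOP_HOME/sbin
./start-dfs.sh    # starts the namenode, secondary namenode, and datanodes
./start-yarn.sh   # starts the resourcemanager and nodemanagers
```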
Check
• On namenode
• On datanode
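A quick way to check is the JDK's jps tool, which lists running Java processes; the exact set of daemons depends on how roles are distributed across machines:

```shell
# On the master, expect to see something like:
#   NameNode, SecondaryNameNode, ResourceManager
# On each slave, expect:
#   DataNode, NodeManager
jps
```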
Run a sample MapReduce (1)
• Upload one file into dfs:
Link to the Hadoop file system shell documentation:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
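The upload itself can be sketched with the file system shell; the local file name here is hypothetical, and the target directory matches the input path used by the wordcount job below:

```shell
# Create the input directory in HDFS and upload a local text file into it
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put localfile.txt /wordcount/input/
```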
Run a sample MapReduce (2)
• Run a MapReduce job:
$ hadoop jar hadoop-mapreduce-examples-2.4.2-SNAPSHOT.jar wordcount /wordcount/input /wordcount/output
• The examples jar can be found under ……./hadoop/share/hadoop/mapreduce/
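Once the job finishes, the result can be inspected with the same file system shell; MapReduce writes reducer output as part-r-NNNNN files in the output directory:

```shell
# List the output directory and print the first reducer's output
hadoop fs -ls /wordcount/output
hadoop fs -cat /wordcount/output/part-r-00000
```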
Compiling
• Compile WordCount.java:
$ javac -classpath hadoop-core-0.20.203.0.jar -d wordcount WordCount.java
• Create a jar:
$ jar -cvf ./word.jar -C wordcount .
• List the classes:
$ jar tf word.jar
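The freshly built jar can then be submitted the same way as the bundled examples; the main class name below is an assumption (check the output of jar tf for the fully qualified class name, which may include a package prefix), and a new output directory is used because HDFS refuses to overwrite an existing one:

```shell
# Run the custom WordCount jar; "WordCount" is the assumed main class name
hadoop jar word.jar WordCount /wordcount/input /wordcount/output2
```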