install hadoop in a cluster

Hadoop Installation Xuhong Zhang, Jiangling Yin Advisor: Dr. Jun Wang

Upload: xuhong-zhang

Post on 19-Jul-2015


Page 1: Install hadoop in a cluster

Hadoop Installation

Xuhong Zhang, Jiangling Yin

Advisor: Dr. Jun Wang

Page 2: Install hadoop in a cluster

Hadoop Installation

• Documentation

Go to http://hadoop.apache.org/

Page 3: Install hadoop in a cluster

Documentation

Page 4: Install hadoop in a cluster

Install Hadoop in a Cluster

UCF CASS:

http://cass.eecs.ucf.edu/ganglia/?p=2&c=CASS

Page 5: Install hadoop in a cluster

Prerequisites

• Several machines

• Linux as the production platform (CentOS in this example)

• Java installed (Version 6 or later)

• SSH installed

Page 6: Install hadoop in a cluster

Install Hadoop

• Two steps:

1. Download Hadoop from release page

http://hadoop.apache.org/releases.html#Download

In this example, we use Hadoop 2.2.0

2. Edit the configuration files

Page 7: Install hadoop in a cluster

Download and unpack Hadoop

• Unpack the archive:

$ tar -xzf hadoop-2.2.0.tar.gz

• Change into the Hadoop directory:

$ cd hadoop-2.2.0/

• Inside hadoop-2.2.0 directory

Page 8: Install hadoop in a cluster

Configuration (1)

• All configuration files are under hadoop-2.2.0/etc/hadoop directory:

Page 9: Install hadoop in a cluster

Configuration (2) –Environment variables

• Java

Set JAVA_HOME to the location of your JDK, for example:

$ export JAVA_HOME=/home/ji453898/jan/jdk1.7.0_03

• Hadoop

Set HADOOP_HOME to the location of your Hadoop folder:

$ export HADOOP_HOME=/home/xzhang/hadoop-2.2.0

$ export PATH=$PATH:$HADOOP_HOME/bin
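A quick sanity check can catch a mistyped path before any daemon is started. The sketch below reuses the example paths from this slide as defaults; `check_dir` is a hypothetical helper, not part of Hadoop.

```shell
# Fall back to the example paths from the slide; substitute your own locations.
export JAVA_HOME="${JAVA_HOME:-/home/ji453898/jan/jdk1.7.0_03}"
export HADOOP_HOME="${HADOOP_HOME:-/home/xzhang/hadoop-2.2.0}"
export PATH="$PATH:$HADOOP_HOME/bin"

# check_dir (hypothetical helper): $1 = variable name, $2 = its value.
# Prints a status line and returns non-zero if the value is not a directory.
check_dir() {
  if [ -d "$2" ]; then
    echo "$1 ok: $2"
  else
    echo "$1 does not point to a directory: $2"
    return 1
  fi
}

# "|| true" keeps the script going so both variables are reported.
check_dir JAVA_HOME "$JAVA_HOME" || true
check_dir HADOOP_HOME "$HADOOP_HOME" || true
```

Adding the two `export` lines to `~/.bashrc` on every node is one common way to make the settings permanent.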

Page 10: Install hadoop in a cluster

Configuration (3) –core-site.xml
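The slide's screenshot is not part of this transcript. As a minimal sketch, core-site.xml usually names the default file system through the `fs.defaultFS` property. The host name `master` and port 9000 below are placeholders, not values from the slides, and the script overwrites any existing core-site.xml.

```shell
# Configuration directory, as described on the Configuration (1) slide.
CONF_DIR="${HADOOP_HOME:-$PWD/hadoop-2.2.0}/etc/hadoop"
mkdir -p "$CONF_DIR"

# "master" and 9000 are placeholder values; use your NameNode's host and port.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF
```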

Page 11: Install hadoop in a cluster

Configuration (4) –hdfs-site.xml

Page 12: Install hadoop in a cluster

Configuration (4) –hdfs-site.xml
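The slide contents are not preserved here. A minimal hdfs-site.xml sketch typically sets the replication factor and the local storage directories. The replication value and both directory paths below are assumptions chosen for illustration, and the script overwrites any existing hdfs-site.xml.

```shell
CONF_DIR="${HADOOP_HOME:-$PWD/hadoop-2.2.0}/etc/hadoop"
mkdir -p "$CONF_DIR"

# Replication factor 2 and the /home/xzhang/hadoop-data paths are placeholders.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/xzhang/hadoop-data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/xzhang/hadoop-data/data</value>
  </property>
</configuration>
EOF
```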

Page 13: Install hadoop in a cluster

Configuration (5) –mapred-site.xml

Page 14: Install hadoop in a cluster

Configuration (5) –mapred-site.xml
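The slide contents are not preserved here. For a YARN-based cluster, the one setting mapred-site.xml usually needs is `mapreduce.framework.name`. The sketch below writes only that property and overwrites any existing mapred-site.xml; the 2.2.0 tarball typically ships just a mapred-site.xml.template, so the file may not exist yet.

```shell
CONF_DIR="${HADOOP_HOME:-$PWD/hadoop-2.2.0}/etc/hadoop"
mkdir -p "$CONF_DIR"

# Run MapReduce jobs on YARN rather than the local job runner.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```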

Page 15: Install hadoop in a cluster

Configuration (6) –yarn-site.xml
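The slide contents are not preserved here. A minimal yarn-site.xml sketch tells the NodeManagers where the ResourceManager runs and enables the MapReduce shuffle service. The host name `master` is a placeholder, and the script overwrites any existing yarn-site.xml.

```shell
CONF_DIR="${HADOOP_HOME:-$PWD/hadoop-2.2.0}/etc/hadoop"
mkdir -p "$CONF_DIR"

# "master" is a placeholder for the ResourceManager host.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```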

Page 16: Install hadoop in a cluster

Configuration (7) –slaves

A list of machines (one per line), each of which runs a DataNode and a NodeManager (the YARN counterpart of the Hadoop 1 TaskTracker).
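As an illustration, the file could be written like this; `slave1` to `slave3` are hypothetical host names, and the script overwrites any existing slaves file.

```shell
CONF_DIR="${HADOOP_HOME:-$PWD/hadoop-2.2.0}/etc/hadoop"
mkdir -p "$CONF_DIR"

# One worker host name per line; these names are placeholders.
printf '%s\n' slave1 slave2 slave3 > "$CONF_DIR/slaves"

cat "$CONF_DIR/slaves"
```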

Page 17: Install hadoop in a cluster

Configuration-SSH passwordless login

SSH passwordless login from master to slaves

• Generate an SSH key pair (public and private)
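The command shown on the slide is not preserved in this transcript. One common way to generate the key pair on the master is sketched below, assuming an RSA key with an empty passphrase. `KEY_FILE` is a variable introduced for this example; it defaults to the standard `~/.ssh/id_rsa` location.

```shell
# KEY_FILE is this example's own variable; override it to experiment safely.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
mkdir -p "$(dirname "$KEY_FILE")"

if [ -f "$KEY_FILE" ]; then
  # Never overwrite an existing key.
  echo "Key already exists: $KEY_FILE"
elif command -v ssh-keygen >/dev/null 2>&1; then
  # -N "" sets an empty passphrase so the start scripts can log in unattended.
  ssh-keygen -q -t rsa -N "" -f "$KEY_FILE"
else
  echo "ssh-keygen not found; install an SSH client first"
fi
```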

Page 18: Install hadoop in a cluster

Configuration-SSH passwordless login

• Append public key into authorized_keys

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

• Copy the id_rsa.pub file to every slave and append it to that slave's ~/.ssh/authorized_keys
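Distributing the key by hand gets tedious beyond a few machines. The sketch below loops over a slaves file and calls `ssh-copy-id`, which appends the public key to the remote `authorized_keys`. `push_key` and the `DRY_RUN` switch are hypothetical names introduced here; by default the function only prints the commands it would run.

```shell
# push_key (hypothetical helper): $1 = slaves file, $2 = public key file.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
push_key() {
  while read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue          # skip blank lines
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh-copy-id -i $2 $host"
    else
      # Redirect stdin so ssh does not swallow the rest of the slaves file.
      ssh-copy-id -i "$2" "$host" < /dev/null
    fi
  done < "$1"
}

# Example: DRY_RUN=0 push_key "$HADOOP_HOME/etc/hadoop/slaves" ~/.ssh/id_rsa.pub
```

Afterwards, `ssh <slave>` from the master should log in without asking for a password.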

Page 19: Install hadoop in a cluster

Run Hadoop

• Format the NameNode

• Start DFS (HDFS)

• Start YARN (ResourceManager, NodeManager)

• Check that the daemons started successfully

Page 20: Install hadoop in a cluster

Format Namenode

• Inside bin directory

$ ./hadoop namenode -format

Page 21: Install hadoop in a cluster

Start DFS and YARN

• Start Hadoop
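The commands from the slide are not preserved in this transcript. Hadoop 2.x ships `start-dfs.sh` and `start-yarn.sh` in its `sbin` directory; the wrapper below, `start_cluster`, is a hypothetical convenience function that runs both on the master and stops if either script is missing.

```shell
HADOOP_HOME="${HADOOP_HOME:-$PWD/hadoop-2.2.0}"

# start_cluster (hypothetical wrapper): run the HDFS and YARN start scripts in order.
start_cluster() {
  for script in start-dfs.sh start-yarn.sh; do
    if [ -x "$HADOOP_HOME/sbin/$script" ]; then
      "$HADOOP_HOME/sbin/$script"
    else
      echo "missing: $HADOOP_HOME/sbin/$script"
      return 1
    fi
  done
}

# Run on the master node once the NameNode has been formatted:
#   start_cluster
```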

Page 22: Install hadoop in a cluster

Check

• On namenode

• On datanode
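The screenshots for this check are not preserved. A common way to verify the daemons is the JDK's `jps` tool, which lists running Java processes. The sketch below assumes a default layout (NameNode, SecondaryNameNode and ResourceManager on the master; DataNode and NodeManager on each slave); `missing_daemons` is a hypothetical helper that reports any expected daemon absent from the `jps` output.

```shell
# missing_daemons (hypothetical helper): reads `jps` output on stdin and
# prints each expected daemon name (given as arguments) that is not running.
missing_daemons() {
  seen="$(awk '{print $2}')"            # second column of jps is the class name
  for d in "$@"; do
    printf '%s\n' "$seen" | grep -qx "$d" || echo "$d"
  done
}

# On the master:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# On each slave:
#   jps | missing_daemons DataNode NodeManager
```

No output means every expected daemon was found.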

Page 23: Install hadoop in a cluster

Run a sample MapReduce (1)

• Upload one file into dfs:

Link to the Hadoop file system shell documentation: http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
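The upload command from the slide is not preserved. A minimal sketch using the file system shell is shown below: it creates a small local file and, if `hadoop` is on the PATH, copies it into `/wordcount/input`, the input directory used by the job on the next slide. The file name `sample.txt` and its contents are illustrative.

```shell
# Create a small local input file (name and contents are placeholders).
printf 'hello hadoop\nhello cluster\n' > sample.txt

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /wordcount/input        # create the HDFS directory
  hadoop fs -put sample.txt /wordcount/input/ # upload the local file
  hadoop fs -ls /wordcount/input              # confirm it arrived
else
  echo "hadoop is not on PATH; set HADOOP_HOME and PATH first"
fi
```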

Page 24: Install hadoop in a cluster

Run a sample MapReduce (2)

• Run a MapReduce job:

$ hadoop jar hadoop-mapreduce-examples-2.4.2-SNAPSHOT.jar wordcount /wordcount/input /wordcount/output

The examples jar is located under ……./hadoop/share/hadoop/mapreduce/

Page 25: Install hadoop in a cluster

Compiling

• Compile WordCount.java (the wordcount output directory must already exist):

$ javac -classpath hadoop-core-0.20.203.0.jar -d wordcount WordCount.java

• Create a jar:

$ jar -cvf ./word.jar -C wordcount .

• List the classes in the jar:

$ jar tf word.jar