
Hadoop Cluster Configuration on AWS EC2

----------------------------------------------------------------------------------------------------------------------------
Provision instances on AWS EC2: one master and ten slaves.

ec2-50-17-21-209.compute-1.amazonaws.com master
ec2-54-242-251-124.compute-1.amazonaws.com slave1
ec2-23-23-17-15.compute-1.amazonaws.com slave2
ec2-50-19-79-241.compute-1.amazonaws.com slave3
ec2-50-16-49-229.compute-1.amazonaws.com slave4
ec2-174-129-99-84.compute-1.amazonaws.com slave5
ec2-50-16-105-188.compute-1.amazonaws.com slave6
ec2-174-129-92-105.compute-1.amazonaws.com slave7
ec2-54-242-20-144.compute-1.amazonaws.com slave8
ec2-54-243-24-10.compute-1.amazonaws.com slave9
ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
Designate one instance as the master and the remaining ten as slaves.

----------------------------------------------------------------------------------------------------------------------------
Make sure passwordless SSH works from the master to all of the slaves (the Hadoop start scripts rely on it).
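A minimal sketch of one way to set that up, run on the master as ec2-user; mykey.pem is a placeholder for the EC2 key pair the instances were launched with, and only two of the ten slave hostnames are shown:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa          # passphrase-less key, generated once
for h in ec2-54-242-251-124.compute-1.amazonaws.com \
         ec2-23-23-17-15.compute-1.amazonaws.com; do    # ...repeat for all ten slaves
  cat ~/.ssh/id_rsa.pub | \
    ssh -i ~/mykey.pem ec2-user@$h "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
done
ssh ec2-54-242-251-124.compute-1.amazonaws.com hostname   # should log in without a password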

----------------------------------------------------------------------------------------------------------------------------
On the master, add each instance's private IP, public DNS name, and hostname alias (master, slave1, ...) to /etc/hosts.

----------------------------------------------------------------------------------------------------------------------------
The master's /etc/hosts file looks like this:

127.0.0.1 localhost localhost.localdomain
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com master
10.155.244.83 ec2-54-242-251-124.compute-1.amazonaws.com slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com slave8
10.155.244.71 ec2-54-243-24-10.compute-1.amazonaws.com slave9
10.155.244.46 ec2-204-236-205-227.compute-1.amazonaws.com slave10

----------------------------------------------------------------------------------------------------------------------------
Each slave's /etc/hosts file looks like this:

Remove the 127.0.0.1 line on all slaves.

10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com master
10.155.244.83 ec2-54-242-251-124.compute-1.amazonaws.com slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com slave8
10.155.244.71 ec2-54-243-24-10.compute-1.amazonaws.com slave9
10.155.244.46 ec2-204-236-205-227.compute-1.amazonaws.com slave10
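One way to push that file out from the master, sketched here on the assumption that the slave variant has been saved as /tmp/hosts.slaves on the master and that ec2-user can sudo on each slave:

for h in slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10; do
  scp /tmp/hosts.slaves $h:/tmp/hosts.slaves
  ssh -t $h "sudo cp /tmp/hosts.slaves /etc/hosts"
done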

----------------------------------------------------------------------------------------------------------------------------
Download a Hadoop release from the Apache Hadoop releases page and unpack it on the master (e.g. /usr/local/hadoop-1.0.4).
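A sketch of fetching and unpacking that release on the master; the archive.apache.org URL is an assumption, and any Apache mirror that still carries hadoop-1.0.4 will do:

cd /usr/local
sudo wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
sudo tar -xzf hadoop-1.0.4.tar.gz                 # unpacks to /usr/local/hadoop-1.0.4
sudo chown -R ec2-user:ec2-user hadoop-1.0.4      # let ec2-user edit the conf/ files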

----------------------------------------------------------------------------------------------------------------------------
Open the hadoop-env.sh file in the hadoop-1.0.4/conf/ folder

and set the environment variables JAVA_HOME, HADOOP_HOME, LD_LIBRARY_PATH, HADOOP_OPTS, and HADOOP_HEAPSIZE:

export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-1.0.4/
export LD_LIBRARY_PATH=/usr/local/hadoop-1.0.4/lib/native/Linux-amd64-64
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_HEAPSIZE=400000        # in MB
----------------------------------------------------------------------------------------------------------------------------
Open the hdfs-site.xml file

and set the following parameters:

hadoop.log.dir = /media/ephemeral0/logs
hadoop.tmp.dir = /media/ephemeral0/tmp-${user.name}
dfs.data.dir = /media/ephemeral0/data-${user.name}
dfs.name.dir = /media/ephemeral0/name-${user.name}
fs.default.name = hdfs://master:9000
dfs.replication = 3 (default block replication; the actual number of replicas can be specified when a file is created, and this default is used when replication is not specified at create time)
dfs.block.size = 536870912 (default block size in bytes, i.e. 512 MB)
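For reference, each name/value pair above becomes a <property> element in the XML file; a minimal sketch of that layout, written via a shell heredoc and showing only two of the properties (mapred-site.xml and core-site.xml below use exactly the same format):

cat > /usr/local/hadoop-1.0.4/conf/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
</configuration>
EOF

In practice the real file carries one <property> block for every parameter listed above.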

----------------------------------------------------------------------------------------------------------------------------
Open the mapred-site.xml file

and set the following parameters:

hadoop.log.dir = /media/ephemeral0/logs
mapred.child.java.opts = -Xmx400m
dfs.datanode.max.xcievers = 60000
mapred.tasktracker.map.tasks.maximum = 14
mapred.tasktracker.reduce.tasks.maximum = 14
mapred.system.dir = /media/ephemeral0/system-${user.name} (system directory for map and reduce tasks)

hadoop.log.dir = /media/ephemeral0/log-${user.name}

mapred.job.tracker = master:9001 (the host and port that the MapReduce JobTracker runs at; if "local", jobs are run in-process as a single map and reduce task)
mapred.tasktracker.map.tasks.maximum = 10 (the maximum number of map tasks run simultaneously by a TaskTracker)

mapreduce.map.output.compress = true
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.GzipCodec
mapred.create.symlink = true
mapred.child.ulimit = unlimited

----------------------------------------------------------------------------------------------------------------------------
Open the core-site.xml file

and set the following parameters:

hadoop.tmp.dir = /media/ephemeral0/tmp-${user.name}
dfs.data.dir = /media/ephemeral0/data-${user.name}
dfs.name.dir = /media/ephemeral0/name-${user.name}
fs.default.name = hdfs://master:9000

----------------------------------------------------------------------------------------------------------------------------
Open the masters file and set the following:

master

----------------------------------------------------------------------------------------------------------------------------
Open the slaves file and set the following:

slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
slave10
----------------------------------------------------------------------------------------------------------------------------

Give ownership to the ec2-user on all slaves for the /media folder (and any other directories used by Hadoop).
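A sketch of that change, run from the master over SSH; it assumes the ec2-user group is also named ec2-user and that the Hadoop directories all live under /media/ephemeral0, as configured above:

for h in slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10; do
  ssh -t $h "sudo chown -R ec2-user:ec2-user /media/ephemeral0"
done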

----------------------------------------------------------------------------------------------------------------------------
From the master, copy the full hadoop-1.0.4 folder to each slave, e.g.: scp -r /usr/local/hadoop-1.0.4 slave1:/usr/local/hadoop-1.0.4

----------------------------------------------------------------------------------------------------------------------------
Repeat the copy from the master to all ten slaves, as in the sketch below.
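A loop over the slave aliases from /etc/hosts does this in one pass; it assumes ec2-user can write to /usr/local on the slaves (adjust ownership there first if it cannot):

for h in slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10; do
  scp -r /usr/local/hadoop-1.0.4 $h:/usr/local/
done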

----------------------------------------------------------------------------------------------------------------------------
Open ports 50000-50100 (and the NameNode/JobTracker ports 9000 and 9001 configured above) in the instances' security group, in the AWS console.
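The same rule can also be added from the command line with the AWS CLI; a sketch, assuming a security group named hadoop-cluster and that intra-cluster traffic comes from the 10.155.0.0/16 range used in /etc/hosts above:

aws ec2 authorize-security-group-ingress \
    --group-name hadoop-cluster \
    --protocol tcp \
    --port 50000-50100 \
    --cidr 10.155.0.0/16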

----------------------------------------------------------------------------------------------------------------------------
Run hadoop namenode -format on the master, then start the cluster with start-all.sh.
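Concretely, run from the Hadoop install directory on the master; jps is just a quick way to confirm the daemons came up:

cd /usr/local/hadoop-1.0.4
bin/hadoop namenode -format      # format HDFS (first run only; initializes dfs.name.dir)
bin/start-all.sh                 # starts NameNode/JobTracker here and DataNode/TaskTracker on the slaves
jps                              # on the master, should list NameNode, SecondaryNameNode and JobTracker
----------------------------------------------------------------------------------------------------------------------------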