DIPENDRA KUSI 2/1/17
https://www.linkedin.com/in/er-dipendra-kusi-b3674193
HADOOP SETUP
Installation and Setup of Hadoop
Step 1: Go to the VirtualBox website and download VirtualBox:
https://www.virtualbox.org/wiki/Downloads
Step 2: Go to the Cloudera website and download the Cloudera QuickStart VM:
http://www.cloudera.com/downloads/quickstart_vms/5-8.html
Step 3: Start the Cloudera QuickStart VM in VirtualBox.
Step 4:
Open a terminal and check that Hadoop is available by printing its version:
$ hadoop version
Step 5:
Also check the Hadoop configuration through the browser.
Step 6:
Go to http://tiny.cloudera.com/hadoopTutorialSample, download the word count source code, and extract it.
Step 7:
Open a terminal in the directory that contains wordcount.jar.
Create your own HDFS folders for the input data:
$ hadoop fs -mkdir /user/cloudera/Hadoop_data /user/cloudera/Hadoop_data/input
Step 8:
Copy the file to be processed (file0) into the /user/cloudera/Hadoop_data/input folder:
$ hadoop fs -put file0 /user/cloudera/Hadoop_data/input
Step 9:
Run the word count jar in Hadoop to count the words in file0:
$ hadoop jar wordcount.jar /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output
Running this command throws a ClassNotFoundException. This means the jar file does not explicitly define its main class, so specify the main class, org.myorg.WordCount, on the command line. With the class given, wordcount.jar runs in Hadoop:
$ hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output
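Why does the first command in Step 9 fail? The hadoop jar command hands the jar to Hadoop's RunJar class which, as far as we understand its source, looks for a Main-Class entry in the jar's manifest and, if there is none, treats the first argument after the jar name as the class to run (converting "/" to "."). Without a Main-Class, the input path is therefore mistaken for a class name. The sketch below models only that selection step with the standard java.util.jar API; MainClassResolver is a hypothetical name and this is not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Simplified model of how "hadoop jar <jar> [mainClass] args..." picks the class to run.
// MainClassResolver is a hypothetical name; this is not Hadoop's RunJar source.
class MainClassResolver {

    // Returns the class name to load, followed by the arguments passed to its main().
    static List<String> resolve(Manifest manifest, List<String> argsAfterJar) {
        String mainClass = null;
        if (manifest != null) {
            // Main-Class from META-INF/MANIFEST.MF, if the jar declares one
            mainClass = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
        int first = 0;
        if (mainClass == null) {
            // No Main-Class: the first argument after the jar is taken as the class name
            mainClass = argsAfterJar.get(first++);
        }
        List<String> result = new ArrayList<>();
        result.add(mainClass.replace('/', '.'));
        result.addAll(argsAfterJar.subList(first, argsAfterJar.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> cmdArgs = Arrays.asList(
                "/user/cloudera/Hadoop_data/input", "/user/cloudera/Hadoop_data/output");

        // Jar without Main-Class: the input path is treated as a class name
        System.out.println(resolve(new Manifest(), cmdArgs));

        // Jar whose manifest names WordCount: both paths reach WordCount.main()
        Manifest withMain = new Manifest();
        withMain.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "WordCount");
        System.out.println(resolve(withMain, cmdArgs));
    }
}
```

With an empty manifest, the first element comes out as .user.cloudera.Hadoop_data.input, which is not a loadable class. Passing org.myorg.WordCount explicitly puts a real class name in that position, and the two paths become the job's arguments.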
Step 10:
Check the contents of the output:
$ hadoop fs -cat /user/cloudera/Hadoop_data/output/*
The output contains the word counts, as expected.
Create a jar file and run it in Hadoop
Step 11: Now let’s create a Java file in Eclipse, export it to a jar, and run it in Hadoop.
First, create the project Hadoop_first_project in Eclipse.
Now create the WordCount class and paste in the following code:
import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic Hadoop options, then calls run()
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "wordcount");
        // Lets Hadoop locate the jar that contains this class
        job.setJarByClass(this.getClass());
        // Use TextInputFormat, the default unless job.setInputFormatClass is used
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input folder
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output folder, must not exist yet
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Emits (word, 1) for every token found in each input line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        @Override
        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {
            String line = lineText.toString();
            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue; // skip the empty strings the split can produce
                }
                context.write(new Text(word), one);
            }
        }
    }

    // Sums the 1s emitted for each word
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
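To see what this job computes without a cluster, the sketch below replays the same logic in plain Java: the WORD_BOUNDARY split from the Map class, then grouping and summing as the Reduce class does. LocalWordCount and its methods are hypothetical names for illustration; the sketch does not use Hadoop, and the TreeMap ordering only approximates the sorted keys a single reducer receives (it matches for ASCII text).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Plain-Java replay of the WordCount logic; no Hadoop classes involved.
// LocalWordCount is a hypothetical name used only for illustration.
class LocalWordCount {
    // Same pattern as WORD_BOUNDARY in the Map class
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Map step: split a line into tokens, dropping the empty strings split() can produce
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    // Shuffle and reduce steps: group identical tokens and add one per occurrence
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        for (String line : lines) {
            for (String word : tokens(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats results as "word<TAB>count" lines, the default text output layout
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat the hat", "Hello, world!");
        System.out.print(render(count(lines)));
    }
}
```

Splitting on word boundaries keeps punctuation as separate tokens, so a line such as Hello, world! yields the four keys "Hello", ",", "world" and "!". This is worth keeping in mind when reading the job's output.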
At this point Eclipse reports errors because the Hadoop libraries are missing from the build path, so add the required libraries.
Open the project properties.
Go to Java Build Path > Libraries.
Click Add External JARs and add the jars from the following location:
File System -> usr -> lib -> hadoop
Add all the jar files there.
Go to the client-0.20 folder and add all the jars from there as well.
Go to the lib folder and add all the jars from there as well.
Click OK. All the errors should disappear.
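If errors remain, one quick way to confirm from inside the project that the Hadoop classes are now visible is to try loading them by name. ClasspathCheck is a hypothetical helper, and the class names it checks are simply ones that WordCount imports; run it from Eclipse so that it uses the project's build path.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic: reports whether each class can be loaded from the current classpath.
class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> needed = Arrays.asList(
                "org.apache.hadoop.conf.Configured",
                "org.apache.hadoop.mapreduce.Job",
                "org.apache.log4j.Logger");
        for (String name : needed) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

A MISSING line points at the jar folder that still needs to be added under Add External JARs.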
Now export the project to a jar file:
Right-click the project -> Export
Select JAR file -> Next
Select the project and the export location of the jar file, then click Next twice.
Click Browse to select the main class (WordCount).
Click OK -> Finish.
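Choosing the main class in the export wizard is what records a Main-Class entry in the jar's META-INF/MANIFEST.MF. The sketch below reads that entry back with java.util.jar.JarFile, so you can confirm the export worked before running the jar in Hadoop. ManifestCheck is a hypothetical name, and writeJar only builds a small test jar for illustration; it is not what Eclipse runs internally.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper for checking which main class an exported jar declares.
class ManifestCheck {

    // Returns the Main-Class from META-INF/MANIFEST.MF, or null if none is set
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest m = jf.getManifest(); // null when the jar has no manifest at all
            return m == null ? null : m.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Writes a jar holding only a manifest; mainClass == null means no Main-Class entry
    static File writeJar(File target, String mainClass) throws IOException {
        Manifest m = new Manifest();
        // Manifest-Version must be present, otherwise no main attributes are written
        m.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            m.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(target), m)) {
            // no class entries are needed just to inspect the manifest
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ManifestCheck <path-to-jar>");
            return;
        }
        System.out.println("Main-Class: " + mainClassOf(new File(args[0])));
    }
}
```

For example, java ManifestCheck mywordcount.jar should print Main-Class: WordCount. If it prints null, the main class was not recorded, and the class name has to be given on the hadoop jar command line as in Step 9.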
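Now go to the location where mywordcount.jar was exported and run the following commands.
First, delete the output folder created by the previous run, because a MapReduce job refuses to start if its output folder already exists:
$ hadoop fs -rm -r /user/cloudera/Hadoop_data/output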
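The delete is needed because, before a job starts, FileOutputFormat checks the output path and rejects the job with a FileAlreadyExistsException if that path already exists. The sketch below mimics that check, and the recursive delete, on the local filesystem with java.nio.file; OutputGuard is a hypothetical name, and it does not talk to HDFS.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analog of the output checks around a MapReduce run.
// OutputGuard is a hypothetical name; Hadoop performs the real check against HDFS.
class OutputGuard {

    // Mirrors the job's refusal to overwrite: fail if the output path exists
    static void checkOutputDoesNotExist(Path output) throws FileAlreadyExistsException {
        if (Files.exists(output)) {
            throw new FileAlreadyExistsException("Output directory " + output + " already exists");
        }
    }

    // Analogous to "hadoop fs -rm -r": delete files first, then the directories holding them
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before the directories that contain them
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path output = Paths.get(args.length > 0 ? args[0] : "output");
        deleteRecursively(output);       // remove results of the previous run
        checkOutputDoesNotExist(output); // now passes, so a new run could write here
        System.out.println(output + " is free for the next run");
    }
}
```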
Then run the jar in Hadoop. There is no need to specify the class, since the entry point was already defined during the export:
$ hadoop jar mywordcount.jar /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output