HADOOP SETUP: Installation and Setup of Hadoop

DIPENDRA KUSI 2/1/17
https://www.linkedin.com/in/er-dipendra-kusi-b3674193

Post on 21-Feb-2017

Step 1: Go to the VirtualBox site and download VirtualBox:

https://www.virtualbox.org/wiki/Downloads

Step 2: Go to the Cloudera site and download the Cloudera QuickStart VM:

http://www.cloudera.com/downloads/quickstart_vms/5-8.html

Step 3: Run the Cloudera QuickStart VM in VirtualBox.

Step 4: Check from a terminal that Hadoop is installed and available on the command line:

$ hadoop version

Step 5: Also check the Hadoop configuration through the browser.

Step 6: Go to http://tiny.cloudera.com/hadoopTutorialSample, download the word count source code, and extract it.

Step 7: Open a terminal in the directory that contains wordcount.jar.

Create your own folders for the input data:

$ hadoop fs -mkdir /user/cloudera/Hadoop_data /user/cloudera/Hadoop_data/input

Step 8: Put the file to be processed into the /user/cloudera/Hadoop_data/input folder:

$ hadoop fs -put file0 /user/cloudera/Hadoop_data/input

Step 9: Run the word count jar in Hadoop to count the words in file0:

$ hadoop jar wordcount.jar /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output

Running this command throws a ClassNotFoundException. This means that the jar file does not explicitly define the class to run, so hadoop jar treats the first argument after the jar name (here the input path) as the class name. Pass the class to run explicitly; in this jar it is org.myorg.WordCount.

Now wordcount.jar runs in Hadoop:

$ hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output
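Whether a jar declares a class to run can be checked before submitting it. The following is a minimal sketch, not part of the tutorial's sample code: the class name ManifestMainClassSketch and both helper methods are hypothetical, and it assumes that the class to run is taken from the Main-Class attribute of the jar's manifest. It only uses the standard java.util.jar API, so it runs without Hadoop.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper; it is not part of the tutorial's project or of Hadoop.
class ManifestMainClassSketch {

    // Returns the Main-Class attribute of the jar's manifest, or null if there is none.
    static String mainClassOf(File jar) {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a throwaway jar that contains only a manifest,
    // so the sketch can be tried without wordcount.jar at hand.
    static File writeManifestOnlyJar(String mainClass) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        try {
            File jar = File.createTempFile("manifest-sketch", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
                // No class entries are needed; only the manifest matters here.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a jar path as the first argument, or let the sketch create an empty one.
        File jar = args.length > 0 ? new File(args[0]) : writeManifestOnlyJar(null);
        String mainClass = mainClassOf(jar);
        System.out.println(mainClass == null
                ? "No Main-Class: pass the class name to hadoop jar"
                : "Main-Class: " + mainClass);
    }
}
```

Run against the downloaded wordcount.jar, a null result would be consistent with the ClassNotFoundException seen in Step 9.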

Step 10: Now check the contents of the output:

$ hadoop fs -cat /user/cloudera/Hadoop_data/output/*

So the output contains the word counts, as expected.

Create a Jar File and Run It in Hadoop

Step 11: Now create a Java file in Eclipse, export it to a jar, and run it in Hadoop.

First, create a project named Hadoop_first_project in Eclipse.

Now create a WordCount class and paste the code below:

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    // args[0] is the HDFS input path, args[1] is the HDFS output path.
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "wordcount");
        job.setJarByClass(this.getClass());
        // Use TextInputFormat, the default unless job.setInputFormatClass is used
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Emits (word, 1) for every non-empty token of each input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        @Override
        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {
            String line = lineText.toString();
            Text currentWord = new Text();
            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                currentWord.set(word);
                context.write(currentWord, one);
            }
        }
    }

    // Sums the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
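To see what the mapper and reducer compute without a cluster, the same logic can be tried in plain Java. This is a minimal sketch under stated assumptions: the class WordBoundarySketch and its countWords method are hypothetical names that do not appear in the tutorial, and the in-memory TreeMap only stands in for Hadoop's shuffle and reduce phase. The regular expression is the WORD_BOUNDARY pattern from the Map class.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical stand-alone class; it needs no Hadoop libraries.
class WordBoundarySketch {

    // Same pattern as WORD_BOUNDARY in the Map class of WordCount.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Mimics map and reduce in memory: one count per token, summed per distinct token.
    static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : WORD_BOUNDARY.split(line)) {
                if (token.isEmpty()) {
                    continue; // the split can yield empty tokens, which the mapper also skips
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {Bye=1, Hadoop=1, Hello=2, World=2}
        System.out.println(countWords("Hello World Bye World", "Hello Hadoop"));
    }
}
```

The counts are case-sensitive, and because the pattern splits at every word boundary, punctuation marks come out as tokens of their own.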

At this point Eclipse reports errors because the Hadoop libraries are missing from the project, so load the required libraries.

Go to the project properties.

Go to Java Build Path and open the Libraries tab.

Now click Add External JARs and add the jars from the following location:

File System -> usr -> lib -> hadoop

Add all the jar files found there.

Go to the client-0.20 folder and add all the jars from there as well.

Go to the lib folder and add all the jars from there as well.

Click OK. All the errors should disappear.

Now export the project to a jar file.

Right-click the project -> Export

Select JAR file -> Next

Now select the project, choose the export location of the jar file, and click Next twice.

Click Browse to select the main class, which becomes the jar's entry point.

Click OK -> Finish

Now go to the location where mywordcount.jar was exported.

Delete the output folder that was created previously:

$ hadoop fs -rm -r /user/cloudera/Hadoop_data/output

Then run the jar in Hadoop (there is no need to specify the class, since the entry point was already defined during the export):

$ hadoop jar mywordcount.jar /user/cloudera/Hadoop_data/input /user/cloudera/Hadoop_data/output