2014 International Software Testing Conference in Seoul
TRANSCRIPT
Seoul Software Testing Conference
Testing Big Data: Unit Test in Hadoop (Part II)
Jongwook Woo (PhD)
High-Performance Internet Computing Center (HiPIC)
Educational Partner with Cloudera and Grants Awardee of Amazon AWS
Computer Information Systems Department
California State University, Los Angeles
Contents
- Test in General
- Use Cases: Big Data in Hadoop and Ecosystems
- Unit Test in Hadoop
Test in General
- Quality Assurance
- TDD (Test-Driven Development)
  - Unit Test: tests functional units of the S/W
- BDD (Behavior-Driven Development)
  - Based on TDD; tests the behavior of the S/W
- Integration Test: tests integrated components
  - A group of unit tests
- CI (Continuous Integration) server
  - Hudson, Jenkins, etc.
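Before moving to Hadoop, it may help to recall what a plain unit test of a "functional unit" looks like. The sketch below is illustrative only (the `countWords` method is a hypothetical unit under test, and plain asserts stand in for JUnit so the example stays self-contained):

```java
// Minimal illustration of a unit test: exercise one functional unit
// of the software and check its output against an expectation.
public class UnitTestSketch {
    // The hypothetical "functional unit" under test: counts words in a line.
    static int countWords(String line) {
        if (line == null || line.trim().isEmpty()) return 0;
        return line.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        // Each check plays the role of one unit-test assertion.
        if (countWords("Hello World") != 2) throw new AssertionError("two words");
        if (countWords("") != 0) throw new AssertionError("empty line");
        System.out.println("all unit checks passed");
    }
}
```

In a real project these checks would be JUnit `@Test` methods, which is exactly the style MRUnit builds on later in this talk.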
CI Server
- Continuous Integration server, based on TDD
- All developers commit their updates every day
- The CI server compiles the code and runs the unit tests
- If a test fails, everyone receives the failure email
  - So the team knows who committed the bad code
- Hudson, Jenkins, etc.
- Supports SCM version-control tools: CVS, Subversion, Git
Test in Hadoop
- Much harder: JUnit cannot be used directly in Hadoop
  - Cluster, server, parallel computing
Use Cases: Shopzilla
- "Hadoop's Elephant In The Room"
- Hadoop testing and Quality Assurance
  - Unit Test: functional units of the S/W
  - Integration Test: integrated components
  - BDD Test: behavior of the S/W
- Augmented development
  - Use a dev cluster? Takes too long per day
  - Hadoop-In-A-Box
Use Cases: Shopzilla
- Hadoop-In-A-Box: a fully compatible mock environment
  - Mocks the cluster state, without a cluster
- Test locally: single-node pseudo cluster
  - MiniMRCluster => can test HDFS and Pig
Use Cases: Yahoo
- Developers want to run Hadoop code on the local machine
  - They do not want to run Hadoop code on the Hadoop cluster
- Yahoo HIT (Hadoop Integration Testing)
  - Runs Hadoop tests in the Hadoop ecosystems
  - Deploy HIT on a single Hadoop node or a cluster
  - Run tests in Hadoop, Pig, Hive, Oozie, …
Unit Test in Hadoop
- MRUnit testing framework
  - Based on JUnit; Cloudera donated it to Apache
  - Can test MapReduce programs written for the 0.20, 0.23.x, 1.0.x, and 2.x versions of Hadoop
  - Can test the Mapper, the Reducer, and the Mapper/Reducer pipeline
Unit Test in Hadoop
- WordCount example
  - Reads text files and counts how often words occur
  - The input and the output are text files
- Needs three classes:
  - WordCount.java: driver class with the main function
  - WordMapper.java: Mapper class with the map method
  - SumReducer.java: Reducer class with the reduce method
WordCount Example
- WordMapper.java: Mapper class with the map function
- For the given sample input, assuming two map nodes:
  - The sample input is distributed to the maps
  - The first map emits:
    <Hello, 1> <World, 1> <Bye, 1> <World, 1>
  - The second map emits:
    <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>
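The emission pattern above can be mimicked in plain Java, outside the Hadoop framework. This is only a sketch of the map step's logic (the `MapSketch` class and its string-pair representation are illustrative assumptions, not the Mapper API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch of what one map task does with its share of the input:
// tokenize each line and emit a ("word", 1) pair per token.
public class MapSketch {
    static List<String> emit(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer words = new StringTokenizer(line);
        while (words.hasMoreTokens()) {
            pairs.add("<" + words.nextToken() + ", 1>");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The first map node's share of the sample input:
        System.out.println(emit("Hello World Bye World"));
        // prints [<Hello, 1>, <World, 1>, <Bye, 1>, <World, 1>]
    }
}
```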
WordCount Example
SumReducer.javaReducer class with reduce functionFor the input from two Map-pers
the reduce method just sums up the values,
which are the occur-rence counts for each key
Thus the out-put of the job is:
<Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>
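For this sample, the whole map-shuffle-reduce pipeline collapses to a word count. A plain-Java sketch of the end result (the `WordCountSketch` class is illustrative; `TreeMap` is used only to get the sorted key order shown above):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// End-to-end sketch: what the WordCount job computes for the sample input.
public class WordCountSketch {
    static Map<String, Integer> count(String... lines) {
        // TreeMap keeps keys sorted, matching the job's sorted output.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer words = new StringTokenizer(line);
            while (words.hasMoreTokens()) {
                counts.merge(words.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("Hello World Bye World",
                                 "Hello Hadoop Goodbye Hadoop"));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```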
WordCount.java (Driver)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.println("usage: [input] [output]");
      System.exit(-1);
    }
    Job job = Job.getInstance(new Configuration());
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJarByClass(WordCount.class);
    job.submit();
  }
}
WordCount.java (annotations)
- Check the input and output files (the args check)
- Set the output (key, value) types
- Set the Mapper/Reducer classes
- Set the input/output format classes
- Set the input/output paths
- Set the driver class (setJarByClass)
- Submit the job to the master node
WordMapper.java (Mapper class)
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
  private Text word = new Text();
  private final static IntWritable one = new IntWritable(1);

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Break the line into words for processing
    StringTokenizer wordList = new StringTokenizer(value.toString());
    while (wordList.hasMoreTokens()) {
      word.set(wordList.nextToken());
      context.write(word, one);
    }
  }
}
WordMapper.java (annotations)
- Extends the Mapper class with the input/output key and value types
- Output (key, value) types: Text, IntWritable
- Input (key, value) types; output is emitted through the Context type
- Reads words from each line of the input file
- Counts each word (emits <word, 1>)
Shuffler/Sorter
- Maps emit (key, value) pairs
- The Shuffler/Sorter of the Hadoop framework:
  - Sorts the (key, value) pairs by key
  - Then appends the values to make (key, list of values) pairs
- For example, the first and second maps emit:
  <Hello, 1> <World, 1> <Bye, 1> <World, 1> <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>
- The Shuffler produces the following, which becomes the input of the reducer:
  <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1,1>>, <World, <1,1>>
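The shuffle step above amounts to grouping the emitted values by key and sorting the keys. A plain-Java sketch of that logic (the `ShuffleSketch` class is an illustrative assumption, not the Hadoop framework's implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the Shuffler/Sorter: turn the (key, 1) pairs emitted by all
// maps into sorted (key, list-of-values) pairs for the reducer.
public class ShuffleSketch {
    static Map<String, List<Integer>> shuffle(String[] emittedKeys) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // sorts by key
        for (String key : emittedKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Keys emitted by the first and second maps, in emission order:
        String[] emitted = {"Hello", "World", "Bye", "World",
                            "Hello", "Hadoop", "Goodbye", "Hadoop"};
        System.out.println(shuffle(emitted));
        // prints {Bye=[1], Goodbye=[1], Hadoop=[1, 1], Hello=[1, 1], World=[1, 1]}
    }
}
```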
SumReducer.java (Reducer class)
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable totalWordCount = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    Iterator<IntWritable> it = values.iterator();
    while (it.hasNext()) {
      wordCount += it.next().get();
    }
    totalWordCount.set(wordCount);
    context.write(key, totalWordCount);
  }
}
SumReducer.java (annotations)
- Extends the Reducer class with the input/output key and value types
- Sets the output value type
- Input is a (key, list of values) pair; output is emitted through the Context class
- For each word, counts/sums the number of values
- For each word, the total count becomes the output value
SumReducer
- Reducer input: the Shuffler's output becomes the input of the reducer:
  <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1,1>>, <World, <1,1>>
- Output:
  <Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World, 2>
MRUnit Test
- How to unit-test in Hadoop
  - Extend the JUnit test with the org.apache.hadoop.mrunit.* API
- Needs to test the Driver, Mapper, and Reducer
  - MapReduceDriver, MapDriver, ReduceDriver
  - Add input with the expected output
MRUnit Test
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;

  @Before
  public void setUp() {
    WordMapper mapper = new WordMapper();
    SumReducer reducer = new SumReducer();
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
    reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);
    mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
    mapReduceDriver.setMapper(mapper);
    mapReduceDriver.setReducer(reducer);
  }
MRUnit Test
  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }

  @Test
  public void testReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("cat"), values);
    reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
    reduceDriver.runTest();
  }

  @Test
  public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
    mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
    mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}
MRUnit Test (annotations)
- Using the MRUnit API, declare MapReduce, Mapper, and Reducer drivers with input/output (key, value) types
- setUp() runs before each test method (@Before)
- Instantiate the WordCount Mapper and Reducer
- Instantiate and set the Mapper driver with its input/output (key, value) types
- Instantiate and set the Reducer driver with its input/output (key, value) types
- Instantiate and set the Mapper/Reducer driver with its input/output (key, value) types
- Mapper test: define sample input with the expected output
- Reducer test: define sample input with the expected output
- Mapper/Reducer test: define sample input with the expected output
MRUnit Test in Practice
- Unit tests need to be implemented
  - How many? For all Map, Reduce, and Driver classes
- Problems? They mostly work,
  but MRUnit does not support complicated Map and Reduce APIs
- How many problems you can detect
  depends on how well you implement the MRUnit code
Conclusion
- Use MRUnit for Hadoop unit tests during development
- Integrate it with the QA site through a CI server
- You need to use it
Questions?
References
1. Hadoop WordCount example with the new MapReduce API (http://codesfusion.blogspot.com/2013/10/hadoop-wordcount-with-new-map-reduce-api.html)
2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount)
3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html)
4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count)
5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial)
6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-suite)
7. Hadoop's Elephant in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hadoops-elephant-in-the-room/)
8. Facebook Test MapReduce Local (https://github.com/facebook/hadoop-20/blob/master/src/test/org/apache/hadoop/mapreduce/TestMapReduceLocal.java)
9. Yahoo HIT: Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1)