An Introduction to Test-Driven Development on MapReduce
TRANSCRIPT
What is TDD
• A test-first development approach in which developers write test cases that capture failure cases first, then improve the system until it reaches an acceptable state.
Why it is difficult in Hadoop
• Hadoop is a distributed framework designed to run on large clusters with terabytes of data.
• Mimicking the behavior of a Hadoop cluster is very hard.
The Best Practice
• Golden Rule of Programming
Always abstract your business logic. This will make it easier for you to unit test.
Example
public class StockMeanReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    private DoubleWritable writable = new DoubleWritable();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException
    {
        // The mean calculation is embedded in the reducer itself,
        // so it cannot be unit tested without a Hadoop Context.
        double total = 0;
        int count = 0;
        for (DoubleWritable stockPrice : values)
        {
            total += stockPrice.get();
            count++;
        }
        writable.set(total / count);
        context.write(stockText, writable);
    }
}
The best approach – Abstraction
public class StockMeanReducer2 extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    private DoubleWritable writable = new DoubleWritable();
    private final StockMean stockMean = new StockMean();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException
    {
        stockMean.reset();
        for (DoubleWritable stockPrice : values)
        {
            stockMean.add(stockPrice.get());
        }
        writable.set(stockMean.calculate());
        context.write(stockText, writable);
    }
}
The best approach – Abstraction – contd.
public class StockMean
{
private double total = 0;
private int instance = 0;
public void add(final double total)
{
this.total += total;
++this.instance;
}
public double calculate()
{
return total / (double) instance;
}
public void reset()
{
this.total = 0;
this.instance = 0;
}
}
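Because StockMean is now a plain Java class, the business logic can be unit tested without any Hadoop machinery. A minimal sketch of such a test (the test class name and the sample values are illustrative, not from the original code):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class StockMeanTest
{
    @Test
    public void calculatesMeanOfAddedValues()
    {
        StockMean stockMean = new StockMean();
        stockMean.add(500);
        stockMean.add(100);
        // (500 + 100) / 2 = 300
        assertEquals(300, stockMean.calculate(), 0.0001);
    }

    @Test
    public void resetClearsPreviousState()
    {
        StockMean stockMean = new StockMean();
        stockMean.add(42);
        stockMean.reset();
        stockMean.add(10);
        assertEquals(10, stockMean.calculate(), 0.0001);
    }
}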
Testing MapReduce Jobs
• Best practices are fine, but I still need to test the code inside my mapper and reducer. What should I do?
Introduction to MRUnit
• MRUnit is a MapReduce unit-testing framework.
• Developed by Cloudera, open sourced, and currently in the Apache Incubator.
• Built on top of the Mockito mock-object framework.
• It is a generic framework that you can use with both JUnit and TestNG.
MRUnit – Testing Mapper
[Diagram: the unit test drives MRUnit's MapDriver, which calls the Mapper and captures its output via a mock OutputCollector.]
(1) Set up and execute the test
(2) Call the map method with a key / value
(3) The map output is captured
(4) Compare against the expected outputs
Sample Mapper
public class StockMeanMapper extends Mapper<Text, DoubleWritable, Text, DoubleWritable>
{
    @Override
    protected void map(Text key, DoubleWritable value, Context context)
        throws IOException, InterruptedException
    {
        // Skip null keys and the literal key "xyz"; everything else passes through unchanged.
        if (key == null) return;
        if (key.toString().equalsIgnoreCase("xyz")) return;
        context.write(key, value);
    }
}
Mapper Unit Test
public class StockMeanMapperTest
{
    private Mapper<Text, DoubleWritable, Text, DoubleWritable> mapper;
    private MapDriver<Text, DoubleWritable, Text, DoubleWritable> driver;

    @Before
    public void setUp()
    {
        mapper = new StockMeanMapper();
        driver = new MapDriver<Text, DoubleWritable, Text, DoubleWritable>(mapper);
    }

    @Test
    public void testPositiveConditionStockMeanMapper() throws IOException
    {
        List<Pair<Text, DoubleWritable>> results = driver
            .withInput(new Text("rahul"), new DoubleWritable(1))
            .withOutput(new Text("rahul"), new DoubleWritable(1))
            .run();
        assertEquals(1, results.size());
    }
}
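The mapper also filters out null keys and the key "xyz", so the negative path is worth a test of its own. A sketch of an additional test method for the same StockMeanMapperTest class above (the method name is illustrative):

@Test
public void testFilteredKeyProducesNoOutput() throws IOException
{
    // "xyz" is rejected by the mapper, so no output pairs are expected.
    List<Pair<Text, DoubleWritable>> results = driver
        .withInput(new Text("xyz"), new DoubleWritable(1))
        .run();
    assertEquals(0, results.size());
}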
MRUnit – Testing Reducer
[Diagram: the unit test drives MRUnit's ReduceDriver, which calls the Reducer and captures its output via a mock OutputCollector.]
(1) Set up and execute the test
(2) Call the reduce method with a key / values
(3) The reduce output is captured
(4) Compare against the expected outputs
Sample Reducer
public class StockMeanReducer2 extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    private DoubleWritable writable = new DoubleWritable();
    private final StockMean stockMean = new StockMean();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException
    {
        stockMean.reset();
        for (DoubleWritable stockPrice : values)
        {
            stockMean.add(stockPrice.get());
        }
        writable.set(stockMean.calculate());
        context.write(stockText, writable);
    }
}
Reducer Unit Test
public class StockMeanReducerTest
{
    private ReduceDriver<Text, DoubleWritable, Text, DoubleWritable> driver;
    private Reducer<Text, DoubleWritable, Text, DoubleWritable> reducer;

    @Before
    public void setup()
    {
        reducer = new StockMeanReducer2();
        driver = new ReduceDriver<Text, DoubleWritable, Text, DoubleWritable>(reducer);
    }

    @Test
    public void testStockPositive() throws IOException
    {
        Pair<Text, DoubleWritable> assertPair =
            new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
        List<Pair<Text, DoubleWritable>> results = driver
            .withInput(new Text("ananth"),
                       Arrays.asList(new DoubleWritable(500), new DoubleWritable(100)))
            .run();
        assertEquals(assertPair, results.get(0));
    }
}
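MRUnit can also do the comparison itself: declare the expected output with withOutput and call runTest() instead of run(). A sketch of the same scenario written in that style (assuming the same driver setup as above):

@Test
public void testStockPositiveWithRunTest() throws IOException
{
    // runTest() compares the actual reducer output against the declared expectation
    // and fails the test on any mismatch.
    driver.withInput(new Text("ananth"),
                     Arrays.asList(new DoubleWritable(500), new DoubleWritable(100)))
          .withOutput(new Text("ananth"), new DoubleWritable(300))
          .runTest();
}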
MRUnit – Testing MapReduce
[Diagram: the unit test drives MRUnit's MapReduceDriver, which chains a MapDriver and a ReduceDriver around the Mapper and Reducer.]
(1) Set up and execute the test
(2) Call the map method with each key / value
(3) MRUnit performs its own in-memory shuffle phase
(4) Call the reduce method with the shuffled key / values
(5) Compare against the expected outputs
MapReduce Unit Test
public class StockMeanMapReduceTest
{
private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;
private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;
private MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable> driver;
@Before
public void setup()
{
mapper = new StockMeanMapper();
reducer = new StockMeanReducer2();
driver = new MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable>(mapper,reducer);
}
MapReduce Unit Test – Contd..
@Test
public void testPositive() throws IOException
{
    Pair<Text, DoubleWritable> inputPair1 = new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
    Pair<Text, DoubleWritable> inputPair2 = new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(100));
    Pair<Text, DoubleWritable> inputPair3 = new Pair<Text, DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
    Pair<Text, DoubleWritable> inputPair4 = new Pair<Text, DoubleWritable>(new Text("xyz"), new DoubleWritable(50));

    // Expected: the mapper filters out "xyz" and the reducer averages the remaining values per key.
    Pair<Text, DoubleWritable> assertPair1 = new Pair<Text, DoubleWritable>(new Text("ananth"), new DoubleWritable(200));
    Pair<Text, DoubleWritable> assertPair2 = new Pair<Text, DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
    List<Pair<Text, DoubleWritable>> assertPairs = Arrays.asList(assertPair1, assertPair2);

    List<Pair<Text, DoubleWritable>> results = driver
        .withInput(inputPair1)
        .withInput(inputPair2)
        .withInput(inputPair3)
        .withInput(inputPair4)
        .run();
    assertEquals(assertPairs, results);
}
}
Wait, there is one more thing!!!
• Hadoop is all about data.
• We can’t always assume that data will be 100% perfect.
• So is MRUnit testing with mocked objects enough?
Hadoop LocalFileSystem
• The Hadoop API provides LocalFileSystem, which enables you to read data from your local file system and test your MapReduce jobs.
• The best practice is to take a sample of your real data, load it into the local file system, and test against it.
• LocalFileSystem only works out of the box on Linux-based systems.
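As a rough sketch of the idea (the file path and class name are illustrative), the local file system can be obtained straight from the Hadoop API and used to read a sample input file:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class LocalSampleReader
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // FileSystem.getLocal returns the LocalFileSystem implementation.
        LocalFileSystem fs = FileSystem.getLocal(conf);
        // "input/sample.txt" is a hypothetical sample taken from the real data.
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(fs.open(new Path("input/sample.txt"))));
        try
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                System.out.println(line);
            }
        }
        finally
        {
            reader.close();
        }
    }
}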
How can I test LocalFileSystem on Windows? – A little hack
public class WindowsLocalFileSystem extends LocalFileSystem
{
public WindowsLocalFileSystem()
{
super();
}
@Override
public boolean mkdirs (
final Path path,
final FsPermission permission)
throws IOException
{
final boolean result = super.mkdirs(path);
this.setPermission(path, permission);
return result;
}
Hack Contd..
@Override
public void setPermission (
final Path path,
final FsPermission permission)
throws IOException
{
try {
super.setPermission(path, permission);
}
catch (final IOException e) {
System.err.println("Cant help it, hence ignoring IOExceptionsetting persmission for path \"" + path +
"\": " + e.getMessage());
}
}
}
How to use it?
public class StockMeanDriver extends Configured implements Tool
{
/**
* @param args
* @throws Exception
*/
public static void main(String[] args) throws Exception {
ToolRunner.run(new StockMeanDriver(), args);
}
How to use it – contd..
@Override
public int run(String[] arg0) throws Exception
{
Configuration conf = getConf();
conf.set("fs.default.name", "file:///");
conf.set("mapred.job.tracker", "local");
conf.set("fs.file.impl", "org.intellipaat.training.hadoop.fs.WindowsLocalFileSystem");
conf.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization," +
"org.apache.hadoop.io.serializer.WritableSerialization");
Job job = new Job(conf,"Stock Mean");
job.setJarByClass(StockMeanDriver.class);
job.setMapperClass(StockMeanMapper2.class);
job.setReducerClass(StockMeanReducer2.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(DoubleWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path("input"));
FileOutputFormat.setOutputPath(job, new Path("output"));
job.waitForCompletion(Boolean.TRUE);
return 0;
}}