MapReduce Unit Testing

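One way to debug is to deploy the program to the server and print messages with System.out. That suits people who can reason from a wrong result back to the likely location of the fault and then watch the relevant data. It is tiring, however, for anyone used to debugging by single-stepping. In that case the MRUnit library lets you run MapReduce code locally. Local execution differs from how the job really runs on the server, but it is good enough for checking the processing logic.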
► Using MRUnit
1. Unpack the downloaded archive (apache-mrunit-1.0.0-hadoop1-bin.tar.gz) and add mrunit-1.0.0-hadoop1.jar to the project.
2. Testing the Mapper
Add the following test class:
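import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class TestMaxMapper {
    @Test // required so that JUnit runs this method as a test
    public void ignoresMissingTemperatureRecord() throws IOException, InterruptedException {
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxMapper()) // the mapper under test
                .withInput(new LongWritable(0), new Text("1950-12-04 9999")) // record passed to map()
                .withOutput(new Text("1950"), new IntWritable(9998)) // expected output; 9998 does not match, so this test fails
                .runTest(); // for this input the mapper actually emits (1950, 9999)
    }
}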
If the result matches the expectation, JUnit produces no output. If it does not match, messages like the following appear:
13/12/05 15:15:40 ERROR mrunit.TestDriver: Missing expected output (1950, 9998) at position 0.
13/12/05 15:15:40 ERROR mrunit.TestDriver: Received unexpected output (1950, 9999) at position 0.
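3. Testing the Reducer
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class TestMaxReducer {
    @Test
    public void returnsMaximumIntegerInValues() throws IOException,
            InterruptedException {
        new ReduceDriver<Text, IntWritable, Text, IntWritable>()
                .withReducer(new MaxReducer()) // the reducer under test
                .withInput(new Text("1950"), Arrays.asList(new IntWritable(10), new IntWritable(5))) // one key with its values
                .withOutput(new Text("1950"), new IntWritable(10)) // expected: the largest value
                .runTest();
    }
}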
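4. The Map/Reduce classes under test
Mapper
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(0, 4); // characters 0-3: the year
        int data = Integer.parseInt(line.substring(11, 15)); // characters 11-14: the four-digit value
        context.write(new Text(year), new IntWritable(data)); // emit (year, value)
    }
}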
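The mapper above reads fixed character positions, so a line shorter than 15 characters would make substring throw. As a minimal sketch, the parsing could be pulled into a plain helper that can be tested without Hadoop or MRUnit. RecordParser, YearValue, and the decision to skip malformed lines are assumptions made for illustration; they are not part of the original code.

```java
import java.util.Optional;

/** Hypothetical helper: the parsing step of MaxMapper as a plain static method. */
final class RecordParser {
    // Layout assumed from the sample data: "yyyy-MM-dd vvvv" (15 characters).
    static final int MIN_LENGTH = 15;

    /** Simple holder for one parsed record. */
    static final class YearValue {
        final String year;
        final int value;

        YearValue(String year, int value) {
            this.year = year;
            this.value = value;
        }
    }

    /** Returns the parsed record, or empty when the line is too short or the value is not numeric. */
    static Optional<YearValue> parse(String line) {
        if (line == null || line.length() < MIN_LENGTH) {
            return Optional.empty();
        }
        try {
            String year = line.substring(0, 4);
            int value = Integer.parseInt(line.substring(11, 15));
            return Optional.of(new YearValue(year, value));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

With such a helper, map() would only call RecordParser.parse and write the result when it is present.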
Reducer
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE; // start below any possible value
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue)); // emit (key, maximum)
    }
}
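► Testing the Driver
The driver is usually fairly simple and causes fewer problems, but Hadoop also provides util.Tool and util.ToolRunner, which make local testing possible. So that the same driver can be tested locally and also run on the server, it is written slightly differently:
Driver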
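import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception { // entry point used by the test
        // getConf() returns the Configuration injected through setConf() or ToolRunner
        Job job = new Job(getConf(), "Max Value");
        job.setJarByClass(getClass());
        job.setMapperClass(MaxMapper.class);
        job.setCombinerClass(MaxReducer.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception { // entry point for a real run
        int exitCode = ToolRunner.run(new MaxDriver(), new String[] { args[0], args[1] } );
        System.exit(exitCode);
    }
}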
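The driver registers MaxReducer as the combiner as well as the reducer. That is safe here because taking a maximum gives the same answer whether it is applied once to all values or first to partial groups and then to the partial results. A small plain-Java sketch of that property follows; CombinerCheck is a hypothetical name, and the two "splits" are made-up groupings of the 1952 sample values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical check that max-of-partial-maxes equals the overall max. */
final class CombinerCheck {
    /** Same fold as MaxReducer.reduce, on plain ints instead of IntWritable. */
    static int max(List<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int v : values) {
            maxValue = Math.max(maxValue, v);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // Pretend the 1952 values arrived from two different map tasks.
        List<Integer> split1 = Arrays.asList(8999, 7999);
        List<Integer> split2 = Arrays.asList(9999);

        // Combiner step on each split, then the reducer on the partial results.
        int combined = max(Arrays.asList(max(split1), max(split2)));
        // Reducer applied directly to all values, with no combiner.
        int direct = max(Arrays.asList(8999, 7999, 9999));

        System.out.println(combined + " " + direct); // prints "9999 9999"
    }
}
```

An operation such as an average would not have this property, so the same class could not simply be reused as a combiner in that case.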
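Driver test
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestMaxDriver {
    @Test
    public void test() throws Exception {
        Configuration conf = new Configuration();
        // use local resources only
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");
        Path input = new Path("in/testdriver.txt");
        Path output = new Path("out");
        FileSystem fs = FileSystem.getLocal(conf);
        fs.delete(output, true); // remove the output directory left by a previous run

        MaxDriver driver = new MaxDriver();
        driver.setConf(conf);
        int exitCode = driver.run(new String[] { input.toString(), output.toString() }); // run the job
        assertThat(exitCode, is(0)); // JUnit assertion; reports only when the condition is not met
    }
}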
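The test above only checks the exit code. A possible extension, sketched under assumptions, is to read the job output and compare it with expected values. The helper below parses lines of the form key, tab, value, which is how the default text output format writes them. OutputLines is a hypothetical name and is not part of the original post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns "key<TAB>value" output lines into a map. */
final class OutputLines {
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected key<TAB>value but got: " + line);
            }
            result.put(parts[0], Integer.parseInt(parts[1].trim()));
        }
        return result;
    }
}
```

In the driver test it could be fed with something like Files.readAllLines(Paths.get("out/part-r-00000")). The file name part-r-00000 is an assumption for a job with a single reducer and should be checked against the actual output directory.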
► Contents of the test file testdriver.txt
1950-12-04 7999
1950-12-10 8999
1951-12-04 3999
1952-11-10 8999
1952-11-04 7999
1952-12-10 9999
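To know what the job should produce for this file, the map and reduce logic can be replayed in plain Java. This is only a sketch that mirrors MaxMapper and MaxReducer. LocalMaxSimulation is a hypothetical name, and no Hadoop classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical local replay of the max-per-year job on in-memory lines. */
final class LocalMaxSimulation {
    static Map<String, Integer> maxPerYear(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>(); // sorted by key, like reducer input
        for (String line : lines) {
            // "map" step: same fixed positions as MaxMapper
            String year = line.substring(0, 4);
            int value = Integer.parseInt(line.substring(11, 15));
            // "reduce" step: keep the largest value seen for each year
            result.merge(year, value, Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1950-12-04 7999",
                "1950-12-10 8999",
                "1951-12-04 3999",
                "1952-11-10 8999",
                "1952-11-04 7999",
                "1952-12-10 9999");
        System.out.println(maxPerYear(lines)); // prints {1950=8999, 1951=3999, 1952=9999}
    }
}
```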