OK, I agree that processing data at large scale with Hadoop is cool, but it sometimes frustrated me while I was working on my course project.
In a MapReduce job we often need a join, since the input to the whole job may consist of multiple files.
How to handle multiple input files in a single MapReduce job:
Multiple mappers: each mapper handles its own file: Code Demo
```java
MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, CustomerMap.class);
MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, TransactionMap.class);
FileOutputFormat.setOutputPath(conf, new Path(args[2]));
```
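The demo above only wires up the job driver; the mapper classes themselves are not shown. Here is a minimal sketch of what one of them might look like, assuming the old `mapred` API (to match the `JobConf`-style `conf` above). `CustomerMap`, the comma-separated record layout, and the `CUST` tag are my assumptions for illustration, not the original post's code:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper for the customer file: emits the join key
// (customer id) and tags each record so the reducer can tell the
// two inputs apart when it performs the join.
public class CustomerMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Assumed record layout: customerId,name,...
        String[] fields = value.toString().split(",");
        output.collect(new Text(fields[0]), new Text("CUST\t" + value));
    }
}
```

A `TransactionMap` would look the same, except it emits the transaction's customer-id field as the key and a different tag, so the reducer sees both sides of the join grouped under one key.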
Single mapper: one mapper handles all input files: Code Demo (as the following code shows, we can determine the source of the data inside the mapper and then take the corresponding action)
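The original code demo is not reproduced here, so below is a minimal sketch of the idea under the same old `mapred` API: the mapper asks the framework for its current input split, reads the file name off it, and branches on that. `JoinMap`, the file-name prefixes, the field indices, and the tags are assumptions for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical single mapper that handles every input file. It asks
// the framework which split (and hence which file) the current record
// came from, then tags the record accordingly.
public class JoinMap extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // The split tells us which file this record belongs to.
        String fileName = ((FileSplit) reporter.getInputSplit()).getPath().getName();
        String[] fields = value.toString().split(",");
        if (fileName.startsWith("customer")) {
            // Assumed: customer records keyed by their first field.
            output.collect(new Text(fields[0]), new Text("CUST\t" + value));
        } else {
            // Assumed: transaction records carry the customer id in field 1.
            output.collect(new Text(fields[1]), new Text("TXN\t" + value));
        }
    }
}
```

This variant needs only one `FileInputFormat.addInputPath` call per file and no `MultipleInputs`, at the cost of one mapper class that must understand every input format.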