OK, I agree that processing data at scale with Hadoop is cool, but it sometimes frustrated me while I was working on my course project.
In a MapReduce job we often need a join, since the input to the whole job may consist of multiple files.
How to handle multiple input files in a mapper:
Multiple mappers: each mapper handles its own input file: Code Demo
Single mapper: one mapper handles all the input files: Code Demo (In the code, we can determine the source of each record inside the mapper, then take the corresponding action.)
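To make the single-mapper idea concrete, here is a minimal sketch of a Hadoop Streaming mapper in Python. Hadoop Streaming exposes the current input file to the mapper through the mapreduce_map_input_file environment variable, so one mapper can tag each record with its source and let the reducer perform the join. The file names customers.txt and orders.txt (and the tab-separated record format) are my own assumptions for illustration, not from the original post.

```python
import os
import sys


def map_line(line, input_file):
    """Tag each record with a marker for its source file,
    so the reducer can tell the two sides of the join apart.
    The customers/orders file names are hypothetical examples."""
    key, value = line.rstrip("\n").split("\t", 1)
    if "customers" in input_file:
        # Record came from the (assumed) customers file.
        return f"{key}\tC\t{value}"
    else:
        # Otherwise assume it came from the orders file.
        return f"{key}\tO\t{value}"


def main():
    # Hadoop Streaming sets this variable to the path of the
    # input split currently being processed by this mapper.
    input_file = os.environ.get("mapreduce_map_input_file", "")
    for line in sys.stdin:
        print(map_line(line, input_file))


if __name__ == "__main__":
    main()
```

Run as the -mapper script of a streaming job; records sharing a key then arrive at the same reducer carrying a C or O tag, which is the usual setup for a reduce-side join.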