Controlling the Number of Mappers for a MapReduce Job

The number of mappers Hadoop creates is determined by the number of input
splits in your data.

The relation is simple:

Number of mappers = Number of input splits.

So, to control the number of mappers, you first have to control the number of
input splits Hadoop creates before running your MapReduce program. Hadoop's
FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)).
That means the *maximum* split size ('mapreduce.input.fileinputformat.split.maxsize',
formerly 'mapred.max.split.size') can only shrink splits below the block size,
which creates more mappers. To get fewer mappers, you need larger splits, so the
property to raise is the *minimum* split size,
'mapreduce.input.fileinputformat.split.minsize' (formerly 'mapred.min.split.size').
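The split-size rule can be sketched in a few lines of Python. This is only a model of the formula used by FileInputFormat, not Hadoop code itself, and the sizes are in MB purely for readability:

```python
def compute_split_size(block_size, min_size=1, max_size=float("inf")):
    """Model of FileInputFormat's rule:
    splitSize = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

block = 128  # HDFS default block size in MB

# Defaults: split size equals block size -> one mapper per block.
print(compute_split_size(block))                  # 128

# Raising the *minimum* split size gives bigger splits -> fewer mappers.
print(compute_split_size(block, min_size=10486))  # 10486

# Lowering the *maximum* split size gives smaller splits -> more mappers.
print(compute_split_size(block, max_size=64))     # 64
```

The asymmetry is the key design point: maxsize can only push the split size down, minsize can only push it up, so reducing the mapper count always goes through minsize.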

Let's assume your input data is 1 TB and the HDFS block size is the default
128 MB. Then:

Number of physical data blocks = (1 * 1024 * 1024 MB) / 128 MB = 8192 blocks.

By default, if you don't specify a split size, it equals the block size, so you
get 8192 splits. Your program will therefore create and execute 8192 mappers.
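The block arithmetic above is easy to check directly (assuming, as above, 1 TB of input and 128 MB blocks):

```python
TB_IN_MB = 1 * 1024 * 1024  # 1 TB expressed in MB
BLOCK_MB = 128              # HDFS default block size in MB

# One split per block by default, hence one mapper per block.
blocks = TB_IN_MB // BLOCK_MB
print(blocks)  # 8192
```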

Let's say you want only 100 mappers to handle the job. As noted above,
100 mappers means 100 input splits, so each split should be
(1 * 1024 * 1024 MB) / 100 ≈ 10486 MB (roughly 10.24 GB), rounding up so that
100 splits still cover the full 1 TB.
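The same target split size can be computed with ceiling division, which also confirms that the rounded-up value lands on the desired mapper count:

```python
import math

TB_IN_MB = 1 * 1024 * 1024  # 1 TB in MB
target_mappers = 100

# 1048576 / 100 = 10485.76 MB is not a whole number of MB; take the
# ceiling so the mapper count stays at or below the target.
split_mb = math.ceil(TB_IN_MB / target_mappers)
print(split_mb)                        # 10486

# Number of splits (and hence mappers) this split size actually yields.
print(math.ceil(TB_IN_MB / split_mb))  # 100
```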

In short: to reduce the number of mappers, increase the split size; to increase
the number of mappers, decrease it.
