Sunday, December 25, 2011

Limiting the usage of Counters in Hadoop

Besides the JobCounter and TaskCounter counters that the Hadoop framework maintains, it's also possible to define custom counters for application-level statistics.

Counters can be incremented using the Reporter in the old MapReduce API or the Context in the new MapReduce API. These counters are sent to the TaskTracker, the TaskTracker forwards them to the JobTracker, and the JobTracker consolidates the counters to produce a holistic view for the complete job.
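As a plain-Java sketch of this consolidation (no Hadoop dependency; the enum name and per-task values here are illustrative, not from a real job): in the real API a task would call context.getCounter(MyCounter.BAD_RECORDS).increment(1) in the new API, or reporter.incrCounter(...) in the old one, and the per-task totals are then summed up at the JobTracker.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CounterAggregation {
    // Hypothetical application-level counter, as a user would define one.
    enum MyCounter { BAD_RECORDS }

    // Sum per-task counter maps, mimicking how the JobTracker
    // consolidates the counters reported by each TaskTracker.
    static long consolidate(List<Map<MyCounter, Long>> perTask, MyCounter c) {
        long total = 0;
        for (Map<MyCounter, Long> task : perTask) {
            total += task.getOrDefault(c, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two simulated tasks, each having incremented the counter locally.
        Map<MyCounter, Long> task1 = new EnumMap<>(MyCounter.class);
        task1.put(MyCounter.BAD_RECORDS, 3L);
        Map<MyCounter, Long> task2 = new EnumMap<>(MyCounter.class);
        task2.put(MyCounter.BAD_RECORDS, 2L);

        // Consolidated job-level view: 3 + 2 = 5
        System.out.println(consolidate(List.of(task1, task2), MyCounter.BAD_RECORDS));
    }
}
```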

There is a chance that a rogue job creates millions of counters, and since these counters are stored in the JobTracker, the JobTracker could run out of memory. To avoid such a scenario, the number of counters that can be created per job is limited by the Hadoop framework.
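The enforcement itself amounts to a simple size check when a new counter name is added. Here is a minimal sketch of that idea (plain Java, not the actual Hadoop source; the class and message text are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class CounterLimitSketch {
    // Default per-job cap, matching mapreduce.job.counters.limit.
    static final int MAX_COUNTER_LIMIT = 120;

    private final Map<String, Long> counters = new HashMap<>();

    // Reject a new counter name once the cap is reached, which is how
    // the framework protects the JobTracker's memory.
    void increment(String name, long amount) {
        if (!counters.containsKey(name) && counters.size() >= MAX_COUNTER_LIMIT) {
            throw new IllegalStateException("Exceeded limits on number of counters - "
                    + "Counters=" + counters.size() + " Limit=" + MAX_COUNTER_LIMIT);
        }
        counters.merge(name, amount, Long::sum);
    }

    public static void main(String[] args) {
        CounterLimitSketch c = new CounterLimitSketch();
        for (int i = 0; i < MAX_COUNTER_LIMIT; i++) {
            c.increment("counter-" + i, 1);   // fills up to the limit
        }
        try {
            c.increment("one-too-many", 1);   // the 121st distinct counter
            System.out.println("no exception");
        } catch (IllegalStateException e) {
            System.out.println("limit enforced");
        }
    }
}
```

Incrementing an existing counter never trips the check; only creating a new distinct counter name past the cap does.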

The following code is in Counters.java. Note that this code is in the 0.20.203, 0.20.204 and 0.20.205 (now called 1.0) releases. Also note that some of the parameters are configurable and some are not.

/** limit on the size of the name of the group **/
private static final int GROUP_NAME_LIMIT = 128;
/** limit on the size of the counter name **/
private static final int COUNTER_NAME_LIMIT = 64;

private static final JobConf conf = new JobConf();
/** limit on counters **/
public static int MAX_COUNTER_LIMIT =
    conf.getInt("mapreduce.job.counters.limit", 120);

/** the max groups allowed **/
static final int MAX_GROUP_LIMIT = 50;

In trunk and the 0.23 release, the following constants are defined in MRJobConfig.java. Note that all of these parameters are configurable.

public static final String COUNTERS_MAX_KEY = "mapreduce.job.counters.max";
public static final int COUNTERS_MAX_DEFAULT = 120;

public static final String COUNTER_GROUP_NAME_MAX_KEY = "mapreduce.job.counters.group.name.max";
public static final int COUNTER_GROUP_NAME_MAX_DEFAULT = 128;

public static final String COUNTER_NAME_MAX_KEY = "mapreduce.job.counters.counter.name.max";
public static final int COUNTER_NAME_MAX_DEFAULT = 64;

public static final String COUNTER_GROUPS_MAX_KEY = "mapreduce.job.counters.groups.max";
public static final int COUNTER_GROUPS_MAX_DEFAULT = 50;
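In 0.23/trunk these limits can therefore be raised via configuration, for example in mapred-site.xml (the value here is illustrative; raise it with care, and note that in the 1.0 line the limit is read through a static JobConf, so a per-job setting may not take effect, as one of the comments below points out):

```xml
<property>
  <name>mapreduce.job.counters.max</name>
  <value>500</value>
</property>
```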

The above-mentioned configuration parameters are not mentioned in the release documentation, and so I thought it would be worth mentioning them in a blog entry.

I would like to call it a hidden jewel :)

6 comments:

  1. Configurables are good, but raise them beyond the defaults with great care. Configurations are for use-case specific needs, not for raising when you hit an issue and reading around the issue points to a potential workaround.

    You should also consider contributing a doc patch upstream if you noticed it's not documented - because hidden jewels may also bite devs, and you can help prevent that.

    ReplyDelete
    Replies
    1. Harsh - thanks for the feedback - I 100% agree with you on the documentation - the blog is an effort toward the same - regarding updating the Apache docs, inertia is pulling me back, which I am trying to overcome.

      Also, Cloudera developed a tool which picks up the configuration properties added between releases. Although it's not 100% accurate, the tool could be included in the Apache build process.

      Delete
  2. I don't think it's configurable. If you read the Limits.java code, you'll find that it uses a private JobConf to get COUNTERS_MAX. Although you set the property in your Job, it's not passed to Limits.java, which is kind of weird.

    ReplyDelete
  3. I tried setting both counter limits to 500, and I still see the exception.
    Details: http://stackoverflow.com/questions/12140177/more-than-120-counters-in-hadoop/23181414#23181414

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete