Thursday, November 10, 2011

Retrieving Hadoop Counters in Map/Reduce Tasks

Hadoop uses Counters to gather metrics/statistics which can later be analyzed for performance tuning or to find bugs in the MapReduce programs. There are some predefined Counters and Custom counters can also be defined. JobCounter and TaskCounter contain the predefined Counters in Hadoop. There are lot of tutorials on incrementing the Counters from the Map and Reduce tasks. But, how to fetch the current value of the Counter from with the Map and Reduce tasks.

Counters can be incremented using the Reporter for the Old MapReduce API or by using the Context using the New MapReduce API. These counters are sent to the TaskTracker and the TaskTracker will send to the JobTracker and the JobTracker will consolidate the Counters to produce a holistic view for the complete Job. The consolidated Counters are not relayed back to the Map and the Reduce tasks by the JobTracker. So, the Map and Reduce tasks have to contact the JobTracker to get the current value of the Counter.

StackOverflow Query has the details on how to get the current value of a Counter from within a Map and Reduce task.

Edit: Looks like it's not a good practice to  retrieve the counters in the map and reduce tasks. Here is an alternate approach for passing the summary details from the mapper to the reducer. This approach requires some effort to code, but is doable. It would have been nice if the feature had been part of Hadoop and not required to hand code it.

No comments:

Post a Comment