There might be a requirement to pass additional parameters to the mappers and reducers, besides the inputs which they process. Let's say we are interested in matrix multiplication and there are multiple ways/algorithms of doing it. We could send an input parameter to the mappers and reducers, based on which the appropriate way/algorithm is picked. There are multiple ways of doing this.
Setting the parameter:
1. Use the -D command line option to set the parameter while running the job; this works when the driver is launched through ToolRunner (see the sketch after this list).
2. Before launching the job using the old MR API
JobConf job = (JobConf) getConf();
job.set("test", "123");
3. Before launching the job using the new MR API
Configuration conf = new Configuration();
conf.set("test", "123");
Job job = new Job(conf);
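For option 1, the -D generic option is only parsed into the job configuration when the driver goes through ToolRunner/GenericOptionsParser. A minimal driver sketch, assuming a hypothetical class name MyDriver and the parameter name "test" used above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already holds "test", parsed from the -D option by GenericOptionsParser
        Configuration conf = getConf();
        Job job = new Job(conf, "MyJob");
        job.setJarByClass(MyDriver.class);
        // ... set mapper, reducer, key/value types and input/output paths as usual ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Run as: hadoop jar myjob.jar MyDriver -D test=123 <input> <output>
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}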
Getting the parameter:
1. Using the old API in the Mapper and Reducer. JobConfigurable#configure has to be implemented in the Mapper and Reducer classes.
private static Long N;

public void configure(JobConf job) {
    N = Long.parseLong(job.get("test"));
}
The variable N can then be used within the map and reduce functions.
2. Using the new API in the Mapper and Reducer. The context is passed to the setup, map, reduce and cleanup functions.
Configuration conf = context.getConfiguration();
String param = conf.get("test");
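Putting the new API pieces together, here is a minimal mapper sketch (the class name, field name and the length check are illustrative) that reads the "test" parameter once in setup() and uses it in map():

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private long n;   // value of the "test" parameter

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        n = conf.getLong("test", 0L);   // read once per task, not once per record
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // toy use of the parameter: only emit lines longer than n bytes
        if (value.getLength() > n) {
            context.write(value, new IntWritable(1));
        }
    }
}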
Perfect, couldn't be easier. I was up and running with mappers and reducers taking parameters from the JobConf class in minutes with this information. Thanks!
Tim - Thanks for the response. I debated for a minute whether to write this blog entry or not, because it was very trivial. But it got the most hits :)
When I started with Hadoop I found that changes were happening at a very fast pace, and sometimes I got off on the wrong foot, hence this blog.
Hope you find the other entries here also helpful.
Praveen: Is there any means by which I can pass certain parameters from main to the partitioner function (my custom partitioner)?
Arun,
One hack is to write the parameters to a file in HDFS and read them in the custom partitioner. I don't like this approach; there might be some better ways of solving it.
Post the query in the Apache forums for a better response.
Praveen
Check this on StackOverflow, it shows how to implement a configurable partitioner:
http://stackoverflow.com/questions/37752450/passing-dynamic-value-to-partitioner-code-in-mapreduce
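A minimal sketch of that idea: a new-API Partitioner that implements Configurable gets the job Configuration injected before partitioning starts. The class name, key/value types and the "partition.param" key here are illustrative, not from the post:

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ParamPartitioner extends Partitioner<Text, IntWritable> implements Configurable {

    private Configuration conf;
    private String param;

    @Override
    public void setConf(Configuration conf) {
        // Hadoop calls this with the job Configuration because the class implements Configurable
        this.conf = conf;
        param = conf.get("partition.param", "default");   // assumed key, set in the driver
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // toy logic: route keys starting with the parameter value to partition 0
        if (key.toString().startsWith(param)) {
            return 0;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}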
Any idea on how I can pass an ArrayList to the mapper? The very inefficient workaround I can think of is converting it to a String. Also, could you suggest how I can pass an ArrayList to the driver method?
Thank you!
Write the data into HDFS (if the data is huge) and read it in the setup() of the mapper and reducer as required. Another option is to send the ArrayList as a String (if the data is small).
There might be some better ways, which I am not aware of.
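A minimal sketch of the ArrayList-as-String option (only sensible for small lists); the key name "my.list" and the comma delimiter are assumptions, uses java.util.ArrayList, Arrays and List:

// Driver side: join the (small) list into one delimited String before creating the Job.
List<String> items = new ArrayList<String>(Arrays.asList("a", "b", "c"));
StringBuilder sb = new StringBuilder();
for (String item : items) {
    if (sb.length() > 0) sb.append(',');
    sb.append(item);
}
conf.set("my.list", sb.toString());

// Mapper/Reducer side, inside setup(Context context): split it back into a list.
String[] parts = context.getConfiguration().get("my.list").split(",");
List<String> restored = new ArrayList<String>(Arrays.asList(parts));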
great help, thanks Pravin
Thank you very much! You solved my problem. ^^ Thankyou Thankyou~
Dear Praveen,
Thanks for your post. What would you do if you have many parameters? Is there a way to put the parameters in a settings file and make them available to the mapper/reducer?
Dieter,
I think you should make use of the DistributedCache in case you have multiple parameters to be passed on to the mapper/reducer.
Check this:
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#DistributedCache
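A minimal sketch of that approach, assuming a properties file already uploaded to HDFS at a hypothetical path /app/params.properties (uses org.apache.hadoop.filecache.DistributedCache, org.apache.hadoop.fs.Path, java.util.Properties, java.io.FileInputStream):

// Driver side: register the HDFS file with the DistributedCache before creating the Job.
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new Path("/app/params.properties").toUri(), conf);
Job job = new Job(conf, "MyJob");

// Mapper/Reducer side, e.g. inside setup(Context context): load the locally cached copy.
Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
Properties props = new Properties();
props.load(new FileInputStream(cached[0].toString()));
String value = props.getProperty("some.key");   // "some.key" is an assumed property name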
Thanks for the post! Only the last solution worked for me in the new API. I would add that by using the getInt/setInt methods it would be slightly more efficient.
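For example (a minimal sketch, assuming the parameter is an integer; the key name "test" is taken from the post above):

conf.setInt("test", 123);                                  // driver side, before creating the Job
int n = context.getConfiguration().getInt("test", 0);      // mapper/reducer side; 0 is the default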
It is important to know that the Configuration object is cloned at some point (when the Job instance is created), so the order is important, i.e.:
Configuration conf = getConf();
conf.set("mmsilist", mmsiList);     // set BEFORE the Job instance is created
conf.set("msgidlist", msgidList);
Job job = new Job(conf, "MyJob");
// If you call conf.set("mmsilist", mmsiList) after the job is instantiated, it will not work.
This was a great tip. Thanks!
Hey, thank you for your post, however I'm having problems. I'm using Hadoop version 0.20.205, but for context.getConfiguration(), Java says context cannot be resolved. Is there a particular library I should be using? Is there a different variable I need to initialize first?
Thanks!
thanks!! man :D
ReplyDeleteThanks Praveen, this is very helpful.
How to set an object in conf, and how to get it back?
ReplyDeletewhat is the meaning of this in MAPPER Class?
ReplyDeleteConfiguration conf=context.getConfiguration();
String newWord=conf.get("RunTimeArg");
what is the meaning of this in DRIVER Class?
Configuration conf = new Configuration(); conf.set("RunTimeArg",args[2]);
Job job = new Job(conf, "DynamicWordCount");