Wednesday, April 23, 2014

Screencast for submitting a job to a cluster

In an earlier blog post we looked at how to develop a simple MapReduce program in Eclipse on a Linux machine. Here is another screencast, this time on submitting a word count MapReduce program written in Python. For some reason Windows Media Player cannot play the file, but VLC can.

Here is the code for the mapper and the reducer. Note that Hadoop provides the Streaming feature for writing MapReduce programs in languages other than Java.
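The original mapper and reducer are linked files that are not reproduced here. The following is only a minimal sketch of what a Streaming word count in Python can look like. It assumes Streaming's usual convention: records arrive as lines on stdin, output goes to stdout, and a tab separates key from value. It also assumes Python 3. The single-file layout, the name `wordcount.py`, and the `map`/`reduce` argument are hypothetical choices for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical Streaming word count: run as `wordcount.py map` or `wordcount.py reduce`."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Assumes the input is sorted by key, which Hadoop guarantees between the
    map and reduce phases (locally, pipe the mapper output through `sort`).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only act when invoked with an explicit role, so importing the file is harmless.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Before submitting anything, the same script can be tried locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. Here `sort` plays the part of Hadoop's shuffle-and-sort phase.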
https://dl.dropboxusercontent.com/u/3182023/Screencasts/Execute-WordCount-PythonMR-In-VirtualMachine.mp4
As seen in the screencast, the Virtual Machine (VM) has all the pieces needed to get started easily with Big Data, and it is provided as part of the Big Data training. The VM is updated regularly to add new frameworks and to update the existing ones.

Hadoop tries to hide the details of the underlying infrastructure, so the MapReduce code and the command used to submit a job are the same for a single-node cluster and a thousand-node cluster.
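To make the point concrete, here is a hedged sketch that builds a typical Hadoop Streaming submission command. It prints the command rather than running it. The jar path is an assumption, since its name and location vary between Hadoop versions and distributions. The script `wordcount.py` is a hypothetical single file that acts as the mapper when called with `map` and as the reducer when called with `reduce`. Python 3 is assumed to be installed on every node. Note that nothing in the command names the cluster or its size. That comes from the Hadoop configuration on the machine where the command is run.

```python
import shlex

# Assumed path: adjust for your Hadoop version and distribution.
STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"


def streaming_command(input_dir, output_dir, script="wordcount.py"):
    """Build a Hadoop Streaming submission command as an argument list.

    The same list works unchanged on a single-node or a thousand-node cluster.
    """
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-input", input_dir,            # HDFS directory with the text to count
        "-output", output_dir,          # must not exist before the job runs
        "-mapper", f"python3 {script} map",
        "-reducer", f"python3 {script} reduce",
        "-file", script,                # ship the script to the task nodes
    ]


if __name__ == "__main__":
    # Print, rather than execute, the command for inspection.
    print(shlex.join(streaming_command("wordcount/input", "wordcount/output")))
```

The `-file` option is the older Streaming way to ship a script. Newer Hadoop releases mark it as deprecated in favor of the generic `-files` option, which must appear before the Streaming-specific options.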

1 comment:

  1. Thanks. Have you tried Datameer on top of Cloudera for analytics? It is worth a try if you use Hadoop, because it is much easier and faster. A free trial is available at http://www.datameer.com.
