Saturday, February 9, 2013

Hadoop in a box

As a technology geek, I am not always sure why I do some things :) This experiment falls into that category. I wanted to set up a Hadoop cluster on my notebook, an HP 430 with a Core i5 processor and 4 GB of RAM.

I chose Cloudera Manager to install the cluster because it automates most of the installation and configuration a Hadoop cluster requires. Below is what the configuration looks like.
On the laptop, Ubuntu 12.04 Desktop is installed as the host OS, which is the OS I use most of the time. On top of it, Oracle VirtualBox is installed so that one OS (the guest) can run on top of another (the host). On top of VirtualBox, two instances of Ubuntu 12.04 Server are installed as guest OSes. There is no need for a full-fledged desktop on the nodes: it is unnecessary, and the Desktop edition makes the whole thing slower. One guest has been configured as a master/slave and the other as a slave.

a) Below are the commands to start and stop Cloudera Manager and its related services. The first three start them, and the last three stop them in reverse order.
sudo /etc/init.d/postgresql start
sudo /etc/init.d/cloudera-scm-server-db start
sudo /etc/init.d/cloudera-scm-server start

sudo /etc/init.d/cloudera-scm-server stop
sudo /etc/init.d/cloudera-scm-server-db stop
sudo /etc/init.d/postgresql stop
Here is a screen shot for the same
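
Since the order matters (database first on start, last on stop), the commands above can be wrapped in a small helper. This is only a sketch: the function names cm_order and cm_ctl and the DRY_RUN variable are made up for illustration, and it assumes the same three init scripts listed above.

```shell
#!/bin/sh
# Services in the order the post starts them; stopping uses the reverse order.
CM_SERVICES="postgresql cloudera-scm-server-db cloudera-scm-server"

# Print the service names in start order, or reversed for "stop".
# (cm_order is a hypothetical helper name.)
cm_order() {
    if [ "$1" = "stop" ]; then
        reversed=""
        for svc in $CM_SERVICES; do
            reversed="$svc $reversed"
        done
        # Unquoted on purpose so the trailing space is dropped.
        echo $reversed
    else
        echo $CM_SERVICES
    fi
}

# Run the given action (start or stop) on every service in the right order.
# With DRY_RUN=1 the commands are only printed, not executed.
cm_ctl() {
    for svc in $(cm_order "$1"); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo /etc/init.d/$svc $1"
        else
            sudo "/etc/init.d/$svc" "$1" || return 1
        fi
    done
}
```

After sourcing the file, cm_ctl start brings everything up and cm_ctl stop shuts it down; setting DRY_RUN=1 first shows what would be run without touching the services.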

 

b) Here is the command to start the two slave VMs.
vboxmanage startvm slave1 slave2
Here is a screen shot for the same
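
On a notebook with 4 GB of RAM it may help to start the guests without a GUI window. The sketch below uses the --type headless option of vboxmanage startvm and skips any VM that vboxmanage list runningvms already reports. The function name start_vms is hypothetical; slave1 and slave2 are the VM names used above.

```shell
#!/bin/sh
# Start each named VM without a GUI window, skipping any that already run.
# (start_vms is a hypothetical helper name.)
start_vms() {
    for vm in "$@"; do
        # "list runningvms" prints one line per VM: "name" {uuid}
        if vboxmanage list runningvms | grep -q "\"$vm\""; then
            echo "$vm is already running"
        else
            vboxmanage startvm "$vm" --type headless
        fi
    done
}
```

Usage would be start_vms slave1 slave2. A headless guest has no console window, so it is reached over SSH instead.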


Now we have two Ubuntu 12.04 Server instances running on top of Ubuntu 12.04 Desktop.


c) The Cloudera Manager console is available at localhost:7180. The default username/password is admin/admin.
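
The console is usually not reachable the instant the server process is started, so a small polling loop can save some browser refreshing. This is a sketch that assumes curl is installed; the function name wait_for_url and its default of 30 attempts, 5 seconds apart, are arbitrary choices, not anything Cloudera Manager prescribes.

```shell
#!/bin/sh
# Poll a URL until it answers, giving up after a number of attempts.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# (wait_for_url is a hypothetical helper name.)
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-5}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail makes curl return non-zero on HTTP errors (4xx/5xx);
        # a refused connection also returns non-zero.
        if curl --silent --fail --output /dev/null "$url"; then
            echo "up after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gave up after $attempts attempt(s)" >&2
    return 1
}
```

For the setup described here the call would be wait_for_url http://localhost:7180.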


d) If everything has been set up properly, the agents on the VMs should send heartbeats to Cloudera Manager, and the nodes should be reported as good in the screen below.


e) Similarly, all the services should be in good health and in a stopped state.


f) The services can be started from the same UI, after which their status should change to started, as shown below.


g) Now it's time to run some basic commands. Log in to one of the slaves and execute the commands as below.
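
As an illustration of the kind of basic commands meant here (not necessarily the exact ones from the original screenshot), the sketch below does a small HDFS round trip with hadoop fs: create a directory, upload a file, and read it back. The function name hdfs_smoke_test and the /tmp/smoke-test path are hypothetical, and it assumes the logged-in user may write under /tmp in HDFS and that the directory does not exist yet.

```shell
#!/bin/sh
# A small HDFS round trip: create a directory, upload a file, read it back.
# (hdfs_smoke_test and the default path are hypothetical.)
hdfs_smoke_test() {
    dir="${1:-/tmp/smoke-test}"
    tmpfile="$(mktemp)"
    echo "hello hadoop" > "$tmpfile"

    # Each step runs only if the previous one succeeded.
    hadoop fs -mkdir "$dir" &&
    hadoop fs -put "$tmpfile" "$dir/hello.txt" &&
    hadoop fs -cat "$dir/hello.txt"
    status=$?

    # Remove the local temporary file; the HDFS copy is left in place.
    rm -f "$tmpfile"
    return "$status"
}
```

If the cluster is healthy, hdfs_smoke_test should end by printing the contents of the uploaded file. Running hadoop fs -ls / on its own is an even quicker check that the client can talk to HDFS.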


h) A CDH installation also provides Hue, a web interface for interacting with the Hadoop cluster, as shown below.



This is not a full-blown cluster for conducting performance tests or a proof of concept, but a basic one for getting familiar with the installation and usage of a Hadoop cluster.

I will follow up with another post on how to get started with the installation of a Cloudera CDH cluster.
