Friday, February 17, 2012

Getting started with HBase - Part 1


After setting up Hadoop (both HDFS and MapReduce) on a single node or on multiple nodes, it's time to install HBase. HBase provides real-time, random read/write access to data on top of HDFS. HBase runs a Master on one of the nodes (similar to the NameNode/JobTracker) and a RegionServer (similar to the DataNode/TaskTracker) on each of the slave nodes.

The HDFS and HBase daemons can co-exist on the same nodes or run on separate nodes. In this blog entry, we will assume that they co-exist: the master node will run the HDFS NameNode, the HBase Master and the ZooKeeper daemons, while the slave nodes will run the HDFS DataNode and the HBase RegionServer daemons.


- Follow the instructions to set up Hadoop on all the nodes (single node and multiple nodes). HBase depends only on HDFS, so starting the HDFS daemons (NameNode and DataNode) is enough; the MapReduce daemons (JobTracker and TaskTracker) are not required unless a MapReduce job is run against HBase. A quick sanity check is shown below.
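
A quick way to confirm that the HDFS daemons are up before moving on (assuming $HADOOP_HOME points to the Hadoop installation and jps is on the path):

# on the master - the NameNode should show up in the jps output
$HADOOP_HOME/bin/start-dfs.sh
jps | grep NameNode

# on each slave - the DataNode should show up in the jps output
jps | grep DataNode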

- Download hbase-0.92.0.tar.gz and extract it to the same location ($HBASE_HOME) on all the nodes, as in the sketch below.
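
For example, the tarball can be fetched from the Apache archive and extracted in place (the download URL and the installation directory below are just one possible choice; adjust them for your mirror and setup):

cd /home/praveensripati/Installations
wget http://archive.apache.org/dist/hbase/hbase-0.92.0/hbase-0.92.0.tar.gz
tar xzf hbase-0.92.0.tar.gz
export HBASE_HOME=/home/praveensripati/Installations/hbase-0.92.0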

- Make sure that the Hadoop version installed in the above step matches the version of the Hadoop jars in the $HBASE_HOME/lib folder. If there is a version mismatch, copy the installed Hadoop jar files into the $HBASE_HOME/lib folder (one way to check is shown below).
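
A quick check (assuming the Hadoop core jar carries the version in its file name, as in the Hadoop 1.0.x releases):

# version of the installed Hadoop
$HADOOP_HOME/bin/hadoop version

# version of the Hadoop jar bundled with HBase
ls $HBASE_HOME/lib/hadoop-core-*.jar

# on a mismatch, replace the bundled jar with the installed one
rm $HBASE_HOME/lib/hadoop-core-*.jar
cp $HADOOP_HOME/hadoop-core-*.jar $HBASE_HOME/lib/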

- Add the below properties to the $HBASE_HOME/conf/hbase-site.xml file on the master and on all the slave nodes.
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>

    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2222</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/praveensripati/Installations/hbase-0.92.0/tmp</value>
    </property>
</configuration>
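
Since the same hbase-site.xml has to be present on every node, it can be edited once on the master and pushed to the slaves (a minimal sketch, assuming password-less ssh to the slaves is already set up and HBase is extracted at the same location everywhere):

for host in slave1 slave2; do
    scp $HBASE_HOME/conf/hbase-site.xml $host:$HBASE_HOME/conf/
done
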
- Set export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_27 in the $HBASE_HOME/conf/hbase-env.sh file on the master and on all the slaves, changing the Java location appropriately.
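
The relevant lines in hbase-env.sh look like the below. HBASE_MANAGES_ZK is left at its default of true, so HBase itself starts and stops the ZooKeeper daemon using the quorum settings from hbase-site.xml:

# in $HBASE_HOME/conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_27

# let HBase manage the ZooKeeper instance (the default)
export HBASE_MANAGES_ZK=true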

- Add the host names of the slave machines to the $HBASE_HOME/conf/regionservers file on the master, one host name per line:
slave1
slave2
- The following should be included in the /etc/hosts file on the master and on all the slaves (change the IP addresses appropriately).
127.0.0.1       localhost
192.168.56.1    master
192.168.56.101  slave1
192.168.56.102  slave2
- Start HDFS with $HADOOP_HOME/bin/start-dfs.sh and HBase with $HBASE_HOME/bin/start-hbase.sh. Check the Hadoop and HBase log files for any errors.
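
If the daemons came up cleanly, jps should list something like the below on each node (HQuorumPeer is the ZooKeeper instance managed by HBase, per the layout described above):

# on the master
jps
# -> NameNode, HMaster, HQuorumPeer

# on each slave
jps
# -> DataNode, HRegionServer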

- Check the web interfaces for the HBase Master (http://master:60010/master-status) and the RegionServers (http://slave1:60030/rs-status and http://slave2:60030/rs-status). The cluster status can also be checked from the shell, as shown below.
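
The cluster can also be verified from the command line with the status command in the HBase shell (the output below is just a sample; the numbers will vary with your setup):

echo "status" | $HBASE_HOME/bin/hbase shell

# 2 servers, 0 dead, 1.0000 average load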

The master web page (http://master:60010/master-status) shows the list of tables (-ROOT- and .META. by default) and the RegionServers in the cluster.


The RegionServer web pages (http://slave1:60030/rs-status and http://slave2:60030/rs-status) show the regions hosted on each server.



Now that HBase has been set up properly, in the next blog we will go through how to create tables and insert data in HBase.
