Friday, February 17, 2012

Getting started with HBase - Part 2

Once HBase has been installed as shown in the previous post, it's time to create some tables and populate them with data. HBase provides a shell for executing DDL (Data Definition Language) and DML (Data Manipulation Language) commands, which can be invoked using the following command:

$HBASE_HOME/bin/hbase shell

An HBase cluster has a single active master and multiple region servers on different nodes. Each region server hosts multiple regions, and each region holds either a complete table or a contiguous range of a table's rows.

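1) For creating a table 'testtable' with a column family 'colfam1', use either one of the following two commands (not both, since the second fails if the table already exists).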
create 'testtable', 'colfam1'
create 'testtable', 'colfam1', { SPLITS => ['row-300', 'row-500', 'row-700' , 'row-900'] }
The first command creates the table in a single region. When the 'testtable' table grows past a configured size threshold, that region is split into two.

The second command pre-splits the table into five regions based on the row keys: the four split points mark the boundaries between the regions. Each of the five regions might be hosted on a different node, which can lead to better throughput and latency as well as better utilization of the cluster.
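To make the effect of the split points concrete, here is a minimal sketch in plain Ruby (it runs in any Ruby or in the HBase shell, and does not touch the cluster). The helpers region_index and region_counts are hypothetical names made up for this illustration, not HBase APIs. The sketch assumes that row keys are compared as plain strings and that a split point is the first key of the region that follows it.

```ruby
# Split points from the second create command; they define five key ranges.
SPLITS = ['row-300', 'row-500', 'row-700', 'row-900']

# Hypothetical helper (not an HBase API): index of the region whose key
# range contains row_key. A key belongs to region N when exactly N split
# points are less than or equal to it.
def region_index(row_key, splits = SPLITS)
  splits.count { |split| split <= row_key }
end

# Hypothetical helper: how many of the given keys land in each region.
def region_counts(keys)
  keys.each_with_object(Hash.new(0)) { |key, counts| counts[region_index(key)] += 1 }
end

%w[myrow-1 row-000 row-300 row-499 row-900].each do |key|
  puts "#{key} -> region #{region_index(key)}"
end

# Keys of the form row-000 .. row-999, the pattern used later in this post.
p region_counts((0..999).map { |n| format('row-%03d', n) })
```

Under these assumptions the keys are not spread evenly: the first region gets more keys than the last one, so split points are worth choosing with the real key distribution in mind.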

2) To get the list of all the tables created in the HBase cluster. Passing a name or regular expression, as in list 'testtable', restricts the output to the matching tables.
list
3) To insert data into the 'testtable' table.
put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'
The HBase Shell is (J)Ruby’s IRB with some HBase-related commands added. Anything that can be done in IRB can also be done in the HBase Shell. The command below inserts 1,000 rows into the 'testtable' table, with row keys 'row-000' through 'row-999'.
for i in '0'..'9' do for j in '0'..'9' do \
for k in '0'..'9' do put 'testtable', "row-#{i}#{j}#{k}", \
"colfam1:#{j}#{k}", "#{j}#{k}" end end end
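4) For getting data from the 'testtable' table. The get command fetches a single row by its key, while scan returns the rows of the table in row key order.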
get 'testtable', 'myrow-1'
scan 'testtable'
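The difference between the two commands can be sketched in plain Ruby with an in-memory hash standing in for the table. This is only a model of the lookup semantics, not the HBase client API: sim_get and sim_scan are hypothetical names, and the sketch assumes that rows come back sorted by row key as plain strings and that a scan range includes its start row and excludes its stop row. In the shell itself, scan accepts options such as STARTROW, STOPROW and LIMIT for the same purpose.

```ruby
# In-memory stand-in for 'testtable': row key => { column => value }.
# It models lookup semantics only; it is not the HBase client API.
def build_table
  table = {
    'myrow-1' => { 'colfam1:q1' => 'value-1' },
    'myrow-2' => { 'colfam1:q2' => 'value-2', 'colfam1:q3' => 'value-3' }
  }
  # Same key, column and value pattern as the 1,000-row loop in step 3.
  for i in '0'..'9' do
    for j in '0'..'9' do
      for k in '0'..'9' do
        table["row-#{i}#{j}#{k}"] = { "colfam1:#{j}#{k}" => "#{j}#{k}" }
      end
    end
  end
  table
end

TABLE = build_table

# get: fetch one row by its exact key (nil when the row does not exist).
def sim_get(row_key)
  TABLE[row_key]
end

# scan: walk row keys in sorted order, start inclusive, stop exclusive.
def sim_scan(start_row = nil, stop_row = nil)
  TABLE.keys.sort.select do |key|
    (start_row.nil? || key >= start_row) && (stop_row.nil? || key < stop_row)
  end
end

p TABLE.size
p sim_get('myrow-2')
p sim_scan('row-300', 'row-303')
p sim_scan.first(3)
```

Because the keys are ordered as strings, the 'myrow-…' rows sort ahead of all the 'row-…' rows, which is why a full scan lists them first.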
5) For deleting data from the 'testtable' table. The command below removes a single cell, the 'colfam1:q2' column of row 'myrow-2'.
delete 'testtable', 'myrow-2', 'colfam1:q2'
6) For deleting the table. A table has to be disabled before it can be dropped.
disable 'testtable'
drop 'testtable'
In the next post, we will go through how to create coprocessors, which were introduced in the HBase 0.92 release. Coprocessors come in two kinds: observers (similar to triggers in an RDBMS) and endpoints (similar to stored procedures).

As I mention again and again, HBase is not the solution to every problem.

1 comment:

  1. Hi!

    good tutorial!!
    is it possible to "update" multiple rows in hbase?
    if yes how?