Sunday, February 19, 2012

Getting started with HBase Coprocessors - Observers

The HBase 0.92 release provides coprocessor functionality, which includes observers (similar to triggers that fire on certain events) and endpoints (similar to stored procedures invoked from the client).

Observers can work at the region, master or WAL (Write Ahead Log) level. In this blog entry we will create a Region Observer. Once a Region Observer has been created, it can either be specified in hbase-site.xml, in which case it applies to all the regions and tables, or be specified on a single table, in which case it applies only to that table. Make sure that HBase is restarted after following the steps below so that the coprocessor is loaded and executed.

An excellent introduction to coprocessors is available here.

1) Here is the code for the coprocessor, which is triggered when the client's HTable.get() method is executed. The preGet() method is overridden to check whether the rowkey in the Get equals @@@GETTIME@@@ and, if so, to add an extra cell (KeyValue) containing the current time in ms to the result.
package coprocessor;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

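public class RegionObserverExample extends BaseRegionObserver {
    // Magic row key; also reused as the column family and qualifier of the added cell
    public static final byte[] FIXED_ROW = Bytes.toBytes("@@@GETTIME@@@");

    // Called on the region server before a Get is executed against this region
    @Override
    public void preGet(final ObserverContext<RegionCoprocessorEnvironment> e,
            final Get get, final List<KeyValue> results) throws IOException {
        if (Bytes.equals(get.getRow(), FIXED_ROW)) {
            // Add a cell whose value is the region server's current time in ms
            KeyValue kv = new KeyValue(get.getRow(), FIXED_ROW, FIXED_ROW,
                    Bytes.toBytes(System.currentTimeMillis()));
            results.add(kv);
        }
        // e.bypass() is not called, so the normal Get still runs afterwards
    }
}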
2) Compile the above class and package it into a jar file to be copied to all the Region Servers.

3) Modify the hbase-env.sh file on all the Region Servers to add the jar file created earlier, which contains the coprocessor code, to HBASE_CLASSPATH.
export HBASE_CLASSPATH="/home/praveensripati/Installations/hbase-0.92.0/lib/coprocessor.jar"
4) Modify hbase-site.xml on all the Region Servers to include the class name of the Region Observer.
    <property>
        <name>hbase.coprocessor.region.classes</name>
        <value>coprocessor.RegionObserverExample</value>
    </property>
5) Restart the HBase cluster.

6) Run the following command in the HBase shell to create a table named 'testtable'.
create 'testtable', 'colfam1' 
Run the following command to retrieve the row with the rowkey @@@GETTIME@@@, which triggers the above coprocessor to add the current time in ms to the response. Since no row with the rowkey @@@GETTIME@@@ exists, the only cell returned is the one created in the RegionObserverExample.preGet() method.
hbase(main):002:0> get 'testtable','@@@GETTIME@@@'
COLUMN                                 CELL
 @@@GETTIME@@@:@@@GETTIME@@@           timestamp=9223372036854775807, value=\x00\x00\x015\x938\xD2i
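The value in the shell output above is the 8-byte long written by Bytes.toBytes(System.currentTimeMillis()), printed with non-printable bytes escaped as \xHH. The plain-Java sketch below (no HBase jars needed; the class and method names are made up for illustration) turns that escaped string back into bytes and reads them as a big-endian long, which is the byte order HBase's Bytes.toBytes(long) uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical helper class for decoding the cell value shown by the HBase shell
public class DecodeGetTime {

    // Parses the shell's escaped form: printable ASCII is shown as-is,
    // every other byte as \xHH
    static byte[] parseShellBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xHH" part of the escape
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Bytes.toBytes(long) writes 8 bytes big-endian, which is also
    // ByteBuffer's default byte order
    static long toLong(byte[] b) {
        if (b.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = parseShellBytes("\\x00\\x00\\x015\\x938\\xD2i");
        long millis = toLong(raw);
        System.out.println(millis);
        System.out.println(Instant.ofEpochMilli(millis));
    }
}
```

For the value above it prints 1329614869097 and 2012-02-19T01:27:49.097Z (UTC), which matches the date of this post. In HBase client code, Bytes.toLong() on the cell value does the same decoding.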

Observers provide a lot of hooks into HBase for adding trigger-like functionality. In the above example, the Region Observer acts as a trigger that executes before the call to HTable.get() is processed. In the next blog entry, we will see how to create an endpoint, which is similar to a stored procedure and can be invoked from the client.
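To make the order of events around preGet() concrete, here is a toy model in plain Java. It is not HBase code: MiniRegion, GetObserver and the string "cells" are hypothetical stand-ins. It assumes, as in the example above, that the observer does not call bypass(): observers run first and may add to the result list, then the normal lookup appends whatever is stored for the row. That is why the shell returned only the synthetic cell for a row that does not exist.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, HBase-free model of a "pre" hook around a read. All names are hypothetical.
public class PreGetHookModel {
    static final String FIXED_ROW = "@@@GETTIME@@@";

    /** Stand-in for a region observer: runs before the read and may add cells. */
    interface GetObserver {
        void preGet(String row, List<String> results);
    }

    /** Stand-in for a region: stored cells per row plus the registered observers. */
    static class MiniRegion {
        private final Map<String, List<String>> rows = new HashMap<>();
        private final List<GetObserver> observers = new ArrayList<>();

        void addObserver(GetObserver observer) {
            observers.add(observer);
        }

        void put(String row, String cell) {
            rows.computeIfAbsent(row, k -> new ArrayList<>()).add(cell);
        }

        List<String> get(String row) {
            List<String> results = new ArrayList<>();
            // 1) Every observer's preGet runs first and may add to the results
            for (GetObserver observer : observers) {
                observer.preGet(row, results);
            }
            // 2) The normal lookup then appends the stored cells (none for a missing row)
            results.addAll(rows.getOrDefault(row, Collections.emptyList()));
            return results;
        }
    }

    public static void main(String[] args) {
        MiniRegion region = new MiniRegion();
        region.put("row1", "colfam1:q1=v1");
        // Same rule as RegionObserverExample: only the magic row key gets the extra cell
        region.addObserver((row, results) -> {
            if (row.equals(FIXED_ROW)) {
                results.add(FIXED_ROW + ":" + FIXED_ROW + "=" + System.currentTimeMillis());
            }
        });
        System.out.println(region.get(FIXED_ROW)); // one synthetic cell; the row itself is absent
        System.out.println(region.get("row1"));    // prints [colfam1:q1=v1]
    }
}
```

In real HBase an observer can also skip the default lookup entirely by calling bypass() on the ObserverContext; the model leaves that out to stay close to the example in this post.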

Edit (10th February, 2013): Coprocessors can also be deployed dynamically, without restarting the cluster, to avoid any downtime. Check the `Coprocessor Deployment` section here for more details.
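For the per-table option, the coprocessor is attached through a table attribute whose value has the layout jar path|class name|priority|arguments; in the HBase shell this is set with something like alter 'testtable', METHOD => 'table_att', 'coprocessor' => '...'. The sketch below only builds that value string. The helper name, the HDFS path and the argument map are hypothetical, and the priority constant assumes HBase's Coprocessor.PRIORITY_USER (Integer.MAX_VALUE / 2).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the value of a table-level coprocessor attribute
// using the "jar path|class name|priority|arguments" layout.
public class CoprocessorTableAttribute {
    // Assumed to match Coprocessor.PRIORITY_USER in HBase (Integer.MAX_VALUE / 2)
    static final int PRIORITY_USER = Integer.MAX_VALUE / 2;

    static String spec(String jarPath, String className, int priority,
                       Map<String, String> args) {
        // '|' separates the fields, so it must not appear inside them
        if (jarPath.contains("|") || className.contains("|")) {
            throw new IllegalArgumentException("'|' is reserved as the field separator");
        }
        // Arguments are rendered as comma-separated key=value pairs
        StringJoiner kv = new StringJoiner(",");
        for (Map.Entry<String, String> e : args.entrySet()) {
            kv.add(e.getKey() + "=" + e.getValue());
        }
        return String.join("|", jarPath, className, Integer.toString(priority), kv.toString());
    }

    public static void main(String[] args) {
        // Hypothetical jar location; the class name is the one used in this post
        String value = spec("hdfs:///user/hbase/coprocessor.jar",
                "coprocessor.RegionObserverExample", PRIORITY_USER,
                new LinkedHashMap<>());
        System.out.println(value);
    }
}
```

This prints hdfs:///user/hbase/coprocessor.jar|coprocessor.RegionObserverExample|1073741823| (the trailing empty field means no arguments). Depending on the HBase version, the table may have to be disabled before the alter and enabled again afterwards.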
