Wednesday, July 10, 2013

When to use HBase and when MapReduce?

Very often I do get a query on when to use HBase and when MapReduce. HBase provides an SQL like interface with Phoenix and MapReduce provides a similar SQL interface with Hive. Both can be used to get insights from the data.

I would like the analogy of HBase/MapReduce to an plane/train. A train can carry a lot of material at a slow pace, while a plane relatively can carry less material at a faster pace. Depending on the amount of the material to be transferred from one location to another and the urgency, either a plane or a train can me used to move material.

Similarly HBase (or in fact any database) provides relatively low latency (response time) at the cost of low throughput (data transferred/processed), while MapReduce provides high latency at the cost of high throughput. So, depending on the NFR (Non Functional Requirements) of the application either HBase or MapReduce can be picked.

E-commerce or any customer facing application requires a quick response time to the end user and and also only a few records related to the customer have to be picked, so HBase would fit the bill. But, for all the back end/batch processing MapReduce can be used.


  1. Just to add to that HBQL as well as Data Nucleus can also be used as SQL interface to Hbase.

    1. Srigopal - if you are interested to write an aricle on the same, please let me know and I can add it here - I will attribute it :) - Praveen