Big Data and Cloud Tips: Impala or Hive

Saturday, November 23, 2013

Impala or Hive - when to use what?

Hive was initially developed by Facebook and later released to the Apache Software Foundation. Here is a paper from Facebook on the same. Impala from Cloudera is based on the Google Dremel paper. Both Impala and Hive provide a SQL-like abstraction for analytics on data stored in HDFS, and both use the Hive metastore. So, when to use Hive and when to use Impala?

Here is a discussion on Quora on the same. Here is a snippet from the Cloudera Impala FAQ:

Impala is well-suited to executing SQL queries for interactive exploratory analytics on large datasets. Hive and MapReduce are appropriate for very long running, batch-oriented tasks such as ETL.
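As a rough illustration of that guidance, the rule of thumb can be written as a tiny helper. This is only a sketch: the `Workload` class, its fields, and `suggest_engine` are hypothetical names made up for this example and are not part of Hive, Impala, or any Cloudera API, and the rule is a simplification of the FAQ snippet above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a query workload (illustrative only)."""
    interactive: bool         # someone is waiting on the result to explore data
    long_running_batch: bool  # e.g. a scheduled, very long running ETL job


def suggest_engine(workload: Workload) -> str:
    """Apply the FAQ's rule of thumb; returns 'hive', 'impala', or 'either'."""
    # Very long running, batch-oriented tasks such as ETL: Hive / MapReduce.
    if workload.long_running_batch:
        return "hive"
    # Interactive exploratory analytics on large datasets: Impala.
    if workload.interactive:
        return "impala"
    # The FAQ snippet does not cover other cases, so make no recommendation.
    return "either"


if __name__ == "__main__":
    print(suggest_engine(Workload(interactive=True, long_running_batch=False)))
    print(suggest_engine(Workload(interactive=False, long_running_batch=True)))
```

In practice the choice also depends on things this sketch ignores, such as the Hadoop distribution in use and operational requirements.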

Here is a nice presentation that summarizes Hive vs Impala to the point, so I won't repeat it in this blog.

Note that performance is not the only non-functional requirement for picking a particular framework. Also, the Big Data space has been moving rapidly, and the comparison results might tip the other way in the future as more improvements are made to the corresponding frameworks.

4 comments:

Raghu Nittala, June 3, 2014 at 2:16 PM
I have a quick question. Can we install Impala on an Apache Hadoop distribution? I am using Hadoop 1.0.4 and Hive 0.9. I saw people saying that Impala works only with CDH or Hadoop 2.0. Is this true? Thanks for the post.
Kiran VT, August 13, 2014 at 1:41 AM
Yes, it's a Cloudera product. You can check out Phoenix, which has features similar to those of Impala. Also, there are plenty of projects implementing SQL for Hadoop.