As mentioned in the previous post, Pig and Hive are higher-level abstractions on top of MapReduce. For a task like joining two data sets, it is much easier to use Pig or Hive, since they take far less coding effort than MapReduce. As a result, many companies are adopting Pig and Hive for the better developer productivity they provide.
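To give a rough sense of the coding effort involved, here is a minimal sketch in plain Python of a reduce-side join, the kind of logic that has to be written by hand with MapReduce. It only simulates the map, shuffle and reduce phases in memory; it is not Hadoop code, and the sample data (`users`, `orders`) and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (user_id, name) and (user_id, item)
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

def map_phase(users, orders):
    """Tag each record with its source so the reducer can tell them apart."""
    for user_id, name in users:
        yield user_id, ("U", name)
    for user_id, item in orders:
        yield user_id, ("O", item)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Inner join: emit one row per (user, order) combination for each key."""
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "U"]
        items = [v for tag, v in groups[key] if tag == "O"]
        for name in names:
            for item in items:
                yield key, name, item

joined = list(reduce_phase(shuffle(map_phase(users, orders))))
print(joined)
```

In Pig, the same inner join is typically expressed as a single JOIN statement, which is where the productivity gain comes from.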
The problem with abstraction is that it gives less control over what can and cannot be done. Debugging is also harder at a higher level of abstraction, because the underlying details are hidden. Pig and Hive are no exception.
Some time back, Netflix open sourced Lipstick. Google also recently published a blog entry on the same topic. Pig converts Pig Latin scripts into a DAG of MapReduce jobs, and the underlying MR data flows can be difficult to visualize. Lipstick enables developers to visualize and monitor the execution of Pig data flows at a logical level, alongside the MR jobs they compile to. Earlier, this had to be done using the log files or by looking at the MR web console.
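To make the idea of a DAG of MapReduce jobs a little more concrete, here is a small Python sketch that models a hypothetical plan as a dependency graph and derives one valid execution order. The job names are invented, and this is not how Pig or Lipstick represent plans internally; it only illustrates why the flow is harder to follow once a script fans out into several dependent jobs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each MR job maps to the set of jobs it depends on.
dag = {
    "load_and_filter_users": set(),
    "load_and_filter_orders": set(),
    "join_users_orders": {"load_and_filter_users", "load_and_filter_orders"},
    "group_and_count": {"join_users_orders"},
}

def execution_order(dag):
    """Return one valid order in which the jobs could run."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(dag)
print(" -> ".join(order))
```

The two load jobs have no dependency on each other, so either may come first; the join can only run after both, and the final grouping only after the join.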
Netflix and Twitter have been very aggressive in open sourcing their internal projects. With so much choice around, there has never been a better time to take a software idea from concept to realization. One of the main criteria for picking a framework or a piece of software is the support provided by commercial vendors. A good percentage of the software around Big Data is free and can be put into production at minimal cost, but lacks the commercial support that helps keep downtime low. Lipstick falls into this category. It has not been included in any of the commercial Big Data distributions, such as those from Cloudera, Hortonworks, MapR and others. So Lipstick has to be installed manually, and patching (for bug fixes and improvements) has to be taken care of by the end user.
In an upcoming post, we will look at how to install and configure Lipstick on top of Pig.