Thursday, April 3, 2014

Oozie High Availability

In earlier blog entries, we looked at how to install and configure Oozie, create and submit a simple workflow, and finally execute the workflow at regular intervals.
Oozie workflows are written in hPDL (Hadoop Process Definition Language), either with Hue or by hand in a plain text editor such as Notepad. Writing hPDL by hand is not easy, so Hue is the easiest approach because it generates the hPDL XML automatically. The Oozie Client submits the workflow definition to the Oozie Server, which in turn starts the actions defined in the workflow definition.
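
To give a feel for what hPDL looks like when written by hand, here is a minimal sketch of a workflow definition with a single filesystem action. The workflow name, the action name, the target path and the `${nameNode}` parameter are assumptions made for illustration; `${nameNode}` would have to be supplied in the job properties at submission time.

```xml
<!-- workflow.xml: a minimal hand-written hPDL sketch (names and paths are illustrative) -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="example-wf">
    <!-- Control flow begins at the node named in "to" -->
    <start to="make-dir"/>

    <!-- A single fs action that creates a directory on HDFS -->
    <action name="make-dir">
        <fs>
            <mkdir path="${nameNode}/tmp/oozie-example"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Reached only if the action fails -->
    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

Even this small example needs start, action, kill and end nodes wired together correctly, which is why generating the XML from Hue is usually less error prone.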

As seen in the above diagram, the Oozie Server is a single point of failure. Oozie now supports HA (multiple Oozie Servers running in active-active mode), and the feature is included in CDH 5. Here are the instructions for configuring Oozie in HA mode and here are more details about the Oozie HA feature from Cloudera.
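
As a rough sketch of what the HA configuration involves, the fragment below shows the kind of properties that go into oozie-site.xml on every Oozie Server. The property names are the ones I recall from the Apache Oozie documentation, so please verify them against the instructions for your version. The ZooKeeper hosts and the load balancer URL are placeholders. HA also assumes that all Oozie Servers share the same database and sit behind a load balancer or virtual IP, which is not shown here.

```xml
<!-- oozie-site.xml fragment: an HA sketch, host names are placeholders -->

<!-- Replace the default in-memory services with ZooKeeper-backed ones -->
<property>
    <name>oozie.services.ext</name>
    <value>
        org.apache.oozie.service.ZKLocksService,
        org.apache.oozie.service.ZKXLogStreamingService,
        org.apache.oozie.service.ZKJobsConcurrencyService
    </value>
</property>

<!-- ZooKeeper ensemble used by the Oozie Servers to coordinate -->
<property>
    <name>oozie.zookeeper.connection.string</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>

<!-- Oozie Servers using the same namespace form one HA group -->
<property>
    <name>oozie.zookeeper.namespace</name>
    <value>oozie</value>
</property>

<!-- Address of the load balancer in front of the Oozie Servers, not of a single server -->
<property>
    <name>oozie.base.url</name>
    <value>http://oozie-lb.example.com:11000/oozie</value>
</property>
```

Clients would then point at the load balancer URL, so the failure of a single Oozie Server does not interrupt workflow submission.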

1 comment:

  1. Hi Praveen,
    I am a follower of your blog. It is very useful for keeping up with what is going on in the Big Data community. Thank you for the very useful info.
    I read about ‘Spark’ in your blog. Its processing time is said to be 4 times faster than traditional Hadoop MR batch processing.
    Do you still suggest doing the 'Cloudera Certified Developer for Apache Hadoop CDH4 (CCD-410)' certification now that Spark has been released? How would it help? I ask because I am preparing for this certification and looking for a Big Data opportunity.
    Thanks for your advice.