Wednesday, November 16, 2011

Hadoop release / version numbers


Edit: For easier access, I have moved this to the pages section just below the blog header and am no longer maintaining this entry.

Software release numbers and features can be daunting. Remember Windows 1.0, 2.0, 2.1x, 3.0, 3.1, 95, 98, Me, NT Workstation, 2000, XP, Vista, 7, 8, etc. (I might have missed some of them)? Microsoft seems to be learning a bit lately, with the Windows 7 and 8 naming. Ubuntu has a nice release scheme. The current release is 11.10 (the first number is the year and the second is the month of release), which says that it was released in October 2011, and the next release will be 12.04 (sometime around April 2012). Ubuntu also has clear guidelines on how long each version will be supported.
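To make the Ubuntu scheme concrete, here is a minimal sketch that decodes a release number into its year and month. The function name `parse_ubuntu_release` is made up for illustration, and the sketch assumes the release falls in the 2000s.

```python
# Month names indexed 1..12; index 0 is a placeholder.
MONTHS = (
    "", "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def parse_ubuntu_release(version):
    """Split a YY.MM release number into (year, month name).

    Hypothetical helper for illustration; assumes a two-digit year
    in the 2000s, which holds for the releases mentioned here.
    """
    year_part, month_part = version.split(".")
    return 2000 + int(year_part), MONTHS[int(month_part)]


print(parse_ubuntu_release("11.10"))
print(parse_ubuntu_release("12.04"))
```

The number itself carries the release date, so no lookup table of release names is needed.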

Coming to Hadoop, there are multiple releases (0.20.*, 0.21, 0.22, 0.23, etc.) and an à la carte selection of features in each release (CHANGES.txt in the release lists the JIRAs that have been pulled into it), and Hadoop users are confused about which release to pick. Some of these releases are stable and some aren't. There is a lengthy discussion going on in the Hadoop groups about making the release numbers easier for everyone. Currently, 0.20.security, 0.22 and 0.23 are the releases on which work is actively happening. The proposal is to call them release 1, 2 and 3 going forward, but this has yet to be finalized. 0.23 was released recently, but it is not production ready yet.

Besides the other improvements related to HDFS, here is how the old/new MR API and engine are supported in the different releases of Hadoop.


Release   Old API          New API   Classic MR Engine   New MR Engine (MRv2)
0.20.X    Y                Y         Y                   N
0.22      Y                Y         Y                   N
0.23      Y (deprecated)   Y         N                   Y
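
The table above can also be written down as a small lookup, which makes it easy to ask which releases support a given combination. This is only a sketch: the dictionary layout, the key names (`old_api`, `new_api`, `classic_engine`, `mrv2_engine`) and the helper `releases_supporting` are made up for illustration and are not part of Hadoop.

```python
# Support matrix transcribed from the table above.
# Key names are illustrative labels, not Hadoop identifiers.
SUPPORT = {
    "0.20.X": {"old_api": True, "new_api": True,
               "classic_engine": True, "mrv2_engine": False},
    "0.22":   {"old_api": True, "new_api": True,
               "classic_engine": True, "mrv2_engine": False},
    "0.23":   {"old_api": True, "new_api": True,
               "classic_engine": False, "mrv2_engine": True},
}

# Features that are still available but marked deprecated in the table.
DEPRECATED = {("0.23", "old_api")}


def releases_supporting(*features):
    """Return the releases in which every named feature is available."""
    return [
        release
        for release, flags in SUPPORT.items()
        if all(flags[feature] for feature in features)
    ]


print(releases_supporting("old_api", "classic_engine"))
print(releases_supporting("mrv2_engine"))
print(("0.23", "old_api") in DEPRECATED)
```

Keeping the deprecation note in a separate set keeps the matrix itself a plain yes/no grid, as in the table.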
