Thursday, October 27, 2011

'Hadoop - The Definitive Guide' Book Review

For those who are serious about getting into Hadoop, besides going through the tons of articles and tutorials on the Internet, 'Hadoop - The Definitive Guide' (2nd Edition) by Tom White is a must-have book. Most of the tutorials stop with the 'Word Count' example, but this book goes to the next level, explaining the nuts and bolts of the Hadoop framework with a lot of examples and references. Most interesting and important, the book also explains why certain design decisions were made in Hadoop.
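For context, the 'Word Count' example those tutorials stop at boils down to a map step that emits (word, 1) pairs and a reduce step that sums the counts per word. A minimal pure-Python sketch of that flow (a conceptual illustration, not actual Hadoop API code):

```python
from collections import defaultdict

def map_phase(lines):
    # Like a WordCount mapper: emit a (word, 1) pair for every word
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Like the framework's shuffle: group values by key between map and reduce
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Like a WordCount reducer: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # → 2
```

In real Hadoop, the shuffle is done by the framework itself, and the mapper and reducer would be Java classes extending the MapReduce API; the data flow, however, is exactly this.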


The book not only covers HDFS and MapReduce, but also gives an overview of the layers that sit on top of Hadoop, such as Pig, Hive, HBase, ZooKeeper and Sqoop.

The book could definitely be improved in the following areas:
  • MapReduce is covered in detail, but HDFS internals and fine-tuning are covered only at a high level.
  • To stay in sync with the latest Hadoop development and features, it's absolutely necessary to get the source from trunk or another branch and build, package and try it out yourself.
  • NextGen MapReduce, HDFS Federation and a slew of other features being released as part of Hadoop Release 0.23 are not covered.

The 3rd Edition of the same book is due on April 30th, 2012, and it looks like it has more case studies as well as new material on MRv2. The 3rd Edition is worth waiting for, but for the impatient who want to get started immediately, the 2nd Edition is a must-have.

4 comments:

  1. Do I need to know Java or an OOP language for sure to get into Hadoop? I have programming experience, but not essentially in an OOP language. Please respond. Thanks!

    1. Yes and no, depending on what you want to do with Hadoop. Here is a more detailed explanation:

      http://www.thecloudavenue.com/2012/10/is-java-prerequisite-for-getting.html

  2. Is there any way to take the data from HDFS for data visualization using d3/nvd3? Thanks in advance.

    1. HDFS has high latency, so I don't see a use case for directly integrating d3/nvd3 with HDFS.

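On the d3/nvd3 question above, a common pattern is not to read HDFS from the browser at all, but to have a small server-side script pull the data out over HDFS's WebHDFS REST interface and serve it as JSON to d3/nvd3. A hedged Python sketch that just builds the WebHDFS read URL (the namenode host, port and file path are made-up placeholders):

```python
from urllib.parse import urlencode

def webhdfs_open_url(namenode, port, hdfs_path, user):
    # Build the WebHDFS REST URL to read a file (op=OPEN);
    # the actual read would be an HTTP GET against this URL.
    query = urlencode({"op": "OPEN", "user.name": user})
    return "http://%s:%d/webhdfs/v1%s?%s" % (namenode, port, hdfs_path, query)

# 'namenode.example.com' and '/data/counts.json' are hypothetical placeholders
url = webhdfs_open_url("namenode.example.com", 50070, "/data/counts.json", "hdfs")
print(url)
```

An HTTP GET on that URL returns the file contents (after a redirect to a datanode), which the script can reshape into the JSON that nvd3 charts expect. This is still batch-style access, consistent with the latency caveat in the reply above.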