Wednesday, March 21, 2012

HDFS Facts and Fiction

Sometimes even popular blogs and sites don't get the facts straight; I'm not sure whether their articles are reviewed by others. As I mentioned earlier, Hadoop is changing at a very rapid pace, and it's very difficult to keep up with the latest.

Here is a snippet from a ZDNet article:

> The Hadoop Distributed File System (HDFS) is a pillar of Hadoop. But its single-point-of-failure topology, and its ability to write to a file only once, leaves the Enterprise wanting more. Some vendors are trying to answer the call.

HDFS supports appends, and that's at the core of HBase: without HDFS append functionality, HBase doesn't work. There were some challenges in getting appends to work in HDFS, but they have been sorted out. Also, HBase supports random, real-time read/write access to data on top of HDFS.

Agreed, the NameNode is a SPOF in HDFS. But HDFS High Availability will include two NameNodes. For now, switchover from the active to the standby NameNode is a manual task to be done by an operator, and work is going on to make it automatic. Also, there are mitigations around the SPOF, like running a Secondary NameNode (which periodically checkpoints the NameNode metadata) and writing the NameNode metadata to multiple locations.
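As a rough illustration of the "multiple locations" mitigation, here is a minimal `hdfs-site.xml` sketch. The directory paths are hypothetical (one local disk and one NFS mount are assumed), and the property name depends on the Hadoop version: `dfs.namenode.name.dir` in newer releases, `dfs.name.dir` in older 1.x ones.

```xml
<!-- hdfs-site.xml (sketch): the directory paths below are hypothetical -->
<configuration>
  <property>
    <!-- Named dfs.name.dir in older (1.x) releases -->
    <name>dfs.namenode.name.dir</name>
    <!-- Comma-separated list; the NameNode writes its metadata to every directory listed -->
    <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>
</configuration>
```

With a setup along these lines, losing the local disk would still leave a copy of the metadata on the remote mount.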

One thing to ponder: if HDFS were so unreliable, we wouldn't have seen so many clusters using it. Also, other Apache frameworks are being built on top of HDFS.
