Thursday, November 24, 2011

HDFS Name Node High Availability

NameNode is the heart of HDFS. It stores the namespace for the filesystem and also tracks the location of the blocks in the the cluster. The location of the blocks are not persisted in the NameNode, but the DataNodes report the blocks it has to the NameNode when the DataNode starts. If an instance of NameNode is not available, then HDFS is not accessible till it's back running.

Hadoop 0.23 release introduced HDFS federation where it is possible to have multiple independent NameNodes in a cluster, where in a particular DataNode can have blocks for more than one Name Node. Federation provides horizontal scalability, better performance and isolation.

HDFS NN HA (NameNode High Availability) is an area where active work is happening. Here are the JIRA, Presentation and Video for the same. HDFS NN HA was not cut into 0.23 release and will be part of later releases. Changes are going in the HDFS-1623 branch, if someone is interested in the code.

Edit (11th March, 2012) : Detailed blog entry from Cloudera on HA.

Edit (14th October, 2013) : Explanation on HA from HortonWorks.

No comments:

Post a Comment