Tuesday, November 15, 2011

IPC between Hadoop Daemons

Hadoop has multiple daemons, namely the NameNode, DataNode, CheckpointNode, BackupNode, JobTracker and TaskTracker, and finally the client which submits the job. The interaction between these daemons is a bit complex and not well documented.

Hadoop has its own RPC mechanism for IPC. The arguments and return values are serialized using Writable (org.apache.hadoop.io.Writable), and the RPC protocols extend org.apache.hadoop.ipc.VersionedProtocol. So, to understand how the Hadoop daemons interact, the interfaces that extend VersionedProtocol are a good place to start.
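To make the two ideas concrete, here is a self-contained sketch that imitates them without the Hadoop jars. The local `Writable` interface copies the shape of `org.apache.hadoop.io.Writable` (`write(DataOutput)` / `readFields(DataInput)`), and the local `VersionedProtocol` is reduced to its `getProtocolVersion(String, long)` method. `PingProtocol`, `HeartbeatArgs`, `PingServer` and `WritableRpcSketch` are hypothetical names made up for illustration. The "wire" is just a byte array, so this only shows the serialization round trip and a version check (roughly the comparison `RPC.getProxy` makes before handing back a proxy in the 0.2x releases), not Hadoop's real socket or connection handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical driver: a single-JVM imitation of one RPC call.
class WritableRpcSketch {
    // Serialize a Writable to bytes: this is what would travel over the socket.
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        w.write(out);
        out.flush();
        return buf.toByteArray();
    }

    // Rebuild a Writable on the receiving side by filling in an empty instance.
    static <T extends Writable> T fromBytes(byte[] bytes, T empty) throws IOException {
        empty.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return empty;
    }

    // Client side: check the protocol version first, then ship the serialized argument.
    static int callHeartbeat(PingProtocol server, HeartbeatArgs args) throws IOException {
        long serverVersion =
                server.getProtocolVersion(PingProtocol.class.getName(), PingProtocol.versionID);
        if (serverVersion != PingProtocol.versionID) {
            throw new IOException("version mismatch: client=" + PingProtocol.versionID
                    + " server=" + serverVersion);
        }
        byte[] wire = toBytes(args);
        HeartbeatArgs received = fromBytes(wire, new HeartbeatArgs());
        return server.heartbeat(received);
    }

    public static void main(String[] argv) throws IOException {
        HeartbeatArgs hb = new HeartbeatArgs();
        hb.trackerName = "tracker_host1";
        hb.runningTasks = 3;
        PingServer server = new PingServer();
        System.out.println("heartbeats seen: " + callHeartbeat(server, hb));
    }
}

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Local stand-in reduced to getProtocolVersion (see org.apache.hadoop.ipc.VersionedProtocol).
interface VersionedProtocol {
    long getProtocolVersion(String protocol, long clientVersion) throws IOException;
}

// Hypothetical protocol: versionID is bumped whenever the interface changes incompatibly.
interface PingProtocol extends VersionedProtocol {
    long versionID = 1L;
    int heartbeat(HeartbeatArgs args) throws IOException;
}

// Hypothetical argument type: fields are read back in the same order they were written.
class HeartbeatArgs implements Writable {
    String trackerName = "";
    int runningTasks;

    public void write(DataOutput out) throws IOException {
        out.writeUTF(trackerName);
        out.writeInt(runningTasks);
    }

    public void readFields(DataInput in) throws IOException {
        trackerName = in.readUTF();
        runningTasks = in.readInt();
    }
}

// Hypothetical server implementation that counts the heartbeats it has received.
class PingServer implements PingProtocol {
    private int heartbeats;

    public long getProtocolVersion(String protocol, long clientVersion) {
        return versionID;
    }

    public int heartbeat(HeartbeatArgs args) {
        return ++heartbeats;
    }
}
```

The version check runs before any argument is serialized, so a client built against an older interface fails fast with a clear error instead of misreading bytes whose layout has changed.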

First, check out the Hadoop code locally using SVN (pick the appropriate branch):

svn co http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21/

Once the code is checked out, the following command lists the files containing the protocol definitions in Hadoop:

grep -r "extends VersionedProtocol" * | cut -f1 -d':'

common/src/java/org/apache/hadoop/security/authorize/RefreshAuthorizationPolicyProtocol.java
common/src/java/org/apache/hadoop/security/RefreshUserToGroupMappingsProtocol.java
common/src/java/org/apache/hadoop/ipc/AvroRpcEngine.java
common/src/test/core/org/apache/hadoop/security/TestDoAsEffectiveUser.java
common/src/test/core/org/apache/hadoop/ipc/TestRPC.java
common/src/test/core/org/apache/hadoop/ipc/MiniRPCBenchmark.java
common/src/test/system/java/org/apache/hadoop/test/system/DaemonProtocol.java
hdfs/src/java/org/apache/hadoop/hdfs/server/protocol/NamenodeProtocol.java
hdfs/src/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java
hdfs/src/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
hdfs/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
hdfs/src/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java
mapreduce/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
mapreduce/src/java/org/apache/hadoop/mapred/InterTrackerProtocol.java
mapreduce/src/java/org/apache/hadoop/mapred/AdminOperationsProtocol.java
mapreduce/src/java/org/apache/hadoop/mapred/TaskUmbilicalProtocol.java
mapreduce/src/contrib/raid/src/java/org/apache/hadoop/raid/protocol/RaidProtocol.java
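The grep/cut pipeline prints one line per match, so a file containing more than one `extends VersionedProtocol` would be listed more than once (`grep -rl` prints each file only once). As an alternative sketch, not part of the Hadoop build, the hypothetical `FindProtocols` program below walks a checkout with `java.nio.file.Files.walk`, keeps each `.java` file once, and also extracts the interface name. Its regular expression is a simplifying assumption: it matches `interface Name extends VersionedProtocol` with any whitespace between the words, so declarations that put another super-interface first (for example `extends Foo, VersionedProtocol`) or use generics are missed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: a Java equivalent of the grep above that also reports interface names.
class FindProtocols {
    // Matches declarations like "public interface DatanodeProtocol extends VersionedProtocol {".
    private static final Pattern DECL =
            Pattern.compile("interface\\s+(\\w+)\\s+extends\\s+VersionedProtocol\\b");

    // Maps each matching .java file (path relative to root, sorted) to the interface name(s) it declares.
    static Map<String, String> find(Path root) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
                 .forEach(p -> {
                     String src;
                     try {
                         src = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                     Matcher m = DECL.matcher(src);
                     StringBuilder names = new StringBuilder();
                     while (m.find()) {
                         if (names.length() > 0) names.append(", ");
                         names.append(m.group(1));
                     }
                     if (names.length() > 0) {
                         result.put(root.relativize(p).toString(), names.toString());
                     }
                 });
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory, e.g. the root of the branch-0.21 checkout.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        find(root).forEach((file, names) -> System.out.println(file + "  " + names));
    }
}
```

Run from the root of the checkout (`java FindProtocols .`), it should print one line per file with the interface name next to the path.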

Some of the interesting interfaces are NamenodeProtocol, InterDatanodeProtocol, DatanodeProtocol, ClientProtocol (there is one in HDFS and one in MapReduce), ClientDatanodeProtocol, InterTrackerProtocol, AdminOperationsProtocol and TaskUmbilicalProtocol. Exploring them gives further insight into how the Hadoop daemons talk to each other.
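On the calling side, Hadoop's RPC layer hands the caller an object implementing one of these protocol interfaces, created with `java.lang.reflect.Proxy`, so a remote call looks like an ordinary method call; the server looks the method up by name and parameter types and invokes it reflectively. The sketch below imitates that idea inside one JVM. `TrackerProtocol`, `LocalTracker` and `ProxySketch` are hypothetical names loosely modeled on tracker heartbeats, not the real `InterTrackerProtocol` signatures, and the version handshake is left out. Where Hadoop would serialize the method name and arguments onto a socket, this handler just records them in a list.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver showing a dynamic-proxy client stub.
class ProxySketch {
    // Stand-in for the wire: records "methodName[args]" for every call made through a proxy.
    static final List<String> wireLog = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T getProxy(Class<T> protocol, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Client side: a real RPC client would serialize these and send them to the server.
            wireLog.add(method.getName() + Arrays.toString(args));
            // Server side: find the method on the protocol by name and parameter types, then invoke it.
            Method m = protocol.getMethod(method.getName(), method.getParameterTypes());
            try {
                return m.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the implementation's own exception to the caller
            }
        };
        return (T) Proxy.newProxyInstance(protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
    }

    public static void main(String[] args) {
        TrackerProtocol master = getProxy(TrackerProtocol.class, new LocalTracker());
        System.out.println(master.heartbeat("tracker_host1", 2));
        System.out.println(master.heartbeat("tracker_host2", 0));
        System.out.println(wireLog);
    }
}

// Hypothetical protocol: a tracker reports its free slots and receives an instruction.
interface TrackerProtocol {
    String heartbeat(String trackerName, int freeSlots);
}

// Hypothetical "server" implementation living in the same JVM.
class LocalTracker implements TrackerProtocol {
    public String heartbeat(String trackerName, int freeSlots) {
        return freeSlots > 0 ? "assign-task:" + trackerName : "no-op:" + trackerName;
    }
}
```

Because the caller only ever sees the interface type, the same client code works whether the implementation is local (as here) or in another daemon, which is why reading the protocol interfaces alone tells you so much about the inter-daemon traffic.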

Edit: Converted the above list of classes into URLs for the eager and the impatient.
