I have set up a Hadoop cluster with `NameNode + DataNode` on one node and a `DataNode` on a second node, with the configuration files below on both nodes.
core-site.xml
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
hdfs-site.xml
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```
The DataNode on the remote machine cannot connect to the NameNode. Here is the error from the hadoop-praveensripati-datanode-Node2.log file on Node2 (Node1 is the hostname of the node running the NameNode):
```
2012-01-03 16:57:57,924 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node1/192.168.0.101:9000. Already tried 0 time(s).
2012-01-03 16:57:58,926 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node1/192.168.0.101:9000. Already tried 1 time(s).
2012-01-03 16:57:59,928 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node1/192.168.0.101:9000. Already tried 2 time(s).
```
I made sure that:
- Both the nodes can ping each other.
- Successfully ssh'd from the master to the slave.
- Configured the `/etc/hosts` and `/etc/hostname` properly.
- `netstat -a | grep 9000` gives the output below.
```
tcp  0  0  localhost:9000   *:*               LISTEN
tcp  0  0  localhost:9000   localhost:33476   ESTABLISHED
tcp  0  0  localhost:33571  localhost:9000    TIME_WAIT
tcp  0  0  localhost:33476  localhost:9000    ESTABLISHED
```
What's wrong with the above setup?
Respond in the comments and I will post a detailed explanation once I get a proper response.
I think `fs.default.name` should point to the Node1 IP address instead of localhost.
As to why: with `fs.default.name` set to localhost, the source code effectively does the following:

```java
ServerSocket socket = new ServerSocket();
socket.bind(new InetSocketAddress("localhost", 9000));
```
Because the bind address is localhost, the NameNode process can only accept connections from localhost. If the bind address is set to the machine's hostname or IP address, the NameNode process can accept connections from remote machines.
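The effect described above can be checked with a small, self-contained Java sketch. This is not Hadoop's actual code, just an illustration of how the bind address chosen for a `ServerSocket` restricts who can connect:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        // Bound to localhost: the socket sits on the loopback interface,
        // so only clients on the same machine can reach it.
        ServerSocket loopbackOnly = new ServerSocket();
        loopbackOnly.bind(new InetSocketAddress("localhost", 0)); // port 0 = any free port
        System.out.println("loopback-only bind: "
                + loopbackOnly.getInetAddress().isLoopbackAddress());

        // Bound to the wildcard address: the socket is reachable on
        // every network interface, including from remote machines.
        ServerSocket allInterfaces = new ServerSocket();
        allInterfaces.bind(new InetSocketAddress((InetAddress) null, 0));
        System.out.println("wildcard bind: "
                + allInterfaces.getInetAddress().isAnyLocalAddress());

        loopbackOnly.close();
        allInterfaces.close();
    }
}
```

Setting `fs.default.name` to `hdfs://Node1:9000` corresponds to the second case from the DataNodes' point of view: the NameNode listens on an address they can actually reach.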
I tried to set up a cluster with a NameNode on one node and 3 DataNodes. I set `fs.default.name` to localhost on the NameNode and to the master's IP address on the DataNodes, but I am getting the same error. Can anyone please help me?
> I set fs.default.name to localhost on the NameNode

Set it to the hostname/IP as mentioned in the first comment, instead of localhost.
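Applied to the configuration in the post, the corrected core-site.xml would look something like this (using the Node1 hostname from the post; the IP 192.168.0.101 should work as well):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Node1:9000</value>
  </property>
</configuration>
```

The same value should be used on every node, so that the NameNode binds to an externally reachable address and the DataNodes connect to that same address.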
Yes, I replaced it and it worked. Thank you!
Hi, would this also be the case for mapred-site.xml? That is, would putting localhost:9001 in mapred-site.xml also cause bind problems for a TaskTracker trying to connect to the JobTracker?
Chris,

The same applies to mapred-site.xml as well. If localhost:9001 is specified in the configuration file, then a remote TaskTracker won't be able to talk to the JobTracker.

Praveen
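A mapred-site.xml along the same lines might look like the following. This is a sketch only, assuming the classic (pre-YARN) `mapred.job.tracker` property and the Node1 hostname from the post:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Node1:9001</value>
  </property>
</configuration>
```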
Thanks Praveen, this helped me finally get my Hadoop cluster up and running.