I have set up a Hadoop cluster with `NameNode + DataNode` on one node and a `DataNode` on a second node, with the configuration files below on both nodes.
core-site.xml
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
hdfs-site.xml
```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```
The DataNode on the remote machine cannot connect to the NameNode. Here is the error from the hadoop-praveensripati-datanode-Node2.log file on Node2 (Node1 is the hostname of the node running the NameNode):
```
2012-01-03 16:57:57,924 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node1/192.168.0.101:9000. Already tried 0 time(s).
2012-01-03 16:57:58,926 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node1/192.168.0.101:9000. Already tried 1 time(s).
2012-01-03 16:57:59,928 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Node1/192.168.0.101:9000. Already tried 2 time(s).
```
I made sure that:
- Both the nodes can ping each other.
- Successfully ssh'd from the master to the slave.
- Configured the `/etc/hosts` and `/etc/hostname` properly.
- `netstat -a | grep 9000` gives the output below.
```
tcp  0  0  localhost:9000   *:*               LISTEN
tcp  0  0  localhost:9000   localhost:33476   ESTABLISHED
tcp  0  0  localhost:33571  localhost:9000    TIME_WAIT
tcp  0  0  localhost:33476  localhost:9000    ESTABLISHED
```
What's wrong with the above setup?
Respond in the comments and I will post a detailed explanation once I get a proper response.
I think `fs.default.name` should point to the Node1 IP address instead of localhost.
As to why: with `fs.default.name` set to localhost, the source code effectively does the following:

```java
ServerSocket socket = new ServerSocket();
socket.bind(new InetSocketAddress("localhost", 9000));
```
Because the bind address is localhost, the NameNode process can only accept connections from localhost. If the bind address is set to the machine's hostname or IP address, the NameNode process can accept connections from remote machines.
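The effect described above can be checked with a small, self-contained Java sketch. This is not Hadoop's actual code, just an illustration of how the bind address chosen for a `ServerSocket` restricts who can connect:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        // Bound to localhost: the socket sits on the loopback interface,
        // so only clients on the same machine can reach it.
        ServerSocket loopbackOnly = new ServerSocket();
        loopbackOnly.bind(new InetSocketAddress("localhost", 0)); // port 0 = any free port
        System.out.println("loopback-only bind: "
                + loopbackOnly.getInetAddress().isLoopbackAddress());

        // Bound to the wildcard address: the socket is reachable on
        // every network interface, including from remote machines.
        ServerSocket allInterfaces = new ServerSocket();
        allInterfaces.bind(new InetSocketAddress((InetAddress) null, 0));
        System.out.println("wildcard bind: "
                + allInterfaces.getInetAddress().isAnyLocalAddress());

        loopbackOnly.close();
        allInterfaces.close();
    }
}
```

Setting `fs.default.name` to `hdfs://Node1:9000` corresponds to the second case from the DataNodes' point of view: the NameNode listens on an address they can actually reach.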
I tried to set up a cluster with a NameNode on one node and 3 DataNodes. I set `fs.default.name` to localhost on the NameNode and to the master's IP address on the DataNodes, but I am getting the same error. Can anyone please help me?
> I set fs.default.name to localhost on the NameNode

Set it to the hostname/IP as mentioned in the first comment, instead of localhost.
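Applied to the configuration in the post, the corrected core-site.xml would look something like this (using the Node1 hostname from the post; the IP 192.168.0.101 should work as well):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Node1:9000</value>
  </property>
</configuration>
```

The same value should be used on every node, so that the NameNode binds to an externally reachable address and the DataNodes connect to that same address.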
Yes, I replaced it and it worked. Thank you!
Hi, would this also be the case for mapred-site.xml? That is, would putting localhost:9001 in mapred-site.xml also cause bind problems for a TaskTracker trying to connect to the JobTracker?
Chris,

The same applies to mapred-site.xml as well. If localhost:9001 is specified in the configuration file, then a remote TaskTracker won't be able to talk to the JobTracker.

Praveen
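A mapred-site.xml along the same lines might look like the following. This is a sketch only, assuming the classic (pre-YARN) `mapred.job.tracker` property and the Node1 hostname from the post:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Node1:9001</value>
  </property>
</configuration>
```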
Thanks Praveen, this helped me finally get my Hadoop cluster up and running.