Monday, July 3, 2017

Accessing the EMR Web Consoles

In the previous blog, we looked at how to start an AWS EMR cluster and run a Hive script. Once the cluster has started, it provides web consoles to check the status of the cluster and to follow the progress of the different data processing tasks. By default, access to these web consoles is blocked for security reasons.

Below are the URLs of some of the web consoles.
YARN ResourceManager   http://master-public-dns-name:8088/
Hadoop HDFS NameNode   http://master-public-dns-name:50070/
Spark HistoryServer    http://master-public-dns-name:18080/
Zeppelin               http://master-public-dns-name:8890/
Hue                    http://master-public-dns-name:8888/
Ganglia                http://master-public-dns-name/ganglia/
HBase UI               http://master-public-dns-name:16010/

YARN NodeManager       http://slave-public-dns-name:8042/
Hadoop HDFS DataNode   http://slave-public-dns-name:50075/

In this blog, we will explore how to access these web consoles. The AWS documentation for the same is here.

Step 1 : Start the EMR cluster as shown in the previous blog.

Step 2 : Set up an SSH tunnel to the master node using local port forwarding with the command below. Here, local port 8157 is forwarded to remote port 8088. Port 8157 can be replaced by any free local port. 8088 is the port on which the YARN console is available, and it can be replaced by the port of whichever web console we want to access.
ssh -i /home/praveen/Documents/AWS-Keys/MyKeyPair.pem -N -L

In the above command, replace the following:

a) the path of the key pair
b) the DNS name of the master node (twice)
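
The command as shown above is cut off after the -L option. As a rough sketch of the general shape of such a tunnel command, the helper below prints (but does not run) an ssh command with local port forwarding. The function name build_tunnel_cmd is hypothetical, master-public-dns-name is the same placeholder used in the URL list, and the login user hadoop is an assumption based on the usual EMR default, so adjust all of these to your own cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints the ssh tunnel command instead of running it.
# Arguments: key path, master DNS name, local port, remote console port.
build_tunnel_cmd() {
    key_path="$1"
    master_dns="$2"
    local_port="${3:-8157}"    # any free local port
    remote_port="${4:-8088}"   # 8088 = YARN ResourceManager console
    # -N: do not run a remote command, only forward ports
    # -L: forward local_port on this machine to master_dns:remote_port
    # "hadoop" is assumed to be the login user on the master node.
    printf 'ssh -i %s -N -L %s:%s:%s hadoop@%s\n' \
        "$key_path" "$local_port" "$master_dns" "$remote_port" "$master_dns"
}

# Example: print the command for the YARN console with the default ports.
build_tunnel_cmd /home/praveen/Documents/AWS-Keys/MyKeyPair.pem master-public-dns-name
```

Note that the master DNS name appears twice in the printed command, once as the forwarding target and once as the host to log in to, which matches point b) above.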

Step 3 : Access the YARN console from a browser on the same machine on which Step 2 was performed, using the forwarded local port (8157 in this example).
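
As a small sketch of what this looks like in practice, assuming the tunnel from Step 2 is still running and uses local port 8157, the console is reached through localhost rather than through the master's DNS name. The variable names here are only illustrative.

```shell
#!/bin/sh
# Local end of the tunnel from Step 2 (assumed to be 8157, as in this post).
LOCAL_PORT=8157
CONSOLE_URL="http://localhost:${LOCAL_PORT}/"

# Print the URL to open in the browser.
echo "$CONSOLE_URL"

# Optional check from the terminal while the tunnel is up:
#   curl -sI "$CONSOLE_URL" | head -n 1
```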

An alternative to the SSH tunnel is to change the master node's Security Group to allow inbound traffic on port 8088, which is the YARN web console port.
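
If the Security Group route is taken, the rule can be added from the AWS console or from the AWS CLI. The sketch below only prints an aws ec2 authorize-security-group-ingress command for review. The helper name build_ingress_cmd, the security group ID, and the CIDR are all placeholders, and limiting the rule to a single IP address is an assumption made here, not something prescribed above.

```shell
#!/bin/sh
# Hypothetical helper: prints the AWS CLI command instead of running it.
# Arguments: security group ID, allowed source CIDR, port (default 8088).
build_ingress_cmd() {
    group_id="$1"
    source_cidr="$2"
    port="${3:-8088}"   # 8088 = YARN web console
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
        "$group_id" "$port" "$source_cidr"
}

# Example with placeholder values (replace with the master's group ID and your own IP).
build_ingress_cmd sg-0123456789abcdef0 203.0.113.10/32
```

With such a rule in place, the console would be opened directly at http://master-public-dns-name:8088/ instead of through localhost.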