Without a detailed explanation of what is what (that is due for another blog entry), here are the simple steps to get started with MRv2 (next generation MapReduce). Find more details about MRv2 here. So, here are the steps:
1) Download the Hadoop 2.x release here.
2) Extract it to a folder (let's call it $HADOOP_HOME).
3) Add the following to .bashrc in the home folder.
export HADOOP_HOME=/home/vm4learning/Installations/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
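After editing .bashrc, reload it so that the variables take effect in the current shell. A quick sanity check (the expected path below assumes the example installation directory used above):

source ~/.bashrc
# Should print /home/vm4learning/Installations/hadoop-2.2.0
echo $HADOOP_HOME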
4) Create the namenode and the datanode folders in the $HADOOP_HOME folder.
mkdir -p $HADOOP_HOME/yarn/yarn_data/hdfs/namenode
mkdir -p $HADOOP_HOME/yarn/yarn_data/hdfs/datanode
5) Create the following configuration files in the $HADOOP_HOME/etc/hadoop folder.
etc/hadoop/yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
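Note that the snippets here show only the <property> elements; in each file they must be placed inside the <configuration> element. As a sketch, the complete yarn-site.xml would look like this:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>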
etc/hadoop/core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/vm4learning/Installations/hadoop-2.2.0/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/vm4learning/Installations/hadoop-2.2.0/yarn/yarn_data/hdfs/datanode</value>
</property>
etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
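The 2.2.0 tarball ships an etc/hadoop/mapred-site.xml.template but no mapred-site.xml, so if the file is missing it can be created from the template before adding the property above:

cp $HADOOP_CONF_DIR/mapred-site.xml.template $HADOOP_CONF_DIR/mapred-site.xml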
6) Format the NameNode.
bin/hadoop namenode -format
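In Hadoop 2.x the hadoop namenode command is deprecated in favour of the hdfs command; the equivalent, which should report that the storage directory has been successfully formatted, is:

bin/hdfs namenode -format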
7) Start the Hadoop daemons.
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/hadoop-daemon.sh start secondarynamenode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
sbin/mr-jobhistory-daemon.sh start historyserver
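Alternatively, the release bundles convenience scripts that start the HDFS and YARN daemons together; a sketch equivalent to the first five commands above (the history server still has to be started separately):

sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver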
8) Check whether the installation succeeded.
a) Check the log files in the $HADOOP_HOME/logs folder for any errors.
b) The following consoles should come up
http://localhost:50070/ for the NameNode
http://localhost:8088/cluster for the ResourceManager
http://localhost:19888/jobhistory for the Job History Server
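If a browser is not handy, the consoles can also be probed from the shell; an HTTP status of 200 indicates the corresponding web UI is up:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088/cluster
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:19888/jobhistory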
c) Run the jps command to make sure that the daemons are running.
2234 Jps
1989 ResourceManager
2023 NodeManager
1856 DataNode
2060 JobHistoryServer
1793 NameNode
2049 SecondaryNameNode
9) Create a file and copy it to HDFS.
mkdir in
vi in/file
Put the following two lines into the file:
Hadoop is fast
Hadoop is cool
Then copy the folder to HDFS:
bin/hadoop dfs -copyFromLocal in/ /in
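Note that bin/hadoop dfs is deprecated in 2.x in favour of bin/hdfs dfs; the command above still works but prints a DEPRECATED warning. The equivalent, with a quick check that the file landed in HDFS:

bin/hdfs dfs -copyFromLocal in/ /in
# Should print the two lines typed into the file
bin/hdfs dfs -cat /in/file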
10) Run the example job.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /in /out
11) Verify through the NameNode web console (http://localhost:50070/dfshealth.jsp) that the /out folder has been created with the proper contents.
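The output can also be checked from the command line; with the default single reducer, wordcount writes its results to one part file (names assume the job ran as above):

bin/hdfs dfs -ls /out
# Should list each word with its count, e.g. Hadoop 2
bin/hdfs dfs -cat /out/part-r-00000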
12) Stop the daemons once the job has completed successfully.
sbin/hadoop-daemon.sh stop namenode
sbin/hadoop-daemon.sh stop datanode
sbin/hadoop-daemon.sh stop secondarynamenode
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh stop nodemanager
sbin/mr-jobhistory-daemon.sh stop historyserver
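As with startup, the bundled scripts can stop the HDFS and YARN daemons in one go, with the history server handled separately:

sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh stop historyserver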
Thanks for a good post.
Anyway, I'm running into a problem and was wondering if it's something you've encountered. My job history server works fine as long as I have dfs.permissions set to false like you have in your post. However, if I remove it, the job history server says "job not found" when I click on the job history link.
Looks like the problem is that the job files are written to HDFS with hdfs as the owner, but the job history server runs as the yarn user, which causes permission issues.
I am not sure about the problem. I suggest you post it in the Apache Hadoop forums or on StackOverflow.
While starting the NodeManager I got this error. Help please, sir:
Unrecognized option: -jvm
Could not create the Java virtual machine.
How do I solve this bug?
Don't run it as root.
I did the complete deployment as mentioned, on Ubuntu 12.04 LTS. I am getting the following error when I try to run any MapReduce application using hadoop-mapreduce-examples-0.23.6.jar:
Hadoop version: 0.23.6
Container launch failed for container_1364342550899_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1364342550899_0001_m_000000_0
Any help is greatly appreciated...
Raja,
You can fix the "invalid shuffle port number -1" problem by adding
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
to your yarn-site.xml
Praveen,
Great article. I was able to run it using the steps you mentioned. As you said at the start of your post, the whys behind the steps are not yet clear, but I was able to install Hadoop in pseudo-distributed mode using your steps.
I look forward to your next post explaining what each step means and why it's necessary. Until then, I would like to share the following warning message I got when I submitted one of the example jobs to Hadoop. Can you please advise why I got this warning? I am using the Hadoop 2.2 GA.
bin/hadoop dfs -copyFromLocal in/ /in
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
13/11/24 22:54:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Thanks Jack.
It's a warning which can be ignored, more details about it here - http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html.
Here are some resources to get started with YARN and to know what is what in this blog - http://www.thecloudavenue.com/p/mrv2resources.html.
Hi Praveen,
I have set up Hadoop 2.2.0 as per the article and executed the wordcount example. I am facing the following issue. Can you explain what might have gone wrong?
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /in /out
13/11/30 07:58:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/30 07:58:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/11/30 07:58:40 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/30 07:58:41 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/30 07:58:42 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/30 07:58:43 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/30 07:58:44 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
`yarn.resourcemanager.address` defaults to `0.0.0.0:8032`. So it looks like the ResourceManager is not running for some reason. Check the ResourceManager log file for any errors or exceptions.
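For reference, the ResourceManager address can also be set explicitly in yarn-site.xml instead of relying on the default; a sketch (the host and port here just spell out the defaults):

<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>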
Please, can you help me? I got the same error and I don't know how to fix it. Where will I find the log file of the ResourceManager?
Sripati...
I see the same issue, but I don't see any errors in the ResourceManager logs.
Krishna..
Are you able to resolve this issue?
Thanks
Chandra
Thank you so much. Your steps worked like a charm. You saved my day :)
Thanks Praveen, your steps worked perfectly. Thanks a lot.
ReplyDeletePlease help me solve the following error. :
hduser@ubuntu:/usr/local/hadoop$ hadoop fs -ls
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
14/03/08 09:27:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From ubuntu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Regards
Abdul
May I know how to edit the file system directory through the localhost:50070 console?
ReplyDeleteHi,
I am trying to debug in standalone mode with the following configuration:
fs.default.name = file:///
mapred.job.tracker = local
but I am getting the following errors:
14/06/04 13:55:08 INFO mapreduce.Job: Job job_1401791587704_0007 failed with state FAILED due to: Application application_1401791587704_0007 failed 2 times due to AM Container for appattempt_1401791587704_0007_000002 exited with exitCode: -1000 due to: File file:/user/hdfs/.staging/job_1401791587704_0007/job.jar does not exist
.Failing this attempt.. Failing the application.
14/06/04 13:55:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
bash-4.1$ ls -al /user/hdfs/.staging/job_1401791587704_0007/
total 108
drwx------. 2 hdfs hadoop 4096 Jun 4 13:55 .
drwx------. 6 hdfs hadoop 4096 Jun 4 13:55 ..
-rw-r--r--. 1 hdfs hadoop 7767 Jun 4 13:55 job.jar
-rw-r--r--. 1 hdfs hadoop 72 Jun 4 13:55 .job.jar.crc
-rw-r--r--. 1 hdfs hadoop 157 Jun 4 13:55 job.split
-rw-r--r--. 1 hdfs hadoop 12 Jun 4 13:55 .job.split.crc
-rw-r--r--. 1 hdfs hadoop 42 Jun 4 13:55 job.splitmetainfo
-rw-r--r--. 1 hdfs hadoop 12 Jun 4 13:55 .job.splitmetainfo.crc
-rw-r--r--. 1 hdfs hadoop 67865 Jun 4 13:55 job.xml
-rw-r--r--. 1 hdfs hadoop 540 Jun 4 13:55 .job.xml.crc
bash-4.1$ whoami
hdfs
What am I missing here? Is it possible to run standalone mode with Hadoop 2.2.0 YARN?
bin/hadoop dfs -copyFromLocal in/ /in
Where should I run the above command from: the root directory, the home directory, or the bin directory?
Really nice post. Very helpful.
In addition to what you included in your post, I would like to share some free resources on Hadoop: http://intellipaat.com/blog/setting-up-hadoop-single-node-setup/