Wednesday, November 27, 2013

Installing and configuring Storm on Ubuntu 12.04

 
So far in this blog we have been exploring frameworks like Hadoop, Pig, Hive and others, which are batch oriented (some call it "long time", by analogy with "real time"). Depending on the size of the data and the cluster, it might take a couple of hours to process the data.

The frameworks mentioned above might not meet every user requirement. Take the use case of a credit card or insurance company: they would like to detect fraud as soon as possible to minimize its effects. This is where frameworks like Apache Storm, LinkedIn Samza and Amazon Kinesis help to fill the gap.

Storm was developed at BackType, a social analytics company that was acquired by Twitter. Twitter later released the code for Storm. This Twitter blog makes a good introduction to the Storm architecture. Instead of repeating the same here, we will look at how to install and configure Apache Storm on a single node. It took me a couple of hours to figure it out, so this blog entry is meant to make it easy for others to get started with Storm. 1, 2 were really helpful in figuring it out. Note that Storm has been submitted to the Apache Software Foundation and is in the Incubator phase.

Here are the steps from 10,000 feet:

- Download the binaries, install and configure ZooKeeper.
- Download the code, build and install zeromq and jzmq.
- Download the binaries, install and configure Storm.
- Download the sample Storm code, build and execute it.
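
Before starting on these steps, it can help to confirm that the command-line tools they rely on are present. The following is a minimal sketch and not part of the original setup: missing_tools is a hypothetical helper name, and the tool list in the usage note is an assumption drawn from the commands used later in this post.

```shell
#!/bin/sh
# Hypothetical helper (not from Storm or ZooKeeper): prints the name of
# every command given as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command exists.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

Calling it as `missing_tools wget tar make sed git mvn java` prints only the tools that still need to be installed, and prints nothing when all of them are available.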

Here are the steps in detail. There is no need to install Hadoop, because Storm works independently of Hadoop. Note that these steps have been tried on Ubuntu 12.04 and might change a bit on other operating systems.

- Download the ZooKeeper binaries and extract them. Create the data folder and update conf/zoo.cfg to point to it (if conf/zoo.cfg does not exist yet, copy conf/zoo_sample.cfg to conf/zoo.cfg first). By default dataDir points to a folder under /tmp, which is cleaned on every boot. The rest of the default settings are good enough.
dataDir=/home/vm4learning/Installations/zookeeper-3.4.5/data
- Start ZooKeeper with:
bin/zkServer.sh start
- Install the uuid-dev package from the terminal. This is a prerequisite for the next step.
sudo apt-get install uuid-dev
- Download the code for zeromq, compile and install it. Resist the temptation to get the latest zeromq code, because it is not compatible with the jzmq Java bindings for zeromq.
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install
- Install the git and libtool packages from the terminal. These are prerequisites for the next step.
sudo apt-get install libtool git
- Download the code for jzmq. These are the Java bindings for zeromq. Compile and install it.
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
sed -i 's/classdist_noinst.stamp/classnoinst.stamp/g' src/Makefile.am
./autogen.sh
./configure
make
sudo make install
- Now we are all set to install Storm. Download the latest Storm release here and extract it. Create a folder called data in the extracted folder.

- Modify the conf/storm.yaml file to reflect the environment in which Storm runs. More details about these parameters here. The supervisor can run up to two workers on this node, since two ports are specified (6700, 6701). Also, replace 10.0.2.15 with the IP address of the node on which Storm is being installed.
storm.zookeeper.servers:
- "10.0.2.15"

storm.local.dir: "/home/vm4learning/Installations/storm-0.8.2/data"
nimbus.host: "10.0.2.15"
supervisor.slots.ports:
- 6700
- 6701
- Start the Storm daemons and check the logs for any exceptions. Each daemon runs in the foreground, so start each one in its own terminal. The nimbus service is similar to the JobTracker and the supervisor service is similar to the TaskTracker in Hadoop. More details about the Storm terminology are specified here.
bin/storm nimbus
bin/storm supervisor
- Start the Storm UI and check the logs for any exceptions. Go to the Storm UI at http://localhost:8080.
bin/storm ui
- With Storm running on a single node, it's time to execute the sample code. Get the sample code from Git.
git clone https://github.com/nathanmarz/storm-starter.git
- Package the code. After a successful build, storm-starter-*.jar is created in the target folder.
mvn -f m2-pom.xml package
- Execute the WordCountTopology example. The job is submitted to Storm and control returns immediately. The last parameter, WordCount, is the topology name, which can be observed in the Storm UI. Check the logs for any exceptions.
bin/storm jar /home/vm4learning/Code/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount
- The following is the log output after a successful submission of the topology to Storm.
0    [main] INFO  backtype.storm.StormSubmitter  - Jar not uploaded to master yet. Submitting jar...
14   [main] INFO  backtype.storm.StormSubmitter  - Uploading topology jar /home/vm4learning/Code/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar to assigned location: /home/vm4learning/Installations/storm-0.8.2/data/nimbus/inbox/stormjar-cf3c6d6c-8b27-4827-9cf6-11e793aabb87.jar
289  [main] INFO  backtype.storm.StormSubmitter  - Successfully uploaded topology jar to assigned location: /home/vm4learning/Installations/storm-0.8.2/data/nimbus/inbox/stormjar-cf3c6d6c-8b27-4827-9cf6-11e793aabb87.jar
289  [main] INFO  backtype.storm.StormSubmitter  - Submitting topology WordCount in distributed mode with conf {"topology.workers":3,"topology.debug":true}
1455 [main] INFO  backtype.storm.StormSubmitter  - Finished submitting topology: WordCount
- The topology should appear in the Storm UI.
- Tuples should be emitted by the spout and the bolts defined and linked in the topology.
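
The conf/storm.yaml shown in the configuration step above has to be adjusted by hand for every node. As a rough sketch of how that could be scripted, the following function writes the same four settings to standard output. The function name render_storm_yaml and its argument order are hypothetical and are not part of Storm; only the setting names are taken from the file above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Storm): prints a single-node storm.yaml.
# Usage: render_storm_yaml <node-ip> <local-dir> <port> [<port> ...]
render_storm_yaml() {
    if [ "$#" -lt 3 ]; then
        echo "usage: render_storm_yaml <node-ip> <local-dir> <port>..." >&2
        return 1
    fi
    node_ip=$1
    local_dir=$2
    shift 2    # the remaining arguments are the supervisor slot ports

    printf 'storm.zookeeper.servers:\n'
    printf '    - "%s"\n' "$node_ip"
    printf '\n'
    printf 'storm.local.dir: "%s"\n' "$local_dir"
    printf 'nimbus.host: "%s"\n' "$node_ip"
    printf 'supervisor.slots.ports:\n'
    # One list entry per port; each port is one worker slot.
    for port in "$@"; do
        printf '    - %s\n' "$port"
    done
}
```

For example, `render_storm_yaml 10.0.2.15 /home/vm4learning/Installations/storm-0.8.2/data 6700 6701 > conf/storm.yaml` should produce a file equivalent to the one used in this post, apart from the indentation of the list entries.
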
To summarize, we looked at how to install Storm on a single Ubuntu 12.04 node and run the sample WordCountTopology on it. In future posts, we will look at some of the advanced features of Storm.

1) An article from IBM DeveloperWorks on Storm.

9 comments:

  1. Thanks for this post, it really helped me to submit a topology,
    but I have a problem with this command:
    user@ubuntu:~/Storm/storm$ bin/storm jar /home/user/storm/storm-starter.jar storm.starter.WordCountTopology wordcount

    got this

    cannot find or load main class of storm.starter.WordCountTopology

    Can you help me with this?

    Replies
    1. For some reason WordCountTopology is not on the classpath. Check the Stack Overflow answer below on how to solve it.

      http://stackoverflow.com/questions/18093928/what-does-could-not-find-or-load-main-class-mean

  3. I followed your instructions above for replacing "classdist_noinst.stamp" with "classnoinst.stamp" in src/Makefile.am, but I am getting the following error with jzmq:

    "No rule to make target `classnoinst.stamp', needed by `org/zeromq/ZMQ.class'"

  4. Fixed it using the following:

    5) cd src
    6) touch classdist_noinst.stamp
    7) gmake classnoinst.stamp CLASSPATH=.:./.:$CLASSPATH javac -d . org/zeromq/ZMQ.java org/zeromq/ZMQException.java org/zeromq/ZMQQueue.java org/zeromq/ZMQForwarder.java org/zeromq/ZMQStreamer.java echo timestamp > classnoinst.stamp
    8) cd ..
    9) gmake
    10) gmake install

  5. Hi, I have gone through your tutorial and WordCount works fine. Thanks a lot.
    A question here: how can I submit RollingTopWords to Storm?

  6. When I run storm ui,

    org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift7.transport.TSocket.open(TSocket.java:183)

    Can I know why?

  7. When I run the word count topology,

    It shows the Num of workers is 28, the Num of executors is 2 and Num of tasks is 28, which is different from the graph on this post. Is it normal?

    Thanks.
