Tuesday, June 26, 2012

Beyond Hadoop WordCount


I often get the question `Now that I have run the wordcount example - what next? What else can be implemented on top of Hadoop?`. Here are some of the options to consider:

- Go through the code in the Hadoop examples package and understand in detail how MapReduce works.

- Implement some of the examples in `Data-Intensive Text Processing with MapReduce`.

- Pick a topic of interest from the blog entry from atbrox and start implementing it in MapReduce.

BTW, although the algorithms mentioned above can be implemented in MapReduce, it might not be the best model for them. Start considering alternative models like BSP (Bulk Synchronous Parallel). Take a look at the Apache Hama and Giraph frameworks for implementing some of the above-mentioned algorithms. Iterative algorithms like PageRank, in particular, can be implemented more efficiently on Hama and Giraph than on Hadoop.
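To get a feel for why the BSP model suits iterative algorithms, here is a minimal sketch of PageRank written as a series of supersteps. It is a toy, single-process simulation in Python and not the Hama or Giraph API; the function name `pagerank_bsp`, its parameters and the sample graph are all made up for illustration, and it assumes every neighbour also appears as a key in the graph. The point it tries to show is that the ranks stay in memory from one superstep to the next, whereas a plain MapReduce implementation would typically run one job per iteration and write the intermediate ranks out in between.

```python
# Toy, single-process simulation of vertex-centric BSP PageRank.
# This is NOT the Hama or Giraph API; all names here are hypothetical.

def pagerank_bsp(graph, supersteps=30, damping=0.85):
    """graph maps each vertex to a list of its out-neighbours.

    Assumption: every neighbour is itself a key of graph.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}

    for _ in range(supersteps):
        # Compute phase: each vertex sends rank / out_degree to its neighbours.
        inbox = {v: 0.0 for v in graph}
        dangling = 0.0
        for v, neighbours in graph.items():
            if neighbours:
                share = rank[v] / len(neighbours)
                for u in neighbours:
                    inbox[u] += share
            else:
                # A vertex without out-edges spreads its rank over all vertices.
                dangling += rank[v]

        # Barrier: only after all messages are sent does each vertex update.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in graph
        }
    return rank


if __name__ == "__main__":
    sample = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(pagerank_bsp(sample).items()):
        print(vertex, round(score, 4))
```

In a real framework the loop over vertices would be spread across many machines and the barrier would be a network-wide synchronization, but the shape of the computation (compute, send messages, wait at the barrier, repeat) would be roughly the same.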

Edit (27th August, 2012) : This blog article has a nice summary on how to get started analyzing the public data sets.

Monday, June 25, 2012

`Graph Processing Applications` Session @ HUG-Hyderabad

Last Saturday (23rd June, 2012) I gave a session on `Graph Processing Applications` @ Hyderabad-HUG. The session went very well and the response from the audience (~80) was also positive. Some in the audience asked for more technical details with a demo on Hama/Giraph. I plan to give another session on the same in the near future.

Here is the presentation I used for the session


Some attendees were not familiar with graphs, so I started with a basic introduction to graphs and then talked about the different graph processing frameworks (Giraph, Hama) and graph databases (Neo4J).

Broadridge did a very good job of hosting the event, and I felt at home giving the session. Overall, I am very satisfied with the session and plan to give a few more related to Big Data. Ed and Thomas were very helpful in getting me started with Hama and some of the concepts behind BSP. Thanks to both of them.

If you are in Hyderabad, I would suggest following this meetup and participating (both at the receiving and giving end) in the upcoming HUG sessions. Also, if the company you are working for is interested in hosting a HUG session @ Hyderabad, please let me know at praveensripati@gmail.com.

Have a nice day !!!

Edit (8 July, 2012) - There is an interesting article in GigaOM on the different alternatives to Hadoop/MapReduce.

Tuesday, June 19, 2012

Does the Operating System really matter?

In the past month I was able to convert two Windows users to Ubuntu and get my kid (4 years old) started with Ubuntu without much difficulty.

The first user wanted me to check why her Windows XP laptop was slow (it was taking ~5 min to boot). With her permission I installed Ubuntu 12.04 with all the required software (like VLC, media codecs, Firefox with a couple of extensions, some nice educational games for kids and others). With about 15 minutes of hand-holding on Ubuntu, she was able to explore it with ease.

The second user was using Windows for checking mail, blogging, checking social network sites (Facebook etc.) and for Skype. Again, it was a snap to get her used to Firefox and Skype on Ubuntu 12.04.

My kid just started with computers. I have him use my desktop with Ubuntu 12.04 and he is comfortable with some of the educational games that I have installed on it (GCompris and others) for him.

These experiences made me wonder whether the OS really matters anymore, as long as equivalent software is available on the different operating systems. Also, as more and more applications move to the web, the significance of a particular operating system keeps shrinking. It's very rare nowadays that a web site works in one browser and not in another, and the browser experience is almost the same on any operating system.

Hope to convert more computer users to Ubuntu !!!