Sunday, July 6, 2014

Bye bye - thecloudavenue.com

It has been close to four nice years writing on this blog. Those who have been following closely might have noticed that I have not been very active here lately. I have incorporated Dattamsha Techno Solutions Pvt Ltd (OPC) and have lately been blogging at http://www.dattamsha.com/blog/.
The new site is a work in progress, but I will be blogging there on the latest happenings around Big Data and offering various services around the same. There are multiple ways to keep up with the posts on the new site:

- Subscribe to the blog via email
- Follow on Twitter / Facebook
- Use any RSS aggregator, like Feedly, on the feed
- or simply visit the site on a regular basis :)

For the curious, I am using WordPress hosted on Bluehost. Blogger was low-hanging fruit to get started with blogging, but it hasn't seen much development lately. WordPress has tons of plugins to start with, and since it is self-hosted and not a service, it can be customized to the maximum extent.

See you @ Dattamsha.

Thursday, May 22, 2014

Pig as a Service: Hadoop challenges data warehouses

Thanks to Gil Allouche (Qubole's VP of Marketing) for this post.

Hadoop and its ecosystem have evolved from a narrow MapReduce architecture into a universal data platform set to dominate the data processing landscape in the future. Importantly, the push to simplify Hadoop deployments with managed cloud services, known as Hadoop-as-a-Service, is increasing Hadoop’s appeal for new data projects and architectures. Naturally, the development is permeating the Hadoop ecosystem in the shape of Pig-as-a-Service offerings, for example.

Pig, developed by Yahoo! Research in 2006, enables programmers to write data transformation programs for Hadoop quickly and easily, without the cost and complexity of hand-written map-reduce programs. Consequently, ETL (Extract, Transform, Load), the core workload of DWH (data warehouse) solutions, is often realized with Pig in the Hadoop environment. The business case for Hadoop and Pig as a Service is very compelling from both financial and technical perspectives.

Hadoop is becoming data’s Swiss Army knife
The news on Hadoop last year was dominated by SQL (Structured Query Language) on Hadoop, with Hive, Presto, Impala, Drill, and countless other flavours competing to make big data accessible to business users. Most of these solutions are supported directly by Hadoop distributors, e.g. Hortonworks, MapR and Cloudera, and by cloud service providers, e.g. Amazon and Qubole.

The push for development in the area is driven by the vision for Hadoop to become the data platform of the future. The release of Hadoop 2.0 with YARN (Yet Another Resource Negotiator) last year was an important step. It turned the core of Hadoop’s processing architecture from a map-reduce centric solution into a generic cluster resource management tool able to run any kind of algorithm and application. Hadoop solution providers are now racing to capture the market for multipurpose, any-size data processing. SQL on Hadoop is only one of the stepping-stones to this goal.

Friday, May 16, 2014

User recommendations using Hadoop, Flume, HBase and Log4J - Part 2

Thanks to Srinivas Kummarapu for this post on how to show the appropriate recommendations to a web user based on the user activity in the past.

In the previous blog we saw how to Flume the user activities into the Hadoop cluster. On top of these user activities, some analysis can be done to figure out what a particular user is interested in.

For example, if a user browses mobiles on a shopping site and ends up buying none, we have all of his activities in the Hadoop cluster, on which analysis can be done to figure out what type of phones that particular user is interested in. Those phones can then be recommended when the user visits the site again.
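As a rough illustration of this interest analysis (the user IDs and phone names here are made up), the simplest version is just counting view events per phone model for each user and keeping the top few as recommendation candidates:

```python
from collections import Counter

def top_interests(activities, n=3):
    """Given (user, phone_model) view events, return the n most viewed
    phone models per user as recommendation candidates."""
    per_user = {}
    for user, phone in activities:
        per_user.setdefault(user, Counter())[phone] += 1
    return {user: [phone for phone, _ in counts.most_common(n)]
            for user, counts in per_user.items()}

# Hypothetical activity records flumed into the cluster.
activities = [
    ("u1", "Nokia Lumia"), ("u1", "Nokia Lumia"),
    ("u1", "Galaxy S4"), ("u2", "Moto G"),
]
print(top_interests(activities))
# -> {'u1': ['Nokia Lumia', 'Galaxy S4'], 'u2': ['Moto G']}
```

In the real pipeline the same counting would run as a MapReduce (or Pig) job over the HBase activity table rather than over an in-memory list.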

The user activities in HBase consist of only the mobile name, with no further details. More details about each mobile phone can be maintained in an RDBMS. We need to join the RDBMS data (mobile details) with the HBase data and send the result to the Recommendations table of the RDBMS in order to make recommendations to the user.

Here we have two options to perform the join:

1) Send the result of the Hadoop cluster to RDBMS and do Joins there.
2) Get the RDBMS data into HBase to perform join in parallel distributed fashion.
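Option 2 essentially amounts to a replicated (map-side) join: once a copy of the smaller mobile-details table sits alongside the activity data, each activity record can be enriched with a plain lookup, and every node can do this independently. A minimal sketch of the idea, with made-up table contents:

```python
# Mobile details as they might be imported from the RDBMS: name -> attributes.
mobile_details = {
    "Galaxy S4": {"os": "Android", "price": 600},
    "Nokia Lumia": {"os": "Windows Phone", "price": 400},
}

# User activities as stored in HBase: only the mobile name is known.
activities = [("u1", "Galaxy S4"), ("u2", "Nokia Lumia")]

# Map-side join: enrich each activity record with the RDBMS details.
joined = [(user, phone, mobile_details[phone])
          for user, phone in activities if phone in mobile_details]
print(joined)
```

On a real cluster the lookup table would be distributed to each mapper (e.g. via the distributed cache) so the join runs in parallel with no shuffle.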

Both can be done using Sqoop (SQL-to-Hadoop), a tool that runs map-only jobs.
In this article we will see how to Sqoop the RDBMS table into the HBase database in an incremental fashion.
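The core of an incremental import (what Sqoop does with `--incremental append`, `--check-column` and `--last-value`) is to remember the highest value of a check column seen so far and fetch only the newer rows on each run. A hypothetical sketch of that bookkeeping against an in-memory SQLite table standing in for the RDBMS:

```python
import sqlite3

# Stand-in for the RDBMS mobile-details table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mobiles (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO mobiles VALUES (?, ?)",
                 [(1, "Galaxy S4"), (2, "Nokia Lumia")])

def incremental_import(conn, last_value):
    """Fetch only rows added since the previous run (check column: id)."""
    rows = conn.execute(
        "SELECT id, name FROM mobiles WHERE id > ? ORDER BY id",
        (last_value,)).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last

rows, last = incremental_import(conn, 0)          # first run: everything
conn.execute("INSERT INTO mobiles VALUES (3, 'Moto G')")
new_rows, last = incremental_import(conn, last)   # next run: only the new row
print(new_rows)
# -> [(3, 'Moto G')]
```

Sqoop does the same thing at scale, writing the fetched rows into HBase and recording the new last-value for the next scheduled import.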

Friday, May 9, 2014

User recommendations using Hadoop, Flume, HBase and Log4J - Part 1

Thanks to Srinivas Kummarapu for this post on how to show the appropriate recommendations to a web user based on the user activity in the past.

This first article of a four-part series assumes that Hadoop, Flume, HBase and Log4J have already been installed. In this article we will see how to track the user activities and dump them into HDFS and HBase. In the future articles, we will look into some kind of basket analysis on the data in HDFS/HBase and will project the same to the transaction database for recommendations. Also, refer to this article to Flume the data into HDFS.

Friday, May 2, 2014

Looking for guest bloggers at thecloudavenue.com

The first entry on this blog was posted on 28th September, 2011. Initially I started blogging as an experiment, but lately I have been having fun and have come to like it.

Not only has the traffic to the blog been increasing at a very good pace, but I have also been making quite a few acquaintances and getting a lot of nice and interesting opportunities through the blog. I got offers to write a book, to write articles, to blog on other sites, and more.

I am looking for guest bloggers for this blog. If you or someone you know is interested, then please let me know:

a) a bit about yourself (along with your LinkedIn profile)
b) topics you are interested in writing about on this blog
c) references to articles written in the past, if any

I don't want to put a lot of restrictions around this, but here are a few:

a) the article should be authentic
b) no affiliate or promotional links are to be included
c) the article can appear elsewhere after 10 days, with a back link to the original

I am open to any topics around Big Data, but here are some I would be particularly interested in:

a) a use case on how your company/startup is using Big Data
b) using R/Python/Mahout/Weka for some interesting data processing
c) integrating different open source frameworks
d) comparing different open source frameworks with similar functionalities
e) ideas and implementation of pet projects or POC (Proof Of Concepts)
f) best practices and recommendations
g) views/opinions on different open source frameworks

As a bonus, if a blog gets posted here, it will also include a brief introduction about the author and a link to his/her LinkedIn profile. This should give the author good publicity.

If you are a rookie writing for the first time, that shouldn't be a problem. Everything begins with a simple start. Please let me know at info@thecloudavenue.com if you are interested in blogging here.

Tuesday, April 29, 2014

Wanted interns to work on Big Data Technologies

We are looking for interns to work with us on some of the Big Data technologies at Hyderabad, India. The pay will be appropriate. The intern should preferably be from a Computer Science background, be really passionate about learning new technologies and be ready to stretch a bit. Under our guidance, the intern would do everything from installing/configuring/tuning the Linux OS all the way to setting up Hadoop and the related Big Data frameworks on the cluster. Once the Hadoop cluster has been set up, we have a couple of ideas which we would implement on the same cluster.
The immediate advantage is that the intern would be working on one of the current hot technologies and would have direct access to us to learn more about Big Data. Based on the requirements, appropriate training would also be given around Big Data. In addition, the work done by the intern will definitely help them get through the different Cloudera certifications.

BTW, we are looking for someone who can work with us full time, not part time. If you or anyone you know is interested in an internship, please send an email with your CV to info@thecloudavenue.com.

Monday, April 28, 2014

Big Data Webinars

Big Data is a moving target, with new companies, frameworks and features getting introduced all the time. It's getting more and more difficult to keep pace with Big Data. There is nothing more relaxing than sitting in a chair and watching webinars on some of the latest technologies.
So, here is a calendar (XML, ICAL, HTML) with a few of the upcoming webinars around Big Data. I will populate more events in the calendar as they get planned. Those interested can import the calendar into Thunderbird, Outlook or some other calendar application and keep up to date with the webinars in the Big Data space.

If you are interested in including any webinar around Big Data in the calendar, then let me know at info@thecloudavenue.com.
http://www.thecloudavenue.com/p/big-data-webinars.html