Monday, June 24, 2013

Looking for help to redesign my blog

I started this blog close to 2 years ago and it has been getting good traffic. Not just traffic, it has also brought some nice opportunities: I was contacted by a top-notch company for a job and by a couple of book publishers. Initially I started it just for fun, to get a feel for what blogging is about, but lately I have really come to enjoy it.

The blog is hosted on blogger.com, with godaddy.com as the domain registrar. Traffic to www.thecloudavenue.com is redirected to hadoop-tips.blogspot.com.

My main focus has been on writing interesting posts, but I have also been spending a bit of time on making the blog look better. Now I am interested in taking it to the next level. I would like some help making it look even better and more professional, easier to use, and faster to load, while keeping most of the existing features.

If you have experience customizing Blogger templates, please send me an email at praveensripati@gmail.com with samples of templates you have worked on, along with your expectations.

Tuesday, June 18, 2013

Sharing folder between Guest and Host OS using VirtualBox

In the Big Data training we provide, a question that comes up very often is `how to share folders, and hence data, between the host and the guest OS` with the Big Data Virtual Machine (VM).

The first step is to install the VirtualBox Guest Additions in the guest OS (which has to be done only once), and then share the folders. Below are the instructions with Ubuntu as the host/guest OS; the instructions are broadly similar for other operating systems.

1) Start VirtualBox, select the VM, click on `Settings`, select `Storage`, select the `IDE Controller` and click on the plus (+) to `Add CD/DVD Device`.

2) Click on `Choose disk` and point to the `VBoxGuestAdditions.iso` file, which is a CD image for installing the VirtualBox Guest Additions. This file is in `/usr/share/virtualbox` on Ubuntu and in the VirtualBox installation folder on a Windows machine.
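Alternatively, the iso can be attached from the host command line using `VBoxManage`; something along the lines of the below should work (the VM name here is just a placeholder, and on Windows the iso path will be under the VirtualBox installation folder).
# attach the Guest Additions iso to the VM's IDE controller (VM name is a placeholder)
VBoxManage storageattach "BigDataVM" --storagectl "IDE Controller" --port 1 --device 0 --type dvddrive --medium /usr/share/virtualbox/VBoxGuestAdditions.iso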


The iso should be added under the `IDE Controller` as shown below.


3) Start the VM; the iso should get mounted under the `/media` folder as shown below.
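To confirm from a terminal inside the guest, listing `/media` should show the mounted CD; the folder name varies with the VirtualBox version.
# the mounted CD shows up as a VBOXADDITIONS_* folder under /media
ls /media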


4) Install the Linux headers using the below command. Admin privileges are required and the user may be prompted for a password.
sudo apt-get install linux-headers-$(uname -r)
5) Install the Guest Additions using the below command. Make sure that there are no errors.
sudo ./VBoxLinuxAdditions.run
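Note that the `VBoxLinuxAdditions.run` script lives on the mounted CD, so change into that folder first before running the above command; for example (the exact folder name varies with the VirtualBox version):
# change into the mounted Guest Additions CD before running the installer
cd /media/VBOXADDITIONS_*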
6) Close the terminal and shut down the VM.

7) Now it's time to share the folders. Select the VM, click on `Settings`, select `Shared Folders` and add the folder path and folder name as shown below.
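Alternatively, the shared folder can be added from the host command line using `VBoxManage`; the VM name, share name and host path below are just placeholders.
# make a host folder available to the guest (names and path are placeholders)
VBoxManage sharedfolder add "BigDataVM" --name "share" --hostpath /path/on/host --automount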


8) Restart the VM and make sure the user is part of the `vboxsf` group using the below command. Log out and log back in after running it.
sudo usermod -a -G vboxsf <username>
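After logging back in, the group membership can be verified using the below command; `vboxsf` should show up in the output.
groups <username>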
9) The shared folder should appear inside the VM under the `/media` folder as shown below.


Now data can be copied between the host and the guest OS using the terminal, Nautilus or some other tool.
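For example, with auto-mount enabled VirtualBox exposes the share inside the guest as `/media/sf_<foldername>`, so copying from a terminal would look something like the below (the folder and file names are just placeholders).
# host -> guest : copy a file from the shared folder into the home directory
cp /media/sf_share/input.txt ~/
# guest -> host : copy a file into the shared folder
cp ~/results.txt /media/sf_share/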

Thursday, June 13, 2013

Google synonymous with Big Data and more .....

A couple of months back I wrote an article on how Google has been driving the Big Data space. The papers published by Google provide insights into what is next in Big Data, which is a moving target and difficult to keep up with. GigaOM recently published an article articulating the same.


As the size of the data increases, we need machines to process more and more of it using Machine Learning. Google can't classify the tons of emails as spam and non-spam without machines doing it automatically. Google is also using Neural Networks to let users search photos easily. Here is an article on the same.

Talking about Machine Learning, I recently took the Cloudera Data Science Certification, which is still in beta. Cloudera offered the certification for free because I was in the top 5% for the Cloudera Certified Developer for Apache Hadoop (CCDH). I didn't get a chance to prepare for the Data Science Certification; the main purpose of taking it was to get familiar with the pattern of the exam.

I am planning to take the Cloudera Data Science Certification again in the next 2-3 months, and that is what has been keeping me busy, hence the fewer posts on this blog and other activities. As I make progress, I will post some useful tips for getting through the certification.


Sunday, June 9, 2013

How Big Data helps to keep track of us

Technology can be used to make our lives better or to keep tabs on us. This week Verizon has been all over the news for its role in sharing customer call detail records with the NSA. To store and analyze the call details for the entire customer base, a relational database won't fit the picture. The NSA developed and is using Apache Accumulo, which is a key-value (KV) database with cell-level security.

GigaOM published two nice articles (1 and 2) on the NSA using Apache Accumulo to store and process huge amounts of data.