Saturday, December 28, 2013

Getting started with Big Data - Part 1 - Installing VirtualBox on a Windows Machine

Most, if not all, of the Big Data frameworks (1, 2, etc.) are built for Linux first, and only some of them are later ported to Windows as an afterthought. It is not news, but Microsoft has partnered with Hortonworks to make sure that Hadoop and other Big Data frameworks work smoothly on Windows. Microsoft had its own distributed computing platform, Dryad, which it abandoned in favor of Hadoop. Not only is Hadoop being ported to Windows as HDP, but one can also expect tight integration with other Windows services and software.
linux_not_windows by nslookup on flickr
A number of factors decide whether to go for a Big Data/Windows or a Big Data/Linux platform. Even when the target is Big Data/Windows, it makes sense from a cost perspective to use Big Data/Linux for the development environment. This is more or less the same as developing JEE applications on JBoss (OK, OK, WildFly) and then migrating them to proprietary application platforms like IBM WAS. Although developing on Linux machines saves cost, additional effort might be needed to account for the environment differences, few or many, between the Big Data/Windows and Big Data/Linux platforms. Also, any extensions provided on one platform cannot be developed or tested on the other.

As mentioned above, most of the Big Data frameworks are built on Linux and then ported to Windows. So it is really important to first get familiar, and eventually comfortable, with Linux before diving deep into the Big Data space. Most of us have worked on and are comfortable with Windows, but it is less common to see someone who uses Linux on a day-to-day basis, unless that person works in IT or has been forced onto Linux by a mass migration from Windows.

One of the main hindrances to adopting Linux is the fear that installing it might mess up the existing Windows installation. But Linux has come a long way, and installing and using it has never been easier. There are more than 100 flavors of Linux to choose from.

In this blog we will look at how to install VirtualBox on a Windows machine; upcoming blogs will cover how to install Ubuntu and finally Apache Bigtop (which makes it easy to get started with a lot of Big Data frameworks) on top of Ubuntu. VirtualBox and other virtualization software like VMware allow one OS to run on top of another OS, or multiple OSes to run directly on the virtualization software.
- The first step is to download VirtualBox from here and start the installation process. The default options are good enough to complete the installation. Below are the screens from the installation process, which are more or less self-explanatory.

Screen 1

Screen 2

Screen 3

Screen 4

Screen 5

Screen 6

Screen 7
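
Once the installer finishes, one optional way to confirm that VirtualBox is usable is to ask its command-line tool, VBoxManage, for its version. The Python sketch below is only an illustration and not part of the installer. It assumes Python 3.7 or later is available and that VirtualBox went into the default folder `C:\Program Files\Oracle\VirtualBox` (an assumption; adjust `DEFAULT_DIR` if a different folder was chosen). The helper names `find_vboxmanage` and `virtualbox_version` are made up for this example.

```python
import os
import shutil
import subprocess

# Assumed default install folder on Windows; change it if the installer
# was pointed at a different location.
DEFAULT_DIR = r"C:\Program Files\Oracle\VirtualBox"


def find_vboxmanage(extra_dirs=(DEFAULT_DIR,)):
    """Return the path to the VBoxManage executable, or None if not found."""
    # First look on the PATH.
    found = shutil.which("VBoxManage")
    if found:
        return found
    # Then look inside each candidate folder.
    for directory in extra_dirs:
        for name in ("VBoxManage.exe", "VBoxManage"):
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


def virtualbox_version(executable):
    """Run `VBoxManage --version` and return what it prints."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    exe = find_vboxmanage()
    if exe is None:
        print("VBoxManage not found; check the installation folder.")
    else:
        print("VirtualBox version:", virtualbox_version(exe))
```

If the script reports that VBoxManage was not found, the installation folder is probably different from the assumed default. Starting VirtualBox from the Start menu is an equally good check.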

With the installation of VirtualBox complete, in the upcoming blogs we will look into how to install Ubuntu on top of VirtualBox and then finally install Apache Bigtop in Ubuntu.
