Saturday, October 22, 2011

Hadoop on Windows

As some of you might have read, HortonWorks and Microsoft have partnered to get Hadoop running on Windows. To date, Hadoop has been run only on Linux in production, with both Windows and Linux used for development. In the future, we will also be seeing Hadoop on Windows in production.

- It's not the first time Hadoop and Microsoft have come together. Microsoft acquired the semantic search engine PowerSet, which is now part of the Bing search engine. PowerSet used Hadoop internally. I later read that Microsoft replaced Hadoop with some other software after the acquisition (disclaimer: not 100% sure about it).

- Then there is Dryad (a platform for distributed computing) and DryadLINQ (a high-level abstraction language for distributed computing) from Microsoft. DryadLINQ is tightly integrated with .NET and Windows, and would likely run much more efficiently on Windows than Hadoop on Windows would. I'm not sure whether Microsoft will give enough focus to Hadoop alongside Dryad.

- The Apache Hadoop documentation recommends Oracle JDK 6. Unpatched Apache Hadoop doesn't run on the IBM JDK, the defunct Apache Harmony, or OpenJDK, and now Windows is being added to the mix.

- I am not a performance expert on cross-platform applications, but it might be a challenge to make the same version of Hadoop perform well on both Linux and Windows at the same time.

- The one good thing about all of this is that Microsoft would be contributing the code back to Apache, and there would be more eyes looking at the Hadoop code. Also, Microsoft is having its own employees work on Hadoop rather than outsourcing it.

- Also, as Steve mentioned, there is very little chance that Hadoop on Windows will be deployed for internal use. So someone outside has to step up, deploy Hadoop on a big cluster, and find the bugs.

Considering all these factors, let's wait and see whether there will be more Hadoop on Linux or on Windows.

Edit: I was a bit skeptical about Microsoft's commitment to Hadoop, but it looks like Microsoft is jumping into Hadoop all the way. This is good news for Hadoop.

Edit (13th December, 2011): Microsoft to allow a limited preview of Hadoop on Azure.

Edit (15th December, 2011): The following URL points to WIP documentation for Hadoop on Windows.

Edit (12th January, 2012): Avkash has been passionately blogging about Hadoop on Windows Azure.
