Monday, September 10, 2018

Where is Big Data heading?

In the early days of Hadoop, MapReduce was the only supported processing framework. Hadoop was later extended with YARN (a kind of Operating System for Big Data) to support Apache Spark and other engines. YARN also improved the resource utilization of the cluster. YARN was developed by HortonWorks and later contributed to the Apache Software Foundation. Other Big Data vendors such as Cloudera and MapR gradually adopted it and made improvements to it. YARN was an important turning point in Big Data.

Along the same lines, another major change is happening in the Big Data space around Containerization, Orchestration and the separation of storage from compute. HortonWorks published a blog post on this and calls it the Open Hybrid Architecture Initiative. There is also a nice article from ZDNet on the same topic.

The blog post from HortonWorks goes into a lot of detail, but the crux, as stated there, is as follows:

Phase 1: Containerization of HDP and HDF workloads with DPS driving the new interaction model for orchestrating workloads by programmatic spin-up/down of workload-specific clusters (different versions of Hive, Spark, NiFi, etc.) for users and workflows.

Phase 2: Separation of storage and compute by adopting scalable file-system and object-store interfaces via the Apache Hadoop HDFS Ozone project.

Phase 3: Containerization for portability of big data services, leveraging technologies such as Kubernetes for containerized HDP and HDF. Red Hat and IBM partner with us on this journey to accelerate containerized big data workloads for hybrid. As part of this phase, we will certify HDP, HDF and DPS as Red Hat Certified Containers on RedHat OpenShift, an industry-leading enterprise container and Kubernetes application platform. This allows customers to more easily adopt a hybrid architecture for big data applications and analytics, all with the common and trusted security, data governance and operations that enterprises require.
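
To make the first two phases a little more concrete, here is a toy Python sketch of the idea: short-lived, workload-specific clusters that are spun up and down programmatically, while the data lives in a separate storage layer that outlives them. This is only an illustration under my own assumptions. `SharedStore`, `EphemeralCluster` and `run_pipeline` are hypothetical names, the engine versions are arbitrary, and none of this is the DPS, Ozone or Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for a storage layer that outlives any compute cluster."""
    objects: dict = field(default_factory=dict)


@dataclass
class EphemeralCluster:
    """Hypothetical workload-specific cluster: an engine, a version, a handle to storage."""
    engine: str
    version: str
    store: SharedStore
    running: bool = False

    def __enter__(self):
        self.running = True   # "spin-up"
        return self

    def __exit__(self, exc_type, exc, tb):
        self.running = False  # "spin-down"; the data in the store is left untouched
        return False

    def write(self, key, value):
        if not self.running:
            raise RuntimeError("cluster is not running")
        self.store.objects[key] = value

    def read(self, key):
        if not self.running:
            raise RuntimeError("cluster is not running")
        return self.store.objects[key]


def run_pipeline():
    """Run two steps on two different short-lived clusters that share one store."""
    store = SharedStore()

    # Step 1: a cluster for one engine writes its result, then goes away.
    with EphemeralCluster("spark", "2.3", store) as etl:
        etl.write("daily_totals", [3, 5, 8])

    # Step 2: a different engine and version reads the same data later.
    with EphemeralCluster("hive", "3.1", store) as report:
        total = sum(report.read("daily_totals"))

    return total, etl.running, report.running
```

The point of the sketch is that the second cluster can read what the first one wrote even though the first cluster no longer exists, which is only possible because storage is not tied to the lifetime of the compute.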

My Opinion

The different Cloud vendors have been offering Big Data as a service for quite some time. Athena, EMR, Redshift and Kinesis are a few of the services from AWS. There are similar offerings from Google Cloud, Microsoft Azure and other Cloud vendors. All these services are native to the Cloud (built for the Cloud) and provide tight integration with the other services from the same Cloud vendor.

In the case of Cloudera, MapR and HortonWorks, the Big Data platforms were not designed with the Cloud in mind from the beginning; the platforms were later plugged or force-fitted into the Cloud. The Open Hybrid Architecture Initiative is an effort by HortonWorks to make its Big Data platform more Cloud native. The image below from the ZDNet article says it all.

It's a long shot that all the phases will be designed and developed and that customers will then move to the result. Still, the vision gives an idea of where Big Data is heading.

Two of the three phases involve Kubernetes and Containers. As mentioned in the previous few blog posts, the way applications are built is changing a lot, and it's extremely important to get comfortable with the technologies around Containers.
