Monday, March 18, 2019

Quickly and easily installing K8S on the local machine

In the previous blog here we saw how to get started with K8S easily with zero installation using Play-With-Kubernetes (PWK). Everything happens on remote machines, so there is nothing to install on the local machine. We can get started with K8S within 5-10 minutes using PWK. The main con of PWK is that the session is available for only 4 hours, and any modifications to the K8S cluster are lost afterwards.

One easy way to use K8S locally is to use Minikube as mentioned here, but it provides a single node cluster and it makes it tough to test the different failure scenarios like a node going down and a few other things.

In this blog we will try to install a multi-node K8S cluster locally on the laptop as mentioned here, so that the changes are persisted across sessions. We should be able to continue from where we left off. K8S-the-hard-way sets up a cluster from scratch, but it takes time and expertise. So, there are tools like kubeadm which abstract the details and make the installation process easier.

With kubeadm there is a sequence of steps to install a multi-node cluster on the laptop, and for those who are new to Linux, it might be a pain. So, I was trying to figure out if the installation process using kubeadm can be automated using Vagrant. I tried for a couple of hours, got stuck and gave up. And then luckily I found a ready-made Vagrantfile from this article, which made the K8S installation process a breeze.

On a side note, a multi-node K8S cluster can be run on the Cloud, but not everyone is comfortable with the Cloud, so here are the steps using VirtualBox and Vagrant on the local machine.

Step 1: Download and install the latest version of VirtualBox and then Vagrant. For the sake of Vagrant, you might have to restart the OS. The installation is as straightforward as installing any other Windows software.

Step 2: Make a folder on the laptop and create a Vagrantfile with the content from here. If required, the amount of memory and the number of CPU cores can be modified in this file.

Step 3: Go to the above created folder and run the command 'vagrant up' from the Command Prompt. It takes a couple of minutes to create Virtual Machines in VirtualBox, download and install the K8S and the required binaries. The end screen will appear as shown below.

And the Virtual Machines (k8s-head, k8s-node-1 and k8s-node-2) will appear as shown below. We are all set with the K8S installation. It's a piece of cake. It has never been this easy to install software.

Step 4: K8S follows a master-slave architecture. Log in to the master using 'vagrant ssh k8s-head' and run the 'kubectl get nodes' command to make sure all the nodes are ready.

Step 5: Now let's create a deployment using the 'kubectl run nginx --image=nginx -r=4' command and make sure it has been deployed using the 'kubectl get deployment' and 'kubectl get pods' commands.

Step 6: Now if we want to destroy the cluster, run the 'vagrant destroy -f' command from the earlier created folder and the Virtual Machines will be shut down and deleted.

Step 7: During the installation if something goes wrong then it will be displayed on the screen and more details will be logged to 'ubuntu-xenial-16.04-cloudimg-console.log' file in the same folder.

As seen above, all it takes is a couple of steps to create a multi-node K8S cluster on the laptop. Now you should be all set up to get started and explore the world of K8S. Further nodes can be added by modifying the Vagrantfile and running the 'vagrant up' command.

In the upcoming blogs, we will try to install additional packages or applications on top of the above K8S cluster and try different things with them.

Note: Joserra in the comments points to the K8S blog on the same here. This blog uses VirtualBox and Vagrant, while the K8S blog uses Ansible to run the commands in the VMs on top of VirtualBox and Vagrant. The end result of both is the same.

Monday, March 11, 2019

Getting started with K8S the easy way using 'Play with Kubernetes'

There are many ways of installing K8S as mentioned here. It can be installed in the Cloud, on-premise and also locally on the laptop using virtualization. But installing K8S has never been easy. In this blog, we will look at one of the easiest ways to get started with K8S using Play with Kubernetes (PWK). With this the whole K8S experience is within the browser and there is nothing to install on the laptop; everything is installed on the remote machine. PWK uses 'Docker in Docker' which is detailed here (1, 2).

Step 1: Go to the PWK site, log in and click on Start. A Docker or a GitHub login is required for this.

Step 2: PWK allows up to 5 nodes or machines. Click on 'ADD NEW INSTANCE' 5 times and this will add 5 instances, from node1 to node5, as shown below. Here we will configure node1 as the master and the remaining as workers.

Clicking on a node in the left pane will give access to the corresponding terminal in the bottom right pane. The combination 'Alt+Enter' will maximize the terminal.

Step 3: Run 'kubeadm config images pull' command on node1. This will pull all the images required for the installation before the actual installation starts in the next step. This is an optional step, but this step makes the installation faster.

Step 4: Init the master on node1 using the 'kubeadm init --apiserver-advertise-address $(hostname -i)' command. The output of the command should be as shown below. Note down the 'kubeadm join .....' command from the output of this command.
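
If the output of 'kubeadm init' has been saved as text, the join command can also be pulled out programmatically instead of being copied by hand. Below is a minimal Python sketch of that idea. The sample output, IP address, token and hash are made-up placeholders, and the helper name extract_join_command is only for illustration.

```python
import re


def extract_join_command(init_output: str) -> str:
    """Return the 'kubeadm join ...' command found in kubeadm init output."""
    # Join backslash-continued lines so a wrapped command becomes one line.
    flattened = re.sub(r"\\\s*\n\s*", " ", init_output)
    match = re.search(r"^\s*(kubeadm join .+)$", flattened, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'kubeadm join' command found in the output")
    # Collapse repeated whitespace left over from the line joining.
    return " ".join(match.group(1).split())


# Made-up sample output; the address, token and hash are placeholders.
SAMPLE_OUTPUT = """\
Your Kubernetes master has initialized successfully!

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.0.13:6443 --token abcdef.0123456789abcdef \\
      --discovery-token-ca-cert-hash sha256:0000000000000000

"""

if __name__ == "__main__":
    print(extract_join_command(SAMPLE_OUTPUT))
```

The extracted line is what would then be run on each worker in Step 6.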

Step 5: Now is the time to deploy the Pod network using the below command on node1.

kubectl apply -n kube-system -f "$(kubectl version | base64 |tr -d '\n')"

Step 6: Execute the 'kubeadm join ......' command, obtained in Step 4, on all the workers (node2, node3, node4 and node5). On each of the nodes, the message 'This node has joined the cluster' will be displayed towards the end of the output.

Step 7: After a few minutes run 'kubectl get nodes' on the master node (node1) and all the nodes should be in the Ready status. This confirms that our 5-node K8S cluster is ready.
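
The Ready check can also be scripted. The sketch below assumes the node list has been captured with 'kubectl get nodes -o json' and looks at the 'Ready' condition of each node. The sample JSON is trimmed and made up, and the function name not_ready_nodes is only for illustration.

```python
import json


def not_ready_nodes(nodes_json: str) -> list:
    """Return the names of nodes whose 'Ready' condition is not 'True'."""
    pending = []
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            pending.append(node["metadata"]["name"])
    return pending


# Trimmed, made-up sample in the shape of `kubectl get nodes -o json`.
SAMPLE_NODES = json.dumps({"items": [
    {"metadata": {"name": "node1"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node2"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]})

if __name__ == "__main__":
    print(not_ready_nodes(SAMPLE_NODES))
```

An empty list means every node has reported Ready.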

Step 8: Let's create a K8S Deployment with 4 replicas of the nginx server by running 'kubectl run nginx --image=nginx -r=4' on the master node (node1). Initially the status of the Containers will be 'ContainerCreating', but in a few seconds it will change to 'Running'.

Step 9: Get the detailed status of the Pods using the 'kubectl get pods -o wide' command. This will show that the Pods are balanced across all the nodes.
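
To see the balance as numbers rather than by reading the table, the Pods can be counted per node. The sketch below assumes the Pod list has been captured with 'kubectl get pods -o json' and groups the Running Pods by their 'spec.nodeName'. The sample data and the Pod names are made up.

```python
import json
from collections import Counter


def pods_per_node(pods_json: str) -> Counter:
    """Count Running pods per node from `kubectl get pods -o json` output."""
    counts = Counter()
    for pod in json.loads(pods_json)["items"]:
        if pod.get("status", {}).get("phase") == "Running":
            # Pods that are not scheduled yet have no nodeName.
            counts[pod["spec"].get("nodeName", "<unscheduled>")] += 1
    return counts


# Trimmed, made-up sample with four nginx pods.
SAMPLE_PODS = json.dumps({"items": [
    {"metadata": {"name": "nginx-a"}, "spec": {"nodeName": "node2"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "nginx-b"}, "spec": {"nodeName": "node3"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "nginx-c"}, "spec": {"nodeName": "node4"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "nginx-d"}, "spec": {"nodeName": "node5"},
     "status": {"phase": "Pending"}},
]})

if __name__ == "__main__":
    for node, count in sorted(pods_per_node(SAMPLE_PODS).items()):
        print(node, count)
```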

The K8S Deployment object maintains a fixed number of Pods. Delete one of the Pods using 'kubectl delete pod NAME-OF-THE-POD'. Notice that the Pod will be deleted and a new Pod is automatically created. This can be observed by running the 'kubectl get pods -o wide' command again. The replacement Pod will have a different name from the deleted one.
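
The behaviour above can be pictured as a loop that keeps comparing the desired number of Pods with the observed number. Below is a toy Python model of that idea. It is not how K8S is implemented, and the class name DeploymentSim and the Pod naming scheme are invented for the illustration.

```python
import itertools


class DeploymentSim:
    """Toy model of a Deployment keeping a fixed number of pods alive."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas          # desired state
        self.pods = []                    # observed state
        self._suffix = itertools.count(1)
        self.reconcile()

    def reconcile(self):
        # Create pods until the observed count reaches the desired count.
        while len(self.pods) < self.replicas:
            self.pods.append(f"{self.name}-{next(self._suffix)}")
        # Drop surplus pods if the desired count was lowered.
        del self.pods[self.replicas:]

    def delete_pod(self, pod_name):
        # Deleting a pod changes the observed state; reconcile restores it.
        self.pods.remove(pod_name)
        self.reconcile()


if __name__ == "__main__":
    nginx = DeploymentSim("nginx", replicas=4)
    nginx.delete_pod("nginx-2")
    print(nginx.pods)
```

After the delete there are again 4 Pods, and the replacement carries a new name, which mirrors what 'kubectl get pods -o wide' shows.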

The K8S session is available for 4 hours, and any resources created or settings changed will be lost after the session, as the changes to the cluster are not persisted. So there are a few disadvantages of using PWK, but the good thing is it's free and requires no installation on the local machine.

In the upcoming blogs, we will try to explore the other ways of installing K8S. Also, check Katacoda. It offers K8S in the browser similar to PWK.

Friday, March 1, 2019

Webinar to know about CKAD and CKA Kubernetes Certifications

Kubernetes is all about orchestrating Microservices. Instead of repeating what it's all about, here is the home page for Kubernetes with more details. CNCF offers CKAD and CKA certifications around Kubernetes. While CKAD is more from a developer's perspective, CKA is from an administration perspective. Of these, CKA is a bit tougher than CKAD. While most certifications are theoretical, the Kubernetes Certifications are practical: a set of tasks has to be completed in a given time on a Kubernetes cluster. So, hands-on practice is pretty much required for the Certifications.

Here is a recorded webinar from CNCF on getting started with the Certifications. I was preparing for the Kubernetes certification, but got sidetracked. I am planning to get back to the Certifications again, and will write a detailed blog on them once I get through.

Paper on Serverless Computing from Berkeley

Cloud Computing moves MOST of the administration from the Cloud consumers to the Cloud providers. No need to think about procuring hardware, networking, cooling, physical security etc. Serverless moves in the same direction, taking away even more administration from the Cloud consumers.

The name `Serverless` is a bit of a misnomer as there are still servers involved. The only thing is that the Cloud consumers need not think in terms of Servers. Take the example of FAAS (Function-As-A-Service). Here is the sequence of steps; nowhere is a SERVER mentioned.

- Write and test a function
- Package the function
- Deploy the package to the Cloud
- Associate an event with the function (to be invoked automatically) or provide an API Gateway (to be invoked programmatically)

There is no mention of a SERVER in the above, hence the name Serverless. The good thing about FAAS is that it scales automatically and there is no need to pay when the function is not invoked, which is not the case with IAAS, PAAS and SAAS. We pay based on the number of function invocations and the amount of resources consumed.
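
The pay-per-use model can be made concrete with a little arithmetic. The sketch below assumes a bill made of two parts, a charge per million invocations and a charge per GB-second of memory held while the function runs. The two prices are placeholders chosen to keep the numbers round; they are not the actual rates of any Cloud provider.

```python
# Placeholder prices, chosen for easy arithmetic (not real provider rates).
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.00002


def monthly_cost(invocations, avg_duration_s, memory_gb,
                 price_per_million=PRICE_PER_MILLION_REQUESTS,
                 price_per_gb_second=PRICE_PER_GB_SECOND):
    """Estimate a FAAS bill from the invocation count and resources consumed."""
    request_cost = invocations / 1_000_000 * price_per_million
    gb_seconds = invocations * avg_duration_s * memory_gb
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    # One million calls of 100 ms each with 0.5 GB of memory.
    print(round(monthly_cost(1_000_000, 0.1, 0.5), 2))
    # No invocations means nothing to pay.
    print(monthly_cost(0, 0.1, 0.5))
```

With zero invocations both parts of the bill are zero, which is the difference from a server that is billed for as long as it is running.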

Serverless has a long way to go, but applications can be built end-to-end without thinking about Servers, hence Serverless. Here is a recent good read, Cloud Programming Simplified: A Berkeley View on Serverless Computing, about the pros, cons, challenges, research areas and finally predictions of Serverless computing.

Also, to get the hang of FAAS, here is a blog I have written using AWS Lambda to trigger a Java function which shrinks an image as soon as it has been uploaded to AWS S3.
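
To give a feel for what such a function looks like, here is a minimal sketch of an S3-triggered handler. It is written in Python rather than the Java used in that blog, and it only reads the bucket and object key from the event; the actual image shrinking is left out. The 'thumbnails/' prefix and the sample event values are assumptions made for the illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the bucket and key of every object named in an S3 event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append({
            "bucket": bucket,
            "key": key,
            # Hypothetical location where a shrunken copy could be written.
            "thumbnail_key": f"thumbnails/{key}",
        })
    return uploaded


# Trimmed, made-up event in the shape S3 sends to Lambda.
SAMPLE_EVENT = {"Records": [
    {"s3": {"bucket": {"name": "my-images"},
            "object": {"key": "photos/my+cat.jpg"}}},
]}

if __name__ == "__main__":
    print(handler(SAMPLE_EVENT, None))
```

Notice that nothing in the function mentions a server; the platform decides where and when it runs.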

Sunday, January 13, 2019

Developing with AWS Workshop - CGC, Landran

Completed a 5-day Workshop "Developing with AWS" for Engineering and MCA Students at CGC, Landran. Nice to see a good bunch of happy students towards the end of the Workshop.

Tuesday, September 18, 2018

Node fault tolerance in K8S & Declarative programming

In K8S, everything is declarative and not imperative. We specify the target state to K8S and it will make sure that the target state is always maintained, even in the case of failures. Basically, we specify what we want (as in the case of SQL) and not how to do it.
In the above scenario, we have 1 master and 2 nodes. We can ask K8S to deploy 6 pods (application instances) onto the nodes and K8S will automatically schedule the pods across the nodes. In case one of the nodes goes down, K8S will automatically reschedule the pods from the failed node to a healthy node. I reiterate, we simply specify the target state (6 pods) and not where to deploy, how to address the failure scenarios etc. Remember: declarative and not imperative.
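
The scenario can be mimicked with a few lines of Python. The sketch below is a toy model only: it places each pod on the node that currently has the fewest pods, which is far simpler than what the real K8S scheduler does. The node names slave1 and slave2 follow the demo, while the pod names are made up.

```python
def schedule(pods, placement):
    """Place each pod on the node that currently holds the fewest pods."""
    for pod in pods:
        # Ties are broken by node name to keep the result predictable.
        target = min(placement, key=lambda node: (len(placement[node]), node))
        placement[target].append(pod)
    return placement


def fail_node(placement, failed):
    """Drop a node and reschedule its pods onto the surviving nodes."""
    orphaned = placement.pop(failed)
    if not placement:
        raise RuntimeError("no healthy node left to host the pods")
    return schedule(orphaned, placement)


if __name__ == "__main__":
    pods = [f"ghost-{i}" for i in range(1, 7)]
    cluster = schedule(pods, {"slave1": [], "slave2": []})
    print({node: len(p) for node, p in cluster.items()})
    cluster = fail_node(cluster, "slave1")
    print({node: len(p) for node, p in cluster.items()})
```

The caller only states the 6 pods it wants; where they land and what happens when slave1 disappears is worked out by the model, which is the declarative idea in miniature.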

For some reason it takes ~6 minutes for the pods to be rescheduled on the healthy nodes, even after the configuration changes mentioned here. Need to look into this a bit more.
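
One possible explanation for the ~6 minutes, assuming the kube-controller-manager defaults of that time were in effect: a node is marked unhealthy after '--node-monitor-grace-period' (40 seconds by default) and its pods are evicted after a further '--pod-eviction-timeout' (5 minutes by default). The small calculation below just adds the two; whether these were the effective values in this cluster is an assumption.

```python
from datetime import timedelta

# Assumed defaults of kube-controller-manager for K8S releases of that time.
NODE_MONITOR_GRACE_PERIOD = timedelta(seconds=40)
POD_EVICTION_TIMEOUT = timedelta(minutes=5)


def time_until_eviction(grace=NODE_MONITOR_GRACE_PERIOD,
                        eviction=POD_EVICTION_TIMEOUT):
    """Rough delay between a node going down and its pods being evicted."""
    return grace + eviction


if __name__ == "__main__":
    print(time_until_eviction())
```

The sum is 5 minutes 40 seconds, close to the delay seen in the demo.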

Here is a video demoing the same in a small cluster. We can notice that when one of the nodes goes down, K8S will automatically reschedule the corresponding pods to a healthy node. We don't need to wake up in the middle of the night to rectify a problem, as long as we have additional resources in case of failures.

Here is the sequence of steps. The same steps can be executed on a K8S cluster on the Cloud or locally on your Laptop. In this scenario, I am running the K8S Cluster on my Laptop. Also, the sequence of steps seems lengthy, but can be automated using Helm, which is a package manager for K8S.

Step 1 : Start the K8S cluster in VirtualBox.

Step 2 : Make sure the cluster is up. Wait for a few minutes for the cluster to be up. Freezing the recording here.
kubectl get nodes

Step 3 : Clean the cluster of all the resources
kubectl delete po,svc,rc,rs,deploy --all

Step 4 : Deploy the Docker ghost image (default replica is 1)
kubectl run ghost --image=ghost

Step 5 : Check the number of pods (should be 1)
kubectl get rs

Step 6 : Check the node in which they are deployed
kubectl get pods -o wide | grep -i running

Step 7 : Scale the application (replicas to 6)
kubectl scale deployment --replicas=6 ghost

Step 8 : Check the number of pods again (should be 6)
kubectl get rs

Step 9 : Check the node in which they are deployed (The K8S scheduler should load balance the pods across slave1 and slave2)
kubectl get pods -o wide | grep -i running

Step 10 : ssh to one of the slaves and bring down the node
sudo init 0

Step 11 : Wait for a few minutes (default ~6min). Freezing the recording here.

Step 12 : Check if the pods are deployed to healthy node
kubectl get pods -o wide | grep -i running

Hurray!!! The pods have been automatically deployed on a healthy node.
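
The kubectl steps above can also be strung together in a small script. The sketch below uses plain Python instead of Helm, so it is a different technique from the one mentioned earlier. It defaults to a dry run that only prints the commands, and the '| grep -i running' filters are left out because each command is run without a shell.

```python
import shlex
import subprocess

# The kubectl commands from the steps above, in order.
STEPS = [
    ("Clean the cluster", "kubectl delete po,svc,rc,rs,deploy --all"),
    ("Deploy ghost", "kubectl run ghost --image=ghost"),
    ("Check the replica set", "kubectl get rs"),
    ("Scale to 6 replicas", "kubectl scale deployment --replicas=6 ghost"),
    ("Check pod placement", "kubectl get pods -o wide"),
]


def run_steps(steps, dry_run=True):
    """Print each step and, unless dry_run is set, execute it with kubectl."""
    executed = []
    for description, command in steps:
        args = shlex.split(command)
        print(f"{description}: {command}")
        if not dry_run:
            # check=True stops the script at the first failing command.
            subprocess.run(args, check=True)
        executed.append(args)
    return executed


if __name__ == "__main__":
    run_steps(STEPS)  # dry run; pass dry_run=False on a machine with kubectl
```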

Additional steps (not required for this scenario)

Step 1 : Expose the pod as a service
kubectl expose deployment ghost --port=2368 --type=NodePort

Step 2 : Get the port of the service
kubectl get services ghost

Step 3 : Access the webpage using the above port
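
For a NodePort service, the PORT(S) column of 'kubectl get services' shows the service port and the node port together, for example '2368:31234/TCP'. The sketch below picks the node port out of such a value and builds the URL to open. The port 31234 and the node IP are made-up examples.

```python
def node_port(ports_column: str) -> int:
    """Extract the NodePort from a PORT(S) value such as '2368:31234/TCP'."""
    mapping = ports_column.split(",")[0]          # first port mapping only
    ports, _, _protocol = mapping.partition("/")  # drop the protocol
    _service_port, sep, node = ports.partition(":")
    if not sep:
        raise ValueError(f"no NodePort in {ports_column!r}")
    return int(node)


def service_url(node_ip: str, ports_column: str) -> str:
    """Build the URL for reaching the service through any node's IP."""
    return f"http://{node_ip}:{node_port(ports_column)}"


if __name__ == "__main__":
    # Hypothetical node IP and port mapping.
    print(service_url("192.168.56.101", "2368:31234/TCP"))
```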

In the upcoming blogs, I will try to explain a few more features of K8S using demos. Keep looking !!!

Saturday, September 15, 2018

K8S Cluster on Laptop

Why K8S Cluster on Laptop?

A few years back I wrote a blog on setting up a Big Data Cluster on the laptop. This time it's about setting up a K8S Cluster on the laptop. There are a few Zero-Installation K8S setups which can be run in the browser, like Katacoda Kubernetes and Play with Kubernetes, and K8S can also be run in the Cloud (AWS EKS, Google GKE and Azure AKS). So, why install a K8S Cluster on the Laptop? Here are a few reasons I can think of.
  • It's absolutely free
  • Will get comfortable with the K8S administration concepts
  • Will know what happens behind the scenes to some extent
  • Above mentioned Katacoda and Play with Kubernetes were slow
  • Finally, because we can :)

More details

As mentioned in the K8S documentation there are tons of options for installing it. I used a tool called kubeadm which is part of the K8S project. The official documentation for kubeadm is good, but it's a bit too generic with a lot of options and also too lengthy. I found another piece of documentation to be good and to the point. There are a few things missing in that documentation, but it's good to get started.

I will be writing a detailed article on the setup procedure, but here are a few highlights for anyone to get started.

  • Used Oracle VirtualBox to set up three VMs and installed the master on one of them and the slaves on the other two.

  • Used a laptop with the below configuration. It has an HDD; an SSD would have saved a lot more time during the installation process and also during the K8S Cluster boot process (< 2 minutes on HDD).
  • Even after the K8S Cluster was started, the Laptop was still responsive. Below is the System Monitor after starting the K8S Cluster.
  • Below are the kubectl commands to get the list of nodes and services, and also to invoke the service.

Final thoughts

There are two K8S Certifications from CNCF: the Certified Kubernetes Application Developer (CKAD) Program and the Certified Kubernetes Administrator (CKA) Program. The CKAD Certification was started recently and is much easier than the CKA Certification.

The practice for the CKAD Certification can be done in Minikube, which was discussed in the earlier blogs (Linux and Windows). But the CKA Certification requires setting up a Cluster with different configurations and troubleshooting it, so setting up a Cluster is required.

Installation using kubeadm was easy; it automates the entire installation process. Installing from scratch would definitely be interesting (here and here), as we will get to know what happens behind the scenes.

It took a couple of hours to set up a K8S Cluster. Most of the time was spent on installing the Guest OS, cloning it, fine tuning to make sure K8S runs on a Laptop etc. The actual installation and basic testing of the K8S Cluster took less than 10 minutes.

In the upcoming blog, we will look at setting up a K8S Cluster on Laptop. Keep looking !!!

Monday, September 10, 2018

Where is Big Data heading?

During the initial days of Hadoop only MapReduce was the supported software and later Hadoop was extended with YARN (kind of an Operating System for Big Data) to support Apache Spark and others. YARN also increased the resource utilization of the cluster. YARN was developed by HortonWorks and later on contributed to the Apache Software Foundation. Other Big Data Vendors like Cloudera, MapR slowly started adopting it and making improvements to it. YARN was an important turn in Big Data.

Along the same lines there is another major change happening in the Big Data space around Containerization, Orchestration and separating the storage and the compute parts. HortonWorks published a blog on the same and calls it the Open Hybrid Architecture Initiative. There is a nice article from ZDNet on the same.

The blog from HortonWorks is full of detail, but the crux as mentioned in the blog is as below:

Phase 1: Containerization of HDP and HDF workloads with DPS driving the new interaction model for orchestrating workloads by programmatic spin-up/down of workload-specific clusters (different versions of Hive, Spark, NiFi, etc.) for users and workflows.

Phase 2: Separation of storage and compute by adopting scalable file-system and object-store interfaces via the Apache Hadoop HDFS Ozone project.

Phase 3: Containerization for portability of big data services, leveraging technologies such as Kubernetes for containerized HDP and HDF. Red Hat and IBM partner with us on this journey to accelerate containerized big data workloads for hybrid. As part of this phase, we will certify HDP, HDF and DPS as Red Hat Certified Containers on RedHat OpenShift, an industry-leading enterprise container and Kubernetes application platform. This allows customers to more easily adopt a hybrid architecture for big data applications and analytics, all with the common and trusted security, data governance and operations that enterprises require.

My Opinion

The different Cloud Vendors have been offering Big Data as a service for quite some time. Athena, EMR, RedShift and Kinesis are a few of the services from AWS. There are similar offerings from Google Cloud, Microsoft Azure and other Cloud vendors also. All these services are native to the Cloud (built for the Cloud) and provide tight integration with the other services from the Cloud vendor.

In the case of Cloudera, MapR and HortonWorks, the Big Data platforms were not designed with the Cloud in mind from the beginning, and later the platforms were plugged or force-fitted into the Cloud. The Open Hybrid Architecture Initiative is an initiative by HortonWorks to make their Big Data platform more and more Cloud native. The below image from the ZDNet article says it all.
It's a long shot that the different phases get designed and developed and that the customers move to them. But the vision gives an idea of where Big Data is heading.

Two of the three phases involve Kubernetes and Containers. As mentioned in the previous few blogs, the way applications are being built is changing a lot and it's extremely important to get comfortable with the technologies around Containers.

Abstraction in the AWS Cloud, in fact any Cloud

Sharing of responsibility and abstraction in Cloud

One of the main advantages of the Cloud is the sharing of responsibilities between the Cloud Vendor and the Consumer of the Cloud. This way the Consumer of the Cloud needs to worry less about the routine tasks and can think more about the application business logic. Look here for more on the AWS Shared Responsibility Model.

EC2 (Virtual Server in the Cloud) was one of the oldest services introduced by AWS. With EC2 there is less responsibility on AWS and more on the Consumer. As AWS became more mature and more services were introduced, the responsibility has been slowly shifting from the Consumers towards AWS. AWS has also been ABSTRACTING more and more of the different aspects of technology from the Customer.

When we deploy an application on EC2, we need to think about
  • Number of servers
  • Size of each server
  • Patching the server
  • Scaling the server up and down
  • Load balancing and more

On the other end of the spectrum with Lambda, we simply create a function and upload it to AWS. The above concerns and a lot more are taken care of by AWS automatically for us. With Lambda we don't need to think about the number of EC2 instances, the size of each EC2 instance and a lot of other things.

While driving a regular car we don't need to worry about how the internal combustion of an engine works. A car provides us with an abstraction using a steering wheel, brakes, clutch etc. But, it's better to know what happens under the hood, just in case the car stops in the middle of nowhere. The same is the case with the AWS services. The new autonomous cars provide an even higher level of abstraction: we just need to specify the destination and the rest will be taken care of. Similar is the progress in the different AWS services and in fact any of the Cloud services.

Recently I read an article in AWS detailing the above abstractions and responsibilities here. It's a good read introducing the different AWS Services at a very high level.

Abstraction is good, but it comes at the cost of less flexibility. Abstraction hides a lot of underlying details. With Lambda we simply upload a function and don't really care on which machine it runs, nor do we have a choice on what type of hardware we want to run it on. So, we won't be able to do a Machine Learning inference using a GPU in a Lambda function as it requires access to the underlying GPU hardware which Lambda doesn't provide.


In the above diagram with different AWS Services, as we move from left to right the flexibility of the services decreases. This is the dimension I would like to add to the original discussion in the AWS article.

The Bare Metal on the extreme left is very flexible but with a lot of responsibility on the Customer; on the other extreme the Lambda function is less flexible but with less responsibility on the Customer. Depending on the requirement, budget and a lot of other factors the appropriate AWS service can be picked.

We have Lambda, which is a type of FAAS, as the highest level of abstraction. I was wondering what the next abstraction on top of Lambda/FAAS would be. Any clue?

Thursday, September 6, 2018

How to run Windows Containers? Not using Minikube !!!

How we ran Minikube?

In the previous blog we looked at installing Minikube on Windows and also on Linux. In both the cases, the software stack is the same except replacing the Linux OS with the Windows OS (highlighted above). The Container ultimately runs on Linux OS in both the cases, so only Linux Containers and not the Windows Containers can be run in case of Minikube.

How do we run a Windows Container then?

For this we have to install Docker for Windows (DFW). Instructions for installing are here. A prerequisite for DFW is support for Hyper-V, which is not available in Windows Home Edition, so an upgrade to a Pro edition is needed. In DFW, K8S can be enabled as mentioned here.
There are two types of Containers in the Windows world. A Windows Container runs directly on Windows and shares the host kernel with other Containers. The other type is the Hyper-V Container, which has one Windows Kernel per Container. Both the types of Containers are highlighted in the above diagram and are detailed here.

The same Docker Windows image can be run as both a Windows Container and a Hyper-V Container, but the Hyper-V Container provides extra isolation. The Hyper-V Container is as good as a Hyper-V Virtual Machine, but uses a lightweight and tweaked Windows Kernel. Microsoft documentation recommends using Windows Containers for stateless and Hyper-V Containers for stateful applications.

As seen in the above diagram the Windows Container runs directly on top of the Windows Pro OS and doesn't use any Virtualization, but Hyper-V is a prerequisite for installing Docker for Windows, not sure why. If I get to know I will update the blog accordingly.


In this blog, we looked at a very high level at running Windows Containers. Currently, I have Windows Home and Ubuntu as a dual boot setup. Since I don't have Windows Pro with Hyper-V enabled, I am not able to install Docker for Windows. I will get Windows upgraded to Pro and will write a blog on installing and using Docker for Windows. Keep looking !!!

On a side note, I was thinking about setting up an entire K8S cluster on Windows and it looks like for now it is not possible. The K8S documentation mentions that the K8S control plane (aka master components) has to be installed on a Linux machine. But Windows-based worker nodes can join the K8S cluster. Maybe down the line, running an entire K8S cluster on Windows will be supported.

Note : Finally I was able to upgrade my Windows Home to Professional (here),  enable Hyper-V (here) and installed Docker for Windows (here).