Big Data and Cloud Tips: March 2019

Tuesday, March 26, 2019

K8S Support for Windows Containers

K8S has two components, the Master and the Worker. The Containers wrapped in Pods are executed on the Worker nodes, while the Master node schedules the Pods across the Worker nodes based on the resource availability and a couple of other factors like Taints and Tolerations. For K8S, the nodes in the Cluster combined is all like one big NODE or machine.

Till date both the Master and the Worker nodes in K8S used to run on different flavors of Linux and it was not possible to run Windows based workloads on K8S. Even in the Azure Cloud it was all Linux VMs for the Master and the Worker nodes.

With the new K8S 1.14 release support for Windows based Worker nodes has moved from Beta to GA, still the Master node is all on Linux only. Now it's possible to run/schedule Windows Containers through K8S on the Windows Worker nodes and not just through Docker Swarm.

While this is the most significant feature, along with this there are a couple of other features are introduced in K8S 1.14 release as mentioned here. There is a plan to blog about this and other new K8S features in the upcoming blogs in K8S official blog here. So, keep following the blog to get to know the latest around K8S.

This is a big thing as it enables Windows Containers to be scheduled in a K8S Cluster to take advantage of .NET runtime and the ecosystem Windows has to offer. And Microsoft gets a chance to push for more Windows based workloads on K8S. It surprises me that it took 4 years for adding Windows Container support to K8S, since it was released in 2015.

How is the contribution to K8S across companies?

The good thing about K8S is that almost all of the big companies are rallying around it, instead of providing their own alternatives. Here you can get the number of contributors to K8S grouped by companies for last month. Metrics for other data ranges can also be seen. There are also couple of other pre-built dashboard around a lot of other metrics there. Obviously Google has the most number of contributors followed by RedHat and VMware.

The visualization is provided by Grafana. The home page (https://devstats.cncf.io/) provides metrics for other CNCF projects (including Graduated, Incubating and Sandbox).

Monday, March 25, 2019

Automating Canary deployment with Jenkins, Spinnaker and Prometheus

In one of the previous blog, we looked at Canary deployment and how it can be done with Istio (service mesh). Initially a small percent of the traffic (may be 1%) is sent to the new version and the remaining (99%) is sent to the old version. As performance metrics are met, errors are within limit for the new version then the amount of traffic to the new version is increased incrementally till all the traffic goes to the new version.

We have seen how to do the same manually (again from the previous blog), but the whole thing can be automated using a CI tool (Jenkins etc), CD tool (Spinnaker etc) and a monitoring tool (Prometheus etc).

Jenkins workflow gets triggered for changes to Git and a Docker image is built and pushed to image registry.
Docker webooks triggers a Spinnaker pipeline to start a Canary Deployment of the new image to a K8S environment.
In the Spinnaker pipeline, the metrics from Prometheus are used to scale up or scale down the amount of traffic to the new version of the service using Canary Deployment in an incremental fashion.

Courtesy here

Again there is an interesting article and demo from Kublr around the above flow (excluding the Jenkins part). In the demo the image is built manually and pushed to Docker, which triggers the rest of the workflow. The only thing missing from the demo is the installation, configuration and integration of the different softwares. The setup has already been done and workflow is triggered and the execution shown. It's worth to see the demo and understand how the different pieces fit together.

It's all interesting to see and know how the different pieces fit together in the Microservices world. In the upcoming blogs we will see the steps to setup such an environment from scratch or by using one of the K8S management platforms.

In the case of BigData there were vendors like Cloudera, HortonWorks, MapR who were integrating the different softwares like Hadoop, Hive, Pig, HBase, Cassandra etc and make sure they work nice together. I guess in the K8S space also there will be such vendors. If yes, please update in the comments below.

Sunday, March 24, 2019

Getting started with Canary Deployment with Istio on K8S

K8S and Istio aren't that easy and it took me some time to figure it out and get started with Istio and setup a few working examples. So, this post is all about getting anyone interested in Istio to get started quickly and easily with 'Canary Deployment with Istio'

What is Canary Deployment?

Canary Deployment is a deployment technique in which a new version of the software is incrementally introduced to a subset of users. In the below figure, the new version V2 of the service is rolled out to only 1% of users and the rest are still using the old version V1 of the service. This reduces the risk of rolling out a faulty version to all the users.

The different characteristics like performance of V2, user feed back etc are observed and incrementally more and more users are exposed to the new version V2, till the 100% of the users are exposed to V2. Any problem with the version V2, the users can be rolled back to V1. More about Canary Deployment here. It's an old article, but the concepts are all the same.

Why Canary Deployment with Istio on K8S?

Canary Deployment can be done with plain K8S as well with Istio on top of K8S. With plain K8S, there are a few disadvantages. The ratio of users exposed to the service is proportional to the number of Pods of that service. Lets say we want 99% of the users to V1 and 1% to V2, then there should be a minimum 99 Pods of V1 and 1 Pod of V2, irrespective of the amount of traffic to the Pods. This is because the KubeProxy uses round-robin across different pods.

This is not the case with Istio, where we can specify the traffic split across different versions and this is independent of the number of the Pods of each version. Again, lets say we want 99% of the users to V1 and 1% to V2, then in the case of Istio at a minimum we need 1 Pod of V1 and 1 Pod V2. This happens because of the Istio component VirtualService.

How to get started with Canary Deployment on Istio?

Once Istio has been properly installed on top of K8S, the below components have to be created with the proper configuration to get started with Canary Deployment.

K8S Components

Deployment
Service

Istio Components

Gateway
VirtualService
DestinationRule

Instead of repeating the content here, I would be pointing to some resources which I think are really good to get started with Canary Deployment on Istio. The below are the links to Istio Official Documentation on Canary Deployment. It has got code snippets, but it's not complete and cannot be used As-Is.

1) Istio Official Documentation on Canary Deployment

2) Istio Official blog entry on Canary Deployment

The below blogs have a good explanation around Canary Deployment with working code snippets. I would recommend to get start with the below. If you have access to an existing K8S cluster then the steps of installing a K8S cluster using Kubler tool can be skipped.

3) Kublr Blog with code for Canary Deployment (Weight Based)

4) Kublr Blog with code for Canary Deployment (Intelligent Routing)

When creating a Gateway as mentioned in the above Kublr blogs on the Cloud then a Load Balancer will be created and the webpage can be accessed by using the URL of the Load Balancer. But, when creating a Gateway on a local machine or in a non Cloud environment a Load Balancer is not created.

In this case, port forwarding has to be used from any port (1235) in the below command to the istio-ingressgateway pod. Then the webpage can be accessed externally by using the IP address of the master on port 1235. Note that in the below command the pod name istio-ingressgateway-6c756547b-qz2tx has to be modified.

kubectl port-forward -n=istio-system --address 0.0.0.0 pod/istio-ingressgateway-6c756547b-qz2tx 1235:80

In the upcoming blogs, we will explore the other features of Istio like Injecting Faults, Timeouts and Retires.

Friday, March 22, 2019

Tips and Tricks for using K8S faster and easier

In this blog I will list down the Tips and Tricks around usage of K8S making it easier and faster to use. Some of these are extremely useful during the Certifications also. Usually I add them to my .bashrc file.

I will try to continuously update this blog as I get across many more. Also, if you come across any additional tips, let me know in the comments and I will add them here.

1) Create a shortcut for kubectl, this will save a few key strokes.

alias k='kubectl'

2) Adding this to .bashrc makes the auto completion work with the alias 'k' mentioned above. This makes it faster to complete the commands.

source <(kubectl completion bash | sed s/kubectl/k/g)

3) This deletes the namespace and creates it again. This makes sure that all the objects in the namespace are cleanup. This makes K8S faster as things the objects don't get piled over time.

alias kc='k delete namespaces my-namespace;k create namespace my-namespace'

Once this is done, the default namespace has to be changed. Note to change the current context (kubernetes-admin@kubernetes) in the second command based on the output of the first command.

kubectl config current-context
kubectl config set-context kubernetes-admin@kubernetes --namespace=my-namespace

4) The watch command is used a lot in K8S to observe the the different objects getting created and destroyed (object life cycle). When we run 'watch k get pods' the 'k' alias doesn't get expanded unless the below is included in the .bashrc.

alias watch='watch '

5) A bunch of them are mentioned here at

-- 'Pimp my Kubernetes Shell'.
-- Boosting your kubectl productivity

6) YAML is better than XML in verbosity. But, typing YAML is still a pain. The below commands will create the sample YAML files that can be modified later on as per our requirement. Note that the actual K8S objects are not created, just the YAML files as the --dry-run option is used.

k run nginx --image=nginx --restart=Never --dry-run -o yaml > pod.yaml
k run nginx --image=nginx -r=2 --generator=run/v1 --dry-run -o yaml > rc.yaml
k run nginx --image=nginx -r=2 --dry-run -o yaml > deployment.yaml
k expose deployment nginx --target-port=80 --port=8080 --type=NodePort --dry-run -o yaml > service.yaml

That's it for now. Have fun with K8S.

7) With tmux multiple terminal sessions can be accessed simultaneously in a single window. Usually I split into multiple sessions based on the task at hand. Tmux is a bit tricky to get started with, but very easy to get addicted.

Here the window has been split into two sessions. On the right is the help with the examples and on the left is the prompt to try out the commands. No need to toggle across different screens.

Here on the top-right is the 'watch k get pods' and bottom-right is the 'watch k get deployments' command. And on the left is the prompt to try out the commands. In this case, we can notice the pods and deployments getting created and destroyed. The life cycle of the K8S objects.

Add the below to .bashrc to get the tmux start automatically in the terminal.

#Start tmux autmatically
if command -v tmux>/dev/null; then
  [[ ! $TERM =~ screen ]] && [ -z $TMUX ] && exec tmux
fi

Exposing a K8S Service of Type LoadBalancer on Local Machine

As we had been exploring in this blog, there are different ways of installing and using K8S. It can be in the Cloud, In-Premise and also on your Laptop. For the sake of trying different things I prefer installing it on the Laptop as I would have complete control over it (building and breaking) and also also that I can work offline. And there is no cost associated with it. But, there are a few disadvantages of it, like not being able to integrate with the different Cloud services. We will look at one of the disadvantage of installing K8S on the Laptop and a workaround for the same.

When we create Pods then an IP address is assigned to them and when a Pod goes down and a new one is created by the Deployment automatically, then the Pod might be assigned a different IP address. So, when the IP address is not static then how do we access them? This is where a K8S Service comes into play. More about the K8S Service and the different ways they can be exposed here, here, here and here.

Based on how we want to expose a Service depending on the access type, Service can be exposed using Type as a ClusterIP, NodePort or a LoadBalancer as mentioned here. When a Service of Type LoadBalancer is created, then K8S will automatically creates a Cloud vendor specific Load Balancer. This is good in the Cloud, but what happens when we create a Service of Type LoadBalancer in a non-Cloud environment as in the case of Laptop where a Load Balancer can't be provisioned automatically. This can be addressed using MetalLB which is described as a 'bare-metal load-balancer for K8S'.

So, lets get into the sequence of steps to get around this problem using Metallb.

Step 1: Create a file nginxlb.yaml with the below content. This basically creates a Deployment and a Service for nginx.

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1
        ports:
        - name: http
          containerPort: 80

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

Create another file layer2-config.yaml with the below content. This creates a ConfigMap with the IP address range pool which can be assigned to the Load Balancer. Note that the IP address range has to be modified according to the network to which your Laptop has been connected to.

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: my-ip-space
      protocol: layer2
      addresses:
      - 192.168.0.30-192.168.0.40

Step 2: Create a Deployment and Service using the 'kubectl apply -f nginx' command. Notice that the EXTERNAL-IP is specified as <pending> because in the local setup a Load Balancer cannot be created as in the case of Cloud.

Step 3: Delete the Deployment and the Service as they will be installed later again with MetalLB setup completed using the 'kubectl delete -f nginxlb.yaml' command.

Step 4: Install the MetalLB and the related components using 'kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml' command.

Make sure that the corresponding Pods are created using the 'kubectl get pods -n=metallb-system' and also that they are in the Running status after a few minutes.

Step 5: Apply the configuration for the MetalLB installation using the 'kubectl apply -f layer2-config.yaml' command. In this file we specify the range of IP address from which the Load Balancer will get an IP address. Don't forget to change the IP address in the file, based on the network the Laptop is connected to.

Step 6: Now that MetalLB setup has been done. Recreate the Deployment and the Service using 'kubectl apply -f nginx.yaml'. Notice in the output of the 'kubectl get svc' command, the EXTERNAL-IP is 192.168.0.31 and not <pending> as was the case earlier.

Step 7: Now that the Load Balancer has been created, open the 192.168.0.31:8080 URL in a browser to get the below nginx page. Note to change the IP address based on the above step. Now, we are able to setup a Service of Type set to LoadBalancer if we get the below page.

Step 8: Finally delete the nginx Deployment/Service and the MetalLB related components using the 'kubectl delete -f nginxlb.yaml' and the 'kubectl delete -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml' command. This is an optional step, just in case you want to delete it.

Conclusion

By default in a non-Cloud environment when we create a service with Type set to LoadBalancer the EXTERNAL-IP is set to <pending> as a Load Balancer cannot be created. But, in this blog we have seen how to configure a Layer 2 Load Balancer (MetalLB) on the Laptop to get a Load Balancer created. Below is the screen shot before and after the MetalLB setup has been done. Note the value of the EXTERNAL-IP.

Before

After

Monday, March 18, 2019

Quickly and easily installing K8S on the local machine

In the previous blog here we have seen how to get started with K8S easily with zero-installation using Play-With-Kubernetes (PWK). Everything happens in the remote machines, so nothing to install on the local machines. We can get started with K8S in less than 5-10 minutes using PWK. The main con of PWK is that the session is available for 4 hours and any modifications to the K8S cluster are lost.

One easy way to use K8S locally is to use Minikube as mentioned here, but it provides a single node cluster and it makes it tough to test the different failure scenarios like a node going down and a few other things.

In this blog we will try to install a multi-node K8S locally on the laptop as mentioned here, so that the changes are persisted across sessions. We should be able to continue from where we left. K8S-the-hard-way sets up a cluster from scratch, but it takes time and expertise. So, there are tools like kubeadm which abstracts and makes the installation process easier.

With kubeadm there are a sequence of steps to install a multi-node cluster on laptop. And for those who are new to Linux, it might be a pain. So, I was trying to figure out if the installation process using kubeadm can be automated using Vagrant. Tried a couple of hours, got stuck and gave up. And then luckily I found a ready made Vagrantfile from this article, which made the K8S installation process a breeze.

On a side node a multi-node K8S cluster can be run on the Cloud, but not every one is comfortable with the Cloud, so here are the steps using VirtualBox and Vagrant on the local machine.

Step 1: Download and install the latest version of VirtualBox and then Vagrant. For the sake of Vagrant, you might have to restart the OS. The installation is pretty obvious as installing any other Windows software.

Step 2: Make folder on the laptop and create a Vagrantfile with the content from here. If required the amount of Memory and CPU cores can be modified in this file.

Step 3: Go to the above created folder and run the command 'vagrant up' from the Command Prompt. It takes a couple of minutes to create Virtual Machines in VirtualBox, download and install the K8S and the required binaries. The end screen will appear as shown below.

And the Virtual Machines (k8s-head, k8s-node-1 and k8s-node-2) will appear as shown below. We are all set with the K8S installation. It's a piece of cake. It had never been easy to install softwares.

Step 4: K8S follows a master-slave architecture. Login to the master using 'vagrant ssh k8s-head' and run the 'kubectl get nodes' to make sure all the nodes are ready.

Step 5: Now lets create a deployment using the 'kubectl run nginx --image=nginx -r=4' and make sure it has been deployed using the 'kubectl get deployment' and 'kubectl get pods' commands.

Step 6: Now if we want to destroy the cluster, run the 'vagrant destroy -f' command from the earlier created folder and the Virtual Machines will be shutdown and deleted.

Step 7: During the installation if something goes wrong then it will be displayed on the screen and more details will be logged to 'ubuntu-xenial-16.04-cloudimg-console.log' file in the same folder.

As seen above all it takes is a couple of steps to create a multi node K8S on the laptop. Now you should be all setup to get started and explore the world of K8S. Further nodes can be added by modifying the Vagrantfile and running the 'vagrant up' command.

In the upcoming blogs, we will try to install additional packages or applications on top of the above K8S cluster and try different things with them.

Note: Joserra in the comments points to the K8S blog on the same here. This blog uses VirtualBox and Vagrant. While the K8S blog uses Ansible to run the commands in the VM on top of VirtualBox and Vagrant. The end result of both of them are the same.

Monday, March 11, 2019

Getting started with K8S the easy way using 'Play with Kubernetes'

There are many ways of installing K8S as mentioned here. It can be installed in the Cloud, on-premise and also locally on the laptop using virtualization. But, installing K8S had never been easy. In this blog, we will look at one of the easiest way to get started with K8S using Play with Kubernetes (PWK). With this the whole K8S experience is within the browser and there is nothing to install on the laptop, everything is installed on the remote machine. PWK uses 'Docker in Docker' which is detailed here (1, 2).

Step 1: Go to https://labs.play-with-k8s.com/, Login and click on Start. A Docker or a Git login would be required for the same.

Step 2: PWK allows up to 5 nodes or machines. Click on 'ADD NEW INSTANCE' for 5 times and this will add 5 instances as shown below from node1 to node5. Here we will configure node1 as master and the remaining as workers.

Clicking on a node in the left pane will give access to the corresponding terminal in the bottom right pane. The combination 'Alt+Enter' will maximize the terminal.

Step 3: Run 'kubeadm config images pull' command on node1. This will pull all the images required for the installation before the actual installation starts in the next step. This is an optional step, but this step makes the installation faster.

Step 4: Init the master on node1 using the 'kubeadm init --apiserver-advertise-address $(hostname -i)' command. The output of the command should be as shown below. Note down the 'kubeadm join .....' command from the output of this command.

Step 5: Now is the time to deploy the Pod network using the below command on node1.

kubectl apply -n kube-system -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 |tr -d '\n')"

Step 6: Execute the 'kubeadm join ......' command on all the workers (node2, node3, node4 and node5). On each of the node the 'This node has joined the cluster' will be displayed towards the end of the output. The 'kubeadm join ......' command has been got from Step 4.

Step 7: After a few minutes run 'kubectl get nodes' on the master node (node1) and all the nodes should be in a Ready status. This makes sure that out 5 node K8S cluster is ready.

Step 8: Lets create a K8S Deployment with 4 replicas on the nginx server by running 'kubectl run nginx --image=nginx -r=4' on the master node (node1). Initially the status of the Containers will be in 'Container Creating. But, in a few seconds it will change to Running.

Step 9: Get the detailed status of the Pods using 'kubectl get pods -o wide' command. This sill show that the Pods are balanced across all the nodes.

The K8S Deployment objects maintains a fixed number of Pods. Delete one of the Pod using 'kubectl delete pod NAME-OF-THE-POD'. Notice that the Pod will be deleted and a new Pod is automatically created. This can be observed by running the 'kubectl get pods -o wide' command again. The name of the deleted Pod will be changed.

The K8S session would be available for 4 hours. And also any resources/setting done will be lost after the session. The changes to the cluster won't be persisted. Likewise there are a few disadvantages of using PWK, but the good thing is it's free and requires no installation on the local machine.

In the upcoming blogs, we will try to explore the other ways of installing K8S. Also, check Katakoda. It offers K8S in the browser similar to PWK.

Friday, March 1, 2019

Webinar to know about CKAD and CKA Kubernetes Certifications

Kubernetes is all about orchestrating Microservices. Instead of repeating what it's all about, here is home page for Kubernetes with more details. CNCF offers CKAD and CKA certifications around Kubernetes. While CKAD is more from a developers perspective, CKA is from administration perspective. Out of these CKA is a bit tougher compared to CKAD. While most of the certifications are theoretical, the Kubernetes Certifications are practical, a set of tasks have to be completed in a given time on a Kubernetes cluster. So, hands on is pretty much required for the Certifications.

Here is a recorded webinar from CNCF on getting started with the Certifications. I was preparing for the Kubernetes certification, but got deviated. Planning to get back to get back to the Certifications again. Will write a detailed blog on these Certification once I get through the Certifications.

Paper on Serverless Computing from Berkeley

Cloud Computing moves MOST of the administration from the Cloud consumers to the Cloud providers. No need to think about procuring hardware, networking, cooling, physical security etc. Serverless moves in the same direction, taking away even more administration from the Cloud consumers.

The name `Serverless` is a bit of misnomer as there are still servers involved. The only thing is that the Cloud consumers need not think in terms of Servers. Take the example of FAAS (Function-As-A-Service). Here are the sequence of steps, no where a SERVER is mentioned.

- Write and test a function
- Package the function
- Deploy the package to the Cloud
- Associate an event with the function (to be invoked automatically) or provide an API Gateway (to be invoked programmatically)

There is no mention of SERVER in the above and so the name Serverless. The good thing about FAAS is that it scales automatically and there is no need pay when the function is not invoked which is not the case of IAAS, PAAS and SAAS. We pay based on the number of function invocations and the amount of resources consumed.

Serverless has a lot to go, but applications can be built end-to-end without thinking about Servers and so Serverless. Here is a recent good read Cloud Programming Simplified: A Berkeley View on Serverless Computing about the pros, cons, challenges, research areas and finally predictions of Serverless computing.

Also to get a hang on FAAS, here is an blog I have written using AWS Lambda to to trigger a Java function which shrinks an image as soon as it has been uploaded to AWS S3.

Pages