Saturday, September 19, 2020

Optimal VirtualBox network setting for K8S on Laptop

In one of the previous blogs we looked at setting up K8S on a laptop. The advantages of this setup are the freedom to try out different things and that it is very quick to get started. On my laptop it takes about 5 minutes for the Virtual Machines to start, including the K8S in them. The downside is that it's mainly for learning and can't take much load.

Recently I bought a new Lenovo ThinkPad and so had to go through the entire exercise of setting up K8S on it. BTW, I'm pretty happy with the Laptop. The only gripe is that it comes with 8GB of RAM and I need to upgrade it to 16GB, the maximum RAM it supports. The Laptop is very light and I can snuggle into any corner of the house and work with concentration.


Above is the setup on my previous Laptop, with one Control Plane (master) and two worker nodes. There had been a few problems with the VirtualBox networking. VirtualBox supports different types of networking (1) and Bridged Networking was used. With Bridged Networking everything worked fine, except for the problems below.

- The Laptop had to be always connected to a network, so working in the offline mode was not possible.
- Also, switching between different networks changed the IP of the master and K8S would stop working.

As mentioned above, there is more than one way of configuring the network in VirtualBox. The options can be seen in the Virtual Machine settings under the Network tab.


Here (1) is a good article on the different types of networking in VirtualBox and details about them. On the Y-Axis we have the different types of networking and on the X-Axis the features they support. Let's narrow down the type of networking to use with VirtualBox by identifying the features required for having a K8S Cluster on the Laptop.


-- "VM <--> VM" -- Required for communicating across VM instances.
-- "VM <-- Host" -- Required as we need to connect from the Host OS to the Guest for debugging etc.
-- "VM --> LAN" -- Required for the internet connection to download the different software.
-- "VM --> Host" -- Optional, for connecting from the Virtual Machine to the Host.
-- "VM <-- LAN" -- Optional, for accessing the K8S Cluster from outside the Laptop.

From the feature matrix and the required features, the only options left for the VirtualBox networking are NAT Network and Bridged Networking. The problem with Bridged Networking, as mentioned above, is that it always requires a connection to the network, and switching to a different network changes the IP of the K8S master and breaks the entire setup. The certificates created during the K8S setup are tied to a specific IP and need to be generated again each time the IP address of the master changes (1). This is not impossible, but it is tedious to do every time we change the network. So, the only optimal option left is the NAT Network.
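The narrowing-down above can be sketched in a few lines of Python. The matrix values are written down from memory of the overview table in the VirtualBox manual, so treat them as an assumption and check them against the article linked above. "Port forward" in that table is counted here as supported, since it only needs extra configuration.

```python
# Feature matrix for the VirtualBox networking modes.
# ASSUMPTION: values recalled from the VirtualBox manual's overview table.
# True = supported (possibly via port forwarding), False = not supported.
MATRIX = {
    "Host-only":   {"VM<->VM": True,  "VM<--Host": True,  "VM-->LAN": False,
                    "VM-->Host": True,  "VM<--LAN": False},
    "Internal":    {"VM<->VM": True,  "VM<--Host": False, "VM-->LAN": False,
                    "VM-->Host": False, "VM<--LAN": False},
    "Bridged":     {"VM<->VM": True,  "VM<--Host": True,  "VM-->LAN": True,
                    "VM-->Host": True,  "VM<--LAN": True},
    "NAT":         {"VM<->VM": False, "VM<--Host": True,  "VM-->LAN": True,
                    "VM-->Host": True,  "VM<--LAN": True},
    "NAT Network": {"VM<->VM": True,  "VM<--Host": True,  "VM-->LAN": True,
                    "VM-->Host": True,  "VM<--LAN": True},
}

# The three features marked as "Required" in the list above.
REQUIRED = ("VM<->VM", "VM<--Host", "VM-->LAN")


def candidate_modes(matrix, required):
    """Return the modes that support every required feature, sorted by name."""
    return sorted(
        mode for mode, features in matrix.items()
        if all(features[name] for name in required)
    )


print(candidate_modes(MATRIX, REQUIRED))
```

With these values only Bridged and NAT Network survive the filter, which matches the conclusion above.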

With the combination of the NAT Network in VirtualBox and static IP addresses in the guest Virtual Machines, we don't need to worry about changing from one network to another. The VirtualBox NAT Network has a DHCP component, and an IP address from its range can be configured as the Static IP for each guest Virtual Machine. Also, a Virtual Switch is used for the communication across the guest Virtual Machines, so there is no need to be connected to an external network. This ensures that we can work in the offline mode with K8S on the laptop even when we are on the move. Below are the different components involved while using the VirtualBox NAT Network, with the network communication highlighted in red.
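As a rough sketch of the static IP planning, the snippet below picks one fixed address per node from the NAT Network range and checks that each address really belongs to that range. The CIDR 10.0.2.0/24, the node names and the starting offset of 10 are all assumptions for illustration; use whatever range your own NAT Network is configured with.

```python
import ipaddress


def plan_static_ips(cidr, node_names, first_host=10):
    """Assign consecutive static IPs inside the given NAT Network range.

    first_host is a hypothetical offset chosen to stay clear of the low
    addresses that VirtualBox uses for its own gateway and services.
    """
    network = ipaddress.ip_network(cidr)
    plan = {}
    for offset, name in enumerate(node_names):
        address = network.network_address + first_host + offset
        # Reject addresses that fall outside the range or hit the broadcast address.
        if address not in network or address == network.broadcast_address:
            raise ValueError(f"{address} is not a usable address in {network}")
        plan[name] = str(address)
    return plan


# Hypothetical node names for one master and two worker nodes.
plan = plan_static_ips("10.0.2.0/24", ["master", "worker1", "worker2"])
for node, ip in plan.items():
    print(node, ip)
```

Because these addresses never change, the certificates generated during the K8S setup stay valid no matter which network the Laptop is connected to.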


The only catch with the NAT Network is that we won't be able to connect to the guest Virtual Machines directly without port forwarding, as mentioned in the VirtualBox documentation here (1). The documentation mentions NAT, but the same applies to the NAT Network also. This is not a big issue, it is just a matter of configuring "Port Forwarding Rules" in VirtualBox before connecting to the guest Virtual Machines.
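The rules can also be added from the command line instead of the GUI. The sketch below only builds and prints the `VBoxManage natnetwork modify --port-forward-4` commands, it does not run them. The rule layout `name:protocol:[host ip]:host port:[guest ip]:guest port` follows the VirtualBox manual; the network name K8SNatNetwork, the guest IPs and the host ports are assumptions for illustration.

```python
import shlex


def port_forward_rule(name, host_port, guest_ip, guest_port, proto="tcp"):
    """Build one IPv4 port forwarding rule; an empty host IP means all host interfaces."""
    return f"{name}:{proto}:[]:{host_port}:[{guest_ip}]:{guest_port}"


def vboxmanage_args(netname, rule):
    """Return the argument list that would add the rule to the NAT Network."""
    return ["VBoxManage", "natnetwork", "modify",
            "--netname", netname, "--port-forward-4", rule]


# Hypothetical guests and their static IPs inside the NAT Network.
NODES = {"master": "10.0.2.10", "worker1": "10.0.2.11", "worker2": "10.0.2.12"}

for index, (node, ip) in enumerate(NODES.items()):
    # Forward host ports 2220, 2221, 2222 to SSH (port 22) on each guest.
    rule = port_forward_rule(f"ssh-{node}", 2220 + index, ip, 22)
    print(shlex.join(vboxmanage_args("K8SNatNetwork", rule)))
```

With such rules in place, connecting to port 2220 on the Host would reach SSH on the master, port 2221 the first worker, and so on.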


In a future blog, I will provide the binaries and the steps to easily set up K8S on a laptop. But, for now, here are screenshots of the Memory usage before and after starting the Virtual Machines on the laptop.

(Before)

(Starting the Virtual Machines with K8S)

(After)

(Laptop CPU and RAM)

Within 4 to 5 minutes, I was able to log in to the K8S master and get the list of nodes and pods using the kubectl command.


Conclusion

To conclude, setting up K8S is not a hard task, but it requires a bit of patience for the installation of the OS and software, the configuration and finally cloning the Virtual Machines, which avoids repeating tasks and saves time. Also, "VirtualBox NAT Network" is the best option among the network types, as it allows working in the offline mode and doesn't break the K8S setup while switching between networks.

As mentioned, I will be uploading the Virtual Machine Images and detailing the procedure for setting up K8S on a Laptop. But, I need to zip and upload huge files, so it might take some time.

Wednesday, August 12, 2020

Connecting to S3 Service via VPC Gateway Endpoint

Let's say we are building an image processing application using ML which gets the images from S3 and identifies the action performed (sitting, standing etc) in those images. By default the network data flows from the application to S3 over the internet as shown in the left image, which is neither efficient nor secure. AWS provides the VPC Gateway Endpoint feature, and with it all the data stays within the AWS network.

 
VPC Endpoints come in two types: Gateway Endpoints for the S3 and DynamoDB services, and Interface Endpoints for the rest of the services. In this blog we will explore the VPC Gateway Endpoints.
 

Step 1: Create a VPC as mentioned in the previous blog and connect to the EC2 in the Private Subnet. By default the route table for the Private Subnet has a route for 0.0.0.0/0, and so any EC2 in the Private Subnet will have an internet connection.

 
The same can be verified by pinging google.com or some other host.

Step 2: In the PuTTY session, execute the below commands to install the AWS CLI.

sudo apt-get update
sudo apt-get install python2.7 python-pip -y
pip install awscli --upgrade
export PATH="$PATH:/home/ubuntu/.local/bin/" 

Step 3: Create an IAM Role with AmazonS3ReadOnlyAccess and attach it to the EC2 instance in the Private Subnet.

Step 4: Let's remove the internet connection for the EC2 instance in the Private Subnet. For this, select the Route Table for the Private Subnet and click on "Edit routes". Delete the route for 0.0.0.0/0 and click on "Save routes".

 

The Route Table will be updated as shown below.

 
Step 5: There is no need for the NAT Gateway and the Elastic IP as we have removed the internet connection for the EC2 in the Private Subnet. Make sure to remove them. You can also keep them, but there is a cost associated with them.


Test out the internet connectivity ("ping google.com") and also try to get the list of buckets in S3 ("aws s3 ls"). Both the commands should fail. Press Ctrl+C to come out of the commands.

Step 6: In the VPC Management Console, go to "Endpoints" and click on "Create Endpoint".

Search for S3 in the Service Name and select "com.amazonaws.us-east-1.s3". Make sure to select the VPC which was created in the previous step and select the Private Subnet.


The rest of the default options are good enough. Click on "Create endpoint" and the Endpoint will be created in a few minutes.

Step 7: Go back to the Route Table of the Private Subnet and note that a Route has been automatically added to the VPC Endpoint.

 
Step 8: Go back to the PuTTY session and execute the below commands. Notice that the ping fails as there is no internet connection, but we are still able to get the number of buckets in S3. This is because we have set up the VPC Gateway Endpoint and all the traffic remains within the AWS network.
 
ping google.com
aws s3 ls | wc -l 
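
Why the ping fails while the S3 call works can be sketched as a simple route lookup, where the most specific matching route wins. This is only a model of the Route Table and not AWS code. All the values are placeholders: 10.0.0.0/16 stands for the VPC range, 198.51.100.0/24 stands in for the S3 address ranges that the Endpoint route covers, and 203.0.113.10 stands in for a host on the internet.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific matching route, or None if no route matches."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Private Subnet Route Table at three points in this blog (placeholder values).
BEFORE = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-gateway")]
AFTER_STEP_4 = [("10.0.0.0/16", "local")]
AFTER_STEP_7 = AFTER_STEP_4 + [("198.51.100.0/24", "vpc-endpoint")]

S3_IP = "198.51.100.20"        # placeholder for an S3 address
INTERNET_IP = "203.0.113.10"   # placeholder for google.com

for label, table in (("before", BEFORE),
                     ("after step 4", AFTER_STEP_4),
                     ("after step 7", AFTER_STEP_7)):
    print(label, "| S3 via:", lookup(table, S3_IP),
          "| internet via:", lookup(table, INTERNET_IP))
```

After Step 7 the S3 address resolves to the Endpoint while the internet address has no matching route at all, which is the behaviour seen in the PuTTY session.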

Conclusion

By default, when we consume any AWS service from an EC2 instance, the network traffic goes through the internet, which is not really secure. There is also an additional cost for the NAT and Internet Gateway. By using the VPC Gateway Endpoint we saw that the network traffic remains within the AWS network. This makes it easier to migrate applications to AWS and also to make sure they are compliant.

Tuesday, August 11, 2020

Creating a VPC and connecting to the EC2 in the Private Subnet

An AWS VPC (Virtual Private Cloud) is a logically isolated network for isolating different environments like Production, QA and Development. A VPC can also be used to isolate applications like CRM, HR and others. Applications in one VPC by default won't be able to communicate with applications in another VPC. A VPC Peering Connection has to be explicitly set up for communication to happen between two VPCs.


In this blog we will set up a new VPC. In this VPC, we will create a Public and a Private Subnet. The way we will configure them is that any EC2 in the Public Subnet will have a Public and a Private IP address, while any EC2 in the Private Subnet will have only a Private IP address associated with it. We can connect to the EC2 in the Public Subnet as it has a Public IP, but how do we connect to the EC2 in the Private Subnet as it doesn't have a Public IP? This might be required for making changes to that particular EC2 instance, like installing/upgrading databases etc.

One way is to set up a VPN connection between the Laptop and the VPC as mentioned in the previous blog. This way the Laptop and the EC2 in the Private Subnet will appear as though they are in the same network, and so we would be able to connect from the Laptop to the EC2 in the Private Subnet. Another way is to connect to the EC2 in the Public Subnet using the Public IP and from there connect to the EC2 in the Private Subnet using the Private IP. This is what we will be exploring in this blog.

Step 1: Go to the EC2 Management Console and make sure the "New EC2" experience is selected.


Step 2: Click on the "Key Pairs" and click on "Create key pair". Enter the Key pair name and make sure the ppk format is selected and click on "Create key pair".

The Key pair would be created as shown below.


Step 3: Click on "Elastic IPs", click on "Allocate Elastic IP address" and finally click on Allocate. The Elastic IP address is required for the NAT Gateway, which will be automatically created while creating the VPC later.

An Elastic IP address will be created as shown below.

Step 4: Go to the VPC Management Console and click on "Launch VPC Wizard".

Step 5: Select "VPC with Public and Private Subnets" Option and click on Select.

Step 6: Enter the name of the VPC as MyVPC and select the Elastic IP created in the previous steps. The rest of the default options are good enough. Finally, click on "Create VPC".

The VPC creation process takes a few minutes and the VPC screen will be updated as shown below.

Step 7: By default, any EC2 created in the Public Subnet will only have a Private IP, so we need to make sure it also gets a Public IP address. For this, select the Public Subnet from the list of Subnets and go to Actions -> "Modify auto-assign IP settings".

Make sure to check "Auto-assign IPv4" and click on Save.

Step 8: Click on the Security Groups and click on "Create security group". Enter the name as AllowSSH, give some description and add an inbound rule as displayed below to allow Port 22/ssh inbound. Make sure to select the VPC which has been created in the previous steps. Click on "Create security group". This one will be associated with the EC2 instances later.



Step 9: Create Ubuntu EC2 instances with t2.micro as the instance type in MyVPC, one each in the Private and Public Subnets. Make sure to select MyVPC and the appropriate Subnet in the 'Configure instance' options while creating the EC2 instance. Also, select the AllowSSH Security Group and the Key pair which have been created in the previous steps. This allows us to connect to the EC2 via SSH using the Key pair.

Step 10: Name the EC2 instances as MyVPC-PublicSubnet and MyVPC-PrivateSubnet appropriately. Also notice that the EC2 in the Public Subnet has both Private and Public IP address, while the EC2 in the Private Subnet has only Private IP address as shown below.


Step 11: Download putty.exe and pageant.exe from here. There is no need to install any of this software, simply download it. Start pageant.exe and add the ppk file which has been downloaded in one of the previous steps.


Step 12: Open putty.exe and in the Host Name field enter the username ubuntu and the Public IP address of the EC2 in the Public Subnet separated by the symbol @ as shown below.

Step 13: Go to Connection --> SSH --> Auth and make sure to select "Allow agent forwarding", then click on Open to connect to the EC2 instance in the Public Subnet. There is no need to specify the Private key as it has already been loaded in pageant.exe.


Step 14: From the PuTTY session execute the below command to connect to the EC2 in the Private Subnet. Make sure to replace 1.2.3.4 with the Private IP address of the EC2 in the Private Subnet. Note that there is no need to specify the Private key this time either, as it is forwarded from pageant.exe. Now on this EC2 instance we should be able to install any back-end applications like a database, business logic and so on.

ssh ubuntu@1.2.3.4

 
Step 15: Finally the cleanup process

    - Terminate the EC2 instances
    - Delete the NAT Gateway
    - Wait for a few Minutes
    - Delete ElasticIP
    - Finally delete the VPC
    - Delete the Key pair
 

Conclusion

An EC2 instance in the Private Subnet has only a Private IP and is primarily used to host back-end applications like Database, Business Logic and so on. Since it doesn't have a Public IP, we won't be able to connect to it directly from the Laptop. We have seen how to connect to it using the EC2 instance in the Public Subnet as a Bastion or Jump box.

For this we have used pageant.exe, which stores the Private Key in memory and is not a safe approach. Another approach is to copy the Private key to the Bastion or Jump box, which is also not safe. Both approaches are easy to use, but as they are not safe they are recommended only for non-production environments. For production environments using the Client VPN is the preferred approach. With it the Bastion or Jump Box is avoided altogether and all the communication between the Laptop and the AWS Cloud is encrypted (AWS Client VPN is OpenVPN-based and uses TLS rather than IPSec).