Wednesday, April 8, 2020

How does the K8S cluster get bootstrapped?

Although the different Cloud vendors provide managed services like AWS EKS, GCP GKE, Azure AKS and others, nothing beats running K8S on the local machine. Not only can we get started quickly, it's absolutely free and provides the ultimate freedom for any experimentation. Here is the setup I have using Oracle VirtualBox on Windows 10.


It's a 3 node K8S cluster with one Control Plane node to do the orchestration and scheduling of the containers, and two Worker nodes for the execution of the containers. Recently, I upgraded to the latest version of K8S (1.18 as of this writing) and was able to try some of the new features.


The command 'kubectl get pods --all-namespaces -o wide' lists the Pods in all the namespaces. Below is a screenshot listing the Pods on the Control Plane and the Worker nodes. I was curious about how these Pods get started, as that gives us a chance to check out the initialization parameters, tweak them and also enable/disable features. This blog is all about the same. Note that the instructions are specific to an installation done using kubeadm and differ a bit for other installation processes.


I was poking around in the K8S cluster and this StackOverflow query (1) helped me. Below is the workflow of how the K8S cluster gets bootstrapped. It all starts with the kubelet getting started automatically as a systemd service on all the Control Plane and Worker nodes, which starts the minimum required static Pods for K8S to start working. Once the K8S cluster boots up, additional Pods are started to get the cluster into the desired state as stored in the etcd database.


Here is the workflow in a bit more detail


- Below is how the kubelet gets started as a systemd service. /etc/systemd/system/kubelet.service.d/10-kubeadm.conf has the command to start the kubelet process along with its initialization parameters. Note that one of the parameters is the location of /var/lib/kubelet/config.yaml.
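
To see this on a running node, the below commands (assuming the standard kubeadm file locations mentioned above) show the kubelet unit status and its drop-in configuration:

systemctl status kubelet
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf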


- In the /var/lib/kubelet/config.yaml file, the staticPodPath variable is set to /etc/kubernetes/manifests, the path which has the yaml files for the static Pods to be started once the kubelet starts. At this point the apiserver, scheduler and etcd haven't started yet, so the kubelet starts them and manages them. Although these Pods become visible to the apiserver later, it's still the kubelet and not the apiserver that manages them.
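
A quick way to confirm this on a kubeadm-based node (assuming the default config location) is:

grep staticPodPath /var/lib/kubelet/config.yaml
# should print something like: staticPodPath: /etc/kubernetes/manifests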


- In the /etc/kubernetes/manifests folder we have the yaml definitions for etcd, the apiserver, the controller-manager and the scheduler. These files help us understand how the K8S system Pods are initialized.
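
Listing the folder on a Control Plane node should show the four manifests (the exact file names may vary slightly between versions):

ls /etc/kubernetes/manifests
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml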


- OK, how about the coredns, flannel and proxy Pods getting started? The coredns and proxy Pods are created by kubeadm during the K8S cluster initialization phase (1, 2). The flannel Pods were created by me manually while setting up the K8S cluster (1). The details of these are stored in the etcd database, as with any other user-created K8S objects, and K8S automatically starts them once the cluster starts.
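
These add-on Pods show up as regular K8S objects; for example (a rough check, the exact names depend on the CNI plugin used):

kubectl -n kube-system get deployment,daemonset
# coredns appears as a Deployment, kube-proxy and flannel as DaemonSets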


Mystery solved. Now that we know how the K8S cluster gets bootstrapped, more can be explored. For using and administering K8S, it helps to know the bootstrap process in a bit more detail.

Tuesday, April 7, 2020

How does Key Pair work behind the scenes for Linux EC2 authentication?

Different ways of authenticating against a Linux EC2 instance


Once a Linux EC2 instance has been created, it can be accessed via Putty or some other SSH client. To access the EC2, we first need to authenticate the user. Either a Username/Password or a Key Pair can be used for authentication, and there are pros and cons to each of them. AWS has chosen the Key Pair way of authentication by default. If required, this can be disabled and Username/Password authentication can be enabled.


How does KeyPair work behind the scenes for Linux EC2 authentication?



A picture conveys more than words, and so does the above workflow. A Key Pair consists of a Private Key and a Public Key. The Private Key goes onto the laptop and the Public Key automatically goes into the EC2 instance. Go through the above workflow to get to know what happens behind the scenes.
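
When connecting, the SSH client on the laptop proves possession of the Private Key, and the EC2 instance verifies it against the stored Public Key. With the plain ssh client it looks like the below sketch (the key file name and address are placeholders):

ssh -i my-keypair.pem ec2-user@<EC2-public-IP>
# the login user depends on the AMI, e.g. ec2-user for Amazon Linux or ubuntu for Ubuntu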

Note that the Private Key never leaves the laptop; this is one of the advantages of using Key Pairs. Just as we never share passwords with anyone, we should never share the Private Key with anyone, as that would allow them to access the EC2. And just as we never use the same password across multiple services, we should never use the same Key Pair across multiple EC2 instances, for the obvious reasons.

It's also better to create a different Key Pair for each user accessing the same EC2 instance, very similar to using different passwords. Here is a small write-up from the AWS site on the same (1).

>> If you have several users that require access to a single instance, you can add user accounts to your instance. For more information, see Managing User Accounts on Your Linux Instance. You can create a key pair for each user, and add the public key information from each key pair to the .ssh/authorized_keys file for each user on your instance. You can then distribute the private key files to your users. That way, you do not have to distribute the same private key file that's used for the AWS account root user to multiple users.
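
A minimal sketch of that recommendation, assuming a user account named user2 has already been created on the instance (the user name and key file names are hypothetical):

# on the laptop - generate a new key pair for the user
ssh-keygen -f user2-key

# on the EC2 instance - append the public key to that user's authorized_keys
cat user2-key.pub >> /home/user2/.ssh/authorized_keys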

Here are a few articles on some of the regularly performed tasks around the combination of EC2 and Key Pairs.

1) How do I enable a password login instead of a key pair when logging into my EC2 instance using SSH? (1, 2)

2) How do I add new user accounts with SSH access to my Amazon EC2 Linux instance? (1, 2)

3) Connecting to your Linux instance if you lose your private key (1)

4) Rotating the Access Keys (1) - Note that there are many better ways, but this is the easiest.

There is a lot more to Key Pairs and how the authentication works, but this article gives the basic gist of what happens BTS (behind the scenes) when we use Key Pairs to access an EC2 instance. I like to keep things clear and simple; this helps me get my concepts clear and express myself better.

Monday, April 6, 2020

AWS AMI vs Launch Templates

Very often I get asked about the differences between AMIs (Amazon Machine Images) and EC2 Launch Templates, although both of them have different purposes. This blog is about getting these concepts clear.

What is AMI?


There might be a requirement where we need to install software and applications on hundreds of EC2s. It's not practical to log in to each EC2 and perform the task, as it is time consuming and also prone to errors. We can automate this using AMIs. Below is the workflow to start working with AMIs.


Once an EC2 has been created, the appropriate software/applications have to be installed along with the configurations, and then an AMI has to be created. The AMI has the OS and all the software/applications on top of it. The AMI usually is a couple of GB, based on the original EC2 from which it was created, and is immutable (no changes can be made to it).

With this AMI, additional EC2s can be created and each one of them will have the original software automatically installed, as in the case of Apache2. This makes getting the EC2s ready much easier/quicker.
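
As a rough sketch using the AWS CLI (the instance id, AMI id and names below are placeholders), the AMI is created once from the configured EC2 and then reused to launch more instances:

# create an AMI from the configured EC2 instance
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "web-server-ami"

# launch additional EC2 instances from that AMI
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro --count 2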

Different ways of launching EC2 Instance


AWS provides multiple ways of launching EC2 instances via the SDK, CLI, CloudFormation and the EC2 Management Console (Web UI). From the EC2 Management Console there are again two ways: one is using the Launch Instance wizard and the other is via EC2 Launch Templates, as shown below.


Both these approaches lead to launching an EC2 and take the required parameters for the same, like the AMI mentioned above, the EBS volume size/type, the SecurityGroup, the KeyPair to be used and a few other details.


In the Wizard approach we need to select the AMI, Instance Type, Network Settings, Storage, Security Groups, pricing model etc. from the different options each and every time. In the below screen the AMI has to be selected from the available ones. This is OK when we launch an EC2 a few times, but it quickly becomes a routine and time-consuming task.


This is where EC2 Launch Templates come to the rescue. We can create a template, predefine all the EC2 properties and reuse the same template to launch EC2 instances. This way we don't need to select the EC2 properties again and again. Below is a template with the AMI, instance type, KeyPair and SecurityGroup predefined. Using this template we should be able to create an EC2 instance using Option (6).
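
The same can also be done from the AWS CLI; a hedged sketch (the template name, AMI id and key name are placeholders):

aws ec2 create-launch-template --launch-template-name web-template --launch-template-data '{"ImageId":"ami-0123456789abcdef0","InstanceType":"t2.micro","KeyName":"my-keypair"}'

aws ec2 run-instances --launch-template LaunchTemplateName=web-template,Version=1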


CloudFormation Templates vs EC2 Launch Templates


While EC2 Launch Templates can be used to automate EC2 instance creation, CloudFormation Templates are much more than that. It's possible to create many of the AWS resources via CloudFormation Templates, connect them together, watch the drift (changes to the AWS resources) and much more. Here is the list of AWS resources that can be created by CloudFormation.
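
For comparison, even a minimal CloudFormation template describing a single EC2 instance (the AMI id below is a placeholder) is a full infrastructure definition that CloudFormation can create, update and check for drift:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0
      InstanceType: t2.micro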

Tuesday, March 31, 2020

Debugging K8S applications with Ephemeral Containers

It's always CRITICAL to pack a Container image with only the minimal binaries required, as this keeps the attack surface small; upgrading and testing the image also becomes easier as there are fewer variables to be addressed. Distroless Docker images can be used for the same. In the above diagram, Container (A) has only the application and the dependent binaries and nothing more. So, if there are no debugging tools in Container (A) nor any way to check the status of the process, how do we debug any problem in the application? Once a Pod is created, it's not even possible to add Containers to it for additional debugging tools.

That's where Ephemeral Containers come into the picture, as with Container (B) in the above picture. These are temporary Containers that can be added to a Pod dynamically with additional debugging tools. Once an Ephemeral Container has been created, we can connect to it as usual using the kubectl attach, kubectl exec and kubectl logs commands.

Here are the steps for creating an Ephemeral Container. The assumption is that kubeadm has been used to create the K8S cluster and the cluster is running K8S version 1.18 or newer, which is the latest as of this writing.

Step 1) The Ephemeral Containers feature has to be enabled before creating it. More about enabling K8S features here.

Edit the below files to add '- --feature-gates=EphemeralContainers=true' in the command section (a rough sketch follows the file list). There is no need to restart the Pods, as the kubelet process continuously monitors these files and restarts the Pods automatically. Run 'kubectl get pods --all-namespaces' and notice the changes to the AGE column.

/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
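
After the edit, the command section of each manifest would look roughly like the below (only the relevant lines of kube-apiserver.yaml shown; the existing flags stay as they are):

spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=EphemeralContainers=true
    # ...the rest of the existing flags remain unchanged...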

Step 2) Edit the '/etc/systemd/system/kubelet.service.d/10-kubeadm.conf' to pass the '--feature-gates=EphemeralContainers=true' parameter to kubelet as shown below.

ExecStart=/usr/bin/kubelet --feature-gates=EphemeralContainers=true $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

Step 3) Restart the kubelet service for the above changes to take effect.

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Note that Steps 2 and 3 have to be run on all the Control Plane nodes and the Worker nodes.

Step 4) Now that the feature has been enabled, let's create a Pod with an application. For the sake of simplicity, a pause Pod is created using the below command.

kubectl run ephemeral-demo --image=k8s.gcr.io/pause --restart=Never

There is no shell or any other debugging tools in this container, so the below command will fail.

kubectl exec -it ephemeral-demo -- sh

Step 5) Let's start an Ephemeral Container with the below command. Note that this particular feature is in the alpha release of K8S, hence the alpha keyword in the kubectl command; once the feature graduates to stable, this won't be needed any more. The below command opens a terminal to the Ephemeral Container, from where we can start debugging the main application.

kubectl alpha debug -it ephemeral-demo --image=busybox --target=ephemeral-demo

Step 6) Open a new terminal and run the below command to get the details of the Ephemeral Container.

kubectl describe pod ephemeral-demo

Hope you had fun learning about Ephemeral Containers; more about them here and here.

Monday, January 13, 2020

Prajval in '32nd South Zone Aquatic Championship – 2019'

My son SS Prajval got a Silver Medal in the finals of the ‘4x50 mts Medley Relay – Group III – Boys’ event at the 32nd South Zone Aquatic Championship – 2019. This was the first time Prajval competed in the finals of the South Zone Competitions, and he was able to grab a Silver Medal there.

The District and the State level competitions were held at ‘B V Gurumurthy Memorial MCH Swimming Pool, Secunderabad’, while the finals were held at ‘GMC Balayogi Athletic Stadium, Gachibowli, Hyderabad’. These events were conducted by the Telangana Swimming Association. The event was well covered by all the major media outlets.


GMC Balayogi Athletic Stadium, Gachibowli, Hyderabad

In the finals, swimmers from Andhra Pradesh, Telangana, Kerala, Karnataka, Tamil Nadu and Puducherry (the Southern States of India) participated. It was nice to see kids really motivated about swimming. The final events were held on the 3rd/4th/5th of January 2020. We went to the events on all three days and it was an enriching and motivating experience for Prajval and us.

Receiving the Silver Medal

With his Swimming Coach Mr Satish Balasubramanian (NIS Coach)

With his buddies - BEFORE the competition

With his buddies - AFTER the competition

Sakshi

Deccan Chronicle

Sakshi
It's a tough balancing act between studies, swimming and peer pressure. But it's good to see that the kids are able to balance and prioritize them. Wishing him luck for all his future competitions.

Monday, December 9, 2019

Running Containers on K8S using AWS Fargate

Container orchestration is all the rage now, and there are different ways of running containers on AWS, using either EKS or ECS. EKS uses K8S behind the scenes, while ECS uses AWS proprietary technologies. With EKS it's easy to migrate containers from one Cloud to another, but not with ECS. In fact, Google Cloud Anthos makes it easy to manage K8S across Clouds and on premises.

With EKS, AWS has announced Managed Node Groups, which take away the burden of maintaining the K8S worker nodes. At the recent re:Invent 2019, AWS announced another exciting feature around EKS: it's now possible to run EKS the Fargate way, shown in the above diagram as Option 3. The rest of the options had been there for some time. AWS Fargate follows the serverless pattern and there is no need to think in terms of the number and size of EC2 instances. All we need is to create an EKS cluster and run the Pods on it. We pay exactly for the vCPU and Memory resources consumed by the Pods.

Here are the steps for creating an AWS EKS Cluster the Fargate way using eksctl.

Step 1: Create an Ubuntu EC2 Instance (t2.micro) and connect to it. On this Instance we would be running the eksctl and other commands for creating the AWS EKS Cluster.

Step 2: Execute the below commands on Ubuntu to install the AWS CLI, kubectl and eksctl.

#installation of the required software
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y python3-pip kubectl

pip3 install awscli --upgrade
export PATH="$PATH:/home/ubuntu/.local/bin/"

curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

Step 3: Get the access keys for the root and provide them using the `aws configure` command.

Step 4: From here on use the steps mentioned in AWS Blog 'Amazon EKS on AWS Fargate Now Generally Available'.
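
At its core, creating a Fargate-backed cluster with eksctl boils down to a single command along the lines of the below sketch (the cluster name and region are placeholders; the --fargate flag also creates a default Fargate profile for the cluster):

eksctl create cluster --name my-fargate-cluster --region us-east-1 --fargate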

Note that there is a cost associated with running the EKS K8S Cluster and also with the NAT Gateway which is part of the VPC created in the steps mentioned in the AWS Blog (Step 4). Also, make sure to delete the EKS Cluster and any other AWS resources created as part of this sequence of steps.

Friday, November 29, 2019

Changes to the AWS EC2 Instance Metadata Service (IMDS) around the recent Capital One hack

Capital One Bank (1) and 30 different organizations were hacked around the end of July. I had written a blog (1) around the same time on how to recreate the hack in your own AWS account, along with a few mitigations for the same. Now, AWS has made a few changes to the AWS EC2 Instance Metadata Service (IMDS) around the same (1, 2). Here (Security best practices for the Amazon EC2 instance metadata service) is the AWS re:Invent 2019 session around the same.

The old/existing approach is called IMDSv1 and the new one IMDSv2. Although IMDSv1 solves a few problems, like not having to store the access keys on the EC2, it brought its own headaches which led to the hacks. Earlier, the access keys for an IAM Role could be fetched using the below command. But with IMDSv2 enabled, the same command leads to a `401 - Unauthorized` error, which blocks SSRF and other such attacks.

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/Role4EC2-MetaDataMod

Let's see it in action with the below sequence of steps.

Step 1: Create an Ubuntu EC2 (t2.micro) and login to it via Putty.

Step 2: Create a role (Role4EC2-MetaDataMod) with the below JSON Policy (MetaDataModPolicy). Attach the role to the EC2.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:ModifyInstanceMetadataOptions",
            "Resource": "*"
        }
    ]
}

Step 3: Execute the below commands on the EC2 instance to configure the AWS CLI.

sudo apt-get update; sudo apt-get install -y python3-pip
pip3 install awscli --upgrade
export PATH="$PATH:/home/ubuntu/.local/bin/"

Configure the AWS CLI; make sure to provide only the region (us-east-1 or some other) and leave the rest blank.

aws configure

Step 4: Execute the below command on the EC2 instance to get the access keys associated with the IAM Role. The access keys would be displayed in the console.

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/Role4EC2-MetaDataMod

Step 5: Turn off IMDSv1 by executing the below command on the EC2 instance; by default both IMDSv1 and IMDSv2 are turned on. Make sure to replace the EC2 instance-id in the below command. With this, IMDSv1 is disabled and IMDSv2 is enforced. If both IMDSv1 and IMDSv2 should remain enabled for the sake of compatibility with existing applications, the http-tokens option can be set to optional. More on the command syntax and the documentation here.

aws ec2 modify-instance-metadata-options --instance-id i-053cb17f19ca95067 --profile default --http-endpoint enabled --http-tokens required

Step 6: Now the same command from Step 4 leads to a `401 - Unauthorized` error message, as shown in the below screen.


Step 7: With IMDSv2, a session token has to be obtained via the HTTP PUT method, and the token is then used to retrieve the access keys, or in fact to access any path of the EC2 Instance Metadata Service.

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
echo $TOKEN
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/Role4EC2-MetaDataMod -H "X-aws-ec2-metadata-token: $TOKEN"

Now the access keys can be retrieved, as shown in the below screen.


How would IMDSv2 prevent the hack from happening?


IMDSv2 would not prevent hacks like the Capital One hack entirely, but it addresses many misconfigurations and application bugs to some extent. There is still a possibility that a WAF is not configured properly to block HTTP PUT requests and a bug in the application code allows getting the IMDS token and accessing the EC2 Metadata Service from outside the EC2.

For those hosting applications on AWS, I would recommend enabling IMDSv2 and disabling IMDSv1 as mentioned here (1, 2, 3). But before making the changes, make sure the applications on AWS EC2 are compatible with IMDSv2. There won't be any changes required for applications using the AWS SDK, as it internally gets the token and then accesses the EC2 Instance Metadata Service using it. But if your application accesses the EC2 Instance Metadata Service using the plain HTTP endpoint, as in Step 4, it would require code changes to get the token.

This is a nice step from AWS, but it took them more than 100 days to come up with a solution to block more such attacks. GCP also has an instance metadata service (1); I am not really sure if the same vulnerability exists in GCP and how GCP handles it. If you are familiar with how GCP tackles it, please let me know in the comments section and I will update this blog.

Along the same lines, AWS also introduced managed WAF rules (1) to avoid similar attacks.

Thursday, November 28, 2019

Creating a K8S Cluster on AWS using eksctl

As mentioned in the official K8S documentation (1), there are different ways of setting up K8S, some meant for learning and some for production setups. The same is the case with AWS: there are different ways of setting up K8S on AWS (1). Today we will explore setting it up the eksctl way (1), which is kind of easy. There are a lot of tools which make the K8S installation easier, but if you are looking to build from scratch, then K8S the hard way (1) is the way to go. That will also help with the CKA certification (1).

As of this writing, AWS EKS charges $0.20 per hour for each Cluster created, independent of the number of worker nodes in the Cluster, and there is a separate charge for the EC2 and EBS of the worker nodes. As the Cluster charge is flat, there is no way to optimize it, so I used a t2.micro for the worker nodes to optimize the cost. But it didn't work out, as a t2.micro supports a maximum of 2 network interfaces and 2 IPv4 addresses per network interface (1). This boils down to the fact that a t2.micro can have a maximum of 4 IP addresses.

Whenever a new Pod is created on the worker node, AWS allocates a new IP address from the VPC subnet. As mentioned above, a t2.micro can have a maximum of 4 IP addresses, one of which is attached to the EC2 itself. That leaves 3 more IP addresses which can be allocated to the Pods on the worker node. This makes it difficult to use the t2.micro instance which falls under the AWS free tier, as some of the Pods are used by the K8S installation itself. The below steps use EC2 Spot Instances (t3.small or t3.medium) for the worker nodes.
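
As a rough back-of-the-envelope check (assuming the max-pods formula used by the AWS VPC CNI plugin, which also reserves a couple of slots for host-network Pods), the numbers work out as below, which is why a t3.small is a more practical choice:

# max pods = ENIs x (IPv4 addresses per ENI - 1) + 2
# t2.micro : 2 x (2 - 1) + 2 = 4
# t3.small : 3 x (4 - 1) + 2 = 11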

Also, AWS has recently introduced Managed Node Groups (1). With this, most of the grunt work like upgrading K8S on the worker nodes is managed by AWS at no additional cost. So, we have node groups which have to be managed by the customer, and the new managed node groups which are managed by AWS.

Node groups have been there for some time and support EC2 Spot Instances, but managed node groups are relatively new and it looks like they don't support EC2 Spot Instances as of now.

Here are the steps for creating an AWS EKS Cluster using the eksctl. References (1, 2, 3)

Step 1: Create an Ubuntu EC2 Instance (t2.micro) and connect to it. On this Instance we would be running the eksctl and other commands for creating the AWS EKS Cluster.

Step 2: Execute the below commands on Ubuntu to create the ssh key pair and install the AWS CLI, aws-iam-authenticator, kubectl and eksctl.

#generation of ssh keypairs to be used by the worker K8S Instances
ssh-keygen -f .ssh/id_rsa

#installation of the required software
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y python3-pip kubectl

curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.14.6/2019-08-22/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
sudo mv ./aws-iam-authenticator /usr/local/bin

pip3 install awscli --upgrade
export PATH="$PATH:/home/ubuntu/.local/bin/"

curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

Step 3: Get the access keys for the root and provide them using the `aws configure` command.

Step 4: Create a cluster.yaml file for a nodegroup or a managed nodegroup and start creating the Cluster; the yaml configuration for both is given below. It would take 10-15 minutes. This configuration uses an existing VPC; note that the availability zones and subnet ids in the yaml have to be changed to where the worker nodes are to be deployed.

eksctl create cluster -f cluster.yaml

Check the number of nodes in the cluster.

kubectl get nodes

Once the cluster has been created, the below command can be used to login to each of the worker nodes. Make sure to replace the IP with that of your worker node.

ssh -i ./.ssh/id_rsa ec2-user@3.81.92.65

Step 5: Create a deployment with 2 nginx pods and get the pod details.

kubectl run nginx --image=nginx -r=2
kubectl get pods -o wide

Step 6: Delete the Cluster. Again, the deletion of the Cluster would take 10-15 minutes of time. The progress would be displayed in the console.

eksctl delete cluster --wait --region=us-east-1 --name=praveen-k8s-cluster

Step 7: Make sure to terminate the EC2 Instance created in Step 1.

# An example of ClusterConfig showing nodegroups with spot instances
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
    name: praveen-k8s-cluster
    region: us-east-1

vpc:
  subnets:
    public:
      us-east-1a: { id: subnet-32740f6e }
      us-east-1b: { id: subnet-78146a1f }
      us-east-1c: { id: subnet-16561338 }

nodeGroups:
    - name: ng-1
      ssh:
        allow: true
      minSize: 1
      maxSize: 2
      instancesDistribution:
        instanceTypes: ["t3.small", "t3.medium"]
        onDemandBaseCapacity: 0
        onDemandPercentageAboveBaseCapacity: 0
        spotInstancePools: 2

# An example of ClusterConfig showing managed nodegroups with spot instances
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

vpc:
  subnets:
    public:
      us-east-1a: { id: subnet-32740f6e }
      us-east-1b: { id: subnet-78146a1f }
      us-east-1c: { id: subnet-16561338 }

metadata:
    name: praveen-k8s-cluster
    region: us-east-1

managedNodeGroups:
  - name: managed-ng-1
    instanceType: t3.small
    minSize: 1
    maxSize: 1
    desiredCapacity: 1
    volumeSize: 20
    ssh:
      allow: true

Screen shots from the above sequence of steps


1. eksctl uses CloudFormation templates to create the EKS Cluster and the NodeGroup. The status of the Stack creation can be monitored from the CloudFormation Management Console.


2. The above mentioned CloudFormation templates create an EKS Cluster and a NodeGroup, as shown in the below EKS Management Console.




3. One of the EC2 Instances was created in Step 1; the other one was created by eksctl for the NodeGroup worker nodes.


4. Interaction with the Cluster using kubectl to
     - get the nodes
     - create a deployment and get the list of pods


 5. Finally, deletion of the EKS Cluster and the NodeGroup.


Conclusion


As shown above, eksctl provides an easy way to create a K8S Cluster in AWS. The same thing can be done with the AWS Management Console as well (1). Doing it with the AWS Management Console is more of a manual process, which gives us clarity on the different resources getting created and how they interact with each other.

Monday, November 25, 2019

Interacting with AWS S3 using Java on EC2

Many web applications are being built on top of Cloud infrastructure. Let's take the case of a photo sharing website like Instagram. The website can be deployed on EC2 and made to interact with S3 to store the pictures. Building a full-fledged photo sharing website is beyond the scope of this blog, but we will explore how to execute a Java program on EC2 to interact with S3.


Here is the sequence of steps, with the assumption that the reader is familiar with the basics of AWS. Also, all the steps in this blog fall under the free tier.

Step 1: Create an Ubuntu EC2 instance (t2.micro) and connect to it via Putty or any other means. In the EC2 SecurityGroup, port 22 (SSH) has to be opened for inbound traffic.

Step 2: Update the package lists and install maven and java-common by executing the below commands on the Ubuntu EC2 instance.
sudo apt-get update
sudo apt install maven java-common

Step 3: By default the EC2 has no permission to interact with S3, or in fact with any other service. Create a role with the AmazonS3FullAccess policy attached. This policy gives full permissions to all the buckets and objects in S3, which is not recommended; it's always best to give limited privileges by creating a custom policy and attaching it to the role, as sketched below.
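
As an illustration, a scoped-down policy could look like the below sketch (the bucket name my-app-bucket is hypothetical; the sample program used later creates and deletes a bucket, so it would need broader actions than these):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-app-bucket",
                "arn:aws:s3:::my-app-bucket/*"
            ]
        }
    ]
}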

Step 4: Attach the role to the EC2. Now the EC2 has permissions to interact with S3.

Step 5: Get the link to the latest Amazon Corretto Java from this link and replace it in the below wget command. Execute the wget command on the Ubuntu EC2 instance to download Amazon Corretto Java. OpenJDK or Oracle JDK can also be used.

wget https://d3pxv6yz143wms.cloudfront.net/11.0.5.10.1/java-11-amazon-corretto-jdk_11.0.5.10-1_amd64.deb

Install Amazon Corretto Java on Ubuntu using the dpkg command as an administrator.

sudo dpkg --install java-11-amazon-corretto-jdk_11.0.5.10-1_amd64.deb

Step 6: Create a basic maven package using the below command. This will create a myapp folder with pom.xml, App.java and other artifacts.

mvn -B archetype:generate \
  -DarchetypeGroupId=org.apache.maven.archetypes \
  -DgroupId=org.example.basicapp \
  -DartifactId=myapp
 
Step 7: In the myapp folder, remove the pom.xml and replace it with the pom.xml mentioned here. Replace the exec-maven-plugin (1) and aws-java-sdk (1) versions in the pom.xml file with the latest versions from the maven repository.

Step 8: Remove the App.java file created by maven.

cd /home/ubuntu/myapp/src/main/java/org/example/basicapp
rm App.java

Create a file named S3Sample.java in the above folder with the Java program mentioned here.

Step 9: Execute the below commands to compile the S3Sample.java program and execute it.

cd ~/myapp
mvn clean compile exec:java

Step 10: Note the interaction with S3 happening towards the end of the output of the above command, as shown below. A bucket is created in S3 and a file is uploaded/downloaded. The S3 cleanup is also done in the Java program, so when seen in the S3 Management Console there won't be any changes.

Modify the Java program not to do the S3 cleanup, and the changes made by the Java program can then be seen in the S3 Management Console.