Wednesday, August 1, 2018

Upgrading from Ubuntu 16.04 (Xenial Xerus) to 18.04 (Bionic Beaver)

I have been using Ubuntu for quite a few years, and lately I have been running Ubuntu 16.04 alongside Windows 10 as a dual boot on my Lenovo Z510: Ubuntu for pretty much everything and Windows for any software which is not compatible with Ubuntu. This has been a deadly combination which has worked pretty well for me.

Why the upgrade in Ubuntu?

In Ubuntu 16.04 pretty much everything was working well, except suspend and hibernate. The system could not reliably resume from suspend. The only option left was to shut down and restart the computer along with all the applications, which is not really nice.

Checking the different Ubuntu forums and trying out the suggested solutions didn't fix the problem. So, I finally decided to upgrade Ubuntu to the latest version. There is always a chance that the upgrade process gets messed up and data is lost. My data is backed up automatically to different clouds, so this was not an issue.

Ubuntu 18.04 was released in April, a few months back. But upgrading from 16.04 (Xenial Xerus) straight to the initial 18.04 (Bionic Beaver) release is not recommended. Waiting for the point release 18.04.1 is the safest approach, as it gives Canonical time to fix bugs and make the transition smoother.

So, as soon as 18.04.1 was announced, I took a shot and upgraded to Ubuntu 18.04.1 by following the instructions mentioned here.

How was the Ubuntu upgrade?

In the early days of Ubuntu, upgrading from one version to another often messed up the operating system, but it was really smooth this time. Here I am with the latest Ubuntu after a reboot.

Ubuntu 18.04 (Bionic Beaver) Desktop

The download and installation process took about 2 hours, with a good number of prompts in between. I wish there was a 'Yes to all' option, which would have let the installation run unattended.

Was everything smooth after the upgrade?

Usually any software upgrade has some major or minor issues which get fixed over time, and the same is the case with Ubuntu. Here is a list of some issues to start with. I will update the list, along with possible solutions if any, the more I use the latest Ubuntu.

  • Ubuntu has moved from the Unity UI to GNOME, so it takes some time to get used to the new UI. But my initial impressions of GNOME are good.

  • I had been using Phatch to batch mark the images on this blog, but it has been removed from the Ubuntu repository. A quick search turned up Converseen as an alternative, which I am yet to try.

  • Right click stopped working and has been replaced with two-finger click. There were a couple of suggested solutions, but a quick try of some of them didn't work. Again, it will take some time to get used to the two-finger click.

  • The good thing is that suspend started working and I was able to resume where I stopped. This has improved my productivity and focus. When I switched from the default open source Nouveau display driver to the Nvidia display driver, suspend broke again and I had to revert to Nouveau.

Should I upgrade?

If Canonical will be supporting the Ubuntu version you are using for the next few years and there is no pressing issue, like suspend in my case, then I would recommend sticking with the current OS. But if you want to try the latest technology like me, then go ahead with the upgrade.

    Monday, July 30, 2018

    Compatibility between the Big Data vendors

    What do the Big Data vendors have to offer?

    Now that the Big Data wars have pretty much ended, we have Cloudera, MapR and Hortonworks as the major Big Data vendors. There are also other vendors that focus on one or two Big Data products (like DataStax on Apache Cassandra), but Cloudera, MapR and Hortonworks provide a complete suite of software covering storage, processing, security, easy installation etc. These vendors solve problems like:

    • Integrating the different software from Apache. Not every Big Data project from Apache is compatible with the others. These vendors make sure that the different pieces of software from Apache play nice with each other.

    • Installation and fine tuning of Big Data software is not easy. It is no longer a matter of download and click. These vendors make the installation process easier and automate as much of it as possible.

    • Although the software from Apache is free to use, the Apache Software Foundation doesn't provide any commercial support. Companies like Cloudera, MapR and Hortonworks fill the gap, as long as the software from these vendors is being used.

    Friday, July 6, 2018

    What is DIGITAL?

    Very often we hear the word DIGITAL in the quarterly results of different IT companies, especially in India. The revenue from the DIGITAL business is compared with that from the traditional business. So, what is DIGITAL? There is no formal definition of DIGITAL, but the term has lately been used loosely by different companies.

    But, here is the definition of DIGITAL from an interview at MoneyControl (here) with Rostow Ravanan, Mindtree CEO and MD. It is a bit vague, but the best I could get so far. The vagueness comes from the fact that it doesn't say what BETTER is. Does anyone see something missing? I see IoT missing. Lately I have been working on IoT and will be writing my opinion on where IoT stands as of now.

    Q: Digital is still a vague term in the minds of many. What does it mean for you?

    A: So let me go back a little bit and tell you what we define as digital. We define digital and we put that in our factsheet, whenever we declare results every quarter.

    In our definition of digital, we take one or two ways of defining it. To a business user, we define digital from a business process perspective to say anything that allows my customer to connect to their customer better or anything that allows my customer to connect to their people better is one way of defining digital from a business process point of view.

    Or if you were to look at digital from a technology definition point of view, we say it is social, mobility, analytics, cloud, and e-commerce. From a technology point of view, that is how we define digital.

    Thursday, November 2, 2017

    Refining the existing AWS Security Groups

    I am a big fan of blogs/articles which use multiple services from AWS. Each of the services from AWS is powerful, but when we combine them in different ways we can achieve a lot more.

    When an organization deploys an application in the Cloud, over time there can be some port numbers in the Security Groups which are not required for the functionality of the application. These unnecessary ports might be a security risk to the organization. So, it's always better to open the minimum set of port numbers required.


    AWS doesn't give a direct way to identify the unused ports. The VPC flow logs have to be captured and analyzed to identify the unused port numbers and the corresponding Network Interfaces and Security Groups. So, below are two blogs from AWS on the same.

    How to Optimize and Visualize Your Security Groups

    How to Visualize and Refine Your Network’s Security by Adding Security Group IDs to Your VPC Flow Logs

    The end results are the same for the two blogs, but they use different services from AWS. The blogs are pretty straightforward to follow. After trying them out, you will be familiar with VPC flow logs, Lambda, Kinesis, Elasticsearch and IAM.

    For those who are getting started with AWS, I would definitely recommend going through the above blogs in the same order. Depending on your comfort with the technology it might take time, but the blogs are worth trying out.
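The core idea behind both blogs can be illustrated without any AWS service at all. The snippet below is only a minimal sketch, not the pipeline the AWS blogs build: it assumes flow log records in the default (version 2) space-separated format and a hand-written list of ports opened in a Security Group. The sample records, the port list and the function names are all made up for illustration.

```python
# Field order of the default (version 2) VPC flow log record format.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def used_ports(flow_log_lines):
    """Return the set of destination ports that saw ACCEPTed traffic."""
    ports = set()
    for line in flow_log_lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip headers or malformed lines
        record = dict(zip(FIELDS, parts))
        # NODATA / SKIPDATA records carry "-" instead of real values.
        if record["action"] == "ACCEPT" and record["dstport"].isdigit():
            ports.add(int(record["dstport"]))
    return ports


def unused_open_ports(open_ports, flow_log_lines):
    """Ports opened in a Security Group that never appear in the logs."""
    return sorted(set(open_ports) - used_ports(flow_log_lines))


# Made-up sample data, for illustration only.
SAMPLE_LOGS = [
    "2 123456789010 eni-0a1b2c3d 203.0.113.12 172.31.16.21 49152 443 6 10 840 1509600000 1509600060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 203.0.113.12 172.31.16.21 49153 22 6 4 240 1509600000 1509600060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 198.51.100.7 172.31.16.21 49154 3306 6 1 40 1509600000 1509600060 REJECT OK",
    "2 123456789010 eni-0a1b2c3d - - - - - - - 1509600000 1509600060 - NODATA",
]
SAMPLE_OPEN_PORTS = [22, 80, 443, 8080]

print(unused_open_ports(SAMPLE_OPEN_PORTS, SAMPLE_LOGS))
```

This deliberately ignores traffic direction, protocol and the mapping from Network Interfaces to Security Groups, which is exactly the part the two AWS blogs take care of with the other services.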

    Tuesday, October 31, 2017

    Managing the Raspberry Pi from Laptop

    Pi (short for Raspberry Pi) is a single-board computer (SBC). It has a couple of USB ports to which a keyboard, mouse and other peripherals can be connected, an HDMI port to connect a monitor, a MicroSD slot for OS/applications/data and a micro USB port for power supply. There are a bunch of Pi models; I bought a Raspberry Pi 2 Model B a few years back and plugged the different components together as shown below. Finally, I installed Raspbian on it. Note that the green casing is not part of the Pi, it had to be ordered additionally.


    The cables on the right side go to the USB ports, to which a mouse and keyboard are connected. On top of the USB cable is the WiFi dongle. The later models of Pi have inbuilt support for WiFi, but this model doesn't. The cables on the left side are HDMI and power supply. It's a cool setup for kids to get started with computers. It's easy to set up, but for those who are wary there are a few laptops built on the Pi, such as the new pi-top. The Pi has a few GPIO pins to which different sensors (light, temperature, humidity etc) can be connected to read the ambient conditions and take some actions using actuators.


    The above configuration is cool, but it is not mobile as it's much like a desktop. So, I was looking for options to connect it to the laptop and use the keyboard, mouse, monitor and power from the laptop. Here I got the instructions for the same. Finally, I was able to manage the Pi from the laptop as shown below. The Pi desktop is there on the laptop. Not sure why, but the VNC server on the Pi stopped starting automatically. So, I had to log in via ssh to the Pi and start the VNC server.
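The manual workaround can be wrapped in a small helper. This is only a sketch under a few assumptions that are not from the instructions linked above: the Pi is reachable as raspberrypi.local, the login user is pi, and the VNC server installed on the Pi is started with a `vncserver` command. Adjust all three for your own setup.

```python
import shlex


def vnc_start_command(host, user="pi", display=1):
    """Build the ssh command line that starts a VNC server on the Pi.

    The host, user and the `vncserver` command name are assumptions.
    """
    return ["ssh", f"{user}@{host}", f"vncserver :{display}"]


cmd = vnc_start_command("raspberrypi.local")
# Print the command instead of running it, so it can be checked first.
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```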

    Thursday, October 26, 2017

    Installing WordPress on AWS Cloud

    In this blog, we will install WordPress, a popular CMS (Content Management System), on EC2 and RDS. WordPress is widely used to create blogs and websites. This configuration is not recommended for production. At the end of the blog, the additional tasks needed to make the entire setup a bit more robust are mentioned. The end product is as below.


    The WordPress software will run on an Ubuntu EC2 instance and the data will be stored in a MySQL RDS instance. Starting the RDS instance takes time, so first we will start the RDS instance and then the EC2 instance with WordPress on it.

    Tuesday, October 24, 2017

    Amazon Macie and S3 Security

    AWS S3 has been in the limelight lately for the wrong reasons, more here (1, 2, 3). S3 security policies are a pain in the neck to understand; we will cover security in the context of S3 in a detailed blog later. Before the cloud, it took a few days to weeks to procure the hardware and software, set them up etc. But with the cloud, it takes a few minutes to create an S3 bucket, put some sensitive data in it and finally set some wrong permissions on it.
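To make "wrong permissions" a bit more concrete, here is a minimal sketch that flags public grants in a bucket ACL. It assumes a dictionary shaped like the response of the S3 GetBucketAcl call; the sample ACL and the function name are made up. It only looks at ACLs, so a bucket exposed through a bucket policy would not be caught by it.

```python
# Predefined S3 groups that make a grant effectively public.
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers": "everyone",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers": "any AWS account",
}


def public_grants(acl):
    """Return (audience, permission) pairs for grants open to the public."""
    findings = []
    for grant in acl.get("Grants", []):
        uri = grant.get("Grantee", {}).get("URI")
        if uri in PUBLIC_GROUPS:
            findings.append((PUBLIC_GROUPS[uri], grant.get("Permission")))
    return findings


# Made-up ACL: the owner has full control, and everyone can read.
SAMPLE_ACL = {
    "Owner": {"ID": "example-owner-id"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
    ],
}

print(public_grants(SAMPLE_ACL))
```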


    Meanwhile, AWS launched Macie to protect sensitive data from getting into the wrong hands. Here is the blog from AWS launching Macie and here is the documentation on how to get started with Macie. The blog nicely explains how to get started. Also, look at the Macie FAQ here. Initially, Macie covers only S3 data; the plan is to roll out Macie to other services like EC2.

    Wednesday, October 18, 2017

    Getting notified for any state change in the AWS resources (like EC2 going down)

    EC2 can be used for any purpose, like running a website or doing some batch processing. A website is required to be up 99.9%, 99.99% or some other percentage of the time, so backup EC2 instances are required for the sake of high availability. But let's take the case of a batch job, such as transforming thousands of records from one format to another; here high availability is not really important. If an instance fails, another can be started, or the work can be shifted automatically to some other instance, as in the case of Big Data.
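To put the percentages in perspective, the downtime budget is simple arithmetic: 99.9% availability over a year of 8,760 hours leaves about 8.76 hours of downtime. A small sketch of the calculation is below; the function name and the one-year default period are assumptions for illustration.

```python
def allowed_downtime_hours(availability_percent, period_hours=365 * 24):
    """Hours of downtime permitted over a period at a given availability."""
    return period_hours * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```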

    Just to quickly summarize, in the case of web servers we need some level of high availability and so multiple EC2 instances (backup), but in the case of batch processing there is no need for a backup. Let's take the case of a batch job running on a single EC2 instance: it would be good to get some sort of notification when the instance goes down. We will look into the same in this blog, using EC2, SNS and CloudWatch. So, it is a good way to get familiar with these services.
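One way such a notification could be wired up is a CloudWatch Events rule that matches EC2 state changes and publishes to an SNS topic. Whether the steps in this blog use exactly this rule is an assumption. The sketch below builds the event pattern for such a rule and checks it against a made-up event with a simplified matcher, which covers only the exact-value matching needed here and is not the full CloudWatch Events matching logic.

```python
import json

# Event pattern for a rule that fires when an EC2 instance stops or
# terminates. The chosen states are an assumption for illustration.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event,
    and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Made-up event shaped like an EC2 state-change notification.
SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(json.dumps(EVENT_PATTERN, indent=2))
print(matches(EVENT_PATTERN, SAMPLE_EVENT))
```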


    So, here are the steps.

    Step 1: Create a new topic in the SNS management console.


    Tuesday, October 17, 2017

    Microsoft Azure for AWS Professionals

    A few months back I blogged about 'GCP for AWS Professionals' comparing the two platforms here. Now, Microsoft has published something similar comparing Microsoft Azure with Amazon AWS here.

    It's good for Amazon AWS that its competitors are comparing their services with Amazon's. AWS has been adding new services, and features (small and big) within them, at a very rapid pace. Here you can get the new features introduced in Amazon AWS on a daily basis.

    Similar to GCP and AWS, Azure also gives free credit to get started. So, now is the time to create an account and get started with Cloud. So, here are the links for the same (AWS, Azure and GCP).

    Getting the execution times of the EMR Steps from the CLI

    In the previous blog, we executed a Hive script to convert the Airline dataset from the original csv format to the Parquet Snappy format. Then the same query was run against the csv and the Parquet Snappy data to see the performance improvement. This involved three steps.

    Step 1 : Create the ontime and the ontime_parquet_snappy table. Move the data from ontime table to the ontime_parquet_snappy table for the conversion of one format to another.

    Step 2 : Execute the query on the ontime table, which represents the csv data.

    Step 3 : Execute the query on the ontime_parquet_snappy table, which represents the Parquet Snappy data.

    The execution times for the above three steps were obtained from the AWS EMR management console, which is a Web UI. All the tasks which can be done from the AWS management console can also be done from the CLI (Command Line Interface). Let's see the steps involved in getting the execution times for the steps in EMR.
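As a rough illustration of what the CLI gives back, the sketch below computes step durations from the JSON printed by `aws emr list-steps --cluster-id <cluster-id>`. It assumes the timestamps in each step's Status.Timeline come back as epoch seconds (some CLI configurations print ISO 8601 strings instead, which this sketch does not handle). The step names and timestamps in the sample are invented and say nothing about the actual runs from the previous blog.

```python
import json


def step_durations(list_steps_json):
    """Map step name -> run time in seconds for steps that have finished."""
    durations = {}
    for step in json.loads(list_steps_json)["Steps"]:
        timeline = step["Status"]["Timeline"]
        # Pending or running steps have no start and/or end time yet.
        if "StartDateTime" in timeline and "EndDateTime" in timeline:
            durations[step["Name"]] = (
                timeline["EndDateTime"] - timeline["StartDateTime"]
            )
    return durations


# Invented sample output, shaped like the list-steps response.
SAMPLE_OUTPUT = """
{"Steps": [
  {"Id": "s-EXAMPLE1", "Name": "step-a",
   "Status": {"State": "COMPLETED",
              "Timeline": {"StartDateTime": 1508200000.0,
                           "EndDateTime": 1508200120.0}}},
  {"Id": "s-EXAMPLE2", "Name": "step-b",
   "Status": {"State": "COMPLETED",
              "Timeline": {"StartDateTime": 1508200200.0,
                           "EndDateTime": 1508200245.5}}},
  {"Id": "s-EXAMPLE3", "Name": "step-c",
   "Status": {"State": "PENDING",
              "Timeline": {"CreationDateTime": 1508200300.0}}}
]}
"""

for name, seconds in step_durations(SAMPLE_OUTPUT).items():
    print(f"{name}: {seconds:.1f} s")
```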