Wednesday, October 14, 2020

Applications around the intersection of Big Data / Machine Learning and AWS

As many readers of this blog know, I am a big fan of Big Data and the AWS Cloud, and I am especially interested in the intersection of the two. But Big Data processing requires a huge number of machines to process huge amounts of data and to do complex processing, as in the case of Machine Learning.

The Cloud has democratized the usage of Big Data. There is no need to buy any machines: we can spin up a number of EC2 instances, do the Big Data processing and, once done, terminate the EC2 instances. AWS and other vendors are doing a lot of hardware and software innovation in this space; below are a few hardware innovations from AWS. They require a lot of investment in R&D and in building them, which is usually possible at the scale at which the Cloud operates.

AWS Nitro System : Some of the virtualization responsibilities have been shifted from the host CPU to dedicated hardware and software.

AWS Graviton Processor : The Graviton processor uses an ARM based architecture, similar to the ones used in mobile phones. Now we can spin up EC2 instances with the Graviton processor.

AWS and Nvidia : They bring very high-end GPUs to the Cloud through EC2 instances for Machine Learning modelling.

AWS Inferentia : Once the Machine Learning model has been created, the next step is inference, which typically takes most of the compute cycles over the life of the model. Inferentia is a custom chip from AWS built for inference.

F1 Instances : Hardware acceleration on EC2 using FPGAs.

Coming back to the subject of this blog, AWS provides a few open data sets via S3 for free, so that we can process them in the Cloud and get some meaningful insights out of them. The data sets can be found here. For those who are familiar with either AWS or Big Data, the challenge is to figure out how the two work together. For this, AWS has published a bunch of blogs/articles here on the intersection of AWS and Big Data / Machine Learning for different domains. Below is a sample application at the intersection of Big Data and AWS, built around genome data. Note that AWS has been highlighted; look out for more of them.
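Separately from the AWS articles, here is a minimal sketch of how one might peek into an open data set on S3 without an AWS account. It is not taken from any AWS sample. It assumes the bucket allows anonymous listing over the S3 REST API (ListObjectsV2), and the helper names (list_url, parse_keys, list_open_data) are made up for illustration. The bucket name is whatever open data set you pick; 1000genomes is used in the note below only as an example of a genome data bucket.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL for a public bucket (hypothetical helper)."""
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_body):
    """Extract (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_body)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_open_data(bucket, prefix="", max_keys=10):
    """Fetch one page of object keys; assumes the bucket permits anonymous listing."""
    with urlopen(list_url(bucket, prefix, max_keys), timeout=30) as resp:
        return parse_keys(resp.read())
```

Calling list_open_data("1000genomes") should then return the first few object names and sizes from that bucket, provided anonymous access is still allowed. For real processing, the same bucket would normally be read from EC2 or EMR in the same region rather than downloaded over the internet.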


Summary

The intersection of Big Data / Machine Learning and AWS is very interesting. The Cloud, with its pricing, democratizes the usage of Big Data / Machine Learning, but each one is a beast of its own to learn, and there is so much innovation happening in this space that it's tough to keep pace. Here are a few applications around these to get started. Good Luck !!!
