Wednesday, October 12, 2016

ASF (Apache Software Foundation) as a standards body

What is ASF all about?

For many, Apache is synonymous with the Apache HTTP Server, which serves a large share of the world's web pages. But there is much more to Apache. The ASF (Apache Software Foundation) is a non-profit organization which provides an environment and a platform in which different companies and individuals work in an open and collaborative fashion towards the common goal of developing good software. Open means that all the work (architecture, design, coding, testing, documentation, etc.) happens in the open and there are no secrets. Anyone can download the code, make changes, compile it and contribute the changes back.

It's possible for different companies and individuals like you and me to improve the code and contribute it back to the ASF. To maintain the quality of the software, there is a review process in place: the project's committers check the quality of the contributed code and then merge it into the code repository. The advantage of working in this model is that any improvement made by an individual or a company can be immediately picked up by everyone else. This is what working in a collaborative fashion means.

There are a lot of Big Data projects under the ASF like Hadoop, Hive, Pig, HBase and Cassandra, and a lot of non Big Data projects like Tomcat, Log4J, Velocity and Struts. Projects usually start out in the Apache Incubator, and some of them graduate to TLP (Top Level Project) status. The code for the different projects can be browsed read-only in the ASF's public source repositories.


How is Apache promoting standards?

OK, now we know how the Apache process works: different companies and individuals work towards the common goal of creating good software. Now, let's look into why standards are important in software and how Apache promotes them.

Those who travel internationally with at least one electronic item face the problem of different socket layouts and voltages in different countries. The plugs just don't fit into the sockets, hence the need to carry multiple adapters. If there were an international standard for socket layout and voltage, we wouldn't face this problem.

The same applies to software. Software standards allow interoperability across different software stacks: a program written against one implementation can be ported to another with little effort, as long as both follow the standard. One example is the JEE (Java Enterprise Edition) standards: an EJB written for JBoss can be ported to WebSphere with minimal or no changes.
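
To make this concrete, here is a minimal sketch of what such a portable component looks like (GreetingBean is a hypothetical example, not taken from any real project). It depends only on the javax.ejb API defined by the JEE standard, never on JBoss- or WebSphere-specific classes, and that is exactly what makes it portable across application servers:

import javax.ejb.Stateless;

// A stateless session bean coded purely against the standard javax.ejb API.
// Because no vendor-specific classes are used, the same source should deploy
// unchanged on JBoss, WebSphere, or any other JEE-compliant server.
@Stateless
public class GreetingBean {
    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}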

In the case of Big Data stacks, the different Big Data companies take the software from the Apache Software Foundation and improve on it. The improvements can be better documentation, better performance, bug fixes, or better usability in terms of installation, monitoring and alerting.

The Apache code is the common base for the different Big Data distributions. For this reason, the Apache code base and the different distributions like HDP (Hortonworks Data Platform), CDH (Cloudera's Distribution including Apache Hadoop), etc. provide more or less the same API to program against. In this way, Apache acts as a de facto standards body. For example, a MapReduce program written against the HDP distribution can be run on the other distributions with minimal or no changes.
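
As an illustration, here is the classic word-count job, a minimal sketch written purely against the org.apache.hadoop.mapreduce API that ships with every distribution. Since it uses no vendor-specific classes, the same jar should run on vanilla Apache Hadoop, HDP or CDH without source changes:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every word in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

On each distribution it would typically be submitted the same way, e.g. hadoop jar wordcount.jar WordCount <input> <output>.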

Strictly speaking, Apache is not a standards body, but it is effectively acting as one. Usually, a standards body is formed first, and it then develops the standards and a reference implementation for them. This is a painstaking process, and some of the standards may never see the light of day. The advantage of working in the Apache fashion is that de facto standards emerge quickly, as a side effect of building the software itself.
