Google has data processing and storage requirements that few other organizations share. According to Wikipedia, Google processes about 24 petabytes of data per day; that figure may be somewhat outdated, and Google is probably processing more by now. So they need to innovate continuously to meet these requirements. Soon they outgrow one innovation and come up with a new one.
The good thing is that Google has been continuously releasing these innovations as papers once they have been refined and there is a solid internal implementation.
These Google papers have been implemented by the ASF (Apache Software Foundation) and others. It takes some time for ASF frameworks like Hadoop to become production ready, so the ASF is continuously catching up with the Google papers.
Google Paper | Apache Frameworks |
The Google File System (October, 2003) | HDFS (became Apache TLP in 2008) |
MapReduce: Simplified Data Processing on Large Clusters (December, 2004) | MapReduce (became Apache TLP in 2008) |
Bigtable: A Distributed Storage System for Structured Data (November, 2006) | HBase (became Apache TLP in 2010), Cassandra (became Apache TLP in 2010) |
Large-scale graph computing at Google (June, 2009) | Hama, Giraph (became Apache TLP in 2012) |
Dremel: Interactive Analysis of Web-Scale Datasets (2010) | Apache Drill (incubated in August, 2012), Impala from Cloudera |
Large-scale Incremental Processing Using Distributed Transactions and Notifications (2010) | ??? |
Spanner: Google's Globally-Distributed Database (September, 2012) | ??? |
Following the research papers published by Google, along with related blogs and articles, gives an idea of where Big Data is heading. Many organizations have neither the same requirements nor the same resources as Google, so we will likely see more and more cloud services offering these capabilities.