History of Apache Hadoop

Apache Hadoop is open-source software for storing and processing large datasets, ranging in size from gigabytes to petabytes. It is developed under the Apache Software Foundation.

Let us look at the history of Hadoop and its evolution over the last two decades.

Apache Hadoop was created by Doug Cutting, who also created Apache Lucene.

The Origin of the Name “Hadoop”

Doug Cutting, the developer and creator of the Hadoop project, explained the origin of the name Hadoop and its logo.

He said that the name was one his kid had given to a stuffed yellow elephant. It was easy to spell, meaningless, and not used anywhere else, and those, he said, were his naming criteria.

Year-by-Year Evolution of Hadoop

2002: The Apache Nutch project was started.

2003: Google published a paper describing the architecture of its distributed file system, GFS, which was already in production use at Google.

2004: Google published a paper that introduced MapReduce to the world.

2005: Nutch developers implemented and tested MapReduce in Nutch, and in the same year all Nutch algorithms were moved to run on MapReduce and NDFS (the Nutch Distributed File System).

2006: Doug Cutting joined Yahoo!, which provided a dedicated team to work on the Hadoop project so that it could run at web scale.

2007: Mike Cafarella made the first code drop for HBase.

2008: In January, Hadoop became a top-level project of the Apache Software Foundation, with an active community. By this time, companies other than Yahoo!, such as Facebook, the New York Times, and Last.fm, had started using it.

2008: In February, Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.

2008: Hadoop set a record by sorting one terabyte of data in just 209 seconds on a cluster of 910 nodes.

2014: A team from Databricks was a joint winner of the Gray Sort benchmark.
