!!! Overview

[{$pagename}] is an [open source|Open Source] software framework used for distributed storage and processing of [big data] sets using the [MapReduce] programming model. [{$pagename}] is generally considered to be a [Data-lake].

[{$pagename}] consists of computer clusters built from commodity hardware. All the modules in [{$pagename}] are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework.

The core of Apache Hadoop consists of two major pieces:
* a [Distributed File System] - the [Hadoop Distributed File System] ([HDFS])
* a processing part - the [MapReduce] programming model

Hadoop splits files into large [blocks] and distributes them across the nodes in a cluster. [{$pagename}] then transfers packaged code to those nodes so that they process the data in parallel. This approach takes advantage of [data] locality, where each node works on the data it holds locally. This allows the dataset to be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are distributed via high-speed networking. A minimal word-count sketch illustrating the MapReduce model with Hadoop's Java API appears at the end of this page.

The genesis of [{$pagename}] came from the [Google] File System paper that was published in October [2003|Year 2003].

!! More Information

There might be more information for this subject on one of the following:

[{ReferringPagesPlugin before='*' after='\n' }]
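
!! Example: word count with MapReduce

The following is a minimal sketch of how the [MapReduce] model looks in practice with Hadoop's Java API, using the classic word-count job: the map step emits a count of 1 for every token it reads, and the reduce step sums the counts per word. The class name WordCount and the input/output paths shown afterwards are only illustrative.

{{{
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: split each input line into tokens and emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sum all counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

Once compiled into a jar, the job could be submitted to a cluster with something like the following (the jar name and paths are hypothetical):

{{{
hadoop jar wordcount.jar WordCount /user/alice/input /user/alice/output
}}}

Because the input files already live in [HDFS] as distributed [blocks], the framework schedules map tasks on the nodes that hold those blocks, which is the [data] locality described in the overview above.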