!!! Overview

[{$pagename}] is an [open source|Open Source] software framework used for distributed storage and processing of [big data] sets using the [MapReduce] programming model. [{$pagename}] is generally considered to be a [Data-lake].

[{$pagename}] consists of computer clusters built from commodity hardware. All the modules in [{$pagename}] are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework.

The core of Apache Hadoop consists of two major pieces:
* a [Distributed File System] - the [Hadoop Distributed File System] ([HDFS])
* a processing part - the [MapReduce] programming model

Hadoop splits files into large [blocks] and distributes them across the nodes in a cluster. [{$pagename}] then transfers packaged code to those nodes so that they process the data in parallel. This approach takes advantage of [data] locality, where each node works on the data it holds locally. This allows the dataset to be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are distributed via high-speed networking. A minimal word-count sketch illustrating the MapReduce model with Hadoop's Java API appears at the end of this page.

The genesis of [{$pagename}] came from the [Google] File System paper that was published in October [2003|Year 2003].

!! More Information

There might be more information for this subject on one of the following:

[{ReferringPagesPlugin before='*' after='\n' }]
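
!! Example: word count with MapReduce

The following is a minimal sketch of how the [MapReduce] model looks in practice with Hadoop's Java API, using the classic word-count job: the map step emits a count of 1 for every token it reads, and the reduce step sums the counts per word. The class name WordCount and the input/output paths shown afterwards are only illustrative.

{{{
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: split each input line into tokens and emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sum all counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

Once compiled into a jar, the job could be submitted to a cluster with something like the following (the jar name and paths are hypothetical):

{{{
hadoop jar wordcount.jar WordCount /user/alice/input /user/alice/output
}}}

Because the input files already live in [HDFS] as distributed [blocks], the framework schedules map tasks on the nodes that hold those blocks, which is the [data] locality described in the overview above.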