!!! Overview
[{$pagename}] is an [open Source] software framework used for distributed storage and processing of [big data] sets using the [MapReduce] programming model.
[{$pagename}] is generally considered to be a [Data-lake].
[{$pagename}] consists of computer clusters built from commodity hardware. All the modules in [{$pagename}] are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework.
The core of Apache Hadoop consists of two major pieces:
* a [Distributed File System] - [Hadoop Distributed File System] ([HDFS])
* a processing part - the [MapReduce] programming model (see the word-count sketch below).
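A minimal sketch of the MapReduce programming model is the canonical word-count job below, following the standard Hadoop tutorial example; the input and output paths are assumed to arrive as command-line arguments.
{{{
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in this node's input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not yet exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}
The framework handles the rest: it ships the packaged code to the nodes, runs the map tasks against local blocks, shuffles the intermediate pairs to the reducers, and re-runs any task whose node fails.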
Hadoop splits files into large [blocks] and distributes them across nodes in a cluster.
[{$pagename}] then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of [data] locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
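As a rough illustration of the locality information the scheduler relies on, the sketch below uses the HDFS client API to list which datanodes hold each block of a file; the path argument is hypothetical, and a reachable cluster configuration on the classpath is assumed.
{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
    try (FileSystem fs = FileSystem.get(conf)) {
      Path file = new Path(args[0]);           // hypothetical HDFS path, e.g. /data/input.txt
      FileStatus status = fs.getFileStatus(file);

      // Each BlockLocation names the datanodes holding one block of the file;
      // the job scheduler uses the same information to place map tasks on or
      // near those nodes instead of moving the data to the computation.
      BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
      for (BlockLocation block : blocks) {
        System.out.printf("offset=%d length=%d hosts=%s%n",
            block.getOffset(), block.getLength(),
            String.join(",", block.getHosts()));
      }
    }
  }
}
}}}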
The genesis of [{$pagename}] was the [Google] File System paper published in October [2003|Year 2003].
!! More Information
There might be more information on this subject on one of the following pages:
[{ReferringPagesPlugin before='*' after='\n' }]