<meta charset="UTF-8">
<meta name="robots" content="noindex,nofollow" />
<link rel="shortcut icon" type="image/x-icon" href="/wiki/images/favicon.ico" />
<link rel="icon" type="image/x-icon" href="/wiki/images/favicon.ico" />

<pre>
!!! Overview
[{$pagename}] is a programming model and an associated implementation for processing and generating [big data] sets with a parallel, distributed [algorithm] on a [Computing Cluster].

[{$pagename}] is a special form of a [Directed Acyclic Graph] ([DAG]) which is applicable in a wide range of use cases. It is organized as a [Map()] function which transform a piece of data into some number of key/value pairs. Each of these elements will then be sorted by their key and reach to the same node, where a [Reduce()] function is use to merge the values (of the same key) into a single result.

The &quot;[{$pagename}] System&quot; orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.

The model is a specialization of the split-apply-combine strategy for data analysis. [{$pagename}] is inspired by the [Map()] and [Reduce()] functions commonly used in [Functional Programming], although their purpose in the [{$pagename}] framework is not the same as in their original forms.

The key contributions of the [{$pagename}] framework are not the actual map and reduce functions (which, for example, resemble the 1995 [Message Passing Interface] standard's reduce and scatter operations), but the scalability and fault-tolerance achieved for a variety of applications by optimizing the execution engine. As such, a single-threaded implementation of MapReduce will usually not be faster than a traditional (non-MapReduce) implementation; any gains are usually only seen with multi-threaded implementations.[9] The use of this model is beneficial only when the optimized distributed shuffle operation (which reduces network communication cost) and fault tolerance features of the MapReduce framework come into play. Optimizing the communication cost is essential to a good MapReduce algorithm.

[{$pagename}] libraries have been written in many programming languages, with different levels of optimization. A popular open-source implementation that has support for distributed shuffles is part of Apache Hadoop. The name MapReduce originally referred to the proprietary [Google] technology, but has since been genericized. By [2014|Year 2014], [Google] was no longer using [{$pagename}] as their primary Big Data processing model, and development on [Apache Mahout] had moved on to more capable and less disk-oriented mechanisms that incorporated full map and reduce capabilities.[12]


!! More Information
There might be more information for this subject on one of the following:
[{ReferringPagesPlugin before='*' after='\n' }]
----
* [#1] - [MapReduce|Wikipedia:MapReduce|target='_blank'] - based on information obtained 2017-04-30- 
</pre>