!!! Overview
[{$pagename}] is a method used in [Artificial Neural networks] to calculate the gradient of the [Loss function] for a single [Training dataset] entry, which tells you in what relative proportions each [weight] and [bias] of each [Artificial Neuron] should be changed so as to most efficiently decrease the loss.


[{$pagename}] involves averaging these [Loss function] gradients over every entry within the [Training dataset], for each [weight] and [bias] of each [Artificial Neuron] in each layer, to determine the gradient of the total [Cost function].
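
A minimal sketch of these two steps for a single sigmoid [Artificial Neuron] with a squared-error loss (the neuron, the loss, and the helper names below are illustrative assumptions, not something defined on this page):
%%prettify 
{{{
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hypothetical sigmoid neuron: prediction a = sigmoid(w*x + b),
# per-entry squared-error loss L = 0.5 * (a - y)**2.
def entry_gradient(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    # Chain rule: dL/dw = (a - y) * sigmoid'(z) * x, dL/db = (a - y) * sigmoid'(z)
    dz = (a - y) * a * (1.0 - a)
    return dz * x, dz

# The gradient of the total cost is the average of the per-entry gradients.
def cost_gradient(w, b, xs, ys):
    grads = [entry_gradient(w, b, x, y) for x, y in zip(xs, ys)]
    dw = np.mean([g[0] for g in grads])
    db = np.mean([g[1] for g in grads])
    return dw, db
}}} 
/%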


[{$pagename}] is a method used in [Artificial Neural networks] to calculate the error contribution of each [neuron] after a batch of data (in image recognition, multiple images) is processed. 

[{$pagename}] is a special case of an older and more general technique called automatic differentiation. In the context of [learning], [{$pagename}] is commonly used by the [gradient descent] optimization [algorithm] to adjust the weights of [neurons] by calculating the gradient of the [loss function]. This technique is also sometimes called [{$pagename}] of errors, because the error is calculated at the output and distributed back through the network layers.
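
A hedged sketch of that interaction: backpropagation supplies the gradients, and gradient descent applies them. The neuron, data, and learning rate below are illustrative assumptions:
%%prettify 
{{{
import numpy as np

# One sigmoid neuron with a cross-entropy loss on a toy dataset.
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, learning_rate = 0.0, 0.0, 0.5

for step in range(1000):
    A = 1.0 / (1.0 + np.exp(-(w * X + b)))   # forward pass
    dw = np.mean((A - Y) * X)                # gradients from backpropagation
    db = np.mean(A - Y)
    w -= learning_rate * dw                  # gradient-descent update
    b -= learning_rate * db
}}} 
/%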

The [{$pagename}] [algorithm] has been repeatedly rediscovered and is equivalent to automatic differentiation in reverse accumulation mode. 
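
Reverse accumulation can be illustrated without a network at all: run a forward pass that stores every intermediate value, then multiply local derivatives together in reverse order. The function below is an arbitrary example chosen for the sketch:
%%prettify 
{{{
import numpy as np

# Reverse-accumulation sketch for f(x) = exp(sin(x**2)).
# Forward pass: evaluate and store the intermediate values.
x = 1.3
a = x ** 2
b = np.sin(a)
f = np.exp(b)

# Reverse pass: start from df/df = 1 and apply the chain rule backwards.
df_db = 1.0 * np.exp(b)    # derivative of exp(b) with respect to b
df_da = df_db * np.cos(a)  # derivative of sin(a) with respect to a
df_dx = df_da * 2 * x      # derivative of x**2 with respect to x

print(df_dx)  # matches the analytic derivative exp(sin(x**2)) * cos(x**2) * 2*x
}}} 
/%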

[{$pagename}] requires a known, desired output for each input value—it is therefore considered to be a [Supervised Learning] method (although it is used in some unsupervised networks such as autoencoders).

[{$pagename}] is also a generalization of the delta rule to multi-layered [Feedforward Neural networks], made possible by using the chain rule to iteratively compute gradients for each layer. [{$pagename}] is closely related to the Gauss–Newton algorithm, and is part of continuing research in neural [{$pagename}]. 
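
The layer-by-layer use of the chain rule can be sketched for a small two-layer feedforward network; all shapes, names, and data below are illustrative assumptions:
%%prettify 
{{{
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))              # 2 features, 5 examples
Y = rng.integers(0, 2, size=(1, 5))      # binary targets
m = X.shape[1]

W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))   # hidden layer, 3 units
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))   # output layer

# Forward pass
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass: the error moves from the output layer back to the hidden layer.
dZ2 = A2 - Y                            # output error (sigmoid + cross-entropy)
dW2 = (dZ2 @ A1.T) / m
db2 = dZ2.sum(axis=1, keepdims=True) / m
dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)      # chain rule through the hidden layer
dW1 = (dZ1 @ X.T) / m
db1 = dZ1.sum(axis=1, keepdims=True) / m
}}} 
/%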

[{$pagename}] can be used with any gradient-based optimizer, such as L-BFGS or truncated Newton.
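
As a sketch of that flexibility, the forward and backward pass can be wrapped in a function that returns the cost and its gradient and handed to an off-the-shelf optimizer such as SciPy's L-BFGS-B; the data, names, and single-neuron setup below are illustrative assumptions:
%%prettify 
{{{
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 1.0, 2.0, 3.0]])   # 1 feature, 4 examples
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

def cost_and_grad(theta):
    # Single sigmoid neuron: forward pass, then the backpropagated gradient.
    w, b = theta[:-1].reshape(-1, 1), theta[-1]
    A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
    A = np.clip(A, 1e-12, 1 - 1e-12)   # numerical safety for the logs
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = np.dot(X, (A - Y).T).ravel() / m
    db = np.sum(A - Y) / m
    return cost, np.append(dw, db)

result = minimize(cost_and_grad, x0=np.zeros(2), jac=True, method='L-BFGS-B')
print(result.x)   # fitted weight and bias
}}} 
/%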

[{$pagename}] is commonly used to train [Deep Learning] [Artificial Neural networks] with more than one [hidden layer].

!! [{$pagename}] gradients for [Gradient descent] in [Python]
%%prettify 
{{{
# Gradients of the cost with respect to the weights and the bias,
# averaged over the m training examples in the batch:
dw = 1/m * np.dot(X, (A - Y).T)
db = 1/m * np.sum(A - Y)
}}} 
/%
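
A hedged, self-contained version of the snippet above, with the missing forward pass and illustrative data filled in (a single sigmoid neuron; the data and learning rate are assumptions for the sketch):
%%prettify 
{{{
import numpy as np

# X has one row per feature and one column per training example.
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

w = np.zeros((1, 1))
b = 0.0

# Forward pass: activations for the whole batch.
A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))

# The two gradient lines from the snippet above.
dw = 1/m * np.dot(X, (A - Y).T)
db = 1/m * np.sum(A - Y)

# One gradient-descent update (the learning rate of 0.5 is an illustrative choice).
w = w - 0.5 * dw
b = b - 0.5 * db
}}} 
/%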

[{$pagename}] is an [Algorithm] for computing the partial derivative ∂C/∂w of the [Cost function] C with respect to any [weight] w (or [bias] b) in the network. The expression tells us how quickly the cost changes when we change the [weights] and [biases|Bias]. And while the expression is somewhat complex, it also has a beauty to it, with each element having a natural, intuitive interpretation. So backpropagation isn't just a fast algorithm for learning: it also gives us detailed insight into how changing the weights and biases changes the overall behaviour of the [Artificial Neural network].
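
The interpretation of ∂C/∂w as "how quickly the cost changes" can be checked directly: nudge a weight and compare the change in cost with the backpropagated derivative. The single-neuron setup and data below are illustrative assumptions:
%%prettify 
{{{
import numpy as np

X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

def cost(w, b):
    # Cross-entropy cost of a single sigmoid neuron over the batch.
    A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

w, b = np.array([[0.3]]), -0.1

# Backpropagated partial derivative dC/dw.
A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
dw = 1/m * np.dot(X, (A - Y).T)

# Numerical estimate of the same derivative by nudging w.
eps = 1e-6
dw_numeric = (cost(w + eps, b) - cost(w - eps, b)) / (2 * eps)

print(dw.item(), dw_numeric)   # the two values agree closely
}}} 
/%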


!! Category
%%category [Artificial Intelligence]%%

!! More Information
There might be more information for this subject on one of the following:
[{ReferringPagesPlugin before='*' after='\n' }]
----
* [#1] - [Backpropagation|Wikipedia:Backpropagation|target='_blank'] - based on information obtained 2017-11-24