!!! Overview

[{$pagename}] is a method used in [Artificial Neural networks] to calculate the gradient of the [Loss function], which tells you, for a single [Training dataset] entry, how you need to change the [weight] and [bias] of each [Artificial Neuron], in relative proportions, so as to most efficiently decrease the loss. [{$pagename}] involves averaging the [Loss function] over every entry in the [Training dataset], for each [Artificial Neuron] in each layer, to determine the total [Cost function].

More generally, [{$pagename}] calculates the error contribution of each [neuron] after a batch of data (in image recognition, multiple images) is processed. It is a special case of an older and more general technique called automatic differentiation. In the [context] of [learning], [{$pagename}] is commonly used by the [gradient descent] optimization [algorithm] to adjust the weights of [neurons] by calculating the gradient of the [loss function]. This technique is also sometimes called [{$pagename}] of errors, because the error is calculated at the output and distributed back through the network layers.

The [{$pagename}] [algorithm] has been repeatedly rediscovered and is equivalent to automatic differentiation in reverse accumulation mode. [{$pagename}] requires a known, desired output for each input value, so it is considered a [Supervised Learning] method (although it is also used in some unsupervised networks such as autoencoders). [{$pagename}] is also a generalization of the delta rule to multi-layered [Feedforward Neural networks], made possible by using the chain rule to iteratively compute gradients for each layer. [{$pagename}] is closely related to the Gauss–Newton algorithm and is part of continuing research in neural [{$pagename}]. [{$pagename}] can be used with any gradient-based optimizer, such as L-BFGS or truncated Newton. [{$pagename}] is commonly used to train [Deep Learning] [Artificial Neural networks] with more than one layer of [hidden node]s.

!! Using [{$pagename}] to compute gradients for [Gradient descent] in [Python]

%%prettify 
{{{
# Gradients of the cost for a single sigmoid output layer: X holds the inputs,
# A the activations, Y the labels and m the number of training examples.
dw = 1/m * np.dot(X, (A - Y).T)
db = 1/m * np.sum(A - Y)
}}}
/%

[{$pagename}] is an [Algorithm] for computing the partial derivative ∂C/∂w of the [Cost function] C with respect to any [weight] w (or [bias] b) in the network. The expression tells us how quickly the cost changes when we change the [weights] and [biases|Bias]. And while the expression is somewhat complex, it also has a beauty to it, with each element having a natural, intuitive interpretation. So backpropagation isn't just a fast algorithm for learning: it also gives us detailed insights into how changing the weights and biases changes the overall behaviour of the [Artificial Neural network]. A fuller worked sketch for a network with one hidden layer appears at the end of this page.

!! Category
%%category [Artificial Intelligence]%%

!! More Information
There might be more information for this subject on one of the following:

[{ReferringPagesPlugin before='*' after='\n' }]
----
* [#1] - [Backpropagation|Wikipedia:Backpropagation|target='_blank'] - based on information obtained 2017-11-24
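!! Worked example: [{$pagename}] with one hidden layer

The snippet below is a minimal sketch of [{$pagename}] combined with a [Gradient descent] update for a tiny network with a single hidden layer, written in [Python] with NumPy. The network sizes, the sigmoid activations, the cross-entropy cost and all variable names (W1, b1, W2, b2 and so on) are illustrative assumptions rather than anything defined on this page.

%%prettify 
{{{
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data shapes (not from this page): X is (n_x, m), Y is (1, m).
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.standard_normal((n_x, m))
Y = (rng.random((1, m)) > 0.5).astype(float)

# Small random weights and zero biases for one hidden layer and one output unit.
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

learning_rate = 0.1

for step in range(1000):
    # Forward pass: compute the activations layer by layer.
    Z1 = W1 @ X + b1
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)                       # predictions, shape (1, m)

    # Cross-entropy cost averaged over the m training examples.
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

    # Backward pass: propagate the error from the output layer back.
    dZ2 = A2 - Y                           # dC/dZ2 for sigmoid + cross-entropy
    dW2 = (1 / m) * dZ2 @ A1.T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

    dA1 = W2.T @ dZ2                       # chain rule through W2
    dZ1 = dA1 * A1 * (1 - A1)              # chain rule through the sigmoid
    dW1 = (1 / m) * dZ1 @ X.T
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

    # Gradient descent update: step each parameter against its gradient.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print("final cost:", cost)
}}}
/%

With the hidden layer removed, the dW2 and db2 lines reduce (up to a transpose of the weight matrix) to the dw and db expressions shown earlier on this page.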