!!! Overview
[{$pagename}] is a method used in [Artificial Neural networks] to calculate, for a single [Training dataset] entry, the gradient of the [Loss function], which tells you in what relative proportions to change the [weight] and [bias] values of each [Artificial Neuron] so as to most efficiently decrease the loss.
[{$pagename}] involves averaging these [Loss function] gradients over every entry in the [Training dataset], for each [Artificial Neuron] in each layer, to determine the gradient of the total [Cost function].
[{$pagename}] is a method used in [Artificial Neural networks] to calculate the error contribution of each [neuron] after a batch of data (in image recognition, multiple images) is processed.
[{$pagename}] is a special case of an older and more general technique called automatic differentiation. In the [context] of [learning], [{$pagename}] is commonly used by the [gradient descent] optimization [algorithm] to adjust the weights of [neurons] by calculating the gradient of the [loss function]. This technique is also sometimes called [{$pagename}] of errors, because the error is calculated at the output and distributed back through the network layers.
The [{$pagename}] [algorithm] has been repeatedly rediscovered and is equivalent to automatic differentiation in reverse accumulation mode.
[{$pagename}] requires a known, desired output for each input value—it is therefore considered to be a [Supervised Learning] method (although it is used in some unsupervised networks such as autoencoders).
[{$pagename}] is also a generalization of the delta rule to multi-layered [Feedforward Neural networks], made possible by using the chain rule to iteratively compute gradients for each layer. [{$pagename}] is closely related to the Gauss–Newton algorithm, and is part of continuing research in neural [{$pagename}].
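Written out for a single [weight] w feeding one [Artificial Neuron], the chain-rule step described above can be sketched as follows (the notation here is an assumption, following the usual textbook presentation, with weighted input z = wx + b, activation a = σ(z), and cost C):
{{{
\frac{\partial C}{\partial w}
  = \frac{\partial C}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial w}
\qquad\qquad
\frac{\partial C}{\partial b}
  = \frac{\partial C}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial b}
}}}
Applying this factorization repeatedly, layer by layer from the output back towards the input, is what distributes the error backwards through the network.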
[{$pagename}] can be used with any gradient-based optimizer, such as L-BFGS or truncated Newton.
[{$pagename}] is commonly used to train [Deep Learning] [Artificial Neural networks], that is, networks with more than one hidden layer.
!! [{$pagename}] gradients for [Gradient descent] in [Python]
%%prettify
{{{
# Averaged gradients of the cost w.r.t. the weights and bias over m entries
# (X: inputs, A: predictions, Y: labels; assumes numpy imported as np)
dw = 1/m * np.dot(X, (A - Y).T)
db = 1/m * np.sum(A - Y)
}}}
/%
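The two lines above are the gradient step for a single sigmoid [neuron] (logistic regression) over a batch of m entries. The following is a minimal, self-contained sketch of the surrounding code; the toy data, shapes (X of shape features × m, Y of shape 1 × m), learning rate, and variable names are assumptions made for illustration, not part of the original snippet.
%%prettify
{{{
import numpy as np

np.random.seed(0)
m = 4                                     # batch size (number of entries)
X = np.random.randn(3, m)                 # inputs: 3 features x m entries
Y = (np.random.rand(1, m) > 0.5) * 1.0    # binary labels: 1 x m
w = np.zeros((3, 1))                      # weights of the single neuron
b = 0.0                                   # bias

# Forward pass: sigmoid activation
A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))

# Backward pass: cost gradients averaged over the m entries
dw = 1/m * np.dot(X, (A - Y).T)
db = 1/m * np.sum(A - Y)

# One gradient descent step
learning_rate = 0.1
w = w - learning_rate * dw
b = b - learning_rate * db
}}}
/%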
[{$pagename}] is an [Algorithm] for computing the partial derivative ∂C/∂w of the [Cost function] "C" with respect to any [weight] w (or [bias] b) in the network. The expression tells us how quickly the cost changes when we change the [weights] and [biases|Bias]. While the expression is somewhat complex, it also has a beauty to it, with each element having a natural, intuitive interpretation. So backpropagation isn't just a fast algorithm for learning: it actually gives us detailed insights into how changing the weights and biases changes the overall behaviour of the [Artificial Neural network].
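As an illustration of how ∂C/∂w and ∂C/∂b are obtained for every layer, the following is a minimal sketch of one forward and one backward pass through a tiny network with a single hidden layer, sigmoid activations, and a quadratic cost; the layer sizes, random seed, and variable names are assumptions made for this example only.
%%prettify
{{{
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical tiny network: 2 inputs -> 3 hidden units -> 1 output
np.random.seed(1)
m = 5                                      # batch size
X = np.random.randn(2, m)                  # inputs
Y = np.random.rand(1, m)                   # targets in (0, 1)
W1, b1 = np.random.randn(3, 2), np.zeros((3, 1))
W2, b2 = np.random.randn(1, 3), np.zeros((1, 1))

# Forward pass
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass: apply the chain rule layer by layer, starting from
# the error at the output (quadratic cost assumed); the 1/m factor
# averages each parameter gradient over the batch
dZ2 = (A2 - Y) * A2 * (1 - A2)             # per-entry error at the output layer
dW2 = 1/m * dZ2 @ A1.T                     # dC/dW2
db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)         # error propagated back to the hidden layer
dW1 = 1/m * dZ1 @ X.T                      # dC/dW1
db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)
}}}
/%
The key step is dZ1 = (W2.T @ dZ2) * A1 * (1 - A1): the output error is pushed backwards through W2 and scaled by the local sigmoid derivative, which is the chain-rule propagation of error from one layer to the previous one described above.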
!! Category
%%category [Artificial Intelligence]%%
!! More Information
There might be more information for this subject on one of the following:
[{ReferringPagesPlugin before='*' after='\n' }]
----
* [#1] - [Backpropagation|Wikipedia:Backpropagation|target='_blank'] - based on information obtained 2017-11-24