Activation Function

Overview#

The Activation Function of an Artificial Neuron defines the output of that neuron given an input or set of inputs.

An Activation Function can be viewed as a decision-making function that determines the presence of a particular feature: an output of zero means the Artificial Neuron reports the feature as absent, and an output of one means it reports the feature as present.

A standard computer chip circuit can be seen as a digital network of Activation Functions that can be "ON" (1) or "OFF" (0), depending on input.

This is similar to the behavior of the linear perceptron in neural networks.

Nonlinear Activation Functions#

Nonlinear Activation Functions allow such networks to compute nontrivial problems using only a small number of nodes. In Artificial Neural networks this function may be referred to as the transfer function.

In biologically inspired neural networks, the activation function is usually an abstraction representing the rate of action potential firing in the cell. In its simplest form, this function is binary—that is, either the neuron is firing or not. The function looks like

\phi(v_i) = U(v_i)
where U is the Heaviside step function. In this case many neurons must be used in computation beyond linear separation of categories.
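
A minimal sketch of this binary (Heaviside step) activation in Python, assuming NumPy; the convention that the output at exactly zero is 0 is a choice, not fixed by the definition:

import numpy as np

def heaviside_activation(v):
    # Heaviside step: fire (1) if the weighted input v is positive, otherwise stay silent (0)
    return np.where(v > 0, 1.0, 0.0)

print(heaviside_activation(np.array([-2.0, 0.5, 3.0])))   # [0. 1. 1.]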

A line of positive slope may be used to reflect the increase in firing rate that occurs as input current increases. Such a function would be of the form \phi(v_i) = \mu v_i, where \mu is the slope. This activation function is linear, and therefore has the same problems as the binary function. In addition, networks constructed using this model have unstable convergence because neuron inputs along favored paths tend to increase without bound, as this function is not normalizable.
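
For comparison, a linear activation is just multiplication by the slope; a minimal sketch (the slope value here is arbitrary):

def linear_activation(v, mu=0.5):
    # phi(v) = mu * v -- unbounded, which is why it is not normalizable
    return mu * v

print(linear_activation(4.0))   # 2.0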

All problems mentioned above can be handled by using a normalizable sigmoid activation function. One realistic model stays at zero until input current is received, at which point the firing frequency increases quickly at first, but gradually approaches an asymptote at 100% firing rate. Mathematically, this looks like

\phi(v_i) = U(v_i)\tanh(v_i)
where the hyperbolic tangent function can be replaced by any sigmoid function. This behavior is realistically reflected in the neuron, as neurons cannot physically fire faster than a certain rate. This model runs into problems, however, in computational networks as it is not differentiable, a requirement to calculate backpropagation.
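
A sketch of this rectified-tanh model, again assuming NumPy; it stays at zero for non-positive input and saturates toward the maximum firing rate for large input:

import numpy as np

def rectified_tanh(v):
    # phi(v) = U(v) * tanh(v): silent until input current arrives, then a saturating firing rate
    return np.where(v > 0, np.tanh(v), 0.0)

print(rectified_tanh(np.array([-1.0, 0.1, 5.0])))   # approx [0. 0.0997 0.9999]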

The final model, then, that is used in multilayer Perceptrons is a sigmoidal activation function in the form of a hyperbolic tangent. Two forms of this function are commonly used:

\phi(v_i) = \tanh(v_i)
whose range is normalized from -1 to 1, and
\phi(v_i) = (1 + \exp(-v_i))^{-1}
which is vertically translated to normalize from 0 to 1. The latter model is often considered more biologically realistic, but it runs into theoretical and experimental difficulties with certain types of computational problems.
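
The two forms can be compared numerically; a minimal sketch, assuming NumPy:

import numpy as np

v = np.array([-4.0, 0.0, 4.0])
tanh_out = np.tanh(v)                       # range (-1, 1)
logistic_out = 1.0 / (1.0 + np.exp(-v))     # range (0, 1)

# the logistic is a scaled and shifted tanh: sigma(v) == (1 + tanh(v / 2)) / 2
print(tanh_out)       # approx [-0.9993  0.      0.9993]
print(logistic_out)   # approx [ 0.018   0.5     0.982 ]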

Typically the Activation Function is applied to the result of forward propagation (the linear combination of inputs, weights, and bias).

Artificial Neural networks#

"G" is often used to represent the Activation Function.

Activation Functions are applied to the Hidden layers and to the Output layer.
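
A minimal sketch of where those activations sit in a small two-layer network, assuming NumPy; the layer sizes and the choice of ReLU for the Hidden layer and sigmoid for the Output layer are illustrative, not prescribed here:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X  = rng.normal(size=(3, 5))                           # 5 examples with 3 features each
W1 = rng.normal(size=(4, 3)); b1 = np.zeros((4, 1))    # Hidden layer parameters
W2 = rng.normal(size=(1, 4)); b2 = np.zeros((1, 1))    # Output layer parameters

A1 = relu(np.dot(W1, X) + b1)        # activation applied to the Hidden layer
A2 = sigmoid(np.dot(W2, A1) + b2)    # activation applied to the Output layer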

Sigmoid Activation Function in Python#

Here we show a common use of the Sigmoid Activation Function:
A = sigmoid(np.dot(w.T,X)+b)
General advice is to use the Sigmoid only in a binary Output layer.
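
A self-contained version of that snippet, with an explicit sigmoid definition; the shapes and values of w, X, and b are assumptions made for illustration:

import numpy as np

def sigmoid(z):
    # logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([[0.3], [-0.1]])       # weights, shape (n_features, 1)
b = 0.5                             # bias
X = np.array([[1.0, 2.0, 0.5],      # inputs, shape (n_features, n_examples)
              [0.0, 1.5, 2.0]])

A = sigmoid(np.dot(w.T, X) + b)     # one activation in (0, 1) per example
print(A)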

Hyperbolic tangent#

\tanh x = -i\tan(ix)
or
a = tanh(z) = (e^z - e^-z) / (e^z + e^-z)
Values will always be between -1 and +1.
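
A quick numerical check of that formula against NumPy's built-in tanh (the test values are arbitrary):

import numpy as np

def tanh_from_exp(z):
    # a = (e^z - e^-z) / (e^z + e^-z)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(tanh_from_exp(z))   # approx [-0.964  0.     0.964]
print(np.tanh(z))         # matches the built-in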

Rectified Linear Unit (ReLU)#

Instead of sigmoid function, most recent Deep Learning networks use Rectified Linear Units (ReLUs) for the Hidden layers. A Rectified Linear Unit has output
  • 0 if the input is less than 0
  • raw output otherwise.
That is, if the input is greater than 0, the output is equal to the input. A ReLU's behavior is closer to that of a real neuron in the body.

Rectified Linear Units are the simplest non-linear Activation Function you can use. When the input is positive, the derivative is just 1, so there is none of the squashing effect on backpropagated errors that you get with the Sigmoid function.
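
A minimal sketch of ReLU and its derivative, assuming NumPy; treating the derivative at exactly 0 as 0 is a common convention, not the only one:

import numpy as np

def relu(z):
    # 0 for negative input, the raw input otherwise
    return np.maximum(0, z)

def relu_derivative(z):
    # 1 where the input is positive, 0 elsewhere -- no squashing of backpropagated errors
    return (z > 0).astype(float)

z = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(z))              # [0. 0. 0. 2.]
print(relu_derivative(z))   # [0. 0. 0. 1.]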

Research has shown that Rectified Linear Units result in much faster training for large networks. Most frameworks, like TensorFlow and TFLearn, make it simple to use Rectified Linear Units on the Hidden layers, so typically you won't need to implement them yourself.

Category#

Artificial Intelligence
