
Activation Function




Activation Function

An activation function determines whether a neuron should be activated, and how strongly. It applies a simple mathematical operation to the neuron's input to decide how much that input should contribute to the network's prediction.

The activation function helps the neural network use important information while suppressing irrelevant data points.

The activation function's job is to produce an output from the input values passed into a node (or a layer).

The main purpose of an activation function is to add non-linearity to the neural network.

Why should we use the Activation Function?

During forward propagation, activation functions add an extra step at each layer.

Suppose we have a neural network that doesn't have any activation functions. In that situation, each neuron just uses its weights and biases to apply a linear transformation to its inputs. Because the composition of two linear functions is itself a linear function, it doesn't matter how many hidden layers we add to the neural network; the whole network behaves like a single linear layer.

Although the neural network becomes simpler, it can no longer learn any task that requires a non-linear mapping, and our model would be no more expressive than a linear regression model.
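To make the collapse concrete, here is a minimal sketch in plain Python. The weights, biases, and input are hypothetical values chosen only for illustration, and the helper names are not from any library. It compares two stacked layers with no activation against a single layer built from the combined weights.

```python
def linear_layer(weights, bias, inputs):
    """Compute W @ x + b for a weight matrix given as a list of rows."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical parameters, chosen only for illustration.
W1 = [[1.0, 2.0], [0.5, -1.0]]
b1 = [0.5, -0.5]
W2 = [[2.0, 0.0], [1.0, 3.0]]
b2 = [1.0, 0.0]
x = [3.0, -2.0]

# Two stacked layers with no activation function in between.
stacked = linear_layer(W2, b2, linear_layer(W1, b1, x))

# One equivalent layer: W = W2 @ W1 and b = W2 @ b1 + b2.
W = matmul(W2, W1)
b = [wb + c for wb, c in zip(linear_layer(W2, [0.0, 0.0], b1), b2)]
collapsed = linear_layer(W, b, x)

print("stacked:  ", stacked)
print("collapsed:", collapsed)
```

Both results agree, so the second layer adds no expressive power unless a non-linear function sits between the two.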

Broadly, there are three types of activation functions.

  1. Binary Step Activation Function
  2. Linear Activation Function
  3. Non-linear Activation Functions

Linear Activation Function

With a linear activation function, the activation is proportional to the input. Its simplest form is the identity function, also described as "no activation."

The identity function does nothing with the input's weighted sum; it just returns the value it was given.

More generally, \(f(x) = mx + c\) is a linear activation function; the identity function is the special case \(m = 1\), \(c = 0\).
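As a small sketch of the linear form, the function below takes the slope and intercept as parameters. The name `linear_activation`, its default arguments, and the sample value are illustrative assumptions, not part of any library.

```python
def linear_activation(x, m=1.0, c=0.0):
    """Linear activation f(x) = m * x + c; the defaults give the identity."""
    return m * x + c


weighted_sum = 2.5  # hypothetical pre-activation value

# With the defaults the value passes through unchanged.
print(linear_activation(weighted_sum))

# A different slope and intercept still give a straight-line mapping.
print(linear_activation(weighted_sum, m=2.0, c=1.0))
```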

Non-linear Activation Functions

To overcome the limitation of the Linear Activation Function we use Non-linear Activation Functions.

  • Sigmoid / Logistic Activation Function:

\[f(x) = \frac{1}{1+e^{-x}}\]

This function takes any real value as input and outputs a value between 0 and 1.

The greater the input (more positive), the closer the output value is to 1.0, and the smaller the input (more negative), the closer the output is to 0.0. The function is differentiable and has a smooth gradient, so the output changes gradually rather than in jumps, which gives the curve its S-shape. This makes it better suited to gradient-based training than a step function.
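The sigmoid and its gradient can be sketched with the standard library alone. This is one possible implementation, not a reference one: the split on the sign of `x` is an added safeguard against overflow in `math.exp`, and the function names are illustrative.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```

The printed gradient is largest at zero and shrinks toward zero as the input moves far in either direction, which is where the output flattens near 0 or 1.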

  • Tanh Function (Hyperbolic Tangent):

\[f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\]

This is similar to the sigmoid activation function, but it takes any real value as input and outputs values in the range of -1 to 1.

Sigmoid and tanh both suffer from vanishing gradients, but tanh is zero-centered: its outputs can be positive or negative, so the resulting weight updates are not all pushed in the same direction.
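The comparison between the two functions can be checked numerically. The sketch below uses only the standard library; `tanh_from_formula` mirrors the formula given earlier and is fine for moderate inputs, while `math.tanh` is the safer choice for large magnitudes. All function names and the sample inputs are illustrative assumptions.

```python
import math


def tanh_from_formula(x):
    """tanh written directly as (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def sigmoid(x):
    """Logistic function; adequate for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(
        f"x={x:5.1f}  tanh={tanh_from_formula(x):8.5f}  tanh'={tanh_grad(x):.5f}"
        f"  sigmoid={sigmoid(x):.5f}  sigmoid'={sigmoid_grad(x):.5f}"
    )
```

In the output, tanh takes both signs while sigmoid stays positive, and both gradients shrink toward zero at the extremes, which is the vanishing-gradient behaviour described above.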

There are many more activation functions, and we will cover them in the next few posts.
