What’s the Softmax Function & Why Do We Need It?


In this post, you will learn about the softmax function, with a Python code example, and why we need it. As a data scientist or machine learning enthusiast, you should understand the softmax function, because it helps you better understand algorithms such as neural networks and multinomial logistic regression. The softmax function is used in various multiclass classification algorithms, such as multinomial logistic regression (hence also called softmax regression) and neural networks.

What’s the Softmax Function?

Simply speaking, the softmax function converts raw values (such as the scores a model outputs, often called logits) into probabilities. Here is what the softmax function looks like:

Fig 1. Softmax function
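For reference, the standard softmax definition is written out below. The notation is ours and may differ from Fig 1: z is a vector of K raw scores, and σ(z)_i is the probability assigned to the i-th entry.

```latex
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```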

Let's understand this with an example. Say a model (such as one trained using multi-class LDA or multinomial logistic regression) outputs three values, 5.0, 2.5 and 0.5, for a particular input. To convert these numbers into probabilities, they are fed into the softmax function shown in Fig 1.

Fig 2. Softmax Function Example
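The worked arithmetic below shows where the Fig 2 outputs come from, for z = (5.0, 2.5, 0.5). Values are rounded to two or three decimal places, and rounding the final result to one decimal place gives 0.9, 0.1 and 0.0.

```latex
\begin{aligned}
e^{5.0} &\approx 148.41, \quad e^{2.5} \approx 12.18, \quad e^{0.5} \approx 1.65 \\
\textstyle\sum_{j} e^{z_j} &\approx 148.41 + 12.18 + 1.65 = 162.24 \\
\sigma(\mathbf{z}) &\approx \left(\tfrac{148.41}{162.24},\ \tfrac{12.18}{162.24},\ \tfrac{1.65}{162.24}\right) \approx (0.915,\ 0.075,\ 0.010)
\end{aligned}
```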

Notice that each softmax output is less than 1 and that the outputs sum to 1. Owing to this property, the softmax function is used as an activation function in neural networks and in algorithms such as multinomial logistic regression. For binary logistic regression, the activation function used is the sigmoid function.
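The link between softmax and sigmoid can be checked numerically. With two classes and raw scores (z1, z2), softmax assigns the first class the probability sigmoid(z1 - z2). The sketch below assumes NumPy, and the helper names `softmax` and `sigmoid` are our own.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D array of raw scores."""
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

def sigmoid(x):
    """Logistic sigmoid, as used in binary logistic regression."""
    return 1.0 / (1.0 + np.exp(-x))

# Two raw scores, one per class
z = np.array([2.0, 0.5])

p_softmax = softmax(z)[0]          # probability of class 0 via softmax
p_sigmoid = sigmoid(z[0] - z[1])   # same probability via sigmoid of the score difference

print(p_softmax, p_sigmoid)
print(np.isclose(p_softmax, p_sigmoid))  # True
```

This is one way to see binary logistic regression as the two-class special case of softmax regression.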

From the above, each output of the softmax function lies in the range [0, 1], and the outputs are scaled so that they sum to 1. Thus, the output of the softmax function can be interpreted as a probability distribution.
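Both properties can be spot-checked on arbitrary inputs, including negative scores. The sketch below assumes NumPy. The random scores for four hypothetical classes are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

for _ in range(5):
    # Random raw scores for 4 hypothetical classes (may be negative)
    scores = rng.normal(loc=0.0, scale=3.0, size=4)
    probs = np.exp(scores) / np.sum(np.exp(scores))

    # Property 1: every output lies in [0, 1]
    assert np.all((probs >= 0.0) & (probs <= 1.0))
    # Property 2: the outputs sum to 1 (up to floating-point rounding)
    assert np.isclose(np.sum(probs), 1.0)
    print(np.round(scores, 2), "->", np.round(probs, 3))
```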

Here is the Python code used to derive the values shown in Fig 2.

import numpy as np
# Input to softmax function (raw scores)
input_to_softmax = [5.0, 2.5, 0.5]
# Denominator of softmax function:
# summation of the exponentials of all inputs
exp_sum = np.exp(input_to_softmax[0]) \
                + np.exp(input_to_softmax[1]) \
                + np.exp(input_to_softmax[2])
# Softmax function outputs, rounded to one decimal place
# (float() turns NumPy scalars into plain Python floats for clean printing)
softmax_outputs = [round(float(np.exp(input_to_softmax[0]) / exp_sum), 1),
                   round(float(np.exp(input_to_softmax[1]) / exp_sum), 1),
                   round(float(np.exp(input_to_softmax[2]) / exp_sum), 1)]
# Print the softmax function output
print(softmax_outputs)  # [0.9, 0.1, 0.0]

Written more concisely, the softmax output for the input [5.0, 2.5, 0.5] can also be obtained using the following Python code. The output is [0.9, 0.1, 0.0], as shown in Fig 2.

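import numpy as np
# Input to softmax function
input_to_softmax = np.array([5.0, 2.5, 0.5])
# Softmax outputs: exponentiate element-wise, then divide by the sum of the exponentials
softmax_outputs = np.exp(input_to_softmax) / np.sum(np.exp(input_to_softmax))
# Round each output to one decimal place and print
print([round(float(output), 1) for output in softmax_outputs])  # [0.9, 0.1, 0.0]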
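The direct formula has one practical caveat. In float64, np.exp overflows to inf for inputs above roughly 709, and the division then produces nan. A common remedy, sketched below, subtracts the maximum score before exponentiating. This does not change the result, because softmax is unchanged when the same constant is added to every input. The function name `stable_softmax` is our own.

```python
import numpy as np

def stable_softmax(scores):
    """Softmax that avoids overflow by shifting the scores so the maximum is 0."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)   # largest exponent becomes e^0 = 1
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted)

# Same example as above
print([round(float(p), 1) for p in stable_softmax([5.0, 2.5, 0.5])])  # [0.9, 0.1, 0.0]

# Every score shifted by +995: the direct formula would compute np.exp(1000.0) = inf,
# but the shifted version returns the same probabilities as before
print([round(float(p), 1) for p in stable_softmax([1000.0, 997.5, 995.5])])  # [0.9, 0.1, 0.0]
```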

Why is the Softmax Function needed?

The softmax function is used in classification algorithms that need to output a probability distribution over the classes. Some of these algorithms are the following:

  • Neural networks
  • Multinomial logistic regression (Softmax regression)
  • Naive Bayes classifier
  • Multi-class linear discriminant analysis
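For generative classifiers such as Naive Bayes and multi-class LDA, softmax appears when per-class scores are turned into posterior probabilities. By Bayes' rule, the posterior P(class | x) equals the softmax of the log joint scores log P(class) + log P(x | class). The sketch below checks this with made-up priors and likelihoods for three hypothetical classes, assuming NumPy.

```python
import numpy as np

# Hypothetical values for one input x and three classes
priors = np.array([0.5, 0.3, 0.2])           # P(class)
likelihoods = np.array([0.02, 0.10, 0.05])   # P(x | class)

# Bayes' rule applied directly: normalise prior * likelihood
joint = priors * likelihoods
posterior_bayes = joint / np.sum(joint)

# The same posterior as a softmax of the log joint scores
log_scores = np.log(priors) + np.log(likelihoods)
posterior_softmax = np.exp(log_scores) / np.sum(np.exp(log_scores))

print(np.round(posterior_bayes, 3))
print(np.allclose(posterior_bayes, posterior_softmax))  # True
```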

In artificial neural networks used for multiclass classification, the softmax function is typically used in the final (output) layer.
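The sketch below illustrates softmax in the last layer by computing class probabilities for a small batch. A linear output layer produces one raw score per class for each example, and softmax is then applied row by row. The weights, bias and sizes are random or hypothetical values chosen only for illustration. In a trained network they would be learned, and multinomial logistic regression is essentially this layer on its own.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_examples, n_features, n_classes = 4, 5, 3   # hypothetical sizes
X = rng.normal(size=(n_examples, n_features))  # inputs to the final layer
W = rng.normal(size=(n_features, n_classes))   # final-layer weights (random here)
b = np.zeros(n_classes)                        # final-layer bias

logits = X @ W + b                             # shape (n_examples, n_classes)

# Row-wise softmax; subtracting each row's maximum keeps np.exp from overflowing
shifted = logits - logits.max(axis=1, keepdims=True)
exp_logits = np.exp(shifted)
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

predicted_class = probs.argmax(axis=1)         # most probable class per example
print(np.round(probs, 3))
print(predicted_class)
```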

The softmax function is also used in reinforcement learning to convert the estimated values of different actions into probabilities of taking each action.
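Here is a minimal sketch of softmax (Boltzmann) action selection: estimated action values are turned into probabilities, and an action is sampled from them. The `temperature` parameter, the helper name `softmax_policy` and the action values are assumptions for illustration. A lower temperature makes the choice greedier, and a higher one makes it closer to uniform.

```python
import numpy as np

def softmax_policy(action_values, temperature=1.0):
    """Convert action-value estimates into action probabilities."""
    prefs = np.asarray(action_values, dtype=float) / temperature
    prefs -= np.max(prefs)          # shift for numerical stability
    exp_prefs = np.exp(prefs)
    return exp_prefs / np.sum(exp_prefs)

rng = np.random.default_rng(seed=7)
q_values = [1.0, 2.0, 0.5]          # hypothetical value estimates for 3 actions

for temperature in (0.1, 1.0, 10.0):
    probs = softmax_policy(q_values, temperature)
    action = rng.choice(len(q_values), p=probs)  # sample an action index
    print(f"T={temperature}: probs={np.round(probs, 3)}, sampled action={action}")
```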


Here is a summary of what you learned about the softmax function and why we need it:

  • The softmax function converts numerical outputs into values in the range [0, 1].
  • The output of the softmax function can be interpreted as a probability distribution because the outputs sum to 1.
  • The softmax function is used in multiclass classification methods such as neural networks, multinomial logistic regression, multiclass LDA and the Naive Bayes classifier.
  • The softmax function is used to output action probabilities in reinforcement learning.
  • The softmax function is used as an activation function in the final layer of a neural network.
Ajitesh Kumar
