Deep Learning

Keras – Categorical Cross Entropy Loss Function

In this post, you will learn about when to use categorical cross entropy loss function when training neural network using Python Keras. Generally speaking, the loss function is used to compute the quantity that the the model should seek to minimize during training. For regression models, the commonly used loss function used is mean squared error function while for classification models predicting the probability, the loss function most commonly used is cross entropy. In this post, you will learn about different types of cross entropy loss function which is used to train the Keras neural network model.

Cross entropy loss function is an optimization function which is used in case of training a classification model which classifies the data by predicting the probability of whether the data belongs to one class or the other class. One of the examples where Cross entropy loss function is used is Logistic Regression.  Check my post on the related topic – Cross entropy loss function explained with Python examples

When fitting a neural network for classification, Keras provide the following three different types of cross entropy loss function:

  • binary_crossentropy: Used as a loss function for binary classification model. The binary_crossentropy function computes the cross-entropy loss between true labels and predicted labels.
  • categorical_crossentropy: Used as a loss function for multi-class classification model where there are two or more output labels. The outputis assigned one-hot category encoding value in form of 0s and 1. The output label, if present in integer form, is converted into categorical encoding using keras.utils to_categorical method.
  • sparse_categorical_crossentropy: Used as a loss function for multi-class classification model where the outputis assigned integer value (0, 1, 2, 3…). This loss function is mathematically same as the categorical_crossentropy. It just has a different interface.

Here is how the loss function is set as one of the above in order to configure neural network. Pay attention to the parameter, loss, which is assigned the value of binary_crossentropy for learning parameters of the binary classification neural network model.

network.compile(optimizer=optimizers.RMSprop(lr=0.01), 
                loss='binary_crossentropy', 
                metrics=['accuracy'])

When loss function to be used is categorical_crossentropy, the Keras network configuration code would look like the following:

network.compile(optimizer=optimizers.RMSprop(lr=0.01), 
                loss='categorical_crossentropy', 
                metrics=['accuracy'])

You may want to check different kinds of loss functions which can be used with Keras neural network on this page – Keras Loss Functions.

Ajitesh Kumar

I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. I would love to connect with you on Linkedin. Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking.

Recent Posts

Retrieval Augmented Generation (RAG) & LLM: Examples

Last updated: 25th Jan, 2025 Have you ever wondered how to seamlessly integrate the vast…

3 weeks ago

How to Setup MEAN App with LangChain.js

Hey there! As I venture into building agentic MEAN apps with LangChain.js, I wanted to…

4 weeks ago

Build AI Chatbots for SAAS Using LLMs, RAG, Multi-Agent Frameworks

Software-as-a-Service (SaaS) providers have long relied on traditional chatbot solutions like AWS Lex and Google…

4 weeks ago

Creating a RAG Application Using LangGraph: Example Code

Retrieval-Augmented Generation (RAG) is an innovative generative AI method that combines retrieval-based search with large…

1 month ago

Building a RAG Application with LangChain: Example Code

The combination of Retrieval-Augmented Generation (RAG) and powerful language models enables the development of sophisticated…

1 month ago

Building an OpenAI Chatbot with LangChain

Have you ever wondered how to use OpenAI APIs to create custom chatbots? With advancements…

1 month ago