In this post, you will learn the concepts of neural networks with the help of examples of mathematical models. In simple words, you will learn how to represent neural networks using mathematical equations. As a data scientist or machine learning researcher, it is useful to get a sense of how a neural network can be converted into a set of mathematical equations for calculating the values at each node. A good understanding of how to represent the activation function output of the computation units (nodes, or neurons) in different layers makes the backpropagation algorithm easier to understand. Backpropagation will be dealt with in a future post.
Here is what a single-layer neural network looks like. You may want to check out my post on the perceptron: Perceptron – Perceptron explained with Python example.
Here is the mathematical equation for the value of a1 (the output node) as a function of the inputs x1, x2 and x3. Here g is the activation function, and x0 is the bias unit, whose value is always 1.
[latex]a^{(2)}_1 = g(\theta^{(1)}_{10}x_0 + \theta^{(1)}_{11}x_1 + \theta^{(1)}_{12}x_2 + \theta^{(1)}_{13}x_3)[/latex]
In the above equation, the superscript of each weight represents the layer the weight belongs to, and the subscript identifies the connection: the first index is the node in the next layer and the second index is the node in the current layer. Thus, [latex]\theta^{(1)}_{12}[/latex] represents the layer 1 weight on the connection from node 2 in the current layer (input x2) to node 1 in the next layer.
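The single-unit equation can be evaluated directly in code. This is a minimal sketch, assuming g is the sigmoid (logistic) function, which the post does not specify; the function name single_unit_output is hypothetical. The weights are listed in the same order as the equation, [latex]\theta^{(1)}_{10}[/latex] first.

```python
import math


def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z)), assumed here as the choice of g."""
    return 1.0 / (1.0 + math.exp(-z))


def single_unit_output(theta, x):
    """Compute a_1^(2) = g(theta_10*x0 + theta_11*x1 + theta_12*x2 + theta_13*x3).

    theta: weights [theta_10, theta_11, theta_12, theta_13]
    x:     inputs [x1, x2, x3]; the bias input x0 = 1 is prepended here.
    """
    x_with_bias = [1.0] + list(x)
    if len(theta) != len(x_with_bias):
        raise ValueError("need one weight per input, including the bias x0")
    z = sum(t * xi for t, xi in zip(theta, x_with_bias))  # weighted sum
    return sigmoid(z)                                      # apply g


# Hand-picked weights: z = -1 + 0.5 + 0.5 + 0.5 = 0.5
print(round(single_unit_output([-1.0, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0]), 4))  # 0.6225
```

With every weight set to zero the weighted sum is 0, so the unit outputs g(0) = 0.5 regardless of the inputs.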
Here is a neural network with an input layer of three units, one hidden layer of three units and an output layer of one unit.
Here are the mathematical equations for the values of a1, a2 and a3 in layer 2 as functions of the inputs x1, x2 and x3. Further, the value of a1 in layer 3 is represented as a function of the values of a1, a2 and a3 in layer 2.
As a first step, let's represent the output values computed by the three units in the hidden layer. The input layer is represented as layer 1, the hidden layer as layer 2 and the output layer as layer 3.
[latex]a^{(2)}_1 = g(\theta^{(1)}_{10}x_0 + \theta^{(1)}_{11}x_1 + \theta^{(1)}_{12}x_2 + \theta^{(1)}_{13}x_3)[/latex]
[latex]a^{(2)}_2 = g(\theta^{(1)}_{20}x_0 + \theta^{(1)}_{21}x_1 + \theta^{(1)}_{22}x_2 + \theta^{(1)}_{23}x_3)[/latex]
[latex]a^{(2)}_3= g(\theta^{(1)}_{30}x_0 + \theta^{(1)}_{31}x_1 + \theta^{(1)}_{32}x_2 + \theta^{(1)}_{33}x_3)[/latex]
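Because the three hidden-unit equations above have the same form, they can be stacked into a single matrix-vector product. This compact form introduces two symbols not used elsewhere in this post: [latex]z^{(2)}[/latex], the vector of weighted sums before activation, and [latex]\Theta^{(1)}[/latex], the matrix whose row j holds the weights feeding hidden unit j. It assumes x0 = 1 and that g is applied element-wise:

[latex]z^{(2)} = \Theta^{(1)} x = \begin{bmatrix} \theta^{(1)}_{10} & \theta^{(1)}_{11} & \theta^{(1)}_{12} & \theta^{(1)}_{13} \\ \theta^{(1)}_{20} & \theta^{(1)}_{21} & \theta^{(1)}_{22} & \theta^{(1)}_{23} \\ \theta^{(1)}_{30} & \theta^{(1)}_{31} & \theta^{(1)}_{32} & \theta^{(1)}_{33} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad a^{(2)} = g(z^{(2)})[/latex]

With three hidden units and three inputs plus the bias, [latex]\Theta^{(1)}[/latex] has 3 rows and 4 columns, and row j of the product reproduces the equation for [latex]a^{(2)}_j[/latex].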
Let's determine the output value of the unit in the output layer. It is represented as a function of a1, a2 and a3 from the previous layer, which are in turn functions of x1, x2 and x3 in the input layer. As with x0, a0 in layer 2 is a bias unit whose value is 1.
[latex]a^{(3)}_1 = g(\theta^{(2)}_{10}a^{(2)}_0 + \theta^{(2)}_{11}a^{(2)}_1 + \theta^{(2)}_{12}a^{(2)}_2 + \theta^{(2)}_{13}a^{(2)}_3)[/latex]
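The two-step computation above (hidden layer first, then the output unit) can be sketched in plain Python. This is a minimal sketch under assumptions the post does not state: g is taken to be the sigmoid function, and the helper names layer_forward and forward_3_3_1 are hypothetical. Each row of a weight matrix holds the θ values for one unit in the next layer, with the bias weight first, matching the subscript convention described earlier.

```python
import math


def sigmoid(z):
    # Logistic activation, assumed here as the choice of g.
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(theta, inputs):
    """Compute the activations of the next layer from the current one.

    theta:  one row per unit in the next layer; row j is
            [theta_j0, theta_j1, ..., theta_jn], where theta_j0 is the bias weight.
    inputs: activations of the current layer, without the bias unit.
    """
    with_bias = [1.0] + list(inputs)  # x0 = 1 (or a0 = 1) is the bias unit
    outputs = []
    for row in theta:
        if len(row) != len(with_bias):
            raise ValueError("each row needs one weight per input plus the bias")
        outputs.append(sigmoid(sum(t * v for t, v in zip(row, with_bias))))
    return outputs


def forward_3_3_1(theta1, theta2, x):
    """Forward pass for 3 inputs -> 3 hidden units -> 1 output unit.

    theta1: 3 x 4 weights from layer 1 to layer 2
    theta2: 1 x 4 weights from layer 2 to layer 3
    """
    a2 = layer_forward(theta1, x)   # [a1^(2), a2^(2), a3^(2)]
    a3 = layer_forward(theta2, a2)  # [a1^(3)]
    return a3[0]
```

For example, if every weight in theta1 is zero, each hidden unit outputs g(0) = 0.5, and with theta2 = [[0, 1, 1, 1]] the output is g(1.5).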
Here is a neural network with an input layer of two units, one hidden layer of three units and an output layer of two units.
Here are the mathematical equations for the values of a1, a2 and a3 in layer 2 as functions of the inputs x1 and x2. Further, the values of a1 and a2 in layer 3 are represented as functions of the values of a1, a2 and a3 in layer 2.
As a first step, let's represent the output values computed by the three units in the hidden layer. The input layer is represented as layer 1, the hidden layer as layer 2 and the output layer as layer 3.
[latex]a^{(2)}_1 = g(\theta^{(1)}_{10}x_0 + \theta^{(1)}_{11}x_1 + \theta^{(1)}_{12}x_2)[/latex]
[latex]a^{(2)}_2 = g(\theta^{(1)}_{20}x_0 + \theta^{(1)}_{21}x_1 + \theta^{(1)}_{22}x_2)[/latex]
[latex]a^{(2)}_3= g(\theta^{(1)}_{30}x_0 + \theta^{(1)}_{31}x_1 + \theta^{(1)}_{32}x_2)[/latex]
Let's determine the output values of the units in the output layer. Each is represented as a function of a1, a2 and a3 from the previous layer, which are in turn functions of x1 and x2 in the input layer.
[latex]a^{(3)}_1 = g(\theta^{(2)}_{10}a^{(2)}_0 + \theta^{(2)}_{11}a^{(2)}_1 + \theta^{(2)}_{12}a^{(2)}_2 + \theta^{(2)}_{13}a^{(2)}_3)[/latex]
[latex]a^{(3)}_2 = g(\theta^{(2)}_{20}a^{(2)}_0 + \theta^{(2)}_{21}a^{(2)}_1 + \theta^{(2)}_{22}a^{(2)}_2 + \theta^{(2)}_{23}a^{(2)}_3)[/latex]
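The weight indices in these equations also fix the size of each layer's weight matrix: the weights from layer l to layer l + 1 form a matrix with one row per unit in layer l + 1 and one column per unit in layer l, plus one column for the bias. For this 2-3-2 network that gives a 3 x 3 matrix for layer 1 ([latex]\theta^{(1)}_{10}[/latex] through [latex]\theta^{(1)}_{32}[/latex]) and a 2 x 4 matrix for layer 2. A small helper, with the hypothetical name theta_shapes, computes these shapes from a list of layer sizes:

```python
def theta_shapes(layer_sizes):
    """Return the (rows, columns) shape of each layer's weight matrix.

    layer_sizes lists the number of units per layer, excluding bias units,
    e.g. [2, 3, 2] for 2 inputs, 3 hidden units and 2 outputs.
    The matrix from layer l to layer l + 1 has one row per unit in
    layer l + 1 and one column per unit in layer l plus one for the bias.
    """
    return [
        (next_size, size + 1)
        for size, next_size in zip(layer_sizes, layer_sizes[1:])
    ]


print(theta_shapes([2, 3, 2]))  # [(3, 3), (2, 4)]
```

Applied to the earlier 3-3-1 network, the same rule gives a 3 x 4 matrix followed by a 1 x 4 matrix.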
Lastly, let's look at how the value of the output unit a1 can be expressed as a mathematical computation on the input signals x1 and x2 in a simple deep network. This network has an input layer with two input units, two hidden layers (the first with three units and the second with two units) and an output layer with one unit. Here is the diagram of this simplistic deep learning network.
The values at layer 2 (a1, a2 and a3) and layer 3 (a1 and a2) remain the same as shown in the previous section; layer 3 is now the second hidden layer rather than the output layer. Let's represent the value of a1 in the output layer (layer 4) as a function of the values of a1 and a2 in the previous layer (layer 3).
[latex]a^{(4)}_1 = g(\theta^{(3)}_{10}a^{(3)}_0 + \theta^{(3)}_{11}a^{(3)}_1 + \theta^{(3)}_{12}a^{(3)}_2)[/latex]
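Since every layer follows the same pattern (prepend the bias unit, take a weighted sum per unit, apply g), the computation generalizes to any number of layers by looping over the weight matrices. This is a minimal sketch, assuming g is the sigmoid function; the function name forward_propagate and the example weights are hypothetical, chosen only to match the 2-3-2-1 shapes (3 x 3, 2 x 4 and 1 x 3).

```python
import math


def sigmoid(z):
    # Logistic activation, assumed here as the choice of g.
    return 1.0 / (1.0 + math.exp(-z))


def forward_propagate(thetas, x):
    """Return the activations of every layer, input layer first.

    thetas: list of weight matrices; thetas[l - 1] holds the theta^(l) weights,
            with one row per unit in the next layer and column 0 for the bias.
    x:      input values x1..xn (the bias x0 = 1 is added automatically).
    """
    activations = [list(x)]
    for theta in thetas:
        prev = [1.0] + activations[-1]  # prepend the bias unit a0 = 1
        layer = []
        for row in theta:
            if len(row) != len(prev):
                raise ValueError("each row needs one weight per unit plus the bias")
            layer.append(sigmoid(sum(w * a for w, a in zip(row, prev))))
        activations.append(layer)
    return activations


# Hypothetical weights for the 2-3-2-1 network
theta1 = [[0.1, 0.2, 0.3], [0.0, -0.4, 0.5], [0.2, 0.1, -0.1]]  # 3 x 3
theta2 = [[0.1, 0.3, -0.2, 0.4], [-0.3, 0.2, 0.1, 0.0]]          # 2 x 4
theta3 = [[0.05, 0.6, -0.7]]                                      # 1 x 3
layers = forward_propagate([theta1, theta2, theta3], [1.0, 2.0])
print(layers[-1][0])  # a1 in layer 4
```

Returning every layer's activations, rather than only the final output, keeps the intermediate values that backpropagation needs.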
Here is the summary of what you learned in this post regarding representing neural networks as mathematical models: