April 24, 2025


The art of combining neural networks, fixed points and ordinary differential equations

Source: https://pixabay.com/photos/sea-waves-nature-light-ripples-7484743/

Recently, smart researchers have realized that we can combine seemingly unrelated ideas to reinterpret how neural networks work and how we train them. First, if we let the number of hidden layers of a neural network grow toward infinity, we can see the output of the neural network as the solution of a fixed point problem. Second, there is a deep connection between neural networks and ordinary differential equations (ODEs): we can actually train neural networks using ODE solvers. So, what if we combine these two ideas? That is, what if we train a neural network by finding the steady state of an ODE? Well, it turns out that this works quite well, and that is what this blog post is about.

Credits: This post is based on this excellent blog post (https://julialang.org/blog/2021/10/DEQ/), which totally blew my mind the first time I read it.

Let’s say we have a dynamical system defined by the relation x’ = f(x). A fixed point of the system is reached when the left-hand side is equal to the right-hand side: x* = f(x*). To find a fixed point, one may start with a random guess and then apply the function f a certain number of times.

This process is illustrated below with the fixed point problem cos(x) = x. I start with an initial guess x0 = -1. Then I update my guess using the rule x1 = cos(x0), and then I repeat the process. After a few iterations, xn is really close to the red line (the 45° line), which indicates that we have (almost) reached the fixed point.

Calculating the fixed point cos(x)=x. Source: Author’s calculations.

Note for the careful reader: the reason things work well here is that the function x -> cos(x) is a contraction mapping on the interval of interest. See: https://en.wikipedia.org/wiki/Banach_fixed-point_theorem
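To make this concrete, here is a minimal base-Julia sketch of the iteration (the starting guess, tolerance and iteration cap are arbitrary choices):

```julia
# Fixed-point iteration: keep applying f until the guess stops moving
function fixed_point_iteration(f, x0; tol = 1e-8, maxiter = 1_000)
    x = x0
    for n in 1:maxiter
        x_new = f(x)
        abs(x_new - x) < tol && return x_new, n
        x = x_new
    end
    return x, maxiter
end

xstar, n = fixed_point_iteration(cos, -1.0)
println("x* ≈ $xstar after $n iterations")       # x* ≈ 0.739085...
println("cos(x*) - x* ≈ $(cos(xstar) - xstar)")  # ≈ 0
```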

What does this have to do with neural networks? Well, let’s consider a neural network with a single layer, x’ = NN(x). Now, let’s say we add another layer using the same architecture: x’’ = NN(NN(x)). Let’s do that operation again: x’’’ = NN(NN(NN(x))). And so on. This process is really similar to what we did above with the simple fixed point problem cos(x) = x.
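To see this stacking-to-convergence effect in code, here is a toy base-Julia sketch with a single hand-written layer applied over and over. The layer, its size, and the down-scaling of the weights (which keeps the map a contraction, as in the note above) are arbitrary choices made for illustration:

```julia
using LinearAlgebra, Random
Random.seed!(1)

# One weight-tied layer NN: z -> tanh.(W*z .+ b); scaling W keeps the map contractive
W = 0.1 .* randn(5, 5)
b = randn(5)
NN(z) = tanh.(W * z .+ b)

# Apply the same layer `depth` times: NN(NN(...NN(x)...))
function stack(f, x, depth)
    z = x
    for _ in 1:depth
        z = f(z)
    end
    return z
end

x = randn(5)
z = stack(NN, x, 50)
println("‖NN(z) - z‖ after 50 layers: ", norm(NN(z) - z))  # ≈ 0: extra layers barely change the output
```

After enough layers, applying the layer one more time barely changes the output: the deep, weight-tied network has effectively converged to a fixed point.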

So far, so good. Now, let’s assume we are considering a physical system (a ball, a rocket, etc.) with position x. Let’s assume that f gives us the velocity of the system: f(x) = dx/dt. Now dx ≈ x_{n+1} - x_{n}, so with a unit time step the discretized dynamics read x_{n+1} ≈ x_{n} + f(x_{n}). Defining g(x) = x + f(x), the update rule is simply x_{n+1} = g(x_{n}).

The physical system does not move when the velocity is zero: f(x) = 0. Equivalently, the physical system does not move when g(x) = x, which is a fixed point problem of the type described above. So the punchline is that there is a tight connection between solving fixed point problems and finding the steady state of an ODE.
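Here is the same observation as a minimal base-Julia sketch, using a toy velocity field f(x) = 1 - x (an arbitrary choice, with steady state at x = 1) and an explicit Euler update; the step size and tolerance are also arbitrary:

```julia
f(x) = 1 - x                    # toy velocity field: f(x) = 0 at the steady state x = 1
g(x; dt = 0.1) = x + dt * f(x)  # explicit Euler update: x_{n+1} = g(x_n)

# Iterate the update rule until it stops moving, i.e. until g(x) = x
function steady_state(g, x0; tol = 1e-8, maxiter = 10_000)
    x = x0
    for _ in 1:maxiter
        x_new = g(x)
        abs(x_new - x) < tol && return x_new
        x = x_new
    end
    return x
end

xstar = steady_state(g, -3.0)
println("x* ≈ $xstar, f(x*) ≈ $(f(xstar))")  # x* ≈ 1 and f(x*) ≈ 0: same point, two viewpoints
```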

Sometimes, we can find exact solutions of ODEs. Most of the time, we cannot, so we have to rely on numerical approximations. One (brilliant) idea is to use a neural network here: more specifically, for the function g above, we use a neural network.

For a given g, we can use an ODE solver to find the fixed point of the ODE. Going one step further, we can train the neural network so that, for a given input, the fixed point of the ODE is the quantity we want to predict. In a nutshell, this is what Deep Equilibrium Models (DEM) are all about.

As a first pass, we can check whether this methodology works on a very simple case. Here, given x, we want the DEM to predict the value 2x. The code below uses Julia, which I like to describe as “fast Python”:

It should output a graph similar to this one:

Source: Author’s calculations based on the code above. Code based on https://julialang.org/blog/2021/10/DEQ/
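For readers who want something runnable without any packages, here is a deliberately stripped-down sketch of the same y = 2x experiment: the prediction is the fixed point of a tiny parametric update rule, the fixed point is found by plain iteration rather than by a steady-state ODE solver, and the two parameters are trained with naive finite-difference gradient descent. The linear update rule, learning rate and step counts are arbitrary choices made for illustration; this is not the code behind the figure above, which follows the Flux/SciML approach of the referenced Julia blog post.

```julia
# Stripped-down "equilibrium model" toy: the prediction for an input x is the
# fixed point z* of  z = a*z + b*x, and we train (a, b) so that z*(x) ≈ 2x.

function predict(a, b, x; tol = 1e-10, maxiter = 10_000)
    z = zero(x)
    for _ in 1:maxiter
        z_new = a * z + b * x        # fixed-point iteration instead of an ODE solver
        abs(z_new - z) < tol && return z_new
        z = z_new
    end
    return z
end

loss(a, b, xs) = sum((predict(a, b, x) - 2x)^2 for x in xs) / length(xs)

# Naive training loop: finite-difference gradients + plain gradient descent
function train(; steps = 2_000, lr = 0.05, eps = 1e-6)
    a, b = 0.1, 0.1                       # arbitrary initialization
    xs = range(-1.0, 1.0, length = 21)    # toy training inputs
    for _ in 1:steps
        L  = loss(a, b, xs)
        ga = (loss(a + eps, b, xs) - L) / eps
        gb = (loss(a, b + eps, xs) - L) / eps
        a  = clamp(a - lr * ga, -0.9, 0.9)  # keep the update map a contraction
        b -= lr * gb
    end
    return a, b
end

a, b = train()
println("prediction at x = 3.0: ", predict(a, b, 3.0), "  (target: 6.0)")
```

After training, the fixed point is (approximately) a linear function of x with slope 2, so the printed prediction should be very close to 6.0.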

Things work as expected. However, learning the function y=2x using a DEM feels like using a bazooka to kill a fly. In the next application, we tackle a slightly more ambitious target: the MNIST dataset and the prediction of digits from images.

The code is a bit more involved, but the main idea remains unchanged. For a given input, the output is a fixed point of an ODE, where the ODE depends on a neural network.

Here, we have to do a bit more work because we first have to transform the images into vectors. Then we map the steady state of the ODE to a digit prediction with a softmax, which is already folded into the loss function logitcrossentropy (it works directly on raw logits). Note also that the ODE layer is sandwiched between two other layers. Feel free to experiment with other architectures.
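To make that sandwich explicit, here is a small forward-pass sketch. It uses Flux only for the two Dense layers; the layer sizes are arbitrary, the middle “ODE layer” is replaced by plain fixed-point iteration of a weight-tied block rather than a proper steady-state solve, and the input is a random vector standing in for a flattened 28×28 MNIST image:

```julia
using Flux   # assumes Flux.jl is installed

# The two outer layers of the sandwich (sizes are arbitrary choices)
pre  = Dense(784 => 64, tanh)   # flattened image -> hidden state
post = Dense(64 => 10)          # equilibrium state -> 10 raw logits

# Middle block: a weight-tied map z -> tanh.(W*z .+ x .+ b), iterated toward its fixed point
W = 0.05f0 .* randn(Float32, 64, 64)
b = zeros(Float32, 64)
function equilibrium(x; iters = 100)
    z = zeros(Float32, 64)
    for _ in 1:iters
        z = tanh.(W * z .+ x .+ b)
    end
    return z
end

x      = rand(Float32, 784)     # stand-in for a flattened 28×28 MNIST image
h      = pre(x)                 # first layer
zstar  = equilibrium(h)         # "ODE layer": approximate fixed point
logits = post(zstar)            # second layer: one score per digit
probs  = softmax(logits)        # at training time the softmax is folded into logitcrossentropy
println("predicted digit: ", argmax(probs) - 1)  # indices 1–10 map to digits 0–9
```

With untrained weights the printed digit is of course meaningless; the point is only the data flow (flatten, first layer, equilibrium block, second layer, softmax/loss), which matches the sandwich described above.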

After training, you should see something like this:

Source: Author’s calculations based on the code above. Code based on https://julialang.org/blog/2021/10/DEQ/
DEM prediction: 2
True digit: 2

This blog post offers an overview of what Deep Equilibrium Models are. It seems almost magical that we can create a prediction machine by combining neural networks, ODEs and fixed points. After reading this blog post, I hope you understand the mechanics behind this magic a little bit better.

Bai, Shaojie, J. Zico Kolter, and Vladlen Koltun. “Deep equilibrium models.” Advances in Neural Information Processing Systems 32 (2019).

Chen, Ricky TQ, et al. “Neural ordinary differential equations.” Advances in Neural Information Processing Systems 31 (2018).

Composability in Julia: Implementing Deep Equilibrium Models via Neural ODEs. URL: https://julialang.org/blog/2021/10/DEQ/

Deep Equilibrium Models via Neural ODEs. Originally published at https://towardsdatascience.com/deep-equilibrium-models-via-neural-odes-c25a3ac8d004
