
Autodiff

Static reverse-mode automatic differentiation in C

2025-03-11

Project Files • GitHub Repo

This project is mostly an attempt at convincing myself that I understand the fundamentals of deep learning. It consists of a static reverse-mode automatic differentiation library built up into a multilayer perceptron that trains on the MNIST database. For more details, see the project’s readme.

The multilayer perceptron is compiled in two stages because backpropagation is implemented through code generation. Interpreting computation graphs is slow, and just-in-time compiling them sounds hard and is not portable, so ahead-of-time code generation seemed the most sensible option.
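
To make that concrete, here is a minimal sketch of what generated backpropagation code could look like for a toy function f(a, b) = log(a) + a*b. The forward-then-reverse shape is typical of this kind of codegen, but grad_f and the temporaries are illustrative names, not the project's actual output.

    #include <math.h>

    /* Hypothetical generated code (illustrative, not the project's actual
     * output): the gradient of f(a, b) = log(a) + a*b as straight-line C. */
    double grad_f(double a, double b, double *da, double *db)
    {
        /* Forward pass: compute the primal value. */
        double t0 = log(a);
        double t1 = a * b;
        double f  = t0 + t1;
        /* Reverse pass: seed with df/df = 1, then accumulate adjoints. */
        double dt0 = 1.0, dt1 = 1.0; /* adjoints pass through the addition */
        *da = dt0 / a + dt1 * b;     /* d(log a)/da = 1/a; d(ab)/da = b */
        *db = dt1 * a;               /* d(ab)/db = a */
        return f;
    }

Emitting flat C like this hands the backward pass to an ordinary compiler for optimization, which is where the portability win over just-in-time compilation comes from.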

The rough workflow for making a computation differentiable is to replace scalars (floats and doubles) with computation graphs (struct node *s), and operations on floats and doubles (say, a + b and log(x)) with their overloaded counterparts (here, node_add(a, b) and node_log(x)).

But here’s what’s meta: the output of differentiation (the derivative of a computation) is itself a computation, so by using this very workflow we can make it differentiable. And by making differentiation output differentiable computations, we gain the ability to compute derivatives of arbitrary order. For a practical use-case, see taylor.c.
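
To illustrate both points, here is a self-contained miniature, with the caveat that only struct node, node_add, and node_log are names from the library; everything else, including node_diff, is invented for the sketch, and this toy applies textbook differentiation rules node by node rather than the library's reverse-mode pass. What matters is the meta property: node_diff consumes a graph and returns a graph, so it composes with itself.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    enum op { OP_VAR, OP_CONST, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_LOG };

    struct node {
        enum op op;
        double value;           /* used by OP_CONST only */
        struct node *lhs, *rhs; /* rhs is NULL for unary ops */
    };

    /* Nodes are never freed; fine for a short-lived demo. */
    static struct node *node_new(enum op op, struct node *l, struct node *r)
    {
        struct node *n = malloc(sizeof *n);
        n->op = op;
        n->value = 0.0;
        n->lhs = l;
        n->rhs = r;
        return n;
    }

    static struct node *node_const(double v)
    {
        struct node *n = node_new(OP_CONST, NULL, NULL);
        n->value = v;
        return n;
    }

    static struct node *node_var(void) { return node_new(OP_VAR, NULL, NULL); }
    static struct node *node_add(struct node *a, struct node *b) { return node_new(OP_ADD, a, b); }
    static struct node *node_sub(struct node *a, struct node *b) { return node_new(OP_SUB, a, b); }
    static struct node *node_mul(struct node *a, struct node *b) { return node_new(OP_MUL, a, b); }
    static struct node *node_div(struct node *a, struct node *b) { return node_new(OP_DIV, a, b); }
    static struct node *node_log(struct node *x) { return node_new(OP_LOG, x, NULL); }

    /* Evaluate a graph at x, the value of the single variable. */
    static double node_eval(struct node *n, double x)
    {
        switch (n->op) {
        case OP_VAR:   return x;
        case OP_CONST: return n->value;
        case OP_ADD:   return node_eval(n->lhs, x) + node_eval(n->rhs, x);
        case OP_SUB:   return node_eval(n->lhs, x) - node_eval(n->rhs, x);
        case OP_MUL:   return node_eval(n->lhs, x) * node_eval(n->rhs, x);
        case OP_DIV:   return node_eval(n->lhs, x) / node_eval(n->rhs, x);
        case OP_LOG:   return log(node_eval(n->lhs, x));
        }
        return 0.0; /* unreachable */
    }

    /* The meta step: the derivative of a graph is another graph built from
     * the same constructors, so node_diff can be applied to its own output. */
    static struct node *node_diff(struct node *n)
    {
        struct node *u = n->lhs, *v = n->rhs;
        switch (n->op) {
        case OP_VAR:   return node_const(1.0);
        case OP_CONST: return node_const(0.0);
        case OP_ADD:   return node_add(node_diff(u), node_diff(v));
        case OP_SUB:   return node_sub(node_diff(u), node_diff(v));
        case OP_MUL:   /* product rule: (uv)' = u'v + uv' */
            return node_add(node_mul(node_diff(u), v), node_mul(u, node_diff(v)));
        case OP_DIV:   /* quotient rule: (u/v)' = (u'v - uv') / v^2 */
            return node_div(node_sub(node_mul(node_diff(u), v),
                                     node_mul(u, node_diff(v))),
                            node_mul(v, v));
        case OP_LOG:   /* chain rule: (log u)' = u'/u */
            return node_div(node_diff(u), u);
        }
        return NULL; /* unreachable */
    }

    int main(void)
    {
        /* f(x) = log(x) + x*x, so f'(x) = 1/x + 2x and f''(x) = -1/x^2 + 2. */
        struct node *x   = node_var();
        struct node *f   = node_add(node_log(x), node_mul(x, x));
        struct node *df  = node_diff(f);  /* a graph, so it can be diffed again */
        struct node *ddf = node_diff(df); /* second derivative, same workflow */
        printf("f(2)   = %f\n", node_eval(f, 2.0));   /* log 2 + 4 ~ 4.693 */
        printf("f'(2)  = %f\n", node_eval(df, 2.0));  /* 0.5 + 4   = 4.5   */
        printf("f''(2) = %f\n", node_eval(ddf, 2.0)); /* -0.25 + 2 = 1.75  */
        return 0;
    }

Because the derivative is built from the same constructors it consumes, node_diff can be fed its own output indefinitely, which is the property that makes derivatives of arbitrary order (as in taylor.c) fall out for free.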