Making sense of Score, Flow, and Diffusion models

This post is an attempt to bridge the gap between different ideas behind the latest techniques in generative modeling. We will try to do so in a mathematically rigorous fashion, meticulously unpacking the theory and the links between these models.

Throughout, we use the following notation:

Introduction to generative modeling

A generative model is a parameterized family of probability distributions $p_{\theta}(\mathbf{x})$ that we seek to match to a true data distribution $p_{\text{data}}(\mathbf{x})$. One typically has i.i.d. samples from $p_{\text{data}}$ (the training data). We want to:

  1. Train $p_{\theta}(\mathbf{x})$ so that $p_{\theta}\approx p_{\text{data}}$ (a standard objective is sketched right after this list).
  2. Generate (sample) new data $\mathbf{x}$ from $p_{\theta}$.
  3. Potentially evaluate or compare densities for model-based reasoning.
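One standard way to make step 1 concrete, and the one used by the normalizing flows below, is maximum likelihood: maximizing the expected log-density of the data under the model, which is equivalent (up to a constant independent of $\theta$) to minimizing the forward KL divergence,

\[\max_{\theta}\; \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\bigl[\log p_{\theta}(\mathbf{x})\bigr] \;\;\Longleftrightarrow\;\; \min_{\theta}\; D_{\mathrm{KL}}\bigl(p_{\text{data}} \,\|\, p_{\theta}\bigr),\]

where the expectation is approximated in practice by an average over the i.i.d. training samples.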

Different generative modeling paradigms include:

In this post, we will focus on:

Flow-based generative modeling

Traditional normalizing flows

In a discrete normalizing flow, one designs a sequence of invertible mappings $f_i: \mathbb{R}^d \to \mathbb{R}^d$, $i=1,\dots,L$. Let $\pi(\mathbf{z})$ denote the base distribution, often $\mathcal{N}(\mathbf{0},\mathbf{I})$. A sample from the model is constructed as:

\[\mathbf{z}_0 \sim \pi(\mathbf{z}), \quad \mathbf{z}_1 = f_1(\mathbf{z}_0), \quad \mathbf{z}_2 = f_2(\mathbf{z}_1), \quad \dots \quad \mathbf{z}_L = f_L(\mathbf{z}_{L-1}) =: \mathbf{x}.\]

Hence $\mathbf{x} \sim p_{\theta}(\mathbf{x})$. Since each $f_i$ is invertible, the model density can be expressed exactly via the change-of-variables formula:

\[p_{\theta}(\mathbf{x}) = \pi \bigl(f^{-1}(\mathbf{x})\bigr)\, \left\lvert \det \nabla_{\mathbf{x}} f^{-1}(\mathbf{x}) \right\rvert,\]

where $f = f_L \circ \dots \circ f_1$. Training typically maximizes the log-likelihood $\log p_{\theta}(\mathbf{x})$ over the training data. However, the requirement that each $f_i$ be invertible with a tractable Jacobian determinant can be architecturally restrictive.
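As a concrete illustration, here is a minimal sketch of such a flow in plain NumPy, using elementwise affine layers $f_i(\mathbf{z}) = e^{\mathbf{s}_i} \odot \mathbf{z} + \mathbf{b}_i$; the choice of layer and the random (untrained) parameters are illustrative assumptions, not part of the setup above. It shows both directions: sampling by pushing $\mathbf{z}_0 \sim \pi$ forward through $f$, and evaluating the exact log-density with the change-of-variables formula.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 2, 3                                    # data dimension, number of layers

# Elementwise affine layers f_i(z) = exp(s_i) * z + b_i (invertible, diagonal Jacobian).
# Parameters are random placeholders; in practice they would be trained by maximum likelihood.
params = [(0.1 * rng.standard_normal(d),       # log-scales s_i
           0.1 * rng.standard_normal(d))       # shifts b_i
          for _ in range(L)]

def sample():
    """Draw z_0 ~ pi = N(0, I) and push it through f = f_L ∘ ... ∘ f_1."""
    z = rng.standard_normal(d)
    for s, b in params:
        z = np.exp(s) * z + b
    return z

def log_prob(x):
    """log p_theta(x) = log pi(f^{-1}(x)) + log|det grad_x f^{-1}(x)|."""
    log_det = 0.0
    for s, b in reversed(params):              # apply the inverses in reverse order
        x = (x - b) * np.exp(-s)               # f_i^{-1}(x)
        log_det -= s.sum()                     # log|det Jacobian| of this inverse layer
    log_pi = -0.5 * np.dot(x, x) - 0.5 * d * np.log(2.0 * np.pi)  # standard normal base
    return log_pi + log_det

x = sample()                                   # x ~ p_theta
print(log_prob(x))                             # exact log-likelihood of that sample
```

Training would then amount to maximizing the average `log_prob` over the dataset with respect to the layer parameters; the restriction mentioned above is precisely that the inverse and its Jacobian determinant must remain this cheap for more expressive layers.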