Chufan Chen's Homepage

Diffusion RL

2023-07-10

🚧 WIP 🚧

Generative Models

Probabilistic models

We often estimate $p(x)$ as a multivariate normal distribution. Why would we want to model $p(x)$ at all?

Consider a language model that generates a sentence $x$ using the chain rule, where $x_k$ is the $k$-th token in the sentence. We can write the probability of the sentence as:

$$ p(x_1,x_2,\dots,x_n):=p(x)=p(x_1|x_0)p(x_2|x_1,x_0)\dots p(x_n|x_{n-1},\dots,x_0) $$

Since $x_0$ is a constant (the `<start>` token), we can drop it from $p(x_1|x_0)$ and the later factors:

$$ p(x)=p(x_1)p(x_2|x_1)p(x_3|x_{1:2})p(x_4|x_{1:3})\dots p(x_n|x_{1:n-1}) $$
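As a small illustrative sketch (the numbers below are made up, not from any real model), the chain rule lets us score a whole sentence by summing the log of each per-token conditional $p(x_k|x_{1:k-1})$:

```python
import math

# Hypothetical per-token conditional probabilities p(x_k | x_{1:k-1})
# for the sentence "the cat sat". Values are illustrative only.
conditionals = [
    0.20,  # p(x_1 = "the")
    0.05,  # p(x_2 = "cat" | "the")
    0.10,  # p(x_3 = "sat" | "the", "cat")
]

# Chain rule: p(x) = prod_k p(x_k | x_{1:k-1}),
# computed in log space for numerical stability.
log_p_x = sum(math.log(p) for p in conditionals)
print(f"log p(x) = {log_p_x:.4f}, p(x) = {math.exp(log_p_x):.6f}")
```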

Can we extend our language model? For example, can we "language model" images?

| before | now |
| --- | --- |
| time step = word | time step = pixel |
| $x$ is a sentence | $x$ is an array of pixels |
| $x_k$ is the $k$-th token | $x_k$ is the $k$-th pixel |
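To "language model" an image we need to order its pixels into a sequence. A common choice (assumed here, in the spirit of raster-scan autoregressive models) is to flatten the 2-D grid row by row so each pixel plays the role of a token:

```python
import numpy as np

# A tiny 4x4 grayscale "image" with 8-bit pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4))

# Raster-scan flattening: row by row, left to right.
# x_k is now the k-th pixel, exactly like the k-th token of a sentence.
pixel_sequence = image.reshape(-1)
print(pixel_sequence.shape)  # (16,) -- a "sentence" of 16 pixel tokens
```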

Autoregressive generative models

Main principle for training:

  1. Divide up $x$ into dimensions $x_1,\dots,x_n$
  2. Discretize each $x_i$ into $k$ values
  3. Model $p(x)$ via the chain rule, where each factor $p(x_i|x_{1:i-1})$ is a softmax over the $k$ values
  4. Use your favorite sequence model to model $p(x)$ (see the sketch below)
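
Below is a minimal PyTorch-flavoured sketch of these four steps; the module and hyperparameter names are assumptions for illustration, not from this post. Each pixel is discretized into $k = 256$ values, an LSTM serves as the "favorite sequence model", and every position outputs a softmax over the $k$ values, trained with cross-entropy (i.e. maximum likelihood under the chain-rule factorization).

```python
import torch
import torch.nn as nn

K = 256        # number of discrete values per pixel (step 2)
SEQ_LEN = 16   # number of dimensions x_1, ..., x_n (step 1), e.g. a 4x4 image

class TinyAutoregressiveModel(nn.Module):
    """Models p(x_i | x_{1:i-1}) with a softmax over K values (step 3),
    using an LSTM as the sequence model (step 4)."""

    def __init__(self, k=K, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(k + 1, hidden)  # +1 for a <start> token
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, k)          # logits over the k values

    def forward(self, x):
        # x: (batch, seq_len) integer pixel values in [0, K)
        start = torch.full((x.size(0), 1), K, dtype=torch.long)
        inp = torch.cat([start, x[:, :-1]], dim=1)  # shift right: condition on x_{1:i-1}
        h, _ = self.lstm(self.embed(inp))
        return self.head(h)                         # (batch, seq_len, K) logits

model = TinyAutoregressiveModel()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch of "images" flattened to pixel sequences.
x = torch.randint(0, K, (8, SEQ_LEN))

logits = model(x)
loss = nn.functional.cross_entropy(logits.reshape(-1, K), x.reshape(-1))
loss.backward()
optim.step()
print(f"negative log-likelihood per pixel: {loss.item():.3f}")
```

Minimizing this cross-entropy is exactly maximizing $\sum_i \log p(x_i|x_{1:i-1})$, i.e. the log of the chain-rule factorization above.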