Chufan Chen's Homepage

Diffusion RL

2023-07-10

🚧 WIP 🚧

Generative Models

Probabilistic models

We often estimate $p(x)$ as a multivariate normal distribution. Why would we want to model $p(x)$ at all?

Consider a language model that generates a sentence $x$ using the chain rule, where $x_k$ is the $k$-th token in the sentence. We can write the probability of the sentence as:

$$ p(x_1,x_2,\dots,x_n):=p(x)=p(x_1|x_0)p(x_2|x_1,x_0)\dots p(x_n|x_{n-1},\dots,x_0) $$

Since $x_0$ is a constant (the `<start>` token), we can drop it from $p(x_1|x_0)$ and the later factors:

$$ p(x)=p(x_1)p(x_2|x_1)p(x_3|x_{1:2})p(x_4|x_{1:3})\dots p(x_n|x_{1:n-1}) $$
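As a small illustrative sketch (the numbers below are made up, not from any real model), the chain rule lets us score a whole sentence by summing the log of each per-token conditional $p(x_k|x_{1:k-1})$:

```python
import math

# Hypothetical per-token conditional probabilities p(x_k | x_{1:k-1})
# for the sentence "the cat sat". Values are illustrative only.
conditionals = [
    0.20,  # p(x_1 = "the")
    0.05,  # p(x_2 = "cat" | "the")
    0.10,  # p(x_3 = "sat" | "the", "cat")
]

# Chain rule: p(x) = prod_k p(x_k | x_{1:k-1}),
# computed in log space for numerical stability.
log_p_x = sum(math.log(p) for p in conditionals)
print(f"log p(x) = {log_p_x:.4f}, p(x) = {math.exp(log_p_x):.6f}")
```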

Can we extend our language model? For example, can we "language model" images?

| before | now |
| --- | --- |
| time step = word | time step = pixel |
| $x$ is a sentence | $x$ is an array of pixels |
| $x_k$ is the $k$-th token | $x_k$ is the $k$-th pixel |
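To "language model" an image we need to order its pixels into a sequence. A common choice (assumed here, in the spirit of raster-scan autoregressive models) is to flatten the 2-D grid row by row so each pixel plays the role of a token:

```python
import numpy as np

# A tiny 4x4 grayscale "image" with 8-bit pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4))

# Raster-scan flattening: row by row, left to right.
# x_k is now the k-th pixel, exactly like the k-th token of a sentence.
pixel_sequence = image.reshape(-1)
print(pixel_sequence.shape)  # (16,) -- a "sentence" of 16 pixel tokens
```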

Autoregressive generative models

Main principle for training:

  1. Divide up $x$ into dimensions $x_1,\dots,x_n$
  2. Discretize each $x_i$ into $k$ values
  3. Model $p(x)$ via the chain rule, where each factor $p(x_i|x_{1:i-1})$ is a softmax over the $k$ values
  4. Use your favorite sequence model to model $p(x)$ (see the sketch below)
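
Below is a minimal PyTorch-flavoured sketch of these four steps; the module and hyperparameter names are assumptions for illustration, not from this post. Each pixel is discretized into $k = 256$ values, an LSTM serves as the "favorite sequence model", and every position outputs a softmax over the $k$ values, trained with cross-entropy (i.e. maximum likelihood under the chain-rule factorization).

```python
import torch
import torch.nn as nn

K = 256        # number of discrete values per pixel (step 2)
SEQ_LEN = 16   # number of dimensions x_1, ..., x_n (step 1), e.g. a 4x4 image

class TinyAutoregressiveModel(nn.Module):
    """Models p(x_i | x_{1:i-1}) with a softmax over K values (step 3),
    using an LSTM as the sequence model (step 4)."""

    def __init__(self, k=K, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(k + 1, hidden)  # +1 for a <start> token
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, k)          # logits over the k values

    def forward(self, x):
        # x: (batch, seq_len) integer pixel values in [0, K)
        start = torch.full((x.size(0), 1), K, dtype=torch.long)
        inp = torch.cat([start, x[:, :-1]], dim=1)  # shift right: condition on x_{1:i-1}
        h, _ = self.lstm(self.embed(inp))
        return self.head(h)                         # (batch, seq_len, K) logits

model = TinyAutoregressiveModel()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch of "images" flattened to pixel sequences.
x = torch.randint(0, K, (8, SEQ_LEN))

logits = model(x)
loss = nn.functional.cross_entropy(logits.reshape(-1, K), x.reshape(-1))
loss.backward()
optim.step()
print(f"negative log-likelihood per pixel: {loss.item():.3f}")
```

Minimizing this cross-entropy is exactly maximizing $\sum_i \log p(x_i|x_{1:i-1})$, i.e. the log of the chain-rule factorization above.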