2023-07-10
🚧 WIP 🚧
Probabilistic models
We often estimate $p(x)$ as, for example, a multivariate normal distribution. Why would we want to model $p(x)$ at all?
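As a concrete illustration of the first point, here is a minimal sketch (with synthetic data; `multivariate_normal` is SciPy's) of estimating $p(x)$ as a multivariate normal: fit a mean and a covariance, then evaluate log-densities.

```python
# A minimal sketch: estimate p(x) as a multivariate normal.
# The data here is synthetic; in practice x would be your observations.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))       # 1000 samples, 3 dimensions

mu = data.mean(axis=0)                  # MLE of the mean
sigma = np.cov(data, rowvar=False)      # estimate of the covariance

p_x = multivariate_normal(mean=mu, cov=sigma)
print(p_x.logpdf(data[0]))              # log p(x) for one sample
```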
Consider a language model (built with the chain rule) that can generate a sentence $x$, where $x_k$ is the $k$-th token in the sentence. We can write the probability of the sentence as:
$$ p(x_1,x_2,\dots,x_n):=p(x)=p(x_1|x_0)p(x_2|x_1,x_0)\dots p(x_n|x_{n-1},\dots,x_0) $$
Since $x_0$ is a constant (the `<start>` token), we can drop it from the conditioning: $p(x_1|x_0)=p(x_1)$.
$$ p(x)=p(x_1)p(x_2|x_1)p(x_3|x_{1:2})p(x_4|x_{1:3})\dots p(x_n|x_{1:n-1}) $$
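A minimal sketch of this factorization in code: a real language model conditions each token on the whole prefix $x_{1:k-1}$, but a made-up bigram table (conditioning only on $x_{k-1}$) keeps the example tiny while the chain-rule product stays the same.

```python
# A minimal sketch of the chain-rule factorization with a toy bigram model,
# where p(x_k | x_{1:k-1}) is approximated by p(x_k | x_{k-1}).
# The probability table is made up purely for illustration.
import math

# p(next_token | previous_token); "<start>" plays the role of x_0
bigram = {
    ("<start>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.3,
}

def log_prob(tokens):
    """log p(x) = sum_k log p(x_k | x_{k-1})."""
    prev = "<start>"
    total = 0.0
    for tok in tokens:
        total += math.log(bigram[(prev, tok)])
        prev = tok
    return total

print(log_prob(["the", "cat", "sat"]))  # log(0.5 * 0.2 * 0.3)
```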
Can we extend our language model? For example, can we "language model" images?
before | now |
---|---|
time step = word | time step = pixel |
$x$ is a sentence | $x$ is an array of pixels |
$x_k$ is the $k$-th token | $x_k$ is the $k$-th pixel |
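A minimal sketch of this idea (as in PixelRNN/PixelCNN-style models): flatten the pixel grid into a sequence and factorize $p(x)$ over pixels with the chain rule. The `conditional` function below is a hypothetical stand-in for a learned model; here it is just uniform over 256 intensity values.

```python
# A minimal sketch of "language modeling" an image: flatten the pixel grid
# into a raster-scan sequence and factorize p(x) with the chain rule.
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4))  # toy 4x4 grayscale image
pixels = image.flatten()                   # raster-scan order: x_1, ..., x_n

def conditional(prefix):
    """Stand-in for p(x_k | x_{1:k-1}); a real model would be learned."""
    return np.full(256, 1.0 / 256)

log_p = 0.0
for k, value in enumerate(pixels):
    probs = conditional(pixels[:k])        # distribution over the k-th pixel
    log_p += np.log(probs[value])

print(log_p)                               # log p(x) under the toy model
```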
Main principle for training: maximum likelihood, i.e., maximize $\log p(x)$ over the training data (equivalently, minimize the negative log-likelihood).
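A minimal sketch of one training step under this objective, assuming the standard next-token cross-entropy loss; the model, data, and sizes are made up purely for illustration (the toy model conditions only on the previous token).

```python
# A minimal sketch of maximum-likelihood training for an autoregressive model:
# minimize the negative log-likelihood (cross-entropy) of the next token.
import torch
import torch.nn as nn

vocab, dim = 256, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randint(0, vocab, (8, 16))   # batch of token sequences
inputs, targets = x[:, :-1], x[:, 1:]  # predict x_k from x_{k-1}

logits = model(inputs)                 # (batch, seq, vocab)
loss = nn.functional.cross_entropy(    # NLL = -sum_k log p(x_k | ...)
    logits.reshape(-1, vocab), targets.reshape(-1)
)
loss.backward()
optimizer.step()
print(loss.item())
```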