
Efficient Text-to-Image Training (16x cheaper than Stable Diffusion) | Paper Explained

Follow
Outlier

Würstchen is a diffusion model whose text-conditional model works in a highly compressed latent space of images. Why is this important? Compressing data can reduce computational costs for both training and inference by orders of magnitude. Training on 1024x1024 images is far more expensive than training on 32x32. Other works usually use a relatively small compression, in the range of 4x–8x spatial compression. Würstchen takes this to an extreme: through its novel design, we achieve a 42x spatial compression.
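To get a feel for why this matters, here is a small back-of-the-envelope sketch (not code from the paper) comparing how many spatial positions a diffusion model has to process at different compression ratios. The 42x figure comes from Würstchen; 4x and 8x are typical of other latent diffusion models.

```python
# Sketch: how spatial compression shrinks the latent a diffusion model
# operates on. Cost proxy: compute for convolutions/attention grows with
# the number of spatial positions, so fewer positions = cheaper training.

def latent_shape(image_size: int, compression: int) -> tuple[int, int]:
    """Spatial size of the latent for a square image at a given compression."""
    side = image_size // compression
    return (side, side)

def relative_cost(image_size: int, compression: int) -> float:
    """Fraction of full-resolution spatial positions remaining in the latent."""
    h, w = latent_shape(image_size, compression)
    return (h * w) / (image_size * image_size)

for c in (1, 4, 8, 42):
    h, w = latent_shape(1024, c)
    print(f"{c:>2}x compression -> {h}x{w} latent, "
          f"{relative_cost(1024, c):.5f} of full-resolution positions")
```

At 8x compression (as in Stable Diffusion's VAE), a 1024x1024 image becomes a 128x128 latent; at 42x it shrinks to roughly 24x24, leaving only a tiny fraction of the spatial positions to diffuse over.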

If you want to dive deeper into Würstchen, here are the links to the paper & code:
Arxiv: https://arxiv.org/abs/2306.00637
Huggingface: https://huggingface.co/docs/diffusers...
Github: https://huggingface.co/dome272/wuerst...

We also created a community Discord for people interested in Generative AI:
