Image generation models are causing a sensation worldwide, particularly the powerful Stable Diffusion technique 🔥. With Stable Diffusion, you can generate images with your laptop, previously impossible 🤩. Here's how diffusion models work, in plain

Thread

Image generation models are causing a sensation worldwide, particularly the powerful Stable Diffusion technique 🔥.

With Stable Diffusion, you can generate images with your laptop, previously impossible 🤩.

Here's how diffusion models work, in plain English:
-A Thread--
🧵

1/ Generating images involves two processes.

Diffusion adds noise gradually to the image until its unrecognizable, and a reversion diffusion process removes the noise.

The models then try to generate new images from the noise image.

2/ Denoising is done using convolutional neural networks such as U-NET.

A U-NET is made up of an encoder for creating the latent representation of the image and a decoder for creating an image from the low-level image representation.

3/ Noise is not removed from the image at once but is done gradually for the defined number of steps.

Removing noise step-by-step makes the process of generating images from pure noise easier.

Therefore, the goal is to improve upon the previous step.

4/ Generating the image in one step leads to a noisy image.

At each time step, a fraction of the noise and not the entire noise is removed.

The same concept is used in text-to-image generation, where you inject the textual information gradually instead of at once.

5/ The text information is added by concatenating the text representation from a language model on the image input and also through cross-attention.

Cross-attention enables the CNN attention layers to attend to the text tokens.

6/ Diffusion models are compute-intensive because of the number of steps involved in the denoising process.

This can be solved by training the network with small images and adding a network to upsample the result to larger images.

7/ Latent diffusion models (LDM) solve this problem by generating the image in the latent space instead of the image space.

LDMs create a low-dimensional image representation by passing it through an encoder network. Apply noise to the image representation instead of the image.

8/ The reverse diffusion process works with the low-dimensional image representation instead of the image itself.

This is a less compute-intensive process because the model is not working with the entire image. As a result, you can perform image generation on your laptop.

9/9 You can try out stable diffusion models on:

• @huggingface
huggingface.co/spaces/stabilityai/stable-diffusion with Stable Diffusion 2.1
• lexica.art
• dreamstudio.ai

That's it for today. Follow @themwiti for more posts on machine learning and deep learning.

Follow @themwiti for more content on machine learning and deep learning.

Mentions

See All

Akshay 🚀 @akshay_pachaar · Mar 14, 2023

Post
From Twitter

Well written thread !!

Jaydeep Karale @_jaydeepkarale · Mar 14, 2023

Post
From Twitter

Superb explanation

Thread by mwiti

Thread

Mentions