How AI Creates Images/Videos/Audio - Diffusion Models Explained

How does AI generate images, videos, and audio? After seeing the recent releases of generative diffusion models such as Luma’s Dream Machine, OpenAI’s Sora, and Stable Diffusion 3 Medium, I was wondering exactly that. To understand it better and to help other curious folks like myself, I’ve put together an intuitive breakdown of how diffusion models work and how image, video, and audio models differ.

To keep things from being purely theoretical, the second part of this resource shows diffusion models in action: using Stable Diffusion 3 Medium, Stable Video Diffusion img2vid, and Stable Audio Open 1.0 to generate an image, convert it into a video, and add an audio track, producing a clip generated entirely by diffusion models.

Colab Notebook: https://colab.research.google.com/dri...
Miro Board: https://miro.com/app/board/uXjVK6HcIX...

Additional Resources:
Blog by Kemal Erdem: https://erdem.pl/2023/11/stepbystep...
Blog by Lilian Weng: https://lilianweng.github.io/posts/20...

Chapters:
00:00 Introduction
01:05 Diffusion Overview
03:18 Step 1: Image Forward Diffusion
06:46 Step 2: Image Model Training
10:02 Step 3: Image Reverse Diffusion/Generation
11:58 Post Breakdown Overview
13:43 Audio Diffusion Models
15:54 Video Diffusion Models Part 1
17:29 Video Diffusion Models Handling Time & Space
20:07 Video Diffusion Models Part 2
21:33 How Diffusion Models Use Text Prompts
25:06 Diffusion Overview Recap
25:55 Code: Setting Up Colab
27:14 Code: Image Gen with Stable Diffusion 3 Medium
31:14 Code: Video Gen with Stable Video Diffusion img2vid
33:34 Code: Audio Gen with Stable Audio Open 1.0
36:57 Code: Combining Image, Video, & Audio
37:56 Outro
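
For readers who want a feel for the "Image Forward Diffusion" step listed in the chapters, here is a minimal NumPy sketch. It assumes the standard DDPM-style formulation with a linear noise schedule; the video and the Stability AI models it uses may work differently, and the function names, schedule values, and the 8x8 random "image" are all illustrative choices, not taken from the notebook.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, DDPM-style choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    # alpha_bar[t] is the fraction of the original signal variance kept at step t
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Jump straight to noise level t: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    # The sampled noise is what a denoising model would be trained to predict.
    return xt, noise

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a normalized image
x_early, _ = forward_diffuse(x0, 10, alpha_bars, rng)    # still close to x0
x_late, _ = forward_diffuse(x0, 999, alpha_bars, rng)    # almost pure noise

print("mean abs change at t=10: ", np.abs(x_early - x0).mean())
print("mean abs change at t=999:", np.abs(x_late - x0).mean())
```

Under these assumptions, training amounts to showing a model `xt` and `t` and asking it to predict `noise`, and generation runs the process in reverse, starting from pure noise and removing a little predicted noise at each step.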

#artificialintelligence #stablediffusion #diffusionmodel