1
00:00:00,200 --> 00:00:01,580
I already told you.
2
00:00:01,610 --> 00:00:03,680
We work with Stable Diffusion.
3
00:00:03,680 --> 00:00:05,570
And stable diffusion is.
4
00:00:05,840 --> 00:00:07,550
It's a diffusion model.
5
00:00:07,550 --> 00:00:12,890
In this video we will take a look at what diffusion models are and what they do.
6
00:00:12,890 --> 00:00:16,940
So I have found a really, really nice article on Medium.
7
00:00:16,940 --> 00:00:21,740
This article is relatively long, but we won't go that far into it.
8
00:00:21,920 --> 00:00:24,890
All I need is this picture right here.
9
00:00:24,890 --> 00:00:33,320
Let's assume we have a big, big computer and we train our computer on images like this.
10
00:00:33,320 --> 00:00:41,300
So we give the computer images, for example, of this beach, and we describe it with text.
11
00:00:41,300 --> 00:00:50,360
We give the computer the image and we say maybe a beach with the blue ocean, blue sky, there's some
12
00:00:50,360 --> 00:00:52,370
green on the mountains and so on.
13
00:00:52,370 --> 00:00:55,070
We are really, really specific.
14
00:00:55,250 --> 00:01:02,840
After that we add some noise to the picture, like you see here, but we still describe what's on the
15
00:01:02,840 --> 00:01:03,560
picture.
16
00:01:03,560 --> 00:01:08,780
So a beach, blue ocean, blue sky and so on.
17
00:01:08,780 --> 00:01:16,730
More noise, same text, more noise, same text, more noise, same text until you get only noise.
18
00:01:17,210 --> 00:01:23,480
In this process, the computer learns what these pictures look like.
19
00:01:23,480 --> 00:01:32,540
In this process, it simply understands that the words that you gave the computer lead to this picture.
20
00:01:32,540 --> 00:01:35,210
So we can reverse this.
21
00:01:35,210 --> 00:01:44,030
If we have only noise and we tell the computer a beach, blue sky, blue ocean, there is some green
22
00:01:44,030 --> 00:01:45,920
on the mountains and so on.
23
00:01:45,920 --> 00:01:50,510
The computer can reverse this and make this picture out of the noise.
24
00:01:50,510 --> 00:01:54,980
This is a really, really cool concept.
25
00:01:54,980 --> 00:01:58,640
And of course we don't do this with just one picture.
26
00:01:58,640 --> 00:02:03,590
We try to give the computer every picture that we can find.
27
00:02:03,590 --> 00:02:06,950
And there are of course different diffusion models.
28
00:02:06,950 --> 00:02:10,490
For example, there's also Adobe Firefly.
29
00:02:10,610 --> 00:02:15,710
Adobe Firefly is trained on pictures from Adobe Stock.
30
00:02:15,710 --> 00:02:18,740
Stable Diffusion is open source and it's free.
31
00:02:18,740 --> 00:02:20,480
Everybody can use it.
32
00:02:20,480 --> 00:02:25,130
And Stable Diffusion was trained on pictures from the internet.
33
00:02:25,130 --> 00:02:31,940
And because of this, we can also create nearly everything that is on the internet.
34
00:02:31,940 --> 00:02:34,580
We can even create celebrities.
35
00:02:34,580 --> 00:02:38,780
We can create not safe for work stuff and so on.
36
00:02:38,780 --> 00:02:42,020
Stable diffusion is not restricted.
37
00:02:42,050 --> 00:02:49,070
Nearly everything that is on the internet we can create with Stable Diffusion if we give the right prompts.
38
00:02:49,070 --> 00:02:54,890
The prompts are the descriptions that we give the computer to make our picture.
39
00:02:54,890 --> 00:03:02,510
And for that reason, it's really, really important to write good prompts, because we need good pictures.
40
00:03:02,510 --> 00:03:07,790
If we are not specific, we can't create pictures that look like this.
41
00:03:07,820 --> 00:03:12,710
If we simply say, maybe, a beach, we will get a random beach.
42
00:03:12,710 --> 00:03:21,200
If we tell it a beach, blue ocean, blue sky and so on, we will get a picture very close to this one.
43
00:03:21,440 --> 00:03:28,490
So here is a quick illustration of this process. Some people like this illustration, and I use it a lot.
44
00:03:28,490 --> 00:03:33,110
Just imagine you lie down on the ground and you look at the sky.
45
00:03:33,140 --> 00:03:41,360
Beside you is your girlfriend or your boyfriend or whoever you want, and she says to you, can you
46
00:03:41,360 --> 00:03:42,830
see this cloud?
47
00:03:42,830 --> 00:03:46,970
It looks a little bit like an apple, but you don't get it.
48
00:03:46,970 --> 00:03:48,770
You don't see the apple.
49
00:03:48,950 --> 00:03:54,110
But then she tells you, of course, just look, here is the apple.
50
00:03:54,110 --> 00:03:56,540
And then you start to understand.
51
00:03:56,540 --> 00:04:05,240
You see the cloud and now your eyes see an apple because your brain is trained on apples, your brain
52
00:04:05,240 --> 00:04:08,630
most likely knows what an apple looks like.
53
00:04:08,630 --> 00:04:14,270
And then you see the apple in the cloud, even if there is no apple there.
54
00:04:14,270 --> 00:04:21,080
And if your girlfriend doesn't say that it's a green apple, maybe you think of a red apple.
55
00:04:21,080 --> 00:04:26,030
And that's exactly why we need to use good prompt engineering.
56
00:04:26,030 --> 00:04:31,130
Because if we are not specific, we will get random pictures.
57
00:04:31,130 --> 00:04:37,940
If you want to have a green apple, you need to tell the computer that you want to have a green apple,
58
00:04:37,940 --> 00:04:40,010
just like your girlfriend
59
00:04:40,010 --> 00:04:44,090
needs to tell you that the apple in the clouds is green.
60
00:04:44,090 --> 00:04:51,050
If she doesn't tell you that, maybe you think of a red apple, maybe of a green apple, maybe even
61
00:04:51,050 --> 00:04:53,180
a yellow apple. You don't know.
62
00:04:53,180 --> 00:04:55,970
So you need to be specific.
63
00:04:55,970 --> 00:04:59,780
So in this video we took a quick look at the diffusion
64
00:04:59,960 --> 00:05:00,320
model.
65
00:05:00,350 --> 00:05:02,930
The diffusion model works in a simple way.
66
00:05:02,930 --> 00:05:05,990
It's trained on pictures and on text.
67
00:05:06,020 --> 00:05:08,090
Then noise gets added.
68
00:05:08,090 --> 00:05:16,400
The computer learns in this process what this picture looks like, and if we give the computer text afterwards,
69
00:05:16,400 --> 00:05:26,870
it can create these pictures, because it starts from random noise and removes it step by step until the pixels fit our text.
70
00:05:26,870 --> 00:05:29,570
And I hope this makes sense to you.
71
00:05:29,570 --> 00:05:33,290
And in the next video we will take an even closer look.
72
00:05:33,290 --> 00:05:36,830
Because stable diffusion is a bit special.
73
00:05:36,830 --> 00:05:41,600
We can use different checkpoints, LoRAs, seeds, and so on.
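Note: the noising step described in the video (more noise, same text, until only noise is left) can be sketched in a few lines of Python. This is only a minimal illustration, not the actual Stable Diffusion training code. It assumes a linear noise schedule and the standard closed-form forward step used in DDPM-style diffusion models; the video does not say which schedule is used. The names make_alpha_bars and add_noise, the four-value "image", and the schedule parameters are hypothetical and chosen for illustration.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return the cumulative products of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its default values are assumptions for this sketch.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Mix the clean pixels with Gaussian noise for one noise level.

    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise
    """
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
image = [0.8, -0.2, 0.5, 0.1]  # hypothetical tiny "image", values in [-1, 1]
caption = "a beach, blue ocean, blue sky"  # the same text at every noise level
alpha_bars = make_alpha_bars(1000)

for step in (0, 250, 500, 999):
    noisy = add_noise(image, alpha_bars[step], rng)
    # The share of the original image (alpha_bar) shrinks as the step grows.
    print(step, round(alpha_bars[step], 4), [round(v, 2) for v in noisy], caption)
```

At the last step almost nothing of the original image is left, which matches the "only noise" end of the picture series. A real model is then trained to predict the added noise from the noisy image and the caption, so that it can run the process in reverse.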