2. The Diffusion Model explained

I already told you: we work with Stable Diffusion, and Stable Diffusion is a diffusion model. In this video we will take a look at what diffusion models are and what they do.

I have found a really, really nice article on Medium. The article is relatively long, but we won't go that far. All I need is this picture right here.

Let's assume we have a big, big computer and we train it on images like this. We give the computer images, for example of this beach, and we describe each one with a text. We give the computer the image and we say something like: a beach with a blue ocean, a blue sky, some green on the mountains, and so on. We are really, really specific.

After that we add some noise to the picture, like you see here, but we still describe what's in the picture: a beach, blue ocean, blue sky, and so on. More noise, same text; more noise, same text; more noise, same text, until you get only noise.

In this process, the computer learns what these pictures look like. It learns that the words you gave it lead to this picture. So we can reverse this.
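To make the "more noise, same text" steps a bit more concrete, here is a minimal sketch in plain Python. The video does not say how much noise gets added at each step, so the linear schedule, the step count, and the names `make_alpha_bars` and `add_noise` are assumptions for illustration only. The random list just stands in for a tiny image. In the usual formulation of diffusion models, which the video does not go into, the network is trained to predict the noise that was added.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Share of the original image left after each step (assumed linear noise schedule)."""
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta  # every step keeps a little less of the picture
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, steps_done, alpha_bars, rng):
    """Return the picture as it looks after `steps_done` noising steps (computed in one jump)."""
    kept = alpha_bars[steps_done - 1]
    signal_scale = math.sqrt(kept)
    noise_scale = math.sqrt(1.0 - kept)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
image = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for a tiny 8x8 picture
alpha_bars = make_alpha_bars()

for steps in (1, 250, 500, 1000):
    noisy = add_noise(image, steps, alpha_bars, rng)
    print(f"after {steps:4d} steps: share of original picture left = {alpha_bars[steps - 1]:.5f}")
```

The text description never changes across these steps. Only the share of the original picture that survives goes down, until practically nothing but noise is left.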
If we have only noise and we tell the computer "a beach, blue sky, blue ocean, some green on the mountains, and so on", the computer can reverse the process and make this picture out of the noise. This is a really, really cool concept.

And of course we don't do this with just one picture. We try to give the computer every picture that we can find.

There are of course different diffusion models. For example, there's also Adobe Firefly. Adobe Firefly is trained on pictures from Adobe Stock. Stable Diffusion is open source and free; everybody can use it. Stable Diffusion was trained on pictures from the internet, and because of this we can create nearly everything that is on the internet. We can even create celebrities, not-safe-for-work content, and so on. Stable Diffusion is not restricted. Nearly everything that is on the internet we can create with Stable Diffusion if we give the right prompts.

The prompts are the descriptions that we give the computer to make our picture. That's why it's really, really important to write good prompts, because we want good pictures. If we are not specific, we won't reliably get pictures that look like this. If we simply say "a beach", we will get a random beach.
If we tell it "a beach, blue ocean, blue sky, and so on", we will get pretty much this picture.

Here is a quick illustration of this process. Some people like it, and I use it a lot. Just imagine you lie down on the ground and look at the sky. Beside you is your girlfriend or your boyfriend or whoever you want, and she says to you, "Can you see this cloud? It looks a little bit like an apple." But you don't get it. You don't see the apple. Then she tells you, "Of course, just look, here is the apple." And then you start to understand. You look at the cloud and now your eyes see an apple, because your brain is trained on apples; your brain most likely knows what an apple looks like. So you see the apple in the cloud, even though there is no apple there.

And if your girlfriend doesn't say that it's a green apple, maybe you think of a red apple. That's exactly why we need good prompt engineering: if we are not specific, we will get random pictures. If you want a green apple, you need to tell the computer that you want a green apple, just like your girlfriend needs to tell you that the apple in the clouds is green. If she doesn't tell you that, maybe you think of a red apple, maybe a green apple, maybe even a yellow apple. You don't know.
So you need to be specific.

In this video we took a quick look at the diffusion model. The diffusion model works simply: it's trained on pictures and on text, then noise gets added. In this process the computer learns what these pictures look like, and if we give the computer text afterwards, it can create such pictures by starting from random noise and adjusting the pixels step by step until they fit our description. I hope this makes sense to you.

In the next video we will take an even closer look, because Stable Diffusion is a bit special: we can use different checkpoints, LoRAs, seeds, and so on.
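The reverse direction, from noise plus text to a picture, can be sketched as a toy loop. This is not how Stable Diffusion is implemented. The dictionary `KNOWN_IMAGES` is a hypothetical stand-in for what a trained network has learned, `predict_noise` is an oracle rather than a neural network, and the cosine-shaped schedule in `signal_kept` is an assumption. The only point is the shape of the loop: start from random noise, ask for the noise given the prompt, remove a bit of it, and repeat.

```python
import math
import random

# Hypothetical stand-in for everything a trained model has learned:
# one tiny 4-pixel "image" per prompt.
KNOWN_IMAGES = {
    "a beach, blue ocean, blue sky": [0.1, 0.3, 0.9, 0.8],
    "a green apple": [-0.5, 0.7, -0.2, 0.0],
}

def signal_kept(t, num_steps):
    """Assumed schedule: share of the clean image left at step t (1.0 at t=0, near 0 at t=num_steps)."""
    return math.cos(0.5 * math.pi * t / (num_steps + 1)) ** 2

def predict_noise(x_t, t, num_steps, prompt):
    """Oracle in place of a neural network: works out the noise from the known clean image."""
    kept = signal_kept(t, num_steps)
    clean = KNOWN_IMAGES[prompt]
    return [(x - math.sqrt(kept) * c) / math.sqrt(1.0 - kept) for x, c in zip(x_t, clean)]

def generate(prompt, num_steps=50, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in KNOWN_IMAGES[prompt]]  # start from random noise
    for t in range(num_steps, 0, -1):
        kept_now = signal_kept(t, num_steps)
        kept_next = signal_kept(t - 1, num_steps)
        noise = predict_noise(x, t, num_steps, prompt)
        # Remove the predicted noise to estimate the clean image ...
        clean_est = [(xi - math.sqrt(1.0 - kept_now) * n) / math.sqrt(kept_now)
                     for xi, n in zip(x, noise)]
        # ... then put back the smaller amount of noise that belongs to step t-1.
        x = [math.sqrt(kept_next) * c + math.sqrt(1.0 - kept_next) * n
             for c, n in zip(clean_est, noise)]
    return x

picture = generate("a green apple")
print([round(p, 3) for p in picture])
```

Because the oracle already knows the clean image, the loop lands on the stored picture for whichever prompt it is given. A real model only has an approximate, learned idea of what fits the text, which is why a vague prompt gives a random result and a specific prompt gives a more predictable one.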
