All language subtitles for 2. The Diffusion Model explained

I already told you that we work with Stable Diffusion, and Stable Diffusion is a diffusion model. In this video we will take a look at what diffusion models are and what they do.

I have found a really, really nice article on Medium. The article is relatively long, but we won't go that far into it. All I need is this picture right here.

Let's assume we have a big, big computer and we train it on images like this one. We give the computer an image, for example of this beach, and we describe it with a text: maybe "a beach with a blue ocean, a blue sky, some green on the mountains" and so on. We are really, really specific.

After that we add some noise to the picture, like you see here, but we still describe what's in the picture: a beach, blue ocean, blue sky and so on. More noise, same text; more noise, same text; more noise, same text, until we get only noise.

In this process, the computer learns what these pictures look like. It learns that the words we gave it lead to this picture. So we can reverse this.
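The noising steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how Stable Diffusion is actually implemented: the linear noise schedule, the number of steps, and the helper names make_alpha_bar and add_noise are assumptions borrowed from the common DDPM-style formulation, and a random array stands in for the beach photo.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (an assumption, not from the video).

    alpha_bar[t] is the fraction of the original image's variance
    that is still present after t + 1 noising steps.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(image, t, alpha_bar, rng):
    """Jump straight to noise level t: mix the image with Gaussian noise."""
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * noise
    return noisy, noise

rng = np.random.default_rng(0)
# Stand-in for the beach photo: random pixel values scaled to [-1, 1].
image = rng.uniform(-1.0, 1.0, size=(64, 64, 3))
alpha_bar = make_alpha_bar()

# "More noise, same text": the caption stays fixed while t grows.
caption = "a beach, blue ocean, blue sky, some green on the mountains"
for t in (0, 250, 500, 999):
    noisy, _ = add_noise(image, t, alpha_bar, rng)
    print(f"step {t:4d}: share of original image left = {np.sqrt(alpha_bar[t]):.4f}")
```

At the last step almost nothing of the original image is left, which matches the "until we get only noise" end of the picture. During training, a model would see pairs of noisy image and caption and learn to predict the noise that was added.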
If we have only noise and we tell the computer "a beach, blue sky, blue ocean, some green on the mountains" and so on, the computer can reverse the process and make this picture out of the noise. This is a really, really cool concept.

Of course, we don't do this with just one picture. We try to give the computer every picture that we can find.

There are, of course, different diffusion models. For example, there's also Adobe Firefly, which is trained on pictures from Adobe Stock. Stable Diffusion is open source and free; everybody can use it. Stable Diffusion was trained on pictures from the internet, and because of this we can create nearly everything that is on the internet. We can even create celebrities, not-safe-for-work content and so on. Stable Diffusion is not restricted. Nearly everything that is on the internet we can create with Stable Diffusion if we give it the right prompts.

The prompts are the descriptions that we give the computer to make our picture. For that reason, it's really, really important to write good prompts, because we want good pictures. If we are not specific, we can create pictures that look like this. If we simply say "a beach", we will get a random beach.
If we tell it "a beach, blue ocean, blue sky" and so on, we will get a picture much closer to this one.

Here is a quick illustration of this process. Some people like this illustration, so I use it a lot. Just imagine you lie down on the ground and look at the sky. Beside you is your girlfriend or your boyfriend or whoever you want, and she says to you, "Can you see this cloud? It looks a little bit like an apple." But you don't get it. You don't see the apple. Then she tells you, "Of course, just look, here is the apple." And then you start to understand. You look at the cloud and now your eyes see an apple, because your brain is trained on apples; your brain most likely knows what an apple looks like. So you see the apple in the cloud, even though there is no apple there.

And if your girlfriend doesn't say that it's a green apple, maybe you think of a red apple. That's exactly why we need good prompt engineering: if we are not specific, we will get random pictures. If you want a green apple, you need to tell the computer that you want a green apple, just like your girlfriend needs to tell you that the apple in the clouds is green. If she doesn't tell you that, maybe you think of a red apple, maybe of a green apple, maybe even of a yellow apple. You don't know.
So you need to be specific.

In this video we took a quick look at the diffusion model. The diffusion model works in a simple way. It's trained on pictures and on text. Then noise gets added. In this process the computer learns what the pictures look like, and if we give the computer text afterwards, it can create such pictures: it starts from random noise and, step by step, arrives at the pixels that fit our description. I hope this makes sense to you.

In the next video we will take an even closer look, because Stable Diffusion is a bit special: we can use different checkpoints, LoRAs, seeds and so on.
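The reverse direction summarized above can also be illustrated with a small Python sketch. It is a toy under stated assumptions, not the real sampler: the noise schedule is the same hypothetical DDPM-style one, the helper estimate_clean_image is made up for this example, the text prompt is left out entirely, and the trained network is replaced by an "oracle" that already knows which noise was added. The point is only that a good noise prediction is enough to get back to the picture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear noise schedule (an assumption, not from the video).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

# Stand-in for a training picture, with pixel values in [-1, 1].
image = rng.uniform(-1.0, 1.0, size=(8, 8, 3))

# Forward direction: mix the picture with Gaussian noise at step t.
t = 500
true_noise = rng.standard_normal(image.shape)
noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * true_noise

def estimate_clean_image(noisy, predicted_noise, t):
    """Undo the mixing formula, given a guess of the noise that was added."""
    return (noisy - np.sqrt(1.0 - alpha_bar[t]) * predicted_noise) / np.sqrt(alpha_bar[t])

# Oracle: pretend the model predicts the noise perfectly.
recovered = estimate_clean_image(noisy, true_noise, t)
# Bad model: pretend the model predicts "no noise at all".
wrong_guess = estimate_clean_image(noisy, np.zeros_like(true_noise), t)

print("max error with perfect noise prediction:", np.abs(recovered - image).max())
print("max error with a useless prediction:   ", np.abs(wrong_guess - image).max())
```

In a real diffusion model the noise prediction comes from a neural network that also receives the prompt, and the picture is refined over many small steps instead of one jump. That is where a specific prompt steers the result.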