019 Deep Q-Learning Intuition - Acting

Hello and welcome back to the course on AI. In the previous part we talked about the deep Q-learning intuition. We started there, and in fact we got all the way to this part, where we talked about learning. Now we're going to move on to the actual acting part. So there are two distinct parts that we have to remember. That's the learning part; he's done all of this, that's beautiful. Now he actually has to take an action. He has to decide what he is going to do: is he going to do action one, two, three or four? And so how does he do that?

Well, the way he does it is with those same values. The values don't change: once we have these values, we've compared them with the targets, calculated the loss, backpropagated the error and updated the weights, but the values don't change in that whole process. So we've got the Q-values there. They're fixed. We know what they are. All this happens, the network's updated, and now, using those same values that we had, what we're going to do is pass them through a softmax function. And again, softmax is described, I think, in Annex 2, and we'll talk a bit more about softmax later.
Or rather, we'll talk about this action selection policy further down in the rest of this section, in just a few tutorials. But for now we're just going to say we're passing the values through a softmax function. Basically, what it does is help select the best action possible. There's a small caveat to that: it's not just the best one possible. We'll talk about that in the action selection policy tutorial. But for now let's just say it selects the best action. From here it says, OK, so Q1, you know, the likelihood... Basically, we know the Q-values; the network has predicted them, so it can look at them and pick the highest Q-value, just as we did in the simple Q-learning algorithm: I'll just look at all four of these, see that the highest value is this one, and select that action. And that's pretty much it. That's how he chooses which action to take. He takes the action, and then all of this process happens again for the next state the agent ends up in: in our case the next square of the maze, but generally speaking the next state.

So there we go. That's how we feed a reinforcement learning problem into a neural network: through a vector describing the state that we're in.
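The acting step described above can be sketched in a few lines of plain Python. This is only an illustration, not the course's code: the four Q-values are made up, and the function names and the `temperature` parameter are assumptions introduced here. It shows both readings from the lecture: picking the highest Q-value outright, and the softmax version, which mostly but not always picks the best action.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn a list of Q-values into probabilities that sum to 1."""
    # Subtracting the maximum first keeps exp() from overflowing.
    highest = max(q_values)
    exps = [math.exp((q - highest) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(q_values):
    """Index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sample_action(q_values, rng, temperature=1.0):
    """Draw an action at random, weighted by the softmax probabilities."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Made-up Q-values for the four actions (assumption, for illustration only).
q_values = [0.5, 2.0, 1.0, -0.3]
probs = softmax(q_values)
best = greedy_action(q_values)                      # action with the highest Q-value
sampled = sample_action(q_values, random.Random(0))  # usually, but not always, the best one
```

The sampled action follows the probabilities rather than always taking the maximum, which is one way to read the "small caveat" the lecture defers to the action selection policy tutorial.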
Once we've fed it in, there are two parts of the process that happen. Part one is the learning. Remember that part where we compare each of the Q-values with the target, and then we backpropagate the loss through the network to update the weights, so that our network is learning as we go through this maze, or through this environment.

The second part is of course that we have to act. We have to select an action, and that is where we pass the values through a softmax function, or basically an action selection policy, which we'll talk about further down. Then we simply select the action that we want to take, we perform that action, and this whole process starts again.

And then maybe the agent passes the game, maybe he doesn't. In any case the game ends. And then once again the whole process repeats: the agent plays the whole game again, and then that stops. So basically that's another epoch. Every time the game ends, with a failure or a victory, that's the end of an epoch. And then he starts again, and again, and so on.

So this process happens every single time the agent is in a new state. The state is encoded here, so that's important: not just for every single game that he plays, but for every single state.
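To make the "every state, every game" structure concrete, here is a rough sketch of the loop under heavy assumptions. The corridor environment, the constants `GAMMA` and `LR`, and all function names are hypothetical, and a small table of Q-values stands in for the neural network so the example runs without any libraries. The target used is the standard Q-learning target (reward plus discounted best next Q-value); the lecture itself does not spell it out in this part. In this sketch the update for a step is applied right after the action is taken, since the target needs the next state.

```python
import math
import random

N_STATES, N_ACTIONS = 5, 2  # hypothetical corridor: squares 0..4; 0 = left, 1 = right
GAMMA, LR = 0.9, 0.5        # assumed discount factor and learning rate


def step(state, action):
    """Toy environment: the game ends when the agent reaches the last square."""
    move = 1 if action == 1 else -1
    next_state = min(N_STATES - 1, max(0, state + move))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def select_action(q_values, rng):
    """Acting: sample an action from a softmax over the Q-values."""
    highest = max(q_values)
    weights = [math.exp(q - highest) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def run(n_games, rng, max_steps=10_000):
    # A table stands in for the network's weights in this sketch.
    q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    steps_per_game = []
    for _ in range(n_games):                   # one pass = one full game
        state, steps, done = 0, 0, False
        while not done and steps < max_steps:  # repeated for every single state
            action = select_action(q_table[state], rng)  # acting
            next_state, reward, done = step(state, action)
            # Learning: move the predicted Q-value toward its target.
            target = reward if done else reward + GAMMA * max(q_table[next_state])
            q_table[state][action] += LR * (target - q_table[state][action])
            state, steps = next_state, steps + 1
        steps_per_game.append(steps)
    return q_table, steps_per_game


q_table, steps_per_game = run(n_games=20, rng=random.Random(0))
```

The outer loop corresponds to the games the agent replays, the inner loop to the individual states, with one acting step and one learning step per state visited.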
So he's in a state, it goes through this process, the network updates, and so on, and that happens every single time. And so the learning happens and the acting happens as well.

So that is deep Q-learning, the intuition behind deep Q-learning. We've got lots more to cover off, and then of course the practical. In the meantime, if you'd like to get some additional information on deep Q-learning, we've got a recommended reading. We've already spoken about Arthur Juliani's series of blog posts. If you look at Simple Reinforcement Learning with Tensorflow Part 4, you will find the part that's relevant to what we discussed today.

Note that here he talks about convolutions. We are not covering convolutions in this section; we're going to be talking about them in the next section of the course. So just skip the convolutions part for now. The difference is that with convolutions the agent is looking at an image, and therefore he has to process an image, which is an additional complication. We're slowly, gradually building up to that. For now we're encoding our environment, or rather the state the agent is in, as a vector. In our case it was a very simple vector of values.
Sometimes, even in that simple maze, as you'll see from this blog post, people prefer the one-hot encoded version of that state. So basically every single box of the maze has an element in the vector: in our case it would be a vector of 12 values, three by four, and each one is either 1 or 0 depending on which box you're in in the environment. So in whichever way you decide to encode your environment and the state of your environment, the encoding is basically a vector. The key here is that it's not a convolution: it's not an image, and there's no convolution involved. That part will come later. For us it starts over here, and that just simplifies the process for us to gradually understand better.

And of course, don't forget that this post is written in TensorFlow, and we're using PyTorch in our tutorials. So hopefully you enjoyed this quick intro into deep, not yet deep convolutional, deep Q-learning. On that note, I look forward to seeing you next time. Until then, enjoy artificial intelligence.
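The one-hot encoding mentioned above, for the three-by-four maze, might look like the following. This is a minimal sketch: the function name and the row-by-row ordering of the boxes are assumptions made here, not something the lecture specifies.

```python
def one_hot_state(row, col, n_rows=3, n_cols=4):
    """Encode the agent's box in the maze as a vector with a single 1."""
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("position is outside the maze")
    vector = [0] * (n_rows * n_cols)
    # Boxes are numbered row by row (an assumed convention).
    vector[row * n_cols + col] = 1
    return vector


# Agent in the second row, third column (zero-based indices 1 and 2).
state = one_hot_state(row=1, col=2)
```

A vector like this is what would be fed to the input layer of the network, one input per box of the maze.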