065 How do Neural Networks learn (English subtitles)

Hello and welcome back to the course on deep learning. Now that we've seen neural networks in action, it's time for us to find out how they learn. So let's get right into it.
There are two fundamentally different approaches to getting a program to do what you want it to do. One is hard coding, where you actually tell the program specific rules and what outcomes you want. You guide it the whole way, and you account for all the possible options that the program has to deal with. On the other hand, you have neural networks, where you create a facility for the program to be able to understand what it needs to do on its own. So you basically create this neural network, you provide it inputs, you tell it what you want as outputs, and then you let it figure everything out on its own.
Two fundamentally different approaches, and that is something to keep in mind as we go through these tutorials. Our goal is to create this network which then learns on its own. We're going to avoid trying to put in the rules. A good example that I can give you right now will come further in the course, but it's just a very visual example. For instance:
How do you distinguish between a dog and a cat? In the process depicted on the left, you would program things like: the cat's ears have to be like this, look out for whiskers, look out for this type of nose, look out for this shape of face, look out for these colors. You'd describe all these things and you'd have conditions like: if the ears are pointy, then cat; if the ears are sloping down, then possibly dog; and so on.
On the other hand, for a neural network you just code the architecture, and then you point the neural network at a folder with images of cats and dogs which are already categorized, and you tell it: OK, I've got some images of cats and dogs, go and learn what a cat is, go and learn what a dog is. And the neural network will on its own understand everything it needs to understand. Then, further down, once it's trained up, when you give it a new image of a cat or a dog it will be able to understand what it is.
So there they are, those are the two fundamentally different approaches. And today we're going to slowly start getting into how that second approach works.
All right, so let's get straight to it. Here we have a very basic neural network with one layer. It's called a single-layer feedforward neural network, and it is also called a perceptron.
Now, before we proceed, one thing that we do need to adjust is that output value. Right now you can see that it's just a y. We need to put a y hat in there. The reason for that is that y usually stands for the actual value, and that's what we're going to be using. So y is going to be the actual value, which we see in reality, and the output value is the value predicted by the algorithm, by the neural network. Y hat is the output value; basically, that's the notation for the output value.
The perceptron was first invented in 1957 by Frank Rosenblatt, and his whole idea was to create something that can actually learn and adjust itself. And this is what we're going to be looking at now. So we've got our perceptron drawn. Let's see how our perceptron learns.
Let's say we have some input values that have been supplied to the perceptron, or basically to our neural network. Then the activation function is applied. We have an output, and now we're going to plot the output on a chart. So there it is, our output y hat. Now, in order to be able to learn, we need to compare the output value to the actual value that we want the neural network to get right. And that is the value y.
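
As a rough illustration of the step just described (inputs go in, an activation function is applied, y hat comes out), here is a minimal Python sketch. It assumes the standard perceptron computation, a weighted sum of the inputs passed through an activation function. The names `forward` and `identity`, the input values, and the weights are all hypothetical and chosen only for illustration; the lecture does not specify them.

```python
def identity(z):
    """Placeholder activation (an assumption): returns the weighted sum unchanged."""
    return z


def forward(inputs, weights, activation=identity):
    """Compute y hat: weighted sum of the inputs, then the activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


# Hypothetical input values and weights, not taken from the lecture.
inputs = [0.5, 0.8, 0.2]
weights = [0.4, 0.3, 0.9]

y_hat = forward(inputs, weights)  # the predicted value, to be compared with y
print(y_hat)
```

Any other activation function could be passed in place of `identity`; the comparison with the actual value y happens after this step.
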
And so if we put it here, you'll see that there's a bit of a difference. Now we're going to calculate a function called the cost function, which is calculated as one half of the squared difference between the actual value and the output value. There are many ways you can come up with a cost function; there are many different cost functions that you can use. This is probably the most commonly used cost function, and why it is specifically this function that we use, we'll find out further down when we're talking about gradient descent. But for now we're just going to agree that this is the cost function. Basically, what the cost function is telling us is the error that you have in your prediction. Our goal is to minimize the cost function, because the lower the cost function, the closer y hat is to y.
OK, so as long as we agree on that, let's proceed. From here, once we've compared, we're going to feed this information back into the neural network. So there we go, there's the information going back into the neural network. It goes to the weights, and the weights get updated. Basically, the only thing that we have control of in this very simple neural network is the weights: w1, w2, and so on through the last weight.
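
The cost function described above, one half of the squared difference between y hat and y, can be sketched in a few lines of Python. The function name `cost` and the numbers are hypothetical; the point is only that a prediction closer to y gives a lower cost.

```python
def cost(y_hat, y):
    """Half the squared difference between the predicted and the actual value."""
    return 0.5 * (y_hat - y) ** 2


y = 0.8  # hypothetical actual value

far = cost(0.50, y)   # prediction far from y
near = cost(0.75, y)  # prediction close to y

print(far, near)
print(near < far)  # a closer prediction has a lower cost
```

When y hat equals y exactly, the cost is zero, which is the smallest value this function can take.
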
And our goal is to minimize the cost function, so all we can do is update the weights. So we update the weights and tweak them a little bit. How exactly, we'll find out further down, but for now we agree that we update the weights and then we continue.
Here I've put up this screenshot of the data just to make one point very clear: right now, throughout this whole experiment, we're dealing with just one row. We have a dataset of one row. The variable that we're predicting is the result you're going to get on an exam. The independent variables that we have are how many hours you studied, how many hours you slept, and what you got on the mid-semester quiz. So in the middle of the semester there's a quiz; what percentage did you get there? Based on those variables we're trying to predict what score you'll get on the exam. The exam result, the 93 percent, is the actual value. So that's y.
So we feed these three values into the neural network again, for the second time now, and then we're going to be comparing the result to y. Let's see how this works. We feed these values into the neural network.
Everything gets adjusted and the weights get adjusted. As you can see, we feed the values in again. The point here is that we're feeding in the same row; we only have one row, and we're training on one row. This is because this is just a very simple, basic example. Then we'll see what happens when there are more rows.
So again we feed this row in, and our cost function gets adjusted. Everything happens along those lines again. As you can see, every time our y hat is changing because we've tweaked the weights. Our y hat is changing, our cost function is changing, and this whole loop runs again. So we feed those values in; y hat is changing, the cost function is changing. The information is fed back to the weights so that the weights get adjusted again. We feed in the same values every time, everything gets adjusted, and it goes back to the weights. And one more time, feed in. OK. And another time: we adjust the weights, we feed in the information. And there we go. This time y hat is equal to y, and the cost function is 0. Usually you won't get a cost function equal to zero, but this is a very simple example.
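
The loop described above (feed in the same row, compare y hat to y, adjust the weights, repeat) might look roughly like this in Python. The lecture defers how exactly the weights are updated, so the update rule here is an assumption: a plain gradient descent step with an identity activation. The function name `train_on_one_row`, the learning rate, the step count, the starting weights, and the scaled input values are all hypothetical; only the 93 percent target comes from the example.

```python
def train_on_one_row(inputs, y, weights, learning_rate=0.1, steps=200):
    """Repeat: predict y hat, record the cost, nudge the weights."""
    history = []
    for _ in range(steps):
        # Forward pass with an identity activation (an assumption).
        y_hat = sum(x * w for x, w in zip(inputs, weights))
        history.append(0.5 * (y_hat - y) ** 2)
        error = y_hat - y
        # Gradient of 0.5 * (y_hat - y)^2 with respect to each weight is error * x.
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    return weights, history


# Hypothetical scaled inputs: hours studied, hours slept, quiz score.
row = [0.6, 0.8, 0.75]
weights, history = train_on_one_row(row, y=0.93, weights=[0.1, 0.1, 0.1])

print(history[0])   # cost before any update
print(history[-1])  # cost after many passes over the same row
```

With a single row there are more weights than data points, so the cost can be driven essentially to zero, which matches the remark that this is unusual and only happens because the example is so simple.
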
So hopefully all that made sense. Every time, we feed exactly that same row into our neural network, because in this case we're dealing with just that one row. Then the values get multiplied by the weights, an activation function is applied, we get y hat, y hat is compared to y, and we see how the cost function has changed. We feed that information back into the neural network and the weights get adjusted again. And then we repeat the same process again with the same exact row. We're trying to minimize that cost.
Up until now we've been dealing with just that one row. Let's see what happens when you have multiple rows. Here's the full dataset. We have eight rows; maybe these are different students taking the same exam: how many hours they studied, how many hours they slept before the exam, what they got on the quiz, and their final result on the test.
As you can see here on the left, I've got eight of these perceptrons. Actually, they are all the same perceptron, so this is also important. I just duplicated it eight times so that we can see it conceptually. But the important thing here is that it's the same neural network; we're going to be feeding these rows into one and the same neural network.
So let's get started. One epoch, as you'll hear Hadelin mentioning, is when we go through a whole dataset and we train our neural network on all of these rows. So there's our first row, and there's y hat for the first row. There's the second row; there's y hat for the second row. Again, it's being fed into the same neural network every time. I just copied it several times so we can visually see how this is happening. Then again: there's the third row, the fourth row, there is our y hat for the fourth row, and so on. Then we get the values for the remaining four rows as well. So every time we feed a row into our neural network, we get an output.
Then we compare to the actual values. So there are the actual values; for every single row we have an actual value. And now, based on all of these differences between y hat and y, we can calculate the cost function, which is the sum of all of those squared differences between y hat and y, all of that halved. And there's our cost function. Now, after we have the full cost function, we go back and we update the weights; we update w1, w2, w3.
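
The cost over a whole epoch, as described above, is the sum of the squared differences between y hat and y across all rows, halved. A minimal Python sketch follows. The names `predict` and `epoch_cost`, the identity activation, and the data are hypothetical; the lecture's dataset has eight rows, while this sketch uses three made-up rows to stay short.

```python
def predict(inputs, weights):
    """y hat for one row: weighted sum with an identity activation (an assumption)."""
    return sum(x * w for x, w in zip(inputs, weights))


def epoch_cost(rows, targets, weights):
    """Half the sum of squared differences, using one shared weight list for every row."""
    return 0.5 * sum((predict(x, weights) - y) ** 2 for x, y in zip(rows, targets))


# Hypothetical scaled rows: hours studied, hours slept, quiz score.
rows = [
    [0.6, 0.8, 0.75],
    [0.2, 0.9, 0.50],
    [0.9, 0.5, 0.85],
]
targets = [0.93, 0.60, 0.90]  # hypothetical actual exam results
weights = [0.1, 0.1, 0.1]     # the same weights are used for every row

total = epoch_cost(rows, targets, weights)
print(total)
```

Note that `weights` is passed once and reused for every row, which mirrors the point that the eight drawn perceptrons are really one network.
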
The important thing to remember here is that all of these perceptrons, all of these neural networks, are actually one neural network. There are not eight of them; there's just one. When we update the weights, we're going to update the weights in that one neural network, so the weights are going to be the same for all of the rows. It's not the case that every row has its own weights. No, all the rows share the weights. That's why we looked at the cost function, which is the sum of the squared differences, and then we updated the weights. That was just one iteration.
Next, we're going to run this whole thing again. We're going to feed every single row into the neural network, find out our cost function, and do this whole process again. Just as we saw previously, where we had just one row and we were doing everything again and again and again, it's the same thing here. But now we're going to be doing it on eight rows, or 800 rows, or eight thousand rows, however many rows you have in your dataset. You do this process, and then you calculate the cost function.
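
Putting the pieces together, repeated passes over the whole dataset with one shared set of weights could be sketched as below. As before, the lecture has not yet said how the weights are updated, so the gradient descent step is an assumption, as are the function name `train`, the identity activation, the learning rate, the number of epochs, and the made-up data.

```python
def train(rows, targets, weights, learning_rate=0.05, epochs=500):
    """Each epoch: feed every row in, compute the total cost, update the shared weights once."""
    costs = []
    for _ in range(epochs):
        # Prediction error (y hat minus y) for every row, all using the same weights.
        errors = [
            sum(x * w for x, w in zip(row, weights)) - y
            for row, y in zip(rows, targets)
        ]
        costs.append(0.5 * sum(e ** 2 for e in errors))
        # Gradient of the total cost: each row contributes error * input to each weight.
        gradient = [
            sum(e * row[j] for e, row in zip(errors, rows))
            for j in range(len(weights))
        ]
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights, costs


# Hypothetical scaled rows: hours studied, hours slept, quiz score.
rows = [
    [0.6, 0.8, 0.75],
    [0.2, 0.9, 0.50],
    [0.9, 0.5, 0.85],
    [0.4, 0.7, 0.60],
]
targets = [0.93, 0.60, 0.90, 0.70]  # hypothetical actual exam results

weights, costs = train(rows, targets, weights=[0.1, 0.1, 0.1])
print(costs[0], costs[-1])  # the cost at the first and at the last epoch
```

With several rows sharing three weights, the cost generally goes down from epoch to epoch but does not have to reach zero.
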
The goal here is to minimize the cost function. As soon as you've found the minimum of the cost function, that is your final neural network. That means your weights have been adjusted and you have found the optimal weights for the dataset that you ran your training on, and you're ready to proceed to the testing phase or to the application phase. This whole process is called backpropagation.
Some additional reading that you might want to do on the cost function: I know we just talked about one, and there are many different ones. A good article is located on Cross Validated. It's called "A list of cost functions used in neural networks, alongside applications". The URL is there, but you can just google that exact search phrase, and this one will be the first one that pops up. It's got some good examples and use cases for different cost functions, so if you're interested in learning more about cost functions, check out this article.
On that note, I hope you enjoyed this tutorial. I look forward to seeing you next time. Until then, enjoy deep learning.