1
00:00:00,840 --> 00:00:05,380
Hello and welcome back to the course on deep learning. Now that we've seen neural networks in action,
2
00:00:05,440 --> 00:00:08,280
it's time for us to find out how they learn.
3
00:00:08,470 --> 00:00:10,480
So let's get right into it.
4
00:00:10,510 --> 00:00:16,100
There are two fundamentally different approaches to getting a program to do what you want it to do.
5
00:00:16,240 --> 00:00:24,610
One is hard coding, where you actually tell the program specific rules and what outcomes you
6
00:00:24,610 --> 00:00:25,120
want.
7
00:00:25,120 --> 00:00:30,940
And you just guide it throughout the whole way and you account for all the possible options that the
8
00:00:30,940 --> 00:00:33,130
program has to deal with.
9
00:00:33,310 --> 00:00:41,320
On the other hand, you have neural networks, where you create a facility for the program to be able to
10
00:00:41,800 --> 00:00:43,530
understand what it needs to do on its own.
11
00:00:43,530 --> 00:00:50,080
So you basically create this neural network where you provide inputs, you tell it what you want as outputs,
12
00:00:50,110 --> 00:00:53,050
and then you let it figure everything out on its own.
13
00:00:53,350 --> 00:00:59,890
Two fundamentally different approaches and that is something to keep in mind as we go through these
14
00:00:59,890 --> 00:01:00,850
tutorials.
15
00:01:00,850 --> 00:01:06,180
Our goal is to create this network which then learns on its own.
16
00:01:06,220 --> 00:01:14,570
We're going to avoid trying to put in the rules. A good example that I can give you right now (this
17
00:01:14,680 --> 00:01:18,680
will come up further in the course, but it's a very visual example) is, for instance:
18
00:01:18,700 --> 00:01:25,690
how do you distinguish between a dog and a cat? In the process depicted on the left,
19
00:01:25,690 --> 00:01:33,250
you would program things like: the cat's ears have to be like this, look out for whiskers, look out for
20
00:01:33,250 --> 00:01:39,530
this type of nose, look out for this shape of face, look out for these colors. You'd
21
00:01:39,530 --> 00:01:45,310
describe all these things, and you'd have conditions like: if the ears are pointy, then cat; if the ears
22
00:01:45,310 --> 00:01:49,600
are sloping down, then possibly dog; and so on.
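To make the hard-coded approach concrete, here is a minimal sketch in Python. The feature name and its values are hypothetical; the only rules taken from the example are "pointy ears, then cat" and "sloping ears, then possibly dog".

```python
def classify_by_rules(ears: str) -> str:
    """Hand-written rules: every case has to be spelled out by the programmer."""
    if ears == "pointy":
        return "cat"
    if ears == "sloping":
        return "possibly dog"
    # Anything the rules did not anticipate falls through here.
    return "unknown"


print(classify_by_rules("pointy"))
print(classify_by_rules("sloping"))
```

Every new case (whiskers, nose, face shape, colors) would need another hand-written condition, which is exactly what the neural network approach tries to avoid.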
23
00:01:49,600 --> 00:01:55,090
On the other hand, for a neural network you just code the neural network, you code the architecture, and
24
00:01:55,090 --> 00:02:01,030
then you point the neural network at a folder with all these cats and dogs with images of cats and dogs
25
00:02:01,030 --> 00:02:06,580
which are already categorized, and you tell it: OK, I've got some images of cats and dogs,
26
00:02:06,880 --> 00:02:08,860
go and learn what a cat is.
27
00:02:08,860 --> 00:02:10,560
Go and learn what a dog is.
28
00:02:10,600 --> 00:02:16,000
And the neural network will on its own understand everything it needs to understand and then further
29
00:02:16,000 --> 00:02:20,950
down, once it's trained up, when you give it a new image of a cat or dog it will be able to understand
30
00:02:20,950 --> 00:02:21,600
what it was.
31
00:02:21,610 --> 00:02:25,600
So there they are: those are the two fundamentally different approaches.
32
00:02:25,690 --> 00:02:31,090
And today we're going to slowly start getting into how that second approach works.
33
00:02:31,090 --> 00:02:31,530
All right.
34
00:02:31,570 --> 00:02:33,340
So let's get straight to it.
35
00:02:33,400 --> 00:02:39,880
Here we have a very basic neural network with one layer. It is called a single-layer feedforward neural
36
00:02:39,880 --> 00:02:42,760
network, and it is also called a perceptron.
37
00:02:42,760 --> 00:02:47,380
Now before we proceed one thing that we do need to adjust is that output value.
38
00:02:47,380 --> 00:02:49,320
Right now you can see that it's just a Y.
39
00:02:49,330 --> 00:02:51,160
We need to put a y hat in there.
40
00:02:51,190 --> 00:02:56,500
And the reason for that is usually y stands for the actual value and that's what we're going to be using.
41
00:02:56,500 --> 00:03:03,700
So y is going to be the actual value, which we see in reality. The output value is the value predicted
42
00:03:03,700 --> 00:03:05,890
by the algorithm by the neural network.
43
00:03:05,890 --> 00:03:09,220
y hat is the output value.
44
00:03:09,220 --> 00:03:11,500
Basically that's the denomination for the output value.
45
00:03:11,740 --> 00:03:20,020
And the perceptron was first invented in 1957 by Frank Rosenblatt, and his whole idea was to create
46
00:03:20,170 --> 00:03:25,010
something that can actually learn and adjust itself.
47
00:03:25,240 --> 00:03:28,010
And this is what we're going to be looking at now.
48
00:03:28,030 --> 00:03:30,230
So we've got our perceptron drawn.
49
00:03:30,250 --> 00:03:32,070
Let's see how our perceptron learns.
50
00:03:32,080 --> 00:03:39,130
So let's say we have some input values that have been supplied to the perceptron, or basically to
51
00:03:39,130 --> 00:03:40,210
our neural network.
52
00:03:40,330 --> 00:03:44,190
Then the activation function is applied.
53
00:03:44,200 --> 00:03:49,210
We have an output and now we're going to plot the output on a chart.
54
00:03:49,210 --> 00:03:51,830
So there it is our output y hat.
55
00:03:51,830 --> 00:03:57,520
Now, in order to be able to learn, we need to compare the output value to the actual
56
00:03:57,520 --> 00:04:01,310
value that we want the neural network to get right.
57
00:04:01,600 --> 00:04:04,520
And that is the value y.
58
00:04:04,810 --> 00:04:08,230
And so if we put it here you'll see that there's a bit of a difference.
59
00:04:08,330 --> 00:04:13,510
Now we're going to calculate a function called the cost function. It is calculated as one half of the
60
00:04:13,510 --> 00:04:17,200
squared difference between the actual value and the output value.
61
00:04:17,200 --> 00:04:20,500
Now there are many ways you can come up with a cost function.
62
00:04:20,500 --> 00:04:23,300
There are many different cost functions that you can use.
63
00:04:23,320 --> 00:04:30,280
This is probably the most commonly used cost function, and why it is specifically this function that
64
00:04:30,280 --> 00:04:34,900
we use, we'll find out further down when we're talking about gradient descent. But for now we're just
65
00:04:34,900 --> 00:04:39,830
going to agree that this is the cost function and basically what the cost function is telling us is
66
00:04:40,420 --> 00:04:44,240
what is the error that you have in your prediction.
67
00:04:44,290 --> 00:04:50,770
And our goal is to minimize the cost function because the lower the cost function the closer the y hat
68
00:04:50,790 --> 00:04:51,780
is to y.
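As a small illustration of that cost function, here is a sketch in Python. The formula is the one described above (one half of the squared difference between y hat and y); the numbers are hypothetical and only there to show that the cost shrinks as y hat gets closer to y.

```python
def cost(y_hat: float, y: float) -> float:
    """One half of the squared difference between the output and the actual value."""
    return 0.5 * (y_hat - y) ** 2


y = 1.0  # hypothetical actual value
for y_hat in (0.2, 0.6, 1.0):  # hypothetical predictions, each closer to y
    print(y_hat, cost(y_hat, y))
```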
69
00:04:52,150 --> 00:04:54,430
OK, so as long as we agree on that, let's proceed.
70
00:04:54,430 --> 00:05:00,760
So basically there is our cost function, and from here what happens is:
71
00:05:00,760 --> 00:05:08,950
once we've compared, we're going to feed this information back into the neural network.
72
00:05:08,980 --> 00:05:14,170
So there we go there's the information going back into the neural network and it goes to the weights
73
00:05:14,200 --> 00:05:15,630
and the weights get updated.
74
00:05:15,700 --> 00:05:20,880
Basically the only thing that we have control of in this very simple neural network are the weights
75
00:05:20,900 --> 00:05:23,490
w1, w2, and so on, all the way to the last weight.
76
00:05:23,980 --> 00:05:29,370
And our goal is to minimize the cost function so all we can do is update the weights.
77
00:05:29,500 --> 00:05:33,690
So we update the weights and tweak them a little bit.
78
00:05:33,940 --> 00:05:39,600
And how exactly, we'll find out further down, but for now we agree that we update the weights, and then
79
00:05:39,600 --> 00:05:40,320
we continue.
80
00:05:40,320 --> 00:05:48,870
Here I've put up this screenshot of the data just to make one point very clear: right now,
81
00:05:48,930 --> 00:05:53,990
throughout this whole experiment, in everything we're doing right now, we're dealing with just the one row.
82
00:05:54,000 --> 00:06:00,330
So we have a dataset of one row where, for instance, we're dealing with how
83
00:06:00,330 --> 00:06:05,720
long you studied. The variable that we're predicting is:
84
00:06:06,180 --> 00:06:08,230
what's the result you're going to get on an exam?
85
00:06:08,430 --> 00:06:13,200
And the independent variables that we have are: how many hours did you study, how many hours
86
00:06:13,200 --> 00:06:15,430
did you sleep, and what did you get on the quiz
87
00:06:15,460 --> 00:06:19,880
mid-semester. So in the middle of the semester there is a quiz: what percentage did you get there?
88
00:06:19,880 --> 00:06:26,100
So based on those variables we're trying to predict what score you'll get on the exam. And on the exam, the
89
00:06:26,100 --> 00:06:28,010
93 percent, that's the actual value.
90
00:06:28,010 --> 00:06:29,020
So that's y.
91
00:06:29,560 --> 00:06:30,460
So.
92
00:06:30,660 --> 00:06:36,720
So we feed these three values into the neural network again, for the second time now, and then we're going
93
00:06:36,720 --> 00:06:38,980
to be comparing the result to y.
94
00:06:39,150 --> 00:06:40,690
So let's see how this works.
95
00:06:40,800 --> 00:06:43,710
We feed these values into the neural network.
96
00:06:43,830 --> 00:06:50,160
Everything gets adjusted, the weights get adjusted. So as you can see, again we feed the values
97
00:06:50,190 --> 00:06:55,480
again. The point here is that we're feeding in the same row. We only have one row;
98
00:06:55,480 --> 00:06:56,370
we're training on one row.
99
00:06:56,370 --> 00:06:59,580
This is because this is just a very simple basic example.
100
00:06:59,640 --> 00:07:01,610
Then we'll see what happens when there are more rows.
101
00:07:01,800 --> 00:07:06,180
So again we feed this row in, and our cost function gets adjusted.
102
00:07:06,180 --> 00:07:10,520
As you can see everything happens along those lines again.
103
00:07:10,530 --> 00:07:15,030
So as you see, every time our y hat is changing because we've tweaked the weights.
104
00:07:15,030 --> 00:07:20,550
Our y hat is changing, our cost function is changing. Let's look again: so we feed those in.
105
00:07:20,550 --> 00:07:22,840
y hat is changing, the cost function is changing.
106
00:07:22,920 --> 00:07:27,020
We feed the information back to the weights so that the weights get adjusted again.
107
00:07:27,030 --> 00:07:31,850
We feed in the same values every time, everything gets adjusted, and it goes back to the weights.
108
00:07:31,860 --> 00:07:33,920
And one more time feed in.
109
00:07:34,020 --> 00:07:34,990
OK.
110
00:07:35,730 --> 00:07:40,720
And another time: we adjust the weights, we feed in the information.
111
00:07:40,830 --> 00:07:41,370
And there we go.
112
00:07:41,370 --> 00:07:45,990
So now, this time y hat is equal to y and the cost function is 0.
113
00:07:46,020 --> 00:07:48,410
Usually you won't get a cost function equal to zero.
114
00:07:48,420 --> 00:07:50,720
But this is a very simple example.
115
00:07:50,820 --> 00:07:57,480
So hopefully all that made sense. Every time, we feed in exactly that same row, because in this case
116
00:07:57,480 --> 00:08:01,370
we're just dealing with that one row into our neural network.
117
00:08:01,400 --> 00:08:06,990
Then the values get multiplied by the weights, the activation function is applied,
118
00:08:06,990 --> 00:08:12,320
we get y hat, y hat is compared to y, then we see how the cost function has changed.
119
00:08:12,430 --> 00:08:16,500
We feed that information back into our neural network and then adjust the weights
120
00:08:16,500 --> 00:08:17,470
again.
121
00:08:17,850 --> 00:08:21,410
And then we repeat the same process again with the same exact row.
122
00:08:21,570 --> 00:08:23,320
We're trying to minimize that cost.
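The one-row loop described here can be sketched in Python as follows. This is only a sketch under stated assumptions: the input values are hypothetical (only the 93 percent exam result comes from the example), the activation is taken to be the identity, and the weight update uses a plain gradient step with an assumed learning rate, since how exactly the weights are updated is covered later in the course.

```python
def predict(weights, inputs):
    # Weighted sum of the inputs; the identity activation is an assumption.
    return sum(w * x for w, x in zip(weights, inputs))


def cost(y_hat, y):
    # One half of the squared difference between output and actual value.
    return 0.5 * (y_hat - y) ** 2


# Hypothetical row: hours studied, hours slept, quiz score, all scaled to 0..1.
inputs = [0.6, 0.8, 0.75]
y = 0.93                    # actual exam result (93 percent) as a fraction
weights = [0.1, 0.1, 0.1]   # arbitrary starting weights
learning_rate = 0.5         # assumed step size

costs = []
for step in range(50):
    y_hat = predict(weights, inputs)   # feed the same row in
    costs.append(cost(y_hat, y))       # compare y hat to y
    error = y_hat - y                  # derivative of the cost with respect to y hat
    # Feed the information back: nudge every weight against its gradient.
    weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]

print(costs[0], costs[-1])
```

With a single row the cost can be driven essentially to zero, which matches the simple example; with real data that usually does not happen.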
123
00:08:23,520 --> 00:08:26,860
So up until now we've been dealing with just that one row.
124
00:08:27,030 --> 00:08:29,470
Let's see what happens when you have multiple rows.
125
00:08:29,490 --> 00:08:31,320
So here's the full data set.
126
00:08:31,350 --> 00:08:38,610
We have eight rows. Maybe these are different students taking the
127
00:08:38,610 --> 00:08:44,070
same exam: how many hours they studied, how many hours they slept before the exam, what they got on the
128
00:08:44,070 --> 00:08:47,300
quiz and their final result on the test.
129
00:08:47,490 --> 00:08:52,720
And as you can see here on the left, I've got eight of these perceptrons.
130
00:08:53,100 --> 00:08:55,950
Actually, they are all the same perceptron, and this is important.
131
00:08:56,010 --> 00:09:02,600
I just duplicated it eight times, just so that we can
132
00:09:03,330 --> 00:09:04,310
conceptualize that.
133
00:09:04,320 --> 00:09:10,010
But the important thing here is that it's the same neural network: we're going to be feeding these into the one same neural
134
00:09:10,040 --> 00:09:10,380
network.
135
00:09:10,380 --> 00:09:11,650
So let's go let's get started.
136
00:09:11,650 --> 00:09:20,550
So one epoch, as you'll hear Hadelin mentioning, one epoch is when we go through a whole dataset
137
00:09:20,610 --> 00:09:27,410
and we train our neural network on all of these rows.
138
00:09:27,420 --> 00:09:34,410
So there's our first row and there's y hat for the first row; there's the second row, there's y hat
139
00:09:34,410 --> 00:09:35,260
for the second row.
140
00:09:35,280 --> 00:09:39,590
So again it's being fed into the same neural network every time.
141
00:09:39,600 --> 00:09:45,070
I just copied them several times so we can visually see how this is happening.
142
00:09:45,090 --> 00:09:52,320
Then it's happening again: that's the third row, the fourth row, there is our y hat for the fourth
143
00:09:52,320 --> 00:09:53,010
row and so on.
144
00:09:53,010 --> 00:09:56,580
Basically then we get the same values for the remaining four rows as well.
145
00:09:56,580 --> 00:10:03,440
So every time we just feed a row into our neural network, we get an output.
146
00:10:03,780 --> 00:10:06,930
Then we compare to the actual value.
147
00:10:06,930 --> 00:10:08,550
So they are the actual values.
148
00:10:08,760 --> 00:10:11,340
So for every single row we have an actual value.
149
00:10:11,640 --> 00:10:18,480
And now based on all of these differences between y hat and y we can calculate the cost function, which
150
00:10:18,480 --> 00:10:27,620
is the sum of all of those squared differences between y hat and y, and all of that is halved.
151
00:10:28,230 --> 00:10:30,360
And there's our cost function.
152
00:10:30,360 --> 00:10:36,750
And basically now what we do after we have the full cost function we go back and we update the weights
153
00:10:37,170 --> 00:10:39,480
we update w1, w2, w3.
154
00:10:39,510 --> 00:10:45,810
And the important thing to remember here is that all of these perceptrons, all of these neural networks,
155
00:10:45,810 --> 00:10:47,340
are actually one neural network.
156
00:10:47,340 --> 00:10:49,420
So there's not eight of them there's just one.
157
00:10:49,680 --> 00:10:55,110
And when we update the weights we're going to update the weights in that one neural network so basically
158
00:10:55,110 --> 00:10:57,900
the weights are going to be the same for all of the rows.
159
00:10:57,930 --> 00:11:00,560
So it's not the case that every row has its own weights.
160
00:11:00,580 --> 00:11:07,320
No, all the rows share the weights, and so that's why we looked at the cost function, which is the sum
161
00:11:07,620 --> 00:11:15,270
of the squared differences, and then we updated the weights. And that was just one iteration.
162
00:11:15,270 --> 00:11:19,020
Next we're going to run this whole thing again.
163
00:11:19,020 --> 00:11:25,440
We're going to feed every single row into the neural network find out our cost function and do this
164
00:11:25,440 --> 00:11:26,370
whole process again.
165
00:11:26,370 --> 00:11:32,090
So just as we saw previously where we had just one row and we were doing everything again and again
166
00:11:32,140 --> 00:11:33,590
and again same thing here.
167
00:11:33,600 --> 00:11:38,880
But now we're going to be doing it for eight rows, or 800 rows, or eight thousand rows, however many rows you
168
00:11:38,880 --> 00:11:40,590
have in your data set.
169
00:11:40,830 --> 00:11:43,700
You do this process and then you calculate the cost function.
170
00:11:44,220 --> 00:11:51,510
And the goal here is to minimize the cost function, and as soon as you've found the minimum of the cost
171
00:11:51,510 --> 00:12:00,210
function, that is your final neural network. That means your weights have been adjusted and you have found
172
00:12:00,750 --> 00:12:08,550
the optimal weights for this dataset that you began your training on and you're ready to proceed to
173
00:12:08,550 --> 00:12:11,130
the testing phase or to the application phase.
174
00:12:11,550 --> 00:12:14,920
And this whole process is called backpropagation.
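The multi-row version can be sketched the same way. Again this is a sketch under assumptions, not the course's implementation: the eight rows and their targets are made-up numbers, the activation is the identity, and the update is a plain gradient step with an assumed learning rate. What it does show is the structure described above: one shared set of weights, one cost summed over all rows and halved, and one weight update per epoch.

```python
def predict(weights, row):
    # Same network for every row: one shared set of weights.
    return sum(w * x for w, x in zip(weights, row))


def total_cost(weights, rows, targets):
    # Sum of the squared differences between y hat and y over all rows, halved.
    return 0.5 * sum((predict(weights, r) - y) ** 2 for r, y in zip(rows, targets))


# Hypothetical dataset of eight students: study hours, sleep hours, quiz score (scaled to 0..1).
rows = [
    [0.2, 0.9, 0.55], [0.5, 0.7, 0.60], [0.6, 0.8, 0.75], [0.9, 0.5, 0.80],
    [0.3, 0.6, 0.50], [0.8, 0.8, 0.90], [0.4, 0.4, 0.45], [0.7, 0.9, 0.85],
]
targets = [0.58, 0.67, 0.93, 0.85, 0.52, 0.95, 0.48, 0.90]  # hypothetical exam results

weights = [0.0, 0.0, 0.0]   # arbitrary starting weights
learning_rate = 0.05        # assumed step size

history = []
for epoch in range(500):
    history.append(total_cost(weights, rows, targets))
    # Accumulate the gradient of the total cost over every row in the dataset.
    grads = [0.0, 0.0, 0.0]
    for row, y in zip(rows, targets):
        error = predict(weights, row) - y
        for j, x in enumerate(row):
            grads[j] += error * x
    # One update of the shared weights per pass through the whole dataset.
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

print(history[0], history[-1])
```

Here the cost goes down from epoch to epoch but does not have to reach zero, because three shared weights generally cannot fit all eight rows exactly.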
175
00:12:15,000 --> 00:12:21,930
So some additional reading that you might want to do for the cost function and I know we just talked
176
00:12:21,930 --> 00:12:24,840
about one and there are many different ones.
177
00:12:24,840 --> 00:12:28,690
A good article is located on Cross Validated.
178
00:12:28,740 --> 00:12:33,020
It's called "A list of cost functions used in neural networks, alongside applications".
179
00:12:33,090 --> 00:12:39,840
So the URL is there, but you can just google that exact search term or search phrase and you will
180
00:12:39,960 --> 00:12:42,150
see that this one will be the first one that pops up.
181
00:12:42,150 --> 00:12:48,660
It's actually got some good examples and applications, or use cases, for different cost functions, so if
182
00:12:48,660 --> 00:12:51,800
you're interested in learning more about cost functions, check out this article.
183
00:12:51,990 --> 00:12:54,380
And on that note, I hope you enjoyed this tutorial.
184
00:12:54,420 --> 00:12:56,070
I look forward to seeing you next time.
185
00:12:56,070 --> 00:12:58,020
Until then enjoy deep learning.