1
00:00:01,850 --> 00:00:07,340
Hello and welcome back to the course on deep learning. Today we're going to wrap up with back propagation.
2
00:00:07,340 --> 00:00:11,520
All right, so we know pretty much everything we need to know about what happens in a neural network:
3
00:00:11,520 --> 00:00:17,990
we know that there's a process called forward propagation where information is entered into the
4
00:00:17,990 --> 00:00:24,620
input layer and then it's propagated forward to get our y hats, our output values, and then we compare
5
00:00:24,620 --> 00:00:29,240
those to the actual values that we have in our training set.
6
00:00:29,240 --> 00:00:36,110
And then we calculate the errors. The errors are then back propagated through the network in the opposite
7
00:00:36,110 --> 00:00:41,570
direction and that allows us to train the network by adjusting the weights.
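(For readers following along in code: here is a minimal sketch of that forward-then-backward cycle for a tiny network, using NumPy. The layer sizes, tanh activation, data and learning rate are all illustrative assumptions, not taken from the course.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 3 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(0.0, 0.1, size=(3, 4))   # input-to-hidden weights
W2 = rng.normal(0.0, 0.1, size=(4, 1))   # hidden-to-output weights

x = np.array([[0.5, -0.2, 0.1]])         # one training row (1 x 3)
y = np.array([[1.0]])                    # its actual value

# Forward propagation: information flows left to right.
h = np.tanh(x @ W1)                      # hidden activations
y_hat = h @ W2                           # predicted output, the "y hat"

# Compare the prediction to the actual value and measure the error.
error = y_hat - y
cost = 0.5 * np.sum(error ** 2)

# Back propagation: the error flows right to left, yielding a gradient
# for every weight in the network in a single pass.
grad_W2 = h.T @ error
grad_W1 = x.T @ ((error @ W2.T) * (1.0 - h ** 2))   # tanh'(z) = 1 - tanh(z)^2

# Adjust all the weights at once, against their gradients.
learning_rate = 0.1
W1 -= learning_rate * grad_W1
W2 -= learning_rate * grad_W2
```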
8
00:00:41,660 --> 00:00:49,670
So the one key thing to remember here is that back propagation is an advanced algorithm, driven
9
00:00:49,670 --> 00:00:58,890
by very interesting and sophisticated mathematics which allows us to adjust the weights.
10
00:00:59,030 --> 00:01:02,540
All of them at the same time: all the weights are adjusted simultaneously.
11
00:01:02,540 --> 00:01:08,990
So if we were doing this manually, or if we were coming up with a very different type of algorithm, then even
12
00:01:08,990 --> 00:01:14,150
if we calculated the error and then we were trying to understand what effect each of the weights has
13
00:01:14,150 --> 00:01:21,040
on the error, we'd have to somehow adjust each of the weights independently, or individually.
14
00:01:22,000 --> 00:01:29,170
The huge advantage of back propagation, and it's a key thing to remember, is that during the process of back
15
00:01:29,170 --> 00:01:35,910
propagation, simply because of the way the algorithm is structured,
16
00:01:36,850 --> 00:01:43,990
you are able to adjust all the weights at the same time, so you basically know which part of the error each
17
00:01:43,990 --> 00:01:47,400
of your weights in the neural network is responsible for.
18
00:01:47,450 --> 00:01:54,220
Now that is the key fundamental underlying principle of back propagation.
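(In more precise terms, using the conventional notation rather than anything from the video: "which part of the error each weight is responsible for" is the partial derivative of the cost $C$ with respect to that weight $w$, written $\partial C / \partial w$, and back propagation computes this quantity for every weight in one right-to-left pass.)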
19
00:01:54,220 --> 00:02:02,650
And this is why it picked up so rapidly in the 1980s; it was a major breakthrough.
20
00:02:02,770 --> 00:02:08,890
And if you'd like to learn more about that and how exactly the mathematics works in the background then
21
00:02:09,190 --> 00:02:14,800
a good resource, which we've already mentioned, is Neural Networks and Deep Learning, which is actually a
22
00:02:14,800 --> 00:02:16,640
book by Michael Nielsen.
23
00:02:16,720 --> 00:02:23,610
You'll find the mathematics written out and it will help you understand how exactly this is possible.
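(For reference, the core equations Nielsen writes out look like the following, in his standard notation; this is quoted from general knowledge of the book, not from the video. Here $\delta^l$ is the error at layer $l$, $a$ the activations, $z$ the weighted inputs and $\sigma$ the activation function:

$$\delta^L = \nabla_a C \odot \sigma'(z^L), \qquad \delta^l = \big((w^{l+1})^\top \delta^{l+1}\big) \odot \sigma'(z^l), \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j$$

One backward sweep computes every $\delta^l$, which is exactly why all the weights can be updated at the same time.)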
24
00:02:23,650 --> 00:02:30,550
But for now, for our purposes, from an intuition point of view, the important part is to remember
25
00:02:31,240 --> 00:02:33,310
that that's what back propagation does.
26
00:02:33,310 --> 00:02:36,750
It adjusts all of the weights at the same time.
27
00:02:36,940 --> 00:02:43,300
And now we're going to just wrap everything up with a step by step walkthrough of what happens in the
28
00:02:43,300 --> 00:02:45,370
training of a neural network.
29
00:02:45,370 --> 00:02:51,000
All right, so step one: we randomly initialize the weights to small numbers close to zero, but not zero.
30
00:02:51,010 --> 00:02:56,830
We didn't really focus on the initialization of weights during the intuition tutorials, but we have
31
00:02:56,830 --> 00:03:02,610
to start somewhere and they are initialized with random values near zero.
32
00:03:02,620 --> 00:03:09,730
And from there, through the process of forward propagation and back propagation, these weights are adjusted until the
33
00:03:09,730 --> 00:03:11,690
error is minimized.
34
00:03:11,970 --> 00:03:13,550
So the cost function is minimized.
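(A minimal sketch of step one in NumPy; the layer sizes and the scale 0.05 are illustrative choices for "small numbers close to zero, but not zero":)

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: initialize all weights to small random values near zero, but not zero.
n_inputs, n_hidden, n_outputs = 3, 4, 1
W1 = rng.normal(loc=0.0, scale=0.05, size=(n_inputs, n_hidden))   # input -> hidden
W2 = rng.normal(loc=0.0, scale=0.05, size=(n_hidden, n_outputs))  # hidden -> output
```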
35
00:03:13,820 --> 00:03:19,330
Then step two: input the first observation of your data set, so the first row, into the input layer;
36
00:03:19,510 --> 00:03:21,440
each feature is one input node.
37
00:03:21,440 --> 00:03:27,610
So basically, take the columns and put them into the input nodes separately. Then step three: forward propagation, from left to
38
00:03:27,610 --> 00:03:27,910
right.
39
00:03:27,910 --> 00:03:32,620
The neurons are activated in a way that the impact of each neuron's activation is limited by the
40
00:03:32,620 --> 00:03:39,150
weights. The weights basically determine how important each neuron's activation is. Then propagate the
41
00:03:39,160 --> 00:03:43,100
activations until getting the predicted result, y hat
42
00:03:43,150 --> 00:03:43,850
in this case.
43
00:03:43,990 --> 00:03:46,640
So basically you propagate from left to right.
44
00:03:46,690 --> 00:03:50,110
You go all the way through until you get your y hat.
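(Continuing that sketch, steps two and three for a single observation could look like this; the sigmoid activation and the data row are assumptions for illustration:)

```python
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.normal(0.0, 0.05, size=(3, 4))   # weights from the initialization step
W2 = rng.normal(0.0, 0.05, size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 2: one observation (one row of the data set), one feature per input node.
x = np.array([[0.2, -0.5, 0.9]])

# Step 3: forward propagation, left to right; the weights determine how much
# each neuron's activation matters to the neurons in the next layer.
hidden = sigmoid(x @ W1)
y_hat = sigmoid(hidden @ W2)   # the predicted result
```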
45
00:03:50,320 --> 00:03:52,720
Then step four: compare the predicted result to the actual result.
46
00:03:52,750 --> 00:03:58,140
Measure the generated error, and then step five: you do the back propagation from right to left; the error is back propagated
47
00:03:58,150 --> 00:03:58,620
again.
48
00:03:58,630 --> 00:04:02,080
Update the weights according to how much they are responsible for the error.
49
00:04:02,260 --> 00:04:08,500
Again, you are able to calculate that because of the way the back propagation algorithm is
50
00:04:08,500 --> 00:04:13,750
structured. The learning rate decides by how much we update the weights; the learning rate is a parameter
51
00:04:13,990 --> 00:04:17,710
you can control in your neural network.
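(Written out, the update this describes for each weight $w$ is just

$$w \leftarrow w - \eta \, \frac{\partial C}{\partial w}$$

where $\eta$ is the learning rate and $C$ the cost function; the symbols are the conventional ones, not the video's. A larger $\eta$ means bigger, faster, riskier steps; a smaller $\eta$ means slower, more careful updates.)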
52
00:04:17,710 --> 00:04:23,110
Step 6: repeat steps 1 to 5 and update the weights after each observation.
53
00:04:23,320 --> 00:04:30,670
That is called reinforcement learning, and in our case that was stochastic gradient descent. Or: repeat
54
00:04:30,670 --> 00:04:31,490
steps 1 to 5,
55
00:04:31,510 --> 00:04:37,840
but this time update the weights only after a batch of observations; that's batch learning. It's either full gradient descent,
56
00:04:37,870 --> 00:04:43,150
or batch gradient descent, or mini-batch gradient descent. And step seven: when the whole training set has passed
57
00:04:43,150 --> 00:04:49,030
through the artificial neural network, that makes an epoch. Redo more epochs.
58
00:04:49,040 --> 00:04:55,090
So basically just keep doing that, and doing that, and doing that, to allow your neural network to
59
00:04:55,120 --> 00:05:02,510
train better and better and better and constantly adjust itself as you minimize the cost function.
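(Here is a hedged sketch of steps 6 and 7 as a training loop, for a deliberately simple linear model with a mean-squared-error cost; the data, model and hyperparameter values are illustrative. Setting batch_size to 1 gives the stochastic, per-observation variant; setting it to len(X) gives full batch gradient descent; anything in between is mini-batch:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data set and a single weight matrix, purely for illustration.
X = rng.normal(size=(100, 3))
y = rng.normal(size=(100, 1))
W = rng.normal(0.0, 0.05, size=(3, 1))

learning_rate = 0.01
batch_size = 1    # 1 = stochastic; len(X) = full batch; in between = mini-batch
n_epochs = 10     # step 7: one epoch = the whole training set passed through once

for epoch in range(n_epochs):
    order = rng.permutation(len(X))              # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        y_hat = xb @ W                           # forward propagation
        error = y_hat - yb                       # compare to the actual values
        grad_W = xb.T @ error / len(idx)         # gradient of the MSE cost
        W -= learning_rate * grad_W              # update the weights
```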
60
00:05:02,740 --> 00:05:04,330
So there we go.
61
00:05:04,420 --> 00:05:09,770
Those are the steps you need to take to build your artificial neural network and train it.
62
00:05:10,030 --> 00:05:16,060
And these are the steps that you will be taking in the practical tutorials.
63
00:05:16,120 --> 00:05:19,520
Wish you the best of luck and I look forward to seeing you next time.
64
00:05:19,540 --> 00:05:21,280
Until then enjoy the learning.