1
00:00:00,480 --> 00:00:07,650
Hello and welcome to this new tutorial. Now that we've done our four transformations, the input is ready
2
00:00:07,830 --> 00:00:09,490
to be fed into the neural network.
3
00:00:09,510 --> 00:00:12,060
It has the authorization to get in.
4
00:00:12,180 --> 00:00:14,730
And therefore that's exactly what we're going to do.
5
00:00:14,730 --> 00:00:16,420
We're going to get it in.
6
00:00:16,490 --> 00:00:25,380
And so there is nothing simpler. You know, we already have our pre-trained model, SSD, and it is already
7
00:00:25,380 --> 00:00:30,090
pre-trained because we could load the weights thanks to this file, but that's not actually what we'll
8
00:00:30,090 --> 00:00:30,210
do.
9
00:00:30,210 --> 00:00:32,730
Now, we will load the weights at the end.
10
00:00:32,730 --> 00:00:38,280
Right now that is just a variable, so we will just use the variable, but then at the end of this implementation
11
00:00:38,280 --> 00:00:41,970
we will load the weights to get our pre-trained model.
12
00:00:41,970 --> 00:00:50,460
So to feed x, our torch Variable that contains both the torch tensor of the input frame and the gradient,
13
00:00:50,850 --> 00:00:52,890
into the neural network net,
14
00:00:53,160 --> 00:00:59,960
well, we simply need to take our neural network net and then apply it to x, and that's it.
15
00:01:00,030 --> 00:01:02,940
That's how we feed x to the neural network.
16
00:01:03,150 --> 00:01:11,040
But then, since this neural network net applied to the input x will return the output y,
17
00:01:11,280 --> 00:01:19,740
well, we're going to get this output y right now, and therefore I'm adding y = net(x). That gives
18
00:01:19,740 --> 00:01:26,190
us the output y. We will of course describe what y is directly; you can already start to try to think
19
00:01:26,280 --> 00:01:27,570
what it is exactly.
20
00:01:27,570 --> 00:01:30,140
But now we have the output; that's great.
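The call described above can be sketched as follows. This is a minimal illustration, not the tutorial's actual code: TinyNet is a hypothetical stand-in for the SSD network, and x is a random tensor standing in for the transformed input frame. The only point it shows is that calling a PyTorch module like a function runs its forward pass and returns the output.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the SSD network used in the tutorial."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)

    def forward(self, x):
        return self.layer(x)


net = TinyNet()
x = torch.rand(1, 4)  # stand-in for the transformed input frame

# Calling the module like a function runs its forward pass.
y = net(x)
print(y.shape)
```

With the real SSD model the shapes differ, but the call y = net(x) keeps the same form.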
21
00:01:30,240 --> 00:01:32,970
And so we can move on to the next step.
22
00:01:33,060 --> 00:01:35,280
So the next step: what is the next step?
23
00:01:35,460 --> 00:01:43,020
Well, we just got our output y. y doesn't contain directly what we're interested in, that is, the result
24
00:01:43,020 --> 00:01:50,580
of the detection, whether we have a dog or a human in the input frame. To get that specific information
25
00:01:50,580 --> 00:01:51,830
we're interested in,
26
00:01:51,840 --> 00:01:55,460
well, we need to take the data attribute from y.
27
00:01:55,560 --> 00:02:01,340
And so what we're going to do now is create a new tensor that we're going to call detections.
28
00:02:01,490 --> 00:02:06,110
So detections is a new tensor, and that's a tensor contained in the output y.
29
00:02:06,210 --> 00:02:13,260
And that will contain the values we're interested in, and to get this tensor, well, we take our output
30
00:02:13,260 --> 00:02:13,780
y.
31
00:02:14,010 --> 00:02:21,410
And then we add a dot and we take our attribute data, and then we get the values of the output.
32
00:02:21,570 --> 00:02:22,050
Perfect.
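As a rough sketch of this step, the example below reads the data attribute of an output tensor. The network here is a hypothetical stand-in (a single linear layer), not the SSD model, and the input is random. It only illustrates that y.data holds the same values as y but is no longer tracked for gradients.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the SSD network: any nn.Module is called the same way.
net = nn.Linear(3, 2)

# Stand-in input batch (the tutorial's x is a transformed video frame).
x = torch.rand(1, 3)

y = net(x)           # output still attached to the gradient graph
detections = y.data  # same values, detached from the gradient graph

print(y.requires_grad)           # True
print(detections.requires_grad)  # False
```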
33
00:02:22,050 --> 00:02:29,610
Now we have what we want. The next step now is to create a new tensor object which will have the dimensions
34
00:02:29,730 --> 00:02:32,390
width, height, width, height.
35
00:02:32,460 --> 00:02:34,190
So I didn't say it twice by mistake.
36
00:02:34,200 --> 00:02:37,030
It's just a tensor of four dimensions.
37
00:02:37,080 --> 00:02:42,000
The first dimension is width, the second dimension is height, the third dimension is width, and the fourth
38
00:02:42,000 --> 00:02:43,070
dimension is height.
39
00:02:43,290 --> 00:02:47,800
And now of course most of you must be thinking: why do we have to create such a tensor?
40
00:02:47,940 --> 00:02:54,870
Well, that's because the position of the detected objects inside the image has to be normalized between
41
00:02:55,020 --> 00:02:56,410
0 and 1.
42
00:02:56,460 --> 00:03:01,980
And to do this normalization we'll need this scale tensor with these four dimensions.
43
00:03:02,130 --> 00:03:04,910
Basically, the new tensor we're about to create right now,
44
00:03:04,950 --> 00:03:10,950
scale, will just be used to do this normalization between 0 and 1 of the positions of the objects
45
00:03:11,100 --> 00:03:12,320
detected in the image.
46
00:03:12,330 --> 00:03:13,820
That's the only purpose.
47
00:03:13,940 --> 00:03:16,810
And now, why do we have width, height, width, height?
48
00:03:16,840 --> 00:03:22,920
That's because the first two, width and height, will correspond to the scale of values of the upper left corner
49
00:03:23,160 --> 00:03:24,990
of the rectangle detector.
50
00:03:25,230 --> 00:03:31,140
And the second width, height will correspond to the scale of values of the lower right corner of this
51
00:03:31,140 --> 00:03:32,630
same rectangle detector.
52
00:03:32,640 --> 00:03:34,500
That's why we have a double width, height.
53
00:03:34,620 --> 00:03:40,410
So let's create this scale tensor so that you can visualize it.
54
00:03:40,410 --> 00:03:48,330
So as a general rule, to create a tensor in Torch, well, we need to take our torch library and then we
55
00:03:48,330 --> 00:03:54,810
use the Tensor class. So scale will be an object of the Tensor class, which therefore will be a tensor,
56
00:03:54,890 --> 00:03:56,150
a torch tensor.
57
00:03:56,550 --> 00:04:02,940
But as the arguments of this Tensor class, we need to specify the four dimensions of the tensor, and these
58
00:04:02,940 --> 00:04:11,850
four dimensions are width, height, width, height.
59
00:04:12,080 --> 00:04:12,790
Perfect.
60
00:04:12,800 --> 00:04:20,090
So this first width, height corresponds to the upper left corner of the rectangle, and this second width, height
61
00:04:20,120 --> 00:04:23,220
corresponds to the lower right corner of the rectangle.
62
00:04:23,300 --> 00:04:29,060
And we're doing this to normalize the scale of values of the position of the detected objects between
63
00:04:29,150 --> 00:04:30,160
0 and 1.
64
00:04:30,260 --> 00:04:31,190
Perfect.
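A minimal sketch of the scale tensor described above. The width and height values are hypothetical (640 by 480), and the normalized box is made up for illustration. The sketch assumes the network reports each box corner as a fraction of the frame size between 0 and 1, so multiplying by scale converts the corners to pixel positions. Note that torch.Tensor called with a list of four numbers builds a one-dimensional tensor holding four values.

```python
import torch

# Hypothetical frame size in pixels (in practice these come from the input frame).
width, height = 640, 480

# One value per box coordinate: (x_min, y_min, x_max, y_max).
scale = torch.Tensor([width, height, width, height])
print(scale.shape)  # torch.Size([4])

# Made-up box whose corners are fractions of the frame size (assumption).
normalized_box = torch.Tensor([0.25, 0.5, 0.75, 1.0])

# Element-wise product: x values are scaled by width, y values by height.
pixel_box = normalized_box * scale
print(pixel_box.tolist())  # [160.0, 240.0, 480.0, 480.0]
```

The first width, height pair scales the upper left corner and the second pair scales the lower right corner, which is why both values appear twice.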
65
00:04:31,190 --> 00:04:32,610
So another good thing done.
66
00:04:32,690 --> 00:04:34,290
Don't worry about the warnings here.
67
00:04:34,310 --> 00:04:40,440
That's just because we haven't used these detections and scale variables yet. We will do it very quickly.
68
00:04:40,640 --> 00:04:45,770
But before we do that, I highly recommend taking a break, because what we're about to do now will be
69
00:04:45,770 --> 00:04:49,700
slightly more complicated than what we've been doing so far.
70
00:04:49,730 --> 00:04:52,570
So we're going to finish with this tutorial now.
71
00:04:52,640 --> 00:04:54,080
Take a good break.
72
00:04:54,170 --> 00:04:59,100
Possibly a little nap or a good coffee, and then we'll attack more
73
00:04:59,120 --> 00:05:01,670
the heart of the SSD model.
74
00:05:01,820 --> 00:05:03,390
Hope that didn't sound too aggressive.
75
00:05:03,480 --> 00:05:07,430
But yeah, we're going to get into the heart of the SSD, the neural network.
76
00:05:07,520 --> 00:05:10,660
So have a good break, and I'll see you in the next tutorial.
77
00:05:10,670 --> 00:05:12,380
Until then, enjoy computer vision.