Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,660 --> 00:00:04,980
Hello and welcome back to the course on computer vision in today's eternal we're going to talk about
2
00:00:04,980 --> 00:00:11,010
another powerful feature of the essays the algorithm and that is how it predicts object positions.
3
00:00:11,010 --> 00:00:18,090
So here we got another lovely photo which we took in Italy and these are just part of what's so small
4
00:00:18,090 --> 00:00:24,720
boats which are parked for the day or for the night and they're going to sometime soon go maybe fishing
5
00:00:24,930 --> 00:00:26,990
or whatever they're meant to do.
6
00:00:27,300 --> 00:00:35,130
And as previously we're going to overlay a grid over this image and then for every single center on
7
00:00:35,130 --> 00:00:41,110
this grid we're going to introduce a couple of rectangles and we're going to stick with our example
8
00:00:41,110 --> 00:00:43,470
three rectangles like that.
9
00:00:43,560 --> 00:00:49,110
And again as we discussed previously we're going to take each one of these rectangles and think of it
10
00:00:49,170 --> 00:00:55,650
as a separate image and then within each rectangle the conventional neural network will ask the question
11
00:00:57,360 --> 00:01:03,350
is there a bullet or is there not a bone basically lost about objects if you if you watch this video
12
00:01:03,360 --> 00:01:08,880
this is from the promotional video for the course and you posit at some point you'll see that this boat
13
00:01:08,880 --> 00:01:15,340
actually at one in one frame was recognized as an airplane or there was like a boat and an airplane
14
00:01:15,340 --> 00:01:16,450
recognized in one here.
15
00:01:16,450 --> 00:01:23,350
So it kind of shows that that's Lewis looking for all classes of objects at the same time.
16
00:01:23,350 --> 00:01:30,040
So yeah it's a good verification that sometimes those errors happen it's just shows that maybe it can
17
00:01:30,050 --> 00:01:34,390
confuse an object and this does look like a bit not like a nonstandard board because it's like a rubber
18
00:01:34,390 --> 00:01:35,680
boat.
19
00:01:35,730 --> 00:01:36,080
Yeah.
20
00:01:36,100 --> 00:01:43,120
So basically each one of these squares each one of these rectangles they go say this rectangle can there's
21
00:01:43,120 --> 00:01:44,230
so many of them you can barely see.
22
00:01:44,240 --> 00:01:50,050
Let's say this right over here is going to ask is like just looking at it by itself forgetting about
23
00:01:50,050 --> 00:01:53,560
everything else is going to ask is there any object inside here.
24
00:01:53,560 --> 00:01:55,680
Can I detect any features of an object.
25
00:01:55,750 --> 00:01:56,780
Yes or no.
26
00:01:57,050 --> 00:02:03,310
And so in our case in that well after the training in the ideal scenario of these falling rectangles
27
00:02:03,340 --> 00:02:08,130
will pick up features of bolts and so let's get rid of everything else.
28
00:02:08,140 --> 00:02:15,040
So there you go you can see that these rectangles do overlap with our ground truth and therefore it's
29
00:02:15,040 --> 00:02:17,120
fair to say that they will pick up the sort of features.
30
00:02:17,150 --> 00:02:25,570
But again just a couple of things I wanted to point out here that during training this might not be
31
00:02:25,570 --> 00:02:31,270
the case maybe this rectangle pick up the boat on this rectangle and this rectangle won't pick up a
32
00:02:31,270 --> 00:02:31,500
boat.
33
00:02:31,510 --> 00:02:36,670
But then through the years of training process they will know they will learn what the features of boats
34
00:02:37,330 --> 00:02:39,670
are and how to pick them up and so on.
35
00:02:39,670 --> 00:02:42,020
So that's the training that we got to lend.
36
00:02:42,020 --> 00:02:47,860
Lengthy hard part in the detection of the network after it's trained up this is what will happen.
37
00:02:47,920 --> 00:02:54,010
So we kind of like combining the concepts of training and application so it's important just to keep
38
00:02:54,010 --> 00:02:56,230
in mind what exactly we're talking about.
39
00:02:56,470 --> 00:03:03,910
So this is the ideal scenario what would happen during train and during application after the network
40
00:03:03,910 --> 00:03:04,640
has trained.
41
00:03:04,930 --> 00:03:12,630
And so let's have a look at one specific set of these three boxes for instance this one over here.
42
00:03:12,640 --> 00:03:19,660
So the interesting thing is that these boxes have picked up features of a boat ride so they can see
43
00:03:19,660 --> 00:03:27,900
maybe the front of the boat and this anchor line maybe some you know you can see that it's in the water.
44
00:03:28,000 --> 00:03:29,370
They can see like this.
45
00:03:29,420 --> 00:03:33,010
This part isn't just standing out meaning that's where the driver stands.
46
00:03:33,020 --> 00:03:37,540
And so some of you can see more some of them can see less but nevertheless not all of them can see the
47
00:03:37,540 --> 00:03:39,050
full image of the boat.
48
00:03:39,090 --> 00:03:41,940
None of you can see the full ground truth of the boat.
49
00:03:42,220 --> 00:03:51,190
And so the question is how can a box predict how can like one of these temporary or inferred boxes how
50
00:03:51,190 --> 00:03:55,630
can it predict where exactly the full boat will be.
51
00:03:55,630 --> 00:03:56,400
That's the question.
52
00:03:56,470 --> 00:04:02,110
So if you like if you use your hand to just close that this part of the image.
53
00:04:02,110 --> 00:04:07,840
How can you complete this boat without knowing where it actually is supposed to be complete without
54
00:04:07,840 --> 00:04:09,190
knowing the full ground truth.
55
00:04:09,190 --> 00:04:11,490
How do you know where it will be completed.
56
00:04:11,710 --> 00:04:17,880
And that is another special part in the s is the algorithm.
57
00:04:17,890 --> 00:04:20,910
It's actually not that complex when you think about it.
58
00:04:21,070 --> 00:04:29,460
So the underlying principle is that we have the ground truth when making these predictions and training
59
00:04:29,460 --> 00:04:32,020
the algorithms so now we're back to talking about training.
60
00:04:32,110 --> 00:04:37,550
So when training the answer the algorithm we have these ground troops we have these actual boxes.
61
00:04:37,900 --> 00:04:46,030
And so we can then let these boxes let the algorithm make its predictions of the algorithm based on
62
00:04:46,030 --> 00:04:52,360
what it can see based on these individual boxes it can then make a prediction.
63
00:04:52,360 --> 00:04:52,560
OK.
64
00:04:52,570 --> 00:04:57,010
I think the boat looks like maybe maybe I'll say it looks like this.
65
00:04:57,050 --> 00:05:03,940
So the full boat the full rectangle should be around like this size although it wouldn't do that because
66
00:05:03,940 --> 00:05:07,030
it knows that there's no features a boat overboard in here.
67
00:05:07,030 --> 00:05:07,600
Right so.
68
00:05:07,710 --> 00:05:12,340
But for instance it could say like it knows that this rectangle has features of both.
69
00:05:12,340 --> 00:05:12,870
Right.
70
00:05:12,910 --> 00:05:17,020
So it doesn't know maybe the boat actually goes up up to up to the top here.
71
00:05:17,020 --> 00:05:19,360
So it does use the overlapping.
72
00:05:19,450 --> 00:05:21,610
So it is it makes it easier.
73
00:05:21,610 --> 00:05:28,960
So like if you go if you go back it will use overlapping and where they are overlapping that's of course
74
00:05:28,960 --> 00:05:34,230
is a high probability that chance that there is a boat where the two rectangles which are overlapping
75
00:05:34,230 --> 00:05:34,860
is a hybrid.
76
00:05:34,870 --> 00:05:35,590
It is a boat.
77
00:05:35,890 --> 00:05:41,260
But nevertheless you can still make errors it can still maim like come to the assumption that the boat
78
00:05:41,260 --> 00:05:47,350
is higher longer or wider or something like that but the way this is mitigated is again through training.
79
00:05:47,350 --> 00:05:52,870
So at the end once it's made its predictions there will be a CERN.
80
00:05:53,020 --> 00:06:00,460
CERN or box that it suggests for this body will say that the box star said this pixel the bottom left
81
00:06:00,460 --> 00:06:04,790
corner is here on the bottom right corner is instead of saying here it might say we're here.
82
00:06:05,140 --> 00:06:11,380
And then it will actually that's resolved that box that it creates will be compared to the ground truth
83
00:06:11,920 --> 00:06:18,080
and then again will be an era which can be calculated and if you think of it in terms of coordinates
84
00:06:18,080 --> 00:06:20,130
it can be calculated CHONE distance.
85
00:06:20,340 --> 00:06:26,700
There's lots of mathematics going on in the background so we're not going to go into that right now
86
00:06:26,710 --> 00:06:28,590
how exactly the error is calculated.
87
00:06:28,740 --> 00:06:35,460
But you can envisage how the arrow would could be expressed in mathematical terms and that era would
88
00:06:35,460 --> 00:06:41,300
be then propagated back up through the network in order to help update the weights.
89
00:06:41,310 --> 00:06:47,910
And again we talked about all of this like the back propagation process and gradient descent in the
90
00:06:47,910 --> 00:06:54,280
annex for a and then artificial neural networks and convolutional neural networks but then the conceptual
91
00:06:54,300 --> 00:06:55,080
that's what is happening.
92
00:06:55,080 --> 00:07:01,680
So the bottom line takeaway from this tutorial is that through training two things happen the first
93
00:07:01,680 --> 00:07:09,150
thing is that what we discussed previously is that each box will learn to better classify objects or
94
00:07:09,150 --> 00:07:11,680
better identify if it has objects inside there.
95
00:07:11,700 --> 00:07:15,990
And in each box will use the ground truth to assess that.
96
00:07:15,990 --> 00:07:20,910
So for instance this box will say that yes there is about where is this box or this box.
97
00:07:20,920 --> 00:07:23,960
This this box we want it to say yes there is about.
98
00:07:24,120 --> 00:07:28,650
And this box we don't want it to say that there is a bot and that's what'll happen to training and they'll
99
00:07:28,650 --> 00:07:29,650
get better at that.
100
00:07:29,670 --> 00:07:36,450
But the second thing that will happen is they will get better at identifying the exact final output
101
00:07:36,450 --> 00:07:39,840
rectangle of that they should create.
102
00:07:39,840 --> 00:07:46,590
So we want we want these this algorithm this is D to produce this exact rectangle which matches the
103
00:07:46,590 --> 00:07:52,020
ground truth rather than a large direct eye or rather than a shorter rectangle.
104
00:07:52,020 --> 00:07:55,670
So for instance here you can see there is always going to be some sort of error.
105
00:07:55,680 --> 00:08:00,060
So here you can see that this rectangle because this isn't the ground truth right.
106
00:08:00,120 --> 00:08:04,050
So at the beginning we discussed there isn't this isn't this isn't ground truth.
107
00:08:04,050 --> 00:08:08,250
We assume this to be the ground truth but as you can see.
108
00:08:08,310 --> 00:08:12,270
And then at that time I said that I wouldn't draw a better rectangle around any of them except I'm into
109
00:08:12,290 --> 00:08:12,500
this.
110
00:08:12,510 --> 00:08:16,910
But now if you look carefully you can see that this rectangle could be a bit narrower.
111
00:08:16,930 --> 00:08:22,190
Right you could see that it could be a bit narrower here and this rectangle could also be a bit narrower.
112
00:08:22,190 --> 00:08:26,700
So it's taking a little bit more so there's no I can there's no bones in these pixels over here on the
113
00:08:26,700 --> 00:08:27,300
right.
114
00:08:27,300 --> 00:08:32,740
There's no boat in these pixels on the left and maybe even this one could be a tiny little bit narrower.
115
00:08:33,480 --> 00:08:34,470
So there we go.
116
00:08:34,470 --> 00:08:40,280
That's how one second that's how the training happens.
117
00:08:40,290 --> 00:08:46,350
And so that's another element of the training that is important to keep in mind so maybe if this algorithm
118
00:08:46,350 --> 00:08:51,780
was trained along with the one that was actually applied to drill what we took as the ground maybe if
119
00:08:51,840 --> 00:08:58,200
it was trained for a longer maybe this bullet would have been more precisely identified during the application
120
00:08:58,200 --> 00:09:00,090
process.
121
00:09:00,210 --> 00:09:05,640
But having said that this was a quite a tough challenge because as you can imagine the blow boats are
122
00:09:05,640 --> 00:09:12,790
kind of like moving around they're floating with waves the waves and also that the drone is flying.
123
00:09:12,810 --> 00:09:20,730
So this is actually a pretty good result for those circumstances anyway so that's the second part that
124
00:09:20,730 --> 00:09:22,890
happens in the training.
125
00:09:23,580 --> 00:09:29,620
This is slow as you can see slowly we're building up the different components of this deal.
126
00:09:29,610 --> 00:09:34,800
So training is how to learn how to classify objects through the ground truth but also is the ground
127
00:09:34,800 --> 00:09:39,270
truth to identify the correct widths and heights of those boxes.
128
00:09:39,480 --> 00:09:47,270
And in the next tutorial we'll continue and talk about the next part component of that as is the.
129
00:09:47,440 --> 00:09:50,860
And of course see you there and until next time in your computer vision.
14816
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.