Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,610 --> 00:00:06,010
Hello welcome back to the course on computer vision in today's tutorial we're going to talk about adaptive
2
00:00:06,160 --> 00:00:08,390
boosting or other boost.
3
00:00:08,440 --> 00:00:17,320
So previously we left off when we discussed how the algorithm uses input data or images to find which
4
00:00:17,320 --> 00:00:21,310
features are important to recognizing faces.
5
00:00:21,310 --> 00:00:28,610
So it uses the face images to understand which features are common among faces and it keeps those.
6
00:00:28,700 --> 00:00:34,040
And then it uses the nonfatal images to understand out of the ones that are the features that it's kept
7
00:00:34,460 --> 00:00:38,760
which features give it high rates of false positives.
8
00:00:38,930 --> 00:00:43,670
And then it discards those features and makes them less important and focus on the features that are
9
00:00:43,730 --> 00:00:51,830
present and faces and faces only that do not come up very commonly in other types of objects or the
10
00:00:51,860 --> 00:00:53,000
pictures.
11
00:00:53,070 --> 00:00:55,300
And so that's basically how the algorithm is trained.
12
00:00:55,640 --> 00:01:02,330
And then we have these different features and there are thresholds at which they are considered to be
13
00:01:02,330 --> 00:01:07,590
present in an image and so that sounds like the end of the story sounds like.
14
00:01:07,600 --> 00:01:08,180
Ok cool.
15
00:01:08,180 --> 00:01:13,700
We've identified the features we've identified thresholds all we have to do when a new image which is
16
00:01:13,730 --> 00:01:19,870
going to watch is going to look at the features that we've identified check if their present check the
17
00:01:19,910 --> 00:01:20,770
thresholds are met.
18
00:01:20,780 --> 00:01:25,750
And we're going to know if it's a face while everything is not that simple.
19
00:01:25,760 --> 00:01:38,930
The reason for this hurdle is that even in a 24 by 24 pixel image the number of features is huge the
20
00:01:38,940 --> 00:01:45,810
number of features actually is over a hundred and eighty thousand possible features that can fit into
21
00:01:45,810 --> 00:01:46,340
this image.
22
00:01:46,350 --> 00:01:52,650
Sounds it sounds unbelievable because the image is so small it's only 24 pixels but it is true.
23
00:01:52,650 --> 00:01:58,800
So the the reason for that is that these features as we discussed there are scalable.
24
00:01:58,800 --> 00:02:03,810
So not only are we looking at these features on all their different positions in this image.
25
00:02:03,810 --> 00:02:08,010
We're actually looking at all of the variations of each one of these features.
26
00:02:08,010 --> 00:02:13,380
For instance this feature looks like for example it looks like there's one 2 pixels here and 1 2 pixels
27
00:02:13,380 --> 00:02:13,840
here.
28
00:02:13,890 --> 00:02:19,860
So you need to look through this image and try all all possible positions of this feature or very likely
29
00:02:19,850 --> 00:02:24,880
like that and then all possible positions here here all different positions here.
30
00:02:25,230 --> 00:02:26,660
And that's that's the start.
31
00:02:26,700 --> 00:02:33,540
But now you need to also extend that as to be like one to three pixels three pixels here and three pixels
32
00:02:33,560 --> 00:02:38,380
here and again you have to try out all possible positions for that feature.
33
00:02:38,400 --> 00:02:41,730
Next you might make it like four pixels high.
34
00:02:41,800 --> 00:02:46,170
Again you have to try it out all the positions the next you might make it for pixels high and then two
35
00:02:46,170 --> 00:02:48,360
pixels wide so that one.
36
00:02:48,360 --> 00:02:52,920
One two three four one two three four here and one two three four one two three four here and you need
37
00:02:52,920 --> 00:02:54,640
to try that out as well.
38
00:02:54,780 --> 00:03:00,030
And then once and then you keep doing that you keep expanding making it wider and so on and all the
39
00:03:00,030 --> 00:03:03,260
possible widths and heights of this feature.
40
00:03:03,450 --> 00:03:07,920
And then once you've run out of options for that you need to move on to the next one and the next or
41
00:03:07,920 --> 00:03:08,960
next on THE NEXT ONE.
42
00:03:09,150 --> 00:03:17,740
And so in a 24 by 24 pixel image these base features they all possible variations of them and their
43
00:03:17,760 --> 00:03:18,650
different positions.
44
00:03:18,810 --> 00:03:26,670
They add up to over a hundred and eighty thousand possible options that we would have to explore.
45
00:03:26,880 --> 00:03:29,670
And that is a huge number.
46
00:03:29,710 --> 00:03:33,210
So and it poses two concerns.
47
00:03:33,210 --> 00:03:40,550
First of all during the training that would be very hard because you know not only have to check 180000
48
00:03:40,560 --> 00:03:47,520
features for one image you need to check hundred eighty thousand features for all the images in the
49
00:03:47,520 --> 00:03:53,670
training data which is nine thousand eight hundred thirty two faces in the original Old Jones paper
50
00:03:54,060 --> 00:04:02,730
plus the huge number you know thousands and tens of thousands of images of known faces so you'd have
51
00:04:02,730 --> 00:04:10,830
to check that across all those images so training becomes quite long becomes like a nightmare.
52
00:04:10,950 --> 00:04:18,900
And the second thing is that even during your application so when you're detecting you're in if you've
53
00:04:18,900 --> 00:04:24,990
somehow managed to train those all those features now when you're detecting the faces you have to check
54
00:04:25,000 --> 00:04:27,260
180000 every single time.
55
00:04:27,480 --> 00:04:29,540
And that is practically impossible to do in real time.
56
00:04:29,550 --> 00:04:37,670
Take a lot of computation to do that for every single frame or every single image.
57
00:04:37,850 --> 00:04:41,960
So this is where adaptive boosting comes in to help solve this problem.
58
00:04:41,960 --> 00:04:48,560
So we've got we're going to take our features and we're going to put them together into a classifier
59
00:04:48,830 --> 00:04:58,280
which will look like this so here on the left we've got the classifier f f of x and then here f one
60
00:04:58,280 --> 00:05:04,460
of two and three are the features and Alpha 1 2 and 3 are the way to those features so we'll just add
61
00:05:04,460 --> 00:05:05,750
some features here.
62
00:05:05,750 --> 00:05:14,330
For instance there's a feature so that feature could be that's one that's commonly talked about the
63
00:05:16,010 --> 00:05:19,840
bridge of the nose of the nose is usually lighter than on the left on the right.
64
00:05:19,840 --> 00:05:29,430
So it's it's a feature that can help detect faces and this is the feature that the eyes are commonly
65
00:05:29,430 --> 00:05:32,230
darker than the area under the eyes.
66
00:05:32,310 --> 00:05:35,970
And so just even these two features will get as I say.
67
00:05:35,980 --> 00:05:40,110
So that's those two features and then maybe there's another feature and so on.
68
00:05:40,110 --> 00:05:46,170
So we've got our features aligned and that and it keeps going like that.
69
00:05:46,410 --> 00:05:51,400
And each one of these features is called a weak classifier on its own.
70
00:05:51,450 --> 00:05:57,840
He doesn't get a very high rate of success so as long as it gets over 50 percent that's already good.
71
00:05:57,840 --> 00:06:03,750
So for instance maybe this bridge of the nose if you just used that feature on its own maybe gets a
72
00:06:03,750 --> 00:06:10,560
60 percent success rate or a road or 65 percent success rate out of the images.
73
00:06:10,830 --> 00:06:19,560
And then this image that or this classifier or this part on the left the effort the F couple of X that's
74
00:06:19,560 --> 00:06:21,100
called a strong cost classified.
75
00:06:21,390 --> 00:06:28,160
And the way it works is that when you have one classifier by itself it's not as good.
76
00:06:28,170 --> 00:06:31,470
It maybe has a 60 percent success rate.
77
00:06:31,740 --> 00:06:38,430
When you have two week classifiers together even though this one might also be only like 60 or something
78
00:06:38,430 --> 00:06:39,970
percent or 55 percent.
79
00:06:40,050 --> 00:06:42,360
But together all of a sudden they're much stronger.
80
00:06:42,390 --> 00:06:48,730
And then as you keep adding they get stronger and stronger and stronger so you don't actually need.
81
00:06:48,990 --> 00:06:51,890
You don't need all hundred and eighty thousand of them.
82
00:06:52,080 --> 00:06:59,250
You might just need a couple of thousand to get a really really good result very strong classified.
83
00:06:59,400 --> 00:07:05,670
And this is called an ensemble method this is called ensemble because you are leveraging the power of
84
00:07:05,670 --> 00:07:06,930
the crowd as they call it.
85
00:07:06,930 --> 00:07:13,030
So it's like the power of one is not as strong.
86
00:07:13,320 --> 00:07:18,960
But when you put lots of we classifiers together they become a strong classifier and that is what I
87
00:07:18,960 --> 00:07:24,980
was going to say that even just these two features the bridge of the nose and the eyes and the area
88
00:07:24,990 --> 00:07:26,290
under the eyes.
89
00:07:26,730 --> 00:07:32,550
It's the exact numbers in the village on paper but I think they said something like it gave them an
90
00:07:32,610 --> 00:07:39,130
80 percent accuracy just those two features together or really made it so much more powerful.
91
00:07:39,180 --> 00:07:45,420
Not it wasn't close to 90 or a hundred but it was already much better than each one of those on their
92
00:07:45,420 --> 00:07:47,000
own.
93
00:07:47,160 --> 00:07:51,700
And so as you can imagine by adding more and more and more you can get better and better better results.
94
00:07:51,810 --> 00:07:57,840
So the question is now how do we find these right features how do we add the right ones.
95
00:07:57,870 --> 00:08:03,060
So the most important ones are want to have the most important ones at the front.
96
00:08:03,570 --> 00:08:04,050
And then.
97
00:08:04,080 --> 00:08:11,070
So then because our of the 180 thousand then the ones at the end you don't even need to worry about
98
00:08:11,070 --> 00:08:12,500
them after a certain point.
99
00:08:12,750 --> 00:08:16,680
So how do we find the best ones and this is where adaptive boosting comes into play.
100
00:08:16,680 --> 00:08:20,580
We won't go into the mathematics behind the algorithm we'll just go into the intuition.
101
00:08:20,850 --> 00:08:26,300
So let's say you have 10 pictures you have five faces and five known faces.
102
00:08:26,550 --> 00:08:31,520
And during the training process you apply.
103
00:08:31,540 --> 00:08:35,700
You want to identify the like the best feature.
104
00:08:35,700 --> 00:08:41,310
See I don't you want to identify how to build that strong classifier So you identify a feature that
105
00:08:41,310 --> 00:08:42,660
is important.
106
00:08:42,840 --> 00:08:52,060
And for instance the bridge of the nose feature and there you you apply it to your images.
107
00:08:52,290 --> 00:08:58,290
And so you get a result for for instance it says identify that these three are indeed faces.
108
00:08:58,290 --> 00:08:59,170
That's great.
109
00:08:59,250 --> 00:09:02,220
And then it's identified out of the ones on the right.
110
00:09:02,220 --> 00:09:05,020
It's identified that these three are not.
111
00:09:05,160 --> 00:09:08,520
And hasn't found that feature in those images.
112
00:09:08,520 --> 00:09:14,700
So for the for the algorithm those are not faces that's also good but then it does have some error.
113
00:09:14,700 --> 00:09:20,100
So it's got some these false negatives.
114
00:09:20,100 --> 00:09:24,170
Right so it's it's a Negat saying it's a negative but it's actually false.
115
00:09:24,210 --> 00:09:26,170
It's not so correct.
116
00:09:26,530 --> 00:09:31,620
These got two false negatives that says these are not faces when they are actually faces so didn't find
117
00:09:31,620 --> 00:09:34,650
the feature on these on these two pictures.
118
00:09:34,650 --> 00:09:36,360
But these are actually faces.
119
00:09:36,960 --> 00:09:39,450
And then it's got two false positives.
120
00:09:39,450 --> 00:09:46,760
It's identified these as faces even though they're not faces so what adaptive boosting does is it says
121
00:09:46,790 --> 00:09:47,390
ok cool.
122
00:09:47,390 --> 00:09:56,150
So we we like this feature because and the way we found it was perhaps that it is present on a huge
123
00:09:56,150 --> 00:10:04,850
number of images it's the one that comes up most commonly and through our like empirical training we've
124
00:10:04,850 --> 00:10:10,120
identified that yes this is a good picture but now what adaptive boosting will do is.
125
00:10:10,150 --> 00:10:16,070
OK I need to compliment like I might have a lot of very good features I might have this feature is good
126
00:10:16,170 --> 00:10:18,050
then another feature is good and so on.
127
00:10:18,050 --> 00:10:24,350
So but our idea is to go back here so the point of adaptive boosting is that we don't want to just combine
128
00:10:24,350 --> 00:10:31,220
the strongest features or because they might they might actually be leveraging off very similar things
129
00:10:31,220 --> 00:10:36,050
or that might not be the best approach the best approach that we want to take this is the important
130
00:10:36,050 --> 00:10:41,360
part is we want to take a strong feature but then we want to complement it with something that will
131
00:10:41,360 --> 00:10:46,140
help you know help fix where this feature is strong.
132
00:10:46,160 --> 00:10:49,320
That's good but where it's weak where it's making mistakes.
133
00:10:49,430 --> 00:10:57,800
This will help complement that area and will help improve the performance in those areas which are falling
134
00:10:57,800 --> 00:11:00,060
behind in this feature by itself.
135
00:11:00,230 --> 00:11:04,580
And then this feature will be used to complement these two.
136
00:11:04,700 --> 00:11:06,180
And we're they're falling behind.
137
00:11:06,170 --> 00:11:11,060
And so each next one we're not just taking the strongest features we can find we're staying a strong
138
00:11:11,060 --> 00:11:16,180
feature or the strongest and then we take in the one that best complements this one and then we're taking
139
00:11:16,220 --> 00:11:19,610
the next one which best complements these two and the next on and so on.
140
00:11:19,610 --> 00:11:26,150
So basically instead of just taking the strongest all the time we're constructing the strongest classifier
141
00:11:26,300 --> 00:11:30,620
to resulting classify that we can rather than.
142
00:11:30,620 --> 00:11:38,390
And so basically by leveraging the strengths of this one and then fixing up its weaknesses with this
143
00:11:38,390 --> 00:11:42,380
one and then fixing up their weaknesses with this and so on so we covered.
144
00:11:42,400 --> 00:11:45,420
That's very kind of like a strategic approach.
145
00:11:45,500 --> 00:11:53,630
And so here again we've got these are this is all good but then we've got these two false negative false
146
00:11:53,900 --> 00:11:56,420
negatives and false positives.
147
00:11:56,480 --> 00:12:04,790
And so what adaptive boosting our will do is in the next round it will now look for something that complements
148
00:12:04,880 --> 00:12:08,190
this feature that we found and how we'll do it.
149
00:12:08,200 --> 00:12:12,890
What I'll say is it will give more weight to where the errors were made.
150
00:12:12,890 --> 00:12:16,630
Ill say Okay so now I'm going to let's go let's go back.
151
00:12:16,640 --> 00:12:21,570
Although explain this or so it's decreased the sizes.
152
00:12:21,620 --> 00:12:28,010
This is just to just to symbolize what is going on so we've decreased the size of these images increase
153
00:12:28,010 --> 00:12:32,630
the sizes of these zones highlighted in blue and decreased the size of these on increase the size of
154
00:12:32,630 --> 00:12:36,170
these ones so you can see again if I go back.
155
00:12:36,170 --> 00:12:37,290
So there you go.
156
00:12:37,280 --> 00:12:44,180
So it's increased the sizes of these ones saying that it's not actually increasing the size of the image
157
00:12:44,180 --> 00:12:47,790
which is doing it here on the PowerPoint just so that it is clear.
158
00:12:47,810 --> 00:12:52,790
It's like it's easier to follow along but what it's doing is just increasing the weight the importance
159
00:12:52,790 --> 00:13:01,430
of these images for the next for the next horror like feature one is going to be looking at this whole
160
00:13:01,730 --> 00:13:07,970
system is going to now is going to know that the importance of these images is higher and it's going
161
00:13:07,970 --> 00:13:14,420
to find a feature that treats them the best that helps I classify them properly.
162
00:13:14,660 --> 00:13:15,850
And that's what we do.
163
00:13:15,860 --> 00:13:24,770
That's how it accounts for that is how it implements that notion where we want to fix the problems of
164
00:13:24,770 --> 00:13:27,720
the first class fire the first feature that we had so.
165
00:13:27,950 --> 00:13:35,030
So now if we add a new one it finds the next one that best works with these images that predominantly
166
00:13:35,030 --> 00:13:37,550
classify these images properly and find this one.
167
00:13:37,550 --> 00:13:42,920
And so as you can see this now it applies to classify these five images correctly.
168
00:13:43,100 --> 00:13:46,950
Unfortunately classify that or an incorrectly so that's a false negative.
169
00:13:47,300 --> 00:13:50,470
And then it's classified these images correctly and then it's classified.
170
00:13:50,480 --> 00:13:53,260
This one still incorrectly it's a false positive.
171
00:13:53,270 --> 00:14:01,850
So maybe it was not possible to find a classifier that would classify these two very well so what the
172
00:14:01,850 --> 00:14:05,960
algorithm did is it found one that classifies these four including this one very well.
173
00:14:06,140 --> 00:14:11,670
And now we're going to have to fix find a new classifier that fixes these errors.
174
00:14:11,870 --> 00:14:13,250
And that's what will happen next.
175
00:14:13,250 --> 00:14:16,240
Next we would focus on these two.
176
00:14:16,490 --> 00:14:21,380
So everything will become smaller and would focus on these two as predominantly the ones that we need
177
00:14:21,380 --> 00:14:28,070
to fix in order to create that very strong overall classifier.
178
00:14:28,070 --> 00:14:29,750
And and this would keep going.
179
00:14:30,080 --> 00:14:37,070
And in the end the goal is to create a classifier that classifies all of these correctly and classifies
180
00:14:37,100 --> 00:14:39,230
all of these correctly as well.
181
00:14:39,330 --> 00:14:41,440
These faces spaces these are known faces.
182
00:14:41,480 --> 00:14:44,110
So of course it's not never going to be ideal.
183
00:14:44,120 --> 00:14:53,150
But the point is to keep building this strong classifier through these weak classifiers just keep building
184
00:14:53,150 --> 00:15:01,570
it and minimizing the error rate until you get to a satisfactory level until you get to a very high
185
00:15:03,550 --> 00:15:08,890
percentage of correct classifications like 99 point something percent.
186
00:15:08,890 --> 00:15:14,980
And at that point you don't need the hundred and eighty thousand You just need that first couple of
187
00:15:14,980 --> 00:15:20,530
thousand that or hundreds however many you can get to.
188
00:15:20,550 --> 00:15:27,400
You mean you'll need to get to that good satisfactory conversion rate and then that's it.
189
00:15:27,400 --> 00:15:29,830
And that's all you need.
190
00:15:30,060 --> 00:15:31,870
And then you don't classify as really.
191
00:15:32,020 --> 00:15:43,720
And that's a first step to reducing the number of the computation expense of finding find these images.
192
00:15:43,720 --> 00:15:48,420
That's the first step is the added boost in what we discussed today.
193
00:15:48,610 --> 00:15:52,010
And then the next step is going to be the cascading.
194
00:15:52,240 --> 00:15:58,050
And once you combine them together you'll see that it actually improves the efficiency substantially.
195
00:15:58,360 --> 00:16:02,380
So there you go that's how the Arab Boosler adaptive boosting algorithm works.
196
00:16:02,530 --> 00:16:11,120
If you'd like to read a bit more about it there is a paper by Gene you and Paul Viola.
197
00:16:11,140 --> 00:16:12,910
It's called boosting image retrieval.
198
00:16:12,910 --> 00:16:19,690
And this paper inspired some of the work in the violin Jones paper which we were talking about at the
199
00:16:19,690 --> 00:16:20,370
start.
200
00:16:20,560 --> 00:16:23,220
So hope you enjoyed Sterle 40 next 14.
201
00:16:23,380 --> 00:16:25,540
And until then enjoy computer vision.
22897
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.