008 Adaptive Boosting (AdaBoost)

Hello, and welcome back to the course on computer vision. In today's tutorial we're going to talk about adaptive boosting, or AdaBoost.
Previously we left off when we discussed how the algorithm uses input data, or images, to find which features are important for recognizing faces. It uses the face images to understand which features are common among faces, and it keeps those. Then it uses the non-face images to understand, out of the features that it's kept, which ones give it high rates of false positives. It then discards those features, or makes them less important, and focuses on the features that are present in faces and faces only, and that do not come up very commonly in other types of objects or other pictures. That's basically how the algorithm is trained.
So then we have these different features, and there are thresholds at which they are considered to be present in an image. That sounds like the end of the story. It sounds like: OK, cool, we've identified the features, we've identified the thresholds, and all we have to do when a new image comes in is look at the features that we've identified, check if they're present, check whether the thresholds are met, and we're going to know if it's a face. Well, everything is not that simple.
The reason for this hurdle is that even in a 24 by 24 pixel image the number of features is huge. The number of possible features that can fit into this image is actually over 180,000. It sounds unbelievable because the image is so small, only 24 by 24 pixels, but it is true.
The reason is that these features, as we discussed, are scalable. So not only are we looking at these features in all their different positions in this image, we're also looking at all of the variations of each one of these features. For instance, this feature looks like it's one, two pixels here and one, two pixels here. So you need to look through this image and try all possible positions of this feature: here, here, and all the different positions here. And that's just the start. Now you also need to extend it to be one, two, three pixels here and three pixels here, and again you have to try out all possible positions for that feature. Next you might make it four pixels high, and again you have to try it out at all the positions. Next you might make it four pixels high and two pixels wide.
That's one, two, three, four pixels here and one, two, three, four here, and you need to try that out as well. Then you keep doing that: you keep expanding it, making it wider and so on, through all the possible widths and heights of this feature. Once you've run out of options for that one, you need to move on to the next one, and the next one, and the next one. So in a 24 by 24 pixel image, all possible variations of these base features and their different positions add up to over 180,000 possible options that we would have to explore.
That is a huge number, and it poses two concerns. First of all, training would be very hard, because you not only have to check 180,000 features for one image, you need to check 180,000 features for all the images in the training data. That's 9,832 faces in the original Viola-Jones paper, plus a huge number, thousands and tens of thousands, of images of non-faces. You'd have to check that across all those images, so training becomes very long. It becomes a nightmare.
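To make the counting argument above concrete, here is a minimal sketch that enumerates every size and position of a rectangle feature inside a square window. The five base shapes in `BASE_SHAPES` and the function name `count_features` are assumptions chosen for illustration; the lecture does not list which shapes it counts. With these five shapes the total for a 24 by 24 window comes out at roughly 160,000, the same order of magnitude as the lecture's "over 180,000"; the exact figure depends on which shapes and scaling conventions are included.

```python
# Count rectangle (Haar-like) features that fit in a square window.
# ASSUMPTION: five base shapes, given as (width, height) in pixels at the
# smallest scale. The lecture does not specify the exact set of shapes.
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]


def count_features(window=24, base_shapes=BASE_SHAPES):
    """Return the number of (shape, size, position) combinations."""
    total = 0
    for base_w, base_h in base_shapes:
        # Scale the base shape by whole multiples in each direction.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left corners where a w-by-h rectangle fits.
                total += (window - w + 1) * (window - h + 1)
    return total


if __name__ == "__main__":
    print(count_features(24))
```

Even for such a small window, the nested loops over widths, heights and positions are what make the number explode.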
The second thing is that even during application, when you're detecting, if you've somehow managed to train all those features, you now have to check 180,000 features every single time you detect faces. That is practically impossible to do in real time. It would take a lot of computation to do that for every single frame or every single image.
So this is where adaptive boosting comes in to help solve this problem. We're going to take our features and put them together into a classifier, which will look like this. Here on the left we've got the classifier, F(x), and then f1, f2 and f3 are the features, and alpha 1, alpha 2 and alpha 3 are the weights of those features. Let's add some features here. For instance, here's a feature that's commonly talked about: the bridge of the nose is usually lighter than the areas to its left and right, so it's a feature that can help detect faces. And this is the feature that the eyes are commonly darker than the area under the eyes. Even just these two features together get you quite far, as I'll say in a moment. So that's those two features, and then maybe there's another feature, and so on. We've got our features lined up, and it keeps going like that. Each one of these features on its own is called a weak classifier.
It doesn't get a very high success rate; as long as it gets over 50 percent, that's already good. So for instance, if you used just this bridge-of-the-nose feature on its own, maybe it gets a 60 percent or a 65 percent success rate across the images. And this part on the left, the capital F of x, is called a strong classifier.
The way it works is that one classifier by itself is not that good. Maybe it has a 60 percent success rate. When you have two weak classifiers together, even though the second one might also be only 60 or 55 percent, together all of a sudden they're much stronger. And as you keep adding, they get stronger and stronger. So you don't actually need all 180,000 of them. You might just need a couple of thousand to get a really good result, a very strong classifier.
This is called an ensemble method. It's called an ensemble because you are leveraging the power of the crowd, as they call it. The power of one is not as strong.
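The strong classifier F(x) described above, a weighted sum of weak classifiers, could be sketched roughly as follows. This is an illustration under stated assumptions, not the course's implementation: `make_stump` and `strong_classify` are hypothetical names, each weak classifier is assumed to be a simple threshold test on one precomputed feature value, and the output convention of +1 for "face" and -1 for "non-face" is also an assumption.

```python
def make_stump(feature_index, threshold):
    """Build a weak classifier: a threshold test on a single feature value.

    ASSUMPTION: feature values are precomputed numbers, and a value at or
    above the threshold means the feature is 'present' (+1), otherwise -1.
    """
    def stump(feature_values):
        return 1 if feature_values[feature_index] >= threshold else -1
    return stump


def strong_classify(feature_values, weak_classifiers, alphas):
    """Weighted vote: F(x) = alpha_1*f_1(x) + alpha_2*f_2(x) + ...

    Returns +1 (face) if the weighted sum is non-negative, else -1.
    """
    score = sum(a * f(feature_values)
                for a, f in zip(alphas, weak_classifiers))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Three hypothetical features with made-up thresholds and weights.
    weak = [make_stump(0, 0.5), make_stump(1, 0.5), make_stump(2, 0.5)]
    alphas = [0.4, 0.3, 0.2]
    print(strong_classify([0.8, 0.2, 0.6], weak, alphas))
```

In the usage example the first and third weak classifiers vote "face" and the second votes "non-face", and the weights decide how much each vote counts.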
But when you put lots of weak classifiers together, they become a strong classifier. And that is what I was going to say about just these two features, the bridge of the nose and the eyes versus the area under the eyes. I don't remember the exact numbers in the Viola-Jones paper, but I think they said something like it gave them 80 percent accuracy. Just those two features together made it so much more powerful. It wasn't close to 90 or 100, but it was already much better than each one of those on its own. And as you can imagine, by adding more and more you can get better and better results.
So the question now is: how do we find the right features, and how do we add the right ones? We want to have the most important ones at the front, because then, out of the 180,000, you don't even need to worry about the ones at the end after a certain point. So how do we find the best ones? This is where adaptive boosting comes into play. We won't go into the mathematics behind the algorithm; we'll just go into the intuition.
Let's say you have 10 pictures: five faces and five non-faces. During the training process you want to identify the best feature.
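The "power of the crowd" claim above can be illustrated with a small calculation. The sketch below computes how often a simple majority vote is right when each voter is right with probability p. It rests on a strong assumption that is stated loudly here: the voters are independent and equally weighted, which real image features are not, and AdaBoost itself uses a weighted vote rather than a plain majority. The function name `majority_vote_accuracy` is made up for this example.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n voters is correct.

    ASSUMPTION: the n voters are independent and each is correct with
    probability p. n must be odd so that there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    # Sum the binomial probabilities of k correct votes for every k that
    # forms a majority.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these idealized assumptions, the accuracy of the vote climbs steadily as more 60 percent voters are added, which is the intuition behind combining many weak classifiers.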
You want to identify how to build that strong classifier. So you identify a feature that is important, for instance the bridge-of-the-nose feature, and you apply it to your images. And you get a result. For instance, it has identified that these three are indeed faces. That's great. And out of the ones on the right, it has identified that these three are not: it hasn't found that feature in those images, so for the algorithm those are not faces. That's also good.
But then it does have some errors. It's got these false negatives: it's saying negative, but that's false, it's not correct. It's got two false negatives, where it says these are not faces when they actually are faces. It didn't find the feature in these two pictures, but they are actually faces. And then it's got two false positives: it has identified these as faces even though they're not faces. So what adaptive boosting does is say: OK, cool.
We like this feature. The way we found it was perhaps that it is present in a huge number of images, it's the one that comes up most commonly, and through our empirical training we've identified that yes, this is a good feature. But now what adaptive boosting will do is complement it. I might have a lot of very good features: this feature is good, then another feature is good, and so on. But, to go back here, the point of adaptive boosting is that we don't want to just combine the strongest features, because they might actually be leveraging very similar things, and that might not be the best approach. The best approach, and this is the important part, is to take a strong feature and then complement it with something that will help fix it. Where this feature is strong, that's good; but where it's weak, where it's making mistakes, the next feature will complement it and help improve the performance in those areas where this feature by itself is falling behind. And then the following feature will be used to complement these two where they're falling behind.
So with each next one we're not just taking the strongest features we can find. We're taking a strong feature, or the strongest, then we're taking the one that best complements this one, then the next one that best complements these two, and so on. So basically, instead of just taking the strongest all the time, we're constructing the strongest resulting classifier that we can, by leveraging the strengths of this one, then fixing up its weaknesses with this one, then fixing up their weaknesses with this one, and so on. It's a kind of strategic approach.
So here again, this is all good, but then we've got these two false negatives and two false positives. What adaptive boosting will do in the next round is look for something that complements this feature that we found. And how it'll do that is by giving more weight to where the errors were made. Let's go back and I'll explain this. This is just to symbolize what is going on: we've decreased the sizes of these images and increased the sizes of these ones highlighted in blue, and decreased the sizes of these ones and increased the sizes of these ones. You can see it again if I go back.
So there you go. It's increased the sizes of these ones. It's not actually increasing the size of the image; I'm doing that here in the PowerPoint just so that it's clear and easier to follow along. What it's doing is increasing the weight, the importance, of these images. For the next Haar-like feature it looks at, the whole system is going to know that the importance of these images is higher, and it's going to find a feature that treats them the best, that helps classify them properly. That is how it implements that notion where we want to fix the problems of the first classifier, the first feature that we had.
So now if we add a new one, it finds the next one that works best with these images, that predominantly classifies these images properly, and it finds this one. As you can see, it has now classified these five images correctly. Unfortunately it has classified that one incorrectly, so that's a false negative. And it has classified these images correctly, and this one still incorrectly: it's a false positive. So maybe it was not possible to find a classifier that would classify both of these very well, so what the algorithm did is find one that classifies these four, including this one, very well. Now we're going to have to find a new classifier that fixes these errors, and that's what will happen next. Next we would focus on these two. Everything else will become smaller, and we'd focus on these two as predominantly the ones that we need to fix in order to create that very strong overall classifier.
And this would keep going. In the end, the goal is to create a classifier that classifies all of these correctly, and classifies all of these correctly as well: these are faces, these are non-faces. Of course it's never going to be ideal. But the point is to keep building this strong classifier out of these weak classifiers, just keep building it and minimizing the error rate until you get to a satisfactory level, a very high percentage of correct classifications, like 99 point something percent. At that point you don't need the 180,000. You just need that first couple of thousand, or hundreds, however many you need to get to that satisfactory rate, and that's all you need. Then your classifier is ready. That's a first step to reducing the computational expense of classifying these images.
That first step is the AdaBoost that we discussed today. The next step is going to be the cascading. Once you combine them together, you'll see that it actually improves the efficiency substantially.
So there you go, that's how the AdaBoost, or adaptive boosting, algorithm works. If you'd like to read a bit more about it, there is a paper by Kinh Tieu and Paul Viola called "Boosting Image Retrieval". This paper inspired some of the work in the Viola-Jones paper, which we were talking about at the start.
I hope you enjoyed today's tutorial. I look forward to seeing you next time, and until then, enjoy computer vision.