026 The Scale Problem

Hello, and welcome back to the course on computer vision. Today we're talking about the scale problem: the scale problem that the SSD algorithm actually solves, and how it solves it.

OK. So what is the scale problem? Well, let's have a look at this image. This is an image we took in, what was it, Brno, in the Czech Republic. Yes, just some horses in a field. And what can we see here?

Well, let's just do what we have learned so far, and let's see how the algorithm would go if we just applied the convolution. So over the grid it would apply some boxes, and right away we can see the boxes that have found features of horses and the boxes which haven't. And if I remove the green boxes which haven't found features of horses, then you will see that somehow this horse right in front, the main horse, the one that's the most obvious to us as humans, has been missed. The algorithm did not pick up this horse. It picked up that horse over there, it obviously saw features of that horse, and features of that horse, and it saw features of this horse that you can barely see, but it does not see this horse, point blank. And so the question is why.
Why is that happening here? What's going on? Well, the reason is that this horse is too close. It's too big for the algorithm to pick up its features, and I'll show you what I mean by that. So let's look at just a couple of rectangles. Look at these ones, and let's single out one of them, so that rectangle. And while in the grand scheme of things it might seem, how can you not pick up a horse, it's pretty obvious that there is a horse, let's remember what we talked about at the start. We said that every single one of these rectangles works as a separate image. It is dealing in its own right with what it has inside, and it is trying to understand: do I see features of a horse, or do I not see features of a horse?

And to make it even easier to visualize, let's crop the rest of the image and look at what this rectangle is getting at. Let's imagine that we are this rectangle, and what this rectangle is seeing is that. Now, by looking at that image, can you identify that there's a horse in there? Probably not. What does that look like, if anything? I don't know, something like a puddle of mud, or maybe some fog, or a fire in the distance. I don't know.
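To put a rough number on why a single rectangle cannot recognise a very large object, here is a minimal sketch in plain Python. The box coordinates, the 300-pixel image size, and the 10 by 10 grid are all hypothetical values chosen for illustration; they are not taken from the lecture's image. The function reports the largest share of an object's area that any one grid cell contains.

```python
def overlap_area(a, b):
    """Area of intersection of two (x_min, y_min, x_max, y_max) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def best_cell_coverage(box, image_size, grid):
    """Largest fraction of the box's area that falls inside a single grid cell."""
    cell = image_size / grid
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    best = 0.0
    for row in range(grid):
        for col in range(grid):
            cell_box = (col * cell, row * cell, (col + 1) * cell, (row + 1) * cell)
            best = max(best, overlap_area(box, cell_box) / box_area)
    return best


# Hypothetical boxes in a 300 x 300 image cut into a 10 x 10 grid.
big_horse = (60, 40, 300, 280)    # close to the camera, fills most of the frame
small_horse = (30, 30, 60, 60)    # far away, fits inside one cell

big_share = best_cell_coverage(big_horse, 300, 10)
small_share = best_cell_coverage(small_horse, 300, 10)
print(f"big horse:   best cell sees {big_share:.1%} of it")
print(f"small horse: best cell sees {small_share:.1%} of it")
```

With these made-up numbers, no cell sees more than about 1.6% of the big horse, while one cell sees the small horse in full, which matches the intuition of the cropped rectangle looking like mud or fog.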
Maybe water falling or something like that. It could be anything, right? If you just forget about what we saw previously and let your eyes adjust, you'll notice that it doesn't resemble a horse in any sort of way.

And you'll probably say, OK, that's fair enough, but there are other features of the horse that you could identify from this image. You could maybe identify the hooves, or the tail, or the eyes. Well, let's have a look at that as well. Let's have a look at another rectangle. Look at this one. So here you can see that yes, we can kind of identify that there is an eye. And because of the, what is it called, the part that goes in the mouth of the horse, I really should know this, but because of that pink strap we can see that it's a horse. And it's fair that the algorithm should also be leveraging that feature. So yes, it does resemble a horse better.

But remember that SSD is not only meant to identify that indeed there's a horse in there. It will also need to make a suggestion for the actual box that contains this horse. And so from here, can you tell, just forgetting about the full image, where the horse is actually located?
Is it located like this, or is it located like this? Or maybe it has turned its head like this, but it's actually standing on this side. Maybe that's one of its hooves, maybe that is the horse's hoof, and maybe the horse is standing here. So it's very hard for the algorithm to predict what the horse looks like, what the full image of the horse is like. Maybe there's something else blocking the horse in the way. So the horse is so big that in most cases, for most rectangles, we cannot even see that it is a horse. And even in those cases where we can, due to certain features, identify that it's a horse, it's very hard to build the rectangle.

And so this is where the next component of SSD comes to the rescue. So let's have a look at how SSD deals with the scale problem. Let's move back to the architecture, and here we can see that there are many layers. So far we've been talking mostly about examples on the original image, but in reality what happens is that the image is constantly reduced in size. There are many layers in this network that's working behind the algorithm, in the background.
And basically what it does is apply a convolution operation to reduce the image size. So here you can see how it was 300, then it's 38, 19, 10 pixels, 5 pixels, 3 pixels, 1 pixel. So it's constantly being reduced in size. And then everything we talked about is applied at every single step, on all of those images: the rectangles are applied, and the training of what they contain and where they should be happens. And as you can see here, there's a huge number of detections: 8,732 detections per class in this setup, for a 300 by 300 input image at the start. And as you can imagine, there are many different classes as well.

And that's because there are many iterations. Basically, every time we convolve the image further, we are again applying these classifiers. Here you can see classifiers being applied to detect, in our case, horses. Again, classifiers are being applied to detect horses, classifiers are being applied, classifiers. So that just shows that everything is being repeated again and again and again. So let's have a look at how that happens, and then I'll provide some additional comments on that.
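The figure of 8,732 detections per class can be checked with a few lines of arithmetic. The feature-map sizes (38, 19, 10, 5, 3, 1) come from the lecture. The number of default boxes per location on each map (4, 6, 6, 6, 4, 4) is not stated in the lecture; it is the SSD300 configuration as described in the original SSD paper and is used here as an assumption.

```python
# Side length of each square feature map, as listed in the lecture.
feature_map_sizes = [38, 19, 10, 5, 3, 1]

# Default boxes per location on each map. Assumption: the SSD300 setup
# from the SSD paper; the lecture does not give these numbers.
boxes_per_location = [4, 6, 6, 6, 4, 4]

per_layer = [size * size * boxes
             for size, boxes in zip(feature_map_sizes, boxes_per_location)]
total_detections = sum(per_layer)

for size, count in zip(feature_map_sizes, per_layer):
    print(f"{size:>2} x {size:<2} map: {count} boxes")
print("total per class:", total_detections)
```

Most of the boxes come from the first, largest map (5,776 of them), which is the one responsible for the smallest objects.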
So there is our image. What happens to the image is that it actually goes through a convolution. Here we will just resize it, but that's just for visual purposes, just to help us understand better what's going on. In reality it's not just resized. It actually goes through the convolution, so the image won't look anything like itself any more. It will be a completely random-looking thing to us, but it will preserve the features. Once again, we discuss this more in the convolutional neural networks tutorials, which are in Annex number two. But for our purposes, just for us to better understand the intuition behind things, we're going to keep it as an image of horses, the exact same image, just resized. That will still help us understand what's going on.

And so here what happens is the image is resized, but the rectangles stay the same size, and now they are applied to this new image, which is resized. And because the image is smaller, that means that the horse is smaller.
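As a small illustration of the difference between resizing and convolving, the sketch below applies a strided convolution to a toy 6 by 6 "image" in plain Python. The image, the 2 by 2 edge kernel, and the stride of 2 are made-up values for illustration, with no padding and a single channel; the real network uses learned kernels over many channels. The output is smaller and no longer looks like the input, but it still records where the feature (a vertical edge) was.

```python
def conv2d(image, kernel, stride):
    """Valid (no padding) 2D convolution of a square image with a square kernel."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right (edge between columns 2 and 3).
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]

# Hypothetical kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

feature_map = conv2d(image, edge_kernel, stride=2)
for row in feature_map:
    print(row)
```

The 6 by 6 input becomes a 3 by 3 map whose middle column is non-zero: the map is half the size, yet the edge's position is preserved.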
And now if we try to pick up features of horses, you'll see that these rectangles do pick up a horse, whereas those rectangles that used to pick up a horse no longer do. So this horse is not picked up anymore, that one isn't, that one isn't, because they're smaller now. It's similar to what we had in the first example with Hadelin at the lake, where there were people in the background but they were too small to be identified. Same thing here. So just for argument's sake, we're going to say that these horses are not picked up, and that's totally fine because we've already picked them up before. Whereas now we've picked up this big horse, which in this new version of the image is smaller, and so now it will be identified.

The question that still remains, though, is how we will draw the box for this horse in the original image. From what we discussed in the previous tutorial, we know how the algorithm goes about drawing this box in this version of the image: through training, it has lots of examples of horses or other objects with the ground truth, and it will be able to understand how to draw these boxes. So in this case it will also learn how to draw the boxes in this image. But remember that we got this image not just through resizing but through a convolution operation.
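The idea that the rectangles stay the same size while the image shrinks can be turned around: one cell of a smaller map corresponds to a larger patch of the original image. The sketch below uses the 300-pixel input and the map sizes from the lecture. The helper `first_layer_covering` is a hypothetical rule of thumb for intuition only; actual SSD assigns objects to default boxes of several scales and aspect ratios rather than to bare cells.

```python
INPUT_SIZE = 300
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]


def cell_footprint(map_size, input_size=INPUT_SIZE):
    """Side length, in original-image pixels, covered by one cell of a map."""
    return input_size / map_size


def first_layer_covering(object_size, map_sizes=FEATURE_MAP_SIZES):
    """First (largest) map whose single cell is at least as big as the object.

    Hypothetical helper for intuition; not how SSD matches boxes in practice.
    """
    for map_size in map_sizes:
        if cell_footprint(map_size) >= object_size:
            return map_size
    return map_sizes[-1]


for size in FEATURE_MAP_SIZES:
    print(f"{size:>2} x {size:<2} map: one cell covers about "
          f"{cell_footprint(size):.1f} px of the original")

print("25 px horse  ->", first_layer_covering(25), "x map")
print("250 px horse ->", first_layer_covering(250), "x map")
```

Under this rule of thumb a small, distant horse is handled on an early map, while the big horse in front only fits once the map has shrunk to a few cells.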
And that means we still have a question: how do we then take this box and move it back? Well, it's pretty straightforward. We're just going to look at our architecture again here. Wherever we are in the process, the SSD algorithm is constructed in such a way that it preserves information about how to scale back to the original image, so how to go from, let's say, this layer to here, or how to go from this layer to here. So it will remember that it found the horse, for instance, over here in this part of the image at this layer, and it will also know how it got to this layer, so it will know how to get back. So how to deconvolve, I mean in simplistic terms, how to get back to the original image, and where it should place the horse there. That's an important part: it needs to place the horse in the right place not only in this image but in the original image. And the SSD algorithm is equipped to do that.

And so in this way we are now able to handle objects of different sizes. And the beauty of the SSD algorithm is that it doesn't just resize the image and then proceed to do the convolution operation again. The idea here is that all of this happens at the same time: all of these boxes that we're detecting, all of the positions of the boxes, the classes inside the boxes, and even the resizing, the different scales of the images.
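In the simplest terms, getting a box back to the original image is a change of coordinates: the network knows how much each layer has shrunk the image, so it can scale the box back up by the same factor. The sketch below shows only that scaling step. The function name and the example box are hypothetical, and in actual SSD the boxes are predicted as offsets to default boxes in normalized coordinates, which makes this mapping implicit.

```python
def box_to_original(cell_box, map_size, input_size=300):
    """Scale a box given in feature-map cell units back to input-image pixels.

    cell_box is (x_min, y_min, x_max, y_max) measured in cells of a
    map_size x map_size feature map.
    """
    scale = input_size / map_size
    return tuple(value * scale for value in cell_box)


# Hypothetical detection: the big horse spans cells 1 to 4 on the 5 x 5 map.
box_on_map = (1, 1, 4, 4)
box_in_image = box_to_original(box_on_map, map_size=5)
print(box_in_image)
```

Because each cell of the 5 by 5 map stands for 60 pixels of the 300-pixel input, a box that is 3 cells wide on that map becomes 180 pixels wide in the original image.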
It all happens in one neural network, over here, like this. So instead of having this image, and then a copy of the image that is just resized, and another copy that is smaller, and one smaller still, and running separate neural networks for each one, we're running all of this in one neural network. And why is that good? Why is that important? Because we can share the features across the different layers. When we're training the network, it learns how to detect horses when they're at full size, as in this layer, and also when they're made smaller. And that way all of the layers are learning together. And that gives it additional power. There's power in numbers: they're seeing more different horses at the same time, so some horses are seen here, some horses are seen there, and therefore it is more powerful in the sense that it has more examples to work from, and the network can better adjust its weights to properly detect those horses. So this really helps, or facilitates, the training.

So that's how the SSD algorithm deals with the scale problem. It's quite a smart solution. I'm sure you'd agree that putting all of this into one convolutional neural network is pretty exciting, and it definitely gives us some amazing results.
And on that note, I hope you enjoyed this tutorial, and I look forward to seeing you next time. Until then, enjoy computer vision.
