All language subtitles for 025 Predicting Object Positions-en

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranรฎ)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:00,660 --> 00:00:04,980 Hello and welcome back to the course on computer vision in today's eternal we're going to talk about 2 00:00:04,980 --> 00:00:11,010 another powerful feature of the essays the algorithm and that is how it predicts object positions. 3 00:00:11,010 --> 00:00:18,090 So here we got another lovely photo which we took in Italy and these are just part of what's so small 4 00:00:18,090 --> 00:00:24,720 boats which are parked for the day or for the night and they're going to sometime soon go maybe fishing 5 00:00:24,930 --> 00:00:26,990 or whatever they're meant to do. 6 00:00:27,300 --> 00:00:35,130 And as previously we're going to overlay a grid over this image and then for every single center on 7 00:00:35,130 --> 00:00:41,110 this grid we're going to introduce a couple of rectangles and we're going to stick with our example 8 00:00:41,110 --> 00:00:43,470 three rectangles like that. 9 00:00:43,560 --> 00:00:49,110 And again as we discussed previously we're going to take each one of these rectangles and think of it 10 00:00:49,170 --> 00:00:55,650 as a separate image and then within each rectangle the conventional neural network will ask the question 11 00:00:57,360 --> 00:01:03,350 is there a bullet or is there not a bone basically lost about objects if you if you watch this video 12 00:01:03,360 --> 00:01:08,880 this is from the promotional video for the course and you posit at some point you'll see that this boat 13 00:01:08,880 --> 00:01:15,340 actually at one in one frame was recognized as an airplane or there was like a boat and an airplane 14 00:01:15,340 --> 00:01:16,450 recognized in one here. 15 00:01:16,450 --> 00:01:23,350 So it kind of shows that that's Lewis looking for all classes of objects at the same time. 16 00:01:23,350 --> 00:01:30,040 So yeah it's a good verification that sometimes those errors happen it's just shows that maybe it can 17 00:01:30,050 --> 00:01:34,390 confuse an object and this does look like a bit not like a nonstandard board because it's like a rubber 18 00:01:34,390 --> 00:01:35,680 boat. 19 00:01:35,730 --> 00:01:36,080 Yeah. 20 00:01:36,100 --> 00:01:43,120 So basically each one of these squares each one of these rectangles they go say this rectangle can there's 21 00:01:43,120 --> 00:01:44,230 so many of them you can barely see. 22 00:01:44,240 --> 00:01:50,050 Let's say this right over here is going to ask is like just looking at it by itself forgetting about 23 00:01:50,050 --> 00:01:53,560 everything else is going to ask is there any object inside here. 24 00:01:53,560 --> 00:01:55,680 Can I detect any features of an object. 25 00:01:55,750 --> 00:01:56,780 Yes or no. 26 00:01:57,050 --> 00:02:03,310 And so in our case in that well after the training in the ideal scenario of these falling rectangles 27 00:02:03,340 --> 00:02:08,130 will pick up features of bolts and so let's get rid of everything else. 28 00:02:08,140 --> 00:02:15,040 So there you go you can see that these rectangles do overlap with our ground truth and therefore it's 29 00:02:15,040 --> 00:02:17,120 fair to say that they will pick up the sort of features. 30 00:02:17,150 --> 00:02:25,570 But again just a couple of things I wanted to point out here that during training this might not be 31 00:02:25,570 --> 00:02:31,270 the case maybe this rectangle pick up the boat on this rectangle and this rectangle won't pick up a 32 00:02:31,270 --> 00:02:31,500 boat. 33 00:02:31,510 --> 00:02:36,670 But then through the years of training process they will know they will learn what the features of boats 34 00:02:37,330 --> 00:02:39,670 are and how to pick them up and so on. 35 00:02:39,670 --> 00:02:42,020 So that's the training that we got to lend. 36 00:02:42,020 --> 00:02:47,860 Lengthy hard part in the detection of the network after it's trained up this is what will happen. 37 00:02:47,920 --> 00:02:54,010 So we kind of like combining the concepts of training and application so it's important just to keep 38 00:02:54,010 --> 00:02:56,230 in mind what exactly we're talking about. 39 00:02:56,470 --> 00:03:03,910 So this is the ideal scenario what would happen during train and during application after the network 40 00:03:03,910 --> 00:03:04,640 has trained. 41 00:03:04,930 --> 00:03:12,630 And so let's have a look at one specific set of these three boxes for instance this one over here. 42 00:03:12,640 --> 00:03:19,660 So the interesting thing is that these boxes have picked up features of a boat ride so they can see 43 00:03:19,660 --> 00:03:27,900 maybe the front of the boat and this anchor line maybe some you know you can see that it's in the water. 44 00:03:28,000 --> 00:03:29,370 They can see like this. 45 00:03:29,420 --> 00:03:33,010 This part isn't just standing out meaning that's where the driver stands. 46 00:03:33,020 --> 00:03:37,540 And so some of you can see more some of them can see less but nevertheless not all of them can see the 47 00:03:37,540 --> 00:03:39,050 full image of the boat. 48 00:03:39,090 --> 00:03:41,940 None of you can see the full ground truth of the boat. 49 00:03:42,220 --> 00:03:51,190 And so the question is how can a box predict how can like one of these temporary or inferred boxes how 50 00:03:51,190 --> 00:03:55,630 can it predict where exactly the full boat will be. 51 00:03:55,630 --> 00:03:56,400 That's the question. 52 00:03:56,470 --> 00:04:02,110 So if you like if you use your hand to just close that this part of the image. 53 00:04:02,110 --> 00:04:07,840 How can you complete this boat without knowing where it actually is supposed to be complete without 54 00:04:07,840 --> 00:04:09,190 knowing the full ground truth. 55 00:04:09,190 --> 00:04:11,490 How do you know where it will be completed. 56 00:04:11,710 --> 00:04:17,880 And that is another special part in the s is the algorithm. 57 00:04:17,890 --> 00:04:20,910 It's actually not that complex when you think about it. 58 00:04:21,070 --> 00:04:29,460 So the underlying principle is that we have the ground truth when making these predictions and training 59 00:04:29,460 --> 00:04:32,020 the algorithms so now we're back to talking about training. 60 00:04:32,110 --> 00:04:37,550 So when training the answer the algorithm we have these ground troops we have these actual boxes. 61 00:04:37,900 --> 00:04:46,030 And so we can then let these boxes let the algorithm make its predictions of the algorithm based on 62 00:04:46,030 --> 00:04:52,360 what it can see based on these individual boxes it can then make a prediction. 63 00:04:52,360 --> 00:04:52,560 OK. 64 00:04:52,570 --> 00:04:57,010 I think the boat looks like maybe maybe I'll say it looks like this. 65 00:04:57,050 --> 00:05:03,940 So the full boat the full rectangle should be around like this size although it wouldn't do that because 66 00:05:03,940 --> 00:05:07,030 it knows that there's no features a boat overboard in here. 67 00:05:07,030 --> 00:05:07,600 Right so. 68 00:05:07,710 --> 00:05:12,340 But for instance it could say like it knows that this rectangle has features of both. 69 00:05:12,340 --> 00:05:12,870 Right. 70 00:05:12,910 --> 00:05:17,020 So it doesn't know maybe the boat actually goes up up to up to the top here. 71 00:05:17,020 --> 00:05:19,360 So it does use the overlapping. 72 00:05:19,450 --> 00:05:21,610 So it is it makes it easier. 73 00:05:21,610 --> 00:05:28,960 So like if you go if you go back it will use overlapping and where they are overlapping that's of course 74 00:05:28,960 --> 00:05:34,230 is a high probability that chance that there is a boat where the two rectangles which are overlapping 75 00:05:34,230 --> 00:05:34,860 is a hybrid. 76 00:05:34,870 --> 00:05:35,590 It is a boat. 77 00:05:35,890 --> 00:05:41,260 But nevertheless you can still make errors it can still maim like come to the assumption that the boat 78 00:05:41,260 --> 00:05:47,350 is higher longer or wider or something like that but the way this is mitigated is again through training. 79 00:05:47,350 --> 00:05:52,870 So at the end once it's made its predictions there will be a CERN. 80 00:05:53,020 --> 00:06:00,460 CERN or box that it suggests for this body will say that the box star said this pixel the bottom left 81 00:06:00,460 --> 00:06:04,790 corner is here on the bottom right corner is instead of saying here it might say we're here. 82 00:06:05,140 --> 00:06:11,380 And then it will actually that's resolved that box that it creates will be compared to the ground truth 83 00:06:11,920 --> 00:06:18,080 and then again will be an era which can be calculated and if you think of it in terms of coordinates 84 00:06:18,080 --> 00:06:20,130 it can be calculated CHONE distance. 85 00:06:20,340 --> 00:06:26,700 There's lots of mathematics going on in the background so we're not going to go into that right now 86 00:06:26,710 --> 00:06:28,590 how exactly the error is calculated. 87 00:06:28,740 --> 00:06:35,460 But you can envisage how the arrow would could be expressed in mathematical terms and that era would 88 00:06:35,460 --> 00:06:41,300 be then propagated back up through the network in order to help update the weights. 89 00:06:41,310 --> 00:06:47,910 And again we talked about all of this like the back propagation process and gradient descent in the 90 00:06:47,910 --> 00:06:54,280 annex for a and then artificial neural networks and convolutional neural networks but then the conceptual 91 00:06:54,300 --> 00:06:55,080 that's what is happening. 92 00:06:55,080 --> 00:07:01,680 So the bottom line takeaway from this tutorial is that through training two things happen the first 93 00:07:01,680 --> 00:07:09,150 thing is that what we discussed previously is that each box will learn to better classify objects or 94 00:07:09,150 --> 00:07:11,680 better identify if it has objects inside there. 95 00:07:11,700 --> 00:07:15,990 And in each box will use the ground truth to assess that. 96 00:07:15,990 --> 00:07:20,910 So for instance this box will say that yes there is about where is this box or this box. 97 00:07:20,920 --> 00:07:23,960 This this box we want it to say yes there is about. 98 00:07:24,120 --> 00:07:28,650 And this box we don't want it to say that there is a bot and that's what'll happen to training and they'll 99 00:07:28,650 --> 00:07:29,650 get better at that. 100 00:07:29,670 --> 00:07:36,450 But the second thing that will happen is they will get better at identifying the exact final output 101 00:07:36,450 --> 00:07:39,840 rectangle of that they should create. 102 00:07:39,840 --> 00:07:46,590 So we want we want these this algorithm this is D to produce this exact rectangle which matches the 103 00:07:46,590 --> 00:07:52,020 ground truth rather than a large direct eye or rather than a shorter rectangle. 104 00:07:52,020 --> 00:07:55,670 So for instance here you can see there is always going to be some sort of error. 105 00:07:55,680 --> 00:08:00,060 So here you can see that this rectangle because this isn't the ground truth right. 106 00:08:00,120 --> 00:08:04,050 So at the beginning we discussed there isn't this isn't this isn't ground truth. 107 00:08:04,050 --> 00:08:08,250 We assume this to be the ground truth but as you can see. 108 00:08:08,310 --> 00:08:12,270 And then at that time I said that I wouldn't draw a better rectangle around any of them except I'm into 109 00:08:12,290 --> 00:08:12,500 this. 110 00:08:12,510 --> 00:08:16,910 But now if you look carefully you can see that this rectangle could be a bit narrower. 111 00:08:16,930 --> 00:08:22,190 Right you could see that it could be a bit narrower here and this rectangle could also be a bit narrower. 112 00:08:22,190 --> 00:08:26,700 So it's taking a little bit more so there's no I can there's no bones in these pixels over here on the 113 00:08:26,700 --> 00:08:27,300 right. 114 00:08:27,300 --> 00:08:32,740 There's no boat in these pixels on the left and maybe even this one could be a tiny little bit narrower. 115 00:08:33,480 --> 00:08:34,470 So there we go. 116 00:08:34,470 --> 00:08:40,280 That's how one second that's how the training happens. 117 00:08:40,290 --> 00:08:46,350 And so that's another element of the training that is important to keep in mind so maybe if this algorithm 118 00:08:46,350 --> 00:08:51,780 was trained along with the one that was actually applied to drill what we took as the ground maybe if 119 00:08:51,840 --> 00:08:58,200 it was trained for a longer maybe this bullet would have been more precisely identified during the application 120 00:08:58,200 --> 00:09:00,090 process. 121 00:09:00,210 --> 00:09:05,640 But having said that this was a quite a tough challenge because as you can imagine the blow boats are 122 00:09:05,640 --> 00:09:12,790 kind of like moving around they're floating with waves the waves and also that the drone is flying. 123 00:09:12,810 --> 00:09:20,730 So this is actually a pretty good result for those circumstances anyway so that's the second part that 124 00:09:20,730 --> 00:09:22,890 happens in the training. 125 00:09:23,580 --> 00:09:29,620 This is slow as you can see slowly we're building up the different components of this deal. 126 00:09:29,610 --> 00:09:34,800 So training is how to learn how to classify objects through the ground truth but also is the ground 127 00:09:34,800 --> 00:09:39,270 truth to identify the correct widths and heights of those boxes. 128 00:09:39,480 --> 00:09:47,270 And in the next tutorial we'll continue and talk about the next part component of that as is the. 129 00:09:47,440 --> 00:09:50,860 And of course see you there and until next time in your computer vision. 14816

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.