031 Object Detection - Step 5 (English subtitles)

Hello and welcome to this new tutorial. Now that we've done our four transformations, the input is ready to be fed into the neural network. It has the authorization to get in, and that's exactly what we're going to do: we're going to get it in.

Nothing could be simpler. We already have our pre-trained SSD model. Strictly speaking, it is pre-trained because we can load the weights thanks to this file, but that's not what we'll do right now; we will load the weights at the end. For now net is just a variable, so we will simply use that variable, and at the end of this implementation we will load the weights to get our pre-trained model.

So, to feed x, our torch Variable that contains both the torch tensor of the input frame and the gradient, into the neural network net, we simply take net and apply it to x. That's it; that's how we feed x to the neural network. And since the neural network net applied to the input x returns the output y, we're going to get this output right now by writing y = net(x). That gives us the output y. We will of course describe what y is shortly; you can already start thinking about what it is exactly.
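To make the call y = net(x) concrete, here is a minimal sketch. The class StandInNet is a hypothetical placeholder, not the SSD architecture from the course, and the 300x300 input size is an assumption chosen only for illustration. In PyTorch 0.4 and later, Variable was merged into Tensor, so the sketch passes a plain tensor rather than wrapping it in a Variable.

```python
import torch
import torch.nn as nn


class StandInNet(nn.Module):
    """Hypothetical placeholder for the SSD model (not the real architecture)."""

    def __init__(self):
        super().__init__()
        # One small convolution so the module has weights and a forward pass.
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


net = StandInNet()
net.eval()  # inference mode, as when running detection on a frame

# A batch holding one 3-channel frame; the 300x300 size is an assumption.
x = torch.zeros(1, 3, 300, 300)

# Applying the network to x runs its forward pass and returns the output y.
y = net(x)
```

Calling net(x) rather than net.forward(x) is the usual PyTorch idiom, because the call also runs any hooks registered on the module.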
But now we have the output, and that's great, so we can move on to the next step. What is the next step? We just got our output y, but y doesn't directly contain what we're interested in, which is the result of the detection: whether we have a dog or a human in the input frame. To get that specific information, we need to take the data attribute of y. So we're going to create a new tensor that we'll call detections. detections is a tensor contained in the output y, and it will contain the values we're interested in. To get it, we take our output y, add a dot, and take the data attribute: detections = y.data. That gives us the values of the output. Perfect, now we have what we want.

The next step is to create a new tensor object that will hold the values width, height, width, height. No, I didn't say it twice by mistake: it's a tensor of four elements. The first element is the width, the second is the height, the third is the width and the fourth is the height. And now, of course, most of you must be thinking: why do we have to create such a tensor?
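The step of taking the data attribute from y, described above, can be sketched as follows. The tensor y here is a stand-in built by hand, with an arbitrary shape, since the real output of the SSD model is not described at this point in the tutorial. The comparison with y.detach() is an addition for readers on recent PyTorch versions, where detach() is the generally recommended way to get the values without gradient tracking.

```python
import torch

# Stand-in for the network output y: a tensor that tracks gradients.
# The shape (1, 2, 3) is arbitrary and only for illustration.
weights = torch.ones(1, 2, 3, requires_grad=True)
y = weights * 2.0

# Take the values held by y, without the gradient bookkeeping.
detections = y.data

print(y.requires_grad)           # True
print(detections.requires_grad)  # False

# On recent PyTorch versions, y.detach() yields the same values.
print(torch.equal(detections, y.detach()))  # True
```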
That's because the position of the detected objects inside the image has to be normalized between 0 and 1, and to do this normalization we'll need this scale tensor with these four elements. Basically, the new tensor we're about to create, scale, will just be used to do this normalization between 0 and 1 of the positions of the objects detected in the image. That's its only purpose.

Now, why do we have width, height, width, height? The first width and height correspond to the scale of values of the upper left corner of the detection rectangle, and the second width and height correspond to the scale of values of the lower right corner of the same rectangle. That's why we have width and height twice.

So let's create this scale tensor so that you can visualize it. As a general rule, to create a tensor in Torch, we take our torch library and use the Tensor class. So scale will be an object of the Tensor class, which means it will be a torch tensor. As the argument of this Tensor class we need to specify the four elements of the tensor, and these four elements are width, height, width, height. Perfect.
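A minimal sketch of the scale tensor and how it relates normalized positions to pixel positions. The frame size (640x480) and the box values are hypothetical. The sketch also assumes the rectangle is given as (x of upper left corner, y of upper left corner, x of lower right corner, y of lower right corner), which is why width and height each appear twice.

```python
import torch

# Hypothetical frame size in pixels.
width, height = 640, 480

# One element per coordinate: (x0, y0) is scaled by (width, height),
# and (x1, y1) is scaled by (width, height) again.
scale = torch.Tensor([width, height, width, height])

# Hypothetical rectangle with corners expressed between 0 and 1.
normalized_box = torch.Tensor([0.25, 0.5, 0.75, 1.0])

# Element-wise multiplication maps the corners to pixel coordinates.
pixel_box = normalized_box * scale
print(pixel_box.tolist())  # [160.0, 240.0, 480.0, 480.0]

# Element-wise division goes back to the 0-to-1 range.
print((pixel_box / scale).tolist())  # [0.25, 0.5, 0.75, 1.0]
```

Because the operation is element-wise, the same scale tensor works for any rectangle from the same frame.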
So this first width and height correspond to the upper left corner of the rectangle, and this second width and height correspond to the lower right corner of the rectangle. We're doing this to normalize the scale of values of the positions of the detected objects between 0 and 1. Perfect, another good thing done.

Don't worry about the warnings here. They appear only because we haven't used the detections and scale variables yet; we will do so very quickly. But before we do that, I highly recommend taking a break, because what we're about to do now will be slightly more complicated than what we've been doing so far. So we're going to finish this tutorial now. Take a good break, possibly a little nap or a good coffee, and then we'll attack the heart of the SSD model. I hope that didn't sound too aggressive, but yes, we're going to get into the heart of the SSD, the neural network. So have a good break and I'll see you in the next tutorial. Until then, enjoy computer vision.
