031 Object Detection - Step 5 (English subtitles)

Hello and welcome to this new tutorial. Now that we've done our four transformations, the input is ready to be fed into the neural network. It has the authorization to get in, and therefore that's exactly what we're going to do: we're going to get it in.

And so there is nothing more simple. You know, we already have our pre-trained model, SSD, and it is already pre-trained because we could load the weights thanks to this file, but that's not actually what we'll do now. We will load the weights at the end. Right now that is just a variable, so we will just use the variable, but then at the end of this implementation we will load the weights to get our pre-trained model.

So to feed x, our torch Variable that contains both the torch tensor of the input frame and the gradient, into the neural network net, well, we simply need to take our neural network net and then apply it to x, and that's it. That's how we feed x to the neural network.

But then, since this neural network net applied to the input x will return the output y, well, we're going to get this output y right now, and therefore I'm adding y = net(x). That gives us the output y. We will of course describe what y is shortly; you can already start to think about what it is exactly.
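As a rough sketch of the step just described, the snippet below applies a network to an input and collects the output. The real SSD model is not available here, so StandInNet is a hypothetical placeholder, and the input size (300 x 300) and the output shape (1, 21, 200, 5) are assumptions chosen only for illustration. The sketch also passes a plain tensor rather than a Variable, since recent PyTorch versions accept tensors directly.

```python
import torch
import torch.nn as nn


class StandInNet(nn.Module):
    """Hypothetical stand-in for the SSD model; it is NOT the real network."""

    def forward(self, x):
        batch = x.shape[0]
        # Assumed output layout: (batch, classes, detections per class,
        # [score, x0, y0, x1, y1]). The numbers 21 and 200 are illustrative.
        return torch.zeros(batch, 21, 200, 5)


net = StandInNet()

# One transformed RGB frame with a leading batch dimension (assumed size).
x = torch.rand(1, 3, 300, 300)

# Calling the module like a function runs its forward pass.
y = net(x)
print(y.shape)
```

Calling net(x) rather than net.forward(x) is the usual PyTorch idiom, because the call also runs any hooks registered on the module.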
But now we have the output, that's great, and so we can move on to the next step.

So the next step, what is the next step? Well, we just got our output y. y doesn't directly contain what we're interested in, that is, the result of the detection: whether we have a dog or a human in the input frame. To get that specific information we're interested in, well, we need to take the data attribute from y.

And so what we're going to do now is create a new tensor that we're going to call detections. So detections is a new tensor, a tensor contained in the output y, and it will contain the values we're interested in. And to get this tensor, well, we take our output y, then we add a dot and take the attribute data, and then we get the values of the output. Perfect. Now we have what we want.

The next step now is to create a new tensor object which will have the dimensions width, height, width, height. And no, I didn't say it twice by mistake. It's just a tensor of four dimensions: the first dimension is width, the second dimension is height, the third dimension is width and the fourth dimension is height. And now of course most of you must be thinking: why do we have to create such a tensor?
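To illustrate what taking the data attribute does, here is a minimal sketch on a toy computation rather than on the SSD output. The names w and y are placeholders made up for this example. In current PyTorch, y.data returns a tensor holding the same values as y but with no gradient tracking; y.detach() is the alternative generally recommended today.

```python
import torch

# Toy stand-in for a network output: a tensor that tracks gradients.
w = torch.ones(3, requires_grad=True)
y = w * 2.0

# Taking .data gives the raw values without the autograd history.
detections = y.data

print(y.requires_grad)           # y is still part of the autograd graph
print(detections.requires_grad)  # detections is not
print(detections)
```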
Well, that's because the position of the detected objects inside the image has to be normalized between 0 and 1, and to do this normalization we'll need this scale tensor with these four dimensions. Basically, the new tensor we're about to create right now, scale, will just be used to do this normalization between 0 and 1 of the positions of the objects detected in the image. That's the only purpose.

And now, why do we have width, height, width, height? That's because the first width, height will correspond to the scale of values of the upper left corner of the rectangle detector, and the second width, height will correspond to the scale of values of the lower right corner of this same rectangle detector. That's why we have a double width, height.

So let's create this scale tensor so that you can visualize it. As a general rule, to create a tensor in torch, well, we need to take our torch library and then use the Tensor class, so scale will be an object of the Tensor class, which therefore will be a tensor, a torch tensor. But as the arguments of this Tensor class we need to specify the four dimensions of the tensor, and these four dimensions are width, height, width, height. Perfect.
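A small sketch of how such a scale tensor relates pixel coordinates to values between 0 and 1 follows. The frame size (640 x 480) and the box coordinates are made-up numbers for illustration only. Note that in PyTorch terms the result is a one-dimensional tensor with four elements, one per box coordinate.

```python
import torch

# Hypothetical frame size; in practice these come from the input frame.
width, height = 640, 480

# One entry per box coordinate: (x0, y0) upper left, (x1, y1) lower right.
scale = torch.Tensor([width, height, width, height])

# A made-up box in pixel coordinates.
box_pixels = torch.Tensor([64, 48, 320, 240])

# Dividing by scale maps pixel coordinates into the range 0 to 1.
box_normalized = box_pixels / scale

# Multiplying by scale maps normalized coordinates back to pixels.
box_back = box_normalized * scale

print(scale.shape)
print(box_normalized)
print(box_back)
```

Repeating width and height lets a single element-wise operation rescale both corners of the rectangle at once.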
So this first width, height corresponds to the upper left corner of the rectangle, and this second width, height corresponds to the lower right corner of the rectangle. And we're doing this to normalize the scale of values of the position of the detected objects between 0 and 1. Perfect.

So another good thing done. Don't worry about the warnings here. That's just because we haven't used these detections and scale variables yet; we will do it very quickly.

But before we do that, I highly recommend taking a break, because what we're about to do now will be slightly more complicated than what we've been doing so far. So we're going to finish this tutorial now. Take a good break, possibly a little nap or a good coffee, and then we'll attack the heart of the SSD model. Hope that didn't sound too aggressive. But yeah, we're going to get into the heart of the SSD, the neural network.

So have a good break and I'll see you in the next tutorial. Until then, enjoy computer vision.