1
00:00:00,180 --> 00:00:05,280
Convolutional neural networks, or CNNs, are a type of neural network that uses the convolution operation
2
00:00:05,280 --> 00:00:06,300
in the process.
3
00:00:06,900 --> 00:00:10,860
Deep learning can be used to solve computer vision problems, thanks to CNNs.
4
00:00:12,040 --> 00:00:16,990
The convolutional layer is the core layer of deep neural networks for solving computer vision problems.
5
00:00:17,560 --> 00:00:20,710
We will dig deeper to understand the convolution process.
6
00:00:21,700 --> 00:00:27,040
There are three major components in a convolutional layer, which are the input data, the
7
00:00:27,040 --> 00:00:28,000
filters or kernels, and the feature maps.
8
00:00:28,570 --> 00:00:33,550
For example, suppose the input is a colour image with three dimensions which are width, height and
9
00:00:33,550 --> 00:00:34,000
channel.
10
00:00:34,840 --> 00:00:39,400
The filter or kernel is commonly referred to as a feature detector because it moves across the image
11
00:00:39,400 --> 00:00:43,540
and performs a dot product between the input and the values of the filter
12
00:00:43,540 --> 00:00:45,550
to produce an output, which is a feature map.
13
00:00:46,120 --> 00:00:48,670
Convolution is the name given to this process.
14
00:00:49,270 --> 00:00:53,860
The filter is a two-dimensional array of weights that can be updated during the training process.
15
00:00:54,870 --> 00:00:59,460
By performing convolution on the image, we will have a new image which contains features.
16
00:00:59,940 --> 00:01:02,460
So how exactly does convolution work?
17
00:01:03,030 --> 00:01:05,129
This slide gives a simple illustration.
18
00:01:05,840 --> 00:01:08,690
Given an image on the left and the filter or kernel at the top.
19
00:01:09,500 --> 00:01:12,080
Convolution is carried out by applying the dot product of
20
00:01:12,080 --> 00:01:16,250
the filter to every part of the image, starting from the top left to the bottom right corner.
21
00:01:16,970 --> 00:01:22,100
For example, from the first convolution operation, we will have a new value replacing the middle value
22
00:01:22,100 --> 00:01:24,260
of the top-left corner, which changes from five to four.
23
00:01:24,920 --> 00:01:30,410
We will have zero multiplied by four plus minus one multiplied by two plus zero multiplied by three
24
00:01:30,410 --> 00:01:36,290
plus minus one multiplied by eight plus four multiplied by five plus minus one multiplied by two plus
25
00:01:36,290 --> 00:01:41,690
zero multiplied by four plus minus one multiplied by four plus zero multiplied by three, which is equal
26
00:01:41,690 --> 00:01:42,320
to four.
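The arithmetic above can be checked with a short NumPy sketch. The 3x3 filter weights and the 3x3 image patch are taken from the narration; this is only an illustration, not the course's own code:

```python
import numpy as np

# 3x3 filter from the example (weights read off the narration)
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]])

# 3x3 image patch under the filter in the first step (values from the narration)
patch = np.array([[4, 2, 3],
                  [8, 5, 2],
                  [4, 4, 3]])

# one convolution step: element-wise multiply, then sum
value = int(np.sum(patch * kernel))
print(value)  # 4, the new value replacing the centre pixel 5
```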
27
00:01:43,030 --> 00:01:47,920
In this example, we use stride equals one, which means we slide the filter to the right by one pixel
28
00:01:48,190 --> 00:01:52,330
and perform a similar operation, and we will have minus nine as the result.
29
00:01:53,180 --> 00:01:55,770
Continue the process and we will have this feature map.
30
00:01:59,070 --> 00:02:03,960
After creating the initial feature map, we typically perform a series of convolutions using various
31
00:02:03,960 --> 00:02:07,380
filters, strides, and padding to obtain various feature maps.
32
00:02:08,070 --> 00:02:13,500
Later on, the network will give large weights to feature maps that fit the labelled dataset and small
33
00:02:13,500 --> 00:02:14,880
weights to the others.
34
00:02:15,920 --> 00:02:21,260
The number of filters has an impact on the output channels; for example, two different filters
35
00:02:21,260 --> 00:02:24,860
produce two different feature maps, resulting in a two-channel output.
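How several filters yield a multi-channel output can be sketched as follows. The 5x5 image and the two filters are made-up values for illustration; only the valid-convolution mechanics follow the lecture:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution (cross-correlation) with stride one."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25.).reshape(5, 5)            # toy 5x5 input
filters = [np.ones((3, 3)), np.eye(3)]          # two illustrative 3x3 filters

# one feature map per filter, stacked along the channel axis
fmaps = np.stack([conv2d(image, f) for f in filters], axis=-1)
print(fmaps.shape)  # (3, 3, 2) -> a two-channel output
```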
36
00:02:26,110 --> 00:02:31,420
The stride is the amount of filter displacement that occurs during convolution. Convolution with stride
37
00:02:31,420 --> 00:02:37,150
one is demonstrated in this and the previous examples. If we use stride two, we slide the filter two pixels
38
00:02:37,150 --> 00:02:38,590
to the right, and so on.
39
00:02:39,040 --> 00:02:44,110
It is important to note that if we use stride two, the resulting feature map size will be even smaller.
40
00:02:44,650 --> 00:02:48,250
This technique is used for downsampling the feature map in YOLOv4.
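The effect of the stride on the output size can be sketched with a minimal valid convolution. The 6x6 image and the all-ones kernel are assumed toy values:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid convolution (cross-correlation) with a configurable stride."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36.).reshape(6, 6)
kernel = np.ones((3, 3))
print(conv2d(image, kernel, stride=1).shape)  # (4, 4)
print(conv2d(image, kernel, stride=2).shape)  # (2, 2): stride two downsamples
```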
41
00:02:49,440 --> 00:02:52,950
Padding is the process of adding a border of zeros around the input image.
42
00:02:53,220 --> 00:02:54,960
There are three types of padding.
43
00:02:55,530 --> 00:02:58,370
The first is without padding, or so-called valid padding.
44
00:02:59,010 --> 00:03:03,990
As previously illustrated, the feature map generated by the convolution process will be smaller than
45
00:03:03,990 --> 00:03:05,820
the input if we use valid padding.
46
00:03:06,980 --> 00:03:08,630
The second is same padding.
47
00:03:09,260 --> 00:03:12,230
The input image is padded with one layer of zeros at the border.
48
00:03:12,740 --> 00:03:17,360
The size of the feature map generated by the convolution process will be the same as the input.
49
00:03:18,550 --> 00:03:24,100
The third is full padding. The size of the feature map generated by the convolution process will
50
00:03:24,100 --> 00:03:26,140
be larger than the input with this padding.
51
00:03:26,710 --> 00:03:29,590
For example, this is the input and this is the output.
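The three padding types and their effect on the output size (for a 3x3 kernel and stride one) can be sketched as follows. The 5x5 input is an assumed toy size:

```python
import numpy as np

image = np.zeros((5, 5))   # toy 5x5 input
k = 3                      # 3x3 kernel

# valid padding: no border, the output shrinks
valid = image
# same padding: one layer of zeros, the output size equals the input
same = np.pad(image, 1)
# full padding: k-1 layers of zeros, the output is larger than the input
full = np.pad(image, k - 1)

for name, img in [("valid", valid), ("same", same), ("full", full)]:
    out_size = img.shape[0] - k + 1   # valid-convolution output size, stride one
    print(name, out_size)             # valid 3, same 5, full 7
```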
52
00:03:30,710 --> 00:03:35,600
The pooling layer in most deep neural networks is responsible for downsampling, or reducing the feature
53
00:03:35,600 --> 00:03:38,180
map dimensions and the number of parameters.
54
00:03:38,750 --> 00:03:42,800
This is important to speed up the process while maintaining important features.
55
00:03:43,860 --> 00:03:47,910
Pooling is commonly used in two ways: max pooling or average pooling.
56
00:03:48,540 --> 00:03:52,440
The maximum value of the image region covered by the kernel is returned by max pooling,
57
00:03:52,440 --> 00:03:56,670
while the average value of the image region covered by the kernel is returned by average pooling.
58
00:03:57,300 --> 00:04:02,340
Please bear in mind that in this example we used stride equals two, which means we slide the two-by-two pooling
59
00:04:02,340 --> 00:04:07,570
kernel two pixels to the right and then two pixels down. Right before the end of a deep neural network
60
00:04:07,590 --> 00:04:09,000
is a fully connected layer.
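Max and average pooling with a two-by-two kernel and stride two, as described above, can be sketched like this; the 4x4 feature-map values are made up for illustration:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """2x2 pooling with stride two, as in the example."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(pool2d(fmap, mode="max"))  # maxima of each 2x2 region: [[6, 8], [3, 4]]
print(pool2d(fmap, mode="avg"))  # means of each 2x2 region
```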
61
00:04:10,040 --> 00:04:14,880
Because the feature map produced by the convolution and pooling layers is still a multidimensional array,
62
00:04:14,900 --> 00:04:19,279
it must be reshaped into a vector before it can be used as input to the fully connected layer.
63
00:04:20,070 --> 00:04:22,079
This reshaping process is called flattening.
64
00:04:22,680 --> 00:04:27,900
The fully connected layer is commonly used in multilayer perceptron applications and aims to transform the
65
00:04:27,900 --> 00:04:30,870
data dimension so that the data can be classified linearly.
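Flattening followed by a fully connected layer can be sketched as follows. The feature-map shape and the ten output units are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.random((2, 2, 8))   # toy feature map: 2x2 spatial, 8 channels

# flattening: reshape the multidimensional array into a vector
flat = fmap.reshape(-1)
print(flat.shape)              # (32,)

# fully connected layer: weights @ input + bias (sizes are illustrative)
W = rng.random((10, flat.size))
b = np.zeros(10)
logits = W @ flat + b
print(logits.shape)            # (10,), one value per output unit
```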