All language subtitles for 002 How Convolutional Neural Networks (CNN) works

Convolutional neural network, or CNN, is a type of neural network that uses the convolution operation in its processing. Thanks to CNNs, deep learning can be used to solve computer vision problems.

The convolutional layer is the core layer of deep neural networks for solving computer vision problems. We will dig deeper to understand the convolution process.

There are three major components in a convolutional layer: the input data, the filters (or kernels), and the feature maps. For example, suppose the input is a colour image with three dimensions: width, height, and channel.

The filter, or kernel, is commonly referred to as a feature detector because it moves across the image and performs a dot operation between the input and the values of the filter to produce an output, which is a feature map. Convolution is the name given to this process. The filter is a two-dimensional array of weights that can be updated during the training process.

By performing convolution on the image, we obtain a new image that contains features. So how exactly does convolution work? This slide gives a simple illustration. Given the image on the left and the filter (kernel) at the top, convolution is carried out by sliding the filter over the image and taking the dot product at each position.
The filter is applied to every part of the image, starting from the top-left and moving to the bottom-right corner. For example, the first convolution operation gives us a new value replacing the middle value of the top-left patch, changing the five into a four: zero multiplied by four, plus minus one multiplied by two, plus zero multiplied by three, plus minus one multiplied by eight, plus four multiplied by five, plus minus one multiplied by two, plus zero multiplied by four, plus minus one multiplied by four, plus zero multiplied by three, which is equal to four.

In this example we use stride equals one, which means we slide the filter one pixel to the right and perform a similar operation, giving minus nine as the next result. Continuing the process, we obtain this feature map.

After creating the initial feature map, we typically perform a series of convolutions using various filters, strides, and padding to obtain various feature maps. Later on, the network will assign large weights to feature maps that fit the labelled dataset and small weights to the others.

The number of filters determines the number of output channels; for example, two different filters produce two different feature maps, resulting in a two-channel output.

The stride is the amount of filter displacement during convolution.
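The sliding-window arithmetic above can be checked directly in NumPy. The sketch below is our own illustration, not the lecture's code: `conv2d_valid` implements the valid (no-padding) convolution being described, and the patch values (4, 2, 3 / 8, 5, 2 / 4, 4, 3) and filter weights (0, −1, 0 / −1, 4, −1 / 0, −1, 0) are read off the worked example.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Valid (no-padding) convolution: slide the kernel over the
    image and take the elementwise product-and-sum at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# The 3x3 patch from the worked example, centred on the value 5 ...
patch = np.array([[4., 2., 3.],
                  [8., 5., 2.],
                  [4., 4., 3.]])
# ... and the example filter.
kernel = np.array([[0., -1., 0.],
                   [-1., 4., -1.],
                   [0., -1., 0.]])

value = conv2d_valid(patch, kernel)[0, 0]
print(value)  # 4.0 -- the new value replacing the centre 5
```

Applying the same function to a full image with stride 1 reproduces the sliding-window feature map described in the example.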
Convolution with stride one is demonstrated in this and the previous examples. If we use stride two, we slide the filter two pixels to the right, and so on. It is important to note that with stride two the resulting feature map will be even smaller. This technique is used for downsampling the feature map in YOLOv4.

Padding is the process of adding a border of pixels around the input image. There are three types of padding. The first is no padding, or so-called valid padding. As previously illustrated, the feature map generated by the convolution process will be smaller than the input if we use valid padding.

The second is same padding: the input image is padded with one layer of zeros around the border, and the size of the feature map generated by the convolution will be the same as the input.

The third is full padding: the feature map generated by the convolution will be larger than the input with this padding. For example, this is the input and this is the output.

The pooling layer in most deep neural networks is responsible for downsampling, that is, reducing the feature map dimensions and the number of parameters. This is important to speed up the process while maintaining important features.

Pooling is commonly used in two ways: max pooling or average pooling. Max pooling returns the maximum value of the image region covered by the kernel.
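The effect of stride and the three padding types on feature-map size can be summarised with the standard formula out = (n − k + 2p) / s + 1. The formula and the helper below are our addition, not stated in the lecture:

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Spatial size of the feature map for an n x n input,
    k x k kernel, given stride and padding."""
    return (n - k + 2 * pad) // stride + 1

n, k = 5, 3
valid = conv_output_size(n, k, pad=0)            # valid: output shrinks
same = conv_output_size(n, k, pad=(k - 1) // 2)  # same: size preserved
full = conv_output_size(n, k, pad=k - 1)         # full: output grows
strided = conv_output_size(n, k, stride=2)       # stride 2: downsampled
print(valid, same, full, strided)  # 3 5 7 2
```

This matches the lecture: valid padding shrinks the map, same padding preserves its size, full padding enlarges it, and a larger stride shrinks it further.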
Average pooling returns the average value of the region covered by the kernel. Please bear in mind that in this example we use stride equals two, which means we slide the two-by-two pooling kernel two pixels to the right and then two pixels down.

Right before the end of a deep neural network is a fully connected layer. Because the feature map produced by the convolution and pooling layers is still a multidimensional array, it must be reshaped into a vector before it can be used as input to the fully connected layer. This reshaping process is called flattening.

The fully connected layer is commonly used in multilayer perceptron applications and aims to transform the data dimensions so that the data can be classified linearly.
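Max pooling with a 2x2 kernel and stride 2, followed by flattening, can be sketched as below. This is a minimal NumPy illustration of the two steps just described; the feature-map values are made up, and `max_pool_2x2` is our own helper, not the lecture's code:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the largest value
    in each non-overlapping 2x2 window."""
    h, w = x.shape
    # Group the array into 2x2 blocks, then take the max of each block.
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 2.],
                 [7., 2., 9., 4.],
                 [1., 0., 3., 5.]])

pooled = max_pool_2x2(fmap)  # 4x4 -> 2x2, downsampled feature map
flat = pooled.flatten()      # flattening: vector input for the FC layer
print(pooled)
print(flat)
```

Swapping `.max(axis=(1, 3))` for `.mean(axis=(1, 3))` gives average pooling instead.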
