1
00:00:00,180 --> 00:00:05,280
Convolutional neural networks, or CNNs, are a type of neural network that uses the convolution operation
2
00:00:05,280 --> 00:00:06,300
in the process.
3
00:00:06,900 --> 00:00:10,860
Deep learning can be used to solve computer vision problems, thanks to CNNs.
4
00:00:12,040 --> 00:00:16,990
The convolutional layer is the core layer of deep neural networks for solving computer vision problems.
5
00:00:17,560 --> 00:00:20,710
We will dig deeper to understand the convolution process.
6
00:00:21,700 --> 00:00:27,040
There are three major components in a convolutional layer, which are the input data, the
7
00:00:27,040 --> 00:00:28,000
filters or kernels, and the feature maps.
8
00:00:28,570 --> 00:00:33,550
For example, suppose the input is a colour image with three dimensions which are width, height and
9
00:00:33,550 --> 00:00:34,000
channel.
10
00:00:34,840 --> 00:00:39,400
The filter or kernel is commonly referred to as a feature detector because it moves across the image
11
00:00:39,400 --> 00:00:43,540
and performs a dot product between the input and the values of the filter
12
00:00:43,540 --> 00:00:45,550
to produce an output, which is a feature map.
13
00:00:46,120 --> 00:00:48,670
Convolution is the name given to this process.
14
00:00:49,270 --> 00:00:53,860
The filter is a two-dimensional array of weights that can be updated during the training process.
15
00:00:54,870 --> 00:00:59,460
By performing convolution on the image, we will have a new image which contains features.
16
00:00:59,940 --> 00:01:02,460
So how exactly does convolution work?
17
00:01:03,030 --> 00:01:05,129
This slide gives a simple illustration.
18
00:01:05,840 --> 00:01:08,690
Given an image on the left and the filter or kernel at the top.
19
00:01:09,500 --> 00:01:12,080
Convolution is carried out by applying the dot product of
20
00:01:12,080 --> 00:01:16,250
the filter to every part of the image, starting from the top left to the bottom right corner.
21
00:01:16,970 --> 00:01:22,100
For example, from the first convolution operation, we will have a new value replacing the middle value
22
00:01:22,100 --> 00:01:24,260
of the top-left corner, which changes from five to four.
23
00:01:24,920 --> 00:01:30,410
We will have zero multiplied by four plus minus one multiplied by two plus zero multiplied by three
24
00:01:30,410 --> 00:01:36,290
plus minus one multiplied by eight plus four multiplied by five plus minus one multiplied by two plus
25
00:01:36,290 --> 00:01:41,690
zero multiplied by four plus minus one multiplied by four plus zero multiplied by three, which is equal
26
00:01:41,690 --> 00:01:42,320
to four.
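The arithmetic above can be checked with a short NumPy sketch. The 3x3 filter weights and the 3x3 image patch are taken from the narration; this is only an illustration, not the course's own code:

```python
import numpy as np

# 3x3 filter from the example (weights read off the narration)
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]])

# 3x3 image patch under the filter in the first step (values from the narration)
patch = np.array([[4, 2, 3],
                  [8, 5, 2],
                  [4, 4, 3]])

# one convolution step: element-wise multiply, then sum
value = int(np.sum(patch * kernel))
print(value)  # 4, the new value replacing the centre pixel 5
```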
27
00:01:43,030 --> 00:01:47,920
In this example, we use stride equals one, which means we slide the filter to the right by one pixel
28
00:01:48,190 --> 00:01:52,330
and perform a similar operation, and we will have minus nine as the result.
29
00:01:53,180 --> 00:01:55,770
Continue the process and we will have this feature map.
30
00:01:59,070 --> 00:02:03,960
After creating the initial feature map, we typically perform a series of convolutions using various
31
00:02:03,960 --> 00:02:07,380
filters, strides, and padding to obtain various feature maps.
32
00:02:08,070 --> 00:02:13,500
Later on, the network will give large weights to feature maps that fit the labelled dataset and small
33
00:02:13,500 --> 00:02:14,880
weights to the others.
34
00:02:15,920 --> 00:02:21,260
The number of filters has an impact on the output channels; for example, two different filters
35
00:02:21,260 --> 00:02:24,860
produce two different feature maps, resulting in a two-channel output.
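How several filters yield a multi-channel output can be sketched as follows. The 5x5 image and the two filters are made-up values for illustration; only the valid-convolution mechanics follow the lecture:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution (cross-correlation) with stride one."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25.).reshape(5, 5)            # toy 5x5 input
filters = [np.ones((3, 3)), np.eye(3)]          # two illustrative 3x3 filters

# one feature map per filter, stacked along the channel axis
fmaps = np.stack([conv2d(image, f) for f in filters], axis=-1)
print(fmaps.shape)  # (3, 3, 2) -> a two-channel output
```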
36
00:02:26,110 --> 00:02:31,420
The stride is the amount of filter displacement that occurs during convolution. Convolution with stride
37
00:02:31,420 --> 00:02:37,150
one is demonstrated in this and the previous examples. If we use stride two, we slide the filter two pixels
38
00:02:37,150 --> 00:02:38,590
to the right, and so on.
39
00:02:39,040 --> 00:02:44,110
It is important to note that if we use stride two, the resulting feature map size will be even smaller.
40
00:02:44,650 --> 00:02:48,250
This technique is used for downsampling the feature map in YOLOv4.
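The effect of the stride on the output size can be sketched with a minimal valid convolution. The 6x6 image and the all-ones kernel are assumed toy values:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid convolution (cross-correlation) with a configurable stride."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36.).reshape(6, 6)
kernel = np.ones((3, 3))
print(conv2d(image, kernel, stride=1).shape)  # (4, 4)
print(conv2d(image, kernel, stride=2).shape)  # (2, 2): stride two downsamples
```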
41
00:02:49,440 --> 00:02:52,950
Padding is the process of adding a border of zeros around the input image.
42
00:02:53,220 --> 00:02:54,960
There are three types of padding.
43
00:02:55,530 --> 00:02:58,370
The first is without padding, or so-called valid padding.
44
00:02:59,010 --> 00:03:03,990
As previously illustrated, the feature map generated by the convolution process will be smaller than
45
00:03:03,990 --> 00:03:05,820
the input if we use valid padding.
46
00:03:06,980 --> 00:03:08,630
The second is same padding.
47
00:03:09,260 --> 00:03:12,230
The input image is padded with one layer of zeros at the border.
48
00:03:12,740 --> 00:03:17,360
The size of the feature map generated by the convolution process will be the same as the input.
49
00:03:18,550 --> 00:03:24,100
The third is full padding. The size of the feature map generated by the convolution process will
50
00:03:24,100 --> 00:03:26,140
be larger than the input with this padding.
51
00:03:26,710 --> 00:03:29,590
For example, this is the input and this is the output.
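The three padding types and their effect on the output size (for a 3x3 kernel and stride one) can be sketched as follows. The 5x5 input is an assumed toy size:

```python
import numpy as np

image = np.zeros((5, 5))   # toy 5x5 input
k = 3                      # 3x3 kernel

# valid padding: no border, the output shrinks
valid = image
# same padding: one layer of zeros, the output size equals the input
same = np.pad(image, 1)
# full padding: k-1 layers of zeros, the output is larger than the input
full = np.pad(image, k - 1)

for name, img in [("valid", valid), ("same", same), ("full", full)]:
    out_size = img.shape[0] - k + 1   # valid-convolution output size, stride one
    print(name, out_size)             # valid 3, same 5, full 7
```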
52
00:03:30,710 --> 00:03:35,600
The pooling layer in most deep neural networks is responsible for downsampling, or reducing the feature
53
00:03:35,600 --> 00:03:38,180
map dimensions and the number of parameters.
54
00:03:38,750 --> 00:03:42,800
This is important to speed up the process while maintaining important features.
55
00:03:43,860 --> 00:03:47,910
Pooling is commonly used in two ways: max pooling or average pooling.
56
00:03:48,540 --> 00:03:52,440
The maximum value of the image region covered by the kernel is returned by max pooling,
57
00:03:52,440 --> 00:03:56,670
while the average value of the image region covered by the kernel is returned by average pooling.
58
00:03:57,300 --> 00:04:02,340
Please bear in mind that in this example we used stride equals two, which means we slide the two-by-two pooling
59
00:04:02,340 --> 00:04:07,570
kernel two pixels to the right and then two pixels down. Right before the end of a deep neural network
60
00:04:07,590 --> 00:04:09,000
is a fully connected layer.
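Max and average pooling with a two-by-two kernel and stride two, as described above, can be sketched like this; the 4x4 feature-map values are made up for illustration:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """2x2 pooling with stride two, as in the example."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(pool2d(fmap, mode="max"))  # maxima of each 2x2 region: [[6, 8], [3, 4]]
print(pool2d(fmap, mode="avg"))  # means of each 2x2 region
```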
61
00:04:10,040 --> 00:04:14,880
Because the feature map produced by the convolution and pooling layers is still a multidimensional array,
62
00:04:14,900 --> 00:04:19,279
it must be reshaped into a vector before it can be used as input to the fully connected layer.
63
00:04:20,070 --> 00:04:22,079
This reshaping process is called flattening.
64
00:04:22,680 --> 00:04:27,900
The fully connected layer is commonly used in multilayer perceptron applications and aims to transform the
65
00:04:27,900 --> 00:04:30,870
data dimension so that the data can be classified linearly.
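Flattening followed by a fully connected layer can be sketched as follows. The feature-map shape and the ten output units are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.random((2, 2, 8))   # toy feature map: 2x2 spatial, 8 channels

# flattening: reshape the multidimensional array into a vector
flat = fmap.reshape(-1)
print(flat.shape)              # (32,)

# fully connected layer: weights @ input + bias (sizes are illustrative)
W = rng.random((10, flat.size))
b = np.zeros(10)
logits = W @ flat + b
print(logits.shape)            # (10,), one value per output unit
```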