1
00:00:00,270 --> 00:00:02,640
Hello and welcome back to the course on deep learning.
2
00:00:02,730 --> 00:00:05,140
All right today we're talking about the activation function.
3
00:00:05,190 --> 00:00:07,010
Let's get straight into it.
4
00:00:07,020 --> 00:00:11,910
So this is where we left off previously: we talked about the structure of one neuron.
5
00:00:12,030 --> 00:00:16,770
So there it is in the middle. We know that it has some input values coming in, it's got some weights,
6
00:00:17,130 --> 00:00:23,370
then it calculates the weighted sum of those inputs and then applies the activation
7
00:00:23,370 --> 00:00:24,690
function in step 3.
8
00:00:24,750 --> 00:00:30,090
It passes on the signal to the next neuron, and then that's what we're talking about today: we're talking
9
00:00:30,090 --> 00:00:32,850
about the value that is going to be passed over.
10
00:00:32,850 --> 00:00:35,970
So we're talking about the activation function that's being applied.
11
00:00:36,390 --> 00:00:39,270
So what options do we have for the activation function?
12
00:00:39,270 --> 00:00:43,400
Well we're going to look at four different types of activation functions that you can choose from.
13
00:00:43,410 --> 00:00:47,400
Of course there are more types of activation function, but these are the predominant ones that
14
00:00:47,400 --> 00:00:50,390
you'll be hearing about and that we'll be using in this course.
15
00:00:50,400 --> 00:00:53,060
So here is the threshold function.
16
00:00:53,070 --> 00:00:54,300
This is what it looks like.
17
00:00:54,300 --> 00:00:59,600
So on the x axis you have the weighted sum of inputs; on the y axis
18
00:00:59,610 --> 00:01:07,320
you have just, you know, the values from 0 to 1. And basically the threshold function is a very simple
19
00:01:07,330 --> 00:01:14,700
type of function where, if the value is less than zero, then the threshold
20
00:01:14,730 --> 00:01:16,680
function passes on zero.
21
00:01:16,890 --> 00:01:22,940
If the value is more than zero or equal to zero, then the threshold function passes on a 1.
22
00:01:22,940 --> 00:01:26,910
So it's basically kind of like a yes/no type of function.
23
00:01:26,940 --> 00:01:29,130
Very very straightforward.
24
00:01:29,130 --> 00:01:33,500
A very rigid kind of function: either yes or no.
25
00:01:33,540 --> 00:01:35,000
No other options.
26
00:01:35,040 --> 00:01:35,510
So there you go.
27
00:01:35,510 --> 00:01:36,210
That's how it works.
28
00:01:36,210 --> 00:01:37,440
Very simple function.
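The yes/no behavior just described can be sketched in a few lines of NumPy; the function name and the sample inputs here are just for illustration.

```python
import numpy as np

def threshold(x):
    # Step activation: 0 when the weighted sum is below zero, 1 otherwise.
    return np.where(x < 0, 0, 1)

print(threshold(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0 0 1 1]
```

Note that an input of exactly zero already produces a 1, matching the "more than zero or equal to zero" rule above.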
29
00:01:37,440 --> 00:01:40,020
Let's move on to something a bit more complex.
30
00:01:40,020 --> 00:01:48,420
Now, this sigmoid function. Very interesting formula that we have here, you'll see just now: there is one
31
00:01:48,420 --> 00:01:49,940
divided by one plus e to
32
00:01:49,950 --> 00:01:58,450
the power of minus x, where in this case of course x is the value of the weighted sum.
33
00:01:58,590 --> 00:02:00,540
And so yeah.
34
00:02:00,570 --> 00:02:02,600
So this is what the sigmoid looks like.
35
00:02:02,610 --> 00:02:06,510
It's a function which is used in the logistic regression.
36
00:02:06,510 --> 00:02:09,470
If you recall from the machine learning course.
37
00:02:09,540 --> 00:02:12,000
So what is good about this function is that it is smooth.
38
00:02:12,060 --> 00:02:14,880
Unlike the threshold function,
39
00:02:14,970 --> 00:02:21,720
this one doesn't have those kinks in its curve, and therefore it's just a nice, smooth, gradual progression.
40
00:02:21,720 --> 00:02:26,340
So anything below 0 just kind of drops off, and above zero
41
00:02:26,340 --> 00:02:35,220
it approximates towards one. And this sigmoid function is very useful in the final layer, the output
42
00:02:35,220 --> 00:02:35,590
layer.
43
00:02:35,610 --> 00:02:38,900
Especially when you're trying to predict probabilities.
44
00:02:38,910 --> 00:02:40,820
And we'll see that throughout the course.
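As a minimal sketch of the formula above (the function name is just for illustration):

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), where x is the weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5 -- right on the boundary
```

Large negative weighted sums approach 0 and large positive ones approach 1, which is why the output can be read as a probability.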
45
00:02:41,190 --> 00:02:47,370
And then we've got the rectifier function. The rectifier function, even though it has a kink, is one of the
46
00:02:47,370 --> 00:02:55,090
most popular functions for artificial neural networks. So all the way up to zero it is zero,
47
00:02:55,110 --> 00:03:02,460
and then from there it gradually progresses as the input value increases. And we'll see that
48
00:03:02,460 --> 00:03:07,140
throughout the course. We'll see that in other intuition tutorials, and we'll also see how we use this
49
00:03:07,140 --> 00:03:13,020
function in the practical side of the course. And I will comment on this a bit more in a few slides from
50
00:03:13,020 --> 00:03:13,590
now.
51
00:03:13,590 --> 00:03:18,970
So just remember: the rectifier function is one of the most used functions in artificial neural networks.
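The rectifier described here can be sketched as follows (the function name is just for illustration):

```python
import numpy as np

def rectifier(x):
    # max(x, 0): zero for negative weighted sums, the identity above zero
    return np.maximum(x, 0)

print(rectifier(np.array([-3.0, 0.0, 2.0])))  # [0. 0. 2.]
```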
52
00:03:19,020 --> 00:03:22,770
And finally we've got one more function that you will probably hear about.
53
00:03:22,830 --> 00:03:25,220
It's the hyperbolic tangent function.
54
00:03:25,260 --> 00:03:32,760
It's very similar to the sigmoid function, but here the hyperbolic tangent function goes below zero, so
55
00:03:32,760 --> 00:03:39,510
the values go from 0 to 1 (or approximately to 1) and from 0 to minus 1 on the other side.
56
00:03:39,750 --> 00:03:42,360
And that can be useful in some applications.
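A quick side-by-side sketch of the two curves, using NumPy's built-in `tanh` (the sample inputs here are just for illustration):

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])
t = np.tanh(x)                 # hyperbolic tangent: squashes into (-1, 1)
s = 1.0 / (1.0 + np.exp(-x))   # sigmoid, for comparison: squashes into (0, 1)
```

For large negative inputs, tanh approaches minus 1 while the sigmoid approaches 0, which is the key difference mentioned above.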
57
00:03:42,390 --> 00:03:48,060
So we're not going to go into too much depth on each one of these functions I just wanted to acquaint
58
00:03:48,060 --> 00:03:51,680
you with them so that you know what they look like and what they're called.
59
00:03:51,780 --> 00:03:59,690
If you'd like to get some additional reading, then check out this paper by Xavier Glorot,
60
00:03:59,820 --> 00:04:05,630
Xavier Glorot, called Deep Sparse Rectifier Neural Networks, a 2011 paper.
61
00:04:05,790 --> 00:04:14,700
And there you will find out exactly why the rectifier function is such a valuable function and why it's
62
00:04:14,970 --> 00:04:16,300
so popularly used.
63
00:04:16,350 --> 00:04:20,640
But nevertheless for now we don't really need to know all of those things.
64
00:04:20,650 --> 00:04:24,240
For now we're just going to start applying them; you'll start using them more and more and more.
65
00:04:24,270 --> 00:04:31,290
And so when you feel comfortable with the practical side of things then you can go and refer to this
66
00:04:31,290 --> 00:04:37,140
paper and then you will be able to soak in that knowledge much quicker and it will make much more sense.
67
00:04:37,370 --> 00:04:42,000
But just keep this in mind that when you're ready when you feel that you're ready then you can go and
68
00:04:42,120 --> 00:04:45,060
read that research paper and get some valuable knowledge from it.
69
00:04:45,540 --> 00:04:53,070
So just to quickly recap: we have the threshold activation function, which goes like this; the sigmoid
70
00:04:53,100 --> 00:04:55,360
activation function which looks like this.
71
00:04:55,680 --> 00:05:01,770
We have the rectifier function and we have the hyperbolic tangent function. And now, to finish off this
72
00:05:01,770 --> 00:05:09,000
tutorial, let's quickly do a few exercises, just two quick exercises to help that knowledge sink
73
00:05:09,000 --> 00:05:09,150
in.
74
00:05:09,150 --> 00:05:15,140
So the first one is: we've got an example here of a neural network of just one neuron, and that's right away
75
00:05:15,160 --> 00:05:16,030
the output layer.
76
00:05:16,140 --> 00:05:22,620
And the question is: assuming that your dependent variable is binary, so it's either 0 or 1, which activation
77
00:05:22,620 --> 00:05:23,780
function would you use?
78
00:05:23,790 --> 00:05:31,980
So out of the ones that we've discussed, we have the threshold function, the sigmoid function, the rectifier
79
00:05:31,980 --> 00:05:39,480
function, and we've got the hyperbolic tangent function, in their raw forms. Which ones would
80
00:05:39,480 --> 00:05:43,450
you be able to use for a binary variable.
81
00:05:43,950 --> 00:05:44,410
OK.
82
00:05:44,490 --> 00:05:49,360
So the answer here is: there are two options that we can approach this with.
83
00:05:49,380 --> 00:05:55,790
So one is the threshold activation function, because we know that it's between 0 and 1: it gives us
84
00:05:55,800 --> 00:06:00,090
0 when the weighted sum is below zero and otherwise it gives you a 1; it can only give you two values.
85
00:06:00,090 --> 00:06:10,020
It fits this requirement perfectly, and therefore you could say y equals the threshold
86
00:06:10,020 --> 00:06:13,770
function of your weighted sum, and that's it.
87
00:06:14,010 --> 00:06:18,450
And the second one which you could use is the sigmoid activation function.
88
00:06:18,450 --> 00:06:21,710
It is actually also between 0 and 1, just what we need.
89
00:06:21,750 --> 00:06:29,940
But at the same time, it doesn't give you just a 0 or a 1, so it's not exactly what we need. But in this case,
90
00:06:29,940 --> 00:06:37,530
what you could use it as is the probability of y being yes or no.
91
00:06:37,530 --> 00:06:46,170
So we want y to be 0 or 1, but instead we'll say that the sigmoid activation function tells
92
00:06:46,170 --> 00:06:51,860
us the probability of y being equal to 1.
93
00:06:51,870 --> 00:06:59,130
So basically, the closer you get to the top, the more likely it is that this is indeed a one, or a yes
94
00:06:59,160 --> 00:07:00,300
rather than a no.
95
00:07:00,750 --> 00:07:04,700
And yeah so that's very similar to the logistic regression approach.
96
00:07:04,920 --> 00:07:07,570
And those are just two examples.
97
00:07:07,650 --> 00:07:09,610
If you have a binary variable.
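The two options just discussed can be sketched side by side; the sample weighted sums here are just for illustration.

```python
import numpy as np

z = np.array([-1.2, 0.3, 2.5])        # weighted sums for three observations

# Option 1: threshold -> a hard 0/1 answer
y_hard = np.where(z < 0, 0, 1)

# Option 2: sigmoid -> the probability that y equals 1
p = 1.0 / (1.0 + np.exp(-z))
y_from_prob = (p >= 0.5).astype(int)  # thresholding the probability at 0.5
```

Both routes produce the same labels here, but the sigmoid also tells you how confident each prediction is.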
98
00:07:10,120 --> 00:07:12,810
Now let's have a look at another practical application.
99
00:07:12,810 --> 00:07:17,190
Let's have a look at how all this would play out if we had a neural network like this.
100
00:07:17,430 --> 00:07:20,960
So in the first layer we have some inputs.
101
00:07:20,980 --> 00:07:26,060
They are sent off to our first hidden layer and then an activation function is applied.
102
00:07:26,070 --> 00:07:31,380
And usually what you would apply here, and what you will see throughout this course, is we'll apply a rectifier
103
00:07:31,410 --> 00:07:34,510
activation function so it would look something like that.
104
00:07:34,530 --> 00:07:40,980
We apply the rectifier activation function and then from there the signals would be passed on to the
105
00:07:40,980 --> 00:07:46,820
output layer where the sigmoid activation function would be applied and that would be our final output.
106
00:07:46,830 --> 00:07:51,270
And that could predict a probability for instance so this combination is going to be quite common where
107
00:07:51,600 --> 00:07:58,640
in the hidden layers we apply the rectifier function and then output there we apply the sigmoid function.
108
00:07:58,890 --> 00:07:59,850
So there we go.
109
00:07:59,850 --> 00:08:05,040
Hope you enjoyed this tutorial now you are quite well versed in four different types of activation functions
110
00:08:05,040 --> 00:08:11,130
and you will get some hands-on practical experience with them throughout this course. We'll be using them
111
00:08:11,220 --> 00:08:15,900
all over the place so you'll get to know them quite intimately and you should be quite comfortable with
112
00:08:15,900 --> 00:08:16,310
them.
113
00:08:16,530 --> 00:08:22,230
But for now, this is the knowledge that you need to progress and understand what is going to be happening
114
00:08:22,250 --> 00:08:23,600
further down in this course.
115
00:08:23,940 --> 00:08:26,940
And on that note I look forward to seeing you next time.
116
00:08:26,940 --> 00:08:28,560
Until then enjoy learning.