1
00:00:00,270 --> 00:00:02,640
Hello and welcome back to the course on deep learning.
2
00:00:02,730 --> 00:00:05,140
All right today we're talking about the activation function.
3
00:00:05,190 --> 00:00:07,010
Let's get straight into it.
4
00:00:07,020 --> 00:00:11,910
So this is where we left off previously: we talked about the structure of one neuron.
5
00:00:12,030 --> 00:00:16,770
So there it is in the middle. We know that it has some input values coming in, it's got some weights,
6
00:00:17,130 --> 00:00:23,370
then it calculates the weighted sum of those inputs and then applies the activation
7
00:00:23,370 --> 00:00:24,690
function in step 3.
8
00:00:24,750 --> 00:00:30,090
It passes on the signal to the next neuron, and then that's what we're talking about today: we're talking
9
00:00:30,090 --> 00:00:32,850
about the value that is going to be passed over.
10
00:00:32,850 --> 00:00:35,970
So we're talking about the activation function that's being applied.
11
00:00:36,390 --> 00:00:39,270
So what options do we have for the activation function?
12
00:00:39,270 --> 00:00:43,400
Well we're going to look at four different types of activation functions that you can choose from.
13
00:00:43,410 --> 00:00:47,400
Of course there are more types of activation function, but these are the predominant ones that
14
00:00:47,400 --> 00:00:50,390
you'll be hearing about and that we'll be using in this course.
15
00:00:50,400 --> 00:00:53,060
So here is the threshold function.
16
00:00:53,070 --> 00:00:54,300
This is what it looks like.
17
00:00:54,300 --> 00:00:59,600
So on the x axis you have the weighted sum of inputs; on the y axis
18
00:00:59,610 --> 00:01:07,320
you have just, you know, the values from 0 to 1. And basically the threshold function is a very simple
19
00:01:07,330 --> 00:01:14,700
type of function where, if the value is less than zero, then the threshold
20
00:01:14,730 --> 00:01:16,680
function passes on zero.
21
00:01:16,890 --> 00:01:22,940
If the value is more than zero or equal to zero, then the threshold function passes on a 1.
22
00:01:22,940 --> 00:01:26,910
So it's basically kind of like a yes/no type of function.
23
00:01:26,940 --> 00:01:29,130
Very very straightforward.
24
00:01:29,130 --> 00:01:33,500
A very rigid kind of function: either yes or no.
25
00:01:33,540 --> 00:01:35,000
No other options.
26
00:01:35,040 --> 00:01:35,510
So there you go.
27
00:01:35,510 --> 00:01:36,210
That's how it works.
28
00:01:36,210 --> 00:01:37,440
Very simple function.
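The yes/no behavior just described can be sketched in a few lines of NumPy; the function name and the sample inputs here are just for illustration.

```python
import numpy as np

def threshold(x):
    # Step activation: 0 when the weighted sum is below zero, 1 otherwise.
    return np.where(x < 0, 0, 1)

print(threshold(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0 0 1 1]
```

Note that an input of exactly zero already produces a 1, matching the "more than zero or equal to zero" rule above.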
29
00:01:37,440 --> 00:01:40,020
Let's move on to something a bit more complex.
30
00:01:40,020 --> 00:01:48,420
Now, this sigmoid function. Very interesting formula that we have here, you'll see just now: there is one
31
00:01:48,420 --> 00:01:49,940
divided by one plus e to
32
00:01:49,950 --> 00:01:58,450
the power of minus x, where in this case of course x is the value of the weighted sum.
33
00:01:58,590 --> 00:02:00,540
And so yeah.
34
00:02:00,570 --> 00:02:02,600
So this is what the sigmoid looks like.
35
00:02:02,610 --> 00:02:06,510
It's a function which is used in the logistic regression.
36
00:02:06,510 --> 00:02:09,470
If you recall from the machine learning course.
37
00:02:09,540 --> 00:02:12,000
So what is good about this function is that it is smooth.
38
00:02:12,060 --> 00:02:14,880
Unlike the threshold function,
39
00:02:14,970 --> 00:02:21,720
this one doesn't have those kinks in its curve, and therefore it's just a nice, smooth, gradual progression.
40
00:02:21,720 --> 00:02:26,340
So anything below 0 just kind of drops off, and above zero
41
00:02:26,340 --> 00:02:35,220
it approximates towards one. And this sigmoid function is very useful in the final layer, the output
42
00:02:35,220 --> 00:02:35,590
layer.
43
00:02:35,610 --> 00:02:38,900
Especially when you're trying to predict probabilities.
44
00:02:38,910 --> 00:02:40,820
And we'll see that throughout the course.
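As a minimal sketch of the formula above (the function name is just for illustration):

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), where x is the weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5 -- right on the boundary
```

Large negative weighted sums approach 0 and large positive ones approach 1, which is why the output can be read as a probability.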
45
00:02:41,190 --> 00:02:47,370
And then we've got the rectifier function. The rectifier function, even though it has a kink, is one of the
46
00:02:47,370 --> 00:02:55,090
most popular functions for artificial neural networks. So all the way up to zero it is zero,
47
00:02:55,110 --> 00:03:02,460
and then from there it gradually progresses as the input value increases. And we'll see that
48
00:03:02,460 --> 00:03:07,140
throughout the course. We'll see that in other intuition tutorials, and we'll also see how we use this
49
00:03:07,140 --> 00:03:13,020
function in the practical side of the course. And I will comment on this a bit more in a few slides from
50
00:03:13,020 --> 00:03:13,590
now.
51
00:03:13,590 --> 00:03:18,970
So just remember: the rectifier function is one of the most used functions in artificial neural networks.
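The rectifier described here can be sketched as follows (the function name is just for illustration):

```python
import numpy as np

def rectifier(x):
    # max(x, 0): zero for negative weighted sums, the identity above zero
    return np.maximum(x, 0)

print(rectifier(np.array([-3.0, 0.0, 2.0])))  # [0. 0. 2.]
```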
52
00:03:19,020 --> 00:03:22,770
And finally we've got one more function that you will probably hear about.
53
00:03:22,830 --> 00:03:25,220
It's the hyperbolic tangent function.
54
00:03:25,260 --> 00:03:32,760
It's very similar to the sigmoid function, but here the hyperbolic tangent function goes below zero, so
55
00:03:32,760 --> 00:03:39,510
the values go from 0 to 1 (or approximately to 1) and from 0 to minus 1 on the other side.
56
00:03:39,750 --> 00:03:42,360
And that can be useful in some applications.
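A quick side-by-side sketch of the two curves, using NumPy's built-in `tanh` (the sample inputs here are just for illustration):

```python
import numpy as np

x = np.array([-10.0, 0.0, 10.0])
t = np.tanh(x)                 # hyperbolic tangent: squashes into (-1, 1)
s = 1.0 / (1.0 + np.exp(-x))   # sigmoid, for comparison: squashes into (0, 1)
```

For large negative inputs, tanh approaches minus 1 while the sigmoid approaches 0, which is the key difference mentioned above.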
57
00:03:42,390 --> 00:03:48,060
So we're not going to go into too much depth on each one of these functions I just wanted to acquaint
58
00:03:48,060 --> 00:03:51,680
you with them so that you know what they look like and what they're called.
59
00:03:51,780 --> 00:03:59,690
If you'd like to get some additional reading, then check out this paper by Xavier Glorot,
60
00:03:59,820 --> 00:04:05,630
Xavier Glorot, called Deep Sparse Rectifier Neural Networks, a 2011 paper.
61
00:04:05,790 --> 00:04:14,700
And there you will find out exactly why the rectifier function is such a valuable function and why it's
62
00:04:14,970 --> 00:04:16,300
so popularly used.
63
00:04:16,350 --> 00:04:20,640
But nevertheless for now we don't really need to know all of those things.
64
00:04:20,650 --> 00:04:24,240
For now we're just going to start applying them; you'll start using them more and more and more.
65
00:04:24,270 --> 00:04:31,290
And so when you feel comfortable with the practical side of things then you can go and refer to this
66
00:04:31,290 --> 00:04:37,140
paper and then you will be able to soak in that knowledge much quicker and it will make much more sense.
67
00:04:37,370 --> 00:04:42,000
But just keep this in mind that when you're ready when you feel that you're ready then you can go and
68
00:04:42,120 --> 00:04:45,060
read that research paper and get some valuable knowledge from it.
69
00:04:45,540 --> 00:04:53,070
So just to quickly recap: we have the threshold activation function, which goes like this; the sigmoid
70
00:04:53,100 --> 00:04:55,360
activation function which looks like this.
71
00:04:55,680 --> 00:05:01,770
We have the rectifier function and we have the hyperbolic tangent function. And now, to finish off this
72
00:05:01,770 --> 00:05:09,000
tutorial, let's quickly do a few exercises, just two quick exercises to help that knowledge sink
73
00:05:09,000 --> 00:05:09,150
in.
74
00:05:09,150 --> 00:05:15,140
So the first one is: we've got an example here of a neural network of just one neuron, and that's right away
75
00:05:15,160 --> 00:05:16,030
the output layer.
76
00:05:16,140 --> 00:05:22,620
And the question is: assuming that your dependent variable is binary, so it's either 0 or 1, which activation
77
00:05:22,620 --> 00:05:23,780
function would you use?
78
00:05:23,790 --> 00:05:31,980
So out of the ones that we've discussed, we have the threshold function, the sigmoid function, the rectifier
79
00:05:31,980 --> 00:05:39,480
function, and we've got the hyperbolic tangent function, in their raw forms. Which ones would
80
00:05:39,480 --> 00:05:43,450
you be able to use for a binary variable.
81
00:05:43,950 --> 00:05:44,410
OK.
82
00:05:44,490 --> 00:05:49,360
So the answer here is: there are two options that we can approach this with.
83
00:05:49,380 --> 00:05:55,790
So one is the threshold activation function, because we know that it's between 0 and 1: it gives us
84
00:05:55,800 --> 00:06:00,090
0 when the weighted sum is below zero and otherwise it gives you a 1; it can only give you two values.
85
00:06:00,090 --> 00:06:10,020
It fits this requirement perfectly, and therefore you could say y equals the threshold
86
00:06:10,020 --> 00:06:13,770
function of your weighted sum, and that's it.
87
00:06:14,010 --> 00:06:18,450
And the second one which you could use is the sigmoid activation function.
88
00:06:18,450 --> 00:06:21,710
It is actually also between 0 and 1, just what we need.
89
00:06:21,750 --> 00:06:29,940
But at the same time, it doesn't give you just a 0 or a 1, so it's not exactly what we need. But in this case,
90
00:06:29,940 --> 00:06:37,530
what you could use it as is the probability of y being yes or no.
91
00:06:37,530 --> 00:06:46,170
So we want y to be 0 or 1, but instead we'll say that the sigmoid activation function tells
92
00:06:46,170 --> 00:06:51,860
us the probability of y being equal to 1.
93
00:06:51,870 --> 00:06:59,130
So basically, the closer you get to the top, the more likely it is that this is indeed a one, or a yes
94
00:06:59,160 --> 00:07:00,300
rather than a no.
95
00:07:00,750 --> 00:07:04,700
And yeah so that's very similar to the logistic regression approach.
96
00:07:04,920 --> 00:07:07,570
And those are just two examples.
97
00:07:07,650 --> 00:07:09,610
If you have a binary variable.
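The two options just discussed can be sketched side by side; the sample weighted sums here are just for illustration.

```python
import numpy as np

z = np.array([-1.2, 0.3, 2.5])        # weighted sums for three observations

# Option 1: threshold -> a hard 0/1 answer
y_hard = np.where(z < 0, 0, 1)

# Option 2: sigmoid -> the probability that y equals 1
p = 1.0 / (1.0 + np.exp(-z))
y_from_prob = (p >= 0.5).astype(int)  # thresholding the probability at 0.5
```

Both routes produce the same labels here, but the sigmoid also tells you how confident each prediction is.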
98
00:07:10,120 --> 00:07:12,810
Now let's have a look at another practical application.
99
00:07:12,810 --> 00:07:17,190
Let's have a look at how all this would play out if we had a neural network like this.
100
00:07:17,430 --> 00:07:20,960
So in the first layer we have some inputs.
101
00:07:20,980 --> 00:07:26,060
They are sent off to our first hidden layer and then an activation function is applied.
102
00:07:26,070 --> 00:07:31,380
And usually what you would apply here, and what you will see throughout this course, is we'll apply a rectifier
103
00:07:31,410 --> 00:07:34,510
activation function so it would look something like that.
104
00:07:34,530 --> 00:07:40,980
We apply the rectifier activation function and then from there the signals would be passed on to the
105
00:07:40,980 --> 00:07:46,820
output layer where the sigmoid activation function would be applied and that would be our final output.
106
00:07:46,830 --> 00:07:51,270
And that could predict a probability for instance so this combination is going to be quite common where
107
00:07:51,600 --> 00:07:58,640
in the hidden layers we apply the rectifier function and then output there we apply the sigmoid function.
108
00:07:58,890 --> 00:07:59,850
So there we go.
109
00:07:59,850 --> 00:08:05,040
Hope you enjoyed this tutorial now you are quite well versed in four different types of activation functions
110
00:08:05,040 --> 00:08:11,130
and you will get some hands-on practical experience with them throughout this course. We'll be using them
111
00:08:11,220 --> 00:08:15,900
all over the place so you'll get to know them quite intimately and you should be quite comfortable with
112
00:08:15,900 --> 00:08:16,310
them.
113
00:08:16,530 --> 00:08:22,230
But for now, this is the knowledge that you need to progress and understand what is going to be happening
114
00:08:22,250 --> 00:08:23,600
further down in this course.
115
00:08:23,940 --> 00:08:26,940
And on that note I look forward to seeing you next time.
116
00:08:26,940 --> 00:08:28,560
Until then enjoy learning.