1
00:00:05,470 --> 00:00:09,230
Welcome back everyone to this lecture on neural networks.
2
00:00:09,490 --> 00:00:14,770
So we just learned about a single perceptron; however, a single perceptron model won't be enough to
3
00:00:14,770 --> 00:00:16,810
learn complicated systems.
4
00:00:16,810 --> 00:00:22,660
Fortunately, we can expand on the idea of a single perceptron to create a multilayer perceptron model,
5
00:00:23,020 --> 00:00:27,230
commonly known as a basic artificial neural network.
6
00:00:27,230 --> 00:00:33,130
We'll also introduce the idea of activation functions, which we'll cover in a future lecture.
7
00:00:33,140 --> 00:00:37,820
So how do we actually build out a neural network from the idea of a single perceptron?
8
00:00:37,820 --> 00:00:39,800
Well it's actually quite simple.
9
00:00:39,830 --> 00:00:45,140
What we do is simply build a network of perceptrons. We connect layers of perceptrons using what's
10
00:00:45,140 --> 00:00:47,420
known as the multilayer perceptron model.
11
00:00:47,480 --> 00:00:51,220
Essentially, we have a vertical layer of these neurons,
12
00:00:51,230 --> 00:00:57,640
the single perceptrons, and we take their outputs and then feed them to the next layer of perceptrons,
13
00:00:57,680 --> 00:01:01,950
so that the output of the previous layer becomes the input of the next layer.
14
00:01:02,000 --> 00:01:07,160
And you'll notice here that every neuron, at least in this illustration, is connected to every neuron
15
00:01:07,190 --> 00:01:08,000
in the next layer.
16
00:01:08,030 --> 00:01:10,480
And this is known as a fully connected layer.
17
00:01:10,550 --> 00:01:15,200
We'll learn about different types of layers and different network configurations like recurrent neural
18
00:01:15,200 --> 00:01:19,000
networks and convolutional neural networks in a future section of the course.
19
00:01:19,070 --> 00:01:25,160
But right now we're just focused on a feedforward network, which is to say all the information is going
20
00:01:25,160 --> 00:01:30,080
from the input layer all the way to the end, the output layer, and it's fully connected, which means every
21
00:01:30,080 --> 00:01:36,800
neuron in one layer is connected to every neuron in the next layer, so the outputs of one perceptron
22
00:01:36,830 --> 00:01:39,610
are then directly fed as inputs to another perceptron.
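[This isn't from the lecture itself, but here is a minimal NumPy sketch of that idea, with made-up layer sizes and numbers: each fully connected layer is just a matrix of weights plus biases, and the output of one layer is fed straight in as the input of the next.]

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 input features, a hidden layer of 4 neurons, 1 output neuron.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer weights and biases

x = np.array([0.5, -1.2, 3.0])                  # one sample of raw input features

hidden_out = W1 @ x + b1        # every input feeds every hidden neuron (fully connected)
output = W2 @ hidden_out + b2   # hidden outputs become the inputs of the next layer
print(output)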
23
00:01:39,680 --> 00:01:44,720
And you'll notice I've begun to kind of switch from the word perceptron to neuron, and it's up to you which
24
00:01:44,720 --> 00:01:45,680
one you prefer.
25
00:01:45,680 --> 00:01:51,410
But in general, when discussing neural networks, we'll start to lay off the single perceptron terminology
26
00:01:51,740 --> 00:01:54,200
and then use the more flexible neuron terminology.
27
00:01:54,200 --> 00:02:00,240
So from now on you'll probably hear me say neuron a lot more than single perceptron.
28
00:02:00,300 --> 00:02:05,070
So basically this allows the network as a whole to learn about interactions and relationships between
29
00:02:05,070 --> 00:02:12,870
features. So the first layer coming in is the input layer, the layer that directly receives the data.
30
00:02:12,900 --> 00:02:17,880
And this could be things like tabular data that has features and information that you're trying to predict
31
00:02:17,910 --> 00:02:22,780
a label off of. The last layer is known as the output layer.
32
00:02:22,810 --> 00:02:28,120
And keep in mind, even though this illustration just shows one neuron in the output layer, this last layer
33
00:02:28,120 --> 00:02:32,590
can be more than one neuron, especially when we're dealing with things like multi-class classification.
34
00:02:34,000 --> 00:02:40,720
Then any layers in between the input layer and the output layer are known as hidden layers. Hidden layers
35
00:02:40,810 --> 00:02:46,270
are difficult to interpret due to their high interconnectivity and distance away from known input and
36
00:02:46,300 --> 00:02:51,310
output values. So the input layer, that's really easy to interpret, because it's directly accepting the
37
00:02:51,310 --> 00:02:57,310
inputs of the raw data that you already know and understand. The output layer is also a little easier
38
00:02:57,310 --> 00:03:02,490
to interpret since it's closely associated with the label that you're actually trying to predict.
39
00:03:02,650 --> 00:03:07,190
The hidden layers, especially as you have a deeper and deeper network with more and more hidden layers,
40
00:03:07,330 --> 00:03:11,920
those become more and more like a black box, since for really large networks
41
00:03:11,920 --> 00:03:17,110
it's difficult to understand what a single neuron is picking up, as far as the interconnectivity of the
42
00:03:17,110 --> 00:03:24,910
last layer and the next layer. So a question students often ask is: when does a neural network become
43
00:03:24,970 --> 00:03:29,130
a deep neural network? And really, the terminology is quite simple.
44
00:03:29,170 --> 00:03:36,710
You have a deep neural network when it contains two or more hidden layers. So here we can see, on the
45
00:03:36,710 --> 00:03:37,680
left-hand side,
46
00:03:37,760 --> 00:03:40,400
this network only has a single hidden layer.
47
00:03:40,400 --> 00:03:46,250
You'll notice that the hidden layer is quite wide, meaning it has quite a few neurons. So the width of
48
00:03:46,250 --> 00:03:51,620
a network is how many neurons are in a layer, and the depth of a network is how many layers there are
49
00:03:51,620 --> 00:03:52,350
in total.
50
00:03:52,490 --> 00:03:58,060
So a deep neural network has two or more hidden layers.
51
00:03:58,120 --> 00:04:02,890
So again, to review the terminology: we have the input layer, the first layer that directly accepts real
52
00:04:02,890 --> 00:04:08,740
data values; a hidden layer, which is any layer in between the input and output layers, and if we have two or more we
53
00:04:08,740 --> 00:04:13,210
begin to have what's known as a deep neural network; and then the output layer, which is the final estimate
54
00:04:13,270 --> 00:04:16,230
of the value that the network is trying to predict.
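[To make that terminology concrete, here is a hedged sketch of my own, not from the course slides, of how such a network might be declared with tf.keras; the layer sizes are invented for illustration. The two Dense layers sandwiched between the input and the output are the hidden layers, so this would count as a deep neural network.]

from tensorflow import keras

# Hypothetical sizes: 10 input features, two hidden layers of width 8, one output neuron.
model = keras.Sequential([
    keras.Input(shape=(10,)),     # input layer: directly accepts the data values
    keras.layers.Dense(8),        # hidden layer 1 (width 8)
    keras.layers.Dense(8),        # hidden layer 2: two or more hidden layers = "deep"
    keras.layers.Dense(1),        # output layer: the network's final estimate
])
model.summary()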
55
00:04:16,510 --> 00:04:21,580
Now something that I find really incredible about the neural network framework is that it can actually
56
00:04:21,580 --> 00:04:27,970
be used to approximate any continuous function. Researchers Lu, and later on Boris Hanin,
57
00:04:28,240 --> 00:04:34,480
proved mathematically, in the same way you would have a proof in geometry, that neural networks can approximate
58
00:04:34,600 --> 00:04:36,790
any continuous function.
59
00:04:36,850 --> 00:04:42,490
So for any continuous function, essentially a function with no sudden breaks or jumps,
60
00:04:42,850 --> 00:04:48,960
there should be somewhere out there some network that can directly approximate that function.
61
00:04:49,210 --> 00:04:54,490
And while you may not be able to find that network directly, and it may take quite a bit of time before you
62
00:04:54,490 --> 00:04:59,440
train the network and find the right number of neurons and the right width and depth, theoretically, and it has
63
00:04:59,440 --> 00:05:00,810
been proven mathematically,
64
00:05:00,910 --> 00:05:07,320
there is some network out there that can actually approximate that function. So for more details on this,
65
00:05:07,330 --> 00:05:11,620
since that proof and the mathematics behind it are outside the scope of this course, and it's not really
66
00:05:11,620 --> 00:05:16,240
important that you understand it to actually build out these neural networks, you can check out the
67
00:05:16,240 --> 00:05:19,930
Wikipedia page for the universal approximation theorem.
68
00:05:19,930 --> 00:05:20,940
It's a really cool page.
69
00:05:20,950 --> 00:05:25,900
I definitely encourage you to check it out, but that page basically discusses in more detail how those
70
00:05:25,900 --> 00:05:31,210
mathematical proofs actually work. And what's really cool about a lot of those mathematical proofs is
71
00:05:31,540 --> 00:05:37,540
that there are even proofs placing constraints on the width in order to match it up to the inputs of the continuous
72
00:05:37,540 --> 00:05:42,520
function, and you can still prove mathematically that there is some network out there, even with those
73
00:05:42,520 --> 00:05:47,950
constraints on the number of neurons within the network, that will still approximate that continuous
74
00:05:47,950 --> 00:05:48,370
function.
75
00:05:48,400 --> 00:05:54,090
So it's definitely something I encourage you to check out if you're interested. Now previously, if
76
00:05:54,090 --> 00:05:59,610
you recall, in our simple model of the single perceptron, we saw that that single perceptron, that
77
00:05:59,610 --> 00:06:03,240
neuron if you will, contains a very simple summation function.
78
00:06:03,240 --> 00:06:09,000
We essentially took our inputs x, multiplied them by their own weights w, and then we added that neuron's
79
00:06:09,000 --> 00:06:11,080
bias and summed that all up.
80
00:06:11,460 --> 00:06:15,660
However, in most use cases we really don't want to just do a straight sum.
81
00:06:15,720 --> 00:06:21,690
Instead, we're going to want to be able to set constraints on our output values, especially in classification
82
00:06:21,690 --> 00:06:28,600
tasks. So you can imagine, in a classification task, it would be really useful to have all the outputs
83
00:06:28,600 --> 00:06:35,110
fall between 0 and 1, and then these values can represent the probability assignments for each class.
84
00:06:35,500 --> 00:06:40,630
If we just have a large summation, we have no upper limit on what the values can be.
85
00:06:40,780 --> 00:06:47,020
So you should start thinking: what kind of functions can we apply to those inputs times the
86
00:06:47,020 --> 00:06:52,810
weights plus the bias in order to put constraints, or upper and lower limits, on what the neuron value
87
00:06:52,810 --> 00:06:53,900
will actually spit out.
88
00:06:54,010 --> 00:06:56,880
And that's where activation functions come into play.
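[As a quick preview of the next lecture, and again as my own minimal sketch with made-up numbers rather than anything from the course slides, the sigmoid is one common activation function: it takes the unbounded weighted sum of inputs, weights, and bias, and squashes it into the range 0 to 1 so it can be read as a class probability.]

import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # hypothetical inputs
w = np.array([0.8, 0.1, 1.5])    # hypothetical weights
b = -0.5                         # hypothetical bias

z = np.dot(w, x) + b   # the plain summation from the single-perceptron model (unbounded)
a = sigmoid(z)         # the activation constrains the neuron's output to (0, 1)
print(z, a)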
89
00:06:56,950 --> 00:07:02,320
So in the next lecture we're going to explore how to use activation functions to set boundaries on the output
90
00:07:02,320 --> 00:07:07,450
values from the neuron, instead of just summing everything up like in that simple perceptron model.
91
00:07:07,450 --> 00:07:12,550
And there's a wide variety of activation functions, which is why we'll have an entire lecture on them.
92
00:07:12,550 --> 00:07:18,700
So up next we're going to talk about activation functions and a few of the options you'll have. We'll see you there.