1
00:00:00,000 --> 00:00:06,000
Kylie Ying has worked at many interesting places such as MIT, CERN, and Free Code Camp.
2
00:00:06,000 --> 00:00:10,879
She's a physicist, engineer, and basically a genius. And now she's going to teach you
3
00:00:10,880 --> 00:00:14,720
about machine learning in a way that is accessible to absolute beginners.
4
00:00:15,279 --> 00:00:21,600
What's up you guys? So welcome to Machine Learning for Everyone.
5
00:00:21,600 --> 00:00:27,520
is interested in machine learning and you think you are considered
6
00:00:27,519 --> 00:00:33,039
is for you. In this video, we'll talk about supervised and
7
00:00:33,039 --> 00:00:39,200
we'll go through maybe a little bit of the logic or math behind
8
00:00:39,200 --> 00:00:46,960
we can program it on Google Colab. If there are certain things
9
00:00:46,960 --> 00:00:50,960
you're somebody with more experience than me, please feel free to
10
00:00:50,960 --> 00:00:58,000
and we can all as a community learn from this together. So with
11
00:00:58,000 --> 00:01:02,159
Without wasting any time, let's just dive straight into the code
12
00:01:02,159 --> 00:01:11,039
concepts as we go. So this here is the UCI machine learning
13
00:01:11,040 --> 00:01:15,280
they just have a ton of data sets that we can access. And I found
14
00:01:15,280 --> 00:01:22,560
the MAGIC gamma telescope data set. So in this data set, if you
15
00:01:22,560 --> 00:01:28,320
to summarize what I think is going on, is there's this
16
00:01:28,319 --> 00:01:34,239
these high energy particles hitting the telescope. Now there's a
17
00:01:34,239 --> 00:01:40,399
actually records certain patterns of you know, how this light hits
18
00:01:40,400 --> 00:01:46,640
properties of those patterns in order to predict what type of
19
00:01:46,640 --> 00:01:54,879
whether it was a gamma particle, or something else, like a hadron.
20
00:01:54,879 --> 00:02:00,000
the attributes of those patterns that we collect in the camera. So
21
00:02:00,000 --> 00:02:06,480
know, some length, width, size, asymmetry, etc. Now we're going to
22
00:02:06,480 --> 00:02:12,400
help us discriminate the patterns and whether or not they came
23
00:02:13,199 --> 00:02:19,519
So in order to do this, we're going to come up here, go to the
24
00:02:19,520 --> 00:02:28,240
to click this magic zero for data, and we're going to download
25
00:02:28,240 --> 00:02:34,320
notebook open. So you go to colab dot research dot google.com, you
26
00:02:34,319 --> 00:02:43,120
I'm just going to call this the magic data set. So actually, I'm
27
00:02:43,120 --> 00:02:52,240
magic example. Okay. So with that, I'm going to first start with
28
00:02:52,240 --> 00:03:04,560
you know, I always import NumPy, I always import pandas. And I
29
00:03:06,080 --> 00:03:11,360
And then we'll import other things as we go. So yeah
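[For reference, a minimal sketch of the import cell being described, assuming the standard aliases; matplotlib is the plotting library used for the histograms later on:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
]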
30
00:03:14,080 --> 00:03:19,200
we run that in order to run the cell, you can either click this
31
00:03:19,199 --> 00:03:24,319
on my computer, it's just shift enter and that will run the
32
00:03:24,319 --> 00:03:29,120
to order I'm just going to, you know, let you guys know, okay,
33
00:03:30,000 --> 00:03:34,080
So I've copied and pasted this actually, but this is just where I
34
00:03:35,199 --> 00:03:40,639
And in order to import that downloaded file that we got from
35
00:03:40,639 --> 00:03:49,119
over here to this folder thing. And I am literally just going to
36
00:03:50,800 --> 00:03:55,840
Okay. So in order to take a look at, you know, what does this file
37
00:03:55,840 --> 00:03:59,840
do we have the labels? Do we not? I mean, we could open it on our
38
00:04:00,960 --> 00:04:06,640
pandas read CSV. And we can pass in the name of this file.
39
00:04:06,639 --> 00:04:14,559
And let's see what it returns. So it doesn't seem like we have the
40
00:04:16,160 --> 00:04:23,600
I'm just going to make the columns, the column labels, all of
41
00:04:23,600 --> 00:04:29,120
So I'm just going to take these values and make that the column
42
00:04:29,120 --> 00:04:36,079
All right, how do I do that? So basically, I will come back here,
43
00:04:36,079 --> 00:04:50,560
cols. And I will type in all of those things: fLength, fWidth, fSize, fConc, fConc1.
44
00:04:50,560 --> 00:05:06,079
We have fAsym, fM3Long, fM3Trans, fAlpha, fDist, and class.
45
00:05:09,839 --> 00:05:16,639
Okay, great. Now in order to label those as these columns down
46
00:05:16,639 --> 00:05:22,879
So basically, this command here just reads some CSV file that you
47
00:05:22,879 --> 00:05:31,519
separated values, and turns that into a pandas data frame object.
48
00:05:31,519 --> 00:05:38,799
then it basically assigns these labels to the columns of this data
49
00:05:38,800 --> 00:05:44,960
this data frame equal to DF. And then if we call the head is just
50
00:05:44,959 --> 00:05:50,799
give me the first five things. Now you'll see that we have labels
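[A sketch of the cells just described, assuming the downloaded UCI file is named magic04.data; the column names come from the data set's documentation:

cols = ["fLength", "fWidth", "fSize", "fConc", "fConc1",
        "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]

# the file has no header row, so pass the labels explicitly
df = pd.read_csv("magic04.data", names=cols)
df.head()  # first five rows, now with column labels
]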
51
00:05:52,000 --> 00:05:57,519
All right, great. So one thing that you might notice is that over
52
00:05:57,519 --> 00:06:05,279
we have G and H. So if I actually go down here, and I do data
53
00:06:07,199 --> 00:06:11,519
you'll see that I have either G's or H's, and these stand for
54
00:06:11,519 --> 00:06:17,439
And our computer is not so good at understanding letters, right?
55
00:06:17,439 --> 00:06:23,279
understanding numbers. So what we're going to do is we're going to
56
00:06:23,279 --> 00:06:35,679
one for H. So here, I'm going to set this equal to this, whether
57
00:06:35,680 --> 00:06:42,560
I'm just going to say as type int. So what this should do is
58
00:06:43,360 --> 00:06:48,720
if it equals G, then this is true. So I guess that would be one.
59
00:06:48,720 --> 00:06:52,800
be false. So that would be zero, but I'm just converting G and H
60
00:06:52,800 --> 00:07:02,240
really matter. Like, if G is one and H is zero or vice versa. Let
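[A sketch of that conversion, assuming the class letters appear lowercase ("g"/"h") as in the UCI file:

# compare to "g": True/False, then cast to int, so g -> 1 and h -> 0
df["class"] = (df["class"] == "g").astype(int)
]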
61
00:07:02,240 --> 00:07:09,439
now and talk about this data set. So here I have some data frame,
62
00:07:09,439 --> 00:07:18,240
values for each entry. Now this is a you know, each of these is
63
00:07:18,240 --> 00:07:23,199
it's one item in our data set, it's one data point, all of these
64
00:07:23,199 --> 00:07:29,120
thing when I mentioned, oh, this is one example, or this is one
65
00:07:29,120 --> 00:07:36,240
these samples, they have, you know, one quality for each or one
66
00:07:36,240 --> 00:07:41,600
up here, and then it has the class. Now what we're going to do in
67
00:07:41,600 --> 00:07:50,800
predict for future, you know, samples, whether the class is G for
68
00:07:50,800 --> 00:08:00,319
that is something known as classification. Now, all of these up
69
00:08:00,319 --> 00:08:05,759
and features are just things that we're going to pass into our
70
00:08:05,759 --> 00:08:12,879
the label, which in this case is the class column. So for you
71
00:08:14,240 --> 00:08:19,519
10 different features. So I have 10 different values that I can
72
00:08:19,519 --> 00:08:26,719
And I can spit out, you know, the class the label, and I know the
73
00:08:26,720 --> 00:08:35,440
this is actually supervised learning. All right. So before I move
74
00:08:35,440 --> 00:08:43,360
little crash course on what I just said. This is machine learning
75
00:08:43,360 --> 00:08:49,759
question is, what is machine learning? Well, machine learning is a
76
00:08:49,759 --> 00:08:56,000
that focuses on certain algorithms, which might help a computer
77
00:08:56,000 --> 00:09:01,360
programmer being there telling the computer exactly what to do.
78
00:09:01,360 --> 00:09:08,480
programming. So you might have heard of AI and ML and data
79
00:09:08,480 --> 00:09:14,720
all of these. So AI is artificial intelligence. And that's an area
80
00:09:14,720 --> 00:09:22,080
goal is to enable computers and machines to perform human like
81
00:09:23,600 --> 00:09:31,600
Now machine learning is a subset of AI that tries to solve one
82
00:09:31,600 --> 00:09:39,840
using certain data. And data science is a field that attempts to
83
00:09:39,840 --> 00:09:45,840
from data. And that might mean we're using machine learning. So
84
00:09:45,840 --> 00:09:52,560
and all of them might use machine learning. So there are a few
85
00:09:52,559 --> 00:09:58,399
The first one is supervised learning. And in supervised learning,
86
00:09:58,399 --> 00:10:05,360
So this means whatever input we get, we have a corresponding
87
00:10:05,360 --> 00:10:12,960
models and to learn outputs of different new inputs that we might
88
00:10:12,960 --> 00:10:19,040
I might have these pictures, okay, to a computer, all these
89
00:10:19,039 --> 00:10:27,439
with a certain color. Now in supervised learning, all of these
90
00:10:27,440 --> 00:10:32,880
them, this is the output that we might want the computer to be
91
00:10:32,879 --> 00:10:39,200
over here, this picture is a cat, this picture is a dog, and this
92
00:10:41,600 --> 00:10:47,840
Now there's also unsupervised learning. And in unsupervised
93
00:10:47,840 --> 00:10:57,920
to learn about patterns in the data. So here are my input
94
00:10:57,919 --> 00:11:04,959
images, they're just pixels. Well, okay, let's say I have a bunch
95
00:11:05,759 --> 00:11:09,919
And what I can do is I can feed all these to my computer. And I
96
00:11:09,919 --> 00:11:14,479
my computer is not going to be able to say, Oh, this is a cat, dog
97
00:11:14,480 --> 00:11:19,680
you know, the output. But it might be able to cluster all these
98
00:11:19,679 --> 00:11:26,079
Hey, all of these have something in common. All of these have
99
00:11:26,080 --> 00:11:31,680
down here have something in common, that's finding some sort of
100
00:11:33,679 --> 00:11:40,159
And finally, we have reinforcement learning. And reinforcement
101
00:11:40,159 --> 00:11:46,480
there's an agent that is learning in some sort of interactive
102
00:11:46,480 --> 00:11:54,720
penalties. So let's think of a dog, we can train our dog, but
103
00:11:54,720 --> 00:12:02,879
any wrong or right output at any given moment, right? Well, let's
104
00:12:03,600 --> 00:12:08,240
Essentially, what we're doing is we're giving rewards to our
105
00:12:08,240 --> 00:12:15,200
Hey, this is probably something good that you want to keep doing.
106
00:12:16,879 --> 00:12:21,759
But in this class today, we'll be focusing on supervised learning
107
00:12:21,759 --> 00:12:29,120
and learning different models for each of those. Alright, so let's
108
00:12:29,120 --> 00:12:35,120
first. So this is kind of what a machine learning model looks like
109
00:12:35,120 --> 00:12:40,960
that are going into some model. And then the model is spitting out
110
00:12:41,919 --> 00:12:48,399
So all these inputs, this is what we call the feature vector. Now
111
00:12:48,399 --> 00:12:53,919
of features that we can have, we might have qualitative features.
112
00:12:53,919 --> 00:13:01,360
categorical data, there's either a finite number of categories or
113
00:13:01,360 --> 00:13:07,440
qualitative feature might be gender. And in this case, there's
114
00:13:07,440 --> 00:13:13,200
the example, I know this might be a little bit outdated. Here we
115
00:13:13,200 --> 00:13:19,840
two genders, there are two different categories. That's a piece of
116
00:13:19,840 --> 00:13:25,600
example might be okay, we have, you know, a bunch of different
117
00:13:25,600 --> 00:13:33,279
a nation or a location, that might also be an example of
118
00:13:33,279 --> 00:13:43,199
these, there's no inherent order. It's not like, you know, we can
119
00:13:43,200 --> 00:13:51,840
three, etc. Right? There's not really any inherent order built
120
00:13:51,840 --> 00:14:00,240
data sets. That's why we call this nominal data. Now, for nominal
121
00:14:00,240 --> 00:14:06,639
to feed it into our computer is using something called one hot
122
00:14:06,639 --> 00:14:13,120
know, I have a data set, some of the items in our data, some of
123
00:14:13,120 --> 00:14:19,200
some might be from India, then Canada, then France. Now, how do we
124
00:14:19,200 --> 00:14:24,560
we have to do something called one hot encoding. And basically,
125
00:14:24,559 --> 00:14:30,239
well, if it matches some category, make that a one. And if it
126
00:14:31,120 --> 00:14:40,159
So for example, if your input were from the US, you might have 1000. If it's from India,
127
00:14:40,159 --> 00:14:46,879
0100. Canada, okay, well, the item representing Canada is one and
128
00:14:46,879 --> 00:14:52,240
France is one. And then you can see that the rest are zeros,
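[As a sketch of one-hot encoding in pandas (the country column and its values here are hypothetical, just for illustration):

import pandas as pd

countries = pd.DataFrame({"country": ["US", "India", "Canada", "France"]})
# each category becomes its own 0/1 column
print(pd.get_dummies(countries["country"]).astype(int))
#    Canada  France  India  US
# 0       0       0      0   1
# 1       0       0      1   0
# 2       1       0      0   0
# 3       0       1      0   0
]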
129
00:14:54,480 --> 00:15:00,480
Now, there is also a different type of qualitative feature. So
130
00:15:00,480 --> 00:15:07,440
there are different age groups, there's babies, toddlers,
131
00:15:08,639 --> 00:15:15,840
adults, and so on, right. And on the right hand side, we might
132
00:15:15,840 --> 00:15:26,160
bad, not so good, mediocre, good, and then like, great. Now, these
133
00:15:26,159 --> 00:15:33,600
data, because they have some sort of inherent order, right? Like,
134
00:15:33,600 --> 00:15:41,680
being a baby than being an elderly person, right? Or good is
135
00:15:41,679 --> 00:15:48,559
bad. So these have some sort of inherent ordering system. And so
136
00:15:48,559 --> 00:15:54,399
we can actually just mark them from, you know, one to five, or we
137
00:15:54,399 --> 00:16:02,959
let's give it a number. And this makes sense. Because, like, for
138
00:16:02,960 --> 00:16:09,759
just said, how good is closer to great, then good is close to not
139
00:16:09,759 --> 00:16:14,559
to five, then four is close to one. So this actually kind of makes
140
00:16:14,559 --> 00:16:22,399
computer as well. Alright, there are also quantitative pieces of
141
00:16:22,960 --> 00:16:29,040
pieces of data are numerical valued pieces of data. So this could
142
00:16:29,039 --> 00:16:34,159
you know, they might be integers, or it could be continuous, which
143
00:16:34,159 --> 00:16:40,799
So for example, the length of something is a quantitative piece of
144
00:16:40,799 --> 00:16:46,559
feature, the temperature of something is a quantitative feature.
145
00:16:46,559 --> 00:16:53,679
Easter eggs I collected in my basket, this Easter egg hunt, that
146
00:16:53,679 --> 00:17:02,079
feature. Okay, so these are continuous. And this over here is the
147
00:17:02,080 --> 00:17:08,400
that go into our feature vector, those are our features that we're
148
00:17:08,400 --> 00:17:14,800
our computers are really, really good at understanding math, right
149
00:17:14,799 --> 00:17:19,680
they're not so good at understanding things that humans might be
150
00:17:21,759 --> 00:17:29,680
Well, what are the types of predictions that our model can output?
151
00:17:29,680 --> 00:17:35,440
there are some different tasks, there's one classification, and
152
00:17:35,440 --> 00:17:42,000
just saying, okay, predict discrete classes. And that might mean,
153
00:17:42,799 --> 00:17:48,639
this is a pizza, and this is ice cream. Okay, so there are three
154
00:17:48,640 --> 00:17:56,480
pictures of hot dogs, pizza or ice cream, I can put under these
155
00:17:56,480 --> 00:18:03,440
Hot dog, pizza, ice cream. This is something known as multi class
156
00:18:03,440 --> 00:18:10,640
binary classification. And binary classification, you might have
157
00:18:10,640 --> 00:18:14,240
only two categories that you're working with something that is
158
00:18:14,240 --> 00:18:23,680
isn't binary classification. Okay, so yeah, other examples. So if
159
00:18:23,680 --> 00:18:28,960
sentiment, that's binary classification. Maybe you're predicting
160
00:18:28,960 --> 00:18:35,039
dogs. That's binary classification. Maybe, you know, you are
161
00:18:35,039 --> 00:18:40,559
trying to figure out if an email spam or not spam. So that's also
162
00:18:41,759 --> 00:18:46,240
Now for multi class classification, you might have, you know, cat,
163
00:18:46,960 --> 00:18:53,519
rabbit, etc. We might have different types of fruits like orange,
164
00:18:53,519 --> 00:18:59,440
maybe different plant species. But multi class classification just
165
00:18:59,440 --> 00:19:06,320
and binary means we're predicting between two things. There's also
166
00:19:06,319 --> 00:19:11,359
when we talk about supervised learning. And this just means we're
167
00:19:11,359 --> 00:19:15,759
values. So instead of just trying to predict different categories,
168
00:19:15,759 --> 00:19:24,400
with a number that you know, is on some sort of scale. So some
169
00:19:24,400 --> 00:19:31,040
be the price of Ethereum tomorrow, or it might be okay, what is
170
00:19:31,759 --> 00:19:37,440
Or it might be what is the price of this house? Right? So these
171
00:19:37,440 --> 00:19:43,920
discrete classes. We're trying to predict a number that's as close
172
00:19:43,920 --> 00:19:51,759
using different features of our data set. So that's exactly what
173
00:19:51,759 --> 00:19:59,279
supervised learning. Now let's talk about the model itself. How do
174
00:19:59,920 --> 00:20:05,120
Or how can we tell whether or not it's even learning? So before we
175
00:20:05,680 --> 00:20:10,320
let's talk about how can we actually like evaluate these models?
176
00:20:10,319 --> 00:20:19,039
whether something is a good model or bad model? So let's take a
177
00:20:19,039 --> 00:20:26,639
look at this data set. This is from the Pima Indians diabetes data set.
178
00:20:26,640 --> 00:20:32,640
number of pregnancies, different glucose levels, blood pressure,
179
00:20:32,640 --> 00:20:37,520
age, and then the outcome whether or not they have diabetes one
180
00:20:37,519 --> 00:20:46,639
So here, all of these are quantitative features, right, because
181
00:20:48,720 --> 00:20:56,160
So each row is a different sample in the data. So it's a different
182
00:20:56,160 --> 00:21:04,240
and each row represents one person in this data set. Now this
183
00:21:04,240 --> 00:21:11,599
different feature. So this one here is some measure of blood
184
00:21:11,599 --> 00:21:17,119
over here, as we mentioned is the output label. So this one is
185
00:21:19,039 --> 00:21:23,759
And as I mentioned, this is what we would call a feature vector,
186
00:21:23,759 --> 00:21:33,519
features in one sample. And this is what's known as the target, or
187
00:21:33,519 --> 00:21:41,279
vector. That's what we're trying to predict. And all of these
188
00:21:42,640 --> 00:21:51,920
And over here, this is our labels or targets vector y. So I've
189
00:21:51,920 --> 00:21:58,000
bar to kind of talk about some of the other concepts in machine
190
00:21:58,000 --> 00:22:08,160
we have our x, our features matrix, and over here, this is our
191
00:22:08,160 --> 00:22:15,200
will be fed into our model, right. And our model will make some
192
00:22:15,200 --> 00:22:21,920
is we compare that prediction to the actual value of y that we
193
00:22:21,920 --> 00:22:26,960
that's the whole point of supervised learning is we can compare
194
00:22:26,960 --> 00:22:31,920
oh, what is the truth, actually, and then we can go back and we
195
00:22:31,920 --> 00:22:41,039
iteration, we get closer to what the true value is. So that whole
196
00:22:41,039 --> 00:22:46,399
okay, what's the difference? Where did we go wrong? That's what's
197
00:22:47,680 --> 00:22:54,080
Alright, so take this whole, you know, chunk right here, do we
198
00:22:54,079 --> 00:23:02,319
chocolate bar into the model to train our model? Not really,
199
00:23:02,319 --> 00:23:10,240
how do we know that our model can do well on new data that we
200
00:23:10,240 --> 00:23:18,000
create a model to predict whether or not someone has diabetes,
201
00:23:18,000 --> 00:23:23,119
data, and I see that all my training data does well, I go to some
202
00:23:23,119 --> 00:23:28,559
model. I think you can use this to predict if somebody has
203
00:23:28,559 --> 00:23:41,039
be effective or not? Probably not, right? Because we haven't
204
00:23:41,039 --> 00:23:46,879
generalize. Okay, it might do well after you know, our model has
205
00:23:46,880 --> 00:23:54,960
over again. But what about new data? Can our model handle new
206
00:23:54,960 --> 00:24:02,319
model to assess that? So we actually break up our whole data set
207
00:24:02,319 --> 00:24:07,759
types of data sets, we call it the training data set, the
208
00:24:07,759 --> 00:24:15,759
set. And you know, you might have 60% here 20% and 20% or 80 10
209
00:24:15,759 --> 00:24:22,000
many statistics you have, I think either of those would be
210
00:24:22,000 --> 00:24:28,960
the training data set into our model, we come up with, you know,
211
00:24:28,960 --> 00:24:36,079
corresponding with each sample that we put into our model, we
212
00:24:36,079 --> 00:24:42,879
between our prediction and the true values, this is something
213
00:24:42,880 --> 00:24:50,080
what's the difference here, in some numerical quantity, of course.
214
00:24:50,079 --> 00:24:57,599
and that's what we call training. Okay. So then, once you know,
215
00:24:58,480 --> 00:25:06,000
we can put our validation set through this model. And the
216
00:25:06,000 --> 00:25:14,559
check during or after training to ensure that the model can handle
217
00:25:14,559 --> 00:25:19,599
single time after we train one iteration, we might stick the
218
00:25:19,599 --> 00:25:25,679
the loss there. And then after our training is over, we can assess
219
00:25:25,680 --> 00:25:32,400
hey, what's the loss there. But one key difference here is that we
220
00:25:32,400 --> 00:25:38,080
this loss never gets fed back into the model, right, that feedback
221
00:25:38,799 --> 00:25:45,919
Alright, so let's talk about loss really quickly. So here, I have
222
00:25:45,920 --> 00:25:52,960
I have some sort of data that's being fed into the model, and then
223
00:25:52,960 --> 00:26:02,720
here is pretty far from you know, this truth that we want. And so
224
00:26:02,720 --> 00:26:07,839
model B, again, this is pretty far from what we want. So this loss
225
00:26:07,839 --> 00:26:15,759
let's give it 1.5. Now this one here, it's pretty close, I mean,
226
00:26:15,759 --> 00:26:23,839
to this one. So that might have a loss of 0.5. And then this one
227
00:26:23,839 --> 00:26:30,319
but still better than these two. So that loss might be 0.9. Okay,
228
00:26:30,319 --> 00:26:40,079
performs the best? Well, model C has a smallest loss, so it's
229
00:26:40,079 --> 00:26:45,679
take model C. After you know, we've come up with these, all these
230
00:26:45,680 --> 00:26:52,880
C is probably the best model. We take model C, and we run our test
231
00:26:52,880 --> 00:27:00,720
test set is used as a final check to see how generalizable that
232
00:27:00,720 --> 00:27:05,680
you know, finish training my diabetes data set, then I could run
233
00:27:05,680 --> 00:27:11,519
data and I can say, oh, like, this is how we perform on data that
234
00:27:11,519 --> 00:27:19,599
any point during the training process. Okay. And that loss, that's
235
00:27:19,599 --> 00:27:27,199
of my test set, or this would be the final reported performance of
236
00:27:29,279 --> 00:27:34,879
So let's talk about this thing called loss, because I think I kind
237
00:27:34,880 --> 00:27:41,600
right? So loss is the difference between your prediction and the
238
00:27:43,200 --> 00:27:50,640
So this would give a slightly higher loss than this. And this
239
00:27:50,640 --> 00:27:56,960
because it's even more off. In computer science, we like formulas,
240
00:27:57,599 --> 00:28:03,279
of describing things. So here are some examples of loss functions
241
00:28:03,279 --> 00:28:10,160
up with numbers. This here is known as L one loss. And basically,
242
00:28:10,160 --> 00:28:18,080
absolute value of whatever your you know, real value is, whatever
243
00:28:18,640 --> 00:28:26,160
subtracts the predicted value, and takes the absolute value of
244
00:28:26,160 --> 00:28:34,000
value is a function that looks something like this. So the further
245
00:28:35,519 --> 00:28:42,480
right in either direction. So if your real value is off from your
246
00:28:42,480 --> 00:28:47,519
then your loss for that point would be 10. And then this sum here
247
00:28:47,519 --> 00:28:53,039
we're taking all the points in our data set. And we're trying to
248
00:28:53,039 --> 00:29:01,599
everything is. Now, we also have something called L two loss. So
249
00:29:01,599 --> 00:29:08,559
which means that if it's close, the penalty is very minimal. And
250
00:29:08,559 --> 00:29:15,839
then the penalty is much, much higher. Okay. And this instead of
251
00:29:15,839 --> 00:29:26,000
the difference between the two. Now, there's also something
252
00:29:26,960 --> 00:29:32,720
It looks something like this. And this is for binary
253
00:29:32,720 --> 00:29:38,960
loss that we use. So this loss, you know, I'm not going to really
254
00:29:38,960 --> 00:29:47,840
But you just need to know that loss decreases as the performance
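[A sketch of the loss functions just described, written with NumPy; y_true and y_pred are hypothetical arrays of true and predicted values:

import numpy as np

def l1_loss(y_true, y_pred):
    # sum of absolute differences: the penalty grows linearly with the error
    return np.sum(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # sum of squared differences: small errors cost little, big errors a lot
    return np.sum((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true holds 0/1 labels, p_pred holds predicted probabilities of class 1
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
]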
255
00:29:47,839 --> 00:29:53,679
other measures of accuracy or performance as well. So for example,
256
00:29:55,440 --> 00:30:02,559
So let's say that these are pictures that I'm feeding my model,
257
00:30:02,559 --> 00:30:11,359
might be apple, orange, orange, apple, okay, but the actual is
258
00:30:12,240 --> 00:30:17,680
three of them were correct. And one of them was incorrect. So the
259
00:30:17,680 --> 00:30:25,600
three quarters or 75%. Alright, coming back to our colab notebook,
260
00:30:25,599 --> 00:30:33,039
bit. Again, we've imported stuff up here. And we've already
261
00:30:33,039 --> 00:30:39,599
this is this is all of our data. This is what we're going to use
262
00:30:40,559 --> 00:30:49,039
again, if we now take a look at our data set, you'll see that our
263
00:30:49,039 --> 00:30:53,119
So now this is all numerical, which is good, because our computer
264
00:30:53,119 --> 00:31:00,719
Okay. And you know, it would probably be a good idea to maybe kind
265
00:31:00,720 --> 00:31:10,240
have anything to do with the class. So here, I'm going to go
266
00:31:10,240 --> 00:31:15,839
in the columns of this data frame. So this just gets me the list.
267
00:31:15,839 --> 00:31:20,879
right? It's called cols, so let's just use that, might be less confusing
268
00:31:20,880 --> 00:31:26,560
thing, which is the class. So I'm going to take all these 10
269
00:31:26,559 --> 00:31:37,039
to plot them as a histogram. So and now I'm going to plot them as
270
00:31:37,039 --> 00:31:45,599
take that data frame, and I say, okay, for everything where the
271
00:31:45,599 --> 00:31:55,279
of our gammas, remember, now, for that portion of the data frame,
272
00:31:55,279 --> 00:32:03,440
these, okay, what this part here is saying is, inside the data
273
00:32:03,440 --> 00:32:08,480
the class is equal to one. So that's all all of these would fit
274
00:32:09,119 --> 00:32:14,079
And now let's just look at the label column. So the first label
275
00:32:14,079 --> 00:32:20,480
be this column. So this command here is getting me all the
276
00:32:20,480 --> 00:32:27,200
for this specific label. And that's exactly what I'm going to put
277
00:32:27,200 --> 00:32:34,960
just going to tell you know, matplotlib make the color blue, make
278
00:32:37,039 --> 00:32:43,279
set alpha, why do I keep doing that, alpha equal to 0.7. So that's
279
00:32:43,279 --> 00:32:48,399
And then I'm going to set density equal to true, so that when we
280
00:32:50,000 --> 00:32:56,960
the hadrons here, we'll have a baseline for comparing them. Okay,
281
00:32:56,960 --> 00:33:05,360
just basically normalizes these distributions. So you know, if you
282
00:33:05,359 --> 00:33:12,079
and then 50 of another type, well, if you drew the histograms, it
283
00:33:12,079 --> 00:33:17,599
one of them would be a lot bigger than the other, right. But by
284
00:33:17,599 --> 00:33:24,240
distributing them over how many samples there are. Alright, and
285
00:33:24,240 --> 00:33:31,680
on here and make that the label, the y label. So because it's
286
00:33:32,799 --> 00:33:36,319
And the x label is just going to be the label.
287
00:33:36,319 --> 00:33:44,639
What is going on. And I'm going to include a legend and PLT dot
288
00:33:44,640 --> 00:33:54,800
the plot. So if I run that... oh, this should just be up to the last item. So we
289
00:33:54,799 --> 00:34:02,240
item. And now we can see that we're plotting all of these. So here
290
00:34:02,240 --> 00:34:11,199
made this gamma. So this should be hadron. Okay, so the gammas in
291
00:34:11,199 --> 00:34:16,559
here we can already see that, you know, maybe if the length is
292
00:34:16,559 --> 00:34:24,320
to be gamma, right. And we can kind of you know, these all look
293
00:34:24,320 --> 00:34:34,640
clearly, if there's more asymmetry, or if you know, this asymmetry
294
00:34:34,639 --> 00:34:44,480
probably hadron. Okay, oh, this one's a good one. So f alpha seems
295
00:34:44,480 --> 00:34:48,960
distributed. Whereas if this is smaller, it looks like there's
296
00:34:48,960 --> 00:34:54,480
Okay, so this is kind of what the data that we're working with, we
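[A sketch of the plotting loop just described, assuming the cols list and the 0/1 class column from earlier:

import matplotlib.pyplot as plt

for label in cols[:-1]:  # every feature except the class column
    plt.hist(df[df["class"] == 1][label], color="blue", label="gamma",
             alpha=0.7, density=True)
    plt.hist(df[df["class"] == 0][label], color="red", label="hadron",
             alpha=0.7, density=True)
    plt.title(label)
    plt.ylabel("Probability")
    plt.xlabel(label)
    plt.legend()
    plt.show()
]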
297
00:34:55,920 --> 00:35:02,079
Okay, so the next thing that we're going to do here is we are
298
00:35:03,119 --> 00:35:12,880
our validation, and our test data sets. I'm going to set train
299
00:35:12,880 --> 00:35:20,800
this. So NumPy dot split, I'm just splitting up the data frame.
300
00:35:20,800 --> 00:35:29,360
where I'm sampling everything, this will basically shuffle my
301
00:35:29,360 --> 00:35:38,320
exactly I'm splitting my data set, so the first split is going to
302
00:35:38,320 --> 00:35:44,720
to say 0.6 times the length of this data frame. So and then cast
303
00:35:44,719 --> 00:35:50,559
to be the first place where you know, I cut it off, and that'll be
304
00:35:50,559 --> 00:35:57,360
then go to 0.8, this basically means everything between 60% and
305
00:35:57,360 --> 00:36:03,760
set will go towards validation. And then, like everything from 80
306
00:36:03,760 --> 00:36:12,080
my test data. So I can run that. And now, if we go up here, and we
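[A sketch of the split just described: shuffle the rows with sample(frac=1), then cut at the 60% and 80% marks for a roughly 60/20/20 train/validation/test split:

import numpy as np

train, valid, test = np.split(df.sample(frac=1),
                              [int(0.6 * len(df)), int(0.8 * len(df))])
]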
307
00:36:12,079 --> 00:36:20,480
these columns seem to have values in like the 100s, whereas this
308
00:36:20,480 --> 00:36:28,240
all these numbers is way off. And sometimes that will affect our
309
00:36:28,239 --> 00:36:35,919
results. So one
310
00:36:35,920 --> 00:36:46,240
is scale these so that they are, you know, so that it's now
311
00:36:46,239 --> 00:36:54,399
standard deviation of that specific column. I'm going to create a
312
00:36:54,400 --> 00:37:04,880
And I'm going to pass in the data frame. And that's what I'll do
313
00:37:04,880 --> 00:37:14,320
going to be, you know, I take the data frame. And let's assume
314
00:37:14,320 --> 00:37:20,000
you know, that the label will always be the last thing in the data
315
00:37:20,000 --> 00:37:28,559
data frame, dot columns all the way up to the last item, and get
316
00:37:30,000 --> 00:37:34,239
well, it's the last column. So I can just do this, I can just
317
00:37:34,800 --> 00:37:46,640
and then get those values. Now, in, so I'm actually going to
318
00:37:46,639 --> 00:37:55,199
the StandardScaler from sklearn. So if I come up here, I can go
319
00:37:56,079 --> 00:38:04,880
And I'm going to import StandardScaler, I have to run that cell,
320
00:38:04,880 --> 00:38:10,880
And now I'm going to create a scaler and use that. So
321
00:38:10,880 --> 00:38:21,119
And with the scaler, what I can do is actually just fit and
322
00:38:21,119 --> 00:38:31,599
is equal to scaler dot fit_transform(x). So what that's doing
323
00:38:31,599 --> 00:38:36,799
fit the StandardScaler to x, and then transform all those values.
324
00:38:36,800 --> 00:38:45,039
going to be our new x. Alright. And then I'm also going to just
325
00:38:45,039 --> 00:38:53,920
one huge 2d NumPy array. And in order to do that, I'm going to
326
00:38:53,920 --> 00:38:58,400
okay, take an array, and another array and horizontally stack them
327
00:38:58,400 --> 00:39:03,440
the H stands for. So by horizontally stacking them together, just
328
00:39:03,440 --> 00:39:09,200
okay, not on top of each other. So what am I stacking? Well, I
329
00:39:10,000 --> 00:39:20,400
so that it can stack x and y. And now, okay, so NumPy is very
330
00:39:20,400 --> 00:39:27,119
right? So in this specific case, our x is a two dimensional
331
00:39:27,119 --> 00:39:35,440
thing, it's only a vector of values. So in order to now reshape it
332
00:39:35,440 --> 00:39:45,200
NumPy dot reshape. And we can pass in the dimensions of its
333
00:39:45,199 --> 00:39:51,039
one comma one, that just means okay, make this a 2d array, where
334
00:39:51,039 --> 00:39:56,719
what this dimension value would be, which ends up being the
335
00:39:56,719 --> 00:40:01,439
same as literally doing this. But the negative one is easier
336
00:40:01,440 --> 00:40:13,119
do the hard work. So if I stack that, I'm going to then return the
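[Putting the pieces just described together, a sketch of the scaling helper; the oversample flag gets added a little further down:

import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_dataset(dataframe):
    X = dataframe[dataframe.columns[:-1]].values  # every column but the last
    y = dataframe[dataframe.columns[-1]].values   # the last column is the label

    scaler = StandardScaler()
    X = scaler.fit_transform(X)  # fit the scaler to X, then transform X

    # y is a 1-D vector, so reshape it into a column before stacking
    data = np.hstack((X, np.reshape(y, (-1, 1))))
    return data, X, y
]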
337
00:40:13,119 --> 00:40:18,480
thing is that if we go into our training data set, okay, again,
338
00:40:18,480 --> 00:40:28,240
And we get the length of the training data set. But where the
339
00:40:28,239 --> 00:40:39,439
so remember that this is the gammas. And then if we print that,
340
00:40:39,440 --> 00:40:49,039
we'll see that, you know, there's around 7000 of the gammas, but
341
00:40:49,039 --> 00:40:57,360
So that might actually become an issue. And instead, what we want
342
00:40:57,360 --> 00:41:06,200
our training data set. So that means that we want to increase
343
00:41:06,199 --> 00:41:13,960
so that these kind of match better. And surprise, surprise, there
344
00:41:13,960 --> 00:41:23,159
that will help us do that. So I'm going to go from imblearn dot over sampling, and I'm
345
00:41:23,159 --> 00:41:31,759
going to import this RandomOverSampler, run that cell, and come
346
00:41:31,760 --> 00:41:43,640
add in this parameter called oversample, and set that to false for
347
00:41:43,639 --> 00:41:51,239
oversample, then what I'm going to do, and by oversample, so if I
348
00:41:51,239 --> 00:41:59,559
then I'm going to create this ROS and set it equal to this random
349
00:41:59,559 --> 00:42:06,960
I'm just going to say, okay, just fit and resample x and y. And
350
00:42:06,960 --> 00:42:15,000
take more of the smaller class. So take the smaller class and keep
351
00:42:15,000 --> 00:42:24,039
the size of our data set of that smaller class so that they now
352
00:42:24,039 --> 00:42:33,279
data set, and I pass in the training data set where oversample is
353
00:42:33,280 --> 00:42:48,400
is train and then x train, y train. Oops, what's going on? These
354
00:42:48,400 --> 00:42:55,039
what I'm doing now is I'm just saying, okay, what is the length of
355
00:42:55,039 --> 00:43:05,440
14,800, whatever. And now let's take a look at how many of these
356
00:43:05,440 --> 00:43:12,720
we can just sum that up. And then we'll also see that if we
357
00:43:12,719 --> 00:43:19,799
many of them are the other type, it's the same value. So now these
358
00:43:19,800 --> 00:43:31,320
rebalanced. Okay, well, okay. So here, I'm just going to make this
359
00:43:31,320 --> 00:43:39,880
then the next one, I'm going to make this the test data set.
360
00:43:39,880 --> 00:43:46,280
switch oversample here to false. Now, the reason why I'm switching
361
00:43:46,280 --> 00:43:51,840
validation and my test sets are for the purpose of you know, if I
362
00:43:51,840 --> 00:43:59,680
how does my sample perform on those? And I don't want to
363
00:43:59,679 --> 00:44:06,559
I don't care about balancing those I'm, I want to know if I have a
364
00:44:06,559 --> 00:44:16,840
unlabeled, can I trust my model, right? So that's why I'm not
365
00:44:16,840 --> 00:44:23,120
what is going on? Oh, it's because we already have this train. So
366
00:44:23,119 --> 00:44:32,279
that data frame again. And now let's run these. Okay. So now we
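[A sketch of the final version with the oversample flag, plus the three calls just described (only the training split gets oversampled), assuming the imports from the earlier sketches:

from imblearn.over_sampling import RandomOverSampler

def scale_dataset(dataframe, oversample=False):
    X = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].values

    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    if oversample:
        # resample the smaller class until both classes match in size
        ros = RandomOverSampler()
        X, y = ros.fit_resample(X, y)

    data = np.hstack((X, np.reshape(y, (-1, 1))))
    return data, X, y

train, X_train, y_train = scale_dataset(train, oversample=True)
valid, X_valid, y_valid = scale_dataset(valid, oversample=False)
test, X_test, y_test = scale_dataset(test, oversample=False)
]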
367
00:44:32,280 --> 00:44:37,040
And we're going to move on to different models now. And I'm going
368
00:44:37,039 --> 00:44:43,000
about each of these models. And then I'm going to show you how we
369
00:44:43,000 --> 00:44:49,880
first model that we're going to learn about is KNN or K nearest
370
00:44:49,880 --> 00:44:57,720
already drawn a plot on the y axis, I have the number of kids that
371
00:44:57,719 --> 00:45:07,399
on the x axis, I have their income in terms of 1000s per year. So,
372
00:45:07,400 --> 00:45:12,360
making 40,000 a year, that's where this would be. And if somebody
373
00:45:12,360 --> 00:45:18,000
would be somebody has zero kids, it'd be somewhere along this
374
00:45:18,000 --> 00:45:28,400
somewhere over here. Okay. And now I have these plus signs and
375
00:45:28,400 --> 00:45:42,480
I'm going to represent here is the plus sign means that they own a
376
00:45:42,480 --> 00:45:49,800
to represent no car. Okay. So your initial thought should be okay,
377
00:45:49,800 --> 00:46:00,240
classification because all of our points all of our samples have
378
00:46:00,239 --> 00:46:13,000
the plus label. And this here is another sample with the minus
379
00:46:13,000 --> 00:46:20,760
width that I'll use. Alright, so we have this entire data set. And
380
00:46:20,760 --> 00:46:29,200
own a car and maybe around half the people don't own a car. Okay,
381
00:46:29,199 --> 00:46:35,399
point, let me choose a different color, I'll use this nice
382
00:46:35,400 --> 00:46:42,720
point over here? So let's say that somebody makes 40,000 a year
383
00:46:42,719 --> 00:46:52,439
that would be? Well, just logically looking at this plot, you
384
00:46:52,440 --> 00:46:57,800
they wouldn't have a car, right? Because that kind of matches the
385
00:46:57,800 --> 00:47:06,240
them. So that's a whole concept of this nearest neighbors is you
386
00:47:06,239 --> 00:47:11,319
And then you're basically like, okay, I'm going to take the label
387
00:47:11,320 --> 00:47:17,640
So the first thing that we have to do is we have to define a
388
00:47:17,639 --> 00:47:25,279
in, you know, 2d plots like this, our distance function is
389
00:47:25,280 --> 00:47:45,480
And Euclidean distance is basically just this straight line
390
00:47:45,480 --> 00:47:54,000
would be the Euclidean distance, it seems like there's this point,
391
00:47:54,000 --> 00:48:00,679
that point, etc. So the length of this line, this green line that
392
00:48:00,679 --> 00:48:10,159
as Euclidean distance. If we want to get technical with that, this
393
00:48:10,159 --> 00:48:20,199
let me zoom in. The distance is equal to the square root of x one minus x two
394
00:48:20,199 --> 00:48:29,159
squared plus, extend that square root, the same thing for y. So y one minus the
395
00:48:29,159 --> 00:48:36,159
other, squared. Okay, so we're basically trying to find the length,
396
00:48:36,159 --> 00:48:43,719
between x and y, and then square each of those, sum it up, and take
397
00:48:43,719 --> 00:48:53,239
going to erase this so it doesn't clutter my drawing. But anyways,
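[In code, the distance just written out might look like this small NumPy sketch; the two points are hypothetical:

import numpy as np

def euclidean_distance(p, q):
    # straight-line distance: square root of the summed squared differences
    return np.sqrt(np.sum((p - q) ** 2))

euclidean_distance(np.array([40, 2]), np.array([60, 3]))
]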
398
00:48:53,239 --> 00:49:03,519
so here in the nearest neighbor algorithm, we see that there is a
399
00:49:03,519 --> 00:49:09,719
telling us, okay, how many neighbors do we use in order to judge
400
00:49:09,719 --> 00:49:16,519
we use a K of maybe, you know, three or five, depends on how big
401
00:49:16,519 --> 00:49:25,360
I would say, maybe a logical number would be three or five. So
402
00:49:25,360 --> 00:49:34,640
to three. Okay, well, of this data point that I drew over here,
403
00:49:34,639 --> 00:49:40,199
Okay, so of this data point that I drew over here, it looks like
404
00:49:40,199 --> 00:49:50,359
this one, this one. And then this one has a length of four. And
405
00:49:50,360 --> 00:49:57,559
bit further than four. So actually, this would be these would be
406
00:49:57,559 --> 00:50:05,920
points are blue. So chances are, my prediction for this point is
407
00:50:05,920 --> 00:50:14,840
probably don't have a car. All right, now what if my point is
408
00:50:14,840 --> 00:50:26,120
somewhere over here, let's say that a couple has four kids, and
409
00:50:26,119 --> 00:50:34,159
well, now my closest points are this one, probably a little bit
410
00:50:34,159 --> 00:50:45,639
right? Okay, still all pluses. Well, this one is more than likely
411
00:50:45,639 --> 00:50:55,279
let me get rid of some of these just so that it looks a little bit
412
00:50:55,280 --> 00:51:06,960
let's go through one more. What about a point that might be right
413
00:51:06,960 --> 00:51:16,000
definitely this is the closest, right? This one's also closest.
414
00:51:16,000 --> 00:51:22,719
the two of these. But if we actually do the mathematics, it seems
415
00:51:22,719 --> 00:51:30,839
this one is right here. And this one is in between these two. So
416
00:51:30,840 --> 00:51:37,920
than this one. And that means that that top one is the one that
417
00:51:37,920 --> 00:51:45,079
what is the majority of the points that are close by? Well, we
418
00:51:45,079 --> 00:51:52,159
here, and we have one minus here, which means that the pluses are
419
00:51:52,159 --> 00:52:04,559
that this label is probably somebody with a car. Okay. So this is
420
00:52:04,559 --> 00:52:13,599
work. It's that simple. And this can be extrapolated to further
421
00:52:13,599 --> 00:52:19,400
know, if you have here, we have two different features, we have
422
00:52:19,400 --> 00:52:25,920
the number of kids. But let's say we have 10 different features,
423
00:52:25,920 --> 00:52:31,519
function so that it includes all 10 of those dimensions, we take
424
00:52:31,519 --> 00:52:39,480
and then we figure out which one is the closest to the point that
425
00:52:39,480 --> 00:52:45,240
that's K nearest neighbors. So now we've learned about K nearest
426
00:52:45,239 --> 00:52:51,079
be able to do that within our code. So here, I'm going to label
427
00:52:51,079 --> 00:52:59,559
And we're actually going to use a package from SK learn. So the
428
00:52:59,559 --> 00:53:04,639
packages and so that we don't have to manually code all these
429
00:53:04,639 --> 00:53:08,199
be really difficult. And chances are the way that we would code
430
00:53:08,199 --> 00:53:13,079
or it'd be really slow, or I don't know a whole bunch of issues.
431
00:53:13,079 --> 00:53:20,319
hand it off to the pros. From here, I can say, okay, from sklearn dot
432
00:53:20,320 --> 00:53:27,880
neighbors, I'm going to import KNeighborsClassifier, because
433
00:53:27,880 --> 00:53:38,160
so I run that. And our KNN model is going to be this KNeighborsClassifier, which takes
434
00:53:38,159 --> 00:53:43,920
a parameter of how many neighbors, you know, we want to use. So
435
00:53:43,920 --> 00:53:52,800
we just use one. So now if I do knn model dot fit, I can
436
00:53:52,800 --> 00:54:03,560
pass in the x train and y train data. Okay. So that effectively fits this model.
437
00:54:03,559 --> 00:54:11,880
Now let's do y predictions. And my y predictions will be knn model
438
00:54:11,880 --> 00:54:24,960
dot predict. So let's use the test set x test. Okay. Alright, so
439
00:54:24,960 --> 00:54:29,720
that we have those. But if I get my truth values for that test
440
00:54:29,719 --> 00:54:33,879
we actually do. So just looking at this, we got five out of six of
441
00:54:33,880 --> 00:54:39,480
actually take a look at something called the classification report
442
00:54:39,480 --> 00:54:49,719
So if I go from sklearn dot metrics, import classification
443
00:54:49,719 --> 00:54:57,959
say, hey, print out this classification report for me. And let's
444
00:54:57,960 --> 00:55:04,119
y test and the y prediction. We run this and we see we get this
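[A sketch of the KNN cells just described, assuming the X_train/y_train and X_test/y_test arrays produced during the splitting step:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

knn_model = KNeighborsClassifier(n_neighbors=1)  # later re-run with 3 or 5
knn_model.fit(X_train, y_train)

y_pred = knn_model.predict(X_test)
print(classification_report(y_test, y_pred))
]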
445
00:55:04,119 --> 00:55:10,719
to tell you guys a few things on this chart. Alright, this
446
00:55:10,719 --> 00:55:15,679
pretty good. That's just saying, hey, if we just look at, you
447
00:55:15,679 --> 00:55:23,359
what it's closest to, then we actually get an 82% accuracy, which
448
00:55:23,360 --> 00:55:29,960
versus how many total are there. Now, precision is saying, okay,
449
00:55:29,960 --> 00:55:36,199
for class one, or class zero and class one. What precision is
450
00:55:36,199 --> 00:55:42,879
diagram over here, because I actually kind of like this diagram.
451
00:55:42,880 --> 00:55:48,160
And on the left over here, we have everything that we know is
452
00:55:48,159 --> 00:55:54,079
actually truly positive, that we've labeled positive in our
453
00:55:54,079 --> 00:56:01,079
this is everything that's truly negative. Now in the circle, we
454
00:56:01,079 --> 00:56:08,159
were labeled positive by our model. On the left here, we have
455
00:56:08,159 --> 00:56:13,119
because you know, this side is the positive side and the side is
456
00:56:13,119 --> 00:56:18,839
truly positive. Whereas all these ones out here, well, they should
457
00:56:18,840 --> 00:56:24,559
are labeled as negative. And in here, these are the ones that
458
00:56:24,559 --> 00:56:33,000
actually negative. And out here, these are truly negative. So
459
00:56:33,000 --> 00:56:40,400
the ones we've labeled as positive, how many of them are true
460
00:56:40,400 --> 00:56:47,160
okay, out of all the ones that we know are truly positive, how
461
00:56:47,159 --> 00:56:55,480
so going back to this over here, our precision score, so again,
462
00:56:55,480 --> 00:57:03,880
that we've labeled as the specific class, how many of them are
463
00:57:03,880 --> 00:57:09,400
recall is, out of all the ones that are actually this class, how
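[In symbols, the two metrics being described are:

precision = TP / (TP + FP)    (of everything labeled positive, how much truly is)
recall    = TP / (TP + FN)    (of everything truly positive, how much we caught)
]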
464
00:57:09,400 --> 00:57:18,200
is 68% and 89%. Alright, so not too shabby, we can clearly see
465
00:57:18,199 --> 00:57:24,079
like this, the class zero is worse than class one. Right? So that
466
00:57:24,079 --> 00:57:30,079
hadrons and for our gammas. This f1 score over here is kind of a
467
00:57:30,079 --> 00:57:35,519
recall score. So we're actually going to mostly look at this one
468
00:57:35,519 --> 00:57:43,000
test data set. So here we have a measure of 72 and 87 or point
469
00:57:43,000 --> 00:57:55,639
which is not too shabby. All right. Well, what if we, you know,
470
00:57:55,639 --> 00:58:04,599
that, okay, so what was it originally with one? We see that our f1
471
00:58:04,599 --> 00:58:10,360
point seven two and then point eight seven. And then our accuracy
472
00:58:10,360 --> 00:58:20,440
three. Alright, so we've kind of increased zero at the cost of one
473
00:58:20,440 --> 00:58:28,159
is 81. So let's actually just make this five. Alright, so you
474
00:58:28,159 --> 00:58:35,359
we have 82% accuracy, which is pretty decent for a model that's
475
00:58:35,360 --> 00:58:42,880
the next type of model that we're going to talk about is something
476
00:58:42,880 --> 00:58:48,400
in order to understand the concepts behind naive Bayes, we have to
477
00:58:48,400 --> 00:58:55,800
conditional probability and Bayes rule. So let's say I have some
478
00:58:55,800 --> 00:59:03,720
this table right here. People who have COVID are over here in this
479
00:59:03,719 --> 00:59:09,039
have COVID are down here in this green row. Now, what about the
480
00:59:09,039 --> 00:59:18,360
tested positive are over here in this column. And people who have
481
00:59:18,360 --> 00:59:25,840
this column. Okay. Yeah, so basically, our categories are people
482
00:59:25,840 --> 00:59:32,800
people who don't have COVID, but test positive, so a false false
483
00:59:32,800 --> 00:59:38,560
and test negative, which is a false negative, and people who don't
484
00:59:38,559 --> 00:59:48,159
which good means you don't have COVID. Okay, so let's make this
485
00:59:48,159 --> 00:59:55,359
in the margins, I've written down the sums of whatever it's
486
00:59:55,360 --> 01:00:05,559
sum of this entire row. And this here might be the sum of this
487
01:00:05,559 --> 01:00:11,559
question that I have is, what is the probability of having COVID
488
01:00:11,559 --> 01:00:21,920
test? And in probability, we write that out like this. So the
489
01:00:21,920 --> 01:00:29,360
line, that vertical line means given that, you know, some
490
01:00:29,360 --> 01:00:39,440
okay, so what is the probability of having COVID given a positive
491
01:00:39,440 --> 01:00:48,320
saying, okay, let's go into this condition. So the condition of
492
01:00:48,320 --> 01:00:53,360
slice of the data, right? That means if you're in this slice of
493
01:00:53,360 --> 01:00:59,000
given that we have a positive test, given in this condition, in
494
01:00:59,000 --> 01:01:05,679
test. So what's the probability that we have COVID? Well, if we're
495
01:01:05,679 --> 01:01:15,440
of people that have COVID is 531. So I'm gonna say that there's
496
01:01:15,440 --> 01:01:24,599
now we divide that by the total number of people that have a
497
01:01:24,599 --> 01:01:34,639
so that's the probability and doing a quick division, we get that
498
01:01:34,639 --> 01:01:43,239
96.4%. So according to this data set, which is data that I made up
499
01:01:43,239 --> 01:01:50,759
not actually real COVID data. But according to this data, the
500
01:01:50,760 --> 01:02:02,480
that you tested positive is 96.4%. Alright, now with that, let's
501
01:02:02,480 --> 01:02:10,440
this section here. Let's ignore this bottom part for now. So Bayes
502
01:02:10,440 --> 01:02:18,000
the probability of some event A happening, given that B happened.
503
01:02:18,000 --> 01:02:26,000
happened. This is our condition, right? Well, what if we don't
504
01:02:26,000 --> 01:02:31,440
if we don't know what the probability of A given B is? Well, Bayes
505
01:02:31,440 --> 01:02:36,920
can actually go and calculate it, as long as you have a
506
01:02:36,920 --> 01:02:43,920
of A and the probability of B. Okay. And this is just a
507
01:02:43,920 --> 01:02:51,320
so here we have Bayes rule. And let's actually see Bayes rule in
508
01:02:51,320 --> 01:02:58,920
So here, let's say that we have some disease statistics, okay. So
509
01:02:58,920 --> 01:03:05,960
And we know that the probability of obtaining a false positive is
510
01:03:05,960 --> 01:03:12,800
false negative is 0.01. And the probability of the disease is 0.1.
511
01:03:12,800 --> 01:03:20,640
the disease given that we got a positive test? Hmm, how do we even
512
01:03:20,639 --> 01:03:26,519
what what do I mean by false positive? What's a different way to
513
01:03:26,519 --> 01:03:32,960
is when you test positive, but you don't actually have the
514
01:03:32,960 --> 01:03:42,480
that you have a positive test given no disease, right? And
515
01:03:42,480 --> 01:03:47,599
it's a probability that you test negative given that you actually
516
01:03:47,599 --> 01:03:58,119
that into a chart, for example, and this might be my positive and
517
01:03:58,119 --> 01:04:07,239
be my diseases, disease and no disease. Well, the probability that
518
01:04:07,239 --> 01:04:14,039
have no disease, okay, that's 0.05 over here. And then the false
519
01:04:14,039 --> 01:04:20,880
testing negative, but I don't actually have the disease. So
520
01:04:20,880 --> 01:04:25,480
positive, and you don't have the disease, plus a probability that
521
01:04:25,480 --> 01:04:30,880
don't have the disease, that should sum up to one. Okay, because
522
01:04:30,880 --> 01:04:34,360
then you should have some probability that you're testing positive
523
01:04:34,360 --> 01:04:43,120
testing negative. But that probability, in total should be one. So
524
01:04:43,119 --> 01:04:47,039
negative and no disease, this should be the complement, this
525
01:04:47,039 --> 01:04:57,360
should be 0.95 because it's one minus whatever this probability
526
01:04:59,679 --> 01:05:06,319
up here, this should be 0.99 because the probability that we, you
527
01:05:06,320 --> 01:05:10,080
test negative and have the disease plus the probability that we
528
01:05:10,079 --> 01:05:16,799
disease should equal one. So this is our probability chart. And
529
01:05:16,800 --> 01:05:21,920
being point 0.1 just means I have 10% probability of actually of
530
01:05:23,199 --> 01:05:30,000
in the general population, the probability that I have the disease
531
01:05:30,000 --> 01:05:37,039
probability that I have the disease given that I got a positive
532
01:05:37,039 --> 01:05:43,119
can write this out in terms of Bayes rule, right? So if I use this
533
01:05:43,119 --> 01:05:51,199
probability of a positive test given that I have the disease times
534
01:05:52,880 --> 01:05:58,240
divided by the probability of the evidence, which is my positive
535
01:06:00,000 --> 01:06:05,679
Alright, now let's plug in some numbers for that. The probability
536
01:06:05,679 --> 01:06:13,839
that I have the disease is 0.99. And then the probability that I
537
01:06:13,840 --> 01:06:26,000
over here 0.1. Okay. And then the probability that I have a
538
01:06:26,000 --> 01:06:29,840
what is the probability that I have a positive test given that I
539
01:06:29,840 --> 01:06:37,360
and then having having the disease. And then the other case, where
540
01:06:37,360 --> 01:06:45,519
negative test given, or sorry, positive test given no disease
541
01:06:45,519 --> 01:06:52,000
having a disease. Okay, so I can expand that probability of having
542
01:06:52,000 --> 01:06:58,480
these two different cases, I have a disease, and then I don't. And
543
01:06:58,480 --> 01:07:08,240
having positive tests in either one of those cases. So that
544
01:07:09,519 --> 01:07:16,159
plus 0.05. So that's the probability that I'm testing positive,
545
01:07:16,960 --> 01:07:20,400
And the times the probability that I don't actually have the
546
01:07:20,400 --> 01:07:29,840
0.1 probability that the population doesn't have the disease is
547
01:07:29,840 --> 01:07:48,720
multiplication. And I get an answer of 0.6875 or 68.75%. Okay. All
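[The arithmetic just performed, written out:

P(disease | +) = P(+ | disease) * P(disease)
                 / [P(+ | disease) * P(disease) + P(+ | no disease) * P(no disease)]
               = (0.99 * 0.1) / (0.99 * 0.1 + 0.05 * 0.9)
               = 0.099 / 0.144
               = 0.6875, or 68.75%
]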
548
01:07:48,719 --> 01:07:56,480
that we can expand Bayes rule and apply it to classification. And
549
01:07:56,480 --> 01:08:04,639
Bayes. So first, a little terminology. So the posterior is this
550
01:08:04,639 --> 01:08:12,480
Hey, what is the probability of some class CK? So by CK, I just
551
01:08:12,480 --> 01:08:19,359
categories, so C for category or class or whatever. So category
552
01:08:19,359 --> 01:08:26,639
dogs, category three, lizards, all the way, we have k categories,
553
01:08:27,520 --> 01:08:36,160
So what is the probability of having of this specific sample x, so
554
01:08:36,159 --> 01:08:44,079
of this one sample. What is the probability of x fitting into
555
01:08:44,079 --> 01:08:49,119
so that that's what this is asking, what is the probability that,
556
01:08:49,119 --> 01:08:59,920
this class, given all this evidence that we see the x's. So the
557
01:08:59,920 --> 01:09:07,600
here, it's saying, Okay, well, given that, you know, assume,
558
01:09:07,600 --> 01:09:13,760
class is class CK, okay, assume that this is a category. Well,
559
01:09:13,760 --> 01:09:21,280
actually seeing x, all these different features from that
560
01:09:21,279 --> 01:09:26,880
prior. So like in the entire population of things, what are the
561
01:09:26,880 --> 01:09:32,640
probability of this class in general? Like if I have, you know, in
562
01:09:32,640 --> 01:09:40,160
percentage? What is the chance that this image is a cat? How many
563
01:09:40,159 --> 01:09:47,439
down here is called the evidence because what we're trying to do
564
01:09:47,439 --> 01:09:54,319
we're creating this new posterior probability built upon the prior
565
01:09:54,319 --> 01:10:02,239
right? And that evidence is a probability of x. So that's some
566
01:10:05,439 --> 01:10:15,599
is a rule for naive Bayes. Whoa, okay, let's digest that a little
567
01:10:15,600 --> 01:10:21,680
let me use a different color. What is this side of the equation
568
01:10:21,680 --> 01:10:28,320
what is the probability that we are in some class K, CK, given
569
01:10:28,319 --> 01:10:33,920
input, this is my second input, this is, you know, my third,
570
01:10:33,920 --> 01:10:41,600
say that our classification is, do we play soccer today or not?
571
01:10:41,600 --> 01:10:49,440
okay, is it how much wind is there? How much rain is there? And
572
01:10:49,439 --> 01:10:54,399
So let's say that it's raining, it's not windy, but it's
573
01:10:56,079 --> 01:10:59,680
So let's use Bayes rule on this. So this here
574
01:11:06,079 --> 01:11:13,840
is equal to the probability of x one, x two, all these joint
575
01:11:13,840 --> 01:11:20,800
times the probability of that class, all over the probability of
576
01:11:24,399 --> 01:11:31,839
Okay. So what is this fancy symbol over here, this means
577
01:11:33,600 --> 01:11:38,560
so how our equal sign means it's equal to, this like little
578
01:11:38,560 --> 01:11:48,800
proportional to okay, and this denominator over here, you might
579
01:11:48,800 --> 01:11:53,840
the class like this, that number doesn't depend on the class,
580
01:11:53,840 --> 01:11:59,199
for all of our different classes. So what I'm going to do is make
581
01:11:59,199 --> 01:12:07,920
going to say that this probability x one, x two, all the way to x
582
01:12:07,920 --> 01:12:10,800
to the numerator, I don't care about the denominator, because it's
583
01:12:10,800 --> 01:12:20,800
single class. So this is proportional to x one, x two, x n given
584
01:12:20,800 --> 01:12:31,920
that class. Okay. All right. So in naive Bayes, the point of it
585
01:12:32,960 --> 01:12:36,319
this joint probability, we're just assuming that all of these
586
01:12:36,319 --> 01:12:42,719
are all independent. So in my soccer example, you know, the
587
01:12:44,800 --> 01:12:50,720
or the probability that, you know, it's windy, and it's rainy,
588
01:12:50,720 --> 01:12:56,800
things are independent, we're assuming that they're independent.
589
01:12:56,800 --> 01:13:06,560
actually write this part of the equation here as this. So each
590
01:13:07,119 --> 01:13:13,840
all of them together. So the probability of the first feature,
591
01:13:14,800 --> 01:13:20,159
times the probability of the second feature and given this
592
01:13:20,159 --> 01:13:30,960
all the way up until, you know, the nth feature of given that it's
593
01:13:30,960 --> 01:13:39,199
all of this. All right, which means that this here is now
594
01:13:39,199 --> 01:13:47,599
expanded times this. So I'm going to write that out. So the
595
01:13:47,600 --> 01:13:54,560
And I'm actually going to use this symbol. So what this means is
596
01:13:54,560 --> 01:14:04,000
it means multiply everything to the right of this. So this
597
01:14:04,720 --> 01:14:11,360
but do it for all the i's. So I, what is I, okay, we're going to
598
01:14:11,359 --> 01:14:18,639
the first x i all the way to the nth. So that means for every
599
01:14:19,359 --> 01:14:27,439
these probabilities together. And that's where this up here comes
600
01:14:27,439 --> 01:14:31,599
oops, this should be a line to wrap this up in plain English.
601
01:14:31,600 --> 01:14:37,520
is a probability that you know, we're in some category, given that
602
01:14:37,520 --> 01:14:44,960
features is proportional to the probability of that class in
603
01:14:44,960 --> 01:14:51,119
each of those features, given that we're in this one class that
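Written out, the naive independence assumption she describes turns the joint likelihood into a product, so the posterior is proportional to:

\[ P(C_k \mid x_1, \ldots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k) \]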
604
01:14:51,680 --> 01:14:59,600
of it, you know, of us playing soccer today, given that it's
605
01:14:59,600 --> 01:15:04,880
Wednesday, is proportional to Okay, well, what is what is the
606
01:15:04,880 --> 01:15:10,400
anyways, and then times the probability that it's rainy, given
607
01:15:10,960 --> 01:15:15,439
times the probability that it's not windy, given that we're
608
01:15:15,439 --> 01:15:21,199
we playing soccer when it's windy, how you know, and then how many
609
01:15:21,199 --> 01:15:30,319
that's Wednesday, given that we're playing soccer. Okay. So how do
610
01:15:30,319 --> 01:15:39,039
classification. So that's where this comes in our y hat, our
611
01:15:39,039 --> 01:15:45,439
something called the arg max. And then this expression over here,
612
01:15:45,439 --> 01:15:55,199
the arg max. Well, we want. So okay, if I write out this, again,
613
01:15:55,199 --> 01:16:05,840
being in some class CK given all of our evidence. Well, we're
614
01:16:06,640 --> 01:16:13,920
this expression on the right. That's what arg max means. So if K
615
01:16:14,720 --> 01:16:21,199
one through K, so this is how many categories are, we're going to
616
01:16:21,199 --> 01:16:32,319
to solve this expression over here and find the K that makes that
617
01:16:32,319 --> 01:16:39,439
that instead of writing this, we have now a formula, thanks to
618
01:16:40,560 --> 01:16:47,440
approximate that, right, in something that maybe we can, maybe
619
01:16:47,439 --> 01:16:54,479
we have the answers for that based on our training set. So this
620
01:16:54,479 --> 01:17:00,559
these and finding whatever class whatever category maximizes this
621
01:17:00,560 --> 01:17:12,160
this is something known as MAP for short, or maximum a
622
01:17:12,159 --> 01:17:20,159
Pick the hypothesis. So pick the K that is the most probable so
623
01:17:20,159 --> 01:17:31,119
of misclassification. Right. So that is MAP. That is naive Bayes.
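In one line, the MAP decision rule she just described:

\[ \hat{y} = \operatorname*{arg\,max}_{k \in \{1, \ldots, K\}} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k) \]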
624
01:17:31,760 --> 01:17:38,800
just like how I imported k nearest neighbor, k neighbors
625
01:17:38,800 --> 01:17:45,680
I can go to SK learn naive Bayes. And I can import Gaussian naive
626
01:17:46,800 --> 01:17:52,720
Right. And here I'm going to say my naive Bayes model is equal.
627
01:17:52,720 --> 01:18:06,480
had above. And I'm just going to say with this model, we are going
628
01:18:06,479 --> 01:18:17,359
All right, just like above. So this, I might actually, so I'm
629
01:18:19,199 --> 01:18:26,159
exactly, just like above, I'm going to make my prediction. So
630
01:18:26,159 --> 01:18:35,279
naive Bayes model. And of course, I'm going to run the
631
01:18:35,279 --> 01:18:40,719
just going to put these in the same cell. But here we have the y
632
01:18:40,720 --> 01:18:49,520
is still our original test data set. So if I run this, you'll see
633
01:18:49,520 --> 01:18:58,640
we get worse scores, right? Our precision, for all of them, they
634
01:18:58,640 --> 01:19:04,160
you know, for our precision, our recall, our f1 score, they look
635
01:19:04,159 --> 01:19:11,439
categories. And our total accuracy, I mean, it's still 72%, which
636
01:19:11,439 --> 01:19:22,000
72%. Okay. Which, you know, is not that great. Okay, so let's
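Before moving on, a minimal sketch of the naive Bayes cell she just ran, assuming the X_train/y_train/X_test/y_test splits and the classification_report import from earlier in the video:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

nb_model = GaussianNB()
nb_model = nb_model.fit(X_train, y_train)      # fit on the training split

y_pred = nb_model.predict(X_test)              # predict on the held-out test set
print(classification_report(y_test, y_pred))   # total accuracy lands around 72% in her run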
637
01:19:22,000 --> 01:19:29,760
Here, I've drawn a plot, I have y. So this is my label on one
638
01:19:29,760 --> 01:19:36,720
my features. So let's just say I only have one feature in this
639
01:19:36,720 --> 01:19:44,079
we see that, you know, I have a few of one class type down here.
640
01:19:44,079 --> 01:19:51,279
because it's zero. And then we have our other class type one up
641
01:19:51,279 --> 01:19:58,960
y. Okay. So many of you guys are familiar with regression. So
642
01:19:58,960 --> 01:20:10,159
draw a regression line through this, it might look something like
643
01:20:10,159 --> 01:20:16,239
doesn't seem to be a very good model. Like, why would we use this
644
01:20:16,239 --> 01:20:27,840
Right? It's, it's iffy. Okay. For example, we might say, okay,
645
01:20:27,840 --> 01:20:33,520
everything from here downwards would be one class type in here,
646
01:20:34,640 --> 01:20:41,520
But when you look at this, you're just you, you visually can tell,
647
01:20:41,520 --> 01:20:46,240
make sense. Things are not those dots are not along that line. And
648
01:20:46,239 --> 01:20:55,279
are doing classification, not regression. Okay. Well, first of
649
01:20:55,279 --> 01:21:04,639
this model, if we just use this line, it equals m x. So whatever
650
01:21:04,640 --> 01:21:10,000
which is the y intercept, right? And m is the slope. But when we
651
01:21:10,000 --> 01:21:15,760
is it actually y hat? No, it's not right. So when we're working
652
01:21:15,760 --> 01:21:20,720
what we're actually estimating in our model is a probability,
653
01:21:20,720 --> 01:21:30,240
and one, that is class zero or class one. So here, let's rewrite
654
01:21:32,720 --> 01:21:39,440
Okay, well, m x plus b, that can range, you know, from negative
655
01:21:39,439 --> 01:21:43,279
right? For any for any value of x, it goes from negative infinity
656
01:21:44,159 --> 01:21:49,039
But probability, we know probably one of the rules of probability
657
01:21:49,039 --> 01:21:57,039
between zero and one. So how do we fix this? Well, maybe instead
658
01:21:57,039 --> 01:22:03,519
equal to that, we can set the odds equal to this. So by that, I
659
01:22:03,520 --> 01:22:10,080
divided by one minus the probability. Okay, so now becomes this
660
01:22:10,079 --> 01:22:17,359
take on infinite values. But there's still one issue here. Let me
661
01:22:18,079 --> 01:22:24,559
The one issue here is that m x plus b, that can still be negative,
662
01:22:24,560 --> 01:22:28,800
I have a negative slope, if I have a negative b, if I have some
663
01:22:28,800 --> 01:22:36,400
but that can be that's allowed to be negative. So how do we fix
664
01:22:36,399 --> 01:22:47,839
the log of the odds. Okay. So now I have the log of you know, some
665
01:22:47,840 --> 01:22:54,319
the probability. And now that is on a range of negative infinity
666
01:22:54,319 --> 01:23:00,639
because the range of log should be negative infinity to infinity.
667
01:23:00,640 --> 01:23:08,400
the probability? Well, the first thing I can do is take, you know,
668
01:23:08,399 --> 01:23:16,479
the not the e to the whatever is on both sides. So that gives me
669
01:23:16,479 --> 01:23:27,839
over the one minus the probability is now equal to e to the m x
670
01:23:27,840 --> 01:23:39,039
that out. So the probability is equal to one minus probability e
671
01:23:39,039 --> 01:23:49,279
e to the m x plus b minus P times e to the m x plus b. And now we
672
01:23:49,279 --> 01:23:58,880
one side. So if I do P, so basically, I'm moving this over, so I'm
673
01:23:58,880 --> 01:24:11,440
to the m x plus b is equal to e to the m x plus b and let me
674
01:24:11,439 --> 01:24:22,719
little bigger. So now my probability can be e to the m x plus b
675
01:24:22,720 --> 01:24:32,880
Okay, well, let me just rewrite this really quickly, I want a
676
01:24:33,840 --> 01:24:39,920
Okay, so what I'm going to do is I'm going to multiply this by
677
01:24:40,800 --> 01:24:45,119
and then also the bottom by negative m x plus b, and I'm allowed
678
01:24:45,119 --> 01:24:52,640
this over this is one. So now my probability is equal to one over
679
01:24:54,640 --> 01:25:01,840
one plus e to the negative m x plus b. And now why did I rewrite
680
01:25:01,840 --> 01:25:07,600
It's because this is actually a form of a special function, which
681
01:25:07,600 --> 01:25:19,360
function. And for the sigmoid function, it looks something like
682
01:25:20,159 --> 01:25:30,639
that some x is equal to one over one plus e to the negative x. So
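Collecting the rewrites from the last few steps: set the log of the odds equal to the line, solve for the probability, and you land on the sigmoid:

\[ \ln\frac{p}{1-p} = mx + b \;\Rightarrow\; \frac{p}{1-p} = e^{mx+b} \;\Rightarrow\; p = \frac{e^{mx+b}}{1 + e^{mx+b}} = \frac{1}{1 + e^{-(mx+b)}} = \sigma(mx+b), \quad \text{where } \sigma(y) = \frac{1}{1+e^{-y}} \]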
683
01:25:30,640 --> 01:25:38,000
is rewrite this in some sigmoid function, where the x value is
684
01:25:38,960 --> 01:25:42,880
So maybe I'll change this to y just to make that a bit more clear,
685
01:25:42,880 --> 01:25:50,319
the variable name is. But this is our sigmoid function. And
686
01:25:50,319 --> 01:26:01,039
looks like is it goes from zero. So this here is zero to one. And
687
01:26:01,039 --> 01:26:06,399
curved s, which I didn't draw too well. Let me try that again.
688
01:26:10,159 --> 01:26:19,119
something if I can draw this right. Like that. Okay, so it goes in
689
01:26:19,119 --> 01:26:25,760
And you might notice that this form fits our shape up here.
690
01:26:29,840 --> 01:26:36,159
Oops, let's draw it sharper. But if it's our shape up there a lot
691
01:26:37,439 --> 01:26:44,479
Alright, so that is what we call logistic regression, we're
692
01:26:44,479 --> 01:26:56,239
to the sigmoid function. Okay. And when we only have, you know,
693
01:26:56,239 --> 01:27:06,239
one feature x, and that's what we call simple logistic regression.
694
01:27:06,239 --> 01:27:12,639
so that's only x zero, but then if we have x zero, x one, all the
695
01:27:12,640 --> 01:27:19,360
multiple logistic regression, because there are multiple features
696
01:27:19,359 --> 01:27:26,079
when we're building our model, logistic regression. So I'm going
697
01:27:26,079 --> 01:27:36,079
And again, from SK learn this linear model, we can import logistic
698
01:27:36,079 --> 01:27:43,279
And just like how we did above, we can repeat all of this. So
699
01:27:43,279 --> 01:27:53,439
this log model, or LG logistic regression. I'm going to change
700
01:27:54,319 --> 01:27:59,119
So I'm just going to use the default logistic regression. But
701
01:27:59,119 --> 01:28:02,319
you see that you can use different penalties. So right now we're
702
01:28:02,319 --> 01:28:08,880
an L2 penalty. But L2 is our quadratic penalty. Okay, so that
703
01:28:09,680 --> 01:28:16,079
you know, outliers, it would really penalize that. For all these
704
01:28:16,079 --> 01:28:22,319
you can toggle these different parameters, and you might get
705
01:28:22,319 --> 01:28:26,960
If I were building a production level logistic regression model,
706
01:28:26,960 --> 01:28:31,439
would want to figure out how to do that. So I'm going to go ahead
707
01:28:31,439 --> 01:28:36,479
I would want to figure out, you know, what are the best parameters
708
01:28:36,479 --> 01:28:41,519
based on my validation data. But for now, we'll just we'll just
709
01:28:42,720 --> 01:28:49,600
So again, I'm going to fit the X train and the Y train. And I'm
710
01:28:49,600 --> 01:28:57,440
so I can just call this again. And instead of LG, NB, I'm going to
711
01:28:57,439 --> 01:29:07,279
precision 65%, recall 71, F1 68, or 82, total accuracy of 77. Okay,
712
01:29:07,279 --> 01:29:15,279
better than naive Bayes, but it's still not as good as KNN.
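A sketch of that logistic regression cell, under the same assumptions about the splits; penalty='l2' is scikit-learn's default, the one she mentions:

from sklearn.linear_model import LogisticRegression

lg_model = LogisticRegression(penalty='l2')    # default quadratic penalty, everything else out of the box
lg_model = lg_model.fit(X_train, y_train)

y_pred = lg_model.predict(X_test)
print(classification_report(y_test, y_pred))   # ~77% total accuracy in her run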
713
01:29:15,279 --> 01:29:20,079
classification that I wanted to talk about is something called
714
01:29:20,079 --> 01:29:31,840
or SVMs for short. So what exactly is an SVM model, I have two
715
01:29:31,840 --> 01:29:39,520
x one on the axes. And then I've told you if it's you know, class
716
01:29:39,520 --> 01:29:51,280
blue and red labels, my goal is to find some sort of line between
717
01:29:51,279 --> 01:30:00,559
the data. Alright, so this line is our SVM model. So I call it a
718
01:30:00,560 --> 01:30:06,160
line, but in 3d, it would be a plane and then you can also have
719
01:30:06,159 --> 01:30:11,599
proper term is actually I want to find the hyperplane that best
720
01:30:11,600 --> 01:30:30,000
classes. Let's see a few examples. Okay, so first, between these
721
01:30:30,000 --> 01:30:37,760
and C, which one is the best divider of the data, which one has
722
01:30:37,760 --> 01:30:42,880
or the other, or at least if it doesn't, which one divides it the
723
01:30:42,880 --> 01:30:53,920
is has the most defined boundary between the two different groups.
724
01:30:53,920 --> 01:31:02,079
pretty straightforward. It should be a right because a has a clear
725
01:31:02,079 --> 01:31:09,039
know, everything on this side of a is one label, it's negative and
726
01:31:09,039 --> 01:31:16,399
is the other label, it's positive. So what if I have a but then
727
01:31:16,399 --> 01:31:26,479
like this, and my C, maybe like this, sorry, they're kind of the
728
01:31:27,439 --> 01:31:38,559
But now which one is the best? So I would argue that it's still a,
729
01:31:38,560 --> 01:31:47,840
Right? And why is it still a? Because in these other two, look at
730
01:31:47,840 --> 01:31:57,119
to these points. Right? So if I had some new point that I wanted
731
01:31:57,119 --> 01:32:02,960
say I didn't have A or B. So let's say we're just working with C.
732
01:32:02,960 --> 01:32:10,960
that's right here. Or maybe a new point that's right there. Well,
733
01:32:10,960 --> 01:32:19,600
looking at this. I mean, without the boundary, that would probably
734
01:32:19,600 --> 01:32:27,520
right? I mean, it's pretty close to that other positive. So one
735
01:32:27,520 --> 01:32:36,320
is something known as the margin. Okay, so not only do we want to
736
01:32:36,319 --> 01:32:43,119
well, we also care about the boundary in between where the points
737
01:32:43,119 --> 01:32:53,279
are, and the line that we're drawing. So in a line like this, the
738
01:32:53,279 --> 01:33:10,000
might be like here. And I'm trying to draw these perpendicular.
739
01:33:10,000 --> 01:33:22,399
if I switch over to these dotted lines, if I can draw this right.
740
01:33:22,399 --> 01:33:37,839
are what's known as the margins. Okay, so these both here, these
741
01:33:38,479 --> 01:33:43,039
And our goal is to maximize those margins. So not only do we want
742
01:33:43,039 --> 01:33:51,279
two different classes, we want the line that has the largest
743
01:33:51,279 --> 01:33:57,519
on the margin lines, the data. So basically, these are the data
744
01:33:57,520 --> 01:34:08,480
divider. These are what we call support vectors. Hence the name
745
01:34:08,479 --> 01:34:16,479
so the issue with SVM sometimes is that they're not so robust to
746
01:34:16,479 --> 01:34:25,839
if I had one outlier, like this up here, that would totally change
747
01:34:25,840 --> 01:34:31,920
vector to be, even though that might be my only outlier. Okay. So
748
01:34:31,920 --> 01:34:38,239
in mind. As you know, when you're working with SVM is, it might
749
01:34:38,239 --> 01:34:45,679
are outliers in your data set. Okay, so another example of SVMs
750
01:34:45,680 --> 01:34:50,480
data like this, I'm just going to use a one dimensional data set
751
01:34:50,479 --> 01:34:56,799
say we have a data set that looks like this. Well, our, you know,
752
01:34:56,800 --> 01:35:01,440
perpendicular to this line. But it should be somewhere along this
753
01:35:02,399 --> 01:35:09,119
anywhere like this. You might argue, okay, well, there's one here.
754
01:35:09,119 --> 01:35:13,840
draw another one over here, right? And then maybe you can have two
755
01:35:13,840 --> 01:35:21,680
SVMs work. But one thing that we can do is we can create some sort
756
01:35:21,680 --> 01:35:29,440
that one thing I forgot to do was to label where zero was. So
757
01:35:32,000 --> 01:35:36,800
Now, what I'm going to do is I'm going to say, okay, I'm going to
758
01:35:36,800 --> 01:35:44,560
have x, sorry, x zero and x one. So x zero is just going to be my
759
01:35:44,560 --> 01:35:56,880
x one equal to let's say, x squared. So whatever is this squared,
760
01:35:56,880 --> 01:36:02,960
you know, maybe somewhere here, here, just pretend that it's
761
01:36:02,960 --> 01:36:06,640
Right. And now my pluses might be something like
762
01:36:10,079 --> 01:36:16,079
that. And I'm going to run out of space over here. So I'm just
763
01:36:16,079 --> 01:36:27,600
use your imagination. But once I draw it like this, well, it's a
764
01:36:27,600 --> 01:36:35,520
right? Now our SVM could be maybe something like this, this. And
765
01:36:35,520 --> 01:36:41,600
our data set. Now it's separable where one class is this way. And
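A tiny illustration of that lift from x to (x, x squared), with made-up points; after the map, the two classes sit on either side of a horizontal line:

import numpy as np

x = np.array([-3, -2, -1, 0, 1, 2, 3])    # hypothetical 1-D samples
y = np.array([1, 1, 0, 0, 0, 1, 1])       # inner points one class, outer points the other
X = np.column_stack([x, x ** 2])          # x0 = x, x1 = x^2: now linearly separable in 2-D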
766
01:36:42,800 --> 01:36:49,360
Okay, so that's known as SVMs. I do highly suggest that, you know,
767
01:36:49,359 --> 01:36:54,399
mentioned, if you're interested in them, do go more in depth
768
01:36:54,399 --> 01:37:00,239
do we how do we find this hyperplane? Right? I'm not going to go
769
01:37:00,239 --> 01:37:05,840
because you're just learning what an SVM is. But it's a good idea
770
01:37:05,840 --> 01:37:13,039
technique behind finding, you know, what exactly are the are the
771
01:37:13,039 --> 01:37:19,519
that we're going to use. So anyways, this transformation that we
772
01:37:19,520 --> 01:37:26,560
as the kernel trick. So when we go from x to some coordinate x,
773
01:37:27,119 --> 01:37:31,599
what we're doing is we are applying a kernel. So that's why it's
774
01:37:33,279 --> 01:37:40,159
So SVMs are actually really powerful. And you'll see that here. So
775
01:37:40,159 --> 01:37:48,800
to import SVC. And SVC is our support vector classifier. So with
776
01:37:49,600 --> 01:37:59,840
we are going to, you know, create SVC model. And we are going to,
777
01:37:59,840 --> 01:38:06,560
could have just copied and pasted this, I should be able to do
778
01:38:06,560 --> 01:38:10,480
again, fit this to X train, I could have just copied and pasted
779
01:38:10,479 --> 01:38:23,119
done that. Okay, taking a bit longer. All right. Let's predict
780
01:38:23,760 --> 01:38:28,880
let's see if I can hover over this. Right. So again, you see a lot
781
01:38:28,880 --> 01:38:37,119
parameters here that you can go back and change if you were
782
01:38:37,119 --> 01:38:46,319
but in this specific case, we'll just use it out of the box again.
783
01:38:46,319 --> 01:38:53,119
you'll note that Wow, the accuracy actually jumps to 87% with the
784
01:38:53,119 --> 01:38:59,199
there's nothing less than, you know, point eight, which is great.
785
01:38:59,199 --> 01:39:03,359
I mean, everything's at 0.9, which is higher than anything that we
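The SVC cell, sketched with the same assumed splits; out of the box, SVC uses an RBF kernel:

from sklearn.svm import SVC

svm_model = SVC()                              # all defaults
svm_model = svm_model.fit(X_train, y_train)    # this fit takes noticeably longer

y_pred = svm_model.predict(X_test)
print(classification_report(y_test, y_pred))   # ~87% total accuracy in her run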
786
01:39:06,640 --> 01:39:11,360
So so far, we've gone over four different classification models,
787
01:39:11,359 --> 01:39:17,039
logistic regression, naive Bayes and cannon. And these are just
788
01:39:17,039 --> 01:39:23,760
them. Each of these they have different, you know, they have
789
01:39:23,760 --> 01:39:31,920
go and you can toggle. And you can try to see if that helps later
790
01:39:31,920 --> 01:39:40,800
they perform, they give us around 70 to 80% accuracy. Okay, with
791
01:39:40,800 --> 01:39:45,440
let's see if we can actually beat that using a neural net. Now the
792
01:39:45,439 --> 01:39:51,839
I wanted to talk about is known as a neural net or neural network.
793
01:39:51,840 --> 01:39:58,480
like this. So you have an input layer, this is where all your
794
01:39:58,479 --> 01:40:03,199
all these arrows pointing to some sort of hidden layer. And then
795
01:40:03,199 --> 01:40:10,559
sort of output layer. So what is what is all this mean? Each of
796
01:40:10,560 --> 01:40:18,160
something known as a neuron. Okay, so that's a neuron. In a neural
797
01:40:18,159 --> 01:40:23,199
features that we're inputting into the neural net. So that might
798
01:40:23,840 --> 01:40:28,880
x n. Right. And these are the features that we talked about there,
799
01:40:28,880 --> 01:40:38,720
the pregnancy, the BMI, the age, etc. Now all of these get
800
01:40:38,720 --> 01:40:44,240
are multiplied by some w number that applies to that one specific
801
01:40:44,239 --> 01:40:51,840
feature. So these two get multiplied. And the sum of all of these
802
01:40:51,840 --> 01:40:58,400
so basically, I'm taking w zero times x zero. And then I'm adding
803
01:40:58,399 --> 01:41:05,359
I'm adding you know, x two times w two, etc, all the way to x n
804
01:41:05,359 --> 01:41:11,199
input into the neuron. Now I'm also adding this bias term, which
805
01:41:11,199 --> 01:41:17,199
to shift this by a little bit. So I might add five or I might add
806
01:41:17,199 --> 01:41:24,960
I don't know. But we're going to add this bias term. And the
807
01:41:24,960 --> 01:41:31,279
the sum of this, this, this and this, go into something known as
808
01:41:31,279 --> 01:41:38,960
okay. And then after applying this activation function, we get an
809
01:41:38,960 --> 01:41:44,399
neuron would look like. Now a whole network of them would look
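One neuron, in symbols: weighted inputs, plus the bias term, fed through the activation function f:

\[ \text{output} = f\!\left( w_0 x_0 + w_1 x_1 + \cdots + w_n x_n + b \right) = f\!\left( \sum_{i=0}^{n} w_i x_i + b \right) \]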
810
01:41:46,000 --> 01:41:53,760
So I kind of gloss over this activation function. What exactly is
811
01:41:53,760 --> 01:41:58,720
looks like if we have all our inputs here. And let's say all of
812
01:41:58,720 --> 01:42:08,159
of addition, right? Then what's going on is we're just adding a
813
01:42:08,159 --> 01:42:13,840
certain weight times the input layer a bunch of times.
814
01:42:13,840 --> 01:42:22,000
and factor that all out, then this entire neural net is just a
815
01:42:22,000 --> 01:42:27,840
layers, which I don't know about you, but that just seems kind of
816
01:42:27,840 --> 01:42:33,279
literally just write that out in a formula, why would we need to
817
01:42:33,279 --> 01:42:40,000
we wouldn't. So the activation function is introduced, right? So
818
01:42:40,000 --> 01:42:46,880
function, this just becomes a linear model. An activation function
819
01:42:46,880 --> 01:42:52,880
this. And as you can tell, these are not linear. And the reason
820
01:42:52,880 --> 01:42:58,480
our entire model doesn't collapse on itself and become a linear
821
01:42:58,479 --> 01:43:04,079
something known as a sigmoid function, it runs between zero and
822
01:43:04,079 --> 01:43:10,720
one all the way to one. And this is ReLU, which anything less than
823
01:43:10,720 --> 01:43:18,640
greater than zero is linear. So with these activation functions,
824
01:43:18,640 --> 01:43:24,160
is no longer just the linear combination of these, it's some sort
825
01:43:24,159 --> 01:43:32,880
that the input into the next neuron is, you know, it doesn't it
826
01:43:32,880 --> 01:43:39,920
become linear, because we've introduced all these nonlinearities.
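The three activation functions she sketches, written out (tanh is the one that runs from negative one all the way to one):

\[ \text{sigmoid}(x) = \frac{1}{1+e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \text{ReLU}(x) = \max(0, x) \]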
827
01:43:39,920 --> 01:43:45,440
model, the loss, right? And then we do this thing called training,
828
01:43:45,439 --> 01:43:53,199
back into the model, and make certain adjustments to the model to
829
01:43:55,199 --> 01:43:59,359
Let's talk a little bit about the training, what exactly goes on
830
01:44:00,720 --> 01:44:07,600
Let's go back and take a look at our L2 loss function. This is
831
01:44:07,600 --> 01:44:15,840
looks like it's a quadratic formula, right? Well, up here, the
832
01:44:15,840 --> 01:44:23,199
large. And our goal is to get somewhere down here, where the loss
833
01:44:23,199 --> 01:44:30,720
means that our predicted value is closer to our true value. So
834
01:44:30,720 --> 01:44:39,680
this way. Okay. And thanks to a lot of properties of math,
835
01:44:39,680 --> 01:44:53,680
gradient descent, in order to follow this slope down this way.
836
01:44:53,680 --> 01:45:02,560
different slopes with respect to some value. Okay, so the loss
837
01:45:03,119 --> 01:45:12,479
w zero, versus w one versus w n, they might all be different.
838
01:45:12,479 --> 01:45:18,319
think about it is, to what extent is this value contributing to
839
01:45:18,319 --> 01:45:24,399
figure that out through some calculus, which we're not going to
840
01:45:24,399 --> 01:45:29,599
But if you want to learn more about neural nets, you should
841
01:45:29,600 --> 01:45:35,360
and figure out what exactly back propagation is doing, in order to
842
01:45:35,359 --> 01:45:41,759
how much do we have to backstep by. So the thing is here, you
843
01:45:41,760 --> 01:45:48,480
this curve at all of these different points. And the closer we get
844
01:45:48,479 --> 01:45:57,839
this step becomes. Now stick with me here. So my new value, this
845
01:45:57,840 --> 01:46:04,800
I'm going to take w zero, and I'm going to set some new value for
846
01:46:04,800 --> 01:46:12,800
set for that is the old value of w zero, plus some factor, which
847
01:46:13,680 --> 01:46:22,400
times whatever this arrow is. So that's basically saying, okay,
848
01:46:23,039 --> 01:46:30,000
and just decrease it this way. So I guess increase it in this
849
01:46:30,000 --> 01:46:34,640
this direction. But this alpha here is telling us, okay, don't
850
01:46:34,640 --> 01:46:38,800
just in case we're wrong, take a small step, take a small step in
851
01:46:38,800 --> 01:46:45,760
closer. And for those of you who, you know, do want to look more
852
01:46:45,760 --> 01:46:51,840
the reason why I use a plus here is because this here is the
853
01:46:51,840 --> 01:46:54,720
just the if you were to use the actual gradient, this should be a
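So with the actual gradient, the weight update she is describing takes the standard form, where alpha is the learning rate:

\[ w_0^{\,\text{new}} = w_0^{\,\text{old}} - \alpha \, \frac{\partial L}{\partial w_0} \]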
854
01:46:54,720 --> 01:47:00,560
Now this alpha is something that we call the learning rate. Okay,
855
01:47:00,560 --> 01:47:07,280
we're taking steps. And that might, you know, tell our that that
856
01:47:07,840 --> 01:47:13,039
how long it takes for our neural net to converge. Or sometimes if
857
01:47:13,039 --> 01:47:21,840
diverge. But with all of these weights, so here I have w zero, w
858
01:47:21,840 --> 01:47:29,840
update to all of them after we calculate the loss, the gradient of
859
01:47:29,840 --> 01:47:37,680
weight. So that's how back propagation works. And that is
860
01:47:37,680 --> 01:47:42,880
calculate the loss, we're calculating gradients, making
861
01:47:42,880 --> 01:47:50,480
all the weights to something adjusted slightly. And then
862
01:47:50,479 --> 01:47:55,119
gradient. And then we're saying, Okay, let's take the training set
863
01:47:55,119 --> 01:48:01,840
again, and go through this loop all over again. So for machine
864
01:48:01,840 --> 01:48:09,039
libraries that we use, right, we've already seen SK learn. But
865
01:48:09,039 --> 01:48:19,920
networks, this is kind of what we're trying to program. And it's
866
01:48:19,920 --> 01:48:25,760
do this from scratch, because not only will we probably have a lot
867
01:48:25,760 --> 01:48:30,159
not going to be fast enough, right? Wouldn't it be great if there
868
01:48:30,800 --> 01:48:35,760
full time professionals that are dedicated to solving this
869
01:48:35,760 --> 01:48:43,360
just give us their code that's already running really fast? Well,
870
01:48:43,359 --> 01:48:49,359
And that's why we use TensorFlow. So TensorFlow makes it really
871
01:48:49,359 --> 01:48:55,599
we also have enough control over what exactly we're feeding into
872
01:48:55,600 --> 01:49:02,640
this line here is basically saying, Okay, let's create a
873
01:49:02,640 --> 01:49:08,000
just, you know, what we've seen here, it just goes one layer to
874
01:49:08,000 --> 01:49:13,359
a dense layer means that all of them are interconnected. So here,
875
01:49:13,359 --> 01:49:19,839
nodes, and this one's all these, and then this one gets connected
876
01:49:19,840 --> 01:49:26,800
So we're going to create 16 dense nodes with relu activation
877
01:49:26,800 --> 01:49:34,000
to create another layer of 16 dense nodes with relu activation.
878
01:49:34,000 --> 01:49:43,199
to be just one node. Okay. And that's how easy it is to define
879
01:49:43,199 --> 01:49:51,199
is an open source library that helps you develop and train your ML
880
01:49:51,199 --> 01:49:57,119
for a neural net. So we're using a neural net for classification.
881
01:49:58,239 --> 01:50:03,840
we are going to use TensorFlow, and I don't think I imported that
882
01:50:03,840 --> 01:50:18,400
that down here. So I'm going to import TensorFlow as TF. And
883
01:50:19,279 --> 01:50:28,159
is going to be, I'm going to use this. So essentially, this is
884
01:50:28,159 --> 01:50:35,039
things that I'm about to pass in. So yeah, a linear stack of layers.
885
01:50:35,760 --> 01:50:40,560
And what that means, nope, not that. So what that means is I can
886
01:50:42,720 --> 01:50:46,560
some sort of layer, and I'm just going to use a dense layer.
887
01:50:46,560 --> 01:50:56,560
Oops, dot dense. And let's say we have 32 units. Okay, I will
888
01:51:01,279 --> 01:51:09,599
set the activation as relu. And at first we have to specify the
889
01:51:09,600 --> 01:51:19,680
and comma. Alright. Alright, so that's our first layer. Now our
890
01:51:19,680 --> 01:51:28,880
another dense layer of 32 units all using relu. And that's it. So
891
01:51:28,880 --> 01:51:35,760
just going to be my output layer, it's going to just be one node.
892
01:51:35,760 --> 01:51:43,119
be sigmoid. So if you recall from our logistic regression, what
893
01:51:43,119 --> 01:51:49,599
a sigmoid, it looks something like this, right? So by creating a
894
01:51:49,600 --> 01:51:56,720
we're essentially projecting our predictions to be zero or one,
895
01:51:57,439 --> 01:52:03,279
And that's going to help us, you know, we can just round to zero
896
01:52:03,279 --> 01:52:12,000
Okay. So this is my neural net model. And I'm going to compile
897
01:52:12,000 --> 01:52:17,520
we have to compile it. It's really cool, because I can just
898
01:52:17,520 --> 01:52:23,840
I want, and it'll do it. So here, if I go to optimizers, I'm
899
01:52:24,720 --> 01:52:31,039
And you'll see that, you know, the learning rate is 0.001. So I'm
900
01:52:31,039 --> 01:52:44,800
So 0.001. And my loss is going to be binary cross entropy. And the
901
01:52:44,800 --> 01:52:50,079
include on here, so it already will consider loss, but I'm, I'm
902
01:52:50,079 --> 01:52:55,600
So we can actually see that in a plot later on. Alright, so I'm
903
01:52:55,600 --> 01:53:01,760
And one thing that I'm going to also do is I'm going to define
904
01:53:01,760 --> 01:53:06,800
actually copying and pasting this, I got these from TensorFlow. So
905
01:53:06,800 --> 01:53:13,119
tutorial, they actually have these, this like, defined. And that's
906
01:53:13,119 --> 01:53:18,239
So I'm actually going to move this cell up, run that. So we're
907
01:53:18,239 --> 01:53:23,519
over all the different epochs. epochs means like training cycles.
908
01:53:23,520 --> 01:53:27,680
means like training cycles. And we're going to plot the accuracy
909
01:53:28,960 --> 01:53:36,079
Alright, so we have our model. And now all that's left is, let's
910
01:53:37,199 --> 01:53:42,720
So I'm going to say history. So TensorFlow is great, because it
911
01:53:42,720 --> 01:53:47,680
of the training, which is why we can go and plot it later on. Now
912
01:53:47,680 --> 01:53:59,280
this neural net model. And fit that with x train, y train, I'm
913
01:53:59,279 --> 01:54:06,159
equal to let's say just let's just use 100 for now. And the batch
914
01:54:06,159 --> 01:54:18,159
let's say 32. Alright. And the validation split. So what the
915
01:54:18,159 --> 01:54:23,920
here somewhere. Okay, so yeah, this validation split is just the
916
01:54:23,920 --> 01:54:31,119
to be used as validation data. So essentially, every single epoch,
917
01:54:31,119 --> 01:54:37,199
saying, leave certain if this is point two, then leave 20% out.
918
01:54:37,199 --> 01:54:42,559
model performs on that 20% that we've left out. Okay, so it's
919
01:54:42,560 --> 01:54:48,800
set. But TensorFlow does it on our training data set during the
920
01:54:48,800 --> 01:54:54,640
outside of just our validation data set to see, you know, what's
921
01:54:54,640 --> 01:55:05,760
I'm going to make that 0.2. And we can run this. So if I run that,
922
01:55:05,760 --> 01:55:13,760
to set verbose equal to zero, which means, okay, don't print
923
01:55:13,760 --> 01:55:19,680
for 100 epochs might get kind of annoying. So I'm just going to
924
01:55:19,680 --> 01:55:31,039
and then we'll see what happens. Cool, so it finished training.
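The cells above, collected into one sketch; the input shape of 10 matches this data set's ten features, and Adam is assumed as the optimizer she picks from tf.keras.optimizers:

import tensorflow as tf

nn_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),  # first hidden layer
    tf.keras.layers.Dense(32, activation='relu'),                     # second hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')                    # output squashed to (0, 1)
])

nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss='binary_crossentropy',
                 metrics=['accuracy'])

history = nn_model.fit(X_train, y_train,
                       epochs=100, batch_size=32,
                       validation_split=0.2,   # hold out 20% of the training data each epoch
                       verbose=0)              # suppress the per-epoch printout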
925
01:55:31,039 --> 01:55:36,960
because you know, I've already defined these two functions, I can
926
01:55:36,960 --> 01:55:45,199
oops, loss of that history. And I can also plot the accuracy
927
01:55:45,199 --> 01:55:52,239
So this is a little bit ish what we're looking for. We definitely
928
01:55:52,239 --> 01:55:59,119
decreasing loss and an increasing accuracy. So here we do see
929
01:55:59,119 --> 01:56:07,199
accuracy improves from around point seven, seven or something all
930
01:56:07,199 --> 01:56:16,880
point, maybe eight one. And our loss is decreasing. So this is
931
01:56:16,880 --> 01:56:23,359
loss and accuracy is performing worse than the training loss or
932
01:56:23,359 --> 01:56:28,479
our model is training on that data. So it's adapting to that data.
933
01:56:28,479 --> 01:56:35,759
you know, stuff that it hasn't seen yet. So, so that's why. So in
934
01:56:35,760 --> 01:56:40,159
we could change a bunch of the parameters, right? Like I could
935
01:56:40,159 --> 01:56:46,960
a row of 64 nodes, and then 32, and then one. So I can change some
936
01:56:47,680 --> 01:56:53,039
And a lot of machine learning is trying to find, hey, what do we
937
01:56:54,399 --> 01:57:02,079
So what I'm actually going to do is I'm going to rewrite this so
938
01:57:02,079 --> 01:57:08,079
known as a grid search. So we can search through an entire space
939
01:57:08,079 --> 01:57:19,199
we have 64 nodes and 64 nodes, or 16 nodes and 16 nodes, and so
940
01:57:19,199 --> 01:57:26,639
we can, you know, we can change this learning rate, we can change
941
01:57:26,640 --> 01:57:33,039
you know, the batch size, all these things might affect our
942
01:57:33,039 --> 01:57:42,000
I'm also going to add what's known as a dropout layer in here. And
943
01:57:42,000 --> 01:57:51,119
saying, hey, randomly choose with at this rate, certain nodes, and
944
01:57:51,119 --> 01:57:59,760
in a certain iteration. So this helps prevent overfitting. Okay,
945
01:57:59,760 --> 01:58:06,720
define this as a function called train model, we're going to pass
946
01:58:07,920 --> 01:58:15,760
the number of nodes, the dropout, you know, the probability that
947
01:58:15,760 --> 01:58:27,199
learning rate. So I'm actually going to say lr batch size. And we
948
01:58:27,199 --> 01:58:34,319
right? I mentioned that as a parameter. So indent this, so it goes
949
01:58:34,319 --> 01:58:40,799
I'm going to set this equal to number of nodes. And now with the
950
01:58:40,800 --> 01:58:48,720
to set dropout prob. So now you know, the probability of turning
951
01:58:48,720 --> 01:58:55,360
is equal to dropout prob. And I'm going to keep the output layer
952
01:58:55,359 --> 01:59:00,479
but this here is now going to be my learning rate. And I still
953
01:59:00,479 --> 01:59:12,639
accuracy. We are actually going to train our model inside of this
954
01:59:12,640 --> 01:59:19,200
epochs equal epochs, and this is equal to whatever, you know,
955
01:59:19,199 --> 01:59:25,279
y train belong right here. Okay, so those are getting passed in as
956
01:59:25,279 --> 01:59:38,159
end, I'm going to return this model and the history of that model.
957
01:59:40,399 --> 01:59:46,399
is let's just go through all of these. So let's say let's keep
958
01:59:46,399 --> 01:59:53,279
do is I can say, hey, for a number of nodes in, let's say, let's
959
01:59:53,279 --> 02:00:02,960
happens for the different dropout probabilities. And I mean, zero
960
02:00:02,960 --> 02:00:17,199
Also, to see what happens. You know, for the learning rate in
961
02:00:17,199 --> 02:00:27,359
maybe we want to throw on 0.1 in there as well. And then for the
962
02:00:27,359 --> 02:00:33,119
64 as well. Actually, and let's also throw in 128. Actually, let's
963
02:00:33,680 --> 02:00:44,079
so 128 in there. That should be 01. I'm going to record the model
964
02:00:44,079 --> 02:00:54,640
train model here. So we're going to do x train y train, the number
965
02:00:54,640 --> 02:01:04,240
you know, the number of nodes that we've defined here, dropout,
966
02:01:04,239 --> 02:01:10,479
Okay. And then now we have both the model and the history. And
967
02:01:10,479 --> 02:01:18,079
I want to plot the loss for the history. I'm also going to plot
968
02:01:19,840 --> 02:01:22,640
Probably should have done them side by side, that probably would
969
02:01:26,319 --> 02:01:34,399
Okay, so what I'm going to do is split up, split this up. And that
970
02:01:34,399 --> 02:01:41,039
the subplots. So now this is just saying, okay, I want one row and
971
02:01:41,039 --> 02:01:56,000
plots. Okay, so I'm going to plot on my axis one, the loss. I
972
02:01:56,000 --> 02:02:04,640
work. Okay, we don't care about the grid. Yeah, let's let's keep
973
02:02:09,199 --> 02:02:14,800
So now on here, I'm going to plot all the accuracies on the second
974
02:02:20,159 --> 02:02:21,840
I might have to debug this a bit.
975
02:02:21,840 --> 02:02:27,680
We should be able to get rid of that. If we run this, we already
976
02:02:27,680 --> 02:02:36,800
in here. So if I just run it on this, okay, it has no attribute x
977
02:02:36,800 --> 02:02:47,680
it's like set_xlabel or something. Okay, yeah, so it's, it's set
978
02:02:47,680 --> 02:02:54,480
So let's see if that works. All right, cool. Um, and let's
979
02:02:55,439 --> 02:02:59,919
Okay, so we can actually change the figure size that I'm gonna
980
02:02:59,920 --> 02:03:08,159
set that to. Oh, that's not the way I wanted it. Okay, so that
981
02:03:08,159 --> 02:03:13,920
And that's just going to be my plot history function. So now I can
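A sketch of that plot_history helper, with the side-by-side subplots and the set_xlabel fix she lands on:

import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))   # one row, two plots
    ax1.plot(history.history['loss'], label='loss')
    ax1.plot(history.history['val_loss'], label='val_loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Binary crossentropy')
    ax1.legend()
    ax1.grid(True)
    ax2.plot(history.history['accuracy'], label='accuracy')
    ax2.plot(history.history['val_accuracy'], label='val_accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend()
    ax2.grid(True)
    plt.show()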
982
02:03:15,279 --> 02:03:23,279
Here, I'm going to plot the history. And what I'm actually going
983
02:03:23,279 --> 02:03:26,079
I'm going to print out all these parameters. So I'm going to print
984
02:03:27,359 --> 02:03:34,960
the F string to print out all of this stuff. So here, I'm going to
985
02:03:34,960 --> 02:03:42,720
Uh, all of this stuff. So here, I'm printing out how many nodes,
986
02:03:55,199 --> 02:03:57,519
And we already know how many epochs, so I'm not even going to
987
02:03:57,520 --> 02:04:10,560
So once we plot this, uh, let's actually also figure out what the,
988
02:04:10,560 --> 02:04:15,680
losses on our validation set that we have that we created all the
989
02:04:16,720 --> 02:04:23,760
Alright, so remember, we created three data sets. Let's call our
990
02:04:23,760 --> 02:04:32,640
validation data with the validation data sets loss would be. And I
991
02:04:33,520 --> 02:04:38,160
let's say I want to record whatever model has the least validation
992
02:04:40,640 --> 02:04:45,360
first, I'm going to initialize that to infinity so that you know,
993
02:04:45,359 --> 02:04:53,599
So if I do float infinity, that will set that to infinity. And
994
02:04:53,600 --> 02:04:58,640
track of the parameters. Actually, it doesn't really matter. I'm
995
02:04:58,640 --> 02:05:06,480
the model. And I'm gonna set that to none. So now down here, if
996
02:05:06,479 --> 02:05:13,759
less than the least validation loss, then I am going to simply
997
02:05:13,760 --> 02:05:20,400
Hey, this validation for this least validation loss is now equal
998
02:05:21,600 --> 02:05:30,480
And the least loss model is whatever this model is that just
999
02:05:31,840 --> 02:05:40,319
So we are actually just going to let this run for a while. And
1000
02:05:40,319 --> 02:05:51,840
last model after that. So let's just run. All right, and now we
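A sketch of the train_model function and the grid search loop described above; the exact grid values are partly cut off in these captions, so the lists below are assumptions in the spirit of what she lists, and X_valid/y_valid stand for the validation split created earlier:

def train_model(X_train, y_train, num_nodes, dropout_prob, lr, batch_size, epochs):
    nn_model = tf.keras.Sequential([
        tf.keras.layers.Dense(num_nodes, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dropout(dropout_prob),   # randomly turn off nodes to fight overfitting
        tf.keras.layers.Dense(num_nodes, activation='relu'),
        tf.keras.layers.Dropout(dropout_prob),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    nn_model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                     loss='binary_crossentropy', metrics=['accuracy'])
    history = nn_model.fit(X_train, y_train, epochs=epochs,
                           batch_size=batch_size, validation_split=0.2, verbose=0)
    return nn_model, history

least_val_loss = float('inf')   # start at infinity so the first model always wins
least_loss_model = None
epochs = 100
for num_nodes in [16, 32, 64]:
    for dropout_prob in [0, 0.2]:
        for lr in [0.01, 0.005, 0.001]:
            for batch_size in [32, 64, 128]:
                print(f"{num_nodes} nodes, dropout {dropout_prob}, lr {lr}, batch size {batch_size}")
                model, history = train_model(X_train, y_train, num_nodes,
                                             dropout_prob, lr, batch_size, epochs)
                plot_history(history)
                val_loss = model.evaluate(X_valid, y_valid)[0]   # [loss, accuracy] -> keep the loss
                if val_loss < least_val_loss:
                    least_val_loss = val_loss
                    least_loss_model = model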
1001
02:05:51,840 --> 02:06:12,079
All right, so we've finally finished training. And you'll notice
1002
02:06:12,079 --> 02:06:19,039
actually gets to like 0.29. The accuracy is around 88%, which is
1003
02:06:19,039 --> 02:06:26,239
okay, why is this accuracy in this? Like, these are both the
1004
02:06:26,239 --> 02:06:30,319
is on the validation data set that we've defined at the beginning,
1005
02:06:30,319 --> 02:06:35,840
this is actually taking 20% of our tests, our training set every
1006
02:06:35,840 --> 02:06:41,199
and saying, Okay, how much of it do I get right now? You know,
1007
02:06:41,199 --> 02:06:46,880
train with any of that. So they're slightly different. And
1008
02:06:46,880 --> 02:06:52,640
that I probably you know, probably what I should have done is over
1009
02:06:54,640 --> 02:06:59,920
the model fit, instead of the validation split, you can define the
1010
02:07:00,479 --> 02:07:04,639
And you can pass in the validation data, I don't know if this is
1011
02:07:05,439 --> 02:07:09,439
that's probably what I should have done. But instead, you know,
1012
02:07:09,439 --> 02:07:16,719
we have here. So you'll see at the end, you know, with the 64
1013
02:07:16,720 --> 02:07:24,880
performance 64 nodes with a dropout of 0.2, a learning rate of
1014
02:07:25,439 --> 02:07:31,439
And it does seem like yes, the validation, you know, the fake
1015
02:07:34,000 --> 02:07:40,239
loss is decreasing, and then the accuracy is increasing, which is
1016
02:07:40,239 --> 02:07:45,039
so finally, what I'm going to do is I'm actually just going to
1017
02:07:45,039 --> 02:07:50,960
this model, which we've called our least loss model, I'm going to
1018
02:07:50,960 --> 02:07:58,159
and I'm going to predict x test on that. And you'll see that it
1019
02:07:58,159 --> 02:08:02,159
are really close to zero and some that are really close to one.
1020
02:08:02,159 --> 02:08:11,920
output. So if I do this, and what I can do is I can cast them. So
1021
02:08:11,920 --> 02:08:20,239
greater than 0.5, set that to one. So if I actually, I think what
1022
02:08:22,399 --> 02:08:29,759
Oh, okay, so I have to cast that as type. And so now you'll see
1023
02:08:29,760 --> 02:08:40,560
actually going to transform this into a column as well. So here
1024
02:08:40,560 --> 02:08:49,280
I didn't mean to do that. Okay, no, I wanted to just reshape it to
1025
02:08:49,279 --> 02:08:57,599
Okay. And using that we can actually just rerun the classification
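The prediction cell, sketched: threshold the sigmoid outputs at 0.5, cast to integers, and flatten before rerunning the report:

y_pred = least_loss_model.predict(X_test)          # probabilities between 0 and 1
y_pred = (y_pred > 0.5).astype(int).reshape(-1,)   # hard 0/1 labels, flattened to a vector
print(classification_report(y_test, y_pred))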
1026
02:08:57,600 --> 02:09:04,880
neural net output. And you'll see that okay, the the F ones are
1027
02:09:04,880 --> 02:09:12,560
seems like what happened here is the precision on class zero. So
1028
02:09:12,560 --> 02:09:19,840
but the recall decreased. But the F one score is still at a good
1029
02:09:19,840 --> 02:09:24,480
class, it looked like the precision decreased a bit the recall
1030
02:09:25,039 --> 02:09:31,439
That's also been increased. I think I interpreted that properly. I
1031
02:09:31,439 --> 02:09:37,839
work and we got a model that performs actually very, very
1032
02:09:37,840 --> 02:09:43,039
had earlier. And the whole point of this exercise was to
1033
02:09:43,039 --> 02:09:48,720
define your models. But it's also to say, hey, maybe, you know,
1034
02:09:48,720 --> 02:09:55,840
powerful, as you can tell. But sometimes, you know, an SVM or some
1035
02:09:55,840 --> 02:10:03,360
appropriate. But in this case, I guess it didn't really matter
1036
02:10:04,399 --> 02:10:10,639
accuracy score is still pretty good. So yeah, let's now move on to
1037
02:10:11,840 --> 02:10:17,039
We just saw a bunch of different classification models. Now let's
1038
02:10:17,039 --> 02:10:23,279
the other type of supervised learning. If we look at this plot
1039
02:10:23,279 --> 02:10:31,439
data points. And here we have our x value for those data points.
1040
02:10:31,439 --> 02:10:40,079
value, which is now our label. And when we look at this plot,
1041
02:10:40,079 --> 02:10:48,159
the line of best fit that best models this data. Essentially,
1042
02:10:48,159 --> 02:10:54,159
some new value of x that we don't have in our sample, we're trying
1043
02:10:54,159 --> 02:11:01,599
prediction for y be for that given x value. So that, you know,
1044
02:11:03,279 --> 02:11:08,399
I don't know. But remember, in regression that, you know, given
1045
02:11:08,399 --> 02:11:12,079
we're trying to predict some continuous numerical value for y.
1046
02:11:12,079 --> 02:11:21,199
In linear regression, we want to take our data and fit a linear
1047
02:11:21,199 --> 02:11:30,079
our linear model might look something along the lines of here.
1048
02:11:30,079 --> 02:11:41,119
considered as maybe our line of best fit. And this line is modeled
1049
02:11:41,119 --> 02:11:51,680
it down here, y equals b zero, plus b one x. Now b zero just means
1050
02:11:51,680 --> 02:11:58,880
extend this y down here, this value here is b zero, and then b one
1051
02:11:58,880 --> 02:12:08,880
line, defines the slope of this line. Okay. All right. So that's
1052
02:12:09,680 --> 02:12:17,119
for linear regression. And how exactly do we come up with that
1053
02:12:17,119 --> 02:12:23,279
with this linear regression? You know, we could just eyeball where
1054
02:12:23,279 --> 02:12:29,279
not very good at eyeballing certain things like that. I mean, we
1055
02:12:29,279 --> 02:12:37,519
better at giving us a precise value for b zero and b one. Well,
1056
02:12:37,520 --> 02:12:47,200
something known as a residual. Okay, so residual, you might also
1057
02:12:47,199 --> 02:12:55,039
And what that means is, let's take some data point in our data
1058
02:12:55,039 --> 02:13:03,439
far off is our prediction from a data point that we already have.
1059
02:13:04,000 --> 02:13:15,119
this is 1, 2, 3, 4, 5, 6, 7, 8. So this is y eight, let's call it, you'll see
1060
02:13:15,119 --> 02:13:23,039
I in order to represent, hey, just one of these points. Okay. So
1061
02:13:23,039 --> 02:13:30,720
would be the prediction. Oops, this here would be the prediction
1062
02:13:30,720 --> 02:13:35,199
with this hat. Okay, if it has a hat on it, that means hey, this
1063
02:13:35,199 --> 02:13:48,239
my prediction for you know, this specific value of x. Okay. Now
1064
02:13:48,239 --> 02:13:58,719
here between y eight and y hat eight. So y eight minus y hat
1065
02:13:58,720 --> 02:14:04,400
give us this here. And I'm just going to take the absolute value
1066
02:14:04,399 --> 02:14:08,879
the line, right, then you would get a negative value, but distance
1067
02:14:08,880 --> 02:14:14,560
just going to put a little hat, or we're going to put a little
1068
02:14:15,279 --> 02:14:23,519
And that gives us the residual or the error. So let me rewrite
1069
02:14:23,520 --> 02:14:32,960
to all the points, I'm going to say the residual can be calculated
1070
02:14:32,960 --> 02:14:39,279
So this just means the distance between some given point, and its
1071
02:14:39,279 --> 02:14:47,679
prediction on the line. So now, with this residual, this line of
1072
02:14:47,680 --> 02:14:55,840
decrease these residuals as much as possible. So now that we have
1073
02:14:55,840 --> 02:15:00,640
our line of best fit is trying to decrease the error as much as
1074
02:15:00,640 --> 02:15:07,840
data points. And that might mean, you know, minimizing the sum of
1075
02:15:07,840 --> 02:15:14,720
here, this is the sum symbol. And if I just stick the residual
1076
02:15:16,640 --> 02:15:21,200
it looks something like that, right. And I'm just going to say,
1077
02:15:21,199 --> 02:15:27,679
data set, so for all the different points, we're going to sum up
1078
02:15:27,680 --> 02:15:33,200
to try to decrease that with my line of best fit. So I'm going to
1079
02:15:33,199 --> 02:15:41,679
me the lowest value of this. Okay. Now in other, you know,
1080
02:15:41,680 --> 02:15:49,039
we might attach a squared to that. So we're trying to decrease the
1081
02:15:49,039 --> 02:16:03,519
And what that does is it just, you know, it adds a higher penalty
1082
02:16:03,520 --> 02:16:07,920
you know, points that are further off. So that is linear
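Collecting the pieces of that derivation: the line, the residual for a point i, and the objective the fit minimizes:

\[ \hat{y} = b_0 + b_1 x, \qquad r_i = \left| y_i - \hat{y}_i \right|, \qquad \min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \]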
1083
02:16:08,640 --> 02:16:15,520
this equation, some line of best fit that will help us decrease
1084
02:16:15,520 --> 02:16:19,920
with respect to all the data points that we have in our data set,
1085
02:16:19,920 --> 02:16:27,760
the best prediction for all of them. This is known as simple
1086
02:16:30,880 --> 02:16:39,520
And basically, that means, you know, our equation looks something
1087
02:16:39,520 --> 02:16:52,479
multiple linear regression, which just means that hey, if we have
1088
02:16:52,479 --> 02:16:58,559
think of our feature vectors, we have multiple values in our x
1089
02:16:58,559 --> 02:17:11,199
look something more like this. Actually, I'm just going to say
1090
02:17:11,200 --> 02:17:18,960
up with some coefficient for all of the different x values that I
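So multiple linear regression just extends the same line with one coefficient per feature:

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \]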
1091
02:17:18,959 --> 02:17:23,039
might have noticed that I have some assumptions over here. And you
1092
02:17:23,040 --> 02:17:26,560
what in the world do these assumptions mean? So let's go over
1093
02:17:26,559 --> 02:17:31,119
So let's go over them. The first one is linearity.
1094
02:17:33,840 --> 02:17:38,399
And what that means is, let's say I have a data set. Okay.
1095
02:17:43,760 --> 02:17:50,960
Linearity just means, okay, my does my data follow a linear
1096
02:17:50,959 --> 02:17:59,279
increases? Or does y decrease at as x increases? Does so if y
1097
02:17:59,280 --> 02:18:04,720
rate as x increases, then you're probably looking at something
1098
02:18:04,719 --> 02:18:12,959
nonlinear data set? Let's say I had data that might look something
1099
02:18:12,959 --> 02:18:18,719
visually judging this, you might say, okay, seems like the line of
1100
02:18:18,719 --> 02:18:28,559
curve like this. Right. And in this case, we don't satisfy that
1101
02:18:29,680 --> 02:18:36,960
So with linearity, we basically just want our data set to follow
1102
02:18:39,280 --> 02:18:42,640
And independence, our second assumption
1103
02:18:42,639 --> 02:18:50,079
just means this point over here, it should have no influence on
1104
02:18:50,079 --> 02:18:55,039
or this point over here, or this point over here. So in other
1105
02:18:56,000 --> 02:19:03,440
all the samples in our data set should be independent. Okay, they
1106
02:19:03,440 --> 02:19:05,840
one another, they should not affect one another.
1107
02:19:05,840 --> 02:19:17,120
Okay, now, normality and homoscedasticity, those are concepts
1108
02:19:17,120 --> 02:19:31,120
I have a plot that looks something like this, and I have a plot
1109
02:19:31,120 --> 02:19:45,680
something like this. And my line of best fit is somewhere here,
1110
02:19:47,200 --> 02:19:52,000
In order to look at these normality and homoscedasticity
1111
02:19:52,000 --> 02:20:03,440
the residual plot. Okay. And what that means is I'm going to keep
1112
02:20:03,440 --> 02:20:09,360
of plotting now where they are relative to this y, I'm going to
1113
02:20:09,360 --> 02:20:19,200
going to plot y minus y hat like this. Okay. And now you know,
1114
02:20:19,200 --> 02:20:24,720
so it might be here, this one down here is negative, it might be
1115
02:20:25,840 --> 02:20:30,079
it's literally just a plot of how you know, the values are
1116
02:20:30,079 --> 02:20:42,879
fit. So it looks like it might, you know, look something like
1117
02:20:42,879 --> 02:20:55,279
residual plot. And what normality means, so our assumptions are
1118
02:20:59,280 --> 02:21:05,120
I might have butchered that spelling, I don't really know. But
1119
02:21:05,120 --> 02:21:12,960
saying, okay, these residuals should be normally distributed.
1120
02:21:12,959 --> 02:21:21,599
it should follow a normal distribution. And now what
1121
02:21:21,600 --> 02:21:28,399
of these points should remain constant throughout. So this spread
1122
02:21:28,399 --> 02:21:35,199
same as this spread over here. Now, what's an example of where you
1123
02:21:35,200 --> 02:21:43,920
not held? Well, let's say that our original plot actually looks
1124
02:21:46,479 --> 02:21:51,600
Okay, so now if we looked at the residuals for that, it might look
1125
02:21:51,600 --> 02:22:03,600
like that. And now if we look at this spread of the points, it
1126
02:22:03,600 --> 02:22:12,559
is not constant, which means that homoscedasticity, this
1127
02:22:12,559 --> 02:22:18,559
might not be appropriate to use linear regression. So that's just
1128
02:22:18,559 --> 02:22:25,680
we have a bunch of data points, we want to predict some y value
1129
02:22:25,680 --> 02:22:32,639
up with this line of best fit that best describes, hey, given some
1130
02:22:32,639 --> 02:22:43,039
guess of what y is. So let's move on to how do we evaluate a
1131
02:22:43,040 --> 02:22:49,600
measure that I'm going to talk about is known as mean absolute
1132
02:22:52,079 --> 02:22:59,039
for short, okay. And mean absolute error is basically saying, all
1133
02:22:59,040 --> 02:23:06,080
all the errors. So all these residuals that we talked about, let's
1134
02:23:06,079 --> 02:23:11,440
for all of them, and then take the average. And then that can
1135
02:23:11,440 --> 02:23:18,319
we. So the mathematical formula for that would be, okay, let's
1136
02:23:21,680 --> 02:23:27,440
Alright, so this is the distance. Actually, let me redraw a plot
1137
02:23:27,440 --> 02:23:41,440
suppose I have a data set, look like this. And here are all my
1138
02:23:41,440 --> 02:23:52,319
say my line looks something like that. So my mean absolute error
1139
02:23:52,319 --> 02:24:01,600
values. This was a mistake. So summing up all of these, and then
1140
02:24:01,600 --> 02:24:07,760
I have. So what would be all the residuals, it would be y i,
1141
02:24:08,639 --> 02:24:16,159
minus y hat i, so the prediction for that on here. And then we're
1142
02:24:16,159 --> 02:24:24,319
all of the different i's in our data set. Right, so i, and then we
1143
02:24:24,319 --> 02:24:29,119
we have. So actually, I'm going to rewrite this to make it a
1144
02:24:29,120 --> 02:24:33,680
whatever the first data point is all the way through the nth data
1145
02:24:33,680 --> 02:24:42,399
it by n, which is how many points there are. Okay, so this is our
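That is, the mean absolute error:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \]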
1146
02:24:42,399 --> 02:24:50,479
telling us, okay, in on average, this is the distance between our
1147
02:24:50,479 --> 02:25:01,359
actual value in our training set. Okay. And mae is good because it
1148
02:25:01,360 --> 02:25:08,720
get this value here, we can literally directly compare it to
1149
02:25:08,719 --> 02:25:17,920
So let's say y is we're talking, you know, the prediction of the
1150
02:25:17,920 --> 02:25:24,719
dollars. Once we have once we calculate the mae, we can literally
1151
02:25:24,719 --> 02:25:34,319
price, the average, how much we're off by is literally this many
1152
02:25:34,319 --> 02:25:40,159
mean absolute error. An evaluation technique that's also closely
1153
02:25:40,159 --> 02:25:53,280
squared error. And this is MSE for short. Okay. Now, if I take
1154
02:25:53,280 --> 02:25:59,360
and move it down here, well, the gist of mean squared error is
1155
02:25:59,360 --> 02:26:06,159
of the absolute value, we're going to square. So now the MSE is
1156
02:26:06,159 --> 02:26:11,920
okay, let's sum up something, right, so we're going to sum up all
1157
02:26:13,280 --> 02:26:19,120
So now I'm going to do y i minus y hat i. But instead of absolute
1158
02:26:19,120 --> 02:26:25,360
I'm going to square them all. And then I'm going to divide by n at the end. So
1159
02:26:25,360 --> 02:26:33,200
basically, now I'm taking all of these different values, and I'm squaring them and adding
1160
02:26:33,200 --> 02:26:42,079
them to one another. And then I divide by n. And the reason why we square
1161
02:26:42,079 --> 02:26:49,680
is that it helps us punish large errors in the prediction. And the other reason is
1162
02:26:49,680 --> 02:26:55,760
because of differentiability, right? So a quadratic equation is
1163
02:26:55,760 --> 02:27:00,719
differentiable everywhere, if you're familiar with calculus, whereas the absolute
1164
02:27:00,719 --> 02:27:05,279
value function is not differentiable everywhere. But if that doesn't mean anything to you,
1165
02:27:05,280 --> 02:27:10,560
don't worry about it, you won't really need it right now. Now, a downside of mean squared
1166
02:27:10,559 --> 02:27:16,239
error is that once I calculate the mean squared error over here,
1167
02:27:16,239 --> 02:27:25,360
want to compare the values. Well, it gets a little bit trickier to
1168
02:27:25,360 --> 02:27:33,280
error is in terms of y squared, right? This is now squared. So I'm no longer saying,
1169
02:27:33,280 --> 02:27:40,079
how many dollars off am I? I'm talking how many dollars squared off am I, which,
1170
02:27:40,079 --> 02:27:45,440
you know, to humans, it doesn't really make that much sense. Which
1171
02:27:45,440 --> 02:27:53,600
something known as the root mean squared error. And I'm just going
1172
02:27:53,600 --> 02:28:02,559
because it's very, very similar to mean squared error. Except now
1173
02:28:03,280 --> 02:28:10,640
Okay, so this is our MSE, and we take the square root of that whole
1174
02:28:10,639 --> 02:28:17,760
term, in which case, you know, we're defining our error now in terms of the units of y.
1175
02:28:17,760 --> 02:28:23,280
So that's a pro of root mean squared error is that now we can say,
1176
02:28:23,280 --> 02:28:30,320
according to this metric, we're this many dollars off with our predictor.
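In the same notation as the MAE above:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$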
1177
02:28:30,319 --> 02:28:37,680
That's one of the pros of root mean squared error. And now let's talk about the coefficient
1178
02:28:37,680 --> 02:28:43,200
of determination, or r squared. And this is a formula for r
1179
02:28:43,200 --> 02:28:55,200
to one minus RSS over TSS. Okay, so what does that mean?
1180
02:28:56,639 --> 02:29:03,920
RSS is the sum of the squared residuals. So maybe it should be SSR instead, but
1181
02:29:03,920 --> 02:29:14,079
we call it RSS, the sum of the squared residuals. And this is equal to: for each point,
1182
02:29:14,799 --> 02:29:24,799
and I take y i minus y hat, i, and square that, that is my RSS,
1183
02:29:24,799 --> 02:29:30,639
the sum of the squared residuals. Now TSS, let me actually use a different color for
1184
02:29:30,639 --> 02:29:38,479
So TSS is the total sum of squares.
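In symbols, with $\bar{y}$ the mean of all the y values (that mean is exactly what's described next):

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad \mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$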
1185
02:29:41,040 --> 02:29:46,640
And what that means is that instead of being with respect to this line of best fit, we
1186
02:29:52,079 --> 02:29:59,440
take each y value and just subtract the mean of all the y values,
1187
02:30:16,000 --> 02:30:23,040
actually, let's use a different color. Let's use green. If this
1188
02:30:24,799 --> 02:30:33,039
so RSS is giving me this measure here, right? It's giving me some
1189
02:30:33,040 --> 02:30:41,840
our regressor that we predicted. Actually, I'm gonna take this
1190
02:30:41,840 --> 02:30:52,639
and actually, I'm going to use red for that. Well, TSS, on the
1191
02:30:52,639 --> 02:30:59,039
how far off are these values from the mean. So if we literally
1192
02:30:59,040 --> 02:31:04,800
line of best fit, if we just took all the y values and average all
1193
02:31:04,799 --> 02:31:10,159
this is the average value for every single x value, I'm just going
1194
02:31:10,159 --> 02:31:16,000
instead, then it's asking, okay, how far off are all these points
1195
02:31:19,120 --> 02:31:26,079
Okay, and remember that this square means that we're punishing
1196
02:31:26,079 --> 02:31:32,959
they look somewhat close in terms of distance, the further a few
1197
02:31:32,959 --> 02:31:39,439
the larger our total sum of squares is going to be. Sorry, that
1198
02:31:39,440 --> 02:31:44,960
squares is taking all of these values and saying, okay, what is
1199
02:31:44,959 --> 02:31:51,119
any regressor, and I literally just calculated the average of all
1200
02:31:51,120 --> 02:31:55,440
and for every single x value, I'm just going to predict that
1201
02:31:55,440 --> 02:32:00,720
like, that means that maybe y and x aren't associated with each
1202
02:32:00,719 --> 02:32:05,599
best thing that I can do for any new x value, just predict, hey,
1203
02:32:05,600 --> 02:32:11,200
And this total sum of squares is saying, okay, well, with respect
1204
02:32:12,239 --> 02:32:19,920
what is our error? Right? So up here, the sum of the squared
1205
02:32:19,920 --> 02:32:26,799
our what what is our error with respect to this line of best fit?
1206
02:32:26,799 --> 02:32:34,559
saying what is the error with respect to, you know, just the
1207
02:32:34,559 --> 02:32:44,639
of best fit is a better fit, then this total sum of squares, that
1208
02:32:46,079 --> 02:32:51,520
that means that this numerator is going to be smaller than this
1209
02:32:52,319 --> 02:32:59,600
And if our errors in our line of best fit are much smaller, then
1210
02:32:59,600 --> 02:33:06,960
of the RSS over TSS is going to be very small, which means that R
1211
02:33:06,959 --> 02:33:14,319
one. And now when R squared is towards one, that means that that's
1212
02:33:14,319 --> 02:33:24,719
good predictor. It's one of the signs, not the only one. So over
1213
02:33:24,719 --> 02:33:29,840
that there's this adjusted R squared. And what that does, it just
1214
02:33:29,840 --> 02:33:36,000
So x1, x2, x3, etc. It adjusts for how many extra terms we add,
1215
02:33:37,280 --> 02:33:42,480
you know, add an extra term, the R squared value will increase
1216
02:33:42,479 --> 02:33:48,879
y some more. But the value for the adjusted R squared will only increase if the new term
1217
02:33:48,879 --> 02:33:54,000
improves this model fit more than expected, you know, by chance.
1218
02:33:54,000 --> 02:33:58,159
R squared is. I'm not, you know, it's out of the scope of this one
1219
02:33:58,159 --> 02:34:04,559
And now that's linear regression. Basically, I've covered the
1220
02:34:05,280 --> 02:34:11,040
And, you know, how do we use that in order to find the line of
1221
02:34:11,040 --> 02:34:15,200
our computer can do all the calculations for us, which is nice.
1222
02:34:15,200 --> 02:34:20,400
it's trying to minimize that error, right? And then we've gone
1223
02:34:20,399 --> 02:34:25,440
ways of actually evaluating a linear regression model, and the pros and cons of each.
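As a quick sketch, all four of those metrics are one-liners with scikit-learn; the arrays here are made up purely to show the calls:

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_true = np.array([3.0, 5.0, 7.5])   # hypothetical actual values
    y_pred = np.array([2.5, 5.5, 7.0])   # hypothetical predictions

    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)                  # RMSE is just the square root of MSE
    r2 = r2_score(y_true, y_pred)        # 1 - RSS / TSS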
1224
02:34:26,559 --> 02:34:31,760
So now let's look at an example. So we're still on supervised
1225
02:34:31,760 --> 02:34:37,120
talk about regression. So what happens when you don't just want to
1226
02:34:37,120 --> 02:34:43,840
What happens if you actually want to predict a certain value? So
1227
02:34:43,840 --> 02:34:54,399
learning repository. And here I found this data set about bike
1228
02:34:55,040 --> 02:35:01,520
So this data set is predicting rental bike count. And here it's
1229
02:35:01,520 --> 02:35:08,159
hour. So what we're going to do, again, you're going to go into
1230
02:35:08,159 --> 02:35:19,520
to download this CSV file. And we're going to move over to collab
1231
02:35:19,520 --> 02:35:29,680
this FCC bikes and regression. I don't remember what I called the
1232
02:35:29,680 --> 02:35:39,600
regression. Now I'm going to import a bunch of the same things
1233
02:35:39,600 --> 02:35:46,559
I'm going to also continue to import the oversampler and the
1234
02:35:46,559 --> 02:35:52,799
also just going to let you guys know that I have a few more things
1235
02:35:52,799 --> 02:35:59,199
library that lets us copy things. Seaborn is a wrapper over a
1236
02:35:59,200 --> 02:36:03,280
to plot certain things. And then just letting you know that we're
1237
02:36:03,280 --> 02:36:07,920
TensorFlow. Okay, so one more thing that we're also going to be
1238
02:36:07,920 --> 02:36:13,760
sklearn linear model library. Actually, let me make my screen a
1239
02:36:15,600 --> 02:36:25,120
awesome. Run this and that'll import all the things that we need.
1240
02:36:25,120 --> 02:36:34,960
you know, give some credit to where we got this data set. So let
1241
02:36:38,000 --> 02:36:42,159
And I will also give credit to this here.
1242
02:36:46,559 --> 02:36:54,319
Okay, cool. All right, cool. So this is our data set. And again,
1243
02:36:54,319 --> 02:37:01,520
attributes that we have right here. So I'm actually going to go
1244
02:37:05,280 --> 02:37:09,280
Feel free to copy and paste this if you want me to read it out
1245
02:37:09,280 --> 02:37:18,960
It's bike count, hour, temp, humidity, wind, visibility, dew
1246
02:37:18,959 --> 02:37:27,279
point, radiation, rain, snow, and functional, whatever that means. Okay, so I'm going to
1247
02:37:27,280 --> 02:37:34,800
by dragging and dropping. All right. Now, one thing that you guys
1248
02:37:34,799 --> 02:37:41,359
you might actually have to open up the CSV because there were, at
1249
02:37:41,360 --> 02:37:46,319
characters in mine, at least. So you might have to get rid of
1250
02:37:46,319 --> 02:37:50,639
but my computer wasn't recognizing it. So I got rid of that. So
1251
02:37:50,639 --> 02:37:58,639
and get rid of some of those labels that are incorrect. I'm going
1252
02:37:59,600 --> 02:38:07,040
after we've done that, we've imported in here, I'm going to create
1253
02:38:07,040 --> 02:38:12,560
all right, so now what I can do is I can read that CSV file and I
1254
02:38:12,559 --> 02:38:21,359
So, like, data dot CSV. Okay, so now if I call data dot head,
1255
02:38:21,360 --> 02:38:32,079
various labels, right? And then I have the data in there. So I'm
1256
02:38:32,079 --> 02:38:37,600
going to get rid of some of these columns that, you know, I don't
1257
02:38:37,600 --> 02:38:44,159
I'm going to, when I type this in, drop maybe the date, the
1258
02:38:44,159 --> 02:38:53,039
holiday, and the various seasons. So I'm just not going to care
1259
02:38:53,040 --> 02:38:59,120
one means drop it from the columns. So now you'll see that okay,
1260
02:38:59,120 --> 02:39:05,280
I guess you don't really notice it. But if I set the data frame's columns to our new names,
1261
02:39:05,280 --> 02:39:11,280
and I look at, you know, the first five things, then you'll see
1262
02:39:11,280 --> 02:39:17,520
It's a lot easier to read. So another thing is, I'm actually going
1263
02:39:18,319 --> 02:39:24,239
df functional. And we're going to create this. So remember that when the computer looks
1264
02:39:24,239 --> 02:39:30,000
at language, we want it to be in zeros and ones. So here, I will
1265
02:39:30,000 --> 02:39:39,920
Well, if this is equal to yes, then that gets mapped as one; otherwise, zero.
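Sketching those cells out (the file name and the short column names are my stand-ins; adjust them to whatever you downloaded):

    import pandas as pd

    dataset_cols = ["bike_count", "hour", "temp", "humidity", "wind", "visibility",
                    "dew_pt_temp", "radiation", "rain", "snow", "functional"]
    df = pd.read_csv("SeoulBikeData.csv").drop(["Date", "Holiday", "Seasons"], axis=1)
    df.columns = dataset_cols
    df["functional"] = (df["functional"] == "Yes").astype(int)  # yes -> 1, no -> 0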
1266
02:39:41,040 --> 02:39:48,560
Great. Cool. So the thing is, right now, these bike counts are for every single hour. In order
1267
02:39:48,559 --> 02:39:52,559
to make this example simpler, I'm just going to index on an hour,
1268
02:39:52,559 --> 02:39:59,359
we're only going to use that specific hour. So I'm just going to
1269
02:39:59,360 --> 02:40:07,680
going to use an hour. So here, let's say. So this data frame is
1270
02:40:07,680 --> 02:40:17,600
the hour, let's say it equals 12. Okay, so it's noon. All right.
1271
02:40:17,600 --> 02:40:31,120
equal to 12. And I'm actually going to now drop that column. Our
1272
02:40:31,120 --> 02:40:38,480
so we run this cell. Okay, so now we got rid of the hour in here.
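That cell, continuing the same sketch:

    df = df[df["hour"] == 12]        # keep only the noon rows
    df = df.drop(["hour"], axis=1)   # hour is now constant, so drop the column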
1273
02:40:38,479 --> 02:40:45,760
the temperature, humidity, wind, visibility, and yada, yada, yada.
1274
02:40:45,760 --> 02:40:54,639
is I'm going to actually plot all of these. So for i in all the
1275
02:40:55,440 --> 02:40:59,280
whatever its data frame is, and all the columns, because I don't
1276
02:41:00,159 --> 02:41:05,760
actually, it's my first thing. So what I'm going to do is say for
1277
02:41:06,559 --> 02:41:10,159
columns, everything after the first thing, so that would give me
1278
02:41:10,159 --> 02:41:19,440
onwards. So these are all my features, right? I'm going to just
1279
02:41:19,440 --> 02:41:29,680
label how that specific data, how that affects the bike count. So
1280
02:41:29,680 --> 02:41:35,760
the y axis. And I'm going to plot, you know, whatever the specific
1281
02:41:35,760 --> 02:41:46,000
And I'm going to title this, whatever the label is. And, you know,
1282
02:41:46,639 --> 02:41:58,079
at noon. And the x label as just the label. Okay, now, I guess we
1283
02:41:58,079 --> 02:42:10,000
We don't even need the legend. So just show that plot. All right.
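Roughly, the loop behind those plots (matplotlib assumed imported):

    import matplotlib.pyplot as plt

    for label in df.columns[1:]:     # every feature after bike_count
        plt.scatter(df[label], df["bike_count"])
        plt.title(label)
        plt.ylabel("Bike Count at Noon")
        plt.xlabel(label)
        plt.show()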
1284
02:42:10,000 --> 02:42:21,920
Functional doesn't really give us any utility. Then snow and rain
1285
02:42:21,920 --> 02:42:31,040
don't show much. Temperature, you know, is fairly linear; dew point temperature, visibility, wind,
1286
02:42:31,040 --> 02:42:37,200
not so much; humidity, kind of maybe like an inverse relationship. But the temperature
1287
02:42:37,200 --> 02:42:41,680
looks like there's a relationship between that and the number of
1288
02:42:41,680 --> 02:42:46,000
going to do is I'm going to drop some of the ones that don't seem to matter:
1289
02:42:46,000 --> 02:42:56,959
maybe wind, you know, visibility. Yeah, so I'm going to get rid of
1290
02:42:59,280 --> 02:43:13,760
So now data frame, and I'm going to drop wind, visibility, and functional. And the
1291
02:43:13,760 --> 02:43:21,200
axis again is the column. So that's one. So if I look at my data
1292
02:43:21,200 --> 02:43:27,200
temperature, the humidity, the dew point temperature, radiation, rain, and snow.
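That drop, in the sketch:

    df = df.drop(["wind", "visibility", "functional"], axis=1)
    # remaining: bike_count, temp, humidity, dew_pt_temp, radiation, rain, snow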
1293
02:43:27,200 --> 02:43:33,760
Now, what I want to do is I want to split this into my training, my
1294
02:43:34,319 --> 02:43:42,719
just as we talked before. Here, we can use the exact same thing
1295
02:43:42,719 --> 02:43:51,359
numpy dot split, and sample, you know that the whole sample, and
1296
02:43:54,000 --> 02:44:02,559
of the data frame. And we're going to do that. But now set this to
1297
02:44:04,639 --> 02:44:10,159
So I don't really care about, you know, the full grid, the
1298
02:44:10,159 --> 02:44:19,680
use an underscore for that variable. But I will get my training, validation, and test data frames.
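The same 60/20/20 split pattern from earlier in the video:

    import numpy as np

    # shuffle the whole frame, then cut it at 60% and 80% of its length
    train, val, test = np.split(df.sample(frac=1),
                                [int(0.6 * len(df)), int(0.8 * len(df))])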
1299
02:44:19,680 --> 02:44:29,600
Next, I want to have a function for getting the x and y's. So here, I'm going to
1300
02:44:30,159 --> 02:44:36,879
get x y. And I'm going to pass in the data frame. And I'm actually
1301
02:44:36,879 --> 02:44:47,039
going to pass in what the y label is, and what specific x labels I want to use. So
1302
02:44:47,040 --> 02:44:51,520
then I'm only going to get those specific columns,
1303
02:44:51,520 --> 02:45:00,560
not the whole thing. So here, I'm actually going to make first a deep
1304
02:45:00,559 --> 02:45:08,879
copy, and that basically means I'm just copying everything over.
1305
02:45:08,879 --> 02:45:14,559
Now, if not x labels, then all I'm going to do is say, all right, x
1306
02:45:14,559 --> 02:45:22,959
is everything in the data frame; I'm just going to take all the columns c
1307
02:45:22,959 --> 02:45:32,239
if C does not equal the y label, right, and I'm going to get the
1308
02:45:32,239 --> 02:45:40,159
values of those. Otherwise, if there is an x labels argument, okay, so in order to index only one thing,
1309
02:45:40,159 --> 02:45:50,000
if there's only one thing in here, then, so let me make a case: if the length of the x
1310
02:45:50,000 --> 02:46:00,319
labels is equal to one, then what I'm going to do is just say that
1311
02:46:00,319 --> 02:46:07,600
x is just that one label's values, and I actually need to reshape it to 2d.
1312
02:46:08,159 --> 02:46:15,039
So I'm going to pass in negative one comma one there. Now,
1313
02:46:15,040 --> 02:46:20,000
if there are specific x labels that I want to use, then I'm actually just going
1314
02:46:20,000 --> 02:46:28,719
to take the data frame of those x labels, dot values. And that should suffice for
1315
02:46:28,719 --> 02:46:36,159
extracting x. And in order to get my y, I'm going to do y equals
1316
02:46:36,159 --> 02:46:45,440
the y label's values. And at the very end, I'm going to say data equals np dot h
1317
02:46:45,440 --> 02:46:54,960
stack, which just puts x and y one next to each other. And I'll take x and y, and return that.
1318
02:46:54,959 --> 02:46:59,119
And I'm actually going to reshape the y to make it 2d as well, so everything matches.
1319
02:46:59,120 --> 02:47:10,160
And I will return data, x, y.
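Reconstructed from that description, the helper looks roughly like this:

    import copy

    def get_xy(dataframe, y_label, x_labels=None):
        dataframe = copy.deepcopy(dataframe)            # don't mutate the original
        if x_labels is None:                            # no x labels: use everything but y
            X = dataframe[[c for c in dataframe.columns if c != y_label]].values
        elif len(x_labels) == 1:                        # one x label: reshape to a 2d column
            X = dataframe[x_labels[0]].values.reshape(-1, 1)
        else:                                           # a specific list of x labels
            X = dataframe[x_labels].values
        y = dataframe[y_label].values.reshape(-1, 1)    # y reshaped to 2d as well
        data = np.hstack((X, y))                        # x's and y stacked side by side
        return data, X, y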
1320
02:47:10,159 --> 02:47:18,639
So now I should be able to call this on the train data frame, with the y label bike count. And actually,
1321
02:47:18,639 --> 02:47:24,399
going to let's just do like one dimension right now. And earlier,
1322
02:47:24,399 --> 02:47:30,719
had seen that maybe, you know, the temperature dimension does
1323
02:47:30,719 --> 02:47:38,639
to use that to predict y. So I'm going to label this also that,
1324
02:47:38,639 --> 02:47:48,559
temperature. And I am also going to do this again for, oh, this
1325
02:47:48,559 --> 02:48:00,239
validation. And this should be a test. Because oh, that's Val.
1326
02:48:01,920 --> 02:48:08,079
And this should be test. Alright, so we run this and now we have
1327
02:48:08,639 --> 02:48:16,239
data sets for just the temperature. So if I look at x train temp, it's just the one column.
1328
02:48:16,239 --> 02:48:23,039
Okay, and I'm doing this first to show you simple linear
1329
02:48:23,040 --> 02:48:30,800
create a regressor. So I can say the temp regressor here. And then
1330
02:48:30,799 --> 02:48:40,000
linear regression model. And just like before, I can simply call fit
1331
02:48:40,000 --> 02:48:48,239
in order to train this linear regression model. Alright, and
1332
02:48:49,040 --> 02:49:02,160
now we can print this regressor's coefficients and the intercept. So if I do that, we get the coefficient
1333
02:49:02,159 --> 02:49:11,039
for whatever the temperature is, and then the intercept,
1334
02:49:11,040 --> 02:49:25,920
right. And I can, you know, score this, so I can get the r squared
1335
02:49:25,920 --> 02:49:35,520
on x test and y test. All right, so it's an r squared of around point three. Now, an r squared could be
1336
02:49:35,520 --> 02:49:40,880
zero, which would mean, hey, there's absolutely no association.
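Under those names, the temperature-only model is a few lines (variable names are mine):

    from sklearn.linear_model import LinearRegression

    _, X_train_temp, y_train_temp = get_xy(train, "bike_count", x_labels=["temp"])
    _, X_val_temp,   y_val_temp   = get_xy(val,   "bike_count", x_labels=["temp"])
    _, X_test_temp,  y_test_temp  = get_xy(test,  "bike_count", x_labels=["temp"])

    temp_reg = LinearRegression()
    temp_reg.fit(X_train_temp, y_train_temp)
    print(temp_reg.coef_, temp_reg.intercept_)       # slope and intercept of the line
    print(temp_reg.score(X_test_temp, y_test_temp))  # R^2 on the test set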
1337
02:49:42,319 --> 02:49:47,520
Whether that's good depends on the context. But, you know, the higher that
1338
02:49:47,520 --> 02:49:53,680
the two variables would be correlated, right? Which here, it's all
1339
02:49:53,680 --> 02:50:00,319
maybe some association between the two. But the reason why I want
1340
02:50:00,319 --> 02:50:06,799
you, you know, if we plotted this, this is what it would look
1341
02:50:07,440 --> 02:50:22,480
and let's take the training. So this is our data. And then let's
1342
02:50:22,479 --> 02:50:29,279
also plotted, so something that I can do is say, you know, the x
1343
02:50:29,840 --> 02:50:36,399
is a linspace, and this goes from negative 20 to 40, this piece of
1344
02:50:36,399 --> 02:50:47,199
let's take 100 things from there. So I'm going to plot x, and I'm
1345
02:50:47,200 --> 02:50:55,840
this, like, regressor, and predict x with that. Okay, and this
1346
02:50:57,200 --> 02:51:08,800
the fit. And this color, let's make this red. And let's actually
1347
02:51:08,799 --> 02:51:20,719
I can change how thick that line is. Okay. Now at the very end,
1348
02:51:21,920 --> 02:51:30,239
all right, let's also create, you know, title, all these things
1349
02:51:30,239 --> 02:51:39,360
here, let's just say, this would be the bikes, versus the
1350
02:51:39,360 --> 02:51:48,400
would be number of bikes. And the x label would be the
1351
02:51:48,399 --> 02:51:57,920
might cause an error. Yeah. So it's expecting a 2d array. So we
1352
02:51:57,920 --> 02:52:15,120
Okay, there we go. So I just had to make this an array and then reshape it.
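The plot cell, roughly as narrated (continuing the sketch above):

    plt.scatter(X_train_temp, y_train_temp, label="Data", color="blue")
    x = np.linspace(-20, 40, 100)                    # 100 points across the temp range
    plt.plot(x, temp_reg.predict(x.reshape(-1, 1)),  # predict expects a 2d array
             label="Fit", color="red", linewidth=3)
    plt.legend()
    plt.title("Bikes vs Temp")
    plt.ylabel("Number of bikes")
    plt.xlabel("Temp")
    plt.show()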
1353
02:52:15,120 --> 02:52:20,960
we see that, all right, this increases. But again, remember those
1354
02:52:20,959 --> 02:52:26,799
linear regression, like this, I don't really know if this fits
1355
02:52:26,799 --> 02:52:32,159
wanted to show you guys though, that like, all right, this is what
1356
02:52:32,159 --> 02:52:46,399
data would look like. Okay. Now, we can do multiple linear
1357
02:52:46,399 --> 02:52:58,079
and do that as well. Now, if I take my data set, and instead of
1358
02:52:58,079 --> 02:53:09,600
my current data set right now. Alright, so let's just use all of
1359
02:53:09,600 --> 02:53:18,399
right. So I'm going to just say for the x labels, let's just take
1360
02:53:18,399 --> 02:53:30,559
remove the bike count. So does that work? So if this part should
1361
02:53:30,559 --> 02:53:39,039
this should work now. Oops, sorry. Okay, so I have Oh, but this
1362
02:53:39,040 --> 02:53:48,160
temperature anymore, we should actually do this, let's say all,
1363
02:53:48,159 --> 02:53:53,920
rerun this piece here so that we have our temperature only data
1364
02:53:53,920 --> 02:54:02,000
all data set. Okay. And this regressor, I can do the same thing.
1365
02:54:02,000 --> 02:54:12,879
And I'm going to make this the linear regression. And I'm going to
1366
02:54:12,879 --> 02:54:20,959
train all. Okay. Alright, so let's go ahead and also score this
1367
02:54:20,959 --> 02:54:30,159
R squared performs now. So if I test this on the test data set,
1368
02:54:30,159 --> 02:54:37,200
the r squared seems to improve: it went from point four to point five.
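The multiple-regression version, same pattern with all the remaining features:

    x_all_labels = [c for c in df.columns if c != "bike_count"]
    _, X_train_all, y_train_all = get_xy(train, "bike_count", x_labels=x_all_labels)
    _, X_val_all,   y_val_all   = get_xy(val,   "bike_count", x_labels=x_all_labels)
    _, X_test_all,  y_test_all  = get_xy(test,  "bike_count", x_labels=x_all_labels)

    all_reg = LinearRegression()
    all_reg.fit(X_train_all, y_train_all)
    print(all_reg.score(X_test_all, y_test_all))     # R^2 improved here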
1369
02:54:38,319 --> 02:54:44,559
And I can't necessarily plot, you know, every single dimension.
1370
02:54:44,559 --> 02:54:49,680
to say, okay, this is this is improved, right? Alright, so one
1371
02:54:49,680 --> 02:55:00,079
tensorflow is you can actually do regression, but with the neural
1372
02:55:00,079 --> 02:55:08,879
net. So let's get to it. We already have our training data for just the temperature
1373
02:55:08,879 --> 02:55:13,839
and for all the different columns. So I'm not going to bother with splitting up
1374
02:55:13,840 --> 02:55:20,639
ahead and start building the model. So in this linear regression
1375
02:55:20,639 --> 02:55:28,079
it does help if we normalize it. So that's very easy to do with
1376
02:55:28,079 --> 02:55:36,719
normalizer layer. So I'm going to do tensorflow Keras layers, and
1377
02:55:37,440 --> 02:55:43,920
And the input shape for that will just be one because let's just
1378
02:55:43,920 --> 02:55:53,520
temperature, and the axis I will make None. Now for this temp
1379
02:55:53,520 --> 02:56:04,960
an equal sign there. I'm going to adapt this to X train temp, and
1380
02:56:06,479 --> 02:56:14,799
So that should work great. Now with this model, so temp neural net
1381
02:56:14,799 --> 02:56:23,759
you know, dot keras, sequential. And I'm going to pass in this
1382
02:56:23,760 --> 02:56:29,920
going to say, hey, just give me one single dense layer with one
1383
02:56:29,920 --> 02:56:37,120
is saying, all right, well, one single node just means that it's
1384
02:56:37,120 --> 02:56:43,360
sort of activation function to it, the output is also linear. So
1385
02:56:43,360 --> 02:56:52,960
Keras layers dot dense. And I'm just going to have one unit. And
1386
02:56:54,479 --> 02:57:06,799
So with this model, let's compile. And for our optimizer, let's
1387
02:57:06,799 --> 02:57:16,399
let's use Adam again, dot Adam. And we have to pass in the parameters,
1388
02:57:16,399 --> 02:57:26,879
and our learning rate, let's do 0.01. And now, the loss, we
1389
02:57:26,879 --> 02:57:34,079
loss, I'm going to do mean squared error. Okay, so we run that.
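Sketched in TensorFlow/Keras (variable names are mine; the layer and optimizer are the ones named above):

    import tensorflow as tf

    temp_normalizer = tf.keras.layers.Normalization(input_shape=(1,), axis=None)
    temp_normalizer.adapt(X_train_temp.reshape(-1))  # learn mean/variance of the temps

    temp_nn_model = tf.keras.Sequential([
        temp_normalizer,
        tf.keras.layers.Dense(1)                     # one unit, linear activation
    ])
    temp_nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                          loss="mean_squared_error")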
1390
02:57:34,079 --> 02:57:41,440
And just like before, we can call history. And I'm going to fit
1391
02:57:41,440 --> 02:57:48,640
if I call fit, I can just fit it, and I'm going to take the x
1392
02:57:49,280 --> 02:57:57,840
but reshape it. Y train for the temperature. And I'm going to set
1393
02:57:57,840 --> 02:58:04,479
that it doesn't, you know, display stuff. I'm actually going to
1394
02:58:04,479 --> 02:58:13,760
1000. And the validation data should be let's pass in the
1395
02:58:16,319 --> 02:58:22,799
as a tuple. And I know I spelled that wrong. So let's just run that.
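And the training call, roughly:

    history = temp_nn_model.fit(X_train_temp.reshape(-1), y_train_temp,
                                verbose=0, epochs=1000,
                                validation_data=(X_val_temp, y_val_temp))
    plot_loss(history)  # the helper copied from earlier, with its label changed to MSE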
1396
02:58:22,799 --> 02:58:27,759
And up here, I've copied and pasted the plot loss from our
1397
02:58:27,760 --> 02:58:34,159
to MSE, because now we're dealing with mean squared
1398
02:58:34,159 --> 02:58:39,119
error. And we can plot the loss of this history after it's done. So let's just wait for
1399
02:58:39,120 --> 02:58:50,320
the plot. Okay, so
1400
02:58:50,319 --> 02:58:56,479
this actually looks pretty good. We see that the values are
1401
02:58:56,479 --> 02:59:05,520
I'm going to go back up and take this plot. And we are going to
1402
02:59:07,200 --> 02:59:14,400
here, instead of this temperature regressor, I'm going to use the
1403
02:59:17,360 --> 02:59:25,200
And if I run that, I can see that, you know, this also gives me a
1404
02:59:26,399 --> 02:59:30,079
you'll notice that this this fit is not entirely the same as the
1405
02:59:31,120 --> 02:59:38,800
up here. And that's due to the training process of, you know, of
1406
02:59:38,799 --> 02:59:45,279
different ways to try and try to find the best linear regressor.
1407
02:59:45,280 --> 02:59:50,960
propagation to train a neural net node, whereas in the other one,
1408
02:59:50,959 --> 02:59:58,719
Okay, they're probably just trying to actually compute the line of
1409
02:59:59,600 --> 03:00:08,479
well, we can repeat the exact same exercise with our with our
1410
03:00:09,360 --> 03:00:14,560
but I'm actually going to skip that part. I will leave that as an
1411
03:00:14,559 --> 03:00:19,039
so now what would happen if we use a neural net, a real neural net
1412
03:00:19,040 --> 03:00:24,960
one single node in order to predict this. So let's start on that
1413
03:00:24,959 --> 03:00:31,439
normalizer. So I'm actually going to take the same setup here. But
1414
03:00:31,440 --> 03:00:37,520
one dense layer, I'm going to set this equal to 32 units. And for
1415
03:00:37,520 --> 03:00:46,159
Relu. And now let's duplicate that. And for the final output, I
1416
03:00:46,159 --> 03:00:52,079
want one unit. And this activation is also going to be ReLU, because you can't have below
1417
03:00:52,079 --> 03:00:57,039
zero bikes. So I'm just going to set that as ReLU. I'm just going
1418
03:00:57,040 --> 03:01:04,640
Okay. And at the bottom, I'm going to have this neural net model.
1419
03:01:04,639 --> 03:01:16,319
net model, I'm going to compile. And I will actually use the same
1420
03:01:18,639 --> 03:01:27,279
instead of a learning rate of 0.01, I'll use 0.001. Okay. And I'm
1421
03:01:27,280 --> 03:01:39,920
So the history is this neural net model. And I'm going to fit that
1422
03:01:39,920 --> 03:01:54,479
temp. And the validation data, I'm going to set this again equal
1423
03:01:54,479 --> 03:02:03,600
Now, for the verbose, I'm going to say equal to zero epochs, let's
1424
03:02:03,600 --> 03:02:08,559
size, actually, let's just not do a batch size right now. Let's
1425
03:02:08,559 --> 03:02:18,319
here. And again, we can plot the loss of this history after it's done.
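Reconstructed, the model this section builds (the 32-unit layers are as stated; the epoch count is my guess):

    nn_model = tf.keras.Sequential([
        temp_normalizer,
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="relu")  # ReLU output: no negative bike counts
    ])
    nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                     loss="mean_squared_error")
    history = nn_model.fit(X_train_temp.reshape(-1), y_train_temp,
                           validation_data=(X_val_temp, y_val_temp),
                           verbose=0, epochs=100)
    plot_loss(history)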
1426
03:02:18,319 --> 03:02:26,879
So we run this. And that's not what we're supposed to get. So what is
1427
03:02:26,879 --> 03:02:39,679
we have our temperature normalizer, which I'm wondering now if we
1428
03:02:39,680 --> 03:02:51,040
Do that. Okay, so we do see this decline, it's an interesting
1429
03:02:53,280 --> 03:02:57,280
So this is our loss, which all right, if decreasing, that's a good
1430
03:02:57,920 --> 03:03:04,079
And actually, what's interesting is let's just let's plot this
1431
03:03:04,079 --> 03:03:09,840
And you'll see that we actually have this like, curve that looks
1432
03:03:09,840 --> 03:03:19,600
what if I got rid of this activation? Let's train this again. And
1433
03:03:21,120 --> 03:03:27,600
Alright, so even even when I got rid of that really at the end, it
1434
03:03:27,600 --> 03:03:36,559
it's not the best model, if we had maybe one more layer in here,
1435
03:03:36,559 --> 03:03:41,680
to play around with. When you're, you know, working with machine
1436
03:03:41,680 --> 03:03:53,440
know what the best model is going to be. For example, this also is
1437
03:03:53,440 --> 03:04:00,399
it's okay. So my point is, though, that with a neural net, I mean,
1438
03:04:00,399 --> 03:04:04,959
there's like no data down here, right? So it's kind of hard for
1439
03:04:04,959 --> 03:04:09,439
we probably should have started the prediction somewhere around
1440
03:04:09,440 --> 03:04:14,560
with this neural net model, you can see that this is no longer a
1441
03:04:14,559 --> 03:04:21,600
get an estimate of the value, right? And we can repeat this exact
1442
03:04:21,600 --> 03:04:30,640
do that. Right. And we can repeat this exact same exercise with
1443
03:04:33,520 --> 03:04:40,720
if I now pass in all of the data, so this is my all normalizer
1444
03:04:40,719 --> 03:04:54,479
and I should just be able to pass in that. So let's move this to
1445
03:04:54,479 --> 03:05:00,959
I'm going to pass in my all normalizer. And let's compile it.
1446
03:05:02,959 --> 03:05:10,479
Great. So here with the history, when we're trying to fit this
1447
03:05:10,479 --> 03:05:17,680
we're going to use our larger data set with all the features. And
1448
03:05:22,000 --> 03:05:23,680
And of course, we want to plot the loss.
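Same pattern with all the features; the one real change is the normalizer's input shape, which has to be a tuple containing a six (exactly the fix debugged a bit further down):

    all_normalizer = tf.keras.layers.Normalization(input_shape=(6,), axis=-1)
    all_normalizer.adapt(X_train_all)

    nn_model = tf.keras.Sequential([
        all_normalizer,
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="relu")
    ])
    nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                     loss="mean_squared_error")
    history = nn_model.fit(X_train_all, y_train_all,
                           validation_data=(X_val_all, y_val_all),
                           verbose=0, epochs=100)
    plot_loss(history)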
1449
03:05:31,520 --> 03:05:37,760
Okay, so that's what our loss looks like. So an interesting curve,
1450
03:05:37,760 --> 03:05:44,479
So before we saw that our R squared score was around point five,
1451
03:05:44,479 --> 03:05:49,680
that with a neural net anymore. But one thing that we can measure
1452
03:05:49,680 --> 03:05:59,600
error, right? So if I come down here, and I compare the two mean
1453
03:05:59,600 --> 03:06:13,360
so I can predict x test all right. So these are my predictions
1454
03:06:14,079 --> 03:06:20,159
with the multiple linear regressor. So these are my
1455
03:06:20,159 --> 03:06:32,079
Okay. I'm actually going to do that at the bottom. So let me just
1456
03:06:32,079 --> 03:06:41,760
it down here. So now I'm going to calculate the mean squared error
1457
03:06:41,760 --> 03:06:51,360
and the neural net. Okay, so this is my linear and this is my
1458
03:06:51,360 --> 03:07:03,760
model, and I predict x test all, I get my two, you know, different
1459
03:07:03,760 --> 03:07:11,280
the mean squared error, right? So if I want to get the mean
1460
03:07:11,280 --> 03:07:19,200
and y real, I can do numpy dot square, and then I would need the y
1461
03:07:19,200 --> 03:07:31,840
pred minus the y real. So this is basically squaring everything. And
1462
03:07:31,840 --> 03:07:42,000
then I can take this entire thing and take the mean of that, and that should give me
1463
03:07:44,959 --> 03:07:52,639
the MSE. And the y real is y test all, right? So that's my mean squared error for the linear regressor.
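In code, the comparison being set up (a small hypothetical helper matching the numpy description, reusing the models from the sketches above):

    def MSE(y_pred, y_real):
        return np.square(y_pred - y_real).mean()  # square everything, then average

    y_pred_lr = all_reg.predict(X_test_all)       # multiple linear regressor predictions
    y_pred_nn = nn_model.predict(X_test_all)      # neural net predictions
    print(MSE(y_pred_lr, y_test_all), MSE(y_pred_nn, y_test_all))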
1464
03:07:52,639 --> 03:08:04,559
And this is my mean squared error for the neural net. So that's
1465
03:08:04,559 --> 03:08:14,399
I guess. So my guess is that it's probably coming from this
1466
03:08:14,399 --> 03:08:33,279
shape is probably just six. And okay, so that works now. And the
1467
03:08:33,280 --> 03:08:39,040
my inputs are only for every vector, it's only a one dimensional
1468
03:08:39,040 --> 03:08:46,000
have I should have just had six, comma, which is a tuple of size
1469
03:08:46,000 --> 03:08:54,079
a tuple containing one element, which is a six. Okay, so it's
1470
03:08:54,079 --> 03:09:00,479
net results seem like they they have a larger mean squared error
1471
03:09:00,479 --> 03:09:09,840
One thing that we can look at is, we can actually plot the real
1472
03:09:09,840 --> 03:09:21,200
results versus what the predictions are. So if I say, some access,
1473
03:09:21,200 --> 03:09:31,280
axes and make these equal, then I can scatter the y, you know,
1474
03:09:31,280 --> 03:09:40,000
values are on the x axis, and then what the prediction are on the
1475
03:09:40,000 --> 03:09:50,159
label this as the linear regression predictions. Okay, so then let
1476
03:09:50,159 --> 03:09:59,360
x axis, I'm going to say is the true values. The y axis is going
1477
03:10:04,319 --> 03:10:09,279
Or actually, let's plot. Let's just make this predictions.
1478
03:10:09,280 --> 03:10:19,200
And then at the end, I'm going to plot. Oh, let's set some
1479
03:10:22,879 --> 03:10:26,159
Because I think that's like approximately the max number of
1480
03:10:28,639 --> 03:10:35,199
So I'm going to set my x limit to this and my y limit to this.
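A sketch of that true-versus-predicted plot (the limit of 1800 is the eyeballed value from the narration):

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_test_all, y_pred_lr, label="Lin Reg Preds")
    ax.scatter(y_test_all, y_pred_nn, label="NN Preds")
    ax.set_xlabel("True Values")
    ax.set_ylabel("Predictions")
    lims = [0, 1800]                 # roughly the max bike count observed
    ax.set_xlim(lims)
    ax.set_ylim(lims)
    ax.plot(lims, lims, c="red")     # y = x reference line
    ax.legend()
    plt.show()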
1481
03:10:35,200 --> 03:10:45,920
So here, I'm going to pass that in here too. And all right, this
1482
03:10:46,479 --> 03:10:54,719
linear regressor. You see that actually, they align quite well, I
1483
03:10:54,719 --> 03:11:03,359
probably too much 2500. I mean, looks like maybe like 1800 would
1484
03:11:03,360 --> 03:11:09,360
And I'm actually going to label something else, the neural net
1485
03:11:12,719 --> 03:11:22,000
Let's add a legend. So you can see that our neural net for the
1486
03:11:22,000 --> 03:11:28,479
it's a little bit more spread out. And it seems like we tend to
1487
03:11:28,479 --> 03:11:36,479
here in this area. Okay. And for some reason, these are way off as
1488
03:11:37,840 --> 03:11:44,479
But yeah, so we've basically used a linear regressor and a neural
1489
03:11:44,479 --> 03:11:48,559
sometimes where a neural net is more appropriate and a linear
1490
03:11:49,120 --> 03:11:54,720
I think that it just comes with time and trying to figure out, you
1491
03:11:54,719 --> 03:11:59,279
like, hey, what works better, like here, a linear, a multiple
1492
03:11:59,280 --> 03:12:05,760
better than a neural net. But for example, with the one
1493
03:12:05,760 --> 03:12:12,880
never be able to see this curve. Okay. I mean, I'm not saying this
1494
03:12:12,879 --> 03:12:19,439
just saying like, hey, you know, sometimes it might be more
1495
03:12:19,440 --> 03:12:29,120
linear. So yeah, I will leave regression at that. Okay, so we just
1496
03:12:29,840 --> 03:12:34,880
And in supervised learning, we have data, we have some a bunch of
1497
03:12:34,879 --> 03:12:39,759
different samples. But each of those samples has some sort of
1498
03:12:39,760 --> 03:12:46,159
a category, a class, etc. Right, we were able to use that label in
1499
03:12:46,159 --> 03:12:51,840
right, we were able to use that label in order to try to predict
1500
03:12:51,840 --> 03:12:59,520
we haven't seen yet. Well, now let's move on to unsupervised
1501
03:12:59,520 --> 03:13:05,600
learning, we have a bunch of unlabeled data. And what can we do
1502
03:13:05,600 --> 03:13:13,120
anything from this data? So the first algorithm that we're going
1503
03:13:13,120 --> 03:13:22,720
clustering. What k means clustering is trying to do is it's trying
1504
03:13:25,760 --> 03:13:31,360
So in this example below, I have a bunch of scattered points. And
1505
03:13:31,360 --> 03:13:38,079
is x zero and x one on the two axes, which means I'm actually
1506
03:13:38,079 --> 03:13:44,799
right of each point, but we don't know what the y label is for
1507
03:13:44,799 --> 03:13:51,439
at these scattered points, we can kind of see how there are
1508
03:13:51,440 --> 03:14:00,319
right. So depending on what we pick for k, we might have different
1509
03:14:00,319 --> 03:14:05,440
right, then we might pick, okay, this seems like it could be one
1510
03:14:05,440 --> 03:14:12,399
another cluster. So those might be our two different clusters. If
1511
03:14:13,120 --> 03:14:18,160
for example, then okay, this seems like it could be a cluster.
1512
03:14:18,159 --> 03:14:23,119
cluster. And maybe this could be a cluster, right. So we could
1513
03:14:23,120 --> 03:14:33,520
data set. Now, this k here is predefined, if I can spell that
1514
03:14:33,520 --> 03:14:42,479
the model. So that would be you. All right. And let's discuss how
1515
03:14:42,479 --> 03:14:49,199
goes through and computes the k clusters. So I'm going to write
1516
03:14:52,639 --> 03:15:01,279
Now, the first step that happens is we actually choose well, the
1517
03:15:01,280 --> 03:15:11,280
points on this plot to be the centroids. And by centuries, I just
1518
03:15:11,840 --> 03:15:16,799
Okay. So three random points; let's say we're doing k equals three, and we choose three
1519
03:15:16,799 --> 03:15:21,519
random points to be the centroids of the three clusters. If it
1520
03:15:21,520 --> 03:15:27,760
random points. Okay. So maybe the three random points I'm choosing
1521
03:15:27,760 --> 03:15:41,680
Here, here, and here. All right. So we have three different
1522
03:15:46,000 --> 03:15:58,639
the distance for each point to those centroids. So between all the
1523
03:16:01,360 --> 03:16:06,400
So basically, I'm saying, all right, this is this distance, this
1524
03:16:07,600 --> 03:16:13,120
all of these distances, I'm computing between oops, not those two,
1525
03:16:13,120 --> 03:16:18,720
centroids themselves. So I'm computing the distances for all of
1526
03:16:20,079 --> 03:16:30,639
Okay. And that comes with also assigning those points to the
1527
03:16:34,799 --> 03:16:42,399
What do I mean by that? So let's take this point here, for
1528
03:16:42,399 --> 03:16:46,959
this distance, this distance, and this distance. And I'm saying,
1529
03:16:46,959 --> 03:16:54,399
is the closest. So I'm actually going to put this into the red
1530
03:16:54,399 --> 03:17:03,279
all of these points, it seems slightly closer to red, and this one
1531
03:17:03,280 --> 03:17:13,040
right? Now for the blue, I actually wouldn't put any blue ones in
1532
03:17:13,040 --> 03:17:21,200
actually, that first one is closer to red. And now it seems like
1533
03:17:21,200 --> 03:17:31,440
closer to green. So let's just put all of these into green here,
1534
03:17:31,440 --> 03:17:38,480
have, you know, our two, three, technically centroid. So there's
1535
03:17:38,479 --> 03:17:44,879
this group here. And then blue is kind of just this group here, it
1536
03:17:44,879 --> 03:17:54,559
of the points yet. So the next step, three that we do is we
1537
03:17:54,559 --> 03:18:02,799
centroid. So we compute new centroids based on the points that we
1538
03:18:04,000 --> 03:18:10,159
And by that, I just mean, okay, well, let's take the average of
1539
03:18:10,159 --> 03:18:15,680
new centroid? That's probably going to be somewhere around here,
1540
03:18:15,680 --> 03:18:22,800
any points in there. So we won't touch and then the screen one, we
1541
03:18:22,799 --> 03:18:36,239
over here, oops, somewhere over here. Right. So now if I erase all
1542
03:18:38,239 --> 03:18:44,239
I can go and I can actually redo step two over here, this
1543
03:18:45,280 --> 03:18:48,560
Alright, so I'm going to go back and I'm going to iterate through
1544
03:18:48,559 --> 03:18:55,199
and I'm going to recompute my three centroids. So let's see, we're
1545
03:18:55,200 --> 03:19:01,840
these are definitely all red, right? This one still looks a bit
1546
03:19:03,760 --> 03:19:06,800
this part, we actually start getting closer to the blues.
1547
03:19:08,159 --> 03:19:16,799
So this one still seems closer to a blue than a green, this one as
1548
03:19:16,799 --> 03:19:26,399
would belong to green. Okay, so now our three centroids are three,
1549
03:19:26,399 --> 03:19:39,840
would be this, this, and then this, right? Those are our three
1550
03:19:39,840 --> 03:19:44,079
and we compute the new sorry, those would be the three clusters.
1551
03:19:44,079 --> 03:19:50,639
the three centroids. So I'm going to get rid of this, this and
1552
03:19:50,639 --> 03:19:57,680
red be centered, probably closer, you know, to this point here,
1553
03:19:57,680 --> 03:20:05,520
up here. And then this green would probably be somewhere. It's
1554
03:20:05,520 --> 03:20:10,880
before. But it seems like it'd be pulled down a bit. So probably
1555
03:20:10,879 --> 03:20:20,239
All right. And now, again, we go back and we compute the distance
1556
03:20:20,239 --> 03:20:27,600
and the centroids. And then we assign them to the closest
1557
03:20:27,600 --> 03:20:36,000
it's very clear. Actually, let me just circle that. And this it
1558
03:20:36,000 --> 03:20:43,440
it actually seemed like this point is closer to this blue now. So
1559
03:20:43,440 --> 03:20:49,440
be maybe this point looks like it'd be blue. So all these look
1560
03:20:50,159 --> 03:20:58,000
And the greens would probably be this cluster right here. So we go
1561
03:20:58,000 --> 03:21:08,959
bam. This one probably like almost here, bam. And then the green
1562
03:21:10,959 --> 03:21:21,919
here-ish. Okay. And now we go back, and we compute the clusters again.
1563
03:21:21,920 --> 03:21:32,879
So red, still this blue, I would argue is now this cluster here.
1564
03:21:33,360 --> 03:21:48,079
Okay, so we go and we recompute the centroids, bam, bam. And, you
1565
03:21:48,079 --> 03:21:54,399
to go and assign all the points to clusters again, I would get the
1566
03:21:54,399 --> 03:21:59,840
that's when we know that we can stop iterating between steps two and three, because we've
1567
03:21:59,840 --> 03:22:06,559
converged on some solution when we've reached some stable point.
1568
03:22:06,559 --> 03:22:10,399
these points are really changing out of their clusters anymore, we
1569
03:22:10,399 --> 03:22:19,199
and say, hey, these are our three clusters. Okay. And this process has a name:
1570
03:22:20,719 --> 03:22:33,279
expectation maximization. This part where we're assigning the
1571
03:22:33,280 --> 03:22:41,840
this is something this is our expectation step. And this part
1572
03:22:41,840 --> 03:22:54,000
centroids, this is our maximization step. Okay, so that's
1573
03:22:55,040 --> 03:23:02,720
And we use this in order to compute the centroids, assign all the
1574
03:23:02,719 --> 03:23:07,519
according to those centroids. And then we're recomputing all that
1575
03:23:07,520 --> 03:23:13,760
some stable point where nothing is changing anymore. Alright, so
1576
03:23:13,760 --> 03:23:19,200
of unsupervised learning. And basically, what this is doing is
1577
03:23:19,200 --> 03:23:25,520
some pattern in the data. So if I came up with another point, you
1578
03:23:25,520 --> 03:23:32,560
I can say, Oh, it looks like that's closer to if this is a, b, c,
1579
03:23:32,559 --> 03:23:38,239
cluster B. And so I would probably put it in cluster B. Okay, so
1580
03:23:38,239 --> 03:23:46,239
in the data based on just how the points are scattered around.
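For intuition, here's a compact numpy sketch of those two alternating steps (it ignores the empty-cluster edge case):

    import numpy as np

    def k_means(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]  # step 1: random centroids
        for _ in range(iters):
            # expectation: distance from every point to every centroid, assign closest
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # maximization: recompute each centroid as the mean of its assigned points
            new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):        # stable: converged
                break
            centroids = new_centroids
        return labels, centroids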
1581
03:23:46,239 --> 03:23:50,479
Now, the second unsupervised learning technique that I'm going to
1582
03:23:50,479 --> 03:23:57,439
principal component analysis. And the point of principal component
1583
03:23:57,440 --> 03:24:07,520
used as a dimensionality reduction technique. So let me write that
1584
03:24:07,520 --> 03:24:15,520
reduction. And what do I mean by dimensionality reduction is if I
1585
03:24:15,520 --> 03:24:23,600
x1 x2 x3 x4, etc. Can I just reduce that down to one dimension
1586
03:24:23,600 --> 03:24:29,520
about how all these points are spread relative to one another. And
1587
03:24:29,520 --> 03:24:42,800
principal component analysis. Let's say I have some points in the
1588
03:24:42,799 --> 03:24:51,279
Okay, so these points might be spread, you know, something like
1589
03:24:59,680 --> 03:25:08,960
Okay. So for example, if this were something to do with housing
1590
03:25:08,959 --> 03:25:19,599
this here might be x zero might be hey, years since built, right,
1591
03:25:19,600 --> 03:25:29,920
and x one might be square footage of the house. Alright, so like
1592
03:25:29,920 --> 03:25:36,960
right now it's been, you know, 22 years since a house in 2000 was
1593
03:25:36,959 --> 03:25:40,799
analysis is just saying, alright, let's say we want to build a
1594
03:25:40,799 --> 03:25:48,639
you know, display something about our data, but we don't we don't
1595
03:25:49,520 --> 03:25:56,319
How do we display, you know, how do we how do we demonstrate that
1596
03:25:56,319 --> 03:26:04,239
this point than this point. And we can do that using principal
1597
03:26:04,239 --> 03:26:07,920
take what you know about linear regression and just forget about
1598
03:26:07,920 --> 03:26:16,879
you might get confused. PCA is a way of trying to find direction
1599
03:26:16,879 --> 03:26:23,920
variance. So this principal component, what that means is
1600
03:26:23,920 --> 03:26:38,879
So some direction in this space with the largest variance, okay,
1601
03:26:38,879 --> 03:26:42,639
data set without the two different dimensions. Like, let's say we
1602
03:26:42,639 --> 03:26:47,359
mentions, and somebody's telling us, hey, you only get one
1603
03:26:48,079 --> 03:26:53,840
What dimension do you want to show us? Okay, so let's say we want
1604
03:26:53,840 --> 03:26:59,040
what dimension like what do we do, we want to project our data
1605
03:27:00,159 --> 03:27:05,520
Alright, so that in this case might be a dimension that looks
1606
03:27:06,399 --> 03:27:10,639
this. And you might say, okay, we're not going to talk about
1607
03:27:11,680 --> 03:27:16,800
We don't have a y value. So linear regression, this would be why
1608
03:27:16,799 --> 03:27:23,199
have a label for that. Instead, what we're doing is we're taking
1609
03:27:23,200 --> 03:27:30,880
all of these take that's not very visible. But take this right
1610
03:27:33,040 --> 03:27:38,960
And what PCA is doing is saying, okay, map all of these points
1611
03:27:39,520 --> 03:27:44,000
So the transformed data set would be here.
1612
03:27:44,000 --> 03:27:49,760
This one's on the data sets are on the line. So we just put that
1613
03:27:49,760 --> 03:27:57,120
new one dimensional data set. Okay, it's not our prediction or
1614
03:27:57,120 --> 03:28:02,480
If somebody came to us said you only get one dimension, you only
1615
03:28:02,479 --> 03:28:06,879
each of these 2d points. What number would you give us? What
1616
03:28:06,879 --> 03:28:13,039
So this would be our new one dimensional data set. Okay, it's not
1617
03:28:13,040 --> 03:28:23,360
What number would you give me? This would be the number that we
1618
03:28:24,159 --> 03:28:29,840
this is where our points are the most spread out. Right? If I took
1619
03:28:31,040 --> 03:28:36,320
and let me actually duplicate this so I don't have to rewrite
1620
03:28:36,319 --> 03:28:43,840
Or so I don't have to erase and then redraw anything. Let me get
1621
03:28:47,440 --> 03:28:50,079
And I just got rid of a point there too. So let me draw that
1622
03:28:54,159 --> 03:29:01,039
Alright, so if this were my original data point, what if I had
1623
03:29:01,040 --> 03:29:12,960
the PCA dimension? Okay, well, I then would have points that let
1624
03:29:12,959 --> 03:29:24,319
color. So if I were to draw a right angle to this for every point,
1625
03:29:24,319 --> 03:29:37,440
like this. And so just intuitively looking at these two different
1626
03:29:37,440 --> 03:29:43,120
we can see that the points are squished a little bit closer
1627
03:29:43,120 --> 03:29:48,800
variance that's not the space with the largest variance. The thing
1628
03:29:48,799 --> 03:29:55,759
is that this will give us the most discrimination between all of
1629
03:29:55,760 --> 03:30:01,520
variance, the further spread out these points will likely be. Now,
1630
03:30:01,520 --> 03:30:07,600
dimension that we should project it on a different way to actually
1631
03:30:07,600 --> 03:30:14,399
dimension with the largest variance. It's actually it also happens
1632
03:30:14,399 --> 03:30:25,279
to be the dimension that minimizes the residuals.
1633
03:30:25,280 --> 03:30:33,520
we take the residual from that the XY residual, so in linear
1634
03:30:33,520 --> 03:30:37,760
we were looking only at this residual, the differences between the
1635
03:30:37,760 --> 03:30:44,800
y and y hat, it's not that here in principal component analysis,
1636
03:30:44,799 --> 03:30:52,319
from our current point in two dimensional space, and then it's
1637
03:30:52,319 --> 03:31:00,879
taking that dimension. And we're saying, alright, how much, you
1638
03:31:00,879 --> 03:31:08,719
between that projection residual, and we're trying to minimize
1639
03:31:08,719 --> 03:31:21,119
actually equates to this largest variance dimension, this
1640
03:31:21,120 --> 03:31:32,560
you can either look at it as minimizing, minimize, let me get rid
1641
03:31:34,559 --> 03:31:38,319
the projection residuals. So that's the stuff in orange.
1642
03:31:42,079 --> 03:31:48,319
Or to maximizing the variance between the points.
1643
03:31:48,319 --> 03:31:55,279
Okay. And we're not really going to talk about, you know, the
1644
03:31:55,280 --> 03:32:00,960
calculate out the principal components, or like what that
1645
03:32:00,959 --> 03:32:06,799
need to understand linear algebra for that, especially
1646
03:32:06,799 --> 03:32:12,079
I'm not going to cover in this class. But that's how you would
1647
03:32:12,079 --> 03:32:16,879
now, with this two dimensional data set here, sorry, this one
1648
03:32:16,879 --> 03:32:22,159
from a 2d data set, and we now boil it down to one dimension.
1649
03:32:22,159 --> 03:32:27,680
dimension, and we can do other things with it. Right, we can, like
1650
03:32:27,680 --> 03:32:35,040
then we can now show x versus y, rather than x zero and x one in
1651
03:32:35,040 --> 03:32:38,480
Now we can just say, oh, this is a principal component. And we're
1652
03:32:38,479 --> 03:32:44,559
the y. Or for example, if there were 100 different dimensions, and
1653
03:32:44,559 --> 03:32:51,199
them, well, you could go and you could find the top five PCA
1654
03:32:51,200 --> 03:32:58,400
more useful to you than 100 different feature vector values.
1655
03:32:58,399 --> 03:33:05,279
analysis. Again, we're taking, you know, certain data that's
1656
03:33:05,280 --> 03:33:13,760
some sort of estimation, like some guess about its structure from
1657
03:33:13,760 --> 03:33:20,159
wanted to take, you know, a 3d thing, so like a sphere, but we
1658
03:33:20,159 --> 03:33:26,079
on. Well, what's the best approximation that we can make? Oh, it's
1659
03:33:26,079 --> 03:33:30,079
the same thing. It's saying if we have something with all these
1660
03:33:30,079 --> 03:33:35,920
show all of them, how do we boil it down to just one dimension?
1661
03:33:35,920 --> 03:33:43,200
information from that multiple dimensions? And that is exactly
1662
03:33:43,200 --> 03:33:50,400
residuals, or you maximize the variance. And that is PCA.
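For a tiny concrete sketch (made-up 2d points, scikit-learn's PCA):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[22.0, 1500.0], [10.0, 2000.0], [35.0, 900.0], [5.0, 2400.0]])
    pca = PCA(n_components=1)             # keep only the largest-variance direction
    X_1d = pca.fit_transform(X)           # each 2d point becomes a single number
    print(pca.explained_variance_ratio_)  # how much of the spread that direction kept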
1663
03:33:50,399 --> 03:33:57,039
Now, finally, let's move on to implementing the unsupervised
1664
03:33:57,040 --> 03:34:03,600
Here, again, I'm on the UCI machine learning repository. And I
1665
03:34:04,399 --> 03:34:09,440
you know, I have a bunch of kernels that belong to three different
1666
03:34:09,440 --> 03:34:17,120
comma, Rosa and Canadian. And the different features that we have
1667
03:34:17,120 --> 03:34:23,840
geometric parameters of those wheat kernels. So the area
1668
03:34:23,840 --> 03:34:30,639
width, asymmetry, and the length of the kernel groove. Okay, so
1669
03:34:30,639 --> 03:34:35,119
which is easy to work with. And what we're going to do is we're
1670
03:34:36,079 --> 03:34:40,479
or I guess we're going to try to cluster the different varieties
1671
03:34:41,440 --> 03:34:46,960
So let's get started. I have a colab notebook open again. Oh,
1672
03:34:46,959 --> 03:34:52,159
go to the data folder, download this. And so I'm going to go to
1673
03:34:52,159 --> 03:35:04,239
and let's get started. So the first thing to do is to import our
1674
03:35:04,239 --> 03:35:11,920
notebook. So I've done that here. Okay, and then we're going to
1675
03:35:11,920 --> 03:35:28,960
so pandas. And then I'm also going to import seaborn because I'm
1676
03:35:28,959 --> 03:35:40,239
specific class. Okay. Great. So now our columns that we have in
1677
03:35:40,239 --> 03:35:54,879
the perimeter, the compactness, the length, width, asymmetry,
1678
03:35:54,879 --> 03:36:00,959
to call it groove. And then the class, right, the wheat kernels
1679
03:36:00,959 --> 03:36:11,199
I'm going to do that using pandas read CSV. And it's called seeds
1680
03:36:11,200 --> 03:36:19,040
that into a data frame. And the names are equal to the columns
1681
03:36:19,040 --> 03:36:29,120
do that? Oops, what did I call this seeds data set text? Alright,
1682
03:36:29,120 --> 03:36:36,800
data frame right now, you'll notice something funky. Okay. And
1683
03:36:36,799 --> 03:36:42,239
stuff under area. And these are all our numbers with some backslash-t's in between. That's because we
1684
03:36:42,239 --> 03:36:50,799
haven't actually told pandas what the separator is, which we can
1685
03:36:50,799 --> 03:36:56,959
just a tab. So in order to ensure that, like, all whitespace gets treated as a separator,
1686
03:36:56,959 --> 03:37:04,559
we can actually use backslash s plus, so any whitespace characters are going
1687
03:37:04,559 --> 03:37:13,279
to be treated as separators. So if I run that, now this is a nicely parsed data frame.
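That load cell, assuming the file was saved as seeds_dataset.txt:

    import pandas as pd

    cols = ["area", "perimeter", "compactness", "length", "width", "asymmetry",
            "groove", "class"]
    df = pd.read_csv("seeds_dataset.txt", names=cols, sep=r"\s+")  # any whitespace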
1688
03:37:14,559 --> 03:37:20,719
So now let's actually go and like visualize this data. So what I'm
1689
03:37:20,719 --> 03:37:26,479
each of these against one another. So in this case, pretend that
1690
03:37:26,479 --> 03:37:31,279
class, right? Pretend that so this class here, I'm just going to
1691
03:37:31,280 --> 03:37:36,159
that like, hey, we can predict our classes using unsupervised
1692
03:37:36,159 --> 03:37:41,440
in unsupervised learning, we don't actually have access to the
1693
03:37:41,440 --> 03:37:49,920
plot these against one another and see what happens. So for some I
1694
03:37:49,920 --> 03:37:57,040
the columns minus one, because the class is in the columns. And I'm
1695
03:37:57,040 --> 03:38:06,640
so take everything from I onwards, you know, so I like the next
1696
03:38:06,639 --> 03:38:15,519
So this will give us basically a grid of all the different like
1697
03:38:15,520 --> 03:38:24,399
going to be columns I our y label is going to be the columns j. So
1698
03:38:25,280 --> 03:38:34,000
And I'm going to use seaborn this time. And I'm going to say
1699
03:38:34,000 --> 03:38:46,399
to be our x label. Or y is going to be our y label. And our data
1700
03:38:46,399 --> 03:38:52,879
we're passing in. So what's interesting here is that we can say
1701
03:38:53,520 --> 03:38:57,920
like if I give this class, it's going to separate the three
1702
03:38:57,920 --> 03:39:03,200
hues. So now what we're doing is we're basically comparing the
1703
03:39:03,200 --> 03:39:10,880
and the compactness. But we're going to visualize, you know, what
1704
03:39:10,879 --> 03:39:22,399
ahead and run it. So, great. So basically, we can see that for the first pairs,
1705
03:39:22,399 --> 03:39:31,760
we get these three groups. The area compactness, we get these
1706
03:39:31,760 --> 03:39:40,639
kind of look honestly like somewhat similar. Right, so Wow, look
1707
03:39:40,639 --> 03:39:44,319
we have the compactness and the asymmetry. And it looks like
1708
03:39:44,319 --> 03:39:48,799
it just looks like they're blobs, right? Sure, maybe class three
1709
03:39:50,000 --> 03:39:55,680
one and two kind of look like they're on top of each other. Okay.
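The loop behind those grids, roughly (the video uses two nested index loops; itertools is an equivalent shorthand):

    import itertools
    import seaborn as sns
    import matplotlib.pyplot as plt

    for x, y in itertools.combinations(cols[:-1], 2):  # every feature pair, class excluded
        sns.scatterplot(x=x, y=y, data=df, hue="class")
        plt.show()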
1710
03:39:55,680 --> 03:40:00,720
Some pairs might look slightly better in terms of clustering. But let's go
1711
03:40:00,719 --> 03:40:05,920
clustering examples that we talked about, and try to implement
1712
03:40:05,920 --> 03:40:16,239
going to do is just straight up clustering. So what we learned
1713
03:40:16,239 --> 03:40:29,039
So from sklearn dot cluster, I'm going to import KMeans. Okay. And just for
1714
03:40:29,040 --> 03:40:38,640
you know, any x and any y, I'm just going to say, hey, let's use
1715
03:40:40,959 --> 03:40:47,439
I mean, perimeter asymmetry could be a good one. So x could be
1716
03:40:47,440 --> 03:40:58,159
Okay. And for this, the x values, I'm going to just extract those
1717
03:40:59,840 --> 03:41:08,639
Alright, well, let's make a k means algorithm, or let's, you know,
1718
03:41:09,200 --> 03:41:15,760
and in this specific case, we know that the number of clusters is three. And then
1719
03:41:15,760 --> 03:41:27,120
I'm going to fit this against this x that I've just defined right here.
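As a sketch:

    from sklearn.cluster import KMeans

    x_label, y_label = "perimeter", "asymmetry"
    X = df[[x_label, y_label]].values   # just the two chosen columns
    kmeans = KMeans(n_clusters=3).fit(X)
    clusters = kmeans.labels_           # the cluster id assigned to each kernel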
1720
03:41:27,120 --> 03:41:33,200
Then I create this clusters variable. So one cool thing is I can
1721
03:41:33,200 --> 03:41:43,200
say k means dot labels. And it'll give me, if I can type, what the
1722
03:41:43,200 --> 03:41:52,159
predictions for all the clusters are. And our actual, oops, not
1723
03:41:52,159 --> 03:41:59,440
and we get the class, and the values from those, we can actually
1724
03:41:59,440 --> 03:42:05,200
like, you know, everything in general, most of the zeros that it's
1725
03:42:05,200 --> 03:42:11,360
And in general, the twos are the twos here. And then this third
1726
03:42:11,360 --> 03:42:16,560
to three. Now remember, these are separate classes. So the labels,
1727
03:42:16,559 --> 03:42:23,760
really matter. We can say a map zero to one map two to two and map
1728
03:42:23,760 --> 03:42:30,880
you know, our mapping would do fairly well. But we can actually
1729
03:42:30,879 --> 03:42:40,239
that, I'm going to create this cluster cluster data frame. So I'm
1730
03:42:40,239 --> 03:42:50,559
And I'm going to pass in a horizontally stacked array with x, so
1731
03:42:51,920 --> 03:42:58,159
the clusters that I have here, but I'm going to reshape them. So
1732
03:42:58,159 --> 03:43:14,319
Okay. And the columns, the labels for that, are going to be x, y, and class.
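A sketch of that data frame construction, under the same assumptions as above:

```python
# Sketch: stack the features and the k-means labels side by side.
import numpy as np
import pandas as pd

cluster_df = pd.DataFrame(
    np.hstack((X, kmeans.labels_.reshape(-1, 1))),  # reshape makes the labels 2-D
    columns=[x_label, y_label, "class"],
)
```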
1733
03:43:14,319 --> 03:43:23,520
Now I'm going to go ahead and do that same seaborn scatter plot again, where x is x, y is y,
1734
03:43:23,520 --> 03:43:32,159
the hue is again the class, and the data is now this cluster data frame.
1735
03:43:35,760 --> 03:43:42,639
So this here is my k-means classes, I guess.
1736
03:43:42,639 --> 03:43:54,319
So k-means kind of looks like this. If I come down here and I plot the original,
1737
03:43:54,319 --> 03:44:01,760
these are my original classes with respect to this specific x and y.
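The two plots might be produced like this (a sketch reusing the names above):

```python
# Sketch: the k-means assignments first, then the true classes.
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x=x_label, y=y_label, hue="class", data=cluster_df)  # k-means
plt.show()
sns.scatterplot(x=x_label, y=y_label, hue="class", data=df)          # truth
plt.show()
```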
1738
03:44:01,760 --> 03:44:07,360
It looks like it doesn't do too poorly. Yeah, I mean, the colors are flipped around, but
1739
03:44:07,360 --> 03:44:16,000
for the most part, it captures the clusters, right. And we can also do this in
1740
03:44:16,000 --> 03:44:25,680
higher dimensions. So with the higher dimensions, if we make x all of the columns
1741
03:44:25,680 --> 03:44:31,680
except for the last one, which is our class, then
1742
03:44:31,680 --> 03:44:38,720
we can do the exact same thing. So here, we can fit and
1743
03:44:43,600 --> 03:44:55,360
predict this. But now, our columns are equal to our data frame columns,
1744
03:44:55,360 --> 03:45:02,079
and then with this class included, we can literally just say df.columns.
1745
03:45:02,079 --> 03:45:09,760
And we can fit all of this. And now I want to plot the k-means labels.
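A sketch of the higher-dimensional version, again under the same assumptions:

```python
# Sketch: cluster on every feature column at once.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

X_all = df[df.columns[:-1]].values  # all columns except the class
kmeans = KMeans(n_clusters=3).fit(X_all)

cluster_df = pd.DataFrame(
    np.hstack((X_all, kmeans.labels_.reshape(-1, 1))),
    columns=df.columns,  # same labels as the original data frame
)
```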
1746
03:45:11,520 --> 03:45:20,079
Alright, so this is my clustered version, and that's my original. So let me
1747
03:45:20,079 --> 03:45:27,360
get these on the same page. So yeah, I mean, it's pretty similar to before. But what's
1748
03:45:27,360 --> 03:45:36,159
actually really cool is this: remember the dimensions
1749
03:45:36,159 --> 03:45:47,280
where the classes were, like, on top of each other? Okay, so compactness and asymmetry.
1750
03:45:47,280 --> 03:45:57,680
Right. So if I come down here, and I say compactness and asymmetry,
1751
03:45:58,959 --> 03:46:05,119
this is my scatterplot. So this is what, you know, my k-means predicts in higher
1752
03:46:05,120 --> 03:46:12,000
dimensions for compactness and asymmetry, if we just look at those two features,
1753
03:46:12,000 --> 03:46:17,520
right? And we know that the original looks something like this.
1754
03:46:18,239 --> 03:46:25,119
Do these look alike? No. Okay, so now if I come back down here, and I rerun this,
1755
03:46:25,120 --> 03:46:31,280
but actually, for these clusters, I need to get the labels of the k-means model.
1756
03:46:34,559 --> 03:46:38,399
Okay, so if I rerun this with higher dimensions
1757
03:46:38,399 --> 03:46:45,600
well, if we zoom out, and we take a look at these two, sure, the colors are different, but
1758
03:46:45,600 --> 03:46:52,000
the three groups are there, right? This does a much better job of separating
1759
03:46:52,000 --> 03:47:01,200
what group is what. So, for example, we could relabel the ones in this plot.
1760
03:47:01,200 --> 03:47:08,400
And then we could, sorry, okay, this is kind of confusing. But if these points
1761
03:47:08,399 --> 03:47:15,600
were projected onto this darker pink here, and then this dark one was this light one,
1762
03:47:15,600 --> 03:47:21,280
and this light one was this dark one, then you kind of see that the groups line up,
1763
03:47:21,280 --> 03:47:26,159
right? Like even these two up here are the same class as all the points shown
1764
03:47:26,159 --> 03:47:31,039
in the same color. So you don't want to compare the colors across the two plots;
1765
03:47:31,040 --> 03:47:37,680
you want to compare which points end up in which color within each of the plots. So that's one
1766
03:47:37,680 --> 03:47:44,079
application. So this is how k-means functions: it's basically asking,
1767
03:47:44,079 --> 03:47:50,239
all right, where are my clusters, given these pieces of data? And the other technique we
1768
03:47:50,239 --> 03:47:58,319
talked about is PCA. So with PCA, we're reducing the dimension. Instead of,
1769
03:47:58,319 --> 03:48:02,799
you know, seven dimensions, I don't know if there are seven, I think there are, we're
1770
03:48:02,799 --> 03:48:09,199
mapping multiple dimensions down into a lower number of dimensions. Right.
1771
03:48:10,079 --> 03:48:16,159
So from sklearn.decomposition, I can import PCA, and that will be what we use.
1772
03:48:16,159 --> 03:48:22,479
So if I do PCA with n_components, this is how many dimensions you want to keep.
1773
03:48:22,479 --> 03:48:28,319
And you know, for this exercise, let's do two. Okay, so now I'm going to fit it.
1774
03:48:29,360 --> 03:48:39,600
My transformed x is going to be pca.fit_transform, and the x is
1775
03:48:39,600 --> 03:48:46,559
the same x that I had up here. Okay, so all the features:
1776
03:48:46,559 --> 03:48:54,799
perimeter, compactness, length, width, asymmetry, groove. Okay. So now we've
1777
03:48:54,799 --> 03:49:02,399
transformed it. So let's look at what the shape of x used to be.
1778
03:49:02,399 --> 03:49:10,879
I had 210 samples, each seven features long, basically. And now the transformed x
1779
03:49:14,639 --> 03:49:20,079
is 210 samples, but only of length two, which means that I only have two dimensions
1780
03:49:20,079 --> 03:49:26,159
that I'm plotting. And we can actually even take a look at the values.
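A sketch of that PCA step, reusing the `X_all` array from the clustering sketch above:

```python
# Sketch: project the seven features down to two principal components.
from sklearn.decomposition import PCA

pca = PCA(n_components=2)  # keep two dimensions
transformed_x = pca.fit_transform(X_all)

print(X_all.shape)          # (210, 7): 210 samples, seven features each
print(transformed_x.shape)  # (210, 2): each sample is now a 2-D point
print(transformed_x[:5])    # peek at the first few transformed samples
```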
1781
03:49:27,200 --> 03:49:30,320
Okay, so now we see each one is a two-dimensional point;
1782
03:49:30,319 --> 03:49:37,600
each sample is now a two-dimensional point in our new, reduced space.
1783
03:49:38,879 --> 03:49:42,959
So what's cool is I can actually scatter these: transformed x column
1784
03:49:46,639 --> 03:49:53,519
zero and transformed x column one. So I actually have to
1785
03:49:53,520 --> 03:49:59,280
take the columns here. And I can show that.
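That scatter might look like this (a sketch):

```python
# Sketch: plot the two PCA dimensions against each other.
import matplotlib.pyplot as plt

plt.scatter(transformed_x[:, 0], transformed_x[:, 1])
plt.show()
```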
1786
03:50:01,920 --> 03:50:06,879
Basically, we've just taken this, like, seven-dimensional thing and mapped it down to a
1787
03:50:06,879 --> 03:50:12,079
single, or I guess a two-dimensional, representation. So that's pretty cool.
1788
03:50:13,200 --> 03:50:20,800
And actually, let's go ahead and do the same clustering exercise on this. For
1789
03:50:20,799 --> 03:50:29,840
the k-means PCA data frame, let's construct a data frame. The data
1790
03:50:29,840 --> 03:50:40,399
frame is going to be an hstack: I'm going to take this transformed x, and
1791
03:50:40,399 --> 03:50:46,559
actually, instead of the clusters from before, I'm going to use kmeans.labels_, reshaped.
1792
03:50:46,559 --> 03:50:58,799
So it's 2D, and we can do the hstack. And for the columns, I'm going to use PCA one, PCA two,
1793
03:50:59,680 --> 03:51:07,200
and the class. All right. So now if I take this, I can also do the same for the truth:
1794
03:51:08,159 --> 03:51:13,200
instead of the k-means labels, I want, from the data frame, the class column.
1795
03:51:13,200 --> 03:51:20,720
And I'm just going to take the values from that. And so now I have a data frame for the k-means labels
1796
03:51:20,719 --> 03:51:27,199
with the PCA, and then a data frame for the truth, also with the PCA.
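A sketch of those two data frames (the PCA column names here are my own choice):

```python
# Sketch: one frame with the k-means labels, one with the true classes.
import numpy as np
import pandas as pd

kmeans_pca_df = pd.DataFrame(
    np.hstack((transformed_x, kmeans.labels_.reshape(-1, 1))),
    columns=["pca1", "pca2", "class"],
)
truth_pca_df = pd.DataFrame(
    np.hstack((transformed_x, df["class"].values.reshape(-1, 1))),
    columns=["pca1", "pca2", "class"],
)
```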
1797
03:51:27,200 --> 03:51:32,320
Now let's plot these, similar to how I plotted things up here. So let me actually take these cells.
1798
03:51:32,319 --> 03:51:41,279
Instead of the cluster data frame, I want, this is the k-means PCA data frame. The hue is going
1799
03:51:41,280 --> 03:51:51,200
to be class, but now x and y are going to be the two PCA
1800
03:51:51,200 --> 03:51:58,159
dimensions. And the data frame is going to be the k-means PCA data frame.
1801
03:51:58,159 --> 03:52:05,760
So these are my two PCA dimensions, and you can see how the clusters shake
1802
03:52:05,760 --> 03:52:14,319
out. And then here, I'm going to go to my truth classes. Again, instead
1803
03:52:14,319 --> 03:52:22,000
of k-means, this should be the truth PCA data frame.
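The two plots might be produced like this (a sketch reusing the data frames above):

```python
# Sketch: the k-means clusters, then the truth, both in PCA space.
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x="pca1", y="pca2", hue="class", data=kmeans_pca_df)
plt.show()
sns.scatterplot(x="pca1", y="pca2", hue="class", data=truth_pca_df)
plt.show()
```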
1804
03:52:22,000 --> 03:52:29,520
So you can see that along these two dimensions, we're actually doing fairly well at separating. It does
1805
03:52:29,520 --> 03:52:36,720
seem like this is slightly more separable than the other plots
1806
03:52:36,719 --> 03:52:45,359
up here. So that's a good sign. And up here, you can see that, hey, these map onto one
1807
03:52:45,360 --> 03:52:51,440
another. I mean, for the most part, our unsupervised learning algorithm
1808
03:52:51,440 --> 03:52:59,680
is able to spit out, you know, what the proper labels are. Now, it can't assign
1809
03:52:59,680 --> 03:53:05,200
specific labels to the different types of kernels. But for example, these might be one type of
1810
03:53:05,200 --> 03:53:09,360
kernel, and same here. And then these might all be another variety, and these might
1811
03:53:09,360 --> 03:53:14,960
be the Canadian kernels. So it does struggle a little bit with the overlapping points.
1812
03:53:14,959 --> 03:53:21,119
But for the most part, our algorithm is able to find the three clusters and does a
1813
03:53:21,120 --> 03:53:26,480
fairly good job at predicting them without any information about the labels; we never gave the
1814
03:53:26,479 --> 03:53:32,879
algorithm any labels. So that's the gist of unsupervised learning. I hope you enjoyed
1815
03:53:32,879 --> 03:53:38,799
this course. I hope, you know, a lot of these examples made sense. If there are things
1816
03:53:38,799 --> 03:53:44,239
that I have done differently than you would, and you're somebody with more experience, please leave feedback
1817
03:53:44,239 --> 03:53:50,559
in the comments, and we can all, as a community, learn from this together.