Machine Learning for Everybody (Full Course) - English transcript

Kylie Ying has worked at many interesting places such as MIT, CERN, and Free Code Camp. She's a physicist, engineer, and basically a genius. And now she's going to teach you about machine learning in a way that is accessible to absolute beginners.

What's up you guys? So welcome to Machine Learning for Everyone. If you are somebody who is interested in machine learning and you think you are considered "everyone," then this video is for you. In this video, we'll talk about supervised and unsupervised learning models, we'll go through maybe a little bit of the logic or math behind them, and then we'll also see how we can program it on Google Colab. If there are certain things that could have been done better and you're somebody with more experience than me, please feel free to correct me in the comments, and we can all as a community learn from this together.

So without wasting any time, let's just dive straight into the code and cover the concepts as we go. This here is the UCI Machine Learning Repository; they just have a ton of data sets that we can access, and I found this really cool one called the MAGIC gamma telescope data set. In this data set, to summarize what I think is going on, there's this telescope, and high-energy particles are hitting it. Now there's a camera that actually records certain patterns of, you know, how this light hits the camera, and we use properties of those patterns in order to predict what type of particle caused that radiation: whether it was a gamma particle or something else, like a hadron. The ten attributes listed here are the attributes of those patterns that we collect in the camera, so we have, you know, some length, width, size, asymmetry, etc. We're going to use those properties to help us discriminate the patterns and whether or not they came from a gamma particle or a hadron.

So in order to do this, we're going to come up here, go to the data folder, click this magic04.data file, and download it. I already have a Colab notebook open; you go to colab.research.google.com and start a new notebook. I'm just going to call this the MAGIC data set. So actually, I'm going to call it the MAGIC example. Okay. With that, I'm going to first start with my imports: you know, I always import NumPy, I always import pandas, and we'll import other things as we go. In order to run a cell, you can either click this play button or, on my computer, it's just Shift+Enter, and that will run the cell. I've also copied and pasted a note here of where I got this data set from, just to let you guys know the source.

And in order to import that downloaded file that we got from the UCI repository, I'm going to come over here to this folder icon, and I am literally just going to drag and drop the file in there. Okay.
So, to take a look at what this file contains, whether we have column labels or not, we could open it on our computer, but let's just use pandas. We call pandas' read_csv and pass in the name of this file, and let's see what it returns. It doesn't seem like we have the column labels, so I'm just going to make the column labels all of these attribute names from the data set description; I'll take those values and make them the column names.

All right, how do I do that? Basically, I will come back here and create a list called cols, and I will type in all of those names: fLength, fWidth, fSize, fConc, fConc1, fAsym, fM3Long, fM3Trans, fAlpha, fDist, and class. Okay, great. Now, in order to use those as the column labels, I pass them into read_csv. So basically, this command here just reads some CSV file that you give it, which is comma separated values, turns that into a pandas data frame object, and then assigns these labels to the columns of that data frame. I set this data frame equal to df, and then calling head just says, give me the first five entries. Now you'll see that we have labels on every column.

All right, great. One thing that you might notice is that over here, in the class column, we have g's and h's. If I actually go down here and look at the unique values of the class column, you'll see that I have either g's or h's, and these stand for gamma and hadron. Our computer is not so good at understanding letters, right? It's much better at understanding numbers. So what we're going to do is convert these letters to numbers, a zero for one class and a one for the other. So here I'm going to set this column equal to whether or not it equals g, and I'm just going to say astype int. What this should do is: if the value equals g, then the comparison is true, so I guess that would be one; otherwise it would be false, so that would be zero. I'm just converting g and h to numbers; it doesn't really matter whether g is one and h is zero or vice versa.

Let me pause now and talk about this data set. So here I have some data frame with a bunch of values for each entry. Each of these rows is one item in our data set, it's one data point; all of those terms mean the same thing when I mention, oh, this is one example, or this is one sample. Each of these samples has one value for each of the features up here, and then it has the class. What we're going to do in this example is predict, for future samples, whether the class is g for gamma or h for hadron, and that is something known as classification. All of these columns up here are features, and features are just things that we're going to pass into our model in order to help us predict the label, which in this case is the class column. So for each example I have ten different features.
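A sketch of those first cells (the column names follow the UCI description of the MAGIC data set, and "magic04.data" is the file dragged into Colab; treat the exact names as assumptions if your copy differs):

    import numpy as np
    import pandas as pd

    cols = ["fLength", "fWidth", "fSize", "fConc", "fConc1",
            "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]
    df = pd.read_csv("magic04.data", names=cols)

    # Map the letter labels to numbers: the comparison is True for "g" (gamma),
    # so gammas become 1 and hadrons ("h") become 0.
    df["class"] = (df["class"] == "g").astype(int)
    print(df.head())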
So I have ten different values that I can pass into some model, and it can spit out the class, the label. And because I know the true labels for all of these samples, this is actually supervised learning.

All right, before I move on, let me give you a little crash course on what I just said. This is machine learning for everyone. The first question is, what is machine learning? Well, machine learning is a subdomain of computer science that focuses on certain algorithms which might help a computer learn from data, without a programmer being there telling the computer exactly what to do; that would be explicit programming.

So you might have heard of AI and ML and data science; what is the difference between all of these? AI is artificial intelligence, and that's an area of computer science where the goal is to enable computers and machines to perform human-like tasks and simulate human behavior. Machine learning is a subset of AI that tries to solve one specific problem and make predictions using certain data. And data science is a field that attempts to find patterns and draw insights from data, and that might mean we're using machine learning. So all of these fields overlap, and all of them might use machine learning.

There are a few types of machine learning. The first one is supervised learning. In supervised learning, we're using labeled inputs; this means whatever input we get, we have a corresponding output label, and we use those to train models and to learn outputs for different new inputs that we might feed our model. For example, I might have these pictures. To a computer, all these pictures are just pixels with a certain color. Now, in supervised learning, all of these inputs have a label associated with them; this is the output that we might want the computer to be able to predict. So, for example, over here, this picture is a cat, this picture is a dog, and this picture is a lizard.

There's also unsupervised learning. In unsupervised learning, we use unlabeled data to learn about patterns in the data. So here are my input data points; again, they're just images, they're just pixels. Let's say I have a bunch of these different pictures, and I can feed all of them to my computer. My computer is not going to be able to say, oh, this is a cat, a dog, and a lizard, because we don't have the output labels. But it might be able to cluster all these pictures and say, hey, all of these have something in common, all of those have something in common, and these ones down here have something in common; that's finding some sort of structure in the unlabeled data.

And finally, we have reinforcement learning. Reinforcement learning usually involves an agent that is learning in some sort of interactive environment based on rewards and penalties. Let's think of a dog: we can train our dog, but there isn't necessarily any wrong or right output at any given moment, right? Well, let's pretend that dog is a computer program. Essentially, what we're doing is giving rewards to our agent and saying, hey, this is probably something good that you want to keep doing. But in this class today, we'll be focusing on supervised learning and unsupervised learning, and learning different models for each of those.
Alright, so let's talk about supervised learning first. This is kind of what a machine learning model looks like: you have a bunch of inputs that are going into some model, and then the model is spitting out an output, which is our prediction. All these inputs make up what we call the feature vector.

Now, there are different types of features that we can have. We might have qualitative features, and qualitative means categorical data: there's either a finite number of categories or groups. One example of a qualitative feature might be gender, and in this example (I know this might be a little bit outdated) we have two genders, two different categories; that's a piece of qualitative data. Another example might be a bunch of different nationalities; a nationality or a nation or a location can also be an example of categorical data. Now, in both of these, there's no inherent order. It's not like we can rate one location a one and another a three, right? There's not really any inherent order built into either of these categorical features. That's why we call this nominal data.

For nominal data, the way that we want to feed it into our computer is using something called one-hot encoding. Let's say I have a data set, and some of the inputs might be from the US, some from India, then Canada, then France. How do we make this something the computer understands? We have to do something called one-hot encoding, and basically, one-hot encoding is saying: if it matches some category, make that a one, and if it doesn't, make that a zero. So, for example, if your input were from the US, you might have 1, 0, 0, 0; if it's from India, 0, 1, 0, 0; for Canada, the entry representing Canada is one and the rest are zeros; and for France, the entry representing France is one, and you can see that the rest are zeros.

Now, there is also a different type of qualitative feature. Here on the left there are different age groups: there's babies, toddlers, teenagers, young adults, and so on, right? And on the right-hand side, we might have different ratings: bad, not so good, mediocre, good, and then, like, great. These are known as ordinal pieces of data, because they have some sort of inherent order. Like, being a toddler is a lot closer to being a baby than to being an elderly person, right? Or good is closer to great than it is to bad. So these have some sort of inherent ordering system, and for these types of data sets we can actually just mark them with a number, say one to five. And this makes sense because, like I just said, good is closer to great than good is to not good at all, just as four is closer to five than four is to one. So this numbering actually kind of makes sense to the computer as well.

Alright, there are also quantitative pieces of data, and quantitative pieces of data are numerical valued pieces of data.
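As a quick sketch of what one-hot encoding can look like in code (the country values are just the ones from the example above, and pandas' get_dummies is one simple way to do it):

    import pandas as pd

    # Each country becomes its own column; a row gets a 1 in the column
    # matching its category and 0 everywhere else.
    countries = pd.Series(["US", "India", "Canada", "France"])
    print(pd.get_dummies(countries))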
Quantitative data can be discrete, which means, you know, integers, or it can be continuous, which means all real numbers. So, for example, the length of something is a quantitative piece of data, a quantitative feature; the temperature of something is a quantitative feature; and maybe the number of Easter eggs I collected in my basket at this Easter egg hunt, that is a discrete quantitative feature. The first two are continuous, and the egg count over here is the discrete one. So those are the things that go into our feature vector; those are the features that we're feeding our model, because our computers are really, really good at understanding numbers. They're not so good at understanding the kinds of things that humans are naturally good at understanding.

Well, what are the types of predictions that our model can output? In supervised learning, there are a few different tasks. One is classification, and basically classification just means: predict discrete classes. That might mean, you know, this is a hot dog, this is a pizza, and this is ice cream. There are three different categories, and any other pictures of hot dogs, pizza, or ice cream I can put under these labels: hot dog, pizza, ice cream. This is something known as multiclass classification. But there's also binary classification, and in binary classification you might have hot dog or not hot dog, so there are only two categories that you're working with: something that is one thing, or something that isn't. That's binary classification.

Okay, so, other examples. If something has positive or negative sentiment, that's binary classification. Maybe you're predicting whether pictures are cats or dogs; that's binary classification. Maybe you are writing an email filter and you're trying to figure out if an email is spam or not spam, so that's also binary classification. Now for multiclass classification, you might have, you know, cat, dog, rabbit, etc. We might have different types of fruits, like orange, apple, pear, or maybe different plant species. Multiclass classification just means more than two categories, and binary means we're predicting between two things.

There's also something called regression when we talk about supervised learning, and this just means we're trying to predict continuous values. So instead of trying to predict different categories, we're trying to come up with a number that is on some sort of scale. Some examples might be: what will the price of Ethereum be tomorrow, or what is the temperature going to be, or what is the price of this house, right? So these are all continuous values, not discrete classes; we're trying to predict a number that's as close to the true value as possible, using different features of our data set. So that's what we're doing in supervised learning.

Now let's talk about the model itself. How do we make the model learn? Or how can we tell whether or not it's even learning? Before we talk about specific models, let's talk about how we can actually evaluate these models, or how we can tell whether something is a good model or a bad model.
So let's take a look at this data set. This is from a Pima Indian diabetes data set. Here we have the number of pregnancies, different glucose levels, blood pressure, BMI, age, and then the outcome, whether or not they have diabetes: one for yes, zero for no. So here, all of these are quantitative features, because they're all on some numerical scale.

Each row is a different sample in the data; it's a different example, one data point, and each row represents one person in this data set. Each column, on the other hand, represents a different feature. So this one here is some measure of blood pressure, and this one over here, as we mentioned, is the output label, whether or not they have diabetes. As I mentioned, a row of feature values is what we would call a feature vector, because it's all of our features for one sample, and this is what's known as the target, or the label, for that feature vector; that's what we're trying to predict. All of the feature vectors together make up our features matrix X, and over here, this is our labels or targets vector y.

I've drawn this chocolate bar to talk about some of the other concepts in machine learning. Over here we have our X, our features matrix, and over here our labels vector y. Each row will be fed into our model, and our model will make some prediction. What we do is compare that prediction to the actual value of y that we have in our labeled data set; that's the whole point of supervised learning, that we can compare what our model outputs to the truth, and then go back and adjust some things so that on the next iteration we get closer to the true values. That whole process of asking, okay, what's the difference, where did we go wrong, is what's known as training the model.

Alright, so take this whole chunk right here: do we want to put this entire chocolate bar into the model to train it? Not really, right? Because if we did that, how would we know that our model can do well on new data that we haven't seen? Suppose I create a model to predict whether or not someone has diabetes, I train it on all of this data, I see that it does well on all my training data, and then I go to some hospital and say, here's my model, I think you can use this to predict if somebody has diabetes. Do we know whether that's going to be effective or not? Probably not, right? Because we haven't assessed whether our model can generalize. It might do well after it has seen this same data over and over again, but what about new data? Can our model handle new data? How do we get our model to assess that? Well, we actually break up our whole data set into three different data sets: we call them the training data set, the validation data set, and the test data set.
And, you know, you might have 60% here, 20%, and 20%, or 80/10/10; it kind of depends on how many samples you have, and I think either of those splits would be reasonable. So we feed the training data set into our model, and we come up with a prediction corresponding to each sample that we put in. Then we figure out, okay, what's the difference between our prediction and the true values? That difference is something known as loss; loss is the difference expressed as some numerical quantity, of course. Then we make adjustments, and that process is what we call training.

So then, once the model has been trained, we can put our validation set through this model. The validation set is used as a kind of reality check during or after training, to ensure that the model can handle data it hasn't trained on. Every single time after we train one iteration, we might stick the validation set in and ask, hey, what's the loss there? And then after our training is over, we can assess the validation loss again. One key difference here is that this loss never gets fed back into the model; that feedback loop is not closed for the validation set.

Alright, so let's talk about loss really quickly. Say I have a few different models, and some data is being fed into each model, which then makes a prediction. For the first model, this prediction here is pretty far from the truth that we want, so its loss is fairly high. For model B, again, this is pretty far from what we want, so this loss is high too; let's give it 1.5. Now this one, model C, is pretty close to the truth, so that might have a loss of 0.5. And then this last one is a little bit further off, but still better than the first two, so that loss might be 0.9. Okay, so now, which model performs the best? Well, model C has the smallest loss, so it's probably model C, and we'd take model C.

After we've come up with all these models and decided that model C is probably the best one, we take model C and we run our test set through it. The test set is used as a final check to see how generalizable the chosen model is. So if I finish training my diabetes model, I could run my test set through it and say, oh, this is how it performs on data that it has never seen before at any point during the training process. And that loss on the test set would be the final reported performance of my model.

So let's talk about this thing called loss, because I think I kind of breezed over it, right? Loss is the difference between your prediction and the actual, true value. So this example would give a slightly higher loss than this one, and this one would give an even higher loss because it's even more off. In computer science we like formulas, we like mathematical ways of describing things, so here are some examples of loss functions and how we can actually come up with numbers. This here is known as L1 loss.
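Written out, the three loss functions discussed here and just below look roughly like this, where y_i is the true value, \hat{y}_i is the prediction, and the sums run over the N points in the data set (the cross-entropy line is the standard form of that loss, stated here for reference):

\[
\text{L1 loss} = \sum_i \lvert y_i - \hat{y}_i \rvert,
\qquad
\text{L2 loss} = \sum_i (y_i - \hat{y}_i)^2,
\]
\[
\text{binary cross-entropy} = -\frac{1}{N}\sum_i \big( y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \big).
\]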
And basically, L1 loss takes the absolute value of the difference: it takes whatever your real value is, subtracts the predicted value, and takes the absolute value of that. The absolute value is a function that looks something like a V, so the further off you are, in either direction, the greater your loss is. If your real value is off from your predicted value by 10, then your loss for that point would be 10. And then this sum here just means, hey, we're taking all the points in our data set and adding up how far off everything is.

Now, we also have something called L2 loss. This loss function is quadratic, which means that if the prediction is close, the penalty is very minimal, and if it's off by a lot, then the penalty is much, much higher. Instead of the absolute value, it just squares the difference between the two. There's also something called binary cross-entropy loss; it looks something like this, and for binary classification this might be the loss that we use. I'm not going to really go through the details of this one, but you just need to know that loss decreases as the performance gets better.

Now, there are some other measures of performance as well, for example accuracy. What is accuracy? Let's say these are pictures that I'm feeding my model, and the predictions might be apple, orange, orange, apple; but compared with the actual labels, three of them were correct and one of them was incorrect, so the accuracy of this model is three quarters, or 75%.

Alright, coming back to our Colab notebook: we've imported our libraries up here, and we've already created this data frame. This is all of our data; this is what we're going to use to train our models. And again, if we now take a look at our data set, you'll see that our class column is all zeros and ones, so now everything is numerical, which is good, because our computer understands numbers.

Okay. And, you know, it would probably be a good idea to plot these features and see whether they have anything to do with the class. So here, I'm going to loop over the labels in the columns of this data frame; that just gets me the list of column names. Oh, actually, we already have that list, it's called cols, so let's just use that, it might be less confusing. I'll loop over cols up to, but not including, the last thing, which is the class, so I'm taking all ten features, and I'm going to plot each of them as a histogram. Basically, I take the data frame and I say, okay, give me everything where the class is equal to one, so that's all of our gammas, remember; and then, for that portion of the data frame, I look at the column for the current label. What this part here is saying is: inside the data frame, get me every row where the class is equal to one (so all of those rows fit that condition), and then just look at the one column for the label we're iterating over. So this command here is getting me all the values of that column for this specific label.
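Here is roughly what that plotting cell ends up looking like once the pieces described over the next few lines are in place (a sketch, assuming the column list is named cols and the data frame df, as above):

    import matplotlib.pyplot as plt

    for label in cols[:-1]:
        # Overlay normalized histograms of this feature for gammas (class 1) and hadrons (class 0).
        plt.hist(df[df["class"] == 1][label], color="blue", label="gamma", alpha=0.7, density=True)
        plt.hist(df[df["class"] == 0][label], color="red", label="hadron", alpha=0.7, density=True)
        plt.title(label)
        plt.ylabel("Probability")
        plt.xlabel(label)
        plt.legend()
        plt.show()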
And that's exactly what I'm going to put into the histogram. I'm just going to tell matplotlib: make the color blue, make the label "gamma", set alpha equal to 0.7 so it's a bit transparent, and set density equal to true, so that when we plot the hadrons on top of this, we'll have a baseline for comparing them. Setting density to true basically normalizes these distributions. You know, if you have a thousand samples of one type and then fifty of another type, and you drew the raw histograms, one of them would be a lot bigger than the other, right? But by normalizing, we're distributing each one over how many samples there are. Alright, and then I'm going to put the title on here, make the y label "Probability" (because it's normalized), and the x label is just going to be the label we're plotting.

Hmm, what is going on... okay. I'm going to include a legend and call plt.show to display the plot. So if I run that... oops, the loop should only go up to the last item, so we exclude the class column. And now we can see that we're plotting all of these. So here, the blue ones I made gamma, so the red should be hadron. For the length, we can already see that, you know, maybe if the length is smaller, it's more likely to be a gamma, right? And a lot of these distributions look fairly similar for the two classes. But here, clearly, if there's more asymmetry, or if this asymmetry measure is larger, it's probably a hadron. Oh, and this one's a good one: fAlpha seems to be pretty evenly distributed for the hadrons, whereas if it's smaller, it looks like there are a lot more gammas. Okay, so this is kind of the data that we're working with; we've plotted it and had a look.

Okay, so the next thing that we're going to do here is split the data into our training, our validation, and our test data sets. I'm going to set train, valid, test equal to this: np.split, where I'm just splitting up the data frame. By passing df.sample with frac equal to one, where I'm sampling everything, this will basically shuffle my data first. Then I have to tell it where exactly to split my data set: the first split is going to be at 0.6 times the length of this data frame, cast to an integer; that'll be the first place where I cut it off, and that gives me my training data. Then we go up to 0.8, which basically means everything between 60% and 80% of the data set will go towards validation, and then everything from 80% to the end will be my test data. So I can run that.

And now, if we go up here and look at the data again, some of these columns seem to have values in the hundreds, whereas others are small decimals, so the scale of all these numbers is way off relative to one another, and sometimes that will affect our results. So one thing we would like to do is scale these so that each feature is expressed relative to the mean and standard deviation of that specific column.
I'm going to create a function called scale_dataset, and I'm going to pass in the data frame; that's what I'll operate on. The x values are going to be, you know, I take the data frame, and let's assume that the label will always be the last column in the data frame, so I can take the data frame's columns all the way up to the last item and get those values. And for y, well, it's the last column, so I can just index that single column and then get those values.

Now, I'm actually going to bring in the standard scaler from scikit-learn. So if I come up here, I can go to sklearn.preprocessing and import StandardScaler; I have to run that cell for the import to take effect. And now I'm going to create a scaler and use it down here in the function. With the scaler, what I can do is just fit and transform in one step: x is equal to scaler.fit_transform(x). What that's doing is saying, fit the standard scaler to x, and then transform all those values, and the transformed array is going to be our new x.

Alright. And then I'm also going to glue the features and the label back together into one big 2D NumPy array. In order to do that, I'm going to call np.hstack, which says, okay, take an array and another array and horizontally stack them together; that's what the h stands for. By horizontally stacking them, I mean side by side, not on top of each other. So what am I stacking? Well, I'm stacking x and y. Now, NumPy is very particular about dimensions, right? In this specific case, our x is a two-dimensional array, but y is only a one-dimensional vector of values. So in order to make the dimensions match, I have to reshape y, using np.reshape, and I can pass in the dimensions that I want. Passing negative one, comma, one just means, okay, make this a 2D array where the second dimension is one, and NumPy will infer what the first dimension should be, which ends up being the length of y. It's the same as literally typing that length in, but the negative one is easier because it lets the computer do the hard work. So once I stack that, I'm going to return the full data array along with x and y.

Now, one interesting thing is that if we go into our training data set, get the length of the part where the class is one (so remember, that's the gammas), and print that along with the same thing for class zero, we'll see that there are around 7,000 of the gammas but far fewer of the hadrons. So that imbalance might actually become an issue. Instead, what we want to do is oversample our training data set; that means we want to increase the number of samples from the smaller class so that the two counts match better. And surprise, surprise, there is a function that will help us do that.
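Putting this preprocessing together, a sketch of how the cell might look, including the oversampling step that gets introduced next (this assumes the imbalanced-learn package for RandomOverSampler; the variable names are just illustrative):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import RandomOverSampler

    # Shuffle, then cut into roughly 60% train, 20% validation, 20% test.
    train, valid, test = np.split(df.sample(frac=1),
                                  [int(0.6 * len(df)), int(0.8 * len(df))])

    def scale_dataset(dataframe, oversample=False):
        X = dataframe[dataframe.columns[:-1]].values   # all columns except the label
        y = dataframe[dataframe.columns[-1]].values    # the last column is the label

        scaler = StandardScaler()
        X = scaler.fit_transform(X)                    # zero mean, unit variance per feature

        if oversample:
            # Duplicate samples from the minority class until both classes have equal counts.
            ros = RandomOverSampler()
            X, y = ros.fit_resample(X, y)

        data = np.hstack((X, np.reshape(y, (-1, 1))))  # glue X and y back into one 2D array
        return data, X, y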
So I'm going to go up to my imports, and from imblearn.over_sampling I'm going to import this RandomOverSampler, run that cell, and come back down to the function. I'm going to add a parameter called oversample and set it to false by default. If oversample is set, then what I'm going to do is create this ros, set it equal to a RandomOverSampler, and then I'm just going to say, okay, fit and resample x and y. What that does is take more samples from the smaller class; it keeps sampling from the less-represented class to increase the size of that part of the data set, so that the two classes now match.

So I create my training data by calling this on the training data set with oversample set to true, and what comes back is train and then X_train and y_train. Oops, what's going on? Okay, fixed. What I'm doing now is just asking, what is the length of y_train? It's around 14,800 or so. And now let's take a look at how many of these are gammas: we can just sum that up, and we'll also see that if we count how many are the other type, it's the same value. So the two classes are now rebalanced.

Okay, so here I'm going to make this one the validation data set, and then the next one I'm going to make the test data set, and I'm going to switch oversample to false for both of them. The reason I'm switching it off is that my validation and my test sets are there for the purpose of, you know, if I get new data, how does my model perform on it? And I don't want to oversample those; I don't care about balancing them. I want to know: if I have a new, unlabeled sample, can I trust my model, right? So that's why I'm not oversampling the validation or test data. Hmm, what is going on? Oh, it's because we already reassigned this train variable, so I have to recreate that data frame again. And now let's run these. Okay, so now we have our data sets, and we're going to move on to different models. I'm going to talk a little bit about each of these models, and then I'm going to show you how we can run them in our code.

The first model that we're going to learn about is KNN, or k-nearest neighbors. Here I've already drawn a plot: on the y-axis, I have the number of kids that a family has, and on the x-axis, I have their income, in thousands per year. So if somebody is making 40,000 a year, that's where this would be; if somebody has zero kids, they'd be somewhere along this bottom line; and so on for points somewhere over here. Okay. Now I have these plus signs and minus signs, and what I'm going to represent here is that the plus sign means they own a car, and the minus sign is going to represent no car. So your initial thought should be, okay, this looks like binary classification, because all of our points, all of our samples, have one of two labels: this here is a sample with the plus label, and this here is another sample with the minus label, drawn at the width I'll use for these points. Alright, so we have this entire data set.
And in this data set, maybe around half the people own a car and around half the people don't. Okay, so my question is about a new point; let me use a different color, I'll use this nice green. What would we predict for a point over here? Let's say that somebody makes 40,000 a year and has two kids: what would that prediction be? Well, just logically looking at this plot, you would probably say they likely don't have a car, right? Because that kind of matches the points around them. And that's the whole concept of nearest neighbors: you look at what's around a point, and then you basically say, okay, I'm going to take the label of the nearest points as my prediction.

So the first thing that we have to do is define a distance function, and in 2D plots like this, our distance function is usually Euclidean distance. Euclidean distance is basically just the straight-line distance between two points: the length of this green line that I'm drawing would be the Euclidean distance from this point to that point, and so on. If we want to get technical with that, let me zoom in: the distance is equal to the square root of (x1 minus x2) squared plus (y1 minus y2) squared, where (x1, y1) and (x2, y2) are the two points. So we're basically taking the difference between the x's and the difference between the y's, squaring each of those, summing them up, and taking the square root. Okay, I'm going to erase this so it doesn't clutter my drawing.

But anyways, in the nearest neighbor algorithm, there is a k; that's telling us, okay, how many neighbors do we use in order to judge what the label is? Usually we use a k of maybe, you know, three or five; it depends on how big your data set is, but here I would say a logical number would be three or five. So let's say we take k equal to three. Okay, well, for this data point that I drew over here, which are the closest points? It looks like this one and this one, and then this one has a distance of around four, while this other one is a little bit further than four. So these would be the three closest, and all of those points are blue minuses. So chances are, my prediction for this point is that they probably don't have a car.

All right, now what if my point is somewhere over here? Let's say that a couple has four kids and their income puts them around here. Well, now my closest points are this one, probably this one, and this one, right? Okay, still all pluses, so this one is more than likely a plus as well. Okay, let me get rid of some of these lines just so that it looks a little bit cleaner, and let's go through one more. What about a point that might be right in between the two groups? Definitely this one is the closest, right? This one's also close, and then it's between the two of these. But if we actually do the mathematics, it seems like this one wins out: this one is right here, and the other one is in between these two.
So that one is closer than this one, and that means that the top one is the third neighbor that counts. So now we ask: what is the majority among the points that are close by? Well, we have two pluses here and one minus here, which means that the pluses are the majority, and chances are that this label is probably somebody with a car. Okay. So this is how k-nearest neighbors works; it's that simple. And this can be extrapolated to further dimensions, to more features. Here we have two different features, the income and the number of kids, but let's say we have ten different features: we just adjust our distance function so that it includes all ten of those dimensions, we take the points with the smallest distances, and then we figure out which label is the most common among the points closest to the one we're trying to predict. So that's k-nearest neighbors.

Now that we've learned about k-nearest neighbors, we should be able to do it in our code. Here, I'm going to label this section KNN, and we're actually going to use a package from scikit-learn. The whole point of these packages is so that we don't have to manually code all of these models ourselves; that would be really difficult, and chances are the way that we would code it would be inefficient, or really slow, or, I don't know, have a whole bunch of issues. Let's just hand it off to the pros. From here, I can say: from sklearn.neighbors import KNeighborsClassifier, because we're doing classification, and I run that. Our KNN model is going to be this KNeighborsClassifier, and it takes a parameter for how many neighbors we want to use; to start, let's just use one. So now if I call fit on the KNN model, I can pass in the X train and y train data. Okay, so that effectively fits this model.

Now I can make predictions; yeah, let's call them y_pred. My y predictions are going to be the model's predict called on the test set, X test. Alright, so now we have those, and if I pull up my truth values for that test set, well, just looking at the first few, we got five out of six of them right. We can actually take a look at something called the classification report to summarize this properly. So if I go to from sklearn.metrics import classification_report, I can then say, hey, print out this classification report for me, passing it y test and the y predictions. We run this, and we get this chart, and I want to tell you guys a few things about it.

Alright, this accuracy down here is pretty good. That's just saying, hey, if we look at how many predictions match the truth versus how many total there are, we actually get an 82% accuracy, which is decent. Now, precision is reported per class, for class zero and for class one. What precision is saying... let me pull up a diagram over here, because I actually kind of like this diagram.
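A sketch of that KNN cell, using the variable names from the preprocessing step above; the classification_report call prints per-class precision, recall, and F1 along with the overall accuracy:

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report

    knn_model = KNeighborsClassifier(n_neighbors=1)   # start with k = 1; 3 or 5 get tried as well
    knn_model.fit(X_train, y_train)

    y_pred = knn_model.predict(X_test)
    print(classification_report(y_test, y_pred))

For reference, the precision the report shows for a class is TP / (TP + FP), the recall is TP / (TP + FN), and the F1 score is the harmonic mean of the two, which is why it acts as a combined measure of both.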
On the left over here, we have everything that we know is actually, truly positive, everything labeled positive in our ground truth, and on the right, this is everything that's truly negative. Now, inside the circle, we have everything that was labeled positive by our model. In the left part of the circle, because this side is the positive side and that side is the negative side, we have the true positives. Whereas all these ones out here, on the positive side but outside the circle, should have been labeled positive but are labeled as negative. In here, on the right side of the circle, these are the ones that we labeled positive but are actually negative, and out here, these are the true negatives. So precision is saying: out of all the ones we've labeled as positive, how many of them are true positives? And recall is saying: out of all the ones that we know are truly positive, how many did we actually label as positive?

So going back to the report over here: our precision score, again, is saying, out of all the ones that we've labeled as a specific class, how many of them are actually that class, and recall is, out of all the ones that are actually that class, how many did we label correctly. Here those are 68% and 89%. Alright, so not too shabby; we can clearly see that on metrics like this, class zero does worse than class one, and that's for our hadrons and our gammas respectively. The F1 score over here is kind of a combined measure of the precision and recall scores, so that's the one we're actually going to look at most on the test data set. Here we have F1 scores of 0.72 and 0.87, which is not too shabby.

All right. Well, what if we change the number of neighbors? Okay, so what was it originally, with one? We see that our F1 scores were 0.72 and then 0.87, and then there's our accuracy. Let's change k to three. Alright, so with three we've kind of increased class zero at the cost of class one, and the accuracy is 81. So let's actually just make this five. Alright, so now we have 82% accuracy, which is pretty decent for a model that's simply looking at the points closest to it.

The next type of model that we're going to talk about is something known as naive Bayes, and in order to understand the concepts behind naive Bayes, we have to understand conditional probability and Bayes' rule. So let's say I have some data summarized in this table right here. People who have COVID are over here in this top row, and people who don't have COVID are down here in this green row. Now, what about the columns? People who tested positive are over here in this column, and people who tested negative are over here in this column. Okay. So basically, our categories are: people who have COVID and test positive; people who don't have COVID but test positive, which is a false positive; people who have COVID and test negative, which is a false negative; and people who don't have COVID and test negative, which is good, and by good I mean they don't have COVID. Okay, so let's put some numbers in. In the margins, I've written down the sums of whatever is in that row or column; so this might be the sum of this entire row.
And this here might be the sum of this entire column. Now, the question that I have is: what is the probability of having COVID given a positive test? In probability, we write that out like this, and the vertical line means "given that", given some condition. Okay, so what is the probability of having COVID given a positive test? That's saying, let's go into this condition: the condition of having a positive test restricts us to this slice of the data, right? If you're in this slice of the data, given that we have a positive test, given that condition, what's the probability that we have COVID? Well, if we're in this slice, the number of people that have COVID is 531, so I'm going to put 531 on top, and now we divide that by the total number of people that have a positive test. So that's the probability, and doing a quick division, we get that this probability is around 96.4%. So according to this data set, which is data that I made up (it's not actual real COVID data), the probability of having COVID given that you tested positive is 96.4%.

Alright, now with that, let's talk about Bayes' rule; look at this section here, and let's ignore this bottom part for now. Bayes' rule gives us the probability of some event A happening, given that B happened; B is our condition, right? Well, what if we don't know, what if we can't directly count, the probability of A given B? Bayes' rule says you can actually go and calculate it, as long as you have the probability of B given A, the probability of A, and the probability of B. And this is just a mathematical identity. So here we have Bayes' rule, and let's actually see Bayes' rule in action.

Here, let's say that we have some disease statistics. We know that the probability of obtaining a false positive is 0.05, the probability of a false negative is 0.01, and the probability of the disease in the population is 0.1. What is the probability of having the disease given that we got a positive test? Hmm, how do we even approach this? Well, what do I mean by false positive; what's a different way to phrase it? A false positive is when you test positive but you don't actually have the disease; in other words, it's the probability that you have a positive test given no disease, right? And similarly, a false negative: it's the probability that you test negative given that you actually do have the disease.

We can put that into a chart. For example, these might be my positive and negative test results, and these might be my disease statuses, disease and no disease. Well, the probability that I test positive but have no disease, okay, that's the 0.05 over here. And the false negative, testing negative when I actually do have the disease, that's the 0.01. Now, the probability that you test positive given that you don't have the disease, plus the probability that you test negative given that you don't have the disease, should sum up to one.
Okay, because given that condition, you're either testing positive or testing negative, and in total that probability should be one. So for negative test and no disease, this should be the complement: this one should be 0.95, because it's one minus whatever this probability is. And up here, this should be 0.99, because the probability that we test negative and have the disease plus the probability that we test positive and have the disease should equal one. So this is our probability chart. And the probability of the disease being 0.1 just means that, in the general population, I have a 10% probability of actually having the disease.

So what is the probability that I have the disease given that I got a positive test? We can write this out in terms of Bayes' rule: it's the probability of a positive test given that I have the disease, times the probability of the disease, divided by the probability of the evidence, which is my positive test. Alright, now let's plug in some numbers. The probability of a positive test given that I have the disease is 0.99, and the probability that I have the disease is the prior over here, 0.1. And then the denominator, the probability that I have a positive test: I can expand that into two different cases, the case where I have the disease and the case where I don't, and ask, in each case, what the chance is of having a positive test. So that denominator becomes the probability of a positive test given the disease, 0.99, times the probability of the disease, 0.1, plus the probability of a positive test given no disease, 0.05, times the probability that I don't have the disease, which is one minus 0.1, so 0.9. Putting it together, P(disease | positive) = (0.99 × 0.1) / (0.99 × 0.1 + 0.05 × 0.9) = 0.099 / 0.144, and doing that multiplication and division, I get an answer of 0.6875, or 68.75%.

Alright, so it turns out that we can expand Bayes' rule and apply it to classification, and that is called naive Bayes. First, a little terminology. The posterior is this whole term over here; it's asking, hey, what is the probability of some class Ck? By Ck I just mean one of the categories, C for category or class or whatever: category one might be cats, category two dogs, category three lizards, and so on; we have k categories. So what is the probability of this class, given this specific sample x, given all the features of this one sample? What is the probability of x fitting into this class? That's what this is asking: what is the probability that this sample is in this class, given all this evidence, the x's, that we see?
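In symbols, this is Bayes' rule applied to classification, with the terms named here and in the next part (x stands for the sample's whole feature vector):

\[
P(C_k \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_k)\, P(C_k)}{P(\mathbf{x})}
\]

where P(C_k | x) is the posterior, P(x | C_k) is the likelihood, P(C_k) is the prior, and P(x) is the evidence.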
So the likelihood over here is saying: okay, assume that the class is Ck, assume that this is the category; well, then what is the probability of actually seeing x, all these different features, from that category? Then there's the prior: in the entire population of things, what is the probability of this class in general? Like, if I have a pile of images, what is the chance that any one image is a cat, what percentage of them are cats? And this thing down here is called the evidence, because what we're trying to do is update our beliefs: we're creating this new posterior probability built on the prior, given all the evidence that we see, right? And that evidence is the probability of x, some normalizing term.

So this is the rule for naive Bayes. Whoa, okay, let's digest that a little bit. Let me use a different color. What is this side of the equation saying? It's asking: what is the probability that we are in some class Ck, given that x1 is my first input, x2 is my second input, x3 is my third, and so on. Let's say that our classification is: do we play soccer today or not? And our features might be: how much wind is there, how much rain is there, and what day of the week is it. So let's say that it's raining, it's not windy, and it's a Wednesday. Let's use Bayes' rule on this.

So this here is equal to the probability of x1, x2, and so on up to xn, all those features jointly, given that class, times the probability of that class, all over the probability of the evidence, the x's. Okay. So what is this fancy symbol over here? It means "proportional to"; just as the equal sign means two things are equal, this little squiggle means proportional to. And this denominator over here, you might notice, doesn't depend on the class; that number is the same for all of our different classes. So what I'm going to do is drop it: I'm going to say that this probability of Ck given x1, x2, all the way to xn is proportional to just the numerator, and I don't care about the denominator, because it's the same for every single class. So this is proportional to the probability of x1, x2, ..., xn given Ck, times the probability of that class. Okay. All right.

So in naive Bayes, the "naive" part is that when we expand this joint probability, we're just assuming that all of these features are independent. In my soccer example, you know, the probability that it's windy and the probability that it's rainy, we're treating those as independent; whether or not they actually are, we're assuming that they are. And with that assumption we can actually write this part of the equation as a product: the probability of the first feature given this class, times the probability of the second feature given this class, all the way up until the probability of the nth feature given that it's in this class. We multiply all of that together.
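Written out, the independence assumption turns the expression above into the following, and the MAP prediction described next just picks the class k that maximizes the right-hand side:

\[
P(C_k \mid x_1, \ldots, x_n) \;\propto\; P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} = \operatorname*{arg\,max}_{k \in \{1,\ldots,K\}} \; P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k).
\]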
All right, which means that this here is now 594 01:13:39,199 --> 01:13:47,599 expanded times this. So I'm going to write that out. So the 595 01:13:47,600 --> 01:13:54,560 And I'm actually going to use this symbol. So what this means is 596 01:13:54,560 --> 01:14:04,000 it means multiply everything to the right of this. So this 597 01:14:04,720 --> 01:14:11,360 but do it for all the i's. So I, what is I, okay, we're going to 598 01:14:11,359 --> 01:14:18,639 the first x i all the way to the nth. So that means for every 599 01:14:19,359 --> 01:14:27,439 these probabilities together. And that's where this up here comes 600 01:14:27,439 --> 01:14:31,599 oops, this should be a line to wrap this up in plain English. 601 01:14:31,600 --> 01:14:37,520 is a probability that you know, we're in some category, given that 602 01:14:37,520 --> 01:14:44,960 features is proportional to the probability of that class in 603 01:14:44,960 --> 01:14:51,119 each of those features, given that we're in this one class that 604 01:14:51,680 --> 01:14:59,600 of it, you know, of us playing soccer today, given that it's 605 01:14:59,600 --> 01:15:04,880 Wednesday, is proportional to Okay, well, what is what is the 606 01:15:04,880 --> 01:15:10,400 anyways, and then times the probability that it's rainy, given 607 01:15:10,960 --> 01:15:15,439 times the probability that it's not windy, given that we're 608 01:15:15,439 --> 01:15:21,199 we playing soccer when it's windy, how you know, and then how many 609 01:15:21,199 --> 01:15:30,319 that's Wednesday, given that we're playing soccer. Okay. So how do 610 01:15:30,319 --> 01:15:39,039 classification. So that's where this comes in our y hat, our 611 01:15:39,039 --> 01:15:45,439 something called the arg max. And then this expression over here, 612 01:15:45,439 --> 01:15:55,199 the arg max. Well, we want. So okay, if I write out this, again, 613 01:15:55,199 --> 01:16:05,840 being in some class CK given all of our evidence. Well, we're 614 01:16:06,640 --> 01:16:13,920 this expression on the right. That's what arc max means. So if K 615 01:16:14,720 --> 01:16:21,199 one through K, so this is how many categories are, we're going to 616 01:16:21,199 --> 01:16:32,319 to solve this expression over here and find the K that makes that 617 01:16:32,319 --> 01:16:39,439 that instead of writing this, we have now a formula, thanks to 618 01:16:40,560 --> 01:16:47,440 approximate that right in something that maybe we can we maybe we 619 01:16:47,439 --> 01:16:54,479 we have the answers for that based on our training set. So this 620 01:16:54,479 --> 01:17:00,559 these and finding whatever class whatever category maximizes this 621 01:17:00,560 --> 01:17:12,160 this is something known as MAP for short, or maximum a 622 01:17:12,159 --> 01:17:20,159 Pick the hypothesis. So pick the K that is the most probable so 623 01:17:20,159 --> 01:17:31,119 of misclassification. Right. So that is MAP. That is naive Bayes. 624 01:17:31,760 --> 01:17:38,800 just like how I imported k nearest neighbor, k neighbors 625 01:17:38,800 --> 01:17:45,680 I can go to SK learn naive Bayes. And I can import Gaussian naive 626 01:17:46,800 --> 01:17:52,720 Right. And here I'm going to say my naive Bayes model is equal. 627 01:17:52,720 --> 01:18:06,480 had above. And I'm just going to say with this model, we are going 628 01:18:06,479 --> 01:18:17,359 All right, just like above. So this, I might actually, so I'm 629 01:18:19,199 --> 01:18:26,159 exactly, just like above, I'm going to make my prediction. 
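A sketch of the Gaussian naive Bayes step just described; the variable names (nb_model, X_train, y_train, X_test, y_test) are assumptions carried over from the earlier train/validation/test split in the notebook:

# Gaussian naive Bayes classifier from scikit-learn, used out of the box
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

nb_model = GaussianNB()
nb_model = nb_model.fit(X_train, y_train)   # fit on the training set

y_pred = nb_model.predict(X_test)           # predict on the held-out test set
print(classification_report(y_test, y_pred))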
So 630 01:18:26,159 --> 01:18:35,279 naive Bayes model. And of course, I'm going to run the 631 01:18:35,279 --> 01:18:40,719 just going to put these in the same cell. But here we have the y 632 01:18:40,720 --> 01:18:49,520 is still our original test data set. So if I run this, you'll see 633 01:18:49,520 --> 01:18:58,640 we get worse scores, right? Our precision, for all of them, they 634 01:18:58,640 --> 01:19:04,160 you know, for our precision, our recall, our f1 score, they look 635 01:19:04,159 --> 01:19:11,439 categories. And our total accuracy, I mean, it's still 72%, which 636 01:19:11,439 --> 01:19:22,000 72%. Okay. Which, you know, is not not that great. Okay, so let's 637 01:19:22,000 --> 01:19:29,760 Here, I've drawn a plot, I have y. So this is my label on one 638 01:19:29,760 --> 01:19:36,720 my features. So let's just say I only have one feature in this 639 01:19:36,720 --> 01:19:44,079 we see that, you know, I have a few of one class type down here. 640 01:19:44,079 --> 01:19:51,279 because it's zero. And then we have our other class type one up 641 01:19:51,279 --> 01:19:58,960 y. Okay. So many of you guys are familiar with regression. So 642 01:19:58,960 --> 01:20:10,159 draw a regression line through this, it might look something like 643 01:20:10,159 --> 01:20:16,239 doesn't seem to be a very good model. Like, why would we use this 644 01:20:16,239 --> 01:20:27,840 Right? It's, it's iffy. Okay. For example, we might say, okay, 645 01:20:27,840 --> 01:20:33,520 everything from here downwards would be one class type in here, 646 01:20:34,640 --> 01:20:41,520 But when you look at this, you're just you, you visually can tell, 647 01:20:41,520 --> 01:20:46,240 make sense. Things are not those dots are not along that line. And 648 01:20:46,239 --> 01:20:55,279 are doing classification, not regression. Okay. Well, first of 649 01:20:55,279 --> 01:21:04,639 this model, if we just use this line, it equals m x. So whatever 650 01:21:04,640 --> 01:21:10,000 which is the y intercept, right? And m is the slope. But when we 651 01:21:10,000 --> 01:21:15,760 is it actually y hat? No, it's not right. So when we're working 652 01:21:15,760 --> 01:21:20,720 what we're actually estimating in our model is a probability, 653 01:21:20,720 --> 01:21:30,240 and one, that is class zero or class one. So here, let's rewrite 654 01:21:32,720 --> 01:21:39,440 Okay, well, m x plus b, that can range, you know, from negative 655 01:21:39,439 --> 01:21:43,279 right? For any for any value of x, it goes from negative infinity 656 01:21:44,159 --> 01:21:49,039 But probability, we know probably one of the rules of probability 657 01:21:49,039 --> 01:21:57,039 between zero and one. So how do we fix this? Well, maybe instead 658 01:21:57,039 --> 01:22:03,519 equal to that, we can set the odds equal to this. So by that, I 659 01:22:03,520 --> 01:22:10,080 divided by one minus the probability. Okay, so now becomes this 660 01:22:10,079 --> 01:22:17,359 take on infinite values. But there's still one issue here. Let me 661 01:22:18,079 --> 01:22:24,559 The one issue here is that m x plus b, that can still be negative, 662 01:22:24,560 --> 01:22:28,800 I have a negative slope, if I have a negative b, if I have some 663 01:22:28,800 --> 01:22:36,400 but that can be that's allowed to be negative. So how do we fix 664 01:22:36,399 --> 01:22:47,839 the log of the odds. Okay. So now I have the log of you know, some 665 01:22:47,840 --> 01:22:54,319 the probability. 
And now that is on a range of negative infinity 666 01:22:54,319 --> 01:23:00,639 because the range of log should be negative infinity to infinity. 667 01:23:00,640 --> 01:23:08,400 the probability? Well, the first thing I can do is take, you know, 668 01:23:08,399 --> 01:23:16,479 the not the e to the whatever is on both sides. So that gives me 669 01:23:16,479 --> 01:23:27,839 over the one minus the probability is now equal to e to the m x 670 01:23:27,840 --> 01:23:39,039 that out. So the probability is equal to one minus probability e 671 01:23:39,039 --> 01:23:49,279 e to the m x plus b minus P times e to the m x plus b. And now we 672 01:23:49,279 --> 01:23:58,880 one side. So if I do P, so basically, I'm moving this over, so I'm 673 01:23:58,880 --> 01:24:11,440 to the m x plus b is equal to e to the m x plus b and let me 674 01:24:11,439 --> 01:24:22,719 little bigger. So now my probability can be e to the m x plus b 675 01:24:22,720 --> 01:24:32,880 Okay, well, let me just rewrite this really quickly, I want a 676 01:24:33,840 --> 01:24:39,920 Okay, so what I'm going to do is I'm going to multiply this by 677 01:24:40,800 --> 01:24:45,119 and then also the bottom by negative m x plus b, and I'm allowed 678 01:24:45,119 --> 01:24:52,640 this over this is one. So now my probability is equal to one over 679 01:24:54,640 --> 01:25:01,840 one plus e to the negative m x plus b. And now why did I rewrite 680 01:25:01,840 --> 01:25:07,600 It's because this is actually a form of a special function, which 681 01:25:07,600 --> 01:25:19,360 function. And for the sigmoid function, it looks something like 682 01:25:20,159 --> 01:25:30,639 that some x is equal to one over one plus e to the negative x. So 683 01:25:30,640 --> 01:25:38,000 is rewrite this in some sigmoid function, where the x value is 684 01:25:38,960 --> 01:25:42,880 So maybe I'll change this to y just to make that a bit more clear, 685 01:25:42,880 --> 01:25:50,319 the variable name is. But this is our sigmoid function. And 686 01:25:50,319 --> 01:26:01,039 looks like is it goes from zero. So this here is zero to one. And 687 01:26:01,039 --> 01:26:06,399 curved s, which I didn't draw too well. Let me try that again. 688 01:26:10,159 --> 01:26:19,119 something if I can draw this right. Like that. Okay, so it goes in 689 01:26:19,119 --> 01:26:25,760 And you might notice that this form fits our shape up here. 690 01:26:29,840 --> 01:26:36,159 Oops, let's draw it sharper. But if it's our shape up there a lot 691 01:26:37,439 --> 01:26:44,479 Alright, so that is what we call logistic regression, we're 692 01:26:44,479 --> 01:26:56,239 to the sigmoid function. Okay. And when we only have, you know, 693 01:26:56,239 --> 01:27:06,239 one feature x, and that's what we call simple logistic regression. 694 01:27:06,239 --> 01:27:12,639 so that's only x zero, but then if we have x zero, x one, all the 695 01:27:12,640 --> 01:27:19,360 multiple logistic regression, because there are multiple features 696 01:27:19,359 --> 01:27:26,079 when we're building our model, logistic regression. So I'm going 697 01:27:26,079 --> 01:27:36,079 And again, from SK learn this linear model, we can import logistic 698 01:27:36,079 --> 01:27:43,279 And just like how we did above, we can repeat all of this. So 699 01:27:43,279 --> 01:27:53,439 this log model, or LG logistic regression. I'm going to change 700 01:27:54,319 --> 01:27:59,119 So I'm just going to use the default logistic regression. 
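A quick numeric sketch of the sigmoid function derived above; in logistic regression, mx + b is what gets plugged in for the argument:

# Sigmoid: s(x) = 1 / (1 + e^(-x)), which always lands between 0 and 1
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]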
But 701 01:27:59,119 --> 01:28:02,319 you see that you can use different penalties. So right now we're 702 01:28:02,319 --> 01:28:08,880 an L2 penalty. But L2 is our quadratic formula. Okay, so that 703 01:28:09,680 --> 01:28:16,079 you know, outliers, it would really penalize that. For all these 704 01:28:16,079 --> 01:28:22,319 you can toggle these different parameters, and you might get 705 01:28:22,319 --> 01:28:26,960 If I were building a production level logistic regression model, 706 01:28:26,960 --> 01:28:31,439 would want to figure out how to do that. So I'm going to go ahead 707 01:28:31,439 --> 01:28:36,479 I would want to figure out, you know, what are the best parameters 708 01:28:36,479 --> 01:28:41,519 based on my validation data. But for now, we'll just we'll just 709 01:28:42,720 --> 01:28:49,600 So again, I'm going to fit the X train and the Y train. And I'm 710 01:28:49,600 --> 01:28:57,440 so I can just call this again. And instead of LG, NB, I'm going to 711 01:28:57,439 --> 01:29:07,279 precision 65% recall 71, f 168, or 82 total accuracy of 77. Okay, 712 01:29:07,279 --> 01:29:15,279 better than I base, but it's still not as good as K and N. 713 01:29:15,279 --> 01:29:20,079 classification that I wanted to talk about is something called 714 01:29:20,079 --> 01:29:31,840 or SVMs for short. So what exactly is an SVM model, I have two 715 01:29:31,840 --> 01:29:39,520 x one on the axes. And then I've told you if it's you know, class 716 01:29:39,520 --> 01:29:51,280 blue and red labels, my goal is to find some sort of line between 717 01:29:51,279 --> 01:30:00,559 the data. Alright, so this line is our SVM model. So I call it a 718 01:30:00,560 --> 01:30:06,160 line, but in 3d, it would be a plane and then you can also have 719 01:30:06,159 --> 01:30:11,599 proper term is actually I want to find the hyperplane that best 720 01:30:11,600 --> 01:30:30,000 classes. Let's see a few examples. Okay, so first, between these 721 01:30:30,000 --> 01:30:37,760 and C, which one is the best divider of the data, which one has 722 01:30:37,760 --> 01:30:42,880 or the other, or at least if it doesn't, which one divides it the 723 01:30:42,880 --> 01:30:53,920 is has the most defined boundary between the two different groups. 724 01:30:53,920 --> 01:31:02,079 pretty straightforward. It should be a right because a has a clear 725 01:31:02,079 --> 01:31:09,039 know, everything on this side of a is one label, it's negative and 726 01:31:09,039 --> 01:31:16,399 is the other label, it's positive. So what if I have a but then 727 01:31:16,399 --> 01:31:26,479 like this, and my C, maybe like this, sorry, they're kind of the 728 01:31:27,439 --> 01:31:38,559 But now which one is the best? So I would argue that it's still a, 729 01:31:38,560 --> 01:31:47,840 Right? And why is it still a? Because in these other two, look at 730 01:31:47,840 --> 01:31:57,119 to these points. Right? So if I had some new point that I wanted 731 01:31:57,119 --> 01:32:02,960 say I didn't have A or B. So let's say we're just working with C. 732 01:32:02,960 --> 01:32:10,960 that's right here. Or maybe a new point that's right there. Well, 733 01:32:10,960 --> 01:32:19,600 looking at this. I mean, without the boundary, that would probably 734 01:32:19,600 --> 01:32:27,520 right? I mean, it's pretty close to that other positive. So one 735 01:32:27,520 --> 01:32:36,320 is something known as the margin. 
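A sketch of the logistic regression step, mirroring the naive Bayes code above; lg_model and the data variables are assumptions from the earlier cells, and L2 is simply the default penalty used out of the box:

# Logistic regression from scikit-learn with the default L2 penalty
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

lg_model = LogisticRegression(penalty='l2')
lg_model = lg_model.fit(X_train, y_train)

y_pred = lg_model.predict(X_test)
print(classification_report(y_test, y_pred))  # roughly 77% accuracy in the video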
Okay, so not only do we want to 736 01:32:36,319 --> 01:32:43,119 well, we also care about the boundary in between where the points 737 01:32:43,119 --> 01:32:53,279 are, and the line that we're drawing. So in a line like this, the 738 01:32:53,279 --> 01:33:10,000 might be like here. And I'm trying to draw these perpendicular. 739 01:33:10,000 --> 01:33:22,399 if I switch over to these dotted lines, if I can draw this right. 740 01:33:22,399 --> 01:33:37,839 are what's known as the margins. Okay, so these both here, these 741 01:33:38,479 --> 01:33:43,039 And our goal is to maximize those margins. So not only do we want 742 01:33:43,039 --> 01:33:51,279 two different classes, we want the line that has the largest 743 01:33:51,279 --> 01:33:57,519 on the margin lines, the data. So basically, these are the data 744 01:33:57,520 --> 01:34:08,480 divider. These are what we call support vectors. Hence the name 745 01:34:08,479 --> 01:34:16,479 so the issue with SVM sometimes is that they're not so robust to 746 01:34:16,479 --> 01:34:25,839 if I had one outlier, like this up here, that would totally change 747 01:34:25,840 --> 01:34:31,920 vector to be, even though that might be my only outlier. Okay. So 748 01:34:31,920 --> 01:34:38,239 in mind. As you know, when you're working with SVM is, it might 749 01:34:38,239 --> 01:34:45,679 are outliers in your data set. Okay, so another example of SVMs 750 01:34:45,680 --> 01:34:50,480 data like this, I'm just going to use a one dimensional data set 751 01:34:50,479 --> 01:34:56,799 say we have a data set that looks like this. Well, our, you know, 752 01:34:56,800 --> 01:35:01,440 perpendicular to this line. But it should be somewhere along this 753 01:35:02,399 --> 01:35:09,119 anywhere like this. You might argue, okay, well, there's one here. 754 01:35:09,119 --> 01:35:13,840 draw another one over here, right? And then maybe you can have two 755 01:35:13,840 --> 01:35:21,680 SVMs work. But one thing that we can do is we can create some sort 756 01:35:21,680 --> 01:35:29,440 that one thing I forgot to do was to label where zero was. So 757 01:35:32,000 --> 01:35:36,800 Now, what I'm going to do is I'm going to say, okay, I'm going to 758 01:35:36,800 --> 01:35:44,560 have x, sorry, x zero and x one. So x zero is just going to be my 759 01:35:44,560 --> 01:35:56,880 x one equal to let's say, x squared. So whatever is this squared, 760 01:35:56,880 --> 01:36:02,960 you know, maybe somewhere here, here, just pretend that it's 761 01:36:02,960 --> 01:36:06,640 Right. And now my pluses might be something like 762 01:36:10,079 --> 01:36:16,079 that. And I'm going to run out of space over here. So I'm just 763 01:36:16,079 --> 01:36:27,600 use your imagination. But once I draw it like this, well, it's a 764 01:36:27,600 --> 01:36:35,520 right? Now our SVM could be maybe something like this, this. And 765 01:36:35,520 --> 01:36:41,600 our data set. Now it's separable where one class is this way. And 766 01:36:42,800 --> 01:36:49,360 Okay, so that's known as SVMs. I do highly suggest that, you know, 767 01:36:49,359 --> 01:36:54,399 mentioned, if you're interested in them, do go more in depth 768 01:36:54,399 --> 01:37:00,239 do we how do we find this hyperplane? Right? I'm not going to go 769 01:37:00,239 --> 01:37:05,840 because you're just learning what an SVM is. But it's a good idea 770 01:37:05,840 --> 01:37:13,039 technique behind finding, you know, what exactly are the are the 771 01:37:13,039 --> 01:37:19,519 that we're going to use. 
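A toy sketch of the one-dimensional example above, lifting x into (x, x squared) so the two classes become separable; the points are made up for illustration:

# Lift 1-D data into 2-D with a new feature x1 = x0**2
import numpy as np

x0 = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4], dtype=float)
y  = np.array([ 1,  1,  0,  0, 0, 0, 0, 1, 1])   # class 1 on the outside, class 0 in the middle

X_lifted = np.column_stack([x0, x0 ** 2])        # second feature is x0 squared
# In the lifted space a horizontal boundary like x1 = 5 now separates the classes:
print((X_lifted[:, 1] > 5).astype(int))          # matches y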
So anyways, this transformation that we 772 01:37:19,520 --> 01:37:26,560 as the kernel trick. So when we go from x to some coordinate x, 773 01:37:27,119 --> 01:37:31,599 what we're doing is we are applying a kernel. So that's why it's 774 01:37:33,279 --> 01:37:40,159 So SVMs are actually really powerful. And you'll see that here. So 775 01:37:40,159 --> 01:37:48,800 to import SVC. And SVC is our support vector classifier. So with 776 01:37:49,600 --> 01:37:59,840 we are going to, you know, create SVC model. And we are going to, 777 01:37:59,840 --> 01:38:06,560 could have just copied and pasted this, I should be able to do 778 01:38:06,560 --> 01:38:10,480 again, fit this to X train, I could have just copied and pasted 779 01:38:10,479 --> 01:38:23,119 done that. Okay, taking a bit longer. All right. Let's predict 780 01:38:23,760 --> 01:38:28,880 let's see if I can hover over this. Right. So again, you see a lot 781 01:38:28,880 --> 01:38:37,119 parameters here that you can go back and change if you were 782 01:38:37,119 --> 01:38:46,319 but in this specific case, we'll just use it out of the box again. 783 01:38:46,319 --> 01:38:53,119 you'll note that Wow, the accuracy actually jumps to 87% with the 784 01:38:53,119 --> 01:38:59,199 there's nothing less than, you know, point eight, which is great. 785 01:38:59,199 --> 01:39:03,359 I mean, everything's at 0.9, which is higher than anything that we 786 01:39:06,640 --> 01:39:11,360 So so far, we've gone over four different classification models, 787 01:39:11,359 --> 01:39:17,039 logistic regression, naive Bayes and cannon. And these are just 788 01:39:17,039 --> 01:39:23,760 them. Each of these they have different, you know, they have 789 01:39:23,760 --> 01:39:31,920 go and you can toggle. And you can try to see if that helps later 790 01:39:31,920 --> 01:39:40,800 they perform, they give us around 70 to 80% accuracy. Okay, with 791 01:39:40,800 --> 01:39:45,440 let's see if we can actually beat that using a neural net. Now the 792 01:39:45,439 --> 01:39:51,839 I wanted to talk about is known as a neural net or neural network. 793 01:39:51,840 --> 01:39:58,480 like this. So you have an input layer, this is where all your 794 01:39:58,479 --> 01:40:03,199 all these arrows pointing to some sort of hidden layer. And then 795 01:40:03,199 --> 01:40:10,559 sort of output layer. So what is what is all this mean? Each of 796 01:40:10,560 --> 01:40:18,160 something known as a neuron. Okay, so that's a neuron. In a neural 797 01:40:18,159 --> 01:40:23,199 features that we're inputting into the neural net. So that might 798 01:40:23,840 --> 01:40:28,880 x n. Right. And these are the features that we talked about there, 799 01:40:28,880 --> 01:40:38,720 the pregnancy, the BMI, the age, etc. Now all of these get 800 01:40:38,720 --> 01:40:44,240 are multiplied by some w number that applies to that one specific 801 01:40:44,239 --> 01:40:51,840 feature. So these two get multiplied. And the sum of all of these 802 01:40:51,840 --> 01:40:58,400 so basically, I'm taking w zero times x zero. And then I'm adding 803 01:40:58,399 --> 01:41:05,359 I'm adding you know, x two times w two, etc, all the way to x n 804 01:41:05,359 --> 01:41:11,199 input into the neuron. Now I'm also adding this bias term, which 805 01:41:11,199 --> 01:41:17,199 to shift this by a little bit. So I might add five or I might add 806 01:41:17,199 --> 01:41:24,960 I don't know. But we're going to add this bias term. 
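A sketch of the support vector classifier step; as before, the variable names are assumptions carried over from the earlier cells and the model is used out of the box:

# Support vector classifier from scikit-learn (default RBF kernel)
from sklearn.svm import SVC
from sklearn.metrics import classification_report

svm_model = SVC()
svm_model = svm_model.fit(X_train, y_train)

y_pred = svm_model.predict(X_test)
print(classification_report(y_test, y_pred))  # ~87% accuracy in the video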
And the 807 01:41:24,960 --> 01:41:31,279 the sum of this, this, this and this, go into something known as 808 01:41:31,279 --> 01:41:38,960 okay. And then after applying this activation function, we get an 809 01:41:38,960 --> 01:41:44,399 neuron would look like. Now a whole network of them would look 810 01:41:46,000 --> 01:41:53,760 So I kind of gloss over this activation function. What exactly is 811 01:41:53,760 --> 01:41:58,720 looks like if we have all our inputs here. And let's say all of 812 01:41:58,720 --> 01:42:08,159 of addition, right? Then what's going on is we're just adding a 813 01:42:08,159 --> 01:42:13,840 the some sort of weight times these input layer a bunch of times. 814 01:42:13,840 --> 01:42:22,000 and factor that all out, then this entire neural net is just a 815 01:42:22,000 --> 01:42:27,840 layers, which I don't know about you, but that just seems kind of 816 01:42:27,840 --> 01:42:33,279 literally just write that out in a formula, why would we need to 817 01:42:33,279 --> 01:42:40,000 we wouldn't. So the activation function is introduced, right? So 818 01:42:40,000 --> 01:42:46,880 function, this just becomes a linear model. An activation function 819 01:42:46,880 --> 01:42:52,880 this. And as you can tell, these are not linear. And the reason 820 01:42:52,880 --> 01:42:58,480 our entire model doesn't collapse on itself and become a linear 821 01:42:58,479 --> 01:43:04,079 something known as a sigmoid function, it runs between zero and 822 01:43:04,079 --> 01:43:10,720 one all the way to one. And this is ReLU, which anything less than 823 01:43:10,720 --> 01:43:18,640 greater than zero is linear. So with these activation functions, 824 01:43:18,640 --> 01:43:24,160 is no longer just the linear combination of these, it's some sort 825 01:43:24,159 --> 01:43:32,880 that the input into the next neuron is, you know, it doesn't it 826 01:43:32,880 --> 01:43:39,920 become linear, because we've introduced all these nonlinearities. 827 01:43:39,920 --> 01:43:45,440 model, the loss, right? And then we do this thing called training, 828 01:43:45,439 --> 01:43:53,199 back into the model, and make certain adjustments to the model to 829 01:43:55,199 --> 01:43:59,359 Let's talk a little bit about the training, what exactly goes on 830 01:44:00,720 --> 01:44:07,600 Let's go back and take a look at our L2 loss function. This is 831 01:44:07,600 --> 01:44:15,840 looks like it's a quadratic formula, right? Well, up here, the 832 01:44:15,840 --> 01:44:23,199 large. And our goal is to get somewhere down here, where the loss 833 01:44:23,199 --> 01:44:30,720 means that our predicted value is closer to our true value. So 834 01:44:30,720 --> 01:44:39,680 this way. Okay. And thanks to a lot of properties of math, 835 01:44:39,680 --> 01:44:53,680 gradient descent, in order to follow this slope down this way. 836 01:44:53,680 --> 01:45:02,560 different slopes with respect to some value. Okay, so the loss 837 01:45:03,119 --> 01:45:12,479 w zero, versus w one versus w n, they might all be different. 838 01:45:12,479 --> 01:45:18,319 think about it is, to what extent is this value contributing to 839 01:45:18,319 --> 01:45:24,399 figure that out through some calculus, which we're not going to 840 01:45:24,399 --> 01:45:29,599 But if you want to learn more about neural nets, you should 841 01:45:29,600 --> 01:45:35,360 and figure out what exactly back propagation is doing, in order to 842 01:45:35,359 --> 01:45:41,759 how much do we have to backstep by. 
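A minimal sketch of the single neuron just described: a weighted sum of the inputs plus a bias term, passed through a nonlinear activation; the numbers here are hypothetical:

# One neuron: activation(w . x + b)
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neuron(x, w, b, activation=relu):
    # x: input features, w: one weight per feature, b: bias term
    return activation(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])      # hypothetical feature values
w = np.array([0.5, -0.25, 0.1])    # hypothetical weights
print(neuron(x, w, b=0.2, activation=relu))     # 0.5 - 0.5 + 0.3 + 0.2 = 0.5
print(neuron(x, w, b=0.2, activation=sigmoid))  # sigmoid(0.5) ~ 0.62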
So the thing is here, you 843 01:45:41,760 --> 01:45:48,480 this curve at all of these different points. And the closer we get 844 01:45:48,479 --> 01:45:57,839 this step becomes. Now stick with me here. So my new value, this 845 01:45:57,840 --> 01:46:04,800 I'm going to take w zero, and I'm going to set some new value for 846 01:46:04,800 --> 01:46:12,800 set for that is the old value of w zero, plus some factor, which 847 01:46:13,680 --> 01:46:22,400 times whatever this arrow is. So that's basically saying, okay, 848 01:46:23,039 --> 01:46:30,000 and just decrease it this way. So I guess increase it in this 849 01:46:30,000 --> 01:46:34,640 this direction. But this alpha here is telling us, okay, don't 850 01:46:34,640 --> 01:46:38,800 just in case we're wrong, take a small step, take a small step in 851 01:46:38,800 --> 01:46:45,760 closer. And for those of you who, you know, do want to look more 852 01:46:45,760 --> 01:46:51,840 the reason why I use a plus here is because this here is the 853 01:46:51,840 --> 01:46:54,720 just the if you were to use the actual gradient, this should be a 854 01:46:54,720 --> 01:47:00,560 Now this alpha is something that we call the learning rate. Okay, 855 01:47:00,560 --> 01:47:07,280 we're taking steps. And that might, you know, tell our that that 856 01:47:07,840 --> 01:47:13,039 how long it takes for our neural net to converge. Or sometimes if 857 01:47:13,039 --> 01:47:21,840 diverge. But with all of these weights, so here I have w zero, w 858 01:47:21,840 --> 01:47:29,840 update to all of them after we calculate the loss, the gradient of 859 01:47:29,840 --> 01:47:37,680 weight. So that's how back propagation works. And that is 860 01:47:37,680 --> 01:47:42,880 calculate the loss, we're calculating gradients, making 861 01:47:42,880 --> 01:47:50,480 all the all the weights to something adjusted slightly. And then 862 01:47:50,479 --> 01:47:55,119 gradient. And then we're saying, Okay, let's take the training set 863 01:47:55,119 --> 01:48:01,840 again, and go through this loop all over again. So for machine 864 01:48:01,840 --> 01:48:09,039 libraries that we use, right, we've already seen SK learn. But 865 01:48:09,039 --> 01:48:19,920 networks, this is kind of what we're trying to program. And it's 866 01:48:19,920 --> 01:48:25,760 do this from scratch, because not only will we probably have a lot 867 01:48:25,760 --> 01:48:30,159 not going to be fast enough, right? Wouldn't it be great if there 868 01:48:30,800 --> 01:48:35,760 full time professionals that are dedicated to solving this 869 01:48:35,760 --> 01:48:43,360 just give us their code that's already running really fast? Well, 870 01:48:43,359 --> 01:48:49,359 And that's why we use TensorFlow. So TensorFlow makes it really 871 01:48:49,359 --> 01:48:55,599 we also have enough control over what exactly we're feeding into 872 01:48:55,600 --> 01:49:02,640 this line here is basically saying, Okay, let's create a 873 01:49:02,640 --> 01:49:08,000 just, you know, what we've seen here, it just goes one layer to 874 01:49:08,000 --> 01:49:13,359 a dense layer means that all of them are interconnected. So here, 875 01:49:13,359 --> 01:49:19,839 nodes, and this one's all these, and then this one gets connected 876 01:49:19,840 --> 01:49:26,800 So we're going to create 16 dense nodes with relu activation 877 01:49:26,800 --> 01:49:34,000 to create another layer of 16 dense nodes with relu activation. 878 01:49:34,000 --> 01:49:43,199 to be just one node. Okay. 
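A small sketch of one weight being trained by gradient descent on an L2-style loss, as described above; the data, starting weight, and learning rate are made up for illustration:

# Gradient descent on mean squared error for a single weight w in y_hat = w * x
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])        # the true relationship here is y = 2x

w = 0.0                               # initial weight
alpha = 0.1                           # learning rate: how big a step to take

for step in range(50):
    y_hat = w * x
    grad = np.mean(2 * (y_hat - y) * x)   # gradient of the L2 loss with respect to w
    w = w - alpha * grad                  # step in the negative gradient direction

print(w)                              # converges toward 2.0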
And that's how easy it is to define 879 01:49:43,199 --> 01:49:51,199 is an open source library that helps you develop and train your ML 880 01:49:51,199 --> 01:49:57,119 for a neural net. So we're using a neural net for classification. 881 01:49:58,239 --> 01:50:03,840 we are going to use TensorFlow, and I don't think I imported that 882 01:50:03,840 --> 01:50:18,400 that down here. So I'm going to import TensorFlow as TF. And 883 01:50:19,279 --> 01:50:28,159 is going to be, I'm going to use this. So essentially, this is 884 01:50:28,159 --> 01:50:35,039 things that I'm about to pass in. So yeah, layer them linear stack 885 01:50:35,760 --> 01:50:40,560 And what that means, nope, not that. So what that means is I can 886 01:50:42,720 --> 01:50:46,560 some sort of layer, and I'm just going to use a dense layer. 887 01:50:46,560 --> 01:50:56,560 Oops, dot dense. And let's say we have 32 units. Okay, I will 888 01:51:01,279 --> 01:51:09,599 set the activation as really. And at first we have to specify the 889 01:51:09,600 --> 01:51:19,680 and comma. Alright. Alright, so that's our first layer. Now our 890 01:51:19,680 --> 01:51:28,880 another dense layer of 32 units all using relu. And that's it. So 891 01:51:28,880 --> 01:51:35,760 just going to be my output layer, it's going to just be one node. 892 01:51:35,760 --> 01:51:43,119 be sigmoid. So if you recall from our logistic regression, what 893 01:51:43,119 --> 01:51:49,599 a sigmoid, it looks something like this, right? So by creating a 894 01:51:49,600 --> 01:51:56,720 we're essentially projecting our predictions to be zero or one, 895 01:51:57,439 --> 01:52:03,279 And that's going to help us, you know, we can just round to zero 896 01:52:03,279 --> 01:52:12,000 Okay. So this is my neural net model. And I'm going to compile 897 01:52:12,000 --> 01:52:17,520 we have to compile it. It's really cool, because I can just 898 01:52:17,520 --> 01:52:23,840 I want, and it'll do it. So here, if I go to optimizers, I'm 899 01:52:24,720 --> 01:52:31,039 And you'll see that, you know, the learning rate is 0.001. So I'm 900 01:52:31,039 --> 01:52:44,800 So 0.001. And my loss is going to be binary cross entropy. And the 901 01:52:44,800 --> 01:52:50,079 include on here, so it already will consider loss, but I'm, I'm 902 01:52:50,079 --> 01:52:55,600 So we can actually see that in a plot later on. Alright, so I'm 903 01:52:55,600 --> 01:53:01,760 And one thing that I'm going to also do is I'm going to define 904 01:53:01,760 --> 01:53:06,800 actually copying and pasting this, I got these from TensorFlow. So 905 01:53:06,800 --> 01:53:13,119 tutorial, they actually have these, this like, defined. And that's 906 01:53:13,119 --> 01:53:18,239 So I'm actually going to move this cell up, run that. So we're 907 01:53:18,239 --> 01:53:23,519 over all the different epochs. epochs means like training cycles. 908 01:53:23,520 --> 01:53:27,680 means like training cycles. And we're going to plot the accuracy 909 01:53:28,960 --> 01:53:36,079 Alright, so we have our model. And now all that's left is, let's 910 01:53:37,199 --> 01:53:42,720 So I'm going to say history. So TensorFlow is great, because it 911 01:53:42,720 --> 01:53:47,680 of the training, which is why we can go and plot it later on. Now 912 01:53:47,680 --> 01:53:59,280 this neural net model. And fit that with x train, y train, I'm 913 01:53:59,279 --> 01:54:06,159 equal to let's say just let's just use 100 for now. And the batch 914 01:54:06,159 --> 01:54:18,159 let's say 32. Alright. 
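A sketch of the model definition and compile step described here; the input shape of 10 (the number of features in the magic data set) is an assumption and would normally just be X_train.shape[1]:

# Two hidden layers of 32 relu nodes and a single sigmoid output node
import tensorflow as tf

nn_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),   # squashes the output to (0, 1)
])

nn_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)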
And the validation split. So what the 915 01:54:18,159 --> 01:54:23,920 here somewhere. Okay, so yeah, this validation split is just the 916 01:54:23,920 --> 01:54:31,119 to be used as validation data. So essentially, every single epoch, 917 01:54:31,119 --> 01:54:37,199 saying, leave certain if this is point two, then leave 20% out. 918 01:54:37,199 --> 01:54:42,559 model performs on that 20% that we've left out. Okay, so it's 919 01:54:42,560 --> 01:54:48,800 set. But TensorFlow does it on our training data set during the 920 01:54:48,800 --> 01:54:54,640 outside of just our validation data set to see, you know, what's 921 01:54:54,640 --> 01:55:05,760 I'm going to make that 0.2. And we can run this. So if I run that, 922 01:55:05,760 --> 01:55:13,760 to set verbose equal to zero, which means, okay, don't print 923 01:55:13,760 --> 01:55:19,680 for 100 epochs might get kind of annoying. So I'm just going to 924 01:55:19,680 --> 01:55:31,039 and then we'll see what happens. Cool, so it finished training. 925 01:55:31,039 --> 01:55:36,960 because you know, I've already defined these two functions, I can 926 01:55:36,960 --> 01:55:45,199 oops, loss of that history. And I can also plot the accuracy 927 01:55:45,199 --> 01:55:52,239 So this is a little bit ish what we're looking for. We definitely 928 01:55:52,239 --> 01:55:59,119 decreasing loss and an increasing accuracy. So here we do see 929 01:55:59,119 --> 01:56:07,199 accuracy improves from around point seven, seven or something all 930 01:56:07,199 --> 01:56:16,880 point, maybe eight one. And our loss is decreasing. So this is 931 01:56:16,880 --> 01:56:23,359 loss and accuracy is performing worse than the training loss or 932 01:56:23,359 --> 01:56:28,479 our model is training on that data. So it's adapting to that data. 933 01:56:28,479 --> 01:56:35,759 you know, stuff that it hasn't seen yet. So, so that's why. So in 934 01:56:35,760 --> 01:56:40,159 we could change a bunch of the parameters, right? Like I could 935 01:56:40,159 --> 01:56:46,960 a row of 64 nodes, and then 32, and then one. So I can change some 936 01:56:47,680 --> 01:56:53,039 And a lot of machine learning is trying to find, hey, what do we 937 01:56:54,399 --> 01:57:02,079 So what I'm actually going to do is I'm going to rewrite this so 938 01:57:02,079 --> 01:57:08,079 known as a grid search. So we can search through an entire space 939 01:57:08,079 --> 01:57:19,199 we have 64 nodes and 64 nodes, or 16 nodes and 16 nodes, and so 940 01:57:19,199 --> 01:57:26,639 we can, you know, we can change this learning rate, we can change 941 01:57:26,640 --> 01:57:33,039 you know, the batch size, all these things might affect our 942 01:57:33,039 --> 01:57:42,000 I'm also going to add what's known as a dropout layer in here. And 943 01:57:42,000 --> 01:57:51,119 saying, hey, randomly choose with at this rate, certain nodes, and 944 01:57:51,119 --> 01:57:59,760 in a certain iteration. So this helps prevent overfitting. Okay, 945 01:57:59,760 --> 01:58:06,720 define this as a function called train model, we're going to pass 946 01:58:07,920 --> 01:58:15,760 the number of nodes, the dropout, you know, the probability that 947 01:58:15,760 --> 01:58:27,199 learning rate. So I'm actually going to say lr batch size. And we 948 01:58:27,199 --> 01:58:34,319 right? I mentioned that as a parameter. So indent this, so it goes 949 01:58:34,319 --> 01:58:40,799 I'm going to set this equal to number of nodes. And now with the 950 01:58:40,800 --> 01:58:48,720 to set dropout prob. 
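A sketch of the training call with a validation split, plus simplified stand-ins for the loss and accuracy plotting helpers copied from the TensorFlow tutorial (the exact helper code is not shown in the transcript, so these are approximations):

import matplotlib.pyplot as plt

history = nn_model.fit(
    X_train, y_train,
    epochs=100, batch_size=32,
    validation_split=0.2,   # hold 20% of the training data out each epoch for validation
    verbose=0,              # don't print a progress line for every epoch
)

def plot_loss(history):
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.xlabel('Epoch'); plt.ylabel('Binary crossentropy'); plt.legend(); plt.grid(True)
    plt.show()

def plot_accuracy(history):
    plt.plot(history.history['accuracy'], label='accuracy')
    plt.plot(history.history['val_accuracy'], label='val_accuracy')
    plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.legend(); plt.grid(True)
    plt.show()

plot_loss(history)
plot_accuracy(history)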
So now you know, the probability of turning 951 01:58:48,720 --> 01:58:55,360 is equal to dropout prob. And I'm going to keep the output layer 952 01:58:55,359 --> 01:59:00,479 but this here is now going to be my learning rate. And I still 953 01:59:00,479 --> 01:59:12,639 accuracy. We are actually going to train our model inside of this 954 01:59:12,640 --> 01:59:19,200 epochs equal epochs, and this is equal to whatever, you know, 955 01:59:19,199 --> 01:59:25,279 y train belong right here. Okay, so those are getting passed in as 956 01:59:25,279 --> 01:59:38,159 end, I'm going to return this model and the history of that model. 957 01:59:40,399 --> 01:59:46,399 is let's just go through all of these. So let's say let's keep 958 01:59:46,399 --> 01:59:53,279 do is I can say, hey, for a number of nodes in, let's say, let's 959 01:59:53,279 --> 02:00:02,960 happens for the different dropout probabilities. And I mean, zero 960 02:00:02,960 --> 02:00:17,199 Also, to see what happens. You know, for the learning rate in 961 02:00:17,199 --> 02:00:27,359 maybe we want to throw on 0.1 in there as well. And then for the 962 02:00:27,359 --> 02:00:33,119 64 as well. Actually, and let's also throw in 128. Actually, let's 963 02:00:33,680 --> 02:00:44,079 so 128 in there. That should be 01. I'm going to record the model 964 02:00:44,079 --> 02:00:54,640 train model here. So we're going to do x train y train, the number 965 02:00:54,640 --> 02:01:04,240 you know, the number of nodes that we've defined here, dropout, 966 02:01:04,239 --> 02:01:10,479 Okay. And then now we have both the model and the history. And 967 02:01:10,479 --> 02:01:18,079 I want to plot the loss for the history. I'm also going to plot 968 02:01:19,840 --> 02:01:22,640 Probably should have done them side by side, that probably would 969 02:01:26,319 --> 02:01:34,399 Okay, so what I'm going to do is split up, split this up. And that 970 02:01:34,399 --> 02:01:41,039 the subplots. So now this is just saying, okay, I want one row and 971 02:01:41,039 --> 02:01:56,000 plots. Okay, so I'm going to plot on my axis one, the loss. I 972 02:01:56,000 --> 02:02:04,640 work. Okay, we don't care about the grid. Yeah, let's let's keep 973 02:02:09,199 --> 02:02:14,800 So now on here, I'm going to plot all the accuracies on the second 974 02:02:20,159 --> 02:02:21,840 I might have to debug this a bit. 975 02:02:21,840 --> 02:02:27,680 We should be able to get rid of that. If we run this, we already 976 02:02:27,680 --> 02:02:36,800 in here. So if I just run it on this, okay, it has no attribute x 977 02:02:36,800 --> 02:02:47,680 it's like set x label or something. Okay, yeah, so it's, it's set 978 02:02:47,680 --> 02:02:54,480 So let's see if that works. All right, cool. Um, and let's 979 02:02:55,439 --> 02:02:59,919 Okay, so we can actually change the figure size that I'm gonna 980 02:02:59,920 --> 02:03:08,159 set that to. Oh, that's not the way I wanted it. Okay, so that 981 02:03:08,159 --> 02:03:13,920 And that's just going to be my plot history function. So now I can 982 02:03:15,279 --> 02:03:23,279 Here, I'm going to plot the history. And what I'm actually going 983 02:03:23,279 --> 02:03:26,079 I'm going to print out all these parameters. So I'm going to print 984 02:03:27,359 --> 02:03:34,960 the F string to print out all of this stuff. So here, I'm going to 985 02:03:34,960 --> 02:03:42,720 Uh, all of this stuff. 
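A sketch of the train_model helper with dropout layers, plus a combined plot_history helper using side-by-side subplots, as described above; the grid-search loop that calls these is sketched a bit further down:

# train_model builds, compiles, and fits a model for one hyperparameter combination
import tensorflow as tf
import matplotlib.pyplot as plt

def train_model(X_train, y_train, num_nodes, dropout_prob, lr, batch_size, epochs):
    nn_model = tf.keras.Sequential([
        tf.keras.layers.Dense(num_nodes, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dropout(dropout_prob),   # randomly turn off nodes to reduce overfitting
        tf.keras.layers.Dense(num_nodes, activation='relu'),
        tf.keras.layers.Dropout(dropout_prob),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                     loss='binary_crossentropy', metrics=['accuracy'])
    history = nn_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                           validation_split=0.2, verbose=0)
    return nn_model, history

def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))   # one row, two columns
    ax1.plot(history.history['loss'], label='loss')
    ax1.plot(history.history['val_loss'], label='val_loss')
    ax1.set_xlabel('Epoch'); ax1.set_ylabel('Binary crossentropy'); ax1.legend(); ax1.grid(True)
    ax2.plot(history.history['accuracy'], label='accuracy')
    ax2.plot(history.history['val_accuracy'], label='val_accuracy')
    ax2.set_xlabel('Epoch'); ax2.set_ylabel('Accuracy'); ax2.legend(); ax2.grid(True)
    plt.show()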
So here, I'm printing out how many nodes, 986 02:03:55,199 --> 02:03:57,519 And we already know how many you found, so I'm not even going to 987 02:03:57,520 --> 02:04:10,560 So once we plot this, uh, let's actually also figure out what the, 988 02:04:10,560 --> 02:04:15,680 losses on our validation set that we have that we created all the 989 02:04:16,720 --> 02:04:23,760 Alright, so remember, we created three data sets. Let's call our 990 02:04:23,760 --> 02:04:32,640 validation data with the validation data sets loss would be. And I 991 02:04:33,520 --> 02:04:38,160 let's say I want to record whatever model has the least validation 992 02:04:40,640 --> 02:04:45,360 first, I'm going to initialize that to infinity so that you know, 993 02:04:45,359 --> 02:04:53,599 So if I do float infinity, that will set that to infinity. And 994 02:04:53,600 --> 02:04:58,640 track of the parameters. Actually, it doesn't really matter. I'm 995 02:04:58,640 --> 02:05:06,480 the model. And I'm gonna set that to none. So now down here, if 996 02:05:06,479 --> 02:05:13,759 less than the least validation loss, then I am going to simply 997 02:05:13,760 --> 02:05:20,400 Hey, this validation for this least validation loss is now equal 998 02:05:21,600 --> 02:05:30,480 And the least loss model is whatever this model is that just 999 02:05:31,840 --> 02:05:40,319 So we are actually just going to let this run for a while. And 1000 02:05:40,319 --> 02:05:51,840 last model after that. So let's just run. All right, and now we 1001 02:05:51,840 --> 02:06:12,079 All right, so we've finally finished training. And you'll notice 1002 02:06:12,079 --> 02:06:19,039 actually gets to like 0.29. The accuracy is around 88%, which is 1003 02:06:19,039 --> 02:06:26,239 okay, why is this accuracy in this? Like, these are both the 1004 02:06:26,239 --> 02:06:30,319 is on the validation data set that we've defined at the beginning, 1005 02:06:30,319 --> 02:06:35,840 this is actually taking 20% of our tests, our training set every 1006 02:06:35,840 --> 02:06:41,199 and saying, Okay, how much of it do I get right now? You know, 1007 02:06:41,199 --> 02:06:46,880 train with any of that. So they're slightly different. And 1008 02:06:46,880 --> 02:06:52,640 that I probably you know, probably what I should have done is over 1009 02:06:54,640 --> 02:06:59,920 the model fit, instead of the validation split, you can define the 1010 02:07:00,479 --> 02:07:04,639 And you can pass in the validation data, I don't know if this is 1011 02:07:05,439 --> 02:07:09,439 that's probably what I should have done. But instead, you know, 1012 02:07:09,439 --> 02:07:16,719 we have here. So you'll see at the end, you know, with the 64 1013 02:07:16,720 --> 02:07:24,880 performance 64 nodes with a dropout of 0.2, a learning rate of 1014 02:07:25,439 --> 02:07:31,439 And it does seem like yes, the validation, you know, the fake 1015 02:07:34,000 --> 02:07:40,239 loss is decreasing, and then the accuracy is increasing, which is 1016 02:07:40,239 --> 02:07:45,039 so finally, what I'm going to do is I'm actually just going to 1017 02:07:45,039 --> 02:07:50,960 this model, which we've called our least loss model, I'm going to 1018 02:07:50,960 --> 02:07:58,159 and I'm going to predict x test on that. And you'll see that it 1019 02:07:58,159 --> 02:08:02,159 are really close to zero and some that are really close to one. 1020 02:08:02,159 --> 02:08:11,920 output. So if I do this, and what I can do is I can cast them. 
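A sketch of the grid-search loop with the least-validation-loss bookkeeping, using the train_model and plot_history helpers sketched above and the validation split created earlier in the notebook (assumed to be named X_valid, y_valid); the exact hyperparameter grids are partly cut off in the transcript, so the lists here are representative:

least_val_loss = float('inf')   # start at infinity so the first model always wins
least_loss_model = None
epochs = 100

for num_nodes in [16, 32, 64]:
    for dropout_prob in [0, 0.2]:
        for lr in [0.01, 0.005, 0.001]:
            for batch_size in [32, 64, 128]:
                print(f"{num_nodes} nodes, dropout {dropout_prob}, lr {lr}, batch size {batch_size}")
                model, history = train_model(X_train, y_train, num_nodes,
                                             dropout_prob, lr, batch_size, epochs)
                plot_history(history)
                val_loss = model.evaluate(X_valid, y_valid, verbose=0)[0]  # [loss, accuracy]
                if val_loss < least_val_loss:
                    least_val_loss = val_loss
                    least_loss_model = model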
So 1021 02:08:11,920 --> 02:08:20,239 greater than 0.5, set that to one. So if I actually, I think what 1022 02:08:22,399 --> 02:08:29,759 Oh, okay, so I have to cast that as type. And so now you'll see 1023 02:08:29,760 --> 02:08:40,560 actually going to transform this into a column as well. So here 1024 02:08:40,560 --> 02:08:49,280 I didn't mean to do that. Okay, no, I wanted to just reshape it to 1025 02:08:49,279 --> 02:08:57,599 Okay. And using that we can actually just rerun the classification 1026 02:08:57,600 --> 02:09:04,880 neural net output. And you'll see that okay, the the F ones are 1027 02:09:04,880 --> 02:09:12,560 seems like what happened here is the precision on class zero. So 1028 02:09:12,560 --> 02:09:19,840 but the recall decreased. But the F one score is still at a good 1029 02:09:19,840 --> 02:09:24,480 class, it looked like the precision decreased a bit the recall 1030 02:09:25,039 --> 02:09:31,439 That's also been increased. I think I interpreted that properly. I 1031 02:09:31,439 --> 02:09:37,839 work and we got a model that performs actually very, very 1032 02:09:37,840 --> 02:09:43,039 had earlier. And the whole point of this exercise was to 1033 02:09:43,039 --> 02:09:48,720 define your models. But it's also to say, hey, maybe, you know, 1034 02:09:48,720 --> 02:09:55,840 powerful, as you can tell. But sometimes, you know, an SVM or some 1035 02:09:55,840 --> 02:10:03,360 appropriate. But in this case, I guess it didn't really matter 1036 02:10:04,399 --> 02:10:10,639 accuracy score is still pretty good. So yeah, let's now move on to 1037 02:10:11,840 --> 02:10:17,039 We just saw a bunch of different classification models. Now let's 1038 02:10:17,039 --> 02:10:23,279 the other type of supervised learning. If we look at this plot 1039 02:10:23,279 --> 02:10:31,439 data points. And here we have our x value for those data points. 1040 02:10:31,439 --> 02:10:40,079 value, which is now our label. And when we look at this plot, 1041 02:10:40,079 --> 02:10:48,159 the line of best fit that best models this data. Essentially, 1042 02:10:48,159 --> 02:10:54,159 some new value of x that we don't have in our sample, we're trying 1043 02:10:54,159 --> 02:11:01,599 prediction for y be for that given x value. So that, you know, 1044 02:11:03,279 --> 02:11:08,399 I don't know. But remember, in regression that, you know, given 1045 02:11:08,399 --> 02:11:12,079 we're trying to predict some continuous numerical value for y. 1046 02:11:12,079 --> 02:11:21,199 In linear regression, we want to take our data and fit a linear 1047 02:11:21,199 --> 02:11:30,079 our linear model might look something along the lines of here. 1048 02:11:30,079 --> 02:11:41,119 considered as maybe our line of best fit. And this line is modeled 1049 02:11:41,119 --> 02:11:51,680 it down here, y equals b zero, plus b one x. Now b zero just means 1050 02:11:51,680 --> 02:11:58,880 extend this y down here, this value here is b zero, and then b one 1051 02:11:58,880 --> 02:12:08,880 line, defines the slope of this line. Okay. All right. So that's 1052 02:12:09,680 --> 02:12:17,119 for linear regression. And how exactly do we come up with that 1053 02:12:17,119 --> 02:12:23,279 with this linear regression? You know, we could just eyeball where 1054 02:12:23,279 --> 02:12:29,279 not very good at eyeballing certain things like that. I mean, we 1055 02:12:29,279 --> 02:12:37,519 better at giving us a precise value for b zero and b one. Well, 1056 02:12:37,520 --> 02:12:47,200 something known as a residual. 
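A sketch of turning the sigmoid outputs into hard 0/1 labels and re-running the classification report, as described here:

from sklearn.metrics import classification_report

y_pred = least_loss_model.predict(X_test)           # probabilities between 0 and 1
y_pred = (y_pred > 0.5).astype(int).reshape(-1,)    # threshold at 0.5 and flatten to match y_test

print(classification_report(y_test, y_pred))        # comparable to the SVM's ~87% in the video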
Okay, so residual, you might also 1057 02:12:47,199 --> 02:12:55,039 And what that means is, let's take some data point in our data 1058 02:12:55,039 --> 02:13:03,439 far off is our prediction from a data point that we already have. 1059 02:13:04,000 --> 02:13:15,119 this is 12345678. So this is y eight, let's call it, you'll see 1060 02:13:15,119 --> 02:13:23,039 I in order to represent, hey, just one of these points. Okay. So 1061 02:13:23,039 --> 02:13:30,720 would be the prediction. Oops, this here would be the prediction 1062 02:13:30,720 --> 02:13:35,199 with this hat. Okay, if it has a hat on it, that means hey, this 1063 02:13:35,199 --> 02:13:48,239 my prediction for you know, this specific value of x. Okay. Now 1064 02:13:48,239 --> 02:13:58,719 here between y eight and y hat eight. So y eight minus y hat 1065 02:13:58,720 --> 02:14:04,400 give us this here. And I'm just going to take the absolute value 1066 02:14:04,399 --> 02:14:08,879 the line, right, then you would get a negative value, but distance 1067 02:14:08,880 --> 02:14:14,560 just going to put a little hat, or we're going to put a little 1068 02:14:15,279 --> 02:14:23,519 And that gives us the residual or the error. So let me rewrite 1069 02:14:23,520 --> 02:14:32,960 to all the points, I'm going to say the residual can be calculated 1070 02:14:32,960 --> 02:14:39,279 So this just means the distance between some given point, and its 1071 02:14:39,279 --> 02:14:47,679 prediction on the line. So now, with this residual, this line of 1072 02:14:47,680 --> 02:14:55,840 decrease these residuals as much as possible. So now that we have 1073 02:14:55,840 --> 02:15:00,640 our line of best fit is trying to decrease the error as much as 1074 02:15:00,640 --> 02:15:07,840 data points. And that might mean, you know, minimizing the sum of 1075 02:15:07,840 --> 02:15:14,720 here, this is the sum symbol. And if I just stick the residual 1076 02:15:16,640 --> 02:15:21,200 it looks something like that, right. And I'm just going to say, 1077 02:15:21,199 --> 02:15:27,679 data set, so for all the different points, we're going to sum up 1078 02:15:27,680 --> 02:15:33,200 to try to decrease that with my line of best fit. So I'm going to 1079 02:15:33,199 --> 02:15:41,679 me the lowest value of this. Okay. Now in other, you know, 1080 02:15:41,680 --> 02:15:49,039 we might attach a squared to that. So we're trying to decrease the 1081 02:15:49,039 --> 02:16:03,519 And what that does is it just, you know, it adds a higher penalty 1082 02:16:03,520 --> 02:16:07,920 you know, points that are further off. So that is linear 1083 02:16:08,640 --> 02:16:15,520 this equation, some line of best fit that will help us decrease 1084 02:16:15,520 --> 02:16:19,920 with respect to all the data points that we have in our data set, 1085 02:16:19,920 --> 02:16:27,760 the best prediction for all of them. This is known as simple 1086 02:16:30,880 --> 02:16:39,520 And basically, that means, you know, our equation looks something 1087 02:16:39,520 --> 02:16:52,479 multiple linear regression, which just means that hey, if we have 1088 02:16:52,479 --> 02:16:58,559 think of our feature vectors, we have multiple values in our x 1089 02:16:58,559 --> 02:17:11,199 look something more like this. Actually, I'm just going to say 1090 02:17:11,200 --> 02:17:18,960 up with some coefficient for all of the different x values that I 1091 02:17:18,959 --> 02:17:23,039 might have noticed that I have some assumptions over here. 
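A small numeric sketch of the residuals and the squared error that least squares tries to minimize, with made-up data and a candidate line y_hat = b0 + b1 * x:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

b0, b1 = 0.0, 2.0                      # candidate intercept and slope
y_hat = b0 + b1 * x                    # predictions on the line

residuals = np.abs(y - y_hat)          # |y_i - y_hat_i| for each point
print(residuals.sum())                 # sum of the residuals
print(((y - y_hat) ** 2).sum())        # sum of squared residuals, which least squares minimizes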
And you 1092 02:17:23,040 --> 02:17:26,560 what in the world do these assumptions mean? So let's go over 1093 02:17:26,559 --> 02:17:31,119 So let's go over them. The first one is linearity. 1094 02:17:33,840 --> 02:17:38,399 And what that means is, let's say I have a data set. Okay. 1095 02:17:43,760 --> 02:17:50,960 Linearity just means, okay, my does my data follow a linear 1096 02:17:50,959 --> 02:17:59,279 increases? Or does y decrease at as x increases? Does so if y 1097 02:17:59,280 --> 02:18:04,720 rate as x increases, then you're probably looking at something 1098 02:18:04,719 --> 02:18:12,959 nonlinear data set? Let's say I had data that might look something 1099 02:18:12,959 --> 02:18:18,719 visually judging this, you might say, okay, seems like the line of 1100 02:18:18,719 --> 02:18:28,559 curve like this. Right. And in this case, we don't satisfy that 1101 02:18:29,680 --> 02:18:36,960 So with linearity, we basically just want our data set to follow 1102 02:18:39,280 --> 02:18:42,640 And independence, our second assumption 1103 02:18:42,639 --> 02:18:50,079 just means this point over here, it should have no influence on 1104 02:18:50,079 --> 02:18:55,039 or this point over here, or this point over here. So in other 1105 02:18:56,000 --> 02:19:03,440 all the samples in our data set should be independent. Okay, they 1106 02:19:03,440 --> 02:19:05,840 one another, they should not affect one another. 1107 02:19:05,840 --> 02:19:17,120 Okay, now, normality and homoscedasticity, those are concepts 1108 02:19:17,120 --> 02:19:31,120 I have a plot that looks something like this, and I have a plot 1109 02:19:31,120 --> 02:19:45,680 something like this. And my line of best fit is somewhere here, 1110 02:19:47,200 --> 02:19:52,000 In order to look at these normality and homoscedasticity 1111 02:19:52,000 --> 02:20:03,440 the residual plot. Okay. And what that means is I'm going to keep 1112 02:20:03,440 --> 02:20:09,360 of plotting now where they are relative to this y, I'm going to 1113 02:20:09,360 --> 02:20:19,200 going to plot y minus y hat like this. Okay. And now you know, 1114 02:20:19,200 --> 02:20:24,720 so it might be here, this one down here is negative, it might be 1115 02:20:25,840 --> 02:20:30,079 it's literally just a plot of how you know, the values are 1116 02:20:30,079 --> 02:20:42,879 fit. So it looks like it might, you know, look something like 1117 02:20:42,879 --> 02:20:55,279 residual plot. And what normality means, so our assumptions are 1118 02:20:59,280 --> 02:21:05,120 I might have butchered that spelling, I don't really know. But 1119 02:21:05,120 --> 02:21:12,960 saying, okay, these residuals should be normally distributed. 1120 02:21:12,959 --> 02:21:21,599 it should follow a normal distribution. And now what 1121 02:21:21,600 --> 02:21:28,399 of these points should remain constant throughout. So this spread 1122 02:21:28,399 --> 02:21:35,199 same as this spread over here. Now, what's an example of where you 1123 02:21:35,200 --> 02:21:43,920 not held? Well, let's say that our original plot actually looks 1124 02:21:46,479 --> 02:21:51,600 Okay, so now if we looked at the residuals for that, it might look 1125 02:21:51,600 --> 02:22:03,600 like that. And now if we look at this spread of the points, it 1126 02:22:03,600 --> 02:22:12,559 is not constant, which means that homoscedasticity, this 1127 02:22:12,559 --> 02:22:18,559 might not be appropriate to use linear regression. 
So that's just 1128 02:22:18,559 --> 02:22:25,680 we have a bunch of data points, we want to predict some y value 1129 02:22:25,680 --> 02:22:32,639 up with this line of best fit that best describes, hey, given some 1130 02:22:32,639 --> 02:22:43,039 guess of what y is. So let's move on to how do we evaluate a 1131 02:22:43,040 --> 02:22:49,600 measure that I'm going to talk about is known as mean absolute 1132 02:22:52,079 --> 02:22:59,039 for short, okay. And mean absolute error is basically saying, all 1133 02:22:59,040 --> 02:23:06,080 all the errors. So all these residuals that we talked about, let's 1134 02:23:06,079 --> 02:23:11,440 for all of them, and then take the average. And then that can 1135 02:23:11,440 --> 02:23:18,319 we. So the mathematical formula for that would be, okay, let's 1136 02:23:21,680 --> 02:23:27,440 Alright, so this is the distance. Actually, let me redraw a plot 1137 02:23:27,440 --> 02:23:41,440 suppose I have a data set, look like this. And here are all my 1138 02:23:41,440 --> 02:23:52,319 say my line looks something like that. So my mean absolute error 1139 02:23:52,319 --> 02:24:01,600 values. This was a mistake. So summing up all of these, and then 1140 02:24:01,600 --> 02:24:07,760 I have. So what would be all the residuals, it would be y i, 1141 02:24:08,639 --> 02:24:16,159 minus y hat i, so the prediction for that on here. And then we're 1142 02:24:16,159 --> 02:24:24,319 all of the different i's in our data set. Right, so i, and then we 1143 02:24:24,319 --> 02:24:29,119 we have. So actually, I'm going to rewrite this to make it a 1144 02:24:29,120 --> 02:24:33,680 whatever the first data point is all the way through the nth data 1145 02:24:33,680 --> 02:24:42,399 it by n, which is how many points there are. Okay, so this is our 1146 02:24:42,399 --> 02:24:50,479 telling us, okay, in on average, this is the distance between our 1147 02:24:50,479 --> 02:25:01,359 actual value in our training set. Okay. And mae is good because it 1148 02:25:01,360 --> 02:25:08,720 get this value here, we can literally directly compare it to 1149 02:25:08,719 --> 02:25:17,920 So let's say y is we're talking, you know, the prediction of the 1150 02:25:17,920 --> 02:25:24,719 dollars. Once we have once we calculate the mae, we can literally 1151 02:25:24,719 --> 02:25:34,319 price, the average, how much we're off by is literally this many 1152 02:25:34,319 --> 02:25:40,159 mean absolute error. An evaluation technique that's also closely 1153 02:25:40,159 --> 02:25:53,280 squared error. And this is MSE for short. Okay. Now, if I take 1154 02:25:53,280 --> 02:25:59,360 and move it down here, well, the gist of mean squared error is 1155 02:25:59,360 --> 02:26:06,159 of the absolute value, we're going to square. So now the MSE is 1156 02:26:06,159 --> 02:26:11,920 okay, let's sum up something, right, so we're going to sum up all 1157 02:26:13,280 --> 02:26:19,120 So now I'm going to do y i minus y hat i. But instead of absolute 1158 02:26:19,120 --> 02:26:25,360 I'm going to square them all. And then I'm going to divide by n in 1159 02:26:25,360 --> 02:26:33,200 basically, now I'm taking all of these different values, and I'm 1160 02:26:33,200 --> 02:26:42,079 them to one another. And then I divide by n. And the reason why we 1161 02:26:42,079 --> 02:26:49,680 is that it helps us punish large errors in the prediction. And 1162 02:26:49,680 --> 02:26:55,760 because of differentiability, right? 
So a quadratic equation is 1163 02:26:55,760 --> 02:27:00,719 if you're familiar with calculus, a quadratic equation is 1164 02:27:00,719 --> 02:27:05,279 value function is not totally differentiable everywhere. But if 1165 02:27:05,280 --> 02:27:10,560 don't worry about it, you won't really need it right now. And now 1166 02:27:10,559 --> 02:27:16,239 error is that once I calculate the mean squared error over here, 1167 02:27:16,239 --> 02:27:25,360 want to compare the values. Well, it gets a little bit trickier to 1168 02:27:25,360 --> 02:27:33,280 error is in terms of y squared, right? It's this is now squared. 1169 02:27:33,280 --> 02:27:40,079 you know, how many dollars off am I I'm talking how many dollars 1170 02:27:40,079 --> 02:27:45,440 you know, to humans, it doesn't really make that much sense. Which 1171 02:27:45,440 --> 02:27:53,600 something known as the root mean squared error. And I'm just going 1172 02:27:53,600 --> 02:28:02,559 because it's very, very similar to mean squared error. Except now 1173 02:28:03,280 --> 02:28:10,640 Okay, so this is our messy, and we take the square root of that 1174 02:28:10,639 --> 02:28:17,760 term in which you know, we're defining our error is now in terms 1175 02:28:17,760 --> 02:28:23,280 So that's a pro of root mean squared error is that now we can say, 1176 02:28:23,280 --> 02:28:30,320 to this metric is this many dollar signs off from our predictor. 1177 02:28:30,319 --> 02:28:37,680 which is one of the pros of root mean squared error. And now 1178 02:28:37,680 --> 02:28:43,200 of determination, or r squared. And this is a formula for r 1179 02:28:43,200 --> 02:28:55,200 to one minus RSS over TSS. Okay, so what does that mean? 1180 02:28:56,639 --> 02:29:03,920 of the squared residuals. So maybe it should be SSR instead, but 1181 02:29:03,920 --> 02:29:14,079 RSS sum of the squared residuals, and this is equal to if I take 1182 02:29:14,799 --> 02:29:24,799 and I take y i minus y hat, i, and square that, that is my RSS, 1183 02:29:24,799 --> 02:29:30,639 residuals. Now TSS, let me actually use a different color for 1184 02:29:30,639 --> 02:29:38,479 So TSS is the total sum of squares. 1185 02:29:41,040 --> 02:29:46,640 And what that means is that instead of being with respect to this 1186 02:29:52,079 --> 02:29:59,440 take each y value and just subtract the mean of all the y values, 1187 02:30:16,000 --> 02:30:23,040 actually, let's use a different color. Let's use green. If this 1188 02:30:24,799 --> 02:30:33,039 so RSS is giving me this measure here, right? It's giving me some 1189 02:30:33,040 --> 02:30:41,840 our regressor that we predicted. Actually, I'm gonna take this 1190 02:30:41,840 --> 02:30:52,639 and actually, I'm going to use red for that. Well, TSS, on the 1191 02:30:52,639 --> 02:30:59,039 how far off are these values from the mean. So if we literally 1192 02:30:59,040 --> 02:31:04,800 line of best fit, if we just took all the y values and average all 1193 02:31:04,799 --> 02:31:10,159 this is the average value for every single x value, I'm just going 1194 02:31:10,159 --> 02:31:16,000 instead, then it's asking, okay, how far off are all these points 1195 02:31:19,120 --> 02:31:26,079 Okay, and remember that this square means that we're punishing 1196 02:31:26,079 --> 02:31:32,959 they look somewhat close in terms of distance, the further a few 1197 02:31:32,959 --> 02:31:39,439 the larger our total sum of squares is going to be. 
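A sketch of the four evaluation metrics just discussed, computed with scikit-learn on made-up values:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

mae = mean_absolute_error(y_true, y_pred)   # average |y - y_hat|, in the same units as y
mse = mean_squared_error(y_true, y_pred)    # average (y - y_hat)^2, punishes large errors more
rmse = np.sqrt(mse)                         # square root puts the error back in the units of y
r2 = r2_score(y_true, y_pred)               # 1 - RSS/TSS, closer to 1 is better

print(mae, mse, rmse, r2)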
Sorry, so the total sum of squares is taking all of these values and saying, okay, what is my error if I didn't fit any regressor, and I literally just calculated the average of all the y values, and for every single x value I just predicted that average? If that's the best we can do, that means that maybe y and x aren't really associated with each other, so the best thing I can do for any new x value is just predict, hey, it's the average. And this total sum of squares is saying, okay, well, with respect to that average line, what is our error? Right? So up here, the sum of the squared residuals is asking, what is our error with respect to this line of best fit, and down here TSS is saying, what is the error with respect to just predicting the mean? So if our line of best fit is a better fit than the mean, then the sum of the squared residuals will be a lot smaller, which means that this numerator is going to be smaller than this denominator. And if our errors in our line of best fit are much smaller, then the value of RSS over TSS is going to be very small, which means that R squared goes towards one. And when R squared is towards one, that means that that's a sign of a pretty good predictor. It's one of the signs, not the only one. So over here, you might also see that there's this adjusted R squared. What that does is, when you have multiple terms, so x1, x2, x3, etc., it adjusts for how many extra terms we add, because every time you add an extra term, the plain R squared value will increase a little bit, since you can always explain y some more. But the value for the adjusted R squared only increases if the new term improves the model fit more than expected, you know, by chance. I'm not going to go into the exact formula for adjusted R squared, it's out of the scope of this one. And now, that's linear regression. Basically, I've covered what the model looks like, and, you know, how we use the error in order to find the line of best fit. Our computer can do all the calculations for us, which is nice; under the hood, it's trying to minimize that error, right? And then we've gone through different ways of actually evaluating a linear regression model and the pros and cons of each. So now let's look at an example. We're still on supervised learning, but now we'll talk about regression. So what happens when you don't just want to predict a category? What happens if you actually want to predict a certain value? So again, I'm on the UCI machine learning repository, and here I found this data set about bike sharing in Seoul. So this data set is predicting rental bike count, and here it's per hour. So what we're going to do, again, is go into the data folder, download this CSV file, and move over to Colab. I'm going to call this notebook FCC bikes regression, or something like that, I don't remember exactly what I called it.
Now I'm going to import a bunch of the same things as before, and I'm going to continue to import the oversampler and the scaler from the earlier notebook. I'm also just going to let you guys know that I have a few more things in here: copy is a library that lets us copy things, seaborn is a wrapper over matplotlib that makes it easy to plot certain things, and then just letting you know that we're also going to be using TensorFlow. Okay, so one more thing that we're also going to be using is the sklearn linear_model library. Actually, let me make my screen a little bigger. Awesome. Run this, and that'll import all the things that we need. And, you know, let's give some credit to where we got this data set, so let me paste the citation in here, and I will also give credit to this here. Okay, cool. All right, cool. So this is our data set, and again, these are the attributes that we have right here. So I'm actually going to go ahead and paste the column names in. Feel free to copy and paste this if you don't want me to read it out: it's bike count, hour, temp, humidity, wind, visibility, dew point temperature, radiation, rain, snow, and functional, whatever that means. Okay, so I'm going to import the file into Colab by dragging and dropping. All right. Now, one thing that you guys might run into is that you might actually have to open up the CSV, because there were, at least in my file, some unknown characters, so you might have to get rid of those. I don't know exactly what it was, but my computer wasn't recognizing it, so I got rid of that. So open up the CSV and get rid of some of those labels that are incorrect if you need to. After we've done that and imported it in here, I'm going to create the data frame. All right, so now what I can do is read that CSV file, so something like data dot csv. Okay, so now if I call data dot head, you can see the various labels, right, and the data in there. So I'm going to get rid of some of these columns that, you know, I don't care about: when I type this in, I'm going to drop maybe the date, the holiday, and the various seasons, so I'm just not going to care about those. Axis equals one means drop it from the columns. So now, okay, I guess you don't really notice it, but if I set the data frame's columns to our own labels and look at, you know, the first five things, then you'll see it's a lot easier to read. So another thing is, I'm actually going to change df functional. Remember that our computer doesn't look at language, we want it to be in zeros and ones. So here, I will say, well, if this is equal to yes, then that gets mapped as one, otherwise zero. Great. Cool.
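(Roughly, the setup described here looks like the sketch below. The file name, and the raw header names being dropped — Date, Holiday, Seasons — are my guesses at what the original CSV uses; dataset_cols is just the list of labels read out above.)

import pandas as pd

dataset_cols = ["bike_count", "hour", "temp", "humidity", "wind", "visibility",
                "dew_pt_temp", "radiation", "rain", "snow", "functional"]

# you may need to clean stray characters out of the raw header first, as mentioned above
df = pd.read_csv("SeoulBikeData.csv").drop(["Date", "Holiday", "Seasons"], axis=1)
df.columns = dataset_cols
df["functional"] = (df["functional"] == "Yes").astype(int)   # map Yes/No to 1/0
df.head()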
So the thing is, right now these bike counts are for every hour of the day. To make this example simpler, I'm just going to index on an hour, and we're only going to use that specific hour. So this data frame is going to be only the rows where the hour, let's say, equals 12, so it's noon. All right. And I'm actually going to now drop that column, since the hour is always equal to 12. So we run this cell. Okay, so now we got rid of the hour in here, and we're left with the temperature, humidity, wind, visibility, and yada, yada, yada. The next thing I'm going to do is actually plot all of these against the bike count. So, for each label in the data frame's columns, everything after the first thing, so that would give me all the columns from temperature onwards, these are all my features, right? I'm going to scatter each one against the bike count to see how that specific feature affects the count. So the bike count is on the y axis, and I plot whatever the specific feature is on the x. And I'm going to title this with whatever the label is, and, you know, set the y label to bike count at noon, and the x label as just the label. We don't even need the legend, so just show that plot. All right. So functional doesn't really give us any utility, and then snow and rain don't look that useful either. The temperature, you know, is fairly linear; dew point temperature looks related; visibility and wind, not much; humidity, kind of maybe like an inverse relationship. But the temperature, it definitely looks like there's a relationship between that and the number of bikes. So what I'm going to do is drop some of the ones that don't seem to matter: maybe wind, you know, visibility. Yeah, so I'm going to get rid of those. So now, data frame, and I'm going to drop wind, visibility, and functional; axis again is the column, so that's one. So if I look at my data frame now, I have the temperature, the humidity, the dew point temperature, radiation, rain and snow. Now what I want to do is split this into my training, my validation, and my test data sets, just as we talked about before. Here, we can use the exact same thing with numpy dot split and sample, you know, shuffling the whole data frame and splitting it. So I don't really care about, you know, the full stacked data right now, so maybe I'll use an underscore for that variable, but I will get my training x and y.
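(A rough sketch of the noon filter, the feature-vs-count plots, and the column drops described above; the column names match the renamed data frame from the previous step.)

import matplotlib.pyplot as plt

df = df[df["hour"] == 12]          # keep only the noon rows
df = df.drop(["hour"], axis=1)     # hour is now always 12, so drop it

for label in df.columns[1:]:       # every feature except bike_count
    plt.scatter(df[label], df["bike_count"])
    plt.title(label)
    plt.ylabel("Bike count at noon")
    plt.xlabel(label)
    plt.show()

# drop the features that didn't show much of a relationship
df = df.drop(["wind", "visibility", "functional"], axis=1)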
Actually, let's have a function for getting the x's and y's. So here, I'm going to define get x y, and I'm going to pass in the data frame, and I'm actually also going to pass in what the y label is, and what specific x labels I want to use, in case I only want certain columns and not the whole thing. So here, I'm actually going to make first a deep copy of the data frame, and that basically means I'm just copying everything over. If no x labels are passed in, so if not x labels, then all I'm going to do is say, all right, x is whatever the data frame is, and I'm just going to take all the columns c, for c in the data frame's columns, if c does not equal the y label, and get the values of that. Now, if the x labels have exactly one thing in them, so let me make a case where the length of x labels is equal to one, then what I'm going to do is just take the data frame at just that label, get the values, and I actually need to reshape it, so I'm going to pass in negative one comma one there. Otherwise, if there are specific x labels that I want to use, then I'm actually just going to take the data frame of those x labels, dot values, and that should suffice for extracting x. And in order to get my y, I'm going to do y equals the data frame at the y label, dot values. And at the very end, I'm going to say data equals np dot hstack, so x and y stacked next to each other, and I'll take x and y and return all of that. And I'm actually going to reshape y to make it 2D as well, so the hstack works. And I will return data, x, y. So now I should be able to say, okay, get x y of the train data frame, and the y label, so my y label is bike count. And actually, for now, let's just do one dimension. Earlier, we had seen that maybe, you know, the temperature dimension does seem correlated, so I'm going to use that to predict y. So I'm going to label these also with temp, for temperature. And I am also going to do this again for, oh, this should be validation, and this should be test. Because oh, that's val, and this should be test. Alright, so we run this, and now we have the train, validation, and test data sets for just the temperature. So if I look at x train temp, there it is. Okay, and I'm doing this first to show you simple linear regression. So let's create a regressor. I can say the temp regressor here, and then it's a linear regression model. And just like before, I can simply fit it on x train temp and y train temp in order to train this linear regression model. Alright, and then I can print out this regressor's coefficients and the intercept. So if I do that, I get the coefficient for whatever the temperature is, and then the intercept, right. And I can, you know, score it, so I can get the R squared by calling score on x test and y test.
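(Here is approximately what that get_xy helper, the 60/20/20 split, and the temperature-only regressor look like; the variable names are illustrative, not copied from the video.)

import copy
import numpy as np
from sklearn.linear_model import LinearRegression

def get_xy(dataframe, y_label, x_labels=None):
    dataframe = copy.deepcopy(dataframe)
    if x_labels is None:
        X = dataframe[[c for c in dataframe.columns if c != y_label]].values
    elif len(x_labels) == 1:
        X = dataframe[x_labels[0]].values.reshape(-1, 1)
    else:
        X = dataframe[x_labels].values
    y = dataframe[y_label].values.reshape(-1, 1)
    data = np.hstack((X, y))
    return data, X, y

# shuffle, then split 60 / 20 / 20 into train, validation and test
train, val, test = np.split(df.sample(frac=1), [int(0.6 * len(df)), int(0.8 * len(df))])

_, X_train_temp, y_train_temp = get_xy(train, "bike_count", x_labels=["temp"])
_, X_val_temp,   y_val_temp   = get_xy(val,   "bike_count", x_labels=["temp"])
_, X_test_temp,  y_test_temp  = get_xy(test,  "bike_count", x_labels=["temp"])

temp_reg = LinearRegression().fit(X_train_temp, y_train_temp)
print(temp_reg.coef_, temp_reg.intercept_)
print(temp_reg.score(X_test_temp, y_test_temp))   # R^2 on the test set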
All right, so it's an R squared of around point three or point four. Zero would mean, hey, there's absolutely no association. Whether that's good, it depends on the context, but the higher that number, the more the two variables are correlated, right? Which here, it's telling us there's maybe some association between the two. But the reason why I wanted to show you this is, you know, if we plotted it, this is what it would look like. So let's scatter the training data, so this is our data, and then let's get the line of best fit plotted too. Something that I can do is say the x values are np dot linspace, and this goes from negative 20 to 40, this piece of it, and let's take 100 things from there. So I'm going to plot x, and I'm going to take this, like, regressor and predict x with that. Okay, and this label is the fit, and this color, let's make this red, and let's actually set the line width so I can change how thick that line is. Okay. Now at the very end, all right, let's also create, you know, a title and all these things. So here, let's just say this would be bikes versus the temperature, the y label would be number of bikes, and the x label would be the temp. This might cause an error. Yeah, so it's expecting a 2D array, so we have to reshape. Okay, there we go. So I just had to make this an array and then reshape it, and now we see that, all right, this increases. But again, remember, with linear regression like this, I don't really know if a line fits this data super well. I wanted to show you guys, though, that, like, all right, this is what the fit on our data would look like. Okay. Now, we can do multiple linear regression too, so let's go ahead and do that as well. Now, if I take my data set, and instead of just the temperature I use everything in my current data set right now, alright, so let's just use all of the columns. So I'm going to just say, for the x labels, let's just take all the columns and remove the bike count. So does that work? So this part should work now. Oops, sorry. Okay, oh, but this shouldn't be called temperature anymore, we should actually call this all; let me rerun this piece here so that we have our temperature-only data set and then our all-features data set. Okay. And this regressor, I can do the same thing: I'm going to make this a linear regression, and I'm going to fit it on x train all and y train all. Okay. Alright, so let's go ahead and also score this and see how the R squared performs now. So if I test this on the test data set, the R squared seems to improve, it went from around point four to point five, something like that. And I can't necessarily plot, you know, every single dimension, but the score is enough to say, okay, this has improved, right?
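(A sketch of the plotting code and the multiple-regression step; the labels and limits are as described above, the rest of the naming is mine.)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# plot the data and the single-variable fit
x = np.linspace(-20, 40, 100).reshape(-1, 1)
plt.scatter(X_train_temp, y_train_temp, label="Data", color="blue")
plt.plot(x, temp_reg.predict(x), label="Fit", color="red", linewidth=3)
plt.title("Bikes vs Temp")
plt.ylabel("Number of bikes")
plt.xlabel("Temp")
plt.legend()
plt.show()

# multiple linear regression on all remaining features
x_labels = [c for c in df.columns if c != "bike_count"]
_, X_train_all, y_train_all = get_xy(train, "bike_count", x_labels=x_labels)
_, X_val_all,   y_val_all   = get_xy(val,   "bike_count", x_labels=x_labels)
_, X_test_all,  y_test_all  = get_xy(test,  "bike_count", x_labels=x_labels)

all_reg = LinearRegression().fit(X_train_all, y_train_all)
print(all_reg.score(X_test_all, y_test_all))   # R^2 improves over the temp-only model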
Alright, so one cool thing about TensorFlow is you can actually do regression, but with a neural net. We already have our training data for just the temperature and for all the different columns, so I'm not going to bother with splitting anything up again; let's go ahead and start building the model. So in this linear regression, it does help if we normalize the input, and that's very easy to do with a normalizer layer. So I'm going to do tensorflow keras layers Normalization, and the input shape for that will just be one, because let's just start with only the temperature, and the axis I will make none. Now for this temp normalizer, I forgot an equal sign there, I'm going to adapt this to x train temp, reshaped so it's one-dimensional. So that should work, great. Now with this model, so temp neural net model, I'm going to make this, you know, tf dot keras dot Sequential, and I'm going to pass in this normalizer, and then I'm going to say, hey, just give me one single dense layer with one unit. And what this is saying is, all right, well, one single node just means that it's taking a linear combination, and if we don't apply any sort of activation function to it, the output is also linear. So this is keras layers dot Dense, and I'm just going to have one unit. So with this model, let's compile. And for our optimizer, let's use Adam again, so optimizers dot Adam, and we have to pass in the learning rate, let's do 0.01. And now, the loss, we probably want mean squared error, so for the loss I'm going to do mean squared error. Okay, so we run that. And just like before, we can keep the history. So if I call fit, I can just fit it, and I'm going to take the x train for the temperature, but reshape it, and y train for the temperature. And I'm going to set verbose to zero so that it doesn't, you know, display stuff. I'm actually going to run this for epochs equal to 1000, and the validation data should be, let's pass in the validation set as a tuple. And I know I spelled that wrong, so let's just run this. And up here, I've copied and pasted the plot loss function from our earlier notebook, but changed the label to MSE, because now we're dealing with mean squared error. So we can plot the loss of this history after it's done, so let's just wait for it and then plot. Okay, so this actually looks pretty good, we see that the values are decreasing over the epochs.
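(The single-node TensorFlow model described here, roughly. The Normalization layer with input_shape=(1,) and axis=None matches what's being described; the rest of the names are my own.)

import tensorflow as tf

temp_normalizer = tf.keras.layers.Normalization(input_shape=(1,), axis=None)
temp_normalizer.adapt(X_train_temp.reshape(-1))

temp_nn_model = tf.keras.Sequential([
    temp_normalizer,
    tf.keras.layers.Dense(1)          # one linear unit, i.e. linear regression
])
temp_nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                      loss="mean_squared_error")

history = temp_nn_model.fit(
    X_train_temp.reshape(-1), y_train_temp,
    verbose=0, epochs=1000,
    validation_data=(X_val_temp.reshape(-1), y_val_temp))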
Now I'm going to go back up and take that plotting code from before, and here, instead of this temperature regressor, I'm going to use the neural net model to make the predictions. And if I run that, I can see that, you know, this also gives me a line, but you'll notice that this fit is not entirely the same as the one we got with scikit-learn up here. And that's due to the training process: there are different ways to try to find the best linear regressor. Here we're using backpropagation to train a neural net node, whereas in the other one, scikit-learn is probably just trying to actually compute the line of best fit directly. Okay. Now, we could repeat the exact same exercise with our multiple-feature linear regression, but I'm actually going to skip that part, I will leave that as an exercise. So now, what would happen if we used a neural net, a real neural net rather than just one single node, in order to predict this? So let's start on that. We still need the normalizer, so I'm actually going to take the same setup here, but instead of one dense layer, I'm going to set this equal to 32 units, and for the activation, relu. And now let's duplicate that. And for the final output, I just want one cell, and this activation is also going to be relu, because we can't have negative bike counts, so I'm just going to set that as relu for now. Okay. And at the bottom, I'm going to have this neural net model. So this neural net model, I'm going to compile, and I will actually use the same optimizer and loss, except instead of a learning rate of 0.01, I'll use 0.001. Okay. And I'm going to fit it. So the history is this neural net model dot fit, with x train temp and y train temp, and the validation data I'm going to set again equal to the validation set. Now, for the verbose, I'm going to say equal to zero, epochs, let's do 100, and batch size, actually, let's just not do a batch size right now, let's just leave it here. And again, we can plot the loss of this history after it's done. So let's run this. And that's not what we're supposed to get. So what is going on? Well, we have our temperature normalizer, which I'm wondering now if we adapted properly, so let's redo that. Okay, so now we do see this decline, it's an interesting looking curve. So this is our loss, which, all right, if it's decreasing, that's a good sign. And actually, what's interesting is, let's just plot this model's predictions. And you'll see that we actually have this, like, curve that looks flat at the bottom. Hmm, what if I got rid of this activation at the end? Let's train this again and plot it. Alright, so even when I got rid of that relu at the end, it still flattens out down there. It's not the best model; if we had maybe one more layer in here, that might help, and this is stuff that you get to play around with. When you're, you know, working with machine learning, you don't always know what the best model is going to be. For example, this one also isn't great, but it's okay.
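(Approximately what that deeper temperature model looks like; the 32-unit relu layers and the relu output are as described, the variable name temp_nn is mine.)

import tensorflow as tf

temp_nn = tf.keras.Sequential([
    temp_normalizer,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="relu")   # relu output: bike counts can't be negative
])
temp_nn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                loss="mean_squared_error")

history = temp_nn.fit(
    X_train_temp.reshape(-1), y_train_temp,
    validation_data=(X_val_temp.reshape(-1), y_val_temp),
    verbose=0, epochs=100)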
So my point is, though, that with a neural net, I mean, there's like no data down here, right? So it's kind of hard for the model to learn anything there; we probably should have started the prediction somewhere around where the data actually begins. But with this neural net model, you can see that this is no longer a straight line, we can get a curved estimate of the value, right? And we can repeat this exact same exercise with all of the features, so let's do that. Right. So if I now pass in all of the data, this is my all normalizer, and I should just be able to pass in that. So let's move this down to here; I'm going to pass in my all normalizer, and let's compile it. Great. So here, with the history, when we're trying to fit this model, we're going to use our larger data set with all the features. And of course, we want to plot the loss. Okay, so that's what our loss looks like, an interesting curve, but it's decreasing, so that's fine. So before, we saw that our R squared score was around point five with the multiple linear regressor; we're not going to measure that with the neural net anymore. But one thing that we can measure for both is the mean squared error, right? So if I come down here, and I compare the two mean squared errors: I can predict x test all with the multiple linear regressor, so these are my predictions for that, and then with the neural net model I predict x test all, and I get my two, you know, different sets of predictions. Okay. I'm actually going to do that at the bottom, so let me just move it down here. So now I'm going to calculate the mean squared error for both the linear regressor and the neural net. Okay, so this is my linear and this is my neural net prediction. Now, the mean squared error, right? If I want to get the mean squared error between y predicted and y real, I can do numpy dot square of y predicted minus y real, so this is basically squaring everything, and then I take this entire thing and take the mean of that; that should give me the mean squared error. And the y real is y test all, right? So that's my mean squared error for the linear regressor, and this is my mean squared error for the neural net. So that's throwing an error, I guess. My guess is that it's probably coming from this input shape: the shape should probably just be six. And okay, so that works now. The issue was that my inputs are, for every vector, only a one-dimensional thing of length six, so I should have just had six comma, which is a tuple of size one, a tuple containing one element, which is a six. Okay, so it seems like the neural net results have a larger mean squared error than the linear regressor.
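(A sketch of the all-features network and the MSE comparison; the (6,) input shape assumes the six remaining features mentioned above, and the helper name MSE is illustrative.)

import numpy as np
import tensorflow as tf

all_normalizer = tf.keras.layers.Normalization(input_shape=(6,), axis=-1)
all_normalizer.adapt(X_train_all)

nn_model = tf.keras.Sequential([
    all_normalizer,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1)
])
nn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss="mean_squared_error")
nn_model.fit(X_train_all, y_train_all,
             validation_data=(X_val_all, y_val_all),
             verbose=0, epochs=100)

def MSE(y_pred, y_real):
    return np.square(y_pred - y_real).mean()

y_pred_lr = all_reg.predict(X_test_all)    # multiple linear regression predictions
y_pred_nn = nn_model.predict(X_test_all)   # neural net predictions
print(MSE(y_pred_lr, y_test_all), MSE(y_pred_nn, y_test_all))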
One thing that we can look at is, we can actually plot the real results versus what the predictions are. So if I make some axes, and make these axes equal, then I can scatter the true y values on the x axis and the predictions on the y axis, and label this as the linear regression predictions. Okay, so then let me say the x axis label is the true values, and the y axis is going to be, actually, let's just make this predictions. And then at the end, I'm going to plot a reference line. Oh, let's set some limits, because I think 2500 is, like, approximately the max number of bikes. So I'm going to set my x limit to this and my y limit to this, and I'm going to pass that into the reference line here too. And all right, this here is the linear regressor. You see that actually they align quite well. I probably set the limit too high at 2500, it looks like maybe 1800 would have been enough. And I'm actually going to label the other one the neural net predictions. Let's add a legend. So you can see that our neural net, for the same points, is a little bit more spread out, and it seems like we tend to underestimate here in this area. Okay. And for some reason, these few are way off as well. But yeah, so we've basically used a linear regressor and a neural net to do regression. As for when a neural net is more appropriate and when a linear regressor is more appropriate, I think that it just comes with time and trying to figure out, you know, what works better. Like here, a multiple linear regressor actually performed better than the neural net. But, for example, with the one-dimensional temperature model, a linear regressor would never be able to see that curve. Okay. I mean, I'm not saying that curve was necessarily better, I'm just saying, like, hey, you know, sometimes it might be more appropriate to have something that isn't linear. So yeah, I will leave regression at that. Okay, so we just covered supervised learning. And in supervised learning, we have data, we have a bunch of different samples, but each of those samples has some sort of label attached to it: a category, a class, a value, etc. Right, we were able to use that label in order to train a model, and then try to predict the label of samples we haven't seen yet. Well, now let's move on to unsupervised learning. In unsupervised learning, we have a bunch of unlabeled data. And what can we do with that, can we learn anything from this data? So the first algorithm that we're going to talk about is k-means clustering. What k-means clustering is trying to do is it's trying to compute k clusters from the data. So in this example below, I have a bunch of scattered points, and the two axes are x zero and x one, which means I'm actually plotting two features of each point, but we don't know what the y label is for any of them. Just looking at these scattered points, we can kind of see how there are groups in there, right.
So depending on what we pick for k, we might have different clusters. If k is two, right, then we might pick, okay, this seems like it could be one cluster, and this looks like another cluster, so those might be our two different clusters. If we pick k equals three, for example, then okay, this seems like it could be a cluster, this could be a cluster, and maybe this could be a cluster, right. So we could have three clusters in this data set. Now, this k here is predefined, if I can spell that, by whoever is running the model. So that would be you. All right. And let's discuss how the computer actually goes through and computes the k clusters, so I'm going to write out the steps. Now, the first step that happens is we actually choose, well, the computer chooses, k random points on this plot to be the centroids. And by centroids, I just mean the centers of the clusters. Okay. So three random points, let's say we're doing k equals three, we choose three random points to be the centroids of the three clusters. If it were two, we'd choose two random points. Okay. So maybe the three random points I'm choosing are here, here, and here. All right. So we have three different centroids. Step two is we calculate the distance for each point to those centroids, so between all the points and each of the centroids. So basically, I'm saying, all right, this is this distance, this is that distance, all of these distances I'm computing between, oops, not those two, between each point and the centroids themselves. So I'm computing the distances for all of the points. Okay. And that comes with also assigning those points to the closest centroid. What do I mean by that? So let's take this point here, for example. I computed this distance, this distance, and this distance, and I'm saying, okay, this red one is the closest, so I'm actually going to put this point into the red cluster. And for all of these points, this one seems slightly closer to red, and this one too, right? Now for the blue, I actually wouldn't put any points in with blue; actually, that first one is closer to red. And now it seems like these ones over here are closer to green, so let's just put all of these into green here. So now we have, you know, our two, three, technically, centroids. So there's red, which is this group here, green is this group, and then blue is kind of just by itself, it didn't get any of the points yet. So the next step, three, that we do is we recompute the centroids. So we compute new centroids based on the points that we've assigned to each cluster. And by that, I just mean, okay, well, let's take the average of all the red points: where would the new centroid be? That's probably going to be somewhere around here. The blue one didn't have any points in there, so we won't touch it. And then the green one, we'd move it over here, oops, somewhere over here. Right.
So now, if I erase all the old assignments, I can go and actually redo step two over here, this assignment step. Alright, so I'm going to go back, iterate through all the points again, and recompute which of my three centroids each one is closest to. So let's see, these are definitely all still red, right? This one still looks a bit closer to red. Around this part, we actually start getting closer to the blues. So this one now seems closer to a blue than a green, this one as well, and the rest down here would belong to green. Okay, so now our three clusters would be this, this, and then this, right? Those are our three groups; sorry, those would be the three clusters. And step three again: we recompute the three centroids. So I'm going to get rid of this, this and this, and now the red one would be centered probably closer to this point here, the blue one moves up here, and then this green one would probably be somewhere, it's similar to before, but it seems like it'd be pulled down a bit, so probably around here. All right. And now, again, we go back and we compute the distances between all the points and the centroids, and then we assign them to the closest centroid. So this cluster, it's very clear, actually, let me just circle that. And this time it actually seems like this point is closer to this blue now, and maybe this point looks like it'd be blue too, so all these look blue. And the greens would probably be this cluster right here. So we go and recompute the centroids again: bam, this one probably moves to almost here, bam, and then the green one goes here-ish. Okay. And now we go back and we compute the assignments again. So red is still this, blue, I would argue, is now this cluster here, and green is the rest. Okay, so we go and we recompute the centroids, bam, bam. And, you know, at this point, if I were to go and assign all the points to clusters again, I would get the exact same assignment. And that's when we know that we can stop iterating between steps two and three: we've converged on some solution, we've reached some stable point. Once none of these points are really changing out of their clusters anymore, we can stop and say, hey, these are our three clusters. Okay. And this process actually has a name: it's called expectation maximization. This part where we're assigning the points to the closest centroid, this is our expectation step. And this part where we're recomputing the centroids, this is our maximization step. Okay, so that's k-means clustering. We use expectation maximization in order to compute the centroids, assign all the points to a cluster according to those centroids, and then we keep recomputing all of that until we reach some stable point where nothing is changing anymore. Alright, so that's our first example of unsupervised learning. And basically, what this is doing is finding some structure, some pattern, in the data.
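(The course uses a library implementation later on, but here is a small from-scratch numpy sketch of the expectation-maximization loop just described, purely to make the two steps concrete; it is not the code used in the video.)

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: pick k random points from the data as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # expectation step: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # maximization step: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break   # nothing moved, so we've converged
        centroids = new_centroids
    return labels, centroids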
So if I came up with another point, a new one, I can say, oh, it looks like that's closest to, if these are clusters A, B, and C, it's closest to cluster B, and so I would probably put it in cluster B. Okay, so we've learned something about the structure in the data based on just how the points are scattered, without any labels. Now, the second unsupervised learning technique that I'm going to talk about is principal component analysis. And the point of principal component analysis is that it's often used as a dimensionality reduction technique. So let me write that down: dimensionality reduction. And what do I mean by dimensionality reduction? If I have data with a bunch of dimensions, x1, x2, x3, x4, etc., can I just reduce that down to one dimension that still tells me a lot about how all these points are spread relative to one another? And that's what we're doing with principal component analysis. Let's say I have some points in a two-dimensional space. Okay, so these points might be spread, you know, something like this. Okay. So, for example, if this were something to do with housing, this here, x zero, might be, hey, years since it was built, right, and x one might be the square footage of the house. Alright, so, like, right now it's been, you know, 22 years since a house built in 2000 was built. Principal component analysis is just saying, alright, let's say we want to, you know, display something about our data, but we don't get to use both dimensions. How do we display, you know, how do we demonstrate, that this point is further from this point than from that point? And we can do that using principal component analysis. Now, take what you know about linear regression and just forget about it for a second, otherwise you might get confused. PCA is a way of trying to find the direction with the largest variance. So this principal component, what that means is, it's some direction in this space with the largest variance, okay, for when we want to show our data set without the two different dimensions. Like, let's say we have these two dimensions, and somebody's telling us, hey, you only get one dimension. What dimension do you want to show us? Okay, so let's say we want to reduce it to one dimension: what do we do? We want to project our data onto the direction with the largest variance. Alright, so in this case that might be a dimension that looks something like this. And you might say, okay, isn't that linear regression? We're not going to talk about linear regression here: we don't have a y value. In linear regression, this axis would be y, and we'd have a label for that. Instead, what we're doing is we're taking all of these points — that's not very visible, but take this direction right here — and what PCA is doing is saying, okay, map all of these points onto that direction. So the transformed data set would be here: the ones already on the line just stay where they are, and we put each projected point onto that line as our new one-dimensional data set.
Okay, it's not our prediction or anything like that. If somebody came to us and said, you only get one dimension, you only get one number to describe each of these 2D points, what number would you give us? This would be our new one-dimensional data set; this would be the number that we give them. And this direction is where our points are the most spread out, right? What if I took a different direction instead? Let me actually duplicate this so I don't have to rewrite anything, or so I don't have to erase and then redraw anything; let me get rid of that, and I just got rid of a point there too, so let me draw that back in. Alright, so if this were my original data set, what if I had projected onto something other than the PCA dimension? Okay, well, I then would have points that, let me use a different color, so if I were to draw a right angle to this other direction for every point and project onto it, it would look something like this. And just intuitively, looking at these two different projections, we can see that in the second one the points are squished a little bit closer together; that's a space with less variance, not the space with the largest variance. The thing about the largest-variance direction is that it will give us the most discrimination between all of the points: the larger the variance, the further spread out these points will likely be. Now, besides being the dimension with the largest variance, a different way to actually think about the dimension we should project onto is that it also happens to be the dimension that minimizes the residuals — the projection residuals. We take the residual from the point; so in linear regression, we were looking only at this vertical residual, the difference between the actual y and y hat. It's not that here. In principal component analysis, we take the distance from our current point in two-dimensional space, perpendicular, onto the projection line. And we're saying, alright, how big is that projection residual for each point, and we're trying to minimize the sum of those, which actually equates to finding this largest-variance dimension. So you can either look at it as minimizing, minimize, let me get rid of that, the projection residuals, so that's the stuff in orange, or as maximizing the variance between the points. Okay. And we're not really going to talk about, you know, how you actually calculate out the principal components, or what that involves; you need to understand some linear algebra for that, which I'm not going to cover in this class. But that's how you would compute them. So now, with this two-dimensional data set here, sorry, this one-dimensional data set that came from a 2D data set, we've boiled it down to one dimension.
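(She skips the linear algebra, but for the curious, here is one small numpy sketch of the projection idea: using SVD to get the largest-variance direction is my own choice of tool, not something covered in the course.)

import numpy as np

def project_onto_first_pc(X):
    # center the data, then find the direction of largest variance
    X_centered = X - X.mean(axis=0)
    # the top right-singular vector of the centered data points along that direction
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    direction = vt[0]
    # project every point onto that direction -> one number per point
    return X_centered @ direction, direction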
And we can do other things with it. Right, we can, like, now show the principal component versus y, rather than x zero and x one in two separate plots: now we can just say, oh, this is the principal component, and we can plot it against the y. Or, for example, if there were 100 different dimensions, and we can't visualize all of them, well, you could go and find the top five principal components, and those might be a lot more useful to you than 100 different feature vector values. So that's principal component analysis. Again, we're taking, you know, certain data that's multi-dimensional, and we're making some sort of estimation, like some guess about its structure, from fewer dimensions. It's like if we wanted to take, you know, a 3D thing, so like a sphere, but we only had a flat surface to draw it on: well, what's the best approximation that we can make? Oh, it's a circle. This is the same thing. It's saying, if we have something with all these dimensions and we can't show all of them, how do we boil it down to just one dimension while keeping the most information from those multiple dimensions? And that is exactly what you get when you minimize the projection residuals, or you maximize the variance. And that is PCA. So we'll leave it at that. Now, finally, let's move on to implementing the unsupervised learning techniques. Here, again, I'm on the UCI machine learning repository, and I found the seeds data set. You know, I have a bunch of wheat kernels that belong to three different varieties: Kama, Rosa and Canadian. And the different features that we have are measurements of geometric parameters of those wheat kernels: the area, perimeter, compactness, length, width, asymmetry, and the length of the kernel groove. Okay, so it's all real-valued, which is easy to work with. And what we're going to do is we're going to cluster, or I guess we're going to try to cluster, the different varieties of wheat using these features. So let's get started. I have a Colab notebook open again. Oh, first go to the data folder, download this, and then I'm going to go to the notebook, and let's get started. So the first thing to do is to import our usual libraries into the notebook, so I've done that here. Okay, and then we're going to use pandas, and I'm also going to import seaborn, because I'm going to use it to color the plots by a specific class. Okay. Great. So now the columns that we have in this data set are the area, the perimeter, the compactness, the length, the width, the asymmetry, and the length of the kernel groove, which I'm just going to call groove, and then the class, right, which variety the wheat kernels belong to. I'm going to load this using pandas read CSV, and it's called seeds dataset dot txt, read that into a data frame, and the names are equal to the columns I just defined. So let's do that. Oops, what did I call this, seeds data set text? Alright, so if I look at this data frame right now, you'll notice something funky. Okay. Everything is crammed under area, and these are all our numbers with some backslash t in between. The problem is that we haven't actually told pandas what the separator is, which we can fix, because here it's just a tab.
So in order to ensure that, like, all whitespace gets treated as a separator, we can actually pass the separator as a pattern, so that any spaces or tabs are going to get used as separators. So if I run that, now this is a proper data frame. So now let's actually go and, like, visualize this data. What I'm going to do is plot each of these features against one another. So in this case, pretend that we don't have the class, right? Pretend that this class column here, I'm just going to use it at the end to check that, like, hey, we can predict our classes using unsupervised learning; but in unsupervised learning, we don't actually have access to the labels. So let's plot these against one another and see what happens. So for some i in the range of the length of the columns minus one, because the class is in the columns, and then for j in, so take everything from i plus one onwards, you know, the next columns. So this will give us basically a grid of all the different pairs. Our x label is going to be column i, our y label is going to be column j. And I'm going to use seaborn this time, and I'm going to say scatterplot, where x is going to be our x label, y is going to be our y label, and our data is the data frame we're passing in. So what's interesting here is that we can say hue equals class: if I give this the class, it's going to separate the three classes into different hues. So now what we're doing is we're basically comparing, say, the perimeter and the compactness, but we're going to visualize, you know, what class each point belongs to. So let's go ahead, and I might have to call show. So, great. So basically, we can see that for some pairs, like area and perimeter, we get these three groups; for area and compactness, we get these groups too, though honestly a couple of them look somewhat similar. Right, so, wow, look at this one: we have the compactness and the asymmetry, and it looks like, well, it just looks like they're blobs, right? Sure, maybe class three is a bit separate, but one and two kind of look like they're on top of each other. Okay. Some of the other pairs might look slightly better in terms of clustering. But let's go ahead and use the clustering approach that we talked about and try to implement it.
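(Roughly, the loading and pair-plotting code looks like this; seeds_dataset.txt and the cols list follow what's described above, and the whitespace separator is passed as a regular expression.)

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

cols = ["area", "perimeter", "compactness", "length", "width", "asymmetry", "groove", "class"]
df = pd.read_csv("seeds_dataset.txt", names=cols, sep=r"\s+")   # treat any whitespace as the separator

# scatter every pair of features against each other, colored by the (held-out) class
for i in range(len(cols) - 1):
    for j in range(i + 1, len(cols) - 1):
        sns.scatterplot(x=cols[i], y=cols[j], hue="class", data=df)
        plt.show()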
So the first thing we're going to do is just straight-up clustering, so what we learned: k-means. So from sklearn, I'm going to import KMeans. Okay. And just to pick, you know, any x and any y, I'm just going to say, hey, perimeter and asymmetry could be a good pair, so x could be perimeter and y could be asymmetry. Okay. And for this, the x values, I'm going to just extract those two columns from the data frame and take the values. Alright, well, let's make a k-means model, and in this specific case, we know that the number of clusters is three. I'm going to fit this against this x that I've just defined right here. Once I fit it and create these clusters, one cool thing is I can actually say kmeans dot labels, and it'll give me, if I can type, what the predictions for all the clusters are. And our actual classes, oops, not that, come from the data frame's class column, and the values from those. We can actually compare the two: like, you know, in general, most of the zeros that it predicted are this first class here, and in general the twos are the twos here, and then this third group maps to three. Now remember, these are separate clusters, so the exact label numbers don't really matter: we can say map zero to one, map two to two, and map one to three, and, you know, our mapping would do fairly well. But we can actually visualize this. In order to do that, I'm going to create this cluster data frame. So I'm going to pass in a horizontally stacked array with x, so the two feature columns, and the clusters that I have here, but I'm going to reshape them so it's 2D. Okay. And the columns, the labels for that, are going to be x, y, and class. Now I'm going to go ahead and do that same seaborn scatter plot, again where x is x, y is y, the hue is again the class, and the data is now this cluster data frame. So this here is my k-means, like, I guess, classes. So the k-means result kind of looks like this. If I come down here and I plot the original classes with respect to this specific x and y, it looks like it doesn't do too poorly. Yeah, I mean, the colors are swapped around, but for the most part, it gets the information of the clusters right. And we can also do this in higher dimensions. So with the higher dimensions, if we make x all of the columns except for the last one, which is our class, we can do the exact same thing. We can do the exact same thing: so here, we can fit this, but now our columns are equal to all the data frame's feature columns. And then with this class, actually, we can literally just say k-means labels again, and we can fit all of this. And now, if I want to plot the k-means clusters next to the original, alright, so this was my, that's my clustered and my original, so let me get these on the same page. So yeah, I mean, pretty similar to before. Now, what's actually really cool is, even something like, you know, remember the pairs where the classes were, like, on top of each other? Okay, so compactness and asymmetry. Right. So let me come down here and plot those two dimensions.
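(A sketch of the two-feature and all-feature k-means fits just described; names like cluster_df and kmeans_all are mine.)

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# cluster on two features first
x_col, y_col = "perimeter", "asymmetry"
X = df[[x_col, y_col]].values
kmeans = KMeans(n_clusters=3).fit(X)

cluster_df = pd.DataFrame(np.hstack((X, kmeans.labels_.reshape(-1, 1))),
                          columns=[x_col, y_col, "class"])
sns.scatterplot(x=x_col, y=y_col, hue="class", data=cluster_df)   # k-means clusters
plt.show()
sns.scatterplot(x=x_col, y=y_col, hue="class", data=df)           # true classes, for comparison
plt.show()

# now cluster on all seven features
X_all = df[cols[:-1]].values
kmeans_all = KMeans(n_clusters=3).fit(X_all)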
So this is what you know, my k means 1752 03:46:05,120 --> 03:46:12,000 dimensions for compactness and asymmetry, if we just look at those 1753 03:46:12,000 --> 03:46:17,520 right? And we know that the original looks something like this. 1754 03:46:18,239 --> 03:46:25,119 alike? No. Okay, so now if I come back down here, and I rerun this 1755 03:46:25,120 --> 03:46:31,280 but actually, this clusters, I need to get the labels of the k 1756 03:46:34,559 --> 03:46:38,399 Okay, so if I rerun this with higher dimensions 1757 03:46:38,399 --> 03:46:45,600 well, if we zoom out, and we take a look at these two, sure, the 1758 03:46:45,600 --> 03:46:52,000 there are the three groups are there, right? This does a much 1759 03:46:52,000 --> 03:47:01,200 what group is what. So, for example, we could relabel the one in 1760 03:47:01,200 --> 03:47:08,400 And then we could make sorry, okay, this is kind of confusing. But 1761 03:47:08,399 --> 03:47:15,600 were projected onto this darker pink here, and then this dark one 1762 03:47:15,600 --> 03:47:21,280 and this light one was this dark one, then you kind of see like 1763 03:47:21,280 --> 03:47:26,159 right? Like even these two up here are the same class as all the 1764 03:47:26,159 --> 03:47:31,039 the same in the same color. So you don't want to compare the two 1765 03:47:31,040 --> 03:47:37,680 you want to compare which points are in what colors in each of the 1766 03:47:37,680 --> 03:47:44,079 application. So this is how k means functions, it's basically 1767 03:47:44,079 --> 03:47:50,239 All right, where are my clusters given these pieces of data? And 1768 03:47:50,239 --> 03:47:58,319 talked about is PCA. So PCA, we're reducing the dimension, but 1769 03:47:58,319 --> 03:48:02,799 you know, seven dimensions. I don't know if there are seven, I 1770 03:48:02,799 --> 03:48:09,199 mapping multiple dimensions into a lower dimension number. Right. 1771 03:48:10,079 --> 03:48:16,159 So from SK learn decomposition, I can import PCA and that will be 1772 03:48:16,159 --> 03:48:22,479 So if I do PCA component, so this is how many dimensions you want 1773 03:48:22,479 --> 03:48:28,319 And you know, for this exercise, let's do two. Okay, so now I'm 1774 03:48:29,360 --> 03:48:39,600 And my transformed x is going to be PCA dot fit transform, and the 1775 03:48:39,600 --> 03:48:46,559 And the same x that I had up here. Okay, so all the other all the 1776 03:48:46,559 --> 03:48:54,799 perimeter, compactness, length, width, asymmetry, groove. Okay. So 1777 03:48:54,799 --> 03:49:02,399 transformed it. So let's look at what the shape of x used to be. 1778 03:49:02,399 --> 03:49:10,879 I had 210 samples, each seven, seven features long, basically. And 1779 03:49:14,639 --> 03:49:20,079 is 210 samples, but only of length two, which means that I only 1780 03:49:20,079 --> 03:49:26,159 I'm plotting. And we can actually even take a look at, you know, 1781 03:49:27,200 --> 03:49:30,320 Okay, so now we see each each one is a two dimensional point 1782 03:49:30,319 --> 03:49:37,600 each sample is now a two dimensional point in our new in our new 1783 03:49:38,879 --> 03:49:42,959 So what's cool is I can actually scatter these 1784 03:49:46,639 --> 03:49:53,519 zero and transformed x. So I actually have to 1785 03:49:53,520 --> 03:49:59,280 take the columns here. And if I show that 1786 03:50:01,920 --> 03:50:06,879 basically, we've just taken this like seven dimensional thing, and 1787 03:50:06,879 --> 03:50:12,079 single or I guess to a two dimensional representation. 
So that's a pretty cool trick. And actually, let's go ahead and do the same clustering comparison, but with the PCA dimensions. So for the k-means PCA data frame, let's construct a data frame: it's going to be an hstack, where I take this transformed x, and, actually, instead of clusters, I'm going to use k-means dot labels, reshaped so it's 2D, so we can do the hstack. And for the columns, I'm going to call them PCA one, PCA two, and the class. All right. So now if I take this, I can also do the same thing for the truth, but instead of the k-means labels, I want the class column from the data frame, and I'm just going to take the values from that. And so now I have a data frame for the k-means clusters with PCA, and then a data frame for the truth, also with the PCA dimensions. Similar to how I plotted these up here, let me actually take these scatter plots. Instead of the cluster data frame, I want, this is the k-means PCA data frame, the hue is going to be class, but now x and y are going to be the two PCA dimensions. So these are my two PCA dimensions, and you can see that, you know, the clusters are pretty spread out. And then here, I'm going to go to my truth classes. Again, instead of k-means, this should be the truth PCA data frame. So you can see that along these two dimensions, we actually are doing fairly well in separating the classes. It does seem like this is slightly more separable than the other, like, original feature pairs up here. So that's a good sign. And up here, you can see that, hey, the clusters largely line up with one another. I mean, for the most part, our unsupervised learning algorithm is able to spit out, you know, roughly the proper groups. Of course, it can't assign the specific variety labels to the different types of kernels; but, for example, these might all be one variety of kernels, and same here, and then these might all be another, and these might be the Canadian kernels. So it does struggle a little bit with the points near the boundaries, but for the most part, our algorithm is able to find the three clusters and does a fairly good job at predicting them without any information; we never gave the algorithm any labels. So that's the gist of unsupervised learning, and that wraps up this course. I hope, you know, a lot of these examples made sense. If you see anything wrong that I have done, and, you know, you're somebody with more experience than me, please feel free to point it out in the comments, and we can all as a community learn from this together.
