Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,420 --> 00:00:06,230
Hello everyone and welcome to the lecture on factor in categorical matrices this lecture we're going
2
00:00:06,230 --> 00:00:09,850
to look at how to create factor in categorical matrices in our.
3
00:00:10,050 --> 00:00:13,140
Let's go ahead and jump to our studio to get started.
4
00:00:13,140 --> 00:00:15,300
All right so here here our studio.
5
00:00:15,480 --> 00:00:20,520
And in this lecture we're mainly going to be discussing the factor function and it's used for creating
6
00:00:20,520 --> 00:00:22,210
categorical matrices.
7
00:00:22,470 --> 00:00:27,510
This specific function will become really useful when we begin to apply data analysis and machine learning
8
00:00:27,510 --> 00:00:29,210
techniques to our data.
9
00:00:29,220 --> 00:00:32,160
This is also sometimes and then is creating dummy variables.
10
00:00:32,160 --> 00:00:35,630
So you may be familiar with it in that context.
11
00:00:35,670 --> 00:00:40,230
So let's go to start by show an example of why and how we can build a sort of matrix.
12
00:00:40,390 --> 00:00:50,580
I want to go ahead and create a vector called Animal Nalut just go ahead and pretend that where have
13
00:00:50,580 --> 00:00:52,120
some sort of animal sanctuary.
14
00:00:52,130 --> 00:00:54,040
And there's dogs and cats in it.
15
00:00:54,090 --> 00:00:59,090
So we'll go ahead and just label this the C for cat D for dog.
16
00:00:59,340 --> 00:01:07,180
Let's go ahead and put another dog in there and then we'll start with two more cats.
17
00:01:07,180 --> 00:01:07,870
All right.
18
00:01:08,160 --> 00:01:11,690
So we have our animal vector where C is for cat D is for dog.
19
00:01:11,700 --> 00:01:16,890
And so we're trying to represent some sort of animal sanctuary and let's say each of these specific
20
00:01:16,890 --> 00:01:20,320
animals also has a corresponding ID number.
21
00:01:20,430 --> 00:01:23,440
So we'll go ahead and make an ID number vector.
22
00:01:23,970 --> 00:01:29,130
Know let's just go ahead and say One two three four five.
23
00:01:29,130 --> 00:01:34,440
All right so we have our animals in Adie's and we want to convert the animal vector into information
24
00:01:34,440 --> 00:01:40,260
that an algorithm or equation can understand more easily meaning we want to begin to check how many
25
00:01:40,260 --> 00:01:45,030
categories or factor levels are actually in our character vector.
26
00:01:45,330 --> 00:01:51,600
So we can do is we can pass the vector through the factor function like this.
27
00:01:51,600 --> 00:01:54,230
So we're going to go in and say factor.
28
00:01:55,080 --> 00:02:00,470
And so then notice I'm going to go ahead and pass an animal.
29
00:02:00,810 --> 00:02:07,980
And so what you get back is this vector that also notice that it has these levels attached to it C and
30
00:02:07,980 --> 00:02:08,790
D.
31
00:02:08,850 --> 00:02:16,470
So I'm going to go ahead and assign this factor of animal and I'll assign it to let's just go ahead
32
00:02:16,470 --> 00:02:22,180
and say fact Annie as some variable name for this.
33
00:02:22,320 --> 00:02:24,580
So we're going to save that factor.
34
00:02:24,690 --> 00:02:25,250
All right.
35
00:02:25,410 --> 00:02:29,340
So we can see that we have two levels D-N.C. for dog and cat.
36
00:02:29,610 --> 00:02:34,310
So in our There's going to be two distinct types of categorical variables.
37
00:02:34,380 --> 00:02:38,670
There's going to be ordinal categorical variables in nominal categorical variables.
38
00:02:38,790 --> 00:02:42,780
So the nominal categorical variables let me go in and type this out.
39
00:02:42,780 --> 00:02:44,540
You can also reference the notes for this.
40
00:02:44,770 --> 00:02:50,310
So right now we're talking about nominal must go ahead and make that a little brighter by not putting
41
00:02:50,310 --> 00:02:51,660
it as a comment.
42
00:02:51,660 --> 00:02:55,230
So the nominal categorical variables don't have any order.
43
00:02:55,350 --> 00:02:57,180
So you can imagine that dogs and cats.
44
00:02:57,180 --> 00:02:59,480
There's no order to them.
45
00:02:59,520 --> 00:03:06,570
Well some people may prefer dogs for cats but in reality there's no order to them versus ordinal categorical
46
00:03:06,570 --> 00:03:08,370
variables as the name implies.
47
00:03:08,400 --> 00:03:18,420
Do have some sort of order so we can think of nominal there's no order versus let's go ahead and say
48
00:03:20,160 --> 00:03:24,660
ordinal there is an order.
49
00:03:24,750 --> 00:03:27,690
So nominal an example that is what we have here.
50
00:03:27,840 --> 00:03:34,770
This animal so there's no order versus dogs and cats but something that could be ordinal of order could
51
00:03:34,770 --> 00:03:44,970
be something like temperatures so we could have something like or Dohm cat with let's say cold
52
00:03:48,300 --> 00:03:51,890
medium and hot.
53
00:03:51,960 --> 00:03:59,700
So this is a ordinal categorical variable because even though we're dealing with strings here and different
54
00:03:59,700 --> 00:04:01,080
categories.
55
00:04:01,080 --> 00:04:05,910
So cold is a category medium is a category of temperature hot as category temperature.
56
00:04:05,910 --> 00:04:07,620
You can see that there's an order to them.
57
00:04:07,650 --> 00:04:11,280
It should be cold medium hot versus dogs and cats.
58
00:04:11,310 --> 00:04:14,690
So Dog is a category C is a category for cat.
59
00:04:14,850 --> 00:04:19,470
There's no order of dogs before cats are cast before dogs.
60
00:04:19,470 --> 00:04:20,150
OK.
61
00:04:20,640 --> 00:04:27,120
So if you actually want to assign an order to these variables you would be able to do that while using
62
00:04:27,120 --> 00:04:28,250
the factor function.
63
00:04:28,290 --> 00:04:34,680
And you can actually pass in an argument with ordered equals true and pass in the levels argument as
64
00:04:34,680 --> 00:04:34,990
well.
65
00:04:34,990 --> 00:04:37,510
So we're going to show you how we can do that.
66
00:04:37,560 --> 00:04:47,610
So let's say we have a vector temp's and it's going to be called Medium a high bigger than just copy
67
00:04:47,610 --> 00:04:49,170
and paste this.
68
00:04:49,290 --> 00:04:53,780
And we're also going to put in a couple of more instances so I'll say there's another instance of hot
69
00:04:53,830 --> 00:04:54,320
.
70
00:04:54,690 --> 00:04:58,320
Another instance of Ha we'll say there's another axis of cold.
71
00:04:58,590 --> 00:05:03,440
And finally one more instance of median.
72
00:05:03,820 --> 00:05:04,050
Right.
73
00:05:04,050 --> 00:05:06,820
So we have our attempts vector here.
74
00:05:06,840 --> 00:05:10,520
So it's caught cold medium hot ha hot cold medium.
75
00:05:10,540 --> 00:05:10,740
Right.
76
00:05:10,740 --> 00:05:12,670
So we have these three categories.
77
00:05:12,690 --> 00:05:23,340
Let's go ahead and pass this into a variable using factor and let me just bring this back up by clearing
78
00:05:23,340 --> 00:05:29,110
that I'm going to go ahead and say temp's is that vector of cold medium hot et cetera.
79
00:05:29,130 --> 00:05:39,780
Various instances and I'll say the factored version of this called The Factor function pass in my temps
80
00:05:39,860 --> 00:05:40,850
vector.
81
00:05:41,280 --> 00:05:45,750
I'm going to say ordered is equal to true.
82
00:05:45,780 --> 00:05:50,310
Remember I didn't do this for the dogs and cats vector because there was no order that made sense for
83
00:05:50,310 --> 00:05:52,150
dogs and cats in this case.
84
00:05:52,170 --> 00:05:53,730
There is an order that makes sense.
85
00:05:53,730 --> 00:05:55,830
It should be cold medium that hot.
86
00:05:55,950 --> 00:05:59,610
You could also do hot medium cold depending on which way you want it to think about it.
87
00:05:59,640 --> 00:06:07,050
This case will say cold comes first then medium than hot so I'll go ahead and add in the level's argument
88
00:06:07,560 --> 00:06:14,130
and the levels argument is going to take in a vector and you just pasan a vector of the order you want
89
00:06:15,030 --> 00:06:20,040
needs to be ordered it's going to say cold medium and then hot
90
00:06:23,000 --> 00:06:23,320
right.
91
00:06:23,370 --> 00:06:26,160
So we've assigned that factored version of that.
92
00:06:26,370 --> 00:06:32,420
Dr. chanels go ahead and see an idea of what this looks like by calling fact that temp.
93
00:06:32,460 --> 00:06:38,010
So notice the levels actually have an order to them and you can see it by the comparison operator in
94
00:06:38,010 --> 00:06:42,570
between the terms so cold is less than medium and medium is less than hot.
95
00:06:42,960 --> 00:06:48,750
OK so that's pretty useful and it's going to be really useful when we end up using the summary function
96
00:06:49,230 --> 00:06:53,540
to call a summary on that object.
97
00:06:54,660 --> 00:07:00,720
So I'll go ahead and tell us cold medium and hot and the various counts of how many times those appear
98
00:07:01,290 --> 00:07:08,550
and we can show you a summary of the original temps as well to show the difference between the factored
99
00:07:09,030 --> 00:07:13,920
version of that vector versus the summary of the original.
100
00:07:13,920 --> 00:07:14,610
All right.
101
00:07:14,610 --> 00:07:16,860
So that's it for this lecture.
102
00:07:16,890 --> 00:07:22,320
Don't stress out too much about understanding everything you've gone over here during this lecture.
103
00:07:22,350 --> 00:07:26,160
We're not going to touch back on this until quite a while from now.
104
00:07:26,310 --> 00:07:31,900
I just wanted you to be aware of it since it fits in well with the ideas of vectors and matrices.
105
00:07:31,920 --> 00:07:39,210
So the only thing you should be aware of are the use of the factor function as a possibility for you
106
00:07:39,210 --> 00:07:39,930
to use.
107
00:07:40,080 --> 00:07:47,890
And then the difference between a nominal category set versus an ordinal category set.
108
00:07:47,910 --> 00:07:53,520
OK so the nominal you have those cats and dogs are the order ordinal as the name implies there is an
109
00:07:53,550 --> 00:07:54,480
order to it.
110
00:07:54,480 --> 00:08:00,750
And you can specify that order using the factor function by saying order equals true and then specifying
111
00:08:00,780 --> 00:08:08,230
the levels as a vector and you just pasand the vector of the terms in indexed order that you want.
112
00:08:08,510 --> 00:08:10,920
OK that's it for this lecture.
113
00:08:10,920 --> 00:08:11,960
Thanks everyone.
114
00:08:12,000 --> 00:08:13,550
I'll see at the next lecture.
11673
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.