1
00:00:03,150 --> 00:00:05,010
Instructor: Welcome back everybody.
2
00:00:05,010 --> 00:00:06,420
Towards the end of the last lecture,
3
00:00:06,420 --> 00:00:08,189
we mentioned standardizing,
4
00:00:08,189 --> 00:00:10,863
but didn't explain what it is and why we use it.
5
00:00:11,910 --> 00:00:13,740
Before we understand this concept
6
00:00:13,740 --> 00:00:16,353
we need to explain what a transformation is.
7
00:00:18,180 --> 00:00:21,150
So a transformation is a way in which we can alter
8
00:00:21,150 --> 00:00:24,630
every element of a distribution to get a new distribution
9
00:00:24,630 --> 00:00:26,373
with similar characteristics.
10
00:00:27,420 --> 00:00:30,060
For normal distributions we can use addition,
11
00:00:30,060 --> 00:00:33,210
subtraction, multiplication, and division,
12
00:00:33,210 --> 00:00:35,510
without changing the type of the distribution.
13
00:00:36,690 --> 00:00:38,520
For instance, if we add a constant
14
00:00:38,520 --> 00:00:41,100
to every element of a normal distribution
15
00:00:41,100 --> 00:00:43,250
the new distribution would still be normal.
16
00:00:44,880 --> 00:00:47,400
Let's discuss the four algebraic options
17
00:00:47,400 --> 00:00:49,533
and see how each one affects the graph.
18
00:00:51,120 --> 00:00:54,840
If we add a constant like three to the entire distribution
19
00:00:54,840 --> 00:00:57,150
then we simply need to move the graph three places
20
00:00:57,150 --> 00:00:57,983
to the right.
21
00:00:59,580 --> 00:01:02,760
Similarly, if we subtract a number from every element
22
00:01:02,760 --> 00:01:04,470
we would simply move our current graph
23
00:01:04,470 --> 00:01:06,243
to the left to get the new one.
24
00:01:08,070 --> 00:01:10,470
If we multiply every element by a constant greater than one,
25
00:01:10,470 --> 00:01:13,020
the graph will widen that many times,
26
00:01:13,020 --> 00:01:15,510
and if we divide every element by such a number,
27
00:01:15,510 --> 00:01:16,623
the graph will narrow.
28
00:01:17,490 --> 00:01:20,250
However, if we multiply or divide by a number
29
00:01:20,250 --> 00:01:23,583
between zero and one, the opposite effects will occur.
30
00:01:24,660 --> 00:01:27,930
For example, dividing by a half is the same as multiplying
31
00:01:27,930 --> 00:01:30,630
by two, so the graph will widen
32
00:01:30,630 --> 00:01:32,030
even though we are dividing.
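
The four transformations can be checked numerically. The sketch below is only an illustration, assuming each operation is applied to every element of a sample. The sample values and the helper name `summarize` are made up for the example and do not come from the lecture. It reports how the mean and the standard deviation respond to each operation.

```python
import statistics

def summarize(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return statistics.fmean(values), statistics.pstdev(values)

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

original = summarize(sample)                   # baseline mean and spread
shifted = summarize([y + 3 for y in sample])   # add a constant to every element
doubled = summarize([y * 2 for y in sample])   # multiply every element by two
halved = summarize([y / 2 for y in sample])    # divide every element by two

print("original:", original)
print("plus 3:  ", shifted)
print("times 2: ", doubled)
print("over 2:  ", halved)
```

Adding a constant changes only the mean, while multiplying or dividing scales both the mean and the standard deviation by the same factor.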
33
00:01:34,050 --> 00:01:35,160
All right.
34
00:01:35,160 --> 00:01:37,110
Now that you know what a transformation is,
35
00:01:37,110 --> 00:01:38,823
we can explain standardizing.
36
00:01:39,930 --> 00:01:42,930
Standardizing is a special kind of transformation
37
00:01:42,930 --> 00:01:46,080
in which we make the expected value equal to zero
38
00:01:46,080 --> 00:01:48,003
and the variance equal to one.
39
00:01:49,860 --> 00:01:51,930
The distribution we get after standardizing
40
00:01:51,930 --> 00:01:53,970
any normal distribution is called
41
00:01:53,970 --> 00:01:56,013
a standard normal distribution.
42
00:01:57,120 --> 00:02:01,380
In addition to the 68, 95, 99.7 rule,
43
00:02:01,380 --> 00:02:04,020
a table exists which summarizes the most commonly
44
00:02:04,020 --> 00:02:08,073
used values for the CDF of a standard normal distribution.
45
00:02:09,630 --> 00:02:12,870
This table is known as the standard normal distribution
46
00:02:12,870 --> 00:02:15,423
table or the Z-score table.
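
As a rough illustration of what such a table contains, the same values can be computed instead of looked up. This sketch assumes Python's standard-library `statistics.NormalDist`. The helper name `prob_within` and the lookup value 1.96 are made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z_dist = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a standard normal value lies within k standard deviations of 0."""
    return z_dist.cdf(k) - z_dist.cdf(-k)

# The 68, 95, 99.7 rule, recovered from the CDF.
for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")

# A typical table lookup: P(Z <= 1.96).
print(f"cdf(1.96) = {z_dist.cdf(1.96):.4f}")
```

Each printed value is one entry of the kind a Z-score table lists.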
47
00:02:17,280 --> 00:02:20,340
Okay, so far we have learned what standardizing is
48
00:02:20,340 --> 00:02:22,140
and why it's convenient.
49
00:02:22,140 --> 00:02:25,080
What we haven't talked about is how to do it.
50
00:02:25,080 --> 00:02:28,410
First, we wish to move the graph either to the left
51
00:02:28,410 --> 00:02:31,563
or to the right until its mean equals zero.
52
00:02:32,460 --> 00:02:36,120
The way we would do that is by subtracting the mean, mu,
53
00:02:36,120 --> 00:02:38,040
from every element.
54
00:02:38,040 --> 00:02:40,920
After this, to make the standardization complete
55
00:02:40,920 --> 00:02:44,550
we need to make sure the standard deviation is one.
56
00:02:44,550 --> 00:02:47,220
To do so, we would have to divide every element
57
00:02:47,220 --> 00:02:49,260
of the newly obtained distribution
58
00:02:49,260 --> 00:02:52,503
by the value of the standard deviation, sigma.
59
00:02:54,570 --> 00:02:58,080
If we denote the standard normal distribution with Z,
60
00:02:58,080 --> 00:03:01,320
then for any normally distributed variable Y,
61
00:03:01,320 --> 00:03:05,280
Z equals Y minus mu, all divided by sigma.
62
00:03:05,280 --> 00:03:07,320
This equation expresses the transformation we
63
00:03:07,320 --> 00:03:09,123
use when standardizing.
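
The two steps, subtracting the mean and then dividing by the standard deviation, can be sketched in a few lines. This is a minimal illustration, not the lecture's own code. It assumes mu and sigma are estimated from the sample itself (the lecture treats them as known parameters of the distribution), and the function name `standardize` and the data are made up for the example.

```python
import statistics

def standardize(values):
    """Apply z = (y - mu) / sigma to every element of values."""
    mu = statistics.fmean(values)       # assumption: mean estimated from the sample
    sigma = statistics.pstdev(values)   # assumption: population sd, must be > 0
    return [(y - mu) / sigma for y in values]

# Hypothetical data, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(sample)

print(z_scores)
print("mean:", statistics.fmean(z_scores))
print("sd:  ", statistics.pstdev(z_scores))
```

After the transformation the values have a mean of zero and a standard deviation of one, and each z-score says how many standard deviations the original value was from the mean.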
64
00:03:11,640 --> 00:03:13,050
Amazing.
65
00:03:13,050 --> 00:03:14,790
Applying this single transformation
66
00:03:14,790 --> 00:03:18,030
for any normal distribution would result in a standard
67
00:03:18,030 --> 00:03:20,700
normal distribution, which is convenient.
68
00:03:20,700 --> 00:03:23,550
Essentially, every element of the non-standardized
69
00:03:23,550 --> 00:03:27,060
distribution is represented in the new distribution
70
00:03:27,060 --> 00:03:29,340
by the number of standard deviations it is
71
00:03:29,340 --> 00:03:30,483
away from the mean.
72
00:03:31,830 --> 00:03:36,180
For instance, if a value Y is 2.3 standard deviations
73
00:03:36,180 --> 00:03:39,810
away from the mean, its equivalent value Z
74
00:03:39,810 --> 00:03:41,463
would be equal to 2.3.
75
00:03:42,630 --> 00:03:44,850
Standardizing is incredibly useful when
76
00:03:44,850 --> 00:03:46,920
we have a normal distribution.
77
00:03:46,920 --> 00:03:49,050
However, we cannot always anticipate
78
00:03:49,050 --> 00:03:50,673
that the data is spread out that way.
79
00:03:52,200 --> 00:03:54,930
A crucial fact to remember about the normal distribution
80
00:03:54,930 --> 00:03:58,080
is that it requires a lot of data.
81
00:03:58,080 --> 00:04:01,320
If our sample is limited, we run the risk of outliers
82
00:04:01,320 --> 00:04:03,273
drastically affecting our analysis.
83
00:04:04,290 --> 00:04:06,690
In cases where we have fewer than 30 entries,
84
00:04:06,690 --> 00:04:09,273
we usually avoid assuming a normal distribution.
85
00:04:11,250 --> 00:04:14,340
However, there is a small sample size approximation
86
00:04:14,340 --> 00:04:16,440
of a normal distribution called
87
00:04:16,440 --> 00:04:18,899
the Student's t-distribution.
88
00:04:18,899 --> 00:04:21,600
And we are going to focus on that in our next lecture.
89
00:04:22,770 --> 00:04:23,793
Thanks for watching.