1
00:00:00,420 --> 00:00:01,950
Narrator: While variance is a common measure
2
00:00:01,950 --> 00:00:04,080
of data dispersion, in most cases
3
00:00:04,080 --> 00:00:05,880
the figure you will obtain is pretty large
4
00:00:05,880 --> 00:00:09,033
and hard to compare, as the unit of measurement is squared.
5
00:00:09,960 --> 00:00:12,450
The easy fix is to calculate its square root
6
00:00:12,450 --> 00:00:15,510
and obtain a statistic known as standard deviation.
7
00:00:15,510 --> 00:00:18,390
In most analyses you perform, standard deviation
8
00:00:18,390 --> 00:00:21,267
will be much more meaningful than variance.
9
00:00:21,267 --> 00:00:23,850
As we saw in the previous lecture,
10
00:00:23,850 --> 00:00:24,900
there are different measures
11
00:00:24,900 --> 00:00:27,750
for the population and sample variance.
12
00:00:27,750 --> 00:00:30,180
Consequently, there are also population
13
00:00:30,180 --> 00:00:32,012
and sample standard deviations.
14
00:00:33,000 --> 00:00:36,090
The formulas are the square root of the population variance
15
00:00:36,090 --> 00:00:38,740
and the square root of the sample variance, respectively.
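Written out, the two formulas described above could look like the sketch below. The symbols are an assumption, since the transcript does not show the slide: N and μ stand for the population size and mean, n and x̄ for the sample size and mean, and the sample version divides by n - 1.

```latex
\sigma = \sqrt{\sigma^{2}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^{2}}
\qquad
s = \sqrt{s^{2}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```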
16
00:00:39,600 --> 00:00:40,920
I believe there is no need
17
00:00:40,920 --> 00:00:43,804
for an example of the calculation, right?
18
00:00:43,804 --> 00:00:46,170
If you have a calculator in your hands,
19
00:00:46,170 --> 00:00:48,549
you'll be able to do the job.
20
00:00:48,549 --> 00:00:49,950
All right?
21
00:00:49,950 --> 00:00:52,110
The other measure we still have to introduce is
22
00:00:52,110 --> 00:00:54,390
the coefficient of variation.
23
00:00:54,390 --> 00:00:57,663
It is equal to the standard deviation divided by the mean.
24
00:00:58,620 --> 00:01:02,070
Another name for the term is relative standard deviation.
25
00:01:02,070 --> 00:01:04,500
This is an easy way to remember its formula.
26
00:01:04,500 --> 00:01:08,516
It is simply the standard deviation relative to the mean.
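The definition above can be sketched in a few lines of Python. This is only an illustration: the function name coefficient_of_variation, its sample flag, and the heights_cm list are hypothetical and do not come from the lecture. It relies on the standard library statistics module.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Return the standard deviation divided by the mean.

    sample=True uses the sample standard deviation (n - 1 in the denominator);
    sample=False uses the population standard deviation (n in the denominator).
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    spread = statistics.stdev(values) if sample else statistics.pstdev(values)
    return spread / mean

# Hypothetical measurements, used only to exercise the function.
heights_cm = [150, 160, 170, 180, 190]
print(round(coefficient_of_variation(heights_cm), 4))
print(round(coefficient_of_variation(heights_cm, sample=False), 4))
```

The zero-mean check is a design choice of this sketch: dividing by a mean of zero has no meaningful result, so the function raises an error instead.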
27
00:01:08,516 --> 00:01:10,500
As you probably guessed,
28
00:01:10,500 --> 00:01:13,360
there is a population and a sample formula once again.
29
00:01:14,580 --> 00:01:17,670
So, standard deviation is the most common measure
30
00:01:17,670 --> 00:01:20,160
of variability for a single data set.
31
00:01:20,160 --> 00:01:22,200
But why do we need yet another measure such
32
00:01:22,200 --> 00:01:24,033
as the coefficient of variation?
33
00:01:24,900 --> 00:01:27,360
Well, comparing the standard deviations
34
00:01:27,360 --> 00:01:30,150
of two different data sets is meaningless,
35
00:01:30,150 --> 00:01:33,543
but comparing coefficients of variation is not.
36
00:01:34,410 --> 00:01:38,490
Aristotle once said, Tell me, I'll forget.
37
00:01:38,490 --> 00:01:40,620
Show me, I'll remember.
38
00:01:40,620 --> 00:01:43,710
Involve me, I'll understand.
39
00:01:43,710 --> 00:01:46,020
To make sure you remember, here's an example
40
00:01:46,020 --> 00:01:48,633
of a comparison between standard deviations.
41
00:01:49,500 --> 00:01:52,620
Let's take the prices of pizza at 10 different places
42
00:01:52,620 --> 00:01:56,223
in New York. They range from $1 to $11.
43
00:01:57,996 --> 00:02:00,660
Now, imagine that you only have Mexican pesos, and to you
44
00:02:00,660 --> 00:02:05,660
the prices look more like 18.81 pesos to 206.91 pesos,
45
00:02:06,120 --> 00:02:09,310
given the exchange rate of 18.81 pesos for $1.
46
00:02:10,860 --> 00:02:13,770
Let's combine our knowledge so far and find the standard
47
00:02:13,770 --> 00:02:16,080
deviations and coefficients of variation
48
00:02:16,080 --> 00:02:17,433
of these two data sets.
49
00:02:18,510 --> 00:02:22,320
First, we have to see if this is a sample or a population.
50
00:02:22,320 --> 00:02:24,720
Are there only 10 restaurants in New York?
51
00:02:24,720 --> 00:02:25,650
Of course not.
52
00:02:25,650 --> 00:02:28,440
This is obviously a sample drawn from all the restaurants
53
00:02:28,440 --> 00:02:29,550
in the city.
54
00:02:29,550 --> 00:02:31,920
Then we have to use the formulas for sample measures
55
00:02:31,920 --> 00:02:33,750
of variability.
56
00:02:33,750 --> 00:02:36,450
Second, we have to find the mean.
57
00:02:36,450 --> 00:02:40,230
The mean in dollars is equal to 5.5 and the mean in pesos
58
00:02:40,230 --> 00:02:42,393
to 103.46.
59
00:02:43,560 --> 00:02:45,587
The third step of the process
60
00:02:45,587 --> 00:02:46,680
is finding the sample variance.
61
00:02:46,680 --> 00:02:48,930
Following the formula that we showed earlier,
62
00:02:48,930 --> 00:02:53,930
we obtain 10.72 dollars squared and 3,793.69 pesos squared.
63
00:02:57,510 --> 00:02:59,650
The respective sample standard deviations
64
00:03:00,547 --> 00:03:03,843
are $3.27 and 61.59 pesos.
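The steps so far (mean, sample variance, sample standard deviation) can be reproduced with a short Python sketch. One loud assumption: the lecture never lists the individual prices, so the ten values in prices_usd are hypothetical, chosen only so that they range from $1 to $11 and average $5.50. The helper sample_summary is likewise an illustrative name.

```python
import statistics

# ASSUMPTION: the lecture does not list the individual prices. These ten values
# are hypothetical, picked so that they range from $1 to $11 and average $5.50.
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]
PESOS_PER_DOLLAR = 18.81  # exchange rate quoted in the lecture
prices_mxn = [p * PESOS_PER_DOLLAR for p in prices_usd]

def sample_summary(values):
    """Return (mean, sample variance, sample standard deviation)."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # divides by n - 1
    std_dev = statistics.stdev(values)      # square root of the sample variance
    return mean, variance, std_dev

for label, values in (("dollars", prices_usd), ("pesos", prices_mxn)):
    mean, variance, std_dev = sample_summary(values)
    print(f"{label}: mean={mean:.2f}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

With this assumed data, the printed figures should land close to the ones quoted in the lecture. A different set of ten prices with the same range would give different variances.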
65
00:03:05,190 --> 00:03:07,170
Let's make a couple of observations.
66
00:03:07,170 --> 00:03:08,910
First, variance gives results
67
00:03:08,910 --> 00:03:13,140
in squared units, while standard deviation gives them in the original units.
68
00:03:13,140 --> 00:03:15,390
This is the main reason why professionals prefer to
69
00:03:15,390 --> 00:03:18,870
use standard deviation as the main measure of variability.
70
00:03:18,870 --> 00:03:20,880
It is directly interpretable.
71
00:03:20,880 --> 00:03:23,190
Squared dollars mean nothing, even
72
00:03:23,190 --> 00:03:25,124
in the field of statistics.
73
00:03:25,124 --> 00:03:29,160
Second, we got standard deviations of 3.27
74
00:03:29,160 --> 00:03:31,890
and 61.59 for the same pizza
75
00:03:31,890 --> 00:03:34,800
at the same 10 restaurants in New York City.
76
00:03:34,800 --> 00:03:36,390
Seems wrong, right?
77
00:03:36,390 --> 00:03:37,380
Don't worry.
78
00:03:37,380 --> 00:03:39,930
It is time to use our last tool,
79
00:03:39,930 --> 00:03:42,840
the coefficient of variation.
80
00:03:42,840 --> 00:03:45,780
Dividing the standard deviations by the respective means,
81
00:03:45,780 --> 00:03:48,540
we get the two coefficients of variation.
82
00:03:48,540 --> 00:03:51,693
The result is the same, 0.60.
83
00:03:52,740 --> 00:03:56,010
Notice that it is not dollars, pesos, dollars squared,
84
00:03:56,010 --> 00:03:57,330
or pesos squared.
85
00:03:57,330 --> 00:03:59,433
It is just 0.60.
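Why the two results match: converting dollars to pesos multiplies every price by the same constant, which multiplies both the standard deviation and the mean by that constant, so it cancels in the ratio. A minimal Python check of this is sketched below. As before, the prices in prices_usd are a hypothetical stand-in for the lecture's data, and sample_cv is an illustrative helper name.

```python
import math
import statistics

def sample_cv(values):
    """Sample standard deviation divided by the mean (a unit-free ratio)."""
    return statistics.stdev(values) / statistics.mean(values)

# ASSUMPTION: hypothetical prices; the lecture does not list the actual values.
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]
prices_mxn = [p * 18.81 for p in prices_usd]  # 18.81 pesos per dollar

cv_usd = sample_cv(prices_usd)
cv_mxn = sample_cv(prices_mxn)

print(f"CV in dollars: {cv_usd:.2f}")
print(f"CV in pesos:   {cv_mxn:.2f}")
# The two values agree up to floating-point rounding.
print(math.isclose(cv_usd, cv_mxn))
```

The comparison uses math.isclose rather than ==, because the peso prices are floating-point products and the two ratios may differ in the last few digits.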
86
00:04:00,390 --> 00:04:02,430
This shows us the great advantage
87
00:04:02,430 --> 00:04:05,490
that the coefficient of variation gives us.
88
00:04:05,490 --> 00:04:07,530
Now, we can confidently say
89
00:04:07,530 --> 00:04:10,230
that the two data sets have the same variability,
90
00:04:10,230 --> 00:04:12,363
which is what we expected beforehand.
91
00:04:13,890 --> 00:04:16,290
Let's recap what we have learned so far.
92
00:04:16,290 --> 00:04:18,930
There are three main measures of variability:
93
00:04:18,930 --> 00:04:22,830
variance, standard deviation and coefficient of variation.
94
00:04:22,830 --> 00:04:25,710
Each of them has different strengths and applications.
95
00:04:25,710 --> 00:04:27,750
You should feel confident using all of them
96
00:04:27,750 --> 00:04:31,380
as we are getting closer to more complex statistical topics.
97
00:04:31,380 --> 00:04:33,690
And remember Aristotle's advice:
98
00:04:33,690 --> 00:04:36,600
Involve me, I'll understand.
99
00:04:36,600 --> 00:04:40,203
So please don't forget to get involved with the exercises.