1
00:00:00,420 --> 00:00:01,859
Instructor: Next on our to-do list
2
00:00:01,859 --> 00:00:04,019
are the measures of variability.
3
00:00:04,019 --> 00:00:06,060
There are many ways to quantify variability.
4
00:00:06,060 --> 00:00:08,600
However, we will focus on the most common ones,
5
00:00:08,600 --> 00:00:12,753
variance, standard deviation, and coefficient of variation.
6
00:00:14,010 --> 00:00:15,270
In the field of statistics,
7
00:00:15,270 --> 00:00:17,550
we will typically use different formulas when working
8
00:00:17,550 --> 00:00:20,100
with population data and sample data.
9
00:00:20,100 --> 00:00:21,753
Let's think about this for a bit.
10
00:00:22,830 --> 00:00:24,600
When you have the whole population,
11
00:00:24,600 --> 00:00:25,950
each data point is known
12
00:00:25,950 --> 00:00:28,863
so you are 100% sure of the measures you are calculating.
13
00:00:29,790 --> 00:00:31,800
When you take a sample of this population
14
00:00:31,800 --> 00:00:33,990
and you compute a sample statistic,
15
00:00:33,990 --> 00:00:35,760
it is interpreted as an approximation
16
00:00:35,760 --> 00:00:37,680
of the population parameter.
17
00:00:37,680 --> 00:00:40,110
Moreover, if you extract 10 different samples
18
00:00:40,110 --> 00:00:41,640
from the same population,
19
00:00:41,640 --> 00:00:43,815
you will get 10 different measures.
20
00:00:43,815 --> 00:00:46,140
Statisticians have solved the problem
21
00:00:46,140 --> 00:00:48,030
by adjusting the algebraic formulas
22
00:00:48,030 --> 00:00:50,760
for many statistics to reflect this issue.
23
00:00:50,760 --> 00:00:53,400
Therefore, we will explore both population
24
00:00:53,400 --> 00:00:56,223
and sample formulas as they are both used.
25
00:00:57,480 --> 00:01:00,450
You must be asking yourself why there are unique formulas
26
00:01:00,450 --> 00:01:02,970
for the mean, median, and mode.
27
00:01:02,970 --> 00:01:05,760
Well, actually, the sample mean is the average
28
00:01:05,760 --> 00:01:07,230
of the sample data points,
29
00:01:07,230 --> 00:01:09,630
while the population mean is the average
30
00:01:09,630 --> 00:01:11,670
of the population data points.
31
00:01:11,670 --> 00:01:14,400
So technically, there are two different formulas
32
00:01:14,400 --> 00:01:16,803
but they are computed in the same way.
33
00:01:18,030 --> 00:01:21,390
Okay, now after this short clarification,
34
00:01:21,390 --> 00:01:23,493
it's time to get onto variance.
35
00:01:24,450 --> 00:01:27,120
Variance measures the dispersion of a set of data points
36
00:01:27,120 --> 00:01:28,623
around their mean value.
37
00:01:29,580 --> 00:01:32,520
Population variance, denoted by sigma squared,
38
00:01:32,520 --> 00:01:34,500
is equal to the sum of squared differences
39
00:01:34,500 --> 00:01:37,500
between the observed values and the population mean
40
00:01:37,500 --> 00:01:40,143
divided by the total number of observations.
41
00:01:41,790 --> 00:01:46,080
Sample variance, on the other hand, is denoted by S squared,
42
00:01:46,080 --> 00:01:48,360
and is equal to the sum of squared differences
43
00:01:48,360 --> 00:01:51,687
between observed sample values and the sample mean
44
00:01:51,687 --> 00:01:55,863
divided by the number of sample observations minus one.
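Written out, the two definitions just described might look like the following. This is a sketch in standard notation: the symbols N (population size), n (sample size), mu (population mean), x-bar (sample mean), and x_i (an individual observation) are assumptions, since the narration only names sigma squared and S squared.

```latex
% Population variance: divide by the number of observations N
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}

% Sample variance: divide by the number of sample observations minus one
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```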
45
00:01:57,360 --> 00:01:58,590
All right.
46
00:01:58,590 --> 00:02:00,510
When you are getting acquainted with statistics,
47
00:02:00,510 --> 00:02:03,390
it is hard to grasp everything right away.
48
00:02:03,390 --> 00:02:06,660
Therefore, let's stop for a second to examine the formula
49
00:02:06,660 --> 00:02:09,513
for the population variance and try to clarify its meaning.
50
00:02:10,530 --> 00:02:12,810
The main part of the formula is its numerator
51
00:02:12,810 --> 00:02:15,810
so that's what we want to comprehend.
52
00:02:15,810 --> 00:02:18,090
The sum of squared differences between the observations
53
00:02:18,090 --> 00:02:20,220
and the mean.
54
00:02:20,220 --> 00:02:23,730
Hmm, so the closer a number is to the mean,
55
00:02:23,730 --> 00:02:26,403
the lower the result we will obtain, right?
56
00:02:27,270 --> 00:02:29,610
And the further away from the mean it lies,
57
00:02:29,610 --> 00:02:31,942
the larger this difference.
58
00:02:31,942 --> 00:02:33,600
Easy.
59
00:02:33,600 --> 00:02:36,243
But why do we raise the differences to the second power?
60
00:02:37,170 --> 00:02:40,770
Squaring the differences has two main purposes.
61
00:02:40,770 --> 00:02:42,600
First, by squaring the numbers,
62
00:02:42,600 --> 00:02:45,450
we always get non-negative values.
63
00:02:45,450 --> 00:02:48,060
Without going too deep into the mathematics of it,
64
00:02:48,060 --> 00:02:51,180
it is intuitive that dispersion cannot be negative.
65
00:02:51,180 --> 00:02:52,890
Dispersion is about distance,
66
00:02:52,890 --> 00:02:55,053
and distance cannot be negative.
67
00:02:56,430 --> 00:02:59,040
If, on the other hand, we calculate the differences
68
00:02:59,040 --> 00:03:01,140
and do not square them,
69
00:03:01,140 --> 00:03:03,540
we would obtain both positive and negative values
70
00:03:03,540 --> 00:03:05,610
that, when summed, would cancel out,
71
00:03:05,610 --> 00:03:08,313
leaving us with no information about the dispersion.
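The cancellation described here can be checked with a few lines of Python. This is a minimal sketch; the data values are an illustrative assumption (they happen to match the example used later in the lesson), and the variable names are hypothetical.

```python
# Illustrative data set (an assumption for this sketch).
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

# Raw differences from the mean: some negative, some positive.
deviations = [x - mean for x in data]

# Summed as they are, the differences cancel out.
raw_sum = sum(deviations)

# Squared first, every term is non-negative, so nothing cancels.
squared_sum = sum(d ** 2 for d in deviations)

print(deviations)
print(raw_sum, squared_sum)
```

The raw sum carries no information about spread, while the sum of squares grows with the distance of each observation from the mean.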
72
00:03:09,810 --> 00:03:13,503
Second, squaring amplifies the effect of large differences.
73
00:03:14,400 --> 00:03:16,410
For example, if the mean is zero,
74
00:03:16,410 --> 00:03:18,540
and you have an observation of 100,
75
00:03:18,540 --> 00:03:21,570
the squared difference is 10,000.
76
00:03:21,570 --> 00:03:23,730
All right, enough dry theory.
77
00:03:23,730 --> 00:03:26,223
It is time for a practical example.
78
00:03:27,150 --> 00:03:30,060
We have a population of five observations,
79
00:03:30,060 --> 00:03:33,480
One, two, three, four, and five.
80
00:03:33,480 --> 00:03:35,043
Let's find its variance.
81
00:03:35,910 --> 00:03:38,070
We start by calculating the mean,
82
00:03:38,070 --> 00:03:42,150
one plus two plus three plus four plus five
83
00:03:42,150 --> 00:03:44,583
divided by five equals three.
84
00:03:45,510 --> 00:03:48,150
Then we apply the formula we just saw.
85
00:03:48,150 --> 00:03:53,150
One minus three squared plus two minus three squared,
86
00:03:53,700 --> 00:03:57,120
plus three minus three squared,
87
00:03:57,120 --> 00:04:00,660
plus four minus three squared,
88
00:04:00,660 --> 00:04:04,500
plus five minus three squared.
89
00:04:04,500 --> 00:04:07,410
All of these components have to be divided by five.
90
00:04:07,410 --> 00:04:10,410
When we do the math, we get two.
91
00:04:10,410 --> 00:04:14,520
So the population variance of the data set is two.
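The steps above can be retraced in code. The function below is a minimal sketch, not a reference implementation; the name `population_variance` is a hypothetical helper introduced only for illustration.

```python
def population_variance(values):
    """Sum of squared differences from the mean, divided by the number of observations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# The five observations from the example, treated as a whole population.
observations = [1, 2, 3, 4, 5]
print(population_variance(observations))
```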
92
00:04:14,520 --> 00:04:17,040
But what about the sample variance?
93
00:04:17,040 --> 00:04:19,079
This would only be suitable if we were told
94
00:04:19,079 --> 00:04:21,540
that these five observations were a sample drawn
95
00:04:21,540 --> 00:04:23,250
from a population.
96
00:04:23,250 --> 00:04:25,563
So let's imagine that's the case.
97
00:04:26,430 --> 00:04:29,370
The sample mean is, once again, three.
98
00:04:29,370 --> 00:04:30,660
The numerator is the same,
99
00:04:30,660 --> 00:04:33,930
but the denominator is going to be four instead of five,
100
00:04:33,930 --> 00:04:37,023
giving us a sample variance of 2.5.
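As a cross-check, Python's standard `statistics` module provides both versions of the calculation: `pvariance` divides by the number of observations, and `variance` divides by that number minus one. This sketch assumes the same five observations; the variable names are hypothetical.

```python
import statistics

observations = [1, 2, 3, 4, 5]

# Treating the five numbers as the whole population: denominator is 5.
pop_var = statistics.pvariance(observations)

# Treating the same five numbers as a sample: denominator is 5 - 1 = 4.
sample_var = statistics.variance(observations)

print(pop_var, sample_var)
```

The two results should match the values of 2 and 2.5 worked out in the lesson, with the only difference between them being the denominator.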
101
00:04:38,250 --> 00:04:40,110
To conclude the variance topic,
102
00:04:40,110 --> 00:04:42,030
we should interpret the result.
103
00:04:42,030 --> 00:04:43,890
Why is the sample variance bigger
104
00:04:43,890 --> 00:04:46,080
than the population variance?
105
00:04:46,080 --> 00:04:48,300
In the first case, we knew the population.
106
00:04:48,300 --> 00:04:49,950
That is, we had all the data,
107
00:04:49,950 --> 00:04:52,050
and we calculated the variance.
108
00:04:52,050 --> 00:04:54,030
In the second case, we were told
109
00:04:54,030 --> 00:04:57,720
that one, two, three, four, and five was a sample
110
00:04:57,720 --> 00:05:00,330
drawn from a bigger population.
111
00:05:00,330 --> 00:05:02,430
Imagine the population of this sample
112
00:05:02,430 --> 00:05:04,140
were these ten numbers,
113
00:05:04,140 --> 00:05:08,100
one, one, one, two, three, four,
114
00:05:08,100 --> 00:05:11,160
five, five, five, and five.
115
00:05:11,160 --> 00:05:13,320
Clearly the numbers are the same,
116
00:05:13,320 --> 00:05:14,850
but there is a concentration
117
00:05:14,850 --> 00:05:19,470
around the two extremes of the data set, one and five.
118
00:05:19,470 --> 00:05:22,953
The variance of this population is 2.96.
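The figure quoted for this larger population can be verified the same way. This is a minimal sketch using the standard `statistics` module; the variable names are hypothetical.

```python
import statistics

# The imagined full population listed in the lesson.
population = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5]

# The sample that was drawn from it.
sample = [1, 2, 3, 4, 5]

true_variance = statistics.pvariance(population)   # divides by 10
naive_estimate = statistics.pvariance(sample)      # divides by 5
corrected_estimate = statistics.variance(sample)   # divides by 5 - 1

print(round(true_variance, 2), naive_estimate, corrected_estimate)
```

For this particular population, dividing by n - 1 moves the estimate from the sample closer to the population variance than dividing by n does.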
119
00:05:24,060 --> 00:05:27,810
So our sample variance has rightfully been corrected upwards
120
00:05:27,810 --> 00:05:30,663
in order to reflect the higher potential variability.
121
00:05:31,680 --> 00:05:34,230
This is the reason why there are different formulas
122
00:05:34,230 --> 00:05:36,393
for sample and population data.
123
00:05:37,477 --> 00:05:39,690
This was a very important lesson,
124
00:05:39,690 --> 00:05:42,690
so please make sure that you have understood it well.
125
00:05:42,690 --> 00:05:44,400
You can reinforce what you learned here
126
00:05:44,400 --> 00:05:46,380
by doing the exercise available
127
00:05:46,380 --> 00:05:48,750
in the course resources section.
128
00:05:48,750 --> 00:05:50,910
Remember, the subject of statistics
129
00:05:50,910 --> 00:05:53,880
is only understood when practiced.
130
00:05:53,880 --> 00:05:54,880
Thanks for watching.