1
00:00:00,004 --> 00:00:01,007
- [Instructor] When you analyze data,
2
00:00:01,007 --> 00:00:04,004
it is often important to see how two sets of values
3
00:00:04,004 --> 00:00:07,000
vary in relation to one another.
4
00:00:07,000 --> 00:00:08,000
One way to do that
5
00:00:08,000 --> 00:00:11,008
is to calculate the covariance of the data sets.
6
00:00:11,008 --> 00:00:14,007
The covariance formula looks a little intimidating,
7
00:00:14,007 --> 00:00:18,000
but I'll break it down for you step by step.
8
00:00:18,000 --> 00:00:20,000
If you have two sets of values in columns,
9
00:00:20,000 --> 00:00:22,002
you can find the average of each column.
10
00:00:22,002 --> 00:00:25,008
And the average is indicated by a bar
11
00:00:25,008 --> 00:00:27,006
above the variable name.
12
00:00:27,006 --> 00:00:28,007
So you see X bar,
13
00:00:28,007 --> 00:00:31,004
which is the average of all the X values in column one.
14
00:00:31,004 --> 00:00:35,004
And Y bar is the average or mean of all of the Y values.
15
00:00:35,004 --> 00:00:38,002
That would be your second column.
16
00:00:38,002 --> 00:00:40,006
Then, to calculate the covariance of a pair of values,
17
00:00:40,006 --> 00:00:42,009
you subtract the mean of column one
18
00:00:42,009 --> 00:00:44,008
from the first value in column one,
19
00:00:44,008 --> 00:00:46,008
and subtract the mean of column two
20
00:00:46,008 --> 00:00:48,009
from the first value of column two,
21
00:00:48,009 --> 00:00:51,003
and then multiply those values together
22
00:00:51,003 --> 00:00:54,002
for each pair of data points.
23
00:00:54,002 --> 00:00:56,000
You find the sum of all of those values
24
00:00:56,000 --> 00:01:00,001
and then, finally, you divide by the number of data pairs.
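Written as a formula, the steps narrated above amount to the following. This is a sketch that assumes the population form of covariance, since the narration divides by the number of pairs; the symbols n, x_i, and y_i are notation introduced here and do not appear in the narration. Many textbooks and tools divide by n - 1 instead when the data is a sample.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Here \bar{x} and \bar{y} are the column averages described earlier, and n is the number of data pairs.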
25
00:01:00,001 --> 00:01:02,001
So if you have 10 pairs of values,
26
00:01:02,001 --> 00:01:05,003
you would divide that sum by 10.
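The calculation described so far can be sketched in a few lines of Python. This is a minimal illustration, not the instructor's own method: the function name `population_covariance` and the sample data are hypothetical, and the code divides by the number of pairs as the narration does.

```python
def population_covariance(xs, ys):
    """Covariance as narrated: average the products of paired deviations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    x_bar = sum(xs) / n  # mean of column one
    y_bar = sum(ys) / n  # mean of column two
    # Subtract each column's mean from its values, then multiply pair by pair.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    # Sum the products and divide by the number of data pairs.
    return sum(products) / n


# Hypothetical data: five pairs of (miles driven, dollars spent).
miles = [10, 20, 30, 40, 50]
dollars = [3, 5, 4, 8, 10]

cov = population_covariance(miles, dollars)
print(cov)  # 34.0
```

With five pairs, the sum of the products (170) is divided by 5. With 10 pairs, the same function would divide by 10.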
27
00:01:05,003 --> 00:01:08,003
And the result is given in terms of the original data,
28
00:01:08,003 --> 00:01:11,003
such as dollars per mile driven,
29
00:01:11,003 --> 00:01:15,002
or perhaps customers versus dollars spent.
30
00:01:15,002 --> 00:01:18,009
The next question is how you interpret covariance values.
31
00:01:18,009 --> 00:01:22,005
If the value is zero, which is rare, but can happen,
32
00:01:22,005 --> 00:01:25,005
the data sets don't vary together at all.
33
00:01:25,005 --> 00:01:27,006
If you have a positive covariance,
34
00:01:27,006 --> 00:01:30,002
the data sets tend to move in the same direction.
35
00:01:30,002 --> 00:01:32,003
So if one value goes up,
36
00:01:32,003 --> 00:01:34,000
such as personal income,
37
00:01:34,000 --> 00:01:35,005
then the other value would go up,
38
00:01:35,005 --> 00:01:38,007
such as amount spent at your store.
39
00:01:38,007 --> 00:01:41,001
If the values show a negative covariance,
40
00:01:41,001 --> 00:01:44,006
then they tend to move in opposite directions.
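The sign rule can be checked with a small Python sketch. The data below is made up for illustration: `income` and `spending` rise together, while `spending_reversed` is the same list in reverse order, so it falls as income rises. The helper `population_covariance` is a hypothetical name and divides by the number of pairs.

```python
def population_covariance(xs, ys):
    """Average of the products of paired deviations from each mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical values, chosen only to show the sign of the result.
income = [30, 40, 50, 60, 70]
spending = [2, 3, 5, 6, 9]           # rises as income rises
spending_reversed = [9, 6, 5, 3, 2]  # falls as income rises

same_direction = population_covariance(income, spending)
opposite_direction = population_covariance(income, spending_reversed)

print(same_direction)      # 34.0  (positive: the columns move together)
print(opposite_direction)  # -34.0 (negative: the columns move in opposite directions)
```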
41
00:01:44,006 --> 00:01:46,006
Finally, then, you can ask,
42
00:01:46,006 --> 00:01:48,005
is my covariance significant?
43
00:01:48,005 --> 00:01:49,007
And I have to admit
44
00:01:49,007 --> 00:01:53,001
that this is a difficult question to answer.
45
00:01:53,001 --> 00:01:55,001
In general, values close to zero
46
00:01:55,001 --> 00:01:58,004
do indicate little relationship.
47
00:01:58,004 --> 00:02:00,006
And also, large positive or negative values
48
00:02:00,006 --> 00:02:02,005
can be significant.
49
00:02:02,005 --> 00:02:04,003
So look at the covariance
50
00:02:04,003 --> 00:02:07,002
in relation to the means of each data set.
51
00:02:07,002 --> 00:02:13,001
If you have a positive covariance of 500 and a mean of 100,
52
00:02:13,001 --> 00:02:16,006
then the covariance is five times greater than the mean,
53
00:02:16,006 --> 00:02:18,005
and that might be significant.
54
00:02:18,005 --> 00:02:21,001
But that kind of analysis does come down
55
00:02:21,001 --> 00:02:23,005
to individual interpretation.
56
00:02:23,005 --> 00:02:25,008
In practice, what most analysts do
57
00:02:25,008 --> 00:02:29,001
is convert covariances to correlations.
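One reason for that conversion is that covariance depends on the units of the data, while correlation does not. The sketch below assumes the correlation meant here is the Pearson correlation coefficient, which divides the covariance by the product of the two standard deviations. The function names and data are hypothetical.

```python
import math


def population_covariance(xs, ys):
    """Average of the products of paired deviations from each mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def pearson_correlation(xs, ys):
    """Covariance divided by the product of the standard deviations.

    Assumes neither column is constant; a constant column has a
    standard deviation of zero and would cause a division by zero.
    """
    std_x = math.sqrt(population_covariance(xs, xs))  # variance is cov(x, x)
    std_y = math.sqrt(population_covariance(ys, ys))
    return population_covariance(xs, ys) / (std_x * std_y)


# Hypothetical data: the same spending expressed in dollars and in cents.
miles = [10, 20, 30, 40, 50]
dollars = [3, 5, 4, 8, 10]
cents = [d * 100 for d in dollars]

cov_dollars = population_covariance(miles, dollars)
cov_cents = population_covariance(miles, cents)
r_dollars = pearson_correlation(miles, dollars)
r_cents = pearson_correlation(miles, cents)

print(cov_dollars, cov_cents)  # covariance is 100 times larger in cents
print(r_dollars, r_cents)      # both about 0.92, regardless of units
```

Because correlation always falls between -1 and 1, it is easier to judge than a raw covariance such as 500.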
58
00:02:29,001 --> 00:02:30,007
But before we take that step,
59
00:02:30,007 --> 00:02:32,007
I will show you how to calculate covariance
60
00:02:32,007 --> 00:02:34,000
for different sets of values.