1
00:00:00,004 --> 00:00:01,007
- [Instructor] When you analyze data,
2
00:00:01,007 --> 00:00:04,003
it's often important to see how two sets of values
3
00:00:04,003 --> 00:00:06,007
vary in relation to one another.
4
00:00:06,007 --> 00:00:09,003
For example, if people drive to get to your store,
5
00:00:09,003 --> 00:00:11,009
you might be interested in knowing if the distance
6
00:00:11,009 --> 00:00:14,009
they drive is related to how much they spend.
7
00:00:14,009 --> 00:00:17,000
In this movie, I will show you how
8
00:00:17,000 --> 00:00:20,004
to look at the relationship between two sets of values
9
00:00:20,004 --> 00:00:22,009
by calculating covariance.
10
00:00:22,009 --> 00:00:26,002
My sample file is 05_02_SingleCovariance.
11
00:00:26,002 --> 00:00:28,004
And you can find it in the Chapter05 folder
12
00:00:28,004 --> 00:00:30,007
of the Exercise Files collection.
13
00:00:30,007 --> 00:00:33,009
I'll show you two different ways to calculate covariance.
14
00:00:33,009 --> 00:00:35,005
The first is the long way.
15
00:00:35,005 --> 00:00:39,004
So I will implement the formula that you see here
16
00:00:39,004 --> 00:00:42,008
in this graphic as part of the worksheet.
17
00:00:42,008 --> 00:00:45,005
And then once you understand what's going on,
18
00:00:45,005 --> 00:00:48,001
I will show you how to use a built-in function
19
00:00:48,001 --> 00:00:50,003
to perform the same calculation.
20
00:00:50,003 --> 00:00:52,002
Rather than give an overview,
21
00:00:52,002 --> 00:00:55,000
I will go ahead and start implementing the formula
22
00:00:55,000 --> 00:00:58,000
that you see on the right in cell C2.
23
00:00:58,000 --> 00:01:01,000
So this will be for our first pair of values.
24
00:01:01,000 --> 00:01:04,001
So in C2, I will type an equal sign.
25
00:01:04,001 --> 00:01:08,004
And the first thing I want to do is subtract the average
26
00:01:08,004 --> 00:01:10,006
of the values in column one
27
00:01:10,006 --> 00:01:14,007
from the specific value in cell A2.
28
00:01:14,007 --> 00:01:19,003
So I'll type a left parenthesis, A2,
29
00:01:19,003 --> 00:01:22,006
minus, and then we want to find the average or mean
30
00:01:22,006 --> 00:01:23,007
of the values in column one.
31
00:01:23,007 --> 00:01:25,007
So that's AVERAGE.
32
00:01:25,007 --> 00:01:30,006
And the range is A2 to A11.
33
00:01:30,006 --> 00:01:33,001
And I don't want that reference to change.
34
00:01:33,001 --> 00:01:35,002
I always want to refer to that set of values
35
00:01:35,002 --> 00:01:36,003
for the average.
36
00:01:36,003 --> 00:01:40,007
So I'll press F4 and I'll go back to A2.
37
00:01:40,007 --> 00:01:44,001
Press F4 to make that an absolute reference.
38
00:01:44,001 --> 00:01:46,000
So I have my range of values,
39
00:01:46,000 --> 00:01:49,005
and I can type two right parentheses
40
00:01:49,005 --> 00:01:54,000
to close out that part of the calculation.
41
00:01:54,000 --> 00:01:57,001
Then I'll type an asterisk for multiplication,
42
00:01:57,001 --> 00:02:00,004
and we'll do the same thing for the values in column two.
43
00:02:00,004 --> 00:02:02,002
So I'll go a little faster.
44
00:02:02,002 --> 00:02:06,005
We're going to multiply that by, left parenthesis, B2
45
00:02:06,005 --> 00:02:11,004
minus the average of the values in B2.
46
00:02:11,004 --> 00:02:13,008
And again, that will be an absolute reference.
47
00:02:13,008 --> 00:02:17,001
Colon B11.
48
00:02:17,001 --> 00:02:19,007
And F4 again on the PC.
49
00:02:19,007 --> 00:02:23,000
On Mac, it's Command + T.
50
00:02:23,000 --> 00:02:26,007
Then two right parentheses, and I will press enter
51
00:02:26,007 --> 00:02:28,007
to create our calculation.
52
00:02:28,007 --> 00:02:31,007
And I have a covariance of 4.68.
53
00:02:31,007 --> 00:02:34,002
And it is a small value because we're working
54
00:02:34,002 --> 00:02:37,002
with small values in columns A and B.
55
00:02:37,002 --> 00:02:39,002
Now I want to copy this formula down
56
00:02:39,002 --> 00:02:43,000
so it covers every row of values going down
57
00:02:43,000 --> 00:02:45,002
to row 11 in the worksheet.
58
00:02:45,002 --> 00:02:47,004
So I will click cell C2.
59
00:02:47,004 --> 00:02:50,003
Double-click the fill handle at the bottom-right corner.
60
00:02:50,003 --> 00:02:52,000
I know my mouse pointer is in the right place
61
00:02:52,000 --> 00:02:54,003
when it changes to a black cross.
62
00:02:54,003 --> 00:02:55,005
So I double-clicked,
63
00:02:55,005 --> 00:03:01,004
and there I have covariances for each pair of values.
64
00:03:01,004 --> 00:03:03,005
Now, I want to add all of that up,
65
00:03:03,005 --> 00:03:07,002
so I will click in cell C12.
66
00:03:07,002 --> 00:03:08,005
Type an equal sign.
67
00:03:08,005 --> 00:03:17,001
And then, I'll do SUM, and that will be C2 to C11.
68
00:03:17,001 --> 00:03:19,009
And I want to divide that by the number of pairs,
69
00:03:19,009 --> 00:03:23,007
and I'll type in a forward slash for division,
70
00:03:23,007 --> 00:03:26,005
10 'cause that's the number I have, and enter.
71
00:03:26,005 --> 00:03:31,001
And I have an overall covariance of negative 0.48.
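The long-way calculation described above (subtract each column's mean from its value, multiply the two deviations, sum the products, divide by the number of pairs) can be sketched in Python. This is only an illustrative sketch: the function names and the ten data pairs are hypothetical and are not the values from the exercise file.

```python
def mean(values):
    """Arithmetic mean, the counterpart of AVERAGE over a range."""
    return sum(values) / len(values)


def deviation_products(xs, ys):
    """One product per pair: (x - mean of xs) * (y - mean of ys).

    This mirrors the helper column that the worksheet fills in C2:C11.
    """
    mean_x = mean(xs)
    mean_y = mean(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


def population_covariance(xs, ys):
    """Sum the deviation products and divide by the number of pairs."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("both lists must be non-empty and the same length")
    products = deviation_products(xs, ys)
    return sum(products) / len(products)


# Hypothetical data, NOT the values from the exercise file.
distance_driven = [2, 4, 4, 5, 7, 8, 9, 10, 12, 14]
amount_spent = [30, 25, 40, 22, 35, 28, 45, 20, 38, 27]

print(deviation_products(distance_driven, amount_spent))
print(population_covariance(distance_driven, amount_spent))
```

Keeping the per-pair products as a separate step follows the worksheet layout, where the helper column is filled first and the sum and division happen in a single cell afterward.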
72
00:03:31,001 --> 00:03:35,001
So that is the mechanics of how the calculation is done.
73
00:03:35,001 --> 00:03:38,005
Now, I'll show you how to do it with a built-in function.
74
00:03:38,005 --> 00:03:40,004
So I'll click in cell E3.
75
00:03:40,004 --> 00:03:42,001
Type an equal sign.
76
00:03:42,001 --> 00:03:48,004
And the function that we'll use is COVARIANCE.P.
77
00:03:48,004 --> 00:03:52,002
COVARIANCE.P returns the covariance of a population.
78
00:03:52,002 --> 00:03:54,001
And a population calculation assumes
79
00:03:54,001 --> 00:03:56,004
that you have every possible value,
80
00:03:56,004 --> 00:04:01,001
and you're working with them as part of the calculation.
81
00:04:01,001 --> 00:04:04,002
And then you need to enter the two arrays of values.
82
00:04:04,002 --> 00:04:08,005
So I have A2 through A11, then a comma,
83
00:04:08,005 --> 00:04:11,002
and B2 through B11.
84
00:04:11,002 --> 00:04:14,000
Then a right parenthesis and enter.
85
00:04:14,000 --> 00:04:15,003
And you see that once again,
86
00:04:15,003 --> 00:04:19,000
we get the value of negative 0.48.
87
00:04:19,000 --> 00:04:21,001
So this value is very close to zero.
88
00:04:21,001 --> 00:04:23,003
And even though we're working with small numbers,
89
00:04:23,003 --> 00:04:25,002
that indicates that there probably
90
00:04:25,002 --> 00:04:26,009
isn't a very strong relationship
91
00:04:26,009 --> 00:04:29,003
between these two sets of values.
92
00:04:29,003 --> 00:04:32,005
I'll go back up to E3 and double-click.
93
00:04:32,005 --> 00:04:36,007
And you might have noticed that when I was entering
94
00:04:36,007 --> 00:04:39,001
in the formula, we had two options
95
00:04:39,001 --> 00:04:41,004
for calculating covariance.
96
00:04:41,004 --> 00:04:43,009
COVARIANCE.P, which is what I demonstrated here,
97
00:04:43,009 --> 00:04:45,008
and COVARIANCE.S.
98
00:04:45,008 --> 00:04:48,006
S assumes that you have a sample of your data.
99
00:04:48,006 --> 00:04:50,003
So I will highlight that.
100
00:04:50,003 --> 00:04:53,000
Press tab, and the function has been changed
101
00:04:53,000 --> 00:04:53,008
in the formula.
102
00:04:53,008 --> 00:04:55,001
I'll press enter.
103
00:04:55,001 --> 00:04:58,009
And you can see that the value is slightly larger,
104
00:04:58,009 --> 00:05:01,001
or at least farther away from zero
105
00:05:01,001 --> 00:05:02,009
than the previous calculation.
106
00:05:02,009 --> 00:05:05,007
And that's because we are dividing by n minus one
107
00:05:05,007 --> 00:05:07,006
instead of just n.
108
00:05:07,006 --> 00:05:09,007
So it's a slightly more conservative way
109
00:05:09,007 --> 00:05:12,000
of doing the same calculation.
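The difference between the population and sample versions comes down to the divisor: n for a population, n minus one for a sample. A minimal Python sketch of that difference follows. The `covariance` function and its `sample` flag are hypothetical names chosen for illustration, and the data is invented rather than taken from the exercise file.

```python
def covariance(xs, ys, sample=False):
    """Covariance of paired values.

    sample=False divides by n (the population calculation).
    sample=True divides by n - 1 (the sample calculation).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both lists must be the same length")
    if n < (2 if sample else 1):
        raise ValueError("not enough pairs for this calculation")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data, NOT the values from the exercise file.
distance_driven = [2, 4, 4, 5, 7, 8, 9, 10, 12, 14]
amount_spent = [30, 25, 40, 22, 35, 28, 45, 20, 38, 27]

pop = covariance(distance_driven, amount_spent)
samp = covariance(distance_driven, amount_spent, sample=True)

# The sample result is the population result scaled by n / (n - 1),
# so it is never closer to zero than the population result.
print(pop, samp)
```

With 10 pairs the scaling factor is 10/9, which is why the two results differ only slightly; the gap shrinks further as the number of pairs grows.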