1
00:00:00,004 --> 00:00:02,005
- [Instructor] When you analyze business data,
2
00:00:02,005 --> 00:00:03,008
you will often want to know
3
00:00:03,008 --> 00:00:06,006
whether two sets of values are related.
4
00:00:06,006 --> 00:00:10,005
One way to do that is to calculate correlation.
5
00:00:10,005 --> 00:00:13,008
Previously, I showed you how to calculate covariance.
6
00:00:13,008 --> 00:00:16,008
And as a reminder, here is that formula.
7
00:00:16,008 --> 00:00:19,005
The idea is that you multiply the differences
8
00:00:19,005 --> 00:00:22,002
from the mean for each value pair,
9
00:00:22,002 --> 00:00:24,007
so we have two columns of data
10
00:00:24,007 --> 00:00:27,003
and we have an X and Y value that are matched up,
11
00:00:27,003 --> 00:00:31,000
and you find the sum of that product.
12
00:00:31,000 --> 00:00:35,001
You then divide that total by the number of data pairs.
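The covariance steps just described can be sketched in Python. This is a minimal illustration, not the instructor's worksheet: the function name `population_covariance` and the sample values are assumptions made for this example. It divides by the number of pairs, as the narration describes.

```python
def population_covariance(xs, ys):
    """Average product of each pair's deviations from the two means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply the differences from the mean for each matched pair, then sum.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Divide that total by the number of data pairs.
    return total / n


# Hypothetical sample data: five matched X and Y values.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
cov = population_covariance(xs, ys)
print(cov)  # the deviation products sum to 20, and 20 / 5 pairs is 4.0
```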
13
00:00:35,001 --> 00:00:37,001
Correlation is related
14
00:00:37,001 --> 00:00:39,006
and here is the formula.
15
00:00:39,006 --> 00:00:42,003
So you'll see that the top term is the same
16
00:00:42,003 --> 00:00:44,007
but is divided by the term shown.
17
00:00:44,007 --> 00:00:46,006
And it takes a while to explain,
18
00:00:46,006 --> 00:00:51,003
so I will just ask you to accept the term as given
19
00:00:51,003 --> 00:00:54,002
and use it in your calculations.
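The on-screen formulas are not part of this transcript. As a hedged reconstruction, the standard Pearson correlation coefficient matches the description: it shares its top term with covariance and divides by a term built from each column's squared deviations. The symbols below (n pairs, means written with a bar) are assumed notation, not necessarily the slide's.

```latex
\[
\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n},
\qquad
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```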
20
00:00:54,002 --> 00:00:55,009
So the next question is
21
00:00:55,009 --> 00:00:58,008
how do you interpret your correlation values?
22
00:00:58,008 --> 00:01:01,007
You will get a value between 1 and -1.
23
00:01:01,007 --> 00:01:05,003
Data that is completely uncorrelated returns
24
00:01:05,003 --> 00:01:07,000
a correlation value of 0.
25
00:01:07,000 --> 00:01:08,006
In other words, the two sets of values
26
00:01:08,006 --> 00:01:11,002
have nothing to do with each other.
27
00:01:11,002 --> 00:01:13,005
Data that is positively correlated
28
00:01:13,005 --> 00:01:16,006
will be between 0 and 1.
29
00:01:16,006 --> 00:01:18,005
And if you have 1, that means
30
00:01:18,005 --> 00:01:21,006
the data sets move in lockstep.
31
00:01:21,006 --> 00:01:24,007
In other words, they move the same way
32
00:01:24,007 --> 00:01:27,003
all the time, every time.
33
00:01:27,003 --> 00:01:29,008
And if you have data that is negatively correlated
34
00:01:29,008 --> 00:01:31,005
between minus 1 and 0
35
00:01:31,005 --> 00:01:34,005
then the data moves in opposite directions.
36
00:01:34,005 --> 00:01:39,001
So when one set of values goes up, the other one goes down.
37
00:01:39,001 --> 00:01:41,007
And of course it's possible to have values
38
00:01:41,007 --> 00:01:45,001
between 0 and 1 or 0 and -1,
39
00:01:45,001 --> 00:01:48,009
and that means the correlation isn't quite as strong.
40
00:01:48,009 --> 00:01:51,005
So let's take a look at a visual example
41
00:01:51,005 --> 00:01:53,008
of data that is not correlated.
42
00:01:53,008 --> 00:01:55,007
Here, I have five starting values
43
00:01:55,007 --> 00:01:57,004
and those are 1, 2, 3, 4, and 5.
44
00:01:57,004 --> 00:02:00,000
You can see those on the horizontal axis.
45
00:02:00,000 --> 00:02:03,006
And 1 produces two results, 3 and -3.
46
00:02:03,006 --> 00:02:08,004
And the same for the other values along the horizontal axis.
47
00:02:08,004 --> 00:02:10,003
What this means is that the starting value
48
00:02:10,003 --> 00:02:12,009
tells you nothing about the value that follows it,
49
00:02:12,009 --> 00:02:14,007
the X says nothing about the Y,
50
00:02:14,007 --> 00:02:17,006
so the correlation is 0.
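The uncorrelated example can be checked numerically. This is a sketch under assumptions: `pearson_r` is a hypothetical helper implementing the standard Pearson formula, and the data lists reproduce the chart as described, with each starting value from 1 to 5 paired with both 3 and -3.

```python
import math


def pearson_r(xs, ys):
    """Standard Pearson correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Top term: sum of the products of deviations for each pair.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Bottom term: square root of the product of the summed squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Each starting value appears twice, once with 3 and once with -3.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [3, -3, 3, -3, 3, -3, 3, -3, 3, -3]
r = pearson_r(xs, ys)
print(r)  # the paired products cancel, so the correlation is zero
```

For every X, the product with 3 cancels the product with -3, so the top term is zero whatever the bottom term is.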
51
00:02:17,006 --> 00:02:20,009
If you have data with a perfect positive correlation,
52
00:02:20,009 --> 00:02:23,001
in other words, a correlation of 1,
53
00:02:23,001 --> 00:02:26,000
then you'll see that it goes up
54
00:02:26,000 --> 00:02:27,006
and the values exactly match.
55
00:02:27,006 --> 00:02:30,006
1 gives you 1, 2 gives you 2, and so on.
56
00:02:30,006 --> 00:02:33,000
It doesn't have to be this exact pattern
57
00:02:33,000 --> 00:02:36,000
but you can see a visual example
58
00:02:36,000 --> 00:02:38,009
of what a correlation of 1 looks like.
59
00:02:38,009 --> 00:02:41,008
And data with a perfect negative correlation
60
00:02:41,008 --> 00:02:44,009
goes in the opposite direction.
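The two perfect cases can be verified the same way. In this sketch, `pearson_r` is a hypothetical helper for the standard Pearson formula, and the `scaled` list is an assumed extra data set added to illustrate that the values do not have to match exactly to reach a correlation of 1.

```python
import math


def pearson_r(xs, ys):
    """Standard Pearson correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
same = [1, 2, 3, 4, 5]          # 1 gives you 1, 2 gives you 2, and so on
scaled = [10, 20, 30, 40, 50]   # assumed example: a different pattern, still lockstep
reversed_ys = [5, 4, 3, 2, 1]   # goes in the opposite direction

r_same = pearson_r(xs, same)
r_scaled = pearson_r(xs, scaled)
r_reversed = pearson_r(xs, reversed_ys)
print(r_same, r_scaled, r_reversed)
```

The first two results come out at 1 and the last at -1, up to floating-point rounding.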
61
00:02:44,009 --> 00:02:47,003
The next question you have to ask, of course, is
62
00:02:47,003 --> 00:02:49,005
is my correlation significant?
63
00:02:49,005 --> 00:02:52,003
And that depends on several factors.
64
00:02:52,003 --> 00:02:54,003
You have the number of measurements and
65
00:02:54,003 --> 00:02:56,009
whether a value can be positive or negative.
66
00:02:56,009 --> 00:02:59,006
And by that, I mean if you're looking
67
00:02:59,006 --> 00:03:01,008
for a positive or negative value only,
68
00:03:01,008 --> 00:03:03,003
in other words, one side,
69
00:03:03,003 --> 00:03:06,008
then that's called a one-tailed test.
70
00:03:06,008 --> 00:03:09,003
If your difference can be either positive or negative
71
00:03:09,003 --> 00:03:12,008
then you have a two-tailed test.
72
00:03:12,008 --> 00:03:15,000
And then you look up the correlation value
73
00:03:15,000 --> 00:03:18,007
in a table, which you can find in statistics
74
00:03:18,007 --> 00:03:20,003
textbooks or online,
75
00:03:20,003 --> 00:03:22,005
and see what that looks like.
76
00:03:22,005 --> 00:03:24,001
And here is an example
77
00:03:24,001 --> 00:03:27,009
of a two-tailed correlation lookup table.
78
00:03:27,009 --> 00:03:30,007
And the leftmost column gives you the number
79
00:03:30,007 --> 00:03:32,004
of samples that you have,
80
00:03:32,004 --> 00:03:36,000
and then to the right you have different significance levels.
81
00:03:36,000 --> 00:03:38,007
And you subtract that significance level from 1
82
00:03:38,007 --> 00:03:42,002
to give you the confidence level that you want.
83
00:03:42,002 --> 00:03:46,002
So, for example, in the column next to N, 0.1
84
00:03:46,002 --> 00:03:50,001
you subtract that from 1 for the 90% confidence interval
85
00:03:50,001 --> 00:03:52,001
or confidence level.
86
00:03:52,001 --> 00:03:54,005
And then to the right of that, you have 0.05,
87
00:03:54,005 --> 00:03:57,005
so that's 95%, 98%,
88
00:03:57,005 --> 00:04:00,004
and the other two values you see there.
89
00:04:00,004 --> 00:04:05,008
So if you have 10 values and you calculate the correlation
90
00:04:05,008 --> 00:04:08,006
and you want to look at the 90% level
91
00:04:08,006 --> 00:04:11,009
then you would need to have a correlation value
92
00:04:11,009 --> 00:04:13,007
of at least 0.55,
93
00:04:13,007 --> 00:04:16,003
and that's the value at the intersection
94
00:04:16,003 --> 00:04:19,000
of 10 and 0.1
95
00:04:19,000 --> 00:04:23,000
to have a 90% certainty or 90% confidence level
96
00:04:23,000 --> 00:04:25,006
that the correlation is valid.
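The 0.55 threshold can be reproduced approximately. This sketch assumes the lookup table is derived from Student's t distribution with n - 2 degrees of freedom, which is the usual basis for such tables but is not confirmed by the transcript. The function name `critical_r` is hypothetical, and the t value 1.860 (8 degrees of freedom, two-tailed, 0.1 column) is taken from a standard t table rather than from the course.

```python
import math


def critical_r(t_crit, n):
    """Smallest absolute correlation that is significant for n pairs.

    Rearranges t = r * sqrt(n - 2) / sqrt(1 - r**2) to solve for r.
    """
    df = n - 2  # degrees of freedom for a correlation test
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Assumption: two-tailed t critical value for 8 degrees of freedom
# at the 0.1 level, read from a standard t table.
T_CRIT_DF8_TWO_TAILED_010 = 1.860

threshold = critical_r(T_CRIT_DF8_TWO_TAILED_010, n=10)
print(round(threshold, 2))  # 0.55, matching the table entry for 10 and 0.1
```

A correlation computed from 10 pairs would need an absolute value at or above this threshold to pass the two-tailed test at the 90% confidence level.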
97
00:04:25,006 --> 00:04:30,001
And the various other combinations are available here.
98
00:04:30,001 --> 00:04:33,002
So again, there are a lot of moving parts to correlation
99
00:04:33,002 --> 00:04:36,000
but it is a very useful way to look at your data.