1
00:00:00,750 --> 00:00:02,850
Instructor: All right, excellent.
2
00:00:02,850 --> 00:00:05,310
We've covered all univariate measures.
3
00:00:05,310 --> 00:00:07,890
Now it is time to see measures that are used when we work
4
00:00:07,890 --> 00:00:09,780
with more than one variable.
5
00:00:09,780 --> 00:00:12,120
In the next two lessons, we'll explore measures
6
00:00:12,120 --> 00:00:15,123
that can help us understand the relationship between variables.
7
00:00:15,960 --> 00:00:18,842
Our focus will be on covariance
8
00:00:18,842 --> 00:00:21,360
and the linear correlation coefficient.
9
00:00:21,360 --> 00:00:22,620
Let's zoom out a bit and think
10
00:00:22,620 --> 00:00:24,990
of an example that is very easy to understand
11
00:00:24,990 --> 00:00:26,550
and will help us grasp the nature
12
00:00:26,550 --> 00:00:30,120
of the relationship between two variables a bit better.
13
00:00:30,120 --> 00:00:32,250
Think about real estate. What is one
14
00:00:32,250 --> 00:00:35,400
of the main factors that determine house prices?
15
00:00:35,400 --> 00:00:36,543
Their size, right?
16
00:00:37,500 --> 00:00:39,900
Typically, larger houses are more expensive,
17
00:00:39,900 --> 00:00:42,480
as people like having extra space.
18
00:00:42,480 --> 00:00:43,860
The table that you can see here
19
00:00:43,860 --> 00:00:46,440
shows us data about several houses.
20
00:00:46,440 --> 00:00:49,140
On the left side, we can see the size of each house,
21
00:00:49,140 --> 00:00:50,760
and on the right we have the price
22
00:00:50,760 --> 00:00:53,110
at which it's been listed in a local newspaper.
23
00:00:54,120 --> 00:00:57,090
We can present these data points in a scatter plot.
24
00:00:57,090 --> 00:00:59,580
The x-axis will show a house's size,
25
00:00:59,580 --> 00:01:02,523
and the y-axis will provide information about its price.
26
00:01:03,600 --> 00:01:05,640
We can certainly notice a pattern.
27
00:01:05,640 --> 00:01:08,523
There is a clear relationship between these variables.
28
00:01:09,600 --> 00:01:11,970
We say that the two variables are correlated,
29
00:01:11,970 --> 00:01:14,400
and the main statistic to measure this relationship
30
00:01:14,400 --> 00:01:15,723
is called covariance.
31
00:01:16,680 --> 00:01:19,260
Unlike variance, covariance may be positive,
32
00:01:19,260 --> 00:01:21,123
equal to zero, or negative.
33
00:01:22,140 --> 00:01:24,000
To understand the concept better,
34
00:01:24,000 --> 00:01:26,070
I would like to show you the formulas that allow us to
35
00:01:26,070 --> 00:01:29,070
calculate the covariance between two variables.
36
00:01:29,070 --> 00:01:31,680
It's "formulas", plural, because once again
37
00:01:31,680 --> 00:01:34,770
there is a sample and a population formula.
38
00:01:34,770 --> 00:01:35,603
Here they are.
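The formulas appear on screen in the lecture but not in the subtitles. The standard textbook forms are written out below as an assumption; the notation ($s_{xy}$ for the sample covariance, $\sigma_{xy}$ for the population covariance, $\bar{x}, \bar{y}$ for sample means, $\mu_x, \mu_y$ for population means) is chosen here for illustration and may differ from the slide. The sample version divides by $n - 1$, which matches the "sample size minus one" step used later in the lecture.

```latex
s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
\qquad
\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}
```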
39
00:01:37,260 --> 00:01:39,480
Since this is obviously sample data,
40
00:01:39,480 --> 00:01:41,763
we should use the sample covariance formula.
41
00:01:43,200 --> 00:01:44,490
Let's apply it in practice
42
00:01:44,490 --> 00:01:46,770
for the example that we saw earlier.
43
00:01:46,770 --> 00:01:49,893
X will be house size and Y stands for house price.
44
00:01:51,120 --> 00:01:52,350
First, we need to calculate
45
00:01:52,350 --> 00:01:54,333
the mean size and the mean price.
46
00:01:55,350 --> 00:01:58,170
I will also compute the sample standard deviations
47
00:01:58,170 --> 00:02:00,630
in case we need them later on.
48
00:02:00,630 --> 00:02:02,103
Okay, done.
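The first step (means, plus sample standard deviations "in case we need them later") can be sketched with Python's standard `statistics` module. The numbers below are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the house data from the lecture's table. Note that `statistics.stdev` divides by n - 1, so it returns the sample standard deviation.

```python
from statistics import mean, stdev

# Hypothetical sample (NOT the lecture's table): house sizes in square
# meters and listing prices in thousands of dollars.
sizes = [50, 60, 70, 80, 90]
prices = [100, 130, 140, 170, 210]

# Sample means of x (size) and y (price).
size_mean = mean(sizes)
price_mean = mean(prices)

# stdev() uses n - 1 in the denominator, i.e. the sample standard deviation.
size_sd = stdev(sizes)
price_sd = stdev(prices)

print(size_mean, price_mean)
print(round(size_sd, 2), round(price_sd, 2))
```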
49
00:02:03,150 --> 00:02:05,220
Now let's calculate the numerator
50
00:02:05,220 --> 00:02:06,663
of the covariance formula.
51
00:02:07,620 --> 00:02:09,060
Starting with the first house,
52
00:02:09,060 --> 00:02:11,130
I'll multiply the difference between its size
53
00:02:11,130 --> 00:02:12,480
and the average house size
54
00:02:12,480 --> 00:02:13,800
by the difference between the price
55
00:02:13,800 --> 00:02:16,100
of the same house and the average house price.
56
00:02:17,670 --> 00:02:20,250
Once that's done, we have to perform this calculation
57
00:02:20,250 --> 00:02:22,530
for all houses that we have in the table,
58
00:02:22,530 --> 00:02:24,663
and then sum the numbers we've obtained.
59
00:02:26,220 --> 00:02:27,090
See?
60
00:02:27,090 --> 00:02:27,923
Great.
61
00:02:29,190 --> 00:02:31,500
Our sample size is five.
62
00:02:31,500 --> 00:02:33,420
Now we have to divide the sum above
63
00:02:33,420 --> 00:02:35,583
by the sample size minus one.
64
00:02:36,780 --> 00:02:38,760
The result is the covariance.
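The whole procedure (one product of deviations per house, summed, then divided by the sample size minus one) can be sketched as a small Python function. `sample_covariance` is a hypothetical helper written for illustration, and the data are the same made-up numbers used for the means above, not the lecture's table.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of (x_i - x_mean) * (y_i - y_mean), divided by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Numerator: multiply each house's two deviations, then sum over all houses.
    numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Divide by the sample size minus one.
    return numerator / (n - 1)


# Hypothetical sample: sizes in square meters, prices in thousands of dollars.
sizes = [50, 60, 70, 80, 90]
prices = [100, 130, 140, 170, 210]
print(sample_covariance(sizes, prices))  # 650.0
```

On Python 3.10 and later, the standard library's `statistics.covariance(x, y)` computes the same sample covariance and can serve as a cross-check.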
65
00:02:38,760 --> 00:02:40,350
It gives us a sense of the direction
66
00:02:40,350 --> 00:02:42,990
in which the two variables are moving.
67
00:02:42,990 --> 00:02:44,490
If they go in the same direction,
68
00:02:44,490 --> 00:02:46,530
the covariance will have a positive sign.
69
00:02:46,530 --> 00:02:48,390
If they move in opposite directions,
70
00:02:48,390 --> 00:02:50,523
the covariance will have a negative sign.
71
00:02:51,510 --> 00:02:53,640
Finally, if their movements are independent,
72
00:02:53,640 --> 00:02:55,410
the covariance between the house size
73
00:02:55,410 --> 00:02:57,243
and its price will be zero, or close to zero in a sample.
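The sign rule can be checked on three tiny made-up data sets: one where y tends to rise with x, one where it tends to fall, and one with no consistent direction. `sample_covariance` and `describe` are hypothetical helpers, and the tolerance `tol` used to decide "about zero" is an arbitrary assumption, since a real sample covariance is rarely exactly zero.

```python
def sample_covariance(x, y):
    """Sum of products of deviations from the mean, divided by n - 1."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)


def describe(cov, tol=1e-9):
    """Translate the sign of a covariance into a direction of joint movement.

    tol is an arbitrary cutoff for treating a value as zero (an assumption).
    """
    if cov > tol:
        return "positive: the variables tend to move in the same direction"
    if cov < -tol:
        return "negative: the variables tend to move in opposite directions"
    return "about zero: no linear tendency to move together"


x = [1, 2, 3, 4, 5]
together = [2, 4, 5, 4, 5]   # tends to rise as x rises
opposite = [9, 7, 6, 4, 4]   # tends to fall as x rises
scattered = [4, 2, 6, 6, 2]  # no consistent direction

for name, y in [("together", together), ("opposite", opposite), ("scattered", scattered)]:
    cov = sample_covariance(x, y)
    print(f"{name}: {cov} -> {describe(cov)}")
```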
74
00:02:58,410 --> 00:03:01,770
There is just one tiny problem with covariance, though:
75
00:03:01,770 --> 00:03:03,810
it could be a number like five or 50,
76
00:03:03,810 --> 00:03:08,100
but it can also be something like 0.0023456,
77
00:03:08,100 --> 00:03:11,670
or even over 30 million, as in our example.
78
00:03:11,670 --> 00:03:14,700
These values are on completely different scales.
79
00:03:14,700 --> 00:03:16,920
How could one interpret such numbers?
80
00:03:16,920 --> 00:03:19,140
Proceed to the next lecture to find out how
81
00:03:19,140 --> 00:03:22,830
the correlation coefficient can help us with this issue.
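One reason the magnitude is hard to read: covariance carries the units of both variables, so multiplying one variable by a constant multiplies the covariance by the same constant, even though the relationship itself is unchanged. The sketch below uses the same hypothetical sizes and prices as before and simply re-expresses the prices in dollars instead of thousands of dollars; `sample_covariance` is again a hypothetical helper.

```python
def sample_covariance(x, y):
    """Sum of products of deviations from the mean, divided by n - 1."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)


# Hypothetical sample (not the lecture's table).
sizes_m2 = [50, 60, 70, 80, 90]
prices_thousands = [100, 130, 140, 170, 210]
# The very same prices, expressed in dollars rather than thousands of dollars.
prices_dollars = [p * 1000 for p in prices_thousands]

cov_thousands = sample_covariance(sizes_m2, prices_thousands)
cov_dollars = sample_covariance(sizes_m2, prices_dollars)

print(cov_thousands)  # 650.0
print(cov_dollars)    # 650000.0
```

Nothing about the houses changed between the two runs; only the unit of price did, and the covariance grew a thousandfold.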
82
00:03:22,830 --> 00:03:23,830
Thanks for watching.