1
00:00:00,600 --> 00:00:01,980
This lesson will introduce you to
2
00:00:01,980 --> 00:00:04,710
the three measures of central tendency.
3
00:00:04,710 --> 00:00:06,840
Don't be scared by the terminology,
4
00:00:06,840 --> 00:00:09,810
we are talking about mean, median, and mode.
5
00:00:09,810 --> 00:00:11,820
Even if you are familiar with these terms,
6
00:00:11,820 --> 00:00:13,770
please stick around as we will explore their
7
00:00:13,770 --> 00:00:16,020
upsides and shortfalls.
8
00:00:16,020 --> 00:00:16,950
Ready?
9
00:00:16,950 --> 00:00:18,510
Let's go.
10
00:00:18,510 --> 00:00:20,760
The first measure we will study is the mean,
11
00:00:20,760 --> 00:00:22,773
also known as the simple average.
12
00:00:23,640 --> 00:00:26,820
It is denoted by the Greek letter mu for a population
13
00:00:26,820 --> 00:00:28,950
and x bar for a sample.
14
00:00:28,950 --> 00:00:31,600
These notations will come in handy in the next section.
15
00:00:32,790 --> 00:00:35,580
We can find the mean of a data set by adding up all of its
16
00:00:35,580 --> 00:00:38,080
components and then dividing the sum by their number.
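A minimal Python sketch of that rule: add up the values, then divide the sum by how many there are. The function name `simple_mean` and the sample numbers are illustrative assumptions, not part of the lesson.

```python
def simple_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data, chosen only to illustrate the arithmetic.
sample = [2, 4, 6, 8]
print(simple_mean(sample))  # 20 divided by 4
```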
17
00:00:39,030 --> 00:00:41,970
The mean is the most common measure of central tendency,
18
00:00:41,970 --> 00:00:44,280
but it has a huge downside.
19
00:00:44,280 --> 00:00:47,250
It is easily affected by outliers.
20
00:00:47,250 --> 00:00:49,443
Let's aid ourselves with an example.
21
00:00:50,910 --> 00:00:54,000
These are the prices of pizza at eleven different locations
22
00:00:54,000 --> 00:00:57,810
in New York City and ten different locations in LA.
23
00:00:57,810 --> 00:00:59,100
Let's calculate the means of
24
00:00:59,100 --> 00:01:01,233
the two data sets using the formula.
25
00:01:02,550 --> 00:01:05,580
For the mean in NYC, we get eleven dollars.
26
00:01:05,580 --> 00:01:08,343
Whereas for LA, just 5.5.
27
00:01:09,570 --> 00:01:11,940
On average, pizza in New York can't be
28
00:01:11,940 --> 00:01:14,940
twice as expensive as in LA, right?
29
00:01:14,940 --> 00:01:15,780
Correct!
30
00:01:15,780 --> 00:01:17,580
The problem is that in our sample,
31
00:01:17,580 --> 00:01:19,711
we included one posh place in New York
32
00:01:19,711 --> 00:01:22,740
where they charge 66 dollars for pizza
33
00:01:22,740 --> 00:01:24,093
and this doubled the mean.
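The effect of that single outlier can be checked with a short Python sketch. The price table shown in the video is not part of this transcript, so the list below is an assumption: eleven values chosen to match the figures quoted in the lesson (a mean of eleven dollars and one 66-dollar pizza).

```python
from statistics import mean

# Assumed NYC prices in dollars; the real table is not in the transcript.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]

# Mean of all eleven locations, posh place included.
mean_with_outlier = mean(nyc_prices)

# Mean of the remaining ten locations once the 66-dollar pizza is dropped.
mean_without_outlier = mean([p for p in nyc_prices if p != 66])

print(mean_with_outlier, mean_without_outlier)
```

With these assumed values, one observation out of eleven is enough to move the mean from 5.5 to 11.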
34
00:01:25,110 --> 00:01:27,120
What we should take away from this example
35
00:01:27,120 --> 00:01:30,213
is that the mean is not enough to draw definite conclusions.
36
00:01:31,620 --> 00:01:35,310
So, how can we protect ourselves from this issue?
37
00:01:35,310 --> 00:01:36,450
You guessed it!
38
00:01:36,450 --> 00:01:39,873
We can calculate the second measure, the median.
39
00:01:40,920 --> 00:01:43,110
The median is basically the middle number
40
00:01:43,110 --> 00:01:44,283
in an ordered data set.
41
00:01:45,330 --> 00:01:47,430
Let's see how it works for our example.
42
00:01:47,430 --> 00:01:49,560
To calculate the median, we have to
43
00:01:49,560 --> 00:01:52,050
sort our data in ascending order.
44
00:01:52,050 --> 00:01:54,720
The median of the data set is the number at position
45
00:01:54,720 --> 00:01:58,320
(n + 1) divided by 2 in the ordered list,
46
00:01:58,320 --> 00:02:00,753
where n is the number of observations.
47
00:02:01,950 --> 00:02:04,380
Therefore, the median for NYC is at the
48
00:02:04,380 --> 00:02:07,200
sixth position, or six dollars.
49
00:02:07,200 --> 00:02:09,090
Much closer to the observed prices
50
00:02:09,090 --> 00:02:11,039
than the mean of eleven dollars, right?
51
00:02:12,330 --> 00:02:13,920
What about LA?
52
00:02:13,920 --> 00:02:16,500
We have just ten observations in LA.
53
00:02:16,500 --> 00:02:20,283
According to our formula, the median is at position 5.5.
54
00:02:21,240 --> 00:02:23,910
In cases like this, the median is the simple average
55
00:02:23,910 --> 00:02:26,850
of the numbers at positions 5 and 6.
56
00:02:26,850 --> 00:02:30,693
Therefore, the median of LA prices is 5.5 dollars.
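The position rule can be sketched in Python as follows. The helper name `median_by_position` is hypothetical, and both price lists are assumptions chosen to match the figures quoted in the lesson, since the actual table is not in the transcript.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2  # 1-based position in the ordered list
    if position.is_integer():
        # Odd n: the position is a whole number, so take that value.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average its two neighbours.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


# Assumed prices in dollars, not taken from the transcript.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]  # 11 observations
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # 10 observations

print(median_by_position(nyc_prices))  # value at position 6
print(median_by_position(la_prices))   # average of positions 5 and 6
```

Python's standard library offers `statistics.median`, which gives the same results for these lists; the hand-written version is only there to mirror the rule as stated.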
57
00:02:32,160 --> 00:02:33,780
Okay, we have seen that the median
58
00:02:33,780 --> 00:02:36,000
is not affected by extreme prices.
59
00:02:36,000 --> 00:02:38,280
Which is good when we have posh New York restaurants
60
00:02:38,280 --> 00:02:39,870
in a street pizza sample.
61
00:02:39,870 --> 00:02:42,810
But we still don't get the full picture.
62
00:02:42,810 --> 00:02:46,053
We must introduce another measure, the mode.
63
00:02:47,070 --> 00:02:50,520
The mode is the value that occurs most often.
64
00:02:50,520 --> 00:02:53,520
It can be used for both numerical and categorical data,
65
00:02:53,520 --> 00:02:55,773
but we will stick to our numerical example.
66
00:02:56,880 --> 00:02:59,100
After counting the frequencies of each value,
67
00:02:59,100 --> 00:03:01,020
we find that the mode of New York pizza
68
00:03:01,020 --> 00:03:03,240
prices is three dollars.
69
00:03:03,240 --> 00:03:05,160
Now, that's interesting.
70
00:03:05,160 --> 00:03:08,760
The most common price of pizza in NYC is just 3 dollars,
71
00:03:08,760 --> 00:03:10,740
but the mean and median led us to
72
00:03:10,740 --> 00:03:12,693
believe it was much more expensive.
73
00:03:14,130 --> 00:03:14,963
Okay,
74
00:03:14,963 --> 00:03:18,453
let's do the same and find the mode of LA pizza prices.
75
00:03:19,620 --> 00:03:22,847
Hm, each price only appears once.
76
00:03:22,847 --> 00:03:25,260
How do we find the mode then?
77
00:03:25,260 --> 00:03:27,963
Well, we say there is no mode.
78
00:03:29,190 --> 00:03:32,670
But can't I say there are ten modes, you may ask?
79
00:03:32,670 --> 00:03:33,503
Sure you can,
80
00:03:33,503 --> 00:03:36,060
but it will be meaningless with ten observations
81
00:03:36,060 --> 00:03:38,673
and an experienced statistician would never do that.
82
00:03:39,780 --> 00:03:42,600
In general, you often have multiple modes.
83
00:03:42,600 --> 00:03:45,030
Usually, two or three modes are tolerable,
84
00:03:45,030 --> 00:03:46,710
but more than that would defeat the
85
00:03:46,710 --> 00:03:48,123
purpose of finding a mode.
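Counting frequencies is easy to sketch in Python. The helper `find_modes` is hypothetical and follows the convention used in this lesson: when every value appears only once, it reports no mode. The price lists are assumptions consistent with the figures quoted in the lesson, not data from the transcript.

```python
from collections import Counter


def find_modes(values):
    """Return the most frequent value(s), or [] when nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once: by the lesson's convention, no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


# Assumed prices in dollars, not taken from the transcript.
nyc_prices = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11, 66]
la_prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(find_modes(nyc_prices))  # one repeated price
print(find_modes(la_prices))   # no repeated price
```

Returning a list keeps the door open for data sets with two or three modes. Note that the standard library's `statistics.multimode` makes the other choice and returns all ten values for a list like the LA one.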
86
00:03:49,560 --> 00:03:52,110
There is one last question we haven't answered.
87
00:03:52,110 --> 00:03:54,003
Which measure is best?
88
00:03:54,870 --> 00:03:57,690
The NYC and LA example shows us that the measures
89
00:03:57,690 --> 00:04:00,090
of central tendency should be used together,
90
00:04:00,090 --> 00:04:01,830
rather than independently.
91
00:04:01,830 --> 00:04:04,200
Therefore, there is no best,
92
00:04:04,200 --> 00:04:07,593
but using only one is definitely the worst!
93
00:04:08,910 --> 00:04:13,020
All right, now you know about the mean, median, and mode.
94
00:04:13,020 --> 00:04:15,540
In our next video we will use that knowledge
95
00:04:15,540 --> 00:04:17,579
to talk about skewness.
96
00:04:17,579 --> 00:04:19,473
Stay tuned and thanks for watching!