1
00:00:00,000 --> 00:00:03,300
I want to explain the difference between the mean and the median.
2
00:00:03,300 --> 00:00:06,950
I think it's one of those things that it's easy to kind of get confused,
3
00:00:06,950 --> 00:00:10,180
or I find that a lot of people want to talk about averages,
4
00:00:10,180 --> 00:00:11,945
and what's the average,
5
00:00:11,945 --> 00:00:16,230
and really, the median is a more representative way of talking about a dataset.
6
00:00:16,230 --> 00:00:17,430
So, I'm just going to use
7
00:00:17,430 --> 00:00:21,179
this visual example to try and show you the difference between the two,
8
00:00:21,179 --> 00:00:24,570
and how an average can end up being kind of skewed or
9
00:00:24,570 --> 00:00:29,290
distorted in some way based on some outlying information and outlying data.
10
00:00:30,200 --> 00:00:32,850
So, if you have a number line like this,
11
00:00:32,850 --> 00:00:37,200
let's say we're talking about the age of a group of people.
12
00:00:37,200 --> 00:00:40,320
If we put those ages on this number line,
13
00:00:40,320 --> 00:00:41,980
and so we have five people.
14
00:00:41,980 --> 00:00:44,500
They all happen to be kids, and their ages are 2,
15
00:00:44,500 --> 00:00:47,630
3, 5, 7, and 8 years old.
16
00:00:47,630 --> 00:00:51,810
If I said, "How can we describe the ages of that group?"
17
00:00:51,810 --> 00:00:56,570
Well, you could do it based on the mean, which is the same thing as saying the average.
18
00:00:56,570 --> 00:00:59,340
So, if you took the average of those ages and literally add them up,
19
00:00:59,340 --> 00:01:00,875
divide by the total number,
20
00:01:00,875 --> 00:01:02,970
you get a mean of 5.
21
00:01:02,970 --> 00:01:04,500
So, if I said, "How old are those kids?"
22
00:01:04,500 --> 00:01:07,140
and you said, "Well, on average they're five years old."
23
00:01:07,140 --> 00:01:09,130
Then, I would know what you're talking about and I would say,
24
00:01:09,130 --> 00:01:10,935
"Okay, I know how old they are."
25
00:01:10,935 --> 00:01:13,820
It turns out that the median is also five.
26
00:01:13,820 --> 00:01:20,660
The median just takes the number of values and divides them up into two categories.
27
00:01:20,660 --> 00:01:23,270
So, in this case, the values have been sorted from lowest to
28
00:01:23,270 --> 00:01:26,230
highest and it says which value is in the middle.
29
00:01:26,230 --> 00:01:29,005
So we have half of the values are above the median,
30
00:01:29,005 --> 00:01:30,700
half the values are below the median.
31
00:01:30,700 --> 00:01:33,950
It doesn't matter what those values actually are as long as we know
32
00:01:33,950 --> 00:01:38,445
that they've been sorted and we've divided them into half above, half below.
33
00:01:38,445 --> 00:01:41,090
So, here we have a mean and a median that are equal.
34
00:01:41,090 --> 00:01:43,540
So, I could use either one to describe this dataset,
35
00:01:43,540 --> 00:01:48,195
and it would make sense and you would understand what we're talking about.
36
00:01:48,195 --> 00:01:54,210
What happens though if one of the kids is not as close in age to the other ones,
37
00:01:54,210 --> 00:01:59,690
and maybe is 13 years old instead of 8, while the other ones are still 2, 3, 5, and 7,
38
00:01:59,690 --> 00:02:02,885
so now, the mean has been changed,
39
00:02:02,885 --> 00:02:06,995
because remember we're adding up the total and then dividing by the number of values.
40
00:02:06,995 --> 00:02:09,895
Now, our mean has gone from 5 to 6.
41
00:02:09,895 --> 00:02:11,920
The median though has not changed,
42
00:02:11,920 --> 00:02:13,370
because that last value,
43
00:02:13,370 --> 00:02:15,890
all it means is that it's still above
44
00:02:15,890 --> 00:02:17,990
the halfway mark in terms of how many values we
45
00:02:17,990 --> 00:02:20,335
had when we sort them, put them into two groups.
46
00:02:20,335 --> 00:02:26,145
So, our mean is now not as representative necessarily,
47
00:02:26,145 --> 00:02:30,310
because four of the people in this group are 2, 3, 5, and 7.
48
00:02:30,310 --> 00:02:33,205
So, if you said that the mean is 6,
49
00:02:33,205 --> 00:02:34,770
yeah, I guess it's still not that bad.
50
00:02:34,770 --> 00:02:38,210
But what happens if we have a kid that's not 13 years old,
51
00:02:38,210 --> 00:02:42,790
but is really just a kid at heart and is actually 53 years old?
52
00:02:42,790 --> 00:02:46,580
So, now we have five people in this group and
53
00:02:46,580 --> 00:02:51,080
four of them are between the ages of 2 and 7 and one of them is 53 years old.
54
00:02:51,080 --> 00:02:52,810
So look what happens to the mean here.
55
00:02:52,810 --> 00:02:55,570
Now, we have a mean of 14 years old.
56
00:02:55,570 --> 00:02:58,850
So, if I said, how old are the people in that group,
57
00:02:58,850 --> 00:03:00,920
and you said, 14 is the average.
58
00:03:00,920 --> 00:03:02,160
Then, I would say, "Okay,
59
00:03:02,160 --> 00:03:03,180
so they're all teenagers."
60
00:03:03,180 --> 00:03:06,170
Or you'd kind of have that in your mind when really four of them are
61
00:03:06,170 --> 00:03:09,515
at the age of seven or less and one of them is an adult.
62
00:03:09,515 --> 00:03:13,935
So, notice though that the median is still five.
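The three age groups described so far can be checked with a short Python sketch. It uses the standard library's `statistics.mean` and `statistics.median`; the names `groups` and `summarize` are illustrative assumptions and do not come from the lecture.

```python
from statistics import mean, median

# Ages from the example; only the oldest person changes between groups.
groups = {
    "all kids": [2, 3, 5, 7, 8],
    "one 13-year-old": [2, 3, 5, 7, 13],
    "one 53-year-old": [2, 3, 5, 7, 53],
}


def summarize(ages):
    """Return (mean, median) for a list of ages."""
    return mean(ages), median(ages)


for label, ages in groups.items():
    avg, mid = summarize(ages)
    print(f"{label}: mean={avg}, median={mid}")
```

The mean moves from 5 to 6 to 14 as the oldest age grows, while the median stays at 5 in every group.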
63
00:03:13,935 --> 00:03:15,980
This is really, I hope, a good way of
64
00:03:15,980 --> 00:03:18,620
visualizing the difference between the mean and the median.
65
00:03:18,620 --> 00:03:20,750
That's why often when you're talking to anyone
66
00:03:20,750 --> 00:03:23,280
who's sort of more versed in statistics,
67
00:03:23,280 --> 00:03:24,600
if you say an average,
68
00:03:24,600 --> 00:03:27,410
yes, it's something that people can relate to.
69
00:03:27,410 --> 00:03:30,320
If it really is generally representative,
70
00:03:30,320 --> 00:03:31,790
there's no harm done,
71
00:03:31,790 --> 00:03:33,934
but if there's any kind of outlier
72
00:03:33,934 --> 00:03:37,490
happening if you have somebody that's way outside of the group,
73
00:03:37,490 --> 00:03:39,740
like for example if you had income for
74
00:03:39,740 --> 00:03:42,220
a group and you know everyone makes around the same amount of money,
75
00:03:42,220 --> 00:03:46,090
but then you have some billionaire added to that group.
76
00:03:46,090 --> 00:03:49,970
Then you've got one person who's completely skewing the average,
77
00:03:49,970 --> 00:03:52,249
but the median would still be representative.
78
00:03:52,249 --> 00:03:55,910
So, I just wanted to make sure that those two concepts are clear,
79
00:03:55,910 --> 00:03:58,590
it really does make a difference when you're looking at data,
80
00:03:58,590 --> 00:04:00,890
when you're classifying it for mapping.
81
00:04:00,890 --> 00:04:04,975
If we look at how the mean and the median relate to our mapping data here,
82
00:04:04,975 --> 00:04:10,520
so this is the classification dialog box in ArcMap and here's our distribution,
83
00:04:10,520 --> 00:04:13,425
and so here's the median for this dataset,
84
00:04:13,425 --> 00:04:14,945
and here's the mean,
85
00:04:14,945 --> 00:04:17,900
and so the average is a little bit higher than the median.
86
00:04:17,900 --> 00:04:20,780
That makes sense because you can see that there's a couple of
87
00:04:20,780 --> 00:04:24,670
outliers here that are dragging the mean higher than the median.
88
00:04:24,670 --> 00:04:26,600
In other words, they're skewing the data a little bit
89
00:04:26,600 --> 00:04:29,405
because of the fact that we have these outliers.
90
00:04:29,405 --> 00:04:31,180
With the data values I'm using here,
91
00:04:31,180 --> 00:04:33,710
which are income values for census tracts,
92
00:04:33,710 --> 00:04:37,480
I'm showing both the median income and the average income,
93
00:04:37,480 --> 00:04:39,300
to show you the difference.
94
00:04:39,300 --> 00:04:42,710
So, these are actually being calculated for each of these areas,
95
00:04:42,710 --> 00:04:47,055
and you'll notice that towards the low end of the data range,
96
00:04:47,055 --> 00:04:49,310
the values are not that far apart.
97
00:04:49,310 --> 00:04:51,815
The average is still a little bit higher than the median,
98
00:04:51,815 --> 00:04:55,595
but maybe $7,000 difference, something like that.
99
00:04:55,595 --> 00:04:57,725
But if you look at the high end of the range,
100
00:04:57,725 --> 00:05:03,955
you actually have a difference between the average and the median of about $160,000.
101
00:05:03,955 --> 00:05:06,300
So, what is more representative of that group?
102
00:05:06,300 --> 00:05:10,765
This is probably, you've got some very wealthy people living in that neighborhood,
103
00:05:10,765 --> 00:05:12,810
but not that many of them necessarily,
104
00:05:12,810 --> 00:05:15,420
because the median is much lower.
105
00:05:15,420 --> 00:05:17,630
So, that tells me that the median is probably going
106
00:05:17,630 --> 00:05:19,880
to give us a more representative number in
107
00:05:19,880 --> 00:05:22,520
terms of describing the income levels
108
00:05:22,520 --> 00:05:25,270
of the people that live in that particular census tract.