All language subtitles for 09_mean-vs-median.en

I want to explain the difference between the mean and the median. I think it's one of those things that's easy to get confused about. I find that a lot of people want to talk about averages, and what the average is, when really the median is a more representative way of talking about a dataset. So, I'm just going to use this visual example to try and show you the difference between the two, and how an average can end up being skewed or distorted in some way by outlying data.

So, if you have a number line like this, let's say we're talking about the ages of a group of people. We put those ages on this number line, and we have five people. They all happen to be kids, and their ages are 2, 3, 5, 7, and 8 years old. I said, "How can we describe the ages of that group?" Well, you could do it based on the mean, which is the same thing as saying the average. So, if you took the average of those ages, literally add them up and divide by the total number, you get a mean of 5. So, if I said, "How old are those kids?" and you said, "Well, on average they're five years old."
Then I would know what you're talking about and I would say, "Okay, I know how old they are." It turns out that the median is also 5. The median just takes the values and divides them up into two groups. So, in this case, the values have been sorted from lowest to highest, and the median is the value in the middle. Half of the values are above the median and half of the values are below the median. It doesn't matter what those values actually are, as long as we know that they've been sorted and we've divided them into half above, half below. So, here we have a mean and a median that are equal. I could use either one to describe this dataset, and it would make sense; you would understand what we're talking about.

What happens, though, if one of the kids is not as close in age to the other ones, and maybe is 13 years old, while the other ones are 2, 3, 5, and 7? Now the mean has changed, because remember, we're adding up the total and then dividing by the number of values. Our mean has gone from 5 to 6.
The median, though, has not changed, because all that last value means is that it's still above the halfway mark when we sort the values and put them into two groups. So, our mean is now not necessarily as representative, because four of the people in this group are 2, 3, 5, and 7. If you said that the mean is 6, yeah, I guess it's still not that bad.

But what happens if we have a kid that's not 13 years old, but is really just a kid at heart and is actually 53 years old? So, now we have five people in this group, and four of them are between the ages of 2 and 7 and one of them is 53 years old. Look what happens to the mean here. Now, we have a mean of 14 years old. So, if I said, "How old are the people in that group?" and you said, "14 is the average," then I would say, "Okay, so they're all teenagers." Or you'd kind of have that in your mind, when really four of them are age seven or less and one of them is an adult. Notice, though, that the median is still 5. This is, I hope, a good way of visualizing the difference between the mean and the median.
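The three versions of the age example can be checked with a short Python sketch. The functions `mean` and `median` here are written out by hand purely for illustration and are not something shown in the video; averaging the two middle values for an even count is the usual convention and is an addition, since the example only uses groups of five.

```python
def mean(values):
    """Add the values up and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Sort the values and return the one in the middle.

    With an even count there is no single middle value, so the two
    middle values are averaged (the usual convention).
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Same four younger kids each time; only the oldest person changes.
groups = {
    "oldest is 8": [2, 3, 5, 7, 8],
    "oldest is 13": [2, 3, 5, 7, 13],
    "oldest is 53": [2, 3, 5, 7, 53],
}

for label, ages in groups.items():
    print(f"{label}: mean = {mean(ages):g}, median = {median(ages)}")
```

The mean comes out as 5, then 6, then 14, while the median is 5 in all three cases, which matches the numbers on the number line.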
That's why, often, when you're talking to anyone who's more versed in statistics, if you say an average, yes, it's something that people can relate to. If it really is generally representative, there's no harm done. But if there's any kind of outlier, if you have somebody that's way outside of the group, like, for example, if you had income for a group and everyone makes around the same amount of money, but then you have some billionaire added to that group, then you've got one person who's completely skewing the average, but the median would still be representative. So, I just wanted to make sure that those two concepts are clear. It really does make a difference when you're looking at data and when you're classifying it for mapping.

If we look at how the mean and the median relate to our mapping data here, this is the classification dialog box in ArcMap, and here's our distribution. Here's the median for this dataset, and here's the mean, and so the average is a little bit higher than the median. That makes sense, because you can see that there are a couple of outliers here that are dragging the mean higher than the median.
In other words, these outliers are skewing the data a little bit. With the data values I'm using here, which are income values for census tracts, I'm showing both the median income and the average income to show you the difference. These are actually being calculated for each of these areas, and you'll notice that towards the low end of the data range, the values are not that far apart. The average is still a little bit higher than the median, but by maybe $7,000, something like that. But if you look at the high end of the range, you actually have a difference between the average and the median of about $160,000. So, what is more representative of that group? You've probably got some very wealthy people living in that neighborhood, but not necessarily that many of them, because the median is much lower. So, that tells me that the median is probably going to give us a more representative number in terms of describing the income levels of the people that live in that particular census tract.
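The income point can be sketched in Python with the standard `statistics` module. The incomes below are made-up numbers chosen only to show the effect; they are not the census tract values from the video, and `describe` is a hypothetical helper introduced for this illustration.

```python
from statistics import mean, median


def describe(incomes):
    """Return (mean, median, gap) for a list of incomes, where gap = mean - median."""
    avg = mean(incomes)
    mid = median(incomes)
    return avg, mid, avg - mid


# Hypothetical household incomes in dollars (illustrative only).
similar_incomes = [48_000, 50_000, 52_000, 55_000, 60_000]
# The same group with one billionaire added.
with_billionaire = similar_incomes + [1_000_000_000]

for label, incomes in [("similar incomes", similar_incomes),
                       ("with a billionaire", with_billionaire)]:
    avg, mid, gap = describe(incomes)
    print(f"{label}: mean = {avg:,.0f}, median = {mid:,.0f}, gap = {gap:,.0f}")
```

With these made-up numbers, adding the billionaire moves the median only from 52,000 to 53,500, while the mean jumps to well over 100 million. A large gap between the mean and the median is the same signal discussed for the high-income census tract: a few very large values are pulling the average up.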
