01 - Use the normal distribution

- [Instructor] The data distribution that you are most likely to use during your analysis is the normal distribution. The normal curve, or bell curve, has the shape shown in this chart. This chart indicates probabilities for a curve with an average, or mean, of 100 and a standard deviation of 20. The mean is usually indicated by the Greek letter mu and the standard deviation by the Greek letter sigma.

The normal curve has some very useful properties. The first is that approximately 68% of all values will occur within plus or minus one standard deviation. So with our mean of 100, that would mean that about 68% of the values would fall within 20 above or below the average, so 80 to 120. About 95% of values will be within two standard deviations, so 60 to 140, and 99.7% within three standard deviations plus or minus, so that's 40 to 160.

To see how to work with these values in Excel, we'll switch over to our practice workbook.
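The 68%, 95%, and 99.7% figures above can be checked outside Excel. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the workbook; the names `curve` and `share_within` are illustrative and do not come from the course files.

```python
from statistics import NormalDist

# Mean 100 and standard deviation 20, as in the chart described above.
curve = NormalDist(mu=100, sigma=20)


def share_within(k):
    """Return (low, high, share) for the band mean +/- k standard deviations."""
    low = curve.mean - k * curve.stdev
    high = curve.mean + k * curve.stdev
    # The share inside the band is the difference of two cumulative values.
    return low, high, curve.cdf(high) - curve.cdf(low)


for k in (1, 2, 3):
    low, high, share = share_within(k)
    print(f"within {k} sd: {low:.0f} to {high:.0f} holds {share:.1%}")
```

The bands come out as 80 to 120, 60 to 140, and 40 to 160, with shares close to the rounded percentages quoted in the lesson.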
I've switched over to Excel, and my sample file is 04_01_Normal. You can find it in the chapter four folder of the exercise files collection. I used the values in columns A and B to create the graph of the curve that you see at the bottom right.

But let's ask some numerical questions of our data. For example, we can calculate the probability of getting exactly 92. So we have an average of 100 and a standard deviation of 20. 92 is close to the middle, so let's calculate the probability of getting exactly that value if we're generating random numbers.

So I'll click in cell E1 and then type an equal sign. The function we use is NORM.DIST, and as you might guess, that stands for normal distribution. The value we're working with, our X, is 92, so I'll type that in, then a comma. The mean is in B1, comma, standard deviation in B2, then a comma. And we are looking for the probability density function, which is also called a point probability. That means that for the last argument, I need to select FALSE, so I highlight that.
Then I press Tab to accept it, type a right parenthesis, and press Enter. And we see the probability of getting exactly 92 is 1.84%. That might seem pretty low, but remember, within three standard deviations, we go from 40 to 160. So the fact that 92 is as probable as it is in a random selection is an indication of how close to the average it is.

Now let's calculate the probability of getting 92 or more. I will do it incorrectly the first time and then show you how to fix what is a very common mistake. So in E2, I'll type equal, NORM.DIST. As before, our X is 92, the mean is in B1, the standard deviation is in B2, and then a comma. But now we do want to look for the cumulative distribution function, and that's because we're looking for 92 or more. So we want a spread of values instead of a single point probability. So I highlight TRUE and type a right parenthesis, and again, this is going to be an incorrect result. I get a 34.46% probability of getting 92 or more.

And here's why that's wrong. 92 is to the left of the mean.
And if you look at the normal curve, half the values are greater than the mean and the other half are less than the mean. So the fact that our calculation shows only 34.46% of values are greater than 92, which is less than the mean, must be incorrect. The way to fix this error is to subtract that calculation from one. So I will double-click in cell E2, and then I will add one minus our previous calculation. Now, when I press Tab, I get 65.54%, and that makes a lot more sense. Because 92 is approximately here, I've highlighted 90, and I'll just leave the mouse pointer there to show you the approximate point. You can see that about 65.54% of the values are to the right, so our calculation makes intuitive sense.

We can also ask about percentages of values within a distribution. So let's say, what is the value inside this curve, or as part of this data distribution, that 33% of values are below? So I will click in cell H1 and type an equal sign.
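The NORM.DIST results for 92 worked out above can be cross-checked outside Excel. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the workbook. The variable names are illustrative, and the Excel formulas in the comments are reconstructed from the narration rather than copied from the exercise file. One caveat: for a continuous distribution, the FALSE form returns the height of the curve at 92 (a density of about 0.0184, which Excel can display as 1.84%) rather than the probability of drawing exactly 92.

```python
from statistics import NormalDist

# B1 (mean) and B2 (standard deviation) in the workbook described above.
curve = NormalDist(mu=100, sigma=20)

# =NORM.DIST(92, B1, B2, FALSE): height of the curve at x = 92.
density_at_92 = curve.pdf(92)

# =NORM.DIST(92, B1, B2, TRUE): share of values at or below 92.
# This is the "incorrect" first attempt when the question is "92 or more".
share_at_or_below_92 = curve.cdf(92)

# =1 - NORM.DIST(92, B1, B2, TRUE): share of values above 92.
share_above_92 = 1 - share_at_or_below_92

print(f"density at 92:      {density_at_92:.4f}")
print(f"share at or below:  {share_at_or_below_92:.2%}")
print(f"share above:        {share_above_92:.2%}")
```

Because the cumulative form always measures the area to the left of X, any "X or more" question needs the complement, which is why the one-minus step applies whether X is below or above the mean.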
We can't use NORM.DIST for this calculation, but we can use a different function, NORM.INV. This is the inverse of the normal distribution: where we got probabilities with NORM.DIST, the inverse of that gives us values based on a probability. So our probability is 33%, then a comma. Our mean is still in B1, and the standard deviation is still in B2. We don't need any other arguments, so I'll type a right parenthesis and press Enter. And we get 91.2.

And this again makes sense. About 33% of our values are below 91, which again is here on the curve, approximately, and that will show that about 33% of the values are to the left, so our value checks out.

If I want to find the value for which 90% of values in this curve are above, then I can use NORM.INV. And if you're suspecting that we need to do one minus something, as we did with the probability of 92 or more, you are correct, but we put it in a different place. So in H2 I'll type an equal sign, NORM.INV.
The result we would get by typing in 90% would be a value that 90% of values are below. So we need to subtract the percentage from one as part of the probability calculation. So for the first argument, I'll type 1 minus 90%, then a comma, B1 for the mean, B2 for the standard deviation, right parenthesis, and Enter, and we get 74.37, approximately.

And again, that makes sense. If I go down to 74, that's 72, oh, there's 74. We can see where it lies on the curve, and it makes sense that about 90% of values, including the fat part of the curve in the middle, would be above the return value of about 74.37.

So as you can see, you can do a lot with the normal curve, especially with the functions NORM.DIST and NORM.INV.
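The two NORM.INV lookups described above can be reproduced the same way. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of NORM.INV here. The variable names are illustrative, and the formulas in the comments are reconstructed from the narration.

```python
from statistics import NormalDist

# B1 (mean) and B2 (standard deviation) in the workbook described above.
curve = NormalDist(mu=100, sigma=20)

# =NORM.INV(33%, B1, B2): the value that 33% of values fall below.
value_33_below = curve.inv_cdf(0.33)

# =NORM.INV(1-90%, B1, B2): the value that 90% of values fall above.
# The inverse always works from the area to the left, so the "above"
# question is asked as "which value has 10% of values below it?".
value_90_above = curve.inv_cdf(1 - 0.90)

print(f"33% of values are below {value_33_below:.2f}")
print(f"90% of values are above {value_90_above:.2f}")
```

Feeding each result back through `curve.cdf` returns the original left-tail probability (0.33 and 0.10), which is a quick way to confirm that the one-minus step was placed in the right argument.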
