01 - Use the normal distribution

[Instructor] The data distribution that you are most likely to use during your analysis is the normal distribution. The normal curve, or bell curve, has the shape shown in this chart. This chart indicates probabilities for a curve with an average, or mean, of 100 and a standard deviation of 20. The mean is usually indicated by the Greek letter mu and the standard deviation by the Greek letter sigma.

The normal curve has some very useful properties. The first is that approximately 68% of all values will occur within plus or minus one standard deviation. So with our mean of 100, that would mean that about 68% of the values would fall within 20 above or below the average, so 80 to 120. 95% of values will be within two standard deviations, so 60 to 140, and 99.7% within three standard deviations, plus or minus, so that's 40 to 160.

To see how to work with these values in Excel, we'll switch over to our practice workbook.

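The percentages quoted above can be checked numerically. The following is a minimal Python sketch, not part of the course files. It assumes Python 3.8 or later for the standard library's `statistics.NormalDist`, and the names `curve`, `share_within`, and `coverage` are illustrative only.

```python
from statistics import NormalDist

# Curve from the chart: mean (mu) of 100, standard deviation (sigma) of 20.
curve = NormalDist(mu=100, sigma=20)


def share_within(k):
    """Return (low, high, share) for values within k standard deviations of the mean."""
    low = curve.mean - k * curve.stdev
    high = curve.mean + k * curve.stdev
    return low, high, curve.cdf(high) - curve.cdf(low)


# One entry per band: 1, 2, and 3 standard deviations around the mean.
coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, (low, high, share) in coverage.items():
    print(f"within {k} sd ({low:.0f} to {high:.0f}): {share:.2%}")
```

The printed shares should land close to the 68%, 95%, and 99.7% figures, for the ranges 80 to 120, 60 to 140, and 40 to 160.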
I've switched over to Excel, and my sample file is 04_01_Normal. You can find it in the Chapter 4 folder of the exercise files collection. I used the values in columns A and B to create the graph of the curve that you see at the bottom right.

But let's ask some numerical questions of our data. For example, we can calculate the probability of getting exactly 92. So we have an average of 100 and a standard deviation of 20. 92 is close to the middle, so let's calculate the probability of getting exactly that value if we're generating random numbers. So I'll click in cell E1 and then type an equal sign. The function we use is NORM.DIST and, as you might guess, that stands for normal distribution. The value we're working with, our X, is 92, so I'll type that in, then a comma. The mean is in B1, comma, standard deviation in B2, then a comma. And we are looking for the probability density function, which is also called a point probability. That means that for the last argument, I need to select FALSE, so I highlight that.
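As a cross-check on the formula being built here, the same quantity can be computed outside Excel. This is a sketch under assumptions: Python's `statistics.NormalDist` stands in for the workbook, and the variables `mean` and `std_dev` are hypothetical stand-ins for cells B1 and B2. Strictly speaking, for a continuous distribution the FALSE option gives the height of the density curve at X rather than the probability of hitting one exact value, so the code also computes the probability of landing within half a unit of 92 for comparison.

```python
from statistics import NormalDist

mean = 100     # hypothetical stand-in for cell B1
std_dev = 20   # hypothetical stand-in for cell B2
curve = NormalDist(mu=mean, sigma=std_dev)

# Counterpart of =NORM.DIST(92, B1, B2, FALSE): height of the density curve at 92.
density_at_92 = curve.pdf(92)

# Probability of a value falling in the one-unit-wide band from 91.5 to 92.5.
band_probability = curve.cdf(92.5) - curve.cdf(91.5)

print(f"density at 92: {density_at_92:.4f}")
print(f"P(91.5 <= X <= 92.5): {band_probability:.4f}")
```

The two numbers nearly coincide because the band is one unit wide, which is one way to read the density as an approximate per-unit probability.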
Press Tab to accept it, type a right parenthesis, and press Enter. And we see the probability of getting exactly 92 is 1.84%. That might seem pretty low, but remember, within three standard deviations, we go from 40 to 160. So the fact that 92 is as probable as it is in a random selection is an indication of how close to the average it is.

Now let's calculate the probability of getting 92 or more. I will do it incorrectly the first time and then show you how to fix what is a very common mistake. So in E2, I'll type equal, NORM.DIST. As before, our X is 92, the mean is in B1, standard deviation in B2, and then a comma, but now we do want to look for the cumulative distribution function. That's because we're looking for 92 or more, so we want a spread of values instead of a single point probability. So I highlight TRUE, type a right parenthesis, and again, this is going to be an incorrect result. I get 34.46% for getting 92 or more. And here's why that's wrong. 92 is to the left of the mean.
And if you look at the normal curve, half the values are greater than the mean and the other half are less than the mean. So the fact that our calculation shows only 34.46% of values are greater than 92, which is less than the mean, must be incorrect. The way to fix this error is to subtract that calculation from one. So I will double-click in cell E2, and then I will add one minus our previous calculation. Now, when I press Tab, I get 65.54%, and that makes a lot more sense because 92 is approximately here. I've highlighted 90, and I'll just leave the mouse pointer there to show you the approximate point. You can see that about 65.54% of the values are to the right, so our calculation makes intuitive sense.

We can also ask about percentages of values within a distribution. So let's say, what is the value inside this curve, or as part of this data distribution, that 33% of values are below? So I will click in cell H1 and type an equal sign.
We can't use NORM.DIST for this calculation, but we can use a different function, NORM.INV, and this is the inverse of the normal distribution. Where we got probabilities with NORM.DIST, the inverse of that gives us values based on a probability. So our probability is 33%, then a comma, our mean is still in B1, standard deviation is still in B2. We don't need any other arguments, so I'll type a right parenthesis and Enter. And we get 91.2. And this again makes sense. About 33% of our values are below 91, which again is here on the curve, approximately, and that will show that about 33% of the values are to the left, so our value checks out.

If I want to find the value for which 90% of values in this curve are above, then I can do NORM.INV. And if you're suspecting that we need to do one minus something, as we did with the probability of 92 or more, you are correct, but we put it in a different place. So in H2 I'll type an equal sign, NORM.INV.
The result we would get by typing in 90% would be to return a value that 90% of values are below. So we need to subtract the percentage from one as part of the probability calculation. So for the first argument, I'll type 1 minus 90%, then a comma, B1 for the mean, B2 for the standard deviation, right parenthesis, and Enter, and we get 74.37, approximately. And again, that makes sense. If I go down to 74, that's 72, oh, there's 74. We can see where it lies on the curve, and it makes sense that about 90% of values, including the fat part of the curve in the middle, would be above the return value of about 74.37.

So as you can see, you can do a lot with the normal curve, especially with the functions NORM.DIST and NORM.INV.
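The cumulative and inverse calculations from this walkthrough can be sketched the same way. This is an illustrative Python version, not the course's workbook: `statistics.NormalDist` is assumed as a stand-in for NORM.DIST with TRUE (its `cdf` method) and for NORM.INV (its `inv_cdf` method), and the variable names are hypothetical. The `cdf` method, like NORM.DIST with TRUE, returns the share of values at or below X, which is why the right-hand tail needs the one-minus step.

```python
from statistics import NormalDist

# Mean of 100 (cell B1) and standard deviation of 20 (cell B2).
curve = NormalDist(mu=100, sigma=20)

# Counterpart of =NORM.DIST(92, B1, B2, TRUE): share of values at or below 92.
p_92_or_less = curve.cdf(92)

# Counterpart of =1-NORM.DIST(92, B1, B2, TRUE): share of values at 92 or more.
p_92_or_more = 1 - p_92_or_less

# Counterpart of =NORM.INV(33%, B1, B2): value that 33% of values fall below.
value_33_below = curve.inv_cdf(0.33)

# Counterpart of =NORM.INV(1-90%, B1, B2): value that 90% of values fall above.
value_90_above = curve.inv_cdf(1 - 0.90)

print(f"P(X <= 92) = {p_92_or_less:.2%}")
print(f"P(X >= 92) = {p_92_or_more:.2%}")
print(f"33% of values are below {value_33_below:.2f}")
print(f"90% of values are above {value_90_above:.2f}")
```

Note where the subtraction sits in each case: outside the function when converting a left-tail probability to a right-tail one, and inside the probability argument when asking the inverse function for a value with a given share above it.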