04 - Introduce the central limit theorem

- [Instructor] One of the dangers of business data analysis is making decisions too soon. The reason is that short-term results can be deceiving. But you should start to see patterns as you gather more data.

One reliable principle of data analysis is the central limit theorem, which says that as the number of measurements increases, it becomes more likely that your data will be distributed as you expect.

As an example, let's say that your data is normally distributed. A normal distribution has an average and also a standard deviation. In this case, we're looking at a so-called normal curve with a mu, or average value, of 100 and a standard deviation of 20. You can see a curve of values on this graph, and on the left, on the vertical axis, there is the probability of a specific value occurring. So you can see that the chance of getting a value of 100, to the nearest whole number, is about 2%. That's pretty low, but there are a lot of values clustered around it. And that is where the power of the normal curve and normal distribution comes into play.

If your data is normally distributed, and a lot of it is, then you should expect to see about 68% of the values in your data set within one standard deviation, plus or minus, of the mean. In this case, that would mean that 68% of your values would be between 80 and 120. So again, that's the average or mean of 100, minus 20 for 80 and plus 20 for 120.

You can also expect to see about 95% of your values within two standard deviations plus or minus, between 60 and 140, and approximately 99.7% of values within three standard deviations plus or minus. And of course the probabilities of seeing other values get smaller as you go further away from the average. It doesn't mean they never occur, but it does mean that they are very rare.

To show you how this data works in practice, I will switch to an Excel workbook, which you can find in the exercise files collection, to use a macro to generate random values and show you what it looks like in practice. The workbook I'm using is 01_04 Central Limit Theorem. And as I said, it is available in the exercise files collection.
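The percentages quoted above for the curve with a mean of 100 and a standard deviation of 20 can be checked outside Excel. The following is a minimal sketch in Python, which the lesson itself does not use; it relies on `statistics.NormalDist` from the standard library, and the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# The curve from the lesson: mean (mu) 100, standard deviation 20.
curve = NormalDist(mu=100, sigma=20)


def share_within(k: float) -> float:
    """Fraction of values expected within k standard deviations of the mean."""
    low = curve.mean - k * curve.stdev
    high = curve.mean + k * curve.stdev
    return curve.cdf(high) - curve.cdf(low)


# One, two, and three standard deviations either side of the mean.
for k in (1, 2, 3):
    low, high = 100 - 20 * k, 100 + 20 * k
    print(f"within {k} sd ({low} to {high}): {share_within(k):.1%}")

# Height of the curve at the mean, and the chance of a value that
# rounds to 100 (anything from 99.5 to 100.5).
print(f"curve height at 100: {curve.pdf(100):.4f}")
print(f"chance of 99.5 to 100.5: {curve.cdf(100.5) - curve.cdf(99.5):.2%}")
```

The three shares come out at roughly 68%, 95%, and 99.7%, matching the ranges 80 to 120, 60 to 140, and 40 to 160. The last two lines show where the figure of about 2% for a value of 100 comes from.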

This workbook uses macros, so I'm going to go ahead and enable that content. If you're not able to run macros on your own system, then you probably won't be able to interact with this workbook. But if you can, go ahead and click Enable Content and we're ready to go.

So what just happened was that Excel recalculated my workbook and I have a new set of random values. I have 30 values, and again they fall within plus or minus three standard deviations from the mean. Currently I have 30 values selected. If I click the 30 button, you'll see that I have four values that are one standard deviation above the mean, and I have one that is three standard deviations below, and so on. But also note that the values are distributed in what appears to be a flat pattern. You don't see the curve that we saw in the graphic earlier.

Now let's go up to 100 values. So I click 100. And you can see we're starting to get something that looks a little bit more like a curve. We're seeing more clustering toward the middle. So I'll click 100 again. 100 again. And we're seeing patterns, but it's not what we looked at before.

So now click 1,000. And here the pattern really does start to develop, because we have created or randomized more values and we're seeing a bit more of a hump in the middle. And if I click there again, you can see that the pattern is much more like what we expected.

Now click 10,000. And here the curve really starts to look like what we saw in the graphic earlier. So you have 10,000. And it's still a little bit lumpy, in the sense that some bars are larger than others toward the middle. But it looks very much like the curve we saw.

And finally, if I click 100,000, then the curve looks almost perfect. Because we're taking so many values, we've had the opportunity to smooth out random chance, and our values look very much like the normal curve that we saw before.

So keep measuring, keep analyzing, and keep an open mind as to what your data tells you.
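For readers who cannot run the workbook's macro, the same experiment can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not the macro from the exercise files: the bin layout, the fixed seed, and the function names `histogram` and `show` are all hypothetical choices made for illustration.

```python
import random

MU, SIGMA = 100, 20   # the curve used in the lesson
BINS = 12             # assumed layout: half-sd-wide bins from mu - 3 sd to mu + 3 sd


def histogram(n: int, rng: random.Random) -> list[int]:
    """Draw n normally distributed values and count how many land in each bin."""
    low = MU - 3 * SIGMA
    width = 6 * SIGMA / BINS
    counts = [0] * BINS
    for _ in range(n):
        value = rng.gauss(MU, SIGMA)
        index = int((value - low) // width)
        if 0 <= index < BINS:  # values beyond 3 sd are rare; leave them out
            counts[index] += 1
    return counts


def show(counts: list[int], width: int = 40) -> None:
    """Print one text bar per bin, scaled so the tallest bar is `width` long."""
    tallest = max(counts) or 1
    low = MU - 3 * SIGMA
    step = 6 * SIGMA / BINS
    for i, count in enumerate(counts):
        label = f"{low + i * step:5.0f}"  # lower edge of the bin
        print(label, "#" * round(count * width / tallest))


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (30, 100, 1_000, 10_000, 100_000):
    print(f"\n{n} values")
    show(histogram(n, rng))
```

With 30 values the bars tend to look ragged, and as the count grows toward 100,000 they should settle into the bell shape described in the lesson. Changing the seed gives a different set of random values, much like clicking a button in the workbook again.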
