04 - Visualize what correlation means

[Instructor] When you analyze business data, you will often want to know whether two sets of values are related. One way to do that is to calculate correlation.

Previously, I showed you how to calculate covariance, and as a reminder, here is that formula. The idea is that you multiply the differences from the mean for each value pair (we have two columns of data, with an X and a Y value that are matched up) and you find the sum of those products. You then divide that total by the number of data pairs.

Correlation is related, and here is the formula. You'll see that the top term is the same, but it is divided by the term shown. That term takes a while to explain, so I will just ask you to accept it as given and use it in your calculations.

So the next question is: how do you interpret your correlation values? You will get a value between 1 and -1. Data that is completely uncorrelated returns a correlation value of 0. In other words, the two sets of values have nothing to do with each other.
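The on-screen formulas are not part of the transcript, so the following is only a minimal sketch of the calculation as described in words. It assumes the correlation formula is the standard Pearson coefficient, whose top term is the same sum of products used for covariance. The function names and the sample numbers are hypothetical.

```python
import math

def covariance(xs, ys):
    """Population covariance: sum of the products of the deviations
    from the mean, divided by the number of data pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / n

def correlation(xs, ys):
    """Pearson correlation (assumed to be the formula shown on screen).
    Same top term as covariance, divided by the square root of the
    product of the summed squared deviations. Undefined (division by
    zero) if either column has no variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    top = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return top / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical matched X and Y columns, for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

print(f"covariance:  {covariance(x_values, y_values):.3f}")
print(f"correlation: {correlation(x_values, y_values):.3f}")
```

Unlike covariance, the result of `correlation` does not depend on the units of either column, which is why it always falls between -1 and 1.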
Data that is positively correlated will be between 0 and 1. If you have 1, that means the data sets move in lockstep. In other words, they move the same way all the time, every time. If you have data that is negatively correlated, between -1 and 0, then the data moves in opposite directions: when one set of values goes up, the other one goes down. And of course it's possible to have values between 0 and 1 or 0 and -1, and that means the correlation isn't quite as strong.

So let's take a look at a visual example of data that is not correlated. Here, I have five starting values, 1, 2, 3, 4, and 5, which you can see on the horizontal axis. 1 produces two results, 3 and -3, and the same goes for the other values along the horizontal axis. What this means is that the starting value tells you nothing about the value that follows it. The X says nothing about the Y, so the correlation is 0.

If you have data with a perfect positive correlation, in other words a correlation of 1, then you'll see that it goes up and the values exactly match.
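The three cases described so far (no correlation, perfect positive, perfect negative) can be checked numerically. This is a small sketch, not the instructor's workbook: the `pearson` helper is a hypothetical name, and the perfect-negative column is an assumed mirror image of the positive one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient for two matched columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    top = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    bottom = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return top / bottom

# Uncorrelated: each starting value 1..5 produces both 3 and -3.
flat_x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
flat_y = [3, -3, 3, -3, 3, -3, 3, -3, 3, -3]

# Perfect positive: the values exactly match.
up_x = [1, 2, 3, 4, 5]
up_y = [1, 2, 3, 4, 5]

# Perfect negative (assumed data): Y falls as X rises.
down_y = [5, 4, 3, 2, 1]

print("uncorrelated:    ", pearson(flat_x, flat_y))
print("perfect positive:", pearson(up_x, up_y))
print("perfect negative:", pearson(up_x, down_y))
```

In the uncorrelated case the positive and negative products cancel for every starting value, so the top term of the formula is zero.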
1 gives you 1, 2 gives you 2, and so on. It doesn't have to be this exact pattern, but you can see a visual example of what a correlation of 1 looks like. Data with a perfect negative correlation goes in the opposite direction.

The next question you have to ask, of course, is: is my correlation significant? That depends on several factors: the number of measurements, and whether a value can be positive or negative. By that, I mean if you're looking for a positive or negative value only, in other words one side, then that's called a one-tailed test. If your difference can be either positive or negative, then you have a two-tailed test. You then look up the correlation value in a table, which you can find in statistics textbooks or online, and see what that looks like.

Here is an example of a two-tailed correlation lookup table. The leftmost column gives you the number of samples that you have, and to the right you have different confidence levels.
You subtract that confidence level from 1 to give you the value that you want. So, for example, the column next to N is 0.1; you subtract that from 1 for the 90% confidence interval, or confidence level. To the right of that you have 0.05, so that's 95%, then 98%, and the other two values you see there.

So if you have 10 values, you calculate the correlation, and you want to look at the 90% level, then you would need a correlation value of at least 0.55, which is the value at the intersection of 10 and 0.1, to have 90% certainty, or a 90% confidence level, that the correlation is valid. The various other combinations are available here.

So again, there are a lot of moving parts to correlation, but it is a very useful way to look at your data.
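The lookup table itself is not reproduced in the transcript, but entries of this kind can be derived from a t table. The sketch below assumes the usual relationship between a critical t value and a critical correlation, r = t / sqrt(t² + df) with df = n - 2. The t values are the standard two-tailed entries for 8 degrees of freedom, and the function names are hypothetical.

```python
import math

# Two-tailed critical t values for df = 8 (that is, n = 10 data pairs),
# keyed by significance level, taken from a standard t table.
T_CRITICAL_DF8 = {0.10: 1.860, 0.05: 2.306, 0.02: 2.896, 0.01: 3.355}

def critical_r(t_critical, n):
    """Smallest absolute correlation that counts as significant for
    n data pairs, given the matching critical t value."""
    df = n - 2  # degrees of freedom for a correlation test
    return t_critical / math.sqrt(t_critical ** 2 + df)

def is_significant(r, n, t_critical):
    """Two-tailed check: compare |r| with the critical value."""
    return abs(r) >= critical_r(t_critical, n)

n_pairs = 10
for level, t_value in T_CRITICAL_DF8.items():
    confidence = 1 - level  # e.g. 0.10 corresponds to 90% confidence
    print(f"{confidence:.0%} confidence: |r| must be at least "
          f"{critical_r(t_value, n_pairs):.2f}")
```

For 10 pairs at the 0.1 column, this gives a threshold that rounds to the 0.55 quoted above. A hypothetical correlation of 0.6 from 10 pairs would pass `is_significant(0.6, 10, 1.860)`, while 0.5 would not.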
