02 - Calculate covariance between two columns of data

[Instructor] When you analyze data, it's often important to see how two sets of values vary in relation to one another. For example, if people drive to get to your store, you might be interested in knowing if the distance they drive is related to how much they spend. In this movie, I will show you how to look at the relationship between two sets of values by calculating covariance. My sample file is 05_02_SingleCovariance, and you can find it in the Chapter05 folder of the Exercise Files collection.

I'll show you two different ways to calculate covariance. The first is the long way, so I will implement the formula that you see here in this graphic as part of the worksheet. Then, once you understand what's going on, I will show you how to use a built-in function to perform the same calculation.

Rather than give an overview, I will go ahead and start implementing the formula that you see on the right in cell C2. So this will be for our first pair of values. In C2, I will type an equal sign.
And the first thing I want to do is subtract the average of the values in column one from the specific value in cell A2. So I'll type a left parenthesis, A2, minus, and then we want to find the average or mean of the values in column one. So that's AVERAGE, and the range is A2 to A11. I don't want that reference to change; I always want to refer to that set of values for the average. So I'll press F4, and I'll go back to A2 and press F4 to make that an absolute reference. So I have my range of values, and I can type two right parentheses to close out that part of the calculation.

Then I'll type an asterisk for multiplication, and we'll do the same thing for the values in column two. So I'll go a little faster. We're going to multiply that by, left parenthesis, B2 minus the average of the values in B2, colon, B11. And again, that will be an absolute reference, so F4 again on the PC. On Mac, it's Command + T. Then two right parentheses, and I will press Enter to create our calculation. And I have a covariance of 4.68.
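As narrated, the formula in C2 would read roughly =(A2-AVERAGE($A$2:$A$11))*(B2-AVERAGE($B$2:$B$11)). The sketch below mirrors that step in Python, as an illustration only. The column values are hypothetical stand-ins, not the numbers in the exercise file, and the function name deviation_products is made up for this example.

```python
# Hypothetical values standing in for columns A and B.
# These are NOT the numbers from the 05_02_SingleCovariance exercise file.
distance = [2.0, 4.0, 6.0, 8.0]
spend = [1.0, 4.0, 2.0, 5.0]


def deviation_products(xs, ys):
    """Return (x - mean(xs)) * (y - mean(ys)) for each row, like column C."""
    if len(xs) != len(ys):
        raise ValueError("both columns need the same number of rows")
    # The means play the role of the absolute references ($A$2:$A$11, $B$2:$B$11):
    # every row is compared against the same fixed averages.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


products = deviation_products(distance, spend)
print(products)
```

Each entry is positive when both values sit on the same side of their averages and negative when they sit on opposite sides, which is what the later sum picks up on.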
That result is a small value because we're working with small values in columns A and B. Now I want to copy this formula down so it covers every row of values, going down to row 11 in the worksheet. So I will click cell C2 and double-click the fill handle at the bottom-right corner. I know my mouse pointer is in the right place when it changes to a black cross. So I double-clicked, and there I have covariances for each pair of values.

Now, I want to add all of that up, so I will click in cell C12 and type an equal sign. Then I'll do SUM, and that will be C2 to C11. I want to divide that by the number of pairs, so I'll type a forward slash for division, 10 'cause that's the number I have, and Enter. And I have an overall covariance of 0.48. So that is the mechanics of how the calculation is done.

Now, I'll show you how to do it with a built-in function. So I'll click in cell E3 and type an equal sign. The function that we'll use is COVARIANCE.P. COVARIANCE.P returns the covariance of a population.
A population calculation assumes that you have every possible value, and you're working with them as part of the calculation. Then you need to enter the two arrays of values. So I have A2 through A11, then a comma, and B2 through B11. Then a right parenthesis and Enter. And you see that once again, we get the value of negative 0.48. So this value is very close to zero. And even though we're working with small numbers, that indicates that there probably isn't a very strong relationship between these two sets of values.

I'll go back up to E3 and double-click. You might have noticed that when I was entering the formula, we had two options for calculating covariance: COVARIANCE.P, which is what I demonstrated here, and COVARIANCE.S. S assumes that you have a sample of your data. So I will highlight that and press Tab, and the function has been changed in the formula. I'll press Enter, and you can see that the value is slightly larger, or at least farther away from zero, than the previous calculation.
And that's because we are dividing by n minus one instead of just n, so it's a slightly more conservative way of doing the same calculation.
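To make the difference between the two functions concrete, here is a minimal Python sketch of both calculations. It assumes the usual definitions: the population version divides the summed products of deviations by n (as the long-way worksheet did with SUM divided by 10), and the sample version divides by n minus one. The data and the function name covariance are hypothetical, chosen only for illustration, and this is not a description of how Excel implements COVARIANCE.P or COVARIANCE.S internally.

```python
# Hypothetical values standing in for columns A and B (not the exercise file data).
distance = [2.0, 4.0, 6.0, 8.0]
spend = [1.0, 4.0, 2.0, 5.0]


def covariance(xs, ys, sample=False):
    """Covariance of two equal-length columns.

    sample=False divides by n (population, in the spirit of COVARIANCE.P);
    sample=True divides by n - 1 (sample, in the spirit of COVARIANCE.S).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both columns need the same number of rows")
    if n < (2 if sample else 1):
        raise ValueError("not enough rows for this calculation")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Same quantity as the SUM over column C in the long-way worksheet.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


pop_cov = covariance(distance, spend)
sample_cov = covariance(distance, spend, sample=True)
print(pop_cov, sample_cov)
```

Because both versions share the same numerator and differ only in the divisor, the sample result is always at least as far from zero as the population result, which matches what the instructor observes after switching functions.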
