08 - Solution: Calculate correlations between columns of data

(upbeat music)

[Instructor] I hope you've had a chance to work with the problem presented in this workbook. I'd like to now show you how I would solve it. And again, my sample file is Chapter_5_Challenge, and you can find it in the Chapter 5 folder of the exercise files collection.

The first task is to go to cell F2 and calculate the covariance between the values in the Customers and Sales columns. So in F2 I'll type an equal sign and start typing COVAR, and because you can never be sure that you have all of your data, I'm going to use COVARIANCE.S, which assumes that we have a sample. So I'll highlight that, press Tab, and now I need to identify the two arrays. The first one is Customers, and those values are in A2 through A12, then a comma, and then Sales, which are in column B. So B2 to B12 I've selected there, right parenthesis, and Enter, and we get a covariance of 4023.16. How you interpret that covariance value depends on your original data.
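For readers who want to see what a sample covariance such as COVARIANCE.S computes, here is a minimal Python sketch. The function name `sample_covariance` and the two short lists are hypothetical stand-ins, not the workbook's data, so the result will not match the 4023.16 from the video.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of paired deviations divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each pair of deviations from the mean, then sum them.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Dividing by n - 1 (not n) treats the data as a sample.
    return total / (n - 1)


# Hypothetical illustration data (NOT the values from the exercise file).
customers = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
print(sample_covariance(customers, sales))
```

Dividing by n - 1 rather than n is what distinguishes a sample covariance from a population covariance.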
In this case, it does appear just from visual inspection that the number of customers and sales tend to co-vary, and 4,023 is about twice the maximum value in the Sales column, so that's a good indicator that there probably is a relationship.

Now let's do a correlation for the same two data sets and see what that does. So I'll go to cell F4, type an equal sign, and we'll have CORREL, C-O-R-R-E-L, and again, array1 and array2. Array1 is A2 through A12, then a comma, and then array2, B2 to B12, right parenthesis, and Enter. And I get a correlation of customers and sales of 0.84, and that is very high. Remember that correlation varies between negative one and one, with a value of one meaning that the values are perfectly correlated. And in a case where you have a value of 0.84 with only 11 data points, that's a very good indication that there is a correlation between those two sets of data.

Finally, let's go ahead and create a grid that calculates the correlation for each pair of columns.
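The correlation step in cell F4 can be sketched in Python as well. This is an illustrative implementation of the Pearson correlation coefficient, which is the covariance rescaled so the result always falls between -1 and 1. The function name `pearson_correlation` and the sample lists are assumptions for illustration, not the workbook's values.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length (at least 2 rows)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of paired deviation products.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data (NOT the values from the exercise file).
customers = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
print(pearson_correlation(customers, sales))
```

Because the result is unitless, it is easier to interpret than the raw covariance, whose size depends on the scale of the original data.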
So I'll start by clicking in cell G8, and I want to create a formula that will calculate the correlation of the Customers column, column A, with itself, and I'll be able to copy that formula throughout the rest of the grid and find correlations between every pair of columns. So in G8 I'll type = and then CORREL, C-O-R-R-E-L, left parenthesis, and array1 is A2 through A12. However, with the way I'm creating my worksheet, I don't want that reference to change when I copy the formula. So I'll press F4, and that makes the entire reference absolute and unchanging, type a comma, and then I'll select A2 through A12 again, right parenthesis, and Enter, and I get a correlation of one, which as I mentioned earlier is correct. I'm comparing customers to itself.

Now I will double-click cell G8 and copy the formula, so remember I have copied the cell's formula text. I have not actually copied the cell. So it is important that you not just click a cell and press Ctrl+C.
So I've already copied the formula text, and then I press Escape to stop editing the cell, double-click cell G9, press Ctrl+V to paste the text, and now, because I want to change the absolute reference from A2 to A12 to B2 to B12, I will edit the first range reference so I have B2 to B12, and because I'm comparing customers and sales, I will leave A2 to A12 as the relative reference unchanged and press Enter. And I get a correlation of 0.842, which we can see is correct because it matches the calculation we did earlier in cell F4.

Now I will double-click cell G10, press Ctrl+V, there's my formula again, and I will do the same thing for the first array, except this time I'm changing it to column C, Returns. So C2 to C12, comparing it to A2 to A12, press Enter, and I see a very slightly negative correlation, but that is very close to zero.
Now I can copy these three formulas over to cover columns H and I, so I will select G8 through G10 and drag the fill handle to the right until it covers the same rows in column I, and there I get the values, and it looks like everything is correct because I have customers and customers at one, sales and sales at one, and returns and returns at one. And I can see also that sales and returns do not appear to be correlated; a correlation value of 0.126, let's call it, is very small. So it appears that the only meaningful relationship, at least from a correlation standpoint, that I have is between sales and customers per day.
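The grid built in G8 through I10 is a correlation matrix: every column paired with every other column, with ones on the diagonal. A minimal Python sketch of the same idea follows. The helper names (`pearson_correlation`, `correlation_matrix`) and the three short columns are hypothetical and chosen only for illustration, so the numbers will differ from those in the video.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length (at least 2 rows)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlate every named column with every other (and with itself)."""
    names = list(columns)
    return {
        row: {col: pearson_correlation(columns[row], columns[col]) for col in names}
        for row in names
    }


# Hypothetical illustration data (NOT the values from the exercise file).
columns = {
    "customers": [10, 12, 15, 11, 18],
    "sales": [100, 130, 160, 105, 190],
    "returns": [3, 1, 4, 1, 5],
}

matrix = correlation_matrix(columns)
for row, values in matrix.items():
    # One printed line per grid row, rounded for readability.
    print(row, [round(v, 3) for v in values.values()])
```

As in the worksheet, the diagonal entries are one because each column is compared with itself, and the matrix is symmetric because the correlation of sales with customers equals that of customers with sales.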
