08 - Solution: Calculate correlations between columns of data

(upbeat music)

- [Instructor] I hope you've had a chance to work with the problem presented in this workbook. I'd like to now show you how I would solve it. And again, my sample file is Chapter_5_Challenge, and you can find it in the Chapter 5 folder of the exercise files collection.

The first task is to go to cell F2 and calculate the covariance between the values in the Customers and Sales columns. So in F2 I'll type an equal sign and start typing COVAR, and because you can never be sure that you have all of your data, I'm going to use COVARIANCE.S, which assumes that we have a sample. So I'll highlight that, press Tab, and now I need to identify the two arrays. The first one is customers, and those values are in A2 through A12, then a comma, and then sales, which are in column B. So B2 to B12 I've selected there, right parenthesis and Enter, and we get a covariance of 4023.16. How you interpret that covariance value depends on your original data.
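As a rough illustration of what COVARIANCE.S computes, the sketch below implements the sample covariance (dividing by n - 1) in Python. The function name `sample_covariance` and the five data points are hypothetical: the workbook's actual values in A2:A12 and B2:B12 are not reproduced in this transcript, so the result will not match the 4023.16 shown in the video.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("both arrays must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of each pair's deviations from its own mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical stand-ins for the Customers and Sales columns.
customers = [10, 12, 15, 11, 18]
sales = [100.0, 130.0, 160.0, 105.0, 190.0]

print(sample_covariance(customers, sales))
```

Dividing by n - 1 rather than n is what distinguishes the sample version from the population version (COVARIANCE.P in Excel).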
In this case, it does appear just from visual inspection that the number of customers and sales tend to covary, and 4,023 is about twice the maximum value in the Sales column, so that's a good indicator that there probably is a relationship.

Now let's do a correlation for the same two data sets and see what that does. So I'll go to cell F4, type an equal sign, and we'll have CORREL, C-O-R-R-E-L, and again, array1 and array2. Array1 is A2 through A12, then a comma, and then array2, B2 to B12, right parenthesis, and Enter. And I get a correlation of customers and sales of 0.84, and that is very high. Remember that correlation varies between negative one and one, with a value of one meaning that the values are perfectly correlated. And in a case where you have a value of 0.84 with only 11 data points, that's a very good indication that there is a correlation between those two sets of data.

Finally, let's go ahead and create a grid that calculates the correlation for each pair of columns.
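Before the grid is built, here is a minimal sketch of what the CORREL step in cell F4 computes: the Pearson correlation coefficient, which rescales the covariance so that the result always falls between -1 and 1. The function name `pearson_correlation` and the sample data are assumptions made for illustration, not the workbook's values.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two arrays of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # co-deviation of x and y
    sxx = sum(a * a for a in dx)              # squared deviation of x
    syy = sum(b * b for b in dy)              # squared deviation of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical stand-ins for the Customers and Sales columns.
customers = [10, 12, 15, 11, 18]
sales = [100.0, 130.0, 160.0, 105.0, 190.0]

print(round(pearson_correlation(customers, sales), 3))
```

Because the result is unitless, it is easier to compare across column pairs than the raw covariance, whose size depends on the scale of the original data.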
So I'll start by clicking in cell G8, and I want to create a formula that will calculate the correlation of the Customers column, column A, with itself, and I'll be able to copy that formula throughout the rest of the grid and find correlations between every pair of columns. So in G8 I'll type an equal sign and then CORREL, C-O-R-R-E-L, left parenthesis, and array1 is A2 through A12. However, with the way I'm creating my worksheet, I don't want that reference to change when I copy the formula. So I'll press F4, and that makes the entire reference absolute and unchanging, type a comma, and then I'll select A2 through A12 again, right parenthesis and Enter, and I get a correlation of one, which as I mentioned earlier is correct. I'm comparing customers to itself.

Now I will double-click cell G8 and copy the formula, so remember I have copied the cell's formula text. I have not actually copied the cell. So it is important that you not just click a cell and press Ctrl+C.
So I've already copied the formula text, and then I press Escape to stop editing the cell, double-click cell G9, press Ctrl+V to paste the text, and now, because I want to change the absolute reference from A2 to A12 to B2 to B12, I will edit the first range reference so I have B2 to B12, and because I'm comparing customers to sales, I will leave A2 to A12, the relative reference, unchanged, and press Enter. And I get a correlation of 0.842, which we can see is correct because it matches the calculation we did earlier in cell F4.

Now I will double-click cell G10, press Ctrl+V, there's my formula again, and I will do the same thing for the first array, except this time I'm changing it to column C, returns. So C2 to C12, comparing it to A2 to A12, press Enter, and I see a very slightly negative correlation, but that is very close to zero.
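The reason for mixing one absolute and one relative reference is that Excel shifts relative references when a formula is copied to another column, while absolute references stay fixed. The sketch below builds the formula text that each cell of a G8:I10 grid would hold under that rule. The helper names `shift_column` and `correl_formula` are hypothetical, and the code only handles single-letter columns.

```python
def shift_column(column: str, offset: int) -> str:
    """Shift a single-letter column name, e.g. 'A' shifted by 2 gives 'C'."""
    shifted = chr(ord(column) + offset)
    if not ("A" <= shifted <= "Z"):
        raise ValueError("this sketch only handles columns A through Z")
    return shifted


def correl_formula(fixed_col: str, relative_col: str,
                   first_row: int = 2, last_row: int = 12) -> str:
    """Build a CORREL formula with an absolute first range and a relative second range."""
    absolute = f"${fixed_col}${first_row}:${fixed_col}${last_row}"
    relative = f"{relative_col}{first_row}:{relative_col}{last_row}"
    return f"=CORREL({absolute},{relative})"


# Rows 8-10 fix the first array to columns A, B, C in turn (edited by hand in the video).
# Moving from column G to H to I shifts only the relative second array.
grid = {}
for row_offset, fixed_col in enumerate("ABC"):
    for col_offset, cell_col in enumerate("GHI"):
        relative_col = shift_column("A", col_offset)
        grid[f"{cell_col}{8 + row_offset}"] = correl_formula(fixed_col, relative_col)

for cell, formula in grid.items():
    print(cell, formula)
```

Under these assumptions, the diagonal cells G8, H9, and I10 each compare a column with itself, which is why they should all evaluate to one.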
Now I can copy these three formulas over to cover columns H and I, so I will select G8 through G10 and drag the fill handle to the right until it covers the same rows in column I, and there I get the values, and it looks like everything is correct because I have customers and customers at one, sales and sales at one, and returns and returns at one. And I can see also that sales and returns do not appear to be correlated; a correlation value of 0.126, let's call it, is very small. So it appears that the only meaningful relationship, at least from a correlation standpoint, that I have is between sales and customers per day.
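The finished grid is a correlation matrix: one coefficient for every pair of columns. A minimal Python sketch of the same idea follows. The column names mirror the ones mentioned in the video, but the numbers are made up for illustration and the function names are hypothetical, so the output will not reproduce the 0.842 or 0.126 values from the workbook.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two arrays of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlation_grid(columns):
    """Return {row_name: {col_name: r}} for every pair of named columns."""
    return {
        row_name: {
            col_name: pearson_correlation(row_values, col_values)
            for col_name, col_values in columns.items()
        }
        for row_name, row_values in columns.items()
    }


# Hypothetical data standing in for columns A, B, and C of the worksheet.
columns = {
    "customers": [10, 12, 15, 11, 18],
    "sales": [100.0, 130.0, 160.0, 105.0, 190.0],
    "returns": [3, 1, 4, 1, 5],
}

grid = correlation_grid(columns)
for row_name, row in grid.items():
    print(row_name, {name: round(r, 3) for name, r in row.items()})
```

The same sanity check used in the video applies here: every diagonal entry should be one, and the grid should be symmetric because the correlation of A with B equals the correlation of B with A.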
