7. Power, Log, and Box-Cox Transformations in Code

OK, so in this lecture, we are going to investigate the Box-Cox and other time series transformations in code. We'll start by downloading a famous time series dataset called Airline Passengers.

Next, we're going to import NumPy, pandas, and Matplotlib, nothing you haven't seen before, and also the boxcox function from SciPy.

Next, we're going to read in the CSV using pandas' read_csv. So, again, nothing too surprising here.

Now, one thing I always like to do whenever I load in some data is to just see what it looks like. This always makes me more confident in the code that I wrote, and it lets me know that my code makes sense. So we'll do a df.head(), and this will print out the first few rows of the data.

OK, so we can see that for the index, we have the date, which is monthly, and we have one integer column called Passengers.

The next step is to plot our dataset just to see what we're dealing with.

So there are several characteristics of this data that I want you to notice. Number one, notice that it has a trend: this time series is going upward and to the right. Number two, notice that it has some seasonality; that is, there is a repeating pattern in time. Number three, notice that the amplitude of this seasonal pattern increases over time.
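The loading and inspection steps described above can be sketched roughly as follows. This is a minimal sketch, not the lecture's actual notebook: the CSV text is an inline stand-in with made-up sample values, and the column names "Month" and "Passengers" are assumptions. With a real file, the StringIO object would be replaced by the file path.

```python
import io

import pandas as pd

# Inline stand-in for the downloaded CSV file. The rows are made-up sample
# values, and the column names "Month" and "Passengers" are assumptions.
csv_text = """Month,Passengers
1949-01,100
1949-02,110
1949-03,125
1949-04,120
1949-05,115
1949-06,130
"""

# Use the date column as the index and let pandas parse it into timestamps.
df = pd.read_csv(io.StringIO(csv_text), index_col="Month", parse_dates=True)

# Print the first few rows to confirm the data loaded as expected.
print(df.head())
```

Calling df.plot() on a frame like this, with Matplotlib installed, would draw the series against its date index, which is the plot the lecture inspects next.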
So at the beginning, the amplitude is pretty small, but at the end it gets larger and larger. These are all characteristics of time series that we will explicitly model in this course, and you'll learn how each algorithm handles these characteristics.

One thing we would like to see, which will make more sense later, is for things to not change over time. So, for example, this amplitude increasing over time: it would be nice if that went away.

OK, so the next step will be to try the square root transform. Here we call the sqrt function on the Passengers column. The next step is to plot our new column.

OK, so we can see that the time series has been squashed down slightly, but the amplitude of the seasonal pattern still seems to increase over time.

The next step is to try the log transform. Here we call the log function on the Passengers column. The next step is to plot our new column.

OK, so this log transform seems to do a pretty good job at squashing down the data to make it look more uniform in time.

The final step will be to do a Box-Cox transform. This function takes in a one-dimensional dataset as input, and it returns the transformed data along with the optimal value of lambda. So you can see we've assigned the result to the variables called data and lam.
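To make the effect of the square root and log transforms on the seasonal amplitude concrete, here is a small sketch. It assumes a synthetic series (not the real Airline Passengers data) built from exponential growth multiplied by a yearly pattern, so the seasonal swings grow with the level. The new column names and the helper seasonal_swing are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (NOT the real data):
# exponential growth multiplied by a yearly seasonal pattern.
t = np.arange(144)
values = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
index = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame({"Passengers": values}, index=index)

# Square root and log transforms stored as new columns (names are assumptions).
df["SqrtPassengers"] = np.sqrt(df["Passengers"])
df["LogPassengers"] = np.log(df["Passengers"])


def seasonal_swing(series, year):
    """Range (max - min) of the series within one calendar year."""
    in_year = series[series.index.year == year]
    return in_year.max() - in_year.min()


# Ratio of the last year's swing to the first year's swing.
# A ratio near 1 means the seasonal amplitude stays constant over time.
growth = {
    col: seasonal_swing(df[col], 1960) / seasonal_swing(df[col], 1949)
    for col in df.columns
}
for col, ratio in growth.items():
    print(f"{col}: swing grew by a factor of {ratio:.2f}")
```

On this synthetic series the square root reduces the growth in amplitude without removing it, while the log removes it, because the log turns a multiplicative seasonal pattern into an additive one. Real data will be less tidy.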
The next step is to just print out lam to see what value we got.

So we get about 0.15, which is kind of in between the log transform and the square root transform. The next step is to assign our Box-Cox transformed data to a new column in our data frame and make a new plot. So let's try that.

OK, and we see that it definitely looks like something in between the square root and the log transforms.

The next step is to visualize our data in the form of a histogram. This should give us some insight into what the Box-Cox transform actually does. But as mentioned in the theory lecture, keep in mind that this kind of plot doesn't really make sense in terms of the distribution of the data. We can't really talk about the distribution when the distribution is dynamic and changing in time.

OK, so for the raw passengers data, we see that most of the values are concentrated in the lower hundreds.

Now for the square root data, we see that the distribution has been pushed further to the right, so it's more flat than before and less concentrated on the lower values.

For the log data, we see that the distribution now kind of resembles a mountain, where it's more evenly spaced out in the center instead of off to one side.

And for the Box-Cox data, we see almost the same pattern, except the largest peak is now closer to the center.
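The Box-Cox steps above can be sketched as follows. The series is again a synthetic stand-in, so the fitted lambda will not match the lecture's value of about 0.15, and the column name "BoxCoxPassengers" is an assumption. The sketch also checks the result against the Box-Cox definition, (x^lambda - 1) / lambda for nonzero lambda and log(x) for lambda equal to 0. That definition is why a lambda between 0 and 0.5 sits between the log and, up to a shift and scale, the square root. Bin counts from np.histogram stand in for the plotted histograms.

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

# Synthetic stand-in for the passenger series (NOT the real data).
t = np.arange(144)
values = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
index = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame({"Passengers": values}, index=index)

# With no lambda supplied, boxcox returns the transformed data together
# with the lambda that maximizes the log-likelihood. Input must be positive.
data, lam = boxcox(df["Passengers"].to_numpy())
df["BoxCoxPassengers"] = data
print("fitted lambda:", lam)

# Check the result against the Box-Cox definition.
x = df["Passengers"].to_numpy()
manual = np.log(x) if lam == 0 else (x**lam - 1) / lam
matches_definition = np.allclose(manual, data)
print("matches definition:", matches_definition)

# Bin counts per column, as a text substitute for the histogram plots.
hist_counts = {}
for col in df.columns:
    counts, _edges = np.histogram(df[col], bins=10)
    hist_counts[col] = counts
    print(col, counts)
```

As the lecture cautions, these counts pool values from different points in time, so they describe the spread of observed values and not a fixed underlying distribution.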
OK, so that's pretty much it for this code. I think this is enough to give you some sense of what effect these different transformations have.

Note that in this course, we won't be applying the Box-Cox transform very often. The reason for this is there's going to be a combinatorial explosion of techniques for us to try. And if we tried them all, not only would you get very bored and think this course was very tedious, you wouldn't gain very much. But I want to make you aware of these tools so that you can apply them in your work if you think they would be useful.