1
00:00:12,120 --> 00:00:17,850
OK, so in this lecture, we are going to investigate the Box-Cox and other time series transformations
2
00:00:17,850 --> 00:00:18,450
in code.
3
00:00:19,110 --> 00:00:23,880
So we'll start by downloading a famous time series data set called Airline Passengers.
4
00:00:27,330 --> 00:00:33,510
Next, we're going to import NumPy, pandas and Matplotlib, nothing you haven't seen before, and
5
00:00:33,510 --> 00:00:35,760
also the boxcox function from SciPy.
6
00:00:40,400 --> 00:00:44,790
Next, we're going to read in a CSV using pandas' read_csv.
7
00:00:45,020 --> 00:00:47,270
So, again, nothing too surprising here.
8
00:00:51,000 --> 00:00:56,400
Now, one thing I always like to do whenever I load in some data is to just see what it looks like.
9
00:00:56,760 --> 00:01:01,410
This always makes me more confident in the code that I wrote, and it lets me know that my code makes
10
00:01:01,410 --> 00:01:01,940
sense.
11
00:01:02,370 --> 00:01:06,920
So we'll do a df.head(), and this will print out the first few rows of the data.
12
00:01:09,550 --> 00:01:14,860
OK, so we can see that for the index, we have the date, which is monthly, and we have one integer
13
00:01:14,860 --> 00:01:16,420
column called Passengers.
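A minimal sketch of the loading and inspection steps, assuming the downloaded file has a "Month" date column and an integer "Passengers" column. To keep it runnable without a download, it reads a tiny inline CSV whose values are made up for illustration, not the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV. The column names "Month" and
# "Passengers" are assumptions, and the values are illustrative, not real data.
csv_text = """Month,Passengers
1949-01,100
1949-02,110
1949-03,120
1949-04,115
1949-05,105
1949-06,125
"""

# index_col makes the date column the index; parse_dates converts it to datetimes.
# With the real file you would pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text), index_col="Month", parse_dates=True)

# Look at the first few rows to check that the data loaded as expected.
print(df.head())
```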
14
00:01:19,650 --> 00:01:23,330
The next step is to plot our data set just to see what we're dealing with.
15
00:01:28,820 --> 00:01:34,550
So there are several characteristics of this data that I want you to notice. Number one, notice that it
16
00:01:34,550 --> 00:01:35,360
has a trend.
17
00:01:35,810 --> 00:01:41,900
This time series is going upward and to the right. Number two, notice that it has some seasonality.
18
00:01:42,230 --> 00:01:44,630
That is, there is a repeating pattern in time.
19
00:01:45,530 --> 00:01:50,730
Number three, notice that the amplitude of this seasonal pattern increases over time.
20
00:01:51,170 --> 00:01:56,100
So at the beginning, the amplitude is pretty small, but at the end it gets larger and larger.
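One way to put a number on the growing amplitude is to take the max-minus-min within each calendar year. This is a sketch on a synthetic stand-in series (an exponential trend multiplied by a 12-month sine factor), not the real airline data; the column name Passengers and the yearly-range measure are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series (NOT the real airline data): an exponential
# trend times a seasonal factor, so the seasonal swings grow with the level.
t = np.arange(144)  # 12 years of monthly steps
trend = 100 * np.exp(0.01 * t)
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
index = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame({"Passengers": trend * seasonal}, index=index)

# Seasonal amplitude measured as max minus min within each calendar year.
g = df["Passengers"].groupby(df.index.year)
yearly_range = g.max() - g.min()
print(yearly_range.round(1))
```

For a multiplicative series like this one, the yearly range grows along with the trend.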
21
00:01:56,870 --> 00:02:01,910
So these are all characteristics of time series that we will explicitly model in this course.
22
00:02:02,150 --> 00:02:05,180
And you'll learn how each algorithm handles these characteristics.
23
00:02:06,680 --> 00:02:11,750
One thing we would like to see, which will make more sense later, is for things to not change over
24
00:02:11,750 --> 00:02:12,270
time.
25
00:02:12,560 --> 00:02:15,530
So, for example, take this amplitude increasing over time.
26
00:02:15,710 --> 00:02:17,510
It would be nice if that went away.
27
00:02:21,160 --> 00:02:24,740
OK, so the next step will be to try the square root transform.
28
00:02:25,270 --> 00:02:28,990
So here we call the sqrt function on the passengers column.
29
00:02:32,990 --> 00:02:35,030
The next step is to plot our new column.
30
00:02:38,900 --> 00:02:44,300
OK, so we can see that the time series has been squashed down slightly, but the amplitude of the seasonal
31
00:02:44,300 --> 00:02:46,900
pattern still seems to increase over time.
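A sketch of the square root step on the same kind of synthetic stand-in series (assumed, not the real data). The new column name SqrtPassengers is a hypothetical choice. The yearly range gets smaller in absolute terms but, as in the plot described above, keeps growing over time.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for the passengers data.
t = np.arange(144)
index = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame(
    {"Passengers": 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))},
    index=index,
)

# Square root transform of the passengers column.
df["SqrtPassengers"] = np.sqrt(df["Passengers"])

# Per-year seasonal range after the transform: smaller, but still increasing.
g = df["SqrtPassengers"].groupby(df.index.year)
sqrt_range = g.max() - g.min()
print(sqrt_range.round(3))
```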
32
00:02:49,710 --> 00:02:52,000
The next step is to try the log transform.
33
00:02:52,560 --> 00:02:55,590
So here we call the log function on the passengers column.
34
00:02:58,970 --> 00:03:00,830
The next step is to plot our new column.
35
00:03:06,510 --> 00:03:12,360
OK, so this log transform seems to do a pretty good job at squashing down the data to make it look
36
00:03:12,360 --> 00:03:13,620
more uniform in time.
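Why the log works so well here: if the series behaves like trend times seasonal factor, the log turns that product into a sum, so the seasonal swing no longer scales with the level. This sketch assumes a synthetic multiplicative series (not the real data); because it is multiplicative by construction, the yearly range of the log is exactly constant, whereas real data would only be roughly uniform. The column name LogPassengers is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic multiplicative series: trend * seasonal factor.
t = np.arange(144)
index = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame(
    {"Passengers": 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))},
    index=index,
)

# Log transform: log(trend * seasonal) = log(trend) + log(seasonal).
df["LogPassengers"] = np.log(df["Passengers"])

# The per-year seasonal range is now the same every year.
g = df["LogPassengers"].groupby(df.index.year)
log_range = g.max() - g.min()
print(log_range.round(4))
```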
37
00:03:17,550 --> 00:03:23,550
The final step will be to do a Box-Cox transform. This function takes in a one-dimensional data
38
00:03:23,550 --> 00:03:28,800
set as input and it returns the transformed data along with the optimal value of lambda.
39
00:03:29,370 --> 00:03:33,330
So you can see we've assigned the result to the variables called data and lam.
40
00:03:36,560 --> 00:03:39,770
The next step is to just print out lam to see what value we got.
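A minimal sketch of the call described above. When no lmbda is passed, scipy.stats.boxcox returns two values: the transformed array and the lambda that maximizes the log-likelihood. The input here is an assumed synthetic positive series (Box-Cox requires strictly positive data), and lam is used because lambda is a reserved word in Python.

```python
import numpy as np
from scipy.stats import boxcox

# Hypothetical strictly positive series standing in for the passengers column.
t = np.arange(144)
x = 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))

# With lmbda unset, boxcox fits lambda by maximum likelihood and returns it
# alongside the transformed data.
data, lam = boxcox(x)
print(lam)
```

The value printed depends on the data, so on this synthetic series it will not match the value from the lecture.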
41
00:03:42,740 --> 00:03:47,990
So we get about 0.15, which is kind of in between the log transform and the square root
42
00:03:47,990 --> 00:03:56,270
transform. The next step is to assign our Box-Cox transformed data to a new column in our data frame
43
00:03:56,270 --> 00:03:57,320
and make a new plot.
44
00:03:57,590 --> 00:03:58,550
So let's try that.
45
00:04:03,030 --> 00:04:08,910
OK, and we see that it definitely looks like something in between the square root and the log transforms.
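The "in between" reading follows from the Box-Cox definition, y = (x^lam - 1) / lam for lam != 0 and y = log(x) for lam = 0. So lam = 0 is exactly the log transform, and lam = 0.5 is the square root up to a shift and scale, 2 * (sqrt(x) - 1). A value near 0.15 therefore sits between the two. The sample values below are hypothetical.

```python
import numpy as np
from scipy.stats import boxcox

x = np.array([100.0, 150.0, 200.0, 400.0])  # hypothetical positive values

# With a fixed lmbda, boxcox returns only the transformed array.
y_log = boxcox(x, lmbda=0)      # lambda = 0 is the log transform
y_half = boxcox(x, lmbda=0.5)   # lambda = 0.5 is a rescaled square root
y_mid = boxcox(x, lmbda=0.15)   # a value in between, like the fitted one

print(np.allclose(y_log, np.log(x)))                # → True
print(np.allclose(y_half, 2 * (np.sqrt(x) - 1)))    # → True
print(np.allclose(y_mid, (x ** 0.15 - 1) / 0.15))   # → True
```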
46
00:04:13,170 --> 00:04:16,770
The next step is to visualize our data in the form of a histogram.
47
00:04:17,370 --> 00:04:21,940
This should give us some insight into what the Box-Cox transform actually does.
48
00:04:22,350 --> 00:04:27,600
But as mentioned in the theory lecture, keep in mind that this kind of plot doesn't really make sense
49
00:04:27,780 --> 00:04:30,060
in terms of the distribution of the data.
50
00:04:30,480 --> 00:04:35,930
We can't really talk about the distribution when the distribution is dynamic and changing in time.
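A sketch of the histogram comparison, again on an assumed synthetic stand-in series rather than the real data. It builds all four columns (raw, square root, log, Box-Cox; the column names are hypothetical), draws one histogram per column, and uses the non-interactive Agg backend so it runs without a display. Keep in mind the caveat above: these histograms pool values from different points in time.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import boxcox

# Hypothetical synthetic stand-in for the passengers data.
t = np.arange(144)
index = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame(
    {"Passengers": 100 * np.exp(0.01 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))},
    index=index,
)

# The three transformed versions of the column.
df["SqrtPassengers"] = np.sqrt(df["Passengers"])
df["LogPassengers"] = np.log(df["Passengers"])
data, lam = boxcox(df["Passengers"].to_numpy())
df["BoxCoxPassengers"] = data

# One histogram per column in a 2x2 grid.
columns = ["Passengers", "SqrtPassengers", "LogPassengers", "BoxCoxPassengers"]
fig, axes = plt.subplots(2, 2, figsize=(10, 6))
for ax, col in zip(axes.ravel(), columns):
    df[col].hist(ax=ax, bins=20)
    ax.set_title(col)
fig.tight_layout()

# Save to an in-memory buffer instead of showing a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```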
51
00:04:40,150 --> 00:04:45,640
OK, so for the raw passengers data, we see that most of the values are concentrated in the lower
52
00:04:45,640 --> 00:04:46,360
hundreds.
53
00:04:51,260 --> 00:04:57,050
Now for the square root data, we see that the distribution has been pushed further to the right, so
54
00:04:57,050 --> 00:05:01,070
it's more flat than before and less concentrated on the lower values.
55
00:05:06,500 --> 00:05:12,290
For the log data, we see that the distribution now kind of resembles a mountain where it's more evenly
56
00:05:12,290 --> 00:05:15,250
spaced out in the center instead of off to one side.
57
00:05:20,720 --> 00:05:26,390
And for the Box-Cox data, we see almost the same pattern, except the largest peak is now closer to
58
00:05:26,390 --> 00:05:27,050
the center.
59
00:05:28,430 --> 00:05:30,460
OK, so that's pretty much it for this code.
60
00:05:30,710 --> 00:05:35,630
I think this is enough to give you some sense of what effect these different transformations have.
61
00:05:37,460 --> 00:05:43,310
Note that in this course, we won't be applying the Box-Cox transform very often. The reason for this
62
00:05:43,310 --> 00:05:47,540
is there's going to be a combinatorial explosion of techniques for us to try.
63
00:05:47,900 --> 00:05:52,250
And if we tried them all, not only would you get very bored and think this course was very tedious,
64
00:05:52,430 --> 00:05:53,840
but you also wouldn't gain very much.
65
00:05:54,200 --> 00:05:59,240
But I want to make you aware of these tools so that you can apply them in your work if you think they
66
00:05:59,270 --> 00:05:59,990
would be useful.