1
00:00:11,050 --> 00:00:16,480
In this lecture, we are going to discuss another kind of moving average, the exponentially weighted
2
00:00:16,480 --> 00:00:17,390
moving average.
3
00:00:17,950 --> 00:00:22,490
Note that some other names for this are exponential smoothing and the low-pass filter.
4
00:00:22,900 --> 00:00:27,970
So if you've taken some of my other courses before and you've heard me use those terms, recognize that
5
00:00:27,970 --> 00:00:33,850
this is the same thing. In fact, this kind of moving average is very applicable in many areas
6
00:00:33,850 --> 00:00:37,590
of machine learning, statistics, finance, and signal processing.
7
00:00:37,840 --> 00:00:39,880
So you will generally see it pretty often.
8
00:00:40,510 --> 00:00:43,360
So what is the exponentially weighted moving average?
9
00:00:48,060 --> 00:00:52,500
I want to break this lecture up into two parts, the first part is the short summary.
10
00:00:52,920 --> 00:00:56,590
If you want to only watch this part and then skip to the code, that's fine.
11
00:00:57,120 --> 00:01:02,490
The second part of this lecture will be an optional in-depth discussion about why the exponentially
12
00:01:02,490 --> 00:01:04,470
weighted moving average has its name.
13
00:01:04,890 --> 00:01:10,010
You can opt to watch this if you want to get a better understanding of why and how this works.
14
00:01:14,500 --> 00:01:20,830
OK, so what's the short summary? As you know, the arithmetic mean can be calculated by taking all
15
00:01:20,830 --> 00:01:26,620
of your samples, summing them together and then dividing by the number of samples. The exponentially
16
00:01:26,620 --> 00:01:28,930
weighted moving average is calculated differently.
17
00:01:29,410 --> 00:01:33,480
In fact, it's calculated kind of on the fly or in an online manner.
18
00:01:34,000 --> 00:01:39,760
It says that the moving average at time T is equal to some constant alpha times
19
00:01:39,760 --> 00:01:46,090
the sample at time T, plus one minus alpha times the previous moving average at time T minus one.
20
00:01:46,750 --> 00:01:52,540
In other words, at each step, the new moving average is the weighted sum of the new sample and the
21
00:01:52,540 --> 00:01:53,800
old moving average.
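Written as a few lines of Python, the update above looks something like this (a minimal sketch; seeding the average with the first sample is an assumption, as it is one of several common conventions):

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average, computed online:
    avg_t = alpha * x_t + (1 - alpha) * avg_{t-1}."""
    avg = None
    out = []
    for x in samples:
        # Seed with the first sample, then apply the recursive update.
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        out.append(avg)
    return out

print(ewma([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```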
22
00:01:54,400 --> 00:01:54,750
All right.
23
00:01:54,760 --> 00:01:55,740
So that's pretty much it.
24
00:01:55,840 --> 00:01:58,210
It's not a terribly complicated calculation.
25
00:01:58,750 --> 00:02:03,940
Of course, without further analysis, it's not clear why this is an average and it's not clear why
26
00:02:03,940 --> 00:02:05,260
it's exponentially weighted.
27
00:02:10,100 --> 00:02:16,430
The next part of our short summary is this: how do we do it in code? Similar to the simple moving average,
28
00:02:16,430 --> 00:02:20,540
we call a function on our Series or our DataFrame called ewm.
29
00:02:20,990 --> 00:02:26,220
This returns an EWM object, which is similar to the rolling objects we saw previously.
30
00:02:26,780 --> 00:02:31,450
It has a similar set of functions, such as mean, variance, covariance, and so forth.
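As a sketch of what this looks like in pandas (note that the exact smoothing also depends on the `adjust` parameter; `adjust=False` gives the recursive update described in this lecture):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# s.ewm(...) returns an ExponentialMovingWindow object, analogous to
# the Rolling object returned by s.rolling(...).
ewm_obj = s.ewm(alpha=0.2, adjust=False)

# Like rolling objects, it exposes mean(), var(), cov(), and so forth.
smoothed = ewm_obj.mean()
print(smoothed)
```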
31
00:02:36,200 --> 00:02:42,440
To discuss a practical issue: what value of alpha should we choose? Alpha is something like a decay
32
00:02:42,440 --> 00:02:43,040
factor.
33
00:02:43,490 --> 00:02:48,800
Typically, alpha is chosen to be a small value between zero and one, like 0.1 or
34
00:02:48,800 --> 00:02:52,290
0.2. It might help to look at some extreme cases.
35
00:02:52,580 --> 00:02:54,590
So let's say we choose Alpha equals one.
36
00:02:55,190 --> 00:02:58,880
That means set the average to be just the latest value of X.
37
00:02:59,300 --> 00:03:04,150
In this case, all we're doing is copying X and therefore it's not really an average at all.
38
00:03:04,820 --> 00:03:09,950
On the other hand, let's say we set Alpha equal to zero, then all we're doing is copying the previous
39
00:03:09,950 --> 00:03:14,720
average and we're not taking into account any new samples.
40
00:03:14,720 --> 00:03:21,410
Intuitively, then, if we set alpha very close to one, that says new samples matter much more and the old average matters
41
00:03:21,410 --> 00:03:27,020
much less. You can imagine this will lead to a much noisier time series, which will more closely
42
00:03:27,020 --> 00:03:33,950
match the original. If we set alpha very close to zero, that says new samples matter much less and the
43
00:03:33,950 --> 00:03:35,600
old average carries much more weight.
44
00:03:36,170 --> 00:03:41,450
In this situation, you'll get a much smoother time series and it will take a much more drastic change
45
00:03:41,450 --> 00:03:43,640
in X to affect the moving average.
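A quick way to see these two regimes is to smooth the same noisy series with a small and a large alpha and compare how much each output fluctuates (a hypothetical demo with random noise, not data from the lecture):

```python
import random

def ewma(samples, alpha):
    """Recursive EWMA update: avg = alpha * x + (1 - alpha) * avg."""
    avg = samples[0]
    out = [avg]
    for x in samples[1:]:
        avg = alpha * x + (1 - alpha) * avg
        out.append(avg)
    return out

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

smooth = ewma(noise, alpha=0.05)  # close to zero: heavy smoothing
jumpy = ewma(noise, alpha=0.9)    # close to one: tracks the input

# The small-alpha output fluctuates far less than the large-alpha one.
print(variance(smooth) < variance(jumpy))  # True
```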
46
00:03:48,390 --> 00:03:53,550
OK, so now that the short summary is complete, if you want to know the details behind the exponentially
47
00:03:53,550 --> 00:03:55,620
weighted moving average, keep listening.
48
00:03:56,490 --> 00:04:02,250
Let's suppose we want to calculate the usual arithmetic sample mean using the formula for the sample
49
00:04:02,250 --> 00:04:02,610
mean.
50
00:04:02,640 --> 00:04:04,710
You might suggest that this is quite obvious.
51
00:04:05,090 --> 00:04:09,960
Just take all the values of X that you've collected, add them all together and divide by the total
52
00:04:09,960 --> 00:04:11,610
number of Xs that you have.
53
00:04:12,060 --> 00:04:14,140
The question is what's wrong with this?
54
00:04:14,760 --> 00:04:16,270
I'll give you a minute to think about it.
55
00:04:16,290 --> 00:04:19,500
So please pause the video until you think you have the answer.
56
00:04:24,530 --> 00:04:29,810
All right, so hopefully you thought about why calculating the sample mean naively might not be such
57
00:04:29,810 --> 00:04:30,710
a good idea.
58
00:04:31,370 --> 00:04:34,620
What if we have a lot or even an infinite amount of data?
59
00:04:35,300 --> 00:04:39,530
Obviously, our computers or our servers don't have an infinite amount of space.
60
00:04:39,980 --> 00:04:43,640
And even if they did, calculating a summation is O(T).
61
00:04:43,850 --> 00:04:49,430
So the more data you have, the longer it will take and that will increase linearly with how much data
62
00:04:49,430 --> 00:04:50,250
you've collected.
63
00:04:51,170 --> 00:04:52,130
Here's my claim.
64
00:04:52,670 --> 00:04:59,060
I claim that you can make the calculation of the sample mean O(1) on each step in both space and time
65
00:04:59,060 --> 00:05:06,090
complexity, no matter how much data you collect. Again, as an exercise before moving on to the next slide,
66
00:05:06,290 --> 00:05:08,930
I want you to think about how this might be the case.
67
00:05:09,470 --> 00:05:12,380
Please pause the video if you want to take a moment and think.
68
00:05:17,150 --> 00:05:22,790
OK, so hopefully you thought about how you might calculate a sample mean using constant space and time.
69
00:05:23,570 --> 00:05:29,600
The key is that you can calculate a sample mean using the previous sample mean. Let's call the sample
70
00:05:29,600 --> 00:05:36,380
mean after collecting T samples X bar subscript T. This means that the sample mean after collecting T
71
00:05:36,380 --> 00:05:40,210
minus one samples is X bar subscript T minus one.
72
00:05:40,790 --> 00:05:44,840
We can write down the definition of both of these, which I hope is pretty obvious.
73
00:05:45,740 --> 00:05:49,280
Now that we know the trick, let's again make this an exercise.
74
00:05:49,670 --> 00:05:56,810
Can you express X bar T in terms of X bar T minus one? Please pause the video until you've tried this
75
00:05:56,810 --> 00:05:57,470
on your own.
76
00:06:02,460 --> 00:06:04,060
OK, so here's what you can do.
77
00:06:04,680 --> 00:06:10,450
First, you take X bar T and split up the summation so that you only sum up to T minus one.
78
00:06:10,890 --> 00:06:13,680
Then you leave X subscript T by itself.
79
00:06:14,010 --> 00:06:16,170
This is just the last sample you've collected.
80
00:06:17,580 --> 00:06:23,340
The next step is to realize that the sum of the Xs from one up to T minus one can be expressed
81
00:06:23,340 --> 00:06:25,940
in terms of X bar subscript T minus one.
82
00:06:26,550 --> 00:06:28,890
We just have to rearrange the equation from earlier.
83
00:06:29,670 --> 00:06:34,380
It's clear that this sum is just T minus one times X bar T minus one.
84
00:06:35,930 --> 00:06:42,350
We can substitute this into our expression for X bar T to get the sample mean at time T in terms of the
85
00:06:42,350 --> 00:06:44,060
sample mean at time T minus one.
86
00:06:49,080 --> 00:06:53,400
One interesting thing you can do, although it's not totally clear why you'd want to do this at this
87
00:06:53,400 --> 00:06:56,220
time, is split up the formula as follows.
88
00:06:56,850 --> 00:07:04,080
The first step is to multiply out the one over T term. That gives us T minus one over T as the first coefficient
89
00:07:04,230 --> 00:07:06,560
and one over T as the second coefficient.
90
00:07:07,140 --> 00:07:12,840
The second step is to rewrite T minus one over T as one minus one over T.
91
00:07:13,740 --> 00:07:17,290
At this point we can just leave this as is; this is the form that we want.
92
00:07:17,700 --> 00:07:22,700
We have one term with the previous sample mean and we have one term with the latest sample.
93
00:07:23,490 --> 00:07:28,470
What's important to recognize about this equation is that we have discovered a way to calculate the
94
00:07:28,470 --> 00:07:33,600
sample mean that does not depend on carrying around all of the samples you've ever collected.
95
00:07:34,050 --> 00:07:39,450
All you need to have is the previous sample mean, the latest sample, and the number of samples you've
96
00:07:39,450 --> 00:07:40,200
seen in total.
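The derivation above can be checked numerically: the recursive update with weight one over T reproduces the ordinary arithmetic mean while storing only a couple of numbers (a sketch, assuming we process the samples one at a time):

```python
def running_mean(samples):
    """Arithmetic mean in O(1) space and time per step, using
    x_bar_t = (1 - 1/t) * x_bar_{t-1} + (1/t) * x_t."""
    mean = 0.0
    for t, x in enumerate(samples, start=1):
        # Only the previous mean, the latest sample, and the count t
        # are needed -- never the full history of samples.
        mean = (1 - 1 / t) * mean + (1 / t) * x
    return mean

print(running_mean([1.0, 2.0, 3.0, 4.0]))  # 2.5
```

For [1, 2, 3, 4] this gives 2.5, matching the naive sum-and-divide computation without ever carrying around the whole list.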
97
00:07:45,280 --> 00:07:51,250
The next question to consider is, what if we believe that recent data matters more than past data?
98
00:07:51,910 --> 00:07:55,590
If we look at our equation carefully, we see an interesting characteristic.
99
00:07:56,140 --> 00:08:00,490
Remember that as we collect more and more samples, the value of T is increasing.
100
00:08:01,120 --> 00:08:06,550
That means as we collect more and more samples, the weight that we give to the latest sample decreases.
101
00:08:07,120 --> 00:08:11,330
We can see that the weight that we give to the latest sample is exactly one over T.
102
00:08:12,250 --> 00:08:16,930
Now, although this might make you think that the influence of each sample somehow decays over time,
103
00:08:17,200 --> 00:08:21,790
remember that this is not true because this is still just a regular arithmetic mean.
104
00:08:26,670 --> 00:08:32,250
But what if we want recent data to matter more? What would happen if, instead of making the weight one
105
00:08:32,250 --> 00:08:35,250
over T, we simply make it a constant alpha?
106
00:08:35,820 --> 00:08:39,340
Well, then this is exactly the exponentially weighted moving average.
107
00:08:39,930 --> 00:08:45,810
The basic idea is instead of giving less and less weight to each new sample, we now give a constant
108
00:08:45,810 --> 00:08:47,150
weight to each new sample.
109
00:08:47,850 --> 00:08:51,000
Let's see how this affects the influence of each sample overall.
110
00:08:56,020 --> 00:09:01,450
The next question we want to answer is, how does this update actually implement an exponentially weighted
111
00:09:01,450 --> 00:09:02,260
moving average?
112
00:09:02,680 --> 00:09:04,120
Can we show that this is true?
113
00:09:04,780 --> 00:09:07,960
And in fact, it's not too difficult.
114
00:09:07,960 --> 00:09:10,810
At this point, what we can do is just keep recursively plugging in
115
00:09:10,810 --> 00:09:17,890
older and older values of the sample mean. So we can replace X bar T minus one with its representation
116
00:09:17,890 --> 00:09:20,080
in terms of X bar at T minus two.
117
00:09:21,760 --> 00:09:27,280
Then we can multiply out the one minus alpha term so that we get X bar at T minus two by itself.
118
00:09:27,760 --> 00:09:33,970
Now we have three terms: X bar T minus two, the sample at time T minus one, and the sample at time T.
119
00:09:34,960 --> 00:09:40,990
The next step is, of course, to replace X bar T minus two with its representation in terms of X
120
00:09:40,990 --> 00:09:42,110
bar T minus three.
121
00:09:42,820 --> 00:09:48,310
From there we can do the same thing, multiply out the one minus alpha and get each of the terms by
122
00:09:48,310 --> 00:09:49,120
themselves.
123
00:09:49,690 --> 00:09:51,520
At this point you should see a pattern.
124
00:09:52,850 --> 00:09:58,370
The number of individual samples keeps growing and the power on the one minus alpha term also keeps
125
00:09:58,370 --> 00:09:58,890
growing.
126
00:09:59,570 --> 00:10:05,660
If we keep repeating this pattern T times, we end up with this expression involving a summation over
127
00:10:05,660 --> 00:10:08,060
all the past samples from one up to T.
128
00:10:08,750 --> 00:10:14,840
And of course, these weights are exactly exponentially decaying: since alpha is a number between zero
129
00:10:14,840 --> 00:10:18,970
and one, one minus alpha is also a number between zero and one.
130
00:10:19,370 --> 00:10:24,560
And when you raise a number between zero and one to the power K, it gets smaller and smaller exponentially
131
00:10:24,740 --> 00:10:26,390
as K gets larger and larger.
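This expansion is easy to verify numerically: unrolling the recursion (starting from an initial average of zero, an assumption made here so the two forms match exactly) gives each past sample a weight of alpha times one minus alpha raised to how far back it is:

```python
def ewma_recursive(samples, alpha):
    """Apply avg = alpha * x + (1 - alpha) * avg repeatedly,
    starting from an initial average of zero (an assumption)."""
    avg = 0.0
    for x in samples:
        avg = alpha * x + (1 - alpha) * avg
    return avg

def ewma_expanded(samples, alpha):
    """Unrolled form: sample k (1-indexed, T samples total) gets weight
    alpha * (1 - alpha)**(T - k), decaying exponentially into the past."""
    T = len(samples)
    return sum(alpha * (1 - alpha) ** (T - k) * x
               for k, x in enumerate(samples, start=1))

data = [3.0, 1.0, 4.0, 1.0, 5.0]
print(abs(ewma_recursive(data, 0.3) - ewma_expanded(data, 0.3)) < 1e-12)  # True
```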
132
00:10:31,510 --> 00:10:37,120
So how can we summarize what we've learned in this lecture? We've extended the concept of the mean
133
00:10:37,330 --> 00:10:39,460
to include the exponentially weighted mean.
134
00:10:39,940 --> 00:10:45,070
We can picture this by assigning weights to each of our samples. With the arithmetic average,
135
00:10:45,250 --> 00:10:48,850
each of the weights is just constant, with equal weight for each sample.
136
00:10:49,420 --> 00:10:54,460
With the exponentially weighted average, the weights decay exponentially, going backwards in time.
137
00:10:55,030 --> 00:10:57,440
This means that the latest sample matters the most.
138
00:10:57,670 --> 00:10:59,470
The second latest sample matters less.
139
00:10:59,650 --> 00:11:02,500
The third latest sample matters even less and so forth.
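The two weighting schemes in this summary can be written out side by side (a small illustration, not from the lecture):

```python
T, alpha = 5, 0.5

# Arithmetic mean: every sample gets the same weight, 1/T.
arithmetic_weights = [1 / T] * T

# Exponentially weighted mean: the k-th newest sample (k = 0 is the
# latest) gets weight alpha * (1 - alpha)**k, decaying backwards in time.
ewma_weights = [alpha * (1 - alpha) ** k for k in range(T)]

print(arithmetic_weights)  # [0.2, 0.2, 0.2, 0.2, 0.2]
print(ewma_weights)        # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```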