1
00:00:11,050 --> 00:00:16,480
In this lecture, we are going to discuss another kind of moving average, the exponentially weighted
2
00:00:16,480 --> 00:00:17,390
moving average.
3
00:00:17,950 --> 00:00:22,490
Note that some other names for this are exponential smoothing and the low-pass filter.
4
00:00:22,900 --> 00:00:27,970
So if you've taken some of my other courses before and you've heard me use those terms, recognize that
5
00:00:27,970 --> 00:00:33,850
this is the same thing. In fact, this kind of moving average is very applicable in many areas
6
00:00:33,850 --> 00:00:37,590
of machine learning, statistics, finance, and signal processing.
7
00:00:37,840 --> 00:00:39,880
So you will generally see it pretty often.
8
00:00:40,510 --> 00:00:43,360
So what is the exponentially weighted moving average?
9
00:00:48,060 --> 00:00:52,500
I want to break this lecture up into two parts, the first part is the short summary.
10
00:00:52,920 --> 00:00:56,590
If you want to only watch this part and then skip to the code, that's fine.
11
00:00:57,120 --> 00:01:02,490
The second part of this lecture will be an optional in-depth discussion about why the exponentially
12
00:01:02,490 --> 00:01:04,470
weighted moving average has its name.
13
00:01:04,890 --> 00:01:10,010
You can opt to watch this if you want to get a better understanding of why and how this works.
14
00:01:14,500 --> 00:01:20,830
OK, so what's the short summary? As you know, the arithmetic mean can be calculated by taking all
15
00:01:20,830 --> 00:01:26,620
of your samples, summing them together and then dividing by the number of samples. The exponentially
16
00:01:26,620 --> 00:01:28,930
weighted moving average is calculated differently.
17
00:01:29,410 --> 00:01:33,480
In fact, it's calculated kind of on the fly or in an online manner.
18
00:01:34,000 --> 00:01:39,760
It says that the moving average at time T is equal to some constant alpha times
19
00:01:39,760 --> 00:01:46,090
the sample at time T, plus one minus alpha times the previous moving average at time T minus one.
20
00:01:46,750 --> 00:01:52,540
In other words, at each step, the new moving average is the weighted sum of the new sample and the
21
00:01:52,540 --> 00:01:53,800
old moving average.
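Written as a few lines of Python, the update above looks something like this (a minimal sketch; seeding the average with the first sample is an assumption, as it is one of several common conventions):

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average, computed online:
    avg_t = alpha * x_t + (1 - alpha) * avg_{t-1}."""
    avg = None
    out = []
    for x in samples:
        # Seed with the first sample, then apply the recursive update.
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        out.append(avg)
    return out

print(ewma([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```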
22
00:01:54,400 --> 00:01:54,750
All right.
23
00:01:54,760 --> 00:01:55,740
So that's pretty much it.
24
00:01:55,840 --> 00:01:58,210
It's not a terribly complicated calculation.
25
00:01:58,750 --> 00:02:03,940
Of course, without further analysis, it's not clear why this is an average and it's not clear why
26
00:02:03,940 --> 00:02:05,260
it's exponentially weighted.
27
00:02:10,100 --> 00:02:16,430
The next part of our short summary is this: how do we do it in code? Similar to the simple moving average,
28
00:02:16,430 --> 00:02:20,540
we call a function on our Series or our DataFrame called ewm.
29
00:02:20,990 --> 00:02:26,220
This returns an EWM object, which is similar to the rolling objects we saw previously.
30
00:02:26,780 --> 00:02:31,450
It has a similar set of functions, such as mean, variance, covariance, and so forth.
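As a sketch of what this looks like in pandas (note that the exact smoothing also depends on the `adjust` parameter; `adjust=False` gives the recursive update described in this lecture):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# s.ewm(...) returns an ExponentialMovingWindow object, analogous to
# the Rolling object returned by s.rolling(...).
ewm_obj = s.ewm(alpha=0.2, adjust=False)

# Like rolling objects, it exposes mean(), var(), cov(), and so forth.
smoothed = ewm_obj.mean()
print(smoothed)
```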
31
00:02:36,200 --> 00:02:42,440
To discuss a practical issue: what value of alpha should we choose? Alpha is something like a decay
32
00:02:42,440 --> 00:02:43,040
factor.
33
00:02:43,490 --> 00:02:48,800
Typically, alpha is chosen to be a small value between zero and one, like 0.1 or
34
00:02:48,800 --> 00:02:52,290
0.2. It might help to look at some extreme cases.
35
00:02:52,580 --> 00:02:54,590
So let's say we choose Alpha equals one.
36
00:02:55,190 --> 00:02:58,880
That means set the average to be just the latest value of X.
37
00:02:59,300 --> 00:03:04,150
In this case, all we're doing is copying X and therefore it's not really an average at all.
38
00:03:04,820 --> 00:03:09,950
On the other hand, let's say we set Alpha equal to zero, then all we're doing is copying the previous
39
00:03:09,950 --> 00:03:14,720
average and we're not taking into account any new samples.
40
00:03:14,720 --> 00:03:21,410
Intuitively, then, if we set alpha very close to one, that says new samples matter much more and the old average matters
41
00:03:21,410 --> 00:03:27,020
much less. You can imagine this will lead to a much noisier time series, which will more closely
42
00:03:27,020 --> 00:03:33,950
match the original. If we set alpha very close to zero, that says new samples matter much less and the
43
00:03:33,950 --> 00:03:35,600
old average carries much more weight.
44
00:03:36,170 --> 00:03:41,450
In this situation, you'll get a much smoother time series and it will take a much more drastic change
45
00:03:41,450 --> 00:03:43,640
in X to affect the moving average.
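A quick way to see these two regimes is to smooth the same noisy series with a small and a large alpha and compare how much each output fluctuates (a hypothetical demo with random noise, not data from the lecture):

```python
import random

def ewma(samples, alpha):
    """Recursive EWMA update: avg = alpha * x + (1 - alpha) * avg."""
    avg = samples[0]
    out = [avg]
    for x in samples[1:]:
        avg = alpha * x + (1 - alpha) * avg
        out.append(avg)
    return out

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

smooth = ewma(noise, alpha=0.05)  # close to zero: heavy smoothing
jumpy = ewma(noise, alpha=0.9)    # close to one: tracks the input

# The small-alpha output fluctuates far less than the large-alpha one.
print(variance(smooth) < variance(jumpy))  # True
```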
46
00:03:48,390 --> 00:03:53,550
OK, so now that the short summary is complete, if you want to know the details behind the exponentially
47
00:03:53,550 --> 00:03:55,620
weighted moving average, keep listening.
48
00:03:56,490 --> 00:04:02,250
Let's suppose we want to calculate the usual arithmetic sample mean using the formula for the sample
49
00:04:02,250 --> 00:04:02,610
mean.
50
00:04:02,640 --> 00:04:04,710
You might suggest that this is quite obvious.
51
00:04:05,090 --> 00:04:09,960
Just take all the values of X that you've collected, add them all together and divide by the total
52
00:04:09,960 --> 00:04:11,610
number of Xs that you have.
53
00:04:12,060 --> 00:04:14,140
The question is what's wrong with this?
54
00:04:14,760 --> 00:04:16,270
I'll give you a minute to think about it.
55
00:04:16,290 --> 00:04:19,500
So please pause the video until you think you have the answer.
56
00:04:24,530 --> 00:04:29,810
All right, so hopefully you thought about why calculating the sample mean naively might not be such
57
00:04:29,810 --> 00:04:30,710
a good idea.
58
00:04:31,370 --> 00:04:34,620
What if we have a lot or even an infinite amount of data?
59
00:04:35,300 --> 00:04:39,530
Obviously, our computers or our servers don't have an infinite amount of space.
60
00:04:39,980 --> 00:04:43,640
And even if they did, calculating a summation is O(T).
61
00:04:43,850 --> 00:04:49,430
So the more data you have, the longer it will take and that will increase linearly with how much data
62
00:04:49,430 --> 00:04:50,250
you've collected.
63
00:04:51,170 --> 00:04:52,130
Here's my claim.
64
00:04:52,670 --> 00:04:59,060
I claim that you can make the calculation of the sample mean O(1) on each step in both space and time
65
00:04:59,060 --> 00:05:06,090
complexity, no matter how much data you collect. Again, as an exercise before moving on to the next slide,
66
00:05:06,290 --> 00:05:08,930
I want you to think about how this might be the case.
67
00:05:09,470 --> 00:05:12,380
Please pause the video if you want to take a moment and think.
68
00:05:17,150 --> 00:05:22,790
OK, so hopefully you thought about how you might calculate a sample mean using constant space and time.
69
00:05:23,570 --> 00:05:29,600
The key is that you can calculate a sample mean using the previous sample mean. Let's call the sample
70
00:05:29,600 --> 00:05:36,380
mean after collecting T samples X bar subscript T. This means that the sample mean after collecting T
71
00:05:36,380 --> 00:05:40,210
minus one samples is X bar subscript T minus one.
72
00:05:40,790 --> 00:05:44,840
We can write down the definition of both of these, which I hope is pretty obvious.
73
00:05:45,740 --> 00:05:49,280
Now that we know the trick, let's again make this an exercise.
74
00:05:49,670 --> 00:05:56,810
Can you express X bar T in terms of X bar T minus one? Please pause the video until you've tried this
75
00:05:56,810 --> 00:05:57,470
on your own.
76
00:06:02,460 --> 00:06:04,060
OK, so here's what you can do.
77
00:06:04,680 --> 00:06:10,450
First, you take X bar T and split up the summation so that you only sum up to T minus one.
78
00:06:10,890 --> 00:06:13,680
Then you leave X subscript T by itself.
79
00:06:14,010 --> 00:06:16,170
This is just the last sample you've collected.
80
00:06:17,580 --> 00:06:23,340
The next step is to realize that the sum of the Xs from one up to T minus one can be expressed
81
00:06:23,340 --> 00:06:25,940
in terms of X bar subscript T minus one.
82
00:06:26,550 --> 00:06:28,890
We just have to rearrange the equation from earlier.
83
00:06:29,670 --> 00:06:34,380
It's clear that this sum is just T minus one times X bar T minus one.
84
00:06:35,930 --> 00:06:42,350
We can substitute this into our expression for X bar T to get the sample mean at time T in terms of the
85
00:06:42,350 --> 00:06:44,060
sample mean at time T minus one.
86
00:06:49,080 --> 00:06:53,400
One interesting thing you can do, although it's not totally clear why you'd want to do this at this
87
00:06:53,400 --> 00:06:56,220
time, is split up the formula as follows.
88
00:06:56,850 --> 00:07:04,080
The first step is to multiply out the one over T term. That gives us T minus one over T as the first coefficient
89
00:07:04,230 --> 00:07:06,560
and one over T as the second coefficient.
90
00:07:07,140 --> 00:07:12,840
The second step is to rewrite T minus one over T as one minus one over T.
91
00:07:13,740 --> 00:07:17,290
At this point we can just leave this as is; this is the form that we want.
92
00:07:17,700 --> 00:07:22,700
We have one term with the previous sample mean and we have one term with the latest sample.
93
00:07:23,490 --> 00:07:28,470
What's important to recognize about this equation is that we have discovered a way to calculate the
94
00:07:28,470 --> 00:07:33,600
sample mean that does not depend on carrying around all of the samples you've ever collected.
95
00:07:34,050 --> 00:07:39,450
All you need to have is the previous sample mean, the latest sample, and the number of samples you've
96
00:07:39,450 --> 00:07:40,200
seen in total.
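The derivation above can be checked numerically: the recursive update with weight one over T reproduces the ordinary arithmetic mean while storing only a couple of numbers (a sketch, assuming we process the samples one at a time):

```python
def running_mean(samples):
    """Arithmetic mean in O(1) space and time per step, using
    x_bar_t = (1 - 1/t) * x_bar_{t-1} + (1/t) * x_t."""
    mean = 0.0
    for t, x in enumerate(samples, start=1):
        # Only the previous mean, the latest sample, and the count t
        # are needed -- never the full history of samples.
        mean = (1 - 1 / t) * mean + (1 / t) * x
    return mean

print(running_mean([1.0, 2.0, 3.0, 4.0]))  # 2.5
```

For [1, 2, 3, 4] this gives 2.5, matching the naive sum-and-divide computation without ever carrying around the whole list.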
97
00:07:45,280 --> 00:07:51,250
The next question to consider is, what if we believe that recent data matters more than past data?
98
00:07:51,910 --> 00:07:55,590
If we look at our equation carefully, we see an interesting characteristic.
99
00:07:56,140 --> 00:08:00,490
Remember that as we collect more and more samples, the value of T is increasing.
100
00:08:01,120 --> 00:08:06,550
That means as we collect more and more samples, the weight that we give to the latest sample decreases.
101
00:08:07,120 --> 00:08:11,330
We can see that the weight that we give to the latest sample is exactly one over T.
102
00:08:12,250 --> 00:08:16,930
Now, although this might make you think that the influence of each sample somehow decays over time,
103
00:08:17,200 --> 00:08:21,790
remember that this is not true because this is still just a regular arithmetic mean.
104
00:08:26,670 --> 00:08:32,250
But what if we want recent data to matter more? What would happen if, instead of making the weight one
105
00:08:32,250 --> 00:08:35,250
over T, we simply make it a constant alpha?
106
00:08:35,820 --> 00:08:39,340
Well, then this is exactly the exponentially weighted moving average.
107
00:08:39,930 --> 00:08:45,810
The basic idea is instead of giving less and less weight to each new sample, we now give a constant
108
00:08:45,810 --> 00:08:47,150
weight to each new sample.
109
00:08:47,850 --> 00:08:51,000
Let's see how this affects the influence of each sample overall.
110
00:08:56,020 --> 00:09:01,450
The next question we want to answer is, how does this update actually implement an exponentially weighted
111
00:09:01,450 --> 00:09:02,260
moving average?
112
00:09:02,680 --> 00:09:04,120
Can we show that this is true?
113
00:09:04,780 --> 00:09:07,960
And in fact, it's not too difficult.
114
00:09:07,960 --> 00:09:10,810
At this point, what we can do is just keep recursively plugging in
115
00:09:10,810 --> 00:09:17,890
older and older values of the sample mean. So we can replace X bar T minus one with its representation
116
00:09:17,890 --> 00:09:20,080
in terms of X bar at T minus two.
117
00:09:21,760 --> 00:09:27,280
Then we can multiply out the one minus alpha term so that we get X bar at T minus two by itself.
118
00:09:27,760 --> 00:09:33,970
Now we have three terms: X bar T minus two, the sample at time T minus one, and the sample at time T.
119
00:09:34,960 --> 00:09:40,990
The next step is, of course, to replace X bar T minus two with its representation in terms of X
120
00:09:40,990 --> 00:09:42,110
bar T minus three.
121
00:09:42,820 --> 00:09:48,310
From there we can do the same thing, multiply out the one minus alpha and get each of the terms by
122
00:09:48,310 --> 00:09:49,120
themselves.
123
00:09:49,690 --> 00:09:51,520
At this point you should see a pattern.
124
00:09:52,850 --> 00:09:58,370
The number of individual samples keeps growing and the power on the one minus alpha term also keeps
125
00:09:58,370 --> 00:09:58,890
growing.
126
00:09:59,570 --> 00:10:05,660
If we keep repeating this pattern T times, we end up with this expression involving a summation over
127
00:10:05,660 --> 00:10:08,060
all the past samples from one up to T.
128
00:10:08,750 --> 00:10:14,840
And of course, these weights are exactly exponentially decaying: since alpha is a number between zero
129
00:10:14,840 --> 00:10:18,970
and one, one minus alpha is also a number between zero and one.
130
00:10:19,370 --> 00:10:24,560
And when you raise a number between zero and one to the power K, it gets smaller and smaller exponentially
131
00:10:24,740 --> 00:10:26,390
as K gets larger and larger.
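This expansion is easy to verify numerically: unrolling the recursion (starting from an initial average of zero, an assumption made here so the two forms match exactly) gives each past sample a weight of alpha times one minus alpha raised to how far back it is:

```python
def ewma_recursive(samples, alpha):
    """Apply avg = alpha * x + (1 - alpha) * avg repeatedly,
    starting from an initial average of zero (an assumption)."""
    avg = 0.0
    for x in samples:
        avg = alpha * x + (1 - alpha) * avg
    return avg

def ewma_expanded(samples, alpha):
    """Unrolled form: sample k (1-indexed, T samples total) gets weight
    alpha * (1 - alpha)**(T - k), decaying exponentially into the past."""
    T = len(samples)
    return sum(alpha * (1 - alpha) ** (T - k) * x
               for k, x in enumerate(samples, start=1))

data = [3.0, 1.0, 4.0, 1.0, 5.0]
print(abs(ewma_recursive(data, 0.3) - ewma_expanded(data, 0.3)) < 1e-12)  # True
```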
132
00:10:31,510 --> 00:10:37,120
So how can we summarize what we've learned in this lecture? We've extended the concept of the mean
133
00:10:37,330 --> 00:10:39,460
to include the exponentially weighted mean.
134
00:10:39,940 --> 00:10:45,070
We can picture this by assigning weights to each of our samples. With the arithmetic average,
135
00:10:45,250 --> 00:10:48,850
each of the weights is just constant, with equal weight for each sample.
136
00:10:49,420 --> 00:10:54,460
With the exponentially weighted average, the weights decay exponentially, going backwards in time.
137
00:10:55,030 --> 00:10:57,440
This means that the latest sample matters the most.
138
00:10:57,670 --> 00:10:59,470
The second latest sample matters less.
139
00:10:59,650 --> 00:11:02,500
The third latest sample matters even less and so forth.
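The two weighting schemes in this summary can be written out side by side (a small illustration, not from the lecture):

```python
T, alpha = 5, 0.5

# Arithmetic mean: every sample gets the same weight, 1/T.
arithmetic_weights = [1 / T] * T

# Exponentially weighted mean: the k-th newest sample (k = 0 is the
# latest) gets weight alpha * (1 - alpha)**k, decaying backwards in time.
ewma_weights = [alpha * (1 - alpha) ** k for k in range(T)]

print(arithmetic_weights)  # [0.2, 0.2, 0.2, 0.2, 0.2]
print(ewma_weights)        # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```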