12. The Naive Forecast and the Importance of Baselines

In this lecture, we are going to discuss a method of forecasting called the naive forecast. Before we discuss what the naive forecast is, let's talk about the importance of establishing baselines in machine learning.

The purpose of a baseline is to have a relevant point of comparison. To give you an example, suppose that you went to your professor tomorrow and said, "Hey, I just made a great discovery. I built a model using deep learning and I got 99 percent accuracy." Unfortunately, the statement is meaningless, because you don't have the context of a baseline. Your professor might say, "But student, we already know that a simple linear model achieves 100 percent accuracy." In this case, your model is worse than what currently exists, and furthermore, it's also slower. So how can we make sure that we don't make this mistake?

The answer is to use a baseline. You'll notice that when you're reading machine learning papers, they often compare their results against the existing state of the art. Now, it's important to realize that you don't have to constantly try to beat the state of the art. In fact, that's kind of antithetical to science, because all you're doing is chasing numbers.
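As a concrete illustration of why a raw accuracy number means nothing on its own (a hypothetical sketch of my own, not an example from the lecture): on a heavily imbalanced classification problem, a "model" that always predicts the majority class can already score around 99 percent, so a deep model reporting 99 percent has not demonstrated anything until it is compared against that baseline.

```python
import numpy as np

# Hypothetical imbalanced dataset: roughly 99% of labels are 0
rng = np.random.default_rng(1)
y = (rng.random(10_000) < 0.01).astype(int)

# Baseline: always predict the majority class
majority = int(np.bincount(y).argmax())
baseline_acc = float(np.mean(y == majority))

# A report of "99% accuracy" only matters relative to this number
print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
```

Only a model that clearly beats `baseline_acc` on held-out data has shown anything at all.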
A good real-life example of that is when students study very hard to memorize their notes so that they can ace their exam without truly understanding the material or how it's applied. When you're chasing only numbers, it's sometimes easy to forget why those numbers matter in the first place. Anyway, to get back to the point: it's not always true that you want to compare to the state of the art. Sometimes, if you just want to test whether or not something is working, for example as a proof of concept, then the simplest model possible should suffice.

In time series analysis, the simplest model possible happens to be the naive forecast. What is the naive forecast? Well, it's just another name for something we've already discussed: you copy the previous known value forward in time.

One interesting phenomenon in time series analysis is that a lot of bad models end up looking like naive forecasts. When you look at the model predictions from afar, they seem to be pretty close to the true values. But when you zoom in, you can see that they seem close only because they're just copying the previous value, or something close to it.

A really bad situation is this: suppose that you fit a model and you say, "Aha, I beat the naive forecast, because my accuracy is better than the naive forecast's."
But of course, you have to ask: are you talking about the accuracy on the in-sample data or the out-of-sample data? Note that in machine learning, we often refer to these as the training data and the test data, so I will use these terms interchangeably. Just be aware.

One common mistake is that people believe that because they got good accuracy on their in-sample data, the same will be true for their out-of-sample data. Quite commonly, it turns out that the opposite is true. You might beat the naive forecast on the in-sample data, but on the out-of-sample data, the naive forecast beats you. Why does this happen?

It happens when your model overfits to the noise in the training data, but doesn't actually generalize to the true underlying pattern in the time series, if one exists. So that's why it's really important to compare your model to a baseline such as the naive forecast. It's not good enough to say, "I got an 80 percent classification rate on my train set and 75 percent on my test set." If some other method achieves a higher rate on the test set, then your model isn't as good.

There's one application I've seen a lot, which I think is a very interesting consequence of the times that we live in. Today, there are marketers everywhere on the Internet.
The name of the game is SEO: search engine optimization. Everyone is trying to get clicks for their blog, or to get people to sign up for their course, or whatever. And of course, one of the obvious topics that many beginners care about is stock price prediction. Now, if you're taking this course, then you know better, but let's continue the story.

So imagine what happens. You take one of the most popular topics that would appeal to someone who doesn't know about finance, like stock prediction. You take one of the most popular machine learning algorithms that would appeal to someone who may not know a lot about machine learning but has heard many buzzwords, like the LSTM. The LSTM has been a very popular choice of sequence model. And then what do you do? Well, you combine these two: you make a blog post on stock prediction with LSTMs. In fact, many people have done so.

It's obviously a very appealing idea. No one would blame you for clicking on an article or buying a course that claims to be able to do this. I won't name any names, but there are some courses that make the very mistake I'm talking about in this lecture. Maybe you're watching this and you know you've seen something like this yourself. There is another, even worse, mistake that these marketers make.
But we'll discuss that later, when we talk about forecasting in general for time series models like ARIMA. If you want, you're encouraged to skip ahead to the forecasting lecture so we can continue this discussion.

To get back to the naive forecast, let's recall what we learned about random walks. As you recall, a random walk is where, on every step of a time series, I flip a coin or pick a number from a distribution, and that number is added to my current position in order to get to the next position. If my noise distribution is a zero-centered Gaussian with variance sigma squared, which is not unreasonable, then the best forecast is the naive forecast: I can do no better than predicting the last known value.

Another way to think about this is: if you build a model that you think is good but it cannot beat the naive forecast, then it might suggest that your model is actually worse than a random walk model. In other words, a random walk model describes the data better than your model does.
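This claim about random walks is easy to check by simulation. The sketch below is a minimal illustration of my own (it assumes NumPy, and the 5-step moving-average competitor is an arbitrary choice, not a model from the lecture): it simulates a Gaussian-step random walk and compares the one-step-ahead naive forecast against the moving-average predictor. Because the naive forecast is the optimal predictor for a pure random walk, it should come out ahead.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk(n, sigma=1.0):
    """x[t] = x[t-1] + noise, with noise ~ N(0, sigma^2)."""
    return np.cumsum(rng.normal(0.0, sigma, size=n))

walk = random_walk(10_000)

# Naive forecast: predict x[t] with x[t-1]
naive_err = walk[1:] - walk[:-1]

# Competitor: predict x[t] with the mean of the previous 5 values
window = 5
ma_pred = np.convolve(walk, np.ones(window) / window, mode="valid")[:-1]
ma_err = walk[window:] - ma_pred

# Align both error series to the same targets (walk[5:]) before comparing
naive_mse = float(np.mean(naive_err[window - 1:] ** 2))
ma_mse = float(np.mean(ma_err ** 2))

# On a random walk, the naive forecast's MSE should be close to sigma^2,
# and the moving average should do noticeably worse.
print(naive_mse, ma_mse)
```

The naive forecast's test MSE should land near the step variance (1.0 here), while smoothing over past values only drags the prediction away from the last known position and inflates the error.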
