1
00:00:11,110 --> 00:00:17,860
In this lecture, we are going to discuss a method of forecasting called the naive forecast. Before we
2
00:00:17,860 --> 00:00:20,030
discuss what the naive forecast is,
3
00:00:20,320 --> 00:00:24,280
let's talk about the importance of establishing baselines in machine learning.
4
00:00:25,120 --> 00:00:28,750
The purpose of a baseline is to have a relevant point of comparison.
5
00:00:29,410 --> 00:00:34,660
To give you an example, suppose that you went to your professor tomorrow and said, hey, I just made
6
00:00:34,660 --> 00:00:35,650
a great discovery.
7
00:00:35,950 --> 00:00:39,880
I built a model using deep learning and I got 99 percent accuracy.
8
00:00:40,510 --> 00:00:45,670
Unfortunately, the statement is meaningless because you don't have a context of a baseline.
9
00:00:46,420 --> 00:00:52,810
Your professor might say, but student, we already know that a simple linear model achieves 100 percent
10
00:00:52,810 --> 00:00:53,590
accuracy.
11
00:00:54,370 --> 00:00:57,670
In this case, your model is worse than what currently exists.
12
00:00:57,880 --> 00:00:59,960
And furthermore, it's also slower.
13
00:01:00,580 --> 00:01:03,210
So how can we make sure that we don't make this mistake?
14
00:01:07,990 --> 00:01:10,100
The answer is using a baseline.
15
00:01:10,720 --> 00:01:15,310
You'll notice that when you're reading machine learning papers, they often compare their results against
16
00:01:15,310 --> 00:01:16,810
the existing state of the art.
17
00:01:17,500 --> 00:01:22,060
Now, it's important to realize that you don't have to constantly try to beat the state of the art.
18
00:01:22,600 --> 00:01:27,400
In fact, that's kind of antithetical to science because all you're doing is chasing numbers.
19
00:01:27,910 --> 00:01:33,640
A good real-life example of that is when students study very hard to memorize their notes so that
20
00:01:33,640 --> 00:01:39,580
they can ace their exam without truly understanding the material or how it's applied. When you're chasing
21
00:01:39,580 --> 00:01:40,430
only numbers,
22
00:01:40,720 --> 00:01:44,560
sometimes it's easy to forget why those numbers matter in the first place.
23
00:01:45,190 --> 00:01:46,790
Anyway, to get back to the point.
24
00:01:47,020 --> 00:01:50,090
It's not always true that you want to compare to the state of the art.
25
00:01:50,710 --> 00:01:55,600
Sometimes if you just want to test whether or not something is working, for example, as a proof of
26
00:01:55,600 --> 00:01:59,110
concept, then the simplest model possible should suffice.
27
00:02:03,880 --> 00:02:10,450
In time series analysis, the simplest possible model happens to be the naive forecast. What is the
28
00:02:10,450 --> 00:02:11,560
naive forecast?
29
00:02:11,980 --> 00:02:17,530
Well, it's just another name for something we've already discussed that is to copy the previous known
30
00:02:17,530 --> 00:02:19,120
value forward in time.
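As a minimal sketch of that idea (assuming a one-dimensional NumPy series; the function name is my own, not from the lecture), the naive forecast just repeats the last known value over the forecast horizon:

```python
import numpy as np

def naive_forecast(series, horizon):
    """Forecast by copying the last observed value forward in time."""
    return np.full(horizon, series[-1])

y = np.array([3.0, 5.0, 4.0, 6.0])
print(naive_forecast(y, 3))  # [6. 6. 6.]
```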
31
00:02:19,960 --> 00:02:26,230
One interesting phenomenon that happens in time series analysis is that a lot of bad models end up
32
00:02:26,230 --> 00:02:27,970
looking like naive forecasts.
33
00:02:28,570 --> 00:02:33,600
When you look at the model predictions from afar, they seem to be pretty close to the true values.
34
00:02:34,030 --> 00:02:39,340
But when you zoom in, you can see that they seem pretty close only because they're just copying the
35
00:02:39,340 --> 00:02:41,290
previous value or close to it.
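One quick way to spot that failure mode (a diagnostic sketch of my own, not from the lecture; `looks_like_naive` and its tolerance are hypothetical) is to check how close the model's predictions are to a one-step-shifted copy of the true series:

```python
import numpy as np

def looks_like_naive(y_true, y_pred, tol=0.05):
    """Flag predictions that mostly just echo the previous true value."""
    lagged = y_true[:-1]  # the naive forecast: previous value, shifted forward
    err_vs_naive = np.mean(np.abs(y_pred[1:] - lagged))
    scale = np.mean(np.abs(np.diff(y_true)))  # typical step size of the series
    return err_vs_naive < tol * scale

y = np.array([1.0, 2.0, 1.5, 2.5, 2.0])
pred = np.array([0.0, 1.0, 2.0, 1.5, 2.5])  # exactly the lagged series
print(looks_like_naive(y, pred))  # True
```

If this flag fires, the model's apparent accuracy may just be the lag-one copy effect described above.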
36
00:02:46,190 --> 00:02:53,030
A really bad situation is this: suppose that you fit a model and you say, aha, I beat the naive forecast
37
00:02:53,030 --> 00:02:56,300
because my accuracy is better than the naive forecast's.
38
00:02:56,870 --> 00:03:02,330
But of course, you have to ask, are you talking about the accuracy on the in sample data or the out of
39
00:03:02,330 --> 00:03:03,060
sample data?
40
00:03:03,740 --> 00:03:08,640
Note that in machine learning we often refer to these as the training data and the test data.
41
00:03:09,200 --> 00:03:11,240
So I'll use these terms interchangeably.
42
00:03:11,250 --> 00:03:12,140
Just be aware.
43
00:03:13,430 --> 00:03:19,280
Well, one common mistake is that people believe because they got good accuracy on their in sample data,
44
00:03:19,460 --> 00:03:21,770
that the same will be true for their out of sample data.
45
00:03:22,190 --> 00:03:25,570
And also, quite commonly, it turns out that the opposite is true.
46
00:03:26,210 --> 00:03:31,790
You might beat the naive forecast on the in sample data, but on the out of sample data, the naive forecast
47
00:03:31,790 --> 00:03:32,480
beats you.
48
00:03:33,080 --> 00:03:34,100
Why does this happen?
49
00:03:36,080 --> 00:03:41,450
It happens when your model overfits to the noise in the training data, but doesn't actually generalize
50
00:03:41,450 --> 00:03:45,510
to the true underlying pattern in the time series, if one exists.
51
00:03:46,070 --> 00:03:51,360
So that's why it's really important to compare your model to a baseline such as the naive forecast.
52
00:03:51,890 --> 00:03:58,440
It's not good enough to say I got an 80 percent classification rate on my training set and 75 percent on my test
53
00:03:58,450 --> 00:03:58,750
set.
54
00:03:59,150 --> 00:04:04,260
If some other method achieves 70 percent on the test, then your model isn't as good.
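A minimal sketch of that check (the numbers here are hypothetical, and I use mean absolute error for a forecasting task rather than classification accuracy): compute your model's error and the naive forecast's error on the same test set, and only claim success if you beat the baseline.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true values and forecasts."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# Hypothetical hold-out (test) data and forecasts.
y_test = [10.0, 11.0, 12.0, 11.0]
model_pred = [10.5, 10.0, 13.0, 12.5]  # your model's forecasts
naive_pred = [9.0] * len(y_test)       # last training value, copied forward

model_err = mae(y_test, model_pred)    # 1.0
naive_err = mae(y_test, naive_pred)    # 2.0

# The model only "counts" if it beats the baseline on the TEST set.
print("beats naive baseline" if model_err < naive_err else "worse than naive")
# → beats naive baseline
```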
55
00:04:09,370 --> 00:04:14,080
There's one application I've seen a lot, which I think is a very interesting consequence of the times
56
00:04:14,080 --> 00:04:16,140
that we live in today.
57
00:04:16,180 --> 00:04:18,210
There are marketers everywhere on the Internet.
58
00:04:18,640 --> 00:04:22,170
The name of the game is SEO: search engine optimization.
59
00:04:23,110 --> 00:04:27,470
Everyone is trying to get clicks for their blog or to get people to sign up for their course or whatever.
60
00:04:28,000 --> 00:04:33,640
And of course, one of the obvious topics that many beginners care about is stock price prediction.
61
00:04:34,300 --> 00:04:36,340
Now, if you're taking this course, then you know better.
62
00:04:36,340 --> 00:04:37,960
But let's continue the story.
63
00:04:38,680 --> 00:04:40,080
So imagine what happens.
64
00:04:40,480 --> 00:04:45,490
You take one of the most popular topics that would appeal to someone who doesn't know about finance
65
00:04:45,490 --> 00:04:46,920
like stock predictions.
66
00:04:47,320 --> 00:04:52,090
You take one of the most popular machine learning algorithms that would appeal to someone who may not
67
00:04:52,090 --> 00:04:56,290
know a lot about machine learning, but has heard many buzzwords like LSTM.
68
00:04:56,980 --> 00:05:00,310
LSTM has been a very popular model for sequence modeling.
69
00:05:00,320 --> 00:05:02,390
And then what do you do?
70
00:05:02,800 --> 00:05:07,890
Well, you combine these two: you make a blog post on stock predictions with LSTMs.
71
00:05:08,350 --> 00:05:10,420
In fact, many people have done so.
72
00:05:12,800 --> 00:05:15,060
It's obviously a very appealing idea.
73
00:05:15,590 --> 00:05:20,430
No one would blame you for clicking on an article or buying a course that claims to be able to do this.
74
00:05:20,930 --> 00:05:26,060
I won't name any names, but there are some courses that make the very mistake I'm talking about in
75
00:05:26,060 --> 00:05:26,820
this lecture.
76
00:05:27,410 --> 00:05:30,710
Maybe you're watching this and you've seen something like this yourself.
77
00:05:31,580 --> 00:05:34,520
There is another even worse mistake that these marketers make.
78
00:05:34,700 --> 00:05:39,050
But we'll discuss that later when we talk about forecasting in general for time series
79
00:05:39,050 --> 00:05:44,780
models like ARIMA. If you want, you're encouraged to skip over to the forecasting lecture so we can
80
00:05:44,780 --> 00:05:46,040
continue this discussion.
81
00:05:51,160 --> 00:05:55,540
To get back to the naive forecast, let's recall what we learned about random walks.
82
00:05:56,170 --> 00:06:02,320
As you recall, a random walk is where on every step of a time series I flip a coin or pick a number
83
00:06:02,320 --> 00:06:03,370
from a distribution.
84
00:06:03,700 --> 00:06:08,440
And that number is added to my current position in order to go to the next position.
85
00:06:09,160 --> 00:06:15,550
If my noise distribution is a zero-centered Gaussian with variance sigma squared, which is not unreasonable,
86
00:06:16,000 --> 00:06:19,390
then the best forecast is the naive forecast.
87
00:06:19,960 --> 00:06:23,030
I can do no better than predicting the last known value.
88
00:06:23,830 --> 00:06:29,410
Another way to think about this is if you build a model that you think is good but it cannot beat the
89
00:06:29,410 --> 00:06:35,480
naive forecast, then it might suggest that your model is actually worse than a random walk model.
90
00:06:36,100 --> 00:06:40,570
In other words, a random walk model describes the data better than your model.
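To make that concrete, here is a small simulation sketch (the seed, length, and the running-mean competitor are my own assumptions): on a Gaussian random walk, the one-step-ahead naive forecast beats forecasting with, say, the running mean of the history.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0
steps = rng.normal(0.0, sigma, size=5000)
walk = np.cumsum(steps)  # random walk: each step adds Gaussian noise

# One-step-ahead forecasts over the whole series.
naive = walk[:-1]  # predict the last known value
running_mean = np.cumsum(walk)[:-1] / np.arange(1, len(walk))  # mean of history
actual = walk[1:]

mse_naive = np.mean((actual - naive) ** 2)        # approaches sigma^2
mse_mean = np.mean((actual - running_mean) ** 2)  # much larger
print(mse_naive < mse_mean)  # True
```

The naive forecast's error converges to the step variance sigma squared, which is the theoretical floor for this process; any forecaster that drifts away from the last known value does worse.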