1
00:00:11,110 --> 00:00:17,860
In this lecture, we are going to discuss a method of forecasting called the naive forecast. Before we
2
00:00:17,860 --> 00:00:20,030
discuss what the naive forecast is,
3
00:00:20,320 --> 00:00:24,280
let's talk about the importance of establishing baselines in machine learning.
4
00:00:25,120 --> 00:00:28,750
The purpose of a baseline is to have a relevant point of comparison.
5
00:00:29,410 --> 00:00:34,660
To give you an example, suppose that you went to your professor tomorrow and said, hey, I just made
6
00:00:34,660 --> 00:00:35,650
a great discovery.
7
00:00:35,950 --> 00:00:39,880
I built a model using deep learning and I got 99 percent accuracy.
8
00:00:40,510 --> 00:00:45,670
Unfortunately, the statement is meaningless because you don't have a context of a baseline.
9
00:00:46,420 --> 00:00:52,810
Your professor might say, but student, we already know that a simple linear model achieves 100 percent
10
00:00:52,810 --> 00:00:53,590
accuracy.
11
00:00:54,370 --> 00:00:57,670
In this case, your model is worse than what currently exists.
12
00:00:57,880 --> 00:00:59,960
And furthermore, it's also slower.
13
00:01:00,580 --> 00:01:03,210
So how can we make sure that we don't make this mistake?
14
00:01:07,990 --> 00:01:10,100
The answer is using a baseline.
15
00:01:10,720 --> 00:01:15,310
You'll notice that when you're reading machine learning papers, they often compare their results against
16
00:01:15,310 --> 00:01:16,810
the existing state of the art.
17
00:01:17,500 --> 00:01:22,060
Now, it's important to realize that you don't have to constantly try to beat the state of the art.
18
00:01:22,600 --> 00:01:27,400
In fact, that's kind of antithetical to science because all you're doing is chasing numbers.
19
00:01:27,910 --> 00:01:33,640
A good real-life example of that is when students study very hard to memorize their notes so that
20
00:01:33,640 --> 00:01:39,580
they can ace their exam without truly understanding the material or how it's applied. When you're chasing
21
00:01:39,580 --> 00:01:40,430
only numbers,
22
00:01:40,720 --> 00:01:44,560
sometimes it's easy to forget why those numbers matter in the first place.
23
00:01:45,190 --> 00:01:46,790
Anyway, to get back to the point.
24
00:01:47,020 --> 00:01:50,090
It's not always true that you want to compare to the state of the art.
25
00:01:50,710 --> 00:01:55,600
Sometimes if you just want to test whether or not something is working, for example, as a proof of
26
00:01:55,600 --> 00:01:59,110
concept, then the simplest model possible should suffice.
27
00:02:03,880 --> 00:02:10,450
In time series analysis, the simplest possible model happens to be the naive forecast. What is the
28
00:02:10,450 --> 00:02:11,560
naive forecast?
29
00:02:11,980 --> 00:02:17,530
Well, it's just another name for something we've already discussed that is to copy the previous known
30
00:02:17,530 --> 00:02:19,120
value forward in time.
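As a minimal sketch of that idea (assuming a one-dimensional NumPy series; the function name is my own, not from the lecture), the naive forecast just repeats the last known value over the forecast horizon:

```python
import numpy as np

def naive_forecast(series, horizon):
    """Forecast by copying the last observed value forward in time."""
    return np.full(horizon, series[-1])

y = np.array([3.0, 5.0, 4.0, 6.0])
print(naive_forecast(y, 3))  # [6. 6. 6.]
```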
31
00:02:19,960 --> 00:02:26,230
One interesting phenomenon that happens in time series analysis is that a lot of bad models end up
32
00:02:26,230 --> 00:02:27,970
looking like naive forecasts.
33
00:02:28,570 --> 00:02:33,600
When you look at the model predictions from afar, they seem to be pretty close to the true values.
34
00:02:34,030 --> 00:02:39,340
But when you zoom in, you can see that they seem pretty close only because they're just copying the
35
00:02:39,340 --> 00:02:41,290
previous value or close to it.
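One quick way to spot that failure mode (a diagnostic sketch of my own, not from the lecture; `looks_like_naive` and its tolerance are hypothetical) is to check how close the model's predictions are to a one-step-shifted copy of the true series:

```python
import numpy as np

def looks_like_naive(y_true, y_pred, tol=0.05):
    """Flag predictions that mostly just echo the previous true value."""
    lagged = y_true[:-1]  # the naive forecast: previous value, shifted forward
    err_vs_naive = np.mean(np.abs(y_pred[1:] - lagged))
    scale = np.mean(np.abs(np.diff(y_true)))  # typical step size of the series
    return err_vs_naive < tol * scale

y = np.array([1.0, 2.0, 1.5, 2.5, 2.0])
pred = np.array([0.0, 1.0, 2.0, 1.5, 2.5])  # exactly the lagged series
print(looks_like_naive(y, pred))  # True
```

If this flag fires, the model's apparent accuracy may just be the lag-one copy effect described above.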
36
00:02:46,190 --> 00:02:53,030
A really bad situation is this: suppose that you fit a model and you say, aha, I beat the naive forecast
37
00:02:53,030 --> 00:02:56,300
because my accuracy is better than the naive forecast's.
38
00:02:56,870 --> 00:03:02,330
But of course, you have to ask, are you talking about the accuracy on the in sample data or the out of
39
00:03:02,330 --> 00:03:03,060
sample data?
40
00:03:03,740 --> 00:03:08,640
Note that in machine learning we often refer to these as the training data and the test data.
41
00:03:09,200 --> 00:03:11,240
So I'll use these terms interchangeably.
42
00:03:11,250 --> 00:03:12,140
Just be aware.
43
00:03:13,430 --> 00:03:19,280
Well, one common mistake is that people believe because they got good accuracy on their in sample data,
44
00:03:19,460 --> 00:03:21,770
that the same will be true for their out of sample data.
45
00:03:22,190 --> 00:03:25,570
And also, quite commonly, it turns out that the opposite is true.
46
00:03:26,210 --> 00:03:31,790
You might beat the naive forecast on the in sample data, but on the out of sample data, the naive forecast
47
00:03:31,790 --> 00:03:32,480
beats you.
48
00:03:33,080 --> 00:03:34,100
Why does this happen?
49
00:03:36,080 --> 00:03:41,450
It happens when your model overfits to the noise in the training data, but doesn't actually generalize
50
00:03:41,450 --> 00:03:45,510
to the true underlying pattern in the time series, if one exists.
51
00:03:46,070 --> 00:03:51,360
So that's why it's really important to compare your model to a baseline such as the naive forecast.
52
00:03:51,890 --> 00:03:58,440
It's not good enough to say I got an 80 percent classification rate on my training set and 75 percent on my test
53
00:03:58,450 --> 00:03:58,750
set.
54
00:03:59,150 --> 00:04:04,260
If some other method achieves 70 percent on the test, then your model isn't as good.
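A minimal sketch of that check (the numbers here are hypothetical, and I use mean absolute error for a forecasting task rather than classification accuracy): compute your model's error and the naive forecast's error on the same test set, and only claim success if you beat the baseline.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true values and forecasts."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# Hypothetical hold-out (test) data and forecasts.
y_test = [10.0, 11.0, 12.0, 11.0]
model_pred = [10.5, 10.0, 13.0, 12.5]  # your model's forecasts
naive_pred = [9.0] * len(y_test)       # last training value, copied forward

model_err = mae(y_test, model_pred)    # 1.0
naive_err = mae(y_test, naive_pred)    # 2.0

# The model only "counts" if it beats the baseline on the TEST set.
print("beats naive baseline" if model_err < naive_err else "worse than naive")
# → beats naive baseline
```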
55
00:04:09,370 --> 00:04:14,080
There's one application I've seen a lot, which I think is a very interesting consequence of the times
56
00:04:14,080 --> 00:04:16,140
that we live in today.
57
00:04:16,180 --> 00:04:18,210
There are marketers everywhere on the Internet.
58
00:04:18,640 --> 00:04:22,170
The name of the game is SEO: search engine optimization.
59
00:04:23,110 --> 00:04:27,470
Everyone is trying to get clicks for their blog or to get people to sign up for their course or whatever.
60
00:04:28,000 --> 00:04:33,640
And of course, one of the obvious topics that many beginners care about is stock price prediction.
61
00:04:34,300 --> 00:04:36,340
Now, if you're taking this course, then you know better.
62
00:04:36,340 --> 00:04:37,960
But let's continue the story.
63
00:04:38,680 --> 00:04:40,080
So imagine what happens.
64
00:04:40,480 --> 00:04:45,490
You take one of the most popular topics that would appeal to someone who doesn't know about finance
65
00:04:45,490 --> 00:04:46,920
like stock predictions.
66
00:04:47,320 --> 00:04:52,090
You take one of the most popular machine learning algorithms that would appeal to someone who may not
67
00:04:52,090 --> 00:04:56,290
know a lot about machine learning, but has heard many buzzwords like LSTM.
68
00:04:56,980 --> 00:05:00,310
LSTM has been a very popular model for sequence modeling.
69
00:05:00,320 --> 00:05:02,390
And then what do you do?
70
00:05:02,800 --> 00:05:07,890
Well, you combine these two: you make a blog post on stock predictions with LSTMs.
71
00:05:08,350 --> 00:05:10,420
In fact, many people have done so.
72
00:05:12,800 --> 00:05:15,060
It's obviously a very appealing idea.
73
00:05:15,590 --> 00:05:20,430
No one would blame you for clicking on an article or buying a course that claims to be able to do this.
74
00:05:20,930 --> 00:05:26,060
I won't name any names, but there are some courses that make the very mistake I'm talking about in
75
00:05:26,060 --> 00:05:26,820
this lecture.
76
00:05:27,410 --> 00:05:30,710
Maybe you're watching this and you've seen something like this yourself.
77
00:05:31,580 --> 00:05:34,520
There is another even worse mistake that these marketers make.
78
00:05:34,700 --> 00:05:39,050
But we'll discuss that later when we talk about forecasting in general for time series
79
00:05:39,050 --> 00:05:44,780
models like ARIMA. If you want, you're encouraged to skip over to the forecasting lecture so we can
80
00:05:44,780 --> 00:05:46,040
continue this discussion.
81
00:05:51,160 --> 00:05:55,540
To get back to the naive forecast, let's recall what we learned about random walks.
82
00:05:56,170 --> 00:06:02,320
As you recall, a random walk is where on every step of a time series I flip a coin or pick a number
83
00:06:02,320 --> 00:06:03,370
from a distribution.
84
00:06:03,700 --> 00:06:08,440
And that number is added to my current position in order to go to the next position.
85
00:06:09,160 --> 00:06:15,550
If my noise distribution is a zero-centered Gaussian with variance sigma squared, which is not unreasonable,
86
00:06:16,000 --> 00:06:19,390
then the best forecast is the naive forecast.
87
00:06:19,960 --> 00:06:23,030
I can do no better than predicting the last known value.
88
00:06:23,830 --> 00:06:29,410
Another way to think about this is if you build a model that you think is good but it cannot beat the
89
00:06:29,410 --> 00:06:35,480
naive forecast, then it might suggest that your model is actually worse than a random walk model.
90
00:06:36,100 --> 00:06:40,570
In other words, a random walk model describes the data better than your model.
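To make that concrete, here is a small simulation sketch (the seed, length, and the running-mean competitor are my own assumptions): on a Gaussian random walk, the one-step-ahead naive forecast beats forecasting with, say, the running mean of the history.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0
steps = rng.normal(0.0, sigma, size=5000)
walk = np.cumsum(steps)  # random walk: each step adds Gaussian noise

# One-step-ahead forecasts over the whole series.
naive = walk[:-1]  # predict the last known value
running_mean = np.cumsum(walk)[:-1] / np.arange(1, len(walk))  # mean of history
actual = walk[1:]

mse_naive = np.mean((actual - naive) ** 2)        # approaches sigma^2
mse_mean = np.mean((actual - running_mean) ** 2)  # much larger
print(mse_naive < mse_mean)  # True
```

The naive forecast's error converges to the step variance sigma squared, which is the theoretical floor for this process; any forecaster that drifts away from the last known value does worse.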