0
00:00:00,940 --> 00:00:02,470
[Autogenerated] So here's where we're at.
1
00:00:02,470 --> 00:00:04,230
We know that we need to split our data
2
00:00:04,230 --> 00:00:07,120
into training, validation, and test data.
3
00:00:07,120 --> 00:00:09,039
We'll produce N candidate models for
4
00:00:09,039 --> 00:00:11,220
running N training and validation
5
00:00:11,220 --> 00:00:13,970
processes, but just one test process to
6
00:00:13,970 --> 00:00:15,779
evaluate the final model that we have
7
00:00:15,779 --> 00:00:19,039
chosen. This is referred to as singular
8
00:00:19,039 --> 00:00:22,030
cross-validation: you split the original
9
00:00:22,030 --> 00:00:25,309
data into training, test, and a single
10
00:00:25,309 --> 00:00:28,629
validation set. Let's visually see how we
11
00:00:28,629 --> 00:00:31,190
use these three subsets of our data to
12
00:00:31,190 --> 00:00:33,600
get the best possible model. We train
13
00:00:33,600 --> 00:00:35,500
the different candidate models on the
14
00:00:35,500 --> 00:00:37,390
training data and evaluate them on the
15
00:00:37,390 --> 00:00:39,969
validation data. This process is called
16
00:00:39,969 --> 00:00:42,700
hyperparameter tuning. Each candidate
17
00:00:42,700 --> 00:00:44,840
model will have different design
18
00:00:44,840 --> 00:00:46,840
parameters. You're trying to figure out
19
00:00:46,840 --> 00:00:49,340
which design of your model works well for
20
00:00:49,340 --> 00:00:52,179
your data. And finally, after you've used
21
00:00:52,179 --> 00:00:54,520
hyperparameter tuning to find the best
22
00:00:54,520 --> 00:00:57,369
design for your model, you'll do a final
23
00:00:57,369 --> 00:00:59,820
evaluation on the test data, so you know
24
00:00:59,820 --> 00:01:02,479
this is how your model performs. The use
25
00:01:02,479 --> 00:01:05,040
of a holdout validation set is a huge
26
00:01:05,040 --> 00:01:06,790
improvement over what we were doing
27
00:01:06,790 --> 00:01:09,209
earlier. However, there is still a problem.
28
00:01:09,209 --> 00:01:11,030
The model's performance on the validation
29
00:01:11,030 --> 00:01:14,030
set gets incorporated into the model
30
00:01:14,030 --> 00:01:17,890
itself, and this may introduce bias. So
31
00:01:17,890 --> 00:01:21,180
the validation set data becomes part of the
32
00:01:21,180 --> 00:01:23,939
model's design, and that's not good.
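The singular (holdout) workflow described so far can be sketched in code. This is a minimal illustration, assuming scikit-learn is available; the synthetic data and logistic-regression candidates are purely hypothetical stand-ins for your own candidate models:

```python
# Sketch of singular (holdout) cross-validation, assuming scikit-learn;
# the data and candidate models here are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# One split into training, validation, and test data.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)

# N candidate models: here, the same estimator with different design
# parameters (the regularization strength C).
candidates = {c: LogisticRegression(C=c) for c in (0.01, 0.1, 1.0, 10.0)}

# Hyperparameter tuning: train on training data, score on validation data.
val_scores = {}
for c, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[c] = model.score(X_val, y_val)
best_c = max(val_scores, key=val_scores.get)

# One final evaluation of the chosen model on the test data.
test_score = candidates[best_c].score(X_test, y_test)
print(best_c, round(test_score, 3))
```

Note that the same validation set scores every candidate, which is exactly the source of the bias being discussed.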
33
00:01:23,939 --> 00:01:25,819
What we're trying to get is a model that
34
00:01:25,819 --> 00:01:28,760
is as robust as we can make it, which is
35
00:01:28,760 --> 00:01:31,540
why an alternative to using singular cross
36
00:01:31,540 --> 00:01:34,840
validation is k-fold cross-validation.
37
00:01:34,840 --> 00:01:36,719
Here, you don't have a single set of
38
00:01:36,719 --> 00:01:39,180
validation data. To generate each candidate
39
00:01:39,180 --> 00:01:41,200
model, you repeatedly train and
40
00:01:41,200 --> 00:01:43,250
validate using different subsets of
41
00:01:43,250 --> 00:01:46,189
training data. Now, this might not seem
42
00:01:46,189 --> 00:01:48,239
intuitive to you at first, but we'll see
43
00:01:48,239 --> 00:01:49,810
it visually and you'll understand what's
44
00:01:49,810 --> 00:01:52,510
going on. K-fold cross-validation
45
00:01:52,510 --> 00:01:54,299
tends to be very computationally
46
00:01:54,299 --> 00:01:57,620
intensive but very robust. It does not
47
00:01:57,620 --> 00:02:00,019
waste data. All of the data is used
48
00:02:00,019 --> 00:02:03,109
well to generate a good model. Let's
49
00:02:03,109 --> 00:02:05,140
visually understand how k fold cross
50
00:02:05,140 --> 00:02:07,599
validation works. You have all of the data
51
00:02:07,599 --> 00:02:09,120
available to you in the real world. You
52
00:02:09,120 --> 00:02:12,020
split it into training data and test
53
00:02:12,020 --> 00:02:14,349
data. Test data is what you use to
54
00:02:14,349 --> 00:02:16,780
perform a final evaluation on the model.
55
00:02:16,780 --> 00:02:19,159
Now, instead of using the same validation
56
00:02:19,159 --> 00:02:21,349
data to evaluate different candidate
57
00:02:21,349 --> 00:02:23,560
models, you'll split your training
58
00:02:23,560 --> 00:02:25,960
data into different folds. Here I have
59
00:02:25,960 --> 00:02:28,419
five folds. This is five-fold cross
60
00:02:28,419 --> 00:02:31,219
validation. With five-fold cross-validation,
61
00:02:31,219 --> 00:02:33,830
for each candidate model, you'll train
62
00:02:33,830 --> 00:02:37,240
your model five times. The first time, folds
63
00:02:37,240 --> 00:02:40,039
2, 3, 4, and 5 will be the training data.
64
00:02:40,039 --> 00:02:42,740
Fold 1 will be the validation data.
65
00:02:42,740 --> 00:02:44,860
You'll then train the same candidate
66
00:02:44,860 --> 00:02:47,569
model with a different subset of training
67
00:02:47,569 --> 00:02:50,819
data. Folds 1, 3, 4, and 5 comprise the
68
00:02:50,819 --> 00:02:53,110
training data; fold 2 is the validation
69
00:02:53,110 --> 00:02:56,139
data. You'll then do a third round of training
70
00:02:56,139 --> 00:02:58,240
for the same candidate model. This time,
71
00:02:58,240 --> 00:03:00,840
fold 3 is the validation data and the
72
00:03:00,840 --> 00:03:03,270
remaining folds make up your training data,
73
00:03:03,270 --> 00:03:05,460
and you'll continue this for split four
74
00:03:05,460 --> 00:03:08,629
and split five as well. So when you use
75
00:03:08,629 --> 00:03:11,219
five-fold cross-validation for a single
76
00:03:11,219 --> 00:03:13,969
candidate model, you've run five training
77
00:03:13,969 --> 00:03:17,439
processes and five validation processes.
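The five train/validation splits just described can be generated mechanically. Here's a small sketch, assuming scikit-learn's `KFold` (the toy array stands in for your training data, with the test data already held out):

```python
# A minimal sketch of producing five folds, assuming scikit-learn.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy training data (test data held out)

kf = KFold(n_splits=5)
for i, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # In split i, fold i is the validation data; the remaining
    # four folds make up the training data.
    print(f"split {i}: train rows {train_idx}, validate rows {val_idx}")
```

In the first split, fold 1 is the validation data and folds 2 through 5 are the training data, matching the walkthrough above.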
78
00:03:17,439 --> 00:03:19,800
Training and validation are run on each
79
00:03:19,800 --> 00:03:23,030
fold of your training data. Once you run
80
00:03:23,030 --> 00:03:25,060
these five different training and
81
00:03:25,060 --> 00:03:27,509
validation processes, you average the
82
00:03:27,509 --> 00:03:29,699
performance of this candidate model
83
00:03:29,699 --> 00:03:32,639
across all folds, so you'll get one
84
00:03:32,639 --> 00:03:35,680
average score. And for this particular
85
00:03:35,680 --> 00:03:38,020
candidate model, this average performance
86
00:03:38,020 --> 00:03:40,650
score is what you'll use to find the best
87
00:03:40,650 --> 00:03:43,250
candidate model: which candidate model has
88
00:03:43,250 --> 00:03:45,759
the best average performance score across
89
00:03:45,759 --> 00:03:48,259
all folds of training and validation?
90
00:03:48,259 --> 00:03:50,110
Once you've trained all of your candidate
91
00:03:50,110 --> 00:03:52,610
models on all of these folds and averaged
92
00:03:52,610 --> 00:03:54,740
their performance scores, you'll take the
93
00:03:54,740 --> 00:03:57,569
best one that you found and evaluate it on the
94
00:03:57,569 --> 00:04:00,990
test data. Thus, with k-fold cross
95
00:04:00,990 --> 00:04:03,520
validation, since the validation data
96
00:04:03,520 --> 00:04:06,770
changes in each fold of training, it's
97
00:04:06,770 --> 00:04:08,840
impossible for the information in the
98
00:04:08,840 --> 00:04:13,000
validation data to become incorporated as part of the model.
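The full selection procedure described in this clip can be sketched end to end. This is an illustration assuming scikit-learn, with hypothetical candidates and synthetic data; `cross_val_score` runs the five training and five validation processes per candidate and returns the per-fold scores to average:

```python
# Sketch of k-fold model selection, assuming scikit-learn; the data
# and candidate models are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# First split off the test data for the one final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# For each candidate, run five training and five validation processes,
# then average the performance score across all folds.
avg_scores = {}
for c in (0.01, 0.1, 1.0, 10.0):
    scores = cross_val_score(LogisticRegression(C=c), X_train, y_train, cv=5)
    avg_scores[c] = scores.mean()

# The best average score picks the candidate; the chosen model is then
# retrained on all training data and evaluated once on the test data.
best_c = max(avg_scores, key=avg_scores.get)
final_model = LogisticRegression(C=best_c).fit(X_train, y_train)
print(best_c, round(final_model.score(X_test, y_test), 3))
```

Because each candidate is validated on five different folds, no single validation set can leak into the model's design.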