1
00:00:11,190 --> 00:00:16,260
So in this lecture, we'll be investigating something that came up earlier, which is why the fine-tuned
2
00:00:16,270 --> 00:00:18,570
model outputs only generic label names.
3
00:00:19,200 --> 00:00:24,450
As you recall from the previous lecture, we solved this in kind of a hacky way, which was to modify
4
00:00:24,450 --> 00:00:26,880
the config file after calling the save method.
5
00:00:27,510 --> 00:00:30,450
If you're a programmer, this might make you recoil in horror.
6
00:00:31,050 --> 00:00:33,030
Luckily, there is a slightly better way.
7
00:00:33,690 --> 00:00:37,200
Unfortunately, what you would hope existed doesn't actually exist.
8
00:00:37,770 --> 00:00:43,440
Specifically, it would be nice if you could just pass specific label names into the from_pretrained
9
00:00:43,440 --> 00:00:46,500
method, just as you can specify the number of labels.
10
00:00:46,980 --> 00:00:50,040
This would be ideal and in my opinion makes the most sense.
11
00:00:50,490 --> 00:00:52,830
But unfortunately, this currently isn't possible.
12
00:00:53,550 --> 00:00:58,950
However, there is a way to achieve a similar effect, which is what we'll look at in this lecture.
13
00:00:58,950 --> 00:00:59,500
In particular,
14
00:00:59,520 --> 00:01:01,770
Hugging Face also has config objects.
15
00:01:02,310 --> 00:01:05,280
We'll pass this config object into the from_pretrained method.
16
00:01:05,610 --> 00:01:08,040
So it pretty much works like our ideal scenario.
17
00:01:09,540 --> 00:01:13,080
Note that these config objects are model-specific, like tokenizers.
18
00:01:13,470 --> 00:01:16,980
So you can have a BERT config, a GPT-2 config, and so forth.
19
00:01:17,730 --> 00:01:23,190
As you might expect, there is also an AutoConfig, which automatically chooses the right config object
20
00:01:23,190 --> 00:01:24,900
based on the checkpoint you give it.
21
00:01:25,560 --> 00:01:28,200
Just as we have auto tokenizers and auto models,
22
00:01:28,530 --> 00:01:30,330
we also have auto configs.
23
00:01:31,990 --> 00:01:35,650
Now please note that most of this notebook is the same as the previous one.
24
00:01:35,950 --> 00:01:37,630
So we'll skip to the relevant parts.
25
00:01:49,510 --> 00:01:54,040
So we'll begin by importing AutoConfig along with the auto model and the Trainer class.
26
00:01:54,970 --> 00:01:59,620
So recall that earlier in this notebook we loaded the datasets, converted them into the correct
27
00:01:59,620 --> 00:02:00,910
format and so forth.
28
00:02:17,140 --> 00:02:22,090
So the next step is to load a config by calling from_pretrained, passing in our checkpoint.
29
00:02:26,810 --> 00:02:30,110
The next step is to print out our config just to see what it looks like.
30
00:02:34,480 --> 00:02:38,110
So as you can see, it's sort of like a dictionary with keys and values.
31
00:02:38,620 --> 00:02:41,680
Importantly, notice how there's nothing here about label names.
32
00:02:47,170 --> 00:02:51,880
Now, if you check the attributes of the config object, you'll see that there are two relevant attributes
33
00:02:51,880 --> 00:02:55,330
corresponding to labels: id2label and label2id.
34
00:02:56,260 --> 00:02:57,880
So let's look at id2label.
35
00:03:01,860 --> 00:03:07,480
As you can see, this is a dictionary mapping an integer label ID to the corresponding label name,
36
00:03:07,500 --> 00:03:08,760
as you may have expected.
37
00:03:12,360 --> 00:03:14,250
The next step is to look at label2id.
38
00:03:18,220 --> 00:03:22,030
So as you can see, we get the reverse mapping, which again, you may have expected.
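For reference, the two attributes have roughly this shape before they are overwritten. This is a plain-Python sketch, not output copied from the notebook; it assumes a three-class problem and the generic `LABEL_<n>` names that Hugging Face configs fall back to when no label names are given.

```python
# Assumption: three classes; the notebook's real label count may differ.
num_labels = 3

# Generic placeholder names of the form LABEL_<n>.
id2label = {i: f"LABEL_{i}" for i in range(num_labels)}

# label2id is simply the reverse mapping.
label2id = {name: i for i, name in id2label.items()}

print(id2label)
print(label2id)
```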
39
00:03:26,970 --> 00:03:33,120
So it should be evident that what we need to do is overwrite these id2label and label2id attributes.
40
00:03:33,690 --> 00:03:35,760
Now the API for this isn't too great.
41
00:03:36,360 --> 00:03:41,010
In my opinion, there should be a function for doing this so that you don't have to manually overwrite
42
00:03:41,010 --> 00:03:42,030
attributes yourself.
43
00:03:42,600 --> 00:03:45,990
For example, as things stand, you could just assign gibberish and it would break your config.
44
00:03:46,560 --> 00:03:50,370
But since no such method exists, we'll just stick with what we can get.
45
00:03:51,770 --> 00:03:56,600
So you can see here that we're basically assigning these from the target map from earlier in this notebook.
46
00:03:57,110 --> 00:04:02,540
As you recall, the target map had our desired label names mapped to corresponding integer IDs.
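A minimal sketch of that assignment, with the sanity check the API does not perform for you. The contents of `target_map` and the helper name `set_label_maps` are assumptions for illustration, and a `SimpleNamespace` stands in for the real config object so the snippet runs without downloading anything.

```python
from types import SimpleNamespace

# Assumed contents; the notebook's target map may use other names or IDs.
target_map = {"positive": 1, "negative": 0, "neutral": 2}


def set_label_maps(config, target_map):
    """Hypothetical helper: overwrite both label attributes consistently."""
    ids = sorted(target_map.values())
    # Guard against gibberish: IDs must be exactly 0..n-1 with no repeats.
    if ids != list(range(len(target_map))):
        raise ValueError(f"label IDs must be 0..{len(target_map) - 1}, got {ids}")
    config.label2id = dict(target_map)
    config.id2label = {i: name for name, i in target_map.items()}
    return config


# Stand-in for the object returned by AutoConfig.from_pretrained(...).
config = set_label_maps(SimpleNamespace(), target_map)
print(config.id2label)
```

On a real config object the same two assignments apply; only the stand-in changes.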
47
00:04:08,530 --> 00:04:12,940
The next step is to call from_pretrained with the auto model to get back a model object.
48
00:04:14,050 --> 00:04:18,910
The difference between what we did before and what we are doing now is that we are now passing in the
49
00:04:18,910 --> 00:04:20,709
config object we just looked at.
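Put together, the steps so far might look like the sketch below. `AutoConfig` and `AutoModelForSequenceClassification` are real `transformers` classes, but the lecture only says "the auto model", so the exact model class is an assumption, as are the function name `build_classifier` and whatever checkpoint and target map you pass in. The import sits inside the function so that nothing is downloaded until you call it.

```python
def build_classifier(checkpoint, target_map):
    """Load a classification model whose config carries real label names."""
    # Imported lazily: needs the transformers package and network access.
    from transformers import AutoConfig, AutoModelForSequenceClassification

    # Load the checkpoint's config, then overwrite the two label attributes.
    config = AutoConfig.from_pretrained(checkpoint)
    config.id2label = {i: name for name, i in target_map.items()}
    config.label2id = dict(target_map)

    # Pass the modified config in, instead of editing the saved file later.
    return AutoModelForSequenceClassification.from_pretrained(
        checkpoint, config=config
    )
```

Called with the notebook's checkpoint and target map, this should return a model whose saved config already contains the label names.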
50
00:04:31,920 --> 00:04:32,250
Okay.
51
00:04:32,250 --> 00:04:37,200
So essentially all of the remaining steps are the same as the previous notebook, so I won't bother
52
00:04:37,200 --> 00:04:38,340
to explain them again.
53
00:05:07,420 --> 00:05:07,750
Okay.
54
00:05:07,750 --> 00:05:13,210
So at this point we've fine-tuned our model, saved it, and loaded it back in as a pipeline object.
55
00:05:13,810 --> 00:05:17,380
At this point, we can just pass in some strings and get back predictions.
56
00:05:22,490 --> 00:05:22,810
Okay.
57
00:05:22,820 --> 00:05:24,650
So our first input is "JetBlue,
58
00:05:24,650 --> 00:05:25,250
thank you."
59
00:05:26,030 --> 00:05:31,460
Predictably, the prediction is positive, and the label shows up as the string "positive" instead of something
60
00:05:31,460 --> 00:05:32,480
like "LABEL_1".
61
00:05:32,810 --> 00:05:35,570
So passing in the config object was a success.
62
00:05:35,990 --> 00:05:39,110
We no longer have to manually modify the config file.