1
00:00:11,090 --> 00:00:16,520
So in this lecture, we'll be looking at how to do fine-tuning for text classification when we have
2
00:00:16,520 --> 00:00:18,140
more than one input sentence.
3
00:00:18,800 --> 00:00:22,400
We'll begin by installing transformers and datasets, as we normally do.
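In a notebook, that step might look like this minimal sketch (package versions unpinned):

```python
!pip install transformers datasets
```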
4
00:00:32,549 --> 00:00:35,850
The next step is to import load_dataset as well as numpy.
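A sketch of those imports, assuming the load_dataset function from the datasets library:

```python
from datasets import load_dataset
import numpy as np
```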
5
00:00:42,390 --> 00:00:46,200
The next step is to load in our dataset, which is the RTE dataset.
6
00:00:46,680 --> 00:00:48,240
This is part of the GLUE benchmark,
7
00:00:48,270 --> 00:00:51,060
like the earlier example on sentiment analysis.
8
00:00:51,720 --> 00:00:56,820
Note that I've pasted a description of the dataset here which explains that the data has been processed
9
00:00:56,820 --> 00:00:58,650
for binary classification.
10
00:00:59,370 --> 00:01:04,590
Other options are possible, for example, having three classes where the third class is neutral, meaning
11
00:01:04,590 --> 00:01:07,560
that the two sentences neither entail nor contradict each other.
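A minimal sketch of the loading step; RTE is one of the GLUE tasks, and the variable name raw_datasets is my assumption:

```python
# RTE (Recognizing Textual Entailment) lives under the "glue" configuration
raw_datasets = load_dataset("glue", "rte")
```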
12
00:01:16,740 --> 00:01:19,020
The next step is to print out the dataset object
13
00:01:19,020 --> 00:01:21,510
we got back to see what it contains.
14
00:01:26,180 --> 00:01:29,570
So as you can see, we have train, validation, and test sets.
15
00:01:30,110 --> 00:01:34,790
Each dataset object contains two sentence columns along with a label and an index.
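A sketch of the inspection (reusing the assumed raw_datasets name); the comment summarizes what the lecture describes:

```python
raw_datasets
# Expect a DatasetDict with 'train', 'validation', and 'test' splits,
# each holding 'sentence1', 'sentence2', 'label', and 'idx' columns
```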
16
00:01:40,030 --> 00:01:42,520
The next step is to check out the features attribute.
17
00:01:46,430 --> 00:01:51,920
So as you can see, this tells us that we have two classes, where the first class is entailment and the
18
00:01:51,920 --> 00:01:53,510
second is not entailment.
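A sketch of that check:

```python
raw_datasets["train"].features
# 'label' should be a ClassLabel with names ['entailment', 'not_entailment']
```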
19
00:01:58,040 --> 00:02:02,370
The next step is to simply print out some sentences just to see what we're dealing with.
20
00:02:05,630 --> 00:02:07,910
So feel free to check these out, if you like.
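For example, peeking at the first training example might look like:

```python
print(raw_datasets["train"][0]["sentence1"])
print(raw_datasets["train"][0]["sentence2"])
print(raw_datasets["train"][0]["label"])
```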
21
00:02:13,770 --> 00:02:15,780
The next step is to define our checkpoint.
22
00:02:16,410 --> 00:02:19,320
As mentioned, we'll be using DistilBERT for this notebook.
23
00:02:19,770 --> 00:02:22,830
But feel free to try BERT as well as an exercise.
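A sketch; the exact checkpoint names here are assumptions (the uncased base variants):

```python
checkpoint = "distilbert-base-uncased"
# For the exercise, swap in a BERT checkpoint, e.g. "bert-base-uncased"
```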
24
00:02:29,860 --> 00:02:35,230
The next step is to import everything we need from the Transformers library, including the tokenizer,
25
00:02:35,230 --> 00:02:37,870
model, trainer, and training arguments.
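Assuming the Auto* classes, the imports might look like:

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
```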
26
00:02:45,680 --> 00:02:48,740
The next step is to load up our tokenizer from the checkpoint.
27
00:02:54,980 --> 00:02:59,450
The next step is to test the tokenizer on the first pair of sentences in our dataset.
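A sketch of loading and testing the tokenizer (variable names are assumptions):

```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Passing two texts tokenizes them jointly as a sentence pair
result = tokenizer(
    raw_datasets["train"][0]["sentence1"],
    raw_datasets["train"][0]["sentence2"],
)
result.keys()
# For DistilBERT: dict_keys(['input_ids', 'attention_mask']) -- no token_type_ids
```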
28
00:03:07,370 --> 00:03:09,830
The next step is to check the keys of the dictionary
29
00:03:09,830 --> 00:03:10,700
we got back.
30
00:03:17,000 --> 00:03:20,780
So as you can see, the key for token_type_ids is not present.
31
00:03:21,500 --> 00:03:26,930
Recall that one exercise for this notebook is to use BERT instead and to check out the format of the
32
00:03:26,930 --> 00:03:27,500
token type IDs.
34
00:03:33,610 --> 00:03:36,280
The next step is to decode the input IDs.
35
00:03:40,760 --> 00:03:46,790
So as you can see, our input text is effectively the two sentences concatenated into a single string.
36
00:03:47,450 --> 00:03:50,240
We use the [SEP] token to separate the two sentences.
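A sketch of the decoding step:

```python
tokenizer.decode(result["input_ids"])
# Roughly: '[CLS] <sentence one> [SEP] <sentence two> [SEP]'
```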
37
00:03:56,610 --> 00:03:59,970
The next step is to load up our pre-trained model from the checkpoint.
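A sketch of the model loading step:

```python
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2
)
```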
38
00:04:10,880 --> 00:04:13,850
The next step is to create the training arguments object.
39
00:04:14,510 --> 00:04:17,660
Note that there is one new argument here, which is logging_steps.
40
00:04:18,290 --> 00:04:22,130
The reason we need this is that this dataset doesn't have a lot of samples.
41
00:04:22,580 --> 00:04:28,370
However, the default value for logging_steps is very large, resulting in no logging occurring.
42
00:04:29,240 --> 00:04:34,610
The result is that without setting this argument, we would see "No log" appearing under the training loss
43
00:04:34,940 --> 00:04:38,600
when we train our model, instead of the loss, which is nice to know.
44
00:04:39,230 --> 00:04:42,230
So try commenting out this argument to see that for yourself.
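A sketch of the training arguments; the output directory, epoch count, and logging_steps value are assumptions chosen to match the lecture's description:

```python
training_args = TrainingArguments(
    output_dir="training_dir",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=5,
    # The default (500) exceeds the number of steps per epoch on this
    # small dataset, so the training loss would show up as "No log"
    logging_steps=150,
)
```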
45
00:04:49,880 --> 00:04:52,520
The next step is to import the load_metric function.
46
00:04:59,490 --> 00:05:02,370
The next step is to get the metric for our current task.
47
00:05:08,730 --> 00:05:12,150
The next step is to test our metric just to see what we will get back.
48
00:05:15,930 --> 00:05:19,320
So unfortunately we only get accuracy, which is kind of boring.
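A sketch of those three steps; the sample predictions and references are made up just to show the output format:

```python
from datasets import load_metric

metric = load_metric("glue", "rte")
metric.compute(predictions=[1, 0, 1], references=[1, 0, 0])
# {'accuracy': 0.666...} -- RTE's built-in metric is accuracy only
```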
49
00:05:24,000 --> 00:05:29,700
Instead we're going to include the F1 score, so we're going to import f1_score from scikit-learn.
50
00:05:36,220 --> 00:05:38,950
The next step is to define our compute metrics function.
51
00:05:39,730 --> 00:05:42,250
We begin by splitting up the logits and the labels.
52
00:05:42,940 --> 00:05:47,320
The next step is to take the argmax of the logits in order to get the predictions.
53
00:05:47,890 --> 00:05:51,670
Once we have the predictions, we can then compute the accuracy and the F1.
54
00:05:52,450 --> 00:05:55,900
Finally, we return a dictionary containing our desired metrics.
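A sketch of the function as described:

```python
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # argmax over the class dimension gives the predicted class ids
    predictions = np.argmax(logits, axis=-1)
    acc = np.mean(predictions == labels)
    f1 = f1_score(labels, predictions)
    return {"accuracy": acc, "f1": f1}
```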
55
00:06:03,210 --> 00:06:07,080
The next step is to define our tokenizer function, as usual.
56
00:06:07,080 --> 00:06:10,560
The input to this function is a batch of data from our datasets.
57
00:06:11,130 --> 00:06:16,590
We can pass in both sentence1 and sentence2 as separate arguments, and also truncation=True.
58
00:06:23,750 --> 00:06:26,360
The next step is to create our tokenized datasets.
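A sketch of the tokenizer function and the mapping step (the function name is an assumption):

```python
def tokenize_fn(batch):
    # Passing both columns tokenizes each pair jointly, with [SEP] between them
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_fn, batched=True)
```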
59
00:06:34,490 --> 00:06:36,830
The next step is to create our trainer object.
60
00:06:45,040 --> 00:06:47,050
The next step is to call trainer.train().
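A sketch of constructing the trainer and kicking off training:

```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```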
61
00:06:59,980 --> 00:07:00,340
Okay.
62
00:07:00,340 --> 00:07:05,680
So notice that five epochs is probably too much, since the validation loss seems to be creeping up.
63
00:07:12,580 --> 00:07:16,930
So as a final exercise for this lecture, don't forget to try BERT with this notebook as well.
64
00:07:17,470 --> 00:07:20,800
Compare the performance of the two models as well as the training time.
65
00:07:21,460 --> 00:07:26,130
Consider: if this were a real-world project, which model would you prefer to use?
66
00:07:27,440 --> 00:07:28,700
As another exercise,
67
00:07:28,790 --> 00:07:31,640
please also compute the metrics on the test set.
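A sketch of that exercise; note the caveat in the comments:

```python
test_output = trainer.predict(tokenized_datasets["test"])
# trainer.predict returns predictions, label_ids, and metrics
print(test_output.metrics)
# Caveat: GLUE test labels are hidden (all -1) in this download, so these
# numbers may be meaningless; the validation split is the safer comparison
```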