1
00:00:10,970 --> 00:00:16,640
So in this lecture, we are going to look at another example of text classification, which is slightly
2
00:00:16,640 --> 00:00:17,600
more complicated.
3
00:00:18,440 --> 00:00:23,780
In this example, we are going to build a text classifier where the input consists of not just one but
4
00:00:23,780 --> 00:00:24,770
two sentences.
5
00:00:25,370 --> 00:00:30,380
This is very practical and we can think of examples where we might want to have a model that knows how
6
00:00:30,380 --> 00:00:33,680
to process multiple input sentences at the same time.
7
00:00:34,520 --> 00:00:38,550
One example is answering multiple-choice questions.
8
00:00:38,570 --> 00:00:43,850
We might want to pass in the question along with all the possible answers, and we want our model to
9
00:00:43,850 --> 00:00:45,140
select the right answer.
10
00:00:45,800 --> 00:00:47,750
Another example is with chat bots.
11
00:00:48,200 --> 00:00:53,630
With earlier chat bots, a simple way to train them was with single prompt and response pairs.
12
00:00:54,050 --> 00:00:58,940
So you would give the chat bot a prompt and the chat bot would memorise appropriate responses.
13
00:00:59,390 --> 00:01:04,010
But this doesn't take into account the history of the conversation, which might make the conversation
14
00:01:04,010 --> 00:01:04,550
awkward.
15
00:01:06,200 --> 00:01:11,960
Another example is question answering where we pass in a question along with a passage of text which
16
00:01:11,960 --> 00:01:13,160
contains the answer.
17
00:01:13,640 --> 00:01:18,380
We would then like the model to select the portion of the passage where the answer resides.
18
00:01:23,090 --> 00:01:28,310
Now you might be wondering how are we going to build a transformer that can handle multiple sentences
19
00:01:28,310 --> 00:01:29,120
as inputs?
20
00:01:29,900 --> 00:01:34,580
If you have experience with machine learning or neural networks, you might have some sense that this
21
00:01:34,580 --> 00:01:35,930
would be a difficult task.
22
00:01:36,560 --> 00:01:41,420
Normally, with transfer learning, we only have to change the head of the pre-trained network while
23
00:01:41,420 --> 00:01:43,730
keeping the inputs and the middle layers the same.
24
00:01:44,390 --> 00:01:46,550
But how can we change the number of inputs?
25
00:01:47,090 --> 00:01:48,710
The answer is we don't have to.
26
00:01:49,250 --> 00:01:55,130
It turns out that with the single input already in place, we can train the transformer to understand
27
00:01:55,130 --> 00:01:58,700
having multiple sentences concatenated into the same input.
28
00:01:59,300 --> 00:02:02,600
In fact, conceptually, this could work with RNNs as well.
29
00:02:03,110 --> 00:02:06,590
I'll leave it as an exercise for you to think about how that might work.
30
00:02:11,360 --> 00:02:16,430
You'll recall that BERT, which is the main model we are using in this section, is pre-trained
31
00:02:16,430 --> 00:02:18,650
on unsupervised NLP tasks.
32
00:02:19,280 --> 00:02:24,410
Well, it turns out that one of these tasks involves having multiple sentences in the same input.
33
00:02:24,860 --> 00:02:28,430
So it makes sense that BERT should be able to handle multiple sentences.
34
00:02:29,060 --> 00:02:32,480
In particular, the task is called next sentence prediction.
35
00:02:33,290 --> 00:02:38,870
To build the data for this task, we take two sentences from our training corpus and the label is whether
36
00:02:38,870 --> 00:02:41,330
or not the second sentence follows the first.
37
00:02:41,900 --> 00:02:44,390
In other words, we build a binary classifier.
38
00:02:46,590 --> 00:02:51,960
For BERT, we use special formatting to help the model understand where the first and second sentences
39
00:02:51,960 --> 00:02:52,710
are located.
40
00:02:53,340 --> 00:02:56,190
In particular, we always start with the CLS token.
41
00:02:56,670 --> 00:02:58,740
We then follow it with the first sentence.
42
00:02:59,190 --> 00:03:00,570
We then add the SEP token.
43
00:03:01,050 --> 00:03:03,030
We then follow that with the second sentence.
44
00:03:03,420 --> 00:03:06,180
And finally, we add another SEP token at the end.
45
00:03:06,810 --> 00:03:12,960
So hopefully you now understand the utility of these special tokens. When we only had one sentence
46
00:03:12,960 --> 00:03:17,910
as input, they may have seemed superfluous, but now they actually have practical utility.
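That layout can be pictured with a toy sketch. Note this is only an illustration: a whitespace split stands in for BERT's real subword tokenizer, and the function name is made up for this example.

```python
# Toy sketch of BERT's two-sentence input layout:
# [CLS] <first sentence> [SEP] <second sentence> [SEP]
# (Whitespace split stands in for real subword tokenization.)
def format_pair(sentence_a, sentence_b):
    return ["[CLS]"] + sentence_a.split() + ["[SEP]"] + sentence_b.split() + ["[SEP]"]

tokens = format_pair("Bob buys a car", "Bob owns a car")
print(tokens)
# ['[CLS]', 'Bob', 'buys', 'a', 'car', '[SEP]', 'Bob', 'owns', 'a', 'car', '[SEP]']
```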
47
00:03:22,560 --> 00:03:26,160
The task we'll be looking at next is known as textual entailment.
48
00:03:26,700 --> 00:03:29,550
This might sound complicated, but in fact it is quite simple.
49
00:03:30,090 --> 00:03:32,200
Thanks to my rule: "all data is the same."
50
00:03:32,430 --> 00:03:36,030
It is no different from the next sentence prediction task I just described.
51
00:03:36,660 --> 00:03:41,610
In particular, we're still going to have two input sentences, and the target will still be binary.
52
00:03:42,060 --> 00:03:44,460
The only difference is the meaning of the task.
53
00:03:45,120 --> 00:03:48,570
In this case, we want to know whether one sentence entails another.
54
00:03:49,020 --> 00:03:51,340
For example, consider the input sentence pair:
55
00:03:51,360 --> 00:03:54,240
"Bob buys a car" and "Bob owns a car."
56
00:03:54,840 --> 00:03:58,080
This is an example of where the first sentence entails the second.
57
00:03:58,710 --> 00:04:02,520
Now consider "Bob purchased cheese" and "Bob doesn't have cheese."
58
00:04:02,910 --> 00:04:05,430
This is an example of where there is no entailment.
59
00:04:10,210 --> 00:04:10,570
Okay.
60
00:04:10,570 --> 00:04:13,210
So now let's discuss what this will look like in code.
61
00:04:13,840 --> 00:04:16,810
If you think about it, we know that the model has to be the same.
62
00:04:17,320 --> 00:04:18,880
What changes are the inputs?
63
00:04:19,930 --> 00:04:23,170
The input is still text and the output is still a binary prediction.
64
00:04:23,740 --> 00:04:25,270
What changes is how we prepare the inputs.
65
00:04:25,810 --> 00:04:29,400
In other words, we should pay attention to the dataset and tokenizer.
66
00:04:30,280 --> 00:04:32,620
So the data set will look something like this.
67
00:04:33,310 --> 00:04:38,410
Recall that you can think of it like a tabular data set with different columns for different things,
68
00:04:38,420 --> 00:04:41,150
just like a CSV.
69
00:04:41,170 --> 00:04:43,330
Previously, we only had one input sentence and a label.
70
00:04:43,810 --> 00:04:47,810
This time we'll have two sentences and a label in our dataset.
71
00:04:47,830 --> 00:04:50,170
These are simply called sentence1 and sentence2.
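The rows just described can be pictured like this. These are hand-made toy rows (using the entailment examples from this lecture), not the actual dataset, and the label convention (1 = entailment, 0 = no entailment) is an assumption for illustration.

```python
# A toy stand-in for the tabular dataset: two input sentences per row
# plus a binary label (1 = entailment, 0 = no entailment).
dataset = [
    {"sentence1": "Bob buys a car.",       "sentence2": "Bob owns a car.",          "label": 1},
    {"sentence1": "Bob purchased cheese.", "sentence2": "Bob doesn't have cheese.", "label": 0},
]
print(dataset[0]["sentence1"], "->", dataset[0]["label"])
```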
72
00:04:55,010 --> 00:05:00,200
Luckily, the tokenizer from Hugging Face is well equipped to handle two sentences at once.
73
00:05:00,620 --> 00:05:03,320
You simply need to pass them in as separate arguments.
74
00:05:03,950 --> 00:05:09,170
This will conceptually convert the inputs into a string formatted as we described earlier, where we
75
00:05:09,170 --> 00:05:11,300
have the CLS token and the first sentence.
76
00:05:11,570 --> 00:05:12,410
The SEP token.
77
00:05:12,500 --> 00:05:13,370
The second sentence.
78
00:05:13,520 --> 00:05:14,660
And another SEP token.
79
00:05:15,380 --> 00:05:19,730
In reality, these, of course, will be converted into tokens and then token IDs.
80
00:05:21,730 --> 00:05:23,920
Now there's one important note to make at this point.
81
00:05:24,610 --> 00:05:29,620
You'll recall that for the BERT tokenizer, we generate this input called token type IDs.
82
00:05:30,250 --> 00:05:36,220
Having two input sentences will help us understand the role of this input. When we use BERT and
83
00:05:36,220 --> 00:05:42,100
we have two sentences as input, the BERT tokenizer will use zeros for the positions of the first sentence
84
00:05:42,490 --> 00:05:44,950
and ones for the positions of the second sentence.
85
00:05:45,550 --> 00:05:50,350
So now you know that this input actually has a purpose and isn't totally superfluous.
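The zeros-and-ones pattern can be sketched in a few lines. Again, this is a toy illustration with a whitespace split and a made-up function name, not the real tokenizer.

```python
# Toy sketch of how token type IDs are assigned: zeros for [CLS], the
# first sentence, and the first [SEP]; ones for the second sentence and
# the final [SEP].
def segment_ids(sentence_a, sentence_b):
    first = ["[CLS]"] + sentence_a.split() + ["[SEP]"]   # segment 0
    second = sentence_b.split() + ["[SEP]"]              # segment 1
    return [0] * len(first) + [1] * len(second)

ids = segment_ids("Bob buys a car", "Bob owns a car")
print(ids)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```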
86
00:05:51,130 --> 00:05:56,350
However, this is an interesting point: the next example will be using a slightly different model
87
00:05:56,350 --> 00:05:57,700
called DistilBERT.
88
00:05:58,360 --> 00:06:01,750
For this model, there is no input called token type IDs.
89
00:06:02,080 --> 00:06:05,340
It is simply not necessary. As an exercise,
90
00:06:05,350 --> 00:06:09,460
after seeing the following notebook, you should double-check that this is the case.
91
00:06:10,090 --> 00:06:15,730
When we use BERT, there will be an input called token type IDs, but when we use DistilBERT, there
92
00:06:15,730 --> 00:06:16,270
won't be.
93
00:06:16,960 --> 00:06:21,820
Also compare the results to see which model performs better along with their speed of training.
94
00:06:26,590 --> 00:06:31,420
As you recall, we would like to apply that tokenisation to every sample in our data set object.
95
00:06:31,900 --> 00:06:35,500
So we write a tokenize function that we can pass into the map method.
96
00:06:36,100 --> 00:06:37,930
This will also take care of truncation.
97
00:06:39,310 --> 00:06:44,110
The next step is to call the map method passing in the tokenize function, as we did before.
98
00:06:44,530 --> 00:06:46,510
This gives us back our tokenized data.
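That workflow can be sketched as follows. To keep the example self-contained, a plain Python list and `map` stand in for the Hugging Face dataset's map method, and a toy tokenize function stands in for the real one (which would call the tokenizer on sentence1 and sentence2 with truncation enabled).

```python
# Sketch of applying a tokenize function to every sample in the dataset.
# In real code, dataset.map(tokenize_fn) would play the role of map() here.
def tokenize_fn(sample):
    tokens = (["[CLS]"] + sample["sentence1"].split()
              + ["[SEP]"] + sample["sentence2"].split() + ["[SEP]"])
    return {**sample, "tokens": tokens}  # keep the label, add the tokens

raw = [{"sentence1": "Bob buys a car.", "sentence2": "Bob owns a car.", "label": 1}]
tokenized = list(map(tokenize_fn, raw))
print(tokenized[0]["tokens"][:3])  # ['[CLS]', 'Bob', 'buys']
```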
99
00:06:47,410 --> 00:06:53,140
Of course, at this point, the format of our data is exactly the same as it was in the previous lectures,
100
00:06:53,140 --> 00:06:55,120
which means there is no more work to do.
101
00:06:56,080 --> 00:06:57,100
As an exercise,
102
00:06:57,100 --> 00:06:58,360
since this is quite simple,
103
00:06:58,720 --> 00:07:01,390
you might want to complete the following notebook on your own
104
00:07:01,690 --> 00:07:03,460
before looking at the next lecture.
105
00:07:04,000 --> 00:07:08,230
We'll be using the RTE dataset, which is also part of the GLUE benchmark.
106
00:07:08,650 --> 00:07:10,870
So good luck and I'll see you in the next lecture.