012 Fine-Tuning with Multiple Inputs (Textual Entailment), English transcript

So in this lecture, we are going to look at another example of text classification, which is slightly more complicated. In this example, we are going to build a text classifier where the input consists of not just one but two sentences.

This is very practical, and we can think of examples where we might want a model that knows how to process multiple input sentences at the same time. One example is answering multiple-choice questions: as input, we might want to pass in the question along with all the possible answers, and we want our model to select the right one. Another example is with chatbots. With earlier chatbots, a simple way to train them was with single prompt and response pairs: you would give the chatbot a prompt, and the chatbot would memorise appropriate responses. But this doesn't take into account the history of the conversation, which might make the conversation awkward. Another example is question answering, where we pass in a question along with a passage of text which contains the answer. We would then like the model to select the portion of the passage where the answer resides.

Now, you might be wondering how we are going to build a transformer that can handle multiple sentences as input. If you have experience with machine learning or neural networks, you might have some sense that this would be a difficult task. Normally, with transfer learning, we only have to change the head of the pre-trained network while keeping the inputs and the middle layers the same. But how can we change the number of inputs? The answer is: we don't have to. It turns out that with the single input already in place, we can train the transformer to understand multiple sentences concatenated into the same input. In fact, conceptually, this could work with RNNs as well; I'll leave it as an exercise for you to think about how that might work.
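In concrete terms, nothing about the model itself changes. Here is a minimal sketch of that idea, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (my assumptions for illustration, not necessarily what the course notebook uses):

```python
# Minimal sketch, assuming Hugging Face transformers; the checkpoint
# name is an assumption for illustration.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Same model class and same binary classification head as in the
# single-sentence case; only the inputs we build will differ.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```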
You'll recall that BERT, which is the main model we are using in this section, is pre-trained on unsupervised NLP tasks. Well, it turns out that one of these tasks involves having multiple sentences in the same input, so it makes sense that BERT should be able to handle multiple sentences. In particular, the task is called next sentence prediction. To build the data for this task, we take two sentences from our training corpus, and the label is whether or not the second sentence follows the first. In other words, we build a binary classifier.

For BERT, we use special formatting to help the model understand where the first and second sentences are located. In particular, we always start with the CLS token. We then follow it with the first sentence. We then add the SEP token. We then follow that with the second sentence. And finally, we add another SEP token at the end. So hopefully you now understand the utility of these special tokens: when we only had one sentence as input, they may have seemed superfluous, but now they actually have a practical purpose.

The task we'll be looking at next is known as textual entailment. This might sound complicated, but in fact it is quite simple. Thanks to my rule that all data is the same, it is no different from the next sentence prediction task I just described. In particular, we're still going to have two input sentences, and the target will still be binary. The only difference is the meaning of the task: in this case, we want to know whether one sentence entails another. For example, consider the input sentence pair "Bob buys a car" and "Bob owns a car". This is an example where the first sentence entails the second. Now consider "Bob purchased cheese" and "Bob doesn't have cheese". This is an example where there is no entailment.

Okay, so now let's discuss what this will look like in code. If you think about it, we know that the model has to be the same: the input is still text, and the output is still a binary prediction. What changes are the inputs. In other words, we should pay attention to the dataset and the tokenizer. The dataset will look something like this: recall that you can think of it like a tabular dataset with different columns for different things, just like a CSV. Previously, we only had one input sentence and a label; this time we'll have two sentences and a label in our dataset, and the sentence columns are simply called sentence1 and sentence2.

Luckily, the tokenizer from Hugging Face is well equipped to handle two sentences at once. You simply need to pass them in as separate arguments, as shown in the sketch below.
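Here is a minimal sketch of what that call might look like, assuming the bert-base-uncased tokenizer (the example pair is the one from the lecture):

```python
# Minimal sketch: passing two sentences to the tokenizer as separate
# arguments. The checkpoint name is an assumption for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("Bob buys a car.", "Bob owns a car.")

# Decoding the input IDs reveals the special-token structure:
# [CLS] first sentence [SEP] second sentence [SEP]
print(tokenizer.decode(inputs["input_ids"]))
```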
This call conceptually converts the inputs into a string formatted as we described earlier, where we have the CLS token, the first sentence, the SEP token, the second sentence, and another SEP token. In reality, these will of course be converted into tokens and then token IDs.

Now, there's one important note to make at this point. You'll recall that for the BERT tokenizer, we generate an input called token type IDs. Having two input sentences will help us understand the role of this input. When we use BERT and we have two sentences as input, the BERT tokenizer will use zeros for the positions of the first sentence and ones for the positions of the second sentence. So now you know that this input actually has a purpose and isn't totally superfluous. However, there is an interesting point here: the next example will be using a slightly different model called DistilBERT. For this model, there is no input called token type IDs; it is simply not necessary. As an exercise, after seeing the following notebook, you should double-check that this is the case: when we use BERT, there will be an input called token type IDs, but when we use DistilBERT, there won't be. Also, compare the results to see which model performs better, along with their speed of training.

As you'll recall, we would like to apply the tokenization to every sample in our dataset object, so we write a tokenize function that we can pass into the map method. This will also take care of truncation. The next step is to call the map method, passing in the tokenize function as we did before. This gives us back our tokenized data. Of course, at this point, the format of our data is exactly the same as it was in the previous lectures, which means there is no more work to do.

As an exercise, since this is quite simple, you might want to complete the following notebook on your own before looking at the next lecture; a minimal sketch of the preprocessing steps is included below for reference. We'll be using the RTE dataset, which is also part of the GLUE benchmark. So good luck, and I'll see you in the next lecture.
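For reference, here is a minimal sketch of the preprocessing described above, assuming the Hugging Face datasets and transformers libraries and the RTE subset of GLUE; the checkpoint names are my own choices, not necessarily those used in the notebook.

```python
# Minimal sketch of the preprocessing pipeline described in the lecture.
# Checkpoint names are assumptions for illustration.
from datasets import load_dataset
from transformers import AutoTokenizer

raw_datasets = load_dataset("glue", "rte")  # columns: sentence1, sentence2, label

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_fn(batch):
    # Pass both sentence columns as separate arguments; truncation keeps
    # the concatenated sequence within the model's maximum length.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_fn, batched=True)

# Exercise check: the BERT tokenizer produces token_type_ids (0s for the
# first sentence, 1s for the second), while DistilBERT's does not.
print("token_type_ids" in tokenized_datasets["train"].column_names)

distil_tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
print("token_type_ids" in distil_tokenizer("Bob buys a car.", "Bob owns a car."))
```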
