1
00:00:10,970 --> 00:00:16,640
So in this lecture, we are going to look at another example of text classification, which is slightly
2
00:00:16,640 --> 00:00:17,600
more complicated.
3
00:00:18,440 --> 00:00:23,780
In this example, we are going to build a text classifier where the input consists of not just one but
4
00:00:23,780 --> 00:00:24,770
two sentences.
5
00:00:25,370 --> 00:00:30,380
This is very practical and we can think of examples where we might want to have a model that knows how
6
00:00:30,380 --> 00:00:33,680
to process multiple input sentences at the same time.
7
00:00:34,520 --> 00:00:38,550
One example is answering multiple-choice questions.
8
00:00:38,570 --> 00:00:43,850
We might want to pass in the question along with all the possible answers, and we want our model to
9
00:00:43,850 --> 00:00:45,140
select the right answer.
10
00:00:45,800 --> 00:00:47,750
Another example is with chat bots.
11
00:00:48,200 --> 00:00:53,630
With earlier chat bots, a simple way to train them was with single prompt and response pairs.
12
00:00:54,050 --> 00:00:58,940
So you would give the chat bot a prompt and the chat bot would memorise appropriate responses.
13
00:00:59,390 --> 00:01:04,010
But this doesn't take into account the history of the conversation, which might make the conversation
14
00:01:04,010 --> 00:01:04,550
awkward.
15
00:01:06,200 --> 00:01:11,960
Another example is question answering where we pass in a question along with a passage of text which
16
00:01:11,960 --> 00:01:13,160
contains the answer.
17
00:01:13,640 --> 00:01:18,380
We would then like the model to select the portion of the passage where the answer resides.
18
00:01:23,090 --> 00:01:28,310
Now you might be wondering how are we going to build a transformer that can handle multiple sentences
19
00:01:28,310 --> 00:01:29,120
as inputs?
20
00:01:29,900 --> 00:01:34,580
If you have experience with machine learning or neural networks, you might have some sense that this
21
00:01:34,580 --> 00:01:35,930
would be a difficult task.
22
00:01:36,560 --> 00:01:41,420
Normally, with transfer learning, we only have to change the head of the pre-trained network while
23
00:01:41,420 --> 00:01:43,730
keeping the inputs and the middle layers the same.
24
00:01:44,390 --> 00:01:46,550
But how can we change the number of inputs?
25
00:01:47,090 --> 00:01:48,710
The answer is we don't have to.
26
00:01:49,250 --> 00:01:55,130
It turns out that with the single input already in place, we can train the transformer to understand
27
00:01:55,130 --> 00:01:58,700
having multiple sentences concatenated into the same input.
28
00:01:59,300 --> 00:02:02,600
In fact, conceptually, this could work with RNNs as well.
29
00:02:03,110 --> 00:02:06,590
I'll leave it as an exercise for you to think about how that might work.
30
00:02:11,360 --> 00:02:16,430
You'll recall that BERT, which is the main model we are using in this section, is pre-trained
31
00:02:16,430 --> 00:02:18,650
on unsupervised NLP tasks.
32
00:02:19,280 --> 00:02:24,410
Well, it turns out that one of these tasks involves having multiple sentences in the same input.
33
00:02:24,860 --> 00:02:28,430
So it makes sense that BERT should be able to handle multiple sentences.
34
00:02:29,060 --> 00:02:32,480
In particular, the task is called next sentence prediction.
35
00:02:33,290 --> 00:02:38,870
To build the data for this task, we take two sentences from our training corpus and the label is whether
36
00:02:38,870 --> 00:02:41,330
or not the second sentence follows the first.
37
00:02:41,900 --> 00:02:44,390
In other words, we build a binary classifier.
38
00:02:46,590 --> 00:02:51,960
For BERT, we use special formatting to help the model understand where the first and second sentences
39
00:02:51,960 --> 00:02:52,710
are located.
40
00:02:53,340 --> 00:02:56,190
In particular, we always start with the CLS token.
41
00:02:56,670 --> 00:02:58,740
We then follow it with the first sentence.
42
00:02:59,190 --> 00:03:00,570
We then add the SEP token.
43
00:03:01,050 --> 00:03:03,030
We then follow that with the second sentence.
44
00:03:03,420 --> 00:03:06,180
And finally, we add another SEP token at the end.
45
00:03:06,810 --> 00:03:12,960
So hopefully you now understand the utility of these special tokens. When we only had one sentence
46
00:03:12,960 --> 00:03:17,910
as input, they may have seemed superfluous, but now they actually have practical utility.
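That layout can be pictured with a toy sketch. Note this is only an illustration: a whitespace split stands in for BERT's real subword tokenizer, and the function name is made up for this example.

```python
# Toy sketch of BERT's two-sentence input layout:
# [CLS] <first sentence> [SEP] <second sentence> [SEP]
# (Whitespace split stands in for real subword tokenization.)
def format_pair(sentence_a, sentence_b):
    return ["[CLS]"] + sentence_a.split() + ["[SEP]"] + sentence_b.split() + ["[SEP]"]

tokens = format_pair("Bob buys a car", "Bob owns a car")
print(tokens)
# ['[CLS]', 'Bob', 'buys', 'a', 'car', '[SEP]', 'Bob', 'owns', 'a', 'car', '[SEP]']
```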
47
00:03:22,560 --> 00:03:26,160
The task we'll be looking at next is known as textual entailment.
48
00:03:26,700 --> 00:03:29,550
This might sound complicated, but in fact it is quite simple.
49
00:03:30,090 --> 00:03:32,200
Thanks to my rule: "all data is the same."
50
00:03:32,430 --> 00:03:36,030
It is no different from the next sentence prediction task I just described.
51
00:03:36,660 --> 00:03:41,610
In particular, we're still going to have two input sentences, and the target will still be binary.
52
00:03:42,060 --> 00:03:44,460
The only difference is the meaning of the task.
53
00:03:45,120 --> 00:03:48,570
In this case, we want to know whether one sentence entails another.
54
00:03:49,020 --> 00:03:51,340
For example, consider the input sentence pair:
55
00:03:51,360 --> 00:03:54,240
"Bob buys a car" and "Bob owns a car."
56
00:03:54,840 --> 00:03:58,080
This is an example of where the first sentence entails the second.
57
00:03:58,710 --> 00:04:02,520
Now consider "Bob purchased cheese" and "Bob doesn't have cheese."
58
00:04:02,910 --> 00:04:05,430
This is an example of where there is no entailment.
59
00:04:10,210 --> 00:04:10,570
Okay.
60
00:04:10,570 --> 00:04:13,210
So now let's discuss what this will look like in code.
61
00:04:13,840 --> 00:04:16,810
If you think about it, we know that the model has to be the same.
62
00:04:17,320 --> 00:04:18,880
What changes are the inputs?
63
00:04:19,930 --> 00:04:23,170
The input is still text and the output is still a binary prediction.
64
00:04:23,740 --> 00:04:25,270
What changes is how we prepare the inputs.
65
00:04:25,810 --> 00:04:29,400
In other words, we should pay attention to the dataset and tokenizer.
66
00:04:30,280 --> 00:04:32,620
So the data set will look something like this.
67
00:04:33,310 --> 00:04:38,410
Recall that you can think of it like a tabular data set with different columns for different things,
68
00:04:38,420 --> 00:04:41,150
just like a CSV.
69
00:04:41,170 --> 00:04:43,330
Previously, we only had one input sentence and a label.
70
00:04:43,810 --> 00:04:47,810
This time we'll have two sentences and a label in our dataset.
71
00:04:47,830 --> 00:04:50,170
These are simply called sentence1 and sentence2.
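The rows just described can be pictured like this. These are hand-made toy rows (using the entailment examples from this lecture), not the actual dataset, and the label convention (1 = entailment, 0 = no entailment) is an assumption for illustration.

```python
# A toy stand-in for the tabular dataset: two input sentences per row
# plus a binary label (1 = entailment, 0 = no entailment).
dataset = [
    {"sentence1": "Bob buys a car.",       "sentence2": "Bob owns a car.",          "label": 1},
    {"sentence1": "Bob purchased cheese.", "sentence2": "Bob doesn't have cheese.", "label": 0},
]
print(dataset[0]["sentence1"], "->", dataset[0]["label"])
```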
72
00:04:55,010 --> 00:05:00,200
Luckily, the tokenizer from Hugging Face is well equipped to handle two sentences at once.
73
00:05:00,620 --> 00:05:03,320
You simply need to pass them in as separate arguments.
74
00:05:03,950 --> 00:05:09,170
This will conceptually convert the inputs into a string formatted as we described earlier, where we
75
00:05:09,170 --> 00:05:11,300
have the CLS token and the first sentence.
76
00:05:11,570 --> 00:05:12,410
The SEP token.
77
00:05:12,500 --> 00:05:13,370
The second sentence.
78
00:05:13,520 --> 00:05:14,660
And another SEP token.
79
00:05:15,380 --> 00:05:19,730
In reality, these, of course, will be converted into tokens and then token IDs.
80
00:05:21,730 --> 00:05:23,920
Now there's one important note to make at this point.
81
00:05:24,610 --> 00:05:29,620
You'll recall that for the BERT tokenizer, we generate this input called token type IDs.
82
00:05:30,250 --> 00:05:36,220
Having two input sentences will help us understand the role of this input. When we use BERT and
83
00:05:36,220 --> 00:05:42,100
we have two sentences as input, the BERT tokenizer will use zeros for the positions of the first sentence
84
00:05:42,490 --> 00:05:44,950
and ones for the positions of the second sentence.
85
00:05:45,550 --> 00:05:50,350
So now you know that this input actually has a purpose and isn't totally superfluous.
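The zeros-and-ones pattern can be sketched in a few lines. Again, this is a toy illustration with a whitespace split and a made-up function name, not the real tokenizer.

```python
# Toy sketch of how token type IDs are assigned: zeros for [CLS], the
# first sentence, and the first [SEP]; ones for the second sentence and
# the final [SEP].
def segment_ids(sentence_a, sentence_b):
    first = ["[CLS]"] + sentence_a.split() + ["[SEP]"]   # segment 0
    second = sentence_b.split() + ["[SEP]"]              # segment 1
    return [0] * len(first) + [1] * len(second)

ids = segment_ids("Bob buys a car", "Bob owns a car")
print(ids)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```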
86
00:05:51,130 --> 00:05:56,350
However, this is an interesting point: the next example will be using a slightly different model
87
00:05:56,350 --> 00:05:57,700
called DistilBERT.
88
00:05:58,360 --> 00:06:01,750
For this model, there is no input called token type IDs.
89
00:06:02,080 --> 00:06:05,340
It is simply not necessary. As an exercise,
90
00:06:05,350 --> 00:06:09,460
after seeing the following notebook, you should double-check that this is the case.
91
00:06:10,090 --> 00:06:15,730
When we use BERT, there will be an input called token type IDs, but when we use DistilBERT, there
92
00:06:15,730 --> 00:06:16,270
won't be.
93
00:06:16,960 --> 00:06:21,820
Also compare the results to see which model performs better along with their speed of training.
94
00:06:26,590 --> 00:06:31,420
As you recall, we would like to apply that tokenisation to every sample in our data set object.
95
00:06:31,900 --> 00:06:35,500
So we write a tokenize function that we can pass into the map method.
96
00:06:36,100 --> 00:06:37,930
This will also take care of truncation.
97
00:06:39,310 --> 00:06:44,110
The next step is to call the map method passing in the tokenize function, as we did before.
98
00:06:44,530 --> 00:06:46,510
This gives us back our tokenized data.
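That workflow can be sketched as follows. To keep the example self-contained, a plain Python list and `map` stand in for the Hugging Face dataset's map method, and a toy tokenize function stands in for the real one (which would call the tokenizer on sentence1 and sentence2 with truncation enabled).

```python
# Sketch of applying a tokenize function to every sample in the dataset.
# In real code, dataset.map(tokenize_fn) would play the role of map() here.
def tokenize_fn(sample):
    tokens = (["[CLS]"] + sample["sentence1"].split()
              + ["[SEP]"] + sample["sentence2"].split() + ["[SEP]"])
    return {**sample, "tokens": tokens}  # keep the label, add the tokens

raw = [{"sentence1": "Bob buys a car.", "sentence2": "Bob owns a car.", "label": 1}]
tokenized = list(map(tokenize_fn, raw))
print(tokenized[0]["tokens"][:3])  # ['[CLS]', 'Bob', 'buys']
```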
99
00:06:47,410 --> 00:06:53,140
Of course, at this point, the format of our data is exactly the same as it was in the previous lectures,
100
00:06:53,140 --> 00:06:55,120
which means there is no more work to do.
101
00:06:56,080 --> 00:06:57,100
As an exercise,
102
00:06:57,100 --> 00:06:58,360
since this is quite simple,
103
00:06:58,720 --> 00:07:01,390
you might want to complete the following notebook on your own
104
00:07:01,690 --> 00:07:03,460
before looking at the next lecture.
105
00:07:04,000 --> 00:07:08,230
We'll be using the RTE dataset, which is also part of the GLUE benchmark.
106
00:07:08,650 --> 00:07:10,870
So good luck and I'll see you in the next lecture.