013 Fine-Tuning Transformers with Multiple Inputs in Python (English subtitles)

So in this lecture, we'll be looking at how to do fine-tuning for text classification when we have more than one input sentence.

We'll begin by installing transformers and datasets, as we normally do. The next step is to import load_dataset as well as numpy.

The next step is to load in our dataset, which is the RTE dataset. This is part of the GLUE benchmark, like the earlier example on sentiment analysis. Note that I've pasted a description of the dataset here, which explains that the data has been processed for binary classification. Other options are possible, for example having three classes where the third class is neutral, meaning that the two sentences neither entail nor contradict each other.

The next step is to print out the dataset object we got back to see what it contains. As you can see, we have train, validation, and test sets. Each dataset contains two sentence columns along with the label and index.

The next step is to check out the features attribute. As you can see, this tells us that we have two classes, where the first class is entailment and the second is not entailment.

The next step is to simply print out some sentences just to see what we're dealing with. Feel free to check these out if you like.

The next step is to define our checkpoint. As mentioned, we'll be using DistilBERT for this notebook, but feel free to try BERT as well as an exercise.

The next step is to import everything we need from the Transformers library, including the tokenizer, model, trainer, and training arguments. We then load up our tokenizer from the checkpoint and test it on the first pair of sentences in our dataset.

The next step is to check the keys of the dictionary we got back. As you can see, the key for token type IDs is not present. Recall that one exercise for this notebook is to use BERT instead and to check out the format of the token type IDs.

The next step is to decode the input IDs. As you can see, our input text is effectively the two sentences concatenated into a single string, with the [SEP] token separating the two sentences.

The next step is to load up our pretrained model from the checkpoint.
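(The notebook cells themselves aren't captured in these subtitles, so the following is a minimal sketch of the steps described so far. The checkpoint string and variable names are assumptions, not necessarily the lecture's exact code.)

    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,            # used in the training sketch further below
        TrainingArguments,  # used in the training sketch further below
    )

    # Load the RTE dataset from the GLUE benchmark
    raw_datasets = load_dataset("glue", "rte")
    print(raw_datasets)                       # train / validation / test splits
    print(raw_datasets["train"].features)     # label: entailment vs. not_entailment

    # DistilBERT checkpoint; swap in a BERT checkpoint for the exercise
    checkpoint = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Tokenize the first sentence pair and inspect the result
    sample = raw_datasets["train"][0]
    encoded = tokenizer(sample["sentence1"], sample["sentence2"])
    print(encoded.keys())                          # no token_type_ids for DistilBERT
    print(tokenizer.decode(encoded["input_ids"]))  # the two sentences joined by [SEP]

    # Pretrained model with a two-class classification head
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)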
The next step is to create the training arguments object. Note that there is one new argument here, which is logging_steps. The reason we need this is that this dataset doesn't have a lot of samples, while the default value for logging_steps is very large, so logging would never actually occur. Without setting this argument, we would see "No log" appearing under the training loss when we train our model, instead of the loss itself, which is nice to know. So try commenting out this argument to see that for yourself.

The next step is to import the load_metric function and get the metric for our current task. We then test our metric just to see what we get back. Unfortunately, we only get accuracy, which is kind of boring. Instead, we're also going to include the F1 score, so we're going to import f1_score from scikit-learn.

The next step is to define our compute_metrics function. We begin by splitting up the logits and the labels. We then take the argmax of the logits in order to get the predictions. Once we have the predictions, we can compute the accuracy and the F1. Finally, we return a dictionary containing our desired metrics.

The next step is to define our tokenizer function as usual. The input to this function is a batch of data from our dataset. We pass in both sentence1 and sentence2 as separate arguments, along with truncation=True.

The next step is to create our tokenized datasets, then to create our Trainer object, and finally to call trainer.train.

Okay, so notice that five epochs is probably too much, since the validation loss seems to be creeping up.

As a final exercise for this lecture, don't forget to try BERT with this notebook as well. Compare the performance of the two models as well as the training time, and consider which model you would prefer to use if this were a real-world project.

As another exercise, please also compute the metrics on the test set as well.
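(Again, the actual cells aren't shown in the subtitles; this is a hedged sketch of the training setup described above, continuing from the previous snippet, where tokenizer, model, and raw_datasets are defined. The output directory, epoch count, and logging interval are assumed values, and load_metric is the older datasets API the lecture refers to.)

    import numpy as np
    from datasets import load_metric
    from sklearn.metrics import f1_score
    from transformers import Trainer, TrainingArguments

    metric = load_metric("glue", "rte")   # for RTE this only reports accuracy

    def compute_metrics(eval_pred):
        # Split the logits and labels, then take the argmax to get predictions
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        acc = metric.compute(predictions=predictions, references=labels)["accuracy"]
        return {"accuracy": acc, "f1": f1_score(labels, predictions)}

    def tokenize_fn(batch):
        # Pass both sentence columns as separate arguments
        return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

    tokenized_datasets = raw_datasets.map(tokenize_fn, batched=True)

    training_args = TrainingArguments(
        output_dir="rte_finetuned",        # assumed path
        evaluation_strategy="epoch",       # named eval_strategy in newer transformers versions
        num_train_epochs=5,
        logging_steps=10,                  # small dataset: without this, "No log" shows as the training loss
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    trainer.train()

    # Exercise: compute the metrics on the test split as well
    # trainer.evaluate(tokenized_datasets["test"])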
