018 Batching inputs together (PyTorch)

How do we batch inputs together? In this video, we will see how to batch input sequences together.

In general, the sentences we want to pass through our model won't all have the same lengths. Here we are using the model we saw in the sentiment analysis pipeline and want to classify two sentences. When tokenizing them and mapping each token to its corresponding input IDs, we get two lists of different lengths.

Trying to create a tensor or a NumPy array from those two lists will result in an error, because all arrays and tensors must be rectangular. One way to overcome this limit is to make the second sentence the same length as the first by adding a special token as many times as necessary. Another way would be to truncate the first sequence to the length of the second, but we would then lose a lot of information that might be necessary to properly classify the sentence. In general, we only truncate sentences when they are longer than the maximum length the model can handle.

The value used to pad the second sentence should not be picked randomly: the model has been pretrained with a certain padding ID, which you can find in tokenizer.pad_token_id.

Now that we have padded our sentences, we can make a batch with them. If we pass the two sentences to the model first separately and then batched together, however, we notice that we don't get the same results for the sentence that is padded (here the second one). If you remember that Transformer models make heavy use of attention layers, this should not come as a total surprise: when computing the contextual representation of each token, the attention layers look at all the other words in the sentence. If we have just the sentence, or the sentence with several padding tokens added, it's logical that we don't get the same values.

To get the same results with or without padding, we need to indicate to the attention layers that they should ignore those padding tokens. This is done by creating an attention mask, a tensor with the same shape as the input IDs, filled with zeros and ones. Ones indicate the tokens the attention layers should consider in the context, and zeros the tokens they should ignore.

Now passing this attention mask along with the input IDs will give us the same results as when we sent the two sentences individually to the model!
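As a rough illustration of the steps described so far, here is a minimal PyTorch sketch that pads the shorter sentence by hand and builds the matching attention mask. The checkpoint name and the two example sentences are assumptions made for this sketch, not taken from the video.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint and example sentences are assumptions for this sketch.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sentence1 = "I've been waiting for a HuggingFace course my whole life."
sentence2 = "I hate this so much!"

# Tokenizing each sentence separately gives two lists of input IDs with different lengths.
ids1 = tokenizer(sentence1)["input_ids"]
ids2 = tokenizer(sentence2)["input_ids"]

# Pad the shorter list with the model's padding ID so both rows have the same length.
pad_id = tokenizer.pad_token_id
padded_ids2 = ids2 + [pad_id] * (len(ids1) - len(ids2))
batched_ids = torch.tensor([ids1, padded_ids2])

# Attention mask: 1 for real tokens, 0 for the padding tokens the attention layers should ignore.
attention_mask = torch.tensor(
    [[1] * len(ids1),
     [1] * len(ids2) + [0] * (len(ids1) - len(ids2))]
)

with torch.no_grad():
    # With the mask, the logits for the second (padded) sentence match the ones obtained
    # by passing that sentence to the model on its own.
    outputs = model(batched_ids, attention_mask=attention_mask)
print(outputs.logits)
```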
This is all done behind the scenes by the tokenizer when you apply it to several sentences with the flag padding=True. It will apply the padding with the proper value to the smaller sentences and create the appropriate attention mask.
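Continuing the sketch above (same assumed checkpoint and sentences), that single call looks roughly like this:

```python
# Tokenize both sentences in one call: padding=True pads the shorter one with
# tokenizer.pad_token_id and builds the attention mask automatically.
inputs = tokenizer([sentence1, sentence2], padding=True, return_tensors="pt")
print(inputs["input_ids"])       # rectangular tensor of input IDs
print(inputs["attention_mask"])  # 1 for real tokens, 0 for padding

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)
```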
