019 Batching inputs together (TensorFlow)

How to batch inputs together? In this video, we will see how to batch input sequences together.

In general, the sentences we want to pass through our model won't all have the same length. Here we are using the model we saw in the sentiment analysis pipeline and want to classify two sentences. When tokenizing them and mapping each token to its corresponding input IDs, we get two lists of different lengths.

Trying to create a tensor or a NumPy array from those two lists will result in an error, because all arrays and tensors must be rectangular. One way to overcome this limitation is to make the second sentence the same length as the first by adding a special token as many times as necessary. Another way would be to truncate the first sequence to the length of the second, but we would then lose a lot of information that might be necessary to properly classify the sentence. In general, we only truncate sentences when they are longer than the maximum length the model can handle. The value used to pad the second sentence should not be picked randomly: the model has been pretrained with a certain padding ID, which you can find in tokenizer.pad_token_id.

Now that we have padded our sentences, we can make a batch with them. If we pass the two sentences to the model separately and then batched together, however, we notice that we don't get the same results for the sentence that is padded (here, the second one).

If you remember that Transformer models make heavy use of attention layers, this should not come as a total surprise: when computing the contextual representation of each token, the attention layers look at all the other words in the sentence. When the input is just the sentence versus the sentence with several padding tokens added, it's logical that we don't get the same values.

To get the same results with or without padding, we need to indicate to the attention layers that they should ignore those padding tokens. This is done by creating an attention mask, a tensor with the same shape as the input IDs, filled with zeros and ones. Ones indicate the tokens the attention layers should consider in the context and zeros the tokens they should ignore. Passing this attention mask along with the input IDs then gives us the same results as when we sent the two sentences individually to the model, as in the sketch below.
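Here is a minimal sketch of those manual steps in TensorFlow. The checkpoint and the two example sentences are assumptions for illustration (the default checkpoint behind the sentiment-analysis pipeline), not necessarily the exact ones shown on screen:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Assumed checkpoint: the default model behind the sentiment-analysis pipeline.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

# Two example sentences of different lengths (illustrative inputs).
ids1 = tokenizer("I've been waiting for a HuggingFace course my whole life.")["input_ids"]
ids2 = tokenizer("I hate this so much!")["input_ids"]

# Pad the shorter sequence with the tokenizer's padding ID...
padding_length = len(ids1) - len(ids2)
ids2_padded = ids2 + [tokenizer.pad_token_id] * padding_length

# ...and build the attention mask: 1 for real tokens, 0 for padding.
mask = [[1] * len(ids1), [1] * len(ids2) + [0] * padding_length]

batch = tf.constant([ids1, ids2_padded])

print(model(tf.constant([ids2])).logits)                      # second sentence alone
print(model(batch).logits)                                    # padded row differs
print(model(batch, attention_mask=tf.constant(mask)).logits)  # padded row matches again
```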
This is all done behind the scenes by the tokenizer when you apply it to several sentences with the flag padding=True: it will apply the padding with the proper value to the smaller sentences and create the appropriate attention mask.
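For instance, assuming the same checkpoint and example sentences as in the sketch above, a single tokenizer call with padding=True produces both the padded input IDs and the attention mask:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Same assumed checkpoint and example sentences as in the previous sketch.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"],
    padding=True,
    return_tensors="tf",
)
print(inputs["input_ids"].shape)   # both rows padded to the same length
print(inputs["attention_mask"])    # 0s mark the padding tokens
print(model(**inputs).logits)      # same logits as the manual version above
```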
