018 Batching inputs together (PyTorch)

How do we batch inputs together? In this video, we will see how to batch input sequences together.

In general, the sentences we want to pass through our model won't all have the same lengths. Here we are using the model we saw in the sentiment analysis pipeline and want to classify two sentences. When tokenizing them and mapping each token to its corresponding input IDs, we get two lists of different lengths.

Trying to create a tensor or a NumPy array from those two lists will result in an error, because all arrays and tensors must be rectangular. One way to overcome this limit is to make the second sentence the same length as the first by adding a special token as many times as necessary. Another way would be to truncate the first sequence to the length of the second, but we would then lose a lot of information that might be necessary to properly classify the sentence. In general, we only truncate sentences when they are longer than the maximum length the model can handle.

The value used to pad the second sentence should not be picked randomly: the model has been pretrained with a certain padding ID, which you can find in tokenizer.pad_token_id.

Now that we have padded our sentences, we can make a batch with them. If we pass the two sentences to the model first separately and then batched together, however, we notice that we don't get the same results for the sentence that is padded (here the second one). If you remember that Transformer models make heavy use of attention layers, this should not come as a total surprise: when computing the contextual representation of each token, the attention layers look at all the other words in the sentence. If we have just the sentence, or the sentence with several padding tokens added, it's logical that we don't get the same values.

To get the same results with or without padding, we need to indicate to the attention layers that they should ignore those padding tokens. This is done by creating an attention mask, a tensor with the same shape as the input IDs, filled with zeros and ones. Ones indicate the tokens the attention layers should consider in the context, and zeros the tokens they should ignore.

Now passing this attention mask along with the input IDs will give us the same results as when we sent the two sentences individually to the model!
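As a rough illustration of the steps described so far, here is a minimal PyTorch sketch that pads the shorter sentence by hand and builds the matching attention mask. The checkpoint name and the two example sentences are assumptions made for this sketch, not taken from the video.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint and example sentences are assumptions for this sketch.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sentence1 = "I've been waiting for a HuggingFace course my whole life."
sentence2 = "I hate this so much!"

# Tokenizing each sentence separately gives two lists of input IDs with different lengths.
ids1 = tokenizer(sentence1)["input_ids"]
ids2 = tokenizer(sentence2)["input_ids"]

# Pad the shorter list with the model's padding ID so both rows have the same length.
pad_id = tokenizer.pad_token_id
padded_ids2 = ids2 + [pad_id] * (len(ids1) - len(ids2))
batched_ids = torch.tensor([ids1, padded_ids2])

# Attention mask: 1 for real tokens, 0 for the padding tokens the attention layers should ignore.
attention_mask = torch.tensor(
    [[1] * len(ids1),
     [1] * len(ids2) + [0] * (len(ids1) - len(ids2))]
)

with torch.no_grad():
    # With the mask, the logits for the second (padded) sentence match the ones obtained
    # by passing that sentence to the model on its own.
    outputs = model(batched_ids, attention_mask=attention_mask)
print(outputs.logits)
```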
This is all done behind the scenes by the tokenizer when you apply it to several sentences with the flag padding=True. It will apply the padding with the proper value to the smaller sentences and create the appropriate attention mask.
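Continuing the sketch above (same assumed checkpoint and sentences), that single call looks roughly like this:

```python
# Tokenize both sentences in one call: padding=True pads the shorter one with
# tokenizer.pad_token_id and builds the attention mask automatically.
inputs = tokenizer([sentence1, sentence2], padding=True, return_tensors="pt")
print(inputs["input_ids"])       # rectangular tensor of input IDs
print(inputs["attention_mask"])  # 1 for real tokens, 0 for padding

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)
```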
