1
00:00:04,160 --> 00:00:07,200
In this video, we'll study the
encoder-decoder architecture.
2
00:00:08,160 --> 00:00:16,160
An example of a popular encoder-decoder model is
T5. In order to understand how the encoder-decoder
3
00:00:16,160 --> 00:00:21,680
works, we recommend you check out the videos
on encoders and decoders as standalone models.
4
00:00:22,400 --> 00:00:30,320
Understanding how they behave individually will
help you understand how an encoder-decoder behaves.
5
00:00:30,320 --> 00:00:35,360
Let's start from what we've seen about the
encoder. The encoder takes words as inputs,
6
00:00:36,000 --> 00:00:40,640
passes them through its layers, and
retrieves a numerical representation
7
00:00:40,640 --> 00:00:47,360
for each word passed through it. We now know that
the numerical representation holds information
8
00:00:47,360 --> 00:00:54,000
about the meaning of the sequence. Let's put
this aside and add the decoder to the diagram.
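Here is a minimal sketch of the encoder step just described, assuming the t5-small checkpoint and the T5EncoderModel class from the transformers library (illustrative choices, not something mandated by the video):
```python
# Sketch: run only the encoder of T5 and look at the numerical
# representation it produces -- one vector per input token.
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer("Welcome to NYC", return_tensors="pt")
outputs = encoder(**inputs)

# Shape: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```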
9
00:00:56,480 --> 00:01:00,160
In this scenario, we're using the decoder
in a manner that we haven't seen before.
10
00:01:00,720 --> 00:01:07,600
We're passing the outputs of the encoder directly
to it! In addition to the encoder outputs,
11
00:01:07,600 --> 00:01:13,040
we also give the decoder a sequence. When
prompting the decoder for an output with no
12
00:01:13,040 --> 00:01:17,360
initial sequence, we can give it the value
that indicates the start of a sequence.
13
00:01:18,000 --> 00:01:23,520
And that's where the encoder-decoder magic
happens. The encoder accepts a sequence as input.
14
00:01:24,560 --> 00:01:30,480
It computes a prediction, and outputs a
numerical representation. Then, it sends
15
00:01:30,480 --> 00:01:38,000
that over to the decoder. It has, in a sense,
encoded the sequence. And the decoder, in turn,
16
00:01:38,000 --> 00:01:42,960
using this input alongside its usual sequence
input, will take a stab at decoding the sequence.
17
00:01:44,720 --> 00:01:50,400
The decoder decodes the sequence, and outputs a
word. As of now, we don't need to make sense of
18
00:01:50,400 --> 00:01:55,440
that word, but we can understand that the decoder
is essentially decoding what the encoder has
19
00:01:55,440 --> 00:02:02,160
output. The "start of sequence word" indicates
that it should start decoding the sequence.
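As a hedged sketch of this step, here is one way to hand the encoder's output and a start-of-sequence token to the decoder by hand (t5-small and the translation prefix are assumptions):
```python
# Sketch: give the decoder the encoder's output plus a start-of-sequence
# token, and read off the first word it predicts.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to French: Welcome to NYC",
                   return_tensors="pt")
encoder_outputs = model.encoder(**inputs)  # the "encoded" sequence

# For T5, the start-of-sequence value is the pad token.
start = torch.tensor([[model.config.decoder_start_token_id]])
out = model(encoder_outputs=encoder_outputs,
            attention_mask=inputs.attention_mask,
            decoder_input_ids=start)

first_word_id = out.logits[:, -1].argmax(-1)
print(tokenizer.decode(first_word_id))
```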
20
00:02:03,600 --> 00:02:10,240
Now that we have both the feature vector and
an initial generated word, we don't need the
21
00:02:10,240 --> 00:02:17,760
encoder anymore. As we have seen before with the
decoder, it can act in an auto-regressive manner;
22
00:02:18,640 --> 00:02:24,960
the word it has just output can now be used
as an input. This, in combination with the
23
00:02:24,960 --> 00:02:30,800
numerical representation output by the encoder,
can now be used to generate a second word.
24
00:02:33,200 --> 00:02:38,880
Please note that the first word is still here, as
the model still outputs it. However, it is greyed
25
00:02:38,880 --> 00:02:45,120
out as we have no need for it anymore. We can
continue on and on, for example until the decoder
26
00:02:45,120 --> 00:02:50,720
outputs a value that we consider a "stopping
value", like a dot, meaning the end of a sequence.
27
00:02:53,440 --> 00:02:58,080
Here, we've seen the full mechanism of the
encoder-decoder transformer: let's go over it one
28
00:02:58,080 --> 00:03:05,120
more time. We have an initial sequence that is
sent to the encoder. That encoder output is then
29
00:03:05,120 --> 00:03:12,240
sent to the decoder, for it to be decoded. While
we can now discard the encoder after a single use,
30
00:03:12,240 --> 00:03:17,840
the decoder will be used several times, until
we have generated every word that we need.
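In the transformers library, this encode-once, decode-many-times loop is what a single generate() call performs; a hedged sketch, again assuming t5-small (it also previews the concrete translation example that follows):
```python
# Sketch: the same encode-once, decode-step-by-step procedure, wrapped in a
# single generate() call.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to French: Welcome to NYC",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```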
31
00:03:20,000 --> 00:03:25,120
Let's see a concrete example with Translation
Language Modeling, also called transduction:
32
00:03:25,120 --> 00:03:30,800
the act of translating a sequence. Here, we would
like to translate this English sequence "Welcome
33
00:03:30,800 --> 00:03:38,400
to NYC" in French. We're using a transformer model
that is trained for that task explicitly. We use
34
00:03:38,400 --> 00:03:43,520
the encoder to create a representation
of the English sentence. We pass this
35
00:03:43,520 --> 00:03:48,880
to the decoder and, with the use of the start of
sequence word, we ask it to output the first word.
36
00:03:50,720 --> 00:03:52,960
It outputs Bienvenue, which means "Welcome".
37
00:03:55,280 --> 00:04:02,480
We then use "Bienvenue" as the input sequence for
the decoder. This, alongside the feature vector,
38
00:04:04,320 --> 00:04:08,480
allows the decoder to predict the second
word, "à", which is "to" in English.
39
00:04:10,160 --> 00:04:14,400
Finally, we ask the decoder to predict
a third word; it predicts "NYC",
40
00:04:14,400 --> 00:04:20,240
which is, once again, correct. We've translated
the sentence! Where the encoder-decoder really
41
00:04:20,240 --> 00:04:24,880
shines is that we have an encoder and a
decoder, which often do not share weights.
42
00:04:27,280 --> 00:04:31,440
We, therefore, have an entire block (the encoder)
that can be trained to understand the sequence,
43
00:04:31,440 --> 00:04:36,480
and extract the relevant information. For the
translation scenario we've seen earlier, for
44
00:04:36,480 --> 00:04:44,160
example, this would mean parsing and understanding
what was said in the English language, extracting
45
00:04:44,160 --> 00:04:49,040
information from that language, and putting
all of that into an information-dense vector.
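As a quick, hedged check that the encoder really is a separate block with its own weights (and the decoder another), assuming t5-small; note that T5 does share the token-embedding matrix between the two blocks:
```python
# Sketch: the encoder and the decoder are two distinct blocks, each with its
# own stack of layers and parameters.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")

print(type(model.encoder).__name__, type(model.decoder).__name__)
print("encoder parameters:", sum(p.numel() for p in model.encoder.parameters()))
print("decoder parameters:", sum(p.numel() for p in model.decoder.parameters()))
```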
46
00:04:50,880 --> 00:04:57,280
On the other hand, we have the decoder, whose
sole purpose is to decode the feature vector output by
47
00:04:57,280 --> 00:05:03,760
the encoder. This decoder can be specialized in
a completely different language, or even a different modality
48
00:05:03,760 --> 00:05:11,760
like images or speech. Encoder-decoders
are special for several reasons. Firstly,
49
00:05:11,760 --> 00:05:17,040
they're able to manage sequence-to-sequence
tasks, like the translation we have just seen.
50
00:05:18,640 --> 00:05:23,880
Secondly, the weights between the encoder and the
decoder parts are not necessarily shared. Let's
51
00:05:24,480 --> 00:05:31,200
take another example of translation. Here we're
translating "Transformers are powerful" into French.
52
00:05:32,240 --> 00:05:36,560
Firstly, this means that from a sequence
of three words, we're able to generate
53
00:05:36,560 --> 00:05:42,240
a sequence of four words. One could argue
that this could be handled with a decoder
54
00:05:42,240 --> 00:05:46,960
that would generate the translation in an
auto-regressive manner, and they would be right!
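A hedged illustration of that length mismatch, assuming t5-small (the exact French wording may vary between checkpoints):
```python
# Sketch: the input and output sequences need not have the same length.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to French: Transformers are powerful",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
translation = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(translation)
print("input words:", len("Transformers are powerful".split()))
print("output words:", len(translation.split()))
```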
55
00:05:49,840 --> 00:05:53,840
Another example of where sequence-to-sequence
transformers shine is in summarization.
56
00:05:54,640 --> 00:05:58,560
Here we have a very long
sequence, generally a full text,
57
00:05:58,560 --> 00:06:03,840
and we want to summarize it. Since the
encoder and decoder are separate,
58
00:06:03,840 --> 00:06:08,880
we can have different context lengths (for
example, a very long context for the encoder, which
59
00:06:08,880 --> 00:06:13,840
handles the text, and a smaller context for the
decoder, which handles the summarized sequence).
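A minimal sketch of summarization as a sequence-to-sequence task, through the high-level pipeline API; the checkpoint name is an assumption:
```python
# Sketch: a long input text goes in, a much shorter summary comes out.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

long_text = (
    "Encoder-decoder transformers encode a long input text into a numerical "
    "representation, which the decoder then turns into a much shorter text. "
) * 10  # stand-in for a full document

print(summarizer(long_text, max_length=40, min_length=10)[0]["summary_text"])
```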
60
00:06:16,240 --> 00:06:20,480
There are a lot of sequence-to-sequence
models. This slide shows a few examples of
61
00:06:20,480 --> 00:06:24,160
popular encoder-decoder models
available in the transformers library.
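For instance, several well-known encoder-decoder checkpoints can all be loaded through the same Auto class; this short list is illustrative, not the exact one shown in the video:
```python
# Sketch: a few popular encoder-decoder checkpoints, all loadable the same way.
from transformers import AutoModelForSeq2SeqLM

for name in ["t5-small", "facebook/bart-base", "Helsinki-NLP/opus-mt-en-fr"]:
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    print(name, "->", model.__class__.__name__)
```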
62
00:06:26,320 --> 00:06:31,200
Additionally, you can load an encoder
and a decoder inside an encoder-decoder
63
00:06:31,200 --> 00:06:35,040
model! Therefore, depending on the
specific task you are targeting,
64
00:06:35,040 --> 00:06:40,240
you may choose to use specific encoders
and decoders, which have proven their worth
65
00:06:40,240 --> 00:06:49,850
on these specific tasks. This wraps things up
for encoder-decoders. Thanks for watching!
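As a hedged sketch of that last point, here is one way to combine two standalone checkpoints into a single encoder-decoder model (BERT as encoder and GPT-2 as decoder are assumptions, and the combined model would still need fine-tuning on the target task):
```python
# Sketch: assemble a custom encoder-decoder from two standalone checkpoints.
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2"
)
# The encoder and decoder are now two separate sub-modules of one model.
print(type(model.encoder).__name__, type(model.decoder).__name__)
```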