008 Transformer Models: Encoder-Decoder

In this video, we'll study the encoder-decoder architecture. An example of a popular encoder-decoder model is T5. In order to understand how the encoder-decoder works, we recommend you check out the videos on encoders and decoders as standalone models. Understanding how they behave individually will help you understand how an encoder-decoder behaves.

Let's start from what we've seen about the encoder. The encoder takes words as inputs, casts them through the encoder, and retrieves a numerical representation for each word cast through it. We now know that this numerical representation holds information about the meaning of the sequence. Let's put this aside and add the decoder to the diagram.

In this scenario, we're using the decoder in a manner we haven't seen before: we're passing the outputs of the encoder directly to it! In addition to the encoder outputs, we also give the decoder a sequence. When prompting the decoder for an output with no initial sequence, we can give it the value that indicates the start of a sequence. And that's where the encoder-decoder magic happens. The encoder accepts a sequence as input, computes a prediction, and outputs a numerical representation. Then it sends that over to the decoder. It has, in a sense, encoded the sequence. The decoder, in turn, using this input alongside its usual sequence input, will take a stab at decoding the sequence.

The decoder decodes the sequence and outputs a word. As of now, we don't need to make sense of that word, but we can understand that the decoder is essentially decoding what the encoder has output. The "start of sequence" word indicates that it should start decoding the sequence. Now that we have both the feature vector and an initial generated word, we don't need the encoder anymore. As we have seen before, the decoder can act in an auto-regressive manner: the word it has just output can now be used as an input. This, in combination with the numerical representation output by the encoder, can then be used to generate a second word.

Please note that the first word is still here, as the model still outputs it; however, it is greyed out because we have no need for it anymore. We can continue on and on, for example until the decoder outputs a value that we consider a "stopping value", like a dot, meaning the end of a sequence.
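Here is a minimal sketch of the encode-once, decode-word-by-word loop described above, using T5 from the transformers library. The "t5-small" checkpoint, the greedy next-token choice, and the 20-token cap are illustrative assumptions; in practice the high-level generate() method handles this loop for you.

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Illustrative checkpoint choice: t5-small, a popular encoder-decoder model.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    inputs = tokenizer("translate English to French: Welcome to NYC", return_tensors="pt")

    with torch.no_grad():
        # Run the encoder once: it turns the input words into numerical representations.
        encoder_outputs = model.get_encoder()(**inputs)

        # The decoder starts from the value that indicates the start of a sequence
        # (T5 reuses its pad token for this).
        decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

        for _ in range(20):  # illustrative cap on the number of generated tokens
            outputs = model(
                encoder_outputs=encoder_outputs,
                attention_mask=inputs["attention_mask"],
                decoder_input_ids=decoder_input_ids,
            )
            # Greedily pick the most likely next token and feed it back in
            # (the auto-regressive step described above).
            next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
            decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
            if next_token.item() == model.config.eos_token_id:  # the "stopping value"
                break

    print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))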
Here, we've seen the full mechanism of the encoder-decoder transformer. Let's go over it one more time. We have an initial sequence that is sent to the encoder. That encoder output is then sent to the decoder for it to be decoded. While we can discard the encoder after a single use, the decoder will be used several times, until we have generated every word that we need.

Let's see a concrete example with translation language modeling, also called transduction: the act of translating a sequence. Here, we would like to translate the English sequence "Welcome to NYC" into French. We're using a transformer model that is trained explicitly for that task. We use the encoder to create a representation of the English sentence. We cast this to the decoder and, with the use of the start-of-sequence word, we ask it to output the first word. It outputs "Bienvenue", which means "Welcome". We then use "Bienvenue" as the input sequence for the decoder. This, alongside the feature vector, allows the decoder to predict the second word, "à", which is "to" in English. Finally, we ask the decoder to predict a third word; it predicts "NYC", which is, once again, correct. We've translated the sentence!

Where the encoder-decoder really shines is that we have an encoder and a decoder, which often do not share weights. We therefore have an entire block (the encoder) that can be trained to understand the sequence and extract the relevant information. For the translation scenario we've seen earlier, this would mean parsing and understanding what was said in the English language, extracting information from that language, and putting all of that into a dense, information-rich vector. On the other hand, we have the decoder, whose sole purpose is to decode the features output by the encoder. This decoder can be specialized in a completely different language, or even a different modality, like images or speech.

Encoder-decoders are special for several reasons. Firstly, they're able to handle sequence-to-sequence tasks, like the translation we have just seen. Secondly, the weights of the encoder and decoder parts are not necessarily shared. Let's take another example of translation. Here we're translating "Transformers are powerful" into French. Firstly, this means that from a sequence of three words, we're able to generate a sequence of four words. One could argue that this could be handled with a decoder alone, which would generate the translation in an auto-regressive manner, and they would be right!
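For reference, the "Welcome to NYC" translation walked through above can be reproduced with the high-level generate() API, which runs the encoder once and the decoder auto-regressively under the hood. The Helsinki-NLP/opus-mt-en-fr checkpoint is just one example of a translation-specific encoder-decoder model, and the exact output may vary.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Illustrative checkpoint: an English-to-French translation model with
    # separate encoder and decoder weights.
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr")

    inputs = tokenizer("Welcome to NYC", return_tensors="pt")
    # generate() encodes the input once, then decodes auto-regressively until
    # the end-of-sequence token (or the length cap) is reached.
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # e.g. "Bienvenue à NYC"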
Another example of where sequence-to-sequence transformers shine is summarization. Here we have a very long sequence, generally a full text, and we want to summarize it. Since the encoder and the decoder are separate, we can give them different context lengths: for example, a very long context for the encoder, which handles the text, and a smaller context for the decoder, which handles the summarized sequence.

There are a lot of sequence-to-sequence models. This slide shows a few examples of popular encoder-decoder models available in the transformers library. Additionally, you can load an encoder and a decoder inside an encoder-decoder model! Therefore, depending on the specific task you are targeting, you may choose to use specific encoders and decoders which have proven their worth on those specific tasks (a sketch of this follows below).

This wraps things up for the encoder-decoders. Thanks for watching!
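As a sketch of loading an encoder and a decoder into a single encoder-decoder model, the transformers library lets you warm-start one from two standalone checkpoints. Using bert-base-uncased for both halves is an illustrative assumption based on the library's documented pattern; any compatible encoder and decoder checkpoints could be combined, and the result still needs fine-tuning on the target task (for example, summarization) before it is useful.

    from transformers import BertTokenizer, EncoderDecoderModel

    # Combine a standalone encoder and a standalone decoder into one
    # encoder-decoder model. Using bert-base-uncased for both halves is an
    # illustrative choice; other compatible checkpoints could be mixed.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased",  # encoder: reads and understands the input text
        "bert-base-uncased",  # decoder: generates the output sequence
    )

    # The decoder needs to know which token marks the start of a sequence and
    # which one is used for padding before the model can be fine-tuned.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    # The cross-attention weights connecting the two halves are newly
    # initialized, so the combined model must be fine-tuned before it
    # produces useful outputs.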
