Subtitles for 006: Transformer Encoder Models (Transformer编码器模型)

In this video, we'll study the encoder architecture. An example of a popular encoder-only architecture is BERT, the most popular model of its kind.

Let's first start by understanding how it works. We'll use a small example with three words. We use these as inputs and pass them through the encoder, and we retrieve a numerical representation of each word. Here, for example, the encoder converts the three words "Welcome to NYC" into three sequences of numbers. The encoder outputs exactly one sequence of numbers per input word. This numerical representation can also be called a "feature vector" or "feature tensor".

Let's dive into this representation. It contains one vector per word that was passed through the encoder. Each of these vectors is a numerical representation of the word in question. The dimension of that vector is defined by the architecture of the model; for the base BERT model, it is 768. These representations contain the value of a word, but contextualized. For example, the vector attributed to the word "to" isn't the representation of only the word "to". It also takes into account the words around it, which we call the "context": the encoder looks at the left context, the word on the left of the one we're studying (here "Welcome"), and the context on the right (here "NYC"), and outputs a value for the word within its context. It is therefore a contextualized value. One could say that the vector of 768 values holds the "meaning" of that word in the text. The encoder does this thanks to the self-attention mechanism.

The self-attention mechanism relates different positions (or different words) in a single sequence in order to compute a representation of that sequence. As we've seen before, this means that the resulting representation of a word has been affected by the other words in the sequence.

We won't dive into the specifics here, but we'll offer some further reading if you want to get a better understanding of what happens under the hood. So when should one use an encoder? Encoders can be used as standalone models in a wide variety of tasks. For example, BERT, arguably the most famous Transformer model, is a standalone encoder model, and at the time of its release it beat the state of the art in many sequence classification tasks, question answering tasks, and masked language modeling, to name only a few.
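As a small aside, here is a minimal sketch of how one might retrieve such feature vectors with the Hugging Face `transformers` library. The checkpoint name `bert-base-uncased` is an assumption for illustration, and note that in practice the tokenizer splits the input into subword tokens and adds special tokens, so the number of vectors can differ from the number of words.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint for illustration: the base (uncased) BERT encoder.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# Tokenize the example sentence and run it through the encoder.
inputs = tokenizer("Welcome to NYC", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextualized feature vector per token, each with 768 values
# for the base BERT model: shape (batch_size, sequence_length, 768).
print(outputs.last_hidden_state.shape)
```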
The idea is that encoders are very powerful at extracting vectors that carry meaningful information about a sequence. These vectors can then be handled down the road by additional layers of neurons to make sense of them.

Let's take a look at some examples where encoders really shine.

First of all, Masked Language Modeling, or MLM. It's the task of predicting a hidden word in a sequence of words. Here, for example, we have hidden the word between "My" and "is". This is one of the objectives with which BERT was trained: it was trained to predict hidden words in a sequence. Encoders shine in this scenario in particular, as bidirectional information is crucial here. If we didn't have the words on the right ("is", "Sylvain", and the period), there would be very little chance that BERT could have identified "name" as the correct word. The encoder needs a good understanding of the sequence in order to predict a masked word: even if the text is grammatically correct, it does not necessarily make sense in the context of the sequence.

As mentioned earlier, encoders are good at sequence classification. Sentiment analysis is an example of a sequence classification task. The model's aim is to identify the sentiment of a sequence; it can range from giving a sequence a rating from one to five stars when doing review analysis, to giving a positive or negative rating to a sequence, which is what is shown here. For example, given the two sequences, we use the model to compute a prediction and to classify the sequences into these two classes: positive and negative. While the two sequences are very similar, containing the same words, the meaning is different, and the encoder model is able to grasp that difference.
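To make these two examples concrete, here is a small sketch using the `pipeline` API from the `transformers` library. The masked-language-modeling example reuses the `bert-base-uncased` checkpoint assumed above; the sentiment-analysis pipeline falls back to its default fine-tuned encoder checkpoint when no model is specified; and the two sentiment sentences are assumptions chosen to mirror the "same words, different meaning" example in the video.

```python
from transformers import pipeline

# Masked language modeling: BERT predicts the hidden word using
# context on both sides of the [MASK] token.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("My [MASK] is Sylvain.", top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))

# Sequence classification: sentiment analysis with the pipeline's
# default fine-tuned encoder checkpoint. The two sentences below are
# illustrative: they contain the same words but opposite meanings.
classifier = pipeline("sentiment-analysis")
results = classifier([
    "I love this, it is not bad at all.",
    "I do not love this, it is bad.",
])
for result in results:
    print(result["label"], round(result["score"], 3))
```

With the bidirectional context available, the top predictions for the masked slot should typically include "name", which is exactly the behavior described above.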
