009 What happens inside the pipeline function? (PyTorch)

What happens inside the pipeline function? In this video, we will look at what actually happens when we use the pipeline function of the Transformers library. More specifically, we will look at the sentiment-analysis pipeline, and how it went from the two following sentences to the positive labels with their respective scores.

As we saw in the pipeline presentation, there are three stages in the pipeline. First, we convert the raw texts to numbers the model can make sense of, using a tokenizer. Then, those numbers go through the model, which outputs logits. Finally, the post-processing step transforms those logits into labels and scores.

Let's look in detail at those three steps, and how to replicate them using the Transformers library, beginning with the first stage, tokenization.

The tokenization process has several steps. First, the text is split into small chunks called tokens. They can be words, parts of words, or punctuation symbols. Then the tokenizer will add some special tokens (if the model expects them). Here the model expects a CLS token at the beginning and a SEP token at the end of the sentence to classify. Lastly, the tokenizer matches each token to its unique ID in the vocabulary of the pretrained model.

To load such a tokenizer, the Transformers library provides the AutoTokenizer API. The most important method of this class is from_pretrained, which will download and cache the configuration and the vocabulary associated with a given checkpoint. Here, the checkpoint used by default for the sentiment-analysis pipeline is distilbert-base-uncased-finetuned-sst-2-english. We instantiate a tokenizer associated with that checkpoint, then feed it the two sentences. Since those two sentences are not the same size, we will need to pad the shorter one to be able to build an array. This is done by the tokenizer with the option padding=True. With truncation=True, we ensure that any sentence longer than the maximum the model can handle is truncated. Lastly, the return_tensors option tells the tokenizer to return a PyTorch tensor.

Looking at the result, we see we have a dictionary with two keys. input_ids contains the IDs of both sentences, with 0s where padding is applied. The second key, attention_mask, indicates where padding has been applied, so the model does not pay attention to it. That is all there is to the tokenization step.
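A minimal sketch of this tokenization step, assuming the default checkpoint named above; the two raw sentences are placeholders standing in for the ones shown in the video:

    from transformers import AutoTokenizer

    # Default checkpoint behind the sentiment-analysis pipeline
    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Placeholder sentences standing in for the two used in the video
    raw_inputs = [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]

    # padding=True pads the shorter sentence, truncation=True cuts anything
    # longer than the model's maximum length, return_tensors="pt" asks for
    # PyTorch tensors
    inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
    print(inputs)  # dictionary with the keys input_ids and attention_mask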
Now let's have a look at the second step, the model. As for the tokenizer, there is an AutoModel API, with a from_pretrained method. It will download and cache the configuration of the model as well as the pretrained weights. However, the AutoModel API will only instantiate the body of the model, that is, the part of the model that is left once the pretraining head is removed. It will output a high-dimensional tensor that is a representation of the sentences passed, but which is not directly useful for our classification problem. Here the tensor has two sentences, each of sixteen tokens, and the last dimension is the hidden size of our model, 768.

To get an output linked to our classification problem, we need to use the AutoModelForSequenceClassification class. It works exactly like the AutoModel class, except that it will build a model with a classification head. There is one auto class for each common NLP task in the Transformers library. Here, after giving our model the two sentences, we get a tensor of size two by two: one result for each sentence and for each possible label. Those outputs are not probabilities yet (we can see they don't sum to 1). This is because each model of the Transformers library returns logits. To make sense of those logits, we need to dig into the third and last step of the pipeline: post-processing.

To convert logits into probabilities, we need to apply a softmax layer to them. As we can see, this transforms them into positive numbers that sum up to 1. The last step is to know which of those corresponds to the positive or the negative label. This is given by the id2label field of the model config. The first probabilities (index 0) correspond to the negative label, and the second ones (index 1) correspond to the positive label. This is how our classifier built with the pipeline function picked those labels and computed those scores. Now that you know how each step works, you can easily tweak them to your needs.
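A sketch of the model and post-processing steps, continuing from the tokenizer snippet above (it assumes the checkpoint and inputs variables defined there):

    import torch
    from transformers import AutoModel, AutoModelForSequenceClassification

    # Body-only model: outputs hidden states of shape
    # (batch size, sequence length, hidden size)
    model = AutoModel.from_pretrained(checkpoint)
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # e.g. torch.Size([2, 16, 768])

    # Model with a sequence-classification head: one logit per label
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    outputs = model(**inputs)
    print(outputs.logits)  # shape (2, 2); raw logits, not probabilities

    # Post-processing: softmax turns the logits into probabilities summing to 1
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    print(predictions)

    # id2label maps each index to its label (here index 0: NEGATIVE, index 1: POSITIVE)
    print(model.config.id2label)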
