Subtitles for 010 What happens inside the pipeline function? (TensorFlow)

What happens inside the pipeline function?

In this video, we will look at what actually happens when we use the pipeline function of the Transformers library. More specifically, we will look at the sentiment analysis pipeline, and how it goes from the following two sentences to the positive labels with their respective scores.

As we have seen in the pipeline presentation, there are three stages in the pipeline. First, we convert the raw texts to numbers the model can make sense of, using a tokenizer. Then, those numbers go through the model, which outputs logits. Finally, the post-processing step transforms those logits into labels and scores.

Let's look in detail at those three steps, and how to replicate them using the Transformers library, beginning with the first stage, tokenization. The tokenization process has several steps. First, the text is split into small chunks called tokens. They can be words, parts of words, or punctuation symbols. Then the tokenizer will add some special tokens (if the model expects them). Here the model expects a CLS token at the beginning and a SEP token at the end of the sentence to classify. Lastly, the tokenizer matches each token to its unique ID in the vocabulary of the pretrained model. To load such a tokenizer, the Transformers library provides the AutoTokenizer API.

The most important method of this class is from_pretrained, which will download and cache the configuration and the vocabulary associated with a given checkpoint. Here, the checkpoint used by default for the sentiment analysis pipeline is distilbert-base-uncased-finetuned-sst-2-english.

We instantiate a tokenizer associated with that checkpoint, then feed it the two sentences. Since those two sentences are not the same size, we will need to pad the shorter one to be able to build an array. This is done by the tokenizer with the option padding=True. With truncation=True, we ensure that any sentence longer than the maximum the model can handle is truncated. Lastly, the return_tensors option tells the tokenizer to return a TensorFlow tensor.

Looking at the result, we see we have a dictionary with two keys. The first, input_ids, contains the IDs of both sentences, with 0s where padding is applied. The second key, attention_mask, indicates where padding has been applied, so the model does not pay attention to it. This is everything that happens inside the tokenization step.
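A minimal sketch of this tokenization step, assuming the distilbert-base-uncased-finetuned-sst-2-english checkpoint named above; the two example sentences are placeholders, not necessarily the ones used in the video:

    from transformers import AutoTokenizer

    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Placeholder sentences standing in for the two sentences shown in the video.
    raw_inputs = [
        "I've been waiting for this course my whole life.",
        "I hate this so much!",
    ]

    # padding=True pads the shorter sentence, truncation=True cuts anything longer
    # than the model's maximum length, return_tensors="tf" gives TensorFlow tensors.
    inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
    print(inputs["input_ids"])       # token IDs, with 0s where padding was applied
    print(inputs["attention_mask"])  # 1 for real tokens, 0 for padding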
Now let's have a look at the second step, the model. As with the tokenizer, there is a TFAutoModel API, with a from_pretrained method. It will download and cache the configuration of the model as well as the pretrained weights. However, the TFAutoModel API will only instantiate the body of the model, that is, the part of the model that is left once the pretraining head is removed. It will output a high-dimensional tensor that is a representation of the sentences passed, but which is not directly useful for our classification problem. Here the tensor has two sentences, each of sixteen tokens, and the last dimension is the hidden size of our model, 768.

To get an output linked to our classification problem, we need to use the TFAutoModelForSequenceClassification class. It works exactly like the TFAutoModel class, except that it will build a model with a classification head. There is one auto class for each common NLP task in the Transformers library. Here, after giving our model the two sentences, we get a tensor of size two by two: one result for each sentence and for each possible label. Those outputs are not probabilities yet (we can see they don't sum to 1). This is because each model of the Transformers library returns logits.

To make sense of those logits, we need to dig into the third and last step of the pipeline: post-processing. To convert logits into probabilities, we need to apply a SoftMax layer to them. As we can see, this transforms them into positive numbers that sum up to 1. The last step is to know which of those corresponds to the positive or the negative label. This is given by the id2label field of the model config. The first probabilities (index 0) correspond to the negative label, and the second ones (index 1) correspond to the positive label. This is how our classifier built with the pipeline function picked those labels and computed those scores. Now that you know how each step works, you can easily tweak them to your needs.
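To recap the model and post-processing steps in code, here is a minimal sketch that reuses the inputs dictionary from the tokenizer sketch above; the printed shapes and label names depend on the checkpoint and sentences and are only indicative:

    import tensorflow as tf
    from transformers import TFAutoModel, TFAutoModelForSequenceClassification

    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

    # Body only: outputs hidden states of shape (batch size, sequence length, hidden size).
    base_model = TFAutoModel.from_pretrained(checkpoint)
    print(base_model(inputs).last_hidden_state.shape)

    # Body + classification head: one logit per sentence and per label.
    model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)
    logits = model(inputs).logits
    print(logits)  # raw scores, do not sum to 1

    # Post-processing: softmax turns logits into probabilities,
    # id2label maps each index to its label name.
    probabilities = tf.math.softmax(logits, axis=-1)
    print(probabilities)
    print(model.config.id2label)  # e.g. {0: 'NEGATIVE', 1: 'POSITIVE'} for this checkpoint

Under these assumptions, the value at index 1 of each row is the probability of the positive label, which is the score the pipeline reports when it picks that label.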
