English transcript for 004 什么是迁移学习 (What is Transfer Learning?)

What is transfer learning?

The idea of transfer learning is to leverage the knowledge acquired by a model trained with lots of data on another task. A model A is trained specifically for task A. Now, say you want to train a model B for a different task. One option would be to train model B from scratch, but that could take lots of computation, time, and data. Instead, we can initialize model B with the same weights as model A, transferring the knowledge of model A to task B.

When training from scratch, all of the model's weights are initialized randomly. In this example, we are training a BERT model on the task of recognizing whether two sentences are similar or not. On the left, it is trained from scratch; on the right, a pretrained model is fine-tuned. As we can see, using transfer learning and the pretrained model yields better results. And it doesn't matter how long we train: training from scratch is capped at around 70% accuracy, while the pretrained model easily passes 86%. This is because pretrained models are usually trained on large amounts of data that give the model a statistical understanding of the language used during pretraining.

In computer vision, transfer learning has been applied successfully for almost ten years. Models are frequently pretrained on ImageNet, a dataset containing 1.2 million images, each classified with one of 1,000 labels. Training like this, on labeled data, is called supervised learning.

In natural language processing, transfer learning is a bit more recent. A key difference from ImageNet is that the pretraining is usually self-supervised, which means it doesn't require human annotations for the labels. A very common pretraining objective is to guess the next word in a sentence, which only requires lots and lots of text. GPT-2, for instance, was pretrained this way using the content of 45 million links posted by users on Reddit. Another example of a self-supervised pretraining objective is to predict the values of randomly masked words, which is similar to the fill-in-the-blank tests you may have done in school. BERT was pretrained this way using English Wikipedia and 11,000 unpublished books.
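As an illustration of those two pretraining objectives (not part of the original video), here is a minimal sketch using the Hugging Face transformers pipelines; the checkpoints "gpt2" and "bert-base-uncased" are the usual public ones, and the prompts are invented for the example:

    from transformers import pipeline

    # Causal language modeling: guess the next word(s), the objective GPT-2 was pretrained with.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Transfer learning is", max_new_tokens=10))

    # Masked language modeling: fill in a masked word, the objective BERT was pretrained with.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    print(fill_mask("Transfer learning reuses the [MASK] of a pretrained model."))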
In practice, transfer learning is applied to a given model by throwing away its head, that is, the last layers focused on the pretraining objective, and replacing it with a new, randomly initialized head suitable for the task at hand. For instance, when we fine-tuned a BERT model earlier, we removed the head that classified masked words and replaced it with a classifier with two outputs, since our task had two labels. To be as efficient as possible, the pretrained model used should be as similar as possible to the task it's fine-tuned on. For instance, if the task is to classify German sentences, it's best to use a German pretrained model.

But with the good comes the bad. The pretrained model does not only transfer its knowledge, but also any bias it may contain. ImageNet mostly contains images coming from the United States and Western Europe, so models fine-tuned on it usually perform better on images from these regions. OpenAI also studied the bias in the predictions of its GPT-3 model (which was pretrained using the guess-the-next-word objective). Changing the gender of the prompt from "He was very" to "She was very" changed the predictions from mostly neutral adjectives to almost only physical ones. In the model card of the GPT-2 model, OpenAI also acknowledges its bias and discourages its use in systems that interact with humans.
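As a hedged sketch of what "replacing the head" looks like in code (assuming the transformers library and the public "bert-base-uncased" checkpoint; for a German task you would pick a German checkpoint instead):

    from transformers import AutoModelForSequenceClassification

    # Load the pretrained BERT body and attach a new, randomly initialized
    # classification head with two outputs (similar / not similar).
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=2,
    )
    # The library typically warns that the classifier weights are newly initialized:
    # they have to be fine-tuned on the downstream task before the model is useful.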
