1
00:00:01,040 --> 00:00:05,145
To fit the parameters of a
logistic regression model,
2
00:00:05,145 --> 00:00:06,450
we're going to try to find
3
00:00:06,450 --> 00:00:08,865
the values of the
parameters w and b
4
00:00:08,865 --> 00:00:12,870
that minimize the cost
function J of w and b,
5
00:00:12,870 --> 00:00:15,945
and we'll again apply
gradient descent to do this.
6
00:00:15,945 --> 00:00:18,000
Let's take a look at how.
7
00:00:18,000 --> 00:00:21,165
In this video we'll
focus on how to find
8
00:00:21,165 --> 00:00:24,270
a good choice of the
parameters w and b.
9
00:00:24,270 --> 00:00:26,145
After you've done so,
10
00:00:26,145 --> 00:00:29,115
if you give the model
a new input, x,
11
00:00:29,115 --> 00:00:31,140
say a new patient at
12
00:00:31,140 --> 00:00:34,050
the hospital with a certain
tumor size and age,
13
00:00:34,050 --> 00:00:35,775
then to help make a diagnosis,
14
00:00:35,775 --> 00:00:38,744
the model can
make a prediction,
15
00:00:38,744 --> 00:00:41,070
or it can try to estimate
16
00:00:41,070 --> 00:00:45,600
the probability that
the label y is one.
17
00:00:45,600 --> 00:00:48,290
The algorithm you can
use to minimize
18
00:00:48,290 --> 00:00:50,875
the cost function is
gradient descent.
19
00:00:50,875 --> 00:00:54,270
Here again is the cost function.
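For reference, the cost function referred to here is the binary cross-entropy cost introduced in the previous video; in LaTeX:

J(w,b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\!\left(f_{w,b}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - f_{w,b}(x^{(i)})\right) \right]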
20
00:00:54,270 --> 00:00:56,195
If you want to minimize
21
00:00:56,195 --> 00:00:59,855
the cost J as a
function of w and b,
22
00:00:59,855 --> 00:01:03,755
well, here's the usual
gradient descent algorithm,
23
00:01:03,755 --> 00:01:05,630
where you repeatedly update
24
00:01:05,630 --> 00:01:09,545
each parameter as the
old value minus alpha,
25
00:01:09,545 --> 00:01:14,240
the learning rate times
this derivative term.
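In symbols, the update rule being described is the standard gradient descent rule, applied to every parameter:

repeat until convergence:
    w_j := w_j - \alpha \frac{\partial}{\partial w_j} J(w,b), \quad j = 1, \ldots, n
    b := b - \alpha \frac{\partial}{\partial b} J(w,b)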
26
00:01:14,240 --> 00:01:17,045
Let's take a look
at the derivative
27
00:01:17,045 --> 00:01:19,595
of J with respect to w_j.
28
00:01:19,595 --> 00:01:23,330
This term up on top
here, where as usual,
29
00:01:23,330 --> 00:01:25,490
j goes from one through n,
30
00:01:25,490 --> 00:01:28,210
where n is the
number of features.
31
00:01:28,210 --> 00:01:31,565
If you were to apply
the rules of calculus,
32
00:01:31,565 --> 00:01:35,930
you can show that the derivative
with respect to w_j of
33
00:01:35,930 --> 00:01:38,300
the cost function capital J is
34
00:01:38,300 --> 00:01:41,150
equal to this
expression over here,
35
00:01:41,150 --> 00:01:44,210
which is 1 over m times the sum
36
00:01:44,210 --> 00:01:48,575
from 1 through m of
this error term.
37
00:01:48,575 --> 00:01:56,220
That is, f minus the
label y, times x_j.
38
00:01:56,220 --> 00:01:58,800
Here, x superscript i subscript j is
39
00:01:58,800 --> 00:02:02,520
the j-th feature of
training example i.
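Written out, the derivative described in the narration is:

\frac{\partial}{\partial w_j} J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}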
40
00:02:02,520 --> 00:02:05,690
Now let's also look
at the derivative of
41
00:02:05,690 --> 00:02:08,500
j with respect to
the parameter b.
42
00:02:08,500 --> 00:02:12,575
It turns out to be this
expression over here.
43
00:02:12,575 --> 00:02:15,125
It's quite similar to
the expression above,
44
00:02:15,125 --> 00:02:17,600
except that it is
not multiplied by
45
00:02:17,600 --> 00:02:22,105
this x superscript i
subscript j at the end.
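Written out, this second derivative is the same error sum, just without the feature factor at the end:

\frac{\partial}{\partial b} J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)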
46
00:02:22,105 --> 00:02:24,019
Just as a reminder,
47
00:02:24,019 --> 00:02:26,435
similar to what you saw
for linear regression,
48
00:02:26,435 --> 00:02:28,400
the way to carry out
these updates is
49
00:02:28,400 --> 00:02:30,365
to use simultaneous updates,
50
00:02:30,365 --> 00:02:32,150
meaning that you first
51
00:02:32,150 --> 00:02:34,490
compute the right-hand
side for all of
52
00:02:34,490 --> 00:02:37,505
these updates and
then simultaneously
53
00:02:37,505 --> 00:02:41,970
overwrite all the values on
the left at the same time.
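As a minimal sketch of what a simultaneous update looks like in code (the function and variable names here are illustrative, not taken from the course labs):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent_step(X, y, w, b, alpha):
    # X: (m, n) feature matrix, y: (m,) labels, w: (n,) weights, b: scalar bias
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        err_i = sigmoid(np.dot(X[i], w) + b) - y[i]   # f(x^(i)) - y^(i)
        for j in range(n):
            dj_dw[j] += err_i * X[i, j]
        dj_db += err_i
    dj_dw /= m
    dj_db /= m
    # simultaneous update: all gradients are computed first,
    # then w and b are overwritten at the same time
    w = w - alpha * dj_dw
    b = b - alpha * dj_db
    return w, b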
54
00:02:42,320 --> 00:02:45,890
Let me take these
derivative expressions
55
00:02:45,890 --> 00:02:50,435
here and plug them
into these terms here.
56
00:02:50,435 --> 00:02:56,425
This gives you gradient descent
for logistic regression.
57
00:02:56,425 --> 00:02:59,180
Now, one funny
thing you might be
58
00:02:59,180 --> 00:03:01,940
wondering is, that's weird.
59
00:03:01,940 --> 00:03:03,590
These two equations look
60
00:03:03,590 --> 00:03:05,870
exactly like what
we had come up
61
00:03:05,870 --> 00:03:08,155
with previously for
linear regression
62
00:03:08,155 --> 00:03:09,905
so you might be wondering,
63
00:03:09,905 --> 00:03:11,570
is linear regression actually
64
00:03:11,570 --> 00:03:14,260
secretly the same as
logistic regression?
65
00:03:14,260 --> 00:03:18,755
Well, even though these
equations look the same,
66
00:03:18,755 --> 00:03:22,040
the reason that this is not
linear regression is because
67
00:03:22,040 --> 00:03:26,390
the definition for the
function f of x has changed.
68
00:03:26,390 --> 00:03:28,155
In linear regression,
69
00:03:28,155 --> 00:03:29,400
f of x is
70
00:03:29,400 --> 00:03:31,620
wx plus b.
71
00:03:31,620 --> 00:03:33,590
But in logistic regression,
72
00:03:33,590 --> 00:03:35,300
f of x is defined to be
73
00:03:35,300 --> 00:03:39,745
the sigmoid function
applied to wx plus b.
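Side by side, the two definitions are:

Linear regression:    f_{w,b}(x) = w \cdot x + b
Logistic regression:  f_{w,b}(x) = g(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}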
74
00:03:39,745 --> 00:03:42,830
Although the algorithm
as written looks the
75
00:03:42,830 --> 00:03:46,399
same for both linear regression
and logistic regression,
76
00:03:46,399 --> 00:03:49,070
actually they're two very
different algorithms
77
00:03:49,070 --> 00:03:52,985
because the definition for
f of x is not the same.
78
00:03:52,985 --> 00:03:55,400
When we talked about
gradient descent
79
00:03:55,400 --> 00:03:57,890
for linear regression
previously,
80
00:03:57,890 --> 00:03:59,900
you saw how you can monitor
81
00:03:59,900 --> 00:04:02,870
gradient descent to
make sure it converges.
82
00:04:02,870 --> 00:04:05,750
You can just apply
the same method for
83
00:04:05,750 --> 00:04:09,655
logistic regression to make
sure it also converges.
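One way to do that, as with linear regression, is to compute the cost after every iteration and check that it keeps decreasing. A small sketch, reusing the sigmoid helper from the earlier snippet:

def compute_cost(X, y, w, b):
    # binary cross-entropy cost J(w, b) for logistic regression
    f = sigmoid(X @ w + b)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

# record compute_cost(X, y, w, b) after each gradient descent step and
# plot the values; if the cost ever increases, the learning rate alpha
# is probably too large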
84
00:04:09,655 --> 00:04:13,700
I've written out these
updates as if you're updating
85
00:04:13,700 --> 00:04:18,990
the parameters w_j one
parameter at a time.
86
00:04:19,310 --> 00:04:23,000
Similar to the discussion
87
00:04:23,000 --> 00:04:26,555
on vectorized implementations
of linear regression,
88
00:04:26,555 --> 00:04:29,540
you can also use
vectorization to make
89
00:04:29,540 --> 00:04:33,310
gradient descent run faster
for logistic regression.
90
00:04:33,310 --> 00:04:35,390
I won't dive into the details of
91
00:04:35,390 --> 00:04:37,955
the vectorized implementation
in this video.
92
00:04:37,955 --> 00:04:40,130
But you can also learn
more about it and
93
00:04:40,130 --> 00:04:42,620
see the code in
the optional labs.
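For completeness, a vectorized version of the gradient computation might look like the following; this is only a sketch (again reusing the sigmoid helper from above), not the lab's actual code:

def compute_gradients_vectorized(X, y, w, b):
    # X: (m, n), y: (m,), w: (n,), b: scalar
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y      # error term for all m examples at once
    dj_dw = X.T @ err / m             # (n,) vector of partial derivatives
    dj_db = err.mean()                # scalar partial derivative w.r.t. b
    return dj_dw, dj_db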
94
00:04:42,620 --> 00:04:45,200
Now you know how to implement
95
00:04:45,200 --> 00:04:48,220
gradient descent for
logistic regression.
96
00:04:48,220 --> 00:04:50,840
You might also remember feature
97
00:04:50,840 --> 00:04:54,170
scaling from when we were
using linear regression.
98
00:04:54,170 --> 00:04:56,524
There you saw how
feature scaling,
99
00:04:56,524 --> 00:04:58,160
that is scaling all
the features to
100
00:04:58,160 --> 00:04:59,960
take on similar
ranges of values,
101
00:04:59,960 --> 00:05:02,345
say between negative
1 and plus 1,
102
00:05:02,345 --> 00:05:06,095
can help gradient
descent converge faster.
103
00:05:06,095 --> 00:05:08,675
Feature scaling
applied the same way
104
00:05:08,675 --> 00:05:10,610
to scale the different
features to take on
105
00:05:10,610 --> 00:05:12,860
similar ranges of
values can also speed
106
00:05:12,860 --> 00:05:15,875
up gradient descent for
logistic regression.
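As a sketch, one common choice, z-score normalization, can be applied to logistic regression's inputs exactly as it was for linear regression (the function name is illustrative):

def zscore_normalize(X):
    # scale each feature (column) to have mean 0 and standard deviation 1,
    # so all features take on similar ranges of values
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# note: any new input x must be scaled with the same mu and sigma
# before the model makes a prediction on it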
107
00:05:15,875 --> 00:05:18,395
In the upcoming optional lab,
108
00:05:18,395 --> 00:05:21,370
you'll also see how the gradient
109
00:05:21,370 --> 00:05:25,655
for logistic regression
can be calculated in code.
110
00:05:25,655 --> 00:05:28,390
This will be useful to
look at because you
111
00:05:28,390 --> 00:05:29,620
also implement this in
112
00:05:29,620 --> 00:05:32,245
the practice lab at
the end of this week.
113
00:05:32,245 --> 00:05:35,395
After you run gradient
descent in this lab,
114
00:05:35,395 --> 00:05:36,730
there'll be a nice set of
115
00:05:36,730 --> 00:05:40,010
animated plots that show
gradient descent in action.
116
00:05:40,010 --> 00:05:41,920
You'll see the sigmoid function,
117
00:05:41,920 --> 00:05:44,245
the contour plot of the cost,
118
00:05:44,245 --> 00:05:46,540
the 3D surface plot of the cost,
119
00:05:46,540 --> 00:05:47,710
and the learning curve all
120
00:05:47,710 --> 00:05:50,395
evolve as gradient descent runs.
121
00:05:50,395 --> 00:05:53,200
There will be another
optional lab after that,
122
00:05:53,200 --> 00:05:54,430
which is short and sweet,
123
00:05:54,430 --> 00:05:55,930
but also very useful
124
00:05:55,930 --> 00:05:57,730
because it shows
you how to use
125
00:05:57,730 --> 00:06:00,740
the popular
scikit-learn library to
126
00:06:00,740 --> 00:06:04,430
train the logistic regression
model for classification.
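As a small illustration of the kind of code involved (the tiny dataset here is made up, not the lab's data):

from sklearn.linear_model import LogisticRegression
import numpy as np

# toy dataset: two features per example, binary labels
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)                # fits w and b by minimizing the logistic cost
                               # (with L2 regularization by default)
print(model.predict(X))        # predicted labels for the training examples
print(model.score(X, y))       # accuracy on the training set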
127
00:06:04,430 --> 00:06:07,460
Many machine learning
practitioners in
128
00:06:07,460 --> 00:06:08,840
many companies today use
129
00:06:08,840 --> 00:06:11,660
scikit-learn regularly
as part of their job.
130
00:06:11,660 --> 00:06:13,880
I hope you check out
the scikit-learn
131
00:06:13,880 --> 00:06:15,155
function as well and
132
00:06:15,155 --> 00:06:18,845
take a look at how that
is used. That's it.
133
00:06:18,845 --> 00:06:22,250
You should now know how to
implement logistic regression.
134
00:06:22,250 --> 00:06:24,200
This is a very powerful and very
135
00:06:24,200 --> 00:06:26,375
widely used learning algorithm
136
00:06:26,375 --> 00:06:27,890
and you now know how to get it
137
00:06:27,890 --> 00:06:31,020
to work yourself.
Congratulations.