1
00:00:00,680 --> 00:00:04,380
In the last video you saw
the loss function and
2
00:00:04,380 --> 00:00:07,559
the cost function for
logistic regression.
3
00:00:07,559 --> 00:00:09,300
In this video you'll see
4
00:00:09,300 --> 00:00:11,400
a slightly simpler way to write
5
00:00:11,400 --> 00:00:13,710
out the loss and cost functions,
6
00:00:13,710 --> 00:00:15,315
so that the implementation
7
00:00:15,315 --> 00:00:17,160
can be a bit simpler
when we get to
8
00:00:17,160 --> 00:00:18,960
gradient descent for fitting
9
00:00:18,960 --> 00:00:22,080
the parameters of a
logistic regression model.
10
00:00:22,080 --> 00:00:24,645
Let's take a look.
As a reminder,
11
00:00:24,645 --> 00:00:27,495
here is the loss function
that we had defined
12
00:00:27,495 --> 00:00:30,915
in the previous video
for logistic regression.
13
00:00:30,915 --> 00:00:33,030
Because we're still working on
14
00:00:33,030 --> 00:00:34,920
a binary classification problem,
15
00:00:34,920 --> 00:00:37,770
y is either zero or one.
16
00:00:37,770 --> 00:00:41,430
Because y is either zero or
17
00:00:41,430 --> 00:00:45,440
one and cannot take on any
value other than zero or one,
18
00:00:45,440 --> 00:00:47,630
we'll be able to come
up with a simpler way
19
00:00:47,630 --> 00:00:50,030
to write this loss function.
20
00:00:50,030 --> 00:00:53,285
You can write the loss
function as follows.
21
00:00:53,285 --> 00:00:57,590
Given a prediction f of x
and the target label y,
22
00:00:57,590 --> 00:01:04,805
the loss equals negative
y times log of f minus
23
00:01:04,805 --> 00:01:07,865
1 minus y times log of
24
00:01:07,865 --> 00:01:12,890
1 minus f. It turns
out this equation,
25
00:01:12,890 --> 00:01:14,930
which we just wrote in one line,
26
00:01:14,930 --> 00:01:17,120
is completely equivalent to
27
00:01:17,120 --> 00:01:20,180
this more complex
formula up here.
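In symbols (writing the prediction f of x simply as f, and the target label as y), the one-line loss just described is:

$$L(f, y) = -\,y\,\log(f) \;-\; (1 - y)\,\log(1 - f)$$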
28
00:01:20,180 --> 00:01:23,130
Let's see why this is the case.
29
00:01:23,150 --> 00:01:26,840
Remember, y can only take on
30
00:01:26,840 --> 00:01:31,120
the values of
either one or zero.
31
00:01:31,120 --> 00:01:33,350
In the first case,
32
00:01:33,350 --> 00:01:35,690
let's say y equals 1.
33
00:01:35,690 --> 00:01:38,915
This first y over here is
34
00:01:38,915 --> 00:01:43,780
one and this 1 minus
y is 1 minus 1,
35
00:01:43,780 --> 00:01:46,205
which is therefore equal to 0.
36
00:01:46,205 --> 00:01:50,615
So the loss becomes
negative 1 times log
37
00:01:50,615 --> 00:01:54,980
of f of x minus 0 times
a bunch of stuff.
38
00:01:54,980 --> 00:01:58,240
That becomes zero and goes away.
39
00:01:58,240 --> 00:02:00,895
When y is equal to 1,
40
00:02:00,895 --> 00:02:04,130
the loss is indeed the
first term on top,
41
00:02:04,130 --> 00:02:07,450
negative log of f of x.
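In symbols, substituting y = 1 makes the second term vanish, leaving only the first:

$$L(f, 1) = -(1)\log(f) - (1 - 1)\log(1 - f) = -\log(f)$$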
42
00:02:07,450 --> 00:02:09,990
Let's look at the second case,
43
00:02:09,990 --> 00:02:13,605
when y is equal to 0.
44
00:02:13,605 --> 00:02:18,500
In this case, this y
here is equal to 0,
45
00:02:18,500 --> 00:02:21,350
so this first term goes away,
46
00:02:21,350 --> 00:02:23,825
and the second term is
47
00:02:23,825 --> 00:02:29,010
1 minus 0 times that
logarithmic term.
48
00:02:29,450 --> 00:02:36,090
The loss becomes this
negative 1 times log
49
00:02:36,090 --> 00:02:38,535
of 1 minus f of x.
50
00:02:38,535 --> 00:02:43,550
That's just equal to this
second term up here.
51
00:02:43,550 --> 00:02:47,465
In the case of y equals 0,
52
00:02:47,465 --> 00:02:49,070
we also get back
53
00:02:49,070 --> 00:02:52,735
the original loss function
as defined above.
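In symbols, substituting y = 0 removes the first term instead:

$$L(f, 0) = -(0)\log(f) - (1 - 0)\log(1 - f) = -\log(1 - f)$$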
54
00:02:52,735 --> 00:02:57,685
What you see is that
whether y is one or zero,
55
00:02:57,685 --> 00:02:59,960
this single expression here is
56
00:02:59,960 --> 00:03:03,110
equivalent to the more
complex expression up here,
57
00:03:03,110 --> 00:03:04,700
which is why this gives us
58
00:03:04,700 --> 00:03:06,950
a simpler way to
write the loss with
59
00:03:06,950 --> 00:03:09,320
just one equation
without separating
60
00:03:09,320 --> 00:03:13,135
out these two cases,
like we did on top.
61
00:03:13,135 --> 00:03:16,440
Using this simplified
loss function,
62
00:03:16,440 --> 00:03:17,810
let's go back and write out
63
00:03:17,810 --> 00:03:20,940
the cost function for
logistic regression.
64
00:03:21,710 --> 00:03:26,700
Here again is the
simplified loss function.
65
00:03:26,700 --> 00:03:31,415
Recall that the cost J is
just the average loss,
66
00:03:31,415 --> 00:03:35,905
averaged across the entire
training set of m examples.
67
00:03:35,905 --> 00:03:37,490
So it's 1 over
68
00:03:37,490 --> 00:03:41,120
m times the sum of the
loss from i equals 1
69
00:03:41,120 --> 00:03:43,760
to m. If you
70
00:03:43,760 --> 00:03:44,960
plug in the definition for
71
00:03:44,960 --> 00:03:46,775
the simplified loss from above,
72
00:03:46,775 --> 00:03:48,395
then it looks like this,
73
00:03:48,395 --> 00:03:52,960
1 over m times the sum
of this term above.
74
00:03:52,960 --> 00:03:57,245
If you take the negative
signs and move them outside,
75
00:03:57,245 --> 00:04:00,425
then you end up with this
expression over here,
76
00:04:00,425 --> 00:04:02,875
and this is the cost function.
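Written out, with f(x^(i)) denoting the model's prediction for the i-th example, the cost is the average of the simplified loss, with the negative sign moved outside the sum:

$$J = \frac{1}{m}\sum_{i=1}^{m} L\big(f(x^{(i)}), y^{(i)}\big) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\big(f(x^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - f(x^{(i)})\big)\Big]$$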
77
00:04:02,875 --> 00:04:05,270
It's the cost function that
pretty much everyone
78
00:04:05,270 --> 00:04:08,495
uses to train
logistic regression.
79
00:04:08,495 --> 00:04:10,655
You might be wondering,
80
00:04:10,655 --> 00:04:13,850
why do we choose this
particular function when
81
00:04:13,850 --> 00:04:15,170
there could be tons of
82
00:04:15,170 --> 00:04:17,710
other cost functions
we could have chosen?
83
00:04:17,710 --> 00:04:19,850
Although we won't
have time to go into
84
00:04:19,850 --> 00:04:22,490
great detail on
this in this class,
85
00:04:22,490 --> 00:04:24,200
I'd just like to mention that
86
00:04:24,200 --> 00:04:27,320
this particular cost
function is derived from
87
00:04:27,320 --> 00:04:30,560
statistics using a
statistical principle
88
00:04:30,560 --> 00:04:33,400
called maximum
likelihood estimation,
89
00:04:33,400 --> 00:04:36,530
which is an idea from
statistics on how to
90
00:04:36,530 --> 00:04:40,195
efficiently find parameters
for different models.
91
00:04:40,195 --> 00:04:43,310
This cost function has
92
00:04:43,310 --> 00:04:46,175
the nice property
that it is convex.
93
00:04:46,175 --> 00:04:48,140
But don't worry about learning
94
00:04:48,140 --> 00:04:50,300
the details of
maximum likelihood.
95
00:04:50,300 --> 00:04:52,580
It's just a deeper rationale and
96
00:04:52,580 --> 00:04:56,910
justification behind this
particular cost function.
97
00:04:57,010 --> 00:05:01,010
The upcoming optional
lab will show you how
98
00:05:01,010 --> 00:05:04,700
the logistic cost function
is implemented in code.
99
00:05:04,700 --> 00:05:07,075
I recommend taking a look at it,
100
00:05:07,075 --> 00:05:09,710
because you'll implement this later
101
00:05:09,710 --> 00:05:13,180
in the practice lab at
the end of the week.
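As a preview of what that lab covers, here is a minimal NumPy sketch of one way this cost could be computed, assuming a linear model passed through a sigmoid; the function and variable names are illustrative, not necessarily the ones the lab uses.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real value into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic(X, y, w, b):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    # w: (n,) weight vector, b: scalar bias.
    m = X.shape[0]
    f = sigmoid(X @ w + b)                                # predictions f_wb(x) for every example
    loss = -y * np.log(f) - (1.0 - y) * np.log(1.0 - f)   # simplified per-example loss
    return np.sum(loss) / m                               # cost J(w, b): average loss over m examples
```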
102
00:05:13,180 --> 00:05:17,225
This upcoming optional
lab also shows you how
103
00:05:17,225 --> 00:05:19,489
two different choices
of the parameters
104
00:05:19,489 --> 00:05:22,540
will lead to different
cost calculations.
105
00:05:22,540 --> 00:05:25,790
You can see in the plot that
106
00:05:25,790 --> 00:05:28,550
the better-fitting blue
decision boundary has
107
00:05:28,550 --> 00:05:33,430
a lower cost relative to the
magenta decision boundary.
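Continuing the illustrative sketch above, with made-up data and parameter values rather than the lab's own, comparing two parameter choices might look like this:

```python
# Hypothetical toy dataset and two candidate parameter settings.
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w = np.array([1.0, 1.0])

print(compute_cost_logistic(X, y, w, b=-3.0))  # boundary x1 + x2 = 3: fits this data better, lower cost (~0.37)
print(compute_cost_logistic(X, y, w, b=-4.0))  # boundary x1 + x2 = 4: fits worse, higher cost (~0.50)
```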
108
00:05:33,430 --> 00:05:36,560
So with the simplified
cost function,
109
00:05:36,560 --> 00:05:38,780
we're now ready to
jump into applying
110
00:05:38,780 --> 00:05:41,555
gradient descent to
logistic regression.
111
00:05:41,555 --> 00:05:44,640
Let's go see that
in the next video.