Transcript: 02_simplified-cost-function-for-logistic-regression.en

In the last video, you saw the loss function and the cost function for logistic regression. In this video, you'll see a slightly simpler way to write out the loss and cost functions, so that the implementation can be a bit simpler when we get to gradient descent for fitting the parameters of a logistic regression model. Let's take a look.

As a reminder, here is the loss function that we had defined in the previous video for logistic regression. Because we're still working on a binary classification problem, y is either zero or one. Because y is either zero or one and cannot take on any value other than zero or one, we'll be able to come up with a simpler way to write this loss function. You can write the loss function as follows: given a prediction f of x and the target label y, the loss equals negative y times log of f, minus 1 minus y times log of 1 minus f. It turns out this equation, which we just wrote in one line, is completely equivalent to this more complex formula up here. Let's see why this is the case.

Remember, y can only take on the values of either one or zero. In the first case, let's say y equals 1. This first y over here is one, and this 1 minus y is 1 minus 1, which is therefore equal to 0. So the loss becomes negative 1 times log of f of x, minus 0 times a bunch of stuff. That becomes zero and goes away. When y is equal to 1, the loss is indeed the first term on top, negative log of f of x.

Let's look at the second case, when y is equal to 0. In this case, this y here is equal to 0, so this first term goes away, and the second term is 1 minus 0 times that logarithmic term. The loss becomes negative 1 times log of 1 minus f of x. That's just equal to this second term up here.
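To make the case analysis concrete, here is a minimal Python sketch (the function names and sample values are illustrative assumptions, not code from the course) checking that the single-expression loss matches the two-case form for both y = 0 and y = 1:

```python
import numpy as np

def logistic_loss(f, y):
    # Single-expression form: -y*log(f) - (1 - y)*log(1 - f)
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

def logistic_loss_piecewise(f, y):
    # Two-case form from the previous video
    return -np.log(f) if y == 1 else -np.log(1 - f)

# For y in {0, 1}, the two forms agree for any prediction f in (0, 1).
for y in (0, 1):
    for f in (0.1, 0.5, 0.9):
        assert np.isclose(logistic_loss(f, y), logistic_loss_piecewise(f, y))
```

When y is 1 only the first term survives, and when y is 0 only the second term survives, which is exactly the argument above.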
In the case of y equals 0, we also get back the original loss function as defined above. What you see is that whether y is one or zero, this single expression is equivalent to the more complex expression up here, which is why this gives us a simpler way to write the loss with just one equation, without separating out the two cases like we did on top.

Using this simplified loss function, let's go back and write out the cost function for logistic regression. Here again is the simplified loss function. Recall that the cost J is just the average loss, averaged across the entire training set of m examples. So it's 1 over m times the sum of the loss from i equals 1 to m. If you plug in the definition of the simplified loss from above, then it looks like this: 1 over m times the sum of this term above. If you bring the negative signs out and move them outside, then you end up with this expression over here, and this is the cost function: the cost function that pretty much everyone uses to train logistic regression.

You might be wondering, why do we choose this particular function when there could be tons of other cost functions we could have chosen? Although we won't have time to go into great detail on this in this class, I'd just like to mention that this particular cost function is derived from statistics, using a statistical principle called maximum likelihood estimation, which is an idea from statistics on how to efficiently find parameters for different models. This cost function has the nice property that it is convex. But don't worry about learning the details of maximum likelihood; it's just a deeper rationale and justification behind this particular cost function.

The upcoming optional lab will show you how the logistic cost function is implemented in code.
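As a rough sketch of what such an implementation might look like in NumPy (the function signature, variable names, and array shapes here are my assumptions, not necessarily the lab's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_cost(X, y, w, b):
    """Average logistic loss over the m training examples.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    w: (n,) weight vector, b: scalar bias.
    """
    m = X.shape[0]
    f = sigmoid(X @ w + b)                           # predictions f(x) for all m examples
    loss = -y * np.log(f) - (1 - y) * np.log(1 - f)  # simplified per-example loss
    return loss.sum() / m                            # 1/m times the sum of the losses
```

Because the per-example losses are averaged, the cost stays on the same scale no matter how many training examples you have.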
I recommend taking a look at it, because you'll implement this yourself in the practice lab at the end of the week. This upcoming optional lab also shows you how two different choices of the parameters will lead to different cost calculations. You can see in the plot that the better-fitting blue decision boundary has a lower cost relative to the magenta decision boundary.

So with the simplified cost function, we're now ready to jump into applying gradient descent to logistic regression. Let's go see that in the next video.
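For reference, here is a tiny sketch of the kind of comparison the optional lab makes, evaluating the cost for two hypothetical parameter choices on made-up one-dimensional data (all numbers and names here are illustrative assumptions, not taken from the lab):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return (-y * np.log(f) - (1 - y) * np.log(1 - f)).mean()

# Toy 1-D data: examples below x = 3.5 are labeled 0, examples above are labeled 1.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1])

print(cost(X, y, w=np.array([2.0]), b=-7.0))   # boundary near x = 3.5, fits well  -> cost ~0.15
print(cost(X, y, w=np.array([-1.0]), b=0.0))   # points the wrong way, fits poorly -> cost ~1.9
```

A parameter choice whose decision boundary separates the data well gives a much lower cost, which is the behavior the lab's plot illustrates.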
