1
00:00:00,680 --> 00:00:04,380
In the last video you saw
the loss function and
2
00:00:04,380 --> 00:00:07,559
the cost function for
logistic regression.
3
00:00:07,559 --> 00:00:09,300
In this video you'll see
4
00:00:09,300 --> 00:00:11,400
a slightly simpler way to write
5
00:00:11,400 --> 00:00:13,710
out the loss and cost functions,
6
00:00:13,710 --> 00:00:15,315
so that the implementation
7
00:00:15,315 --> 00:00:17,160
can be a bit simpler
when we get to
8
00:00:17,160 --> 00:00:18,960
gradient descent for fitting
9
00:00:18,960 --> 00:00:22,080
the parameters of a
logistic regression model.
10
00:00:22,080 --> 00:00:24,645
Let's take a look.
As a reminder,
11
00:00:24,645 --> 00:00:27,495
here is the loss function
that we had defined
12
00:00:27,495 --> 00:00:30,915
in the previous video
for logistic regression.
13
00:00:30,915 --> 00:00:33,030
Because we're still working on
14
00:00:33,030 --> 00:00:34,920
a binary classification problem,
15
00:00:34,920 --> 00:00:37,770
y is either zero or one.
16
00:00:37,770 --> 00:00:41,430
Because y is either zero or
17
00:00:41,430 --> 00:00:45,440
one and cannot take on any
value other than zero or one,
18
00:00:45,440 --> 00:00:47,630
we'll be able to come
up with a simpler way
19
00:00:47,630 --> 00:00:50,030
to write this loss function.
20
00:00:50,030 --> 00:00:53,285
You can write the loss
function as follows.
21
00:00:53,285 --> 00:00:57,590
Given a prediction f of x
and the target label y,
22
00:00:57,590 --> 00:01:04,805
the loss equals negative
y times log of f minus
23
00:01:04,805 --> 00:01:07,865
1 minus y times log of
24
00:01:07,865 --> 00:01:12,890
1 minus f. It turns
out this equation,
25
00:01:12,890 --> 00:01:14,930
which we just wrote in one line,
26
00:01:14,930 --> 00:01:17,120
is completely equivalent to
27
00:01:17,120 --> 00:01:20,180
this more complex
formula up here.
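In symbols (writing the prediction f of x simply as f, and the target label as y), the one-line loss just described is:

$$L(f, y) = -\,y\,\log(f) \;-\; (1 - y)\,\log(1 - f)$$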
28
00:01:20,180 --> 00:01:23,130
Let's see why this is the case.
29
00:01:23,150 --> 00:01:26,840
Remember, y can only take on
30
00:01:26,840 --> 00:01:31,120
the values of
either one or zero.
31
00:01:31,120 --> 00:01:33,350
In the first case,
32
00:01:33,350 --> 00:01:35,690
let's say y equals 1.
33
00:01:35,690 --> 00:01:38,915
This first y over here is
34
00:01:38,915 --> 00:01:43,780
one and this 1 minus
y is 1 minus 1,
35
00:01:43,780 --> 00:01:46,205
which is therefore equal to 0.
36
00:01:46,205 --> 00:01:50,615
So the loss becomes
negative 1 times log
37
00:01:50,615 --> 00:01:54,980
of f of x minus 0 times
a bunch of stuff.
38
00:01:54,980 --> 00:01:58,240
That becomes zero and goes away.
39
00:01:58,240 --> 00:02:00,895
When y is equal to 1,
40
00:02:00,895 --> 00:02:04,130
the loss is indeed the
first term on top,
41
00:02:04,130 --> 00:02:07,450
negative log of f of x.
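In symbols, substituting y = 1 makes the second term vanish, leaving only the first:

$$L(f, 1) = -(1)\log(f) - (1 - 1)\log(1 - f) = -\log(f)$$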
42
00:02:07,450 --> 00:02:09,990
Let's look at the second case,
43
00:02:09,990 --> 00:02:13,605
when y is equal to 0.
44
00:02:13,605 --> 00:02:18,500
In this case, this y
here is equal to 0,
45
00:02:18,500 --> 00:02:21,350
so this first term goes away,
46
00:02:21,350 --> 00:02:23,825
and the second term is
47
00:02:23,825 --> 00:02:29,010
1 minus 0 times that
logarithmic term.
48
00:02:29,450 --> 00:02:36,090
The loss becomes this
negative 1 times log
49
00:02:36,090 --> 00:02:38,535
of 1 minus f of x.
50
00:02:38,535 --> 00:02:43,550
That's just equal to this
second term up here.
51
00:02:43,550 --> 00:02:47,465
In the case of y equals 0,
52
00:02:47,465 --> 00:02:49,070
we also get back
53
00:02:49,070 --> 00:02:52,735
the original loss function
as defined above.
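In symbols, substituting y = 0 removes the first term instead:

$$L(f, 0) = -(0)\log(f) - (1 - 0)\log(1 - f) = -\log(1 - f)$$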
54
00:02:52,735 --> 00:02:57,685
What you see is that
whether y is one or zero,
55
00:02:57,685 --> 00:02:59,960
this single expression here is
56
00:02:59,960 --> 00:03:03,110
equivalent to the more
complex expression up here,
57
00:03:03,110 --> 00:03:04,700
which is why this gives us
58
00:03:04,700 --> 00:03:06,950
a simpler way to
write the loss with
59
00:03:06,950 --> 00:03:09,320
just one equation
without separating
60
00:03:09,320 --> 00:03:13,135
out these two cases,
like we did on top.
61
00:03:13,135 --> 00:03:16,440
Using this simplified
loss function,
62
00:03:16,440 --> 00:03:17,810
let's go back and write out
63
00:03:17,810 --> 00:03:20,940
the cost function for
logistic regression.
64
00:03:21,710 --> 00:03:26,700
Here again is the
simplified loss function.
65
00:03:26,700 --> 00:03:31,415
Recall that the cost J is
just the average loss,
66
00:03:31,415 --> 00:03:35,905
averaged across the entire
training set of m examples.
67
00:03:35,905 --> 00:03:37,490
So it's 1 over
68
00:03:37,490 --> 00:03:41,120
m times the sum of the
loss from i equals 1
69
00:03:41,120 --> 00:03:43,760
to m. If you
70
00:03:43,760 --> 00:03:44,960
plug in the definition for
71
00:03:44,960 --> 00:03:46,775
the simplified loss from above,
72
00:03:46,775 --> 00:03:48,395
then it looks like this,
73
00:03:48,395 --> 00:03:52,960
1 over m times the sum
of this term above.
74
00:03:52,960 --> 00:03:57,245
If you take the negative
signs and move them outside,
75
00:03:57,245 --> 00:04:00,425
then you end up with this
expression over here,
76
00:04:00,425 --> 00:04:02,875
and this is the cost function.
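Written out, with f(x^(i)) denoting the model's prediction for the i-th example, the cost is the average of the simplified loss, with the negative sign moved outside the sum:

$$J = \frac{1}{m}\sum_{i=1}^{m} L\big(f(x^{(i)}), y^{(i)}\big) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\big(f(x^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - f(x^{(i)})\big)\Big]$$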
77
00:04:02,875 --> 00:04:05,270
It's the cost function that
pretty much everyone
78
00:04:05,270 --> 00:04:08,495
uses to train
logistic regression.
79
00:04:08,495 --> 00:04:10,655
You might be wondering,
80
00:04:10,655 --> 00:04:13,850
why do we choose this
particular function when
81
00:04:13,850 --> 00:04:15,170
there could be tons of
82
00:04:15,170 --> 00:04:17,710
other cost functions
we could have chosen?
83
00:04:17,710 --> 00:04:19,850
Although we won't
have time to go into
84
00:04:19,850 --> 00:04:22,490
great detail on
this in this class,
85
00:04:22,490 --> 00:04:24,200
I'd just like to mention that
86
00:04:24,200 --> 00:04:27,320
this particular cost
function is derived from
87
00:04:27,320 --> 00:04:30,560
statistics using a
statistical principle
88
00:04:30,560 --> 00:04:33,400
called maximum
likelihood estimation,
89
00:04:33,400 --> 00:04:36,530
which is an idea from
statistics on how to
90
00:04:36,530 --> 00:04:40,195
efficiently find parameters
for different models.
91
00:04:40,195 --> 00:04:43,310
This cost function has
92
00:04:43,310 --> 00:04:46,175
the nice property
that it is convex.
93
00:04:46,175 --> 00:04:48,140
But don't worry about learning
94
00:04:48,140 --> 00:04:50,300
the details of
maximum likelihood.
95
00:04:50,300 --> 00:04:52,580
It's just a deeper rationale and
96
00:04:52,580 --> 00:04:56,910
justification behind this
particular cost function.
97
00:04:57,010 --> 00:05:01,010
The upcoming optional
lab will show you how
98
00:05:01,010 --> 00:05:04,700
the logistic cost function
is implemented in code.
99
00:05:04,700 --> 00:05:07,075
I recommend taking a look at it,
100
00:05:07,075 --> 00:05:09,710
because you'll implement this later
101
00:05:09,710 --> 00:05:13,180
in the practice lab at
the end of the week.
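As a preview of what that lab covers, here is a minimal NumPy sketch of one way this cost could be computed, assuming a linear model passed through a sigmoid; the function and variable names are illustrative, not necessarily the ones the lab uses.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real value into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic(X, y, w, b):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    # w: (n,) weight vector, b: scalar bias.
    m = X.shape[0]
    f = sigmoid(X @ w + b)                                # predictions f_wb(x) for every example
    loss = -y * np.log(f) - (1.0 - y) * np.log(1.0 - f)   # simplified per-example loss
    return np.sum(loss) / m                               # cost J(w, b): average loss over m examples
```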
102
00:05:13,180 --> 00:05:17,225
This upcoming optional
lab also shows you how
103
00:05:17,225 --> 00:05:19,489
two different choices
of the parameters
104
00:05:19,489 --> 00:05:22,540
will lead to different
cost calculations.
105
00:05:22,540 --> 00:05:25,790
You can see in the plot that
106
00:05:25,790 --> 00:05:28,550
the better-fitting blue
decision boundary has
107
00:05:28,550 --> 00:05:33,430
a lower cost relative to the
magenta decision boundary.
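Continuing the illustrative sketch above, with made-up data and parameter values rather than the lab's own, comparing two parameter choices might look like this:

```python
# Hypothetical toy dataset and two candidate parameter settings.
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5],
              [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w = np.array([1.0, 1.0])

print(compute_cost_logistic(X, y, w, b=-3.0))  # boundary x1 + x2 = 3: fits this data better, lower cost (~0.37)
print(compute_cost_logistic(X, y, w, b=-4.0))  # boundary x1 + x2 = 4: fits worse, higher cost (~0.50)
```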
108
00:05:33,430 --> 00:05:36,560
So with the simplified
cost function,
109
00:05:36,560 --> 00:05:38,780
we're now ready to
jump into applying
110
00:05:38,780 --> 00:05:41,555
gradient descent to
logistic regression.
111
00:05:41,555 --> 00:05:44,640
Let's go see that
in the next video.