Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,600 --> 00:00:05,820
Hello welcome back to the course on computer vision we talk about the horror like features the name
2
00:00:05,880 --> 00:00:14,970
Haar comes from Alfred Haar who was a Hungarian mathematician in the 19th century and he developed a
3
00:00:15,240 --> 00:00:24,870
mathematical concept called the haar wavelet which is a basis of way for functions which act similar
4
00:00:24,870 --> 00:00:30,900
to the four year transformation but we're not going to get into detail on that right now because we
5
00:00:30,900 --> 00:00:35,790
don't really need that we just need the horror like features which have since been derived from the
6
00:00:35,790 --> 00:00:42,840
hard wavelet and our that's why they called Horo like features they just kind of like the descendants
7
00:00:42,870 --> 00:00:48,650
of the haar wavelet and so we're going to look at the HARLICK features straight away.
8
00:00:48,840 --> 00:00:50,840
So what are these features.
9
00:00:51,060 --> 00:00:52,520
They are.
10
00:00:52,530 --> 00:00:53,340
Let's first look at them.
11
00:00:53,370 --> 00:00:59,010
They are the edge features the line features and the four rectangle features these are the Horlick features
12
00:00:59,010 --> 00:01:04,460
that Viola Jones male and Jones identified in their research in their paper.
13
00:01:04,590 --> 00:01:07,590
And so what are they.
14
00:01:07,630 --> 00:01:11,220
What does it mean to be a feature.
15
00:01:11,220 --> 00:01:14,100
Basically what we have is pixels.
16
00:01:14,100 --> 00:01:20,220
So on an image you might have somewhere for example like a edge.
17
00:01:20,250 --> 00:01:25,200
Right so an edge of feature like for instance an edge of an image for example just imagine like a photo
18
00:01:25,200 --> 00:01:26,580
of a table or something.
19
00:01:26,700 --> 00:01:33,990
Then part of the table where you can see the table top might be light and or lighter than the part where
20
00:01:33,990 --> 00:01:39,360
the table ends might be darker because it's further away it's the floor or the carpet might be a different
21
00:01:39,360 --> 00:01:40,320
color or something.
22
00:01:40,500 --> 00:01:48,120
And so you can identify that that's the end of the table because you know there's a line of pixels which
23
00:01:48,120 --> 00:01:55,920
are brighter line of pixels which are dark in terms of the age in terms of line features you might you
24
00:01:55,920 --> 00:02:01,620
know might identify some objects and it doesn't have to be just like one line of pixel this could be
25
00:02:01,620 --> 00:02:06,030
like three columns of pixels and then you know 15 rows of pixels.
26
00:02:06,270 --> 00:02:08,210
So these are scalable.
27
00:02:08,520 --> 00:02:11,680
They can be longer or they can be wider and so on.
28
00:02:11,850 --> 00:02:17,160
So let's have a look at how all of these work out in the in terms of a face.
29
00:02:17,260 --> 00:02:17,460
Right.
30
00:02:17,460 --> 00:02:23,760
So here we've got our friend J.D. and let's see if we can identify some Horlick features here and that
31
00:02:23,790 --> 00:02:27,840
will help us understand better why exactly they were picked.
32
00:02:27,840 --> 00:02:31,040
What's a role what purpose they serve.
33
00:02:31,230 --> 00:02:32,310
So have a look here.
34
00:02:32,310 --> 00:02:39,250
So if you look at his mouth you can see here there is a tiny tiny tiny blackline then.
35
00:02:39,290 --> 00:02:43,530
So that line that part is darker than the lips.
36
00:02:43,530 --> 00:02:46,130
So that's a horror like feature.
37
00:02:46,230 --> 00:02:52,220
You can see that that's a feature that's a line feature because that part is darker than that.
38
00:02:52,230 --> 00:02:53,630
Those two parts bottom and top.
39
00:02:53,640 --> 00:03:03,370
And this is all very rough very very very kind of rough approach to this image but nevertheless it works.
40
00:03:03,540 --> 00:03:04,570
And you will see why it works.
41
00:03:04,580 --> 00:03:08,420
Further down as we talk about things like the integral image and so on.
42
00:03:08,430 --> 00:03:13,890
But for now let's just let's just stick to this bar so we kind of like breaking out these features we're
43
00:03:13,890 --> 00:03:15,430
trying to see them in this image.
44
00:03:15,480 --> 00:03:17,040
So what else can we see here.
45
00:03:17,070 --> 00:03:19,560
Can you see any other of the features that we discussed.
46
00:03:19,680 --> 00:03:22,300
If you if we go back we've got the edge features.
47
00:03:22,440 --> 00:03:25,760
So we just identified this one we just identified a line feature.
48
00:03:26,070 --> 00:03:32,450
Can we find maybe an age feature in several There's a line feature they'll look at the eyebrow you can
49
00:03:32,450 --> 00:03:41,390
see that the eyebrow is darker than the forehead so let's put a feature there so you can see that that
50
00:03:41,570 --> 00:03:46,700
signifies the edge of the forehead that line that eyebrow.
51
00:03:46,910 --> 00:03:59,000
And so therefore the hard feature of an edge can serve a as a concern to show where this break this
52
00:03:59,330 --> 00:04:00,510
division happens.
53
00:04:00,860 --> 00:04:05,600
And that's normally going to be the case for in most photos that you process.
54
00:04:05,840 --> 00:04:09,980
Most of the time the eyebrow is going to be darker in the face.
55
00:04:10,060 --> 00:04:15,770
So for you know some some rare instances or some non frequent instances in most cases are going to be
56
00:04:15,770 --> 00:04:19,750
like this and we'll I will talk about how all of these features were together.
57
00:04:19,850 --> 00:04:25,430
So it doesn't have to be like every single time it's is like this but they're going to have a kind of
58
00:04:25,430 --> 00:04:30,020
like a voting system that the more features the more likely it is a face.
59
00:04:30,140 --> 00:04:31,300
Let's have a look at some more.
60
00:04:31,520 --> 00:04:35,350
What can we see on for example on the nose.
61
00:04:35,450 --> 00:04:40,040
Can we see that part of the nose is brighter than the other part.
62
00:04:40,040 --> 00:04:42,400
So you can see here.
63
00:04:42,930 --> 00:04:44,780
Let's see on the left here.
64
00:04:44,800 --> 00:04:45,740
The nose is brighter.
65
00:04:45,740 --> 00:04:48,660
The bridge of the nose is brighter than this part.
66
00:04:48,710 --> 00:04:51,130
And that's normally going to be the case.
67
00:04:51,170 --> 00:04:57,560
So if you look at pretty much any photo in grayscale you will often find that the bridge of the nose
68
00:04:57,710 --> 00:05:02,240
is brighter than either just one side or even both sides.
69
00:05:02,240 --> 00:05:03,110
It depends on the lighting.
70
00:05:03,110 --> 00:05:08,360
Sometimes you might find here like a line type of feature so this part is wide and then you've got dark
71
00:05:08,780 --> 00:05:09,330
and dark.
72
00:05:09,350 --> 00:05:14,390
That's just how the human face reflects light in this case the lights coming from the left.
73
00:05:14,480 --> 00:05:17,500
And that's why the left part is a bit brighter overall.
74
00:05:17,540 --> 00:05:18,360
The right parts are.
75
00:05:18,390 --> 00:05:20,230
But nevertheless you can see this edge.
76
00:05:20,270 --> 00:05:23,160
So here you are nobody have a line or an edge feature.
77
00:05:23,390 --> 00:05:25,880
That's exactly what we're drawing here.
78
00:05:25,880 --> 00:05:28,060
There's our edge feature.
79
00:05:28,840 --> 00:05:30,050
And then let's have a look another one.
80
00:05:30,050 --> 00:05:36,410
You can see the eye for instance because though the outside outside of the outer part of the eye is
81
00:05:36,420 --> 00:05:39,350
white and the inner part is not white.
82
00:05:39,500 --> 00:05:45,740
It means it's dark or so it doesn't have to be like black and white it's black and white would be ideal.
83
00:05:45,740 --> 00:05:50,540
So for instance the eyebrow in the forehead in this case are very very close to black and white and
84
00:05:50,600 --> 00:05:52,540
because of the eye it's less close.
85
00:05:52,550 --> 00:05:58,460
But nevertheless it still works you can see that you can identify that feature inside the eye as you
86
00:05:58,460 --> 00:06:03,020
could could go keep going like that you could find another one for the other eye than eyebrow.
87
00:06:03,200 --> 00:06:08,060
You could you just keep finding these features and some of them would be dependent on the lighting some
88
00:06:08,060 --> 00:06:10,550
of them would be dependent on the person inside.
89
00:06:10,570 --> 00:06:19,580
But overall you'd have kind of like a set of features that are predominantly present on most human faces
90
00:06:19,580 --> 00:06:27,880
or like you have like a set of features most of which are present on any face meaning that if you you
91
00:06:27,890 --> 00:06:33,110
have to have like at least 80 or 90 percent of the features like speaking roughly of course we'll get
92
00:06:33,110 --> 00:06:35,220
into more detail about this further down.
93
00:06:35,420 --> 00:06:39,860
But you'd have a speaker after you'd have to have like 80 or 90 percent of those features actually and
94
00:06:39,860 --> 00:06:44,690
fight for it something to be a face would be very rare that you would have a face that would have only
95
00:06:44,690 --> 00:06:45,770
half of those.
96
00:06:46,070 --> 00:06:50,600
So now let's see how exactly this works in terms of the pixels.
97
00:06:50,720 --> 00:06:56,310
So let's talk about the nose bridge featured or identified this age feature over here.
98
00:06:56,390 --> 00:07:00,480
So if we zoom in a little bit you can see it is a feature.
99
00:07:00,520 --> 00:07:02,520
Now I'm going to draw.
100
00:07:02,780 --> 00:07:11,040
I'm going to show the nose without the feature overlaid but instead we're going to have the pixels so
101
00:07:11,050 --> 00:07:18,540
of course this is a very simplified representation because there are many more pixels in this space.
102
00:07:18,550 --> 00:07:23,050
But just for argument's sake let's say that there's four pixels by eight pixels thirty two pixels in
103
00:07:23,050 --> 00:07:25,130
total and this nose.
104
00:07:25,240 --> 00:07:28,470
And now we're going to understand how the algorithm works.
105
00:07:28,540 --> 00:07:36,100
So let's ride out these pixels and here I've given them values between 0 and 10 there is 0 and 1 0 means
106
00:07:36,100 --> 00:07:36,880
very white.
107
00:07:36,880 --> 00:07:43,950
So 100 percent white 10 mean and 1 means 100 percent black and anywhere in between is greyscale.
108
00:07:43,960 --> 00:07:51,810
And of course greyscale is from 0 to 255 because it's like 256 values and so on.
109
00:07:51,820 --> 00:07:53,740
But we're not going to bother with writing them a lot.
110
00:07:53,740 --> 00:08:00,560
We're just going to normalize them two from 0 to 1 and just keep it that way.
111
00:08:00,940 --> 00:08:02,790
So it will be easy just for us to discuss.
112
00:08:02,840 --> 00:08:08,020
And so here you can look at this image and you can see that this is not 100 percent white.
113
00:08:08,020 --> 00:08:09,210
This would be 100 percent white.
114
00:08:09,370 --> 00:08:10,630
But this is close to white.
115
00:08:10,630 --> 00:08:18,010
So in an image like for argument's sake again this could be like 0.1 this pixel could have like a 0.1
116
00:08:18,010 --> 00:08:24,430
intensity given that zeros and white and one is 100 percent like this pixel is a bit darker.
117
00:08:24,430 --> 00:08:29,320
So is there a point to this pixel is about the same brightness or upon one's report ones or report one
118
00:08:29,560 --> 00:08:31,770
0.2 it's a bit darker over here.
119
00:08:31,870 --> 00:08:35,720
0.3 it's even darker if you compare to that one.
120
00:08:35,740 --> 00:08:37,450
This one is 0.1.
121
00:08:37,450 --> 00:08:42,040
This is a point and so on and so on so you can see that these are they're not all zeros but they're
122
00:08:42,040 --> 00:08:42,700
close.
123
00:08:42,760 --> 00:08:49,540
They're on the side you can see that this one is about 0.4 or somewhere on their 0.5.
124
00:08:49,540 --> 00:08:55,780
This gets darker 0.8 3.6 this was a bit brighter sort of potent force similar to that.
125
00:08:56,350 --> 00:09:01,310
Closer to that than to that and then to that for instance the reported fires are pretty high.
126
00:09:01,360 --> 00:09:03,610
So you can see kind of the values here.
127
00:09:03,610 --> 00:09:11,770
Now what's the algorithm will do what the algorithm will do is now that we've identified all of this
128
00:09:11,770 --> 00:09:17,680
so this is this is where we see the feature we can see that kind of like white and black but this is
129
00:09:17,680 --> 00:09:22,530
what is happening in reality not actually 100 percent black and Harmsen white that somewhere greyscale.
130
00:09:22,720 --> 00:09:25,860
So what the Belgians algorithm will do is it will compare.
131
00:09:25,870 --> 00:09:32,410
How close is this real world tonight or what is happening in the real world to the ideal Haar like feature
132
00:09:32,410 --> 00:09:33,970
that it's looking for the white and black.
133
00:09:33,970 --> 00:09:34,740
How close is it.
134
00:09:34,750 --> 00:09:35,880
So how would it do that.
135
00:09:36,070 --> 00:09:40,740
Well it will take the average of these Vaso First of all it will calculate the sum.
136
00:09:40,750 --> 00:09:45,820
That's the first thing it does it calculates the sum of all these pixels and then divide by the number
137
00:09:45,820 --> 00:09:52,180
of pixels it takes the average intensity of the white pixels and 0.1 6:06 takes the same thing for the
138
00:09:52,180 --> 00:09:57,360
black pixels takes the sum of all of these values and now takes the average.
139
00:09:57,370 --> 00:10:04,240
So as devised by whatever 16 here adds them up to as 16 gets average and this is the average intensity
140
00:10:04,240 --> 00:10:07,240
of all of these pixels Fizer 0.06.
141
00:10:07,480 --> 00:10:13,120
And now it's simply subtract one from the other subtracts this from that.
142
00:10:13,120 --> 00:10:17,680
So the black minus the white is 0.4 02.
143
00:10:17,920 --> 00:10:24,000
So these pixels and minus these pixels are the average minus averages 0.4.
144
00:10:24,490 --> 00:10:31,150
Now so the question is what would this difference be in the ideal scenario.
145
00:10:31,150 --> 00:10:38,890
Kind of the ideal scenario that where you would find exactly this feature where this would be all black
146
00:10:38,890 --> 00:10:40,110
and this will be all white.
147
00:10:40,390 --> 00:10:41,500
Well exactly right.
148
00:10:41,500 --> 00:10:43,510
If it is all black this is all one.
149
00:10:43,530 --> 00:10:45,380
If so why this is all zeros.
150
00:10:45,460 --> 00:10:50,320
So it'd be one minus zero the average be one the average here would be 0 1 0 would be 1.
151
00:10:50,320 --> 00:10:59,210
So basically in this in the in this approach the closer it is to one the closer the real world scenario.
152
00:10:59,230 --> 00:11:03,020
This one is to what is being looked for.
153
00:11:03,070 --> 00:11:10,390
Now of course we understand that we're humans we're not robots we don't have black and white delimited
154
00:11:10,390 --> 00:11:16,180
parts like that so when we're And you know it depends on the lighting depends on on all these different
155
00:11:16,180 --> 00:11:16,440
things.
156
00:11:16,440 --> 00:11:20,880
So of course you're never going to have an exact one here or an exact zero.
157
00:11:20,880 --> 00:11:26,200
This is always going to be some sort of gradients some sort of different colors shades and so on.
158
00:11:26,500 --> 00:11:33,250
That's why the bell Jones has some thresholds and through the training which we'll discuss further down
159
00:11:33,640 --> 00:11:36,090
through the training these thresholds identified.
160
00:11:36,100 --> 00:11:42,190
And for instance for this specific feature in this position it's identified that the minimum threshold
161
00:11:42,190 --> 00:11:43,920
might be 0.3.
162
00:11:44,110 --> 00:11:52,600
And so whenever you get a result a calculated value here that is greater than 0.3 that means that indeed
163
00:11:52,720 --> 00:11:57,820
this is somewhat similar to this HARLICK feature which we're looking for and we're going to say that
164
00:11:57,850 --> 00:12:02,530
yes this Harlech feature is present in the image where in that position where we were looking for it
165
00:12:03,280 --> 00:12:09,980
if on the other hand you get like 0.1 or basically a value that's less than your threshold then is the
166
00:12:10,020 --> 00:12:13,450
difference between the two colorings for instance here.
167
00:12:13,940 --> 00:12:18,580
For answers if you just basically applied looked for this feature because the algorithm doesn't know
168
00:12:18,580 --> 00:12:22,600
that it needs to look for it here it will just look for it everywhere and look for it here or look for
169
00:12:22,600 --> 00:12:23,100
a hero.
170
00:12:23,110 --> 00:12:27,790
It's looking for the nose and at the end all it knows about the nose is that the nose looks like this
171
00:12:28,210 --> 00:12:29,770
like this like this feature.
172
00:12:29,770 --> 00:12:31,000
So it's going to look for it everywhere.
173
00:12:31,000 --> 00:12:36,580
So if for instance algorithm were to apply this exact HARLICK feature some over here in the top left
174
00:12:36,580 --> 00:12:43,890
corner then the difference between the real world of whites and the real or black would be nothing right
175
00:12:43,900 --> 00:12:46,090
because it would be zero.
176
00:12:46,120 --> 00:12:51,430
In this case because it's exactly the same it's just looking at the wall in the white and the wall in
177
00:12:51,430 --> 00:12:52,140
the black.
178
00:12:52,360 --> 00:13:00,140
So it would look for the nose in many different places and then it might find a nose somewhere else.
179
00:13:00,160 --> 00:13:04,120
But then through other features it will understand that that's not the face.
180
00:13:04,240 --> 00:13:06,760
But for now what will it find here in.
181
00:13:06,850 --> 00:13:09,420
Once it gets to the actual nose it will see that yes.
182
00:13:09,450 --> 00:13:15,340
In this case it it surpasses the three threshold that exceeds the threshold so this we're going to say
183
00:13:15,340 --> 00:13:20,080
that there is this potentially or knows over here that might be a potential nose over there that might
184
00:13:20,080 --> 00:13:23,830
be a potential nose or here but it's definitely going to say that there is no nose in many other places
185
00:13:24,160 --> 00:13:27,780
and then through the other features and we'll learn more about this throughout the section.
186
00:13:27,790 --> 00:13:32,710
It will be able to identify where exactly the nose is where exactly the faces but that's that's the
187
00:13:32,710 --> 00:13:33,370
starting point.
188
00:13:33,370 --> 00:13:35,380
And so that's where the threshold comes in.
189
00:13:35,380 --> 00:13:43,270
The threshold is very useful to understand if this feature is actually present or if this is or we should
190
00:13:43,270 --> 00:13:47,640
move on and there is this feature is not present in that location we're looking.
191
00:13:47,650 --> 00:13:51,010
So that's how the Horlick features work.
192
00:13:51,130 --> 00:13:52,280
Very important.
193
00:13:52,320 --> 00:13:54,960
They lie in the foundation of this algorithm.
194
00:13:55,000 --> 00:14:01,840
They're kind of like once those features that were identified in the face the like for eyebrows and
195
00:14:01,840 --> 00:14:02,240
so on.
196
00:14:02,260 --> 00:14:09,160
Once you once the Aluf you don't have to do this by your algorithm identifies through training a a number
197
00:14:09,160 --> 00:14:17,410
of features quite a large number of features then that is kind of like forms the basis for to understand
198
00:14:17,410 --> 00:14:24,060
what elements roughly a face is constructed from and then it will be able to use those to detect the
199
00:14:24,060 --> 00:14:24,530
face.
200
00:14:24,640 --> 00:14:30,340
But in the in a nutshell this is how it works through the pixels through the white and black regions
201
00:14:30,400 --> 00:14:34,410
which it calculates averages for and then subtract.
202
00:14:34,900 --> 00:14:35,590
So there we go.
203
00:14:35,590 --> 00:14:39,040
Hope you enjoyed today said Tauriel and I'll look for see in the next one.
204
00:14:39,050 --> 00:14:40,770
And until then enjoy computer vision.
22398
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.