1
00:00:00,540 --> 00:00:02,730
Hello and welcome to the course on computer vision.
2
00:00:02,730 --> 00:00:06,930
Today we're talking about the Viola-Jones algorithm.
3
00:00:06,930 --> 00:00:11,870
This is the algorithm that lies at the foundation of the OpenCV library.
4
00:00:11,880 --> 00:00:18,820
And this is one of the most powerful algorithms to date for computer vision.
5
00:00:19,140 --> 00:00:20,290
Let's have a look.
6
00:00:20,700 --> 00:00:22,970
So this algorithm was developed by two people.
7
00:00:22,980 --> 00:00:29,440
Paul Viola and Michael Jones, and it was developed in 2001.
8
00:00:29,730 --> 00:00:30,950
How crazy is that.
9
00:00:30,960 --> 00:00:40,060
It's been like over 16 years since then and it's still one of the most powerful algorithms in the world.
10
00:00:40,170 --> 00:00:42,510
It's slowly being surpassed by deep learning.
11
00:00:42,540 --> 00:00:50,280
But nevertheless, in its simplicity it's so powerful that it is still being used for computers
12
00:00:50,310 --> 00:00:53,850
and it's not just detecting faces on images.
13
00:00:53,850 --> 00:00:58,800
This is actually for real-time computer vision, detecting faces in videos as well.
14
00:00:58,800 --> 00:01:00,220
That's how powerful it is.
15
00:01:00,390 --> 00:01:04,730
The Viola-Jones algorithm consists of two stages.
16
00:01:04,920 --> 00:01:06,570
The first stage is the training.
17
00:01:06,570 --> 00:01:08,910
The second stage is the detection.
18
00:01:08,910 --> 00:01:14,120
So training of the algorithm and then detection of the actual faces in application.
19
00:01:14,280 --> 00:01:17,110
And we are going to start off by talking about detection.
20
00:01:17,310 --> 00:01:21,960
This might be a bit counterintuitive because you think you want to start with training but it just makes
21
00:01:22,400 --> 00:01:26,710
it much clearer if we start with the detection and understand how it works.
22
00:01:26,820 --> 00:01:31,680
Once it's already trained up, once everything is in place, we will see how it works in action.
23
00:01:31,740 --> 00:01:37,890
And that way when we're talking about training everything will make much more sense why exactly certain
24
00:01:37,890 --> 00:01:39,990
things are structured in certain ways.
25
00:01:40,020 --> 00:01:40,790
So here we go.
26
00:01:40,800 --> 00:01:41,830
Let's get started.
27
00:01:42,090 --> 00:01:51,090
We've got a photo here and we're just going to call this person Jaideep to keep them anonymous but they're
28
00:01:51,120 --> 00:01:53,070
totally comfortable with us using this photo.
29
00:01:53,250 --> 00:02:02,040
And yet I actually want to add LUNs friends and yeah so JD is a great example of a photo here because
30
00:02:03,410 --> 00:02:08,060
you can see the face, it's a frontal face, and that's exactly what the Viola-Jones algorithm looks
31
00:02:08,060 --> 00:02:08,720
for.
32
00:02:08,880 --> 00:02:16,430
It's designed to look for frontal faces, not somebody looking to the side or up or down.
33
00:02:16,430 --> 00:02:20,600
Most of the time it performs best with frontal faces.
34
00:02:20,610 --> 00:02:22,580
That's what it's designed for.
35
00:02:22,580 --> 00:02:28,540
And the first step that we do in the Viola-Jones algorithm is that the image is turned into
36
00:02:28,540 --> 00:02:29,470
grayscale.
37
00:02:29,480 --> 00:02:35,950
So for simplicity's sake, it's just easier to work with grayscale images. The results are still astonishing.
38
00:02:36,110 --> 00:02:38,450
And there's just less data to process.
39
00:02:38,450 --> 00:02:40,340
That's why it's turned to grayscale.
40
00:02:40,340 --> 00:02:46,220
Then once the face is detected, the Viola-Jones algorithm finds the actual location of the face on the color image
41
00:02:46,230 --> 00:02:51,290
so you won't even notice that it's working in grayscale, but in reality, in the background, the Viola-Jones algorithm
42
00:02:51,350 --> 00:02:53,890
is working with a grayscale version of your image.
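The grayscale step described here can be sketched in a few lines. This is only an illustration: the lecture does not say which conversion formula is used, so the luma weights below (the common ITU-R BT.601 values) and the helper name `to_grayscale` are assumptions.

```python
# Assumed weights (ITU-R BT.601 luma); the lecture does not name a formula.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of single 0-255 intensities."""
    return [
        [round(sum(w * c for w, c in zip(LUMA_WEIGHTS, pixel))) for pixel in row]
        for row in rgb_image
    ]


# A tiny 2x2 "image": white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(image)
```

Each pixel goes from three numbers to one, which is the "less data to process" point. Because the box coordinates are the same in both versions, a detection found on the grayscale image can be drawn directly on the color one.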
43
00:02:54,350 --> 00:03:02,210
And so what the Viola-Jones algorithm does is it starts looking for the face. It outlines like a
44
00:03:02,210 --> 00:03:07,850
little box, starts from the top left corner and moves to the right.
45
00:03:07,850 --> 00:03:09,850
Step by step and it's looking for the face.
46
00:03:09,860 --> 00:03:11,230
And how is it looking for the face?
47
00:03:11,300 --> 00:03:13,450
And this is what we'll discuss in the further tutorials.
48
00:03:13,520 --> 00:03:18,540
But for now we're just going to say it's looking for certain features of the face, and by features,
49
00:03:18,650 --> 00:03:26,120
for now, we're just going to mean that it's looking for eyebrows, eyes, the nose, the lips, the chin, the
50
00:03:26,120 --> 00:03:28,940
forehead, the cheeks, and so on.
51
00:03:28,940 --> 00:03:32,610
So how exactly it can find them, we'll find out further down the track.
52
00:03:32,630 --> 00:03:38,910
But for now let's just agree that it's looking for those features. So in this box it can see an eyebrow.
53
00:03:39,380 --> 00:03:44,570
So it's looking through the pixels in this box, going like: OK, looking, looking, looking,
54
00:03:44,570 --> 00:03:49,520
looking for any of those features, and it can detect an eyebrow, and then it thinks: OK, this might be a
55
00:03:49,520 --> 00:03:55,330
face. But then it realizes that for it to be a face there has to be an eye, there have to be two eyes, a nose, a
56
00:03:55,470 --> 00:03:57,290
mouth, and it looks for those other features.
57
00:03:57,380 --> 00:04:01,610
And then as soon as it doesn't detect an eye, like it looks at this whole box, it doesn't detect any eyes
58
00:04:01,610 --> 00:04:02,610
at all.
59
00:04:02,810 --> 00:04:05,460
It understands for itself this is not a face.
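The reasoning in this walkthrough (look for one feature, then the next, and give up on the box as soon as one is missing) can be sketched as below. This follows the lecture's simplified picture where "features" are named face parts; the function name `classify_window` and the list of required features are hypothetical, and the features the algorithm really uses are covered later in the course.

```python
# Hypothetical, simplified feature list taken from the lecture's walkthrough.
REQUIRED_FEATURES = ("eyebrows", "eyes", "nose", "mouth")


def classify_window(found, required=REQUIRED_FEATURES):
    """Return (is_face, first_missing_feature) for one box.

    `found` is the set of feature names spotted inside the box.
    The check stops at the first missing feature, so most boxes
    are rejected after very little work.
    """
    for feature in required:
        if feature not in found:
            return False, feature
    return True, None


# A box that only contains an eyebrow is rejected at the "eyes" check.
rejected = classify_window({"eyebrows"})
accepted = classify_window({"eyebrows", "eyes", "nose", "mouth"})
```

The early exit is the point of the sketch: a box with no eyes never gets as far as the nose or mouth checks.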
60
00:04:05,480 --> 00:04:05,710
OK.
61
00:04:05,720 --> 00:04:06,770
So let's move on.
62
00:04:07,100 --> 00:04:08,330
So it moves on.
63
00:04:08,330 --> 00:04:08,860
There you go.
64
00:04:08,870 --> 00:04:10,920
Now it's moved a bit to the right.
65
00:04:10,940 --> 00:04:13,090
It again looks through, detects an eyebrow again.
66
00:04:13,100 --> 00:04:15,740
So it does that same thing, doesn't detect any eyes.
67
00:04:16,100 --> 00:04:17,230
So then it's not a face.
68
00:04:17,240 --> 00:04:19,080
Again, keeps moving and keeps moving.
69
00:04:19,080 --> 00:04:20,740
Aha, now it can detect two eyebrows.
70
00:04:20,750 --> 00:04:22,520
But again the problem is there's no eyes.
71
00:04:22,550 --> 00:04:23,860
It's not a face.
72
00:04:23,990 --> 00:04:28,940
It's moving, it's moving, one eyebrow, nothing, nothing, nothing, then skips down.
73
00:04:29,150 --> 00:04:32,890
OK, so now it goes through the image. It can see an eyebrow.
74
00:04:32,930 --> 00:04:33,370
Yes.
75
00:04:33,380 --> 00:04:33,650
OK.
76
00:04:33,650 --> 00:04:36,590
That's the score. Then it looks for an eye.
77
00:04:36,650 --> 00:04:37,740
It can see the eye.
78
00:04:37,740 --> 00:04:38,730
That's great.
79
00:04:38,780 --> 00:04:44,750
Then next it might look for the nose, and it's looking. So it looks, it finds the eyebrow, yes, then looks
80
00:04:44,750 --> 00:04:45,920
again, finds the eye.
81
00:04:45,950 --> 00:04:46,450
Yes.
82
00:04:46,460 --> 00:04:52,640
And then looks for the nose. There's no nose, so it's probably not a face, because a face has to have
83
00:04:52,650 --> 00:04:55,780
a nose. Keeps going again and keeps going.
84
00:04:55,790 --> 00:05:02,630
And then at this point it might say: OK, I see something. It might think that this is a nose, because
85
00:05:02,630 --> 00:05:05,490
remember, it's a very, very basic kind of algorithm.
86
00:05:05,600 --> 00:05:09,590
And from the features that we'll discuss, you'll see that it might actually think that this
87
00:05:09,590 --> 00:05:12,620
part is a nose or in its own right.
88
00:05:12,620 --> 00:05:17,810
And so even if it thinks that this is a nose, it realizes that there's no eye, there's no other eye or
89
00:05:17,810 --> 00:05:19,550
there's no other eyebrow.
90
00:05:19,970 --> 00:05:21,490
So it keeps looking, keeps looking.
91
00:05:21,530 --> 00:05:25,700
Then here it might already think that this is an eye, eyebrow, eyebrow,
92
00:05:25,700 --> 00:05:26,620
eye, nose.
93
00:05:26,630 --> 00:05:31,780
But then it will see that there's no face: there's no mouth, there are no cheeks, there are
94
00:05:31,850 --> 00:05:34,100
none of these features that are over here at the bottom.
95
00:05:34,130 --> 00:05:35,250
There's no chin.
96
00:05:35,480 --> 00:05:37,500
If that's what it's looking for, again.
97
00:05:37,520 --> 00:05:40,720
And all these things depend on the training of the algorithm, which we'll discuss further down.
98
00:05:40,850 --> 00:05:42,620
So it will understand this is not a face.
99
00:05:42,620 --> 00:05:44,660
OK, keeps going, keeps going, keeps going.
100
00:05:45,890 --> 00:05:53,900
And so here you can see, now this time it's definitely got an eye, an eyebrow, a nose, but no mouth, no other
101
00:05:54,230 --> 00:05:56,750
eye, no second eye or second eyebrow.
102
00:05:56,750 --> 00:05:57,770
So it keeps going.
103
00:05:57,860 --> 00:06:01,380
And then here this is almost the face but there's no mouth.
104
00:06:01,400 --> 00:06:07,130
So again it discards this and it keeps going, keeps going, keeps going, keeps going, and then finally when
105
00:06:07,130 --> 00:06:13,330
it gets somewhere here, it can see that it's got both eyebrows, both eyes, a nose and a mouth.
106
00:06:13,340 --> 00:06:19,970
So then it highlights this as having a very high potential to be a face.
107
00:06:19,970 --> 00:06:25,220
So it, for instance, marks this box and then keeps going again, and now in this box it can see another
108
00:06:25,220 --> 00:06:28,510
face: it can see two eyebrows, two eyes, a nose and a mouth.
109
00:06:28,520 --> 00:06:31,220
So it highlights that. Again it keeps going on.
110
00:06:31,340 --> 00:06:36,470
So after this box, as it goes over here, it can no longer see the full eyebrow or the full eyes, so
111
00:06:36,470 --> 00:06:38,320
this is no longer a face.
112
00:06:38,540 --> 00:06:40,600
So it keeps going, it's going from the face.
113
00:06:40,640 --> 00:06:41,590
And then finally here.
114
00:06:41,600 --> 00:06:44,090
No eyes, so no face.
115
00:06:44,090 --> 00:06:45,520
So there you go.
116
00:06:45,520 --> 00:06:52,460
That's how it scans this whole image, and the size of this box varies because faces can be small or large.
117
00:06:52,460 --> 00:06:55,390
So an image might have a big face, or there might be this small face.
118
00:06:55,400 --> 00:06:59,240
So this will happen many times: there'll be a box.
119
00:06:59,240 --> 00:07:03,020
There will be a bigger box, there will be a small box, lots of different boxes.
120
00:07:03,140 --> 00:07:09,470
And also, if you notice, we had very large steps, so here you can see the step is quite large and
121
00:07:09,470 --> 00:07:11,450
also the vertical step is quite large.
122
00:07:11,450 --> 00:07:13,970
In reality the steps are smaller.
123
00:07:13,970 --> 00:07:15,040
The boxes are more frequent.
124
00:07:15,040 --> 00:07:18,690
This is just a demonstration for us to get the intuition.
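The scan described above (a box that starts at the top left, moves right step by step, skips down a row, and is repeated at several sizes) can be sketched as a small generator. The function name `sliding_windows`, the square box shape, and the particular sizes and steps are assumptions chosen only for illustration.

```python
def sliding_windows(width, height, sizes, step):
    """Yield (x, y, size) for square boxes, left to right, then top to bottom.

    The whole scan is repeated once per box size, because faces
    in the image can be small or large.
    """
    for size in sizes:
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, size)


# Large steps, one box size: only a handful of boxes, as in the slides.
coarse = list(sliding_windows(8, 8, sizes=[4], step=4))

# Small steps, two box sizes: many more, heavily overlapping boxes.
fine = list(sliding_windows(8, 8, sizes=[4, 8], step=1))
```

Shrinking the step and adding sizes multiplies the number of boxes quickly, which is why the same face ends up being covered by many boxes rather than just one or two.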
125
00:07:18,860 --> 00:07:23,750
And so in reality this face might not have been detected just twice. It might have been detected
126
00:07:23,930 --> 00:07:26,410
many times, because the box went through many times.
127
00:07:26,450 --> 00:07:31,370
If the steps of the box were much smaller, this face might have been detected many more
128
00:07:31,370 --> 00:07:32,820
times, something like that.
129
00:07:33,010 --> 00:07:35,910
You know, each box says that there's a high likelihood of a face being there.
130
00:07:36,020 --> 00:07:41,180
And when a lot of boxes overlap, that means that that is most likely a face, and then it will detect the
131
00:07:41,180 --> 00:07:41,810
face there.
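The idea that many overlapping boxes add up to one confident detection can be sketched as follows. This is a simplified grouping scheme, not the exact merging procedure from the paper or from any library: the overlap measure (intersection over union), the thresholds, and the names `iou` and `group_detections` are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def group_detections(boxes, min_neighbors=3, iou_threshold=0.3):
    """Cluster overlapping boxes; keep clusters with enough members."""
    clusters = []
    for box in boxes:
        for cluster in clusters:
            # Compare against the first box that started the cluster.
            if iou(box, cluster[0]) >= iou_threshold:
                cluster.append(box)
                break
        else:
            clusters.append([box])

    merged = []
    for cluster in clusters:
        if len(cluster) >= min_neighbors:
            n = len(cluster)
            # Report the average box of the cluster as the detection.
            merged.append(tuple(round(sum(b[i] for b in cluster) / n) for i in range(4)))
    return merged


# Three boxes pile up on one spot; a fourth is an isolated false alarm.
candidates = [(10, 10, 20, 20), (12, 10, 20, 20), (11, 12, 20, 20), (100, 100, 20, 20)]
faces = group_detections(candidates)
```

The isolated box is dropped because nothing overlaps it, while the three overlapping boxes collapse into a single averaged square.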
132
00:07:41,960 --> 00:07:50,510
And that's how you see that green square in your Facebook or in any other kind of face detection
133
00:07:51,260 --> 00:07:55,160
application that you might have on your phone, and things like that. Again,
134
00:07:55,190 --> 00:08:00,980
they might be using other algorithms, but the ones that are using the computer vision algorithm which
135
00:08:01,180 --> 00:08:01,950
Viola and Jones created,
136
00:08:01,970 --> 00:08:03,510
that's exactly how they work.
137
00:08:03,620 --> 00:08:10,110
And then it will just find the same box, position the same box, on the color image too.
138
00:08:10,130 --> 00:08:11,240
And that's exactly what you will see.
139
00:08:11,240 --> 00:08:12,970
You see the grayscale, you see the color.
140
00:08:13,040 --> 00:08:13,580
And so there you go.
141
00:08:13,580 --> 00:08:15,670
We found JD, found his face.
142
00:08:15,680 --> 00:08:19,520
So that's, in a nutshell, how the Viola-Jones algorithm works.
143
00:08:19,520 --> 00:08:24,590
I know it's very basic, very straightforward, but it was important for us to understand how this box travels
144
00:08:24,590 --> 00:08:29,870
through the image, what exactly happens, because now we're going to start building on top of that and
145
00:08:29,870 --> 00:08:35,210
we're going to talk about the training, well, before we talk about the training we'll talk
146
00:08:35,210 --> 00:08:41,810
about the features, then we'll talk about, you know, some hacks on how this process can
147
00:08:41,810 --> 00:08:45,510
be expedited and how this happens much faster and more efficiently.
148
00:08:45,860 --> 00:08:51,620
Now if you would like to do some additional reading, then the best place to start is the original paper
149
00:08:51,620 --> 00:08:53,780
by Paul Viola and Michael Jones.
150
00:08:53,780 --> 00:08:58,220
It's called Rapid Object Detection using a Boosted Cascade of Simple Features.
151
00:08:58,430 --> 00:09:03,650
It's actually a very simple paper, even though the name might be complex.
152
00:09:03,650 --> 00:09:08,190
The language is very friendly, it's very easy to read. I highly recommend checking it out.
153
00:09:08,360 --> 00:09:12,140
Maybe not at this stage, maybe you want to go through the tutorials first and then read it
154
00:09:12,200 --> 00:09:14,180
after you've finished the section.
155
00:09:14,180 --> 00:09:15,730
But it will be very helpful.
156
00:09:15,760 --> 00:09:20,970
They talk in a very human language that's very easy to understand.
157
00:09:21,360 --> 00:09:21,820
And yeah.
158
00:09:21,830 --> 00:09:27,350
And so here you can actually see an example of some of the features which are used in the Viola-Jones algorithm
159
00:09:27,380 --> 00:09:32,770
and we'll be talking about these in the next tutorial. I look forward to seeing you next time.
160
00:09:32,870 --> 00:09:34,850
Until then enjoy computer vision.