1
00:00:00,930 --> 00:00:03,400
Hello and welcome back to the course on computer vision.
2
00:00:03,490 --> 00:00:06,040
And today we're going to talk about cascading.
3
00:00:06,100 --> 00:00:12,790
So previously we discussed strong classifiers and how they're constructed from weak classifiers
4
00:00:12,790 --> 00:00:13,860
that are put together.
5
00:00:14,140 --> 00:00:19,780
And now we're going to talk about another hack that the Viola-Jones algorithm applies in order to
6
00:00:19,780 --> 00:00:20,950
speed up the process.
7
00:00:21,220 --> 00:00:25,420
And the hack is called cascading. Schematically, it looks like this.
8
00:00:25,480 --> 00:00:28,170
We take a sub window that we're evaluating on our image.
9
00:00:28,170 --> 00:00:37,150
So that's a sub window that is sliding across our image, and we look for the first feature in our list
10
00:00:37,150 --> 00:00:37,540
of features.
11
00:00:37,540 --> 00:00:43,210
So, for the most important feature, and then we see if it's present in the image, or rather in that sub window,
12
00:00:43,210 --> 00:00:51,040
whether it's present in the sub window or not. If it's not present in the sub window, then we reject that sub window
13
00:00:51,040 --> 00:00:53,500
completely, and we don't even evaluate the remaining features.
14
00:00:53,500 --> 00:00:59,410
So if, for instance, this is the most important feature and we find out that it's not present in the sub window,
15
00:00:59,620 --> 00:01:02,400
then we're not even going to look at these.
16
00:01:03,070 --> 00:01:10,510
The way to think about it is: if there is no nose in an image, then it's just not going
17
00:01:10,510 --> 00:01:16,210
to be a face. A frontal face has to have a nose, so what's the point of looking for eyes and
18
00:01:16,210 --> 00:01:19,540
cheeks and eyebrows and so on if we know that there's no nose?
19
00:01:19,840 --> 00:01:27,070
Then, if this feature is present, if there is a nose, then it is time to look for the second
20
00:01:27,070 --> 00:01:27,540
feature.
21
00:01:27,550 --> 00:01:31,880
Now we're going to evaluate whether the second feature is in that sub window. If it's not, then we reject
22
00:01:31,890 --> 00:01:38,620
the sub window. For instance, if there's a nose, but then whatever this is, this looks like an eyebrow
23
00:01:38,650 --> 00:01:42,320
but it's upside down, so maybe it's a lower lip.
24
00:01:42,560 --> 00:01:47,500
Whatever this feature represents, whatever the algorithm was looking for there.
25
00:01:47,800 --> 00:01:53,650
If it's not present, then we're going to reject the image, because it has a nose but doesn't have
26
00:01:53,650 --> 00:01:55,310
a lip, so it's not a face.
27
00:01:55,330 --> 00:02:01,130
So then it evaluates the third feature. If it is present, if that check also passes, then it moves on to the
28
00:02:01,250 --> 00:02:02,090
next one.
29
00:02:02,650 --> 00:02:03,340
And so on.
30
00:02:03,340 --> 00:02:08,250
So if a feature is not present, then we reject and move on.
31
00:02:08,710 --> 00:02:13,640
So that's the simplistic diagrammatic kind of explanation.
32
00:02:13,660 --> 00:02:15,450
In reality it's a bit more complex.
33
00:02:15,530 --> 00:02:22,660
In reality of course it's very risky to rely on just one feature because maybe it might not be.
34
00:02:22,900 --> 00:02:28,240
Maybe the shading falls a certain way, or the lighting is different, or maybe it just for some reason wasn't
35
00:02:28,240 --> 00:02:34,660
detected. There can be lots of different circumstances. So what it actually does is, in this first step, it
36
00:02:34,660 --> 00:02:39,110
doesn't look at just one feature. It looks at the top five features, or the top 12 features.
37
00:02:39,260 --> 00:02:44,650
Well, let's say top five for argument's sake. It looks at the top five features and then decides based
38
00:02:44,650 --> 00:02:51,220
on them. So if they're not present, if for instance none of them are present, or only some portion of
39
00:02:51,220 --> 00:03:00,820
them are present... the details are not important right now. What is important is the whole
40
00:03:00,820 --> 00:03:07,930
concept that, regardless, if it identifies based on the first five features, whether it takes one or two
41
00:03:07,930 --> 00:03:14,530
or three or four or five of them, that this sub window is pointless to
42
00:03:14,530 --> 00:03:20,590
keep looking at, that it's not a face, it won't continue. Then here it will look at the next 12, and here it will look
43
00:03:20,590 --> 00:03:22,310
at the next 25 or something.
44
00:03:22,330 --> 00:03:27,250
So every time, because the features are becoming less and less important, less and less prominent, it's taking
45
00:03:27,250 --> 00:03:28,660
more and more of these features.
46
00:03:28,870 --> 00:03:36,280
But nevertheless it's basing its decision on whether or not to continue going forward with this on
47
00:03:37,690 --> 00:03:39,940
all the features it's looking at at that point in time.
48
00:03:39,940 --> 00:03:46,720
And if it is not happy with what it's seeing, things that it might not like, if it decides over here
49
00:03:46,720 --> 00:03:52,690
that right now there should be these features, and this many of these features are not there:
50
00:03:52,690 --> 00:03:53,340
This is not a face.
51
00:03:53,350 --> 00:03:55,020
I'm not even going to evaluate the rest.
52
00:03:55,150 --> 00:03:56,550
And that is called cascading.
53
00:03:56,590 --> 00:04:00,170
It helps really speed up the process.
54
00:04:00,190 --> 00:04:04,450
So in visual terms let's say that these are all features that we identify.
55
00:04:04,540 --> 00:04:09,510
Again we'll go back to the simple example of just getting them one by one rather than in batches.
56
00:04:09,730 --> 00:04:13,490
And so what we're going to look at is the sub window over here.
57
00:04:13,550 --> 00:04:18,090
And so let's say we look at this feature and we can see that it is present.
58
00:04:18,100 --> 00:04:19,900
That's the eyebrow over here.
59
00:04:19,900 --> 00:04:20,740
Dark and bright.
60
00:04:20,740 --> 00:04:23,230
So this will pass. So that passes.
61
00:04:23,230 --> 00:04:24,580
OK so it goes from here to here.
62
00:04:24,580 --> 00:04:25,140
Good.
63
00:04:25,150 --> 00:04:27,010
The second feature looks for this.
64
00:04:27,010 --> 00:04:28,310
You can see that in the eye.
65
00:04:28,360 --> 00:04:35,260
OK, it passes. Then it goes to this, and maybe in this case it actually can see that.
66
00:04:35,260 --> 00:04:45,040
So technically this feature is reserved, or was originally planned, for the lips, so light, black,
67
00:04:45,310 --> 00:04:51,760
light, so light, dark, light, and that would have identified the lips. But we've got an intuition that
68
00:04:51,970 --> 00:04:58,210
actually in the eyeball you can see the same thing, you can see light, dark and light, so it can mistakenly actually
69
00:04:58,780 --> 00:05:03,560
accept that this feature is here and would say, oh, this feature is here, so we do have the lips, and really
70
00:05:03,560 --> 00:05:04,190
we don't.
71
00:05:04,250 --> 00:05:10,110
But OK, so even if that passes, it gets to this feature and it cannot identify this feature, because this
72
00:05:10,110 --> 00:05:15,860
feature, for instance, nowhere on this image can you see dark, light and dark. You can
73
00:05:15,860 --> 00:05:20,350
see light, dark and light, but not the other way around, with the white and black like that.
74
00:05:20,390 --> 00:05:23,990
So it was actually for this part of the nose as far as I remember.
75
00:05:24,140 --> 00:05:28,730
So at this point it would reject. It would say this feature is not present,
76
00:05:28,730 --> 00:05:30,530
so, the feature is not present,
77
00:05:30,680 --> 00:05:34,790
reject the sub window completely, and it wouldn't even evaluate this feature.
78
00:05:34,850 --> 00:05:39,440
And as you can imagine, there can be 100 features, or however many features, after that. It would just
79
00:05:39,440 --> 00:05:41,080
not evaluate any of them.
80
00:05:41,210 --> 00:05:46,010
And even though evaluation takes a split second for each one, when you add them up, because there are
81
00:05:46,010 --> 00:05:52,590
so many of them, it takes time, and plus you have to go through this image lots of times with sub windows.
82
00:05:52,970 --> 00:05:59,480
And so it saves time not just evaluating this one but lots of features, all the features that come afterwards.
83
00:05:59,480 --> 00:06:02,000
And it just rejects this whole sub window and moves on.
84
00:06:02,000 --> 00:06:05,780
And that really speeds up the process and that's called cascading.
85
00:06:05,780 --> 00:06:08,030
So on that note we're going to wrap up today's tutorial.
86
00:06:08,030 --> 00:06:11,960
I hope you enjoyed it and I look forward to seeing you next time.
87
00:06:11,960 --> 00:06:13,670
And until then enjoy computer vision.
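
The batched cascade described in this tutorial can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the actual Viola-Jones implementation: the "features" here are hypothetical stand-ins that check a single value in a list rather than real Haar-like features, and the names `pixel_above`, `make_stage` and `run_cascade`, the stage sizes (5, 12, 25) and the `min_hits` thresholds are all made up for the example. What it does show is the early rejection: as soon as one stage fails, the later stages are never evaluated.

```python
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]            # toy stand-in for the pixels of a sub window
Feature = Callable[[Window], bool]  # True if the feature is "present"
Stage = Callable[[Window], bool]    # True if the window passes this stage


def pixel_above(index: int, threshold: float = 0.5) -> Feature:
    """Hypothetical toy feature: present when one value exceeds a threshold."""
    return lambda window: window[index] > threshold


def make_stage(features: List[Feature], min_hits: int) -> Stage:
    """A stage passes a window when at least min_hits of its features are present."""
    def stage(window: Window) -> bool:
        hits = sum(1 for feature in features if feature(window))
        return hits >= min_hits
    return stage


def run_cascade(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated), stopping at the first failure."""
    for count, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, count  # reject now; later stages are never evaluated
    return True, len(stages)


# Stages grow (5, then 12, then 25 features) as the features get less prominent.
stages = [
    make_stage([pixel_above(i) for i in range(0, 5)], min_hits=3),
    make_stage([pixel_above(i) for i in range(5, 17)], min_hits=8),
    make_stage([pixel_above(i) for i in range(17, 42)], min_hits=15),
]

face_like = [1.0] * 42                # every toy feature is present
background = [0.0] * 42               # no toy feature is present
partial = [1.0] * 5 + [0.0] * 37      # only the first five features are present

print(run_cascade(face_like, stages))   # (True, 3)
print(run_cascade(background, stages))  # (False, 1)
print(run_cascade(partial, stages))     # (False, 2)
```

The second value in each result is the point of the hack: the background window costs one cheap five-feature stage instead of all 42 feature checks.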