004 Viola-Jones Algorithm

Hello and welcome to the course on computer vision. Today we're talking about the Viola-Jones algorithm. This is the algorithm that lies at the foundation of the OpenCV library, and it is one of the most powerful computer vision algorithms to date. Let's have a look.

The algorithm was developed by two people, Paul Viola and Michael Jones, in 2001. How crazy is that? It's been over 16 years since then, and it's still one of the most powerful algorithms in the world. It's slowly being surpassed by deep learning, but nevertheless, in its simplicity it is so powerful that it is still in use. And it's not just for detecting faces in images: it also works for real-time computer vision, detecting faces in video as well. That's how powerful it is.

The Viola-Jones algorithm consists of two stages. The first stage is training; the second stage is detection. So: training of the algorithm, and then detection of the actual faces in application. We are going to start off by talking about detection.
This might be a bit counterintuitive, because you'd think you'd want to start with training, but it will be much clearer if we start with detection and understand how it works once it's already trained, once everything is in place. We will see how it works in action, and that way, when we talk about training, it will make much more sense why certain things are structured in certain ways. So here we go, let's get started.

We've got a photo here, and we're just going to call this person JD to keep them anonymous, but they're totally comfortable with us using this photo. [unclear] JD is a great example of a photo here, because you can see the face. It's a frontal face, and that's exactly what the Viola-Jones algorithm looks for. It's designed to look for frontal faces, not somebody looking to the side or up or down. It performs best with frontal faces; that's what it's designed for.

The first step in the Viola-Jones algorithm is that the image is turned into grayscale. For simplicity's sake, it's just easier to work with grayscale images; the results are still astonishing, and there's less data to process. That's why it's turned to grayscale.
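To make the grayscale step concrete, here is a minimal sketch in plain Python. The lecture doesn't say which conversion formula is used, so the weights below (the common ITU-R BT.601 luma weights) and the function name `to_grayscale` are assumptions for illustration only.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels into rows of single gray values.

    Assumption: BT.601 luma weights; the lecture does not specify a formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# A tiny 2x2 "photo": red, green, blue, and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(image)
```

Each pixel goes from three numbers to one, which is the "less data to process" point: the detector only has to look at a third of the values.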
Then, once the face is detected, the Viola-Jones algorithm finds the location of the face on the color image, so you won't even notice that it's working in grayscale. But in reality, in the background, the Viola-Jones algorithm is working with a grayscale version of your image.

So what the Viola-Jones algorithm does is start looking for the face. It outlines a little box, starts from the top left corner, and moves to the right step by step, looking for the face. And how is it looking for the face? That is what we'll discuss in the further tutorials. For now we're just going to say it's looking for certain features of the face, and by features, for now, we mean the eyebrows, the eyes, the nose, the lips, the chin, the forehead, the cheeks, and so on. How exactly it can find them we'll find out further down the track, but for now let's just agree that it's looking for those features.

So in this box it can see an eyebrow. It's looking through the pixels in this box, looking for any of those features, and it detects an eyebrow. Then it thinks, OK, this might be a face. But then it realizes that for it to be a face there have to be two eyes, a nose, and a mouth, so it looks for those other features.
As soon as it doesn't detect an eye (it looks at this whole box and doesn't detect any eyes at all), it understands for itself: this is not a face. OK, so let's move on. So it moves on. There you go, now it's moved a bit to the right. It again looks through the box and detects an eyebrow again, so it does the same thing: it doesn't detect any eyes, so it's not a face. It keeps moving and keeps moving. Aha, now it can detect two eyebrows, but again the problem is there are no eyes. It's not a face. It keeps moving: one eyebrow, nothing, nothing, nothing. Then it skips down.

OK, so now, in this part of the image, it can see an eyebrow. Yes, OK, that's a score. Then it looks for an eye. It can see the eye, that's great. Next it might look for the nose. So it finds the eyebrow, yes; it looks again and finds the eye, yes; and then it looks for the nose. There's no nose, so it's probably not a face, because a face has to have a nose. It keeps going again and keeps going.
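The "reject the box as soon as something is missing" idea can be sketched like this. Everything here is a simplified stand-in: the checklist `REQUIRED_FEATURES`, the function `looks_like_face`, and the idea of receiving a ready-made set of feature names are hypothetical. In the real algorithm the checks come out of training, which is covered later.

```python
# Hypothetical ordered checklist; in the real algorithm the checks are
# learned during training rather than written by hand.
REQUIRED_FEATURES = ["eyebrow", "eye", "nose", "mouth"]


def looks_like_face(features_in_box):
    """Check the box feature by feature and stop at the first one missing.

    Returns (is_face, first_missing_feature).
    """
    for feature in REQUIRED_FEATURES:
        if feature not in features_in_box:
            # Early exit: no need to look for the remaining features.
            return False, feature
    return True, None


# The first boxes in the walkthrough: an eyebrow but no eye.
print(looks_like_face({"eyebrow"}))
# A box further down: eyebrow and eye, but no nose.
print(looks_like_face({"eyebrow", "eye"}))
```

The point of stopping early is speed: most boxes in a photo are not faces, so most of them are thrown away after only one or two checks.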
At this point it might say, OK, I see something; it might think that this is a nose. Remember, it's a very basic algorithm, and from the features that we'll discuss you'll see that it might actually think that this part is a nose. Even if it thinks that this is a nose, it realizes that there's no eye, there's no other eye, and there's no other eyebrow. So it keeps looking. Then here it might think that this is an eye, eyebrow, eyebrow, eye, nose. But then it will see that there's no mouth and there are no cheeks; none of the features that are over here at the bottom. There's no chin, if that's what it's looking for. Again, all these things depend on the training of the algorithm, which we'll discuss further down. So it will understand this is not a face. OK, it keeps going, keeps going.

Here you can see that this time it's definitely got an eye, an eyebrow, and a nose, but no mouth and no second eye or second eyebrow. So it keeps going. Then here, this is almost the face, but there's no mouth.
So again it discards this and keeps going. Finally, when it gets somewhere here, it can see that it's got both eyebrows, both eyes, a nose, and a mouth. So it highlights this as having a very high potential to be a face; for instance, it marks this box. Then it keeps going again, and in this box it can see another face: two eyebrows, two eyes, a nose, and a mouth. So it highlights that again and keeps going. After this box it goes over here, where it can no longer see the full eyebrow or the full eyes, so this is no longer a face. So it keeps going, moving away from the face. And finally here: no eyes, no face.

So there you go. That's how it scans this whole image. The size of this box varies, because faces can be small or large: an image might have a large face or a small face. So this will happen many times: there'll be a bigger box, a smaller box, lots of different boxes. Also, notice that we had very large steps: here the horizontal step is quite large, and the vertical step is quite large too. In reality the steps are smaller and the boxes are more frequent. This is just a demonstration for us to get the intuition.
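Here is a rough sketch of how the box travels: left to right, then down a row, and then the whole scan again with a different box size. The function name `sliding_boxes`, the square boxes, and the specific sizes and steps are assumptions chosen to keep the example small, not values from the lecture.

```python
def sliding_boxes(image_width, image_height, box_sizes, step):
    """Yield (x, y, size) for every box position.

    For each box size, start at the top left corner, move right by `step`,
    and when the row is finished, skip down by `step` and start again.
    """
    for size in box_sizes:
        for y in range(0, image_height - size + 1, step):
            for x in range(0, image_width - size + 1, step):
                yield (x, y, size)


# A toy 8x6 image scanned with two box sizes and a large step of 2 pixels.
boxes = list(sliding_boxes(8, 6, box_sizes=[4, 6], step=2))

# The same scan with a step of 1 pixel visits many more boxes.
dense_boxes = list(sliding_boxes(8, 6, box_sizes=[4, 6], step=1))
```

With the large step the toy image gives 8 boxes; with a step of 1 it gives 18. That is the trade-off mentioned above: smaller steps mean more boxes, so the same face gets looked at from many slightly different positions.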
So in reality this face might not have been detected just twice; it might have been detected many more times, because the box passes over it many times when the steps are much smaller. Each box says that there's a high likelihood of a face being there, and when a lot of boxes overlap, that means there is most likely a face, and the algorithm will detect the face there. That's how you get that green square in Facebook, or in any other kind of face detection application that you might have on your phone, and things like that. Again, they might be using other algorithms, but the ones that use the computer vision algorithm that Viola and Jones created work exactly like this. Then the algorithm just finds the position of the same box on the color image too, and that's exactly what you will see: you see the grayscale, you see the color. So there you go. We found JD; we found his face. That's, in a nutshell, how the Viola-Jones algorithm works.
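The "lots of overlapping boxes means a face" idea can be sketched as follows. This is a simplified illustration, not how any particular library does it: the names `overlaps` and `merge_detections`, the `min_overlaps` threshold, and the greedy grouping with a plain average are all assumptions.

```python
def overlaps(a, b):
    """True if two square boxes given as (x, y, size) share any area."""
    ax, ay, asize = a
    bx, by, bsize = b
    return (ax < bx + bsize and bx < ax + asize
            and ay < by + bsize and by < ay + asize)


def merge_detections(candidates, min_overlaps=2):
    """Group overlapping candidate boxes and keep only well-supported groups.

    Simplification: each box joins the first group it overlaps with, and a
    kept group is replaced by the (integer) average of its boxes.
    """
    groups = []
    for box in candidates:
        for group in groups:
            if any(overlaps(box, other) for other in group):
                group.append(box)
                break
        else:
            groups.append([box])

    merged = []
    for group in groups:
        if len(group) >= min_overlaps:
            n = len(group)
            merged.append(tuple(sum(b[i] for b in group) // n for i in range(3)))
    return merged


# Three boxes fired around the same face; one stray box fired on its own.
candidates = [(10, 10, 20), (12, 10, 20), (11, 12, 20), (80, 80, 20)]
faces = merge_detections(candidates, min_overlaps=2)
```

The stray box is dropped because nothing else agrees with it, and the three overlapping boxes collapse into a single box. Since the grayscale image has the same dimensions as the color one, the same `(x, y, size)` can be drawn directly on the color photo.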
I know it's very basic, very straightforward, but it was important for us to understand how this box travels through the image and what exactly happens, because now we're going to start building on top of that. We're going to talk about the training, but first we'll talk about the features, and then about some hacks for how this process can be expedited so that it happens much faster and more efficiently.

If you would like to do some additional reading, the best place to start is the original paper by Paul Viola and Michael Jones. It's called "Rapid Object Detection using a Boosted Cascade of Simple Features". It's actually a very simple paper, even though the name might sound complex. The language is very friendly and very easy to read; I highly recommend checking it out. Maybe not at this stage; maybe you want to go through the tutorials first and then read it after you've finished the section. But it will be very helpful. They talk in very human language that is very easy to understand.

And here you can actually see an example of some of the features which are used in the Viola-Jones algorithm. We'll be talking about these in the next tutorial. I look forward to seeing you next time. Until then, enjoy computer vision.