Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,750 --> 00:00:05,310
Hello and welcome back to the course on computer vision then we're going to talk about some applications
2
00:00:05,310 --> 00:00:06,340
of Ganns.
3
00:00:06,690 --> 00:00:06,990
All right.
4
00:00:06,990 --> 00:00:08,260
So let's have a look.
5
00:00:08,340 --> 00:00:16,620
Gannes can be used for many different things and for instance here's a couple generating images image
6
00:00:16,620 --> 00:00:25,900
modification super resolution assisting artists photo realistic images speech generation and face Ajahn.
7
00:00:26,190 --> 00:00:32,340
So those are just a couple of examples of how guns can be used and we're going to look into a few of
8
00:00:32,340 --> 00:00:33,420
them in more detail.
9
00:00:33,430 --> 00:00:35,500
So not all of them but just a few.
10
00:00:35,940 --> 00:00:42,210
Just to kind of like give you some ideas maybe for how you can use guides in your own life and your
11
00:00:42,210 --> 00:00:49,050
work maybe in your own projects maybe in some applications that you might want to create.
12
00:00:49,050 --> 00:00:53,890
All right so generating images what can guns be used for in this space.
13
00:00:53,890 --> 00:01:01,140
Well where you saw how Gannes generate images that's kind of like the core Skase for gas or how they
14
00:01:01,170 --> 00:01:02,610
came to be.
15
00:01:02,730 --> 00:01:04,640
We saw that example dogs.
16
00:01:04,710 --> 00:01:11,010
Let's have a look at a more successful example like you could see that it's really hard for a guy generative
17
00:01:11,010 --> 00:01:15,410
adversarial network to generate an image of a like a realistic dog.
18
00:01:15,420 --> 00:01:17,070
But let's see where they do actually.
19
00:01:17,190 --> 00:01:24,870
Well so Franzen's these are just by looking at this can you say what this is what do you think these
20
00:01:24,870 --> 00:01:25,530
are images of.
21
00:01:25,530 --> 00:01:30,500
They're all images of one certain thing.
22
00:01:30,600 --> 00:01:38,610
So you probably can already tell what they are and by the way this is just like after one iteration
23
00:01:38,610 --> 00:01:40,680
of the one epoch.
24
00:01:40,680 --> 00:01:46,350
And this is taken from an official paper which I will link to at the end of this tutorial but this is
25
00:01:46,350 --> 00:01:52,140
just like a very under-trained gambit you Gary kind of see what it is.
26
00:01:52,140 --> 00:01:56,850
If you can tell then I'm going to show you just now what it looks like what these same images look like
27
00:01:56,850 --> 00:01:58,050
after more steps.
28
00:01:58,050 --> 00:02:06,400
I think it was like after 12-Step so there's not steps after 12 epochs up to 12 training cycles.
29
00:02:06,420 --> 00:02:12,740
So if we do that and let's see if you'll see if it's easier to tell what's going on.
30
00:02:13,230 --> 00:02:13,950
So there you go.
31
00:02:13,950 --> 00:02:19,680
So these are actually generated images and you can probably tell these are images of bedroom's.
32
00:02:19,890 --> 00:02:26,480
And this is an example of where Gad's are doing really well some of these bedroom's looks so realistic
33
00:02:26,490 --> 00:02:32,500
right so here you got a badge you got a pillow you've got a window.
34
00:02:32,700 --> 00:02:34,160
Let's see what else.
35
00:02:34,200 --> 00:02:34,710
Here you go.
36
00:02:34,710 --> 00:02:38,130
Like a big window here somewhere and somewhere there's a bend here.
37
00:02:38,340 --> 00:02:43,230
Here you've got a bed of some pillows and other window you can see the lighting changes the colors are
38
00:02:43,230 --> 00:02:44,780
changing.
39
00:02:45,010 --> 00:02:48,060
Here you go to bed and kind of like a closet over there.
40
00:02:48,090 --> 00:02:54,720
You've got to like a bed up close over here so you can see is doing well and that's one of the cases
41
00:02:54,720 --> 00:03:03,270
where you can do well not just bedrooms but and then just images of bedrooms but actually when you have
42
00:03:03,360 --> 00:03:11,310
a very confined domain where something is like very you know like bedrooms on like dogs are usually
43
00:03:11,700 --> 00:03:13,540
much more similar one to another.
44
00:03:13,650 --> 00:03:19,800
You know they can be a of variety but there's less variety than in images of dogs because dogs are so
45
00:03:19,800 --> 00:03:22,690
different and backgrounds and different positions are different.
46
00:03:22,830 --> 00:03:25,650
But for instance Benham's it works quite well.
47
00:03:25,740 --> 00:03:31,260
So we can just go back and forth a couple times so you can see just pick an image and try to observe
48
00:03:31,260 --> 00:03:32,010
how it changes.
49
00:03:32,010 --> 00:03:38,230
For instance look at this one you can see that it's quite pronounced here.
50
00:03:38,370 --> 00:03:45,960
Before it was not as well defined have another one.
51
00:03:46,180 --> 00:03:47,760
This one looks quite good.
52
00:03:47,870 --> 00:03:49,090
It look like that.
53
00:03:49,220 --> 00:03:56,960
I do look a little bit different as well so as you can see the more training you put in the better off
54
00:03:56,960 --> 00:04:00,740
on our parts like these ones they don't look as defined over here.
55
00:04:00,760 --> 00:04:03,950
But once you do more training they look more.
56
00:04:04,170 --> 00:04:05,100
So there you go.
57
00:04:05,140 --> 00:04:11,930
You know if you want to generate some fake images of bedrooms and gowns are your go to but also not
58
00:04:12,010 --> 00:04:21,070
bedrooms any kind of like confined space or confined type like very tight type of image or object that
59
00:04:21,070 --> 00:04:28,780
where the variety isn't that great and gowns can do a pretty good job image modification.
60
00:04:29,050 --> 00:04:31,060
OK so what does that mean.
61
00:04:31,090 --> 00:04:32,820
Well let's have a look at this example.
62
00:04:32,950 --> 00:04:38,600
So here we've got this from the same paper and here we got a smiling woman on the left.
63
00:04:38,610 --> 00:04:42,490
This is these are images generated by Gahn all of them.
64
00:04:42,490 --> 00:04:47,600
Then you go to a neutral woman on in the middle and then you go to a neutral man.
65
00:04:48,010 --> 00:04:50,390
So if you take the smiling woman.
66
00:04:50,390 --> 00:04:55,000
So basically what they're showing here is that through Gannes through the way that these images are
67
00:04:55,000 --> 00:04:58,250
generated they're actually assigned a vector.
68
00:04:58,300 --> 00:05:04,180
If you look at the neural network that represents them and represent that in a vector you can then do
69
00:05:04,540 --> 00:05:08,050
arithmetic arithmetics with those vectors so you can take a picture of a smiling woman and subtract
70
00:05:08,050 --> 00:05:10,720
the vector for a neutral woman and you'll get.
71
00:05:10,840 --> 00:05:13,330
And then you add a veteran for unusual man.
72
00:05:13,340 --> 00:05:14,830
Well you you'll get the smiling man.
73
00:05:14,830 --> 00:05:20,200
So basically by subtracting this from this you'll get a vector for the smile you added on top of the
74
00:05:20,500 --> 00:05:23,480
natural man and you get them one that is smiling.
75
00:05:23,620 --> 00:05:26,020
Very very interesting.
76
00:05:26,020 --> 00:05:27,270
A little bit creepy as well.
77
00:05:27,280 --> 00:05:29,800
But you know that's they stayed there.
78
00:05:29,830 --> 00:05:31,690
Now here's another example.
79
00:05:31,800 --> 00:05:36,430
Manner of glasses subtract magnifying glasses completely different man as you can see.
80
00:05:36,820 --> 00:05:42,130
And then you add a woman with fogged glasses and you get a woman with glasses.
81
00:05:42,190 --> 00:05:51,220
Very interesting and so similar results have been already seen in the space of context where for instance
82
00:05:51,820 --> 00:05:59,290
a very famous example is if you take her use neural networks to her present words or like sentences
83
00:05:59,290 --> 00:06:05,580
and then you take or like the word sorry if you just take send.
84
00:06:05,590 --> 00:06:13,690
If you take words and represent them vectors and then for instance you take king and you subtract man
85
00:06:13,810 --> 00:06:22,570
and you add the word woman then you get something close to a queen and that can be that has been achieved
86
00:06:22,570 --> 00:06:23,320
before.
87
00:06:23,320 --> 00:06:31,990
But it's not as like as not as big as a breakthrough as the same with images because images actually
88
00:06:31,990 --> 00:06:37,750
represent not just something that we have like an alphabet or different words actually represent something
89
00:06:37,770 --> 00:06:43,570
I can see in real life and you can see this movement of glass all the way here is pretty distinct.
90
00:06:43,570 --> 00:06:48,520
It's pretty you can really see that it's warm and glossy even though these people never existed.
91
00:06:48,520 --> 00:06:53,170
This is just all gang generated images.
92
00:06:55,480 --> 00:07:02,890
And for example here they're showing that if you try to do the same with just arithmetic operations
93
00:07:02,890 --> 00:07:11,440
for pixels then you will never actually have the result as good because as you can see you just get
94
00:07:11,440 --> 00:07:18,820
like overlay on different images and that's nowhere near to as good as what we had over here.
95
00:07:19,540 --> 00:07:19,950
All right.
96
00:07:19,990 --> 00:07:23,910
So that's what does this.
97
00:07:23,920 --> 00:07:26,320
What does this application image modification.
98
00:07:26,320 --> 00:07:30,030
So this next one is super resolution.
99
00:07:30,340 --> 00:07:35,410
Here's an example from the end Goodfellows presentation where they had an original image which looked
100
00:07:35,410 --> 00:07:35,920
like that.
101
00:07:35,950 --> 00:07:46,920
Then they reduced the resolution so made it like well like a more grainy and less detail.
102
00:07:47,140 --> 00:07:51,720
And then they applied different algorithms to increase their resolution back.
103
00:07:51,730 --> 00:07:57,040
And so that and by Kubik there is as far as s.
104
00:07:57,070 --> 00:08:02,110
NET and this superposition by again.
105
00:08:02,530 --> 00:08:09,040
And so you can see that that can do a really good job even though this was a bit smoother and a really
106
00:08:09,040 --> 00:08:12,010
good job and you can see that it's quite Kristo.
107
00:08:12,010 --> 00:08:20,290
Chris backflushing they could really not really call if they had the image of the the the low resolution
108
00:08:20,480 --> 00:08:25,240
that they created in between here so the image that they took from this what they created and then the
109
00:08:25,240 --> 00:08:28,870
results would be called to compare but I couldn't find that image unfortunately.
110
00:08:28,870 --> 00:08:31,310
So all we have to compare is to the original.
111
00:08:31,680 --> 00:08:39,400
And we can see kind of like how this could perform over well overall well but if you look in detail
112
00:08:39,400 --> 00:08:45,140
like for instance here on the hat there is a pattern and it's not here.
113
00:08:45,290 --> 00:08:50,360
There are some like horses or elephants there that are not here.
114
00:08:50,440 --> 00:08:57,670
There's certain detail here that has been destroyed but nevertheless this is a bear of them.
115
00:08:57,680 --> 00:09:02,860
You could see Prius's So now imagine if you didn't have the original if you just have a low resolution
116
00:09:02,860 --> 00:09:09,780
image and you want to recreate the higher resolution version of that image then then you could use a
117
00:09:09,780 --> 00:09:11,660
gown for that as well.
118
00:09:12,490 --> 00:09:13,280
OK.
119
00:09:13,290 --> 00:09:14,870
Photo realistic images.
120
00:09:14,870 --> 00:09:16,570
This is a very interesting one.
121
00:09:16,570 --> 00:09:24,260
So it was this company owner company was this application online pics depicts which allows you to like
122
00:09:24,260 --> 00:09:31,250
you draw something with just on line and then you apply and generative adversarial literally it just
123
00:09:31,690 --> 00:09:39,420
presses button basically and you get a photo realistic image so you get.
124
00:09:39,520 --> 00:09:46,140
So again he uses this as an input and learns what a human should look like.
125
00:09:46,140 --> 00:09:51,220
He knows kind of from it's training where a human should look like and overlay styles on top of that
126
00:09:51,550 --> 00:09:53,310
overlay.
127
00:09:53,680 --> 00:09:57,570
Turn this image into something that looks like a real photo.
128
00:09:57,790 --> 00:10:04,470
And this service was so popular they got over like 2 million views 2 million visitors on the Web site.
129
00:10:04,580 --> 00:10:10,630
The people who put it up there they just didn't have enough funds to maintain the service so they had
130
00:10:10,630 --> 00:10:12,240
to close that unfortunately.
131
00:10:12,300 --> 00:10:19,500
But there's a really cool YouTube video of a an artist playing around with this and we'll link to it
132
00:10:19,510 --> 00:10:20,250
at the end.
133
00:10:20,270 --> 00:10:27,130
In additional reading or viewing in this case and you'll be able to check it out there's some very interesting
134
00:10:27,130 --> 00:10:27,950
stuff.
135
00:10:28,270 --> 00:10:35,530
But this is really crazy like from just a pencil drawing a line drawing you get a realistic looking
136
00:10:35,530 --> 00:10:39,250
photo like that face aging.
137
00:10:39,250 --> 00:10:44,050
So a popular one so I think these are all Hollywood actors.
138
00:10:44,230 --> 00:10:49,310
I'm not actually sure they look like Hollywood action so.
139
00:10:49,900 --> 00:10:55,660
And I don't really know the background of this how they did it did they lose.
140
00:10:55,720 --> 00:10:58,850
Maybe they took these photos of here at the start and then age them.
141
00:10:59,020 --> 00:11:06,520
But basically the word was applied here is a generator about a central network to generate a sequence
142
00:11:06,520 --> 00:11:08,010
of photos from the photo.
143
00:11:08,010 --> 00:11:15,160
So basically you can use you sort of like a young photo and see how a person will age over time by applying
144
00:11:15,250 --> 00:11:16,630
Genter of adversarial networks.
145
00:11:16,630 --> 00:11:22,990
Of course they have to go through certain training to understand you know the concepts of aging too
146
00:11:23,750 --> 00:11:28,720
for those concepts to be programmed into the nor embedded into the neural network.
147
00:11:28,900 --> 00:11:34,290
But this is another application of general advertising in general.
148
00:11:34,340 --> 00:11:38,410
There are several networks as you can see quite a lot of different applications.
149
00:11:38,450 --> 00:11:47,260
So the couple of resources that I mentioned this one is a paper called as it provides representation
150
00:11:47,260 --> 00:11:53,370
learning with deep convolutional generation generative adversarial networks so DC Gannes This is like
151
00:11:53,370 --> 00:11:59,420
a step above Gannes or an enhancement on top of Gannes whether networks or even deeper.
152
00:11:59,640 --> 00:12:07,560
And you can find it on our archive there and it's actually going to a lot of those examples are we talked
153
00:12:07,560 --> 00:12:07,950
about.
154
00:12:08,070 --> 00:12:15,280
They are given in this paper if you want to read about it and see some additional images and the YouTube
155
00:12:15,340 --> 00:12:19,600
that are referenced is called artist versus picks topics.
156
00:12:19,620 --> 00:12:22,070
Is this humor or horror.
157
00:12:22,080 --> 00:12:22,540
Check it out.
158
00:12:22,540 --> 00:12:27,780
This just this art is drawing these images.
159
00:12:27,780 --> 00:12:32,010
If you just Google artist versus Bicks depicts you'll find this artist drawing these images and then
160
00:12:32,040 --> 00:12:38,690
applying the Gahn to see how it will overlay human skin and hair and tones over and.
161
00:12:38,700 --> 00:12:42,110
Very interesting all of the time it's very creepy.
162
00:12:42,120 --> 00:12:45,200
All right so that's a precaution of Ganze.
163
00:12:45,240 --> 00:12:48,360
Hope you enjoyed this Tauriel and courtsey next time.
164
00:12:48,360 --> 00:12:50,450
Until then enjoy computer vision.
17802
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.