1
00:00:01,790 --> 00:00:05,899
Okay, other areas. Another big sub-area of
artificial intelligence is perceiving the world,
2
00:00:05,899 --> 00:00:09,730
and in large part this is vision
but there's other kinds of perception.
3
00:00:09,730 --> 00:00:11,949
So, things like object recognition, face
recognition.
4
00:00:11,949 --> 00:00:15,079
You probably have a lot of this
technology closer than you think, you
5
00:00:15,079 --> 00:00:18,880
probably have face recognition built into your
cameras, that's actually face detection,
6
00:00:18,880 --> 00:00:22,669
not always face recognition though some cameras do
that too. Segmenting scenes into pieces,
7
00:00:22,669 --> 00:00:26,320
figuring out for a given image what it means,
what's going on.
8
00:00:26,320 --> 00:00:30,529
Here's an example on the left of an
image, and overlaid on the image is the
9
00:00:30,529 --> 00:00:33,780
machine's reconstruction of kind of the
underlying 3D
10
00:00:33,780 --> 00:00:35,180
outline and mesh,
11
00:00:35,180 --> 00:00:39,020
and you can see that reconstructed on
the right. This is from an image, but actually
12
00:00:39,020 --> 00:00:42,660
it turns out that in addition to being
able to do a bunch of cool things in
13
00:00:42,660 --> 00:00:44,380
vision with the image,
14
00:00:44,380 --> 00:00:47,990
one realization we've had in cases like autonomous driving and vision is
15
00:00:47,990 --> 00:00:51,210
we don't have to use the tools that humans
use. We've spent a long time with vision
16
00:00:51,210 --> 00:00:54,830
just trying to use like a camera,
or maybe two cameras slightly apart because
17
00:00:54,830 --> 00:00:58,140
that's what we have, we've got two cameras
slightly apart. But then we realized,
18
00:00:58,140 --> 00:01:01,300
we can do other stuff. So what's this, anybody recognize this?
19
00:01:01,300 --> 00:01:05,010
It's a Kinect. The Kinect's got sensors that you don't.
20
00:01:05,010 --> 00:01:08,790
Sorry you didn't get a rangefinder, a depth detector, you just didn't.
21
00:01:08,790 --> 00:01:11,980
But, you know, we can build them so why
not. And so, now we can do cool things like
22
00:01:11,980 --> 00:01:15,660
take an image and produce a depth map
that isn't just about parallax, looking at
23
00:01:15,660 --> 00:01:18,740
the difference between the two eyes, or
about kind of inferring from occlusion.
24
00:01:18,740 --> 00:01:22,740
You'll notice, people like to think
that vision is all about having two images,
25
00:01:22,740 --> 00:01:26,110
but if you close one eye, you can still see depth.
26
00:01:26,110 --> 00:01:29,990
It's not like the world suddenly goes flat and you shriek. I mean, you close one eye, you still have a sense of depth,
27
00:01:29,990 --> 00:01:33,050
we want to be able to build machines that do that,
but at the moment we do pretty well by
28
00:01:33,050 --> 00:01:36,730
using things like depth detectors, 'cause why not.
29
00:01:36,730 --> 00:01:38,600
Let's take a look at science fiction again.
30
00:01:38,600 --> 00:01:43,120
Does anybody recognize this movie? Does anybody know what this is gonna be?
31
00:01:43,120 --> 00:01:48,720
Yeah, this is Terminator here, and let's
take a look at what it's like to be a
32
00:01:48,720 --> 00:01:52,520
Terminator--it's relevant to vision. So here's what it's like to be a Terminator.
33
00:01:52,520 --> 00:01:59,520
It's actually a lot like being Governor of
California, apparently.
34
00:02:00,009 --> 00:02:02,070
Okay, so he looks around,
35
00:02:02,070 --> 00:02:07,090
okay, motorcycle, motorcycle, motorcycle, car,
36
00:02:07,090 --> 00:02:09,229
motorcycle,
37
00:02:09,229 --> 00:02:13,189
some place, target acquired. Okay, so
38
00:02:13,189 --> 00:02:16,969
looking around, outlines, detection.
Identifying what the objects are, figuring
39
00:02:16,969 --> 00:02:20,379
out what the target is, that's in the
movies.
40
00:02:20,379 --> 00:02:21,729
Straight out of science fiction.
41
00:02:21,729 --> 00:02:25,879
Let's look at some vision
recognition system--this is a cute demo from Al Rahimi's lab.
42
00:02:25,879 --> 00:02:27,579
43
00:02:27,579 --> 00:02:31,849
So here we have a camera panning around, and it's kind of--we can do exactly the same
44
00:02:31,849 --> 00:02:34,050
thing but for real. So here we have
45
00:02:34,050 --> 00:02:35,859
the cat.
46
00:02:35,859 --> 00:02:38,319
Cat...
47
00:02:38,319 --> 00:02:41,489
Frog...
48
00:02:41,489 --> 00:02:44,839
Fox...
49
00:02:44,839 --> 00:02:49,379
Dalmatian...
50
00:02:49,379 --> 00:02:51,669
Bulldog... Terminate bulldog, right. Okay.
51
00:02:51,669 --> 00:02:55,229
So,
52
00:02:55,229 --> 00:02:59,269
this is a case where I think it's amazing
how close we can get to what people thought
53
00:02:59,269 --> 00:03:04,019
it might be like if this technology
were possible. This is not robots from the
54
00:03:04,019 --> 00:03:05,909
future detecting the bulldog, this is today.