1
00:00:01,999 --> 00:00:05,600
Alright, so we're not yet to the point where we
can tell intentionally funny stories.
2
00:00:05,600 --> 00:00:07,270
What can we do with language?
3
00:00:07,270 --> 00:00:11,260
Well, I mentioned Siri. Siri may not be
the best at telling bedtime stories
4
00:00:11,260 --> 00:00:14,960
but Siri does some amazing things, and the
pieces that make that up are
5
00:00:14,960 --> 00:00:18,450
actually used in many places in industry.
There's automatic speech recognition,
6
00:00:18,450 --> 00:00:22,040
where you go from speech to text.
There's text-to-speech synthesis which
7
00:00:22,040 --> 00:00:26,180
is an easier problem where you go from the text
to the speech. And then there's dialogue systems
8
00:00:26,180 --> 00:00:28,740
that integrate all of this together
with linguistic analysis.
9
00:00:28,740 --> 00:00:30,349
Let me show you what a
10
00:00:30,349 --> 00:00:32,270
speech recognition system looks like
11
00:00:32,270 --> 00:00:36,400
just kind of when you point it at the TV.
So this is not customized to a
12
00:00:36,400 --> 00:00:39,930
specific speaker, this is not over some
great microphone like how your phones
13
00:00:39,930 --> 00:00:42,640
have really sophisticated microphones
these days.
14
00:00:42,640 --> 00:00:45,920
This is just plugged straight into the TV as essentially
automatic transcription. Let's see
15
00:00:45,920 --> 00:00:50,530
how well it does, and in particular watch the
errors.
16
00:00:50,530 --> 00:00:54,429
17
00:00:54,429 --> 00:00:59,289
18
00:00:59,289 --> 00:01:03,579
19
00:01:03,579 --> 00:01:07,520
20
00:01:07,520 --> 00:01:11,290
21
00:01:11,290 --> 00:01:13,960
So, what's interesting about this? First of all,
22
00:01:13,960 --> 00:01:16,340
is it good? Is it bad?
23
00:01:16,340 --> 00:01:18,310
It does a lot of stuff.
24
00:01:18,310 --> 00:01:20,369
It does a lot of things right. It makes some
mistakes.
25
00:01:20,369 --> 00:01:24,290
The mistakes are of multiple kinds, so for example, here:
26
00:01:24,290 --> 00:01:26,690
"The classmates said their final goodbyes".
27
00:01:26,690 --> 00:01:30,280
That's like "good buys," like Best Buy,
right? Those are exactly the sounds that
28
00:01:30,280 --> 00:01:31,670
the reporter said.
29
00:01:31,670 --> 00:01:34,659
The failing here was not in the
acoustic modeling, which tries to
30
00:01:34,659 --> 00:01:35,420
connect
31
00:01:35,420 --> 00:01:37,869
the waveforms to the underlying
linguistic sounds.
32
00:01:37,869 --> 00:01:40,479
Here the failing is that there are multiple
things that sound the same.
33
00:01:40,479 --> 00:01:45,040
You've got to figure out which one the reporter
could possibly mean in the context.
34
00:01:45,040 --> 00:01:48,840
This is a sad story, right? Somebody died;
people are not going shopping, right? And
35
00:01:48,840 --> 00:01:52,230
we know this as humans, but the
system does not, and so in this case the
36
00:01:52,230 --> 00:01:55,070
problem is in the language model. There
are other cases here where the problem is
37
00:01:55,070 --> 00:01:58,270
more in the acoustics. And putting all this stuff
together in some probabilistic framework
38
00:01:58,270 --> 00:02:00,060
that lets you reconcile it all,
39
00:02:00,060 --> 00:02:01,949
that's a big part of how speech
recognition works.
40
00:02:01,949 --> 00:02:04,720
We'll have more discussion on that later.
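A minimal sketch of that probabilistic combination, with invented log-probability numbers rather than output from any real recognizer: the acoustic model scores the two homophones almost identically, so the language model's preference in context is what breaks the tie.

# Hypothetical scores, purely for illustration.
acoustic_logprob = {
    "the classmates said their final goodbyes": -42.0,   # log P(audio | words)
    "the classmates said their final good buys": -42.1,  # homophone: nearly tied
}
language_logprob = {
    "the classmates said their final goodbyes": -18.0,   # log P(words): plausible in a sad story
    "the classmates said their final good buys": -27.0,  # unlikely word sequence here
}

def total_score(hypothesis, lm_weight=1.0):
    # Noisy-channel style combination: log P(audio | words) + weight * log P(words).
    return acoustic_logprob[hypothesis] + lm_weight * language_logprob[hypothesis]

print(max(acoustic_logprob, key=total_score))
# -> the classmates said their final goodbyes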
41
00:02:04,720 --> 00:02:08,120
We can do more with language than just
manipulate the signal from speech to text.
42
00:02:08,120 --> 00:02:11,830
This is actually my research area. We
can do things like question answering.
43
00:02:11,830 --> 00:02:14,390
We talked a little about Watson and
we'll have a lot more
44
00:02:14,390 --> 00:02:15,430
later in the course about Watson.
45
00:02:15,430 --> 00:02:18,549
So Watson is basically a question answering system.
46
00:02:18,549 --> 00:02:22,189
Like, yeah, there's this layer of remembering
to phrase it as a question 'cause
47
00:02:22,189 --> 00:02:25,049
you're on Jeopardy and making sure you
wager the right amount on the
48
00:02:25,049 --> 00:02:28,680
Daily Double and that kinda stuff but to
a first approximation a question comes in,
49
00:02:28,680 --> 00:02:30,540
Watson kind of has to
50
00:02:30,540 --> 00:02:33,839
dig through a lot of information,
you know, largely Wikipedia,
51
00:02:33,839 --> 00:02:36,010
and connect up some answer to the
question,
52
00:02:36,010 --> 00:02:39,670
so that it knows how to respond.
It's basically a question answering system,
53
00:02:39,670 --> 00:02:42,420
although an amazingly cool demonstration
of a very good one.
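A rough sketch of that shape of pipeline, retrieve some text and connect up an answer. The toy in-memory documents and crude word-overlap scoring below are stand-ins for illustration only, not how Watson actually works.

# Toy "retrieve and answer" over a couple of in-memory documents, just to show
# the shape of the pipeline: question in, dig through text, answer out.
documents = {
    "Neil Armstrong": "Neil Armstrong was the first person to walk on the Moon in 1969.",
    "Mount Everest": "Mount Everest is the highest mountain above sea level.",
}

def answer(question):
    q_words = set(question.lower().replace("?", "").split())
    # Score each document by crude word overlap with the question and
    # return the title of the best match as the "answer".
    def overlap(item):
        _title, text = item
        return len(q_words & set(text.lower().split()))
    best_title, _ = max(documents.items(), key=overlap)
    return best_title

print(answer("Who was the first person to walk on the Moon?"))  # Neil Armstrong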
54
00:02:42,420 --> 00:02:45,730
Another thing we can do is machine
translation. How many of you have used
55
00:02:45,730 --> 00:02:47,390
a tool like Google Translate?
56
00:02:47,390 --> 00:02:51,859
So, you know, again, C-3PO. How good
is machine translation?
57
00:02:51,859 --> 00:02:56,529
Well, it depends on the language pair. I mean,
if I'm looking at a page, say, in Chinese,
58
00:02:56,529 --> 00:03:00,159
and I don't speak any Chinese, the machine
translation's pretty good because I was kind
59
00:03:00,159 --> 00:03:03,359
of starting with nothing. But if I
actually speak the language maybe
60
00:03:03,359 --> 00:03:06,749
I'm better off reading reading it in its natural form.
You can see some of these problems if
61
00:03:06,749 --> 00:03:10,479
you do a round trip from, say, English to Chinese and back, and you can see how
62
00:03:10,479 --> 00:03:13,389
good what comes back is. Actually, that's a
good way to make an unintentionally funny story.
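A small sketch of that round-trip check; the translate function here is only a placeholder for whatever translation service or library you have access to, not a real API.

def translate(text, source, target):
    # Placeholder: plug in whatever machine translation system you actually use.
    raise NotImplementedError("wire this up to an MT service")

def round_trip(text, pivot="zh"):
    # English -> pivot language -> English; comparing what comes back with the
    # original sentence exposes what gets garbled along the way.
    forward = translate(text, source="en", target=pivot)
    return translate(forward, source=pivot, target="en")

# Once translate is wired up:
# print(round_trip("The spirit is willing but the flesh is weak."))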
63
00:03:13,389 --> 00:03:14,409
64
00:03:14,409 --> 00:03:18,459
What else can we do? Things like web
search are really about a lot of things.
65
00:03:18,459 --> 00:03:21,159
It has something to do with the words,
but also kind of clickstream information,
66
00:03:21,159 --> 00:03:22,340
67
00:03:22,340 --> 00:03:25,370
and kind of local search and things
like that. And so there's a lot that goes into
68
00:03:25,370 --> 00:03:27,400
web search. A big part of that is the
language.
69
00:03:27,400 --> 00:03:30,670
Text classification, spam filtering.
Again, spam filtering is a case where it's
70
00:03:30,670 --> 00:03:33,579
part language, part not language. We'll
talk more about spam filtering later
71
00:03:33,579 --> 00:03:36,349
and so on. These are the kinds of things
you can do in the domain of natural language.
72
00:03:36,349 --> 00:03:40,260
We're no longer trying so hard
to tell stories, funny or otherwise.
73
00:03:40,260 --> 00:03:41,610
We're trying to build things like this.
74
00:03:41,610 --> 00:03:44,459
And there has been a lot of traction.
There's a lot of stuff we can build.
75
00:03:44,459 --> 00:03:46,999
We're not yet to C-3PO, but
76
00:03:46,999 --> 00:03:49,499
we actually can now translate Russian,
77
00:03:49,499 --> 00:03:51,949
which we couldn't do in the fifties even
though they thought we would be able to do
78
00:03:51,949 --> 00:03:55,409
it by the sixties. But now, today, we can.
79
00:03:55,409 --> 00:03:56,959
It only took
80
00:03:56,959 --> 00:03:59,289
something like twelve times longer than
they thought it would.