1
00:00:00,796 --> 00:00:05,801
(piano music)
2
00:00:14,243 --> 00:00:15,451
- [Voiceover] In the near future,
3
00:00:15,451 --> 00:00:18,751
every object on earth
will be generating data,
4
00:00:18,751 --> 00:00:21,343
including our homes, our cars,
5
00:00:21,343 --> 00:00:23,049
even our bodies.
6
00:00:23,049 --> 00:00:24,269
- Do you see it?
7
00:00:25,146 --> 00:00:26,883
Yeah, right up there.
8
00:00:27,400 --> 00:00:29,399
- [Voiceover] Almost
everything we do today
9
00:00:29,399 --> 00:00:32,072
leaves a trail of digital exhaust,
10
00:00:32,072 --> 00:00:34,698
a perpetual stream of
texts, location data,
11
00:00:34,698 --> 00:00:37,163
and other information that will live on
12
00:00:37,163 --> 00:00:40,038
well after each of us is long gone.
13
00:00:43,136 --> 00:00:45,681
We are now being exposed
to as much information
14
00:00:45,681 --> 00:00:49,528
in a single day as our
15th century ancestors
15
00:00:49,528 --> 00:00:52,857
were exposed to in their entire lifetime.
16
00:00:53,734 --> 00:00:56,035
But we need to be very careful
17
00:00:56,035 --> 00:00:58,081
because in this vast ocean of data
18
00:00:58,081 --> 00:01:01,800
there's a frighteningly
complete picture of us,
19
00:01:01,800 --> 00:01:04,008
where we live, where we go,
20
00:01:04,008 --> 00:01:06,472
what we buy, what we say,
21
00:01:06,472 --> 00:01:10,604
it's all being recorded
and stored forever.
22
00:01:12,399 --> 00:01:15,200
This is the story of an
extraordinary revolution
23
00:01:15,200 --> 00:01:18,454
that's sweeping almost
invisibly through our lives
24
00:01:18,454 --> 00:01:20,546
and about how our planet
is beginning to develop
25
00:01:20,546 --> 00:01:25,305
a nervous system with each of
us acting as human sensors.
26
00:01:26,554 --> 00:01:29,733
This is the human face of big data.
27
00:01:33,353 --> 00:01:35,116
- All these devices and
machines and everything
28
00:01:35,116 --> 00:01:37,243
we're building these days,
whether it's phones or computers
29
00:01:37,243 --> 00:01:41,084
or cars or refrigerators,
are throwing off data.
30
00:01:41,996 --> 00:01:45,506
- Information is being
extracted out of toll booths,
31
00:01:45,506 --> 00:01:46,970
out of parking spaces,
32
00:01:46,970 --> 00:01:48,760
out of Internet searches,
33
00:01:48,760 --> 00:01:51,259
out of Facebook, out of your phone,
34
00:01:51,259 --> 00:01:53,595
tablets, photographs, videos.
35
00:01:53,595 --> 00:01:57,825
- Every single thing that you
do leaves a digital trace.
36
00:01:57,825 --> 00:02:00,336
- The exhaust or evidence of humans
37
00:02:00,336 --> 00:02:04,543
interacting with technology
and what side effect that has
38
00:02:04,543 --> 00:02:08,089
and that's literally, it's just
this massive amount of data.
39
00:02:14,151 --> 00:02:15,440
- What we're doing is
we're measuring things
40
00:02:15,440 --> 00:02:16,649
more than we ever have.
41
00:02:16,649 --> 00:02:19,691
It's that active measurement
that produces data.
42
00:02:19,691 --> 00:02:22,072
- If you were some omniscient god
43
00:02:22,072 --> 00:02:24,839
and you could look at the
footprints of electric devices,
44
00:02:24,839 --> 00:02:26,791
you could kind of see the world.
45
00:02:26,791 --> 00:02:30,103
If the whole world is being
recorded in real time,
46
00:02:30,103 --> 00:02:31,940
you could see everything
that is going on in the world
47
00:02:31,940 --> 00:02:34,230
through the footprints.
48
00:02:34,230 --> 00:02:35,519
I think it's a lot like
written language, right,
49
00:02:35,519 --> 00:02:36,986
it's just at some point
they got to the point
50
00:02:36,986 --> 00:02:38,370
where you had to start writing stuff down.
51
00:02:38,370 --> 00:02:40,451
You just got to the point
where it wouldn't work
52
00:02:40,451 --> 00:02:42,042
unless we wrote it down,
which is making the same point
53
00:02:42,042 --> 00:02:43,879
where well it ain't gonna
work unless we write
54
00:02:43,879 --> 00:02:45,715
all the data down and then look at it.
55
00:02:45,715 --> 00:02:48,844
- And all that data coming in is big data.
56
00:02:49,965 --> 00:02:53,185
- We estimate that by 2020
57
00:02:53,185 --> 00:02:56,985
the data volumes will be
at about 40 zettabytes.
58
00:02:56,985 --> 00:02:58,112
Just to put it in perspective,
59
00:02:58,112 --> 00:03:00,530
if you were to add up
every single grain of sand
60
00:03:00,530 --> 00:03:03,702
on the planet and multiply that by 75,
61
00:03:03,702 --> 00:03:07,125
that would be 40 zettabytes of information.
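To make that comparison concrete: 40 zettabytes is 40 × 10^21 bytes, so the analogy implies roughly 5 × 10^20 grains of sand. A quick sketch of the arithmetic, using only the film's own figures:

```python
# Sanity-checking the sand analogy with the film's own figures.
ZETTABYTE = 10**21                    # bytes per zettabyte (decimal SI)

total_bytes = 40 * ZETTABYTE          # projected 2020 data volume
implied_grains = total_bytes / 75     # grain count implied by "multiply by 75"
print(f"about {implied_grains:.1e} grains of sand")   # ~5.3e20
```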
62
00:03:08,549 --> 00:03:11,477
- All the data processing
we did in the last two years
63
00:03:11,477 --> 00:03:13,057
is more than all the data processing
64
00:03:13,057 --> 00:03:16,646
we did in the last 3,000 years.
65
00:03:16,646 --> 00:03:18,819
- And so the more information we get,
66
00:03:18,819 --> 00:03:21,998
the larger the problems
will be that we solve.
67
00:03:23,828 --> 00:03:26,873
- Every powerful tool has a
dark side, every last one.
68
00:03:26,873 --> 00:03:28,256
Anything that's going to change the world,
69
00:03:28,256 --> 00:03:31,510
by definition has to be able
to change it for the worse
70
00:03:31,510 --> 00:03:32,684
as much as for the better.
71
00:03:32,684 --> 00:03:35,598
It doesn't work one way without the other.
72
00:03:35,598 --> 00:03:36,934
- When it comes to big
data, a lot of people
73
00:03:36,934 --> 00:03:38,433
are very nervous.
74
00:03:38,433 --> 00:03:41,106
Data can be used in any number of ways
75
00:03:41,106 --> 00:03:43,698
that you're either aware of or you're not.
76
00:03:43,698 --> 00:03:47,333
The less aware of the use
of that data that you are,
77
00:03:47,333 --> 00:03:50,215
the less power you have
in the coming society
78
00:03:50,215 --> 00:03:51,214
we're going to live in.
79
00:03:51,214 --> 00:03:53,178
- Well sort of just in the
beginning of this big data thing,
80
00:03:53,178 --> 00:03:54,297
you don't know how it's
going to change it,
81
00:03:54,297 --> 00:03:55,937
but you just know it is.
82
00:03:55,937 --> 00:04:00,942
(dramatic music)
83
00:04:07,878 --> 00:04:11,760
- The first real data set to
change everything in the world
84
00:04:11,760 --> 00:04:13,933
was the astronomical data set,
85
00:04:13,933 --> 00:04:18,151
meticulously collected over
tens of years by Copernicus
86
00:04:18,151 --> 00:04:20,767
that ultimately revealed, even
though the sun seemed to be
87
00:04:20,767 --> 00:04:23,405
moving over the sky every
morning and every night,
88
00:04:23,405 --> 00:04:25,949
the sun is not moving,
it is we who are moving,
89
00:04:25,949 --> 00:04:27,785
it is we who are spinning.
90
00:04:27,785 --> 00:04:30,074
It happened again when
we suddenly could see
91
00:04:30,074 --> 00:04:31,342
beneath the visible level
92
00:04:31,342 --> 00:04:33,968
and the microscope in the 1650s and 60s,
93
00:04:33,968 --> 00:04:36,839
opened up the invisible world
94
00:04:36,839 --> 00:04:40,396
and we for the first time
were seeing cells and bacteria
95
00:04:40,396 --> 00:04:43,324
and creatures that we
couldn't imagine were there.
96
00:04:43,324 --> 00:04:46,124
It then happened again when
we revealed the atomic world,
97
00:04:46,124 --> 00:04:47,578
when we said wait a
second, there's a level
98
00:04:47,578 --> 00:04:50,087
below the optical microscope
where we could begin
99
00:04:50,087 --> 00:04:53,632
to see things at billionths of
a meter at a nanometer scale,
100
00:04:53,632 --> 00:04:55,433
where we imagined the atom and the nucleus
101
00:04:55,433 --> 00:04:57,559
and the electron, where
we understood that light
102
00:04:57,559 --> 00:05:00,069
is electromagnetic frequencies.
103
00:05:00,069 --> 00:05:02,731
But now, there's actually
a supervisible world
104
00:05:02,731 --> 00:05:04,192
coming into play.
105
00:05:04,192 --> 00:05:06,947
Ironically, big data is a microscope.
106
00:05:06,947 --> 00:05:09,829
We're now collecting exabytes
and petabytes of data
107
00:05:09,829 --> 00:05:11,874
and we're looking through that microscope
108
00:05:11,874 --> 00:05:13,963
using incredibly powerful algorithms
109
00:05:13,963 --> 00:05:17,561
to see what we would never see before.
110
00:05:20,564 --> 00:05:22,826
- Before what we did was we
111
00:05:22,826 --> 00:05:25,826
thought of things and
then we wrote it down
112
00:05:25,826 --> 00:05:28,491
and that became knowledge.
113
00:05:30,041 --> 00:05:31,215
Big data's kind of the opposite.
114
00:05:31,215 --> 00:05:35,388
You have a pile of data
that isn't knowledge really
115
00:05:35,388 --> 00:05:37,886
until you start looking
at it and noticing wait,
116
00:05:37,886 --> 00:05:40,105
maybe if you shift it this
way and you shift it this way,
117
00:05:40,105 --> 00:05:43,019
this turns into this interesting
piece of information.
118
00:05:43,019 --> 00:05:45,855
- I think that the BDAD moment,
119
00:05:45,855 --> 00:05:48,040
you know, before data, after data moment,
120
00:05:48,040 --> 00:05:49,501
is really Search.
121
00:05:49,501 --> 00:05:52,372
(tapping)
122
00:05:52,372 --> 00:05:56,040
That was the moment at which we got a tool
123
00:05:56,040 --> 00:05:59,562
that was used by hundreds
of millions of people
124
00:05:59,562 --> 00:06:01,015
within a few years,
125
00:06:01,015 --> 00:06:04,444
where we could navigate
an incredible amount
126
00:06:04,444 --> 00:06:06,241
of information.
127
00:06:06,241 --> 00:06:10,065
We took all of human knowledge
that was in text, right,
128
00:06:10,065 --> 00:06:11,669
and we put it on the web
129
00:06:11,669 --> 00:06:13,365
and we thought to
ourselves, "Well we're done.
130
00:06:13,365 --> 00:06:14,749
"Wow that was hard."
131
00:06:14,749 --> 00:06:18,084
And now we realize that
was the first minute
132
00:06:18,084 --> 00:06:21,013
of the first inning of the game, right,
133
00:06:21,013 --> 00:06:23,105
because that was just the
knowledge we already had
134
00:06:23,105 --> 00:06:25,359
and the knowledge that we
continue to add to the web
135
00:06:25,359 --> 00:06:28,732
at a relatively slow pace, you know.
136
00:06:28,732 --> 00:06:30,531
But there is so much more information
137
00:06:30,531 --> 00:06:32,461
that we have not
digitized and so much more
138
00:06:32,461 --> 00:06:35,214
information that we're
about to take advantage of.
139
00:06:35,214 --> 00:06:39,040
(piano music)
140
00:06:39,040 --> 00:06:41,552
- [Voiceover] In recent years,
our technology has allowed us
141
00:06:41,552 --> 00:06:45,470
to store and process
mass quantities of data.
142
00:06:47,115 --> 00:06:49,392
Visualizing that data will allow us to see
143
00:06:49,392 --> 00:06:51,953
complex systems function,
144
00:06:53,074 --> 00:06:54,922
see patterns and meaning in ways
145
00:06:54,922 --> 00:06:57,752
that were previously impossible.
146
00:06:59,466 --> 00:07:03,761
Almost everything is
measurable and quantifiable.
147
00:07:12,528 --> 00:07:14,294
- So when I look at data,
what's exciting to me
148
00:07:14,294 --> 00:07:16,166
is kind of recontextualizing that data
149
00:07:16,166 --> 00:07:18,211
and taking it and putting
it back into a form
150
00:07:18,211 --> 00:07:21,221
that we can perceive,
understand, talk about,
151
00:07:21,221 --> 00:07:22,727
think about.
152
00:07:24,185 --> 00:07:25,522
- [Voiceover] This is the
data for airplane traffic
153
00:07:25,522 --> 00:07:28,403
over North America for a 24-hour period.
154
00:07:28,403 --> 00:07:29,868
When it's visualized,
you see everything starts
155
00:07:29,868 --> 00:07:32,447
to fade to black as
everyone goes to sleep,
156
00:07:32,447 --> 00:07:34,376
then on the West Coast,
planes start moving across
157
00:07:34,376 --> 00:07:36,375
on red-eye flights to the east
158
00:07:36,375 --> 00:07:38,293
and you see everyone waking
up on the East Coast,
159
00:07:38,293 --> 00:07:42,136
followed by European flights
in the upper right-hand corner.
160
00:07:42,136 --> 00:07:44,473
I think it's one thing to say
that there's 140,000 planes
161
00:07:44,473 --> 00:07:47,646
being monitored by the federal
government at any one time
162
00:07:47,646 --> 00:07:49,028
and it's another thing to see that system
163
00:07:49,028 --> 00:07:51,766
as it ebbs and flows in front of you.
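A minimal sketch of how a time-lapse view like this could be assembled, assuming a hypothetical file of timestamped aircraft positions (the file and column names are invented):

```python
# One scatter panel of aircraft positions per hour of the day,
# from a hypothetical CSV with columns: timestamp, lat, lon.
import pandas as pd
import matplotlib.pyplot as plt

flights = pd.read_csv("flights_24h.csv", parse_dates=["timestamp"])
flights["hour"] = flights["timestamp"].dt.hour

fig, axes = plt.subplots(4, 6, figsize=(18, 10), sharex=True, sharey=True)
for hour, ax in enumerate(axes.flat):          # 24 panels, one per hour
    snapshot = flights[flights["hour"] == hour]
    ax.scatter(snapshot["lon"], snapshot["lat"], s=0.5, alpha=0.3)
    ax.set_title(f"{hour:02d}:00", fontsize=8)
fig.suptitle("Aircraft over North America, hour by hour")
plt.tight_layout()
plt.show()
```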
164
00:07:57,082 --> 00:07:59,336
These are text messages being
sent in the city of Amsterdam
165
00:07:59,336 --> 00:08:00,801
on December 31st.
166
00:08:00,801 --> 00:08:02,427
You're seeing the daily
flow of text messages
167
00:08:02,427 --> 00:08:05,019
from different parts of the
city until we approach midnight,
168
00:08:05,019 --> 00:08:06,434
where everyone says--
169
00:08:06,434 --> 00:08:09,149
- [Voiceover] Happy New Year!
170
00:08:09,654 --> 00:08:12,618
- It takes people or
programs or algorithms
171
00:08:12,618 --> 00:08:15,325
to connect it all together
to make sense of it
172
00:08:15,325 --> 00:08:16,581
and that's what's important.
173
00:08:16,581 --> 00:08:20,090
Every single action
that we do in this world
174
00:08:20,090 --> 00:08:23,181
is triggering off some amount of data
175
00:08:23,181 --> 00:08:24,634
and most of that data is meaningless
176
00:08:24,634 --> 00:08:27,644
until someone adds some
interpretation of it,
177
00:08:27,644 --> 00:08:30,485
someone adds a narrative around it.
178
00:08:36,732 --> 00:08:38,998
- Often, we sort of think
of data as stranded numbers,
179
00:08:38,998 --> 00:08:41,544
but they're tethered to things
180
00:08:41,544 --> 00:08:44,425
and if we follow those
tethers in the right ways,
181
00:08:44,425 --> 00:08:47,680
then we can find the real-world objects
182
00:08:47,680 --> 00:08:49,446
and the real-world
stories that were there.
183
00:08:49,446 --> 00:08:52,654
So a lot of the work is that kind of work.
184
00:08:52,654 --> 00:08:56,036
It's almost investigative
work of trying to follow
185
00:08:56,036 --> 00:08:59,412
that trail from the data
to what actually happened.
186
00:09:05,891 --> 00:09:07,553
- Sometimes the power of large data sets
187
00:09:07,553 --> 00:09:09,801
isn't immediately obvious.
188
00:09:10,481 --> 00:09:11,989
Google Flu Trends is a great example
189
00:09:11,989 --> 00:09:14,988
of taking a look at a
massive corpus of data
190
00:09:14,988 --> 00:09:17,033
and deriving somewhat
tangential information
191
00:09:17,033 --> 00:09:19,878
that can actually be really valuable.
192
00:09:19,878 --> 00:09:21,923
- [Voiceover] Until recently,
the only way to detect
193
00:09:21,923 --> 00:09:24,678
a flu epidemic was by
accumulating information
194
00:09:24,678 --> 00:09:27,431
submitted by doctors about patient visits,
195
00:09:27,431 --> 00:09:30,941
a process that took about
two weeks to reach the CDC.
196
00:09:30,941 --> 00:09:33,022
So the researchers turned it around.
197
00:09:33,022 --> 00:09:36,149
They asked themselves if they
could predict a flu outbreak
198
00:09:36,149 --> 00:09:40,412
in real time simply using
data from online searches.
199
00:09:40,412 --> 00:09:42,970
So they set out to do the near impossible,
200
00:09:42,970 --> 00:09:45,713
searching the searches, billions of them,
201
00:09:45,713 --> 00:09:48,513
spanning five years to see if user queries
202
00:09:48,513 --> 00:09:50,774
could tell them something.
203
00:09:51,976 --> 00:09:53,604
- When we do searches on Google,
204
00:09:53,604 --> 00:09:55,033
we all think of it as a one-way street,
205
00:09:55,033 --> 00:09:57,276
that we're going into Google
and extracting information
206
00:09:57,276 --> 00:09:58,996
from Google, but one of
the things we don't really
207
00:09:58,996 --> 00:10:00,704
think about very much is
we're actually contributing
208
00:10:00,704 --> 00:10:03,329
information back simply
by doing the search.
209
00:10:03,329 --> 00:10:05,711
- [Voiceover] And that's where
the breakthrough occurred.
210
00:10:05,711 --> 00:10:07,965
In looking at all the data,
they saw that not only
211
00:10:07,965 --> 00:10:10,544
did the number of flu-related
searches correlate
212
00:10:10,544 --> 00:10:12,311
with the people who had the flu,
213
00:10:12,311 --> 00:10:14,438
but they also could
identify the search terms
214
00:10:14,438 --> 00:10:17,575
that could let them accurately
predict flu outbreaks
215
00:10:17,575 --> 00:10:20,862
up to two weeks before the CDC.
216
00:10:20,862 --> 00:10:23,034
- The CDC system takes about a week or two
217
00:10:23,034 --> 00:10:25,801
for the numbers to sort of fully flow in.
218
00:10:25,801 --> 00:10:28,055
What Google could do is
to say based on our model,
219
00:10:28,055 --> 00:10:29,764
we'll have it on the spot.
220
00:10:29,764 --> 00:10:32,390
We'll just run the algorithm
221
00:10:32,390 --> 00:10:34,738
based on how people are
searching right now.
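A minimal sketch of that nowcasting idea, with invented weekly numbers standing in for the query and CDC series; Google's actual model selected terms from billions of queries spanning five years:

```python
# Fit the historical relationship between flu-related query volume
# and CDC case counts, then apply it to this week's queries, which
# are available immediately instead of after a two-week lag.
import numpy as np

# weekly flu-related search volume (normalized) and CDC case counts
query_volume = np.array([0.8, 1.1, 1.9, 3.2, 4.0, 2.7, 1.5])
cdc_cases    = np.array([310, 420, 760, 1300, 1650, 1090, 600])

slope, intercept = np.polyfit(query_volume, cdc_cases, 1)  # linear fit

this_weeks_queries = 3.6   # observed right now, no reporting delay
print(f"Estimated cases right now: {slope * this_weeks_queries + intercept:.0f}")
```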
222
00:10:34,738 --> 00:10:35,946
- And now we have, for the first time,
223
00:10:35,946 --> 00:10:38,829
this real-time feedback
loop where we can see
224
00:10:38,829 --> 00:10:41,868
in real time what's going
on and respond to it.
225
00:10:42,500 --> 00:10:44,206
- Now there is a flip side to this though
226
00:10:44,206 --> 00:10:46,678
and that is there was a
big story this year that
227
00:10:46,678 --> 00:10:49,711
there was a lot of media attention about
228
00:10:49,711 --> 00:10:52,059
what an intense flu season this was.
229
00:10:52,059 --> 00:10:52,965
And so what did that do?
230
00:10:52,965 --> 00:10:55,022
That drove up search.
231
00:10:55,022 --> 00:10:56,567
That drove people who were more interested
232
00:10:56,567 --> 00:10:57,986
in what's going on with this flu
233
00:10:57,986 --> 00:11:01,414
or might have made more
people think I must have it
234
00:11:01,414 --> 00:11:04,919
and so they were off,
they got it way wrong.
235
00:11:05,923 --> 00:11:08,340
- So you know, one way
to think about big data
236
00:11:08,340 --> 00:11:10,723
and all of the computational tools
237
00:11:10,723 --> 00:11:13,059
that we wrap around that big data
238
00:11:13,059 --> 00:11:16,220
to let us discover patterns
that are in the data
239
00:11:16,220 --> 00:11:20,698
is when we point all that
machinery at ourselves.
240
00:11:22,911 --> 00:11:25,038
- [Voiceover] At MIT, Deb
Roy and his colleagues
241
00:11:25,038 --> 00:11:26,712
wanted to see if they could understand
242
00:11:26,712 --> 00:11:28,830
how children acquire language.
243
00:11:30,265 --> 00:11:32,217
- And we realize that no one really knew
244
00:11:32,217 --> 00:11:34,806
for a simple reason, there was no data.
245
00:11:34,806 --> 00:11:36,386
- [Voiceover] After he and his wife Rupal
246
00:11:36,386 --> 00:11:38,896
brought their newborn son
home from the hospital,
247
00:11:38,896 --> 00:11:41,360
they did what every
normal parent would do,
248
00:11:41,360 --> 00:11:43,918
mount a camera in the ceiling
of each room in their home
249
00:11:43,918 --> 00:11:47,281
and record every moment of
their lives for two years,
250
00:11:47,281 --> 00:11:50,470
a mere 200 gigabytes of
data recorded every day.
251
00:11:53,509 --> 00:11:55,682
- [Deb] We ended up
transcribing somewhere between
252
00:11:55,682 --> 00:11:57,984
eight and nine million words of speech.
253
00:11:57,984 --> 00:12:00,238
- [Voiceover] Ga ga ga.
254
00:12:00,238 --> 00:12:03,733
- And as soon as we had that,
we could go and identify
255
00:12:03,733 --> 00:12:08,504
the exact moment where my
son first said a new word.
256
00:12:10,380 --> 00:12:12,676
- [Deb] We started calling them births.
257
00:12:15,505 --> 00:12:17,551
- We took this idea of a
word birth and we started
258
00:12:17,551 --> 00:12:20,817
thinking about why don't
we trace back in time
259
00:12:20,817 --> 00:12:23,773
and look at the gestation
period for that word.
260
00:12:25,532 --> 00:12:28,030
One example of this was water.
261
00:12:28,030 --> 00:12:32,679
So we looked at every time
my son heard the word water,
262
00:12:32,679 --> 00:12:36,062
what was happening, where
in the house were they,
263
00:12:36,062 --> 00:12:37,723
how were they moving about
264
00:12:37,723 --> 00:12:40,931
and using that visual information
265
00:12:40,931 --> 00:12:43,360
to capture something about the context
266
00:12:43,360 --> 00:12:46,161
within which the words are used.
267
00:12:46,161 --> 00:12:47,835
We call them wordscapes.
268
00:12:47,835 --> 00:12:49,252
Then we could ask the question
269
00:12:49,252 --> 00:12:52,425
how does the wordscape
associated with a word
270
00:12:52,425 --> 00:12:56,466
predict when my son will
actually start using that word?
271
00:12:56,466 --> 00:12:58,686
- [Voiceover] What they
learned from watching Deb's son
272
00:12:58,686 --> 00:13:02,817
was that the texture of the
wordscapes had predictive power.
273
00:13:02,817 --> 00:13:04,985
If most of the previous
research had indicated
274
00:13:04,985 --> 00:13:08,378
that the way language was
learned was through repetition,
275
00:13:08,378 --> 00:13:10,377
then this analysis of the
data showed that it wasn't
276
00:13:10,377 --> 00:13:14,758
actually repetition that
generated learning, but context.
277
00:13:14,758 --> 00:13:16,851
Words with more distinct wordscapes,
278
00:13:16,851 --> 00:13:19,849
that is words heard in
many varied locations,
279
00:13:19,849 --> 00:13:21,728
would be learned first.
280
00:13:21,728 --> 00:13:23,646
- Not only is that true,
281
00:13:23,646 --> 00:13:26,575
but the wordscapes are far more predictive
282
00:13:26,575 --> 00:13:28,109
of when a word will be learned
283
00:13:28,109 --> 00:13:31,003
than the frequency, the number
of times it's actually heard.
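A toy sketch of that comparison. Here "distinctiveness" is reduced to the entropy of the rooms where a word was heard, far cruder than Roy's actual wordscapes; the words, rooms, and ages are invented. Note both words are heard the same number of times, so frequency alone cannot separate them:

```python
# Compare context distinctiveness (room entropy) against frequency
# as predictors of when a word is first produced.
from collections import Counter
from math import log2

heard_in = {
    "water": ["kitchen", "bath", "kitchen", "bath", "hall", "bedroom"],
    "bye":   ["hall", "hall", "hall", "hall", "hall", "hall"],
}
first_produced_month = {"water": 14, "bye": 17}   # invented ages

def entropy(rooms):
    """Shannon entropy of the room distribution (higher = more varied)."""
    counts = Counter(rooms)
    total = len(rooms)
    return -sum((c / total) * log2(c / total) for c in counts.values())

for word, rooms in heard_in.items():
    print(f"{word}: heard {len(rooms)}x, "
          f"context entropy {entropy(rooms):.2f}, "
          f"first produced at {first_produced_month[word]} months")
```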
284
00:13:31,003 --> 00:13:33,292
It's like we're building
a new kind of instrument,
285
00:13:33,292 --> 00:13:35,256
like we're building a microscope
286
00:13:35,256 --> 00:13:38,684
and we're able to examine
something that is around us,
287
00:13:38,684 --> 00:13:42,067
but it has a structure
and patterns and beauty
288
00:13:42,067 --> 00:13:45,402
that are invisible without
the right instruments
289
00:13:45,402 --> 00:13:48,783
and all of this data is opening up
290
00:13:48,783 --> 00:13:52,963
to our ability to
perceive things around us.
291
00:13:53,548 --> 00:13:55,833
(giggling)
292
00:13:55,833 --> 00:13:57,253
- He's walking.
293
00:13:57,253 --> 00:14:02,258
(beeping)
294
00:14:03,935 --> 00:14:05,608
- A lot of people don't realize
295
00:14:05,608 --> 00:14:07,991
that when a baby is born premature,
296
00:14:07,991 --> 00:14:11,076
it can develop infection in the hospital
297
00:14:11,076 --> 00:14:13,081
and it can kill them.
298
00:14:15,585 --> 00:14:19,847
In our research, we started
to just look at infection.
299
00:14:19,847 --> 00:14:22,928
By the time the baby is
physically showing signs
300
00:14:22,928 --> 00:14:27,025
of having infection, they
are very, very unwell.
301
00:14:27,611 --> 00:14:30,748
So the very first time
that I went into a neonatal
302
00:14:30,748 --> 00:14:32,922
intensive care unit, I was amazed
303
00:14:32,922 --> 00:14:35,711
by the sights, the sound, the smell,
304
00:14:35,711 --> 00:14:37,547
just the whole environment,
305
00:14:37,547 --> 00:14:40,432
but mainly for me, the data.
306
00:14:41,472 --> 00:14:45,319
What shocked me was the
amount of data lost.
307
00:14:45,319 --> 00:14:46,981
They showed me the paper chart
308
00:14:46,981 --> 00:14:49,654
that the information's recorded onto.
309
00:14:49,654 --> 00:14:52,779
One number every hour for
the baby's heart rate,
310
00:14:52,779 --> 00:14:55,533
the respiration, the blood oxygen.
311
00:14:55,533 --> 00:14:58,846
Now in that time, the
baby's heart has beaten
312
00:14:58,846 --> 00:15:00,962
more than 7,000 times,
313
00:15:00,962 --> 00:15:03,437
they've breathed more than 2,000 times,
314
00:15:03,437 --> 00:15:06,610
and the monitor showing
the blood oxygen level
315
00:15:06,610 --> 00:15:10,233
has shown that more than three
and a half thousand times.
316
00:15:10,233 --> 00:15:11,749
I said, "Well, where's all the data going
317
00:15:11,749 --> 00:15:13,110
"that's in those machines?"
318
00:15:13,110 --> 00:15:16,038
And they said, "Oh it
scrolls out of the memory."
319
00:15:16,038 --> 00:15:21,043
So we have an enormous
amount of data lost.
320
00:15:21,384 --> 00:15:23,429
So we're trying to gather that information
321
00:15:23,429 --> 00:15:25,649
and use it over a longer time
322
00:15:25,649 --> 00:15:28,148
in much more complex ways than before
323
00:15:28,148 --> 00:15:31,622
and we try and write computing code
324
00:15:31,622 --> 00:15:34,203
to look at the trends in the monitors
325
00:15:34,203 --> 00:15:35,585
and the trends in the data
326
00:15:35,585 --> 00:15:39,630
to see how that can tell us
when a baby's becoming unwell.
327
00:15:39,630 --> 00:15:42,640
- [Voiceover] So Dr. McGregor
did what data scientists do,
328
00:15:42,640 --> 00:15:44,511
she looked for the invisible.
329
00:15:44,511 --> 00:15:46,220
She and her team analyzed the data
330
00:15:46,220 --> 00:15:49,113
from thousands of heart beats
and what they discovered
331
00:15:49,113 --> 00:15:51,194
were minute fluctuations
that could predict
332
00:15:51,194 --> 00:15:53,607
the onset of life-threatening infections
333
00:15:53,607 --> 00:15:56,420
long before physical symptoms appeared.
334
00:15:56,420 --> 00:16:00,046
- When the body first starts
dealing with infection,
335
00:16:00,046 --> 00:16:01,800
there are these subtle changes
336
00:16:01,800 --> 00:16:04,938
and that's why we have to
watch every single heart beat.
337
00:16:04,938 --> 00:16:07,228
And what we're finding is
that when you're starting
338
00:16:07,228 --> 00:16:10,401
to become unwell, the
heart's ability to react,
339
00:16:10,401 --> 00:16:14,329
to speed up and slow down, gets subdued.
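A minimal sketch of that idea: watch the beat-to-beat intervals and flag when variability falls well below the baby's own baseline. The numbers and the threshold are illustrative only; Dr. McGregor's system analyzes far more signals in real time:

```python
# Flag subdued heart-rate variability relative to a healthy baseline.
import statistics

def variability(intervals_ms):
    """Standard deviation of beat-to-beat intervals (a simple HRV proxy)."""
    return statistics.stdev(intervals_ms)

baseline = [412, 398, 425, 405, 431, 394, 418, 402]   # healthy reference window
current  = [410, 409, 411, 410, 412, 409, 411, 410]   # suspiciously uniform

if variability(current) < 0.5 * variability(baseline):   # illustrative threshold
    print("ALERT: heart-rate variability subdued; review for early infection")
```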
340
00:16:16,542 --> 00:16:19,425
The human body has always been
341
00:16:19,425 --> 00:16:21,830
exhibiting these certain things.
342
00:16:21,830 --> 00:16:25,842
The difference is we've started to gather
343
00:16:25,842 --> 00:16:28,315
more information about the body now
344
00:16:28,315 --> 00:16:32,360
so that we can build this virtual person.
345
00:16:32,360 --> 00:16:35,915
The better we have the
virtual representation,
346
00:16:35,915 --> 00:16:38,368
then the better we can start to understand
347
00:16:38,368 --> 00:16:41,053
what will happen to them in the future.
348
00:16:41,053 --> 00:16:44,968
Back in 1999 I was pregnant
with my first child.
349
00:16:44,968 --> 00:16:48,560
She was born premature
and she passed away.
350
00:16:49,217 --> 00:16:52,526
There was no other viable outcome for her.
351
00:16:52,526 --> 00:16:56,988
But there are so many others
who have just been born early
352
00:16:56,988 --> 00:17:01,626
and they just need that
opportunity to grow and develop.
353
00:17:02,585 --> 00:17:06,420
We want to let the
computers monitor a baby
354
00:17:06,420 --> 00:17:10,302
as it breathes, as its
heart beats, as it sleeps,
355
00:17:10,302 --> 00:17:15,160
so that these algorithms are
watching for certain behaviors
356
00:17:15,160 --> 00:17:19,169
and if something starts
to go wrong for that baby,
357
00:17:19,169 --> 00:17:22,968
we have the ability to intervene.
358
00:17:25,055 --> 00:17:27,727
If we can just save one life,
359
00:17:27,727 --> 00:17:31,987
then for me personally,
it's already worthwhile.
360
00:17:34,654 --> 00:17:38,791
- Everybody understands
what it takes to digitize
361
00:17:38,791 --> 00:17:43,626
photography, a movie,
a magazine, newspaper,
362
00:17:43,626 --> 00:17:46,508
but they haven't yet grasped what it means
363
00:17:46,508 --> 00:17:50,686
to digitize the medical
essence of a human being.
364
00:17:52,273 --> 00:17:56,201
Everything about us now
that's medically relevant
365
00:17:56,201 --> 00:17:57,955
can be captured.
366
00:17:57,955 --> 00:18:01,361
With sensors, we can
digitize all of our metrics
367
00:18:01,361 --> 00:18:04,510
and with imaging, we
can digitize our anatomy
368
00:18:04,510 --> 00:18:06,091
and with our sequence of our DNA,
369
00:18:06,091 --> 00:18:08,851
we can digitize our biology.
370
00:18:10,182 --> 00:18:12,552
- The data story in the
genome is the fact that
371
00:18:12,552 --> 00:18:15,655
we have six billion data
points sitting in our genomes
372
00:18:15,655 --> 00:18:18,738
that we've never had access to before.
373
00:18:20,696 --> 00:18:22,253
When you sequence a person's genome,
374
00:18:22,253 --> 00:18:24,624
there are known differences
in the human genome
375
00:18:24,624 --> 00:18:27,216
that can predict a risk for a disease,
376
00:18:27,216 --> 00:18:29,134
or that you're a carrier for a disease,
377
00:18:29,134 --> 00:18:31,144
or that you have a certain ancestry.
378
00:18:31,144 --> 00:18:33,352
There's a lot of information
packed in the genome
379
00:18:33,352 --> 00:18:36,112
that we're starting to
learn more and more about.
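A sketch of the kind of lookup this enables once a genome is data; the variant IDs and annotations below are placeholders, not real clinical associations:

```python
# Scan a person's genotype for differences that carry a known annotation.
known_variants = {
    "rs0000001": "elevated risk for condition A",      # placeholder IDs
    "rs0000002": "carrier status for condition B",
    "rs0000003": "ancestry-informative marker",
}

my_genotype = {"rs0000001": "A/G", "rs0000003": "T/T", "rs9999999": "C/C"}

for variant_id, genotype in my_genotype.items():
    note = known_variants.get(variant_id)
    if note:
        print(f"{variant_id} ({genotype}): {note}")
```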
380
00:18:38,326 --> 00:18:41,499
Getting your own personal
information through your genome
381
00:18:41,499 --> 00:18:43,544
would not have been possible
382
00:18:43,544 --> 00:18:45,952
even 10 years ago because of cost.
383
00:18:45,952 --> 00:18:47,923
The cost of the technologies that have enabled this
384
00:18:47,923 --> 00:18:50,178
has dropped precipitously
and now we're able to
385
00:18:50,178 --> 00:18:55,012
get a really good look at
your genome for under $500.
386
00:18:55,012 --> 00:18:58,522
- And when it becomes
100 bucks or 10 bucks,
387
00:18:58,522 --> 00:19:02,166
we're going to have
everyone's genome as data.
388
00:19:04,658 --> 00:19:06,332
- The results came back on Tuesday,
389
00:19:06,332 --> 00:19:08,714
it was October 2nd, 1996.
390
00:19:08,714 --> 00:19:11,643
I was diagnosed that
day with breast cancer.
391
00:19:11,643 --> 00:19:14,106
A year out of treatment, I
found a lump on the other breast
392
00:19:14,106 --> 00:19:17,151
in the exact same position and I went in
393
00:19:17,151 --> 00:19:20,238
and they told me that I
had breast cancer again.
394
00:19:21,370 --> 00:19:23,996
Sedona's known about me being
tested for the BRCA gene,
395
00:19:23,996 --> 00:19:25,531
she's known my sister has tested,
396
00:19:25,531 --> 00:19:26,925
she knows my other sister tested
397
00:19:26,925 --> 00:19:29,180
and was negative for the gene mutation
398
00:19:29,180 --> 00:19:32,002
and so she actually told me,
"When I'm 18, I want to test,
399
00:19:32,002 --> 00:19:34,639
"you know, and see if I have
this gene mutation or not."
400
00:19:34,639 --> 00:19:39,185
I am gonna be completely distraught
401
00:19:39,185 --> 00:19:42,368
if I hand this gene down to my kid.
402
00:19:42,368 --> 00:19:44,785
- Do you know what your chances
are of having the mutation
403
00:19:44,785 --> 00:19:45,947
that your mom has?
404
00:19:45,947 --> 00:19:47,085
- I'd say 50/50.
405
00:19:47,085 --> 00:19:48,841
- You're exactly right.
406
00:19:48,841 --> 00:19:51,130
BRCA2 is a gene that we all have,
407
00:19:51,130 --> 00:19:52,978
it's called a tumor suppressor gene,
408
00:19:52,978 --> 00:19:55,128
but women, if you have
a mutation in the gene
409
00:19:55,128 --> 00:19:57,766
it causes the gene not to
function like it should.
410
00:19:57,766 --> 00:20:01,392
So the risk mainly of
breast and ovarian cancer
411
00:20:01,392 --> 00:20:04,076
is a lot higher than in
the general population.
412
00:20:04,076 --> 00:20:06,575
- An average woman would have a 12% risk
413
00:20:06,575 --> 00:20:08,214
of getting breast cancer in a lifetime
414
00:20:08,214 --> 00:20:09,551
and most women aren't going out there,
415
00:20:09,551 --> 00:20:11,387
getting preventive mastectomies,
416
00:20:11,387 --> 00:20:13,600
but when you're faced with an 87% risk
417
00:20:13,600 --> 00:20:16,308
of getting breast cancer in your lifetime,
418
00:20:16,308 --> 00:20:20,486
it kind of makes that a possible choice.
419
00:20:23,316 --> 00:20:26,070
- [Voiceover] You'll need
to swish this mouthwash
420
00:20:26,070 --> 00:20:28,040
for 30 seconds.
421
00:20:28,708 --> 00:20:30,510
- We are definitely moving into a world
422
00:20:30,510 --> 00:20:33,334
where the patient or the person
is at the center of things
423
00:20:33,334 --> 00:20:36,350
and hopefully also at the controls.
424
00:20:36,971 --> 00:20:38,691
People will have access to the data
425
00:20:38,691 --> 00:20:43,235
that is informative around
the type of disease they have
426
00:20:43,235 --> 00:20:45,827
and that data then can
point much more directly
427
00:20:45,827 --> 00:20:47,953
to proper treatments,
428
00:20:47,953 --> 00:20:49,627
but the data can also say that a treatment
429
00:20:49,627 --> 00:20:51,963
works for a person or it
doesn't work for a person
430
00:20:51,963 --> 00:20:53,671
based on their genetic profile
431
00:20:53,671 --> 00:20:55,345
and we're gonna start moving more and more
432
00:20:55,345 --> 00:20:57,518
into this notion of personalized medicine
433
00:20:57,518 --> 00:20:59,063
as we learn more about the genome
434
00:20:59,063 --> 00:21:01,236
and the study of pharmacogenetics,
435
00:21:01,236 --> 00:21:05,037
which is how do our genes
influence the drugs we take.
436
00:21:05,037 --> 00:21:07,408
Ultimately, instead of treating disease,
437
00:21:07,408 --> 00:21:09,116
is there data that could really help us
438
00:21:09,116 --> 00:21:12,673
move away from contracting
these illnesses to begin with
439
00:21:12,673 --> 00:21:15,723
and go more toward a preventive model?
440
00:21:15,723 --> 00:21:20,228
(mellow music)
441
00:21:20,228 --> 00:21:25,202
- Now you can't talk about
information separate from health.
442
00:21:25,202 --> 00:21:26,484
How you feel is information,
443
00:21:26,484 --> 00:21:28,200
how you respond to a drug is information,
444
00:21:28,200 --> 00:21:30,037
your genetic code is information.
445
00:21:30,037 --> 00:21:32,001
What's really happening is
when we start collecting it,
446
00:21:32,001 --> 00:21:32,919
we're going to start seeing it
447
00:21:32,919 --> 00:21:34,964
and we're going to start interpreting it.
448
00:21:35,807 --> 00:21:38,096
We're beginning the age
of collecting information
449
00:21:38,096 --> 00:21:40,398
from sensors that are cheap and ubiquitous
450
00:21:40,398 --> 00:21:42,513
that we can process continuously
451
00:21:42,513 --> 00:21:45,159
and we can actually start knowing things.
452
00:21:45,159 --> 00:21:47,239
- If we monitored our
health throughout the day,
453
00:21:47,239 --> 00:21:50,737
continuously every second,
what would that really enable?
454
00:21:50,737 --> 00:21:53,422
- And there's now a lot
of really great technology
455
00:21:53,422 --> 00:21:57,129
coming out around this sense
of tracking and monitoring
456
00:21:57,129 --> 00:22:00,058
and we have all kinds of
sensor companies and devices.
457
00:22:00,058 --> 00:22:01,859
- We're actually collecting
a lot of physiological
458
00:22:01,859 --> 00:22:04,265
information, you know,
heart rate, breathing,
459
00:22:04,265 --> 00:22:07,324
in real-time, you know,
every minute, every second.
460
00:22:08,992 --> 00:22:11,491
- [Linda] People wanting to
measure their daily activities
461
00:22:11,491 --> 00:22:13,571
and being able to track your own sleep,
462
00:22:13,571 --> 00:22:16,581
being able to watch and
monitor your own food intake,
463
00:22:16,581 --> 00:22:18,717
being able to track your own movement.
464
00:22:18,717 --> 00:22:20,169
- It's almost like
looking down at our lives
465
00:22:20,169 --> 00:22:21,518
from 30,000 feet.
466
00:22:21,518 --> 00:22:23,272
There's a company right now in Boston
467
00:22:23,272 --> 00:22:25,527
that can actually predict that
you're going to get depressed
468
00:22:25,527 --> 00:22:27,562
two days before you get depressed
469
00:22:27,562 --> 00:22:29,027
and the gentleman who created it said
470
00:22:29,027 --> 00:22:31,000
if you actually watch any one of us,
471
00:22:31,000 --> 00:22:34,208
most people have a very
discernible pattern of behavior.
472
00:22:34,208 --> 00:22:37,253
And for the first week, our
software basically determines
473
00:22:37,253 --> 00:22:39,008
what your normal pattern is
474
00:22:39,008 --> 00:22:40,554
and then two days before you're showing
475
00:22:40,554 --> 00:22:42,727
any outward signs of depression,
476
00:22:42,727 --> 00:22:44,610
the amount of Tweets and
emails that you're sending
477
00:22:44,610 --> 00:22:47,154
go down, your radius of
travel starts shrinking,
478
00:22:47,154 --> 00:22:49,153
the amount of time that
you spend at home goes up.
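A minimal sketch of that baseline-then-deviation idea; the features echo the ones described, but the numbers and thresholds are invented, not the company's:

```python
# Learn a week of "normal" behavior, then flag when messaging volume
# and travel radius drop while time at home rises beyond 2 sigma.
import statistics

baseline = {
    "messages_sent": [42, 38, 45, 40, 44, 39, 41],
    "travel_km":     [12.0, 9.5, 14.2, 11.0, 13.1, 10.4, 12.5],
    "hours_at_home": [14, 15, 13, 14, 13, 15, 14],
}
today = {"messages_sent": 19, "travel_km": 3.2, "hours_at_home": 21}

def deviates(feature, value, direction):
    mu = statistics.mean(baseline[feature])
    sd = statistics.stdev(baseline[feature])
    return (value < mu - 2 * sd) if direction == "down" else (value > mu + 2 * sd)

warning_signs = [
    deviates("messages_sent", today["messages_sent"], "down"),
    deviates("travel_km", today["travel_km"], "down"),
    deviates("hours_at_home", today["hours_at_home"], "up"),
]
if all(warning_signs):
    print("Pattern matches the early-warning profile described above")
```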
479
00:22:49,153 --> 00:22:52,151
- You can look to see if how you exercise
480
00:22:52,151 --> 00:22:54,081
changes your social behavior,
481
00:22:54,081 --> 00:22:56,173
if what you eat changes how you sleep
482
00:22:56,173 --> 00:23:00,008
and how that impacts your medical claims.
483
00:23:00,008 --> 00:23:01,972
- All kinds of data and information
484
00:23:01,972 --> 00:23:05,063
are sitting inside the
work you do every day.
485
00:23:05,063 --> 00:23:06,528
- Now, with all these devices,
486
00:23:06,528 --> 00:23:10,270
we have real-time information,
real-time understanding.
487
00:23:10,270 --> 00:23:11,327
- Now that might sound interesting,
488
00:23:11,327 --> 00:23:13,617
might help you shed a few pounds,
489
00:23:13,617 --> 00:23:15,255
realize you're eating
too many potato chips
490
00:23:15,255 --> 00:23:16,755
and sitting around too much perhaps
491
00:23:16,755 --> 00:23:19,009
and that's useful to you individually,
492
00:23:19,009 --> 00:23:23,146
but if hundreds of
millions of people do that,
493
00:23:23,146 --> 00:23:26,145
you have a big cloud of data
494
00:23:26,145 --> 00:23:29,236
about people's behavior
that can be crawled through
495
00:23:29,236 --> 00:23:31,956
by pattern recognition algorithms.
496
00:23:33,204 --> 00:23:35,622
And doctors and health policy officials
497
00:23:35,622 --> 00:23:38,213
can start to see patterns
that change the way,
498
00:23:38,213 --> 00:23:40,677
collectively as a society, we understand
499
00:23:40,677 --> 00:23:44,129
not just our health, but every single area
500
00:23:44,129 --> 00:23:46,723
where data can be applied
501
00:23:46,723 --> 00:23:49,323
because we start to
understand how we might,
502
00:23:49,323 --> 00:23:53,107
collectively as a culture,
change our behavior.
503
00:23:56,657 --> 00:23:58,586
- And if you look at the future of this,
504
00:23:58,586 --> 00:24:02,642
we're gonna be embedded in a
sea of information services
505
00:24:02,642 --> 00:24:07,360
that are connected to massive
databases in the cloud.
506
00:24:07,360 --> 00:24:11,111
(rhythmic electronic music)
507
00:24:11,111 --> 00:24:12,611
- If you take a look at
everything that you touch
508
00:24:12,611 --> 00:24:15,039
in everyday life, the
majority of these things
509
00:24:15,039 --> 00:24:18,246
were invented many, many,
many, many, many years ago
510
00:24:18,246 --> 00:24:20,385
and they're ripe for reinvention
511
00:24:20,385 --> 00:24:22,594
and when they get reinvented,
512
00:24:22,594 --> 00:24:23,848
they're gonna be connected,
513
00:24:23,848 --> 00:24:26,184
they're gonna be connected in some way
514
00:24:26,184 --> 00:24:30,031
that data that comes off of
these devices that you touch
515
00:24:30,031 --> 00:24:32,855
is gonna be collected and
stored in a central location
516
00:24:32,855 --> 00:24:36,457
and people are gonna run big
data algorithms on this data
517
00:24:36,457 --> 00:24:37,911
and then you're gonna get the feedback
518
00:24:37,911 --> 00:24:41,043
of the collective whole
rather than the individual.
519
00:24:42,931 --> 00:24:44,383
- So it's taking people
who are already out there,
520
00:24:44,383 --> 00:24:45,895
who already have these devices,
521
00:24:45,895 --> 00:24:48,358
and turning all these
people into contributors
522
00:24:48,358 --> 00:24:51,281
of information back to the system.
523
00:24:52,949 --> 00:24:56,487
You become one of the
nodes on the network.
524
00:24:57,586 --> 00:24:59,539
I think the Internet,
as wondrous as it's been
525
00:24:59,539 --> 00:25:01,666
over the last 20 years, was like a layer
526
00:25:01,666 --> 00:25:03,920
that needed to be in place
for all these sensors
527
00:25:03,920 --> 00:25:06,764
and devices to be able to
communicate with each other.
528
00:25:06,764 --> 00:25:09,181
- You know, we're
building this global brain
529
00:25:09,181 --> 00:25:12,854
that has these new functions
and we're accessing them
530
00:25:12,854 --> 00:25:14,899
primarily now through our mobile devices,
531
00:25:14,899 --> 00:25:17,409
or obviously also on our desktops,
532
00:25:17,409 --> 00:25:19,117
but increasingly mobile.
533
00:25:19,117 --> 00:25:22,535
- I think this data revolution
has a strange impact really
534
00:25:22,535 --> 00:25:26,009
of people feeling like there's
somebody listening to them
535
00:25:26,009 --> 00:25:29,728
and that could mean listening
in the sense of Big Brother,
536
00:25:29,728 --> 00:25:31,355
someone's listening in,
537
00:25:31,355 --> 00:25:34,492
or it could be someone's
really hearing me.
538
00:25:34,492 --> 00:25:37,189
This device in my hand knows who I am,
539
00:25:37,189 --> 00:25:40,746
it can somewhat anticipate what I want
540
00:25:40,746 --> 00:25:44,162
or where I'm going and react to that.
541
00:25:45,748 --> 00:25:48,886
The implications of that are huge
542
00:25:48,886 --> 00:25:50,350
for the decisions that we make
543
00:25:50,350 --> 00:25:52,902
and for the systems that we're part of.
544
00:25:55,486 --> 00:25:57,742
I think about living in a city
545
00:25:57,742 --> 00:26:00,252
and how your experience
of living in that city
546
00:26:00,252 --> 00:26:02,413
would be in 10 or 15 years.
547
00:26:02,413 --> 00:26:03,782
You've got places like Chicago
548
00:26:03,782 --> 00:26:05,258
where they're being hugely innovative
549
00:26:05,258 --> 00:26:07,431
and they're taking massive data sets,
550
00:26:07,431 --> 00:26:09,128
combining them in interesting ways,
551
00:26:09,128 --> 00:26:10,895
running interesting algorithms on them
552
00:26:10,895 --> 00:26:13,521
and figuring out ways
that they can intervene
553
00:26:13,521 --> 00:26:15,695
in this system to sort of see patterns
554
00:26:15,695 --> 00:26:18,362
and be able to react to those patterns.
555
00:26:19,030 --> 00:26:23,295
When you take in data, it
affects you as an individual
556
00:26:23,295 --> 00:26:24,783
and then you affect the system
557
00:26:24,783 --> 00:26:26,421
and that affects the data again
558
00:26:26,421 --> 00:26:29,722
and this round trip that you
start to see yourself part of
559
00:26:29,722 --> 00:26:33,309
makes me understand that I'm
an actor in a larger system.
560
00:26:33,309 --> 00:26:35,564
For instance, if you know
by looking at the data,
561
00:26:35,564 --> 00:26:37,703
and you have to put
different data sets together
562
00:26:37,703 --> 00:26:40,480
to be able to see this, that
some of the street lights,
563
00:26:40,480 --> 00:26:43,165
you know, when they go out,
they cause higher crime
564
00:26:43,165 --> 00:26:44,966
in that particular block,
565
00:26:44,966 --> 00:26:46,419
(siren blares)
566
00:26:46,419 --> 00:26:49,173
you start to see ways that
if you can query that data
567
00:26:49,173 --> 00:26:51,765
in intelligent ways,
that you can prioritize
568
00:26:51,765 --> 00:26:54,275
the limited resources
that you have in a city
569
00:26:54,275 --> 00:26:56,774
to take care of the things
that have, you know,
570
00:26:56,774 --> 00:26:59,903
follow-on effects
and follow-on costs.
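A sketch of that kind of cross-dataset query: join street-light outages with crime reports by block and rank repairs by the excess crime seen while the light was out (all figures are hypothetical):

```python
# Prioritize street-light repairs by crime increase attributable to darkness.
outages = {"block_12": 9, "block_07": 2, "block_33": 6}   # days dark

# crime reports per block per day: (rate while lit, rate while dark)
crime_rate = {"block_12": (0.4, 1.3), "block_07": (0.2, 0.3), "block_33": (0.5, 1.6)}

def priority(block):
    lit, dark = crime_rate[block]
    return (dark - lit) * outages[block]   # excess crime while the light was out

repair_queue = sorted(outages, key=priority, reverse=True)
print("Repair order:", repair_queue)   # block_12 and block_33 come first
```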
571
00:27:00,408 --> 00:27:02,202
- In the end, you know,
you're going to hope that
572
00:27:02,202 --> 00:27:05,421
this is just our reaction as a species
573
00:27:05,421 --> 00:27:07,386
to this scale problem, right,
574
00:27:07,386 --> 00:27:08,932
how do you get another, you know,
575
00:27:08,932 --> 00:27:11,473
two billion people on the planet?
576
00:27:11,473 --> 00:27:13,600
You can't do it unless
you start instrumenting
577
00:27:13,600 --> 00:27:16,111
every little thing and
dialing it in just right.
578
00:27:16,111 --> 00:27:18,284
- And you know, right
now you wait for the bus
579
00:27:18,284 --> 00:27:20,945
because the bus is coming
on a particular schedule
580
00:27:20,945 --> 00:27:23,118
and it's great, we're
now at the point where
581
00:27:23,118 --> 00:27:26,128
your phone will tell you when
the bus is really coming,
582
00:27:26,128 --> 00:27:28,720
not just when the bus
is scheduled to come.
583
00:27:29,631 --> 00:27:31,724
You know, take that a little bit forward.
584
00:27:31,724 --> 00:27:33,060
What about when there's more use
585
00:27:33,060 --> 00:27:34,896
on one line than the other?
586
00:27:34,896 --> 00:27:36,650
Well instead of sticking
with the schedule,
587
00:27:36,650 --> 00:27:40,033
does the system start to understand
588
00:27:40,033 --> 00:27:44,576
that maybe this route
doesn't need 10 buses today
589
00:27:44,576 --> 00:27:46,751
and automatically shift those resources
590
00:27:46,751 --> 00:27:50,210
over to the lines where
the buses are full.
591
00:27:50,210 --> 00:27:53,557
- Boston just created a new smartphone app
592
00:27:53,557 --> 00:27:57,229
which uses the
accelerometer in your phone.
593
00:27:57,229 --> 00:27:59,646
So if you're driving through
the streets of south Boston
594
00:27:59,646 --> 00:28:03,122
and all of a sudden there's
a big dip in the street,
595
00:28:03,122 --> 00:28:05,830
the phone realizes it.
596
00:28:05,830 --> 00:28:07,584
So anybody in the city of Boston
597
00:28:07,584 --> 00:28:09,339
that has this up and running
598
00:28:09,339 --> 00:28:12,012
is feeding real-time data
on the quality of the roads
599
00:28:12,012 --> 00:28:13,639
to the city of Boston.
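A minimal sketch of the detection step: flag sudden vertical jolts and report them with the phone's coordinates. The threshold and report format are illustrative; the real app filters far more carefully:

```python
# Report road-quality events from phone accelerometer readings.
JOLT_THRESHOLD = 3.0   # vertical acceleration magnitude, m/s^2 (illustrative)

def detect_bumps(samples):
    """samples: list of (vertical_accel_m_s2, lat, lon) readings."""
    return [{"lat": lat, "lon": lon}
            for accel, lat, lon in samples
            if abs(accel) > JOLT_THRESHOLD]

ride = [(0.1, 42.33, -71.05), (0.2, 42.33, -71.05),
        (5.8, 42.34, -71.06),   # big dip in the street
        (0.3, 42.34, -71.06)]
print(detect_bumps(ride))   # one report at the dip's coordinates
```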
600
00:28:13,639 --> 00:28:15,394
- Then you start to feel that your city
601
00:28:15,394 --> 00:28:17,311
is sort of a responsive organism
602
00:28:17,311 --> 00:28:21,398
just like your body puts
your blood where it needs it.
603
00:28:22,657 --> 00:28:26,040
Think about ways that
we could live in cities
604
00:28:26,040 --> 00:28:29,050
when they're that responsive to our needs
605
00:28:29,050 --> 00:28:31,304
and think about the implications
of that for the planet
606
00:28:31,304 --> 00:28:33,640
because really cities are also really
607
00:28:33,640 --> 00:28:36,973
how we're going to
survive the 21st century.
608
00:28:36,973 --> 00:28:39,645
You can live in a city with
a far smaller footprint
609
00:28:39,645 --> 00:28:42,015
than anywhere else in the world
610
00:28:42,015 --> 00:28:45,619
and I think data and sort
of the responsive systems
611
00:28:45,619 --> 00:28:48,414
will play an enormous role in that.
612
00:28:51,093 --> 00:28:53,010
- I think one of the most
exciting things about data
613
00:28:53,010 --> 00:28:56,765
is that, you know, it's
giving us extra senses,
614
00:28:56,765 --> 00:28:58,391
it's expanding upon, you know,
615
00:28:58,391 --> 00:29:01,529
our ability to perceive the world
616
00:29:01,529 --> 00:29:03,993
and it actually ends up
giving us the opportunity
617
00:29:03,993 --> 00:29:06,201
to make things tangible again
618
00:29:06,201 --> 00:29:08,084
and to actually get a
perspective on ourselves,
619
00:29:08,084 --> 00:29:11,506
both as individuals and also as society.
620
00:29:13,510 --> 00:29:16,847
- And there's always that
moment in data visualization
621
00:29:16,847 --> 00:29:18,438
when you're looking at, you know,
622
00:29:18,438 --> 00:29:20,193
tons and tons and tons of data.
623
00:29:20,193 --> 00:29:22,610
The point is not to look
at the tons and tons
624
00:29:22,610 --> 00:29:25,283
and tons of data, but what are the stories
625
00:29:25,283 --> 00:29:27,451
that emerge out of it.
626
00:29:28,956 --> 00:29:31,002
- If you said look, give
me the home street address
627
00:29:31,002 --> 00:29:35,429
of everyone who entered New
York State prison last year
628
00:29:35,429 --> 00:29:37,300
and the home street address of everyone
629
00:29:37,300 --> 00:29:39,438
who left New York State prison last year
630
00:29:39,438 --> 00:29:42,146
and we said look, let's get
the numbers, put it on a map
631
00:29:42,146 --> 00:29:44,154
and actually show it to people.
632
00:29:44,154 --> 00:29:47,251
And when we first
produced our Brooklyn map,
633
00:29:47,251 --> 00:29:49,082
which was the first one we did,
634
00:29:49,082 --> 00:29:51,673
they hit the floor, not
because nobody knew this.
635
00:29:51,673 --> 00:29:53,126
You know, everyone knew anecdotally
636
00:29:53,126 --> 00:29:57,472
how concentrated the effect
of incarceration was,
637
00:29:57,472 --> 00:30:00,308
but no one had actually seen
it based on actual data.
638
00:30:00,308 --> 00:30:04,318
We started to show these
remarkably intensive
639
00:30:04,318 --> 00:30:06,986
concentrations of people
going in and out of prison,
640
00:30:06,986 --> 00:30:09,125
highly disproportionately located
641
00:30:09,125 --> 00:30:12,443
in very small areas around the city.
642
00:30:16,214 --> 00:30:19,003
- [Voiceover] And what we found
is that the home addresses
643
00:30:19,003 --> 00:30:22,268
of incarcerated people
correlates very highly
644
00:30:22,268 --> 00:30:25,819
with poverty and with people of color.
645
00:30:28,940 --> 00:30:31,415
- You have a justice system,
which by all accounts
646
00:30:31,415 --> 00:30:32,822
is supposed to be essentially based on
647
00:30:32,822 --> 00:30:37,179
a case-by-case, individual
decision of justice.
648
00:30:37,179 --> 00:30:39,015
Well when you looked at the map over time,
649
00:30:39,015 --> 00:30:43,315
what you really were seeing
was this mass population
650
00:30:43,315 --> 00:30:48,162
movement out and mass
population resettlement back,
651
00:30:48,162 --> 00:30:50,564
this cyclical movement of people.
652
00:30:51,276 --> 00:30:52,952
- So once we had mapped the data,
653
00:30:52,952 --> 00:30:55,334
we quantified it in
terms of how much it cost
654
00:30:55,334 --> 00:30:58,132
to house those same people in prison.
655
00:30:58,132 --> 00:30:59,050
- And that's where we started to think
656
00:30:59,050 --> 00:31:01,560
about million dollar blocks.
657
00:31:01,560 --> 00:31:06,395
We found over 35 individual
city blocks in Brooklyn alone
658
00:31:06,395 --> 00:31:08,654
for which the state was spending
659
00:31:08,654 --> 00:31:11,072
more than a million dollars every year
660
00:31:11,072 --> 00:31:14,042
to remove and return people to prison.
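A sketch of the aggregation behind those maps: group admissions by home block, price them at an annual cost per person, and flag the blocks that cross a million dollars (the blocks and cost figure are hypothetical):

```python
# Find "million dollar blocks" from admissions grouped by home block.
from collections import Counter

COST_PER_PERSON_PER_YEAR = 45_000   # illustrative annual incarceration cost

home_blocks = ["Block A", "Block A", "Block B", "Block A", "Block C",
               "Block A"] * 6       # one entry per person admitted last year

spending = {block: n * COST_PER_PERSON_PER_YEAR
            for block, n in Counter(home_blocks).items()}

million_dollar_blocks = {b: s for b, s in spending.items() if s > 1_000_000}
print(million_dollar_blocks)   # Block A: 24 people -> $1,080,000
```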
661
00:31:16,663 --> 00:31:18,882
We needed to reframe that conversation
662
00:31:18,882 --> 00:31:21,682
and what immediately
emerged out of this was
663
00:31:21,682 --> 00:31:23,937
this idea of justice reinvestment.
664
00:31:23,937 --> 00:31:25,819
We weren't building
anything in those places
665
00:31:25,819 --> 00:31:27,889
for those dollars.
666
00:31:27,889 --> 00:31:30,329
How can we demand sort of more equity
667
00:31:30,329 --> 00:31:31,991
for that investment
668
00:31:31,991 --> 00:31:33,873
to extract those neighborhoods
669
00:31:33,873 --> 00:31:37,801
from what decades of
criminalization has done?
670
00:31:37,801 --> 00:31:40,788
And that shift had to come from the data
671
00:31:40,788 --> 00:31:43,961
and a new way of thinking
about information.
672
00:31:46,314 --> 00:31:48,900
These maps did that.
673
00:31:52,450 --> 00:31:54,612
- The amount of data that
now is being collected
674
00:31:54,612 --> 00:31:59,086
about those areas that are
stuck in cycles of poverty,
675
00:31:59,086 --> 00:32:02,549
cycles of famine, cycles of war,
676
00:32:02,549 --> 00:32:05,978
gives people or governments and NGOs
677
00:32:05,978 --> 00:32:09,349
an opportunity to do good.
678
00:32:09,349 --> 00:32:12,602
Understanding on the ground,
information on the ground,
679
00:32:12,602 --> 00:32:15,124
data on the ground can change the way
680
00:32:15,124 --> 00:32:18,158
people apply resources
681
00:32:18,158 --> 00:32:21,255
which are intended to try to help.
682
00:32:22,596 --> 00:32:24,015
- We really fundamentally believe
683
00:32:24,015 --> 00:32:25,897
that data has intrinsic value
684
00:32:25,897 --> 00:32:27,559
and we also fundamentally believe
685
00:32:27,559 --> 00:32:30,488
that the individuals who create that data
686
00:32:30,488 --> 00:32:33,742
should be able to benefit from that data.
687
00:32:34,863 --> 00:32:36,669
But we're working with one
of the big mobile phone
688
00:32:36,669 --> 00:32:39,714
operators in Kenya, we're
looking at the dynamics
689
00:32:39,714 --> 00:32:42,550
of these mobile phone subscribers.
690
00:32:42,550 --> 00:32:44,886
Millions of phones in Kenya.
691
00:32:46,101 --> 00:32:47,355
We're looking at how the population
692
00:32:47,355 --> 00:32:49,813
was moving over the country.
693
00:32:50,772 --> 00:32:53,201
And we're overlaying that movement data
694
00:32:53,201 --> 00:32:56,397
with data about parasite prevalence
695
00:32:56,397 --> 00:32:59,622
from household surveys
and data from hospitals.
696
00:33:02,545 --> 00:33:05,346
We can start identifying
these malaria hot spots,
697
00:33:05,346 --> 00:33:09,016
regions within Kenya
that desperately needed
698
00:33:09,016 --> 00:33:11,311
the eradication dollars.
699
00:33:13,769 --> 00:33:15,860
It's fascinating to
start extracting models
700
00:33:15,860 --> 00:33:17,418
and plotting graphs of the behavior
701
00:33:17,418 --> 00:33:19,660
of tens of millions of people in Kenya,
702
00:33:19,660 --> 00:33:22,508
but it's meaningful when you
can make those insights count,
703
00:33:22,508 --> 00:33:25,007
when you can take the
insights that you've gleaned
704
00:33:25,007 --> 00:33:26,807
and put them into practice
705
00:33:26,807 --> 00:33:29,771
and measure what the impact was
706
00:33:29,771 --> 00:33:32,108
and hopefully make
the lives of the people
707
00:33:32,108 --> 00:33:34,186
who are generating this data better.
708
00:33:34,186 --> 00:33:37,163
(children yelling)
709
00:33:37,163 --> 00:33:41,544
(siren blaring)
710
00:33:41,544 --> 00:33:45,344
- That afternoon when the
earthquake struck in January,
711
00:33:45,344 --> 00:33:48,645
I was watching CNN and
saw the breaking news
712
00:33:48,645 --> 00:33:52,317
and my wife was in
Port-au-Prince at the time
713
00:33:52,317 --> 00:33:54,154
and for the better part of 12 hours
714
00:33:54,154 --> 00:33:56,362
I had no idea whether any of my friends
715
00:33:56,362 --> 00:33:58,450
were alive or dead.
716
00:33:58,450 --> 00:34:01,495
- [Voiceover] Meier was a
Tufts University PhD student
717
00:34:01,495 --> 00:34:04,040
and directed crisis mapping for Ushahidi,
718
00:34:04,040 --> 00:34:06,260
a nonprofit that collects, visualizes,
719
00:34:06,260 --> 00:34:08,375
and then maps crisis data.
720
00:34:08,375 --> 00:34:10,095
- And so I went on social media
721
00:34:10,095 --> 00:34:12,315
and I found dozens and dozens of Haitians
722
00:34:12,315 --> 00:34:15,522
tweeting live about the damage
723
00:34:15,522 --> 00:34:17,486
and a lot of the time they were sharing
724
00:34:17,486 --> 00:34:19,276
where this damage was happening.
725
00:34:19,276 --> 00:34:22,286
So they would say the church
on the corner of X and Y
726
00:34:22,286 --> 00:34:25,261
has been destroyed or is collapsed
727
00:34:25,261 --> 00:34:27,423
and they would refer to
street names and so on.
728
00:34:27,423 --> 00:34:29,933
So it's about really
becoming a digital detective
729
00:34:29,933 --> 00:34:33,637
and then trying to understand
where on the map this was.
730
00:34:33,637 --> 00:34:35,148
- [Voiceover] So he
called everyone he knew
731
00:34:35,148 --> 00:34:37,937
and put together a mostly
volunteer team in Boston
732
00:34:37,937 --> 00:34:40,575
to prioritize the most
life and death tweets
733
00:34:40,575 --> 00:34:42,876
and map them for rescue workers.
734
00:34:42,876 --> 00:34:45,967
- For the first time,
it wasn't the government
735
00:34:45,967 --> 00:34:47,838
emergency management organization
736
00:34:47,838 --> 00:34:50,011
that had the best data
of what was happening,
737
00:34:50,011 --> 00:34:53,301
but it was legions of
volunteers that came together
738
00:34:53,301 --> 00:34:55,346
and crowdmapped the location
739
00:34:55,346 --> 00:34:57,101
of buildings that had collapsed,
740
00:34:57,101 --> 00:34:58,948
people that were trapped in rubble,
741
00:34:58,948 --> 00:35:00,738
locations where water was needed,
742
00:35:00,738 --> 00:35:03,867
where physicians were needed and the like.
743
00:35:04,500 --> 00:35:06,673
- I think we've seen, not only in Haiti
744
00:35:06,673 --> 00:35:08,556
but almost every disaster since Haiti,
745
00:35:08,556 --> 00:35:13,065
just an explosion of social media content.
746
00:35:13,065 --> 00:35:14,727
- [Voiceover] Disaster
mapping groups like Meier's
747
00:35:14,727 --> 00:35:16,610
realized that there was so much at stake
748
00:35:16,610 --> 00:35:19,015
and so much raw data
coming from social media
749
00:35:19,015 --> 00:35:20,619
during natural disasters.
750
00:35:20,619 --> 00:35:22,328
They needed to come up with new algorithms
751
00:35:22,328 --> 00:35:24,536
to sort through the flood of information.
752
00:35:24,536 --> 00:35:28,383
- We are drawing on
artificial intelligence,
753
00:35:28,383 --> 00:35:31,090
machine learning, working
with data scientists
754
00:35:31,090 --> 00:35:34,182
to develop semi-automated ways
755
00:35:34,182 --> 00:35:38,063
to extract relevant, informative
and actionable information
756
00:35:38,063 --> 00:35:40,156
from social media during disasters.
757
00:35:40,156 --> 00:35:41,329
So one of our projects is called
758
00:35:41,329 --> 00:35:44,455
Artificial Intelligence
for Disaster Response.
759
00:35:46,332 --> 00:35:47,958
During Hurricane Sandy,
760
00:35:47,958 --> 00:35:51,971
we collected five million tweets
during the first few days.
761
00:35:52,593 --> 00:35:55,858
With the Sandy data, we've
been able to show empirically
762
00:35:55,858 --> 00:35:58,485
that we can automatically
identify whether or not
763
00:35:58,485 --> 00:36:02,622
a tweet has been written
by an eyewitness.
764
00:36:02,622 --> 00:36:04,074
So somebody who is writing something
765
00:36:04,074 --> 00:36:06,620
saying the bridge is down,
766
00:36:06,620 --> 00:36:08,840
we can say with a degree of accuracy
767
00:36:08,840 --> 00:36:11,304
of about 80% or higher whether that tweet
768
00:36:11,304 --> 00:36:13,012
has actually been posted
by an eyewitness,
769
00:36:13,012 --> 00:36:16,214
which is really important
for disaster response.
770
00:36:18,230 --> 00:36:20,729
I think that goes to the heart of why
771
00:36:20,729 --> 00:36:23,367
something like social media
and Twitter is so important.
772
00:36:23,367 --> 00:36:26,540
Having these millions of
eyes and ears on the ground.
773
00:36:26,540 --> 00:36:28,341
It's about empowering the crowd,
774
00:36:28,341 --> 00:36:30,002
it's about empowering
those who are affected
775
00:36:30,002 --> 00:36:32,134
and those who want to help.
776
00:36:32,134 --> 00:36:34,040
These are real lives that we're capturing.
777
00:36:34,040 --> 00:36:36,481
This is not abstract information.
778
00:36:36,481 --> 00:36:39,398
These are real people who
are affected by disasters
779
00:36:39,398 --> 00:36:41,815
who are trying to either
help or seek help.
780
00:36:41,815 --> 00:36:44,321
It doesn't get more real than this.
781
00:36:48,788 --> 00:36:51,170
- Today, technology,
782
00:36:51,170 --> 00:36:53,624
in a lot of our communication tools,
783
00:36:53,624 --> 00:36:56,052
allows an idea to be spread instantly
784
00:36:56,052 --> 00:37:00,023
and with the original source of truth.
785
00:37:00,023 --> 00:37:02,894
I can have an idea and I can decide that
786
00:37:02,894 --> 00:37:04,324
I want to bring this around the world
787
00:37:04,324 --> 00:37:07,999
and I can do it almost instantaneously.
788
00:37:09,795 --> 00:37:11,666
- Tunisia's a great example.
789
00:37:11,666 --> 00:37:15,175
There were little uprisings
happening all over Tunisia
790
00:37:15,175 --> 00:37:17,431
and each one was brutally squashed
791
00:37:17,431 --> 00:37:19,650
and there was no media attention
792
00:37:19,650 --> 00:37:24,276
so no one knew that any other
little village had an issue.
793
00:37:24,276 --> 00:37:27,157
But what happened was in one village
794
00:37:27,157 --> 00:37:30,214
there was the man who
self-immolated in protest
795
00:37:30,214 --> 00:37:33,165
and the images were put online
796
00:37:33,165 --> 00:37:37,977
by a dissident group onto Facebook
797
00:37:37,977 --> 00:37:39,883
and then Al Jazeera picked it up
798
00:37:39,883 --> 00:37:42,696
and broadcast the
image across their region
799
00:37:42,696 --> 00:37:44,857
and then all of Tunisia realized
800
00:37:44,857 --> 00:37:47,287
wait a second, we're
about to have an uprising
801
00:37:47,287 --> 00:37:48,449
and it just went.
802
00:37:48,449 --> 00:37:53,454
(yelling)
803
00:37:55,095 --> 00:37:58,513
So Tunisia was really
activists on the ground,
804
00:37:58,513 --> 00:38:02,395
social media and mainstream
media working together,
805
00:38:02,395 --> 00:38:05,486
spreading across Tunisia this idea that
806
00:38:05,486 --> 00:38:07,031
you're not the only ones
807
00:38:07,031 --> 00:38:10,745
and it gave everyone the
courage to do the uprising.
808
00:38:12,412 --> 00:38:14,713
Technology has fundamentally changed
809
00:38:14,713 --> 00:38:17,167
the way people interact with government.
810
00:38:17,167 --> 00:38:19,432
That's another layer of the stack
811
00:38:19,432 --> 00:38:21,012
that's sort of being opened up.
812
00:38:21,012 --> 00:38:23,104
I think that's one of the
key challenges: big data
813
00:38:23,104 --> 00:38:26,067
has so much opportunity for both good
814
00:38:26,067 --> 00:38:28,287
and for really
screwing up our system.
815
00:38:28,287 --> 00:38:30,414
- You can't talk about data
without talking about people
816
00:38:30,414 --> 00:38:31,959
because people create the data
817
00:38:31,959 --> 00:38:33,923
and people utilize the data.
818
00:38:33,923 --> 00:38:38,171
(whirring)
819
00:38:44,360 --> 00:38:47,265
- So a handful of years ago
there's a guy named Andrew Pole
820
00:38:47,265 --> 00:38:50,077
who is a statistician
who gets hired by Target.
821
00:38:50,077 --> 00:38:51,496
He's sitting at his desk and some guys
822
00:38:51,496 --> 00:38:53,076
from the marketing department
come by and they say,
823
00:38:53,076 --> 00:38:55,505
"Look, if we wanted to figure out
824
00:38:55,505 --> 00:38:58,003
"which of our customers are pregnant,
825
00:38:58,003 --> 00:39:00,049
"could you tell us that?"
826
00:39:00,049 --> 00:39:01,722
So what Andrew Pole
started doing is he said
827
00:39:01,722 --> 00:39:05,569
the women who had signed
up for the baby registry,
828
00:39:05,569 --> 00:39:07,477
let's track what they're buying
829
00:39:07,477 --> 00:39:09,602
and see if there's any patterns.
830
00:39:09,602 --> 00:39:11,531
I mean, obviously if
someone starts buying a crib
831
00:39:11,531 --> 00:39:13,332
or a stroller, you know they're pregnant.
832
00:39:13,332 --> 00:39:15,622
But by using all of this
data they had collected,
833
00:39:15,622 --> 00:39:18,388
they were able to start
seeing these patterns
834
00:39:18,388 --> 00:39:21,331
that you couldn't actually guess at.
835
00:39:22,394 --> 00:39:25,526
When women were in their second trimester,
836
00:39:25,526 --> 00:39:28,524
they suddenly stopped
buying scented lotion
837
00:39:28,524 --> 00:39:30,697
and started buying unscented lotion
838
00:39:30,697 --> 00:39:32,777
and at about the end of
their second trimester,
839
00:39:32,777 --> 00:39:35,078
the beginning of their third
trimester, they would start
840
00:39:35,078 --> 00:39:38,887
buying a lot of cotton
balls and washcloths.
841
00:39:38,887 --> 00:39:42,850
- And then they could start
to subtly send you coupons
842
00:39:42,850 --> 00:39:45,901
for things that might be
related to your pregnancy.
843
00:39:46,720 --> 00:39:48,114
- They decided to do a little test case.
844
00:39:48,114 --> 00:39:50,648
So they send out some of
these ads to a local community
845
00:39:50,648 --> 00:39:52,787
and a couple weeks later
this father comes in
846
00:39:52,787 --> 00:39:55,704
to one of the stores and he's furious
847
00:39:55,704 --> 00:39:58,923
and he's got a flyer in his
hand that was sent to his house
848
00:39:58,923 --> 00:40:02,049
and he finds the manager
and he says to the manager,
849
00:40:02,049 --> 00:40:03,932
he says, "Look, I'm so upset.
850
00:40:03,932 --> 00:40:07,313
"You know, my daughter is 18 years old.
851
00:40:07,313 --> 00:40:10,277
"I don't know what you're
doing sending her this trash.
852
00:40:10,277 --> 00:40:12,497
"You sent her these coupons for diapers
853
00:40:12,497 --> 00:40:15,077
"and for cribs and for nursing equipment.
854
00:40:15,077 --> 00:40:16,623
"She's 18 years old
855
00:40:16,623 --> 00:40:18,877
"and it's like you're
encouraging her to get pregnant."
856
00:40:18,877 --> 00:40:21,004
Now the manager, who has
no idea what's going on
857
00:40:21,004 --> 00:40:23,839
with the pregnancy prediction machine
858
00:40:23,839 --> 00:40:25,222
that Andrew Pole built,
859
00:40:25,222 --> 00:40:26,896
says "Look, I'm so sorry.
860
00:40:26,896 --> 00:40:30,231
"I apologize, it's not
going to happen again."
861
00:40:30,231 --> 00:40:32,568
And a couple days later the
guy feels so bad about this
862
00:40:32,568 --> 00:40:35,159
that he calls the father at
home and he says to the father,
863
00:40:35,159 --> 00:40:36,879
"I just wanted to apologize again.
864
00:40:36,879 --> 00:40:38,622
"I'm so sorry this happened."
865
00:40:38,622 --> 00:40:40,167
And the father kind of
pauses for a moment.
866
00:40:40,167 --> 00:40:42,597
He says, "Well, I want you to know
867
00:40:42,597 --> 00:40:44,305
"I had a conversation with my daughter
868
00:40:44,305 --> 00:40:47,106
"and there's been some
activities in my household
869
00:40:47,106 --> 00:40:49,023
"that I haven't been aware of
870
00:40:49,023 --> 00:40:50,778
"and she's due in August.
871
00:40:50,778 --> 00:40:53,777
"So I owe you an apology."
872
00:40:53,777 --> 00:40:55,368
And when I asked Andrew Pole about this,
873
00:40:55,368 --> 00:40:56,996
before he stopped talking to me,
874
00:40:56,996 --> 00:41:00,122
before Target told him that he
couldn't talk to me anymore,
875
00:41:00,122 --> 00:41:03,469
he said, "Oh look, like
you gotta understand,
876
00:41:03,469 --> 00:41:05,305
"like this science is
just at the beginning,
877
00:41:05,305 --> 00:41:07,257
"like we're still playing
with what we can figure out
878
00:41:07,257 --> 00:41:08,598
"about your life."
879
00:41:08,598 --> 00:41:13,603
(mellow electronic music)
880
00:41:18,126 --> 00:41:19,962
- Everybody who's on Facebook is involved
881
00:41:19,962 --> 00:41:22,298
in a transaction in which
they're donating their data
882
00:41:22,298 --> 00:41:24,262
to Facebook, who then sells their data
883
00:41:24,262 --> 00:41:26,040
and in return they get this service
884
00:41:26,040 --> 00:41:27,388
which allows them to post pictures
885
00:41:27,388 --> 00:41:28,353
and connect to their friends
886
00:41:28,353 --> 00:41:30,608
and so on and so on and so on and so on.
887
00:41:30,608 --> 00:41:32,153
That's the transaction,
888
00:41:32,153 --> 00:41:34,442
but nobody knows that's the transaction.
889
00:41:34,442 --> 00:41:36,395
Most people, I think,
don't understand that.
890
00:41:36,395 --> 00:41:39,626
They just literally think
they're getting Facebook for free
891
00:41:39,626 --> 00:41:41,125
and it's not a free thing,
892
00:41:41,125 --> 00:41:46,130
we're paying for it by allowing
them access to our data.
893
00:41:48,377 --> 00:41:51,143
- There are a lot of people
on Facebook who don't know,
894
00:41:51,143 --> 00:41:54,653
for example, how much
information is really out there
895
00:41:54,653 --> 00:41:57,453
about themselves and apparently
don't care
896
00:41:57,453 --> 00:42:00,286
as long as they can put
up pictures of their cats.
897
00:42:00,286 --> 00:42:04,005
I think most people, when
they think about privacy,
898
00:42:04,005 --> 00:42:06,338
they don't seem to connect
899
00:42:06,338 --> 00:42:09,647
their willingness to share
their personal information
900
00:42:09,647 --> 00:42:12,553
with the world, either
through social media
901
00:42:12,553 --> 00:42:14,900
or through shopping
online or anything else,
902
00:42:14,900 --> 00:42:18,614
they don't seem to equate
that with surveillance.
903
00:42:21,083 --> 00:42:24,593
- Every time I receive a text message,
904
00:42:24,593 --> 00:42:26,544
every time I make a phone call,
905
00:42:26,544 --> 00:42:28,473
my location is being recorded.
906
00:42:28,473 --> 00:42:32,518
That data about me is being
pushed off to a server
907
00:42:32,518 --> 00:42:35,319
that is owned by my mobile operator.
908
00:42:35,319 --> 00:42:36,865
If I call that mobile
phone operator and say
909
00:42:36,865 --> 00:42:39,619
"Hey, I'd like to have my data, please.
910
00:42:39,619 --> 00:42:40,735
"At the minimum, share it with me.
911
00:42:40,735 --> 00:42:45,337
"I'd like to see my locations over time."
912
00:42:45,337 --> 00:42:47,798
They won't give it to me.
913
00:42:47,798 --> 00:42:50,854
- The increased ability of
these devices that we have
914
00:42:50,854 --> 00:42:53,387
to become recording and sensing objects,
915
00:42:53,387 --> 00:42:55,435
so data collection devices essentially,
916
00:42:55,435 --> 00:42:59,658
in public space, that
changes a lot of things.
917
00:43:00,186 --> 00:43:02,359
- Even if the phone company took away
918
00:43:02,359 --> 00:43:04,207
all of your personal
identifying information,
919
00:43:04,207 --> 00:43:06,625
it would know within about 30 centimeters
920
00:43:06,625 --> 00:43:08,135
where you woke up every morning
921
00:43:08,135 --> 00:43:09,553
and where you went to work every day
922
00:43:09,553 --> 00:43:10,762
and the path that you took
923
00:43:10,762 --> 00:43:12,145
and who you were walking with
924
00:43:12,145 --> 00:43:14,016
and so even if they
didn't know who you were,
925
00:43:14,016 --> 00:43:16,021
they know who you are.
926
00:43:16,724 --> 00:43:20,280
What I'm really worried about
is the cost to democracy.
927
00:43:20,280 --> 00:43:23,744
Now, today, it's nearly
impossible to be truly anonymous
928
00:43:23,744 --> 00:43:27,742
and so the ability for everything
to be connected to you
929
00:43:27,742 --> 00:43:29,426
and for everything you
do in the real world
930
00:43:29,426 --> 00:43:30,879
to be connected to you,
everything you're doing
931
00:43:30,879 --> 00:43:33,308
in cyberspace, and then the ability for
932
00:43:33,308 --> 00:43:35,399
whoever it is to take
that, put it together,
933
00:43:35,399 --> 00:43:37,236
and turn it into a story.
934
00:43:37,236 --> 00:43:40,689
My fear really is that once
there's so much data out there
935
00:43:40,689 --> 00:43:42,489
and once governments and companies
936
00:43:42,489 --> 00:43:45,836
start to be able to use
that data to profile people,
937
00:43:45,836 --> 00:43:48,871
to filter them out, everybody
is going to start to worry
938
00:43:48,871 --> 00:43:51,886
about their activities.
939
00:43:52,390 --> 00:43:56,854
- We're at a very, very important point
940
00:43:56,854 --> 00:44:01,536
where I think our society
has come to realize this fact
941
00:44:01,536 --> 00:44:06,505
and just begun in earnest to
debate the implications of it.
942
00:44:07,254 --> 00:44:11,055
- You have, I think,
an attitude in the NSA
943
00:44:11,055 --> 00:44:14,436
that they have a right to
every bit of information
944
00:44:14,436 --> 00:44:16,305
they can collect.
945
00:44:16,305 --> 00:44:20,523
We have constructed a world where
946
00:44:20,523 --> 00:44:22,943
the government is collecting secretly
947
00:44:22,943 --> 00:44:25,870
all of the data it can on
each individual citizen,
948
00:44:25,870 --> 00:44:29,783
whether that individual citizen
has done anything or not.
949
00:44:29,783 --> 00:44:32,932
They have been collecting
massive amounts of data
950
00:44:32,932 --> 00:44:36,012
through cell phone providers,
Internet providers,
951
00:44:36,012 --> 00:44:38,801
that is then sifted through secretly
952
00:44:38,801 --> 00:44:42,577
by people over whom no
democratic institution
953
00:44:42,577 --> 00:44:44,440
has effective control.
954
00:44:45,665 --> 00:44:47,955
There's a feeling that if you're not
955
00:44:47,955 --> 00:44:49,033
communicating with terrorists,
956
00:44:49,033 --> 00:44:51,334
what do you care if the government
gathers your information?
957
00:44:51,334 --> 00:44:53,345
This is probably the most pernicious,
958
00:44:53,345 --> 00:44:56,053
anti-Bill of Rights line
of thought that there is
959
00:44:56,053 --> 00:44:57,970
because these are rights
we hold in common.
960
00:44:57,970 --> 00:44:59,481
Every violation of somebody else's rights
961
00:44:59,481 --> 00:45:01,612
is a violation of yours.
962
00:45:02,408 --> 00:45:04,116
- What's going to happen,
I think, is that we now
963
00:45:04,116 --> 00:45:06,580
have so much information
out there about ourselves
964
00:45:06,580 --> 00:45:08,544
and the ability for people to abuse it,
965
00:45:08,544 --> 00:45:09,962
people are going to get hurt,
966
00:45:09,962 --> 00:45:11,124
people are going to lose their jobs,
967
00:45:11,124 --> 00:45:12,751
people are going to get divorced,
968
00:45:12,751 --> 00:45:14,598
people are going to get killed
969
00:45:14,598 --> 00:45:16,307
and it's going to become really painful
970
00:45:16,307 --> 00:45:17,544
and everyone's going to realize
971
00:45:17,544 --> 00:45:19,439
we have to do something about this
972
00:45:19,439 --> 00:45:21,055
and then we're going to start to change.
973
00:45:21,055 --> 00:45:23,483
Now the question is, how bad is it?
974
00:45:23,483 --> 00:45:26,447
- [Voiceover] You can't
have a secret operation
975
00:45:26,447 --> 00:45:29,957
validated by a secret court
based on secret evidence
976
00:45:29,957 --> 00:45:31,165
in a democratic republic.
977
00:45:31,165 --> 00:45:33,920
So the system closes and
no information gets out
978
00:45:33,920 --> 00:45:37,684
except when it gets leaked or
it gets dumped on the world
979
00:45:37,684 --> 00:45:39,521
by outside actors,
whether that's WikiLeaks,
980
00:45:39,521 --> 00:45:40,812
or whether that's Bradley Manning,
981
00:45:40,812 --> 00:45:42,275
or whether that's Edward Snowden.
982
00:45:42,275 --> 00:45:43,856
That's the way that people find out
983
00:45:43,856 --> 00:45:46,029
what their government is up to.
984
00:45:46,029 --> 00:45:47,412
We're living in a future where we've lost
985
00:45:47,412 --> 00:45:48,574
our right to privacy.
986
00:45:48,574 --> 00:45:49,992
We've given it away for convenience's sake
987
00:45:49,992 --> 00:45:51,840
in our economic and social lives
988
00:45:51,840 --> 00:45:55,415
and we've lost it for fear's
sake vis-a-vis our government.
989
00:45:58,270 --> 00:46:01,068
- Any time you're looking
at an ability to segment,
990
00:46:01,068 --> 00:46:04,734
and analyze, you've got to
think about both sides.
991
00:46:05,240 --> 00:46:06,995
But there's so much good here,
992
00:46:06,995 --> 00:46:10,167
there's so much chance to
improve the quality of life
993
00:46:10,167 --> 00:46:12,213
that to basically close the box and say,
994
00:46:12,213 --> 00:46:13,084
"You know what, we're not going to look
995
00:46:13,084 --> 00:46:15,304
"at all this information,
we're not going to collect it,"
996
00:46:15,304 --> 00:46:16,756
that's not practical.
997
00:46:16,756 --> 00:46:20,098
What we're going to have to
do is think as a community.
998
00:46:20,557 --> 00:46:22,940
- We have cultures that
have never been in dialogue
999
00:46:22,940 --> 00:46:26,031
with more than a hundred
or 200 or 400 people
1000
00:46:26,031 --> 00:46:29,366
now connected to three billion.
1001
00:46:29,366 --> 00:46:34,371
(mellow music)
1002
00:46:35,918 --> 00:46:38,347
The phone is the on-ramp
to the information network.
1003
00:46:38,347 --> 00:46:40,218
Once you're on the information network,
1004
00:46:40,218 --> 00:46:42,689
you're in, everybody's in.
1005
00:46:42,689 --> 00:46:44,479
- Billions and billions of people
1006
00:46:44,479 --> 00:46:46,909
who have been excluded
from the discussion,
1007
00:46:46,909 --> 00:46:48,949
who couldn't afford to step into the world
1008
00:46:48,949 --> 00:46:50,158
of being connected,
1009
00:46:50,158 --> 00:46:51,738
step into the world of information,
1010
00:46:51,738 --> 00:46:54,992
step into the world of
being able to learn things
1011
00:46:54,992 --> 00:46:58,380
they could never learn are
suddenly on the network.
1012
00:47:00,303 --> 00:47:01,186
- [Voiceover] The world of the Internet,
1013
00:47:01,186 --> 00:47:02,430
from an innovation perspective,
1014
00:47:02,430 --> 00:47:05,149
is to push innovation out
of large institutions
1015
00:47:05,149 --> 00:47:07,700
to people on the edges.
1016
00:47:09,821 --> 00:47:13,377
- [Voiceover] I suspect as
we equip these next billion
1017
00:47:13,377 --> 00:47:17,793
consumers with these
devices that connect them
1018
00:47:17,793 --> 00:47:21,141
with the rest of the world
and with the Internet,
1019
00:47:21,141 --> 00:47:24,855
we'll have a lot to learn
about how they use them.
1020
00:47:26,730 --> 00:47:28,276
- All of these people in these countries
1021
00:47:28,276 --> 00:47:29,869
are now connecting with each other,
1022
00:47:29,869 --> 00:47:33,750
sharing data about prices
of crops, prices of parts.
1023
00:47:33,750 --> 00:47:35,621
The Africans are talking to the Chinese
1024
00:47:35,621 --> 00:47:37,504
who are talking to the Indians
1025
00:47:37,504 --> 00:47:41,335
and the world is connected
in its nooks and crannies.
1026
00:47:43,931 --> 00:47:46,777
- The person that is in Rwanda
1027
00:47:46,777 --> 00:47:50,659
that has their first
phone that now has access
1028
00:47:50,659 --> 00:47:53,077
to an education system
1029
00:47:53,077 --> 00:47:55,831
that they never could
have dreamed of before
1030
00:47:55,831 --> 00:47:58,632
can start finding solutions
1031
00:47:58,632 --> 00:48:02,467
for his or her little town,
1032
00:48:02,467 --> 00:48:04,762
his or her village.
1033
00:48:05,942 --> 00:48:09,277
- Once we have that
ability to connect people
1034
00:48:09,277 --> 00:48:10,486
and they are able to be connected,
1035
00:48:10,486 --> 00:48:12,531
there's gonna be some genius, you know,
1036
00:48:12,531 --> 00:48:14,751
in some remote location who would never
1037
00:48:14,751 --> 00:48:16,332
have been discovered,
who would never have had
1038
00:48:16,332 --> 00:48:19,423
the capability to get to the education,
1039
00:48:19,423 --> 00:48:23,974
to get to the resources
that he or she needs and...
1040
00:48:26,059 --> 00:48:30,313
that young woman is
going to change the world
1041
00:48:30,313 --> 00:48:33,073
rather than just changing her village.
1042
00:48:33,694 --> 00:48:38,668
- The idea that that
genius will be able to find
1043
00:48:38,668 --> 00:48:41,423
his or her way into the greater culture
1044
00:48:41,423 --> 00:48:44,630
through the tiny, little two-by-two window
1045
00:48:44,630 --> 00:48:48,010
of a feature phone is very exciting.
1046
00:48:48,010 --> 00:48:51,221
- A billion people in India,
a billion people in China,
1047
00:48:51,221 --> 00:48:52,534
you're talking, you know,
1048
00:48:52,534 --> 00:48:54,451
500 million to a billion in Africa.
1049
00:48:54,451 --> 00:48:56,997
Suddenly the world has a lot more minds
1050
00:48:56,997 --> 00:49:01,575
connected in the simplest,
least expensive possible way
1051
00:49:01,575 --> 00:49:03,511
to make the world better.
1052
00:49:04,342 --> 00:49:05,969
- So you look at the
Agricultural Revolution
1053
00:49:05,969 --> 00:49:07,514
and the Industrial Revolution.
1054
00:49:07,514 --> 00:49:09,676
Is the Internet and
then the data revolution
1055
00:49:09,676 --> 00:49:11,605
associated with it of that scale?
1056
00:49:11,605 --> 00:49:13,197
It's certainly possible.
1057
00:49:13,197 --> 00:49:14,684
- I don't think there's any question that
1058
00:49:14,684 --> 00:49:16,289
we're at a moment in human history
1059
00:49:16,289 --> 00:49:19,043
that we will look back on
in 50 or a hundred years
1060
00:49:19,043 --> 00:49:24,048
and say right around 2000
or so it all changed.
1061
00:49:25,970 --> 00:49:27,724
And I do think we will date
1062
00:49:27,724 --> 00:49:32,323
before the explosion of data and after.
1063
00:49:32,323 --> 00:49:33,903
I don't think it's an
issue of climate change
1064
00:49:33,903 --> 00:49:36,414
or health or jobs, I
think it's all issues.
1065
00:49:36,414 --> 00:49:39,737
Everything has information
at its core, everything.
1066
00:49:39,737 --> 00:49:43,259
So if information
matters, then reorganizing
1067
00:49:43,259 --> 00:49:45,386
the entire information
network of the planet
1068
00:49:45,386 --> 00:49:47,721
is like wiring up the brain
of a two-year-old child.
1069
00:49:47,721 --> 00:49:49,186
Suddenly that child can talk
1070
00:49:49,186 --> 00:49:51,568
and think and act and behave, right.
1071
00:49:51,568 --> 00:49:55,032
The world is wiring up a
cerebral cortex, if you will,
1072
00:49:55,032 --> 00:49:57,228
of billions of connected elements
1073
00:49:57,228 --> 00:49:59,494
that are going to exchange
billions of ideas,
1074
00:49:59,494 --> 00:50:01,087
billions of points of knowledge,
1075
00:50:01,087 --> 00:50:04,047
and billions of ways of working together.
1076
00:50:04,047 --> 00:50:07,591
- Together, there becomes
an enormous wave of change
1077
00:50:07,591 --> 00:50:09,892
and that wave of change
is going to take us
1078
00:50:09,892 --> 00:50:13,861
in directions that we
can't begin to imagine.
1079
00:50:14,355 --> 00:50:18,690
- The ability to turn that
data into actionable insight
1080
00:50:18,690 --> 00:50:20,538
is what computers are very good at,
1081
00:50:20,538 --> 00:50:23,048
the ability to take action
is what we're really good at
1082
00:50:23,048 --> 00:50:26,244
and I think it's really
important to separate those two
1083
00:50:26,244 --> 00:50:28,928
because people conflate
them and get scared
1084
00:50:28,928 --> 00:50:31,056
and think the computers are taking over.
1085
00:50:31,056 --> 00:50:33,565
The computers are this extraordinary tool
1086
00:50:33,565 --> 00:50:37,610
that we have at our disposal
to accelerate our ability
1087
00:50:37,610 --> 00:50:38,946
to solve the problems that, frankly,
1088
00:50:38,946 --> 00:50:40,457
we've gotten ourselves into.
1089
00:50:40,457 --> 00:50:42,247
- I am fundamentally optimistic,
1090
00:50:42,247 --> 00:50:46,257
but I'm not blindly, foolishly optimistic.
1091
00:50:46,257 --> 00:50:48,964
You got to remember, the
financial crisis was brought to us
1092
00:50:48,964 --> 00:50:52,020
by big data people as well because
1093
00:50:52,020 --> 00:50:53,856
they weren't actually thinking very hard
1094
00:50:53,856 --> 00:50:55,853
about how do they create
value for the world.
1095
00:50:55,853 --> 00:50:57,074
They were just thinking about
1096
00:50:57,074 --> 00:51:00,110
how do they create value for themselves.
1097
00:51:00,661 --> 00:51:02,660
You know, we have a fair
amount of literature,
1098
00:51:02,660 --> 00:51:04,624
a fair amount of
understanding that if you take
1099
00:51:04,624 --> 00:51:07,538
more out of the ecosystem
than you put back in,
1100
00:51:07,538 --> 00:51:09,595
the whole thing breaks down.
1101
00:51:09,595 --> 00:51:13,733
That's why I think we have
to actually earn our future.
1102
00:51:13,733 --> 00:51:16,104
We can't just sort of
pat ourselves on the back
1103
00:51:16,104 --> 00:51:18,323
and think it's just going
to fall into our laps.
1104
00:51:18,323 --> 00:51:22,194
We have to care about what
kind of future we're making
1105
00:51:22,194 --> 00:51:23,959
and we have to invest in that future
1106
00:51:23,959 --> 00:51:26,168
and we have to make the right choices.
1107
00:51:26,168 --> 00:51:30,057
- It is, to me, paramount
that a culture understands,
1108
00:51:30,057 --> 00:51:31,975
our culture understands
1109
00:51:31,975 --> 00:51:36,980
that we must take this data thing as ours,
1110
00:51:38,102 --> 00:51:40,075
that we are the platform for it,
1111
00:51:40,075 --> 00:51:42,400
humans, individuals are
the platform for it,
1112
00:51:42,400 --> 00:51:45,083
that it is not something done to us,
1113
00:51:45,083 --> 00:51:49,012
but rather it is ours to do
with as we wish.
1114
00:51:52,644 --> 00:51:54,842
When I was young, we landed on the moon
1115
00:51:54,842 --> 00:51:58,931
and so the future to me meant
going further than that.
1116
00:51:58,931 --> 00:52:00,697
We looked outward.
1117
00:52:00,697 --> 00:52:04,032
Today, I think there's a new energy around
1118
00:52:04,032 --> 00:52:06,497
the future and it has much more to do
1119
00:52:06,497 --> 00:52:08,960
with looking at where we are now
1120
00:52:08,960 --> 00:52:11,714
and the globe we stand on
1121
00:52:11,714 --> 00:52:14,977
and solving for that.
1122
00:52:14,977 --> 00:52:17,859
The tools that are in our hands now
1123
00:52:17,859 --> 00:52:20,483
are going to allow us to do that.
1124
00:52:20,483 --> 00:52:23,574
Now it's like, no wait a
minute, this is our place
1125
00:52:23,574 --> 00:52:27,792
and we're going to figure
out how to make it blossom.
1126
00:52:27,792 --> 00:52:32,797
(dramatic music)
1127
00:53:15,806 --> 00:53:20,811
(mid tempo orchestral music)